Proceedings of International Conference on Recent Trends in Computing: ICRTC 2022
ISBN 9811988242, 9789811988240

This book is a collection of high-quality peer-reviewed research papers presented at the International Conference on Recent Trends in Computing (ICRTC 2022).


English Pages 836 [837] Year 2023


Table of contents:
Preface and About the Organization
Contents
Editors and Contributors
A Novel High Gain Single Switch Flyback DC–DC Converter for Small-Scale Lightning
1 Introduction
2 Proposed Topology
3 Mathematical Modelling
3.1 Duty Cycle and Turns Ratio
3.2 Design of Inductors
3.3 Efficiency of the Proposed Converter
3.4 Primary and Peak Currents:
3.5 Power and Voltage Ratings of the Designed Converter
4 Modelling Closed Loop Mechanism
4.1 PI Controller
5 Simulation and Results
6 Hardware Analysis
7 Conclusion
References
Learning-Based Model for Auto-Form Filling
1 Introduction
1.1 Existing System
1.2 Proposed System
1.3 Advantages of Automatic Form Filler
2 Working of Local Model
3 Algorithms
3.1 Updating Module
3.2 Search Module
4 Analysis
5 Conclusion and Future Scope
References
Fatality Prediction in Road Accidents Using Neural Networks
1 Introduction
2 Literature Review
3 Methodology
4 Results and Analyses
5 Conclusions
References
Managing Peer Review Process of Research Paper
1 Introduction
2 Literature Survey
2.1 The Effect of Enterprise Resource Planning (ERP) System and Their Practical Use Cases on Business Performances [3]
2.2 A Research Study on the Enterprise Resource Planning (ERP) System Implementation and Current Trends in ERP [5]
2.3 Enterprise Resource Planning (ERP) System Implementation: A Case for User Participation [6]
2.4 Research on Data Security Technology Based on Cloud Storage [9]
2.5 Future Improvisation Is the Work
2.6 Why ERP?
3 Proposed Methodology
3.1 Work Flow of the System
3.2 Different Components of the System
4 Result
5 Conclusion
References
Internet of Things-Based Centralised Water Distribution Monitoring System
1 Introduction
2 Proposed Cloud Architecture for Water Metre
2.1 Sensors and Actuators Layer
2.2 IoT Device/Edge Computing Layer
2.3 Cloud Provider Layer
2.4 Enterprise Network Layer
3 Backbone of Water Metre Network
4 Communication Between Water Metres and Server
5 Develop Water Metre and Its Components
6 Results
7 Conclusion
References
StakePage: Analysis of Stakeholders of an Information System Using Page Rank Algorithm
1 Introduction
2 Related Work
3 Proposed Method
4 Case Study
5 Comparative Study
6 Conclusion and Future Work
References
Violence Recognition from Videos Using Deep Learning
1 Introduction
2 Related Work
3 Research Gap
4 Proposed Methodology
5 Experimental Results
5.1 Confusion Matrix
5.2 Output
6 Classification Report
6.1 Training and Validation Loss
6.2 Training and Validation Accuracy
7 Conclusion
References
Stock Price Prediction Using Machine Learning
1 Introduction
2 Related Work
3 Proposed Methodology
4 Results and Discussion
5 Conclusion
References
Brain Tumor Detection Using Deep Learning
1 Introduction
2 Related Work
3 Proposed Method
3.1 Dataset Description
3.2 FlowChart
3.3 Image Preprocessing
3.4 Image Enhancement
3.5 Thresholding
3.6 Morphological Operations
3.7 Brain Tumor Image Classification Using CNN
3.8 Convolution
3.9 Pooling
4 Performance Evaluation
4.1 Performance of VGG 16
4.2 Performance of VGG 19
4.3 Performance of Resnet50
4.4 Comparison of CNN Models
5 Conclusion
References
Predicting Chances of Cardiovascular Diseases Through Integration of Feature Selection and Ensemble Learning
1 Introduction
2 Literature Review
3 Proposed Framework
4 Performance Evaluation
4.1 Data Analysis
4.2 Classifiers Evaluated
4.3 Feature Selection
5 Result
6 Conclusion
References
Feedback Analysis of Online Teaching Using SVM
1 Introduction
2 Literature Review
3 Methodology
3.1 Sentiment Analysis
3.2 Experimental Result Analysis
4 Result
5 Conclusion
References
DCGAN for Data Augmentation in Pneumonia Chest X-Ray Image Classification
1 Introduction
2 Related Work
3 Dataset Description
4 Proposed Approach
4.1 Conventional Data Augmentation
4.2 Generative Adversarial Network (GAN)
4.3 Convolutional Neural Network (CNN)
4.4 DCGAN
4.5 Performance Metrics
5 Result and Discussions
6 Conclusion
References
Fair Quality of Voice Over WiMAX Coexisting of WiFi Networks for Video Streaming Applications
1 Introduction
1.1 Voice Over Internet Protocol
1.2 VoIP Over IEEE 802.11
1.3 VoIP Over IEEE 802.16
1.4 Mean Opinion Score (MOS) Value
2 A Brief Introduction About Network Model
3 Simulation Results
4 Conclusion
References
An Empirical Overview on DDoS: Taxonomy, Attacks, Tools and Attack Detection Mechanism
1 Introduction
2 Literature Survey
3 Taxonomy of Distributed Denial of Service Attacks
4 DDoS Tools
5 DDoS Attack Detection Framework
6 Proposed Attack Detection Mechanism
7 Result Analysis
8 Conclusion
References
Histopathology Osteosarcoma Image Classification
1 Introduction
2 Material and Method
2.1 Dataset and Methodology
2.2 CNN
2.3 CNN Model
2.4 Machine Learning Models Using Predefined Features
2.5 Experiment
3 Result and Discussion
4 Conclusion and Future Scope
References
Information-Based Image Extraction with Data Mining Techniques for Quality Retrieval
1 Introduction
2 Background
3 Proposed Scheme
3.1 Grid and Segmentation
3.2 Feature Retrieval
3.3 Neighborhood Cluster Module
3.4 Prediction Estimation Block
3.5 Feature Retrieval Question Block
3.6 Conversion of Low-Level Features into High Level Features
3.7 Image Recovery Based on High Level Feature
4 Implementation
5 Conclusions
References
Fake News Detection System Using Multinomial Naïve Bayes Classifier
1 Introduction
2 Literature Survey
3 Various Machine Learning Algorithm
4 Fake News Detection Using Naive Bayes Classification
5 Architecture of Multinomial Naïve Bayes Classifier for Fake News Detection
6 Performance Evaluation Metrics and Experimental Analysis
7 Conclusion
References
Superconductivity-Based Energy Storage System for Microgrid Stabilization by Connecting and Disconnecting Loads
1 Introduction
2 Selection of Architectures
3 Selection of Electronics Components
4 Simulations and Analysis
5 Conclusion
References
Deep Learning-Based Model for Face Mask Detection in the Era of COVID-19 Pandemic
1 Introduction
1.1 Main Objective of the Paper
2 Literature Review
3 Methodology Used
3.1 Datasets
3.2 Data Pre-processing
3.3 ConvNet
3.4 MobileNet
3.5 VGG19
3.6 Inception
3.7 DenseNet
3.8 ResNet50
4 Result
5 Conclusion
References
Efficient System to Predict Harvest Based on the Quality of the Crop Using Supervised Techniques and Boosting Classifiers
1 Introduction
2 Literature Survey
3 Proposed Methodology
4 Experimental Framework
4.1 Dataset Description
4.2 System Requirements
4.3 Performance Metrics
5 Results and Discussion
5.1 Comparative Analysis
6 Conclusion and Future Work
References
ResNet: Solving Vanishing Gradient in Deep Networks
1 Introduction
2 Related Work
3 Residual Network
3.1 Mathematical Representation of ResNet
3.2 Types of ResNet and Their Comparison
3.3 Solution to Vanishing Gradients
4 Dataset and Implementation
4.1 Dataset
4.2 TensorFlow
5 Network Design
5.1 Network Architecture
5.2 Data Augmentation
6 Experiment Results
7 Conclusion
References
BRCA1 Genomic Sequence-Based Early Stage Breast Cancer Detection
1 Introduction
2 Literature Survey
3 Proposed Work
3.1 BRCA1 Dataset Description
3.2 Data Pre-processing
3.3 Genomic Sequence Encoding
4 Experimental Results
5 Conclusion
References
Develop Model for Recognition of Handwritten Equation Using Machine Learning
1 Introduction
2 Literature Review
3 Proposed Model
4 Result
5 Conclusion
References
Feature Over Exemplification-Based Classification for Revelation of Hypothyroid
1 Introduction
2 Literature Review
3 Our Contributions
4 Results and Predictive Analysis
5 Conclusion
References
A Framework with IOAHT for Heat Stress Detection and Haemoprotozoan Disease Classification Using Multimodal Approach Combining LSTM and CNN
1 Introduction
2 Literature Review
3 Proposed Method
3.1 LSTM Layer
3.2 Batch Normalization
3.3 Convolutional Layer
3.4 Pooling Layer
3.5 Output Layer
3.6 Simulation Setup
4 Results
5 Conclusion
References
Using Classifier Ensembles to Predict Election Results Using Twitter Data Sentiment Analysis
1 Introduction
2 Related Work
3 Methodology
3.1 Acquiring Data
3.2 Data Pre-processing
3.3 Classification of Sentiments Using SentiWordNet
3.4 Naive Bayes for Sentiment Classification
3.5 Sentiment Classification Using HMM
3.6 Ensemble Approach for Sentiment Classification
4 Algorithms to Calculate Sentiment Score
5 Experimental Details
5.1 Results on the Twitter Sentiment Analysis Datasets
5.2 Results on Healthcare Reform Dataset
6 Conclusion
References
Optimization Algorithms to Reduce Route Travel Time
1 Introduction
2 System Architecture and Design
3 Methods and Methodologies
3.1 Optimization Using Genetic Algorithm
3.2 Optimization Using Ant Colony Optimization
3.3 Optimization Using Grey Wolf Optimization
3.4 Optimization Using Artificial Bee Colony Algorithm
4 Results
5 Conclusion
References
Survival Analysis and Its Application of Discharge Time Likelihood Prediction Using Clinical Data on COVID-19 Patients-Machine-Learning Approaches
1 Introduction
2 Methodology
2.1 Analysis of Longevity
2.2 Artificial Neural Networks—ANN
2.3 Methods and Details
3 Data Analysis
4 Conclusion
References
Credit Card Fraud Detection Using Machine Learning and Incremental Learning
1 Introduction
2 Literature Reviewed
3 Implementation
3.1 Data Preprocessing and Feature Extraction
3.2 Model Training and Class Imbalance Correction
3.3 Feature Selection
3.4 Parametric Tuning and K-Fold Cross Validation
3.5 Incremental Learning
4 Results
5 Conclusion
References
Game Data Visualization Using Artificial Intelligence Techniques
1 Introduction
2 Related Work
3 Proposed Work
3.1 Application AI in Games
3.2 Random Forest Regression
4 Dataset Preparation and Analysis
5 Conclusion and Future Work
References
Energy-Efficient and Fast Data Collection in WSN Using Genetic Algorithm
1 Introduction
2 Literature Review
3 Proposed Method
3.1 Genetic Algorithm Concept
3.2 Modified LEACH Protocol
4 Result and Simulation
5 Conclusion
References
Feature Reduced Anova Element Oversampling Elucidation Based Categorisation for Hepatitis C Virus Prognostication
1 Introduction
2 Literature Review
3 Our Contributions
4 Implementation Setup
5 Conclusion
References
Personality Trait Detection Using Handwriting Analysis by Machine Learning
1 Introduction
2 Literature Survey
3 Proposed System
3.1 Data Acquisition
3.2 Data Preprocessing
3.3 Segmentation (Horizontal and Vertical Projections)
3.4 Feature Extraction
3.5 Classification
4 Implementation
4.1 Extraction of Letter ‘S’
4.2 Extraction of Letter ‘M’
4.3 Extraction of Letter ‘D’
4.4 Extraction of Letter ‘P’
5 Result
6 Conclusion
References
Road Traffic Density Classification to Improvise Traffic System Using Convolutional Neural Network (CNN)
1 Introduction
2 Literature Review
3 Accident Data Analysis and the Need for Traffic Density Classification
4 Proposed Methodology
4.1 “Preprocessing”
4.2 CNN Module
4.3 Fully Connected Layer
5 Implementation and Results
5.1 Data Collection
5.2 Data Preprocessing
5.3 Building the Neural Network
5.4 Training the Network
5.5 Performance Evaluation
6 Conclusion
References
Fake Reviews Detection Using Multi-input Neural Network Model
1 Introduction
2 Related Work
3 Proposed Model
3.1 Dataset
3.2 Preprocessing
3.3 Feature Selection
3.4 Model Training
3.5 Model Evaluation
4 Result and Discussion
5 Confusion Matrix
6 Conclusion
References
Classification of Yoga Poses Using Integration of Deep Learning and Machine Learning Techniques
1 Introduction
2 Related Work
2.1 Human Pose Detection
2.2 Yoga Pose Detection
2.3 Classification Approaches
3 Methodology
3.1 Dataset
3.2 Approach
4 Results and Discussion
5 Conclusion and Future Scope
References
Tabular Data Extraction From Documents
1 Introduction
2 Market Survey
3 Literature Review
4 Methodology
5 Result
6 Conclusion
References
Vision-Based System for Road Lane Detection and Lane Type Classification
1 Introduction
1.1 Different Type of Road Lanes and Their Purpose:
2 Literature Review
2.1 Traditional Approaches
2.2 Deep Learning-Based Approaches
2.3 Approaches for Lane Type Classification
3 Methodology
3.1 Data Acquisition and Image Preprocessing
3.2 Lane Boundary Detection
3.3 Classification of Lane Boundary
4 Result
5 Conclusion
References
An Energy-Efficient Cluster Head Selection in MANETs Using Emperor Penguin Optimization Fuzzy Genetic Algorithm
1 Introduction
2 Review of Literature
3 Proposed Methodology: Energy-Efficient CH Selection in MANET Using Emperor Penguin Optimization Fuzzy Genetic Algorithm
3.1 System Model
3.2 Proposed EPO (Emperor Penguin Optimization)-Based Clustering
4 Experimental Results and Discussion
5 Conclusion
References
Ground Water Quality Index Prediction Using Random Forest Model
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset
3.2 Variable Importance
3.3 WQI Calculation
4 Drinking Water WQI and Irrigation Water WQI Prediction
5 Conclusion
References
Near Threshold Operation Based a Bug Immune DET-FF for IoT Applications
1 Introduction
2 Existing Work
3 New Bug Immune DET-FF
4 Result Analysis
5 Conclusion
References
Analyzing the Trade-Off Between Complexity Measures, Ambiguity in Insertion System and Its Application in Natural Languages
1 Introduction
2 Preliminaries
3 New Descriptional Complexity Measures
4 Trade-Off Results Between (Descriptional) Complexity Measures and Ambiguity Levels
5 Application of the Trade-Off Results in Natural Languages
6 Conclusion
References
Human-to-Computer Interaction Using Hand Gesture Tracking System
1 Introduction
1.1 An Overview to Hand Gesture Tracking System
2 Literature Review
3 Proposed Methodology
3.1 Real-Time Video Acquisition
3.2 Flipping of Individual Video Frames
3.3 BGR to HSV
3.4 Colour Identification
3.5 Removing Noise and Binary Image Formation
3.6 Find Contours and Draw Centroids
3.7 Set Cursor Position
3.8 Choose an Action
3.9 Perform an Action
4 Conclusion
References
Human Emotion Recognition Based on EEG Signal Using Deep Learning Approach
1 Introduction
2 Proposed Work
2.1 Dataset Description
2.2 Data Preprocessing
2.3 Feature Selection and Extraction
2.4 Classification
3 Results and Discussion
4 Conclusion and Future Work
References
Sentiment Analysis of COVID-19 Tweets Using BiLSTM and CNN-BiLSTM
1 Introduction
2 Literature Study
3 Methods
4 Implementation
4.1 Data Collection and Preparation
4.2 Lemmatization
4.3 Tokenization and Padding
4.4 Classification
5 Result and Discussion
5.1 BiLSTM
5.2 CNN-BiLSTM
6 Conclusion
References
COPRAS-Based Decision-Making Strategy for Optimal Cluster Head Selection in WSNs
1 Introduction
2 Related Work
3 Proposed COPRAS-Based Optimal Cluster Head Selection (COPRAS-CHS) Technique
4 Results and Discussion
5 Conclusion
References
Unusual Activity Detection Using Machine Learning
1 Introduction
2 Related Work
3 Methodology Followed
3.1 Dataset
3.2 Implementation
4 Result
5 Conclusion
References
Disease Detection for Cotton Crop Through Convolutional Neural Network
1 Introduction
2 Literature Survey
3 Methodology
3.1 Dataset Description
3.2 Pre-processing Images
4 Results
5 Conclusion and Future Work
References
Deriving Pipeline for Emergency Services Using Natural Language Processing Techniques
1 Introduction
2 Literature Review
3 Data Pipeline
4 Proposed Model Pipeline
5 Automation
6 Results
7 Conclusion
8 Future Work
References
Fetal Head Ultrasound Image Segmentation Using Region-Based, Edge-Based and Clustering Strategies
1 Introduction
2 Methodology
2.1 Proposed Technique
2.2 U HC18 Dataset
2.3 Region-Based Segmentation
2.4 Edge-Based Segmentation
2.5 Clustering Strategies
3 Results and Discussion
3.1 Comparative Analysis
3.2 Performance Metrics
4 Conclusion
References
A Shallow Convolutional Neural Network Model for Breast Cancer Histopathology Image Classification
1 Introduction
2 Data Set
3 Methodology
3.1 Pre-processing
3.2 CNN Architecture
3.3 Training and Testing Strategy
4 Results and Discussions
4.1 Analysis of Accuracy
5 Conclusion
References
Efficient Packet Flow Path Allocation Using Node Proclivity Tracing Algorithm
1 Introduction
2 Literature Review
3 Proposed Efficient Packet Flow Path Allocation
3.1 Analysis of Packet Flow and Size
3.2 Efficient Packet Flow Path Allocation Algorithm
4 Implementation Simulation Setup
5 Results and Performance Analysis
6 Conclusion
References
Energy-Efficient Multilevel Routing Protocol for IoT-Assisted WSN
1 Introduction
2 Related Work
3 Proposed Energy-Efficient Multilevel Routing Protocol
3.1 Setup Phase
3.2 Formation of Cluster
3.3 Selection of Cluster Head
4 Simulation Results and Validation
5 Conclusion
References
Ship Detection from Satellite Imagery Using RetinaNet with Instance Segmentation
1 Introduction
2 Related Work
3 Methodology
3.1 Architecture of RetinaNet Model with Instance Segmentation Part
3.2 Instance Segmentation Part
3.3 Focal Loss
4 Results and Discussion
4.1 Dataset Used
4.2 Train the Proposed Model
4.3 Evaluation of Proposed Model
4.4 Model Output
5 Conclusion
References
A Novel Technique of Mixed Gas Identification Based on the Group Method of Data Handling (GMDH) on Time-Dependent MOX Gas Sensor Data
1 Introduction
2 Literature Review
3 Data Collection
3.1 Experimental Setup
3.2 Dataset
4 Methodology
5 Experimental Results
5.1 Performance Indicators
5.2 Statistical Error Analysis
6 Conclusion
References
Software Fault Diagnosis via Intelligent Data Mining Algorithms
1 Introduction
2 Related Work
2.1 Software Fault Seeding
2.2 Mutation Testing
2.3 Software Fault Injection
3 Proposed Method
4 Experiment and Results
4.1 Evaluation Metrics
4.2 Dataset Summary
4.3 Findings and Discussion
5 Conclusion
References
Face Mask Detection Using MobileNetV2 and VGG16
1 Introduction
2 Related Work
2.1 Dataset
2.2 Pre-processing
3 Building the Models
4 Experimental Setup
5 Result Analysis
5.1 VGG16 Result
5.2 MobileNetV2 Results
6 Detailed Comparison of VGG16 and MobileNetV2
7 Conclusion
References
Face Recognition Using EfficientNet
1 Introduction
2 Related Works
3 Materials and Methods
3.1 EfficientNet
3.2 Dataset Used
4 Methodology Used
5 Result and Discussions
6 Conclusion
References
Implementation and Analysis of Decentralized Network Based on Blockchain
1 Introduction
2 Literature Study
3 Methodology
4 Implementation
5 Conclusion
References
Social Distance Monitoring Framework Using YOLO V5 Deep Architecture
1 Introduction
2 Literature Review and Related Works
3 Proposed Algorithm
3.1 Object Detection
3.2 Object Tracking
3.3 Distance Measurement
4 Experiment and Result Discussion
4.1 Dataset Description
5 Conclusion and Future Work
References
Real-Time Smart Traffic Analysis Employing a Dual Approach Based on AI
1 Introduction
2 Proposed and Implemented System
2.1 Data Acquisition and Collection
3 Experiments and Results
4 Conclusion
References
Sustainable Development in Urban Cities with LCLU Mapping
1 Introduction
2 Literature Survey
3 Methodology
3.1 Preprocessing Remote Sensing Imagery
3.2 Urban LCLU Mapping
3.3 Analysis of ULCLU Maps
4 Results and Discussion
4.1 Study Area and Data
4.2 Creating ULCLU Maps from Remote Sensing Data
4.3 Analysis of ULCLU Maps for Urban Green Spaces
4.4 Potential Expansion Areas
5 Conclusion and Future Work
References
Multi-order Replay Attack Detection Using Enhanced Feature Extraction and Deep Learning Classification
1 Introduction
2 Voice Spoofing Detection Corpus (VSDC)
3 Proposed Automatic Speaker Verification (ASV) System
3.1 Feature Extraction Using Combined FLDP-MFCC Technique
3.2 Classification Using GRU
4 Experimental Setup
5 Results
5.1 Analysis of 0PR Versus 1PR
5.2 Analysis of 0PR Versus 2PR
5.3 Comparison of Proposed Approach with Existing Techniques
6 Conclusion
References
Ferry Mobility-Aware Routing for Sparse Flying Ad-Hoc Network
1 Introduction
2 Literature Work
3 Proposed Scheme
3.1 Mobility Model of Ferry UAVs
3.2 Routing Strategy
4 Performance Evaluation
4.1 Simulation Settings
4.2 Performance Comparison with Other DTN Protocols
5 Conclusion
References
Prediction of Cardio Vascular Disease from Retinal Fundus Images Using Machine Learning
1 Introduction
2 Literature Survey
3 Proposed Work
4 Methodology
4.1 Image Acquisition
4.2 Preprocessing
4.3 Segmentation
4.4 Extraction of Texture Features
5 Conclusion
References
Tampering Detection Driving License in RTO Using Blockchain Technology
1 Introduction
1.1 Objectives
2 Literature Survey
3 Methodology
3.1 Driving License Registration
3.2 SHA256 Algorithm
3.3 Padding Bits
3.4 Compression Techniques
3.5 Output
3.6 Manage Driving License
3.7 Blockchain Validation
3.8 Blockchain Mine Performance
3.9 Blockchain Recovery
4 Proposed System
5 Conclusion and Future Work
References
Content-based Image Retrieval in Cyber-Physical System Modeling of Robots
1 Introduction
2 Related Work
3 Proposed System
3.1 Image Pre-processing and Feature Extraction
3.2 ID-SIFT Feature Extraction for Reference and Test Images
3.3 Image Analysis
3.4 Image Retrieval (IR)
3.5 Color and Shape Retrieval
3.6 Similarity Measure
4 Experimentation and Results
5 Conclusion
References
A Hybrid Spotted Hyena and Whale Optimization Algorithm-Based Load-Balanced Clustering Technique in WSNs
1 Introduction
2 Related Work
3 Hybrid Spotted Hyena and Whale Optimization Algorithm (HSHWOA)-Based Load-Balanced Clustering Technique
3.1 Primitives of Whale Optimization Algorithm (WOA)
3.2 Fundamentals of Spotted Hyena Optimization (SHO)
3.3 Hybrid HSHWOA Used for Load Balanced Clustering
4 Simulation Results and Discussion
5 Conclusion
References
Evaluating the Effect of Variable Buffer Size and Message Lifetimes in A Disconnected Mobile Opportunistic Network Environment
1 Introduction
2 Literature Survey
3 Routing Protocols
3.1 Blind-Based Routing Protocols (Flooding)
3.2 Knowledge-Based Routing (Forwarding)
4 Simulation Tool and Setup
4.1 Delivery Probability
4.2 Average Delay
4.3 Overhead Ratio
5 Result and Analysis
5.1 Impact of Varying Buffer Sizes on the Performances of the Network
5.2 Impact of Message Lifetimes on the Performances of the Network
5.3 Finding Impact of Varying Movement Models on the Performances of the Network
6 Conclusion
7 Future Work
8 Result Summary
References
Unsupervised Machine Learning for Unusual Crowd Activity Detection
1 Introduction
2 Literature Survey
3 Problem Identification
4 Proposed Method and Implementation
4.1 Finding Optical Flow of Blocks
4.2 Generation of Motion Influence Information
4.3 Mega Block Generator
4.4 Training
4.5 Testing
5 Results and Discussion
6 Conclusion
References
Author Index

Lecture Notes in Networks and Systems 600

Rajendra Prasad Mahapatra · Sateesh K. Peddoju · Sudip Roy · Pritee Parwekar (Editors)

Proceedings of International Conference on Recent Trends in Computing ICRTC 2022

Lecture Notes in Networks and Systems Volume 600

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation (DCA), School of Electrical and Computer Engineering (FEEC), University of Campinas (UNICAMP), São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems quickly, informally, and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems, and others. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution and exposure, which enable both wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes, and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia, please contact Aninda Bose.


Editors
Rajendra Prasad Mahapatra, SRM Institute of Science and Technology, Ghaziabad, Uttar Pradesh, India
Sateesh K. Peddoju, Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, India
Sudip Roy, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Pritee Parwekar, SRM Institute of Science and Technology, Ghaziabad, Uttar Pradesh, India

ISSN 2367-3370  ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-19-8824-0  ISBN 978-981-19-8825-7 (eBook)
https://doi.org/10.1007/978-981-19-8825-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface and About the Organization

This Lecture Notes in Networks and Systems (LNNS) volume contains the papers presented at the International Conference on Recent Trends in Computing (ICRTC 2022), held on June 3–4, 2022, at SRM Institute of Science and Technology, Delhi-NCR Campus, Modinagar, Ghaziabad, India. ICRTC 2022 aimed to bring together researchers from academia and industry to report on and review the latest progress in cutting-edge research in image processing, computer vision and pattern recognition, machine learning, data mining, big data and analytics, soft computing, mobile computing and applications, cloud computing, and green IT, and to raise awareness of these domains among a wider audience of practitioners. ICRTC 2022 received 350 paper submissions, including two from abroad. All papers were peer-reviewed by experts in the area from India and abroad, and the reviewers' comments were sent to the authors of accepted papers. Seventy papers were accepted for online presentation via Zoom at the conference, corresponding to an acceptance rate of 20%, which is intended to maintain the high standards of the conference proceedings. The papers included in this LNNS volume cover a wide range of topics in intelligent computing and algorithms and their real-time applications to problems from diverse domains of science and engineering. The conference was inaugurated by Prof. Milan Tuba, Singidunum University, Serbia, on June 3, 2022. It featured a keynote by Prof. Sheng-Lung Peng, National Tsing Hua University, Taiwan, and an address by the Chief Guest, Prof. Mike Hinchey, University of Limerick, Ireland. We take this opportunity to thank the authors of the submitted papers for their hard work, adherence to the deadlines, and patience with the review process.
The quality of a refereed volume depends mainly on the expertise and dedication of the reviewers. We are indebted to the technical committee members, who produced excellent reviews in short time frames. First, we are indebted to the Hon’ble Dr. T. R. Paari Vendhar, Member of Parliament (Lok Sabha), Founder-Chancellor, SRM Institute of Science and Technology; Shri. Ravi Pachamoothoo, Pro-Chancellor (Administration), SRM Institute of Science and Technology; Dr. P. Sathyanarayanan, Pro-Chancellor (Academics), SRM Institute of Science and Technology; Dr. R. Shivakumar, Vice-President, SRM Institute of Science and Technology; and Prof. C. Muthamizhchelvan, Vice-Chancellor i/c, SRM Institute of Science and Technology, for supporting our cause and encouraging us to organize the conference there. In particular, we express our heartfelt thanks for the necessary financial support and infrastructural assistance to hold the conference. Our sincere thanks go to Dr. D. K. Sharma, Professor and Dean; Dr. S. Viswanathan, Director; and Dr. Navin Ahalawat, Professor and Dean (Campus Life), SRM Institute of Science and Technology, Delhi-NCR Campus, Modinagar, Ghaziabad, for their continuous support and guidance. We especially thank Dr. Pritee Parwekar, Associate Professor, and Dr. Veena Khandelwal, Associate Professor, Co-conveners of ICRTC 2022, SRM Institute of Science and Technology, Delhi-NCR Campus, for their excellent support and arrangements; without them, this conference could not have been held. We thank the international advisory committee members for providing valuable guidelines and inspiration to overcome various difficulties in organizing this conference. We would also like to thank the participants of this conference. The faculty members and students of SRM Institute of Science and Technology, Delhi-NCR Campus, Modinagar, Ghaziabad, deserve special thanks; without their involvement, we would not have been able to meet the challenges of our responsibilities. Finally, we thank all the volunteers who made great efforts to meet the deadlines and arrange every detail so that the conference could run smoothly. We hope the readers of these proceedings find the papers inspiring and enjoyable.

Dr. Rajendra Prasad Mahapatra, Ghaziabad, India
Dr. Sateesh K. Peddoju, Roorkee, India
Dr. Sudip Roy, Roorkee, India
Dr. Pritee Parwekar, Ghaziabad, India

Contents

A Novel High Gain Single Switch Flyback DC–DC Converter for Small-Scale Lightning
R. Sathiya, M. Arun Noyal Doss, S. Avinash, and R. R. Hitesh

Learning-Based Model for Auto-Form Filling
Manan Gupta, Hardik Sharma, Nitesh Kumar, and Mukesh Rawat

Fatality Prediction in Road Accidents Using Neural Networks
M. Rekha Sundari, Prasadu Reddi, K. Satyanarayana Murthy, and D. Sai Sowmya

Managing Peer Review Process of Research Paper
Samarth Anand, Samarpan Jain, Sarthak Aggarwal, Shital Kasyap, and Mukesh Rawat

Internet of Things-Based Centralised Water Distribution Monitoring System
Biswaranjan Bhola and Raghvendra Kumar

StakePage: Analysis of Stakeholders of an Information System Using Page Rank Algorithm
Tanveer Hassan, Chaudhary Wali Mohammad, and Mohd. Sadiq

Violence Recognition from Videos Using Deep Learning
Shivam Rathi, Shivam Sharma, Sachin Ojha, and Kapil Kumar

Stock Price Prediction Using Machine Learning
Piyush, Amarjeet, Anubhav Sharma, Sunil Kumar, and Nighat Naaz Ansari

Brain Tumor Detection Using Deep Learning
Sunny Yadav, Vipul Kaushik, Vansh Gaur, and Mala Saraswat


Predicting Chances of Cardiovascular Diseases Through Integration of Feature Selection and Ensemble Learning
Raghav Bhardwaj, Shashvat Mishra, Isha Gupta, and Shweta Paliwal

Feedback Analysis of Online Teaching Using SVM
Punit Mittal, Kartikey Tiwari, Kanupriya Malik, and Meghna Tyagi

DCGAN for Data Augmentation in Pneumonia Chest X-Ray Image Classification
S. P. Porkodi, V. Sarada, and Vivek Maik

Fair Quality of Voice Over WiMAX Coexisting of WiFi Networks for Video Streaming Applications
V. R. Vinothini, C. Ezhilazhagan, and K. Sakthisudhan

An Empirical Overview on DDoS: Taxonomy, Attacks, Tools and Attack Detection Mechanism
Varsha Parekh and M. Saravanan

Histopathology Osteosarcoma Image Classification
Ayush Chhoker, Kunlika Saxena, Vipin Rai, and Vishwadeepak Singh Baghela

Information-Based Image Extraction with Data Mining Techniques for Quality Retrieval
S. Vinoth Kumar, H. Shaheen, A. Christopher Paul, M. Shyamala Devi, R. Aruna, and S. Sangeetha

Fake News Detection System Using Multinomial Naïve Bayes Classifier
S. Sangeetha, S. Vinoth Kumar, R. Manoj Kumar, R. S. Rathna Sharma, and Rakesh Shettar

Superconductivity-Based Energy Storage System for Microgrid Stabilization by Connecting and Disconnecting Loads
Amol Raut and Kiran Dongre

Deep Learning-Based Model for Face Mask Detection in the Era of COVID-19 Pandemic
Ritu Rani, Amita Dev, Ritvik Sapra, and Arun Sharma

Efficient System to Predict Harvest Based on the Quality of the Crop Using Supervised Techniques and Boosting Classifiers
S. Divya Meena, Jahnavi Chakka, Srujan Cheemakurthi, and J. Sheela

ResNet: Solving Vanishing Gradient in Deep Networks
Lokesh Borawar and Ravinder Kaur

BRCA1 Genomic Sequence-Based Early Stage Breast Cancer Detection
S. G. Shaila, Ganapati Bhat, V. R. Gurudas, Arya Suresh, and K. Hithyshi


Develop Model for Recognition of Handwritten Equation Using Machine Learning
Kaushal Kishor, Rohan Tyagi, Rakhi Bhati, and Bipin Kumar Rai

Feature Over Exemplification-Based Classification for Revelation of Hypothyroid
M. Shyamala Devi, P. S. Ramesh, S. Vinoth Kumar, R. Bhuvana Shanmuka Sai Sivani, S. Muskaan Sultan, and Thaninki Adithya Siva Srinivas

A Framework with IOAHT for Heat Stress Detection and Haemoprotozoan Disease Classification Using Multimodal Approach Combining LSTM and CNN
Shiva Sumanth Reddy and C. Nandini

Using Classifier Ensembles to Predict Election Results Using Twitter Data Sentiment Analysis
Pinki Sharma and Santosh Kumar

Optimization Algorithms to Reduce Route Travel Time
Yash Vinayak and M. Vijayalakshmi

Survival Analysis and Its Application of Discharge Time Likelihood Prediction Using Clinical Data on COVID-19 Patients-Machine-Learning Approaches
S. Muruganandham and A. Venmani

Credit Card Fraud Detection Using Machine Learning and Incremental Learning
Akanksha Dhyani, Ayushi Bansal, Aditi Jain, and Sumedha Seniaray

Game Data Visualization Using Artificial Intelligence Techniques
Srikanta Kumar Mohapatra, Prakash Kumar Sarangi, Premananda Sahu, Santosh Kumar Sharma, and Ochin Sharma

Energy-Efficient and Fast Data Collection in WSN Using Genetic Algorithm
Rahul Shingare and Satish Agnihotri

Feature Reduced Anova Element Oversampling Elucidation Based Categorisation for Hepatitis C Virus Prognostication
M. Shyamala Devi, S. Vinoth Kumar, P. S. Ramesh, Ankam Kavitha, Konkala Jayasree, and Venna Sri Sai Rajesh

Personality Trait Detection Using Handwriting Analysis by Machine Learning
Pratibha Singh, Sushant Verma, Shivam Chaudhary, and Shivam Gupta


Road Traffic Density Classification to Improvise Traffic System Using Convolutional Neural Network (CNN)
Nidhi Singh and Manoj Kumar

Fake Reviews Detection Using Multi-input Neural Network Model
Akhandpratap Manoj Singh and Sachin Kumar

Classification of Yoga Poses Using Integration of Deep Learning and Machine Learning Techniques
Kumud Kundu and Adarsh Goswami

Tabular Data Extraction From Documents
Jyoti Madake and Sameeran Pandey

Vision-Based System for Road Lane Detection and Lane Type Classification
Jyoti Madake, Dhavanit Gupta, Shripad Bhatlawande, and Swati Shilaskar

An Energy-Efficient Cluster Head Selection in MANETs Using Emperor Penguin Optimization Fuzzy Genetic Algorithm
Fouziah Hamza and S. Maria Celestin Vigila

Ground Water Quality Index Prediction Using Random Forest Model
Veena Khandelwal and Shantanu Khandelwal

Near Threshold Operation Based a Bug Immune DET-FF for IoT Applications
Sumitra Singar and Raghuveer Singh Dhaka

Analyzing the Trade-Off Between Complexity Measures, Ambiguity in Insertion System and Its Application in Natural Languages
Anand Mahendran, Kumar Kannan, and Mohammed Hamada

Human-to-Computer Interaction Using Hand Gesture Tracking System
Raunaq Verma, Raksha Agrawal, Nisha Thuwal, Nirbhay Bohra, and Pranshu Saxena

Human Emotion Recognition Based on EEG Signal Using Deep Learning Approach
S. G. Shaila, A. Sindhu, D. Shivamma, V. Suma Avani, and T. M. Rajesh

Sentiment Analysis of COVID-19 Tweets Using BiLSTM and CNN-BiLSTM
Tushar Srivastava, Deepak Arora, and Puneet Sharma


COPRAS-Based Decision-Making Strategy for Optimal Cluster Head Selection in WSNs
J. Sengathir, M. Deva Priya, R. Nithiavathy, and S. Sam Peter

Unusual Activity Detection Using Machine Learning
Akshat Gupta, Anshul Tickoo, Nikhil Jindal, and Avinash K. Shrivastava

Disease Detection for Cotton Crop Through Convolutional Neural Network
Manas Pratap Singh, Venus Pratap Singh, Nitasha Hasteer, and Yogesh

Deriving Pipeline for Emergency Services Using Natural Language Processing Techniques
Akshat Anand and D. Rajeswari

Fetal Head Ultrasound Image Segmentation Using Region-Based, Edge-Based and Clustering Strategies
G. Mohana Priya and P. Mohamed Fathimal

A Shallow Convolutional Neural Network Model for Breast Cancer Histopathology Image Classification
Shweta Saxena, Praveen Kumar Shukla, and Yash Ukalkar

Efficient Packet Flow Path Allocation Using Node Proclivity Tracing Algorithm
R. Aruna, M. Shyamala Devi, S. Vinoth Kumar, S. Umarani, N. S. Kavitha, and S. Gopi

Energy-Efficient Multilevel Routing Protocol for IoT-Assisted WSN
Himani K. Bhaskar and A. K. Daniel

Ship Detection from Satellite Imagery Using RetinaNet with Instance Segmentation
Arya Dhorajiya, Anusree Mondal Rakhi, and P. Saranya

A Novel Technique of Mixed Gas Identification Based on the Group Method of Data Handling (GMDH) on Time-Dependent MOX Gas Sensor Data
Ghazala Ansari, Preeti Rani, and Vinod Kumar

Software Fault Diagnosis via Intelligent Data Mining Algorithms
Rohan Khurana, Shivani Batra, and Vineet Sharma

Face Mask Detection Using MobileNetV2 and VGG16
Ujjwal Kumar, Deepak Arora, and Puneet Sharma

Face Recognition Using EfficientNet
Prashant Upadhyay, Bhavya Garg, Anant Tyagi, and Arin Tyagi


Implementation and Analysis of Decentralized Network Based on Blockchain
Cheshta Gupta, Deepak Arora, and Puneet Sharma

Social Distance Monitoring Framework Using YOLO V5 Deep Architecture
D. Akshaya, Charanappradhosh, and J. Manikandan

Real-Time Smart Traffic Analysis Employing a Dual Approach Based on AI
Neera Batra and Sonali Goyal

Sustainable Development in Urban Cities with LCLU Mapping
Yash Khurana, Swamita Gupta, and Ramani Selvanambi

Multi-order Replay Attack Detection Using Enhanced Feature Extraction and Deep Learning Classification
Sanil Joshi and Mohit Dua

Ferry Mobility-Aware Routing for Sparse Flying Ad-Hoc Network
Juhi Agrawal, Monit Kapoor, and Ravi Tomar

Prediction of Cardio Vascular Disease from Retinal Fundus Images Using Machine Learning
M. Sopana Devi and S. Ebenezer Juliet

Tampering Detection Driving License in RTO Using Blockchain Technology
P. Ponmathi Jeba Kiruba and P. Krishna Kumar

Content-based Image Retrieval in Cyber-Physical System Modeling of Robots
P. Anantha Prabha, B. Subashree, and M. Deva Priya

A Hybrid Spotted Hyena and Whale Optimization Algorithm-Based Load-Balanced Clustering Technique in WSNs
J. David Sukeerthi Kumar, M. V. Subramanyam, and A. P. Siva Kumar

Evaluating the Effect of Variable Buffer Size and Message Lifetimes in A Disconnected Mobile Opportunistic Network Environment
Pooja Bagane, Anurag Shrivastava, Sudhir Baijnath Ojha, Saurabh Gupta, and Deepak Kumar Ray


Unsupervised Machine Learning for Unusual Crowd Activity Detection
Pooja Bagane, Konda Hari Krishna, Shehab Mohamed Beram, Priyambada Purohit, and B. Gayathri

Author Index

Editors and Contributors

About the Editors

Prof. Rajendra Prasad Mahapatra is Dean (Admissions) and Head of the Department of Computer Science and Engineering. Prof. R. P. Mahapatra is a pioneer in the field of computer science and engineering, with 21 years of experience as an academician, researcher and administrator. During this time he has worked in India and abroad, including more than two years at Mekelle University, Ethiopia. He has authored more than 100 research papers published in international journals from publishers such as Inderscience, Emerald, Elsevier, IEEE and Springer, as well as ten books and ten book chapters, and he holds five granted patents. Six of his students have completed their Ph.D. under his guidance, and eight are currently pursuing their Ph.D. He is a fellow of I.E. (India), a senior member of IACSIT Singapore, a life member of ISTE, a member of IEEE and a member of many other reputed bodies. He received the Institution Award from the Institution of Engineers (India), Calcutta, at its 56th Annual Technical Session on 15 February 2015; the Madhusudan Memorial Award from the Institution of Engineers (India), Calcutta, at its 57th Annual Technical Session on 14 February 2016; and a Certificate of Excellence from SRM University, NCR Campus, on 9 February 2015.

Sateesh K. Peddoju is an associate professor at the Indian Institute of Technology Roorkee, India. He is a senior member of ACM and of IEEE, and he has received the Cloud Ambassador Award from AWS Educate, the IBM SUR Award, the Microsoft Educate Award, a university merit scholarship, a best teacher award from his previous employer, best paper/presentation awards and travel grants. He has published in reputed journals such as IEEE TIFS, IEEE Access, IEEE Potentials, MTAP, WPC, IJIS and PPNA, and at conferences such as ACM MobiCom, IEEE TrustCom, IEEE MASS, ACM/IEEE ICDCN and ISC. He is co-author of the book Security and Storage Issues in the Cloud Environment and co-editor of the book Cloud Computing Systems and Applications in Healthcare. He has served on the boards of many conferences, including as program chair for IEEE MASS 2020 and as founding Steering Committee Chair for SLICE. He has received grants from NMHS, MEITY, Railtel, MHRD, DST, IBM, Samsung, CSI and Microsoft. He serves on various committees, including as coordinator of the communications sub-group of the IoT Security Workgroup constituted by MEITY, Government of India, and as an expert member of the CERT-Uk committee constituted by the Department of IT, Government of Uttarakhand. His research interests include cloud computing, ubiquitous computing and security.

Dr. Sudip Roy has been an assistant professor in the Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Roorkee, India, since July 2014. He has also been an associated faculty member of the Centre of Excellence in Disaster Mitigation and Management (CoEDMM), IIT Roorkee, since April 2015. From April 2021 to January 2022, he was a JSPS Fellow (long-term) at the College of Information Science and Engineering, Ritsumeikan University, Japan. He has authored one book, one book chapter, 25 research articles in international peer-reviewed journals and 40 papers in international peer-reviewed conference proceedings, and he holds two granted US patents. His current research interests include computer-aided design for digital systems, electronic design automation (EDA) for microfluidic lab-on-a-chips, algorithm design, optimization techniques, and information and communication technologies (ICT) for disaster risk reduction (DRR). He received the JSPS Invitational Fellowship Award (Long-Term) from the Japan Society for the Promotion of Science (JSPS), Government of Japan, in 2021; the Early Career Research Award from the Department of Science and Technology, Government of India, in 2017; and the Microsoft Research India Ph.D. Fellowship Award in 2010. He is a member of IEEE and ACM.

Dr. Pritee Parwekar is an associate professor in the Department of Computer Science and Engineering, Faculty of Engineering and Technology. She has been an academician for the last 21 years and was awarded her Ph.D. in the area of wireless sensor networks. She holds life memberships of IEEE, ACM, CSI and ISTE, and she served as Computer Society of India (CSI) State Student Coordinator for two years. Her research interests include the Internet of Things, cloud computing, machine learning, wireless sensor networks, software engineering, information retrieval systems, social media and data mining. She has published more than 60 research papers in SCI-, SSCI- and Scopus-indexed peer-reviewed journals and in conferences. She has been a resource person for workshops, FDPs and international conferences, and she has been invited to many workshops and international conferences as a speaker, organizer, session chair and member of advisory/program committees. She serves as a peer reviewer for IEEE Network Magazine, Springer, IGI Global, Inderscience, Evolutionary Intelligence (EVIN), Personal and Ubiquitous Computing, Multimedia Tools and Applications, Expert Systems and many other international journals.


Contributors

Sarthak Aggarwal, Meerut Institute of Engineering and Technology, Meerut, India
Satish Agnihotri, Computer Science and Engineering Department, Madhyanchal Professional University, Ratibad, Bhopal, Madhya Pradesh, India
Juhi Agrawal, School of Computer Science, University of Petroleum and Energy Studies, Uttarakhand, India
Raksha Agrawal, Department of Computer Science & Engineering, Inderprastha Engineering College, Ghaziabad, India
D. Akshaya, Rajalakshmi Engineering College, Thandalam, Chennai, India
Amarjeet, Department of Computer Science and Engineering, Meerut Institute of Engineering and Technology, Meerut, India
Akshat Anand, Department of Data Science and Business Systems, School of Computing, College of Engineering and Technology, SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
Samarth Anand, Meerut Institute of Engineering and Technology, Meerut, India
P. Anantha Prabha, Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
Ghazala Ansari, Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, NCR Campus, Modinagar, Ghaziabad, Uttar Pradesh, India
Nighat Naaz Ansari, Department of Computer Science and Engineering, Meerut Institute of Engineering and Technology, Meerut, India
Deepak Arora, Department of Computer Science and Engineering, Amity University Lucknow Campus, Lucknow, Uttar Pradesh, India
M. Arun Noyal Doss, Department of Electrical and Electronics Engineering, SRM Institute of Science and Technology, Chennai, India
R. Aruna, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
S. Avinash, Department of Electronics and Communication Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Chennai, India
Pooja Bagane, Symbiosis Institute of Technology, affiliated to Symbiosis International (Deemed University), Pune, India
Ayushi Bansal, Delhi Technological University, New Delhi, India
Neera Batra, Maharishi Markandeshwar (Deemed to be University), Mullana, India


Shivani Batra, KIET Group of Institutions, Ghaziabad, India
Shehab Mohamed Beram, Research Centre for Human-Machine Collaboration (HUMAC), Department of Computing and Information Systems, School of Engineering and Technology, Sunway University, Kuala Lumpur, Malaysia
Raghav Bhardwaj, Department of Computer Science and Engineering, Meerut Institute of Engineering and Technology, Meerut, India
Himani K. Bhaskar, MMM University of Technology, Gorakhpur, Uttar Pradesh, India
Ganapati Bhat, Dayananda Sagar University, Bangalore, Karnataka, India
Rakhi Bhati, Department of Information Technology, ABES Institute of Technology, Ghaziabad, Uttar Pradesh, India
Shripad Bhatlawande, Department of Electronics and Telecommunication (E&TC), Vishwakarma Institute of Technology, Pune, India
Biswaranjan Bhola, Department of Computer Science and Engineering, GIET University, Gunupur, Odisha, India
R. Bhuvana Shanmuka Sai Sivani, Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
Nirbhay Bohra, Department of Computer Science & Engineering, Inderprastha Engineering College, Ghaziabad, India
Lokesh Borawar, Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India
Jahnavi Chakka, School of Computer Science and Engineering, VIT-AP University, Amaravati, India
Charanappradhosh, Rajalakshmi Engineering College, Thandalam, Chennai, India
Shivam Chaudhary, ABES Engineering College, Ghaziabad, UP, India
Srujan Cheemakurthi, School of Computer Science and Engineering, VIT-AP University, Amaravati, India
Ayush Chhoker, Galgotias University, Greater Noida, India
A. Christopher Paul, Karpagam Institute of Technology, Coimbatore, India
A. K. Daniel, MMM University of Technology, Gorakhpur, Uttar Pradesh, India
J. David Sukeerthi Kumar, Department of Computer Science and Engineering, JNTUA, Ananthapuramu, India


Amita Dev, Center of Excellence, Indira Gandhi Delhi Technical University for Women, New Delhi, Delhi, India
M. Deva Priya, Department of Computer Science and Engineering, Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu, India
M. Shyamala Devi, Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
M. Sopana Devi, V. V. College of Engineering, Tisaiyanvilai, Tamil Nadu, India
Raghuveer Singh Dhaka, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
Arya Dhorajiya, Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
Akanksha Dhyani, Delhi Technological University, New Delhi, India
S. Divya Meena, School of Computer Science and Engineering, VIT-AP University, Amaravati, India
Kiran Dongre, Electrical Engineering Research Centre, Prof. Ram Meghe College of Engineering & Management, Badnera, Amravati, Maharashtra, India
Mohit Dua, Department of Computer Engineering, National Institute of Technology, Kurukshetra, India
C. Ezhilazhagan, N.G.P. Institute of Technology, Coimbatore, Tamil Nadu, India
Bhavya Garg, ABES Institute of Technology, Ghaziabad, India
Vansh Gaur, ABES Engineering College, Ghaziabad, India
B. Gayathri, Bishop Heber College, Trichy, Tamil Nadu, India
S. Gopi, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
Adarsh Goswami, Inderprastha Engineering College, Ghaziabad, Uttar Pradesh, India
Sonali Goyal, Maharishi Markandeshwar (Deemed to be University), Mullana, India
Akshat Gupta, Amity University, Noida, Uttar Pradesh, India
Cheshta Gupta, Department of Computer Science and Engineering, Amity University Lucknow Campus, Lucknow, Uttar Pradesh, India
Dhavanit Gupta, Department of Electronics and Telecommunication (E&TC), Vishwakarma Institute of Technology, Pune, India
Isha Gupta, Department of Computer Science and Engineering, Meerut Institute of Engineering and Technology, Meerut, India


Manan Gupta, Meerut Institute of Engineering and Technology, Meerut, India
Saurabh Gupta, CSE Department, SRM Institute of Science and Technology, Ghaziabad, India
Shivam Gupta, ABES Engineering College, Ghaziabad, UP, India
Swamita Gupta, Vellore Institute of Technology, Vellore, India
V. R. Gurudas, Dayananda Sagar University, Bangalore, Karnataka, India
Mohammed Hamada, Software Engineering Lab, The University of Aizu, Aizuwakamatsu, Japan
Fouziah Hamza, Noorul Islam Center for Higher Education, Kanyakumari, Kumaracoil, India
Tanveer Hassan, Department of Applied Sciences and Humanities, Faculty of Engineering and Technology, Jamia Millia Islamia, A Central University, New Delhi, India
Nitasha Hasteer, Amity University, Noida, Uttar Pradesh, India
R. R. Hitesh, Department of Electronics and Communication Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Chennai, India
K. Hithyshi, Dayananda Sagar University, Bangalore, Karnataka, India
Aditi Jain, Delhi Technological University, New Delhi, India
Samarpan Jain, Meerut Institute of Engineering and Technology, Meerut, India
Konkala Jayasree, Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
Nikhil Jindal, Amity University, Noida, Uttar Pradesh, India
Sanil Joshi, Department of Computer Engineering, National Institute of Technology, Kurukshetra, India
S. Ebenezer Juliet, V. V. College of Engineering, Tisaiyanvilai, Tamil Nadu, India; Vellore Institute of Technology, Vellore, Tamil Nadu, India
Kumar Kannan, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
Monit Kapoor, Institute of Engineering and Technology, Chitkara University, Punjab, India
Shital Kasyap, Meerut Institute of Engineering and Technology, Meerut, India
Ravinder Kaur, Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India


Vipul Kaushik, ABES Engineering College, Ghaziabad, India
Ankam Kavitha, Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
N. S. Kavitha, Erode Sengunthar Engineering College, Erode, Tamil Nadu, India
Shantanu Khandelwal, SRM Institute of Science and Technology, Ghaziabad, India; KPMG Services Pvt. Ltd, Singapore
Veena Khandelwal, SRM Institute of Science and Technology, Ghaziabad, India; KPMG Services Pvt. Ltd, Singapore
Rohan Khurana, KIET Group of Institutions, Ghaziabad, India
Yash Khurana, Vellore Institute of Technology, Vellore, India
Kaushal Kishor, Department of Information Technology, ABES Institute of Technology, Ghaziabad, Uttar Pradesh, India
Konda Hari Krishna, Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP, India
P. Krishna Kumar, VV College of Engineering, Tisaiyanvilai, Tamil Nadu, India
Kapil Kumar, Meerut Institute of Engineering and Technology, Meerut, India
Manoj Kumar, NSUT East Campus (Formerly AIACT&R) Delhi, New Delhi, India
Nitesh Kumar, Meerut Institute of Engineering and Technology, Meerut, India
Raghvendra Kumar, Department of Computer Science and Engineering, GIET University, Gunupur, Odisha, India
S. Vinoth Kumar, Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
Sachin Kumar, Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh, India
Santosh Kumar, School of Computing Science & Engineering, Galgotias University, Greater Noida, Uttar Pradesh, India
Sunil Kumar, Department of Computer Science and Engineering, Meerut Institute of Engineering and Technology, Meerut, India
Ujjwal Kumar, Department of Computer Science and Engineering, Amity University Lucknow Campus, Lucknow, Uttar Pradesh, India
Vinod Kumar, Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, NCR Campus, Modinagar, Ghaziabad, Uttar Pradesh, India
Kumud Kundu, Inderprastha Engineering College, Ghaziabad, Uttar Pradesh, India


Jyoti Madake, Department of Electronics and Telecommunication (E&TC), Vishwakarma Institute of Technology, Pune, Maharashtra, India
Anand Mahendran, Laboratory of Theoretical Computer Science, Higher School of Economics, Moscow, Russia
Vivek Maik, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
Kanupriya Malik, Meerut Institute of Engineering and Technology, Meerut, India
J. Manikandan, Rajalakshmi Engineering College, Thandalam, Chennai, India
R. Manoj Kumar, SNS College of Technology, Coimbatore, Tamil Nadu, India
Shashvat Mishra, Department of Computer Science and Engineering, Meerut Institute of Engineering and Technology, Meerut, India
Punit Mittal, Meerut Institute of Engineering and Technology, Meerut, India
P. Mohamed Fathimal, SRMIST, Chennai, Tamil Nadu, India
Chaudhary Wali Mohammad, Department of Applied Sciences and Humanities, Faculty of Engineering and Technology, Jamia Millia Islamia, A Central University, New Delhi, India
G. Mohana Priya, SRMIST, Chennai, Tamil Nadu, India
Srikanta Kumar Mohapatra, Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
S. Muruganandham, SRM Institute of Science and Technology (SRMIST), Kattankulathur, Chennai, India
S. Muskaan Sultan, Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
C. Nandini, Department of Computer Science and Engineering, Dayananda Sagar Academy of Technology and Management, Visvesvaraya Technological University (VTU), Bangalore, India
R. Nithiavathy, Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India
Sachin Ojha, Meerut Institute of Engineering and Technology, Meerut, India
Sudhir Baijnath Ojha, Shri Sant Gadge Baba College of Engineering and Technology, Bhusawal, MS, India
Shweta Paliwal, Department of Computer Science and Engineering, Meerut Institute of Engineering and Technology, Meerut, India
Sameeran Pandey, Vishwakarma Institute of Technology, Pune, Maharashtra, India

Editors and Contributors


Varsha Parekh Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nadu, India Piyush Department of Computer Science and Engineering, Meerut Institute of Engineering and Technology, Meerut, India P. Ponmathi Jeba Kiruba VV College of Engineering, Tisaiyanvillai, Tamil Nadu, India S. P. Porkodi SRM Institute of Science and Technology, Kattankulathur, Chennai, India Priyambada Purohit Department of Faculty of Management Studies, SRM IST, Delhi NCR Campus Ghaziabad, Uttar Pradesh, Ghaziabad, (U.P), India Bipin Kumar Rai Department of Information Technology, ABES Institute of Technology, Ghaziabad, Uttar Pradesh, India Vipin Rai Galgotias University, Greater Noida, India T. M. Rajesh Dayananda Sagar University, Bangalore, Karnataka, India Venna Sri Sai Rajesh Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamilnadu, India D. Rajeswari Department of Data Science and Business Systems, School of Computing, College of Engineering and Technology, SRM Institute of Science and Technology, Chennai, Tamil Nadu, India Anusree Mondal Rakhi Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, India P. S. Ramesh Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamilnadu, India Preeti Rani Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, NCR Campus, Modinagar, Ghaziabad, Uttar Pradesh, India Ritu Rani Center of Excellence, Indira Gandhi Delhi Technical University for Women, New Delhi, Delhi, India Shivam Rathi Meerut Institute of Engineering and Technology, Meerut, India R. S. Rathna Sharma SNS College of Technology, Coimbatore, Tamil Nadu, India Amol Raut Electrical Engineering Research Centre, Prof Ram Meghe College of Engineering & Management, Badnera, Amravati, Maharashtra, India Mukesh Rawat Meerut Institute of Engineering and Technology, Meerut, India Deepak Kumar Ray Pune Bharati Vidyapeeth Deemed to Be University College of Engineering, Pune, India


Prasadu Reddi Department of Information Technology, Anil Neerukonda Institute of Technology & Sciences, Visakhapatnam, Andhra Pradesh, India Shiva Sumanth Reddy Department of Computer Science and Engineering, Dayananda Sagar Academy of Technology and Management, Visvesvaraya Technological University (VTU), Bangalore, India M. Rekha Sundari Department of Information Technology, Anil Neerukonda Institute of Technology & Sciences, Visakhapatnam, Andhra Pradesh, India Mohd. Sadiq Software Engineering Laboratory, Computer Engineering Section, UPFET, Jamia Millia Islamia, A Central University, New Delhi, India Premananda Sahu Department of Computer Science and Engineering, SRMIST, DELHI-NCR, Ghaziabad, India D. Sai Sowmya Department of Information Technology, Anil Neerukonda Institute of Technology & Sciences, Visakhapatnam, Andhra Pradesh, India K. Sakthisudhan N.G.P. Institute of Technology, Coimbatore, Tamil Nadu, India S. Sam Peter Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamilnadu, India S. Sangeetha Tamil Nadu State Council for Science and Technology, Chennai, Tamil Nadu, India Ritvik Sapra Amdocs Development Center India, Gurgaon, Haryana, India V. Sarada SRM Institute of Science and Technology, Kattankulathur, Chennai, India Prakash Kumar Sarangi Department of CSE (AI and ML), Vardhaman College of Engineering, Hyderabad, India P. Saranya Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, India Mala Saraswat Bennett University, Greater Noida, U.P, India M. Saravanan Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nadu, India R. Sathiya Department of Electronics and Communication Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Chennai, India K. Satyanarayana Murthy Department of Computer Science and Technology, Baba Institute of Technology and Sciences, Visakhapatnam, Andhra Pradesh, India Kunlika Saxena Galgotias University, Greater Noida, India


Pranshu Saxena Department of Computer Science & Engineering, Inderprastha Engineering College, Ghaziabad, India Shweta Saxena School of Computing Science Engineering, VIT Bhopal University, Bhopal, India Ramani Selvanambi Vellore Institute of Technology, Vellore, India J. Sengathir Department of Information Technology, CVR College of Engineering, Hyderabad, Vastunagar, Telangana, India Sumedha Seniaray Delhi Technological University, New Delhi, India H. Shaheen University of West London, Ras Al Khaimah, United Arab Emirates S. G. Shaila Dayananda Sagar University, Bangalore, Karnataka, India Anubhav Sharma Department of Computer Science and Engineering, Meerut Institute of Engineering and Technology, Meerut, India Arun Sharma Center of Excellence, Indira Gandhi Delhi Technical University for Women, New Delhi, Delhi, India Hardik Sharma Meerut Institute of Engineering and Technology, Meerut, India Ochin Sharma Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India Pinki Sharma Computer Science and Engineering, ABES Engineering College, Ghaziabad, India Puneet Sharma Department of Computer Science and Engineering, Amity University, Lucknow, Uttar Pradesh, India Santosh Kumar Sharma Department of Computer Science and Engineering, CVRCE, Bhubaneswar, Odisha, India Shivam Sharma Meerut Institute of Engineering and Technology, Meerut, India Vineet Sharma KIET Group of Institutions, Ghaziabad, India J. Sheela School of Computer Science and Engineering, VIT-AP University, Amaravati, India Rakesh Shettar SNS College of Technology, Coimbatore, Tamil Nadu, India Swati Shilaskar Department of Electronics and Telecommunication (E&TC), Vishwakarma Institute of Technology, Pune, India Rahul Shingare Computer Science and Engineering Department, Madhyanchal Professional University Ratibad, Bhopal, Madhya Pradesh, India D. Shivamma Dayananda Sagar University, Bangalore, Karnataka, India Anurag Shrivastava Sushila Devi Bansal College, Indore, Madhya Pradesh, India


Avinash K. Shrivastava International Management Institute, Kolkata, India Praveen Kumar Shukla School of Computing and Information Technology, Manipal University Jaipur, Jaipur, India M. Shyamala Devi Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamilnadu, India A. Sindhu Dayananda Sagar University, Bangalore, Karnataka, India Sumitra Singar Bhartiya Skill Development University, Jaipur, Rajasthan, India Akhandpratap Manoj Singh Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh, India Manas Pratap Singh Amity University, Noida, Uttar Pradesh, India Nidhi Singh USICT, Guru Gobind Singh Indraprastha University, New Delhi, India Pratibha Singh Krishna Engineering College, Ghaziabad, UP, India Venus Pratap Singh Amity University, Noida, Uttar Pradesh, India Vishwadeepak Singh Baghela Galgotias University, Greater Noida, India A. P. Siva Kumar Department of Computer Science and Engineering, JNTUA, Ananthapuramu, India Thaninki Adithya Siva Srinivas Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamilnadu, India Tushar Srivastava Department of Computer Science & Engineering, Amity University, Lucknow, Uttar Pradesh, India B. Subashree Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India M. V. Subramanyam Santhiram Engineering College, Nandyal, India V. Suma Avani Vijaya Institute of Technology for Women, Vijayawada, Andhra Pradesh, India Arya Suresh Dayananda Sagar University, Bangalore, Karnataka, India Nisha Thuwal Department of Computer Science & Engineering, Inderprastha Engineering College, Ghaziabad, India Anshul Tickoo Amity University, Noida, Uttar Pradesh, India Kartikey Tiwari Meerut Institute of Engineering and Technology, Meerut, India Ravi Tomar Institute of Engineering and Technology, Chitkara University, Punjab, India Anant Tyagi ABES Institute of Technology, Ghaziabad, India


Arin Tyagi ABES Institute of Technology, Ghaziabad, India Meghna Tyagi Meerut Institute of Engineering and Technology, Meerut, India Rohan Tyagi Department of Information Technology, ABES Institute of Technology, Ghaziabad, Uttar Pradesh, India Yash Ukalkar School of Computing Science Engineering, VIT Bhopal University, Bhopal, India S. Umarani Erode Sengunthar Engineering College, Erode, Tamilnadu, India Prashant Upadhyay ABES Institute of Technology, Ghaziabad, India A. Venmani SRM Institute of Science and Technology (SRMIST), Kattankulathur, Chennai, India Raunaq Verma Department of Computer Science & Engineering, Inderprastha Engineering College, Ghaziabad, India Sushant Verma ABES Engineering College, Ghaziabad, UP, India S. Maria Celestin Vigila Noorul Islam Center for Higher Education, Kanyakumari, Kumaracoil, India M. Vijayalakshmi SRM Institute of Science and Technology, Kattankulathur, Chengalpat, Tamil Nadu, India Yash Vinayak SRM Institute of Science and Technology, Kattankulathur, Chengalpat, Tamil Nadu, India S. Vinoth Kumar Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamilnadu, India V. R. Vinothini Bannari Amman Institute of Technology, Erode, Tamil Nadu, India Sunny Yadav ABES Engineering College, Ghaziabad, India Yogesh Amity University, Noida, Uttar Pradesh, India

A Novel High Gain Single Switch Flyback DC–DC Converter for Small-Scale Lightning R. Sathiya, M. Arun Noyal Doss, S. Avinash, and R. R. Hitesh

Abstract The population of our country is increasing day by day, and a growing population leads to an increasing demand for power. To meet this growing demand, a simple and reliable converter combining the flyback and Luo topologies is put forward in this paper. The source for the proposed converter is photovoltaic energy obtained directly from the sun, and the load is an outdoor lighting device, a well lamp. The proposed hybrid converter produces a high gain with low ripple and low stress on the connected components. Its operating modes are described to explain how it works. Furthermore, the proposed converter is supported by MATLAB simulation waveforms and verified with mathematical calculations to demonstrate its efficiency. A 100 W prototype is built to verify the effectiveness of the software model.

Keywords Flyback converter · High gain · Single switch · High efficiency · Increased voltage gain

R. Sathiya (B) · S. Avinash · R. R. Hitesh Department of Electronics and Communication Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Vadapalani Campus, Chennai 600026, India M. Arun Noyal Doss Department of Electrical and Electronics Engineering, SRM Institute of Science and Technology, SRM Nagar, Kattankulathur, Chennai 603203, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_1


1 Introduction

The flyback converter is one of the most preferred DC–DC converters, and its simplicity and effectiveness attract the attention of young researchers. Other topologies are often harder to design and more difficult to implement in practice, whereas the flyback converter offers a minimal design, high efficiency, multiple isolated outputs, increased voltage gain and reduced cost, so it can fairly be called an 'efficient converter'. Flyback converters integrate two mutually coupled inductors, which separate input and output energy storage and increase the output power transfer [1]. Many types of DC–DC converters are available in industry for various purposes, but this paper is concerned only with isolated converters. These are preferred because they provide isolation between the input and output sides, which supports a wide range of requirements, including noise immunity, safety requirements and different ground references. They also allow considerable changes in voltage levels, power ratings, etc. [2]. Flyback converters are commonly used as the power-supply circuit in cell phones, LCD TVs and personal computers [3]. Renewable energy has become a major requirement because of its wide range of applications and is now an essential part of day-to-day activities. Among the available sources, solar energy is most often selected because of its abundance and versatility [4]. The proposed converter uses solar energy from a photovoltaic array, which can supply up to 20 V. A battery is used as a secondary source in case of a fault in, or breakdown of, the primary source.
The proposed converter combines two boost-type stages, a flyback converter and a Luo converter, which raises its efficiency and reduces the stress on the connected components [5]. Its functional diagram is given in Fig. 1. A MOSFET is used as the switch driven by the input pulse; the converter provides improved voltage gain while reducing the stress on the components and the ripple in the output waveform [5].

2 Proposed Topology

The idea behind the proposed converter is best understood with the help of the flowchart in Fig. 2 [6]. A flyback converter is a DC–DC converter that can increase or decrease the voltage according to the user's needs; it can act as either a boost or a buck converter. This paper uses the flyback converter in boost mode. The main advantage of the flyback converter incorporated in this design is that even when no primary current flows through the coil, a secondary voltage is still induced on the mutually coupled inductors, and consequently power is delivered to the load.

Fig. 1 Functional diagram of the proposed converter

The proposed converter is shown in Fig. 3. It uses a minimal number of components, yet there is no compromise on efficiency or performance, which distinguishes it from earlier converters. Direct current (DC) is taken as the input because alternating current cannot be stored. However, the components need an alternating signal to operate smoothly, so the DC is applied through an input pulse generator, which converts it into a pulsed (AC) form that does not damage the components, as a continuous AC supply would [7–11]. A MOSFET is used as the switch, driven by the input pulse generator, and a resistor is used as the load [5]. The inductors act as energy-storage elements. The capacitors store energy in the off state and then discharge it together with the applied input energy, which raises the output voltage. The flyback converter is combined with a Luo converter to boost the load voltage, and this boosted voltage powers the load. The converter can power all three applications shown in Fig. 1.

One of the major issues in our country is poorly developed construction sites, which often lead to accidents involving innocent people, usually children. A proper lighting system can reduce such accidents and mitigate their effects. Of the proposed applications, the well lamp is chosen as the load because of its necessity and its specifications.

Fig. 2 Flowchart of the proposed converter

Fig. 3 Design of the proposed converter

Table 1 Input parameters and their specifications

Input parameter          Corresponding value
Input voltage, Vin       20 V
Output voltage, Vout     240 V
Duty cycle               0.5
Switching frequency      20 kHz
Gain                     12

The detailed specifications of the proposed converter are listed in Table 1. The main advantage of the designed converter is that it can supply the desired output parameters continuously with the help of a mutually coupled inductor.

3 Mathematical Modelling

Some assumptions are made about the energy-storage elements in modelling the converter. In the ideal converter, the output power equals the input power, where power is denoted by $P$:

$$P_{input} = P_{output}$$

3.1 Duty Cycle and Turns Ratio

The duty ratio is a very important design parameter because it affects the stability of the converter [10]. It is the fraction of the switching period during which the switch is on, and it is conventionally expressed as a percentage or as a ratio. For an ideal boost stage, it is related to the source and load voltages by

$$D = 1 - \frac{V_{in}}{V_{out}}$$

where $V_{in}$ is the source voltage and $V_{out}$ is the load voltage of the converter. For the proposed converter, the duty ratio is kept as low as possible to increase efficiency. The turns ratio is another important parameter, because it affects the currents flowing through the primary and secondary windings as well as the duty cycle [12, 13]. The proposed converter achieves a high gain with a low duty cycle; a comparison of the gain parameters of various converters is presented in Table 2, with values taken from [14–19].
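As a rough illustration of how the duty ratio and the turns ratio interact, the Python sketch below evaluates the ideal boost relation above for the Table 1 gain and compares it with the textbook gain of an ideal flyback in continuous conduction, $V_{out}/V_{in} = (N_s/N_p)\,D/(1-D)$, which the paper does not state. The turns ratio $N_s/N_p = 12$ is a hypothetical value chosen only for illustration.

```python
# Sketch: compare the duty ratio an ideal boost stage would need for the Table 1 gain
# with the gain of an ideal continuous-conduction flyback at D = 0.5.
# NS_OVER_NP is a hypothetical turns ratio; the paper does not state one.


def boost_duty(v_in: float, v_out: float) -> float:
    """Ideal boost relation of Sect. 3.1: D = 1 - Vin / Vout."""
    return 1.0 - v_in / v_out


def flyback_ccm_gain(duty: float, ns_over_np: float) -> float:
    """Textbook ideal CCM flyback gain: Vout / Vin = (Ns / Np) * D / (1 - D)."""
    return ns_over_np * duty / (1.0 - duty)


V_IN, V_OUT, DUTY = 20.0, 240.0, 0.5  # values from Table 1
NS_OVER_NP = 12.0                      # hypothetical secondary-to-primary turns ratio

print(f"boost duty needed for gain 12: {boost_duty(V_IN, V_OUT):.3f}")       # 0.917
print(f"flyback gain at D = 0.5: {flyback_ccm_gain(DUTY, NS_OVER_NP):.1f}")  # 12.0
```

Under these assumptions a plain boost stage would need $D \approx 0.917$ for a gain of 12, while the coupled-inductor turns ratio reaches the same gain at $D = 0.5$.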

3.2 Design of Inductors

The selection of the inductors depends primarily on the load voltage, the inductor ripple current, the switching frequency and the duty ratio [5]. The primary inductance of inductor L1 is calculated by

$$L_p = \frac{\eta \, D_{max}^2 \, V_{in\_max}^2}{2 \, F_{SW} \, K_{fr} \, P_o}$$

where $L_p$ is the primary inductance (L1), $\eta$ is the efficiency, $V_{in\_max}$ is the maximum input voltage, $F_{SW}$ is the switching frequency, $K_{fr}$ is the switching frequency constant and $P_o$ is the output power, calculated as $P_o = V_o \cdot I_o$, with $V_o$ the output voltage and $I_o$ the output current.

3.3 Efficiency of the Proposed Converter

For an ideal converter the efficiency is taken as 100%, but for a practical converter it is lower because of the losses in the connected components. The mathematical expression used to calculate the efficiency of the proposed converter is

$$\eta_c = \frac{V_{in\_max} \, D_{max}}{(1 - D_{max})\,(V_o + V_d)}$$

where $\eta_c$ is the efficiency, $V_{in\_max}$ is the maximum input voltage, $D_{max}$ is the maximum duty cycle, $V_o$ is the output voltage and $V_d$ is the forward voltage drop of the output diode. According to the initial assumption, the output power equals the input power, and therefore the output power is $P_o = 1$ W.
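Evaluating the expression numerically is a useful sanity check. The sketch below does so with the Table 1 values and a hypothetical diode drop $V_d = 0.7$ V. The result is about 0.083 (roughly 1/12) rather than a percentage-scale efficiency; the expression has the same form as the standard flyback design relation for the primary-to-secondary turns ratio, $N_p/N_s = V_{in}D/((1-D)(V_o+V_d))$, so it may be intended as that ratio.

```python
# Sketch: evaluate the Sect. 3.3 expression with the Table 1 values.
# V_d = 0.7 V is a hypothetical diode forward drop; the paper gives no value.


def eta_c(v_in_max: float, d_max: float, v_o: float, v_d: float) -> float:
    """Sect. 3.3 expression: V_in_max * D_max / ((1 - D_max) * (V_o + V_d))."""
    return v_in_max * d_max / ((1.0 - d_max) * (v_o + v_d))


value = eta_c(v_in_max=20.0, d_max=0.5, v_o=240.0, v_d=0.7)
print(f"{value:.4f}")  # 0.0831
```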


3.4 Primary and Peak Currents

When the MOSFET is turned on, during the interval $[0, kT]$, where $k$ is the duty cycle and $T$ is the switching period, the primary current $I_p$ increases linearly:

$$I_p = \frac{V_s \, t}{L_p} \tag{1}$$

where $V_s$ is the voltage across the primary winding of the transformer, $t$ is the time for which the switch has been on and $L_p$ is the magnetizing inductance of the primary winding. At the end of this mode, $t = kT$, and the primary current reaches its peak value $I_{p(pk)}$:

$$I_{p(pk)} = I_p(t = kT) = \frac{V_s \, kT}{L_p} \tag{2}$$

The peak secondary current $I_{se(pk)}$ is given by

$$I_{se(pk)} = \frac{N_p}{N_s} \, I_{p(pk)} \tag{3}$$

The secondary current then decreases linearly from its peak:

$$I_{se} = I_{se(pk)} - \frac{V_o}{L_s} \, t \tag{4}$$

where $L_s$ is the secondary magnetizing inductance and $t$ is measured from the instant the switch turns off.
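To make the current waveforms of eqs. (1)–(4) concrete, the sketch below evaluates them for one switching period. $V_s$, $k$, $f$ and $V_o$ follow Table 1; $L_p = 100$ µH and $N_p/N_s = 1/12$ are hypothetical values, and $L_s = L_p (N_s/N_p)^2$ assumes ideal coupling between the windings.

```python
# Sketch of eqs. (1)-(4) over one switching period.
# Hypothetical values: L_p = 100 uH, Np/Ns = 1/12, ideal coupling so L_s = L_p * (Ns/Np)^2.
# V_s, k, f and V_o follow Table 1.

V_S, K, F, V_O = 20.0, 0.5, 20e3, 240.0
L_P = 100e-6               # hypothetical primary magnetizing inductance (H)
NP_OVER_NS = 1.0 / 12.0    # hypothetical turns ratio Np/Ns
L_S = L_P / NP_OVER_NS**2  # secondary inductance under ideal coupling (H)
T = 1.0 / F                # switching period (s)


def primary_current(t: float) -> float:
    """Eq. (1): linear rise while the switch is on, valid for 0 <= t <= kT."""
    return V_S * t / L_P


I_P_PK = primary_current(K * T)  # eq. (2): peak at t = kT
I_SE_PK = NP_OVER_NS * I_P_PK    # eq. (3): peak secondary current


def secondary_current(t_off: float) -> float:
    """Eq. (4): linear fall, t_off measured from switch turn-off, floored at zero."""
    return max(I_SE_PK - V_O / L_S * t_off, 0.0)


t_zero = I_SE_PK * L_S / V_O  # time for the secondary current to reach zero
print(f"I_p(pk) = {I_P_PK:.2f} A, I_se(pk) = {I_SE_PK:.3f} A")
print(f"secondary current reaches zero after {t_zero * 1e6:.1f} us "
      f"of the {(1 - K) * T * 1e6:.1f} us off-time")
```

With these assumed values, $I_{p(pk)} = 5$ A and $I_{se(pk)} \approx 0.417$ A, and the secondary current reaches zero exactly when the 25 µs off-time ends, so this operating point sits at the boundary between continuous and discontinuous conduction.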

3.5 Power and Voltage Ratings of the Designed Converter

Since power is transferred from the source only during the interval from 0 to $kT$, the input power is

$$P_{in} = \frac{\tfrac{1}{2} L_p \, I_{p(pk)}^2}{T} = \frac{(k V_s)^2}{2 f L_p} \tag{5}$$

where $f = 1/T$ is the switching frequency. For an efficiency $\eta$, the output power $P_{out}$ is

$$P_{out} = \eta \, P_{in} = \frac{\eta \, (k V_s)^2}{2 f L_p} \tag{6}$$

We also know that

$$P_o = \frac{V_o^2}{R_L} \tag{7}$$

Therefore, equating (6) and (7) gives

$$V_o = V_s \, k \sqrt{\frac{\eta \, R_L}{2 f L_p}} \tag{8}$$
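The sketch below evaluates eqs. (5), (6) and (8) for one operating point and checks the result against eq. (7). $k$, $V_s$ and $f$ come from Table 1 and $\eta = 0.91$ from Sect. 6; $L_p = 22.75$ µH and $R_L = 576$ ohm (the resistance that draws 100 W at 240 V) are hypothetical design values.

```python
import math

# Sketch: evaluate eqs. (5), (6) and (8) for one hypothetical operating point.
# L_P and R_L are assumed values, not taken from the paper.


def input_power(k: float, v_s: float, f: float, l_p: float) -> float:
    """Eq. (5): P_in = (k * V_s)^2 / (2 * f * L_p)."""
    return (k * v_s) ** 2 / (2.0 * f * l_p)


def output_voltage(k: float, v_s: float, eta: float, r_l: float,
                   f: float, l_p: float) -> float:
    """Eq. (8): V_o = V_s * k * sqrt(eta * R_L / (2 * f * L_p))."""
    return v_s * k * math.sqrt(eta * r_l / (2.0 * f * l_p))


K, V_S, ETA, F = 0.5, 20.0, 0.91, 20e3  # Table 1 values and the Sect. 6 efficiency
L_P, R_L = 22.75e-6, 576.0              # hypothetical design values

p_out = ETA * input_power(K, V_S, F, L_P)       # eq. (6)
v_o = output_voltage(K, V_S, ETA, R_L, F, L_P)  # eq. (8)
print(f"P_out = {p_out:.1f} W, V_o = {v_o:.1f} V")  # P_out = 100.0 W, V_o = 240.0 V

# Consistency with eq. (7): V_o^2 / R_L should equal P_out.
assert math.isclose(v_o**2 / R_L, p_out, rel_tol=1e-9)
```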

4 Modelling Closed Loop Mechanism

In a closed-loop (feedback) system, the output of the system is fed back to a controller that corrects the total harmonic distortion and deviations of the load signal [20–22]. The controller compares the desired value with the actual output and converts the resulting error signal into a coordinated corrective action that reduces the error and drives the output toward the required response. The number of feedback loops required depends on parameters such as the required output voltage, the required waveform and the input voltage [6]. The converter was first designed in open loop only. However, closed-loop operation automatically corrects distortions in the waveforms and increases the stability of the system, which improves its robustness against external disturbances; therefore, the converter is operated in closed loop [6].

4.1 PI Controller

Although various controllers were considered for closed-loop control, the PI controller is chosen for its efficiency and performance. Adding the PI controller increases the order and type of the system, which removes the steady-state error to a step input, although it also influences the overshoot and settling time. It acts like a low-pass filter and keeps the maximum overshoot stable. Because the PI controller combines proportional and integral action, it performs better than an integral controller alone.
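A minimal sketch of such a loop is shown below, assuming a discrete PI controller that adjusts the duty cycle and a toy first-order plant whose steady-state output is 480 V per unit duty (the value eq. (8) gives for $V_s = 20$ V with hypothetical parameters chosen so that $k = 0.5$ yields 240 V). The gains `kp` and `ki`, the plant constant `alpha` and the 0.9 duty limit are illustrative assumptions, not values from the paper; the anti-windup rule simply stops integrating while the output is clamped.

```python
# Sketch: discrete PI control of the duty cycle on a toy converter model.
# All gains and plant constants are hypothetical illustration values.


class PIController:
    """Discrete PI controller with output clamping and conditional-integration anti-windup."""

    def __init__(self, kp: float, ki: float, u_min: float = 0.0, u_max: float = 0.9):
        self.kp, self.ki = kp, ki
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0  # running sum of errors

    def update(self, setpoint: float, measured: float) -> float:
        error = setpoint - measured
        candidate = self.integral + error
        u = self.kp * error + self.ki * candidate
        if self.u_min <= u <= self.u_max:
            self.integral = candidate  # integrate only while the output is not clamped
        return min(max(u, self.u_min), self.u_max)


def simulate(steps: int = 300, v_ref: float = 240.0,
             plant_gain: float = 480.0, alpha: float = 0.1):
    """Toy plant: each step, V_out moves a fraction alpha toward plant_gain * duty."""
    pi = PIController(kp=0.001, ki=0.0005)
    v_out, duty = 0.0, 0.0
    for _ in range(steps):
        duty = pi.update(v_ref, v_out)
        v_out += alpha * (plant_gain * duty - v_out)
    return v_out, duty


v, d = simulate()
print(f"V_out = {v:.2f} V, duty = {d:.3f}")  # V_out = 240.00 V, duty = 0.500
```

The integral term is what drives the steady-state error to zero: the loop settles only when the accumulated error holds the duty cycle at the value (here 0.5) that produces the reference voltage.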

5 Simulation and Results

The designed converter is compared with other converters in terms of voltage gain, duty cycle, input and output voltages and number of components in Table 2; the values for the other converters are taken from [14–19]. To highlight the advantages of the designed converter, the simulations were carried out in MATLAB/Simulink [23]. Component values were chosen by trial and error for the simulation, and the parameters listed in Table 2 were used. The MATLAB/Simulink diagram is given in Fig. 4. An output voltage of 240 V is achieved in the open-loop simulation. The simulation diagrams and outputs are shown in Figs. 4, 5, 6 and 7 [5, 23, 24].

Table 2 Performance comparison of the converter

Circuit elements     [14]   [16]   [18]   [19]   Proposed converter
Input voltage (V)    24     36     180    90     20
Output voltage (V)   150    48     400    400    240
Topology (boost)
Inductors            2      4      1      1      2
Capacitors           6      2      3      1      4
Diodes               5      8      6      3      4
Switches             3      2      1      2      1
Duty cycle           0.8    0.3    0.6    0.8    0.5
Voltage gain         6.3    1.3    2.2    4.4    12
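The following sketch recomputes the voltage gain ($V_{out}/V_{in}$) and the total number of inductors, capacitors, diodes and switches for each column of Table 2, using only the values in the table.

```python
# Sketch: recompute voltage gain (Vout / Vin) and total component count from Table 2.

TABLE_2 = {
    "[14]":     {"v_in": 24,  "v_out": 150, "inductors": 2, "capacitors": 6, "diodes": 5, "switches": 3},
    "[16]":     {"v_in": 36,  "v_out": 48,  "inductors": 4, "capacitors": 2, "diodes": 8, "switches": 2},
    "[18]":     {"v_in": 180, "v_out": 400, "inductors": 1, "capacitors": 3, "diodes": 6, "switches": 1},
    "[19]":     {"v_in": 90,  "v_out": 400, "inductors": 1, "capacitors": 1, "diodes": 3, "switches": 2},
    "Proposed": {"v_in": 20,  "v_out": 240, "inductors": 2, "capacitors": 4, "diodes": 4, "switches": 1},
}


def summarize(table):
    """Return {converter: (gain, total_components)} for each column of the table."""
    summary = {}
    for name, row in table.items():
        gain = row["v_out"] / row["v_in"]
        parts = row["inductors"] + row["capacitors"] + row["diodes"] + row["switches"]
        summary[name] = (gain, parts)
    return summary


for name, (gain, parts) in summarize(TABLE_2).items():
    print(f"{name:>8}: gain {gain:5.2f}, components {parts}")
```

The recomputed gains (6.25, 1.33, 2.22, 4.44 and 12.00) agree with the rounded values listed in the table.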

Fig. 4 Proposed open loop converter MATLAB diagram


Fig. 5 Open loop simulation of the presented converter

Fig. 6 Proposed closed loop converter MATLAB diagram

6 Hardware Analysis

To demonstrate the efficiency of the designed converter [25], the simulated model is implemented with practical hardware components. In Fig. 8, the hardware model is shown in the idle state: the load is not energized and is off. When the model is powered, all capacitors and inductors are charged and the converter enters its operating mode, shown in Fig. 9, where the load can be seen functioning properly. An efficiency of 91% is achieved with the proposed converter, which is far greater than that of its competitors.


Fig. 7 Proposed converter closed loop simulation

Fig. 8 Proposed converter hardware demonstration with the load in off state

7 Conclusion

A novel high step-up flyback converter based on two mutually coupled inductors and combined with a Luo converter, with improved reliability and increased efficiency, is presented in this research. The proposed converter achieves a high step-up voltage gain even while operating at a low duty ratio. It is not only reliable but also cost-effective, considering the number of components involved, as highlighted in Table 2. It also uses only one switch, which keeps the design minimal. Simulation and experimental results demonstrate the advantages of the designed converter.


Fig. 9 Proposed converter hardware demonstration with the load in ON state

References

1. Taneri MC, Genc N, Mamizadeh A (2019) Analyzing and comparing of variable and constant switching frequency flyback DC-DC converter. In: 2019 4th international conference on power electronics and their applications (ICPEA). https://doi.org/10.1109/ICPEA1.2019.8911196
2. Singh R, Bose S, Dwivedi P (2020) Multi-output flyback converter closed loop control with MPPT tracked PV module. In: 2020 IEEE 17th India Council international conference (INDICON). https://doi.org/10.1109/INDICON49873.2020.9342563
3. Guepfrih MF, Waltrich G, Lazzarin TB (2018) Quadratic boost-flyback DC-DC converter with coupled inductors. In: 2018 13th IEEE international conference on industry applications (INDUSCON). https://doi.org/10.1109/INDUSCON.2018.8627065
4. Pansare C, Sharma SK, Jain C, Saxena R (2017) Analysis of a modified positive output Luo converter and its application to solar PV system. In: IEEE industry applications society annual meeting, pp 1–6. https://doi.org/10.1109/IAS.2017.8101849
5. Sathiya R, Arun Noyal Doss M (2021) Non isolated high gain converter DC-DC converter using sustainable energy. Mater Today Proc
6. Liu X, Yang X, Jiang J, Cai X (2005) Design of double closed loop in boost aerospace DC-DC power supply. In: 2005 international conference on power electronics and drives systems, pp 1552–1556. https://doi.org/10.1109/PEDS.2005.1619935
7. Pradhan R, Subudhi B (2014) Design and real-time implementation of a new auto-tuned adaptive MPPT control for a photovoltaic system. Int J Electr Power Energy Syst 64:792–803. https://doi.org/10.1016/j.ijepes.2014.07.080
8. Kumar S, Thakura PR (2017) Closed loop PI control of DC-DC cascode buck-boost converter. In: 2017 international conference on innovations in information, embedded and communication systems (ICIIECS), pp 1–6. https://doi.org/10.1109/ICIIECS.2017.8275838
9. Yun L, Qianqian G, Xiang Z (2021) Double closed loop control of DC-DC boost converter. In: 2021 IEEE international conference on power electronics, computer applications (ICPECA), pp 607–610. https://doi.org/10.1109/ICPECA51329.2021.9362504
10. Rashid MH (2004) Power electronics: circuits, devices, and applications, 3rd edn. Pearson/Prentice Hall. ISBN 0131011405, 9780131011403, 880 p
11. Singh R, Bose S, Dwivedi P (2020) Closed loop control of flyback converter with PV as a source. In: 2020 IEEE 9th power India international conference (PIICON), pp 1–6. https://doi.org/10.1109/PIICON49524.2020.9113035
12. Mirzaee A, Moghani JS (2019) Coupled inductor-based high voltage gain DC–DC converter for renewable energy applications. IEEE Trans Power Electron 35(7). https://doi.org/10.1109/TPEL.2019.2956098
13. Doss MAN, Christy AA, Krishnamoorthy R (2018) Modified hybrid multilevel inverter with reduced number of switches for PV application with smart IoT system. J Ambient Intell Human Comput. https://doi.org/10.1007/s12652-018-1151-2
14. Xu D, Cai Y, Chen Z, Zhong S (2014) A novel two winding coupled-inductor step-up voltage gain boost-flyback converter, 5–8 Nov 2014. https://doi.org/10.1109/PEAC.2014.7037818
15. Park K-B, Moon G-W, Youn M-J (2011) Non isolated high step-up stacked converter based on boost-integrated isolated converter. IEEE Trans Power Electron 26(2):0885–8993. https://doi.org/10.1109/TPEL.2010.2066578
16. Lu Y, Wu H, Sun K, Xing Y (2015) A family of isolated buck-boost converters based on semiactive rectifiers for high output voltage applications, 18 Nov 2015. https://doi.org/10.1109/TPEL.2015.2501405
17. Moon BH, Jung HY, Kim SH, Lee S-H (2017) A modified topology of two-switch buck-boost converter, 14 Sept 2017. https://doi.org/10.1109/ACCESS.2017.2749418
18. Bianchin CG, Gules R, Badin AA, Romaneli EFR (2014) High power factor rectifier using the modified SEPIC Converter operating in discontinuous conduction mode. IEEE Trans Power Electron. https://doi.org/10.1109/TPEL.2014.2360172
19. Muhammad KS, Lu D-DC (2014) ZCS Bridgeless boost PFC rectifier using only two active switches. IEEE Trans Ind Electron 35(7). https://doi.org/10.1109/TIE.2014.2364983
20. Marzaki MH, Rahiman MHF, Adnan R, Tajuddin M (2015) Real time performance comparison between PID and fractional order PID controller in SMISD plant. In: 2015 IEEE 6th control and system graduate research colloquium (ICSGRC), pp 141–145. https://doi.org/10.1109/ICSGRC.2015.7412481
21. Oku D, Obot E (2019) Comparative study of PD, PI and PID controllers for control of a single joint system in robots. https://doi.org/10.9790/1813-0709025154
22. Aseem K, Selvakumar S (2020) Closed loop control of DC-DC converters using PID and FOPID controllers. Int J Power Electron Drive Syst 11(3):1323–1332. https://doi.org/10.11591/ijpeds.v11.i3
23. Ranganathan S, Mohan AND (2021) Formulation and analysis of single switch high gain hybrid dc to dc converter for high power applications. Electronics 10(19):2445. https://doi.org/10.3390/electronics10192445
24. Sathiya R, Arun Noyal Doss M, Archana Prasanthi R (2020) PV based DC to DC boost converter with RC snubber circuits. Solid State Technol 63(6):10438–10447. ISSN 0038-111X
25. Chandrasekar B, Nallaperumal C, Dash SS (2019) A nonisolated three-port DC–DC converter with continuous input and output currents based on cuk topology for PV/fuel cell applications. Electronics 8(2):214. https://doi.org/10.3390/electronics8020214

Learning-Based Model for Auto-Form Filling Manan Gupta, Hardik Sharma, Nitesh Kumar, and Mukesh Rawat

Abstract In the age of computer technology, most people depend on online applications for their daily needs, from online shopping to government business. Such users often need to provide their information to websites through browser-based forms in order to interact further (Ponmurugesh et al. in Int Res J Eng Technol (IRJET) 06(01) (2019) [1]). Generally, users have to fill in these online forms manually, typing similar information repeatedly. It is monotonous for a person to perform the same actions again and again for a task that the machine could handle itself. Recently, many new techniques have emerged that store user data and fill it in automatically when required. This paper presents an improved technique for automatically filling browser-based forms with data the user has entered previously, wherever it is required, in fields that match exactly or have a similar meaning. The main idea is to collect the provided information and store it in a database for future reference. The system derives association rules to judge which fields may be similar to others and connects their data; this enables other users' applications to recognize these similarities between fields without having encountered them before. To make the application more feasible, safe and generally usable, it is best developed as a browser extension (e.g. for Google Chrome) that runs its functions locally.

Keywords Online forms · Database · Browser · Association rules
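The paper does not specify at this point how its association rules are mined. As a minimal sketch under that caveat, the Python code below links two field labels when the same user has typed the same values into both, scoring each rule with the usual association-rule measures: support (number of shared values) and confidence (shared values divided by all values seen for the first label). The thresholds, the function name `mine_field_rules` and the placeholder history are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def mine_field_rules(submissions, min_support=2, min_confidence=0.6):
    """Return {(label_a, label_b): confidence} for rules 'label_a -> label_b'.

    support(a, b)    = number of distinct values typed into both labels
    confidence(a->b) = support(a, b) / number of distinct values typed into a
    """
    values_by_label = defaultdict(set)
    for form in submissions:
        for label, value in form.items():
            values_by_label[label.strip().lower()].add(value.strip().lower())

    rules = {}
    for a, b in permutations(list(values_by_label), 2):
        support = len(values_by_label[a] & values_by_label[b])
        if support < min_support:
            continue
        confidence = support / len(values_by_label[a])
        if confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


# Hypothetical history of forms submitted by one user (placeholder values).
history = [
    {"Name": "A. User", "Phone": "9000000001", "City": "Meerut"},
    {"Full name": "A. User", "Contact": "9000000001"},
    {"Name": "A. User", "Phone": "9000000002"},
    {"Contact": "9000000002", "Town": "Meerut"},
]
print(mine_field_rules(history))
```

With `min_support=2`, only "phone" and "contact" are linked (in both directions, confidence 1.0), because they share two distinct values; "name"/"full name" and "city"/"town" share one value each and would be linked if `min_support` were lowered to 1.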

M. Gupta (corresponding author) · H. Sharma · N. Kumar · M. Rawat, Meerut Institute of Engineering and Technology, Meerut, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_2


1 Introduction
Today, web applications are used in almost every field of work and have become an innate part of daily life, from websites for shopping, travel, and online applications to social media and government services. Filling multiple web forms takes a great deal of time and is laborious. Users often enter the same information into forms that contain similar input fields in order to obtain services [2]. Despite the development of web technologies and online culture, manual form filling remains a major drain on time, even with efforts such as browsers’ default form-filling services. To avoid such unnecessary labor, we use data mining integrated with machine learning to deliver accurate outputs. For example, to purchase a product online, the user needs to sign in to the website and provide the requested details through a form, say, name, phone number, and address. Later, when the same user wants to travel and visits a website to book tickets, similar details (such as name, contact, and city) may be requested again, and default browser-based services fail to fill them. To solve this problem, automatic form filling can be integrated with machine learning techniques such as reinforcement learning, natural language processing, and probabilistic models. When data is entered for the first time, it is stored, and rules are formed about it at that stage that govern its future use, for example, “contact” means “phone number” [2]. With such a rule, the contact field can be filled on its very first encounter. The system is intended to be an automatic form filler for general computer users throughout the world. It is designed to minimize the effort and time users invest in filling online forms by automating a writing process that would otherwise be performed manually.
The tool should improve the user’s efficiency without making things too complicated [3]. Specifically, the system is designed to save users the monotonous effort and time spent filling online forms manually and repeatedly [4]. The software offers predictive rules that can fill form fields the user has never filled in the past. These rules are learned by the machine from data entered by other users and relate different words with similar meanings, such as “phone number” and “contact” [1]. The system consists of a local database and a cloud-based service that keeps users synchronized with updated rules.

1.1 Existing System
Nowadays, most such auto-filling applications can fill a wide range of dynamic forms. However, they cannot fill information that is not already in their database; for example, details requested for the first time cannot be filled automatically [1]. Existing automatic form-filling techniques also disregard the importance of retrieving good, accurate values. Most solutions that deal with text fields either rely on a dataset of past experiences drawn from a sample of known values, or depend on a single spoken language to identify fields with similar meanings and extract important data objects. Experiments with existing data mining models and the proposed models on hidden web data favor reinforcement learning models implemented with association rules [3].

Fig. 1 Extension for Google Chrome to fill forms

1.2 Proposed System
This paper proposes an automated way of filling forms. The proposed method consists of two modules. The first module, called the local model, examines how to evaluate fields efficiently, especially text fields for common data such as names, addresses, and mobile numbers, which have no fixed set of values because they vary from person to person. This model runs on the client side (here, as a web browser extension; see Fig. 1). The second module, called the Cloud Instance, stores and examines all the association rules that local modules on different clients upload, in order to retrieve useful information from the collection [4]. The aim is to use the experiences of different users to develop insights that help other users in future encounters [2]. This module ensures that bad experiences (data not being filled even though it was available) are not wasted; instead, learning from them helps other users (Fig. 2).
• The application is easy to embed and can therefore fill most browser-based online forms automatically, along with suggestions.
• Machine learning techniques such as forming association rules from data objects are also used to make the dataset learn generalized natural-language behavior; this extends the model’s reach across different languages [2].

1.3 Advantages of Automatic Form Filler
• User-friendly interface
• Saves time and effort
• Compatible and reliable
• Fast and accurate


Fig. 2 Basic overview of client-cloud architecture

In Fig. 3, the user fills sample data into the “name” field and submits the form. The extension stores this data in the local database, where fields in future forms can reference it. In the next form, a “First name” field appears (Fig. 4). The cloud already holds an association rule for it (“first name” means “name”), along with the probability that the rule is true, so the data is filled from “name” into “First name” accordingly [1].

Fig. 3 Name field is filled manually by user

Fig. 4 First name field is automatically filled


The writing module fills in data that the user can then verify manually.
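The rule-based fill in Figs. 3 and 4 can be illustrated with a minimal Python sketch. It is not the authors’ extension code: the dictionaries local_data and cloud_rules, the function fill_value, the sample values, and the rule probabilities are all hypothetical, and the 50% cut-off is borrowed from the Search Module in Sect. 3.2.

```python
from typing import Dict, Optional, Tuple

# Hypothetical local store: normalized label -> value the user typed earlier.
local_data: Dict[str, str] = {"name": "Sample User"}

# Hypothetical rules synced from the cloud:
# unseen label -> (known label it points to, probability in percent).
cloud_rules: Dict[str, Tuple[str, float]] = {
    "first name": ("name", 82.0),
    "contact": ("phone number", 67.0),
}

THRESHOLD = 50.0  # assumed cut-off, mirroring the "> 50" test in Sect. 3.2


def fill_value(label: str) -> Optional[str]:
    """Return a value for a form field label, or None if nothing trustworthy is known."""
    key = label.strip().lower()
    if key in local_data:                 # the label itself was filled before
        return local_data[key]
    rule = cloud_rules.get(key)
    if rule is not None:
        target, probability = rule
        if probability > THRESHOLD and target in local_data:
            return local_data[target]     # follow the rule, e.g. first name -> name
    return None
```

Here fill_value("First name") follows the hypothetical rule to the stored “name” value, while fill_value("Contact") returns None because no phone number has been stored yet, so that field is left for the user.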

2 Working of Local Model
The basic idea is that when a URL is loaded, the form fields and labels on the page are fetched. The algorithm then filters this information against past field values. Based on probability (the number of times a value was filled for a label divided by the total number of times that label appeared), the module writes the most suitable value from a pool of data into each form field. After submission (clicking the submit button), the update module identifies any edits the user made to the provided information. If the information filled by the extension was not edited, it simply increases the probability of the relevant pointers and data objects in the pool (database). If the user changed a filled value before submitting, the model changes the probability of the pointers and creates new pointers or data objects in the pool as the situation requires (depending on whether the new value was already present in the dataset) [5]. This probability update happens in the local module for the current user; it lets the extension learn from its previous mistakes, so that next time the chance of finding the most suitable data may increase. The update module also synchronizes with the cloud after a specific period (for example, every 30 days or every 100 calls). Cloud syncing means downloading the data association rules (such as “phone number” points to “contact”) that are available in the common cloud space but not in the user’s local dataset, along with their probabilities, and uploading all locally available rules to the common cloud space to build a generalized set for other users to download. Syncing lets the local extension learn from others’ experiences (for example, the meaning of a label in other languages, or its synonyms) and fill fields the user has never seen before (Fig. 5).
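The counting scheme and the sync trigger described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the authors’ implementation: the class LocalModel, its fields, and the rule keys are hypothetical; probabilities are kept as percentages; and the 100-call and 30-day periods are the examples given in the text. Counting the submitted value (rather than the filled one) makes an unedited fill raise that value’s probability and an edit lower it, as the text describes.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

SYNC_EVERY_CALLS = 100  # example period from the text
SYNC_EVERY_DAYS = 30    # example period from the text

Rules = Dict[Tuple[str, str], float]  # (label, pointer label) -> probability in %


@dataclass
class LocalModel:
    kept: Dict[Tuple[str, str], int] = field(default_factory=dict)  # (label, value) -> times submitted
    seen: Dict[str, int] = field(default_factory=dict)              # label -> times it appeared
    rules: Rules = field(default_factory=dict)                      # locally known rules
    calls_since_sync: int = 0
    days_since_sync: int = 0

    def record_submission(self, label: str, submitted_value: str) -> None:
        """Count the label and the value the user actually submitted for it."""
        self.seen[label] = self.seen.get(label, 0) + 1
        key = (label, submitted_value)
        self.kept[key] = self.kept.get(key, 0) + 1
        self.calls_since_sync += 1

    def probability(self, label: str, value: str) -> float:
        """Times `value` was submitted for `label` / times `label` appeared, in %."""
        appeared = self.seen.get(label, 0)
        if appeared == 0:
            return 0.0
        return 100.0 * self.kept.get((label, value), 0) / appeared

    def needs_sync(self) -> bool:
        return (self.calls_since_sync >= SYNC_EVERY_CALLS
                or self.days_since_sync >= SYNC_EVERY_DAYS)

    def sync(self, cloud_rules: Rules) -> Rules:
        """Download rules missing locally (keeping local ones); return rules to upload."""
        for key, prob in cloud_rules.items():
            self.rules.setdefault(key, prob)
        self.calls_since_sync = 0
        self.days_since_sync = 0
        return dict(self.rules)
```

Keeping local probabilities when a rule already exists locally is a design choice of this sketch; a real merge could instead weight the local and cloud estimates by their counts.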

3 Algorithms
The following algorithms describe how the local module works.

3.1 Updating Module
a. Input: the module accepts the read fields, an array of objects whose key is the label ID and whose value is the field name.


Fig. 5 High-level diagram of the extension

b. Create a local array named Data_object, which stores the updated values of the read array.
c. Data Matching Section:
  1. Initialize the variables filter_with_id_present, filter_with_id_absent, and updated_data_object (initialized with Data_object); the first two store the read data whose label IDs are present in and absent from Data_object, respectively.
  2. Compute filter_with_id_absent by performing a left join of read with Data_object.
  3. Get the parsed read objects according to the changes.
  4. For each entry parsed_item in the parsed read objects:
    A. Increment the “count” of the entry with key parsed_item.id by one.
    B. If the value stored in updated_data_object under key parsed_item.id equals the “value” of parsed_item (the user kept the filled value):
      i. Update the “probability” of updated_data_object to (probability × (count − 1) + 100)/count.
    C. Otherwise (the user edited the value):
      i. Update the “probability” of updated_data_object to probability × (count − 1)/count.
  5. For each item filtered_item of filter_with_id_absent:
    A. Update rules with the value returned by the Rules Matching Section, passing filtered_item as the parameter.
  6. Return updated_data_object concatenated with rules.
d. Rules Matching Section:
  1. Accept a parameter named absent_labels.
  2. Initialize a variable Label_id that maps the id (key) to the label (value) of every input field.
  3. Check whether Label_id is null:
    A. If it is null, create a new data object { id: absent_labels.id, label_id: absent_labels.name, probability: 100, count: 1 }.
    B. If it is not null, set rule to { pointer: absent_labels.value, location: Label_id, probability: 100, count: 1 }.
  4. Check whether rule.pointer is present in rules:
    A. If it is present, increase the probability of that rule.
    B. If it is absent, create a new object for the rule and append it to rules.
  5. Return rules.
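A minimal Python sketch of the updating module follows. It is one possible reading of the pseudocode, not the authors’ code: the dictionary shapes, the function names, and the choice to relate an unseen field to a stored one when both hold the same value (our interpretation of the pointer/location fields) are assumptions. Probabilities stay on the 0–100 scale used by the Search Module’s threshold of 50, and a kept fill raises the probability while an edited one lowers it, as described in Sect. 2.

```python
from typing import Dict, List, Optional

# Hypothetical shapes (assumptions, not the paper's exact structures):
#   data:  field id -> {"label", "value", "probability" (0-100), "count"}
#   rules: list of {"label", "location", "probability", "count"}, meaning
#          "a field labelled `label` can reuse the value stored at `location`".


def running_probability(prob: float, count: int, accepted: bool) -> float:
    """Running average on a 0-100 scale; `count` already includes this visit."""
    hits = prob * (count - 1) / 100 + (1 if accepted else 0)
    return 100.0 * hits / count


def update(read: List[dict], data: Dict[str, dict], rules: List[dict]) -> None:
    """`read`: submitted fields, each {"id", "label", "value"}."""
    for item in read:
        obj = data.get(item["id"])
        if obj is not None:
            # Data Matching: this field id was seen before.
            obj["count"] += 1
            accepted = obj["value"] == item["value"]  # user kept the stored value
            obj["probability"] = running_probability(
                obj["probability"], obj["count"], accepted)
        else:
            match_rules(item, data, rules)


def match_rules(item: dict, data: Dict[str, dict], rules: List[dict]) -> None:
    """Rules Matching: relate an unseen field to a stored one holding the same value."""
    label = item["label"].strip().lower()
    location: Optional[str] = next(
        (fid for fid, obj in data.items() if obj["value"] == item["value"]), None)
    if location is None:
        # Nothing similar stored: keep the value as a new data object.
        data[item["id"]] = {"label": label, "value": item["value"],
                            "probability": 100.0, "count": 1}
        return
    for rule in rules:
        if rule["label"] == label and rule["location"] == location:
            rule["count"] += 1  # seen again: strengthen the existing rule
            rule["probability"] = running_probability(
                rule["probability"], rule["count"], True)
            return
    rules.append({"label": label, "location": location,
                  "probability": 100.0, "count": 1})
```

For example, after a “Name” field with id f1 is submitted twice unchanged and once edited, its probability falls from 100 to about 66.7; a later “First name” field holding the originally stored value then produces a rule pointing at f1.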

Fig. 6 Recall versus number of forms filled (20 to 100 forms)

3.2 Search Module
a. Initialize the variables data, forms, and rules to store the data objects, the form input fields, and the updated rules.
b. Searching Section:
  1. Get all the data fields present in the form with tag id “my_form”, store them in forms, and for each data field dataField, do:
    A. Set data to dataField.my_form.data.
    B. Set rules to dataField.my_form.rule.
    C. Call the Find Label Section.
c. Find Label Section:
  1. For each form item in forms, do:
    A. Initialize a variable named labels.
    B. For each label in labels, do:
      i. Convert label to lower case.
      ii. Initialize a variable named filter_data.
      iii. Filter the values from data where data.label equals label and data.probability is greater than 50.
      iv. Assign the filtered values to filter_data.
    C. Return or print the result.
d. Clean Up Section:
  1. Reset the module variable data to null.
  2. Reset rules to null.
  3. Reset forms to null.
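The Find Label Section can be sketched in Python as below. The data layout (a list of dicts with label, value, and probability keys) and the function name find_label are assumptions; picking the highest-probability candidate when several pass the filter is our addition, since the pseudocode only says to return or print the filtered result.

```python
from typing import Dict, List

MIN_PROBABILITY = 50  # the "greater than 50" filter from the Find Label Section


def find_label(labels: List[str], data: List[dict]) -> Dict[str, str]:
    """Map each form label to the most probable stored value that passes the filter."""
    result: Dict[str, str] = {}
    for label in labels:
        key = label.lower()                      # step i: lower-case the label
        filter_data = [d for d in data           # step iii: label match and > 50
                       if d["label"] == key and d["probability"] > MIN_PROBABILITY]
        if filter_data:
            best = max(filter_data, key=lambda d: d["probability"])
            result[label] = best["value"]
    return result
```

In this sketch the local variables go out of scope when find_label returns, which plays the role of the Clean Up Section.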

4 Analysis
Recall: the number of times a known value was filled into a field divided by the total number of fields whose value ideally should have been filled.
Precision: the number of times a correct value was filled divided by the total number of values filled.
Experimental dataset: the extension, integrated with Google Chrome v. 99, observed 100 randomly occurring everyday forms, such as sign-in, registration, and e-shopping cart forms, most of them Google Forms for job applications, event registration, college data collection, and so on.
Running the algorithm on a single machine achieved a precision of 73.4%; that is, the data filled by the extension was judged satisfactory 73.4% of the time. Precision keeps increasing with the number of forms, so these results are only a preview of what the system can achieve. Recall started low (known values were often not filled) but reached 57% after 100 forms. It is expected to improve significantly with integration of the cloud module and with more machines running simultaneously (Figs. 6 and 7).
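The two metrics can be computed from a log of per-field outcomes, as in the Python sketch below. The FieldOutcome record is hypothetical, and we read “a known value was filled” as the extension writing the value the user finally submitted; other readings are possible.

```python
from typing import List, NamedTuple, Optional, Tuple


class FieldOutcome(NamedTuple):
    filled: Optional[str]  # value written by the extension, or None if left empty
    submitted: str         # value the user finally submitted
    was_known: bool        # True if that value was already stored (should be filled)


def precision_recall(outcomes: List[FieldOutcome]) -> Tuple[float, float]:
    """Precision over filled fields; recall over fields that should have been filled."""
    filled = [o for o in outcomes if o.filled is not None]
    correct = sum(1 for o in filled if o.filled == o.submitted)
    should_fill = [o for o in outcomes if o.was_known]
    known_filled = sum(1 for o in should_fill if o.filled == o.submitted)
    precision = correct / len(filled) if filled else 0.0
    recall = known_filled / len(should_fill) if should_fill else 0.0
    return precision, recall
```

With four hypothetical fields (one correct fill, one wrong fill, one known value left empty, one new value), this gives a precision of 0.5 and a recall of 1/3.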

Fig. 7 Precision versus number of forms filled

5 Conclusion and Future Scope
This application is best suited to HTML-based forms; it reduces manual effort and saves time in filling details that can be automated. Cloud integration helps the system fill details into fields in other (previously unseen) languages or symbols as well. Packaging the system as an extension makes it easily accessible, but this approach also has a few limitations [2]. Smartphone applications are now used by almost everyone because they are easy to access, and among all platforms, mobile is arguably the easiest to use. Developing this web-based structure into an app that can fill forms in other applications on the same platform is therefore a promising direction for future updates, although the form-filling process itself may be the same as in the web application. Security is a prominent issue for cloud services and browser extensions, since browsers are an easy target for attackers exploiting vulnerabilities. For better security assurance, future updates will need cryptographic encryption.

References
1. Ponmurugesh M, Rajashobika M, Usha K, Senthil Kumar C (2019) Automatic form filler. Int Res J Eng Technol (IRJET) 06(01)
2. Anisha D (2018) Association rule [online]. Available: https://www.geeksforgeeks.org/association-rule/
3. Douglas N, Douglas G, Derrett R (eds) (2001) Special interest groups: context and cases. Wiley, Brisbane, QLD
4. Chrome Developers (2018) Extension development overview [online]. Available: https://developer.chrome.com/docs/extensions/mv3/devguide/
5. Upadhayay A (2019) Introduction to NodeJs and Javascript. Geeks for Geeks. Available: https://www.geeksforgeeks.org/introduction-to-node-js/

Fatality Prediction in Road Accidents Using Neural Networks M. Rekha Sundari, Prasadu Reddi, K. Satyanarayana Murthy, and D. Sai Sowmya

Abstract Fatalities due to road traffic injuries have a great effect on society. The time at which road accidents occur cannot be predicted exactly, but the causes of road accident fatalities can be analyzed with a prediction model so that care can be taken to reduce the risk. Several factors, such as age, gender, region, and speed, affect the occurrence of accidents on roads. This paper presents a neural network model that predicts the situations that could influence the fatality rate in road accidents and identifies the factors that play a vital role in it. The work also discusses various solutions to the class imbalance problem that arises in such situations. Using the model, the relationship between the attributes and the fatality rate is analyzed.
Keywords Road accidents · Fatality · Neural networks · Class imbalance

M. Rekha Sundari (corresponding author) · P. Reddi · D. Sai Sowmya, Department of Information Technology, Anil Neerukonda Institute of Technology & Sciences, Visakhapatnam, Andhra Pradesh, India
K. Satyanarayana Murthy, Department of Computer Science and Technology, Baba Institute of Technology and Sciences, Visakhapatnam, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_3

1 Introduction
Accidents have various causes: rash driving, drunk driving, wrong signaling, pedestrian carelessness, and so on. Often the causes can be analyzed and avoided. Every individual, whether a driver, cyclist, motorcyclist, or pedestrian, must be cautious throughout the journey to avoid the risk of road accidents. Although everyone is aware of the common causes of road accidents, the same conditions, such as drunk driving, speeding, and failure to use safety restraints, remain the most frequent causes of fatal incidents on the road. Road accidents have a very high impact on society. The rapid increase in the number of vehicles on the roads causes traffic congestion, thereby increasing the rate of road accidents [1]. Traffic congestion during the peak periods of the day has been identified as a major cause of accidents. An analysis in the state of Andhra Pradesh found that accidents are most likely between 9 A.M. and 11 A.M. and between 5 P.M. and 9 P.M., when the roads are highly congested with vehicles; 70% of accidents were observed to happen in these time frames. The congestion and connectivity of road junctions also play a vital role in road accidents. At most popular centers or areas, accidents are more likely on weekends than on weekdays. Accidents that occur outside the times specified above are attributed solely to driver negligence. The severity of an accident also depends on the types of vehicles involved in the crash. A crash between motorcyclists may lead to only property damage or curable injuries under secured conditions such as restraint and safety equipment use, but a crash between a heavy vehicle, such as a coach, and a motorcyclist may lead to severe injury even under secured conditions. Speed is another major cause of fatality in accidents. Speed limits should be followed strictly on all road types (urban, rural, one-way, and two-way) and according to the type and make of the vehicle. The main goal of our work is to analyze under which of the conditions discussed above a fatality may occur and to identify the role of each factor in causing fatalities.

2 Literature Review
Ogwueleka et al. [2] designed an artificial neural network (ANN) model to study and forecast accidents in a particular region, using the most recent data for experimental analysis. The number of vehicles per day, the number of accidents, road length in kilometers, and population were taken as model parameters. Unrelated and unwanted data were removed during preprocessing, and similar types of road accidents were grouped using self-organizing map-based clustering. The ANN was then designed on these clusters, using the feed-forward back-propagation approach with sigmoid and linear activation functions. The authors conclude that the ANN outperformed the statistical methods in use because of its ability to deal with new and unexpected situations, and that neural networks are best suited to non-algorithmic problems. Azad Abdulhafedh [3] summarized the statistical modeling approaches for road accident prediction that are popular among transportation agencies and researchers seeking a better understanding of the situations that lead to accidents. The models presented in that paper are Poisson regression, negative binomial (NB) regression, Poisson-lognormal regression, zero-inflated Poisson and negative binomial regression, Conway–Maxwell–Poisson regression, random-parameter models, artificial neural networks, fuzzy logic models, and logit and probit models. Pragya Baluni [4] classified the road accidents of Uttarakhand using neural networks; the work grouped victims by age and gender and concluded that more males than females were affected. Miao M. Chong [5] modeled the severity of injury in road accidents using artificial neural networks and decision trees. The data were collected from the National Automotive Sampling System (NASS) and General Estimates System (GES). The research concluded that the driver’s seat belt use, the road’s light condition, and the driver’s alcohol consumption are the three most critical factors in fatal injury, and the decision tree consistently outperformed the neural network in all experiments. Dong et al. [6] proposed an improved deep learning model to explore the complex interconnections between traffic crashes, roadways, and environmental elements. The proposed model includes two modules: (i) an unsupervised feature learning module that identifies the relationship between the explanatory variables and the feature representations, and (ii) a supervised fine-tuning module that performs traffic crash prediction. As its regression layer, the supervised tuning module includes a multivariate negative binomial (MVNB) model. A dataset collected from Knox County, Tennessee, was used to validate its performance. Garrido et al. [7] used the ordered probit model, an alternative to linear regression, to assess the severity of accidents in urban and rural areas, between females and males, and by day of the week and time of day. The authors concluded that the model fits the data well, that injury severity is lower in urban areas, and that females suffer more severe injuries than males owing to a lack of technical proficiency in critical situations.

3 Methodology
A neural network is a collection of a large number of nodes connected in parallel and arranged in tiers, so that each tier of nodes depends on the preceding tier for its input. A single-layer neural network is a perceptron, and a multilayer perceptron is a multilayer neural network with one or more hidden layers. A perceptron is a linear classifier that receives multiple inputs and produces a single output, which depends on the weights attached to each input: the output is 0 or 1 depending on whether the weighted sum of the inputs exceeds a fixed threshold. The first tier receives the raw input, and each node computes the sum of the products of its inputs and its own weights, which are learned during training with data from all classes. The first layer, which takes input from the dataset, is the visible or input layer; it passes the data to the next layer. A hidden layer is any layer after the input layer and before the output layer. A deep learning network is a neural network with many hidden layers, often performing complex transformations. The final layer is the output layer, which is tuned to output values in the format the problem requires via a function. This function, called the transfer function, maps the weighted sum of the inputs from the penultimate layer to the output. Such functions can be linear or nonlinear, and the one used in the output layer is chosen according to the problem: a binary classification problem typically uses the sigmoid function, whereas a multi-class classification problem may use the softmax activation function. Neural networks are known for their adaptability, since they update themselves as they learn from the first step and each subsequent step. Initially, a neural network is trained on a training dataset whose samples, with their different inputs and corresponding outputs, are used to adjust the weights on the nodes. In each tier, each node computes the products of the weights and inputs and sums these weighted inputs. The network is then tuned to decide which inputs to pass to the next tier. Neural networks came into widespread use after 2010, when the need arose to analyze huge amounts of data from the web, business, media trends, and so on. Neural networks can exceed human accuracy in making predictions when they are trained with a sufficient amount of data. In this work, our goal is to maximize the predictive accuracy of the network under different situations: the network must predict the fatality rate of an accident under different conditions. Neural networks were chosen on the assumption that multiple hidden layers may classify imbalanced data into fatal and non-fatal cases more accurately. The random forest algorithm performed well but failed to predict non-fatal tuples accurately. The dataset we considered contains many factors that affect road traffic injuries, of which only the following were used in our analyses:
• region of accident (A_RU),
• speed involved in crash (A_SPCRA),
• involvement of a drowsy driver (A_DROWSY),
• day of week of accident (A_DOW),
• crash type (A_CT),
• Class 1: fatal,
• Class 2: minor damages (not fatal).
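The perceptron rule and the two output functions described above can be written in a few lines of Python. This is a generic illustration, not the network used in this work, and the function names are ours. It also shows why the two output choices agree for two classes: a two-unit softmax gives the same probability as a sigmoid applied to the difference of the two scores.

```python
import math
from typing import List


def perceptron(inputs: List[float], weights: List[float], threshold: float) -> int:
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


def sigmoid(z: float) -> float:
    """Binary-classification output: probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores: List[float]) -> List[float]:
    """Multi-class output: non-negative values that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, perceptron([1, 0, 1], [0.5, 0.5, 0.5], 0.8) returns 1 because the weighted sum 1.0 exceeds 0.8, and softmax([0.0, 1.3])[1] equals sigmoid(1.3).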

The region of accident attribute is considered the main factor for analysis, and the other attributes are considered covariates. The main factor, A_RU, takes three values: 1 = rural, 2 = urban, and 3 = unknown. Because neural networks work only on numerical data, categorical data is also converted into numerical values. The processing summary in Table 1 shows that, of the 33,840 valid instances, 23,647 cases are used as training data and 10,193 as the holdout sample. No cases were excluded because of missing or unavailable data. The network information in Table 2 describes the neural network and can be used to verify that the specifications are correct. Note in particular that the number of units in the input layer equals the number of covariates plus the number of factor levels: a separate unit is created for each level of the factor, and no level is treated as a redundant unit, as is typical in many modeling procedures. The neural network presented in Fig. 1 therefore has seven units in the input layer, because the network treats each of the four covariates as a single unit and the three values of A_RU as three units.

Table 1 Dataset processing summary

Sample       N          Percent (%)
Training     23,647     69.9
Holdout      10,193     30.1
Valid        33,840     100.0
Excluded     0
Total        33,840

Table 2 Network information

Input layer
  Factors                              1  A_RU
  Covariates                           1  A_SPCRA
                                       2  A_DROWSY
                                       3  A_DOW
                                       4  A_CT
  Number of units                      7
  Rescaling method for covariates      Standardized
Hidden layer(s)
  Number of hidden layers              1
  Number of units in hidden layer      3
  Activation function                  Hyperbolic tangent
Output layer
  Dependent variables                  1  FATALS
  Number of units                      2
  Activation function                  Softmax
  Error function                       Cross-entropy

There is only one hidden layer, which consists of three units. The output layer consists of two units, class 1 and class 2, which indicate whether the case is fatal or not fatal, respectively. The hidden layer uses the hyperbolic tangent as its activation function. At the output layer, the probabilistic interpretation of the softmax activation function is essential for assigning each case to one of the two classes.
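The 7–3–2 architecture in Table 2 can be mirrored by a small forward pass in plain Python. This is only a shape-level sketch: the weights are random and untrained, the helper names are hypothetical, and we standardize with the population standard deviation because the paper does not state which estimator was used. It shows how the three A_RU levels become three one-hot input units next to the four standardized covariates.

```python
import math
import random
from statistics import mean, pstdev
from typing import List, Sequence

RU_LEVELS = (1, 2, 3)  # A_RU: 1 = rural, 2 = urban, 3 = unknown


def standardize(column: Sequence[float]) -> List[float]:
    """Rescale one covariate to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def encode(a_ru: int, covariates: Sequence[float]) -> List[float]:
    """Three one-hot units for A_RU followed by the four standardized covariates."""
    return [1.0 if a_ru == level else 0.0 for level in RU_LEVELS] + list(covariates)


def forward(x: Sequence[float], w_hidden, b_hidden, w_out, b_out) -> List[float]:
    """7 inputs -> 3 tanh hidden units -> 2 softmax outputs (class 1, class 2)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Random, untrained weights: enough to check shapes, not to make predictions.
rng = random.Random(0)
W_HIDDEN = [[rng.uniform(-0.5, 0.5) for _ in range(7)] for _ in range(3)]
B_HIDDEN = [0.0] * 3
W_OUT = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
B_OUT = [0.0] * 2
```

A call such as forward(encode(2, [0.1, -1.2, 0.5, 0.0]), W_HIDDEN, B_HIDDEN, W_OUT, B_OUT) returns two pseudo-probabilities that sum to 1; training (for example, by minimizing cross-entropy as in Table 2) is outside the scope of this sketch.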

Fig. 1 Neural network with hidden layers

4 Results and Analyses
Machine learning and data mining algorithms have been applied in the literature to predict the severity of road accidents [8, 9]. However, these algorithms did not predict the fatality rate under conditions such as the region of the accident, speed, driver drowsiness, and day of the week. Model evaluation cannot be restricted to precision and recall [10, 11], which evaluate only the positive or the negative cases; evaluating models with other metrics such as specificity, sensitivity, and gain helps the researcher analyze a model under different cases. The confusion matrix in Table 3 presents the predicted values for both classes. All tuples of class 1 are predicted correctly, but none of the tuples of class 2 is predicted as class 2. Although the overall accuracy of the method is good, the output suffers from class imbalance. The class imbalance problem arises when the tuples of one class greatly outnumber those of the other class.

Table 3 Classification data, confusion matrix

                                    Predicted
Sample     Observed                 1          2      Percent correct (%)
Training   1                        22,287     0      100.0
           2                        1360       0      0.0
           Overall percent (%)      100.0      0.0    94.2
Holdout    1                        9626       0      100.0
           2                        567        0      0.0
           Overall percent (%)      100.0      0.0    94.4
Dependent variable: FATALS


Performance Measures
A large set of performance metrics allows machine learning algorithms (MLAs) to be comprehensively evaluated and compared with each other. The equations below define the overall accuracy of the classifier and its precision, false positive rate (FPR), F1-score, sensitivity, and specificity.
• Accuracy = (TP + TN) / (TP + FP + TN + FN)
• Precision = TP / (TP + FP)
• FPR = FP / (TN + FP)
• Sensitivity = TP / (TP + FN)
• Specificity = TN / (TN + FP)
• F1_Score = 2 * (Precision * Sensitivity) / (Precision + Sensitivity)
• Gain or lift is a measure of the effectiveness of a classification model, calculated as the ratio between the results obtained with and without the model. Gain and lift charts are graphical aids for judging the performance of classification models.

where
• TP = count of true positive instances, i.e., instances belonging to the class not fatal that are correctly classified into the same class;
• TN = count of true negative instances, i.e., instances belonging to the class fatal that are correctly classified into the same class;
• FP = count of false positive instances, i.e., instances belonging to the class fatal that are erroneously classified into the class not fatal;
• FN = count of false negative instances, i.e., instances belonging to the class not fatal that are erroneously classified into the class fatal.
The predicted pseudo-probability chart in Fig. 2a shows clustered box plots of the predicted pseudo-probabilities for the training and holdout (testing) samples for the categorical dependent variable. The x-axis represents the observed response categories, and the legend represents the predicted response categories. The leftmost box plot shows the predicted pseudo-probability of category 1 for cases in which category 1 was observed. The portion of the box plot above the 0.5 mark on the y-axis represents the correct predictions in the classification table; the portion below 0.5 represents incorrect predictions. The next box plot to the right shows the predicted pseudo-probability of category 2 for cases in which category 1 was observed. Because the target variable has only two categories, the first two box plots are symmetric about the horizontal line at 0.5. The third box plot shows the predicted pseudo-probability of category 1 for cases in which category 2 was observed, and the last box plot shows the predicted pseudo-probability of category 2 for cases in which category 2 was observed. Again, the portion of the box plot above the 0.5 mark represents correct predictions, and the portion below it represents incorrect predictions.
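The formulas above can be checked against the holdout confusion matrix in Table 3. The Python sketch below takes “not fatal” (class 2) as the positive class, as in the definitions above, and reports undefined ratios (0/0) as 0.0, which is our convention rather than the paper’s. With these counts the accuracy is about 0.944, matching the 94.4% overall percentage in Table 3, while sensitivity is 0: the high accuracy comes entirely from the majority class.

```python
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Division that reports an undefined 0/0 ratio as 0.0 (our convention)."""
    return num / den if den else 0.0


def metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, FPR, sensitivity, specificity, and F1 from raw counts."""
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "fpr": _ratio(fp, tn + fp),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Holdout sample of Table 3, positive class = 2 ("not fatal"):
#   observed 2 -> predicted 2: 0 (TP)     observed 2 -> predicted 1: 567 (FN)
#   observed 1 -> predicted 1: 9626 (TN)  observed 1 -> predicted 2: 0 (FP)
holdout = metrics(tp=0, tn=9626, fp=0, fn=567)
```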


Fig. 2 a Predicted pseudo-probability chart. b Specificity, sensitivity. c Lift. d Gain

5 Conclusions
The neural network presented in this work achieves high overall accuracy in predicting fatality, but it suffers from the class imbalance problem. The network classifies class 1 with 100% accuracy but fails to classify any class 2 data. This is because few or no training samples are available for class 2. Because most of the available data belong to the fatal class, the model is biased toward the majority class and ignores the minority class. This problem is also significant in applications such as fraud detection, spam filtering, and loan default prediction. To deal with imbalanced data, various data-level and algorithm-level techniques, combined with neural networks, are available in the literature. As future work, we plan to implement oversampling, undersampling, and SMOTE to balance the class distribution.
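Random oversampling and undersampling, two of the data-level remedies named above, are simple enough to sketch in Python. This is a generic illustration with hypothetical function names, not an implementation from this work. SMOTE differs from random oversampling in that it synthesizes new minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours; the imbalanced-learn library provides it as imblearn.over_sampling.SMOTE.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (features, class label)


def _group_by_class(rows: List[Row]) -> Dict[int, List[Row]]:
    """Bucket rows by their class label."""
    groups: Dict[int, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[1], []).append(row)
    return groups


def random_oversample(rows: List[Row], seed: int = 0) -> List[Row]:
    """Duplicate random minority rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows)
    target = max(len(g) for g in groups.values())
    balanced: List[Row] = []
    for g in groups.values():
        balanced.extend(g)
        balanced.extend(rng.choice(g) for _ in range(target - len(g)))
    return balanced


def random_undersample(rows: List[Row], seed: int = 0) -> List[Row]:
    """Keep a random subset of each class, as large as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows)
    target = min(len(g) for g in groups.values())
    balanced: List[Row] = []
    for g in groups.values():
        balanced.extend(rng.sample(g, target))
    return balanced
```

Resampling should be applied only to the training split, so that duplicated rows do not leak into the holdout sample and inflate the evaluation.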


Managing Peer Review Process of Research Paper Samarth Anand, Samarpan Jain, Sarthak Aggarwal, Shital Kasyap, and Mukesh Rawat

Abstract This paper proposes a web-based evaluation management system for the complex problem of evaluating research papers through peer review. Web sites are increasingly evolving into web applications that offer a logical architecture, interactivity, and security. The proposed model gathers ideas, in the form of research papers, from authors all around the world. It then evaluates them by assigning the papers to evaluators according to their domain expertise. After receiving the papers, we first determine each paper's plagiarism score using a REST API. The admin can view the plagiarism scores and assign to evaluators only those papers whose plagiarism score is lower than a threshold score. If a reviewer rejects a paper for any reason, the system suggests upcoming conferences and journals related to the author's areas of interest. The author can then submit the paper there after making the updates suggested by the evaluator. The system collects all the reviews and comments from the evaluators and analyzes them using machine learning algorithms to identify the key reasons behind the rejection of research work (such as plagiarism score and quality of research). It also helps in determining authors' interest in the latest technologies. The entire application uses Google Cloud storage, which makes it lightweight and highly scalable. It is very difficult for a single user to evaluate many research papers simultaneously, so the admin divides the authors' submissions among the evaluators. This keeps the evaluation process smooth, unbiased, and error-free.

Keywords Web application · Security · Machine learning · Web scraping · Enterprise resource planning · REST API

S. Anand (B) · S. Jain · S. Aggarwal · S. Kasyap · M. Rawat Meerut Institute of Engineering and Technology, Meerut 250005, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_4

1 Introduction Enterprise resource planning (ERP) [1, 2] refers to a type of software that organizations use to manage day-to-day activities such as reviewing, assigning, notifying, and evaluating documents. A key strength of ERP [1, 2] is its flexibility: it combines many features in one application, and customers can choose among them according to their needs. With the outbreak of the pandemic [7], ERP became one of the most important parts of the education sector, where it supports the IT management of many educational institutions. It enables the complete digitization of educational processes for the growth and development of students and helps maintain a competitive spirit among them. It also gives educators a wider perspective in managing the institute. Online assessment [10] can have a significant impact on the professional and educational sectors during the current pandemic. Because of the pandemic, many day-to-day activities have moved from traditional methods to online techniques. Students learn online, meetings are conducted online, and universities and other educational institutions hold their events online. In all these online activities, collecting research articles and student assignments, evaluating them on time, and providing feedback manually are very daunting tasks. With the proposed model, institutions can save time and resources compared with the traditional pen-and-paper method, so many institutions may find its features attractive for their day-to-day activities.

The proposed model is structured into four main panels: the Super Admin Panel, the Admin Panel, the Evaluator Panel, and the Author's Panel. The Super Admin Panel is used to generate admin credentials; the super admin is the top-level user and can create, update, and delete admins. The Admin Panel is used to register evaluators on the portal and assign them papers to review. The admin can also verify registered authors by reviewing their identity cards. In addition, the admin can download the machine learning report of the event. This report gives the main reasons behind paper rejections (such as plagiarism, quality of content, or whether the work is up to the mark) and the success rate of the event, among other information. The Evaluator Panel is used to evaluate the assigned papers and provide feedback on them. The Author's Panel is used to submit papers. If an author's paper is rejected, the model suggests upcoming conferences so that the author can apply again after making the changes suggested by the evaluator. Dividing the work among these panels eases the evaluation process, and the model aims to provide fair, fast, and secure online evaluation.
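The division of responsibilities among the four panels can be summarised as a role-to-permission table. The following Python sketch is only an illustration. The permission names are hypothetical labels for the actions described above, not identifiers from the authors' Django code. In Django itself, the same idea would more naturally be expressed with the built-in groups and permissions of `django.contrib.auth`.

```python
from enum import Enum


class Role(Enum):
    SUPER_ADMIN = "super_admin"
    ADMIN = "admin"
    EVALUATOR = "evaluator"
    AUTHOR = "author"


# Hypothetical permission names mirroring the panel descriptions.
PERMISSIONS = {
    Role.SUPER_ADMIN: {"create_admin", "update_admin", "delete_admin"},
    Role.ADMIN: {
        "register_evaluator",
        "assign_paper",
        "verify_author",
        "download_ml_report",
    },
    Role.EVALUATOR: {"review_assigned_paper", "give_feedback"},
    Role.AUTHOR: {"submit_paper", "view_conference_suggestions"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the given role may perform the action."""
    return action in PERMISSIONS.get(role, set())


if __name__ == "__main__":
    print(is_allowed(Role.ADMIN, "assign_paper"))
    print(is_allowed(Role.AUTHOR, "assign_paper"))
```

Keeping the mapping in one place makes each panel's capabilities explicit. It also makes them easy to audit.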

2 Literature Survey 2.1 The Effect of Enterprise Resource Planning (ERP) System and Their Practical Use Cases on Business Performances [3] This paper examines the effects of ERP systems on business performance by analyzing a critical case study of an Egyptian SME branch of a multinational company. The results indicate that business users reported many performance benefits after the ERP implementation. However, a few benefits previously linked to ERP were not fully achieved.

2.2 A Research Study on the Enterprise Resource Planning (ERP) System Implementation and Current Trends in ERP [5] This paper shows that an ERP system integrates all the functions of an organization, such as finance, marketing, manufacturing, and human resources, through advanced real-time data collection, processing, and communication. This allows the organization to make quick decisions on real-time issues and to control the complete business process.

2.3 Enterprise Resource Planning (ERP) System Implementation: A Case for User Participation [6] This paper observes that research on ERP systems has focused mainly on ERP adoption, success measurement, and critical success factors (CSFs). It also notes a paucity of studies on user participation and on the contribution of users to the successful implementation of ERP systems.


2.4 Research on Data Security Technology Based on Cloud Storage [9] This paper proposes a secure storage scheme based on Tornado codes (DSBT) that combines symmetric encryption with erasure codes. The scheme uses a boot password to address the key preservation and management problem of traditional data encryption. It uses the redundancy of Tornado erasure codes to recover lost data. It also combines a keyed hash with the error-correcting Tornado code to detect data tampering. On this basis, the paper further studies proofs of retrievability (POR).

2.5 Future Improvisation Is the Work From these works, we learned about classification algorithms, Django-based web applications [4, 8], AWS cloud storage services, and various methods of securing data. We also studied ERP-based systems and different ways to secure web applications, and we have tried to apply these effectively in the proposed model.

2.6 Why ERP? We use an ERP-based model so that all the required tasks are handled on a single platform. Authors submit their proposals on the portal, and the admin forwards each proposal to a reviewer. After the review is completed, the admin (host) can check the status of each reviewed paper and its reviews and generate reports as required. The graphical user interface makes the system easy to use: users do not need to write any queries, because every operation is performed by clicking tabs on their panel.

3 Proposed Methodology 3.1 Work Flow of the System For a good understanding of any system, we need to study the overall workflow of the system. Let us discuss the workflow of our model as shown in Fig. 1, in the form of a flowchart.


Fig. 1 Flow diagram of proposed model

The workflow starts with the registration of the organization on the application, after which login credentials are sent to its registered e-mail ID. The super admin uses these credentials to log in and create admins according to the organization's requirements. Each user (super admin, admin, evaluator, or author) then logs in to their respective panel through the login page. Let us first discuss the features provided in the Admin Panel.

3.1.1 Hosting the Event

This feature allows the admin to host an event with the details it requires. Organizations are not limited in the details they may ask for, so each organization creates its own Google Form. The form must include at least some mandatory details, such as e-mail ID, research work upload, and ID card for authentication. Once all the mandatory details are provided in the hosting tab, the event is immediately hosted on the platform.
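The mandatory-field rule can be expressed as a simple check that runs before an event is hosted. In this sketch the field names are hypothetical. An organization's Google Form may add any number of further fields, and the check simply ignores them.

```python
# Hypothetical names for the mandatory fields mentioned in the text.
MANDATORY_FIELDS = {"email", "research_work_upload", "id_card"}


def missing_fields(form_fields):
    """Return the mandatory fields absent from an event's form, sorted."""
    return sorted(MANDATORY_FIELDS - set(form_fields))


def can_host_event(form_fields) -> bool:
    """An event can be hosted only if every mandatory field is present."""
    return not missing_fields(form_fields)


if __name__ == "__main__":
    print(can_host_event(["email", "research_work_upload", "id_card", "affiliation"]))
    print(missing_fields(["email"]))
```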

3.1.2 Registering the Evaluators

Judging research work requires evaluators with high expertise whose areas of interest match the event's theme. This feature allows the admin to create such evaluators, whose expertise and domain knowledge are used to identify the best research work among all submissions. After inviting the evaluators, the admin can e-mail login credentials to their respective e-mail IDs so that they can log in to the portal.

3.1.3 Assigning Research Work Based on Evaluator Expertise

To assign research work according to an evaluator's expertise and domain of interest, the admin selects the evaluator's name and checks the plagiarism scores of the respective authors. If the admin finds the research work appropriate, it is assigned to a suitable evaluator, who provides an expert review of the authors' research.
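The assignment step combines two checks: the plagiarism score must be below a threshold, and the paper's domain must match the evaluator's expertise. The sketch below automates what the admin does manually. The 20% threshold and the rule of picking the least-loaded matching evaluator are assumptions for illustration. The text gives neither the threshold value nor a tie-breaking rule.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    paper_id: str
    domain: str
    plagiarism_score: float  # percentage, 0-100


@dataclass
class Evaluator:
    name: str
    domains: set
    assigned: list = field(default_factory=list)


PLAGIARISM_THRESHOLD = 20.0  # hypothetical threshold (%)


def assign(paper, evaluators, threshold=PLAGIARISM_THRESHOLD):
    """Assign a paper to a matching evaluator and return the name, or None.

    Papers at or above the threshold are not assigned (the admin may reject
    them). Among evaluators whose domains include the paper's domain, the one
    with the fewest assigned papers is chosen.
    """
    if paper.plagiarism_score >= threshold:
        return None
    candidates = [e for e in evaluators if paper.domain in e.domains]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda e: len(e.assigned))
    chosen.assigned.append(paper.paper_id)
    return chosen.name


if __name__ == "__main__":
    evaluators = [Evaluator("A", {"ml"}), Evaluator("B", {"ml", "iot"})]
    print(assign(Paper("p1", "ml", 5.0), evaluators))
    print(assign(Paper("p2", "ml", 45.0), evaluators))
```

Balancing by current load spreads submissions among evaluators with the same expertise. This is one simple way to realise the division of work that the system relies on.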

3.1.4 Checking the Review Given by Evaluators

After the evaluators have analyzed the research work, the admin can use this feature to view the reviews given by all the evaluators and their acceptance or rejection of each paper. The admin can also generate reports from these reviews for future reference. Two types of report are available. The first is a summary report that lists the number of entries, the reviews given to each author, and the accepted/rejected results. The second is a machine learning-based report that analyzes the key reasons for rejection and the success rate of the event, together with visual graphs.

3.1.5 Sending the Results to Authors

After the final report is generated, the admin can release the results to the authors by sending customized, auto-generated bulk e-mails to their respective e-mail IDs. For accepted research work, these e-mails include the comments and changes (if any) suggested by the evaluators. Authors can contact the admin with any queries after viewing their results.

Having discussed the features of the Admin Panel (Fig. 2), we now turn to the Evaluator Panel, shown in Fig. 3. It allows evaluators to view the research work assigned to them, give suggestions (if needed), and mark each paper as accepted or rejected. All this information is stored in the database and can be viewed by the admin. Evaluators are notified at their registered e-mail IDs whenever new work is assigned to them. A Feedback tab also lets them give suggestions on the interface and usability of the application.

Finally, the Author's Panel, shown in Fig. 4, lets authors register for an event by filling in the required details and receive event notifications at their registered e-mail IDs. Authors can also reset their password (if required) and receive suggestions of related conferences to be held in the future. All the details required by the ongoing event are stored on Google Cloud, which helps reduce the load on the server. This completes the workflow of the system. The main components of the application are as follows.
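Customized bulk e-mails of this kind can be generated from a template. The sketch below uses Python's standard `string.Template` and `email.message.EmailMessage` to build one message per author. The sender address, template wording, and result fields are hypothetical. Actually sending the messages (for example, with `smtplib.SMTP.send_message`) is left out.

```python
from email.message import EmailMessage
from string import Template

# Hypothetical template; the real wording is up to the organizing institution.
BODY = Template(
    "Dear $author,\n\n"
    "Your paper \"$title\" has been $decision.\n"
    "Evaluator comments: $comments\n"
)


def build_result_emails(results, sender="noreply@example.org"):
    """Create one EmailMessage per author result.

    results: iterable of dicts with keys author, email, title, decision
    and, optionally, comments.
    """
    messages = []
    for r in results:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = r["email"]
        msg["Subject"] = f"Review result: {r['title']}"
        msg.set_content(BODY.substitute(
            author=r["author"],
            title=r["title"],
            decision=r["decision"],
            comments=r.get("comments") or "none",
        ))
        messages.append(msg)
    return messages


if __name__ == "__main__":
    demo = [{"author": "R. Roy", "email": "roy@example.org", "title": "T1",
             "decision": "accepted", "comments": "Clarify Fig. 2"}]
    print(build_result_emails(demo)[0])
```

Building messages separately from sending them makes it easy to preview them before the bulk dispatch.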


Fig. 2 Flow diagram of admin dashboard

Fig. 3 Flow diagram of evaluator dashboard


Fig. 4 Flow diagram of author’s dashboard

3.2 Different Components of the System

3.2.1 The Security Feature of the Login System

To maintain privacy and security in the login system, the application back end is built on the Python Django framework, which provides strong built-in security features. Passwords are not stored in plain text. Django's default password hasher applies PBKDF2 with the SHA-256 hash function. This is one-way hashing rather than encryption: the stored hash value (digest) cannot feasibly be reversed to recover the password. At login, the entered password is hashed in the same way and compared with the stored digest. Building on this framework therefore greatly reduces the risk of password theft and data leakage. Figures 5, 6, and 7 show how the algorithm works.
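The behaviour shown in Figs. 5, 6, and 7 can be sketched with Python's standard library: only a digest is stored, and digests are compared at login. This is a simplified illustration of salted PBKDF2-HMAC-SHA256 hashing, not the authors' code. In a Django project one would normally call `django.contrib.auth.hashers.make_password` and `check_password` rather than hand-rolling this. The iteration count below is illustrative only.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor; production settings often use more


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Only these are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare digests."""
    _, candidate = hash_password(password, salt)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, stored_digest)


if __name__ == "__main__":
    salt, digest = hash_password("s3cret")
    print(verify_password("s3cret", salt, digest))  # password matched (Fig. 7)
    print(verify_password("guess", salt, digest))   # password not matched (Fig. 6)
```

The random salt makes identical passwords produce different digests. The iteration count makes each guess expensive for an attacker who obtains the stored digests.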

Fig. 5 Password saved using hash function in the form of hash value/digest

Fig. 6 When the password does not match


Fig. 7 When the password is matched

3.2.2 Detecting Plagiarism in Author's Research Work

This is one of the main features of the proposed model. When authors submit their work on the portal, each submitted paper goes through plagiarism detection to determine its plagiarism score. The model collects the URLs of the authors' articles from their submissions. For each URL, it sends a request to a plagiarism-detection API, whose response contains the plagiarism score. The model then writes the received score to a CSV file using Python's csv module. When the admin assigns the received articles to evaluators, a column displays the plagiarism score of each author's article. Using this score, the admin decides whether to assign the article to an evaluator or to reject it because of high plagiarism content.
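The score-recording step can be sketched with Python's `csv` module. The text does not name the plagiarism API, so the service is represented here by a caller-supplied function. The name `fetch_score` and the column names `paper_url` and `plagiarism_score` are hypothetical.

```python
import csv
import io
from typing import Callable, TextIO


def add_plagiarism_scores(src: TextIO, dst: TextIO,
                          fetch_score: Callable[[str], float]) -> None:
    """Copy submission rows from src to dst, adding a plagiarism_score column.

    fetch_score(url) stands in for the request to the plagiarism-detection API.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    if "plagiarism_score" not in fieldnames:
        fieldnames.append("plagiarism_score")
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row["plagiarism_score"] = fetch_score(row["paper_url"])
        writer.writerow(row)


if __name__ == "__main__":
    src = io.StringIO("author,paper_url\nA,https://example.org/p1\n")
    dst = io.StringIO()
    # Hypothetical stand-in for the plagiarism API call.
    add_plagiarism_scores(src, dst, fetch_score=lambda url: 12.5)
    print(dst.getvalue())
```

Passing the API call in as a function keeps the CSV logic testable without network access.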

3.2.3 Providing Suggestions of Different Conferences on Similar Interest Areas of Author

A research work rejected for any reason may still be appreciated and accepted at another event. With this in mind, this feature suggests to authors upcoming conferences whose areas of interest match their research area. To provide these suggestions, data are scraped from different websites and stored in a CSV file. Making many scraping requests one after another is slow, because each request must wait for its response and for the website to load. To overcome this problem, Beautiful Soup is combined with parallel execution, which increases the efficiency of the application. The scraping requests run in simultaneous threads (one per selected country, see Table 1), so their waiting times overlap instead of adding up. As a result, the scraped data are stored in the CSV file quickly. The data can then be retrieved quickly, and authors receive the best-suited suggestions within seconds.
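The speed-up comes from overlapping network waits. The sketch below shows the pattern with the standard library: a thread pool runs one fetch per listing page, and a small `html.parser`-based extractor pulls out conference names. Beautiful Soup could replace this parser. The page structure (`<li class="conf">` items), the URLs, and the `fetch` callable are hypothetical. Threads suit this step because the work is I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from typing import Callable, Dict, List


class ConferenceListParser(HTMLParser):
    """Collect the text of <li class="conf"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.names: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "conf") in attrs:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._inside = False

    def handle_data(self, data):
        if self._inside and data.strip():
            self.names.append(data.strip())


def parse_conferences(html: str) -> List[str]:
    parser = ConferenceListParser()
    parser.feed(html)
    return parser.names


def scrape_all(urls: List[str], fetch: Callable[[str], str],
               max_workers: int = 8) -> Dict[str, List[str]]:
    """Fetch and parse all pages concurrently; fetch(url) returns HTML text."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = pool.map(fetch, urls)  # results come back in input order
        return {url: parse_conferences(html) for url, html in zip(urls, pages)}
```

In a real deployment, `fetch` could be a function that calls `urllib.request.urlopen(url)` and decodes the response body. The results could then be written to the CSV file in a single pass.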

3.2.4 Key Reasons Behind the Rejection of Research Papers

This is the main distinguishing feature of the proposed model. Through it, the admin can generate a machine learning report for the whole event. The report identifies the main reasons behind the rejection of papers (such as plagiarism in the author's content, quality of the research article, lack of sufficient data, or use of the wrong format). It also provides visual representations of the data and determines the success rate of the event. Several evaluators assess the authors' research work, and for each article they provide feedback and a decision to accept or reject it. The feedback and decisions collected from all evaluators are used to generate the report. Natural language processing is applied to the feedback to extract its keywords. These keywords are then sorted into two groups: an acceptance group with the keywords related to the acceptance of articles, and a rejection group with those related to rejection. The groups are visualized using different visualization techniques, which reveal the major reasons behind the rejection of research articles and the mistakes that authors make most often. Using the acceptance and rejection data, the model also computes the success rate of the event from the numbers of articles accepted and rejected by the evaluators out of the total articles received by the admin. All of these data are combined into the report. The report shows the major reasons for rejection and helps authors revise their articles to increase their chances of acceptance.
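The report logic can be approximated in a few lines. This sketch replaces the unspecified NLP step with simple word counting after stop-word removal. It groups the keywords by the evaluator's decision and computes the success rate as the number of accepted papers divided by all reviewed papers. The stop-word list and the (feedback, decision) data format are assumptions.

```python
import re
from collections import Counter

# Minimal illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "in", "are", "with", "this", "was"}


def keywords(text):
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_report(reviews, top_n=3):
    """reviews: list of (feedback_text, decision), decision is 'accept' or 'reject'."""
    groups = {"accept": Counter(), "reject": Counter()}
    for feedback, decision in reviews:
        groups[decision].update(keywords(feedback))
    accepted = sum(1 for _, d in reviews if d == "accept")
    return {
        "success_rate": accepted / len(reviews) if reviews else 0.0,
        "top_rejection_keywords": [w for w, _ in groups["reject"].most_common(top_n)],
        "top_acceptance_keywords": [w for w, _ in groups["accept"].most_common(top_n)],
    }


if __name__ == "__main__":
    demo = [
        ("High plagiarism and poor quality", "reject"),
        ("Plagiarism is high", "reject"),
        ("Good quality and novel", "accept"),
        ("Wrong format, plagiarism", "reject"),
    ]
    print(build_report(demo))
```

The keyword counts from each group can be passed directly to a bar chart or word cloud for the visual part of the report.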

4 Result Table 1 compares the time taken to scrape conference data for different numbers of countries using sequential and parallel computation. Each value is the mean of three observations. The time taken by parallel computation rises only slowly with the number of countries, from 3.70 s for 3 countries to 5.08 s for 15. In contrast, the time taken by sequential computation rises rapidly, from 8.78 s to 39.37 s (Fig. 8 and Table 1). The running time of the algorithm grows roughly linearly, O(n), in the number of countries n. Parallel computation does not change this asymptotic cost, but it overlaps the requests, so for 15 countries the parallel run is roughly 7.75 times faster than the sequential one (5.08 s versus 39.37 s).


Fig. 8 Graphical representation of speed of parallel and sequential computation

Table 1 Performance table based on model's speed using sequential and parallel computation

Serial number | Number of countries selected | Number of threads used | Time taken in sequential computation (s) | Time taken in parallel computation (s)
1 | 3 | 3 | 8.78 | 3.70
2 | 5 | 5 | 12.26 | 4.01
3 | 8 | 8 | 20.11 | 4.23
4 | 10 | 10 | 24.51 | 4.31
5 | 15 | 15 | 39.37 | 5.08
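The speed-up of parallel over sequential computation for each row of Table 1 is the ratio of the two measured times. The short computation below uses only the values reported in the table.

```python
# (countries, sequential time in s, parallel time in s) from Table 1.
TABLE_1 = [
    (3, 8.78, 3.70),
    (5, 12.26, 4.01),
    (8, 20.11, 4.23),
    (10, 24.51, 4.31),
    (15, 39.37, 5.08),
]


def speedups(rows):
    """Return {countries: sequential_time / parallel_time}."""
    return {n: seq / par for n, seq, par in rows}


for n, s in speedups(TABLE_1).items():
    print(f"{n:>2} countries: {s:.2f}x faster in parallel")
```

The ratio rises from about 2.4 for 3 countries to about 7.75 for 15 countries. The benefit of parallel computation therefore grows with the number of pages scraped.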

5 Conclusion The proposed model can be used as an application that addresses challenges faced by educational institutions. It provides the features required to host research-related events and can also be used for assignment submissions, writing competitions, and similar activities. It is a complete ERP model designed to be secure, fast, reliable, and efficient. Its distinguishing features include machine learning-based analysis of the reviews and URL sharing instead of traditional document sharing, which makes execution faster. It also offers a list of upcoming conferences worldwide, suggested to authors whose articles were rejected by expert evaluators, so that they can resubmit to a suggested conference after making the necessary changes. The list of upcoming conferences is generated using parallel computation, which makes the system much more efficient than serial computation. In our measurements, parallel computation was between about 2.4 and 7.75 times faster, and the gain increased with the number of countries (Table 1).


References
1. Al-Fedaghi S (2011) Developing web applications. Int J Softw Eng Its Appl 5. https://doi.org/10.1007/978-1-4302-3531-6_12
2. Dalai A, Jena S (2011) Evaluation of web application security risks and secure design patterns, pp 565–568. https://doi.org/10.1145/1947940.1948057
3. Elragal A, Al-Serafi A (2011) The effect of ERP system implementation on business performance: an exploratory case-study. Commun IBIMA 2011:19. https://doi.org/10.5171/2011.670212
4. Holovaty A, Kaplan-Moss J (2008) The definitive guide to Django. https://doi.org/10.1007/978-1-4302-0331-5
5. Kenge R (2020) A research study on the ERP system implementation and current trends in ERP. Shanlax Int J Manage 8:34–39. https://doi.org/10.34293/management.v8i2.3395
6. Matende A, Ogio P (2013) Enterprise resource planning (ERP) system implementation: a case for user participation. Procedia Technol 9:518–526. ISSN 2212-0173. https://doi.org/10.1016/j.protcy.2013.12.058
7. Qiu W, Rutherford S, Mao A, Chu C (2017) The Pandemic and its impacts. Health Culture Soc 9:1–11. https://doi.org/10.5195/HCS.2017.221
8. Susheel S, Nelabhotla NK, Shyamasundar R (2015) Enforcing secure data sharing in web application development frameworks like Django through information flow control, pp 551–561. https://doi.org/10.1007/978-3-319-26961-0_34
9. Wang R (2017) Research on data security technology based on cloud storage. Procedia Eng 174:1340–1355. ISSN 1877-7058. https://doi.org/10.1016/j.proeng.2017.01.286
10. Zlatović M, Balaban I, Kermek D (2015) Using online assessments to stimulate learning strategies and achievement of learning goals. Comput Educ 91:32–45. https://doi.org/10.1016/j.compedu.2015.09.012

Internet of Things-Based Centralised Water Distribution Monitoring System Biswaranjan Bhola and Raghvendra Kumar

Abstract We have entered a new period of global development in which the world's economic regions depend on one another to secure natural resources and maintain stable economic conditions. Large parts of the world's rural regions rely on groundwater for drinking, and various surveys report that 80% of the diseases in the world are caused by poor-quality drinking water. This indicates that safe drinking water is scarce and that human society will increasingly depend on the water supply. Wastage of the supplied drinking water further accelerates its depletion day by day. A water metre is therefore a necessary component for reducing wastage and monitoring end-user consumption. It also helps improve the accuracy and applicability of the drinking water distribution network. In the proposed work, we design an Internet of Things (IoT)-enabled water metre, install it in four different houses to read household water usage, and test the device. All the water metres are connected to a central server that stores the data, which also lets us test the communication between the sensors and the server. The readings are analysed using graphs. The proposed work can help minimise the wastage of drinking water. Keywords IoT · Water metre · IoT architecture

Biswaranjan Bhola, Raghvendra Kumar: These authors contributed equally to this work. B. Bhola · R. Kumar (B) Department of Computer Science and Engineering, GIET University, Kharling, Gunupur 765022, Odisha, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_5

1 Introduction Water has always been of the greatest importance in people's lives. In the twenty-first century, however, the changing world is experiencing a shortage of natural resources, which makes the value of water for life even more apparent. Because water regenerates slowly, its use is justified primarily for sustaining life rather than for development purposes. With limited water available, using water resources for many purposes has put varying pressure on both their quality and quantity. This has led to water scarcity worldwide and has created significant challenges in access to water. Uneven and irregular water availability is a global phenomenon. The United Nations (hereinafter, UN) estimates that by 2025 water shortages will affect more than 1.8 billion people, and about a third of the world's population could live under extreme water stress (according to a UN report published in 2017). Although the problem of water stress is grave, efforts at various levels to address the challenges of accessibility and availability have had limited success. Part of the difficulty lies in the physical properties of water (for example, its different forms as liquid, ice, and glaciers), which hinder measuring or increasing the volume and accessibility of water. To address these challenges strategically, academia is increasingly exploring the need for water for life and well-being, along with related issues. India, for example, has thrived on rich surface water sources, such as the Ganga, Kaveri, Narmada, and Tapi rivers, which are spread all over the country. However, untreated wastewater and pollution from unknown sources enter the main rivers and other water bodies. As a result, approximately 90% of surface water, including groundwater, is polluted and cannot be used for drinking. Contamination with various pathogens, chemicals, particulate substances, and other pollutants is an emerging problem [1–3]. Consequently, less safe drinking water is available, and many regions of India face a serious water crisis.
According to the study conducted by the Integrated Water Management Index (IWMI 2018), major cities in India such as Hyderabad, Delhi, and Chennai will reach zero groundwater levels by 2020. This will affect approximately 100 million residents in these and other cities in the country. It is therefore necessary to introduce remedial measures, such as wastewater treatment and reuse, so that natural water resources are protected from pollutants. Access to a safe, affordable water supply and sustainable sanitation is declining across the country. As a result, uneven distribution of piped water and overflowing sewage from open drains have been observed in several urban and rural areas. This severely damages the living standards of urban residents, spreads unrest in the community, and increases environmental pollution. It is therefore necessary to implement corrective measures at the national level for a safe and clean water supply in urban and rural areas [4]. In response to rapid urban development in India, the government is implementing strategies for safe and clean drinking water and sanitation in collaboration with various communities, including international and national organisations. Water- and sanitation-related investments have skyrocketed worldwide due to population growth, urban development, and increased demand for freshwater resources. In this regard, several local governments and administrators have initiated many water supply works and maintained infrastructure. However, institutional weakness and a lack of financial resources have significantly hampered overall safe water supply activities [5]. In addition, developing sanitation strategies and the city's sanitation process will help reduce water pollution and keep natural waters clean. The operational capacity of the municipal sanitation system needs to be increased through technical and financial support, so that processes such as wastewater collection, transportation, treatment, and disposal can be carried out. The introduction of the Water Supply and Sanitation Service Development Programme (WSS) will also support institutional development plans and contribute to a sustainable water supply system in the country. Adopting urban infrastructure development programmes will help streamline water supply management and ensure an integrated water supply for all areas. To improve sanitation and safe water supply at urban and semi-urban levels, the Government of India has initiated many water and sanitation improvement programmes. These programmes relate to the development of infrastructure, governance, and educational frameworks at various levels. They will also help define the urban treatment process, integrate it with the proposed plan, and effectively solve problems in the sanitary cycle. To maintain the water sanitary cycle properly and reduce water wastage, a water metre plays a vital role in the entire water transport and management system. If water metres are connected through the IoT, all the metres installed in different houses can be controlled, and information about the water supply can be stored centrally. This information can then be used to design an automated business model and to predict future water supply requirements. Hence, the water metre is an important element for the future of government water departments as well as of households across the country. This paper discusses some water metres currently available in the market.
Information on commercial water metres was taken from e-commerce websites such as Amazon and Flipkart, and their features were analysed. This paper also presents an architecture for an IoT-enabled water metre. IoT applications include health monitoring systems, traffic management systems, and smart home automation systems [6–9]. The IoT provides a platform that connects physical devices into a single system through the Internet, and the devices are connected to their users through a defined architecture [10–12]. Hence, by integrating the water metre with the IoT, a device can be designed to manage and monitor the water supply in each home. The main goal of this work is to create a water metre network that connects three water metres installed in three different homes. The homes differ in the number of people living in each. One month of usage data from the water metres is kept on a centralised server. We used the XAMPP server for this study, but the system can also be deployed on the cloud. Each water metre is connected to the server through the Internet, the server stores the data, and graphs are then used to analyse water usage. The remainder of this paper is organised as follows. Section 2 presents the proposed cloud architecture for the water metre. Section 3 describes the backbone of the water metre network, and Sect. 4 explains the communication between the water metres and the server. Section 5 describes the developed water metre and its components, Sect. 6 presents the results, and Sect. 7 concludes the paper.


B. Bhola and R. Kumar

Fig. 1 Proposed IoT-enabled water metre architecture

2 Proposed Cloud Architecture for Water Metre

The architecture of a device is a blueprint that contains the functional components of a particular system. The proposed architecture is divided into four layers: the Sensors and Actuators layer, the IoT Device/Edge Computing layer, the Cloud Provider layer, and the Enterprise Network layer. The whole architecture and its functional units are depicted in Fig. 1.

2.1 Sensors and Actuators Layer

The Sensors and Actuators layer is the physical layer of the device. Each flow sensor contains a turbine. An algorithm computes the amount of water flowing through the sensor from the rotation and speed of the turbine. By measuring the flow over time, the full and empty states of the water tank can easily be determined. In other words, the water level can be found, which is very helpful for both the consumer and the water department when filling the tank. The actuator is the flow valve in the distributed water system and the water-motor switch in the single-tone system. In a single-tone system, water is lifted from a well or borewell. In a distributed system, the tank is filled by the water supply network.
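Finding the tank's full and empty states from flow measured over time amounts to integrating the flow rate. The Python sketch below illustrates this under several assumptions that the paper does not state. Readings are flow rates in litres per minute, sampled every `period_s` seconds. The tank starts at `start_l` litres, and no water is drawn while it fills. `capacity_l` is a hypothetical tank size. The function names are illustrative only.

```python
from typing import Iterable, Tuple


def litres_delivered(flow_lpm: Iterable[float], period_s: float) -> float:
    """Integrate flow-rate samples (L/min), one every period_s seconds, into litres."""
    return sum(q * period_s / 60.0 for q in flow_lpm)


def tank_state(flow_lpm: Iterable[float], period_s: float,
               capacity_l: float, start_l: float = 0.0) -> Tuple[str, float]:
    """Classify the tank as 'empty', 'partial' or 'full'.

    Assumes no water is drawn from the tank while it is being filled,
    so the level is the starting volume plus the integrated inflow.
    """
    level = min(start_l + litres_delivered(flow_lpm, period_s), capacity_l)
    if level >= capacity_l:
        return "full", level
    if level <= 0.0:
        return "empty", level
    return "partial", level


# One minute of 6 L/min inflow into a hypothetical 10 L tank.
print(tank_state([6.0] * 60, period_s=1.0, capacity_l=10.0))
```

A real installation would also need the outflow, for example from a second sensor, to track the level while water is being used.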

Internet of Things-Based Centralised Water Distribution …


2.2 IoT Device/Edge Computing Layer

The second layer is the IoT Device/Edge Computing layer. It is an intermediate layer between the Sensors and Actuators layer and the Cloud Provider layer. Its main objective is to manage the information generated by the sensors and transmit it to the central repository, i.e., the cloud. The functional components of this layer are data storage, data analytics, and AI models. Data storage is the buffer memory used to store sensor readings. Data analytics calculates the water flow rate and the water level in the tank. AI models predict device faults or leakages in the home water distribution system.
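The edge layer's buffer and its leakage check can be illustrated with the minimal Python sketch below. It is not the paper's implementation. `EdgeBuffer`, `possible_leak`, the buffer size and the 0.05 L/min threshold are all hypothetical. The leak check is a simple rule swapped in for the AI models mentioned in the text: it flags flow that never stops over a window of readings, for example overnight.

```python
from collections import deque
from typing import Deque, List, Sequence


class EdgeBuffer:
    """Bounded buffer that holds flow readings at the edge until they are uploaded."""

    def __init__(self, maxlen: int = 3600) -> None:
        # The oldest readings are dropped automatically once maxlen is reached.
        self.readings: Deque[float] = deque(maxlen=maxlen)

    def add(self, flow_lpm: float) -> None:
        self.readings.append(flow_lpm)

    def drain(self) -> List[float]:
        """Return all buffered readings for upload and empty the buffer."""
        batch = list(self.readings)
        self.readings.clear()
        return batch


def possible_leak(readings: Sequence[float], window: int,
                  threshold_lpm: float = 0.05) -> bool:
    """Rule-based stand-in for an AI model: flag a possible leak when the
    flow never drops to (near) zero during the last `window` readings."""
    if len(readings) < window:
        return False
    return all(q > threshold_lpm for q in list(readings)[-window:])
```

With `maxlen=3`, adding four readings keeps only the last three. This mirrors a fixed-size buffer memory that overwrites old data if an upload is delayed.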

2.3 Cloud Provider Layer

The Cloud Provider layer is the central repository of the whole system. All the information and applications required by the system are managed through this layer, including system security. This layer makes and analyses predictions such as the water requirement of an area, the water requirement of a home, and fund generation. Its functional components include the device registry, analytics, process management, and application logic. It is an intermediate layer between the Enterprise Network layer and the IoT Device layer.

2.4 Enterprise Network Layer

Through the Enterprise layer, end users such as consumers and the water department connect to the whole system. The layer provides the end-user interface. Its functional components are enterprise data, an enterprise user directory that manages user permissions, and enterprise applications. This section has explained the architecture of the whole system, the communication structure from the device to the user, and the functional components and basic function of each layer. The next section describes the backbone of the water metre network.
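An enterprise user directory that manages user permissions can be pictured as a role-based lookup. In the minimal Python sketch below, the roles, permission names and users are hypothetical, because the paper does not specify them.

```python
from typing import Dict, Set

# Hypothetical roles and permissions; the paper does not list them.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "consumer": {"view_own_usage"},
    "water_department": {"view_own_usage", "view_all_usage", "operate_valve"},
}

# Hypothetical enterprise user directory: user id -> role.
USER_DIRECTORY: Dict[str, str] = {
    "house_17": "consumer",
    "wd_operator": "water_department",
}


def is_allowed(user: str, permission: str) -> bool:
    """Return True if the user's role grants the requested permission."""
    role = USER_DIRECTORY.get(user)
    if role is None:
        return False  # unknown users get no access
    return permission in ROLE_PERMISSIONS.get(role, set())
```

Keeping permissions on roles rather than on individual users means a new household only needs a directory entry, not its own permission list.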

3 Backbone of Water Metre Network

The previous section explained the architecture of the entire IoT water energy metre network with its cloud components. For better monitoring, the network is divided village-wise into subnetworks. Since the entire backbone of the energy water metre network is complex, Fig. 2 shows an abstract diagram of it.



Fig. 2 Energy water metre network backbone

The Cloud Network System (CNS) is the top layer of the backbone. The CNS is cloud infrastructure that serves as the central repository and central point for analysing information. It maintains a distributed database system for efficient analysis. Each CNS is connected with the Water Network Service provider (WNS) in a mesh topology, and the WNS layer is connected with the CNS through the Internet. The WNS server is set up by the service provider company. The intermediate layers, i.e., the WNS, the Zone Water Network Service provider (ZWNS), and the Village Water Network Service provider (VWNS), function as edge computing devices of the entire network. The VWNS is connected with the ZWNS either through the Internet or through a LoRa network, depending on network availability. Many villages in a country like India have no Internet connectivity, so the low-power, long-range LoRa network [13] can be used to connect the VWNS to the ZWNS. Each house water metre is likewise connected with the VWNS through LoRa or the Internet, depending on availability. In both cases, the Internet has first priority. If it is not available, a LoRa network is set up for communication with the edge server in the village.
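The tier structure of Fig. 2 and the rule "Internet first, LoRa as fallback" can be sketched in Python as follows. This is an illustration under assumptions, not the authors' routing logic. The text allows the LoRa fallback only for the house-to-VWNS and VWNS-to-ZWNS links. It states that the WNS reaches the CNS over the Internet, but it does not describe the ZWNS-to-WNS link, which is assumed here to be Internet-only. The mesh between CNS and WNS nodes is not modelled. All names are illustrative.

```python
from typing import Optional

# Tiers of the backbone in Fig. 2, from the house metre up to the cloud.
TIERS = ["House metre", "VWNS", "ZWNS", "WNS", "CNS"]

# Tiers whose uplink may fall back to LoRa according to the text.
LORA_FALLBACK = {"House metre", "VWNS"}


def next_hop(tier: str) -> Optional[str]:
    """Return the tier one level up, or None for the CNS at the top."""
    i = TIERS.index(tier)
    return TIERS[i + 1] if i + 1 < len(TIERS) else None


def choose_link(tier: str, internet_up: bool, lora_up: bool) -> str:
    """Give the Internet first priority; use LoRa only as a fallback
    on the lower hops of the backbone."""
    if next_hop(tier) is None:
        raise ValueError("the CNS has no uplink")
    if internet_up:
        return "Internet"
    if lora_up and tier in LORA_FALLBACK:
        return "LoRa"
    raise ConnectionError(f"{tier}: no uplink to {next_hop(tier)}")
```

For example, a house metre in a village without Internet access gets `choose_link("House metre", False, True) == "LoRa"`.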



4 Communication Between Water Metres and Server

The previous section explained the backbone of the water metre network and how the metres are interconnected. This section explains the algorithms that the water metre and the edge server use to communicate. All communication is carried out using the Transmission Control Protocol (TCP) and the Hypertext Transfer Protocol (HTTP). Each smart metre is uniquely identified in the network by its IP address, and a unique port number is defined for the communication. Algorithm 1 shows the socket programme that reads data from the client (water metre) and stores it in the database. In steps 1 and 2, the server initialises its public IP address and port number, and it then verifies that the IP address is valid. The socket programme creates an end-to-end communication point between server and client and returns a socket descriptor sfd. Using sfd, the server binds to its address and listens for incoming connections. From each client it reads the address, the port number, and the water usage information, and it stores them in the database with the current timestamp.

Algorithm 1 Operation executed by the server
Require: Data read from the water metre
Ensure: Data stored in the database
1: Initialise ServerIP // IP address of the server
2: Initialise ServerPort // port number identifying the application
3: Database db
4: Store the stream address to sfd
5: if ServerIP = INVALID then
6:   Invalid address error
7: end if
8: Bind(sfd, ServerIP, ServerPort) and Listen(sfd)
9: while 1 do
10:   Wait until data is received from a client
11:   ClientAddress ⇐ ReadAddress()
12:   ClientPort ⇐ ReadPort()
13:   ClientData ⇐ ReadData()
14:   Update db(ClientAddress, ClientPort, ClientData)
15:   Send Ack to client
16: end while
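Algorithm 1 can be sketched in Python with the standard `socket` and `sqlite3` modules. This is a sketch under assumptions, not the authors' code. SQLite stands in for the XAMPP database, and the reading travels as an opaque text message over plain TCP rather than as an HTTP request. The `while 1` loop is bounded by a hypothetical `max_clients` parameter so that the example terminates. `open_server_socket`, `init_db` and `serve` are illustrative names.

```python
import ipaddress
import socket
import sqlite3
from datetime import datetime, timezone


def open_server_socket(server_ip: str, server_port: int) -> socket.socket:
    """Steps 1-8: validate the address, create sfd, bind it and listen."""
    ipaddress.ip_address(server_ip)  # raises ValueError for an invalid IP (steps 5-7)
    sfd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sfd.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sfd.bind((server_ip, server_port))
    sfd.listen()
    return sfd


def init_db(db: sqlite3.Connection) -> None:
    """Step 3: create the table that stores one row per reading."""
    db.execute("CREATE TABLE IF NOT EXISTS readings ("
               "client_ip TEXT, client_port INTEGER, data TEXT, ts TEXT)")


def serve(sfd: socket.socket, db: sqlite3.Connection, max_clients: int) -> None:
    """Steps 9-16, bounded to max_clients connections instead of while 1."""
    for _ in range(max_clients):
        conn, (client_ip, client_port) = sfd.accept()  # wait for a metre
        with conn:
            chunks = []
            while True:  # read until the metre closes its sending side
                chunk = conn.recv(1024)
                if not chunk:
                    break
                chunks.append(chunk)
            data = b"".join(chunks).decode("utf-8")
            # Step 14: store address, port and data with the current timestamp.
            db.execute("INSERT INTO readings VALUES (?, ?, ?, ?)",
                       (client_ip, client_port, data,
                        datetime.now(timezone.utc).isoformat()))
            db.commit()
            conn.sendall(b"ACK")  # step 15
```

Here step 8 is realised as `bind` followed by `listen`. The client's address and port come from `accept`, not from the payload.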

Algorithm 2 describes how the water metre transmits its water usage information to the edge server. Each water metre is assigned a unique Media Access Control (MAC) address, IP address, and port number. It sends the information through a socket programme, communicating with the server over HTTP by means of TCP/IP socket programming. Details of the water metre device are shown in Fig. 3.
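The client side of this exchange can be sketched as follows. The sketch assumes, without confirmation from the paper, that a reading is sent as a small JSON object over a plain TCP connection (cfd) and that the server replies with `ACK`. The paper's metre uses HTTP on an Arduino instead. The field names, `build_payload` and `send_reading` are hypothetical.

```python
import json
import socket


def build_payload(mac: str, flow_lpm: float) -> bytes:
    """Encode one reading; the JSON field names are hypothetical."""
    return json.dumps({"mac": mac, "flow_lpm": flow_lpm}).encode("utf-8")


def send_reading(server_ip: str, server_port: int, mac: str,
                 flow_lpm: float, timeout: float = 5.0) -> bool:
    """Open a TCP connection (cfd), send one reading and wait for the ACK."""
    with socket.create_connection((server_ip, server_port), timeout=timeout) as cfd:
        cfd.sendall(build_payload(mac, flow_lpm))
        cfd.shutdown(socket.SHUT_WR)  # signal that the reading is complete
        return cfd.recv(16) == b"ACK"
```

Including the MAC address in the payload lets the server tell metres apart even when several share a public IP address behind a router.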



Algorithm 2 Operation executed by the water metre client
Require: Data read from the flow sensor
Ensure: Data sent to the server for storage in the central database
1: Initialise MAC
2: Initialise SSID
3: Initialise IP
4: Initialise Port
5: Store the stream address to cfd
6: if ServerIP = INVALID then
7:   Invalid address error
8: end if
9: Start timer interrupts
10: PreviousMillis ⇐ 0
11: while 1 do
12:   flowRate ⇐ ((1000.0/(millis() − PreviousMillis)) ∗ pulse1Sec)/CalibrationFactor
13:   PreviousMillis ⇐ millis()
14:   Store flowRate
15:   while !(currentMillis − PreviousMillis > interval) do
16:     Wait
17:   end while
18: end while
19: InterruptHandler(Send flowRate to Server)
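Step 12 of Algorithm 2 converts a pulse count into a flow rate. The formula scales the pulses counted in the interval to pulses per second and divides by a sensor-specific calibration factor. The Python sketch below restates it and also computes the litres that passed in the interval. The calibration factor depends on the flow sensor's datasheet, so the value 4.5 used here is only an example. The function names are illustrative.

```python
def flow_rate_lpm(pulses: int, elapsed_ms: float, calibration_factor: float) -> float:
    """Step 12 of Algorithm 2.

    pulses             -- pulses counted during the interval (pulse1Sec)
    elapsed_ms         -- millis() - PreviousMillis
    calibration_factor -- pulses per second corresponding to 1 L/min (sensor-specific)
    """
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    pulses_per_second = (1000.0 / elapsed_ms) * pulses
    return pulses_per_second / calibration_factor


def litres_in_interval(flow_lpm: float, elapsed_ms: float) -> float:
    """Volume (L) that passed during the interval at the given flow rate."""
    return flow_lpm * elapsed_ms / 60000.0


# 45 pulses in one second with an example calibration factor of 4.5.
rate = flow_rate_lpm(45, 1000, 4.5)
print(rate, litres_in_interval(rate, 1000))
```

Summing `litres_in_interval` over successive intervals gives the cumulative consumption shown on the LCD.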

5 Develop Water Metre and Its Components

The water metre has four main components, each with a different function: an Arduino UNO, an Ethernet Shield, an LCD, and a flow sensor. The Arduino UNO is the processing element. It contains an 8-bit microcontroller with 28 pins, and its basic function is to read the signal from the flow sensor and process it. The LCD is the output component of the UNO and displays the flow rate and the water consumption. The Ethernet Shield contains the Network Interface Card (NIC) used to communicate with the network. The flow sensor measures the amount of water flowing through it. Figure 3 shows the water metre that was designed and installed in three different households to test it and record their water usage. In the next section (Results), we analyse the water usage of the three households over one week.

6 Results

Water is directly linked to human life, so water-related studies have been conducted on almost every topic. These studies are fundamentally problem-oriented and usually aim to solve water-related problems within a specific management area. To manage water properly, the water metre plays a vital role. The previous sections explained the architecture of the proposed water metre, its algorithms, the designed device, and its components. To verify the device, we first installed it in one house to check the flow rate. Figure 5a and b show the flow-rate analysis for that house. Figure 5a shows that opening three taps simultaneously increases the flow rate up to a particular level of emptiness of the storage tank. Beyond that level, the flow rate is the same whether three taps, two taps, or one tap are open. Figure 5b indicates that the water level is directly proportional to the flow rate. After testing was completed, the device was installed in three different households and their water consumption was recorded for one week. Household 1 has 2 family members, Household 2 has 3 family members, and Household 3 has 8 family members. Figure 4a and b show the water used by the three households over seven days. This device can be set up in every home to record water usage. This would greatly help the government, because it would receive automated usage information directly from homes. The resulting funds can be used to process quality water, so that consumers receive quality water and can lead a healthy and peaceful life.

Fig. 3 Used water metre

Fig. 4 Graph of water consumption in three households

Fig. 5 Analysis of the water flow rate
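Comparing households of 2, 3 and 8 members is easier on a per-person basis. The Python sketch below aggregates daily readings into weekly totals, daily means, and daily means per person. It is an illustrative analysis, not the paper's. The reading format `(household, day, litres)` and the function name are assumptions. The test values are synthetic, not the measured data behind Fig. 4.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Reading = Tuple[str, int, float]  # (household, day index, litres used that day)


def weekly_summary(readings: Iterable[Reading],
                   members: Dict[str, int]) -> Dict[str, Dict[str, float]]:
    """Total, mean daily, and mean daily per-person consumption per household."""
    totals: Dict[str, float] = defaultdict(float)
    days: Dict[str, Set[int]] = defaultdict(set)
    for house, day, litres in readings:
        totals[house] += litres
        days[house].add(day)
    summary = {}
    for house, total in totals.items():
        per_day = total / len(days[house])
        summary[house] = {
            "total_l": total,
            "per_day_l": per_day,
            "per_person_per_day_l": per_day / members[house],
        }
    return summary


# Household sizes as reported in the text.
MEMBERS = {"Household1": 2, "Household2": 3, "Household3": 8}
```

Dividing by the number of days actually recorded, not a fixed 7, keeps the averages correct if a metre misses a day.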

7 Conclusion

Water is essential for life, and the amount of drinking water is very small compared with the world's growing population. Hence, researchers in many fields have given importance to this area. The proposed work can help considerably to reduce water wastage and to provide quality water to society. Several issues may arise in implementing this architecture, such as electricity supply, Internet connectivity, security, performance, and water quality. These can be treated as future work. Water is life, and life without water cannot be imagined, so this work is needed by society.

References

1. Cox D, Van Nierkerk K, Govender V, Anton B, Smits S, Sullivan CA, Chonguica E, Monggawe F, Nyagwambo L, Pule R, Bonjean M (2008) Reaping the benefits: how local governments gain from IWRM. Southern Cross University
2. Sharholy M, Ahmad K, Mahmood G, Trivedi RC (2008) Municipal solid waste management in Indian cities: a review. Waste Manage 28(2):459–467
3. Maria A (2003) The costs of water pollution in India. In: Market development of water & waste technologies through environmental economics, vol 99, no 8, pp 459–480
4. Sharma Y (1997) Case study I: The Ganga, India. In: Water Pollution Control: a guide to the use of water quality management principles, vol 21, no 2, pp 480–488
5. Greenstone M, Hanna R (2011) Environmental regulations, air and water pollution, and infant mortality in India. Am Econ Rev 20(7):409–488
6. Motlagh NH, Khajavi SH, Jaribion A, Holmstrom J (2018) An IoT-based automation system for older homes: a use case for lighting system. IEEE, pp 247–252
7. Imran Quadri SA, Sathish P (2017) IoT based home automation and surveillance system. In: ICICCS. IEEE, pp 861–866
8. Shaikh SA, Kapare AS (2017) Intelligent office area monitoring and control using internet of things. IJAREEIE 6:4705–4711
9. Baker SB, Xing W, Atkinson I (2017) Internet of things for smart healthcare: technologies, challenges, and opportunities. IEEE Access 5:26521–26544
10. Munir A, Kansakar P, Khan SU (2017) IFCIoT: integrated fog cloud IoT. IEEE Consumer Electronics Magazine, pp 74–82, July 2017
11. Riazul Islam SM, Nazim Uddin M, Kwak KS (2016) The IoT: exciting possibilities for bettering lives. IEEE Consumer Electronics Magazine, pp 49–57, April 2016
12. Sabella D, Vaillant A, Kuure P, Rauschenbach U, Giust F (2016) Mobile edge computing architecture. IEEE Consumer Electronics Magazine, pp 84–91, October 2016

StakePage: Analysis of Stakeholders of an Information System Using Page Rank Algorithm

Tanveer Hassan, Chaudhary Wali Mohammad, and Mohd. Sadiq

Abstract The requirements elicitation process is the sub-process of requirements engineering in which stakeholders are involved in identifying the requirements of an information system. Various methods, such as goal-oriented requirements elicitation and traditional methods, have been developed to identify and analyze stakeholders. Based on our review, we found that little attention has been given to analyzing stakeholders with social network measures before the requirements elicitation process starts. To address this issue, this paper presents the StakePage method, which analyzes stakeholders using the page rank algorithm, one of the social network measures. The applicability of the StakePage method is demonstrated with the stakeholders of an institute examination system. Finally, the results are compared with two state-of-the-art methods, i.e., the StakeRare and StakeSoNet methods.

Keywords Requirements elicitation · Goal-oriented requirements elicitation · Stakeholders · Social network measures

1 Introduction

The requirements elicitation process identifies the needs of stakeholders using techniques such as traditional methods and goal-oriented methods. These methods can only be effective if the stakeholders have been identified according to the needs of the project. In real-life applications, several stakeholders participate in identifying the requirements of an information system. How to identify the key stakeholders from the initial set of stakeholders is therefore an important research issue. Accordingly, the objective of this paper is to analyze the stakeholders for the successful development of an information system [1].

T. Hassan · C. W. Mohammad
Department of Applied Sciences and Humanities, Faculty of Engineering and Technology, Jamia Millia Islamia, A Central University, New Delhi 110025, India
e-mail: [email protected]

Mohd. Sadiq (B)
Software Engineering Laboratory, Computer Engineering Section, UPFET, Jamia Millia Islamia, A Central University, New Delhi 110025, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_6

Social network analysis (SNA) of stakeholders investigates the social structures of the stakeholders using network and graph theory. SNA is the quantitative and qualitative analysis of a social network, and it measures the flow of relationships between knowledge-possessing entities. The social network of stakeholders reveals the behavior of the stakeholders at the micro-level and the pattern of relationships among them [2]. We have identified the following tools to examine social structures [3]: (1) AllegroGraph, (2) Commetrix, (3) Social Network Visualizer, (4) Java Universal Network/Graph Framework (JUNG), (5) Tulip, (6) Statnet, (7) Netlytic, (8) NetworkX, (9) Cytoscape, (10) Subdue, (11) Graphviz, (12) NetMiner, (13) SocioViz, (14) NetworkKit, (15) GraphStream, (16) R, (17) Pajek, and (18) Gephi. Among these tools, Gephi is widely used to analyze the social structures of stakeholders. In the SNA literature, the following graph-theoretic centrality measures identify the importance of a node in a network: degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and page rank centrality [4]. Each centrality measure has its own significance. For example, the degree centrality of a node in a graph G is the number of links held by that node. This centrality identifies the popular, well-connected stakeholders who hold most of the information about a project. Bibi et al. [5] discussed applications of online social networks for identifying influential authors in an academic network. They used the Digital Bibliography & Library Project (DBLP) to obtain the citation count and h-index of each author.
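Degree centrality as described above, i.e., the number of links held by each node, can be computed directly from a list of directed recommendation links. The Python sketch below uses a hypothetical three-link graph and reports raw counts. A normalised variant would divide the total degree by n − 1, where n is the number of nodes. The function name is illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (recommender, recommended stakeholder)


def degree_centrality(edges: Iterable[Edge]) -> Dict[str, Dict[str, int]]:
    """In-degree, out-degree and total degree (number of links) of every node."""
    edges = list(edges)
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    nodes = set(out_deg) | set(in_deg)
    return {n: {"in": in_deg[n], "out": out_deg[n], "total": in_deg[n] + out_deg[n]}
            for n in sorted(nodes)}


# Hypothetical network: St1 and St2 recommend St3, and St3 recommends St4.
print(degree_centrality([("St1", "St3"), ("St2", "St3"), ("St3", "St4")]))
```

In this toy network St3 holds the most links and would be treated as the best-connected stakeholder.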
Based on our review, we found that several stakeholders are present during the development of a project. It is therefore important to identify the key and most influential stakeholders among those who can participate in the development. In the literature, most of the focus is on identifying stakeholders rather than on analyzing them with social network measures. Only a few studies have discussed the application of social networks during requirements analysis. For example, Hassan et al. [4] developed the StakeSoNet methodology, which used degree, betweenness, and closeness centrality to analyze the stakeholders of an institute examination system. One limitation of this method is that it fails to uncover the influential and key stakeholders whose reach extends beyond their direct connections. To address this issue, we present the StakePage method, which uses the page rank algorithm to analyze the stakeholders of an information system. The contributions of the paper are as follows:

1. The StakePage method has been developed to analyze the stakeholders of an institute examination system using one of the social network measures, i.e., the page rank algorithm.
2. The StakePage method is compared with two state-of-the-art methods, i.e., the StakeRare and StakeSoNet methodologies, using data gleaned from the work of Hassan et al. [4].

The remainder of this paper is organized as follows. Related work on stakeholder analysis in the context of software project management is discussed in Sect. 2. The proposed StakePage method is presented in Sect. 3. Its applicability is demonstrated in Sect. 4 using the requirements of an institute examination system. A comparative study of the proposed method and the two state-of-the-art methods is given in Sect. 5. Finally, the conclusion and future work are given in Sect. 6.

2 Related Work

SNA is a key tool for investigating the necessary information about stakeholders, and it is used to deal with uncertainty and missing information in a social network. Ghali et al. [6] applied SNA to the DBLP dataset to identify the number of publications in co-authorship relationships among the authors of research articles. Different social network measures were compared in [7]. Damian et al. [8] focused on the challenges of global software development and reported applications of SNA for exploring collaboration and awareness among team members. Babar et al. [9] proposed StakeMeter, which analyzes stakeholders using a value-based requirements engineering process. Their method considers different stakeholder factors, such as the risk factor, the communication factor, and the skill factor. Glinz and Wieringa [10] developed a process that classifies stakeholders into three key categories, i.e., critical, major, and minor. This process does not include stakeholder identification or quantification. Lim and Finkelstein [11] proposed the StakeRare method, which uses social networks and collaborative filtering to identify and prioritize software requirements. This method identifies stakeholders for a large software project and asks them to recommend other stakeholders. Based on these recommendations, a social network is built in which the nodes represent stakeholders and the links represent recommendations. StakeRare was evaluated on a software project for a 30,000-user system, and it was found to predict stakeholders' needs accurately. One limitation of StakeRare is that crisp values were used for the stakeholder recommendations when constructing the social network of stakeholders. To overcome this limitation, Hassan et al. [4] developed the StakeSoNet methodology, in which a fuzzy-based approach was used for the stakeholder recommendations. Their method used three social network measures to analyze the stakeholders, i.e., degree, betweenness, and closeness centrality. These centralities fail to identify the stakeholders whose influence extends beyond their direct connections. To address this issue, we propose the StakePage methodology, which uses the page rank algorithm to analyze the stakeholders of an information system.

3 Proposed Method

This section presents StakePage, a method for identifying and analyzing the stakeholders of an information system using the page rank algorithm. The StakePage method includes the following steps:

• Step 1: Identification of stakeholders and their roles
• Step 2: Draw the social network of stakeholders
• Step 3: Analyze the stakeholders using page rank algorithm
• Step 4: Classify the stakeholders based on their importance.

Step 1: Identification of stakeholders and their roles

A stakeholder can be defined as an individual or group who has an interest in the development of a project. Stakeholders are the key players in the development of an information system [12]. The aim of this step is to identify the stakeholders St1, St2, …, Stp based on the requirements of the project. The initial set of stakeholders is identified by analyzing the existing documents. The roles of the stakeholders are then discussed and finalized, so that the stakeholders who can contribute to the successful development of the project are identified. The initial stakeholders can recommend other stakeholders based on the needs of the project, and these recommendations are stored in a database, as shown in Table 1. For example, stakeholder St1 can recommend a new stakeholder Stk, where k < p, to estimate and analyze the cost of the project with a high confidence value.

Table 1 Stakeholder's matrix
S. No. | Recommended by | New stakeholder | Role of the new stakeholder | Confidence value
1 | St1 | Stk | Cost analysis | High
2 | St5 | Stj | Security analysis | Medium

Step 2: Draw the social network of stakeholders

In this step, the relationships among the stakeholders are identified and represented as a social network, referred to as the social network of stakeholders (Fig. 1). Figure 1 contains seven stakeholders, and a directed edge from one stakeholder to another represents a recommendation. For example, stakeholder St1 recommends stakeholder St2, so there is a directed path from St1 to St2. The roles and confidence values of the stakeholders are stored in the stakeholder matrix.

Fig. 1 Social network of stakeholders

Step 3: Analyze the stakeholders using page rank algorithm

Analyzing the identified stakeholders is one of the key steps of StakePage. In this step, the key stakeholders are identified from the social network graph of the stakeholders using the page rank algorithm. The page rank algorithm [13], also known as the Google algorithm, determines which web pages are important. It has been widely used in areas such as social media, web information retrieval, and econometrics. The basic idea of this step is to compute the importance of each stakeholder before the requirements elicitation process starts: a stakeholder is important if other stakeholders point to it. Suppose there are p stakeholders in a project, connected as a directed graph G = (V, E). The importance I(k) of stakeholder k can be computed as

$$I(k) = \sum_{(j,k) \in E} \frac{I(j)}{\mathrm{Out}_j} \qquad (1)$$

where $\mathrm{Out}_j$ is the out-degree of stakeholder j, i.e., the number of stakeholders that j recommends.

The above Eq. (1) can be generalized as

$$I_{t+1} = \mathrm{STM}^{T} I_{t} \qquad (2)$$

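In Eq. (2), STM is the "state transition matrix" of the stakeholders and t is the iteration number. The state transition matrix STM can be represented as

$$\mathrm{STM} = \begin{bmatrix} \mathrm{STM}_{11} & \mathrm{STM}_{12} & \cdots & \mathrm{STM}_{1p} \\ \mathrm{STM}_{21} & \mathrm{STM}_{22} & \cdots & \mathrm{STM}_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{STM}_{p1} & \mathrm{STM}_{p2} & \cdots & \mathrm{STM}_{pp} \end{bmatrix}$$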

Suppose a social network graph of stakeholders contains the following connections among three nodes: A ↔ B → C. By Eq. (1), the importance of A is $I(A) = I(B)/2$, where 2 is the out-degree of stakeholder B. If the importance of B is 3, then $I(A) = 3/2 = 1.5$. Similarly, since A has out-degree 1, $I(B) = I(A)/1 = 1.5$.



Let $I_0 = (I_0(1), I_0(2), \ldots, I_0(p))^T$ be a column vector and STM a $p \times p$ transition probability matrix. We have

$$\sum_{t=1}^{p} I_0(t) = 1 \qquad (3)$$

$$\mathrm{STM}_{tj} = \begin{cases} \dfrac{1}{\mathrm{Out}_t}, & \text{if } (t, j) \in E \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$

After Z iterations, $I_Z$ converges to a steady-state probability vector $\mu$ regardless of the choice of $I_0$, i.e., $\lim_{Z \to \infty} I_Z = \mu$. This limit exists and is independent of $I_0$ when STM is row-stochastic and the underlying Markov chain is irreducible and aperiodic. At the steady state, $I_Z = I_{Z+1} = \mu$, and thus $\mu = (\mathrm{STM})^T \mu$, where $\mu$ is the principal eigenvector of $(\mathrm{STM})^T$ with eigenvalue 1.

Step 4: Classify the stakeholders based on their importance

The aim of this step is to classify the stakeholders based on the importance values obtained in the previous step. Not all identified stakeholders can participate in the elicitation process. It is therefore important to identify those who have good knowledge of the different types of functional requirements (FRs) and non-functional requirements (NFRs). This step produces the classified list of stakeholders with their importance values. The list will be useful during the elicitation of the software requirements of the information system, and also during its development.
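Equations (2) to (4) describe a power iteration. The Python sketch below is one way to implement it. It makes two additions that are standard in PageRank but are not part of Eq. (4), so they are assumptions here. The first is a damping factor `damping`, with 0.85 as the commonly used value. The second redistributes, uniformly, the importance held by stakeholders with no out-links, whose rows of STM would otherwise be all zeros. With `damping` < 1 the iteration matrix is strictly positive, so the iteration reaches the same µ from any starting vector that satisfies Eq. (3). `transition_matrix` and `page_rank` are illustrative names.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (t, j): stakeholder t recommends stakeholder j


def transition_matrix(nodes: List[str], edges: Iterable[Edge]) -> List[List[float]]:
    """Eq. (4): STM[t][j] = 1 / Out_t if (t, j) is an edge, else 0."""
    index = {n: i for i, n in enumerate(nodes)}
    edge_set = set(edges)  # duplicate recommendations count once
    out = [0] * len(nodes)
    for t, _ in edge_set:
        out[index[t]] += 1
    stm = [[0.0] * len(nodes) for _ in nodes]
    for t, j in edge_set:
        stm[index[t]][index[j]] = 1.0 / out[index[t]]
    return stm


def page_rank(nodes: List[str], edges: Iterable[Edge], damping: float = 0.85,
              tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Iterate Eq. (2), I_{t+1} = STM^T I_t, with damping and dangling-node handling."""
    stm = transition_matrix(nodes, edges)
    p = len(nodes)
    imp = [1.0 / p] * p  # I_0 satisfies Eq. (3)
    for _ in range(max_iter):
        # Importance held by stakeholders with no out-links is spread uniformly.
        dangling = sum(imp[t] for t in range(p) if not any(stm[t]))
        new = []
        for j in range(p):
            # (STM^T I)_j = sum over t of STM[t][j] * I(t)
            flow = sum(stm[t][j] * imp[t] for t in range(p))
            new.append((1.0 - damping) / p + damping * (flow + dangling / p))
        converged = sum(abs(a - b) for a, b in zip(new, imp)) < tol
        imp = new
        if converged:
            break
    return dict(zip(nodes, imp))


# The three-node example A <-> B -> C from the text.
print(page_rank(["A", "B", "C"], [("A", "B"), ("B", "A"), ("B", "C")]))
```

Each iteration preserves the total importance of 1, so the result can be read directly as the steady-state vector µ of Eq. (3).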

4 Case Study

The aim of this section is to explain the steps of the StakePage methodology using the institute examination system (IES) as a case study. The IES has been widely used in software engineering research. For example, Sadiq and Devi [14] developed a method for selecting and prioritizing the requirements of an IES using fuzzy soft-set theory. In another study, Sadiq et al. [15] developed a method for analyzing requirements under incomplete linguistic preference relations.

Step 1: Identification of stakeholders and their roles

In this step, we identified p = 18 stakeholders based on the requirements of an IES. Stakeholder St1, i.e., the director of the institution, recommends the controller of examination (CoE), i.e., St3, with a very high (VH) confidence value, to understand the needs of the IES. Stakeholder St2 also recommends St3 for the same purpose with a high confidence value. This process was repeated to recommend the remaining stakeholders of the IES. In this study, we used the following linguistic variables to specify the confidence value of each recommendation in the stakeholder matrix: very high (VH), high (H), medium (M), low (L), and very low (VL). The information about the stakeholders is given in Table 2.

Table 2 Stakeholder's matrix
S. No. | Recommended by | New stakeholder | Role of the new stakeholder | Confidence value
1 | St1 | St3 | Examination activities | Very high
2 | St2 | St3 | Examination activities | High
3 | St3 | St4 | Requirements analyst | High
4 | St3 | St5 | Developer | High
5 | St3 | St6 | Assistant CoE | Very high
6 | St6 | St7 | Head | Medium
7 | St6 | St8 | Faculty | Low
8 | St6 | St9 | Student | Very low
9 | St4 | St10 | Requirements analyst for teacher's module | Medium
10 | St4 | St11 | Requirements analyst for student's module | Medium
11 | St4 | St12 | Requirements analyst for administrative module | Medium
12 | St4 | St13 | Graphical user interface (GUI) designer | Medium
13 | St5 | St14 | Database administrator | Medium
14 | St5 | St15 | Tester | Medium
15 | St10 | St16 | Requirements analyst for NFRs | High
16 | St10 | St17 | Interface designer | High
17 | St11 | St16 | Requirements analyst for NFRs | Medium
18 | St11 | St17 | Interface designer | Medium
19 | St12 | St16 | Requirements analyst for NFRs | Medium
20 | St12 | St17 | Interface designer | Medium
21 | St13 | St14 | Database administrator | Medium
22 | St4 | St18 | Requirements modeling engineer | High



Fig. 2 Identified stakeholders of an IES [4]

Step 2: Draw the social network of stakeholders. In this step, we drew the social network of the 18 stakeholders, which is shown in Fig. 2. In our case study, we identified the same set of stakeholders as in our previous work [4]. Step 3: Analyze the stakeholders using the PageRank algorithm. The state transition matrix (STM) of the stakeholders of Fig. 2 is given below:


After applying Eq. (2), the importance scores of stakeholders St1 to St18 are, respectively, 0.0, 0.0, 2.0, 0.33, 0.50, 0.33, 0.33, 0.33, 0.33, 0.167, 0.167, 0.167, 0.167, 1.5, 0.5, 1.5, 1.5, and 0.167. Step 4: Classify the stakeholders based on their importance. Based on our analysis, stakeholder St3 has the highest importance, followed by St14, St16, and St17. Stakeholders St10, St11, and St12 will elicit the functional requirements (FRs), whereas the non-functional requirements (NFRs) of the IES will be elicited by stakeholder St16.
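Eq. (2) is not reproduced in this section, so the sketch below should not be read as the exact StakePage computation. It shows the standard PageRank power iteration applied to the unweighted network of Table 2, assuming a damping factor of 0.85 and spreading the rank of stakeholders who recommend nobody (dangling nodes) evenly over all nodes. These choices and the helper name `pagerank` are assumptions, and its scores, and possibly its ordering, may differ from the values reported above.

```python
# Directed edges (recommended_by -> new_stakeholder) taken from Table 2.
EDGES = [
    ("St1", "St3"), ("St2", "St3"), ("St3", "St4"), ("St3", "St5"), ("St3", "St6"),
    ("St6", "St7"), ("St6", "St8"), ("St6", "St9"), ("St4", "St10"), ("St4", "St11"),
    ("St4", "St12"), ("St4", "St13"), ("St5", "St14"), ("St5", "St15"),
    ("St10", "St16"), ("St10", "St17"), ("St11", "St16"), ("St11", "St17"),
    ("St12", "St16"), ("St12", "St17"), ("St13", "St14"), ("St4", "St18"),
]


def pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Standard PageRank by power iteration on a directed, unweighted graph."""
    # Stakeholder IDs have the form "St<number>"; sort them numerically.
    nodes = sorted({n for edge in edges for n in edge}, key=lambda s: int(s[2:]))
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                # Each node passes its rank in equal shares to the nodes it recommends.
                new_rank[dst] += damping * rank[src] / len(targets)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


ranks = pagerank(EDGES)
for name, score in sorted(ranks.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{name}: {score:.4f}")
```

Because the total rank is conserved at every iteration, the scores always sum to 1. Stakeholders with no recommenders (St1 and St2) receive only the baseline share.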

5 Comparative Study This section compares the proposed method with two selected methods, i.e., the StakeRare and StakeSoNet methodologies. The StakeRare method does not analyze stakeholders with the different social network measures that can be used to identify the most influential stakeholders of a project, and it uses crisp values for the analysis of the stakeholders. In real life, however, people use linguistic variables to specify their preferences for requirements. To address this issue, the StakeSoNet methodology was developed, in which the L−1R−1 inverse arithmetic principle was employed to model the linguistic variables. In the StakeSoNet methodology, the degree centrality, betweenness centrality, and closeness centrality of the stakeholders were computed to analyze them. One limitation of these methods is that they fail to identify influential key stakeholders whose reach extends beyond their direct connections. To address this issue, the StakePage method has been developed to analyze stakeholders using the PageRank algorithm, and it is applied to the requirements of an IES.

6 Conclusion and Future Work This paper presents the StakePage method for the analysis of stakeholders using the PageRank algorithm. The method includes the following steps: (a) identify the stakeholders and their roles, (b) draw the social network of the stakeholders, (c) analyze the stakeholders using the PageRank algorithm, and (d) classify the stakeholders based on their importance. The method has been applied to analyze the stakeholders of an IES. The information about the stakeholders and their recommendations of other stakeholders is stored in a stakeholder matrix with four attributes: recommended by, new stakeholder, role of the new stakeholder, and confidence value. In our study, we identified 18 stakeholders from the stakeholder matrix. The analysis shows that stakeholder St3 has the highest importance and that stakeholder St16 is a key stakeholder for eliciting the NFRs of the IES. Among the various stakeholders, St14, St16, and St17 are the key stakeholders of the IES. The future research agenda includes the following:
• To develop an extended StakeRare methodology using fuzzy logic
• To extend the StakePage methodology using rough and fuzzy-set theory.

References
1. Stokman FN (2001) Networks: social. In: International encyclopedia of the social and behavioral sciences, pp 10509–10514
2. Brandes U: Social network algorithms and software. In: International encyclopedia of the social and behavioral sciences, 2nd edn, pp 454–460
3. Tools: https://www.rankred.com/free-social-network-analysis-tools/
4. Hassan T, Mohammad CW, Sadiq M (2020) StakeSoNet: analysis of stakeholders using social networks. In: IEEE 17th India Council international conference, India, pp 1–6
5. Bibi F, Khan HU, Iqbal T, Farooq M, Mehmood I, Nam Y (2018) Ranking authors in an academic network using social network measures. Appl Sci 8:1824–1842
6. Ghali N, Panda M, Hassanien AE, Abraham A, Snasel V (2012) Social networks analysis: tools, measures and visualization. In: Abraham A (ed) Computational social networks. Springer, London, pp 3–23
7. Guzman JD, Deckro RF, Robbins MJ, Morris JF, Ballester NA (2014) An analytical comparison of social network measures. IEEE Trans Comput Soc Syst 1(1):35–45
8. Damian D, Marczak S, Kwan I (2007) Collaboration patterns and the impact of distance on awareness in requirements-centred social networks. In: 15th IEEE international requirements engineering conference, pp 59–68
9. Babar MI, Ghazali M, Jawawi DNA, Zaheer KB (2015) StakeMeter: value-based stakeholder identification and quantification framework for value-based software systems. PLOS ONE 10(3):1–33
10. Glinz M, Wieringa RJ (2007) Guest editors' introduction: stakeholders in requirements engineering. IEEE Softw 24(2):18–20
11. Lim SL, Finkelstein A (2012) StakeRare: using social networks and collaborative filtering for large-scale requirements elicitation. IEEE Trans Software Eng 38(3):707–735
12. Pacheco C, Garcia I (2012) A systematic literature review of stakeholder identification methods in requirements elicitation process. J Syst Softw 85(9):2171–2181
13. Massimo F (2010) PageRank: standing on the shoulders of giants. Commun ACM 54(6):92–101
14. Sadiq M, Devi VS (2022) Fuzzy-soft set approach for ranking the functional requirements of software. Expert Syst Appl 193:1–11
15. Sadiq M, Parveen A, Jain SK (2021) Software requirements selection with incomplete linguistic preference relations. Bus Inf Syst 63:669–688

Violence Recognition from Videos Using Deep Learning

Shivam Rathi, Shivam Sharma, Sachin Ojha, and Kapil Kumar

Abstract Violence recognition is the process of recognizing violent activities in videos; applied to video surveillance, it can be used to monitor violence in public places. In this paper, a CNN is implemented with the MobileNetV2 architecture and OpenCV for violence recognition. We train the system on real-life videos: half of the dataset contains violent videos and the other half contains safe (non-violent) videos. The proposed model achieves an accuracy of 94%. Keywords Violence recognition · CNN algorithm · Computer vision · Deep learning · Supervised · Violence

1 Introduction Violence recognition using deep learning is a challenging task in computer science. Computer vision provides various techniques for working with video and image datasets [1]. In this paper, a violence recognition system is proposed that takes a video (with or without audio) as input, recognizes violent activities in the frames generated from it, and outputs each frame labelled as safe or violent. The paper focuses on frames taken from input video coming from CCTV or other sources [2]. After trying some basic approaches for violence recognition, a CNN with MobileNetV2 was chosen for this paper.

S. Rathi (✉) · S. Sharma · S. Ojha · K. Kumar Meerut Institute of Engineering and Technology, Meerut, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_7


S. Rathi et al.

MobileNetV2 is a 53-layer-deep convolutional neural network designed for mobile and embedded vision applications. A convolutional neural network (CNN) is a type of artificial neural network that is very useful when working with images and videos [3]. A CNN uses deep learning to learn features directly from the data, and it builds on the multilayer perceptron. The layers of a CNN are the input layer, output layer, hidden layers, pooling layers, and fully connected layers [4]. This paper uses a CNN as the main algorithm to train our model: the CNN extracts features from frames, learns from the images, and classifies them as violent or safe. The main objective of this paper is to reduce human intervention in monitoring the videos collected from CCTV cameras in public places to recognize violent activities in them [5]. We hope this paper will help further work on violence recognition in the computer vision and deep learning domain [6]. The remainder of this paper is organized as follows: Sect. 2 reviews related work, Sect. 3 describes the research gap and the dataset, Sect. 4 presents the proposed methodology, Sect. 5 discusses the experimental results, Sect. 6 presents the classification report, and Sect. 7 concludes the paper.

2 Related Work With the increasing number of surveillance cameras in modern cities, a huge number of videos can be collected, yet there are not enough human operators to monitor all the screens at once. This motivates video-understanding techniques that recognize violent behaviour automatically [7]. In this line of research, statistical analysis, morphological, and thresholding techniques have been proposed to process the frames obtained from sample violence videos, and convolutional neural network (CNN) models are used for the same purpose. Various studies on this topic use different classification techniques [8]. The problem can also be solved with SVM and KNN, but this paper uses a CNN with MobileNetV2.

3 Research Gap In December 2019, Soliman et al. proposed a model that used VGG16 pre-trained on ImageNet as a spatial feature extractor, followed by a long short-term memory (LSTM) network as a temporal feature extractor and a sequence of fully connected layers for classification.


The dataset used was the Real Life Violence Situations dataset, which contains 2000 videos: 1000 violence videos and 1000 non-violence videos. The proposed models achieved a best accuracy of 88.2% [13]. Dataset In this paper, the Real Life Violence Situations dataset is used; it consists of video clips containing violence and normal video clips containing ordinary activities. The two types of videos are stored in separate directories. The dataset contains 1000 violence video clips and 1000 non-violence video clips. Due to memory constraints, only 350 violence and 350 non-violence video clips are used in this paper.

4 Proposed Methodology Figure 1 shows the block diagram of the proposed system and the control flow of its steps: frame generation, frame image augmentation, checking for violence, and labelling each frame with its class after classification. Step 1: Data splitting. In this paper, 70% of the dataset videos are used for training and 30% for validation; that is, 245 violence and 245 non-violence videos are used for training, and 105 violence and 105 non-violence videos for validation. Step 2: Dataset preprocessing. In this step, frames are generated from the video clips using the computer vision library OpenCV, and the frames are then augmented and preprocessed. Augmentation enlarges the dataset to mitigate overfitting. Image frames are extracted from the video clips, each frame is stored in the dataset, and it is resized to 128 × 128 × 3 to reduce computation time. Step 3: Developing a neural network model. The dataset is split into training and testing sets. A MobileNetV2 pre-trained model with a CNN classifier for frame classification is then fitted on the training set. Each frame is fed into the neural network and passes through the following layers: a zero padding layer, a convolutional layer, a batch normalization layer, a sigmoid activation layer, a max pooling layer (twice), a flatten layer, and a dense fully connected layer with one neuron. Step 4: Training and experimentation on datasets. Training and testing are performed with the MobileNetV2 pre-trained model and the CNN classifier on the dataset. The model is trained for 50 epochs, and loss and accuracy plots are constructed. The accuracy of the model is also calculated.

Fig. 1 Block diagram

Convolutional neural network (CNN) Figure 2 shows the layers of a CNN. A convolutional neural network works best when images are used as input. It learns filters that extract features from an image, and the learned filters are then used to recognize the same features in new inputs. Its architecture is loosely analogous to the neurons of a human brain. A CNN requires less data preprocessing than other deep learning algorithms, and CNNs have driven great progress in the computer vision domain. A CNN builds on the multilayer perceptron; its workflow is input layer, convolutional layer, pooling layer, and fully connected layer. A CNN is a location-invariant algorithm for recognizing objects and features in images. In this paper, a CNN with the MobileNetV2 pre-trained model is used to extract features from the frames of the Real Life Violence Situations dataset; with this model, a total of 1281 trainable parameters are obtained.
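The counts in Step 1 follow from splitting each class separately. The sketch below shows one way to perform this per-class 70/30 split in plain Python; the file names, the shuffling, the fixed seed, and the helper name `split_clips` are hypothetical assumptions, not details given in the paper. Rounding the cut point (rather than truncating it) matters here, because `350 * 0.7` can evaluate to slightly less than 245 in floating point.

```python
import random


def split_clips(clips, train_fraction=0.7, seed=42):
    """Shuffle one class's clip list and split it into training and validation subsets."""
    shuffled = list(clips)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)  # round, not int(), to avoid 244
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the 350 clips used per class.
violence = [f"violence/V_{i}.mp4" for i in range(1, 351)]
non_violence = [f"non_violence/NV_{i}.mp4" for i in range(1, 351)]

v_train, v_val = split_clips(violence)
nv_train, nv_val = split_clips(non_violence)
print(len(v_train), len(v_val), len(nv_train), len(nv_val))  # 245 105 245 105
```

Splitting each class on its own keeps the training and validation sets balanced between violence and non-violence clips, as in the counts reported above.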


Fig. 2 CNN layers

Layered Configuration The input layer takes an image frame of size 128 × 128 with three colour channels. The first convolutional layer produces a 64 × 64 output with 32 channels, followed by a batch normalization layer and a ReLU layer. The network is then expanded depth-wise, and the same process is repeated over 16 blocks, after which an output of size 4 × 4 × 1280 is obtained. This passes through a 2D global average pooling layer and finally the dense layer, which has 1281 trainable parameters.
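The numbers in the layered configuration can be checked with simple arithmetic. The sketch below assumes that the MobileNetV2 base is frozen and that only the final single-neuron dense layer is trained. This is one reading consistent with the reported 1281 trainable parameters (1280 weights plus one bias); the paper does not state it explicitly. In Keras, such a base would typically come from `tf.keras.applications.MobileNetV2(input_shape=(128, 128, 3), include_top=False)`, but the sketch uses plain Python so it runs without TensorFlow. The helper names are illustrative.

```python
# MobileNetV2 downsamples by 2 five times (the first conv plus four strided
# bottleneck stages), giving an overall output stride of 32.
STRIDE2_LAYERS = 5


def feature_map_size(input_size, stride2_layers=STRIDE2_LAYERS):
    """Spatial size after the given number of stride-2 layers with 'same' padding."""
    size = input_size
    for _ in range(stride2_layers):
        size = -(-size // 2)  # ceiling division, as 'same' padding does
    return size


def dense_params(in_features, units=1):
    """Trainable parameters of a dense layer: one weight per input per unit, plus one bias per unit."""
    return in_features * units + units


first_conv_size = feature_map_size(128, stride2_layers=1)  # 64, i.e. a 64 x 64 x 32 output
final_size = feature_map_size(128)                         # 4, i.e. 4 x 4 x 1280 before pooling
# Global average pooling collapses 4 x 4 x 1280 into a vector of 1280 features.
head_params = dense_params(1280)                           # 1280 weights + 1 bias
print(first_conv_size, final_size, head_params)  # 64 4 1281
```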

5 Experimental Results After training, our model can generate frames from video clips and recognize whether each frame contains violence.

5.1 Confusion Matrix The confusion matrix describes the performance of a classification model. Figure 3 shows the confusion matrix of our model: there are 1667 true negatives, 2295 true positives, 111 false positives, and 121 false negatives.
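The headline accuracy can be recomputed from these four counts. The sketch below treats the violence class as the positive class (an assumption; the paper does not say which class is positive) and derives accuracy, precision, recall, and F1. With these counts, the accuracy is 3962/4194 ≈ 0.945, which is consistent with the reported 94%. The function name is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics computed from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # share of predicted "violence" frames that are correct
    recall = tp / (tp + fn)      # share of actual "violence" frames that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = classification_metrics(tp=2295, tn=1667, fp=111, fn=121)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```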

5.2 Output Figures 4 and 5 are sample screenshots of the output, showing the label on each output frame.


Fig. 3 Confusion matrix

Fig. 4 Output showing violence

Fig. 5 Output showing safe frame

Figure 4 shows an output frame with the violence label in red, indicating that the frame contains violence. Figure 5 shows an output frame with the safe label in green, indicating that the frame does not contain violence.
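Labelling each frame as in Figs. 4 and 5 reduces to thresholding the single sigmoid output of the final dense layer. The helper below is a hypothetical sketch: the 0.5 threshold, the label strings, and the function name are assumptions. The colours are given in BGR order because OpenCV's drawing functions, such as `cv2.putText`, expect BGR tuples.

```python
RED_BGR = (0, 0, 255)    # red in OpenCV's BGR channel order
GREEN_BGR = (0, 255, 0)  # green


def label_frame(prob_violence, threshold=0.5):
    """Return (label, BGR colour) for one frame given the model's sigmoid output."""
    if not 0.0 <= prob_violence <= 1.0:
        raise ValueError("expected a probability in [0, 1]")
    if prob_violence >= threshold:
        return "Violence", RED_BGR
    return "Safe", GREEN_BGR


# Hypothetical per-frame probabilities for a short clip.
for p in [0.92, 0.13, 0.5]:
    print(p, label_frame(p))
```

The returned text and colour could then be drawn onto the frame before it is displayed or saved.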


6 Classification Report Figure 6 shows the classification report obtained from the trained model.

6.1 Training and Validation Loss Figure 7 shows the training and validation loss of our model, where the orange line shows the validation loss and the blue line shows the training loss.

6.2 Training and Validation Accuracy Figure 8 shows the training and validation accuracy of our model, where the orange line shows the validation accuracy and the blue line shows the training accuracy.

Fig. 6 Classification report

Fig. 7 Training and validation loss


Fig. 8 Training and validation accuracy

7 Conclusion The proposed model recognizes violence in videos by generating frames and labelling each frame as violence or safe using a CNN, and it achieves an accuracy of 94%. The model is able to recognize violence in real-life videos. The authors hope that this work will help reduce violence in public places by using this system to monitor videos in place of human monitoring.

References
1. Abdali A-MR, Al-Tuma RF (2019) Robust real-time violence detection in video using CNN and LSTM. In: 2019 2nd scientific conference of computer sciences (SCCS)
2. Gkountakos K, Ioannidis K, Tsikrika T, Vrochidis S, Kompatsiaris I (2021) Crowd violence detection from video footage. In: 2021 international conference on content-based multimedia indexing (CBMI)
3. Jain A, Vishwakarma DK (2020) State-of-the-arts violence detection using ConvNets. In: 2020 international conference on communication and signal processing (ICCSP)
4. Roman DGC, Chávez GC (2020) Violence detection and localization in surveillance video. In: 2020 33rd SIBGRAPI conference on graphics, patterns and images (SIBGRAPI)
5. Su M, Zhang C, Tong Y, Liang B, Ma S, Wang J (2021) Deep learning in video violence detection. In: 2021 international conference on computer technology and media convergence design (CTMCD)
6. Jianjie S, Weijun Z (2020) Violence detection based on three-dimensional convolutional neural network with inception-ResNet. In: 2020 IEEE conference on telecommunications, optics and computer science (TOCS), pp 145–150. https://doi.org/10.1109/TOCS50858.2020.9339755
7. Eitta AA, Barghash T, Nafea Y, Gomaa W (2021) Automatic detection of violence in video scenes. In: 2021 international joint conference on neural networks (IJCNN), pp 1–8. https://doi.org/10.1109/IJCNN52387.2021.9533669
8. Ratnesh RK, Singh M, Pathak S, Dakulagi V (2020) Reactive magnetron sputtered-assisted deposition of nanocomposite thin films with tunable magnetic, electrical and interfacial properties. J Nanopart Res 22:290
9. Ratnesh RK, Goel A, Kaushik G, Garg H, Chandan, Singh M, Prasad B (2021) Advancement and challenges in MOSFET scaling. Mater Sci Semicond Process 134:106002
10. Ratnesh RK (2019) Hot injection blended tunable CdS quantum dots for production of blue LED and a selective detection of Cu2+ ions in aqueous medium. J Opt Laser Technol 116:103–111
11. Chandan, Ratnesh RK, Kumar A (2021) A compact dual rectangular slot monopole antenna for WLAN/WiMAX applications. Innov Cyber Phys Syst 788:699–705
12. Pathak S, Kishore N, Upadhyay G, Ratnesh RK, Mishra R (2021) A compact size planar microstrip-fed patch antenna with hexagonal DGS slot for WLAN application. Recent Trends Electron Commun, LNEE 777:263–271
13. Soliman MM, Kamal MH, El-Massih Nashed MA, Mostafa YM, Chawky BS, Khattab D (2019) Violence recognition from videos using deep learning techniques. In: 2019 ninth international conference on intelligent computing and information systems (ICICIS)

Stock Price Prediction Using Machine Learning

Piyush, Amarjeet, Anubhav Sharma, Sunil Kumar, and Nighat Naaz Ansari

Abstract Stock price prediction is the task of predicting the future value of a company's stock. Buying stocks or investing in a set of assets is a demanding task, because the rapid changes in financial markets make it very difficult to predict the future value of assets with high accuracy. Machine learning, which teaches computers to perform tasks that would normally require human intelligence, is currently a major topic of scientific research. This article develops a model for predicting future stock market prices using the Long Short-Term Memory (LSTM) model together with the deep learning building blocks Dense, Dropout, and Sequential. In this work, we mainly consider five factors: open, close, low, high, and volume. Keywords Stock market prediction · Random forest regression · Machine learning · Artificial neural network

1 Introduction Many studies have examined the use of machine learning (ML) in quantitative finance, including stock forecasting, managing and constraining large asset portfolios, and many other tasks that ML algorithms can perform. Machine learning, in general, refers to any process that uses computers to identify patterns from data instead of following explicit programming instructions. In mathematical finance, several models offer a wide range of strategies for using machine learning to anticipate future asset values, notably for asset selection. Models of this type provide a way to combine weak signals in the data into a useful predictive instrument. Several ML techniques, including artificial neural networks, gradient-boosted regression trees (GBRT), support vector machines (SVM), and random forests, have recently been applied as a mix of statistical and learning models. These methods can capture complex nonlinear patterns and relationships that linear techniques cannot uncover, and they can outperform linear regression in terms of efficiency and in handling multicollinearity. A large body of research is now being conducted on ML methods in finance. Some studies employ tree-based models to forecast portfolio returns [1], while others use deep learning (DL) to predict future asset prices [2, 3]. Other authors have examined how the AdaBoost algorithm can be used to anticipate returns [4]. Still others predict stock returns using a novel decision-making model for day-trading stock market investments, employing the SVM technique for portfolio selection and the mean–variance (MV) method for stock return prediction. Another study discusses deep learning methods for smart indexing [5]. In addition, some research has surveyed a wide range of trends and ML applications in quantitative finance [6]. These reviews cover return forecasting, portfolio construction decisions, and sentiment analysis.

Piyush · Amarjeet · A. Sharma · S. Kumar (✉) · N. N. Ansari Department of Computer Science and Engineering, Meerut Institute of Engineering and Technology, Meerut, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_8

Piyush et al.
A separate class of machine learning techniques, based on long-term memory of sequential data, is not covered by these models. In finance, recurrent neural networks (RNNs) have proven quite beneficial for forecasting stock market prices. For time-series prediction, one study compared the efficiency of ARIMA and LSTM on financial data and found that LSTM outperformed ARIMA by a wide margin [7]. The aim of our study is to estimate the adjusted closing price of an asset portfolio using machine learning algorithms based on LSTM and RNN, that is, to develop a model for predicting future stock market values using LSTM.

2 Related Work In early studies on stock price prediction, E. F. Fama proposed the efficient market hypothesis (EMH), while Horne and Parker introduced the random walk hypothesis. According to these theories, market prices are driven by factors other than past prices, so market price prediction is impossible. Under the EMH, a stock's value fully reflects the available market information, so any new information causes a price movement in reaction to it. The theory holds that stocks always trade at their fair value, so traders cannot buy stocks at a discount or sell them at an inflated price, and the only way for a trader to increase his or her earnings is to take on greater risk. The EMH distinguishes three forms of market efficiency: the weak form, which considers only historical price data; the semi-strong form, which includes current public information as well as past data; and the strong form, which considers both public and private information. According to the EMH, any price change is either the result of newly available information or a random shift that can cause prediction models to fail. The random walk hypothesis, proposed by Horne and Parker, asserts that stock prices change randomly and that past price changes are unrelated to current ones. It differs from the EMH in that it concentrates on short-term stock market trends. In contrast to these views, a number of recent studies have shown that stock price movements can be forecast to some extent. Two methods of financial analysis are used to anticipate stock market values. The first, fundamental analysis, centres on the company's health and covers qualitative and quantitative factors such as interest rates, asset turnover, sales, costs, and the cost-to-profit ratio, among others; its goal is to determine the company's long-term viability and strength in order to make a long-term investment. The second, technical analysis, is based on time-series data: traders examine past price fluctuations and chart patterns, with time as a key factor in their forecasts. Technical analysis relies on three basic elements: stock price movement, which may at times appear random; historical trends, which are thought to recur over time; and all pertinent information related to the stock.

Long Short-Term Memory (LSTM) An LSTM is a kind of RNN that can learn long-term dependencies.
LSTMs perform exceptionally well on a wide range of sequence modelling problems and are now widely used. LSTMs are explicitly designed to avoid the long-term dependency problem [8]; remembering information over long periods of time is their default behaviour. Fig. 1 shows what an RNN looks like. An RNN unit takes the current input (X) and the previous state (A) to produce the output (H) and the current state (A), as shown in Fig. 2.

Fig. 1 RNN model


Fig. 2 Process of RNN model

Whereas the repeating module of a standard recurrent neural network has a single tanh (activation) layer, an LSTM has the same chain-like structure, but its repeating module contains several components: an LSTM block has four layers that interact with one another, as shown in Fig. 3. The key to the operation of the LSTM is the cell state, the horizontal line running along the top of the block from left to right, highlighted in Fig. 4.

Fig. 3 LSTM model

Fig. 4 Operations of LSTM model


The cell state C runs through the entire LSTM with only a few minor linear interactions, so information can flow along it largely unchanged; this allows the LSTM to retain context from many previous steps. Structures along this line allow information to be added to or removed from the cell state. This adding and removing of information is controlled by gates, which are built from sigmoid layers (the yellow boxes inside the cell). A sigmoid layer outputs values between 0 and 1 that determine how much of each component is let through: a value of 0 lets nothing through, and a value of 1 lets everything through. In the LSTM, gates are used to regulate the cell state. Forget Gate [FG] Consider first the forget gate. This gate decides which information from the cell state should be discarded. The decision is made by the first sigmoid layer, which considers the previous output $h_{t-1}$ as well as the current input $x_t$. The forget gate equation is

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}$$

Consider the following sentence, in which the next word is to be predicted: Robert called Angela to ask her out. Because the pronoun her refers to the subject Angela rather than Robert, the model forgets the context Robert when it encounters the new subject Angela. This is exactly how the forget gate works. The next step decides what new information will be stored in the cell state C. Input Gate [IG] The input gate is another sigmoid layer; it outputs values between 0 and 1 that decide which values will be updated. A tanh layer then computes the candidate values $\tilde{C}_t$ needed to update the cell state, and the two are combined to create an update to the state. In the example above, we would add the gender of the new subject to the cell state to replace the old one.

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2}$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{3}$$

We now update the old cell state $C_{t-1}$ into the new cell state $C_t$. The previous steps have already decided what to do; this step carries it out, where $\odot$ denotes element-wise multiplication:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4}$$


This is where, in the case of a language model, we actually drop the information about the old subject's gender and add the new information, as decided in the previous steps. Output Gate [OG] This gate is crucial because it determines what we output. The output is based on the cell state, but is a filtered version of it. First, a sigmoid layer decides which parts of the cell state will be output. Then the cell state is passed through tanh (to push its values between −1 and 1) and multiplied by the output of the sigmoid gate, so that only the selected parts are output. The corresponding equations are:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{5}$$

$$h_t = o_t \odot \tanh(C_t) \tag{6}$$
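Equations (1)–(6) describe a single time step of the LSTM cell. The plain-Python sketch below implements exactly those six equations for small vectors so that each gate can be inspected. The parameter layout (one weight matrix and one bias vector per gate, applied to the concatenation $[h_{t-1}, x_t]$) follows the equations, while the function names and the toy zero-weight example are illustrative assumptions; a real model would use a library layer such as the Keras `LSTM` layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eqs. (1)-(6).

    params maps gate name ("f", "i", "c", "o") to (W, b), where W has one row per
    hidden unit and len(h_prev) + len(x_t) columns.
    """
    concat = h_prev + x_t  # [h_{t-1}, x_t]

    def gate(name, activation):
        W, b = params[name]
        return [activation(z + bias) for z, bias in zip(matvec(W, concat), b)]

    f_t = gate("f", sigmoid)      # Eq. (1): forget gate
    i_t = gate("i", sigmoid)      # Eq. (2): input gate
    c_hat = gate("c", math.tanh)  # Eq. (3): candidate values
    c_t = [f * c + i * ch for f, c, i, ch in zip(f_t, c_prev, i_t, c_hat)]  # Eq. (4)
    o_t = gate("o", sigmoid)      # Eq. (5): output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # Eq. (6)
    return h_t, c_t


# Toy example: 2 hidden units, 1 input feature, all weights and biases zero.
hidden, inputs = 2, 1
zero_params = {g: ([[0.0] * (hidden + inputs) for _ in range(hidden)], [0.0] * hidden)
               for g in "fico"}
h, c = lstm_step([3.0], [0.0, 0.0], [1.0, -2.0], zero_params)
print(c)  # [0.5, -1.0]
```

With all weights and biases set to zero, every sigmoid gate outputs 0.5 and the candidate is tanh(0) = 0, so the new cell state is half the old one; this makes the arithmetic of Eq. (4) easy to verify by hand.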

Dropout Dropout is a regularisation technique for neural networks that temporarily removes units (along with their incoming and outgoing connections) at training time, keeping each unit with a specified probability p (p = 0.5 is a common value). At test time all units are present, but their weights are scaled by p (i.e. w becomes pw). The goal of dropout is to prevent co-adaptation, which occurs when units of a network become overly reliant on specific connections, as this may be a sign of overfitting. Intuitively, dropout can be viewed as implicitly training an ensemble of neural networks. Dense A dense layer is a fully connected neural network layer, meaning that each neuron receives input from all neurons in the previous layer. It is one of the most commonly used layer types. Sequential In Keras, the simplest way to create a model is the Sequential model, which builds a model layer by layer: each layer takes the output of the previous layer as its input.
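The sketch below contrasts two variants. The first is the formulation described above, in which p is the probability of keeping a unit and weights (equivalently, activations) are scaled by p at test time. The second is the "inverted" variant used by the Keras `Dropout` layer, whose `rate` argument is the probability of dropping a unit and which rescales the surviving activations by 1/(1 − rate) during training instead. Both keep the expected activation the same at training and test time. The function names are illustrative.

```python
import random


def dropout_train(activations, keep_prob, rng):
    """Original formulation: keep each unit with probability keep_prob, otherwise zero it."""
    return [a if rng.random() < keep_prob else 0.0 for a in activations]


def dropout_test(activations, keep_prob):
    """Test time: all units are present; scaling outputs by keep_prob matches w -> p*w."""
    return [a * keep_prob for a in activations]


def inverted_dropout_train(activations, rate, rng):
    """Inverted dropout: drop with probability `rate` and scale survivors by 1 / (1 - rate),
    so no rescaling is needed at test time."""
    return [a / (1.0 - rate) if rng.random() >= rate else 0.0 for a in activations]


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)
x = [1.0] * 10_000
print(round(mean(dropout_train(x, 0.5, rng)), 2))           # close to 0.5
print(mean(dropout_test(x, 0.5)))                           # 0.5
print(round(mean(inverted_dropout_train(x, 0.5, rng)), 2))  # close to 1.0
```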

3 Proposed Methodology As stated in the previous section, historical market data are required first. The next step is to extract the features needed for data analysis. The data are then divided into training and testing sets, an algorithm is used to predict the price, and finally the data are visualised. The architecture of the proposed system is depicted in Fig. 5. In this work, we used the daily opening prices of Google stock. The model is built with the LSTM algorithm, using 80% of the dataset for training and 20% for testing, and it is trained by minimising the mean squared error. We trained the model for different numbers of epochs (10, 20, 55, and 95). The model is organised as follows:

Layer (type) | Output shape | Parameters
LSTM_1 (LSTM) | (None, 55, 95) | 37,631
Dropout_1 (Dropout) | (None, 55, 95) | 0
LSTM_2 (LSTM) | (None, 55, 95) | 74,111
Dropout_2 (Dropout) | (None, 55, 95) | 0
LSTM_3 (LSTM) | (None, 55, 95) | 74,111
Dropout_3 (Dropout) | (None, 55, 95) | 0
LSTM_4 (LSTM) | (None, 95) | 74,111
Dropout_4 (Dropout) | (None, 95) | 0
Dense_1 (Dense) | (None, 1) | 96

Fig. 5 System architecture of proposed model

86

Piyush et al.
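The Param # column of the layer table can be cross-checked with the standard counting rule for Keras layers. An LSTM layer with u units that reads d features per time step has four gates, and each gate has a d×u input kernel, a u×u recurrent kernel and u biases. A Dense layer has one weight per input per unit plus one bias per unit. The input feature dimension of the first layer is not stated in the text, so the LSTM values printed below are illustrative.

```python
def lstm_params(input_dim, units):
    """Keras LSTM: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Keras Dense: weight matrix plus one bias per unit."""
    return input_dim * units + units

print(dense_params(95, 1))   # 96, as in the dense_1 row
print(lstm_params(95, 95))   # 72580 for a 95-unit LSTM fed by another 95-unit LSTM
print(lstm_params(1, 95))    # 36860 for a first layer reading one feature per step
```

Because every LSTM count is four times a per-gate count, it is always a multiple of 4. This offers a quick sanity check when reading model summaries.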

4 Results and Discussion Table 1 shows a sample of the Google stock dataset used for training. It has six fields (date, open, high, low, close, and volume), which are used to train the model. After loading the training data, we reshape it into the three-dimensional (samples, time steps, features) form expected by Keras recurrent layers. We then build the model from LSTM layers with dropout regularisation, compile it, and fit it to the training data. Next, the test data are preprocessed in the same way, and in the last step the model predicts the output. Finally, the outcome is visualised. Fig. 6 shows the prices predicted by the model trained with the technique described in the preceding section. The x-axis represents time, and the y-axis represents the stock price. Figure 6 covers a 100-day period.

Table 1 Google dataset used for training
Date | Open | High | Low | Close | Volume
01/03/12 | 325.25 | 332.83 | 324.97 | 663.59 | 7,380,500
01/04/12 | 331.27 | 333.87 | 329.08 | 666.45 | 5,749,400
01/05/12 | 329.83 | 330.75 | 326.89 | 657.21 | 6,590,300
01/06/12 | 328.34 | 328.77 | 323.68 | 648.24 | 5,405,900
01/09/12 | 322.04 | 322.29 | 309.46 | 620.76 | 11,688,800
01/10/12 | 313.7 | 315.72 | 307.3 | 621.43 | 8,824,000
01/11/12 | 310.59 | 313.52 | 309.4 | 624.25 | 4,817,800
01/12/12 | 314.43 | 315.26 | 312.08 | 627.92 | 3,764,400
01/13/12 | 311.96 | 312.3 | 309.37 | 623.28 | 4,631,800

Fig. 6 Google stock price prediction
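The reshaping step turns the price series into the (samples, time steps, features) structure that Keras recurrent layers expect. Below is a minimal sketch under several assumptions: sliding windows of past prices, one feature (the opening price), next-day targets, and a chronological (unshuffled) 80/20 split. The text states only the 80/20 ratio, and the 55-step window size is inferred from the layer table. The function names are hypothetical, and feature scaling is omitted.

```python
def make_windows(series, timesteps):
    """Build (samples, timesteps, 1) inputs and next-step targets from a 1-D series."""
    X, y = [], []
    for i in range(timesteps, len(series)):
        X.append([[v] for v in series[i - timesteps:i]])  # one feature per time step
        y.append(series[i])                               # the value right after the window
    return X, y

def chronological_split(X, y, train_frac=0.8):
    """Split without shuffling so the test period comes after the training period."""
    cut = int(len(X) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]

def mean_squared_error(y_true, y_pred):
    """The training loss named in Section 3."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

prices = [325.25, 331.27, 329.83, 328.34, 322.04, 313.7, 310.59, 314.43, 311.96]  # Open, Table 1
X, y = make_windows(prices, timesteps=3)  # toy window; the layer table suggests 55
X_train, y_train, X_test, y_test = chronological_split(X, y)
print(len(X), len(X_train), len(X_test))  # 6 4 2
```

Keeping the split chronological avoids training on days that come after the test period. A shuffled split would leak future prices into training.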

5 Conclusion Using the proposed model, anyone can view Google stock price predictions, analysis, and visualisations produced with deep learning components including LSTM, Dense, Dropout, and Sequential. The proposed model can also be applied to any other company's stock dataset to obtain accurate forecasts. Another benefit of this model is that it can run on any platform, including cloud systems.

References
1. Moritz B, Zimmermann T (2016) Tree-based conditional portfolio sorts: the relation between past and future stock returns. Available at SSRN 2740751
2. Batres-Estrada B (2015) Deep learning for multivariate financial time series
3. Takeuchi L, Lee YYA (2013) Technical report. Stanford University
4. Wang S, Luo Y (2012) Signal processing: the rise of the machines. Deutsche Bank Quantitative Strategy
5. Paiva FD, Cardoso RTN, Hanaoka GP, Duarte WM (2018) Decision-making for financial trading: a fusion approach of machine learning and portfolio selection. Expert Syst Appl
6. Emerson S, Kennedy R, O'Shea L, O'Brien J (2019) Trends and applications of machine learning in quantitative finance. In: 8th international conference on economics and finance research (ICEFR 2019)
7. Siami-Namini S, Namin AS (2018) Forecasting economics and financial time series: ARIMA vs. LSTM. arXiv preprint arXiv:1803.06386
8. Patterson J (2017) Deep learning: a practitioner's approach. O'Reilly Media
9. Heaton JB, Polson NG, Witte JH (2017) Deep learning for finance: deep portfolios. Appl Stoch Model Bus Ind 33(1):3–12
10. Olah C (2015) Understanding LSTM networks. Colah's blog. https://colah.github.io/
11. Khan MA, Kadry S, Parwekar P et al (2021) Human gait analysis for osteoarthritis prediction: a framework of deep learning and kernel extreme learning machine. Complex Intell Syst. https://doi.org/10.1007/s40747-020-00244-2
12. Mittal M, Satapathy SC, Pal V, Agarwal B, Goyal LM, Parwekar P (2021) Prediction of coefficient of consolidation in soil using machine learning techniques. Microprocess Microsyst 82:103830
13. Sadia H, Sharma A, Paul A, Padhi S, Sanyal S (2019) Stock market prediction using machine learning algorithms. IJEAT
14. Deepak RS, Uday SI, Malathi D (2017) Machine learning approach in stock market prediction. IJPAM

Brain Tumor Detection Using Deep Learning Sunny Yadav, Vipul Kaushik, Vansh Gaur, and Mala Saraswat

Abstract The human brain is the central organ of the body and controls the rest of it. Brain cancer arises from abnormal cell growth and division in the brain, and a growing tumor leads to brain disease. In the medical field, computer vision (CV) plays a vital role in reducing the need for human judgment to obtain correct findings. CT scans, X-rays, and magnetic resonance imaging (MRI) scans are the most widely used imaging technologies. Our research focuses on several approaches for identifying brain tumors in brain MRIs. In this paper, we performed preprocessing with the bilateral filter (BF) provided by the OpenCV library to remove noise from MRI images. Binary thresholding and convolutional neural network (CNN) segmentation algorithms are then used to detect the tumor. We created three datasets: training, testing, and validation. Our framework determines whether or not an individual has a tumor. The results are assessed using performance indicators such as accuracy, sensitivity, and specificity. Keywords Brain tumor · OpenCV · Convolutional neural network · Deep learning

S. Yadav · V. Kaushik · V. Gaur ABES Engineering College, Ghaziabad, India M. Saraswat (corresponding author) Bennett University, Greater Noida, U.P., India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_9


1 Introduction Our research focuses on the automated detection and classification of brain tumors. MRI and CT scans are the most widely used techniques for examining the structure of the brain. The aim of this work is to identify tumors in MR images of the brain, with the broader purpose of supporting clinical diagnosis and improving the health sector. To this end, we develop an algorithm that determines whether a tumor is present in an MR brain image by combining several techniques: filtering, erosion, dilation, thresholding, and tumor-delineation techniques such as edge detection. The algorithm extracts the tumor region from the MR brain image and depicts it in a form that everyone can understand, so that users, particularly the medical experts treating the patient, receive the essential information in a simple manner. The output image conveys information such as the size, dimensions, and position of the detected tumor, and its boundary provides further tumor-related information that may be useful in a range of situations. Finally, we use convolutional neural network (CNN) models to classify whether a given MR brain image contains a tumor.

2 Related Work The robustness and accuracy of prediction algorithms are critical in medical diagnosis, since the outcome directly affects patient care. Several common classification and clustering techniques have already been developed for prediction. The purpose of clustering a medical image is to simplify its representation into a more meaningful one that is easier to analyse. Many clustering and classification algorithms are being developed to improve the prediction accuracy of the diagnostic process. Sivaramakrishnan and Karnan used the fuzzy C-means clustering algorithm together with histogram equalization to detect the brain tumor region in an image efficiently. The image is decomposed using principal component analysis to reduce the dimensionality of the wavelet coefficients [1].


Aslam et al. reported a brain tumor segmentation method based on Sobel edge detection, using an improved edge-detection algorithm. In their work, binary thresholding is combined with the Sobel method, and regions of different sizes are extracted using a closed contour algorithm. Tumor cells are then extracted from the resulting image using intensity values [2]. Sathya and Manavalan compared clustering algorithms such as K-means, improved K-means, C-means, and improved C-means. Their article offers an experimental examination of large datasets made up of distinct images, and the results were evaluated with a variety of parametric tests [3]. Devkota et al. recommended employing morphological operations together with a computer-aided detection (CAD) technique to detect abnormal tissues. Among the available segmentation procedures, morphological opening and closing are preferred because they need less processing time while extracting tumor regions with the fewest defects [4]. Sudharani et al. applied a K-nearest neighbor technique to MRI images in order to identify and localize the abnormal region within the tissue. The method is computationally demanding but yields strong results, and its sample training phase helped establish its accuracy [5]. Kaur et al. implemented several clustering algorithms for the segmentation process and evaluated their performance, presenting a framework for assessing clustering algorithms based on their consistency across different applications. The study also presented other performance measures such as sensitivity, specificity, and accuracy [6]. Li et al. carried out a similar assessment of clustering strategies for segmentation [7]. Kumar and Mehta proposed a texture-based method. They emphasized the consequences for segmentation when the borders of tumor tissue are not sharp, since such edges may cause the proposed technique to yield unfavorable results. They used the MATLAB environment to carry out texture analysis and seeded region growing [8]. Mahmoud and Mohamed developed a model for tumor identification in brain images using artificial neural networks (ANNs), creating a computerized recognition system for MR imaging. When an Elman network was used in the recognition system, both the time period and the accuracy level were found to be high compared with other ANN systems. This network used a sigmoid activation function, which increased the accuracy of tumor segmentation [9]. Marroquin et al. presented automatic 3D segmentation of brain MRI images. Using a separate parametric model for each class, instead of a single multiplicative field, reduces the influence of intensity variations within each class. The main outcomes of this study show that the MPM-MAP approach is more robust to errors in computing the posterior marginals than the EM technique [10]. Deep learning approaches are widely used in various areas, such as fake news detection [11, 12] and recommender systems, for example book recommendation using recurrent neural networks [13]. Khan et al. used deep learning to analyse human gait (walking style) for osteoarthritis prediction [14].

3 Proposed Method The method is divided into six stages: image acquisition from the collected data, image preprocessing, image enhancement, image segmentation by binary thresholding, CNN-based brain tumor classification, and finally evaluation of the result once all the preceding stages have been completed.

3.1 Dataset Description The dataset was obtained from Kaggle [15] and consists of 7023 MRI images, which were used to train and test the system. The images are divided into testing and training folders, each containing four subfolders named glioma, meningioma, no tumor, and pituitary. We restructured this dataset by combining glioma and meningioma into a YES category with 606 images and placing the no-tumor images into a NO category with 405 images. The images in these subfolders have different dimensions. The CNN categorizes the images to determine the presence or absence of a tumor. Figure 1 shows brain MRI scans with and without a tumor. Fig. 1 Brain image without tumor and brain image with tumor


3.2 Flowchart Figure 2 depicts the overall proposed method step by step; each step is explained in the following subsections.

3.3 Image Preprocessing A dataset of approximately 7000 MRI images covering both positive and negative tumor cases was collected from Kaggle. In the first stage, these MRI images are used as input. Preprocessing is the first and most important step for image enhancement, and methods for reducing impulse noise and for image scaling, as well as other preprocessing techniques, are essential. To begin, we convert the brain MRI image to grayscale. Figure 3 shows the images after the filters have been applied. An adaptive bilateral filtering approach is used to reduce distorting noise in the brain image, which improves the accuracy of both classification and diagnosis.
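In practice, the grayscale conversion and bilateral filtering described above are done with OpenCV's `cv2.cvtColor` (with `cv2.COLOR_BGR2GRAY`, since OpenCV loads images as BGR) and `cv2.bilateralFilter(src, d, sigmaColor, sigmaSpace)`. The dependency-free sketch below shows what the filter computes. Each output pixel is a weighted mean of its neighbours, and the weight falls off with both spatial distance (`sigma_space`) and intensity difference (`sigma_color`). As a result, flat regions are smoothed while strong edges such as a tumor boundary are preserved. The input here is RGB tuples, and the radius and sigma values are illustrative assumptions, not the paper's settings.

```python
import math

def to_gray(rgb_image):
    """Weighted RGB-to-gray conversion (ITU-R BT.601 luma weights, as used by OpenCV)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_image]

def bilateral_filter(img, radius=2, sigma_space=2.0, sigma_color=25.0):
    """Naive bilateral filter on a grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            num = den = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if not (0 <= y < h and 0 <= x < w):
                        continue  # skip neighbours outside the image
                    spatial = math.exp(-(di * di + dj * dj) / (2 * sigma_space ** 2))
                    diff = img[y][x] - img[i][j]
                    range_w = math.exp(-(diff * diff) / (2 * sigma_color ** 2))
                    num += spatial * range_w * img[y][x]
                    den += spatial * range_w
            out[i][j] = num / den  # den >= 1 because the centre pixel has weight 1
    return out

# Slightly noisy dark region next to a bright region (a sharp edge between columns 2 and 3).
step = [[10.0, 12.0, 9.0, 200.0, 205.0, 198.0] for _ in range(4)]
smoothed = bilateral_filter(step)
print([round(v, 1) for v in smoothed[1]])
```

A plain Gaussian blur with the same radius would mix the dark and bright sides of the edge. Here, the intensity term gives cross-edge neighbours a weight of almost zero, so each side is smoothed only with pixels of similar intensity.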

3.4 Image Enhancement Image enhancement produces a new image that is better suited to subsequent image operations. Its main aim is to make image information more useful both to human viewers and to algorithms that perform automated image processing. It is performed on digital images in software. The most basic enhancement operations include adjusting contrast and brightness, adding visual effects, and converting to black-and-white or sepia.
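The simplest contrast and brightness adjustment is the linear point operation g = α·f + β applied to every pixel, with the result rounded and clipped to the 8-bit range. Values of α > 1 increase contrast, and β > 0 brightens the image. The α and β values in this sketch are illustrative. OpenCV's `cv2.convertScaleAbs` applies a similar transform, but it also takes the absolute value before saturating.

```python
def adjust_contrast_brightness(img, alpha, beta):
    """Apply g = alpha * f + beta to each pixel, then round and clip to [0, 255]."""
    return [[min(255, max(0, int(round(alpha * p + beta)))) for p in row] for row in img]

print(adjust_contrast_brightness([[100, 200, 0]], alpha=1.5, beta=10))  # [[160, 255, 10]]
```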

3.5 Thresholding Thresholding is performed on a grayscale image. Because the operation is binary, the resulting image appears black and white. The OpenCV method assigns each pixel a new value depending on whether it lies above or below a given threshold value.
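OpenCV's binary thresholding (`cv2.threshold` with `cv2.THRESH_BINARY`) sets a pixel to `maxval` when it is strictly greater than the threshold and to 0 otherwise. The following plain-Python sketch implements that rule. The threshold of 127 is an illustrative choice, not the value used in this work.

```python
def binary_threshold(img, thresh, maxval=255):
    """THRESH_BINARY rule: maxval where pixel > thresh, else 0."""
    return [[maxval if p > thresh else 0 for p in row] for row in img]

print(binary_threshold([[10, 127, 128, 200]], 127))  # [[0, 0, 255, 255]]
```

With OpenCV itself, the equivalent call is `_, mask = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)`. The first return value is the threshold that was used.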

Fig. 2 Proposed method


Fig. 3 Filters applied on images

3.6 Morphological Operations Erosion and dilation are the two fundamental morphological operations. Further operations such as opening, closing, morphological gradient, top hat, and black hat are built from them. Opening is erosion followed by dilation. Its main goal is to remove small groups of pixels, such as a thin bridge connecting two objects. Once erosion has removed the bridge, dilation restores the remaining objects to their previous size. Opening is idempotent: once a binary image has been opened with a structuring element, opening it again with the same structuring element has no further effect. Closing (dilation followed by erosion) is applied after opening. It removes noise in the form of small holes and gaps within the image regions while maintaining the original region sizes.
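The behaviour described above can be checked with a small sketch of binary erosion, dilation, opening and closing using a 3 × 3 square structuring element. The sketch assumes that pixels outside the image are simply ignored, so objects touching the border are not eroded by it. The test image is a made-up example of two square blobs joined by a one-pixel bridge. In OpenCV, the corresponding calls are `cv2.erode`, `cv2.dilate` and `cv2.morphologyEx` with `cv2.MORPH_OPEN` or `cv2.MORPH_CLOSE`.

```python
def _neighbourhood_op(img, reducer):
    """Apply min (erosion) or max (dilation) over each pixel's 3x3 neighbourhood.
    Neighbours outside the image are ignored."""
    h, w = len(img), len(img[0])
    return [[reducer(img[y][x]
                     for y in range(max(0, i - 1), min(h, i + 2))
                     for x in range(max(0, j - 1), min(w, j + 2)))
             for j in range(w)] for i in range(h)]

def erode(img):
    return _neighbourhood_op(img, min)

def dilate(img):
    return _neighbourhood_op(img, max)

def opening(img):
    return dilate(erode(img))  # erosion removes thin bridges, dilation restores size

def closing(img):
    return erode(dilate(img))  # dilation fills small holes, erosion restores size

two_blobs = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
opened = opening(two_blobs)
print(opened[2][4])               # 0: the one-pixel bridge is gone
print(opening(opened) == opened)  # True: opening is idempotent
```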

3.7 Brain Tumor Image Classification Using CNN Classification is the standard way of identifying images, including medical images. Classification algorithms assume that an image has one or more features, each of which can be assigned to one of several categories. Because its structure can recognize even the smallest features, the convolutional neural network (CNN) is used as an automated and reliable classification approach. The strength of a CNN comes from a type of layer known as the convolutional layer. A CNN consists of multiple convolutional layers stacked on top of each other, each capable of recognizing increasingly complex structures. Handwritten symbols can be recognized with three or four convolutional layers, while recognizing human faces may require around 25 layers. The Keras library provides VGG16, VGG19, ResNet, Inception, and many other pretrained CNN models.

VGG 16 CNN Model VGG16 is the convolutional neural network architecture that achieved top results in the 2014 ILSVRC (ImageNet) competition, winning the localization task and placing second in classification. It is still considered an outstanding vision model, even though more recent architectures such as Inception and ResNet have outperformed it. It has 16 weight layers (13 convolutional and 3 fully connected), with max-pooling layers between the convolutional blocks. Its convolutions use 3 × 3 kernels, and its max-pooling window is 2 × 2. It has about 138 million parameters in total and is trained on ImageNet data. A deeper version, VGG19, has 19 weight layers.

VGG 19 CNN Model VGG19 is a CNN architecture very similar to VGG16, but with 19 weight layers: 16 convolutional layers and three fully connected layers, in addition to pooling layers. Because of their depth, VGG19 networks are slow to train and produce large models. The numbers 16 and 19 denote the number of weight layers in each model. Using the pretrained VGG19 model, a transfer-learning model can be built in just a few lines of code.

RESNET 50 CNN Model ResNet stands for residual neural network. ResNet is a classic neural network that serves as the backbone for many computer vision tasks. In 2015, ResNet won the ImageNet challenge. Before ResNet, deep neural networks suffered from the vanishing gradient problem. Thanks to its residual (skip) connections, ResNet makes it possible to train extremely deep neural networks with 150 or more layers. For this phase, we import Keras and the other packages needed to build the convolutional neural network. The following steps are used to build the network:

• The Sequential model is used to initialize the neural network.
• Convolution2D layers build the image-processing convolutional network.
• MaxPooling2D layers add the pooling layers.
• A Flatten layer converts the pooled feature maps into a single long NumPy array, which is fed to the fully connected layers.
• A fully connected layer is created using the Dense layer provided by the Keras library.
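The VGG figures quoted above (16 or 19 weight layers, and about 138 million parameters for VGG16) follow directly from the layer configuration. The sketch below counts them for the standard ImageNet setup: 224 × 224 RGB input, 3 × 3 convolutions with bias, 2 × 2 max pooling, and fully connected layers of 4096, 4096 and 1000 units. It is an arithmetic check of the published architectures, not the tumor classifier used in this paper, whose classification head is not specified.

```python
# Integers are 3x3 conv filter counts; "M" marks a 2x2 max-pooling layer (no weights).
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG19_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def vgg_stats(conv_cfg, in_channels=3, image_size=224, fc=(4096, 4096, 1000)):
    """Return (number of weight layers, total trainable parameters)."""
    params, weight_layers, channels, size = 0, 0, in_channels, image_size
    for item in conv_cfg:
        if item == "M":
            size //= 2                                # pooling halves height and width
        else:
            params += (3 * 3 * channels + 1) * item   # 3x3 kernels plus one bias per filter
            channels = item
            weight_layers += 1
    features = size * size * channels                 # flattened 7 x 7 x 512 = 25088
    for units in fc:
        params += features * units + units
        features = units
        weight_layers += 1
    return weight_layers, params

print(vgg_stats(VGG16_CONV))  # (16, 138357544)
print(vgg_stats(VGG19_CONV))  # (19, 143667240)
```

Almost 90% of VGG16's parameters sit in the first fully connected layer (25088 × 4096). Transfer-learning setups therefore often replace the original dense head with a smaller one.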

3.8 Convolution To add a convolution layer, we call the add method of the model object and pass a Convolution2D layer with its parameters. The most important parameters are the number of feature detectors (filters) and their dimensions. In this instance, we assume that we are working with color images. The input_shape tuple specifies the size of the 2D array in each channel and the number of channels (3 for a color image).
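A CNN "convolution" layer actually computes a cross-correlation: each filter slides over the input and takes a weighted sum at every position. The sketch below shows this for a single channel with "valid" padding (no zero padding, which is the Keras default). It also gives the parameter count of a layer with several input channels and filters. The kernel values and the choice of 32 filters of size 3 × 3 on an RGB input are illustrative.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of one channel, as computed by CNN conv layers."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]

def conv2d_param_count(kh, kw, in_channels, filters):
    """Trainable weights of a conv layer with bias: (kh * kw * in_channels + 1) per filter."""
    return (kh * kw * in_channels + 1) * filters

print(conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, -1]]))  # [[-4, -4], [-4, -4]]
print(conv2d_param_count(3, 3, 3, 32))  # 896 for 32 filters of 3x3 on an RGB input
```

With valid padding, a k × k filter shrinks each spatial dimension by k − 1. For example, a 3 × 3 filter turns a 3 × 3 input into a single value and a 64 × 64 input into 62 × 62.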


3.9 Pooling Pooling slides a 2D window across each channel of the feature map and summarizes the features within the window's region, which lowers the required processing capacity. The pooling layer takes a feature map and summarizes its important characteristics. Subsequent operations are therefore carried out on summarized features rather than on the precisely positioned features produced by the convolution layer, which reduces computation. There are two main types of pooling: max pooling and average pooling. In most circumstances, we use max pooling with a 2 × 2 pool, which halves the height and width of the feature map without sacrificing crucial image information.
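The following sketch shows non-overlapping 2 × 2 pooling on one channel, with the stride equal to the window size as in Keras's `MaxPooling2D` defaults. Max pooling keeps the strongest response in each window, and average pooling keeps the mean. The feature map values are made up for illustration.

```python
def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling over one channel; leftover rows/columns are dropped."""
    h, w = len(fmap) // size, len(fmap[0]) // size
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [fmap[i * size + a][j * size + b] for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

fmap = [[1, 3, 2, 4],
        [5, 6, 7, 8],
        [9, 2, 1, 0],
        [3, 4, 5, 6]]
print(pool2d(fmap))              # [[6, 8], [9, 6]]
print(pool2d(fmap, mode="avg"))  # [[3.75, 5.25], [4.5, 3.0]]
```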

4 Performance Evaluation This section evaluates the performance of the models we used to identify tumors in MRI images: VGG16, VGG19, and ResNet50.

4.1 Performance of VGG 16 Figure 4 shows the confusion matrix of the VGG16 model. VGG16 achieved a validation accuracy of 95%. The matrix shows the cases on which the model fails and those on which it gives correct results.

Fig. 4 Confusion matrix for vgg16 model validation and test accuracy
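The confusion matrices in this section are related by simple counting to the accuracy, sensitivity and specificity named in the abstract. The sketch below covers the binary YES/NO task and treats "tumor" (label 1) as the positive class. The counts in the usage line are made up for illustration and are not the paper's results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn

def binary_metrics(tp, fn, fp, tn):
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),  # share of tumor images correctly flagged
        "specificity": tn / (tn + fp),  # share of tumor-free images correctly cleared
    }

print(confusion_counts([1, 1, 0, 0], [1, 0, 0, 1]))  # (1, 1, 1, 1)
print(binary_metrics(tp=90, fn=10, fp=5, tn=95))     # hypothetical counts
```

Accuracy alone can hide an imbalance between the two error types. In screening, sensitivity (missed tumors) usually matters more than specificity.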


Fig. 5 Confusion matrix for vgg19 model validation and test accuracy

4.2 Performance of VGG 19 Figure 5 shows the confusion matrix of the VGG19 model. VGG19 achieved a validation accuracy of 99%. The matrix shows the cases on which the model fails and those on which it gives correct results.

4.3 Performance of Resnet50 Figure 6 shows the confusion matrix of the ResNet50 model. ResNet50 achieved a validation accuracy of 98%. The matrix shows the cases on which the model fails and those on which it gives correct results.

Fig. 6 Confusion matrix for Resnet50 model validation and test accuracy


Fig. 7 Comparing VGG16, VGG19 and RESNET 50 accuracy

Table 1 Comparison between CNN models
S. No. | Model | Activation function | Performance metric | No. of epochs | Accuracy (%)
1 | VGG16 | Softmax | Validation accuracy | 10 | 95
2 | VGG19 | Softmax | Validation accuracy | 10 | 99
3 | RESNET50 | Softmax | Validation accuracy | 10 | 98

4.4 Comparison of CNN Models Fig. 7 and Table 1 compare the accuracy of the three CNN models: VGG16, VGG19, and ResNet50.

5 Conclusion In this paper, we used multiple CNN models to detect brain tumors in MRI images. The images obtained after the image-enhancement techniques are of good quality because noise has been reduced. The enhancement techniques used include scaling, interpolation, resizing, Gaussian blurring, morphological operations, cropping, and contour extraction. We collected MRI data from Kaggle and performed our analysis on it. Our dataset consists of around 7000 MRI images, which are divided into a training set of 4900 images and a testing set of 2100 images. We used three different CNN models: VGG16, VGG19, and ResNet50. The VGG19 and ResNet50 models performed better than VGG16, with accuracies of 99% and 98%, respectively, and VGG19 performed best of all three models with a validation accuracy of 99%. We can therefore conclude that VGG19 works best and can be used to detect brain tumors in MRI images accurately and precisely. Some of the literature mentioned above proposed a method that achieves an accuracy of 98%, but the dataset used in that study is much smaller than ours: only 1666 images, of which only 266 were used for testing, which is a very small number for a CNN model. Our method compares favorably in that we used a total of 7000 images, 70% for training and the remaining 30% for testing. Achieving an accuracy of 98% on a larger dataset makes a model more stable and reliable. On further examination, we found that the proposed methodology needs to be improved to identify the type of each detected tumor, which it currently cannot do; this is the next improvement step for this research. In medical image processing, acquiring medical data is a time-consuming task, and datasets may not be available in some circumstances. In all such situations, the proposed system must still be able to recognize tumor areas in MRI images accurately.

References
1. Sivaramakrishnan A, Karnan M (2013) A novel based approach for extraction of brain tumor in MRI images using soft computing techniques. Int J Adv Res Comput Commun Eng 2(4)
2. Aslam A, Khan E, Sufyan Beg MM (2015) Improved edge detection algorithm for brain tumor segmentation. Procedia Comput Sci 58:430–437. ISSN 1877-0509
3. Sathya B, Manavalan R (2011) Image segmentation by clustering methods: performance analysis. Int J Comput Appl 29(11). 0975-8887
4. Devkota B, Alsadoon A, Prasad PWC, Singh AK, Elchouemi A (2018)
5. Sudharani K, Sarma TC, Satya Rasad K (2015) Intelligent brain tumor lesion classification and identification from MRI images using a K-NN technique. In: 2015 international conference on control, instrumentation, communication and computational technologies (ICCICCT), Kumaracoil, pp 777–780. https://doi.org/10.1109/ICCICCT.2015.7475384
6. Kaur J, Agrawal S, Renu V (2012) A comparative analysis of thresholding and edge detection segmentation techniques. Int J Comput Appl 39:29–34. https://doi.org/10.5120/4898-7432
7. Li S, Kwok JT-Y, Tsang IW-H, Wang Y (2004) Fusing images with different focuses using support vector machines. IEEE Trans Neur Netw 15(6):1555–1561
8. Kumar M, Mehta KK (2011) A texture based tumor detection and automatic segmentation using seeded region growing method. Int J Comput Technol Appl 2(4):855–859. ISSN 2229-6093
9. Mahmoud D, Mohamed E (2012) Brain tumor detection using artificial neural networks. J Sci Technol 13:31–39
10. Marroquin JL, Vemuri BC, Botello S, Calderon F (2002) An accurate and efficient Bayesian method for automatic segmentation of brain MRI. In: Heyden A, Sparr G, Nielsen M, Johansen P (eds) Computer vision: ECCV 2002. Lecture notes in computer science, vol 2353
11. Sharma S, Saraswat M, Dubey AK (2021) Fake news detection using deep learning. In: Villazón-Terrazas B, Ortiz-Rodríguez F, Tiwari S, Goyal A, Jabbar M (eds) Knowledge graphs and semantic web. KGSWC 2021. Communications in computer and information science, vol 1459. Springer, Cham. https://doi.org/10.1007/978-3-030-91305-2_19
12. Dubey AK, Saraswat M (2022) Fake news detection through ML and deep learning approaches for better accuracy. In: Gao XZ, Tiwari S, Trivedi MC, Singh PK, Mishra KK (eds) Advances in computational intelligence and communication technology. Lecture notes in networks and systems, vol 399. Springer, Singapore. https://doi.org/10.1007/978-981-16-9756-2_2
13. Saraswat M, Saraswat R, Bahuguna R (2021) Recommending books using RNN. Recent Innov Comput Proc ICRIC 2:85
14. Khan MA, Kadry S, Parwekar P et al (2021) Human gait analysis for osteoarthritis prediction: a framework of deep learning and kernel extreme learning machine. Complex Intell Syst. https://doi.org/10.1007/s40747-020-00244-2
15. https://www.kaggle.com/datasets. Last accessed April 2022

Predicting Chances of Cardiovascular Diseases Through Integration of Feature Selection and Ensemble Learning Raghav Bhardwaj, Shashvat Mishra, Isha Gupta, and Shweta Paliwal

Abstract Heart problems are among the most significant problems in medical science. Predicting heart-related diseases is a particular challenge in clinical analysis, and heart problems have been a major cause of death globally. Integrating different machine learning (ML) techniques has proven to be an effective aid in decision making and helps in prediction from the large and diverse data produced by observations in healthcare. In this study, a real-time dataset has been analysed using an integrated feature-selection approach. Different ML classifiers are subjected to parameter optimization, and the values of classical performance measures are recorded. An interface has been designed that gives the user a pictorial view in which the system indicates the chance that an individual is suffering from heart disease. Heart disease is identified using the proposed algorithm and the features identified from the real-time dataset. Keywords Cardiovascular diseases · Electrocardiography (ECG) · Feature selection · Ensemble learning

R. Bhardwaj · S. Mishra · I. Gupta · S. Paliwal (corresponding author) Department of Computer Science and Engineering, Meerut Institute of Engineering and Technology, Meerut, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_10

1 Introduction Predicting cardiovascular disease has become an important problem in today's world, as these diseases affect people's health at a global level. In the medical field, heart-related diseases are reported to be the main cause of death. The main reason for the rise in heart-related health issues is people's lifestyle (increasing stress, and family- and world-related problems). Cardiovascular diseases are of two types: acute and chronic. Identifying people at high risk of heart-related problems early will help reduce deaths and disabilities. Many of these deaths can be partly attributed to a lack of medical infrastructure and of trained professionals in the field of heart disease. The principal idea of the proposed model is to determine the age group and heart rate using machine learning algorithms. This paper describes how the heart rate and heart condition are estimated from inputs such as blood pressure, ECG, and other important factors provided to the system by the user. The implementation of machine learning algorithms has been shown to provide a better experience and better accuracy than other algorithms. ML has made diagnosis possible at an early stage. ECG (electrocardiography) is widely used in healthcare: it detects the heart's electrical rhythm and records it as a tracing of squiggly lines. In an ECG test, sensors (electrodes) are attached to various parts of the body, especially the chest, to record the electrical signals generated by the heart. In this study, the ECG value is given a weight of approximately 30%. Since ECG alone cannot establish a heart problem, other factors such as age, gender, and blood pressure (bp) are also considered in reaching the final conclusion. In healthcare, ML is becoming more prominent and has proven to be significantly useful in helping patients and doctors in several ways. There are several use cases for ML in healthcare, such as automated medical billing, clinical decision support, and the development of medical care guidelines.
ML is also applied in medicine to find uncommon patterns automatically, and it helps radiologists make significant decisions from images and radiology reports. ML applications for healthcare developed by Google were trained to predict breast cancer and achieved an accuracy of 89%. Unstructured healthcare data has proven to be of considerable help in training ML models. This data consists of documents and text files that, in the past, could not be analysed without a human. Natural language is complex and highly ambiguous, so converting these documents into an analysable form depends on natural language processing (NLP) programs. It has been observed that most deep learning applications in healthcare make use of NLP in some form. Deep learning techniques have also been applied together with feature selection and have proven to be more accurate for prediction. Various ML methods were applied to predict cardiovascular disease. Feature selection was performed to identify the features that are genuinely relevant to prediction, and several ML classifiers were then evaluated: Naïve Bayes achieved an accuracy of 94.72%, KNN 89.06%, SVM 88.62%, logistic regression 88.13%, decision tree 92.03%, and random forest 89.21%. Ensemble learning was also applied in combination with feature selection, and AdaBoost was found to be the most accurate, with an accuracy of 96.09%. The key highlights of the study are:

• Collection of a new dataset for the analysis, followed by data cleaning and preprocessing.
• Identification of important attributes using correlation-based feature selection.
• Evaluation of different ML classifiers and the ensemble technique AdaBoost, with hyperparameter tuning of the ensemble classifier.
• Observation of standard performance measures.
• Design of an interface for prediction using the identified features, a standard threshold value, and the tuned ensemble classifier.

2 Literature Review

A method has been proposed that applies ML techniques to find relevant features, increasing the accuracy of cardiovascular disease prediction. The model was introduced with several combinations of features and different classification techniques, and it achieved an acceptable level of performance, with an accuracy of 88.7% for heart disease prediction using a hybrid random forest with a linear model. The heart disease data was collected from a public repository, and patterns were found using different classifiers (KNN, SVM and DT). The method takes the heart cycle into account, using several starting positions from the ECG signals in the learning phase [1]. Another study used several attributes related to heart disease and built a model with supervised machine learning algorithms such as Naive Bayes, DT, KNN and random forest (RF) on a pre-existing dataset. The paper discusses the different ML algorithms applied to the dataset, describes the analysis of the dataset and identifies the attributes that carry more weight than others and therefore yield higher accuracy. This research substantially reduces the cost of the many tests a patient would otherwise undergo [2]. A further study gives a detailed explanation of the Naive Bayes and DT algorithms, used mainly to predict heart diseases. Experiments were conducted to compare predictive data mining techniques on the dataset, and the results reveal that DT performs much better than Bayesian classification. The pre-processed data is clustered with algorithms such as K-means to accumulate significant data in a database.


R. Bhardwaj et al.

The common patterns were also classified into different classes using the concept of information entropy, and pattern recognition was used to predict the risk of heart disease [3]. A study conducted in an affiliated hospital from 2015 to 2018 predicted cardiovascular diseases from multiple factors using machine learning models (multinomial logistic regression, random forest and extreme gradient boosting, i.e. XGBoost), comparing the accuracies and F1-scores of each algorithm. When the methods were applied to the records of 17,661 patients, the most important features identified by XGBoost were blood pressure, oxygen saturation, pulse rate and age [4]. Because patients' lives are in the hands of doctors and predictions must therefore be accurate, another study developed an application that predicts the probability of heart disease from simple factors such as ST depression; whereas earlier surveys considered only ten factors, this work took 14 into account. The paper also compares ML models such as logistic regression, Naïve Bayes and random forest, and finds random forest to be the most accurate and reliable algorithm, which was therefore used in the proposed system [5]. Another study searched the observed data for patterns and functional dependencies. As cardiovascular diseases have become a major health problem, research is needed to reach sound conclusions, and because SVM is supported by strong mathematical theory, it can be used to predict cardiovascular diseases. This research mainly focuses on using machine learning to determine whether members of a population are sick or not.
First, the data was pre-processed, and then SVM and linear regression were used for prediction [6]. Another study aimed to establish an efficient method for predicting heart disease using ML techniques. The model involves several steps: first, five datasets were combined to form a larger and more comprehensive dataset. Two feature selection techniques, Relief and LASSO, were then used to select the most important features based on their rank values in the medical domain, which also helps to overcome overfitting and underfitting. Supervised models such as AdaBoost, DT and KNN were applied together with hybrid classifiers, and the results were compared with existing studies [7]. A systematic review was carried out in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. ML algorithms are increasingly used to predict heart disease, and the aim of this review was to assess their ability to do so. The MEDLINE and Embase databases were searched from their inception. For predicting coronary artery disease, boosting algorithms had an AUC of 0.88 and custom-built algorithms an AUC of 0.93; for stroke prediction, SVM had an AUC of 0.92, boosting algorithms 0.91 and CNN 0.90 [8]. Another paper classified heart disease using data mining tools combined with six machine learning techniques: logistic regression, SVM, KNN, ANN, Naive Bayes and random forest. Performance was compared using accuracy, sensitivity and specificity, and the best-performing tool and technique were MATLAB and MATLAB's ANN model, respectively [9]. A further study simulated the diagnosis of heart disease with intelligent computational approaches, with the aim of gaining knowledge about important attributes. Heart disease is identified with a fuzzy inference system (DCD-MFIS) with a reported precision of about 87.05%, and the proposed model is reported to be more accurate (about 92.45%) than previously proposed solutions [10]. Another relevant study used existing ML techniques to predict cardiovascular diseases at a very early stage in order to avoid their adverse effects. Data mining techniques supported effective decision making by identifying patterns that are important for prediction, and the study aims to develop a cloud-based decision-making system that uses ML for prediction [11]. Another paper sought a better and faster detection technique. DT is considered one of the most efficient data mining techniques.
In this research, different DT classification algorithms were compared in WEKA to obtain effective diagnostic results [12]. Another paper predicted heart disease in diabetic patients, using data mining techniques to extract knowledge from large datasets. Diabetes is a disease that occurs when the pancreas fails to produce enough insulin or the body cannot effectively use the insulin it produces. Naive Bayes and SVM were used in this research with features such as age, sex, blood pressure and blood sugar [13]. Another study emphasizes that mining the tremendous amount of available clinical data effectively can yield conclusions that help healthcare facilities overcome a disease by discovering it at an early stage. Data mining and knowledge discovery in databases are closely related and the terms are sometimes used interchangeably; knowledge discovery is a multi-step process consisting of predefined steps such as data selection, data collection and data processing [14]. A further study examined the use of cloud computing to manage cloud-based healthcare services. It aims to develop an intelligent model for real-time monitoring of a user's health data in order to diagnose chronic illness. IoT-based sensors were used to gather the health data and store it in cloud repositories for analysis [15]. Another study emphasizes the serious, life-threatening situations caused by heart-related problems and CVDs, and focuses on developing a feasible, accurate and trustworthy system for diagnosing heart disease so that proper precautions can be taken. The paper surveys models based on ML algorithms and analyses their performance; according to various studies, the models used are based on supervised learning algorithms such as SVM, Naïve Bayes and random forest [16]. Another paper studied cardiovascular disease using data mining approaches, focusing on the life-threatening issues caused by heart disease. It notes that features such as blood pressure, cholesterol and pulse rate take different values in different individuals, and it cites medically accepted normal values: 120/90 mmHg for blood pressure, 125–200 mg/dL for cholesterol and 72 beats per minute for pulse rate. The research describes classification techniques that predict heart disease in an individual from age, blood pressure, cholesterol and pulse rate using data mining techniques such as Naive Bayes (NB), KNN and neural networks [17]. Another study emphasizes the use of DT techniques for the earliest possible prediction and diagnosis of heart disease, with the aim of developing a clinical decision support system that helps health specialists predict disease early from an individual's available medical data.
An alternating decision tree is a classification rule that generalizes decision trees, voted decision trees and voted decision stumps. Principal component analysis was used for feature selection on the dataset collected from hospitals in Hyderabad [18]. Another paper predicted heart disease using techniques such as removing noisy data, removing missing data and classifying attributes for decision making and prediction. Measures such as classification accuracy, sensitivity and specificity were used to evaluate the diagnostic model, and heart disease was predicted by comparing the accuracies of algorithms such as SVM, gradient boosting, the NB classifier and logistic regression [19]. Another study focuses on constructing data-driven prediction models with different ML methodologies and highlights the drawbacks of current ML applications in the field of cardiovascular disease [20]. In another paper, KNHSC data was analysed and the characteristics of ML and big data were studied to predict the risk of heart disease by assessing the effectiveness of several ML algorithms. Prediction models were developed using logistic regression, neural networks, random forest and others, and the analysis showed that the ML models achieved equivalent accuracy whether or not information on previous medication was included [21]. Another study aims to predict heart disease accurately while using fewer attributes. Starting from thirteen attributes, the work uses a genetic algorithm to reduce their number to six, which reduces the number of tests a patient must undergo; Naïve Bayes, classification via clustering and DT were then used for diagnosis [22]. Another study proposed a new approach based on the coactive neuro-fuzzy inference system (CANFIS) to diagnose heart disease. The model combines the adaptive capabilities of neural networks with the qualitative approach of fuzzy logic, further integrated with a genetic algorithm to predict heart disease [23]. In another paper, cardiovascular diseases were predicted with different ML techniques, and several sampling techniques were applied to handle unbalanced datasets. Using a dataset from a public repository, the main goal was to predict whether a person has a 10-year risk of future heart disease, which was achieved with 89% accuracy [24]. Another study aimed to identify ML classifiers that predict disease with high accuracy by comparing the performance of several supervised learning algorithms. Applying KNN, DT and random forest to a dataset from a public repository, it found that random forest attained 85% accuracy, sensitivity and specificity [25]. Thus, ML is changing the perspective of biomedical science and health care [26].

3 Proposed Framework

AdaBoost, short for adaptive boosting, is a statistical classification meta-algorithm. Its output, the boosted classifier, is a weighted sum of the outputs of other (weak) learning algorithms. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favour of the instances misclassified by previous classifiers. Individual learners may be weak, but as long as each performs slightly better than random guessing, the final model converges to a strong learner. In this study, the dataset is denoted D. AdaBoost works with instance weights, together with the Gini index (the impurity measure used by classification and regression trees) given by Eq. 1, where p_i is the proportion of samples in a node that belong to class i and n is the number of classes. The logic of AdaBoost is presented in Algorithm 1, and the proposed framework is shown in Fig. 1.

$$\text{Gini index} = 1 - \sum_{i=1}^{n} p_i^{2} \qquad (1)$$
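The Gini index of Eq. 1 can be computed directly from the class labels that reach a tree node. The following is a minimal Python sketch; the function name gini_index and the example labels are illustrative and not taken from the study. A node holding six positive and four negative cases gives 1 − (0.6² + 0.4²) = 0.48.

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# Hypothetical node: 6 patients with heart disease (1), 4 without (0)
node = [1] * 6 + [0] * 4
print(round(gini_index(node), 2))  # 0.48
```

A pure node (all labels identical) has impurity 0, and a two-class node split evenly reaches the maximum of 0.5.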


Fig. 1 Proposed methodology (flowchart: dataset gathering → pre-processing → feature selection → tuning the parameters → ML algorithms + AdaBoost → statistical result → web interface → final prediction of heart disease)

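Algorithm 1 AdaBoost
Input: dataset D = {(D_i, y_i)}, i = 1, 2, …, n, with labels y_i ∈ {−1, +1}; number of rounds E
1. Initialize the example weights $w_i = 1/n$, $i = 1, 2, \ldots, n$.
2. For $e = 1$ to $E$:
(a) Fit a classifier $G_e(D)$ to the training data using the weights $w_i$.
(b) Compute the weighted error
$$\mathrm{err}_e = \frac{\sum_{i=1}^{n} w_i \, I\big(y_i \neq G_e(D_i)\big)}{\sum_{i=1}^{n} w_i}$$
(c) Compute $\alpha_e = \log\big((1 - \mathrm{err}_e)/\mathrm{err}_e\big)$.
(d) Set the weights in exponential form: $w_i \leftarrow w_i \exp\big[\alpha_e \, I\big(y_i \neq G_e(D_i)\big)\big]$, $i = 1, 2, \ldots, n$.
3. Final output: $O(D) = \operatorname{sign}\Big[\sum_{e=1}^{E} \alpha_e \, G_e(D)\Big]$.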
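Algorithm 1 can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation: it assumes labels coded as −1/+1 and uses one-level decision stumps as the weak learners G_e, chosen by weighted misclassification error (a tree library would typically split on the Gini index instead). The helper names fit_stump, stump_predict and adaboost, and the toy data, are hypothetical.

```python
import math

def fit_stump(X, y, w):
    """Return the (feature, threshold, polarity) stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] >= t else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best[1:]

def stump_predict(stump, row):
    j, t, polarity = stump
    return polarity if row[j] >= t else -polarity

def adaboost(X, y, E=10):
    n = len(X)
    w = [1.0 / n] * n                                # step 1: w_i = 1/n
    ensemble = []
    for _ in range(E):
        stump = fit_stump(X, y, w)                   # (a) fit G_e using weights w_i
        miss = [stump_predict(stump, row) != yi for row, yi in zip(X, y)]
        err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)   # (b) weighted error
        err = min(max(err, 1e-10), 1 - 1e-10)        # avoid log(0) when a stump is perfect
        alpha = math.log((1 - err) / err)            # (c) weight of this learner
        w = [wi * math.exp(alpha * m) for wi, m in zip(w, miss)]  # (d) boost misclassified
        ensemble.append((alpha, stump))

    def predict(row):                                # final output: sign of weighted vote
        score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
        return 1 if score >= 0 else -1
    return predict

# Toy 1-D data (hypothetical): class +1 for values of 4 and above
X = [[1], [2], [3], [4], [5], [6]]
y = [-1, -1, -1, 1, 1, 1]
predict = adaboost(X, y, E=5)
print([predict([x]) for x in (1.5, 5.5)])  # [-1, 1]
```

Because the misclassified examples are multiplied by exp(α_e) in step (d), each new stump concentrates on the cases the previous ones got wrong.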


AdaBoost was selected as the proposed algorithm because the dataset was collected from a real-time scenario and because its use of the Gini index makes it possible to determine which data is more relevant for the analysis. The algorithm was also subjected to hyperparameter tuning for better analytical results. In ML, hyperparameter tuning (or optimization) is the problem of choosing a set of optimal hyperparameters for a learning algorithm. ML models require different settings, such as weights or learning rates, to generalize to different data patterns. Hyperparameter optimization finds a set of hyperparameters that yields an optimal model, one that minimizes a predefined loss function on given independent data. The tuned values are: base_estimator (object) = 15; n_estimators = 70; learning_rate (float) = 0.5.
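The paper does not state which search strategy produced the tuned values. One common option is exhaustive grid search; the sketch below (a hypothetical helper grid_search and a stand-in loss function) evaluates every combination in a parameter grid and keeps the one with the lowest loss. In practice, loss_fn would train the ensemble with the given n_estimators and learning_rate and return a validation loss such as 1 − accuracy.

```python
from itertools import product

def grid_search(param_grid, loss_fn):
    """Evaluate every combination in param_grid; return (best_params, best_loss)."""
    names = list(param_grid)
    best_params, best_loss = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = loss_fn(**params)
        if loss < best_loss:                 # keep the lowest loss seen so far
            best_params, best_loss = params, loss
    return best_params, best_loss

# Stand-in loss (hypothetical); a real one would train and validate the model
def toy_loss(n_estimators, learning_rate):
    return abs(n_estimators - 100) / 100 + abs(learning_rate - 0.1)

grid = {"n_estimators": [50, 70, 100], "learning_rate": [0.1, 0.5, 1.0]}
print(grid_search(grid, toy_loss))  # ({'n_estimators': 100, 'learning_rate': 0.1}, 0.0)
```

Grid search is simple and exhaustive, but its cost grows multiplicatively with each added hyperparameter, which is why random or Bayesian search is often preferred for larger grids.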

4 Performance Evaluation

4.1 Data Analysis

In ML, exploratory data analysis (EDA) is a technique for analysing a dataset to summarize its main characteristics, often with visual methods. EDA shows what can be concluded from the data before any modelling task. Going through a complete spreadsheet by hand to find the relevant features of the data is tedious and inefficient, which is where data analysis comes in. Python libraries such as Pandas, Matplotlib, Seaborn and NumPy make it possible to perform such analysis in just a few lines of code.
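As a dependency-free illustration of the kind of summary such libraries produce (pandas' DataFrame.describe() reports count, mean, standard deviation, minimum and maximum, among other statistics), the sketch below computes per-column summaries with Python's statistics module and also counts missing values. The records and the function name summarize are hypothetical.

```python
import statistics

def summarize(rows, columns):
    """Per-column count, missing values, mean, sample std, min and max.
    Assumes every column has at least one non-missing value."""
    summary = {}
    for col in columns:
        values = [r[col] for r in rows if r.get(col) is not None]
        summary[col] = {
            "count": len(values),
            "missing": len(rows) - len(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary

# Hypothetical patient records; None marks a missing measurement
records = [
    {"age": 50, "chol": 200},
    {"age": 60, "chol": None},
    {"age": 40, "chol": 240},
]
stats = summarize(records, ["age", "chol"])
print(stats["age"]["mean"], stats["chol"]["missing"])  # 50 1
```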

4.2 Classifiers Evaluated

(a) NB Classifier
Naive Bayes is a probabilistic ML algorithm based on Bayes' theorem. The simplest solutions are sometimes the best, and Naive Bayes demonstrates this. Each variable is treated as independent of the others given the class, and the classifier often predicts well even when the variables are not truly independent. It is used mainly in text classification, where datasets are high-dimensional. It can handle both discrete and continuous data, it scales well, it supports real-time predictions and it performs well when multiple classes are involved.
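For continuous inputs such as blood pressure or cholesterol, a common variant is Gaussian Naive Bayes, which models each feature within a class as normally distributed. The sketch below is a minimal pure-Python illustration under that assumption (the study does not say which Naive Bayes variant it used); the function names and toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate each class's prior and per-feature mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # small constant keeps the variance positive for constant features
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Pick the class with the largest log posterior, treating features as
    conditionally independent given the class (the 'naive' assumption)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        lp = math.log(prior)
        for x, m, var in zip(row, means, variances):
            lp += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return lp
    return max(model, key=log_posterior)

# Hypothetical [resting blood pressure, cholesterol] values; 1 = heart disease
X = [[120, 180], [130, 190], [160, 260], [170, 280]]
y = [0, 0, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [125, 185]), predict_gaussian_nb(model, [165, 270]))  # 0 1
```

Working in log space turns the product of per-feature likelihoods into a sum, which avoids numerical underflow when there are many features.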


(b) K-Nearest Neighbour
KNN is one of the simplest supervised ML algorithms. It measures the similarity between the available cases and a new case and assigns the new data to the category it is most similar to. KNN stores all the available data and classifies a new data point on the basis of similarity. It is flexible and can be used for both classification and regression. It suits applications that require high accuracy, although the quality of its predictions depends on the distance measure. KNN needs no training time for classification or regression, since all the work happens at prediction time.

(c) Logistic Regression (LR)
LR is one of the most prominent and popular ML algorithms. It is a supervised learning technique used mainly to predict a categorical dependent variable, so the outcome must be a categorical or discrete value. Logistic regression is similar to linear regression except in how it is used: instead of fitting a regression line, it fits an S-shaped logistic function whose output is bounded between 0 and 1. A major reason for its popularity is that it maps log-odds, which range from negative to positive infinity, to values between 0 and 1. LR is simple and therefore easy to train, implement and interpret.

(d) Support Vector Machine
SVM is one of the most prominent supervised learning algorithms and is used for both classification and regression problems in ML.
It aims to create a decision boundary that separates an n-dimensional space into classes, so that new data points can easily be placed in the correct category in the future. This decision boundary is called a hyperplane, and SVM uses the extreme points (support vectors) to create it. SVM uses a technique known as the kernel trick to transform the data and then finds the optimal boundary between the possible outputs. It is effective in cases where the number of dimensions exceeds the number of samples.

(e) Decision Tree
A DT has a flowchart-like tree structure used for classification and prediction, with two types of nodes, internal and external. Each internal node denotes a test on an attribute, each branch represents an outcome of the test and each leaf node holds a class label. DT models are easier to read and interpret than many other algorithms and are very fast to implement, but their predictive accuracy is often lower. The algorithm focuses on building a model for prediction rather than on predicting results directly, and it is used mostly where classification is required.

(f) Random Forest
Random forest is a supervised ML algorithm widely used for classification and regression problems. It builds DTs on different samples of the data and takes their majority vote for classification (and their average for regression). It can handle datasets containing continuous variables, as in regression, and categorical variables, as in classification, and it gives better results for classification problems. Random forest produces more accurate outcomes than a single DT and is considered one of the most powerful algorithms because it reduces overfitting without greatly increasing the error due to bias.
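The KNN procedure described in (b) can be sketched as follows. This minimal version assumes Euclidean distance and k = 3 (the study reports neither), and relies on math.dist, available from Python 3.8; knn_predict and the toy points are hypothetical. There is no fitting step: the training data is simply stored and scanned when a query arrives.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to query (Euclidean).
    There is no training step: all the work happens here, at prediction time."""
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points from two well-separated groups
X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, [2, 2]), knn_predict(X, y, [6, 5]))  # 0 1
```

An odd k avoids tied votes in two-class problems; in practice features should also be scaled, since otherwise a feature with a large range (such as cholesterol) dominates the distance.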

4.3 Feature Selection

Feature selection is one of the key concepts in ML and has a large impact on model performance. The features used to train a learning model strongly influence its performance and behaviour, and irrelevant features included in training can degrade the model. In our model, features are evaluated using linear regression together with the correlation coefficient, with a threshold value of 0.5. Linear regression finds the relationship between a dependent variable and one or more independent variables by modelling it with a linear predictor whose parameters are estimated from the data; such models are called linear models. Linear regression focuses on the conditional probability distribution of the dependent variable given the independent variables, rather than on their joint probability distribution, as given by Eq. 2:

$$Z_i = f(A_i, \beta) + e_i \qquad (2)$$

where Z_i is the dependent variable, f is the regression function, A_i is the independent variable, β denotes the unknown parameters and e_i is the error term. The correlation coefficient measures the linear correlation between two sets of data. It is the ratio of the covariance of two variables to the product of their standard deviations, i.e. a normalized measure of covariance, so its value always lies between −1 and 1. Figure 2 shows the identified features of the dataset along with their correlation values.
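The correlation coefficient described above is r = cov(X, Y) / (σ_X σ_Y). The sketch below computes it in plain Python and keeps the features whose correlation with the target reaches the 0.5 threshold used in this study. Applying the threshold to the absolute value |r| is an assumption (the paper does not say how negative correlations are treated), and the function names, feature names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation r = cov(x, y) / (std(x) * std(y)), in [-1, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0                      # a constant column carries no linear signal
    return cov / (sx * sy)              # the 1/n factors cancel, so they are omitted

def select_features(columns, target, threshold=0.5):
    """Keep features whose |r| with the target is at least the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]

# Hypothetical data: binary target and two candidate features
target = [0, 0, 1, 1, 1, 0]
columns = {"x1": [1, 2, 8, 9, 10, 2], "x2": [5, 3, 4, 5, 3, 4]}
print(select_features(columns, target))  # ['x1']
```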


Fig. 2 Correlation matrix

Table 1 Attribute description
No. | Feature | Description | Type
1 | Age | Age | Continuous
2 | Gender | Sex | Categorical
3 | Trestbps | Resting blood pressure | Continuous
4 | Chol | Serum cholesterol | Continuous
5 | Fbs | Fasting blood sugar | Continuous
6 | Restecg | Resting electrocardiographic results | Categorical
7 | Thalach | Maximum heart rate achieved | Continuous
8 | Exang | Exercise-induced angina | Categorical
9 | Oldpeak | ST depression | Continuous
10 | Slope | Slope of the peak exercise ST segment | Categorical
11 | Ca | Major vessels coloured by fluoroscopy | Categorical
12 | Thal | Thalassemia | Categorical
13 | ECG level | Most frequent level of categorical ECG | Categorical
14 | cp | Type of chest pain | Categorical

Table 1 lists the features used to develop the model for predicting heart attack in an individual; out of the total features, 14 were identified for analysis.

5 Result

We evaluated several algorithms for prediction using the following measures: accuracy, the fraction of all predictions that are correct; precision, the fraction of cases predicted positive that are actually positive; recall, the fraction of actual positive cases that the model correctly identifies (so it falls as false negatives increase); and F1-score, the harmonic mean of precision and recall.

Table 2 Performance measures
# | Measure | Formula
1 | Accuracy | $\frac{TP+TN}{TP+TN+FP+FN}$
2 | Precision | $\frac{TP}{TP+FP}$
3 | Recall | $\frac{TP}{TP+FN}$
4 | F1-score | $\frac{TP}{TP+\frac{1}{2}(FP+FN)}$

Table 3 Observations (all values in %)
Algorithm | Accuracy | Precision | Recall | F1-score
AdaBoost | 96.09 | 98.00 | 99.00 | 99.00
Naïve Bayes | 94.72 | 96.33 | 83.61 | 89.62
KNN | 89.06 | 89.37 | 87.34 | 88.34
Logistic regression | 88.13 | 89.42 | 82.56 | 85.85
SVM | 88.62 | 91.32 | 80.64 | 85.64
DT | 92.03 | 93.24 | 86.40 | 89.68
Random forest | 89.21 | 91.50 | 84.24 | 87.72

Fig. 3 Accuracy visualization (bar chart of the accuracy measure, 0–100, for AdaBoost, Naïve Bayes, KNN, logistic regression, SVM, decision tree and random forest)

The formulae are given in Table 2, where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively. Table 3 reports the experimental values of the performance measures on the dataset, and Figs. 3 and 4 visualize them.
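The formulae in Table 2 translate directly into code. The sketch below computes all four measures from confusion-matrix counts; the function name and the counts are hypothetical and do not come from the study. It also illustrates that TP/(TP + (FP + FN)/2) is the same quantity as the harmonic mean 2PR/(P + R).

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)          # share of predicted positives that are correct
    recall = tp / (tp + fn)             # share of actual positives that are found
    f1 = tp / (tp + 0.5 * (fp + fn))    # equivalent to 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical confusion-matrix counts
scores = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print([round(s, 4) for s in scores])  # [0.85, 0.8889, 0.8, 0.8421]
```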

Fig. 4 Other performance visualization (bar chart of precision, recall and F1-score, 0–100, for AdaBoost, Naïve Bayes, KNN, LR, SVM, DT and random forest)

6 Conclusion

In this paper, we have presented a prediction system for cardiovascular diseases. Heart-related diseases are increasing and are often fatal, but when they are recognized at an early stage they can be treated, reducing the chance of death from cardiovascular disease. We tested a number of algorithms on our real-time dataset for prediction. After applying the different algorithms and optimization, we conclude that AdaBoost delivered the best results among the evaluated classifiers and, when deployed in the designed web interface, produced more precise predictions than the other classifiers. Future work will aim to design a more robust model on the real-time dataset by integrating deep learning methods with ensemble methods.

References
1. Mohan S, Thirumalai C, Srivastava G (2019) Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7:81542–81554
2. Shah D, Patel S, Bharti SK (2020) Heart disease prediction using machine learning techniques. SN Comput Sci 1(6):1–6
3. Nikhar S, Karandikar AM (2016) Prediction of heart disease using machine learning algorithms. Int J Adv Eng Manage Sci 2(6):239484
4. Jiang H, Mao H, Lu H, Lin P, Garry W, Lu H, Yang G, Rainer TH, Chen X (2021) Machine learning-based models to support decision-making in emergency department triage for patients with suspected cardiovascular disease. Int J Med Informatics 145:104326
5. Rubini PE, Subasini CA, Katharine AV, Kumaresan V, Kumar SG, Nithya TM (2021) A cardiovascular disease prediction using machine learning algorithms. Ann Roman Soc Cell Biol, 904–912
6. Sun W, Zhang P, Wang Z, Li D (2021) Prediction of cardiovascular diseases based on machine learning. ASP Trans Internet Things 1(1):30–35
7. Ghosh P, Azam S, Jonkman M, Karim A, Shamrat FJM, Ignatious E, Shultana S, Beeravolu AR, De Boer F (2021) Efficient prediction of cardiovascular disease using machine learning algorithms with relief and LASSO feature selection techniques. IEEE Access 9:19304–19326
8. Krittanawong C, Virk HUH, Bangalore S, Wang Z, Johnson KW, Pinotti R, Zhang HJ, Kaplin S, Narasimhan B, Kitai T, Baber U, Halperin JL, Tang WH (2020) Machine learning prediction in cardiovascular diseases: a meta-analysis. Sci Rep 10(1):1–11
9. Sharma N, Mishra MK, Chadha JS, Lalwani P (2021) Heart stroke risk analysis: a deep learning approach. In: 2021 10th IEEE international conference on communication systems and network technologies (CSNT). IEEE, pp 543–598
10. Siddiqui SY, Athar A, Khan MA, Abbas S, Saeed Y, Khan MF, Hussain M (2020) Modelling, simulation and optimization of diagnosis cardiovascular disease using computational intelligence approaches. J Med Imag Health Informatics 10(5):1005–1022
11. Maini E, Venkateswarlu B, Gupta A (2018) Applying machine learning algorithms to develop a universal cardiovascular disease prediction system. In: International conference on intelligent data communication technologies and internet of things. Springer, Cham, pp 627–632
12. Marimuthu M, Abinaya M, Hariesh KS, Madhankumar K, Pavithra V (2018) A review on heart disease prediction using machine learning and data analytics approach. Int J Comput Appl 181(18):20–25
13. Parthiban G, Srivatsa SK (2012) Applying machine learning methods in diagnosing heart disease for diabetic patients. Int J Appl Inf Syst 3(7):25–30
14. Thenmozhi K, Deepika P (2014) Heart disease prediction using classification with different decision tree techniques. Int J Eng Res General Sci 2(6):6–11
15. Kaur PD, Chana I (2014) Cloud based intelligent system for delivering health care as a service. Comput Methods Programs Biomed 113(1):346–359
16. Ramalingam VV, Dandapath A, Raja MK (2018) Heart disease prediction using machine learning techniques: a survey. Int J Eng Technol 7(2.8):684–687
17. Thomas J, Princy RT (2016) Human heart disease prediction system using data mining techniques. In: 2016 international conference on circuit, power and computing technologies (ICCPCT). IEEE, pp 1–5
18. Jabbar MA, Deekshatulu BL, Chandra P (2014) Alternating decision trees for early diagnosis of heart disease. In: International conference on circuits, communication, control and computing. IEEE, pp 322–328
19. Angayarkanni G, Hemalatha S. Towards analyzing the prediction of developing cardiovascular disease using implementation of machine learning techniques
20. Al’Aref SJ, Anchouche K, Singh G, Slomka PJ, Kolli KK, Kumar A, Pandey M, Maliakal G, van Rosendael AR, Beecy AN, Berman DS, Leipsic J, Nieman K, Andreini D, Pontone G, Schoepf UJ, Shaw LJ, Chang H-J, Narula J, Bax JJ, Guan Y, Min JK (2019) Clinical applications of machine learning in cardiovascular disease and its relevance to cardiac imaging. Eur Heart J 40(24):1975–1986
21. Joo G, Song Y, Im H, Park J (2020) Clinical implication of machine learning in predicting the occurrence of cardiovascular disease using big data (Nationwide Cohort Data in Korea). IEEE Access 8:157643–157653
22. Anbarasi M, Anupriya E, Iyengar NCSN (2010) Enhanced prediction of heart disease with feature subset selection using genetic algorithm. Int J Eng Sci Technol 2(10):5370–5376
23. Parthiban L, Subramanian R (2008) Intelligent heart disease prediction system using CANFIS and genetic algorithm. Int J Biol Biomed Med Sci 3(3)
24. Lakshmanarao A, Srisaila A, Kiran TSR (2021) Heart disease prediction using feature selection and ensemble learning techniques. In: 2021 third international conference on intelligent communication technologies and virtual mobile networks (ICICV). IEEE, pp 994–998
25. Ali MM, Paul BK, Ahmed K, Bui FM, Quinn JM, Moni MA (2021) Heart disease prediction using supervised machine learning algorithms: performance analysis and comparison. Comput Biol Med 136:104672
26. Paliwal S, Bharti V, Singh S (2020) 3. Innovations in biological applications with machine learning. In: Srivastava R, Nguyen N, Khanna A, Bhattacharyya S (eds) Predictive intelligence in biomedical and health informatics. De Gruyter, Berlin, Boston, pp 49–62

Feedback Analysis of Online Teaching Using SVM Punit Mittal, Kartikey Tiwari, Kanupriya Malik, and Meghna Tyagi

Abstract This study examines how the quality of online teaching [1] can be improved in the coming years. Data were collected through a Google form filled in by the students. Feedback is the most important attribute of assessment, as it gives students a statement of their learning and advises them how to improve. The results can help enhance the quality of online teaching in the educational system and identify the factors that help improve it. Sentiment analysis [2] is also used. Keywords Student feedback · NLP · Sentiment analysis · COVID-19 · Online learning

P. Mittal (B) · K. Tiwari · K. Malik · M. Tyagi Meerut Institute of Engineering and Technology, Meerut 250001, India e-mail: [email protected] K. Tiwari e-mail: [email protected] K. Malik e-mail: [email protected] M. Tyagi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_11

1 Introduction Due to the outbreak of COVID-19 [3], the education system across the world has been greatly affected. Academic institutions were shut down to break the chain of COVID-19 transmission, so an online learning approach was adopted to meet learners' needs. Today's studies aim to satisfy students' requirements [4]. Online learning clearly limits exposure to the outside world, but with one click students can still access their notes without fear of losing them. The purpose of the project is to identify the needs of the learners. We collect students' feedback and analyse it with machine learning tools in order to deliver knowledge better and in the right environment. In particular, we wanted to know students' opinions on content delivery, along with the social effects of the education they are receiving online.

2 Literature Review The central problem in sentiment analysis is categorisation: given a piece of written text, the task is to categorise it as positive or negative. Categorisation can be performed at three levels. Here, we summarise lists of positive and negative words based on customers' reviews. A technique has also been proposed in which tokens are used as features. Online learning is an alternative and versatile approach; new technologies have made it much easier to adopt, but it is not an instant solution. Users of this approach often remain largely dissatisfied with web-based learning. The pandemic has given rise to a number of challenges, so many practical improvements are required to deal with login issues and with video and audio errors. Studying alone at home forces individuals to confront Internet-based learning on their own. Often, more than one method must be combined, which can be hard to carry out. Clear solutions to all of these issues, together with satisfactory testing, are consistently needed.

3 Methodology To understand any system well, we need to study its overall workflow. The workflow of our model is shown as a flow chart in Fig. 1.

3.1 Sentiment Analysis Data Preparation For the first model, we downloaded a data set of student feedback from the Internet to perform sentiment analysis. The data set has two columns: feedback and polarity score. Next, we added a column that describes the nature of each feedback [5] (positive, neutral or negative) based on its polarity score.

Fig. 1 Flowchart

EDA [6] Exploratory data analysis (EDA) is the process of recognising patterns in a data set and forming hypotheses about it. It consists of creating numerical summaries of the data as well as graphical representations of it. The distribution of the data (Fig. 2) visualises its spread: analysing the percentages of positive, neutral and negative feedback gives a better understanding of the data. We then created three additional features, "word count", "character count" and "sentence count" (Fig. 3), to learn more about the data; these were computed with lambda functions in Python. Because these overall statistics do not give a clear picture of the negative, positive and neutral classes, we divided the data into negative, neutral and positive subsets and described each one with pandas functionality (Fig. 4). Next, a correlation heat map (Fig. 5) is used to identify the features that are crucial for model building and to recognise correlations between features. If two features are highly correlated, one of them is removed; this helps produce better results and gives better insight.

Fig. 2 Distribution of positive, neutral and negative data

Fig. 3 Description of paragraph

Fig. 4 Description of positive, negative and neutral paragraph

Fig. 5 Correlation between the features

Data Pre-processing Pre-processing cleans text data and gets it ready to feed into a model. This step is necessary because text data vary widely: numerous factors can inject noise into the data, ranging from its source (website, text message, voice recognition) to the individuals who provide it (language, dialect). The ultimate purpose of text cleaning and preparation is to reduce the text to only the words required for the NLP objectives. For cleaning and processing the text data, we employ a few standard methods [7]:
1. Remove Punctuation: punctuation is removed from the text; there are 32 main punctuation characters to deal with.
2. Remove Words and Digits: words combined with digits, such as game57 or game44ts66, sometimes appear in the text and make it difficult for machines to understand, so they are removed.
3. Tokenisation: before the text is converted into vectors, it is split into tokens. Tokenisation is mainly classified into word, sub-word and character tokenisation.
4. Remove Stop Words: words that are not important to the structure of the sentence are removed. These are usually the most common words in the language and provide no insight into the tone of the statement.
5. Lemmatisation: like stemming, lemmatisation turns words into their root form, but it works differently: it systematically reduces each word to its lemma by comparing it with a linguistic dictionary. Because this takes a long time, the Porter Stemmer can be used instead to save time.
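The data preparation, EDA and pre-processing steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the zero thresholds in `label_from_polarity`, the small `STOP_WORDS` set and the regular expressions are assumptions, and lemmatisation (step 5) is left out so that the sketch runs without downloading any NLTK data.

```python
import re
import string

# Small illustrative stop-word list (an assumption); a real pipeline would use
# a full list such as the NLTK English stop words.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "was", "were", "in", "of", "to", "it"}


def label_from_polarity(score: float) -> str:
    """Map a polarity score to a sentiment class (zero thresholds assumed)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def text_features(text: str) -> dict:
    """Word, character and sentence counts used during EDA."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(text.split()),
        "char_count": len(text),
        "sentence_count": len(sentences),
    }


def preprocess(text: str) -> list:
    """Steps 1-4 of the pre-processing list; lemmatisation (step 5) is omitted."""
    # 1. Remove punctuation: string.punctuation holds the 32 ASCII punctuation marks.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # 2. Remove any token that contains a digit, e.g. "57" or "game44ts66".
    text = re.sub(r"\b\w*\d\w*\b", " ", text)
    # 3. Word-level tokenisation, lower-cased so stop-word matching ignores case.
    tokens = text.lower().split()
    # 4. Remove stop words.
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("The lectures were clear, but the audio in game44ts66 was bad!"))
```

With pandas, the helpers can be applied row by row in the lambda style described above, for example `df["word_count"] = df["feedback"].apply(lambda x: text_features(x)["word_count"])`, and `df[["word_count", "char_count", "sentence_count"]].corr()` gives the correlation matrix behind a heat map; the column name `feedback` is assumed.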


Fig. 6 Before and after the transformation of the feedback

The pre-processing function performs these tasks, creating a new column after each step for every feedback. The result of the data pre-processing is shown in the transformed text column in Fig. 6.

Vectorising Data We now have a corpus of words to work with. Next, the corpus must be vectorised [8]. Vectorisation is the act of encoding text as numbers, i.e. in numeric form, to produce feature vectors that an ML algorithm can interpret. There are a variety of techniques for vectorising data, but we use Word2Vec, because Bag of Words and TF-IDF have a number of drawbacks: they give weight to uncommon words, which is not good for our model, and they cannot keep track of word context, which is critical when predicting quality. In Word2Vec, each word is represented not as a single number but as a vector with 32 or more dimensions.

Model Building Here, we focus on supervised machine learning techniques. A train-test split is a way to judge the performance of an ML algorithm: the data set is separated into two subsets. The first subset, the training data set, is used to fit the model. The inputs of the second subset, the test data set, are given to the model, which makes predictions that are then compared with the expected values.
• Train dataset: used to fit the ML model.
• Test dataset: used to evaluate the fitted ML model.
In this study, we used supervised classification algorithms, some of which can be used for both regression and classification. We tried all of these algorithms and found that SVM [9] performed well. A support vector machine (SVM) is a discriminative method that builds a separating model by finding the maximum-margin boundary between categories. SVM was originally designed for binary classification problems, in which the target has only two labels; multiclass versions were introduced later. Because our problem is multiclass, we use the one-against-all SVM technique to build the model. Classification of three classes is shown in Fig. 7. One-against-all SVM creates n support vector models, where n is the number of target classes. Each model is trained on the data set with its own class labelled positive and all remaining classes labelled negative. With this approach, we obtained an accuracy of 70%; the confusion matrix is shown in Fig. 8.

Fig. 7 Hyper-plane separating three classes in SVM
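As a hedged illustration of the one-against-all scheme, the sketch below trains one linear SVM per class by hinge-loss subgradient descent (a simple primal solver chosen so the example needs no libraries; it is not necessarily the solver the authors used) and reports accuracy and a confusion matrix. The two-dimensional feature vectors are hypothetical stand-ins for Word2Vec features, and the learning rate, regularisation strength and epoch count are assumed values.

```python
import random
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) of one binary SVM


def train_test_split(X, y, test_ratio=0.25, seed=0):
    """Shuffle the indices once and hold out the first test_ratio share as the test set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_ratio)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_binary_svm(X, y, lam=0.01, lr=0.05, epochs=300) -> Model:
    """Minimise lam/2 * ||w||^2 + hinge loss with subgradient steps; y holds +1/-1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:
                # Inside the margin: step along the hinge-loss subgradient.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Outside the margin: only the regulariser shrinks w.
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b


def train_one_vs_rest(X, labels, **kwargs) -> Dict[str, Model]:
    """One binary SVM per class: that class is +1, all remaining classes are -1."""
    return {c: train_binary_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models: Dict[str, Model], x: Sequence[float]) -> str:
    """Choose the class whose SVM returns the largest decision value w.x + b."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])


def accuracy(y_true, y_pred) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, classes) -> List[List[int]]:
    """Rows are true classes and columns are predicted classes, both in `classes` order."""
    pos = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[pos[t]][pos[p]] += 1
    return matrix


# Toy 2-D features standing in for Word2Vec vectors (hypothetical data).
X = [[2.0, 0.2], [2.5, -0.1], [1.8, 0.1], [-2.0, 0.1], [-2.4, -0.2],
     [-1.9, 0.0], [0.1, 2.0], [-0.2, 2.3], [0.0, 1.8]]
y = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 3
models = train_one_vs_rest(X, y)
preds = [predict(models, x) for x in X]
print(accuracy(y, preds), confusion_matrix(y, preds, ["negative", "neutral", "positive"]))
```

For brevity the demo evaluates on the training points; with real data, the model would be fitted on the training split from `train_test_split` and scored on the held-out test split. In practice, scikit-learn's `OneVsRestClassifier` wrapping `SVC` or `LinearSVC`, together with `sklearn.model_selection.train_test_split` and `sklearn.metrics.confusion_matrix`, would normally replace these hand-written helpers.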

Fig. 8 Confusion matrix of SVM on students' feedbacks

3.2 Experimental Result Analysis The first model can be used to predict whether a piece of student feedback is positive, neutral or negative. We collected data through a Google form given to every student, in which students evaluated the online classes. First, students (users) give feedback [10] on each teacher through 5–6 questions. This review data is analysed by an NLP model that assigns each response a sentiment token: positive, negative or neutral. This leaves us with the questions and their tokens. Next, we factorise the answers, i.e. convert the text answers (strongly disagree, disagree, neither agree nor disagree, agree, strongly agree) into numeric ratings on a five-point scale. The answers are now numbers, each set paired with a token. On this basis, we build a second ML model, trying several ML algorithms, that takes the answers to the questions as input and returns a token as output. For example, the student is offered answer options for every question, and the answers tell us whether they liked a particular lecture by the teacher. The questions do not all carry equal weight, so the model also determines how much priority each question holds. This gives a proper judgement of a teacher and shows which aspect or skill the teacher should focus on to help students understand the whole context. For example, the teacher's knowledge and interaction in class are both important for a healthy classroom environment, but interaction in class is the more important factor. If neither factor is satisfied, the result is treated as NULL. In this way, the model can point out that, along with knowledge, a teacher needs to be interactive with students. Together, the different questions and their answers give a proper judgement of the quality of education. Figure 9 shows the distribution of all the students' answers.
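The factorisation of the questionnaire answers described above can be illustrated with a five-point mapping. In the paper, a second ML model (a random forest, according to the results) learns how much each question matters; here fixed, hypothetical weights and cut-offs stand in for that model, purely to show how a higher weight on interaction changes the verdict.

```python
# Five-point encoding of the answer options; the exact option wording is assumed.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def encode_answers(answers):
    """Factorise text answers into ratings on a scale of 1 to 5."""
    return [LIKERT[a.strip().lower()] for a in answers]


def weighted_rating(ratings, weights):
    """Weighted mean rating; weights express how much each question matters (hypothetical)."""
    if len(ratings) != len(weights):
        raise ValueError("one weight per question is required")
    return sum(r * w for r, w in zip(ratings, weights)) / sum(weights)


def to_token(score, low=2.5, high=3.5):
    """Turn a weighted rating into a sentiment token; the cut-offs are assumptions."""
    if score >= high:
        return "positive"
    if score <= low:
        return "negative"
    return "neutral"


# Knowledge (weight 1) rated "strongly agree", interaction (weight 3) rated "disagree".
ratings = encode_answers(["strongly agree", "disagree"])
print(to_token(weighted_rating(ratings, [1, 3])))
```

Here the ratings are [5, 2], so `weighted_rating` returns (5 × 1 + 2 × 3) / 4 = 2.75, which `to_token` maps to "neutral": strong knowledge does not fully offset weak interaction.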

Fig. 9 Questionnaire data analysis

4 Result
• The accuracy of the sentiment analysis model is 70%.
• The second model, trained on the students' questionnaire feedback using the random forest technique, also has an accuracy of 70%.
• In the user interface shown in Fig. 10, the user fills in the form according to their experience, and the system calculates a result from these inputs: positive, negative or neutral.

5 Conclusion Our study found that conventional learning is more effective than online learning. The survey analysis and the sentiment analysis show that the traditional approach is more convenient for students. According to the learners' feedback, online learning lacks practical skill development, which should be improved to make the online system more effective. Online study has benefits as well as challenges. In online learning, teachers are mostly occupied with preparing lessons, which reduces the time they spend actually teaching students.


Fig. 10 User interface

Finally, we recommend that, in addition to making recordings and PDFs available, teachers provide practical projects to build students' skills.

References
1. Feng M, Bienkowski M (2012) Enhancing teaching and learning through educational data mining and learning analytics. Department of Education, Office of Educational Technology
2. Kumar S, Nezhurina MI (2020) Sentiment analysis on tweets for trains using machine learning. Research Gate, p 10
3. Mishra L, Gupta T, Shree A (2020) Online teaching-learning in higher education during lockdown period of COVID-19 pandemic. Int J Educ Res Open
4. Goyal C (2021) Multiclass classification using SVM. Analytics Vidhya, 18 May 2021 [Online]. Accessed 2022
5. Hounsell D (2003) Student feedback, learning and development. Research Gate, p 35
6. Bartlett P, Shawe-Taylor J (1998) Generalization performance of support vector machines and other pattern classifiers. MIT Press
7. Sharma S (2021) To study on impact of the online learning/teaching on the students of higher education. Research Gate, p 59
8. Pantola P (2018) Natural language processing: text data vectorization. Medium [Online]
9. Ari N, Ustazhanov M (2014) Matplotlib in python. In: 11th International conference on electronics, computer and computation (ICECCO), p 6
10. Vineet Yadav HE (2017) Student feedback mining system using sentiment. IRJET

DCGAN for Data Augmentation in Pneumonia Chest X-Ray Image Classification S. P. Porkodi, V. Sarada, and Vivek Maik

Abstract Major advances in medical image science are largely due to deep learning technology, which has performed well in numerous applications such as segmentation and registration. This study presents a data augmentation technique that uses generative adversarial networks (GANs) to generate synthetic chest X-ray images of pneumonia patients. The proposed model combines standard data augmentation methods with GANs to produce more data, generating new chest X-ray images of patients with pneumonia. The generated samples are used to train a deep convolutional neural network (DCNN) model to classify chest X-ray data. Performance metrics were also calculated and compared for the real and synthetic images. Keywords Generative adversarial network (GAN) · Deep convolutional neural network (DCNN) · Deep convolutional generative adversarial network (DCGAN) · Chest X-ray images · Pneumonia

S. P. Porkodi · V. Sarada (B) · V. Maik SRM Institute of Science and Technology, Kattankulathur, Chennai 603206, India e-mail: [email protected] V. Maik e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_12

1 Introduction Pneumonia is a lung infection that inflames the air sacs in one or both lungs. When the air sacs fill with fluid or pus (purulent material), the resulting symptoms include cough with phlegm or pus, fever, chills, and difficulty breathing. Pneumonia can be caused by a number of different organisms, including bacteria, viruses, and fungi [1]. The most common method of identifying pneumonia is a chest X-ray [2]. Deep learning-based solutions are increasingly being used for medical diagnosis [3], and their outcomes can be more accurate than diagnoses made by radiologists [4]. Convolutional neural networks (CNNs) are commonly used in biomedical image diagnostic systems and show promising results for image classification. Optimization in CNNs is tractable because of the way information and gradients flow through them [5]. CNNs may be used to create powerful models that classify chest X-ray images in order to detect pneumonia. Obtaining high performance requires a massive amount of training data, and the model tends to become more accurate as the dataset grows [6]. With too little training data, however, CNNs suffer from overfitting and lose considerable accuracy. To alleviate these problems, generative adversarial networks (GANs) offer another data augmentation option: they generate synthetic samples that can be appended to the real dataset. Unlike traditional data augmentation approaches, which only make small changes to the training set, GANs can produce realistic images that add variability to the training set. This ability has attracted the interest of researchers in medical imaging, with applications such as low-dose CT denoising [7], liver lesion synthesis [8], organ segmentation [9], and cross-modality transfer, for example from MR to CT [10]. This research proposes a deep learning model that detects pneumonia from chest X-ray images. GANs are used to increase the number of samples in the dataset, and the PSNR and MSE values of the original and synthetic images are calculated and compared. The rest of this paper is organized as follows: Sect. 2 reviews related work, Sect. 3 describes the dataset, Sect. 4 describes the proposed technique, Sect. 5 presents the results, and Sect. 6 concludes the paper.

2 Related Work Numerous studies in the medical field have sought to improve deep learning applications for diagnosing pneumonia, each with its own set of benefits and drawbacks. Generative techniques are frequently employed to produce high-dimensional data in image generation, voice processing, and image-to-image translation [10, 11]. GANs, autoregressive models, and variational autoencoders (VAEs) are among the most recent techniques [12, 13]. Autoregressive approaches, such as PixelCNN, produce high-resolution images but take a long time to train; they model the distribution over pixels directly, resulting in images with low variability. VAEs train faster but do not produce high-resolution images. GANs can meet this challenge and produce good-quality images, although their training is unstable and commonly leads to mode collapse.


3 Dataset Description For our research, we used the public Kaggle dataset Chest X-Ray Images (Pneumonia). The collection contains 4273 chest X-ray images of patients with pneumonia and 1583 chest X-ray images of normal persons, 5856 images in total [11].

4 Proposed Approach 4.1 Conventional Data Augmentation The standard data augmentation techniques are affine transformation, geometric transformation, and a blend of the two [12]. We used affine transformation to rotate the images, and each image was randomly flipped from left to right. Gaussian noise was added to the pixel values. Intensity changes were made using contrast enhancement, image sharpening, and histogram equalization. Image sharpening highlights the differences between dark and light areas, whereas histogram equalization increases image contrast.
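The conventional augmentations listed above can be sketched with NumPy for 8-bit greyscale images. This is an illustration under stated assumptions rather than the authors' pipeline: the flip probability of 0.5 and the noise level are assumed, sharpening is omitted, and rotation by an arbitrary angle would need an interpolating routine such as `scipy.ndimage.rotate`.

```python
import numpy as np


def random_horizontal_flip(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Flip the image left to right with probability 0.5 (assumed)."""
    return img[:, ::-1] if rng.random() < 0.5 else img


def add_gaussian_noise(img: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise to the pixel values, then clip to the 8-bit range."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Histogram equalisation of an 8-bit greyscale image via its cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    if cdf[-1] == cdf_min:  # constant image: there is no contrast to stretch
        return img.copy()
    # Look-up table that spreads the occupied grey levels over the full 0-255 range.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[img]


rng = np.random.default_rng(0)
img = np.array([[50, 50], [60, 70]], dtype=np.uint8)
augmented = add_gaussian_noise(random_horizontal_flip(img, rng), sigma=10.0, rng=rng)
print(equalize_histogram(img))
```

On this tiny low-contrast image, the grey levels 50, 60 and 70 are stretched to 0, 128 and 255, which is the contrast increase the text attributes to histogram equalization.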

4.2 Generative Adversarial Network (GAN) GAN designs are defined using game theory: a GAN combines a generator network and a discriminator network. The generator produces samples that should resemble the training images closely enough to confuse the discriminator as much as possible [13]. The discriminator's job is to distinguish between synthetic and genuine images, which pushes the generator to produce realistic images. In game-theoretic terms, this design is a min–max problem. The architecture of the generative adversarial network is shown in Fig. 1.
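For reference, the min–max problem is usually written in the form introduced by Goodfellow et al. [13], where D(x) is the discriminator's estimated probability that x is a real image, G(z) is the image the generator produces from the noise vector z, p_data is the distribution of real images, and p_z is the noise distribution. This is the standard formulation; the paper does not state which loss variant it trains with.

```latex
\min_{G}\max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximises V by assigning high probability to real images and low probability to generated ones, while the generator minimises V by making D(G(z)) large.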

Fig. 1 GAN architecture

4.3 Convolutional Neural Network (CNN) A convolutional neural network (ConvNet/CNN) is a deep learning algorithm that takes an input image and assigns learnable weights and biases to its features so that it can differentiate between them [14]. A ConvNet requires much less preprocessing than other classification techniques: whereas the kernels in primitive techniques are hand-engineered, a ConvNet with appropriate training can learn these filters/characteristics. The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. Individual neurons respond to stimuli only in a small region of the visual field known as the receptive field, and a collection of such overlapping fields covers the whole visual area. The architectures of the generator and discriminator models are shown in Fig. 2.
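To make the receptive-field idea concrete, the following NumPy sketch computes a stride-1, unpadded 2-D cross-correlation, which is what deep learning libraries implement as "convolution". The 2 × 2 kernel of ones is a hand-picked illustration; in a trained CNN, the kernel values are learned.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Stride-1, unpadded 2-D cross-correlation, the operation a convolutional layer applies."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value depends only on this kh x kw window: its receptive field.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.arange(9, dtype=float).reshape(3, 3)
print(conv2d_valid(image, np.ones((2, 2))))
```

Each output value sums one 2 × 2 window, so changing a pixel outside that window (for example, the bottom-right pixel) leaves the top-left output unchanged; stacking layers enlarges the effective receptive field.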

Fig. 2 (a) Generator model and (b) discriminator model

4.4 DCGAN In the DCGAN architecture, both the discriminator and the generator are deep CNNs [15]. The discriminator compares the images produced by the generator with genuine images to distinguish synthetic images from real ones, and it returns the probability that a given input is genuine data. The generator's purpose is to generate image data that the discriminator accepts as real with a high probability [16]. The generator takes as input a 100-dimensional vector sampled at random from a simple distribution. Figure 3 shows a contrast between genuine (real) and synthetic images. The generator's fractionally strided convolutional part is made up of four connected transposed convolutional layers. The ReLU activation function with batch normalization is applied in its hidden layers, whereas the output layer uses the tanh activation function. The discriminator is a CNN with four connected convolutional layers. The leaky ReLU activation function with batch normalization is applied in its hidden layers, whereas the output layer uses the sigmoid activation function. Batch normalization dramatically improves neural network optimization and is especially effective in DCGANs [17]; each batch is normalized using its own statistics [18]. The DCGAN model was trained for ten epochs, with randomly generated noise as the generator input. Figure 4 shows the generator and discriminator training losses. The experiments were carried out in the Google Colaboratory environment using Python and PyTorch. Some elements of the model were also implemented with other Python packages, including NumPy, time, scikit-learn, and Matplotlib. However, because Google Colab allocates GPUs dynamically, the execution times may be inaccurate and vary between trials.
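The layer sizes described above can be checked with the standard output-size formulas for convolution and transposed convolution (the same formulas given in the PyTorch `Conv2d` and `ConvTranspose2d` documentation, with dilation 1). The kernel size 4, stride 2, padding 1 and the initial 1 × 1 to 4 × 4 projection follow the common DCGAN recipe of Radford et al. [17]; the paper does not list these settings, so treat them as assumptions. The sketch also shows batch normalization in its simplest form, normalising a batch with its own mean and variance, with the learnable scale and shift omitted.

```python
import math


def conv_out(size, kernel, stride, padding):
    """Output height/width of a convolution layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding, output_padding=0):
    """Output height/width of a transposed ("fractionally strided") convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def generator_sizes():
    """Project the 100-d noise (as a 1x1 map) to 4x4, then apply four stride-2 layers."""
    sizes = [conv_transpose_out(1, kernel=4, stride=1, padding=0)]
    for _ in range(4):
        sizes.append(conv_transpose_out(sizes[-1], kernel=4, stride=2, padding=1))
    return sizes


def discriminator_sizes(image_size=64):
    """Four stride-2 convolution layers, each halving the spatial size."""
    sizes = [image_size]
    for _ in range(4):
        sizes.append(conv_out(sizes[-1], kernel=4, stride=2, padding=1))
    return sizes


def batch_norm(batch, eps=1e-5):
    """Normalise scalars with the batch's own mean and biased variance (scale 1, shift 0)."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


print(generator_sizes(), discriminator_sizes())
```

With these settings, the generator grows a 4 × 4 map to 64 × 64, matching the image size in Table 1, and the discriminator shrinks a 64 × 64 image back to 4 × 4 before the final sigmoid output.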

Fig. 3 Contrast between genuine (real) and synthetic (fake) images

Fig. 4 Generator and discriminator training loss

4.5 Performance Metrics Peak signal-to-noise ratio (PSNR) is the ratio between the maximum possible power of an image and the power of the noise that affects its quality. To determine the PSNR of an image, it must be compared with an ideal reference image. The PSNR is given by

$$\mathrm{PSNR} = 10\log_{10}\frac{(M-1)^2}{\mathrm{MSE}} = 20\log_{10}\frac{M-1}{\mathrm{RMSE}} \tag{1}$$

where M is the number of possible intensity levels in the image (the lowest intensity level is assumed to be 0, so M − 1 is the highest) and RMSE is the root mean squared error. MSE stands for mean squared error and is defined as

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl(G(i,j) - F(i,j)\bigr)^2 \tag{2}$$

where G is the matrix of the original image, F is the matrix of the degraded image, m is the number of pixel rows in the image, i is the row index, n is the number of pixel columns, and j is the column index.
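Equations (1) and (2) translate directly into NumPy. In this sketch, `peak` plays the role of M − 1 in Eq. (1): 255 for 8-bit images, or 1.0 if pixel values have been scaled to [0, 1]. The function names are illustrative.

```python
import numpy as np


def mse(reference, degraded) -> float:
    """Eq. (2): mean of the squared pixel differences between G and F."""
    g = np.asarray(reference, dtype=np.float64)
    f = np.asarray(degraded, dtype=np.float64)
    if g.shape != f.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((g - f) ** 2))


def psnr(reference, degraded, peak=255.0) -> float:
    """Eq. (1) in decibels, with peak = M - 1."""
    err = mse(reference, degraded)
    if err == 0.0:
        return float("inf")  # identical images: no noise at all
    return float(10.0 * np.log10(peak ** 2 / err))


reference = np.zeros((2, 2))
degraded = np.array([[0, 0], [0, 2]])
print(mse(reference, degraded), psnr(reference, degraded))
```

The 10 log10 and 20 log10 forms in Eq. (1) agree because RMSE is the square root of MSE.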

5 Result and Discussions In this section, we discuss the results of our proposed technique for detecting pneumonia in chest X-rays, which employs DCGAN data augmentation and pre-trained CNNs. In summary, the following tests were carried out:
– Evaluating the performance on the original and the generated synthetic images, as well as the GAN training losses.
– Calculating the PSNR and MSE values.
– Our model achieves a PSNR of 28.11 dB and an MSE of 3.404e−05.
As indicated in Table 1, we used grid search to obtain the optimal hyper-parameter setup; only the best findings and settings are presented and analysed here.
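The grid search can be sketched as an exhaustive loop over every combination of candidate values. The candidate lists in `GRID` and the `toy_evaluate` scoring function are hypothetical stand-ins (a real run would train the DCGAN and CNN and return a validation score); only the winning batch size, epoch count and learning rate match Table 1.

```python
from itertools import product

# Hypothetical search space; only the winning values in Table 1 come from the paper.
GRID = {
    "batch_size": [64, 128],
    "epochs": [5, 10],
    "learning_rate": [0.0002, 0.001],
    "n_synthetic": [500, 1000],
}


def grid_search(grid, evaluate):
    """Try every combination of values in the grid and keep the highest-scoring one."""
    keys = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    """Hypothetical stand-in for 'train the DCGAN + CNN and return validation accuracy'."""
    return ((params["batch_size"] == 128)
            + params["epochs"] / 100
            - abs(params["learning_rate"] - 0.0002)
            + params["n_synthetic"] / 1e5)


best, score = grid_search(GRID, toy_evaluate)
print(best, score)
```

Because every combination is trained and scored, the cost grows multiplicatively with each added hyper-parameter (16 runs here), which is why the grid is usually kept small.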

Table 1 Parameter setting
Parameter        Value
Batch size       128
Image size       64
Epochs           10
Learning rate    0.0002
Beta             0.5
Optimization     Adam

6 Conclusion A GAN with a CNN model was proposed in this work for detecting pneumonia in chest X-ray images. A grid search was performed over parameters including the number of epochs, learning rates, batch size, and number of synthetic images to establish the ideal hyper-parameter configuration, which was kept the same across trials to allow a fair assessment. To improve model performance, the DCGAN approach was used, and synthetic images were added to balance the dataset. The proposed model achieves high accuracy, and the PSNR and MSE values were computed for both the synthetic and genuine images.

References
1. José RJ, Brown JS (2016) Opportunistic bacterial, viral and fungal infections of the lung. Medicine 44(6):378–383
2. World Health Organization (2001) Standardization of interpretation of chest radiographs for the diagnosis of pneumonia in children. World Health Organization
3. Kallianos K, Mongan J, Antani S, Henry T, Taylor A, Abuya J, Kohli M (2019) How far have we come? Artificial intelligence for chest radiograph interpretation. Clin Radiol 74(5):338–345
4. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL (2018) Artificial intelligence in radiology. Nat Rev Cancer 18(8):500–510
5. Rajpurkar P, Irvin J, Zhu K, Yang B, Mehta H, Duan T, Ding D, Bagul A, Langlotz C, Shpanskaya K, Lungren MP (2017) CheXNet: radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv preprint arXiv:1711.05225
6. Jain R, Nagrath P, Kataria G, Kaushik VS, Hemanth DJ (2020) Pneumonia detection in chest X-ray images using convolutional neural networks and transfer learning. Measurement 165:108046
7. Kim J, Kim J, Han G, Rim C, Jo H (2020) Low-dose CT image restoration using generative adversarial networks. Inform Med Unlocked 21:100468
8. Frid-Adar M, Klang E, Amitai M, Goldberger J, Greenspan H (2018) Synthetic data augmentation using GAN for improved liver lesion classification. In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018), 4 April 2018. IEEE, pp 289–293
9. Dong X, Lei Y, Tian S, Wang T, Patel P, Curran WJ, Jani AB, Liu T, Yang X (2019) Synthetic MRI-aided multi-organ segmentation on male pelvic CT using cycle consistent deep attention network. Radiother Oncol 141:192–199
10. Qian P, Xu K, Wang T, Zheng Q, Yang H, Baydoun A, Zhu J, Traughber B, Muzic RF (2020) Estimating CT from MR abdominal images using novel generative adversarial networks. J Grid Comput 18(2):211–226
11. Hashmi MF, Katiyar S, Keskar AG, Bokde ND, Geem ZW (2020) Efficient pneumonia detection in chest X-ray images using deep transfer learning. Diagnostics 10(6):417
12. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25
13. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst 27
14. Li Z, Liu F, Yang W, Peng S, Zhou J (2021) A survey of convolutional neural networks: analysis, applications, and prospects. IEEE Trans Neural Netw Learn Syst
15. Srivastav D, Bajpai A, Srivastava P (2021) Improved classification for pneumonia detection using transfer learning with GAN based synthetic image augmentation. In: 2021 11th International conference on cloud computing, data science & engineering (Confluence), 28 January 2021. IEEE, pp 433–437
16. Nitz DA (2006) Tracking route progression in the posterior parietal cortex. Neuron 49(5):747–756
17. Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434
18. Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training GANs. Adv Neural Inf Process Syst 29

Fair Quality of Voice Over WiMAX Coexisting of WiFi Networks for Video Streaming Applications
V. R. Vinothini, C. Ezhilazhagan, and K. Sakthisudhan

Abstract In earlier wireless technology, transmitting voice with high quality was very difficult. This problem was later solved by Voice over Internet Protocol (VoIP), which uses the traditional Internet Protocol to transmit voice as packets. WiMAX makes it easy to transmit voice over large areas, while highly developed areas already use WiFi, whose deployment keeps growing. Therefore, to transmit voice packet traffic over a wide area with high Quality of Service (QoS), WiMAX and WiFi can be interfaced. However, the Quality of Voice degrades as the number of users in the network increases. This paper deals with improving the Quality of Voice, measured by the Mean Opinion Score (MOS), while also providing security.
Keywords VoIP · MOS value · WiMAX · WiFi

1 Introduction
Wireless technology has recently grown immensely in popularity, with various choices of networks. IEEE 802.11 (WiFi) and IEEE 802.16 (WiMAX) play an important role in wireless communication. WiMAX is a promising technology because of its high data rate, better throughput and broad coverage area [1]. With the recent rapid growth of WiFi networks, coexistence between WiFi and WiMAX is inevitable. Companies such as Intel and Motorola are endorsing integrated WiFi and WiMAX interfaces to obtain the benefits of both networks. In such an interface, the WiFi Access Point receives the signal from the WiMAX Base Station over a wireless backhaul [2], and users can receive a signal either from the Base Station or from the Access Point. The main advantage of this integrated network is improved QoS for both networks.

V. R. Vinothini Bannari Amman Institute of Technology, Erode, Tamil Nadu, India C. Ezhilazhagan · K. Sakthisudhan (B) N.G.P. Institute of Technology, Coimbatore, Tamil Nadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_13


Fig. 1 VoIP

A recent trend in wireless communication is transmitting voice over traditional IP networks. This is particularly helpful in the business world and also increases the number of vendors and payers [3]. The major advantages of Voice over Internet Protocol are bandwidth efficiency and the ease of creating new services; its disadvantages are higher jitter and packet delay when the number of users is large, which in turn reduces voice clarity [4]. The quality of a voice application is expressed mainly by its Mean Opinion Score (MOS) value and depends on the type of encoder scheme [5]. To overcome these shortcomings, this article proposes an architecture composed of real-time routers, switches, a firewall and a better voice encoder. Its performance is investigated using jitter, delay, MOS value and throughput under low and high loaded network conditions.

1.1 Voice Over Internet Protocol
Voice over Internet Protocol is a network application that operates at the application layer. VoIP combines voice data with the Internet Protocol, i.e. voice traffic is transmitted over IP networks. Using VoIP, calls can be made from phone to phone, phone to PC and PC to phone via the Internet, as shown in Fig. 1.

1.2 VoIP Over IEEE 802.11
The flourishing of voice applications is the main reason for the spread and integration of IEEE 802.11. The foremost problems when VoIP is used over WiFi are as follows:
• WLAN has a very low system capacity for voice applications.
• The performance of VoIP is degraded because VoIP traffic is mixed with traditional data traffic.
These difficulties arise for the following reasons:
• WiFi imposes the largest per-packet overhead.
• Network contention occurs because the channel is accessed in a distributed manner.
This contention and traffic are controlled by choosing appropriate values and specifications for each node. Table 1 provides the specification used in this WiFi network.

Table 1 Values and parameters in WiFi
Parameter | Value
Physical layer technology | IEEE 802.11
Data rate | 11 Mbps
Transmit power | 0.0005 W
Packet reception power threshold (dBm) | −95
Short retry limit | 7
Long retry limit | 4
Physical characteristics | Direct sequence
AP beacon interval (s) | 0.02
Max receive lifetime (s) | 0.5
Antenna gain | 14 dBi
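To make the per-packet overhead concrete, the following back-of-envelope sketch estimates how much of an 802.11b DATA/ACK exchange actually carries G.711 voice. Every constant (20 ms packetization, IPv4/UDP/RTP and 802.11 MAC header sizes, DSSS long-preamble timing, ACKs at 1 Mbit/s, no backoff or retries) is an assumption for illustration, not a value taken from the paper or its OPNET configuration.

```python
# Back-of-envelope VoIP-over-802.11b overhead estimate.
# All constants are illustrative assumptions, not values from this paper.

CODEC_RATE_BPS = 64_000          # G.711 bit rate
PACKET_MS = 20                   # assumed packetization interval
IP_UDP_RTP_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers
MAC_FRAMING_BYTES = 24 + 4 + 8   # 802.11 MAC header + FCS + LLC/SNAP

# Assumed 802.11b DSSS timing in microseconds; random backoff is ignored.
PLCP_US = 192.0                  # long PLCP preamble + header
SIFS_US = 10.0
DIFS_US = 50.0
ACK_BYTES = 14
BASIC_RATE_MBPS = 1.0            # rate assumed for the ACK frame


def voice_payload_bytes() -> int:
    """Codec bytes carried in one packet (160 B for G.711 at 20 ms)."""
    return CODEC_RATE_BPS * PACKET_MS // (8 * 1000)


def ip_bitrate_bps() -> int:
    """One-direction bit rate of a call at the IP layer, headers included."""
    packets_per_second = 1000 // PACKET_MS
    return (voice_payload_bytes() + IP_UDP_RTP_BYTES) * 8 * packets_per_second


def airtime_efficiency(data_rate_mbps: float = 11.0) -> float:
    """Fraction of one DATA+ACK exchange spent sending codec payload."""
    payload = voice_payload_bytes()
    frame_bytes = payload + IP_UDP_RTP_BYTES + MAC_FRAMING_BYTES
    # bits divided by Mbit/s gives microseconds
    data_us = PLCP_US + frame_bytes * 8 / data_rate_mbps
    ack_us = PLCP_US + ACK_BYTES * 8 / BASIC_RATE_MBPS
    exchange_us = DIFS_US + data_us + SIFS_US + ack_us
    return (payload * 8 / data_rate_mbps) / exchange_us


if __name__ == "__main__":
    print(voice_payload_bytes(), "payload bytes per packet")
    print(ip_bitrate_bps(), "bit/s per call direction at the IP layer")
    print(f"{airtime_efficiency():.2f} airtime efficiency at 11 Mbit/s")
```

Under these assumptions each 160-byte voice payload becomes an 80 kbit/s stream at the IP layer, and only about 16% of the airtime of one exchange at 11 Mbps carries codec bits. The preamble and ACK costs are fixed per frame and do not shrink with the payload, which is why small voice packets are hit hardest.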

1.3 VoIP Over IEEE 802.16
IEEE 802.16 provides the "last mile" connection in a wireless metropolitan area network (WMAN) and is a strong alternative to wired access networks such as DSL and cable modem. In addition to Quality of Service, it offers data rates of up to 1 Gbit/s for a fixed node and a wide coverage area of around 50 km. This makes multimedia services such as Voice over Internet Protocol, Video on Demand and video conferencing possible over long distances at low cost. IEEE 802.16 defines five QoS classes [6, 7]:
• Unsolicited Grant Service (UGS) is designed for Constant Bit Rate services such as T1/E1 transport and VoIP without silence suppression; it generates fixed-size data at a recurring interval with low latency and jitter.
• Real-time Polling Service (rtPS) supports real-time services that generate variable-size data on a periodic basis, such as MPEG video and VoIP with silence suppression.
• Extended real-time Polling Service (ertPS) was introduced more recently by IEEE 802.16 to support VoIP traffic.
• Non-real-time Polling Service (nrtPS) supports non-real-time services such as FTP that require variable-size data on a regular basis and a minimum data rate.
• Best Effort (BE) is used where the application does not need any QoS guarantee, e.g. HTTP.
Table 2 contains the values used in the WiMAX network.

Table 2 Values and parameters in WiMAX
Parameter | Value
Max no. of SS nodes supported | 100
Transmit power (W) | 0.5
Received power tolerance (dBm) | −90 to −60
Physical profile | OFDMA 20 MHz
Modulation | QPSK 3/2
Average SDU size (bytes) | 1420
Block time interval (s) | 3
Connection ranging retries | 16
T44 (scan request timer) (ms) | 50
Antenna gain | 15 dBi
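As a rough illustration of how a flow could be matched to one of the five classes, the hypothetical helper below encodes the class descriptions above as simple rules. The `Flow` fields and the rule order are assumptions made for this sketch; it is not a procedure defined by IEEE 802.16. Where a VoIP flow with silence suppression could use either rtPS or ertPS, the sketch prefers ertPS, the class introduced for VoIP.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceClass(Enum):
    UGS = "Unsolicited Grant Service"
    RTPS = "Real-time Polling Service"
    ERTPS = "Extended real-time Polling Service"
    NRTPS = "Non-real-time Polling Service"
    BE = "Best Effort"


@dataclass(frozen=True)
class Flow:
    """Hypothetical description of an application flow."""
    real_time: bool
    fixed_size_periodic: bool   # same-size packets at a fixed interval
    silence_suppression: bool   # voice activity detection pauses the stream
    needs_min_rate: bool        # non-real-time, but needs a guaranteed minimum


def pick_service_class(flow: Flow) -> ServiceClass:
    """Rule of thumb mirroring the class descriptions in Section 1.3."""
    if flow.real_time:
        if flow.fixed_size_periodic and not flow.silence_suppression:
            return ServiceClass.UGS      # e.g. T1/E1, VoIP without silence suppression
        if flow.silence_suppression:
            return ServiceClass.ERTPS    # VoIP with silence suppression
        return ServiceClass.RTPS         # variable-size periodic data, e.g. MPEG video
    return ServiceClass.NRTPS if flow.needs_min_rate else ServiceClass.BE


if __name__ == "__main__":
    g711_call = Flow(real_time=True, fixed_size_periodic=True,
                     silence_suppression=False, needs_min_rate=False)
    ftp_transfer = Flow(real_time=False, fixed_size_periodic=False,
                        silence_suppression=False, needs_min_rate=True)
    print(pick_service_class(g711_call).name)
    print(pick_service_class(ftp_transfer).name)
```

A constant-rate G.711 stream lands in UGS, which is consistent with the choice of UGS as the service flow in Section 2.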

1.4 Mean Opinion Score (MOS) Value
In voice or video communication, the quality of human speech is expressed as a numerical value called the Mean Opinion Score (MOS). MOS can be computed in three ways:
• By a nonlinear mapping from the R-factor:

$$\mathrm{MOS} = 1 + 0.035\,R_f + 7 \times 10^{-6}\,R_f\,(R_f - 60)(100 - R_f) \qquad (1)$$

where $R_f$ is the R-factor, given by

$$R_f = 100 - I_{SN} - I_e - I_d + A \qquad (2)$$

Here $I_{SN}$ is the signal-to-noise impairment, $I_e$ is the impairment factor associated with losses due to the codec and the network, $I_d$ represents the impairment caused by mouth-to-ear delay, and $A$, known as the expectation factor, compensates for these impairments under various user conditions. For VoIP, the variables generally considered are $I_e$ and $I_d$, so with default values for the remaining variables Eq. (2) reduces to

$$R_f = 94.2 - I_e - I_d \qquad (3)$$

This R-factor ranges from 50 (bad) to 95 (excellent).
• By computing the arithmetic mean of user ratings on a scale from 1 to 5, where 1 indicates poor voice quality and 5 indicates excellent speech quality.
• By using software such as Chariot VoIP Assessor, OPNET, VoP test solution and GoldenMOS, which calculates the MOS value easily.
Software is currently used to calculate the MOS value directly from the factors that affect the Quality of Voice, since manual computation of MOS is highly subjective and less dynamic.
The remainder of the article is organized as follows. Section 2 describes the existing method and the proposed model. The scenario and the simulation results are presented in Section 3. Section 4 concludes the paper.
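The first two ways of obtaining MOS in Section 1.4 can be sketched in Python as follows. The function names are hypothetical. Clamping Eq. (1) to 1.0 at or below $R_f = 0$ and to 4.5 at or above $R_f = 100$ follows the ITU-T G.107 E-model convention and is an addition to the formula as stated in the paper.

```python
from collections.abc import Sequence
from statistics import fmean


def r_factor(ie: float, id_: float, r0: float = 94.2) -> float:
    """Simplified R-factor of Eq. (3): R_f = 94.2 - I_e - I_d."""
    return r0 - ie - id_


def mos_from_r(r: float) -> float:
    """Eq. (1), clamped to [1.0, 4.5] outside 0 < R_f < 100 (G.107 convention)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


def mos_from_ratings(ratings: Sequence[float]) -> float:
    """Subjective MOS: arithmetic mean of listener ratings on the 1-5 scale."""
    if not ratings or any(not 1 <= x <= 5 for x in ratings):
        raise ValueError("ratings must be a non-empty sequence of values in 1..5")
    return fmean(ratings)


if __name__ == "__main__":
    print(f"{mos_from_r(r_factor(0.0, 0.0)):.2f}")   # assumed best case, I_e = I_d = 0
    print(mos_from_ratings([4, 4, 5, 3]))
```

With $I_e = I_d = 0$ (an assumed best case, e.g. G.711 with no loss and negligible delay), Eq. (3) gives $R_f = 94.2$ and Eq. (1) gives a MOS of about 4.43, the ceiling this simplified model allows.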

2 A Brief Introduction About Network Model
In WiMAX, ertPS has been reported as the best service flow for data because of its higher throughput, while UGS is better for VoIP because of its low delay, jitter and traffic [6, 8]. Hence, in this article, UGS is used as the service flow in order to reduce delay, jitter and traffic, which in turn increases throughput. An earlier study [9] compared the performance of VoIP over IEEE 802.11b and observed that it is very sensitive to delay, can tolerate packet loss to some extent, and depends on the voice codec and on the packetization interval (in milliseconds). At the same time, VoIP poses security challenges such as Man-in-the-Middle attacks, Denial of Service, Spam over Internet Telephony (SPIT) and protocol fuzzing, which have been explained in detail in earlier work. To address these security issues, the present article uses IPSec to provide data authentication, data integrity and data encryption with the help of Internet Key Exchange (IKE), Gateway Security (GSE) and an Encapsulating Security Payload (ESP) tunnel, as explained in [10]. A firewall can also be used to avoid these issues [11]. Similarly, G.711, which uses Pulse Code Modulation (PCM), is used as the encoder, since it has been reported as the best encoder scheme for VoIP [12, 13]. It has been reported [7] that the quality of VoIP over WiMAX is good and that QoS support in WiMAX is more powerful, guaranteeing better quality for interactive and real-time audio and video services. WiFi networks are already widely used in highly developed cities, but they have a small coverage area and poor QoS, whereas WiMAX networks are expected to offer broad coverage, better QoS and low cost in the future. Integrating WiFi and WiMAX networks therefore achieves better QoS, namely higher throughput, lower packet loss ratio and lower latency [14–17], which in turn increases the Quality of Voice. Hence, in this scenario, voice traffic (voice data) is carried using VoIP over a heterogeneous WiFi/WiMAX network.
Since WiFi supports both indoor and outdoor communication while WiMAX supports only outdoor communication, WiFi is used outdoors in this scenario, i.e. IEEE 802.11b. Figure 2 shows the integrated WiFi/WiMAX network. The WiMAX Base Station (BS) serves the Subscriber Stations (SS) and, over a wireless backhaul, the WiFi Access Point (AP). The WiMAX BS is in turn connected to the server via the IP Cloud. The server also determines network performance in terms of security [18]. Voice clarity degrades as the number of users increases. To study these effects, OPNET is used as the simulation tool since, compared with other simulators such as QualNet and NS-2, it makes real-time applications easy to implement and supports a wide range of networks, from Ethernet to global internetworks. In [19], the authors deployed a real-time videoconferencing network service. In this paper, the MOS is calculated directly by the software.

Fig. 2 Integrated WiMAX and WiFi (BS: Base Station; AP: Access Point; SS: Subscriber Station)

3 Simulation Results
The OPNET simulator is used to analyse performance under both low and high loaded network conditions in the coexisting WiMAX and WiFi scenario. Table 3 lists the simulation parameters of the scenario.


Table 3 Parameters used in this scenario
Parameter | Low loaded network | High loaded network
No. of subnets | 2 | 6
No. of nodes (each subnet) | WiMAX-4, WiFi-4 | WiMAX-4, WiFi-4
Mobility of nodes | Fixed | Fixed
Application coder | G.711 | G.711
Traffic type | VoIP | VoIP
Simulation time | 1 h | 1 h
Simulation area | 100 × 100 km | 100 × 100 km
Subscriber station | Wimax_ss_wkstn (fix) | Wlan_station_adv (fix)
Base station (WiMAX)/access point (WiFi) | Wimax_bs_ethernet4_slip4_router | Wimax_ss_wlan_router (fix)
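As a quick consistency check on Table 3, the sketch below derives the 16 and 48 users quoted for the two scenarios from the subnet and node counts and estimates the aggregate voice load they offer. The 80 kbit/s per node (one G.711 call with 20 ms packets and IPv4/UDP/RTP headers, one direction) is an assumption, not a figure reported in the paper.

```python
# Node counts come from Table 3; the per-node voice rate is an assumption
# (one G.711 call per node, 20 ms packets, IPv4/UDP/RTP headers = 80 kbit/s).
NODES_PER_SUBNET = {"WiMAX": 4, "WiFi": 4}
SUBNETS = {"low": 2, "high": 6}
VOICE_IP_BPS = 80_000


def total_users(scenario: str) -> int:
    """Number of voice nodes in the 'low' or 'high' loaded scenario."""
    return SUBNETS[scenario] * sum(NODES_PER_SUBNET.values())


def offered_load_bps(scenario: str) -> int:
    """Aggregate one-direction voice load at the IP layer, one call per node."""
    return total_users(scenario) * VOICE_IP_BPS


def wifi_share_of_nominal_rate(rate_bps: float = 11e6) -> float:
    """Share of one AP's nominal 802.11b rate used by its stations' uplink voice."""
    return NODES_PER_SUBNET["WiFi"] * VOICE_IP_BPS / rate_bps


if __name__ == "__main__":
    for scenario in ("low", "high"):
        print(scenario, total_users(scenario), "users,",
              offered_load_bps(scenario), "bit/s offered")
    print(f"{wifi_share_of_nominal_rate():.3f} of 11 Mbps per WiFi subnet")
```

Under this assumption the high loaded scenario offers about 3.84 Mbit/s of voice traffic in total, and the four WiFi stations of one subnet use under 3% of the nominal 11 Mbps rate. The nominal share ignores 802.11 framing and contention overhead, which is large for small voice packets.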

Figure 3a shows the scenario model for the low loaded network condition, which contains only two subnets. A network with the minimum number of users is said to be in the low loaded condition; in this scenario, the total number of users is 16. Figure 3b shows the scenario model for the high loaded network condition, in which a large number of users are interconnected in the same network scenario: it contains 6 subnets and a total of 48 users. In this set-up, all users are connected to the IP Cloud through the Base Station, and the IP Cloud is in turn connected to the server via a router, a switch and a firewall. The router is the real-time Cisco 7200, the switch is the Bay Networks Accelar 1050 and the firewall is a Nortel firewall. Each subnet integrates IEEE 802.16, with a data rate of 52 Mbps, and IEEE 802.11b, with a data rate of 11 Mbps. All nodes are provided with the voice application and security through the application and profile definitions. The G.711 encoder scheme is used for the voice application to improve the MOS.

Fig. 3 a Snapshot of low loaded network condition, b snapshot of high loaded network condition

Load, jitter, Mean Opinion Score value and throughput are the metrics analysed in this scenario. Jitter is the variation in the delay of packets arriving at the receiver end. If the time gap between two successive packets is large, the quality of speech is degraded, which makes jitter one of the most important factors in multimedia services. Figure 4a shows the average jitter for the voice application. The jitter is about the same in the low and high loaded scenarios, approximately −0.0004 s. A negative jitter indicates that the spacing between adjacent packets at the receiver is smaller than at the sender, owing to low network congestion.

Fig. 4 a Voice application: Jitter, b voice application: MOS value (voice jitter in seconds and Mean Opinion Score value versus simulation time in seconds, for the low and high load network conditions)

Figure 4b shows the Mean Opinion Score for the voice application. In both the high and low loaded network conditions, the MOS is nearly 4. This is because the jitter is low and because G.711 is used as the voice encoder scheme, which reduces bandwidth usage through Voice Activity Detection and Comfort Noise Generation. A MOS of almost 4 indicates that the speech quality is good, with distortion that is just perceptible but not annoying.

Delay is the time taken by a packet to reach its destination from its source; it includes transmission delay, propagation delay and processing delay. Figure 5a–c shows the average delay for the voice packets, IEEE 802.16 and IEEE 802.11b, respectively. The delay is slightly higher under the high loaded condition; this mild variation is due to the larger number of users generating traffic and congestion.

Medium access delay is the time required for the packet at the head of the MAC buffer queue, which holds up the packets behind it (head-of-line blocking), to access the medium.
This delay is caused only by contention for access to the medium, not by queuing behind the node's own packets. It also includes the time taken to exchange the RTS/CTS control packets and the backoff period. Figure 5d shows a higher medium access delay for the high loaded network condition since it has more users. However, this does not affect speech quality, because an effective voice codec and suitable router and switch configurations are used.

Fig. 5 a Voice application: end-to-end delay, b IEEE 802.16: delay, c IEEE 802.11b: delay, and d IEEE 802.11b: medium access delay (delays in seconds versus simulation time in seconds, for the low and high load network conditions)

The amount of traffic (data) carried by the network is called the load. It determines network performance since it affects both delay and throughput. Delay and throughput increase with load as long as the load stays within the network capacity, but once the load exceeds that limit, throughput starts degrading and delay increases due to network congestion. Routers and switches therefore play a very important role in carrying the load. Figure 6a, b shows that the load is higher for the high loaded network condition since there are more users. Such an increase in load may cause network congestion once it exceeds capacity, but the type and configuration of the switch and router used here keep the load within capacity.

Fig. 6 a IEEE 802.16: Load, b IEEE 802.11b: load (network load in bits/s versus simulation time in seconds, for the low and high load network conditions)

Figure 7a, b shows the enhanced throughput for a larger number of users. The conventional switch and router configurations used today tend to lose efficiency exponentially as the number of users increases, because more users congest the network with heavy load, which eventually results in packet loss or bit errors. The configuration and type of routers proposed here overcome these problems and give the best possible throughput.

Fig. 7 a IEEE 802.16: Throughput, b IEEE 802.11b: throughput (throughput in bits/s versus simulation time in seconds, for the low and high load network conditions)
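The negative jitter reported for Fig. 4a follows from a signed definition of jitter: the spacing between two consecutive packets at the receiver minus their spacing at the sender. The sketch below assumes that this is the definition behind the simulator's jitter statistic, and it uses made-up timestamps, not simulation data; the function names are hypothetical.

```python
from collections.abc import Sequence


def signed_jitter(send_times: Sequence[float], recv_times: Sequence[float]) -> list[float]:
    """Per-pair jitter in seconds: receiver spacing minus sender spacing.

    For consecutive packets i-1 and i this is
    (recv[i] - recv[i-1]) - (send[i] - send[i-1]); a negative value means
    the two packets arrived closer together than they were sent.
    """
    if len(send_times) != len(recv_times) or len(send_times) < 2:
        raise ValueError("need two equally long timestamp sequences of length >= 2")
    return [
        (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        for i in range(1, len(send_times))
    ]


def mean_jitter(send_times: Sequence[float], recv_times: Sequence[float]) -> float:
    """Average of the per-pair signed jitter values."""
    values = signed_jitter(send_times, recv_times)
    return sum(values) / len(values)


if __name__ == "__main__":
    sent = [0.000, 0.020, 0.040, 0.060]        # one packet every 20 ms
    received = [0.0300, 0.0496, 0.0692, 0.0888]  # arrivals 19.6 ms apart
    print(f"{mean_jitter(sent, received):+.4f} s")
```

Here each pair contributes about −0.0004 s. Note that the per-pair values telescope: their sum equals the change in one-way delay between the first and last packet, so a persistently negative average reflects delay shrinking over the measured window and cannot continue indefinitely.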

4 Conclusion
In summary, using a real-time router (Cisco 7200) and switch (Bay Networks Accelar 1050) yields higher throughput than other configurations, even as the number of users increases. This router and switch also reduce jitter and end-to-end delay, which in turn improves voice clarity. Such improved voice clarity is very useful in face-to-face (video) conversation. The results were obtained over a Wide Area Network coverage area of 100 × 100 km, and the work can be extended by integrating UMTS and simulating a vehicle-to-vehicle environment.


Acknowledgements
The authors acknowledge the financial support rendered by the All India Council for Technical Education (AICTE), New Delhi. Infrastructural support provided by Bannari Amman Institute of Technology is thankfully acknowledged.

References
1. Chou C-M, Li C-Y, Chien W-M, Lan K (2009) A feasibility study on vehicle-to-infrastructure communication: WiFi versus WiMAX. In: Proceedings of the 10th international conference on mobile data management: systems, services and middleware, Taipei, 2009, pp 397–398 [Online]. Available: https://doi.org/10.1109/MDM.2009.127
2. Wang W, Liu X, Vicente J, Mohapatra P (2011) Integration gain of heterogeneous WiFi/WiMAX networks. IEEE Trans Mob Comput 10(8):1131–1143 [Online]. Available: https://doi.org/10.1109/TMC.2010.232
3. Atif Qureshi M, Younus A, Saeed M, Sidiqui FA, Touheed N, Shahid Qureshi M (2011) Comparative study of VoIP over WiMax and WiFi. Int J Comput Sci Issues 8(3):433–437, no. 1 [Online]. Available: http://ijcsi.org/articles/Comparative-Study-of-VoIP-over-WiMax-and-WiFi.php
4. Wang X, Yu K, Zhang L (2009) Improving performance of VoIP services in IEEE 802.16 OFDMA system. In: Proceedings 5th international conference on wireless communications, networking and mobile computing, Beijing, 2009, pp 1–4 [Online]. Available: https://doi.org/10.1109/ICOM.2009.5300945
5. Islam S, Rashid M, Tarique M (2011) Performance analysis of WiMax/WiFi system under different codecs. Int J Comput Appl 18(6):13–19 [Online]. Available: https://doi.org/10.5120/2290-2973
6. Grewal V, Sharma AK (2010) On performance evaluation of different QoS mechanisms and AMC scheme for an IEEE 802.16 based WiMAX network. Int J Comput Appl 6(7):12–17 [Online]. Available: https://doi.org/10.5120/1090-1424
7. Alshomrani S, Qamar S, Jan S, Khan I, Shah IA (2012) QoS of VoIP over WiMAX access networks. Int J Comput Sci Telecommun 3(4):92–98
8. Adhicandra I (2010) Measuring data and VoIP traffic in WiMAX networks. J Telecommun 2(1):1–6 [Online]. Available: http://arxiv.org/abs/1004.4583
9. Kazemitabar H, Ahmed S, Nisar K, Said AB, Hasbullah HB (2010) A survey on voice over IP over wireless LANs. World Acad Sci, Eng Technol 71(Special Journal Issue):352–358 [Online]. Available: http://eprints.utp.edu.my/4530/
10. Jha RK, Wankhede Vishal A, Dalal UD (2011) Investigation of Internet key exchange (IKE) in terms of traffic security with gateway security (GSE) in WiMAX network. Int J Comput Appl Special Issue on Network Security and Cryptography 1:59–66 [Online]. Available: https://doi.org/10.5120/5918-053
11. Malik A, Verma HK, Pal R (2012) Impact of firewall and VPN for securing WLAN. Int J Adv Res Comput Sci Softw Eng 2(5):407–410
12. Ismail MN, Analyzing of MOS and codec selection for voice over IP technology. Ann Comput Sci Ser (Anale. Seria Informatică) 7(1):263–276 [Online]. Available: http://arxiv.org/abs/0906.0845
13. Ramakrishnan RS, Vinod Kumar P (2008) Performance analysis of different codecs in VoIP using SIP. Mob Pervasive Comput, pp 142–145
14. Ghazisaidi N, Kassaei H, Saeed Bohlooli M (2009) Integration of WiFi and WiMAX-mesh networks. In: Proceedings of the 2nd international conference on advances in mesh networks, Athens, Glyfada, 2009, pp 1–6 [Online]. Available: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5222997
15. Lin H-T, Lin Y-Y, Chang W-R, Cheng R-S (2009) An integrated WiMAX/WiFi architecture with QoS consistency over broadband wireless networks. In: Proceedings of the 6th consumer communications and networking conference, Las Vegas, 2009, pp 1–7 [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4784890
16. Huang C-J, Hu K-W, Chen I-F, Chen Y-J, Chen H-X (2010) An intelligent resource management scheme for heterogeneous WiFi and WiMAX multihop relay network. Int J Expert Syst Appl (Elsevier) 37(2):1134–1142 [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0957417409005909
17. Sgora A, Gizelis CA, Vergados DD (2011) Network selection in a WiMAX–WiFi environment. Int J Pervasive Mob Comput (Elsevier) 7(5):584–594 [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1574119210001124
18. Shen C, Nahum E, Schulzrinne H, Wright CP (2012) The impact of TLS on SIP server performance: measurement and modeling. IEEE/ACM Trans Network 20(4):1217–1230 [Online]. Available: https://doi.org/10.1109/TNET.2011.2180922
19. Salah K, Calyam P, Buhari MI (2008) Assessing readiness of IP networks to support desktop videoconferencing using OPNET. J Netw Comput Appl (Elsevier) 31(4):921–943 [Online]. Available: https://doi.org/10.1016/j.jnca.2007.01.001

An Empirical Overview on DDoS: Taxonomy, Attacks, Tools and Attack Detection Mechanism
Varsha Parekh and M. Saravanan

Abstract Denial of service (DoS) attacks are among the most serious threats and biggest security concerns facing the Internet today. A distributed denial of service (DDoS) attack can deplete the target's computing and communication resources in a short amount of time. Given the gravity of the situation, a variety of countermeasures have been proposed. Network-bound systems such as Web applications, database systems, virtualization servers and similar systems are currently being targeted by network attackers. A DDoS attack is difficult to deal with because it makes it hard to distinguish legitimate network users from malicious traffic, especially when the traffic comes from multiple, dispersed sources at different rates. It is therefore preferable to prevent a DDoS attack, and then to take appropriate action if one occurs anyway. Defense tools and mechanisms, which can be organized into a taxonomy, attempt to mitigate the effects of these attacks. The purpose of this study is to bring some order to the wide range of attack detection and defense techniques currently in use, so that those working in the field of DDoS attacks can gain a better understanding of the problem.
Keywords DDoS attack · DDoS tools · Defense techniques

V. Parekh (B) · M. Saravanan Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nadu, India e-mail: [email protected] M. Saravanan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_14

1 Introduction
Defending against DDoS attacks has become a significant issue for everyone who uses a computer system connected to the Internet. Denial of service (DoS) attacks, which target the Internet's productivity and profitability, are becoming increasingly common. DDoS attacks are a relatively simple, yet extremely effective strategy for attacking Internet resources and services, and they are becoming increasingly popular. A DDoS attack is an indirect assault on the resources of a clearly defined target network or system, launched through a large number of compromised computing devices. The services being attacked are referred to as the primary victims, while the compromised systems used to launch the assault are referred to as the secondary victims. The use of secondary victims allows the attacker to launch a considerably larger and more destructive attack while staying anonymous. Because the secondary victims carry out the attack themselves, it becomes increasingly challenging to track down the original attacker for network forensics. The only way to survive such an attack is a highly adaptable and distributed defense strategy. According to the FBI, several large-scale attacks on high-profile Websites have been launched in recent years. To put up an effective defense, it is important to understand all elements of DDoS attacks as well as the defense measures in place. Experts have classified DDoS attacks and defense systems in a number of ways. Attacks on network resources such as Websites, online services and computer systems that are intended to prevent users from accessing a particular network asset are referred to as DoS attacks [1]. As defined in computer science, a denial of service is a deliberate attempt to prevent certain systems and applications from providing their services. The attack is launched indirectly through a substantial number of compromised computing systems.
The services under attack are referred to as the "main victims", while the compromised systems used by the attacker are referred to as the "secondary victims". Because of the rapid growth of the Internet, widespread denial of service assaults are increasingly becoming one of the most critical problems in data center environments, where a large number of servers are deployed. A DDoS attack carried out by an enormous number of agents generates a significant number of packets that can quickly deplete a target's computing and communication resources. Due to the modular architecture of the Internet, network security is more intricate than ever for today's corporate organizations, which makes it even harder to maintain. With each successive year, the security risks that computer networks face become more technologically advanced, better organized and more difficult to detect and mitigate. The aim of this type of assault is to overburden the victim's system, making it unable to carry out normal transactions. A DDoS attack is one in which attackers explicitly seek to prevent legitimate users of a service from using that service. Some examples are [2]:
1. Attempts to "flood" a network in order to prevent legitimate network traffic from passing through it.
2. Attempts to interrupt connections between two devices, thereby preventing access to an Internet-based service.
3. Attempts to prevent a specific person from accessing a service.
4. Attempts to disrupt service to a particular system or individual.
The remainder of this work is organized as follows: Sect. 2 reviews the literature on DDoS attacks, and Sect. 3 examines a recently developed taxonomy of DDoS. Sect. 4 discusses DDoS attack tools. Section 5 concludes this report and outlines the attack detection framework planned as future work. Section 6 presents the attack detection mechanism, and Sect. 7 presents the results obtained from the analysis of performance parameters.

2 Literature Survey To build an effective defense, it is essential to comprehend all of the components of a DDoS attack, as well as the countermeasures in place. According to some experts, DDoS attacks and defense systems have been classified in different ways. According to the research [1], the susceptibility of DDoS attacks may be separated into two basic types: throughput failure operations and destruction of resources attacks. Both of these categories can be classified as cyberattacks. Aiming to flood the targeted infrastructure with unwelcome traffic, bandwidth depletion attacks prevent actual authorized traffic from reaching the intended target. A destruction of natural resources attack is one that is meant to saturate the victim’s system’s resources, rendering it unable to process genuine service requests. In article [3], a number of classification criteria are listed, including the levels of automation, exploitation vulnerabilities, attack frequency characteristics and impact and so on. In [4], it was proposed to classify attacks according to the degree of automation, attack networks, exploited vulnerabilities, influence of the DDoS attack and taxonomy of the attack intensity dynamics. In [5], the authors defined DDoS attacks into three classifications as follows: congestion-based, irregularity-based and source-based approaches. The authors also classified defenses into two categories as follows: integration of existing filtering and resource network filtering. A framework for categorizing service attacks according to the kind of target (for example, firewalls, application servers, routers), the resources are consumed by the attack (for example, networks throughput, TCP/IP stack), and actually, the weak points that would have been exploited are described in [2, 6]. 
In [7], the authors developed a framework for classifying DoS attacks based on, among other things, header content, transient ramp-up behavior and spectral analysis.


V. Parekh and M. Saravanan

Fig. 1 Taxonomy of DDoS attacks

3 Taxonomy of Distributed Denial of Service Attacks DDoS attacks come in different shapes and sizes, and understanding the modular nature of attacks is essential for building an effective defense. Attacks fall into several categories, and these characteristics form the taxonomy of DDoS attacks. As shown in Fig. 1, attacks are classified by degree of automation (e.g., step-by-step automated attacks), exploited vulnerability, attack rate dynamics, impact, communication mechanism, scanning strategy, propagation mechanism and rate of change mechanism.

4 DDoS Tools A wide range of attack tools, referred to as "stressors," is freely available on the Internet. Some of these tools have legitimate purposes, such as stress testing one's own network, which security experts and system administrators may do from time to time. Some attack tools are specialized and target only a certain portion of the Internet protocol suite, while others are designed to allow many attack vectors to be used in combination.

An Empirical Overview on DDoS: Taxonomy, Attacks, Tools and Attack …

Attack tools can be roughly divided into the following groups [6, 8]: A. Low and slow attack tools These tools send small amounts of data over many connections in order to keep connections to a target server open for as long as possible, consuming resources until the targeted website can no longer sustain additional connections. B. Application layer attack tools These tools mainly target layer seven of the OSI model, where HTTP requests occur. By flooding a target with HTTP GET and POST requests, an attacker can create malicious traffic that is hard to distinguish from genuine visitor requests. C. Protocol and transport layer attack tools One or two layers further down the Internet protocol suite, these tools use protocols such as UDP to send large volumes of traffic to a target server, as in a UDP flood. While often ineffective on their own, these attacks are most commonly found in the form of DDoS attacks, where having more attacking machines multiplies the effect. A few commonly used tools include [8]: Low Orbit Ion Cannon (LOIC), High Orbit Ion Cannon (HOIC), Slowloris, TOR's Hammer, THC-SSL-DoS, Pyloris, Http Unbearable Load King (HULK), R.U.D.Y (R-U-Dead-Yet), XOIC, GoldenEye.

5 DDoS Attack Detection Framework The attack detection framework is intended to tackle DDoS attacks on the Internet in a collaborative and up-to-date manner. More broadly, the system collects and classifies network traffic samples. After the dataset is cleaned, important features are extracted and passed to the classification models, whose output indicates whether an attack is present. The detection system is built on a dataset and a machine learning/deep learning algorithm. First, features of regular traffic and DDoS traffic were extracted, labeled and saved to a file. Next, a dataset was built using feature selection approaches. Finally, the most accurate ML/DL model was chosen, trained and imported into the attack detection system. The detection system's architecture was designed to process network traffic samples provided by industry-standard traffic sampling protocols and collected from network devices [9].
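The workflow above (labeled features, feature selection, choosing the most accurate model) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation: synthetic data from `make_classification` stands in for labeled five-feature traffic windows, `k=3` selected features and 5-fold cross-validation are arbitrary assumptions, and SVM and NB are used as candidates only because they are the classifiers named in Sect. 6.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_best_model(X, y, k=3, cv=5):
    """Compare candidate pipelines by mean cross-validated accuracy and refit the best."""
    candidates = {
        # Feature selection sits inside each pipeline, so it only sees training folds.
        "SVM": make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k), SVC()),
        "NB": make_pipeline(SelectKBest(f_classif, k=k), GaussianNB()),
    }
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    best_model = candidates[best_name].fit(X, y)  # retrain on all labeled samples
    return best_name, best_model, scores


# Synthetic stand-in for labeled traffic windows (0 = normal, 1 = DDoS), 5 features each.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
best_name, model, scores = select_best_model(X, y)
```

Keeping feature selection inside the pipeline avoids selecting features on data that is later used to score the model.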


6 Proposed Attack Detection Mechanism The proposed system detects distributed denial of service attacks by combining two methodologies, namely signature-based and anomaly-based detection [9], which work in parallel. First, the traffic capture module records network traffic. The feature extraction module processes raw features into a new, meaningful feature set, which is then used by the anomaly detection module [10]. The anomaly detection module takes these features as input and labels the traffic as either legitimate or attack traffic [11]. At the same time, the captured traffic is passed to the rule-based detection module, where it is checked against existing attack signatures; if a match is found, the traffic is labeled as attack traffic, otherwise it is labeled as legitimate [12]. A. Traffic Capture This module captures packets traversing a specific network. Many tools can capture live traffic, for instance Wireshark, a free and open-source packet analyzer. This module acts as the input to the detection system. The input to the detection system can be given (1) on the wire, that is, as live traffic, or (2) as datasets of captured traffic. Feeding live traffic directly into the system can be disruptive; consequently, new systems are usually analyzed and tested using datasets. Various datasets are available, such as KDDCup99 [13] and MIT DARPA 2000. The CAIDA "DDoS Attack 2007" dataset is used to test this system. It is one of the newest benchmark datasets available. This dataset provides anonymized traffic records divided into 5 min intervals in ".pcap" format [14]. B. Feature Extraction After traffic capture, the next step in the detection process is feature extraction.
Five features are extracted in this module, namely source IP entropy, protocol entropy, source IP variation, packet rate and packet size entropy [15, 16]. 1. Entropy of Source IP (Esip): Entropy is a measure of the degree of concentration or randomness (dispersion) of a given attribute. Here, it is calculated for the source IP because in a DDoS attack many source IPs together attack one destination IP. Therefore, the entropy of the source IP will be high, and the entropy of the destination IP will be very low or even 0. In other words, when variation is high the entropy is also high, and when values are concentrated the entropy tends to zero. It is calculated using the following formula:

$$H(X) = -\sum_{i=1}^{n} P(x_i)\log_2 P(x_i)$$


Here, X is the variable representing the source IP, and n is the number of distinct source IP values. 2. Entropy of Protocol (Eprotocol): The entropy of protocol measures the randomness of the protocol distribution. As noted above, when a DDoS attack is performed, the protocol distribution becomes less random than in legitimate traffic because a particular protocol dominates. This helps to detect abnormal traffic behavior. It is calculated using the formula:

$$H(\text{Protocol}) = -\sum_{i=1}^{n} P(pr_i)\log_2 P(pr_i)$$

Here, pr_i represents the i-th protocol, and n is the number of distinct protocols present in the captured data. 3. Entropy of Packet Size (Epacket_size): The randomness of packet sizes is measured by the entropy of packet size. Recording every packet size is difficult in a fast network; therefore, the new approach of grouping packets into packet size levels, as displayed in the table below, is more efficient. Moreover, since sudden large bursts of ordinary traffic can also change the packet size entropy, an improved packet size entropy is used to distinguish them from DDoS traffic:

$$H^{*}(\text{size}) = -i_{\max}\sum_{i=1}^{5}\frac{n_i}{S}\log_2\frac{n_i}{S}$$

Here, i_max is the packet size level into which most packets fall, S is the total number of packets, and n_i is the number of packets in size level i (i runs over the five size levels). 4. Packet Rate (Prate): The packet rate is defined as the number of packets received (captured) per unit time. For example, if 20 packets are received in a 1 s time interval, then Prate = 20. It is an important feature and a good indicator of a DDoS attack, because the Prate of legitimate traffic differs from the Prate of attack traffic:

$$P_{rate} = \frac{\text{no. of packets}}{\text{time interval}}$$

5. Variation of Source IP (Vsip): The rate of change of source IP addresses with respect to time is defined as the variation of source IP. If the variation is higher than normally observed in legitimate traffic, the traffic is considered attack traffic:

$$V_{sip} = \frac{\text{no. of unique IP addresses}}{\text{time interval}}$$

If the source addresses change frequently, Vsip will be high. For example, if there are 70 unique IP addresses in a 1 s time frame, then Vsip = 70.
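The five features listed above can be computed for one capture interval roughly as follows. This is a sketch under stated assumptions: each packet is reduced to a hypothetical `(src_ip, protocol, size_bytes)` tuple, and `SIZE_LEVEL_BOUNDS` is an invented set of five packet size levels standing in for the paper's size-level table, which is not reproduced here.

```python
import math
from collections import Counter

# Hypothetical upper bounds (bytes) of the five packet size levels; these cut-offs
# are assumptions, not the values from the paper's table.
SIZE_LEVEL_BOUNDS = (128, 256, 512, 1024, float("inf"))


def entropy(values):
    """Shannon entropy H = -sum p*log2(p), written as sum p*log2(1/p)."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def size_level(size):
    """Map a packet size to its level index 1..5."""
    for level, bound in enumerate(SIZE_LEVEL_BOUNDS, start=1):
        if size <= bound:
            return level
    return len(SIZE_LEVEL_BOUNDS)


def improved_size_entropy(sizes):
    """H*(size) = -i_max * sum_i (n_i/S) log2(n_i/S) over the size levels."""
    if not sizes:
        return 0.0
    levels = Counter(size_level(s) for s in sizes)
    s_total = len(sizes)                   # S: number of packets
    i_max = max(levels, key=levels.get)    # level holding the most packets
    h = sum((n / s_total) * math.log2(s_total / n) for n in levels.values())
    return i_max * h


def extract_features(packets, interval_s):
    """packets: (src_ip, protocol, size_bytes) tuples captured in one time interval."""
    packets = list(packets)
    src_ips = [p[0] for p in packets]
    return {
        "Esip": entropy(src_ips),
        "Eprotocol": entropy(p[1] for p in packets),
        "Epacket_size": improved_size_entropy([p[2] for p in packets]),
        "Prate": len(packets) / interval_s,
        "Vsip": len(set(src_ips)) / interval_s,
    }
```

For example, 20 UDP packets of the same size from 20 distinct sources in a 1 s interval give Prate = 20, Vsip = 20, Esip = log2 20 (about 4.32) and zero protocol and packet size entropy.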


C. Anomaly Detection An anomaly detector, also called an outlier detector or behavior detector, identifies events or observations that do not conform to a normal or expected pattern. The five-element feature vectors produced by the feature extraction module serve as the input to this detector. Mahalanobis Distance: A distance calculation is needed here to measure how far a feature vector deviates from the other records or vectors. A larger distance means that the current feature vector is more dissimilar to normal traffic. The Mahalanobis distance is unitless and scale-invariant, and it takes the correlations in the dataset into account. The following formula is used to calculate the Mahalanobis distance [13]:

$$D_M(\vec{x}) = \sqrt{(\vec{x} - \vec{\mu})^{T} S^{-1} (\vec{x} - \vec{\mu})}$$
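Using the symbols defined in the next paragraph (x, µ, S), the distance can be computed with NumPy as in this minimal sketch. Because the text does not say which records µ and S come from, the sketch assumes they are estimated from feature vectors of legitimate traffic; `fit_reference` and `mahalanobis` are illustrative names.

```python
import numpy as np


def fit_reference(normal_vectors):
    """Estimate the mean vector mu and inverse covariance S^-1 from legitimate traffic."""
    x = np.asarray(normal_vectors, dtype=float)
    mu = x.mean(axis=0)
    s = np.cov(x, rowvar=False)
    # Pseudo-inverse instead of inverse, in case a feature is constant (singular S).
    return mu, np.linalg.pinv(s)


def mahalanobis(x, mu, s_inv):
    """D_M(x) = sqrt((x - mu)^T S^-1 (x - mu))."""
    d = np.asarray(x, dtype=float) - mu
    quad = float(d @ s_inv @ d)
    return float(np.sqrt(max(quad, 0.0)))  # clamp tiny negative rounding errors
```

With an identity covariance the distance reduces to the Euclidean distance; larger values indicate feature vectors further from normal traffic.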

Here, x is the feature vector whose distance is to be calculated, µ is the mean of the feature vectors and S is the covariance matrix. P-Value Calculation: The P-value (probability value) is the smallest significance level at which the null hypothesis would be rejected in hypothesis testing; it represents the probability of the given event occurring under the null hypothesis. Generally, observations with a p-value less than 0.001 are considered outliers. The test statistic can be calculated as [11]:

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$

where p̂ is the sample proportion, p0 is the assumed population proportion under the null hypothesis and n is the sample size; the p-value is then obtained from z using the standard normal distribution. D. Machine Learning Classification Using machine learning, a classifier can be trained to recognize characteristics of known attacks in the data at hand. This type of detector has a lower false positive rate than other types. However, if its signatures are not updated, it cannot identify novel attacks or variants of existing attacks, which is a serious limitation. ML classification is accomplished using a variety of machine learning techniques [17]. 1. Support Vector Machine (SVM) This method provides a high degree of classification accuracy. Each item is represented as a vector in a Euclidean space with one dimension for each feature/attribute [18].


2. Naïve Bayes (NB) The Naïve Bayes classifier belongs to a family of Bayes classifiers based on Bayes' theorem, and it is one of the most commonly used. To make predictions, it assumes that the features are independent of one another given the class, which is a crucial aspect of its design. It is simple to construct and typically works well, making it well suited to the medical sciences sector and to detecting disorders [18]. E. Combined Decision The output of this module is generated by combining the outputs of two detector modules, namely the anomaly detector and the signature detector. This can be implemented using OR logic: if at least one of the detectors' outputs is positive, the system outputs "attack detected"; otherwise it outputs "no attack detected."
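The p-value test and the OR-based combined decision described above can be sketched in plain Python. The two-sided normal p-value and the helper names (`proportion_z_test`, `anomaly_flag`, `combined_decision`) are assumptions for illustration; the text does not state how the sample proportion is derived from the traffic, so that step is left to the caller.

```python
import math


def proportion_z_test(p_hat, p0, n):
    """One-proportion z statistic and its two-sided p-value (normal approximation)."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for standard normal Z
    return z, p_value


def anomaly_flag(p_value, alpha=0.001):
    """Treat an observation as an outlier (possible attack) when p < alpha."""
    return p_value < alpha


def combined_decision(anomaly_detected, signature_matched):
    """OR logic: one positive detector output is enough to raise an alert."""
    if anomaly_detected or signature_matched:
        return "attack detected"
    return "no attack detected"
```

For instance, p̂ = 0.9 against p0 = 0.5 with n = 100 gives z = 8 and a p-value far below 0.001, so the anomaly detector alone would trigger "attack detected". OR logic favors detection over false alarms: a single positive detector is sufficient.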

7 Result Analysis Figure 2 compares the accuracy of the SVM, NB and hybrid algorithms; the hybrid algorithm achieves the highest accuracy, 99.86%. Figure 3 compares the precision of the three algorithms. The hybrid algorithm performs best, while the SVM algorithm shows the lowest performance.

Fig. 2 Accuracy comparison graph


Fig. 3 Precision comparison graph
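For reference, the two performance parameters compared in Figs. 2 and 3 can be computed from true and predicted labels as below. Treating the attack class as the positive label (`positive=1`) is an assumption, and `accuracy_and_precision` is an illustrative helper, not code from the paper.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Accuracy = correct / total; precision = TP / (TP + FP) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one prediction")
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # no positive predictions
    return accuracy, precision
```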

8 Conclusion In this paper, we proposed a novel DDoS detection technique that uses a feature extraction algorithm and machine learning algorithms; the Mahalanobis distance metric and the p-value calculation are used to detect DDoS attacks, which eliminates the need for training with labeled data. A combined decision is also used to detect attacks. We tested the proposed algorithm's performance on the CAIDA 2007 dataset. Furthermore, we examined its performance under different algorithm settings by examining its accuracy values, and we presented the taxonomy, DDoS tools, detection framework and the relevant proposed mechanism. Since the proposed algorithm's computational and memory costs do not grow over time, it is well suited for real-time applications.

References
1. Specht S, Lee RB (2004) Distributed denial of service: taxonomies of attacks, tools and countermeasures. In: Proceedings of the 17th international conference on parallel and distributed computing
2. Kargl F, Maier J, Weber M (2004) Protecting web servers from distributed denial of service attacks. In: Proceedings of 10th international world wide web conference, computing systems, pp 543–550
3. Mirkovic J, Martin J et al (2002) A taxonomy of DDoS attacks and DDoS defence mechanisms. Computer Science Department, University of California
4. Tariq U, Hang M, et al (2006) A comprehensive categorization of DDoS attack and DDoS defense techniques
5. Chen L, Longstaff T, Carley K (2004) A taxonomy of DDoS attack and DDoS defence mechanisms. Comput Secur
6. Jun JH, Ahn CW, Kim SH (2014) DDoS attack detection by using packet sampling and flow features. In: Proceedings of the 29th annual ACM symposium on applied computing. ACM, Gyeongju, Korea, pp 711–712
7. Hussain A, Heidemann J, Papadopoulos C (2003) A framework for classifying denial of service attacks. ACM: 99–110
8. https://www.cloudflare.com/learning/ddos/ddos-attack-tools/how-to-ddos/
9. Ravi N, Shalinie SM (2020) Learning-driven detection and mitigation of DDoS attack in IOT via SDN-cloud architecture. IEEE Internet Things J 7:3559–3570
10. Bhuyan MH, Bhattacharyya D, Kalita JK (2015) An empirical evaluation of information metrics for low-rate and high-rate DDoS attack detection. Pattern Recogn Lett 51:1–7
11. Gu Y, Li K, Guo Z, Wang Y (2019) Semi-supervised k-means DDoS detection method using hybrid feature selection algorithm. IEEE Access 7:64351–64365
12. Karan BV, Narayan DG, Hiremath (2018) Detection of DDoS attacks in software defined networks. In: Proceedings of the 3rd international conference on computational systems and information technology for sustainable solution
13. Daneshgadeh S, Ahmed T, Kemmerich T, Baykal N (2019a) Detection of DDoS attacks and flash events using Shannon entropy, KOAD and Mahalanobis distance. In: 22nd Conference on innovation in clouds, Internet and networks and workshops (ICIN). IEEE, pp 222–229
14. Caida (2007) The Caida DDoS attack dataset. Massachusetts Institute of Technology [Online]. Available: http://www.caida.org/data/passive/ddos-20070804dataset.xml
15. Stone R (2000) CenterTrack: an IP overlay network for tracking DoS floods. In: Proceedings of 9th USENIX security symposium
16. Kumar K, Joshi R, Singh K (2007) A distributed approach using entropy to detect DDoS attacks in ISP domain. In: ICSCN'07. International conference signal processing, communications and networking, 2007
17. Verma V, Kumar V (2020) DOS/DDOS attack detection using machine learning: a review. ICICC
18. Perez-Diaz JA, Valdovinos IA, Choo K-KR, Zhu D (2020) A flexible SDN-based architecture for identifying and mitigating low-rate DDoS attacks using machine learning. IEEE Access: 155859–155872

Histopathology Osteosarcoma Image Classification
Ayush Chhoker, Kunlika Saxena, Vipin Rai, and Vishwadeepak Singh Baghela

Abstract Bone cancer is a dangerous disease that can cause many deaths. Most osteosarcomas occur in young people aged 10–14. Osteosarcoma is probably the most common type of bone cancer in children and a widely recognized type of bone cancer. An accurate detection and classification system is needed to diagnose bone cancer at an early stage, since early detection appears to be the key factor in increasing the chance of survival of cancer patients. It is a malignant disease caused by the uncontrolled division of cells in the bone. Convolutional neural networks (CNNs, end-to-end models) are known to be an option for overcoming these problems. This work therefore proposes a compact CNN design that has been thoroughly evaluated on a small osteosarcoma histology image dataset (a class-imbalanced dataset). However, class-imbalanced data can adversely affect CNN performance during training. An oversampling technique is therefore proposed to overcome this problem and improve generalization performance. In this process, two CNN models are designed: the former is non-regularized (due to its dense design), and the latter is regularized and specifically designed for small histopathology images. In addition, regularization is incorporated into the basic CNN design to reduce overfitting. Experimental results show that oversampling can be an effective way to address class imbalance during training. The training and testing accuracies of the non-regularized CNN model are 98% and 78% with the imbalanced dataset and 96% and 81% with the balanced dataset, respectively.

A. Chhoker (corresponding author) · K. Saxena · V. Rai · V. Singh Baghela
Galgotias University, Gautam Budha Nagar, Greater Noida, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_15


The training and testing accuracies of the regularized CNN model are 84% and 75% for the imbalanced dataset and 87% and 86% for the balanced dataset.
Keywords Deep learning · Machine learning · Classification · CNN · Osteosarcoma

1 Introduction Cancer is one of the most dangerous diseases around the world; clinically it is referred to as a malignant neoplasm, a genetic disease that arises mainly through natural processes. Because effective treatment is not always available, many patients die, but the number of deaths can be reduced through early detection and continued control of the disease. Malignant tumors are abnormal growths of cells that can spread to other parts of the body and invade normal tissue. They can obstruct the circulatory, respiratory, and digestive systems, and can also release harmful chemicals that negatively affect the body's normal functioning. Early detection of the disease is a crucial factor in improving the chances of survival and extending the patient's life. In other words, the earlier the disease is detected, the better the chances of treating it effectively and preserving the patient's health. Bone cancer starts in the bone. Cancer begins when cells in the body start to grow out of control. Cells in almost any part of the body can become cancerous and can spread to other areas of the body. Bone cancer is one of the dangerous diseases that may cause many deaths; it is caused by the uncontrolled division of cells in the bone. Early detection and classification of the disease are the most difficult problems. Clinically, bone cancers are classified as sarcomas, which begin in muscle, bone, fatty tissue, blood vessels, and other tissues. The most common types of bone cancer are probably osteosarcoma, chondrosarcoma, Ewing's sarcoma, pleomorphic sarcoma, and fibrosarcoma. In bone cancer, the tumor forms in the bone and affects bone growth and development. Specifically, among bone tumors, enchondroma is a type of benign tumor found inside the bone that begins in the cartilage. In most cases, enchondroma is found in the small bones of the hand; other possible sites include the femur (thigh bone), tibia (shin bone), and humerus (upper arm bone). With the advent of deep CNNs, CNN-based classification has recently achieved great success in computer vision and pattern recognition. Different CNN hidden layers provide different levels of image abstraction and can be used to extract complex features such as human faces and natural structures. Images are useful for assessing tumors [1] and, consequently, for planning treatments. In the near future, computer-aided diagnosis systems may be used to alert health professionals to the possible presence of osteosarcoma in routinely produced images. As bone radiographs (X-ray images) are abundant, such systems would be very valuable. A characteristic sign often found in these images in patients with osteosarcoma is the Codman triangle, a lesion formed when the periosteum is raised by the tumor [2, 3]. We propose a computer-aided diagnosis system based on CNNs for classifying osteosarcoma in radiographs (plain X-ray images). The CNN should classify bone radiographs, identifying the presence of osteosarcoma. The system should also indicate regions of the image that may contain the tumor. To indicate these regions, we propose to split the image into windows and classify each window individually with the CNN. To automatically create the image windows and use them for training and testing the CNN, preprocessing procedures such as window exclusion and labeling are proposed. An advantage of the proposed approach is that no manual preprocessing steps, e.g., segmentation and feature extraction, are needed. Two CNNs are analyzed in the proposed system: (i) a CNN trained from scratch and (ii) a pre-trained CNN (VGG16). The traditional approach to classifying images is to extract features using predefined filters and use them as inputs to machine learning models. The two CNNs are compared with four traditional machine learning models that use features extracted from the image windows as inputs. The six models evaluated are:
• MLP
• MLP with feature selection
• Decision tree
• Random forest
• CNN trained from scratch
• Pre-trained CNN.

An imbalanced class distribution biases CNN models toward the majority class; consequently, the characteristics of the minority classes are learned insufficiently, which leads to misclassification and makes them harder to predict. Imbalanced class distribution (ICD) affects both the training stage and the generalization of a model on the test set (10). Approaches to handling ICD fall into three categories: data level, algorithm level, and cost sensitive.
• Data level methods add a preprocessing step in which the data distribution is rebalanced to reduce the effect of the skewed class distribution on the learning process; this involves trade-offs between accuracy, computational cost, and interpretability of different algorithms (11).


• Algorithm level approaches create or modify existing algorithms to take into account the significance of positive examples [4].
• Cost sensitive approaches consider the costs associated with false positive and false negative predictions in a machine learning model, balancing the accuracy of the model against the cost implications of incorrect predictions.
Another challenge that CNNs face arises in histopathological image datasets where variability between classes is low and similarity within classes is lacking. Histopathological datasets have high variability between classes and lack variability within classes [9].
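The data level approach can be illustrated with random oversampling, one common way of rebalancing classes and the kind of oversampling the abstract refers to. The exact oversampling scheme used in this work is not specified, so this NumPy sketch, with the hypothetical helper `random_oversample`, is only an illustration.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Sample extra indices with replacement; size is 0 for the largest class.
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        keep.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(keep)
    return X[idx], y[idx]
```

Applied to the class counts of the dataset in Sect. 2.1 (536, 263 and 345 images), every class would end up with 536 samples. Oversampling is normally applied to the training split only, so that duplicated samples do not leak into the test set.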

2 Material and Method In the proposed system, a CNN classifies windows of radiographs containing bones into one of two classes: normal and tumor (osteosarcoma). This section presents the dataset used in the experiments, describes the preprocessing procedures, and describes the system based on machine learning models that use predefined features.

2.1 Dataset and Methodology We collected the dataset from The Cancer Imaging Archive (cancerimagingarchive.net), which provides images of different types of cancer. As our topic is osteosarcoma, we focused only on its images; the link is given in (16). The dataset consists of hematoxylin and eosin (H&E) stained osteosarcoma histology images. The data were collected by a team of clinical scientists at the University of Texas Southwestern Medical Center, Dallas. Archival samples from 50 patients treated at Children's Medical Center, Dallas, between 1995 and 2015 were used to create this dataset. Four patients (out of 50) were selected by pathologists based on the diversity of tumor specimens after surgical resection. The images are labeled as non-tumor, viable tumor, and necrosis according to the predominant cancer type in each image. The annotation was performed by two medical experts: the images were divided between the two pathologists, so each image has a single annotation made by one pathologist. The dataset consists of 1144 images of size 1024 × 1024 at 10× resolution with the following distribution: 536 (47%) non-tumor images, 263 (23%) necrotic tumor images, and 345 (30%) viable tumor tiles. A technique for creating the CNN inputs similar to that adopted in [5] is used, in which each image is divided into small rectangular windows. Here, the goal is to classify each image window into two classes: normal and osteosarcoma. In [5], the goal was to classify focal cortical dysplasia in windows of brain images obtained by magnetic resonance imaging. Procedures were developed for automatically creating and labeling the image windows. In the windowing procedure, the radiograph is cut into smaller square segments (windows), which are used here for training and testing the CNN. Figure 2 illustrates the windowing procedure applied to the radiograph shown in Fig. 1. Experiments (not shown here) were performed to compare the performance of the system for windows of 100 × 100 pixels and 1024 × 1024 pixels. The best results were obtained for the 1024 × 1024 windows; the 100 × 100 windows resulted in misclassifying all examples with osteosarcoma, i.e., the model could not learn the characteristics relevant to classifying tumors. Despite providing more examples for training, the 100 × 100 windows contain less information relevant to classification. This can be seen in the example in Fig. 3, whereas relevant characteristics are easier to recognize in the 1024 × 1024 pixel window in Fig. 4.

Fig. 1 Osteosarcoma radiography

Fig. 2 Illustration of windowing applied to radiography

Fig. 3 Windows with 100 × 100 pixels

Fig. 4 Windows with 1024 × 1024 pixels

In the labeling procedure, the windows used for training and testing the classifier are automatically labeled into one of two classes. The labels indicate the presence or absence of tumor in the window. To create the dataset for training and testing the CNN, a radiologist manually marked the regions of the radiographs containing tumor (osteosarcoma). A window is labeled as tumor when the fraction of its area marked as tumor exceeds a threshold. This threshold was obtained in initial experiments (not shown here), where it was observed that higher threshold values resulted, for some radiographs, in no window being labeled as tumor. A threshold of 23% ensures that at least one window is labeled as tumor in every image of the dataset.
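The windowing and labeling procedures can be sketched with NumPy as below, assuming the image and the radiologist's tumor marking are available as 2-D arrays of the same size (for example loaded with Pillow and converted with `np.asarray`). Non-overlapping windows, dropping the edge remainder and the strict `>` comparison with the 23% threshold are assumptions; `make_windows` and `label_windows` are illustrative names.

```python
import numpy as np


def make_windows(image, window):
    """Cut an image into non-overlapping window x window tiles (edge remainder dropped)."""
    h, w = image.shape[:2]
    tiles = []
    for r in range(0, h - window + 1, window):
        for c in range(0, w - window + 1, window):
            tiles.append(((r, c), image[r:r + window, c:c + window]))
    return tiles


def label_windows(tumor_mask, window, threshold=0.23):
    """Label a window 1 (tumor) when the marked fraction of its area exceeds the threshold."""
    return [
        (pos, int(tile.mean() > threshold))  # mean of a boolean mask = marked fraction
        for pos, tile in make_windows(tumor_mask, window)
    ]
```

The same window positions can be used to cut the image itself, so each image tile is paired with the label computed from the matching mask tile.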

2.2 CNN Python libraries and frameworks such as TensorFlow and Keras were used in this work to implement the CNNs. The open-source library Pillow was used for image manipulation [6], and Google Colab [7] for running the CNNs. Google Colab is a free cloud service that offers free access to GPUs and easy sharing of code. We propose two approaches for creating the CNN in the computer-aided diagnosis system. CNN trained from scratch: The input of the CNN is the 1024 × 1024 pixel image, and a single output indicates the presence or absence of osteosarcoma in the window. Experiments (not shown here) with a single radiograph were carried out to choose the architecture and hyperparameters of the CNN trained from scratch. Precision was used to evaluate CNNs with different hyperparameters and architectures. The model with the best results has five convolutional layers with 3 × 3 windows. The first two convolutional layers have 128 filters each, the third and fourth layers have 64 filters each, and the fifth layer has 32 filters. One MaxPooling layer with 2 × 2 windows is applied after each of the second, fourth, and fifth convolutional layers. Finally, three fully connected layers with 8, 4, and 1 neurons, respectively, are added. All convolutional and dense layers except the last dense layer use the ReLU (rectified linear unit) activation function, which replaces all negative input values with zero and is computationally efficient; the last dense layer uses the sigmoid function. Batch normalization is applied. The Adam optimizer with TensorFlow's default parameters is used to update the weights of the CNN.
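A Keras sketch of the CNN trained from scratch, following the layer sizes given above. Details the text leaves open are filled in as assumptions: `same` padding, batch normalization after every convolution, a `Flatten` layer before the dense layers, and binary cross-entropy as the loss. `build_scratch_cnn` is an illustrative name.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_scratch_cnn(input_shape=(1024, 1024, 3)):
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    # (filters, followed_by_max_pooling) for the five 3x3 convolutional layers.
    for filters, pool in [(128, False), (128, True), (64, False), (64, True), (32, True)]:
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.BatchNormalization())  # placement is an assumption
        if pool:
            model.add(layers.MaxPooling2D(pool_size=2))
    model.add(layers.Flatten())
    model.add(layers.Dense(8, activation="relu"))
    model.add(layers.Dense(4, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))  # probability of osteosarcoma
    model.compile(optimizer=tf.keras.optimizers.Adam(),  # TensorFlow default parameters
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

`build_scratch_cnn()` builds the model for 1024 × 1024 windows; a smaller `input_shape` is convenient for quick checks.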

2.3 CNN Model Pre-trained CNN: Instead of creating and training a CNN from scratch, the pre-trained approach uses a CNN with a predefined architecture that has been pre-trained on a huge set of images. In this way, the model has already learned features that are important for classifying the large number of image types seen during pre-training; these features are represented by the convolutional and pooling layers. Here, the pre-trained CNN is VGG16, proposed by [8]. VGG16 achieved very good performance in the 2014 ImageNet competition, reaching 92.7% top-5 test accuracy on a dataset of more than 14 million images belonging to 1000 classes. An advantage of using pre-trained CNNs is that large models can be used, because the artificial neural network does not need to be trained from scratch. VGG16 has many convolutional and pooling layers and about 138 million parameters. Here, VGG16 is combined with a fully connected hidden layer of 32 neurons with ReLU activation and a final layer with 1 neuron for classification with sigmoid activation. The CNN was retrained on the bone radiograph dataset using TensorFlow's default parameters.
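The pre-trained variant can be sketched with `tf.keras.applications.VGG16`. The global average pooling between the convolutional base and the 32-neuron layer, keeping the whole base trainable (the text says the CNN was retrained), and the loss function are assumptions rather than details from the paper; `build_vgg16_classifier` is an illustrative name.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_vgg16_classifier(input_shape=(1024, 1024, 3), weights="imagenet"):
    # Convolutional base only; the original 1000-class ImageNet head is dropped.
    base = tf.keras.applications.VGG16(include_top=False, weights=weights,
                                       input_shape=input_shape)
    base.trainable = True  # assumption: the whole network is retrained
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        base,
        layers.GlobalAveragePooling2D(),        # assumption: pooling before the dense head
        layers.Dense(32, activation="relu"),    # hidden layer of 32 neurons
        layers.Dense(1, activation="sigmoid"),  # single classification neuron
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

With `weights="imagenet"` the ImageNet weights are downloaded on first use; inputs would normally be passed through `tf.keras.applications.vgg16.preprocess_input` before training. Passing `weights=None` builds the same architecture with random weights.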

170

A. Chhoker et al.

2.4 Machine Learning Models Using Predefined Features As an alternative to CNNs, traditional machine learning classifiers can be used with predefined radiomic features as inputs. The radiomic features are extracted from the windows with the pyradiomics library [9], an open-source Python library for extracting radiomic features from medical images. Here, pyradiomics is applied to every 1024 * 1024 pixel image, and the extracted features are used as inputs to the classifiers. All features extracted by pyradiomics are used except the 3D shape-based features. The features are as follows:
• First Order Statistics: 19 features
• Gray Level Co-occurrence Matrix: 23 features
• Gray Level Size Zone Matrix: 16 features
• Gray Level Run Length Matrix: 16 features
• Neighbouring Gray Tone Difference Matrix: 4 features
• Gray Level Dependence Matrix: 14 features
The classifiers are decision tree, MLP, MLP with feature selection, and random forest. In the MLP with feature selection, the features selected by the decision tree are used as inputs to the MLP. All models were implemented with the default parameters of the Scikit-Learn library.
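The sketch below makes the co-occurrence feature family concrete. It builds a symmetric, normalized gray level co-occurrence matrix for one horizontal pixel offset and computes its contrast feature, $\sum_{i,j} p(i,j)(i-j)^2$. It is a textbook illustration on a toy 4 × 4 image, not pyradiomics' implementation, which additionally handles masks, gray-level binning and several angles.

```python
# Symmetric, normalized gray level co-occurrence matrix (GLCM) for a single
# pixel offset, and the GLCM "contrast" feature sum_{i,j} p(i,j) * (i - j)^2.
# Illustration of the feature family only; not pyradiomics' implementation.

def glcm(image, levels, offset=(0, 1)):
    dr, dc = offset
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # symmetric: count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
P = glcm(img, levels=4)
print(round(contrast(P), 4))
```

For the toy image, the script prints 0.5833, that is, 7/12.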

2.5 Experiment Radiographs of 50 patients were labeled by the radiologist, resulting in 3018 windows of 1024 * 1024 pixels. The exclusion procedure eliminated 2196 windows, resulting in a dataset with 1104 examples for training and testing the CNN. Tenfold cross-validation is used to evaluate the classifiers. The cross-validation splits the data by patient, not by window. In other words, the classifiers are trained on the windows of one subset of patients and tested on the windows of another subset of patients. This makes it possible to observe the performance of each model on all the windows of a radiograph, as would happen in a real-world situation. Moreover, some windows of a radiograph are expected to be similar. Separating the subsets by patient therefore avoids the bias that could arise if windows of the same patient were used for both training and testing the classifiers. There are 185 windows with tumor and 952 without tumor. To balance the dataset for training the models, the same number of windows (189) is used for each class during training [13]. However, all 1104 examples of the dataset are used for testing the classifiers in cross-validation.
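The patient-wise split and the class balancing can be sketched as follows. Every window of a patient is assigned to the same fold, so no patient contributes windows to both the training and test sets. Each training fold is then undersampled to the size of its minority class. Using the minority-class count per fold is an assumption (the text fixes the per-class number at 189), the function names are hypothetical, and the data in the demo are synthetic.

```python
# Patient-wise k-fold cross-validation with per-fold undersampling.
# Windows are (patient_id, label) pairs; function names are illustrative.
import random
from collections import defaultdict


def patient_folds(windows, k=10, seed=0):
    """Yield (train_idx, test_idx); every patient's windows fall in one fold."""
    patients = sorted({pid for pid, _ in windows})
    random.Random(seed).shuffle(patients)
    fold_of = {pid: n % k for n, pid in enumerate(patients)}
    for fold in range(k):
        test = [i for i, (pid, _) in enumerate(windows) if fold_of[pid] == fold]
        train = [i for i, (pid, _) in enumerate(windows) if fold_of[pid] != fold]
        yield train, test


def undersample(indices, windows, seed=0):
    """Randomly keep the same number of windows per class (minority class size)."""
    by_class = defaultdict(list)
    for i in indices:
        by_class[windows[i][1]].append(i)
    n = min(len(v) for v in by_class.values())
    rng = random.Random(seed)
    return sorted(i for v in by_class.values() for i in rng.sample(v, n))


if __name__ == "__main__":
    # Synthetic data: 50 patients, 20 windows each, roughly 20% tumor windows.
    rng = random.Random(1)
    windows = [(p, "tumor" if rng.random() < 0.2 else "normal")
               for p in range(50) for _ in range(20)]
    for fold, (train, test) in enumerate(patient_folds(windows)):
        balanced = undersample(train, windows)
        # Train on `balanced`; evaluate on every window in `test`.
        print(fold, len(balanced), len(test))
```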

Histopathology Osteosarcoma Image Classification


3 Result and Discussion The accuracy of the CNN trained from scratch was 74%. Table 2 shows its confusion matrix, while Fig. 5 shows the classification results for the first six radiographs of the dataset. Figure 6 shows the ROC curve; the area under the curve (AUC) was 0.8201. Table 3 shows the confusion matrix for the pre-trained CNN, whose accuracy was 77%. For both classes, the pre-trained CNN gave better results than the CNN trained from scratch. The pre-trained CNN uses a larger and more complex architecture than the CNN trained from scratch, and this architecture proved more effective for this image classification problem. In addition, the pre-trained CNN had previously been trained on a huge image dataset, which allowed it to find features useful for classifying many types of images [14]. Those features were useful for classifying the dataset used in the experiments. The CNN trained from scratch could achieve better results with a larger dataset. The accuracy, sensitivity, and specificity of all models are presented in Table 4. The re-trained CNN obtained the best performance among all models. It obtained the best accuracy and sensitivity (tied with the MLP with feature selection) and the second-best specificity (0.90 against 0.81 for the MLP). The CNNs used here have a single output with sigmoid activation. The criterion adopted here for classifying a window is that a tumor is detected if the output is smaller than 0.5; otherwise, the class is normal. The radiologist can also use the output value as a confidence indicator for the presence of a tumor. An example is presented in Fig. 6; such figures can be generated automatically, helping the radiologist make a decision.

Table 1 ICD of osteosarcoma histology images
Class type | Images | Percentage (%)
Non-tumor | 536 | 47
Viable tumor | 345 | 30
Necrotic tumor | 263 | 23

Fig. 5 CNN model

Fig. 6 ROC curve for the CNN trained from scratch

Table 2 Confusion matrix obtained by CNN trained from scratch
Real class \ Predicted class | Tumor | Normal
Tumor | 152 | 63
Normal | 210 | 603

Table 3 Confusion matrix obtained by the pre-trained CNN
Real class \ Predicted class | Tumor | Normal
Tumor | 251 | 102
Normal | 301 | 703

Table 4 Ten-fold cross-validation for all models
Model | Accuracy | Sensitivity | Specificity
Decision tree | 0.73 | 0.69 | 0.64
Random tree | 0.77 | 0.77 | 0.80
MLP | 0.65 | 0.76 | 0.81
MLP with feature selection | 0.79 | 0.90 | 0.61
CNN trained from scratch | 0.75 | 0.72 | 0.77
Pre-defined CNN | 0.82 | 0.84 | 0.79
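The decision rule and the three metrics fit in a few lines of code. The sketch below takes tumor as the positive class and applies the stated rule: a window is a tumor when the sigmoid output is below 0.5. It then computes accuracy, sensitivity and specificity from the pooled counts of the CNN trained from scratch (Table 2). The function names are illustrative.

```python
# Decision rule and metrics for a binary tumor/normal classifier.
# Tumor is the positive class; counts are the pooled confusion matrix of
# the CNN trained from scratch (Table 2).

def label_window(output, threshold=0.5):
    """Rule from the text: tumor if the sigmoid output is smaller than 0.5."""
    return "tumor" if output < threshold else "normal"


def metrics(tp, fn, fp, tn):
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),  # tumor windows correctly detected
        "specificity": tn / (tn + fp),  # normal windows correctly rejected
    }


# Table 2: real tumor  -> 152 predicted tumor, 63 predicted normal
#          real normal -> 210 predicted tumor, 603 predicted normal
m = metrics(tp=152, fn=63, fp=210, tn=603)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

For the counts in Table 2, this gives an accuracy of 0.734, a sensitivity of 0.707 and a specificity of 0.742.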


4 Conclusion and Future Scope A CNN-based system for classifying osteosarcoma on radiographs is proposed. To indicate the regions with cancer, the image is divided into windows, and each window is classified individually by the CNN. The output of the CNN can also be used as a confidence indicator for the presence of a tumor, helping the radiologist make a decision. The main attraction of the proposed method is that, after training, all pre-processing steps are automatic. The radiologist does not need to segment the images, extract features, or perform any manual pre-processing steps. Methods were proposed here for automatically creating the windows, excluding irrelevant windows, and labeling the examples for training and testing the models. The best results were obtained for windows of 1024 * 1024 pixels; however, the classification system can be used with other window sizes. The pre-trained CNN performed best, compared with the CNN trained from scratch and with four machine learning models that use predefined features as inputs. The accuracy obtained by the pre-trained CNN was 0.90, while its sensitivity and specificity were 0.81 and 0.84, respectively.

References
1. Onikul E, Fletcher B, Parham D, Chen G (1996) Accuracy of MR imaging for estimating the intraosseous extent of osteosarcoma. AJR Am J Roentgenol 167(5):1211–1215
2. Moore DD, Luu HH (2014) Osteosarcoma. In: Peabody T, Attar S (eds) Orthopaedic oncology—primary and metastatic tumors of the skeletal system, vol 162. Springer, pp 65–92
3. Kundu ZS (2014) Classification, imaging, biopsy, and staging of osteosarcoma. Indian J Orthop 48:238–246
4. López V, Fernández A, Moreno-Torres JG, Herrera F (2012) Analysis of preprocessing versus cost-sensitive learning for imbalanced classification. Expert Syst Appl 39(7):6585–6608
5. Silva S, Simozo F, Junior LM, Tinos R (2020) Uso de redes neurais convolucionais para identificar displasia cortical focal em pacientes com epilepsia refratária. In: Anais do XVII Encontro Nacional de Inteligência Artificial e Computacional
6. Clark (2015) Pillow (PIL fork) documentation: Release 6.2.0.dev0. https://www.realmoon.net/wordpress/wpcontent/uploads/2019/07/pillow.pdf
7. Bisong E (2019) Google Colaboratory. In: Building machine learning and deep learning models on Google Cloud Platform. Springer, pp 59–64
8. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
9. Van Griethuysen J et al (2017) Computational radiomics system to decode the radiographic phenotype. Cancer Res 77(21):e104–e107
10. Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2011) A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans Syst Man Cybern Part C (Appl Rev) 42(4):463–484
11. Abdi L, Hashemi S (2015) To combat multi-class imbalanced problems by means of oversampling technique. IEEE Trans Knowl Data Eng 28(1):238–251
12. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
13. Litjens G, Sánchez CI, Timofeeva N et al (2016) Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci Rep 6:26286
14. Goode A, Gilbert B, Harkes J et al (2013) Open slide: a vendor-neutral software foundation

Information-Based Image Extraction with Data Mining Techniques for Quality Retrieval S. Vinoth Kumar, H. Shaheen, A. Christopher Paul, M. Shyamala Devi, R. Aruna, and S. Sangeetha

Abstract Search engines have become an unavoidable gateway to the immense volume of data available on the web. Because web users normally concentrate on the first pages of search results, ranking schemes introduce a significant bias in how the web is perceived. Image retrieval based on text has been in practice for several years. However, retrieval based on the text provided with an image sometimes gives no clue of what the picture actually looks like. Data mining techniques, combined with color analysis of the image and retrieval based on image content, are therefore considered a more effective approach. In the proposed system, feature extraction works together with nearest-neighbor prediction and estimation algorithms. The resulting retrieval process can be analyzed with proper color extraction and histogram equalization.
Keywords Moving picture experts group (MPEG) · Information-based image recovery (IBIR) · Converse context histogram (CCH) · K-means · k-nearest neighbors (KNN) · Estimation algorithms
S. Vinoth Kumar (B) · M. Shyamala Devi · R. Aruna
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
H. Shaheen
University of West London, Ras Al Khaimah, United Arab Emirates
A. Christopher Paul
Karpagam Institute of Technology, Coimbatore, India
S. Sangeetha
Tamil Nadu State Council for Science and Technology, Chennai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_16


S. Vinoth Kumar et al.

1 Introduction A great deal of audio and visual information is available in digital form and plays a very important role in day-to-day life. The multimedia content description interface (MPEG-7) is used to document such multimedia content [1]. Images are characterized by their color, texture, shape and spatial characteristics. Several conventional image repositories let users formulate a query by presenting an example image; the system then identifies the stored images whose characteristics match the query. Color features are usually represented as a histogram of color-pixel intensities. Alternatively, the image is segmented into regions, each indexed by its dominant color and distribution values. The color and spatial distribution can be captured with the anglogram data structure. Commonly used texture features are Gabor filters, along with other texture measures such as Tamura features, Unser's sum and difference histograms, Galloway's run-length features, Chen's geometric features and Laine's texture energy [2]. Shape features range from simple measures such as area and circularity to more complex measures such as moment invariants. They also range from transformation-based techniques such as Fourier descriptors to structure-based techniques such as chain codes and curvature scale space feature vectors. Spatial characteristics are represented as a set of relations among the objects in the image, focusing on their relative positions. Typical representations are 2D strings, geometry-based strings, spatial graphs [3] and quadtree-based spatial feature points. High-level image representation methods build a model for each object to be recognized and then look for image regions that might be instances of those objects.
Early systems analyzed objects by using structural grammars to obtain likely interpretations. Another method uses low-frequency images to train neural networks. Users can specify desired color ranges, textures and shapes in their requests, which can then be refined through user feedback schemes. The requests are analyzed according to the user's satisfaction level and stored in a database for future reference. A color-based mechanism is employed to retrieve images from the stored database. Image mining involves mining information [4] and the relationships between images and data. It draws on image processing, image retrieval, data mining, machine learning, computer vision, databases and artificial intelligence [5]. Rule mining techniques are employed on large image databases. Two approaches are possible: the first mines large collections of images alone, and the second mines combined collections of images and their associated alphanumeric data. The process requires an algorithm that extracts image blobs in their context. Rule mining techniques are then used to identify the relation between the structure and the functions of an image. This paper focuses on image extraction by defining rules that convert low-level features into high-level semantic features. Obtaining an image from a repository is called image retrieval [6]. As image collections grow more diverse and complex, retrieving the exact image becomes a challenge. Earlier, retrieval was based on the titles provided with the images [7]. Because humans can intervene and change these titles, text-based retrieval proved ineffective, and a better image retrieval system was needed. To address this, the information-based image recovery (IBIR) system was implemented [8]. In image retrieval, IBIR outperformed other methods in extraction, mining, browsing, etc. The need to extract vital information from the actual image data then became evident and was widely discussed. Although the field is still developing, many of its problems have yet to be resolved. We therefore propose feature extraction within the information-based image recovery system, which addresses the difficulty with nearest-neighbor prediction and an estimation-based clustering algorithm. A seamless flow model, image clustering and a neighborhood analysis topology are introduced to improve efficiency and accuracy.

2 Background We first return to information-based image recovery (IBIR) [9]. Before IBIR, standard image retrieval was usually text-based [10], and text-based image retrieval has been studied over the years. Text-based retrieval, however, has certain limitations, for example:
• Manual annotation is usually influenced by human preference, circumstances and so on, which directly determine how the content and subject of a picture are described.
• Annotation is seldom complete.
• Differences in language and culture always cause problems; the same picture may be annotated in many different ways.
• Mistakes such as typing errors or spelling differences lead to entirely different results.
To overcome these disadvantages [11], content-based image retrieval, referred to in this paper as IBIR, was first introduced by Kato in 1992. The term IBIR is widely used for retrieving desired pictures from a large collection based on features such as texture, shape and color extracted from the pictures themselves. IBIR relies on the visual properties of image objects rather than on textual annotation. The most popular and direct feature is color, which is also used in this paper. In this work, color is chosen as the essential feature for image clustering.


Image mining is a process of retrieving images from a huge database. Image collections hold hidden information about the data, and additional patterns can also be mined from them [12]. The important patterns are produced without any prior information about the mining process. High-level semantic features can also be designed for image retrieval by mining several low-level characteristics and converting them into high-level features with fuzzy rules. Image mining is a promising technique in areas such as criminology, forensics, robotics, automation, industry and education [10], where particular information must be obtained from image databases. Images form a separate category that differs from text data in how they are stored and extracted. Image mining combines several techniques, including data extraction [13], image processing, image retrieval, analysis, recognition, machine learning and artificial intelligence. Researchers have reviewed current image extraction approaches and techniques with the goal of extending their abilities in analyzing facial images, evaluating the present state of image mining in light of the challenges faced. Today [14], every individual is surrounded by multimedia devices used daily. Digital data on the Internet takes several forms, such as voice, images, animations, text and videos, which makes efficient data extraction over the web more difficult [10]. Extracting related images over the web is challenging, and search engines rely on the text associated with images to retrieve them [13]. Researchers have proposed automated image annotation and image clustering techniques based on object-based image extraction algorithms. These techniques overcome the freshness problem, redundancy and the variety of object collections in image retrieval [14].
The results were compared with those of commercial search engines, and these image extraction processes outperformed the commercial search engine results. Large volumes of multimedia data are used to learn useful information. Data mining discovers patterns in audio, video, image and text data that cannot be reached by basic search queries and their associated results. Researchers have focused on image extraction techniques and content-based image extraction. The development of the Internet has been accompanied by progress in storage techniques [15]. Despite this exceptional growth, search engines answer image queries using the accompanying text rather than by extracting the images themselves to deliver results for the queries. Image mining is not based on mining images alone; it involves various areas such as image processing, extraction, databases, machine learning, computer vision and artificial intelligence. Researchers have studied the implementation of image mining along with its future research directions, and the field continues to grow. Facial expression identification is gaining attention in image processing and analysis. Identifying facial expressions is necessary for medical research, and it serves as a mode of communication for people who suffer from speech impairment and body paralysis. To address these problems, researchers designed an image-based monitoring technique that continuously observes the user with a camera. It employs image mining techniques to identify facial expressions and, in case of any discomfort, reports the required actions to a doctor or to family members.

3 Proposed Scheme In the proposed system, training and querying are the two basic stages of the information-based image recovery system. In the training stage, the sample pictures are fed into the first block of the process and then passed on for retrieving the properties of the pictures. The k-means clustering algorithm is used for this step because it is a simple way to evaluate cluster features. The output of the first step is then segmented, and the neighborhood grids are optimized for a better cluster arrangement in the sample pictures provided to the system. Figure 1 shows the architecture of the entire system. The architecture contains four modules, which apply to both stages: the grid and segmentation block, the clustering block, the feature retrieval block and the neighborhood cluster block. These modules are shown in Fig. 2.

3.1 Grid and Segmentation The sample pictures put into the system (see Fig. 3) are processed in this block. Large pictures tend to decrease accuracy. They are therefore sliced into smaller grids, which helps in retrieving and processing the properties of the image. The E * E grid is developed first, and each of its cells is then divided into T * T sub-grids, where T is the total number of subdivisions per side. These grids are shown in Fig. 4.
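The two-level partition can be sketched in pure Python, with the image stored as a list of pixel rows. The sketch assumes that the image sides are divisible by E and that each block's sides are divisible by T. How the authors handle remainders is not stated, and the function names are hypothetical.

```python
# Two-level grid partition: E x E blocks, each split into T x T sub-blocks.
# Assumes the sides divide evenly; function names are illustrative.

def split_grid(image, n):
    """Split a 2-D list of pixels into an n x n list of blocks (row-major)."""
    h, w = len(image), len(image[0])
    bh, bw = h // n, w // n
    return [
        [[row[c * bw:(c + 1) * bw] for row in image[r * bh:(r + 1) * bh]]
         for c in range(n)]
        for r in range(n)
    ]


def two_level_grid(image, e, t):
    """E x E blocks, each of which is further divided into T x T sub-blocks."""
    return [[split_grid(block, t) for block in row] for row in split_grid(image, e)]


if __name__ == "__main__":
    img = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8 x 8 "image"
    grid = two_level_grid(img, e=2, t=2)
    print(grid[1][1][0][1])  # bottom-right block, top-right sub-block
```

The demo prints [[38, 39], [46, 47]], the top-right 2 × 2 sub-block of the bottom-right 4 × 4 block.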

3.2 Feature Retrieval This module processes all input pictures, in both the training and the query stage, and it is the most important module for image retrieval. Color is the most preferred and intuitive feature in human perception of images, so it is used in the system. To obtain more robust features, the CCH method is applied to extract the key feature points. The two main feature extractions are described below:
• Color Extraction: Before this stage, input pictures are divided into E * E grids, and all grids are used to extract the color features. First, the module computes the average RGB values of the E * E grids. Second, the inner T * T grids of each E * E grid are also used to compute average RGB values. The detailed RGB information of the T * T grids is appended after the color feature information of the E * E grids. All of these values then serve as input to the initial k-means clustering. Figures 3 and 4 illustrate the color feature extraction of this stage.
• Converse Histogram Extraction: The system uses CCH to find the key feature points. These points form the input file of the neighborhood module and of the k-means clustering or KNN classification. The CCH feature points, together with their 64-dimensional data, are combined with the output of the neighborhood module. This combination is the input to the second round of k-means clustering, which produces segment-based data and forms the code book. In the query step, which follows the same procedure, k-means is replaced by the KNN algorithm. Input query data are classified against the training code book, and correctly classified results enable fast retrieval.

Fig. 1 IBIR system architecture

Fig. 2 Architecture modules
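One cell's color feature vector can be assembled in the order described above: the mean RGB of an E * E cell first, then the mean RGB of each of its T * T sub-cells. In this sketch, pixels are (R, G, B) tuples, the cell side is assumed divisible by T, and the names are hypothetical. The resulting vectors of length 3 + 3T² would then be passed to the initial k-means clustering.

```python
# Color feature of one E x E grid cell: the cell's mean RGB, followed by the
# mean RGB of each of its T x T sub-cells in row-major order.
# Assumes the cell side is divisible by T; names are illustrative.

def mean_rgb(pixels):
    n = len(pixels)
    return [sum(p[k] for p in pixels) / n for k in range(3)]


def cell_color_feature(cell, t):
    h, w = len(cell), len(cell[0])
    feature = mean_rgb([px for row in cell for px in row])
    sh, sw = h // t, w // t
    for r in range(t):
        for c in range(t):
            sub = [px for row in cell[r * sh:(r + 1) * sh]
                   for px in row[c * sw:(c + 1) * sw]]
            feature.extend(mean_rgb(sub))
    return feature  # length 3 + 3 * t * t


if __name__ == "__main__":
    cell = [[(10, 0, 0), (20, 0, 0)], [(30, 0, 0), (40, 0, 0)]]
    print(cell_color_feature(cell, t=2))
```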


Fig. 3 Segmentation sample

Fig. 4 Grid sample and color extraction

3.3 Neighborhood Cluster Module In this module, the input file comes from the CCH feature points. The feature points of each picture can be considered records that make up the neighborhood table. In addition, the initial k-means clustering results of each E*E grid are imported as the values of the neighborhood table. The steps are as follows:
(a) Input CCH feature point Y of picture X, denoted PICXY.
(b) Get the initial k-means clustering result of the grid containing the CCH feature point, based on its coordinates.
(c) Get the initial k-means clustering results of the neighboring grids.
(d) Append the results from step (c) in order from left to right, then top to bottom. If a neighbor does not exist, its value is "0", which marks a position outside the picture.
(e) Append the CCH data to the neighborhood table.
(f) Apply estimation clustering to the neighborhood table to obtain the code book.


This completes the data representation and processing flow of the neighborhood module. Following the same organization, the query step applies the neighbor prediction formula instead of estimation clustering.
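Steps (b) to (e) can be sketched as building one row of the neighborhood table. Three assumptions are not in the text: the neighborhood is the 8 surrounding grid cells, k-means labels start at 1 so that 0 can mark a missing neighbor, and the descriptor is any list of numbers (64 values for CCH). Step (f), clustering the rows into the code book, is not shown, and the function name is hypothetical.

```python
# One neighborhood-table row for a CCH feature point:
# [own cell label, 8 neighbor labels (left to right, top to bottom; 0 if the
#  neighbor lies outside the image), *descriptor].
# Assumes 8-connectivity and k-means labels starting at 1.

def neighborhood_row(labels, cell_size, point, descriptor):
    rows, cols = len(labels), len(labels[0])
    x, y = point                      # pixel coordinates (column, row)
    r, c = y // cell_size, x // cell_size
    row = [labels[r][c]]              # step (b): label of the containing cell
    for dr in (-1, 0, 1):             # steps (c)-(d): neighbors, top to bottom
        for dc in (-1, 0, 1):         # ... and left to right within each row
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            row.append(labels[rr][cc] if inside else 0)
    return row + list(descriptor)     # step (e): append the CCH data


if __name__ == "__main__":
    labels = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # toy k-means labels of 3 x 3 cells
    print(neighborhood_row(labels, cell_size=10, point=(15, 15), descriptor=[0.5, 0.25]))
```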

3.4 Prediction Estimation Block Estimation clustering is applied twice in our system, and it creates the code book. To keep the data of both the training and the query stage clustered under the same standard, our design keeps the cluster centers from the training stage. These centers are imported into the k-means clustering in the query stage, so the code book is built under the same analysis standard. The neighbor prediction formula (NPF) is also incorporated into our IBIR system. Based on the training result, NPF is applied to the input query pictures. NPF helps classify the data. It also refines the code book, which means the training result can adjust itself.
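The code book can be produced with ordinary k-means (Lloyd's algorithm), sketched below in pure Python. The optional `init` argument lets a later run start from centroids kept from the training stage. That is one possible reading of how the design keeps both stages under the same standard. The paper does not give its clustering code, and the names are hypothetical.

```python
# Plain k-means (Lloyd's algorithm); the returned centroids form the code book.
# `init` allows warm-starting from previously kept centroids (when given, k is
# taken from len(init)). Names are illustrative.
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))


def kmeans(points, k, init=None, iters=100, seed=0):
    start = init if init else random.Random(seed).sample(points, k)
    centroids = [list(c) for c in start]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        new = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    book = kmeans(pts, k=2)
    print(book, [nearest(p, book) for p in pts])
```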

3.5 Feature Retrieval Query Block Using the CCH feature points, the color features and the grid segment points, every picture input for a query is divided into segments. NPF is then applied to classify those pictures against the training result (code book). The query grid pictures are compared with the picture grids in the same cluster, and the system computes their differences based on the color features. All segments are labeled and matched with one grid in the code book. By counting the most frequent matching grids, the IBIR system finally outputs the query retrieval result. Hue Feature: Hue feature mining segments images by hue. For this, the image is first converted into the $C_l$, $C_{R-G}$, $C_{B-Y}$ space, where $C_l$ represents luminance, $C_{R-G}$ the red to green component and $C_{B-Y}$ the blue to yellow component. The feature uses twelve basic colors, obtained by linearly combining red, yellow, blue, orange, green and purple. Five levels of luminance can be extracted, and the results are mapped to one of 180 reference hues. The spatial representation of hues uses the k-nearest neighbor algorithm: the image is divided into chunks, and each chunk is placed in absolute space. Texture Feature: Quasi filters are used to represent the texture of images. Every image is described by 42 values, obtained by estimating the energy of each block for every combination of one of 6 frequencies and one of 7 orientations. From these values, the average is chosen as the direction of the filtered image in every block.
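A generic k-nearest neighbors classifier can stand in for the query stage, as sketched below. A query vector receives the majority label of its k closest code-book entries, and ties are resolved in distance order. This is standard KNN, assumed here as a stand-in; the paper's neighbor prediction formula may differ in detail, and the function name is hypothetical.

```python
# Generic k-nearest neighbors classification of a query vector against
# labeled code-book entries. Majority vote; ties are broken in favor of the
# label whose member appears first in distance order.
from collections import Counter


def knn_predict(query, train_vectors, train_labels, k=3):
    order = sorted(
        range(len(train_vectors)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(query, train_vectors[i])),
    )
    top = [train_labels[i] for i in order[:k]]
    counts = Counter(top)
    best = max(counts.values())
    return next(lbl for lbl in top if counts[lbl] == best)


if __name__ == "__main__":
    book = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels = ["c1", "c1", "c1", "c2", "c2", "c2"]
    print(knn_predict((9, 9), book, labels, k=3))
```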


Shape Feature: Shape features are characterized by converting the image to binary form. Bézier curves and splines are then applied to obtain a set of straight lines and arcs. Rule Mining: An image database is designed with image identifiers $I_1, I_2, \ldots, I_n$, hue features $H_1, H_2, \ldots, H_n$, texture features $Q_1, Q_2, \ldots, Q_n$ and shape features $S_1, S_2, \ldots, S_n$. Here, $L_1, L_2, \ldots, L_n$ represent the high-level features of the images, and the extraction process has two steps. The first step finds frequent multidimensional values and the related frequent characteristics within the database. The second step extracts the frequent characteristics of each multidimensional pattern, yielding a rule set for high-level feature extraction.

3.6 Conversion of Low-Level Features into High Level Features This phase supports complex analysis of image semantics derived from low-level image features. It does so by repeatedly mining high-level features with rule sets defined for the application domain. Each rule aims to identify a high-level feature by measuring the distance between the rule and the semantics found in the image. The method employs fuzzy calculations. An inference technique then uses backward chaining over the low-level characteristics to determine the degree to which each characteristic is attained.
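The fuzzy conversion from low-level to high-level features can be illustrated with triangular membership functions and a rule whose strength is the minimum of its conditions (fuzzy AND). The linguistic terms, breakpoints and the rule are hypothetical examples, not the paper's rule set, and backward chaining is not shown.

```python
# Fuzzy mapping of a low-level feature to high-level terms with triangular
# membership functions; a rule fires with the minimum membership of its
# conditions (fuzzy AND). Terms and breakpoints are hypothetical examples.

def triangular(x, a, b, c):
    """Membership rising from a to a peak at b and falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical terms for mean luminance on a 0..255 scale: (a, b, c)
LUMINANCE_TERMS = {"dark": (-1, 0, 128), "medium": (64, 128, 192), "bright": (128, 255, 256)}


def fuzzify(x, terms):
    return {name: triangular(x, *abc) for name, abc in terms.items()}


def rule_strength(memberships, conditions):
    """Fuzzy AND of the (feature, term) conditions of one rule."""
    return min(memberships[f][t] for f, t in conditions)


if __name__ == "__main__":
    m = {"luminance": fuzzify(96, LUMINANCE_TERMS)}
    print(m["luminance"])
```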

3.7 Image Recovery Based on High Level Feature The retrieval of images based on high-level features, including hue, shape and texture, is discussed below. Hue Feature: Harmony is defined as a combination of hues that produces a gray blend, which creates a stable effect for the human eye. Non-harmonious combinations are implemented using fuzzy rules, which transform low-level features according to the degree of warmth and the contrast with the rest of the colors. Texture Features: Low-level features are converted into high-level features by estimating the low-level features of a typical set of samples of a particular texture and selecting the block values used to design the fuzzy rules. Shape Features: Shapes are defined to specify the objects in the domain. Fuzzy rules are employed to estimate the resemblance between the query shape and the shape of the object during image extraction.


Semantic Features: The semantic features used in image extraction combine the hue, texture and shape features with the high-level characteristics defined by domain experts.

4 Implementation In view of the procedure furnished with paper, the analyses are intended to check the plan of the IBIR framework. Conjointly the tests shows the modules projected with this article perform shrewd and efficient with the IBIR framework plan. The proposed system used to classify the images to calculate the typical RGB, we’ve enforced and Fig. 5 shows samples of the color feature and split result. The color options extracted from X image divided into Y fragments is denoted as Table 1 and image coaching clustered result’s visualized as shown in the same. The results show that identical color options of fragments are clustered along. Then the neighborhood module, CCH feature points are combined to be clustered to come up with the coaching result, known as the code book. They also show the visualization of the coaching results. The options extracted through CCH are classified along supported the color options. In the code book, the variety choices

Fig. 5 Example of converse context histogram feature result

Table 1 Color converse context histogram feature representation

Image    Total fragments    RGB               RGB fragments
0.5      2                  [145, 198, 16]    [9, 16, 16] … [9, 25, 145]
0.5      3                  [16, 1, 16]       [1, 25, 145] … [2, 6, 56]
…        …                  …                 …
X        Y                  [3, 121, 15]      [56, 1, 15] … [56, 145, 1]

S. Vinoth Kumar et al.

and the CCH features are included. Based on these fragments, images can be retrieved accurately, and clustering helps reduce the computational cost. Several settings of the proposed IBIR design were tested, and the results show that the system performs well for both training and querying. To verify the IBIR system, the Wang dataset, consisting of 386 × 254 pixel images, is used as the training and testing data. The 1000 images of the Wang dataset are used to build the codebook, and fifty images are randomly selected as query images. Using the CCH feature points, the proposed IBIR system successfully retrieves the correct images for these queries. In addition, further images are selected from a set of 10,000 images as training and query images. The results indicate that the proposed IBIR system retrieves images correctly once it is well trained. Table 2 shows the query samples from the Wang dataset. Several images are input for retrieval; to query for similar images, the system operates as described in Sect. 3. Figure 7 shows the image retrieval results. The proposed IBIR system correctly resolves the images that are included in the codebook; for images that are not in the codebook, similar images are retrieved as shown in Fig. 7. The retrieval process generated by the grid module is also shown in Fig. 5, the grid puzzle images. The experiment indicates that the IBIR system retrieves visually similar images even when the query image was not used to train the codebook.
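The codebook construction and retrieval steps can be illustrated with a simplified sketch. It does not reproduce the authors' converse context histogram (CCH) features. As stated assumptions: each image is a list of rows of (r, g, b) tuples, every fragment is described only by its mean RGB value, plain k-means with farthest-point initialization builds the codebook, and images are compared by Euclidean distance between their normalized codeword histograms (nearest-neighbor retrieval). All function names are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math

def fragment_means(image, grid):
    """Split an image (list of rows of (r, g, b)) into grid x grid fragments
    and return the mean RGB of each fragment."""
    h, w = len(image), len(image[0])
    means = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * h // grid, (gy + 1) * h // grid)
            cols = range(gx * w // grid, (gx + 1) * w // grid)
            pixels = [image[y][x] for y in rows for x in cols]
            means.append(tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3)))
    return means

def nearest(point, centers):
    """Index of the closest center (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as its cluster mean; keep the old center if a cluster is empty.
        centers = [tuple(sum(p[c] for p in g) / len(g) for c in range(3)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def codeword_histogram(image, codebook, grid):
    """Normalized histogram of the codewords assigned to an image's fragments."""
    hist = [0] * len(codebook)
    for m in fragment_means(image, grid):
        hist[nearest(m, codebook)] += 1
    total = sum(hist)
    return [v / total for v in hist]

def retrieve(query, database, codebook, grid, top=1):
    """Rank database images by histogram distance to the query (nearest neighbor first)."""
    q = codeword_histogram(query, codebook, grid)
    ranked = sorted(database,
                    key=lambda name: math.dist(q, codeword_histogram(database[name], codebook, grid)))
    return ranked[:top]

# Toy example with 8 x 8 synthetic images.
def solid(color, size=8):
    return [[color] * size for _ in range(size)]

def halves(left, right, size=8):
    return [[left if x < size // 2 else right for x in range(size)] for _ in range(size)]

RED, BLUE = (220, 20, 20), (20, 20, 220)
db = {"red": solid(RED), "blue": solid(BLUE), "red_blue": halves(RED, BLUE)}
points = [m for img in db.values() for m in fragment_means(img, grid=2)]
codebook = kmeans(points, k=2)
print(retrieve(solid((200, 30, 30)), db, codebook, grid=2))  # ['red']
```

A reddish query that was never part of the codebook training set still lands on the red codeword, so its nearest neighbor is the all-red image, mirroring the paper's observation that untrained query images can still retrieve similar images.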

5 Conclusions This article offers a clear insight into retrieving information based on the user query. The goal is to retrieve a precise and accurate image for the input user image. Keyword-based extraction faces various setbacks, which are addressed using image-based extraction. In our method, the nearest-neighbor and estimation algorithms work together and provide better image retrieval than keyword-based extraction methods. The method gives more effective results in terms of the performance parameters of trueness, accuracy, indexing and occurrences. Color extraction and the histogram equalization conversion perform particularly well in the proposed method. Finally, the method offers semantic search-engine behavior: unlike a conventional search engine, it returns results for a precisely important, context-based word. Static semantic indexes are used to improve the message. The part of the message that ranks highest in the semantic query is specified first, and the parts are concatenated according to their rank.

Fig. 7 Feature extraction results (panels: sample images; extraction results of the proposed system; extraction images of other systems)


Fake News Detection System Using Multinomial Naïve Bayes Classifier S. Sangeetha, S. Vinoth Kumar, R. Manoj Kumar, R. S. Rathna Sharma, and Rakesh Shettar

Abstract People today rely increasingly on web news because it is readily available to them. Internet usage is growing daily, which contributes to the spread of false information. Such fake news can spread intentionally or accidentally, but either way it has an impact on society. The growing volume of fake news must therefore be contained with a computational tool that predicts whether such misleading information is fake or genuine. The objective of this work is to develop a computational tool that classifies news using the Naive Bayes algorithm. Natural language processing is used to classify news and identify fake items, aiming at high accuracy. To detect fake news, a suitable dataset containing a predefined set of classified news is required for further processing. Stemming or lemmatization is then applied, and the term frequency–inverse document frequency (TF-IDF) model is used to capture the meaning of the news data. Finally, the Multinomial Naive Bayes algorithm is applied. This algorithm, which is based on Bayes' theorem, predicts the label of a text such as an email or a review. It calculates the likelihood of each label for a given example and outputs the label with the highest likelihood. Keywords Fake news · Naïve Bayes · Machine learning · Multinomial

S. Sangeetha (B) Tamil Nadu State Council for Science and Technology, Chennai, Tamil Nadu, India S. Vinoth Kumar Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India R. Manoj Kumar · R. S. Rathna Sharma · R. Shettar SNS College of Technology, Coimbatore, Tamil Nadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_17


1 Introduction News can be obtained easily from the Internet and social media, and it is convenient for readers to follow events of interest online. The main problem, however, is the spread of misinformation through these online channels, so a proper system to handle this issue is needed. Mass media play a huge role in influencing society, and, as is common, some individuals try to exploit it. Many sites publish false information; their primary aim is to manipulate information so that readers believe it, and there are examples of such sites everywhere. Consequently, fake news influences people's minds. According to researchers, many artificial intelligence algorithms can help expose misleading news, and since artificial intelligence is now very popular, many tools are available to support this. Today fake news causes various problems, ranging from satirical articles to fabricated news and planned government propaganda in some outlets. Fake news and the lack of trust in the media are growing problems with huge implications for society. Obviously, a deliberately misleading story is "fake news", but recently the discourse on social media has been changing its definition: some people now use the term to dismiss facts that run counter to their preferred viewpoints.

2 Literature Survey Existing methods in this field rely on algorithms such as the hybrid cloud approach to dealing with fake news. However, such algorithms require more storage and have very low precision. They sometimes rely on human input, so there is a high risk that the information is provided by a single person, which limits the accuracy of fake news identification [1–3]. An algorithm that is more efficient than the current ones is therefore needed. The key problem is that the hybrid approach is slow and prone to mistakes. Hybrid models need large datasets for training, and they sometimes do not classify the data, so there is a higher risk of matching irrelevant data, which in turn reduces the accuracy of the news classification [4–7].

3 Various Machine Learning Algorithms Machine learning is a widely used technique that enables a machine to learn from experience, and it is used for prediction in almost every domain. Figure 1

Fig. 1 Classification of machine learning algorithms (supervised learning: classification, regression; unsupervised learning: clustering, association analysis, dimensionality reduction; reinforcement learning: model-free, model-based)

shows the classification of machine learning algorithms. The accuracy of prediction is affected by poor data quality and improper model representation [8]. Machine learning algorithms progressively enhance the performance of the model [9–11]. Machine learning is broadly classified into three types, namely supervised learning, unsupervised learning and reinforcement learning. Supervised learning creates a predictive model from input and output data, whereas unsupervised learning groups and interprets data using only the input. Figure 2 shows various types of supervised learning algorithms.

Fig. 2 Supervised learning algorithms (classification: logistic regression, Naïve Bayes, k-nearest neighbor, support vector machine, with examples such as email spam detection and speech recognition; regression: linear regression, ridge regression, ordinary least squares, stepwise regression, with examples such as stock price prediction and climate prediction)

4 Fake News Detection Using Naive Bayes Classification The proposed system uses a Multinomial Naïve Bayes algorithm to identify fake news. The raw data is split into two sections, training data and test data. The data must be preprocessed before it is sent to the Multinomial classifier model, so feature extraction is performed first. The data is tokenized: the tokenizer breaks unstructured data and natural language text into chunks of information that can be treated as discrete elements. Next, stemming or lemmatization is applied. Stemming simply removes the last few characters of a word, which often leads to incorrect meanings and spellings, whereas lemmatization groups words together under their base word. Stop words are then removed, which eliminates low-information words from the text so that more focus is given to the important information. The final feature extraction step is TF-IDF, a statistic that measures how important a word is to a document: it counts how often the word is repeated in the document and weights it accordingly, adjusted by how common the word is across all documents, so that important words are marked as useful for the model. Once TF-IDF is computed, feature extraction is complete, and the preprocessed data is sent to the Multinomial NB classifier model for training and testing to predict whether each item is fake or not. Understanding Bayes' theorem is essential before understanding how the Naive Bayes classifier works, because the latter depends on the former. Bayes' theorem, developed by Thomas Bayes, determines the probability that an event will occur based on previously known conditions. It is given by

$$P(X \mid Y) = \frac{P(X)\,P(Y \mid X)}{P(Y)} \qquad (1)$$

Equation (1) gives the probability of class X when predictor Y is already known, where P(Y) is the prior probability of predictor Y, P(X) is the prior probability of class X, and P(Y | X) is the probability of predictor Y occurring given class X. This formula is used to calculate the probability of each tag for a text. In this instance, the data is divided into two categories, training data and test data. The training data is grouped into categories of similar entities. The test data is then organized according to where each item belongs, and the Naive Bayes classifier computes the likelihood of each individual word separately. Laplace smoothing is applied for the case in which a word whose likelihood must be determined does not appear in the training data, which would otherwise give that word, and hence the whole class, a zero probability. Finally, the news item is classified as real or fake according to the higher probability.
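The classification step with Laplace smoothing can be sketched from scratch. This is a minimal illustration, not the authors' implementation: documents are assumed to be already tokenized, raw word counts are used as features (the paper feeds TF-IDF weights instead), `alpha=1.0` gives Laplace (add-one) smoothing, and the class name `SimpleMultinomialNB` and the toy headlines are hypothetical. Log probabilities are summed to avoid numeric underflow on long texts.

```python
import math
from collections import Counter, defaultdict

class SimpleMultinomialNB:
    """Multinomial Naive Bayes on token counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        """docs: list of token lists; labels: matching list of class names."""
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            word_counts[label].update(doc)
        doc_counts = Counter(labels)
        v = len(self.vocab)
        # log P(class): fraction of training documents in each class
        self.log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in self.classes}
        # log P(word | class) with Laplace smoothing, so no word gets probability 0
        self.log_lik = {}
        for c in self.classes:
            total = sum(word_counts[c].values())
            self.log_lik[c] = {t: math.log((word_counts[c][t] + self.alpha) / (total + self.alpha * v))
                               for t in self.vocab}
        return self

    def predict(self, doc):
        """Return the class with the highest log posterior (P(doc) is a shared constant)."""
        scores = {}
        for c in self.classes:
            # Words never seen in training are skipped.
            scores[c] = self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
        return max(scores, key=scores.get)

train = [
    ("government confirms new budget figures", "real"),
    ("shocking miracle cure doctors hate", "fake"),
    ("official report confirms election results", "real"),
    ("miracle pill shocking secret revealed", "fake"),
]
model = SimpleMultinomialNB().fit([text.split() for text, _ in train],
                                  [label for _, label in train])
print(model.predict("shocking miracle secret".split()))    # fake
print(model.predict("official report confirms".split()))  # real
```

Without smoothing, a single word that never occurred with a class in training would force that class's product of probabilities to zero; adding `alpha` to every count keeps each class in contention.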


5 Architecture of Multinomial Naïve Bayes Classifier for Fake News Detection Figure 3 shows the major modules of this proposal: feature extraction, which includes tokenization, stemming or lemmatization, removal of stop words and TF-IDF, after which the preprocessed data is passed to the ML model to obtain the final classification.

Data Preprocessing: The first step is to preprocess the training data to prepare it for modeling. Preprocessing includes steps such as removing missing values from the dataset, removing social media slang, removing stop words and expanding contractions.

Tokenization: Tokenization is the most popular method for turning a string of text into a list of tokens. A word is a token in a sentence, and a sentence is a token in a paragraph.

Stemming or Lemmatization: Stemming reduces a word to its word stem, which affixes to suffixes and prefixes or to the root of the word, known as the lemma. Stemming plays a crucial role in natural language understanding (NLU) and natural language processing (NLP). It is part of morphological linguistic analysis and of artificial intelligence (AI) information retrieval and extraction. Because additional forms of a term related to a subject may need to be searched to obtain the best results, stemming helps extract meaningful information from large sources such as big data or the Internet. Lemmatization is the most popular method for grouping the different inflected forms of a word so that they can be analyzed as a single item. It is similar to stemming, but it brings context to the words, so it links words with similar meanings to a single word.
Removal of Stop Words: A stop word is a commonly used word (such as "the", "a" or "in") that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query.
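A toy version of the tokenization, stop-word removal and stemming steps is sketched below. The stop-word list and the suffix-stripping rule are deliberately tiny assumptions; a real system would use a full stop-word list and an established stemmer such as Porter's, or a dictionary-based lemmatizer. Stop words are removed before stemming here, although Fig. 3 lists stemming first, so that a stemmed stop word cannot slip past the list.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "in", "on", "of",
              "to", "and", "or", "that", "this", "it", "for", "with", "by"}

# Hypothetical crude suffix-stripping rule (not the Porter algorithm).
SUFFIXES = ("ing", "edly", "ed", "ly", "es", "s")

def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = tokenize(text)
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens]

print(preprocess("The minister was denying the leaked reports in Parliament"))
# ['minister', 'deny', 'leak', 'report', 'parliament']
```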

Fig. 3 Functional flow for fake news system using multinomial Naïve Bayes classifiers (raw news data → tokenization → stemming or lemmatization → removal of stop words → TF-IDF → multinomial NB classification model; input news data → news classified as real or fake)

TF-IDF: The TF-IDF score of a word in a document is calculated by multiplying two metrics:
• The term frequency of the word in the document. There are several ways to compute this frequency, the simplest being a raw count of the number of times the word appears in the document. The frequency can also be adjusted, for example by the length of the document or by the raw frequency of the most frequent word in the document.
• The inverse document frequency of the word across a set of documents, which indicates how common or rare the word is in the entire document set; the closer it is to 0, the more common the word. It is computed by dividing the total number of documents by the number of documents that contain the word and taking the logarithm. Hence, if the word is very common and appears in many documents, this value approaches 0; otherwise it grows larger, up to log N for a word that appears in only one of N documents.
Multiplying these two numbers gives the TF-IDF score of the word in the document. The higher the score, the more relevant the word is to that particular document [12–14].
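Following the two-factor definition above, a minimal TF-IDF computation can be written with the standard library only. The sketch assumes length-normalized term frequency and the unsmoothed natural-log inverse document frequency log(N/df); library implementations often add smoothing, so their scores differ slightly. The toy documents are illustrative.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document,
    with tf = count / document length and idf = ln(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(t for doc in docs for t in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return scores

docs = [["fake", "cure", "cure"], ["election", "result"], ["election", "cure"]]
weights = tf_idf(docs)
print(weights[1])  # "result" (in 1 of 3 docs) outscores "election" (in 2 of 3)
```

A term that occurs in every document gets idf = ln(1) = 0 and therefore a score of 0, which is why very common words contribute nothing to the classifier.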

6 Performance Evaluation Metrics and Experimental Analysis To evaluate prediction accuracy, each prediction must be identified as a true positive, false positive, true negative or false negative. A confusion matrix, which cross-tabulates the actual and predicted classes in binary classification, is used for this purpose. A confusion matrix is created for each model, and the model's performance is evaluated from it.

Confusion Matrix
                      Actual positive    Actual negative
Predicted positive    True positive      False positive
Predicted negative    False negative     True negative

The four possible outcomes are:
• True positives: the predicted value and the actual value are both positive (true).
• False positives: the predicted value is positive (true), but the actual value is negative (false).
• True negatives: the predicted value and the actual value are both negative (false).
• False negatives: the predicted value is negative (false), but the actual value is positive (true).
The performance of classification models is evaluated with metrics such as accuracy, precision, recall (sensitivity), specificity and F1 score, all of which are derived from the confusion matrix.

• Accuracy: ratio of correctly predicted instances to the total number of instances.
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
• Precision: ratio of correctly predicted positive instances to all instances predicted as positive.
$$\text{Precision} = \frac{TP}{TP + FP}$$
• Recall (sensitivity): ratio of correctly predicted positive instances to all instances that are actually positive.
$$\text{Recall} = \frac{TP}{TP + FN}$$
• Specificity: ratio of true negatives to all actual negatives.
$$\text{Specificity} = \frac{TN}{TN + FP}$$
• F1 score: a single metric that combines recall and precision as their harmonic mean.
$$F_1 = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$$
Here TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives.
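The outcome counts and the metrics above can be computed directly from paired label lists. In this sketch "fake" is assumed to be the positive class, the six labels are made up for illustration, and no guard is added for zero denominators.

```python
def confusion_counts(actual, predicted, positive="fake"):
    """Count TP, FP, TN, FN, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Evaluation metrics from confusion-matrix counts (denominators assumed nonzero)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

actual    = ["fake", "fake", "real", "real", "fake", "real"]
predicted = ["fake", "real", "real", "fake", "fake", "real"]
counts = confusion_counts(actual, predicted)
print(counts)  # (2, 1, 2, 1)
print(metrics(*counts))
```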

A confusion matrix is an N × N matrix used to evaluate the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the model, giving a thorough view of how well the classification model performs and what kinds of errors it makes. Several classifier models were investigated using feature extraction methods such as count vectorization and TF-IDF. The proposed Multinomial Naive Bayes model obtains an average accuracy of 0.90. The experimental confusion matrix for the fake news detection system is shown in Fig. 4.

7 Conclusion In this work, a Multinomial Naïve Bayes classifier is used to predict the truthfulness of news entered by the user. We have presented a prediction model that uses count vectorization and TF-IDF for feature selection, which helps the model be more precise. Using the Naive Bayes theorem, any news item from a small or large dataset can therefore be categorized as fake or true by comparing it to


Fig. 4 Confusion matrix result using Naïve Bayes model

the values of the prior dataset. The system saves time and helps users judge whether a particular news item is genuine.

References
1. Shaori H, Wibowo WC (2018) Fake news identification characteristics using named entity recognition and phrase detection. 10th ICITEE, Universitas Indonesia
2. Parikh SB, Atrey PK (2018) Media-rich fake news detection: a survey. In: 2018 IEEE conference on multimedia information processing and retrieval (MIPR)
3. Shu K, Wang S, Liu H (2018) Understanding user profiles on social media for fake news detection. MIPR
4. Jain A, Kasbe A (2018) Fake news detection. IEEE
5. Poddar K, Geraldine Bessie Amali D, Umadevi KS (2019) Comparison of various machine learning models for accurate detection of fake news. IEEE
6. Shan G, Foulds J, Pan S (2020) Causal feature selection with dimension reduction for interpretable text classification. University of Maryland, Baltimore County
7. Marco L, Tacchini E, Moret S, Ballarin G. Automatic online fake news detection combining content and social signals
8. Stahl K (2018) Fake news detection in social media. California State University Stanislaus
9. Gurav S, Sase S, Shinde S, Wabale P, Hirve S (2019) Survey on automated system for fake news detection using NLP & machine learning approach. Int Res J Eng Technol (IRJET)
10. Manzoor SI, Singla J, Nikita (2019) Fake news detection using machine learning approaches: a systematic review. IEEE
11. Tanvir AA, Mahir EM, Akhter S, Rezwanul Huq M (2019) Detecting fake news using machine learning and deep learning algorithms. IEEE
12. Jin Z, Cao J (2017) News credibility evaluation on microblog with a hierarchical propagation model. Fudan University, Shanghai, China
13. Conroy, N. Rubin Chen Y (2018) CIMT detect: a community infused matrix-tensor coupled factorization 52(1):1–4
14. Markines B, Cattuto C, Menczer F (2018) Hybrid machine-crowd approach, pp 41–48

Superconductivity-Based Energy Storage System for Microgrid Stabilization by Connecting and Disconnecting Loads Amol Raut and Kiran Dongre

Abstract Before developing the solution adopted for the bidirectional converter that stabilizes a microgrid, a study was carried out of its requirements for voltage stabilization in isolated networks. Given the growing use of this type of electrical distribution network, their correct supply must be guaranteed, which is why reliability and efficiency of the electrical supply are necessary. The objective of this work is the study and design of an energy stabilization system for a microgrid. The system is intended to absorb overvoltages and compensate the voltage drops associated with connecting and disconnecting loads in a network with the following characteristics: effective voltage 1200 V, energy to store 1 MJ and power 1.2 kVA. Keywords Energy storage system · Isolated networks · Microgrid · Stabilization · Superconductivity

A. Raut (B) · K. Dongre Electrical Engineering Research Centre, Prof Ram Meghe College of Engineering & Management, Badnera, Amravati, Maharashtra 444701, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_18

1 Introduction With the steady growth of renewable energy, systems that integrate a greater number of renewable sources, such as solar, wind, hydroelectric or geothermal energy, are proliferating. For this reason, a new system of joint production, consumption and generation has emerged in recent years: the microgrid, which moves away from the traditional distribution system based on non-renewable energy sources and integrates renewable electricity production [1–3]. This system also has certain disadvantages; the one this work aims to solve is the uneven and unstable energy production caused by rises and falls in voltage. The advantages of a microgrid over the traditional system are economic, reliability and efficiency, and environmental. To overcome the disadvantage of uneven energy production, we propose a new system based on superconducting materials, with which an energy store can be created that absorbs voltage surges and stores the energy until there is a voltage drop in the network. The solution to guarantee a constant supply in the microgrid consists of storing the surges in a coil made of superconductors. This innovative system is called superconducting magnetic energy storage (SMES). To carry out the stabilization process, a bidirectional AC/DC converter will be implemented that allows energy both to be absorbed from the network and to be returned to it, keeping the network voltage constant. In addition to the rectifier, a DC/DC converter is needed to charge, discharge or store energy in the form of a magnetic field in the SMES coil. Two options have been considered for the bidirectional converter: behavior as a voltage source, the Voltage Source Converter (VSC), and behavior as a constant current source, the Current Source Converter (CSC). Once the type of converter was chosen, three possible control options were considered: control over phase and magnitude, vector control, and P&Q control. The main objective of this work is to study possible alternatives for voltage stabilization systems within a microgrid.
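The energy held in the SMES coil follows the standard inductor relation. As a worked example, assuming a hypothetical rated coil current of 1000 A (the text does not give one), the 1 MJ storage target from the abstract fixes the coil inductance; assuming further that the converter exchanges its full 1.2 kVA rating as active power, the second line gives the time to absorb or deliver the full 1 MJ:

$$E = \tfrac{1}{2} L I^{2} \;\Rightarrow\; L = \frac{2E}{I^{2}} = \frac{2 \times 10^{6}\ \mathrm{J}}{(10^{3}\ \mathrm{A})^{2}} = 2\ \mathrm{H}$$

$$t = \frac{E}{P} = \frac{10^{6}\ \mathrm{J}}{1.2 \times 10^{3}\ \mathrm{W}} \approx 833\ \mathrm{s}$$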

2 Selection of Architectures An AC/DC converter provides electrical energy in the form of direct current from an alternating current source. First, a comparative study is made to obtain the best option to apply to the SMES system. Second, its operation is described in detail and the choice of this topology is justified. Since the SMES system will be used to stabilize networks, the most important capability its converter must have is to transmit active power (P) and reactive power (Q) within the distribution system [4]. The world of converters is very broad and there are numerous topologies; the simplest classification is between unidirectional and bidirectional rectifiers. All unidirectional rectifiers must be ruled out, because a SMES system must be able to charge the coil from the network and discharge it again into the three-phase distribution line. That is, energy must flow from the grid to the coil and from the coil to the distribution line, so unidirectional AC/DC converters are not studied. By number of phases, rectifiers are either single-phase (monophasic) or multiphase [5–8]. Single-phase rectifiers can only handle low power ranges, so they are ruled out for a SMES system. We therefore focus on multiphase rectifiers, which handle large amounts of energy and work with much higher power in a more stable way. Within them we distinguish between operation as a voltage source, operation as a current source and thyristor-based rectifiers. Thyristor-based rectifiers have synchronization problems and cause a delay, or "lag", in the power factor of the distribution network. In addition, they inject numerous low-frequency harmonics into the network that are hard to filter. Using a 12-pulse converter, we will be able to reduce many undesirable harmonics,


but the total harmonic distortion (THD) is still too high compared with the limits set by the regulations [9]. To avoid these two major drawbacks, converters based on self-commutated components, such as Gate Turn-Off thyristors (GTOs) or Insulated Gate Bipolar Transistors (IGBTs), have been proposed. These devices allow the amount of active and reactive power transferred to the load, in our case purely inductive, to be varied. Depending on the field of application, the converter power may range from kilowatts in micro-SMES applications such as uninterruptible power supplies (UPS) to megawatts in energy storage applications such as ours. IGBTs fall short at high power and are used only in small applications; for the remaining applications, GTO thyristors are the true alternative in this high power range [10]. The main drawback of GTOs is their large switching losses; to avoid them, the switching frequency must be kept below 1–2 kHz. Using PWM control, the injection of current harmonics into the network can be minimized. Thyristor-based rectifiers are controlled by varying the gate firing angle α, which is a very simple control: if the angle is greater than 90°, the converter operates as an inverter and the coil discharges. This topology is discarded because it does not allow significant exchanges of active and reactive power. The voltage source converter topology is typically characterized by a transformer at the converter input, a PWM-modulated 6-pulse rectifier/inverter, IGBT transistors, and a chopper that works in two quadrants.
A DC-link (junction) capacitor always sits between the two converters. The control is based on a proportional-integral (PI) algorithm, and the chopper is controlled by PWM: when the duty cycle is greater than 0.5 the coil charges, and when it is less than 0.5 the coil discharges. The algorithm is usually implemented with triangular-carrier modulation. The VSC can have two or three voltage levels. The chopper consists of two GTOs and two diodes, and the superconducting coil is charged through the capacitor current. The converter uses six GTOs to allow bidirectional rectification, which requires generating trigger pulses through gate drivers. By varying the firing angle and the modulation, the GTOs of the voltage source converter allow both active and reactive power to be controlled. Sine-triangle modulation is used with a switching frequency that must be low for high power, of the order of 1 kHz. As a result, only high-frequency harmonics are injected, so they are relatively easy to remove by filtering. The link capacitor provides a stabilized DC voltage source for the VSC, and its voltage is what allows the superconducting coil to be charged or discharged. This capability is important because it enables fast and efficient cooperation between the bidirectional converter and the chopper. In the three-level topology, the AC/DC converter is a three-level neutral point clamped (NPC) converter. Each branch of the converter consists of four active switches and four diodes; in practice, IGBTs or IGCTs are usually used as switches. On the DC side, the link capacitor is split in two, which provides a neutral point (NP).
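Two textbook relations behind the control rules just described can be sketched numerically. For the two-quadrant chopper with two switches and two diodes, the average coil voltage is V_L = (2D − 1)V_dc, which is positive (charging) for D > 0.5 and negative (discharging) for D < 0.5. For an ideal 6-pulse thyristor bridge, the average DC voltage is (3√2/π)V_LL cos α, which turns negative (inverter operation, coil discharging) for α > 90°. The 1200 V line voltage is taken from the abstract; the 2000 V DC-link value and the function names are hypothetical assumptions.

```python
import math

def chopper_coil_voltage(duty, v_dc):
    """Average coil voltage of a two-quadrant chopper: V_L = (2D - 1) * V_dc."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty cycle must lie in [0, 1]")
    return (2.0 * duty - 1.0) * v_dc

def six_pulse_dc_voltage(v_ll_rms, alpha_deg):
    """Ideal average DC output of a 6-pulse thyristor bridge (no commutation overlap)."""
    return 3.0 * math.sqrt(2.0) / math.pi * v_ll_rms * math.cos(math.radians(alpha_deg))

def coil_mode(v_coil, tol=1e-9):
    """Positive coil voltage raises the coil current (charging), negative lowers it."""
    if v_coil > tol:
        return "charging"
    if v_coil < -tol:
        return "discharging"
    return "standby"

V_DC = 2000.0  # hypothetical DC-link voltage
for d in (0.75, 0.5, 0.25):
    v = chopper_coil_voltage(d, V_DC)
    print(f"D = {d}: V_L = {v:+.0f} V -> {coil_mode(v)}")

for alpha in (30, 90, 120):
    print(f"alpha = {alpha} deg: V_dc = {six_pulse_dc_voltage(1200.0, alpha):+.1f} V")
```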


A. Raut and K. Dongre

The diodes connected to the NP are called clamping diodes. Three-level voltage source converters have advantages over two-level converters: during switching, each component of a three-level converter withstands only half the voltage that the components of a two-level converter must block, and for the same voltage range and switching frequency they produce lower THD and dv/dt, that is, lower harmonic injection and lower voltage gradients. The biggest drawbacks, however, are the large number of components required and the highly complex PWM-based control. Another drawback is that the charging and discharging of the two capacitors can cause fluctuations in the output frequency delivered to the distribution line [11]. The current source topology is also typically characterized by a transformer at the converter input. The capacitor bank at its input acts as an energy storage buffer that allows reactive power to be exchanged with the line inductance, and these capacitors also filter high-frequency current harmonics. Control is usually carried out with a three-phase PWM signal that sets the injected current. Both 6-pulse and 12-pulse current topologies are possible; the 12-pulse version gives a lower ripple voltage on the DC side, which translates into lower losses due to the alternating component in the coil. The control is based on a proportional-integral (PI) algorithm.
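The paper states that the control loop is a PI algorithm but does not give its gains or discretization. As a minimal sketch under those unknowns, a discrete-time PI controller whose output is clamped to the valid duty-cycle range [0, 1] could look like this; the gains, sample time, and the idea of regulating the DC-link voltage are assumptions for illustration, and `PIController` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete-time PI controller with output clamping and conditional-integration anti-windup.

    kp, ki, dt and the output limits are illustrative assumptions,
    not values taken from the paper.
    """

    kp: float
    ki: float
    dt: float
    out_min: float = 0.0
    out_max: float = 1.0
    integral: float = 0.0

    def update(self, reference: float, measurement: float) -> float:
        error = reference - measurement
        candidate_integral = self.integral + error * self.dt
        output = self.kp * error + self.ki * candidate_integral
        # Anti-windup: only commit the integral state while the output is unsaturated,
        # so the integrator does not keep accumulating during saturation.
        if self.out_min <= output <= self.out_max:
            self.integral = candidate_integral
        return min(max(output, self.out_min), self.out_max)


if __name__ == "__main__":
    # Hypothetical gains; regulate a 250 V DC link from a 240 V measurement.
    pi = PIController(kp=0.01, ki=0.1, dt=1e-3)
    print(pi.update(reference=250.0, measurement=240.0))
```

Clamping keeps the command inside the range a PWM modulator can realize, and skipping the integral update while saturated avoids the large overshoot that integrator windup would otherwise cause after a sag or swell.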

3 Selection of Electronic Components

The power elements of both the rectifier and the chopper are dimensioned first, and then the necessary protections for these elements and the required heat sinks are added. The DSP that implements the control algorithm is selected, and the conditioning circuits that turn its output signals into gate signals for the IGBTs are designed. Finally, the power supplies for the drivers, the microprocessor, and the input signal conditioning circuits are selected.

For the selection of the six IGBTs of the rectifier, the parameters in Table 1 must be taken into account. These parameters were obtained from the simulation carried out in PSIM. Given these values, it was decided to place two identical IGBTs in parallel so that the high current is shared evenly between them. Fig. 1 shows the chosen device and its representative parameters.

Table 1 IGBT parameters
Parameter | Value
Maximum voltage, V_MAX | 800 V
Average current, I_M | 650 A
Maximum current, I_MAX | 2000 A

Superconductivity-Based Energy Storage System for Microgrid …

Fig. 1 Chosen IGBTs and their characteristic parameters

For the junction capacitor between the AC/DC and DC/DC converters, the parameters in Table 2 must be taken into account. Once these values were obtained from the simulator, the most appropriate capacitor was selected from a database of electronic components; Fig. 2 shows its specifications and appearance. Since 100 µF is needed, two 50 µF capacitors are connected in parallel: their capacitances add, and the equivalent series resistance is halved relative to that of each capacitor.

Table 2 Junction capacitor parameters
Parameter | Value
Maximum voltage | 900 V
DC voltage, V_DC | 250 V
Effective (RMS) current | 70 A
Capacitance, C | 0.1 mF

Fig. 2 Connecting capacitor (Texas Instruments)

The three coils that act as a low-pass filter for the currents were selected according to the current they must withstand under the most extreme conditions of the circuit. The required inductance of the storage coil, 0.5 H, was determined, and the coil was designed from a superconducting material. Because of its characteristics, this coil will not be purchased like the rest of the components but will be custom designed and ordered for this application.

For the selection of the diodes, the characteristics in Table 3 must be taken into account. It is important to emphasize again that these values correspond to the worst case, that is, a voltage drop in the microgrid during which the entire coil current is discharged through the diodes. The parameters needed to select a fuse in series with this component, given in Table 4, must also be taken into account. These parameters define the ability to withstand an overcurrent for less than half a period, and they are used to dimension the protections against overcurrents and short circuits.

Table 3 Diode parameters
Parameter | Value
Reverse voltage, V_INV_MAX | 1900 V
Average current, I_M | 2000 A

Table 4 Fuse selection parameters
Parameter | Value
Average current, I_M | 2500 A
Nominal current, I_nominal | 2000 A
I²t | 4810 kA²s
√I²t | √48,100 kA²s

For the selection of the DSP, it must first be considered that it is the element with the highest workload, which gives it priority. DSPs and microprocessors from several manufacturers were considered, but a Microchip device was finally chosen because of the wide variety of parts this manufacturer offers. Candidates were searched for using the requirements in Table 5. Multiplexers were deliberately avoided, as they would greatly slow down the processing speed of the DSP. These requirements are estimates that have been deliberately oversized, yet they remain reasonable.

Table 5 Control unit needs
Parameter | Need
Minimum PWM outputs | 8
Minimum number of input and output pins | 16
Minimum analog/digital (A/D) converters | 8
Memory | 10 kB
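Several of the sizing decisions above reduce to short calculations. The sketch below collects them: the energy stored in the 0.5 H coil at the 2000 A maximum current of Table 1 (E = ½LI², which gives the 1 MJ capacity quoted in the simulations), the parallel combination of the two 50 µF capacitors, the per-device current of two paralleled IGBTs assuming ideal sharing, and the I²t (Joule integral) of a half-sine surge, which is the quantity compared against fuse and diode ratings such as those in Table 4. The 2 mΩ ESR and the 20 kA, 10 ms surge are hypothetical values chosen only to exercise the functions.

```python
def coil_energy(inductance_h: float, current_a: float) -> float:
    """Energy stored in an inductor, E = 1/2 * L * I^2, in joules."""
    return 0.5 * inductance_h * current_a ** 2


def parallel_capacitors(capacitances_f, esrs_ohm):
    """Capacitances add in parallel; ESRs combine like parallel resistors."""
    total_c = sum(capacitances_f)
    total_esr = 1.0 / sum(1.0 / r for r in esrs_ohm)
    return total_c, total_esr


def per_device_current(total_a: float, n_parallel: int) -> float:
    """Current per device assuming ideal, perfectly balanced sharing."""
    return total_a / n_parallel


def half_sine_i2t(peak_a: float, duration_s: float) -> float:
    """Joule integral of a half-sine current pulse.

    integral_0^T (Ip * sin(pi * t / T))^2 dt = Ip^2 * T / 2, in A^2*s
    """
    return peak_a ** 2 * duration_s / 2.0


if __name__ == "__main__":
    print(f"Coil energy: {coil_energy(0.5, 2000.0) / 1e6:.2f} MJ")
    c, esr = parallel_capacitors([50e-6, 50e-6], [2e-3, 2e-3])  # ESR values hypothetical
    print(f"DC link: {c * 1e6:.0f} uF, ESR {esr * 1e3:.1f} mOhm")
    print(f"Per-IGBT peak current: {per_device_current(2000.0, 2):.0f} A")
    i2t = half_sine_i2t(20e3, 0.01)  # hypothetical 20 kA surge over one 50 Hz half cycle
    print(f"Surge I2t: {i2t / 1e6:.0f} kA2s")
```

In practice paralleled IGBTs never share current perfectly, so the per-device figure is a lower bound and datasheet derating factors would be applied on top of it.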

4 Simulations and Analysis

This section shows that the stabilization system operates correctly during voltage rises and falls. First, the amount of energy the SMES coil can store is shown. Next, voltage variations in the microgrid are simulated, and finally, a Fourier analysis of the currents injected into the network is carried out. All calculations and simulations were carried out with the PSIM electrical simulator on an Intel Core i7 machine with 16 GB RAM, a 256 GB SSD, and a GeForce GTX 960M.

Once the system's capacity to store up to 1 MJ of energy has been demonstrated, a voltage drop in the microgrid is simulated. The expected behavior is that, in the event of a voltage drop, the coil discharges through the two diodes so that the microgrid voltage is stabilized. The simulation proceeds as follows:
• The control algorithm compares the RMS value of the power of the three phases with the reference stable value.
• A decrease in this effective value means that the instantaneous voltage has dropped, so the system must act.
• The system acts by returning the current previously stored in the coil.
• To do this, the IGBTs of the DC/DC converter are blocked, so the only path available to the current stored in the SMES coil is through the two diodes.

In Fig. 3, a voltage drop peak of 1000 V lasting 2 μs has been simulated. It produces a marked variation in the effective value of the power of one of the phases, and consequently the diodes conduct and return the stored current. This stabilizes the network very effectively.

In the event of a voltage rise due to the disconnection of large loads, the effective value of the power of one of the phases increases. In this situation, the system proceeds as follows:
• The control algorithm compares the RMS value of the power of the three phases with the reference stable value.
• An increase in this effective value means that the instantaneous voltage has risen (an overvoltage), so the system must act.
• The system acts by applying a pulse to the gates of the two IGBTs of the DC/DC converter, so that the current caused by the overvoltage is stored in the coil.

Fig. 3 Voltage drop and rise in the microgrid (PSIM)

Figs. 4 and 5 show how the system behaves during these overvoltages and voltage drops.
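The two procedures above amount to a comparator with a tolerance band around the stable reference. The sketch below is a simplified reading of that logic, not the authors' PSIM implementation: it monitors the RMS of sampled phase voltages (the paper speaks of the effective value of each phase's power), and the 5% band, the 230 V reference used in the example, and the function names are all assumptions.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square value of a window of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def smes_action(phase_windows: Sequence[Sequence[float]], reference_rms: float,
                band: float = 0.05) -> str:
    """Decide what the SMES unit should do for one control window.

    phase_windows: one window of voltage samples per phase
    reference_rms: RMS value of the stable (nominal) voltage
    band: tolerance band around the reference (hypothetical 5 %)
    A sag on any phase takes priority over a swell on another phase.
    """
    values = [rms(w) for w in phase_windows]
    if min(values) < reference_rms * (1.0 - band):
        # Sag: block the DC/DC IGBTs so the coil current returns through the diodes
        return "discharge"
    if max(values) > reference_rms * (1.0 + band):
        # Swell: gate the DC/DC IGBTs on so the excess energy is stored in the coil
        return "charge"
    return "hold"
```

For example, three phase windows scaled to 80% of a 230 V RMS nominal return `"discharge"`, while windows at 120% return `"charge"`. The tolerance band prevents the unit from switching on normal small fluctuations of the microgrid voltage.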

Fig. 4 Sum of currents through the junction capacitor and the coil—current through the diodes (PSIM)

Fig. 5 Voltage surge across the junction capacitor (PSIM)


5 Conclusion

One important caveat must be noted: because of simulation problems, more resistance has been added to the components than they actually have. This can be seen in Fig. 5: when the SMES coil is short-circuited, its current should remain constant, because the resistance of a superconductor is almost zero. Instead, since the resistance had to be increased for the PSIM software to work properly, this current decays and is dissipated as heat. Despite this limitation, the results show that the system works as intended and is able to stabilize the voltages.
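The decay noted in the conclusion is the natural response of a shorted R–L loop: i(t) = I₀·e^(−t/τ) with τ = L/R, so an ideal superconducting coil (R = 0) holds its current indefinitely, while any resistance added for numerical reasons drains it. The sketch below uses the 0.5 H coil and 2000 A current from the component tables; the 0.1 Ω series resistance is a hypothetical stand-in for the extra resistance added in PSIM, whose actual value the paper does not report.

```python
import math


def time_constant(inductance_h: float, resistance_ohm: float) -> float:
    """L/R time constant of a shorted coil; infinite for an ideal superconductor."""
    return math.inf if resistance_ohm == 0 else inductance_h / resistance_ohm


def shorted_coil_current(i0_a: float, inductance_h: float, resistance_ohm: float,
                         t_s: float) -> float:
    """Current in a short-circuited coil: i(t) = I0 * exp(-t / tau), tau = L / R."""
    tau = time_constant(inductance_h, resistance_ohm)
    return i0_a if math.isinf(tau) else i0_a * math.exp(-t_s / tau)


if __name__ == "__main__":
    for r in (0.0, 0.1):  # ideal superconductor vs. hypothetical simulation resistance
        print(r, [round(shorted_coil_current(2000.0, 0.5, r, t), 1) for t in (0, 5, 10)])
```

With the hypothetical 0.1 Ω, the time constant is 5 s, so the stored current falls to about 37% of its initial value after 5 s; the real superconducting coil would show essentially flat current over the same interval.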

References
1. Agus Purwadi NS (2013) Modelling and analysis of electric vehicle DC fast charging infrastructure based on PSIM. In: First international conference on artificial intelligence, modelling & simulation
2. Chowdhury DS (2007) Superconducting magnetic energy storage system (SMES) for improved dynamic system performance
3. Embaiya Salih SL (2014) Application of a superconducting magnetic energy storage unit for power systems stability improvement
4. Jing Shi AZ (2015) Voltage distribution characteristic of HTS SMES. IEEE Trans Appl Supercond 1–6
5. Jing Shi YT (2005) Study on control method of voltage source power conditioning system for SMES. In: IEEE/PES transmission and distribution. China
6. Ali MH, Bin W (2010) An overview of SMES applications in power and energy systems. IEEE Trans Sustain Energy 1(1):38–47
7. Ali MH, Tamura J (2008) SMES strategy to minimize frequency fluctuations of wind generator system
8. Ali MH, Park M (2009) Improvement of wind-generator stability by fuzzy-logic-controlled SMES. IEEE Trans Ind Appl 45(3):1045–1051
9. Said SM, Aly MM (2014) Application of superconducting magnetic energy storage (SMES) for voltage sag/swell suppression in distribution system with wind power penetration
10. Tomoki Asao RT (2008) Evaluation method of power rating and energy capacity of superconducting magnetic energy storage system for output smoothing control of wind farm
11. Zanxiang Nie XX (2013) SMES-battery energy storage system for conditioning outputs from direct drive linear wave energy converters. IEEE Trans Appl Supercond 23(3)

Deep Learning-Based Model for Face Mask Detection in the Era of COVID-19 Pandemic

Ritu Rani, Amita Dev, Ritvik Sapra, and Arun Sharma

Abstract Recent advances in deep learning and classification have led to the worldwide success of numerous practical applications. With the onset of the COVID-19 pandemic, it has become very important to use technology to help control the spread of the virus. Deep learning and image classification can help detect face masks in a crowd of people. However, choosing the right deep learning architecture can be crucial to the success of such a system. This study presents a model that extracts features from face images using a baseline ConvNet and the pre-trained models InceptionV3, MobileNet, DenseNet, ResNet50, and VGG19, and stacks a fully connected layer on top to perform the classification. The study assesses the effectiveness of these deep learning approaches for face mask detection on the Face Mask 12K dataset. The performance metrics used for analysis are loss, accuracy, validation loss, and validation accuracy. DenseNet and MobileNet achieve the highest accuracies, 99.89% and 99.79%, respectively, with comparable training and validation performance. Further, the paper demonstrates the real-world deployment of a deep learning architecture on a Raspberry Pi 2B (1 GB RAM).

Keywords Face mask detection · Deep learning · VGG19 · DenseNet · MobileNet · InceptionV3 · ConvNet · ResNet50

R. Rani (✉) · A. Dev · A. Sharma
Center of Excellence, Indira Gandhi Delhi Technical University for Women, New Delhi, Delhi, India

R. Sapra
Amdocs Development Center India, Gurgaon, Haryana, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_19


1 Introduction

COVID-19 is one of the worst pandemics humankind has ever witnessed. It has not only affected our day-to-day lives but has also made wearing a mask a basic necessity when stepping out of the house to use any public service [1]. Wearing masks in public can significantly reduce virus transmission in communities; masks are not a perfect barrier, but combined with other preventive measures such as sanitizers and social distancing, they can drastically reduce COVID-19 cases and deaths. A mask not only protects the wearer from infection but also stops the virus from spreading to others, because face coverings trap droplets expelled by the wearer [2], which are a primary route of virus transmission. Around 40% of COVID-19-infected people are likely to be asymptomatic and unaware of their infection, yet they can still spread the virus to others. In [3], a detailed study shows that lockdowns and COVID-19 cases had a strong positive correlation.

1.1 Main Objectives of the Paper
• To emphasize the importance of wearing masks during the pandemic, since masks not only reduce the wearer's chance of being infected but also protect other people from transmission.
• To test the performance of different pre-trained deep learning models, namely InceptionV3, VGG19, MobileNet, DenseNet, and ResNet50, together with a classic ConvNet, on a face mask dataset, using training loss, validation loss, and accuracy.
• To discuss the architectures of InceptionV3, VGG19, MobileNet, DenseNet, ResNet50, and the traditional ConvNet in detail.
• To analyze the performance of the models by computing their confusion matrices and classification reports, including recall, precision, and F1-score.

2 Literature Review

Deep learning approaches have outperformed older state-of-the-art machine learning techniques in a wide range of domains over the last two decades. Deep learning has enabled many novel computer vision applications that are now becoming part of our daily lives [4–9]. When COVID-19 hit the world, wearing face masks became one of the most significant necessities of daily life. Several studies [10, 11] have found that wearing a face mask not only lessens the risk of virus infection but also gives a sense of protection. Models with different backbone architectures are discussed below.

Militante et al. [12] used convolutional neural networks and the VGG-16 network model to develop an alarm system for face mask and social distancing detection. They trained their model to classify images into those with and without masks. The model also detects human figures and checks whether there is sufficient distance between two figures; if there is not, it flags a social distancing violation. However, a major challenge in predicting social distancing violations is the camera angle. In another paper, Vinh et al. [13] designed a model with YOLOv3 and a Haar cascade classifier. Image enhancement techniques were additionally used to improve accuracy, and the model can run in real time at 30 fps. Loey et al. [14] proposed a YOLOv2 model with ResNet50 to improve accuracy. All of these architectures are computationally heavy for both execution and classification, and they require even more power when the input is a live video stream. To deploy them in real-life situations, we either need powerful computers (compared with embedded systems) or must switch to cloud technology (virtual machines or cloud computing). Jiang et al. [15] proposed a RetinaNet-based face mask detector that uses ResNet or MobileNet as the convolutional backbone, with a feature pyramid network that combines high-level semantic information from multiple feature maps. Nagrath et al. [16] took MobileNet a step further with SSDMNV2, which uses a MobileNetV2 architecture to classify images with face masks, achieving competitive accuracy while requiring far fewer computational resources than other approaches. The challenge remains that these models may not perform well in real-life situations, because currently available datasets are too limited to train the models on every possible viewing angle.
To overcome this problem, hybrid or more complex models are being researched, but they require substantial computing resources to run 24 × 7. Wang et al. [17] provide a deep analysis of the Viola–Jones algorithm, and Vikram et al. [18] demonstrate how effectively this algorithm extracts facial features. Viola et al. [19] proposed the Viola–Jones method for object detection in general. It is indeed a very important algorithm for feature extraction; the question, however, is how well it performs when facial features are hidden behind masks. Ejaz et al. [20] evaluated a model on masked and non-masked face datasets using principal component analysis (PCA), with Viola–Jones as the feature extractor. The results were as expected: the model predicted non-masked faces extremely well but struggled with masked faces. To tackle this problem, another paper proposes a joint cascade approach for feature extraction [21], using Haar cascades to map features and then checking for overlaps to detect a face. Such approaches may be excellent for feature extraction, but they would have to be adapted to our use case of masked faces. Loey et al. [22] proposed an effective hybrid model that combines deep learning with classical machine learning. The model has two parts: feature extraction and classification. The authors used ResNet50 for feature extraction, and an SVM proved to be the best classifier for the mask and no-mask categories. Some of the most recent and significant work in this domain is summarized in Table 1.


Table 1 Recent and significant work in the domain of face mask detection using deep learning and machine learning
Authors | Methodology used | Dataset used | Results
Nagrath et al. 2021 [16] | SSDMNV2 (face detector: single shot multibox detector; classifier: MobileNetV2 architecture) | Collected from various sources | F1-score of 0.93 and accuracy score of 0.9264
Loey et al. 2021 [22] | Hybrid deep learning and machine learning model | Real-world masked face dataset (RMFD), Labeled faces in the wild (LFW), Simulated masked face dataset (SMFD) | 99.64% accuracy on RMFD; 99.49% precision on SMFD; 100% accuracy on LFW
Jignesh Chowdary et al. 2020 [23] | InceptionV3 | Simulated masked face dataset (SMFD) | 99.9% training and 100% testing accuracy
Oumina et al. 2020 [24] | MobileNetV2 as feature extractor and SVM as classifier | Face mask dataset | Classification rate of 97.1%
Asif et al. 2021 [25] | Hybrid model (deep learning and machine learning) | Different videos and images from smartphone camera | 99.2% training accuracy and 99.8% validation accuracy

3 Methodology Used

In recent years, convolutional neural networks (ConvNets) have been widely used to build architectures for specific purposes. The pre-trained deep neural networks MobileNet, Inception, VGG19, DenseNet, and ResNet50 are applied here using transfer learning.

3.1 Datasets

Since wearing masks is a very recent trend, few good-quality datasets are available. The Face Mask 12K dataset from Kaggle was used. Image augmentation, the process of creating additional sample images, was applied to enlarge the dataset and thereby improve training; Fig. 1 shows a original images and b augmented images created from the dataset. The split of images between the training and test sets is a critical decision that can affect the overall quality of the model: 80% of the images were assigned to the training set and the remaining 20% to the test set.

Fig. 1 a Original images and b augmented images
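The 80/20 split can be made reproducible by shuffling with a fixed seed before cutting the list. The sketch below works on any list of items (for example, image file paths); the function name, the seed, and the 1000-item example are hypothetical, and the paper does not say whether its split was stratified by class, which this sketch does not do.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(items: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle items reproducibly and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    paths = [f"img_{i:05d}.png" for i in range(1000)]  # hypothetical file names
    train, test = train_test_split(paths)
    print(len(train), len(test))
```

Using a private `random.Random` instance keeps the split independent of any other code that consumes the global random state, so every model in the comparison can be trained and tested on exactly the same partition.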

3.2 Data Pre-processing

The first stage in pre-processing the image dataset is augmentation, which increases the variability of the data. It randomly rotates, resizes, flips, and shifts images so that the model trains better and predicts more accurately. In this paper, several augmentation parameters are set, including rotation, flipping, and shifting of the image from its original position.
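The paper does not name the augmentation library or its parameter values (in Keras, for instance, `ImageDataGenerator` exposes options such as `rotation_range`, `width_shift_range`, and `horizontal_flip`). To show what two of the listed operations do to the pixel grid, here is a dependency-free sketch of a horizontal flip and a translation with zero padding on a single-channel image; rotation is omitted because it needs interpolation, and the function names and `max_shift` value are hypothetical.

```python
import random
from typing import List

Image = List[List[float]]  # a single-channel image stored as rows of pixel values


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def shift(img: Image, dx: int, dy: int, fill: float = 0.0) -> Image:
    """Translate the image by dx columns and dy rows, padding uncovered pixels with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def random_augment(img: Image, rng: random.Random, max_shift: int = 2) -> Image:
    """Randomly flip (with probability 0.5) and shift an image by up to max_shift pixels."""
    if rng.random() < 0.5:
        img = hflip(img)
    return shift(img, rng.randint(-max_shift, max_shift), rng.randint(-max_shift, max_shift))


if __name__ == "__main__":
    sample = [[1, 2, 3], [4, 5, 6]]
    print(hflip(sample))
    print(random_augment(sample, random.Random(0)))
```

Horizontal flips are a safe choice for faces because a mirrored face is still a plausible face, whereas vertical flips would create unrealistic training samples.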

3.3 ConvNet

The ConvNet [26] was a game-changing breakthrough in computer vision: almost every image classification or computer vision architecture uses a CNN as its backbone. The model was trained for ten epochs on the face mask dataset with the Adam optimizer, binary cross-entropy loss, and a batch size of 32. The training and validation accuracy and loss curves for the CNN model are shown in Fig. 2.
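All the models here end in a single sigmoid output trained with binary cross-entropy, as stated in the results section. The sketch below shows both functions in plain Python: a numerically stable sigmoid and the mean binary cross-entropy with probabilities clipped away from 0 and 1, as deep learning frameworks typically do. The label convention (1 = "with mask") and the clipping constant are assumptions.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, mapping a logit to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return ez / (1.0 + ez)


def binary_cross_entropy(y_true: Sequence[int], y_prob: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)], with p clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


if __name__ == "__main__":
    labels = [1, 0, 1]  # hypothetical: 1 = with mask, 0 = without mask
    probs = [sigmoid(z) for z in (4.0, -3.0, 0.5)]
    print(binary_cross_entropy(labels, probs))
```

The loss is small when confident predictions are correct and grows without bound as a confident prediction becomes wrong, which is why the clipping constant is needed to keep the logarithm finite.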

3.4 MobileNet

MobileNet is an architecture designed specifically for mobile phone CPUs. The model was trained for ten epochs on the face mask dataset with the Adam optimizer, binary cross-entropy loss, and a batch size of 32. The training and validation accuracy and loss curves for the MobileNet model are shown in Fig. 3.
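MobileNet's efficiency on mobile CPUs comes from replacing standard convolutions with depthwise separable ones: a per-channel D_K × D_K depthwise convolution followed by a 1 × 1 pointwise convolution. Following the cost analysis in the original MobileNet paper, the ratio of multiply-accumulates is 1/N + 1/D_K². The sketch below checks this for one hypothetical layer shape (3 × 3 kernel, 512 input and output channels, 14 × 14 output map).

```python
def standard_conv_macs(dk: int, m: int, n: int, df: int) -> int:
    """Multiply-accumulates of a standard dk x dk convolution
    with m input channels, n output channels and a df x df output map."""
    return dk * dk * m * n * df * df


def depthwise_separable_macs(dk: int, m: int, n: int, df: int) -> int:
    """Depthwise dk x dk convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution mixing channels."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise


if __name__ == "__main__":
    std = standard_conv_macs(3, 512, 512, 14)  # hypothetical layer shape
    sep = depthwise_separable_macs(3, 512, 512, 14)
    print(f"{std:,} vs {sep:,} MACs, ratio {sep / std:.3f} (1/N + 1/Dk^2 = {1/512 + 1/9:.3f})")
```

With 3 × 3 kernels the saving is roughly eight to nine times per layer, which is what makes real-time inference on a device like the Raspberry Pi feasible.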


Fig. 2 a Training loss and training accuracy versus epochs, b validation loss and validation accuracy versus epochs for CNN

Fig. 3 a Training loss and training accuracy versus epochs, b validation loss and validation accuracy versus epochs for MobileNet

3.5 VGG19

The VGG19 architecture is especially useful for large-scale image recognition. Its basic principle is to take classical ConvNet layers built from small 3 × 3 filters and increase the network depth substantially. The ConvNet configurations used are explained in detail in [27]. VGG19 stacks 16 convolutional layers interleaved with max-pooling layers, followed by three fully connected (FC) layers and, at the end, a softmax layer, used here for binary classification. The model was trained for ten epochs on the face mask dataset with the Adam optimizer, binary cross-entropy loss, and a batch size of 32. The training and validation accuracy and loss curves for the VGG19 model are shown in Fig. 4.
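The depth of VGG19 is easiest to see in its layer configuration. The sketch below encodes the standard ImageNet configuration from [27] (numbers are output channels of 3 × 3 convolutions, "M" marks a 2 × 2 max-pooling layer) and counts the learnable parameters of the convolutional base. It excludes the fully connected head, which differs in this paper's binary classifier; `VGG19_CFG` and `conv_base_params` are hypothetical names.

```python
# Standard VGG19 convolutional configuration: numbers are output channels of
# 3 x 3 convolutions, "M" marks a 2 x 2 max-pooling layer.
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]


def conv_base_params(cfg, in_channels: int = 3, kernel: int = 3) -> int:
    """Count weights and biases of the convolutional layers in a VGG-style config."""
    total, c_in = 0, in_channels
    for layer in cfg:
        if layer == "M":
            continue  # pooling layers have no learnable parameters
        total += (kernel * kernel * c_in + 1) * layer  # weights + one bias per filter
        c_in = layer
    return total


if __name__ == "__main__":
    n_conv = sum(1 for layer in VGG19_CFG if layer != "M")
    print(f"{n_conv} conv layers, {conv_base_params(VGG19_CFG):,} parameters")
```

About 20 million parameters sit in the convolutional base alone, and every one of them is applied at each spatial position, which is why the next section describes VGG as computationally expensive.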

3.6 Inception

Although VGG produces promising results, it is computationally expensive. In [28], the authors explain how optimizing the network and its layers can dramatically reduce computational costs for large datasets. Inception was evaluated on the ImageNet dataset and was designed especially for image classification and recognition on huge datasets. The model was trained for ten epochs on the face mask dataset with the Adam optimizer, binary cross-entropy loss, and a batch size of 32. The training and validation accuracy and loss curves for the Inception model are shown in Fig. 5.

Fig. 4 a Training loss and training accuracy versus epochs, b validation loss and validation accuracy versus epochs for VGG19

Fig. 5 a Training loss and training accuracy versus epochs, b validation loss and validation accuracy versus epochs for Inception
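One of the main cost-saving ideas in Inception is the 1 × 1 "bottleneck" convolution, which reduces the number of channels before an expensive larger convolution. The sketch below compares the multiply-accumulates of a direct 5 × 5 convolution with a 1 × 1 reduction followed by the same 5 × 5 convolution. The layer shape (28 × 28 map, 192 → 32 channels, reduced to 16) is an illustrative assumption, with stride 1 and same padding.

```python
def conv_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """Multiply-accumulates of a k x k convolution producing an h x w x c_out map
    (stride 1, same padding)."""
    return h * w * k * k * c_in * c_out


# Illustrative layer shape (an assumption): 28 x 28 map, 192 input channels, 32 output channels.
direct = conv_macs(28, 28, 5, 192, 32)
bottleneck = conv_macs(28, 28, 1, 192, 16) + conv_macs(28, 28, 5, 16, 32)

if __name__ == "__main__":
    print(f"direct 5x5: {direct:,} MACs")
    print(f"1x1 -> 5x5: {bottleneck:,} MACs ({direct / bottleneck:.1f}x fewer)")
```

The output shape is the same in both cases, but the bottleneck version needs roughly a tenth of the arithmetic, which is the kind of optimization [28] uses to keep very deep networks tractable.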

3.7 DenseNet

DenseNet [29] is designed to avoid the loss of information across a large number of layers. A traditional ConvNet with L layers has L connections, one between each layer and the next. DenseNet, in contrast, has L(L + 1)/2 direct connections, because each layer receives the feature maps of all preceding layers as input and passes its own feature maps to all subsequent layers. The model was trained for ten epochs on the face mask dataset with the Adam optimizer, binary cross-entropy loss, and a batch size of 32. The training and validation accuracy and loss curves for the DenseNet model are shown in Fig. 6.

Fig. 6 a Training loss and training accuracy versus epochs, b validation loss and validation accuracy versus epochs for DenseNet
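The L(L + 1)/2 count follows because layer l of a dense block receives l inputs: the block input plus the outputs of the l − 1 earlier layers. Because those inputs are concatenated, the channel count grows linearly with a "growth rate" k. The sketch below computes both. The example values (64 input channels, growth rate 32, six layers) match the first dense block of DenseNet-121 but are used here only for illustration, and the function names are hypothetical.

```python
def dense_connections(num_layers: int) -> int:
    """Direct connections in a dense block of L layers: layer l receives l inputs
    (the block input plus the l - 1 earlier layers), so the total is 1 + 2 + ... + L."""
    return num_layers * (num_layers + 1) // 2


def dense_block_channels(k0: int, growth_rate: int, num_layers: int) -> list:
    """Channels available after each layer, since every layer concatenates
    growth_rate new feature maps onto everything produced before it."""
    return [k0 + growth_rate * layer for layer in range(1, num_layers + 1)]


if __name__ == "__main__":
    print(dense_connections(5))
    print(dense_block_channels(64, 32, 6))
```

Because each layer adds only k new feature maps while reusing all earlier ones, DenseNet layers can be narrow, which keeps the parameter count modest despite the dense connectivity.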

3.8 ResNet50

ResNet50 is the 50-layer variant of ResNet: it has 49 convolutional layers and one fully connected layer, together with one max-pooling and one average-pooling layer, and it requires about 3.8 × 10^9 floating-point operations. The model was trained for ten epochs on the face mask dataset with the Adam optimizer, binary cross-entropy loss, and a batch size of 32. The training and validation accuracy and loss curves for the ResNet50 model are shown in Fig. 7.
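The "50" in ResNet50 counts weighted layers by the usual convention: one 7 × 7 stem convolution, four stages of bottleneck blocks with [3, 4, 6, 3] blocks of three convolutions each, and one final fully connected layer (1 × 1 projection shortcuts are not counted). The sketch below reproduces that count and also shows the identity shortcut y = F(x) + x that gives residual networks their name; the helper names are hypothetical, and `residual` works on plain lists only for illustration.

```python
# Bottleneck blocks per stage in ResNet-50 (conv2_x .. conv5_x in He et al.)
RESNET50_STAGES = [3, 4, 6, 3]
CONVS_PER_BOTTLENECK = 3  # 1x1 reduce, 3x3, 1x1 expand


def weighted_layers(stages, convs_per_block: int = CONVS_PER_BOTTLENECK) -> int:
    """Count weighted layers by the usual convention: stem conv + block convs + final FC.

    1x1 projection shortcuts are not counted, following the usual naming convention.
    """
    stem_conv = 1  # 7x7 convolution before the max-pooling layer
    block_convs = sum(stages) * convs_per_block
    fully_connected = 1  # classifier after global average pooling
    return stem_conv + block_convs + fully_connected


def residual(x, f):
    """Identity shortcut of a residual block: y = F(x) + x (element-wise)."""
    return [fx + xi for fx, xi in zip(f(x), x)]


if __name__ == "__main__":
    print(weighted_layers(RESNET50_STAGES))  # ResNet-50
    print(weighted_layers([3, 4, 23, 3]))  # ResNet-101 uses the same scheme
```

The shortcut lets each block learn only a correction to its input, so stacking many blocks does not make optimization harder, which is what allows ResNet to go much deeper than VGG.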

Fig. 7 a Training loss and training accuracy versus epochs, b validation loss and validation accuracy versus epochs for ResNet50


4 Result

The experiments were carried out on an 11th Gen Intel Core i5-1135G7 @ 2.40 GHz, 64-bit processor, using a Kaggle Notebook with Kaggle's built-in GPU. A CNN-based model with three convolutional layers is trained first. Each layer uses ReLU as the activation function, and max-pooling layers with a 2 × 2 kernel are used between layers. The output is then flattened and passed to a final dense layer with sigmoid activation; the optimizer is Adam and the loss function is binary cross-entropy. BATCH_SIZE and EPOCH are set to 32 and 10 throughout the code. The results are given in Table 2. Among all the architectures compared, DenseNet was the most stable: both the table and the graphs show that it has the highest validation accuracy and the lowest validation loss. The second-best architecture was MobileNet, which was very stable apart from a slight fluctuation around the sixth epoch. The confusion matrices of all the models, shown in Fig. 8, were computed to analyze their performance. For each model, the confusion matrix provides four values: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Table 3 gives the classification report for all the models, with recall, precision, and F1-score. The metric used to evaluate the models is the accuracy score; MobileNet and DenseNet give the highest accuracy scores, 99.59% and 99.69%, respectively. This comparison leads to the question of deploying such an implementation on a practical device. The most suitable architecture for this purpose is MobileNet, whose design targets mobile hardware, typically ARM architectures.
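The classification-report metrics follow directly from the four confusion-matrix counts described above. The sketch below derives the counts from label lists and computes precision, recall, F1-score, and accuracy. The counts in the example are hypothetical, not the paper's results. A per-class report such as Table 3 repeats the calculation with each class (with mask, without mask) treated as the positive class in turn.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for a binary problem."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall, F1-score (harmonic mean of the two) and accuracy."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    print(classification_metrics(tp=490, tn=495, fp=5, fn=10))  # hypothetical counts
```

Accuracy alone can hide a weak class when the test set is imbalanced, which is why the per-class precision and recall in the classification report complement the accuracy figures in Table 2.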
Table 2 Training and validation accuracy and training and validation loss of the six models over 10 epochs
Model | Epochs | Loss | Accuracy | Validation loss | Validation accuracy
ConvNet | 10 | 0.0283 | 0.9892 | 0.0237 | 0.9925
Inception | 10 | 0.0040 | 0.9787 | 0.0059 | 0.9675
MobileNet | 10 | 0.0031 | 0.9989 | 0.0077 | 0.9987
DenseNet | 10 | 0.0066 | 0.9979 | 0.0011 | 1.000
VGG19 | 10 | 0.0542 | 0.9786 | 0.0327 | 0.9912
ResNet50 | 10 | 0.2013 | 0.9192 | 0.1793 | 0.9375

Fig. 8 a Confusion matrix for CNN using Adam optimizer, b confusion matrix for DenseNet using Adam optimizer, c confusion matrix for Inception using Adam optimizer, d confusion matrix for MobileNet using Adam optimizer, e confusion matrix for ResNet50 using Adam optimizer, and f confusion matrix for VGG19 using Adam optimizer

Most SoCs (systems on chip), such as the one used in the Raspberry Pi, are ARM-based, and MobileNet gives promising results on them. Deploying a MobileNet model on a Raspberry Pi requires software to capture video and to extract features, and the challenge was to keep the feature extractor light on computational resources as well. Here, Caffe [23] is used as the feature extractor. It is developed by Berkeley AI Research and is highly customizable for different architectures. It can use a GPU if one is installed, but it also works well under hardware resource limitations. An alternative choice was OpenCV. The model was trained once on the Google Colab platform with GPU support and saved as a .model file that can be reused by other applications; the Caffe-based code then loads this .model file and uses it to predict face masks. EPOCH and BATCH_SIZE were set to 20 and 32, the optimizer was Adam, and the loss was binary cross-entropy. The data were split into 80% training and 20% test images. The input shape is 224 × 224 × 3, and the weights were initialized from ImageNet. The results of the model are as follows. A Raspberry Pi 2B with 1 GB RAM was chosen as the deployment hardware, with any compatible camera module as the camera input. With just 1 GB RAM, it is challenging for any deep learning model to run smoothly. After installing all the dependencies and launching the model, the results were very good: the model ran relatively smoothly, video frames were not dropped, and the model predicted as well as it did on the Intel i3 5th Gen laptop. The only places where

0.99

With mask

0.95

0.95

Without mask

With mask

ResNet50

1.00

DenseNet121

Precision

Without mask

Classification report

0.94

0.95

1.00

0.99

Recall

0.94

0.95

1.00

1.00

F1-score

0.99

0.97

CNN

0.99

1.00

MobileNet

Precision

Table 3 Classification report of the models in terms of precision, recall, and F1-score

0.97

0.99

1.00

0.99

Recall

0.98

0.98

1.00

1.00

F1-score

0.99

0.99

InceptionV3

0.98

1.00

VGG19

Precision

0.99

0.99

1.00

0.98

Recall

0.99

0.99

0.99

0.99

F1-score


Fig. 9 Model working from a front, b left, and c right profile and the confidence in predicting

the model struggled, and its prediction confidence dropped, were those in which the viewing angle changed substantially, i.e., when only the person's side profile was visible rather than the complete face. However, this is attributable to the dataset rather than the model. As Fig. 9 shows, such a model can be deployed directly on board a camera or on any other low-resource hardware.

5 Conclusion

The use of masks is one of the most essential measures for limiting the transmission of the COVID-19 virus. Recent studies show that a variety of deep learning architectures can be used extensively for image classification and object detection. This study provided a detailed empirical analysis of several such architectures, from their internal structures to their performance on real-world data. DenseNet provided the highest accuracy, 99.89%, followed by MobileNet with 99.79%, while ResNet50 gave the lowest training accuracy, 91.92%. MobileNet was of special importance, however, since its architecture is designed specifically for mobile hardware. A model built with MobileNet and Caffe was deployed on a Raspberry Pi 2B with only 1 GB of RAM, and it performed adequately without the frame rate dropping to a very low level. The aim of this study was to provide a face mask detection solution that is accessible to everyone.

References 1. Martin G, Hanna E, Dingwall R (2020) Face masks for the public during Covid-19: an appeal for caution in policy 2. Siegfried IM (2020) Comparative study of deep learning methods in detection face mask utilization 3. Atalan A (2020) Is the lockdown important to prevent the COVID-19 pandemic? Effects on psychology, environment and economy-perspective. Ann Med Surg 56:38–42


4. Bhatt S, Dev A, Jain A (2021) Effects of the dynamic and energy-based feature extraction on Hindi speech recognition. Recent Adv Comput Sci Commun 14(5):1422–1430 5. Bhatt S, Jain A, Dev A (2021) Feature extraction techniques with analysis of confusing words for speech recognition in the Hindi language. Wirel Pers Commun 1–31. https://doi.org/10.1007/s11277-021-08181-0 6. Agrawal SS, Jain A, Sinha S (2016) Analysis and modeling of acoustic information for automatic dialect classification. Int J Speech Technol 19:593–609. https://doi.org/10.1007/s10772-016-9351-7 7. Bhatt S, Dev A, Jain A (2020) Confusion analysis in phoneme based speech recognition in Hindi. J Ambient Intell Humanized Comput 11(10):4213–4238 8. Bhatt S, Jain A, Dev A (2021) Continuous speech recognition technologies-a review. Recent Dev Acoust, pp 85–94 9. Alzubi JA, Jain R, Singh A, Parwekar P, Gupta M (2021) COBERT: COVID-19 question answering system using BERT. Arab J Sci Eng 1–11 10. Howard J, Huang A, Li Z, Tufekci Z, Zdimal V, van der Westhuizen HM, Rimoin AW (2020) Face masks against COVID-19: an evidence review 11. Verma S, Dhanak M, Frankenfield J (2020) Visualizing the effectiveness of face masks in obstructing respiratory jets. Phys Fluids 32(6):061708 12. Militante SV, Dionisio NV (2020) Deep learning implementation of facemask and physical distancing detection with alarm systems. In: 2020 Third international conference on vocational education and electrical engineering (ICVEE). IEEE, pp 1–5 13. Vinh TQ, Anh NTN (2020) Real-time face mask detector using YOLOv3 algorithm and Haar cascade classifier. In: 2020 International conference on advanced computing and applications (ACOMP). IEEE, pp 146–149 14. Loey M, Manogaran G, Taha MHN, Khalifa NEM (2021) Fighting against COVID-19: a novel deep learning model based on YOLO-v2 with ResNet-50 for medical face mask detection. Sustain Cities Soc 65:102600 15. Jiang M, Fan X, Yan H (2020) Retinamask: a face mask detector. arXiv preprint arXiv:2005.03950 16. Nagrath P, Jain R, Madan A, Arora R, Kataria P, Hemanth J (2021) SSDMNV2: a real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2. Sustain Cities Soc 66:102692 17. Wang YQ (2014) An analysis of the Viola-Jones face detection algorithm. Image Process Line 4:128–148 18. Vikram K, Padmavathi S (2017) Facial parts detection using Viola Jones algorithm. In: 2017 4th International conference on advanced computing and communication systems (ICACCS). IEEE, pp 1–4 19. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001, vol 1. IEEE, p I 20. Ejaz MS, Islam MR, Sifatullah M, Sarker A (2019) Implementation of principal component analysis on masked and non-masked face recognition. In: 2019 1st International conference on advances in science, engineering and robotics technology (ICASERT). IEEE, pp 1–5 21. Chen D, Ren S, Wei Y, Cao X, Sun J (2014) Joint cascade face detection and alignment. In: European conference on computer vision. Springer, Cham, pp 109–122 22. Loey M, Manogaran G, Taha MHN, Khalifa NEM (2021) A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Measurement 167:108288 23. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia, pp 675–678 24. Chowdary GJ, Punn NS, Sonbhadra SK, Agarwal S (2020) Face mask detection using transfer learning of inceptionv3. In: International conference on Big Data analytics. Springer, Cham, pp 81–90


25. Oumina A, El Makhfi N, Hamdi M (2020) Control the covid-19 pandemic: face mask detection using transfer learning. In: 2020 IEEE 2nd international conference on electronics, control, optimization and computer science (ICECOCS). IEEE, pp 1–5 26. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324 27. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 28. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826 29. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708

Efficient System to Predict Harvest Based on the Quality of the Crop Using Supervised Techniques and Boosting Classifiers
S. Divya Meena, Jahnavi Chakka, Srujan Cheemakurthi, and J. Sheela

Abstract Many agricultural workers have lost their jobs as a result of crop damage, which is now the most frequent cause of farm losses. Because the harvest season determines farmers’ annual revenue, losses across several harvest seasons lead to a string of yearly losses. Crop forecasting is also crucial because if one crop in a field is harmed, there is a significant likelihood that crops of the same type will also be affected. Damage to a small area can have a significant negative impact on the entire field and its revenue, so it is crucial to monitor crop status periodically. In this study, we built an effective predictor using a variety of machine learning techniques to increase accuracy and precision. The boosting technique LightGBM outperformed the other methods with an accuracy of 97.2% and is useful for predicting crop status. The predictor produces immediate, effective results that help farmers adjust their practices and simplify their work. Keywords Boosting algorithms · Supervised learning algorithms · Agriculture · Crop status

S. Divya Meena · J. Chakka · S. Cheemakurthi · J. Sheela (B) School of Computer Science and Engineering, VIT-AP University, Amaravati, India e-mail: [email protected] S. Divya Meena e-mail: [email protected] J. Chakka e-mail: [email protected] S. Cheemakurthi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_20

1 Introduction Agriculture is one of the largest and most significant industries in the world, so increasing its output is essential for the security and health of a nation’s population. Pests and pathogens decimate crops all over the planet, and pests in particular have significantly reduced crop productivity. The abuse of pesticides in agriculture has led to increased residues in plants, insect resistance, and soil, water, and air pollution. Farmers have a key role in improving the agricultural environment, and by switching to sustainable production practices they can do a great deal of good. Machine learning techniques have been adopted in every industry, including agriculture, where data are used in various ways to help farmers ensure a good harvest season and a healthy plantation. Nevertheless, a number of issues in the agricultural sector remain unresolved, and technological advances could help farmers use their supplies even more efficiently. Agriculture is essential to human life, directly or indirectly, because it is the source of our food. Pesticides are highly beneficial to crops throughout the harvest season, but only if they are restricted to the calibrated levels required; otherwise, farmers risk huge crop losses, resulting in a torrent of debts and tough circumstances. This could be avoided if crop condition were assessed at the early stages of the season, allowing farmers to take the necessary precautions. Determining crop status is, however, a challenging procedure involving numerous complex phases: although growth and yield algorithms can already estimate actual yield very well, more accurate yield prediction is still required. Figure 1 shows the crop region. Machine learning (ML) is the field of science that enables computers to learn without being explicitly programmed [1, 2].
Agribusiness [3, 4], biochemistry [5, 6], health care [6–8], meteorology [9–11], economic studies [12–14], automation [15, 16], informatics [17, 18], and climatology [19] are just a few of the fields using machine learning. Crop status prediction is one of the trickiest issues in precision agriculture; nevertheless, several approaches have been developed and shown to work. This paper considers the crop plantation season and the crop’s outcome at the end of that harvest season. The data include details on plantations that have already produced crops, such as the amount of pesticide applied, the number of insects present, the soil type, the crop type, and more, together with the crop damage status of each harvested plantation. A comparative analysis of several procedures is carried out: to raise prediction efficiency and accuracy, random forest, KNN, decision tree, Gaussian naïve Bayes, AdaBoost, XGBoost, and LightGBM are applied. The aim is to find the best-fit approach with the highest accuracy, one that will streamline farmers’ tasks and help them produce profitable crops.

Fig. 1 Crop region

2 Literature Survey Our study delivers a system-based forecast using data gathered from a wide range of latitudes, providing crop-strength forecasts before the start of the seeding and harvesting seasons. A study of global and regional crop yield prediction with decision-tree methods, from the University of Minnesota’s Institute for the Climate (St. Paul, Minnesota 55108, USA), points to random forest (RF) as a particularly effective ML strategy for forecasting regional and global crop yields owing to its accuracy and performance. Another approach uses two layers, one for regression and the other for k-nearest neighbors. In [20], agricultural yield was predicted using machine learning. A study in the International Review of Technology Scientific Research estimates agricultural productivity from available data using the random forest method. Analyzing various data mining methods for predicting agricultural production, Beulah found that they could help with the major problems in this area [21]. Techniques addressing excessive chemical use have also been presented: according to [22], there is a connection between chemical use and agricultural productivity. Its models were created with real-time data from Tamil Nadu (TN) and tested with survey samples, and in some instances the random forest algorithm estimated crop production accurately. The paper [23] presents a comprehensive review of research on machine learning methods in agricultural sectors that have contributed to the advancement of technology in this field. Machine learning, together with digitalization and advanced computing, offers new opportunities to examine, measure, and assess processes in agricultural sectors. That paper’s implementation used support vector machines (SVM). Chlingaryan and Sukkarieh reviewed the literature on nitrogen status estimation using machine learning [24]. The study concludes that the rapid development of sensor technology and machine learning techniques will produce cost-effective agricultural solutions [25, 26]. A survey of the literature on various ML algorithms for estimating agricultural production from meteorological factors suggests broadening the search to include more yield-related parameters [27]. A recent review article on the application of machine learning in agriculture also helps to widen the search for other agricultural yield factors [28, 29]; its analysis drew on the literature on managing soil, water, livestock, and crops. Li et al. [30] reviewed research on anticipating fruit maturity in order to establish the optimal harvest date and predict yield.

3 Proposed Methodology Two methodologies are investigated in this work to obtain more true positives when predicting new data points. First, the data are thoroughly examined to ensure that each column has a continuous set of data points. The dependent variable takes three values, [0, 1, 2], corresponding to [crop alive, crop damaged by pesticides, crop damaged for other reasons], and there are eight independent variables. During data preprocessing, missing values and outliers are replaced with the column means, after which numerous machine learning techniques are applied, as displayed in Figs. 2 and 3. We apply several algorithms, namely random forest, KNN, decision trees, AdaBoost, XGBoost, Gaussian naïve Bayes, and LightGBM, and then identify the algorithm that fits the model with the highest accuracy, precision, recall, and F1-score. A second strategy to boost effectiveness is then presented: the dataset is expanded to include every potential grouping and point in time, and all algorithms are retrained on the expanded dataset, which produced markedly more accurate results.

Fig. 2 Model’s approach-I architecture


Fig. 3 Model’s approach-II architecture
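The preprocessing step described above, which replaces missing values and outliers with column means, might look like the following plain-Python sketch for a single column. It is illustrative rather than the authors’ code: the function name `impute_with_column_mean` is hypothetical, missing values are assumed to be encoded as `None` or NaN, and the outlier rule (a value more than `z_threshold` population standard deviations from the mean) is an assumption, because the paper does not state how outliers were identified.

```python
import math
from statistics import mean, pstdev


def impute_with_column_mean(column, z_threshold=3.0):
    """Replace missing values (None/NaN) and outliers in one numeric column with
    the mean of the remaining inlier values.

    Assumption: an outlier is a value farther than z_threshold population
    standard deviations from the mean of the observed values.
    """
    observed = [v for v in column if v is not None and not math.isnan(v)]
    mu = mean(observed)
    sigma = pstdev(observed)

    def is_outlier(v):
        return sigma > 0 and abs(v - mu) > z_threshold * sigma

    # The replacement value is computed from observed, non-outlying values only.
    fill = mean(v for v in observed if not is_outlier(v))
    cleaned = []
    for v in column:
        if v is None or math.isnan(v):
            cleaned.append(fill)        # missing value -> column mean
        elif is_outlier(v):
            cleaned.append(fill)        # outlier -> column mean
        else:
            cleaned.append(v)           # ordinary value kept as is
    return cleaned


print(impute_with_column_mean([10, 12, None, 11, 13, 1000], z_threshold=1.5))
# [10, 12, 11.5, 11, 13, 11.5]
```

With only five observed values, no point can lie more than two population standard deviations from the mean, which is why this toy example uses a lower threshold; on a full column of the dataset a threshold such as 3 would be more typical.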

Table 1 Dataset description

Reference              Size of dataset (MB)   Types
Takahashi et al. [12]  5.89                   Crop status, absence of pesticides, absence of weekly pesticide doses
Aybar-Ruiz et al. [2]  1.7                    The damage of the crop
Kang et al. [8]        25                     No. of fertilizers, crop status, no. of pesticides

4 Experimental Framework 4.1 Dataset Description We used a Kaggle dataset for testing. A total of 148,172 samples were collected from various sources for the proposed model; they are divided into 88,860 samples for model training and 59,312 samples for model validation. The dataset’s attributes include an ID feature. The state of a crop is assessed from the estimated insect population, crop type, soil type, pesticide use category, number of doses per week, number of weeks used, and number of weeks halted, with crop damage as the dependent variable. These nine characteristics map to three general crop categories: crop dead, crop alive, and crop damaged. Table 1 describes the datasets.
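The reported split can be checked with a little arithmetic: 88,860 + 59,312 = 148,172, so roughly 60% of the samples are used for training and 40% for validation. The sketch below produces a split of those sizes by shuffling sample indices; the function name, the random shuffle, and the seed are assumptions, since the paper does not say how samples were assigned to each part.

```python
import random


def train_validation_split(n_samples, n_train, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into a training part of
    size n_train and a validation part holding the remaining indices."""
    if not 0 <= n_train <= n_samples:
        raise ValueError("n_train must lie between 0 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = train_validation_split(148_172, 88_860)
print(len(train_idx), len(val_idx))        # 88860 59312
print(round(len(train_idx) / 148_172, 3))  # 0.6
```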

4.2 System Requirements These tests were conducted on a laptop with an NVIDIA GeForce 940M GPU (version 376.82), 8 GB of RAM, and a 2.40 GHz Intel(R) Core(TM) i5-5200U processor. A 1 TB Seagate hard drive was used for storage.


4.3 Performance Metrics Accuracy: Accuracy is the fraction of new data points that the algorithm classifies correctly. For instance, if the algorithm is tested on 100 new data points and 97 of them are classified correctly, the accuracy is 97%.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.
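Eq. (1) and the example of 97 correctly classified points out of 100 can be checked with a short sketch. The split of the 97 correct predictions into true positives and true negatives, and of the 3 errors into false positives and false negatives, is hypothetical; only the totals come from the example. The `multiclass_accuracy` helper, also illustrative, extends the same idea to the three crop-status classes by dividing the diagonal of a confusion matrix by the total count; its matrix is made up.

```python
def accuracy(tp, tn, fp, fn):
    """Binary accuracy as in Eq. (1): correct predictions over all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined for zero samples")
    return (tp + tn) / total


def multiclass_accuracy(confusion):
    """Accuracy for a k x k confusion matrix (rows = true class, columns =
    predicted class); the diagonal holds the correct predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("accuracy is undefined for zero samples")
    return correct / total


# 97 of 100 new points classified correctly; the TP/TN and FP/FN split is hypothetical.
print(accuracy(tp=60, tn=37, fp=2, fn=1))  # 0.97

# Hypothetical three-class matrix for crop status 0, 1 and 2.
print(multiclass_accuracy([[50, 2, 1], [3, 30, 1], [1, 1, 11]]))  # 0.91
```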

Confusion Matrix The confusion matrix is a way of summarizing the performance of a classification algorithm. Classification accuracy alone can be misleading when the classes contain unequal numbers of samples or when the dataset has more than two classes. Calculating the confusion matrix gives a more accurate picture of the classification model’s strengths and weaknesses.

Precision Precision is the ratio TP/(TP + FP), where TP is the total number of true positives and FP is the total number of false positives. It measures the classifier’s ability to avoid labeling a negative sample as positive; the worst value is 0 and the best value is 1.

$$X_p = \frac{TP}{TP + FP} \qquad (2)$$

where Xp denotes precision.

Recall Recall is the classifier’s ability to find all positive instances. Its highest value is 1 and its lowest is 0.

$$Y_p = \frac{TP}{TP + FN} \qquad (3)$$

where Yp denotes recall, TP is the number of true positives, and FN is the number of false negatives.

F1-Score The F1-score, also known as the balanced F-score or F-measure, can be interpreted as the harmonic mean of precision and recall, with a best value of 1 and a worst value of 0. Precision and recall contribute equally to the F1-score.


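Table 2 Measures of the approach performance

S. No.  Approach                        Accuracy (%)  Precision  Recall
1       (a) Random forest classifier    82.2          0.74       0.81
2       (b) K-nearest neighbors         84.3          0.79       0.84
3       (c) Decision tree classifier    75.3          0.66       0.71
4       (d) Gaussian naïve Bayes        82.4          0.75       0.80
5       (e) AdaBoost                    84            0.79       0.84
6       (f) LightGBM                    84.6          0.82       0.85
7       (g) XGBoost                     80            0.72       0.65

$$F_1 = 2 \cdot \frac{X_p \cdot Y_p}{X_p + Y_p} \qquad (4)$$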

where Xp is the precision, Yp is the recall, and F1 is the F1-score.

Loss A lower loss generally indicates a better model, unless the model has overfit the training data. The loss is computed separately for the training and test sets, and its values describe the model’s performance after each optimization iteration. Ideally, the loss should decrease after every iteration or every few iterations. Table 2 shows the performance measures of each approach, where (a) random forest classifier, (b) K-nearest neighbors, (c) decision tree classifier, (d) Gaussian naïve Bayes, (e) AdaBoost, (f) LightGBM, and (g) XGBoost.
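Because the crop-status target has three classes, Eqs. (2)–(4) have to be applied per class, treating each class in turn as the positive one. The following plain-Python sketch shows one way to do this from a confusion matrix and to macro-average the results. It is an illustration, not the authors’ evaluation code, and the confusion matrix in the usage lines is hypothetical.

```python
def per_class_scores(confusion):
    """Precision (Xp), recall (Yp) and F1 for every class of a k x k confusion
    matrix, where confusion[i][j] counts samples of true class i predicted as j."""
    k = len(confusion)
    scores = []
    for c in range(k):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(k)) - tp  # predicted c, truly another class
        fn = sum(confusion[c]) - tp                        # truly c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0     # Eq. (2)
        recall = tp / (tp + fn) if tp + fn else 0.0        # Eq. (3)
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)              # Eq. (4)
        scores.append((precision, recall, f1))
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall and F1 over all classes."""
    k = len(scores)
    return tuple(sum(s[m] for s in scores) / k for m in range(3))


# Hypothetical counts for crop status 0, 1 and 2.
confusion = [[80, 15, 5],
             [10, 30, 0],
             [5, 5, 10]]
scores = per_class_scores(confusion)
for label, (p, r, f) in enumerate(scores):
    print(f"class {label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print("macro:", [round(v, 3) for v in macro_average(scores)])
```

Macro-averaging weights every class equally, which is useful when, as in crop-damage data, the classes are imbalanced and accuracy alone would be dominated by the majority class.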

5 Results and Discussion After the harvest period at the government-owned agricultural plantation, results are forecast using a number of methodologies in order to determine the one that produces the highest accuracy. When the performance measurements of each algorithm are examined, LightGBM outperforms KNN, random forest, decision tree, naïve Bayes, AdaBoost, and XGBoost. In this first method, which combines exploratory data analysis with model training, the trained model achieved a maximum accuracy of 84.6%, so a second method for boosting accuracy was developed. In the second method, we added more columns to the existing dataset in order to build patterns in the data and incorporate some inconsistent data, and then repeated exploratory data analysis and feature engineering. Performance increased, and the boosting technique LightGBM achieved the highest accuracy of all the algorithms, at 97.2%.

Data Visualization for the Harvest Case 1: Correlation captures the strong and weak relationships between the columns. To aid feature engineering, each feature offers distinct information that can be used in a variety of ways to establish both known and previously undiscovered relationships. Figure 4 shows how each attribute of the data relates to the others; the brighter the color, the stronger the correlation between the attributes. The plot shows, for example, that the estimated number of insects has a correlation of 0.41 with the number of weeks used, and the number of weeks used has a correlation of 0.3 with the pesticide use category. This analysis deepens our understanding of the relationships between the traits and of how changing one affects another. Case 2: Fig. 5 plots crop type against crop damage; damage is considerably more prevalent for one crop type than for the other, from which certain conclusions can be drawn. Case 3: Crop damage versus pesticide use category is depicted in Fig. 6. Case 4: The distribution of the number of weeks used in relation to the dependent class variable is depicted in Fig. 7. Case 5: The bar graph in Fig. 8 displays crop damage in relation to crop type and the estimated number of insects. Its legend identifies the visual cues that distinguish the data groups, here the crop types labeled 0 and 1, which makes the grouping easier to interpret. The prediction system’s output is shown in Fig. 9, where users can fill in the necessary fields and quickly determine the crop status.

Fig. 4 Correlation plot

Fig. 5 Crop_Type grouped count plot

Fig. 6 Crop damage versus pesticide use category

Fig. 7 Weeks in relation to the dependent variable

Fig. 8 Crop damage based on the kind of crop and estimated insect population

Fig. 9 Flask deployment
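Case 1 reads coefficients such as 0.41 and 0.3 off the correlation plot. Assuming these are Pearson coefficients (the paper does not name the coefficient used), each cell of such a plot can be computed as in the sketch below. The column names and values in the usage lines are hypothetical toy data, not values from the crop dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length (at least 2 values)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def correlation_matrix(columns):
    """All pairwise correlations for a dict mapping column name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical toy columns, not taken from the dataset.
columns = {
    "insects": [100, 150, 220, 300, 410],
    "weeks_used": [0, 10, 15, 20, 30],
    "pesticide_category": [1, 1, 2, 2, 3],
}
for (a, b), r in correlation_matrix(columns).items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: {r:+.2f}")
```

A correlation of 0.41 indicates a moderate linear association between two columns, not that one column depends causally on the other.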

5.1 Comparative Analysis The main goal is to predict how the crop plantation will turn out at the end of the harvest season. Machine learning is therefore used to tackle this problem, and a comparative analysis was carried out by fitting the model with several classification algorithms: Gaussian NB, K-nearest neighbor, decision tree classifier, XGBoost, AdaBoost, and LightGBM. The results are shown in Table 3. LightGBM was found to perform well in comparison with the other algorithms. In the first method, exploratory data analysis was performed and the model was trained; its maximum accuracy was 84.6%. The second method involved adding new columns to the large dataset, generating lags between the columns to establish trends in the data, and adding some erroneous data, before repeatedly applying exploratory data analysis and feature engineering. The algorithms were then retrained on this data, and a sharp rise in accuracy was observed, leading to the best predictive performance. Accuracy improved for every algorithm except Gaussian naïve Bayes, whose accuracy fell from 82.4% to 77.4%, and the boosting technique LightGBM achieved the highest accuracy of all, at 97.2%.

Table 3 Comparison of improved accuracy with earlier accuracy

S. No.  Approach                        Earlier accuracy (%)  Improved accuracy (%)
1       (a) LightGBM                    84.6                  97.2
2       (b) Random forest classifier    82.2                  92.6
3       (c) K-nearest neighbor          84.3                  93.2
4       (d) Gaussian naïve Bayes        82.4                  77.4
5       (e) Decision tree classifier    75.3                  90.3
6       (f) AdaBoost                    84                    95.4
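The lag columns used in the second method can be illustrated with a small sketch. The paper does not specify which columns were lagged, by how many steps, or how rows were ordered, so the row ordering (assumed chronological, for example by sample ID), the column name `insects`, and the `_lagK` naming scheme below are all hypothetical.

```python
def add_lag_features(rows, column, lags, fill=None):
    """Return new rows with extra columns f"{column}_lag{k}" holding the value of
    `column` from k rows earlier; rows without k predecessors get `fill`.

    Rows are assumed to be sorted in time order; the input list is not modified.
    """
    out = []
    for i, row in enumerate(rows):
        new_row = dict(row)  # copy so the input rows stay unchanged
        for k in lags:
            new_row[f"{column}_lag{k}"] = rows[i - k][column] if i >= k else fill
        out.append(new_row)
    return out


# Hypothetical insect counts for consecutive observations of one plantation.
rows = [{"insects": 10}, {"insects": 20}, {"insects": 30}]
for r in add_lag_features(rows, "insects", lags=[1, 2]):
    print(r)
```

Rows at the start of the sequence have no earlier values, so their lag columns receive `fill`; in practice these would be imputed (for example with the column mean) or dropped before training.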

6 Conclusion and Future Work Based on the information available in this field, this study found that the publications examined use a number of different features. Almost all publications built their prediction systems using ML techniques, although the features used vary. To determine which model performs best, models with different numbers of features should be compared. Different algorithms have been used in different studies, and although it is difficult to say which model is superior, the data show that the choice of machine learning technique varies. Among the most popular models are CNN, linear regression, and neural networks. To determine which model could predict outcomes most accurately, many studies compared different models. In this investigation, the LightGBM classifier gave the highest prediction accuracy. A website has also been built; it aims to predict agricultural yield from the data provided for a given region. This work is expected to lay the groundwork for further investigation of the agricultural yield prediction problem. In the future, we hope to improve on these results by developing multiple crop prediction models based on deep learning (DL).


References 1. McQueen RJ, Garner SR, Nevill-Manning CG, Witten IH (1995) Applying machine learning to agricultural data. Comput Electron Agric 12(4):275–293 2. Aybar-Ruiz A, Jiménez-Fernández S, Cornejo-Bueno L, Casanova-Mateo C, Sanz-Justo J, Salvador-González P, Salcedo-Sanz S (2016) A novel grouping genetic algorithm-extreme learning machine approach for global solar radiation prediction from numerical weather models inputs. Sol Energy 132:129–142 3. Asadi H, Dowling R, Yan B, Mitchell P (2014) Machine learning for outcome prediction of acute ischemic stroke post intra-arterial therapy 4. Beulah R (2019) A survey on different data mining techniques for crop yield prediction. Int J Comput Sci Eng 7(1):738–744 5. Cramer S, Kampouridis M, Freitas AA, Alexandridis AK (2017) An extensive evaluation of seven machine learning methods for rainfall prediction in weather derivatives. Expert Syst Appl 85:169–181 6. Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L, Gao G (2007) CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res 35:345–349 7. Richardson A, Signor BM, Lidbury BA, Badrick T (2016) Clinical chemistry in higher dimensions: machine-learning and enhanced prediction from routine clinical chemistry data. Clin Biochem 49:1213–1220 8. Kang J, Schwartz R, Flickinger J, Beriwal S (2015) Machine learning approaches for predicting radiation therapy outcomes: a clinician’s perspective. Int J Radiat Oncol Biol Phys 93:1127–1135 9. Rhee J, Im J (2017) Meteorological drought forecasting for ungauged areas based on machine learning: using long-range climate forecast and remote sensing data 10. Zhang B, He X, Ouyang F, Gu D, Dong Y, Zhang L, Mo X, Huang W, Tian J, Zhang S (2017) Radiomic machine-learning classifiers for prognostic biomarkers of advanced 403:21–27 11. Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L, Gao G (2007) CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res 35:345–349 12. Takahashi K, Kim K, Ogata T, Sugano S (2017) Tool-body assimilation model considering grasping motion through deep learning. Rob Auton Syst 91:115–127 13. Gastaldo P, Pinna L, Seminara L, Valle M, Zunino R (2015) A tensor-based approach to touch modality classification by using machine learning. Rob Auton Syst 63:268–278 14. Zhou C, Lin K, Xu D, Chen L, Guo Q, Sun C, Yang X (2018) Near infrared computer vision and neuro-fuzzy model-based feeding decision system for fish in aquaculture. Comput Electron Agric 146:114–124 15. Maione C, Barbosa RM (2018) Recent applications of multivariate data analysis methods in the authentication of rice and the most analyzed parameters: a review. Crit Rev Food Sci Nutr 1–12 16. Kaur M, Gulati H, Kundra H (2014) Data mining in agriculture on crop price prediction: techniques and applications. Int J Comput Appl 99(12):1–3 17. López-Cortés XA, Nachtigall FM, Olate VR, Araya M, Oyanedel S, Diaz V, Jakob E, Ríos-Momberg M, Santos LS (2017) Fast detection of pathogens in salmon farming industry. Aquaculture 470:17–24 18. Priya, Muthaiah U, Balamurugan M (2018) Predicting yield of the crop using machine learning algorithm. Int J Eng Sci Res Technol (IJESRT) 19. Barboza F, Kimura H, Altman E (2017) Machine learning models and bankruptcy prediction. Expert Syst Appl 83:405–417 20. Dhanya CT, Nagesh Kumar D (2009) Data mining for evolution of association rules for droughts and floods in India using climate inputs. J Geophys Res 114:1–14


21. Chlingaryan A, Sukkarieh S, Whelan B (2018) Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: a review. Comput Electron Agric 151:61–69 22. Zhao Y, Li J, Yu L (2017) A deep learning ensemble approach for crude oil price forecasting 23. Elavarasan D, Vincent DR, Sharma V, Zomaya AY, Srinivasan K (2015) Forecasting yield by integrating agrarian factors and machine learning models: a survey. Comput A, Fernandez, C., Maiguashca J 24. Liakos KG, Busato P, Moshou D, Pearson S, Bochtis D (2018) Machine learning in agriculture: a review. Sensors (Switzerland) 18 25. Mackowiak SD, Zauber H, Bielow C, Thiel D, Kutz K, Calviello L, Obermayer B (2015) Extensive identification and analysis of conserved small ORFs in animals. Genome Biol 16(1):1–21 26. Samuel AL (2000) Some studies in machine learning using the game of checkers. IBM J Res Dev 44(1.2):206–226 27. Bohanec M, Kljajić Borštnar M, Robnik-Šikonja M (2017) Explaining machine learning models in sales predictions 28. Wildenhain J, Spitzer M, Dolma S, Jarvik N, White R, Roy M, Tyers M (2016) Systematic chemical-genetic and chemical-chemical interaction datasets for prediction of compound synergism. Sci Data 3(1):1–9 29. Veenadhari S, Misra B, Singh CD (2014) Machine learning approach for forecasting crop yield based on climatic parameters. In: 2014 International conference on computer communication and informatics. IEEE, pp 1–5 30. Li B, Lecourt J, Bishop G (2018) Advances in non-destructive early assessment of fruit ripeness towards defining optimal time of harvest and yield prediction—a review

ResNet: Solving Vanishing Gradient in Deep Networks
Lokesh Borawar and Ravinder Kaur

Abstract Training a neural network is relatively easy when it has few layers, but the situation changes rapidly as more layers are added to build a deeper architecture. Vanishing gradients and added complexity make deeper neural networks more challenging, time-consuming, and resource-intensive to train. When residual blocks are added to neural networks, training becomes more effective even with a more complex architecture. The skip connections between the layers of the network are what make residual networks (ResNets) efficient to train; without them, training would be a time-consuming procedure. The implementation of residual networks, their operation, their formulae, and their solution to the vanishing gradient problem are the topics of this study. Because of the residual structure, the model is easier to optimize and obtains good accuracy on image recognition tasks. In this study, a 34-layer ResNet, which is deeper than VGG nets yet less complex, is tested on the CIFAR-10 dataset. After 80 epochs of training, ResNet reaches an error rate of about 20% on the CIFAR-10 test set, and more epochs can decrease the error further. The results of ResNet are compared with those of the corresponding convolutional network (ConvNet) without skip connections. The findings indicate that ResNet offers higher accuracy but is more prone to overfitting. To improve accuracy, overfitting prevention techniques, including stochastic augmentation of the training data and dropout layers in the network, have been used. Keywords ResNet · ConvNet · CIFAR-10 · Dropout · Data augmentation · Image recognition
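The two overfitting countermeasures named in the abstract, dropout and stochastic data augmentation, can be sketched in plain Python as follows. This is illustrative only: a TensorFlow model would normally use the library’s built-in layers rather than hand-written functions, the function names here are hypothetical, and random horizontal flipping is an assumed example of stochastic augmentation, since the specific transformations are not listed in this passage.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: during training, zero each activation with probability
    `rate` and scale the survivors by 1 / (1 - rate) so the expected value is
    unchanged; at inference time, return the activations untouched."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


def random_horizontal_flip(image, rng, p=0.5):
    """Stochastic augmentation: mirror an image (list of pixel rows) left to
    right with probability p, otherwise return an unchanged copy."""
    if rng.random() < p:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]


rng = random.Random(42)
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=rng))
print(random_horizontal_flip([[1, 2, 3], [4, 5, 6]], rng, p=1.0))  # [[3, 2, 1], [6, 5, 4]]
```

Scaling the surviving activations during training (rather than scaling at inference) means the trained network can be used unchanged at test time, which is why the inference branch simply returns its input.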

L. Borawar · R. Kaur (B)
Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_21


1 Introduction
Convolutional neural networks (CNNs) have succeeded in a number of applications involving images and their categorization. CNNs have a layered structure in which each layer has its own convolutional filters. These filters are convolved with the input image and produce feature maps for the following layer, and they are relatively simple to learn. In this work, the degradation problem is addressed by using a deep residual network: instead of expecting a few stacked layers to fit the desired input-to-output mapping directly, the layers are made to fit a residual mapping. Referring to Fig. 1a, let H(x) be the desired underlying mapping to be fitted by a few stacked non-linear layers. The stacked layers are instead made to fit the residual mapping F(x) = H(x) − x, so the original mapping becomes F(x) + x, as shown in Fig. 1b [1]. The hypothesis is that the residual mapping is easier to optimize than the original one, and the previous studies reviewed in Sect. 2 indicate that this is true. The expression F(x) + x can be realized by a feed-forward network with bypass connections, as illustrated in Fig. 1a [1]. Bypass connections are connections that skip one or more layers. The bypass connection used in this work is simply an identity mapping without any parameters: its output x is added to the output F(x) of the stacked layers. The identity bypass links in Fig. 1a therefore add neither parameters nor computational complexity. The whole network can still be trained end to end by backpropagation with the Adam optimizer [2], and libraries such as TensorFlow make it simple to implement without modification. The filters in a CNN's initial layers learn low-level properties such as color contrasts, edges, and blobs, while the filters in later layers capture more complex, high-level information such as the geometry of a particular object.
Fig. 1 ResNet architecture: (a) a residual block; (b) representation of the residual function

CNNs can improve their classification performance by becoming deeper; however, deeper networks are challenging to train for the following reasons:
• Harder to optimize: A deeper network includes more parameters, which makes it tougher to train. Many parameters create a complicated loss surface with several local minima and maxima, so optimization is more difficult.
• Vanishing and exploding gradients: If the weight updates become exponentially small from layer to layer during backpropagation (vanishing gradients), training the neural network takes a long time and, in some circumstances, stops completely. Exploding gradients are the exact opposite: the weights become large, so the derivatives are also large, each update changes the weights too much, and the model may never converge.
Deeper models offer better classification performance but are more difficult to train, and ResNet addresses this issue. ResNet differs from a plain network only by a skip connection that runs parallel to the regular convolutional layers. This shortcut also gives gradients a direct path during backpropagation, which speeds up optimization. This implementation uses the TensorFlow framework to train on the CIFAR-10 dataset [3], and data augmentation techniques are used to avoid overfitting. ResNet (a ConvNet with residual connections) and the equivalent ConvNet (a ResNet without skip connections) are compared later in this paper.
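To make the vanishing/exploding behaviour concrete, the sketch below multiplies a gradient by the same local derivative at every layer, as the chain rule does during backpropagation. It is an illustrative simplification, assuming a scalar gradient and an identical per-layer factor; `backprop_scale` is a hypothetical helper, not a TensorFlow function.

```python
def backprop_scale(per_layer_factor, depth):
    """Gradient magnitude after flowing back through `depth` layers,
    assuming every layer multiplies it by the same local derivative."""
    scale = 1.0
    for _ in range(depth):
        scale *= per_layer_factor  # chain rule: one factor per layer
    return scale


for depth in (5, 20, 50):
    vanishing = backprop_scale(0.5, depth)  # factor < 1 shrinks exponentially
    exploding = backprop_scale(1.5, depth)  # factor > 1 grows exponentially
    print(f"depth={depth:2d}  factor 0.5 -> {vanishing:.3e}  factor 1.5 -> {exploding:.3e}")
```

With a factor of 0.5 the gradient falls below 1e-14 after 50 layers, while a factor of 1.5 pushes it above 1e8; either extreme stalls or destabilises training of the early layers.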

2 Related Work
Shortcut connections [4–6] have been studied for a long time. An early practice in training multilayer perceptrons (MLPs) was to add a linear layer connected from the network input to the output [5, 6]. In [7, 8], a few intermediate layers are connected directly to auxiliary classifiers to address vanishing and exploding gradients. The papers [9–12] propose methods for centering layer responses, gradients, and propagated errors, implemented by shortcut connections. In [8], an "inception" model with deeper branches and a shortcut branch is introduced. Concurrently with ResNet [16], "highway networks" [13, 14] presented shortcut connections with gating functions [15]. These gates are data-dependent and have parameters, in contrast to identity shortcuts, which are parameter-free. When a gated shortcut is "closed" (approaching zero), the layers in highway networks represent non-residual functions. Identity shortcuts, by contrast, are never closed: all information is always passed through, and the layers always learn residual functions. Moreover, extremely deep highway networks (e.g., over 100 layers) have not shown good accuracy. The only, and obvious, difference between a ResNet and the corresponding ConvNet is the skip connection. It offers gradients a smooth path back to the network's initial layers during backpropagation. By avoiding dead neurons and vanishing gradients, this approach makes neural network learning faster. In [16], the authors offered several ResNet designs with 18, 34, 50, 101, and 152 layers. A ResNet is made up of a collection of interconnected blocks; a basic residual block is shown in Fig. 2 [17]. If the output of a residual block has the same size as its input, the bypass connection can be described by an identity matrix. If the sizes differ, the shortcut can be adapted using average pooling to reduce the spatial size and zero padding to expand the number of channels. The authors of [17] tested several shortcut connections in ResNets and found that adding a layer with parameters to the shortcut loses the advantage of the residual network, since there is no longer a fast route for conveying gradients. Similarly, adding a parameter-free layer such as dropout or ReLU after the addition offers no significant benefit or drawback.

Fig. 2 ResNet basic block
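A minimal sketch of this parameter-free shortcut, assuming a channels-first feature map stored as nested Python lists, 2 × 2 average pooling with stride 2 for the spatial reduction, and the new all-zero channels appended after the existing ones (implementations differ on where the padding goes). `avg_pool` and `option_a_shortcut` are illustrative names.

```python
def avg_pool(channel, size=2):
    """Non-overlapping size x size average pooling (stride = size) on one H x W channel."""
    h, w = len(channel), len(channel[0])
    return [
        [sum(channel[i + di][j + dj] for di in range(size) for dj in range(size)) / (size * size)
         for j in range(0, w - size + 1, size)]
        for i in range(0, h - size + 1, size)
    ]


def option_a_shortcut(x, out_channels, size=2):
    """Parameter-free shortcut: pool every channel, then append all-zero channels."""
    pooled = [avg_pool(c, size) for c in x]
    h, w = len(pooled[0]), len(pooled[0][0])
    extra = out_channels - len(x)
    zeros = [[[0.0] * w for _ in range(h)] for _ in range(extra)]
    return pooled + zeros


x = [
    [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0], [12.0, 13.0, 14.0, 15.0]],
    [[1.0] * 4 for _ in range(4)],
]
y = option_a_shortcut(x, out_channels=4)
print(len(y), y[0])  # 4 [[2.5, 4.5], [10.5, 12.5]]
```

Because the shortcut is built only from pooling and zeros, it introduces no trainable parameters, which is the property the text relies on.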

3 Residual Network
3.1 Mathematical Representation of ResNet
Here, a residual connection is applied to every few stacked layers to keep complexity low. The building block shown in Fig. 1 is defined as follows:

$$y = F(x, \{W_i, b_i\}) + x \qquad (1)$$

Here, y and x are the output and input vectors, respectively, and $F(x, \{W_i, b_i\})$ is the residual mapping to be learned. The building block shown in Fig. 1 has two layers, so the complete function is

$$F(x, \{W_i, b_i\}) = W_2\,\sigma(W_1 x + b_1) + b_2 \qquad (2)$$


where σ denotes the ReLU [18] function. The shortcut connection then performs the element-wise addition $F(x, \{W_i, b_i\}) + x$, and the non-linearity σ(y) is applied after the addition. The shortcut connection in Eq. (1) adds no extra parameters and no computational complexity. This is not only attractive in practice but also important when comparing residual and plain networks: both can have the same width, number of parameters, depth, and computational cost (apart from the negligible element-wise addition). For the addition to be valid, F and x must have identical dimensions. When the dimensions change, for example because the numbers of input and output channels differ, two options are considered:
• The vector coming from the shortcut connection is padded with extra zeros to increase its dimension. This option adds no parameters to the network.
• A linear projection $W_s$ (implemented by a 1 × 1 convolution) is applied on the shortcut connection to match the dimensions, as in Eq. (3):

$$y = F(x, \{W_i, b_i\}) + W_s x \qquad (3)$$
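As a minimal sketch of Eqs. (1)–(3), the block below implements a fully connected residual block with plain Python lists: the two-layer residual function of Eq. (2), an identity shortcut (Eq. (1)) or a projection $W_s$ (Eq. (3)) when the dimensions differ, and ReLU after the addition. Matrices are lists of rows; `matvec`, `relu`, and `residual_block` are illustrative names, and a real model would use a framework such as TensorFlow instead.

```python
def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, W1, b1, W2, b2, Ws=None):
    # Residual function of Eq. (2): F = W2 * relu(W1 x + b1) + b2
    hidden = relu([a + b for a, b in zip(matvec(W1, x), b1)])
    f = [a + b for a, b in zip(matvec(W2, hidden), b2)]
    # Identity shortcut (Eq. 1) or linear projection Ws x (Eq. 3)
    shortcut = list(x) if Ws is None else matvec(Ws, x)
    if len(f) != len(shortcut):
        raise ValueError("F and the shortcut must have the same dimension")
    # Element-wise addition, then the non-linearity sigma(y)
    return relu([a + b for a, b in zip(f, shortcut)])


# Identity shortcut: input and output are both 2-dimensional
print(residual_block([1.0, -2.0],
                     [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                     [[0.5, 0.0], [0.0, 0.5]], [0.0, 0.0]))  # [1.5, 0.0]
```

When F has more outputs than x has entries, passing a `Ws` matrix with matching shape selects the projection shortcut of Eq. (3); without it, the function refuses the mismatched addition.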

The form of the residual function is flexible: residual connections can span any number of layers. If F has only a single layer, however, Eq. (1) reduces to a linear layer, $y = W_1 x + b_1 + x = (W_1 + I)x + b_1$, which is equivalent to an ordinary linear layer and gives no advantage. The notation above uses fully connected layers for simplicity, but residual connections also apply to convolutional layers. In that case, $F(x, \{W_i, b_i\})$ represents several convolutional layers, and the element-wise addition is performed between two feature maps (one coming from the convolutional layers and the other from the skip connection), channel by channel.
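The claim that a one-layer residual function gives nothing new can be checked numerically: $y = W_1 x + b_1 + x$ produces the same outputs as a single linear layer whose weight is $W_1 + I$. The helper names and example numbers below are arbitrary and chosen only for illustration.

```python
def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def single_layer_residual(x, W1, b1):
    # Eq. (1) with a one-layer residual function: y = W1 x + b1 + x
    return [f + b + a for f, b, a in zip(matvec(W1, x), b1, x)]


def equivalent_linear(x, W1, b1):
    # The same map written as one plain linear layer with weight W1 + I
    n = len(W1)
    W = [[W1[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [f + b for f, b in zip(matvec(W, x), b1)]


x, W1, b1 = [1.0, 2.0], [[0.5, -1.0], [2.0, 0.25]], [0.1, -0.2]
print(single_layer_residual(x, W1, b1))
print(equivalent_linear(x, W1, b1))  # equal up to floating-point rounding
```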

3.2 Types of ResNet and Their Comparison
Some existing ResNets are as follows:
• 34-layer ResNet: ResNet-34 has 21.8 million parameters and uses ReLU activation; in each basic block, batch normalization is applied after the convolutional layers.
• 50-layer ResNet: Each 2-layer block of the 34-layer network is replaced with a 3-layer block (1 × 1, 3 × 3, and 1 × 1 convolutions), resulting in a 50-layer ResNet. It uses the second option (projection shortcuts) for increasing dimensions.
• 101-layer and 152-layer ResNets: These are constructed by using more 3-layer blocks. Even though the depth increases markedly, the 152-layer ResNet still has lower complexity than VGG-16.
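The parameter savings of the 3-layer (bottleneck) design can be checked with simple arithmetic. The sketch counts convolution weights only (biases and batch-normalization parameters are ignored) and uses the channel widths of He et al. [16]: a basic block with 64 filters versus a bottleneck block that reduces 256 channels to 64 and expands back. The function names are illustrative.

```python
def conv_params(k, c_in, c_out):
    """Weights of a k x k convolution from c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def basic_block_params(c):
    # Two 3x3 convolutions with c filters each (ResNet-18/34 style)
    return conv_params(3, c, c) + conv_params(3, c, c)


def bottleneck_block_params(c_out, c_mid):
    # 1x1 reduce -> 3x3 -> 1x1 expand (ResNet-50/101/152 style)
    return (conv_params(1, c_out, c_mid)
            + conv_params(3, c_mid, c_mid)
            + conv_params(1, c_mid, c_out))


print(basic_block_params(64))            # 73728
print(bottleneck_block_params(256, 64))  # 69632
print(basic_block_params(256))           # 1179648
```

Under these assumptions a bottleneck block operating on 256 channels needs fewer weights (69,632) than a basic block on only 64 channels (73,728), and about 17 times fewer than a basic block at 256 channels (1,179,648).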


Table 1 Comparison of accuracies between ResNets

| Model | Top-1 error (%) | Top-5 error (%) |
|---|---|---|
| ResNet-34 | 26.73 | 8.74 |
| ResNet-50 | 22.85 | 6.71 |
| ResNet-101 | 21.75 | 6.05 |
| ResNet-152 | 21.43 | 5.71 |

Table 2 Number of parameters of the networks

| Model | Number of parameters |
|---|---|
| ResNet-34 | 21.8 M |
| ResNet-50 | 25.6 M |
| ResNet-101 | 44.5 M |
| ResNet-152 | 60.2 M |

As Table 1 shows [16], the 50-, 101-, and 152-layer residual networks are more accurate than ResNet-34 by considerable margins, so greater depth is observed to give better accuracy. Table 2 lists the number of parameters in these networks.
Major characteristics of ResNet:
• Batch normalization is central to ResNet. It normalizes the inputs to each layer, which improves the performance of the network and alleviates the problem of covariate shift.
• ResNet protects the network from the vanishing gradient problem by using identity connections.
• Deeper ResNets use a bottleneck residual block design, which improves the performance of the network and also reduces the number of parameters.
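Since batch normalization is called central to ResNet, a minimal training-mode sketch may help: each feature is normalised with the mean and variance of the current mini-batch, then scaled by a learnable γ and shifted by β. The running statistics that frameworks such as TensorFlow keep for inference are deliberately omitted, and `batch_norm` is an illustrative name.

```python
def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalisation of one feature across a mini-batch."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n  # biased batch variance
    inv_std = 1.0 / (var + eps) ** 0.5
    # Normalise to zero mean and unit variance, then apply the learnable scale and shift
    return [gamma * (v - mean) * inv_std + beta for v in batch]


out = batch_norm([2.0, 4.0, 6.0, 8.0])
print([round(v, 3) for v in out])
```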

3.3 Solution to Vanishing Gradients
During backpropagation, gradients passing through a residual block have two pathways back toward the input layer, along which the weights are updated. As shown in Fig. 3 [1], path-1 goes through the identity mapping and path-2 through the residual mapping. Gradients flowing along path-2 pass through the two weight layers $W_2$ and $W_1$ of the residual function F(x) (Fig. 1). The kernels (if the layers are convolutional) or the weights $W_2$ and $W_1$ are updated, and new gradients are computed; these newly computed gradients can become small or vanish by the time they reach the initial layers. The identity mapping (shortcut connection) [19] helps solve this vanishing gradient problem. Gradients passing through path-1 in Fig. 3 do not encounter any weight layers, so they are not changed. They reach the initial layers by skipping the residual block in a single step, which helps the weights learn correctly. Note, however, that the ResNet basic block applies ReLU after the F(x) + x operation, so gradient values are modified (by the ReLU derivative) as soon as they enter the residual block.

Fig. 3 Gradient pathways in ResNet
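The effect of the identity path can be checked on a scalar toy model. For a block y = F(x) + x, the chain rule gives dy/dx = 1 + dF/dx, so even when the residual branch (path-2) contributes a tiny derivative, the product over many blocks stays near 1, whereas a plain stack multiplies the tiny derivatives together. The sketch assumes one-dimensional blocks with F(x) = w2·relu(w1·x + b1) + b2, small fixed weights chosen only for illustration, and omits the post-addition ReLU (it would leave these positive activations unchanged); `input_gradient` is a hypothetical helper.

```python
def input_gradient(x, depth, residual, w1=0.1, b1=0.1, w2=0.1, b2=0.0):
    """Analytic d(output)/d(input) through `depth` identical scalar blocks."""
    grad = 1.0
    for _ in range(depth):
        pre = w1 * x + b1
        d_branch = w2 * w1 if pre > 0 else 0.0              # dF/dx along path-2 (ReLU gate)
        d_block = 1.0 + d_branch if residual else d_branch  # path-1 contributes the "+ 1"
        grad *= d_block
        f = w2 * max(pre, 0.0) + b2
        x = f + x if residual else f                        # forward pass to the next block
    return grad


print(input_gradient(1.0, 20, residual=False))  # about 1e-40: the gradient vanishes
print(input_gradient(1.0, 20, residual=True))   # about 1.22: the gradient is preserved
```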

4 Dataset and Implementation
This section describes the dataset and the framework used to train the models.

4.1 Dataset
The CIFAR-10 dataset [3] is used in this work. It contains 60,000 color images of size 32 × 32 pixels in 10 classes, with 6000 images per class, split into 50,000 training images and 10,000 test images. Figure 4 [3] shows some randomly picked examples from each class. The 10 classes are (1) airplane, (2) automobile, (3) bird, (4) cat, (5) deer, (6) dog, (7) frog, (8) horse, (9) ship, and (10) truck. The classes are mutually exclusive, meaning there is no overlap between them.

4.2 TensorFlow
TensorFlow [20] is an open-source framework for artificial intelligence and machine learning. Although it can be used for many other tasks, its core emphasis is deep learning. In this paper, it is used to create and evaluate the ConvNet and ResNet models. TensorFlow also supports training on a GPU, offers predefined neural network layers that can be configured, and allows custom layers to be created.


Fig. 4 Glimpse of CIFAR-10 dataset

5 Network Design
The neural network is initially designed using the residual model developed in [17]. ResNet's image classification model was specifically created to receive 256 × 256 pixel images and classify them into 1000 categories. Starting from this trained model, there are several options. One is to replace the input and output layers so that the 32 × 32 CIFAR-10 images can be accepted and classified into 10 categories. Simple techniques include feeding the original image to the second convolutional layer, bypassing the first layer, and fine-tuning the final layers to increase accuracy, or simply resizing each 32 × 32 image to 256 × 256 before forwarding it to the input layer. Here, however, ResNet is compared with a ConvNet of the same architecture but without skip connections, which necessitates building a new model from scratch. The network architecture for image classification is described first, followed by the data augmentation technique used in the model to prevent overfitting.

5.1 Network Architecture
ResNet-34 is inspired by the VGG-19 architecture, to which skip (shortcut) connections are added; these residual connections turn the architecture into a residual network. The network begins with a convolutional layer of 64 filters of size 7 × 7, followed by a max-pooling layer, which is in turn followed by a convolutional layer with the same parameters. The stride is set to 2 in both cases. Next come two convolutional layers with 3 × 3 kernels and 64 filters; this is the first basic block, and it is repeated three times. To increase the number of channels, a pooling layer (stride 2) followed by a convolutional layer with 128 filters is used. Then two convolutional layers with 3 × 3 kernels and 128 filters are repeated four times. This pattern continues up to the average pooling layer and the softmax function, with the number of filters doubling at each stage. When the original ResNet is trained on CIFAR-10, nothing in it prevents overfitting. Therefore, to reduce overfitting, a dropout layer with probability 0.7 (which randomly deactivates neurons in the layer) is added to the basic block shown in Fig. 2. Placing the dropout layer between the two convolutional layers creates the new basic ResNet block depicted in Fig. 5.

Fig. 5 Dropout layer in basic block
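A minimal sketch of the dropout operation added to the basic block, written as "inverted" dropout on a flat list of activations: during training each unit is zeroed with probability `rate` and the survivors are scaled by 1/(1 − rate), so nothing needs rescaling at inference. The text does not say whether 0.7 is the drop or the keep probability; the sketch treats it as the drop rate, which is how the `rate` argument of the Keras `Dropout` layer is defined. The `dropout` function itself is illustrative.

```python
import random


def dropout(values, rate, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate` during training
    and scale the survivors by 1 / (1 - rate); identity at inference time."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)  # seeded for reproducibility
print(dropout([1.0] * 10, rate=0.7, rng=rng))
```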


Fig. 6 Sample ResNet-34 model for image classification

The ReLU activation function is used to add non-linearity to the neurons. The complete 34-layer ResNet architecture with a 10-class output is built from the new basic block. The original ResNet architecture has no dropout layer, but dropout is needed here to reduce overfitting in the image classification task; the original architecture also has 1000 nodes in its output layer. Figure 6 represents both the 34-layer ResNet and a ConvNet with the same number of layers but without skip connections. The ResNet with dropout achieved 80% classification accuracy on the cross-validation set; see Sect. 6 for details.
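To see where the "34" comes from, the sketch below counts weight layers for the standard ResNet-34 stage layout of He et al. [16]: 3, 4, 6, and 3 basic blocks with 64, 128, 256, and 512 filters. Only the first two block counts are spelled out in the text above, so the last two are taken from [16] as an assumption; pooling, dropout, the shortcut additions, and any 1 × 1 projection shortcuts are not counted, following the usual convention. The names are illustrative.

```python
# (filters, number of basic blocks) per stage; the standard ResNet-34 layout of [16]
RESNET34_STAGES = [(64, 3), (128, 4), (256, 6), (512, 3)]


def count_weight_layers(stages, convs_per_block=2):
    """Stem convolution + convolutions inside the residual blocks + final dense layer."""
    total_blocks = sum(blocks for _, blocks in stages)
    return 1 + convs_per_block * total_blocks + 1


for i, (filters, blocks) in enumerate(RESNET34_STAGES, start=1):
    print(f"stage {i}: {blocks} basic blocks of two 3x3 convolutions with {filters} filters")
print(count_weight_layers(RESNET34_STAGES))                     # 34
print(count_weight_layers(RESNET34_STAGES, convs_per_block=3))  # 50 (3-layer blocks)
```

The equivalent ConvNet has the same count, since removing skip connections removes no weight layers; swapping in 3-layer blocks with the same stage counts gives 50, matching the ResNet-50 construction described in Sect. 3.2.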

5.2 Data Augmentation
Each class in the CIFAR-10 dataset has only 6000 images, which is considered too small a dataset for such a deep network. Data augmentation, which increases the number of distinct images seen by the model, is therefore used to prevent overfitting. One augmentation method is to add cropped and horizontally flipped images to the existing dataset. However, because memory is limited, it is impossible to load all of the new images into memory. Augmentation is therefore performed online rather than by manually adding newly generated images to the dataset offline: every time a new batch of images arrives, a random transformation function is applied to augment it. This function takes a probability P as a parameter; it flips images horizontally, randomly crops them to 26 × 26 pixels, and then rescales them to their original size of 32 × 32 pixels.
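The online augmentation step can be sketched on single-channel images stored as lists of rows. The text leaves some details open, so the sketch assumes that P applies to the horizontal flip only, that every image is cropped at a uniformly random 26 × 26 position, and that nearest-neighbour interpolation rescales the crop back to 32 × 32; all function names are hypothetical. In TensorFlow, the same steps would typically be built from the `tf.image` module.

```python
import random


def hflip(img):
    """Mirror an H x W image (list of rows) left to right."""
    return [row[::-1] for row in img]


def random_crop(img, size, rng):
    """Cut a size x size patch at a uniformly random position."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour rescaling to out_h x out_w."""
    h, w = len(img), len(img[0])
    return [[img[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def augment_batch(batch, p=0.5, crop=26, rng=random):
    """Apply a fresh random transform to every image each time a batch is drawn."""
    out = []
    for img in batch:
        h, w = len(img), len(img[0])
        if rng.random() < p:
            img = hflip(img)
        img = random_crop(img, crop, rng)
        out.append(resize_nearest(img, h, w))  # back to the original size
    return out


rng = random.Random(42)
image = [[r * 32 + c for c in range(32)] for r in range(32)]
augmented = augment_batch([image, image], p=0.5, rng=rng)
print(len(augmented), len(augmented[0]), len(augmented[0][0]))
```

Because the transform is drawn anew for every batch, the model rarely sees exactly the same image twice, without any extra images being stored in memory.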

6 Experiment Results
The primary difficulty with deep neural networks is the vanishing gradient, which generally does not arise in shallow networks. To observe this, two shallow networks (a ResNet and a ConvNet) with six layers were built. Figure 7a details their accuracy and loss on the training and validation data. The 6-layer ResNet and ConvNet perform equally well. With fewer layers, however, such as 4 layers (Fig. 7b), the ResNet's validation accuracy is significantly worse than that of a plain ConvNet with the same layers. This is expected: in such a shallow network, adding the input vector to the convolutional layer's output merely averages learned data with raw data, which hurts training. For deep networks, in contrast, skip connections increase accuracy. On the validation dataset, the deep residual network architecture depicted in Fig. 6 obtained an accuracy of 80%. A plain ConvNet with the same architecture (without skip connections) was also trained, and the accuracies of the two models are compared in Fig. 8a. The equivalent ConvNet reached a training accuracy of 52% and a validation accuracy of 60%. In both cases, the difference between training and validation accuracy was very small because of the overfitting-prevention techniques: dropout in the ResNet and data augmentation. Figure 8b shows the overfitting that occurs without these techniques: the large gap between training and validation accuracy means that the original ResNet architecture is more vulnerable to overfitting. This gap is about 22%, and the techniques discussed in Sect. 5 reduce it significantly, to about 4%. Another way to reduce overfitting is to use fewer parameters, i.e., fewer convolutional layers. With few layers (e.g., 6), there is no significant difference between ResNet with and without skip connections, but with 34 layers the difference is notable. Making the network deeper, from 6 to 34 layers without skip connections, reduces the validation accuracy from approximately 70% to 50%.

Fig. 7 Training and validation accuracy per epoch for (a) 6-layer and (b) 4-layer networks

Fig. 8 Accuracy versus epoch: (a) ResNet and its equivalent ConvNet; (b) ResNet without overfitting-prevention methods

7 Conclusion
As shown above, adding skip connections to a deep network can greatly speed up training while also improving accuracy. Residual networks, however, are prone to overfitting, which is undesirable. Data augmentation, dropout layers, regularization, and other machine learning techniques can all help reduce overfitting, as can a properly constructed architecture with fewer parameters. These observations show that ResNet is extremely helpful for training deep neural networks. Applied to very shallow networks, it does not improve performance: with only a few layers, networks with and without skip connections give similar results, so skip connections are not useful there. ResNet also helps solve the vanishing gradient problem. ResNet therefore offers great potential for deep learning, but it cannot be applied naively; it requires an adequate understanding of how it works before implementation.


References
1. Detailed guide to understand and implement ResNets. CV (2019) Retrieved November 27, 2021, from https://cv-tricks.com/keras/understand-implement-resnets/
2. Bock S, Weiß M (2019) A proof of local convergence for the Adam optimizer. Int Joint Conf Neural Netw (IJCNN) 2019:1–8. https://doi.org/10.1109/IJCNN.2019.8852239
3. Cs.toronto.edu (2022) CIFAR-10 and CIFAR-100 datasets. [online] Available at: https://www.cs.toronto.edu/kriz/cifar.html. Accessed 15 Dec 2021
4. Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press
5. Ripley BD (1996) Pattern recognition and neural networks. Cambridge University Press
6. Venables W, Ripley B (1999) Modern applied statistics with S-PLUS
7. Lee CY, Xie S, Gallagher P, Zhang Z, Tu Z (2014) Deeply-supervised nets. arXiv:1409.5185
8. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: CVPR
9. Raiko T, Valpola H, LeCun Y (2012) Deep learning made easier by linear transformations in perceptrons. In: AISTATS
10. Schraudolph NN (1998) Centering neural network gradient factors. In: Neural networks: tricks of the trade. Springer, pp 207–226
11. Schraudolph NN (1998) Accelerated gradient descent by factor-centering decomposition. Technical report
12. Vatanen T, Raiko T, Valpola H, LeCun Y (2013) Pushing stochastic gradient towards second-order methods–backpropagation learning with transformations in nonlinearities. In: Neural Information Processing
13. Srivastava RK, Greff K, Schmidhuber J (2015) Highway networks. arXiv:1505.00387
14. Srivastava RK, Greff K, Schmidhuber J (2015) Training very deep networks. arXiv:1507.06228
15. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
16. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. IEEE Conf Comput Vis Pattern Recogn (CVPR) 2016:770–778
17. Torch.ch (2021) Torch—exploring residual networks. Available at: http://torch.ch/blog/2016/02/04/resnets.html. Accessed 26 Nov 2021
18. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: ICML
19. He K, Zhang X, Ren S, Sun J, Identity mappings in deep residual networks
20. TensorFlow (n.d.) The functional API, TensorFlow Core. [online] Available at: https://www.tensorflow.org/guide/keras/functional#a toy resnet model. Accessed 29 Nov 2021

BRCA1 Genomic Sequence-Based Early Stage Breast Cancer Detection
S. G. Shaila, Ganapati Bhat, V. R. Gurudas, Arya Suresh, and K. Hithyshi

Abstract Breast cancer is one of the most common types of cancer among women worldwide. Its main cause is changes in the genomic sequence of breast tissue, brought about mainly by environmental factors and genetic mutations. A specific genomic sequence is associated with the diagnosis of breast cancer: the breast cancer gene 1 (BRCA1) sequence, whose modification increases the risk of breast cancer. This paper discusses genomic sequence-based breast cancer detection. The proposed approach identifies the specific changes that occur in the BRCA1 sequence and analyzes the affected areas based on additional studies of the acquired DNA sequence. Various analysis techniques were tried in order to find the most accurate methods of DNA analysis and discovery. The approach applies one-hot encoding to compare the mutated and normal BRCA1 sequences, which identifies the location and degree of alteration of the gene. Thus, this paper proposes a technique that identifies the place and degree of the alteration that causes breast cancer.
Keywords Breast cancer · DNA sequence · Gene · BRCA1 · Classification

S. G. Shaila (B) · G. Bhat · V. R. Gurudas · A. Suresh · K. Hithyshi
Dayananda Sagar University, Bangalore, Karnataka 560068, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_22

1 Introduction
Nowadays, breast cancer is one of the most common types of cancer among women worldwide. Its main cause is changes in the DNA sequence of breast tissue brought about mainly by environmental factors, genetic mutations, lifestyle, etc. A specific gene sequence is associated with the diagnosis of breast cancer; it is commonly referred to as the breast cancer gene 1 (BRCA1) sequence. BRCA1 is a tumor suppressor gene, known as a caretaker gene, that is responsible for repairing DNA. BRCA1 and BRCA2 (breast cancer gene 2) encode two unrelated proteins that are present in the cells of the breast and other tissues, where they help repair damaged DNA or destroy the cell if the DNA cannot be repaired. A mutated BRCA1 gene inherited from an affected parent raises the offspring's chances of getting cancer: if either parent carries the mutation, there is a 50% chance that the offspring will inherit it, as shown in Fig. 1. The main limitation of identifying breast cancer from BRCA1 mutations is that the result can be misleading when the BRCA1 gene is unaffected: because of environmental and other genetic factors, some individuals have other DNA mutations that still make their cells cancerous. Figure 1 represents the combination. As shown in Fig. 2, BRCA1 is located on the long arm of chromosome 17 at region 2, band 1 (17q21), extending from base pair 41,196,321 to base pair 41,277,500. The BRCA1 gene is found in most vertebrates. The approach finds that certain alterations in this particular gene cause cancer, and this can be treated as a priority for removing the particular genes in individuals who have inherited the cancerous characteristics. Others who have not inherited the gene can also undergo removal, thereby avoiding the spread of cancer.

Fig. 1 Position of BRCA1 gene in chromosome 17 of human DNA

Fig. 2 Positioning of gene base pair


Fig. 3 Proposed model for identification of point mutation in BRCA1 gene sequence

BRCA1 is named after "breast cancer 1" and is located on chromosome 17; the gene is transmitted in an autosomal pattern, inherited within families. This is represented in Fig. 3. The BRCA1 protein contains the following domains:
• A zinc-type finger-like structure denoted C3HC4 (RING finger).
• The BRCA1 C terminus (BRCT) domain.
• Nuclear localization signals as well as a nuclear export signal.
The BRCA1 protein thus contains different domains, such as two BRCT domains and a zinc-finger (Znf) domain. BRCA1 has six isoforms, of which isoforms 1 and 2 contain 1863 amino acids. In this paper, the proposed approach identifies the specific changes that occur in the BRCA1 sequence. It analyzes the specific affected areas based on the sequence of the acquired DNA, using various analysis techniques. Because certain alterations in this particular gene cause cancer, the gene can be removed in an individual who has inherited its cancerous characteristics; others who have not inherited the gene can also undergo removal, thereby avoiding the spread of cancer. The approach identifies the location and degree of alteration of the gene. The rest of this paper is organized as follows. Section 2 reviews the literature, Sect. 3 presents the proposed work, Sect. 4 presents the experimental results, and the last section concludes the paper.

2 Literature Survey
This section reviews related work by other researchers. In [1], the researchers used four models with certain dimensions, each of which was magnified on the expectation that the intensity of each model provides information about the model under consideration; cancer was then predicted from the obtained intensity. Other authors used a ResNet-based deep convolutional neural network (CNN), trained on whole-slide images (WSIs), to predict gBRCA mutations in breast cancer, using various molecular approaches to obtain triple-negative breast cancer. The authors in [2] used machine learning classification to identify genes that differentiate triple-negative breast cancers. The authors in [3] proposed genome deep learning to study the relationship between genomic variations and traits in order to identify 12 different types of cancer; the approach used the ReLU activation function, and L2 regularization was applied when optimizing the model. Reference [4] explains that uncertainty about the significance of genes affects genetic information, and its authors used single-nucleotide variants (SNVs) for identification and clinical interpretation. The work in [5] predicted breast and ovarian cancer from the BRCA1 sequence and exon 11 of the BRCA2 sequence using different bioinformatics tools. The authors in [6] studied the frequency of BRCA1 sequence variants for surrogates and their fitness, thereby identifying the risk of breast cancer. Reference [7] addresses the identification of germline DNA variations in the BRCA1 sequence. The human BRCA1 DNA sequence dataset is available from the National Center for Biotechnology Information (NCBI) [8]. The approach in [9] identifies different types of breast cancer by identifying susceptibility single-nucleotide polymorphisms (SNPs). The approach in [10] identifies point mutations in breast cancer using a tissue sequencing approach, which yields values for point-mutation analysis.
The authors in [11] used HMRS techniques to identify breast cancer, which can be considered a complementary measure for breast cancer diagnosis. The approach in [12] discusses breast cancer analysis and classification using machine learning techniques, biosensors, and breast screening techniques. The breast screening techniques were further classified into mammography, MRI, WSI, and ultrasound; the biosensors include optical, electrochemical, and piezoelectric biosensors; and machine learning techniques such as KNN, SVM, and decision trees were used.

3 Proposed Work The proposed model, whose detailed architecture is shown in Fig. 3, represents the workflow from the data extraction process to the prediction of breast cancer. The human DNA BRCA1 sequence dataset is available from the National Center for Biotechnology Information (NCBI) [8]. The sequence was converted to a binary representation using one-hot encoding. This process was applied to both the mutated and the normal gene sequences to identify the positions at which the sequences change and thereby predict breast cancer. String and binary comparison techniques were used in the approach. The classifiers used in machine learning cannot be applied in this technique, since the evaluation metrics are different.

BRCA1 Genomic Sequence-Based Early Stage Breast Cancer Detection

Table 1 Details of the NCBI dataset

Dataset | Type of DNA sequence | Length of sequence
NCBI | Normal BRCA1 sequence | 7191
NCBI | Mutated BRCA1 sequence | 7191

3.1 BRCA1 Dataset Description The human DNA BRCA1 sequence dataset is available from the National Center for Biotechnology Information (NCBI) [8]. The details are given in Table 1. The extracted BRCA1 sequence was in FASTA format, and preliminary processing, such as removing specific sequences, was performed. This gene encodes a 190 kD phosphoprotein that helps maintain genomic stability. The BRCA1 gene consists of 22 exons spanning about 110 kb of DNA. The encoded protein combines with other tumor suppressors, DNA damage sensors, and signal transducers to form a large multi-subunit protein complex known as the BRCA1-associated genome surveillance complex (BASC). This gene product associates with RNA polymerase II and, through its C-terminal domain, interacts with histone deacetylase complexes. Thus, this protein plays a role in transcription, repair of double-stranded DNA breaks, and recombination. Mutations in this gene account for about 40% of inherited breast cancers and more than 80% of inherited breast and ovarian cancers. Alternative splicing plays a role in regulating the subcellular localization and physiological function of this gene.

3.2 Data Pre-processing The proposed approach begins with dataset pre-processing. This step removes unwanted, noisy information from the raw dataset so that the desired output can be obtained. The data extracted from NCBI [8] is already in FASTA format. Hence, the approach first extracts the normal gene sequence from the FASTA record.
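The FASTA extraction step described above can be sketched as follows. This is a minimal illustration. It assumes a single-record FASTA file whose header line starts with `>`. The function name `read_fasta_sequence` and the example record are hypothetical. The paper's additional removal of specific sequences is not reproduced.

```python
def read_fasta_sequence(fasta_text: str) -> str:
    """Return the nucleotide sequence of a single-record FASTA text.

    Header lines (starting with '>') and blank lines are skipped, and bases
    are upper-cased so that later comparisons are case-insensitive.
    """
    seq_lines = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line and not line.startswith(">"):  # skip blank and header lines
            seq_lines.append(line)
    return "".join(seq_lines).upper()


# Hypothetical record, for illustration only (not taken from NCBI).
example = """>hypothetical_record BRCA1 fragment
atgGATTTATCTG
CTCTTCGCGTTG
"""
print(read_fasta_sequence(example))  # ATGGATTTATCTGCTCTTCGCGTTG
```

Multi-record files would need to be split at each `>` header first. Ambiguity codes such as `N` are kept as-is.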


S. G. Shaila et al.

Fig. 4 Encoding techniques

3.3 Genomic Sequence Encoding When processing DNA sequences, the character string must be converted into a numerical sequence in order to build the input matrix for model training. Data pre-processing is an important step for obtaining the desired output. Since the dataset is large, various sequence encoding techniques must be considered to obtain the mapped output. There are three modes: sequential encoding, one-hot encoding, and k-mer encoding. One-hot encoding is widely used in deep learning methods and is best suited to algorithms such as CNNs. The encoding techniques are represented in Fig. 4. The approach is evaluated on two datasets, the normal BRCA1 sequence and the mutated BRCA1 sequence, and both undergo one-hot encoding. One-hot encoding represents categorical variables as binary vectors. Each category value is first mapped to an integer index, and each integer is then represented as a vector of zeros with a 1 at that index. With one-hot encoding, rescaling is easier than with other encoding methods, and the probability of values can be easily determined. The ACGT character sequences are thus converted into simple binary dummy variables. DNA has four bases: adenine (A), thymine (T), cytosine (C), and guanine (G). Each base is encoded as a vector: 'A' as (0,0,0,1), 'C' as (0,0,1,0), 'G' as (0,1,0,0), and 'T' as (1,0,0,0). This is represented in Fig. 5. Processing the raw DNA sequence directly takes a lot of memory and execution time. Hence, the sequence is converted to binary, which makes execution easier with less memory consumption. Once the mutated and normal DNA sequences are encoded, they are compared to find the changes.
The output indicates the positions at which the gene has been altered. We use Python to identify the genomic changes at these locations.
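The encoding and comparison steps above can be illustrated with a short Python sketch. It uses the base-to-vector mapping stated in the text. It assumes that the normal and mutated sequences are already aligned, have equal length, contain only A, C, G, and T, and differ only by substitutions. The helper names and the toy sequences are hypothetical. The paper does not state whether its reported positions are 0- or 1-based.

```python
from typing import List, Tuple

# Mapping stated in the text: A=(0,0,0,1), C=(0,0,1,0), G=(0,1,0,0), T=(1,0,0,0)
ONE_HOT = {
    "A": (0, 0, 0, 1),
    "C": (0, 0, 1, 0),
    "G": (0, 1, 0, 0),
    "T": (1, 0, 0, 0),
}


def one_hot_encode(seq: str) -> List[Tuple[int, int, int, int]]:
    """Encode a DNA string as one 4-element binary vector per base (4N values)."""
    return [ONE_HOT[base] for base in seq.upper()]  # KeyError on non-ACGT symbols


def mutated_positions(normal: str, mutated: str, one_based: bool = True) -> List[int]:
    """Return the positions where the one-hot vectors of two aligned sequences differ."""
    if len(normal) != len(mutated):
        raise ValueError("sequences must be aligned and of equal length")
    offset = 1 if one_based else 0
    pairs = zip(one_hot_encode(normal), one_hot_encode(mutated))
    return [i + offset for i, (a, b) in enumerate(pairs) if a != b]


normal = "ACGTACGTAC"   # toy sequences, not BRCA1 data
mutated = "ACGAACGTCC"
print(mutated_positions(normal, mutated))  # [4, 9]
```

Comparing one-hot vectors position by position finds the same positions as comparing the characters directly. The encoded form matters mainly when the vectors are fed to a numerical model.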


Fig. 5 Binary feature mapping results

4 Experimental Results This section presents the experimental results obtained on the normal and mutated BRCA1 datasets. Initially, the extracted BRCA1 genome sequence is pre-processed to strip the FASTA formatting, so that a plain nucleotide sequence is obtained. Next, the normal and mutated BRCA1 sequences are separated. One-hot (orthogonal) encoding is then applied to both sequences. This gives a feature size of 4N for a sequence of length N, with one position for each of A, C, G, and T. The degree of deviation is measured by the percentage difference caused by mutation. This difference is expressed in parts per million (ppm) as shown in Eq. (1), where count is the number of positions at which the two sequences differ.

$$\text{Percentage difference (ppm)} = \frac{\text{count}}{\text{length(BRCA1)}} \times 1\,000\,000 \qquad (1)$$
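Eq. (1) can be written as a small function. This is a sketch. It assumes that count is the number of mismatched positions between the aligned normal and mutated sequences. The numbers in the usage lines are hypothetical and are not the paper's data.

```python
def mutation_ppm(mismatch_count: int, sequence_length: int) -> float:
    """Eq. (1): mismatched positions per million bases of the compared sequence."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return mismatch_count * 1_000_000 / sequence_length


def ppm_to_percent(ppm: float) -> float:
    """Convert parts per million to percent (1% = 10,000 ppm)."""
    return ppm / 10_000


# Hypothetical numbers, not the paper's data: 3 mismatches in 6000 bases.
ppm = mutation_ppm(3, 6000)
print(ppm)                  # 500.0
print(ppm_to_percent(ppm))  # 0.05
```

The length is in the denominator. The same number of mismatches therefore gives a larger value for a shorter sequence.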

The obtained ppm value depends on the length of the sequence. For a given number of changes, a shorter sequence gives a higher percentage difference, and vice versa. The measured value was 564.334085778781 ppm. The point mutation, expressed as a percentage, is shown in Table 2.

Table 2 Percentage difference evaluation

Approaches | Technique | Point mutation (%)
Proposed approach | DNA sequence | 0.05
Yi et al. [10] | Tissue sequence | 0.3

Once the point mutations were identified, their positions in the dataset were obtained: 784, 949, 1364, and 1378. These results indicate that the proposed approach can detect even a very small proportion of point mutations compared with the existing approach in [10]. That approach used tissue sequencing for identification and is compared with the proposed approach in Table 2.

5 Conclusion This paper proposes breast cancer detection through a genomic sequencing approach. The approach identifies the specific changes that occur in the BRCA1 sequence and analyzes the affected regions of the acquired DNA sequence. It applies one-hot encoding to compare the mutated and normal BRCA1 sequences. This identifies the location and degree of the gene alteration that causes breast cancer. The proposed approach detects even the smallest point mutations. By identifying the particular mutated sequence, the approach can help in removing the gene that causes breast cancer through gene replacement therapy.

References
1. Wang X, Zou C, Zhang Y, Li X, Wang C, Ke F, Chen J, Wang W, Wang D, Xu X, Xie L, Zhang Y (2021) Prediction of BRCA gene mutation in breast cancer based on deep learning and histopathology images. Front Genet 12. https://doi.org/10.3389/fgene.2021.661109
2. Kothari C, Osseni MA, Agbo L et al (2020) Machine learning analysis identifies genes differentiating triple negative breast cancers. Sci Rep 10:10464. https://doi.org/10.1038/s41598-020-67525-1
3. Sun Y, Zhu S, Ma K et al (2019) Identification of 12 cancer types through genome deep learning. Sci Rep 9:17256. https://doi.org/10.1038/s41598-019-53989-3
4. Findlay GM, Daza RM, Martin B, Zhang MD, Leith AP, Gasperini M, Janizek JD, Huang X, Starita LM, Shendure J (2018) Accurate classification of BRCA1 variants with saturation genome editing. Nature 562(7726):217–222. https://doi.org/10.1038/s41586-018-0461-z
5. Cortés C, Rivera AL, Trochez D, Solarte M, Gómez D, Cifuentes L, Barreto G (2019) Mutational analysis of BRCA1 and BRCA2 genes in women with familial breast cancer from different regions of Colombia. Hered Cancer Clin Pract 17:20. https://doi.org/10.1186/s13053-019-0120-x
6. Møller P, Dominguez-Valentin M, Rødland EA, Hovig E (2019) Causes for frequent pathogenic BRCA1 variants include low penetrance in fertile ages, recurrent de-novo mutations and genetic drift. Cancers (Basel) 11(2):132 [published correction appears in Cancers (Basel)]. https://doi.org/10.3390/cancers11020132
7. Golubeva VA, Nepomuceno TC, Monteiro ANA (2019) Germline missense variants in BRCA1: new trends and challenges for clinical annotation. Cancers (Basel) 11(4):522. https://doi.org/10.3390/cancers11040522
8. https://www.ncbi.nlm.nih.gov/, Dataset
9. Madariaga A, Lheureux S, Oza AM (2019) Tailoring ovarian cancer treatment: implications of BRCA1/2 mutations. Cancers (Basel) 11(3):416. https://doi.org/10.3390/cancers11030416
10. Yi Z, Ma F, Li C et al (2017) Landscape of somatic mutations in different subtypes of advanced breast cancer with circulating tumor DNA analysis. Sci Rep 7:5995. https://doi.org/10.1038/s41598-017-06327-4
11. Haddadin IS, McIntosh A, Meisamy S et al (2009) Metabolite quantification and high-field MRS in breast cancer. NMR Biomed 22(1):65–76. https://doi.org/10.1002/nbm.1217
12. Amethiya Y, Pipariya P, Patel S, Shah M (2021) Comparative analysis of breast cancer detection using machine learning and biosensors. Intell Med, ISSN 2667-1026. https://doi.org/10.1016/j.imed.2021.08.004

Develop Model for Recognition of Handwritten Equation Using Machine Learning
Kaushal Kishor, Rohan Tyagi, Rakhi Bhati, and Bipin Kumar Rai

Abstract Existing handwritten equation recognizers have numerous problems. They are not purely based on ML and therefore cannot take advantage of the large processing capacity of modern machines. They also lack an adequate amount of data to precisely recognize the characters, symbols, and numbers drawn in the input image. In this work, a handwritten equation recognizer is implemented, and its performance is evaluated on real photographs. Neural networks with an accuracy of 95% are used to recognize the alphabet, all digits, mathematical symbols, and some special symbols such as π and ψ (Hossain et al. in 2018 Joint 7th International Conference on Informatics, Electronics and Vision (ICIEV) and 2018 2nd International Conference on Imaging, Vision and Pattern Recognition (icIVPR). IEEE, New York, pp. 250–255, 2018). The major goal of this work is to develop a software program that detects handwritten equations, phrases, and paragraphs composed of handwritten numbers, characters, words, and mathematical symbols. It uses a CNN and image processing techniques to achieve appropriate accuracy. A further goal is to execute fundamental operations, such as addition and subtraction of numbers, on the recognized equation. We also perform text recognition, reading a handwritten sentence or paragraph and converting it to computer-typed format. Finally, the experimental results suggest that the software application performs well.

K. Kishor (B) · R. Tyagi · R. Bhati · B. K. Rai Department of Information Technology, ABES Institute of Technology, Ghaziabad, Uttar Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_23


K. Kishor et al.

Keywords Convolutional neural network · Image processing · Handwritten equation recognizer · Image contour · Segmentation

1 Introduction With the advancement of technology, machine learning (ML) and deep learning now play critical roles [1, 2]. ML and deep learning techniques are used in robotics, handwriting recognition, AI, and a variety of other fields [3, 4]. Building such a system requires instructions for the machine as well as a suitable dataset, in order to identify patterns efficiently and make acceptable predictions. This study offers a handwritten equation and text recognizer capable of recognizing handwritten numbers, phrases, characters, and mathematical symbols. It combines a CNN with image processing techniques to achieve appropriate precision [5–8]. A handwritten equation and text recognizer is a software tool that interprets the characters that make up a given mathematical equation, paragraph, or sentence. An image of a handwritten mathematical equation, phrase, or paragraph is fed to the software, which recognizes the characters for further processing and prediction [9]. A computer's ability to examine patterns and interpret the information gathered during pattern recognition is a tremendous accomplishment. Training a computer to detect and interpret mathematical formulas, paragraphs, or words, which are popular instances of pattern recognition, remains a significant challenge [10]. Suppose a neural network can robustly train a machine to recognize arithmetic patterns, so that it eventually performs arithmetic operations on the expression with minimal manual input from humans. Such a network would significantly improve the calculation of mathematical equations. The final program is intended to accomplish only the tasks listed above.
It teaches a computer model to recognize mathematical patterns and can provide the user with a list of the characters involved in the equation. It can also read handwritten words or phrases and represent them in computer-typed format.

2 Literature Review Hossain et al. [11] recognize handwritten mathematical formulae. They handle both a single quadratic and a sequence of quadratics. Each quadratic line is split using horizontal projection, and connected components, which have a high success rate, are used for character segmentation. Feature extraction is the most challenging element of classification. The study by Manchala et al. [12] concerns the classification of attributes and is carried out with a standard neural network, achieving a success rate of more than 90.3%. The algorithm produces an efficient and effective output for recognition and gives the maximum level of text accuracy with the least amount of noise. Tiwari et al. [13] develop an equation-solving mobile app for the Android operating system. The app takes a snapshot using the camera, finds a solution, and presents the outcome. The program can solve simple arithmetic formulae (addition, subtraction, and multiplication) as well as systems of two linear equations, and both handwritten and computer-generated equations are acceptable. In Murugan et al. [14], the method generates suitable handwritten text images in a variety of character representations while maintaining consistency in style and size. Precision is higher when there is less inaccuracy or distortion of the characters, and the same holds for the background. The clarity of the image varies depending on the handwritten content. Patil et al. [15] developed a website for solving handwritten equations. It records handwritten equations with a camera, and character recognition is accomplished through image pre-processing. As a consequence, the model is straightforward to use for solving complex equations and producing correct answers. Velpuri et al. [16] propose a model that assesses written information and converts it to computer text and voice formats. The software has the potential to be employed in a range of healthcare and consumer settings. In health applications, it could be used to digitize and store practically every record. Upadhyay et al. [17] created a neural network architecture for handwritten character recognition, composed of a convolutional encoder for input images and a bidirectional LSTM decoder for character sequence prediction. The architecture is also motivated by the desire for fast training on GPUs and fast inference on CPUs. Nikitha et al. [18] collect data for training on handwritten texts, extract features from those text datasets, and then train the model using a deep learning approach. To improve accuracy, they recognize whole words rather than characters, and the model built using the LSTM deep model is quite accurate. Saritha et al. [19] extract the digital form of handwritten text from scanned photographs using a machine learning method. They recognize text in ready-made datasets with pixel values from scanned photographs as inputs, using OpenCV and CNN. Khalkar et al. [20] created an API to recognize handwritten text [21, 22]. They established an interface with the model so that the user can access and use it as needed. They used standard OCR to gather physical trait matches, which they then transformed into a database of recognized types [9].
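Splitting an equation image into lines with a horizontal projection, as mentioned for Hossain et al. [11], can be sketched as below. The sketch assumes an already binarized, deskewed image stored as a list of rows, with 1 for ink and 0 for background. The function name and the `min_ink` threshold are hypothetical. This is not the cited authors' implementation.

```python
from typing import List, Optional, Tuple


def split_lines_by_horizontal_projection(
    binary: List[List[int]], min_ink: int = 1
) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) bands, end exclusive, that contain ink.

    The horizontal projection profile is the ink count of each row; a text
    line is a maximal run of rows whose count is at least `min_ink`.
    """
    profile = [sum(row) for row in binary]
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for r, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = r  # entering a line of symbols
        elif ink < min_ink and start is not None:
            bands.append((start, r))  # leaving the line
            start = None
    if start is not None:  # a line that touches the bottom edge
        bands.append((start, len(profile)))
    return bands


# Toy 7x4 image with two "lines" of ink (1 = ink, 0 = background).
image = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
print(split_lines_by_horizontal_projection(image))  # [(1, 3), (5, 6)]
```

Each band can then be cropped and passed to character segmentation, for example using connected components.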

3 Proposed Model The architecture of the proposed model is shown in Fig. 1 and depicts the method of our model. To begin, we processed a picture containing the word 'STATE' and delivered it to the convolutional layers, which map the features of the input image. Following feature extraction by the convolutional layers, the CNN output is processed further by recurrent layers. These layers process the extracted features in sequence. The output then passes through long short-term memory (LSTM) layers, which have feedback connections and can process entire data sequences [15]. Finally, a transcription layer manages the output. It divides the text into discrete frames, makes per-frame predictions, and generates a predicted sequence that translates the handwritten text into computer-typed format.

Fig. 1 Workflow model for handwritten equation recognizer

Module 1: Equation Recognizer: We build the equation recognizer using a CNN. The input is an image containing a handwritten mathematical equation. The CNN recognizes the characters and symbols in the equation, and the equation is then displayed as computer-typed text.
Module 2: Text Recognizer: We build the text recognizer using CNN, RNN, and CTC. A text image is taken as input and processed by the CNN, RNN, and CTC layers. The output is the computer-typed text recognized from the input image, as shown in Fig. 2.
Process of Handwritten Text Recognizer: Fig. 2 gives an overview of the handwritten text recognizer. It uses a neural network containing CNN, RNN, and CTC layers. The model first takes an image of text as input, and the CNN extracts features from it. The CNN output is then passed as input to the RNN, which processes the extracted feature sequences using LSTM. The result goes to the CTC layer, which scores the recognized text, and the recognized text is displayed.

Fig. 2 Model overview of handwritten text recognizer [15]
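The CTC stage of the text recognizer needs a decoding rule to turn per-frame scores into text. A minimal sketch of best-path (greedy) CTC decoding follows. It assumes that class 0 is the CTC blank and that class k >= 1 maps to character k - 1 of a hypothetical alphabet. The paper does not state whether greedy or beam-search decoding was used.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class 0 is the CTC blank; class k >= 1 is alphabet[k - 1]


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: take the arg-max class of each frame,
    merge consecutive repeats, then drop blanks."""
    best = [max(range(len(s)), key=lambda k: s[k]) for s in frame_scores]
    out: List[str] = []
    prev = None
    for k in best:
        if k != prev and k != BLANK:  # a new, non-blank symbol
            out.append(alphabet[k - 1])
        prev = k  # a blank between repeats lets the same symbol be emitted again
    return "".join(out)


ALPHABET = "AEST"  # hypothetical character set: A=1, E=2, S=3, T=4
path = [3, 3, 4, 0, 1, 4, 4, 2]  # S S T <blank> A T T E
scores = [[0.9 if k == c else 0.025 for k in range(len(ALPHABET) + 1)] for c in path]
print(ctc_greedy_decode(scores, ALPHABET))  # STATE
```

Greedy decoding is the simplest choice. Beam search over the same scores can recover better transcriptions when several paths have similar probability.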

Develop Model for Recognition of Handwritten Equation Using …

263

4 Result We develop an ML model for handwritten text and equation recognition using CNN, RNN, CTC, image processing, segmentation, and backpropagation. The implemented model recognizes handwritten text and equations and converts them into digital form. During training and testing, we observed losses for both the CNN and a multilayer perceptron. Figures 3 and 4 show the training and testing loss and the training and testing accuracy of the CNN as the number of epochs increases. After training, we use thousands of test images of quadratics, text, and paragraphs for recognition. We first trained the model and, after training was complete, used each test image for recognition. If a quadratic is recognized correctly, a command is run to solve the equation. Otherwise, the command is issued again to recognize the next equation. Some outputs of our model are shown in Figs. 5 and 6.

Fig. 3 Graphical representation of loss in CNN

Fig. 4 Graphical representation of accuracy in CNN

Fig. 5 Recognition and conversion of handwritten text into digital format

Fig. 6 Recognition and conversion of handwritten equation into digital format
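Once a quadratic has been recognized correctly, solving it reduces to the quadratic formula. The sketch below assumes that the coefficients a, b, and c have already been extracted from the recognized symbols. That parsing step is not shown, and the example equation is hypothetical. `cmath.sqrt` is used so that a negative discriminant yields complex roots instead of an error.

```python
import cmath
from typing import Tuple


def solve_quadratic(a: float, b: float, c: float) -> Tuple[complex, complex]:
    """Roots of a*x**2 + b*x + c = 0 via the quadratic formula (requires a != 0)."""
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    disc = cmath.sqrt(b * b - 4 * a * c)  # complex square root of the discriminant
    return ((-b + disc) / (2 * a), (-b - disc) / (2 * a))


# Hypothetical recognizer output "x^2-5x+6=0" -> coefficients (1, -5, 6)
print(solve_quadratic(1, -5, 6))  # ((3+0j), (2+0j))
```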

5 Conclusion This study focuses on handwritten quadratic equations, phrases, and paragraphs. For recognition, we considered quadratics and text and translated them into computer-typed text. In the equation recognizer, we first pre-processed the picture, then segmented each equation, and then projected it horizontally. The equation recognizer is based on a CNN. The text is processed by CNN, RNN, and CTC layers before being converted from handwritten text to computer-typed text. The prior model was bulky and required a large amount of processing power and resources to train. Our model is optimized and took less time to compute and train.

References
1. Kishor K, Sharma R, Chhabra M (2022) Student performance prediction using technology of machine learning. In: Sharma DK, Peng SL, Sharma R, Zaitsev DA (eds) Micro-electronics and telecommunication engineering. Lecture notes in networks and systems, vol 373. Springer, Singapore. https://doi.org/10.1007/978-981-16-8721-1_53
2. Kishor K (2022) Communication-efficient federated learning. In: Yadav SP, Bhati BS, Mahato DP, Kumar S (eds) Federated learning for IoT applications. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-85559-8_9
3. Sharma R, Maurya SK, Kishor K (2021) Student performance prediction using technology of machine learning. In: Proceedings of the international conference on innovative computing and communication (ICICC), 3 Jul 2021. Available at SSRN: https://ssrn.com/abstract=3879645
4. Jain A, Sharma Y, Kishor K (2021) Prediction and analysis of financial trends using Ml algorithm (July 11, 2021). In: Proceedings of the international conference on innovative computing and communication (ICICC) 2021. Available at SSRN: https://ssrn.com/abstract=3884458 or https://doi.org/10.2139/ssrn.3884458
5. Tyagi D, Sharma D, Singh R, Kishor K, Real time 'driver drowsiness' & monitoring and detection techniques. Int J Innov Technol Exploring Eng 9(8):280–284
6. Kishor K (2022) Personalized federated learning. In: Yadav SP, Bhati BS, Mahato DP, Kumar S (eds) Federated learning for IoT applications. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-85559-8_3
7. Gupta S, Tyagi S, Kishor K (2022) Study and development of self sanitizing smart elevator. In: Gupta D, Polkowski Z, Khanna A, Bhattacharyya S, Castillo O (eds) Proceedings of data analytics and management. Lecture notes on data engineering and communications technologies, vol 90. Springer, Singapore. https://doi.org/10.1007/978-981-16-6289-8_15
8. Sharma A, Jha N, Kishor K (2022) Predict COVID-19 with Chest X-ray. In: Gupta D, Polkowski Z, Khanna A, Bhattacharyya S, Castillo O (eds) Proceedings of data analytics and management. Lecture notes on data engineering and communications technologies, vol 90. Springer, Singapore. https://doi.org/10.1007/978-981-16-6289-8_16
9. Agrawal P, Chaudhary D, Madaan V et al (2021) Automated bank cheque verification using image processing and deep learning methods. Multimed Tools Appl 80:5319–5350. https://doi.org/10.1007/s11042-020-09818-1
10. Lin H, Tan J (2020) Application of deep learning in handwritten mathematical expressions recognition. In: Lu Y, Vincent N, Yuen PC, Zheng WS, Cheriet F, Suen CY (eds) Pattern recognition and artificial intelligence. ICPRAI 2020. Lecture notes in computer science, vol 12068. Springer, Cham. https://doi.org/10.1007/978-3-030-59830-3_12
11. Hossain MB et al (2018) Recognition and solution for handwritten equations using convolutional neural networks. In: 2018 joint 7th international conference on informatics, electronics and vision (ICIEV) and 2018 2nd international conference on imaging, vision and pattern recognition (ICIVPR). IEEE
12. Manchala S, Kinthali J, Kotha K, Kumar J (2020) Handwritten text recognition using deep learning with TensorFlow. Int J Eng Res Technol (IJERT) V9. https://doi.org/10.17577/IJERTV9IS050534
13. Tiwari A, Tiwari A, Mishra A, Wakurdekar SB (2020) Equation solving using image processing. Int J Innov Res Technol. ISSN: 2349-6002
14. Murugan N, Sivakumar R, Yukesh G, Vishnupriyan J (2020) Recognition of character from handwritten. In: 2020 6th international conference on advanced computing and communication systems (ICACCS), pp 1417–1419. https://doi.org/10.1109/ICACCS48705.2020.9074424
15. Patil SV, Patil AS, Mokal HC (2021) Handwritten equation solver using convolutional neural network. Int J Sci Res Eng Trends 7(4). ISSN (Online): 2395-566X
16. Velpuri NSST, Pandala GK, Lalitha Rao Sharma Polavarapu SD, Kumari PR (2019) Handwritten text recognition using machine learning techniques in application of NLP. Int J Innov Technol Exploring Eng (IJITEE) 9(2). ISSN: 2278-3075
17. Upadhyay BD, Chaudhary S, Gulhane M (2020) Handwritten character recognition using machine learning. Int J Future Gener Commun Netw 13(2s):1256–1266
18. Nikitha, Geetha J, JayaLakshmi DS (2020) Handwritten text recognition using deep learning. In: International conference on recent trends on electronics, information, communication and technology (RTEICT), 2020, pp 388–392. https://doi.org/10.1109/RTEICT49044.2020.9315679
19. Saritha SJ, Deepak Teja KRJ, Hemanth Kumar G, Jeelani Sharief S (2020) Handwritten text detection using OpenCV and CNN. Int J Eng Res Technol (IJERT) 09(04)
20. Khalkar R, Dikhit A, Goel A (2021) Handwritten text recognition using deep learning (CNN and RNN). IARJSET 8:870–881. https://doi.org/10.17148/IARJSET.2021.86148
21. Nanehkaran YA, Zhang D, Salimi S et al (2021) Analysis and comparison of machine learning classifiers and deep neural networks techniques for recognition of Farsi handwritten digits. J Supercomput 77:3193–3222. https://doi.org/10.1007/s11227-020-03388-7
22. Sethi R, Kaushik I (2020) Hand written digit recognition using machine learning. In: 2020 IEEE 9th international conference on communication systems and network technologies (CSNT). IEEE, pp 49–54

Feature Over Exemplification-Based Classification for Revelation of Hypothyroid
M. Shyamala Devi, P. S. Ramesh, S. Vinoth Kumar, R. Bhuvana Shanmuka Sai Sivani, S. Muskaan Sultan, and Thaninki Adithya Siva Srinivas
Abstract Hypothyroidism is a condition in which the thyroid gland produces insufficient thyroid hormone. Heart rate, body temperature, and many other aspects of metabolic activity can become problematic when a hypothyroid patient has an inadequate level of thyroid hormone. Conventional diagnosis and management based on thyroid-stimulating hormone (TSH) levels do not always lead to symptomatic improvement in hypothyroidism, to the frustration of both patients and physicians. Machine learning could help the healthcare industry predict hypothyroidism. With this in view, this project aims to predict the presence of hypothyroidism using a hypothyroid dataset obtained from the Kaggle machine learning repository. The dataset contains 24 features and 3164 patient records. It is pre-processed by encoding categorical features and handling missing values. To examine the performance measures, the original data is applied to all classifiers with and without feature scaling. Exploratory data analysis is performed to analyze the distribution of the target variable. The target distribution is 91.7% non-hypothyroid and 8.3% hypothyroid, which is clearly imbalanced. Oversampling methods such as Borderline-SMOTE, SMOTE, SVM-SMOTE, and ADASYN are applied to balance the target distribution. Multiple classifiers are then applied to the oversampled dataset to predict hypothyroidism, both before and after feature scaling. The performance metrics are precision, recall, F-score, and accuracy.
Experimental observations show that the random forest classifier achieves 89% accuracy before oversampling of the target hypothyroid feature. After the Borderline1, Borderline2, SMOTE, and SVMSMOTE oversampling techniques are applied, the same random forest classifier reaches 99% accuracy. With the ADASYN oversampling method, it achieves 100% accuracy.
Keywords Machine learning · Oversampling · Classifier · Scaling and metrics
M. Shyamala Devi (B) · P. S. Ramesh · S. Vinoth Kumar · R. Bhuvana Shanmuka Sai Sivani · S. Muskaan Sultan · T. A. S. Srinivas Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamilnadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_24
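The oversampling step described in the abstract can be illustrated with a minimal SMOTE-style sketch in plain Python. For a randomly chosen minority sample, it picks one of its k nearest minority neighbours and interpolates between the two. The function name, `k`, `seed`, and the toy data are assumptions. Borderline-SMOTE, SVM-SMOTE, and ADASYN differ mainly in which minority samples are chosen and how many synthetic points each one receives. The imbalanced-learn classes `SMOTE`, `BorderlineSMOTE`, `SVMSMOTE`, and `ADASYN` (each with `fit_resample`) provide tested implementations.

```python
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    For a random minority sample x, one of its k nearest minority neighbours z
    is chosen and the point x + u * (z - x), with u ~ Uniform[0, 1), is returned.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Other minority samples sorted by squared Euclidean distance to x.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, minority[j])),
        )
        z = minority[rng.choice(neighbours[:k])]
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, z)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy minority class
print(smote(minority, n_new=4, k=2))
```

To balance the classes fully, `n_new` would be set to the majority count minus the minority count. Synthetic samples are normally generated from the training split only, so that test records do not leak into training.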


M. Shyamala Devi et al.

1 Introduction In the diagnosis of thyroid disease, the interpretation of thyroid gland functional data is crucial. The primary function of the thyroid gland is to help regulate the body's metabolism. Thyroid disorders have become more common in recent years all over the world. Hypothyroidism, hyperthyroidism, and thyroid cancer affect one in every eight Romanian women, and according to various studies, approximately 30% of Romanians suffer from pervasive goiter. Thyroid function is affected by stress, infection, trauma, toxins, a low-calorie diet, certain medications, and other factors. Classification is a key supervised-learning data mining technique for assigning records to predefined classes. The rest of the paper is structured as follows. Section 2 covers related research, and Sect. 3 describes the paper's contributions. Section 4 discusses the implementation setup and results, and Sect. 5 draws conclusions.

2 Literature Review One study emphasized how to predict thyroid disease and when to use logistic regression, decision trees, and KNN as classification tools; it used the thyroid dataset from the Knowledge Discovery in Databases archive of the UC Irvine machine learning repository [1]. Another study focused on classifying common thyroid dysfunctions in the general population; all of the classification models it evaluated were accurate, and the decision tree model had the best classification rate [2]. In thyroid disease diagnosis, precise estimation of thyroid gland functional data is critical [3]. The primary function of the thyroid gland is to help regulate the body’s metabolism, and whether too little or too much thyroid hormone is produced determines the type of thyroid disease. A variety of neural networks have been used to aid the diagnosis of thyroid disease [4]. Another work diagnosed thyroid disease with a new hybrid learning method that combines the AIRS classification system with advanced fuzzy weighted pre-processing, and used cross-validation to determine the technique’s robustness to sampling variability [5]. A similar hybrid of AIRS and fuzzy weighted pre-processing, also assessed by cross-validation analysis, was proposed in [6]. The expansion of scientific knowledge and the massive production of data have led to exponential growth of databases and repositories, and the biomedical domain is one of the richest data domains.
Biomedical data is currently abundant, ranging from clinical symptom information to various types of biochemical data and imaging device outputs. It is

Feature Over Exemplification-Based Classification for Revelation …


difficult to mechanically extract biological information from images and reshape it into machine-readable knowledge [7]. One method for earlier detection of thyroid disease trains an artificial neural network (ANN) with the error backpropagation algorithm; the ANN was trained on empirical values and tested on data that had not been used during training [8]. Because efficient techniques for analyzing and identifying disorders are required, data mining is an important methodological approach in the medical disciplines, with applications in clinical governance, health information technology, patient care systems, and other areas. Classification and clustering are two popular data mining techniques for recognizing the complex parameters of the nutrition dataset [9]. A novel method has also been described for detecting three types of abnormal red blood cells, known as poikilocytes, in iron-deficient blood smears. Classifying and counting poikilocytes is regarded as a critical step in the early detection of iron deficiency anemia (IDA), and dacrocytes, elliptocytes, and schistocytes are the three basic poikilocyte types in IDA [10].

3 Our Contributions The overall structure of the project is depicted in Fig. 1. The work makes the following contributions:
• First, the hypothyroid dataset, which has 24 features and 3164 patient records, is preprocessed by encoding categorical values and handling null values.
• Second, to assess the performance indicators, the original dataset is fitted to various classifier models, both with and without feature scaling.
• Third, the distribution of the target attribute is examined using exploratory data analysis.
• Fourth, the target distribution is found to contain 91.7% non-hypothyroid and 8.3% hypothyroid records, which clearly indicates an imbalanced target.
• Fifth, the dataset is resampled with the Borderline SMOTE, SMOTE, SVM SMOTE, and ADASYN oversampling methods to balance the target feature distribution.
• Sixth, hypothyroidism is predicted before and after feature scaling on each oversampled dataset to examine the performance indices.
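The imbalance reported in the fourth step (91.7% non-hypothyroid vs. 8.3% hypothyroid) is simply each class’s share of the target column. A minimal sketch, assuming the labels are available as a Python list; the label strings and the 917/83 toy split are hypothetical stand-ins chosen to reproduce the reported percentages, not the paper’s records.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the target column as a percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


# Hypothetical toy labels (not the real dataset) with a 917:83 split
labels = ["negative"] * 917 + ["hypothyroid"] * 83
dist = class_distribution(labels)
print({cls: round(pct, 1) for cls, pct in dist.items()})
# {'negative': 91.7, 'hypothyroid': 8.3}
```

With a split this skewed, a classifier that always predicts the majority class already scores about 91.7% accuracy, which is why the minority class is oversampled before the classifiers are compared.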

4 Results and Predictive Analysis Data preparation is used to fill in null values and encode categorical variables in the hypothyroid dataset from the Kaggle repository, which has 3164 rows and 24 attributes. The target distribution of the dataset is depicted in Fig. 2, and Table 1 shows the performance analysis when the original dataset is applied to the classifier models.
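The Precision, Recall, F-score, and Accu columns in the tables below can all be derived from confusion-matrix counts. A minimal sketch for a binary task, assuming hypothyroid is encoded as the positive class `1`; the paper does not state whether its scores are binary, macro-averaged, or weighted, so this only illustrates the definitions, and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F-score and accuracy with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score is the harmonic mean of precision and recall
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = correct / len(pairs)
    return precision, recall, f_score, accuracy


# Hypothetical toy labels: 2 true positives, 1 false positive, 1 false negative
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
p, r, f, acc = binary_metrics(y_true, y_pred)
```

Here precision, recall, and F-score are all 2/3 and accuracy is 6/8 = 0.75, showing how accuracy can sit above the minority-class scores on imbalanced data.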


Fig. 1 Architecture system workflow (stages shown: hypothyroid data set; division of dependent and independent attributes; data exploratory analysis; raw dataset; feature scaling; oversampling with SMOTE, SVM SMOTE, ADASYN, and Borderline to produce oversampled datasets; fitting to all the classifiers; evaluation of precision, recall, F-score, accuracy, and run time; prediction of hypothyroid)

Fig. 2 Density map and target distribution of the hypothyroid information

RunTime

0.85

0.86

0.76

0.81

0.83

0.82

0.88

0.88

0.85

0.84

0.86

0.85

0.85

KNN

KSVM

GNB

Dtree

Etree

RFor

AdaB

Ridge

RCV

SGD

PAg

Bagg

0.88

0.87

0.88

0.87

0.88

0.89

0.88

0.83

0.84

0.36

0.87

0.88

0.87

0.86

0.86

0.87

0.83

0.83

0.88

0.83

0.82

0.83

0.42

0.81

0.84

0.85

0.88

0.87

0.88

0.87

0.88

0.88

0.89

0.83

0.84

0.36

0.87

0.88

0.87

1.04

0.06

0.09

0.08

0.08

0.82

0.17

0.02

0.14

0.02

1.02

0.77

0.09

0.86

0.86

0.87

0.84

0.85

0.88

0.86

0.82

0.83

0.81

0.85

0.87

0.86

0.88

0.84

0.87

0.87

0.88

0.89

0.88

0.83

0.84

0.36

0.87

0.88

0.88

Recall

Absence of scaling Accu

Precision

F-score

Precision

Recall

Presence of scaling

LReg

Classifier

Table 1 Original dataset performance analysis

0.85

0.85

0.87

0.83

0.83

0.88

0.83

0.82

0.83

0.42

0.82

0.83

0.87

F-score

0.88

0.84

0.87

0.87

0.88

0.89

0.88

0.83

0.84

0.36

0.87

0.88

0.88

Accu

1.05

0.02

0.10

0.05

0.02

0.82

0.16

0.02

0.11

0.02

0.88

0.78

0.11

RunTime


The dataset is then oversampled, and the resulting distribution of the target hypothyroid attribute is shown in Fig. 3. The dataset oversampled with Borderline1 is fitted to all the classifiers to predict hypothyroidism with and without feature scaling, and the performance of each classifier is shown in Table 2. The same analysis is repeated for Borderline2 oversampling (Table 3), SMOTE oversampling (Table 4), and SVM SMOTE oversampling (Table 5).
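All of these oversamplers build on the SMOTE idea of creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours; Borderline and SVM SMOTE differ mainly in which minority samples they choose to interpolate from. The paper does not say which implementation it used (the imbalanced-learn library, for example, provides SMOTE, BorderlineSMOTE, SVMSMOTE, and ADASYN classes). Below is a minimal from-scratch sketch of plain SMOTE under stated assumptions: brute-force Euclidean neighbours, a hypothetical `smote` helper, and a toy minority set.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    X_min: (n, d) array of minority-class samples (requires n > k).
    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances among minority samples
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        i = rng.integers(len(X_min))           # random minority sample
        j = neighbours[i, rng.integers(k)]     # one of its k nearest neighbours
        gap = rng.random()                     # interpolation factor in [0, 1)
        synthetic[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic


# Toy minority set: the four corners of the unit square (hypothetical data)
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote(X_min, n_synthetic=10, k=2)
print(new_points.shape)  # (10, 2)
```

Because every synthetic point is a convex combination of two minority samples, the new points stay inside the region the minority class already occupies rather than duplicating existing rows.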

Fig. 3 Target hypothyroid dissemination with oversampling methods

RunTime

0.95

0.86

0.91

0.99

0.92

0.88

0.99

0.95

0.86

0.91

0.99

0.99

0.86

KNN

KSVM

GNB

Dtree

Etree

RFor

AdaB

Ridge

RCV

SGD

PAg

Bagg

0.86

0.99

0.99

0.91

0.86

0.95

0.99

0.88

0.92

0.99

0.91

0.86

0.95

0.86

0.99

0.99

0.91

0.86

0.95

0.99

0.88

0.92

0.99

0.91

0.86

0.95

0.86

0.99

0.99

0.91

0.86

0.95

0.99

0.88

0.92

0.99

0.91

0.86

0.95

4.27

0.09

0.26

0.10

0.02

3.62

0.79

0.02

0.51

0.03

7.34

2.23

0.20

0.99

0.97

0.98

0.96

0.96

0.99

0.99

0.99

0.99

0.83

0.99

0.99

0.98

0.99

0.97

0.98

0.96

0.96

0.99

0.99

0.99

0.99

0.74

0.99

0.99

0.98

Recall

Absence of scaling Accu

Precision

F-score

Precision

Recall

Presence of scaling

LReg

Classifier

Table 2 Borderline1 oversampling dataset performance analysis

0.99

0.97

0.98

0.96

0.96

0.99

0.99

0.99

0.99

0.72

0.99

0.99

0.97

F-score

0.99

0.97

0.98

0.96

0.96

0.99

0.99

0.99

0.99

0.74

0.99

0.99

0.98

Accu

4.51

0.05

0.19

0.11

0.02

3.34

0.78

0.01

0.51

0.03

4.31

3.67

0.20

RunTime


RunTime

0.99

0.99

0.94

0.99

0.92

0.88

0.99

0.92

0.88

0.97

0.95

0.92

0.88

KNN

KSVM

GNB

Dtree

Etree

RFor

AdaB

Ridge

RCV

SGD

PAg

Bagg

0.88

0.92

0.95

0.97

0.88

0.92

0.99

0.88

0.92

0.99

0.94

0.99

0.99

0.88

0.92

0.95

0.97

0.88

0.92

0.99

0.88

0.92

0.99

0.94

0.99

0.99

0.88

0.92

0.95

0.97

0.88

0.92

0.99

0.88

0.92

0.99

0.94

0.99

0.99

4.27

0.09

0.26

0.10

0.02

3.62

0.79

0.02

0.51

0.03

7.34

2.23

0.20

0.96

0.93

0.94

0.95

0.95

0.99

0.99

0.99

0.99

0.82

0.98

0.98

0.96

0.96

0.93

0.94

0.94

0.94

0.99

0.99

0.99

0.99

0.73

0.98

0.98

0.96

Recall

Absence of scaling Accu

Precision

F-score

Precision

Recall

Presence of scaling

LReg

Classifier

Table 3 Borderline2 oversampling dataset performance analysis

0.96

0.93

0.94

0.94

0.94

0.99

0.99

0.99

0.99

0.70

0.98

0.98

0.96

F-score

0.96

0.93

0.94

0.94

0.94

0.99

0.99

0.99

0.99

0.73

0.98

0.98

0.96

Accu

4.51

0.05

0.19

0.11

0.02

3.34

0.78

0.01

0.51

0.03

4.31

3.67

0.20

RunTime


RunTime

0.98

0.98

0.94

0.98

0.99

0.99

0.99

0.99

0.98

0.98

0.98

0.98

0.99

KNN

KSVM

GNB

Dtree

Etree

RFor

AdaB

Ridge

RCV

SGD

PAg

Bagg

0.99

0.98

0.98

0.98

0.98

0.99

0.99

0.99

0.99

0.98

0.94

0.98

0.98

0.99

0.98

0.98

0.98

0.98

0.99

0.99

0.99

0.99

0.98

0.94

0.98

0.98

0.99

0.98

0.98

0.98

0.98

0.99

0.99

0.99

0.99

0.98

0.94

0.98

0.98

7.77

0.33

0.41

0.15

0.04

7.54

1.89

0.03

1.26

0.04

18.74

4.70

0.36

0.94

0.77

0.98

0.98

0.98

0.99

0.99

0.99

0.99

0.98

0.98

0.99

0.97

0.93

0.77

0.98

0.97

0.97

0.99

0.99

0.99

0.99

0.97

0.98

0.99

0.97

Recall

Absence of scaling Accu

Precision

F-score

Precision

Recall

Presence of scaling

LReg

Classifier

Table 4 SMOTE oversampling dataset performance analysis

0.93

0.77

0.98

0.97

0.97

0.99

0.99

0.99

0.99

0.97

0.98

0.99

0.97

F-score

0.93

0.77

0.98

0.97

0.97

0.99

0.99

0.99

0.99

0.97

0.98

0.99

0.97

Accu

10.06

0.22

0.62

0.18

0.05

6.18

1.15

0.02

0.90

0.03

7.66

7.08

0.38

RunTime


RunTime

0.94

0.99

0.92

0.94

0.99

0.92

0.99

0.94

0.99

0.92

0.97

0.99

0.94

KNN

KSVM

GNB

Dtree

Etree

RFor

AdaB

Ridge

RCV

SGD

PAg

Bagg

0.94

0.99

0.97

0.92

0.99

0.94

0.99

0.92

0.99

0.94

0.92

0.99

0.94

0.94

0.99

0.97

0.92

0.99

0.94

0.99

0.92

0.99

0.94

0.92

0.99

0.94

0.94

0.99

0.97

0.92

0.99

0.94

0.99

0.92

0.99

0.94

0.92

0.99

0.94

10.52

0.25

0.26

0.17

0.04

7.46

1.55

0.05

1.22

0.05

15.31

5.61

0.31

0.99

0.99

0.99

0.96

0.96

0.99

0.99

0.99

0.99

0.98

0.99

0.98

0.99

0.99

0.99

0.99

0.96

0.96

0.99

0.99

0.99

0.99

0.98

0.99

0.98

0.99

Recall

Absence of scaling Accu

Precision

F-score

Precision

Recall

Presence of scaling

LReg

Classifier

Table 5 SVM SMOTE oversampling dataset performance analysis

0.99

0.99

0.99

0.96

0.96

0.99

0.99

0.99

0.99

0.98

0.99

0.98

0.99

F-score

0.99

0.99

0.99

0.96

0.96

0.99

0.99

0.99

0.99

0.98

0.99

0.98

0.99

Accu

9.97

0.15

0.43

0.19

0.04

8.75

2.62

0.06

1.98

0.07

9.99

9.34

0.32

RunTime


Finally, the dataset is oversampled with ADASYN and fitted to all the classifiers to predict hypothyroidism with and without feature scaling; the resulting performance is shown in Table 6.
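What distinguishes ADASYN from plain SMOTE is how it allocates synthetic samples: minority points whose k nearest neighbours are mostly majority-class points (the harder-to-learn ones) receive more synthetic samples. A minimal sketch of that allocation step only, assuming distinct points, Euclidean distance, and a balancing factor `beta = 1` (full balance); the hypothetical `adasyn_allocation` helper and the toy 1-D data are illustrative, and the interpolation itself would then proceed as in SMOTE.

```python
import numpy as np


def adasyn_allocation(X, y, minority=1, k=5, beta=1.0):
    """Number of synthetic samples ADASYN assigns to each minority sample."""
    X_min = X[y == minority]
    n_min = len(X_min)
    n_maj = len(X) - n_min
    G = int((n_maj - n_min) * beta)                # total synthetic samples to create
    dists = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=-1)
    # Column 0 of the sort is the sample itself (distance 0), so skip it
    order = np.argsort(dists, axis=1)[:, 1:k + 1]
    r = (y[order] != minority).sum(axis=1) / k     # share of majority neighbours
    if r.sum() == 0:
        return np.zeros(n_min, dtype=int)
    return np.rint(r / r.sum() * G).astype(int)    # normalise, then scale to G


# Toy 1-D data: six majority points, one minority point among them, two apart
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [2.5], [10.0], [11.0]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])
print(adasyn_allocation(X, y, k=2))  # [2 1 1]
```

The minority point at 2.5, surrounded by majority neighbours, gets the largest share, while the two minority points that neighbour each other get fewer.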

5 Conclusion This research work evaluates the prediction and classification of hypothyroid disease with respect to the target class distribution. Exploring the hypothyroid dataset reveals an imbalanced target class: 91.7% of the patients are non-hypothyroid and 8.3% are hypothyroid. The target class is therefore oversampled, and the paper examines how much classifier accuracy improves as a result. The SMOTE, SVM SMOTE, ADASYN, and Borderline SMOTE oversampling methods are applied to analyze how well the classifiers predict the target hypothyroid class. Experimental results show that the random forest classifier achieves 89% accuracy before oversampling the target hypothyroid feature, 99% after Borderline1, Borderline2, SMOTE, and SVMSMOTE oversampling, and 100% with ADASYN oversampling.

RunTime

0.94

0.95

0.86

0.91

0.99

0.99

1.00

0.98

0.92

0.92

0.88

0.80

0.86

KNN

KSVM

GNB

Dtree

Etree

RFor

AdaB

Ridge

RCV

SGD

PAg

Bagg

0.86

0.75

0.88

0.92

0.92

0.98

1.00

0.99

0.99

0.91

0.86

0.95

0.94

0.86

0.74

0.88

0.92

0.92

0.98

1.00

0.99

0.99

0.91

0.86

0.95

0.94

0.86

0.75

0.88

0.92

0.92

0.98

1.00

0.99

0.99

0.91

0.86

0.95

0.94

5.67

0.14

0.42

0.13

0.04

5.80

1.12

0.04

0.88

0.04

15.70

4.74

0.23

0.99

0.93

0.92

0.91

0.91

0.99

1.00

0.99

0.99

0.86

0.99

0.99

0.94

0.99

0.93

0.92

0.91

0.91

0.99

1.00

0.99

0.99

0.86

0.99

0.99

0.94

Recall

Absence of scaling Accu

Precision

F-score

Precision

Recall

Presence of scaling

LReg

Classifier

Table 6 ADASYN oversampling dataset performance analysis

0.99

0.93

0.92

0.91

0.91

0.99

1.00

0.99

0.99

0.86

0.99

0.99

0.94

F-score

0.99

0.93

0.92

0.91

0.91

0.99

1.00

0.99

0.99

0.86

0.99

0.99

0.94

Accu

5.68

0.12

0.22

0.11

0.03

5.68

1.16

0.03

0.76

0.04

8.40

5.51

0.28

RunTime


References
1. Marimuthu M, Hariesh KS, Madhankumar K (2018) Heart disease prediction using machine learning and data analytics approach. Int J Comput Appl 181(18)
2. Huang QA, Dong L, Wan LF (2016) Cardiotocography analysis for fetal state classification using machine learning algorithms. J Micro Electromechanical Syst 25(5)
3. Maknouninejad A, Woronowicz K, Safaee A (2018) Enhanced algorithm for real time temperature rise prediction of a traction linear induction motor. J Inst Energy Futures Smart Power Netw
4. Lakshmanaprabu SK (2018) Effective features to classify big data using social internet of things. IEEE Access 6:24196–24204
5. Jancovic P, Kokuer M (2019) Bird species recognition using unsupervised modeling of individual vocalization elements. IEEE/ACM Trans Audio Speech Lang Process 27(5):932–947
6. Sethi P, Jain MA (2010) Comparative feature selection approach for the prediction of healthcare coverage. Commun Comput Inf Sci 54:392–403
7. Piri J, Mohapatra P, Dey R (2020) Fetal health status classification using MOGA—CD based feature selection approach. In: Proceedings of the IEEE international conference on electronics, computing and communication technologies
8. Keenan E, Udhayakumar RK, Karmakar CK, Brownfoot FC, Palaniswami M (2020) Entropy profiling for detection of fetal arrhythmias in short length fetal heart rate recordings. In: Proceedings of the international conference of the IEEE engineering in medicine & biology society
9. Li J, Huang L, Shen Z, Zhang Y, Fang M, Li B, Fu X, Zhao Q, Wang H (2019) Automatic classification of fetal heart rate based on convolutional neural network. IEEE Internet of Things J 6(2)
10. Chen M, Hao Y, Hwang K, Wang L (2017) Disease prediction by machine learning over big data from health care communities. IEEE Access 5:8869–8879

A Framework with IOAHT for Heat Stress Detection and Haemoprotozoan Disease Classification Using Multimodal Approach Combining LSTM and CNN

Shiva Sumanth Reddy and C. Nandini

Abstract Diseases in images can be identified using Haemoprotozoan and histogram image classification. Existing disease classification algorithms suffer from overfitting, data imbalance, and low feature learning efficiency. In this study, batch normalization is used to reduce the disparity in feature values and to increase the classifier’s learning rate. To improve early detection of heat stress and disease classification in images, a Long Short-Term Memory (LSTM)–Convolutional Neural Network (CNN) model is used. The LSTM layer retains significant information over long periods, and the CNN model selects relevant features based on the LSTM output. Batch normalization reduces overfitting, the LSTM model aids feature learning, and the CNN model selects the relevant features. The performance of the proposed approach was tested on the collected Haemoprotozoan images and on breast cancer images. Keywords Batch normalization · Convolution neural network · Haemoprotozoan · Histogram · Long short-term memory

1 Introduction Cattle contract haemoprotozoan diseases, which are spread by ixodid ticks and by blood transfusion. Babesia spp. and Theileria spp. are two of the most common causes of haemoprotozoan disease, causing babesiosis and theileriosis, respectively, while anaplasmosis is a rickettsial disease caused by Anaplasma spp. [1, 2]. The ticks that transmit these diseases to animals also cause tick paralysis, anemia, and hide damage [3]. Haemoparasitaemic animals show anemia, reduced working capacity in bullocks, S. S. Reddy (B) · C. Nandini Department of Computer Science and Engineering, Dayananda Sagara Academy of Technology and Management, Visvesvaraya Technological University (VTU), Bangalore, India e-mail: [email protected] C. Nandini e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_25


S. S. Reddy and C. Nandini

poor productive performance, and poor reproductive performance [4]. Effective treatment and early diagnosis are important to prevent animal deaths and to improve the country’s production ratio. Histopathology images are useful for detecting and grading diseases such as brain cancer, Haemoprotozoan disease, and breast and lung cancer at an early stage. Manual detection of disease is subjective, error-prone, labor-intensive, and time-consuming, so machine learning methods have been applied to detect disease more efficiently and in less time. Breast cancer is a common malignancy with a high mortality rate worldwide [5]. Traditional image processing methods use region-of-interest (ROI) approaches based on handcrafted features to focus on important parts of the images: texture, morphology, and color features are extracted and used to train classifiers such as the Support Vector Machine (SVM) and the Artificial Neural Network (ANN) [6, 7]. Deep learning approaches have been used effectively in a variety of disciplines, including cancer diagnosis, which has sparked interest in Convolutional Neural Network (CNN) research and led to the rapid development of numerous architectures. CNN models have been widely used for mitosis identification in histopathology images for breast cancer classification, but these methods were limited to manually selected ROIs [8, 9]. CNN models can automatically learn the appearance properties of mitotic cells and have greater potential for performance improvement [10]. This study proposes an LSTM–CNN model to improve classification performance on the BreakHis and Haemoprotozoan datasets [11–13]. The paper is organized as follows: Sect. 2 reviews the literature, Sect. 3 explains the proposed approach, Sect. 3.6 describes the simulation setup, Sect. 4 presents the results, and Sect. 5 concludes the paper.

2 Literature Review Haemoprotozoan-carrying ticks transmit disease to animals and also cause tick paralysis, anemia, and hide damage. Few studies have addressed classifying Haemoprotozoan tick-borne disease in images, so this section reviews recent work on disease classification in histopathological images. Saini and Susan [14] used a Deep Convolutional Generative Adversarial Network (DCGAN) with histology images to classify breast cancer. In the first phase, the DCGAN is used to augment the minority class, and the augmented data are then passed to a deep transfer network. The deep transfer architecture is built from the initial pre-trained layers of the VGG16 network, and the higher end of the transfer network consists of a dense layer, dropout, 2D global average pooling, a 2D convolutional layer, and batch normalization. The approach was evaluated on the BreakHis dataset and showed improved classification performance; its limitation is an overfitting problem in the classification process.

A Framework with IOAHT for Heat Stress Detection …


For the identification of Invasive Ductal Carcinoma (IDC), Celik et al. [15] used a deep transfer learning approach with the pre-trained DenseNet-161 and ResNet-50 deep learning models. The method was tested on the histopathological BreakHis dataset and classified better than existing methods; its limitation is the network’s overfitting problem. Alom et al. [16] classified breast cancer with an Inception Recurrent CNN technique that combines the Inception, Residual, and Recurrent methods. The approach was tested on the BreakHis and Breast Cancer Classification Challenge 2015 datasets in terms of patient-level, image-level, patch-based, and image-based classification, and it performed better than existing methods for breast cancer classification. Zhu et al. [17] used multiple compact CNNs to classify breast cancer histopathology images. The system is a hybrid CNN architecture with both local and global model branches; through local voting and the fusion of the two branches’ information, the hybrid model offers a stronger representation ability. A Squeeze-Excitation-Pruning block is used in the hybrid model to learn which channels are important and which are redundant. The technique was evaluated on the BreakHis dataset for breast cancer classification. Kumar et al. [18, 19] proposed a VGGNet-16-based framework for evaluating the performance of various classifiers and tested it on the BreakHis dataset. To boost classification performance, the strategy uses magnification, stain normalization, and data augmentation.
Transfer learning with fine-tuning is used to improve the approach’s learning performance. ConvNets were used to extract features from a fully connected layer, and the SVM and Random Forest classifiers were used to perform disease classification.

3 Proposed Method 3.1 LSTM Layer In this study, batch normalization is used to scale the data and improve classification performance, and the LSTM–CNN model is applied to the scaled data. Figure 1 depicts the LSTM–CNN model with batch normalization. Recurrent neural networks (RNNs) have the advantage of retaining key features over a long time [19–21]. The RNN model captures information from sequential input, but it suffers from the vanishing gradient problem, which impairs

Fig. 1 LSTM–CNN model


Fig. 2 LSTM cell diagram

the network’s capacity to connect features. Because its memory cells store essential information, the LSTM model has an advantage over CNN in extracting features from sequential data. In this study, the input data pass through two LSTM layers that extract information in sequential order. Each LSTM layer has 32 memory cells, and the inputs are supplied to the forget, input, and output gates, which govern the cell’s behavior. The activation of each LSTM unit is computed using Eq. (1):

$h_t = \sigma\left(W_{i,h} \cdot x_t + W_{h,h} \cdot h_{t-1} + b\right)$ (1)

where $b$ is the hidden bias vector, $W_{h,h}$ is the hidden-to-hidden weight matrix, $W_{i,h}$ is the input-to-hidden weight matrix, $\sigma$ is the non-linear activation function, and $h_t$ and $h_{t-1}$ are the activations at times $t$ and $t-1$, respectively. The LSTM unit cell is shown in Fig. 2.
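Eq. (1) is the plain recurrent update, while the LSTM cell the text refers to wraps a memory cell in forget, input, and output gates. A minimal NumPy sketch of one time step of each, using the standard LSTM gate equations; the stacked gate layout, weight initialisation, input size of 8, and sequence length of 5 are assumptions for illustration, not details from the paper (only the 32 memory cells come from the text).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def rnn_step(x_t, h_prev, W_ih, W_hh, b):
    """Eq. (1): h_t = sigma(W_ih . x_t + W_hh . h_{t-1} + b)."""
    return sigmoid(W_ih @ x_t + W_hh @ h_prev + b)


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    stacked as [forget, input, output, candidate] (an assumed layout).
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[:H])              # forget gate: how much old memory to keep
    i = sigmoid(z[H:2 * H])         # input gate: how much new content to write
    o = sigmoid(z[2 * H:3 * H])     # output gate: how much memory to expose
    g = np.tanh(z[3 * H:])          # candidate memory content
    c_t = f * c_prev + i * g        # updated memory cell
    h_t = o * np.tanh(c_t)          # new hidden state
    return h_t, c_t


# Usage: 32 memory cells as in the text; input size 8 and 5 steps are hypothetical
rng = np.random.default_rng(0)
D, H = 8, 32
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape)  # (32,)
```

The additive update `c_t = f * c_prev + i * g` is what lets gradients flow across many steps, which is the advantage over the plain recurrence of Eq. (1).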

3.2 Batch Normalization During deep neural network training, the input distribution of the hidden layers changes frequently. This phenomenon is known as internal covariate shift, and it gradually pushes the inputs toward the upper and lower saturation bounds of the activation function. As a result, the gradients of the lower hidden layers shrink and tend to vanish during backpropagation, and the deep neural network converges more slowly. Batch normalization transforms the input distribution into a standard normal distribution with a mean of 0 and a variance of 1. This keeps the inputs within the sensitive interval of the activation function, where minor changes in input produce larger changes in the network’s loss. This also yields larger gradients, which eliminates the network’s gradient dispersion problem. In a mini-batch, the inputs are denoted $[x_m]$ and the outputs $[y_m]$, $m \in [1, 2, \ldots, M]$. Batch normalization proceeds in four steps: (i) the mini-batch mean $\mu_B$ is calculated, (ii) the mini-batch variance $\sigma_B^2$ is calculated, (iii) the normalized input $\hat{x}_m$ is calculated, and (iv) the output $y_m$ is calculated, as given in Eqs. (2)–(5).

$\mu_B = \frac{1}{M}\sum_{m=1}^{M} x_m$ (2)

$\sigma_B^2 = \frac{1}{M}\sum_{m=1}^{M} \left(x_m - \mu_B\right)^2$ (3)

$\hat{x}_m = \frac{x_m - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}$ (4)

$y_m = \gamma \hat{x}_m + \beta$ (5)

where $\gamma$ and $\beta$ are two parameters learned through backpropagation, and $\varepsilon$ is a small positive constant that prevents division by zero. To balance gradient noise estimation against training efficiency, the mini-batch size is set to 256.
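Eqs. (2)–(5) translate almost line for line into array code. A minimal NumPy sketch of the training-time forward pass, assuming a mini-batch laid out as (M, features) and an illustrative `eps = 1e-5` (the paper does not give a value); it omits the running averages that frameworks keep for inference.

```python
import numpy as np


def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization of a mini-batch x with shape (M, features)."""
    mu = x.mean(axis=0)                      # Eq. (2): mini-batch mean
    var = ((x - mu) ** 2).mean(axis=0)       # Eq. (3): mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # Eq. (4): normalized input
    return gamma * x_hat + beta              # Eq. (5): learned scale and shift


# Usage: a hypothetical mini-batch of 256 samples with 4 features, mean 5, std 3
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 4))
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma = 1` and `beta = 0`, each output feature has mean about 0 and standard deviation about 1; during training the network can learn other values of `gamma` and `beta` if a different scale suits the next layer better.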

3.3 Convolutional Layer Because of its ability to learn distinctive representations from images, CNN has gained prominence in a variety of fields [22, 23]. A CNN uses convolution layers to learn features from input images: convolution kernels are convolved with the inputs, acting as filters, and the result is passed through a non-linear activation function, as in Eq. (6).

$a_{i,j} = f\left(\sum_{m=1}^{M}\sum_{n=1}^{N} w_{m,n} \cdot x_{i+m,\,j+n} + b\right)$ (6)

where $x_{i+m,\,j+n}$ denotes the upper-layer neurons connected to neuron $(i, j)$, $b$ is the bias value, $f$ is a non-linear function, $w_{m,n}$ is the convolution weight matrix, and $a_{i,j}$ is the corresponding activation. In this study, rectified linear units (ReLU) are used as the non-linear function of the convolution layers to compute the feature maps:

$f(x) = \max(0, x)$ (7)
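Eqs. (6) and (7) together describe one convolution filter followed by ReLU. A minimal NumPy sketch for a single-channel input, assuming no padding, stride 1, and 0-based indices (so the window at output $(i, j)$ starts at input $(i, j)$); as in Eq. (6) and most CNN libraries, the kernel is not flipped. The input and kernel values are hypothetical.

```python
import numpy as np


def conv2d_relu(x, w, b):
    """'Valid' 2D convolution (stride 1, no padding) followed by ReLU."""
    M, N = w.shape
    out_h = x.shape[0] - M + 1
    out_w = x.shape[1] - N + 1
    a = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Eq. (6): weighted sum over the M x N window at (i, j), plus bias
            a[i, j] = np.sum(w * x[i:i + M, j:j + N]) + b
    return np.maximum(0.0, a)   # Eq. (7): ReLU


# Usage: a 4 x 4 ramp image and a 2 x 2 diagonal-difference kernel (hypothetical)
x = np.arange(16, dtype=float).reshape(4, 4)
w = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d_relu(x, w, b=0.0))   # every pre-activation is -5, so ReLU gives zeros
print(conv2d_relu(x, w, b=10.0))  # every entry is 5.0
```

The two calls show the role of the bias: it shifts every pre-activation before ReLU decides which responses survive.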


Using more convolution kernels allows more hidden features to be mined from the input samples. Two convolutional layers are applied in the LSTM–CNN model. The first convolutional layer uses 64 convolution kernels of size 1 × 5 for feature extraction, with a sliding stride of 2. The second convolutional layer performs deeper feature extraction with 128 convolution kernels of size 1 × 3 and a stride of 1. A max-pooling layer between the two convolutional layers performs downsampling, which serves two purposes: (i) it retains the dominant features while reducing the number of parameters, and (ii) it filters interference noise in the images.
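For an unpadded window of size $k$ and stride $s$, an input of length $L$ yields $\lfloor (L - k)/s \rfloor + 1$ outputs. A short sketch tracing a hypothetical input length of 100 through the layers described in this paragraph; "valid" padding and a 2-wide, stride-2 pooling window are assumptions, since the paper states neither, and only the layer sizes given in this paragraph are modelled.

```python
def conv_out_len(length, kernel, stride=1):
    """Output length of an unpadded ('valid') 1D convolution or pooling window."""
    return (length - kernel) // stride + 1


def stack_lengths(length, pool=2):
    """Trace a hypothetical input length through conv -> max-pool -> conv."""
    after_conv1 = conv_out_len(length, kernel=5, stride=2)           # 64 kernels, 1 x 5, stride 2
    after_pool = conv_out_len(after_conv1, kernel=pool, stride=pool)  # max-pooling (size assumed)
    after_conv2 = conv_out_len(after_pool, kernel=3, stride=1)        # 128 kernels, 1 x 3, stride 1
    return after_conv1, after_pool, after_conv2


print(stack_lengths(100))  # (48, 24, 22)
```

Under these assumptions, the two downsampling steps (stride 2 and pooling) shrink a 100-step input to 24 positions before the second convolution, which is how pooling reduces the parameter count downstream.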

3.4 Pooling Layer Many features in the activation map add computational overhead and contribute to overfitting. The pooling layer performs non-linear sub-sampling to reduce the number of features, and pooling also provides translation invariance. There are two types of pooling: maximum pooling, which selects the maximum value of the elements in each pooling region, and average pooling, which selects their average value. With $P$ as the activation set and $E$ as the pooling region, the activations are given in Eq. (8):

$P = \{\, p_k \mid k \in E \,\}$ (8)

Average pooling (AP) is given in Eq. (9):

$AP = \frac{\sum P_E}{|P_E|}$ (9)

where $|x|$ is the cardinality of set $x$. Max pooling (MP) is given in Eq. (10):

$MP = \max(P_E)$ (10)
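Eqs. (9) and (10) can be applied to every pooling region at once by reshaping the feature map into blocks. A minimal NumPy sketch, assuming non-overlapping square regions with stride equal to the region size and dropping any rows or columns that do not fill a whole region; the `pool2d` helper and the input values are hypothetical.

```python
import numpy as np


def pool2d(x, size=2, mode="max"):
    """Non-overlapping size x size pooling of a 2D feature map."""
    h, w = x.shape[0] // size, x.shape[1] // size
    # Each pooling region E becomes its own (size, size) block
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))    # Eq. (10): maximum of each region
    if mode == "avg":
        return blocks.mean(axis=(1, 3))   # Eq. (9): sum divided by cardinality
    raise ValueError("mode must be 'max' or 'avg'")


x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, mode="max"))  # [[ 5.  7.] [13. 15.]]
print(pool2d(x, mode="avg"))  # [[ 2.5  4.5] [10.5 12.5]]
```

Max pooling keeps the strongest response in each region, while average pooling smooths it, which is why max pooling is the usual choice when the goal is to retain dominant features.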

3.5 Output Layer The output layer consists of a fully connected layer followed by a softmax classifier, and adding the fully connected layer as the last layer of the model is important. Each node of the fully connected layer is connected to all nodes of the preceding layer to merge the extracted features, which helps overcome the limitation of a global average pooling (GAP) layer.


Fig. 3 Batch normalization and LSTM–CNN model

A softmax classifier is applied after the fully connected layer; it converts the output of the preceding layer into a probability vector giving the probability that the current sample belongs to each class. Eq. (11) gives the softmax formula:

$s_j = \frac{e^{a_j}}{\sum_{k=1}^{N} e^{a_k}}$ (11)

where $a$ is the output vector of the fully connected layer, $a_j$ is its $j$-th value, $s_j$ is the probability of class $j$, and $N$ is the number of classes. The proposed LSTM–CNN model is shown in Fig. 3.
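Computing Eq. (11) directly can overflow when the fully connected outputs are large. A common sketch, assumed here rather than taken from the paper, subtracts the maximum logit first; this leaves every $s_j$ unchanged because the shared factor $e^{-\max(a)}$ cancels between numerator and denominator.

```python
import numpy as np


def softmax(a):
    """Eq. (11), computed stably by shifting the logits by max(a)."""
    z = np.exp(a - np.max(a))   # the constant shift cancels in the ratio
    return z / z.sum()


probs = softmax(np.array([1.0, 2.0, 3.0]))   # hypothetical logits for N = 3 classes
large = softmax(np.array([1000.0, 1000.0]))  # a naive np.exp(1000.0) would overflow
```

The predicted class is the index of the largest probability, here class 2, and the probabilities always sum to 1.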

3.6 Simulation Setup The proposed LSTM–CNN method with batch normalization is applied to two datasets to test its performance. The implementation details of the proposed LSTM–CNN model are discussed in this section.


Datasets: Two datasets, BreakHis and the collected Haemoprotozoan dataset, are used to assess the proposed LSTM–CNN model. The cattle heat stress dataset consists of two types of features: internal and external. The internal features include respiration, yield history, cattle type, and sweat rate; the external features are the temperature and humidity of the Karnataka region. Cattle respiration, sweat rate, temperature, and humidity were measured using IoT devices, and the data were collected from different regions of Karnataka. Metrics: Accuracy, Sensitivity, and Specificity are used to evaluate the proposed LSTM–CNN model, as given in Eqs. (12)–(14).

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ (12)

$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100$ (13)

$\text{Specificity} = \frac{TN}{TN + FP} \times 100$ (14)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.
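The three metrics can be computed directly from confusion-matrix counts. A minimal sketch with hypothetical counts; note that Eq. (12) is written as a fraction, so this sketch multiplies it by 100 as well, to match the percentages reported in the result tables.

```python
def evaluate(tp, tn, fp, fn):
    """Accuracy (Eq. 12), sensitivity (Eq. 13) and specificity (Eq. 14), in percent."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    sensitivity = 100.0 * tp / (tp + fn)   # share of actual positives detected
    specificity = 100.0 * tn / (tn + fp)   # share of actual negatives detected
    return accuracy, sensitivity, specificity


print(evaluate(tp=90, tn=85, fp=15, fn=10))  # (87.5, 90.0, 85.0)
```

Sensitivity only involves the positive class and specificity only the negative class, so together they expose a model that scores well on accuracy by favouring one class.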

System Specification: The proposed LSTM–CNN model is run on an Intel i7 workstation with 16 GB of RAM and a 6 GB graphics card, and is implemented in Python 3.7. Parameter settings: The LSTM–CNN model uses a dropout rate of 0.01, a batch size of 128, 40 epochs, a kernel size of (4, 4), and 32 filters.

4 Results The proposed LSTM–CNN method with batch normalization is tested on two datasets, BreakHis and Haemoprotozoan. The proposed LSTM–CNN model and standard classifiers were first tested on the Haemoprotozoan dataset, as shown in Table 1 and Fig. 4. The proposed LSTM–CNN model outperforms the standard classifiers because of its more efficient feature extraction and because batch normalization properly scales the feature updates in the neural network. The LSTM layer has the advantage of storing relevant features over the long term, and the CNN layer [24, 25] extracts the relevant features and applies them for classification. The LSTM–CNN model was then tested on the BreakHis dataset and compared with standard classifiers, as shown in Table 2 and Fig. 5, and it again outperforms the standard classifiers. The RF model has an overfitting problem due to the large number of trees generated, and SVM has


S. S. Reddy and C. Nandini

Table 1 Performance of the proposed LSTM–CNN model on the Haemoprotozoan dataset

| Methods | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| SVM | 87.3 | 88.2 | 88.9 |
| RF | 89.3 | 89.2 | 89.1 |
| LSTM | 91.5 | 91.7 | 91.2 |
| CNN | 94.2 | 93.4 | 93.1 |
| LSTM–CNN | 98.2 | 97.3 | 97.6 |

Fig. 4 The performance of LSTM–CNN on Haemoprotozoan dataset

difficulty with imbalanced data. The LSTM model mitigates the vanishing-gradient problem, while the CNN is prone to overfitting. Batch normalization is applied to reduce overfitting, and the LSTM–CNN model extracts the relevant features for classification. The proposed LSTM–CNN model and existing models were compared on the Haemoprotozoan dataset, as shown in Table 3 and Fig. 6. Existing methods are limited by overfitting in the convolution layer. The proposed LSTM–CNN

Table 2 The performance of the LSTM–CNN model on the BreakHis dataset

| Methods | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| SVM | 78.2 | 76.3 | 76.5 |
| RF | 82.3 | 81.4 | 82.1 |
| LSTM | 92.2 | 93.1 | 92.1 |
| CNN | 94.5 | 94.6 | 94.2 |
| LSTM–CNN | 98.7 | 98.3 | 98.2 |

A Framework with IOAHT for Heat Stress Detection …


Fig. 5 The performance of LSTM–CNN model on BreakHis dataset

model applies batch normalization to reduce overfitting. The LSTM layer retains important information over the long term, which helps the CNN layer select the relevant features. The proposed LSTM–CNN model and existing methods were also tested on the BreakHis dataset, as shown in Table 4 and Fig. 7. Here too, the CNN selects features based on the information stored by the LSTM layer, and batch normalization scales the data to reduce differences between features, which improves learning. The performance of the proposed LSTM–CNN model in cattle heat stress detection is compared with existing methods in Table 5 and Fig. 8. This demonstrates that the proposed LSTM–CNN model detects heat stress more effectively than the standard classifiers. The LSTM model extracts the relevant characteristics from the input data, and the CNN performs classification. The computational complexity of the proposed LSTM–CNN model is O(N + 1) for the classification process. The computational complexity of SVM is O(n^3), and the learning complexity

Table 3 Comparative analysis of the LSTM–CNN model on the Haemoprotozoan dataset

| Methods | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| DCGAN [14] | 92.3 | 91.2 | 91.4 |
| DenseNet and ResNet [15] | 91.2 | 89.1 | 90.5 |
| Inception Recurrent [16] | 94.1 | 93.5 | 92.4 |
| Hybrid CNN [17] | 81.4 | 82.6 | 81.2 |
| VGG-16 Net [18] | 91.3 | 91.1 | 90.6 |
| LSTM–CNN | 98.2 | 97.3 | 97.6 |


Fig. 6 Comparative analysis of LSTM–CNN model on Haemoprotozoan dataset

Table 4 Comparative analysis on the BreakHis dataset

| Methods | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| DCGAN [14] | 96.5 | 95.2 | 95.1 |
| DenseNet and ResNet [15] | 91.2 | 89.59 | 93.56 |
| Inception Recurrent [16] | 96.7 | 97.9 | 93.07 |
| Hybrid CNN [17] | 83.9 | 89.8 | 94.9 |
| VGG-16 Net [18] | 91.4 | 94 | 91.3 |
| LSTM–CNN | 98.7 | 98.3 | 98.2 |

of LSTM per weight is O(1). The results demonstrate that the proposed LSTM–CNN model performs better in disease classification and in heat stress detection in cattle.
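Batch normalization, which the discussion above credits with reducing overfitting, standardizes each feature over a mini-batch and then applies a learnable scale and shift. A minimal NumPy sketch of the training-time forward pass follows; the `gamma`, `beta`, and `eps` defaults are assumptions, and a full layer would also keep running statistics for use at inference time.

```python
import numpy as np


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise each feature (column) of a mini-batch x with shape (batch, features)."""
    mean = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                       # per-feature mini-batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale and shift


# Two features on very different scales end up on a comparable scale
x = np.array([[1.0, 100.0], [3.0, 300.0]])
print(batch_norm(x))
```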

5 Conclusion

In this study, batch normalization is employed to reduce the differences between features and thereby improve learning performance. The LSTM–CNN model stores the relevant information and selects the relevant features for


Fig. 7 Comparative analysis on the BreakHis dataset

Table 5 Performance of the proposed LSTM–CNN model in cattle heat stress detection

| Methods | Accuracy (%) |
|---|---|
| SVM | 85.1 |
| RF | 88.2 |
| LSTM | 93.4 |
| CNN | 92.1 |
| LSTM–CNN | 98.2 |

Fig. 8 Heat stress detection in cattle


classification. The LSTM layer helps to address the imbalanced-data problem, and the CNN performs classification. The Haemoprotozoan and BreakHis datasets were used to evaluate the developed technique. Future work involves applying an attention layer to improve feature selection.

References

1. Bary MA, Ali MZ, Chowdhury S, Mannan A, Nur e Azam M, Moula MM, Bhuiyan ZA, Shaon MTW, Hossain MA (2018) Prevalence and molecular identification of haemoprotozoan diseases of cattle in Bangladesh. Adv Anim Vet Sci 6(4):176–182
2. Patra G, Ghosh S, Mohanta D, Kumar Borthakur S, Behera P, Chakraborty S, Debbarma A, Mahata S (2019) Prevalence of haemoprotozoa in goat population of West Bengal, India. Biol Rhythm Res 50(6):866–875
3. Jayalakshmi K, Sasikala M, Veeraselvam M, Venkatesan M, Yogeshpriya S, Ramkumar PK, Selvaraj P, Vijayasarathi MK (2019) Prevalence of haemoprotozoan diseases in cattle of Cauvery delta region of Tamil Nadu. J Parasit Dis 43(2):308–312
4. Prameela DR, Rao VV, Chengalvarayulu V, Venkateswara P, Rao TV, Karthik A (2020) Prevalence of Haemoprotozoan infections in Chittoor District of Andhra Pradesh. J Entomol Zool Stud
5. Ghosh S, Patra G, Kumar Borthakur S, Behera P, Tolenkhomba TC, Deka A, Kumar Khare R, Biswas P (2020) Prevalence of haemoprotozoa in cattle of Mizoram, India. Biol Rhythm Res 51(1):76–87
6. Mahmood T, Arsalan M, Owais M, Lee MB, Park KR (2020) Artificial intelligence-based mitosis detection in breast cancer histopathology images using faster R-CNN and deep CNNs. J Clin Med 9(3):749
7. Carvalho ED, Antonio Filho OC, Silva RR, Araujo FH, Diniz JO, Silva AC, Paiva AC, Gattass M (2020) Breast cancer diagnosis from histopathological images using textural features and CBIR. Artif Intell Med 105:101845
8. Wahab N, Khan A (2020) Multifaceted fused-CNN based scoring of breast cancer whole-slide histopathology images. Appl Soft Comput 97:106808
9. Saxena S, Shukla S, Gyanchandani M (2021) Breast cancer histopathology image classification using kernelized weighted extreme learning machine. Int J Imaging Syst Technol 31(1):168–179
10. Sebai M, Wang T, Al-Fadhli SA (2020) PartMitosis: a partially supervised deep learning framework for mitosis detection in breast cancer histopathology images. IEEE Access 8:45133–45147
11. Alhussein M, Aurangzeb K, Haider SI (2020) Hybrid CNN-LSTM model for short-term individual household load forecasting. IEEE Access 8:180544–180557
12. Rajesh Kumar Dhanaraj D, Reddy Gadekallu T, Aboudaif MK, Abouel Nasr E, Krishnasamy L (2020) A heuristic angular clustering framework for secured statistical data aggregation in sensor networks. Sensors 20(17):4937
13. Reddy SS, Nandini C (2021) Edge boost curve transform and modified relief algorithm for communicable and non-communicable disease detection using pathology. Int J Intell Eng Syst (INASS) 14(2):463–473
14. Saini M, Susan S (2020) Deep transfer with minority data augmentation for imbalanced breast cancer dataset. Appl Soft Comput 97:106759
15. Muthukumaran V, Hsu CH, Karuppiah M, Chung YC, Chen YH (2021) Public key encryption with equality test for Industrial Internet of Things system in cloud computing. Trans Emerg Telecommun Technol:e4202
16. Celik Y, Talo M, Yildirim O, Karabatak M, Acharya UR (2020) Automated invasive ductal carcinoma detection based using deep transfer learning with whole-slide images. Pattern Recogn Lett 133:232–239
17. Alom MZ, Yakopcic C, Nasrin MS, Taha TM, Asari VK (2019) Breast cancer classification from histopathological images with inception recurrent residual convolutional neural network. J Digit Imaging 32(4):605–617
18. Zhu C, Song F, Wang Y, Dong H, Guo Y, Liu J (2019) Breast cancer histopathology image classification through assembling multiple compact CNNs. BMC Med Inform Decis Mak 19(1):1–17
19. Kumar A, Singh SK, Saxena S, Lakshmanan K, Sangaiah AK, Chauhan H, Shrivastava S, Singh RK (2020) Deep feature learning for histopathological image classification of canine mammary tumors and human breast cancer. Inf Sci 508:405–421
20. Nagarajan SM, Chatterjee P, Alnumay W, Muthukumaran V (2022) Integration of IoT based routing process for food supply chain management in sustainable smart cities. Sustain Cities Soc 76:103448
21. Nagarajan SM, Chatterjee P, Alnumay W, Ghosh U (2021) Effective task scheduling algorithm with deep learning for Internet of Health Things (IoHT) in sustainable smart cities. Sustain Cities Soc 71:102945
22. Shahid F, Zameer A, Muneeb M (2020) Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM. Chaos, Solitons Fractals 140:110212
23. Chimmula VKR, Zhang L (2020) Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos, Solitons Fractals 135:109864
24. Yu C, Han R, Song M, Liu C, Chang CI (2020) A simplified 2D–3D CNN architecture for hyperspectral image classification based on spatial–spectral fusion. IEEE J Sel Top Appl Earth Observations Remote Sens 13:2485–2501
25. Heidari M, Mirniaharikandehei S, Khuzani AZ, Danala G, Qiu Y, Zheng B (2020) Improving the performance of CNN to predict the likelihood of COVID-19 using chest X-ray images with preprocessing algorithms. Int J Med Inf 144:104284

Using Classifier Ensembles to Predict Election Results Using Twitter Data Sentiment Analysis

Pinki Sharma and Santosh Kumar

Abstract Sentiment analysis on Twitter is a quick and easy way to gauge public opinion on a political party and its leaders. Prior sentiment analysis work has struggled to identify the best classifier for a given classification task: when a single classifier is selected, there is no guarantee that it will perform best on unseen data. We therefore combine the results of several classifiers to reduce the chance of picking the wrong one. Our method classifies the sentiment of Twitter data by combining machine learning classifiers with a lexicon-based classifier; the ensemble consists of SentiWordNet, Naive Bayes, and a Hidden Markov Model (HMM). Majority voting is used to decide whether a tweet is positive or negative. The classifier ensemble approach thereby improves the accuracy of sentiment analysis. To achieve high accuracy, our method also uses word sense disambiguation and negation handling.

Keywords Machine learning · Opinion mining · Lexicon · Ensemble · WordNet · SentiWordNet

1 Introduction

Tweets are short messages that can be posted, updated, and read on Twitter, a fast-growing social network. Tweets allow people to express themselves freely and openly on a wide range of topics. There are many ways to recognise and categorise the emotional tone of text, but sentiment analysis (SA) is the most commonly used

P. Sharma
Computer Science and Engineering, ABES Engineering College, Ghaziabad, India

S. Kumar (corresponding author)
School of Computing Science & Engineering, Galgotias University, Greater Noida, Uttar Pradesh, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_26


P. Sharma and S. Kumar

method [1]. Scholars use a variety of machine learning algorithms to extract meaningful information from human reviews. Sentiment analysis can be performed at three levels: aspect, sentence, and document. Aspect-level analysis finds sentiment by analysing the individual emotional expressions in a document and the aspects they refer to. Sentence-level analysis classifies each sentence as positive, negative, or neutral. Document-level analysis assigns a positive, negative, or neutral sentiment to the document as a whole. Sentiment analysis is used in many areas, including e-commerce, health care, the arts, and politics. For example, firms can use it to keep track of consumer sentiment about their products, and consumers can use it to find the best products based on public opinion. The primary goal of Twitter SA is to identify whether a tweet has a positive or negative sentiment. Twitter sentiment analysis faces a number of obstacles: (1) tweets are written in informal language, (2) short messages provide little information about sentiment, and (3) abbreviations and acronyms are common on Twitter. Twitter is a microblogging service that is rapidly growing in popularity, and its users can post their views and opinions about various events and people. As a result, Twitter is regarded as a useful tool for studying public sentiment. Sentiment analysis examines how people express themselves through words: it classifies words as having positive or negative connotations based on the polarity of their meanings. Text analysis, natural language processing (NLP), and computational linguistics can all be used to identify and extract sentiment from source materials.

Opinion mining is used to determine an author’s attitude towards a specific issue or the overall polarity of a work. The primary goal of sentiment analysis is to categorise positive, negative, or neutral views in a given text at the document or phrase level [2]. The accuracy of a sentiment analysis depends on how closely it agrees with human judgments, which can be gauged with two metrics, precision and recall [3]. Using a combination of machine learning and lexicon-based sentiment classifiers [4], this study proposes an accurate sentiment classifier that can identify political sentiment, and sentiment towards newly released movies, from tweets. Related work is reviewed in Sect. 2, the proposed approach is described in Sects. 3 and 4, experimental results are presented in Sect. 5, and Sect. 6 concludes the paper.

2 Related Work

The growth of social networks, particularly Twitter, is accelerating rapidly. People generally use Twitter to express their opinions about other people or about things such as films, products, and other media, and Twitter makes it simple for anyone to share their thoughts on any subject. Machine learning techniques and lexicon-based approaches can be used to examine this data, and such approaches have been reported to achieve high accuracy [5].

Using Classifier Ensembles to Predict Election Results Using Twitter …


To assess the benefits of social data, another model, “Sentiment Analysis as a Service” (SaaS), has been introduced. Using the spatio-temporal properties of social media, disease outbreaks can be located [6]. People who spend time on social media sites such as Facebook and Twitter can be thought of as social sensors that collect vital and trustworthy data. When used with a neural network trained for SA, the “Select Additive Learning” (SAL) approach can enhance the generalisation ability of the network, and there is evidence that SAL improves performance in all three modalities (verbal, acoustic, and visual) [7]. Feature-level and decision-level fusion approaches work well together for combining sources of information; on the YouTube dataset, such an approach surpasses existing systems by about 15% and achieves an accuracy of roughly 85% [8]. Sentiment can also be analysed from visual content such as images. A temporal CNN combines each pair of photos into one image, and a weighted matrix is concatenated with the activation matrix of the preceding layer of the trained deep CNN [9]. Twenty-one sentiment analysis methods for English have been compared with two language-specific methods on nine language-specific datasets, showing that a multilingual sentence-level approach is also supported [10]. The Naive Bayes and Levantine algorithms have been applied to categorise emotions from social networks; this approach is particularly effective for real-time data from platforms such as Twitter and Facebook [11]. In another approach, initial word embeddings are learned without supervision by a neural language model and then refined on supervised corpora; a network initialised with these pre-trained parameters performs better in sentiment analysis [12].

Sentiment analysis can also be performed in two phases, offline and online. In the offline phase, tweet datasets are pre-processed to extract meaningful information that serves as a training set for a classifier; the trained model is stored in secondary storage until it is needed in the online phase [13]. Approaches such as NLP and ML are gaining ground in SA. In one two-step procedure, a lexicon technique first scores tweets for positive, negative, and neutral polarity, and tweets with low polarity strength are then passed to an SVM classifier; this two-step technique outperforms the compared approaches [14]. An ensemble SA method based on the Netflix and Stanford methodologies detects human trafficking in web data from the DARPA MEMEX program, using binary sentiments such as “yes” or “no” as well as categories such as “love” or “neutral” [15]. Twitter opinion mining is useful but challenging because of misspellings, repetitions, and slang terms. Because of the 140-character limit on tweets, many shortened word forms are used, so finding the right meaning of a tweet requires finding the right meaning of each word; a model using SVM and Naive Bayes classifiers for this task has been shown to be very accurate [16]. Sentiment analysis can also benefit from combining rule-based classifiers with supervised learning, where the SVM is trained using dependency and sentiment-lexicon features [17].


A phonetic-based approach to normalising microtext into plain English has also been proposed. After normalisation, the classification accuracy for polarity detection improves by more than 4% [18].

3 Methodology

Most of the system’s work focuses on extracting tweets and determining their sentiment. The sentiment classifier is built from machine learning and lexicon-based classifiers. The proposed system contains six modules, as shown in Fig. 1. Figure 2 depicts the proposed sentiment classification architecture. Using this classifier, political sentiment can be detected in real time from retrieved tweets. The sentiment analysis modules of the proposed system work as follows:

Fig. 1 Main modules of sentiment classification system


Fig. 2 Proposed methodology

• The tweets, reviews, and comments collected from social networking sites are pre-processed to remove stop words, and features are then extracted as vector representations for the classifier.
• The probability of positive or negative sentiment is returned.
• A full set of metrics is reported, including accuracy, precision, recall, and F1 score.

To maximise accuracy, the entire system is trained on millions of reviews from different sites and datasets, such as Kaggle.
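The metrics listed above can be computed per class from true and predicted labels. A minimal sketch, assuming string labels such as "pos" and "neg" (hypothetical); the macro-averaged F1 shown here is one common reading of an "average F1" column, not necessarily the one used in the experiments.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for the class label `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def evaluate(y_true, y_pred, labels=("pos", "neg")):
    """Accuracy, per-class (precision, recall, F1), and macro-averaged F1."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {lab: precision_recall_f1(y_true, y_pred, lab) for lab in labels}
    macro_f1 = sum(scores[2] for scores in per_class.values()) / len(labels)
    return accuracy, per_class, macro_f1


print(evaluate(["pos", "pos", "neg", "neg"], ["pos", "neg", "pos", "neg"]))
```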

3.1 Acquiring Data

The Twitter streaming API [19] is used to retrieve Twitter feeds continuously, giving real-time access to Twitter’s publicly available data. The retrieved tweets are passed to a pre-processing module and then classified as negative, positive, or neutral.


3.2 Data Pre-processing

A regular expression is used to detect URLs, and all URLs are removed from the extracted tweet. User mentions (@username) are then deleted, followed by all special characters and the hashtag symbol (#). The cleaned tweets are then sorted by the classification schemes. Our approach to handling negation considerably improved the accuracy of our classifiers. Negation handling is a fundamental challenge in sentiment classification. Because each word is used as a separate feature, the word “win” in the phrase “not win” is read as expressing a positive attitude rather than a negative one, which leads to classification errors. Such errors are caused by ignoring the presence of “not”. To solve this, we combined a basic negation-handling technique with state variables and bootstrapping. A different method of representing negated forms [20] inspired our work [1]. A state variable keeps track of the negation state. Once the negation state variable is set, subsequent terms are interpreted as “not” + word. The state variable is reset when a punctuation mark or a double negation is encountered. Negation handling is applied in all three of our classifiers to ensure proper categorisation.
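The cleaning and negation-handling steps above can be sketched with Python's `re` module. The regular expressions, the negation word list, and the `not_` prefix used to encode “not” + word are illustrative assumptions rather than the authors' exact rules, and the bootstrapping step is not shown.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Keep letters, digits, apostrophes, and the punctuation that ends a negation scope
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s.,!?']")
TOKEN_RE = re.compile(r"[a-z0-9']+|[.,!?]")

NEGATIONS = {"not", "no", "never", "don't", "can't", "won't", "isn't"}  # hypothetical list
SCOPE_END = {".", ",", "!", "?"}


def preprocess(tweet):
    """Remove URLs, @mentions, '#' and other special characters; lowercase and tokenize."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)  # also drops '#', keeping the hashtag word
    return TOKEN_RE.findall(text.lower())


def mark_negation(tokens):
    """Prefix words inside a negation scope with 'not_' using a state variable."""
    out, negated = [], False
    for tok in tokens:
        if tok in SCOPE_END:
            negated = False            # punctuation resets the state
            out.append(tok)
        elif tok in NEGATIONS:
            negated = not negated      # a second negation cancels the first
        else:
            out.append("not_" + tok if negated else tok)
    return out


print(mark_negation(preprocess("I did not win the match! http://t.co/x @bob #sad")))
# → ['i', 'did', 'not_win', 'not_the', 'not_match', '!', 'sad']
```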

3.3 Classification of Sentiments Using SentiWordNet

SentiWordNet (SWN) and WordNet are used to classify tweets based on their emotional content. Accurate classification relies on word sense disambiguation [1]. WordNet is a lexical database of English that groups words into sets of synonyms (synsets). SentiWordNet extends WordNet by assigning each synset numeric scores for positivity, negativity, and objectivity. WordNet lexical relations alone are not always a useful predictor of polarity, because different senses of a word, for example under different parts of speech, may have different polarities. Sense-tagged word lists can be used to solve this problem. We therefore present a WordNet-based sentiment classifier that uses word sense disambiguation. The SWN classifier assigns each word its own sentiment weight, and the part of speech of each word must be identified for the SWN classifier to categorise it properly.
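Once each word has been POS-tagged and disambiguated, a lexicon classifier of this kind reduces to summing sense-level scores. The sketch below assumes that step has already happened and uses a small hypothetical lexicon in place of SentiWordNet (its numbers are placeholders, not real SentiWordNet entries); treating a `not_` prefix as a sign flip is our assumed encoding of the “not” + word form.

```python
# Hypothetical toy (positivity, negativity) scores keyed by (word, POS);
# a real system would look up the disambiguated synset in SentiWordNet.
TOY_LEXICON = {
    ("good", "a"): (0.75, 0.0),
    ("bad", "a"): (0.0, 0.625),
    ("win", "v"): (0.5, 0.0),
}


def swn_score(tagged_tokens, lexicon=TOY_LEXICON):
    """Sum (positivity - negativity) over (word, POS) pairs; 'not_' flips the sign."""
    score = 0.0
    for word, pos in tagged_tokens:
        sign = 1.0
        if word.startswith("not_"):
            word, sign = word[len("not_"):], -1.0
        positivity, negativity = lexicon.get((word, pos), (0.0, 0.0))
        score += sign * (positivity - negativity)
    return score


def swn_label(tagged_tokens):
    score = swn_score(tagged_tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


print(swn_label([("not_win", "v")]))  # negative
```

Because the key includes the part of speech, the same word with a different tag (for example `("good", "n")`) falls back to a zero score here, which is why POS identification matters.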

3.4 Naive Bayes for Sentiment Classification

The Naïve Bayes classifier is one of the simplest and most commonly used classifiers. It computes the posterior probability of a class from the distribution of words in the document. As in bag-of-words (BOW) classification, the document is represented as an unordered collection of words. BOW feature extraction is used by this


model, so the position of a word in the document is ignored. Bayes’ theorem is used to calculate the probability that a given set of features belongs to a particular label. For sentiment analysis on Twitter, bigrams from the tweets are used as Naive Bayes features. Tweets are classified into two categories: positive and negative.
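A multinomial Naive Bayes over bigram features, as described above, fits in a few lines of plain Python. This is a sketch under assumptions: add-one (Laplace) smoothing and log-space scoring are our choices, and the four training tweets are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bigrams(tokens):
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


class BigramNaiveBayes:
    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.feature_counts[label].update(bigrams(tokens))
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def log_posteriors(self, tokens):
        """Unnormalised log P(label | tweet): log prior + sum of log likelihoods."""
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / n_docs)
            for f in bigrams(tokens):
                score += math.log((counts[f] + 1) / denom)  # add-one smoothing
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_posteriors(tokens)
        return max(scores, key=scores.get)


train = [["great", "movie", "loved", "it"], ["loved", "it", "a", "lot"],
         ["terrible", "movie", "hated", "it"], ["hated", "it", "so", "much"]]
model = BigramNaiveBayes().fit(train, ["pos", "pos", "neg", "neg"])
print(model.predict(["i", "loved", "it"]))  # pos
```

Working in log space avoids numerical underflow when a tweet has many bigrams.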

3.5 Sentiment Classification Using HMM

This classifier uses the Viterbi forward-backward algorithm, in which each state is traversed: every state represents a probable prediction of the opinion across the context traversed so far. Sentiment tags are predicted as the algorithm progresses. To avoid the exponential growth of a breadth-first search (BFS), the Viterbi method keeps only the best m maximum likelihood estimates (MLEs) at each state, using the tags of the next word to prune the search tree. An emission probability records the likelihood of an opinion tag given the word, based on the word’s frequency in the training data, for the current sentiment state of the sentence. The second probability, the transition probability, relates the current state of the system to the previous one. An advantage of this approach is that when there is enough evidence that the sentence should no longer be treated as positive, negative, or neutral in its current state, the required state transition can be made. This decision takes into account the current state of the system, the current likelihood, and the probability that the next word will induce a transition. The Markov assumption links each state to its preceding state, and the forward-backward pass takes both the preceding and succeeding states into account.
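The decoding described above can be illustrated with a small log-space Viterbi over sentiment states. All probabilities below are hypothetical toy values, not estimates from the paper's data; the optional `beam` argument is our stand-in for keeping only the best m hypotheses, and reading the last tag as the sentence sentiment is an assumption.

```python
import math


def viterbi(words, states, start_p, trans_p, emit_p, beam=None, unk=1e-6):
    """Most likely sentiment-tag sequence for `words` (log-space Viterbi).

    If `beam` is set, only the best `beam` partial paths are kept at each step.
    Words missing from emit_p receive the small probability `unk`.
    """
    def emit(s, w):
        return math.log(emit_p[s].get(w, unk))

    # best[s] = (log-probability of the best path ending in s, that path)
    best = {s: (math.log(start_p[s]) + emit(s, words[0]), [s]) for s in states}
    for w in words[1:]:
        new_best = {}
        for s in states:
            # Choose the surviving predecessor that maximises the path score into s
            prev = max(best, key=lambda p: best[p][0] + math.log(trans_p[p][s]))
            score = best[prev][0] + math.log(trans_p[prev][s]) + emit(s, w)
            new_best[s] = (score, best[prev][1] + [s])
        if beam is not None:
            kept = sorted(new_best, key=lambda s: new_best[s][0], reverse=True)[:beam]
            new_best = {s: new_best[s] for s in kept}
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]


STATES = ("pos", "neg")
START = {"pos": 0.5, "neg": 0.5}
TRANS = {"pos": {"pos": 0.8, "neg": 0.2}, "neg": {"pos": 0.2, "neg": 0.8}}
EMIT = {"pos": {"great": 0.4, "win": 0.3, "bad": 0.05},
        "neg": {"great": 0.05, "win": 0.05, "bad": 0.5}}
print(viterbi(["great", "bad"], STATES, START, TRANS, EMIT))  # ['pos', 'neg']
```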

3.6 Ensemble Approach for Sentiment Classification

To classify the sentiment of tweets automatically, we combine ML classifiers with a lexicon-based classifier [21]. Accordingly, we use SentiWordNet, Naive Bayes, and an HMM classifier to classify our political data accurately. The positive or negative attitude of each tweet is decided from the outputs of these three classifiers by majority voting. The SentiWordNet classifier uses word sense disambiguation and the SentiWordNet opinion lexicon to classify tweets retrieved in real time; it takes into account the context in which a term is used and selects the most appropriate meaning. The other two classifiers are trained on labelled data. This combination is why our sentiment classifier performs well at detecting real-time political sentiment in tweets.
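Majority voting over the three base classifiers can be sketched as follows. The three stub classifiers are hypothetical placeholders standing in for the trained SentiWordNet, Naive Bayes, and HMM classifiers; each is assumed to be a callable that maps a token list to a label.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most classifiers.

    With three binary voters, as in this ensemble, a tie cannot occur; with an
    even number of voters, Counter.most_common keeps first-encountered order.
    """
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(tokens, classifiers):
    """Apply each base classifier (tokens -> label) and take the majority label."""
    return majority_vote([clf(tokens) for clf in classifiers])


# Hypothetical stand-ins for the SentiWordNet, Naive Bayes, and HMM classifiers
def swn_stub(tokens):
    return "pos" if "good" in tokens else "neg"


def nb_stub(tokens):
    return "pos"


def hmm_stub(tokens):
    return "neg"


print(ensemble_predict(["a", "good", "day"], [swn_stub, nb_stub, hmm_stub]))  # pos
```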


4 Algorithms to Calculate Sentiment Score

Algorithm 1 calculates the sentiment score of a tweet. The data are divided into two parts: training data and test data, the latter used to test the overall accuracy of the system. In the ensemble classifier, each base classifier predicts whether a tweet is positive or negative. After the system is tested, the next step is to determine the probability of each tweet being positive or negative. Each classifier is then assigned a weight based on its accuracy. In the last step of the algorithm, the sentiment score is calculated.

Algorithm 1: Calculation of Sentiment Score
Input: tweets
For each tweet:
    PC = 0, NC = 0    (PC: positive count, NC: negative count)
    For each classifier C_i:
        Update PC_i and NC_i
        Prob(POS_i) = PC_i / (PC_i + NC_i)
        Prob(NEG_i) = NC_i / (PC_i + NC_i)
For each classifier C_i:
    Calculate weight W_i = Acc_i / Σ_j Acc_j    (Acc_i: accuracy of the i-th classifier)
For each tweet:
    PS = 0, NS = 0    (PS: positive score, NS: negative score)
    For each classifier C_i:
        PS = PS + W_i × Prob(POS_i)
        NS = NS + W_i × Prob(NEG_i)
Return PS, NS
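Algorithm 1 can be sketched as follows. How each classifier produces its positive and negative counts is not specified above, so the function takes the (PC_i, NC_i) pairs as given; reading PS and NS as accuracy-weighted sums over classifiers is our interpretation, and the function names are hypothetical.

```python
def classifier_weights(accuracies):
    """W_i = Acc_i / sum_j Acc_j, so the weights sum to 1."""
    total = sum(accuracies)
    return [acc / total for acc in accuracies]


def sentiment_scores(counts, weights):
    """PS and NS for one tweet from per-classifier (PC_i, NC_i) counts."""
    ps = ns = 0.0
    for (pc, nc), w in zip(counts, weights):
        if pc + nc == 0:
            continue  # no evidence from this classifier (assumed to contribute nothing)
        ps += w * pc / (pc + nc)   # W_i * Prob(POS_i)
        ns += w * nc / (pc + nc)   # W_i * Prob(NEG_i)
    return ps, ns


# For illustration, weight the three base classifiers by their Table 1 accuracies
weights = classifier_weights([73.30, 70.80, 69.92])
print(sentiment_scores([(1, 0), (0, 1), (1, 0)], weights))
```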

Algorithm 2 determines the sentiment of a tweet. The positive and negative scores of the tweet are given as input. If the positive score is higher, the sentiment is positive; if the negative score is higher, it is negative. If the two scores are equal, the cosine similarity with all other tweets is calculated and similar tweets are identified. The sentiment scores of the identified tweets are then calculated, and the sentiment is positive if their positive score is at least as high as their negative score, and negative otherwise.

Algorithm 2: Ensemble approach-based algorithm for predicting sentiment
Input: tweets, PS, NS
If PS_i > NS_i:
    Sentiment = +ve
Else if NS_i > PS_i:
    Sentiment = −ve
Else:
    Calculate Jaccard similarity to find similar tweets j
    Calculate PS_j and NS_j using Algorithm 1
    If PS_j ≥ NS_j:
        Sentiment = +ve
    Else:
        Sentiment = −ve
Return Sentiment
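A sketch of Algorithm 2 follows, using the Jaccard similarity named in the pseudocode. The similarity threshold and the choice to sum the scores of the similar tweets are our assumptions, since the text does not specify how similar tweets are selected or how their scores are combined.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def predict_sentiment(i, tweets, scores, threshold=0.5):
    """Algorithm 2 for tweet i; scores[k] = (PS_k, NS_k) from Algorithm 1."""
    ps, ns = scores[i]
    if ps > ns:
        return "+ve"
    if ns > ps:
        return "-ve"
    # Tie: pool the scores of sufficiently similar tweets (threshold is an assumption)
    similar = [j for j in range(len(tweets))
               if j != i and jaccard(tweets[i], tweets[j]) >= threshold]
    ps_j = sum(scores[j][0] for j in similar)
    ns_j = sum(scores[j][1] for j in similar)
    return "+ve" if ps_j >= ns_j else "-ve"


tweets = [["great", "win"], ["great", "win", "today"], ["bad", "loss"]]
scores = [(0.5, 0.5), (0.8, 0.2), (0.1, 0.9)]
print([predict_sentiment(i, tweets, scores) for i in range(len(tweets))])
# → ['+ve', '+ve', '-ve']
```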


5 Experimental Details

This section describes the experiments and presents their results. The ensemble classifier is implemented by combining several classifiers: SentiWordNet, Naïve Bayes, and HMM. We used Python libraries and frameworks to implement the proposed approach. Training the classifiers requires some parameters. We used a dataset of political reviews from Twitter, with 80% of the data used for training and 20% for testing. Other train/test proportions were also tried, and the split giving the most accurate results was selected. The ensemble classifier, which combines the Naïve Bayes classifier, SentiWordNet, and HMM, gives the best accuracy. To address the problem of non-convergence, the model was trained multiple times with different random initial weights, and the parameters were selected after these repeated training runs.
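The 80/20 protocol can be sketched as a seeded shuffle followed by a split. The function name and default seed are illustrative assumptions; re-running training under different seeds is one simple way to realise the repeated runs with random initial weights described above.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(100))
print(len(train), len(test))  # 80 20
```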

5.1 Results on the Twitter Sentiment Analysis Datasets

The Twitter feeds of different politicians were analysed for four weeks during the election campaign period, and the classifier ensemble approach was used to analyse the sentiment of these tweets. The data were collected with the Twitter streaming API: in practice, it pulled relevant tweets for a particular query string and saved them as rows in an Excel .csv file. After pre-processing, a classification scheme is used to classify the tweets. This dataset consists of 99,989 tweets, of which 43,532 are labelled positive and 56,457 negative. Table 1 compares the results of the ensemble approach and the traditional classifiers on the Twitter sentiment analysis dataset, and Fig. 3 shows the performance evaluation of the classifiers.

5.2 Results on Healthcare Reform Dataset

This dataset was collected by finding tweets with the keyword #hcr. It contains 888 tweets, of which 365 are labelled positive and 523 negative. Table 2 compares the results of the ensemble approach and the traditional classifiers on the healthcare reform (HCR) dataset, and Fig. 4 shows the performance evaluation of the classifiers.


Table 1 Results obtained from various classifiers on the Twitter dataset (Pre: precision, Rec: recall)

| Techniques | Accuracy (%) | (+)ve Pre | (+)ve Rec | (+)ve F1 | (−)ve Pre | (−)ve Rec | (−)ve F1 | Average F1 |
|---|---|---|---|---|---|---|---|---|
| SentiWordNet classifier | 73.30 | 68.86 | 68.73 | 73.18 | 67.57 | 68.67 | 68.06 | 72.81 |
| Naïve-Bayes classifier | 70.80 | 73.37 | 77.37 | 76.56 | 62.55 | 39.58 | 52.45 | 65.59 |
| Hidden-Markov Model | 69.92 | 72.85 | 72.99 | 82.56 | 52.88 | 56.68 | 62.34 | 72.91 |
| Ensemble classifier | 74.68 | 76.85 | 85.61 | 88.65 | 76.74 | 63.85 | 67.37 | 75.45 |

[Bar chart, y-axis 0–90: Accuracy, Precision, Recall, and F1 Score for SentiWordNet, Naïve-Bayes Classifier, Hidden-Markov Model, and Ensemble Classifier]

Fig. 3 Performance evaluation of various classifiers on Twitter datasets

Table 2 Results obtained from various classifiers on the HCR dataset (Pre: precision, Rec: recall)

| Techniques | Accuracy (%) | (+)ve Pre | (+)ve Rec | (+)ve F1 | (−)ve Pre | (−)ve Rec | (−)ve F1 | Average F1 |
|---|---|---|---|---|---|---|---|---|
| SentiWordNet classifier | 70.30 | 56.86 | 62.36 | 59.49 | 78.66 | 74.57 | 76.56 | 68.02 |
| Naïve-Bayes classifier | 71.80 | 59.37 | 61.29 | 60.32 | 78.82 | 77.46 | 78.13 | 69.22 |
| Hidden-Markov model | 69.92 | 65.85 | 29.03 | 40.30 | 70.67 | 91.90 | 79.70 | 60.10 |
| Ensemble classifier | 73.68 | 63.85 | 56.99 | 60.23 | 78.14 | 82.56 | 81.56 | 70.45 |


[Bar chart, y-axis 0–80: Accuracy, Precision, Recall, and F1 Score for SentiWordNet, Naïve-Bayes Classifier, Hidden-Markov Model, and Ensemble Classifier]

Fig. 4 Performance evaluation of various classifiers on HCR datasets

6 Conclusion

In this study, an ensemble classifier is used to implement a real-time Twitter sentiment analyser. In Twitter SA, the prevailing methodology compares various traditional classifiers and selects the most accurate one to classify tweet sentiments. Ensemble classification techniques have been widely used in various fields to solve classification problems; however, comparatively little research has been done on the use of classifier ensembles for tweet sentiment analysis. In this work, we have proposed an ensemble-based classifier that is shown to improve the performance of tweet sentiment classification. The ensemble-based classifier is implemented using the base classifiers SentiWordNet, Naïve Bayes, and HMM. The results show that the ensemble classifier outperforms the stand-alone classifiers. This approach can be used by organisations to monitor consumer opinions about their products, and by consumers to choose the best products based on public opinion. In future work, the main focus will be to extend this classifier to neutral tweets, since some tweets are neither positive nor negative. Although the proposed methodology addresses Twitter data, it could also be extended to the analysis of data from other social network platforms.


P. Sharma and S. Kumar

References
1. Kathuria P, Shirai K (2020) Example based word sense disambiguation towards reading assistant system. Assoc Natural Lang Process
2. Valakunde ND, Patwardhan MS (2018) Multi-aspect and multi-class based document sentiment analysis of educational data catering accreditation process. In: International conference on cloud and ubiquitous computing and emerging technologies. IEEE
3. Liu B (2020) Sentiment analysis and opinion mining. Morgan and Claypool Publishers
4. Kumar S, Kumar R, Sidhu (2022) Comparative analysis of classifiers based on spam data in Twitter sentiments. In: Tomar A, Malik H, Kumar P, Iqbal A (eds) Machine learning, advances in computing, renewable energy and communication. Lecture notes in electrical engineering, vol 768. Springer, Singapore. https://doi.org/10.1007/978-981-16-2354-7_36
5. Ali K, Dong H, Bouguettaya A, Erradi A, Hadjidj R (2017) Sentiment analysis as a service: a social media based sentiment analysis framework. In: 2017 IEEE international conference on web services (ICWS). IEEE, pp 660–667, 25 June 2017
6. Wang H, Meghawat A, Morency LP, Xing EP (2017) Select-additive learning: improving generalization in multimodal sentiment analysis. In: 2017 IEEE international conference on multimedia and expo (ICME). IEEE, pp 949–954, 10 Jul 2017
7. Poria S, Cambria E, Howard N, Huang GB, Hussain A (2016) Fusing audio, visual and textual clues for sentiment analysis from multimodal content. Neurocomputing 174:50–59
8. Poria S, Chaturvedi I, Cambria E, Hussain A (2016) Convolutional MKL based multimodal emotion recognition and sentiment analysis. In: 2016 IEEE 16th international conference on data mining (ICDM). IEEE, pp 439–448, 12 Dec 2016
9. Araujo M, Reis J, Pereira A, Benevenuto F (2016) An evaluation of machine translation for multilingual sentence-level sentiment analysis. In: Proceedings of the 31st annual ACM symposium on applied computing, pp 1140–1145, 4 Apr 2016
10. Shahare FF (2017) Sentiment analysis for the news data based on the social media. In: 2017 international conference on intelligent computing and control systems (ICICCS). IEEE, pp 1365–1370, 15 Jun 2017
11. Severyn A, Moschitti A (2015) Twitter sentiment analysis with deep convolutional neural networks. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 959–962, 9 Aug 2015
12. Karanasou M, Ampla A, Doulkeridis C, Halkidi M (2016) Scalable and real-time sentiment analysis of twitter data. In: 2016 IEEE 16th international conference on data mining workshops (ICDMW). IEEE, pp 944–951, 12 Dec 2016
13. Bindal N, Chatterjee N (2016) A two-step method for sentiment analysis of tweets. In: 2016 international conference on information technology (ICIT). IEEE, pp 218–224, 22 Dec 2016
14. Mensikova A, Mattmann CA. Ensemble sentiment analysis to identify human trafficking in web data
15. Amolik A, Jivane N, Bhandari M, Venkatesan M (2016) Twitter sentiment analysis of movie reviews using machine learning techniques. Int J Eng Technol 7(6):1–7
16. Kanakaraj M, Guddeti RM (2015) Performance analysis of ensemble methods on Twitter sentiment analysis using NLP techniques. In: Proceedings of the 2015 IEEE 9th international conference on semantic computing (IEEE ICSC 2015). IEEE, pp 169–170, 7 Feb 2015
17. Satapathy R, Guerreiro C, Chaturvedi I, Cambria E (2017) Phonetic-based microtext normalization for twitter sentiment analysis. In: 2017 IEEE international conference on data mining workshops (ICDMW). IEEE, pp 407–413, 18 Nov 2017
18. Speriosu M, Sudan N, Upadhyay S, Baldridge J (2011) Twitter polarity classification with label propagation over lexical links and the follower graph. In: Proceedings of the first workshop on unsupervised learning in NLP, Association for Computational Linguistics, pp 53–63
19. Bifet A, Holmes G, Pfahringer B (2011) MOA-TweetReader: real-time analysis in twitter streaming data. LNCS 6926. Springer, Berlin Heidelberg, pp 46–60


20. Narayanan V, Arora I, Bhatia A (2012) Fast and accurate sentiment classification using an enhanced Naïve Bayes model
21. Khan FH, Bashir S, Qamar U (2014) TOM: Twitter opinion mining framework using hybrid classification scheme. Decision Support Systems 57:245–257. Elsevier B.V.

Optimization Algorithms to Reduce Route Travel Time
Yash Vinayak and M. Vijayalakshmi

Abstract Planning the most energy-efficient and fastest route is pivotal to reducing the windshield time of field workers (the time they spend driving), ensuring that customers receive their packages at the scheduled time, and reducing logistics costs. One of the many benefits of route optimization is that all parcels are delivered with the most efficient use of resources. Route optimization computes the fastest and most energy-efficient route while taking into account multiple stops and limited delivery time windows, addressing both the vehicle routing problem and the travelling salesman problem. Using a collection of New York City locations represented as longitude and latitude coordinates, the algorithm searches the solution space for the travelling salesman path that takes the least time to travel. After building a regression model to estimate the trip time cost between two points, the algorithm optimizes the travelling salesman path using four biologically inspired meta-heuristics: Genetic Evolution, ant colony, grey wolf, and Artificial Bee Colony.
Keywords Optimization algorithm · Ant colony · Genetic Evolution · Bee colony · Grey wolf · Route planning

Y. Vinayak (B) · M. Vijayalakshmi SRM Institute of Science and Technology, Kattankulathur, Chengalpat, Tamil Nadu, India e-mail: [email protected] M. Vijayalakshmi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_27

1 Introduction

The most important logistical problem is finding efficient vehicle routes with minimum travel time, and research on this problem has been ongoing for more than a decade. When a company can shorten its delivery time or reduce the number of vehicles it employs, it can serve its customers better and run more efficiently. Determining vehicle routes means choosing which customers each vehicle serves and in what order, which defines the delivery path taken by each vehicle. In the vehicle routing problem, the algorithm must find the best possible solution from a set of candidate solutions whose size grows combinatorially with the number of customers to be served. The vehicle routing problem is also closely linked to the travelling salesman problem, in which the algorithm must find the shortest route that visits every location once and returns to the start. The vehicle routing problem is NP-hard: no known algorithm is guaranteed to find the optimal solution in polynomial time. Heuristics are therefore a practical way to find good solutions to such problems. The main focus of this research is on optimization techniques, namely ant colony optimization, the Genetic Algorithm, grey wolf optimization, and Bee Colony Optimization. The novelty of this research is that most existing work on route optimization implements only the travelling salesman problem [1], whereas here the focus is not only on the travelling salesman problem but also on optimizing the total travel time by training a model on a dataset.
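To show why exhaustive search becomes impractical, the sketch below counts the distinct closed tours of a symmetric instance, (n − 1)!/2, and solves tiny instances by brute force. The function names are illustrative, and `times[i][j]` is assumed to hold the travel time from stop i to stop j.

```python
import itertools
import math

def count_distinct_tours(n: int) -> int:
    """Distinct closed tours for a symmetric TSP: fix the start, ignore direction."""
    return math.factorial(n - 1) // 2 if n >= 3 else 1

def brute_force_tsp(times):
    """Try every ordering of stops 1..n-1 after a fixed start stop 0."""
    n = len(times)
    best_tour, best_cost = None, math.inf
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm
        cost = sum(times[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if cost < best_cost:
            best_tour, best_cost = list(tour), cost
    return best_tour, best_cost

# Usage: four stops on the corners of a unit square, time = straight-line distance.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
times = [[math.dist(p, q) for q in square] for p in square]
print(brute_force_tsp(times))  # ([0, 1, 2, 3], 4.0)
```

Ten stops already give 181,440 distinct tours, and the count grows factorially, which is why the meta-heuristics below search the space selectively instead.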

2 System Architecture and Design

The proposed model can be divided into three parts (see Fig. 1): the user interface, the middleware, and the optimization service. The user interface is the frontend, a website developed using hypertext markup language (HTML), cascading style sheets (CSS), and JavaScript (JS), where the user enters the coordinates of the locations they want to visit. After the user enters the locations and clicks Enter, the coordinates are sent to the middleware, a Flask application that acts as a bridge between the frontend and the backend: it forwards the coordinates received from the user to the optimization service and passes the data returned by the optimization service back to the user interface. The optimization service is implemented in Python and applies the optimization algorithm.
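The source names Flask as the middleware but gives no code. The sketch below isolates the bridge logic in a plain function so it runs without a web framework: it parses a JSON body of the hypothetical form `{"points": [{"lat": ..., "lon": ...}, ...]}`, validates it, calls an optimization-service callable, and returns an (HTTP status, JSON body) pair. The payload shape and the function name are assumptions; in a Flask application, a route handler would call this function with the request body.

```python
import json
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Hypothetical optimization-service signature: points -> (visiting order, minutes)
Optimizer = Callable[[List[Point]], Tuple[List[int], float]]

def handle_optimize_request(body: str, optimizer: Optimizer) -> Tuple[int, str]:
    """Parse coordinates from the frontend, call the optimizer, build the reply."""
    try:
        payload = json.loads(body)
        points = [(float(p["lat"]), float(p["lon"])) for p in payload["points"]]
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing keys, or non-numeric coordinates
        return 400, json.dumps({"error": f"bad request: {exc}"})
    if len(points) < 2:
        return 400, json.dumps({"error": "need at least two points"})
    order, minutes = optimizer(points)
    return 200, json.dumps({"order": order, "minutes": minutes})
```

Keeping the bridge free of framework code makes it easy to test and lets the optimization service be swapped without touching the frontend.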

Fig. 1 System architecture (user interface, middleware, and optimization service)


3 Methods and Methodologies

The New York Taxi dataset (see Fig. 2) from Kaggle is used here, as it provides useful insight into traffic and the time vehicles take to travel from one place to another. Simple linear regression is not expressive enough for this data and purpose. Gradient-boosted trees were therefore used, because the large dataset contains a few random outliers and potentially many categorical features; boosted trees readily capture nonlinear relationships, accommodate complexity, and handle categorical features. The input features are passenger count, pickup longitude and latitude, drop-off longitude and latitude, and the store-and-forward flag; the output target is trip duration. The model was built with Extreme Gradient Boost (XGBoost), a distributed gradient boosting framework that is highly efficient, flexible, and portable. The model is simple to set up but challenging to fine-tune and analyse. The root mean squared logarithmic error (RMSLE) was used as the evaluation metric because it measures relative rather than absolute error, which limits the influence of very long trips. Figure 3 illustrates the flow of Extreme Gradient Boost.
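The evaluation metric can be written out directly. A minimal sketch of RMSLE, sqrt(mean((log(1 + ŷ) − log(1 + y))²)), in plain Python (the function name is illustrative):

```python
import math
from typing import Sequence

def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared logarithmic error; all values must be greater than -1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# A 5-minute error costs more on a 10-minute trip than on a 60-minute trip.
print(rmsle([10], [15]), rmsle([60], [65]))
```

Because the error is taken on log(1 + value), the metric reacts to ratios between prediction and truth, so a handful of very long taxi trips cannot dominate the score.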

Fig. 2 New York Taxi dataset

Fig. 3 Extreme gradient boost flowchart


Fig. 4 Genetic Algorithm flowchart

Here, the user enters the points they want to visit. Using the Extreme Gradient Boost model, the algorithm then predicts the time cost between each pair of points and stores it. Next, the algorithm calculates the Manhattan (taxicab) distance between the points from their latitude and longitude coordinates. After calculating the Manhattan distances and time costs, the algorithm runs the optimization algorithm for n iterations and outputs the path that takes the least time.
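A minimal sketch of this preprocessing step, under stated assumptions: the Manhattan distance is approximated as an east-west great-circle leg plus a north-south leg (a common choice for latitude/longitude data, not necessarily the authors' formula), and `predict_minutes` is a hypothetical stand-in for the trained XGBoost model that simply assumes 20 km/h over the taxicab distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def manhattan_km(p, q):
    """Taxicab approximation: an east-west leg plus a north-south leg."""
    (lat1, lon1), (lat2, lon2) = p, q
    east_west = haversine_km(lat1, lon1, lat1, lon2)
    north_south = haversine_km(lat1, lon1, lat2, lon1)
    return east_west + north_south

def build_time_matrix(points, predict_minutes):
    """times[i][j] = predicted travel time from point i to point j."""
    n = len(points)
    return [[0.0 if i == j else predict_minutes(points[i], points[j])
             for j in range(n)] for i in range(n)]

def tour_time(tour, times):
    """Total time of a closed tour that returns to its starting point."""
    n = len(tour)
    return sum(times[tour[k]][tour[(k + 1) % n]] for k in range(n))

# Hypothetical stand-in for the XGBoost model: a flat 20 km/h over the taxicab distance.
def predict_minutes(p, q):
    return manhattan_km(p, q) / 20.0 * 60.0

stops = [(40.7580, -73.9855), (40.7484, -73.9857), (40.7061, -74.0087)]
times = build_time_matrix(stops, predict_minutes)
print(tour_time([0, 1, 2], times))
```

Every optimizer below only needs the `times` matrix and a way to score a tour, so the regression model and the search algorithm stay decoupled.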

3.1 Optimization Using Genetic Algorithm

Genetic Evolution (GE) (see Fig. 4) is a biologically inspired meta-heuristic that takes its cues from the evolutionary process: over successive generations it applies selection, crossover, and mutation. The initial solutions are generated at random, so the first population consists of random orderings of the nodes available on the map [5]. The fitness of each individual is determined by the time taken to traverse the path it encodes. After the fitness of each individual has been evaluated, the population is ranked using an elitist strategy [6]. Shorter paths have high fitness values and are passed down to future generations, whereas longer paths, which have low fitness values, are eliminated from the population. The process is repeated g times, where g is the number of generations [3].
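A minimal sketch of such a genetic algorithm for a travel-time matrix. The operators are assumptions, not the authors' choices: a simple order-crossover variant, swap mutation, truncation selection of the faster half as parents, and elitism that copies the best tours unchanged. Sorting by travel time is equivalent to ranking by a fitness such as 1/time.

```python
import math
import random
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]

def tour_time(tour: Sequence[int], times: Matrix) -> float:
    """Total time of a closed tour (returns to the starting point)."""
    n = len(tour)
    return sum(times[tour[k]][tour[(k + 1) % n]] for k in range(n))

def order_crossover(p1: List[int], p2: List[int], rng: random.Random) -> List[int]:
    """Keep a slice of p1 in place and fill the remaining slots in p2's order."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    segment = p1[a:b + 1]
    kept = set(segment)
    rest = [city for city in p2 if city not in kept]
    return rest[:a] + segment + rest[a:]

def swap_mutation(tour: List[int], rate: float, rng: random.Random) -> List[int]:
    """With probability `rate`, swap two randomly chosen stops."""
    tour = tour[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour

def genetic_tsp(times: Matrix, pop_size: int = 50, generations: int = 100,
                elite: int = 5, mutation_rate: float = 0.2, seed: int = 0):
    rng = random.Random(seed)
    n = len(times)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    history = []  # best travel time in each generation
    for _ in range(generations):
        population.sort(key=lambda t: tour_time(t, times))  # fastest first
        history.append(tour_time(population[0], times))
        next_gen = [t[:] for t in population[:elite]]  # elitism
        parents = population[:pop_size // 2]  # truncation selection
        while len(next_gen) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = order_crossover(p1, p2, rng)
            next_gen.append(swap_mutation(child, mutation_rate, rng))
        population = next_gen
    best = min(population, key=lambda t: tour_time(t, times))
    return best, tour_time(best, times), history

# Usage on a toy instance: 8 stops on a circle, travel time = straight-line distance.
pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
T = [[math.dist(p, q) for q in pts] for p in pts]
best, minutes, history = genetic_tsp(T)
```

Because the elite tours survive unchanged, the best travel time recorded in `history` can never increase from one generation to the next.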

3.2 Optimization Using Ant Colony Optimization

The ant colony optimization method (see Fig. 5) is a biologically inspired meta-heuristic that searches the solution space in the way foraging ants do. Ants deposit pheromone on a path according to its quality; because the goal here is to reduce travel time, paths with shorter journey times receive more pheromone. Each edge in the graph has an associated pheromone level and an inverse transit-time cost. In each generation, a certain number of ants are placed at random starting points on the graph. Each ant then builds a solution by traversing the graph, selecting the next position with a weighted probability [2]. The pheromone matrix is updated after every ant has generated its solution; depending on the variant used, this can be done in several ways. Here, every pheromone edge is first multiplied by the residual coefficient, which controls the rate at which pheromone "evaporates", or loses its effectiveness. Each edge on an ant's path is then increased by q/C, where q is the pheromone intensity and C is the total cost (travel time) of the path. Because the travel-time cost appears in the denominator, paths with longer travel times receive less pheromone and are less likely to be chosen. This process is repeated g times, where g is the number of generations [4].

Fig. 5 Ant colony optimization flowchart
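A minimal sketch of this scheme in the style of the classic Ant System. The parameter names (`alpha` and `beta` weighting pheromone against inverse travel time, `residual`, `q`) and their default values are assumptions, and the deposit is applied in both directions of an edge, which presumes roughly symmetric travel times.

```python
import math
import random

def tour_time(tour, times):
    """Total time of a closed tour (returns to the starting point)."""
    n = len(tour)
    return sum(times[tour[k]][tour[(k + 1) % n]] for k in range(n))

def ant_colony_tsp(times, n_ants=20, generations=100, alpha=1.0, beta=2.0,
                   residual=0.5, q=1.0, seed=0):
    rng = random.Random(seed)
    n = len(times)
    # Heuristic desirability: inverse transit time (off-diagonal times must be > 0)
    eta = [[0.0 if i == j else 1.0 / times[i][j] for j in range(n)] for i in range(n)]
    tau = [[1.0] * n for _ in range(n)]  # pheromone on every edge
    best_tour, best_cost = None, math.inf
    for _ in range(generations):
        solutions = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]  # random starting point
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                choices = sorted(unvisited)
                # Weighted probability of moving to each unvisited stop
                weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in choices]
                nxt = rng.choices(choices, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            solutions.append((tour, tour_time(tour, times)))
        # Evaporation: scale every edge by the residual coefficient
        for row in tau:
            for j in range(n):
                row[j] *= residual
        # Deposit: each edge on a path gains q / C, C = the path's travel time
        for tour, cost in solutions:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / cost
                tau[b][a] += q / cost
            if cost < best_cost:
                best_tour, best_cost = tour, cost
    return best_tour, best_cost

# Usage on a toy instance: 8 stops on a circle, travel time = straight-line distance.
pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
T = [[math.dist(p, q) for q in pts] for p in pts]
tour, minutes = ant_colony_tsp(T)
```

Raising `beta` makes ants greedier toward short hops, while a smaller `residual` makes the colony forget old trails faster.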


3.3 Optimization Using Grey Wolf Optimization

Grey wolf optimization (see Fig. 6) is based on the leadership hierarchy and hunting behaviour of grey wolves in the wild, which are apex predators at the top of the food chain. It is a population-based meta-heuristic algorithm. Grey wolves prefer to live in packs, and every member of the pack follows a strict social dominance hierarchy.

Fig. 6 Grey wolf optimization flowchart


The alpha wolf is the pack's dominant wolf, and all pack members must follow its orders. Beta wolves are subordinate to the alpha; they help the alpha make decisions and are the best candidates to become alpha. Delta wolves must submit to the alpha and beta wolves, but they dominate the omegas. Deltas include scouts, sentinels, elders, hunters, and caretakers. Omega wolves serve as the pack's scapegoats; they are the lowest-ranking members and are only allowed to eat last [7]. Grey wolf hunting proceeds in three main phases: tracking, chasing, and approaching the prey; pursuing, encircling, and harassing the prey until it stops moving; and finally attacking the prey. In the algorithm, the number of wolves is set to 50 and the number of iterations to 50.
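Grey wolf optimization works on continuous positions, and the source does not say how positions are turned into routes. A minimal sketch, assuming a random-key encoding (each stop gets a real-valued key and the route visits stops in ascending key order), with the 50 wolves and 50 iterations stated above. The three best wolves act as alpha, beta, and delta, and every wolf moves to the average of three leader-guided estimates.

```python
import math
import random

def tour_time(tour, times):
    """Total time of a closed tour (returns to the starting point)."""
    n = len(tour)
    return sum(times[tour[k]][tour[(k + 1) % n]] for k in range(n))

def decode(position):
    """Random-key decoding: visit stops in ascending order of their keys."""
    return sorted(range(len(position)), key=lambda i: position[i])

def grey_wolf_tsp(times, n_wolves=50, iterations=50, seed=0):
    rng = random.Random(seed)
    n = len(times)

    def cost(x):
        return tour_time(decode(x), times)

    wolves = [[rng.random() for _ in range(n)] for _ in range(n_wolves)]
    best_pos, best_cost = None, math.inf
    for t in range(iterations):
        ranked = sorted(wolves, key=cost)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]  # three leaders
        if cost(alpha) < best_cost:
            best_pos, best_cost = alpha[:], cost(alpha)
        a = 2.0 * (1 - t / iterations)  # decreases linearly from 2 towards 0
        new_wolves = []
        for x in wolves:
            new_x = []
            for d in range(n):
                estimates = []
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a  # |A| > 1 explores, |A| < 1 exploits
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - x[d])  # distance to this leader
                    estimates.append(leader[d] - A * D)
                new_x.append(sum(estimates) / 3)  # average of the three estimates
            new_wolves.append(new_x)
        wolves = new_wolves
    for x in wolves:  # the final pack may contain a new best
        if cost(x) < best_cost:
            best_pos, best_cost = x[:], cost(x)
    return decode(best_pos), best_cost

# Usage on a toy instance: 8 stops on a circle, travel time = straight-line distance.
pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
T = [[math.dist(p, q) for q in pts] for p in pts]
tour, minutes = grey_wolf_tsp(T)
```

The shrinking coefficient `a` mirrors the hunt: early iterations let wolves roam widely (exploration), later ones pull the pack tightly around the leaders (encircling and attacking).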

3.4 Optimization Using Artificial Bee Colony Algorithm

This algorithm was inspired by the foraging behaviour of honey bees searching for high-quality food sources. In the Artificial Bee Colony Algorithm (see Fig. 7), there is a set of food-source positions, and the bees modify these positions over time. The method uses a swarm of computational agents, the honey bees, to find the best solution. The algorithm has three types of bees: employed bees, onlooker bees, and scout bees. The employed bees exploit the food sources, while the onlooker bees wait for information from the employed bees about the nectar content of the food sources. The onlooker bees use this information to choose food sources, which they then exploit. Finally, the scout bees search for new food sources at random. Each solution in the search space is a set of optimization parameters that identifies the location of a food source, and the number of employed bees equals the number of food sources. The "fitness value" of a food source measures its quality and is linked to its location [8]. In this approach, the employed bees explore their food sources, taking the fitness score into account, and then share the gathered information with the onlooker bees. The number of employed bees and the number of onlooker bees are each equal to the number of solutions (SN), and each solution (food source) is represented by a D-dimensional vector. The onlooker bees select a food source on the basis of this information and are more likely to choose a higher-quality one. The way the optimal solution is chosen resembles a swarm of bees searching for, advertising, and finally selecting the best-known nectar source: an onlooker bee's choice of food source depends on the probability value associated with that source.
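A minimal sketch of the three bee phases adapted to routes. Assumptions not taken from the source: each food source is a tour rather than a continuous D-dimensional vector, a neighbour is produced by swapping two stops, fitness is 1/travel time (so onlookers pick source i with probability fitness_i divided by the total fitness), and `limit` is a hypothetical abandonment threshold for scouts.

```python
import math
import random

def tour_time(tour, times):
    """Total time of a closed tour (returns to the starting point)."""
    n = len(tour)
    return sum(times[tour[k]][tour[(k + 1) % n]] for k in range(n))

def bee_colony_tsp(times, n_sources=20, iterations=200, limit=20, seed=0):
    rng = random.Random(seed)
    n = len(times)
    sources = [rng.sample(range(n), n) for _ in range(n_sources)]  # food sources (SN)
    costs = [tour_time(s, times) for s in sources]
    trials = [0] * n_sources  # attempts without improvement

    def try_neighbour(k):
        """Swap two stops; keep the new tour only if it is faster (greedy)."""
        cand = sources[k][:]
        i, j = rng.sample(range(n), 2)
        cand[i], cand[j] = cand[j], cand[i]
        c = tour_time(cand, times)
        if c < costs[k]:
            sources[k], costs[k], trials[k] = cand, c, 0
        else:
            trials[k] += 1

    best_k = min(range(n_sources), key=lambda k: costs[k])
    best_tour, best_cost = sources[best_k][:], costs[best_k]
    for _ in range(iterations):
        for k in range(n_sources):  # employed bees: one per food source
            try_neighbour(k)
        fitness = [1.0 / c for c in costs]  # shorter time -> higher fitness
        for _ in range(n_sources):  # onlookers choose proportionally to fitness
            try_neighbour(rng.choices(range(n_sources), weights=fitness)[0])
        for k in range(n_sources):  # remember the best before scouts reset
            if costs[k] < best_cost:
                best_tour, best_cost = sources[k][:], costs[k]
        for k in range(n_sources):  # scouts abandon exhausted sources
            if trials[k] > limit:
                sources[k] = rng.sample(range(n), n)
                costs[k] = tour_time(sources[k], times)
                trials[k] = 0
    for k in range(n_sources):
        if costs[k] < best_cost:
            best_tour, best_cost = sources[k][:], costs[k]
    return best_tour, best_cost

# Usage on a toy instance: 8 stops on a circle, travel time = straight-line distance.
pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
T = [[math.dist(p, q) for q in pts] for p in pts]
tour, minutes = bee_colony_tsp(T)
```

Employed and onlooker bees exploit promising tours, while the scout phase injects fresh random tours so the colony does not stall on a local optimum.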


Fig. 7 Artificial Bee Colony Algorithm flowchart

4 Results

Using Extreme Gradient Boost, a feature importance score was calculated for each input (see Fig. 8). The most significant features for predicting the travel time between points were found to be the pickup and drop-off longitude and latitude; trip distance and pickup time are also considered for prediction.


Fig. 8 Feature score of the extreme gradient boosting model

The total travel time for the first generation of the Genetic Algorithm is more than 330 min (around 5 h 30 min), while the best travel time found was 3 h 38 min (218 min). Figure 9 illustrates the optimized path calculated by the Genetic Algorithm.

Fig. 9 Best solution of Genetic Algorithm


Fig. 10 Best solution of grey wolf optimization

Table 1 Optimal time observed in various optimizations

Algorithm/optimization name          Best time (in minutes)
Genetic Algorithm                    218
Ant colony optimization              165
Grey wolf optimization               157
Artificial Bee Colony Algorithm      132

The travel time of the first generation of ant colony optimization is around 171 min (approximately 3 h), while the best travel time from ant colony optimization is 167 min (2 h 47 min). The best travel time from grey wolf optimization (see Fig. 10) is 157 min (2 h 37 min). The best travel time found by Bee Colony Optimization is 132 min (2 h 12 min). Table 1 compares the optimal travel time calculated by each of the algorithms.
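A quick arithmetic check of Table 1: the snippet restates the table's best times, converts each to hours and minutes, and expresses it as a percentage reduction relative to the Genetic Algorithm's 218 min. The helper names are illustrative.

```python
BEST_MINUTES = {  # values restated from Table 1
    "Genetic Algorithm": 218,
    "Ant colony optimization": 165,
    "Grey wolf optimization": 157,
    "Artificial Bee Colony Algorithm": 132,
}

def to_h_min(minutes: int) -> tuple:
    """Split a duration in minutes into (hours, minutes)."""
    return divmod(minutes, 60)

def reduction_vs(baseline: float, value: float) -> float:
    """Percentage reduction of `value` relative to `baseline`."""
    return 100 * (baseline - value) / baseline

baseline = BEST_MINUTES["Genetic Algorithm"]
for name, m in BEST_MINUTES.items():
    h, mm = to_h_min(m)
    print(f"{name}: {h} h {mm} min, {reduction_vs(baseline, m):.1f}% below GA")
```

By this measure, the Artificial Bee Colony route is about 39.4% faster than the best Genetic Algorithm route.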

5 Conclusion

The travelling salesman problem and the vehicle routing problem are NP-hard, so their complexity grows rapidly as more nodes or points are added. Bio-inspired algorithms are a go-to approach because they find good, though not necessarily optimal, solutions in reasonable time, avoiding the exponential running time of exact methods.


The New York Taxi dataset was one of the best choices of dataset for insights into travel time, traffic, and so on. An Extreme Gradient Boosting model was trained on the data and achieved high accuracy with little effort. The Genetic Algorithm, ant colony optimization, grey wolf optimization, and the Bee Colony Algorithm were explored for solving this problem. All of the algorithms reduced the travel time, but Table 1 shows that the Artificial Bee Colony Algorithm gave the best result, with a best time of 132 min, whereas the Genetic Algorithm performed worst, with a best time of 218 min.

References
1. Lee M, Hong J, Cheong T, Lee H (2021) Flexible delivery routing for elastic logistics: a model and an algorithm. IEEE Trans Intell Transp Syst
2. Wang Y, Han Z (2021) Ant colony optimization for traveling salesman problem based on parameters optimization. Appl Soft Comput 107:107439
3. Kuhlemann S, Tierney K (2020) A genetic algorithm for finding realistic sea routes considering the weather. J Heuristics 26(6):801–825
4. Donbosco IS, Chakraborty UK (2021) Comparing performance of evolutionary algorithms: a travelling salesman perspective. In: 2021 11th international conference on cloud computing, data science & engineering (Confluence). IEEE, pp 182–187
5. Syauqi MH, Zagloel TYM (2020) Optimization of heterogeneous vehicle routing problem using genetic algorithm in courier service. In: Proceedings of the 3rd Asia Pacific conference on research in industrial and systems engineering 2020, pp 48–52
6. Sun C (2020) A study of solving traveling salesman problem with genetic algorithm. In: 2020 9th international conference on industrial technology and management (ICITM). IEEE, pp 307–311
7. Sopto DS, Ayon SI, Akhand MAH, Siddique N (2018) Modified grey wolf optimization to solve traveling salesman problem. In: 2018 International conference on innovation in engineering and technology (ICIET). IEEE, pp 1–4
8. Wong LP, Low MYH, Chong CS (2009) An efficient bee colony optimization algorithm for traveling salesman problem using frequency-based pruning. In: 2009 7th IEEE international conference on industrial informatics. IEEE, pp 775–782

Survival Analysis and Its Application of Discharge Time Likelihood Prediction Using Clinical Data on COVID-19 Patients-Machine-Learning Approaches
S. Muruganandham and A. Venmani

Abstract COVID-19 has had a significant fatality rate since it appeared in December 2019 as an extremely contagious respiratory illness. As the number of cases in containment zones rises, health officials are concerned that authorized treatment centers may become overrun with coronavirus patients. Artificial neural networks (ANNs) are machine learning models that can find complex relationships within datasets; they enable the detection of patterns in complex biological datasets that would be impossible to identify with traditional linear statistical analysis. Several computational techniques are used to study the survival characteristics of patients. According to this study, men had higher mortality rates than women, and mortality was also higher in older age groups. The discharge times of COVID-19 patients were also predicted using various machine learning and statistical tools. In medical research, survival analysis is a commonly used technique for identifying relevant predictors of adverse outcomes and developing therapy guidelines for patients. Historically, demographic statistics have been used to predict outcomes in such patients; these projections, however, have little meaning for the individual patient. We present the training of neural networks to predict outcomes for individual patients at one institution, as well as their predictive performance on data from another institution in a different region. The results show that the gradient boosting survival model outperforms all the other models in this study for predicting patient survival. This study aims to assist health officials in making more informed decisions during the outbreak.
Keywords Time-to-event · Longevity analysis · Prediction model · Artificial intelligence · Machine learning

S. Muruganandham (B) · A. Venmani SRM Institute of Science and Technology (SRMIST), Kattankulathur, Chennai, India e-mail: [email protected] A. Venmani e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_28


1 Introduction

COVID-19 has quickly spread across several countries, and the rapid rise in coronavirus cases can directly lead to overcrowded hospitals. The rapid spread of the COVID-19 pandemic has caused a public health emergency and raised global concern. The coronavirus epidemic of 2020 shook the entire world, affecting both symptomatic and asymptomatic carriers. Many COVID-19 cases began to emerge at multiple hospitals, primarily in Chennai and other major cities, as the virus spread from Chennai across the country. Hospitals and medical institutions found to be harboring coronavirus patients were shut down in March 2020, and patients were swiftly transferred to alternative care units once the main public hospitals were closed. A surge in atypical pneumonia cases was observed in Wuhan, China, in December 2019, and a new coronavirus (SARS-CoV-2, the cause of COVID-19) was quickly identified as the source of the outbreak [1]. By March 2020, the virus had spread to more than 175 nations, with about four and a half lakh (450,000) confirmed cases and more than 19,000 deaths [2]. According to the earliest reports, the death rate varies with population, age distribution, and health infrastructure; COVID-19 patients in China had an overall death rate of 2.3 percent. Predicting recovery and discharge time can help health officials devise effective measures for lowering the death toll. Patients who needed to be transferred could only be admitted to hospitals approved by the healthcare authorities, and patients with coronavirus disease could only be treated at specialized hospitals. As a result, patients from across the country were gradually relocated to these hospitals, limiting the spread of the disease. The length of hospital stay is therefore critical, because it provides decision-makers with the information they need to plan for hospital overcrowding.

2 Methodology

Early research has indicated that statistical analysis can be used to develop predictive models for COVID-19 outcomes in order to identify death rates and their risk factors [3–5]. In medical research, there are many techniques for predicting the survival rate of infected patients. In this work, we use them to construct models that forecast a patient's length of stay in the hospital, taking the patient's discharge time as the event of interest. We apply survival analysis techniques to estimate patients' survival periods and to investigate the impact of the primary risk factors that influence the likelihood of being discharged from the hospital. We adopt approaches that can analyse censored cases, which yields more trustworthy results by avoiding the large loss of data that discarding those cases would cause. These methods are established in clinical research for examining the link between risk factors and the event of interest, and they are reliable for censored data [1, 6]. They have been frequently used to predict COVID-19 prognosis in order to aid the optimization and improvement of COVID-19 treatment [2, 7–9]. Cox proportional hazards (PH) regression [6, 10, 11] is one of the best-known survival analysis methods. It has been commonly employed in prognosis prediction tasks and is implemented in many well-known software toolboxes [4] (Figs. 1 and 2).
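As an illustration of what Cox PH regression fits, the sketch below evaluates the Cox partial log-likelihood for a single covariate under the model h(t | x) = h0(t)·exp(βx); fitting the model means choosing β to maximise this quantity. This is a textbook formulation, not the authors' code; tied event times use Breslow's approximation, and the function name is illustrative.

```python
import math
from typing import Sequence

def cox_partial_log_likelihood(beta: float, x: Sequence[float],
                               durations: Sequence[float],
                               observed: Sequence[int]) -> float:
    """Cox PH partial log-likelihood for one covariate.

    Ties use Breslow's approximation: every subject still under observation
    at time t_i (t_j >= t_i) belongs to the risk set of event i.
    """
    ll = 0.0
    for i, (t_i, event) in enumerate(zip(durations, observed)):
        if not event:
            continue  # censored subjects contribute only through risk sets
        risk = sum(math.exp(beta * x_j)
                   for x_j, t_j in zip(x, durations) if t_j >= t_i)
        ll += beta * x[i] - math.log(risk)
    return ll
```

In the discharge setting, `observed = 1` marks a patient whose discharge was seen and `0` a patient still in hospital when the data were collected; the baseline hazard h0(t) cancels out, which is why no distributional assumption about time is needed.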

Fig. 1 Analysis diagram

Fig. 2 Censoring diagram


2.1 Analysis of Longevity

Survival analysis is a well-known statistical technique for predicting the time that elapses until an event of interest occurs within a given time frame; it is widely used in economics [12] as well as health care [13]. In this study, the event of interest is the point at which a patient is discharged from the hospital [14–16]. Survival analysis is a type of regression that predicts a continuous variable (the time to the event). The key difference between survival analysis and traditional regression approaches is that survival data are only partially observed.

Censoring. In survival data, some study participants have not experienced the event of interest by the end of the study or at the time of analysis; for example, at the end of the experiment some patients may still be alive or in remission. For these subjects, the exact survival time is unknown. Such observations or times are referred to as censored observations or censored times, respectively. Censoring occurs when a person does not experience the event before the study concludes, is lost to follow-up during the study period, or drops out of the study for other reasons (assuming death is not the event of interest) [17]. Censoring may be right, left, or interval censoring.

Survival Function. The survival function S(t) gives the probability that a person survives longer than t:

$$S(t) = P(T > t) = 1 - F(t)$$

where t is the current time, T is the time of death (the event time), P denotes probability, and F(t) = P(T ≤ t) is the probability that the person fails by time t. S(t) is a non-increasing function of time with S(0) = 1 and S(t) → 0 as t → ∞. S(t) is also called the cumulative survival rate; in other words, it is the probability that the moment of death will be later than a specified time [12]. In biological survival problems it is also known as the survivorship function [18].
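Censored records still carry information about S(t). A minimal sketch of estimating S(t) from censored durations with the Kaplan–Meier product-limit estimator (the KM estimator discussed later in this section); the function name and data are illustrative, with `observed = 1` meaning the event (here, discharge) was seen and `0` meaning the record is censored.

```python
from typing import List, Sequence, Tuple

def kaplan_meier(durations: Sequence[float],
                 observed: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(t, S_hat(t))] at each distinct time where an event occurred."""
    data = sorted(zip(durations, observed))
    at_risk = len(data)  # subjects still under observation
    s_hat = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = leaving = 0
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1  # events and censorings both leave the risk set after t
            i += 1
        if events:
            s_hat *= (at_risk - events) / at_risk
            curve.append((t, s_hat))
        at_risk -= leaving
    return curve

# Hypothetical hospital stays in days; 0 = still admitted when data were collected.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Censored patients count in the risk set up to their last observed day, so they lower neither S(t) directly nor the sample size prematurely.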
The probability density function f(t) is the probability of failure in a short interval from t to t + Δt, per unit width Δt:

$$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t} \qquad (1)$$

Hazard Function. The hazard function h(t), also called the conditional failure rate, is defined as the limit of the probability that an individual fails in a very short interval (t, t + Δt), given that the individual has survived to time t, per unit width Δt [19]:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} \qquad (2)$$


The hazard function can also be expressed in terms of the pdf f(t) and the cdf F(t):

$$h(t) = \frac{f(t)}{1 - F(t)} \qquad (3)$$

The hazard function is also referred to as the instantaneous failure rate, the force of mortality, the conditional mortality rate, and the age-specific failure rate. The cumulative hazard function is defined as follows:

$$H(t) = \int_0^t h(x)\,dx \qquad (4)$$
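To make Eqs. (1)–(4) concrete, the sketch below checks them numerically for a Weibull distribution with a hypothetical scale LAM = 10 and shape K = 2 (chosen only for illustration, not fitted to the study's data). For this distribution S(t) = exp(−(t/LAM)^K), h(t) = (K/LAM)(t/LAM)^(K−1), and H(t) = (t/LAM)^K.

```python
import math

LAM, K = 10.0, 2.0  # hypothetical Weibull scale and shape

def S(t): return math.exp(-((t / LAM) ** K))       # survival function
def F(t): return 1.0 - S(t)                        # cumulative distribution
def h(t): return (K / LAM) * (t / LAM) ** (K - 1)  # hazard function
def f(t): return h(t) * S(t)                       # density
def H(t): return (t / LAM) ** K                    # cumulative hazard

def pdf_by_limit(t, dt=1e-6):
    """Eq. (1): P(t <= T < t + dt) / dt for a small dt."""
    return (F(t + dt) - F(t)) / dt

def cumulative_hazard_by_integral(t, steps=1000):
    """Eq. (4): trapezoidal rule for the integral of h(x) from 0 to t."""
    dx = t / steps
    return sum((h(i * dx) + h((i + 1) * dx)) / 2 * dx for i in range(steps))

t = 7.0
print(pdf_by_limit(t), f(t))                  # Eq. (1) vs closed form
print(f(t) / (1 - F(t)), h(t))                # Eq. (3)
print(cumulative_hazard_by_integral(t), H(t)) # Eq. (4)
```

The last identity also illustrates the useful relation H(t) = −log S(t), which links the cumulative hazard back to the survival function.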

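The Kaplan–Meier (KM) estimator uses lifetime data to estimate the survival function. In medical research, it can be used to determine the percentage of patients who survive for a specific amount of time after treatment [13]. Let S(t) denote the probability that an item from a given population will outlive t. For a sample of size n from this population, the KM estimator is

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \frac{n_i - d_i}{n_i}$$

where t_i are the distinct event times, n_i is the number of individuals at risk just before t_i, and d_i is the number of events at t_i.

… 45 is 0.410, and mean ratio for the whole group is 0.423.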

5 Conclusion and Future Work

So far, three aspects of game AI have been listed. This overview is by no means complete, given that game AI encompasses a broad range of topics, several of which are outside the scope of this article. For instance, game AI has been used for various software applications, as well as to assist game design and production, game testing, and so on, which are not covered in this article. Here we have presented a model called Logic Smasher that provides different game visualizations using AI-based algorithms, in terms of parameters such as the height, weight, and age of the players. According to our results, the random forest algorithm gives an MAE of 9.1915. Since game outcomes are volatile in nature, we consider the random forest algorithm better than the other approaches. We have also analyzed the players' height, age, and weight against one another, and finally visualized country-wise participation and goals. In the future, we want to perform a more thorough analysis that takes into account additional aspects of game AI using other machine learning algorithms.

Game Data Visualization Using Artificial Intelligence Techniques


Fig. 7 Weight-to-height ratio by age

References
1. Spronck P, André E, Cook M, Preuß M (2018) Artificial and computational intelligence in games: AI-driven game design (Dagstuhl Seminar 17471). In: Dagstuhl Reports, vol 7, no 11. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik
2. Skinner G, Walmsley T (2019) Artificial intelligence and deep learning in video games: a brief review. In: 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS). IEEE, pp 404–408
3. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489
4. Esfahlani SS, Butt J, Shirvani H (2019) Fusion of artificial intelligence in neuro-rehabilitation video games. IEEE Access 7:102617–102627
5. Dillon R (2011) The golden age of video games: the birth of a multibillion dollar industry. CRC Press


S. K. Mohapatra et al.

6. Koundal D, Gupta S, Singh S (2018) Computer aided thyroid nodule detection system using medical ultrasound images. Biomed Sig Process Control 40:117–130
7. Guimaraes M, Santos P, Jhala A (2017) CiF-CK: an architecture for social NPCs in commercial games. In: 2017 IEEE conference on computational intelligence and games (CIG). IEEE, pp 126–133
8. Arzate Cruz C, Ramirez Uresti JA (2018) HRLB2: a reinforcement learning based framework for believable bots. Appl Sci 8(12):2453
9. Zhao Y, Borovikov I, Rupert J, Somers C, Beirami A (2019) On multi-agent learning in team sports games. arXiv preprint arXiv:1906.10124
10. Borovikov I, Harder J, Sadovsky M, Beirami A: Towards interactive training of non-player characters in video games. arXiv preprint arXiv:1906.00535
11. Razzaq S, Maqbool F, Khalid M, Tariq I, Zahoor A, Ilyas M (2018) Zombies arena: fusion of reinforcement learning with augmented reality on NPC. Clust Comput 21(1):655–666
12. Nadiger C, Kumar A, Abdelhak S (2019) Federated reinforcement learning for fast personalization. In: 2019 IEEE second international conference on artificial intelligence and knowledge engineering (AIKE). IEEE, pp 123–127
13. Sarangi PK, Nayak BK, Dehuri S (2021) Stock market price behavior prediction using Markov models: a bioinformatics approach. Data Analytics Bioinform: A Mach Learn Perspect: 485–505
14. Sarangi PK, Nayak BK, Dehuri S (2021) A novel approach for prediction of stock market behavior using bioinformatics techniques. Data Analytics Bioinform: A Machine Learn Perspect: 459–484
15. Mohapatra SK, Kamilla SK, Swarnkar T, Patra GR (2020) Forecasting world petroleum fuel crisis by nonlinear autoregressive network. In: New paradigm in decision science and management. Springer, Singapore, pp 67–76
16. Bamunif AOA (2021) Sports information and discussion forum using artificial intelligence techniques: a new approach. Turkish J Comput Math Educ (TURCOMAT) 12(11):2847–2854

Energy-Efficient and Fast Data Collection in WSN Using Genetic Algorithm

Rahul Shingare and Satish Agnihotri

Abstract For military and civilian applications in remote and difficult-to-reach regions, low-power sensor nodes are routinely deployed. Wireless sensor networks (WSNs) are networks of such sensors, which are capable of sensing, computing, and transmitting data. Access to these networks and data collection from them are possible using an unmanned aerial vehicle (UAV), and low-altitude UAVs (drones) make it possible to optimise WSN data-gathering positions. Another advantage of this technology is that it requires less computing and communication power. Even so, WSNs consume a substantial amount of energy to keep running. Cluster-based routing methods extend network lifetime by distributing the load over multiple nodes. Improved genetic algorithms (GAs) and K-means have the potential to minimise power consumption while also extending network lifetime. The energy consumption of cluster head (CH) nodes can be lowered by using a more efficient GA, and the K-means algorithm is used to generate dynamically clustered networks. In this paper, we use a genetic algorithm and a modified LEACH protocol to build an energy-efficient clustered WSN and compare it with existing algorithms. The results show that the proposed algorithm performs better than the existing ones.

Keywords Sensors · Clustering · Genetic algorithm · K-means · Energy efficiency

R. Shingare · S. Agnihotri (B)
Computer Science and Engineering Department, Madhyanchal Professional University Ratibad, Bhopal, Madhya Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_31

1 Introduction
Autonomous devices that communicate wirelessly and use sensors to acquire information or detect crucial events in the physical world are referred to as nodes of a wireless sensor network (WSN). In addition to environmental monitoring and military battlefield surveillance, such sensors can be used for a variety of other applications, such as logistics management, health tracking, and industrial control, each of which serves a specific purpose for the user. When implementing a WSN, a number of design issues must be taken into account according to the application's requirements and the anticipated consequences of the deployment. When addressing channel characteristics, network architecture, resource restrictions, interference management, and other quality of service requirements, network lifetime (NL) is a significant factor in determining whether a network can provide appropriate service. A WSN can support its application only while it is operational, so WSNs must be designed with long-term operation in mind: the network being operational is a prerequisite for accomplishing all design objectives. The fundamental goal of our design was to maximise the NL while keeping the bit error rate (BER) as low as possible. In this study, the BER was found to be the most important parameter for judging quality of service. Network designers can use the NL to make informed judgements about network performance and service-level agreements. Because WSN sensor nodes have short battery lives, the NL depends heavily on them during WSN operation. In most real-world applications, such as sensors buried in glaciers to detect climate change, it is either difficult or prohibitively expensive to recharge or replace the sensors' batteries, so the NL is limited by the battery-powered sensors. To account for this, our method adaptively balances transmission rate and power dissipation. Physical layer settings that have little effect in fixed-rate systems, such as the processing power needed by each sensor, were also investigated.
An example of a string topology is the case in which data must be transported from a source node to a destination node (DN), known in the network as the sink node; this is referred to as string topology transmission. Data can reach the sink node only if a connected path exists between the source node and the destination node. To extend the NL, low-complexity routing must be optimised for WSNs under interference constraints. WSN routing protocols therefore focus on preserving sensor node energy so that nodes remain operational for as long as possible, thereby increasing network lifetime and maintaining network connectivity. Because nearby sensor nodes may produce similar sensing data, routing techniques in a WSN must identify and remove duplicate information. Most data in a WSN is routed to a single sink, IP-based addressing is not explicitly used, and sensor nodes have limited storage and processing capabilities. For all of these reasons, many energy-efficient routing protocols for WSNs have been proposed over the past two decades. Relevant surveys collect many of them and propose a taxonomy of energy-efficient routing protocols, shown in Fig. 1, that classifies protocols according to four factors: network structure, how data is transmitted, whether location information is used, and whether multiple paths are supported. All of these protocols, both classic and modern, will be presented in this article, with a focus on highlighting their most important features and comparing their advantages and disadvantages. This will also include a discussion of general and specific considerations, as well as future research questions. Compared with other reviews of its kind, this review offers a more comprehensive examination of energy-efficient routing protocols: it broadens the existing taxonomies and takes additional performance metrics into account when comparing protocols.

Fig. 1 Classification of energy-efficient routing protocols

Figure 2 summarises the proposed taxonomy of energy-saving mechanisms. Radio optimisation: the radio module is the primary cause of battery depletion in sensor nodes. Antenna direction and transmission power have been investigated as ways to reduce the energy lost during wireless communication, the goal being to minimise radio power consumption by identifying the most efficient modulation parameters. Energy depletion has two components: circuit power consumption and transmitted signal power consumption. Circuit consumption dominates transmission power over short distances, whereas signal power dominates over longer distances. Current research is focused on finding a balance between constellation size, information rate, and transmission time.

The rest of this paper is structured as follows. Section 2 reviews related research on this subject. Section 3 describes the proposed work. Section 4 presents the detailed implementation of the proposed algorithm and analyses the outcomes of a set of simulation experiments. Finally, Sect. 5 concludes the paper.
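The circuit-versus-transmit trade-off described under radio optimisation can be illustrated with the first-order radio model commonly used in the LEACH literature: sending k bits over distance d costs E_elec·k + ε_fs·k·d² below a crossover distance d0 = sqrt(ε_fs/ε_mp), and E_elec·k + ε_mp·k·d⁴ above it. The sketch below assumes typical textbook constants (E_elec = 50 nJ/bit, ε_fs = 10 pJ/bit/m², ε_mp = 0.0013 pJ/bit/m⁴); these are assumptions for illustration, not values taken from this paper's simulation.

```python
import math

# Assumed first-order radio model constants (typical LEACH-literature values).
E_ELEC = 50e-9        # J/bit, electronics (circuit) energy per bit
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier coefficient
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier coefficient
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, roughly 87.7 m


def tx_energy(bits, distance):
    """Return (circuit_energy, amplifier_energy) in joules for one transmission."""
    circuit = E_ELEC * bits
    if distance < D0:
        amplifier = EPS_FS * bits * distance ** 2   # free-space (d^2) regime
    else:
        amplifier = EPS_MP * bits * distance ** 4   # multipath (d^4) regime
    return circuit, amplifier


for d in (10, 50, 100, 200):
    circuit, amp = tx_energy(4000, d)
    dominant = "circuit" if circuit > amp else "amplifier"
    print(f"d={d:>3} m  circuit={circuit:.2e} J  amplifier={amp:.2e} J  -> {dominant}")
```

With these assumed constants, circuit energy dominates at short range and amplifier energy at long range, which matches the qualitative claim in the text.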


Fig. 2 Energy-saving mechanism classification

2 Literature Review
Wireless sensor networks (WSNs) can extend their operational lifetime by using sleep scheduling. After an extensive network of sensors is deployed randomly, it is common practice to divide the sensors into sets with particular boundaries and then schedule sensor activation according to the order in which the sets were created. Li et al. [1] use new greedy strategies to initialise the population in a fraction of the time this would ordinarily take. The authors demonstrate that their technique outperforms existing methods, particularly on large networks. Their experiments show that both proposed genetic operators are effective and that suitable parameter settings for their hybrid genetic algorithm with bidirectional mutation (BMHGA) can be found by trial and error with little difficulty.

To be effective, WSNs must offer both high availability and high energy efficiency. When sensor readings have a sparse representation, one option is compressive sensing (CS), a data-reduction technique that recovers a signal from a small number of samples. Taking more measurements improves precision but increases data transmission, so the algorithm's primary goal is to strike a good balance between energy efficiency and precision; the optimised values are then used to construct a multihop path. The Pareto-front output of MOGA helps the user balance energy efficiency and accuracy when deciding how many measurements and which transmission ranges to use in a given application, as shown through numerical simulations and experimental data collection. According to the findings, using measurement matrices with lower mutual coherence improves the accuracy of CS.

Placing a large number of nodes while simultaneously optimising several metrics is NP-hard. Paper [2] develops a new method for optimising node placement. Its first step is to identify the limitations of commonly used approaches; studying, improving, and validating existing physical models are then critical to obtaining precise solutions. On this basis, the authors formulate deployment optimisation as a constrained multi-objective optimisation problem and develop the multi-objective genetic algorithm and weighted sum optimisation (MOONGA) algorithm. This optimiser can take into account the topology, environment, application standards, and preferences of a network designer. The authors implemented several algorithms and evaluated them on different sets of test data; according to their analyses, the technique is promising and outperforms the alternatives considered.

Intruder nodes and compromised nodes can slow down data collection, so security is extremely important. Compromised nodes can be located through secret sharing. Recent research has concentrated on key authentication, but little has been done to understand how to share secrets among a large number of participants.
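Mutual coherence, mentioned above as a way to improve CS accuracy, is the largest absolute normalised inner product between two distinct columns of the measurement matrix; lower values generally favour sparse recovery. The pure-Python sketch below assumes the matrix is given as a list of rows; the example matrix is hypothetical.

```python
import math
from itertools import combinations


def mutual_coherence(matrix):
    """Largest |<a_i, a_j>| / (||a_i|| * ||a_j||) over distinct columns i != j."""
    columns = list(zip(*matrix))  # transpose rows into columns
    if len(columns) < 2:
        raise ValueError("need at least two columns")
    best = 0.0
    for a, b in combinations(columns, 2):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        if norm == 0:
            raise ValueError("zero column in measurement matrix")
        best = max(best, abs(dot) / norm)
    return best


# Hypothetical 2 x 3 measurement matrix (2 measurements of a 3-sample signal).
phi = [[1.0, 0.0, 1.0],
       [0.0, 1.0, 1.0]]
print(mutual_coherence(phi))
```

An identity-like matrix with orthogonal columns has coherence 0, while the third column above overlaps with both others, raising the coherence to 1/sqrt(2).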
Given the network size, the expected fraction of hostile nodes, and the underlying quantum teleportation principles, such a protocol is preferable for performing secure data aggregation.

Recent research has shown that sink mobility along a constrained path can increase energy efficiency in WSNs. However, because of the path constraint on the mobile sink, it is difficult to collect data from sensor nodes deployed at random locations, so increasing data collection while extending network lifetime is a major challenge [3]. To solve this issue, the authors propose a sensor node and routing solution that maximises network lifetime while also improving network performance.

Many civilian and military sensor nodes are deployed in remote, steep, and difficult-to-reach areas, and their battery life is limited. A fast and energy-efficient data collection (EFDC) approach has been devised for WSNs in mountainous terrain using an unmanned aerial vehicle (UAV). First, an energy-efficient distributed clustering technique based on a central-bias hybrid approach was developed to cluster the sensors. A modified tabu search method was then used to determine the UAV's position within each cluster.

A multi-objective genetic algorithm (MOGA) for the AODV routing protocol in WSNs is discussed in detail in [4]. AODV uses the minimum hop count as its sole routing metric, which causes two problems. First, always taking the shortest route can cause traffic congestion and uneven energy depletion, which can lead to network outage. Second, routing over short paths with weak links is more damaging than routing over longer paths with strong links, because of the risk of retransmissions and packet drops.

Praveena et al. [5] describe WSNs as networks of spatially dispersed wireless devices (sensors) that monitor environmental and physical conditions. Nodes in a WSN can fail for a variety of reasons, including excessive power consumption, hardware malfunction, and environmental factors, so fault tolerance is a major issue. Using Network Simulator version 2 (NS-2), that work compares particle swarm optimisation (PSO) with the genetic algorithm (GA). In addition to increasing the number of active nodes, the approach replaces failed sensor nodes using reusable routing paths, and reducing energy use also reduces data loss [5].

Dhami et al. [6] note that energy-efficient protocols are used in WSNs to conserve energy and extend the system's lifetime. In a WSN, data is transmitted through the deployed nodes rather than over a dedicated physical channel, and the nodes are responsible for forwarding it to the final destination. Nodes are grouped into clusters to improve communication: nodes near one another send their data to the cluster head, which transmits it to the sink. Many clustering methods and strategies have been developed to improve WSN performance. Traditionally, the central node of the network was designated as the cluster head and was limited in its ability to transmit data across the network; because it uses more energy than other nodes, it dies first. Moreover, because data travels along a straight-line route, that route becomes congested, increasing energy use and shortening network lifetime.
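Both the AODV hop-count problem and the congested straight-line route described above come from choosing routes by path length alone. A link-quality metric such as the expected transmission count (ETX = 1/(d_f·d_r), where d_f and d_r are the forward and reverse delivery ratios of a link) is one well-known alternative from the mesh-networking literature. It is swapped in here purely for illustration and is not the MOGA approach of [4]; the topology and delivery ratios are hypothetical.

```python
import heapq


def best_path(graph, src, dst, cost):
    """Dijkstra: return (path, total_cost) minimising the given per-link cost."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, link in graph[u].items():
            nd = d + cost(link)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])  # KeyError here means dst is unreachable
    return path[::-1], dist[dst]


def hop_count(link):
    return 1.0  # AODV-style metric: every hop costs the same


def etx(link):
    forward, reverse = link
    return 1.0 / (forward * reverse)  # expected transmissions per successful packet


# Hypothetical topology; each link is (forward, reverse) delivery ratio.
graph = {
    "S": {"A": (0.3, 0.3), "B": (0.95, 0.95)},
    "A": {"D": (0.3, 0.3)},
    "B": {"C": (0.95, 0.95)},
    "C": {"D": (0.95, 0.95)},
    "D": {},
}

print(best_path(graph, "S", "D", hop_count))  # short route over weak links
print(best_path(graph, "S", "D", etx))        # longer route over strong links
```

Hop count picks the two-hop route through the weak links, while ETX prefers the three-hop route whose links rarely need retransmission.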
An energy-efficient evolutionary algorithm combined with the virtual grid-based dynamic routes adjustment (VGDRA) idea can improve the overall performance of WSNs. Because the proposed approach is dynamic rather than static and can balance and optimise the load, it is more likely to reach a better result in fewer iterations. The method is simulated in MATLAB [6]. Zhou et al. propose a genetic clustering-based routing algorithm for WSNs. Once the nodes are deployed, the algorithm clusters the network and chooses the cluster heads. It can also recluster the network and select new cluster heads quickly, which helps keep energy consumption in check and extends WSN lifetime [7]. Sujee and Kannammal describe WSNs as networks of sensors placed at various locations to continuously record information about the surrounding environment or physical conditions, with data routed through the network from each sensor to a central location. A sensor's lifetime is determined by its battery power, so it is critical both to extend sensor lifetime and to distribute power use across the WSN. The low-energy adaptive clustering hierarchy (LEACH) routing protocol uses energy effectively up to a point. That paper compares LEACH, genetic LEACH, and inter-cluster communication in LEACH. A GA can extend WSN lifetime, starting from an analysis of the fundamental operations of LEACH; in the inter-cluster variant, LEACH uses inter-cluster communication rather than direct communication to reach the sink, and the results are then compared. Genetic LEACH increases WSN lifetime, while inter-cluster communication in LEACH significantly reduces node energy consumption and increases WSN lifetime compared with both LEACH and genetic LEACH, as MATLAB simulation results show [8].

Miao et al. use a genetic algorithm (GA) to improve the LEACH protocol, yielding LEACH-H. One of the optimisation variables is a weighted average of three influencing factors: the current residual energy of neighbour nodes, the number of neighbours, and the distance between neighbours and the base station. The goal is to maximise the time until the first node dies and until half of the nodes die. The protocol is applied, before and after the improvement, to a prefabricated substation, where sensor nodes monitor equipment performance and environmental conditions; LEACH-H and LEACH are used separately for data transmission. According to the simulations, LEACH-H achieves a longer lifetime for the first node and for half of the nodes, and better network performance, than the original LEACH protocol [9].

Hampiholi et al. note that recent developments in the Internet of Things (IoT) have made wireless sensor network and wireless mesh network research easier than ever. If data routing is treated as an optimisation problem with many constraints, such as path, node energy, link quality, and traffic, a significant amount of energy can be saved. Such problems can be solved with a genetic algorithm (GA) that applies heuristic techniques over the network population. Premature convergence, however, degrades the algorithm's performance and prevents it from exploring the search space to find multiple energy-saving solutions. An improved GA with a local search technique can be adapted to address these drawbacks.
In that paper, local search and sleep–wake-up mechanisms are used to create a modified GA called the maximum enhanced genetic algorithm (MEGA). By taking into account the communication constraints and the energy consumed by sensors while operating and communicating, it dynamically optimises the WSN. To evaluate the efficiency of the MEGA routing protocol, the authors compared it with a number of existing protocols; WSN protocols are developed and tested using software-based simulation tools, and their performance is evaluated under a variety of network scenarios and conditions [10, 11].

Mehetre and Wagh argue that designing an efficient topology is critical to extending the lifetime of WSNs, since a carefully designed topology is expected to consume energy both efficiently and in a balanced way. That study proposes GA-based disjoint path routing for WSNs, which outperforms the existing approach in both energy consumption and network longevity. The technique can be used wherever disjoint paths are required for fault tolerance [12, 13].

Garg and Saxena note that sensor nodes in WSNs are heterogeneous and that each node has limited power and bandwidth. Many hierarchical routing protocols exist for directing traffic from one point in the network to another; such protocols break the network into small clusters and build a hierarchy of nodes. WSNs can use type-2 fuzzy logic with three parameters for cluster head selection: remaining energy, distance, and concentration. To increase network lifetime, an algorithm based on the same parameters is proposed, and type-2 fuzzy logic and the genetic algorithm are compared. Performance metrics such as the number of alive nodes, the number of dead nodes, and the residual energy are evaluated using these parameters. The genetic method performs better than type-2 fuzzy-based selection, as demonstrated on two network topologies, one random and one predetermined [14].

In paper [15], an edge-weight fuzzy logic system is introduced that uses the received signal strength indicator (RSSI) and the link quality indicator to decide the location of sensor nodes. In paper [16], the authors propose optimising clustering with a genetic algorithm to minimise the communication distance.
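The weighted combination of factors used by Miao et al. [9] to rate cluster-head candidates can be sketched as a simple weighted-sum score. The exact weights and normalisation are not given in the text, so this sketch assumes equal weights, max-normalisation, and the candidate's own values (rather than its neighbours'), with distance entering inversely so that nodes nearer the base station score higher. The `Candidate` class, `ch_scores` function, and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    node_id: int
    residual_energy: float  # J
    neighbours: int         # nodes within radio range
    dist_to_bs: float       # m


def ch_scores(cands, w_energy=1/3, w_degree=1/3, w_dist=1/3):
    """Weighted sum of max-normalised factors; a higher score marks a better CH candidate."""
    max_e = max(c.residual_energy for c in cands) or 1.0
    max_n = max(c.neighbours for c in cands) or 1
    max_d = max(c.dist_to_bs for c in cands) or 1.0
    return {
        c.node_id: w_energy * c.residual_energy / max_e
        + w_degree * c.neighbours / max_n
        + w_dist * (1 - c.dist_to_bs / max_d)  # nearer the BS scores higher
        for c in cands
    }


candidates = [
    Candidate(1, 0.45, 6, 40.0),
    Candidate(2, 0.30, 9, 25.0),
    Candidate(3, 0.50, 3, 80.0),
]
scores = ch_scores(candidates)
print(max(scores, key=scores.get))  # node 2 under these hypothetical values
```

Node 3 has the most energy but is far from the base station and poorly connected, so under equal weights node 2 wins; changing the weights shifts this balance.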

3 Proposed Method
3.1 Genetic Algorithm Concept
In a WSN, sensor node lifetime is under significant strain because nodes are likely to die early, before they reach a critical energy-consumption threshold. To resolve this problem, the virtual grid concept is combined with a genetic algorithm (GA). The GA takes as input the population size and the maximum number of iterations. The fitness function is evaluated iteratively to discover the optimum path. When a path is formed during population generation, nodes from each cluster are picked and placed on it. In each fitness evaluation, the total energy and total distance of each node are calculated, together with the path from origin to destination that has the highest fitness. If the final output exceeds the fitness of the population, a new population is investigated; if the improvement is the greatest seen since the beginning, the solution is saved. Otherwise, crossover is applied, with parents chosen by roulette-wheel selection, and the crossover output (Cout) is evaluated in the same manner. Otherwise, if Cout's initial fitness is greater than that of any other candidate, mutation is applied, and its output is maximised by measuring the mutation fitness (Mff) and the mutation standard deviation (Mtd), i.e. the mutation distance. If the population has reached its maximum size, the data is saved; otherwise, the data is deleted and the process restarts. The procedure takes into account the number of iterations, energy, geographic regions, and standard energy. Before the optimal site for the transmitter and receiver is selected, their compatibility must be verified. The optimum location should deliver the highest fitness for the most efficient route and data gathering.
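The GA loop described above (fitness evaluation, roulette-wheel selection, crossover, mutation, and keeping the best solution found) can be sketched as a simple GA that chooses cluster heads. This is not the authors' implementation: each chromosome is assumed to be a bit string marking which nodes are CHs, and the fitness is assumed to be the inverse of the total distance from each member node to its nearest CH plus each CH's distance to the base station. The paper's fitness also considers energy, which is left out here for brevity. Node positions, population size, and rates are hypothetical.

```python
import math
import random

# Hypothetical scenario: 20 nodes in a 100 m x 100 m field, BS outside the field.
random.seed(1)
BS = (50.0, 150.0)
NODES = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(20)]


def fitness(chrom):
    """Inverse of total distance (members to nearest CH, CHs to BS); 0.0 if no CH."""
    heads = [NODES[i] for i, bit in enumerate(chrom) if bit]
    if not heads:
        return 0.0
    total = sum(math.dist(h, BS) for h in heads)
    for node, bit in zip(NODES, chrom):
        if not bit:
            total += min(math.dist(node, h) for h in heads)
    return 1.0 / total


def roulette(pop, fits):
    """Roulette-wheel selection: pick a chromosome with probability proportional to fitness."""
    if sum(fits) == 0:
        return random.choice(pop)
    return random.choices(pop, weights=fits, k=1)[0]


def one_point_crossover(a, b):
    """Swap the tails of two parents after a random cut point."""
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(chrom, rate=0.05):
    """Bit-flip mutation: each gene flips with probability `rate`."""
    return [1 - bit if random.random() < rate else bit for bit in chrom]


def run_ga(pop_size=30, generations=50, p_ch=0.1):
    """Evolve CH assignments and return the best chromosome seen."""
    pop = [[1 if random.random() < p_ch else 0 for _ in NODES] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        fits = [fitness(c) for c in pop]
        children = []
        while len(children) < pop_size:
            c1, c2 = one_point_crossover(roulette(pop, fits), roulette(pop, fits))
            children += [mutate(c1), mutate(c2)]
        pop = children[:pop_size]
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best  # keep the best solution found so far
    return best


best = run_ga()
print("cluster heads:", [i for i, bit in enumerate(best) if bit])
print("fitness:", fitness(best))
```

Keeping the best chromosome outside the population means a good solution is never lost to an unlucky crossover or mutation, at the cost of one extra fitness evaluation per generation.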

3.2 Modified LEACH Protocol
Methodology of the Proposed Technique
• Initialisation of network
• Applying genetic algorithm for choosing cluster head
• Applying modified LEACH protocol for fast data collection
• Using modified LEACH protocol or proposed solution for energy-efficient WSN network

Most nodes transmit to their cluster heads, which aggregate and compress the data before forwarding it to the base station (sink). In standard LEACH, a randomised algorithm chooses which nodes become cluster heads in each round. In modified LEACH, the cluster head is selected using a genetic method, and an intermediate cluster head is selected to transfer data at a higher rate than in the original protocol. Standard LEACH-based protocols do not take node energy into account, which leads to rapid depletion of node energy. The proposed modified LEACH protocol therefore incorporates four energy factors when computing the threshold: the node's initial energy, its residual energy, the network's total energy, and the average energy of all nodes. This helps ensure that energy consumption is balanced across all nodes, improving energy efficiency. In addition, to save energy, a node that is closer to the BS than to its CH does not take part in forming the cluster. As a result, both the robustness and the lifetime of the network are improved.
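For reference, the random cluster-head election in standard LEACH uses the threshold T(n) = p / (1 − p·(r mod 1/p)) for nodes that have not served as CH in the last 1/p rounds (and 0 otherwise), and a node becomes CH when a uniform random draw falls below T(n). The exact four-factor threshold of the modified protocol is not spelled out in the text, so the energy-scaled variant below, which multiplies T(n) by the ratio of residual to initial energy, is only an illustrative assumption and not the authors' formula. The node dictionaries are hypothetical.

```python
import random


def leach_threshold(p, r, eligible):
    """Standard LEACH threshold T(n) for round r with desired CH fraction p.

    `eligible` is False if the node served as CH in the last 1/p rounds.
    """
    if not eligible:
        return 0.0
    period = round(1 / p)
    return p / (1 - p * (r % period))


def energy_scaled_threshold(p, r, eligible, e_residual, e_initial):
    """Hypothetical energy-aware variant (not the paper's formula):
    scale T(n) by the fraction of initial energy the node still has."""
    return leach_threshold(p, r, eligible) * (e_residual / e_initial)


def elect_cluster_heads(nodes, p, r, rng):
    """Each node draws a uniform number and becomes CH if the draw is below its threshold."""
    heads = []
    for node in nodes:
        t = energy_scaled_threshold(p, r, node["eligible"], node["energy"], node["initial"])
        if rng.random() < t:
            heads.append(node["id"])
    return heads


rng = random.Random(7)
nodes = [{"id": i, "eligible": True, "energy": 0.5 - 0.04 * i, "initial": 0.5} for i in range(10)]
print(elect_cluster_heads(nodes, p=0.1, r=3, rng=rng))
```

In the last round of each 1/p-round cycle the standard threshold reaches 1, so every eligible node with full energy is elected; scaling by residual energy makes depleted nodes less likely to take on the CH role.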

4 Result and Simulation
The proposed technique was implemented in the NS2 simulator on Ubuntu; the computer running the simulation has 8 GB of RAM. Clusters are formed for wireless sensor networks with 20, 40, 60, 80, and 100 nodes. The simulation results of our method compared with earlier clustering algorithms, which are promising, are depicted in Figs. 3 and 4. Selection is performed by spinning the roulette wheel, and one-point crossover, the most common type of crossover, is used. The packet delivery ratio is the ratio between the number of packets successfully delivered to the sinks in a network and the total number of packets sent over the network. Data packets [17, 18] take an average of one millisecond to travel through the sinks on the water's surface, and the energy consumed by each data packet to reach its destination on the water's surface must also be measured (Table 1).

[Flow diagram: Input → declare the network's area, nodes, sink location, and source location → set GA initial parameters (iterations, population, objective function) → use GA to select CHs, choosing the node with the maximum optimal solution as CH → begin the communication process according to the CH decisions → apply the modified LEACH protocol to make the WSN energy-efficient → balance the load → calculate variables → End]
Fig. 3 Flow diagram of proposed work

A WSN consists of a spatially distributed network of independently operating sensors; Fig. 4 compares the network lifetime achieved. Because energy cost is a significant limiting factor in WSNs, networking and processing must be energy-efficient. When a sensor event of interest occurs, it is sometimes desirable to send data only to a gateway node, which is one of the most energy-intensive and expensive nodes in the WSN. Sensors restrict communication to times when an event is likely to occur, resulting in significant savings in communication cost. WSN-enabled services include, for example, security services, home automation, disaster support, traffic control, and health care.

Fig. 4 Comparison of network lifetime

Table 1 Network properties
Properties                           Values
Node count                           100
Initial energy                       0.5 J
Idle-state energy                    50 nJ
Data aggregation energy              10 pJ/bit/m²
Amplification energy (CH to BS)      10 pJ/bit/m²
Amplification energy (Node to BS)    0.0003 pJ/bit/m²
Packet size                          400

Wireless sensor networks of 50, 100, 150, 200, and 250 nodes are likewise divided into clusters, which are then subdivided into smaller groups. Nodes report to the gateway only when an event occurs, which keeps their energy consumption low. This is particularly important in dynamic settings, where the vast majority of the collected data is of little or no significance; the problem can be addressed by relaxing the constraints on what defines an event, threshold, or probability.

Fig. 5 Comparison of throughput

In Fig. 5, end-to-end delay, or one-way delay (OWD), refers to the time required to transport a packet from source to destination over a network.
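The evaluation metrics mentioned in this section (packet delivery ratio, end-to-end delay, and energy consumed per delivered packet) can be computed from per-packet records. The sketch below assumes a hypothetical record format (send time, receive time or `None` if the packet was lost, and energy spent); it is not the NS2 trace format, which would first need to be parsed into such records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PacketRecord:
    sent_at: float                # s
    received_at: Optional[float]  # s, None if the packet was dropped
    energy_j: float               # energy spent on this packet across all hops


def summarise(records):
    """Packet delivery ratio, mean end-to-end delay, and energy per delivered packet."""
    delivered = [r for r in records if r.received_at is not None]
    pdr = len(delivered) / len(records) if records else 0.0
    delays = [r.received_at - r.sent_at for r in delivered]
    avg_delay = sum(delays) / len(delays) if delays else float("nan")
    total_energy = sum(r.energy_j for r in records)  # dropped packets still cost energy
    per_delivered = total_energy / len(delivered) if delivered else float("inf")
    return {"pdr": pdr, "avg_delay_s": avg_delay, "energy_per_delivered_j": per_delivered}


# Hypothetical records: one of four packets is lost.
records = [
    PacketRecord(0.00, 0.012, 0.002),
    PacketRecord(0.10, None, 0.001),
    PacketRecord(0.20, 0.208, 0.002),
    PacketRecord(0.30, 0.310, 0.003),
]
print(summarise(records))
```

Charging the energy of dropped packets to the delivered ones makes lossy routes look more expensive, which is usually the intent when comparing protocols.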

5 Conclusion The increase in lifetime of WSN network is achieved in the proposed work using the genetic and modified LEACH protocols. The modified LEACH protocol selects the cluster head using the genetic algorithm; thereafter, the selection of CH is based on four different types of energy which give the better result than existing. Data can be sent to the nodes with less energy consumption and improved security when using this method. A WSN’s energy consumption is crucial, and the method presented here allows packet loads to be dispersed based on node energy consumption, so preventing the network from failing. Based on the simulation results, the proposed approach has

Energy-Efficient and Fast Data Collection in WSN Using Genetic …


the potential to reduce packet loss and delay while also extending the network’s lifespan, all of which are crucial for WSN communication. It is possible to lower the energy consumption of cluster head (CH) nodes by utilising a more efficient genetic algorithm (GA). The K-means algorithm is used to generate dynamically clustered networks. The suggested technique has a longer network lifetime than well-known methods, such as the LEACH, GAEEP, and GABEEC protocols, according to the results of NS2 simulations.
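The roulette-wheel selection and one-point crossover mentioned in the results can be illustrated with a small genetic algorithm for cluster-head selection. This is a sketch under stated assumptions, not the authors' implementation:

- Each chromosome is a hypothetical bit vector in which 1 marks a node as a cluster head.
- The fitness function is an illustrative stand-in for the paper's four energy types: the mean residual energy of the chosen heads, penalised when the head count departs from a target.
- The residual energies, population size, generation count and mutation rate are all made-up parameters.

```python
import random

def fitness(chromosome, energy, target_chs):
    # Hypothetical fitness: mean residual energy of the chosen cluster heads,
    # divided by a penalty that grows as the head count departs from target_chs.
    chosen = [e for gene, e in zip(chromosome, energy) if gene]
    if not chosen:
        return 1e-9  # tiny positive value so roulette selection still works
    return (sum(chosen) / len(chosen)) / (1 + abs(len(chosen) - target_chs))

def roulette_select(population, fitnesses, rng):
    # Spin the wheel: each individual owns a slice proportional to its fitness.
    pick = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, f in zip(population, fitnesses):
        running += f
        if running >= pick:
            return individual
    return population[-1]  # guard against floating-point round-off

def one_point_crossover(parent_a, parent_b, rng):
    # Swap the tails of the two parents after a random cut point.
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]

def mutate(chromosome, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in chromosome]

def evolve_cluster_heads(energy, target_chs, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    n = len(energy)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(c, energy, target_chs) for c in population]
        next_gen = []
        while len(next_gen) < pop_size:
            a = roulette_select(population, fits, rng)
            b = roulette_select(population, fits, rng)
            for child in one_point_crossover(a, b, rng):
                next_gen.append(mutate(child, 0.02, rng))
        population = next_gen[:pop_size]
    return max(population, key=lambda c: fitness(c, energy, target_chs))

# Made-up residual energies (J) for eight sensor nodes
energy = [0.9, 0.2, 0.8, 0.3, 0.95, 0.1, 0.7, 0.4]
best = evolve_cluster_heads(energy, target_chs=2)
print("cluster-head node indices:", [i for i, g in enumerate(best) if g])
```

The sketch uses no elitism, so the final population is not guaranteed to contain the best chromosome seen during the run. Keeping the fittest individuals between generations is a common refinement.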

References
1. Li J, Luo Z, Xiao J (2020) A hybrid genetic algorithm with bidirectional mutation for maximizing lifetime of heterogeneous wireless sensor networks. IEEE Access 8:72261–72274. https://doi.org/10.1109/ACCESS.2020.2988368
2. Bouzid SE, Seresstou Y, Raoof K, Omri MN, Mbarki M, Dridi C (2020) MOONGA: multiobjective optimization of wireless network approach based on genetic algorithm. IEEE Access 8:105793–105814. https://doi.org/10.1109/ACCESS.2020.2999157
3. Yetgin H, Cheung KTK, El-Hajjar M, Hanzo L (2015) Network-lifetime maximization of wireless sensor networks. IEEE Access 3:2191–2226. https://doi.org/10.1109/ACCESS.2015.2493779
4. Nazib RA, Moh S (2021) Energy-efficient and fast data collection in UAV-aided wireless sensor networks for hilly terrains. IEEE Access 9:23168–23190. https://doi.org/10.1109/ACCESS.2021.3056701
5. Praveena KS, Bhargavi K, Yogeshwari KR (2017) Comparision of PSO algorithm and genetic algorithm in WSN using NS-2. In: 2017 international conference on current trends in computer, electrical, electronics and communication (CTCEEC), Mysore, India, pp 513–516
6. Dhami M, Garg V, Randhawa NS (2018) Enhanced lifetime with less energy consumption in WSN using genetic algorithm based approach. In: 2018 IEEE 9th annual information technology, electronics and mobile communication conference (IEMCON), Vancouver, BC, Canada, pp 865–870
7. Zhou R, Chen M, Feng G, Liu H, He S (2016) Genetic clustering route algorithm in WSN. In: 2010 sixth international conference on natural computation, Yantai, China, pp 4023–4026
8. Sujee R, Kannammal KE (2017) Energy efficient adaptive clustering protocol based on genetic algorithm and genetic algorithm inter cluster communication for wireless sensor networks. In: 2017 international conference on computer communication and informatics (ICCCI), Coimbatore, India, pp 1–6
9. Miao H, Xiao X, Qi B, Wang K (2015) Improvement and application of LEACH protocol based on genetic algorithm for WSN. In: 2015 IEEE 20th international workshop on computer aided modelling and design of communication links and networks (CAMAD), Guildford, pp 242–245
10. Hampiholi AS, Vijaya Kumar BP (2018) Efficient routing protocol in IoT using modified genetic algorithm and its comparison with existing protocols. In: 2018 3rd international conference on circuits, control, communication and computing (I4C), Bangalore, India, pp 1–5
11. Fattoum M, Jellali Z, Atallah LN (2020) A joint clustering and routing algorithm based on GA for multi objective optimization in WSN. In: 2020 IEEE eighth international conference on communications and networking (ComNet), Hammamet, Tunisia, pp 1–5
12. Gupta SR, Bawane NG, Akojwar S (2013) A clustering solution for wireless sensor networks based on energy distribution & genetic algorithm. In: 2013 6th international conference on emerging trends in engineering and technology, Nagpur, India, pp 94–95


13. Mehetre D, Wagh S (2015) Energy efficient disjoint path routing using genetic algorithm for wireless sensor network. In: 2015 international conference on computing communication control and automation, Pune, India, pp 182–185
14. Garg N, Saxena S (2018) Cluster head selection using genetic algorithm in hierarchical clustered sensor network. In: 2018 second international conference on intelligent computing and control systems (ICICCS), Madurai, India, pp 511–515
15. Parwekar P, Reddy R (2013) An efficient fuzzy localization approach in wireless sensor networks. In: 2013 IEEE international conference on fuzzy systems (FUZZ-IEEE), pp 1–6. https://doi.org/10.1109/FUZZ-IEEE.2013.6622548
16. Parwekar P, Rodda S (2017) Optimization of clustering in wireless sensor networks using genetic algorithm. Int J Appl Metaheuristic Comput (IJAMC) 8(4):84–98
17. Vehicular ad-hoc network security and data transmission: survey and discussions. International Journal of Emerging Technologies and Innovative Research (www.jetir.org, UGC approved), ISSN 2349-5162, 6(6)
18. Temurnikar A, Verma P, Choudhary J (2021) Design and simulation: a multi-hop clustering approach of VANET using SUMO and NS2. In: Tiwari A, Ahuja K, Yadav A, Bansal JC, Deep K, Nagar AK (eds) Soft computing for problem solving. Advances in Intelligent Systems and Computing, vol 1392. Springer, Singapore. https://doi.org/10.1007/978-981-16-2709-5_42
19. Mazaideh MA, Levendovszky J (2021) A multi-hop routing algorithm for WSNs based on compressive sensing and multiple objective genetic algorithm. J Commun Netw 99:1–10. https://doi.org/10.23919/JCN.2021.000003
20. Kim T-H, Madhavi S (2020) Quantum data aggregation using secret sharing and genetic algorithm. IEEE Access 8:175765–175775. https://doi.org/10.1109/ACCESS.2020.3026238
21. Balas FA, Almomani O, Jazoh RMA, Khamayseh YM, Saaidah A (2019) An enhanced end to end route discovery in AODV using multi-objectives genetic algorithm. IEEE Jordan Int Joint Conf Electr Eng Inf Technol (JEEIT) 2019:209–214. https://doi.org/10.1109/JEEIT.2019.8717489
22. Temurnikar A, Verma P, Choudhary J (2020) Securing vehicular adhoc network against malicious vehicles using advanced clustering technique. In: 2nd international conference on data, engineering and applications (IDEA), Bhopal, India, pp 1–9. https://doi.org/10.1109/IDEA49133.2020.9170696
23. Temurnikar A, Verma P, Choudhary J (2020) Development of multi-hop clustering approach for vehicular ad-hoc network. Int J Emerg Technol 11(4):173–177
24. Temurnikar A, Sharma S (2013) Secure and stable VANET architecture model. Int J Comput Sci Netw 2(1):37–43
25. Temurnikar A, Verma P, Dhiman G (2022) A PSO enable multi-hop clustering algorithm for VANET. Int J Swarm Intell Res (IJSIR) 13(2):1–14. https://doi.org/10.4018/IJSIR.20220401.oa7
26. Liu Y, Wu Q, Zhao T, Tie Y, Bai F, Jin M (2019) An improved energy-efficient routing protocol for wireless sensor networks. Sensors 19(20):4579. https://doi.org/10.3390/s19204579

Feature Reduced Anova Element Oversampling Elucidation Based Categorisation for Hepatitis C Virus Prognostication
M. Shyamala Devi, S. Vinoth Kumar, P. S. Ramesh, Ankam Kavitha, Konkala Jayasree, and Venna Sri Sai Rajesh
Abstract Hepatitis C virus (HCV) infection attacks the liver and causes inflammation. Infection results from exposure to infectious blood, for example through shared needles or unsterile tattoo equipment. The disease is difficult to identify because it has no characteristic symptoms. Machine learning can help analyse clinical parameters to classify Hepatitis C virus infection. This project therefore aims to predict the presence of Hepatitis C virus using the Hepatitis C virus dataset retrieved from the KAGGLE machine learning repository. The dataset, which has 12 attributes and 615 individual records, is preprocessed by encoding categorical data and handling incomplete data. To examine the outcome measures, the original dataset is fed to all classifier models. Exploratory data analysis is then performed to examine the distribution of the target variable. The target distribution is found to be 86.7% blood donors, 4% suspect blood donors, 3.9% hepatitis, 3.4% fibrosis and 1.1% cirrhosis patients, which clearly indicates an imbalanced target distribution. The Anova test is applied to the dataset to identify the features that are essential for classifying the target. The features sex, alkaline phosphatase (ALP) and protein have PR(>F) values greater than 0.05, so they are removed from the dataset to form the Anova-reduced dataset. Oversampling techniques, namely Borderline, SMOTE, SVM SMOTE and ADASYN, are then applied to the Anova-reduced dataset to balance the Hepatitis C virus target distribution. The resulting Anova-oversampled datasets are fitted to various classifier models to predict Hepatitis C virus and examine the performance metrics. Implementation results show that the Decision Tree classifier achieves an accuracy of 84% before the Anova test and oversampling of the target Hepatitis C virus feature are applied. The same Decision Tree classifier achieves 99% accuracy when the Anova-reduced dataset is oversampled with Borderline1, Borderline2, SVM SMOTE and ADASYN, and 100% accuracy with SMOTE oversampling.
M. S. Devi (B) · S. V. Kumar · P. S. Ramesh · A. Kavitha · K. Jayasree · V. S. S. Rajesh
Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamilnadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_32


M. S. Devi et al.

Keywords Machine learning · Oversampling · Precision · Recall · Accuracy

1 Introduction
The RNA genome of the Hepatitis C virus (HCV) can now be readily sequenced and analysed. The development of direct-acting antiviral (DAA) medicines has significantly improved HCV treatment outcomes, and more patients now achieve a sustained virological response. Machine learning can use outputs such as treatment outcomes to select the optimal set of explanatory variables for a mathematical model. Because machine learning generates the best explanatory variables automatically, its models are harder to interpret than traditional statistical methods. It is nonetheless extremely helpful for evaluating massive volumes of data. Recent developments in machine learning, particularly deep learning, have made it possible to identify, measure and characterise patterns in medical images. These developments stem from deep learning's capacity to learn features directly from data rather than relying on features designed manually from domain-specific expertise. Deep learning is rapidly becoming the standard approach and has improved performance in a variety of medical applications, making it easier for professionals to identify and examine specific healthcare conditions. The rest of the paper is organised as follows: Sect. 2 reviews the literature, Sect. 3 presents the paper's contributions, Sect. 4 discusses the implementation setup and results, and Sect. 5 concludes the paper.

2 Literature Review
According to [1], the cost of gene sequencing has fallen with the development and use of next-generation sequencing technologies. For instance, complete Hepatitis C virus genomes can now be easily extracted and studied. Many individuals now achieve a sustained virological response (SVR) because of the advent of direct-acting antiviral (DAA) medicines, which have drastically improved HCV treatment results. Study [2] notes that, because treatment-resistant variants evolve, there are still cases in which the virus is not eradicated. Although DAAs have a strong curative effect owing to their direct action on HCV, the likelihood of achieving SVR depends on the location of the HCV genomic variant. Paper [3] builds on the current medical understanding that elevated lactate dehydrogenase (LDH) levels are linked to tissue breakdown in a variety of illnesses, including pneumonia. Paper [4] proposes the recursive feature elimination (RFE) algorithm, a machine learning approach that uses a backward selection procedure to assess whether predictors are useful and then selects the best predictors to build the model. Study [5] analyses the history of reinforcement learning,

Feature Reduced Anova Element Oversampling Elucidation Based …


and cardiac radiology applications are discussed in that study. Deep learning techniques are applied to numerous lung ailments, including pneumonia, chronic obstructive lung disease, pulmonary embolism and nodular diseases. Another study aims to detect thyroid disease using a new hybrid machine learning method that incorporates a classification system. It uses cross-validation analysis to determine the technique's robustness to sampling variability [6]. The expansion of scientific knowledge and the massive production of data have led to exponential growth of databases and repositories. Biomedical data is now abundant, ranging from clinical symptom information to various types of biochemical data and imaging device outputs [7]. A method for detecting thyroid disease earlier uses a back propagation algorithm: the ANN is trained on empirical values and tested on data that was not used during training [8]. Because efficient techniques for analysing and identifying disorders are required, data collection is an important methodological approach in the medical disciplines [9]. Another study describes a novel method for detecting three types of abnormal red blood cells, known as poikilocytes, in blood smears from patients with iron deficiency anaemia (IDA). Classifying and counting poikilocytes is regarded as a critical step in the early detection of IDA. Dacrocytes, elliptocytes and schistocytes are the three basic poikilocyte types in IDA. In [6], a method based on two architectures is proposed, one for nodule segmentation and the other for grading malignancy. Max pooling is used for subsampling, ReLU as the activation function and Softmax as the classifier that assigns the malignancy level. A CNN is used both for classification and for feature extraction to establish the malignancy threshold. The Adam optimiser is used to optimise the weights of the convolutional layers. In [7], a deep learning model is adopted to identify and categorise malignant tissues. CT scans from LIDC and private datasets are used as input, and their intensity levels are enhanced by normalisation. The multiscene deep learning framework described in [8] comprises a number of components. In it, probability density functions of the greyscale images are produced from the raw CT images using entropy-based threshold segmentation. Study [9] examines vessel filters, which are used to remove vessels and lower the incidence of false positives. Its CNN design contains a convolutional layer, a pooling layer and a fully connected layer. In [10], different methods for discriminating nodules from non-nodules are compared, and a 3D convolutional neural network is developed to reduce or eliminate false-positive predictions. Study [11] presents an IoT-based system made up of wearable smart devices and symptom maps. The system identifies significant signs the individual may be experiencing and notifies the clinician.


3 Our Contributions
Fig. 1 depicts the overall structure of the project. The contributions of this paper are listed below.
• First, the Hepatitis C virus dataset, which has 12 attributes and 615 patient records, is preprocessed by encoding categorical data and handling missing values.
• Second, to assess the quality metrics, the original dataset is fitted to all classifiers, both with and without feature scaling, and exploratory data analysis is performed to examine the distribution of the target feature.

Fig. 1 Architecture system workflow (diagram stages: Hepatitis C virus dataset; partition of dependent and independent attributes; data exploratory analysis; raw dataset; feature scaling; Anova test analysis; oversampling into SMOTE, SVM SMOTE, ADASYN and Borderline oversampled datasets; fitting to all the classifiers; analysis of precision, recall, F-score, accuracy and run time; prediction of Hepatitis C virus)


• Third, the target data distribution is found to be 86.7% blood donors, 4% suspect blood donors, 3.9% hepatitis, 3.4% fibrosis and 1.1% cirrhosis patients, which clearly indicates an imbalanced target distribution.
• Fourth, the Anova test is applied to the dataset to identify the features that are essential for classifying the target.
• Fifth, to balance the Hepatitis C virus target distribution, oversampling techniques such as Borderline, SMOTE, SVM SMOTE and ADASYN are applied to the Anova-reduced dataset.
• Sixth, the resulting Anova-oversampled datasets are fitted to a variety of classifiers to predict Hepatitis C virus before and after feature scaling.
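The Borderline variants listed above differ from plain SMOTE in which minority samples they oversample: only those near the class boundary ("in danger") are used as interpolation seeds. The following pure-Python sketch shows the usual rule for finding that danger set:

- A minority sample whose m nearest neighbours all belong to other classes is treated as noise.
- One with fewer than half of its neighbours from other classes is treated as safe.
- Anything in between is in danger.

The data (one feature, majority label "D", minority label "C") and the neighbourhood size m are made up for illustration. This is not the code used in the paper.

```python
import math

def borderline_danger_set(X, y, minority_label, m=5):
    """Return indices of minority samples that Borderline-SMOTE would oversample."""
    danger = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # m nearest neighbours of sample i, searched over the whole training set
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:m]
        k = len(neighbours)
        n_other = sum(1 for j in neighbours if y[j] != minority_label)
        # noise: n_other == k; safe: n_other < k / 2; danger: in between
        if k / 2 <= n_other < k:
            danger.append(i)
    return danger

# Made-up 1-D data: "D" = majority (blood donor), "C" = minority (cirrhosis)
X = [[0.0], [0.1], [0.2], [0.3], [10.0], [10.1],   # majority samples
     [0.25], [5.0], [5.1], [5.2], [9.9]]            # minority samples
y = ["D"] * 6 + ["C"] * 5
print(borderline_danger_set(X, y, "C", m=3))
```

With m = 3, only the minority sample at 9.9 is flagged as in danger, because it sits next to two majority samples. The sample at 0.25 is surrounded entirely by majority samples and is treated as noise.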

4 Implementation Setup
The Hepatitis C virus dataset from the KAGGLE repository, which consists of 615 rows and 12 feature attributes, is preprocessed by filling in incomplete data and encoding categorical data. The target distribution of the dataset is shown in Fig. 2. Exploratory data analysis is performed to check for missing values in the dataset, and Fig. 1 shows the features containing incomplete data. To examine the performance metrics, the raw dataset is fitted to all classifiers with and without feature scaling, and the results are displayed in Table 1. The Anova test is used to identify the dataset features with PR(>F) < 0.05, which have a significant impact on the target. As shown in Table 2, the features “Sex”, “ALP” and “PROT” have PR(>F) > 0.05 and therefore do not significantly influence the target. Fig. 3 shows the distribution of the target Hepatitis C virus feature after the Anova-reduced dataset is oversampled. Table 3 reports the performance of all classifiers, with and without feature scaling, in predicting the Hepatitis C virus after Borderline1 oversampling of the Anova-reduced dataset.
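The performance tables report precision, recall, F-score and accuracy for a five-class target, but the paper does not state how the per-class scores are averaged. In almost every row, the recall and accuracy columns coincide. This is exactly what support-weighted averaging produces: weighting each class's recall by its share of true samples gives the total number of correct predictions divided by the number of samples. The pure-Python sketch below assumes weighted averaging (an assumption, in the spirit of scikit-learn's `average="weighted"`) and uses made-up labels.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Return support-weighted precision, recall, F-score and plain accuracy."""
    n = len(y_true)
    support = Counter(y_true)      # number of true samples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP per class
    precision = recall = fscore = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = correct[label]
        p_c = tp / predicted[label] if predicted[label] else 0.0
        r_c = tp / support[label] if support[label] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[label] / n  # classes weighted by their true frequency
        precision += weight * p_c
        recall += weight * r_c
        fscore += weight * f_c
    accuracy = sum(correct.values()) / n
    return precision, recall, fscore, accuracy

# Made-up labels for an imbalanced three-class problem
y_true = ["donor"] * 6 + ["hepatitis"] * 2 + ["cirrhosis"] * 2
y_pred = ["donor"] * 6 + ["donor", "hepatitis"] + ["cirrhosis", "hepatitis"]
print([round(v, 2) for v in weighted_scores(y_true, y_pred)])
```

With these toy labels, precision is about 0.81, recall 0.80, F-score about 0.79 and accuracy 0.80. Recall equals accuracy, while precision and F-score do not.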

Fig. 2 Density map and target distribution of the Hepatitis C virus dataset


Table 1 Performance analysis of original Hepatitis C virus dataset
Classifier | Presence of scaling (Prec / Recall / FScore / Accu / Run time) | Absence of scaling (Prec / Recall / FScore / Accu / Run time)
LReg | 0.85 / 0.87 / 0.85 / 0.87 / 0.09 | 0.86 / 0.88 / 0.87 / 0.88 / 0.11
KNN | 0.86 / 0.88 / 0.84 / 0.88 / 0.77 | 0.87 / 0.88 / 0.83 / 0.88 / 0.78
KSVM | 0.76 / 0.87 / 0.81 / 0.87 / 1.02 | 0.85 / 0.87 / 0.82 / 0.87 / 0.88
GNB | 0.81 / 0.36 / 0.42 / 0.36 / 0.02 | 0.81 / 0.36 / 0.42 / 0.36 / 0.02
Dtree | 0.83 / 0.84 / 0.83 / 0.84 / 0.14 | 0.83 / 0.84 / 0.83 / 0.84 / 0.11
Etree | 0.82 / 0.83 / 0.82 / 0.83 / 0.02 | 0.82 / 0.83 / 0.82 / 0.83 / 0.02
RFor | 0.88 / 0.88 / 0.83 / 0.89 / 0.17 | 0.86 / 0.88 / 0.83 / 0.88 / 0.16
AdaB | 0.88 / 0.89 / 0.88 / 0.88 / 0.82 | 0.88 / 0.89 / 0.88 / 0.89 / 0.82
Ridge | 0.85 / 0.88 / 0.83 / 0.88 / 0.08 | 0.85 / 0.88 / 0.83 / 0.88 / 0.02
RCV | 0.84 / 0.87 / 0.83 / 0.87 / 0.08 | 0.84 / 0.87 / 0.83 / 0.87 / 0.05
SGD | 0.86 / 0.88 / 0.87 / 0.88 / 0.09 | 0.87 / 0.87 / 0.87 / 0.87 / 0.10
PAg | 0.85 / 0.87 / 0.86 / 0.87 / 0.06 | 0.86 / 0.84 / 0.85 / 0.84 / 0.02
Bagg | 0.85 / 0.88 / 0.86 / 0.88 / 1.04 | 0.86 / 0.88 / 0.85 / 0.88 / 1.05

Table 2 Anova method evaluation with the database components
Attribute | Sum_sq | df | F | PR(>F)
Age | 7.688 | 1 | 7.01134 | 0.008302
Sex | 2.501 | 1 | 2.26373 | 0.132945
ALB (Albumin blood test) | 5.405 | 1 | 5.43862 | 5.36E-13
ALP (Alkaline phosphatase) | 0.551 | 1 | 0.49789 | 0.480696
ALT (Alanine transaminase) | 7.692 | 1 | 7.01512 | 0.008291
AST (Aspartate transaminase) | 2.858 | 1 | 4.44528 | 1.277E-74
BIL (Bilirubin) | 1.523 | 1 | 1.77678 | 1.344E-35
CHE (Acetylcholinesterase) | 7.380 | 1 | 7.46456 | 4.887E-17
CHOL (Cholesterol) | 6.123 | 1 | 6.07347 | 2.802E-14
CREA (Creatinine) | 2.253 | 1 | 2.10102 | 0.000006
GGT (Gamma-glutamyl transferase) | 1.509 | 1 | 1.74913 | 2.623E-35
PROT (Protein) | 0.034 | 1 | 0.03142 | 0.8593541
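The feature-selection rule applied to Table 2 can be written in a few lines. The sketch copies the PR(>F) column of the table and drops every feature whose p-value exceeds 0.05. It reproduces only this thresholding step, not the test itself. The table's column names (Sum_sq, df, F, PR(>F)) closely match the output of statsmodels' `anova_lm`, which suggests, though the paper does not confirm it, that a linear-model ANOVA produced them.

```python
# PR(>F) values copied from Table 2
p_values = {
    "Age": 0.008302, "Sex": 0.132945, "ALB": 5.36e-13, "ALP": 0.480696,
    "ALT": 0.008291, "AST": 1.277e-74, "BIL": 1.344e-35, "CHE": 4.887e-17,
    "CHOL": 2.802e-14, "CREA": 0.000006, "GGT": 2.623e-35, "PROT": 0.8593541,
}

def anova_filter(p_values, alpha=0.05):
    """Keep features with PR(>F) <= alpha; drop the rest."""
    kept = [name for name, p in p_values.items() if p <= alpha]
    dropped = [name for name, p in p_values.items() if p > alpha]
    return kept, dropped

kept, dropped = anova_filter(p_values)
print("kept:", kept)
print("dropped:", dropped)
```

With the Table 2 values, the dropped features are Sex, ALP and PROT, the three features the paper removes to form the Anova-reduced dataset.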

After the Anova-reduced dataset is oversampled with Borderline2, the performance of each classifier in predicting the Hepatitis C virus with and without feature scaling is examined and shown in Table 4. After SMOTE oversampling, the Anova-reduced dataset is fitted to all the classifiers to predict the Hepatitis C virus with and without feature scaling, and the results are shown in Table 5. The performance of each


Fig. 3 Target Hepatitis C virus dissemination with oversampling approaches

classifier is evaluated and presented in Table 6 after the Anova-reduced dataset has been oversampled with SVM SMOTE and fitted to all classifiers to predict the Hepatitis C virus with and without feature scaling. Likewise, Table 7 presents the performance of each classifier after the Anova-reduced dataset has been oversampled with ADASYN and fitted to all classifiers with and without feature scaling.
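SMOTE, which gives the perfect Decision Tree result in Table 5, creates synthetic minority samples by interpolating between a randomly chosen minority sample and one of its k nearest minority neighbours. The sketch below is an illustrative pure-Python version. It uses made-up two-feature minority points and hypothetical values of k and the random seed. The paper does not say which implementation it used. The imbalanced-learn library offers `SMOTE`, `BorderlineSMOTE`, `SVMSMOTE` and `ADASYN` classes in `imblearn.over_sampling` covering all the variants compared here.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen sample (Euclidean distance)
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(base, minority[j]))[:k]
        neighbour = minority[rng.choice(others)]
        gap = rng.random()  # random position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class points with two scaled features
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points), new_points[0])
```

Each synthetic point lies on a line segment between two real minority samples. In practice, synthetic samples are usually generated from the training split only, so that no information from the test split leaks into training.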

5 Conclusion
This work investigates the classification of Hepatitis C virus disease based on the distribution of the target Hepatitis C virus class. The Hepatitis C virus dataset is examined to explore the imbalanced target Hepatitis C virus class, and the target data


Table 3 Borderline1 oversampling efficiency analysis
Classifier | Presence of scaling (Precision / Recall / FScore / Accu / Run time) | Absence of scaling (Precision / Recall / FScore / Accu / Run time)
LReg | 0.90 / 0.89 / 0.90 / 0.89 / 7.34 | 0.89 / 0.89 / 0.89 / 0.89 / 0.03
KNN | 0.98 / 0.98 / 0.98 / 0.98 / 0.03 | 0.96 / 0.95 / 0.95 / 0.95 / 0.51
KSVM | 0.95 / 0.95 / 0.95 / 0.95 / 0.51 | 0.95 / 0.95 / 0.95 / 0.95 / 0.01
GNB | 0.85 / 0.85 / 0.85 / 0.85 / 0.02 | 0.83 / 0.83 / 0.82 / 0.83 / 0.78
Dtree | 0.99 / 0.99 / 0.99 / 0.99 / 0.79 | 0.99 / 0.99 / 0.99 / 0.99 / 3.34
Etree | 0.96 / 0.96 / 0.96 / 0.96 / 3.62 | 0.96 / 0.96 / 0.96 / 0.96 / 0.02
RFor | 1.00 / 1.00 / 1.00 / 1.00 / 0.02 | 0.98 / 0.98 / 0.98 / 0.98 / 0.11
AdaB | 0.41 / 0.34 / 0.30 / 0.34 / 0.10 | 0.53 / 0.43 / 0.37 / 0.43 / 0.19
Ridge | 0.87 / 0.86 / 0.86 / 0.86 / 0.02 | 0.83 / 0.82 / 0.82 / 0.82 / 0.02
RCV | 0.87 / 0.86 / 0.86 / 0.86 / 0.79 | 0.83 / 0.82 / 0.82 / 0.82 / 0.21
SGD | 0.87 / 0.86 / 0.85 / 0.86 / 3.62 | 0.88 / 0.87 / 0.87 / 0.87 / 0.29
PAg | 0.86 / 0.81 / 0.82 / 0.81 / 0.02 | 0.83 / 0.74 / 0.73 / 0.74 / 0.15
Bagg | 0.95 / 0.95 / 0.95 / 0.95 / 4.27 | 0.98 / 0.98 / 0.98 / 0.98 / 4.31

Table 4 Classifier performance of Borderline2 oversampling before and after feature scaling
Classifier | Presence of scaling (Precision / Recall / FScore / Accu / Run time) | Absence of scaling (Precision / Recall / FScore / Accu / Run time)
LReg | 0.88 / 0.88 / 0.88 / 0.88 / 0.51 | 0.87 / 0.87 / 0.87 / 0.87 / 0.51
KNN | 0.96 / 0.95 / 0.95 / 0.95 / 0.02 | 0.95 / 0.95 / 0.95 / 0.95 / 0.01
KSVM | 0.90 / 0.90 / 0.90 / 0.90 / 0.79 | 0.95 / 0.95 / 0.95 / 0.95 / 0.78
GNB | 0.84 / 0.84 / 0.83 / 0.84 / 3.62 | 0.80 / 0.80 / 0.80 / 0.80 / 3.34
Dtree | 0.99 / 0.99 / 0.99 / 0.99 / 0.02 | 0.99 / 0.99 / 0.99 / 0.99 / 0.02
Etree | 0.92 / 0.92 / 0.92 / 0.92 / 0.10 | 0.94 / 0.94 / 0.94 / 0.94 / 0.11
RFor | 0.90 / 0.90 / 0.90 / 0.90 / 0.26 | 0.97 / 0.97 / 0.97 / 0.97 / 0.19
AdaB | 0.34 / 0.38 / 0.27 / 0.38 / 3.62 | 0.54 / 0.54 / 0.49 / 0.54 / 0.05
Ridge | 0.83 / 0.83 / 0.83 / 0.83 / 0.02 | 0.79 / 0.79 / 0.79 / 0.79 / 4.51
RCV | 0.84 / 0.83 / 0.83 / 0.83 / 0.10 | 0.79 / 0.79 / 0.79 / 0.79 / 0.11
SGD | 0.85 / 0.77 / 0.76 / 0.77 / 0.26 | 0.81 / 0.81 / 0.81 / 0.81 / 0.19
PAg | 0.81 / 0.81 / 0.79 / 0.81 / 0.09 | 0.78 / 0.76 / 0.75 / 0.76 / 0.05
Bagg | 0.96 / 0.95 / 0.96 / 0.95 / 4.27 | 0.97 / 0.97 / 0.97 / 0.97 / 4.61


Table 5 SMOTE oversampling performance analysis
Classifier | Presence of scaling (Precision / Recall / FScore / Accu / Run time) | Absence of scaling (Precision / Recall / FScore / Accu / Run time)
LReg | 0.88 / 0.88 / 0.88 / 0.88 / 0.04 | 0.86 / 0.86 / 0.86 / 0.86 / 0.90
KNN | 0.95 / 0.95 / 0.95 / 0.95 / 1.26 | 0.96 / 0.96 / 0.96 / 0.96 / 0.02
KSVM | 0.92 / 0.91 / 0.91 / 0.91 / 0.03 | 0.94 / 0.93 / 0.93 / 0.93 / 1.15
GNB | 0.85 / 0.85 / 0.85 / 0.85 / 1.89 | 0.82 / 0.81 / 0.81 / 0.81 / 6.18
Dtree | 1.00 / 1.00 / 1.00 / 1.00 / 7.54 | 1.00 / 1.00 / 1.00 / 1.00 / 0.05
Etree | 0.92 / 0.92 / 0.92 / 0.92 / 0.04 | 0.93 / 0.92 / 0.92 / 0.92 / 0.18
RFor | 0.96 / 0.95 / 0.95 / 0.95 / 0.15 | 0.98 / 0.98 / 0.98 / 0.98 / 0.62
AdaB | 0.42 / 0.38 / 0.33 / 0.38 / 0.41 | 0.34 / 0.44 / 0.34 / 0.44 / 0.22
Ridge | 0.85 / 0.85 / 0.85 / 0.85 / 0.33 | 0.77 / 0.77 / 0.77 / 0.77 / 10.06
RCV | 0.85 / 0.85 / 0.85 / 0.85 / 7.77 | 0.77 / 0.77 / 0.77 / 0.77 / 0.18
SGD | 0.84 / 0.83 / 0.83 / 0.83 / 0.41 | 0.84 / 0.83 / 0.83 / 0.83 / 0.62
PAg | 0.74 / 0.54 / 0.55 / 0.54 / 0.33 | 0.81 / 0.68 / 0.68 / 0.68 / 0.22
Bagg | 0.98 / 0.98 / 0.98 / 0.98 / 6.77 | 0.97 / 0.97 / 0.97 / 0.97 / 11.06

Table 6 SVM SMOTE oversampling performance analysis
Classifier | Presence of scaling (Precision / Recall / FScore / Accu / Run time) | Absence of scaling (Precision / Recall / FScore / Accu / Run time)
LReg | 0.89 / 0.89 / 0.89 / 0.89 / 0.05 | 0.96 / 0.96 / 0.96 / 0.96 / 0.07
KNN | 0.98 / 0.98 / 0.98 / 0.98 / 1.22 | 0.98 / 0.98 / 0.98 / 0.98 / 1.98
KSVM | 0.92 / 0.90 / 0.90 / 0.90 / 0.05 | 0.97 / 0.97 / 0.97 / 0.97 / 0.06
GNB | 0.85 / 0.85 / 0.85 / 0.85 / 1.55 | 0.92 / 0.92 / 0.92 / 0.92 / 2.62
Dtree | 0.99 / 0.99 / 0.99 / 0.99 / 7.46 | 0.99 / 0.99 / 0.99 / 0.99 / 8.75
Etree | 0.96 / 0.96 / 0.95 / 0.96 / 0.04 | 0.96 / 0.96 / 0.96 / 0.96 / 0.04
RFor | 0.96 / 0.95 / 0.95 / 0.95 / 0.17 | 0.96 / 0.95 / 0.95 / 0.95 / 0.19
AdaB | 0.53 / 0.57 / 0.46 / 0.57 / 0.26 | 0.59 / 0.64 / 0.58 / 0.64 / 0.43
Ridge | 0.86 / 0.85 / 0.85 / 0.85 / 0.25 | 0.85 / 0.85 / 0.85 / 0.85 / 0.15
RCV | 0.86 / 0.85 / 0.85 / 0.85 / 0.17 | 0.85 / 0.85 / 0.85 / 0.85 / 9.97
SGD | 0.80 / 0.77 / 0.75 / 0.77 / 0.26 | 0.93 / 0.93 / 0.93 / 0.93 / 0.43
PAg | 0.84 / 0.80 / 0.80 / 0.80 / 0.25 | 0.89 / 0.89 / 0.89 / 0.89 / 0.15
Bagg | 0.99 / 0.99 / 0.99 / 0.99 / 1.52 | 0.97 / 0.97 / 0.97 / 0.97 / 7.97


Table 7 ADASYN oversampling performance analysis
Classifier | Presence of scaling (Precision / Recall / FScore / Accu / Run time) | Absence of scaling (Precision / Recall / FScore / Accu / Run time)
LReg | 0.92 / 0.95 / 0.93 / 0.95 / 0.04 | 0.95 / 0.96 / 0.95 / 0.96 / 0.04
KNN | 0.94 / 0.95 / 0.93 / 0.95 / 0.88 | 0.90 / 0.94 / 0.92 / 0.94 / 0.76
KSVM | 0.91 / 0.95 / 0.93 / 0.95 / 0.04 | 0.91 / 0.95 / 0.93 / 0.95 / 0.03
GNB | 0.94 / 0.94 / 0.94 / 0.94 / 1.12 | 0.94 / 0.95 / 0.95 / 0.95 / 1.16
Dtree | 0.99 / 0.99 / 0.99 / 0.99 / 5.80 | 0.99 / 0.99 / 0.99 / 0.99 / 5.68
Etree | 0.91 / 0.93 / 0.92 / 0.93 / 0.04 | 0.92 / 0.93 / 0.93 / 0.93 / 0.03
RFor | 0.96 / 0.97 / 0.96 / 0.97 / 0.13 | 0.92 / 0.95 / 0.93 / 0.95 / 0.11
AdaB | 0.68 / 0.61 / 0.58 / 0.61 / 0.42 | 0.81 / 0.83 / 0.81 / 0.83 / 0.22
Ridge | 0.90 / 0.94 / 0.92 / 0.94 / 0.14 | 0.90 / 0.94 / 0.92 / 0.94 / 0.12
RCV | 0.90 / 0.94 / 0.92 / 0.94 / 5.67 | 0.90 / 0.94 / 0.92 / 0.94 / 0.11
SGD | 0.88 / 0.86 / 0.86 / 0.86 / 0.42 | 0.91 / 0.94 / 0.93 / 0.94 / 1.22
PAg | 0.93 / 0.95 / 0.94 / 0.95 / 1.14 | 0.91 / 0.93 / 0.92 / 0.93 / 2.12
Bagg | 0.95 / 0.95 / 0.95 / 0.95 / 3.67 | 0.94 / 0.95 / 0.94 / 0.95 / 3.68

distribution is found to be 86.7% blood donors, 4% suspect blood donors, 3.9% hepatitis, 3.4% fibrosis and 1.1% cirrhosis patients, which clearly indicates an imbalanced target distribution. This work shows how much classifier accuracy improves when the Anova test and oversampling methods are applied. The Anova-reduced dataset is combined with the oversampling techniques to analyse how well the classifiers classify the Hepatitis C virus class. Experimental results show that the Decision Tree classifier achieves an accuracy of 84% before the Anova test and oversampling of the target Hepatitis C virus feature are applied. The same Decision Tree classifier achieves 99% accuracy when the Anova-reduced dataset is oversampled with Borderline1, Borderline2, SVM SMOTE and ADASYN, and 100% accuracy with SMOTE oversampling.

References
1. Monsi J, Saji J, Vinod K, Joy L, Mathew JJ (2019) XRAY AI: lung disease prediction using machine learning. Int J Inf 8(2)
2. Ausawalaithong W, Thirach A, Marukatat S, Wilaiprasitporn T (2018) Automatic lung cancer prediction from chest X-ray images using the deep learning approach. In: Proceedings of the biomedical engineering international conference, pp 1–5
3. Bhatia S, Sinha Y, Goel L (2019) Lung cancer detection: a deep learning approach. In: Proceedings of the soft computing for problem solving, pp 699–705
4. Punithavathy K, Ramya MM, Poobal S (2015) Analysis of statistical texture features for automatic lung cancer detection in PET/CT images. In: Proceedings of the international conference on robotics, automation, control and embedded systems, pp 1–5
5. Sharma D, Jindal G (2011) Identifying lung cancer using image processing techniques. In: Proceedings of the international conference on computational techniques and artificial intelligence, vol 17, pp 872–880
6. Asuntha A, Srinivasan A (2020) Deep learning for lung cancer detection and classification. Multimedia Tools and Applications, pp 1–32
7. Teramoto A, Tsukamoto T, Kiriyama Y, Fujita H (2017) Automated classification of lung cancer types from cytological images using deep convolutional neural networks. BioMed Research International
8. Ozdemir O, Russell RL, Berlin AA (2019) A 3D probabilistic deep learning system for detection and diagnosis of lung cancer using low-dose CT scans. IEEE Trans Med Imaging 14:19–29
9. Gordienko Y, Gang P, Hui J, Zeng W, Kochura Y, Alienin O, Rokovyi O, Stirenko S (2018) Deep learning with lung segmentation and bone shadow exclusion techniques for chest x-ray analysis of lung cancer. In: Proceedings of the international conference on computer science, engineering and education applications, pp 638–647. Springer, Cham
10. Bhatia S, Sinha Y, Goel L (2019) Lung cancer detection: a deep learning approach. In: Proceedings of the soft computing for problem solving, pp 699–705. Springer, Singapore
11. Chen W, Wei H, Peng S, Sun J, Qiao X, Liu B (2019) HSN: hybrid segmentation network for small cell lung cancer segmentation. IEEE Access, pp 75591–603

Personality Trait Detection Using Handwriting Analysis by Machine Learning
Pratibha Singh, Sushant Verma, Shivam Chaudhary, and Shivam Gupta

Abstract Computer-based handwriting analysis using artificial intelligence (AI) outperforms conventional handwriting analysis in several respects, such as speed, accuracy and better pattern identification than visual inspection. Moreover, machine learning-assisted analysis on a computer is more efficient and less prone to human error. The system designed here uses convolutional neural networks (CNNs) to automate basic personality trait detection through graphological handwriting analysis.
Keywords Machine learning · Handwriting analysis · Personality prediction · Graphology

1 Introduction
Personality is the combination of an individual's characteristics and qualities. It can be affected by the evolution and growth of an individual's attributes, values, personal memories of life events, habits, skills and relationships with the community, and it is largely shaped by the individual's behaviors and decisions. Different handwriting features of an individual can be used to identify several personality traits, and machine learning classifiers can predict the personality traits of the writer. The objective of this paper is to develop a system that takes an image of a document containing a person's handwriting and outputs some of his/her personality traits based on selected handwriting features. Manually analyzing all the significant characteristics of a handwriting sample is not only time-consuming but also error-prone. Automating the analysis of a few selected characteristics speeds up the process and reduces errors.
P. Singh (B)
Krishna Engineering College, Ghaziabad, UP, India
S. Verma · S. Chaudhary · S. Gupta
ABES Engineering College, Ghaziabad, UP, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_33


P. Singh et al.

2 Literature Survey

The authors of [1] proposed a method that predicts individual personality traits by analyzing handwriting patterns with machine learning. Seven personality traits are extracted, and eight personality traits are predicted from different combinations of them. After adequate training with an SVM, the personality traits of new handwriting samples can be predicted accurately and efficiently. The authors of [2] review the literature on personality analysis through handwriting. They present and discuss the key results and identify the factors to consider when analyzing handwriting samples. They also note that RNN, CNN, and LSTM networks can process very large data sets in a short time, even with limited knowledge of semantic structure. The authors of [3] surveyed automated handwriting analysis for determining personality traits, together with the technologies currently available for the process. Both humans and machines can identify personality traits from handwriting samples. The study also covers other attributes that can be inferred from handwriting, such as gender, handedness, education level, age, and country, which support forensic applications. The authors of [4] proposed a method that predicts an individual's personality traits from handwriting features using machine learning. It describes the personality traits revealed by a writer's baseline, margins, slant, and height of writing. These features are extracted from the handwriting sample into feature vectors and compared with the original data set. Each sample is then assigned to a class with the corresponding personality trait. Tone of voice, facial expression, gesture, posture and, in many cases, outward appearance reflect a person's inner character. Handwriting analysis was previously done manually, which is time-consuming and prone to human error.
In manual analysis, accuracy depends on the graphologist's skill. Moreover, when many samples are analyzed, the graphologist is subject to fatigue [5]. The authors of [6] compare a new architecture with baseline machine learning models on their data set, using several metrics. They show that the new architecture outperforms the baseline systems in analyzing the Big Five personality traits. In [7], the authors describe the handwriting characteristics identified by psychologists and handwriting experts. They conducted a survey and examined the relationship between personality traits and handwriting characteristics. The characteristics covered include slant, baseline, gradient, margins, letter size, pen pressure, line spacing, word spacing, and lowercase letters such as ‘f’ and ‘t’, together with the subfeatures of these letters. The authors of [8] discuss important methodological issues and potential pitfalls to consider when applying ML models. They argue that core ML concepts, such as resampling, out-of-sample error estimation (cross-validation, etc.), and interpretable ML methods (ALE plots, etc.), can improve the robustness and generalizability of personality psychology research.

Personality Trait Detection Using Handwriting Analysis by Machine …

The authors of [9] proposed a method that detects personality traits by performing calligraphic analysis on input images, using two data sets to compare results. The input images are first preprocessed with various techniques and then passed on for extraction of the features mentioned. The system predicts the writer's character by comparing the input image and its characteristics with the related model, and all extracted features are used by a combined CNN and MLP model. Gavrilescu and Vizireanu [10] described the first non-invasive three-level architecture in the literature for identifying the Big Five personality traits solely from handwriting. The handwriting features are computed from words, letters, and segments. The authors of [11] performed several experiments using 40 different handwriting classes. Features were extracted from the handwriting images using multichannel Gabor filtering and the grayscale co-occurrence matrix (GSCM) technique. Identification was performed with two different classifiers: the weighted Euclidean distance (WED) classifier and the nearest neighbor classifier (KNN). The results were very promising: the WED classifier reached an identification accuracy of 96.0%, whereas the KNN classifier gave comparatively poor results.

3 Proposed System

The proposed system has the following main phases:

3.1 Data Acquisition

The data are obtained from the IAM Handwriting Database of the Research Group on Computer Vision and Artificial Intelligence (INF), University of Bern, Switzerland. The database is freely available for download for non-profit research purposes. It contains 1538 pages of scanned text contributed by 657 writers. Each handwriting sample is labeled with the corresponding psychological traits by manually studying each document [1].

3.2 Data Preprocessing

A set of procedures is applied to every handwriting sample before it is passed to the system for handwriting analysis and personality trait detection. These procedures improve image quality. The preprocessing steps are as follows.

Resizing and resolution fixing. The images are cropped and saved as PNG images with an automated action script. After this step, every image is 850 pixels wide, and its height depends on the handwritten content of the image [1].
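The resizing step can be sketched as follows. This is a minimal, dependency-free illustration using nearest-neighbour index mapping in NumPy. In an OpenCV pipeline such as the one described in Section 4, `cv2.resize` would normally be used instead. Only the 850-pixel width comes from the text; the function name and the synthetic page are illustrative assumptions.

```python
import numpy as np

TARGET_WIDTH = 850  # fixed width reported for the preprocessed pages


def resize_to_width(img: np.ndarray, width: int = TARGET_WIDTH) -> np.ndarray:
    """Nearest-neighbour resize to a fixed width, keeping the aspect ratio."""
    h, w = img.shape[:2]
    new_h = max(1, round(h * width / w))  # height follows the content
    # Map every output row/column back to the nearest source row/column
    rows = np.minimum((np.arange(new_h) * h / new_h).astype(int), h - 1)
    cols = np.minimum((np.arange(width) * w / width).astype(int), w - 1)
    return img[rows[:, None], cols[None, :]]


page = np.full((1200, 1700), 255, dtype=np.uint8)  # synthetic blank page
print(resize_to_width(page).shape)  # (600, 850)
```

The same indexing also works for colour images of shape (H, W, 3), because only the first two axes are resampled.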


Noise removal. Noise in an image is a random variation of brightness or color information, and it is generally a form of electronic noise. Noise can be removed by filtering the image. Common filtering methods include the mean filter, the median filter, and the bilateral filter. On the negative side, applying these filters may also reduce the level of detail in the image [1].

Grayscale and binarization. Conversion to grayscale and binarization are important steps in the pipeline for extracting handwriting features. The images are converted to grayscale and binarized using inverted global thresholding. In a binary image, each pixel takes the value 0 (black) or 255 (white). A simple global threshold classifies every pixel as foreground (the text itself) or background (the white paper). With inverted thresholding, pixels brighter than the threshold (the paper background) are set to 0, and the darker pixels (the ink, i.e., the foreground) are set to 255. This thresholding operation can be expressed as follows:

$$
\mathrm{dst}(x, y) =
\begin{cases}
0, & \text{if } \mathrm{src}(x, y) > \mathit{thresh} \\
\mathit{maxval}, & \text{otherwise}
\end{cases}
\qquad (1)
$$
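Eq. (1) is the rule that OpenCV applies for its inverted binary threshold type, `cv2.THRESH_BINARY_INV`. Below is a NumPy sketch of median-filter denoising followed by Eq. (1). The 3 × 3 window, the edge padding, and the threshold of 127 are illustrative assumptions, not values reported by the authors.

```python
import numpy as np


def median_filter3(img):
    """3x3 median filter with edge padding (removes isolated specks)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    # Stack the nine shifted views that make up each pixel's neighbourhood
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(windows), axis=0).astype(img.dtype)


def inverse_binary_threshold(src, thresh, maxval=255):
    """Eq. (1): 0 where src > thresh (bright paper), maxval elsewhere (ink)."""
    return np.where(src > thresh, 0, maxval).astype(np.uint8)


page = np.full((9, 9), 230, dtype=np.uint8)  # light paper
page[2:5, 2:5] = 40                           # a dark ink blob
page[7, 7] = 0                                # an isolated noise speck
binary = inverse_binary_threshold(median_filter3(page), thresh=127)
print(int(binary[3, 3]), int(binary[7, 7]))   # 255 0
```

The ink becomes 255 and the speck disappears. Note the trade-off mentioned above: the median filter also rounds off the corners of the ink blob, which is the loss of detail that filtering can cause.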

If the intensity of the pixel src(x, y) is greater than thresh, the new pixel intensity is set to 0; otherwise, it is set to maxval [1].

Contour and warp affine transformation (normalization). A contour, or outline, is a closed curve of points or line segments that represents the boundary of an object in an image. In other words, the contour represents the shape of the object. If internal details of an object are visible, the object may produce several nested contours, which are returned in a hierarchical data structure. A warp affine transformation is applied to rotate the contours found in an image so that the baseline of the handwriting is strictly horizontal.
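The deskewing step can be sketched as follows. This is a simplified stand-in for the authors' contour-based procedure, not a reproduction of it. It estimates the baseline angle with a least-squares line fit through all ink pixels, which assumes a single, roughly horizontal line of text. It then rotates the image with an affine map of the same form that `cv2.getRotationMatrix2D` produces, sampling by nearest neighbour in NumPy instead of calling `cv2.warpAffine`.

```python
import numpy as np


def deskew(binary):
    """Rotate a binarized text-line image (ink > 0) so that its fitted
    baseline becomes horizontal; returns (rotated image, skew in degrees)."""
    ys, xs = np.nonzero(binary)            # coordinates of ink pixels
    slope, _ = np.polyfit(xs, ys, 1)       # least-squares line y = slope * x + b
    theta = np.arctan(slope)               # skew angle; image y axis points down
    h, w = binary.shape
    cx, cy = (w - 1) / 2, (h - 1) / 2      # rotate about the image centre
    c, s = np.cos(theta), np.sin(theta)
    # Forward map p' = A (p - centre) + centre with A = [[c, s], [-s, c]],
    # the 2x2 part of cv2.getRotationMatrix2D(centre, degrees(theta), 1).
    # Each output pixel is filled from its source A^T (p' - centre) + centre.
    yy, xx = np.mgrid[0:h, 0:w]
    dx, dy = xx - cx, yy - cy
    src_x = np.rint(c * dx - s * dy + cx).astype(int)
    src_y = np.rint(s * dx + c * dy + cy).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(binary)
    out[inside] = binary[src_y[inside], src_x[inside]]  # nearest-neighbour sampling
    return out, float(np.degrees(theta))


line = np.zeros((200, 400), dtype=np.uint8)
for x in range(50, 350):                   # synthetic stroke sloping down at 0.2 px/px
    y = round(100 + 0.2 * (x - 200))
    line[y - 2:y + 3, x] = 255
straight, angle = deskew(line)
print(round(angle, 1))                     # roughly 11.3, i.e. atan(0.2) in degrees
```

A fit over all ink pixels is sensitive to descenders and multi-line input, which is one reason contour-based or projection-based angle estimates are often preferred in practice.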

3.3 Segmentation (Horizontal and Vertical Projections)

Segmentation divides a handwritten page into three types of segments: lines, words, and letters. The horizontal projection of an image is the sum of the pixel values in each row. The vertical projection is the sum of the pixel values in each column. Both operations are performed on the grayscale image [1].
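A minimal sketch of projection-based segmentation follows. It assumes a binarized page in which ink pixels are 255, as produced by Eq. (1), rather than the raw grayscale image. Lines are runs of non-empty rows in the horizontal projection. Words are runs of non-empty columns in each line's vertical projection, with narrow gaps treated as gaps between letters. The `min_gap` value is a hypothetical parameter.

```python
import numpy as np


def projection_runs(profile, min_value=1):
    """(start, end) pairs of consecutive profile entries >= min_value; end is exclusive."""
    mask = np.asarray(profile) >= min_value
    # Pad with zeros so every run has both a rising and a falling edge
    edges = np.diff(np.concatenate(([0], mask.astype(int), [0])))
    return list(zip(np.flatnonzero(edges == 1).tolist(),
                    np.flatnonzero(edges == -1).tolist()))


def segment_lines(binary):
    """Text lines from the horizontal projection (sum of each row)."""
    return projection_runs(binary.sum(axis=1))


def segment_words(line, min_gap=3):
    """Words from the vertical projection (sum of each column) of one line;
    gaps narrower than min_gap (hypothetical) are treated as letter gaps."""
    words = []
    for start, end in projection_runs(line.sum(axis=0)):
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)     # merge into the previous word
        else:
            words.append((start, end))
    return words


page = np.zeros((30, 40), dtype=np.uint8)       # synthetic binarized page
page[2:8, 1:10] = 255                            # line 1, word 1 (two parts)
page[2:8, 11:15] = 255
page[2:8, 25:35] = 255                           # line 1, word 2
page[15:22, 5:20] = 255                          # line 2
for top, bottom in segment_lines(page):
    print((top, bottom), segment_words(page[top:bottom]))
# (2, 8) [(1, 15), (25, 35)]
# (15, 22) [(5, 20)]
```

Letters can be separated in the same way by applying the vertical projection within each word with a smaller gap, although touching cursive letters usually need more than projections.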


3.4 Feature Extraction

Feature extraction reduces a large amount of input data to the important information it contains. The extracted features are used to analyze the writer's character. Different neural network algorithms can be used for this dimensionality reduction. The features to be extracted are zones, baseline, slant of letters, size of letters, pen pressure, word spacing, line spacing, speed, height of the t-bar, etc.
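Some of the listed features can be measured directly from a binarized page (ink = 255) and its grayscale original, as sketched below. These are simple proxies chosen for illustration, not the authors' feature definitions:

- Letter size is the mean text-line height.
- Line spacing is the mean gap between lines.
- Word spacing is the mean gap between ink runs within a line.
- Pen pressure is the mean darkness of the ink.

```python
import numpy as np


def runs(mask):
    """(start, end) pairs of consecutive True entries; end is exclusive."""
    edges = np.diff(np.concatenate(([0], np.asarray(mask).astype(int), [0])))
    return list(zip(np.flatnonzero(edges == 1).tolist(),
                    np.flatnonzero(edges == -1).tolist()))


def _mean(values):
    return float(np.mean(values)) if values else 0.0


def simple_features(binary, gray):
    """Illustrative spacing, size and pressure proxies from a binarized page
    (ink = 255) and its grayscale original (ink is dark)."""
    lines = runs(binary.sum(axis=1) > 0)
    heights = [end - start for start, end in lines]
    line_gaps = [nxt[0] - cur[1] for cur, nxt in zip(lines, lines[1:])]
    ink_gaps = []
    for start, end in lines:
        cols = runs(binary[start:end].sum(axis=0) > 0)
        ink_gaps += [nxt[0] - cur[1] for cur, nxt in zip(cols, cols[1:])]
    ink = binary > 0
    return {
        "size": _mean(heights),            # mean text-line height in pixels
        "line_spacing": _mean(line_gaps),  # mean blank rows between lines
        "word_spacing": _mean(ink_gaps),   # mean blank columns between ink runs
        # darker strokes are a common proxy for heavier pen pressure
        "pressure": float(255 - gray[ink].mean()) if ink.any() else 0.0,
    }


binary = np.zeros((30, 40), dtype=np.uint8)
binary[2:8, 1:10] = 255
binary[2:8, 16:30] = 255
binary[14:20, 5:20] = 255
gray = np.where(binary > 0, 55, 240).astype(np.uint8)
print(simple_features(binary, gray))
# {'size': 6.0, 'line_spacing': 6.0, 'word_spacing': 6.0, 'pressure': 200.0}
```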

3.5 Classification

During classification, personality traits are determined from the features extracted from the handwriting samples. The features from the previous step are given as input to the classifier, which identifies the writer's personality traits from the feature values. Classification can be performed with machine learning classifiers or with rule-based systems. The classifier output is the personality trait inferred from the writer's handwriting sample.

4 Implementation

The implementation uses Python 3.9, the OpenCV library, the scikit-learn library, and the Spyder IDE. The hardware consists of a scanner, a GPU (GTX 1050 Ti), and a CPU (8 cores, 2.3 GHz). The workflow is shown in Fig. 1: image capture, binarization, image processing, feature extraction, and finally personality detection with a classifier. All configurations were executed on an identical test set for 100 epochs.

Fig. 1 Work flow diagram


Fig. 2 Big loop ‘S’

Fig. 3 Small loop ‘S’

Fig. 4 Closed loop ‘S’

4.1 Extraction of Letter ‘S’

In this step, personality traits are detected from the features of the letter ‘S’ in the handwriting sample. The first feature checked is whether the ‘S’ is calligraphic or printed. For a calligraphic ‘S’, the second feature checked is the size of its loop. A calligraphic ‘S’ signifies a personality with a sense of economy: practical, serious, realistic, cautious, meticulous, a ‘conscious’ person [12]. A big loop in the letter ‘S’ (Fig. 2) shows generosity, sympathy, and a giving nature [12]. A small loop (Fig. 3) shows honesty, familiarity, and sensitivity to ethical principles. If the letter ends in a curve, the writer has a capacity for adjustment to reality [12]. A closed loop (Fig. 4) implies that generosity is limited and that the writer needs to control expenses; such a writer adjusts to reality through practical sense [12].
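Whether an ‘S’ has a loop, and how large the loop is, can be sketched by flood-filling the background of a segmented, binarized letter. Background regions that never reach the crop border are enclosed loops. This is an illustrative technique, not the authors' stated method. The relative-area cut-off between "big" and "small" loops is a hypothetical value, and the calligraphic-versus-printed decision is not covered.

```python
from collections import deque

import numpy as np


def loop_areas(glyph):
    """Pixel areas of background regions completely enclosed by ink,
    i.e. the loops of a segmented, binarized letter (ink = True)."""
    glyph = np.asarray(glyph, dtype=bool)
    h, w = glyph.shape
    seen = glyph.copy()                       # ink pixels are never flooded
    areas = []
    for r in range(h):
        for c in range(w):
            if seen[r, c]:
                continue
            # Breadth-first flood fill of one 4-connected background region
            seen[r, c] = True
            queue, size, open_region = deque([(r, c)]), 0, False
            while queue:
                y, x = queue.popleft()
                size += 1
                if y in (0, h - 1) or x in (0, w - 1):
                    open_region = True        # touches the border: not a loop
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if not open_region:
                areas.append(size)
    return areas


def describe_s_loop(glyph, big_ratio=0.15):
    """Label the largest loop as big or small relative to the crop area.
    big_ratio is a hypothetical cut-off that would need calibration."""
    areas = loop_areas(glyph)
    if not areas:
        return "no loop"
    return "big loop" if max(areas) / np.asarray(glyph).size >= big_ratio else "small loop"


ring = np.zeros((9, 9), dtype=bool)
ring[1:8, 1:8] = True
ring[2:7, 2:7] = False                        # a large enclosed loop
print(loop_areas(ring), describe_s_loop(ring))  # [25] big loop
```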

4.2 Extraction of Letter ‘M’

Width of letter ‘m’. The feature of the letter ‘m’ that is checked is its width. If the letter ‘m’ is wider than normal (compare Figs. 5 and 6), it psychologically signifies aplomb, composure, extroversion, a need for contact with others, generosity, sociability, and self-confidence [12].

Fig. 5 Wide letter m


Fig. 6 Normal letter m
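The width test for ‘m’ can be sketched as the width-to-height ratio of the letter's ink bounding box. The reference ratio for a "normal" ‘m’ and the tolerance below are hypothetical values that would have to be calibrated on labelled samples.

```python
import numpy as np


def width_ratio(glyph) -> float:
    """Width-to-height ratio of the ink bounding box of one segmented letter."""
    ys, xs = np.nonzero(glyph)
    return (xs.max() - xs.min() + 1) / (ys.max() - ys.min() + 1)


def is_wide_m(glyph, normal_ratio=1.5, tolerance=0.2) -> bool:
    """True if the letter is noticeably wider than a 'normal' m.
    normal_ratio and tolerance are hypothetical calibration values."""
    return bool(width_ratio(glyph) > normal_ratio * (1 + tolerance))


wide = np.zeros((10, 30), dtype=bool)
wide[2:8, 1:21] = True                     # 20 px wide, 6 px high
normal = np.zeros((10, 30), dtype=bool)
normal[2:8, 1:10] = True                   # 9 px wide, 6 px high
print(is_wide_m(wide), is_wide_m(normal))  # True False
```

Using a ratio rather than an absolute width keeps the test independent of the overall writing size.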

Fig. 7 Letter d without stroke

Fig. 8 Letter d with stroke

4.3 Extraction of Letter ‘D’

The feature of the letter ‘d’ that is checked is whether it has an initial stroke. A ‘d’ without an initial stroke shows a simple personality [12] (Fig. 7). A ‘d’ with an initial stroke shows firm principles, rigor, good abilities, and aggressiveness; such a writer may be inappropriate and even tiring [12] (Fig. 8).

4.4 Extraction of Letter ‘P’

The feature of the letter ‘p’ that is checked is whether it has an initial stroke. A ‘p’ without an initial stroke (Fig. 9) indicates very good intelligence and independence. With a few exceptions, very good intelligence can also be present when the initial stroke is sustained [12].

Fig. 9 Letter p without stroke


Fig. 10 Letter p with stroke

Table 1 Performance analysis on different CNN configurations

Configuration | Epochs | Accuracy (%)
3 CNN layers | 100 | 68.42
4 CNN layers | 100 | 76.32
5 CNN layers | 100 | 71.05

A ‘p’ with an initial stroke (Fig. 10) indicates the ability to conduct business and material activities, with good communication, optimism, and creative thinking. On the negative side, such a writer can be a charlatan, a liar, or superficial [12].
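Section 3.5 notes that classification can also be rule-based. As an illustration only, the letter-level findings of Sections 4.1–4.4 can be encoded as a lookup table, as sketched below. The reported results come from CNN classifiers, not from such a table. The feature labels are hypothetical names, and the trait strings paraphrase [12].

```python
# Hypothetical (letter, feature) labels mapped to trait descriptions paraphrased from [12]
RULES = {
    ("s", "calligraphic"): "sense of economy; practical, serious, realistic, cautious, meticulous",
    ("s", "big_loop"): "generosity, sympathy, giving nature",
    ("s", "small_loop"): "honesty, familiarity, sensitivity to ethical principles",
    ("s", "closed_loop"): "limited generosity; needs to control expenses; practical sense",
    ("m", "wide"): "aplomb, composure, extroversion, sociability, self-confidence",
    ("d", "no_initial_stroke"): "simple personality",
    ("d", "initial_stroke"): "firm principles, rigor, good abilities, aggressiveness",
    ("p", "no_initial_stroke"): "very good intelligence and independence",
    ("p", "initial_stroke"): "business ability, good communication, optimism, creative thinking",
}


def traits_for(observations):
    """Map detected (letter, feature) pairs to trait descriptions, skipping unknown pairs."""
    return [RULES[obs] for obs in observations if obs in RULES]


print(traits_for([("d", "initial_stroke"), ("m", "wide")]))
```

A table like this is transparent but brittle. It encodes only the graphological rules and cannot learn from labelled samples the way the CNN configurations in Table 1 do.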

5 Result

Three CNN configurations were tried to find the one that gives the best accuracy for the proposed model (Table 1): three, four, and five CNN layers. The four-layer configuration gave the best result, with 76.32% accuracy on the test set (Fig. 11). The three-layer configuration gave the worst result, with 68.42% (Fig. 12). The five-layer configuration fell in between, with 71.05% (Fig. 13). All configurations were executed on an identical test set for 100 epochs.

6 Conclusion

Handwriting analysis can be performed with machine learning by extracting various features from a handwriting sample and using them to describe the personality of its writer. The implementation mainly focuses on handwriting characteristics such as zones, baseline, slant, size, pen pressure, word spacing, line spacing, speed, and height of the t-bar. The proposed system can be used by graphologists as a tool to improve the accuracy of handwriting analysis and streamline the process. The four letters analyzed are ‘s’, ‘m’, ‘d’, and ‘p’. For ‘s’, the analysis is based on the loop in the calligraphic lowercase letter ‘s’. For ‘m’, the width of the letter is analyzed. For ‘d’ and ‘p’, the analysis is based on the initial stroke of the letter.

Fig. 11 Graph for four CNN layers

Fig. 12 Graph for three CNN layers

Fig. 13 Graph for five CNN layers


Although the same feature is checked in ‘d’ and ‘p’, the resulting traits differ for the two letters. Handwriting analysis for personality trait detection using a CNN is faster and more efficient than the conventional method. However, performance may vary across algorithms such as MLP, CNN, SVM, and KNN, and across different sets of training and testing data.

References

1. Singh LD, Malemnganba M, Humjah MA (2018) Psychological analysis based on handwriting pattern with machine learning. Report submitted to National Institute of Technology Manipur
2. Pathak AR, Raut A, Pawar S, Nangare M, Abbott HS, Chandak P (2020) Personality analysis through handwriting recognition. J Discrete Math Sci Crypt 23(1):19–33
3. Fisher J, Maredia A, Nixon A, Williams N, Leet J (2012) Identifying personality traits, and especially traits resulting in violent behavior through automatic handwriting analysis. In: Proceedings of student-faculty research day, CSIS, Pace University, pp D6.1–D6.8
4. Joshi P, Agarwal A, Dhavale A, Suryavanshi R, Kodolikar S (2015) Handwriting analysis for detection of personality traits using machine learning approach. Int J Comput Appl 130:40–45
5. Hemlata S, Singh K (2018) Personality detection using handwriting analysis. In: The seventh international conference on advances in computing, electronics and communication (ACEC), pp 85–89
6. Elngar AA, Jain N, Sharma D, Negi H, Trehan A, Srivastava A (2020) A deep learning based analysis of the big five personality traits from handwriting samples using image processing. J Inf Technol Manage 12(Special issue: Deep learning for visual information analytics and management):3–35
7. Chaudhari K, Thakkar A (2019) Survey on handwriting-based personality trait identification. Expert Syst Appl 124:282–308
8. Stachl C, Pargent F, Hilbert S, Harari GM, Schoedel R, Vaid S, Gosling SD, Bühner M (2020) Personality research and assessment in the era of machine learning. Eur J Pers 34(5):613–631
9. Raj NN, Thaha M, Shaji SG, Shibina S (2020) Forecasting personality based on calligraphy using CNN and MLP. Int J Comput Sci Eng 8(7):41–48
10. Gavrilescu M, Vizireanu N (2018) Predicting the Big Five personality traits from handwriting. EURASIP J Image Video Process 2018:1–17
11. Said HES, Tan TN, Baker KD (2000) Personal identification based on handwriting. Pattern Recogn 33(1):149–160
12. Handwriting & Graphology (2021) Learn graphology and handwriting analysis online. https://www.handwriting-graphology.com/. Last accessed on 23 June 2021

Road Traffic Density Classification to Improvise Traffic System Using Convolutional Neural Network (CNN)

Nidhi Singh and Manoj Kumar

Abstract Improving traffic systems has become a societal concern because of the increasing number of deaths in road accidents. This problem highlights the need to study traffic accidents and to propose solutions by analyzing the responsible factors. An automated system for controlling traffic is needed today, in place of traditional methods. Such a system could be built by analyzing traffic density at any location, especially in cities with many accident hotspots. This paper proposes a CNN-based model that tunes the hyperparameters of a sequential model to classify road traffic conditions from traffic images collected from the Internet. The model classifies the image content and measures traffic congestion. Traffic conditions are classified into two levels, high density and low density, with an accuracy of nearly 90%. Keywords CNN (Convolutional neural network) · Data analytics · Keras framework

1 Introduction

Traffic systems need to be improved today to reduce the loss of life in sudden road accidents. This could be achieved if users could determine the traffic conditions at upcoming locations well in advance. Heavy congestion on roads wastes time and degrades travel quality, so classifying road conditions is an important step in improving traffic. Road conditions can be classified by estimating traffic density from images. Traffic density estimation will help in developing an Intelligent Transport System (ITS). It gives insight into the road construction required to ease congestion, and it helps optimize travel routes [1]. Deep learning models provide outstanding results in image processing, and convolutional neural networks (CNNs) are widely used for this purpose. Traffic density estimation is more effective when image processing is combined with artificial intelligence. The proposed traffic density classification model uses a CNN and achieves good accuracy compared with other models proposed in the literature. Keras, a Python deep learning library, is used to build the model.

N. Singh (B)
USICT, Guru Gobind Singh Indraprastha University, New Delhi, India
M. Kumar
NSUT East Campus (Formerly AIACT&R) Delhi, New Delhi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_34

N. Singh and M. Kumar

2 Literature Review

Traffic density estimation using image processing has proved to be an effective approach. AI and computer vision have achieved remarkable results in many domains, such as the economy, security, and education. In the past, vehicle counting was one of the main methods for finding traffic density. The method proposed in [2] counts vehicles using headlight detection, even in images of heavily dense traffic and images taken in bad weather. More recently, deep learning approaches based on neural networks have achieved remarkable results in image processing and computer vision. CNNs are widely applied to vehicle counting through regression, which simplifies the structure and reduces inference time. The authors of [3, 4] estimate traffic density and classify vehicles using neural networks and CNNs. The TrafficNN model [5] uses a CNN to classify road conditions into five categories. It achieves an accuracy of 82%, which is higher than that of other pretrained models. The approach in [6] generates images from the average travel time of vehicles and then classifies traffic density from those images with a CNN. A hybrid deep neural network model [7] uses a CNN to forecast road traffic conditions from images, which helps in predicting road accidents. Daytime density estimation using computer vision is proposed in [8], where density is calculated as the number of vehicles per unit area. This model achieves accuracies of 96.0% and 82.1% for fast-moving and slow-moving traffic scenes, respectively. Image processing enables intelligent traffic management, but image preprocessing, background modeling, and foreground detection are computationally heavy in MATLAB.
Today, such computations on images can be performed in less time using machine learning and deep learning methods. A review of vehicle detection and of classification methods for congestion detection, namely DT, SVM, and KNN, is presented in [9]. The results are promising, but the work focuses only on daytime congestion detection. Vehicle behavior is dynamic, which degrades the prediction performance of deep learning models. To overcome this issue, a hybrid model [10] ensembles CNN and BLSTME to cope with dynamic vehicle behavior and predict road traffic congestion.

Road Traffic Density Classification to Improvise Traffic System Using …

The authors of [11] propose a method for object classification and recognition that helps improve the performance of the CNN model. The literature above shows the effectiveness of convolutional neural network modeling for improving traffic systems. The next section visualizes the factors involved in accidents and shows why traffic density estimation and classification need attention.

3 Accident Data Analysis and the Need for Traffic Density Classification

To strengthen the case for traffic density classification, data analytics and visualization were performed on the NYPD Motor Vehicle Collision dataset obtained from New York City Open Data. The following observations were made:

a. Most accidents happen between May and July and between September and November. Friday is the day of the week on which most accidents occur.
b. The highest number of accidents occurs in urban areas: 71.53% of serious accidents occur in urban areas and 28.47% in rural areas. Men were more involved in both serious and non-serious accidents.
c. Most accidents occurred at a speed limit of 20 to 30 km/h, and 69.87% of serious accidents occurred at a speed limit of 20 km/h.

The analysis shows that attention has focused on other accident-causing factors, such as age band, vehicle type, sex of the driver, road condition, and weather condition, whereas traffic density is not considered as a factor in accidents. This motivates the authors to propose a methodology for traffic density classification using a CNN.

4 Proposed Methodology

The proposed methodology consists of two basic modules, as shown in Fig. 1. First, the collected traffic images are preprocessed. The preprocessed images are then used to train the proposed CNN model for density classification. Given an input image, the proposed method classifies it as high traffic density or low traffic density.

Fig. 1 Proposed methodology architecture (Traffic Images → Pre-processing → Proposed CNN → High traffic or low traffic)


4.1 Preprocessing

In this step, traffic images that contain vehicles are detected and stored, and images without vehicles are filtered out. The resulting vehicle images are used as input to the CNN module.

4.2 CNN Module

The CNN module consists of three layers:

(1) Convolutional layer: finds the relevant features in the input image.

(2) ReLU layer: applies a nonlinear operation. The output is computed as

$$f(x) = \max(0, x) \qquad (1)$$

(3) Pooling layer: reduces the spatial size of the feature maps, and hence the number of parameters, which matters when images are large. Average pooling and max pooling are the most frequently used techniques.
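Eq. (1) and the pooling operation can be illustrated on a small feature map. The NumPy sketch below applies ReLU and then non-overlapping 2 × 2 max or average pooling with stride 2. The 2 × 2 window is an assumption, chosen because it is the common default; the text does not state the window size.

```python
import numpy as np


def relu(x):
    """Eq. (1): f(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)


def pool2x2(fmap, mode="max"):
    """Non-overlapping 2x2 pooling (stride 2) of an (H, W) feature map;
    an odd trailing row/column is dropped, as with 'valid' pooling."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    blocks = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))


fmap = np.array([[-1.0, 2.0, 0.5, -3.0],
                 [4.0, -2.0, 1.0, 1.5],
                 [0.0, 0.0, -1.0, -1.0],
                 [1.0, 3.0, -2.0, 2.0]])
activated = relu(fmap)
print(pool2x2(activated).tolist())          # [[4.0, 1.5], [3.0, 2.0]]
print(pool2x2(activated, "mean").tolist())  # [[1.5, 0.75], [1.0, 0.5]]
```

Each pooling step halves the height and width, so the feature map has a quarter as many values, which is why pooling keeps later layers small.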

4.3 Fully Connected Layer

A flatten layer converts the collected feature maps into a 1-D array, which is then fed to the fully connected layer.

5 Implementation and Results

5.1 Data Collection

Image data labeled as high traffic or low traffic was scraped from the web using the simple_image_download Python package. Around 1044 training images and 294 validation images were used. The high-density and low-density classes contain 604 and 631 images, respectively, so the two classes are roughly balanced (about 50% each). An example of the images used is shown in Fig. 2. The dataset is divided into training and validation sets in an 80:20 ratio.
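The 80:20 split can be sketched as a stratified split, so that both subsets keep the near-even balance between high-density and low-density images. The paper does not say how the split was made; stratification, the seed, and the file names below are assumptions.

```python
import random


def stratified_split(samples, train_fraction=0.8, seed=0):
    """Split (path, label) pairs so each label keeps the same train/validation ratio."""
    by_label = {}
    for path, label in samples:
        by_label.setdefault(label, []).append(path)
    rng = random.Random(seed)
    train, val = [], []
    for label, paths in sorted(by_label.items()):
        rng.shuffle(paths)
        cut = round(train_fraction * len(paths))
        train += [(p, label) for p in paths[:cut]]
        val += [(p, label) for p in paths[cut:]]
    return train, val


# Hypothetical file names: 50 high-density and 50 low-density images
samples = [(f"high_{i}.jpg", "high") for i in range(50)] + \
          [(f"low_{i}.jpg", "low") for i in range(50)]
train, val = stratified_split(samples)
print(len(train), len(val))  # 80 20
```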


Fig. 2 a Types of traffic density; b low density; c high density

5.2 Data Preprocessing

Keras is used for preprocessing. The Keras backend is imported and used to check the input shape. The image height and width are set to 244 pixels to make the data more consistent. The image data is then expanded artificially for better training using the Keras ImageDataGenerator class. Horizontal flips are applied, but vertical flips are not. Pixel values are rescaled by 1/255 for better model results.
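The two operations described above can be sketched in NumPy. The sketch mimics, but does not call, `ImageDataGenerator(rescale=1./255, horizontal_flip=True)`. The 50% flip probability reflects what Keras does when `horizontal_flip` is enabled; the function name is hypothetical.

```python
import numpy as np


def augment(image, rng, flip_prob=0.5):
    """Rescale pixel values to [0, 1] and randomly mirror the image left-right;
    vertical flips are deliberately not applied."""
    x = image.astype(np.float32) / 255.0
    if rng.random() < flip_prob:
        x = x[:, ::-1]              # flip along the width axis of an (H, W, C) image
    return x


rng = np.random.default_rng(0)
img = np.random.default_rng(1).integers(0, 256, size=(244, 244, 3), dtype=np.uint8)
batch = np.stack([augment(img, rng) for _ in range(4)])
print(batch.shape, float(batch.max()) <= 1.0)  # (4, 244, 244, 3) True
```

Horizontal mirroring keeps traffic scenes plausible (a road seen from the other side), whereas a vertical flip would produce upside-down scenes that never occur at test time. This is a reasonable rationale for the choice described in the text.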

5.3 Building the Neural Network

A Sequential model imported from the Keras package is used. The model has three convolutional layers: the first two have 32 filters each, and the third has 64 filters. All use the ReLU activation function. MaxPooling2D layers downsample the feature maps produced by the convolutional layers by keeping the maximum value of each patch. Finally, a flatten layer is added, followed by a dense layer with 64 neurons and a single output neuron with a sigmoid activation function. Dropout with a rate of 0.5 is applied: during training it randomly sets half of its inputs to zero, which reduces overfitting.
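The layer sizes described above can be checked with a little arithmetic. The sketch below traces output shapes and parameter counts in plain Python. Some settings come from the text: the 244-pixel RGB input, the 32/32/64 filters, the 64-unit dense layer, and the single sigmoid output. Others are assumptions, because the text does not state them: 3 × 3 kernels, "valid" padding, and 2 × 2 max pooling after every convolution. Dropout adds no parameters.

```python
def conv_stack_summary(size=244, channels=3, filters=(32, 32, 64), kernel=3,
                       pool=2, dense=64):
    """Trace shapes and parameter counts of the described model under the
    assumed kernel/pool settings; returns the total parameter count."""
    total, c_in = 0, channels
    for f in filters:
        params = (kernel * kernel * c_in + 1) * f   # weights + one bias per filter
        size = (size - kernel + 1) // pool           # 'valid' conv, then pooling
        total += params
        print(f"Conv2D({f}) + MaxPooling2D -> {size}x{size}x{f}, {params} params")
        c_in = f
    flat = size * size * c_in
    dense_params = (flat + 1) * dense
    total += dense_params + (dense + 1) * 1          # Dense(64) and Dense(1, sigmoid)
    print(f"Flatten -> {flat}; Dense({dense}) -> {dense_params} params")
    return total


total = conv_stack_summary()
print(total)  # 3240033
```

Under these assumptions the feature maps shrink from 244 to 121, 59 and 28 pixels per side. About 99% of the roughly 3.24 million parameters sit in the first dense layer, which is typical of this small Flatten-based design.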

5.4 Training the Network

The model is compiled with binary_crossentropy as the loss, rmsprop as the optimizer, and "accuracy" as the metric, because the goal is to evaluate the model's accuracy. Binary cross-entropy is used because the traffic data has two classes, high traffic and low traffic. The model is fit on the generated image data. Different batch sizes and numbers of epochs were tried to find the configuration with the best accuracy, as shown in Table 1. Training started with batch size = 32 and epochs = 10. The search then followed the batch size that gave the highest accuracy, i.e., batch size 16. The number of epochs was raised to 20, 25, and 40, giving accuracies of 83.05%, 85.44%, and 89.56%, respectively.
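Binary cross-entropy for a single sigmoid output can be written out explicitly. The sketch below computes the mean loss over a batch. The label encoding (1 = high traffic, 0 = low traffic) and the clipping constant are assumptions for illustration.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy for labels in {0, 1} and sigmoid outputs."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)      # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# 1 = high traffic, 0 = low traffic (assumed encoding)
print(round(binary_crossentropy([1, 0], [0.9, 0.2]), 4))  # 0.1643
```

The loss is small when the predicted probability is close to the true label and grows quickly for confident mistakes. A single sigmoid output with this loss is the standard pairing for a two-class problem such as high versus low traffic.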

Table 1 Network training to find best configurations for epoch and batch size

Epoch(E) versus batch size(B) B = 32

B = 16 B = 64

E = 10

70 (high rise) X

X

E = 15

81.13

66.9

E = 20



83.05

X

E = 25

81.90 (test)

85.44

X

89.56

X

E = 40

81.8

To check whether the chosen path was right, the model was also trained with epochs = 25 and batch size = 32. The result (81.90%) confirmed the choice: batch size = 16 gave higher accuracy than batch size = 32. Hence, the final configuration was batch size = 16 and epochs = 40.

5.5 Performance Evaluation

While building the model, the model metric was set to “accuracy”, which suits this task because the two classes are nearly balanced. The model shown in Fig. 4 is deployed, and a random traffic image given as input is classified as high traffic. Many input images were tested to evaluate the model, and about 90% of them were correctly classified as “high traffic” or “low traffic”. Figure 3 shows (a) the input and (b) the output of the proposed model, i.e., its classification as high or low traffic density. The final accuracy of the model is 89.56% (≈ 90%).

Fig. 3 Traffic density classification of the input image using the proposed model

Table 2 Comparison of accuracy of the proposed model

Models | Average accuracy (%) | Method used
Proposed model | 89.50 | CNN
Work proposed in [7] | 82 | CNN
Work proposed in [10] | 88.75 | Computer vision, MATLAB

Table 2 shows that, through hyperparameter tuning, the proposed model improves on the CNN model and achieves better traffic classification accuracy than the other works [7, 10].

6 Conclusion

This study designed a CNN model, tuned through hyperparameter optimization, using traffic images collected by web scraping. The proposed work achieves good overall accuracy compared with other works. A CNN built with the Keras package of Python was used to create an image classification model that distinguishes low-traffic images from high-traffic images. It achieves an accuracy of nearly 90% with batch size = 16 and epochs = 40. To strengthen the model, future work will consider bad-weather and nighttime images. The proposed model should also be validated against other deep learning approaches.

References

1. Nguyen LAT, Ha TX (2022) A novel approach of traffic density estimation using CNNs and computer vision. European J Electr Eng Comput Sci 5(4):80–84
2. Pang CCC, Lam WWL, Yung NHC (2007) A method for vehicle count in the presence of multiple-vehicle occlusions in traffic images. IEEE Trans Intell Transp Syst 8(3):441–459
3. Kapse SA, Bhoyar RA, Dhokne CN (2016) Classification of traffic density using three class neural network classifiers. Int J Eng Res Technol (IJERT), ISSN 2278-0181
4. Nubert J, Truong NG, Lim A, Tanujaya HI, Lim L, Vu MA (2018) Traffic density estimation using a convolutional neural network. arXiv preprint arXiv:1809.01564
5. Shipu MK, Mamun FA, Razu SH, Nishat Sultana M (2022) TrafficNN: CNN-based road traffic conditions classification. In: Soft computing for security applications. Springer, Singapore, pp 241–253
6. Cho J, Yi H, Jung H, Bui KHN (2021) An image generation approach for traffic density classification at large-scale road network. J Inf Telecommun 5(3):296–309
7. Manchanda C, Rathi R, Sharma N (2019) Traffic density investigation and road accident analysis in India using deep learning. In: 2019 international conference on computing, communication, and intelligent systems (ICCCIS). IEEE, pp 501–506
8. Suseendran G, Akila D, Balaganesh D, Elangovan VR, Vijayalakshmi V (2021) Incremental multi-feature tensor subspace learning based smart traffic control system and traffic density calculation using image processing. In: 2021 2nd international conference on computation, automation and knowledge management (ICCAKM). IEEE, pp 87–91
9. Chetouane A, Mabrouk S, Jemili I, Mosbah M (2022) Vision-based vehicle detection for road traffic congestion classification. Concurrency Pract Experience 34(7):e5983
10. Kothai G, Poovammal E, Dhiman G, Ramana K, Sharma A, AlZain MA, Gaba GS, Masud M (2021) A new hybrid deep learning algorithm for prediction of wide traffic congestion in smart cities. Wirel Commun Mobile Comput
11. Mostafa SA, Mustapha A, Gunasekaran SS et al (2021) An agent architecture for autonomous UAV flight control in object classification and recognition missions. Soft Comput

Fake Reviews Detection Using Multi-input Neural Network Model

Akhandpratap Manoj Singh and Sachin Kumar

Abstract The increasing penetration of the Internet and access to smart devices over the last decade have opened new dimensions in the e-commerce ecosystem. Traditional word-of-mouth marketing is becoming obsolete, and more and more users consult the reviews and ratings of a product before buying it online. With increasing competition, e-commerce players and customers are hit by a new phenomenon: fake reviews, which are used to sway customers' sentiments and make businesses profitable. The global e-commerce industry is expected to exceed a valuation of USD 16,215.6 billion by 2027, at a compound annual growth rate of 22.9%, which makes the study of fake reviews extremely important. So far, only limited work has been done on detecting fake reviews, mostly with traditional machine learning models and small datasets. Newer deep learning models and better datasets now make it possible to bridge this gap. We propose a new model that combines the features of a multilayer perceptron (MLP) and a long short-term memory (LSTM) network, which achieves the highest accuracy, 91.395%, among the models compared.

Keywords Fake reviews · E-commerce · Amazon · Multilayer perceptron · Long short-term memory

A. M. Singh (B) · S. Kumar
Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh 201009, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_35

1 Introduction
The twenty-first century is commonly referred to as the digital age because traditional methods of communicating messages and ideas are becoming obsolete as the Internet takes over. The active use of smart devices and social media sites has changed our way of life and opened many new dimensions. Many view social media as an exciting phenomenon: it is a rapidly developing technology that can have a long-lasting impact on people's lives and shape their perspectives.


In this open Internet world, users can post their opinions, i.e., reviews, on various platforms. These reviews help organizations and potential consumers get a clear idea of the products and services they are about to purchase. The number of customer reviews on e-commerce platforms has increased manifold in the last few years, and these reviews can influence the decisions of prospective buyers [1]: potential buyers base their purchasing decisions on the reviews available across platforms. Reviews therefore offer an invaluable service to potential buyers. Genuine positive reviews can help generate large revenues, while negative reviews can hurt sales badly. As customers become more important in the marketplace, the practice of relying on customer feedback to reshape businesses by improving products, services, and marketing is growing rapidly. Fake reviews have become an important challenge in this digital age. The manifold growth of the e-commerce industry has led to multidimensional competition in the segment. According to a recent study, online reviews would influence $3.8 trillion of global e-commerce spending in 2021 [2]. Given the potential that reviews offer the e-commerce industry, they have become vital for every market player. Therefore, almost all business establishments globally are directly or indirectly affected by fake reviews, in both positive and negative terms. It has thus become extremely important to study the various aspects of fake reviews, such as their potential and their impact. With increasing computational power and advances in machine learning and deep learning algorithms, computer science can effectively help in solving the problem of fake reviews.
Thus, the following observations are made from our literature research:
• Very limited research has been done on detecting fake reviews in real datasets, which makes it essential to bridge this gap by studying newer algorithms.
• Many researchers have used at most a few hundred reviews in their studies, which is a very small dataset for studying classification problems with machine learning algorithms. Also, some datasets were not publicly available to other researchers.
• By applying efficient algorithms such as long short-term memory and the multilayer perceptron to real datasets, better accuracy can be achieved than with the earlier models studied.

The rest of the paper is organized as follows. Section 2 discusses related literature on detecting fake reviews. Section 3 presents the proposed model for detecting fake reviews. Sections 4 and 5 discuss the results with a detailed analysis, including the confusion-matrix comparison, and Sect. 6 concludes the paper and outlines the future scope of the proposed work.


2 Related Work
A fake review is one that has been written, or generated by a computer program, without any real knowledge of the product or service [3]. Fake reviews are also described as deceptive opinions, spam opinions, or spam reviews, and their creators are known as spammers. Developments in NLP have also made it possible to produce fake reviews on a large scale. With the introduction of the Internet and Internet-backed commerce, the world has seen tremendous growth in e-commerce activities. This led to tough competition, and as technology advanced, companies began to harness it to make their businesses profitable, which gave rise to a new business of fake reviews. The study of spam dates back to the beginning of the twenty-first century: Drucker [4] studied the new approaches used to make businesses profitable, in the context of emails. Later, Gyongyi et al. [5] provided a formal taxonomy of Web spam, studying spamming techniques that exploit text-based ranking methods such as TF-IDF, which rank Web pages based on their text fields. Ntoulas et al. [6] discussed a number of heuristic methods for the early detection of content-based spam, and their proposed model achieved an accuracy of 86.2% in detecting spam pages. They also noted that most of the methods used in their research can easily be fooled by spammers. A breakthrough in the study of fake reviews came in 2008, when Jindal and Liu [7] studied opinion spam in the context of e-commerce product reviews using a real dataset from Amazon. This dataset contains 5.8 million reviews from 2.14 million reviewers, collected up to June 2006. They observed that opinion spamming is a global problem, distinct from email or Web spam.
They employed a supervised learning approach and found logistic regression to be highly effective in detecting spam reviews. In 2009, Yoo and Gretzel [8] studied 40 truthful and 42 deceptive hotel reviews and manually compared psychologically important linguistic differences between them using a standard statistical test. Their dataset was self-generated and too small for general studies. Owing to the lack of gold-standard data, Wu et al. [9] presented an alternative technique for detecting opinion spam based on the distortion of popularity rankings. Because the present study uses gold-standard labels for fake and truthful reviews, such heuristic evaluation procedures are unnecessary here. Feng et al. [10] confirmed the link between fraudulent reviews and abnormal distributions, and showed that features derived from context-free grammar (CFG) parse trees consistently boost detection performance. Assuming that fake and truthful reviews must differ in sentiment polarity and language structure, Ren et al. [11] identified features connected to the review text, used a genetic algorithm for feature selection, and merged two unsupervised clustering algorithms to identify fake reviews. Campbell et al. [12] found that 66% of visiting customers trust e-commerce portals that show a mix of positive and negative reviews; such truthful reviews can increase revenue, whereas deceptive reviews can decrease conversions by as much as 67%. Sandulescu and Ester [13] studied fake reviews on a Trustpilot dataset of 9000 reviews; this dataset was not publicly available and was skewed toward positive reviews. Christopher et al. [14] found that the negative consequences of fake reviews were severe enough to prompt researchers to develop methods that make it easier for both producers and consumers to tell real reviews from deceptive ones. Levitan et al. [15] compared various machine learning methods with different feature sets combining textual and audio features for deception detection and proposed a new hybrid deep neural network; using a subset of the Columbia X-Cultural Deception (CXD) Corpus, they achieved an F1-score of 63.90% with the hybrid model and a precision of 76.11% with their RF model. Lirong et al. [16] studied sellers' motivation to manipulate reviews to help their businesses. Raffel et al. [17] built a model to detect fake reviews with a bidirectional LSTM. In 2020, Faranak et al. [18] studied 15 linguistic features to classify a review as fake or not. They used RF features, RFE, Boruta, and ANOVA and identified the number of adjectives, pausality, and redundancy as the most critical features. Of the seven classifiers they tested, MLP achieved the highest accuracy, 79.09%. Luis et al. (2020) [19] used a self-made restaurant dataset of 86 reviews and trained an ensemble of SVM and MLP, achieving 77.3% accuracy.
Elmogy et al. (2021) [20] found that KNN (with K = 7) outperformed the other models, with an average F-score of 83.73% on the Yelp dataset. Chunyong et al. [21] found that high-quality labeled data is the most important aspect in studying and classifying fake reviews, and proposed a model that uses three iterations to classify a review. In 2021, Rami et al. [22] surveyed prior work on fake review detection, discussed the different authors' approaches in detail, and compared them to find the best model for classifying fake reviews. Among the neural network-based deep learning models, they found RoBERTa the most effective; they also found that most work in this field has focused on supervised machine learning models.

3 Proposed Model
Neural network algorithms give strong results in text classification tasks in natural language processing. Deep learning methods, the most representative neural networks, can extract important properties of the data more quickly than typical machine learning approaches, and with embedding approaches they can help retain the semantic content of the text. The proposed model (Fig. 1) ensembles two well-known neural network models, namely the multilayer perceptron (MLP) and long short-term memory (LSTM), to detect fake reviews in an e-commerce dataset.

Fig. 1 Block diagram of the proposed methodology
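The paper does not give layer sizes or framework details for the two branches in Fig. 1, so the following is only a minimal NumPy sketch of the forward pass of a two-input network of this kind. It assumes an MLP branch (one ReLU layer) over a TF-IDF vector, an embedding plus single-layer LSTM branch over token ids, concatenation of the two branch outputs, and a single sigmoid unit giving the probability that a review is fake. All names, sizes, and the random initialisation are illustrative assumptions, and training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def relu(x):
    return np.maximum(x, 0.0)


def dense(x, w, b, activation=None):
    """Fully connected layer: x @ w + b, optionally followed by an activation."""
    y = x @ w + b
    return activation(y) if activation is not None else y


def lstm_last_state(seq_emb, params):
    """Run a single-layer LSTM over seq_emb (T x E) and return the final hidden state."""
    Wx, Wh, b = params["Wx"], params["Wh"], params["b"]
    hidden = Wh.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in seq_emb:
        z = x_t @ Wx + h @ Wh + b          # pre-activations of all four gates, shape (4 * hidden,)
        i, f, g, o = np.split(z, 4)        # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                  # update the cell state
        h = o * np.tanh(c)                 # new hidden state
    return h


def init_params(vocab_size, tfidf_dim, emb_dim=16, lstm_units=8, mlp_units=8):
    """Small random weights; every size here is an assumption, not taken from the paper."""
    s = 0.1
    return {
        "emb": rng.normal(0, s, (vocab_size, emb_dim)),
        "lstm": {
            "Wx": rng.normal(0, s, (emb_dim, 4 * lstm_units)),
            "Wh": rng.normal(0, s, (lstm_units, 4 * lstm_units)),
            "b": np.zeros(4 * lstm_units),
        },
        "mlp_w": rng.normal(0, s, (tfidf_dim, mlp_units)),
        "mlp_b": np.zeros(mlp_units),
        "out_w": rng.normal(0, s, (mlp_units + lstm_units, 1)),
        "out_b": np.zeros(1),
    }


def predict_fake_probability(tfidf_vec, token_ids, p):
    """Two input branches (MLP on TF-IDF, LSTM on token ids) merged into one sigmoid output."""
    mlp_out = dense(tfidf_vec, p["mlp_w"], p["mlp_b"], relu)      # MLP branch
    lstm_out = lstm_last_state(p["emb"][token_ids], p["lstm"])    # embedding + LSTM branch
    merged = np.concatenate([mlp_out, lstm_out])                  # join the two branches
    return float(dense(merged, p["out_w"], p["out_b"], sigmoid)[0])
```

For example, `predict_fake_probability(rng.random(20), [3, 7, 12], init_params(50, 20))` returns a value strictly between 0 and 1 for the untrained weights. In a framework such as Keras, the same topology would typically be built with two `Input` layers merged by a `Concatenate` layer and trained end to end with the binary cross-entropy loss.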

3.1 Dataset
The dataset plays an important role in training any machine learning model. Product reviews are available on e-commerce sites, and fake review datasets are available on various platforms, but very few of them are suitable; many researchers have used self-created datasets for their research. The proposed model uses the fake review dataset from Salminen [23], which contains a total of 40 k reviews, of which 20 k are real and 20 k are fake. The dataset is properly labeled and contains real and fake reviews in equal proportion, which is the main reason for using it: a balanced dataset keeps the model from being biased toward either class.

3.2 Preprocessing
Preprocessing is the first and most crucial step in any text mining process. The proposed model involves two neural network models, and each receives its own preprocessed input. For the LSTM model, a text-to-sequence method converts each review into a sequence of integers, because an LSTM operates on numerical, sequential data. Prepositions, articles, and pronouns, which add no distinct meaning to the text but appear in almost every review, are known as stop words. The MLP model requires these stop words to be removed because they consume computational power without adding meaning. There are a variety of stop word removal methods; the traditional one removes every word that appears in a precompiled stop-word list. In grammar, inflection is the change of a word to express grammatical categories such as tense, case, voice, aspect, person, number, gender, and mood. A prefix, suffix, or infix, or another internal change such as a vowel shift, communicates one or more of these categories. Lemmatization solves this problem by converting each inflected word to its base form (its lemma), so that, for example, "lasted" and "lasts" are both mapped to "last".
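As an illustration of the two preprocessing paths described above, here is a minimal pure-Python sketch. The stop-word list and the lemma table are tiny hypothetical stand-ins (a real pipeline would use a full precompiled stop-word list and a proper lemmatizer such as NLTK's WordNet lemmatizer), and the integer-encoding step is written to resemble, not reproduce, the frequency-ranked `texts_to_sequences` method of the legacy Keras `Tokenizer`.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real pipelines use a longer precompiled list
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "i", "my",
              "of", "and", "to", "in", "for", "on", "with"}

# Hypothetical lemma table standing in for a real lemmatizer
LEMMAS = {"batteries": "battery", "lasted": "last", "lasts": "last",
          "works": "work", "worked": "work", "reviews": "review"}


def tokenize(text):
    """Lowercase the review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def preprocess_for_mlp(text):
    """MLP path: drop stop words, then map each remaining word to its lemma."""
    return [LEMMAS.get(tok, tok) for tok in tokenize(text) if tok not in STOP_WORDS]


def build_vocab(texts):
    """Assign integer ids by descending frequency; id 0 is reserved for padding."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))  # ties broken alphabetically
    return {word: i + 1 for i, word in enumerate(ranked)}


def texts_to_sequences(texts, vocab):
    """LSTM path: replace every known word by its id (unknown words are skipped)."""
    return [[vocab[tok] for tok in tokenize(t) if tok in vocab] for t in texts]
```

For example, `preprocess_for_mlp("The batteries lasted for a week")` gives `["battery", "last", "week"]`, while the LSTM path keeps stop words and only maps words to ids.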

3.3 Feature Selection
Selecting features for detecting fake reviews is a difficult task. Review text is rich in properties, none of which can be neglected: reviews are not limited to simple text but contain keywords that differ from product to product, and they express both positive and negative emotions, the characteristics of a product, and its advantages and disadvantages. This section discusses the statistical features that were considered to obtain the best results. The multilayer perceptron model uses TF-IDF features, which are well suited to information retrieval and document classification. For a term m in document n of a corpus of X documents, the non-normalized TF-IDF weight is the product of a local component (TF) and a global component (IDF), and each resulting document vector is then normalized to unit length:

$$\mathrm{weight}_{m,n} = w_{\mathrm{local}}\left(\mathrm{frequency}_{m,n}\right) \cdot w_{\mathrm{global}}\left(\mathrm{document\_frequency}_{m},\, X\right) \tag{1}$$

where $w_{\mathrm{local}}$ is the local weighting function and $w_{\mathrm{global}}$ is the global weighting function.

Reviews vary in length: a review can be a single word, one complete sentence (15–20 words), a paragraph (80–100 words), or several paragraphs. Reviews of up to 100–120 words dominate the dataset, and in the long short-term memory (LSTM) model only the first 300 words of each review were used for training. Padding is also required in this model; padding is a form of masking in which the masked steps are at the beginning or end of a sequence, so that all input sequences have the same length.
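The paper leaves $w_{\mathrm{local}}$ and $w_{\mathrm{global}}$ in Eq. (1) unspecified. The sketch below assumes the common choice of the raw term count for the local weight and $\log(X/\mathrm{df}_m)$ for the global weight, followed by unit-length (L2) normalisation, and adds a helper that keeps the first 300 token ids of a sequence and left-pads shorter ones with 0 (the padding side is an assumption). Library implementations differ in detail; for example, scikit-learn's `TfidfVectorizer` uses a smoothed IDF by default, so its weights will not match these exactly.

```python
import math
from collections import Counter


def tfidf(docs):
    """Eq. (1) with w_local = raw count and w_global = log(X / df), then L2 normalisation.

    docs: list of already preprocessed token lists. Returns one {term: weight} dict per document.
    """
    X = len(docs)
    df = Counter(term for doc in docs for term in set(doc))   # document frequency of each term
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        w = {t: tf[t] * math.log(X / df[t]) for t in tf}     # non-normalised weight_{m,n}
        norm = math.sqrt(sum(v * v for v in w.values()))
        vectors.append({t: v / norm for t, v in w.items()} if norm > 0 else w)
    return vectors


def pad_sequence(seq, maxlen=300, value=0):
    """Keep only the first `maxlen` ids and left-pad shorter sequences with `value`."""
    seq = list(seq[:maxlen])
    return [value] * (maxlen - len(seq)) + seq
```

A term that occurs in every document gets weight 0 under this IDF choice, which is why very common words contribute nothing to the MLP input even before stop-word removal.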

3.4 Model Training
The dataset contains approximately 40 thousand reviews, of which 20 thousand are genuine product reviews and the other 20 thousand are computer-generated. Of these reviews, 67% are used for training and the remaining 33% for validation. To obtain the best results, the model is trained for up to 25 epochs; the results are discussed in later sections. Since fake review detection is a binary classification task (a review is either fake or real), the binary cross-entropy loss function is well suited to it.
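A minimal plain-Python sketch of the 67/33 hold-out split and of the binary cross-entropy loss mentioned above. The helper names, the fixed seed, and the clipping constant `eps` are assumptions made for illustration; deep learning frameworks provide their own versions of both.

```python
import math
import random


def holdout_split(items, test_fraction=0.33, seed=42):
    """Shuffle a copy of the data and hold out `test_fraction` of it for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, validation)


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y log p + (1 - y) log(1 - p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

An uninformative prediction of 0.5 for every review costs log 2 (about 0.693) per review, and the loss falls toward 0 as predicted probabilities approach the true labels.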

3.5 Model Evaluation
To assess the performance of the proposed model and find a suitable number of epochs, the model was trained and tested with different epoch values, and the best epoch value was selected according to the training and testing accuracy. Performance measures such as accuracy, precision, recall, and F1-score were then calculated for the best epoch value. The results of the model evaluation are discussed in detail in the next section.
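The epoch-selection step can be sketched as follows, assuming one training and one validation accuracy value per epoch (for instance, the lists a Keras `History.history` dictionary stores under `accuracy` and `val_accuracy` when accuracy is tracked as a metric). The selection rule, highest validation accuracy with ties going to the earlier epoch, is an assumption; the text only says the best epoch was chosen from the training and testing accuracy.

```python
def select_epoch(train_acc, val_acc):
    """Return (best_epoch, train_val_gap) with 1-based epochs.

    best_epoch has the highest validation accuracy (the earliest one wins ties);
    train_val_gap is train minus validation accuracy there, a rough overfitting signal.
    """
    if not val_acc or len(train_acc) != len(val_acc):
        raise ValueError("need one training and one validation accuracy per epoch")
    best = max(range(len(val_acc)), key=lambda k: (val_acc[k], -k))
    return best + 1, train_acc[best] - val_acc[best]
```

For example, `select_epoch([0.90, 0.95, 0.99], [0.89, 0.91, 0.90])` picks epoch 2 with a gap of about 0.04: the last epoch has higher training accuracy but lower validation accuracy, the overfitting pattern described for the MLP model.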

4 Result and Discussion
In this research study, we introduce a novel model to detect fake reviews using deep learning techniques. The proposed model follows an ensemble learning approach that combines MLP and LSTM with optimized hyperparameters, and it was trained on the fake review dataset from Salminen [23], which has a total of 40 k reviews. This dataset contains real and fake reviews in equal proportion, which makes it well suited for training. To show how the performance of the models varies with the number of epochs, Figs. 2, 3, and 4 plot accuracy and loss against epochs. Figure 2 shows that the MLP model performs best after the first epoch and that, as the number of epochs increases, the model begins to overfit: accuracy on the training data increases while accuracy on the testing data decreases compared with earlier epochs. Furthermore, the model's loss on the training data increases with epochs, whereas it decreases on the testing data. As a result, the MLP model performs well with low epoch values, so less training time is needed to obtain optimal results. Figure 3 shows that the LSTM model's performance on both the training and testing datasets initially improves as the number of epochs grows, but after ten epochs the performance on the testing data stabilizes. The loss versus epochs graph in Fig. 3 shows that the loss on both the training and testing data initially drops; the loss on the testing data then stabilizes, while the loss on the training data continues to fall. The LSTM model thus improves with epochs until it reaches a point where it stabilizes. Comparing the two, the LSTM model reaches higher accuracy than the MLP model, whereas the MLP model reaches its optimum much sooner but begins to overfit as the number of epochs increases; the LSTM model's performance keeps improving with epochs and stabilizes after a certain number of epochs.

Fig. 2 Model accuracy and model loss of MLP model

Fig. 3 Model accuracy and model loss of LSTM model


Fig. 4 Model accuracy and model loss of the proposed model, ensembling of MLP and LSTM

One motivation for combining the multilayer perceptron and long short-term memory models was to gain the advantages of both: the high accuracy of LSTM and the early optimum of MLP. Figure 4 shows that the proposed model's testing accuracy is optimal at epoch 2; after that, the training accuracy continues to rise, while the testing accuracy gradually drops and then stabilizes. The proposed model thus combines the capabilities of LSTM and MLP and gives better results.

5 Confusion Matrix
The performance of the proposed model was compared with that of other machine learning algorithms used by various authors, namely Naive Bayes, support vector machine (SVM), multilayer perceptron, and LSTM. The performance measures below were calculated from the confusion matrices of the training and testing datasets. Figure 5 shows that the proposed ensemble of LSTM and MLP gives the best output among all the models, because it combines the characteristics of both the LSTM and MLP models. From the entries of the confusion matrix in Table 1, the accuracy, precision, recall, and F1-score of the four baseline models and the proposed model were calculated using the following formulas:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F_1\text{-score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
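The four formulas translate directly into code. This minimal sketch treats "fake" as the positive class and guards against division by zero; the function name is hypothetical. Library routines such as scikit-learn's `precision_recall_fscore_support` can instead average the scores over both classes (its `average` parameter), which generally gives different numbers from these single-class definitions.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score for the positive class from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For instance, `classification_metrics(tp=6, tn=8, fp=2, fn=4)` gives an accuracy of 0.7, a precision of 0.75, a recall of 0.6, and an F1-score of 2/3.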


Fig. 5 Graphical representation of confusion matrix for NB, SVM, MLP, LSTM, and proposed model (MLP + LSTM)

Table 1 Result of confusion matrix for NB, SVM, MLP, LSTM, and proposed model (MLP + LSTM)

| Model | True positives (TP) | True negatives (TN) | False positives (FP) | False negatives (FN) |
|---|---|---|---|---|
| Naive Bayes | 5292 | 6216 | 484 | 1351 |
| SVM | 6064 | 5998 | 702 | 579 |
| MLP | 5645 | 5955 | 743 | 1000 |
| LSTM | 6216 | 6081 | 621 | 427 |
| LSTM + MLP (proposed) | 6159 | 6264 | 482 | 438 |

Table 2 lists all the performance indicators for the proposed model and for several commonly used machine learning models. The values for MLP, LSTM, and the proposed model are given for an epoch value of 7. The proposed model outperforms all the other models tested in this research in accuracy, precision, recall, and F1-score, which makes it the best of the compared approaches for review classification.

Table 2 Comparison of accuracy, precision, recall, and F1-score (in %) of NB, SVM, MLP, LSTM, and proposed model

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Naïve Bayes | 87.17 | 87.16 | 86.87 | 86.98 |
| SVM | 90.01 | 89.96 | 90.01 | 89.98 |
| MLP | 89.14 | 89.21 | 89.14 | 89.13 |
| LSTM | 90.32 | 90.33 | 90.32 | 90.32 |
| Proposed model | 91.39 | 91.40 | 91.39 | 91.39 |

6 Conclusion
Fake reviews can be a problem for both customers and e-commerce players. Detecting and classifying them at an early stage is important to maintain trust between sellers and customers and to preserve the credibility of products without distorting customer sentiment. This paper proposes a multi-input neural network model for fake review detection with two input branches, MLP and LSTM. To test its performance, the proposed model was compared with Naïve Bayes, SVM, MLP, and LSTM individually. In this comparison, the proposed ensemble of MLP and LSTM achieved the highest accuracy, 91.395%, on the fake review dataset from Salminen [23], which makes it an effective model for this task. In this study, the proposed model was trained and tested on reviews from an e-commerce dataset; in future, it can be applied to other datasets, such as reviews of hotels, restaurants, shops, and other business establishments, to detect fake reviews effectively and efficiently.

References
1. Big-commerce Blog (2022) https://www.bigcommerce.com/blog/online-reviews/#who-is-reading-online-reviews. Accessed 7 Apr 2022
2. The Print (2022) https://theprint.in/opinion/almost-4-of-all-online-reviews-are-fake-their-impact-is-costing-us-152-billion/715689/. Accessed 3 Mar 2022
3. Lee KD, Han K, Myaeng SH (2016) Capturing word choice patterns with LDA for fake review detection in sentiment analysis. In: Proceedings of the 6th international conference on web intelligence, mining and semantics, pp 1–7
4. Drucker PF (2002) The discipline of innovation. Harv Bus Rev 80(8):95–102
5. Gyongyi Z, Garcia-Molina H, Pedersen J (2004) Combating web spam with trustrank. In: Proceedings of the 30th international conference on very large data bases (VLDB)
6. Ntoulas A, Najork M, Manasse M, Fetterly D (2006) Detecting spam web pages through content analysis. In: Proceedings of the 15th international conference on World Wide Web, pp 83–92
7. Jindal N, Liu B (2008) Opinion spam and analysis. In: Proceedings of the 2008 international conference on web search and data mining, pp 219–230
8. Yoo KH, Gretzel U (2009) Comparison of deceptive and truthful travel reviews. In: ENTER, pp 37–47
9. Wu G, Greene D, Smyth B, Cunningham P (2010) Distortion as a validation criterion in the identification of suspicious reviews. In: Proceedings of the first workshop on social media analytics, pp 10–13
10. Feng S, Xing L, Gogar A, Choi Y (2012) Distributional footprints of deceptive product reviews. In: Proceedings of the international AAAI conference on web and social media 6(1):98–105
11. Ren Y, Ji D, Zhang H (2014) Positive unlabeled learning for deceptive reviews detection. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 488–498
12. Kunle Campbell (2022) https://2xecommerce.com/importance-reviews-ecommerce. Accessed 3 Mar 2022
13. Sandulescu V, Ester M (2015) Detecting singleton review spammers using semantic similarity. In: Proceedings of the 24th international conference on World Wide Web, pp 971–976
14. Christopher SL, Rahulnath HA (2016) Review authenticity verification using supervised learning and reviewer personality traits. In: 2016 international conference on emerging technological trends (ICETT), pp 1–7. IEEE
15. Mendels G, Levitan SI, Lee KZ, Hirschberg J (2017) Hybrid acoustic-lexical deep learning approach for deception detection. In: Interspeech, pp 1472–1476
16. Chen L, Li W, Chen H, Geng S (2019) Detection of fake reviews: analysis of sellers' manipulation behavior. Sustainability 11(17):4802
17. Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2019) Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683
18. Abri F, Gutierrez LF, Namin AS, Jones KS, Sears DR (2020) Fake reviews detection through analysis of linguistic features. arXiv preprint arXiv:2010.04260
19. Gutierrez-Espinoza L, Abri F, Namin AS, Jones KS, Sears DR (2020) Fake reviews detection through ensemble learning. arXiv preprint arXiv:2006.07912
20. Elmogy AM, Tariq U, Ammar M, Ibrahim A (2021) Fake reviews detection using supervised machine learning. Int J Adv Comput Sci Appl 12
21. Yin C, Cuan H, Zhu Y, Yin Z (2021) Improved fake reviews detection model based on vertical ensemble tri-training and active learning. ACM Trans Intell Syst Technol (TIST) 12(4):1–19
22. Mohawesh R, Xu S, Tran SN, Ollington R, Springer M, Jararweh Y, Maqsood S (2021) Fake reviews detection: a survey. IEEE Access 9:65771–65802
23. Salminen J (2021) Fake reviews dataset. Retrieved from osf.io/tyue9. Last retrieved 3 Mar 2022

Classification of Yoga Poses Using Integration of Deep Learning and Machine Learning Techniques

Kumud Kundu and Adarsh Goswami

Abstract Pose estimation is a classical problem in computer vision. With the recent changes across the globe, there is much focus on self-care through yoga. To derive the desired benefits of yoga, the poses must be performed with the correct posture, and the name of a yoga pose gives an idea of its associated benefits. In this paper, a majority voting classifier is used to classify a given yoga pose into one of five classes (goddess pose, downward dog pose, plank pose, tree pose, and warrior 2 pose). The voting classifier is explored to improve on the accuracy of the stacked individual ensemble classifiers (AdaBoost, bagging, and dagging classifiers). Classification performance is evaluated with standard machine learning evaluation metrics such as precision, recall, F1-score, and area under the curve (AUC). Experimental results validate the improved performance of the voting classifier compared with the individual classifiers. Its higher average F1-score of 0.9755, compared with the scores of the bagging, dagging, and AdaBoost classifiers, also confirms a better balance of precision and recall and better tolerance of imbalanced or small datasets.

Keywords Yoga pose classifier · Voting classifier · Dagging classifier · Bagging classifier

K. Kundu (B) · A. Goswami
Inderprastha Engineering College, Ghaziabad, Uttar Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_36

1 Introduction
With the exponential rise in the popularity of yoga across the world, there is a consequent need for an automated assistance model that supports correct, self-instructed yoga practice. Yoga pose estimation is a recent problem in computer vision that is attracting huge interest throughout the globe. It is concerned with recognizing the individual parts of the body, in their particular orientations, that make up the desired pose. Incorrect yoga poses can do more harm than good, as they can lead to various health issues such as muscle pulls, stiff necks, and ankle sprains. Recently, there has been huge interest in the research community in applying human pose estimation models to posture assessment in yoga. OpenPose, from Carnegie Mellon University (Cao et al. [1]), is the first real-time system that can detect 135 key points in a human body image, including hand, facial, and foot key points; it uses multi-stage convolutional neural networks (CNNs). DeepPose (Toshev et al. [2]) is another deep learning-based approach, which formulates pose estimation as a regression problem over body joints. MoveNet (Bajpai et al. [3]), available in a TensorFlow Lite version, is another state-of-the-art pose estimation model, popularly employed to estimate seventeen key points on the skeletal structure of the human body. DeepCut (Pishchulin et al. [4]) is a pose estimation model for detecting the poses of multiple people in real-world images. These pose estimation approaches have aroused huge interest in the identification and classification of yoga poses. Although these popular models work well for estimating the skeletal portrayal, adapting them to yoga pose identification requires further exploration. In this study, we aim to recognize a query image and classify it into one of five yoga poses. Computer vision techniques and MobileNet, an existing pose estimation framework, are used to extract pose information from an image. The contributions of this paper are the use of a voting classifier with soft voting and the association of every classified pose with an audio caption explaining the details of the yoga pose. For yoga pose classification, convolutional neural networks (CNNs) are employed for landmark detection, and their output is fed into the voting classifier.
A sequence-to-sequence recurrent neural network (RNN) is applied to the output of the voting classifier to present the information in audio format. Section 2 gives a brief review of human body pose estimation and the related machine learning/deep learning approaches that have been applied to yoga pose identification and classification, together with a brief introduction to automated audio captioning. Section 3 describes the formulation of yoga pose detection as a machine learning problem. Experimental results are presented in Sect. 4, and conclusions and directions for future work are presented in Sect. 5.
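The ensemble ideas named in this paper can be illustrated with a short, self-contained sketch. Bagging trains each base classifier on a bootstrap sample, whereas dagging splits the data into disjoint folds, one per base classifier (Weka's Dagging uses stratified folds; plain folds are used here for brevity). Soft voting averages the members' class-probability vectors and picks the most probable class, while hard voting takes the majority of the members' top classes. All function names are hypothetical; scikit-learn offers a ready-made `VotingClassifier` with `voting="soft"` or `voting="hard"`.

```python
import random


def bagging_samples(n, n_estimators, seed=0):
    """Bagging: each base classifier gets a bootstrap sample of n indices drawn with replacement."""
    rnd = random.Random(seed)
    return [[rnd.randrange(n) for _ in range(n)] for _ in range(n_estimators)]


def dagging_folds(n, n_estimators, seed=0):
    """Dagging: shuffle the indices once and deal them into disjoint folds (non-stratified here)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_estimators] for k in range(n_estimators)]


def soft_vote(prob_rows, weights=None):
    """Average the members' class-probability vectors (optionally weighted); return (class, averages)."""
    weights = weights or [1.0] * len(prob_rows)
    n_classes = len(prob_rows[0])
    avg = [sum(w * p[c] for w, p in zip(weights, prob_rows)) / sum(weights)
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg


def hard_vote(prob_rows):
    """Majority vote over each member's top class; ties go to the lowest class index."""
    votes = [max(range(len(p)), key=lambda c: p[c]) for p in prob_rows]
    return max(sorted(set(votes)), key=votes.count)
```

As a hypothetical example with the five classes in the order listed in the abstract, suppose two members output [0.40, 0.35, 0.10, 0.10, 0.05] and [0.40, 0.38, 0.10, 0.07, 0.05] and a third outputs [0.05, 0.90, 0.02, 0.02, 0.01]. Hard voting returns class 0 (goddess pose), but soft voting returns class 1 (downward dog pose), because the third member's confident prediction raises the average for that class.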

2 Related Work

Yoga, an ancient Indian system of exercise for physical, spiritual, and mental strength, has gained wide attention since the lockdown. It has become an integral part of a healthy lifestyle, but deriving its full benefits requires correct posture in the yoga asanas. Yoga pose estimation may be formulated as a machine learning task whose aim is to accurately estimate, from an image or video, the spatial locations of specific body parts of a person as prescribed by the traditional yoga literature. A similarity score is computed between the correct pose and the estimated pose, and the pose is treated as satisfactorily correct if the similarity score is above a certain confidence level. In this section, we give a brief overview of the formulation of yoga pose detection as a machine learning problem and of the various machine learning approaches that have been applied to yoga pose identification and classification.

Classification of Yoga Poses Using Integration of Deep Learning …
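The similarity-score check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any cited work: key points are 2D (x, y) pairs in a fixed landmark order, each pose is normalized for position and scale before comparison, cosine similarity serves as the score, and the 0.95 acceptance threshold is a hypothetical value.

```python
import math

def normalize_keypoints(points):
    """Center (x, y) key points on their centroid and scale them to unit norm.

    Returns a flat list [x0, y0, x1, y1, ...] so that poses of different size
    and position in the image can be compared.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered)) or 1.0
    return [coord / norm for xy in centered for coord in xy]

def pose_similarity(estimated, reference):
    """Cosine similarity of two poses (same landmark order), in [-1, 1]."""
    a = normalize_keypoints(estimated)
    b = normalize_keypoints(reference)
    return sum(x * y for x, y in zip(a, b))  # dot product of unit vectors

def is_pose_correct(estimated, reference, threshold=0.95):
    """Hypothetical acceptance rule: similarity must reach the threshold."""
    return pose_similarity(estimated, reference) >= threshold

# A shifted, uniformly scaled copy of the reference pose is accepted.
reference = [(0, 0), (1, 0), (1, 2), (0, 2)]
estimated = [(10 + 2 * x, 5 + 2 * y) for x, y in reference]
print(is_pose_correct(estimated, reference))  # True
```

Normalizing first makes the score insensitive to where the person stands and how large they appear in the frame, so only the shape of the pose matters.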

2.1 Human Pose Detection

Over the last two decades, there has been rapid progress in the development of efficient deep neural networks (DNNs) for human pose estimation. Human pose estimation requires approximating the locations of key points at joints and landmarks. DNNs are inherently good non-linear approximators, so they perform well at approximating the non-linear mapping between human body images and joint/key point locations. Xiao et al. [5] added a few deconvolutional layers to a popular convolutional neural network (CNN), ResNet, for pose estimation and used a flow-based pose similarity metric for tracking. Zhang et al. [6] investigated the practically critical problem of pose model efficiency and proposed the fast pose distillation (FPD) model, which can effectively train even extremely small human pose CNNs. The key concern of this knowledge distillation is transferring the structured information in dense joint confidence maps between neural networks of different capacities. Wang et al. [7] tackled multi-person human pose estimation and tracking in videos with an approach consisting of three components: a clip tracking network, a video tracking pipeline, and spatial–temporal merging. The clip tracking network performs pose estimation and tracking on small video clips; the video tracking pipeline merges the tracklets estimated by the clip tracking network; and the spatial–temporal merging component refines the joint locations over multiple detections of the same person using a spatial–temporal consensus procedure. Their approach achieved state-of-the-art results on both joint detection and tracking on the PoseTrack 2017 and 2018 datasets [8, 9]. Bin et al. [10] proposed semantic data augmentation (SDA), a method that augments images by pasting segmented body parts of various semantic granularity, and adversarial semantic data augmentation (ASDA), which uses a generative network to dynamically predict a tailored pasting configuration. Both approaches mitigate some of the challenges faced by multi-person pose estimation, such as heavy occlusion by nearby people and symmetric appearance. Gong et al. [11] proposed PoseAug, an auto-augmentation framework that learns to augment the available training poses toward greater diversity and thereby improves the generalization of the trained 2D-to-3D pose estimator. PoseAug uses a differentiable pose augmenter to generate realistic and diverse 2D–3D pose pairs, which are then used to train the 3D pose estimator; the larger pool of 2D–3D pairs enlarges the feasible region of augmented poses. Kim et al. [12] applied the OpenPose repository to ergonomic postural assessment for evaluating the risk of musculoskeletal disorders. They validated rapid upper limb assessment (RULA) and rapid entire body assessment (REBA) scores computed from joint angles derived from 3D human skeleton data acquired by OpenPose, and compared its performance with that of Kinect. Karashchuk et al. [13] developed Anipose, an open-source toolkit for robust markerless 3D pose estimation, which estimates and tracks joint angles to define and refine 3D poses on three different animal datasets.

K. Kundu and A. Goswami
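To make the heatmap-based formulation and the distillation idea of [6] more concrete, the sketch below decodes a joint location as the argmax of its confidence map and blends a ground-truth loss with a teacher-matching loss. It is written in plain Python for readability; the mean-squared-error terms and the balancing weight `alpha` are assumptions for illustration, not a reproduction of the FPD training code.

```python
def decode_heatmap(heatmap):
    """Return (row, col, confidence) at the maximum of one joint's confidence map."""
    best = (0, 0, heatmap[0][0])
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best[2]:
                best = (r, c, value)
    return best

def mse(a, b):
    """Mean squared error between two equally sized 2D lists."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def distillation_loss(student, teacher, ground_truth, alpha=0.5):
    """Blend a teacher-matching term with a ground-truth term.

    Each argument is a list of per-joint heatmaps (2D lists). alpha is a
    hypothetical weight: 1.0 uses only the teacher, 0.0 only the labels.
    """
    joints = len(student)
    kd_term = sum(mse(s, t) for s, t in zip(student, teacher)) / joints
    gt_term = sum(mse(s, g) for s, g in zip(student, ground_truth)) / joints
    return alpha * kd_term + (1 - alpha) * gt_term

heatmap = [[0.1, 0.2, 0.0],
           [0.0, 0.9, 0.3]]
print(decode_heatmap(heatmap))  # (1, 1, 0.9)
```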

2.2 Yoga Pose Detection

Every yoga pose is associated with a set of angles at different joints of the human body. Solutions to human pose estimation can therefore be extended to yoga pose estimation, given knowledge of the yoga asanas. Narayanan et al. [14] designed a simple sequential neural network with two hidden layers and one output layer that perfectly classified three yoga poses (goddess, tree, and warrior). Anilkumar et al. [15] captured seventeen 2D key points estimated from the human pose and compared them with a reference threshold set of joint key point values for a given standard yoga pose. Kutálek [16] extracted data from 162 collected videos based on their annotations and used it to train a CNN.
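Since each asana corresponds to a set of joint angles, a pose check can be sketched as computing the angle at a joint from three key points and comparing it with reference angles. The joint names, reference angles, and the 15-degree tolerance below are hypothetical; [15] compares key point values against thresholds, and this angle-based variant is only an illustrative alternative.

```python
import math

def joint_angle(a, b, c):
    """Angle ABC in degrees at joint b, for 2D points a, b, c given as (x, y)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate joint: coincident key points")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))

def deviating_joints(measured, reference, tolerance=15.0):
    """List joints whose measured angle differs from the reference by more
    than `tolerance` degrees (missing joints always count as deviating)."""
    return [joint for joint, ref in reference.items()
            if abs(measured.get(joint, math.inf) - ref) > tolerance]

# Hypothetical reference angles for one pose.
reference = {"left_knee": 180.0, "right_knee": 90.0}
measured = {
    "left_knee": joint_angle((0, 0), (0, 1), (0, 2)),   # straight leg: 180
    "right_knee": joint_angle((1, 0), (0, 0), (0, 1)),  # bent leg: 90
}
print(deviating_joints(measured, reference))  # []
```

Returning the list of deviating joints, rather than a single pass/fail flag, is what would let a feedback system tell the user which joint to correct.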

2.3 Classification Approaches

The basic purpose of a classification technique is to determine the category or class to which new data belongs. Supervised and unsupervised classification are the two commonly used types of classification technique. In supervised learning, target labels are provided with the input data/features during training, whereas in unsupervised learning no target labels are provided. The difficulty in achieving high classification accuracy lies in selecting, from the many available algorithms, the one that suits the nature of the problem. In K-nearest neighbor (KNN) classification, a query instance is classified by searching the whole training set for the K most similar cases (neighbors) and assigning the class held by the majority of those K examples (for regression, their output values are averaged instead). Support vector machine (SVM), another classical supervised approach, separates the data points of two classes by finding the maximum-margin hyperplane between them (Cervantes et al. [17]). SVM handles outliers better than KNN. Random forest is a meta-estimator that averages a number of decision tree classifiers trained on different subsamples to increase predictive accuracy and control over-fitting (Chitra et al. [18]). Chandra et al. [19] reviewed the different computational models of SVM and their applications to image classification. Kirasich et al. [20] compared random forest with logistic regression and found that logistic regression consistently achieved higher overall accuracy than random forest, even as the variance of the explanatory and noise variables increased. Gradient boosted trees have more capacity than random forests, so they can model very complex decision boundaries and relationships; however, as with any high-capacity model, they can overfit very quickly.

To improve classification accuracy, ensemble classifiers are widely used. Bagging, boosting, and stacking are the three main ensemble strategies. The AdaBoost classifier, an iterative ensemble method, builds a strong classifier with high accuracy by combining weak classifiers. Trejo et al. [21] used the AdaBoost classifier to recognize six yoga poses (dragon, gate, tiger, tree, triangle, and warrior) and achieved a best accuracy of 94.78%. Another ensemble method, the dagging classifier, creates a number of disjoint, stratified folds from the data and feeds each fold to a copy of the supplied base classifier. The final class is determined by averaging class membership probabilities if the base estimator produces probabilities, and otherwise by plurality vote. A voting classifier combines the predictions of different classifiers so that the ensemble achieves a generalized fit without bias toward any particular classifier. Campos et al. [22] devised an efficient way of stacking bagging-based classifiers. Sharma et al. [23] used a stack-based multi-level ensemble model built on autoregressive integrated moving average (auto-ARIMA), neural network autoregression (NNAR), exponential smoothing (ETS), and Holt–Winters (HW) models to forecast future incidences of conjunctivitis. Although ensemble classifiers improve classification accuracy, little work has applied them to the classification of yoga poses, which is inherently a pose classification problem.
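Two of the classifiers described above can be sketched in a few lines of plain Python: KNN prediction by majority vote over Euclidean nearest neighbors, and the disjoint, stratified fold construction used by dagging. Dealing each class's indices round-robin across the folds is one assumed way to stratify; library implementations may differ.

```python
import math
from collections import Counter, defaultdict

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest (Euclidean) neighbors."""
    neighbors = sorted(zip(train_X, train_y), key=lambda xy: math.dist(xy[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

def dagging_folds(labels, n_folds):
    """Split sample indices into disjoint, approximately stratified folds.

    Indices of each class are dealt round-robin across the folds, so each fold
    gets a similar share of every class. Dagging would train one copy of the
    base classifier per fold and combine their predictions.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    slot = 0
    for label in sorted(by_class):
        for i in by_class[label]:
            folds[slot % n_folds].append(i)
            slot += 1
    return folds

X = [(0, 0), (0, 1), (5, 5), (5, 6), (6, 5)]
y = ["tree", "tree", "plank", "plank", "plank"]
print(knn_predict(X, y, (0.2, 0.3)))           # tree
print(dagging_folds(["a", "a", "b", "b"], 2))  # [[0, 2], [1, 3]]
```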

3 Methodology

A yoga pose dataset was assembled to train and test a customized yoga pose classifier. MobileNet takes an image as input, returns the landmarks of the splines forming the skeletal structure, and feeds them to the deep learning model (a few fully connected layers). The model works on the principle of object detection and is trained to detect and locate 33 key points of the human body: the input image is segmented and then passed through the landmark detector to obtain the key landmarks. The model outputs the label of the detected pose based on the similarity scores between the estimated pose and the known pose types. In this study, pose estimation is performed by a pre-trained model, MediaPipe. Body pose is estimated from the positions of the joints of the human body: geometric analysis is performed at each joint captured from the frame data of the pose image with the help of the MediaPipe library, which outputs the key landmarks of the human body in the image. These landmarks are then used as input to the voting classifier. We combined a range of major machine learning algorithms into a single, powerful ensemble classifier. The voting classifier trains multiple machine learning models and takes the class probabilities from each model; the probabilities for each class are averaged, and the class with the highest average probability is predicted as the pose.
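The soft-voting rule described above (average the per-class probabilities of all models, then pick the class with the highest average) can be written as below. The dict-based input format and the optional weights are assumptions for illustration; in scikit-learn the same rule is available as `VotingClassifier(voting="soft")`.

```python
def soft_vote(probabilities, weights=None):
    """Soft voting: average per-class probabilities across models.

    probabilities: one dict per model, mapping class label -> probability.
    weights: optional per-model weights (uniform if omitted).
    Returns (predicted_label, averaged_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    labels = sorted({label for p in probabilities for label in p})
    averaged = {
        label: sum(w * p.get(label, 0.0) for w, p in zip(weights, probabilities)) / total
        for label in labels
    }
    return max(averaged, key=averaged.get), averaged

# Two models weakly prefer "tree"; one strongly prefers "plank".
models = [
    {"tree": 0.51, "plank": 0.49},
    {"tree": 0.51, "plank": 0.49},
    {"tree": 0.10, "plank": 0.90},
]
label, averaged = soft_vote(models)
print(label)  # plank
```

In this example a hard majority vote would pick "tree" (two votes to one), while soft voting picks "plank" because it also weighs how confident each model is.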


3.1 Dataset

The dataset consists of images of five yoga asanas (goddess pose, downward dog pose, plank pose, tree pose, and warrior 2 pose) [24]. It is divided into a training set and a testing set in a 70:30 ratio and is used for supervised machine learning to identify yoga poses.
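A minimal sketch of how the landmark output could be turned into classifier features and split 70:30 is shown below. It assumes each image yields 33 MediaPipe landmarks with x, y, z, and visibility values (132 features per image); the text does not say whether the split was stratified or which random seed was used, so both are assumptions.

```python
import random
from collections import defaultdict

NUM_LANDMARKS = 33  # BlazePose landmark count reported in the text

def landmarks_to_features(landmarks):
    """Flatten 33 (x, y, z, visibility) tuples into one 132-value feature row."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    return [value for landmark in landmarks for value in landmark]

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), splitting each class ~70:30."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

labels = ["tree"] * 10 + ["plank"] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 21 9
```

Splitting per class keeps the pose proportions the same in both sets, which matters for a small dataset with only five classes.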

3.2 Approach

With this simple approach, machine learning models were able to identify the geometry of each pose successfully. However, relying on a single model may limit performance. To overcome this, and the small size of the yoga pose dataset, we used a stacking classifier, an ensemble approach that combines the results of multiple classifiers to produce the final classification, as shown in Fig. 1. We also employed other ensemble classifiers, namely AdaBoost, bagging, and dagging classifiers. All of these models performed decently on the problem. To reinforce the model further, a majority voting classifier is trained over all the models, which supply soft probabilities and vote on the final answer. To compare classification performance, we used accuracy, area under the curve, true positive rate, false positive rate, precision, recall, and F1-score. Precision is the fraction of predicted positives that are true positives, whereas recall (the true positive rate) is the fraction of actual positives that are correctly identified. F1-score is the harmonic mean of precision and recall.
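The precision, recall, and F1 definitions above can be computed per class from a confusion matrix as follows. This sketch assumes rows index the true class and columns the predicted class.

```python
def per_class_metrics(confusion):
    """Precision, recall, and F1 for each class of a square confusion matrix.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    Returns one (precision, recall, f1) tuple per class.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column total
        actual_k = sum(confusion[k])                           # row total
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((precision, recall, f1))
    return results

# Toy two-class example: 8 of 10 class-0 and 9 of 10 class-1 samples are correct.
print(per_class_metrics([[8, 2], [1, 9]]))
```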

4 Results and Discussion

The landmark model in the MediaPipe BlazePose [25] solution predicts the locations of 33 pose landmarks (Fig. 2). To classify the yoga asana, classifiers such as KNN, logistic regression, random forest, gradient boosting tree, naïve Bayes, multilayer perceptron, and extreme gradient boosting are trained on the 33 detected landmarks. The voting classifier then predicts class membership probabilities, which determine the class of the asana. To examine the overall performance of the proposed approach, precision, recall, and F1-score are used. A confusion matrix tabulates the counts of predicted versus actual classes. Figure 3 shows the main key points detected at the joints, together with their incidence angles, for different yoga asanas. The system achieves 99.58% accuracy on the training data and 97.41% accuracy on the testing data, and a score of 94.66% in cross-validation. Yoga pose estimation and feedback generation using deep learning [25] achieved the same accuracy with a CNN + LSTM; our lightweight machine learning model is thus directly comparable to that heavier deep learning model.

Fig. 1 Proposed approach

Fig. 2 Thirty-three landmarks detected by MediaPipe BlazePose [25]

Fig. 3 Sample output of MediaPipe BlazePose model [25]

Figures 4 and 5 present the normalized confusion matrices of the voting classifier and of the AdaBoost, bagging, and dagging classifiers, respectively. Comparing Figs. 4 and 5 shows that the voting classifier performs better than all the other approaches: its dense, nearly pure diagonal indicates that most poses were predicted correctly. AdaBoost mispredicts the goddess and warrior poses, and dagging also slightly mispredicts the goddess pose, while the bagging classifier performs well on all poses. The voting classifier performs best among all the models (Tables 1, 2, 3, and 4). It also achieves the highest average F1-score, 0.9755, compared with 0.9376 for bagging, 0.8723 for dagging, and 0.8259 for AdaBoost (Fig. 6). A higher F1-score indicates that the voting classifier better balances precision and recall and can work robustly on an imbalanced dataset.

Fig. 4 Normalized confusion matrix of proposed voting classifier with true labels on X-axis and predicted label on Y-axis
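The normalized confusion matrices in Figs. 4 and 5 can be produced by counting (true, predicted) pairs and dividing each true-class row by its total, as sketched below. The row-per-true-class layout is an assumption of this sketch; the figures display the transpose, with true labels on the X-axis.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Count (true, predicted) pairs; rows are true classes, columns predictions."""
    index = {label: i for i, label in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return counts

def normalize_rows(counts):
    """Divide each row by its total so every true class sums to 1."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized

classes = ["goddess", "warrior"]
y_true = ["goddess", "goddess", "warrior", "warrior"]
y_pred = ["goddess", "warrior", "warrior", "warrior"]
print(normalize_rows(confusion_matrix(y_true, y_pred, classes)))
# [[0.5, 0.5], [0.0, 1.0]]
```

A "nearly pure diagonal" then means each diagonal entry is close to 1, i.e. almost every sample of a pose is predicted as that pose.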


Fig. 5 Normalized confusion matrix of AdaBoost, bagging, and dagging classifier with true labels on X-axis and predicted label on Y-axis

Table 1 Accuracy metrics for voting classifier

Pose       Precision    Recall       F1-score
Downdog    1.00000000   0.98936170   0.99465241
Goddess    1.00000000   0.90000000   0.94736842
Plank      0.97435897   0.99130435   0.98275862
Tree       0.97101449   0.98529412   0.97810219
Warrior    0.93805310   0.99065421   0.96363636

Table 2 Accuracy metrics for AdaBoost classifier

Pose       Precision    Recall       F1-score
Downdog    0.95789474   0.96808511   0.96296296
Goddess    0.59292035   0.83750000   0.69430052
Plank      0.99038462   0.89565217   0.94063927
Tree       0.98275862   0.83823529   0.90476190
Warrior    0.67021277   0.58878505   0.62686567

Table 3 Accuracy metrics for bagging classifier

Pose       Precision   Recall     F1-score
Downdog    0.957894    0.968085   0.962962
Goddess    0.592920    0.837500   0.694300
Plank      0.990384    0.895652   0.940639
Tree       0.982758    0.838235   0.904761
Warrior    0.670212    0.588785   0.626865

Table 4 Accuracy metrics for dagging classifier

Pose       Precision   Recall     F1-score
Downdog    0.957446    0.957446   0.957446
Goddess    0.684210    0.812500   0.742857
Plank      0.887096    0.956521   0.920502
Tree       0.984848    0.955882   0.970149
Warrior    0.870588    0.691588   0.770833
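Assuming that the reported average F1-score is the unweighted (macro) mean over the five poses, the AdaBoost value of 0.8259 can be reproduced from the per-class F1-scores in Table 2, and each F1 entry can be checked as the harmonic mean of its precision and recall:

```python
# Per-class F1-scores copied from Table 2 (AdaBoost classifier).
ADABOOST_F1 = {
    "Downdog": 0.96296296,
    "Goddess": 0.69430052,
    "Plank": 0.94063927,
    "Tree": 0.90476190,
    "Warrior": 0.62686567,
}

def macro_average(scores):
    """Unweighted mean of per-class scores (macro average)."""
    return sum(scores.values()) / len(scores)

def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(f"{macro_average(ADABOOST_F1):.4f}")  # 0.8259
```

For example, the Goddess row (precision 0.59292035, recall 0.83750000) gives a harmonic mean of about 0.694300, matching its F1 entry.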

For further visualization of the models' characteristics, we use the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The ROC curve shows the performance of a classification model at all classification thresholds, and the AUC summarizes that performance as an aggregate measure over all thresholds.
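For one pose treated one-vs-rest, the ROC curve and its AUC can be sketched as below: sweep the decision threshold over the predicted probabilities, record (false positive rate, true positive rate) points, and integrate them with the trapezoidal rule. The sketch assumes both positive and negative samples are present; it is illustrative and not the code used to produce Fig. 6.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by sweeping the decision threshold downward.

    scores: predicted probability of the positive class for each sample.
    labels: 1 for the positive class (e.g. one pose, one-vs-rest), else 0.
    Assumes at least one positive and one negative sample.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last of any tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(auc(roc_points(scores, labels)))  # 0.75
```

An AUC of 1.0 means every positive sample is scored above every negative one; 0.5 corresponds to random ranking.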

Fig. 6 Area under curve (AUC) and receiver operating characteristic (ROC) on testing data


5 Conclusion and Future Scope

This study explores transfer learning by using a model pre-trained on MobileNet. It investigates the performance of a stacked voting classifier and of individual ensemble classifiers (AdaBoost, bagging, and dagging) for classifying an input yoga pose into one of five selected yoga asanas. The higher F1-score of the voting classifier indicates that it better balances precision and recall, and it can therefore handle an imbalanced dataset more robustly than the other ensemble classifiers. Future work could train and test on multiple yoga image datasets captured from both female and male yogis. Optimized versions of boosted ensemble classifiers could also be explored, as could better alternatives to the MediaPipe library for key point and posture identification.

References

1. Cao Z, Hidalgo G, Simon T, Wei SE, Sheikh Y (2019) OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans Pattern Anal Mach Intell 43(1):172–186. https://doi.org/10.1109/TPAMI.2019.2929257
2. Toshev A, Szegedy C (2014) DeepPose: human pose estimation via deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1653–1660. https://doi.org/10.1109/CVPR.2014.214
3. https://github.com/ildoonet/tf-pose-estimation
4. Pishchulin L, Insafutdinov E, Tang S, Andres B, Andriluka M, Gehler PV, Schiele B (2016) DeepCut: joint subset partition and labeling for multi person pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4929–4937. https://doi.org/10.1109/CVPR.2016.533
5. Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 466–481. https://doi.org/10.1007/978-3-030-01231-1
6. Zhang F, Zhu X, Ye M (2019) Fast human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3517–3526. https://doi.org/10.1109/CVPR.2019.00363
7. Wang M, Tighe J, Modolo D (2020) Combining detection and tracking for human pose estimation in videos. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11088–11096. https://doi.org/10.1109/CVPR42600.2020.01110
8. PoseTrack (2017) Leaderboard. https://posetrack.net/leaderboard.php
9. PoseTrack (2018) Leaderboard. https://posetrack.net/workshops/eccv2018/posetrack_eccv_2018_results.html
10. Bin Y, Cao X, Chen X, Ge Y, Tai Y, Wang C, Li J, Huang F, Gao C, Sang N (2020) Adversarial semantic data augmentation for human pose estimation. In: European conference on computer vision, pp 606–622. Springer. https://doi.org/10.1007/978-3-030-58529-7
11. Gong K, Zhang J, Feng J (2021) PoseAug: a differentiable pose augmentation framework for 3D human pose estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8575–8584. https://doi.org/10.1109/CVPR46437.2021.00847
12. Kim W, Sung J, Saakes D, Huang C, Xiong S (2021) Ergonomic postural assessment using a new open-source human pose estimation technology (OpenPose). Int J Ind Ergon 84:103164. https://doi.org/10.1016/j.ergon.2021.103164
13. Karashchuk P, Rupp KL, Dickinson ES, Walling-Bell S, Sanders E, Azim E, Brunton BW, Tuthill JC (2021) Anipose: a toolkit for robust markerless 3D pose estimation. Cell Rep 36(13):109730. https://doi.org/10.1016/j.celrep.2021.109730
14. Narayanan SS, Misra DK, Arora K, Rai H (2021) Yoga pose detection using deep learning techniques. Available at SSRN 3842656. https://doi.org/10.2139/ssrn.3842656
15. Anilkumar A, KT A, Sajan S, KA S (2021) Pose estimated yoga monitoring system. Available at SSRN 3882498. https://doi.org/10.2139/ssrn.3882498
16. Kutálek J. Detection of yoga poses in image and video. excel.fit.vutbr.cz
17. Cervantes J, Garcia-Lamont F, Rodríguez-Mazahua L, Lopez A (2020) A comprehensive survey on support vector machine classification: applications, challenges and trends. Neurocomputing 408:189–215. https://doi.org/10.1016/j.neucom.2019.10.118
18. Chitra S, Srivaramangai P (2020) Feature selection methods for improving classification accuracy–a comparative study. UGC Care Group I Listed J 10(1):1. https://ieeexplore.ieee.org/document/7218098
19. Chandra MA, Bedi SS (2021) Survey on SVM and their application in image classification. Int J Inf Tecnol 13:1–11. https://doi.org/10.1007/s41870-017-0080-1
20. Kirasich K, Smith T, Sadler B (2018) Random forest vs logistic regression: binary classification for heterogeneous datasets. SMU Data Sci Rev 1(3):9. https://scholar.smu.edu/datasciencereview/vol1/iss3/9/
21. Trejo EW, Yuan P (2018) Recognition of yoga poses through an interactive system with Kinect device. In: 2nd international conference on robotics and automation sciences (ICRAS), pp 1–5. IEEE. https://doi.org/10.1109/ICRAS.2018.8443267
22. Campos R, Canuto S, Salles T, de Sá CC, Gonçalves MA (2017) Stacking bagged and boosted forests for effective automated classification. In: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, pp 105–114. https://doi.org/10.1145/3077136.3080815
23. Sharma N, Mangla M, Mohanty SN, Pattanaik CR (2021) Employing stacked ensemble approach for time series forecasting. Int J Inf Tecnol 13:2075–2080. https://doi.org/10.1007/s41870-021-00765-0
24. https://www.kaggle.com/general/192938
25. https://google.github.io/mediapipe/solutions/pose.html

Tabular Data Extraction From Documents

Jyoti Madake and Sameeran Pandey

Abstract Data is everywhere; data is the new oil, meaning it is both omnipresent and valuable. Unfortunately, much of this data is not digitized, which calls for technology capable of translating a hardcopy into a usable softcopy. Text extraction, or "textraction," is the key. Optical character recognition (OCR) uses image processing and advanced machine learning to perform text extraction. However, OCR only extracts the data; making sense of that data by reorganizing it is a different problem altogether. This project proposes two methods to extract data from tables and re-tabulate it in a database. The first method uses the processing power of Google's Vision API and then post-processes the data obtained from its OCR, while the second uses image processing algorithms to extract the data from the columns of the tables and repopulate the digital database.

Keywords Optical character recognition · Text extraction · Data processing · Data extraction from tables

1 Introduction

The world is full of ever-growing, massive amounts of data: both paper-bound, such as invoices, handwritten letters, notes, and printed forms, and digital, such as Excel sheets and Word documents. One of the easiest and most common ways to bridge the gap between paper-bound and digital data is to photograph the hardcopy. However, a photo takes up far more storage space than the information it contains would need as text.

J. Madake (B) · S. Pandey Vishwakarma Institute of Technology, Pune, Maharashtra, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_37


J. Madake and S. Pandey

Optical character recognition (OCR) is a technique for identifying typed or handwritten text characters in digital copies of physical records, such as a scanned paper document. OCR examines a document's text and converts it into a form that can be used to process the data; it is also known as text recognition. In the first phase of OCR, the physical paper is digitized with a scanner. After all pages have been scanned, the OCR software converts the content into a two-color (black and white) image. The dark areas of the resulting bitmap are identified as characters to be recognized, and the light areas as background. The dark areas are then analyzed further to determine whether they contain alphabetic words or numeric figures. OCR systems use a variety of methods, but most work on a single character, word, or block of text at a time. One of two algorithms is then used to identify the characters: (i) Pattern recognition: the OCR system is fed samples of text in a variety of formats and fonts, which are compared with, and recognized as, characters in the scanned manuscript. (ii) Feature detection: the OCR program applies rules based on the features of a particular letter or digit, such as the number of angled lines, curves, or crossing lines in a character. OCR is usually an "offline" method that analyzes a static file or manuscript, but some cloud-based services deliver OCR through online APIs, some free and others paid. The key benefits of OCR technology are time savings, fewer mistakes, and reduced effort. Although photographing documents allows them to be filed digitally, OCR adds the ability to edit and search those documents and the text inside them.
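The two-color conversion and the pattern-recognition step described above can be illustrated with a toy example: grayscale pixels are binarized with a fixed threshold, and the resulting glyph is matched to the nearest stored template by Hamming distance. The threshold of 128 and the 3 × 3 templates are hypothetical; production OCR engines use far richer models.

```python
def binarize(gray, threshold=128):
    """Two-color conversion: dark pixels (< threshold) -> 1 (ink), light -> 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def hamming(a, b):
    """Number of differing pixels between two equally sized bitmaps."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def match_glyph(bitmap, templates):
    """Pattern recognition: return the label of the closest stored template."""
    return min(templates, key=lambda label: hamming(bitmap, templates[label]))

# Hypothetical 3x3 templates (1 = ink).
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

scanned = [[250, 20, 240],   # grayscale glyph: 0 = black, 255 = white
           [255, 30, 250],
           [245, 10, 235]]
print(match_glyph(binarize(scanned), TEMPLATES))  # I
```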
For this project, the idea was to use OCR the way Amazon uses its Textract OCR service, which separates data into rows and columns when the source file contains a table. The aim of replicating this feature is to build a cost-efficient alternative: currently, Amazon charges ten times more than its rivals for this particular feature. The goal of this project was to provide it at base price; the methodology is described in detail in Sect. 4 (Methodology). The UI is built with React.js, the backend is written in Java, and the OCR and data processing are done in Python using an OCR API.

2 Market Survey

There are many OCR providers on the market. It is not possible to study every one of them, so a comparative study of nine well-known OCR services was carried out. Initially, a model was designed using PyTesseract [1], a Python wrapper for Tesseract, an offline, open-source OCR engine currently managed by Google. Its biggest advantage was the large number of resources available at every stage where one might get stuck, and it was free of cost. Its downside was the inability to give accurate output on varied data: image preprocessing had to be tuned manually for every picture, and the dataset was not uniform. Hence, it was decided to test paid OCR services. Since the dataset consisted mostly of invoices containing tables, whichever service provided accurate results and multi-level bounding boxes would be the ideal choice. Taggun [2], according to its Web site, is an OCR service tailor-made for invoices. Surprisingly, on our dataset its demo version did not perform as well as expected: there were many mistakes in text identification, and the pricing for large-scale orders was unclear. Next on the list was the Intento OCR API [3], which did not provide any demo service. Its reviews, if they can be trusted, suggested mediocre performance, so we chose not to pay for testing; moreover, its documentation said nothing about multi-level bounding boxes. The Cloudmersive OCR API [4] is a handy tool for extracting basic text from photos, but it has just one endpoint, image to text, and instead of returning text by region it returns all of the text in the image as a single string, which makes it less valuable for our purposes. OCRSpace [5] was quite an economical option, but its performance and features were similar to those of Taggun. We then turned to the market leaders in OCR, the big three: Google, Amazon, and Microsoft. Amazon Textract [6] was the best of the surveyed services in terms of features and accuracy, and it was the only one that also returned tabulated results. Its sole problem was that, with the table extraction feature enabled, it cost 10 times more than normal.

Tabular Data Extraction From Documents
For experimental or small-scale use, the difference matters little because the base pricing is very reasonable. At large scale, however, a tenfold rise in cost amounts to a lot of money, so Textract was rejected; its table extraction feature nonetheless became the inspiration for the model developed here. While looking for a cheaper alternative, Microsoft Vision seemed a good choice: it was slightly cheaper than the Google Vision API, and its text recognition accuracy was on par with Amazon's and Google's [7]. In the end, however, we chose the Google Vision API [8] for several reasons: it was not as costly as Amazon Textract with the tabulation feature, its accuracy was as good as Amazon's, and it returned multi-level bounding boxes that could be used to separate columns with a suitably optimized algorithm. Google Vision does not provide as much control over its configuration as Tesseract; however, its defaults are generally very effective.
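The multi-level bounding boxes that motivated this choice could be consumed roughly as sketched below. The code walks a response already parsed into Python dicts, assuming the JSON layout of the Vision API's `fullTextAnnotation` (pages, blocks, paragraphs, words, and symbols, with each word carrying `boundingBox.vertices`); it makes no network calls. The column-splitting rule follows the gap-based idea cited in the literature review [23], and `min_gap` is a hypothetical tuning parameter.

```python
def extract_words(annotation):
    """Flatten a Vision-style fullTextAnnotation dict into word boxes.

    Returns a list of (text, x_min, y_min, x_max, y_max) tuples.
    """
    words = []
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    text = "".join(s.get("text", "") for s in word.get("symbols", []))
                    vertices = word["boundingBox"]["vertices"]
                    # JSON output may omit coordinates that are 0, hence the defaults.
                    xs = [v.get("x", 0) for v in vertices]
                    ys = [v.get("y", 0) for v in vertices]
                    words.append((text, min(xs), min(ys), max(xs), max(ys)))
    return words

def split_columns(words, min_gap=20):
    """Group word boxes into columns wherever the horizontal gap exceeds min_gap.

    Relies on gaps between table fields being wider than gaps between words.
    """
    columns, current, right_edge = [], [], 0
    for word in sorted(words, key=lambda w: w[1]):  # left to right by x_min
        if current and word[1] - right_edge > min_gap:
            columns.append(current)
            current = []
        right_edge = word[3] if not current else max(right_edge, word[3])
        current.append(word)
    if current:
        columns.append(current)
    return columns

def _word(text, x0, y0, x1, y1):
    """Build one word entry in the assumed response layout (demo helper)."""
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return {"symbols": [{"text": ch} for ch in text],
            "boundingBox": {"vertices": [{"x": x, "y": y} for x, y in corners]}}

response = {"pages": [{"blocks": [{"paragraphs": [{"words": [
    _word("Item", 10, 0, 50, 10), _word("Qty", 200, 0, 230, 10),
    _word("Pen", 12, 20, 40, 30), _word("2", 205, 20, 215, 30),
]}]}]}]}
for column in split_columns(extract_words(response)):
    print([w[0] for w in column])
# ['Item', 'Pen']
# ['Qty', '2']
```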

3 Literature Review Over the past decade, OCR has been one of the hot topics of the industry, but as pointed out by Mori et al. [11] in their research, OCR was first patented in 1929 in Germany and in 1933 in the USA, and since then, the technology is evolving non-stop [11]. OCR or optical character recognition is a technique for converting

432

J. Madake and S. Pandey

text from a digital picture into editable text [12]. Through optical mechanics, it enables a machine to recognize characters. In terms of formatting, the OCR output should ideally match the input. The methodology in general for an OCR software, as described by Mithe et al. [10] in their study, is scanning the image, thresholding, segmentation of text, preprocessing of the image, and finally feature extraction [10]. According to a research, handwriting recognition and machine-printed character recognition are two types of OCR systems based on the kind of input [13]. For our use case, we just need machine-printed character recognition from structured tables. In their study, Faisal Mohammad et al. elaborate on manually performing OCR by applying a pattern matching algorithm, which gives an insight into the working of OCR internally [14]. But, as mentioned by Smith et al. [15] in the research, Tesseract is now behind the leading commercial engines in terms of its accuracy, so a lot of labor can be avoided by utilizing existing and minimal-flawed technology. But, for using the free version, one needs to do a lot of preprocessing on his own [15]. A comparative and experimental study done by Maya Gupta et al. in their study shows how effective Otsu thresholding can be in the preprocessing stages [16]. But, according to Yan et al. experimental results on document images, considering a noise free image, multistage adaptive thresholding is marginally superior in terms of performance when the dataset is comprising of darker tones; this result made us adopt a multistage approach for thresholding, consisting of global and Otsu thresholding [17]. In a research, the authors explain in detail about the morphological image processing [18], and according to another study, color pictures benefit from techniques based on mathematical morphology that have been established for binary and grayscale images [17]. Nina et al. 
presented a paper which highlights two novel approaches to the problem of degradation of images while binarizing them, to allow automated binarization and segmentation of handwritten text pictures; one uses a recursive variant of Otsu thresholding with selective bilateral filtering. The other extends on the recursive Otsu approach by improving background normalization and adding a post-processing phase to the algorithm, making it more robust and capable of handling photos with bleed-through artifacts [19]. Researches done by Raid et al. and Goyal discuss various morphological operations like dilation, erosion, etc., and their effects on images. [20, 21] But, since the objective is to extract something more out of the document, a research by Christopher Clark et al., through reasoning about the vacant regions inside that text, their technique evaluates the structure of particular pages of a document by recognizing sections of body text and locating the locations where figures or tables may exist [22], seems more useful. Another research paper proposes an algorithm that is based on the fact that tables often contain separate columns, implying that spaces between fields are significantly greater than gaps between words in text lines [23]. Nataliya Le Vine et al. propose a method which takes a top-down approach, mapping a table image into a standardized ‘skeleton’ table form denoting the approximate row and column borders without table content using a generative adversarial network, then fitting renderings of candidate latent table structures to the skeleton structure using a distance measure optimized by a genetic algorithm [24]. But, as pointed out by Shafait in his research, many table identification methods are designed to recognize tables in

Tabular Data Extraction From Documents


single columns of text and do not work consistently on documents with different layouts [25]. Tran et al. propose a novel approach for detecting tables in which text components are found within regions of interest (ROIs) that are table candidates, and text blocks are extracted from them. After all text blocks have been checked for horizontal or vertical arrangement, the height of each text block is compared with the average height. An ROI is considered a table if its text blocks satisfy this set of requirements [26]. Tupaj et al. follow a similar approach with a slight variation [27]. More ready-to-use approaches are described in several other papers [28–30]. The research papers and references reviewed above helped the team devise a novel, robust, and usable approach to extracting tables from a given dataset.

4 Methodology

To bridge the gap between basic OCR and what Amazon Textract can do, this paper proposes two methods (Fig. 1): one uses Google's Vision API with Textricator, and the other uses morphological operations with PyTesseract.

Dataset: The dataset for testing the algorithms consisted of 200 different invoices generated by a business firm. The invoices contained tabulated data, but no printed lines separated the rows and columns. Apart from the tabular data, other information printed in different regions of the invoice, such as the names of the seller and buyer, the account details, and the date of the transaction, also had to be collected in the database.

Method 1: Google Vision API and Textricator

For this approach, the dependencies were installed following the documentation, and the credential file was created by enabling the API on the Google Cloud Platform. After the credential file is passed, the response is obtained. The response is used to construct bounding boxes around the feature type we specify, which in this case

Fig. 1 Block diagram


J. Madake and S. Pandey

Fig. 2 Flowchart for method 1 using Google vision API

is a block. A list of objects containing the coordinates of every block's bounding box is then formed. Once this list is available, any set of coordinates can be accessed simply by the index of its object in the list, and the text within those coordinates can be retrieved. Alternatively, Textricator [9] can be used. Textricator is a tool that extracts text from documents and generates structured data; it is not an OCR tool, so it cannot be applied directly to image datasets. In table mode, it groups the text into columns based on the x-coordinate of each text element. This method is summarized in the flowchart in Fig. 2.

Method 2: Morphological operations for extraction

Figure 3 summarizes the method that uses morphological operations for extraction. The method starts by loading the input image (Fig. 4) and binarizing it with inversion, using global and Otsu thresholding. Global thresholding converts an image into a binary image by applying a single intensity threshold to all pixels. Otsu's method iterates over all candidate threshold values and, for each one, measures the spread (variance) of the pixels on each side of the threshold, that is, the pixels assigned to the foreground and to the background. The goal is to find the threshold at which the weighted sum of the foreground and background spreads is smallest. A rectangular structuring element is then created for the morphological operations, starting with erosion. Since the vertical lines are extracted first, a vertical kernel is created with a width of 1 and a height equal to 200 divided by the number of columns in the image array. Erosion makes


Fig. 3 Flowchart for method 2 using morphological operations

Fig. 4 Input image

the pixel 1 only if all the pixels under the kernel are 1. Erosion is performed for 5 iterations with the vertical kernel, and the same is then done with a horizontal kernel. Each eroded image is then dilated with the same kernel: dilation sets a pixel to 1 if at least one of the pixels under the kernel is 1. Erosion followed by dilation removes short strokes such as characters while restoring the long ruling lines. The vertical-line and horizontal-line images are then added to form a grid. Next, the contours are extracted from this grid, and their bounding boxes are drawn on the image and stored in a list named boxes.
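The thresholding and line-extraction steps above can be sketched in plain Python as follows. This is a minimal, dependency-free illustration, not the authors' OpenCV code: the image is assumed to be a list of rows of 0-255 integers, kernels are odd-length lines, and the vertical and horizontal results are combined with a bitwise OR, the binary analogue of adding the two images. The helper names (`otsu_threshold`, `extract_lines`, `table_grid`) are hypothetical.

```python
# Sketch of Method 2's binarization and ruling-line extraction (pure Python).
# Assumptions: image = list of rows of 0-255 ints; kernel lengths are odd.

def otsu_threshold(pixels):
    """Return t minimizing the weighted within-class variance of {p <= t} and
    {p > t}, found by maximizing the equivalent between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = sum0 = 0                      # pixel count and intensity sum at or below t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue                   # one class is empty: not a real split
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize_inverted(img):
    """Dark ink becomes 1 and the light background 0 (inverse binarization)."""
    t = otsu_threshold([p for row in img for p in row])
    return [[1 if p <= t else 0 for p in row] for row in img]


def _morph(img, length, vertical, op):
    """Erode (op=all) or dilate (op=any) with a centred line kernel."""
    h, w, half = len(img), len(img[0]), length // 2
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = []
            for k in range(-half, half + 1):
                rr, cc = (r + k, c) if vertical else (r, c + k)
                window.append(img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0)
            out[r][c] = int(op(window))
    return out


def extract_lines(binary, length, vertical, iterations=1):
    """Erode, then dilate with the same kernel, so only runs of ink at least
    `length` pixels long in the chosen direction survive."""
    img = binary
    for _ in range(iterations):
        img = _morph(img, length, vertical, all)
    for _ in range(iterations):
        img = _morph(img, length, vertical, any)
    return img


def table_grid(img, length):
    """Combine the vertical and horizontal line images into one grid."""
    binary = binarize_inverted(img)
    v = extract_lines(binary, length, vertical=True)
    hz = extract_lines(binary, length, vertical=False)
    return [[a | b for a, b in zip(rv, rh)] for rv, rh in zip(v, hz)]
```

For example, on a 7 × 7 white image with one dark vertical ruling line, one dark horizontal ruling line, and a single dark "text" pixel, `table_grid(img, 5)` keeps both ruling lines and removes the isolated pixel. With OpenCV, the same steps would typically use `cv2.threshold` with `cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU`, `cv2.getStructuringElement(cv2.MORPH_RECT, ...)`, `cv2.erode`, and `cv2.dilate`.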


Fig. 5 Tabulated output

Algorithm 1: Extract rows and column information
Input: images
Output: table rows and columns summary
1. Get rows and columns // lists for storing rows and columns
2. Add rows and columns
3. Calculate the average height of the boxes
4. Add the first box to the columns list // the columns list is only a temporary placeholder
5. Assign the first box as the previous box
6. Iterate through the rest of the bounding-box list in a loop
7. At each iteration, check whether the y coordinate of the current box's top-left corner is smaller than the y coordinate of the previous box plus half the mean of all heights
8. If yes:
   i. Append the current box to the columns list
   ii. Assign the current box to the previous box
   iii. Check whether the last index has been reached; if yes, append the whole columns list to the rows list
9. If no:
   i. Append the columns list to the rows list
   ii. Set the columns list to empty, which creates a new empty columns list
   iii. Assign the current box to the previous box variable
   iv. Append the current box to the newly created empty columns list
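Algorithm 1 can be sketched in Python as below, assuming each box is an `(x, y, w, h)` tuple of the kind `cv2.boundingRect` returns. The function name is hypothetical; unlike the listed steps, the sketch also sorts the boxes top to bottom before grouping, flushes the final row after the loop (so a last row that starts with a new box is not lost), and orders each row left to right.

```python
def group_into_rows(boxes):
    """Group (x, y, w, h) cell boxes into table rows, following Algorithm 1.

    A box stays in the current row while its top-left y is smaller than the
    previous box's y plus half the mean box height (step 7)."""
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: b[1])            # top-to-bottom order
    half_mean_h = sum(b[3] for b in boxes) / len(boxes) / 2
    rows, current, previous = [], [boxes[0]], boxes[0]
    for box in boxes[1:]:
        if box[1] < previous[1] + half_mean_h:
            current.append(box)                            # same row (step 8)
        else:
            rows.append(current)                           # close the row (step 9)
            current = [box]
        previous = box
    rows.append(current)                                   # flush the last row
    return [sorted(row, key=lambda b: b[0]) for row in rows]  # left to right
```

Comparing each box with the previous one, rather than with the first box of the row, tolerates rows whose cells drift slightly in y, for example on a mildly skewed scan.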


Fig. 6 Input image on AWS Textract page. Source Amazon Textract demo

5 Result

Once the list of box coordinates has been created, each box is cropped from the image and its contents are read with PyTesseract, which was chosen because it is free of cost. For each cell, the region of interest is extracted, resized, and cleaned with morphological operations to remove noise, and the cell image is then converted into a string with PyTesseract. The recognized strings are appended, separated by spaces, to a temporary string "s", which is subsequently appended to the final output list. Next, a NumPy array is created from this output, reshaped to the detected number of rows and columns, and converted into a data frame with the Pandas library. This produces the table output shown in Fig. 5, which can also be exported as a .csv file. Once the table data is extracted, it can be sent to the Java backend, which links it to the UI where the user uploads images, as shown in Fig. 6, and receives an output similar to Fig. 7.
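The final assembly step can be sketched as below. To keep the sketch runnable without Tesseract, the OCR function is passed in as a parameter (for example `pytesseract.image_to_string`), and the standard-library `csv` module stands in for the NumPy/Pandas reshaping used in the paper; `cells_to_csv` is a hypothetical name.

```python
import csv
import io


def cells_to_csv(rows, ocr):
    """Run `ocr` on each cell image and return the table as CSV text.

    rows: list of table rows, each a list of cell images (any type `ocr` accepts).
    ocr:  callable returning a string, e.g. an OCR call on a resized,
          denoised cell crop.
    """
    n_cols = max((len(r) for r in rows), default=0)
    table = []
    for row in rows:
        texts = [" ".join(ocr(cell).split()) for cell in row]  # collapse OCR newlines
        texts += [""] * (n_cols - len(texts))                # pad to a rectangular grid
        table.append(texts)
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(table)
    return buf.getvalue()
```

Because `ocr` is called once per cell, this structure also makes the cost concern raised in the conclusion explicit: with a paid OCR service, the number of requests equals the number of cells.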

6 Conclusion

The two methods above were tested on 200 invoice samples and 5 random prints of documents containing some type of table. The results were barely satisfactory. Both methods came close to the output of Amazon Textract's algorithm (Fig. 7), but neither approach alone was sufficient to replicate it. In the first approach, i.e., the Google Vision API approach, when we try to access


Fig. 7 Ideal output UI. Source Amazon Textract demo

indices, there is no uniform way to predict which index to choose, and for different data the user may need trial and error to determine the useful block. Textricator can resolve this issue, but again only if the template of the table and the placement of the other information on the page are the same across the entire dataset. The second approach is not commercially viable because it makes one OCR call per cell. If the free OCR tool PyTesseract is used, text recognition accuracy becomes an issue, and if a paid OCR service such as the Google Vision API is used, the algorithm makes an API call for every cell, so the cost grows in direct proportion to the number of cells. Another disadvantage is that the second approach performs poorly on real datasets with nonuniform lighting and orientation. Hence, the system can be used only with Method 1 combined with Textricator, and only for a specific implementation in which all documents follow a single template; it cannot replace Amazon Textract. The work can be extended in several ways: (i) changing the design of the UI, (ii) improving the algorithm, (iii) implementing a fusion of both methods, and (iv) adding features such as login and authentication on the main page of the UI.

References
1. https://pytesseract.readthedocs.io/en/latest/. Last accessed 6 March 2021
2. https://www.taggun.io/. Last accessed 2 March 2021
3. https://inten.to/api-platform. Last accessed 3 April 2021
4. https://cloudmersive.com/ocr-api. Last accessed 30 April 2021
5. https://ocr.space/ocrapi. Last accessed 30 April 2021


6. https://aws.amazon.com/textract/. Last accessed 10 May 2021
7. https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-recognizing-text. Last accessed 10 May 2021
8. https://cloud.google.com/vision/docs/resources. Last accessed 10 May 2021
9. https://textricator.mfj.io/. Last accessed 15 March 2021
10. Mithe R, Indalkar S, Divekar N (2013) Optical character recognition. Int J Recent Technol Eng (IJRTE)
11. Mori S, Suen CY, Yamamoto K (1992) Historical review of OCR research and development. Proc IEEE 80(7):1029–1058. https://doi.org/10.1109/5.156468
12. Singh S (2013) Optical character recognition techniques: a survey. J Emerg Trends Comput Inf Sci 4(6)
13. Islam N, Islam Z, Noor N (2016) A survey on optical character recognition system. J Inf Commun Technol-JICT 10(2)
14. Mohammad F, Anarase J, Shingote M, Ghanwat P (2014) Optical character recognition implementation using pattern matching. (IJCSIT) Int J Comput Sci Inf Technol 5(2)
15. Smith R (2007) An Overview of the Tesseract OCR Engine. Ninth international conference on document analysis and recognition (ICDAR 2007), pp 629–633. https://doi.org/10.1109/ICDAR.2007.4376991
16. Gupta MR, Jacobson NP, Garcia EK (2007) OCR binarization and image pre-processing for searching historical documents. Pattern Recogn 40(2)
17. Yan F, Zhang H, Kube CR (2005) A multistage adaptive thresholding method. Pattern Recogn Lett 26(8)
18. Batchelor BG, Waltz FM (2012) Morphological image processing. Machine Vision Handbook, pp 801–870
19. Nina O, Morse B, Barrett W (2011) A recursive Otsu thresholding method for scanned document binarization. In: 2011 IEEE Workshop on Applications of Computer Vision (WACV)
20. Raid AM, Khedr WM, El-Dosuky MA, Aoud M (2014) Image restoration based on morphological operations. Int J Comput Sci Eng Inf Technol (IJCSEIT) vol 4, no 3
21. Goyal M (2011) Morphological image processing. Int J Comput Sci Technol (IJCST) 2(4)
22. Clark CA, Divvala S (2015) Looking beyond text: extracting figures, tables, and captions from computer science paper. In: AAAI, workshop on scholarly big data
23. Mandal S, Chowdhury SP, Das AK, Chanda B (2006) A simple and effective table detection system from document images. Int J Document Anal
24. Le Vine N, Zeigenfuse M, Rowan M (2019) Extracting tables from documents using conditional generative adversarial networks and genetic algorithms. In: IJCNN 2019. international joint conference on neural networks. Budapest, Hungary
25. Shafait F, Smith R (2010) Table detection in heterogeneous documents. In: ACM international conference proceeding series
26. Tran DN, Tran TA, Oh A, Kim SH, Na IS (2015) Table detection from document image using vertical arrangement of text blocks. Int J Contents 11(4)
27. Tupaj S, Shi Z, Chang CH (1996) Extracting tabular information from text files. Available at http://www.ee.tufts.edu/~hchang/paperl.ps
28. Yildiz B, Kaiser K, Miksch S (2005) pdf2table: a method to extract table information from PDF files, pp 1773–1785
29. Oro E, Ruffolo M (2009) PDF-TREX: an approach for recognizing and extracting tables from PDF. In: 2009 10th international conference on document analysis and recognition
30. Rastan R, Paik HY, Shepherd J (2019) TEXUS: A unified framework for extracting and understanding tables in PDF documents. Inf Process Manage 56(3)

Vision-Based System for Road Lane Detection and Lane Type Classification
Jyoti Madake, Dhavanit Gupta, Shripad Bhatlawande, and Swati Shilaskar

Abstract This study proposes a method for detecting and classifying road lane boundaries. In an autonomous vehicle, understanding the road scene in front of the vehicle is essential. For safe driving, detecting lanes alone is not sufficient; the position of the vehicle with respect to the road boundary geometry and the road borders is also essential. Classifying the road lane type is very important for passenger safety and for traffic management, and the proposed system can help reduce accidents and increase passenger safety. The system is implemented as two modules: the first detects the road lanes and colors them, and the second classifies each lane by type. The Hough transform is used to recognize straight and curved lanes, and after detection the lanes are color-coded to improve identification. For lane classification, a CNN model is trained to classify three types of markings, white dashed, solid white, and double yellow, reaching 97.5% training accuracy.
Keywords Lane type · Lane detection · Lane classification · Hough transform · CNN · Driver assistance

1 Introduction

Lane detection is critical for self-driving automobiles. To drive an autonomous vehicle properly, a thorough understanding of the surrounding environment is required, and accurate perception of the scene in front of the vehicle is the most crucial step. It involves detecting other cars, pedestrians, traffic signals, and road markings. Road markings help the vehicle keep to the proper lane. Lane boundary position and type are among the most important factors: they help the vehicle keep track of its driving path and avoid lateral collisions with adjacent vehicles on the road. As a result, many techniques incorporate lane border information or GPS-based localization to locate the car within the roadway. Convolutional neural networks (CNNs) are used to identify lane boundaries, as they are in many other computer vision applications. Nonetheless, the location of lane markings can also be extracted by detecting the lanes with other computer vision techniques. For the vehicle, reliable positioning alone may not be sufficient, because path planning and localization also require information about lane types. Lane boundary classification can provide real-time lane departure warnings and, in semi-automatic mode, can be used to check whether the driver needs assistance to overtake an adjacent vehicle. Road lane information is also useful for planning the vehicle trajectory and for positioning the vehicle on the road relative to the vehicles alongside it. Different types of lanes are carefully planned and positioned across the country to suit the environment; the type of marking used at each location is chosen so that people have the safest driving experience possible.

J. Madake (B) · D. Gupta · S. Bhatlawande · S. Shilaskar
Department of Electronics and Telecommunication (E&TC), Vishwakarma Institute of Technology, Pune, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_38

1.1 Different Types of Road Lanes and Their Purpose

(1) Dashed: This is the most common lane marking in the world. A broken white line allows you to change lanes, overtake, and make U-turns. However, you must first confirm that the route is clear and that the maneuver is safe.
(2) Solid White: A solid white line is more restrictive. You are not permitted to pass other cars or make U-turns on this route; keep going straight if you are on this sort of road. Crossing the line is permitted only in the event of an accident or to take a turn. These markings are most commonly found in steep places with a high risk of accidents.
(3) Double Yellow: A double continuous yellow line denotes that crossing it is strictly prohibited from both sides, so no overtaking, U-turns, or lane changes are allowed. This pattern is most common on risky two-lane highways with a high risk of accidents.

For the proposed lane detection system, a dataset was assembled for this work. It consists of 3600 images divided into three classes, white dashed, solid white, and double yellow, as shown in Fig. 1. Each class contains 1200 images, which are further divided into training and validation sets. The images have a resolution of 224 × 224 pixels.

2 Literature Review

This section discusses prior research on road lane detection and classification.


Fig. 1 Lane boundary types (dashed, single continuous, double yellow)

2.1 Traditional Approaches

Chiu et al. [1] discussed color, ridge features, and template matching, which most classical approaches use to extract a combination of highly specialized visual cues. These elementary features may also be merged using Hough transforms [2], Kalman filters [3], edges [4], and particle filters [5]. Most of these techniques are sensitive to changes in illumination and road-surface condition, which makes them prone to failure [6].

2.2 Deep Learning-Based Approaches

Lee et al. [7] proposed the vanishing point guided network (VPGNet) architecture, which can detect lanes and recognize road markings in a variety of weather conditions; their data was collected in downtown Seoul, South Korea. Segmentation methods for lane marker recognition fall primarily into two types: (1) semantic segmentation and (2) instance segmentation. In semantic segmentation, each pixel is assigned a binary label indicating whether or not it belongs to a lane. For example, in [8], the authors described CNN-based lane feature extraction that uses front-view and top-view image regions, followed by a global optimization phase to arrive at a set of precise lane lines. LaneNet [9] is another approach to lane detection. Instance segmentation, on the other hand, distinguishes the distinct instances of each class in an image and recognizes the separate segments of a line as one unit. To this end, [10] presented the spatial CNN architecture, which passes useful spatial details through the network to the classifier.


2.3 Approaches for Lane Type Classification

Although there are several techniques for lane monitoring, little effort has been put into recognizing lane marker types. Lane markers come in a variety of styles: road lane markings are often classified by their color and by whether they are dashed or solid segments, and a marking can consist of a single line or a double line. In [8], an approach for detecting road lanes that distinguishes between dashed and solid lane markers is given. Their method outperformed traditional lane detection methods [11]. Another method identifies five lane marking types: dashed, dashed-solid, double solid, solid-dashed, and single solid. Pizzati [12] used ERFNet [13], and some other approaches for lane classification are discussed in [14]. Another approach, which classifies road lane markings with the help of ridge features, is discussed in [15]. Li et al. [16] implemented a model based on image segmentation to extract road markings from a bird's-eye view; this model is effective under varying lighting conditions. Another approach, which detects and classifies road lane types under different traffic conditions using a CNN and an RNN, is discussed in [17]. Most existing lane classification methods are very complex. The method proposed in this paper is simpler than these previous approaches and gives the required output with good accuracy.

3 Methodology

This paper presents a lane detection and classification system. The system detects the lane and then classifies it by type, that is, dashed, single continuous, or double yellow. Figure 2 shows the complete block diagram of the proposed system, which consists of a camera and a processor-based system for detection and classification.

3.1 Data Acquisition and Image Preprocessing

The dataset includes 3600 images of different lane types captured in various scenarios. The author collected 40% of the images with a Samsung Galaxy M40 (32 MP camera), and the remaining 60% were obtained from the Internet. The distribution of images in the dataset is presented in Table 1. The collected images were resized to 224 × 224 pixels and then converted to grayscale. Figure 3 displays some sample images from the collected dataset with their class labels.


Fig. 2 Block diagram of lane type detection system (camera → processor-based system with lane detection and lane type classification → output: type of lane)

Table 1 Dataset details

Type of lane | No. of images
Dashed | 1200
Solid white | 1200
Double yellow | 1200
Total | 3600

3.2 Lane Boundary Detection

Image preprocessing is needed to extract the edge features of the image. Edge feature extraction consists of converting the RGB images to grayscale, smoothing them with a Gaussian blur, and detecting edges with the Canny operator. A region of interest (ROI) is then chosen to keep the required section of the image and separate it from other background details. Lane lines are then detected in the edge map using the Hough transform, which finds the straight lines on which the edge points lie. In the image plane, a line can be described by only two parameters, l and m, as in Eq. (1):

$$ y = l\,x + m \tag{1} $$


Fig. 3 Images of lanes with labels

Vertical lines, however, cannot be expressed in this form, because their slope is infinite. The Hough transform therefore uses Eq. (2), in which θ is the angle of the line's normal with the x-axis and d is the line's distance from the origin:

$$ d = x\cos\theta + y\sin\theta \tag{2} $$

For sin θ ≠ 0, Eq. (2) can be rearranged into Eq. (3), which has the same form as Eq. (1) with l = −cos θ / sin θ and m = d / sin θ:

$$ y = -\frac{\cos\theta}{\sin\theta}\,x + \frac{d}{\sin\theta} \tag{3} $$

With θ ∈ [0°, 180°] and d ∈ ℝ (or θ ∈ [0°, 360°] and d ≥ 0), every line in the image can be represented in this way.
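The voting idea behind Eq. (2) can be illustrated with a small accumulator. This is a didactic sketch, not a production implementation such as OpenCV's `cv2.HoughLines`: θ is sampled in whole degrees over [0°, 180°), d is rounded to bins of width `d_step`, and `hough_peak` is a hypothetical helper that returns only the single most-voted line.

```python
import math


def hough_peak(points, theta_step_deg=1, d_step=1.0):
    """Vote in a (theta, d) accumulator using d = x*cos(theta) + y*sin(theta)
    (Eq. 2) and return the (theta in degrees, d) cell with the most votes."""
    votes = {}
    for x, y in points:
        for t in range(0, 180, theta_step_deg):
            th = math.radians(t)
            d = x * math.cos(th) + y * math.sin(th)
            cell = (t, round(d / d_step))      # quantize d into bins of width d_step
            votes[cell] = votes.get(cell, 0) + 1
    (t, d_bin), _ = max(votes.items(), key=lambda kv: kv[1])
    return t, d_bin * d_step
```

For 100 points on the vertical line x = 5, the peak is at θ = 0° and d = 5, a line that Eq. (1) cannot express; for points on y = x, the peak is at θ = 135° and d = 0.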

Fig. 4 Mapping of line in Hough transform (a line in the x–y image plane maps to a single point in the Hough space)

Hence, the (θ, d) pairs span the Hough space for lines: a line l in the x–y image domain is represented by a single point with a unique pair of parameters (θ0, d0) in the Hough domain. Figure 4 illustrates this line-to-point mapping. In practice, the transform returns numerous straight lines, and not all of them belong to continuous lane markings, so they must be filtered and screened. The left and right lanes can be distinguished by the slope of the lines, and lines whose slope differs substantially from that of the majority can be removed. Finally, after screening, the remaining lines are fitted to two straight lines, one for the left lane and one for the right lane.

Algorithm 1: Lane detection
Input: Image
Output: Detected lane represented by lines
Initialization:
1. For each image, specify the image path
2. Resize the image (224 × 224)
3. Define the region of interest
4. Convert the image to grayscale
5. Apply the Canny edge detector
6. Perform Hough line detection (threshold = 16)
7. Merge the detected lines onto the original image
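The slope-based screening and the final fitting of one straight line per side can be sketched as follows. The fixed minimum-slope threshold (`min_abs_slope=0.3`) is a hypothetical simplification of the paper's rule of discarding lines whose slope deviates strongly from the majority, and segments are assumed to be `(x1, y1, x2, y2)` tuples in image coordinates, the format produced by `cv2.HoughLinesP`. The function names are hypothetical.

```python
def _least_squares(points):
    """Fit y = slope*x + intercept (the l and m of Eq. 1); None if undefined."""
    n = len(points)
    if n < 2:
        return None
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx


def fit_lanes(segments, min_abs_slope=0.3):
    """Screen Hough segments by slope, split them into left and right lanes,
    and fit one straight line to each side."""
    left, right = [], []
    for x1, y1, x2, y2 in segments:
        if x1 == x2:
            continue                       # vertical: no slope in y = l*x + m form
        slope = (y2 - y1) / (x2 - x1)
        if abs(slope) < min_abs_slope:
            continue                       # near-horizontal: discard as noise
        # Image y grows downward, so the left lane boundary has a negative slope.
        (left if slope < 0 else right).extend([(x1, y1), (x2, y2)])
    return _least_squares(left), _least_squares(right)
```

Fitting over the endpoints of all surviving segments, rather than picking the longest segment, lets short dashed-line pieces contribute to a single continuous lane estimate.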

3.3 Classification of Lane Boundary

The dataset of lane type images is used to train the classifier, which is implemented as a neural network. A neural network receives an input and, using its weights and biases, computes the output through a series of hidden layers, as shown in Fig. 5. Small neural networks work well for small classification tasks, but such small artificial neural networks do not scale up to large images, so a different neural network architecture called the


Fig. 5 Architecture of CNN

convolutional neural network (CNN) is used. CNNs scale up to larger image classification tasks. Their neurons are arranged in three dimensions, matching the layout of a three-channel RGB image. The CNN is built as a sequential model. The lane classification model has three convolution blocks, each followed by a max pooling layer. On top of them is a fully connected layer of 128 units with a ReLU activation function, and a dropout layer is added to reduce overfitting.

Algorithm 2: Lane classification
Input: Images from dataset
Output: Predictions on random images
Initialization:
1. For each class in the data
2. Specify the dataset path
3. For each image in the dataset
4. Resize the image (224 × 224)
5. Set the batch size (32)
6. Standardize the data by rescaling (1/255)
7. Create the sequential model
8. Compile the model
9. Train and save the model
10. For each image in the testing folder
11. Predict the class of the unseen image
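To make the size of this architecture concrete, the function below traces feature-map shapes and trainable-parameter counts. The paper does not give filter counts or kernel sizes, so 16, 32, and 64 filters with 3 × 3 kernels, 'same' padding, and 2 × 2 max pooling are assumptions, as is the three-class output layer; rescaling and dropout layers have no trainable parameters and are left out. `cnn_summary` is a hypothetical helper.

```python
def cnn_summary(size=224, channels=3, filters=(16, 32, 64), k=3,
                dense_units=128, classes=3):
    """Trace shapes and trainable-parameter counts through conv+max-pool blocks,
    a dense layer, and a class output layer (assumed sizes, see lead-in)."""
    layers, total = [], 0
    side, c_in = size, channels
    for f in filters:
        params = (k * k * c_in + 1) * f        # k*k*c_in weights + 1 bias per filter
        side //= 2                             # 'same' conv keeps size; pooling halves it
        layers.append((f"conv{f}+pool", (side, side, f), params))
        total += params
        c_in = f
    flat = side * side * c_in                  # length of the flattened feature map
    params = (flat + 1) * dense_units
    layers.append(("dense", (dense_units,), params))
    total += params
    params = (dense_units + 1) * classes
    layers.append(("output", (classes,), params))
    total += params
    return layers, total, flat
```

Under these assumptions the model has 6,446,627 trainable parameters for a 224 × 224 × 3 input, of which about 99.6% sit in the 128-unit dense layer that follows the flattened 28 × 28 × 64 feature map; a grayscale (single-channel) input changes only the first convolution.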

The complete working flow of the proposed lane type detection system is given in Fig. 6. This system merges the output of lane detection model and lane classification model to get final output.

Fig. 6 Lane detection and classification algorithm flowchart (stages: image preprocessing, Canny edge, region of interest, Hough transform, linear fitting, training of classification model, prediction based on trained model, output)

4 Result

The performance of the model is evaluated on the basis of its training and validation accuracy on the dataset. The classification model was trained on 80% of the images, and the remaining 20% were used for validation. The model was also tested on 10 unseen images and predicted the correct class for 9 of them. The model uses a batch size of 32 and was trained for 15 epochs, reaching 95.3% validation accuracy at epoch 15. The training results at different epochs are presented in Table 2. The shape and dynamics of a learning curve can be used to assess a machine learning model's behavior and, in turn, to suggest configuration changes that may improve learning and performance. Fig. 7 shows the accuracy curves obtained during training and validation of the model. It also shows


Table 2 Accuracy and loss

No. of epochs | Training loss | Training accuracy | Validation loss | Validation accuracy
1 | 0.46 | 0.82 | 0.16 | 0.93
5 | 0.095 | 0.96 | 0.14 | 0.94
10 | 0.065 | 0.96 | 0.20 | 0.94
15 | 0.054 | 0.97 | 0.11 | 0.95

the training and validation loss. Learning curves can be classified into three types: underfit, overfit, and good fit, and the curves observed for this system can be classified as a good fit. The proposed model is less complex than the models in the previous studies mentioned above for detecting and classifying road lane types, yet its accuracy is comparable (Table 3): 95.3%, against 96% for Pizzati [12], 95.21% for Zhang [17], and 78.07% for de Paula [14], although each method was evaluated on a different dataset. The model successfully detects and classifies the road lane boundaries, as shown in Fig. 8.
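As a worked check of the training budget implied by these settings, assume the 80/20 split is applied to all 3600 images and the last partial batch is kept (the paper states neither detail):

$$ N_{\text{train}} = 0.8 \times 3600 = 2880, \qquad N_{\text{val}} = 3600 - 2880 = 720 $$

$$ \text{training steps per epoch} = \left\lceil \tfrac{2880}{32} \right\rceil = 90, \qquad \text{validation batches} = \left\lceil \tfrac{720}{32} \right\rceil = 23 $$

$$ \text{total weight updates over 15 epochs} = 15 \times 90 = 1350 $$

At this scale, one percentage point of validation accuracy corresponds to about 7 of the 720 validation images.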

Fig. 7 Accuracy and loss graph

Table 3 Comparison with the previous methods

Method | Dataset | Resolution | Accuracy (%)
Pizzati [12] | 6408 images | 512 × 256 | 96
de Paula [14] | 10 video sequences | 480 × 640 | 78.07
Zhang [17] | 13,902 images | 80 × 160 | 95.21
Ours | 3600 images | 224 × 224 | 95.3

Fig. 8 Model successfully classifying and detecting lane

5 Conclusion

This study presents an approach that detects road lanes with the Hough transform and classifies lane types with a CNN, classifying lane boundaries with good accuracy. The primary function of road markings on a roadway is to direct and manage traffic; they serve as a backup to traffic signs. The markings act as a psychological barrier, indicating the boundaries of the traffic channel and its lateral clearance from traffic hazards for safe traffic flow. As a result, they are critical to a safe, smooth, and harmonious flow of traffic. The classification model achieves 97.5% training accuracy and 95.3% validation accuracy, and the detection module also gives good results. Advances in safety technology are laying the groundwork for sophisticated software-defined automated systems that can navigate highways with little to no human interaction, and the proposed system could contribute to passenger safety in such autonomous vehicles in the future.


References
1. Chiu KY, Lin SF (2005) Lane detection using color-based segmentation. In: IEEE Proceedings. Intelligent Vehicles Symposium, pp 706–711. IEEE
2. Chiu KY, Lin SF (2010) Illumination invariant lane color recognition by using road color reference & neural networks. In: The 2010 international joint conference on neural networks (IJCNN), pp 1–5. IEEE
3. Welch G, Bishop G (1995) An introduction to the Kalman filter, pp 127–132
4. Lee C, Moon J-H (2018) Robust lane detection and tracking for real-time applications. IEEE Trans Intell Transp Syst 19(12):4043–4048
5. Chopin N (2002) A sequential particle filter method for static models. Biometrika 89(3):539–552
6. Aminuddin NS, Ibrahim MM, Ali NM, Radzi SA, Saad WH, Darsono AM (2017) A new approach to highway lane detection by using Hough transform technique. J Inf Commun Technol 16(2):244–260
7. Lee S, Kim J, Shin Yoon J, Shin S, Bailo O, Kim N, Lee TH, Seok Hong H, Han SH, So Kweon I (2017) Vpgnet: Vanishing point guided network for lane and road marking detection and recognition. In: Proceedings of the IEEE international conference on computer vision, pp 1947–1955
8. He B, Ai R, Yan Y, Lang X (2016) Accurate and robust lane detection based on dual-view convolutional neutral network. In: 2016 IEEE intelligent vehicles symposium (IV), pp 1041–1046. IEEE
9. Wang Z, Ren W, Qiu Q (2018) Lanenet: Real-time lane detection networks for autonomous driving. arXiv preprint arXiv:1807.01726
10. Pan X, Shi J, Luo P, Wang X, Tang X Spatial as deep: Spatial CNN for traffic scene understanding. In: The thirty-second AAAI conference on artificial intelligence
11. Song W, Yang Y, Mengyin F, Li Y, Wang M (2018) Lane detection and classification for forward collision warning system based on stereo vision. IEEE Sens J 18(12):5151–5163
12. Pizzati F, Allodi M, Barrera A, García F (2019) Lane detection and classification using cascaded CNNs. arXiv preprint arXiv:1907.01294
13. Romera E, Alvarez JM, Bergasa LM, Arroyo R (2017) Erfnet: efficient residual factorized convnet for real-time semantic segmentation. IEEE Trans Intell Transp Syst 19(1):263–272
14. De Paula MB, Jung CR Real-time detection and classification of road lane markings. In: 2013 XXVI conference on graphics, patterns and images, pp 83–90. IEEE
15. Lopez A, Canero C, Serrat J, Saludes J, Lumbreras F, Graf T (2005) Detection of lane markings based on ridgeness and RANSAC. In: Proceedings 2005 IEEE intelligent transportation systems, pp 254–259. IEEE
16. Li Z, Cai ZX, Xie J, Ren XP (2012) Road markings extraction based on threshold segmentation. In: 2012 9th international conference on fuzzy systems and knowledge discovery, pp 1924–1928. IEEE
17. Zhang W, Liu H, Xuncheng W, Xiao L, Qian Y, Fang Z (2019) Lane marking detection and classification with combined deep neural network for driver assistance. Proc Inst Mech Eng Part D: J Automobile Eng 233(5):1259–1268

An Energy-Efficient Cluster Head Selection in MANETs Using Emperor Penguin Optimization Fuzzy Genetic Algorithm Fouziah Hamza and S. Maria Celestin Vigila

Abstract A mobile ad hoc network (MANET) comprises autonomous mobile nodes without any central control. Because of this lack of centralized control, an energy-efficient leader is required to run the intrusion detection system. To reduce energy consumption during intrusion detection, a new energy-efficient cluster head selection algorithm named emperor penguin optimization fuzzy genetic algorithm (EPO-FGA) is proposed. The proposed strategy combines the dynamic capability of the fuzzy genetic algorithm with the high search efficiency of emperor penguin optimization, which extends the lifetime of the mobile nodes. The emperor penguin algorithm first forms clusters based on characteristics such as mobile node speed, movement direction, and node position. A fuzzy genetic algorithm-based strategy then selects the best cluster head based on three major factors: node energy, node mobility, and node degree. The implementation is performed on the NS2 platform, and the results are evaluated in terms of packet delivery ratio, network energy utilization, throughput, and network lifetime. The simulation results show that the introduced strategy achieves a longer lifetime and lower energy consumption than existing methods. Keywords Cluster head · Clustering · Emperor penguin algorithm · Fuzzy logic · Genetic algorithm · MANET

F. Hamza (✉) · S. M. C. Vigila
Noorul Islam Center for Higher Education, Kanyakumari, Kumaracoil, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_39

1 Introduction Mobile ad hoc networks (MANETs) are self-organizing multihop wireless networks built from mobile devices [1]. Owing to their ease of deployment and low cost, MANETs are increasingly used in applications such as wartime communication, vehicular networks, rescue services, home networking, training, and recreation [2]. Because of their characteristics, MANETs are vulnerable to attacks including black holes, passive eavesdropping, flooding, active interference, denial of service, node impersonation, spoofing, and the modification and deletion of messages [3, 4]. An intrusion detection system (IDS) can detect and stop security threats in MANETs. Unlike traditional networks, however, MANETs lack a central point where an IDS can be deployed. It is therefore essential to have a clustering algorithm that selects a suitable node as cluster head (CH), which acts as a local coordinator and manages the IDS to monitor network functions and detect intrusions. A cluster is a collection of nodes, and a robust cluster formation and CH selection mechanism is necessary for efficient communication between neighboring nodes. The CH collects and transfers data in a MANET. MANET issues such as topology control, intrusion detection, and routing require a well-structured clustering system [5]. Cluster-based routing reduces routing overhead by forming groups and assigning a coordinator (CH) to each group. Clustering also enables hierarchical routing, in which paths are recorded between clusters rather than between individual nodes [6]. Clustering-based malicious node detection improves energy usage, packet delivery ratio (PDR), and latency in MANETs [7]. An efficient clustering scheme produces load-balanced CHs that adapt to topology dynamics [8]. Clustering can therefore improve MANET performance: with a sufficiently effective algorithm, the data transfer rate increases while delay decreases [9, 10]. Each cluster has a CH that gathers data from its members and forwards it to the other CHs [11]; the CH also keeps track of cluster members and topology. In the highest-degree clustering algorithm, a node's degree is the number of neighbors within its transmission range, and the most connected node is chosen as CH [12]. Besides improving bandwidth reuse and resource allocation, a CH node also improves power control [13].
The CH is changed periodically to save energy and to prevent connectivity problems caused by CH failure [14]. Clustering improves MAC protocol performance by improving spatial reuse, throughput, scalability, and power usage [15]. The number of CHs must be chosen carefully, because the nodes acting as CHs consume additional energy. The motivation for the cluster-based approach is to enhance the network's energy efficiency. The energy required for data communication is significantly greater than the combined energy required for detection and for listening on the control channel. Reducing the global data communication of mobile nodes is therefore the key to energy efficiency, and clustering decreases packet loss by minimizing global communication. The main contributions of this work are to:
• Develop a clustering algorithm based on a hybrid dynamic optimization algorithm.
• Create an efficient clustering algorithm based on emperor penguin optimization (EPO), which replicates the huddling behavior of emperor penguins.
• Create an optimal CH selection algorithm that combines fuzzy logic and a genetic algorithm (GA).
The remainder of the paper is organized as follows: Sect. 2 reviews the literature; Sect. 3 introduces the proposed approach; Sect. 4 presents experimental results and discussion; and Sect. 5 concludes the research.


2 Review of Literature This section briefly describes some of the most recent research on energy-efficient clustering-based routing. Ali et al. [16] proposed a multi-objective particle swarm optimization (MOPSO) algorithm whose primary goal was to create an ad hoc network with fewer clusters and less network traffic. In this scheme, the CH managed both inter-cluster and intra-cluster traffic, and a set of solutions was provided in a single run; the results were obtained using the Pareto-optimal front. Pathak and Jain [17] suggested an optimized stable clustering algorithm (OSCA) that improves network stability by minimizing CH changes and clustering overhead. In this method, a backup node is added to each cluster and serves as the cluster's CH when the original CH dies; by reducing clustering overhead, OSCA increases network stability. Rajeswari and Ravi [18] proposed He-SERIeS and investigated a heterogeneous MANET with multiple MAC layer protocols. The existing Secured Eed-Reflection-Induced-eState (SERIeS) algorithm has scaling and offloading issues, and the Heterogeneous SERIeS (He-SERIeS) algorithm was developed to address these shortcomings. This algorithm generates the CH and organizes nodes with similar requirements into different clusters. Drishya and Vijayakumar [19] suggested an energy-efficient robust clustering technique for MANETs. The cluster-based routing protocol (CBRP) is a well-known and well-proven MANET routing protocol. Because a cluster's stability depends on that of its CH, CH selection must be efficient for the CH to live longer; the proposed technique improved cluster formation and CH longevity. Popli et al. [20] introduced an efficient and secure CH selection algorithm that takes security into account during clustering. Only non-malicious nodes participate in CH selection, which improves security. A k-means approach maintains distributed clusters, and CHs are selected by considering parameters such as connectivity and residual energy. According to the simulation results, this approach outperforms the traditional lowest-identifier (LID) algorithm.

3 Proposed Methodology: Energy-Efficient CH Selection in MANET Using Emperor Penguin Optimization Fuzzy Genetic Algorithm In this study, a hybrid dynamic optimization-based clustering scheme is proposed to tackle the challenges of dynamic cluster formation in MANETs. The clustering is based on emperor penguin optimization (EPO), which replicates the huddling (grouping) behavior of emperor penguins to create efficient clusters. The fuzzy genetic algorithm (FGA) is then used to select the CH within each cluster formed. Figure 1 depicts the schematic design of energy-efficient CH selection in a MANET.

Fig. 1 Schematic diagram of energy-efficient CH selection in MANET (Start → (1) data generation based on network parameters: number of nodes, speed, transmission range, grid size → (2) EPO-based clustering based on similarity of features: location, direction, position → (3) FGA-based CH selection: node energy, node mobility, node degree → End)

3.1 System Model Figure 2 shows the schematic system model of the MANET, in which the mobile nodes are partitioned into several groups (clusters). Leader nodes are in charge of group administration and key management. The MANET, consisting of N independent mobile nodes, is modelled as an undirected graph G = (V, E), where V is the set of vertices (nodes) and E is the set of edges connecting pairs of vertices.


Fig. 2 Schematic system model for MANETs (legend: mobile nodes, cluster head)
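The system model does not state when two nodes share an edge in G = (V, E). The following minimal Python sketch assumes a unit-disk model, in which two nodes are connected when their Euclidean distance is within a common transmission range; the function names and the range value are hypothetical, not taken from the paper. The per-node degree it returns is the kind of quantity used later as the node degree input of the CH selection.

```python
import math
from typing import Dict, List, Set, Tuple

Position = Tuple[float, float]


def build_graph(positions: List[Position], tx_range: float) -> Tuple[Set[int], Set[Tuple[int, int]]]:
    """Return (V, E) of an undirected unit-disk graph (assumed edge rule)."""
    vertices = set(range(len(positions)))
    edges: Set[Tuple[int, int]] = set()
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            # Each undirected edge is stored once as (i, j) with i < j.
            if math.dist(positions[i], positions[j]) <= tx_range:
                edges.add((i, j))
    return vertices, edges


def node_degrees(vertices: Set[int], edges: Set[Tuple[int, int]]) -> Dict[int, int]:
    """Degree of each vertex = number of edges incident to it."""
    degree = {v: 0 for v in vertices}
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return degree
```

For example, three nodes at (0, 0), (1, 0) and (5, 0) with a range of 2 yield the single edge (0, 1) and the degrees {0: 1, 1: 1, 2: 0}.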

Fitness Function: The fitness function of the EPO algorithm is defined as:

$$\text{Fitness} = \text{Similarity of node features} \quad (1)$$

The features considered for clustering are the speed of the mobile nodes, their direction of movement, their positions, and the differences in their mobility.
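Eq. (1) does not specify how feature similarity is measured. As one possible reading, the sketch below scores two nodes by a weighted distance over speed, heading and position and maps it to a similarity in (0, 1]; a candidate cluster's fitness is the mean pairwise similarity. The feature tuple layout, the unit weights and the 1/(1 + d) mapping are all assumptions for illustration, not the authors' definition.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

# Hypothetical node feature tuple: (speed, heading in radians, x, y).
Node = Tuple[float, float, float, float]


def feature_distance(a: Node, b: Node, w: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted distance over speed, direction and position (assumed weights)."""
    d_speed = abs(a[0] - b[0])
    # Smallest angle between the two headings, in [0, pi].
    d_theta = abs(a[1] - b[1]) % (2 * math.pi)
    d_dir = min(d_theta, 2 * math.pi - d_theta)
    d_pos = math.hypot(a[2] - b[2], a[3] - b[3])
    return w[0] * d_speed + w[1] * d_dir + w[2] * d_pos


def similarity(a: Node, b: Node) -> float:
    """Map distance to a similarity in (0, 1]; 1 means identical features."""
    return 1.0 / (1.0 + feature_distance(a, b))


def cluster_fitness(nodes: Sequence[Node]) -> float:
    """Mean pairwise similarity of a candidate cluster (assumed form of Eq. (1))."""
    pairs = list(combinations(nodes, 2))
    if not pairs:
        return 1.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)
```

Because Sect. 3.2.3 treats the least fitness value as the best solution, an implementation could minimize 1 − `cluster_fitness` instead; that choice is also an assumption.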

3.2 Proposed EPO (Emperor Penguin Optimization)-Based Clustering This section explains the mathematical model of the huddling (crowding) behavior of emperor penguins. The primary goal of this model is to find an effective mover. First, the cluster boundary is created by the mobile nodes. Next, the direction around the cluster is evaluated. For better exploration and exploitation, the distance between mobile nodes is also calculated. Finally, the effective cluster and its boundary are obtained from the recomputed positions of the mobile nodes. Figure 3 depicts the pseudo-code for this approach.

3.2.1 Generation and Determination of Cluster Boundary

Initially, the mobile nodes are generated. During cluster boundary creation, the mobile nodes are normally positioned individually on a polygon-shaped grid boundary.


Fig. 3 Pseudo-code for EPO

3.2.2 Evaluation of Direction of Mobile Nodes

To conserve energy, the mobile nodes balance their direction inside the cluster. A node's direction governs its exploration and exploitation course across different positions, in the same way as an emperor penguin's direction in EPO. The direction of nodes throughout the cluster is computed as follows:

$$\text{direction}' = \text{direction} - \frac{\text{epochs}}{a - \text{epochs}} \quad (2)$$

$$\text{direction} = \begin{cases} 0, & \text{if } R > 1 \\ 1, & \text{if } R < 1 \end{cases} \quad (3)$$

where direction' is the updated direction, a is the current iteration, epochs is the maximum number of iterations, and R is the node radius.
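Eqs. (2) and (3) can be evaluated directly. The minimal sketch below follows them as written above; the case R = 1 is not covered by Eq. (3), so mapping it to 0 is an assumption, and the function names are hypothetical.

```python
def base_direction(radius: float) -> float:
    """Eq. (3): 0 if R > 1, 1 if R < 1 (R == 1 mapped to 0 here, an assumption)."""
    return 1.0 if radius < 1.0 else 0.0


def updated_direction(radius: float, a: int, epochs: int) -> float:
    """Eq. (2): direction' = direction - epochs / (a - epochs)."""
    if not 0 <= a < epochs:
        # The denominator a - epochs vanishes at a == epochs.
        raise ValueError("iteration a must satisfy 0 <= a < epochs")
    return base_direction(radius) - epochs / (a - epochs)
```

Since a − epochs is negative for every valid iteration, the second term is always added, and direction' grows as a approaches epochs; this is why the sketch only accepts a < epochs.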

3.2.3 Distance Calculation Among Mobile Nodes

Once the cluster boundary is created, the distance between the mobile nodes and the best optimal solution obtained so far is calculated. The node with the least fitness value is considered the best solution, and the remaining mobile nodes update their locations according to this current best result. The distance is defined as:

$$\vec{D}_{mn} = \text{Abs}\left( S(\vec{u}) \cdot \vec{p}(a) - \vec{v} \cdot \vec{p}_{mn}(a) \right) \quad (4)$$

where $\vec{D}_{mn}$ is the distance between a mobile node and the fittest search agent, and a is the current iteration. The vectors $\vec{u}$ and $\vec{v}$ are used to avoid collisions with other mobile nodes. $\vec{p}$ and $\vec{p}_{mn}$ denote the best optimal solution and the position vector of the mobile node, respectively. $S(\cdot)$ is the speed of the mobile nodes, which moves them toward the best search agent. The vectors $\vec{u}$ and $\vec{v}$ are evaluated as:

$$\vec{u} = \left( M \times \left( \text{direction}' + P_{grid}(\text{accuracy}) \right) \times \text{Random}() \right) - \text{direction}' \quad (5)$$

$$P_{grid}(\text{accuracy}) = \text{Abs}\left( \vec{p} - \vec{p}_{mn} \right) \quad (6)$$

$$\vec{v} = \text{Random}() \quad (7)$$

where M is the movement parameter that maintains the gap between search agents for collision avoidance (M = 2), $P_{grid}(\text{accuracy})$ is the polygon grid accuracy, and Random() returns a random number in the range [0, 1]. $S(\vec{u})$ is evaluated as:

$$S(\vec{u}) = \left( \sqrt{f \cdot e^{-a/l} - e^{-a}} \right)^2 \quad (8)$$

where e is the exponential function and f and l are control parameters in the ranges [2, 3] and [1.5, 2], respectively.
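Eqs. (4)–(8), together with the position update of Eq. (9) in Sect. 3.2.4, can be sketched for one node and one iteration as follows. This is a minimal Python sketch under stated assumptions: positions are coordinate lists, Random() is drawn per coordinate with `random.random` (range [0, 1)), S(·) follows Eq. (8) with f = 2.5 and l = 1.75 picked from the stated ranges, direction' comes from Eq. (2), and `speed` and `epo_move` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Sequence

M = 2.0  # movement parameter, value given in the text


def speed(a: int, f: float = 2.5, l: float = 1.75) -> float:
    """Eq. (8): S = (sqrt(f * e^(-a/l) - e^(-a)))^2.

    For f in [2, 3], l in [1.5, 2] and a >= 0 the radicand is positive.
    """
    return math.sqrt(f * math.exp(-a / l) - math.exp(-a)) ** 2


def epo_move(p_mn: Sequence[float], p_best: Sequence[float], direction_prime: float,
             a: int, rand: Callable[[], float] = random.random) -> List[float]:
    """One position update of a mobile node toward the best agent, Eqs. (4)-(9)."""
    s = speed(a)
    new_pos = []
    for x, best in zip(p_mn, p_best):
        p_grid = abs(best - x)                                          # Eq. (6)
        u = M * (direction_prime + p_grid) * rand() - direction_prime  # Eq. (5)
        v = rand()                                                     # Eq. (7)
        d = abs(s * best - v * x)                                      # Eq. (4)
        new_pos.append(best - u * d)                                   # Eq. (9)
    return new_pos
```

Passing a deterministic `rand` (for example `lambda: 0.5`) makes a single step reproducible, which is convenient for checking the arithmetic by hand.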

3.2.4 Updating Mobile Nodes Position

To update the next position of the mobile nodes, the following equation is evaluated:

$$\vec{p}_{mn}(a+1) = \vec{p}(a) - \vec{u} \cdot \vec{D}_{mn} \quad (9)$$

where $\vec{p}_{mn}(a+1)$ is the next updated position of the mobile node. The input parameters of the CH selection are described below.

• Node energy ($E_{node}$): A node with maximum energy is a strong candidate for CH; a node with medium energy is called an optimum node, and a node with minimum energy is called a weak node.
• Node degree ($D_{node}$): A node whose degree exceeds 10% is considered an excellent node, a node with a degree between 5 and 10% a good node, and a node with a lower degree a bad node.
• Node mobility ($M_{node}$): A node with relative mobility is called recommended, and a node with different mobility that moves in the same direction is called partially recommended.

The amount of nectar $z_i$ in a node is evaluated as:

$$z_i = E_{node} + D_{node} + M_{node} \quad (10)$$

where $z_i$ refers to the node to be explored. By the Euclidean criterion, the distance from one CH to the other CHs should be the same; this determines the ability of a node to become the CH. The output parameter of the fuzzy rule-based system is fitness, and each input is represented by three triangular membership functions, listed in Table 1. Figure 4 illustrates the fuzzy approach, which consists of a fuzzifier, a fuzzy inference engine, fuzzy rules, and a defuzzifier.
• Fuzzifier: Maps every input parameter into the equivalent fuzzy sets.
• Fuzzy inference engine: Applies the fuzzy rules to the fuzzified inputs to obtain the appropriate linguistic output; six linguistic levels are used for the output.
• Fuzzy rules: A series of IF-THEN rules.
• Defuzzifier: Maps the fuzzy output to a single crisp output value.

Table 1 Variables which are utilized as input with membership functions

| Input | Membership functions |
|---|---|
| Node energy | Maximum, Moderate, Minimum |
| Node mobility | Great, Average, Poor |
| Node degree | Large, Medium, Short |
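The node degree rule described above and Eq. (10) can be read crisply as follows. This sketch interprets "degree above 10%" as a node connected to more than 10% of the network's nodes and assumes the three inputs of Eq. (10) are already normalised to [0, 1]; both readings are assumptions, and the function names are hypothetical.

```python
def degree_category(degree: int, n_nodes: int) -> str:
    """Crisp node-degree class: share of the network a node is connected to (assumed reading)."""
    share = 100.0 * degree / n_nodes
    if share > 10.0:
        return "excellent"
    if share >= 5.0:
        return "good"
    return "bad"


def nectar(e_node: float, d_node: float, m_node: float) -> float:
    """Eq. (10): z_i = E_node + D_node + M_node (inputs assumed normalised to [0, 1])."""
    return e_node + d_node + m_node
```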


Fig. 4 Fuzzy logic for CH selection

The three inputs, energy, degree, and node mobility, transform the crisp system inputs into fuzzy sets. The energy membership functions are minimum, moderate, and maximum; the degree membership functions are large, medium, and short; and the mobility membership functions are great, average, and poor. The output variable (rank) has six linguistic levels: very high, high, medium, rather medium, low, and very low. Table 1 shows the input membership functions, and Table 2 lists the 27 rules combining the linguistic variables. Examples of the fuzzy rules:
Rule 1 (rule 19 in Table 2): If node energy is minimum, node mobility is poor, and node degree is short, then the rank is very low.
Rule 2 (rule 22): If node energy is minimum, node mobility is average, and node degree is short, then the rank is low.
Rule 3 (rule 13): If node energy is moderate, node mobility is average, and node degree is short, then the rank is rather medium.
Rule 4 (rule 17): If node energy is moderate, node mobility is great, and node degree is medium, then the rank is medium.
Rule 5 (rule 8): If node energy is maximum, node mobility is great, and node degree is medium, then the rank is high.
Rule 6 (rule 9): If node energy is maximum, node mobility is great, and node degree is large, then the rank is very high.
The triangular membership function used by the fuzzy rules is defined as:

$$f(x; u, v, w) = \max\left( \min\left( \frac{x-u}{v-u}, \frac{w-x}{w-v} \right), 0 \right) \quad (11)$$

where x is the crisp input value and u, v, and w are the lower foot, peak, and upper foot of the triangular membership function, respectively (u < v < w). By taking a node's energy, mobility, and degree into account, the ideal CH can be determined and the network lifetime extended. Determining the location and number of CHs is challenging. Common clustering algorithms in other research have benefited from heuristic methods, and the GA is very flexible and adaptive. A genetic algorithm determines the placement of the CH such that little energy is consumed. This model adopts a GA [21] to find the optimal values of the node parameters and choose the cluster head. Figure 5 depicts the flowchart of FGA.

Table 2 Input variables using fuzzy rules

| Rule | Node energy | Node mobility | Node degree | Rank |
|---|---|---|---|---|
| 1 | Maximum | Poor | Short | Rather medium |
| 2 | Maximum | Poor | Medium | Rather medium |
| 3 | Maximum | Poor | Large | High |
| 4 | Maximum | Average | Short | Medium |
| 5 | Maximum | Average | Medium | Medium |
| 6 | Maximum | Average | Large | High |
| 7 | Maximum | Great | Short | High |
| 8 | Maximum | Great | Medium | High |
| 9 | Maximum | Great | Large | Very high |
| 10 | Moderate | Poor | Short | Low |
| 11 | Moderate | Poor | Medium | Rather medium |
| 12 | Moderate | Poor | Large | Medium |
| 13 | Moderate | Average | Short | Rather medium |
| 14 | Moderate | Average | Medium | High |
| 15 | Moderate | Average | Large | High |
| 16 | Moderate | Great | Short | Medium |
| 17 | Moderate | Great | Medium | Medium |
| 18 | Moderate | Great | Large | High |
| 19 | Minimum | Poor | Short | Very low |
| 20 | Minimum | Poor | Medium | Low |
| 21 | Minimum | Poor | Large | Rather medium |
| 22 | Minimum | Average | Short | Low |
| 23 | Minimum | Average | Medium | Rather medium |
| 24 | Minimum | Average | Large | Medium |
| 25 | Minimum | Great | Short | Low |
| 26 | Minimum | Great | Medium | Medium |
| 27 | Minimum | Great | Large | High |
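The triangular membership function of Eq. (11) and the rule base of Table 2 can be combined into a small inference routine. The sketch below is illustrative only: the inputs are assumed normalised to [0, 1] with "great" mobility treated as the favourable end (as Table 2 ranks it), the triangle breakpoints and the crisp score of each rank are assumptions (only their ordering comes from the text), fuzzy AND is taken as the minimum, and defuzzification is a weighted average of rule scores (a zero-order Sugeno simplification, swapped in because the paper does not name its defuzzification method).

```python
from typing import List, Tuple


def tri(x: float, u: float, v: float, w: float) -> float:
    """Triangular membership function, Eq. (11); requires u < v < w."""
    return max(min((x - u) / (v - u), (w - x) / (w - v)), 0.0)


# Assumed breakpoints, identical for every input on a normalised [0, 1] scale:
# index 0 = minimum/poor/short, 1 = moderate/average/medium, 2 = maximum/great/large.
LEVELS: List[Tuple[float, float, float]] = [(-0.5, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.5)]

# Assumed crisp score per output level; only the ordering comes from the text.
SCORE = {"very low": 0.0, "low": 0.2, "rather medium": 0.4,
         "medium": 0.6, "high": 0.8, "very high": 1.0}

# Table 2 stored as RULES[energy][mobility][degree].
RULES = [
    [["very low", "low", "rather medium"],        # minimum energy, poor mobility (rules 19-21)
     ["low", "rather medium", "medium"],          # minimum energy, average mobility (rules 22-24)
     ["low", "medium", "high"]],                  # minimum energy, great mobility (rules 25-27)
    [["low", "rather medium", "medium"],          # moderate energy (rules 10-12)
     ["rather medium", "high", "high"],           # (rules 13-15)
     ["medium", "medium", "high"]],               # (rules 16-18)
    [["rather medium", "rather medium", "high"],  # maximum energy (rules 1-3)
     ["medium", "medium", "high"],                # (rules 4-6)
     ["high", "high", "very high"]],              # (rules 7-9)
]


def ch_rank(energy: float, mobility: float, degree: float) -> float:
    """Crisp CH rank in [0, 1] for inputs normalised to [0, 1]."""
    mu_e = [tri(energy, *p) for p in LEVELS]
    mu_m = [tri(mobility, *p) for p in LEVELS]
    mu_d = [tri(degree, *p) for p in LEVELS]
    num = den = 0.0
    for i in range(3):
        for j in range(3):
            for k in range(3):
                strength = min(mu_e[i], mu_m[j], mu_d[k])  # fuzzy AND as minimum
                num += strength * SCORE[RULES[i][j][k]]
                den += strength
    # Weighted average of the fired rules' scores.
    return num / den if den > 0.0 else 0.0
```

With these assumptions, a node at the top of all three scales fires only rule 9 and gets rank 1.0, while a node in the middle of all three fires only rule 14 and gets 0.8.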


Fig. 5 Flowchart for FGA


The initial population consists of randomly generated GA chromosomes and is used to identify the optimum node metrics. The three parameters, mobility, energy, and degree, are encoded directly as real values. The algorithmic steps are given below:
• Representation: Because the GA solves the optimization problem with real-valued variables, each chromosome is a straightforward vector. A chromosome is represented as X = {r1, r2, r3}, where r1, r2, and r3 denote the parameters node energy, mobility, and degree, respectively. The GA proceeds through selection, crossover, and mutation.
• Fitness value: The fitness is easy to evaluate with the fuzzy rules and is the most suitable criterion. The fitness evaluation module calculates the fitness of every individual chromosome (i.e., parameter set) using the results produced by the simulation unit. The fitness $F_{c_{ij}}$ of a chromosome $c_{ij}$ is calculated as:

$$F_{c_{ij}} = \mu \times \frac{1}{n}\sum_{i=1}^{n} \text{Energy}_i + \nu \times \frac{1}{\xi} \quad (12)$$

In the above equation, $c_{ij}$ is the ith chromosome in the jth generation; $\text{Energy}_i$ is the membership-function value of the network energy; μ and ν are weight factors; ξ is the average number of hops traversed by false reports; and n is the size of the network.
• Population initialization: The initial population consists of eight randomly generated chromosomes (i.e., nodes); a population size of eight is a trade-off between population diversity and convergence time.
• Fitness evaluation: The fitness value of each chromosome is calculated according to Eq. (12).
• Selection: A standard roulette-wheel approach selects eight chromosomes from the current population.
• Crossover: Simulated binary crossover is applied to randomly paired chromosomes, with a crossover probability of 0.8 for creating new chromosomes in each pair.


• Mutation: After crossover, this step determines whether each chromosome of the next generation should be mutated. Polynomial mutation is applied, and each node (chromosome) in the new population is mutated with a probability of 0.05.
• Elitist approach: The fitness values of the nodes in the new population are evaluated. If the reduced fitness is less than that of the former population, the node becomes the new solution.
• Stopping criterion: Steps four to eight are repeated until the specified number of generations is reached.
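The GA steps above can be sketched as a small real-coded GA. Assumptions, stated plainly: genes are normalised to [0, 1]; the fitness is maximised and must be non-negative with at least one positive value, because roulette-wheel selection uses it as a weight (the text's elitist step is phrased in terms of a reduced fitness, so the optimisation direction is itself an assumption); mutation is applied to a whole chromosome with probability 0.05; the distribution indices `eta` are common defaults, not values from the paper; and the population size must be even. `eq12_fitness` shows Eq. (12) itself, but because Eq. (12) needs simulation outputs (network energies and ξ), `run_ga` takes the fitness as a callable. All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Chromosome = List[float]   # [r1, r2, r3] = node energy, mobility, degree
LOW, HIGH = 0.0, 1.0       # assumed normalised gene bounds


def clip(x: float) -> float:
    return min(max(x, LOW), HIGH)


def eq12_fitness(energies: Sequence[float], xi: float, mu: float = 0.5, nu: float = 0.5) -> float:
    """Eq. (12): mu * mean(Energy_i) + nu * (1 / xi)."""
    return mu * sum(energies) / len(energies) + nu / xi


def sbx(p1: Chromosome, p2: Chromosome, rng: random.Random,
        eta: float = 15.0) -> Tuple[Chromosome, Chromosome]:
    """Simulated binary crossover, gene by gene."""
    c1, c2 = [], []
    for x1, x2 in zip(p1, p2):
        u = rng.random()
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        c1.append(clip(0.5 * ((1 + beta) * x1 + (1 - beta) * x2)))
        c2.append(clip(0.5 * ((1 - beta) * x1 + (1 + beta) * x2)))
    return c1, c2


def polynomial_mutation(c: Chromosome, rng: random.Random, eta: float = 20.0) -> Chromosome:
    """Polynomial mutation of every gene of a chromosome chosen for mutation."""
    out = []
    for x in c:
        u = rng.random()
        if u < 0.5:
            delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
        else:
            delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
        out.append(clip(x + delta * (HIGH - LOW)))
    return out


def run_ga(fitness: Callable[[Chromosome], float], generations: int = 30, pop_size: int = 8,
           pc: float = 0.8, pm: float = 0.05, seed: int = 0) -> Chromosome:
    """Roulette selection, SBX, polynomial mutation and elitism; returns the best chromosome."""
    rng = random.Random(seed)
    pop = [[rng.uniform(LOW, HIGH) for _ in range(3)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Roulette wheel: selection probability proportional to fitness.
        parents = rng.choices(pop, weights=[fitness(c) for c in pop], k=pop_size)
        children: List[Chromosome] = []
        for i in range(0, pop_size, 2):
            if rng.random() < pc:
                children.extend(sbx(parents[i], parents[i + 1], rng))
            else:
                children.extend([list(parents[i]), list(parents[i + 1])])
        pop = [polynomial_mutation(c, rng) if rng.random() < pm else c for c in children]
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
        else:
            # Elitism: the best chromosome so far replaces the worst one.
            worst = min(range(pop_size), key=lambda k: fitness(pop[k]))
            pop[worst] = list(best)
    return best
```

For instance, `run_ga(lambda c: 2.0 + c[0] - c[1] + c[2])` searches for a chromosome favouring high energy, low mobility and high degree under that toy fitness; in the proposed scheme the callable would instead wrap the simulation-based Eq. (12).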

4 Experimental Results and Discussion The experiments are carried out on the NS2 platform. The performance of the proposed approach is compared with existing strategies, namely Developed Distributed Energy-Efficient Clustering (DDEEC) and the stability-based multi-metric weighted clustering algorithm (SM-WCA), for networks of 100–500 nodes. Performance is assessed using network lifetime, energy consumption, packet delivery ratio, and throughput. As Fig. 6 shows, the proposed EPO-FGA approach consumes less energy than DDEEC and SM-WCA. It also achieves higher throughput than these approaches (Fig. 7). Finally, Figs. 8 and 9 show that the proposed strategy outperforms the other strategies in terms of packet delivery ratio and network lifetime.

5 Conclusion This research presented a new clustering-based CH selection algorithm named EPO-FGA, which forms clusters and chooses energy-aware CHs using a fuzzy logic strategy. A dynamic optimization algorithm-based clustering scheme is developed to resolve the dynamic clustering issues in MANETs: EPO, which replicates the huddling behavior of emperor penguins, is used to create efficient clusters. The evaluation of EPO-FGA analyzed node energy consumption, lifetime, and the usefulness of clustering, as well as the overall network energy. First, nodes with similar characteristics are grouped into several clusters by EPO. Then, the FGA approach elects the best CH from the appropriate nodes, with the hybrid FGA determining each node's chance of becoming CH. Simulations on the NS2 platform showed that the designed method performs better than the other methods. In future work, energy consumption and lifetime could be further improved for larger networks by optimizing the number of levels of the proposed scheme.

Fig. 6 Performance analysis of energy consumption

Fig. 7 Performance analysis of throughput


Fig. 8 Performance analysis of packet delivery ratio

Fig. 9 Performance analysis of network lifetime

References
1. Krishnan RS, Julie EG, Robinson YH et al (2020) Modified zone based intrusion detection system for security enhancement in mobile ad hoc networks. Wireless Netw 26:1275–1289
2. Usman M, Jan MA, He X, Nanda P (2020) QASEC: A secured data communication scheme for mobile Ad-hoc networks. Futur Gener Comput Syst 109:604–610
3. Hamza F, Maria Celestin Vigila S (2019) Review of machine learning-based intrusion detection techniques for MANETs. In: Peng SL, Dey N, Bundele M (eds) Computing and network sustainability. Lecture notes in networks and systems, vol 75. Springer, Singapore, pp 367–374
4. Ahmad M, Hameed A, Ullah F, Wahid I, Rehman SU, Khattak HA (2020) A bio-inspired clustering in mobile adhoc networks for internet of things based on honey bee and genetic algorithm. J Ambient Intell Humaniz Comput 11(11):4347–4361
5. Rajasekar S, Subramani A (2016) Performance analysis of cluster-based routing protocol for MANET using RNS algorithm. Int J Adv Research Com Sci Soft Eng 6(12):234–239
6. Rajeswari R, Kulothungan K, Ganapathy S, Kannan A (2016) Malicious nodes detection in MANET using back-off clustering approach. Circuits Syst 7(8):2070–2079
7. Pathak S, Jain S (2016) A novel weight based clustering algorithm for routing in MANET. Wirel Netw 22(8):2695–2704
8. Kaliappan M, Mariappan E, Prakash MV, Paramasivan B (2016) Load balanced clustering technique in MANET using genetic algorithms. Defense Sci J 66(3):251–258
9. Gavhale M, Saraf PD (2016) Survey on algorithms for efficient cluster formation and cluster head selection in MANET. Phys Procedia 78:477–482
10. Thirukrishna JT, Karthik S, Arunachalam VP (2018) Revamp energy efficiency in homogeneous wireless sensor networks using optimized radio energy algorithm (OREA) and power-aware distance source routing protocol. Futur Gener Comput Syst 81:331–339
11. Juliana R, Uma Maheswari P (2016) An energy efficient cluster head selection technique using network trust and swarm intelligence. Wirel Pers Commun 89(2):351–364
12. Rajkumar M, Subramanian S, Shankar S (2014) Internal and external factors based clustering algorithm for MANET. Int J Eng Technol (IJET) 6(4)
13. Sathiamoorthy J, Ramakrishnan B (2017) Energy and delay efficient dynamic cluster formation using hybrid AGA with FACO in EAACK MANETs. Wireless Netw 23(2):371–385
14. Sathiamoorthy J, Professor A, Ramakrishnan B (2016) Computer network and information security. Comput Netw Inf Secur 2:64–71
15. Mohapatra S, Siddappa M (2016) Improvised routing using border cluster node for Bee-AdHocC: an energy-efficient and systematic routing protocol for MANETs. In: 2016 IEEE international conference on advances in computer applications (ICACA), Coimbatore, India, pp 175–180
16. Ali H, Shahzad W, Khan FA (2012) Energy-efficient clustering in mobile ad-hoc networks using multi-objective particle swarm optimisation. Appl Soft Comput J 12(7):1913–1928
17. Pathak S, Jain S (2017) An optimised stable clustering algorithm for mobile ad hoc networks. EURASIP J Wirel Commun Netw 2017(1):1–11
18. Rajeswari P, Ravi TN (2018) He-SERIeS: An inventive communication model for data offloading in MANET. Egypt Inf J 19(1):11–19
19. Drishya SR, Vijayakumar V (2019) Modified energy-efficient stable clustering algorithm for mobile ad hoc networks (MANET). In: Advances in intelligent systems and computing, vol 740. Springer Verlag, pp 455–465
20. Popli R, Garg K, Batra S (2016) SECHAM: Secure and efficient cluster head selection algorithm for MANET. In: 2016 3rd international conference on computing for sustainable global development (INDIACom). IEEE, New Delhi, India, pp 1776–1779
21. Guerrero C, Lera I, Juiz C (2017) Genetic algorithm for multi-objective optimisation of container allocation in cloud architecture. J Grid Comput 16(1):113–135

Ground Water Quality Index Prediction Using Random Forest Model Veena Khandelwal and Shantanu Khandelwal

Abstract The present work uses machine learning to predict and assess the water quality index (WQI), which expresses overall water quality levels. The physicochemical parameters considered for the drinking water quality index are pH, calcium, magnesium, sulphate, chloride, nitrate, fluoride, total hardness, total alkalinity, iron and sodium (concentrations in mg/l). The physicochemical parameters for the irrigation water quality index are electrical conductivity (EC), residual sodium carbonate (RSC) and sodium adsorption ratio (SAR). WQI is predicted from yearly ground water quality data from 01 January 2000 to 01 January 2018, using Central Ground Water Board (CGWB) data for Jaipur in the state of Rajasthan, India. The data contain information from 118 ground water points/stations in the Ganga Basin. The IS-10500 (June 2015) and IS:11624-1986 (Reaffirmed 2001) limits are used for calculating WQI for drinking and irrigation purposes, respectively. Decision tree and random forest regression models were used to predict the water quality index from the ground water physicochemical parameters. The random forest model outperformed the decision tree model, achieving higher accuracy with an RMSE of 10.92 and an MAE of 7.16. Keywords Drinking water · Decision trees · Ground water · Irrigation purpose · Machine learning · Random forest · Water quality index

V. Khandelwal (✉) · S. Khandelwal
SRM Institute of Science and Technology, Delhi NCR Campus, Ghaziabad, India
KPMG Services Pvt. Ltd Singapore, Singapore, Singapore
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_40

1 Introduction Rajasthan, in northwestern India, occupies 10.4% of the country's land area, yet it has only 1.16% of the country's surface water (from streams, rivers and lakes) and 1.70% of its ground water, which is recharged by rainfall. This makes the state almost completely dependent on ground water, which is depleting fast according to a study conducted by CGWB in 2018: more than 62.70% of the water monitoring stations in Rajasthan show a decline in water levels, signifying a severe water crisis in the state. Ground water is an important source of livelihood in regions with scanty rainfall. Most districts of Rajasthan are parched during the peak summer months because of deficient rainfall. People in rural Rajasthan not only face severe drinking water problems, owing to short monsoon spells and scanty rainfall throughout the year, but also lose crops to drought and the non-availability of water. Ground water quality in the state is threatened by rapid urbanization and industrialization, over-abstraction, human-made pollutants such as detergents and chemical fertilizers, and naturally occurring toxic constituents such as fluoride, nitrate and uranium. Human activities such as deforestation, destruction of water systems, agricultural water use, contamination by chemicals such as fertilizers and pesticides, and landscape changes amplify these natural processes. Moreover, high levels of human-induced pollutants may interact with geological formations and processes beneath the surface, aggravating the release of natural contaminants confined in aquifer rocks. Despite the deteriorated ground water conditions, residents drink untreated ground water because no alternative resources are available. The problem of contaminated ground water in Rajasthan is "alarming". Worse, more than 20% of wells have water quality outside acceptable limits, making it unsafe for drinking. About 31.23% of Rajasthan's ground water is saline, as indicated by high EC values, and 30.92% has very high fluoride levels. Iron in water does not usually cause severe health impacts; its main issues involve taste, visual effects and clogging [1].
Excessive iron in water, however, leads to problems such as stained teeth, brown stains on toilet ware and the formation of free radicals in the human body [2]. Excessive fluoride can discolour teeth and lead to skeletal deformities, including lower-limb deformities, and can damage the gastrointestinal system. According to Coyte et al. [3], both anthropogenic pollutants and naturally occurring toxic contaminants are degrading ground water quality in Rajasthan. The arid to semi-arid climate, the geogenic contaminants of the area and the fact that most of the state's ground water is recharged by rainfall increase the likelihood of geogenic pollution. Furthermore, much of the state depends on ground water for drinking or irrigation, which worsens the situation. Water quality is becoming a grave concern because of increasing population density, urban sprawl and development, so the assessment of surface water quality has become a crucial issue. Water is the most important resource for life, essential for sustaining most living creatures and human beings. Living organisms need water of adequate quality to survive. The water quality of rivers and lakes is becoming central to human and economic growth. Public dissemination of information about drinking water quality is needed to protect the public against water-borne diseases. At present, the Central Ground Water Board has established monitoring stations to detect changes in ground water quality, but these stations alone cannot evaluate water quality. Nevertheless, the data they collect can contribute significantly as a basis for several predictive, data-driven models.

Ground Water Quality Index Prediction Using Random Forest Model

Researchers persistently warn that polluted ground water is life-threatening to drink without treatment. Therefore, modelling and assessing future water quality are essential for protecting the environment and for societal and economic development, and have become very important for controlling water pollution. The water quality index (WQI) is calculated from a large number of physicochemical parameters [4] and characterizes the overall quality of water in a comprehensible manner for different purposes. Estimating quality indices is a long and tedious process. The current study provides a cost-efficient approach for prompt and precise assessment of water quality. Water quality prediction can help improve the management of water resources. Precise water quality predictions can also provide a groundwork for policymakers and supply data that environmental management departments can use as an "early warning". Such research can not only reflect spatial variations in water quality accurately but also contribute significantly to rapid monitoring of water quality levels. The rest of this paper is organized as follows. Section 2 presents a literature survey of related work on ground water quality assessment using various machine learning methods. Section 3 describes the methodology used for calculating the water quality index, the variable importance calculated using random forest and decision tree models, and the classification of ground water quality for drinking and irrigation based on the analysis of ground water samples. Section 4 describes the models used to predict WQI for drinking water and irrigation water. Section 5 discusses the results of our experiments and the efficiency of our model, and concludes the paper.

2 Literature Review

Awadh and Al-Kilabi [5] collect 28 ground water samples after 10 min of pumping and classify water quality using the concentrations of various salts, applying the Richard diagram and the Turgeon and DON classification methods. Wang et al. [6] estimate WQI by combining a support vector regression model with remote sensing spectral indices computed using fractional derivative methods, and obtain optimal parameter values using particle swarm optimization (PSO). Abbasnia et al. [7] assess ground water quality analytically to check its suitability for drinking and agricultural purposes; such analytical techniques require expert judgement to assign weights to the physicochemical parameters when computing the WQI score. Banerji and Mitra [8] use principal component analysis and water quality index mapping to evaluate WQI. They find that the values of the physicochemical parameters tend to increase abruptly within short ranges, indicating urban pollution as the root cause of contamination.


V. Khandelwal and S. Khandelwal

The water quality of the Yuqiao reservoir in Tianjin is forecast using a backpropagation neural network in [9], where the Levenberg–Marquardt (LM) algorithm is used to achieve higher speed and a lower error rate. Two hybrid models, namely extreme gradient boosting (XGBoost) and random forest (RF), are used by Wang et al. [6] to predict six ground water quality indicators of the Tualatin River in Oregon, USA. Singha et al. [10] predict the entropy weight-based ground water quality index (EWQI) of the Mahanadi River Basin (Arang, Chhattisgarh) using the machine learning models RF, XGBoost, ANN and DL. Lu and Ma [11] use hybrid decision tree-based machine learning models, namely XGBoost and random forest, to predict the WQI of water from the Gales Creek site of the Tualatin River in Oregon, USA, and suggest also considering other factors, along with time series issues, that can affect water quality. Deeba et al. [12] assess the ground water quality of different areas in the Punjab region for drinking in terms of pH, colour, odour and suspended solids. These parameters are difficult for a layperson to interpret, and the perception of some of them, such as odour and colour, varies from individual to individual. According to Kayastha et al. [13], human activities such as excessive use of fertilizers and the discharge of unprocessed industrial waste into water bodies are the two major sources of ground water pollution; they increase the levels of arsenic and cadmium, which affect ground water quality to a great extent. Kumar and Sangeetha [14] study the water quality of the Madurai region using geospatial techniques. They collected 20 water samples from the area and analysed them for pH, TH, Ca²⁺, Cl⁻ and Mg²⁺ in the ground water.
They then created a water quality map using inverse distance weighting (IDW) interpolation and classified the entire area into classes such as excellent, good and poor. Najafzadeh et al. [15] evaluate ground water quality in parts of Iran for different uses with data-driven techniques, including evolutionary algorithms, applied to 1349 observations of ground water chemical parameters, and show that evolutionary polynomial regression performs better than the other data-driven methods. Unigwe and Egbueri [16] use the modified WQI, integrated WQI and entropy-weighted WQI together with statistical analysis to assess drinking water quality in regions of Nigeria, and find that the ground water is highly contaminated and cannot be used without treatment.

3 Methodology

3.1 Dataset

The study uses Central Ground Water Board (CGWB) data for Jaipur, in the state of Rajasthan, spanning 01 January 2000 to 01 January 2018. The data contain all physicochemical parameters from 118 ground water points/stations in the Ganga Basin. The IS-10500 (June 2015) and IS:11624-1986 (Reaffirmed 2001) limits are used for calculating the WQI for drinking and irrigation purposes, respectively.

Table 1 Weight and feature importance of different physicochemical parameters for drinking water quality

Component | IS-10500 (June 2015) limit (mg/l; pH unitless) | Weight | Feature importance (decision tree) | Feature importance (random forest)
pH | 7.5 | 0.029253 | 0.000544 | 0.000515
Ca | 75 | 0.002925 | 0.000169 | 0.000255
Mg | 30 | 0.007313 | 0.000339 | 0.000199
SO4 | 200 | 0.001097 | 0.000624 | 0.000253
Cl | 250 | 0.000878 | 0.000078 | 0.000239
NO3 | 45 | 0.004876 | 0.000337 | 0.000220
F | 1 | 0.219400 | 0.016071 | 0.011750
TH | 300 | 0.000731 | 0.000083 | 0.000304
TA | 200 | 0.001097 | 0.000314 | 0.000869
Fe | 0.3 | 0.731333 | 0.981245 | 0.984886
Na | 50 | 0.001097 | 0.000197 | 0.000509
Sum | | 1.000000 | 1.000000 | 1.000000

3.2 Variable Importance

Variable importance in ML models is studied in [17]. The input physicochemical parameters play a significant role because they determine the dependability and usability of the prediction model [18]. The importance of each input chemical parameter for calculating the WQI is computed using the decision tree and random forest prediction models and is shown in Tables 1 and 2. The decision tree model assigns the highest importance to iron, fluoride, SO4 and pH and the least importance to chloride. The random forest model assigns the highest importance to iron, fluoride, total alkalinity and pH and the least importance to magnesium. Total hardness and chloride have remarkably low relative importance compared with the other chemical parameters. Electrical conductivity has the highest feature importance for computing the irrigation water quality index.
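The ranking statements above can be checked directly against Table 1. The Python sketch below simply transcribes the Table 1 feature-importance values and sorts them; the helper name `rank` is hypothetical and not part of the authors' code, and the paper does not state which library produced the importances (in scikit-learn, for example, fitted tree and forest regressors expose them as `feature_importances_`).

```python
# Feature-importance values transcribed from Table 1 (drinking water).
DT = {"pH": 0.000544, "Ca": 0.000169, "Mg": 0.000339, "SO4": 0.000624,
      "Cl": 0.000078, "NO3": 0.000337, "F": 0.016071, "TH": 0.000083,
      "TA": 0.000314, "Fe": 0.981245, "Na": 0.000197}
RF = {"pH": 0.000515, "Ca": 0.000255, "Mg": 0.000199, "SO4": 0.000253,
      "Cl": 0.000239, "NO3": 0.000220, "F": 0.011750, "TH": 0.000304,
      "TA": 0.000869, "Fe": 0.984886, "Na": 0.000509}


def rank(importances):
    """Return parameter names sorted from most to least important."""
    return sorted(importances, key=importances.get, reverse=True)


for name, imp in (("decision tree", DT), ("random forest", RF)):
    order = rank(imp)
    print(f"{name}: top four {order[:4]}, least important {order[-1]}")
```

The sorted orders reproduce the text: Fe, F, SO4 and pH lead for the decision tree (Cl last), and Fe, F, TA and pH lead for the random forest (Mg last).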


Table 2 Weight and feature importance of different physicochemical parameters for irrigation water quality

Component | IS:11624-1986 (Reaffirmed 2001) | Weight | Feature importance (decision tree) | Feature importance (random forest)
EC | 7.5 | 0.000869 | 0.999272 | 0.998349
RSC | 75 | 0.868810 | 0.000391 | 0.001090
SAR | 30 | 0.130321 | 0.000337 | 0.000561
Sum | | 1.000000 | 1.000000 | 1.000000

3.3 WQI Calculation

WQI: The WQI is a rating that reflects the combined influence of several water quality parameters on overall water quality. Representing the chemical parameters as a water quality index converts them into information that is easily interpreted, understandable and more usable.

Parameter selection: We first study the Indian Standard IS-10500-91, Revised 2003, for the drinking water and irrigation water specifications. It provides the physicochemical parameters and their desirable limits.

Calculation of WQI: The WQI is calculated using the following steps:

1. A weight w_i is assigned to each selected chemical parameter (pH, Ca, Mg, SO4, Cl, NO3, F, TH, TA, Fe, Na) according to its relative importance in defining the overall quality of water for drinking, and to each irrigation parameter (EC, electrical conductivity; RSC, residual sodium carbonate; SAR, sodium adsorption ratio) according to its relative importance in defining the overall quality of water for irrigation.
2. The relative weight W_i of each chemical parameter is computed as

$$W_i = \frac{w_i}{\sum_{j=1}^{n} w_j} \qquad (1)$$

where W_i denotes the relative weight, w_i the weight of each parameter, and n the total number of parameters used.
3. A quality rating q_i is calculated for each parameter as

$$q_i = \frac{C_i}{S_i} \times 100 \qquad (2)$$

where q_i is the quality rating, C_i is the concentration of each chemical parameter in each water sample (mg/l), and S_i is the guideline value or desirable limit according to the specification.
4. The subindex SI_i of each chemical parameter is determined, and the subindices are summed to obtain the water quality index:

$$SI_i = W_i \times q_i \qquad (3)$$

$$WQI = \sum_{i=1}^{n} SI_i \qquad (4)$$

where SI_i denotes the subindex, W_i the relative weight, and q_i the rating based on the concentration of the i-th chemical parameter.
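Equations (1)–(4) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the limits and weights are transcribed from Table 1 (those weights already sum to 1, so Eq. (1) leaves them unchanged), pH is rated with the same C_i/S_i rule as Eq. (2), and the two samples are hypothetical rather than CGWB measurements.

```python
"""Weighted-arithmetic WQI following Eqs. (1)-(4)."""

# Desirable limits S_i from Table 1 (mg/l; pH is unitless).
LIMITS = {"pH": 7.5, "Ca": 75, "Mg": 30, "SO4": 200, "Cl": 250, "NO3": 45,
          "F": 1, "TH": 300, "TA": 200, "Fe": 0.3, "Na": 50}
# Weights from Table 1.
WEIGHTS = {"pH": 0.029253, "Ca": 0.002925, "Mg": 0.007313, "SO4": 0.001097,
           "Cl": 0.000878, "NO3": 0.004876, "F": 0.219400, "TH": 0.000731,
           "TA": 0.001097, "Fe": 0.731333, "Na": 0.001097}


def relative_weights(w):
    """Eq. (1): W_i = w_i / sum_j w_j."""
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}


def quality_rating(c, s):
    """Eq. (2): q_i = (C_i / S_i) * 100."""
    return c / s * 100.0


def wqi(sample, limits=LIMITS, weights=WEIGHTS):
    """Eqs. (3)-(4): WQI is the sum of subindices SI_i = W_i * q_i."""
    rel = relative_weights(weights)
    return sum(rel[p] * quality_rating(sample[p], limits[p]) for p in limits)


# Hypothetical samples: every parameter exactly at its limit, then iron doubled.
at_limit = dict(LIMITS)
print(round(wqi(at_limit), 2))              # 100.0
print(round(wqi(dict(LIMITS, Fe=0.6)), 2))  # 173.13
```

A sample exactly at every limit scores 100, and doubling only iron adds W_Fe × 100 ≈ 73 points, which shows why the heavily weighted Fe and F parameters dominate the index.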

4 Drinking Water WQI and Irrigation Water WQI Prediction

Using the decision tree and random forest models, we predict the WQI for drinking and irrigation purposes. The classifications of the calculated WQI values are shown in Tables 3 and 4. The regression random forest uses 500 trees with the out-of-bag (OOB) score enabled (OOB score = true). Table 5 shows the RMSE and MAE of the WQI predictions for drinking and irrigation purposes. The RMSE values obtained with the regression random forest are lower than those of the decision tree regressor for both drinking water (11.53 vs. 14.61) and irrigation water (10.92 vs. 18.87). This shows that the regression random forest model is a good predictor of WQI values when extreme values of Fe, F and electrical conductivity are not taken into consideration. Figure 1 shows the calculated and predicted WQI for both drinking and irrigation purposes.

Table 3 Classification of ground water quality for drinking purposes according to the water quality index

S. No. | WQI value | Class | Quality | No. of samples | Fraction of samples
1 | < 50 | I | Excellent | 65 | 0.13
2 | 50–100 | II | Good | 94 | 0.19
3 | 101–200 | III | Poor | 129 | 0.26
4 | 201–300 | IV | Very poor | 59 | 0.12
5 | > 300 | V | Unsuitable | 152 | 0.30
Total | | | | 499 | 1.00

Table 4 Classification of ground water quality for irrigation purposes according to the water quality index

S. No. | WQI value | Class | Quality | No. of samples | Fraction of samples
1 | < 150 | I | None | 0 | 0.00
2 | 150–300 | II | Slight | 2 | 0.01
3 | 300–450 | III | Moderate | 1 | 0.00
4 | > 450 | IV | Severe | 351 | 0.99
Total | | | | 354 | 1.00

Table 5 RMSE and MAE errors in WQI prediction

Model | Drinking water WQI RMSE | Drinking water WQI MAE | Irrigation water WQI RMSE | Irrigation water WQI MAE
Decision tree | 14.61 | 8.66 | 18.87 | 9.27
Random forest | 11.53 | 7.77 | 10.92 | 7.16

Fig. 1 Calculated and predicted water quality index using regression random forest for (a) drinking water and (b) irrigation water
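The class bands in Tables 3 and 4 can be expressed as simple threshold functions, and the "Fraction of samples" columns follow from the sample counts. The Python sketch below is illustrative only: the handling of boundary values the tables leave ambiguous (WQI between 100 and 101 for drinking water, and exactly 300 for irrigation) is an assumption noted in the comments.

```python
def drinking_class(wqi):
    """Drinking-water class from Table 3."""
    if wqi < 50:
        return "Excellent"
    if wqi <= 100:
        return "Good"
    if wqi <= 200:  # Table 3 lists 101-200; values in (100, 101) also land here (assumption)
        return "Poor"
    if wqi <= 300:  # Table 3 lists 201-300
        return "Very poor"
    return "Unsuitable"


def irrigation_class(wqi):
    """Irrigation restriction class from Table 4."""
    if wqi < 150:
        return "None"
    if wqi < 300:  # Table 4 lists 150-300 and 300-450; 300 itself goes to Moderate (assumption)
        return "Slight"
    if wqi <= 450:
        return "Moderate"
    return "Severe"


def fractions(counts):
    """Share of samples per class, rounded to two decimals as in Tables 3 and 4."""
    total = sum(counts.values())
    return {k: round(v / total, 2) for k, v in counts.items()}


table3 = {"Excellent": 65, "Good": 94, "Poor": 129, "Very poor": 59, "Unsuitable": 152}
table4 = {"None": 0, "Slight": 2, "Moderate": 1, "Severe": 351}
print(fractions(table3))
print(fractions(table4))
```

Recomputing the fractions from the counts reproduces both tables, including the 0.99 share of irrigation samples in the "Severe" class.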

5 Conclusion

This study applies two ML models, decision tree and regression random forest, to ground water quality prediction and compares the prediction results of the random forest model with those of the decision tree model. The WQI prediction accuracy of the two models is compared using MAE and RMSE. The results clearly show that the random forest model outperforms the decision tree model, achieving RMSE 11.53 and MAE 7.77 for drinking water WQI and RMSE 10.92 and MAE 7.16 for irrigation water WQI. The results further confirm that the random forest and decision tree models can effectively identify the importance of the input parameters in the WQI computation. The random forest model can be applied effectively to assess ground water quality for both drinking and irrigation purposes in the present study area. The present study uses a dataset from Jaipur district in Rajasthan; datasets from other districts of Rajasthan and other states of India could provide deeper insight into ground water quality in the region.
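Because the comparison rests on RMSE and MAE, a short Python sketch of both metrics may help. The paper does not give its evaluation code, and the calculated and predicted WQI values below are hypothetical. RMSE squares each error before averaging, so it is always at least as large as MAE and penalizes large misses more heavily; the wide gap between the decision tree's RMSE and MAE for irrigation WQI (18.87 vs. 9.27) is consistent with a few large errors.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical calculated vs. predicted WQI values (not the study's data).
calculated = [45.0, 120.0, 310.0, 80.0]
predicted = [50.0, 110.0, 300.0, 80.0]
print(rmse(calculated, predicted), mae(calculated, predicted))  # 7.5 6.25
```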


References

1. Chaturvedi S, Dave PN (2012) Removal of iron for safe drinking water. Desalination 303:1–11
2. Ajayi O, Omole D, Emenike PC (2016) Use of agricultural wastes and limestone for the removal of iron from drinking water
3. Coyte RM, Singh A, Furst KE, Mitch WA, Vengosh A (2019) Co-occurrence of geogenic and anthropogenic contaminants in groundwater from Rajasthan, India. Sci Total Environ 688:1216–1227. https://doi.org/10.1016%2Fj.scitotenv.2019.06.334
4. Dede OT, Telci IT, Aral (2013) The use of water quality index models for the evaluation of surface water quality: a case study for Kirmir Basin, Ankara, Turkey. Water Quality Exposure Health 5(1):41–56. https://doi.org/10.1007
5. Awadh SM, Al-Kilabi JAH (2014) Assessment of groundwater in Al-Hawija (Kirkuk governorate) for irrigation purposes. Iraqi J Sci 55(2B):760–767
6. Wang X, Zhang F, Ding J (2017) Evaluation of water quality based on a machine learning algorithm and water quality index for the Ebinur lake watershed, China. Sci Rep 7(1). https://doi.org/10.1038
7. Abbasnia A, Yousefi N, Mahvi AH, Nabizadeh R, Radfard M, Yousefi M, Alimohammadi M (2018) Evaluation of groundwater quality using water quality index and its suitability for assessing water for drinking and irrigation purposes: case study of Sistan and Baluchistan province (Iran). Human Ecol Risk Assess: An Int J 25(4):988–1005. https://doi.org/10.1080
8. Banerji S, Mitra D (2019) Geographical information system-based groundwater quality index assessment of northern part of Kolkata, India for drinking purpose. Geocarto Int 34(9):943–958
9. Zhao Y, Nan J, Cui F-Y, Guo L (2007) Water quality forecast through application of BP neural network at Yuqiao reservoir. J Zhejiang Univ-Sci A 8(9):1482–1487
10. Singha S, Pasupuleti S, Singha SS, Singh R, Kumar S (2021) Prediction of groundwater quality using efficient machine learning technique. Chemosphere 276:130265. https://doi.org/10.1016
11. Lu H, Ma X (2020) Hybrid decision tree-based machine learning models for short-term water quality prediction. Chemosphere 249:126169
12. Deeba F, Abbas N, Butt M, Irfan M (2019) Ground water quality of selected areas of Punjab and Sind provinces, Pakistan: chemical and microbiological aspects. Chem Int 5(4):241–246
13. Kayastha V, Patel J, Kathrani N, Varjani S, Bilal M, Show PL, Kim S-H, Bontempi E, Bhatia SK, Bui X-T (2022) New insights in factors affecting ground water quality with focus on health risk assessment and remediation techniques. Environ Res 212:113171
14. Kumar S, Sangeetha B (2020) Assessment of ground water quality in Madurai city by using geospatial techniques. Groundwater Sustain Develop 10:100297
15. Najafzadeh M, Homaei F, Mohamadi S (2022) Reliability evaluation of groundwater quality index using data-driven models. Environ Sci Pollution Res 29(6):8174–8190
16. Unigwe CO, Egbueri JC (2022) Drinking water quality assessment based on statistical analysis and three water quality indices (MWQI, IWQI and EWQI): a case study. Environ Dev Sustain pp 1–22
17. Specification ISDW (2012) IS 10500 (2012). Bureau of Indian Standards
18. Gültekin B, Şakar BE (2018) Variable importance analysis in default prediction using machine learning techniques. In: Proceedings of the 7th international conference on data science, technology and applications. SCITEPRESS - Science and Technology Publications. https://doi.org/10.5220

Near Threshold Operation Based a Bug Immune DET-FF for IoT Applications

Sumitra Singar and Raghuveer Singh Dhaka

Abstract Because it latches input on both edges of the clock, the dual edge triggered flip-flop (DET-FF) can improve energy efficiency compared with the single edge triggered flip-flop (SET-FF). The proposed bug immune dual edge triggered flip-flop (BI_DET_FF) design is intended to meet the requirements of reliable operation at low, near-threshold supply voltages. The BI_DET_FF reduces power consumption and area, increases the speed and performance of the device, and is designed to produce bug-free output. The BI_DET_FF design is modeled and verified in SPICE over a 0.4–1 V supply range. Delay and power analyses were performed in the near threshold and super threshold regions in 130, 90, 65, 45, 32, 22 and 16 nm technologies. Evaluation and comparison with existing designs show that the proposed BI_DET_FF can operate at a supply voltage as low as 0.4 V at temperatures from −60 to 100 °C while maintaining a very low power delay product (PDP) and high energy efficiency. This design may be used in energy-constrained IoT applications.

Keywords Clock distribution network · Dual edge triggered · System bugs · Near threshold designs · Voltage scaling

1 Introduction

Flip-flops are widely used to store data and to synchronize digital systems. The bug tolerance and performance of devices are strongly affected by the reliability, speed and power consumption of their flip-flops. There is therefore a demand for flip-flops that reduce power consumption, area and delay and are reliable enough to withstand bugs. With high operating frequencies and shrinking VLSI chip sizes, power consumption has become a major concern, and high energy consumption in CMOS circuits can limit the operating frequency. Lowering the supply voltage to near the threshold voltage (Vth) of the circuit can substantially reduce power consumption [1]. Sub-threshold circuits, however, have some limitations [1]. Reducing the supply voltage into the sub-threshold region lowers power further, which makes it suitable for some very low power designs [2]. Scaling the operating voltage into the near threshold voltage (Vth-n) region also reduces power consumption, but at a performance cost for many operations [3, 4]. A large part of the dynamic power is consumed by the synchronous (clocked) elements of a system. Many researchers have proposed flip-flops with high energy efficiency and low power consumption [5–8]. Voltage scaling is a simple but widely used technique for improving the energy efficiency of circuits [9], and it is broadly adopted in energy-constrained IoT applications [10, 11] and battery-operated miniature sensors [12–14]. Near threshold voltage circuit designs require additional circuit margins compared with standard circuit designs [15]. Vth-n designs, with a supply voltage slightly above the threshold voltage of the MOS transistors, represent a favourable trade-off between power consumption and performance in low power VLSI circuits [16]. The Vth-n region offers better performance than the sub-threshold region while still reducing power consumption substantially [17]. This has drawn researchers to Vth-n computing.

S. Singar (corresponding author)
Bhartiya Skill Development University, Jaipur, Rajasthan, India

R. S. Dhaka
Thapar Institute of Engineering and Technology, Patiala, Punjab, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_41
The work in [18] studied the effect of Vth-n operation on delay across technologies and showed that delay increases as the threshold voltage increases and decreases in newer technologies with smaller feature sizes. A typical processor contains a large number of flip-flops [19], so in IoT applications the energy consumption of flip-flops has a large impact on system efficiency.
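The voltage-scaling argument rests on the standard switching-power relation P = αCV²f, in which dynamic power falls quadratically with supply voltage. The Python sketch below evaluates that relation for a hypothetical 10 fF switched capacitance at 500 MHz (the clock frequency later used in Section 4). The capacitance and activity factor are illustrative assumptions, and leakage power, which becomes significant in near-threshold operation, is ignored.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Switching power P = alpha * C * Vdd**2 * f in watts (leakage ignored)."""
    return alpha * c_load * vdd ** 2 * freq


C_LOAD = 10e-15  # hypothetical 10 fF switched capacitance
FREQ = 500e6     # 500 MHz clock

for vdd in (1.0, 0.7, 0.4):
    p = dynamic_power(1.0, C_LOAD, vdd, FREQ)
    print(f"Vdd = {vdd:.1f} V -> {p * 1e6:.2f} uW")
```

Scaling from 1 V to 0.4 V cuts switching power to 16% of its original value; the cost, as noted above, is longer gate delays.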

2 Existing Work

Reducing power consumption in IoT integrated circuits is important for improving battery life [20]. To reduce power consumption and improve the performance of digital systems, researchers are currently focusing on DET flip-flops. The dual data rate (DDR) DET-FF has a low clock load because of its simple, compact configuration [21]. The robust C-element-based DDR_DET_FF shown in Fig. 1 uses the clock signals to connect the data, reducing clock power consumption without any additional circuitry [21]. The DET flip-flop offers the same data rate as the SET flip-flop at 50% of the clock frequency, which can reduce the energy consumption of logic designs [22]. The LG_C_DET_FF design [23] offers substantial improvements in the power dissipated because of bugs at the input node; this bug-free design is built from three C-element circuits, as shown in Fig. 2. In [24], a true single phase clocking (TSPC) scheme is used to improve overall performance, reduce clock distortion problems and obtain reliable operation at near threshold voltage. For the proposed design, we use a three-transistor 2P-1N structure [25] and a four-transistor C-element circuit [26]. In [27], a DET D-type flip-flop with a half static clock gating circuit (HSCG_DET) was developed. It uses two latches to capture and hold the data signal on both the falling and rising edges of the clock, with the falling and rising edges of the gated clock linked in parallel to a semi-static clock [27]. A balanced D flip-flop with an integrated true random number generator (TRNG) has also been proposed; according to its simulation results, it provides good randomization with low power per bit [28]. As the need for safety-critical applications grows, researchers have proposed various radiation-hardened flip-flops to improve design reliability. To obtain both soft error tolerance and testability, a delay adjustable self-testable flip-flop (DAST-FF) with two additional multiplexers, one for scanning and one for delay verification, was proposed [29].

Fig. 1 DDR_DET_FF

Fig. 2 LG_C_DET_FF
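The claim that a DET flip-flop matches the SET data rate at half the clock frequency can be checked by counting clock edges. The Python sketch below samples an idealized, noise-free square-wave clock (a hypothetical waveform with an even period measured in simulation steps) and counts data-capture events and clock transitions. It illustrates only the edge-counting argument and does not model either circuit.

```python
def square_wave(period, length):
    """Sampled 0/1 clock: low for the first half of each period, high for the second."""
    return [1 if t % period >= period // 2 else 0 for t in range(length)]


def edges(clock):
    """Indices of rising and falling transitions in a sampled clock."""
    rising = [t for t in range(1, len(clock)) if clock[t - 1] == 0 and clock[t] == 1]
    falling = [t for t in range(1, len(clock)) if clock[t - 1] == 1 and clock[t] == 0]
    return rising, falling


STEPS = 81
set_rise, set_fall = edges(square_wave(8, STEPS))   # SET-FF clock at frequency f
det_rise, det_fall = edges(square_wave(16, STEPS))  # DET-FF clock at f / 2

# Data-capture events: the SET-FF samples on rising edges only, the DET-FF on both.
set_captures = len(set_rise)
det_captures = len(det_rise) + len(det_fall)
# Every clock transition charges or discharges the clock network.
set_toggles = len(set_rise) + len(set_fall)
det_toggles = len(det_rise) + len(det_fall)
print(set_captures, det_captures, set_toggles, det_toggles)
```

Both flip-flops capture data 10 times over this window, but the DET clock makes half as many transitions, which is why halving the clock frequency can reduce clock-network switching power.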

3 New Bug Immune DET-FF

In this article, we design a new bug immune dual edge triggered flip-flop (BI_DET_FF). Figure 3 presents the new design, which integrates two 2P-1N structures and a C-element circuit. The proposed circuit uses much less power and produces bug-free output, as described below. The BI_DET_FF operates as follows: (1) a bug is either corrected or reaches the output Q; (2) if a bug does reach output Q, it is finally corrected through the feedback circuit.

First, assume that the initial state of nodes M and N in Fig. 3 is M = 0 and N = 0, with X = 1, Y = 1, output Q = 1 and clk = 0. If a bug arrives from the preceding combinational circuit, M changes from 0 to 1. In the first 2P-1N structure, M = 1 and clk = 0, so its output becomes zero (X = 0); in the second 2P-1N structure, M = 1 and clk = 1, so its output is also zero (Y = 0). Both inputs of the C-element are now zero (X = 0, Y = 0), so its output takes that input value (N = 0) and Q remains 1. Hence there is no difference between the previous and current outputs of the BI_DET_FF.

Next, assume that the initial state is M = 1 and N = 1, with X = 0, Y = 0, output Q = 0 and clk = 0. If a bug occurs at M, M changes from 1 to 0. In the first 2P-1N circuit, M = 0 and clk = 0, so its output is one (X = 1); in the second 2P-1N circuit, M = 0 and clk = 1, so its output is zero (Y = 0). The C-element inputs are now X = 1 and Y = 0, so its output remains one (N = 1) and the final output stays zero (Q = 0). Again, there is no change between the previous and current outputs of the BI_DET_FF. In effect, the false input is erased without any degradation in efficiency, quality, timing, area or power. The proposed BI_DET_FF is therefore completely immune to such bugs and operates at high speed.

Fig. 3 BI_DET_FF
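Both walk-throughs rely on the hold behaviour of the C-element: its output changes only when both inputs agree. The minimal Python model below treats an idealized Muller C-element as a behavioural abstraction; it does not represent the paper's transistor-level circuit or its 2P-1N stages, and the function names are hypothetical. It shows how a transient on a single input is masked while a genuine transition on both inputs propagates.

```python
def c_element(x, y, prev):
    """Muller C-element: output follows the inputs when they agree, otherwise holds."""
    return x if x == y else prev


def run(xs, ys, initial=0):
    """Apply the C-element to paired input sequences and return the output trace."""
    out, state = [], initial
    for x, y in zip(xs, ys):
        state = c_element(x, y, state)
        out.append(state)
    return out


# A one-step glitch on X alone is masked.
print(run([0, 1, 0, 0], [0, 0, 0, 0]))  # [0, 0, 0, 0]
# A genuine transition on both inputs propagates.
print(run([0, 1, 1, 1], [0, 0, 1, 1]))  # [0, 0, 1, 1]
```

The second walk-through above is the hold case, `c_element(1, 0, 1)`, which keeps N = 1.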

4 Result Analysis

The proposed BI_DET_FF was simulated with a SPICE simulator using PTM 130, 90, 65, 45, 32, 22 and 16 nm CMOS technology models [30], with the supply voltage varied from 0.4 to 1 V and the clock frequency set to 500 MHz. The performance results at 16 nm are given in Table 1. The delay, average power consumption and power delay product (PDP) were calculated for all the DET_FF designs considered. Compared with the existing DET_FF designs (DDR_DET, TSPC_DET, LG_C_DET, FS_TSPC_DET and HSCG_DET), the proposed flip-flop (BI_DET_FF) has the lowest power consumption and PDP and the smallest transistor count, with a low propagation delay comparable to that of the TSPC-based designs. We evaluated the delay of the proposed design in the near threshold region (Fig. 4) and the super threshold region (Fig. 5) for different technology nodes. The simulation results show that the delay decreases with more advanced technology and with increasing supply voltage. Delays are high in the near threshold region (0.4–0.7 V) and low in the super threshold region (0.8–1 V). Figures 6 and 7 show the average power consumption of the proposed design in the near threshold and super threshold regions for different technologies. Power consumption decreases with more advanced technology and as the supply voltage decreases: it is low in the near threshold region (0.4–0.7 V) and high in the super threshold region (0.8–1 V). Figures 8 and 9 show that the power delay product decreases with more advanced technologies, as expected. The PDP is high in the near threshold region and low in the super threshold region.

Table 1 Result analysis of different DET-FF designs

DET-FF design | Type | Transistors | Clock phase | Pavg.cons. (µW) | tp(D-Q) (ps) | tp(CLK-Q) (ps) | PDP (fJ)
DDR_DET [21] | C-element | 28 | Two phase | 12.089 | 192.798 | 155.019 | 1.874
TSPC_DET [22] | Latch-MUX | 38 | Single phase | 7.895 | 95.687 | 73.199 | 0.578
LG_C_DET [23] | C-element | 28 | Two phase | 8.897 | 114.354 | 96.198 | 0.856
FS_TSPC_DET [24] | Latch-MUX | 36 | Single phase | 7.69 | 99.64 | 81.6 | 0.627
HSCG_DET [27] | M-S latch | 34 | Two phase | 15.5 | 198.98 | 321.2 | 3.084
BI_DET_FF (proposed work) | C-element | 18 | Two phase | 3.010 | 106.965 | 77.071 | 0.232

Bold represents the best results of the proposed design compared with other existing DET-FF designs.
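The PDP column of Table 1 can be cross-checked against the power and delay columns. In the Python sketch below, PDP is computed as Pavg × tp(CLK-Q), using the unit conversion µW × ps = 10⁻¹⁸ J = 10⁻³ fJ. Using tp(CLK-Q) is an inference, not something the paper states: it reproduces the tabulated PDP to within rounding for the five rows listed, while the HSCG_DET row instead matches its tp(D-Q) value and is therefore omitted.

```python
def pdp_fj(power_uw, delay_ps):
    """Power-delay product in fJ from power in uW and delay in ps (uW * ps = 1e-3 fJ)."""
    return power_uw * delay_ps * 1e-3


# (Pavg in uW, tp(CLK-Q) in ps) transcribed from Table 1.
designs = {
    "DDR_DET": (12.089, 155.019),
    "TSPC_DET": (7.895, 73.199),
    "LG_C_DET": (8.897, 96.198),
    "FS_TSPC_DET": (7.69, 81.6),
    "BI_DET_FF": (3.010, 77.071),
}
for name, (p, t) in designs.items():
    print(f"{name}: {pdp_fj(p, t):.3f} fJ")
```

Because PDP is a product, BI_DET_FF's low power (3.010 µW) gives it the smallest PDP even though its delay is not the lowest in the table.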


Fig. 4 BI_DET_FF delay evaluation in the near threshold region with different technologies [plot "Delay in Near-Threshold Region": Delay (ps) versus Technology (nm) for 16, 22, 32, 45, 65, 90 and 130 nm at 0.4, 0.5, 0.6 and 0.7 V]

Fig. 5 BI_DET_FF delay evaluation in the super threshold region with different technologies [plot "Delay in Super-Threshold Region": Delay (ps) versus Technology (nm) for 16, 22, 32, 45, 65, 90 and 130 nm at 0.8, 0.9 and 1 V]

Fig. 6 Evaluation of the average power consumption of BI_DET_FF in the near threshold region with different technologies [plot "Avg. Power Cons. in Near-Threshold Region": Avg. Power Cons. (µW) versus Technology (nm) for 16, 22, 32, 45, 65, 90 and 130 nm at 0.4, 0.5, 0.6 and 0.7 V]


Fig. 7 Analysis of the average power consumption of BI_DET_FF in the super threshold region with different technologies [plot "Avg. Power Cons. in Super-Threshold Region": Avg. Power Cons. (µW) versus Technology (nm) for 16, 22, 32, 45, 65, 90 and 130 nm at 0.8, 0.9 and 1 V]

Fig. 8 Power delay product evaluation for BI_DET_FF in the near threshold region with different technologies [plot "PDP in Near-Threshold Region": PDP (fJ) versus Technology (nm) for 16, 22, 32, 45, 65, 90 and 130 nm at 0.4, 0.5, 0.6 and 0.7 V]

5 Conclusion

This article introduces a novel low power BI_DET_FF. The proposed flip-flop reduces circuit size and power consumption and increases device speed and efficiency. The simulation results show that the delay decreases with more advanced technologies and higher supply voltages; delay is high in the near threshold region (0.4–0.7 V) and low in the super threshold region (0.8–1 V). Power consumption decreases as the supply voltage decreases and also decreases with more advanced technology; it is low in the near threshold region (0.4–0.7 V) and high in the super threshold region (0.8–1 V). The PDP also improves, decreasing with more advanced technologies in both the near threshold and super threshold regions. The BI_DET_FF can be operated in the near threshold region with low power consumption and delay in advanced technologies, without any degradation of its performance parameters. This flip-flop may be used in contemporary digital designs in advanced technologies for power-constrained IoT applications.

Fig. 9 Power delay product analysis of BI_DET_FF in the super threshold region with different technologies [plot "PDP in Super-Threshold Region": PDP (fJ) versus Technology (nm) for 16, 22, 32, 45, 65, 90 and 130 nm at 0.8, 0.9 and 1 V]

References

1. Verma N, Kwong J, Chandrakasan AP (2008) Nanometer MOSFET variation in minimum energy subthreshold circuits. IEEE Trans Electron Devices 55(1):163–173
2. Bol D, Flandre D, Legat JD (2009) Technology flavor selection and adaptive techniques for timing-constrained 45nm sub-threshold circuits. In: Proceedings of the 2009 international symposium on low power electronics and design, San Francisco, USA, pp 21–26
3. Dreslinski R, Wieckowski M, Blaauw D, Sylvester D, Mudge TL (2010) Near-threshold computing: reclaiming Moore's law through energy efficient integrated circuits. Proc IEEE 98(2):253–266
4. Alioto M (2012) Ultra-low power VLSI circuit design demystified and explained: a tutorial. Trans Circ Syst I 59(1):3–29
5. Stas F, Bol D (2018) A 0.4-V 0.66-fJ/cycle retentive true-single-phase-clock 18T flip-flop in 28-nm fully-depleted SOI CMOS. IEEE Trans Circuits Syst 65(3):935–945
6. Cai Y et al (2019) Ultra-low-power 18-transistor fully static contention-free single-phase clocked flip-flop in 65-nm CMOS. IEEE J Solid-State Circuits 54(2):550–559
7. Xu P, Gimeno C, Bol D Optimizing TSPC frequency dividers for always-on low-frequency applications in 28 nm FDSOI CMOS. In: Proceedings of IEEE SOI-3D-subthreshold microelectronics technology unified conference (S3S), Burlingame, CA, USA, pp 1–2
8. Muthukumar S, Choi G (2013) Low-power and area-efficient 9-transistor double-edge triggered flip-flop. IEICE Electron Exp 10(18):1–11
9. Wang W, Mishra P (2012) System-wide leakage-aware energy minimization using dynamic voltage scaling and cache reconfiguration in multitasking systems. IEEE Trans Very Large Scale Integr (VLSI) Syst 20(5):902–910

488

S. Singar and R. S. Dhaka


Analyzing the Trade-Off Between Complexity Measures, Ambiguity in Insertion System and Its Application in Natural Languages Anand Mahendran, Kumar Kannan, and Mohammed Hamada

Abstract Insertion is one of the basic operations in DNA computing. Based on this operation, an evolutionary computation model called the insertion system was defined, and various ambiguity levels and basic descriptional complexity measures have since been defined for it. In this paper, we define twelve new descriptional complexity measures based on the integral parts of a derivation, such as the axioms, the strings to be inserted, and the contexts used in the insertion rules. We then analyze the trade-off between these complexity measures and the existing ambiguity levels. Finally, we examine how the analyzed trade-off applies to natural languages.
Keywords Insertion systems · Complexity measures · Ambiguity levels · Trade-off · Natural languages

The article was prepared within the framework of the HSE University Basic Research Program.
A. Mahendran (B) Laboratory of Theoretical Computer Science, Higher School of Economics, Moscow, Russia
K. Kannan School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
M. Hamada Software Engineering Lab, The University of Aizu, Aizuwakamatsu, Japan
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_42

A. Mahendran et al.

1 Introduction
In recent decades, the use of computers has increased enormously, ranging from storing and retrieving data to scientific computation and other complex operations. To meet the needs of a fast-growing world, computer science is a field of constant research. The need for increased computation speed and data storage requires the computing models used for computation and the technologies used as storage media to change rapidly. Because nature is faster than both human brains and computing devices, researchers felt that nature could play a critical role, in particular if biology were brought into the computer science domain. This led to the notion of natural computing, or bio-inspired computing, which bridges the gap between nature and computer science. As a result, many bio-inspired computing models have been defined, namely membrane computing, sticker systems, splicing systems, Watson–Crick automata, insertion–deletion systems, DNA computing, and H-systems [2, 17, 18]. In formal language theory, language generation depends on rewriting operations, which opened a new dimension for insertion systems. If a string $\beta$ is placed between two substrings $\alpha_1, \alpha_2$ of a string $\alpha_1\alpha_2$ to obtain the new string $\alpha_1\beta\alpha_2$, then the operation performed is called insertion. In DNA computing, insertion operations have biological relevance, with some relevant properties in human genetics. The insertion operation was first studied theoretically in [4]. In [16], the application of the insertion operation in the domain of genetics was investigated. In 1969, Solomon Marcus introduced contextual grammars, which are mainly based on descriptive linguistics [13]. In contextual grammars with selectors, a context is adjoined to the left and right of a selector, and the strings of the language are generated by iterating this adjoining operation; in an insertion system, by contrast, a string is inserted between a left and a right context. In [15], different ambiguity levels were defined and studied for external and internal contextual grammars, depending on the parts used in the derivation.
For more on ambiguity in contextual grammars, see [7, 8, 12, 14, 16]. In [9], various ambiguity levels were defined for insertion systems. Since more than one grammar may generate a language L, one must choose an economical grammar G that generates L. This idea of an economical grammar led to the notion of descriptional complexity measures. In [5], complexity measures were investigated for context-free grammars. Several complexity measures, such as Ax, MAx, TAx, Con, MCon, TCon, Phi, MSel, and TSel, were defined for contextual grammars [6, 19, 20]. In [10], different complexity measures based on the insertion rules were introduced and analyzed for insertion–deletion systems. Ambiguity is one of the most studied problems in programming and natural languages. Given a grammar G and a string w ∈ L(G), if w has more than one derivation (or derivation tree), then G is said to be ambiguous; conversely, G is unambiguous if every word in L(G) has exactly one derivation. Since insertion systems are based on the insertion operation, they have potential applications in natural languages. In general, to store natural languages we prefer a system that is both economical and unambiguous. Under these circumstances, a trade-off between the descriptional complexity measures and the ambiguity levels must be studied. Such a trade-off has been investigated for insertion–deletion systems using the basic measures [10].

Analyzing the Trade-Off Between Complexity Measures …


The paper is organized as follows. Section 2 presents the preliminaries. Section 3 introduces the new descriptional complexity measures for insertion systems. Section 4 investigates the trade-off between the newly defined complexity measures and the ambiguity levels of insertion systems. Section 5 examines the application of this trade-off in natural languages. Section 6 concludes the paper and outlines future work.

2 Preliminaries
We start with the fundamental notations used in formal language theory. $V$ denotes an alphabet and $T$ a terminal alphabet. The free monoid generated by $V$ is denoted by $V^*$; its elements are called strings or words. The empty string is denoted by $\lambda$, and $V^+ = V^* \setminus \{\lambda\}$. A language is a set $L \subseteq V^*$. For more details, we refer to [22]. An insertion system is a triple $\gamma = (V, A, P)$, where $V$ is an alphabet, $A$ is a finite language over $V$ (the axioms), and $P$ is a finite set of insertion rules of the form $(u, \beta, v)$. In an insertion rule (IR), the pair $(u, v) \in V^* \times V^*$ is called the context, and $\beta \in V^+$ is the string to be inserted (IS). The string $\beta$ is inserted between the left context (LC) $u$ and the right context (RC) $v$; if $u = v = \lambda$, then $\beta$ can be inserted anywhere in the word. For $x, y \in V^*$, we write $x \Longrightarrow y$ if there is an insertion step
$$x = x_1 u \downarrow v x_2, \qquad y = x_1 u \underline{\beta} v x_2,$$
for some $x_1, x_2 \in V^*$ and $(u, \beta, v) \in P$, where $\downarrow$ marks the location where the string is inserted and the inserted string is underlined. The language generated by $\gamma$ is
$$L(\gamma) = \{ w \in V^* \mid x \Longrightarrow^* w \text{ for some } x \in A \},$$
where $\Longrightarrow^*$ is the reflexive and transitive closure of $\Longrightarrow$. In [9], six levels of ambiguity were defined for insertion systems by considering the parts used in the derivation. Consider a derivation in an insertion system $\gamma$,
$$\delta : w_1 \Longrightarrow w_2 \Longrightarrow \cdots \Longrightarrow w_m, \quad m \ge 1,$$
where $w_1 \in A$ and, for $1 \le j \le m-1$, $w_j = x_{1,j} u_j v_j x_{2,j}$ and $w_{j+1} = x_{1,j} u_j \beta_j v_j x_{2,j}$ when the insertion rule $(u_j, \beta_j, v_j)$ is used, with $x_{1,j}, u_j, v_j, x_{2,j} \in V^*$. The sequence $w_1, \beta_1, \beta_2, \ldots, \beta_{m-1}$ (the axiom and the inserted strings) is the control sequence (CS) of $\delta$. The sequence $w_1, (u_1, \beta_1, v_1), (u_2, \beta_2, v_2), \ldots, (u_{m-1}, \beta_{m-1}, v_{m-1})$ (the axiom, the inserted strings, and the contexts) is the complete control sequence (CCS) of $\delta$. The sequence $w_1, x_{1,1} u_1 \underline{\beta_1} v_1 x_{2,1}, x_{1,2} u_2 \underline{\beta_2} v_2 x_{2,2}, \ldots, x_{1,m-1} u_{m-1} \underline{\beta_{m-1}} v_{m-1} x_{2,m-1}$, which also records the position where each string $\beta_j$ is inserted, is the description of $\delta$. Given an insertion system $\gamma$, Table 1 lists the various ambiguity levels.
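The relation $\Longrightarrow$ and the language $L(\gamma)$ can be made concrete with a short Python sketch. It assumes words are Python strings over single-character symbols, uses the empty string for λ, and enumerates $L(\gamma)$ only up to a length bound (exact for words up to that bound, because $\beta \in V^+$ means every step lengthens the word). The function names and the example system are illustrative and not taken from the paper.

```python
# Sketch: an insertion system gamma = (V, A, P) over single-character symbols.
# The empty string "" stands for lambda; every beta is assumed to be non-empty.
Rule = tuple[str, str, str]  # (u, beta, v): insert beta between left context u and right context v


def one_step(word: str, rules: list[Rule]) -> set[str]:
    """All y such that word ==> y, i.e. word = x1 u | v x2 and y = x1 u beta v x2."""
    result = set()
    for u, beta, v in rules:
        for i in range(len(word) + 1):  # i is the insertion point (the down arrow)
            if word[:i].endswith(u) and word[i:].startswith(v):
                result.add(word[:i] + beta + word[i:])
    return result


def language_up_to(axioms: set[str], rules: list[Rule], max_len: int) -> set[str]:
    """Words of L(gamma) of length <= max_len (exact, since every step lengthens the word)."""
    seen = {w for w in axioms if len(w) <= max_len}
    frontier = set(seen)
    while frontier:
        new = {y for w in frontier for y in one_step(w, rules)
               if len(y) <= max_len and y not in seen}
        seen |= new
        frontier = new
    return seen


if __name__ == "__main__":
    # Hypothetical example: axiom "ab" and rule (a, ab, b) give {a^n b^n | n >= 1}.
    print(sorted(language_up_to({"ab"}, [("a", "ab", "b")], 8), key=len))
    # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
```

In the example, the only position in $a^nb^n$ with $a$ immediately to the left and $b$ immediately to the right is the middle, so each step yields $a^{n+1}b^{n+1}$.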


Table 1 Different ambiguity levels of insertion systems
Ambiguity level | Description
0-ambiguous | The same word w can be derived from two different axioms $A_1, A_2 \in A$, $A_1 \neq A_2$
1-ambiguous | The same word w can be obtained by two distinct unordered CS
2-ambiguous | The same word w can be obtained by two distinct unordered CCS
3-ambiguous | The same word w can be obtained by two distinct ordered CS
4-ambiguous | The same word w can be obtained by two distinct ordered CCS
5-ambiguous | The same word w can be obtained by two derivations with different descriptions
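To see how the six levels of Table 1 differ, the following brute-force Python sketch enumerates every derivation of a target word and compares, level by level, the axiom, the unordered and ordered CS, the unordered and ordered CCS, and the descriptions. It is only practical for short words, represents λ by the empty string, and the systems in the usage lines are hypothetical toy examples rather than systems from the paper.

```python
from collections import Counter

Rule = tuple[str, str, str]  # (u, beta, v); "" stands for lambda, beta must be non-empty


def derivations(target: str, axioms: set[str], rules: list[Rule]):
    """All derivations of target as (axiom, steps); a step is (rule, (x1, u, beta, v, x2))."""
    found = []

    def extend(word, axiom, steps):
        if word == target:
            found.append((axiom, tuple(steps)))
            return
        if len(word) >= len(target):  # insertions only lengthen words
            return
        for rule in rules:
            u, beta, v = rule
            for i in range(len(word) + 1):
                if word[:i].endswith(u) and word[i:].startswith(v):
                    marked = (word[:i - len(u)], u, beta, v, word[i + len(v):])
                    extend(word[:i] + beta + word[i:], axiom, steps + [(rule, marked)])

    for axiom in sorted(axioms):
        extend(axiom, axiom, [])
    return found


# What each ambiguity level of Table 1 compares between two derivations.
RECORD = {
    0: lambda ax, st: ax,                                                     # axiom
    1: lambda ax, st: (ax, frozenset(Counter(r[1] for r, _ in st).items())),  # unordered CS
    2: lambda ax, st: (ax, frozenset(Counter(r for r, _ in st).items())),     # unordered CCS
    3: lambda ax, st: (ax, tuple(r[1] for r, _ in st)),                       # ordered CS
    4: lambda ax, st: (ax, tuple(r for r, _ in st)),                          # ordered CCS
    5: lambda ax, st: (ax, tuple(m for _, m in st)),                          # description
}


def ambiguous_levels(target: str, axioms: set[str], rules: list[Rule]) -> set[int]:
    """Levels i for which target has two derivations with distinct level-i records."""
    ds = derivations(target, axioms, rules)
    return {i for i, rec in RECORD.items() if len({rec(ax, st) for ax, st in ds}) > 1}


print(sorted(ambiguous_levels("aab", {"b"}, [("", "a", "")])))        # [5]
print(sorted(ambiguous_levels("aab", {"b", "ab"}, [("", "a", "")])))  # [0, 1, 2, 3, 4, 5]
print(sorted(ambiguous_levels("xay", {"xy"}, [("x", "a", ""), ("", "a", "y")])))  # [2, 4, 5]
```

The last call shows two derivations that insert the same string with different contexts: they agree as unordered and ordered CS (levels 1 and 3) but differ as CCS and as descriptions.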

Next, we recall the measures introduced for an insertion–deletion system $\gamma = (V, T, A, R)$: Ax, MAx, TAx, TLength − Con, TLength − Str, TINS − StrCon, TDEL − StrCon, TINS − Str, and TDEL − Str. For more details, we refer to [6, 10, 19, 20]. For a measure $M$ and a language $L$, let
$$M(L) = \min\{ M(\gamma) \mid L = L(\gamma) \}, \qquad M^{-1}(L) = \{ \gamma \mid L(\gamma) = L \text{ and } M(\gamma) = M(L) \},$$
so that $M^{-1}(L)$ is the set of all systems generating $L$ that are minimal with respect to $M$. For a language $L$, two measures $M_1, M_2$ are said to be incompatible if $M_1^{-1}(L) \cap M_2^{-1}(L) = \emptyset$, and compatible if $M_1^{-1}(L) \cap M_2^{-1}(L) \neq \emptyset$. Based on this definition, any two of the measures Ax, MAx, TAx were proved to be compatible in [16]. Since insertion systems use no deletion rules, the measures TDEL − StrCon and TDEL − Str do not apply to them.
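The set-based definitions of $M(L)$, $M^{-1}(L)$, and compatibility can be illustrated on a finite family of systems. This is only a sketch: the true $M(L)$ ranges over all systems generating $L$, whereas the code restricts attention to the two systems for $L_1$ from the proof of Theorem 1 in Sect. 4, so it can only exhibit incompatibility within that family, not for $L_1$ itself. Ax is read as the number of axioms, and TLength − Str as the total length of the inserted strings, which is how the proofs in Sect. 4 use it (an assumption about the definitions in [10]).

```python
from typing import Callable

System = tuple[frozenset[str], tuple[tuple[str, str, str], ...]]  # (axioms A, rules P)
Measure = Callable[[System], int]


def ax(system: System) -> int:
    """Ax: number of axioms."""
    return len(system[0])


def tlength_str(system: System) -> int:
    """TLength - Str (assumed reading): total length of the inserted strings beta."""
    return sum(len(beta) for _, beta, _ in system[1])


def minimal_systems(candidates: dict[str, System], m: Measure) -> set[str]:
    """M^{-1}(L) restricted to a finite family of systems that all generate L."""
    best = min(m(s) for s in candidates.values())
    return {name for name, s in candidates.items() if m(s) == best}


def compatible(candidates: dict[str, System], m1: Measure, m2: Measure) -> bool:
    """M1 and M2 are compatible iff their sets of minimal systems intersect."""
    return bool(minimal_systems(candidates, m1) & minimal_systems(candidates, m2))


# Two systems generating L_1 = {d(a^3b)^k} U {(a^3b)^k c} ("" stands for lambda).
candidates = {
    "gamma_1": (frozenset({"d", "daaab", "c", "aaabc"}), (("", "aaab", "aaab"),)),
    "gamma_1''": (frozenset({"c", "d"}), (("d", "aaab", ""), ("", "aaab", "c"))),
}
print(minimal_systems(candidates, ax), minimal_systems(candidates, tlength_str))
print(compatible(candidates, ax, tlength_str))  # False within this two-system family
```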

3 New Descriptional Complexity Measures
In this section, we introduce twelve new descriptional complexity measures based on the components used in the derivation of a language. Table 2 lists the newly introduced measures.

Table 2 Newly introduced descriptional complexity measures (R denotes the set of insertion rules)
S. No. | Measure notation | Definition | Description
1 | MLen − InsStr | $\max_{(u,\beta,v)\in R} \lvert\beta\rvert$ | Maximum length of the IS
2 | mLen − InsStr | $\min_{(u,\beta,v)\in R} \lvert\beta\rvert$ | Minimum length of the IS
3 | MLen − LCon | $\max_{(u,\beta,v)\in R} \lvert u\rvert$ | Maximum length of the LC used in the IR
4 | MLen − RCon | $\max_{(u,\beta,v)\in R} \lvert v\rvert$ | Maximum length of the RC used in the IR
5 | mLen − LCon | $\min_{(u,\beta,v)\in R} \lvert u\rvert$ | Minimum length of the LC used in the IR
6 | mLen − RCon | $\min_{(u,\beta,v)\in R} \lvert v\rvert$ | Minimum length of the RC used in the IR
7 | TLen − LCon | $\sum_{(u,\beta,v)\in R} \lvert u\rvert$ | Total length of all LC used in the IR
8 | TLen − RCon | $\sum_{(u,\beta,v)\in R} \lvert v\rvert$ | Total length of all RC used in the IR
9 | TLen − LCon + InsStr | $\sum_{(u,\beta,v)\in R} (\lvert u\rvert + \lvert\beta\rvert)$ | Total length of the LC plus the length of the IS
10 | TLen − RCon + InsStr | $\sum_{(u,\beta,v)\in R} (\lvert v\rvert + \lvert\beta\rvert)$ | Total length of the RC plus the length of the IS
11 | MLen − LCon + InsStr | $\max_{(u,\beta,v)\in R} (\lvert u\rvert + \lvert\beta\rvert)$ | Maximum length of the LC plus the length of the IS
12 | MLen − RCon + InsStr | $\max_{(u,\beta,v)\in R} (\lvert v\rvert + \lvert\beta\rvert)$ | Maximum length of the RC plus the length of the IS
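The twelve measures in Table 2 are simple aggregates over the rule set R. The Python sketch below computes them, together with TLength − Str read as the total length of the inserted strings (the way the proofs in Sect. 4 use it; this reading is an assumption). Rules are (u, β, v) triples of strings with "" for λ, R is assumed non-empty, and the dictionary keys are ad hoc spellings of the measure names.

```python
Rule = tuple[str, str, str]  # (u, beta, v); "" stands for lambda


def measures(rules: list[Rule]) -> dict[str, int]:
    """Table 2 measures (plus TLength - Str) for a non-empty finite rule set R."""
    ins = [len(beta) for _, beta, _ in rules]   # |beta|
    lc = [len(u) for u, _, _ in rules]          # |u|, left contexts
    rc = [len(v) for _, _, v in rules]          # |v|, right contexts
    lc_ins = [x + y for x, y in zip(lc, ins)]   # |u| + |beta|
    rc_ins = [x + y for x, y in zip(rc, ins)]   # |v| + |beta|
    return {
        "MLen-InsStr": max(ins), "mLen-InsStr": min(ins),
        "MLen-LCon": max(lc), "MLen-RCon": max(rc),
        "mLen-LCon": min(lc), "mLen-RCon": min(rc),
        "TLen-LCon": sum(lc), "TLen-RCon": sum(rc),
        "TLen-LCon+InsStr": sum(lc_ins), "TLen-RCon+InsStr": sum(rc_ins),
        "MLen-LCon+InsStr": max(lc_ins), "MLen-RCon+InsStr": max(rc_ins),
        "TLength-Str": sum(ins),
    }


# Rule sets of the two systems for L_1 in the proof of Theorem 1.
print(measures([("", "aaab", "aaab")]))
print(measures([("d", "aaab", ""), ("", "aaab", "c")]))
```

For these two rule sets, TLen − RCon is 4 and 1 while TLength − Str is 4 and 8, which is the kind of tension between measures that the trade-off results below exploit.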

4 Trade-Off Results Between (Descriptional) Complexity Measures and Ambiguity Levels
In this section, we investigate the trade-off for insertion languages by considering the complexity measures and the ambiguity levels.
Theorem 1 There are pseudo inherently 5-ambiguous insertion languages with respect to $M_1 \in$ {TLength − Str} and $M_2 \in$ {TLen − RCon, MLen − RCon, mLen − RCon}.
Proof Let $L_1 = \{ d(a^3b)^k \mid k \ge 0 \} \cup \{ (a^3b)^k c \mid k \ge 0 \}$. The following 5-ambiguous insertion system $\gamma_1$ generates $L_1$:
$$\gamma_1 = (\{a, b, c, d\}, \{d, da^3b, c, a^3bc\}, \{(\lambda, a^3b, a^3b)\}).$$
The system $\gamma_1$ is minimal in TLength − Str. Every word of $L_1$ has length $4k + 1$ for some $k \ge 0$, so any string inserted by a system generating $L_1$ has a length divisible by 4; hence any insertion system $\gamma_1'$ that generates $L_1$ has an insertion string of length at least 4. Therefore, $\gamma_1$ is minimal in TLength − Str and TLength − Str($L_1$) = 4. Now consider any $\gamma_1'$ that generates $L_1$ with TLength − Str($\gamma_1'$) = 4, and the words $d(a^3b)^i$ or $(a^3b)^j c \in L_1$ for large values of $i, j$. In the derivation of these words, different occurrences of $a^3b$ can be chosen, producing two different descriptions. Therefore, $\gamma_1'$ is 5-ambiguous. However, the language $L_1$ is unambiguous, as there exists an unambiguous insertion system $\gamma_1''$ that generates $L_1$:
$$\gamma_1'' = (\{a, b, c, d\}, \{c, d\}, \{(d, a^3b, \lambda), (\lambda, a^3b, c)\}).$$
The system $\gamma_1''$ is unambiguous. Using the insertion rule $(d, a^3b, \lambda)$, the words $d(a^3b)^k$, $k \ge 0$, are generated; likewise, using the insertion rule $(\lambda, a^3b, c)$, the words $(a^3b)^k c$, $k \ge 0$, are generated. While deriving $d(a^3b)^r$ or $(a^3b)^s c \in L_1$, $r, s \ge 1$, the position of the inserted string $a^3b$ is unique at every step. It is easy to see that $\gamma_1''$ is minimal with respect to TLen − RCon, MLen − RCon, and mLen − RCon.
Corollary 1 If the insertion rule in the above result is changed to $(a^3b, a^3b, \lambda)$, a similar result holds: there are pseudo inherently 5-ambiguous insertion languages with respect to $M_1 \in$ {TLength − Str} and $M_2 \in$ {TLen − LCon, MLen − LCon, mLen − LCon}.
Theorem 2 There are pseudo inherently 4-ambiguous insertion languages with respect to $M_1 \in$ {Ax} and $M_2 \in$ {TLen − LCon, MLen − LCon, TLength − Str}.
Proof Let $L_2 = \{c^2a^n \mid n \ge 0\} \cup \{d^2a^n \mid n \ge 0\} \cup \{c^2a^nd^2a^m \mid n, m \ge 0\}$. The following 4-ambiguous insertion system $\gamma_2$ generates $L_2$:
$$\gamma_2 = (\{a, b, c, d\}, \{c^2, d^2, c^2d^2\}, \{(c^2, a, \lambda), (d^2, a, \lambda)\}).$$
Hence, Ax($L_2$) ≤ 3. Any insertion system that generates $L_2$ must have at least the three axioms $c^2$, $d^2$, $c^2d^2$. Therefore, Ax($L_2$) = 3. Consider any $\gamma_2'$ that generates $L_2$ with Ax($\gamma_2'$) = 3. To generate the words $c^2a^l$, $l \ge 0$, of $L_2$, the system must have an insertion rule of the form $(c^2, a^i, \lambda)$, $i \ge 1$. To generate the words $d^2a^k$, $k \ge 0$, of $L_2$, the system must have a rule of the form $(d^2, a^j, \lambda)$, $j \ge 1$.
To prove that $\gamma_2'$ is 4-ambiguous, consider a word $c^2a^{l+i}d^2a^{k+j} \in L_2$. This word can be obtained by two different ordered CCS: in one, the insertion rule $(c^2, a^i, \lambda)$ is used first, followed by $(d^2, a^j, \lambda)$; in the other, $(d^2, a^j, \lambda)$ is used first, followed by $(c^2, a^i, \lambda)$. As the inserted string $a^i$ is the same for any such system, $\gamma_2'$ is 2- and 3-unambiguous. However, $L_2$ is 4-unambiguous, as there exists a 4-unambiguous system $\gamma_2''$ that generates $L_2$:
$$\gamma_2'' = (\{a, b, c, d\}, \{c^2, d^2, c^2d^2, c^2a, d^2a, c^2ad^2, c^2d^2a, c^2ad^2a\}, \{(a, a, \lambda)\}).$$
The system $\gamma_2''$ is minimal with respect to the measures TLen − LCon, MLen − LCon, and TLength − Str. As $\gamma_2''$ uses only one insertion rule, $(a, a, \lambda)$, it is clearly 4-unambiguous.
Corollary 2 There are pseudo inherently 4-ambiguous insertion languages with respect to $M_1 \in$ {MAx, TAx} and $M_2 \in$ {TLen − LCon, MLen − LCon, TLength − Str}.
Theorem 3 There are pseudo inherently 2-ambiguous insertion languages with respect to $M_1 \in$ {Ax} and $M_2 \in$ {TLen − LCon, TLen − RCon, MLen − LCon, MLen − RCon, TLen − LCon + InsStr, TLen − RCon + InsStr}.


Proof Let $L_3 = \{a^2b^m \mid m \ge 0\} \cup \{b^mc^2 \mid m \ge 0\} \cup \{a^2b^mc^2 \mid m \ge 0\}$. The following 2-ambiguous insertion system $\gamma_3$ generates $L_3$:
$$\gamma_3 = (\{a, b, c\}, \{a^2, c^2, a^2c^2\}, \{(a^2, b, \lambda), (\lambda, b, c^2)\}).$$
The insertion system $\gamma_3$ is minimal in Ax. Since $\gamma_3$ has three axioms, Ax($L_3$) ≤ 3; and any system that generates $L_3$ must have at least the three axioms $a^2$, $c^2$, $a^2c^2$, so Ax($L_3$) = 3. Consider any insertion system $\gamma_3'$ that generates $L_3$ with Ax($\gamma_3'$) = 3. Since the words $a^2b^i$, $i \ge 0$, belong to $L_3$, some insertion rule must have a context of the form $(a^2, b^u)$, $u \ge 0$. Likewise, since the words $b^jc^2$, $j \ge 0$, belong to $L_3$, some insertion rule must have a context of the form $(b^t, c^2)$, $t \ge 0$. In both cases, the inserted string is $b^v$, $v \ge 1$. To prove that $\gamma_3'$ is 2-ambiguous, take a word $a^2b^{e+f}c^2 \in L_3$. This word can be obtained from the same axiom by two different unordered CCS: in one, only the context $(a^2, b^u)$ is used; in the other, only the context $(b^t, c^2)$ is used. Thus, the same word $a^2b^{e+f}c^2$ is derived by two distinct unordered CCS, and therefore $\gamma_3'$ is 2-ambiguous. Nevertheless, the language $L_3$ is 2-unambiguous, as it is generated by the following 2-unambiguous insertion system:
$$\gamma_3'' = (\{a, b, c\}, \{a^2, c^2, a^2b, bc^2, a^2c^2, a^2bc^2\}, \{(b, b, \lambda)\}).$$
Since $\gamma_3''$ has only one context, $(b, \lambda)$, in its insertion rule, it is 2-unambiguous. The insertion system $\gamma_3''$ is minimal in the measures TLen − LCon, TLen − RCon, MLen − LCon, MLen − RCon, TLen − LCon + InsStr, and TLen − RCon + InsStr, but it is not minimal in Ax.
Corollary 3 There are pseudo inherently 2-ambiguous insertion languages with respect to $M_1 \in$ {MAx, TAx} and $M_2 \in$ {TLen − LCon, TLen − RCon, MLen − LCon, MLen − RCon, TLen − LCon + InsStr, TLen − RCon + InsStr}.
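The constructions in Theorems 1–3 can be sanity-checked mechanically. The Python sketch below enumerates, up to a length bound, the words generated by the two systems given in each proof and compares them with the defining sets of $L_1$, $L_2$, $L_3$. Agreement up to a bound is evidence, not a proof; "" stands for λ, and the bound N = 13 is an arbitrary choice.

```python
Rule = tuple[str, str, str]  # (u, beta, v); "" stands for lambda


def language_up_to(axioms: set[str], rules: list[Rule], max_len: int) -> set[str]:
    """Words of L(gamma) with length <= max_len, by breadth-first insertion."""
    seen = {w for w in axioms if len(w) <= max_len}
    frontier = set(seen)
    while frontier:
        new = set()
        for w in frontier:
            for u, beta, v in rules:
                for i in range(len(w) + 1):
                    if w[:i].endswith(u) and w[i:].startswith(v):
                        y = w[:i] + beta + w[i:]
                        if len(y) <= max_len and y not in seen:
                            new.add(y)
        seen |= new
        frontier = new
    return seen


def within(words: set[str], n: int) -> set[str]:
    """Keep only the words of length <= n."""
    return {w for w in words if len(w) <= n}


N = 13
R = range(N + 1)
cases = {
    # Theorem 1: gamma_1 and the unambiguous gamma_1''
    "L1": (({"d", "daaab", "c", "aaabc"}, [("", "aaab", "aaab")]),
           ({"c", "d"}, [("d", "aaab", ""), ("", "aaab", "c")]),
           within({"d" + "aaab" * k for k in R} | {"aaab" * k + "c" for k in R}, N)),
    # Theorem 2: gamma_2 and the 4-unambiguous gamma_2''
    "L2": (({"cc", "dd", "ccdd"}, [("cc", "a", ""), ("dd", "a", "")]),
           ({"cc", "dd", "ccdd", "cca", "dda", "ccadd", "ccdda", "ccadda"}, [("a", "a", "")]),
           within({"cc" + "a" * n for n in R} | {"dd" + "a" * n for n in R}
                  | {"cc" + "a" * n + "dd" + "a" * m for n in R for m in R}, N)),
    # Theorem 3: gamma_3 and the 2-unambiguous gamma_3''
    "L3": (({"aa", "cc", "aacc"}, [("aa", "b", ""), ("", "b", "cc")]),
           ({"aa", "cc", "aab", "bcc", "aacc", "aabcc"}, [("b", "b", "")]),
           within({"aa" + "b" * m for m in R} | {"b" * m + "cc" for m in R}
                  | {"aa" + "b" * m + "cc" for m in R}, N)),
}

for name, (first, second, expected) in cases.items():
    same = language_up_to(*first, N) == expected == language_up_to(*second, N)
    print(name, same)  # True for each language
```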
Theorem 4 There are pseudo inherently 0-ambiguous insertion languages with respect to $M_1 \in$ {Ax, MLen − InsStr, TLength − Str} and $M_2 \in$ {MLen − LCon, MLen − RCon, TLen − LCon, TLen − RCon}.
Proof Let $L_4 = \{a^2b^{3n} \mid n \ge 1\} \cup \{b^{2n}c^2 \mid n \ge 1\} \cup \{a^2b^nc^2 \mid n \ge 0\}$. The following 0-ambiguous insertion system $\gamma_4$ generates $L_4$:
$$\gamma_4 = (\{a, b, c\}, \{a^2b^3, b^2c^2, a^2c^2, a^2bc^2\}, \{(a^2, b^3, \lambda), (\lambda, b^2, c^2)\}).$$
The system $\gamma_4$ is minimal with respect to Ax, MLen − InsStr, and TLength − Str. We first consider the measures MLen − InsStr and TLength − Str. From the system $\gamma_4$, it is easy to see that MLen − InsStr($\gamma_4$) = 3 and TLength − Str($\gamma_4$) = 5, so MLen − InsStr($L_4$) ≤ 3 and TLength − Str($L_4$) ≤ 5. To generate the words $a^2b^kc^2$, $k \ge 0$, of $L_4$, one might use an insertion rule with the inserted string $b$. However, if such an insertion string is present in any of the insertion rules, the system may generate, from the words $a^2b^{3p}$, $p \ge 1$, and $b^{2q}c^2$, $q \ge 1$, strings that do not belong to $L_4$. Hence, not all parts of $L_4$ can be produced with the insertion string $b$. Next, consider the measure TLength − Str. Since the words $a^2b^{3p}$, $p \ge 1$, belong to $L_4$, some insertion rule must insert the string $b^3$. Likewise, since the words $b^{2q}c^2$, $q \ge 1$, belong to $L_4$, some insertion rule must insert the string $b^2$. Therefore, MLen − InsStr($L_4$) ≥ 3 and TLength − Str($L_4$) ≥ 5.


Next, we consider the measure Ax. Any system that generates $L_4$ must have the three axioms $a^2b^3$, $b^2c^2$, $a^2c^2$. It must also have the axiom $a^2bc^2$: without it, $a^2bc^2$ could only be generated from the axiom $a^2c^2$ by inserting the string $b$, but we have shown above that the system cannot have $b$ as an inserted string. Therefore, $a^2bc^2$ must be an axiom, and the system $\gamma_4$ is minimal in the measure Ax. Now consider any system $\gamma_4'$ that generates $L_4$ and is minimal in the measures Ax, MLen − InsStr, and TLength − Str. To show that $\gamma_4'$ is 0-ambiguous, take the words $a^2b^{3r}$ and $b^{2s}c^2 \in L_4$ for large values of $r$ and $s$. To produce the words $a^2b^{3p}$, $p \ge 1$, and $b^{2q}c^2$, $q \ge 1$, the system $\gamma_4'$ must insert strings of the form $b^{3t}$, $t \ge 1$, and $b^{2u}$, $u \ge 1$, respectively. Consider a word $a^2b^{3tm+2un}c^2 \in L_4$, for $m \ge 1$, $n \ge 0$. This word can be derived from two different axioms, $a^2c^2$ and $a^2bc^2$. Starting from the axiom $a^2c^2$, the word $a^2b^{3tm+2un}c^2$ is obtained by inserting the string $b^{3t}$ $m$ times and the string $b^{2u}$ $n$ times. On the other hand, the word can also be derived from the axiom $a^2bc^2$ by inserting the string $b^{3t}$ $(m - i_1)$ times, $i_1 \ge 1$, and the string $b^{2u}$ $(n + i_2)$ times, $i_2 \ge 1$. Thus, the word $a^2b^{3tm+2un}c^2$ is obtained from two different axioms, $a^2c^2$ and $a^2bc^2$, and the system $\gamma_4'$ is 0-ambiguous. A similar argument applies to the measures MLen − InsStr and TLength − Str. Next, we show that $L_4$ is 0-unambiguous by exhibiting a 0-unambiguous system that generates $L_4$:
$$\gamma_4'' = (\{a, b, c\}, \{a^2b^3, a^2b^6, b^2c^2, b^4c^2, b^6c^2, a^2c^2, a^2bc^2, a^2b^2c^2, a^2b^3c^2, a^2b^4c^2, a^2b^5c^2, a^2b^6c^2\}, \{(b, b^6, \lambda)\}).$$
Every word of $L_4$ is derived in this system from a unique axiom by repeatedly inserting the string $b^6$, which shows that $\gamma_4''$ is 0-unambiguous. As the system uses only the insertion rule $(b, b^6, \lambda)$, $\gamma_4''$ is minimal in the measures MLen − LCon, MLen − RCon, TLen − LCon, and TLen − RCon.
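The 0-ambiguity argument can be checked on its smallest case, $t = u = 1$, $m = 1$, $n = 0$: the word $a^2b^3c^2$ is derivable both from $a^2c^2$ (inserting $b^3$) and from $a^2bc^2$ (inserting $b^2$). The Python sketch below confirms this for $\gamma_4$ and compares $L(\gamma_4)$ with $L_4$ up to an arbitrarily chosen length bound; it is a bounded check, not a proof, and the helper names are illustrative.

```python
Rule = tuple[str, str, str]  # (u, beta, v); "" stands for lambda


def language_up_to(axioms: set[str], rules: list[Rule], max_len: int) -> set[str]:
    """Words of L(gamma) with length <= max_len, by breadth-first insertion."""
    seen = {w for w in axioms if len(w) <= max_len}
    frontier = set(seen)
    while frontier:
        new = {w[:i] + beta + w[i:]
               for w in frontier for u, beta, v in rules for i in range(len(w) + 1)
               if w[:i].endswith(u) and w[i:].startswith(v)}
        new = {y for y in new if len(y) <= max_len and y not in seen}
        seen |= new
        frontier = new
    return seen


def deriving_axioms(target: str, axioms: set[str], rules: list[Rule]) -> set[str]:
    """Axioms from which target is derivable; two or more witness 0-ambiguity."""
    return {a for a in axioms if target in language_up_to({a}, rules, len(target))}


# gamma_4 from the proof of Theorem 4.
g4_axioms = {"aabbb", "bbcc", "aacc", "aabcc"}
g4_rules = [("aa", "bbb", ""), ("", "bb", "cc")]

N = 12
R = range(N + 1)
L4 = {w for w in ({"aa" + "b" * (3 * n) for n in R if n >= 1}
                  | {"b" * (2 * n) + "cc" for n in R if n >= 1}
                  | {"aa" + "b" * n + "cc" for n in R}) if len(w) <= N}

print(language_up_to(g4_axioms, g4_rules, N) == L4)            # True (bounded check)
print(sorted(deriving_axioms("aabbbcc", g4_axioms, g4_rules)))  # ['aabcc', 'aacc']
```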

5 Application of the Trade-Off Results in Natural Languages
Syntactic and semantic ambiguity deserve special attention in programming, artificial, and natural languages. Since programming language constructs are defined by syntactic and semantic rules, handling these ambiguities there is of comparatively little interest. In natural languages, syntactic ambiguity is easier to handle than semantic ambiguity, mainly because a sentence (or a word) can convey different meanings. Even in Google Translate, if a translation is carried out word by word, the meaning in the target language may differ from that in the source language. Under these circumstances, natural languages should be translated (stored) in an unambiguous manner. Every (programming, natural, or artificial) language L considered here is generated by some grammar G, that is, L(G) = L. Generating natural languages such as English and Dutch requires grammars beyond the generative capacity of context-free grammars [1, 21]; moreover, many natural languages contain sentences that are beyond context-free [3, 11]. To generate (store) such natural languages, the grammar G that generates L should be unambiguous and, at the same time, minimal in terms of the measures. In practice, such a minimal unambiguous system does not exist for every language, so the trade-off between the (descriptional) complexity measures and ambiguity needs to be studied. To illustrate why this trade-off matters in natural languages, consider the sentence They are hunting dogs. The sentence is syntactically correct but semantically ambiguous, as it can be interpreted in different ways: Is some group hunting for dogs? Are the dogs of a hunting breed? Or does the phrase hunting dogs refer to a music band, a basketball team, or a secret code? The correct phrases of the sentence are They are hunting, They are dogs, and They are hunting dogs. Assume that we want to construct an insertion system that generates this sentence. Since every derivation step of an insertion system should yield a correct phrase, the correct phrases are They are, They are hunting, They are dogs, and They are hunting dogs. Let 'They are' be the axiom, and let the insertion rules be (They are, dogs, λ) and (They are, hunting, λ). With this axiom and these insertion rules, the derivations take the following forms (the inserted word is underlined): (1) $\text{They are} \Longrightarrow \text{They are } \underline{\text{dogs}} \Longrightarrow \text{They are } \underline{\text{hunting}} \text{ dogs}$, which yields three correct phrases; (2) $\text{They are} \Longrightarrow \text{They are } \underline{\text{hunting}} \Longrightarrow \text{They are } \underline{\text{dogs}} \text{ hunting}$, which is not a correct phrase.
So, with these two insertion rules, the correct phrases cannot all be generated without also generating an incorrect one. However, if we use the three insertion rules (They are, dogs, λ), (They are, hunting, λ), and (They are, hunting dogs, λ), all three correct phrases can be derived from the axiom; other choices of insertion rules may also yield all the correct phrases of the sentence, but they require more insertion rules. So, to derive the above sentence, we need three insertion rules. Such sentences can be stored compactly if there exists an unambiguous system that generates them, even though that system may not be minimal with respect to the measure(s). As insertion systems are one of the prominent rewriting mechanisms, they can be regarded as a suitable rewriting mechanism for generating natural languages. The above example shows that the sentence can be generated by an unambiguous system that is not minimal in terms of the components used to generate it. This case study highlights the importance of studying the trade-off in natural languages, and a similar study could be carried out for programming languages.
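Derivations (1) and (2) can be reproduced by treating whole words as symbols. The Python sketch below applies the rules (They are, dogs, λ) and (They are, hunting, λ) once each, in both orders, starting from the axiom 'They are'. Representing phrases as tuples of words, with the empty tuple for λ, is an assumption made for illustration.

```python
from itertools import permutations

Phrase = tuple[str, ...]
Rule = tuple[Phrase, Phrase, Phrase]  # (left context, inserted words, right context); () = lambda


def apply_rule(phrase: Phrase, rule: Rule) -> list[Phrase]:
    """All phrases obtained by inserting the rule's words where both contexts match."""
    u, beta, v = rule
    results = []
    for i in range(len(phrase) + 1):
        left, right = phrase[:i], phrase[i:]
        if len(left) >= len(u) and left[len(left) - len(u):] == u and right[:len(v)] == v:
            results.append(left + beta + right)
    return results


axiom: Phrase = ("They", "are")
rules: dict[str, Rule] = {
    "dogs": (("They", "are"), ("dogs",), ()),
    "hunting": (("They", "are"), ("hunting",), ()),
}

derived = {}
for order in permutations(rules):  # ("dogs", "hunting") and ("hunting", "dogs")
    steps = [axiom]
    for name in order:
        (nxt,) = apply_rule(steps[-1], rules[name])  # 'They are' occurs once: one result
        steps.append(nxt)
    derived[order] = [" ".join(p) for p in steps]
    print(" => ".join(derived[order]))
# They are => They are dogs => They are hunting dogs
# They are => They are hunting => They are dogs hunting
```

Because both rules insert directly after 'They are', whichever word is inserted last ends up first, which is why the second order produces the incorrect phrase.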


6 Conclusion
In this paper, we defined twelve new descriptional complexity measures and showed that there exist insertion languages for which every system that is minimal in a measure $M_1$ is ambiguous, while an unambiguous system that is minimal in another measure $M_2$ exists. Finally, we studied the application of the investigated trade-off in natural languages. Analyzing the trade-off between the measures and the ambiguity levels not considered in this paper is left as future work.
Acknowledgements The author would like to acknowledge the CSIR-funded project 25(0291)/18/EMRII.

References
1. Bresnan J, Kaplan RM, Peters S, Zaenen S (1982) Cross-serial dependencies in Dutch. Linguist Inq 13(4):613–663
2. Calude CS, Păun G (2001) Computing with cells and atoms: an introduction to quantum, DNA and membrane computing. Taylor and Francis, London
3. Chomsky N (1963) Formal properties of grammars. In: Luce RD et al (eds) Handbook of mathematical psychology. John Wiley, New York, pp 323–418
4. Galiukschov BS (1981) Semicontextual grammars. Mat Logical Mat Ling, Talinin University, pp 38–50 (in Russian)
5. Gruska J (1973) Descriptional complexity of context-free languages. Proc Math Found Comput Sci'73:71–84, High Tatras
6. Lakshmanan K (2005) Incompatible measures of internal contextual grammars. In: Mereghetti C et al (eds) Proceedings DCFS'05, Como, Italy, pp 253–260
7. Lakshmanan K (2006) A note on ambiguity of internal contextual grammars. Theor Comput Sci 369:436–441
8. Lakshmanan K, Anand M, Krithivasan K (2008) On the trade-off between ambiguity and measures in internal contextual grammars. In: Câmpeanu C, Pighizzini G (eds) DCFS 2008, Charlottetown, Canada, pp 216–223
9. Lakshmanan K, Anand M, Krithivasan K (2011) On the ambiguity of insertion systems. Int J Found Comput Sci 22(07):1747–1758
10. Lakshmanan K, Anand M, Krithivasan K, Mohammed K (2011) On the study of ambiguity and the trade-off between measures and ambiguity in insertion-deletion languages. Nano Commun Netw 2(2–3):106–118
11. Langendoen DT, Postal PM (1984) The vastness of natural languages. Blackwell, Oxford
12. Ilie L (1997) On ambiguity in internal contextual languages. In: Martin-Vide C (ed) II International conference on mathematical linguistics. John Benjamins, Amsterdam, pp 29–45
13. Marcus S (1969) Contextual grammars. Rev Roum Math Pures Appl 14:1525–1534
14. Martin-Vide C, Miguel-Verges J, Păun G, Salomaa A (1997) Attempting to define the ambiguity in internal contextual languages. In: Martin-Vide C (ed) II International conference on mathematical linguistics, Tarragona, 1996. John Benjamins, Amsterdam, pp 59–81
15. Păun G (1982) Contextual grammars. The Publishing House of the Romanian Academy of Sciences, Bucuresti
16. Păun G (1997) Marcus contextual grammars. Kluwer Academic Publishers
17. Păun G, Rozenberg G, Salomaa A (1998) DNA computing: new computing paradigms. Springer
18. Păun G (2002) Membrane computing: an introduction. Springer
19. Păun G (1975) On the complexity of contextual grammars with choice. Stud Cerc Matem 2:559–569
20. Păun G (1991) Further remarks on the syntactical complexity of Marcus contextual languages. Ann Univ Buc, Ser Matem-Inform, pp 72–82
21. Pullum GH (1980) On two recent attempts to show that English is not a context-free language. Comput Linguis 10:182–186
22. Rozenberg G, Salomaa A (1997) Handbook of formal languages, vol 1. Word, language, grammar. Springer

Human-to-Computer Interaction Using Hand Gesture Tracking System

Raunaq Verma, Raksha Agrawal, Nisha Thuwal, Nirbhay Bohra, and Pranshu Saxena

Abstract A virtual mouse that uses fingertip identification and hand motion tracking in live video is one of the research directions in human–computer interaction. A real-time fingertip-gesture interface is still difficult to build because of sensor noise, changing light levels, and the challenge of tracking fingertips across different users. When no physical mouse is available, using the fingertip as a virtual mouse is a common way to interact with a computer. In this work, hand gestures control both mouse movement and mouse click events, using fingertip identification and hand gesture recognition. Various approaches to building a virtual mouse have been proposed in recent years. This study tracks the fingers using coloured finger caps. The implementation has three key phases: finger detection using colour identification, hand motion recognition, and mouse control. The user wears RGB colour tapes on the fingertips. Moving each tape, and combining the three coloured tapes, produces the corresponding cursor movement or action in the system.

Keywords Colour detection · Human–computer interaction · Web camera · Image processing · Virtual mouse · Gesture recognition

1 Introduction

1.1 An Overview of Hand Gesture Tracking System

Technology is evolving continually, and human–computer interaction (HCI) is one promising area. A wired mouse has limited functionality and cannot easily be extended. The proposed system relies on gesture recognition and has no such constraints. Gesture detection, image processing, and colour recognition using 'sixth sense technology' are the primary technologies used in this research. Sixth sense technology is a collection of wearable gadgets that serve as a gesture-based interface between the physical and digital worlds. The goal is to move the mouse cursor on the screen using finger movements alone, without any hardware such as a mouse. Recognising such movements is referred to as gesture recognition. This work proposes a web camera-based virtual mouse that lets users communicate with computers and laptops in a more user-friendly manner, and it could replace a traditional mouse or touchpad. The web camera feed is processed with a variety of image processing algorithms, including centroid computation for object recognition and tracking. The detected centroids are used to recognise the mouse's left- and right-click events. The camera continuously records the user's hand gestures and translates them into mouse inputs. It needs no physical connection to the computer and no extra gear to move. The result is a cost-effective system that needs no hardware beyond the webcam and lets users interact without touching the computer.

R. Verma (B) · R. Agrawal · N. Thuwal · N. Bohra · P. Saxena
Department of Computer Science & Engineering, Inderprastha Engineering College, Ghaziabad, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_43

2 Literature Review

Interaction with computing devices has advanced so rapidly that individuals struggle to keep up with its influence, and it has become an essential part of daily life. Technology is used for communicating, shopping, working, and entertainment. Gestures can arise from many physical motions, although the face and hands are the most common sources. Gesture recognition is the process of tracking motions, representing them, and translating them into useful commands [1]. Data gloves [2, 3], hand belts [4], and cameras are among the most frequent means of capturing human input. Hand gesture-based interfaces are widely used in HCI systems. Their applications include sign language recognition [4], robot control [5], virtual mouse control [6], medical image data exploration, human–vehicle interaction, and immersive gaming. As a result, recognising human hand motions from video has for decades been one of the critical goals of computer vision, particularly with conventional RGB cameras.

Sharp et al. [7] presented a system that tracks the detailed movements of a user's hand using only a commodity depth camera. Their method reliably reconstructs the hand's complex articulated pose and recovers from tracking failures. It also supports flexible configurations, such as tracking over long distances and placing the camera over the shoulder. Wang et al. [8] provide a comprehensive survey of deep learning-based RGB-D motion recognition. They summarise commonly used datasets and link them to studies that focus on those datasets. By modality, the approaches fall into four categories: RGB-based, depth-based, skeleton-based, and RGB + D-based. The three underlying modalities have distinct characteristics, so different deep learning algorithms exploit their unique traits.

Systems based on convolutional neural networks (CNNs) [9, 10] have recently shown exceptional performance in HCI. The many layers of a CNN automate feature learning, so the data are represented by a hierarchy of features from low to high levels. Hand gesture recognition has many uses [11]. These include virtual gaming controllers, sign language recognition, directional indication by pointing, teaching young children to interact with computers, human–computer interfaces, robot control, and lie detection. Tran et al. [12] presented a virtual mouse system based on RGB-D images and fingertip detection. In their system, the user's fingertip movements in front of a camera control the computer without a mouse, gloves, or markers. The method produced accurate gesture predictions and also proved practical.

In this paper, the algorithm is implemented in Python 3.7. It uses the cross-platform OpenCV image processing library, and mouse operations are implemented with the Python library PyAutoGUI. Three coloured fingertips are tracked in the video captured by the system's webcam. The inRange() method detects the colours in the input feed. The centres of the detected colour regions are computed using image moments, and the action to take is determined from their relative positions. Using the PyAutoGUI library, the following mouse actions can be performed:
1. Left click
2. Right click
3. Scrolling up
4. Scrolling down
5. Dragging files
6. Mouse cursor movement.

The proposed methodology is discussed in the next section.

3 Proposed Methodology

This research builds a virtual human–computer interaction tool around a webcam, using a colour-based identification and tracking technique. Any coloured objects that reflect enough visible light for their colour to be detected are sufficient for the system, such as red, blue, yellow, cyan, green, and magenta. The flowchart of the proposed methodology is shown in Fig. 1.

The video of the hand gesture is captured with the webcam. Webcam video is generally displayed horizontally flipped: when a person moves their hand to the left, the hand in the picture moves to the right, and vice versa. Each captured frame is therefore flipped horizontally (mirrored) and then converted to HSV. HSV was chosen for colour detection because it is more robust to changes in external lighting. After the HSV conversion, the colours used for moving the mouse are identified. Unwanted colour (noise) is then removed, and the frame is converted to a binary image. This filtered image is used to locate the centres of the coloured regions. From these centres, the cursor position is set and the mouse action is chosen. Finally, the selected action is carried out, such as left click, right click, scroll up, scroll down, or drag.

Fig. 1 Flowchart of proposed methodology

Fig. 2 Taking real-time video as an input

3.1 Real-Time Video Acquisition

An interactive system always includes sensors that provide real-time information. Here, a webcam acquires real-time input at a fixed resolution and frame rate. The frames are then extracted from the video and processed one by one, as shown in Fig. 2. Each frame is represented as an m × n matrix. Each element of this matrix, called a pixel, holds blue, green, and red channel values. Webcams typically capture video at 640 × 480 pixels and 30 fps.

3.2 Flipping of Individual Video Frames

When the captured video is previewed, it appears horizontally inverted. When a person moves their hand to the left, the hand in the picture moves to the right, and vice versa. The image must therefore be flipped horizontally. Figure 3 shows the horizontally flipped frame.

Fig. 3 Flipping of individual video frames
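The following is a minimal sketch of the flip. It assumes a toy frame stored as a list of rows of (B, G, R) tuples rather than an OpenCV image array. Mirroring left-to-right simply reverses each row. With OpenCV, the equivalent call would be `cv2.flip(frame, 1)`, where flip code 1 mirrors the image around its vertical axis.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (B, G, R), each 0-255 (toy representation)
Frame = List[List[Pixel]]      # m rows, each holding n pixels


def flip_horizontal(frame: Frame) -> Frame:
    """Mirror a frame left-to-right by reversing the pixel order of every row."""
    return [list(reversed(row)) for row in frame]


# A 2 x 3 toy frame: blue, green, red on top; black, grey, white below.
frame = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]
mirrored = flip_horizontal(frame)
print(mirrored[0])  # [(0, 0, 255), (0, 255, 0), (255, 0, 0)]
```

Flipping twice restores the original frame, which is a quick sanity check for any mirroring step.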


3.3 BGR to HSV

The flipped video frame contains all colours, but only the red, blue, and yellow regions of the current frame are needed to operate the cursor. The frame is therefore first converted to HSV format (see Fig. 5). HSV uses a geometric representation of the image's colour information. Colours are mapped onto a cylinder: hue is the angle, saturation is the distance from the central axis, and value is the height. This format is convenient for image segmentation algorithms. HSV is chosen over RGB/BGR for colour detection because it is less sensitive to fluctuations in external lighting. Under slight changes in lighting, hue values shift less than RGB values do. The HSV model (hue, saturation, value) may be drawn as a cone or a cylinder, but it always has three components:

Hue: ranges from 0° to 360° and represents the colour itself. Red corresponds to 0°, green to 120°, and blue to 240°. Figure 4 shows the hue values of different colours.
Saturation: ranges from 0 to 100% and measures how far a colour is from grey. At 100% saturation the colour is at its purest, whereas at 0% it is a shade of grey.
Value: ranges from 0 to 100% and controls the brightness, or intensity, of the colour. At 0% the colour is completely black, whereas at 100% it is at its brightest.

Fig. 4 Representation of hue values associated with different colours
Fig. 5 Real-time extracted HSV frame
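The per-pixel conversion can be sketched with Python's standard `colorsys` module. The sketch assumes OpenCV's 8-bit HSV convention. In that convention, hue is stored as degrees divided by 2 (0–179), and saturation and value are scaled to 0–255. That is why the hue thresholds in Sect. 3.4 stop at 180 rather than 360. In a real pipeline, `cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)` would convert a whole frame at once, and its rounding may differ from this sketch by one unit.

```python
import colorsys
from typing import Tuple


def bgr_to_hsv_opencv(b: int, g: int, r: int) -> Tuple[int, int, int]:
    """Convert one 8-bit BGR pixel to HSV on OpenCV's 8-bit scale.

    Hue is stored as degrees / 2 (0-179) so that it fits in one byte;
    saturation and value are scaled from 0-1 to 0-255.
    """
    # colorsys expects RGB components in [0, 1] and returns h, s, v in [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue = round(h * 180) % 180          # 360 degrees -> 180 steps; wrap 180 to 0
    return hue, round(s * 255), round(v * 255)


print(bgr_to_hsv_opencv(255, 0, 0))    # pure blue   -> (120, 255, 255)
print(bgr_to_hsv_opencv(0, 255, 255))  # pure yellow -> (30, 255, 255)
print(bgr_to_hsv_opencv(0, 0, 255))    # pure red    -> (0, 255, 255)
```

Note the BGR argument order, matching the channel order OpenCV uses for webcam frames.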

3.4 Colour Identification

After conversion, the HSV frame is used to extract the colours of the caps on the fingertips. By default, three colours perform the different actions: red, blue, and yellow. The limits below are on OpenCV's 8-bit HSV scale, where hue runs from 0 to 180 (degrees divided by 2) and saturation and value run from 0 to 255. The lower and upper limits of each default colour in HSV format (hue, saturation, value) are:

Blue: lower limit = [88, 78, 20], upper limit = [128, 255, 255]
Red: lower limit = [158, 85, 72], upper limit = [180, 255, 255]
Yellow: lower limit = [21, 70, 80], upper limit = [61, 255, 255]

If these default colours are not available, other colours can be used instead by recalibrating the unavailable colour against an available one. After successful calibration, the values of the selected colour are updated and used to perform the corresponding actions. The calibrated colours then drive the motion of the cursor: only the three fingertip colours are extracted from the video, one by one.
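Extracting a colour with these bounds is a per-pixel range test. OpenCV's `cv2.inRange(hsv, lower, upper)` computes this test, returning 255 where every channel lies within the inclusive bounds and 0 elsewhere. The pure-Python sketch below reproduces that test on a toy HSV frame, using the default bounds above. The list-of-tuples frame is an illustrative assumption.

```python
from typing import Dict, List, Sequence, Tuple

HSV = Tuple[int, int, int]

# Default (lower, upper) bounds from Sect. 3.4, on OpenCV's 8-bit HSV scale.
HSV_RANGES: Dict[str, Tuple[HSV, HSV]] = {
    "blue": ((88, 78, 20), (128, 255, 255)),
    "red": ((158, 85, 72), (180, 255, 255)),
    "yellow": ((21, 70, 80), (61, 255, 255)),
}


def in_range(hsv_frame: Sequence[Sequence[HSV]], lower: HSV, upper: HSV) -> List[List[int]]:
    """Binary mask: 255 where every channel lies in [lower, upper] (inclusive), else 0."""
    return [
        [255 if all(lo <= c <= hi for c, lo, hi in zip(px, lower, upper)) else 0
         for px in row]
        for row in hsv_frame
    ]


# One toy row of HSV pixels: a blue, a yellow and a red cap.
hsv_frame = [[(120, 200, 200), (30, 200, 200), (170, 200, 200)]]
for name, (lower, upper) in HSV_RANGES.items():
    print(name, in_range(hsv_frame, lower, upper))
# blue [[255, 0, 0]]
# red [[0, 0, 255]]
# yellow [[0, 255, 0]]
```

With these bounds, a red cap whose hue falls near 0 rather than near 180 does not pass the red test. The red range therefore relies on hues at the upper end of the scale.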

3.5 Removing Noise and Binary Image Formation

After the three colours have been extracted, noise is removed so that only the fingertips are traced in the video. The filtered image is then used to locate the centres. Noise is removed with erosion followed by dilation.

Erosion wears away the boundaries of a foreground object. A pixel in the original image (either 1 or 0) is kept as 1 only if all the pixels under the kernel are 1; otherwise it is eroded (set to zero). Erosion removes white noise but also shrinks the object, so it is followed by dilation, the opposite operation. In dilation, a pixel becomes 1 if at least one pixel under the kernel is 1, which expands the white region, i.e. the foreground object. Because the noise has already been eroded away, it does not return, while the object grows back.

The results are shown in Figs. 6, 7, and 8, in which the blue, red, and yellow colours, respectively, are extracted from the images and converted into binary images. Combining the three extracted images into one frame gives the final extraction of all the colours in a single frame (shown in Fig. 9), which is used for the subsequent steps.

Fig. 6 Extracting blue colour in the frame
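Erosion followed by dilation (a morphological opening) can be sketched on a small binary mask as follows. The 3 × 3 square kernel is an assumption for illustration, as is the rule of ignoring neighbours outside the image. OpenCV provides the same operations as `cv2.erode` and `cv2.dilate`, with the kernel supplied by the caller.

```python
from typing import Callable, Iterable, List

Mask = List[List[int]]  # binary image: 1 = foreground (white), 0 = background


def _apply_kernel(mask: Mask, rule: Callable[[Iterable[bool]], bool]) -> Mask:
    """Slide a 3 x 3 square kernel over the mask; neighbours outside the image are ignored."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            window = (
                mask[ny][nx] == 1
                for ny in range(y - 1, y + 2)
                for nx in range(x - 1, x + 2)
                if 0 <= ny < rows and 0 <= nx < cols
            )
            out[y][x] = 1 if rule(window) else 0
    return out


def erode(mask: Mask) -> Mask:
    # A pixel stays 1 only if every pixel under the kernel is 1.
    return _apply_kernel(mask, all)


def dilate(mask: Mask) -> Mask:
    # A pixel becomes 1 if at least one pixel under the kernel is 1.
    return _apply_kernel(mask, any)


def remove_noise(mask: Mask) -> Mask:
    """Erosion followed by dilation (a morphological opening)."""
    return dilate(erode(mask))


noisy = [
    [1, 0, 0, 0, 0, 0, 0],   # isolated noise pixel in the corner
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],   # 3 x 3 "fingertip" region
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
for row in remove_noise(noisy):
    print(row)  # the corner pixel is gone; the 3 x 3 region is restored
```

Erosion alone shrinks the 3 × 3 region to its single centre pixel. Dilation then grows that pixel back to the full region, but it has nothing to regrow at the corner.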

Fig. 7 Extracting red colour in the frame
Fig. 8 Extracting yellow colour in the frame
Fig. 9 Extracting three colours together in a single frame

3.6 Find Contours and Draw Centroids

In this step, the contours in each mask are found according to the particular colour range. Contours whose area lies outside a given area range are discarded. Contours are curves joining all the continuous points along a boundary that have the same colour or intensity, and they are essential for object detection and recognition. The default contour area ranges (in pixels) for the default colours are:

red_area = [100, 1700]
blue_area = [100, 1700]
yellow_area = [100, 1700]

After all the contours in the frame have been found, only those whose area falls in the predefined range are kept, since these correspond to the fingertips. The centroid, or centre point, of each remaining contour is then computed from its image moments $M_{pq}$. The first-order moments are divided by the zeroth-order moment (the area), and the results are truncated to integers. The coordinates are stored in the variables cx and cy:

$$c_x = \left\lfloor \frac{M_{10}}{M_{00}} \right\rfloor, \qquad c_y = \left\lfloor \frac{M_{01}}{M_{00}} \right\rfloor$$

Using these x- and y-coordinates, a circle is drawn at the centroid of each contour, as shown in Fig. 10. This makes it visible that each colour has been identified correctly.

Fig. 10 Detecting and drawing centroids on the colours used to perform mouse action
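In OpenCV, this step would typically use `cv2.findContours`, `cv2.contourArea` and `cv2.moments`, with `cx = int(M['m10'] / M['m00'])`. The self-contained sketch below replaces contour tracing with a flood fill and uses each blob's pixel count as its area. This pixel count approximates, but does not exactly equal, `cv2.contourArea`. The area bounds default to the [100, 1700] range given above.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]  # non-zero = foreground (e.g. 255 from the colour mask)


def blob_centroids(mask: Mask, min_area: int = 100,
                   max_area: int = 1700) -> List[Tuple[int, int]]:
    """Return (cx, cy) for every 4-connected blob whose pixel count lies in [min_area, max_area]."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for sy in range(rows):
        for sx in range(cols):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Flood fill one blob while accumulating its raw moments.
            m00 = m10 = m01 = 0
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                m00 += 1   # zeroth moment: area in pixels
                m10 += x   # first moment in x
                m01 += y   # first moment in y
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if min_area <= m00 <= max_area:
                # Truncate to integers: cx = M10 / M00, cy = M01 / M00.
                centroids.append((int(m10 / m00), int(m01 / m00)))
    return centroids


# Toy 20 x 20 mask: a 12 x 12 fingertip blob plus a 2 x 2 speck of noise.
mask = [[0] * 20 for _ in range(20)]
for y in range(2, 14):
    for x in range(5, 17):
        mask[y][x] = 255
for y, x in ((17, 0), (17, 1), (18, 0), (18, 1)):
    mask[y][x] = 255
print(blob_centroids(mask))  # [(10, 7)]: the 4-pixel speck is below min_area
```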

3.7 Set Cursor Position

Once the centroids are found, the mouse control operations are defined. Different colour combinations can be chosen to perform different mouse actions. Any colour can be used to move the cursor; by default, the yellow fingertip cap sets the cursor position. The new cursor position is calculated from the centre of the yellow region and the old cursor position.

Because of camera noise and small hand motions, the centres flicker around a mean location. When these vibrations are scaled up to the screen, they degrade the precision of the cursor position. To eliminate this shakiness, a differential position allotment is used, in which the new centre is compared with the cursor's previous position. A difference of less than 5 pixels is usually noise, so the new cursor position is kept close to the previous one. When the difference between the new and old centre positions is large, it is treated as a deliberate mouse movement, and the cursor is moved much closer to the new centre position.
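The differential position allotment can be sketched as moving the cursor a fraction of the way towards the new centre. A small fraction is used for jitter under 5 pixels and a large fraction otherwise. The 5-pixel threshold comes from the text. The two fractions (0.2 and 0.8) and the use of Euclidean distance are hypothetical choices, since the paper does not state them.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def next_cursor(old: Point, centre: Point, threshold: float = 5.0,
                small_step: float = 0.2, large_step: float = 0.8) -> Tuple[int, int]:
    """Move the cursor part of the way from its old position towards the new centre.

    Movements below `threshold` pixels are treated as noise and damped heavily;
    larger movements follow the hand closely. The step fractions are
    hypothetical tuning values, not figures from the paper.
    """
    step = small_step if math.dist(old, centre) < threshold else large_step
    x = old[0] + step * (centre[0] - old[0])
    y = old[1] + step * (centre[1] - old[1])
    return round(x), round(y)


print(next_cursor((100, 100), (102, 101)))  # (100, 100): jitter is suppressed
print(next_cursor((100, 100), (200, 100)))  # (180, 100): real motion is followed
```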

3.8 Choose an Action

The action is determined from the relative positions of the three centroids. The possible actions are left click, right click, drag, scroll up, scroll down, and free cursor movement. For instance:
• If the distances between the yellow, red, and blue centroids are less than 50 pixels, a drag action is performed.
• If the distance between the red and blue centroids is less than 40 pixels, a right-click action is performed.
• If the distance between the yellow and red centroids is less than 40 pixels, a left-click action is performed.
• If the distance between the yellow and red centroids is greater than 40 pixels and the distance between the red and blue centroids is greater than 120 pixels, a scroll-down action is performed.
• If the distance between the yellow and red centroids is greater than 40 pixels and the distance between the blue and red centroids is greater than 120 pixels, a scroll-up action is performed.
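The rules above can be sketched as a function that checks them in the order listed. Two readings are assumptions. First, "drag" is taken to mean that all three pairwise distances are below 50 pixels. Second, the scroll-down and scroll-up conditions are worded identically, so the sketch reports a generic "scroll" rather than guessing a direction. The returned action names are hypothetical labels.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def choose_action(yellow: Point, red: Point, blue: Point) -> str:
    """Map the three fingertip centroids to a mouse action using the pixel thresholds of Sect. 3.8."""
    d_yr = math.dist(yellow, red)
    d_rb = math.dist(red, blue)
    d_yb = math.dist(yellow, blue)
    if max(d_yr, d_rb, d_yb) < 50:   # assumption: all three caps close together
        return "drag"
    if d_rb < 40:
        return "right_click"
    if d_yr < 40:
        return "left_click"
    if d_yr > 40 and d_rb > 120:
        return "scroll"              # direction not distinguishable from the stated rules
    return "move"                    # otherwise: free cursor movement


print(choose_action((10, 10), (20, 15), (15, 25)))  # drag
print(choose_action((0, 0), (100, 0), (130, 0)))    # right_click
print(choose_action((0, 0), (30, 0), (200, 0)))     # left_click
```

Checking drag first matters: when all three caps are pinched together, the right-click and left-click conditions also hold.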

3.9 Perform an Action

Once the action has been chosen, the corresponding mouse action is performed, such as free cursor movement, left click, drag, or right click. Webcams generally capture video at 640 × 480 pixels. Suppose a 1920 × 1080 display screen were mapped linearly onto this whole frame. A right-handed user would then find the left edge of the screen harder to reach than the right edge, and reaching the bottom half of the screen would strain the wrist. To improve comfort, the system does not map the complete video frame. Instead, it uses a rectangular section of the frame that is skewed towards the right (assuming a right-handed user) and towards the top. This 480 × 270 pixel sub-region is then mapped linearly onto the display screen with a scale factor of 4:
• Horizontal position of the cursor: cursor_position[0] = 4 × (color_position[0] − 110)
• Vertical position of the cursor: cursor_position[1] = 4 × (color_position[1] − 120)
The sub-region now covers the entire screen, so mouse actions can be performed anywhere on it. These include left click, drag, double left click, right click, scroll up, and scroll down.
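The mapping can be sketched as follows, using the scale factor of 4 and the offsets 110 and 120 from the formulas above. Clamping the result to the 1920 × 1080 screen is an added assumption. Without it, centroids outside the 480 × 270 sub-region would produce off-screen coordinates. The returned pair could then be passed to a cursor-moving call such as PyAutoGUI's `pyautogui.moveTo(x, y)`.

```python
from typing import Tuple

SCALE = 4                     # 480 x 270 sub-region -> 1920 x 1080 screen
X_OFFSET, Y_OFFSET = 110, 120
SCREEN_W, SCREEN_H = 1920, 1080


def to_screen(color_position: Tuple[int, int]) -> Tuple[int, int]:
    """Map a centroid in the 640 x 480 camera frame to screen coordinates."""
    x = SCALE * (color_position[0] - X_OFFSET)
    y = SCALE * (color_position[1] - Y_OFFSET)
    # Assumption: clamp to the screen so points outside the sub-region stay visible.
    return min(max(x, 0), SCREEN_W - 1), min(max(y, 0), SCREEN_H - 1)


print(to_screen((230, 255)))  # (480, 540)
print(to_screen((110, 120)))  # (0, 0): top-left corner of the sub-region
```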

4 Conclusion

This paper designed and built an object tracking-based virtual mouse application that uses a webcam. The proposed system uses a real-time camera to control the mouse cursor and perform mouse operations. The user communicates with the computer through fingertip movements in front of the camera rather than through a mouse, gloves, or markers. Cursor movement is supported, as are right click, left click, double click, scrolling, and dragging. The system uses image comparison and motion detection to perform pointer motions and other mouse activities. The demonstrated approach is accurate and practical for operating various applications and software. It overcomes shortcomings of most current virtual mouse systems. Its advantages include working under a variety of lighting conditions and against complicated backgrounds, as well as accurate fingertip tracking over longer distances. Some drawbacks remain, notably the inability to distinguish a fingertip colour when the same hue is present in the background. Future work will address these issues and improve the fingertip tracking algorithm. Further actions could also be added alongside cursor movement and the existing mouse functions by combining the three colours in different ways.


References
1. Sharma RP, Verma GK (2015) Human computer interaction using hand gesture. In: Eleventh international multi-conference on information processing-2015 (IMCIP-2015). Procedia Comput Sci 54:721–727
2. Tran NX (2009) Wireless data glove for gesture-based robotic control. In: 13th International conference on HCI, vol 5611, issue 1, pp 271–280
3. Haria A, Subramanian A, Asokkumar N, Poddar S, Nayak JS (2017) Hand gesture recognition for human computer interaction. In: 7th International conference on advances in computing & communications, ICACC-2017, 22–24 Aug 2017, Cochin, India. Procedia Comput Sci 115:367–374
4. Starner T, Pentland A (1997) Real-time American sign language recognition from video using hidden Markov models. In: Motion-based recognition. Springer, Berlin/Heidelberg, Germany, pp 227–243
5. Malima AK, Özgür E, Çetin M (2006) A fast algorithm for vision-based hand gesture recognition for robot control. In: Proceedings of the 2006 IEEE 14th signal processing and communications applications, Antalya, Turkey, pp 17–19
6. Tsai T-H, Huang C-C, Zhang K-L (2015) Embedded virtual mouse system by using hand gesture recognition. In: Proceedings of the 2015 IEEE international conference on consumer electronics-Taiwan (ICCE-TW), Taipei, Taiwan, pp 352–353
7. Sharp T, Freedman D, Wei Y, Kohli P (2015) Accurate, robust, and flexible real-time hand tracking
8. Wang P, Li W, Ogunbona P, Wan J, Escalera S (2018) RGB-D-based human motion recognition with deep learning: a survey
9. Deng L (2014) A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans Signal Inf Process 3:1–29
10. Tran D-S, Ho N-H, Yang H-J, Baek E-T, Kim S-H, Lee G (2020) Real-time hand gesture spotting and recognition using RGB-D camera and 3D convolutional neural network. School of Electronics and Computer Engineering, Chonnam National University, 77 Yongbong-ro, Gwangju 500-757
11. Mitra S, Acharya T (2007) Gesture recognition: a survey. IEEE Trans Syst Man Cybern Part C Appl Rev 37(3):311–324
12. Tran D-S, Ho N-H, Yang H-J, Kim S-H, Lee GS (2021) Real-time virtual mouse system using RGB-D images and fingertip detection. Multimedia Tools Appl, pp 10473–10490

Human Emotion Recognition Based on EEG Signal Using Deep Learning Approach

S. G. Shaila, A. Sindhu, D. Shivamma, V. Suma Avani, and T. M. Rajesh

Abstract Emotion recognition and classification now play a vital role in human–computer interaction (HCI). Emotions are commonly recognized from behaviour such as facial expressions, tone of voice, and body movement. The present research instead uses the electroencephalogram (EEG), one of the most widely used modalities for identifying emotions. EEG measures the electrical activity of the brain through a set of electrodes placed on the scalp. It is favoured for its high temporal resolution, lack of risk, and low cost. Over recent decades, many researchers have used EEG signals for brain–computer interfaces (BCI) and for emotion detection. A typical pipeline has four stages. It removes artifacts from the EEG signals, extracts temporal or spectral features, analyses them in the time or frequency domain, and finally applies a multi-class classification strategy. This paper presents an approach to identifying and classifying human emotions from EEG signals. It uses two deep learning models for classification: long short-term memory (LSTM) and gated recurrent units (GRUs). The experimental results are promising, with good emotion classification accuracy.

Keywords Emotions · EEG signal · Long-short term memory (LSTM) · Gated recurrent units (GRUs) · DEAP dataset · Classification

S. G. Shaila (B) · A. Sindhu · D. Shivamma · T. M. Rajesh Dayananda Sagar University, Bangalore, Karnataka, India e-mail: [email protected] D. Shivamma e-mail: [email protected] T. M. Rajesh e-mail: [email protected] V. Suma Avani Vijaya Institute of Technology for Women, Vijayawada, Andhra Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_44


1 Introduction

Emotions play an important role in human life and in communication between people. They are expressed in many ways, such as facial expressions, body movement, and speech. These expressions are not always genuine and can convey false emotions. Researchers have therefore turned to recognising human emotions from brain activity, since physiological brain signals reveal genuine emotional states. Brain activity can be monitored in several ways: functional magnetic resonance imaging (fMRI), electroencephalography (EEG), magnetoencephalography (MEG), and positron emission tomography (PET). EEG is the most widely adopted of these for recording brain signals. Researchers were initially interested in emotion recognition from facial expressions. Later, they found that the EEG approach can identify false expressions, and brain–computer interfaces (BCI) became an active research topic. Many companies and researchers use BCI because it is low-cost and risk-free. Most BCI systems rely on EEG signals, which are simple for ordinary users to record. Features extracted from brain-activity signals support emotion recognition. Emotion recognition has also gained importance because it can help physically disabled persons who cannot express their emotions.

This section reviews related work. Researchers have used various machine learning and deep learning techniques to predict internal human emotions. These include convolutional neural networks (CNN), recurrent neural networks (RNN), LSTM and bidirectional LSTM, and support vector machines (SVM).

Bazgir et al. [1] applied K-nearest neighbour, support vector machine, and artificial neural network (ANN) classifiers to an EEG dataset; the reported arousal, valence, and accuracy figures are 86.7%, 91.3%, and 83.5%. Reñosa et al. [2] applied an ANN classifier and achieved a comparatively low accuracy of 78%. Granados et al. [3] used the AMIGOS dataset, which contains emotion recordings of individuals and groups, with a deep convolutional neural network (DCNN). Alarcao and Fonseca [4] analyse the data in two modes, offline and online; in the online mode, the signal changes according to a music dataset. Ullah et al. [5] used the DEAP dataset, which contains recordings of 32 subjects with 8064 samples. They applied two algorithms, DNN and CNN, and achieved an accuracy of 73%. Blaiech et al. [6] proposed a fuzzy logic method that categorises emotions and computes the percentage of each emotion. Ding et al. [7] used EEG data from 18 healthy subjects to evaluate a proposed deep learning network for classifying low and high arousal states. Using SVM and LSTM, the method achieved a classification accuracy of 86.03%. Bird et al. [8] tested combinations of feature selection algorithms and classifiers on the WDBC dataset, with classifiers such as SVM, random forest, and Bayesian networks. They reached an accuracy of 84.96% under tenfold cross-validation. Qiao et al. [9] applied three classifiers (decision trees, KNN, and random forest) to two datasets, achieving 62.63% accuracy on DEAP and 74.85% on SEED. Gupta et al. [10] used the DEAP and SEED datasets with SVM and random forest classifiers, achieving accuracies of 83.33% and 59.06%. They used the flexible analytic wavelet transform (FAWT) for emotion identification.

This paper presents an approach for identifying emotions using recurrent neural network long short-term memory (RNN-LSTM) and gated recurrent unit (GRU) models, and compares the two. LSTM, a type of RNN, is employed because plain RNNs struggle to sustain long-term dependencies. An RNN is trained by back-propagation, and on long sequences this causes vanishing or exploding gradients. To avoid this, the plain RNN cell is replaced by a gated cell such as the LSTM cell. Its gates control which information is kept in memory and which is discarded, so the LSTM cell can carry memory through to the last time step. The gated recurrent unit (GRU) is a simplified version of the LSTM. It has two gates, a reset gate and an update gate, both of which output values between 0 and 1. The reset gate controls how much of the previous hidden state is used when computing the new candidate state. The update gate controls how much of the previous state is carried forward rather than replaced by that candidate. The rest of this paper is organized as follows: the proposed work is presented in the next section, followed by the experimental results, and the last section concludes the paper.
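A single GRU time step can be sketched with scalar weights (input and hidden size 1) to show how the two gates interact. The parameter names and zero-valued defaults are hypothetical. The sketch uses the convention h_t = (1 − z) · h_(t−1) + z · h̃. Some references and libraries swap the roles of z and 1 − z.

```python
import math
from dataclasses import dataclass


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


@dataclass
class GRUParams:
    """Hypothetical scalar weights for a GRU with input and hidden size 1."""
    w_z: float = 0.0  # update gate: input weight
    u_z: float = 0.0  # update gate: recurrent weight
    b_z: float = 0.0  # update gate: bias
    w_r: float = 0.0  # reset gate: input weight
    u_r: float = 0.0  # reset gate: recurrent weight
    b_r: float = 0.0  # reset gate: bias
    w_h: float = 0.0  # candidate state: input weight
    u_h: float = 0.0  # candidate state: recurrent weight
    b_h: float = 0.0  # candidate state: bias


def gru_step(x: float, h_prev: float, p: GRUParams) -> float:
    """Compute the next hidden state from input x and previous state h_prev."""
    z = sigmoid(p.w_z * x + p.u_z * h_prev + p.b_z)   # update gate, in (0, 1)
    r = sigmoid(p.w_r * x + p.u_r * h_prev + p.b_r)   # reset gate, in (0, 1)
    # The reset gate scales how much of h_prev feeds the candidate state.
    h_cand = math.tanh(p.w_h * x + p.u_h * (r * h_prev) + p.b_h)
    # The update gate blends the previous state with the candidate.
    return (1.0 - z) * h_prev + z * h_cand


# With all-zero weights both gates output 0.5 and the candidate is 0,
# so the state simply halves at each step.
h = 0.8
for x in (1.0, -1.0, 0.5):
    h = gru_step(x, h, GRUParams())
print(h)  # 0.1
```

A large positive update-gate bias pushes z towards 1, so the new state is essentially the candidate. A large negative bias keeps the previous state almost unchanged.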

2 Proposed Work

This section presents the proposed model, shown in Fig. 1. The approach takes an EEG dataset and a demographic dataset as input. The data are preprocessed, followed by feature selection and extraction. Cross-validation is used to split the data into training and validation sets. Classification is performed with deep learning techniques, and performance is then evaluated.
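Cross-validation splitting can be sketched with a plain k-fold index generator. The number of folds (5 by default) and the use of contiguous, unshuffled folds are assumptions, since the paper does not specify its scheme. Scikit-learn's `KFold` provides the same kind of split with optional shuffling.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs; every sample is used for validation exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


for train, val in k_fold_indices(10, k=3):
    print(len(train), val)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```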

2.1 Dataset Description

The EEG dataset contains signals recorded from ten students while they watched MOOC video clips. Videos on basic algebra or geometry were assumed not to be confusing to the students, and clips were extracted from these. Other videos covered topics such as quantum physics and vegetative cells, which were assumed to be confusing to the students, and clips were recorded from these as well. In total, the dataset contains 20 videos: ten confusing and ten non-confusing. Each video clip was about two minutes long, and the clips were cut from the middle of a topic to make them more confusing for the students. The students wore a wireless MindSet headset that measures brain activity. It records the voltage between electrodes resting on the forehead and electrodes connected to the ear. Confusion was rated on a scale of 1–7, where 1 corresponds to the least confusion and 7 to the most. Table 1 gives the details of the EEG dataset used by the proposed approach.

The demographic dataset describes the subjects of the EEG dataset: subject ID, age, gender, and ethnicity. It is sample data that can be created as convenient. Demographic data make the complex data easier to analyse; for example, data can be retrieved by gender instead of by subject ID.

Fig. 1 Proposed model for identifying emotions

Table 1 Details of the EEG dataset
Dataset | Attributes | Instances | Classes
Electroencephalography (EEG) | 16 | 12,811 | 2

2.2 Data Preprocessing
In preprocessing, the datasets are combined into a single dataset. The approach uses pandas to merge them into one data frame. Some features may seriously mislead the model, namely subject ID and video ID. These IDs carry information unrelated to the EEG brainwaves. They would also cause the model to overfit, because each of the ten students has ten clips and each 60-s clip is divided into 0.5-s samples. The model might therefore learn these IDs, and its predictions could be dominated by them. The ‘predefined label’ indicates which confusion state the experiment conductor expected to detect before the test. The approach does not need this feature, so it is removed. The approach targets the ‘user-defined label’, because this label indicates whether a signal is correlated with a confusion state. After cleaning, 12,811 rows with 15 columns remain out of more than 13,000 samples, and this dataset is used for further analysis.
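The merge-and-drop steps above can be sketched with pandas on toy stand-in tables. The column names (`SubjectID`, `VideoID`, `predefinedlabel`, `userdefinedlabel`, `Alpha`, `Theta`, `age`, `gender`, `ethnicity`) and all values are hypothetical placeholders. They are not the dataset's actual schema.

```python
import pandas as pd

# Hypothetical toy stand-ins for the EEG and demographic tables.
eeg = pd.DataFrame({
    "SubjectID": [0, 0, 1],
    "VideoID": [0, 1, 0],
    "Alpha": [0.31, 0.27, 0.44],
    "Theta": [0.12, 0.18, 0.09],
    "predefinedlabel": [0, 1, 0],
    "userdefinedlabel": [0, 1, 1],
})
demo = pd.DataFrame({
    "SubjectID": [0, 1],
    "age": [25, 24],
    "gender": ["M", "F"],
    "ethnicity": ["Han Chinese", "Bengali"],
})

# Attach each subject's demographics to every EEG sample of that subject.
data = eeg.merge(demo, on="SubjectID", how="left")

# Drop identifiers the model could memorise, plus the experimenter's predefined label.
data = data.drop(columns=["SubjectID", "VideoID", "predefinedlabel"])

# The user-defined label is the prediction target; everything else is input.
X = data.drop(columns=["userdefinedlabel"])
y = data["userdefinedlabel"]
```

A left merge keeps every EEG row and their original order, so each 0.5-s sample simply gains its subject's demographic columns.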

2.3 Feature Selection and Extraction
Feature extraction is a crucial step in EEG signal classification. The proposed approach mainly focuses on the gender and ethnicity features, and it encodes gender as 1 for male and 0 for female. The data contains 16 features in total. Five are categorical features, such as age, gender, and ethnicity (English, Bengali, and Han Chinese), and the remaining 11 are continuous features. EEG signals are divided into bands: theta (4–8 Hz), alpha (8–16 Hz), beta (16–32 Hz), gamma (32–64 Hz), and noise (>64 Hz), as represented in Fig. 2. These are the five main features used in the proposed approach. The dataset is split into training, validation, and test sets using a 70–20–10 ratio, and each set is then split into input X and target Y. The attributes differ in their standard deviations, so some attributes end up weighted more heavily than others.

Human Emotion Recognition Based on EEG Signal Using Deep …

Fig. 2 Features selection
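The encoding and splitting steps above can be sketched in plain Python. Several details are assumptions: the gender codes `'M'`/`'F'`, the shuffling seed, and the z-score standardization. The text identifies the unequal-spread problem but does not name a remedy. Standardizing each column with training-set statistics is one common choice.

```python
import random
import statistics


def encode_gender(g):
    """1 for male, 0 for female (input codes 'M'/'F' are assumed)."""
    return 1 if g == "M" else 0


def split_70_20_10(rows, seed=0):
    """Shuffle, then split into training (70%), validation (20%) and test (10%) sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 7 // 10  # integer arithmetic avoids float rounding
    n_val = len(rows) * 2 // 10
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def standardize(values, train_values):
    """Z-score `values` using the mean and standard deviation of the training data."""
    mu = statistics.fmean(train_values)
    sd = statistics.pstdev(train_values) or 1.0  # guard against a constant column
    return [(v - mu) / sd for v in values]
```

For the 12,811 cleaned rows, this split gives 8967 training, 2562 validation and 1282 test rows. Fitting the mean and deviation on the training set only keeps information from the validation and test sets out of training.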

2.4 Classification
The dataset includes EEG signal features such as alpha, gamma, and delta. It contains 16 features in total, divided into five categorical and 11 continuous features for better prediction and to reduce the complexity of the dataset. The models are trained to identify whether or not students are confused. Initially, the proposed approach uses an LSTM model trained for 100 epochs, with 70% of the samples used for training and 30% for validation. In the next stage, the LSTM model is trained for 50 epochs with 80% of the samples for training and 20% for validation. Finally, a GRU model is trained for 50 epochs with a 70:30 split, 70% for training and 30% for testing. LSTM and GRU models are thus used to build the predictive models for classification. LSTM is an advanced version of the recurrent neural network (RNN), designed mainly to solve the vanishing and exploding gradient problem of RNNs. It remembers relevant past data and forgets irrelevant data. The working architecture of the LSTM is shown in Fig. 3. GRU is a simplified version of the LSTM that uses two gate vectors and one state vector, and it also aims to solve the vanishing gradient problem. The working architecture of the GRU is shown in Fig. 4.

Fig. 3 LSTM architecture

Fig. 4 GRU architecture
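The LSTM gating described above can be illustrated with a plain-Python sketch of one time step, using the standard input (i), forget (f) and output (o) gates and a candidate memory g. The parameter names are hypothetical, and this is not the authors' code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; returns (new hidden state, new cell state)."""

    def affine(k):
        # W_k x + U_k h_prev + b_k for gate or candidate k (hypothetical names)
        wx, uh = matvec(p["W_" + k], x), matvec(p["U_" + k], h_prev)
        return [a + b + c for a, b, c in zip(wx, uh, p["b_" + k])]

    i = [sigmoid(v) for v in affine("i")]    # input gate: how much new content to write
    f = [sigmoid(v) for v in affine("f")]    # forget gate: how much old memory to keep
    o = [sigmoid(v) for v in affine("o")]    # output gate: how much memory to expose
    g = [math.tanh(v) for v in affine("g")]  # candidate memory content
    # The cell state carries long-term memory and is updated additively.
    c = [fi * cp + ii * gi for fi, cp, ii, gi in zip(f, c_prev, i, g)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c
```

If all weights are zero, every gate is 0.5 and the candidate is 0, so the cell state is halved. This shows how the forget gate scales the memory that is retained. The additive cell update is what lets gradients survive over long sequences.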

3 Results and Discussion
The experiments are evaluated with two models, LSTM and GRU. The 12,811 samples are first divided 70:30 (70% for training and 30% for testing) and then 80:20, and the models are trained for 50 epochs. After the training data is modeled, the proposed approach uses the confusion matrix to evaluate classification performance in terms of true positive, false positive, true negative, and false negative counts. From these counts, classification accuracy, precision, recall, and F1-score are computed for both classifiers. For each model, classification accuracy is measured by comparing the predicted labels with the true labels. Eleven feature combinations were evaluated to find the best outcome. The performance metrics of the proposed technique for the LSTM and GRU classifiers are shown in Table 2.

Table 2 Performance evaluation on testing data
Algorithm | Accuracy | F1-score | Precision | Recall
LSTM (70:30) | 0.82 | 0.80 | 0.85 | 0.81
LSTM (80:20) | 0.83 | 0.83 | 0.88 | 0.80
GRU (70:30) | 0.85 | 0.84 | 0.89 | 0.81

As Table 2 shows, the GRU model achieves the highest accuracy (0.85), and the best LSTM configuration (80:20 split) attains an accuracy of 0.83. The confusion matrix for the model with the best accuracy is shown in Fig. 5. These results indicate that the deep learning approach performs well in emotion recognition based on EEG signals.

Fig. 5 Confusion matrix
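All four metrics follow directly from the confusion-matrix counts. Below is a minimal sketch for the binary confused/not-confused task. It treats label 1 as the positive class, which is an assumption. The text does not say whether the reported scores are for one class or averaged over both.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Take true labels [1, 1, 1, 0, 0, 0, 1, 0] and predictions [1, 1, 0, 0, 1, 0, 1, 0]. These give TP = 3, FP = 1, TN = 3 and FN = 1, and accuracy, precision, recall and F1 are all 0.75.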

4 Conclusion and Future Work
This paper focuses on emotion classification based on EEG signals. The proposed system uses LSTM and GRU models to identify emotions from the five signals extracted from an EEG sensor. Using both categorical and continuous features, the LSTM model achieves an accuracy of 85.62% and the GRU model 82.53% for the emotions happy, sad, and neutral. Future work will focus on mixed emotions, which are complex in nature, such as happily surprised, sadly angry, and angrily fearful, and will compare different algorithms.


References
1. Bazgir O, Mohammadi Z, Habibi SAH (2018) Emotion recognition with machine learning using EEG signals. In: International Iranian conference on biomedical engineering (ICBME), pp 1–5. https://doi.org/10.1109/ICBME.2018.8703559
2. Reñosa CRM, Bandala AA, Vicerra RRP (2019) Classification of confusion level using EEG data and artificial neural networks. In: 2019 IEEE 11th international conference on humanoid, nanotechnology, information technology, communication and control, environment, and management (HNICEM), pp 1–6. https://doi.org/10.1109/HNICEM48295.2019.9072766
3. Santamaria-Granados L, Munoz-Organero M, Ramirez-González G, Abdulhay E, Arunkumar N (2019) Using deep convolutional neural network for emotion detection on a physiological signals dataset (AMIGOS). IEEE Access 7:57–67. https://doi.org/10.1109/ACCESS.2018.2883213
4. Alarcao SM, Fonseca MJ (2019) Emotions recognition using EEG signals: a survey. IEEE Trans Affect Comput 10:374–393. https://doi.org/10.1109/TAFFC.2017.2714671
5. Ullah H, Uzair M, Mahmood A, Ullah M, Khan SD, Cheikh FA (2019) Internal emotion classification using EEG signal with sparse discriminative ensemble. IEEE Access 7:40144–40153. https://doi.org/10.1109/ACCESS.2019.2904400
6. Blaiech H, Neji M, Wali A, Alimi AM (2013) Emotion recognition by analysis of EEG signals. In: 13th International conference on hybrid intelligent systems (HIS 2013), pp 312–318. https://doi.org/10.1109/HIS.2013.6920451
7. Ding Y, Robinson N, Zeng Q, Chen D, Wai AP, Lee T-S, Guan C (2020) TSception: a deep learning framework for emotion detection using EEG. IEEE. https://doi.org/10.1109/IJCNN48605.2020.9206750
8. Bird JJ, Manso LJ, Ribeiro EP, Ekart A, Faria DR (2018) A study on mental state classification using EEG-based brain machine interface. IEEE. https://doi.org/10.1109/IS.2018.8710576
9. Qiao C, Xu X, Cheng Y (2019) Interpretable emotion recognition using EEG signal. IEEE Access 7:94160–94170. https://doi.org/10.1109/ACCESS.2019.2928691
10. Gupta V, Chopda MD, Pachori RB (2018) Cross-subject emotion recognition using flexible analytic wavelet transform from EEG signals. IEEE Sens J 19:2266–2274. https://doi.org/10.1109/JSEN.2018.2883497

Sentiment Analysis of COVID-19 Tweets Using BiLSTM and CNN-BiLSTM
Tushar Srivastava, Deepak Arora, and Puneet Sharma

Abstract In a society where people express almost every thought they have on social media, analysing social media for sentiment has become very important for understanding what the masses are thinking. This is especially true of microblogging websites like Twitter, where highly opinionated individuals come together to discuss ongoing socioeconomic and political events in their own countries and around the world. Analysing the vast amounts of data generated every day requires an efficient model, i.e., one with a short running time and high accuracy, and sentiment analysis has become extremely useful in this regard. A model trained on a dataset of tweets can help determine people’s general sentiment towards a particular topic. This paper proposes a bidirectional long short-term memory (BiLSTM) network and a convolutional bidirectional long short-term memory (CNN-BiLSTM) network to classify tweet sentiment into three categories: positive, neutral and negative. Specialized text representations such as Word2Vec embeddings or term frequency-inverse document frequency (tf-idf) were avoided. The aim of this paper is to analyse the performance of deep neural network (DNN) models where traditional classifiers like logistic regression and decision trees fail. The results show that the BiLSTM model predicts with an accuracy of 0.84 and the CNN-BiLSTM model with an accuracy of 0.80.
Keywords Twitter sentiment analysis · COVID-19 sentiment analysis · BiLSTM · CNN-BiLSTM

T. Srivastava (B) · D. Arora
Department of Computer Science & Engineering, Amity University, Lucknow Campus, Lucknow, Uttar Pradesh, India
P. Sharma
Department of CSE, Amity University, Lucknow, Uttar Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_45


T. Srivastava et al.

1 Introduction
On 11 March 2020, the WHO declared coronavirus disease (COVID-19) a pandemic, and the rest is history. Millions died, children were orphaned, and multiple lockdowns took place. In such times of distress, people turned to social media for help, whether to arrange a particular medicine or the blood plasma of survivors, or just to check on friends and family. The microblogging site Twitter generated a massive amount of data during the peak of the pandemic. The tweets ranged from helpful information on dealing with the infection to misinformation, fear-mongering and fake news. Sentiment analysis is a branch of natural language understanding in which text is assigned to a specific category. The categories can be as simple as positive/negative or as complex as human emotions such as sadness, anger and happiness. In essence, sentiment analysis is a classification problem for text. Analysing the sentiment of social platforms is extremely important today. Unlike in the past, the whole world is now connected via the Internet, and crucial information shared online can be helpful in many ways. Sentiment analysis of a microblogging website like Twitter could have helped governments analyse where the virus had spread and map its movement. For example, a decrease in negative COVID-19 tweets from one area and an increase in another could have informed government policy-making. Similarly, sentiment analysis could have helped gauge people’s apprehension towards vaccines, so that government programmes could be formulated to spread better awareness. In this research, a COVID-19 tweet dataset has been used to evaluate the performance of two DNN models and several traditional classifiers. The dataset contained both clean and unclean data. It was pre-processed and lemmatized, and exploratory data analysis (EDA) was performed. Feature engineering was avoided in order to obtain an unbiased measure of the neural networks’ accuracy. The main evaluation metric is accuracy, but precision, recall and F1-score were also considered. The results showed that the neural networks performed much better than traditional classifiers such as logistic regression, the multinomial naive Bayes classifier, random forest and decision tree.

2 Literature Study
As researchers recognize the wide range of applications of sentiment analysis, the number of research works in this field is increasing rapidly. Some related works are briefly summarized below. Kouloumpis et al. [1] explored the importance of feature engineering in sentiment analysis. The dataset comprised two types of data: emoticon-based tweets and hashtag-based tweets. Feature engineering techniques such as part-of-speech features, n-grams, lexicon features and microblogging features were applied to both datasets.

Sentiment Analysis of COVID-19 Tweets Using BiLSTM …


The results showed that part-of-speech features were not useful on their own for sentiment analysis of Twitter. Part-of-speech features combined with lexicon and microblogging features gave the best results. Mahbub et al. [2] used a COVID-19 tweet dataset and analysed the importance of word representation in sentiment analysis. The dataset was vectorized using bag-of-words and tf-idf models, and classifiers such as logistic regression, SVM and XGBoost were used for evaluation. The evaluation metrics were accuracy, precision, recall and F1-score, with accuracy as the primary determinant of a model’s performance. The results showed that random forest was the best model with both the bag-of-words and the tf-idf datasets, while XGBoost was the worst. Bag-of-words gave higher accuracy than tf-idf with both random forest and XGBoost. Gopalakrishnan and Salem [3] analysed LSTM and bidirectional LSTM models with various parameters. Their objective was to find a model with low training time and low loss. They used the GOP debate Twitter dataset and explored different combinations of optimizers and models built from LSTM and CNN layers in different orders. They concluded that models with a CNN layer followed by an LSTM layer performed better than the rest. Rhanoui et al. [4] studied document-level sentiment analysis using CNN-LSTM, CNN, LSTM and BiLSTM models on a dataset created by extracting tweets from Twitter. Three types of embeddings, including Word2Vec and GloVe, were applied to the dataset. The results showed that CNN-BiLSTM with the Doc2Vec-embedded dataset had the highest accuracy, 90.66%. Shazley et al. [5–7] showed that feature selection is crucial in sentiment analysis.
The researchers used an Arabic-language dataset and proposed an algorithm called HWPGA, which combines the grey wolf optimizer (GWO), particle swarm optimizer (PSO) and genetic algorithm (GA). HWPGA pre-processes the dataset and then combines these optimizers to extract the best features. A bidirectional recurrent neural network (BRNN) was then used to evaluate the algorithm. The study concluded that the BRNN model was more accurate than SVM and KNN, and that HWPGA achieved higher accuracy than GWO, PSO or GA individually.

3 Methods
Much previous research on sentiment analysis has focused on feature engineering and feature selection. This paper focuses on DNN modelling instead. The problem with vectorizers like bag-of-words or tf-idf is that they considerably increase the dimensionality of the dataset, which makes the whole prediction process very time-consuming and costly. With these models, finding the right vocabulary size is crucial. Otherwise, efficiency suffers due to the curse of dimensionality. The data analysed here comes from two sources: one of tweets related to COVID-19 disease and one of tweets related to COVID-19 vaccines. Both datasets were already annotated with sentiment. The datasets were cleaned and combined into a single dataset, which was then pre-processed through lemmatization, tokenization and padding. The dataset was divided into X and y. X contains the features, i.e., the tokenized and padded tweets, and y contains the labels, which are of three types: positive, negative and neutral. The dataset was then split into training and test sets, with 80% of the data used to train the model and the remaining 20% used for testing. A bidirectional long short-term memory (BiLSTM) network was modelled to classify the tweets into the three categories above. A convolutional bidirectional long short-term memory (CNN-BiLSTM) network was also modelled for the same purpose. The evaluation metrics were accuracy, precision, recall and F1-score, with accuracy as the primary determinant of the model’s performance. Since the dataset was balanced, accuracy was a reliable measure. Traditional classifiers such as logistic regression, decision tree, multinomial Naïve Bayes and random forest were then used for comparison with the deep neural networks. Figure 1 illustrates the approach adopted in this research. The dataset was first taken through the pre-processing steps shown and then underwent the different types of classification shown in the figure.

4 Implementation
The research was implemented in the Python programming language on Google Colaboratory.

4.1 Data Collection and Preparation
The datasets used here were collected from Kaggle.com. One dataset consisted of tweets posted from India between 23 March 2020 and 15 July 2020. These tweets were annotated with the emotions sadness, anger, joy and fear. For this research, sadness, anger and fear were treated as negative and joy as positive. The other dataset contained tweets related to COVID-19 vaccines such as Pfizer, AstraZeneca and Sputnik V, annotated as 1 for negative, 2 for neutral and 3 for positive. The two datasets were combined into a single dataset with 9090 rows and 2 columns. The raw text needed cleaning. It contained special characters such as #, *, ^ and %, as well as emoticons, which were removed. It also contained Twitter handles starting with @. Since this research did not need Twitter usernames, words starting with @ were removed. Finally, URLs were also removed.

Fig. 1 Graphical representation of the approach of this research
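The label mapping and cleaning steps above can be sketched with Python's `re` module. The regular expressions are assumptions about what counts as a handle, a URL or a special character. They keep only ASCII letters, digits and whitespace, which also strips emoticons and emoji. They are not the authors' exact patterns.

```python
import re

# Emotion labels from the India tweets dataset, mapped to sentiment as described.
EMOTION_TO_SENTIMENT = {"sadness": "negative", "anger": "negative",
                        "fear": "negative", "joy": "positive"}
# Numeric codes from the vaccine tweets dataset.
CODE_TO_SENTIMENT = {1: "negative", 2: "neutral", 3: "positive"}

MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+|www\.\S+")
NON_TEXT = re.compile(r"[^A-Za-z0-9\s]")  # '#', '*', '^', '%', emoji, punctuation


def clean_tweet(text):
    """Remove @handles, URLs and special characters, then collapse whitespace."""
    text = MENTION.sub(" ", text)   # handles first, so no stray '@' remains
    text = URL.sub(" ", text)       # URLs before NON_TEXT, which would fragment them
    text = NON_TEXT.sub(" ", text)
    return " ".join(text.split())
```

For example, `clean_tweet("@who Get vaccinated! https://t.co/x1 #covid19")` returns `"Get vaccinated covid19"`. The order of the substitutions matters. Removing punctuation first would break URLs into word-like fragments that survive cleaning.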

4.2 Lemmatization
Lemmatization is a text transformation process in which words are converted into their root forms. Unlike stemming, lemmatization always produces a valid word, and it also takes the context of the sentence into account when finding the root word. The lemmatizer used here is the WordNet lemmatizer, which cross-references words against WordNet’s own large lexical database. Stop words were also removed from the dataset during this phase. Stop words are common English words that add little real meaning to a sentence and mainly aid human readability. For example, ‘What is the time?’ can be reduced to ‘What time’, where ‘is’ and ‘the’ are stop words. Figure 2 shows a tweet before and after lemmatization.
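The toy sketch below illustrates stop-word removal followed by dictionary-based lemmatization. The tiny `STOP_WORDS` set and `TOY_LEMMAS` table are hypothetical stand-ins. In practice, a full stop-word list and WordNet would be used, for example through NLTK's `WordNetLemmatizer`, whose `lemmatize(word, pos)` method looks words up in WordNet after the WordNet data has been downloaded.

```python
# Hypothetical miniature resources, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}
TOY_LEMMAS = {"cases": "case", "vaccines": "vaccine", "rising": "rise",
              "died": "die", "children": "child"}


def lemmatize_and_filter(text):
    """Lower-case, drop stop words, and map each remaining word to its lemma."""
    tokens = text.lower().split()
    return [TOY_LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]
```

With these tables, "What is the time" becomes `["what", "time"]`, which matches the example in the text, and "The cases are rising" becomes `["case", "rise"]`. Words missing from the table pass through unchanged.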


Fig. 2 Tweet before and after lemmatization

Fig. 3 Tweet before and after tokenization and padding

4.3 Tokenization and Padding
Computers cannot read as humans do. They operate only on numbers (ultimately 0s and 1s), hence the need to tokenize words. Tokenization is a crucial step in natural language understanding and processing. A tokenizer breaks a sentence into words and converts them into a numeric representation. At this stage, word embedding models such as Word2Vec or GloVe can play an important role by placing words with similar meanings close to each other. In this research, the built-in Keras tokenizer is used. It first builds a vocabulary based on how many times each word appears in the dataset, and the higher a word's frequency, the lower its index. It then assigns each word an integer value based on its rank in the vocabulary. The next step is to pad the sentences. Neural networks take a fixed-size input, so all sentences must have the same length. To achieve this, zeros are added before or after each sentence. Here, the padding type is ‘post’, i.e., zeros are added after the sentences. Figure 3 illustrates a tweet before and after tokenization and padding. Each word is assigned a unique number, and the remaining positions are filled with zeros up to the padding size.
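The frequency-ranked vocabulary and ‘post’ padding described above can be imitated in a few lines of plain Python. This is a simplified sketch, not the Keras implementation. It splits on whitespace, reserves index 0 for padding and breaks frequency ties by first appearance. It also truncates over-long sequences at the end, which is an assumption, since Keras's own defaults may differ.

```python
from collections import Counter


def build_vocab(texts):
    """Map each word to an index: the more frequent the word, the lower the index."""
    counts = Counter(w for t in texts for w in t.split())
    # sorted() is stable, so ties keep first-seen order; index 0 is reserved for padding.
    ranked = sorted(counts, key=lambda w: -counts[w])
    return {w: i for i, w in enumerate(ranked, start=1)}


def texts_to_sequences(texts, vocab):
    """Replace each known word with its integer index."""
    return [[vocab[w] for w in t.split() if w in vocab] for t in texts]


def pad_post(seqs, maxlen):
    """'post' padding: append zeros after each sequence (excess is cut from the end)."""
    return [s[:maxlen] + [0] * (maxlen - len(s[:maxlen])) for s in seqs]
```

For the texts "covid cases rise" and "covid vaccine", the vocabulary is covid → 1, cases → 2, rise → 3 and vaccine → 4. Padding to length 5 then gives [[1, 2, 3, 0, 0], [1, 4, 0, 0, 0]].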

4.4 Classification
Classification is one of the most important tasks that can be accomplished with machine learning. It involves predicting the class of a data point in the feature dataset. Classification between two classes is known as binary classification, and classification among more than two classes is known as multiclass classification. This research involves multiclass classification: the feature dataset contains tweets, and the task is to classify them into three classes, namely positive, neutral and negative. The label dataset is one-hot encoded, i.e., the classes are represented by 1s and 0s. For example, if a tweet belongs to the positive class, the positive class has a value of 1 and the other classes have a value of 0. One-hot encoding keeps the labels nominal. If they were label encoded as integers, they would appear ordinal. For classification, the data is split into a training set and a test set in an 80:20 ratio, i.e., 80% of the data is used for training and 20% for testing. A separate validation set comprising 20% of the data is also used, so that the test set remains untouched until the final prediction.
BiLSTM Neural Network
Recurrent neural networks (RNNs) are feedback neural networks, i.e., the output of the network is fed back into its input. They have two sets of weights: one applied to the current input and one applied to the previous output (the hidden state). At each time step, the new state is computed from both. Because each output depends on earlier time steps, the network preserves a state. Loosely speaking, it has a memory. The problem with RNNs is that after many time steps, little trace of the initial input remains in the network, which causes problems with long sequences. This is commonly referred to as the short-term memory problem of RNNs. Long short-term memory (LSTM) networks address this problem. They are a modified version of the RNN with a long-term memory state that prevents excessive information loss.
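One-hot encoding and the 80:20 split above can be sketched as follows. The class order and the random seed are assumptions.

```python
import random

CLASSES = ["negative", "neutral", "positive"]  # assumed column order


def one_hot(label):
    """Return a vector with 1 in the label's position and 0 elsewhere."""
    return [1 if c == label else 0 for c in CLASSES]


def split_80_20(items, seed=42):
    """Shuffle, then keep 80% for training and 20% for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = len(items) * 8 // 10  # integer arithmetic avoids float rounding
    return items[:cut], items[cut:]
```

For the 9090-row dataset, the split gives 7272 training and 1818 test rows. Each label becomes a vector such as [0, 0, 1] for positive, so no class appears numerically "larger" than another.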
The BiLSTM neural network used here is described below.
Architecture
• The first layer of the network is a Keras embedding layer, which converts the tokenized words into vectors that the BiLSTM layer can process.
• The second layer is a Keras bidirectional LSTM layer. A BiLSTM layer is used where the whole input sequence matters, as in sentiment analysis. It consists of two LSTM layers whose outputs are combined: one processes the sentence from left to right and the other from right to left. Since the dataset was not very large, the model overfitted the training data and had to be regularized, and L2 regularization was used for this.
• The third layer is a Keras dropout layer, which randomly discards a specified fraction of neurons during training and forces the model to generalize better.
• The final layer is a Keras dense layer with a softmax activation function that produces the output:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}} \tag{1}$$
Here, n is the number of neurons in the output layer, one per class.


• The optimizer used was stochastic gradient descent with an adaptive learning rate, implemented with a callback and a learning rate scheduler function. The formula for this step decay function is:
$$\text{Learning Rate} = \text{Initial Learning Rate} \times \text{Drop}^{\left\lfloor \frac{1 + \text{Epoch}}{\text{Epochs per Drop}} \right\rfloor} \tag{2}$$

• Since this is a multiclass classification problem, the loss function used was categorical cross-entropy:
$$\text{Loss} = -\sum_{i=1}^{\text{output size}} y_i \log \hat{y}_i \tag{3}$$

Here, y_i is the actual output and ŷ_i is the predicted output.
CNN-BiLSTM Neural Network
Convolutional neural networks are feedforward neural networks, in which each layer receives the outputs of the previous layer. They are mainly used in computer vision. Because they work by learning patterns, they also have applications in natural language processing.
Architecture
• As in the BiLSTM model, the first layer is a Keras embedding layer that creates vectors from the tokenized feature data.
• The second layer is a Keras Conv1D layer that identifies patterns in the input data. The kernel size is 3, the padding parameter is ‘same’, and the activation function is the exponential linear unit (ELU), where α is a positive constant:
$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases}$$
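Equations (1)–(3) and the ELU activation can be checked numerically with a short plain-Python sketch. The `step_decay` defaults (initial rate 0.1, drop 0.5, 10 epochs per drop) and `alpha = 1.0` are hypothetical placeholders, because the text does not report the values used. The softmax subtracts the largest logit first, which leaves the result unchanged but avoids overflow.

```python
import math


def softmax(z):
    """Eq. (1): exponentiate each logit and normalise so the outputs sum to 1."""
    m = max(z)  # shifting by the max does not change the result
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def step_decay(epoch, initial_lr=0.1, drop=0.5, epochs_per_drop=10):
    """Eq. (2): multiply the rate by `drop` once every `epochs_per_drop` epochs."""
    return initial_lr * drop ** math.floor((1 + epoch) / epochs_per_drop)


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Eq. (3): y_true is one-hot; eps avoids log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (e^x - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```

With these defaults, the learning rate is 0.1 for epochs 0–8, 0.05 for epochs 9–18, and so on. For a one-hot target, the cross-entropy reduces to minus the log of the probability assigned to the true class. For example, a predicted probability of 0.7 on the true class gives a loss of −log 0.7.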

… W1 > W3, with the condition W1 + W2 + W3 = 1, to give more importance to the energy parameter during CH reselection. This parameter condition for the selection process was identified through empirical evaluation.
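The weighted multi-criteria ranking behind COPRAS-CHS can be illustrated with a minimal sketch of the standard COPRAS (COmplex PRoportional ASsessment) procedure. Each criterion is sum-normalized and weighted. The benefit part (S+) and cost part (S−) are then totalled, and each alternative gets a relative significance Q and a utility degree. Much of the example is hypothetical: the three criteria, the weights 0.5/0.3/0.2 (summing to 1, with energy weighted most), and the node values. The criteria are residual energy as a benefit, and distance to the centroid and the number of times already selected as CH as costs. The text does not give its exact assignment of criteria to weights.

```python
def copras_utilities(matrix, weights, benefit):
    """Return the COPRAS utility degree (%) of each alternative (row of `matrix`).

    weights: criterion weights summing to 1.
    benefit: True where larger is better, False for cost criteria.
    Assumes positive values and at least one cost criterion (so S- > 0).
    """
    n = len(weights)
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    s_plus, s_minus = [], []
    for row in matrix:
        # Sum-normalise each criterion, then apply its weight.
        d = [weights[j] * row[j] / col_sums[j] for j in range(n)]
        s_plus.append(sum(v for v, b in zip(d, benefit) if b))
        s_minus.append(sum(v for v, b in zip(d, benefit) if not b))
    # Relative significance: Q_i = S+_i + sum(S-) / (S-_i * sum(1 / S-_j)).
    total_minus = sum(s_minus)
    inv_sum = sum(1.0 / s for s in s_minus)
    q = [sp + total_minus / (sm * inv_sum) for sp, sm in zip(s_plus, s_minus)]
    q_max = max(q)
    return [100.0 * qi / q_max for qi in q]


# Hypothetical candidates: [residual energy (J), distance to centroid (m), times already CH]
nodes = [[1.8, 20.0, 1], [1.2, 10.0, 2], [0.9, 30.0, 3]]
utilities = copras_utilities(nodes, weights=[0.5, 0.3, 0.2], benefit=[True, False, False])
cluster_head = max(range(len(nodes)), key=utilities.__getitem__)
```

Here node 0, which has the most residual energy, receives a utility of 100% and would be chosen as CH. Node 1 ranks second because it is closest to the centroid, despite having less energy.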

4 Results and Discussion
Simulations of the proposed COPRAS-CHS and the competing CH selection approaches, namely fuzzy-TOPSIS, BFSA, P-WWO and PSO-CHS, are conducted in MATLAB R2016a. The proposed COPRAS-CHS scheme is evaluated in a network scenario of 200 sensor nodes randomly deployed in a homogeneous environment with an initial energy of 2.0 J [21–24]. Specifically, 5000 rounds are simulated with a packet size of 512 bytes during transmission. First, the proposed COPRAS-CHS and the competing fuzzy-TOPSIS, BFSA, P-WWO and PSO-CHS-based CH selection methods are compared in terms of throughput and network lifetime as the number of nodes in the network increases. Figures 1 and 2 show that the proposed COPRAS-CHS improves throughput and network lifetime as the number of nodes increases. This is because it uses COPRAS to explore a larger number of factors that contribute to optimized CH selection. On average, the proposed COPRAS-CHS improves throughput by 12.56%, 15.63%, 16.98% and 19.12% compared with the fuzzy-TOPSIS, BFSA, P-WWO and PSO-CHS-based CH selection approaches, respectively.

COPRAS-Based Decision-Making Strategy for Optimal Cluster Head …


Fig. 1 Percentage increase in throughput of the proposed COPRAS-CHS for varying number of sensor nodes

Fig. 2 Network lifetime of the proposed COPRAS-CHS for varying number of sensor nodes

The network lifetime of the proposed COPRAS-CHS is also improved by 10.94%, 12.31%, 14.58% and 17.82% compared with the fuzzy-TOPSIS, BFSA, P-WWO and PSO-CHS-based CH selection approaches, respectively. The proposed COPRAS-CHS and the competing schemes are further assessed based on the death of the first node, half of the nodes and all nodes in the network as the number of rounds increases. Figures 3 and 4 show that the proposed COPRAS-CHS delays first node death and half node death as the number of rounds increases. This is because it incorporates the merits of COPRAS-based MADM and gives maximum weight to the energy of sensor nodes during CH selection and reselection. The proposed COPRAS-CHS delays first node death by 9.34%, 11.42%, 13.82% and 15.26% compared with the fuzzy-TOPSIS, BFSA, P-WWO and PSO-CHS-based CH selection approaches. Likewise, Fig. 4 shows that the proposed COPRAS-CHS delays the death of half the nodes by 8.56%, 10.65%, 12.68% and 15.92% compared with the baseline approaches. In addition, Fig. 5 shows that the proposed COPRAS-CHS prolongs the lifetime of all nodes in the network by 9.56%, 11.42%, 13.86% and 14.91% compared with the baseline schemes.

J. Sengathir et al.

Fig. 3 First node death of the proposed COPRAS-CHS for varying number of sensor nodes

Fig. 4 Half nodes death of the proposed COPRAS-CHS for varying number of sensor nodes

Fig. 5 All nodes death of the proposed COPRAS-CHS for varying number of sensor nodes

5 Conclusion
In this paper, the proposed COPRAS-based CH selection (COPRAS-CHS) approach achieves optimized CH selection for stabilizing energy and prolonging network lifetime in WSNs. It takes into account several significant factors: merged nodes, the number of times a specific sensor node has been selected as CH, the distance of each sensor node from the centroid, residual energy and the distance between nodes. By combining these factors, it achieves improved CH selection. It uses COPRAS to explore an exhaustive set of influencing factors, which gives it an advantage in attaining optimized CH selection. The simulation results confirm that the proposed COPRAS-CHS prolongs network lifetime, measured by first node death, half node death and all nodes death, on average by 11.28%, 13.21% and 14.83% compared with the standard mechanisms considered for analysis. Further, the network lifetime achieved by the proposed COPRAS-CHS is improved by 10.94%, 12.31%, 14.58% and 17.82% compared with the fuzzy-TOPSIS, BFSA, P-WWO and PSO-CHS-based CH selection approaches. As part of future work, the scalability and security characteristics of the proposed COPRAS-CHS approach will be verified with further feasible influential factors for CH selection.


J. Sengathir et al.

References

1. Younis O, Fahmy S (2004) HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks. IEEE Trans Mob Comput 3(4):366–379
2. Azad P, Sharma V (2013) Cluster head selection in wireless sensor networks under fuzzy environment. ISRN Sens Netw 2013(2):1–8
3. Gao T, Jin RC, Song JY, Xu TB, Wang LD (2012) Energy-efficient cluster head selection scheme based on multiple criteria decisions making for wireless sensor networks. Wirel Pers Commun 63(4):871–894
4. Balamurugan A, Janakiraman S, Deva Priya M (2022) Modified African buffalo and group teaching optimization algorithm-based clustering scheme for sustaining energy stability and network lifetime in wireless sensor networks. Trans Emerg Telecommun Technol 33(1)
5. Venkatanaresh M, Yadav R, Thiyagarajan D, Yasotha S, Ramkumar G, Varma PS (2022) Effective proactive routing protocol using smart nodes system. Measur Sensors 24:100456
6. Abbasi AA, Younis M (2007) A survey on clustering algorithms for wireless sensor networks. Comput Commun 30(14–15):2826–2841
7. Christy Jeba Malar A, Siddique Ibrahim SP, Deva Priya M (2019) A novel cluster based scheme for node positioning in indoor environment. Int J Eng Advanced Technol 8(6S):79–83
8. Balamurugan A, Janakiraman S, Deva Priya M, Christy Jeba Malar A (2022) Hybrid Marine predators optimization and improved particle swarm optimization-based optimal cluster routing in wireless sensor networks (WSNs). China Commun 19(6):219–247. https://doi.org/10.23919/JCC.2022.06.017
9. Aslam N, Phillips W, Robertson W, Sivakumar S (2011) A multi-criterion optimization technique for energy efficient cluster formation in wireless sensor networks. Inf Fusion 12:202–212
10. Janakiraman SM, Devapriya (2020) An energy-proficient clustering-inspired routing protocol using improved Bkd-tree for enhanced node stability and network lifetime in wireless sensor networks. Int J Commun Syst 33:e4575
11. Farman H, Javed H, Jan B, Ahmad J, Ali S, Khalil FN, Khan M (2017) Analytical network process based optimum cluster head selection in wireless sensor network. PLoS ONE 12(7):e0180848
12. Rajeswarappa G, Vasundra S (2021) Red deer and simulation annealing optimization algorithm-based energy efficient clustering protocol for improved lifetime expectancy in wireless sensor networks. Wirel Pers Commun 63(4):871–894
13. Sengottuvelan P, Prasath N (2017) BAFSA: breeding artificial fish swarm algorithm for optimal cluster head selection in wireless sensor networks. Wirel Pers Commun 94:1979–1991
14. Bongale AM, Nirmala CR, Bongale AM (2019) Hybrid Cluster head election for WSN based on firefly and harmony search algorithms. Wirel Pers Commun 106:275–306
15. Khan BM, Bilal R (2020) Fuzzy-topsis-based cluster head selection in mobile wireless sensor networks. Sens Technol 2(2):596–627. https://doi.org/10.4018/978-1-7998-2454-1.ch029
16. Khot PS, Naik U (2021) Particle-water wave optimization for secure routing in wireless sensor network using cluster head selection. Wirel Pers Commun 119:2405–2429
17. Bandi R, Ananthula VR, Janakiraman S (2021) Self Adapting differential search strategies improved artificial bee colony algorithm-based cluster head selection scheme for WSNs. Wirel Pers Commun. https://doi.org/10.1007/s11277-021-08821-5
18. Rajeswarappa G, Vasundra S (2021) Red deer and simulation annealing optimization algorithm-based energy efficient clustering protocol for improved lifetime expectancy in wireless sensor networks. Wirel Pers Commun 1–28
19. Loganathan S, Arumugam J (2021) Energy efficient clustering algorithm based on particle swarm optimization technique for wireless sensor networks. Wirel Pers Commun 119:815–843
20. Narang M, Joshi MC, Pal AK (2021) A hybrid fuzzy COPRAS-base-criterion method for multi-criteria decision making. Soft Comput 25(13):8391–8399
21. Balamurugan A, Priya MD, Janakiraman S, Malar ACJ (2021) Hybrid stochastic ranking and opposite differential evolution-based enhanced firefly optimization algorithm for extending network lifetime through efficient clustering in WSNs. J Netw Syst Manage 29(3):1–31
22. Janakiraman S, Priya MD, Devi SS, Sandhya G, Nivedhitha G, Padmavathi S (2021) A Markov process-based opportunistic trust factor estimation mechanism for efficient cluster head selection and extending the lifetime of wireless sensor networks. EAI Endorsed Trans Energy Web 8(35):e5–e5
23. Tamilarasan N, Lenin SB, Jayapandian N, Subramanian P (2021) Hybrid shuffled frog leaping and improved biogeography-based optimization algorithm for energy stability and network lifetime maximization in wireless sensor networks. Int J Commun Syst 34(4):e4722
24. Shyjith MB, Maheswaran CP, Reshma VK (2021) Optimized and dynamic selection of cluster head using energy efficient routing protocol in WSN. Wirel Pers Commun 116(1):577–599

Unusual Activity Detection Using Machine Learning

Akshat Gupta, Anshul Tickoo, Nikhil Jindal, and Avinash K. Shrivastava

Abstract In this paper, we discuss a technique developed to detect unusual activities. In detection, the primary aim is to capture and locate the movement of different blocks. We use machine learning techniques to detect individual actions in real time and identify the unusual frames. In this study, we also determine from its characteristics whether an object in motion is an individual or not. Using the motion influence map, we constructed a system that can detect unusual activities in a frame. The method generates a motion influence map for each frame to capture the interactions taking place inside it. The main reason for using the motion influence map is that it accurately analyzes the motion of each block containing a moving individual and gives a clear picture of its characteristics, including the movement direction and the interactions with other objects in the frame.

Keywords Unusual activity · Closed-circuit television (CCTV) · Computer vision · Motion influence map (MIP) · Mixture of dynamic textures (MDT)

1 Introduction

Unusual activity can quickly become dangerous, especially in community areas and open spaces, where it can harm individuals in many ways. We need a system that can monitor such activities without requiring a person to watch the video 24 × 7. Several methods already exist to detect such activities; they work by acquiring the frames in which some movement of an individual is detected. It is very difficult to analyze video and identify suspicious activities in real time, and detecting suspicious events such as robbery and threats in all-day video captured by CCTV is not easy. As security becomes a priority, an enormous number of surveillance cameras have been installed in public and private areas.

A. Gupta · A. Tickoo (B) · N. Jindal
Amity University, Noida, Uttar Pradesh, India

A. K. Shrivastava
International Management Institute, Kolkata, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_47

A. Gupta et al.

2 Related Work

Unusual activity recognition has received increasing interest from researchers in vision-based surveillance [1]. Jiang et al. proposed a method for anomaly detection using spatiotemporal context [2]. Wang et al. used Kanade–Lucas–Tomasi (KLT) [3] corners to represent moving objects and clustered similar motion patterns in an unsupervised manner. They detected anomalies in a frame sequence using two kinds of historical motion descriptors: the self-history and the neighboring history. Xiong et al. proposed a camera-parameter-independent approach based on counting people [4]. Both optical flow and the foreground distribution were used. The kinetic energy was estimated from the optical flow to distinguish running from normal walking, and a crowd distribution index, defined by the distribution of foreground pixel values, was analyzed to detect and distinguish gathering and dispersing activities. Many other researchers have focused on crowd behavior, an interesting research problem in various fields [5–8]. Many approaches have been adopted for global unusual activity detection. Cui et al. modeled social behavior using an interaction energy potential [9]. They detected space-time interest points [10] and tracked them with a KLT feature tracker to obtain human motion within a video sequence. The interaction energy potential was estimated from the velocities of the space-time interest points to determine whether they are likely to meet soon [11]. Mancas et al. quantitatively represented global rarity to select abnormal motion in the captured scene using bottom-up saliency. The saliency index was computed in multiple channels corresponding to different speeds and directions, and local unusual activity was finally detected using the corresponding saliency maps. Kratz et al. analyzed extremely crowded video sequences by building motion pattern models [12]. They encoded the motion patterns into a distribution-based hidden Markov model (HMM) [13]. Wang et al. [14–17] estimated the change in intensity frequencies over time in a spatio-temporal cuboid using a wavelet transform. They showed that an abnormal region exhibits high frequency within a specific time window. These methods demonstrated their efficacy in their own experiments, but they generally focused exclusively on either local or global unusual activity detection. We argue that jointly considering motion flows, object size, and interactions among objects can quantitatively represent individual motion in a crowded environment and thus help improve the performance of unusual activity detection [18]. In particular, we first propose a motion influence map for analyzing crowd movements in the captured environment. Based on this motion influence map, we further devise a method for detecting and localizing both local and global unusual activities within a unified framework. The methods proposed so far are designed to recognize simple human actions such as walking and running, but they are not suitable for crowded areas. The proposed framework can recognize unusual human activity in a crowd using the motion influence map and OpenCV [19–22]. It can help in crime prevention, and unusual crowd activity detection can be deployed in various public places. However, precision is critical and must be improved to build a practical system that can be deployed in real settings.

3 Methodology Followed

We implemented an algorithm that recognizes the activities in a crowd and determines whether each recognized event is usual or unusual. Our model is implemented in Python with libraries such as OpenCV, which gave us the most promising results in terms of accuracy. The program uses the MIP which, as described earlier, can detect such events accurately; it characterizes unusual actions and provides the machine vision features used for detection. It measures the variations in different segments of the video frames, revealing minute variations over a set span of time. Machine vision lets us find individual actions in real time and identify the unusual frames. Figure 1 shows the various steps involved in our proposed work. The main parts of this study are as follows:

Motion influence map (MIP): The motion of an individual in a crowd and its direction are affected by numerous factors, such as surrounding objects, other individuals and other moving objects. Together, these effects are called the 'motion influence.'

Motion descriptor: We convert the video segments into frames and subdivide each frame into small blocks, which serve as the units of the motion descriptor for unusual event detection, because tracking individual real-world objects such as a shopping cart or a person can be very difficult.

Feature extraction, detection, and localization: Since an event spans a number of segments or frames, we extract a feature vector for each block individually, and localization is performed with the same method for every block. During extraction, the background and foreground are separated, and masks at different levels are formed for each frame. The objects in the foreground mask are then extracted using a histogram and the local region map. We then build a region-size matrix that groups regions of similar size; these are used to detect objects and are called PCs. Any region larger than the PCs is treated as a large region, and the descriptor is applied to it.

Anomaly detection: If a region's size is approximately the same as the PCs, its histogram is computed and passed to the model; if it does not fall in the normal part of the feature space, it is declared an unusual activity. This approach uses the information of the whole frame and carries little spatial knowledge of the anomalies, so a local change produces only a small change in the histogram and may go undetected. To address this shortcoming, we use K-means clustering, which groups pixels by their speed and location.

Fig. 1 Different steps of our proposed work [23]
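The K-means step that groups pixels by speed and location can be sketched in plain Python. This is a minimal illustration, assuming each pixel is described by a hypothetical feature tuple `(x, y, speed)`; the section does not specify K, feature scaling or initialization, so those choices are placeholders.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain K-means on feature tuples such as (x, y, speed).

    Returns (labels, centers). Initial centers are k distinct points drawn
    with a seeded RNG so that results are reproducible.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers
```

For instance, three slow pixels near the origin and three fast pixels near (10, 10) end up in two separate clusters. In practice, position and speed live on different scales, so some normalization of the features would likely be needed.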

3.1 Dataset

For a practical application, an intelligent surveillance system should detect both local and global events in a bounded environment. Using the proposed motion influence map, we built a system that can distinguish both local and global events taking place in crowded scenes. The proposed work identifies unusual activities at the frame level and then further localizes the regions of local unusual activity at the pixel level. We validated the effectiveness of the proposed model on public datasets, namely the UMN¹ and UCSD² datasets. The proposed work addresses problems in the analysis of densely crowded scenes, such as classifying crowd states, segmenting videos, estimating crowd size, and tracking objects in crowds. The goal here is to detect deviations from normal crowd behavior, a task motivated by the pervasiveness of surveillance camera systems, the difficulty of modeling crowd behavior, and the importance of crowd monitoring for many applications. Anomaly detection is an active area of research in its own right, and different approaches have been proposed for both crowded and uncrowded scenes. Existing approaches focus mainly on motion information and ignore anomalies caused by variations in object appearance, so they are insensitive to anomalies that do not involve motion outliers. In addition, descriptors such as optical flow, pixel change histograms, or other standard background subtraction operations are difficult to apply in crowded scenes, where the background is by definition dynamic, with heavy clutter and complex occlusions.

3.2 Implementation

In the implementation, two types of unusual activities have been considered: local activities and global activities. Local activities occur within a specific region of a frame, so they cover a very limited area. Global activities occur across the whole frame, so they are identified over a larger area, and the full frame is considered when capturing the movement. The steps involved in the implementation of this study are:

Data Input and Preprocessing: The input clip is loaded so that it can be processed. Since raw data contains both useful and irrelevant information, preprocessing filters out the data that is not required.

Optical Flow: The input clip is converted into a sequence of frames that are processed sequentially. Each frame is divided into blocks, and the optical flow of each block is calculated.

Segregating the Image into Smaller Sections: After calculating the optical flow for every pixel within a frame, we partition the frame into M × N uniform blocks without loss of generality. For example, a frame of size 240 × 320 divided into blocks of size 20 × 20 yields 12 × 16 = 192 blocks.

Optical Flow of the Smaller Sections: The optical flow of each block is calculated by averaging the optical flows of all the pixels in that block. The result is a vector indicating how far the block's center moves and in which direction.

Motion Influence Map: The motion direction of an individual within a crowd is affected by various factors; we call this motion influence. Figure 2 shows the algorithm for creating a motion influence map.
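Partitioning a frame into uniform blocks and averaging the per-pixel flow inside each block can be sketched as follows. It assumes the per-pixel flow has already been computed (for example with OpenCV's dense Farnebäck method, `cv2.calcOpticalFlowFarneback`) and is passed in as nested lists of `(dx, dy)` tuples. `block_flows` is a hypothetical helper name, not part of any library.

```python
def block_flows(flow, block_size):
    """Average per-pixel optical flow over non-overlapping block_size x block_size blocks.

    flow[y][x] is a (dx, dy) tuple. Height and width are assumed to be
    multiples of block_size. Returns a rows x cols grid of mean (dx, dy) vectors.
    """
    h, w = len(flow), len(flow[0])
    rows, cols = h // block_size, w // block_size
    n = block_size * block_size
    grid = []
    for by in range(rows):
        row = []
        for bx in range(cols):
            sx = sy = 0.0
            for y in range(by * block_size, (by + 1) * block_size):
                for x in range(bx * block_size, (bx + 1) * block_size):
                    dx, dy = flow[y][x]
                    sx += dx
                    sy += dy
            row.append((sx / n, sy / n))  # representative flow of this block
        grid.append(row)
    return grid
```

With 20 × 20 blocks, a 240 × 320 flow field yields a 12 × 16 grid of block vectors, one per block.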


Fig. 2 Algorithm of the motion influence map [23]
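The algorithm in Fig. 2 is not reproduced in the text, so the following is only a sketch of the general idea behind a motion influence map, loosely modelled on Lee et al. [23]. A moving block influences the blocks ahead of it, within an angular field of view around its motion direction and within a distance that grows with its speed. The influence decays exponentially with distance and is accumulated into a direction histogram. The field of view, the `frames` scaling of the reach and the 8 direction bins are assumptions.

```python
import math


def motion_influence(block_flows, block_size, frames=1, fov=math.pi / 2, bins=8):
    """Motion influence vectors per block (a sketch, not the paper's exact algorithm).

    block_flows[r][c] is the mean (dx, dy) flow of block (r, c), with x to the
    right and y downwards. Returns influence[r][c], a list of `bins` weights.
    """
    rows, cols = len(block_flows), len(block_flows[0])
    influence = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dx, dy = block_flows[r][c]
            speed = math.hypot(dx, dy)
            if speed == 0:
                continue  # a static block influences nothing
            reach = speed * frames  # assumed influence distance threshold
            direction = math.atan2(dy, dx)
            b = int((direction % (2 * math.pi)) / (2 * math.pi) * bins) % bins
            for r2 in range(rows):
                for c2 in range(cols):
                    if (r2, c2) == (r, c):
                        continue
                    # Vector (in pixels) from this block's center to the other block's center.
                    vx, vy = (c2 - c) * block_size, (r2 - r) * block_size
                    dist = math.hypot(vx, vy)
                    if dist > reach:
                        continue
                    angle = math.atan2(vy, vx) - direction
                    angle = (angle + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
                    if abs(angle) <= fov / 2:
                        influence[r2][c2][b] += math.exp(-dist / speed)
    return influence
```

For a block moving right at 25 pixels per frame, with 10-pixel blocks, its two right-hand neighbours receive weights exp(-0.4) and exp(-0.8) in direction bin 0.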

Creating Mega Blocks: Frames are partitioned into non-overlapping mega blocks, each of which combines several motion influence blocks. The motion influence value of a mega block is the sum of the motion influence values of all the smaller blocks that constitute it.

Extracting Features: After the most recent t frames are partitioned into mega blocks, a concatenated feature vector is extracted for each mega block across all t frames.

Cluster making: For each mega block, we perform clustering on the spatio-temporal features and set the codewords; K codewords are generated for each block. Here, we use only video clips of normal activities, so the codewords of a mega block model the patterns of normal activity that can occur in the corresponding area.

Testing phase: We completed the final module of our program by fixing the bugs we encountered, running it on the dataset to check the results, identifying missing attributes by reviewing the code line by line, and improving the robustness of the model by training it on a sufficiently large dataset and reporting issues to other team members so that they could be fixed. We split the video into different parts; the noisy parts were removed from the dataset using FastDVDnet (a video denoising algorithm), and the remainder was divided into subsets for training and testing, each containing 3–4 scenes. The training stage uses clips of normal activity, and the testing stage uses clips of both normal and unusual activity.
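The mega-block and codeword steps can be sketched as below, assuming each frame's motion influence map is given as a grid of direction histograms. `mega_block_features` sums the histograms of the blocks in each `mb × mb` mega block and concatenates the sums over the last t frames. `codebook` is plain K-means that produces K codewords for one mega block from normal-activity features only. Both names, the use of Euclidean distance and the initialization are illustrative assumptions.

```python
import math
import random


def mega_block_features(influence_frames, mb):
    """Sum influence histograms over mb x mb blocks, then concatenate over frames.

    influence_frames: list of t frames, each a rows x cols grid of histograms.
    Returns {(R, C): feature vector of length t * bins} for each mega block.
    """
    rows, cols = len(influence_frames[0]), len(influence_frames[0][0])
    features = {}
    for R in range(rows // mb):
        for C in range(cols // mb):
            vec = []
            for frame in influence_frames:
                summed = None
                for r in range(R * mb, (R + 1) * mb):
                    for c in range(C * mb, (C + 1) * mb):
                        h = frame[r][c]
                        summed = list(h) if summed is None else [a + b for a, b in zip(summed, h)]
                vec.extend(summed)  # temporal concatenation
            features[(R, C)] = vec
    return features


def codebook(vectors, k, iters=20, seed=0):
    """Plain K-means; returns k codewords (cluster centers) for one mega block."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            j = min(range(k), key=lambda i: math.dist(v, centers[i]))
            groups[j].append(v)
        for i, g in enumerate(groups):
            if g:
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers
```

Calling `codebook` once per mega block on features from normal clips gives the per-block codewords used at test time.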


Fig. 3 Confusion matrix

Pixel-level detection: If a frame is identified as unusual, we compare the minimum-distance values of all its mega blocks with the threshold value. If a value is larger than the threshold, we classify that block as unusual.
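Frame-level and block-level decisions can be expressed as a minimum-distance test against the learned codewords. In this sketch (hypothetical names; the threshold is left to the user and is not taken from the paper), a mega block is unusual when its feature vector lies farther than `threshold` from every codeword of that block, and a frame is unusual when at least one of its mega blocks is.

```python
import math


def min_distance(feature, codewords):
    """Euclidean distance from a feature vector to its nearest codeword."""
    return min(math.dist(feature, cw) for cw in codewords)


def detect(features, codebooks, threshold):
    """features / codebooks: dicts keyed by mega-block index (R, C).

    Returns (frame_is_unusual, set of unusual mega blocks).
    """
    distances = {key: min_distance(vec, codebooks[key]) for key, vec in features.items()}
    unusual_blocks = {key for key, d in distances.items() if d > threshold}
    return bool(unusual_blocks), unusual_blocks
```

Because the frame-level decision reduces to "any block exceeds the threshold", the same distances serve both frame-level detection and block-level localization.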

4 Result

Following our approach, the proposed method achieved an accuracy of 93.37%, a precision of 83.19%, a recall of 89.34%, and an F-score, which combines precision and recall into a single measure, of 0.861553.

Accuracy = (TP + TN)/total = 93.37%
Precision = TP/(TP + FP) = 83.19%
Recall = TP/(TP + FN) = 89.34%
F-score = (2 × precision × recall)/(precision + recall) = 0.861553

In the confusion matrix shown in Fig. 3, 165 predictions were made in total. Of the 60 actual No cases, our model predicted 50 as No and 10 as Yes; of the 105 actual Yes cases, our model predicted 100 as Yes. Figures 4 and 5 illustrate unusual activity detection by our model in different scenarios. The model identifies abnormal crowd activity and reports the frame number and block number.
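The metric definitions above can be checked with a few lines of Python; the helper names are illustrative. Plugging the reported precision (0.8319) and recall (0.8934) into the F-score formula reproduces the reported value of about 0.8616.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # TP / predicted Yes
    recall = tp / (tp + fn)     # TP / actual Yes
    return accuracy, precision, recall, f_score(precision, recall)


print(round(f_score(0.8319, 0.8934), 4))  # 0.8616
```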

5 Conclusion

The previous methods discussed in the related work concentrate on either local or global activities individually. In this paper, we proposed a technique to detect unusual activities, using machine vision to detect individual actions in real time. After training and testing every module, our proposed method achieved an accuracy of 93.37%, which compares favourably with previous studies. We developed an approach that represents the different characteristics of individuals within image segments to find unusual events in a crowd. Like other methods, it has limitations: it is sensitive to distortion in the input signal, so a denoised input is essential for building an accurate motion influence map (MIP), which is constructed from the motion vectors of dynamic objects.

Fig. 4 Red blocks represent the unusual activity taking place along with frame number and block

Fig. 5 Red blocks represent the unusual activity taking place along with frame number and block


References

1. Xiang T, Gong S (2008) Video behavior for anomaly detection. IEEE Trans Pattern Anal Mach Intell 30(5):893–908
2. Jiang F, Yuan J, Tsaftaris S, Katsaggelos A (2011) Anomalous video event detection using spatiotemporal context. Comput Vis Image Understand 115(3):323–333
3. Lucas BD, Kanade T (1981) Iterative image registration technique. In: 7th International joint conference on AI, USA, pp 674–679
4. Xiong G, Cheng J, Wu X, Chen Y, Ou Y, Xu Y (2012) An energy model approach to people counting for abnormal crowd behavior detection. Neurocomputing 83:121–135
5. Zhan B, Monekosso DN, Remagnino P, Velastin SA, Xu LQ (2008) Crowd analysis: a survey. Int J Mach Vis App 19(5–6):345–357
6. Helbing D, Molnar P (1995) Social force model for pedestrian dynamics. Phys Rev E 51(5):4282–4286
7. Schadschneider A, Schreckenberg M, Sharma SD (2002) Automaton approach to pedestrian dynamics. In: Pedestrian and evacuation dynamics. Springer
8. Lerner A, Chrysanthou Y, Lischinski D (2007) Crowds by example. Comput Graph Forum 26(3):645–674
9. Cui X, Liu Q, Gao M, Metaxas DN (2011) Abnormal detection using interaction energy. In: Conference on computer vision and pattern recognition, USA, pp 3161–3167
10. Laptev I, Marszalek M, Schmid C, Rozenfeld B (2008) Learning realistic human actions. In: Conference on computer vision and pattern recognition, USA, pp 1–8
11. Pellegrini S, Ess A, Schindler K, Gool LV (2009) Social behavior for multi-target tracking. In: International conference on computer vision, Japan, pp 261–268
12. Kratz L, Nishino K (2012) Tracking pedestrians using local spatio-temporal motion patterns in crowded scenes. Trans Pattern Anal Mach Intell 34(5):987–1002
13. Rabiner LR. Hidden Markov models in speech recognition. Readings in speech recognition, pp 267–296
14. Paul M, Haque SME, Chakraborty S (2013) Human detection in surveillance videos. Adv Signal Process 2013:176
15. Shu W, Miao Z (2010) Anomaly detection in crowd scene. In: Proceedings of the IEEE 10th international conference on signal processing proceedings. IEEE, pp 1220–1223
16. Wang B, Ye M, Li X, Zhao F (2011) Abnormal crowd behavior detection. Int J Control Autom Syst 9:905–912
17. Zhao J, Bao W, Zhang F, Zhu S, Liu Y, Lu H, Shen M, Ye Z (2018) Modified motion influence map monitoring. Aquaculture 493
18. Ali S, Shah M (2007) Dynamics approach for crowd flow segmentation analysis. In: Conference on computer vision and pattern recognition USA, pp 1–6
19. Cong Y, Yuan J, Liu J (2013) Abnormal event detection in crowded scenes using sparse representation. Pattern Recogn 1851–1864
20. Roy D, Mohan CK (2018) Snatch theft detection in unconstrained surveillance videos using action attribute modelling. Pattern Recogn 108:56–61
21. Shen H, Zhang L, Huang B, Li P (2007) A MAP approach for joint motion estimation, segmentation, and super resolution. Trans Image Process 479–490
22. Jhapate AK, Malviya S, Jhapate M (2020) Unusual crowd activity detection using OpenCV and motion influence map. In: 2nd International conference on data, engineering and applications (IDEA), pp 1–6. https://doi.org/10.1109/IDEA49133.2020.9170704
23. Lee DG, Suk H-I, Park SK, Lee SW (2015) Motion influence map for unusual human activity detection and localization in crowded Scenes. Trans Circ Syst Video Tech 1612–1623

Disease Detection for Cotton Crop Through Convolutional Neural Network

Manas Pratap Singh, Venus Pratap Singh, Nitasha Hasteer, and Yogesh

Abstract Agriculture is the backbone of the Indian economy, as a major part of the financial system depends on it. India is an agricultural economy in which more than 55% of the population depends on agriculture. Today, the agriculture industry faces many problems due to diseases caused by bacteria and pests, which are often impossible to detect with the naked eye. In this study, a model has been developed using a convolutional neural network architecture to detect crop diseases from images of leaves. Cotton has been chosen for this study because it holds a special position among crops and is known as white gold. The dataset comprises five categories of cotton leaves: bacterial blight, leaf curl, powdery mildew, leaf pest attack, and healthy leaves. The Resnet152V2 model has been implemented, achieving an accuracy of 97% in identifying the classes of cotton leaf diseases.

Keywords Crop disease · Deep learning · Resnet152V2

1 Introduction

India is an agro-based country where farmers face many difficulties in managing crops and protecting them from bacteria, pests and various other diseases. According to a recent study by the Associated Chambers of Commerce and Industry of India, 30–35% of crops are wasted every year due to pests and diseases, resulting in a loss of Rs. 50,000 crores, which is huge for any nation [1]. In textile production, cotton holds an extraordinary position among all crops. Cotton is called white gold because it plays a significant role in the national economy, providing employment in farming, marketing and processing. India accounts for almost 40% of the world's cotton-growing area. Gujarat and Maharashtra are the largest cotton-producing states, followed by Haryana and Punjab. According to the estimate made by CAI, the country is likely to witness a decline in cotton production by 30 bales, resulting in a loss of Rs. 4700 crores [2]. India is the only nation where all the cultivated types of cotton are grown on a large scale. However, productivity has recently been decreasing for many reasons, such as temperature, inadequate rainfall and the occurrence of various diseases. The most common diseases seen on cotton leaves are bacterial blight, fusarium wilt and curl virus. Detecting a disease at an early stage can help farmers prevent it from spreading further, saving money and time. Detection by the naked eye has low accuracy, and sometimes even an expert fails to detect the disease, leading to more crop wastage. It is therefore important to develop a framework for the timely detection of diseases, as existing techniques are not sufficient for controlling diseases and pests. Artificial intelligence models have now been fine-tuned to perform effective disease classification, and deep learning with neural networks has recently become a preferred choice among researchers. In this setting, it is used to extract the features and characteristics of an image. Results obtained with convolutional neural networks (CNNs) are far better than those of traditional image processing methods. Multi-layered CNNs are among the best deep learning models because they automatically extract features from the data. In this paper, a model is developed to detect and classify cotton leaf diseases using the Resnet152V2 model. It also demonstrates the feasibility of residual networks for classifying cotton diseases. The study considers common cotton leaf diseases such as bacterial blight, curl virus, powdery mildew and leaf-sucking pest attack.

M. P. Singh (B) · V. P. Singh · N. Hasteer · Yogesh
Amity University, Noida, Uttar Pradesh 201313, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_48

M. P. Singh et al.

2 Literature Survey Researchers across the globe have worked on detecting and classifying leaf disease through machine learning innovation. A portion of the current examination and exploration has been figured out in this section. In 2021, Zekiwos and Bruck [3] proposed image processing model for classifying cotton leaf disease and pest using deep learning (bacterial blight, spider mite and leaf miner disease). Their proposed work contains 600 images in each class. The model achieved the best accuracy at 100 epochs and focuses on diseases mostly prevalent in Ethiopia. Another research study [4] in 2021 on CNN was built by using Resnet50, and accuracy of 86.66% was obtained. The model contained three classes for obtaining the result which was background soil, lesioned leaf and healthy leaf. Another study [5] existing in the literature has reported an accuracy of 99.3 for crop disease detection. The study by Revathi and Hemalatha [6] proposed a system to identify leaf disease by processing image based on the colour, texture and shape. In this study, 6 classes of diseases were taken. Brandão et al. [7] have also evaluated the productivity and

Disease Detection for Cotton Crop Through Convolutional Neural Network

563

growth of cotton crop by using spectral reflection. Yang et al. [8] developed a model using unsupervised classification to identify regions of cotton leaves suffering from the existence of fungus. The result obtained by using CNN is considered better as comparison to other image processing techniques. Since it has different layers of abstraction and data is in form of hierarchy, therefore CNN has greater capability for learning. The work of Abade et al. [9] showed huge improvement when CNN was utilized for recognizing the diseases in plant. The model showed a significant accuracy when the images were taken from the real-time environment. Different CNN architectures were used by Amuthe and Deebe [10], and an accuracy of 60–95% was obtained. The study by Warne and Ganorkar [11] detects cotton diseases using Euclidean distance in Kmeans clustering which gives the best performance in 35 epochs. Boulente et al. [12] also reported that CNN gives the finest performance when used for classification and detection of crop diseases. The study by Abbas et al. [13] proposed a work using pre-trained architecture of DenseNet121 and conditional generative adversarial network (C-GAN). The images of tomatoes were generated using C-GAN with 10 different classes which were taken from Plant Village dataset. The model obtained 99.51%, 98.11%, 97.11% with 5 classes, 7 classes and 10 classes, respectively. Another study reported in the literature for prediction of pest and disease has been done [14] by Peng Chenn, Qingxin Xiao using long short-term memory network (LSTM). The model was able to give 94% accuracy using Bi-LSTM. A comparative study by various authors has been listed in Table 1. Research shows that relatively less work have been done in the field of cotton disease with newer technology such as CNN and transfer learning. 
In this work, cotton leaf disease detection is performed with transfer learning using ResNet152V2, which relies on residual blocks that address the degradation problem by skipping one or more layers.

3 Methodology
Algorithm
Step 1: Collect images of cotton diseases from different sources.
Step 2: Assign class labels to the corresponding diseases.
Step 3: Pre-process the collected images.
Step 4: Split the images into training, validation and test sets.
Step 5: Train the model.
Step 6: Test the CNN architecture.
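Step 3 includes the data augmentation described in Section 3.2 (horizontal flipping, zooming and rotation). The following is a minimal pure-Python sketch of those three transforms. It assumes an image is a list of pixel rows. The function names are hypothetical, rotation is simplified to a 90-degree turn, and zoom is implemented as a centre crop followed by nearest-neighbour resizing back to the original size.

```python
def hflip(img):
    """Mirror each row left-to-right (horizontal flip)."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def zoom(img, factor):
    """Zoom in by `factor` (>= 1): crop the centre, then resize back with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, round(h / factor)), max(1, round(w / factor))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [[img[top + (i * ch) // h][left + (j * cw) // w] for j in range(w)]
            for i in range(h)]


def augment(img):
    """Return the original image plus a flipped, a rotated and a zoomed copy."""
    return [img, hflip(img), rot90(img), zoom(img, 2)]


image = [[1, 2], [3, 4]]  # toy 2x2 "image" of pixel intensities
for variant in augment(image):
    print(variant)
```

In Keras, comparable random transforms are usually configured through `ImageDataGenerator` arguments such as `horizontal_flip`, `rotation_range` and `zoom_range`. The paper does not report the values it used.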


M. P. Singh et al.

Table 1 Comparative analysis of different crop diseases

S. No. | Author | Application | Technique used | Accuracy (%)
1 | Bhimte and Thool [15] | Image processing approach for automatic diagnosis of cotton leaf diseases; colour-based segmentation was used | Texture-based classification of images using an SVM classifier | 98.4
2 | Warne and Ganorkar [11] | Cotton leaf disease detection using K-means clustering for segmentation with the Euclidean distance method | K-means clustering using Euclidean distance | 89.56
3 | Revathi and Hemalatha [16] | Image processing based on colour, texture and shape to identify leaf diseases | CIGDFNN, edge detection | 95
4 | Wesley Esdras Santiago, Barbara Teruel [4] | A CNN model built using ResNet50; accuracy of 85% obtained and compared with traditional approaches | CNN | 86.66
5 | Shanmugam et al. [17] | Detection using remote sensing images, monitoring of crops and identification of a particular disease using the Canny edge detection algorithm and histogram analysis | Histogram analysis and Canny edge detection | 92
6 | Zekiwos and Bruck [3] | Model to boost the detection of cotton leaf diseases and pests using deep learning | CNN | 96.4
7 | Ramesh et al. [18] | Plant disease detection using machine learning: classification using random forest, with the histogram of oriented gradients (HOG) used to extract image features | Random forest and HOG | 70
8 | Prashar et al. [19] | Multi-layered perceptron (MLP) with overlapping pooling to classify leaves as healthy or infected; K-fold cross-validation strategy and generalization of the CNN model | CNN, K-Nearest Neighbour (KNN), Support Vector Machine (SVM) | 96
9 | Paymode et al. [20] | Tomato leaf disease detection model using CNN; a total of 6 types of leaf diseases were taken | CNN | 7
10 | Coulibaly et al. [21] | Transfer learning with feature extraction to build an identification system for mildew disease in pearl millet | Transfer learning | 95
11 | Thenmozhi and Reddy [22] | Crop pest classification: a deep learning model developed to classify insect species on publicly available datasets | CNN and transfer learning | 96.75
12 | Haider et al. [23] | Crop disease diagnosis using deep learning models; an efficient approach for the timely diagnosis of wheat disease | CNN | 90
13 | Aravind et al. [24] | Transfer learning approach for the classification of grape crop diseases; pre-trained AlexNet was used with three leaf diseases | Transfer learning, multiclass SVM | 97.62
14 | Abbas et al. [13] | Pre-trained DenseNet121 architecture with a conditional generative adversarial network (C-GAN); tomato images for 10 classes from the Plant Village dataset were generated using C-GAN | DenseNet121 and C-GAN | 99.5

3.1 Dataset Description In this work, images are taken from different sources such as Kaggle [25] and Google Images. There are 2220 images in total, belonging to 5 classes. Of these, 1905 images are used for training and the rest are divided between validation and testing. Four cotton leaf diseases are covered: bacterial blight (bacterial), leaf curl (viral), powdery mildew (fungal) and attack of leaf sucking. Together with healthy leaves, they form the five classes. Sample images from the dataset are shown in Fig. 1.
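Section 3.1 specifies 2220 images, with 1905 used for training and the remainder divided between validation and testing. The paper does not state how that remainder is divided, so the sketch below assumes an even split after a seeded random shuffle. The file names are hypothetical. A stratified split, which keeps class proportions equal across the three sets, would be a reasonable refinement.

```python
import random


def split_dataset(paths, n_train, seed=42):
    """Shuffle reproducibly, take `n_train` items for training, then halve the rest (assumption)."""
    items = list(paths)
    random.Random(seed).shuffle(items)
    train, rest = items[:n_train], items[n_train:]
    n_val = len(rest) // 2
    return train, rest[:n_val], rest[n_val:]


paths = [f"img_{i:04d}.jpg" for i in range(2220)]  # hypothetical file names
train, val, test = split_dataset(paths, n_train=1905)
print(len(train), len(val), len(test))
```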

3.2 Pre-processing Images Since the images are gathered from different sources, they are pre-processed before further deep learning processing to remove noise, resize them and perform data augmentation. Data augmentation generates additional training samples from the existing dataset by horizontally flipping, zooming and rotating the images.

Fig. 1 a Bacterial blight, b Attack of leaf sucking, c Leaf curl, d Powdery mildew

A convolutional neural network (CNN) is a sequential model inspired by the working of the human brain. A CNN has a much greater capacity for learning because data is represented in hierarchical form with various levels of abstraction. It works by applying convolutions in successive layers. Each convolution layer extracts features from the image, and in general the layers act as filters that highlight specific patterns. Activation functions introduce nonlinearity into the model's input–output mapping. ReLU, for example, sets the negative values in the filtered images to zero. ReLU and softmax are popular activation functions in classification problems. The softmax activation function [26] is given as:

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}} \qquad (1)$$

where $z_i$ are the elements of the input vector $\mathbf{z}$ and $k$ is the number of classes. The sum in the denominator normalises the outputs so that each lies in the range (0, 1) and together they sum to 1. In this work, the ResNet152V2 model is used in preference to VGG or Inception models because it is deeper and uses residual blocks, which address the degradation problem by skipping one or more layers [27]. The ResNet152V2 base is followed by a reshape layer, a flatten layer, a dense layer with 128 neurons, a dropout layer and finally an output layer with softmax activation that classifies the diseases. The ResNet architecture used is shown in Fig. 2.
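Eq. (1) can be checked numerically with a few lines of Python. The sketch subtracts the largest input before exponentiating. This standard trick leaves the result of Eq. (1) unchanged, because the common factor cancels in the ratio, but it avoids overflow for large inputs. The five input scores are made up for illustration.

```python
import math


def softmax(z):
    """Eq. (1): exp(z_i) / sum_j exp(z_j), computed after subtracting max(z) for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1, -1.0, 0.5]  # hypothetical scores for the k = 5 classes
probs = softmax(logits)
print([round(p, 3) for p in probs], sum(probs))
```

The predicted class is the one with the largest probability. Softmax preserves the ordering of its inputs, so this is always the class with the largest raw score.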

Fig. 2 ResNet152V2 architecture

Table 2 Various parameters of the model

Parameter | Count
Total parameters | 58,833,413
Trainable parameters | 501,765
Non-trainable parameters | 58,331,648
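The counts in Table 2 are consistent: the trainable and non-trainable parameters add up to the total. Non-trainable parameters typically belong to frozen pre-trained layers. A dense (fully connected) layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases.

The sketch checks the table's sum and applies this formula to the final 5-class softmax layer, which is fed by the 128-neuron dense layer described in Section 3.2. The other layer sizes in the head are not reported, so they are not reproduced here, and `dense_params` is a hypothetical helper name.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


trainable = 501_765          # Table 2
non_trainable = 58_331_648   # Table 2
total = trainable + non_trainable
print(f"total = {total:,}")  # compare with the total in Table 2
print(f"output layer (128 -> 5 classes): {dense_params(128, 5)} parameters")
```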

ResNet is used because, in a plain CNN whose layers are simply stacked one after another, going deeper should in principle decrease the training error, but in practice the error increases beyond a certain depth. ResNet's skip connections allow the network to go deeper without this loss of performance.
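The skip connection can be illustrated with a small, fully connected stand-in for a residual block. The block learns a residual F(x) and outputs ReLU(F(x) + x), so if F contributes nothing, the input still passes through. This is a simplified sketch in the style of the original ResNet block, using plain matrices instead of convolutions. ResNet152V2 itself uses convolutional, pre-activation blocks with batch normalisation.

```python
def relu(v):
    """Element-wise ReLU: negative values become zero."""
    return [max(0.0, x) for x in v]


def matvec(w, v):
    """Multiply matrix `w` (a list of rows) by vector `v`."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """Compute ReLU(F(x) + x) with F(x) = W2 . ReLU(W1 . x); the '+ x' is the skip connection."""
    f = matvec(w2, relu(matvec(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(residual_block(x, zeros, zeros))        # F(x) = 0, so the output is just ReLU(x)
print(residual_block(x, identity, identity))
```

Because the identity path is always available, stacking more such blocks cannot force the network to lose the information it already has, which is the intuition behind training very deep ResNets.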

4 Results In this work, the ResNet152V2 residual network model is trained and validated using the dataset of 2220 images belonging to five classes. Data augmentation is applied because, according to a previous study [28], augmented RGB images gave a 15% improvement compared with non-augmented images. The model is trained on the training images with a batch size of 32 for 20 epochs. Its parameters are summarised in Table 2. The Adam optimizer is used because it is computationally efficient, requires little memory and is well suited to problems with large amounts of data or many parameters. Figs. 3 and 4 show the training and validation accuracy and loss during training. The training and validation accuracy increase with the number of epochs, and the model attains its best accuracy at 20 epochs (Figs. 3 and 4). The training accuracy reaches 99% and the validation accuracy 98%, while the accuracy on the test dataset is 96%. The confusion matrix is shown in Fig. 5. In it, classes 1, 2, 3, 4 and 5 correspond to attack of leaf sucking, bacterial blight, curl virus, healthy leaves and powdery mildew, respectively.
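Accuracy, along with the macro- and weighted-average precision reported in the conclusion, can be read off a confusion matrix such as Fig. 5. The sketch assumes rows are true classes and columns are predicted classes; conventions differ between tools. Macro averaging takes the plain mean of per-class precision. Weighted averaging weights each class by its number of true samples (support), as scikit-learn's `average="weighted"` option does. The 3-class counts are hypothetical and are not the values in Fig. 5.

```python
def classification_metrics(cm):
    """cm[i][j] = number of samples whose true class is i and predicted class is j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    support = [sum(row) for row in cm]                               # true samples per class
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predictions per class
    precision = [cm[j][j] / predicted[j] if predicted[j] else 0.0 for j in range(n)]
    macro = sum(precision) / n
    weighted = sum(p * s for p, s in zip(precision, support)) / total
    return accuracy, precision, macro, weighted


cm = [[18, 1, 1],   # hypothetical counts for 3 classes
      [2, 27, 1],
      [0, 1, 9]]
acc, prec, macro, weighted = classification_metrics(cm)
print(f"accuracy={acc:.3f} macro={macro:.3f} weighted={weighted:.3f}")
```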

Fig. 3 Training and validation accuracy

Fig. 4 Training and validation loss

Fig. 5 Confusion matrix

5 Conclusion and Future Work In this work, a model has been built using deep learning, specifically transfer learning with ResNet152V2, to identify cotton diseases. It is implemented in Python with the Keras package, using Jupyter as the development environment. Various parameters such as dataset colour, number of epochs and augmentation have been customized to obtain an efficient model. With the ResNet152V2 model, the number of epochs proved significant, boosting model performance by 10%. The model achieved the highest accuracy of 97% for identifying each leaf disease. The macro- and weighted-average precision are 96% and 97%, respectively. Such a system may assist farmers and save them time and money. The main challenge encountered while developing the model with deep learning was collecting a large number of high-quality training images with different sizes, shapes, light intensities and backgrounds. In future, different CNN architectures with optimized parameter selection can be used, and the dataset can be enlarged to include a wider range of cotton leaf diseases. With higher-quality images, the accuracy can be increased further. The proposed approach can also be applied to other crops so that effective treatment measures can be provided.

References
1. The Economic Times. https://economictimes.indiatimes.com/news/economy/agriculture/crops-worth-rs-50000-crore-arelost-a-year-due-to-pestdiseasestudy/articleshow/30345409.cms. Last accessed 21 Sept 2021
2. The Print. https://theprint.in/india/cotton-farmers-stare-at-over-rs-4700-crloss-this-season-lockdown-and-locusts-to-blame/432849. Last accessed 05 Sept 2021
3. Zekiwos M, Bruck A (2021) Deep learning-based image processing for cotton leaf disease and pest diagnosis. J Electr Comput Eng
4. Caldeira RF, Santiago WE, Teruel B (2021) Identification of cotton leaf lesions using deep learning techniques
5. Mohanty SP, Hughes DP, Salathé M (2016) Using deep learning for image-based plant disease detection. Front Plant Sci 7:1419
6. Revathi P, Hemalatha M (2014) Identification of cotton diseases based on cross information gain deep forward neural network classifier with PSO feature selection. Int J Eng Technol 5(6):4637–4642
7. Brandão ZN, Sofiatti V, Bezerra JR, Ferreira GB, Medeiros JC (2015) Spectral reflectance for growth and yield assessment of irrigated cotton. Aust J Crop Sci 9(1):75–84
8. Yang C, Odvody GN, Fernandez CJ, Landivar JA, Minzenmayer RR, Nichols RL (2015) Evaluating unsupervised and supervised image classification methods for mapping cotton root rot. Precis Agric 16(2):201–215
9. Abade A, Ferreira PA, de Barros Vidal F (2021) Plant diseases recognition on images using convolutional neural networks: a systematic review. Comput Electron Agric 185:106125
10. Deeba K, Amutha B (2020) ResNet-deep neural network architecture for leaf disease classification. Microprocess Microsyst 103364
11. Warne PP, Ganorkar SR (2015) Detection of diseases on cotton leaves using K-mean clustering method. Int Res J Eng Technol (IRJET) 2(4):425–431
12. Boulent J, Foucher S, Théau J, St-Charles PL (2019) Convolutional neural networks for the automatic identification of plant diseases. Front Plant Sci 10:941
13. Abbas A, Jain S, Gour M, Vankudothu S (2021) Tomato plant disease detection using transfer learning with C-GAN synthetic images. Comput Electron Agric 187:106279
14. Chen P, Xiao Q, Zhang J, Xie C, Wang B (2020) Occurrence prediction of cotton pests and diseases by bidirectional long short-term memory networks with climate and atmosphere circulation. Comput Electron Agric 176:105612
15. Bhimte NR, Thool VR (2018) Diseases detection of cotton leaf spot using image processing and SVM classifier. In: 2018 Second international conference on intelligent computing and control systems (ICICCS). IEEE, pp 340–344
16. Revathi P, Hemalatha M (2012) Classification of cotton leaf spot diseases using image processing edge detection techniques. In: 2012 International conference on emerging trends in science, engineering and technology (INCOSET). IEEE, pp 169–173
17. Shanmugam L, Adline AA, Aishwarya N, Krithika G (2017) Disease detection in crops using remote sensing images. In: 2017 IEEE Technological innovations in ICT for agriculture and rural development (TIAR). IEEE, pp 112–115
18. Ramesh S, Hebbar RN, Vinod PV (2018) Plant disease detection using machine learning. In: International conference on design innovations for 3Cs compute communicate control (ICDI3C). IEEE, pp 41–45
19. Prashar K, Talwar R, Kant C (2019) CNN based on overlapping pooling method and multilayered learning with SVM & KNN for American cotton leaf disease recognition. In: International conference on automation, computational and technology management (ICACTM). IEEE, pp 330–333
20. Paymode AS, Magar SP, Malode VB (2021) Tomato leaf disease detection and classification using convolution neural network. In: 2021 International conference on emerging smart computing and informatics (ESCI). IEEE, pp 564–570
21. Coulibaly S, Kamsu-Foguem B, Kamissoko D, Traore D (2019) Deep neural networks with transfer learning in millet crop images. Comput Ind 108:115–120
22. Thenmozhi K, Reddy US (2019) Crop pest classification based on deep convolutional neural network and transfer learning. Comput Electron Agric 164:104906
23. Haider W, Rehman AU, Maqsood A, Javed SZ (2020) Crop disease diagnosis using deep learning models. In: 2020 Global conference on wireless and optical technologies (GCWOT). IEEE, pp 1–6
24. Aravind KR, Raja P, Aniirudh R, Mukesh KV, Ashiwin R, Vikas G (2018) Grape crop disease classification using transfer learning approach. In: International conference on ISMAC in computational vision and bio-engineering. Springer, Cham, pp 1623–1633
25. Kaggle. https://www.kaggle.com/seroshkarim/cotton-leaf-disease-dataset. Last accessed 01 Dec 2021
26. DeepAI. https://deepai.org/machine-learning-glossary-and-terms/softmaxlayer. Last accessed 05 Dec 2021
27. Chollet F (2018) Deep learning with Python. Manning Publications Co., Shelter Island, NY, USA
28. Shin J, Chang YK, Heung B, Nguyen-Quang T, Price GW, Al-Mallahi A (2021) A deep learning approach for RGB image-based powdery mildew disease detection on strawberry leaves. Comput Electron Agric 183:106042

Deriving Pipeline for Emergency Services Using Natural Language Processing Techniques

Akshat Anand and D. Rajeswari

Abstract Communication plays a crucial role in times of crisis, particularly during emergencies. Whenever a region is affected by a disaster, social media sites such as Twitter are a great way to get information to those who need it most. Our paper proposes that, in times of crisis, people can manually enter their experience into an app to obtain contact information for the nearest SOS service. This applies especially when they are worried about finding the right emergency contacts or are experiencing a change in their psychological state. Alternatively, the proposed application uses web scraping scripts to scrape hashtags from around the world and retrieve geo-referenced tweets in real time. To gather accurate data, Twitter's API or a third-party application can be used in developing this software. During a major crisis, social media user behavior plays a vital role, owing to its efficiency, high reliability and swift data dissemination. Anxiety over finding the right emergency team may cause people to panic and become unsure which authority to call when an emergency occurs. Our application bridges the gap between clients in need of help and the approved agencies that can actually provide support in different circumstances. For example, in the case of a lost child or a flooded area, it helps users contact an organization that can assist them in dealing with the situation. We evaluated CNN-1-D + LSTM, CNN-1-D + Bi-LSTMs and CNN-1-D + LSTM + GloVe-100-D as classification models in order to maximize accuracy. The prototype can be used in a novel way to organize immediate assistance with authorities in comprehensive administration. Integrating Twitter or other social platforms into the application can make it a feasible platform for preventive intervention and catastrophe management.
Keywords Emergency responders · Critical care facility · Natural language processing

A. Anand · D. Rajeswari (B)
Department of Data Science and Business Systems, School of Computing, College of Engineering and Technology, SRM Institute of Science and Technology, Chennai, Tamil Nadu 603203, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_49


1 Introduction Owing to significant technical advances in the last decade, people can now quickly get information over the Internet. Platforms such as social media let users track problems in real time and work on tasks with strangers hundreds of kilometers away. Cellphones and other smart devices have made it possible to connect with people all around the globe with a simple click. In the event of a natural disaster, social media platforms such as Instagram and Twitter have proved to be highly useful tools, since users can quickly interact with others facing similar problems and report the severity of the disaster. As a consequence, many people have turned to social media for knowledge through social groups, posts and other means. People are increasingly using social media to share information about catastrophes, their impacts and rescue activities [1]. Before and during a natural or man-made catastrophe, social media can enhance our ability to communicate [2]. A new generation of social media has emerged that can distribute disaster-preparedness data, alert the public to impending dangers and build a sense of community based on user behavior. Effective information processors are those who use these communications to report occurrences or request aid, so that relief organizations can act quickly in times of crisis. Disaster-related data plays an important role in disaster communication activity, which is grouped into three phases: pre-event (preparation and reduction), event (reaction) and post-event (improvement) [3]. Pre-event preparations include disaster detection, early warning and public outreach. After a disaster, social media is a useful place for affected victims to relate their stories, publish the latest news and coordinate fundraisers. Relevant information has been spread throughout the Internet via social media platforms.
Social networks can be used to educate the public in real time and to inform reporters who go to ground zero, so that they can accurately report the severity of a disaster. Social media and user networks make it possible to send and receive instant messages to a large audience. Besides serving as a large-scale source of knowledge for the public, relevant social media data lets rescue personnel and government agencies keep track of changing situations. Residents in the disaster zone may be able to provide real-time, geo-targeted data that can be very useful to emergency responders. Informal media channels can also be nurtured on social media platforms that run in parallel with the major news networks. Using the Twitter API, we can gather messages from across the globe. These messages can support realistic event planning and catastrophe recovery recommendations in our proposed software. We chose Twitter because of its large user base, because it allows people to post anything at any time, and because it is not set up for a specific event. Catastrophes that strike only a certain nation or region are known as "regional" catastrophes. Accordingly, regional and global data can be gathered separately and used to train specific models. Some people are concerned about what may happen to patients' information if it is posted on Twitter, or about the spread of false information. A catastrophic risk management plan can be built by treating diseases as catastrophes, or by using other disaster categories, while keeping in mind the potential and the hazards of using Twitter as a means of communication. There are many ways in which Twitter's location-based data can be used in the event of a natural disaster [4].

2 Literature Review Several studies by federal agencies around the world use tweet statistics, in systems such as Emergency Situation Awareness (ESA-AWTM), to provide real-time notification and identification of events based on tweets [5]. This led to the creation of the "Tweettracker" program [6]. It differentiates navigation and pseudo posts, provides a named search bar and shows interesting characteristics to the user. Many institutions and people come together in times of disaster to help each other [7], including those directly affected, journalists and non-governmental organizations. Twitter may be used for a multitude of purposes during a single event. Vieweg et al. [8] examined both the Oklahoma grassfires and flood events, grouping tweets into categories such as hazard, volunteer, warnings, pet care and damage reporting. Xiao et al. [9] used a hurricane as the disaster incident. They implemented classification algorithms with various regression models and used test messages to classify texts into multiple categories for the different stages of the tragedy, then applied these techniques to classify social platform messages. Imran et al. [10] showed how machine learning can be used to gather information about victims, donations and other disaster-related topics from social media postings. They also thoroughly studied the temporal and spatial diversity of each factor. Hughes et al. [11] studied a variety of disaster-related duties, such as analyzing papers on disaster responses. Research by Eyal [12] on the 2013 Hattiesburg F4 tornadoes in the USA found that a geolocation label in a tweet may help rescue efforts, which demonstrates the value of Twitter in catastrophe preparedness and response. When a tweet was sent with the keyword 911, the search team was able to contact the sender because the sender's geolocation was turned on. Aside from coordinating rescues, tweets have also been used to help with psychiatric treatment and to thank rescuers for their efforts [13]. Anand et al. [14] discuss various methods by outlining a pipeline that links social media with emergency services, and the present paper extends that work.

3 Data Pipeline The main goal of the proposed software is to create a workflow for tracking disaster messages on social media [15]. The four main parts of our technology, which were not implemented in other papers, are the data pipeline, the NLP pipeline, the model pipeline and the automated process. Together they reduce the time emergency responders need to react. First, we provide an input field in which the user specifies the rescue workers they are seeking. Alternatively, we can scrape trending topics from Twitter to provide links based solely on the rescue workers the user has specified, and feed that data into the previously created classifiers. The data used for training our neural networks was downloaded from Kaggle [16] (medical speech, transcription, and intent). It comprises hundreds of audio recordings, totaling more than 8 h, that describe typical medical symptoms such as "headache" or "knee pain". Each phrase was produced by a single human speaker in response to a specific ailment. For model training, we used the overview-of-recording.csv file from the downloaded data. To obtain features, we clean and normalize the text, remove stop words, and then stem and lemmatize the words or phrases, which yields features suitable for modeling. The text is then converted into word vectors using TF-IDF, and neural networks are built on the downloaded data. GridSearchCV was employed to find the best parameters through hyperparameter tuning, and the networks were retrained with the tuned values. Finally, we classify words and phrases from real-world data into categories as predictions. To alert emergency responders, we first ask users to indicate the geolocation of the disaster. They may do this by clicking a button such as "emergency call," which dials 911 or 100 depending on the location, or by using online maps, which suggest the nearest emergency services.
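The cleaning and TF-IDF steps can be sketched with the standard library alone. The stop-word list below is a tiny hypothetical stand-in for a full list such as NLTK's. Stemming and lemmatization are omitted, and the weighting uses the plain tf × log(N/df) form. Library implementations differ: scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed idf and L2-normalizes each vector by default, so its numbers would differ. The example sentences are invented.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"i", "my", "a", "an", "the", "is", "and", "have", "of", "to", "in"}  # illustrative only


def clean(text):
    """Lower-case, keep alphabetic tokens (dropping punctuation and digits) and remove stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [clean(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1
        vectors.append({t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


docs = ["I have a headache.", "My knee pain is bad!", "Headache and knee pain"]
for doc, vec in zip(docs, tfidf(docs)):
    print(doc, "->", {t: round(w, 3) for t, w in vec.items()})
```

Terms that occur in every document receive a weight of zero, while rarer terms such as "bad" weigh more than common ones such as "knee".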

4 Proposed Model Pipeline In the proposed pipeline, we implemented a dense neural network with a bidirectional LSTM and an embedding layer in order to obtain better performance. Pre-processing Phase Basic pre-processing is performed with a custom function that removes all punctuation, followed by stemming and stop-word removal. An initial train-test split was made in an 80:20 ratio, with 80% of the data used for training and the remaining 20% for testing. Using the TensorFlow package, the raw text was tokenized and converted to integer sequences so that it could be passed through the network. The following parameters were used: vocab size = 1100, embedding dimensions = 128, max length = 30 and out-of-vocabulary token = "OOV". The sequences were then padded so that all of them have the same length, with trunc type = "post" and padding type = "post". This step was applied to both the training and testing data (messages and target labels).
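The tokenization and padding settings (vocabulary size 1100, maximum length 30, "post" padding and truncation, "OOV" token) can be illustrated without TensorFlow. The sketch approximates what the Keras `Tokenizer` and `pad_sequences` utilities do with those settings, reserving index 0 for padding and index 1 for the OOV token. It is not their actual implementation; for example, it splits on whitespace only. The helper names and sample messages are hypothetical.

```python
from collections import Counter

VOCAB_SIZE, MAX_LEN, OOV = 1100, 30, "OOV"


def fit_vocab(texts, vocab_size=VOCAB_SIZE, oov=OOV):
    """Map the most frequent words to indices 2..vocab_size-1 (0 = padding, 1 = OOV)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    vocab = {oov: 1}
    for index, (word, _) in enumerate(counts.most_common(vocab_size - 2), start=2):
        vocab[word] = index
    return vocab


def to_sequence(text, vocab, oov=OOV):
    """Replace each word by its index, using the OOV index for unseen words."""
    return [vocab.get(w, vocab[oov]) for w in text.lower().split()]


def pad_post(seq, max_len=MAX_LEN, value=0):
    """truncating='post' drops tokens from the end; padding='post' appends zeros."""
    seq = seq[:max_len]
    return seq + [value] * (max_len - len(seq))


train_texts = ["water rising fast", "need water now"]  # hypothetical messages
vocab = fit_vocab(train_texts)
print(pad_post(to_sequence("Need help water", vocab), max_len=8))
```

Capping the vocabulary at 1100 indices keeps every sequence value below the embedding layer's input dimension, which is why the vocabulary size and the embedding configuration must agree.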


Model Architecture We propose a dense neural network for multi-class classification. A sequential model with an embedding layer, a ConvNet1-D layer and an LSTM layer is used. The first layer of the sequential model is an embedding layer with 128 latent dimensions that produces feature-vector representations. It is configured with the vocabulary size, the embedding dimension and an input length equal to that of the padded training sequences (train padded.shape). Next, a ConvNet1-D layer extracts features from the embedded sequences. After that, an LSTM layer with 128 neurons and the "ReLU" activation function makes sequence learning more efficient. Finally, a normal dense layer is followed by an output dense layer with 25 neurons, one for each of the 25 unique classes. Its "softmax" activation function performs multi-class classification with probabilistic outputs. Training Specifications For training, the Adam optimizer was used along with an early-stopping callback. The callback monitors validation accuracy and stops training if the model's performance does not improve. The model was trained for up to 10 epochs with a batch size of 32.
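The early-stopping rule described above can be sketched as a small function over the per-epoch validation accuracies. The patience value and the sample accuracies are hypothetical, since the paper does not report them. In Keras this behavior is normally obtained with `tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=...)`.

```python
def early_stopping(val_scores, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a metric where higher is better.

    Training stops once `patience` consecutive epochs fail to beat the best
    score by more than `min_delta`; epochs are counted from 0.
    """
    best, best_epoch, wait = float("-inf"), -1, 0
    for epoch, score in enumerate(val_scores):
        if score > best + min_delta:
            best, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_scores) - 1, best_epoch  # ran all epochs without triggering


val_accuracy = [0.50, 0.62, 0.70, 0.69, 0.68, 0.71, 0.72, 0.73, 0.74, 0.75]  # 10 epochs, made up
print(early_stopping(val_accuracy))
```

With a patience of 2, training halts two epochs after the best score, even though later epochs in this made-up sequence would have improved. This trade-off is why the patience value matters when the epoch budget is as small as 10.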

5 Automation Our suggested system, based on natural language processing, bridges the gap between those in need of assistance and the appropriate authorities who respond to such situations in healthcare and disaster management. The proposed application has the following features. It takes emergency or disaster-related messages from the Internet, for example from social platforms such as Twitter, and classifies them into groups of related emergency types. The model estimates how urgent a query is and assigns it a priority level. It also provides location-based results, suggesting the nearest help centers and their contact information. Suppose, for example, that a catastrophe has affected part of a region where people are trying to locate a missing child or other persons. Whatever the case, the app helps users contact the appropriate agency. A system that gathers tweet data from all over the world can also help people better understand where disasters happen. We use automated Gmaps scripts to suggest nearby departments, combined with real-time retrieval of tweets based on a user's location. This can help identify nearby at-risk areas during natural calamities such as earthquakes, floods and volcanic eruptions. To shorten the time emergency responders need to react in disaster-affected areas, color-coded red, yellow and green zones might be assigned based on prior incidents. Using this color-coding, the app assigns a priority level according to urgency, such as "very critical," "ordinary immediate," "regular crisis," "highly likely" and "secure". The app can also be applied to health care. There, NLP and neural networks for supervised disease prediction (medical text mining) could predict the relevant medical terms and analyze them to identify the topic of concern, so that easier and more relevant solutions can be delivered.

Table 1 Accuracy and loss scores for different models

Model | Accuracy | Loss
CNN-1-D + LSTM + GloVe-100-D | 0.9089 | 0.327
CNN-1-D + LSTM | 0.9583 | 0.223
CNN-1-D + Bi-LSTMs | 0.9983 | 0.056
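The location-based suggestion of the nearest help center described in Section 5 can be sketched as a great-circle (haversine) distance search. The paper relies on Google Maps scripts, which would provide road distances and live data. This sketch only compares straight-line distances, and the service names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp guards rounding error


def nearest_service(user, services):
    """Return the service whose coordinates are closest to the user's (lat, lon)."""
    lat, lon = user
    return min(services, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))


services = [  # hypothetical help centres
    {"name": "Fire station A", "lat": 13.08, "lon": 80.27},
    {"name": "Hospital B", "lat": 12.82, "lon": 80.04},
]
print(nearest_service((13.05, 80.25), services)["name"])
```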

6 Results We used NLTK utilities, including the punkt and wordnet resources, before training. The data was then downloaded and cleaned, and the model was built, trained and evaluated. For use in our program, the model is stored as model.pkl for machine learning models or model.h5 for deep learning models. Retraining the proposed model with hyperparameter tuning increased accuracy by 3–4%. The results are given in Table 1. The final pipeline chosen for the proposed application uses ConvNet1-D + LSTM with TensorFlow's embedding layer, fine-tuned through hyperparameter tuning, as the classification model.
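GridSearchCV, mentioned in Section 3, evaluates every combination of candidate hyperparameters, cross-validating each one, and keeps the best. The exhaustive loop at its core can be sketched as follows. The hyperparameter names, candidate values and the toy scoring function are hypothetical stand-ins for training and validating the network. The cross-validation that GridSearchCV adds around each evaluation is left out.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Try every combination in `grid` (name -> candidate values) and keep the best score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for 'train, then return validation accuracy'."""
    return 0.9 - abs(params["lstm_units"] - 128) / 1000 - abs(params["batch_size"] - 32) / 1000


grid = {"lstm_units": [64, 128, 256], "batch_size": [16, 32, 64]}
print(grid_search(toy_score, grid))
```

The cost grows multiplicatively with each added hyperparameter: this 3 × 3 grid needs 9 training runs, and k-fold cross-validation multiplies that by k.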

7 Conclusion Because of the serious subject it addresses, the paper's goal is unique and significant. The primary motivation for creating and documenting the proposed application was to save lives in situations where a person's life can be decided in a second. Emergency responders must act quickly in order to save precious time for hospital care, and this is where our suggested approach (Fig. 1) comes into play. It takes data from social media sites and lets people search these platforms for possible emergency messages, cutting down the time it takes for services to arrive. A key finding of this study is that social networking can be an effective means of spreading information to the relevant response services during times of crisis. An estimated 70% of Americans used social networking sites in 2018, demonstrating their widespread use [17]. Whenever a user is in severe need of help, our service helps bridge the gap between the user and the relevant authorities. International-level data extraction and the extraction of geo-referenced tweets are two of the primary components of the system's architecture for obtaining relevant data. The novelty lies in using NLP to extract relevant and critical posts from Twitter or other social media by building a data pipeline and data modeling. The suggested design is not tied to a specific social network, and it is flexible.

Fig. 1 Accuracy and loss scores for different models

8 Future Work We proposed a high-level pipeline that helps the appropriate emergency services make wide use of relevant information when a few seconds can mean the difference between life and death. Beyond the planned application, future changes could further reduce response times for the different emergency response agencies. Fast Lane lite (an add-on or lightweight variant of our program) would help verify user posts and tweets and, acting as a confirmation channel, deliver notifications to the appropriate emergency organizations. Emotions also play a significant role in classifying such messages: face- and voice-based technologies could let us examine a user's emotions and score them for significance, urgency and severity, which would help us prioritize crucial or urgent messages to emergency services. Although the method is currently being tested on Twitter, it could be applied to a wide variety of platforms given sufficient knowledge of the system design. Furthermore, government officials could set up a complaint cell with a database linked to an identity document, such as a voter ID card in India or a Social Security Number (SSN) in the USA, to prevent unfair use of the system. Processing the massive amounts of data supplied to the app would also be a major challenge; the data would therefore need to be partitioned and processed in parallel, which makes this an ideal use case for big data techniques. Combining smart devices that can be used in a variety of ways with AI that monitors the environment and generates results suggests that such services can become highly reliable.
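The proposed prioritization of messages by significance, urgency and severity scores could be realized with a priority queue. The following is a minimal sketch under explicit assumptions: the three scores are taken as already computed and normalized to [0, 1], and the weighting in `WEIGHTS` as well as the names `priority` and `triage` are hypothetical, not part of the described system.

```python
import heapq

# Hypothetical weights for the three scores named in the text (not from the source).
WEIGHTS = {"significance": 0.2, "urgency": 0.4, "severity": 0.4}


def priority(scores):
    # Weighted sum of per-message scores, each assumed to lie in [0, 1].
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


def triage(messages):
    """Yield (message_id, priority) pairs, most urgent first.

    `messages` is an iterable of (message_id, scores_dict) pairs.
    """
    heap = []
    for order, (msg_id, scores) in enumerate(messages):
        # heapq is a min-heap, so the priority is negated; `order` breaks ties
        # in arrival order and keeps the tuples comparable.
        heapq.heappush(heap, (-priority(scores), order, msg_id))
    while heap:
        neg_p, _, msg_id = heapq.heappop(heap)
        yield msg_id, -neg_p
```

A heap keeps insertion and removal at O(log n), which matters if messages arrive continuously and the most urgent one must always be dispatched first.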


A. Anand and D. Rajeswari

References
1. Kusumasari B, Prabowo NPA (2020) Scraping social media data for disaster communication: how the pattern of Twitter users affects disasters in Asia and the Pacific. Nat Hazards 103(3):3415–3435. https://doi.org/10.1007/s11069-020-04136-z
2. Houston JB et al (2015) Social media and disasters: a functional framework for social media use in disaster planning, response, and research. Disasters 39(1):1–22. https://doi.org/10.1111/disa.12092
3. Feldman D et al (2016) Communicating flood risk: looking back and forward at traditional and social media outlets. Int J Disaster Risk Reduct 15:43–51. https://doi.org/10.1016/j.ijdrr.2015.12.004
4. Seddighi H, Salmani I (2020) Saving lives and changing minds with twitter in disasters and pandemics: a literature review, pp 59–77
5. Yin J, Karimi S, Robinson B, Cameron M (2012) ESA: emergency situation awareness via microbloggers. In: Proceedings of the 21st ACM international conference on information and knowledge management, pp 2701–2703. https://doi.org/10.1145/2396761.2398732
6. Kumar S, Barbier G, Ali Abbasi MA, Liu H (2011) TweetTracker: an analysis tool for humanitarian and disaster relief. In: Fifth international AAAI conference on weblogs and social media, no. April 2014, pp 661–662. https://doi.org/10.1145/1935826.1935854
7. Seddighi H, Seddighi S, Salmani I, Sharifi Sedeh M (2020) Public-private-people partnerships (4P) for improving the response to COVID-19 in Iran. Disaster Med Public Health Prep 1–6. https://doi.org/10.1017/dmp.2020.202
8. Corvey WJ, Vieweg S, Rood T, Palmer M (2010) Twitter in mass emergency: what NLP techniques can contribute. no June, pp 23–24. Available http://aclweb.org/anthology-new/W/W10/W10-0512.pdf
9. Xiao Y, Huang Q, Wu K (2015) Understanding social media data for disaster management. Nat Hazards 79(3):1663–1679. https://doi.org/10.1007/s11069-015-1918-0
10. Imran M, Elbassuoni S, Castillo C, Diaz F, Meier P (2013) Extracting information nuggets from disaster-related messages in social media
11. Hughes AL, Palen L, Sutton J, Liu SB, Vieweg S (2008) ‘Site-seeing’ in disaster: an examination of on-line social convergence. In: Proceedings of ISCRAM 2008—5th international conference on information systems for crisis response and management, no. May, pp 324–333
12. Eyal N (2012) Repeat triage in disaster relief: questions from Haiti. PLoS Curr 115(12):700–701. https://doi.org/10.1371/currents
13. Généreux M et al (2019) Psychosocial management before, during, and after emergencies and disasters—results from the Kobe expert meeting. Int J Environ Res Public Health 16(8). https://doi.org/10.3390/ijerph16081309
14. Anand A, Patel R, Rajeswari D (2022) A comprehensive synchrzation by deriving fluent pipeline and web scraping through social media for emergency services. In: 2022 International conference on advances in computing, communication and applied informatics (ACCAI), pp 1–8. https://doi.org/10.1109/ACCAI53970.2022.9752629
15. Corvey WJ, Verma S, Vieweg S, Palmer M, Martin JH (2012) Foundations of a multilayer annotation framework for twitter communications during crisis events. In: Proceedings of 8th international conference languages and resources and evaluation (LREC 2012), pp 3801–3805
16. https://www.kaggle.com/datasets/paultimothymooney/medical-speech-transcription-andintent
17. https://www.pewresearch.org/internet/2018/03/01/social-media-use-in

Fetal Head Ultrasound Image Segmentation Using Region-Based, Edge-Based and Clustering Strategies G. Mohana Priya and P. Mohamed Fathimal

Abstract Manual measurement of fetal head circumference (HC) is still a difficult and time-consuming task, even for experienced examiners. Improving automatic fetal measurement methods is therefore essential, and this image-processing problem must be solved to enable more precise and efficient obstetric examination. For prenatal diagnosis, ultrasound imaging is the typical method for evaluating biometric parameters during pregnancy. Fetal head circumference (HC) is the most important measure for determining a fetus's growth and health. Two-dimensional ultrasound imaging (2DUI) is used to monitor fetal development in most cases, and fetal growth and gestational age (GA) can be determined from the HC. This work explores automatic estimation of fetal head circumference in 2DUI images from all trimesters of pregnancy. To segment the fetal head circumference (FHC) region from the selected 2D ultrasound images, the proposed work uses edge-based, clustering-based and region-based segmentation. Keywords Ultrasound image examination · Region-based segmentation · Edge-based segmentation · Fetal biometry · Segmentation · Ultrasound (US)

G. Mohana Priya (B) · P. Mohamed Fathimal SRMIST, Vadapalani Campus, Chennai, Tamil Nadu 600026, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_50

1 Introduction Medical imaging is a powerful diagnostic and therapeutic tool. Computed tomography, MRI, ultrasound and other scanners are examples of imaging technologies. Ultrasound imaging is frequently used for prenatal observation and screening, and it is favored in many clinical studies because it is non-invasive, inexpensive and captures images in real time, compared with other imaging techniques such as magnetic resonance imaging. Acquisition procedures are designed to produce the best images while preserving the properties (e.g., structural and anatomical) of the object of interest. Ultrasound images, however, are operator-dependent and suffer from attenuation and speckle, as well as artifacts such as reverberations and reflections, which make interpretation difficult. Biometric parameters such as the fetal head circumference (HC), femur length (FL), crown-rump length (CRL), biparietal diameter (BPD) and abdominal circumference (AC) are routinely measured during an ultrasound screening examination to determine the GA and track fetal growth. HC measurement is used to estimate the fetus's weight and size and to detect abnormalities in the unborn child. Developing countries account for 99% of all maternal deaths worldwide. Women and newborn babies can be saved if they receive expert care before, during and after childbirth. Unfortunately, in low-resource settings there is still a chronic shortage of well-trained sonographers, so most pregnant women in these countries cannot access ultrasound screening. An automated system could help untrained observers obtain correct measurements. This study focuses on HC detection because HC is used to determine the GA and track the growth of the fetus; furthermore, the fetal head is more easily observed than the fetal abdomen. CRL is the appropriate measure for determining GA early in pregnancy, but because CRL can no longer be measured accurately after 13 weeks, HC is then used as the most precise measure for estimating GA. Fetal HC is among the important determinants of prenatal growth and well-being. Research is being done to extract intrinsic biomarkers of medical images, such as organ volume, area or other features, directly for use in prognosis. In clinical practice, HC is evaluated by drawing the major and minor axes of the ellipse fitted to the fetal skull in the ultrasound image, each axis being defined by two endpoints. The circumference of the HC ellipse is then calculated from these two axis lengths.
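The last step, computing the circumference from the two axes, can be illustrated as follows. This is a sketch under an assumption: the text does not say which perimeter formula is used, so Ramanujan's first approximation of an ellipse perimeter is used here, and the function name `hc_from_axis_endpoints` is hypothetical.

```python
import math


def hc_from_axis_endpoints(major_p1, major_p2, minor_p1, minor_p2):
    """Approximate HC from the endpoints of the fitted ellipse's major and minor axes.

    Points are (x, y) tuples; the result is in the same unit as the coordinates
    (multiply by a known pixel size to convert pixels to mm).
    Uses Ramanujan's approximation: pi * [3(a + b) - sqrt((3a + b)(a + 3b))].
    """
    a = math.dist(major_p1, major_p2) / 2  # semi-major axis
    b = math.dist(minor_p1, minor_p2) / 2  # semi-minor axis
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))
```

For a circle (a = b = r) the formula reduces exactly to 2πr, and for near-circular ellipses such as a fetal skull its error is far below typical inter-observer variation.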
Since speckle and artifacts are so common in ultrasound images, their interpretation is difficult and subject to individual bias. As a result, manual annotation of HC is susceptible to both inter- and intra-observer variation, and automatic measurement of fetal HC could improve the objectivity of HC measurements. Sobhaninia et al. [1] proposed a multi-task deep CNN for automatic HC ellipse detection and prediction that optimizes a composite objective function made up of the MSE of the ellipse parameters and the segmentation Dice score. Guidelines state that HC must be measured on a transverse cross-section of the head through the midline echo, in which the anterior third is interrupted by the transparent septum (cavum septi pellucidi) and the anterior and posterior horns of the lateral ventricles are visible. Biometrics obtained manually introduce inter- and intra-observer variation; precise automated systems do not suffer from intra-observer variability and can reduce both measurement time and variability. Rajinikanth et al. [2] found that their hybrid approach produced better results than segmentation alone and, importantly, delivered higher image similarity measure values than the alternative technique. Approaches to the fetal head sub-challenge varied considerably, exploiting region criteria or edge sharpness in different ways. Both Ciurte et al.'s and Sun's solutions were based on graph models. Foi et al.'s technique [3] centered primarily on signal processing and optimization. Stebbing and McManigle [4] used a machine learning method based on a boundary fragment model learned during the training phase. The challenge evaluation showed that the best methods can achieve performance comparable to manual delineation [5]. Li et al. [6] used a random forest classifier to localize the fetal head in the ultrasound image and a fast ellipse fitting (ElliFit) approach to determine HC automatically. Their framework can be split into four stages: fetal head localization using prior knowledge and the random forest classifier, edge extraction, ellipse fitting, and HC measurement. The authors compared their method with four other methods in terms of region-based metrics (Dice, sensitivity, accuracy and specificity), distance-based metrics (maximum symmetric contour distance (MSD), root mean square symmetric contour distance (RMSD) and average symmetric contour distance (ASD)) and Bland–Altman plots. The results showed that the technique has strong potential for clinical use. A limitation of the study was the occurrence of false and missing ROI localizations; false localizations often occurred when the US image contained a structure resembling the fetal head. Zhang et al. [7] compared HC predicted directly by a regression CNN with conventional methods based on segmentation and ellipse fitting of the head contour in US scans; the error of the CNN regressor was about twice that of the segmentation-based techniques. Sobhaninia et al. [8] examined CNN techniques for fetal head segmentation, studied the effect of multi-scale views on LinkNet's segmentation performance on US images, and proposed a smaller mini-LinkNet network that performed better than the deeper LinkNet.
Training time and the number of trainable parameters were also reduced. Avalokita et al. [9] measured the fetal head circumference in four steps: selection of fetal head candidates, pre-processing to improve pixel selection, ellipse fitting, and estimation of HC from the fitted ellipse. Poojari et al. [10] reported that ultrasound estimation significantly underestimates the actual fetal head circumference and related the size of the HC measurement error to the location of the placenta. Accurate fetal HC measurement is essential for monitoring pregnancies with fetal growth restriction or suspected abnormalities of fetal head growth, for labor management and for perinatal outcome. HC is an important factor in the evaluation of fetal growth, and Fiorentino et al. [11] trained a regression CNN on distance fields to delineate HC precisely in US images, reporting a mean absolute difference (MAD) of 1.9 mm.
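The region-based metrics used in these comparisons (Dice, sensitivity, specificity and accuracy) all derive from the pixel-wise confusion counts between a predicted mask and the ground truth. A minimal sketch for binary masks stored as nested lists is given below; the function name `overlap_metrics` is hypothetical, and returning 1.0 for an empty denominator is a convention assumed here.

```python
def overlap_metrics(pred, truth):
    """Region-based metrics for two equal-sized binary masks (nested lists of 0/1)."""
    tp = fp = fn = tn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1  # head pixel correctly found
            elif p:
                fp += 1  # background labelled as head
            elif t:
                fn += 1  # head pixel missed
            else:
                tn += 1  # background correctly rejected
    total = tp + fp + fn + tn
    return {
        # Empty-denominator cases default to 1.0 (an assumed convention).
        "dice": 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 1.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 1.0,
        "accuracy": (tp + tn) / total if total else 1.0,
    }
```

Because the background usually dominates a fetal ultrasound frame, accuracy and specificity tend to be high even for poor segmentations, which is why Dice is the more informative of the four.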

2 Methodology In this paper, several extraction and assessment methods were employed to study the FHC in 2DUI. The goal of this research is to compare the outcomes of fetal head extraction; a comprehensive comparative assessment is presented to demonstrate the need for the proposed strategy.

2.1 Proposed Technique Each 2DUI image received from the hospital first goes through a pre-processing step that enhances the visibility of the head region. The segmentation module then applies and compares region-based, edge-based and clustering segmentation techniques.

2.2 HC18 Dataset van den Heuvel et al. [12] provided the standard 2DUI dataset, called HC18, used in this study. It consists of 800 × 540 pixel images obtained from different participants: a training set of 999 images with 999 annotations and a test set of 335 images.

2.3 Region-Based Segmentation Segmentation is a technique for splitting an image into simple, homogeneous regions. Region homogeneity is quantified using the Euclidean distance between pixels:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$

where p and q are two points in n-dimensional Euclidean space, and q_i and p_i are the components of their Euclidean vectors starting from the origin. In threshold segmentation, pixels are separated into objects and background according to whether their values fall above or below a threshold. Region growing. The region-growing algorithm starts from a seed pixel and compares the pixels on the border of the current region with their neighbors, adding neighbors that are sufficiently similar. Region splitting. The algorithm begins with the entire image and divides it into sub-regions until each one is uniform. Splitting normally stops when the attributes of a newly separated pair do not differ from those of the original region by more than a threshold.


Region Merging. Merging must begin with a homogeneous seed region, so an appropriate seed region is identified first. Split and Merge. Merging is accomplished by comparing neighboring groups of four nodes that share a common parent. Global threshold. A single threshold value separates the image into two classes (object and background). Local threshold. Multiple thresholds are defined to separate various objects from the background.
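Two of the region-based techniques above, region growing and global thresholding, can be sketched in a few lines of plain Python on a grayscale image stored as nested lists. The sketch makes assumptions the text leaves open: growth uses 4-connectivity and compares each candidate with the seed intensity (the 1-D case of the Euclidean distance), and the single global threshold is chosen with Otsu's method, a standard technique swapped in here because the text does not say how the threshold is selected. All function names are hypothetical.

```python
from collections import deque


def region_grow(image, seed, tol):
    """Grow a region from `seed` (row, col) over 4-connected neighbours.

    A neighbour joins when |value - seed value| <= tol, i.e. its 1-D Euclidean
    distance to the seed intensity is within the tolerance. Returns a 0/1 mask.
    """
    rows, cols = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    mask = [[0] * cols for _ in range(rows)]
    mask[seed[0]][seed[1]] = 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not mask[nr][nc] and abs(image[nr][nc] - seed_val) <= tol:
                mask[nr][nc] = 1
                queue.append((nr, nc))
    return mask


def otsu_threshold(image, levels=256):
    """Return the global threshold t maximising between-class variance (Otsu).

    Expects integer intensities in [0, levels); pixels > t are treated as object.
    """
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = sum0 = 0  # pixel count and intensity sum of the class <= t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def global_threshold(image, t):
    # One threshold for the whole image: 1 = object, 0 = background.
    return [[1 if p > t else 0 for p in row] for row in image]
```

A local-threshold variant would apply the same selection to windows or tiles of the image instead of the whole frame, which helps when ultrasound attenuation makes brightness vary across the field of view.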

2.4 Edge-Based Segmentation Edge detection is an image processing method that detects the edges of objects in an image by exploiting discontinuities in brightness. In disciplines including image processing, data analysis and object recognition, edge detection is used for image segmentation and data extraction. Edges are extracted by evaluating whether a pixel lies on an edge, comparing its value with those of its neighbors. Edges are large local variations in digital image intensity: a set of pixels along the boundary between two different regions is referred to as an edge. Edges may be diagonal, vertical or horizontal. Edge detection segments an image into discrete pieces by seeking major changes in gray level, which mark the transition between two distinct regions; it reduces the amount of data in the image while preserving its structural integrity. Edge detection operators fall into two types: gradient-based and Gaussian-based. The gradient-based Sobel, Prewitt and Roberts operators estimate first-order derivatives of the image. Among the Gaussian-based operators, the Laplacian of Gaussian evaluates second-order derivatives, while the Canny detector computes first-order derivatives after Gaussian smoothing. Sobel Operator. The Sobel operator is a discrete differentiation operator that, for edge detection, approximates the gradient of the image intensity function; at each pixel it yields either the corresponding gradient vector or its norm. Considering a 3 × 3 neighborhood of the original image, the gradient approximations at pixel (x, y) are generated as follows: Gx = x-direction kernel * (3 × 3 image patch centered at (x, y)) and Gy = y-direction kernel * (3 × 3 image patch centered at (x, y)). The estimated gradient magnitude is calculated as

$$|G| = \sqrt{G_x^2 + G_y^2}$$


The orientation of the edge is given by the angle of the spatial gradient:

$$\theta = \arctan\left(\frac{G_y}{G_x}\right)$$

Prewitt Operator. The Prewitt operator is almost indistinguishable from the Sobel operator. It also detects the vertical and horizontal edges of an image and is one of the most effective methods for determining the magnitude and orientation of edges. The direction of the gradient θ at pixel (x, y) is likewise θ = arctan(Gy/Gx), where arctan is the arctangent operator. Gradient-based operator. Considering the image as a function f of pixel intensity, the gradient magnitude g at (x, y) is

$$g(x, y) = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}$$
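The Sobel and Prewitt computations differ only in their 3 × 3 kernels, so one routine can serve both. The sketch below is a plain-Python illustration, not the authors' implementation: it applies the x-kernel and its transpose (the y-kernel) as a correlation over each interior pixel, then forms |G| and θ as in the formulas above, using `atan2` as the quadrant-aware form of arctan(Gy/Gx). The names `gradient` and `transpose` are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]


def transpose(k):
    return [list(col) for col in zip(*k)]


def gradient(image, kx):
    """Return (magnitude, orientation) maps for a grayscale image (nested lists).

    Gx uses kernel kx and Gy its transpose, applied as a correlation (no kernel
    flip) over each 3x3 neighbourhood. Border pixels are left at 0.
    """
    ky = transpose(kx)  # for SOBEL_X this is [[-1,-2,-1],[0,0,0],[1,2,1]]
    rows, cols = len(image), len(image[0])
    mag = [[0.0] * cols for _ in range(rows)]
    ang = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = sum(kx[i][j] * image[y + i - 1][x + j - 1] for i in range(3) for j in range(3))
            gy = sum(ky[i][j] * image[y + i - 1][x + j - 1] for i in range(3) for j in range(3))
            mag[y][x] = math.hypot(gx, gy)  # |G| = sqrt(Gx^2 + Gy^2)
            ang[y][x] = math.atan2(gy, gx)  # theta in radians
    return mag, ang
```

On a vertical step edge, the Sobel kernel's weight of 2 on the centre row gives a larger response than Prewitt's uniform weights, which is the extra smoothing that distinguishes the two operators.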

Canny Operator. The Canny operator detects edges using a Gaussian-based approach and is relatively robust to noise. It does not alter the image features during the extraction process, and it is generally regarded as superior to the Laplacian of Gaussian operator. Intensity gradient: the first derivatives in the horizontal (Gx) and vertical (Gy) directions are calculated with the Sobel kernels, and the edge gradient magnitude and direction are

$$\text{Edge\_Gradient}(G) = \sqrt{G_x^2 + G_y^2}, \qquad \text{Angle}(\theta) = \tan^{-1}\left(\frac{G_y}{G_x}\right)$$

Marr–Hildreth Operator. The Marr–Hildreth operator is also known as the Laplacian of Gaussian (LoG). It is a Gaussian-based operator that uses the Laplacian to calculate the second derivative of an image, which is especially useful when the gray-level transition is abrupt. It employs the zero-crossing technique: where the second-order derivative crosses zero, the first derivative reaches its maximum, and that position is taken as the edge position. The Gaussian operator reduces noise, while the Laplacian operator identifies sharp edges.

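$$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$

where σ is the standard deviation, and

$$\mathrm{LoG}(x, y) = \frac{\partial^2 G(x, y)}{\partial x^2} + \frac{\partial^2 G(x, y)}{\partial y^2} = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\, G(x, y) = \frac{x^2 + y^2 - 2\sigma^2}{2\pi\sigma^6} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$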
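The LoG formula can be sampled on a grid to obtain a discrete filter, and edges are then located where the filtered response changes sign. The sketch below illustrates both steps under assumptions not stated in the text: a kernel radius of ⌈3σ⌉, a shift to zero mean so that flat regions give no response, and sign changes checked only against the right and lower neighbours. The function names are hypothetical, and the convolution of the image with the kernel is omitted.

```python
import math


def log_kernel(sigma, radius=None):
    """Sample LoG(x, y) = (x^2 + y^2 - 2 sigma^2) / sigma^4 * G(x, y) on a square grid.

    G is the normalised 2-D Gaussian. The kernel is shifted to zero mean so that
    a constant image region produces zero response.
    """
    r = radius if radius is not None else math.ceil(3 * sigma)
    k = []
    for y in range(-r, r + 1):
        row = []
        for x in range(-r, r + 1):
            s = x * x + y * y
            g = math.exp(-s / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            row.append((s - 2 * sigma ** 2) / sigma ** 4 * g)
        k.append(row)
    mean = sum(map(sum, k)) / (len(k) ** 2)
    return [[v - mean for v in row] for row in k]


def zero_crossings(resp):
    """Mark pixels whose filter response changes sign to the right or below."""
    rows, cols = len(resp), len(resp[0])
    edges = [[0] * cols for _ in range(rows)]
    for y in range(rows - 1):
        for x in range(cols - 1):
            if resp[y][x] * resp[y][x + 1] < 0 or resp[y][x] * resp[y + 1][x] < 0:
                edges[y][x] = 1
    return edges
```

The negative centre and positive surround of the kernel are what make the response cross zero exactly at an intensity step, which is the zero-crossing property described above.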


2.5 Clustering Strategies Clustering is the practice of arranging data points into groups so that values in one group are more similar to each other than to values in other groups. The k-means clustering method calculates centroids and iterates the procedure until it finds the optimal centroids; it is also called a flat clustering algorithm. The “K” in k-means denotes the number of clusters identified by the algorithm. There are various ways to determine the ideal number of clusters; the elbow method is a sensible choice.
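For intensity-based segmentation, k-means can operate directly on scalar pixel values. The sketch below is a minimal Lloyd-style implementation in plain Python, not the authors' code; the even spacing of the initial centroids between the minimum and maximum intensity is an assumption (random or k-means++ initialization is also common), and the names `kmeans_1d` and `inertia` are hypothetical. The `inertia` helper computes the within-cluster sum of squares that the elbow method plots against k.

```python
def kmeans_1d(values, k, iters=100):
    """Cluster scalar pixel intensities into k groups. Returns (centroids, labels)."""
    lo, hi = min(values), max(values)
    if k > 1:
        centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    else:
        centroids = [sum(values) / len(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid by (1-D Euclidean) distance.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return centroids, labels


def inertia(values, centroids, labels):
    """Within-cluster sum of squares; the elbow method plots this against k."""
    return sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))
```

Running `kmeans_1d` for k = 1, 2, 3, … and looking for the point where `inertia` stops falling sharply gives the "elbow" used to choose the number of clusters.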

3 Results and Discussion The data for this study were taken from the HC18 database. Region-based, edge-based and clustering segmentation were used to segment the selected 2D ultrasound images. Region-based segmentation works effectively when the object and background have high contrast. Its fundamental drawback (see Fig. 1) is that it becomes extremely difficult to construct accurate segments when there is no significant grayscale difference or when grayscale pixel values overlap. In edge-based segmentation (see Fig. 2), the Sobel operator is used for its fast and simple computation and for detecting smooth edges. However, the Sobel operator does not always retain points in diagonal directions, it is highly sensitive to noise, and it is not very efficient at edge detection. Compared with Sobel, the Prewitt operator is essentially identical in recognizing vertical and horizontal edges and is the best operator for detecting edge orientation. With the Prewitt operator (see Fig. 3), the magnitude of the coefficients is fixed and cannot be modified, and points in opposite directions are not always preserved. The Canny operator (see Fig. 4) has excellent localization, extracts features from images without modifying them and has reduced noise sensitivity; its drawbacks are the presence of false zero crossings and complex, time-consuming calculations. In k-means clustering, an instance can change its cluster when the centroids are recomputed, and k-means creates tighter groups. Compared with hierarchical clustering, k-means clustering is faster, particularly when a large number of features is used. In fuzzy c-means (FCM) clustering, data points are assigned a degree of membership in each cluster center, which means that they can belong to more than one cluster, unlike k-means, where each data point is assigned to a single cluster center. The main disadvantage of FCM is that Euclidean distance measures can weight the underlying factors unequally. Spatial FCM reduces spurious blobs and yields more homogeneous results than the other techniques; its disadvantage is sensitivity to the initialization of the cluster centroids.

Fig. 1 Region-based segmentation results of fetal head for 000_HC.png and 012_HC.png (from left to right): a input image, b twofold segmented image, c fivefold segmented image, d ground truth

Fig. 2 Edge-based segmentation results of fetal head for 000_HC.png and 012_HC.png (from left to right): a input image, b Sobel X, c Sobel Y, d Sobel
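The difference between hard k-means assignment and the soft membership used by FCM, as discussed above, can be made concrete with the standard FCM membership formula u_j = 1 / Σ_k (d_j / d_k)^(2/(m−1)), where d is the Euclidean distance to each cluster center. The sketch below computes only this membership step for a scalar intensity; the fuzzifier m (set to 2, a common default) and the function name `fcm_memberships` are assumptions not stated in the text.

```python
def fcm_memberships(value, centers, m=2.0):
    """Fuzzy c-means membership of one scalar value in each cluster center.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), with d the Euclidean distance.
    Memberships sum to 1; m > 1 controls how fuzzy the assignment is.
    """
    d = [abs(value - c) for c in centers]
    zeros = [i for i, di in enumerate(d) if di == 0]
    if zeros:
        # The value coincides with one or more centers: split membership among them.
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]
```

For an intensity of 4 with centers at 2 and 8, k-means would assign the pixel entirely to the first cluster, whereas FCM gives memberships of 0.8 and 0.2.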

3.1 Comparative Analysis Table 1 compares the merits and demerits of the segmentation techniques.

Fig. 3 Edge-based segmentation results of fetal head for 000_HC.png and 012_HC.png (from left to right): a input image, b Prewitt X, c Prewitt Y, d Prewitt

Fig. 4 Edge-based segmentation results of fetal head for 000_HC.png and 012_HC.png (from left to right): a input image, b ground truth, c Canny


Table 1 Comparison of segmentation techniques

| Methodology | Merits | Demerits |
|---|---|---|
| Edge detection segmentation | Maintains the boundaries of high-contrast objects | Requires manual intervention; unable to cope with noise |
| Region-based segmentation techniques | Easy to implement and resistant to noise | Requires manual intervention; high computational cost in time and memory |
| Clustering techniques | Inexpensive to process | High noise sensitivity; calculating the membership function is challenging |

3.2 Performance Metrics Performance metrics are used to evaluate the quality of the segmented fetal images and to quantify the changes that processing introduces into an image. The performance measures examined in this study were the mean squared error (MSE) and the root mean squared error (RMSE) (Table 2). Mean Squared Error (MSE). The MSE is the mean of the squared differences between the processed and source images. It is calculated using the formula below, where M and N are the width and height of the images, I(i, j) is the improved image, K(i, j) is the source image, and i and j index the pixels along the two image dimensions. Table 2 Performance metrics

| Segmentation technique | Method | 000_HC.png RMSE | 012_HC.png RMSE |
|---|---|---|---|
| Edge-based segmentation | Sobel operator | 98.6 | 130.9 |
| | Prewitt operator | 26.6 | 20.5 |
| | Canny operator | 34.1 | 32.4 |
| | Marr–Hildreth operator | 177.2 | 179.2 |
| Region-based segmentation | Twofold segmented image | 168.9 | 172.8 |
| | Fivefold segmented image | 146.8 | 147.5 |
| Clustering | K-means clustering | 149.9 | 150 |

$$\mathrm{MSE} = \frac{1}{M N} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left[ I(i, j) - K(i, j) \right]^2$$

Root Mean Squared Error (RMSE). The RMSE is the square root of the MSE and indicates how much pixel values vary as a result of processing:

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$
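Both metrics are straightforward to compute for two equal-sized images. The sketch below follows the formulas above for images stored as nested lists; here the first index runs over rows, which is an assumption since the text defines M and N as width and height, and the function names are hypothetical.

```python
import math


def mse(I, K):
    """Mean squared error between two equal-sized grayscale images (nested lists)."""
    M, N = len(I), len(I[0])
    return sum((I[i][j] - K[i][j]) ** 2 for i in range(M) for j in range(N)) / (M * N)


def rmse(I, K):
    # RMSE is expressed in the same intensity units as the pixels themselves.
    return math.sqrt(mse(I, K))
```

Because RMSE is in intensity units, values such as those in Table 2 can be read directly as a typical per-pixel deviation on the 0–255 gray scale.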

4 Conclusion Edge-based, region-based and clustering segmentation approaches were applied to selected 2D sonograms (2DUI) to detect the FHC. For this investigation, we used the standard 2DUI dataset HC18 and tested the methods on an independent test set comprising data from all trimesters. Based on the results, region-based segmentation was observed to outperform edge-based and clustering segmentation for determining the fetal head. In future work, deep learning techniques will be used to analyze fetal growth and well-being and to improve the accuracy of the model.

References
1. Sobhaninia Z, Rafiei S, Emami A, Karimi N, Najarian K, Samavi S, Soroushmehr SR (2019) Fetal ultrasound image segmentation for measuring biometric parameters using multi-task deep learning. In: 2019 41st annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 6545–6548
2. Rajinikanth V, Dey N, Kumar R, Panneerselvam J, Raja NSM (2019) Fetal head periphery extraction from ultrasound image using Jaya algorithm and Chan-Vese segmentation. Procedia Comput Sci 152:66–73
3. Foi A, Maggioni M, Pepe A, Tohka J (2012) Head contour extraction from fetal ultrasound images by difference of Gaussians revolved along elliptical paths. In: Proceedings of challenge US: biometric measurements from fetal ultrasound images. IEEE international symposium on biomedical imaging, ISBI
4. Stebbing RV, McManigle JE (2012) A boundary fragment model for head segmentation in fetal ultrasound. In: Peters J (ed) Proceedings of challenge US: biometric measurements from fetal ultrasound images, ISBI, pp 9–11
5. Rueda S, Fathima S, Knight CL, Yaqub M, Papageorghiou AT, Rahmatullah B, ..., Noble JA (2013) Evaluation and comparison of current fetal ultrasound image segmentation methods for biometric measurements: a grand challenge. IEEE Trans Med Imaging 33(4):797–813
6. Li J, Wang Y, Lei B, Cheng JZ, Qin J, Wang T, …, Ni D (2017) Automatic fetal head circumference measurement in ultrasound using random forest and fast ellipse fitting. IEEE J Biomed Health Inform 22(1):215–223
7. Zhang J, Petitjean C, Lopez P, Ainouz S (2020) Direct estimation of fetal head circumference from ultrasound images based on regression CNN. In: Medical imaging with deep learning. PMLR, pp 914–922
8. Sobhaninia Z, Emami A, Karimi N, Samavi S (2020) Localization of fetal head in ultrasound images by multiscale view and deep neural networks. In: 2020 25th international computer conference, computer society of Iran (CSICC). IEEE, pp 1–5
9. Avalokita DT, Rismonita T, Handayani A, Setiawan AW (2020) Automatic fetal head circumference measurement in 2D ultrasound images based on optimized fast ellipse fitting. In: 2020 IEEE region 10 conference (TENCON). IEEE, pp 37–42
10. Poojari VG, Jose A, Pai MV (2021) Sonographic estimation of the fetal head circumference: accuracy and factors affecting the error. J Obstet Gynecol India:1–5
11. Fiorentino MC, Moccia S, Capparuccini M, Giamberini S, Frontoni E (2021) A regression framework to head-circumference delineation from US fetal images. Comput Methods Programs Biomed 198:105771
12. van den Heuvel TL, de Bruijn D, de Korte CL, Ginneken BV (2018) Automated measurement of fetal head circumference using 2D ultrasound images. PLoS ONE 13(8):e0200412

A Shallow Convolutional Neural Network Model for Breast Cancer Histopathology Image Classification Shweta Saxena, Praveen Kumar Shukla, and Yash Ukalkar

Abstract Identification of malignancy by histopathology image processing is a crucial method for cancer diagnosis. Image classification models based on deep convolutional neural networks (CNNs) achieve promising performance in computer vision on large datasets but introduce computational complexity for comparatively small histopathology datasets. This paper proposes a shallow CNN model that classifies histopathology images of breast tissue as either malignant or benign. The model is tested on two publicly available benchmark breast cancer histopathology datasets. According to the results, the proposed model achieves an accuracy of 95.19 ± 8.50% and requires fewer trainable parameters (8,052,062) than existing deep CNN models implemented with the same protocol. Keywords Convolutional neural network · Computer-aided diagnosis · Histopathology · Breast cancer

1 Introduction “Breast cancer is common cancer in females” [1]. Breast cancer occurs due to the unwanted mutations in breast cells. As a result, the abnormal cells in the breast form a mass called tumour, which may be malignant (cancerous) or benign (non-cancerous) [2]. Histopathology is the diagnosis of the infected tissues under microscope to identify cancer [3]. However, histopathological images comprise complex visual patterns, which are difficult to be analysed by a pathologist [4]. The histopathology data becomes very large and difficult to handle by a human expert [5, 6] because many patients are considered for histopathology tests. A computer-aided diagnosis

S. Saxena · Y. Ukalkar School of Computing Science Engineering, VIT Bhopal University, Bhopal, India P. K. Shukla (B) School of Computing and Information Technology, Manipal University Jaipur, Jaipur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_51



S. Saxena et al.

(CAD) model based on convolutional neural networks (CNNs) can quickly analyse images and categorize them as benign or malignant. It provides a fast and accurate diagnostic opinion on the sample image, which helps the pathologist make the final decision about the cancer diagnosis. Deep learning is a rapidly expanding field with many applications in computer vision [7–10]. In recent years, LeNet [11], AlexNet [12], VGGNet [13], GoogLeNet [14], and ResNet [15] have become popular for classifying large image datasets, and researchers have accordingly developed CNN-based CAD models for histopathological image classification [16, 17]. Cruz-Roa et al. [18] created a CNN model to recognize breast cancer tissue in whole slide histopathology images; it achieved higher accuracy than classification models based on colour, texture, and geometric features. In [19], the researchers extracted features of histopathological images using a CNN and classified them using a support vector machine (SVM). These studies were performed on small, unpublished histopathology datasets, which made further research on these methods and comparison between them difficult. To overcome this limitation, Spanhol et al. [20] published BreaKHis, a publicly available histopathology dataset. The authors performed initial experiments on the dataset and achieved 80–85% accuracy in classifying the data. Researchers now use this dataset as a benchmark for histopathological image analysis. In [21], an AlexNet-based model was proposed to classify the BreaKHis dataset, and the researchers obtained higher accuracy than [20]. Bayramoglu et al. [22] proposed a magnification-independent model (a single-task CNN) to classify the histopathological dataset and demonstrated that magnification-independent CNN classification outperforms magnification-specific classification.
In [23], the authors used the pre-trained BVLC CaffeNet model (a modified AlexNet) to classify BreaKHis data. In addition to faster development (owing to the pre-trained model), they achieved higher accuracy than with traditional handcrafted texture features. All the above studies focused on binary classification. In [24], benign and malignant images were further classified into their subclasses using a class-structure-based deep CNN (CSDCNN) model. Nejad et al. [25] used data augmentation on the BreaKHis dataset to increase the accuracy of a CNN model. Nahid and Kong [26] classified the BreaKHis dataset using five different CNN models; among them, the CNN CT histogram model achieved 92.19% accuracy on 200× magnified images. Motlagh et al. [27] proposed deep-learning-based ResNet models for the multi-class classification of a tissue microarray dataset and the BreaKHis dataset. In [28], the researchers proposed a multi-category classification model using deep residual networks. The CNN models proposed in the last decade achieved promising results compared with traditional methods; however, these models require a huge number of parameters. Zou et al. [29] proposed a novel attention high-order deep neural network (AHoNet) to extract deep features for classifying breast tissue images. Kumar and Batra [30] proposed an ensemble of seven CNN models for breast cancer image classification, in which each of the seven models makes its own prediction and the predictions are combined by soft voting.

A Shallow Convolutional Neural Network Model for Breast Cancer …


The computational complexity and training time increase with the number of trainable parameters. In this paper, we offer a computationally efficient CNN model for histopathology image classification. The specific contributions of this study are as follows:
1. We propose a novel CNN architecture to classify histopathology images of breast cancer tissue.
2. We compare LeNet, AlexNet, VGG-16, VGG-19, and the proposed CNN model on breast cancer histopathology images at magnifications of 40×, 100×, 200×, and 400×. The proposed model achieves high accuracy at a low computational cost compared with the discussed models.
The paper is structured as follows: Sect. 2 describes the histopathology datasets, Sect. 3 details the methodology, Sect. 4 compares the proposed model with existing deep models under the same experimental protocol, and Sect. 5 concludes with the future scope of research in the area.

2 Data Set We used two public histopathology datasets to evaluate the proposed model. The first is the BreaKHis dataset introduced by Spanhol et al. [20], which contains 7909 images from 82 breast biopsy patients, captured at magnifications of 40×, 100×, 200×, and 400×. Figure 1 shows malignant tissue images from the database at these magnifications. The second is the UCSB bio-segmentation benchmark dataset, referred to as Bisque [31]. It contains 58 breast cancer histopathology images (26 malignant and 32 benign cases). Figure 2 shows examples of this dataset's images.
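Because the two Bisque classes differ in size, a trivial classifier that always predicts the larger class already scores above 50%. The short Python sketch below uses only the 26/32 split reported above to compute that floor; the function name `majority_class_baseline` is illustrative and not part of the original work. The result is a useful reference point when reading the Bisque accuracies in Table 4.

```python
# Class counts for the Bisque dataset as reported in Sect. 2.
BISQUE_COUNTS = {"malignant": 26, "benign": 32}


def majority_class_baseline(counts):
    """Accuracy of a classifier that always predicts the most frequent class."""
    total = sum(counts.values())
    majority_label = max(counts, key=counts.get)  # class with the most images
    return majority_label, counts[majority_label] / total


label, acc = majority_class_baseline(BISQUE_COUNTS)
print(f"always '{label}': {100 * acc:.2f}% accuracy")  # always 'benign': 55.17% accuracy
```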

3 Methodology Figure 3 depicts the complete model for histopathology image classification. The approach has three primary steps: (1) pre-processing, (2) a CNN architecture for feature extraction and classification, and (3) a training and testing strategy, described in Sects. 3.1, 3.2, and 3.3, respectively.

3.1 Pre-processing Pre-processing consists of four steps: colour-space transformation from RGB to HSV, histogram equalization, transformation back to the original (RGB) colour space, and resizing. We first converted the images from RGB



Fig. 1 Various magnifications of a breast tissue biopsy slide a 40×, b 100×, c 200×, d 400×

Fig. 2 Images of UCSB bio-segmentation benchmark dataset a Benign b Malignant

to HSV and performed histogram equalization on the “value” channel of the HSV image. The images were then converted back to RGB. Three trials were carried out to choose a suitable image size. In the first experiment, we reduced the dimensions of the original image (700 × 460 × 3) by a factor of 0.2 (140 × 92 pixels) and trained the proposed CNN (see Sect. 3.2). This gave an accuracy of 68.67 ± 0.00% with a slow training process. In the second experiment, we reduced the dimensions by a factor of 0.1 (70 × 46 pixels) and trained the same CNN model, which produced an accuracy of 81.45 ± 6.67% with fast training. In the third experiment, we resized



Fig. 3 Proposed CNN model for classification of histopathological images. FC1 and FC2 denote the fully connected layers

the images by a factor of 0.05 (35 × 23 pixels) and trained the CNN model. Training was faster than in the previous experiment, but the accuracy dropped to 70.2 ± 5.94%. Based on these experiments, we resized all images to 70 × 46 pixels as input to the models.
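The pre-processing steps can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the paper does not name its image library or its resizing interpolation, so the sketch (a) equalizes the HSV value channel with a standard cumulative-histogram lookup table, (b) performs the RGB to HSV to RGB round trip implicitly, using the fact that with hue and saturation held fixed every RGB component scales with V = max(R, G, B), and (c) assumes block averaging for the 0.1 downscaling. All helper names are hypothetical.

```python
import numpy as np


def equalize_channel(channel):
    """Histogram-equalize one uint8 channel via a cumulative-histogram lookup table."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # pixel count of the darkest occupied level
    if channel.size == cdf_min:  # constant channel: nothing to stretch
        return channel.copy()
    lut = np.round((cdf - cdf_min) / (channel.size - cdf_min) * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[channel]


def equalize_value_channel(rgb):
    """RGB -> HSV, equalize V, HSV -> RGB.

    With H and S unchanged, each RGB component is proportional to
    V = max(R, G, B), so the round trip reduces to scaling every pixel
    by V_new / V_old (black pixels stay black).
    """
    rgb = rgb.astype(np.float64)
    v_old = rgb.max(axis=2)
    v_new = equalize_channel(v_old.astype(np.uint8)).astype(np.float64)
    scale = np.divide(v_new, v_old, out=np.zeros_like(v_old), where=v_old > 0)
    return np.clip(np.round(rgb * scale[..., None]), 0, 255).astype(np.uint8)


def downscale(rgb, factor=10):
    """Shrink by an integer factor using block averaging (factor 10 = scale 0.1)."""
    h, w, c = rgb.shape
    h2, w2 = h // factor, w // factor
    blocks = rgb[: h2 * factor, : w2 * factor].reshape(h2, factor, w2, factor, c)
    return np.round(blocks.mean(axis=(1, 3))).astype(np.uint8)


def preprocess(rgb, factor=10):
    """Equalize the value channel, then downscale (the pipeline of Sect. 3.1)."""
    return downscale(equalize_value_channel(rgb), factor)


# Random stand-in for a 700 x 460 BreaKHis image (stored as 460 rows x 700 columns).
image = np.random.default_rng(0).integers(0, 256, size=(460, 700, 3), dtype=np.uint8)
print(preprocess(image).shape)  # (46, 70, 3)
```

In practice, library routines such as OpenCV's `cv2.cvtColor`, `cv2.equalizeHist`, and `cv2.resize` would normally replace these helpers; results can differ slightly because OpenCV stores 8-bit HSV values in quantized form.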

3.2 CNN Architecture The proposed architecture is inspired by the deep CNN model presented in [32], where an eight-layer deep CNN was used to recognize traffic signs. We took this architecture as a base and performed a number



of experiments to find an architecture suitable for histopathology image classification. The result is a five-layer architecture that classifies histopathology images at different magnifications with 93–95% accuracy. The input layer of the proposed CNN has three feature maps of 70 × 46 neurons, corresponding to the RGB channels and the image size after pre-processing, respectively. The convolutional layer applies 20 filters of size 3 × 3 with stride 1, each spanning all three input feature maps. At every position, each filter computes the dot product of its kernel (filter) weights with the input pixels beneath it, so the layer generates 20 feature maps of size 70 × 46 for the next layer; padding is used to preserve the spatial resolution after convolution. The output of the convolutional layer is passed through a ReLU activation function [12]. Max-pooling [33] with a 2 × 2 window and stride 2 follows the convolutional layer and halves each spatial dimension of the feature maps, so the max-pooling layer outputs 20 feature maps of size 35 × 23. These feature maps are flattened into a one-dimensional array and fed to the fully connected layers. The first fully connected layer, FC1, contains 500 neurons. The second fully connected layer, FC2, is the decision-making layer: it contains only two neurons, one for the benign case and one for the malignant case. Dropout [34] with rate 0.5 is applied in the fully connected layers to prevent overfitting. The output layer uses the softmax function [35] instead of ReLU to produce the final result.
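As a consistency check on the layer description, the trainable-parameter count can be worked out by hand. The sketch below assumes "same" padding, one bias per filter and per neuron, and no trainable parameters in the pooling and dropout layers; under these assumptions it reproduces the 8,052,062 parameters reported for the proposed model in Tables 2 and 3. The helper names are illustrative.

```python
def conv2d_params(in_ch, out_ch, k):
    """Weights of out_ch filters of size k x k x in_ch, plus one bias per filter."""
    return k * k * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Fully connected layer: weight matrix plus one bias per output neuron."""
    return n_in * n_out + n_out


H, W, C = 46, 70, 3             # 70 x 46 RGB input, as rows x columns x channels
FILTERS, FC1, CLASSES = 20, 500, 2

conv = conv2d_params(C, FILTERS, 3)   # 'same' padding keeps 46 x 70 (assumption)
pooled_h, pooled_w = H // 2, W // 2   # 2 x 2 max-pooling, stride 2 -> 23 x 35
flat = pooled_h * pooled_w * FILTERS  # length of the flattened feature vector
fc1 = dense_params(flat, FC1)
fc2 = dense_params(FC1, CLASSES)      # softmax output layer
total = conv + fc1 + fc2              # pooling and dropout add no parameters

print(f"conv={conv:,} flat={flat:,} fc1={fc1:,} fc2={fc2:,} total={total:,}")
# conv=560 flat=16,100 fc1=8,050,500 fc2=1,002 total=8,052,062
```

The count shows where the model's size comes from: almost all parameters sit in FC1, because the 16,100-element flattened vector is fully connected to 500 neurons, while the convolutional layer contributes only 560.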

3.3 Training and Testing Strategy The dataset was split into training and testing sets using fivefold cross-validation. The proposed CNN model was trained with a stochastic gradient descent (SGD) optimizer with a momentum of 0.9 [36] and random weight initialization. The training data are divided into mini-batches of size 32; for each mini-batch, the model computes the gradient of the loss and updates the weights. The initial learning rate is 0.01, and training was run for ten epochs (an epoch is a single pass over the entire training set). The SGD optimizer minimizes the loss function. Because five folds were used, five accuracies and five losses were obtained, one per fold, and the reported result is the average of these accuracies and losses.
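The protocol above (fivefold cross-validation, mini-batches of 32, SGD with learning rate 0.01 and momentum 0.9, ten epochs, mean ± standard deviation over folds) can be illustrated with the NumPy sketch below. To keep it self-contained and quick to run, a logistic-regression classifier on synthetic two-class data stands in for the CNN; the data, the helper names, and the binary cross-entropy loss are assumptions for illustration only. The momentum update follows the classical (non-Nesterov) form, which is also the form Keras documents for its `SGD` optimizer.

```python
import numpy as np


def kfold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """Classical momentum: v <- momentum * v - lr * grad; w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


def train_logistic(X, y, epochs=10, batch_size=32, lr=0.01, momentum=0.9, seed=0):
    """Stand-in classifier (logistic regression) trained by mini-batch SGD."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])  # random weight initialization
    v = np.zeros_like(w)
    for _ in range(epochs):  # one epoch = one pass over the training set
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            b = order[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-X[b] @ w))  # predicted P(class 1)
            grad = X[b].T @ (p - y[b]) / len(b)  # gradient of mean cross-entropy
            w, v = sgd_momentum_step(w, grad, v, lr, momentum)
    return w


def cross_validate(X, y, k=5):
    """Train on k-1 folds, test on the held-out fold; return mean and std accuracy."""
    accs = []
    for train_idx, test_idx in kfold_splits(len(X), k):
        w = train_logistic(X[train_idx], y[train_idx])
        pred = (X[test_idx] @ w > 0).astype(int)
        accs.append(np.mean(pred == y[test_idx]))
    return float(np.mean(accs)), float(np.std(accs))


# Hypothetical, well-separated two-class data with a constant bias column.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
X = np.hstack([X, np.ones((200, 1))])
y = np.array([0] * 100 + [1] * 100)
mean_acc, std_acc = cross_validate(X, y)
print(f"{100 * mean_acc:.2f} ± {100 * std_acc:.2f}")
```

Reporting the mean and standard deviation of the per-fold accuracies is what produces figures of the form "95.19 ± 8.50"; whether the paper uses the population or the sample standard deviation is not stated, and the sketch uses NumPy's default (population).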

4 Results and Discussions The proposed model was implemented in Python using the Keras API with the TensorFlow backend and the scikit-learn library [37]. Image data processing was done on an NVIDIA Quadro K5200 graphics card. An HP Z840 workstation equipped with two 64-bit processors (Intel® Xeon® CPU E5-2650 v3 @ 2.30 GHz) and 8 GB of RAM was



utilized to perform the experiments. The performance of the proposed model is assessed in terms of accuracy and computational complexity. True positives (TP) denote the number of correctly classified malignant images, and true negatives (TN) denote the number of correctly classified benign images. The accuracy is then calculated as shown in Eq. (1):

$$\text{Accuracy} = \frac{TP + TN}{\text{total number of images}} \qquad (1)$$
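Eq. (1) translates directly into code. In the minimal sketch below, label 1 stands for malignant (the positive class) and 0 for benign; this binary encoding and the function name are assumptions for illustration.

```python
def accuracy(y_true, y_pred, positive=1):
    """Eq. (1): (TP + TN) / total number of images, for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    # TP: malignant images classified as malignant.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    # TN: benign images classified as benign.
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return (tp + tn) / len(y_true)


print(accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 0.6
```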

The computational complexity of a CNN model depends on the number of trainable parameters. A large number of parameters requires more memory and more computation to update the weight matrices at each training iteration, so the time required to train the model increases. We therefore calculated the number of parameters required to train each CNN model and compared them with those of the proposed model. Two publicly available breast cancer datasets (BreaKHis and Bisque) were used to evaluate the CNN models. For the Bisque dataset, we cropped the original 896 × 768 × 3 images to 700 × 460 × 3 pixels before pre-processing; the remaining process was the same as for the BreaKHis dataset.
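The paper does not state where the 700 × 460 window is taken from the 896 × 768 Bisque images. A centre crop, sketched below in NumPy, is one plausible choice; treat both the placement and the function name `center_crop` as assumptions.

```python
import numpy as np


def center_crop(img, out_h=460, out_w=700):
    """Cut an out_h x out_w window from the centre of an image array."""
    h, w = img.shape[:2]
    if out_h > h or out_w > w:
        raise ValueError("crop window is larger than the image")
    top, left = (h - out_h) // 2, (w - out_w) // 2  # equal margins on both sides
    return img[top:top + out_h, left:left + out_w]


# Stand-in for an 896 x 768 Bisque image (768 rows x 896 columns x 3 channels).
bisque = np.zeros((768, 896, 3), dtype=np.uint8)
print(center_crop(bisque).shape)  # (460, 700, 3)
```

The cropped 700 × 460 image then goes through the same pre-processing as a BreaKHis image.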

4.1 Analysis of Accuracy We compare our CNN model with the existing CNNs LeNet, AlexNet, VGG-16, and VGG-19, which achieved state-of-the-art performance in previous studies [11–13]. LeNet was proposed for the MNIST dataset, whereas AlexNet, VGG-16, and VGG-19 were proposed for the ImageNet dataset. We implemented these CNN architectures on the same datasets and in the same environment. Tables 1 and 2 show the accuracy and time complexity comparisons on the BreaKHis dataset, respectively. Only LeNet has fewer parameters to train than our model, and this comes at the cost of accuracy. Compared with the other models, our model gives the highest accuracy on BreaKHis while using fewer parameters to train, as shown in Tables 1 and 2. The time complexity and accuracy on the Bisque dataset are presented in Tables 3 and 4, respectively. The proposed model showed a similar result on the Bisque dataset

Table 1 Accuracy comparison of models on BreaKHis dataset in same environment (accuracy, %)

| CNN models | 40× | 100× | 200× | 400× |
|---|---|---|---|---|
| Proposed model | 95.19 ± 8.50 | 95.03 ± 8.21 | 93.65 ± 9.67 | 94.63 ± 8.01 |
| LeNet [11] | 80.2 ± 4.95 | 69.82 ± 2.31 | 79.39 ± 11.48 | 73.73 ± 6.46 |
| AlexNet [12] | 68.67 ± 0.0 | 68.67 ± 0.0 | 69.05 ± 0.08 | 69.05 ± 0.0910 |
| VGG-16 [13] | 68.67 ± 0.0 | 68.67 ± 0.0 | 69.05 ± 0.08 | 69.05 ± 0.0910 |
| VGG-19 [13] | 68.67 ± 0.0 | 68.67 ± 0.0 | 69.05 ± 0.08 | 69.05 ± 0.08 |

Table 2 Number of parameters and time for classification on BreaKHis dataset

| CNN model | Parameters | Time (s) |
|---|---|---|
| Proposed model | 8,052,062 | 587.569 |
| LeNet [11] | 372,366 | 533.294 |
| AlexNet [12] | 112,815,766 | 1849.785 |
| VGG-16 [13] | 134,268,738 | 2140.558 |
| VGG-19 [13] | 187,812,930 | 2702.981 |

Table 3 Number of parameters and time for classification on Bisque dataset

| CNN model | Parameters | Time (s) |
|---|---|---|
| Proposed model | 8,052,062 | 72.046 |
| LeNet [11] | 372,366 | 45.42 |
| AlexNet [12] | 112,815,766 | 827.633 |
| VGG-16 [13] | 134,268,738 | 1259.645 |
| VGG-19 [13] | 187,812,930 | 1365.272 |

Table 4 Accuracy comparison of models on Bisque dataset in same environment

| CNN model | Accuracy (%) |
|---|---|
| Proposed model | 90.64 ± 14.90 |
| LeNet [11] | 91.90 ± 8.44 |
| AlexNet [12] | 55.16 ± 1.61 |
| VGG-16 [13] | 55.16 ± 1.61 |
| VGG-19 [13] | 55.16 ± 1.61 |

as for the BreaKHis dataset. However, the Bisque dataset contains relatively few samples, so only a small number of images is available for training. As a result, the accuracy is lower and the standard deviation higher than on the BreaKHis dataset.
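The savings reported in Table 2 are easier to compare as ratios relative to the proposed model. The sketch below copies the BreaKHis parameter counts and training times from Table 2 verbatim and computes those ratios; the dictionary and function names are illustrative only.

```python
# Values copied from Table 2 (BreaKHis): (trainable parameters, training time in s).
TABLE2 = {
    "Proposed model": (8_052_062, 587.569),
    "LeNet": (372_366, 533.294),
    "AlexNet": (112_815_766, 1849.785),
    "VGG-16": (134_268_738, 2140.558),
    "VGG-19": (187_812_930, 2702.981),
}


def relative_to(reference, table):
    """Parameter and time ratios of every model with respect to `reference`."""
    ref_params, ref_time = table[reference]
    return {name: (p / ref_params, t / ref_time) for name, (p, t) in table.items()}


for name, (p_ratio, t_ratio) in relative_to("Proposed model", TABLE2).items():
    print(f"{name:15s} params x{p_ratio:6.2f}  time x{t_ratio:5.2f}")
```

On these figures, AlexNet and the VGG networks use roughly 14 to 23 times as many parameters as the proposed model, which in turn needs about 22–32% of their training time; LeNet remains the smallest and fastest model.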

5 Conclusion In this article, we proposed a computationally efficient shallow CNN model for breast histopathology image classification. The results show that the model's feature extraction and classification procedure is faster and less resource-intensive than the deeper CNNs compared (AlexNet, VGG-16, and VGG-19) and more accurate than all compared models on the BreaKHis dataset. This approach could potentially be used to classify breast cancer histopathology images in real time. The accuracy of our model shows a large standard deviation around the mean, which could be reduced by training on a larger dataset. CNN ensembles may also provide more accurate classification results.



References
1. Torre LA, Islami F, Siegel RL, Ward EM, Jemal A (2017) Global cancer in women: burden and trends. Cancer Epidemiol Prev Biomarkers 26(4):444–457
2. Cooper GM (2000) The development and causes of cancer, chap 15, 2 edn. Sinauer Associates, Sunderland, MA
3. Kumar R, Srivastava R, Srivastava S (2015) Detection and classification of cancer from microscopic biopsy images using clinically significant and biologically interpretable features. J Med Eng 2015
4. Arevalo J, Cruz-Roa A, GONZÁLEZ O FA (2014) Histopathology image representation for automatic analysis: a state-of-the-art review. Rev Med 22(2):79–91
5. Dalle J-R, Leow WK, Racoceanu D, Tutac AE, Putti TC (2008) Automatic breast cancer grading of histopathological images. In: 2008 30th annual international conference of the IEEE engineering in medicine and biology society. IEEE, pp 3052–3055
6. Gupta V, Bhavsar A (2017) Breast cancer histopathological image classification: is magnification important? In: 2017 IEEE conference on computer vision and pattern recognition workshops (CVPRW), Honolulu, HI, USA. IEEE, pp 769–776
7. Alom MdZ, Taha TM, Yakopcic C, Westberg S, Sidike P, Nasrin MS, Van Esesn BC, Awwal AAS, Asari VK (2018) The history began from alexnet: a comprehensive survey on deep learning approaches. arXiv preprint arXiv:1803.01164
8. Signoroni A, Savardi M, Pezzoni M, Guerrini F, Arrigoni S, Turra G (2018) Combining the use of CNN classification and strength-driven compression for the robust identification of bacterial species on hyperspectral culture plate images. IET Comput Vis 12(7):941–949
9. Lakhal MI, Çevikalp H, Escalera S, Ofli F (2018) Recurrent neural networks for remote sensing image classification. IET Comput Vis 12(7):1040–1045
10. Duraisamy S, Emperumal S (2017) Computer-aided mammogram diagnosis system using deep learning convolutional fully complex-valued relaxation neural network classifier. IET Comput Vis 11(8):656–662
11. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
12. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, vol 1, Lake Tahoe, Nevada, May 2012. ACM, pp 1–9
13. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. Inf Softw Technol 51(4):769–784
14. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), Boston, MA, USA, June 2015. IEEE, pp 1–9
15. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE, pp 1–9
16. Sharma S, Kumar S (2022) The xception model: a potential feature extractor in breast cancer histology images classification. ICT Express 8(1):101–108
17. Kashyap R (2022) Breast cancer histopathological image classification using stochastic dilated residual ghost model. Int J Inf Retrieval Res (IJIRR) 12(1):1–24
18. Cruz-Roa A, Basavanhally A, González F, Gilmore H, Feldman M, Ganesan S, Shih N, Tomaszewski J, Madabhushi A (2014) Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks. In: Gurcan MN, Madabhushi A (eds) Proceedings of the international society for optical engineering (SPIE) 9041, medical imaging 2014: digital pathology, vol 9041, San Diego, California, United States, Mar 2014. Springer, Berlin, p 904103
19. Araujo T, Aresta G, Castro E, Rouco J, Aguiar P, Eloy C, Polonia A, Campilho A (2017) Classification of breast cancer histology images using convolutional neural networks. PLoS One 12(6):1–14



20. Spanhol FA, Oliveira LS, Petitjean C, Heutte L (2016) A dataset for breast cancer histopathological image classification. IEEE Trans Biomed Eng 63(7):1455–1462
21. Spanhol FA, Oliveira LS, Petitjean C, Heutte L (2016) Breast cancer histopathological image classification using convolutional neural networks. In: 2016 international joint conference on neural networks (IJCNN), Vancouver, British Columbia, Canada. IEEE, pp 2560–2567
22. Bayramoglu N, Kannala J, Heikkila J (2016) Deep learning for magnification independent breast cancer histopathology image classification. In: Proceedings of the 23rd international conference on pattern recognition (ICPR), Cancun, Mexico, Dec 2016. IEEE, pp 2440–2445
23. Spanhol FA, Oliveira LS, Cavalin PR, Petitjean C, Heutte L (2017) Deep features for breast cancer histopathological image classification. In: 2017 IEEE international conference on systems, man, and cybernetics (SMC), Banff, AB, Canada, Oct 2017. IEEE, pp 1868–1873
24. Han Z, Wei B, Zheng Y, Yin Y, Li K, Li S (2017) Breast cancer multi-classification from histopathological images with structured deep learning model. Sci Rep 7(1):1–10
25. Nejad EM, Affendey LS, Latip RB, Ishak IB (2017) Classification of histopathology images of breast into benign and malignant using a single-layer convolutional neural network. In: Proceedings of the international conference on imaging, signal processing and communication, pp 50–53
26. Nahid A-A, Kong Y (2018) Histopathological breast-image classification using local and frequency domains by convolutional neural network. Information 9(1):19
27. Motlagh MH, Jannesari M, Aboulkheyr HR, Khosravi P, Elemento O, Totonchi M, Hajirasouliha I (2018) Breast cancer histopathological image classification: a deep learning approach. BioRxiv, p 242818
28. Gandomkar Z, Brennan PC, Mello-Thoms C (2018) MuDeRN: multi-category classification of breast histopathological image using deep residual networks. Artif Intell Med 88:14–24
29. Zou Y, Zhang J, Huang S, Liu B (2022) Breast cancer histopathological image classification using attention high-order deep network. Int J Imaging Syst Technol 32(1):266–279
30. Kumar D, Batra U (2021) Breast cancer histopathology image classification using soft voting classifier. In: Proceedings of 3rd international conference on computing informatics and networks. Springer, Berlin, pp 619–631
31. Drelie Gelasca E, Byun J, Obara B, Manjunath BS (2008) Evaluation and benchmark for biological image segmentation. In: 2008 15th IEEE international conference on image processing, pp 1816–1819
32. Stallkamp J, Schlipsing M, Salmen J, Igel C (2012) Man vs. computer: benchmarking machine learning algorithms for traffic sign recognition. Neural Netw 32:323–332
33. Scherer D, Müller A, Behnke S (2010) Evaluation of pooling operations in convolutional architectures for object recognition. In: International conference on artificial neural networks. Springer, Berlin, pp 92–101
34. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580
35. Bishop CM, Nasrabadi NM (2006) Pattern recognition and machine learning, vol 4. Springer, Berlin
36. Ruder S (2017) An overview of gradient descent optimization algorithms
37. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12(85):2825–2830

Efficient Packet Flow Path Allocation Using Node Proclivity Tracing Algorithm R. Aruna, M. Shyamala Devi, S. Vinoth Kumar, S. Umarani, N. S. Kavitha, and S. Gopi

Abstract In a mobile ad hoc network, the behaviour of mobile nodes varies: a well-behaved node performs the normal routing process, whereas a misbehaving node performs abnormal routing. Such an abnormally routing node is called an omission node; it does not transmit or receive data packets in its assigned time slot. As a result, data packets do not reach the destination, which decreases the packet delivery ratio, increases energy consumption, and degrades the overall performance of the network. Such nodes are difficult to detect. The proposed technique, efficient packet flow path allocation (EPF), is used to establish an optimized route between source and destination. EPF analyses the packet-forwarding behaviour of each node and the size of the packets each node transmits. The node proclivity tracing algorithm measures the characteristics of every node on a particular routing path and captures omission attacker nodes on that path. This improves detection efficiency and minimizes the overall energy consumption of the network. Keywords Efficient packet flow path allocation technique · Node proclivity tracing algorithm · Omission attack node

R. Aruna (B) · M. Shyamala Devi · S. Vinoth Kumar · S. Gopi Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamilnadu, India e-mail: [email protected] S. Vinoth Kumar e-mail: [email protected] S. Gopi e-mail: [email protected] S. Umarani · N. S. Kavitha Erode Sengunthar Engineering College, Erode, Tamilnadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_52

1 Introduction A mobile ad hoc network is a group of mobile nodes that share data packets without any central management. Because of its self-configuration, elasticity, and dispersed



R. Aruna et al.

environment, it is applied in various settings such as disaster recovery, exploration, and rescue operations, and in other network environments with movable nodes. A mobile ad hoc network has many characteristics, including multihop packet sharing, dynamic topology, resource restrictions, security requirements, and limited energy availability [1]. A communication technique therefore needs to balance several competing difficulties. The main aims of mobile network communication techniques are to increase network lifetime, energy efficiency, and transmission rate and to reduce packet latency; such techniques also support combined, tree-based communication in mobile ad hoc networks [2]. Various techniques have been applied to construct efficient communication schemes that reduce packet latency and thereby minimize the overall traffic in the network. The routing scheme should be based on minimum-distance route communication [3]. Mobile nodes located near the centre of the network carry heavier loads than the remaining nodes placed at the edge of the network. In addition, node mobility causes paths to fail, leading to a high packet drop rate and a low transmission rate [4].

2 Literature Review Chacko and Senthilkumar [5] proposed a combined multipath ad hoc routing technique that identifies multiple paths on demand on the basis of interference plus noise, that is, the signal-to-interference-plus-noise ratio (SINR); route decisions based on SINR help keep the discovered paths active. Priya and Gopinathan [6] applied a zone-based opportunistic routing technique to obtain an efficient and reliable data delivery rate in mobile networks. Their method combines the best features of earlier methods to improve performance, helping to locate the target node accurately and to deliver data packets to it more effectively. Akbas et al. [7] develop a realistic wireless link-layer model built on the empirically verified energy dissipation characteristics of mobile network links, and use this link-layer model to construct a novel mixed-integer programming framework for the joint optimization of transmission power level and data packet size. Mallapur et al. [8] proposed a load-balancing technique that selects intermediate nodes using the connectivity rate and routing cost, distributing traffic efficiently by choosing the most favourable routes; node density metrics are also used to improve packet-flow allocation. Experimental results show that the scheme improves traffic rate, transmission rate, packet latency, and packet loss rate. Chandra and Kumar [9] present a reliable, stable-route variant of ad hoc on-demand multipath distance vector (AOMDV) routing that improves on the original AOMDV technique. Liu et al. [10] propose a scheme to reduce latency in which every node is equipped with an embedded buffer that maintains the details of packet transmissions. Bijon et al. [11] present a novel multi-node

Efficient Packet Flow Path Allocation Using Node Proclivity Tracing …


recommendation-based secure routing method. Simulation results indicate that the method not only performs well in the presence of unreliable recommending nodes but also ensures better and scalable security based on information sharing, while minimizing the overall packet transmission delay. Kiran and Reddy [12] present a prioritized QoS scheme that increases the transmission rate of prioritized streams in self-aware mobile ad hoc networks; experimental results show a better transmission rate than existing schemes. Song and Meng [13] propose an approach founded on link stability and route stability, combined with further information. Amin et al. [14] review various earlier routing schemes that provide solutions to various problems; an important technique among them is ARRIVE, which is used to handle various problems during communication.

3 Proposed Efficient Packet Flow Path Allocation The proposed technique operates in a network environment that contains many nodes whose activities differ; node behaviour normally changes with circumstances such as location, packet size, and packet-flow direction. Trustworthy node behaviour leads to reliable communication; otherwise, communication may be unreliable. A node that blocks communication in the network is called an omission node. Because it does not share information with its neighbour nodes, an omission attacker node is difficult to find, and the history of its packet transmission rate is difficult to analyse. Such nodes increase energy usage and reduce the attack detection rate. The proposed technique aims to increase detection efficiency and reduce energy usage. Figure 1 shows the block diagram of the proposed efficient packet flow path allocation technique. The packet-flow direction and packet size of every node are analysed during the communication period, and infrequent packet-flow paths in the mobile network are filtered out.

3.1 Analysis of Packet Flow and Size A data packet consists of a header section and an information section. The header section contains information such as the source address, the destination address, the total number of nodes with their position details, and parity bits. Transmitting packets of the maximum size causes overload, time delay, and higher energy consumption for that communication. Path allocation is expressed as

$$PA = PP + NT \qquad (1)$$

where PA denotes path allocation, PP the perfect path, and NT node tracing.

The packets blocked at each node on a path are analysed. The energy used by the path-finding and path-generation process is kept to a minimum relative to the energy usage of the entire network. All nodes are assumed to communicate in a time-slot-based manner.

Fig. 1 Efficient packet flow path allocation workflow (packet flow and packet size analysis → filter out the infrequent packet flow → efficient packet flow path allocation → find and allocate perfect routing path → node proclivity tracing algorithm → improve detection efficiency, reduce energy consumption)

Each node forwards data packets towards the destination node and provides a direct link to the next neighbour node on the routing path. Time-slot-based communication is used to moderate interference among the transmitting nodes during a particular time slot allocation. The time slot used by every packet transmission is measured, and that time slot is then assigned to each packet transmission from the source node to the destination node. The perfect path is given by

$$PP = PS \times PF \qquad (2)$$

where PS denotes packet size and PF packet flow.

A packet transmission is successful when the receiver node accepts every data and reply packet with no packet loss. The node energy level is analyzed each time, before and after packet transmission. Packet size is important for obtaining the perfect routing path. Let P(H + I) denote the header and information of a packet, and let P1(t1) denote packet 1 in its time slot 1:

$$\mathrm{PS} = P(H + I) \qquad (3)$$

$$\mathrm{PF} = P_1(t_1) + P_2(t_2) + P_3(t_3) \qquad (4)$$
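Equations (2) to (4) can be evaluated with a short sketch. It rests on two assumptions, since the section does not fix units. P(H + I) is read as the packet size in bytes (header plus information). Each P_i(t_i) is read as the number of packets a node forwards in time slot t_i. The function names are hypothetical, and the 512-byte total matches the packet size in Table 1.

```python
from typing import Sequence

def packet_size(header_bytes: int, info_bytes: int) -> int:
    """Eq. (3): PS = P(H + I), assumed here to mean header size plus information size."""
    return header_bytes + info_bytes

def packet_flow(per_slot_packets: Sequence[int]) -> int:
    """Eq. (4): PF = P1(t1) + P2(t2) + ..., assumed here to be packets forwarded per slot."""
    return sum(per_slot_packets)

def perfect_path(header_bytes: int, info_bytes: int, per_slot_packets: Sequence[int]) -> int:
    """Eq. (2): PP = PS * PF."""
    return packet_size(header_bytes, info_bytes) * packet_flow(per_slot_packets)

# Hypothetical example: 32-byte header + 480-byte payload, flow over three time slots.
pp = perfect_path(32, 480, [4, 3, 5])
print(pp)  # 512 * 12 = 6144
```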

Packet flows on the next neighboring node are estimated from the already-estimated energy levels of the data-broadcasting node and by re-broadcasting to each node in the routing path. Estimating the packet flows is, however, an overall optimization procedure. It does not provide a direct link between the nodes in the routing path that have lower packet loss.

Efficient Packet Flow Path Allocation Using Node Proclivity Tracing …


3.2 Efficient Packet Flow Path Allocation Algorithm

Routing paths between the source and destination nodes are provided with the support of relaying nodes. These relaying nodes also offer routing paths that are not used for further communication. When the load exceeds the threshold level, congestion occurs and another relay node is used for packet transmission. The packets are dispersed unevenly, in contrast to arbitrarily selected routes in a mobile network. The presence of mobile nodes on the processing route and the standard availability of each route are estimated for the communication process. The sharing of packet flow on every route is based on node density.

$$P_1(t_1) + P_2(t_2) + P_3(t_3) = P_n(t_n) \qquad (5)$$

The packet size of a node and its waiting-state time are measured. The neighbor node then decides whether the node is an attacker or a trusted node. When a packet's waiting time is at its maximum, the node is fully loaded, and data packets arriving at the receiver node are dropped. Otherwise, when the buffer storage is not full, it is essential to analyze the node's input and output packets. The receiving and transmitting times of packets at every node are measured. This gives the time gap between receiving each packet and being ready to transmit it. A minimal packet transmission time indicates efficient communication. Additionally, the total waiting time under load is estimated.

$$\mathrm{PF} = P_n(t_n) \qquad (6)$$

For each node, the present packet receiving time and present packet transmitting time are recorded. So are the existing packet receiving rate and packet transmitting time, and the receiving times of the previous packet and the second packet for routing. This time gap is applied as a threshold level to identify an omission attack, in which a node does not transmit and receive packets within a specific time period. Data packets are initially split and then broadcast along various routes among the nodes connected in the network environment. CN(E) denotes the characteristics (efficiency) of a node. Substituting Eqs. (3) and (6) into Eq. (2) gives

$$\mathrm{PP} = P(H + I) \times P_n(t_n) \qquad (7)$$

$$\mathrm{NT} = \mathrm{CN}(E) \times (P_1 + P_2 + P_3) \qquad (8)$$
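Node tracing and path allocation can be combined in the same way. The sketch below evaluates Eq. (8) and then Eq. (1), taking PP as already computed from Eq. (7). It treats CN(E) as a scalar node-efficiency weight in [0, 1]. That weight range and the function names are assumptions for illustration only; the paper does not define CN(E) numerically.

```python
from typing import Sequence

def node_tracing(cn_e: float, packets: Sequence[float]) -> float:
    """Eq. (8): NT = CN(E) * (P1 + P2 + P3); accepts any number of terms."""
    if not 0.0 <= cn_e <= 1.0:
        # Assumed range for the node-efficiency weight; not stated in the paper.
        raise ValueError("CN(E) is assumed here to be a weight in [0, 1]")
    return cn_e * sum(packets)

def path_allocation(pp: float, cn_e: float, packets: Sequence[float]) -> float:
    """Eq. (1): PA = PP + NT."""
    return pp + node_tracing(cn_e, packets)

# Hypothetical values: PP from Eq. (7), node efficiency 0.5, three slot counts.
print(path_allocation(6144.0, 0.5, [4, 3, 5]))  # 6144 + 0.5 * 12 = 6150.0
```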

Depending on the packet transmission rate, intermediate nodes that consume less energy are chosen as relay nodes. The relaying node chooses the least-loaded route as the default route, and data packets flow along this default path. When an omission attack is identified on a route, the intermediate node again selects the best route from the remaining routes and estimates the most helpful route.
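The time-gap threshold for flagging omission attacks, described earlier in this section, can be sketched as below. The sketch makes several assumptions that the paper does not specify:

- each node logs (receive_time, transmit_time) pairs per packet;
- a packet that is received but never forwarded has no transmit time;
- a node is flagged if it drops any packet or if its mean receive-to-transmit gap exceeds a threshold.

The log layout, the use of the mean, and the threshold value are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# (receive_time, transmit_time) in ms; transmit_time is None if never forwarded (assumed layout).
PacketLog = List[Tuple[float, Optional[float]]]

def is_omission_node(log: PacketLog, gap_threshold_ms: float) -> bool:
    """Flag a node that drops packets or, on average, holds them longer than the threshold."""
    if not log:
        return False  # nothing observed yet, so no evidence either way
    if any(tx is None for _, tx in log):
        return True  # received but never transmitted: omission behaviour
    gaps = [tx - rx for rx, tx in log]
    return mean(gaps) > gap_threshold_ms

def trusted_relays(logs: Dict[str, PacketLog], gap_threshold_ms: float) -> List[str]:
    """Return the IDs of nodes that pass the time-gap check, sorted."""
    return sorted(n for n, log in logs.items() if not is_omission_node(log, gap_threshold_ms))

logs = {
    "n1": [(0.0, 1.0), (2.0, 2.5)],   # forwards promptly
    "n2": [(0.0, 9.0), (1.0, 12.0)],  # forwards very slowly
    "n3": [(0.0, 0.4), (1.0, None)],  # silently dropped a packet
}
print(trusted_relays(logs, gap_threshold_ms=3.0))  # ['n1']
```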


Efficient packet flow path allocation algorithm:
Step 1: Measure packet size and packet flow for each node
Step 2: For each neighbor node, selected sequentially
Step 3: Analyze the load of the routing path
Step 4: Check whether the path nodes are perfect
Step 5: if {node == perfect}
Step 6: Permit packet transmission
Step 7: else
Step 8: if {node == omission}
Step 9: Do not permit packet transmission
Step 10: Search for another node for routing
Step 11: Choose a relay node
Step 12: end if
Step 13: end for

The routing time taken is an individual part of the network's needs. Dynamic communication is considered as derived from the min-hop design. A connection with low stability is regarded as overloaded. In dynamic communication, if a connection is one of the links forming the minimum-distance route, the lower-delay path is selected. If this connection is central in the network environment, its failure degrades network performance. To overcome this difficulty, the path with maximum stability is chosen. This corresponds to the max-flow part, which treats connection stability in the network as network overload. Once the communication path is chosen, the maximum number of packets is forwarded easily. Node processing is monitored using the node proclivity tracing algorithm, with node tracing expressed as

$$\mathrm{NT} = \mathrm{CN}(E) \times P_n \qquad (9)$$

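Substituting Eqs. (7) and (9) into Eq. (1) gives the combined path allocation:

$$\mathrm{PA} = P(H + I) \times P_n(t_n) + \mathrm{CN}(E) \times P_n \qquad (10)$$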
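The path allocation steps listed earlier in this section can be sketched as a loop over candidate neighbors. The sketch rests on these assumptions, none of which the paper specifies:

- each neighbor is already labelled "perfect" or "omission", for example by node proclivity tracing;
- Step 1 has already produced a numeric load per neighbor;
- a relay is the least-loaded trusted neighbor when every trusted neighbor is above the load threshold.

The data structures and this relay rule are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Neighbor:
    node_id: str
    status: str   # "perfect" or "omission" (assumed labels)
    load: float   # load on the path through this node (assumed metric from Step 1)

def allocate_next_hop(neighbors: List[Neighbor], load_threshold: float) -> Optional[str]:
    """Return the next-hop ID, skipping omission nodes; None if no usable hop exists."""
    relay_candidates: List[Neighbor] = []
    for nb in neighbors:                      # Step 2: visit neighbors sequentially
        if nb.status == "omission":           # Steps 8-10: refuse and keep searching
            continue
        if nb.status == "perfect":            # Steps 4-5: trusted node
            if nb.load <= load_threshold:     # Steps 3 and 6: load acceptable, permit
                return nb.node_id
            relay_candidates.append(nb)       # trusted but currently overloaded
    if relay_candidates:                      # Step 11: choose a relay node
        return min(relay_candidates, key=lambda nb: nb.load).node_id
    return None

hops = [Neighbor("a", "omission", 0.1), Neighbor("b", "perfect", 0.9), Neighbor("c", "perfect", 0.7)]
print(allocate_next_hop(hops, load_threshold=0.5))  # c
```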

This algorithm traces all node activities during communication and achieves a perfect routing path from the source node to the destination node in the network environment. Data packets overloaded by an omission attack are easily traced. Once these details are found, the attacking node is removed, and an alternate relay node is used for communication based on packet flow and packet size.

Node proclivity tracing algorithm:
Step 1: Monitor node processing
Step 2: For each node, trace all details of its communication
Step 3: if {nodeTrace == sufficient}
Step 4: Share packets effectively in the path
Step 5: Detect attacks and remove them
Step 6: else
Step 7: if {nodeTrace == insufficient}
Step 8: Change the relay node for transmission
Step 9: End if
Step 10: Increase detection efficiency and reduce energy usage
Step 11: End for

The node proclivity tracing algorithm is constructed to trace the characteristics of every mobile node during routing. The packet ID carries each mobile node's information. The algorithm also traces the characteristics of routing mobile nodes, which are maintained in the routing table.

Table 1 Simulation setup

| Parameter | Value |
|---|---|
| No. of nodes | 100 |
| Area size | 1050 × 900 |
| MAC | 802.11g |
| Radio range | 250 m |
| Simulation time | 43 ms |
| Traffic source | CBR |
| Packet size | 512 bytes |
| Mobility model | Random way point |
| Protocol | AODV |
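The node proclivity tracing steps can be sketched as follows. The sketch rests on three assumptions, all hypothetical:

- a trace is "sufficient" when at least a minimum number of received packets has been observed for a node;
- a node with a sufficient trace whose forwarded-to-received ratio falls below a cut-off is treated as an attacker and removed;
- a node with an insufficient trace triggers a relay change.

The event count, the forwarding-ratio cut-off, and the record layout are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class NodeTrace:
    received: int   # packets the node received in the traced period
    forwarded: int  # packets it forwarded onward

def proclivity_trace(traces: Dict[str, NodeTrace], min_events: int = 10,
                     min_forward_ratio: float = 0.9) -> Tuple[List[str], List[str], List[str]]:
    """Return (kept, removed_attackers, change_relay) node IDs (assumed decision rules)."""
    kept: List[str] = []
    removed: List[str] = []
    change_relay: List[str] = []
    for node_id, t in sorted(traces.items()):       # Step 2: trace every node
        if t.received >= min_events:                 # Step 3: trace is sufficient
            if t.forwarded / t.received < min_forward_ratio:
                removed.append(node_id)              # Step 5: detect attack and remove
            else:
                kept.append(node_id)                 # Step 4: effective packet sharing
        else:                                        # Step 7: trace is insufficient
            change_relay.append(node_id)             # Step 8: change the relay node
    return kept, removed, change_relay

traces = {"n1": NodeTrace(50, 49), "n2": NodeTrace(40, 12), "n3": NodeTrace(3, 3)}
print(proclivity_trace(traces))  # (['n1'], ['n2'], ['n3'])
```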

4 Implementation

Simulation Setup

The efficient packet flow path allocation (EPF) technique is simulated with the network simulator NS 2.34. In the simulation, 100 mobile ad hoc nodes are placed in a 1050 m × 900 m region for 41 ms of simulation time. Each mobile node moves randomly through the network at different speeds. All nodes have the same transmission range of 250 m; the settings are listed in Table 1. The constant bit rate (CBR) traffic source provides a constant packet transmission rate in the network to restrict packet overload during communication.

5 Results and Performance Analysis

Figure 2 shows the proposed efficient packet flow path allocation (EPF) technique. EPF is used to achieve a perfect routing path without omission attacks and is compared with the existing HSR and ARHUM methods. EPF identifies omission attack nodes present in the routing path, which block packets during the communication period. The node proclivity tracing algorithm is designed to trace node characteristics and provide normal routing, which improves detection efficiency and reduces energy usage. The following performance metrics are analyzed through simulation using xgraph in NS 2.34.

Fig. 2 Proposed efficient packet flow path allocation packet format (fields: Source ID; Destination ID; Packet flow and packet size analysis; Filter out infrequent packet flow; Efficient packet flow path allocation; Node proclivity tracing algorithm)

Energy Consumption: Fig. 3 shows energy consumption, that is, how much energy is spent on communication. It is calculated from the starting energy level to the ending energy level. The proposed EPF method finds a perfect routing path without omission attacks, so its energy consumption is lower than that of the existing HSR, OEERA, and ARHUM methods.

$$\text{Energy Consumption} = \text{Initial Energy} - \text{Final Energy} \qquad (11)$$

Packet Delivery Ratio: Fig. 4 shows the packet delivery ratio, measured as the number of packets received out of the number of packets sent at a particular speed. Node velocity is not constant, and simulation mobility is fixed at 100 (bps). The proposed EPF method improves the packet delivery ratio compared with the existing HSR, OEERA, and ARHUM methods (Fig. 5).

Fig. 3 Proposed OEERA output

Fig. 4 Graph for nodes versus energy consumption (curves: Existing HSR, Existing ARHUM, Existing OEERA, Proposed EPF; x-axis: Nodes; y-axis: Energy Consumption (J))

Fig. 5 Graph for mobility versus end-to-end delay (curves: Existing HSR, Existing ARHUM, Existing OEERA, Proposed EPF; x-axis: Nodes; y-axis: Packet Delivery Ratio (%))

$$\text{Packet Delivery Ratio} = \frac{\text{Number of packets received}}{\text{Number of packets sent}} \times \text{speed} \qquad (12)$$

Network Lifetime: Fig. 6 shows the network lifetime. It is measured as the processing time nodes take to utilize the network, relative to the overall network capability. The node proclivity tracing algorithm traces nodes to remove omission attacks, giving stable routing. The proposed EPF method increases network lifetime compared with the existing HSR, OEERA, and ARHUM methods.

$$\text{Network Lifetime} = \frac{\text{Time taken to utilize network}}{\text{Overall ability}} \qquad (13)$$

Packet Integrity Rate: Fig. 7 shows the packet integrity rate of a particular communication in the network. It is estimated from the packets that nodes transmit within the coverage limit.

Fig. 6 Graph for nodes versus network lifetime (curves: Existing HSR, Existing ARHUM, Existing OEERA, Proposed EPF; x-axis: Nodes; y-axis: Network Lifetime (%))

The proposed EPF method improves the packet integrity rate compared with the existing HSR, OEERA, and ARHUM methods.

$$\text{Packet integrity rate} = \frac{\text{Number of packets successfully sent}}{\text{Coverage limit}} \times 100 \qquad (14)$$

Detection Efficiency: Fig. 8 shows the detection efficiency for attacks that occur during repeated packet transmission from the source node to the destination node.

$$\text{Detection efficiency} = \frac{\text{Attack detection rate}}{\text{Overall time}} \qquad (15)$$

Fig. 7 Graph for nodes versus packet integrity rate (curves: Existing HSR, Existing ARHUM, Existing OEERA, Proposed EPF; x-axis: Nodes; y-axis: Packet Integrity Rate (%))


Fig. 8 Graph for nodes versus detection efficiency
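The evaluation metrics in Eqs. (11), (13), (14), and (15) reduce to simple arithmetic on simulation counters. The helpers below are a minimal sketch that follows those formulas as printed. The argument names and the example numbers are hypothetical and are not results from the paper.

```python
def energy_consumption(initial_j: float, final_j: float) -> float:
    """Eq. (11): energy consumed = initial energy - final energy (joules)."""
    return initial_j - final_j

def network_lifetime(time_to_utilize: float, overall_ability: float) -> float:
    """Eq. (13): time taken to utilize the network / overall ability."""
    return time_to_utilize / overall_ability

def packet_integrity_rate(successfully_sent: int, coverage_limit: int) -> float:
    """Eq. (14): (packets successfully sent / coverage limit) * 100."""
    return 100 * successfully_sent / coverage_limit

def detection_efficiency(attack_detection_rate: float, overall_time: float) -> float:
    """Eq. (15): attack detection rate / overall time."""
    return attack_detection_rate / overall_time

# Hypothetical counters, for illustration only.
print(energy_consumption(100.0, 62.5))   # 37.5
print(packet_integrity_rate(450, 500))   # 90.0
```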

6 Conclusion

A mobile environment contains many nodes with different routing activities. When a node's activity is good, it provides routing; otherwise, it provides fault-based routing in which packet transmission is carried out irregularly. This behavior is known as an omission attack: such nodes fail to transmit or receive data packets regularly. This reduces detection efficiency and increases energy usage. The proposed efficient packet flow path allocation (EPF) method is therefore used to obtain the packet transmission path from the sender to the target node, where an attacker would otherwise block communication on a specific routing path. The scheme measures packet size and packet flow before packet transmission starts. The node proclivity tracing algorithm is constructed to analyze node behavior and achieve a perfect routing path, which increases detection efficiency and reduces energy consumption. Future work will focus on a cross-layer MIMO routing technique to measure different parameters.

References

1. Saghar K, Tariq M, Kendall D, Bouridane A (2016) RAEED: a formally verified solution to resolve sinkhole attack in wireless sensor network. In: The proceedings of IBCAST
2. Saghar K, Henderson W, Kendall D, Bouridane A (2010) Formal modelling of a robust wireless sensor network routing protocol. In: The proceedings of NASA/ESA conference on adaptive hardware and systems
3. Mohanapriya M, Krishnamurthi T (2014) Modified DSR protocol for detection and removal of selective black hole attack in MANET. Comput Electr Eng 40(2)
4. Deng J, Han R, Mishra S (2006) INSENS: intrusion-tolerant routing for wireless sensor networks. Comput Commun 29(2):216–230
5. Chacko JM, Senthilkumar KB (2016) SINR based hybrid multipath routing protocol for MANET. In: The proceedings of emerging trends in engineering, technology and science, pp 1–6
6. Priya S, Gopinathan B (2014) Reliable data delivery in MANETs using VDVH framework. In: The proceedings of information communication and embedded systems, pp 1–6
7. Akbas A, Yildiz HU, Tavli B, Uludag S (2016) Joint optimization of transmission power level and packet size for WSN lifetime maximization. IEEE Sens J 16(12):5084–5094
8. Mallapur SV, Patil SR, Agarkhed JV (2015) Load balancing technique for congestion control multipath routing in mobile ad hoc networks. In: The proceedings of TENCON 2015–2015 IEEE region 10 conference, pp 1–6
9. Chandra VN, Kumar K (2015) QoS improvement in AOMDV through backup and stable path routing. In: The proceedings of communication systems and network technologies (CSNT), vol 2, pp 283–287
10. Liu J, Xu Y, Jiang X (2014) End-to-end delay in two hop relay MANETs with limited buffer. In: The proceedings of computing and networking, pp 151–156
11. Bijon KZ, Haque MM, Hasan R (2014) A trust based information sharing model (TRUISM) in MANET in the presence of uncertainty. In: The proceedings of privacy, security and trust, pp 347–354
12. Kiran M, Reddy GRM (2011) Throughput enhancement of the prioritized flow in self aware MANET based on neighborhood node distances. In: The proceedings of computer applications and industrial electronics, pp 497–502
13. Song W, Meng L (2012) High-stability routing protocol based on min-cost max-flow algorithm for MANET
14. Amin S, Saghar K, Abbasi MBT, Elahi A (2017) ARHUM-ARRIVE protocol with handshake utilization and management. In: The proceedings of applied sciences and technology, pp 401–407

Energy-Efficient Multilevel Routing Protocol for IoT-Assisted WSN

Himani K. Bhaskar and A. K. Daniel

Abstract Sensor networks play an important role in many aspects of life, such as habitat monitoring, fire alarms, agriculture, military applications, and building control monitoring systems. Sensor nodes (SNs) have limited battery life. The energy-efficient multilevel routing protocol (EE-MLRP) is proposed, in which the network is partitioned into levels and sublevels. The paper proposes a rank-based cluster head (CH) selection process. It uses the maximum residual energy of SNs, the minimum base station (BS) distance, and the maximum number of neighbor nodes as parameters. Communication in the network takes place from cluster head to cluster head. The CH aggregates the data and transmits it to the BS. Simulation results show that the proposed protocol improves system performance and enhances overall network lifetime.

Keywords Sensor node · IoT · Sensor network · Residual energy · Clustering

1 Introduction

In [1], the authors discuss how a sensor network continuously monitors the environment. Wireless sensor network (WSN) applications include weather forecasting, environmental monitoring, smart cities, smart complexes, complex manufacturing plant control, and military surveillance. In [2], the authors discuss how, over the last few years, SNs with important characteristics such as small size, low cost, and smarter operation have made this possible. According to [3], over 5 billion devices are linked, with more to come. The survey estimates that by 2022 some 30 billion entities will be connected as IoT devices. This opens numerous doors for IoT research in various fields. Some SNs in the network are designed to be IoT-device friendly. In [4], the authors state that an SN collaborates with other nodes to perform a sensing task and transmits data to the BS. These sensors have wireless interfaces that enable them to connect and form a wireless network.

H. K. Bhaskar · A. K. Daniel, MMM University of Technology, Gorakhpur, Uttar Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_53


In [5], the authors state that WSN architecture is influenced by the application and by important factors such as design goals, cost, hardware, and limited battery capacity. In [6], the WSN is used to track and monitor objects within the network continuously. The sensing range of the sensors may be fixed or variable. The aim of a sensor network is to cover the maximum area using the minimum number of SNs. In [7], the authors discuss that SNs are deployed in the network either deterministically or randomly. Because SN battery power is limited, a node can perform tasks only for a limited time. Energy optimization is therefore essential, because it determines how long the system stays alive and its stability period. WSNs have two types of sensor nodes: homogeneous and heterogeneous. Homogeneous sensors have the same energy level, whereas heterogeneous sensors have different energy levels throughout the network. The sensor nodes cover a large network area to meet the QoS requirements of the sensor network. In [8], numerous techniques used in WSNs to solve problems in IoT-based networks are discussed. The important concern is the resource utilization of WSNs for IoT-based applications. In [9], the authors discuss clustering as an important technique for improving network lifetime. SNs are grouped into clusters. Each cluster has a single node that acts as the CH, while the remaining SNs act as cluster members (CMs). CH selection is important because a CH requires more energy than CM nodes. The CH aggregates data and transmits it to the BS. An efficient algorithm is therefore needed for optimum energy utilization inside the network. This paper proposes the energy-efficient multilevel routing protocol (EE-MLRP), in which the network area is divided into multiple regions and sub-regions. The paper focuses on developing an effective CH selection process using maximum residual energy (RE), minimum distance from the BS, and maximum number of neighboring SNs.
The CH aggregates data and transmits it to the BS. The architecture of the sensor network is shown in Fig. 1. The rest of the paper is structured as follows. Section 2 discusses related work. Section 3 presents the energy-efficient multilevel routing protocol (EE-MLRP). Section 4 gives the simulation results, and Sect. 5 concludes.

Fig. 1 Architecture of sensor network


2 Related Work

In [10], the authors discuss how the Internet of Things (IoT) has a massive effect on daily life by making people's lives smart and effortless. The IoT is also increasing the demand from Internet users around the globe. Sensor nodes can be deployed anywhere and collect large amounts of data, but node battery power, computing capability, and range are all constrained. An IoT-enabled wireless sensing and monitoring platform monitors smoke, fire alarms, carbon monoxide, and security in smart buildings. The recorded data is useful for further monitoring and controlling the smart building environment. New technology emerges from needs in human life, and the twenty-first century has seen many advances in the digital world. Modern technology focuses widely on monitoring and controlling modern devices. This is easy nowadays because wireless technology reduces wiring complexity and facilitates the installation of sensors, actuators, and controllers. Wireless technology has also minimized the cost of installing devices. A number of wireless communication technologies have been developed, with which a network can be constructed according to the application and the technologies' strengths. Data in a sensor network is routed using different routing protocols. In [11], the authors identify three ways communication takes place in a network: direct communication, multi-hop communication, and clustering. Clustering is an important technique performed in either a static or a dynamic way. In [12], the authors discuss how cluster formation is fixed in the static approach and changes dynamically based on specific rules in the dynamic approach, where clusters are re-formed in each round.
In [13], the author divided the network into different regions for SN deployment. The author used a fuzzy logic technique for CH selection with parameters such as RE, minimum distance from the BS, and load. This reduced energy consumption and increased the lifetime and throughput of the network. In [14], the author discussed region-based clustering for node deployment and used fuzzy information for CH selection; some nodes also used a hybrid routing protocol to transmit data directly to the BS. The time for CH selection and re-election was minimized, energy consumption was reduced, and network lifetime was enhanced. In [15], the author discussed an enhancement of EECP for target coverage. It uses a coding mechanism for the node scheduling protocol and sets a cover level at the cluster level for target coverage. The number of transmissions and the energy consumed were improved, along with throughput and average energy. In [16], the authors introduced the LEACH protocol for WSNs. LEACH is the first clustering protocol and uses single-hop communication. The protocol works for homogeneous environments, not heterogeneous ones. The LEACH protocol enhanced the lifetime of IoT-based systems. Numerous routing techniques for WSNs have been introduced to increase the efficiency of various IoT-enabled applications. The IoT plays a vital role in connecting smart devices with cloud infrastructure. Most energy is consumed during the transmission of data in the network. Developing an energy-efficient routing protocol has always been an important and challenging task for researchers. Wireless sensors and IoT devices play an essential role and therefore require energy-consumption schemes for optimal utilization of resources inside the network. Cluster formation and efficient CH selection strategies make optimal use of resources inside an IoT-enabled network. In [17], the authors introduced an enhanced version of LEACH, LEACH centralized with chain. In this protocol, the super cluster head (SCH) is selected using the maximum CH value, which enhances network lifespan. In [18], the authors introduced the A-LEACH protocol, in which auxiliary SNs collect the data of each cluster and send it to the respective CH. The scheme is energy efficient because nodes nearer to the BS are selected as auxiliary nodes in the cluster. The auxiliary SNs collect the data and send it to CH nodes, which route it to the BS. In [19], the authors introduced the LEACH-C protocol, which behaves like LEACH but differs in the setup phase. Each sensor node is equipped with a global positioning system (GPS) receiver for location tracking. The BS receives the location information and RE level of each node. The protocol uses resources efficiently and enhances network lifetime. In [20], the authors discussed the MH-LEACH protocol, a multi-hop scheme that collects data and transmits it to the BS. SNs send collected data to their zonal CH, and communication in the network takes place CH-to-CH. The CHs aggregate data and transmit it to the BS over an optimal path. Multi-hop transmission lowers the hop count for data transmission and minimizes network energy consumption. The limitation of the protocol is increased delay due to unnecessary hops in the network.
In [21], the authors describe the AZR-LEACH protocol, which uses static clustering and improves the CH selection technique for sensor networks. The network area consists of advanced clusters, rectangular clusters, and zones that handle data collection in the network. The grid area is divided into multiple regions, with the BS positioned near the center and the advanced nodes nearer to the BS. Advanced nodes collect data from their CMs and from other CHs and transmit it to the BS. At least one advanced cluster is present in each group of clusters. Compared with LEACH, AZR-LEACH uses a different CH selection process: the SN with maximum RE among the nodes in a rectangular cluster is chosen as CH. The protocol partitions the network area into multiple sub-areas for different tasks, which minimizes network traffic and improves system performance. In [22], the authors describe the SEP protocol, which uses normal nodes (NNs) and advanced nodes (ANs). Compared with NNs, ANs have more opportunities to become CH, which causes various issues. In [23], the authors introduced E-SEP to overcome these problems. E-SEP adds an intermediate node layer to SEP and overcomes its limitation. In [24], the authors introduced MR-SEP, an improved version of SEP. MR-SEP splits the network area into different cluster levels, each with its own CH. Member SNs join CHs using minimum BS distance as the criterion. The CH collects data from member SNs and adjacent-layer CHs for data transmission. Upper-layer CHs act as SCHs for lower-layer CHs. The layering approach distributes CHs evenly across the network area, and the multi-hop approach enhances network lifespan. In [25], the authors discussed the relay node concept for data transmission to improve system performance. In [26], the authors introduced the HEED protocol, which selects CHs on the basis of node RE and a proximity principle. This minimizes network overhead and enhances network lifetime. In [27], the authors introduced the EEDCA technique, which selects CHs based on node position and SN RE. Hierarchical transmission is one of the best ways to route data to its destination in a sensor network. In a hierarchical architecture, the network area is divided into layers that carry out different tasks. In clustering, SNs are divided into distinct groups, and data is transmitted to the head node and then to the BS. In [28], the author introduced the SGO technique, an optimization technique that reduces transmission distance and dynamically selects the number of CHs. In [29], the author discussed a fuzzy-based range-free localization algorithm. It estimates sensor node locations from the RSSI and LQI information of anchor nodes, which minimized energy consumption and extended network lifetime. In [30], the author introduced the RBCHS protocol. The network is divided into regions, and advanced and normal nodes are deployed according to their energy level. Data is transmitted to the BS using a hybrid routing scheme. The entire network is covered, and network lifetime is enhanced. The EE-MLRP protocol is introduced for WSN.
It uses a two-tier architecture to transmit network data. Communication in the network takes place CH-to-CH, and data is transmitted through the BS to the cloud, where users can access it anytime and anywhere.

3 Proposed Energy-Efficient Multilevel Routing Protocol

The EE-MLRP is proposed, in which the network is partitioned into various levels. This paper aims to develop an efficient rank-based CH selection. The selection uses the ratio of maximum RE to minimum distance to the BS, so that no node-failure problem occurs, together with the maximum number of neighboring nodes as a parameter. The process uses the optimal number of CHs in the network. The network is partitioned into regions, which are further divided into sub-regions. A multilevel architecture is used for data transmission: data passes CH-to-CH and finally reaches the BS, where users can access it anytime and anywhere. The energy-efficient multilevel routing protocol is shown in Fig. 2. The working process of the EE-MLRP protocol is as follows:


Fig. 2 Energy-efficient multilevel routing protocol [4]

3.1 Setup Phase

In the setup phase, the network is partitioned into regions and sub-regions based on longitudinal distance from the BS. The BS is responsible for partitioning the network into two tiers (levels). Tier 1 is located farther from the BS, while Tier 2 is located nearer to it.
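A minimal sketch of the setup phase follows. It assumes the BS splits nodes into the two tiers with a single distance cut-off. The median node-to-BS distance is used as the cut-off purely for illustration; the paper does not say how the boundary is chosen. The example coordinates place the BS outside a 100 m × 100 m field, as in Sect. 4, and are otherwise hypothetical.

```python
import math
from statistics import median
from typing import Dict, Tuple

Point = Tuple[float, float]

def assign_tiers(nodes: Dict[int, Point], bs: Point) -> Dict[int, int]:
    """Return {node_id: tier}: Tier 2 is nearer the BS, Tier 1 is farther away."""
    dist = {i: math.dist(p, bs) for i, p in nodes.items()}
    cutoff = median(dist.values())  # assumed split point between the two tiers
    return {i: (2 if d <= cutoff else 1) for i, d in dist.items()}

nodes = {0: (10.0, 10.0), 1: (90.0, 90.0), 2: (50.0, 95.0), 3: (20.0, 30.0)}
print(assign_tiers(nodes, bs=(50.0, 150.0)))  # {0: 1, 1: 2, 2: 2, 3: 1}
```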

3.2 Formation of Cluster

SNs are distributed randomly in the network, and clusters are then formed. Each SN is allocated a unique identification number based on its location and can join only its neighbor nodes' cluster.

3.3 Selection of Cluster Head

This section discusses the efficient CH selection scheme of the EE-MLRP protocol. Once a CH is selected, it broadcasts a message to all SNs in the network, inviting them to join as CMs. Nodes that receive invitation messages from CHs join the nearest CH based on distance information. Effective CH selection supports data packet routing in the network. Tier 1 CHs transfer their data to Tier 2 CHs. The Tier 2 CHs are nearer to the BS; they collect and aggregate all the data and finally transmit it to the BS. CH selection uses the maximum rank and the maximum number of neighbor nodes as parameters, where rank is defined as

$$\mathrm{Rank} = \alpha \times \frac{RE}{IE} \times \frac{D}{\mathrm{avg}\,D} \qquad (1)$$

Here, α is the probability of the initial percentage of CHs, and IE is the initial energy of the SN before any data transmission. RE is the residual energy of the SN after the respective round, D is the distance of the SN from the BS, and avg D is the average distance of the SNs from the BS (sink).

Algorithm for EE-MLRP
Initialization: R = residual energy, INE = initial energy of node, DBS = distance to base station, DAVG = average distance of node, RANK = rank of sensor node, NB = number of neighboring nodes of an SN, CH = cluster head, CM = cluster member, SN = sensor node, α = probability of initial percentage of CHs, BS = base station
1: for every region r = 1:n
2:   for every node i in r
3:     SNR(i) = calculate_R();
4:     SNINE(i) = calculate_INE();
5:     SNDBS(i) = calculate_DBS();
6:     SNDAVG(i) = calculate_DAVG();
7:     SNRANK(i) = α * (SNR(i)/SNINE(i)) * (SNDBS(i)/SNDAVG(i));
8:     SNNB(i) = calculate_NB();
9:     if (SNRANK(i) is max && SNNB(i) is max)
10:      SN(i) ← CH
11:    else
12:      SN(i) ← CM
13:    end if
14:  end for
15: end for
16: if (SN(i) is CH)
17:   CH aggregates data
18:   CH sends data to BS
19: else
20:   CM sends data to CH
21: end if
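The rank computation and the CH selection loop can be sketched in Python. The rank follows step 7 of the algorithm as printed, so a larger D/avg D raises a node's rank. Step 9 requires a CH to hold both the maximum rank and the maximum neighbor count, but two different nodes may hold these. This sketch assumes the conflict is resolved by picking, per region, the node that maximizes (rank, neighbor count) lexicographically. It also assumes the following illustrative choices, none of which is stated in the paper:

- neighbors are nodes within a hypothetical 30 m radius;
- α is set to 0.1;
- each remaining node joins its region's single CH.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class SensorNode:
    node_id: int
    region: int
    pos: Tuple[float, float]
    initial_energy: float   # IE (INE)
    residual_energy: float  # RE (R)

def rank(n: SensorNode, bs: Tuple[float, float], avg_d: float, alpha: float) -> float:
    """Eq. (1) / step 7: Rank = alpha * (RE / IE) * (D / avg D)."""
    return alpha * (n.residual_energy / n.initial_energy) * (math.dist(n.pos, bs) / avg_d)

def select_cluster_heads(nodes: List[SensorNode], bs: Tuple[float, float],
                         alpha: float = 0.1, radius: float = 30.0) -> Dict[int, int]:
    """Return {node_id: CH id}; a CH maps to itself (one CH per region, an assumption)."""
    avg_d = sum(math.dist(n.pos, bs) for n in nodes) / len(nodes)
    heads: Dict[int, SensorNode] = {}
    for r in sorted({n.region for n in nodes}):               # for every region r
        members = [n for n in nodes if n.region == r]
        def key(n: SensorNode) -> Tuple[float, int]:
            nb = sum(1 for m in members if m is not n and math.dist(m.pos, n.pos) <= radius)
            return (rank(n, bs, avg_d, alpha), nb)            # max rank, then max NB
        heads[r] = max(members, key=key)
    return {n.node_id: heads[n.region].node_id for n in nodes}

nodes = [
    SensorNode(0, 1, (10.0, 10.0), 0.5, 0.40),
    SensorNode(1, 1, (20.0, 15.0), 0.5, 0.45),
    SensorNode(2, 2, (70.0, 80.0), 0.5, 0.30),
    SensorNode(3, 2, (80.0, 85.0), 0.5, 0.35),
]
print(select_cluster_heads(nodes, bs=(50.0, 150.0)))  # {0: 1, 1: 1, 2: 3, 3: 3}
```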


4 Simulation Results and Validation

The simulations are performed in MATLAB, which offers a good framework for algorithm development, data visualization and processing, numeric computation, etc. Heterogeneous nodes are used for the simulation. 100 SNs are scattered randomly in a (100, 100) m² area, with the BS positioned outside the network boundary. ANs and NNs are distributed randomly in the network area; 10% of the SNs are ANs, and the rest are NNs. The proposed EE-MLRP protocol is compared with the SEP protocol over 3500 rounds of successful packet transmission in the same network setting. Simulation results are reported for network lifetime, dead nodes per round, remaining network energy, and packets sent to the BS, as shown in the figures below. The simulation parameters have been taken from [4] (Table 1).

Assumptions for the EE-MLRP protocol:
1. Heterogeneous nodes are used to design the proposed protocol.
2. Noise and collisions are ignored in the system.
3. The cluster head(s) aggregate all data and transmit it to the BS.
4. The distribution of SNs is random and deterministic.
5. The SN battery is not rechargeable, and the BS is continuously powered.

In the proposed EE-MLRP protocol, communication takes place from CH to CH. An efficient rank-based CH selection minimizes energy consumption within the network and thereby extends the network lifetime. The network lifetime of EE-MLRP is 3099 rounds, compared with 2087 rounds for SEP, so EE-MLRP performs better than SEP, as shown in Fig. 3. Figure 4 shows that in the proposed protocol the first and last nodes die at 1501 and 3099 rounds, respectively, whereas in SEP they die at 1000 and 2087 rounds. The proposed protocol therefore has a longer stability period than SEP.

Table 1 Simulation parameters [10]

| Parameter | Values |
|---|---|
| SN | 100 |
| Network area | 100 × 100 m² |
| Free space model (E_fs) | 10 pJ/bit/m² |
| Multipath model (E_amp) | 0.0013 pJ/bit/m⁴ |
| Initial level battery (E_0) | 0.5 J |
| Initial energy of advanced nodes | E_0(1 + E_α) |
| Electronic circuitry (E_RX) | 50 nJ/bit |
| Data aggregation (E_DA) | 10 nJ/bit |
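The energy parameters in Table 1 match those of the first-order radio model commonly used by SEP-style protocols. The text does not spell that model out, so the sketch below assumes it:

- Transmission cost uses the free-space (d²) term below the crossover distance d0 = sqrt(E_fs/E_amp).
- Above d0 it uses the multipath (d⁴) term.
- The 50 nJ/bit "electronic circuitry" value is used for both transmitting and receiving electronics.

```python
import math

# Radio parameters from Table 1, converted to joules.
E_ELEC = 50e-9       # electronics energy per bit (listed as E_RX in Table 1)
E_FS = 10e-12        # free-space amplifier, J/bit/m^2
E_AMP = 0.0013e-12   # multipath amplifier, J/bit/m^4
E_DA = 10e-9         # data aggregation, J/bit (not used by the two functions below)

# Crossover distance at which the d^2 and d^4 amplifier terms are equal.
D0 = math.sqrt(E_FS / E_AMP)


def tx_energy(bits, d):
    """Energy (J) to transmit `bits` over distance d (m), first-order radio model."""
    if d < D0:
        return bits * E_ELEC + bits * E_FS * d ** 2
    return bits * E_ELEC + bits * E_AMP * d ** 4


def rx_energy(bits):
    """Energy (J) to receive `bits`."""
    return bits * E_ELEC


print(round(D0, 1))           # crossover distance in metres
print(tx_energy(4000, 50))    # assumed 4000-bit packet sent over 50 m
```

With these parameters the crossover distance is about 87.7 m. For an assumed 4000-bit packet sent over 50 m, the model gives 2.0 × 10⁻⁴ J of electronics energy plus 1.0 × 10⁻⁴ J of amplifier energy, about 3.0 × 10⁻⁴ J in total. This is why shorter CH-to-CH hops, which stay in the d² regime, save energy.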

Energy-Efficient Multilevel Routing Protocol for IoT-Assisted WSN


Fig. 3 Lifetime of network

Fig. 4 Dead nodes per round

As shown in Fig. 5, the proposed protocol transmits around 2.5 × 10⁴ packets to the BS, whereas SEP transmits 1.6 × 10⁴ packets. In EE-MLRP, the efficient CH selection process reduces energy consumption within the network. Data is also forwarded through a multilevel architecture before finally reaching the BS. Together, these reduce the overall energy consumption and increase the residual energy of each node. Compared with SEP, EE-MLRP retains more residual energy per node, as shown in Fig. 6.
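The relative improvements behind these comparisons follow directly from the reported numbers; the short calculation below makes them explicit. The packet counts are the approximate values read from Fig. 5, and `relative_gain` is an illustrative helper.

```python
def relative_gain(proposed, baseline):
    """Percentage improvement of `proposed` over `baseline`."""
    return 100.0 * (proposed - baseline) / baseline


results = {                                   # (EE-MLRP, SEP), from the text
    "network lifetime (rounds)": (3099, 2087),
    "first node dead (rounds)": (1501, 1000),
    "packets sent to BS (approx.)": (2.5e4, 1.6e4),
}
for name, (ee_mlrp, sep) in results.items():
    print(f"{name}: +{relative_gain(ee_mlrp, sep):.2f}%")
```

This gives about a 48.5% longer network lifetime, a 50.1% longer stability period, and roughly 56% more packets delivered to the BS.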


Fig. 5 Packets transmitted to base station

Fig. 6 Remaining amount of energy

5 Conclusion The energy-efficient multilevel routing protocol (EE-MLRP) is proposed to select CHs efficiently. Selection is based on a rank that uses maximum node residual energy, minimum distance of the node from the BS, and maximum number of neighboring nodes. The CHs are chosen optimally for data transmission within the network. The simulation results show that the proposed protocol improves the network stability period compared with SEP. In future work, fuzzy logic techniques will be integrated to select the optimal cluster head in the network.


References
1. Sadek RA (2018) Hybrid energy aware clustered protocol for IoT heterogeneous network. Future Comput Inf J 3(2):166–177. https://doi.org/10.1016/j.fcij.2018.02.003
2. Kallam S, Madda RB, Chen CY, Patan R, Cheelu D (2018) Low energy aware communication process in IoT using the green computing approach. IET Netw 7(4):1–7. https://doi.org/10.1049/iet-net.2017.0105
3. Almadhoun R, Kadadha M, Alhemeiri M, Alshehhi M, Salah K (2019) A user authentication scheme of IoT devices using blockchain-enabled fog nodes. In: Proceeding of IEEE/ACS international conference on computer systems and applications, AICCSA, vol 2018. https://doi.org/10.1109/AICCSA.2018.8612856
4. Narayan V, Daniel AK, Rai AK (2020) Energy efficient two tier cluster based protocol for wireless sensor network. In: 2020 international conference on electrical and electronics engineering (ICE3), pp 574–579
5. Narayan V, Daniel AK (2021) A novel approach for cluster head selection using trust function in WSN. Scalable Comput Pract Exp 22(1):1–13
6. Narayan V, Daniel AK (2019) Novel protocol for detection and optimization of overlapping coverage in wireless sensor networks. Int J Eng Adv Technol 8
7. Oyelade J et al (2019) Data clustering: algorithms and its applications. In: Proceedings of 2019 19th international conference on computational science and its applications, ICCSA 2019, no ii, pp 71–81. https://doi.org/10.1109/ICCSA.2019.000-1
8. Narayan V, Daniel AK (2022) CHHP: coverage optimization and hole healing protocol using sleep and wake-up concept for wireless sensor network. Int J Syst Assur Eng Manag 1–11
9. Al-Fuqaha A, Guizani M, Mohammadi M, Aledhari M, Ayyash M (2015) Internet of things: a survey on enabling technologies, protocols, and applications. IEEE Commun Surv Tutorials 17(4):2347–2376. https://doi.org/10.1109/COMST.2015.2444095
10. Narayan V, Daniel AK (2020) Multi-tier cluster based smart farming using wireless sensor network. In: 2020 5th international conference on computing, communication and security (ICCCS), pp 1–5
11. Senouci O, Aliouat Z, Harous S (2019) MCA-V2I: a multi-hop clustering approach over vehicle-to-internet communication for improving VANETs performances. Future Gener Comput Syst 96:309–323. https://doi.org/10.1016/j.future.2019.02.024
12. Cao Y, Zhang L (2018) Energy optimization protocol of heterogeneous WSN based on node energy. In: 2018 IEEE 3rd international conference on cloud computing and big data analysis (ICCCBDA), pp 495–499
13. Maurya S, Daniel AK (2014) An energy efficient routing protocol under distance, energy and load parameter for heterogeneous wireless sensor networks. In: 2014 international conference on information technology. IEEE, pp 161–166
14. Narayan V, Daniel AK (2021) RBCHS: region-based cluster head selection protocol in wireless sensor network. In: Proceedings of integrated intelligence enable networks and computing. Springer, Singapore, pp 863–869
15. Chaturvedi P, Daniel AK (2019) A novel approach for target coverage in wireless sensor networks based on network coding. In: Soft computing: theories and applications. Springer, Singapore, pp 303–310
16. Das Adhikary DR, Mallick DK (2017) An energy aware unequal clustering algorithm using fuzzy logic for wireless sensor networks. J ICT Res Appl 11(1):56–77. https://doi.org/10.5614/itbj.ict.res.appl.2017.11.1.4
17. Pachlor R, Shrimankar D (2017) VCH-ECCR: a centralized routing protocol for wireless sensor networks. J Sens 2017(iii). https://doi.org/10.1155/2017/8946576
18. Vinodh Kumar S, Pal A (2013) Assisted-Leach (A-Leach) energy efficient routing protocol for wireless sensor networks. Int J Comput Commun Eng 2(4):420–424. https://doi.org/10.7763/ijcce.2013.v2.218
19. Sivakumar P, Radhika M (2018) Performance analysis of LEACH-GA over LEACH and LEACH-C in WSN. Procedia Comput Sci 125:248–256. https://doi.org/10.1016/j.procs.2017.12.034
20. Yousaf A, Ahmad F, Hamid S, Khan F (2019) Performance comparison of various LEACH protocols in wireless sensor networks. In: Proceedings of 2019 IEEE 15th international colloquium on signal processing and its applications, CSPA 2019, pp 108–113. https://doi.org/10.1109/CSPA.2019.8695973
21. Information Technology (2020) Designing equal size clusters to balance energy consumption in wireless
22. Yazid Y, Ez-zazi I, Salhaoui M, Arioua M, Ahmed EO, González A (2019) Extensive analysis of clustered routing protocols for heteregeneous sensor networks. https://doi.org/10.4108/eai.24-4-2019.2284208
23. Islam MM, Matin MA, Mondol TK (2012) Extended stable election protocol (SEP) for three-level hierarchical clustered heterogeneous WSN. In: IET conference publications, vol 2012, no 601 CP. https://doi.org/10.1049/cp.2012.0595
24. Mittal N (2020) An energy efficient stable clustering approach using fuzzy type-2 bat flower pollinator for wireless sensor networks. Wirel Pers Commun 112(2):1137–1163. https://doi.org/10.1007/s11277-020-07094-8
25. Han W (2016) Performance analysis for NOMA energy harvesting relaying networks with transmit antenna selection and maximal-ratio combining over Nakagami-m fading. IET Commun 10(18):2687–2693. https://doi.org/10.1049/iet-com.2016.0630
26. Tang C (2020) A clustering algorithm based on non uniform partition for WSNs. Open Phys 18(1):1154–1160. https://doi.org/10.1515/phys-2020-0192
27. Meddah SM, Haddad R, Ezzedine T (2017) An energy efficient and density control clustering algorithm for wireless sensor network. In: 2017 13th international wireless communications & mobile computing conference, IWCMC 2017, pp 357–364. https://doi.org/10.1109/IWCMC.2017.7986313
28. Kalla N, Parwekar P (2018) Social group optimization (SGO) for clustering in wireless sensor networks. In: Intelligent engineering informatics. Springer, Singapore, pp 119–128
29. Parwekar P, Reddy R (2013) An efficient fuzzy localization approach in wireless sensor networks. In: 2013 IEEE international conference on fuzzy systems (FUZZ-IEEE). IEEE, pp 1–6
30. Maurya S, Daniel AK (2014) Hybrid routing approach for heterogeneous wireless sensor networks using fuzzy logic technique. In: 2014 fourth international conference on advanced computing & communication technologies. IEEE, pp 202–207

Ship Detection from Satellite Imagery Using RetinaNet with Instance Segmentation

Arya Dhorajiya, Anusree Mondal Rakhi, and P. Saranya

Abstract Ship detection and segmentation based on optical remote sensing images are central to both military and civilian applications, so the sea-surface remote sensing field is receiving considerable attention. Deep learning-based ship detection can also extract vessel position and category information, which is crucial for maritime surveillance. However, relatively little research has been done in this area.

Most known approaches perform object detection but not semantic segmentation. Earlier studies also have limitations:

- they cannot detect small ships or ships of diverse sizes;
- they have a high rate of false positives;
- the complexity of SAR images significantly reduces the benefits of convolutional neural networks (CNNs);
- some models cannot distinguish ships from ship-like objects, and these false alarms lower the overall accuracy;
- existing models struggle to identify and segment ships against complicated backgrounds and under lighting variations;
- some perform worse than other state-of-the-art frameworks, and some have a high computational cost.

This paper proposes a ship detection algorithm based on RetinaNet with an added instance segmentation part. RetinaNet is a strong one-stage object detection model that handles dense and small-scale objects precisely. Its Feature Pyramid Network (FPN) and focal loss are two improvements over conventional one-stage detectors. An instance segmentation algorithm is used in place of the plain object detector to improve RetinaNet's performance by removing the background more effectively. With these adjustments, the proposed model works more precisely than existing models. On the Airbus Ship Detection dataset from Kaggle, it achieves a promising average precision (AP) of 65.7 and a small-object AP (APS) of 51.8.

A. Dhorajiya · A. M. Rakhi · P. Saranya (B) Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_54


A. Dhorajiya et al.

Keywords Optical remote sensing images · Semantic segmentation · Convolutional neural networks · Synthetic aperture radar · RetinaNet · Feature pyramid network · Focal loss · Instance segmentation · Average precision

1 Introduction In remote sensing image analysis, ship detection and segmentation is a central research direction. Because of their large aspect ratios, ships are more challenging to detect in remote sensing imagery than other targets such as automobiles, buildings, and aircraft [1]. Ship identification is crucial in marine transportation, fisheries management, and maritime disaster rescue, so more research into accurate ship detection is needed [2]. Ship identification using synthetic aperture radar (SAR) has long been a hot topic. The strong performance of deep neural networks in computer vision has enabled their application to SAR image analysis, and marine surveillance also requires the identification of ships in SAR images [3].

SAR ship detection currently faces several difficulties. First, multi-scale ship recognition remains a challenging research problem because SAR ship targets vary widely in size. Second, tiny targets are difficult to detect because SAR ship targets are small in low-resolution images [4]. For these reasons, this paper uses optical images instead of SAR images.

Many deep learning methods are now employed in ship detection and segmentation, but they still have significant issues:

- Multi-scale ship detection makes detection and segmentation more complex.
- Small ships with few pixels and poor contrast are difficult to detect and segment, and typical detection algorithms fail to do so [5].
- Ship detection and segmentation are associated with low accuracy, missed detections, and false detections [6].
- Complicated backgrounds and ships moored side by side make ship detection in high-resolution optical satellite imagery a difficult task [7].
- Finding and identifying small objects and distinguishing between different ship types remain open problems, and these flaws significantly reduce overall accuracy.
- Besides their low accuracy, present models also run slowly, which limits the practical benefits of ship identification and segmentation [8].

To mitigate these drawbacks, a ship detection model that combines RetinaNet with an instance segmentation part is proposed. As the name suggests, its main purpose is to detect and segment ships in the target area with higher accuracy. The model can also distinguish ships from ship-like objects and detect ships under strong lighting variations, even against a complex background. It also works correctly with small ships and ships of different sizes, and it produces very few false positives and false alarms. Moreover, it works with very complex and confusing backgrounds, as shown in Fig. 1, so it can efficiently detect and segment ships at the pixel level.

Fig. 1 Detection of ships from a complex background

The rest of the paper is organized as follows:

- Section 2 reviews related work on detecting and segmenting ships.
- Section 3 describes the proposed methodology: RetinaNet and instance segmentation for identifying and segmenting ships, and the focal loss.
- Section 4 presents the implementation of the proposed model, the dataset used to evaluate it, and a performance comparison with previous models.
- Section 5 describes the findings and future research directions.

2 Related Work Both military and civilian applications rely heavily on ship detection [9]. Deep convolutional networks have been used successfully, but problematic images hamper their performance. Typical problems are complex backgrounds, lighting variations, and low visibility of the surrounding environment.

Y. You proposed the Scene Mask R-CNN model, an end-to-end method that reduces false alarms while delivering satisfactory detection results. Because of this reduction, its overall accuracy is significantly higher than that of Faster R-CNN. The model consists of four subnetworks, each with its own function:

- the Feature Extraction Network (FEN);
- the Scene Mask Extraction Network (SMEN);
- the Region Proposal Network (RPN);
- the Classification and Regression Network (CRN).

It was trained on the HRSC2016 dataset and an additional 215 high-resolution optical remote sensing images from DOTA. Despite the trained network's high identification accuracy, it still tends to flag some onshore ship-like structures, such as docks, rooftops, and roads, as objects of interest in place of actual ships.

Nie et al. [10] used Mask R-CNN, a fusion of Faster R-CNN and an FCN, to construct a ship detection and segmentation model trained on the Airbus Ship Detection Challenge dataset. With this technique, the detection and segmentation mAPs improved from 70.6% and 62.0% to 76.1% and 65.8%, respectively. However, low-level features are lost within its framework, so this model cannot correctly detect small ships.

Zhu et al. [11] proposed an arbitrary-oriented ship detection approach. It addresses arbitrary orientations, large aspect ratios, and dense arrangements, which general object detection systems struggle with. The technique combines a rotated RetinaNet, a corrected network, a feature alignment module, and an enhanced loss function. Its validity was tested on two datasets, HRSC2016 and DOTA. Although this method outperforms other state-of-the-art methods in its experiments, its AP is not the highest.

In summary, the existing approaches share several limitations:

- they cannot detect small ships or ships of diverse sizes, and they produce many false positives;
- the complexity and drawbacks of SAR images significantly reduce the benefits of CNNs;
- some models cannot tell ships apart from ship-like objects, and the resulting false alarms lower the overall accuracy;
- models struggle to identify and segment ships against complicated backgrounds and under lighting variations;
- some perform worse than other state-of-the-art frameworks, and some have a high computational cost.

To overcome these challenges, a modified model is proposed that detects and segments ships using RetinaNet and an instance segmentation part.

3 Methodology Object detection is an essential component of computer vision. Object detection algorithms are used to solve a wide range of commercial problems, including autonomous driving, video surveillance, and medical applications. The Facebook AI Research (FAIR) team developed the RetinaNet model to address the problem of recognizing dense and small objects. It has since become a popular object detection model for aerial and satellite imagery. RetinaNet outperforms previous single-stage object detectors thanks to two significant improvements:
• Feature Pyramid Networks (FPN)
• Focal loss


Because of the class imbalance issue, one-stage detectors have generally performed worse than two-stage detectors. This work therefore adopts the RetinaNet model, whose focal loss compensates for the class imbalance of single-shot object detectors such as You Only Look Once (YOLO) and the single-shot detector (SSD). That imbalance arises from the extreme ratio between foreground and background classes.

The basic flow of the proposed model is as follows:

1. The raw input image is fed to the model.
2. The image passes through a pre-processing stage and then the feature extraction stage. The model's backbone consists of ResNet50 and an FPN.
3. The extracted features are passed to the classification, bounding box, and mask generation parts.
4. The final image is generated by combining the outputs of all the subnets (Fig. 2).

Fig. 2 Flow of the proposed model


3.1 Architecture of RetinaNet Model with Instance Segmentation Part The original RetinaNet architecture can be broken down into the three components listed below (Fig. 3):
• Backbone network (feature extraction network), Sect. 3.1.1
• Subnetwork for object classification, Sect. 3.1.2
• Subnetwork for object regression, Sect. 3.1.3

Fig. 3 Architecture of RetinaNet with instance segmentation

3.1.1 The Backbone Network

Bottom-up pathway: The bottom-up pathway (ResNet) is used for feature extraction. It computes feature maps at several scales for an input image of arbitrary size.

Top-down pathway with lateral connections: The top-down pathway (FPN) upsamples the spatially coarser feature maps from higher pyramid levels. The lateral connections merge top-down and bottom-up layers that have the same spatial size. Higher-level feature maps have lower resolution but are semantically stronger, so they are better suited to recognizing larger objects. Lower-level feature maps have higher resolution and are thus better suited to detecting smaller objects. The top-down pathway and its lateral connections are combined with the bottom-up pathway at little extra computational cost. As a result, every level of the resulting feature pyramid is both semantically strong and spatially precise. The design is therefore largely scale-invariant and offers improved speed and precision.
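The top-down merge can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions:

- The 1 × 1 lateral convolutions are written as per-pixel matrix products.
- Upsampling is nearest-neighbour by a factor of 2.
- The 3 × 3 smoothing convolutions and the extra P6/P7 levels that RetinaNet adds are omitted.
- The function names are hypothetical.

```python
import numpy as np


def upsample2x(p):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return p.repeat(2, axis=0).repeat(2, axis=1)


def fpn_top_down(c_maps, lateral_weights):
    """Merge bottom-up maps into pyramid maps.

    c_maps: bottom-up outputs ordered fine to coarse (e.g. C3, C4, C5), each (H, W, C_in),
            with each map exactly half the spatial size of the previous one.
    lateral_weights: one (C_in, D) matrix per level, acting as a 1x1 convolution.
    Returns pyramid maps ordered fine to coarse, each (H, W, D).
    """
    laterals = [c @ w for c, w in zip(c_maps, lateral_weights)]  # 1x1 lateral convs
    merged = [laterals[-1]]                     # the coarsest level starts the pathway
    for lat in reversed(laterals[:-1]):
        merged.append(lat + upsample2x(merged[-1]))  # add the upsampled coarser map
    return merged[::-1]


rng = np.random.default_rng(0)
# Toy channel counts; ResNet-50 would give 512, 1024, and 2048 channels and D = 256.
c_maps = [rng.standard_normal((16, 16, 8)),
          rng.standard_normal((8, 8, 16)),
          rng.standard_normal((4, 4, 32))]
weights = [rng.standard_normal((c.shape[-1], 4)) for c in c_maps]
pyramid = fpn_top_down(c_maps, weights)
print([p.shape for p in pyramid])   # [(16, 16, 4), (8, 8, 4), (4, 4, 4)]
```

Each output level combines the spatial detail of its own lateral input with the semantics propagated down from all coarser levels. Every pyramid level therefore ends up with the same channel depth D, which lets a single shared head run on all of them.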

3.1.2 Subnetwork for Object Classification

A fully convolutional network (FCN) for object classification is attached to each FPN level. As shown in Fig. 3, this subnetwork consists of 3 × 3 convolutional layers with 256 filters, followed by another 3 × 3 convolutional layer with K × A filters. The output feature map therefore has size W × H × KA. W and H are proportional to the width and height of the input feature map, K is the number of object classes, and A is the number of anchor boxes. Finally, a sigmoid activation is applied for object classification. The last convolutional layer has KA filters because each spatial position of the feature map has A anchor boxes, and each anchor box can be classified into K different classes. The output feature map thus has KA channels.
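As a worked example of the W × H × KA output size, the sketch below assumes the configuration of the original RetinaNet paper, which this section does not restate:

- pyramid levels P3 to P7, where level l has stride 2^l;
- A = 9 anchors per position (3 scales × 3 aspect ratios);
- K = 1 class (ship);
- the 768 × 768 Airbus images described in Sect. 4.1.

```python
import math


def cls_output_shapes(image_size=768, levels=range(3, 8), num_classes=1, num_anchors=9):
    """W x H x KA output shape of the classification subnet at each pyramid level."""
    shapes = {}
    for level in levels:
        side = math.ceil(image_size / 2 ** level)   # level l has stride 2**l
        shapes[f"P{level}"] = (side, side, num_classes * num_anchors)
    return shapes


shapes = cls_output_shapes()
total_scores = sum(w * h * ka for (w, h, ka) in shapes.values())
print(shapes)         # from P3 (96, 96, 9) down to P7 (6, 6, 9)
print(total_scores)   # 110484: one score per anchor when K = 1
```

Under these assumptions the classification subnet scores 110,484 anchors per image, and 82,944 of them lie on P3 alone. Nearly all of these anchors cover background in a typical ocean scene, which is the class imbalance that the focal loss in Sect. 3.3 is meant to counter.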

3.1.3 Subnetwork for Object Regression

The regression subnetwork is attached to each FPN feature map in the same way as the classification subnetwork. It is similar to the classification subnetwork, except that its final convolutional layer is 3 × 3 with 4A filters, resulting in a W × H × 4A output feature map. The last convolutional layer has four filters per anchor because the regression subnetwork outputs four numbers for each anchor box to localize the objects. These numbers predict the relative offset (in terms of center coordinates, width, and height) between the anchor box and the ground-truth box. Consequently, the output feature map of the regression subnet has 4A channels.
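The text states that the four regression outputs encode the offset between an anchor and its ground-truth box in terms of center coordinates, width, and height. Faster R-CNN and RetinaNet commonly use the following parameterization:

- the center shift is normalized by the anchor size;
- the size change is the logarithm of the size ratio.

The sketch below assumes this parameterization, since the paper does not give its exact formula; `encode` and `decode` are illustrative names.

```python
import math


def encode(anchor, gt):
    """Regression targets (tx, ty, tw, th) for boxes given as (cx, cy, w, h)."""
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))


def decode(anchor, deltas):
    """Invert encode(): apply predicted offsets to an anchor box."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = deltas
    return (ax + tx * aw, ay + ty * ah, aw * math.exp(tw), ah * math.exp(th))


anchor = (100.0, 100.0, 32.0, 32.0)
ship = (108.0, 96.0, 64.0, 16.0)      # an elongated ground-truth box
targets = encode(anchor, ship)        # (0.25, -0.125, log 2, log 0.5)
print(decode(anchor, targets))        # recovers the ground-truth box up to rounding
```

The logarithmic size terms keep the targets symmetric for boxes that are larger or smaller than the anchor, which matters for ships with large aspect ratios.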

3.2 Instance Segmentation Part Instance segmentation is a complex computer vision problem that requires predicting object instances together with their per-pixel segmentation masks. It is thus a cross between semantic segmentation and object detection. To improve RetinaNet's object localization, an instance segmentation system is used instead of a plain object detector so that the background is removed more thoroughly.

Instance segmentation does not predict a single mask covering the entire image. Instead, it predicts one mask for each region of interest (RoI) that is likely to contain an object. To do this, a corresponding feature map must be obtained for each RoI and fed into the mask subnet. RoIAlign is the technique used for this purpose.

As discussed above, RetinaNet extracts feature maps of the entire image using a feature pyramid network. These feature maps are fed into the classification and regression subnets to produce bounding box predictions. When instance segmentation is added on top of RetinaNet, these bounding box predictions can be used to define the RoIs.

Figure 4 depicts the RoIAlign procedure:

1. The model rescales and projects a bounding box onto an FPN feature map to obtain the corresponding region.
2. To obtain a new feature map inside this region, the model first fixes an output resolution. This resolution is 14 × 14 in the original paper [12]; for convenience, the example in Fig. 4 uses a coarser 2 × 2 resolution.
3. A set of regularly spaced sampling points is picked in each grid cell.
4. The feature value at each point is computed by bilinear interpolation from the nearby grid points of the FPN feature map.

Fig. 4 RoI align [12]


Finally, we derive the RoI's feature map by max-pooling the sampling points within each grid cell. Fig. 4 does not depict this step. Max-pooling the sampling points simply means selecting the point with the highest value and using it as the grid cell's feature value. Note that the final feature map has a fixed size of 14 × 14 × 256, with the same number of channels as the FPN output.
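The RoIAlign steps described above can be sketched for a single feature channel:

1. Project the box without rounding.
2. Split it into an `out_size` × `out_size` grid.
3. Bilinearly sample regularly spaced points in each cell.
4. Max-pool the samples in each cell.

This is an illustrative NumPy version under stated assumptions, not a reference implementation. The number of sampling points per cell (`samples`) is a hypothetical parameter, and pixel-center offset conventions are ignored. Production implementations often average the samples instead of taking the maximum.

```python
import numpy as np


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D feature map at the continuous point (y, x)."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)          # clamp to the map
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feat[y0, x0] + (1 - dy) * dx * feat[y0, x1]
            + dy * (1 - dx) * feat[y1, x0] + dy * dx * feat[y1, x1])


def roi_align(feat, box, stride, out_size=14, samples=2):
    """Fixed-size feature map for one RoI on one channel of an FPN level.

    box: (x1, y1, x2, y2) in image pixels; stride: downsampling factor of the level.
    samples: sampling points per cell along each axis (samples**2 points per cell).
    """
    x1, y1, x2, y2 = (c / stride for c in box)   # project without rounding
    cell_h, cell_w = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            values = [bilinear(feat,
                               y1 + (i + (si + 0.5) / samples) * cell_h,
                               x1 + (j + (sj + 0.5) / samples) * cell_w)
                      for si in range(samples) for sj in range(samples)]
            out[i, j] = max(values)              # max-pool the sampling points
    return out


feat = np.tile(np.arange(16.0), (16, 1))         # toy map whose value equals x
print(roi_align(feat, (0, 0, 64, 64), stride=8, out_size=2))   # each row is [3. 7.]
```

Because the box coordinates are never rounded to the feature grid, small ships covering only a few feature-map cells keep their sub-cell position. Rounding that position away is the misalignment RoIAlign was designed to avoid.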

3.2.1 The Mask Subnet

After RoIAlign, each RoI feature map is fed into the mask subnet (Fig. 3), which is an FCN. In the original study [12], the FCN starts with four 3 × 3 convolutional layers. These are followed by a 2 × 2 deconvolution layer with stride 2 and, finally, a 1 × 1 convolutional layer. All hidden layers use ReLU activation, whereas the last layer uses sigmoid activation. The mask output has size 28 × 28 × K, where K is the number of object classes; that is, one mask is predicted for each class. However, in subsequent steps only the mask that matches the predicted class is used.
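The 28 × 28 × K mask size follows from the layer sequence above. The sketch below traces the spatial size through the mask subnet. It assumes padding 1 on the 3 × 3 convolutions so that they keep the 14 × 14 size; the text implies this but does not state it. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding=0):
    """Spatial output size of a transposed convolution (deconvolution)."""
    return (size - 1) * stride - 2 * padding + kernel


def mask_head_output(roi_size=14, num_classes=1):
    size = roi_size
    for _ in range(4):                              # four 3x3 convs, padding 1 (assumed)
        size = conv_out(size, kernel=3, padding=1)
    size = deconv_out(size, kernel=2, stride=2)     # 2x2 deconvolution, stride 2
    size = conv_out(size, kernel=1)                 # 1x1 conv producing K channels
    return (size, size, num_classes)


print(mask_head_output())                # (28, 28, 1) for the single ship class
print(mask_head_output(num_classes=3))   # (28, 28, 3)
```

The deconvolution alone doubles the resolution from 14 to 28, while the convolutions only refine features at a constant size.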

3.3 Focal Loss Following the focal loss formulation [13], we use a multi-task loss function

$$L_{\text{total}} = L_{\text{FL}} + L_{\text{mask}},$$

where L_FL addresses the class imbalance between the foreground and background classes during training. The focal loss is defined as

$$\mathrm{FL}(p_t) = -\alpha_t \,(1 - p_t)^{\gamma} \log(p_t), \qquad (1)$$

where p_t = p for the ground-truth class and p_t = 1 − p otherwise, with p the predicted probability.

Binary cross-entropy gives the loss −log(p_t). Even easily classified examples (p_t ≫ 0.5) incur a loss of non-trivial magnitude, and summed over the many easy background examples this loss dominates training. The modulating factor (1 − p_t)^γ therefore down-weights easy examples and focuses training on hard examples. The weighting factor α_t balances the importance of positive and negative examples. With γ = 2, the modulating factor alone reduces the loss by 100× for an example with p_t = 0.9 and by roughly 1000× for one with p_t ≈ 0.968, compared with cross-entropy. This leads to faster and better training.

L_mask is defined only on the k-th mask for an RoI associated with ground-truth class k, as described in the Mask R-CNN paper [14].
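Equation (1) and the roughly 1000× reduction at p_t ≈ 0.968 mentioned above can be checked with a few lines of Python. The defaults α = 0.25 and γ = 2 are the values recommended in the focal loss paper. The values used in this work are not stated, so treat them as assumptions.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy -log(p_t); p is the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss of Eq. (1): -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Reduction relative to cross-entropy from the modulating factor alone (alpha_t = 1).
for p_t in (0.9, 0.968):
    ratio = cross_entropy(p_t, 1) / focal_loss(p_t, 1, alpha=1.0)
    print(f"p_t = {p_t}: {ratio:.0f}x lower loss")
```

The ratios come out at 100× and about 977×, since 1/(1 − p_t)² = 1/0.032² ≈ 977. With γ = 0 and α_t = 1 the focal loss reduces to ordinary cross-entropy.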


4 Results and Discussion 4.1 Dataset Used The dataset comes from the Airbus Ship Detection Challenge on Kaggle [15]. It contains 193,000 training images and 15,600 test images, each 768 × 768 pixels in size and containing one or more ships. A second file provides the ship masks as run-length encoded (RLE) data, which must be converted into binary masks before the model can be trained on the dataset. A favorable overall prediction outcome on the test dataset would support the robustness of the proposed model and its improved performance over prior state-of-the-art models.
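Converting between RLE strings and binary masks is needed both for training and for the submission file. The sketch below assumes the convention described on the Kaggle competition page:

- space-separated pairs of a 1-indexed start pixel and a run length;
- pixels numbered top to bottom and then left to right (column-major order).

It uses NumPy, and the function names are illustrative.

```python
import numpy as np


def rle_decode(rle, shape=(768, 768)):
    """Decode an RLE string into a binary mask; empty or missing RLE gives an empty mask."""
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    if isinstance(rle, str) and rle.strip():
        tokens = [int(t) for t in rle.split()]
        for start, length in zip(tokens[0::2], tokens[1::2]):
            mask[start - 1:start - 1 + length] = 1      # starts are 1-indexed
    return mask.reshape(shape, order="F")               # column-major numbering


def rle_encode(mask):
    """Encode a binary mask as an RLE string in the same convention."""
    pixels = mask.flatten(order="F")
    padded = np.concatenate([[0], pixels, [0]])
    changes = np.where(padded[1:] != padded[:-1])[0] + 1   # 1-indexed run boundaries
    changes[1::2] -= changes[0::2]                         # end positions -> run lengths
    return " ".join(str(x) for x in changes)


mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 2] = 1                       # two pixels in the third column
print(rle_encode(mask))                # 10 2
```

Round-tripping a mask through `rle_encode` and `rle_decode` is a cheap sanity check before writing a submission file, since a wrong pixel order silently produces transposed masks.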

4.2 Training the Proposed Model To improve the model's performance while reducing false positives, the model is trained in three stages with learning rates of 0.006, 0.003, and 0.0015:

1. Two epochs at a learning rate of 0.006 significantly reduced the loss values and the number of false positives.
2. Another 10 epochs at a learning rate of 0.003, with custom augmentation of high-quality ship images, lowered the total loss.
3. A further 12 epochs at a learning rate of 0.0015 left the loss values on a plateau but reduced the number of false positives further.

All experiments were run on an Nvidia Tesla P100 GPU.
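The three-stage training described above is a piecewise-constant learning-rate schedule. A minimal sketch follows. Wiring it into a specific framework's optimizer is omitted, and `lr_for_epoch` is a hypothetical helper.

```python
SCHEDULE = [(2, 0.006), (10, 0.003), (12, 0.0015)]   # (epochs, learning rate) per stage


def lr_for_epoch(epoch, schedule=SCHEDULE):
    """Learning rate for a 0-indexed epoch under a piecewise-constant schedule."""
    boundary = 0
    for num_epochs, lr in schedule:
        boundary += num_epochs
        if epoch < boundary:
            return lr
    raise ValueError("epoch is beyond the end of the schedule")


total_epochs = sum(n for n, _ in SCHEDULE)
print([lr_for_epoch(e) for e in range(total_epochs)])
```

The three stages add up to 24 epochs. Each stage halves the learning rate of the previous one.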

4.3 Evaluation of Proposed Model We examined several pre-trained models on similar datasets and obtained a surprising result. The Ship Detection dataset was taken from Airbus' Ship Detection Challenge at Kaggle [14]. The final score is therefore computed by comparing our output RLE .csv file with the competition's ground-truth file, a private dataset held by Airbus that has not been released. Table 1 gives the quantitative evaluation of our model, which attained an accuracy (private score) of 0.80855. The Private Score can therefore serve as an accuracy statistic for our technique. The Private Score is calculated on about 93% of the test data, and the Public Score on the remaining 7%. We also used COCO evaluation to further characterize our results, as given in Table 2.
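To approximate the leaderboard score locally, the sketch below assumes the metric described on the Kaggle competition page. For each image, an F2 score is computed at IoU thresholds from 0.5 to 0.95 in steps of 0.05 and averaged over thresholds. It is an unofficial approximation under these assumptions:

- masks are sets of pixel indices;
- matching is greedy;
- an image with no ships and no predictions scores 1.0.

```python
def iou(a, b):
    """IoU of two masks given as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f2_at(preds, truths, threshold):
    """F2 score of one image at one IoU threshold, with greedy one-to-one matching."""
    if not preds and not truths:
        return 1.0                      # assumption: empty image, nothing predicted
    matched, tp = set(), 0
    for p in preds:
        for k, g in enumerate(truths):
            if k not in matched and iou(p, g) > threshold:
                matched.add(k)
                tp += 1
                break
    fp, fn = len(preds) - tp, len(truths) - tp
    return 5 * tp / (5 * tp + 4 * fn + fp)   # F-beta with beta = 2


def image_score(preds, truths):
    """Mean F2 over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [0.5 + 0.05 * i for i in range(10)]
    return sum(f2_at(preds, truths, t) for t in thresholds) / len(thresholds)


print(image_score([{1, 2, 3}], [{1, 2}]))   # 0.4: IoU 2/3 passes 4 of the 10 thresholds
```

Because F2 weights missed ships (false negatives) four times as heavily as false positives in the denominator, this kind of metric rewards the recall gains on small ships discussed below.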


Table 1 Performance evaluation of the proposed model

| Proposed model | Private score | Public score |
|---|---|---|
| RetinaNet with instance segmentation | 0.80855 | 0.66003 |

Table 2 Comparison of COCO-evaluated average precision between SOTA models and the proposed model

| Model | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|
| PANet [15] | 62.0 | 82.3 | 68.5 | 49.4 | 79.5 | 84.6 |
| Mask-scoring R-CNN [16] | 56.2 | 89.7 | 60.2 | 42.0 | 78.1 | 85.2 |
| SSS-Net [17] | 54.9 | 87.8 | 63.5 | 32.7 | 71.3 | 74.4 |
| Our proposed model | 65.7 | 91.5 | 71.3 | 51.8 | 76.3 | 86.1 |

Figure 5 shows the loss curves after training the proposed model for 22 epochs at a learning rate of 0.003. The first graph plots the total training loss against the validation loss; the best loss value, 1.420643, occurs around the 19th epoch. The second graph plots the training mask loss against the validation mask loss and shows a different trend, with the lowest loss of 0.345641 at the 21st epoch. The final model is selected from the epoch with the lowest total loss, i.e., the 19th epoch. That model predicts the final outputs and constructs the annotations, which are submitted as a CSV file in the run-length encoded format required by the competition rules.

Fig. 5 a Training loss versus validation loss, b training mask loss versus validation mask loss values


Fig. 6 a Input images, b output images with predicted mask and score

4.4 Model Output Compared with the state-of-the-art models in Table 2, the proposed model achieves the highest Average Precision (AP) of 65.7 and the highest APS of 51.8. In our experiments, compared with the state-of-the-art Mask-scoring R-CNN, the proposed model lowered the false positive rate and improved APS when recognizing smaller ships in images. Its APM (76.3), however, is lower than that of PANet (79.5) and Mask-scoring R-CNN (78.1). Overall, these results confirm that our approach improves the identification of even small ships (Fig. 6).
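APS, APM, and APL in Table 2 are COCO average precision restricted to small, medium, and large objects, and a predicted mask counts as a true positive at a given IoU threshold when its IoU with a ground-truth ship reaches that threshold; COCO averages AP over the thresholds 0.50 to 0.95 in steps of 0.05. The sketch below shows only these two ingredients, mask IoU and the approximate area buckets (below 32² pixels, up to 96² pixels, and above), assuming masks are nested 0/1 lists. It is not a full AP computation, for which the official `pycocotools` evaluator should be used; all names are hypothetical.

```python
from typing import List

Mask = List[List[int]]

# COCO IoU thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += bool(pa and pb)
            union += bool(pa or pb)
    return inter / union if union else 0.0


def size_bucket(area_px: int) -> str:
    """Approximate COCO size category from an object's pixel area."""
    if area_px < 32 ** 2:
        return "small"
    if area_px < 96 ** 2:
        return "medium"
    return "large"


def thresholds_passed(iou: float) -> int:
    """Number of COCO IoU thresholds at which this match is a true positive."""
    return sum(iou >= t for t in COCO_IOU_THRESHOLDS)


if __name__ == "__main__":
    gt = [[1, 1], [0, 0]]
    pred = [[1, 0], [0, 0]]
    iou = mask_iou(gt, pred)
    print(iou, thresholds_passed(iou))  # 0.5 1
    print(size_bucket(20 * 20))         # small: a 20 x 20 px ship
```

A ship covering only a few hundred pixels falls in the "small" bucket, so improvements on such ships show up directly in APS.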

5 Conclusion Adding the mask segmentation layer to the RetinaNet model produced the favorable outcome we had predicted. Comparing our AP scores across object sizes with those of other state-of-the-art models under the COCO evaluation method, we achieved the highest average precision of 65.7. We also observed a larger reduction in false positives. As expected before implementation, our model detected small ships (APsmall) better than the other models.


The proposed model detects smaller ships more efficiently than the other pre-trained models. In future work, we will try to further improve the model's efficiency and reduce its computational cost by experimenting with different object detection models.

References
1. Wang Y, Li W, Li X, Sun X (2018) Ship detection by modified RetinaNet. In: 2018 10th IAPR workshop on pattern recognition in remote sensing (PRRS), pp 1–5
2. Zhang T, Zhang X, Shi J, Wei S (2019) High-speed ship detection in SAR images by improved Yolov3. In: 16th international computer conference on wavelet active media technology and information processing, pp 149–152
3. Wang R et al (2019) An improved faster R-CNN based on MSER decision criterion for SAR image ship detection in harbor. In: IGARSS 2019 - 2019 IEEE international geoscience and remote sensing symposium, pp 1322–1325
4. Zhao K, Zhou Y, Chen X (2020) A dense connection based SAR ship detection network. In: 2020 IEEE 9th joint international information technology and artificial intelligence conference (ITAIC), pp 669–673
5. Jianxin G, Zhen W, Shanwen Z (2020) Multi-scale ship detection in SAR images based on multiple attention cascade convolutional neural networks. In: 2020 international conference on virtual reality and intelligent systems (ICVRIS), pp 438–441
6. Yu H, Li Y, Zhang D (2021) An improved YOLO v3 small-scale ship target detection algorithm. In: 2021 6th international conference on smart grid and electrical automation (ICSGEA), pp 560–563
7. Xiao X, Zhou Z, Wang B, An Z (2019) Accurate ship detection via paired semantic segmentation. In: 2019 Chinese control and decision conference (CCDC), pp 5990–5994
8. Zhu M et al (2020) Rapid ship detection in SAR images based on YOLOv3. In: 2020 5th international conference on communication, image and signal processing (CCISP), pp 214–218
9. Li Z, You Y, Liu F (2019) Multi-scale ships detection in high-resolution remote sensing image via saliency-based region convolutional neural network. In: IGARSS 2019 - 2019 IEEE international geoscience and remote sensing symposium, pp 246–249
10. Nie X, Duan M, Ding H, Hu B, Wong EK (2020) Attention mask R-CNN for ship detection and segmentation from remote sensing images. IEEE Access 8:9325–9334
11. Zhu M et al (2021) Arbitrary-oriented ship detection based on retinanet for remote sensing images. IEEE J Sel Top Appl Earth Observations Remote Sens 14:6694–6706
12. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: 2017 IEEE international conference on computer vision (ICCV), pp 2980–2988
13. Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42(2):318–327
14. Kaggle. Airbus ship detection challenge [Internet]. Available at: https://www.kaggle.com/c/airbus-ship-detection/data
15. Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of IEEE/CVF conference on computer vision and pattern recognition, pp 8759–8768
16. Huang Z, Huang L, Gong Y, Huang C, Wang X (2019) Mask scoring R-CNN. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 6409–6418
17. Huang Z, Sun S, Li R (2020) Fast single-shot ship instance segmentation based on polar template mask in remote sensing images

A Novel Technique of Mixed Gas Identification Based on the Group Method of Data Handling (GMDH) on Time-Dependent MOX Gas Sensor Data
Ghazala Ansari, Preeti Rani, and Vinod Kumar

Abstract Gas sensor arrays have been used to identify gases based on their dynamic response. Pre-processing of data plays a critical role in identifying different gases, and the pre-processing techniques found in the literature are complex and sophisticated. Six different volatile organic compounds were exposed to eight metal-oxide (MOX) gas sensors under tightly controlled operating conditions at different concentration levels. The generated dataset can also be used to tackle a variety of chemical sensing challenges such as gas identification, sensor failure, or system calibration. The proposed approach offers a novel way to classify mixed gases. According to the statistical parameters calculated, the proposed GMDH neural network exhibits acceptable accuracy. The coefficient of determination (R²) values obtained for the proposed GMDH model, R² = 0.99346 for the training set, 0.992 for the test set, and 0.99261 for the complete set, indicate a good match between the actual and predicted values, which reflects the effectiveness of the GMDH algorithm in predicting ethylene in mixtures. These results verify that the proposed GMDH technique produces accurate classification outcomes.
Keywords Classification · Gas sensor · Mixed gas data · Neural network · Regression

G. Ansari (B) · P. Rani · V. Kumar Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, NCR Campus, Delhi-Meerut Road, Modinagar, Ghaziabad, Uttar Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_55

1 Introduction Electronic noses (e-noses) are a comparatively recent addition to the world of MOX gas sensors; they combine an array of broadly cross-reactive gas sensors with electronic equipment. Together with a pattern recognition system, they can detect, classify, and quantify chemical analytes or odors within an area [1, 2]. E-noses use sensors to identify the types of gas in the surrounding environment [3–6]. In early e-nose systems, gas measurements were displayed as a color array via a calorimetric sensor [7]. For complex analyses, such systems also required gas chromatography and mass spectrometry equipment combined with sophisticated machine intelligence (MI), which restricted the environments in which they could be used [8, 9]. Although calorimetric-sensor-based systems are limited in their real-world applications, advances in electrochemical sensors, hardware, and software have made electronic systems more intelligent and portable [5, 10]. These advances have gradually led to an ever-expanding range of e-nose applications, such as medical check-ups [11], gas leak and pollution investigations [12], environmental protection [13], and the cosmetics industry [3, 14, 15]. Compared with a single-sensor system, the information redundancy of a sensor array provides a distinctive pattern for each gas, which substantially improves classification results [16]. An e-nose combines a sensor array of various sensors and channels with a signal processing system for sensing gas content [5, 17]. Each channel of the array has different characteristics, so the response varies with the gas type (class), and an electronic interface converts the response into numerical values. Relevant features are then extracted from the converted data, and the computing system uses a classifier to determine the gas class to which the data belongs. The performance of an e-nose system is assessed in terms of its sensitivity, feature extraction, and classification capability. Classifying gas mixtures is significantly more complex than classifying individual gases. The gas mixture problem has been studied extensively in the literature, most commonly with machine learning methods.
Identifying and classifying gases has long been a challenge in fields such as industrial process control, health, environment, and safety, which need to blend and monitor the elemental composition of ambient air. Thick film gas sensors have been reported in recent years to be capable of detecting various gases; one example is the doped tin oxide thick film gas sensor, which has been described as accurate in identifying various gases.

2 Literature Review Feed-forward neural networks (FFNNs) trained with back-propagation handle pattern recognition tasks easily, but they converge slowly and their solutions may be suboptimal. To address this issue, support vector machines (SVMs) have also been used to classify test gases and their mixtures. When applied to odor classification, an SVM generalizes well even though it does not incorporate domain knowledge [18]. A multi-output support vector regression (MSVR) has been proposed to quantify single gases and odors individually as well as in mixtures [19]. The authors of [20] combined PCA with ANN, SVM, and M-SVR approaches and thick film gas sensor arrays to successfully classify and quantify VOCs in binary mixtures of acetone and 2-propanol. Further, in previous studies [21, 22], PCA pre-processed data gave higher accuracy in identifying and quantifying individual gases/odors. Researchers have characterized gases and volatile organic compounds (VOCs) using both steady-state and dynamic responses of single gas sensors or multi-sensor arrays. In [23], a thick film gas sensor array was used successfully to classify gases and odors based on their transient and steady-state responses. The authors of [24] analyzed the dynamic response of a chemical sensor array to classify gases and odors, using the Kalman filter technique for data conversion and learning vector quantization (LVQ) for classification. In [25], combining the discrete wavelet transform (DWT) with k-nearest neighbors (KNN) and a probabilistic neural network (PNN) is reported to improve classification accuracy. The authors of [26] examined the dynamic response of sensor arrays for identifying gases, emphasizing the processing of discrete measurements, and used a data profiling scheme to identify gases. In [27], gases and odors were identified accurately by analyzing gas sensor array responses with the wavelet transform (WT) and multiscale principal component analysis (MSPCA); the transformed data was used as input to build the neural network. A back-propagation neural network was chosen because it is easy to implement and well suited to multiclass classification [28]. A 3-element sensor array was used to classify and quantify binary gas mixtures [29]: the sensor responses were converted into SPCA components, a neural network was trained with 20 vectors, and it was then tested on 21 response vectors. These 41 vectors were used to train and test a neural classifier based on the two principal components. In [30], 108 vectors of SPCA-transformed data were used, split 70:30 between training and testing. In [31], a gas classification neural network was trained on 1175 training vectors and tested on 375 test vectors obtained from the SPCA-transformed responses of an eight-element SnO2 sensor array. To allow PCA/SPCA-trained neural classifiers to be applied to freshly generated samples, [32] developed a multi-layer perceptron (MLP) classifier together with a pre-processor, a post-processor, and a training heuristic. This technique could also be applied in various other applications [33–35].
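Several of the works above apply PCA, or its SPCA variant, to sensor-array responses before classification. A minimal sketch of plain PCA via the singular value decomposition is shown below, assuming NumPy and a samples-by-sensors matrix; it is not the SPCA procedure of [29–31], and the function name `pca_project` and the synthetic data are hypothetical.

```python
import numpy as np


def pca_project(X: np.ndarray, n_components: int = 2):
    """Project rows of X (samples x sensors) onto the leading principal components.

    Returns the component scores and the fraction of variance each component explains.
    """
    Xc = X - X.mean(axis=0)                      # centre every sensor channel
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_components].T            # coordinates in principal-component space
    explained = (s ** 2) / np.sum(s ** 2)        # variance ratio per component
    return scores, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 8-sensor responses driven by two latent factors plus small noise.
    latent = rng.normal(size=(100, 2))
    mixing = rng.normal(size=(2, 8))
    X = latent @ mixing + 0.01 * rng.normal(size=(100, 8))
    scores, ratio = pca_project(X, 2)
    print(scores.shape, ratio.sum())  # (100, 2) and a variance ratio close to 1
```

The two-component scores can then be fed to any classifier, which is the role PCA plays as a pre-processing step in the cited studies.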

3 Data Collection This section describes the experimental setup, the dataset recorded by an array of metal-oxide (MOX) gas sensors, and the feature extraction and classification techniques.


Table 1 MOX sensor array information

| Sensor type | Number of units | Channel | Target gases |
|---|---|---|---|
| TGS2600 | 1 | 1 | Carbon monoxide (CO), hydrogen, methane |
| TGS2602 | 2 | 2, 3 | Hydrogen sulfide, ammonia, toluene |
| TGS2610 | 1 | 8 | Isobutane, propane |
| TGS2611 | 1 | 7 | Methane, hydrogen, isobutane |
| TGS2612 | 1 | 5 | Propane, methane |
| TGS2620 | 2 | 4, 6 | Carbon monoxide (CO), hydrogen, methane |

3.1 Experimental Setup In this experiment, the gas sensors were tested in a realistic environment. Chemical detection systems based on chemo-resistive sensors most commonly use gas chambers to control the airflow and minimize turbulence. Instead, a wind tunnel fed by two gas sources that generate two independent gas plumes was used. Because the plumes mix in a turbulent flow, the sensors measure the spatial and temporal variation within the plumes, so the data capture the spatio-temporal behavior of natural gas plumes. The chemical detection platform consisted of eight MOX gas sensors that respond to different gas stimuli in a time-dependent manner, as listed in Table 1. The sensors (TGS2612, TGS2611, TGS2610, TGS2602, TGS2600, and TGS2620) are commercially available from Figaro. Their built-in heaters were driven at a constant 5 V to keep the sensors at a constant operating temperature. The detection platform also carries temperature and relative humidity sensors. Throughout the experiment, the sensor responses were sampled every 20 ms [36].

3.2 Dataset Data were acquired in a 2.5 × 1.2 × 0.4 m wind tunnel facility with two gas sources (labeled gas source 1 and gas source 2). The experimental data were taken from the UCI dataset "gas sensor array exposed to turbulent gas mixtures" [37]. The detection system was exposed to mixtures of ethylene with methane or carbon monoxide (CO): one source released ethylene while the other released methane or CO, producing mixtures of "methane and ethylene" or "CO and ethylene". The dataset is organized into 180 text files, each corresponding to a different measurement. Table 2 shows sample points from the original measurement data. Each measurement is identified by its filename as follows: the first 3 characters are a local identifier that does not relate to the order of measurements; characters 4–8 give the concentration level of ethylene released at source 2 (n: zero, L: low, M: medium, and H: high); and the last 4 characters give the gas released at source 1 (Me: methane, CO: carbon monoxide). Using the measurements in Table 3, the following concentrations were identified at the sensor locations: ethylene (l: 31 ppm, m: 46 ppm, and h: 96 ppm), CO (l: 270 ppm, m: 397 ppm, and h: 460 ppm), and methane (l: 51 ppm, m: 115 ppm, and h: 131 ppm) [36]. Figure 1 shows the response curve of the TGS2600 sensor for ethylene alone, CO alone, and a mixture of the two. Figure 2 shows the readings of the eight MOX gas sensors as a function of concentration (ppm).

Table 2 Sample points of the original datasets

| Time | TGS2600 | TGS2602 | TGS2602 | TGS2620 | TGS2612 | TGS2620 | TGS2611 | TGS2610 |
|---|---|---|---|---|---|---|---|---|
| 1 | 334 | 293 | 529 | 563 | 556 | 688 | 618 | 705 |
| 1.1 | 336 | 294 | 532 | 566 | 558 | 688 | 615 | 701 |
| 1.2 | 336 | 294 | 532 | 566 | 558 | 688 | 614 | 701 |
| 1.3 | 336 | 294 | 532 | 565 | 558 | 689 | 614 | 701 |
| 1.06 | 336 | 294 | 532 | 565 | 557 | 688 | 615 | 701 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 299.9 | 374 | 335 | 592 | 610 | 600 | 740 | 642 | 734 |
| 300 | 373 | 335 | 591 | 610 | 600 | 740 | 642 | 734 |

Table 3 Mixed gas values based on their concentration level

| Concentration level | CO (ppm) | Methane (ppm) | Ethylene (ppm) |
|---|---|---|---|
| Zero | 0 | 0 | 0 |
| Low | 270 | 51 | 31 |
| Medium | 397 | 115 | 46 |
| High | 460 | 131 | 96 |

Fig. 1 Single and mixed TGS2600 gas sensor readings with respect to the sampling points

Fig. 2 Single and mixed readings of the 8 MOX gas sensors with respect to concentration (ppm)

4 Methodology In GMDH networks, the neurons of the hidden layers are selected from a pool of candidates during the training process. The network begins with only input neurons, and neurons are added to the hidden layers as they are selected from the pool. The connections in the network are not fixed; they are selected during training to improve the network's performance. The number of layers is also selected automatically so as to maximize accuracy while reducing overfitting. Ivakhnenko found the GMDH algorithm useful for modeling complex systems and datasets with multiple inputs and outputs [38]. GMDH networks are feed-forward networks built from neurons with second-degree polynomial transfer functions. The algorithm automatically determines the number of layers and of neurons in the hidden layers, the optimal model structure, and the effective input variables. A GMDH neural network maps inputs to outputs using a nonlinear function called the Volterra series, given in Eq. (1); Eq. (2) is its second-degree, two-variable form.

$$\hat{y} = a_0 + \sum_{i=1}^{m} a_i x_i + \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij}\, x_i x_j + \sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{k=1}^{m} a_{ijk}\, x_i x_j x_k + \cdots \tag{1}$$

$$G(x_i, x_j) = a_0 + a_1 x_i + a_2 x_j + a_3 x_i^2 + a_4 x_j^2 + a_5 x_i x_j \tag{2}$$
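Each candidate neuron in Eq. (2) is linear in its six coefficients, so they can be estimated by ordinary least squares. A minimal NumPy sketch, assuming the two input columns and the target are 1-D float arrays; `quadratic_design`, `fit_neuron`, and `predict_neuron` are hypothetical names, not part of the MATLAB implementation used in this work.

```python
import numpy as np


def quadratic_design(xi: np.ndarray, xj: np.ndarray) -> np.ndarray:
    """Design matrix whose columns multiply a0..a5 in Eq. (2)."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])


def fit_neuron(xi: np.ndarray, xj: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares estimate of (a0, ..., a5) for one GMDH neuron."""
    coeffs, *_ = np.linalg.lstsq(quadratic_design(xi, xj), y, rcond=None)
    return coeffs


def predict_neuron(coeffs: np.ndarray, xi: np.ndarray, xj: np.ndarray) -> np.ndarray:
    """Evaluate G(xi, xj) with fitted coefficients."""
    return quadratic_design(xi, xj) @ coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xi, xj = rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200)
    true = np.array([0.5, 1.0, -2.0, 0.3, 0.1, 0.7])   # hypothetical coefficients
    y = quadratic_design(xi, xj) @ true                 # noise-free synthetic target
    print(np.round(fit_neuron(xi, xj, y), 6))           # recovers `true`
```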

The GMDH technique determines the unknown coefficients $a_i$ of the Volterra series. For each pair of variables $x_i$ and $x_j$, the coefficients of Eq. (2) are found by regression [39, 40], choosing them to minimize the least-squares error $E$ over the $M$ training samples [41], as in Eq. (3):

$$E = \frac{1}{M}\sum_{i=1}^{M}\bigl(y_i - G_i(\cdot)\bigr)^2 \tag{3}$$

where the observed outputs are $y_i = f(x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im})$, $i = 1, 2, 3, \ldots, M$. Figure 3 shows the GMDH network structure with model inputs $x_1, \ldots, x_n$ and output $y$. In mathematical terms, the GMDH provides a general relationship between the output and the input parameters [39, 42, 43]. The challenge in a GMDH network is to design it so that the difference between the actual output $y$ and the predicted output $\tilde{y}$ is minimized, as in Eq. (4):

$$\sum_{i=1}^{N} \left|\tilde{y}_i - y_i\right|^2 \rightarrow \min \tag{4}$$
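The layer-by-layer selection described in Section 4 can be sketched as follows. Assumptions, not taken from the paper: the data are split into a training part used to fit each neuron's coefficients and a validation part used to score it with the mean squared error of Eq. (3); the best `keep` neurons survive and their outputs become the next layer's inputs; growth stops when the best validation error no longer improves. All names are hypothetical, and the MATLAB implementation used by the authors may apply different selection rules.

```python
import itertools

import numpy as np


def _design(xi, xj):
    # Columns multiplying a0..a5 of the quadratic neuron in Eq. (2).
    return np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])


def gmdh_layer(X_tr, y_tr, X_va, y_va, keep=4):
    """Fit one neuron per input pair and keep the `keep` with lowest validation MSE."""
    candidates = []
    for p, q in itertools.combinations(range(X_tr.shape[1]), 2):
        coeffs, *_ = np.linalg.lstsq(_design(X_tr[:, p], X_tr[:, q]), y_tr, rcond=None)
        pred_va = _design(X_va[:, p], X_va[:, q]) @ coeffs
        err = float(np.mean((y_va - pred_va) ** 2))  # Eq. (3) on validation data
        candidates.append((err, p, q, coeffs))
    candidates.sort(key=lambda c: c[0])
    best = candidates[:keep]
    # Outputs of the surviving neurons become the inputs of the next layer.
    next_tr = np.column_stack([_design(X_tr[:, p], X_tr[:, q]) @ c for _, p, q, c in best])
    next_va = np.column_stack([_design(X_va[:, p], X_va[:, q]) @ c for _, p, q, c in best])
    return best, next_tr, next_va


def gmdh_fit(X_tr, y_tr, X_va, y_va, keep=4, max_layers=5):
    """Stack layers while the best validation error keeps decreasing."""
    layers, best_err = [], np.inf
    for _ in range(max_layers):
        if X_tr.shape[1] < 2:           # need at least one input pair
            break
        best, X_tr, X_va = gmdh_layer(X_tr, y_tr, X_va, y_va, keep)
        if best[0][0] >= best_err:      # no improvement: stop growing the network
            break
        best_err = best[0][0]
        layers.append(best)
    return layers, best_err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(300, 4))
    y = 1.0 + 2.0 * X[:, 0] * X[:, 1] + X[:, 2] ** 2  # hypothetical target
    layers, err = gmdh_fit(X[:200], y[:200], X[200:], y[200:])
    print(len(layers), err)
```

Scoring candidates on data not used to fit them is what lets GMDH choose the number of layers automatically while limiting overfitting.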

5 Experimental Results The GMDH neural network model was implemented in MATLAB R2020a. The resulting GMDH network contains 5 hidden layers, 25 neurons, and a single output layer (Y). The systematic layout of the proposed GMDH technique is shown in Fig. 4.

Fig. 3 GMDH network system with n inputs

Fig. 4 Systematic layout of the proposed GMDH technique

5.1 Performance Indicators We evaluated the performance of the GMDH model using the statistical parameters correlation coefficient (R), mean absolute error (MAE), and root mean square error (RMSE), defined in Eqs. (5)–(7), respectively [44–47]. The GMDH model was implemented in MATLAB R2020a on a computer with 8 GB of RAM running the Windows 10 operating system. A correlation coefficient (R) close to 1 indicates the best network performance, whereas MAE and RMSE values close to zero indicate that the model is acceptable.

$$R = \frac{\sum_{i=1}^{N}(y_i - \bar{y})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2\,\sum_{i=1}^{N}(Y_i - \bar{Y})^2}} \tag{5}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - Y_i\right| \tag{6}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - Y_i)^2} \tag{7}$$

Here, $N$ is the total number of data points, $y_i$ is the actual (measured) value, $\bar{y}$ is the mean of the measured values, $Y_i$ is the estimated (predicted) value, and $\bar{Y}$ is the mean of the predicted values. MAE and RMSE values approaching zero during training indicate that the model has been trained well, and values approaching zero on the withheld test data indicate that the model is likely to perform well on unseen data. The best-performing model therefore also has improved generalization capability when its MAE and RMSE approach zero [45].
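Eqs. (5)–(7) translate directly into code. A minimal pure-Python sketch, assuming the measured values $y_i$ and the predictions $Y_i$ are equal-length sequences; the function names are hypothetical and not from the paper's MATLAB implementation.

```python
import math
from typing import Sequence


def correlation_r(y: Sequence[float], Y: Sequence[float]) -> float:
    """Eq. (5): Pearson correlation between measured y and predicted Y."""
    n = len(y)
    y_bar, Y_bar = sum(y) / n, sum(Y) / n
    cov = sum((a - y_bar) * (b - Y_bar) for a, b in zip(y, Y))
    var_y = sum((a - y_bar) ** 2 for a in y)
    var_Y = sum((b - Y_bar) ** 2 for b in Y)
    return cov / math.sqrt(var_y * var_Y)


def mae(y: Sequence[float], Y: Sequence[float]) -> float:
    """Eq. (6): mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, Y)) / len(y)


def rmse(y: Sequence[float], Y: Sequence[float]) -> float:
    """Eq. (7): root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, Y)) / len(y))


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]   # hypothetical predictions
    print(correlation_r(measured, predicted), mae(measured, predicted), rmse(measured, predicted))
```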

5.2 Statistical Error Analysis Fig. 5 shows the graphical analysis used to evaluate the quality of the GMDH model. The dense clustering of data points around the zero-error line indicates good precision. Histograms of residuals show how well a model performs by revealing the discrepancy between actual and predicted values; Fig. 5 shows the distribution of residuals for the training and testing data points. According to the histogram, the differences between the actual and estimated points follow a normal distribution. Fig. 5 also illustrates the high accuracy of the final model by comparing the real and predicted values of each data point in the training set. The GMDH estimates closely follow the trend of the actual ethylene data points. These results confirm that the proposed modeling technique gives precise and efficient results: by enumerating the appropriate features, the GMDH technique performed well in estimating ethylene from the responses of an array of MOX gas sensors exposed to mixed gases. Fig. 6 shows regression plots of the model outputs for the training, testing, and complete datasets. The data points cluster closely around the unit-slope line, indicating that the predicted values match the experimental values well. The efficiency of the model is likewise reflected in the coefficients of determination, which are close to 1: R² = 0.99346 for the training set, 0.992 for the test set, and 0.99261 for the complete set.

Fig. 5 Training outcome: comparison of estimated and actual values, training error, and histogram of residuals for the proposed GMDH model (training set)
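Eq. (5) defines R, while the results above are reported as R². A minimal sketch of the coefficient of determination, assuming the common definition R² = 1 − SS_res/SS_tot; the paper does not state which definition its MATLAB code uses, and for a least-squares linear fit with an intercept the two coincide (R² equals the square of R from Eq. (5)). The function names and values are illustrative.

```python
from typing import List, Sequence


def r_squared(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_pred))   # residual sum of squares
    ss_tot = sum((a - mean_y) ** 2 for a in y)              # total sum of squares
    return 1.0 - ss_res / ss_tot


def residuals(y: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Actual minus predicted values, as plotted in a residual histogram."""
    return [a - b for a, b in zip(y, y_pred)]


if __name__ == "__main__":
    print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))  # approximately 0.98
```

Predicting the mean for every sample gives R² = 0, so values near 1, as reported for all three sets, mean the model explains almost all of the variance in the target.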

6 Conclusion Gases have been identified using the dynamic response of gas sensor arrays. Pre-processing the data is crucial for identifying different gases, and the pre-processing techniques in the literature are considered very complex and sophisticated. The proposed GMDH method for estimating ethylene was assessed using temperature (T), relative humidity (%), and the readings of the eight MOX gas sensors of six types. The data comprise 30 mixture configurations, 15 mixtures of ethylene with CO and 15 mixtures of ethylene with methane, each repeated six times. The approach offers a novel way to classify mixed gases. A gas sensor collects time-series data, referred to as gas data; the readings of a sensor array form a two-dimensional m × n matrix, where m (the number of time samples) is much larger than n (the number of sensors). According to the statistical parameters calculated, the proposed GMDH neural network exhibits acceptable accuracy. The coefficient of determination (R²) values obtained for the proposed GMDH model, 0.99346 for the training set, 0.992 for the test set, and 0.99261 for the complete set, indicate a good match between the actual and predicted values, reflecting the effectiveness of the GMDH algorithm in predicting ethylene in mixtures. These results verify that the proposed method produces accurate classification results. The proposed average slope multiplication scheme allows accurate classification of gases based on the dynamic response/recovery characteristics of the gas sensor array. Future research will aim to quantify gases from their dynamic responses using a variety of machine learning techniques.

Fig. 6 Regression plots of the proposed GMDH model for the train, test, and complete sets in predicting CO2 ethylene, respectively


References 1. Gardner JW, Bartlett PN (1999) Electronic noses: principles and applications. Oxford University Press on demand 2. Persaud K, Dodd G (1982) Analysis of discrimination mechanisms in the mammalian olfactory system using a model nose. Nature 299(5881):352–355 3. Bougrini M, Tahri K, Haddi Z, Saidi T, El Bari N, Bouchikhi B (2014) Detection of adulteration in argan oil by using an electronic nose and a voltammetric electronic tongue. J Sens 2014 4. Chiu S-W, Tang K-T (2013) Towards a chemiresistive sensor-integrated electronic nose: a review. Sensors 13(10):14214–14247 5. Choi S-I, Kim S-H, Yang Y, Jeong G-M (2010) Data refinement and channel selection for a portable e-nose system by the use of feature feedback. Sensors 10(11):10387–10400 6. Zhou J, Feng T, Ye R (2015) Differentiation of eight commercial mushrooms by electronic nose and gas chromatography-mass spectrometry. J Sens 2015 7. Lerchner J, Caspary D, Wolf G (2000) Calorimetric detection of volatile organic compounds. Sens Actuators B Chem 70(1–3):57–66 8. Farré M, Kantiani L, Petrovic M, Pérez S, Barceló D (2012) Achievements and future trends in the analysis of emerging organic contaminants in environmental samples by mass spectrometry and bioanalytical techniques. J Chromatogr A 1259:86–99 9. Kim Y-H, Kim K-H (2012) Ultimate detectability of volatile organic compounds: how much further can we reduce their ambient air sample volumes for analysis? Anal Chem 84(19):8284– 8293 10. Nicolas J, Romain A-C, Wiertz V, Maternova J, André P (2000) Using the classification model of an electronic nose to assign unknown malodours to environmental sources and to monitor them continuously. Sens Actuators B Chem 69(3):366–371 11. Di Natale C, Macagnano A, Martinelli E, Paolesse R, D’Arcangelo G, Roscioni C, FinazziAgro A, D’Amico A (2003) Lung cancer identification by the analysis of breath by means of an array of non-selective gas sensors. Biosens Bioelectron 18(10):1209–1218 12. 
Khalaf W, Pace C, Gaudioso M (2009) Least square regression method for estimating gas concentration in an electronic nose system. Sensors 9(3):1678–1691 13. Macías MM, Agudo JE, Manso AG, Orellana CJG, Velasco HMG, Caballero RG (2013) A compact and low cost electronic nose for aroma detection. Sensors 13(5):5528–5541. https:// doi.org/10.3390/s130505528 14. Arshak K, Moore E, Lyons GM, Harris J, Clifford S (2004) A review of gas sensors employed in electronic nose applications. Sensor Rev 24(2):181–198. https://doi.org/10.1108/026022804 10525977 15. Norman A, Stam F, Morrissey A, Hirschfelder M, Enderlein D (2003) Packaging effects of a novel explosion-proof gas sensor. Sens Actuators B Chem 95(1–3):287–290 16. Srivastava AK (2003) Detection of volatile organic compounds (VOCs) using SnO2 gas-sensor array and artificial neural network. Sens Actuators B Chem 96(1–2):24–37 17. Jeong G-M, Nghia NT, Choi S-I (2014) Pseudo optimization of e-nose data using region selection with feature feedback based on regularized linear discriminant analysis. Sensors 15(1):656–671 18. Gulbag A, Temurtas F (2006) A study on quantitative classification of binary gas mixture using neural networks and adaptive neuro-fuzzy inference systems. Sens Actuators B Chem 115(1):252–262 19. Pérez-Cruz F, Camps-Valls G, Soria-Olivas E, Pérez-Ruixo JJ, Figueiras-Vidal AR, ArtésRodríguez A (2002) Multi-dimensional function approximation and regression estimation. In: International conference on artificial neural networks, pp 757–762 20. Sunny, Kumar V, Mishra VN, Dwivedi R, Das RR (2015) Classification and quantification of binary mixtures of gases/odors using thick-film gas sensor array responses. IEEE Sens J 15(2):1252–1260. https://doi.org/10.1109/JSEN.2014.2361852 21. Mishra VN, Dwivedi R, Das RR (2013) Classification of gases/odors using dynamic responses of thick film gas sensor array. IEEE Sens J 13(12):4924–4930

A Novel Technique of Mixed Gas Identification Based on the Group …

653

22. Mishra VN, Dwivedi R, Das RR (2013) Quantification of individual gases/odors using dynamic responses of gas sensor array with ASM feature technique. IEEE Sens J 14(4):1006–1011
23. Llobet E, Brezmes J, Vilanova X, Sueiras JE, Correig X (1997) Qualitative and quantitative analysis of volatile organic compounds using transient and steady-state responses of a thick-film tin oxide gas sensor array. Sens Actuators B Chem 41(1–3):13–21
24. Nakamura M, Sugimoto I, Kuwano H (1997) Pattern recognition of dynamic chemical-sensor responses by using LVQ algorithm. In: 1997 IEEE international conference on systems, man, and cybernetics. Computational cybernetics and simulation, vol 4, pp 3036–3041
25. Sobanski T, Modrak I, Nitsch K, Licznerski BW (2005) Application of sensor dynamic response analysis to improve the accuracy of odour-measuring systems. Meas Sci Technol 17(1):1
26. Szczurek A, Maciejewska M (2009) Sensor array data profiling for gas identification. Talanta 78(3):840–845
27. Kumar R, Das RR, Mishra VN, Dwivedi R (2010) Wavelet coefficient trained neural network classifier for improvement in qualitative classification performance of oxygen-plasma treated thick film tin oxide sensor array exposed to different odors/gases. IEEE Sens J 11(4):1013–1018
28. Gutierrez-Osuna R (2002) Pattern analysis for machine olfaction: a review. IEEE Sens J 2(3):189–202
29. Alizadeh T, Zeynali S (2008) Electronic nose based on the polymer coated SAW sensors array for the warfare agent simulants classification. Sens Actuators B Chem 129(1):412–423
30. Siripatrawan U (2008) Rapid differentiation between E. coli and Salmonella typhimurium using metal oxide sensors integrated with pattern recognition. Sens Actuators B Chem 133(2):414–419
31. Lv P, Tang Z, Wei G, Yu J, Huang Z (2007) Recognizing indoor formaldehyde in binary gas mixtures with a micro gas sensor array and a neural network. Meas Sci Technol 18(9):2997
32. Mohamad-Saleh J, Hoyle BS (2008) Improved neural network performance using principal component analysis on Matlab, p 9
33. Hussain N, Rani P (2020) Comparative studied based on attack resilient and efficient protocol with intrusion detection system based on deep neural network for vehicular system security. In: Distributed artificial intelligence. CRC Press, Boca Raton, pp 217–236
34. Hussain N, Rani P, Chouhan H, Gaur US (2022) Cyber security and privacy of connected and automated vehicles (CAVs)-based federated learning: challenges, opportunities, and open issues. In: Federated learning for IoT applications. Springer, Berlin, pp 169–183
35. Rani P, Hussain N, Khan RAH, Sharma Y, Shukla PK (2021) Vehicular intelligence system: time-based vehicle next location prediction in software-defined internet of vehicles (SDNIOV) for the smart cities. In: Al-Turjman F, Nayyar A, Devi A, Shukla PK (eds) Intelligence of things: AI-IoT based critical-applications and innovations. Springer International Publishing, pp 35–54. http://doi.org/10.1007/978-3-030-82800-4_2
36. Fonollosa J, Rodríguez-Luján I, Trincavelli M, Vergara A, Huerta R (2014) Chemical discrimination in turbulent gas mixtures with mox sensors validated by gas chromatography-mass spectrometry. Sensors 14(10):19336–19353
37. Dataset (n.d.) https://archive.ics.uci.edu/ml/datasets/Gas+sensor+array+exposed+to+turbulent+gas+mixtures
38. Ivakhnenko AG (1971) Polynomial theory of complex systems. IEEE Trans Syst Man Cybern 4:364–378
39. Farlow SJ (2020) Self-organizing methods in modeling: GMDH type algorithms. CRC Press, Boca Raton
40. Iba H, deGaris H, Sato T (1995) A numerical approach to genetic programming for system identification. Evol Comput 3(4):417–452
41. Nariman-Zadeh N, Darvizeh A, Jamali A, Moeini A (2005) Evolutionary design of generalized polynomial neural networks for modelling and prediction of explosive forming process. J Mater Process Technol 164:1561–1571
42. Mulashani AK, Shen C, Nkurlu BM, Mkono CN, Kawamala M (2022) Enhanced group method of data handling (GMDH) for permeability prediction based on the modified Levenberg Marquardt technique from well log data. Energy 239:121915. https://doi.org/10.1016/j.energy.2021.121915


G. Ansari et al.

43. Pereira IM, Moraes DA (2021) Monitoring system for an experimental facility using GMDH methodology. Braz J Radiat Sci 8(3B). http://doi.org/10.15392/bjrs.v8i3B.663
44. Aliouane L, Ouadfeul S-A, Djarfour N, Boudella A (2014) Permeability prediction using artificial neural networks. A comparative study between back propagation and Levenberg–Marquardt learning algorithms. In: Pardo-Igúzquiza E, Guardiola-Albert C, Heredia J, Moreno-Merino L, Durán JJ, Vargas-Guzmán JA (eds) Mathematics of planet earth. Springer, Berlin, pp 653–657. http://doi.org/10.1007/978-3-642-32408-6_142
45. Asante-Okyere S, Shen C, Yevenyo Ziggah Y, Moses Rulegeya M, Zhu X (2018) Investigating the predictive performance of Gaussian process regression in evaluating reservoir porosity and permeability. Energies 11(12):3261. https://doi.org/10.3390/en11123261
46. Elkatatny S, Mahmoud M, Tariq Z, Abdulraheem A (2018) New insights into the prediction of heterogeneous carbonate reservoir permeability from well logs using artificial intelligence network. Neural Comput Appl 30(9):2673–2683. https://doi.org/10.1007/s00521-017-2850-x
47. Liang M, Zheng B, Zheng Y, Zhao R (2021) A two-step accelerated Levenberg–Marquardt method for solving multilinear systems in tensor-train format. J Comput Appl Math 382:113069. https://doi.org/10.1016/j.cam.2020.113069

Software Fault Diagnosis via Intelligent Data Mining Algorithms

Rohan Khurana, Shivani Batra, and Vineet Sharma

Abstract Faults are common in every phase of software development. They make a system susceptible to failure and so degrade the overall quality of the software. Many fault-finding tools and methodologies are available to developers, testers and end-users, but none of them is viable and cost-effective enough for complex software-based computational systems. This paper discusses the specifics of fault diagnosis and of software-based fault detection techniques. It proposes a method to implement bebugging using data mining algorithms while estimating the location and characteristics of the faults present in the computational system. The paper also presents a detailed analysis that compares the seven performance metrics evaluated for each data mining algorithm, in order to identify the most effective software fault detection approach.
Keywords Software fault injection · Fault detection · Data mining · Classification

1 Introduction
Developing a fault-free hardware or software system is one of the most difficult tasks today. In fact, a system developer cannot guarantee that the software will run correctly. During development, a fault tolerance mechanism [1] is built into the computational system to handle the many faults that can arise. Their causes include syntax errors, semantic errors, improper manufacture or misuse of components, missing function declarations or other missing code, incorrect condition checks and execution errors. Faults can become part of the system and result in

R. Khurana (B) · S. Batra · V. Sharma KIET Group of Institutions, Delhi-NCR, Ghaziabad 201206, India e-mail: [email protected] S. Batra e-mail: [email protected] V. Sharma e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_56


R. Khurana et al.

a deviation from the system specification. In some cases, the system can still function correctly in the presence of faults. In computer science, demand for software quality assurance has increased rapidly. Issues that arise in the system are tied to the testing phase, and that phase has become a major concern. The ability to measure software faults [2], and to improve the overall effectiveness of both the testing phase and the computational system, has therefore become essential. Estimating faults early supports development activities. So does monitoring the system to identify faults and to recognize each fault's type and location, so that the best possible system can be delivered. Fault detection [3] is a time-consuming and slow process, but it is necessary in the long run. The problem originates in fault detection and isolation, a subfield of control engineering.

During development, a system can be designed with its own fault-handling mechanism, known as a fault tolerance mechanism (FTM) [1, 4]. This mechanism allows the system to operate correctly even when faults are present in a specific component. During the testing phase, these faults behave differently under different sets of test cases, and some can be identified and removed during user acceptance testing (UAT). Others cannot be identified by testing and remain in the system, where they still need to be identified and removed. Such latent faults can be detected with strategies such as fault seeding, mutation testing, fault injection [5], fault-tree analysis and spectrum analysis. Even so, detecting faults in a system remains a difficult task.
Faults that occur within a system can generally be characterized using software metrics combined with classification-based machine learning procedures. Designing a quality estimation model is not easy, however, because a faulty version of the system may degrade its performance and quality. Classification algorithms are therefore needed to recover from such faults and to preserve the quality and efficiency of the system.

Classification is the task of categorizing items and labeling them with predefined classes. A classification task can be represented as a collection of records, each associated with a class, as shown in Fig. 1. Each record consists of a set of input attributes, denoted x (a sample from the dataset), which is mapped to an output class label, denoted y. Many classification algorithms have been proposed and can be used to generate relevant results. They include decision trees [6], Naive Bayes [7], Bayesian belief networks (BBN) [8], rule-based methods [9], memory-based reasoning [9], neural networks and support vector machines (SVM) [10].
Fig. 1 Representation of classification mapping with the set of input attributes and its output class label
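Memory-based reasoning, one of the approaches listed above, gives a compact illustration of the x to y mapping in Fig. 1. The Python sketch below is a minimal k-nearest-neighbour classifier, not a method evaluated in this paper. The two attributes (size in KLOC and cyclomatic complexity) and the `clean`/`faulty` labels are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# A record pairs an attribute set x with its class label y.
Record = Tuple[Sequence[float], str]


def knn_classify(training: List[Record], x: Sequence[float], k: int = 3) -> str:
    """Map attribute set x to a class label y by majority vote among
    the k training records closest to x (Euclidean distance)."""
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical records: (size in KLOC, cyclomatic complexity) -> label.
TRAINING: List[Record] = [
    ((1.0, 2.0), "clean"),
    ((1.5, 1.8), "clean"),
    ((5.0, 8.0), "faulty"),
    ((6.0, 9.0), "faulty"),
    ((5.5, 7.5), "faulty"),
]

print(knn_classify(TRAINING, (1.2, 2.1)))  # clean
print(knn_classify(TRAINING, (5.2, 8.1)))  # faulty
```

Each prediction reuses the stored records directly, which is why this family is called memory-based. The classifiers actually compared in this paper (J48, Naïve Bayes and SVM) instead build an explicit model during training.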


2 Related Work
This section gives an overview of how software faults [2] can be injected into a software system, and of how introducing such faults helps to improve software reliability for many working models. Faults can be inserted into a system in many ways, some of which are described below. The section is essentially a survey of software fault detection, describing existing approaches, tools and typical scenarios relevant to it.

2.1 Software Fault Seeding
Software fault seeding, or error seeding [11], is a procedure developed by Gerald M. Weinberg in the 1970s to determine test coverage by deliberately modifying the program code. Fault seeding [12, 13] plants a number of known faults in the system without the testers' knowledge. Testers can then locate the other faults residing in the system, and test coverage can be measured. This method can improve the effectiveness of test cases and make the system more reliable [14]. Software fault seeding is also known as bebugging [11].

According to Pfleeger, the simple, intuitive idea behind fault seeding was introduced by Harlan Mills in an unpublished paper around the 1970s. Suppose the σ seeded faults are of the same nature as the ρ latent original faults, and are therefore equally likely to be exposed. Then the relationship in Eq. (1) holds.

$$\frac{\mu}{\rho} = \frac{\alpha}{\sigma} \;\Rightarrow\; \rho = \frac{\sigma\,\mu}{\alpha} \qquad (1)$$

Here, for the program code, σ is the number of seeded faults, ρ is the number of original faults (to be estimated), α is the number of detected seeded faults, and μ is the number of detected original faults. Equation (1) relates the detected seeded faults (α) to the number of actual faults (ρ) to be found.
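Equation (1) translates directly into code. The minimal Python sketch below rests on the same assumption as Mills' argument: seeded and original faults are equally likely to be detected. The function names and the numbers in the example are hypothetical.

```python
def estimate_original_faults(seeded_total: int, seeded_found: int, original_found: int) -> float:
    """Estimate rho, the total number of original faults, from Eq. (1):
    rho = seeded_total * mu / alpha, where alpha = seeded_found and
    mu = original_found. Assumes seeded and original faults are equally
    likely to be detected."""
    if seeded_found <= 0:
        raise ValueError("at least one seeded fault must be detected")
    if seeded_found > seeded_total:
        raise ValueError("more seeded faults detected than were seeded")
    return seeded_total * original_found / seeded_found


def estimate_remaining_faults(seeded_total: int, seeded_found: int, original_found: int) -> float:
    """Original faults estimated to be still latent after testing (rho - mu)."""
    return estimate_original_faults(seeded_total, seeded_found, original_found) - original_found


# Hypothetical campaign: 20 faults seeded, 15 of them found, 12 original faults found.
print(estimate_original_faults(20, 15, 12))   # 16.0
print(estimate_remaining_faults(20, 15, 12))  # 4.0
```

Finding 15 of the 20 seeded faults suggests that testing exposes about three quarters of all faults. The 12 original faults found are therefore taken to be about three quarters of roughly 16, which leaves about 4 still undetected.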

2.2 Mutation Testing
Mutation testing is a well-known technique used to improve software quality during the development phase. It modifies the program code in small ways. Its aim is to extend test coverage so that faults can be detected. In essence, the method estimates the efficacy of test cases by


running them against versions of the program code in which a minor faulty change has been seeded, in order to look for other anomalies residing in the system. This type of testing can be performed once the system is able to deliver all of its specifications. It is used to design new tests that extend the coverage of the software system, and also to evaluate the existing quality of the system. Faults introduced into a software system by manual changes are referred to as hand-seeded faults. Faults introduced by automatic changes are referred to as mutants. In experimental studies, mutants can be used to estimate how well a test suite identifies faults, and the results for automatically introduced changes can be compared with those for manual changes. Moreover, mutation testing offers a significant advantage for software reliability.
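The idea of "killing" mutants can be shown on a toy program. In the illustrative Python sketch below, the function `is_adult`, its three hand-written mutants and both test suites are hypothetical. The mutation score is the fraction of mutants that fail at least one test.

```python
from typing import Callable, List, Tuple

Case = Tuple[int, bool]  # (input, expected output)


def is_adult(age: int) -> bool:
    """Original program under test."""
    return age >= 18


# Hand-written mutants, each differing from the original by one small change.
def mutant_gt(age: int) -> bool:     # '>=' changed to '>'
    return age > 18


def mutant_le(age: int) -> bool:     # '>=' changed to '<='
    return age <= 18


def mutant_const(age: int) -> bool:  # constant 18 changed to 19
    return age >= 19


def passes(program: Callable[[int], bool], tests: List[Case]) -> bool:
    """True if the program returns the expected output for every case."""
    return all(program(x) == expected for x, expected in tests)


def mutation_score(mutants: List[Callable[[int], bool]], tests: List[Case]) -> float:
    """Fraction of mutants killed, i.e. failing at least one test."""
    killed = sum(1 for m in mutants if not passes(m, tests))
    return killed / len(mutants)


MUTANTS = [mutant_gt, mutant_le, mutant_const]
weak_suite: List[Case] = [(30, True), (5, False)]
strong_suite: List[Case] = weak_suite + [(18, True)]  # adds the boundary case

print(passes(is_adult, strong_suite))                 # True: original passes
print(round(mutation_score(MUTANTS, weak_suite), 2))  # 0.33: two mutants survive
print(mutation_score(MUTANTS, strong_suite))          # 1.0: all mutants killed
```

The weak suite lets two mutants survive. Adding the boundary input 18 kills them both, which is the sense in which mutation testing guides the design of new tests.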

2.3 Software Fault Injection
Software fault injection [5, 15, 16] is a three-step procedure. Faults and anomalies are injected into a software system, the system's performance is evaluated in their presence, and the locations of other faults and anomalies that exist within the system are then detected. Fault injection [17, 18] is performed to quantify the efficacy of the system's fault tolerance mechanism (FTM) [19] and to manage the behavior of existing anomalies. After the faults have been injected, the other faults in the system are detected and recovered. The system is then restored so that it performs its intended functions correctly. The process requires only the introduction of a fault, with minimal changes to the original system code. Fault injection [20] is slow, but it changes the system with little effort, after which the system can be deployed on its own.

As discussed by Arlat in [5, 21], validating the fidelity of fault-tolerant systems requires controlled experiments. In these experiments, the system's behavior in the presence of faults is induced explicitly by the scripted injection of faults. Fault injection techniques [22] can be divided into two categories: invasive and non-invasive. The key difficulty when handling complex systems is eliminating the traces left by the testing mechanisms. Invasive fault injection methods leave traces whenever they are run during the testing phase. Non-invasive techniques, on the other hand, mask their presence, so they affect only the injected faults and not the whole system. The faulty version of the system introduces anomalies at particular sites.

As depicted in Fig. 2, the procedure begins with the source files. These are passed through a preprocessor that analyzes them to construct an abstract syntax tree. The abstract syntax tree then helps identify the sites where a given type of fault can be introduced. This produces the faulty files (files containing the faulty code) and turns the complete system into a faulty system (a faulty variant of the target system).


Fig. 2 Representation of fault injection in a target system
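The abstract-syntax-tree flow of Fig. 2 can be sketched with Python's standard `ast` module. The sketch parses the source into an abstract syntax tree, locates an injection site, mutates it and regenerates the faulty source. It is a toy illustration, not the tool used in the surveyed work. The fault type (turning the first `<` into `<=`, an off-by-one error) and the target function `find_index` are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

ORIGINAL = '''
def find_index(items, target):
    i = 0
    while i < len(items):
        if items[i] == target:
            return i
        i += 1
    return -1
'''


class OffByOneInjector(ast.NodeTransformer):
    """Injection site: the first '<' comparison; fault: replace it with '<='."""

    def __init__(self) -> None:
        self.injected = False

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        if not self.injected:
            for i, op in enumerate(node.ops):
                if isinstance(op, ast.Lt):
                    node.ops[i] = ast.LtE()
                    self.injected = True
                    break
        return node


def inject_fault(source: str) -> str:
    tree = ast.parse(source)               # preprocessing: build the AST
    tree = OffByOneInjector().visit(tree)  # locate a site and inject the fault
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)               # emit the faulty source file


def load(source: str):
    """Execute a source string and return its find_index function."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["find_index"]


original = load(ORIGINAL)
faulty = load(inject_fault(ORIGINAL))
print(original([1, 2, 3], 2), faulty([1, 2, 3], 2))  # 1 1: the fault stays latent
try:
    faulty([1, 2, 3], 4)
except IndexError:
    print("injected fault exposed by a test with a missing target")
```

The injected fault is invisible to a test whose target is present and surfaces only when the loop runs past the end of the list. This mirrors the point above that some faults survive ordinary testing.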

3 Proposed Method
The authors examined a wide body of research on fault detection. They found that the chief concern today is detecting software faults with good efficacy and performance without affecting the system. Many fault detection tools and techniques have been designed to detect faults and to assess the behavior of the system before or after the faults are removed, in order to evaluate its efficiency and performance. The proposed model is described in seven steps:
1. Create a GUI through which the faulty data are uploaded and the results are displayed after analysis.
2. Upload the faulty datasets. Two faulty datasets are used, the Eclipse IDE faulty dataset and the NetBeans IDE faulty dataset, both represented in ARFF format in the GUI.
3. Set up three supervised classification algorithms for comparison: support vector machine (SVM), Naïve Bayes classifier (NBC) and J48 decision tree (J48DT). Their comparison is used to analyze the effectiveness of the proposed approach.
4. Train the classification algorithms on the faulty datasets using labeled examples.
5. Generate the classifier produced by each data mining algorithm, then compute the trainer's evaluation measures.
6. Evaluate and compare the results of the classification algorithms using seven metrics: FP rate, specificity, precision, recall, F1-measure, receiver operating characteristic (ROC) area and class.
7. Use these metrics to determine which of the three algorithms is most effective. The detected faults are then removed, so that the system exhibits better execution quality.
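Step 2 assumes the datasets are stored in ARFF, the attribute-relation file format used by Weka, which provides the J48 classifier. The sketch below is a minimal reader that uses only the Python standard library. It handles only a simple subset of ARFF, and the sample relation and attributes are hypothetical rather than the actual Eclipse or NetBeans schemas.

```python
from typing import List, Tuple


def parse_arff(text: str) -> Tuple[str, List[str], List[List[str]]]:
    """Return (relation name, attribute names, data rows) from ARFF text.

    Supports only @relation, @attribute, @data and dense comma-separated
    rows; quoted values and sparse rows are not handled.
    """
    relation, attributes, rows = "", [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or ARFF comment
            continue
        keyword = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif keyword.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif keyword.startswith("@attribute"):
            attributes.append(line.split(None, 2)[1])
        elif keyword.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """% hypothetical fragment of a faulty-module dataset
@relation faults
@attribute loc numeric
@attribute complexity numeric
@attribute class {clean,faulty}
@data
120,4,clean
860,21,faulty
"""

relation, attributes, rows = parse_arff(SAMPLE)
X = [[float(v) for v in row[:-1]] for row in rows]  # input attributes x
y = [row[-1] for row in rows]                       # class labels y
print(relation, attributes)  # faults ['loc', 'complexity', 'class']
print(X, y)                  # [[120.0, 4.0], [860.0, 21.0]] ['clean', 'faulty']
```

Splitting each row into numeric inputs and a trailing class label gives exactly the (x, y) records described in Fig. 1, ready for training in step 4.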

4 Experiment and Results
In this paper, the authors employ classification algorithms as a tool for identifying and isolating software faults in the system. This approach may be useful for enhancing the


performance of the software system and for improving the test coverage and reliability [14] of the software. Most software faults and anomalies are found in the testing phase. However, some anomalies may be neither handled by the fault tolerance mechanism [23] nor detected during testing. Data mining algorithms [24] must then be applied to locate the remaining faults and to evaluate system performance in their presence. The performance is then compared after the locations of the remaining faults have been detected. Three supervised classification algorithms are used here to detect software faults: the J48 decision tree (an open-source Java implementation of the C4.5 algorithm), the Naïve Bayes classifier (NBC) and the support vector machine (SVM). These algorithms are described below:
• J48DT (Decision Tree): A supervised, predictive machine learning model for data mining based on the decision tree. It predicts the target value of a new sample from the observed values of its feature attributes. A decision tree is a chain of tests, each asking a question about an attribute, that is used to classify a given element.
• Naïve Bayes: A simple probabilistic classifier that applies Bayes' theorem under the assumption that features are conditionally independent given the class. It is widely used for text documents. A document can be represented by the presence or absence of specific words, or, in a variant that explicitly models word counts, by how often each word occurs.
• Support Vector Machine: A supervised machine learning method that can be applied to both linear and nonlinear data; in its basic form it is applied to binary classification datasets. SVM uses mathematical functions called kernels to map the data into a higher-dimensional space in which the classes are easier to separate linearly. Kernel functions include linear, quadratic (polynomial), sigmoid and Gaussian radial basis functions. The algorithm analyzes the data to identify patterns and uses the result to make predictions.
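The sketch below makes the kernel idea concrete. It implements the kernel functions named in the SVM description in plain Python and checks the "higher dimensions" claim for the quadratic case. With the constant term set to zero, the degree-2 polynomial kernel on 2-D inputs equals an ordinary dot product in an explicit 3-D feature space. The parameter defaults (`gamma`, `alpha`, `c`) are illustrative assumptions, not values from the paper.

```python
import math
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b)


def polynomial_kernel(a, b, degree: int = 2, c: float = 1.0) -> float:
    # degree=2 is the "quadratic" kernel mentioned in the text
    return (dot(a, b) + c) ** degree


def rbf_kernel(a, b, gamma: float = 0.5) -> float:
    # Gaussian radial basis function: exp(-gamma * ||a - b||^2)
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def sigmoid_kernel(a, b, alpha: float = 0.1, c: float = 0.0) -> float:
    return math.tanh(alpha * dot(a, b) + c)


def quadratic_feature_map(v: Tuple[float, float]) -> Tuple[float, float, float]:
    """Explicit 3-D map whose dot product equals the quadratic kernel with c = 0."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


a, b = (1.0, 2.0), (3.0, 4.0)
print(polynomial_kernel(a, b, degree=2, c=0.0))                 # 121.0
print(dot(quadratic_feature_map(a), quadratic_feature_map(b)))  # approximately 121, computed in 3-D
print(rbf_kernel(a, a), linear_kernel(a, b))                    # 1.0 11.0
```

The kernel computes the 3-D dot product without ever building the 3-D vectors. That shortcut is what lets an SVM work in high-dimensional feature spaces cheaply.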

4.1 Evaluation Metrics
The metrics used for evaluation in this paper are defined in terms of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), the outcome categories predicted by the model.

FP Rate (FPR): The false positive rate, also termed fall-out, is calculated as in Eq. (2).

$$\mathrm{FPR} = \frac{FP}{TN + FP} \qquad (2)$$


Specificity (Spec): Also called the true negative rate or negative recall, specificity is calculated as in Eq. (3).

$$\mathrm{Spec} = \frac{TN}{TN + FP} = 1 - \mathrm{FPR} \qquad (3)$$

Precision (Pre): Precision is the proportion of samples classified as a given class that actually belong to that class. It is also called the positive predictive value and is calculated as in Eq. (4).

$$\mathrm{Pre} = \frac{TP}{TP + FP} \qquad (4)$$

Recall (Rec): Recall is the proportion of samples that actually belong to a class and are correctly classified as that class. It is calculated as in Eq. (5).

$$\mathrm{Rec} = \frac{TP}{TP + FN} \qquad (5)$$

F1-Measure (F): This metric combines precision and recall into a single measure. It is the harmonic mean (HM) of the positive predictive value (PPV, precision) and the true positive rate (TPR, recall), and is calculated as in Eq. (6).

$$F = \frac{2 \times \mathrm{Pre} \times \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}} = \frac{2 \times TP}{2 \times TP + FP + FN} \qquad (6)$$
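Equations (2)–(6) depend only on the four outcome counts, so they can all be computed from one confusion matrix. The Python sketch below does this with hypothetical counts. Recall is computed as TP/(TP + FN), the standard true positive rate. Zero denominators, such as a class with no predicted positives, are not handled.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for one class treated as 'positive'."""
    tp: int
    fp: int
    tn: int
    fn: int

    def fpr(self) -> float:          # Eq. (2): fall-out
        return self.fp / (self.tn + self.fp)

    def specificity(self) -> float:  # Eq. (3): TN / (TN + FP) = 1 - FPR
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:    # Eq. (4): positive predictive value
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:       # Eq. (5): true positive rate
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:           # Eq. (6): harmonic mean of precision and recall
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


counts = ConfusionCounts(tp=40, fp=10, tn=30, fn=20)  # hypothetical counts
print(counts.fpr(), counts.specificity(), counts.precision())  # 0.25 0.75 0.8
print(round(counts.recall(), 3), round(counts.f1(), 3))        # 0.667 0.727
```

The count form of F1 on the right of Eq. (6) agrees with the harmonic mean of precision (0.8) and recall (about 0.667). Both give 80/110, about 0.727.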

Class: This attribute records the category of the samples, either minor or major, or C when the samples combine both categories. ROC Area: The receiver operating characteristic (ROC) curve represents the performance of a binary classifier across decision thresholds. The area under it summarizes how well the classifier discriminates between the two classes, and it is commonly used to assess diagnostic accuracy.
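The ROC area can be computed without drawing the curve. It equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, counting ties as one half (the Mann–Whitney formulation). The minimal Python sketch below compares every positive with every negative, which is adequate for small samples. The scores and labels are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve: fraction of (positive, negative) pairs in
    which the positive sample is scored higher (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
print(roc_auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # 0.5
```

A classifier that assigns every sample the same score gets exactly 0.5, the value that corresponds to chance-level discrimination.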

4.2 Dataset Summary
NetBeans and Eclipse are integrated development environments (IDEs), in which program code is written and executed. This work uses datasets built from their test cases. The aim is to identify software faults using the NetBeans and Eclipse test-case datasets, which are provided in ARFF (.arff) format. The dataset is applied to the segment program, which is generated after segment.java has processed the dataset. The main purpose of using the NetBeans and Eclipse datasets is to identify the scenarios in


which software faults occur in an IDE. The faults can then be isolated from one another and removed from the software system, after which the system is recovered.

4.3 Findings and Discussion
The performance analysis relies on metrics that help evaluate the fault detection method based on the three classification algorithms (support vector machine, J48 decision tree and Naïve Bayes), with results reported separately for the minor and major class. These metrics are used to compare the results of the three algorithms. The main goals of this research are to identify software faults in the system and to measure system performance. The authors therefore treated two test cases (the Eclipse IDE test case and the NetBeans IDE test case) as systems. They applied the three classification algorithms to each test case and compared the results to determine which algorithm performs best. Performance is measured in two categories, for the minor class and for the major class, and both are evaluated on the two test cases.

Table 1 shows the minor-class performance on the Eclipse Faulty dataset. Six parameters (FP rate, specificity, precision, recall, F1-measure and receiver operating characteristic area) are evaluated for all three algorithms. Here, support vector machine (SVM) produces the best outcome, reaching unity (1.0) for recall. The J48 decision tree gives the lowest recall and F1-measure of the three algorithms. For easier observation, this minor-class performance is plotted in Fig. 3a, which shows that the SVM classifier gives the best outcome in categorizing the faults in the software system.

Similarly, Table 3 shows the minor-class performance on the NetBeans Faulty dataset, with all six parameters calculated for the three algorithms in order to compare them and find the best result. Over the whole evaluation, SVM again gives the best result (approximately 0.8) and the J48 decision tree the lowest (approximately 0.6), as plotted in Fig. 4a. Comparing the minor-class performance of all three algorithms (support vector machine, Naïve Bayes classifier and J48 decision tree), it is concluded that SVM gives the best result.

Likewise, Table 2 reports the major-class performance on the Eclipse IDE Faulty dataset, with all six parameters calculated for Naïve Bayes, support vector machine and J48 decision tree. Naïve Bayes gives the worst result, with zero precision, recall and F1-measure. SVM comes closest to the best result of the three and achieves the highest precision (0.82) on the Eclipse Faulty dataset. This major-class performance is plotted in Fig. 3b for comparison.

The major-class performance on the NetBeans Faulty dataset is also calculated for all six parameters with the three algorithms, and the results are shown in Table 4. SVM and Naïve Bayes both give approximately the best results (about 0.55 and 0.6), but SVM achieves a higher precision (0.804) than Naïve Bayes (0.797). This major-class performance is plotted for all three algorithms in Fig. 4b.

Table 1 Minor class performance of Eclipse Faulty dataset
Parameters     SVM     Naïve Bayes    J48 decision tree
FP rate        1       0.538          0.628
Specificity    0       0.462          0.372
Precision      0.71    0.812          0.742
Recall         1       0.946          0.738
F1-measure     0.83    0.874          0.74
ROC area       0.5     0.837          0.552

Fig. 3 Graphical representation of performance evaluated on Eclipse Faulty dataset: (a) minor class performance, (b) major class performance (value of each performance parameter for SVM, Naïve Bayes and J48 decision tree)

Fig. 4 Graphical representation of performance evaluated on NetBeans Faulty dataset: (a) minor class performance, (b) major class performance (value of each performance parameter for SVM, Naïve Bayes and J48 decision tree)

Table 2 Major class performance of Eclipse Faulty dataset
Parameters     SVM     Naïve Bayes    J48 decision tree
FP rate        0.110   0              0.262
Specificity    0.890   1              0.738
Precision      0.820   0              0.362
Recall         0.501   0              0.372
F1-measure     0.622   0              0.37
ROC area       0.696   0.5            0.552
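The F1-measure and specificity values in Table 1 can be cross-checked against the other reported columns using Eqs. (3) and (6). F1 should equal the harmonic mean of precision and recall, and specificity should equal 1 − FPR. The short Python check below uses values copied from Table 1, with J48 decision tree abbreviated to J48.

```python
# Minor-class values for the Eclipse Faulty dataset, copied from Table 1.
TABLE_1 = {
    "SVM":         {"fpr": 1.0,   "spec": 0.0,   "pre": 0.71,  "rec": 1.0,   "f1": 0.83},
    "Naive Bayes": {"fpr": 0.538, "spec": 0.462, "pre": 0.812, "rec": 0.946, "f1": 0.874},
    "J48":         {"fpr": 0.628, "spec": 0.372, "pre": 0.742, "rec": 0.738, "f1": 0.74},
}


def harmonic_f1(pre: float, rec: float) -> float:
    """F1-measure as the harmonic mean of precision and recall (Eq. 6)."""
    return 2 * pre * rec / (pre + rec)


for name, row in TABLE_1.items():
    f1 = harmonic_f1(row["pre"], row["rec"])
    print(f"{name}: F1 recomputed {f1:.3f}, reported {row['f1']}; "
          f"1 - FPR = {1 - row['fpr']:.3f}, reported specificity {row['spec']}")
```

All three rows agree to within rounding, which indicates that the columns of Table 1 are internally consistent.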

Table 3 Minor class performance of NetBeans Faulty dataset
Parameters     SVM     Naïve Bayes    J48 decision tree
FP rate        0.456   0.259          0.389
Specificity    0.544   0.741          0.611
Precision      0.655   0.758          0.602
Recall         0.868   0.811          0.589
F1-measure     0.747   0.784          0.595
ROC area       0.706   0.863          0.608

Table 4 Major class performance of NetBeans Faulty dataset
Parameters     SVM     Naïve Bayes    J48 decision tree
FP rate        0.132   0.189          0.411
Specificity    0.868   0.811          0.589
Precision      0.804   0.797          0.598
Recall         0.544   0.741          0.611
F1-measure     0.649   0.768          0.6
ROC area       0.706   0.863          0.608
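Tables 3 and 4 describe the same binary classifiers with the positive class swapped. Treating the other class as positive exchanges TP with TN and FP with FN, so the recall of one class equals the specificity of the other. The sketch below shows this with hypothetical counts, then checks it against the values reported in Tables 3 and 4 (J48 decision tree abbreviated to J48).

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int, int]  # (tp, fp, tn, fn)


def rates(c: Counts) -> Dict[str, float]:
    tp, fp, tn, fn = c
    return {"recall": tp / (tp + fn), "specificity": tn / (tn + fp)}


def swap_positive_class(c: Counts) -> Counts:
    """Re-express a binary confusion matrix with the other class as positive:
    TP and TN trade places, and so do FP and FN."""
    tp, fp, tn, fn = c
    return (tn, fn, tp, fp)


minor = (33, 19, 40, 8)  # hypothetical counts, minor class positive
major = swap_positive_class(minor)
print(rates(minor))  # recall 33/41, specificity 40/59
print(rates(major))  # recall 40/59, specificity 33/41

# The same relation holds between the reported NetBeans tables.
TABLE_3_SPEC = {"SVM": 0.544, "Naive Bayes": 0.741, "J48": 0.611}
TABLE_4_RECALL = {"SVM": 0.544, "Naive Bayes": 0.741, "J48": 0.611}
TABLE_3_RECALL = {"SVM": 0.868, "Naive Bayes": 0.811, "J48": 0.589}
TABLE_4_SPEC = {"SVM": 0.868, "Naive Bayes": 0.811, "J48": 0.589}
print(TABLE_3_SPEC == TABLE_4_RECALL and TABLE_3_RECALL == TABLE_4_SPEC)  # True
```

This also explains why the ROC area rows of Tables 3 and 4 are identical. Swapping the positive class relabels the same curve without changing the area under it.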

In all cases, that is, for minor-class and major-class performance on both the Eclipse and NetBeans Faulty datasets, SVM was observed to give the most relevant results. The authors therefore conclude that the supervised SVM algorithm gives the best results on both datasets. It is thus the best of the three for identifying software faults in the system while maintaining system performance.

5 Conclusion
This paper presents fault detection techniques built on classification-based data mining algorithms (support vector machine, Naïve Bayes classifier and J48 decision tree). They are applied to two faulty datasets, the NetBeans IDE Faulty dataset and the Eclipse IDE Faulty dataset. All three algorithms produce satisfactory results, detecting faults and eliminating them from the software system while maintaining its efficiency and performance. One algorithm, however, produces more appropriate and effective results: the support vector machine, which shows approximately 70% improvement in results over the other two algorithms. The support vector machine is therefore found to be the more accurate and efficient algorithm for fault detection, as it shows its


effectiveness over the other two algorithms on the specific set of parameters used. Moreover, it gives high test coverage and enhances system performance and behavior both during and after testing.

References
1. Duraes JA, Madeira HS (2006) Emulation of software faults: a field data study and a practical approach. IEEE Trans Softw Eng 32:849–867. https://doi.org/10.1109/TSE.2006.113
2. Iannillo AK (2014) A fault injection tool for java software applications. Ph.D. thesis
3. Chakkor S, Baghouri M, Hajraoui A (2015) High resolution identification of wind turbine faults based on optimized ESPRIT algorithm. Int J Image Graph Signal Process 7:32–41
4. Randell B (1975) System structure for software fault tolerance. IEEE Trans Softw Eng SE-1:220–232. http://doi.org/10.1109/TSE.1975.6312842
5. Natella R, Cotroneo D, Duraes J, Madeira H (2013) On fault representativeness of software fault injection. IEEE Trans Softw Eng 39:80–96. https://doi.org/10.1109/TSE.2011.124
6. Rathore S, Kumar S (2016) A decision tree regression based approach for the number of software faults prediction. ACM SIGSOFT Softw Eng Notes 41:1–6. https://doi.org/10.1145/2853073.2853083
7. Kaviani P, Dhotre S (2017) Short survey on Naive Bayes algorithm. Int J Adv Res Comput Sci Manag 4
8. Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29:131–163. https://doi.org/10.1023/A:1007465528199
9. Fürnkranz J (2013) Rule-based methods. In: Dubitzky W, Wolkenhauer O, Cho K-H, Yokota H (eds) Encyclopedia of systems biology. Springer, New York, pp 1883–1888
10. Awad M, Khanna R (2015) Support vector machines for classification. In: Efficient learning machines: theories, concepts, and applications for engineers and system designers. Apress, Berkeley, CA, pp 39–66
11. Pasquini A, De Agostino E (1995) Fault seeding for software reliability model validation. Control Eng Pract 3:993–999
12. Offutt AJ, Hayes JH (1996) A semantic model of program faults. SIGSOFT Softw Eng Notes 21:195–200. https://doi.org/10.1145/226295.226317
13. Tamak J (2013) A review of fault detection techniques to detect faults and improve the reliability in web applications
14. Lyu MR (2007) Software reliability engineering: a roadmap. In: 2007 future of software engineering. IEEE Computer Society, USA, pp 153–170
15. Hsueh M-C, Tsai T, Iyer R (1997) Fault injection techniques and tools. Computer 30:75–82. https://doi.org/10.1109/2.585157
16. Voas JM (2000) A tutorial on software fault injection
17. Lemos R (2005) Architecting dependable systems III. Springer, Berlin
18. Bieman JM, Dreilinger D, Lin L (1996) Using fault injection to increase software test coverage. In: Proceedings of ISSRE ’96: 7th international symposium on software reliability engineering, pp 166–174
19. Randell B (2003) System structure for software fault tolerance. ACM SIGPLAN Not 10:437–449. https://doi.org/10.1145/390016.808467
20. Gupta AK, Armstrong JR (1985) Functional fault modeling and simulation for VLSI devices. In: Proceedings of the 22nd ACM/IEEE design automation conference. IEEE Press, Las Vegas, Nevada, USA, pp 720–726
21. Ziade H, Ayoubi R, Velazco R (2004) A survey on fault injection techniques. Int Arab J Inf Technol 1:171–186


22. Umadevi KS, Rajakumari S (2015) A review on software fault injection methods and tools. Int J Innov Res Comput Commun Eng 3:1582–1587
23. Avizienis A (1986) The N-version approach to fault-tolerant software. IEEE Trans Softw Eng 11:1491–1501. http://doi.org/10.1109/TSE.1985.231893
24. Goebel M, Gruenwald L (1999) A survey of data mining and knowledge discovery software tools. SIGKDD Explor Newsl 1:20–33. https://doi.org/10.1145/846170.846172

Face Mask Detection Using MobileNetV2 and VGG16

Ujjwal Kumar, Deepak Arora, and Puneet Sharma

Abstract The COVID-19 situation is not over yet. New strains of the coronavirus continue to affect the population, and strains such as Omicron and Deltacron still pose a threat to society. It remains essential to keep ourselves safe. To prevent the spread of COVID, governments around the world have recommended precautions such as maintaining a distance of 1 m, using hand sanitizer and always wearing a mask. A new COVID variant was reported by the WHO on November 28, 2021. It was first designated B.1.1.529 and then named Omicron, and a hybrid variant of Delta and Omicron was reported later. These variants are affecting a large population and the struggle continues, so in the current scenario the coronavirus may affect people for a few more years. With that in mind, face mask detection software has been developed to tell whether or not a person is wearing a mask. This project pursues the same objective using two different technologies, MobileNetV2 and VGG16, so that a detailed comparison can be made. The comparison shows which of the two performs better, so that users can choose according to their needs. This paper draws on machine learning and deep learning, using Python libraries such as OpenCV and TensorFlow with Keras together with the MobileNetV2 and VGG16 models. The main aim of this project is to detect a face and identify whether the person is wearing a mask, then to compare the two technologies and analyze the results.
Keywords Deep learning · CNN · Face mask detection · MobileNetV2 · VGG16

U. Kumar (B) · D. Arora Department of Computer Science and Engineering, Amity University Lucknow Campus, Lucknow, Uttar Pradesh, India e-mail: [email protected] D. Arora e-mail: [email protected] P. Sharma Dept. of CSE, Amity University, Noida, Uttar Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_57


U. Kumar et al.

1 Introduction The World Health Organization (WHO) published a report on Aug 16, 2020, which stated that COVID-19 (coronavirus disease), caused by SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), had infected around 6 million people and caused 379,941 deaths across the globe [1]. According to Carissa F. Etienne, Director of the Pan American Health Organization (PAHO), the strategy to control the COVID-19 pandemic is social distancing, reinforcing health sectors, and improving surveillance [2]. A recent study on measures to fight the COVID-19 pandemic, carried out by academics at the University of Edinburgh, found that wearing a mask or other covering over the nose and mouth reduces the risk of coronavirus transmission. Face mask detection refers to detecting whether or not someone is wearing a mask [3]. In practice, it is done by reverse engineering face detection, in which the face is detected using different learning algorithms for the purposes of authentication, security, and surveillance. Face detection is an important sub-field of pattern recognition and computer vision. A large number of researchers have contributed sophisticated algorithms for face detection in the past [4]. The first research on face detection was carried out in 2001, using handcrafted features and traditional machine learning algorithms to train effective classifiers for recognition and detection [5]. To curb the various coronavirus variants and other respiratory problems, wearing a face mask is now essential. Governments have taken several steps to spread awareness, but some people still ignore this and do not wear a mask or any face cover. This puts them and others in potential danger from the new Omicron variant, which is more transmissible than previous variants. Therefore, such systems can be a big help to society as a precaution. 
This research paper provides working knowledge of face detection and mask detection, along with the various technologies involved. It then discusses the dataset and how it is used in training and testing, followed by the analysis and results. Finally, MobileNetV2 and VGG16 are compared to find out which one performs better.

2 Related Work In face mask detection, a face is recognized using several attributes such as face recognition, face tracking, and face position [6]. This project compares two technologies, MobileNetV2 and VGG16, using Keras (the deep learning API of TensorFlow) together with OpenCV2, a Python library for image data. TensorFlow is a Python package used especially for machine learning (ML) and deep learning (DL) algorithms, and it is used to build machine learning systems across a wide range of computer science fields. Keras is an open-source library that provides an interface for artificial neural networks (ANNs) and is used as an API in deep learning. Keras acts as an interface to TensorFlow and to other deep learning backends.


OpenCV2 is an open-source ML and computer vision library used for many different tasks, including image recognition and image processing. It works with digital images as well as real-time video input for face recognition, object recognition, motion detection in recordings, camera tracking, and other tasks. In this project, it is used for real-time input, and the frames are resized and color-converted so that the model can provide accurate output. Neural networks are the functional units of deep learning algorithms. Deep learning is a subdomain of machine learning, which in turn falls under artificial intelligence. Artificial neural networks (ANNs), also called simulated neural networks (SNNs), are modeled on the neurons in a human brain. "Neural" refers to neurons, and "network" means that they are connected to each other and pass signals, which result in our body's movement and other functions. Just as our brain learns from experience and practice, we want a machine to do the same. This is done by imitating biological brain cells in a machine with the help of AI: inputs are given, initialized with some weights, and mathematical functions and algorithms are applied to them [7–9]. Neural networks are widely used in artificial intelligence, image recognition, speech recognition, and many other areas. Software such as Google Translate and Google Assistant also uses artificial neural networks. This project uses MobileNetV2 and VGG16, which are convolutional neural networks, to train the model. MobileNetV2 has a streamlined architecture based on depthwise separable convolutions, which produces lightweight neural networks. Convolutional neural networks (CNNs) are a key element of modern computer vision tasks such as object detection, image classification, and pattern recognition. 
A CNN is a deep learning algorithm that takes an input, assigns learnable weights and biases to different aspects of the input, and is eventually able to differentiate inputs from each other. The main advantage of CNNs over other networks is that they need much less pre-processing [10–13].
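The weight saving that makes MobileNetV2 lightweight can be checked by counting parameters. The sketch below compares one standard convolution, as used throughout VGG16, with a depthwise separable convolution: a depthwise k × k filter per input channel followed by a 1 × 1 pointwise convolution. The layer sizes (3 × 3 kernel, 32 to 64 channels) are illustrative assumptions and are not taken from this paper. Biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k standard convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 32, 64  # illustrative layer sizes (assumption)
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    # The ratio sep / std equals 1 / c_out + 1 / k**2.
    print(std, sep, round(sep / std, 4))  # 18432 2336 0.1267
```

For a 3 × 3 kernel, the separable form needs roughly one eighth to one ninth of the weights, and the multiply-adds shrink by the same ratio.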

2.1 Dataset The images were collected from Kaggle, a dataset library, and from Google Images. The data is of two types: images of people wearing a mask and images of people not wearing a mask or any face cover. The dataset for training and testing contains a total of 3978 images. Of these, 1926 images show persons with a mask and the other 2052 show persons without a mask. When the program runs, it detects a person's face with a rectangular box and then shows whether that person has a properly placed mask on their face. The dataset includes people of multiple nationalities so that the model can be used anywhere in the world.


2.2 Pre-processing This is the first phase, in which the images are converted into more refined data. It has several steps, starting with converting the images into the required format. Resizing the images is an important step because it affects the model during training; MobileNetV2 and VGG16 require an image size of 224 × 224. The images are then converted into arrays in a loop. Next, one-hot encoding is applied to the data labels, converting them into categorical values using the LabelBinarizer method. To train and test the model, the data is split in an 80:20 ratio: 80% of the data is used for training and the remaining 20% for testing. Each split contains the same ratio of with-mask and without-mask images. Augmentation is an important step that generates multiple images from the same data. These augmentations mimic different scenarios the software may face, and properties such as rotation, shifting, flipping, brightness, and shear are applied to the images.
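The label encoding, the stratified 80:20 split and a simple augmentation step can be sketched in plain NumPy. This is a minimal sketch under stated assumptions and is not the authors' code. The class names, the seed and the flip/brightness ranges are hypothetical. scikit-learn's LabelBinarizer returns a single 0/1 column when there are only two classes, so the sketch builds the two-column one-hot matrix directly.

```python
import numpy as np


def one_hot(labels, classes=("with_mask", "without_mask")):
    """Map string labels to one-hot rows, one column per class (class names are hypothetical)."""
    index = {c: i for i, c in enumerate(classes)}
    out = np.zeros((len(labels), len(classes)), dtype=np.float32)
    out[np.arange(len(labels)), [index[lab] for lab in labels]] = 1.0
    return out


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so each class keeps about the same share in both parts."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_test = int(round(test_fraction * len(idx)))  # per-class 20% for testing
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


def augment(image, rng):
    """Random horizontal flip plus brightness jitter for an HxWx3 float image in [0, 1]."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]          # mirror left-right
    factor = rng.uniform(0.8, 1.2)         # brightness range is an assumption
    return np.clip(image * factor, 0.0, 1.0)
```

With the dataset sizes given above (1926 with mask, 2052 without), `stratified_split` puts 385 + 410 = 795 images in the test set and 3183 in the training set.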

3 Building the Models The last phase builds the model. The initial learning rate is kept very low (0.0001), the batch size is 32, and training runs for 20 epochs. First, the base model is created for both architectures with three RGB channels. On top of the base model, a head model is then created for the fully connected layers, to which different parameters are added. Finally, the model is saved in h5 format, and the accuracy and loss results are plotted.

4 Experimental Setup Figure 1 shows the architecture and how it works. First, the image data is taken as input and passed as arrays into MobileNetV2 or VGG16, where feature extraction for image detection and classification takes place. Next, max pooling with a pool size of (7, 7) is applied and the output is flattened. A fully connected layer is then created by adding a dense layer of 128 neurons with the ReLU activation function, followed by a dropout layer to avoid overfitting. Finally, the output layer is a dense layer of two neurons, because the output is categorized only as with mask or without mask.
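The head described above can be sketched with the Keras functional API. This is an illustrative reconstruction, not the authors' published code. The dropout rate (0.5) and the loss (categorical cross-entropy on two-column one-hot labels) are assumptions, because the text does not state them. `weights=None` keeps the sketch offline; passing `"imagenet"` gives the usual transfer-learning setup. With a 224 × 224 input, both backbones output a 7 × 7 feature map, so the (7, 7) pooling reduces it to a single vector.

```python
import numpy as np
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16, MobileNetV2


def build_mask_detector(backbone="mobilenetv2", weights=None):
    """Frozen convolutional base plus the small classification head described in Sect. 4."""
    base_cls = {"mobilenetv2": MobileNetV2, "vgg16": VGG16}[backbone]
    base = base_cls(weights=weights, include_top=False, input_shape=(224, 224, 3))
    for layer in base.layers:
        layer.trainable = False  # only the head is trained

    x = layers.MaxPooling2D(pool_size=(7, 7))(base.output)  # 7x7 feature map -> 1x1
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)  # rate is an assumption; the text does not give it
    outputs = layers.Dense(2, activation="softmax")(x)  # with mask / without mask

    model = models.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=optimizers.Adam(learning_rate=1e-4),  # initial learning rate 0.0001
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Training as described in Sect. 3 would then look like:
# model.fit(x_train, y_train, batch_size=32, epochs=20, validation_data=(x_test, y_test))
# model.save("mask_detector.h5")
```

Because the backbone is frozen, the trainable parameters are only the two dense layers. For MobileNetV2 that is 1280 × 128 + 128 + 128 × 2 + 2 = 164,226 weights, which is part of why each training epoch is comparatively cheap.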


Fig. 1 Face mask detector architecture

5 Result Analysis Both MobileNetV2 and VGG16 were trained on the same dataset with all parameters kept the same. Both models reached about 99% accuracy on the training and validation data. Side by side, MobileNetV2's accuracy is slightly better than VGG16's. Overall, MobileNetV2 is better because it is lightweight and has slightly better accuracy in the early iterations.

5.1 VGG16 Result VGG16 was able to maintain an accuracy of around 99%, but it took 7 h to train the model. Each epoch took around 16–18 min to train; details are shown in the figures below. In Fig. 2, the accuracy started at 76% and improved with every iteration. By the 10th iteration it was about 99%, and it remained stable until the last iteration. In Fig. 3, the model achieves 99% accuracy, with macro and weighted averages of 99%.

5.2 MobileNetV2 Results MobileNetV2 was able to maintain an accuracy of 99%, and it took 1/2 h to train the model. Each epoch took less than 2 min to train; details are shown in the figures below.


Fig. 2 VGG16 training data

Fig. 3 VGG16 confusion matrix

In Fig. 4, the accuracy started at 86% and improved with every iteration. By the 10th iteration it was about 99%, and it remained stable until the last iteration. In Fig. 5, the model achieves 99% accuracy, with macro and weighted averages of 99%.
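The "macro" and "weighted" averages quoted for Figs. 3 and 5 are the two usual ways of summarizing per-class scores derived from a confusion matrix. The macro average is the plain mean over classes. The weighted average weights each class by its support, meaning the number of true samples. The sketch below computes both in NumPy. The confusion matrix in the check is a hypothetical example for illustration and is not the paper's data.

```python
import numpy as np


def report_from_confusion(cm):
    """Per-class precision/recall/F1 plus macro and support-weighted F1 averages.

    cm[i, j] counts samples whose true class is i and predicted class is j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # column sums = predicted counts per class
    recall = tp / cm.sum(axis=1)      # row sums = true counts (support) per class
    f1 = 2 * precision * recall / (precision + recall)
    support = cm.sum(axis=1)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "macro_f1": f1.mean(),                                # unweighted mean over classes
        "weighted_f1": (f1 * support).sum() / support.sum(),  # weighted by class support
        "accuracy": tp.sum() / cm.sum(),
    }
```

With nearly balanced classes, as in this dataset (1926 vs 2052 images), the macro and weighted averages are almost identical. This is consistent with both being reported as 99%.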

6 Detailed Comparison of VGG16 and MobileNetV2 In this research paper, both technologies use a multi-stage detection approach; specifically, a two-stage detector is used, with one stage at training time and the other at run time [14, 15]. Table 1 shows the training details, including the number of epochs and the batch size (20 epochs and a batch size of 32) at which the loss was minimized. During training, all values and the dataset were kept the same for an accurate comparison. In Table 1, the initial training loss was the highest loss for both models, and it was greater for VGG16. The main difference can be seen in the training time of the models, as


Fig. 4 MobileNetV2 training data

Fig. 5 MobileNetV2 confusion matrix

MobileNetV2 is lightweight and took far less time to train than VGG16. Both were able to recognize face masks accurately in a home environment with single and multiple faces. Figs. 6 and 7 show the training and validation accuracy and the training and validation losses. VGG16 started with a low accuracy of 76.38% and a high loss of 1.0175, but in the end it achieved 98.99%, as shown in Fig. 6, whereas MobileNetV2 started with better accuracy and lower losses, as shown in Fig. 7.

Table 1 MobileNetV2 and VGG16 details and comparison

Parameter                 MobileNetV2   VGG16
Epochs                    20            20
Batch size                32            32
Initial training loss     0.3846        1.0175
Accuracy                  99.14%        98.99%
Time to train model       40 min        7 h


Fig. 6 MobileNetV2 accuracy and losses graph

Fig. 7 VGG16 accuracy and losses graph

From these data, it is concluded that MobileNetV2 is more accurate and more suitable for devices with low RAM.

7 Conclusion This research paper on face mask detection using MobileNetV2 and VGG16 showed very good performance for both CNN architectures. Comparing their performance, both were found to be quite good, but MobileNetV2 is better because it is lightweight, which allows it to run on low-end devices. The main difference lies in the training of the models, which took significantly more time for VGG16 because


of its large size and large number of parameters. MobileNetV2, on the other hand, is very fast because it is lightweight, which makes it suitable for low-performance devices such as mobile phones. VGG16 may be better suited to models that require many parameters in the hidden and output layers. In this project there were only two output classes, with mask and without mask, so MobileNetV2 was able to provide high accuracy; for more complicated models, VGG16 could be a better choice. The accuracy of the model could also be improved by training on a GPU, since these models require a large amount of computational power. In the end, a person or organization can choose either of these models depending on their requirements.

References
1. World Health Organization et al (2020) Coronavirus disease 2019 (covid-19): situation report, 96
2. PAHO/WHO | Pan American Health Organization (n.d.) Social distancing, surveillance, and stronger health systems as keys to controlling COVID-19 pandemic, PAHO Director says
3. Su X, Gao M, Ren J et al (2022) Face mask detection and classification via deep transfer learning. Multimed Tools Appl 81:4475–4494. https://doi.org/10.1007/s11042-021-11772-5
4. Nanni L, Ghidoni S, Brahnam S (2017) Handcrafted vs. non-handcrafted features for computer vision classification. Pattern Recogn 71:158–172. http://doi.org/10.1016/j.patcog.2017.05.025
5. Jia Y et al (2014) Caffe: convolutional architecture for fast feature embedding. In: MM 2014: Proceedings of the 2014 ACM conference on multimedia. http://doi.org/10.1145/2647868.2654889
6. Jiang M, Fan X, Yan H (2020) Retina Face Mask: A Single Stage Face Mask Detector for Assisting Control of the COVID-19 Pandemic. https://arxiv.org/abs/2005.03950
7. Rajagopalan A, Lad AM, Mishra A (2021) Comparative analysis of convolutional neural network architectures for real-time COVID-19 facial mask detection. J Phys: Conf Ser 1969(1):1–9. https://doi.org/10.1088/1742-6596/1969/1/012037
8. Albawi S, Mohammed TA, Al-Zawi S (2017) Understanding of a convolutional neural network. In: Proceedings of the 2017 international conference on engineering and technology (ICET), Antalya, Turkey, Aug 2017. IEEE, pp 1–6
9. Jameel SM, Hashmani MA, Rehman M, Budiman A (2020) Adaptive CNN ensemble for complex multispectral image analysis. Complexity 2020(Article ID 83561989):21 p
10. Demir F, Abdullah DA, Sengur A (2020) A new deep CNN model for environmental sound classification. IEEE Access 8:66529–66537
11. Kneis B (2018) Face detection for crowd analysis using deep convolutional neural networks. In: Pimenidis E, Jayne C (eds) Engineering applications of neural networks. EANN 2018. Communications in computer and information science, vol 893. Springer, Cham
12. Saxen F, Werner P, Handrich S, Othman E, Dinges L, Al-Hamadi A (2019) Face attribute detection with MobileNetV2 and NasNet-Mobile. In: 2019 11th international symposium on image and signal processing and analysis (ISPA), pp 176–180
13. Qassim H, Verma A, Feinzimer D (2018) Compressed residual-VGG16 CNN model for big data places image recognition. In: 2018 IEEE 8th annual computing and communication workshop and conference (CCWC), pp 169–175
14. Ali R, Adeel S, Ahmed A, Shahriar MH, Mojumder MS, Lippert C (2020) Face mask detector. ResearchGate
15. Chavda A, Dsouza J, Badgujar S, Damani A (2020) Multi-stage CNN architecture for face mask detection. ResearchGate

Face Recognition Using EfficientNet Prashant Upadhyay, Bhavya Garg, Anant Tyagi, and Arin Tyagi

Abstract Communication is becoming increasingly vital as we move closer to a new transformation. Facial expressions are crucial qualities that also play a part in human–computer interaction. They play an essential role in social interactions and are commonly used in the behavioral description of emotions. The work introduced in this research focuses on facial expression recognition from images, to improve identification accuracy on real-world images that pose challenges for expression datasets. The approach is also applied to laboratory dataset images captured in a controlled environment, for cross-database evaluation studies. Feature extraction is more difficult for real-world images than for images captured in a controlled environment. This research proposes a convolutional neural network model based on deep learning using EfficientNet. The EfficientNet architecture, introduced in 2019, uses compound scaling to further improve the CNN structure. A literature review shows that no work has so far used this architecture for facial expression recognition. The optimizers SGD, RMSprop, and Adam are applied to determine which performs best in terms of identification accuracy. The approach works well with high-resolution images and offers a deep learning model structure that enables better accuracy and processing speed. Keywords Neural networks · Machine learning · Convolutional neural network (CNN) · Deep learning · Face recognition · EfficientNet
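Compound scaling, as defined in the original EfficientNet paper (Tan and Le, 2019), uses a single coefficient φ to grow network depth by α^φ, width by β^φ and input resolution by γ^φ. The constants are constrained so that α · β² · γ² ≈ 2, which makes FLOPs grow by roughly 2^φ. The constants below (α = 1.2, β = 1.1, γ = 1.15) are the values reported for the EfficientNet-B0 baseline. The function is a sketch of the rule only. The published B1 to B7 configurations do not follow this formula exactly, so its outputs should not be read as those models' settings.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases reported for EfficientNet-B0


def compound_scale(phi, base_resolution=224):
    """Return (depth multiplier, width multiplier, input resolution, approx. FLOPs factor)."""
    depth = ALPHA ** phi                         # more layers
    width = BETA ** phi                          # more channels per layer
    resolution = round(base_resolution * GAMMA ** phi)
    flops_factor = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi  # FLOPs ~ d * w^2 * r^2
    return depth, width, resolution, flops_factor


if __name__ == "__main__":
    for phi in range(4):
        print(phi, compound_scale(phi))
```

Scaling all three dimensions together is the design choice that distinguishes EfficientNet from earlier models, which typically enlarged only depth or only width.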

P. Upadhyay (B) · B. Garg · A. Tyagi · A. Tyagi ABES Institute of Technology, Ghaziabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_58

P. Upadhyay et al.

1 Introduction Image detection is a challenging subject in computer graphics and computer vision, and it has attracted much attention in recent years due to its variety of applications. Face recognition algorithms may be classified into three groups based on how the face data is acquired: methods that operate on intensity images, approaches that work with surveillance videos, and procedures that deal with other sensory data such as 3D models or infrared imaging [1]. Recognition uses automated techniques to verify or identify individuals based on genuine personal physical differences. Robust authentication systems often employ physiological (handprints, retina, or face) or behavioral (handwriting, speech, or keystroke patterns) traits to identify a person. Facial recognition starts by detecting facial characteristics under unconstrained conditions and then normalizes the images for variations in geometry and illumination. It uses deep learning-based algorithms to identify features and then analyzes the findings using model-based techniques and statistical evaluation. All facial recognition algorithms consist of two main components: (1) face detection and normalization and (2) face identification (Fig. 1). Algorithms that include both components are called fully automatic algorithms, and those that include only the second component are called partially automatic algorithms. Partially automatic algorithms are given the facial image and the coordinates of the eye centers [1, 2]. People generally use facial expressions and voice to predict distinct human emotional states such as happiness, sadness, and anger. According to numerous surveys, the verbal aspect accounts for one-third of human communication and the nonverbal element for two-thirds. The definition of automatic facial emotion recognition (FER) differs from paper to paper, as do those of emotion recognition and facial feature recognition. An electrocardiogram (ECG), an electroencephalograph (EEG), or a camera may be used as FER input. Deep learning-based FER significantly reduces the reliance on face-physics-based models and other pre-processing techniques, enabling direct "end-to-end" learning from the input images within the pipeline.

Fig. 1 Facial recognition


On the other hand, there are many different deep learning models. In a CNN-based approach, the input image is converted into a feature map by a set of filters in the convolution layers [3]. FER can be divided into two groups depending on whether it uses still frames or video. First, static (frame-based) FER relies only on static facial features, obtained by extracting handcrafted features from selected peak-expression frames of image sequences. Second, dynamic (video-based) FER uses spatiotemporal features to capture the expression dynamics in expression sequences. Although dynamic FER achieves a higher recognition rate than static FER because it provides additional temporal information, it suffers from some drawbacks. Face recognition technology has become an important research topic in recent years due to increasing security concerns and its benefits in law enforcement and industrial applications. It operates in two modes. The first, verification, is called one-to-one matching in biometrics: the query face is checked against a database of faces to determine whether the facial information belongs to a particular individual. The second, identification, is one-to-many matching, in which a person's characteristics are compared with a collection of possible identities. The identification process consists of face detection, alignment, modeling, and classification [4, 5]. Choosing how to identify and capture a specific biometric feature with a suitable descriptor is a significant problem in an FR system. Feature extraction entails retaining the most essential information for classification. 
Feature extraction methods proposed for biometric systems include principal component analysis (PCA), independent component analysis (ICA), local binary patterns (LBP), and histogram methods. Deep learning with convolutional neural networks (CNNs) has emerged as the most popular feature extraction method in FR. CNNs can be used in several ways. The first is to train the model from scratch, from the bottom layer to the top. The purpose of facial recognition is to enable a computer system to identify human faces quickly and accurately in photos or videos, as shown in Fig. 2. Several methods and strategies have been developed to improve the performance of face recognition models [5]. Deep learning for computer vision applications has attracted much attention recently. The human brain can detect and recognize multiple faces automatically and quickly. Facial recognition is often described as a process consisting of four steps:
• Face detection: locate one or more faces in the image and draw a bounding box around each.
• Face alignment: normalize the faces to be consistent with the database, for example in geometry and photometry.
• Feature extraction: extract from the image the features that can be used for the recognition task.
• Face recognition: compare the face with one or more known faces in the collection.
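The last step, together with the verification and identification modes described above, can be sketched as matching feature vectors (embeddings) by cosine similarity. This is a minimal sketch under stated assumptions. The embeddings are assumed to come from an earlier feature extraction step, and the 0.6 threshold and the gallery names are hypothetical. Neither is taken from the paper.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(probe, claimed_template, threshold=0.6):
    """One-to-one matching: does the probe belong to the claimed identity?"""
    return cosine_similarity(probe, claimed_template) >= threshold


def identify(probe, gallery, threshold=0.6):
    """One-to-many matching: best-scoring gallery identity, or None if nothing passes."""
    best_name, best_score = None, threshold
    for name, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # Hypothetical 3-D embeddings, for illustration only.
    gallery = {"alice": np.array([1.0, 0.0, 0.0]), "bob": np.array([0.0, 1.0, 0.0])}
    probe = np.array([0.9, 0.1, 0.0])
    print(verify(probe, gallery["alice"]), identify(probe, gallery))
```

Verification compares the probe with a single stored template. Identification scans the whole gallery, which is why it is the more expensive mode as the database grows.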


Fig. 2 Face recognition processing flow

Finally, a CNN can be used in transfer learning, keeping the convolutional base in its original form and feeding its output to a classifier. When the dataset is small or the classification problem is similar, the pre-trained model is used as a fixed feature extractor. The goal of this paper is to study classification accuracy by analyzing FR performance in two settings. The first uses a pre-trained CNN (EfficientNet architecture) for feature extraction with a support vector machine (SVM) classifier. The second uses a CNN (AlexNet model) for both feature extraction and classification.
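The first setting, a frozen pre-trained CNN used as a feature extractor followed by an SVM, can be sketched with Keras and scikit-learn. This is a sketch under assumptions, not the authors' implementation. EfficientNetB0 is an assumed variant, the linear kernel is an assumed choice, and global average pooling gives a 1280-dimensional feature per image. Keras' EfficientNet models include their own input rescaling and expect pixel values in [0, 255]. `weights=None` is only for an offline check; real transfer learning would use `weights="imagenet"`.

```python
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC


def build_feature_extractor(weights="imagenet"):
    """Frozen EfficientNetB0 convolutional base with global average pooling (1280-d output)."""
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=weights, input_shape=(224, 224, 3), pooling="avg"
    )
    base.trainable = False  # the convolutional base is kept in its original form
    return base


def extract_features(extractor, images):
    """images: float array of shape (N, 224, 224, 3) with pixel values in [0, 255]."""
    return extractor.predict(images, verbose=0)


def fit_svm(features, labels):
    """Train a linear SVM on the extracted feature vectors (kernel choice is an assumption)."""
    clf = SVC(kernel="linear")
    clf.fit(features, labels)
    return clf
```

Only the SVM is trained here, so this setting needs far less labelled data than training a CNN such as AlexNet end to end. That is the trade-off the two settings are meant to compare.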

2 Related Works Duong et al. [6] created a dimensionality-reduction approach based on an unsupervised framework known as Projective Complex Matrix Factorization (ProCMF). Like ProNMF, the technique uses the cosine dissimilarity metric, and it transforms the real data into the complex domain. A projective matrix is obtained by solving an unconstrained complex optimization problem, and the cost function is minimized using a gradient optimization technique. The proposed technique performs well compared with ProNMF, is more effective at extracting discriminative facial features, and improves FER even in the presence of noise and outliers. Fatima et al. [7] proposed a method for improving emotion identification accuracy from facial expressions based solely on geometric variables. The Viola-Jones algorithm was used, and 49 fiducial points were tracked using SDM [7, 8]. The tracked points represent the key regions of the face, including the eyebrows, eyes, nose, and mouth. The method computes the Euclidean distance between each pair of points, and the ratio of these distances between the initial and last frames is used to obtain dynamic features. EMDBUTMF was introduced by Ravina and Emmanuel [8] to reduce noisy pixels in an image. The approach appears strong enough to withstand salt-and-pepper noise. Feature vectors based on the LDN (local directional number) pattern and DGLTP (directional gradient local ternary pattern) were extracted after the noise was removed. To encode the local texture, DGLTP computes the directional pattern of each neighbor and quantizes it into three levels.
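The geometric features attributed to Fatima et al. [7] can be sketched as follows: compute the Euclidean distance between every pair of tracked landmarks, then divide the last-frame distances by the first-frame distances. This is only one plausible reading of the "distance ratio" and is not their published code. The sketch assumes the (x, y) landmarks, 49 per frame in their setup, are already available from a tracker such as SDM.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance between every pair of landmarks; points has shape (K, 2)."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(points), k=1)  # keep each unordered pair once
    return d[iu]


def dynamic_features(first_frame, last_frame, eps=1e-8):
    """Ratio of each pairwise distance in the last frame to the same distance in the first."""
    return pairwise_distances(last_frame) / (pairwise_distances(first_frame) + eps)
```

With 49 landmarks this gives 49 × 48 / 2 = 1176 distances per frame, and therefore 1176 ratio features per sequence.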


Ding et al. [9] proposed using video to detect and characterize facial expressions. A 24-dimensional DLBP was proposed to find peak frames in an image sequence and correctly extract facial features. Taylor's theorem was also applied to magnify the feature pixels of the peak frame to obtain discriminant features. The logarithmic Laplacian was introduced to overcome illumination variation in real-time applications. The proposed technique, TFP, outperformed existing LBP-based feature extraction approaches and proved suitable for real-time applications, as tested on the JAFFE and CK datasets. Arshid et al. [10] suggested a multi-stage binary pattern (MSBP) feature extraction approach that uses sine and gradient variations to handle lighting in real-world conditions. According to the results, the MSBP technique achieved 96% accuracy with the holistic approach and 60% accuracy with the segmentation-based method. Qin et al. [11] suggested a deep CNN-based verification technique comprising face detection, face alignment, and feature extraction. The deep CNN VGG16 was used to extract facial features. Images from five different views were used in the tests (left, right, front, unseen, and research). The test findings showed that the method performed very well at detecting faces. Menotti et al. [12] studied ML strategies for iris and fingerprint spoofing detection. CNNs and weight adjustment are effective methods for face recognition and imaging. They noted that little experimental information on biometric spoofing of sensors was available. They therefore used deep learning algorithms to develop a spoofing detection framework for face, iris, and fingerprint. 
Two methods were used: learning new network weights for each layer of the CNN via backpropagation, and learning a suitable convolutional network topology. Simon et al. [13] developed a method to increase face recognition accuracy based on multimodal recognition with a CNN system. They combined modality-specific CNNs with handcrafted features, including histograms of Gabor ordinal measures (HOGOMs), local binary patterns (LBPs), histograms of oriented gradients, and Haar-like features. The method considerably reduces the recognition error rate. Deep learning could be improved further by using more powerful computer systems. In addition, CNNs have been used in research, although only on neonatal. Munir et al. [14] proposed merged binary pattern coding that applies sine and gradient variations to extract local facial features. The technique was robust to lighting fluctuations. Some pre-processing techniques, including FFT with CLAHE and histogram equalization, were applied before extraction. The classifier's performance was improved by using principal component analysis (PCA), a feature extraction method. The proposed method was applied to a real-world image collection, and it achieved a higher success rate (96.5% overall) than the other approaches. The evaluation results indicated that the holistic method performed better than the segmentation-based method.
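PCA, used above by Munir et al. [14] and later by Liu et al. [18] to reduce feature dimensionality, projects centered feature vectors onto the directions of largest variance. The sketch below computes PCA with a singular value decomposition in NumPy. It is a generic illustration of the technique, not the cited authors' code, and the number of components kept is up to the user.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return (mean, components) for PCA via SVD of the centered data; X has shape (N, D)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def transform_pca(X, mean, components):
    """Project feature vectors onto the retained principal components."""
    return (np.asarray(X, dtype=float) - mean) @ components.T
```

For example, fused LBP and HOG histograms with thousands of dimensions can be reduced to a few hundred components before classification. How many components to keep is a tuning choice.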


Mahmood and Al Mamoon [15] argued that FER is a powerful tool in the modern world. They used the Viola-Jones technique, which is well established in the facial analysis community. The final feature vectors were obtained using point detection methods and morphological operations. The feature vectors were fed into a feedforward neural network classifier to perform expression classification. The approach was tested on the publicly available JAFFE database with satisfactory accuracy. To recognize facial expressions, Khadija et al. [16] suggested a novel face decomposition. Their algorithm locates seven regions of interest (ROIs) on the face using facial landmarks. Different local descriptors (LBP, CLBP, LTP, and dynamic LTP) are applied to extract a set of characteristic features in the form of a vector. The extracted feature vectors serve as input to an SVM classifier, which was evaluated on the CK and FEED databases. Qayyum et al. [17] used the stationary wavelet transform (SWT) to extract facial features in the spectral and spatial domains. Each SWT subband contains distinct image information, and most of the information is stored in the LL subband. The discrete cosine transform (DCT) was used to reduce the extracted features while maintaining the proper image size. The reduced feature vectors were then fed to a feed-forward neural network classifier trained with the backpropagation algorithm. The CK+ and JAFFE datasets were used to test the accuracy and overall performance of this approach, and the results were compared with existing techniques. Liu et al. [18] proposed a simple framework to extract facial features from the principal facial regions. The algorithm normalizes the principal regions to equal size and extracts comparable facial features from different subjects. The features of the expression were then compared with neutral facial features. LBP and gamma correction were also used to improve the recognition rate. Features extracted with LBP and histograms of oriented gradients (HOG) were fused, and PCA was applied to reduce dimensionality. Mazumdar et al. [19] proposed a fully automated, deep network-based FER system that performs fast data fusion of geometric and LBP-based features using autoencoders and SOM-based classifiers. The proposed method was validated on well-known datasets, primarily CK+ and MMI. Finally, the trained features were classified using both an SVM classifier and the proposed SOM-based classifier. The results showed that the proposed fused-feature strategy outperformed the individual features, with higher recognition rates. Tang et al. [20] proposed three video-based models for FER. First, the Differential Geometric Fusion Network (DGFN) model considers various combinations of geometric features at the component level of the action units, covering both local and global features.

3 Materials and Methods Convolutional Neural Network: A convolutional neural network (ConvNet/CNN) is a deep learning algorithm that takes an input image, assigns importance (learnable weights and biases) to various aspects/objects in the image, and differentiates between them [21, 22]. Compared to other classification techniques, a ConvNet requires much less pre-processing: whereas primitive methods rely on hand-engineered filters, a ConvNet can learn these filters/features with sufficient training. The ConvNet architecture was inspired by the organization of the visual cortex and is analogous to the connectivity pattern of neurons in the human brain, where individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. Convolutional neural networks are similar to ordinary neural networks, but their inputs are explicitly assumed to be images, which allows designers to encode particular properties into the architecture. A CNN is built from layers, [INPUT-CONV-RELU-POOL-FC] being the simplest arrangement. The input layer holds the raw pixel values of the image, while the CONV layer contains a kernel (filter) of predetermined size that slides over the image window by window, performing convolution operations to capture the features in each window. To avoid an uneven mapping at the image borders, padding is applied to the input image [22]. RELU stands for Rectified Linear Unit, an element-wise activation function that sets negative values to zero. The pooling layer (POOL) performs down-sampling and dimensionality reduction, which reduces the computational resources required to process the data; sliding a window over the feature map also helps extract dominant features that are rotation- and position-invariant. Max pooling and average pooling are the two most commonly used pooling functions. The fully connected (FC) layer connects every neuron in the input to every neuron in the output and is responsible for computing the class scores, producing N outputs, where N is the number of classes. The class predicted by the CNN is the one with the highest score. The FC layer is also called the dense layer. Note that the CNN structure can be modified according to the design requirements and performance of the system, as shown in Fig. 3. DROPOUT and FLATTEN are two other layers used in CNN architectures. The DROPOUT layer is a regularization mechanism that prevents CNN overfitting: during training, a fraction of the inputs (referred to as the dropout rate) is ignored at each update by setting their values to 0, and the remaining inputs are scaled up so that the overall sum of the inputs stays consistent. FLATTEN layers precede FC layers to convert two-dimensional features into one-dimensional features.
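The layer operations described above can be illustrated with a minimal pure-Python sketch, assuming single-channel inputs, stride 1, no padding, and non-overlapping pooling. It is not the authors' implementation; the toy image and the 2 × 2 kernel are hypothetical values chosen only to make the CONV → RELU → POOL → FLATTEN chain and inverted dropout easy to follow.

```python
import random

def conv2d_valid(image, kernel):
    """'Valid' 2D convolution (cross-correlation, as CNN layers compute it), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Element-wise ReLU: every negative activation becomes zero."""
    return [[max(0.0, float(v)) for v in row] for row in fmap]

def max_pool(fmap, size):
    """Non-overlapping max pooling with a size x size window (stride = size)."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def flatten(fmap):
    """Turn a 2D feature map into the 1D vector an FC (dense) layer expects."""
    return [v for row in fmap for v in row]

def dropout(values, rate, rng):
    """Inverted dropout, applied only during training: each input is zeroed with
    probability `rate` and the survivors are scaled by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

# Toy 5 x 5 "image" and a hypothetical 2 x 2 diagonal-difference kernel.
image = [[1, 2, 0, 1, 3],
         [0, 1, 3, 2, 1],
         [2, 0, 1, 0, 2],
         [1, 3, 0, 2, 0],
         [0, 1, 2, 1, 1]]
kernel = [[1, 0],
          [0, -1]]

features = flatten(max_pool(relu(conv2d_valid(image, kernel)), 2))
print(features)  # [0.0, 3.0, 1.0, 1.0]
```

Scaling the kept inputs during training (rather than at inference time) is the "inverted" form of dropout; it means the network can be used unchanged at test time.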


P. Upadhyay et al.

Fig. 3 Face recognition using CNN

To begin, the input image from the camera is supplied in real time to the face detection algorithm. Next, we convert the cropped face image to grayscale, resize it to 120 × 120 pixels, and feed it into the first convolution layer, which consists of 32 filters of size 3 × 3 pixels. The filter weights are initialized with random numbers and then updated iteratively using the backpropagation algorithm; the final weights are applied during the classification stage [22, 23]. The output of the first CONV + RELU layer with the 32 filters mentioned above is shown in Fig. The output of the second CONV + RELU layer is then passed to the pooling layer, where the max-pooling function uses a window size of 4 × 4 pixels. Layer outputs can be pooled in two ways: max pooling and average pooling. Max pooling was chosen because it was expected to perform better than average pooling on this task.
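The feature-map sizes in this pipeline follow from the standard output-size formula (W − K + 2P)/S + 1. The sketch below works them out, assuming stride 1 and no padding for both 3 × 3 convolutions and a non-overlapping 4 × 4 max pool (stride 4); the text does not state these settings, so the numbers are illustrative only.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window: (W - K + 2P) // S + 1."""
    return (size - kernel + 2 * padding) // stride + 1

def conv_params(filters, kernel, in_channels):
    """Learnable parameters of a conv layer: a k x k x C_in kernel plus one bias per filter."""
    return filters * (kernel * kernel * in_channels + 1)

# Assumed (not stated in the text): stride 1 and no padding for both 3 x 3
# convolutions, and non-overlapping 4 x 4 max pooling (stride 4).
size = 120                                               # grayscale face crop, 120 x 120
after_conv1 = conv_output_size(size, 3)                  # 118 x 118, 32 channels
after_conv2 = conv_output_size(after_conv1, 3)           # 116 x 116
after_pool = conv_output_size(after_conv2, 4, stride=4)  # 29 x 29
conv1_params = conv_params(filters=32, kernel=3, in_channels=1)  # 32 * (9 + 1) = 320
```

With "same" padding (P = 1 for a 3 × 3 kernel) the convolutions would keep the 120 × 120 size instead, so the pooled map would be 30 × 30.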

3.1 EfficientNet EfficientNet is a family of network architectures designed to be highly efficient. By borrowing the residual connections of ResNet to increase the depth of the network, EfficientNet can extract features with a deep neural network. EfficientNet also changes the number of feature channels in each layer to capture richer features through additional feature extraction layers [23]. Finally, the resolution of the input image can be increased so that the network can learn and represent more information, thereby increasing accuracy. EfficientNet provides a better balance of accuracy and efficiency by scaling every dimension with fixed ratios (compound scaling). The mobile inverted bottleneck convolution (MBConv), first introduced in MobileNetV2, is the key building block of EfficientNet, as shown in Fig. 4. Shortcut connections placed directly between the bottlenecks, which carry far fewer channels than the expansion layers, together with depthwise separable convolution, reduce computation by approximately a factor of k² compared with traditional convolutional layers, where k is the kernel size, i.e., the height and width of the two-dimensional (2D) convolution window. The essential advantage of the EfficientNet model is that it uses a compound scaling approach to scale up the CNN in a more systematic manner, which aids the feature extraction process and helps ensure that emotions are classified consistently. Another notable aspect of this model is that it works well with high-resolution images. In the densely connected layer of the CNN, the architecture is used to collect discriminative face attributes, while a SoftMax predictor is used to identify faces. The user may choose the number of convolutional and fully connected layers in the resulting architecture and whether batch normalization, dropout, and max-pooling layers are included. Because many other parts of the face change often, the facial characteristics used for face identification should remain consistent [24]. Although better feature representations have been reported in the literature, this approach assumes the InceptionV3 CNN architecture and uses the internal layers of its Inception Module C. A multi-feature fusion approach combines the feature map from an internal layer of Inception Module C with the final feature vector, and the internal layers of Module C are evaluated to determine which one performs best in the fusion process.

Fig. 4 EfficientNet architecture
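Both efficiency arguments above can be checked numerically. The sketch assumes the compound-scaling constants α = 1.2, β = 1.1, γ = 1.15 reported in the original EfficientNet paper by Tan and Le (2019), which are not given in this chapter, and a hypothetical 56 × 56 layer with 32 input and 64 output channels for the cost comparison. It shows that one step of the compound coefficient φ roughly doubles FLOPs, and that a depthwise separable convolution costs close to k² times less than a standard one when the number of output channels is much larger than k².

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # from Tan & Le (2019), not from this chapter

def compound_scale(phi):
    """Depth, width, and resolution multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

def flops_multiplier(phi):
    """FLOPs scale roughly with depth * width**2 * resolution**2."""
    depth, width, resolution = compound_scale(phi)
    return depth * width ** 2 * resolution ** 2

def standard_conv_cost(h, w, c_in, c_out, k):
    """Multiply-adds of a standard k x k convolution (stride 1, output size h x w)."""
    return h * w * c_in * c_out * k * k

def separable_conv_cost(h, w, c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return h * w * c_in * k * k + h * w * c_in * c_out

# Hypothetical layer: 56 x 56 feature map, 32 -> 64 channels, 3 x 3 kernel.
saving = standard_conv_cost(56, 56, 32, 64, 3) / separable_conv_cost(56, 56, 32, 64, 3)
print(round(flops_multiplier(1), 2), round(saving, 2))  # 1.92 7.89
```

The saving ratio simplifies to k²·c_out / (k² + c_out), which approaches k² = 9 as c_out grows; that is the "approximately a factor of k²" in the text.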

3.2 Dataset Used The dataset consists of 30,000 high-quality PNG images with a resolution of 1024 × 1024 pixels, covering a wide range of ages, ethnicities, and image backgrounds; Fig. 5 shows some images from the dataset. FER2013 contains about 30,000 grayscale images of different expressions, each 48 × 48 pixels in size, and its labels fall into seven classes: 0 = angry, 1 = disgust, 2 = fear, 3 = happy, 4 = sad, 5 = surprise, 6 = neutral. The disgust class has the fewest images (about 600), while the other labels have about 5000 samples each. A sufficiently large dataset can supply a wealth of image features for the face recognition task. Unfortunately, such an ideal database is hard to come by in real applications [25]. A small image collection can, however, be significantly expanded into a large one by applying image transformations. As a result, more image features can be extracted to train the classifier, improving facial recognition.

Fig. 5 Images from dataset
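The expansion of a small collection through image transformations can be sketched in pure Python on a toy grayscale image stored as a list of rows. The specific transformations (horizontal flip and one-pixel shifts) are an assumption for illustration; the label map follows the FER2013 indices listed above, and every augmented copy keeps the label of its source image.

```python
FER2013_LABELS = {0: "angry", 1: "disgust", 2: "fear", 3: "happy",
                  4: "sad", 5: "surprise", 6: "neutral"}

def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def shift_right(img, pixels, fill=0):
    """Shift every row right by `pixels` (> 0), padding the left edge with `fill`."""
    return [[fill] * pixels + row[:-pixels] for row in img]

def shift_down(img, pixels, fill=0):
    """Shift the image down by `pixels` (> 0), padding the top with rows of `fill`."""
    width = len(img[0])
    return [[fill] * width for _ in range(pixels)] + [row[:] for row in img[:-pixels]]

def augment(img):
    """Original image plus three transformed copies."""
    return [img, hflip(img), shift_right(img, 1), shift_down(img, 1)]

def augment_dataset(samples):
    """Expand (image, label) pairs; each copy keeps its source image's label."""
    return [(copy, label) for img, label in samples for copy in augment(img)]

tiny = [[1, 2, 3],
        [4, 5, 6]]
expanded = augment_dataset([(tiny, 3)])  # one "happy" image becomes four samples
```

Label-preserving transformations are the key design constraint: a horizontal flip or a small shift does not change the expression, so the class label can be copied unchanged.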

4 Methodology Used In this project, we create a new system with four components. The first module performs image pre-processing [25, 26]. Second, feature extraction is applied to the normalized image to extract expression features. Third, a feature selection module is used to reduce the feature dimensions before classification. Finally, the CNN classifies the image as one of the facial expressions (Sad, Surprise, Neutral, Anger, Disgust, Happy, or Fear). This article examines a multi-task neural network for handling various facial attribute identification problems, as shown in Fig. 6; features shared between tasks are essential for improving accuracy. As illustrated in Fig. 3, the CNN architecture is used to obtain discriminative facial traits in the CNN's dense layer, while a SoftMax predictor is used to identify faces. The user may choose whether batch normalization, dropout, and max-pooling layers are included in the model, as well as the number of convolutional and fully connected layers [26]. Because many other elements of the face vary, the traits employed for facial recognition must be constant. As a result, according to this research, the facial characteristics learned by the CNN on the classification task will not be used directly for face recognition. At the same time, the bottom layers of such CNNs contain characteristics like edges and corners, which perform better than those of CNNs pre-trained on datasets unrelated to faces, such as ImageNet. As shown in Fig. 6, these characteristics can be combined with a general classifier, such as a fully connected (FC) layer, to assess qualities specific to a single person, such as gender and ethnicity. A person's age, however, does not remain constant but changes over time; therefore, it may be predicted from the same input vector x but with more layers before the last FC layer. Because many other components of the face change over time, the facial features used for recognition have to remain consistent. For instance, the intra-class distance between features of the same individual showing different emotions must be much smaller than the inter-class distance between different people showing the same facial expression. As a result, according to this article, the face features produced by a CNN trained on the recognition task will not be used directly for emotion recognition [26, 27]. Unlike CNNs trained on unrelated datasets such as ImageNet, the bottom layers of such CNNs contain characteristics like edges and corners, which are a focus for future research.

Fig. 6 Methodology used for face recognition
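Two ideas from this section can be made concrete: how a SoftMax predictor turns raw FC-layer scores into a single expression label, and the intra-class versus inter-class distance condition for identity features. The scores and the 2D "embeddings" below are hypothetical numbers chosen for illustration, not outputs of the authors' model; the label order follows the FER2013 indices given earlier.

```python
import math

EXPRESSIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores, labels=EXPRESSIONS):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical FC-layer scores for one face.
label, prob = predict([0.2, -1.0, 0.1, 2.5, 0.3, 0.0, 1.1])

# Hypothetical 2D identity embeddings: person A with two expressions
# and person B with the same expression as A's first image.
a_happy, a_sad, b_happy = (0.9, 0.1), (0.85, 0.2), (0.1, 0.9)
intra = math.dist(a_happy, a_sad)    # same identity, different emotion
inter = math.dist(a_happy, b_happy)  # different identity, same emotion
```

An embedding with intra < inter is good for identifying people but, by design, nearly ignores expression; this is why the text argues that identity features should not be reused directly for emotion recognition.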

5 Result and Discussions Convolutional Neural Networks (CNNs) are very similar to ordinary neural networks: they are made up of neurons with learnable weights and biases. Each neuron receives some inputs, computes a dot product, and optionally applies a nonlinearity. The whole network expresses a single score function, from the raw image pixels at one end to class scores at the other. CNNs still have a loss function on the last (fully connected) layer, and all the techniques developed for training ordinary neural networks still apply [28]. We used the OpenCV package to capture live images from the webcam and the Haar cascade method for face detection. The Haar cascade uses the AdaBoost learning algorithm [28, 29], which selects a small number of critical features from a very large set to produce an efficient classifier. For image augmentation, we used the ImageDataGenerator class in Keras, as shown in Fig. 9.
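In practice, OpenCV exposes Haar cascade detection through `cv2.CascadeClassifier` and its `detectMultiScale` method. What makes a cascade fast is that each Haar-like feature is a difference of rectangle sums computed in constant time from an integral image; AdaBoost then picks a few of these features. The pure-Python sketch below shows only that feature computation, using a hypothetical 4 × 4 window with a dark upper half and a bright lower half; it is not how the authors' pipeline was coded.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect_vertical(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: upper half minus lower half (h must be even)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)

# Hypothetical 4 x 4 window: dark upper half, bright lower half.
window = [[1, 1, 1, 1],
          [1, 1, 1, 1],
          [5, 5, 5, 5],
          [5, 5, 5, 5]]
ii = integral_image(window)
feature = haar_two_rect_vertical(ii, 0, 0, 4, 4)  # 8 - 40 = -32
```

Because every rectangle sum needs only four lookups regardless of its size, thousands of candidate features can be evaluated per window, which is what allows AdaBoost to search such a large feature set.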

Table 1 Validation accuracy of CNN models for face recognition

Method            Accuracy (%)
MobileNet-v1      91.55
EfficientNet-B0   92.05
EfficientNet-B2   95.75
ResNet-150        90.89

Using this class, we were able to rotate, shift, shear, zoom, and flip the training images. Our CNN model was then extended with four convolutional layers, four pooling layers, and fully connected layers. We used the ReLU function to add nonlinearity to the CNN model, batch normalization to normalize the activations of the previous layer in each batch, and L2 regularization to penalize large values of individual model parameters. Table 1 shows the validation accuracy of the individual CNN models. Although ResNet-18 runs at a similar speed, the trained CNN's computational complexity and run time are lower than expected. MTCNN was used to identify the facial regions in each image [29]; when several faces are detected in a frame, the largest face region is selected. As can be seen, the investigated method has the best performance. After extensive pre-training on the dataset, MobileNet is 0.66 percentage points more accurate than ResNet-18, and the more efficient EfficientNet-B0 architecture improves accuracy by a further 0.5 percentage points. As a result, the ensemble classifier reaches its best accuracy (92.05%), even though the whole validation set is not processed because of face detection errors. When analyzing only the 379 validation videos with faces, the accuracy of EfficientNet-B0 and EfficientNet-B2 is 95.75% and 95%, respectively. It is worth noting that the EfficientNet-based model is the best single model for this dataset [30]; for example, it is 24% more accurate than the winner's DenseNet121 face model. Furthermore, improved face recognition can be expected to yield large gains in group-level emotion classification.
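The comparisons in this paragraph can be recomputed directly from Table 1. The sketch below copies the four table values and reports both absolute gains (percentage points) and relative gains (percent of the baseline), since the two are easily confused. Note that the text compares against "ResNet-18" while the table row is labeled "ResNet-150"; the sketch uses the table label as printed.

```python
# Validation accuracies copied from Table 1 (percent).
TABLE_1 = {
    "MobileNet-v1": 91.55,
    "EfficientNet-B0": 92.05,
    "EfficientNet-B2": 95.75,
    "ResNet-150": 90.89,
}

def gain_points(better, baseline, table=TABLE_1):
    """Absolute accuracy difference in percentage points."""
    return round(table[better] - table[baseline], 2)

def gain_relative(better, baseline, table=TABLE_1):
    """Relative improvement, in percent of the baseline accuracy."""
    return round(100 * (table[better] - table[baseline]) / table[baseline], 2)

ranking = sorted(TABLE_1, key=TABLE_1.get, reverse=True)
print(gain_points("MobileNet-v1", "ResNet-150"))        # 0.66
print(gain_points("EfficientNet-B0", "MobileNet-v1"))   # 0.5
print(gain_relative("MobileNet-v1", "ResNet-150"))      # 0.73
```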

6 Conclusion The performance of the EfficientNet method was evaluated experimentally in this work, using various training and test images to compute the overall results. So far, the most satisfactory results have come from convolutional neural networks; with sophisticated frameworks, accuracy rates of approximately 98% are possible. Despite this strong performance, CNNs come at a cost: large training datasets result in a significant computational load and memory use, requiring considerable computing capacity. This research examines a novel training strategy for compact neural networks with state-of-the-art accuracy in facial emotion identification in photos and videos across various datasets. It was demonstrated that, compared with previous models, pre-training the facial feature extractor for facial expression and attribute identification using multi-task learning makes it more robust to face extraction and alignment. Faces were cropped without margins, based on the regions supplied by face detectors. As a result, the approach combines high accuracy, high speed, and a small model size. A novel emotion identification system based on facial expression was addressed and tested on the Radboud Faces Database. The proposed approach achieved good results, with the highest average classification rate reaching 86.8%, and four of the seven expressions were recognized with a precision of 99%. Consequently, the estimated parameters of the proposed technique could be employed in embedded devices, such as mobile expert systems, to make quick decisions. Owing to the high quality of the face representations produced by the trained lightweight models, we only employed traditional classifiers (support vector machines and random forests) in our research. Thus, not all of our findings improve the effectiveness of current approaches.

References
1. Duong LT, Nguyen PT, Di Sipio C, Di Ruscio D (2020) Automated fruit recognition using EfficientNet and MixNet. Comput Electron Agric 171:105326
2. Ab Wahab MN, Nazir A, Ren ATZ, Noor MHM, Akbar MF, Mohamed ASA (2021) EfficientNet-lite and hybrid CNN-KNN implementation for facial expression recognition on raspberry pi. IEEE Access 9:134065–134080
3. Almadan A, Rattani A (2021) Towards on-device face recognition in body-worn cameras. arXiv preprint arXiv:2104.03419
4. Castellano G, Carolis BD, Marvulli N, Sciancalepore M, Vessio G (2021) Real-time age estimation from facial images using Yolo and EfficientNet. In: International conference on computer analysis of images and patterns. Springer, Cham, pp 275–284
5. Aruleba I, Viriri S (2021) Deep learning for age estimation using EfficientNet. In: International work-conference on artificial neural networks. Springer, Cham, pp 407–419
6. Yu J, Sun K, Gao F, Zhu S (2018) Face biometric quality assessment via light CNN. Pattern Recogn Lett 107:25–32
7. Sun Y, Wang X, Tang X (2013) Hybrid deep learning for computing face similarities. Int Conf Comput Vis 38:1997–2009
8. Singh R, Om H (2017) Newborn face recognition using deep convolutional neural network. Multimedia Tools Appl 76:19005–19015
9. Guo K, Wu S, Xu Y (2017) Face recognition using visible light and near-infrared images and a deep network. CAAI Trans Intell Technol 2:39–47
10. Hu H, Afaq S, Shah A, Bennamoun M, Molton M (2017) 2D and 3D face recognition using convolutional neural network. In: Proceedings of the TENCON 2017 IEEE region 10 conference, Penang, Malaysia, 5–8 Nov 2017, pp 133–138
11. Nam GP, Choi H, Cho J (2018) PSI-CNN: a pyramid-based scale-invariant CNN architecture for face recognition robust to various image resolutions. Appl Sci 8:1561
12. Khan S, Javed MH, Ahmed E, Shah SAA, Ali SU (2019) Networks and implementation on smart glasses. In: Proceedings of the 2019 international conference on information science and communication technology (ICISCT), Karachi, Pakistan, 9–10 Mar 2019, pp 1–6
13. Qin C, Lu X, Zhang P, Xie H, Zeng W (2019) Identity recognition based on face image. J Phys Conf Ser 1302:032049


14. Menotti D, Chiachia G, Pinto A, Schwartz WR, Pedrini H, Falcao AX, Rocha A (2015) Deep representations for iris, face, and fingerprint spoofing detection. IEEE Trans Inf Forensics Secur 10:864–879
15. Simón MO, Corneanu C, Nasrollahi K, Nikisins O, Escalera S, Sun Y, Greitans M (2016) Improved RGB-D-T based face recognition. IET Biom 5:297–303
16. Parkhi OM, Vedaldi A, Zisserman A (2015) Deep face recognition. BMVC 1:6
17. Zhu Z, Luo P, Wang X, Tang X (2014) Recover canonical-view faces with deep neural networks in the wild. arXiv:1404.3543
18. Guo S, Chen S, Li Y (2016) Face recognition based on convolutional neural network & support vector machine. In: Proceedings of the 2016 IEEE international conference on information and automation (ICIA), Ningbo, China, 1–3 Aug 2016, pp 1787–1792
19. Lawrence S, Giles CL, Tsoi AC, Back AD (1997) Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 8:98–113
20. Sun Y, Wang X, Tang X (2014) Deep learning face representation from predicting 10,000 classes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Columbus, OH, USA, 23–28 June 2014, pp 1891–1898
21. Sun Y, Chen Y, Wang X, Tang X (2014) Deep learning face representation by joint identification-verification. In: Proceedings of the advances in neural information processing systems 27, Montreal, QC, Canada, 8–13 Dec 2014, pp 1988–1996
22. Sun Y, Wang X, Tang X (2015) Deeply learned face representations are sparse, selective, and robust. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Boston, MA, USA, 7–12 June 2015, pp 2892–2900
23. Lu Z, Jiang X, Kot AC (2018) Deep coupled ResNet for low-resolution face recognition. IEEE Signal Process Lett 25:526–530
24. Wan J, Chen Y, Bai B (2021) Joint feature extraction and classification in a unified framework for cost-sensitive face recognition. Pattern Recogn 115:107927
25. Wang M, Deng W (2021) Deep face recognition: a survey. Neurocomputing 429:215–244
26. Saeed U, Masood K, Dawood H (2021) Illumination normalization techniques for makeup-invariant face recognition. Comput Electr Eng 89:106921
27. Alibraheemi JMAKH (2021) Deep neural networks based feature extraction with multi-class SVM classifier for face recognition. Des Eng 12833–12853
28. Saini S, Malhotra P (2021) A comprehensive survey of feature extraction and feature selection techniques of face recognition system
29. Ahmed S, Frikha M, Hussein TDH, Rahebi J (2021) Optimum feature selection with particle swarm optimization to face recognition system using Gabor wavelet transform and deep learning. BioMed Res Int. PMID: 33778071; PMCID: PMC7969091
30. Plichoski GF, Chidambaram C, Parpinelli RS (2021) A face recognition framework based on a pool of techniques and differential evolution. Inf Sci 543:219–241

Implementation and Analysis of Decentralized Network Based on Blockchain Cheshta Gupta, Deepak Arora, and Puneet Sharma

Abstract In the modern world of computation and technology, the Internet plays a very important role. Most services on the Internet are built on centralized networks, which offer good accessibility and security along with innovation. However, centralized networks are vulnerable to a fair share of cyber-attacks and security issues. Decentralized networks form the basis of blockchain technology, and blockchain offers a simple answer to ever-increasing cyber and security attacks on networks by removing the single point of failure. This is what decentralized networks entail: a decentralized network distributes information-processing workloads across multiple devices instead of relying on a single central server. Each of these separate devices serves as a mini central unit that interacts independently with other nodes. As a result, even if one of the master nodes crashes or is compromised, the remaining systems can continue providing data access to users, and the overall network continues to operate with limited or no disruption. In this paper, we present a way to create custom decentralized networks using Hyperledger Besu and establish how such a network can be used as a base for smart contracts. The paper shows how different nodes can be created in a network and how these nodes communicate with each other. Peer-to-peer interaction with the network is done through P2P ports. Keywords Decentralized networks · Blockchain · WSN · Authentication · Trust management

C. Gupta (corresponding author) · D. Arora · P. Sharma Department of Computer Science and Engineering, Amity University Lucknow Campus, Lucknow, Uttar Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_59


C. Gupta et al.

1 Introduction Technologies like decentralized networks and cloud computing are on the rise, and the concept of decentralized networks is being used in many places. Decentralized networks are becoming the norm due to the increased security and protection they provide. In layman's terms, a decentralized network is a network architecture in which the data and the workload are divided among several machines and servers instead of a single server. Distributing the data over multiple servers increases the reliability of the system and makes it more difficult for attackers to access. The greatest advantage of decentralized networks is that there is no single point of failure: the user does not depend on a single server to perform all processing and to access the data, so the failure of one server does not affect the functionality of the system. Decentralized and blockchain networks are also easy to scale, because new machines or servers can simply be added to increase computation power. Decentralized networks also strengthen data privacy, because user information does not pass through a single point but across many points, which makes it difficult for anyone to track this data across the network. The biggest downside of decentralized networks is that they require many machines, and therefore many resources, to set up properly. Figure 1 illustrates the difference in how centralized and decentralized networks work. Decentralized networks give rise to an entirely new area of development known as decentralized app development (dApps for short). These apps are programs that run on a blockchain or decentralized network instead of a traditional centralized network of computers.
Advantages of dApps include the protection of user data, the absence of censorship and restrictions, and flexibility for improvements and scalability. The downsides include possible challenges in designing and developing a UI and potential difficulties in making code changes and improvements. Many of the advantages of dApps stem from the program's capacity to protect the integrity and privacy of user data. With decentralized applications, users never have to provide their personal data to use the application fully. dApps use smart contracts to complete a transaction between two anonymous parties without depending on a central authority. This is exactly what decentralized networks mean and enforce.
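The "no single point of failure" argument in the introduction can be quantified with a simple availability model. The sketch assumes that servers fail independently with the same probability and that any surviving server holds a full copy of the data and can answer the request; both are simplifying assumptions not made in the paper, and real deployments have correlated failures and consensus overhead.

```python
def availability(p_fail, replicas):
    """Probability that at least one of `replicas` servers is reachable, assuming
    each fails independently with probability p_fail and any live server can
    answer the request: 1 - p_fail ** replicas."""
    if not 0.0 <= p_fail <= 1.0 or replicas < 1:
        raise ValueError("need 0 <= p_fail <= 1 and replicas >= 1")
    return 1.0 - p_fail ** replicas

centralized = availability(0.1, 1)    # one server: about 0.9
decentralized = availability(0.1, 3)  # three full replicas: about 0.999
```

Under these assumptions, each extra replica multiplies the probability of a total outage by p_fail, which is why adding machines improves reliability as well as capacity.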

2 Literature Study Several researchers have proposed models for decentralized networks. Some of them are described below:

Implementation and Analysis of Decentralized Network Based …


Fig. 1 Centralized versus decentralized networks

Nakamoto [1] proposed a peer-to-peer electronic cash system in which cash can be transferred from one party to another without the involvement of an intermediary or financial institution. The paper presents an electronic transaction system based on cryptographic proof instead of trust, which allows two users to transact directly with each other without the need for a trusted third party. Cinque et al. [2] studied blockchain-empowered decentralized trust management for Internet of Vehicles (IoV) security. IoV requires dynamic access control, in which authentication and trust must follow the security policies, which is why a trust management system based on a decentralized network is preferable. The authors proposed a trust management system for IoT and IoV that uses the security features and guarantees of blockchain. Montresor [3] discussed the possibility of using P2P connections for network analysis. The paper describes potential problems that arise when network analysis is performed in a decentralized way and proposes a preliminary procedure to tackle them. Dabek et al. [4] presented Vivaldi, a decentralized network coordinate system: a simple, lightweight algorithm that predicts the communication latency between two hosts by assigning synthetic coordinates to hosts. Fragouli and Soljanin [5] proposed new deterministic algorithms for network coding in decentralized networks. According to the authors, these algorithms allow the coding operations at network nodes to be specified locally, without any knowledge of the network topology, and they make it easier to add new receiver nodes.
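The idea of replacing trust with cryptographic proof, as in Nakamoto's design, can be illustrated with a toy hash-linked chain. This is a teaching sketch, not Bitcoin: Bitcoin hashes a binary block header with double SHA-256 and also relies on digital signatures, a Merkle tree of transactions, and network-wide consensus, none of which are modeled here. The difficulty of 2 leading hex zeros is an arbitrary choice that keeps the example fast.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(block):
    """SHA-256 of a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def mine_block(index, data, prev_hash, difficulty):
    """Toy proof of work: increment the nonce until the hash starts with
    `difficulty` hex zeros. Returns the block and its hash."""
    nonce = 0
    while True:
        block = {"index": index, "data": data, "prev_hash": prev_hash, "nonce": nonce}
        digest = block_hash(block)
        if digest.startswith("0" * difficulty):
            return block, digest
        nonce += 1

def build_chain(records, difficulty=2):
    """Mine one block per record, each linked to the hash of the previous block."""
    chain, prev = [], GENESIS_PREV
    for i, record in enumerate(records):
        block, prev = mine_block(i, record, prev, difficulty)
        chain.append(block)
    return chain

def is_valid(chain, difficulty=2):
    """Every block must link to its predecessor's hash and meet the work target."""
    prev = GENESIS_PREV
    for block in chain:
        digest = block_hash(block)
        if block["prev_hash"] != prev or not digest.startswith("0" * difficulty):
            return False
        prev = digest
    return True

chain = build_chain(["A pays B 5", "B pays C 2", "C pays A 1"])
```

Editing any earlier record changes that block's hash, which breaks the `prev_hash` link of the next block; an attacker would have to redo the proof of work for every later block, which is what makes the history tamper-evident.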


Lian et al. [6] compared the performance of centralized and decentralized approaches and examined whether decentralized networks can outperform centralized ones. Centralized networks are highly dependent on the main central system or node, so the authors investigated whether decentralized networks, in which the load is divided among multiple machines, can perform better. They analyzed the D-PSGD (decentralized parallel stochastic gradient descent) algorithm and showed that on networks with low bandwidth or high latency, D-PSGD can outperform even optimized centralized approaches. In a recent work, Maksymyuk et al. [7] discussed the potential application of decentralized networks in 6G. They proposed decentralized network management for 6G using a blockchain-empowered network, and they argued that spectrum management must be handled in a completely new way to improve on existing 5G networks, with blockchain integration into the network infrastructure as a potential step in the right direction. Benisi et al. [8] studied decentralized storage networks based on blockchain. They also discussed how blockchain can be used to form a P2P connection between hosts without the involvement of any third party, and they proposed that decentralized storage networks can ensure privacy through end-to-end encryption, which enables secure transfer of data through a decentralized network and eliminates issues that arise in centralized networks.
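The core D-PSGD update analyzed by Lian et al. has each worker average its model with its neighbors and then take a step along its own local gradient, with no central parameter server. The sketch below is a deterministic toy version: exact gradients instead of stochastic ones, a ring topology with equal mixing weights of 1/3, and quadratic local losses f_i(x) = ½(x − t_i)². These choices are illustrative assumptions, not the setup of the cited paper.

```python
def dpsgd_ring(targets, steps=500, lr=0.05):
    """Toy decentralized gradient descent on a ring of workers.

    Worker i holds the local loss f_i(x) = 0.5 * (x - targets[i]) ** 2.
    Each step, every worker averages its model with its two ring neighbours
    (weight 1/3 each) and subtracts lr times its own local gradient,
    evaluated at its pre-averaging model. The workers jointly minimise
    the average loss, whose optimum is mean(targets).
    """
    n = len(targets)
    if n < 3:
        raise ValueError("a ring with distinct neighbours needs at least 3 workers")
    x = [0.0] * n
    for _ in range(steps):
        mixed = [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3 for i in range(n)]
        x = [mixed[i] - lr * (x[i] - targets[i]) for i in range(n)]
    return x

models = dpsgd_ring([1.0, 2.0, 3.0, 4.0])
average_model = sum(models) / len(models)  # converges to 2.5
```

Each worker only exchanges models with its two neighbors per step, so no single link or node carries all the traffic; with a constant step size the individual models settle within a small neighborhood of the consensus value while their average converges to the optimum.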

3 Methodology Blockchain and decentralized networks are steadily taking firm root in the modern technological world. The most extensive and widely used application of blockchain technology is cryptocurrency development and transactions, but the concept of decentralization is much wider and not limited to smart contracts and transactions. The platform for decentralized networks that this paper focuses on is Hyperledger. Hyperledger is a large open-source blockchain effort: it offers the essential designs, standards, rules, and tools to develop and implement open-source blockchains, and it also provides decentralized networks and related applications for use across various fields and industries. It is a nonprofit initiative that is mainly used to pool assets and resources for open-source blockchain projects, and it provides a stable environment for open-source development. Hyperledger Besu, one of its projects, is a Java-based Ethereum client intended mainly for business and enterprise use. This makes Hyperledger Besu well suited to public and private permissioned networks. It can also run on test networks such as Rinkeby, Ropsten, and Görli.


This paper outlines how to set up a Hyperledger Besu network in various modes and then deploy and run decentralized applications on it. There are two ways to set up the Besu client and the decentralized networks on it. The first is to use the binary distribution, run the Besu client locally, and test different networks on it. The second is to run the Besu client and all network-related work inside a Docker environment. This paper uses Docker to install the Besu client, set up privacy-enabled Besu networks with off-chain and on-chain permissioning, run the smart contract and the decentralized application, and develop an API server that is used to interact with the smart contract.

4 Implementation In this research work, the authors used Hyperledger Besu to set up a decentralized network in a Docker environment, so Docker is a prerequisite. Hyperledger Besu is used to create various open-source blockchains through the Besu client, which is available from the Hyperledger Besu website and can be installed from a zip file or built from source. To implement a decentralized network, Docker is first installed, set up, and started. Next, the Besu image must be pulled into Docker: on the Hyperledger Besu website, the option ‘Run Besu from Docker Image’ is selected, which gives the command needed to pull the Besu image and start a Besu node. This command sets up all the prerequisites for a Besu node inside Docker that is connected to the Ethereum MainNet network. The command ‘docker pull hyperledger/besu’ downloads the Besu image to the local machine. To start the Besu node in Docker, the command ‘docker run hyperledger/besu:latest’ is used. This initiates a connection to MainNet and sets up a local connection with the local host and DNS. Establishing the connection involves waiting for peers to connect one by one, which takes time depending on the quality of the local host's network and the hardware on which Docker is running. By default, it will try to connect with five peers to establish the connection. The next step is to interact with the network created earlier. Interaction with the Besu network is done through ports: the P2P port needs to be exposed for discovery, and other ports can also be exposed for metrics and for HTTP and WebSocket access.
This time, the ‘docker run’ command takes additional arguments that enable interaction with the node: ‘docker run -p 8545:8545 -p 13001:30303 hyperledger/besu --rpc-http-enabled’. The ‘-p’ option publishes a container port on the host: host port 8545 is mapped to port 8545 of the container (the HTTP JSON-RPC port), and host port 13001 is mapped to port 30303 of the container (the P2P port).


C. Gupta et al.

The ‘hyperledger/besu’ argument specifies the image used to start the node. Finally, the ‘--rpc-http-enabled’ flag enables the HTTP JSON-RPC service. Running this command starts the image in Docker and connects it to the Ethereum network; once again, the node connects to peers. To test the connection, the ‘curl’ command can be used to send a ‘POST’ request. The request body is a JSON-RPC payload containing the ‘jsonrpc’ version and the method that the user wants to invoke. For testing, the method is ‘net_version’, which returns the network ID. The payload also includes a request ‘id’, and the request is sent to port 8545, the locally mapped port of the Besu node. The response echoes the ‘jsonrpc’ version and ‘id’ of the request together with the result. This confirms that the Besu client is connected to the Ethereum MainNet and that the user can communicate with the Besu node through the locally mapped ports. The other JSON-RPC API methods listed in the Besu documentation can then be explored, either by importing the Besu POSTMAN collection and making the calls from POSTMAN, or by calling them directly from the command line. A natural first step is to retrieve the chain ID of the network: executing the corresponding curl HTTP request from the Besu documentation returns the chain ID in hexadecimal notation. For further testing, a second Besu node can also be created. For this node, the Ropsten test network is a suitable choice instead of MainNet; it is selected with the ‘network’ option in the command that starts the node.
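The node launch and the curl test above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`besu_docker_args`, `rpc_call`, and so on) are hypothetical, the node is assumed to be reachable at `http://127.0.0.1:8545` through the port mapping described above, and `eth_chainId` is the standard Ethereum JSON-RPC method that returns the chain ID as a hex string. The `--network=ropsten` value only applies to Besu releases that still supported the Ropsten test network.

```python
import json
import urllib.request

# Hypothetical helpers (not part of Besu) that mirror the commands in the text.
BESU_IMAGE = "hyperledger/besu:latest"  # Docker repository names must be lowercase


def besu_docker_args(host_rpc_port=8545, host_p2p_port=13001, network=None):
    """Build `docker run` arguments that publish the container's HTTP JSON-RPC
    port (8545) and P2P port (30303) on the given host ports."""
    args = [
        "docker", "run",
        "-p", f"{host_rpc_port}:8545",   # host:container, HTTP JSON-RPC
        "-p", f"{host_p2p_port}:30303",  # host:container, P2P
        BESU_IMAGE,
        "--rpc-http-enabled",            # enable the HTTP JSON-RPC service
    ]
    if network is not None:
        args.append(f"--network={network}")  # e.g. "ropsten" for a second node
    return args


def rpc_payload(method, params=None, request_id=1):
    """A JSON-RPC 2.0 request body, like the one sent with curl."""
    return {"jsonrpc": "2.0", "method": method, "params": params or [], "id": request_id}


def rpc_call(method, params=None, url="http://127.0.0.1:8545"):
    """POST a JSON-RPC request to a local Besu node and return its `result` field."""
    data = json.dumps(rpc_payload(method, params)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.loads(response.read().decode("utf-8"))
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]


def hex_quantity_to_int(value):
    """Decode a hex-encoded quantity such as the result of eth_chainId ("0x1")."""
    return int(value, 16)
```

The argument list can be passed to `subprocess.run`, which runs the node in the foreground like the command in the text. Once the node is up, `rpc_call("net_version")` should return `"1"` on MainNet and `hex_quantity_to_int(rpc_call("eth_chainId"))` should return `1`. A second node would change only the host ports, for example `besu_docker_args(8546, 13002, "ropsten")`; the container-side ports stay the same.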
The host ports must also be changed, since the original ports are already occupied by the MainNet node. Figure 2 shows the console window for setting up a privacy-enabled decentralized network.
• Generating private–public key pairs The previous section connected a Besu node to the existing MainNet and Ropsten networks. A private Besu network can also be set up manually in Docker. To do this, public–private key pairs are generated for all the nodes, and a docker-compose file is created in which all the services of the network are defined. First, a configuration file in JSON format is created. Hyperledger Besu provides the ‘besu operator’ tool, which takes the configuration file of the private network as input and generates the private–public key pairs it describes; the number of key pairs generated depends on the number of nodes specified in the configuration file. Next, a docker-compose file is created that specifies the compose file version, the network to be created (“Besu-network”), the network driver (‘bridge’), and the subnet of the network. The subnet also determines how many IP addresses are available to the private network. It is required because each Docker container must have a unique IP address, and these IPs should

Implementation and Analysis of Decentralized Network Based …


Fig. 2 Private network console

be known, because joining a node to the boot node requires that node's address and IP address. The boot node, the other nodes, and their services are also defined in the docker-compose file. Each service definition includes the command used to start the node, the shell in which that command runs, the locations of the genesis file and the data directory, the RPC and P2P ports used to interact with the network, and an IPv4 address assigned to the node's Docker container. The public keys of the other nodes are also included in the docker-compose file. Figure 3 shows the configuration file for the network and the nodes in it.
• Setting up a network with privacy enabled A prebuilt repository is available for setting up custom private networks with authentication and various privacy measures enabled. It can be set up on the local system from Quorum Developer Quickstart by executing the command ‘npx quorum-dev-quickstart’, which then presents a series of options. The resulting network runs on Hyperledger Besu with support for private transactions enabled, and it uses Orion nodes as its privacy manager; Orion manages privacy among the nodes. Once the network is started by executing the ‘run.sh’ script, the web block explorer can be opened at the localhost address on which the network is running. The block explorer shows whether the network is running correctly and how many


Fig. 3 Configuration file for the network

blocks have been mined. The Grafana dashboard can also be opened to see an overview of all the validator and member nodes running on the network, along with the block time of each node. Different smart contracts can now be deployed on this network, and various types of transactions can be made. Figure 4 shows the Grafana dashboard, which is served at a localhost URL created by Hyperledger Besu; this URL is printed in the terminal in which the private network is running. The dashboard can be used to compare and check the transactions and communication between the different nodes of the network. It shows the validator nodes running on the network along with the RPC and Orion nodes used for privacy. The statistics shown include chain height, total difficulty, and the time since the last block was mined. The graphs in Fig. 4 show the CPU and memory usage of the network, so this dashboard can be used to visualize and monitor the network.
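The docker-compose structure described in this section (a bridge network with a fixed subnet, a unique static IPv4 address per container, and nodes that join through the boot node) can be sketched as a Python dictionary with the same shape as a compose file. This is a hypothetical sketch: the subnet, service names, and helper functions are illustrative, and the genesis-file, data-directory, and key-file options that a real Besu service also needs are omitted. It relies on Besu's `--bootnodes` option, which takes enode URLs of the form `enode://<node public key>@<ip>:<p2p port>`.

```python
import ipaddress

NETWORK_NAME = "besu-network"  # hypothetical name; the text calls it "Besu-network"


def assign_node_ips(subnet, node_names):
    """Give every node a unique, predictable IPv4 address inside the bridge subnet.
    The first usable host address is skipped because Docker normally uses it as the gateway."""
    hosts = list(ipaddress.ip_network(subnet).hosts())[1:]
    if len(node_names) > len(hosts):
        raise ValueError("subnet is too small for the requested number of nodes")
    return {name: str(ip) for name, ip in zip(node_names, hosts)}


def enode_url(public_key_hex, ip, p2p_port=30303):
    """Format a node's enode URL: enode://<128-hex-char public key>@<ip>:<port>."""
    key = public_key_hex[2:] if public_key_hex.startswith("0x") else public_key_hex
    if len(key) != 128:
        raise ValueError("expected a 64-byte (128 hex character) node public key")
    return f"enode://{key}@{ip}:{p2p_port}"


def build_compose(subnet, node_names, bootnode_public_key):
    """Return a dict shaped like a docker-compose file: one bridge network with a
    fixed subnet and one service per node with a static IPv4 address.
    The first node is the boot node; the others point to it with --bootnodes."""
    ips = assign_node_ips(subnet, node_names)
    bootnode = node_names[0]
    boot_enode = enode_url(bootnode_public_key, ips[bootnode])
    services = {}
    for name in node_names:
        command = ["--rpc-http-enabled"]
        if name != bootnode:
            command.append(f"--bootnodes={boot_enode}")
        services[name] = {
            "image": "hyperledger/besu:latest",
            "command": command,
            "networks": {NETWORK_NAME: {"ipv4_address": ips[name]}},
        }
    return {
        "services": services,
        "networks": {
            NETWORK_NAME: {"driver": "bridge", "ipam": {"config": [{"subnet": subnet}]}}
        },
    }
```

Serializing the dictionary with a YAML library would yield a compose file. The design point is that fixing each container's IP in advance makes the boot node's enode URL known before any container starts.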

5 Conclusion This paper describes the implementation of a custom privacy-enabled decentralized network and how to create root (boot) nodes and peer nodes. It also analyzes the performance of the decentralized network while it is running and shows how to interact with it on the local host. After the implementation, the next step is to deploy and run smart contracts on the network to check how it performs and to examine the process of creating decentralized applications on it. Finally, an analysis of the performance of the network while smart contracts are running on it, as well as a comparison with


Fig. 4 Grafana dashboard for statistics

other networks, can be carried out in future research. The applications of these decentralized networks are discussed below. The most basic application of decentralized networks is decentralized apps (dApps), which work like ordinary apps except that a decentralized network serves as their backend. Decentralized networks and blockchain have many possible applications; perhaps the most prominent are optimizing logistics value chains and increasing trust in them, quality control in manufacturing, and transparency in maintenance models for equipment ranging from simple to sophisticated. DApps, DAOs, and smart contracts are among the many applications that decentralized networks enable. These applications will be very useful for private data transfers and secure transactions that resist theft and manipulation. Creating dApps on a private decentralized network is usually the first step, and this requires smart contracts. Such applications can be more versatile and offer much stronger protection of users' privacy and data.

References
1. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system. Bitcoin. https://bitcoin.org/bitcoin.pdf
2. Cinque M, Esposito C, Russo S, Tamburis O (2020) Blockchain-empowered decentralised trust management for the internet of vehicles security. Comput Electr Eng 86:106722
3. Montresor A (2008) Decentralized network analysis: a proposal. In: Proceedings of the workshop on enabling technologies: infrastructure for collaborative enterprises, WETICE, pp 111–114. http://doi.org/10.1109/WETICE.2008.36
4. Dabek F, Cox R, Kaashoek F, Morris R (2004) Vivaldi: a decentralized network coordinate system. ACM SIGCOMM Comput Commun Rev 34(4):15–26
5. Fragouli C, Soljanin E (2004) Decentralized network coding. In: Information theory workshop. IEEE, pp 310–314


6. Lian X, Zhang C, Zhang H, Hsieh CJ, Zhang W, Liu J (2017) Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In: Advances in neural information processing systems, vol 30
7. Maksymyuk T, Gazda J, Volosin M, Bugar G, Horvath D, Klymash M, Dohler M (2020) Blockchain-empowered framework for decentralized network management in 6G. IEEE Commun Mag 58(9):86–92
8. Benisi NZ, Aminian M, Javadi B (2020) Blockchain-based decentralized storage networks: a survey. J Netw Comput Appl 162:102656

Social Distance Monitoring Framework Using YOLO V5 Deep Architecture D. Akshaya, Charanappradhosh, and J. Manikandan

Abstract Due to the current outbreak and rapid spread of the COVID-19 pandemic, the general public needs to comply with social distancing rules. To practice strict social distancing, as recommended by the World Health Organization for public safety, people need to keep a distance of at least 3 ft or 1 m from one another. Researchers have proposed many deep learning-based solutions to help contain the pandemic, including COVID-19 screening, diagnosis, and social distancing monitoring. This work focuses specifically on monitoring social distancing with a deep learning approach. We apply the YOLOv5 object detection technique to images and videos to develop a strategy that helps enforce strict social distancing in public. The YOLOv5 algorithm is more robust and detects objects faster than its competitors. The proposed object detection framework achieves an average precision of 94.75%. Some existing approaches struggle to identify humans within a given range: identification errors occur because of overlapping video frames or because people walk or stand too close to each other. This paper focuses on identifying humans correctly by overcoming the problem of frame overlap. The proposed social distancing technique also yields positive results in a variety of scenarios. Keywords YOLO · Object detection · Deep learning

D. Akshaya · Charanappradhosh · J. Manikandan (B) Rajalakshmi Engineering College, Thandalam, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_60

1 Introduction COVID-19 is a highly contagious viral illness that can cause severe breathing difficulties. It has caused more than 6.1 million deaths worldwide. After the influenza pandemic of 1918, COVID-19 has proved to be another dangerous illness, resulting in a global health crisis. On March 11th of 2020, the


World Health Organization (WHO) declared COVID-19 a global pandemic. The first cases of a predominantly respiratory illness caused by the virus had been reported in Wuhan, Hubei province, China, in December 2019, and the disease spread around the world within a short period. In response to the present pandemic situation [1–5], social distancing has become an essential protocol for controlling the outbreak of this contagious virus in a locality. Following social distancing means that people keep a distance of about 1.5 m from one another to restrain the spread of infection [6]. The virus is transmitted through airborne droplets, which are produced when we sneeze, cough, or talk, so social distancing is one of the most effective ways to limit its spread. To prevent the spread among people globally, one should consciously wear a mask, wash hands frequently, and use hand sanitizers (alcohol-based hand rub). To contain the disease, the World Health Organization has recommended that nations improve case identification and contact tracing, isolate patients from close contacts, and manage the movement of visitors [7]. In the United States, following social distancing has produced an estimated economic benefit of roughly 8 trillion dollars, which is more than one third of US GDP. Other, non-monetized benefits of social distancing include proper treatment of non-COVID-19 patients, shorter lockdown periods, more time to develop a vaccine, and improved health resources. Thus, it is vital to promote proper social distancing [8].

2 Literature Review and Related Works In this section, we briefly present previous work on social distancing in the context of the infectious COVID-19 disease. Researchers began to address the situation when the disease started to spread in the last weeks of December 2019. Social distancing emerged as an alternative solution to the problem, and various studies have examined how to implement it properly. In this context, Prem et al. studied the effects of social distancing measures on the spread of the coronavirus epidemic in Wuhan, China. They used synthetic location-specific contact patterns to model different social distancing strategies and simulated the trajectory of the outbreak with an age-structured SEIR model. They estimated that a sudden lifting of interventions could lead to an earlier secondary peak, although the curve would flatten over time. They also noted that, although social distancing is crucial for flattening the curve, it is economically costly [9]. In addition, researchers have begun to use artificial intelligence to identify the environmental and observational factors that contribute to epidemics and to predict the spread of infectious diseases before they manifest in the real world [10]. Adolph et al. examined the situation in the USA: discussing the responses of individual states to social distancing, they found differing opinions among politicians and policymakers, which


delayed the adoption of social distancing protocols and resulted in wider spread among the general public. On the positive side, implementing social distancing greatly helped to minimize the spread of the illness, although it caused an economic setback [11]. Ramadass et al. proposed a system that monitors social distancing with an automated drone. The drone sends alert signals to nearby police stations and also triggers public alarms. The proposed device detects objects with the YOLOv3 object detection algorithm, which uses a deeper convolutional neural network with more layers. Additionally, the system can deliver masks to individuals who are not wearing one and offers guidance on wearing a mask and keeping social distance. Many researchers have carried out substantial work on detection, and some present intelligent medical management systems for the pandemic using IoT [12]. Prem et al. examined how social distancing slows the progression of the COVID-19 outbreak and concluded that prompt implementation of social distancing gradually lowers the severity of the outbreak. Even though social distancing is essential to flatten the curve, it is economically costly [13–15]. Punn et al. [4] proposed a model that detects people with the YOLOv3 algorithm and tracks the identified people with the DeepSORT method; each detected person was assigned an ID and a bounding box. For a frontal-view dataset, they used an open image dataset, and they compared their experimental results with SSD and Faster R-CNN. Lu et al. proposed YOLO-compact, a well-structured YOLO architecture for real-time single-category detection [16].
The authors observed that, in most practical applications, object detection involves only a single category and the amount of training data is usually small. They aimed to produce faster detection for different scenarios and, through various experiments, derived a compact and efficient structure from YOLOv3. Ullah proposed a model based on YOLO object detection that can run on computers without a GPU [12, 17, 18].

3 Proposed Algorithm The proposed system, a social distance analysis tool, was developed with Python, computer vision, and deep learning to measure the distance between people and help keep them safe. The model builds on YOLOv3 and combines computer vision, deep learning, and convolutional neural network algorithms. First, to detect people in a video frame or an image, the YOLOv3 algorithm is applied through an open-source object detection framework built on YOLOv3. From the detection results, we keep only the “person” class and ignore all other object categories. Bounding boxes are drawn in the video frames, and the distances between people are then calculated from these results. YOLOv3


works quickest of the three detectors, followed closely by SSD, with Faster R-CNN in last place. YOLOv3 is therefore the best choice for live-streamed video, while SSD offers an excellent balance between speed and accuracy. Additionally, YOLOv3 is the most recently released of the three and is actively used by a large open-source community.
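The first step of the pipeline, keeping only the person detections and reducing each box to its centroid, might look like the sketch below. The `Detection` record, the `"person"` label string, and the 0.5 confidence threshold are assumptions for illustration; a real YOLOv3 framework returns its own output format, which would first be converted into records like these.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One detector output; this record format is an assumption, not YOLOv3's native output."""
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x, y, width, height), top-left origin


def keep_people(detections: List[Detection], min_confidence: float = 0.5) -> List[Detection]:
    """Keep only 'person' detections at or above a confidence threshold."""
    return [d for d in detections if d.label == "person" and d.confidence >= min_confidence]


def centroid(box: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Centre of a bounding box, used later for tracking and distance measurement."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)
```

Filtering before tracking keeps cars, bags, and other objects out of the later pairwise distance computation.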

3.1 Object Detection In this paper, the YOLOv3 algorithm is used for object detection because it improves prediction accuracy, especially for small objects. Its most important advantage is a network layout tuned for detecting objects at multiple scales. To classify objects, it uses independent logistic classifiers. Feature learning is performed by convolutional layers organized into residual blocks. Transfer learning is applied to improve the effectiveness of the model: the framework is further trained in a way that does not discard the valuable information in the existing model, and a new layer trained on overhead data is appended to the existing architecture. In this way, the framework benefits from both the previously trained and the newly trained data, and the results are delivered faster and with higher precision. We focus only on the pedestrians that are identified and ignore any other detected objects.
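YOLOv3's use of independent logistic classifiers means that each class score passes through its own sigmoid instead of a shared softmax. The sketch below contrasts the two scoring rules on made-up logits; it illustrates the rule only and is not taken from any YOLO implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def independent_logistic(logits):
    """One sigmoid per class: each score is an independent yes/no probability,
    so several classes can exceed 0.5 at the same time (multi-label)."""
    return [sigmoid(z) for z in logits]


def softmax(logits):
    """For comparison: softmax scores compete with each other and always sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With equal logits of 2.0, both sigmoid scores are about 0.88, whereas softmax splits them to 0.5 each; independent outputs therefore suit labels that can overlap.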

3.2 Object Tracking To track an object, we start with all detections in a frame and assign each of them an ID. In subsequent frames, the object's ID is carried forward, and the ID is dropped when the object disappears from the frame. First, when a person is detected, a bounding box is drawn, its centroid is computed, and an ID is assigned to that person. When a person moves from one position to another, the Euclidean distances between the existing centroids and the new centroids are computed to track the new position; the new centroid at the shortest distance is taken as that person's new centroid. The schematic representation of the general hierarchy for following social distancing is given in Fig. 1.
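The tracking steps above (assign an ID to each new centroid, carry it forward to the nearest centroid in the next frame, drop it when the person leaves) can be sketched as a small greedy nearest-centroid tracker. The matching rule here (shortest Euclidean distance first, each ID and each detection used once) and the `max_distance` gate are assumptions, since the paper does not specify these details.

```python
import math
from itertools import count


class CentroidTracker:
    """Minimal centroid tracker: match existing IDs to the nearest new centroids
    (shortest Euclidean distance first), give unmatched centroids new IDs, and
    drop IDs that find no match in the current frame."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # hypothetical gating threshold, in pixels
        self.objects = {}                 # object ID -> last known centroid (x, y)
        self._next_id = count()

    def update(self, centroids):
        # Every (distance, ID, detection index) candidate, shortest distance first.
        candidates = sorted(
            (math.dist(previous, current), object_id, index)
            for object_id, previous in self.objects.items()
            for index, current in enumerate(centroids)
        )
        matched_ids, matched_indices, updated = set(), set(), {}
        for distance, object_id, index in candidates:
            if distance > self.max_distance:
                break  # all remaining candidates are even farther away
            if object_id in matched_ids or index in matched_indices:
                continue
            updated[object_id] = centroids[index]
            matched_ids.add(object_id)
            matched_indices.add(index)
        # People entering the frame get fresh IDs.
        for index, current in enumerate(centroids):
            if index not in matched_indices:
                updated[next(self._next_id)] = current
        self.objects = updated  # IDs without a match are dropped
        return dict(self.objects)
```

Dropping an ID as soon as a person is missed matches the description above; practical trackers often keep an ID alive for a few frames to tolerate missed detections.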

3.3 Distance Measurement Given the bounding box and centroid of each detected individual, we can compute the distances between the detected people to find violations. The Euclidean distance formula


Fig. 1 Hierarchy to follow social distancing

measures the distance between all pairs of identified centroids [19, 20]. For each frame of the input video, we first compute the bounding box centroids and then the distance (represented by red lines) between every pair of detected centroids. The centroids are stored in a list. A violation threshold is then set to check whether any two people are fewer than a threshold number of pixels apart. If the measured distance breaches the minimum social distance, that is, if two individuals are too close, the pair is added to the violation set. Each bounding box is initially colored green; the indices are then checked against the violation set, and if the current index is in the set, its color is changed to red. In addition, a centroid tracking algorithm is used to track the detected individuals across video frames, which helps follow the individuals who violate the social distancing threshold. At the output, the framework reports the number of social distancing violations.

The focal length is estimated as

$$F = \frac{P \times D}{W}$$

and the distance as

$$D' = \frac{W \times F}{P}$$
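The pairwise check, the green/red coloring, and the two focal-length formulas can be sketched as follows. In the usual triangle-similarity reading of these formulas, W is the known real width of a reference object, D its known distance from the camera, P its apparent width in pixels, and D' the estimated distance of an object of width W that appears P pixels wide. That interpretation, the pixels-per-meter calibration helper, and all numbers below are assumptions, since the text does not define them.

```python
import math
from itertools import combinations


def pixel_threshold(min_distance_m, pixels_per_meter):
    """Hypothetical calibration: convert the physical minimum distance into pixels."""
    return min_distance_m * pixels_per_meter


def find_violations(centroids, min_pixels):
    """Indices of people whose centroid is closer than `min_pixels` to at least
    one other centroid (pairwise Euclidean distance)."""
    violations = set()
    for i, j in combinations(range(len(centroids)), 2):
        if math.dist(centroids[i], centroids[j]) < min_pixels:
            violations.update((i, j))
    return violations


def box_colors(count, violations):
    """Green by default, red if the index is in the violation set."""
    return ["red" if i in violations else "green" for i in range(count)]


def focal_length(pixel_width, known_distance, known_width):
    """F = (P x D) / W, from a reference object at a known distance."""
    return (pixel_width * known_distance) / known_width


def distance_to_camera(known_width, focal, pixel_width):
    """D' = (W x F) / P, for an object of known width that appears P pixels wide."""
    return (known_width * focal) / pixel_width
```

For example, a reference person 50 cm wide who appears 100 pixels wide at 200 cm gives F = 400 pixels; the same person appearing 50 pixels wide is then estimated to be 400 cm away.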


4 Experiment and Result Discussion A pre-recorded video of people in a busy location is used as input. Every person detected in a video frame is marked with a bounding box, along with points and circles. A person whose measured distance to another person is less than the minimum acceptable threshold is highlighted with a red bounding box, and a person who keeps a safe distance from the others is marked with a green bounding box. The accuracy of the measured distance between individuals depends on the algorithms used. The YOLOv3 framework can still detect pedestrians when only part of the body is visible, drawing a bounding box around the visible part. In that case, the person's location, taken as the center point of the bottom edge of the bounding box, is less precise. To reduce the inaccuracy caused by overlapping frames, a quadrilateral box is overlaid to show the range.

4.1 Dataset Description Four general-purpose, manually labeled multi-object datasets were examined: ImageNet ILSVRC [21], Google Open Images Datasets V6+ [22], PASCAL VOC [24], and Microsoft COCO [23]. Together, the data contained 16 million ground-truth bounding boxes in 600 categories. An analysis of the data showed a total of 21,947 classes, the majority of which were suitable for human detection and identification. The dataset was annotated with the bounding box tags of every image and the corresponding coordinates of every label. To train the model, we used 50 epochs and a batch size of 64. We added an intermediate layer to the architecture that reduces training loss and test error. Fast-RCNN (pre-trained), Faster-RCNN (pre-trained), Mask-RCNN (pre-trained), YOLOv3 (pre-trained), and YOLOv5 are compared in this phase; these state-of-the-art approaches were used for detecting objects in images. Fig. 2 shows an output screenshot of our model, and Fig. 3 shows its training and testing loss. Our experiments used a variety of performance evaluation measures, which are documented in Table 1, and the corresponding graph is given in Fig. 4. During training, we used Stochastic Gradient Descent (SGD) with warm restarts. This helped the optimizer escape local minima and saved time during training. The schedule starts with a high learning rate, decreases it more steeply midway through each cycle, and lowers it for every batch, with only a slight downward slope in the initial stages. Specifically, the learning rate for each batch was reduced using cosine annealing as follows:

$$\eta_{p}^{j} = \eta_{\min}^{j} + \frac{1}{2}\left(\eta_{\max}^{j} - \eta_{\min}^{j}\right)\left(1 + \cos\left(\frac{P_{cur}}{P_{i}}\pi\right)\right) \qquad (1)$$

Fig. 2 Output taken from our model

Fig. 3 Training and testing loss of our model

Table 1 Testing accuracy, precision, recall, and F-measure of YOLOv5 and other state-of-the-art approaches

Model          Accuracy (%)   Precision (%)   Recall (%)   F-measure (%)
Fast-RCNN      89             81              67           72.37
Faster-RCNN    91             81              71           73.57
Mask-RCNN      91             81              71           76.33
YOLOv3         91             85              77           81.84
YOLOv5         94.3           86.38           79.46        81.83

[Figure: bar chart, y-axis 0%–100%, x-axis Fast-RCNN, Faster-RCNN, Mask-RCNN, YOLOv3, YOLOv5]

Fig. 4 Comparison graph
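Equation (1) is the cosine annealing rule used by SGD with warm restarts. Reading η_min and η_max as the lower and upper learning-rate bounds of cycle j, P_cur as the number of batches since the last restart, and P_i as the cycle length (the standard interpretation, assumed here because the text does not define the symbols), a per-batch schedule can be sketched as:

```python
import math


def cosine_annealing_lr(p_cur, p_i, eta_min, eta_max):
    """Equation (1): learning rate after p_cur batches of a cycle lasting p_i batches."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * p_cur / p_i))


def warm_restart_schedule(total_steps, cycle_len, eta_min, eta_max):
    """Per-batch learning rates with a warm restart every `cycle_len` batches.
    Fixed-length cycles are an assumption made to keep the sketch short."""
    return [
        cosine_annealing_lr(step % cycle_len, cycle_len, eta_min, eta_max)
        for step in range(total_steps)
    ]
```

The rate starts at η_max, falls slowly at first, faster around the middle of the cycle, and flattens near η_min before jumping back up at the restart. The original warm-restart scheme also allows each cycle to be longer than the previous one.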

5 Conclusion and Future Work This paper presents a deep learning-based social distancing monitoring framework that uses an overhead view. A pre-trained YOLOv3 model is used to detect humans in video frames. Because a person's appearance, scale, visibility, pose, and shape vary considerably in overhead views, transfer learning is also applied to improve the overall performance of the pre-trained model: the model is further trained on an overhead dataset, and the newly trained layer is added to the existing model. To the best of our knowledge, this work is an attempt to apply transfer learning to a deep learning-based social distance monitoring model for overhead views. The detection framework yields bounding box information that contains the coordinates of each centroid. The pairwise distances between the centroids of the detected bounding boxes are calculated with the Euclidean distance formula. To test for social distancing violations, an approximate mapping from physical distance to pixels is applied and a violation threshold is defined; the calculated distances are checked against this threshold to determine whether they violate minimum social distancing norms. Furthermore, the centroid-based tracking technique is used to track people in the video frames. The experimental results show that the framework effectively identifies people who walk too close together and breach social distancing norms, and that using transfer learning increases the detection model's average performance and accuracy.


References
1. Ansari MA, Singh DK (2021) Monitoring social distancing through human detection for preventing/reducing COVID spread. Int J Inf Technol 13:1255–1264
2. Baber H, Tripati RD (2021) The price of the lockdown: the effects of social distancing on the Indian economy and business during the COVID-19 pandemic. Ekonomski horizonti 23(1):85–99
3. Ramadass L, Arunachalam S, Sagayasree Z (2020) Applying deep learning algorithms to maintain social distance in a public place through drone technology. Int J Pervasive Comput Commun 16:223–234
4. Punn NS, Sonbhadra SK, Agarwal S (2020) Monitoring COVID-19 social distancing with person detection and tracking via fine-tuned YOLO v3 and deepsort techniques
5. Musinguzi G, Asamoah BO (2020) The science of social distancing and total lock down: does it work? Whom does it benefit? Electron J Gen Med 17(6):em230
6. Nguyen CT, Saputra YM, Van Huynh N et al (2020) Enabling and emerging technologies for social distancing: a comprehensive survey
7. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR)
8. Ahmad M, Ahmed I, Ullah K, Khan I, Khattak A, Adnan A (2019) Int J Adv Comput Sci Appl. https://doi.org/10.14569/IJACSA.2019.0100367
9. Matrajt L, Leung T (2020) Evaluating the effectiveness of social distancing interventions to delay or flatten the epidemic curve of coronavirus disease. Emerg Infect Dis 26(8)
10. Elakkiya R, Vijayakumar P, Karuppiah M (2021) COVID_SCREENET: COVID-19 screening in chest radiography images using deep transfer stacking. Inf Syst Front 23(6):1369–1383
11. Singh NJ, Nongmeikapam K (2019) Stereo system-based distance calculation of an object in image. In: 2019 fifth international conference on image information processing (ICIIP). IEEE, pp 29–34
12. Chakraborty C, Banerjee A, Garg L, Coelho Rodrigues JJP (2021) Series studies in big data 80:98–136. http://doi.org/10.1007/978-981-15-8097-0
13. Kissler SM, Tedijanto C, Lipsitch M, Grad Y (2020) Social distancing strategies for curbing the covid-19 epidemic. medRxiv
14. Greenstone M, Nigam V (2020) Does social distancing matter? University of Chicago, Becker Friedman Institute for Economics working paper, no 2020-26
15. Pop DO, Rogozan A, Chatelain C, Nashashibi F, Bensrhair A (2019) Multi-task deep learning for pedestrian detection, action recognition and time to cross prediction. IEEE Access 7:149318–149327
16. Galdino B, Nicolau A (2017) A measure distance system for docks: an image-processing approach. In: 2017 IEEE first summer school on smart cities (S3C). IEEE, pp 145–148
17. Zhu X, Jing X-Y, You X, Zhang X, Zhang T (2018) Video-based person re-identification by simultaneously learning intra-video and inter-video distance metrics. IEEE Trans Image Process 27(11):5683–5695
18. Zhong J, Sun H, Cao W, He Z (2020) Pedestrian motion trajectory prediction with stereo-based 3d deep pose estimation and trajectory learning. IEEE Access 8:23480–23486
19. Nilaiswariya R, Manikandan J, Hemalatha P (2021) Improving scalability and security medical dataset using recurrent neural network and blockchain technology. In: 2021 international conference on system, computation, automation and networking (ICSCAN). IEEE, pp 1–6
20. Sriram S, Manikandan J, Hemalatha P, Leema Roselin G (2021) A chatbot mobile quarantine app for stress relief. In: 2021 international conference on system, computation, automation and networking (ICSCAN). IEEE, pp 1–5
21. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115:211–252


22. Kuznetsova A, Rom H, Alldrin N, Uijlings J, Krasin I, Pont-Tuset J, Kamali S, Popov S, Malloci M, Kolesnikov A et al (2020) The open images dataset v4. Int J Comput Vis 1–26
23. Chen X, Fang H, Lin T, Vedantam R, Dollar P, Zitnick C (2015) Microsoft COCO captions: data collection and evaluation server. arXiv:1504.00325
24. Everingham M, Van Gool L, Williams C, Winn J, Zisserman A (2010) The PASCAL visual object classes challenge 2010 (VOC2010) results. Int J Comput Vis 88:303–338

Real-Time Smart Traffic Analysis Employing a Dual Approach Based on AI

Neera Batra and Sonali Goyal

Abstract Most studies on traffic-related data use sensor data, which is accurate. However, because of the high cost of sensors, the volume of such data is insufficient to cover most of the road network. To obtain a complete and accurate range of data, sensors must be complemented by image processing-based solutions, which offer higher compatibility and easier maintenance (Rakesh et al., Int J Sci Technol Res 8(12) (2019)). Free-flowing traffic is harder to detect and manage than traffic in dedicated lanes, and therefore requires more precise forecasting. This study presents a traffic analysis system based on the random forest algorithm, which predicts traffic congestion on a given road and notifies users well in advance.

Keywords Machine learning · Traffic analysis · Traffic density · Traffic congestion · Random forest

1 Introduction

Infrastructure investment in transportation is crucial for a country's growth and is at the heart of a modern economy [1]. India has the second-largest road network in the world, and the government spends a substantial amount of money on transportation each year [2, 3]. Traffic congestion, however, continues to be the country's most critical issue; it is costly as well as inconvenient. The principal causes of congestion are the large number of vehicles resulting from a large population and a lack of adequate planning in building a robust road network [4]. As a result, the average city inhabitant spends more than ten hours per week driving, one to three hours of which are spent stuck in traffic [5]. Although the car industry has invested heavily in in-vehicle sensors to improve safety, performance, and comfort, collecting traffic data with roadside methods has become one of the most difficult problems for intelligent transportation systems [6]. Cameras and sensors can now be used to address this problem: they can be placed along a road to detect traffic flow, speed, and the road's ongoing occupancy, and traffic analysis and prediction techniques can then be applied to the data.

The most important requirement of a road monitoring scenario is the ability to provide quality of service (QoS) for video monitoring of traffic flows. However, QoS management is difficult to achieve because of several issues, including limitations in sensor power, energy, memory, and processor capacity, and node heterogeneity (each node has different capabilities) [7]. The network topology may also change over time as nodes move or are removed or added, and sensors frequently operate under uncertain conditions. All of these circumstances can produce delays that lead to erroneous interpretation of traffic data. Furthermore, the waiting times caused by congestion worsen environmental pollution and fuel waste.

In traffic monitoring and analysis, one is interested in detailed statistics such as the traffic density or average speed on a particular road section [8]. Such monitoring plays a small but crucial role in society, as it is necessary for planning and maintaining a city's road network and for traffic management [9]. All of these applications have one thing in common: they require low-cost, dependable, and energy-efficient sensor units, which combine sensing, data processing, and networking to measure the quantities of interest directly or indirectly. Cameras, well-designed sensor solutions, and machine learning algorithms are therefore critical to a traffic monitoring system. In traffic monitoring systems, vehicle tracking based on image processing has been used to provide traffic metrics such as vehicle count, speed, density, vehicle classification, and incident detection [10].

N. Batra (B) · S. Goyal
Maharishi Markandeshwar (Deemed to Be) University, Mullana, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_61
Although every detector technology and device has its own capabilities and limitations, only active infrared and video image processing (VIP) systems support applications with multiple lanes and multiple detection zones. Compared with other technologies, the VIP system is the easiest to implement, maintain, and upgrade. Furthermore, it allows users to inspect the outcomes visually by watching recorded video. Efficient traffic management techniques are required to improve road traffic flow while reducing waiting and travel times, and a variety of strategies and approaches toward this goal have been proposed in the literature. In this paper, we describe a system that automatically detects and classifies traffic congestion based on data acquired by sensors and security cameras.

2 Proposed and Implemented System

The proposed system is designed to detect road traffic using sensors and cameras, which work together to collect data in a well-defined manner.


Fig. 1 Location of site on satellite imagery (Courtesy Google maps)

2.1 Data Acquisition and Collection

2.1.1 Selection of Site for Traffic Surveys

The traffic survey is performed on the NH-344 4-lane flyover near Mullana. The survey is conducted for three days, three hours each day, in both directions, namely "To Ambala" and "To Yamuna Nagar." The flyover is approximately 500 m long, including approaches, and the traffic flow is considered uninterrupted. Figure 1 shows the location of the site on satellite imagery.

2.1.2 Setup and Preparation of Sensors and Cameras

This research uses surveillance cameras and ultrasonic sensors to estimate traffic movement, which is then used to assess the traffic situation, or status, of a roadway. A key feature of the proposed system is its accuracy in acquiring traffic data such as flow, speed, and density. These data are fed into the system to support decision-making. The number of vehicles and the speed at which they move on a road network in a given period are commonly used as indicators of traffic congestion and jams [11]. Most traffic management systems use either ultrasonic sensors or surveillance cameras to assess traffic density; the presented system uses both to improve the accuracy of the collected data. It collects traffic data for three hours using four ultrasonic sensors and four cameras, and the local server forwards the data from both sources to the corresponding microcontroller at regular intervals. Two sets of sensors were installed horizontally, 5 m apart, one facing Ambala and the other facing Yamuna Nagar. The time difference between the moment the first sensor detects an obstacle and the moment the second sensor detects it is used to calculate the vehicle's speed: because the distance between the two sensors is fixed (5 m), dividing this distance by the time difference gives a good approximation of the speed. The design uses only a small number of ultrasonic devices, which makes it cheaper to install and maintain than the solutions discussed earlier. The lower cost in turn allows more sensors to be installed to monitor a longer stretch of the road network.

Table 1 Status of the sensors
Condition | P1 | P2 | Status
Condition 1 | 0 | 0 | Low
Condition 2 | 0 | 1 | Medium
Condition 3 | 1 | 0 | Medium
Condition 4 | 1 | 1 | High
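The speed estimate described above reduces to dividing the fixed 5 m sensor spacing by the time between the two detections. The following is a minimal Python sketch, assuming that detection timestamps in seconds are already available from the microcontroller; the function name and its arguments are hypothetical, not part of the authors' implementation.

```python
SENSOR_SPACING_M = 5.0  # distance between the two sensors of a pair, as stated in the text


def vehicle_speed_kmh(t_first_s: float, t_second_s: float,
                      spacing_m: float = SENSOR_SPACING_M) -> float:
    """Estimate a vehicle's speed from the times (in seconds) at which the first
    and the second ultrasonic sensor of a pair detect it."""
    dt = t_second_s - t_first_s
    if dt <= 0:
        raise ValueError("the second detection must come after the first")
    speed_ms = spacing_m / dt  # speed = fixed distance / time difference, in m/s
    return speed_ms * 3.6      # convert m/s to km/h


# Hypothetical detections 0.25 s apart: 5 m / 0.25 s = 20 m/s, i.e. 72 km/h.
print(vehicle_speed_kmh(10.00, 10.25))
```

A shorter spacing would make the estimate more sensitive to timing jitter, which is one reason a fixed, known spacing matters for this method.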

2.1.3 Data Acquisition

Data Acquisition Using Ultrasonic Sensors

Ultrasonic sensors provide excellent non-contact range detection between 2 and 400 cm (about an inch to 13 ft) with a precision of 3 mm. Because they run on 5 V, they can be connected directly to an Arduino or any other microcontroller with 5 V logic [12]. The sensor head emits an ultrasonic pulse, which the target reflects back, and the sensor uses the time between emission and reception to calculate the distance to the target:

$$D = \frac{1}{2} \times T \times S$$

where D is the distance, T is the time between emission and reception, and S is the speed of sound. The product is multiplied by 1/2 because T is the time for the go-and-return distance.

In the proposed and implemented system, two pairs of sensors are used to calculate traffic density. Each sensor outputs a binary value, 1 or 0, and density is computed at the node level:

$$P = p_i + p_{i+1} + \cdots \qquad (1)$$

where P is the value obtained from a pair of ultrasonic sensors and p_i is the reading of the i-th sensor. Table 1 lists the sensor readings and the resulting traffic status.
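The distance formula and the pair readings of Table 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions: the speed of sound is taken as 343 m/s (dry air at about 20 °C), and the node-level status is assumed to be the sum of the two binary readings of a pair (0, 1 or 2), which reproduces every row of Table 1. The function names are hypothetical.

```python
SPEED_OF_SOUND_MS = 343.0  # assumed speed of sound in dry air at about 20 °C


def echo_distance_m(echo_time_s: float, sound_speed_ms: float = SPEED_OF_SOUND_MS) -> float:
    """D = 1/2 * T * S. T is the round-trip (go-and-return) time, hence the factor 1/2."""
    return 0.5 * echo_time_s * sound_speed_ms


def pair_status(p1: int, p2: int) -> str:
    """Node-level status of one sensor pair: the sum of the two binary readings
    selects Low (0), Medium (1) or High (2), matching Table 1."""
    if p1 not in (0, 1) or p2 not in (0, 1):
        raise ValueError("each sensor reading must be 0 or 1")
    return ("Low", "Medium", "High")[p1 + p2]


# An echo returning after 10 ms corresponds to 0.5 * 0.010 s * 343 m/s = 1.715 m.
print(echo_distance_m(0.010))
print([pair_status(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
```

Because the speed of sound varies with air temperature, a deployed system might calibrate S on site rather than rely on a constant.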


Fig. 2 Installation overview

Table 2 Cumulative density based on different scenarios
Scenario | Result obtained from sensors | Result obtained from camera | Traffic density
Scenario 1 | Low | Low | Low
Scenario 2 | Low | Medium | Medium
Scenario 3 | Low | High | Medium
Scenario 4 | Medium | Low | Medium
Scenario 5 | Medium | Medium | Medium
Scenario 6 | Medium | High | High
Scenario 7 | High | Low | Medium
Scenario 8 | High | Medium | High
Scenario 9 | High | High | High

Data Acquisition Using Surveillance Cameras

Data were collected from the NH-344 4-lane flyover near Mullana. The approaching traffic is divided into four lanes, each with its own set of cameras; Figure 2 gives an overview of the installation. Several consecutive frames are acquired for each vehicle for optical flow estimation, and multiple cameras record vehicle movements to avoid occlusions. A vehicle database is created for training and testing. The microcontroller regularly receives, via the local server, the data transmitted by the ultrasonic sensors and cameras, and combines them to determine the cumulative density, as shown in Table 2.
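The fusion step in Table 2 can be written as a lookup from the (sensor, camera) pair to the cumulative density. The Python sketch below transcribes Table 2 directly; the second function is an observation about the table, not a rule stated by the authors: encoding Low/Medium/High as 0/1/2 and rounding the mean of the two levels up reproduces all nine scenarios. Both function names are hypothetical.

```python
import math

LEVELS = ("Low", "Medium", "High")

# Cumulative density for each (sensor result, camera result) pair, transcribed from Table 2.
CUMULATIVE_DENSITY = {
    ("Low", "Low"): "Low",
    ("Low", "Medium"): "Medium",
    ("Low", "High"): "Medium",
    ("Medium", "Low"): "Medium",
    ("Medium", "Medium"): "Medium",
    ("Medium", "High"): "High",
    ("High", "Low"): "Medium",
    ("High", "Medium"): "High",
    ("High", "High"): "High",
}


def cumulative_density(sensor_result: str, camera_result: str) -> str:
    """Look up the fused density exactly as listed in Table 2."""
    return CUMULATIVE_DENSITY[(sensor_result, camera_result)]


def rounded_up_mean(sensor_result: str, camera_result: str) -> str:
    """Observed equivalent rule: encode Low/Medium/High as 0/1/2 and round the mean up."""
    a, b = LEVELS.index(sensor_result), LEVELS.index(camera_result)
    return LEVELS[math.ceil((a + b) / 2)]


# Check that the compact rule agrees with the table on all nine scenarios.
print(all(cumulative_density(s, c) == rounded_up_mean(s, c)
          for s in LEVELS for c in LEVELS))
```

The explicit table is easier to audit and to change per scenario; the compact rule is handy if more density levels are ever added.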

2.1.4 Data Preparation

After acquisition, the data are cleaned: null values are handled, potentially inconsistent data are deleted, features are extracted, and the data are processed, as shown in Fig. 3.


Fig. 3 Steps for data preparation

Cleaning and feature extraction are carried out with a Python script using Python libraries such as Pandas and NumPy. Null data were removed during this operation. Rows with a speed value of 0 were also removed, because no anomalies can be detected while a vehicle is stationary. After many consecutive frames have been acquired, feature extraction is performed on this set of images, followed by feature selection [13]. The random forest method is used to analyze traffic and helps determine whether traffic is heavier or lighter. The random forest was chosen because, during training, it builds a number of decision trees and outputs the mode of their classes (for classification) or the mean of their predictions (for regression). The random forest takes about two-thirds of the original data in the dataset and builds a large number of decision trees on it. The data are stored in a .csv file.
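The two cleaning rules above (drop null data, drop rows with speed 0) map directly onto Pandas operations. The sketch below assumes a DataFrame with a speed column; the column names "speed" and "vehicle_count" and the function name are hypothetical, since the paper does not give its schema.

```python
import pandas as pd


def clean_traffic_data(df: pd.DataFrame, speed_col: str = "speed") -> pd.DataFrame:
    """Remove rows with null values and rows recorded while a vehicle was stationary."""
    cleaned = df.dropna()                        # null data removed
    cleaned = cleaned[cleaned[speed_col] != 0]   # speed 0: no anomalies detectable
    return cleaned.reset_index(drop=True)


# Hypothetical raw records; the column names are illustrative only.
raw = pd.DataFrame({
    "vehicle_count": [12, 30, None, 7],
    "speed": [45.0, 0.0, 38.0, 60.0],
})
cleaned = clean_traffic_data(raw)
csv_text = cleaned.to_csv(index=False)  # pass a file path instead to write a .csv file
print(csv_text)
```

Here the second row is dropped for its zero speed and the third for its missing count, leaving two rows.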

2.1.5 ML Model Training and Validation

After data preparation, the rush interval is identified, and the local server notifies the respective microcontroller, along with the road ID. The cameras capture the lane through live streaming and pass the footage on to the controller board. The board distinguishes the individual vehicles in the received data and maintains a vehicle count. This count is passed on to another controller board, which also receives the data from the ultrasonic sensors. Traffic density is then measured, and after receiving this information, the system reports the traffic condition on the road. The random forest algorithm is used for prediction, and the mean absolute error (MAE) and root mean squared error (RMSE) are used as evaluation metrics for performance comparison. The MAE is the mean of the absolute errors between the predicted and true values. It ranges from zero to infinity, and lower values are better. It is a linear score, so all individual errors are weighted equally. The MAE formula is shown in Eq. (2), where p_i represents the predicted value and o_i the original value:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert o_i - p_i \rvert \qquad (2)$$

The RMSE is the square root of the mean of the squared differences between the predicted and true values. It also ranges from zero to infinity, and lower values are better. The RMSE formula is shown in Eq. (3), where p_i represents the predicted value and o_i the original value:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (o_i - p_i)^2} \qquad (3)$$
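The two evaluation metrics can be computed directly from their definitions. A minimal pure-Python sketch follows; the sample densities are hypothetical. One property worth keeping in mind when reading error tables: because RMSE squares the errors before averaging, it is never smaller than MAE on the same predictions.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> int:
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return len(observed)


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: (1/n) * sum(|o_i - p_i|)."""
    n = _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: sqrt((1/n) * sum((o_i - p_i)^2))."""
    n = _check(observed, predicted)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical densities: the absolute errors are 2, 2 and 3.
obs, pred = [10, 20, 30], [12, 18, 33]
print(mae(obs, pred))   # 7/3
print(rmse(obs, pred))  # sqrt(17/3)
```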

The architecture of the proposed system is shown in Fig. 4, and a real image captured by the camera is shown in Fig. 5. Vehicle detection and counting are shown in Fig. 6a and b, respectively, and the frame set is shown in Fig. 7.

Fig. 4 Proposed system architecture


Fig. 5 Image captured from camera

Fig. 6 a and b Vehicle detection and count process

3 Experiments and Results

Linear regression, decision tree, and random forest methods were evaluated experimentally on the real data described above, using RMSE and MAE as evaluation criteria. The evaluated positions and the error values of the methods are presented in Tables 3 and 4; Table 3 lists the sensor and camera positions marked on the NH-344 4-lane flyover near Mullana together with the corresponding MAE values. As Tables 3 and 4 show, the random forest method made predictions with lower errors than the other methods at every position, as also shown in Fig. 8.
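A comparison of this kind can be reproduced in outline with scikit-learn. The sketch below is not the authors' pipeline: it fits the three regressors on synthetic stand-in data (not the NH-344 measurements) and reports MAE and RMSE on a held-out split. The 70/30 split, 100 trees, and the synthetic target are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_models(X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
    """Fit the three regressors on a train split and report (MAE, RMSE) on a test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    models = {
        "Linear regression": LinearRegression(),
        "Decision tree": DecisionTreeRegressor(random_state=seed),
        "Random forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        rmse = float(np.sqrt(mean_squared_error(y_te, pred)))  # RMSE as sqrt of MSE
        results[name] = (float(mean_absolute_error(y_te, pred)), rmse)
    return results


# Synthetic stand-in data, NOT the NH-344 measurements: a non-linear target with noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 3))
y = 100 * X[:, 0] ** 2 + 50 * np.sin(6 * X[:, 1]) + rng.normal(0.0, 5.0, size=400)
for name, (m, r) in compare_models(X, y).items():
    print(f"{name}: MAE={m:.2f}, RMSE={r:.2f}")
```

Fixing `random_state` keeps the split and the tree ensembles reproducible, so differences between the methods are not an artifact of a lucky split.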


Fig. 7 Frame set

Table 3 MAE values of the methods
Sensor and camera marked position on NH-344 | Linear regression | Decision tree | Random forest
344-A | 3679 | 7118 | 2934
344-B | 7680 | 9745 | 4554
344-C | 4284 | 7378 | 3544
344-D | 8992 | 8545 | 6442
344-E | 5643 | 6336 | 3845
344-F | 10,802 | 9824 | 7226
344-G | 6503 | 5745 | 5002
344-H | 6445 | 4702 | 3978


Table 4 RMSE values of the methods
Sensor and camera marked position on NH-344 | Linear regression | Decision tree | Random forest
344-A | 6897 | 7542 | 4332
344-B | 11,834 | 11,902 | 10,882
344-C | 15,988 | 15,755 | 14,997
344-D | 10,204 | 9824 | 9801
344-E | 4356 | 3457 | 3202
344-F | 6978 | 7322 | 6456
344-G | 7883 | 8774 | 5845
344-H | 9887 | 10,892 | 7665

Fig. 8 Difference between measured and calculated densities

Statistical tests are performed to examine the difference between the two densities given by the cameras and the sensors. First, the correlation between the two densities is found to be 0.61, and this relationship is significant at the 0.01 level. Second, a t-test is performed on the two density data sets to examine the difference between their mean values, using the SPSS package. The t-test confirms that there is no significant difference between the mean values of the measured and calculated densities at the 6% significance level. Furthermore, the 94% confidence intervals established for both density means show that the entire range of the calculated density lies within the limits of the measured density. In view of these results, we conclude that the system can successfully be used to measure traffic density on the road.
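The two statistics used above, Pearson's correlation and a t-test on the two density series, can be sketched with the Python standard library. This sketch assumes a paired design (camera and sensor densities measured over the same intervals), which the correlation analysis implies but the text does not state; the sample values are hypothetical. It returns only the t statistic and degrees of freedom; a p-value needs the t distribution, which `scipy.stats.ttest_rel` and `scipy.stats.pearsonr` provide.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for the mean of the differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1


# Hypothetical camera-based and sensor-based densities for the same four intervals.
camera = [10, 12, 9, 11]
sensor = [9, 11, 10, 9]
print(pearson_r(camera, sensor))
print(paired_t(camera, sensor))
```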


4 Conclusion

Traffic problems have taken up a large part of people's time in recent years, and managing traffic systems is especially important in places with high population density [14]. One of the most important aspects of traffic analysis is the ability to determine traffic density in advance. The experimental evaluations show that the random forest-based method is more successful than linear regression and decision trees, achieving the lowest MAE and RMSE at all eight measured positions.

References
1. Yan G, Chen Y (2021) The application of virtual reality technology on intelligent traffic construction and decision support in smart cities. Wireless Commun Mobile Comput 2021. https://doi.org/10.1155/2021/3833562
2. Kaur J, Batra N (2014) A traffic aware health monitoring application embedded in smart ambulance (THESA). Int J Comput Sci Eng (IJCSE) 2(11):132–137
3. Batra N, Goyal S (2022) DDSS: an AI powered system for driver safety. Lecture Notes Netw Syst 339:429–437. https://doi.org/10.1007/978-981-16-7018-3_32
4. Sun D et al (2020) A highway crash risk assessment method based on traffic safety state division. PLoS ONE 15(1):e0227609
5. Feng C, Suren C, Ma X (2016) Crash frequency modeling using real-time environmental and traffic data and unbalanced panel data models. Int J Environ Res Public Health 13(6):609
6. Liu B, Sun Y (2011) Application and study on artificial intelligence technology in traffic signal control system. In: International conference on applied informatics and communication. Springer, Berlin, Heidelberg
7. Šusteková D, Knutelska M (2015) How is the artificial intelligence used in applications for traffic management. Mach Technol Mater 9(10):49–52
8. Rakesh S, Hegde NP (2019) Automatic traffic density estimation and vehicle classification method using neural network. Int J Sci Technol Res 8(12). ISSN 2277-8616
9. Pozanco A, Fernández S, Borrajo D (2016) Urban traffic control assisted by AI planning and relational learning. ATT@IJCAI
10. Elish MC, Boyd D (2017) Situating methods in the magic of big data and artificial intelligence. Commun Monographs, Forthcoming
11. Raja J, Bahuleyana H, Vanajakshia LD (2014) Application of data mining techniques for traffic density estimation and prediction. In: 11th Transportation planning and implementation methodologies for developing countries, TPMDC 2014, Mumbai, India
12. Rabby MKM, Islam MM, Imon SM (2020) A review of IoT application in a smart traffic management system. Int J Eng Appl Sci Technol 5(1):612–615. ISSN 2455-2143. https://doi.org/10.1109/ICAEE48663.2019.8975582
13. Regassa AA, Feng WZ (2021) Prediction of road traffic concentration using random forest algorithm based on feature compatibility. Int J Eng Res Technol (IJERT) 10(4). ISSN 2278-0181
14. Aydin S, Taşyürek M, Öztürk C (2021) Traffic density estimation using machine learning methods. J Artific Intell Data Sci (JAIDA) 1(2):136–143. e-ISSN 2791-8335

Sustainable Development in Urban Cities with LCLU Mapping

Yash Khurana, Swamita Gupta, and Ramani Selvanambi

Abstract Rapid and uneven urbanization of compact cities in the last few decades has threatened their ecosystems. The WHO recommends a green space of at least 9 m² per individual and an ideal urban green space (UGS) value of 50 m² per capita to restore the ecological balance of such cities. This study proposes a novel remote sensing-based approach that applies various machine learning and deep learning methods to multispectral imagery to produce land cover/land use (LCLU) classification maps, which are then used to estimate the amount of UGS in a city. Applied to the urban region of South–West Delhi, the approach reveals an unsatisfactory level of green space, with a UGS per capita of 25.9 m², an urban green and blue space (UGBS) per capita of 48.14 m², and a UGS percentage abundance of 14.14%. Further, potential expansion areas are suggested using these maps to help policymakers strive toward sustainable development.

Keywords Urban green space · Urbanization · Remote sensing · LCLU · SVM

1 Introduction

Urbanization is an indispensable aspect of modernizing a country. However, this development has come at the expense of many natural elements, and it continues to threaten the structure and dynamics of previously harmonious ecosystems in developing countries [11], making these regions very fragile. With its rate only increasing, urbanization is the most significant anthropogenic force changing the landscape and land use pattern of an area. Since changes in the physical characteristics of the land reflect the socioeconomic, natural and ecological processes of an area, it is important to identify and map land cover for monitoring studies, urban planning and resource management. Several studies on sustainable development have addressed this in the past. Rahman [18] combines remote sensing data with social media data from Twitter to determine urban sprawl from 2011 to 2017 in the Morogoro urban district of Tanzania, using the random forest (RF) method for classification. Rahman et al. [19] assess LCLU change in the North–West districts of Delhi on very high-resolution (VHR) imagery to help policymakers regulate land transformation. Some recent works have analyzed and evaluated the changing patterns of urban landscapes and their effect on land surface temperature in and around Delhi [5]. These studies reveal a positive temperature trend throughout the study periods, which underlines the need to expand urban green spaces to offset the effects of increased urbanization.

This paper proposes a novel remote sensing-based method to analyze and assess the UGS area of an urban city. Multispectral satellite imagery is preprocessed and classified with three different methods, namely an artificial neural network (ANN), a support vector machine (SVM) and a maximum likelihood classifier (MLC), to generate urban LCLU (ULCLU) maps. These maps are then analyzed by calculating parameters such as UGS per capita, UGBS per capita, total UGS abundance and percentage UGS abundance to assess the UGS situation of the South–West region of Delhi and to identify potential expansion areas. Studies like this help policymakers analyze the current situation of rapid and uneven urbanization in cities like Delhi and regulate the minimum amount of green space according to WHO guidelines.

Y. Khurana · S. Gupta · R. Selvanambi (B)
Vellore Institute of Technology, Vellore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_62

2 Literature Survey

Urban green spaces are open areas consisting of parks, trees and other plant life in a city; similarly, the water bodies in a city are referred to as blue spaces. Their biodiversity and abundance must be conserved, designed and managed to maintain ecological processes, keeping in mind the city's socioeconomic and cultural dynamics [2]. UGS varies from place to place depending on demography, population and other factors, which makes it hard to compare across cities, but it has been found to be essential for health benefits. Unprecedented urban expansion is responsible for the encroachment and destruction of green spaces and thus poses a serious threat to the biodiversity and ecological ambience of a city [1]. As a result, green spaces have declined rapidly in many places, where they have been replaced by built-up areas [13]. It is reported that by 2030, urban land cover will expand by 1.2 million km², tripling the urban land area of 2000 and causing a loss of 1.28 PgC from green spaces [21]. Hence, urban green space planning is crucial for sustainable development today. Such planning faces various challenges, such as areas under densification, counteracting social factors and the deterioration of recreational activities [7]. Ignatieva et al. propose a number of nature-based solutions for UGS that depend on the type of surrounding nature (European urban and Australian urban) and suggest different alternatives to lawns [8]. Other approaches to UGS planning include park planning, demolition of factories, mass tree planting along roads and retrofitting green space near canals, roads and railway lines [25]. A simple and sustainable way to increase green space is to merge UGS with adjacent raw land using techniques such as superblocks and suitability checklists [15].

Another approach to monitoring UGS loss is the use of LCLU maps obtained by classifying multispectral satellite imagery [3]. The assessment of these maps gives meaningful and relevant information about the UGS of an area and its abundance. Various machine learning and deep learning algorithms can be used to generate these maps, and these algorithms can be classified into six major categories. Table 1 describes previous work using these techniques. A comparative study of different classification techniques is often conducted to determine the best technique for a particular task [20]. It has repeatedly been indicated that contextual classifiers and non-parametric machine learning classifiers such as SVM outperform classifiers such as MLC, with some minor trade-offs between overall classification accuracy and computational time.

Table 1 Techniques for classification of multispectral imagery
Method | Authors | Merits | Gaps
Per-pixel based | [9, 23] | General performance; simple segmentation principle; easy to implement | Salt-and-pepper effect; problem of mixed pixels
Sub-pixel based | [14, 26] | General performance; solves the mixed-pixel problem | Evaluating the accuracy of the calculated sub-pixel estimates is extremely difficult; representing sub-pixel thematic information is a difficult task
Object based | [6, 12] | Good performance; spatial relationships between pixels are considered | Over- and under-segmentation
Knowledge based | [10, 17] | Can incorporate ancillary data from multiple sources | No direct way to develop rules for incorporating data
Multi-classifier systems | [4, 22] | Increased overall accuracy; integrating multiple classifiers removes the limitations of the individual classifiers | No direct way of developing suitable rules for combining classifiers

3 Methodology

3.1 Preprocessing Remote Sensing Imagery

Multispectral imaging sensors on satellites often capture several spectral bands, each with its own uses. However, these bands often suffer from noise, which may come from calibration errors of the optical sensors or from intrinsic properties of the hardware; noise can also arise from atmospheric effects such as cloud cover, from topographic effects or even from shadows. Such noise hinders the performance of ULCLU classification. Preprocessing of remote sensing data generally includes two major steps: (a) radiometric calibration and (b) correction of geometric and atmospheric distortions.

3.2 Urban LCLU Mapping

Once the remote sensing data have been preprocessed, they can be classified into urban LCLU maps with the various machine learning and deep learning algorithms available. Different algorithmic approaches were reviewed in the literature survey.

Selection of training data

Training supervised algorithms requires a large number of samples, which play an essential role in the final quality of the algorithm, often measured by its accuracy. Training data can be selected from thematic maps or obtained from true-color composite imagery of VHR satellites. The chosen area is then divided into ULCLU classes for classification, such as barren land, residential areas/buildings, blue space, green space and road networks.

Performance analysis

The most frequently used performance metrics for LCLU classification, such as users' accuracy (UAC), producers' accuracy (PAC), overall accuracy (OAC) and the kappa statistic, are calculated from error matrices that cross-tabulate the classified data against the testing data.
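The accuracy measures named above all follow from the error matrix. The pure-Python sketch below assumes that rows hold the reference (testing) classes and columns the classified labels; some software uses the transposed convention, which swaps PAC and UAC. The two-class matrix is hypothetical.

```python
from typing import List, Sequence, Tuple


def accuracy_metrics(cm: Sequence[Sequence[int]]) -> Tuple[float, List[float], List[float], float]:
    """Return (OAC, PAC per class, UAC per class, kappa) from a square error matrix.

    Assumed orientation: cm[i][j] counts testing pixels whose reference class is i
    and whose classified label is j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(k)]
    ref_tot = [sum(cm[i]) for i in range(k)]                       # row totals (reference)
    cls_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column totals (classified)
    oac = sum(diag) / n
    pac = [diag[i] / ref_tot[i] for i in range(k)]  # producer's accuracy: correct / reference total
    uac = [diag[j] / cls_tot[j] for j in range(k)]  # user's accuracy: correct / classified total
    p_e = sum(ref_tot[i] * cls_tot[i] for i in range(k)) / n ** 2  # chance agreement
    kappa = (oac - p_e) / (1 - p_e)
    return oac, pac, uac, kappa


# Hypothetical two-class matrix (e.g. UGS vs. non-UGS) with 100 testing pixels.
print(accuracy_metrics([[50, 10], [5, 35]]))
```

Kappa discounts the agreement expected by chance, so it is lower than OAC whenever the class totals make chance agreement non-trivial.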

3.3 Analysis of ULCLU Maps

Analysis of ULCLU maps can reveal much meaningful and relevant information. Parameters such as UGS and UGBS area help determine the amount of green and blue space present per capita in the study area. UGS and UGBS can be formulated as follows:

$$\mathrm{UGS} = \frac{\text{no. of green pixels} \times \dfrac{\text{total area of tile}}{\text{total pixels}}}{\text{total area of tile} \times \text{population density}} \qquad (1)$$

$$\mathrm{UGBS} = \frac{(\text{no. of green pixels} + \text{no. of blue pixels}) \times \dfrac{\text{total area of tile}}{\text{total pixels}}}{\text{total area of tile} \times \text{population density}} \qquad (2)$$
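The per-capita formulas above translate directly into code. The sketch assumes the tile area is given in km² and the population density in persons per km², and returns square metres per person; these units and all names are assumptions, since the text does not state them. Note that the tile area cancels out: per-capita space depends only on the share of green (or green and blue) pixels and the population density.

```python
def per_capita_m2(n_pixels: int, total_pixels: int,
                  tile_area_km2: float, pop_density_per_km2: float) -> float:
    """(pixels x tile area / total pixels) / (tile area x population density), in m² per person."""
    covered_area_m2 = n_pixels * (tile_area_km2 * 1e6) / total_pixels  # km² -> m²
    population = tile_area_km2 * pop_density_per_km2
    return covered_area_m2 / population


def ugs_per_capita(green_px: int, total_px: int,
                   tile_area_km2: float, density: float) -> float:
    """UGS: green pixels only."""
    return per_capita_m2(green_px, total_px, tile_area_km2, density)


def ugbs_per_capita(green_px: int, blue_px: int, total_px: int,
                    tile_area_km2: float, density: float) -> float:
    """UGBS: green plus blue (water) pixels."""
    return per_capita_m2(green_px + blue_px, total_px, tile_area_km2, density)


# Hypothetical tile: 1 km², 10,000 pixels, 2,000 green and 500 blue, 10,000 people/km².
print(ugs_per_capita(2000, 10000, 1.0, 10000))        # 20 m² per person
print(ugbs_per_capita(2000, 500, 10000, 1.0, 10000))  # 25 m² per person
```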

Similarly, green space (GS) abundance is the total area of the study area covered by urban green spaces, and the GS percentage is the share of green spaces in the total area. GS abundance and GS percentage can be formulated as follows:

$$\text{GS abundance} = \text{no. of green pixels} \times \frac{\text{total area of tile}}{\text{total pixels}} \qquad (3)$$

$$\%\,\text{GS abundance} = \frac{\text{no. of green pixels}}{\text{total pixels}} \times 100 \qquad (4)$$
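The two abundance measures are straightforward pixel-count computations. The sketch below assumes the tile area is given in km², so GS abundance comes out in km²; the function names and the sample counts are hypothetical.

```python
def gs_abundance_km2(green_px: int, total_px: int, tile_area_km2: float) -> float:
    """GS abundance: green pixels x (tile area / total pixels)."""
    return green_px * tile_area_km2 / total_px


def gs_abundance_percent(green_px: int, total_px: int) -> float:
    """Share of green pixels in the tile, as a percentage."""
    return 100.0 * green_px / total_px


# Hypothetical 14 km x 14 km tile (196 km²) with 2,000 green pixels out of 10,000.
print(gs_abundance_km2(2000, 10000, 196.0))  # about 39.2 km²
print(gs_abundance_percent(2000, 10000))     # 20 %
```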

4 Results and Discussion

This section applies the proposed method to analyze the UGS area of the South–West region of Delhi. Figure 1 outlines the methodology followed.

4.1 Study Area and Data

Dataset and satellite

This study uses data from the Sentinel-2A satellite, which generates two major products, Level-1C and Level-2A. In this work, Level-1C data (ID = L1C-T43RFN-A01657620200509T053704) were acquired from the USGS website. The data were then converted to Level-2A, and a study area of 14 km × 14 km (Fig. 3) was clipped from the original 100 km × 100 km tile (Fig. 2). The data are preprocessed according to the Level-2A algorithm (L2) using the SNAP 7.0 toolbox with the Sen2Cor plugin (2.80), and the product is then georeferenced in the WGS 84 UTM coordinate system.

Study area

The study area is shown in Fig. 3. Situated in the northern part of India, it lies between longitudes 76°58′4″ E and 77°6′34″ E and latitudes 28°33′52″ N and 28°41′31″ N, and covers an area of 202 km². The area mainly includes five LCLU classes, namely roadways, water bodies, vegetation, buildings/residential and barren land. What makes the study area interesting and challenging is its high and uneven rate of urbanization. Because Delhi is densely populated and has an uneven land use pattern, sustainable development is essential for the city.

4.2 Creating ULCLU Maps from Remote Sensing Data

In this work, the selected data containing the urban settlement are classified using three state-of-the-art supervised algorithms: MLC, SVM and ANN. These classifiers were chosen so that each belongs to a different category of classifier. The classifiers are compared with one another, and the best one is selected for analyzing UGS.


Fig. 1 Methodology employed for analyzing UGS in the study area


Fig. 2 Original Sentinel-2 tile

Fig. 3 Selected study area

Selection of training data. Training supervised algorithms requires a large number of samples, which play an essential role in the final quality of the classification, usually measured by its accuracy. Training data can be selected from thematic maps; however, even when such maps are reliable and accurate, they can carry the intrinsic errors of previous ULCLU classifications. To avoid such deviations, this work uses training and validation data obtained from true-color composites of Sentinel-2 imagery, checked against very high resolution (VHR) imagery from Google Earth. A 15 m buffer is applied around all data points to guarantee the independence of the training and testing samples.

Table 2 Training and testing data used for each class

Land cover class    Training samples (px)    Testing samples (px)
Barren land         200                      410
UGS                 200                      410
Water bodies        200                      444
Buildings           200                      659
Roads               200                      486

Since the chosen study area lies in a highly and unevenly urbanized zone, five
ULCLU classes are chosen for classification, namely barren land, urban green spaces (UGS), residential areas/buildings, water bodies and road networks. The numbers of training and testing samples (in pixels) are given in Table 2.

Support Vector Machine (SVM). This study uses an SVM with a radial basis function (RBF) kernel, which required the least training time and gave the highest overall accuracy. The parameters were chosen after several trials: the gamma parameter of the RBF kernel was set to 0.333 and the penalty parameter to 100.

Artificial Neural Network (ANN). A feed-forward neural network with one hidden layer, trained with the backpropagation algorithm, is used to classify the multispectral Sentinel-2 imagery. To achieve the highest possible accuracy, the hyperparameters of the ANN were selected after careful inspection of several training and testing trials. The network is trained over 1000 iterations with a sigmoid activation function, a momentum rate of 0.9 and a learning rate of 0.2.

Maximum Likelihood Classifier (MLC). MLC is a parametric method: it assumes that the pixel values of each class follow a normal (Gaussian) distribution whose parameters are estimated from the training samples. The classifier then computes the probability that each pixel belongs to each land cover class and assigns the pixel to the class with the highest posterior probability.

Performance Analysis. Table 3 lists the producer's accuracy (PAC) and user's accuracy (UAC) obtained by each method on the Sentinel-2 imagery. The overall accuracy (OAC) and kappa statistic (KS) of the methods are shown in Figs. 7 and 8, respectively. SVM scores the highest KS and OAC of the three classifiers, which makes it superior to ANN and MLC.

Table 3 Producer's (PAC) and user's (UAC) accuracy (%) obtained

Class          SVM PAC   SVM UAC   ANN PAC   ANN UAC   MLC PAC   MLC UAC
Road           95.47     96.87     87.04     98.62     96.76     95.60
Residential    98.03     98.18     98.70     87.36     95.67     100
Water          100       99.11     100       98.39     100       99.59
UGS            98.29     98.53     99.78     99.11     99.33     94.86
Bare land      99.02     97.83     97.42     99.47     94.07     97.86

Fig. 4 Classified map of the study area using MLC

Figures 4, 5 and 6 also visually demonstrate the superiority of SVM over its peers. Hence, SVM is selected as the final classifier for generating the ULCLU map used in the analysis.
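The MLC decision rule described above can be illustrated with a short NumPy sketch. It is not the implementation used in this study (the classification software is not specified here); it assumes Gaussian class-conditional distributions and equal class priors, and the two-band training pixels are synthetic placeholders rather than Sentinel-2 data.

```python
import numpy as np

def fit_mlc(X, y):
    """Estimate a mean vector and covariance matrix per class from training pixels.

    X: (n_pixels, n_bands) pixel values; y: (n_pixels,) integer class labels.
    """
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mean = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False)
        # Store the inverse covariance and log-determinant needed by the Gaussian log-likelihood
        params[c] = (mean, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return params

def predict_mlc(X, params):
    """Assign each pixel to the class with the highest Gaussian log-likelihood.

    With equal priors this is also the class with the highest posterior probability.
    The constant -d/2 * log(2*pi) is dropped because it is the same for every class.
    """
    classes = sorted(params)
    scores = np.empty((X.shape[0], len(classes)))
    for j, c in enumerate(classes):
        mean, inv_cov, logdet = params[c]
        diff = X - mean
        # Squared Mahalanobis distance of every pixel to the class mean
        mahal = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        scores[:, j] = -0.5 * (logdet + mahal)
    return np.array(classes)[scores.argmax(axis=1)]

# Synthetic two-band training pixels for two well-separated classes (hypothetical)
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal([0.1, 0.3], 0.02, (200, 2)),
                     rng.normal([0.4, 0.2], 0.02, (200, 2))])
y_train = np.repeat([0, 1], 200)
params = fit_mlc(X_train, y_train)
print(predict_mlc(np.array([[0.11, 0.29], [0.39, 0.21]]), params))
```

Unequal priors could be added by including a log-prior term in each class score; the sketch keeps them equal for simplicity.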

Fig. 5 Classified map of the study area using SVM

Fig. 6 Classified map of the study area using ANNs

Fig. 7 Comparative analysis based on OAC

Fig. 8 Comparative analysis based on kappa statistics
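The PAC and UAC in Table 3, the OAC in Fig. 7 and the kappa statistic in Fig. 8 are all derived from a confusion matrix. The sketch below shows the standard definitions, assuming that rows hold reference (ground-truth) classes and columns hold predicted classes; the 3-class matrix is hypothetical and not taken from this study.

```python
import numpy as np

def accuracy_report(cm):
    """Compute PAC, UAC, OAC and kappa from a confusion matrix.

    cm[i, j] = number of test pixels of reference class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    pac = diag / cm.sum(axis=1)   # producer's accuracy: correct / reference total (row)
    uac = diag / cm.sum(axis=0)   # user's accuracy: correct / predicted total (column)
    oac = diag.sum() / total      # overall accuracy
    # Expected chance agreement from the row and column marginals
    pe = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / total**2
    kappa = (oac - pe) / (1 - pe)
    return pac, uac, oac, kappa

cm = [[50, 0, 0],
      [0, 45, 5],
      [0, 5, 45]]
pac, uac, oac, kappa = accuracy_report(cm)
print(f"OAC = {oac:.4f}, kappa = {kappa:.4f}")
```

Kappa discounts the agreement expected by chance, which is why it is usually reported alongside OAC.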

4.3 Analysis of ULCLU Maps for Urban Green Spaces

For the chosen tile of Delhi, with an area of 196 km², the UGS and UGBS per capita come out to 25.976 m² and 48.148 m², respectively. The population density of South West Delhi district is taken as 5,445 inhabitants per km² [24]. The total green space area was 27.722 km², so the percentage abundance of UGS was 14.144%. These values reveal the poor state of landscape quality and vegetation in South–West Delhi. The study area is densely populated and has a low concentration of public open green spaces, which indicates a low score on landscape quality. Currently, green space covers only 14.144% of the study area, while the rest is covered by residential buildings, roads, barren land and other classes. Although the UGBS estimate gives 48.148 m² per capita, the green space excluding water bodies yields a UGS value of only 25.976 m² per capita, roughly half of the ideal value of 50 m² per capita recommended by the WHO guidelines for urban green space planning [16]. A low abundance generally indicates that a city may be facing social, ecological and environmental problems. The sparse and uneven distribution of green spaces also suggests that not all residents of the study area have easy and equal access to public green spaces. To meet the WHO UGS recommendation, the amount of green space in the study area would need to roughly double.
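The per-capita and abundance figures above follow from simple arithmetic. The sketch below assumes an area of 196 km², a UGS area of 27.722 km² and a population density of 5,445 inhabitants per km² (the density that is consistent with the reported 25.976 m² per capita); the function name and the 50 m² target parameter are illustrative.

```python
def green_space_indicators(area_km2, ugs_km2, density_per_km2, target_m2_per_capita=50.0):
    """Return population, UGS per capita (m²), UGS abundance (%) and the green-area deficit (km²)."""
    population = area_km2 * density_per_km2
    ugs_per_capita = ugs_km2 * 1e6 / population        # m² of green space per inhabitant
    abundance_pct = 100.0 * ugs_km2 / area_km2         # share of the area that is UGS
    # Additional green area needed to reach the per-capita target, in km²
    deficit_km2 = max(0.0, target_m2_per_capita - ugs_per_capita) * population / 1e6
    return population, ugs_per_capita, abundance_pct, deficit_km2

pop, per_capita, pct, deficit = green_space_indicators(196, 27.722, 5445)
print(f"population ≈ {pop:,.0f}")
print(f"UGS per capita ≈ {per_capita:.3f} m²")
print(f"UGS abundance ≈ {pct:.3f} %")
print(f"additional green space for 50 m²/capita ≈ {deficit:.2f} km²")
```

With these inputs the sketch gives about 1.07 million inhabitants, 25.976 m² per capita, a 14.144% abundance and a shortfall of roughly 25.6 km², which is close to doubling the current 27.7 km² of UGS.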

4.4 Potential Expansion Areas

The amount of urban green space in a given area can be increased most readily by expanding the current green spaces into their surrounding raw or bare land. This allows the new green spaces to thrive sustainably with the support of the adjoining existing green spaces. Figure 9 highlights the UGS and bare-land pixels of the classified image; the proximity of UGS and bare land can be clearly observed, which suggests that existing green spaces can readily be expanded into the adjoining barren land.
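Locating bare land that adjoins existing UGS, as highlighted in Fig. 9, can be expressed as a neighbourhood test on the classified raster. This is a minimal NumPy sketch; the class codes (UGS = 1, bare land = 2) and the 4-neighbourhood adjacency rule are assumptions for illustration, not details given in the paper.

```python
import numpy as np

UGS, BARE = 1, 2  # hypothetical class codes in the classified map

def bare_land_adjacent_to_ugs(class_map):
    """Return a boolean mask of bare-land pixels sharing an edge with a UGS pixel."""
    ugs = class_map == UGS
    near_ugs = np.zeros_like(ugs)
    # Mark every pixel whose upper, lower, left or right neighbour is UGS (4-neighbourhood)
    near_ugs[1:, :] |= ugs[:-1, :]
    near_ugs[:-1, :] |= ugs[1:, :]
    near_ugs[:, 1:] |= ugs[:, :-1]
    near_ugs[:, :-1] |= ugs[:, 1:]
    return (class_map == BARE) & near_ugs

class_map = np.array([[1, 2, 0, 2],
                      [0, 2, 0, 0],
                      [2, 0, 1, 2]])
mask = bare_land_adjacent_to_ugs(class_map)
print(np.argwhere(mask))
```

Multiplying the number of selected pixels by the pixel area (for example 100 m² for a 10 m band) would give the candidate expansion area; an 8-neighbourhood or a distance buffer would be equally reasonable choices.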

5 Conclusion and Future Work

This study proposes a method for addressing the effects of urbanization by calculating the UGS of an area, comparing it with the WHO standard and expanding the existing green spaces. Applied to a Sentinel-2 image of South–West Delhi, the method estimated the UGS as 25.976 m² per capita and revealed shortcomings in the ecological balance of the area, whose green space abundance is 14.144%. Potential expansion areas are further suggested by locating barren land near the identified UGS. In future work, high- and very high-resolution imagery could be used to classify UGS into categories such as residential, institutional and commercial UGS. This would help identify the most appropriate green spaces for expansion and support policymakers in making decisions on urban planning. Additionally, classification approaches such as convolutional neural networks and other deep learning methods could be used to improve accuracy.

Fig. 9 Map highlighting UGS and adjoining bare lands

References
1. Adhikari B, Pokharel S, Mishra SR (2019) Shrinking urban greenspace and the rise in non-communicable diseases in South Asia: an urgent need for an advocacy. Frontiers Sustain Cities 1. https://doi.org/10.3389/frsc.2019.00005
2. Aronson MF, Lepczyk CA, Evans KL, Goddard MA, Lerman SB, MacIvor JS, Nilon CH, Vargo T (2017) Biodiversity in the city: key challenges for urban green space management. Frontiers Ecol Environ 15(4):189–196. https://doi.org/10.1002/fee.1480
3. Dinda S, Chatterjee ND, Ghosh S (2021) Modelling the future vulnerability of urban green space for priority-based management and green prosperity strategy planning in Kolkata, India: a PSR-based analysis using AHP-FCE and ANN-Markov model. Geocarto Int 1–28. https://doi.org/10.1080/10106049.2021.1952315
4. Du P, Xia J, Zhang W, Tan K, Liu Y, Liu S (2012) Multiple classifier system for remote sensing image classification: a review. Sensors 12:4764–4792. https://doi.org/10.3390/S120404764
5. Dutta D, Rahman A, Paul SK, Kundu A (2019) Changing pattern of urban landscape and its effect on land surface temperature in and around Delhi. Environ Monitor Assess 191(9). https://doi.org/10.1007/s10661-019-7645-3
6. Gupta N, Bhadauria HS (2014) Object based information extraction from high resolution satellite imagery using eCognition. www.IJCSI.org
7. Haaland C, van den Bosch CK (2015) Challenges and strategies for urban green-space planning in cities undergoing densification: a review. Urban Forestry Urban Greening 14(4):760–771. https://doi.org/10.1016/j.ufug.2015.07.009
8. Ignatieva M, Haase D, Dushkova D, Haase A (2020) Lawns in cities: from a globalised urban green space phenomenon to sustainable nature-based solutions. Land 9(3):73. https://doi.org/10.3390/land9030073
9. Khatami R, Mountrakis G, Stehman SV (2017) Mapping per-pixel predicted accuracy of classified remote sensing images. Remote Sens Environ 191:156–167
10. Kontoes C, Wilkinson GG, Burrill A, Goffredo S, Mégier J (2007) An experimental system for the integration of GIS data in knowledge-based image analysis for remote sensing of agriculture 7(3):247–262. https://doi.org/10.1080/02693799308901955
11. Kuang W, Liu A, Dou Y, Li G, Lu D (2018) Examining the impacts of urbanization on surface radiation using Landsat imagery. GI Sci Remote Sens 56(3):462–484. https://doi.org/10.1080/15481603.2018.1508931
12. Liu D, Xia F (2010) Assessing object-based classification: advantages and limitations 1(4):187–194. https://doi.org/10.1080/01431161003743173
13. Liu S, Zhang X, Feng Y, Xie H, Jiang L, Lei Z (2021) Spatiotemporal dynamics of urban green space influenced by rapid urbanization and land use policies in Shanghai. Forests 12(4):476. https://doi.org/10.3390/f12040476
14. Liu W, Seto KC, Wu EY, Gopal S, Woodcock CE (2004) ART-MMAP: a neural network approach to subpixel classification. IEEE Trans Geosci Remote Sens 42(9):1976–1983. https://doi.org/10.1109/TGRS.2004.831893
15. M’Ikiugu MM, Kinoshita I, Tashiro Y (2021) Urban green space analysis and identification of its potential expansion areas. Procedia Soc Behav Sci 35:449–458. https://doi.org/10.1016/j.sbspro.2012.02.110
16. WHO (2012) Health indicators of sustainable cities in the context of the Rio+20 UN conference on sustainable development. WHO, Geneva, Switzerland
17. Pierce L, Ulaby FT, Sarabandi K, Dobson MC (1994) Knowledge-based classification of polarimetric SAR images. IEEE Trans Geosci Remote Sens 32(5):1081–1086. https://doi.org/10.1109/36.312896
18. Rahman A (2007) Application of remote sensing and GIS technique for urban environmental management and sustainable development of Delhi, India. Springer, Berlin, Heidelberg, pp 165–197. https://doi.org/10.1007/978-3-540-68009-3_8
19. Rahman A, Kumar S, Fazal S, Siddiqui MA (2012) Assessment of land use/land cover change in the north-west district of Delhi using remote sensing and GIS techniques. J Indian Soc Remote Sensing 40(4):689–697. https://doi.org/10.1007/s12524-011-0165-4
20. Ruiz L, Fdez-Sarría A, Recio J (2004) Texture feature extraction for classification of remote sensing data using wavelet decomposition: a comparative study. Int Arch Photogrammetry Remote Sensing 35
21. Seto KC, Güneralp B, Hutyra LR (2012) Global forecasts of urban expansion to 2030 and direct impacts on biodiversity and carbon pools. Proc Natl Acad Sci 109(40):16083–16088. https://doi.org/10.1073/pnas.1211658109
22. Steele BM (2000) Combining multiple classifiers: an application using spatial and remotely sensed information for land cover type mapping. Remote Sens Environ 74(3):545–556. https://doi.org/10.1016/S0034-4257(00)00145-0
23. Tao Y, Xu M, Lu Z, Zhong Y (2018) DenseNet-based depth-width double reinforced deep learning neural network for high-resolution remote sensing image per-pixel classification. Remote Sens (Basel) 10(5):779
24. Wikipedia contributors (2022) South West Delhi district. Wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=South_West_Delhi_district&oldid=1080371205 [Online; accessed 15 Apr 2022]
25. Wolch JR, Byrne J, Newell JP (2014) Urban green space, public health, and environmental justice: the challenge of making cities ‘just green enough’. Landscape Urban Plan 125:234–244. https://doi.org/10.1016/j.landurbplan.2014.01.017
26. Zhong Y, Wu Y, Xu X, Zhang L (2015) An adaptive subpixel mapping method based on MAP model and class determination strategy for hyperspectral remote sensing imagery. IEEE Trans Geosci Remote Sens 53(3):1411–1426

Multi-order Replay Attack Detection Using Enhanced Feature Extraction and Deep Learning Classification

Sanil Joshi and Mohit Dua

Abstract Automatic speaker verification (ASV) technologies are used to identify and verify authenticated users. Like any other user identification system, an ASV system is vulnerable to spoofing. To make ASV systems robust against spoofing, these systems are divided into two phases: frontend feature extraction and backend classification. This paper focuses on developing a system that is robust against multi-order replay attacks. Joint frequency-domain linear prediction (FDLP) and mel-frequency cepstral coefficient (MFCC) features are extracted from the audio samples at the frontend, and a gated recurrent unit (GRU) classification model is used at the backend. The proposed system achieves an equal error rate (EER) of 2.99% under first-order (1PR) replay attacks and 1.6% under second-order (2PR) replay attacks, with accuracies of 97.7% and 97.9%, respectively.

Keywords ASV · Spoofing · FDLP · MFCC · GRU · Deep learning

1 Introduction

Spoofing is a security exploit in which an attacker impersonates an authorized user to gain access to a trusted resource. Three types of spoofing attacks are common against ASV systems: logical access (LA) attacks, presentation attacks (PA) and deepfake (DF) attacks. Voice conversion (VC) and speech synthesis (SS) are two examples of LA attacks; they rely on various algorithms and require technical skills. Presentation attacks, on the other hand, do not require any technical skills. Replay attacks and mimicry attacks are examples of presentation attacks: replay attacks are performed using recording devices, and mimicry attacks are performed by skilled artists who can impersonate the voices of others.

S. Joshi (B) · M. Dua
Department of Computer Engineering, National Institute of Technology, Kurukshetra, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_63

S. Joshi and M. Dua

Replay attacks are a major concern nowadays because anyone can execute them without difficulty. Deepfake attacks are very similar to LA attacks, except that the audio samples are compressed. Various speech corpora covering different types of spoofing attacks have been proposed. For example, the ASVspoof 2015 dataset contains SS and VC attacks [1, 2], the ASVspoof 2017 dataset contains only realistic presentation attacks [3], and the YOHO dataset contains only mimicry attacks [4]. ASVspoof 2019 [5] includes both LA and PA attacks, and ASVspoof 2021 [6] includes these two widely used spoofing methods as well as deepfake attacks. The Voice Spoofing Detection Corpus (VSDC) [7] includes multi-order replay patterns (0PR, 1PR and 2PR); that is, it contains bona fide speech of the original users together with spoofed speech recorded in multi-order replay scenarios. Because replay attacks are easy to execute and can reduce the effectiveness of an ASV system, the proposed system uses the VSDC dataset to build robustness against replay attacks.

A basic ASV system consists of two main phases. The first phase is frontend feature extraction, in which important features are extracted from the audio samples provided by the users. Widely used feature extraction techniques in the audio spoofing domain include acoustic ternary patterns (ATP) [7], constant-Q cepstral coefficients (CQCC) [3], gammatone cepstral coefficients (GTCC) [7] and mel-frequency cepstral coefficients (MFCC) [8, 9]. The second phase is the backend classification model: the feature vector is fed to a classifier that labels each audio sample as spoofed or bona fide. Classification techniques used for this purpose include the Gaussian mixture model (GMM) [10], random forest (RF) [11], naive Bayes (NB) [7], long short-term memory (LSTM) [8], recurrent neural network (RNN) [3] and convolutional neural network (CNN) [3]. Common criteria for measuring the performance of an ASV system are the equal error rate (EER) and the tandem detection cost function (t-DCF) [12]. In the proposed system, features are extracted with the joint FDLP-MFCC technique, because FDLP and MFCC are popular features in speech recognition. The system is trained and evaluated on VSDC, and a GRU classifier is used for audio classification. Performance is measured with EER [8] and accuracy [7]. EER is the standard evaluation metric for biometric systems such as ASV; it is the error rate at the decision threshold where the false acceptance rate (FAR) equals the false rejection rate (FRR). In addition to EER, accuracy is reported as a complementary measure of performance.
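Because EER is defined by the point where FAR and FRR meet, it can be estimated by sweeping a decision threshold over the classifier scores. This is a minimal sketch, assuming that higher scores mean "more likely bona fide"; it picks the observed threshold where FAR and FRR are closest, whereas evaluation toolkits often interpolate along the ROC curve instead. The scores are hypothetical.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    A sample is accepted as bona fide when its score >= threshold.
    FAR: fraction of spoofed samples accepted. FRR: fraction of bona fide samples rejected.
    """
    bona = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bona, spoof]))
    far = np.array([(spoof >= t).mean() for t in thresholds])
    frr = np.array([(bona < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))   # threshold where the two error rates are closest
    return (far[i] + frr[i]) / 2, thresholds[i]

bona = [0.9, 0.8, 0.75, 0.6, 0.3]
spoof = [0.1, 0.2, 0.35, 0.4, 0.7]
eer, thr = compute_eer(bona, spoof)
print(f"EER = {eer:.2%} at threshold {thr}")
```

Raising the threshold lowers FAR and raises FRR, so the two curves cross exactly once; the EER summarizes that crossing in a single number.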

Multi-order Replay Attack Detection Using Enhanced Feature …

2 Voice Spoofing Detection Corpus (VSDC)

The Voice Spoofing Detection Corpus [7] is designed to capture realistic replay attack scenarios, i.e., multi-order replay attacks. The dataset is divided into training and testing sets, both of which contain bona fide audio samples (0PR), first-order replay samples (1PR) and second-order replay samples (2PR). It was created from the recordings of 9 female and 10 male speakers, and all audio samples have the same length.

3 Proposed Automatic Speaker Verification (ASV) System

The proposed ASV system uses the joint FDLP-MFCC feature extraction technique at the frontend and a gated recurrent unit (GRU) model at the backend. The whole system is trained and evaluated on the VSDC dataset. Figure 1 shows the architecture of the proposed system. First, the audio samples are passed to the FDLP-MFCC feature extractor, which produces a 78-dimensional feature vector. This vector is then fed to the GRU model, which classifies each audio sample as spoofed or bona fide.

3.1 Feature Extraction Using Combined FDLP-MFCC Technique

First, the user's audio signal is passed to the FDLP technique to extract the first 13 FDLP features [13]. The same signal is then passed to the MFCC technique to extract 13 MFCC features [8]. The two 13-dimensional sets are concatenated into a 26-dimensional static feature vector. Delta (first-order derivative) and double-delta (second-order derivative) coefficients are then computed from these 26 features and appended, giving a 78-dimensional feature vector (26 static + 26 delta + 26 double-delta). Figure 2 shows the detailed steps of the joint FDLP-MFCC feature extraction technique.

Fig. 1 Architecture of proposed automatic speaker verification (ASV) system

Fig. 2 Steps of proposed FDLP-MFCC feature extraction technique
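The assembly of the 78-dimensional vector can be sketched in NumPy. Assumptions: FDLP and MFCC coefficients are already available as per-frame matrices (random placeholders here, since the paper performs the extraction itself in MATLAB); deltas use the common regression formula over a window of N = 2 frames with edge padding, which is an assumed choice rather than a documented setting of the paper.

```python
import numpy as np

def deltas(features, N=2):
    """Regression-based delta coefficients along the time (frame) axis.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first and last frames repeated at the edges.
    """
    T = features.shape[0]
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom

def joint_fdlp_mfcc(fdlp, mfcc):
    """Concatenate 13 FDLP + 13 MFCC static features, then append delta and double-delta."""
    static = np.hstack([fdlp, mfcc])     # (frames, 26)
    d1 = deltas(static)                  # first-order derivative
    d2 = deltas(d1)                      # second-order derivative
    return np.hstack([static, d1, d2])   # (frames, 78)

frames = 200
rng = np.random.default_rng(1)
fdlp = rng.standard_normal((frames, 13))   # placeholder FDLP coefficients
mfcc = rng.standard_normal((frames, 13))   # placeholder MFCC coefficients
print(joint_fdlp_mfcc(fdlp, mfcc).shape)   # (200, 78)
```

Computing the double-delta as the delta of the delta is one common convention; some toolkits instead apply a separate second-order regression to the static features.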

3.2 Classification Using GRU

GRU is popular in deep learning because it mitigates the vanishing-gradient problem of standard RNNs. GRU has properties similar to those of LSTM: both use gating mechanisms and hidden states. However, GRU has lower computational complexity: it has fewer parameters than LSTM, so it trains faster and can learn better from small training sets. The proposed model consists of an input GRU layer with 50 units, followed by a dropout layer with a rate of 20% to reduce co-dependency among neurons. This is followed by four GRU layers with 100, 150, 200 and 250 units, each using the tanh activation function, which determines the output a node produces for a given input. Each of these layers is also followed by a 20% dropout layer. Finally, a dense layer with a single unit and sigmoid activation produces the binary (spoofed/bona fide) output. Figure 3 shows the architecture of the GRU model used in the proposed system.
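To make the layer stack concrete and to illustrate why a GRU is lighter than an LSTM, the sketch below counts trainable parameters with the standard formulas for Keras-style layers: a GRU with reset_after=True (the TensorFlow 2 default) has 3 × (input weights + recurrent weights + two bias vectors), and an LSTM has 4 × (input weights + recurrent weights + one bias vector). It assumes each time step carries the 78-dimensional feature vector; dropout layers add no parameters.

```python
def gru_params(input_dim, units):
    # Update gate, reset gate and candidate state; reset_after=True keeps two bias vectors
    return 3 * (units * (input_dim + units) + 2 * units)

def lstm_params(input_dim, units):
    # Input, forget and output gates plus the cell candidate, one bias vector each
    return 4 * (units * (input_dim + units) + units)

def stack_params(cell_params, input_dim=78, layer_units=(50, 100, 150, 200, 250)):
    """Total parameters of stacked recurrent layers followed by a 1-unit sigmoid dense layer."""
    total, d = 0, input_dim
    for units in layer_units:
        total += cell_params(d, units)
        d = units                  # each layer feeds its hidden state to the next one
    return total + d * 1 + 1       # dense output layer: weights + bias

print("GRU stack: ", stack_params(gru_params))
print("LSTM stack:", stack_params(lstm_params))
```

Under these assumptions the GRU stack needs roughly three quarters of the parameters of an equally sized LSTM stack, which is consistent with the lower complexity claimed above.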

4 Experimental Setup

At the frontend, 13 FDLP features and 13 MFCC features are first extracted and combined into a 26-dimensional static feature vector. Dynamic features are then computed from this vector with delta and double-delta functions, giving a 78-dimensional feature vector. Features are extracted using MATLAB R2021b on the Windows 11 operating system. The GRU is implemented with Python machine learning and deep learning libraries. Training and validation use K-fold cross-validation with 5 folds, a batch size of 10 and 50 epochs. Early stopping is used to prevent overfitting. The Adam optimizer is used for optimization, and binary cross-entropy is used as the loss function. The system is trained and tested on the VSDC dataset, and its performance is measured with EER and accuracy.

Fig. 3 Architecture of GRU classification model
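The 5-fold protocol with early stopping can be sketched without any deep learning framework. Assumptions: folds are formed by shuffling sample indices (the paper does not say whether folds are stratified or speaker-disjoint), the sample count of 1000 is a placeholder, and the early-stopping patience is a hypothetical parameter because the paper does not state its stopping criterion.

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience   # True means "stop training"

n_samples, batch_size, max_epochs = 1000, 10, 50   # n_samples is a placeholder
for fold, (train, val) in enumerate(kfold_indices(n_samples)):
    steps_per_epoch = int(np.ceil(len(train) / batch_size))
    print(f"fold {fold}: {len(train)} train, {len(val)} val, {steps_per_epoch} batches/epoch")
```

In a real training loop, `EarlyStopping.step` would be called once per epoch (up to `max_epochs`) with that epoch's validation loss, and the reported EER and accuracy would be averaged over the five folds.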

5 Results

The proposed system is trained and evaluated on the VSDC dataset. First, we evaluate it on bona fide (0PR) versus first-order replay (1PR) samples. Then, bona fide (0PR) and second-order replay (2PR) samples are combined to check the system's robustness in a multi-order replay environment. The results are interpreted in detail below.

5.1 Analysis of 0PR Versus 1PR

In the first experiment, we evaluate the proposed system against first-order replay attacks. When the joint FDLP-MFCC features are fed to the GRU model, the system achieves an EER of 2.99%. The results of 0PR versus 1PR are shown in Table 1.

Table 1 Results of the proposed system against first-order (1PR) and second-order (2PR) replay attacks

Dataset   Frontend          Backend   Type of attack   EER (%)   Accuracy (%)
VSDC      Joint FDLP-MFCC   GRU       1PR              2.99      97.7
VSDC      Joint FDLP-MFCC   GRU       2PR              1.6       97.9

5.2 Analysis of 0PR Versus 2PR

In the second experiment, 0PR and 2PR samples are combined, and the system performance is evaluated in a multi-order replay environment. The proposed system achieves an EER of 1.6% against 2PR attacks. Since this EER is lower than the 2.99% obtained for 1PR, the results indicate that the proposed system is even more robust against second-order replay attacks. The results of 0PR versus 2PR are shown in Table 1.

5.3 Comparison of Proposed Approach with Existing Techniques

Dua et al. [14] combined a CQCC-based CNN and an MFCC-based LSTM, using the outputs of both models to predict the final output. Their hybrid model achieves EERs of 3.6% and 2.96% and accuracies of 97.6% and 97.78% in the 1PR and 2PR environments, respectively. Shukla et al. [15] worked only in a first-order replay environment, using spectrograms and raw waveforms with 1D and 2D CNNs, and evaluated their system with EER only; it achieves an EER of 5.29%. Note that [15] was evaluated on ASVspoof 2017 rather than VSDC, so its EER is not directly comparable. The proposed system achieves an EER of 2.99% and an accuracy of 97.7% in the first-order replay environment, and an EER of 1.6% and an accuracy of 97.9% in the second-order replay environment. Table 2 compares the proposed approach with these existing approaches.

Table 2 Comparison of proposed approach with existing systems

Works                Dataset         Frontend                      Backend            EER 1PR (%)   Accuracy 1PR (%)   EER 2PR (%)   Accuracy 2PR (%)
Dua et al. [14]      VSDC            CQCC, MFCC                    Hybrid LSTM, CNN   3.6           97.6               2.96          97.78
Shukla et al. [15]   ASVspoof 2017   Spectrograms, raw waveforms   1D, 2D CNN         5.29          –                  –             –
Proposed work        VSDC            Joint FDLP-MFCC               GRU                2.99          97.7               1.6           97.9

6 Conclusion

ASV systems are widely used in voice-controlled devices, but they are vulnerable to various security threats. The proposed joint FDLP-MFCC feature extraction technique with a GRU model at the backend achieves an EER of 2.99% and an accuracy of 97.7% under 1PR attacks, and an EER of 1.6% and an accuracy of 97.9% under 2PR attacks. The results suggest that the joint feature extraction technique allows a comparatively less complex classifier, the GRU, to be used at the backend. In future work, the system can be evaluated under different noisy conditions similar to real-life scenarios.

References
1. Mittal A, Dua M (2021) Automatic speaker verification system using three dimensional static and contextual variation-based features with two-dimensional convolutional neural network. Int J Swarm Intell 6(2):143–153
2. Mittal A, Dua M (2021) Automatic speaker verification systems and spoof detection techniques: review and analysis. Int J Speech Technol 1–30
3. Lavrentyeva G, Novoselov S, Malykh E, Kozlov A, Kudashev O, Shchemelinin V (2017) Audio replay attack detection with deep learning frameworks. In: Interspeech, pp 82–86
4. Campbell JP (1995) Testing with the YOHO CD-ROM voice verification corpus. In: 1995 international conference on acoustics, speech, and signal processing, vol 1. IEEE, pp 341–344
5. Mittal A, Dua M (2021) Static–dynamic features and hybrid deep learning models-based spoof detection system for ASV. Complex Intell Syst 1–14
6. Delgado H et al (2021) ASVspoof 2021: automatic speaker verification spoofing and countermeasures challenge evaluation plan. arXiv preprint arXiv:2109.00535
7. Malik KM, Javed A, Malik H, Irtaza A (2020) A light-weight replay detection framework for voice controlled IoT devices. IEEE J Sel Top Signal Process 14(5):982–996
8. Dua M, Jain C, Kumar S (2021) LSTM and CNN based ensemble approach for spoof detection task in automatic speaker verification systems. J Ambient Intell Humaniz Comput 1–16
9. Mittal A, Dua M, Dua S (2021) Classical and deep learning data processing techniques for speech and speaker recognitions. In: Deep learning approaches for spoken and natural language processing. Springer, Cham, pp 111–126
10. Dua M, Aggarwal RK, Biswas M (2019) GFCC based discriminatively trained noise robust continuous ASR system for Hindi language. J Ambient Intell Humaniz Comput 10(6):2301–2314
11. Biau G, Scornet E (2016) A random forest guided tour. TEST 25(2):197–227
12. Mittal A, Dua M (2021) Constant Q cepstral coefficients and long short-term memory model-based automatic speaker verification system. In: Proceedings of international conference on intelligent computing, information and control systems. Springer, Singapore, pp 895–904
13. Ganapathy S, Pelecanos J, Omar MK (2011) Feature normalization for speaker verification in room reverberation. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4836–4839
14. Dua M, Sadhu A, Jindal A, Mehta R (2022) A hybrid noise robust model for multi-replay attack detection in automatic speaker verification systems. Biomed Signal Process Control 74:103517
15. Shukla S, Prakash J, Guntur RS (2019) Replay attack detection with raw audio waves and deep learning framework. In: 2019 international conference on data science and engineering (ICDSE). IEEE, pp 66–70

Ferry Mobility-Aware Routing for Sparse Flying Ad-Hoc Network

Juhi Agrawal, Monit Kapoor, and Ravi Tomar

Abstract Existing routing protocols are suitable for well-connected networks but not for sparse networks. This article presents a geographical routing scheme that uses a delay-tolerant network (DTN) routing protocol for FANETs. The proposed ferry mobility-aware direction and time-based greedy DTN routing (FM-DT-GDR) combines a position-based routing strategy with DTN to reduce routing errors. Ferry unmanned aerial vehicles (UAVs) broadcast beacons to inform the other nodes of their next anchor location. A node that receives a beacon calculates its nearest destination, either the base station or the ferry node. The source node chooses the forwarder node based on the time that neighbor nodes take to reach the destination and the zone to which each node belongs. Each node in the network uses the store-carry-and-forward technique. The proposed communication protocol offers a two-fold solution whose primary aim is low latency, delivering data with lower routing overhead and lower latency. Simulation scenarios show that the proposed model increases packet delivery and reduces end-to-end delay and overhead.

Keywords DTN routing · Search and rescue · Unmanned aerial vehicle

J. Agrawal (B)
School of Computer Science, University of Petroleum and Energy Studies, Uttarakhand, India

M. Kapoor · R. Tomar
Institute of Engineering and Technology, Chitkara University, Punjab, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_64

J. Agrawal et al.

1 Introduction

In emergency or search and rescue operations, many UAVs are specifically configured to perform complex tasks over a wide area. UAVs are often combined with the delay-tolerant network (DTN) technique to support routing in intermittently connected networks [1–6]. Technologies such as cameras, GPS and sensors have received much attention in the same context, for monitoring, surveillance, and search and rescue operations [7]. The ability of UAVs to reach points of interest quickly and easily makes them an excellent solution for a variety of applications. Because of their low node density and high mobility, the network is not connected all the time, and a route may not exist between two UAVs. This class of network is called a sparse flying ad hoc network (FANET), which is a type of delay-tolerant network (DTN) [8]. Figure 1 depicts UAVs searching for an object and forwarding data to the base station. Designing a routing protocol is more complicated for sparse FANETs because they experience frequent disconnections and highly dynamic topologies. Existing routing algorithms designed for connected FANETs generally assume that at least one complete route exists between source and destination, an assumption that may hold for dense networks. In partially connected or sparse networks, therefore, packets cannot be forwarded efficiently, and a highly efficient and fault-tolerant routing protocol is required for sparse FANETs.

Position-based routing schemes are more suitable for sparse networks than flooding-based schemes. A position-based routing scheme uses the geographic locations of the UAVs to make data-forwarding decisions. For this reason, this article combines geographic routing strategies with a DTN strategy for sparse FANETs. To achieve high packet delivery with low delay and low routing overhead, FM-DT-GDR combines an efficient mobility model for the ferry, which collects data from the UAVs, with a routing strategy. DTN uses store-carry-and-forward (SCF) policies, in which each node stores messages and forwards them to other nodes at a later stage [11]. Various aspects of DTN, such as routing, remain open problems. Many routing strategies have been introduced that aim to improve message forwarding, increase message replication or predict the behavior of nodes. Other routing methods rely on different kinds of information to pass messages between nodes [12–15]. This paper aims to provide a DTN routing solution by combining geographic routing with the DTN scheme.

Fig. 1 UAVs are searching for an object and forwarding data to the BS
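The forwarding rule summarized in the abstract (each node picks the nearer of the base station and the ferry's advertised next anchor point, and the source hands its data to the neighbour that can reach its destination soonest, otherwise carrying it) can be sketched as follows. This is a simplified illustration, not the FM-DT-GDR implementation: it assumes 2D positions, straight-line flight at each UAV's own constant speed, and it omits the zone criterion, which is not defined in this section. All names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Uav:
    name: str
    x: float
    y: float
    speed: float  # m/s, assumed constant

def nearest_destination(node, base_station, ferry_anchor):
    """Pick the closer of the base station and the ferry's next anchor location."""
    d_bs = math.dist((node.x, node.y), base_station)
    d_ferry = math.dist((node.x, node.y), ferry_anchor)
    return (base_station, d_bs) if d_bs <= d_ferry else (ferry_anchor, d_ferry)

def time_to_destination(node, base_station, ferry_anchor):
    _, dist = nearest_destination(node, base_station, ferry_anchor)
    return dist / node.speed

def choose_forwarder(source, neighbours, base_station, ferry_anchor):
    """Store-carry-and-forward: hand over only if a neighbour reaches a destination sooner."""
    best = source
    best_time = time_to_destination(source, base_station, ferry_anchor)
    for nb in neighbours:
        t = time_to_destination(nb, base_station, ferry_anchor)
        if t < best_time:
            best, best_time = nb, t
    return best, best_time   # best is the source itself when it should keep carrying the data

bs = (0.0, 0.0)
ferry_anchor = (2000.0, 1500.0)   # next anchor point advertised in the ferry's beacon
src = Uav("src", 1500.0, 300.0, 10.0)
nbrs = [Uav("a", 1600.0, 900.0, 12.0), Uav("b", 1200.0, 200.0, 8.0)]
node, t = choose_forwarder(src, nbrs, bs, ferry_anchor)
print(node.name, round(t, 1))
```

Here neighbour "a" is chosen because it is close to the ferry's anchor point and flies faster, even though neighbour "b" is closer to the base station.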

2 Literature Work

This section briefly reviews related studies on delay-tolerant network (DTN) routing, since DTN-based routing is mainly designed for sparse FANETs, where conventional protocols cannot be applied. Consequently, numerous studies have developed protocols suited to the characteristics of the DTN environment, and various DTN routing schemes have been proposed in the literature. In [16], the authors proposed the epidemic routing protocol, in which the source node floods replicas of each message. When two nodes meet, they exchange their buffered messages, and at the end of the contact both hold the same set of packets (each node infects its neighbor, similar to the spread of a disease). This strategy increases message delivery speed but is resource-intensive, and it does not control message replication. Conversely, [17] proposes the spray and wait (S&W) protocol, which limits replication to reduce resource consumption. It has two phases:
• Spray phase: the source spreads n copies of each packet to n intermediate UAVs. In normal spray and wait, the source sends one copy to each encountered neighbor and keeps the remaining copies in its buffer; in binary spray and wait, a node holding n copies sends half of them (n/2) to the encountered node and keeps n − n/2.
• Wait phase: if the destination is not found during the spray phase, each node holding a copy waits for direct contact with the destination to deliver it.
Although the S&W protocol performs better than the epidemic protocol, achieving a high delivery rate requires more than five copies, which increases power consumption.
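The contact rules of the two replication-based schemes above can be sketched in a few lines. This is an illustrative model only: buffers are plain sets of message IDs, and binary spray and wait is written with integer halving, so a node holding n copies hands over floor(n/2) and keeps the rest; these representation choices are assumptions.

```python
def epidemic_exchange(buffer_a, buffer_b):
    """Epidemic routing: on contact, both nodes end up with the union of their messages."""
    merged = set(buffer_a) | set(buffer_b)
    return set(merged), set(merged)

def binary_spray(copies):
    """Binary spray and wait: a node holding n > 1 copies hands floor(n/2) to the node it
    meets and keeps the rest. A node left with one copy is in the wait phase and only
    delivers directly to the destination."""
    if copies <= 1:
        return copies, 0          # wait phase: nothing is handed over
    give = copies // 2
    return copies - give, give

a, b = epidemic_exchange({"m1", "m2"}, {"m3"})
print(a == b, sorted(a))

# Follow one node through successive contacts, starting with 8 copies
handed_over, copies = [], 8
while copies > 1:
    copies, handed = binary_spray(copies)
    handed_over.append(handed)
print(handed_over, copies)
```

The halving rule means a node runs out of spare copies after about log2(n) contacts, which is how spray and wait bounds replication compared with epidemic flooding.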


J. Agrawal et al.

In [18], the authors propose a new version of Prophet routing, known as the Prophetv2 protocol. The underlying idea is the same, but minor changes have been made, for example to the mechanism for calculating delivery predictability. An existing Prophet implementation can therefore be updated to conform to the Prophetv2 specification with small code changes. Prophet is a generic routing scheme that relies on historical encounter data, and it has crucial limitations [19]:
• When a message is delivered to the receiver, it may already have been severely corrupted by an intermediate node.
• The risk of packets being dropped is high when the connection path involves a large number of nodes and hops.
• Some messages may be lost in transit, mostly because of their time to live and buffer memory constraints.
In standard Prophet, repeated encounters between two nodes greatly increase their delivery predictability. Conversely, if two nodes do not meet within a given time interval, their forwarding predictability decays rapidly. In addition, flaws in transport control can result in the deletion, cancellation, or corruption of messages. LAROD [19] is a geographic routing approach that combines SCF with greedy forwarding. The source node broadcasts the data when it needs to be forwarded to the BS, and an overhearing scheme is used to control the routing overhead. In location-aided delay-tolerant routing (LADTR) [20], the source node broadcasts a route request message when it wants to send data. If the sink node is within the node's communication range, the UAV forwards the data directly to the sink node. A mobility prediction technique is used to estimate the future locations of nodes. LADTR achieves lower latency and higher packet delivery than flooding-based techniques such as epidemic and spray and wait. Table 1 summarizes the DTN routing protocols.
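The encounter and aging behavior described above can be illustrated with the delivery-predictability updates of the original Prophet protocol. This is a sketch only. The parameter values P_INIT = 0.75, BETA = 0.25, and GAMMA = 0.98 are assumed defaults commonly quoted for the original protocol, and Prophetv2 [18] modifies parts of this calculation.

```python
# Assumed parameter values for the original Prophet protocol.
P_INIT = 0.75  # predictability boost on each direct encounter
BETA = 0.25    # weight of the transitive update
GAMMA = 0.98   # aging constant per elapsed time unit


def on_encounter(p_old: float, p_init: float = P_INIT) -> float:
    """Direct update when nodes a and b meet: P(a,b) moves toward 1."""
    return p_old + (1.0 - p_old) * p_init


def age(p_old: float, k: int, gamma: float = GAMMA) -> float:
    """Decay of P(a,b) after k time units without an encounter."""
    return p_old * gamma ** k


def transitive(p_ac_old: float, p_ab: float, p_bc: float, beta: float = BETA) -> float:
    """If a often meets b and b often meets c, a becomes a useful carrier toward c."""
    return p_ac_old + (1.0 - p_ac_old) * p_ab * p_bc * beta


p = on_encounter(0.0)   # first meeting
p = on_encounter(p)     # a second meeting raises predictability further
p_later = age(p, k=10)  # ten time units without meeting
```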
This paper proposes a routing protocol that delivers data to the base station with lower delay and lower overhead than flooding-based schemes.

3 Proposed Scheme
We propose a novel ferry mobility-based direction and time-aware greedy delay-tolerant routing (FM-DT-GDR) protocol. In this work, a mobility model for ferries is designed for an emergency scenario to collect data from UAVs. The steps of FM-DT-GDR are discussed below:

Ferry Mobility-Aware Routing for Sparse Flying Ad-Hoc Network

Table 1 Summary of DTN routing protocols

| DTN routing protocol | Type of replication | Routing strategy |
|---|---|---|
| Epidemic | Unlimited replication | Flooding |
| Prophet | Controlled replication | Forwarding |
| Spray and Wait | Unlimited replication | Flooding |
| Maxprop | Controlled replication | Forwarding |
| Prophetv2 | Controlled replication | Forwarding |
| LAROD | Unlimited replication | Forwarding |
| LADTR | Unlimited replication | Forwarding |
| GeoSaw | Unlimited replication | Forwarding |

3.1 Mobility Model of Ferry UAVs
First, the center of the operational area is calculated in order to design an optimized path along which the ferry collects the captured information from the UAVs. The ferry nodes are then placed at a distance Cr from the center of the operational area. In this work, all nodes are assumed to have the same communication range, and two ferry nodes are used to collect information from the searching nodes. Each ferry's waiting points lie at the middle of the deployed area's edges, one in each of the four directions. A waiting point is a location at which a ferry waits and collects data from other nodes. In Fig. 2, A, B, C, and D are the waiting points of the outer ferry, whereas a, b, c, and d are the waiting points of the inner ferry. After collecting data at one waiting point, the ferry travels to the next. Once it has collected data at all four waiting points, that is, on reaching the fourth (last) waiting point, the ferry sends the collected data to the BS. Consecutive waiting points are connected diagonally, so the ferry moves diagonally from one waiting point to the next.
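The waiting-point geometry can be sketched as follows. The sketch assumes, which the text does not state explicitly, that each ferry's four waiting points lie at a fixed distance from the area center along the four cardinal directions. Visiting them in order then produces exactly the diagonal legs described above. The radius values and function names are hypothetical.

```python
import math


def area_center(width: float, length: float) -> tuple:
    """Center of a rectangular operational area with its corner at the origin."""
    return (width / 2.0, length / 2.0)


def waiting_points(center: tuple, radius: float) -> list:
    """Four waiting points (north, east, south, west) at `radius` from the center."""
    cx, cy = center
    return [(cx, cy + radius), (cx + radius, cy), (cx, cy - radius), (cx - radius, cy)]


def path_length(points: list) -> float:
    """Length of the path that visits the waiting points in order, ending at the last one."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


center = area_center(2000.0, 2000.0)        # simulation area from Sect. 4.1
outer = waiting_points(center, radius=750)  # hypothetical radius for A, B, C, D
inner = waiting_points(center, radius=250)  # hypothetical radius for a, b, c, d
```

Each leg between consecutive points has equal x and y displacement, so the path consists of diagonal moves of length radius × √2.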


Fig. 2 Proposed mobility pattern

3.2 Routing Strategy
The proposed routing solution makes routing decisions based on several factors, such as the node's direction, speed, distance, and zone. Zone 1 contains the neighbor nodes that are closer to the destination (in distance) than the source or sender node, whereas zone 2 contains the nodes that are farther from the destination than the sender [19]. Greedy forwarding is used as part of the routing decision. The routing strategy consists of the following steps:
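The zone split can be expressed directly as a distance comparison. In this sketch, positions are (x, y, z) tuples in meters. The helpers `classify_zone` and `zone1_neighbors` are hypothetical names, and a neighbor at exactly the sender's distance is assigned to zone 2.

```python
import math


def classify_zone(sender: tuple, neighbor: tuple, destination: tuple) -> int:
    """Zone 1 if the neighbor is closer to the destination than the sender, else zone 2."""
    if math.dist(neighbor, destination) < math.dist(sender, destination):
        return 1
    return 2


def zone1_neighbors(sender: tuple, neighbors: dict, destination: tuple) -> list:
    """IDs of zone-1 neighbors, the candidates preferred by greedy forwarding."""
    return [nid for nid, pos in neighbors.items()
            if classify_zone(sender, pos, destination) == 1]


sender = (0.0, 0.0, 100.0)
bs = (1000.0, 0.0, 0.0)
candidates = zone1_neighbors(sender, {1: (200.0, 0.0, 100.0), 2: (-150.0, 0.0, 100.0)}, bs)
```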

3.2.1 Discovery of Neighbors

Routing protocols generally discover neighbors through periodically broadcast presence announcements. Each node broadcasts a short hello message to declare its presence and location. On receiving a hello message, a node stores information about the sender in its neighbor table. If a node does not receive a hello message from a neighbor within a specific time interval, that neighbor is considered to be out of transmission range or otherwise unreachable and is removed from the neighbor table. The neighbor table contains the node ID, current location, previous location, and current speed of each neighbor.
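A neighbor table with hello-based expiry might look like the following sketch. The class names are hypothetical. The 3 s timeout, equal to three hello intervals at the 1 message/s rate used in the simulations, is also an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float, float]


@dataclass
class NeighborEntry:
    node_id: int
    current_location: Position
    previous_location: Optional[Position]
    speed: float       # m/s, as announced in the hello message
    last_heard: float  # time (s) of the most recent hello


class NeighborTable:
    def __init__(self, timeout: float = 3.0) -> None:
        self.timeout = timeout  # assumed expiry interval in seconds
        self.entries: Dict[int, NeighborEntry] = {}

    def on_hello(self, node_id: int, location: Position, speed: float, now: float) -> None:
        """Insert or refresh a neighbor; its old location becomes the previous one."""
        old = self.entries.get(node_id)
        previous = old.current_location if old is not None else None
        self.entries[node_id] = NeighborEntry(node_id, location, previous, speed, now)

    def purge(self, now: float) -> List[int]:
        """Remove and return the neighbors not heard from within the timeout."""
        stale = [nid for nid, e in self.entries.items() if now - e.last_heard > self.timeout]
        for nid in stale:
            del self.entries[nid]
        return stale
```

Keeping the previous location in each entry is what later allows the direction test of Eq. (2) to be evaluated for every neighbor.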

3.2.2 Nearest Destination Calculation

On arriving at a waiting point, a ferry continuously broadcasts beacon messages, which are received by the UAVs within the ferry's communication range. The beacon packet contains the ferry ID, the waiting time, and the ferry's next location. A UAV that receives a beacon takes the ferry as its destination when the ferry is nearer than the BS, that is, when

$$ d(U, F) < d(U, BS) \tag{1} $$

where d(U, F) is the distance between the node and the ferry, and d(U, BS) is the distance between the node and the BS.
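Eq. (1) can be applied to every beacon a UAV has heard. The sketch below also assumes that, when several ferries satisfy Eq. (1), the UAV picks the nearest one. The function `choose_destination` and the (ferry ID, position) beacon layout are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Position = Tuple[float, float, float]


def choose_destination(uav: Position, bs: Position,
                       beacons: List[Tuple[int, Position]]) -> Tuple[str, Optional[int]]:
    """Return ("ferry", id) for the nearest ferry with d(U, F) < d(U, BS), else ("bs", None)."""
    best_id: Optional[int] = None
    best_dist = math.dist(uav, bs)  # a ferry must beat the BS distance (Eq. 1)
    for ferry_id, ferry_pos in beacons:
        d = math.dist(uav, ferry_pos)
        if d < best_dist:
            best_id, best_dist = ferry_id, d
    return ("ferry", best_id) if best_id is not None else ("bs", None)


uav = (0.0, 0.0, 100.0)
bs = (1000.0, 0.0, 0.0)
```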

3.2.3 Forwarding Node Selection

1. UAV direction prediction: The direction of a node is predicted from the distances between its previous and current locations and the destination. The following condition is used to predict the direction of a neighbor node:

$$ D(t-1) > D(t) \;\Rightarrow\; \text{direction} = 1 \tag{2} $$

where D(t − 1) is the distance between the node's previous location and the destination, and D(t) is the distance between its current location and the destination. If the current location is closer to the destination than the previous location (direction = 1), the node is considered to be moving toward the destination.

2. Minimum traveling time calculation: If a node is moving toward the destination, its traveling time to the destination is calculated as

$$ T = \frac{d}{s} \tag{3} $$

where T is the traveling time of the node from its current location to the destination, d (= D(t)) is the node's distance from the destination, and s is the node's speed.

Because greedy forwarding is used in this work, the zone concept is also applied. Nodes that are closer to the destination than the source node (zone 1) are preferred, since they are already close to the destination.
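Eqs. (2) and (3) can be combined with the zone rule to choose a forwarder, as in the following sketch. It follows the rule stated in the conclusion that the source does not forward when no neighbor would reach the destination sooner than the source itself. Names such as `select_forwarder` and the `Neighbor` record are hypothetical, and Algorithm 1 may differ in detail.

```python
import math
from typing import NamedTuple, Optional, Sequence, Tuple

Position = Tuple[float, ...]


class Neighbor(NamedTuple):
    node_id: int
    prev_pos: Position  # location at t - 1 (from the neighbor table)
    curr_pos: Position  # location at t
    speed: float        # m/s


def moving_toward(prev_pos: Position, curr_pos: Position, dest: Position) -> bool:
    """Eq. (2): D(t - 1) > D(t) means the node is approaching the destination."""
    return math.dist(prev_pos, dest) > math.dist(curr_pos, dest)


def travel_time(pos: Position, speed: float, dest: Position) -> float:
    """Eq. (3): T = d / s (infinite for a stationary node)."""
    return math.dist(pos, dest) / speed if speed > 0 else math.inf


def select_forwarder(src_pos: Position, src_speed: float,
                     neighbors: Sequence[Neighbor], dest: Position) -> Optional[int]:
    """ID of the zone-1 neighbor moving toward dest with the smallest T below the
    source's own T; None means the source keeps carrying the data."""
    d_src = math.dist(src_pos, dest)
    best_id, best_t = None, travel_time(src_pos, src_speed, dest)
    for n in neighbors:
        in_zone1 = math.dist(n.curr_pos, dest) < d_src
        if not (in_zone1 and moving_toward(n.prev_pos, n.curr_pos, dest)):
            continue
        t = travel_time(n.curr_pos, n.speed, dest)
        if t < best_t:
            best_id, best_t = n.node_id, t
    return best_id


dest = (1000.0, 0.0)
nbrs = [
    Neighbor(1, (90.0, 0.0), (100.0, 0.0), 20.0),   # approaching, T = 45 s
    Neighbor(2, (210.0, 0.0), (200.0, 0.0), 30.0),  # moving away, ignored
    Neighbor(3, (140.0, 0.0), (150.0, 0.0), 5.0),   # approaching, T = 170 s
]
```

With a source at the origin moving at 10 m/s (T = 100 s), neighbor 1 is selected. If the source moves at 100 m/s (T = 10 s), no neighbor is faster and the data stays with the source.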


Algorithm 1 shows the proposed algorithm.

4 Performance Evaluation
This section presents the evaluation of the proposed method and the results obtained. The simulations were performed in NS-3.25.

4.1 Simulation Settings
The simulation settings, based on [20], are listed below:

| Parameter | Value |
|---|---|
| Network simulator | NS-3.25 |
| Simulation duration | 2700 s |
| Simulation area | 2000 × 2000 × 500 m³ |
| Antenna type | Omni-directional |
| Altitude of ferries | 250 m (fixed) |
| Network type | Wireless ad-hoc network |
| Communication range | 250 m |
| Wi-Fi standard and frequency | IEEE 802.11n (2.4 GHz) for ferry UAVs; IEEE 802.11b (2.4 GHz) for searching UAVs |
| Message size | 20 KB |
| Buffer size | 20 MB |
| Number of ground stations | 1 |
| Routing protocols | FM-DT-GDR, LADTR, LAROD, Spray and Wait, Epidemic |
| Mobility of ferries | Proposed predefined mobility |
| Average speed of searching UAVs | 20 m/s |
| Beacon message interval | 1 message/s |
| Hello message interval | 1 message/s |

4.2 Performance Comparison with Other DTN Protocols

4.2.1 Effect on PDR

In this simulation, the proposed protocol is compared with existing DTN protocols: epidemic, spray and wait, LAROD, and LADTR. Fig. 3 shows that the average packet delivery rate of FM-DT-GDR is nearly 83%. The epidemic routing protocol has the lowest packet delivery rate because it floods the network, replicating each message without any control. As the number of UAVs increases, its packet delivery rate therefore rises at first but gradually falls because of network congestion. Spray and wait performs better than epidemic routing because it limits the number of message replicas. LADTR performs better than the other previously proposed routing schemes because it also uses a ferry to collect data and deliver it to the base station.


Fig. 3 Packet delivery ratio versus number of UAVs

Fig. 4 Routing overhead versus no. of UAVs

4.2.2 Effect on RO

Fig. 4 shows the average routing overhead as the number of UAVs increases. Epidemic routing exhibits the highest routing overhead, (40–74)%. Spray and wait limits the number of copies in the forwarding process, whereas FM-DT-GDR always forwards a single copy of each packet, which results in a low overhead of (18–24)%. Replica-based protocols place no limit, or only a partial limit, on the number of packet copies. This increases network traffic and therefore the RO. FM-DT-GDR shows a maximum routing overhead of 24% with 30 UAV nodes.

4.2.3 Effect on Latency

Fig. 5 shows the average latency, or end-to-end delay, as the number of UAVs increases. FM-DT-GDR performs well and notably better than the epidemic routing protocol. The average delay difference between epidemic and FM-DT-GDR is about 212 s with 10 nodes. This difference results from FM-DT-GDR's use of direction prediction, speed, and current position. Replica-based routing strategies show higher delays than schemes without replicas, such as LADTR and FM-DT-GDR. The delay of FM-DT-GDR ranges from 207 s with 10 nodes to 123 s with 30 nodes.

Fig. 5 End-to-end delay versus number of UAVs
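Readers reproducing the comparison can compute the three metrics from per-packet logs, as in this sketch. The text does not define routing overhead precisely. The ratio of control transmissions to all transmissions used here is an assumption, as are the function names.

```python
from statistics import mean


def packet_delivery_ratio(generated: int, delivered: int) -> float:
    """Fraction of generated data packets that reached the BS."""
    return delivered / generated if generated else 0.0


def routing_overhead(control_tx: int, data_tx: int) -> float:
    """Assumed definition: share of control transmissions among all transmissions."""
    total = control_tx + data_tx
    return control_tx / total if total else 0.0


def average_delay(sent_at: dict, received_at: dict) -> float:
    """Mean end-to-end delay (s) over delivered packets, keyed by packet ID."""
    delays = [received_at[pid] - sent_at[pid] for pid in received_at if pid in sent_at]
    return mean(delays) if delays else float("nan")
```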

5 Conclusion
This paper proposed the FM-DT-GDR routing protocol, which uses the ferries' mobility and each node's speed, direction, and current position to select the optimal forwarder node. A ferry mobility model has been proposed that gives ferry nodes an optimized path for collecting data from UAVs. To select the best forwarder node, FM-DT-GDR first predicts the future direction of each neighbor node. For each neighbor moving toward the destination, the source node then calculates the time that neighbor would take to reach the destination. If the neighbor nodes would take more time than the source node itself, the source node does not forward the data to any of them. We analyzed the performance in a variety of scenarios, and the results show advantages in packet delivery rate, routing overhead, and end-to-end delay.

References
1. Semsch E et al (2009) Autonomous UAV surveillance in complex urban environments. In: 2009 IEEE/WIC/ACM international joint conference on web intelligence and intelligent agent technology, vol 2, IEEE
2. Barrado C, Messeguer R, Lopez J, Pastor E, Santamaria E, Royo P (2010) Wildfire monitoring using a mixed air-ground mobile network. IEEE Pervasive Comput 9(4):24–32
3. Manathara JG, Sujit P, Beard RW (2011) Multiple UAV coalitions for a search and prosecute mission. J Intell Robot Syst 62(1):125–158
4. De Freitas EP, Heimfarth T, Netto IF, Lino CE, Pereira CE, Ferreira AM, Wagner FR, Larsson T (2010) UAV relay network to support WSN connectivity. In: Proceedings IEEE international congress on ultra modern telecommunications and control systems and workshops (ICUMT), Moscow, Russia, pp 309–314
5. Maza I, Caballero F, Capitan J, Martínez-de Dios JR, Ollero A (2011) Experimental results in multi-UAV coordination for disaster management and civil security applications. J Intell Robot Syst 61(1):563–585
6. Kerr S (2014) UAE to develop fleet of drones to deliver public services. The Financial Times, World News, Retrieved, vol 12
7. Cho A, Kim J, Lee S, Kee C (2011) Wind estimation and airspeed calibration using a UAV with a single-antenna GPS receiver and pitot tube. IEEE Trans Aerospace Electron Syst 47(1):109–117
8. Zafar W, Khan BM (2016) Flying ad-hoc networks: technological and social implications. IEEE Technol Soc Mag 35(2):67–74
9. Bekmezci I, Sahingoz OK, Temel S (2013) Flying ad-hoc networks (FANETs): a survey. Ad Hoc Netw 11(3):1254–1270
10. Bilen T, Canberk B (2022) Three-phased clustered topology formation for aeronautical Ad-Hoc networks. Pervasive Mob Comput 79:101513
11. Gupta L, Jain R, Vaszkun G (2016) Survey of important issues in UAV communication networks. IEEE Commun Surveys Tuts 18(2):1123–1152, 2nd Quart
12. Chmaj G, Selvaraj H (2015) Distributed processing applications for UAV/drones: a survey. Progress Syst Eng. Springer 366:449–454
13. Zheng Z, Li J, Yong T, Wen Z (2022) Research of routing protocols for unmanned aerial vehicle Ad Hoc network. In: 2022 2nd international conference on consumer electronics and computer engineering (ICCECE), IEEE, pp 518–524
14. Maurilio M, Saeed N, Kishk MA, Alouini MS (2022) Post-disaster communications: enabling technologies, architectures, and open challenges. arXiv preprint arXiv:2203.13621
15. Malhotra A, Kaur S (2022) A comprehensive review on recent advancements in routing protocols for flying ad hoc networks. Trans Emerg Telecommun Technol 33(3):e3688
16. Vahdat A, Becker D (2000) Epidemic routing for partially-connected ad hoc networks. Technical Report CS-2000-06, Duke University, Durham, NC, USA
17. Spyropoulos T, Psounis K, Raghavendra CS (2005) Spray and wait: an efficient routing scheme for intermittently connected mobile networks. In: Proceedings of the 2005 ACM SIGCOMM workshop on delay-tolerant networking (WDTN '05), Philadelphia, PA, USA, pp 252–259
18. Grasic S, Davies E, Lindgren A, Doria A (2011) The evolution of a DTN routing protocol: PRoPHETv2. In: Proceedings of the 6th ACM workshop on challenged networks, pp 27–30
19. Monicka S (2014) LAROD-LoDiS: an efficient routing algorithm for intermittently connected mobile ad hoc networks (IC-MANETs). Network Commun Eng 6(2)
20. Arafat MY, Moh S (2018) Location-aided delay tolerant routing protocol in UAV networks for post-disaster operation. IEEE Access 6:59891–59906
21. Bujari A, Calafate CT (2018) A location-aware waypoint-based routing protocol for airborne DTNs in search and rescue scenarios. Sensors 18(11):375

Prediction of Cardio Vascular Disease from Retinal Fundus Images Using Machine Learning M. Sopana Devi and S. Ebenezer Juliet

Abstract The publicly available DIARETDB1 dataset contains retinal images that can be used to diagnose and forecast the risk of cardiovascular disease (CVD). Exudates, microaneurysms, and blood vessel abnormalities, identified by segmenting retinal fundus images, can all be signs of CVD. A K-nearest neighbor (KNN) approach is demonstrated for CVD prediction. Image processing techniques extract anomalies from the fundus images, and these anomalies are used to train the proposed model. The model was trained and tested using 89 images from the DIARETDB1 dataset. This paper describes blood vessel and aneurysm segmentation and feature extraction, after which a supervised KNN classification model predicts the occurrence of heart disease from retinal fundus images with high accuracy. The CVD prediction accuracy of the trained model is 96.6%. Keywords Exudates · Blood vessel segmentation · K-nearest neighbor (KNN) · Microaneurysm · Cardio vascular disease (CVD)

1 Introduction
CVD is a primary cause of disease and death around the world, and it plays a key role in rising healthcare costs. CVD is a group of diseases rather than a single disease, affecting the vascular, cardiac, and sensory systems. The group includes stroke, hypertension, coronary heart disease (CHD), angina, and diabetic retinopathy (DR). Several of these conditions, together with factors such as demographic features, previous CVD attacks, and blood pressure, can be assessed by detecting abnormalities in the retina of the eye.

M. S. Devi (corresponding author) · S. E. Juliet
V. V. College of Engineering, Tisaiyanvilai, Tamilnadu, India
S. E. Juliet
Vellore Institute of Technology, Vellore, Tamilnadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_65


M. S. Devi and S. E. Juliet

Abnormalities can be identified non-invasively by reviewing fundus photographs. The retina and optic nerve are the most important parts of the fundus image. Thinning of the blood vessels in the retina (the inner membrane of the eye) indicates DR. Other signs include hemorrhages (blood seeping from arteries and veins), microaneurysms (tiny bulges in the retinal blood vessels), and exudates. In a retinal hemorrhage, blood leaks into the retina, whereas in a vitreous hemorrhage, blood leaks into the vitreous fluid of the eye. Ventricular and atrial septal aneurysms cause myocardial infarction. Retinal microaneurysms range in size from 14 to 136 µm and take the form of saccular, fusiform, or focal bulges. Exudates are fluids made up of pus, white blood cells, serum, and dead cells that ooze out of damaged tissue. Exudates fall into two categories, hard and soft, and hard exudates form during the early stages of CVD. Image processing techniques can extract these anomalies from retinal fundus images. Intensity thresholding segments the blood vessels from the fundus image, clustering (a machine learning technique) separates the exudates, and Otsu thresholding segments the microaneurysms. Exudate is a liquid that leaks from blood vessels into nearby tissues, and it is made up of proteins, solids, and cells. Cuts, infections, and inflammation can all cause exudate to ooze. Microaneurysms are saccular outpouchings of capillary walls that can leak fluid, causing intraretinal edema and hemorrhages. A retinal hemorrhage occurs when these small blood vessels in the retina are damaged by an injury or disease and bleed. Retinal hemorrhage can be caused by diabetes, high blood pressure, head injuries, or even changes in air pressure.
Fig. 1 illustrates these features of the retina in a fundus image.
Fig. 1 Features of the retina in the fundus image

Prediction of Cardio Vascular Disease from Retinal Fundus Images …


2 Literature Survey
In addition to being a major factor in the world's rising healthcare costs, CVD is a primary cause of disability and mortality [1]. CVD is the leading cause of death among diabetics, and hypertension exacerbates the problem [2]. Krishna Sree and Sudhakar Rao focused on causes of blindness in the industrial age, such as advanced macular degeneration, diabetic retinopathy, and glaucoma. Their method finds and reports infected pixels so that they can be dealt with [3]. The primary goal of [4] is to create a fundamental framework for diagnosing conditions such as high blood pressure, diabetes, and cardiovascular attacks. The review in [5] investigates and reports the retinal links associated with these life-threatening conditions, using data from a variety of clinical studies. In [6], the retinal vascular function of asymptomatic participants is tested, and the participants are divided into risk groups according to their Framingham Risk Score. The method in [7] achieves efficient retinal vessel segmentation by combining correlation filters with measures derived from the eigenvalues of the Hessian matrix. In another study, the exudate regions were captured in each visual field using the scale-invariant feature transform (SIFT) and speeded up robust features (SURF). The main objective of that study is to offer a conceptual foundation for recreational activities. Image processing is essential for identifying medical conditions and predicting significant issues, including diabetes, cardiovascular difficulties, and heart conditions [8]. Blood vessel segmentation using the FCM technique, the Gabor transform function, and ant colony optimization is suggested in [9]. The technique in [10] was tested on fifteen high-resolution retinal images, and the findings showed that it outperformed the other methods compared. The condition of the retinal vessels is an important factor in determining CVD, and the procedures used to measure them may reveal the existence of disease [11]. The study in [12] employs a collection of quantitative methods to examine the outcomes of fusing multimodality images. The primary goal of the DWT-IFS method is to combine the characteristics of intuitionistic fuzzy sets (IFSs) and the discrete wavelet transform (DWT) [13].

3 Proposed Work
The overall flow of the proposed system is shown in Fig. 2.

Fig. 2 Flow diagram for proposed system (training phase: input image → green channel → preprocessing (median filtering, CLAHE) → segmentation (intensity thresholding, Otsu thresholding) → GLCM feature extraction → database → model training; testing phase: test image → green channel → preprocessing (median filtering, CLAHE) → segmentation (intensity thresholding, Otsu thresholding) → GLCM feature extraction → KNN classifier → if yes, CVD risk exists; if no, CVD risk does not exist)

4 Methodology
We propose to predict CVD using the K-nearest neighbor (KNN) algorithm. Fig. 2 shows the flow diagram of the proposed system. Machine learning approaches can deliver significant predictions for CVD-related ailments such as stroke, hypertension, coronary heart disease (CHD), angina, and diabetic retinopathy (DR). Various classification models can be used to forecast how patients with a certain ailment will fare in the future. Our goal in this work is to construct a cardiovascular disease (CVD) prediction system that helps address the current burden of the disease. Such prediction algorithms can be extremely useful in guiding early actions to manage diseases properly. In the proposed model, the data is divided into training and testing sets, and the KNN algorithm is trained on the training set. Precision, recall, F1 score, and support are used to evaluate the model on the test set, and the same approach is used to measure the accuracy of CVD prediction.
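The KNN step can be sketched in a few lines of NumPy. Several details are assumptions for illustration: the feature vectors (for example, GLCM texture features), the label encoding (0 = no CVD risk, 1 = CVD risk), k = 3, and the random split. The paper does not state its value of k or its split procedure.

```python
import numpy as np


def train_test_split(X, y, train_fraction: float, seed: int = 0):
    """Random split; e.g. train_fraction=0.6 keeps 60 % of the samples for training."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    perm = np.random.default_rng(seed).permutation(len(X))
    n_train = int(round(train_fraction * len(X)))
    tr, te = perm[:n_train], perm[n_train:]
    return X[tr], X[te], y[tr], y[te]


def knn_predict(X_train, y_train, X_test, k: int = 3) -> np.ndarray:
    """Majority vote among the k nearest training samples (Euclidean distance).

    Labels must be non-negative integers, e.g. 0 = no CVD risk, 1 = CVD risk.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # Pairwise distances, shape (n_test, n_train).
    dist = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    # bincount + argmax: ties go to the smaller label.
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(X, y, [[0.5, 0.5], [10.5, 10.5]], k=3)
```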

4.1 Image Acquisition
This module acquires the digital images. Human retinal images play a critical part in diagnosing diseases and in assessing their severity. The fundus of the eye is the interior surface opposite the lens. It includes the retina, optic disc, macula, fovea, and posterior pole, and it is examined using fundus photography. The images capture the vasculature of the retina, which is a layered structure with several layers of neurons interconnected by synapses. Retinal vessels show deviations from normal early in disease. Vascular changes such as arteriolar narrowing associated with high blood pressure, commonly expressed through the ratio of artery to vein diameter, have been documented. The dataset consists of a set of images used to train and test the proposed system. These images pose two difficulties. First, normal and abnormal images must be distinguished: a normal image contains vessels, the optic disc region, and the background, whereas an abnormal image contains numerous artifacts of distinct shapes and colors caused by pathology. Second, for the intended clinical use, the system must cope with the irregular vessel appearances that are frequently observed.

4.2 Preprocessing
Preprocessing improves the image so that the subsequent steps are more likely to succeed. Image sharpening improves the visual quality of eye images by enhancing their high-frequency components. Image sharpening refers to any enhancement method that emphasizes the edges and fine details of an image. In this approach, the original image is first filtered by a high-pass filter, which extracts its high-frequency components. A scaled version of the high-pass output is then added to the original image to produce the sharpened image.

Fig. 3 a Retinal image, b Grayscale image, and c Green-channel image
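The sharpening rule described above (the original image plus a scaled high-pass output) can be sketched with NumPy. Here the high-pass output is taken as the image minus a 3 × 3 mean-filtered copy, which is one common choice and an assumption of this sketch. The parameter `amount` is the scale factor.

```python
import numpy as np


def mean3(img: np.ndarray) -> np.ndarray:
    """3 x 3 mean filter (a simple low-pass filter) with edge replication."""
    p = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def sharpen(img: np.ndarray, amount: float = 1.0) -> np.ndarray:
    """Sharpened = original + amount * high-pass, where high-pass = original - low-pass."""
    original = img.astype(float)
    high_pass = original - mean3(original)
    return np.clip(original + amount * high_pass, 0, 255).astype(np.uint8)


flat = np.full((5, 5), 100, dtype=np.uint8)  # no detail: output should be unchanged
edge = np.zeros((3, 6), dtype=np.uint8)
edge[:, 3:] = 200                            # step edge: output should overshoot
```

A flat region has no high-frequency content and passes through unchanged. At a step edge, the bright side is pushed brighter and the dark side darker, which is the visual effect of sharpening.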

4.2.1 Channel Extraction

This step separates the green channel of the fundus image. After the green channel has been isolated from the RGB image, contrast limited adaptive histogram equalization (CLAHE) is used to boost the contrast at the tile level. The green channel (Fig. 3) is extracted after the fundus images are scaled uniformly and uneven illumination is removed. Contrast normalization is generally applied to the green channel because it is less susceptible to noise. In addition to CLAHE, histogram equalization is applied to equalize the intensity levels of the resulting grayscale image and boost the contrast further (Fig. 4). As a result, the images are evenly illuminated, and their contrast is increased by the end of preprocessing.
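A minimal NumPy sketch of the first two operations, green-channel extraction and global histogram equalization, follows. Index 1 selects green in both RGB and BGR channel order. The function names are hypothetical.

```python
import numpy as np


def green_channel(image: np.ndarray) -> np.ndarray:
    """Green plane of an (H, W, 3) image; index 1 is green in both RGB and BGR order."""
    return image[..., 1]


def equalize_hist(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization of an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]           # CDF value of the darkest occupied level
    if cdf[-1] == cdf_min:              # constant image: nothing to spread out
        return gray.copy()
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    return lut.clip(0, 255).astype(np.uint8)[gray]


rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 1] = [[10, 20], [30, 40]]
equalized = equalize_hist(green_channel(rgb))  # spreads 10..40 over 0..255
```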

4.2.2 Median Filtering

Median filtering is a nonlinear method for reducing impulsive noise, often known as salt-and-pepper noise. The median filter reduces noise while preserving edge features. Smoothing techniques such as Gaussian blur can also reduce noise, but they do not preserve edges. Because it preserves edges, the median filter is widely used in digital image processing.
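The edge-preserving behavior of the median filter can be checked with a small NumPy sketch of a 3 × 3 median filter. Handling border pixels by edge replication is an assumption of this sketch.

```python
import numpy as np


def median3(img: np.ndarray) -> np.ndarray:
    """3 x 3 median filter; border pixels use edge replication."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    windows = np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])
    return np.median(windows, axis=0).astype(img.dtype)


noisy = np.full((5, 5), 100, dtype=np.uint8)
noisy[2, 2] = 255  # "salt" pixel
noisy[1, 3] = 0    # "pepper" pixel
step = np.zeros((5, 6), dtype=np.uint8)
step[:, 3:] = 200  # sharp vertical edge
```

Both isolated outliers are removed, while the step edge passes through unchanged. A mean or Gaussian filter would instead smear the edge across neighboring columns.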

4.2.3 CLAHE

Contrast limited adaptive histogram equalization (CLAHE) is a variant of adaptive histogram equalization (AHE) that prevents the contrast from being over-amplified. Instead of working on the entire image, CLAHE operates on tiles, which are small regions of the image. Bilinear interpolation is used to remove the artificial boundaries between neighboring tiles. This technique enhances image contrast. CLAHE can also be used on color images, where it is usually applied to the luminance channel. Adjusting only the luminance (value) channel of an HSV image gives markedly better results than adjusting all the channels of a BGR image. Here, the green channel is extracted, histogram equalization is applied, and finally CLAHE is applied to enhance the retinal fundus images used for segmentation (Fig. 4).

Fig. 4 a Green channel, b after histogram equalization, and c after CLAHE
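What distinguishes CLAHE from plain AHE is that each tile's histogram is clipped before equalization. The sketch below computes the contrast-limited mapping for a single tile only. A full implementation also computes one mapping per tile and blends neighboring mappings bilinearly, as described above, and OpenCV provides such an implementation through `cv2.createCLAHE(clipLimit=..., tileGridSize=...)`. Here `clip_limit` is an absolute bin count, an assumption of this sketch.

```python
import numpy as np


def clipped_tile_mapping(tile: np.ndarray, clip_limit: float) -> np.ndarray:
    """Contrast-limited equalization lookup table for one 8-bit tile.

    Histogram bins above `clip_limit` (an absolute count) are clipped, and the
    clipped excess is spread uniformly over all 256 bins before the CDF is built.
    This caps the slope of the mapping and hence the contrast amplification.
    """
    hist = np.bincount(tile.ravel(), minlength=256).astype(float)
    excess = np.maximum(hist - clip_limit, 0.0).sum()
    hist = np.minimum(hist, clip_limit) + excess / 256.0
    cdf = hist.cumsum()
    return np.round(cdf / cdf[-1] * 255).astype(np.uint8)


tile = np.array([100] * 60 + [200] * 4, dtype=np.uint8).reshape(8, 8)
lut_plain = clipped_tile_mapping(tile, clip_limit=64)  # no clipping: plain equalization
lut_clahe = clipped_tile_mapping(tile, clip_limit=8)   # contrast-limited
```

Without clipping, the dominant gray level 100 is stretched to 239. With the clip limit, it maps to a much more moderate 114.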

4.3 Segmentation
Image segmentation is a subfield of digital image processing that divides an image into segments based on their attributes and characteristics. After the image has been segmented, the relevant segments can be used for further processing. An image is composed of pixels, and segmentation groups pixels with similar properties. As a result, a pixel-wise mask is constructed for each object in the image.

4.3.1 OTSU Thresholding

Aneurysms are segmented from the original RGB image using the Otsu thresholding method (Fig. 5). Otsu's method is an image segmentation algorithm that adaptively selects a single global binarization threshold. The threshold is chosen to maximize the between-class variance of the background and the target. The intensity threshold can also be set manually instead of automatically. In the resulting binary image, pixels with a bit value of 0 are black, and pixels with a bit value of 1 are white.

Fig. 5 a Original RGB image, b OTSU thresholded aneurysm segmented image

Fig. 6 a Original RGB image, b segmented image of blood vessels using intensity thresholding
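Otsu's threshold selection can be sketched in NumPy as follows. For every candidate threshold t, the sketch evaluates the between-class variance σ_B²(t) = (μ_T ω(t) − μ(t))² / (ω(t)(1 − ω(t))) and keeps the maximizer. The function name and the synthetic two-level test image are illustrative.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's method for an 8-bit image: the threshold t maximizing the between-class
    variance. Pixels with gray > t form the foreground (white, bit value 1)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                  # gray-level probabilities
    omega = np.cumsum(p)                   # probability of class 0 (levels <= t)
    mu = np.cumsum(p * np.arange(256))     # cumulative mean up to level t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b2))


img = np.array([[50] * 8, [200] * 8] * 4, dtype=np.uint8)  # two equal-sized classes
t = otsu_threshold(img)
mask = (img > t).astype(np.uint8)  # 1 = white (foreground), 0 = black
```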

4.3.2 Intensity Thresholding

Blood vessels are segmented from the original RGB image using intensity thresholding (Fig. 6). In this method, the intensity threshold is the only factor in the segmentation. Each pixel is compared with the threshold in a single pass: a pixel becomes white if its intensity is greater than the threshold and black otherwise. Segmentation isolates one or more regions of interest from the background of the image. The result is called a binary image because each pixel has only two values: 1 for the foreground and 0 for the background. The binary image has the value 1 at every position (x, y) where the input pixel is a foreground (feature) pixel, so it can be used as a mask.
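The single-pass rule and the use of its result as a mask can be sketched as follows. The threshold value here is arbitrary. Vessels usually appear darker than the background in the green channel, so practical pipelines often invert the channel before applying this rule.

```python
import numpy as np


def intensity_threshold(gray: np.ndarray, threshold: int) -> np.ndarray:
    """Binary image: 1 (white, foreground) where intensity > threshold, else 0 (black)."""
    return (gray > threshold).astype(np.uint8)


gray = np.array([[10, 200], [130, 90]], dtype=np.uint8)
mask = intensity_threshold(gray, threshold=120)  # hypothetical threshold value
foreground_only = gray * mask                    # background pixels set to 0
```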

4.4 Extraction of Texture Features

4.4.1 GLCM

A co-occurrence matrix gives the probability that two pixel values appear together in an image at a given distance (offset). For gray levels, this matrix is called the gray-level co-occurrence matrix (GLCM), and the entries of each normalized GLCM sum to one. The texture properties to be computed from the GLCM can be requested individually, as a list, or all at once. The GLCM records how often pixels with given gray levels occur in a specific spatial relationship to each other. The GLCM, also known as the gray-level spatial dependence matrix, is a statistical method for analyzing texture that takes the spatial correlation of pixels into account. The GLCM functions determine how frequently pairs of pixels with specific values occur in a specific spatial relationship in an image.

Table 1 Recall, precision, F1 score, support, and accuracy values of KNN-based CVD risk prediction

| S. No. | Training data | Precision rate (%) | Recall rate (%) | F1 score | Support | Accuracy (%) |
|---|---|---|---|---|---|---|
| 1 | 50 | 96 | 96 | 0.96 | 75 | 96 |
| 2 | 60 | 97 | 97 | 0.97 | 60 | 97 |
| 3 | 70 | 98 | 98 | 0.98 | 45 | 98 |
| 4 | 80 | 97 | 97 | 0.97 | 30 | 97 |
| 5 | 90 | 94 | 93 | 0.93 | 15 | 93 |
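The GLCM construction and a few common texture properties can be sketched in NumPy as follows. The offset, the quantization to 8 gray levels, and the choice of properties (contrast, homogeneity, and angular second moment) are assumptions. The paper does not list which GLCM properties it uses.

```python
import numpy as np


def glcm(gray: np.ndarray, dx: int = 1, dy: int = 0, levels: int = 8) -> np.ndarray:
    """Normalized GLCM for the offset (dx, dy), with dx, dy >= 0.

    The 8-bit input is first quantized to `levels` gray levels. Entry (i, j) is the
    probability that a pixel of level i has a neighbor of level j at the given offset.
    """
    q = (gray.astype(np.int64) * levels) // 256
    h, w = q.shape
    ref = q[0:h - dy, 0:w - dx]   # reference pixels
    nbr = q[dy:h, dx:w]           # neighbors at the offset
    counts = np.zeros((levels, levels))
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)
    return counts / counts.sum()  # entries sum to one


def glcm_features(p: np.ndarray) -> dict:
    """A few common texture properties of a normalized GLCM."""
    i, j = np.indices(p.shape)
    return {
        "contrast": float((p * (i - j) ** 2).sum()),
        "homogeneity": float((p / (1.0 + np.abs(i - j))).sum()),
        "asm": float((p ** 2).sum()),  # angular second moment (called energy in some tools)
    }


img = np.array([[0, 0, 255, 255]] * 4, dtype=np.uint8)
p = glcm(img)
features = glcm_features(p)
```

In the test image, each row contributes the horizontal pairs (0, 0), (0, 7), and (7, 7) after quantization, so each of those three entries has probability 1/3.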

4.4.2 Experimental Results

The proposed system was implemented in Python on a system with an Intel Pentium 2.10 GHz processor and 4 GB of RAM. The data were obtained from the DIARETDB1 database, which contains 89 color fundus photographs, 84 of which show at least mild symptoms and 5 of which are normal; the accuracy values are derived from these images.

Performance Measures

The decision tree values were calculated, as were the precision, recall, F1 score, support, and accuracy values of the KNN-based CVD risk prediction (Table 1). Fig. 7 shows a graphical representation of these values. From the experimental results, the setting with precision 97%, recall 97%, F1 score 97%, support 60, and accuracy 97% is considered the best among the training-data settings.
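The measures in Table 1 can be reproduced for any KNN run with a few lines of NumPy. This is a minimal sketch under stated assumptions: Euclidean distance, k = 3, binary labels with 1 meaning CVD risk, and "support" taken as the number of evaluated test samples (which matches how support falls as the training share grows in Table 1). The feature values and labels are hypothetical toy data, not the DIARETDB1 features.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dist = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]       # indices of the k closest samples
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])

def report(y_true, y_pred, positive=1):
    """Precision, recall, F1, support and accuracy for one positive class."""
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "support": int(len(y_true)),            # number of evaluated test samples
            "accuracy": float(np.mean(y_true == y_pred))}

# Hypothetical 2-D feature vectors (e.g. two GLCM properties); label 1 = CVD risk
X_train = np.array([[0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
                    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.12, 0.88], [0.82, 0.18], [0.45, 0.60]])
y_test = np.array([0, 1, 1])

y_pred = knn_predict(X_train, y_train, X_test, k=3)
print(y_pred.tolist(), report(y_test, y_pred))
```

The third test sample is misclassified here, so precision stays at 1.0 while recall drops to 0.5; repeating the split at 50% to 90% training data would give one row of Table 1 per run.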


M. S. Devi and S. E. Juliet

[Fig. 7 plot: training rate, precision rate, recall rate, F1 score, support and accuracy (y-axis 0 to 120) for training data of 50, 60, 70, 80 and 90]

Fig. 7 Graphical representation of recall, precision, F1 score, support, and accuracy values of KNN-based CVD risk prediction

5 Conclusion

To diagnose and forecast the chance of CVD onset, the retinal images in the publicly accessible DIARETDB1 dataset were subjected to a series of traditional preprocessing techniques, and grayscale images were generated. Several feature extraction methods were then applied to the preprocessed images. The intensity segmentation technique was used to isolate the blood vessels in the fundus photographs, and Otsu threshold segmentation was used to extract both small and large microaneurysms. By diagnosing a large volume of data with high accuracy in a short time, these methods could improve precision and relieve ophthalmologists' workload in practice. In the future, this research could be used to identify different stages of CVD from retinal fundus images.


Tampering Detection Driving License in RTO Using Blockchain Technology

P. Ponmathi Jeba Kiruba and P. Krishna Kumar

Abstract Manufacturers, dealers, buyers, registration authorities, and insurance companies are the entities involved in motor vehicle registration, and all of them must work together for a vehicle to be registered successfully. However, errors may occur as a result of incorrect or incomplete data entry, and other effects such as data tampering could pose a menace to the system. The Regional Transport Office (RTO) is in charge of the vehicle driving license process. Maintaining automotive procedures in the traditional way may obstruct the route to a more dependable system, and fraud is difficult to detect with the existing system. If fraud occurs, it is hard to trace back, because many entities, such as the insurance company, dealer, manufacturer, and RTO, function under their own norms. The SHA256 algorithm is used to hash the data in the blockchain. Hashing scrambles the raw data, so it is hard to recover the original data. By incorporating blockchain technology, we obtain a more promising solution that aids the creation of a more dependable, secure, and efficient system. The strategy addresses the problems underlying the vehicle driving license system and connects the entities using blockchain technology. Binding them under one gateway increases the system's reliability, because all of them are on the chain and every transaction record is available together with its timestamp.

Keywords Vehicle registration · BCT · Buyer · RTO · Dealer · Manufacturer · Blockchain

P. Ponmathi Jeba Kiruba (B) · P. Krishna Kumar
VV College of Engineering, Tisaiyanvillai, Tamil Nadu, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_66

1 Introduction

Blockchain is a network of computer nodes that share a distributed database; a blockchain is a type of digital database used to store data. In cryptocurrency systems like Bitcoin, blockchains play a key role in keeping a secure and decentralized record of transactions. The distinctive quality of the blockchain is that it establishes trust without the aid of a trusted third party while ensuring the integrity and security of data records. The data structure of a blockchain is distinct from that of a traditional database: a blockchain separates data into blocks, each of which contains a group of data. When a block is filled, it is closed and linked to the preceding block, forming the chain of data known as the blockchain (Fig. 1). To further secure the blockchain, this work uses the SHA256 algorithm, a hashing technique that stores the data in a scrambled form, so that it is tedious to trace back the original data. Its output is a hash value.

Fig. 1 Blockchain
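The block-and-link structure described above can be illustrated with a tiny hash-linked list. This is a minimal sketch, not the authors' implementation: the `Block` class, the JSON serialization and the all-zero predecessor hash for the first block are assumptions made for illustration; it uses SHA256 from Python's standard `hashlib`.

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass
class Block:
    index: int
    data: str
    prev_hash: str                          # hash of the preceding block
    block_hash: str = field(init=False)

    def __post_init__(self):
        self.block_hash = self.compute_hash()

    def compute_hash(self) -> str:
        # The block's contents and its predecessor's hash are hashed together,
        # so each block commits to the whole chain before it.
        payload = json.dumps([self.index, self.data, self.prev_hash])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(items):
    chain = [Block(0, "genesis", "0" * 64)]          # first block: no predecessor
    for item in items:
        chain.append(Block(len(chain), item, chain[-1].block_hash))
    return chain

chain = build_chain(["license record A", "license record B"])
for b in chain:
    print(b.index, b.prev_hash[:8], b.block_hash[:8])
```

Because every block stores its predecessor's hash, editing any earlier block changes its hash and no longer matches the value recorded in the next block.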

1.1 Objectives

• To build a blockchain-based infrastructure for vehicle driving licenses that ensures data availability.
• To develop blockchain technology for all forms of driver's licenses using smart contracts.
• To forecast the effect on vehicle driving license business procedures.

2 Literature Survey

Blockchain has been a hot topic during the past few years. It has gained traction because a block, once added, is never destroyed, which protects the system's history from being erased. Many blockchain applications have been implemented to solve such issues; cryptocurrency, e-commerce, and other applications are only a few examples. Gaurav Nagala [1] presented a dependable ledger framework for vehicle records. It records past vehicle information such as crash information, financial information, ownership transfers, repairs, and vehicle life. Traditional technology readily allows data to be tampered with, permitting unwanted and unintentional alterations. To solve these issues, the author recommended using blockchain technology to store individual vehicle data: the system uses the blockchain to establish a chain made up of distinct blocks. Each newly formed node is appended to the end of the chain, and the chain can be traversed. An interface unit was also created to allow users to observe the chain and add new nodes to it. The proposed distributed ledger platform is decentralized: only the blockchain's approved nodes can construct new nodes, and any change in the chain is reflected in the nodes connected to it. A contract is made when a vehicle is transferred from one owner to another. This contract includes the seller's electronic signature, the buyer's electronic signature, and the transaction terms, which determine whether conditions such as payment have been met. The contract is uploaded to the distributed ledger, resulting in the creation of a new vehicle block with the new owner's name. Michal and Steven Sprague [2] presented an example of how incorporating hardware security can improve the end user's and the blockchain's security and privacy. First, a block diagram of the internal structure of a computer node must be created, and the data flow must be maintained. The authentication device should be positioned at the front so that every user who wants to use the system must first be verified.
They argued that, to improve users' security and privacy on a blockchain, users must be authenticated as legitimate buyers or sellers on the platform; by putting this layer of protection above the blockchain transactions, strong security and privacy can be guaranteed for essential data. Cho Cho Htet and May Htet [3] set out to create a secure trading system. A more secure environment is needed to protect against attacks such as 51 percent attacks. The authors advised using a dependable e-commerce business model with a product rating system that determines a product's price depending on its quality. They created a platform where car owners, buyers, and other entities must register, and a public blockchain network was used to store their information. To learn about a car's history, a buyer can make a request to the owner, who responds with a block comprising the details of the vehicle. Everything is recorded in the blockchain: when a vehicle is sold, the information in the block for that vehicle is updated, and a new block is added to the chain. Each block was mined by a randomly selected miner, and the miner was picked based on the amount of Bitcoin produced. Because the blockchain is distributed among multiple nodes, it is more difficult for attackers to attack the system. Sanwar Hosen, Saurabh Singh, and Byungun Yoon [4] reviewed the blockchain concept and related variables, together with a comprehensive examination of potential security threats and known countermeasures. The study also examines how to improve blockchain security by laying out essential features that may be used to build various blockchain systems and security solutions that solve security problems. Finally, the article covers open challenges in blockchain-IoT systems as well as future research prospects.
Mary Subaja Christo, Vijayakumar Pandi, Azees Maria, Jeatha Deborah Lazarus, and Marimuthu Karuppiah [5] noted that the advent of blockchain technology has given VANETs the opportunity to address the aforementioned issues. In their article, vehicles are authenticated quickly by securely transferring authentication codes between successive RSUs, exploiting the decentralized nature of blockchain technology. In the security analysis, the proposed blockchain-based anonymous authentication technique is shown to be secure against a variety of destructive security attacks, ensuring that it delivers greater security. The authors also show that the scheme reduces cost significantly compared with traditional methods. Xianfu Chen, Celimuge Wu, Liming Gao, Tsutomu Yoshinaga, and Yusheng Ji [6] proposed a scheme that defines multiple blockchain channels, each designed for a specific vehicle density. The system chooses the optimal channel based on the number of vehicles and the transaction throughput and latency requirements of the application, and extensive simulations show that the proposed blockchain method outperforms previous baselines. Donpiti Chulerttiyawong and Abbas Jamalipour [7] presented a system for vehicle pseudonymous communications that makes use of the practical, adaptable, and mature public key infrastructure (PKI) technology, as well as the widely anticipated deployment of roadside units (RSUs). The proposed architecture, which uses the Hyperledger Fabric platform as the permissioned consortium blockchain and the Vehicles in Network Simulation (Veins) platform for integrated traffic and network simulation (SUMO as the traffic simulator and OMNeT++ as the network simulator), was successfully simulated at small scale. The simulation and performance results imply that the scheme is feasible for actual implementation and that it addresses drawbacks of existing works, for example by striking a better balance between connectivity and storage requirements.
Marimuthu Karuppiah, Hridya Venugopal, Uma Priyadarsini, Elizabeth Jesi, and Anbarasu [8] addressed data authentication and data confidentiality, the two key data security challenges discussed in their paper. The suggested system encrypts and decrypts data with a 512-bit key to assure data authentication, and the blockchain ledger ensures data secrecy by allowing only legitimate individuals to access the information. The encrypted data is saved on the edge device: medical reports are stored within the edge network using edge computing technology, which allows quick access to the data. An authenticated user can decrypt and process the data at maximum speed, and the revised data is saved on a cloud server once it has been processed. The proposed method ensures the safe storage and retrieval of medical information and reports. Ritesh Mohan Acharya, Shashank Singh, and Selvan [9] proposed a blockchain-based peer-to-peer transactive energy system that helps community microgrid participants reduce their electricity bills through demand response (DR) management. Two architectures are presented: one with a third-party agent, demonstrated in the MATLAB environment, and the other with a virtual agent (without a third party), implemented in a blockchain environment. The simulation findings show large financial benefits for each market participant, as well as increased community self-sufficiency and self-consumption and less dependency on the utility grid. Abdulhadi Shoufan, Noura Alnuaimi, Ruba Alkadi, and Chan Yeob Yeun [10] surveyed the state of the art in cross-blockchain techniques to highlight the most recent advances and presented a current overview of blockchain-based UAV network applications. Based on their findings, they proposed a variety of UAV network scenarios that could benefit from current solutions. Lastly, they identified outstanding difficulties and challenges in implementing a cross-blockchain technique for UAV systems, which may aid future study.

3 Methodology

• Driving License Registration
• Manage Driving License
• Blockchain Validation
• Blockchain Mine Performance
• Blockchain Recovery
• Blockchain Performance
• Description of Modules

A real-time application (RTA) is a program that operates within a time period that the user perceives as immediate or current; its latency must be under a certain threshold, commonly measured in seconds. Whether an application qualifies as an RTA is determined by the maximum length of time that a particular task, or combination of tasks, requires on a given hardware platform. The flow of the proposed RTO application is shown in Fig. 2.

Fig. 2 Flow diagram of the proposed system (driving license registration, manage driving license, blockchain validation, performance, blockchain recovery, blockchain mine performance)

Fig. 3 SHA256 algorithm

3.1 Driving License Registration

The user can easily apply for a new two-wheeler or four-wheeler license by filling out a new form with all relevant information and clicking submit. After the application has been completed successfully, the user is directed to the payment page. This module is used to enter driving license information such as name, registration date, photo, and so on. The given data are concatenated and hashed with the SHA256 algorithm to generate the hash key. Each record is stored in the database together with its hash key and the preceding record's hash key, and the current date and time are also included when the data is entered. The first record uploaded to the blockchain has no preceding record; this block is called the genesis block. After each record is inserted, its hash key is formed from the current data, and the previous hash key is carried forward from the previous record.
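The registration step above (concatenate the form fields, hash them with SHA256, store the record with its timestamp and the previous record's hash key) might look like the following sketch. The field list, the `|` separator, the all-zero placeholder for the genesis record's predecessor and the decision to include the timestamp and previous hash key in the hashed string are all assumptions; the sample form values are hypothetical.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical form fields; the text lists name, registration date, photo, location.
FIELDS = ("name", "registration_date", "photo_file", "location")
GENESIS_PREV = "0" * 64   # assumed placeholder for "no preceding record"

def add_license_record(ledger, form, timestamp=None):
    """Concatenate form fields, hash them with SHA-256 and append to the ledger."""
    ts = timestamp or datetime.now(timezone.utc).isoformat()
    prev_hash = ledger[-1]["hash_key"] if ledger else GENESIS_PREV
    # Fixed field order and separator, so the same form always yields the same string
    concatenated = "|".join([*(str(form[f]) for f in FIELDS), ts, prev_hash])
    record = {f: form[f] for f in FIELDS}
    record.update(timestamp=ts, prev_hash_key=prev_hash,
                  hash_key=hashlib.sha256(concatenated.encode("utf-8")).hexdigest())
    ledger.append(record)
    return record

ledger = []
r1 = add_license_record(ledger, {"name": "A. Kumar", "registration_date": "2022-05-10",
                                 "photo_file": "a_kumar.jpg", "location": "Chennai"},
                        timestamp="2022-05-10T09:00:00+00:00")
r2 = add_license_record(ledger, {"name": "B. Devi", "registration_date": "2022-05-11",
                                 "photo_file": "b_devi.jpg", "location": "Madurai"},
                        timestamp="2022-05-11T10:30:00+00:00")
print(r1["hash_key"][:16], r2["prev_hash_key"][:16])   # the same prefix twice
```

Passing the timestamp explicitly keeps the example reproducible; in normal use it defaults to the current UTC time.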

3.2 SHA256 Algorithm

The SHA256 hash function is a cryptographic hash function that produces a 256-bit result. A cryptographic hash transforms the hashed data so that it becomes fully unreadable. The Bitcoin protocol and its mining method both use SHA256 (Secure Hash Algorithm 256), a cryptographic hash function that yields a 256-bit value. It is used in creating and maintaining addresses as well as in verifying transactions: SHA256 is utilized not only for Bitcoin mining but also for generating Bitcoin addresses, because of the high level of protection it provides. A hash function serves three basic purposes:

• To deterministically scramble data.
• To take an input of arbitrary length and return a result of fixed length.
• To modify data irreversibly, so that the output cannot be used to infer the input.

Fig. 3 illustrates the hashing function in SHA256.
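The first two purposes listed above can be checked directly with Python's standard `hashlib`; the third (irreversibility) cannot be demonstrated by running code, only relied on. The license-style input strings below are hypothetical.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Hex digest of the UTF-8 encoding of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

d1 = sha256_hex("DL-0001|A. Kumar|2022-05-10")
d2 = sha256_hex("DL-0001|A. Kumar|2022-05-10")    # identical input
d3 = sha256_hex("DL-0001|A. Kumar|2022-05-11")    # one character changed
d_long = sha256_hex("x" * 100_000)                # much longer input

print(d1 == d2)                       # deterministic: True
print(len(d1) * 4, len(d_long) * 4)   # fixed length in bits: 256 256
print(d1 == d3)                       # small change, different digest: False
```

Each hex character encodes 4 bits, so 64 hex characters correspond to the 256-bit output regardless of input length.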

3.3 Padding Bits

Padding bits are appended to the message so that its length becomes exactly 64 bits short of a multiple of 512. The first padding bit is a one, and the remaining padding bits are zeros. A 64-bit field is then appended to make the total length a multiple of 512 bits. These 64 bits hold the length of the original plain text, computed on the message without the padding (modulo 2^64).
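The padding rule above can be written as a small function. This sketch works on whole bytes, so the single 1 bit followed by zero bits is appended as the byte 0x80; `sha256_pad` is a hypothetical helper name.

```python
def sha256_pad(message: bytes) -> bytes:
    """Pad a message as SHA-256 does before splitting it into 512-bit blocks."""
    bit_len = (len(message) * 8) % 2**64       # length of the unpadded message, in bits
    padded = message + b"\x80"                  # 0x80 = a single 1 bit, then seven 0 bits
    # Zero bytes until the length is 448 mod 512 bits, i.e. 64 bits short of a multiple of 512
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + bit_len.to_bytes(8, "big")  # 64-bit big-endian length field

for msg in (b"", b"abc", b"a" * 55, b"a" * 56):
    p = sha256_pad(msg)
    print(len(msg), "bytes ->", len(p) * 8, "bits")   # 512, 512, 512, 1024
```

A 56-byte message already leaves no room for the 0x80 byte plus the 8-byte length field in one block, which is why it needs a second 512-bit block.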

3.4 Compression Techniques

The padded message is broken up into 512-bit blocks. Each block goes through 64 rounds of processing, and the output from one block serves as the input for the next. In each round, a pre-initialized constant k[i] is used, and the round's input word is calculated from the current block depending on which round i is being processed.

3.5 Output

At the end of each block's processing, the block's result is used as the starting value for the subsequent block. The cycle repeats up to the final 512-bit block, and the output at that point is taken as the final hash digest. As the technique's name says, this digest is 256 bits long.
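Padding, 64-round compression of each 512-bit block, and chaining of each block's output into the next can be put together in a compact pure-Python SHA-256 for study. This is a teaching sketch, far slower than `hashlib` and not meant for production; the initial hash values and the round constants k[i] are computed rather than typed in (first 32 fraction bits of the square roots of the first 8 primes and of the cube roots of the first 64 primes, as the SHA-256 standard defines them), and the result is checked against `hashlib.sha256`.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def first_primes(n):
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 1
    return primes

def icbrt(n):
    """Largest integer r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]   # initial hash values
K = [icbrt(p << 96) & MASK for p in PRIMES]             # round constants k[0..63]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """Process one 512-bit block in 64 rounds and return the updated state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):                   # expand 16 words into 64 round inputs
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256(message: bytes) -> str:
    bit_len = (len(message) * 8) % 2**64
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    padded += bit_len.to_bytes(8, "big")
    state = list(H0)
    for off in range(0, len(padded), 64):     # each block's output feeds the next
        state = compress(state, padded[off:off + 64])
    return "".join(f"{x:08x}" for x in state) # 8 words x 32 bits = 256-bit digest

print(sha256(b"abc") == hashlib.sha256(b"abc").hexdigest())   # True
```

Python's `~e` is a negative integer, but `~e & g` still selects exactly the bits of `g` where `e` has a 0, because `g` is below 2^32.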

3.6 Manage Driving License

This module displays, in the form of a table, the details of the driving licenses that the administrator has already entered. The table provides the following information: photo, name, registration date, location, and so on (Fig. 4).

Fig. 4 Manage driving license

3.7 Blockchain Validation

A blockchain validator is responsible for confirming blockchain transactions; by hosting a full node on the Bitcoin blockchain, any participant can become a blockchain validator. The major motivation for running a full node is improved security. Unfortunately, because this is an intangible reward, it may not be sufficient to motivate someone to run a full node. A blockchain validator validates transactions by ensuring that they are legal. Consensus, on the other hand, identifies the order of events on the blockchain and reaches agreement on that order. A full node's main duty is to independently verify the integrity of the Bitcoin network as it stands. It accomplishes this by obtaining each block and transaction in turn and checking them against Bitcoin's consensus rules. When a block or transaction violates any of these rules, the full node rejects it right away.
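In the driving-license setting, the core validation step is checking each stored record against the chain of hashes. The sketch below is illustrative only: it does not reproduce Bitcoin's consensus rules, and the record layout, JSON serialization and all-zero genesis predecessor are assumptions. It walks the chain in turn, like a full node, and reports the first block whose link or stored hash does not check out.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64   # assumed placeholder for the genesis block's predecessor

def record_hash(data: dict, prev_hash: str) -> str:
    payload = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def make_chain(records):
    chain, prev = [], GENESIS_PREV
    for data in records:
        h = record_hash(data, prev)
        chain.append({"data": data, "prev_hash": prev, "hash": h})
        prev = h
    return chain

def first_invalid_block(chain):
    """Index of the first block whose link or stored hash fails, else None."""
    prev = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["prev_hash"] != prev:                                      # broken link
            return i
        if record_hash(block["data"], block["prev_hash"]) != block["hash"]: # edited contents
            return i
        prev = block["hash"]
    return None

chain = make_chain([{"dl_no": "DL-001", "name": "A"},
                    {"dl_no": "DL-002", "name": "B"},
                    {"dl_no": "DL-003", "name": "C"}])
print(first_invalid_block(chain))     # None: chain intact
chain[1]["data"]["name"] = "X"        # simulate tampering with a stored record
print(first_invalid_block(chain))     # 1: tampered block detected
```

If a tamperer also recomputes the hash of the edited block, the check fails one block later instead, because the next block still stores the old hash as its predecessor.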

3.8 Blockchain Mine Performance

Blockchain mining is the process of verifying each step of a transaction in Bitcoin. Each block holds a group of transaction data, and the links that connect each block to the next form the chain.
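Mining and its performance can be illustrated with a toy proof-of-work loop. This is a simplified stand-in, not Bitcoin's actual rule (Bitcoin applies double SHA-256 to a block header and compares it with a numeric target); here a nonce is searched until a single SHA-256 digest starts with a chosen number of hex zeros, and the hash rate is measured. The block string and difficulty are hypothetical.

```python
import hashlib
import time

def mine(block_data: str, prev_hash: str, difficulty: int, max_nonce: int = 10_000_000):
    """Find the smallest nonce whose SHA-256 digest starts with `difficulty` hex zeros."""
    target = "0" * difficulty
    start = time.perf_counter()
    for nonce in range(max_nonce):
        digest = hashlib.sha256(f"{prev_hash}{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(target):
            return nonce, digest, time.perf_counter() - start
    raise RuntimeError("no nonce found within max_nonce attempts")

nonce, digest, seconds = mine("DL-001|A|2022-05-10", "0" * 64, difficulty=3)
rate = (nonce + 1) / seconds if seconds > 0 else float("inf")
print(f"nonce={nonce} digest={digest[:16]}... {nonce + 1} hashes, {rate:,.0f} hashes/s")
```

Each extra hex zero multiplies the expected work by 16, which is why mining performance is usually reported as a hash rate.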

3.9 Blockchain Recovery

Every application on a hardware wallet (ledger device) uses the blockchain recovery phrase as a master key from which its private keys are calculated. This means that the calculation of the private keys always returns the same result as long as the recovery phrase is the same.
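The determinism described above can be shown with the derivation commonly used by wallets, assuming the BIP-39 and BIP-32 conventions (the text does not name a scheme): the phrase is stretched into a seed with PBKDF2-HMAC-SHA512, and the seed is turned into a master private key and chain code with HMAC-SHA512 keyed by "Bitcoin seed". Real wallets also validate the phrase checksum, check that the key lies in the valid range, and derive child keys along derivation paths; those steps are omitted. The phrase used is a well-known example and must never hold real funds.

```python
import hashlib
import hmac
import unicodedata

def _nfkd(text: str) -> bytes:
    return unicodedata.normalize("NFKD", text).encode("utf-8")

def seed_from_phrase(phrase: str, passphrase: str = "") -> bytes:
    """BIP-39 style seed: PBKDF2-HMAC-SHA512, 2048 iterations, salt 'mnemonic' + passphrase."""
    return hashlib.pbkdf2_hmac("sha512", _nfkd(phrase), _nfkd("mnemonic" + passphrase), 2048)

def master_key(seed: bytes):
    """BIP-32 style master private key and chain code (range check omitted)."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return digest[:32], digest[32:]

# Example phrase only; never reuse a published phrase for real funds.
phrase = " ".join(["abandon"] * 11 + ["about"])
k1, c1 = master_key(seed_from_phrase(phrase))
k2, c2 = master_key(seed_from_phrase(phrase))          # same phrase again
k3, _ = master_key(seed_from_phrase(phrase, "extra passphrase"))
print(k1 == k2, c1 == c2, k1 == k3, len(seed_from_phrase(phrase)))   # True True False 64
```

Because no randomness enters after the phrase is chosen, restoring the same phrase on a new device recomputes exactly the same keys.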

3.9.1 Blockchain Performance

Transactions per second (TPS) is the number of transactions that a blockchain network is capable of processing each second. The Bitcoin blockchain's average TPS is around 5, though this can fluctuate; Ethereum, on the other hand, has a capacity of about twice that. Users and administrators of the blockchain should be aware of the following measures: we propose TPS, TPC, ARD, TPDIO, and TPMS as overall performance measures for the blockchain, which are detailed, with formulas, in the result analysis chapter (Fig. 5).

Fig. 5 Blockchain performance
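Two of the proposed measures can be computed from a simple transaction log. This is a sketch under assumptions: TPS is taken as committed transactions divided by the observation window, and ARD is assumed to denote average response delay (mean time from submission to commit); the paper's own formulas are not given in this section. TPC, TPDIO and TPMS need CPU, disk and memory counters and are not shown. The timestamps are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TxRecord:
    submitted: float   # time the client sent the transaction (seconds)
    committed: float   # time the transaction was confirmed on the chain (seconds)

def tps(records):
    """Committed transactions per second over the observed window."""
    window = max(r.committed for r in records) - min(r.submitted for r in records)
    return len(records) / window

def ard(records):
    """Average response delay: mean of (commit time - submit time)."""
    return sum(r.committed - r.submitted for r in records) / len(records)

log = [TxRecord(0.0, 1.5), TxRecord(0.5, 2.0), TxRecord(1.0, 2.5), TxRecord(1.5, 4.0)]
print(tps(log), ard(log))   # 1.0 1.75
```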

4 Proposed System

The business procedures described in the previous sections can be adjusted to take advantage of the features of a blockchain-based automobile registration system. Such a system should operate on the premise that updates to vehicle data are correct unless later proven incorrect. As a result, we propose a blockchain-based vehicle registration system in which requests to modify vehicle information are registered in the system as soon as the request is submitted by the relevant participant and its payment has been confirmed. This method simplifies corporate procedures by utilizing blockchain technology and its immutability, making the car registration system safer and more dependable. The approach aims to address issues such as the creation of fraudulent documents and the manipulation of vehicle data in the system by authorized individuals for personal gain. Because all the agencies are brought together under one platform connected by blockchain technology, any modification made by a single authorized individual is reflected throughout the system, making it simple to figure out who did what and when.

5 Conclusion and Future Work

With the features provided by BCT, such as privacy, security, traceability, and provenance, blockchain technology and its implementation make the system more efficient. As a result, it is useful as a foundation for making the current system more reliable. Vehicles are commonly used assets with a high associated value all over the world, and adopting technology such as blockchain to boost the transparency of vehicle ownership makes the entire process more trustworthy. Furthermore, it frees manufacturers, customers, authorities, and all other parties involved from time-consuming, manual, and repetitive activities. The procedures identified for a vehicle driving license system, as well as the system's participants, were used to create a data model. The set of available actions for this RTO registration system was then specified in terms of their effects on the registry. We provided a two-step transaction flow, such as the change-of-ownership transaction flow, for vehicle registry procedures, allowing us to use a blockchain-based automobile registration system. As a result of these procedures, employees at the registry were able to reduce their intervention in the transaction flow. Finally, we ran a series of performance tests on the Hyperledger Fabric system using a simple configuration and analyzed the results, examining throughput and latency across a range of system settings by adjusting the size.

References

1. Nagala G (2015) Further applications of the blockchain
2. Michal, Sprague S (2016) Hawk: the blockchain model of cryptography and privacy-preserving smart contracts. In: Proceedings of the IEEE symposium on security and privacy (SP), San Jose, CA, USA, pp 839–858
3. Htet CC, Mandalay (2019) A secure used car trading system based on blockchain technology. In: The 21st international conference on information integration and web based applications and services
4. Singh S, Hosen ASMS, Yoon B (2021) Blockchain security attacks, challenges, and solutions for the future distributed IoT network. IEEE Access 9
5. Azees M, Vijayakumar P, Lazarus JD, Karuppiah M, Christo MS (2021) BBAAS: blockchain-based anonymous authentication scheme for providing secure communication in VANETs. Vol 2021(18), Article ID 6679882
6. Gao L, Wu C, Yoshinaga T, Chen X, Ji Y (2021) Multi-channel blockchain scheme for internet of vehicles. IEEE Open Journal of the Computer Society
7. Chulerttiyawong D, Jamalipour A (2021) A blockchain assisted vehicular pseudonym issuance and management system for conditional privacy enhancement. IEEE Access 9
8. Elizabeth Jesi V, Priyadarsini U, Anbarasu V, Venugopal H, Karuppiah M (2021) Ensuring improved security in medical data using ECC and blockchain technology with edge devices. Article ID 6966206
9. Singh S, Acharya RM, Selvan MP (2022) Blockchain-based peer-to-peer transactive energy system for community microgrid with demand response management. CSEE J Power Energy Syst 8(1)
10. Alkadi R, Alnuaimi N, Yeun CY, Shoufan A (2022) Blockchain interoperability in unmanned aerial vehicles networks: state-of-the-art and open issues. IEEE Access 10

Content-based Image Retrieval in Cyber-Physical System Modeling of Robots

P. Anantha Prabha, B. Subashree, and M. Deva Priya

Abstract A Cyber-Physical System (CPS) is used in industries and automated plants because it has modern administration abilities and can perform real-time processing in a distributed architecture. In mechanical plants, CPSs are expected to perform image recognition and retrieval, and other Machine Learning (ML) processes, for efficient functioning and screening of assembly units and products. Existing Image Retrieval (IR) strategies, such as remote sensing retrieval and the Gaussian mixture model, are not suitable for huge datasets. These drawbacks can be eliminated, based on features of the data, by using the Improvised Density Scale-Invariant Feature Transform (ID-SIFT). This approach mainly focuses on image matching based on geometrical features such as color, texture, size, and shape, offering an improved accuracy of 89%, and it is relatively efficient compared with existing IR algorithms.

Keywords CPS · Industrial robots · Content-Based Image Retrieval (CBIR) · SIFT · Image Retrieval (IR) · Image mining

P. Anantha Prabha · B. Subashree
Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, Tamil Nadu, India

M. Deva Priya (B)
Department of Computer Science and Engineering, Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_67

1 Introduction

A Cyber-Physical System (CPS) is a computer-based intelligent system that uses algorithms to operate or monitor a mechanism. Its physical and software components are closely linked, permitting them to operate on various spatial and temporal dimensions, display multiple and distinct behavioral modalities, and interact in context-dependent ways. CPS combines cybernetics, mechatronics, design, and process science in a multi-disciplinary way, and involves a more complex combination and coordination of physical and computational elements. Examples of CPS include the smart grid, autonomous automobile systems, medical monitoring, industrial control systems, robotics systems, and automatic pilot avionics. CPS has its origin in a variety of areas, such as aviation, automobiles, chemical processes, the energy sector, medical services, transportation, and manufacturing industries. The Internet, cloud platforms, and big data are some of the predominant technologies involved in rapid information generation and communication; they are evolving fast and have a significant impact on re-industrialization. The age of information and communication systems has brought about a profound and basic transformation in the production techniques, business models, and industrial systems of conventional manufacturing sectors. CPS in a smart robotic factory must possess certain abilities to handle the mechanisms in manufacturing industries. The elementary functional unit of a CPS is formed by several sensors and actuators and is responsible for executing the most rudimentary monitoring and control functions. For example, in Industry 4.0, the goal is to use CPS not only to promote information with intelligence within a factory, but also to use the IoT system as a basis for external expansion, connecting appropriate services outside the factory and finally contributing data with intelligence to the whole industrial chain. Designing mobile robots requires transmitting data commands from numerous systems, such as local processing, communication, and platform computing, owing to the huge number of processing tasks, such as command distribution, image observation, and other activities, that are essential to guarantee real-time data [1].
Image processing in robots entails altering the nature of an image to enhance graphical information for human understanding or to make it more suitable for autonomous machine discernment [2–4]. Let a digital image be represented as F(x, y). The amplitude of ‘F’ at spatial coordinates (x, y) gives the image’s intensity or gray level at that point. The values of (x, y) and ‘F’ amplitude are finite discrete measures. The digital images are made up of limited number of elements, each with its own position and value. Pixels represent the components of an image [5, 6]. An image is enhanced by sharpening the edges, removing the damages of optical distortion and identifying periodic interference of the image which certainly enhances the image representation. Image segmentation is a part of image processing which involves subdividing the image into basic parts or segregating particular aspects by finding the shapes and objects in an image [7]. In an image mining system, several operations will be carried out in order to obtain the needed images. Many of these tasks rely on image processing and pattern recognition tools. The image mining model influences the sequence in which certain of these processes are performed [8]. Image mining consists of three important steps, namely image feature extraction, identification of object, and creation of record and auxiliary images. In these steps, the image is segmented into regions that can be

Content-based Image Retrieval in Cyber-Physical System Modeling …

identified by descriptors, and the identified objects are compared with every object in the other images. Each object is marked and labeled with an identifier (ID). Image data mining techniques are then applied, in which association rules are derived from the collected objects. During feature extraction, the image characteristics are taken into consideration: features such as color, shape and size are used as parameters for retrieving images in the Content-Based Image Retrieval (CBIR) function of CPS robots. Multi-dimensional association rule mining and feature translation are then carried out; in multi-dimensional association rule mining, the frequent characteristics of every multi-dimensional pattern are mined. During interpretation, a more complex semantic understanding of the image is built up from low-level image features. The rules also define the recognition of a major semantic component at a higher level as a distance between features [9]. The Structural Similarity Index Measure (SSIM), used to detect deficiencies in image structure, treats such deficiencies as structural signal errors, and it tolerates visual distortions caused by variations in lighting. Several applications require evaluating the similarity between a reference image and geometrically varied reproductions of it produced by translation, rotation, scaling, flipping and other distortions. Such comparisons must also cope with visual inconsistencies such as background clutter, different viewpoints and different orientations. High-level techniques such as the SSIM index and Visual Information Fidelity (VIF) can withstand certain numerical variations. D-SIFT transforms image data into scale-invariant coordinates relative to local features. The technique is notable for producing large numbers of features that densely cover the image over a wide range of scales and locations.
The first stage of computation searches over all scales and image locations for potential interest points that are invariant to scale and orientation; this is implemented efficiently using a Difference-of-Gaussian (DoG) function. Keypoints are then selected based on measures of their stability. One or more orientations are assigned to each keypoint location based on local image gradient directions, and all subsequent operations are performed on image data that has been transformed relative to the assigned orientation, scale and location of each feature. Finally, the local image gradients are measured at the selected scale in the region around each keypoint and transformed into a descriptor that tolerates significant local shape distortion and changes in illumination [10]. Section 2 details the work of several researchers related to CBIR in CPS. Section 3 describes the proposed system, Section 4 presents the experimentation and results, and Section 5 concludes the paper.
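The first stage described above, scale-space extrema detection with a DoG function, can be illustrated with a minimal pure-Python sketch. It assumes a single-channel image stored as a list of rows with values in [0, 1] and a single octave (no down-sampling); the base scale 1.6, the factor √2 between levels, the number of levels and the contrast threshold are illustrative choices, not parameters given in this paper.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur; border pixels are replicated (clamped indices)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])
    rows = [[sum(kv * image[y][min(max(x + k - radius, 0), w - 1)] for k, kv in enumerate(kernel))
             for x in range(w)] for y in range(h)]
    return [[sum(kv * rows[min(max(y + k - radius, 0), h - 1)][x] for k, kv in enumerate(kernel))
             for x in range(w)] for y in range(h)]

def dog_extrema(image, sigma0=1.6, levels=4, k=math.sqrt(2), threshold=0.01):
    """Return (x, y, level) of pixels that are strict extrema of their 3x3x3 DoG neighbourhood."""
    blurred = [blur(image, sigma0 * k ** i) for i in range(levels + 1)]
    h, w = len(image), len(image[0])
    # Each DoG level is the difference of two adjacent Gaussian-blurred images.
    dogs = [[[blurred[i + 1][y][x] - blurred[i][y][x] for x in range(w)] for y in range(h)]
            for i in range(levels)]
    points = []
    for s in range(1, levels - 1):  # a DoG level is needed above and below
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dogs[s][y][x]
                if abs(v) < threshold:
                    continue  # ignore weak responses
                neighbours = [dogs[s + ds][y + dy][x + dx]
                              for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                              if (ds, dy, dx) != (0, 0, 0)]
                if v > max(neighbours) or v < min(neighbours):
                    points.append((x, y, s))
    return points
```

A bright Gaussian blob whose size matches one of the middle levels shows up as a DoG minimum at its centre; full SIFT additionally uses several octaves and sub-pixel refinement, which are omitted here.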

2 Related Work
Image retrieval (IR), face identification and the other image processing techniques used in existing CPS robots are classical approaches, while text-based search is the most common kind of search on the web. For text queries and retrieval, most search engines and other systems such as bots rely on keywords, and the results of keyword searches come mainly from blogs and other discussion forums. Owing to the lack of trust in blogs and other

P. Anantha Prabha et al.

sources, users are dissatisfied with outcomes of very low accuracy. Existing approaches mostly assess the relationship between a reference image and non-geometrically varied versions of it, such as decompressed versions or versions enhanced for contrast, brightness or with other filters. The visual information present in a test image is closely related to its image quality, and this information can be measured to quantify the similarity between the test image and the reference image. Mainali et al. [11] used Scale-Invariant Feature detector with Error Resilience (SIFER) features. SIFER improves scale-space management by using a higher-granularity image pyramid representation and better scale-tuned filtering with a Cosine Modulated Gaussian (CMG) filter, at the cost of increased computational load. The approach offers increased accuracy and robustness, but at the cost of a 2-fold increase in execution time compared with D-SIFT. Szegedy et al. [12] demonstrated that, despite their improved accuracy, current deep networks are vulnerable to adversarial attacks in the form of minor visual changes that are undetectable by the human visual system; such perturbations can drastically change a Neural Network’s (NN’s) prediction for an image. Alzu’bi et al. [13] studied semantic CBIR, in which a semantic gap exists between the low-level image features represented by digital machines and the rich high-level human perception used to interpret images. This gap arises from the difficulty of representing high-level semantic concepts with low-level visual features. Zhang et al. [14] developed a statistical modeling algorithm for automatic detection of object classes and image concepts via partial similarity matching. They used a 10-dimensional feature consisting of 3 color values (R, G and B), the variations of R, G and B within a 5 × 5 block, 2 brightness gradients and 2 positions (x, y).
Expectation–Maximization (EM) is used to build a finite mixture model after feature extraction, and an adaptive EM technique is introduced to address the difficulty of automatically determining the number of clusters. However, the EM algorithm has the drawback of converging to a local optimum. Ashraf et al. [15] introduced a CBIR model based on color and the Discrete Wavelet Transform (DWT). Low-level elements such as color, texture and shape are used to retrieve similar images, and these factors play a substantial role in retrieval. Different kinds of features and extraction procedures are discussed, along with the scenarios in which each feature extraction approach is beneficial. Color edge detection and DWT are used to build Eigen Vectors (EVs) from the image. This system serves as a model for the proposed system, to which some additional features are added. Mistry et al. [16] proposed CBIR using hybrid features and various distance metrics to develop an effective hybrid feature-based CBIR system. Color moments, the auto-correlogram and HSV histogram features are used in the spatial domain, whereas Stationary Wavelet Transform (SWT) moments and Gabor Wavelet Transform (GWT) features are employed in the frequency domain. Color and edge directivity descriptor features and binarized statistical image features are used to improve precision. Compared with this existing solution, the proposed system uses geometric characteristics for IR and is simpler and easier to use while producing high-quality results. Haut et al. [17] proposed a visual attention-driven approach for Hyper-Spectral Image (HSI) classification. It couples attention mechanisms to a residual network so that the distinctive spatial information contained in the data can be characterized more easily. The method computes a mask that is applied to the extracted features in order to identify the most relevant ones. Experiments on four widely used HSI data sets demonstrate that the proposed deep attention model outperforms other state-of-the-art methods in terms of accuracy. The significant disadvantage of this strategy is that, when large amounts of data are used, the performance of the attention-based classifiers is affected by data distortions. Traditional approaches and algorithms [18–21] produce less precise and accurate outcomes; these drawbacks are overcome in the proposed system.

3 Proposed System
CPS robots require a methodology for handling large datasets while performing CBIR in manufacturing industries. Implementing CBIR with the ID-SIFT method is therefore expected to be more effective at finding objects captured by robots. To store and index an image, CBIR employs the ID-SIFT method, which takes into account visual contents of the image such as color, shape, texture and spatial arrangement. Developing approaches for examining, interpreting, classifying and indexing image databases is a major focus of CBIR research, and IR systems are also evaluated to determine their performance. The flow diagram in Fig. 1 depicts the design of the proposed system. The method used to produce feature vectors and the similarity metric used to compare them have a significant impact on solution quality. The proposed technique combines the benefits of several algorithms to increase retrieval accuracy and performance. Color Coherence Vectors (CCVs) are used for successive refinement, which can improve the accuracy of color histogram-based matching. Using the approximate shape instead of the precise shape increases the speed of shape-based retrieval, and a combination of color- and shape-based retrieval is used to increase the accuracy of the result. With the Bit Pattern Feature (BPF), the bitmap image is converted into visual patterns that are saved for future use.
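The bitmap-based encoding referred to here (a bitmap per image block plus min and max quantizers, from which the BPF is derived) resembles block truncation coding. The sketch below is a simplified illustration under stated assumptions: it thresholds each 4 × 4 block at its mean (the paper does not give its thresholding rule), keeps the block minimum and maximum as the two quantizers, and builds a BPF as the histogram of the nearest pattern in a small hypothetical codebook. The function names are hypothetical.

```python
from collections import Counter

def encode_blocks(image, block=4):
    """Split a grayscale image into block x block tiles; return (bitmap, min, max) per tile.

    The bitmap marks pixels at or above the tile mean (an assumed plain mean
    threshold, not a dithered one)."""
    h, w = len(image), len(image[0])
    tiles = []
    for by in range(0, h - h % block, block):
        for bx in range(0, w - w % block, block):
            pixels = [image[y][x] for y in range(by, by + block) for x in range(bx, bx + block)]
            mean = sum(pixels) / len(pixels)
            bitmap = tuple(1 if p >= mean else 0 for p in pixels)
            tiles.append((bitmap, min(pixels), max(pixels)))
    return tiles

def bit_pattern_feature(tiles, codebook):
    """Normalized histogram of the nearest codebook pattern (Hamming distance) per tile bitmap."""
    counts = Counter()
    for bitmap, _, _ in tiles:
        nearest = min(range(len(codebook)),
                      key=lambda i: sum(a != b for a, b in zip(bitmap, codebook[i])))
        counts[nearest] += 1
    return [counts[i] / len(tiles) for i in range(len(codebook))]
```

In practice the codebook of bit patterns would be learned from training bitmaps; a fixed three-pattern codebook is used here only to keep the example small.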

3.1 Image Pre-processing and Feature Extraction
Feature vectors are first extracted from the images captured by the robots, which serve as input images and are stored in the image dataset. The feature vector of every image is saved in the database, while a query image is processed in the query module: its feature vector is extracted and compared with every vector stored in the dataset. Shape, texture, color and spatial information are among the most commonly used attributes. Searching image databases of ever-increasing size is in great demand.

Fig. 1 System design
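The query-module comparison described in Section 3.1 amounts to comparing one query vector with every stored vector and keeping the closest ones. A minimal sketch, assuming Euclidean distance (the metric Section 4 mentions for matching) and a hypothetical in-memory dictionary from image id to feature vector:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_database(query_vector, database, top_k=3):
    """Compare the query vector with every stored vector; return the top_k closest.

    database maps a (hypothetical) image id to its stored feature vector."""
    scored = [(image_id, euclidean(query_vector, vector)) for image_id, vector in database.items()]
    scored.sort(key=lambda item: item[1])  # smallest distance first
    return scored[:top_k]
```

This linear scan is the simplest design; for the ever-growing databases mentioned above, an index structure would normally replace it.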

3.2 ID-SIFT Feature Extraction for Reference and Test Images
ID-SIFT converts image data into scale-invariant coordinates relative to local features, producing a large number of features that densely cover the image at all scales and locations. Shape is an important visual characteristic and one of the most widely used ways to represent visual information. However, shape representation and description are challenging, because one dimension of object information is lost when a 3-D real-world object is projected onto a 2-D image plane. As a result, the shape extracted from the image only partially represents the projected object. Shape is also frequently distorted by noise, defects, arbitrary distortion and occlusion, which further complicates matters. The prevalent techniques each have strengths and weaknesses; a shape representation that is effective in computer graphics or mathematics may be ineffective for image recognition. Nevertheless, some traits are common to most shape description systems. Shape-based IR essentially involves comparing the similarity of images represented by their shape attributes. Shapes can be described using simple geometric properties; because such basic characteristics can only distinguish images with significant differences, they are frequently used as filters to eliminate false positives or combined with other shape descriptors to distinguish shapes.
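Simple geometric properties used as a filter can be sketched as follows. This is an illustration under assumptions: the image is split into equal-width brightness classes (a stand-in for the automated segmentation the paper does not specify), each class is summarized by mass, centroid and dispersion (the three properties named in Section 3.5), and a hypothetical mass-tolerance test rejects clearly different candidates before any costlier descriptor comparison.

```python
def shape_vector(image, classes=2, max_value=255):
    """Per brightness class, return (mass, centroid_x, centroid_y, dispersion).

    Mass is the pixel count, the centroid is the mean (x, y) position, and
    dispersion is the mean squared distance of the class pixels to the centroid."""
    members = [[] for _ in range(classes)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            members[min(value * classes // (max_value + 1), classes - 1)].append((x, y))
    vector = []
    for points in members:
        mass = len(points)
        if mass == 0:
            vector.append((0, 0.0, 0.0, 0.0))
            continue
        cx = sum(x for x, _ in points) / mass
        cy = sum(y for _, y in points) / mass
        dispersion = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / mass
        vector.append((mass, cx, cy, dispersion))
    return vector

def passes_filter(query, candidate, tolerance=0.2):
    """Cheap pre-filter: reject a candidate if any class mass differs by more than tolerance."""
    for (mass_q, *_), (mass_c, *_) in zip(query, candidate):
        if abs(mass_q - mass_c) > tolerance * max(mass_q, mass_c, 1):
            return False
    return True
```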

3.3 Image Analysis
Image analysis consists of two functions: scale-space extrema detection, which searches the image at all scales and locations, and a DoG function, which is used to find candidate interest points that are invariant to scale and orientation. A keypoint is detected by comparing a pixel with its neighbors, and a detailed fit to the nearby data is then performed to determine its location, scale and ratio of principal curvatures. Keypoint localization removes points that have low contrast or are poorly localized along edges.
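The keypoint localization step described above can be sketched with the contrast test and the principal-curvature (edge) test from Lowe's SIFT: a candidate is kept only if its DoG magnitude exceeds a threshold and the ratio tr(H)² / det(H) of its 2 × 2 Hessian is below (r + 1)² / r. The thresholds 0.03 and r = 10 follow Lowe's paper; they are assumptions here, and the sub-pixel interpolation of the full method is omitted.

```python
def is_stable_keypoint(dog, x, y, contrast_threshold=0.03, edge_ratio=10.0):
    """Keep a DoG extremum only if it has enough contrast and is not on an edge.

    dog is one DoG image (list of rows); (x, y) must not lie on the border."""
    value = dog[y][x]
    if abs(value) < contrast_threshold:
        return False  # low contrast
    # Second derivatives by finite differences (2x2 Hessian of the DoG image).
    dxx = dog[y][x + 1] + dog[y][x - 1] - 2 * value
    dyy = dog[y + 1][x] + dog[y - 1][x] - 2 * value
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1] - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # curvatures of opposite sign or zero: poorly localized
    return trace * trace / det < (edge_ratio + 1) ** 2 / edge_ratio
```

A point on a ridge has one large and one near-zero curvature, so its determinant collapses and it is rejected, which is exactly the "poorly localized along edges" case.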

3.4 Image Retrieval (IR)
The keypoints are converted into a representation that tolerates significant local shape deformation and changes in illumination. The descriptor representation technique evaluates the similarity of ID-SIFT feature descriptors by comparing the size, texture, shape and color of their associated images.
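The representation described above, which Section 4 details as Gaussian-weighted orientation histograms over 4 × 4 sub-regions, can be sketched as a SIFT-style 128-dimensional descriptor. Assumptions: a 16 × 16 grayscale patch already centred on the keypoint, 8 orientation bins, a Gaussian window with σ equal to half the patch width, and the usual normalize, clip at 0.2 and renormalize step; rotation to the keypoint orientation and trilinear interpolation are omitted for brevity.

```python
import math

def sift_like_descriptor(patch, bins=8, cells=4):
    """128-D SIFT-style descriptor from a square grayscale patch (16x16 by default)."""
    size = len(patch)
    cell = size // cells
    sigma = size / 2.0
    centre = (size - 1) / 2.0
    hist = [0.0] * (cells * cells * bins)
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            # Central-difference gradient at (x, y).
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            weight = math.exp(-((x - centre) ** 2 + (y - centre) ** 2) / (2 * sigma ** 2))
            b = int(angle / (2 * math.pi) * bins) % bins
            idx = ((y // cell) * cells + (x // cell)) * bins + b
            hist[idx] += weight * magnitude  # Gaussian-weighted magnitude vote
    # Normalize, clip large components (robustness to illumination), renormalize.
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    hist = [min(v / norm, 0.2) for v in hist]
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]
```

Because of the final normalization, adding a constant to the patch or scaling its contrast leaves the descriptor unchanged, which is the illumination tolerance the text refers to.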

3.5 Color and Shape Retrieval
In this step, a histogram-based comparison is performed and the matching images are short-listed. To fine-tune the results, the Color Coherence Vectors (CCVs) of the short-listed images are used. For each image, the numbers of coherent and non-coherent pixels are determined for every color intensity, and these counts form the image's coherence vector. The proposed shape retrieval method uses an automated segmentation approach to obtain approximate information about an object's shape. It begins by splitting the image into a number of classes based on brightness, and a shape vector is then computed with three properties for each class: mass, dispersion and centroid. The query image and database image vectors are compared for retrieval, and the most similar images are short-listed as results.
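The CCV refinement described above can be sketched as follows, assuming a single-channel image with intensities in [0, 256) quantized into a few bins, 8-connected regions, and a hypothetical coherence threshold tau: a pixel is coherent when its same-bin region has at least tau pixels. The L1 distance between two CCVs serves as the refinement score. The paper does not give its bin count or tau, so both are assumptions.

```python
from collections import deque

def color_coherence_vector(image, bins=4, tau=4, max_value=256):
    """Return [(coherent, incoherent), ...] pixel counts per quantized intensity bin."""
    h, w = len(image), len(image[0])
    q = [[image[y][x] * bins // max_value for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    ccv = [[0, 0] for _ in range(bins)]
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx]:
                continue
            colour = q[sy][sx]
            # Breadth-first flood fill of the 8-connected region with this bin.
            queue, size = deque([(sy, sx)]), 0
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and q[ny][nx] == colour:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ccv[colour][0 if size >= tau else 1] += size
    return [tuple(pair) for pair in ccv]

def ccv_distance(a, b):
    """L1 distance between two CCVs; lower means more similar."""
    return sum(abs(ca - cb) + abs(ia - ib) for (ca, ia), (cb, ib) in zip(a, b))
```

Two images with identical histograms can still differ in their CCVs when one colour appears as scattered pixels in one image and as a solid region in the other, which is why CCVs help refine a histogram short-list.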

3.6 Similarity Measure
Matching is performed on the basis of color. The number of colors in the query image and in each database image is obtained from their histograms, and the proportion of each hue in the two images is compared to determine whether they are equivalent. The image that satisfies most of the criteria is the best match. Because CBIR is not based on exact matching, the retrieval result is a list of images ranked by similarity to the query image.
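Comparing the proportion of each hue in two images and returning a ranked list can be sketched with histogram intersection, a standard measure chosen here as an assumption (the paper does not name its exact formula). Pixels are single-channel values in [0, 256) and the database is a hypothetical dictionary from image id to a flat pixel list.

```python
from collections import Counter

def colour_histogram(pixels, bins=8, max_value=256):
    """Normalized histogram: fraction of pixels falling in each quantized colour bin."""
    counts = Counter(p * bins // max_value for p in pixels)
    total = len(pixels)
    return [counts[b] / total for b in range(bins)]

def histogram_intersection(h1, h2):
    """Sum over bins of the smaller proportion; 1.0 means identical colour distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def rank_by_colour(query_pixels, database):
    """Rank database image ids by decreasing histogram intersection with the query."""
    hq = colour_histogram(query_pixels)
    scores = {name: histogram_intersection(hq, colour_histogram(px)) for name, px in database.items()}
    return sorted(scores, key=scores.get, reverse=True)
```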

4 Experimentation and Results
This section describes the implementation of the proposed system and briefly examines its performance against the expected output. First, the features of the reference image are extracted using ID-SIFT. For testing, a dataset of 1000 random images related to the industrial environment is used. ID-SIFT features are extracted from this dataset and saved in a separate feature dataset. A new image is matched by comparing each of its features independently with those of the images in the dataset and identifying matching features on the basis of the Euclidean distance between their feature vectors. Keypoints are extracted at all scales and locations, so that they can be found even when scale and position change. Keypoint localization eliminates low-contrast points. Based on the gradient directions in the surrounding image region, one or more orientations are assigned to each keypoint. The image data transformed relative to the orientation, scale and location of each feature are recorded as keypoints. In the encoding step, the proposed work encodes each image block into corresponding quantizers and a bitmap image, as depicted in Figs. 2 and 3. Two image features, the Color Coherence Feature (CCF) and the Bit Pattern Feature (BPF) (Fig. 4), are derived directly from the encoded data to index an image. The CCF and BPF of an image are obtained from the two quantizers (min and max) and from the bitmap, respectively; they are stored in the database and later used in image similarity assessment. The keypoints are refined with respect to their scales, and their orientations and descriptors are determined: the gradient magnitude and direction are computed for each pixel around the keypoint location, and a keypoint descriptor is then generated.
The descriptor samples are weighted by a Gaussian window and accumulated into orientation histograms summarizing the contents of 4 × 4 sub-regions, with the length of each vector corresponding to the sum of the gradient magnitudes. This helps in examining how image intensity changes with scale and other factors. The Bag-of-Words (BoW) approach quantizes the ID-SIFT feature descriptors into a collection of visual features, or words. By matching the associated visual features through histogram comparison, the BoW representation can be used to determine the similarity of the ID-SIFT feature descriptors of images. The similarity measure is calculated between the reference and test images to retrieve similar images from the database. Experimental results show that the proposed technique outperforms the prior strategies considered, and it also offers a simple and effective descriptor for indexing images in a CBIR framework. The expected result, retrieval of similar images from the database, is depicted in Fig. 5.

Fig. 2 Bitmap image

Fig. 3 Generate min–max quantizers

Fig. 4 CCF and BPF generation

Fig. 5 Image Retrieval (IR)

Image similarity assessment is then carried out. Compression, restoration, enhancement, copy detection, retrieval, recognition and classification are some of the image processing methods that rely on image similarity evaluation, whose main aim is to provide algorithms for automatic and objective similarity assessment that agree with subjective human evaluation. Peak Signal-to-Noise Ratio (PSNR), Natural Scene Statistics (NSS), Structural Similarity Index Measure (SSIM), Human Visual System (HVS) models, Mean Squared Error (MSE), SIFT and Visual Information Fidelity (VIF) are some of the measures used in image similarity assessment. HVS- and NSS-based measures are typically used to compare a reference image with non-geometrically varied versions such as decompressed or brightness/contrast-enhanced images. MSE and PSNR are used to compare image compression quality. MSE represents the cumulative squared error between the compressed and original images, whereas PSNR, computed in decibels, represents a measure of the peak error. The higher the PSNR, the better the quality of the compressed or reconstructed image, and the lower the MSE, the lower the error [22, 23]. As shown in Fig. 6, the matching stability of the D-SIFT descriptor is high (>85%) within a range of approximately 10% to 15%.
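The MSE and PSNR definitions above can be made concrete with a short sketch for 8-bit grayscale images stored as lists of rows. It assumes a peak value of 255 and uses PSNR = 10 · log10(peak² / MSE), the form given in the MathWorks psnr documentation cited as [23].

```python
import math

def mse(original, processed):
    """Mean squared error: cumulative squared pixel error divided by the pixel count."""
    diffs = [(a - b) ** 2 for row_a, row_b in zip(original, processed) for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def psnr(original, processed, peak=255.0):
    """PSNR in decibels, 10 * log10(peak^2 / MSE); infinite when the images are identical."""
    error = mse(original, processed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)
```

An error of one gray level at every pixel gives an MSE of 1 and a PSNR of about 48.13 dB; doubling the per-pixel error lowers the PSNR, consistent with the statement that higher PSNR means better quality.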
Given the high matching stability of the D-SIFT descriptor shown in Fig. 6, if the scales of two images are close enough, the scale coverage of the D-SIFT descriptor can be used to bridge the gap between them and obtain more accurate matching throughout the process. Figure 7 depicts the performance of the ID-SIFT algorithm, which encodes descriptors using BoW histograms and achieves an accuracy of roughly 89%. For BoW encoding, performance improves as the codebook size increases; the image blocks are encoded using the codebook. Image distortion also affects the performance of the system when performing robot vision tasks. Figures 8 and 9 depict the precision and recall under several distortions for the D-SIFT and ID-SIFT methods of CBIR, respectively.
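The BoW encoding discussed above can be sketched as nearest-codeword assignment followed by a normalized visual-word histogram. Assumptions: the codebook is given (in practice it is typically learned, for example with k-means over training descriptors), descriptors are plain lists of floats, and cosine similarity is used to compare histograms; the function names are hypothetical.

```python
import math

def nearest_word(descriptor, codebook):
    """Index of the codebook centre (visual word) closest in Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[i])))

def bow_histogram(descriptors, codebook):
    """L1-normalized visual-word histogram for one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def cosine_similarity(h1, h2):
    """Cosine of the angle between two histograms; 1.0 for identical word distributions."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm = math.sqrt(sum(a * a for a in h1)) * math.sqrt(sum(b * b for b in h2))
    return dot / norm if norm else 0.0
```

A larger codebook splits descriptor space more finely, which is consistent with the observation above that BoW performance improves as the codebook grows, at the cost of more distance computations per descriptor.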

Fig. 6 Image matching graph

Fig. 7 Accuracy

Fig. 8 Precision-recall in D-SIFT

Fig. 9 Precision-Recall in ID-SIFT

5 Conclusion
This paper proposes a CBIR system for CPS robotic manufacturing plants in which the CPS robots perform image correspondence and retrieval based on ID-SIFT features. For a robot moving in an industrial environment, such image descriptors can be used for tasks such as mapping the environment from image data acquired as the robot moves, localizing the robot relative to a set of known references, or recognizing objects and establishing geometric relations to them. To make the ID-SIFT features more compact, BoW quantizes the ID-SIFT descriptors into visual words by vector quantization against a pre-defined visual vocabulary or vocabulary tree. By weighting the feature vectors, the CBIR system is built using fused color and texture features. This can contribute to progress in manufacturing and production-related industries in economically growing countries. In future, a cloud-based platform for big data-driven CPS robots could be built by incorporating CBIR.

References
1. Zhang N (2021) A cloud-based platform for big data-driven CPS modeling of robots. IEEE Access 9:34667–34680
2. Prabha PA, Priya MD, Jeba Malar AC, Karthik S, Dakshin G, Kumar SD (2021) Improved ResNet-based image classification technique for malaria detection. In: Proceedings of 6th international conference on recent trends in computing: ICRTC 2020. Springer, Singapore, pp 795–803
3. Haldorai A, Ramu A (2021) Survey of image processing techniques in medical image assessment methodologies. In: Computational vision and bio-inspired computing: ICCVBIC 2020. Springer, Singapore, pp 795–811
4. Haldorai A, Anandakumar S (2020) Image segmentation and the projections of graphic centered approaches in medical image processing. J Med Image Comput, 74–81
5. Dwivedi RS (2017) An introduction to remote sensing. In: Remote sensing of soils. Springer, Berlin, Heidelberg, pp 1–47
6. Prabha PA, Vighneshbalaji SA, Surya CS, Kumar KV (2021) Identifying criminal suspects by human gestures using deep learning. In: Proceedings of 6th international conference on recent trends in computing. Springer, Singapore, pp 723–732
7. Mundra K, Modpur R, Chattopadhyay A, Kar IN (2020) Adversarial image detection in cyber-physical systems. In: Proceedings of the 1st ACM workshop on autonomous and intelligent mobile systems, pp 1–5
8. Yousofi MH, Esmaeili M, Sharifian MS (2016) A study on image mining; its importance and challenges. Am J Softw Eng Appl 5(3–1):5–9
9. Zhang T, Lee B, Zhu Q, Han X, Ye EM (2020) Multi-dimension topic mining based on hierarchical semantic graph model. IEEE Access 8:64820–64835
10. https://www.sciencedirect.com/topics/computer-science/scale-invariant-feature-transform
11. Mainali P, Lafruit G, Yang Q, Geelen B, Gool LV, Lauwereins R (2013) SIFER: scale-invariant feature detector with error resilience. Int J Comput Vision 104(2):172–197
12. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199
13. Alzu’bi A, Amira A, Ramzan N (2015) Semantic content-based image retrieval: a comprehensive study. J Visual Commun Image Represent 32:20–54
14. Zhang B, Luo H, Fan J (2016) Statistical modeling for automatic image indexing and retrieval. Neurocomputing 207:105–119
15. Ashraf R, Ahmed M, Jabbar S, Khalid S, Ahmad A, Din S, Jeon G (2018) Content based image retrieval by using color descriptor and discrete wavelet transform. J Med Syst 42(3):1–12
16. Mistry Y, Ingole DT, Ingole MD (2018) Content based image retrieval using hybrid features and various distance metric. J Electric Syst Inf Technol 5(3):874–888
17. Haut JM, Paoletti ME, Plaza J, Plaza A, Li J (2019) Visual attention-driven hyperspectral image classification. IEEE Trans Geosci Remote Sens 57(10):8065–8080
18. Reshma C, Patil AM (2012) Content based image retrieval using color and shape features. Int J Adv Res Electric Electron Instrument Eng 1(5):386–392
19. Kour A, Yadav VK, Maheshwari V, Prashar D (2013) A review on image processing. IJECCE 4(1):270–275
20. Chen G, Weng Q, Hay GJ, He Y (2018) Geographic object-based image analysis (GEOBIA): emerging trends and future opportunities. GIScience Remote Sens 55:159–182
21. Juszczyk M, Leśniak A, Zima K (2018) ANN based approach for estimation of construction costs of sports fields. Complexity 2018
22. https://www.cs.umd.edu/class/fall2016/cmsc426/matlab/matlab_imageprocessing.pdf
23. https://www.mathworks.com/help/vision/ref/psnr.html

A Hybrid Spotted Hyena and Whale Optimization Algorithm-Based Load-Balanced Clustering Technique in WSNs
J. David Sukeerthi Kumar, M. V. Subramanyam, and A. P. Siva Kumar

Abstract
Ensuring energy-efficient network operation is essential in wireless sensor networks (WSNs) because of their inherent resource limitations. Clustering of sensor nodes has emerged as a significant strategy for addressing the major challenge of implementing an energy-efficient network. Energy-efficient clustering aims to improve scalability and network lifetime by intelligently distributing the load evenly among the sensor nodes of the network. In this paper, a Hybrid Spotted Hyena and Whale Optimization Algorithm (HSHWOA)-based Load-Balanced Clustering Technique is proposed for constructing energy-balanced clusters that extend the network lifetime. The balanced clusters are constructed using fitness evaluation metrics that take energy distribution, node proximity and node distribution into account. Specifically, SHOA is adopted to determine the energy-efficient sensor nodes among all nodes deployed in the network, and WOA is used to distribute the load more uniformly among the cluster members. The proposed HSHWOA prevents frequent selection of cluster heads (CHs) and sustains the network lifetime at the desired level. Extensive simulation experiments confirmed the superiority of HSHWOA, with a network lifetime 28.12% better than that of the load-balanced clustering schemes used for comparison.
Keywords Wireless sensor networks (WSNs) · Load balancing · Spotted hyena optimization algorithm (SHOA) · Whale Optimization Algorithm (WOA) · Energy stability

J. David Sukeerthi Kumar · A. P. Siva Kumar
Department of Computer Science and Engineering, JNTUA, Ananthapuramu, India
M. V. Subramanyam
Santhiram Engineering College, Nandyal, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_68

J. David Sukeerthi Kumar et al.

1 Introduction
Wireless sensor networks (WSNs) consist of small sensors comprising processing, energy and wireless communication units. Sensors are responsible for observing their surroundings, recording physical conditions and forwarding the gathered data to base stations (BSs). WSNs have become essential because they are deployed in nearly every field. They are of different kinds, such as underwater, terrestrial, multimedia, underground and mobile. WSNs are applied in real-world scenarios such as battlefield monitoring, critical mission investigation and detection of enemy movement, and in health care for monitoring, testing, disease identification and diagnostics. Similarly, in traffic control, WSNs are used to observe and track vehicles. The nodes may be either static or dynamic: in a static network the nodes stay fixed during the entire network lifespan, whereas in a dynamic environment they move freely. The gathered data are transmitted from the sensors to the BS either directly or through intermediate nodes. Handling mobility is challenging because sensor movement is usually irregular. Sensor networks are capable of reorganizing themselves, and synchronization is vital for error-free and reliable communication. Owing to their limited resources, routing is more challenging in WSNs than in other wireless networks, so new routing schemes are proposed to deal with the challenges faced during topology creation. Unsuitable topology creation consumes more energy and shortens the network's lifespan. Designing a routing protocol requires considering factors such as energy, data storage, distance from the BS, time and mobility. Routing protocols may be either flat or hierarchical.
In flat networks, all nodes play a similar role and may rely on flooding, which consumes more bandwidth and increases energy consumption. In hierarchical protocols, nodes are grouped according to a particular criterion, and the best node within each group is chosen as the cluster head (CH). Clustering is a popular topology-formation scheme for prolonging the lifespan of a WSN: it organizes nodes into a well-structured form according to a set of pre-defined criteria. A clustering protocol includes several stages, namely CH selection, cluster formation, data gathering and transmission. A suitable clustering scheme can guarantee effective use of resources and thereby prolong the network lifetime. However, these schemes face several challenges, such as stable clustering, insufficient CH selection criteria and fixed rounds. It is therefore necessary to design a dynamic clustering approach for efficient CH selection as well as load balancing.

A Hybrid Spotted Hyena and Whale Optimization Algorithm-Based …

In this paper, a Hybrid Spotted Hyena and Whale Optimization Algorithm (HSHWOA)-based Load-Balanced Clustering Technique is proposed for constructing energy-balanced clusters that extend the network lifetime. The balanced clusters are constructed using fitness evaluation metrics that take energy distribution, node proximity and node distribution into account. Specifically, SHOA is adopted to determine the energy-efficient sensor nodes among all nodes deployed in the network, and WOA is used to distribute the load more uniformly among the cluster members. The proposed HSHWOA prevents frequent selection of cluster heads (CHs) and sustains the network lifetime at the desired level.
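The WOA component mentioned above is not spelled out in this section. For orientation, here is a minimal sketch of the standard WOA update rules (Mirjalili and Lewis, 2016) minimizing a generic fitness function. It is not the authors' HSHWOA: it uses scalar A and C per whale (a common simplification), and the sphere test function in the check stands in for the paper's clustering fitness (energy distribution, node proximity and node distribution), whose exact form is not given here.

```python
import math
import random

def whale_optimization(fitness, dim, bounds, whales=20, iterations=200, spiral_b=1.0, seed=0):
    """Minimize fitness over the box [low, high]^dim with the standard WOA update rules."""
    rng = random.Random(seed)
    low, high = bounds
    population = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(whales)]
    best = min(population, key=fitness)[:]
    for t in range(iterations):
        a = 2 - 2 * t / iterations  # decreases linearly from 2 towards 0
        for i, x in enumerate(population):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # Encircling: move relative to the best whale when |A| < 1 (exploitation),
                # otherwise relative to a randomly chosen whale (exploration).
                target = best if abs(A) < 1 else population[rng.randrange(whales)]
                new = [target[j] - A * abs(C * target[j] - x[j]) for j in range(dim)]
            else:
                # Bubble-net attack: logarithmic spiral around the best whale.
                ell = rng.uniform(-1, 1)
                new = [abs(best[j] - x[j]) * math.exp(spiral_b * ell) * math.cos(2 * math.pi * ell)
                       + best[j] for j in range(dim)]
            population[i] = [min(max(v, low), high) for v in new]  # keep inside the bounds
            if fitness(population[i]) < fitness(best):
                best = population[i][:]
    return best
```

In a clustering setting, each whale would encode a candidate assignment of nodes to cluster heads, and fitness would score the resulting load balance; that encoding is outside what this section specifies.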

2 Related Work
Kumar and Chaparala [1] designed an Opposition-based Chaotic Whale Optimization Algorithm (OBC-WOA) that mimics the behavior of humpback whales. Population-based methods generate their populations randomly during the exploration and exploitation phases, which can produce values far from the optimal solution or cause stagnation in local optima. OBC-WOA is intended to increase precision and reliability, and it uses an opposition-based scheme to enhance efficacy. Its performance is compared with Particle Swarm Optimization (PSO), the Whale Optimization Algorithm (WOA), the Gravitational Search Algorithm (GSA), and Fuzzy K-Means and Centralized Mid-point Algorithm (FKM-CMA). Pitchaimanickam and Murugaboopathi [2] proposed a Hybrid approach of Firefly Algorithm with Particle Swarm Optimization (HFAPSO) for finding optimal CHs in LEACH-C. The algorithm enhances the global search of the fireflies by using PSO and achieves optimal placement of CHs. Its performance is assessed in terms of the number of alive nodes, throughput and residual energy; energy consumption is reduced, which improves the network lifetime, and compared with the firefly algorithm the proposed scheme offers improved throughput and residual energy. Subramanian et al. [3] proposed a Hybrid Grey Wolf and Crow Search Optimization Algorithm-based Optimal Cluster Head Selection (HGWCSOA-OCHS) mechanism that improves lifetime expectancy by reducing delay and inter-node distance and increasing energy stability. GWOA is hybridized with CSOA to handle premature convergence, which prevents it from exploring the search space effectively. This CH selection method preserves the trade-off between exploitation and exploration in the search space.
The HGWCSOA-OCHS scheme is shown to outperform Firefly Optimization (FFO), GWO, Artificial Bee Colony Optimization (ABCO), and Firefly Cyclic Grey Wolf Optimization (FCGWO) in terms of energy and network lifetime expectancy, by balancing the number of alive and dead nodes.

J. David Sukeerthi Kumar et al.

Balamurugan et al. [4] proposed Hybrid Stochastic Ranking and Opposite Differential Evolution-enhanced Firefly Algorithm (HSRODE-FFA)-based clustering to address position-based CH selection, which tends to choose identical nodes with extra computation and reduced selection accuracy. The scheme uses sampling to choose CHs from the sensors in the sample population and handles the issues caused by the diverse positions of nodes and CHs. It improves the stability and lifetime of WSNs by exploiting Stochastic Firefly Ranking (SFR), which improves the exploration ability of FFA, while hybridizing the improved FFA with ODE enables quick and optimal exploitation when choosing CHs. A balance between the rates of exploitation and exploration is maintained so that CHs are chosen from the sample population both quickly and efficiently. The scheme performs better than the benchmark mechanisms used for comparison. Suresh Kumar and Vimala [5] proposed an energy-efficient, trust-based routing model based on Exponentially-Ant Lion Whale Optimization (E-ALWO), which combines the Exponentially Weighted Moving Average (EWMA) with ALO and WOA. Routing is performed through CHs, which are selected by ALWO based on energy and delay. The ideal and secure route for data transfer is computed from a fitness based on delay, energy, trust, and distance, and the path with the best fitness is used to send data to the sink through the CH. Kalburgi and Manimozhi [6] proposed a scheme to handle network failure, built on an efficient CH selection method called Taylor-Spotted Hyena Optimization (TaylorSHO), which incorporates the Taylor series into SHO. It selects CHs efficiently using a fitness based on distance, energy, and delay.
Data routing is performed by a modified k-Vertex Disjoint Path Routing (mod-kVDPR) algorithm, obtained by altering kVDPR with parameters such as link reliability and throughput. Route maintenance tracks the delivery of data packets and detects link failures. The scheme is compared with the conventional Distributed Energy Efficient Heterogeneous Clustering Approach (DEEHCA), GWO, Tabu Particle Swarm Optimization (TPSO), and Herding Optimization-Greedy (HOG) schemes. Balamurugan et al. [7] proposed another metaheuristic algorithm that optimizes energy to sustain the network lifetime: the Modified African Buffalo and Group Teaching Optimization Algorithm (MABGTOA), which achieves energy stability and preserves network lifetime through effective CH selection during clustering. It maintains the trade-off between the exploitation and exploration rates so that the chosen CHs sustain both lifetime and energy stability.

A Hybrid Spotted Hyena and Whale Optimization Algorithm-Based …


3 Hybrid Spotted Hyena and Whale Optimization Algorithm (HSHWOA)-Based Load-Balanced Clustering Technique

The proposed HSHWOA is a base-station-supported clustering scheme that performs CH selection by integrating the SHOA and WOA algorithms. Once optimized and balanced clusters have been constructed, the base station (BS) hands responsibility for network operations over to the network nodes. HSHWOA starts with a bootstrapping phase in which every node is assigned a distinct identifier (ID); these IDs are essential for establishing communication among the sensor nodes while updating location information. The BS then runs HSHWOA with a well-formulated fitness function to form load-balanced clusters. Next, the BS informs the selected CHs of their responsibilities and of their associated member nodes, and each CH's ID is sent to its member nodes together with their TDMA schedules. The network operation is divided into rounds, each comprising a node selection phase and a steady-state phase. During the steady-state phase, the cluster members aggregate the sensed data and forward it to their associated CHs.
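The fitness function itself is not given in this section. As a purely illustrative Python sketch (not the authors' formulation), the code below scores a candidate CH set using the three criteria the paper names: energy distribution, node proximity, and node distribution. The `Node` type, the nearest-CH assignment, the normalization of each term, and the weights `w_energy`, `w_dist`, and `w_balance` are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    x: float
    y: float
    energy: float  # residual energy (J)


def assign_to_nearest(nodes, ch_ids):
    """Attach every non-CH node to its nearest candidate cluster head."""
    by_id = {n.node_id: n for n in nodes}
    clusters = {cid: [] for cid in ch_ids}
    for n in nodes:
        if n.node_id in clusters:
            continue  # CHs are not members of any cluster
        nearest = min(ch_ids, key=lambda c: math.dist((n.x, n.y), (by_id[c].x, by_id[c].y)))
        clusters[nearest].append(n)
    return clusters


def fitness(nodes, ch_ids, w_energy=0.4, w_dist=0.3, w_balance=0.3):
    """Hypothetical load-balancing fitness of a candidate CH set (higher is better)."""
    by_id = {n.node_id: n for n in nodes}
    clusters = assign_to_nearest(nodes, ch_ids)

    # Energy term: mean CH residual energy relative to the network mean.
    mean_energy = sum(n.energy for n in nodes) / len(nodes)
    f_energy = sum(by_id[c].energy for c in ch_ids) / (len(ch_ids) * mean_energy)

    # Proximity term: a shorter average member-to-CH distance scores higher.
    dists = [math.dist((m.x, m.y), (by_id[c].x, by_id[c].y))
             for c, members in clusters.items() for m in members]
    avg_dist = sum(dists) / len(dists) if dists else 0.0
    f_dist = 1.0 / (1.0 + avg_dist)

    # Distribution term: penalize uneven cluster sizes via their variance.
    sizes = [len(members) for members in clusters.values()]
    mean_size = sum(sizes) / len(sizes)
    variance = sum((s - mean_size) ** 2 for s in sizes) / len(sizes)
    f_balance = 1.0 / (1.0 + variance)

    return w_energy * f_energy + w_dist * f_dist + w_balance * f_balance
```

In such a setup, each whale or hyena position would encode one candidate CH set, and the BS would keep the highest-scoring set; this encoding is likewise an assumption.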

3.1 Primitives of the Whale Optimization Algorithm (WOA)

The traditional WOA is inspired by the hunting strategy of humpback whales, which belong to the baleen whales. A distinctive feature of these whales is their particular hunting strategy: they identify their prey and encircle it. This behavior is modeled by the following equations.

$$B = \lvert C \cdot pv_i^{*} - pv_i \rvert \tag{1}$$

$$pv_{i+1} = pv_i^{*} - D \cdot B \tag{2}$$

where $C, D$: coefficient vectors; $i$: current iteration; $pv^{*}$: position vector of the best solution obtained so far; $pv$: position vector, with values in the range [−1, 1]; $\lvert\cdot\rvert$: absolute value; $\cdot$: element-wise multiplication. $C$ and $D$ are given by the following equations.

$$C = 2 \cdot rv \tag{3}$$

$$D = 2j \cdot rv - j \tag{4}$$

where $rv$: random vector in the range [0, 1]. Moreover, $j$ decreases from 2 to 0 over the course of the iterations. Two methods are used to model the bubble-net strategy of these whales mathematically: shrinking encircling and spiral position updating. The former is achieved by decreasing $j$. In the latter, the distance between the whale at $(X, Y)$ and the prey at $(X^{*}, Y^{*})$ is computed, and a spiral equation imitates the helix-shaped movement of the whales:

$$pv_{i+1} = B \cdot e^{cr} \cdot \cos(2\pi r) + pv_i^{*} \tag{5}$$

where $r$: random number in the range [−1, 1]; $c$: constant. The term $B = \lvert C \cdot pv_i^{*} - pv_i \rvert$ specifies the distance between a whale and its prey. The solution is updated with either the shrinking encircling or the spiral mechanism:

$$pv_{i+1} = \begin{cases} pv_i^{*} - D \cdot B, & \text{if } h < 0.5 \\ B \cdot e^{cr} \cdot \cos(2\pi r) + pv_i^{*}, & \text{if } h \ge 0.5 \end{cases} \tag{6}$$

where $h$: random number in the range [0, 1]. Furthermore, $pv$ is also used to search for prey: because $pv$ takes random values, the search agent (SA) moves away from the reference whale. A position vector selected at random from the previous solutions is denoted $pv_{rand}$:

$$B = \lvert C \cdot pv_{rand} - pv \rvert \tag{7}$$

$$pv_{i+1} = pv_{rand} - D \cdot B \tag{8}$$


The steps of the conventional WOA are listed below.
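A minimal Python sketch of one conventional WOA iteration, assembled from Eqs. (1) to (8) rather than taken from the authors' listing, is given below. The function name `woa_step`, the spiral constant `c=1.0`, and the rule that switches to the random-whale update of Eqs. (7) and (8) when any component of D has magnitude of at least 1 (borrowed from the usual WOA formulation) are assumptions.

```python
import math
import random


def woa_step(positions, best, i, max_iter, c=1.0, rng=random):
    """Return new positions for all search agents after one WOA iteration (sketch)."""
    j = 2.0 - 2.0 * i / max_iter  # j decreases linearly from 2 to 0
    new_positions = []
    for pv in positions:
        C = [2.0 * rng.random() for _ in pv]            # Eq. (3)
        D = [2.0 * j * rng.random() - j for _ in pv]    # Eq. (4)
        h = rng.random()
        if h < 0.5:
            # Assumed standard WOA rule: encircle the best whale while |D| < 1,
            # otherwise search around a randomly chosen whale.
            ref = best if all(abs(d) < 1.0 for d in D) else rng.choice(positions)
            B = [abs(cc * x_ref - x) for cc, x_ref, x in zip(C, ref, pv)]  # Eqs. (1)/(7)
            new = [x_ref - d * b for x_ref, d, b in zip(ref, D, B)]       # Eqs. (2)/(8)
        else:
            r = rng.uniform(-1.0, 1.0)
            # Distance term exactly as defined in the text, including C.
            B = [abs(cc * x_best - x) for cc, x_best, x in zip(C, best, pv)]
            new = [b * math.exp(c * r) * math.cos(2.0 * math.pi * r) + x_best
                   for b, x_best in zip(B, best)]                          # Eq. (5)
        new_positions.append(new)
    return new_positions
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible. At the last iteration j reaches 0, so D vanishes and the encircling branch places an agent exactly on the best solution.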

3.2 Fundamentals of Spotted Hyena Optimization (SHO)

The traditional SHO is inspired by the hunting behavior of spotted hyenas, whose relationships within the group are dynamic. The algorithm comprises searching for, encircling, and attacking the prey. To model the social behavior of the hyenas mathematically, the current best solution is taken as the target (prey), since it is the closest to the optimum in the search space. Once the best solution is found, the remaining SAs update their positions relative to it. The position of a hyena is given by the following equations.

$$D_{hy} = \lvert P \cdot pv_i^{prey} - pv_i \rvert \tag{9}$$

$$pv_{i+1} = pv_i^{prey} - Q \cdot D_{hy} \tag{10}$$


where $D_{hy}$: distance between the prey and the spotted hyena; $P, Q$: coefficient vectors; $pv$: position vector of the hyena; $pv^{prey}$: position vector of the prey. $P$, $Q$, and $r$ are given by the following equations.

$$P = 2 \cdot r_1 \tag{11}$$

$$Q = 2r \cdot r_2 - r \tag{12}$$

$$r = 5 - i \cdot \frac{5}{Max_i} \tag{13}$$

where $r$: decreases linearly from 5 to 0 over the maximum number of iterations $Max_i$; $r_1, r_2$: random vectors in the range [0, 1]. Using Eqs. (9) and (10), a hyena's position is updated randomly in the vicinity of the prey. The hunting behavior of the hyenas is modeled by the following equations.

$$D_{hy} = \lvert P \cdot pv_{hy} - pv_{oh} \rvert \tag{14}$$

$$pv_{oh} = pv_{hy} - Q \cdot D_{hy} \tag{15}$$

$$Cl_{hy} = pv_{oh} + pv_{oh+1} + \cdots + pv_{oh+n} \tag{16}$$

where $pv_{hy}$: location of the best spotted hyena; $pv_{oh}$: locations of the remaining hyenas; $Cl_{hy}$: cluster of $n$ solutions; $n$: total number of spotted hyenas in the cluster, obtained as

$$n = \mathrm{Cons}\left(pv_{hy}, pv_{hy+1}, \ldots, pv_{hy} + rv\right) \tag{17}$$

where $n$: number of candidate solutions; $rv$: random vector in the range [0.5, 1]. That is, the candidate solutions are counted. To model the attack on the prey mathematically, the value of $r$ is decreased, and the variation in vector $Q$ shrinks accordingly as $r$ goes from 5 to 0 over the iterations. The attack on the prey is given by the following equation, where $pv_{i+1}$ keeps the best solution and the positions of the other SAs are updated according to the position of the best SA.

$$pv_{i+1} = \frac{Cl_{hy}}{n} \tag{18}$$

The prey is searched for mainly according to the positions of the group of hyenas stored in the vector $Cl_{hy}$. The hyenas diverge from one another to search for prey and then attack it. The steps of the traditional SHO algorithm are listed below.
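The encircling and attacking steps of Eqs. (9) to (18) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the cluster size n is simply taken as the number of hyenas passed in, because the counting function Cons(·) of Eq. (17) is not specified here, and the helper names are hypothetical.

```python
import random


def control_r(i, max_iter):
    """Eq. (13): r decreases linearly from 5 to 0."""
    return 5.0 - i * (5.0 / max_iter)


def _coefficients(dim, r, rng):
    """Eqs. (11)-(12): P = 2*r1 and Q = 2r*r2 - r, drawn per dimension."""
    P = [2.0 * rng.random() for _ in range(dim)]
    Q = [2.0 * r * rng.random() - r for _ in range(dim)]
    return P, Q


def sho_encircle(pv, prey, i, max_iter, rng=random):
    """Eqs. (9)-(10): move one hyena relative to the prey (current best)."""
    P, Q = _coefficients(len(pv), control_r(i, max_iter), rng)
    D_hy = [abs(p * y - x) for p, y, x in zip(P, prey, pv)]
    return [y - q * d for y, q, d in zip(prey, Q, D_hy)]


def sho_attack(best, cluster, i, max_iter, rng=random):
    """Eqs. (14)-(18): relocate each hyena of the cluster around the best one, then average."""
    r = control_r(i, max_iter)
    total = [0.0] * len(best)
    for pv_oh in cluster:
        P, Q = _coefficients(len(best), r, rng)
        D_hy = [abs(p * b - x) for p, b, x in zip(P, best, pv_oh)]  # Eq. (14)
        moved = [b - q * d for b, q, d in zip(best, Q, D_hy)]        # Eq. (15)
        total = [t + m for t, m in zip(total, moved)]                # Eq. (16)
    return [t / len(cluster) for t in total]                         # Eq. (18), n = len(cluster)
```

As r shrinks toward 0, Q shrinks with it, so every relocated hyena is pulled onto the best position and the cluster average of Eq. (18) converges there.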

3.3 Hybrid HSHWOA Used for Load-Balanced Clustering

Integrating SHOA and WOA helps balance the trade-off between exploration and exploitation, so that an energy-efficient CH is always selected from the sensor nodes deployed in the network. The hybridization improves the search for, encircling of, and attack on the prey. Conventional optimization schemes offer improved exploitation and faster convergence, which makes them appropriate for optimization problems, and metaheuristic search models have proven precise and suitable for numerous applications. In many engineering problems, optimization algorithms are applied readily, and effective decision-making systems are built on the resulting optimized values. Hybrid optimization schemes have previously been found suitable for particular search problems, since they combine the merits of the individual optimization schemes and converge quickly; their convergence behavior has been found to outperform that of conventional schemes. SHO is therefore embedded in WOA. In the proposed SH-WOA, if h ≥ 0.5, the solution is updated using Eq. (18) from SHO instead of Eq. (5) of the traditional WOA. All remaining procedures follow the conventional WOA.
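The hybrid rule described above (use the SHO attack of Eq. (18) instead of the WOA spiral of Eq. (5) when h ≥ 0.5) can be sketched for a single agent in Python as below. The sketch assumes the cluster of hyenas is supplied by the caller and, for brevity, omits the random-whale exploration branch of Eqs. (7) and (8); the name `hybrid_update` is hypothetical.

```python
import random


def hybrid_update(pv, best, cluster, i, max_iter, rng=random):
    """One HSHWOA-style position update for a single agent (illustrative sketch)."""
    j = 2.0 - 2.0 * i / max_iter      # WOA control parameter, 2 -> 0
    r = 5.0 - i * (5.0 / max_iter)    # SHO control parameter, Eq. (13), 5 -> 0
    h = rng.random()
    if h < 0.5:
        # WOA shrinking encirclement around the best solution, Eqs. (1)-(4).
        C = [2.0 * rng.random() for _ in pv]
        D = [2.0 * j * rng.random() - j for _ in pv]
        B = [abs(c * b - x) for c, b, x in zip(C, best, pv)]
        return [b - d * dist for b, d, dist in zip(best, D, B)]
    # h >= 0.5: SHO cluster attack, Eqs. (14)-(18), replaces the WOA spiral of Eq. (5).
    total = [0.0] * len(best)
    for member in cluster:
        P = [2.0 * rng.random() for _ in best]
        Q = [2.0 * r * rng.random() - r for _ in best]
        D_hy = [abs(p * b - x) for p, b, x in zip(P, best, member)]
        total = [t + (b - q * d) for t, b, q, d in zip(total, best, Q, D_hy)]
    return [t / len(cluster) for t in total]
```

Note that in the SHO branch the agent's own position `pv` is not used, because Eq. (18) depends only on the best solution and the hyena cluster.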

4 Simulation Results and Discussion

The proposed HSHWOA and the baseline E-ALWO, OBC-WOA, and FCGWO clustering schemes are evaluated using MATLAB 2016a. The simulations are conducted over a 300 × 300 m network area in which 200 nodes are initially deployed at random. HSHWOA is evaluated in terms of network lifetime, measured until the first node death and until the last node death, for different numbers of sensor nodes. The network-lifetime results plotted in Figs. 1 and 2 show that HSHWOA extends the lifetime of the sensor nodes, which is attributed to its use of WOA for better load balancing. HSHWOA keeps more sensor nodes alive than the baseline E-ALWO, OBC-WOA, and FCGWO clustering schemes, by margins of 13.21%, 17.38%, and 19.28%, respectively.

Fig. 1 Network lifetime until first node death attained by HSHWOA with different sensor nodes


Fig. 2 Network lifetime until last node death attained by HSHWOA with different sensor nodes

In addition, Figs. 3 and 4 present the data packets delivered to the BS and the energy consumption of HSHWOA for different numbers of sensor nodes. The packet-delivery results show that HSHWOA forwards data packets to the BS effectively through proper CH selection, without delayed convergence. The energy spent by the sensor nodes is also considerably conserved, because HSHWOA balances the trade-off between local and global search and thereby prevents the worst nodes from being selected as CHs. HSHWOA increases the number of data packets delivered to the BS by 12.86%, 15.48%, and 18.14% compared with the baseline E-ALWO, OBC-WOA, and FCGWO clustering schemes, respectively, and improves energy consumption by 14.32%, 16.98%, and 19.23% over the baseline approaches.

5 Conclusion

The proposed HSHWOA-based Load-Balanced Clustering Technique constructs energy-balanced clusters and prolongs the network lifetime. Clusters are built using fitness evaluation metrics comprising energy distribution, node proximity, and node distribution. SHOA is used to identify energy-rich sensor nodes as CHs, increasing throughput and energy stability, and WOA is used to achieve an optimized load distribution among the cluster members.

Fig. 3 Data packet delivered to BS by HSHWOA with different sensor nodes

Fig. 4 Energy consumptions incurred by HSHWOA with different sensor nodes (x-axis: sensor nodes, with the BS located at (50, 50); y-axis: energy consumption (J); series: Proposed HSHWOA, E-ALWO, OBC-WOA, FCGWO)

References

1. Kumar MM, Chaparala A (2019) OBC-WOA: opposition-based chaotic whale optimization algorithm for energy efficient clustering in wireless sensor network. Intelligence 250(1)
2. Pitchaimanickam B, Murugaboopathi G (2020) A hybrid firefly algorithm with particle swarm optimization for energy efficient optimal cluster head selection in wireless sensor networks. Neural Comput Appl 32(12):7709–7723
3. Subramanian P, Sahayaraj JM, Senthilkumar S, Alex DS (2020) A hybrid grey wolf and crow search optimization algorithm-based optimal cluster head selection scheme for wireless sensor networks. Wireless Pers Commun 113(2):905–925
4. Balamurugan A, Priya M, Janakiraman S, Malar A (2021) Hybrid stochastic ranking and opposite differential evolution-based enhanced firefly optimization algorithm for extending network lifetime through efficient clustering in WSNs. J Netw Syst Manage 29(3):1–31
5. SureshKumar K, Vimala P (2021) Energy efficient routing protocol using exponentially-ant lion whale optimization algorithm in wireless sensor networks. Comput Netw 197:108250
6. Kalburgi SS, Manimozhi M (2022) Taylor-spotted hyena optimization algorithm for reliable and energy-efficient cluster head selection based secure data routing and failure tolerance in WSN. Multimedia Tools Appl 1–25
7. Balamurugan A, Janakiraman S, Priya DM (2022) Modified African buffalo and group teaching optimization algorithm-based clustering scheme for sustaining energy stability and network lifetime in wireless sensor networks. Trans Emerg Telecommun Technol 33(1)

Evaluating the Effect of Variable Buffer Size and Message Lifetimes in A Disconnected Mobile Opportunistic Network Environment

Pooja Bagane, Anurag Shrivastava, Sudhir Baijnath Ojha, Saurabh Gupta, and Deepak Kumar Ray

Abstract A mobile opportunistic network is important because it allows users to communicate and exchange packets with one another anywhere and at any time without the need for infrastructure. A mobile opportunistic network is a type of delay-tolerant network that evolved from mobile ad hoc networks, and it is distinguished by intermittent connectivity. End-to-end connections for data transmission are not available; instead, the network relies on a store, carry, and forward mechanism. When transmitting a message toward its destination, a node receives packets, stores them in its buffer, carries them while moving, and forwards them to other nodes it encounters. However, if nodes do not encounter each other, their buffers fill up or overflow, resulting in packet loss. In this study, we therefore analyze the performance of zero-information routing protocols such as Epidemic, SprayandWait, and MaxProp, as well as an information-rich routing protocol, in a mobile opportunistic network under varying buffer sizes, message lifetimes, and mobility model scenarios. The simulations are carried out using a synthetic mobility model and the Opportunistic Network Environment (ONE) simulation tool, with three performance criteria: delivery probability, overhead ratio, and average latency. According to the simulation results, MaxProp, among the zero-information algorithms, achieves a higher delivery ratio than the other protocols in all scenarios.

P. Bagane (B) Affiliated to Symbiosis International (Deemed University), Symbiosis Institute of Technology, Pune, India
A. Shrivastava Sushila Devi Bansal College, A.B. Road, Indore, Madhya Pradesh 453331, India
S. B. Ojha Shri Sant Gadge Baba College of Engineering and Technology, Bhusawal, MS, India
S. Gupta CSE Department, SRM Institute of Science and Technology, Delhi NCR Campus, Gaziabad, India
D. K. Ray Pune Bharati Vidyapeeth Deemed to Be University College of Engineering, Pune, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_69

P. Bagane et al.

Keywords Opportunistic network · Blind-based protocols · Knowledge-based protocol · Human mobility

1 Introduction

Because smartphones with wireless network interfaces are now widely available, researchers are increasingly interested in developing mobile opportunistic networks. An opportunistic network is an extension of a mobile ad hoc network (MANET) [1]. In a MANET, the links from the source to the destination, or from the source to relay nodes, are established in advance; that is, packets are transmitted to the appropriate destination node by node. A path must exist in the network, and packets with no route to a relay node or receiver are dropped [2, 3]. To address these problems, the mobile opportunistic network (MON) was developed; its main assumption is that an end-to-end connection between two nodes may not exist at all times. Disconnections and reconnections are common for a variety of reasons, including a phone moving out of range or being switched off to conserve energy, a limited communication range, and physical obstacles such as buildings and trees, all of which cause links to break repeatedly [3]. The MON concept therefore evolved from the MANET, with mobile nodes storing, carrying, and forwarding packets rather than discarding them until a communication path becomes available. Messages are transferred between mobile devices in an opportunistic network hop by hop (node by node) [4]. Nodes save incoming messages in their buffer space and wait for a suitable next-hop node that can carry each message closer to its destination; likewise, if a source node has a message to send but no next hop is available, it stores the message in its buffer until a suitable next-hop node appears. This procedure is repeated at each hop until the message reaches its destination.
The performance of an opportunistic network therefore depends on the synthetic mobility model used, the available storage space, and the maximum lifetime of a message in a given context. The main problem of an opportunistic network is limited storage. To deliver a message successfully to its destination, mobile nodes need enough storage space to hold messages until they encounter another node, either an intermediate node or the receiver. If a node's storage becomes full, messages may be dropped and useful information lost [3, 5].
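To make the buffer constraint concrete, the following Python sketch models a single node's store-carry-forward buffer with a byte capacity and a per-message TTL. The drop-oldest (FIFO) eviction policy, the class names, and the units are assumptions chosen for illustration; they are not taken from this paper or from the ONE simulator.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: str
    size: int       # bytes
    created: float  # creation time (s)
    ttl: float      # lifetime (s)


class NodeBuffer:
    """Finite store-carry-forward buffer with message TTLs (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity        # bytes
        self.used = 0
        self.messages = OrderedDict()   # msg_id -> Message, oldest first
        self.dropped = 0                # evicted because the buffer was full
        self.expired = 0                # removed because the TTL elapsed

    def expire(self, now):
        """Remove messages whose lifetime has elapsed at time `now`."""
        stale = [m.msg_id for m in self.messages.values() if now - m.created >= m.ttl]
        for msg_id in stale:
            self.used -= self.messages.pop(msg_id).size
            self.expired += 1

    def store(self, msg, now):
        """Store `msg`, dropping the oldest messages if there is not enough room."""
        self.expire(now)
        if msg.size > self.capacity or msg.msg_id in self.messages:
            return False
        while self.used + msg.size > self.capacity:
            _, oldest = self.messages.popitem(last=False)
            self.used -= oldest.size
            self.dropped += 1
        self.messages[msg.msg_id] = msg
        self.used += msg.size
        return True
```

With a 100-byte buffer holding a 60-byte and a 30-byte message, storing a further 50-byte message evicts the older 60-byte one; this is the kind of overflow loss that varying the buffer size is meant to expose.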

2 Literature Survey

This section briefly reviews the literature on the opportunistic network strategies adopted so far.

Evaluating the Effect of Variable Buffer Size and Message Lifetimes …


Gang et al. [6] proposed the social opportunistic network routing (SONR) algorithm, which computes the transitivity probability of neighboring nodes and, to improve routing performance, uses social characteristics such as centrality to support packet forwarding; nodes are distributed according to a community-based mobility model. The authors evaluated the proposed protocol against the Epidemic and SprayandWait routing protocols with variable message lifetimes and replication limits. The simulation results show that as the TTL increases, the Epidemic routing protocol achieves a higher delivery ratio but the worst average delay [6]. When the message replication limit is increased from 10 to 80, the proposed protocol performs better in terms of overhead ratio; however, with respect to the motivation of this paper, the impact of varying buffer sizes on routing protocol performance was not investigated. Hossen and Rahim [7] examined the effect of varying mobile nodes and mobility models on a delay-tolerant network. They investigated the performance of copy-based routing protocols such as Epidemic, Binary SprayandWait (B-SNW), SprayandFocus (SNF), MaxProp, the Probabilistic Routing Protocol using History of Encounter and Transitivity (PRoPHET), and the Resource Allocation Protocol for Intentional DTN (RAPID) under three mobility models: random walk, random direction, and shortest path map based [7]. According to the authors, all routing protocols perform best under the shortest path map-based mobility model. Similarly, for emergency scenarios, Abraham et al. evaluated the performance of several routing protocols, including Epidemic, PRoPHET, Time To Return (TTR), and MaxProp, under variable message size, number of nodes, and number of messages.
According to the authors, the MaxProp routing protocol outperforms the other routing protocols in all cases [8]. However, neither [7] nor [8] discusses the impact of different buffer sizes and message lifetimes on protocol performance. Daru et al. devised a new routing technique to improve packet forwarding among intermittently connected nodes. Their study proposes the SpecRouter protocol [9], which maximizes delivery probability, minimizes delay, and reduces resource waste. They compared it against well-known routing protocols such as Epidemic, SprayandWait, and PRoPHET, using two scenarios in which they varied the transmission range and the number of mobile nodes. The proposed algorithm performs better in both scenarios; however, with respect to our objectives, the paper does not address the impact of varying buffer sizes and message lifetimes on algorithm performance. Kim et al. [10] proposed the space-aware spray and transfer routing protocol (SSTR), which considers both temporal and spatial social characteristics, such as the geographical location and speed of the mobile user, the encounter interval, and the contact time. They used a space-aware delivery predictability calculation, a space strategy based on mean delivery predictability, and a transfer approach to compute a delivery predictability value for selecting the appropriate relay node in a store, carry, and forward scheme that establishes opportunistic communication from the source node to the destination node. Simulation results show that the proposed algorithm outperforms Epidemic, SprayandWait, and PRoPHET. The authors used two scenarios (message lifetime and message size); however, the paper assumes constant storage space. Spaho et al. [11] evaluated well-known delay-tolerant routing protocols such as Epidemic, SprayandWait, PRoPHET, and MaxProp with multiple sender nodes communicating with a single receiver. Two scenarios, time to live and distance, were used to test the routing protocols. The simulation results show that increasing the time to live from 15 to 60 min has only a minor influence on protocol performance, while in the second scenario, as the distance between the source and destination nodes increases, the delivery ratio of all protocols decreases by about 10% [11]. However, with respect to our motivation, varying buffer sizes is not considered in that paper. Dhurandher et al. [12] compared the performance of six routing protocols in an opportunistic network: First Contact, Direct Delivery, Epidemic, PRoPHET, SprayandWait, and MaxProp. They considered two mobility models, random waypoint and shortest path map-based mobility [12], and three scenarios varying the number of nodes, node speed, and message lifetime. With the shortest path map-based mobility model, all routing protocols deliver more messages to their destinations, and when the time to live is varied, all routing protocols perform almost the same. However, the protocols are not tested under varying buffer sizes. Han et al.
[13] proposed an improved probabilistic routing protocol based on PRoPHET that leverages Epidemic routing to forward packets to their destinations in a delay-tolerant network. The authors compared the proposed protocol with PRoPHET and Epidemic routing under a variable forwarding-count threshold and hop-count threshold [13]. Simulations revealed that the proposed protocol achieves a higher delivery probability than the other protocols; however, the paper does not vary storage space or time to live (TTL). N. V. V. and Rajam [14] investigated the performance of the Epidemic routing protocol under two human movement models: individual path-based mobility, in which all nodes belong to the same community (group) and can share common features, and random waypoint mobility, in which nodes move randomly. The authors studied the influence of the mobility model in two scenarios that vary the number of nodes and node speed. The simulation results show that Epidemic achieves a better delivery ratio with lower average latency under individual path-based mobility than under random waypoint, and that the overhead ratio increases as the number of nodes increases from 5 to 60.


3 Routing Protocols

Various routing protocols have been proposed for opportunistic environments, where contact opportunities are irregular. All of them follow the store, carry, and forward mechanism: if no node is within communication range, the current node stores and carries the data until it encounters another node. The protocols differ only in how they make forwarding decisions. MON routing protocols are categorized into two basic classes [15].

3.1 Blind-Based Routing Protocols (Flooding)

Flooding (blind)-based routers are one type of routing protocol in an opportunistic network. In these zero-information routers, nodes have no knowledge about their neighbors, and packets are copied blindly whenever two nodes come within communication range of each other. This kind of routing protocol consumes substantial network resources. The flooding-based routing protocols are described below [16]. The Epidemic routing protocol is a flooding-based protocol that does not require any prior knowledge of the network [17]. When the mobile nodes have sufficient buffer space, it performs best, delivering messages with a high delivery ratio, because the number of message copies is unlimited. When two mobile nodes are within communication range, Epidemic lets them exchange the packets they do not have in common by comparing their summary vectors (SVs). Once messages are exchanged, multiple copies of each message circulate through the network; eventually all nodes, including the receiver, hold the same packets in their buffers, because every packet is flooded to every node. When storage space is limited, however, the protocol performs poorly, because the many replicated messages cause packet loss. The packet exchanges that occur during Epidemic routing are depicted in Fig. 1. When node A comes into contact with node B, messages are to be copied. In the first step, node A sends its summary vector SV_A to node B to start the exchange; this summary vector is a compact representation of all the messages buffered at node A that still need to be transmitted. When node B receives SV_A, it performs a logical AND between SV_A and the negation of its own summary vector. In this way, node B determines the messages it does not yet have and is ready to copy from node A.
Node B then requests those messages from node A, and finally node A transmits the requested messages one by one to node B. The process repeats transitively whenever B meets a new neighbor.
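The summary-vector comparison described above can be sketched in Python with integer bitmasks, where bit k being set means the node holds message k. The bitmask encoding and the function names are illustrative assumptions; the request set is computed, as in the text, as the logical AND of SV_A with the negation of SV_B.

```python
def summary_vector(buffer):
    """Encode the message indices held in `buffer` as a bitmask."""
    sv = 0
    for idx in buffer:
        sv |= 1 << idx
    return sv


def request_mask(sv_a, sv_b):
    """Messages that B is missing: SV_A AND (NOT SV_B)."""
    return sv_a & ~sv_b


def epidemic_session(buffer_a, buffer_b):
    """One exchange from A to B; returns the message indices copied to B."""
    wanted = request_mask(summary_vector(buffer_a), summary_vector(buffer_b))
    copied = {idx for idx in buffer_a if (wanted >> idx) & 1}
    buffer_b |= copied  # A sends the requested messages; B stores them
    return copied
```

For example, if A holds messages {0, 1, 3} and B holds {0, 1, 2}, the request mask is 0b1000, so only message 3 is copied; a repeated session between the same pair then copies nothing.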


Fig. 1 Summary vector packet exchanging

Like Epidemic, the SprayandWait routing protocol [18] is a flooding-based (zero-information) routing protocol, but it improves on Epidemic by limiting packet replication in the network. It is characterized as a zero-information protocol because it assumes no prior knowledge of the mobility of the network's nodes. According to its authors, the SprayandWait algorithm has two phases. In the Spray phase, the source node spreads a limited number L of message copies to the nodes it encounters and then waits for confirmation from the recipient [18]. In the Wait phase, if the message could not be delivered to the destination during the Spray phase, every node holding a copy keeps it and delivers it only by direct transmission to the receiver [18]. Like Epidemic, SprayandWait uses a summary vector mechanism to identify and receive the messages a node does not yet have when nodes encounter each other. Once a message reaches the destination node, the destination generates an acknowledgment to confirm successful delivery. The MaxProp routing algorithm [19] is another opportunistic network protocol based on a flooding strategy. It was designed to increase the delivery ratio while reducing message latency. MaxProp assumes no prior knowledge of network connectivity and relies on local data. It computes the total cost of the path to each receiver using Dijkstra's algorithm [19, 20], which lets nodes plan a shortest-cost route and assign a weight cost to each message for each neighbor. MaxProp also uses a complementary mechanism that propagates acknowledgments of delivered messages so that nodes can remove those packets from their buffers [19].
MaxProp routing protocol also prevents and enables the nodes not to accept the same packets more than once.
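The two SprayandWait phases and the summary-vector check can be illustrated with a minimal sketch of the "source spray" variant, in which only the source hands out copies. The `Node` class, `create_message`, and `encounter` are hypothetical names, not the ONE simulator's API, and the delivery acknowledgment is modelled simply as the delivering node dropping its copy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical DTN node for illustrating source Spray and Wait."""
    name: str
    # message id -> [destination name, copies this node may still hand out]
    buffer: dict = field(default_factory=dict)
    # ids of messages this node has received as their final destination
    received: set = field(default_factory=set)

    def summary_vector(self) -> set:
        """Ids of every message this node already has (buffered or delivered to it)."""
        return set(self.buffer) | self.received


def create_message(source: Node, msg_id: str, destination: str, copies: int) -> None:
    """The source starts with L logical copies of the message."""
    source.buffer[msg_id] = [destination, copies]


def encounter(a: Node, b: Node) -> None:
    """One-directional exchange from a to b (call twice for a symmetric contact)."""
    # Summary-vector comparison: only messages b does not already hold are considered.
    missing = set(a.buffer) - b.summary_vector()
    for msg_id in sorted(missing):
        destination, copies = a.buffer[msg_id]
        if b.name == destination:
            # Direct delivery; a drops its copy once delivery is confirmed (simplified ack).
            b.received.add(msg_id)
            del a.buffer[msg_id]
        elif copies > 1:
            # Spray phase: hand exactly one copy to the relay and keep the rest.
            b.buffer[msg_id] = [destination, 1]
            a.buffer[msg_id][1] = copies - 1
        # Otherwise: wait phase, the holder forwards only directly to the destination.
```

For example, with L = 3 the source gives single copies to the first two relays it meets and then keeps its last copy, waiting to meet the destination itself; each relay likewise waits for direct contact with the destination.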

Evaluating the Effect of Variable Buffer Size and Message Lifetimes …


3.2 Knowledge-Based Routing (Forwarding)
Information about the network can be used to improve routing decisions and boost network performance so that packets are transferred effectively. Link metrics, the history of contacts, and movement patterns are all things that can be learned about a network. Using such knowledge, a node can choose as the next relay the node with the best chance of reaching the target node. Lindgren et al. [21] proposed PRoPHET, a probabilistic routing protocol. PRoPHET is an information-rich routing scheme that uses the history of encounters to calculate the delivery predictability of each node in the network; the delivery predictability represents the likelihood of meeting the receiving node. Every node keeps track of its delivery predictability for all known destinations, and nodes exchange this information when they come into contact. PRoPHET also uses transitivity information to determine the next hop. At each opportunistic encounter, the delivery predictabilities are recalculated using the three equations below.
(a) When node a encounters node b, a's delivery predictability for b increases:

$$P(a,b) = P(a,b)_{\text{old}} + \bigl(1 - P(a,b)_{\text{old}}\bigr) \times P_{\text{init}} \qquad (1)$$

where $P_{\text{init}} \in [0, 1]$ is an initialization constant.
(b) If two nodes do not encounter each other for a long time, they are less likely to be good forwarders of messages for each other, so the delivery predictability must age:

$$P(a,b) = P(a,b)_{\text{old}} \times \gamma^{k} \qquad (2)$$

where k is the number of time units that have elapsed since the predictability was last updated and $\gamma \in [0, 1]$ is the aging constant.
(c) If node a frequently encounters node b and node b frequently encounters node d, then node d is probably a good node to relay messages intended for node a:

$$P(a,d) = P(a,d)_{\text{old}} + \bigl(1 - P(a,d)_{\text{old}}\bigr) \times P(a,b) \times P(b,d) \times \beta \qquad (3)$$

where $\beta \in [0, 1]$ is a scaling constant that governs the impact of transitivity on delivery predictability. Node a updates its delivery predictability P(a, d) as in Eq. (3), using P(a, b) and the value P(b, d) obtained from the encountered node b. The predictabilities of Eqs. (1) and (2) are maintained as follows: Eq. (1) is applied whenever nodes a and b contact each other, and Eq. (2) is applied after every k time units. Suppose a and b are nodes and d is the destination. A packet held by node a is forwarded to node b when P(b, d) > P(a, d).

P. Bagane et al.
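Equations (1) to (3) and the forwarding rule can be sketched as follows. The class and function names are hypothetical, and the constants are an assumption: they are the values suggested in the original PRoPHET proposal, since this chapter does not state which values were configured.

```python
from collections import defaultdict

# Assumed constants (values suggested in the original PRoPHET proposal);
# the chapter does not state the values used in its simulations.
P_INIT = 0.75   # initialization constant, Eq. (1)
GAMMA = 0.98    # aging constant, Eq. (2)
BETA = 0.25     # transitivity scaling constant, Eq. (3)


class ProphetNode:
    """Hypothetical node holding P(self, dest) for every known destination."""

    def __init__(self, name):
        self.name = name
        self.p = defaultdict(float)  # unknown destinations start at 0

    def age(self, k):
        """Eq. (2): decay every predictability after k elapsed time units."""
        for dest in self.p:
            self.p[dest] *= GAMMA ** k

    def on_encounter(self, other):
        """Eq. (1) for the met node, then Eq. (3) using the met node's table."""
        old = self.p[other.name]
        self.p[other.name] = old + (1 - old) * P_INIT
        p_ab = self.p[other.name]
        for dest, p_bd in list(other.p.items()):
            if dest in (self.name, other.name):
                continue
            old = self.p[dest]
            self.p[dest] = old + (1 - old) * p_ab * p_bd * BETA


def should_forward(holder, candidate, dest):
    """Forward a packet for dest from holder a to candidate b if P(b, d) > P(a, d)."""
    return candidate.p.get(dest, 0.0) > holder.p.get(dest, 0.0)
```

In a symmetric contact both nodes would call `on_encounter` on each other, and each table would be aged before it is updated.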

4 Simulation Tool and Setup
The main objective of this simulation is to determine which routing protocols, among zero-information and information-rich routing [22], achieve a high delivery ratio with low delay, considering three scenarios in which buffer sizes, time to live, and movement models are varied. Table 1 gives the general simulation setup, and Tables 2 and 3 give the parameters for the varying-TTL and mobility-model scenarios, respectively. For the varying buffer-size scenario, the network has 100 nodes in two groups of 50: 50 pedestrians and 50 vehicles; the same node population is used for the varying message-lifetime scenario. Because nodes can join or leave the network at any time, we created two types of mobile node groups, pedestrians and vehicles, each moving at different speeds: pedestrians walk at irregular rates of 0.5–1.5 m/s, whereas cars travel at 2.7–13.9 km/h. Both groups were given Bluetooth interfaces with a transmission speed of 2 Mbps and a transmission range of 10 m. Each node was given the same buffer size, ranging from 10 to 40 MB, and the initial time to live (TTL) was set to 150 min. The ONE 1.5.1 simulator [23, 24] was used for this investigation, with a synthetic mobility model [25]. Synthetic mobility models are widely used to evaluate opportunistic networking technologies for several reasons. Most real-world datasets are location specific, having been collected at universities, conferences, or shopping malls, and they are inflexible when it comes to altering characteristics such as node density and velocity. Because of these limitations, researchers have turned to simulation-based mobility models, whose parameters can be adjusted to suit the problem specification [25]. We use the simulator's event generator modules, which create a message every 25–35 s with message sizes ranging from 500 KB to 1 MB.
The following metrics are used to evaluate and compare the four routing protocols: delivery probability, overhead ratio, and average latency. The experiment consists of three scenarios, each varying one factor: buffer size, message lifetime, or mobility model. In the first batch, we used buffer sizes of 10, 20, 30, and 40 MB. In the second batch, the message lifetimes were set to 50, 75, 100, 125, and 150 min. Finally, we experimented with two different movement models.
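The three scenarios amount to a small parameter sweep over the four protocols. The sketch below enumerates the runs and renders each one as ONE-style `key = value` settings lines. The key names are modelled on the ONE simulator's `default_settings.txt` and should be checked against the ONE 1.5.1 distribution; group-specific keys such as speeds are omitted, and the buffer size and TTL used in the third scenario are placeholders, because the chapter holds them constant without stating their values.

```python
from itertools import product

# ONE router class names for the four protocols compared in the chapter.
ROUTERS = {
    "Epidemic": "EpidemicRouter",
    "SprayandWait": "SprayAndWaitRouter",
    "PRoPHET": "ProphetRouter",
    "MaxProp": "MaxPropRouter",
}

# Settings shared by every run (values from Table 1).
COMMON = [
    "Scenario.endTime = 43200",
    "MovementModel.worldSize = 4500, 3400",
    "Events1.interval = 25,35",
    "Events1.size = 500k,1M",
]

SPMBM = "ShortestPathMapBasedMovement"
MBM = "MapBasedMovement"


def scenario_runs(fixed_buffer_mb=30, fixed_ttl_min=150):
    """Enumerate (protocol, buffer MB, TTL min, movement model) for the three scenarios."""
    runs = []
    # Scenario 1: vary the buffer size with TTL held at 150 min.
    for proto, buf in product(ROUTERS, [10, 20, 30, 40]):
        runs.append((proto, buf, 150, SPMBM))
    # Scenario 2: vary the TTL with the buffer held at 30 MB.
    for proto, ttl in product(ROUTERS, [50, 75, 100, 125, 150]):
        runs.append((proto, 30, ttl, SPMBM))
    # Scenario 3: vary the movement model; buffer and TTL defaults are placeholders.
    for proto, model in product(ROUTERS, [MBM, SPMBM]):
        runs.append((proto, fixed_buffer_mb, fixed_ttl_min, model))
    return runs


def to_settings(run):
    """Render one run as ONE-style settings lines (key names to be verified)."""
    proto, buf, ttl, model = run
    return COMMON + [
        f"Group.router = {ROUTERS[proto]}",
        f"Group.bufferSize = {buf}M",
        f"Group.msgTtl = {ttl}",  # ONE expresses message TTL in minutes
        f"Group.movementModel = {model}",
    ]
```

The sweep yields 4 × 4 + 4 × 5 + 4 × 2 = 44 runs in total.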

Table 1 Simulation setup

Parameter                      Value
Simulation time                43,200 s
Number of nodes                100
Interface                      Bluetooth
Transmit range                 10 m
Transmit speed                 2 Mbps
Buffer sizes                   10, 20, 30, 40 MB
Routing protocols              Epidemic, SprayandWait, PRoPHET, and MaxProp
Number of groups               2
Group 1 speed                  Pedestrian (0.5–1.5 m/s)
Group 2 speed                  Car (2.7–13.9 km/h)
TTL                            150 min
Mobility model                 ShortestPathMapBasedMovement
Message generation interval    25–35 s
Message size                   500 KB–1 MB
Simulation area size           4500 m × 3400 m

Table 2 Parameters for varying TTL scenarios

Input parameter                Values
Protocols                      Epidemic, SprayandWait, PRoPHET, and MaxProp
TTL                            50, 75, 100, 125, 150 min

Table 3 Parameters configuration for mobility models

Input parameter                Values
Protocols                      Epidemic, SprayandWait, PRoPHET, and MaxProp
Group movement                 MBM, SPMBM

4.1 Delivery Probability
Delivery probability is the proportion of generated messages that are successfully received by their destination within a given time frame:

$$\text{Delivery probability} = \frac{\text{Successfully delivered messages}}{\text{Packets generated at source nodes}} \times 100\%$$


4.2 Average Delay
Average delay is the mean, over all messages delivered to their destinations, of the time between a message's creation and its reception:

$$\text{Average latency} = \frac{\sum_{i=1}^{n} \bigl(t_{\text{received},i} - t_{\text{created},i}\bigr)}{n}$$

where n is the number of messages received.

4.3 Overhead Ratio
The overhead ratio measures communication cost: the number of extra message copies that must be relayed to deliver one message to the target device:

$$\text{Overhead ratio} = \frac{\text{Relayed messages} - \text{Delivered messages}}{\text{Delivered messages}}$$
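The three metrics can be computed from per-message records as in the sketch below. The `MessageRecord` type and the function names are illustrative assumptions rather than the ONE report format; the formulas follow Sects. 4.1 to 4.3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessageRecord:
    created: float              # creation time at the source (s)
    received: Optional[float]   # arrival time at the destination, None if never delivered


def delivery_probability(records):
    """Delivered messages / generated messages, as a percentage (Sect. 4.1)."""
    delivered = sum(1 for r in records if r.received is not None)
    return 100.0 * delivered / len(records)


def average_latency(records):
    """Mean of (received - created) over delivered messages only (Sect. 4.2)."""
    latencies = [r.received - r.created for r in records if r.received is not None]
    return sum(latencies) / len(latencies)


def overhead_ratio(relayed, delivered):
    """(relayed - delivered) / delivered (Sect. 4.3)."""
    return (relayed - delivered) / delivered
```

For instance, if four messages are generated, two are delivered after 120 s and 60 s, and 20 relay transmissions occurred, the delivery probability is 50%, the average latency is 90 s, and the overhead ratio is 9.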

5 Result and Analysis
To assess the influence of buffer size, message lifetime, and mobility on the performance of different routing algorithms, four routing protocols were chosen: Epidemic, SprayandWait, PRoPHET, and MaxProp. In the first scenario, TTL is kept constant so that we can isolate the impact of changing the buffer size of each node. The resulting delivery ratio, overhead ratio, and average delay are shown in Figs. 2, 3, and 4, respectively.

5.1 Impact of Varying Buffer Sizes on the Performances of the Network
This section discusses the impact of adjusting the buffer size on the performance of the opportunistic network routing methods. The delivery ratio, overhead ratio, and average delivery delay of Epidemic, SprayandWait, PRoPHET, and MaxProp are shown in Figs. 2, 3, and 4. The parameter configuration used to compute these results is given in Table 1, and the time to live of each node was kept constant. As Fig. 2 shows, the delivery ratio of all routing techniques increases as the buffer size of the nodes grows. A larger buffer lets each node retain more messages from other nodes, which prevents messages from being dropped for lack of storage space so that they can reach their intended destinations. For SprayandWait, however, increasing the buffer size from 20 to 40 MB does not affect the delivery ratio. This is because the buffer for this routing protocol fills up and can no longer accept fresh messages, so packets are lost. Overall, the simulation results show that MaxProp, a zero-information protocol, achieves the highest delivery probability of the routing protocols evaluated. This is because MaxProp sends message acknowledgments that tell nodes to delete delivered data from their buffers, and it forwards messages along the lowest-cost path available. Compared with the other algorithms, the overhead ratios of the zero-information Epidemic protocol and the information-rich PRoPHET protocol are higher, while the overhead ratio of SprayandWait is lower (Fig. 3). This demonstrates that the SprayandWait approach achieves controlled flooding with a lower overhead ratio. For Epidemic, SprayandWait, PRoPHET, and MaxProp, the simulation results in Fig. 4 show that average latency grows with buffer size. As the buffer size grows, more messages are stored in each buffer, and message copies spend more time in each buffer on average.

Fig. 2 Delivery ratio under varying buffer sizes

Fig. 3 Overhead ratio under varying buffer sizes

Fig. 4 Average latency under varying buffer sizes
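Why a larger buffer reduces message drops can be illustrated with a fixed-capacity buffer. This is a sketch under an assumed drop-oldest eviction policy (the chapter does not state the drop policy used), and it ignores messages leaving the buffer through delivery or TTL expiry; `MessageBuffer` and `drops_for` are hypothetical names.

```python
from collections import OrderedDict


class MessageBuffer:
    """Fixed-capacity buffer that evicts the oldest messages to make room (assumed policy)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.messages = OrderedDict()  # msg id -> size, in arrival order
        self.dropped = 0

    def add(self, msg_id, size):
        if size > self.capacity:
            self.dropped += 1  # can never fit
            return False
        while self.used + size > self.capacity:
            _, old_size = self.messages.popitem(last=False)  # evict the oldest message
            self.used -= old_size
            self.dropped += 1
        self.messages[msg_id] = size
        self.used += size
        return True


def drops_for(capacity_mb, sizes):
    """Count drops when a stream of message sizes (bytes) arrives at one node."""
    buf = MessageBuffer(capacity_mb * 1_000_000)
    for i, size in enumerate(sizes):
        buf.add(i, size)
    return buf.dropped
```

Feeding the same stream of sixty 1 MB messages into 10, 20, 30, and 40 MB buffers with `[drops_for(mb, [1_000_000] * 60) for mb in (10, 20, 30, 40)]` gives 50, 40, 30, and 20 drops, respectively.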

5.2 Impact of Message Lifetimes on the Performances of the Network
In the second scenario, we varied the message lifetime for each group of nodes. Table 2 shows the simulation settings for this scenario; all other simulation parameters are the same as in the first scenario. The buffer size is kept constant at 30 MB throughout this evaluation, and only the TTL is altered. Fig. 5 shows the delivery ratio versus TTL for all the routing protocols examined. According to the simulation results, increasing the TTL has little impact on the performance of the routing protocols, with MaxProp outperforming the others in message delivery. Because MaxProp calculates the path cost through each node before flooding copies of a message, nodes can replicate the message along the lowest-cost path before the TTL expires. Fig. 6 shows the overhead ratio results. For Epidemic and PRoPHET, a longer TTL means more overhead. SprayandWait, on the other hand, has the lowest overhead ratio because of its direct transmission phase: to prevent flooding, it limits the number of copies made of each bundle. Epidemic has the highest overhead ratio, because it replicates bundles to every node it encounters on the way to the destination. After SprayandWait, MaxProp has the lowest overhead ratio.
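MaxProp's path cost can be sketched as Dijkstra's algorithm over hop costs of 1 minus the meeting likelihood of each link, the cost model described in the MaxProp proposal [19]. This is a simplified illustration: the meeting likelihoods are passed in as a plain dictionary rather than exchanged between nodes, and `update_likelihoods` and `path_cost` are hypothetical names.

```python
import heapq
import math


def update_likelihoods(f, met):
    """Incremental averaging: add 1 to the met node's entry, then renormalize to sum to 1."""
    f[met] = f.get(met, 0.0) + 1.0
    total = sum(f.values())
    for node in f:
        f[node] /= total


def path_cost(likelihoods, source, dest):
    """Dijkstra over hop costs (1 - meeting likelihood); math.inf if dest is unreachable.

    likelihoods maps node -> {neighbor: probability of meeting that neighbor}.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dest:
            return cost
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, p in likelihoods.get(node, {}).items():
            new_cost = cost + (1.0 - p)
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return math.inf
```

With these costs, a two-hop route over links that are met often (likelihoods 0.9 and 0.8, total cost 0.3) is preferred over a direct link that is rarely available (likelihood 0.1, cost 0.9).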


Fig. 5 Delivery ratio under different message lifetimes

Fig. 6 Overhead ratio under different message lifetimes

Fig. 7 shows the average time between a message's creation and its reception at the destination. Epidemic and PRoPHET have higher delay than the other protocols, because as the TTL increases from 75 to 150 min, packets persist longer in the network. The latency of SprayandWait and MaxProp changes little as the TTL increases, and these two protocols attain the lowest average latency. Figs. 5, 6, and 7 show the impact of changing the time to live on the delivery ratio, overhead ratio, and average delivery latency, respectively, of the zero-information and information-rich routing protocols.


Fig. 7 Average latency under different message lifetimes

5.3 Impact of Varying Movement Models on the Performances of the Network
The results in this section were obtained by running the simulations with the parameters defined in Tables 1 and 3. Figs. 8, 9, and 10 show the impact of the movement model on the delivery ratio, overhead ratio, and average latency, respectively, of the four routing protocols. Fig. 8 shows the delivery ratio of the four well-known routing protocols under the two mobility models: delivery probability is highest with SPMBM (ShortestPathMapBasedMovement) and lowest with MBM (MapBasedMovement). Every routing protocol shows a better delivery ratio, a lower overhead ratio, and a lower average latency under SPMBM. This is because, in SPMBM, each node selects a random destination on the map and travels to it along the shortest path. The main features of each routing protocol are summarized below.

Epidemic
• Lowest delivery ratio in the first and third scenarios.
• Higher overhead ratio and average latency in all scenarios.
SprayandWait
• Good delivery ratio, with low overhead ratio and average latency, in all scenarios.
PRoPHET
• Low delivery probability and high overhead ratio in all scenarios, and high average delay in the first and second scenarios.
• Higher latency than both MaxProp and SprayandWait in the third scenario.
MaxProp
• Highest delivery ratio in all scenarios.
• Lowest overhead ratio in the first and second scenarios; lowest overhead with SPMBM and higher overhead with MBM.
• Lowest latency in all scenarios after SprayandWait.

Fig. 8 Delivery ratio under two mobility movements

Fig. 9 Overhead ratio under two-node mobility movement

Fig. 10 Average latency under two-node mobility movement
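The difference between the two movement models can be illustrated on a toy road graph. This is a simplified sketch, not the ONE implementation: the ONE's SPMBM runs a shortest-path search over weighted map distances and its MBM avoids turning back where possible, whereas here roads are unweighted (so breadth-first search suffices) and the walk picks any adjacent road. The function names are hypothetical.

```python
import random
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first shortest path on an unweighted road graph; [] if unreachable."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in graph[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if goal not in prev:
        return []
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def spmbm_trip(graph, start, rng):
    """SPMBM-like: pick a random destination, then follow the shortest path to it."""
    dest = rng.choice([n for n in graph if n != start])
    return shortest_path(graph, start, dest)


def mbm_walk(graph, start, steps, rng):
    """MBM-like: at every junction, take a randomly chosen adjacent road segment."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(graph[path[-1]]))
    return path
```

Under SPMBM a node crosses the map purposefully, which plausibly produces more distinct encounters per unit time than the back-and-forth wandering of a random walk, consistent with the better delivery ratios reported for SPMBM.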

6 Conclusion
Opportunistic networks are an active research topic and one of the more exciting extensions of MANETs. Mobile nodes in opportunistic networks can communicate with one another even when no link or end-to-end path connects them. While a packet is in transit between the sender and the destination(s), routes are built dynamically, and any available node may be used as the next relay if it is likely to bring the message closer to the final receiver. To deliver packets to their target nodes, opportunistic mobile networks employ store-and-forward mechanisms. This research compares the performance of zero-information and information-rich routing in an opportunistic mobile network. For this comparison, we used well-known flooding-based routers (Epidemic, SprayandWait, and MaxProp) and a forwarding or knowledge-based router (PRoPHET). The simulations were run with a synthetic (simulation-based) mobility model, and three performance measures were used to evaluate the protocols: delivery probability, overhead ratio, and average delivery latency. We used three scenarios to examine how the network performs in intermittently connected environments: varying buffer sizes, different times to live (i.e., shorter or longer message lifetimes), and different mobility models. In the first set of simulations, we varied the buffer size while keeping the time to live constant. In the second set, we varied the message lifetime while keeping the buffer size constant. In the third set, we kept both the buffer size and TTL constant while changing the node movement model. Compared with SprayandWait, the simulation results indicate that MaxProp, a flooding-based (zero-information) routing protocol, achieves a higher delivery probability with a lower overhead ratio in all scenarios.

7 Future Work
In the future, we plan to repeat the experiment using other technologies, such as Wi-Fi connectivity, and to run simulations with different parameters and performance metrics, such as energy consumption. Finally, we wish to design an efficient routing protocol that overcomes opportunistic network challenges by taking into account storage constraints, packet delivery delay, security issues, and human mobility models.

8 Result Summary
See Figs. 11, 12, 13, 14, 15, and 16.

Fig. 11 Simulation results for the varied buffer sizes

Fig. 12 Simulation results for the varied buffer sizes

Fig. 13 Simulation results for the varied time to live


Fig. 14 Simulation results for the varied time to live

Fig. 15 Simulation results for the varied time to live

Fig. 16 Simulation results for two mobility models

References
1. Saravjit C, Maninder S (2017) An extensive literature review of various routing protocols in delay tolerant networks. Int Res J Eng Technol 04(07):1311–1312
2. Mangrulkar R, Atique M (2011) Performance evaluation of flooding based delay tolerant routing protocols. Int J Comput Appl 35–40
3. Fall K (2003) A delay-tolerant network architecture for challenged internets. In: Proceedings of the 2003 ACM conference on applications, technologies, architectures, and protocols for computer communications (SIGCOMM 2003), pp 27–34
4. Aidi L, Changsu J (2011) Delay tolerant network
5. Chung H, Kun L, Chang T (2008) A survey of opportunistic networks. In: 22nd International conference on advanced information networking and applications, Taiwan
6. Gang C, Mei S, Yong Z, Yihai X, Xuyan B (2014) Routing protocol based on social characteristic for opportunistic networks. J China Univ Posts Telecommun 21(1):67–73
7. Hossen M, Rahim M (2016) Impact of mobile nodes for few mobility models on delay-tolerant network routing protocols
8. Martín-Campillo A, Crowcroft J, RamonMart E (2013) Evaluating opportunistic networks in disaster scenario. J Netw Comput Appl 36(2):870–880
9. Da-ru P, Wei C, Xiong L, Jia-Jia S, Jun S (2012) A performance-guarantee routing algorithm in complex distributed opportunistic networks. J China Univ Posts Telecommun 19(1):87–93
10. Myung C, Hee Han Y, Sang Youn J, Sik Jeong Y (2014) A socially aware routing based on local contact information in delay-tolerant networks. Sci World J 7
11. Spaho E, Bylykbashi K, Barolli L, Kolici V, Lala A (2016) Evaluation of different DTN routing protocols in an opportunistic network considering many to one communication scenario. Czech Republic, Ostrava
12. Dhurandher S, Sharma D, Woungang I, Chieh Chao H (2011) Performance evaluation of various routing protocols in opportunistic networks
13. Han S, Chung Y (2015) An improved PROPHET routing protocol in delay tolerant network. Sci World J 7
14. V V, N, Anita Rajam V (2013) Performance analysis of epidemic routing protocol for opportunistic networks in different mobility patterns
15. Kumar Samyal V, Singh Bamber S, Singh N (2015) Performance evaluation of delay tolerant network routing protocols. Int J Comput Appl 24–27
16. Mangrulkar R, Atique M (2011) Performance evaluation of flooding based delay tolerant routing protocols. In: National conference on emerging trends in computer science and information technology, pp 35–40
17. Vahdat A, Becker D (2000) Epidemic routing for partially connected ad hoc networks. Department of Computer Science, Duke University, Tech. Rep
18. Spyropoulos T, Psounis K, Raghavendra C (2005) Spray and wait: an efficient routing scheme for intermittently connected mobile networks. In: Proc. of ACM WDTN, Philadelphia, PA, USA, pp 252–259
19. Burgess J, Gallagher B, Jensen D, Levine B (2006) MaxProp: routing for vehicle-based disruption tolerant networks. In: Proceedings of IEEE INFOCOM, Barcelona, Spain
20. Pedro J, Milan M (2016) Dijkstra's algorithm learning tool. Manchester
21. Lindgren A, Doria A, Schelén O (2003) Probabilistic routing in intermittently connected networks. ACM Mobile Comput Commun Rev 7:19–20
22. Yuan P, Fan L, Liu P, Tang S (2016) Recent progress in routing protocols of mobile opportunistic networks: a clear taxonomy, analysis and evaluation. J Netw Comput Appl 163–170
23. Keränen A (2008) Opportunistic network environment simulator. Spec Assignment Rep Helsinki Univ Technol, Depart Commun Networking
24. Keränen A, Ott J, Kärkkäinen T (2009) The ONE simulator for DTN protocol evaluation. In: Proceedings of the 2nd international conference on simulation tools and techniques, pp 55
25. Pirozmand P, Wu G, Jedari B, Xia F (2014) Human mobility in opportunistic networks: characteristics, models and prediction methods. J Netw Comput Appl 42:45–58

Unsupervised Machine Learning for Unusual Crowd Activity Detection Pooja Bagane, Konda Hari Krishna, Shehab Mohamed Beram, Priyambada Purohit, and B. Gayathri

Abstract Unusual human crowd activity detection is the task of identifying undesired human activities in a congested environment with many objects in motion. This is accomplished by splitting the input video into separate frames and then studying the behaviour of persons and the actions they perform, processing the frames one by one. This paper puts forth a design for a system that can identify such questionable human actions. The proposed system is compared with an existing model in terms of accuracy and F-score; it scores higher than the existing model and performs well enough to flag unusual activity in videos with complex visual content.
Keywords Unusual crowd activity · Mega blocks · Unsupervised learning · Motion influence map

P. Bagane (B)
Symbiosis Institute of Technology, affiliated to Symbiosis International (Deemed University), Pune, India

K. H. Krishna
Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP, India

S. M. Beram
Research Centre for Human-Machine Collaboration (HUMAC), Department of Computing and Information Systems, School of Engineering and Technology, Sunway University, Kuala Lumpur, Malaysia

P. Purohit
Department of Faculty of Management Studies, SRM IST, Delhi NCR Campus, Modinagar, Ghaziabad, Uttar Pradesh 201204, India

B. Gayathri
Bishop Heber College, Trichy, Tamil Nadu, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7_70


Fig. 1 Examples of (a) usual activity and (b) unusual activity

1 Introduction
Suspicious activities occur around us every day. Traditionally, they were monitored by placing personnel in crowded public locations who would constantly watch people's behaviour and alert the authorities to any questionable acts. However, since it is difficult to watch public spaces constantly, a smart, intelligent system is needed to perform this task. Such a system would continuously monitor these places for doubtful activity, classify the activities taking place as ordinary or unusual, raise an alarm, and alert the concerned authorities. Identifying such activities in real-time video helps avert thefts, fights, personal attacks, and terrorist attacks using explosives or abandoned objects. It also helps keep individuals and places such as banks, railway stations, bus stands, schools, and parking areas safe, and builds trust in the system. Figure 1a shows examples of usual activity, and Fig. 1b shows unusual activity. This paper puts forth a design for a system that can identify questionable or suspicious human actions so that their consequences may be averted. Section 2 describes the literature survey, Sect. 3 deals with problem identification, Sect. 4 presents the proposed method and its implementation, and the remaining sections present the results, discussion, and conclusion.

2 Literature Survey
Suspicious activities are any activities that pose a threat to daily life. Earlier, they were detected manually by watching the crowd; now this is done using cameras and by analysing video streams, a field known as video content analysis or video analytics. Providing security in sensitive areas through video surveillance has long been established and has helped catch offenders. Considerable research has already been conducted on video surveillance, including the following. Ahmad et al. [1] evaluate different face detection and recognition methods and then provide a comprehensive solution for image-based face detection and recognition with higher accuracy and a better response rate, presented as a preliminary step towards successful real-time video analysis. Face detection is considered a complicated problem in computer vision because a human face has many features and is a dynamic object. Their solution is based on tests performed on several databases that are rich in human faces varying in pose, emotion, appearance, and lighting. Ben-Musa et al. [2] proposed a system to detect cheating in an examination hall that uses Speeded-Up Robust Features (SURF) to extract interest points and then matches and locates analogous features. Viola–Jones object detectors are used to search for faces and tag activities, and tracking algorithms follow the detections through the input video; the detectors employed are fast and robust. Text labelling is used along with the detectors and tracking algorithms to avoid false classification. Faces and hands are detected with object detection algorithms, and their locations are tracked with tracking algorithms. Interest points from pairs of similar, related images are then used to pinpoint suspicious activity by students in the examination hall and to notify the concerned authorities. Lee et al. [3] proposed a system to detect local and global abnormal activity using a motion influence map. First, the motion information of every frame in the video is calculated in sequence at the pixel level and the block level.
Based on this information, the motion influence energy of each block is computed, and a motion influence map is generated for each frame. Both the spatial and temporal characteristics of objects are represented in a single feature matrix of the motion influence map. The motion influence map is divided into a uniform grid, and k-means clustering is then performed. Unusual activity is classified at the frame level using the distances between the cluster centres and each extracted spatio-temporal motion feature. Once an unusual frame is identified, the exact position of the activity within it is located. Wang [4] showed that a Gaussian filter can be used for concurrent detection of unusual crowd behaviour based on neighbouring-position estimation. The method removes image noise and then performs histogram equalization to improve the contrast of the image. Harris corners are then extracted as feature points for tracking. Finally, adjacent-position estimation is used to estimate the motion state of the entire crowd: by analysing the movement trends of the feature points between successive video frames, the method decides whether the crowd's behaviour is normal or abnormal.
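The frame-level classification step attributed to Lee et al. [3], clustering normal spatio-temporal features with k-means and flagging a feature whose distance to the nearest cluster centre is too large, can be sketched in plain Python. The threshold rule (largest nearest-centre distance observed on normal training features) is an assumption for illustration, and all names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns the k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def nearest_distance(feature, centroids):
    """Distance from a feature vector to its closest cluster centre."""
    return min(math.dist(feature, c) for c in centroids)


def fit_threshold(train_features, centroids):
    """Assumed rule: the largest nearest-centre distance seen in normal training data."""
    return max(nearest_distance(f, centroids) for f in train_features)


def is_unusual(feature, centroids, threshold):
    """Flag a frame's feature as unusual if it lies farther than threshold from every centre."""
    return nearest_distance(feature, centroids) > threshold
```

Clustering only normal footage means the centres describe usual motion patterns, so any feature far from all of them is treated as unusual without needing labelled examples of unusual activity.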


Fathia et al. [5] described algorithms to detect two human activities, running and walking, with no restriction on the direction of motion or the number of people in the frame. A background subtraction algorithm detects the moving objects corresponding to people. The two attributes used for activity classification are the displacement rate of the centroids of the segmented foreground regions and the rate at which the size of the segmented regions fluctuates. This approach achieves very high accuracy. Nandini [6] proposed a system for identifying undesired human activities in a congested environment with many objects in motion. The process converts the video into individual frames and then analyses the people and the activities they perform. The frames are first preprocessed to remove noise. The region of interest is then separated from the background by detecting the edges of the objects of interest in the frames. Background removal is followed by post-processing to filter out residual noise. A face detection algorithm then identifies faces in the image, which helps identify the person. Person identification is followed by activity identification, i.e., identifying the activity performed by the person by looking for matching patterns in a database. If the system identifies the activity as unusual or suspicious, the concerned organization is alerted and sent the details of the person along with the activity. Kak et al. [7] discussed three major strategies for feature extraction: model-based, appearance-based, and hybrid methods.
Classification and distance-measurement methods such as support vector machines (SVM), KNN, Euclidean distance, and squared Euclidean distance are also discussed. Mohite et al. [8] described a technique for detecting unusual human activity in crowded areas that represents human activities with a motion influence map; the system can recognize multiple activities in a single video. Jhapate et al. [9] presented a system that detects suspicious activities using computer vision, with motion influence maps used for motion analysis and a pixel-level representation to make the results easier to interpret. First, a video of a group of people containing both usual and unusual activities is input. Frames are then selected and processed one at a time, and unusual-activity detection continues until the last frame is reached. For each frame, the motion influence map is computed, from which a feature vector giving the influence density is obtained. The activity is considered unusual if the influence density exceeds a threshold density; otherwise, no suspicious activity is detected. Once an unusual activity is confirmed, the influence area is clustered over the frames and finally localized at the pixel level.
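A motion influence map in the spirit of Lee et al. [3] can be sketched from block-level optical flow: each moving block adds an exponentially decaying weight to the direction bin of every block lying ahead of it within a speed-dependent reach. This is a simplified illustration under stated assumptions; the reach factor, sector width, and the inclusion of the block itself are hypothetical choices, and in practice the per-block flow would come from an optical-flow method such as Farnebäck's, averaged over each block.

```python
import math


def direction_bin(dx, dy, bins=8):
    """Quantize a flow direction into one of `bins` equal angular sectors."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int(angle // (2 * math.pi / bins)) % bins


def motion_influence_map(flow, reach_factor=2.0, sector=math.pi / 2, bins=8):
    """flow maps (row, col) -> (dx, dy), the mean optical flow of that block.

    Each moving block adds exp(-distance / speed) to the direction bin of every
    block lying within reach_factor * speed block units and inside a sector of
    width `sector` centred on its motion direction (including itself).
    Parameter values are hypothetical, chosen only for illustration.
    """
    rows = 1 + max(r for r, _ in flow)
    cols = 1 + max(c for _, c in flow)
    influence = {(r, c): [0.0] * bins for r in range(rows) for c in range(cols)}
    for (ri, ci), (dx, dy) in flow.items():
        speed = math.hypot(dx, dy)
        if speed == 0:
            continue  # static blocks exert no influence
        b = direction_bin(dx, dy, bins)
        heading = math.atan2(dy, dx)
        for (rj, cj), hist in influence.items():
            vx, vy = cj - ci, rj - ri  # x follows columns, y follows rows
            dist = math.hypot(vx, vy)
            if dist > reach_factor * speed:
                continue
            if dist > 0:
                # smallest absolute angle between the heading and the direction to block j
                off = abs((math.atan2(vy, vx) - heading + math.pi) % (2 * math.pi) - math.pi)
                if off > sector / 2:
                    continue
            hist[b] += math.exp(-dist / speed)
    return influence
```

A fast-moving block therefore influences blocks farther ahead of it, and each block ends up with an 8-bin vector summarizing the motion directions acting on it.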


A system has been proposed [10] that extracts features to represent behaviour and uses unsupervised categorization methods for behaviour modelling to identify unusually behaving objects in congested places, such as fast, irregular motion within a crowd of walking people. Features are extracted for each frame. The motion influence map captures the block where unusual activity occurs together with its neighbouring blocks, yielding a distinctive motion influence feature vector. When an activity is recognized as suspicious over several consecutive frames, a feature vector is extracted from a cuboid defined by n × n blocks over the most recent t frames. The extracted features are clustered with the k-means algorithm: the spatio-temporal features of each mega block are clustered to establish codewords, so the codewords of a mega block model the patterns of normal activity that may occur in that region. Agrawal [11] explained a way to identify and localize suspicious activity in a congested area. An autoencoder-based deep learning framework is used to classify abnormalities: optical flow is computed with the motion influence map and given as input to a convolutional autoencoder, and the spatio-temporal features output by the encoder are then classified with the k-means clustering algorithm.

3 Problem Identification

For face detection and recognition, Haar-like features [1] gave good results but produced more false detections than LBP [1]. This must be taken into account in surveillance-based systems, where false detections need to be reduced. Some works focused on only a few activities, such as walking and running, to identify the type of activity in a video. Other models used feature extraction, autoencoders, motion influence maps, and frame-level optical flow to identify suspicious activity in a crowd, but these systems remained far from ideal, as their accuracy and precision could be improved further. Some systems also gave false results for occluded scenes. In this paper, we propose a system that uses unsupervised machine learning, implemented in Python with OpenCV, to identify atypical or unusual activity in a video. The system can also identify a particular required object in the video, such as a car or a bus, using deep neural networks (DNN), which helps in locating that object. The focus is on improving accuracy over earlier systems by reducing false detections. The novelty of this work is that the complete design detects only those actions of a person that are suspicious, rather than all actions, so that false identification can be prevented. Previous attempts were limited to walking or running and did not consider the full range of activity captured in the video; they combined different approaches, such as machine learning, neural networks, and OpenCV, and focused only on improving accuracy.

P. Bagane et al.

4 Proposed Method and Implementation

An input video consists of a sequence of frames. Each frame is divided into M × N uniform blocks. Motion influence information for all the blocks is computed from the optical flow obtained with the Farneback algorithm, and a motion influence map is constructed from it. The motion influence map represents spatial and temporal features such as the speed and direction of motion of an entity in a frame, the size of the entity, and its interactions with nearby entities. All of this information is contained simultaneously in a single feature matrix. After the motion influence map is constructed, the frames of the video are partitioned into non-overlapping mega blocks, each formed by combining several motion influence blocks. The motion influence values of the smaller blocks that form a mega block are summed to obtain the motion influence value of the mega block. After the t most recent frames are divided into mega blocks, a concatenated feature vector of dimension 8 × t is extracted for every mega block across these frames. The K-means clustering algorithm is then applied to each mega block, and unusual activity, if any, is detected from the clustering results. The flowchart of the proposed method is shown in Fig. 2.

The methodology adopted for detecting objects in the input video is as follows. The input video is divided into frames, and each frame is converted into a blob, i.e. the preprocessed input array expected by the network. The blob is passed to the deep neural network (DNN) module of OpenCV, which runs a network pre-trained to identify different objects. The object to be detected is predefined so that only results related to that object are reported. If the predefined object is not among the classes the network was trained on, the network can be trained for that object as well. This process is shown in Fig. 3.
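The "predefined object" step above amounts to filtering the detector's output by class. A minimal sketch, assuming an SSD-style detector whose rows have the layout [batch_id, class_id, confidence, x1, y1, x2, y2] with normalized box coordinates; in OpenCV such rows would typically come from `cv2.dnn.blobFromImage`, `net.setInput` and `net.forward()`, but the exact output layout depends on the chosen model and should be checked. The function name, the `class_names` list and the `min_conf` default are hypothetical.

```python
def filter_detections(rows, class_names, target, min_conf=0.5):
    """Keep only detections of the predefined target object.

    rows: iterable of 7-value detections
          [batch_id, class_id, confidence, x1, y1, x2, y2]
          (an assumed SSD-style layout, coordinates normalized to [0, 1]).
    class_names: hypothetical list mapping class_id -> label.
    target: label of the object to look for, e.g. "car".
    Returns a list of (confidence, (x1, y1, x2, y2)) for the target only.
    """
    hits = []
    for _, class_id, conf, x1, y1, x2, y2 in rows:
        if conf < min_conf:
            continue  # discard weak detections
        if class_names[int(class_id)] == target:
            hits.append((conf, (x1, y1, x2, y2)))
    return hits
```

Filtering after inference keeps the pre-trained network unchanged; only the reported results are restricted to the object of interest.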

Fig. 2 Flowchart for proposed method


Fig. 3 Flowchart for object identification

Fig. 4 System architecture

Figure 4 shows an overview of the system architecture. The implementation phase involves the actual realization of the system architecture, and decisions such as the choice of platform and programming language are made here. Python 3.9, OpenCV 4.5, and NumPy 1.19.5 were used to implement the system. The code is divided into six modules: finding the optical flow of blocks, motion influence map generation (based on the Farneback optical flow), creation of non-overlapping mega blocks, object detection, training, and testing.

4.1 Finding Optical Flow of Blocks

The optical flow of every pixel in a frame is calculated using the Farneback algorithm and is used to estimate the motion information. Each frame is then partitioned into M × N uniform blocks. For each block, the mean of the optical flow values of its pixels is calculated and used as the representative value of that block. Note that the motion characteristics of the blocks are estimated here, rather than identifying objects such as cycles or pedestrians. This motion information is passed to the next stage and helps in identifying unusual crowd activity.
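The block-averaging step can be sketched in plain Python. This is an illustration under stated assumptions: the dense flow field is given as rows of (dx, dy) tuples (in OpenCV it would be the H × W × 2 array returned by `cv2.calcOpticalFlowFarneback`), the block size is expressed in pixels through the hypothetical parameters `block_h` and `block_w`, and incomplete blocks at the right and bottom borders are simply dropped.

```python
def block_mean_flow(flow, block_h, block_w):
    """Average a dense optical-flow field over non-overlapping blocks.

    flow: list of rows, each a list of (dx, dy) tuples, one per pixel.
    block_h, block_w: block size in pixels (hypothetical parameters).
    Returns a 2D list with one (mean_dx, mean_dy) vector per block;
    partial blocks at the image border are ignored in this sketch.
    """
    rows, cols = len(flow), len(flow[0])
    n = block_h * block_w
    blocks = []
    for top in range(0, rows - block_h + 1, block_h):
        block_row = []
        for left in range(0, cols - block_w + 1, block_w):
            sx = sy = 0.0
            for r in range(top, top + block_h):
                for c in range(left, left + block_w):
                    dx, dy = flow[r][c]
                    sx += dx
                    sy += dy
            # The mean flow becomes the representative motion vector b_x.
            block_row.append((sx / n, sy / n))
        blocks.append(block_row)
    return blocks
```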


4.2 Generation of Motion Influence Information

When an object or entity is present in a crowded place, or is part of a crowd in general, various factors can influence its movement, such as other objects moving or resting in the video clip. Which blocks are affected by the movement of an object depends on how fast the object is moving and in which direction. The faster an object moves, the more neighbouring blocks its movement influences, and nearer blocks are influenced more strongly than blocks farther from it. The algorithm is as follows:

Input: $B$ ← motion vectors of the blocks, $W$ ← size of a block, $L$ ← set of blocks in a frame
Output: $H$ ← motion influence map

Set $H_y$ ($y \in L$) to zero at the start of every frame of the video
for every $x \in L$:
    $T_d = \|b_x\| \times W$
    $F_x/2 = \angle b_x + \pi/2$;  $-F_x/2 = \angle b_x - \pi/2$
    for every $y \in L$:
        if $x \neq y$ then
            compute the Euclidean distance $E_d(x, y)$ between blocks $x$ and $y$
            if $E_d(x, y) < T_d$ then
                compute the angle $\phi_{xy}$ of the direction from block $x$ to block $y$
                if $-F_x/2 < \phi_{xy} < F_x/2$ then
                    $H_y(\angle b_x) = H_y(\angle b_x) + \exp\left(-E_d(x, y)/\|b_x\|\right)$
                end if
            end if
        end if
    end for
end for

Here, $\|b_x\|$ and $\angle b_x$ are the magnitude and direction of the motion vector of block $x$, and $H_y(\angle b_x)$ is the bin of block $y$'s motion influence vector that corresponds to the direction of $b_x$.

If a fast-moving object is present among many slow-moving objects, it affects more neighbouring blocks and therefore leads to higher values in the motion influence map. When most objects move slowly, the motion flows are more consistent and stable; they tend to form coherent motion patterns in both direction and magnitude, which result in high values in the influence map. These values can be used to predict the type of motion occurring in a particular frame and thus to judge whether unusual activity has occurred.
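The motion influence algorithm can be sketched in plain Python. This is a minimal illustration, not the authors' code, and it follows the formulation of Lee et al. [3] under these assumptions: block centres are passed in explicitly, each block's influence vector has 8 direction bins, and the angular test $-F_x/2 < \phi_{xy} < F_x/2$ is evaluated on the wrapped difference between $\phi_{xy}$ and $\angle b_x$ so that it also holds near $\pm\pi$. The function and parameter names are hypothetical.

```python
import math


def motion_influence_map(flows, centres, block_size, n_bins=8):
    """Sketch of the motion influence map for one frame.

    flows: list of (u, v) mean optical-flow vectors b_x, one per block.
    centres: list of (cx, cy) block centres in pixels, same order as flows.
    block_size: W, the block size in pixels.
    n_bins: direction bins per block (8 assumed here).
    Returns H: one list of n_bins influence values per block.
    """
    n = len(flows)
    H = [[0.0] * n_bins for _ in range(n)]  # reset at the start of every frame
    bin_width = 2 * math.pi / n_bins
    for x in range(n):
        u, v = flows[x]
        speed = math.hypot(u, v)  # ||b_x||
        if speed == 0.0:
            continue  # a static block influences no neighbours
        direction = math.atan2(v, u)  # angle of b_x
        t_d = speed * block_size  # T_d: influence radius grows with speed
        bin_x = int((direction % (2 * math.pi)) / bin_width) % n_bins
        for y in range(n):
            if x == y:
                continue
            dx = centres[y][0] - centres[x][0]
            dy = centres[y][1] - centres[x][1]
            e_d = math.hypot(dx, dy)  # E_d(x, y)
            if e_d >= t_d:
                continue
            phi = math.atan2(dy, dx)  # direction from block x to block y
            # Wrapped difference between phi and the direction of b_x.
            rel = math.atan2(math.sin(phi - direction), math.cos(phi - direction))
            if -math.pi / 2 < rel < math.pi / 2:
                H[y][bin_x] += math.exp(-e_d / speed)
    return H
```

For example, a block moving right at 2 pixels per frame influences a block 10 pixels to its right (with W = 10) by exp(−5) in bin 0, and does not influence it at all if it moves left instead.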

4.3 Mega Block Generator

An unusual activity, if present, is captured over several successive frames. We therefore define a cuboid of n × n blocks over t successive frames and extract a feature vector from it. To do this, each frame is partitioned into non-overlapping mega blocks so that spatio-temporal features can be extracted for every mega block. For each frame, the motion influence vectors of the blocks within a mega block are summed, and these sums for the latest t frames are concatenated. This gives a concatenated feature vector for each mega block of dimension 8 × t, i.e. one 8-bin motion influence vector per frame. K-means clustering is then performed on these feature vectors.
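The summation and concatenation can be sketched as follows. Assumptions: `frames_H` holds the motion influence maps of the latest t frames, oldest first, each as a 2D grid of per-block influence vectors (8 bins in the paper, taken from the data here); a mega block spans `n` × `n` blocks; and blocks that do not fill a complete mega block at the border are ignored. The function name is hypothetical.

```python
def mega_block_features(frames_H, n):
    """Build one concatenated feature vector per mega block.

    frames_H: latest t frames, each a 2D grid (rows x cols) of per-block
              motion influence vectors (e.g. 8 direction bins each).
    n: side length of a mega block, in blocks.
    Returns {(mega_row, mega_col): feature vector of length bins * t}.
    """
    features = {}
    for grid in frames_H:
        rows, cols, bins = len(grid), len(grid[0]), len(grid[0][0])
        for mr in range(rows // n):
            for mc in range(cols // n):
                summed = [0.0] * bins
                # Sum the influence vectors of the n x n blocks in this mega block.
                for r in range(mr * n, (mr + 1) * n):
                    for c in range(mc * n, (mc + 1) * n):
                        for k, val in enumerate(grid[r][c]):
                            summed[k] += val
                # Append this frame's sum after those of earlier frames.
                features.setdefault((mr, mc), []).extend(summed)
    return features
```

With 8 bins and t frames, each feature vector has 8 × t entries, matching the dimension stated above.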

4.4 Training

For training, we used video clips containing only normal activities that can occur in the scene under consideration. Therefore, when the K-means clustering algorithm is applied to the mega-block feature vectors, the generated codewords (cluster centres) represent the normal activities that can occur in the scene.
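Codeword generation for one mega block can be sketched with plain Lloyd's K-means. The initialization from the first k training vectors is a simple deterministic choice assumed here for illustration (libraries usually default to k-means++), and the function name is hypothetical; in the scheme described above it would be run once per mega block on that block's training feature vectors.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's K-means; returns k centroids used as codewords.

    points: list of equal-length feature vectors (e.g. 8 * t values).
    k: number of codewords per mega block.
    Centroids start from the first k points (an assumption of this sketch).
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each vector to its nearest centroid (Euclidean distance).
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new = []
        for j, members in enumerate(clusters):
            if members:
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(centroids[j])  # keep the centroid of an empty cluster
        if new == centroids:
            break  # assignments no longer change
        centroids = new
    return centroids
```

Usage in the training phase would look like `codewords[block_id] = kmeans(training_vectors[block_id], k)`, where `codewords`, `training_vectors` and `k` are hypothetical names.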

4.5 Testing

The testing phase constructs a minimum-distance matrix from the feature vectors of the mega blocks. Each element of this matrix corresponds to a mega block and is the smallest Euclidean distance between that mega block's feature vector in the test frame currently being processed and the codewords learned for that mega block. A small value means that an unusual activity is unlikely to have occurred in that block, whereas a large value indicates the presence of an unusual activity. The highest value in the minimum-distance matrix is therefore taken as the representative feature value of the frame. If this highest value is greater than the threshold, the frame is identified as "unusual".
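The frame-level decision can be sketched as below. Assumptions: the minimum-distance "matrix" is stored as a dictionary keyed by mega-block position, the codewords per mega block come from training, and the threshold value is application-specific and not given in the text. The function name is hypothetical.

```python
import math


def frame_is_unusual(test_features, codewords, threshold):
    """Classify one test frame using the minimum-distance matrix.

    test_features: {mega_block_id: feature vector of the current frame}.
    codewords: {mega_block_id: list of codeword vectors from training}.
    threshold: decision threshold (application-specific).
    Returns (is_unusual, min_dist), where min_dist maps each mega block
    to its smallest Euclidean distance to that block's codewords.
    """
    min_dist = {
        block_id: min(math.dist(vec, cw) for cw in codewords[block_id])
        for block_id, vec in test_features.items()
    }
    frame_score = max(min_dist.values())  # representative value of the frame
    return frame_score > threshold, min_dist
```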


Fig. 5 Division of the image into blocks of uniform size

5 Results and Discussion

The results obtained from the system are shown in this section. Figure 5 shows the division of the image into blocks of uniform size. Figure 6 shows the frame numbers of the frames being processed. Figure 7 displays the codewords generated using the mega blocks and the motion influence map. Figures 8, 9, and 10 show examples of abnormal crowd activity, and Fig. 11 shows the unusual frame numbers. The frame-level results are evaluated using the parameters of the confusion matrix, which is shown in Fig. 12. As shown in Fig. 13, the proposed system achieves an accuracy of 96.98%, a precision of 94.76%, a recall of 97.03%, and an F-score of 95.86%, which is higher than that of the existing models.
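For reference, the four reported metrics follow from the confusion-matrix counts by the standard definitions, with "unusual frame" as the positive class. The sketch below uses hypothetical names and does not reproduce the paper's counts, which are not given in the text.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Frame-level metrics from confusion-matrix counts.

    tp/fp/fn/tn: true/false positives and negatives, where a positive
    is a frame labelled "unusual".
    Returns (accuracy, precision, recall, f_score).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score
```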

Fig. 6 Frame number of frames being processed

6 Conclusion

Most previously proposed systems identify only simple human actions such as walking or running, and these are not appropriate for crowded areas. The system proposed in this paper recognizes unusual or questionable human activity in a crowd with the help of a motion influence map and OpenCV. Since it also raises an alarm to the concerned authorities, appropriate action can be taken. The proposed system achieves an accuracy of 96.98%, which is sufficient for flagging unusual activity in videos with complex scenes. The system can be deployed at various places to monitor ongoing activities and raise an early alert in case of suspicion. It can thus help save lives and, at the same time, strengthen public faith in the law. However, an ideal system requires high accuracy and definite results, so the system can be improved further. Future enhancements depend on the activities selected for different applications and on the actions chosen that influence the motion. In addition, questionable activity in crowded scenarios will be an easy target for the proposed model.


Fig. 7 Codewords generated using mega blocks and motion influence map

Fig. 8 Abnormal crowd activity


Fig. 9 Abnormal crowd activity

Fig. 10 Abnormal crowd activity


Fig. 11 Unusual frame numbers

Fig. 12 Confusion matrix


Fig. 13 Confusion matrix graph

References

1. Ahmad F, Najam A, Ahmed Z (2013) Image-based face detection and recognition. IJCSI Int J Comput Sci Issues 9(6):29–32
2. Ben-Musa A, Singh S, Agrawal P (2014) Suspicious human activity recognition for video surveillance system. In: International conference on control, instrumentation, communication and computational technologies (ICCICCT), pp 214–218
3. Lee D, Suk H, Park S, Lee S (2015) Motion influence map for unusual human activity detection and localization in crowded scenes. IEEE Trans Circuits Syst Video Technol 25(10):1612–1623
4. Wang G, Fu H, Liu Y (2016) Real time abnormal crowd behavior detection based on adjacent flow location estimation. In: 4th international conference on cloud computing and intelligence systems (CCIS), Beijing, China, pp 476–479
5. Salem G, Ibrahim F, Hassanpour R, Ahmed A, Douma A (2021) Detection of suspicious activities of human from surveillance videos. In: IEEE 1st international maghreb meeting of the conference on sciences and techniques of automatic control and computer engineering MI-STA, pp 794–801
6. Nandini G, Mathivanan B, NanthaBala R, Poornima P (2018) Suspicious human activity detection. Int J Adv Res Develop 3(4):12–14
7. Kak S, Mustafa M, Valente P (2018) A review of person recognition based on face model. Eurasian J Sci Eng (EAJSE) 4(1):157–168
8. Mohite A, Sangale D, Oza P, Parekar T, Navale M (2020) Unusual human activity detection using OpenCV Python with machine learning. Int J Adv Res Comput Commun Eng 9(1):50–52
9. Jhapate A, Malviya S, Jhapate M (2020) Unusual crowd activity detection using OpenCV and motion influence map. In: 2nd international conference on data, engineering and applications (IDEA), Bhopal, India, pp 1–6
10. Chalapati M, Pratap R (2020) Abnormal human activity detection using unsupervised machine learning techniques. Int J Recent Technol Eng (IJRTE) 8(6):3949–3953
11. Agrawal S, Dash R (2020) Anomaly detection in crowded scenes using motion influence map and convolutional autoencoder. In: International conference on computational intelligence and data engineering, pp 155–164

Author Index

A Adarsh Goswami, 417 Aditi Jain, 337 Akanksha Dhyani, 337 Akhandpratap Manoj Singh, 405 Akshat Anand, 573 Akshat Gupta, 551 Akshaya, D., 703 Amarjeet, 79 Amita Dev, 207 Amol Raut, 197 Anand Mahendran, 489 Anantha Prabha, P., 783 Anant Tyagi, 679 Ankam Kavitha, 375 Anshul Tickoo, 551 Anubhav Sharma, 79 Anurag Shrivastava, 811 Anusree Mondal Rakhi, 627 Arin Tyagi, 679 Aruna, R., 175, 603 Arun Noyal Doss, M., 1 Arun Sharma, 207 Arya Dhorajiya, 627 Arya Suresh, 249 Avinash K. Shrivastava, 551 Avinash, S., 1 Ayush Chhoker, 163 Ayushi Bansal, 337

B Beram, Shehab Mohamed, 831 Bhavya Garg, 679 Bhuvana Shanmuka Sai Sivani, R., 267 Bipin Kumar Rai, 259 Biswaranjan Bhola, 47

C Charanappradhosh, 703 Chaudhary Wali Mohammad, 59 Cheshta Gupta, 693 Christopher Paul, A., 175

D Daniel, A. K., 615 David Sukeerthi Kumar, J., 797 Deepak Arora, 523, 669, 693 Deepak Kumar Ray, 811 Deva Priya, M., 537, 783 Dhavanit Gupta, 441 Divya Meena, S., 221

E Ebenezer Juliet, S., 759 Ezhilazhagan, C., 139

G Ganapati Bhat, 249 Gayathri, B., 831 Ghazala Ansari, 641 Gopi, S., 603 Gurudas, V. R., 249

H Hamada, Mohammed, 489 Hamza, Fouziah, 453 Hardik Sharma, 15 Himani K. Bhaskar, 615 Hitesh, R. R., 1 Hithyshi, K., 249

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. P. Mahapatra et al. (eds.), Proceedings of International Conference on Recent Trends in Computing, Lecture Notes in Networks and Systems 600, https://doi.org/10.1007/978-981-19-8825-7

I Isha Gupta, 103

J Jahnavi Chakka, 221 Juhi Agrawal, 747 Jyoti Madake, 429, 441

K Kanupriya Malik, 119 Kapil Kumar, 69 Kartikey Tiwari, 119 Kaushal Kishor, 259 Kavitha, N. S., 603 Kiran Dongre, 197 Konda Hari Krishna, 831 Konkala Jayasree, 375 Krishna Kumar, P., 771 Kumar Kannan, 489 Kumud Kundu, 417 Kunlika Saxena, 163

L Lokesh Borawar, 235

M Mala Saraswat, 89 Manan Gupta, 15 Manas Pratap Singh, 561 Manikandan, J., 703 Manoj Kumar, 397 Manoj Kumar, R., 189 Maria Celestin Vigila, S., 453 Meghna Tyagi, 119 Mohamed Fathimal, P., 581 Mohana Priya, G., 581 Mohd. Sadiq, 59 Mohit Dua, 739 Monit Kapoor, 747 Mukesh Rawat, 15, 35 Muruganandham, S., 323 Muskaan Sultan, S., 267

N Nandini, C., 281 Neera Batra, 713 Nidhi Singh, 397 Nighat Naaz Ansari, 79 Nikhil Jindal, 551 Nirbhay Bohra, 501 Nisha Thuwal, 501 Nitasha Hasteer, 561 Nitesh Kumar, 15 Nithiavathy, R., 537

O Ochin Sharma, 351

P Pinki Sharma, 297 Piyush, 79 Ponmathi Jeba Kiruba, P., 771 Pooja Bagane, 811, 831 Porkodi, S. P., 129 Prakash Kumar Sarangi, 351 Pranshu Saxena, 501 Prasadu Reddi, 25 Prashant Upadhyay, 679 Pratibha Singh, 387 Praveen Kumar Shukla, 593 Preeti Rani, 641 Premananda Sahu, 351 Priyambada Purohit, 831 Puneet Sharma, 523, 669, 693 Punit Mittal, 119

R Raghav Bhardwaj, 103 Raghuveer Singh Dhaka, 479 Raghvendra Kumar, 47 Rahul Shingare, 361 Rajesh, T. M., 515 Rajeswari, D., 573 Rakesh Shettar, 189 Rakhi Bhati, 259 Raksha Agrawal, 501 Ramani Selvanambi, 725 Ramesh, P. S., 267, 375 Rathna Sharma, R. S., 189 Raunaq Verma, 501 Ravinder Kaur, 235 Ravi Tomar, 747 Rekha Sundari, M., 25 Ritu Rani, 207 Ritvik Sapra, 207 Rohan Khurana, 655 Rohan Tyagi, 259

S Sachin Kumar, 405 Sachin Ojha, 69 Sai Sowmya, D., 25 Sakthisudhan, K., 139 Samarpan Jain, 35 Samarth Anand, 35 Sameeran Pandey, 429 Sam Peter, S., 537 Sangeetha, S., 175, 189 Sanil Joshi, 739 Santosh Kumar, 297 Santosh Kumar Sharma, 351 Sarada, V., 129 Saranya, P., 627 Saravanan, M., 151 Sarthak Aggarwal, 35 Sathiya, R., 1 Satish Agnihotri, 361 Satyanarayana Murthy, K., 25 Saurabh Gupta, 811 Sengathir, J., 537 Shaheen, H., 175 Shaila, S. G., 249, 515 Shantanu Khandelwal, 469 Shashvat Mishra, 103 Sheela, J., 221 Shital Kasyap, 35 Shivam Chaudhary, 387 Shivam Gupta, 387 Shivamma, D., 515 Shivam Rathi, 69 Shivam Sharma, 69 Shivani Batra, 655 Shiva Sumanth Reddy, 281 Shripad Bhatlawande, 441 Shweta Paliwal, 103 Shweta Saxena, 593 Shyamala Devi, M., 175, 267, 375, 603 Sindhu, A., 515 Siva Kumar, A. P., 797 Sonali Goyal, 713 Sopana Devi, M., 759 Srikanta Kumar Mohapatra, 351 Srujan Cheemakurthi, 221 Subashree, B., 783 Subramanyam, M. V., 797 Sudhir Baijnath Ojha, 811 Suma Avani, V., 515 Sumedha Seniaray, 337 Sumitra Singar, 479 Sunil Kumar, 79 Sunny Yadav, 89 Sushant Verma, 387 Swamita Gupta, 725 Swati Shilaskar, 441

T Tanveer Hassan, 59 Thaninki Adithya Siva Srinivas, 267 Tushar Srivastava, 523

U Ujjwal Kumar, 669 Umarani, S., 603

V Vansh Gaur, 89 Varsha Parekh, 151 Veena Khandelwal, 469 Venmani, A., 323 Venna Sri Sai Rajesh, 375 Venus Pratap Singh, 561 Vijayalakshmi, M., 311 Vineet Sharma, 655 Vinothini, V. R., 139 Vinod Kumar, 641 Vinoth Kumar, S., 175, 189, 267, 375, 603 Vipin Rai, 163 Vipul Kaushik, 89 Vishwadeepak Singh Baghela, 163 Vivek Maik, 129

Y Yash Khurana, 725 Yash Ukalkar, 593 Yash Vinayak, 311 Yogesh, 561