Innovations in Computer Science and Engineering: Proceedings of 8th ICICSE (Lecture Notes in Networks and Systems, 171) 981334542X, 9789813345423

This book features a collection of high-quality, peer-reviewed research papers presented at the 8th International Conference on Innovations in Computer Science and Engineering (ICICSE-2020), held at Guru Nanak Institutions, Hyderabad, India, during August 28–29, 2020.


English | Pages: 835 [788] | Year: 2021

Table of contents:
Preface
Contents
Editors and Contributors
About the Editors
Contributors
Static and Dynamic Activities Prediction of Human Using Machine and Deep Learning Models
1 Introduction
2 Related Work
3 HAR Dataset
4 Machine Learning Models
5 Deep Learning Model-LSTM
6 Experimental Results
7 Conclusion and Future Work
References
Implementation of Braille Tab
1 Introduction
2 Related Work
3 Proposed System
4 Implementation and Working (Software)
5 Implementation and Working (Hardware)
6 Results
7 Conclusion
References
Resized MIMO Antenna for 5G Mobile Antenna Applications
1 Introduction
2 Self-isolated Antenna Configurations and the System Operating Sequence
3 A Compact-Based Self-isolated Antenna Component System with Four-antenna MIMO
4 Conclusion
References
Molecule Four-Port Antenna is Utilizing Detachment Progress of MIMO
1 Introduction
2 Antenna Configurations and the System Operating Sequence
3 A Compact-Based Self-isolated Antenna Component System with Four-antenna MIMO
4 Conclusion
References
IOT-Based Underwater Wireless Communication
1 Introduction
1.1 Literature Review
2 System Model
2.1 Module 1 (Terrestrial Module)
2.2 Module II (Underwater Module)
3 Results and Discussion
4 Conclusion and Future Scope
References
Pattern Prediction Using Binary Trees
1 Introduction
2 Background
3 Proposed Method
4 Result Analysis
5 Conclusion
References
Fruit Recognition Using Deep Learning
1 Introduction
2 Literature Survey
3 Proposed Method
4 Result
4.1 Validation and Testing Results After Each Epoch
5 Conclusion
References
Cross-Domain Variational Capsules for Information Extraction
1 Introduction
1.1 Domain Adaptation
2 Dataset
2.1 Introduction
2.2 Why Our Dataset Is Required?
2.3 Training and Testing
3 Approach
3.1 Variational Capsules
3.2 Cross Domain Variational Capsules
3.3 The Model
3.4 Learning Algorithm
3.5 Losses
4 Experiments and Results
4.1 Model Evaluation
4.2 Results
5 Conclusion
5.1 Future Enhancements
References
Automotive Accident Severity Prediction Using Machine Learning
1 Introduction
1.1 Organization of the Paper
2 Literature Review
3 Proposed Architecture and Methodology
4 Implementation Details and Result Analysis
5 Conclusion and Future Work
References
Analysis of Quality of Experience (QoE) in Video Streaming Over Wi-Fi in Real Time
1 Introduction
1.1 Motivation for Analysis of QOE
2 Design for Analysis of QOE
2.1 Measurement Methodology
2.2 QoE Metrics
3 Related Work
4 Data Analysis
4.1 User Study
4.2 Objective and Subjective MOS
5 Conclusion
6 Future Work
References
Self Driven UGV for Military Requirements
1 Introduction
2 Literature Survey
3 Existing System
4 Network Architecture
4.1 Convolutional Neural Network (CNN)
4.2 Equations
4.3 CARLA
5 Proposed System
5.1 Objectives
5.2 Advantages
6 Methodology
7 Experimental Set up
8 Results
8.1 Model Training and Security of the Bot
8.2 Testing of the Model
9 Conclusion
10 Future Scope
References
Vehicular Ant Lion Optimization Algorithm (VALOA) for Urban Traffic Management
1 Introduction
2 Routing Scenarios
2.1 Topology-Based Protocol
2.2 Position-Dependent Routing Protocol
2.3 Cluster-Dependent Routing Protocol
2.4 Geo-Cast Routing Protocol
2.5 Broadcasting Routing Protocol
3 Prior Work
4 Research Policies
4.1 Dynamic Protocol
4.2 Ant Lion Optimization Process
5 Simulation Result
6 Conclusion and Future Scope
References
Dynamic and Incremental Update of Mined Association Rules Against Changes in Dataset
1 Introduction
2 Related Work
3 Proposed Methodology
4 Generating Association Rules
5 Conclusions and Future Scope
References
E-Governance Using Big Data
1 Introduction
2 E-Governance and Big Data—An Inside
2.1 Characteristics of Big Data
2.2 Phases of Big Data
2.3 Features of Big Data
3 Adoption of Big Data in E-Governance Project
3.1 E-Governance and Big Data Across the Globe
3.2 E-Governance and Big Data in India
4 Challenges of Using Big Data Analytic Techniques
5 Conclusion
References
Implementation of Voice Controlled Hot and Cold Water Dispenser System Using Arduino
1 Introduction
2 Background
3 The Proposed System
3.1 Hardware
3.2 Software (Arduino 1.8.10)
4 Algorithm of the Proposed Method
5 Results
6 Conclusion
References
Future Smart Home Appliances Using IoT
1 Introduction
2 Literature Survey
2.1 Related Work
2.2 Problem Definition
3 Proposed Model
3.1 Sensors and Network Communication
3.2 Smart Home—Phone and Notification Through Functions
3.3 Smart Home—Automation
3.4 Smart Home—Controlling and Monitoring with Smart Mobile Apps and Functions
4 Discussion
5 Conclusion and Future Scope
References
Multilingual Crawling Strategies for Information Retrieval from BRICS Academic Websites
1 Introduction
2 Problem Statement
3 Techniques Used
3.1 UNL: Universal Networking Language
3.2 NER: Named Entity Recognition
3.3 Direct APIs
3.4 NMT: Neural Machine Translation
4 Methodology
5 Data Gathering
5.1 Dataset of Names
5.2 List of Universities
6 Counter-Intuitive Approach
7 Results
8 Conclusion
References
Missing Phone Activity Detection Using LSTM Classifier
1 Introduction
2 Related Work
3 System Design
4 Algorithm/Method Design
5 Experimental Setup
5.1 Preliminary
5.2 Data Collection
5.3 Data Processing
5.4 Training
5.5 Implementation
6 Performance Evaluation
7 Conclusion
References
Suvarga: Promoting a Healthy Society
1 Introduction
2 Literature Survey
3 Proposed System
3.1 Overview
3.2 System Architecture
4 Detailed Architecture of Suvarga
4.1 Air Quality Monitoring System
4.2 Water Quality Monitoring System
4.3 Sanitation
4.4 Algorithms Used
4.5 Intelligent System
4.6 Prediction Model
4.7 Data Visualization
5 Implementation
5.1 Need for Real-Time Monitoring
5.2 Experimental Setup of Air Quality Monitoring Device
5.3 Slum Health Drive Data Analysis
5.4 Sanitation Data Analysis
5.5 Health Data Analysis
6 Results and Analysis
6.1 Mean Squared Error
6.2 Mean Absolute Error
7 Conclusion and Inferences
References
Multi-task Data Driven Modelling Based on Transfer Learned Features in Deep Learning for Biomedical Application
1 Introduction
2 Literature Survey
3 Methodology
4 Experiments and Results
5 Conclusion and Future Work
References
Punjabi Children Speech Recognition System Under Mismatch Conditions Using Discriminative Techniques
1 Introduction
2 Related Work
3 Theoretical Background
3.1 Discriminative Techniques
4 Experimental Setup
5 Experimental Results
5.1 Experimental Results with Varying Boosted Parameter Values
5.2 Experimental Results with Varying Number of Iteration Values
6 Conclusion and Future Work
References
Effective Irrigation Management System for Agriculture Using Machine Learning
1 Introduction
2 Literature Survey
3 Dataset
4 Proposed System Architecture and Methodology
5 Experimental Results
6 Conclusion
References
IoT-Based Smart Irrigation System
1 Introduction
2 Related Work
3 The Necessity to Use the Cloud
4 Benefits
5 Different Features
6 Literature Review
7 Hardware Description
7.1 Raspberry PI 4—Model B
7.2 Software Used
7.3 Power Supply
7.4 Data Acquisition System
7.5 Soil Moisture Sensor
7.6 Temperature Sensor (LM35)
7.7 Buzzer
8 Block Diagram
9 Result
10 Conclusion
References
A Hybrid Approach for Region-Based Medical Image Compression with Nature-Inspired Optimization Algorithm
1 Introduction
2 Related Works
3 Proposed Method
3.1 Region-Based Active Contour Segmentation
3.2 Integer Karhunen-Loeve Transform
4 Results and Discussion
5 Conclusion
References
Attention Mechanism-Based News Sentiment Analyzer
1 Introduction
2 Related Work
3 Data Preparation
3.1 Train Dataset
3.2 Test Dataset
4 Proposed Methodology
5 Results
5.1 Final Predictions
6 Conclusions
7 Future Work
References
Interactive Chatbot for COVID-19 Using Cloud and Natural Language Processing
1 Introduction
2 Literature Review
3 Proposed Model
4 Implementation
5 Conclusion
References
Investigating the Performance of MANET Routing Protocols Under Jamming Attack
1 Introduction
1.1 Jamming Attack
2 Related Works
3 MANET Routing Protocols
3.1 AODV Routing Protocol
3.2 Geographical Routing Protocol (GRP)
3.3 Optimized Link State Routing Protocol (OLSR)
4 Experimental Details
4.1 Simulation Tool
4.2 Simulation Setup
5 Results and Analysis
5.1 Comparison of Jamming Attack Under AODV Protocol
5.2 Comparison of Jamming Attack Under OLSR Protocol
5.3 Comparison of Jamming Attack Under GRP Protocol
6 Conclusion
References
Classification of Skin Cancer Lesions Using Deep Neural Networks and Transfer Learning
1 Introduction
1.1 Objective
2 Literature Review
3 Methodology
3.1 Data Preprocessing
3.2 Prediction Techniques
4 Results
5 Conclusion
5.1 Future Scope
References
Security Features in Hadoop—A Survey
1 Introduction
2 Hadoop Architecture
3 Literature Survey
4 Some Security Techniques
5 Conclusion and Future Scope
References
Optical Character Recognition and Neural Machine Translation Using Deep Learning Techniques
1 Introduction
2 Related Work
3 Proposed Methodology
4 Results and Discussions
5 Conclusion
References
COVID-19 Touch Project Using Deep Learning and Computer Vision
1 Introduction
2 Related Works
3 Methodology
3.1 Working of YOLOv3
3.2 Preprocessing Steps and Training
3.3 Darknet
3.4 OpenCV
3.5 Posenet
3.6 Tensorflow.js
4 Scenarios for Use Case and Working Applications
4.1 Introduction to in-Home-Based Model
4.2 ATM/Supermarket Scenario
5 Results
5.1 Comparisons of YOLOv3 Versus SSD Versus R-CNN Versus Mask-R-CNN | Verdict | Test Images
5.2 Observations of the Experimental Analysis
6 Conclusion and Future Works
References
Flood Relief Rover for Air and Land Deployment (FRRALD)
1 Introduction
2 Drone Design
3 Rover Fabrication
4 Rover Motion and Control
5 Mapping Using GPS
6 Human Detection Using PIR
7 Camera and Face Recognition
8 Conclusion
References
An Enhanced Differential Evolution Algorithm with Sorted Dual Range Mutation Operator to Solve Key Frame Extraction Problem
1 Introduction
2 Related Works
3 Proposed Mutation Strategy
4 Design of Experiments
5 Results and Discussions
6 Validation of DEsdrm on Video Analytics Problem
7 Conclusions
References
Annotation for Object Detection
1 Introduction
2 Related Work
2.1 Annotation Tools
2.2 Object Detection Algorithm
2.3 Currently Available Annotated Datasets
2.4 Metric
3 Implementation
3.1 System Architecture
3.2 Interface Design
3.3 Design
4 Experimental Setup
5 Case Study: YOLO9000
5.1 Overview of YOLO9000
5.2 Dataset Description
5.3 Experimental Setup
5.4 Results
6 Conclusion and Future Enhancements
References
Development of Self Governed Flashing System in Automotives Using AI Technique
1 Introduction
2 Necessity of the Device
3 Problem Identification
4 Existing Systems
4.1 Conventional Turn Indicator
4.2 ORVM
4.3 Automatic Vehicle Turn Indicator Using Speech Recognition (Still, It Did Not Come into the Market)
5 Proposed Solution
6 Experimental Results
7 Conclusion and Future Work
References
Comparison Between CNN and RNN Techniques for Stress Detection Using Speech
1 Introduction
2 Literature Survey
3 Database Generation
3.1 Database Generated at CPR
3.2 Recording Speech Samples
4 Methodology
4.1 Signal Pre-processing
4.2 Feature Extraction
4.3 Classification
5 Results
6 Conclusion
References
Finding the Kth Max Sum Pair in an Array of Distinct Elements Using Search Space Optimization
1 Introduction
2 Problem Statement
2.1 Abbreviations
2.2 Example
2.3 Related Work
3 Proposed Approach
3.1 Proposed Algorithm
3.2 Complexity Analysis
4 Experimental Setup, Results and Discussion
5 Conclusion
References
Dynamic Trade Flow of Selected Commodities Using Entropy Technique
1 Introduction
2 Data Analysis
3 Methods
3.1 Global Entropy
3.2 Uniformity
3.3 Local Entropy
3.4 Trade Partnership
4 Results
4.1 Global Entropy and Uniformity
4.2 Local Entropy
4.3 Export-Trade Partnership Analysis
5 Conclusion
References
An Automated Bengali Text Summarization Technique Using Lexicon-Based Approach
Abstract
1 Introduction
2 Literature Review
3 Suggested Method
3.1 General Tagging
3.2 Special Tagging
4 Experimental Results
4.1 Co-selection Measures
5 Conclusion
References
Location-Based Pomegranate Diseases Prediction Using GPS
1 Introduction
2 Literature Review
3 Proposed System
3.1 Processing
3.2 Segmentation
3.3 Feature Extraction
3.4 Preprocessing
3.5 Testing
3.6 Classification
3.7 Detection
4 Result and Discussion
5 Conclusion
References
Medical Image Enhancement Technique Using Multiresolution Gabor Wavelet Transform
1 Introduction
2 Related Work
3 Multiresolution Gabor Wavelet Transform
4 Experimental Results
5 Conclusion
References
HOMER-Based DES for Techno-Economic Optimization of Grid
1 Introduction
2 HOMER Software
2.1 Simulation
2.2 Optimization
3 Case Study
3.1 Renewable Energy Resources (Solar Resource)
4 Description of Hybrid Renewable Energy System
5 Size and Cost Optimization
5.1 Cost Data and Size Specifications of Each Component
5.2 Solar PV Size and Cost
6 Simulation Results and Discussions
7 Conclusion
References
Odor and Air Quality Detection and Mapping in a Dynamic Environment
1 Introduction
2 Literature Review
3 Proposed System
4 Results and Discussion
5 Conclusion
References
A Comparative Study on the Performance of Bio-inspired Algorithms on Benchmarking and Real-World Optimization Problems
1 Introduction
2 Related Works
3 Results and Discussions
3.1 Image Segmentation Using GA
3.2 Image Segmentation Using PSO
3.3 Image Segmentation Using SA
4 Conclusions
References
A Study on Optimization of Sparse and Dense Linear System Solver Over GF(2) on GPUs
1 Introduction
2 Literature Review
3 Research Methodology and Analysis
4 Linear System Solver Over GF(2) for Dense
4.1 Generate Random Matrices
4.2 Generate Linear System Using LFSR
4.3 Single and Multi-GPU Gaussian Elimination
4.4 Optimization
5 Block Lanczos Solver Over Finite Field or GF(2) for Sparse
5.1 Better Test Data Generation
5.2 Optimization of SpMV and SpMTV Operations Pression
6 Conclusions
References
Intracranial Hemorrhage Detection Using Deep Convolutional Neural Network
1 Introduction
2 Materials and Methods
3 Result and Discussion
4 Conclusion and Future Work
References
A Multi-factor Approach for Cloud Security
1 Introduction
2 Related Work
3 Proposed Model
3.1 Overview of the Proposed Approach
3.2 Detailed Description and Working Principle
4 Research Analysis
5 Future Work
6 Conclusion
References
An Innovative Authentication Model for the Enhancement of Cloud Security
1 Introduction
2 Importance of Security
3 Related Work
4 Proposed Model
4.1 Working Principle of the Proposed Work
4.2 Program Code
5 Future Work
5.1 Conclusion
References
Substituting Phrases with Idioms: A Sequence-to-Sequence Learning Approach
1 Introduction
2 Literature Survey
2.1 Encoder–Decoder
2.2 Recurrent Neural Network
2.3 Part-of-Speech Tagging
3 Methodology
4 Experimental Setup
4.1 Dataset
4.2 Evaluation Metrics
5 Results
6 Conclusions
References
A Composite Framework for Implementation of ICT Enabled Road Accident Prediction Using Spatial Data Analysis
1 Introduction
2 Problem Identification
2.1 Proposed Architecture
3 Comparative Analysis
4 Conclusion
References
VISION AID: Scene Recognition Through Caption Generation Using Deep Learning
1 Introduction
2 Related Works
3 Proposed System
3.1 Object Detection and Recognition
3.2 Attribute Prediction
3.3 Caption Generation
3.4 Application Development
4 Evaluation Results
5 Conclusion
References
Effect of Hybrid Multi-Verse with Whale Optimization Algorithm on Optimal Inventory Management in Block Chain Technology with Cloud
1 Introduction
2 Literature Review
3 Major Assumptions and Structure of Proposed Inventory Management Model
3.1 Structure and Assumptions
3.2 Problem Definition
4 Contribution of Whale-Based Multi-Verse Optimization for Inventory Management
4.1 Proposed Architecture
4.2 Proposed W-MVO
4.3 Solution Encoding
4.4 Objective Model
5 Results and Discussion
5.1 Simulation Setup
5.2 Analysis of Proposed W-MVO
6 Conclusion
References
Bottleneck Feature Extraction in Punjabi Adult Speech Recognition System
1 Introduction
2 Related Work
3 Theoretical Background
3.1 Bottleneck
4 System Overview
5 Experimental Setup
6 Results and Discussion
6.1 Performance Measure in Clean Environment of GMM-HMM with MFCC and DNN
6.2 Performance Measure Through Learning Rate
6.3 Performance Measure Through Epochs
7 Conclusion
References
A Study of Machine Learning Algorithms in Speech Recognition and Language Identification System
1 Introduction
1.1 Preprocessing
1.2 Machine Learning in Speech Processing
1.3 Types of Features
2 Language Identification Model I
2.1 Methodology
2.2 Data set and Putting into Practice
3 Language Identification Model II
3.1 Methodology
3.2 Procedural Steps in a Nutshell
3.3 Data set
3.4 Evaluation Results
4 Language Identification Model III
4.1 Data set
4.2 Procedural Steps in a Nutshell
4.3 Machine Learning Algorithm
4.4 Extended Evaluation
5 Language Identification Model IV
5.1 Methodology
5.2 Data set
5.3 Procedure in a Nutshell
5.4 Machine Learning Algorithm
5.5 Evaluation and Results
6 Conclusion
References
Plant Leaf Disease Detection and Classification Using Machine Learning Approaches: A Review
1 Introduction
2 Categorization of Plant Diseases
3 Process of Leaf Disease Detection and Classification System
3.1 Image Acquisition
3.2 Image Preprocessing
3.3 Image Segmentation
3.4 Feature Extraction
3.5 Existing Machine Learning Algorithms for Plant Disease Classification
4 Discussions
5 Conclusions
References
Single-Channel Speech Enhancement Based on Signal-to-Residual Selection Criterion
1 Introduction
2 Proposed Method
2.1 Signal Residual Selection Criterion
2.2 Channel Selection Algorithms
3 Objective Measures
3.1 SSNR
3.2 STOI
4 Subjective Measures
5 Mean Comprehensibility Score
6 Spectral Analysis
7 Results and Conclusion
8 Future Scope
References
Evolutionary Algorithm for Solving Combinatorial Optimization—A Review
1 Introduction
2 Combinatorial Optimization Problems
3 Evolutionary Algorithms for COPs
4 Conclusion
References
Effect of J48 and LMT Algorithms to Classify Movies in the Web—A Comparative Approach
1 Introduction
2 Proposed Model
2.1 J48 Decision Tree
2.2 LMT Decision Tree
2.3 Attribute Details of Table 1
3 Findings and Results
3.1 Folds Cross-Validation Test
3.2 Comparison Analyses of J48 and LMT Decision Tree
4 Conclusion
5 Future Work
References
A System to Create Automated Development Environments Using Docker
1 Introduction
2 Literature Survey
2.1 Docker
2.2 Docker in Research
2.3 Electron JS
3 Existing Solutions
4 Proposed System
5 Mechanism
6 Results
7 Conclusion
References
Novel Methodologies for Processing Structured Big Data Using Hadoop Framework
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 UI
3.2 Metastore
3.3 Executing Engine
4 Characteristics of Hive
5 Conclusion
References
Intelligent Cane for Assistant to Blind and Visual Impairment People
1 Introduction
2 Related Works
2.1 Design and Implementation of Mobility Aid for Blind People
2.2 Smart Stick Blind and Visually Impaired People
3 Proposed System
3.1 Electronic Component and Their Usage
3.2 Design and Development
3.3 Hardware Assembly
4 Working
5 Results
6 Conclusion and Future Work
References
A Comprehensive Survey on Attacks and Security Protocols for VANETs
1 Introduction
2 Application Areas
2.1 Safety Related Applications
2.2 Intelligent Transportation Applications
2.3 Comfort Applications
3 Security Services
3.1 Availability
3.2 Integrity
3.3 Authentication
3.4 Confidentiality
3.5 Non Repudiation
4 Possible Attack Types
4.1 Attacks on Availability
4.2 Attacks on Authentication
4.3 Attack on Confidentiality
4.4 Attack on Non-repudiation
5 Survey on Existing Research
5.1 Public Key Infrastructure Based Schemes
5.2 Elliptic Curve Cryptography Schemes
5.3 Identity-Based Signature Schemes
6 Conclusion
References
Analysis, Visualization and Prediction of COVID-19 Pandemic Spread Using Machine Learning
1 Introduction
2 Literature Survey
3 Description of Dataset Used
4 Experimental Setup and Result
4.1 Dashboard Creation
4.2 Comparative Analysis of ML Algorithm
4.3 Data Analysis and Visualization
5 Conclusion
References
Study of Behavioral Changes and Depression Control Mechanism Using IoT and VR
1 Introduction
2 Assistance for Depressive Disorder Victims
2.1 Early Stage Identification
2.2 Monitoring the Victims
3 Wearable Devices and Sensors
4 Diagnostic Algorithms
5 VR in Treating Clinical Disorder Victims
6 Research
7 Conclusion and Future Scope
References
Sentiment Analysis on Hindi–English Code-Mixed Social Media Text
1 Introduction
2 Related Works
3 Dataset Description
4 Experiments and Results
4.1 Preprocessing
4.2 Experiments-1
4.3 Customized FastText Embedding Model
4.4 Experiments-2
5 Conclusion
References
Accident Risk Rating of Streets Using Ensemble Techniques of Machine Learning
1 Introduction
2 Literature Survey
3 The Proposed Risk Assignment Method
3.1 Dataset Selection
3.2 Data Preprocessing
3.3 Feature Engineering
3.4 Assigning the Danger Score
3.5 Applying Machine Learning Models
4 Model Evaluation
5 Conclusions and Future Scope
References
Skin Detection Using YCbCr Colour Space for UAV-Based Disaster Management
1 Introduction
2 Methodology
3 Image Processing
3.1 YCbCr Colour Space
4 Results of Skin Detection
4.1 Skin Detection Results in YCbCr
4.2 Results
5 Conclusion
References
Lie Detection Using Thermal Imaging Feature Extraction from Periorbital Tissue and Cutaneous Muscle
1 Introduction
2 Literature Survey
3 Methodology
3.1 Architecture
3.2 Data Acquisition
3.3 Data Processing
3.4 Experimenting
4 Dataset
5 Experimental Setup
6 Result Analysis
7 Conclusion
References
Voting Classification Method with PCA and K-Means for Diabetic Prediction
1 Introduction
2 Literature Survey
3 The Proposed Diabetes Prediction Model
4 Model Evaluation and Result Comparison from Previous Work
5 Conclusion and Future Scope
References
Hybrid Model for Heart Disease Prediction Using Random Forest and Logistic Regression
1 Introduction
2 Literature Survey
3 The Proposed Heart Disease Prediction Model
3.1 Dataset Selection
3.2 Data Preprocessing
3.3 Feature Selection
3.4 Classification
4 Model Evaluation and Result Comparison from Previous Work
5 Conclusion and Future Scope
References
Detection of Android Malware Using Machine Learning Techniques
1 Introduction
1.1 Android Background
1.2 Malware and Their Types
1.3 Malware Analysis Approaches
1.4 Malware Detection Techniques
2 Background Study
3 Proposed Methodology
3.1 Dataset Collection
3.2 Feature Extraction
3.3 Features Selection
3.4 Classification
4 Experimental Results
5 Conclusion and Future Scope
References
The Predictive Genetic Algorithm (GA) Load Management Mechanism for Artificial Intelligence System Implementation (AI)
1 Introduction
2 Literature Survey
3 Proposed System and Methodology
3.1 Machine Learning Algorithms with BSP Paradigm
3.2 The Efficiency of the BSP Machine Model
3.3 DSP-Based Load Equilibrium Adaptation Method
4 Prediction and Simulation Method
4.1 The Load Balancing in the IWRR
4.2 Load Balance Measures with Percentage from VM
5 Results
6 Conclusion and Enhancement in Future
References
Continuous Recognition of 3D Space Handwriting Using Deep Learning
1 Introduction
2 Related Work
3 Gesture Recognition Using Arduino 101
3.1 Deep Learning on Intel® Curie™
4 Gesture Recognition Using Deep Learning
4.1 Spotting Stage
4.2 Recognition Stage
5 Conclusion
References
Automated SQL Grading System
1 Introduction
2 Literature Review
3 Challenges Identified
4 Problem Definition
5 Problem System Methodology
5.1 Flow-Chart of the System
5.2 Algorithm
6 Performance Evaluation Parameter
7 Experimental Setup
8 Conclusion
References
Error Analysis with Customer Retention Data
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset
3.2 Data Preprocessing
3.3 Evaluation Metrics
3.4 Observation
3.5 Error Analysis
4 Conclusion and Future Work
References
Prediction Based Task Scheduling for Load Balancing in Cloud Environment
1 Introduction
2 Related Work
3 System Design
4 Experimental Evaluation and Results
5 Conclusion
References
Test Case Generation Using Adequacy-Based Genetic Algorithm
1 Introduction
2 Methodology
3 Experimental Studies
3.1 Dataset
3.2 Fitness Function Construction
3.3 Parametric Settings
3.4 Experimental Results
4 Conclusion
References
Performance Analysis of π, AL and CT for Consistency Regularization Using Semi-Supervised Learning
1 Introduction
2 Literature Review
3 Consistency-Based Models
3.1 Π-model
3.2 Consistency-Based Semi-Supervised AL (Active Learning)
3.3 ICT Model
4 Experiments
4.1 Implementation
4.2 Performance Analysis
5 Conclusion
References
An Energy-Efficient PSO-Based Cloud Scheduling Strategy
1 Introduction
1.1 Performance Parameters of Cloud Scheduling
1.2 Existing Algorithms
2 Literature Review
3 Problem Statement
4 Proposed Energy-Efficient PSO (EPSO) Based Cloud Optimal Scheduling
5 Experimental Results
6 Conclusion
References
A Pronoun Replacement-Based Special Tagging System for Bengali Language Processing (BLP)
1 Introduction
2 Methodology
2.1 General Tagging
2.2 Special Tagging
3 Results and Discussion
3.1 Results on Replacement of Pronoun
4 Conclusion
References
A High Performance Pipelined Parallel Generative Adversarial Network (PipeGAN)
1 Introduction
2 Related Work
3 Methodology
3.1 Training
3.2 Pipelining Algorithm
4 Implementation
5 Experimental Evaluation
6 Results and Analysis
7 Conclusion
References
Electroencephalogram-Based Classification of Brain Disorders Using Artificial Intelligence
1 Introduction
2 EEG Signals Pre-processing
3 Feature Extraction by Dual Tree Complex Wavelet Transform (DTCWT)
4 Gaussian Mixture Model Classifier
5 Conclusion
References
Parallel Antlion Optimisation (ALO) and Grasshopper Optimization (GOA) for Travelling Salesman Problem (TSP)
1 Introduction
2 Related Work
3 Adaptation of ALO Algorithm to TSP
3.1 Parallel ALO for TSP
4 Adaptation of GOA Algorithm to TSP
4.1 Parallel GOA for TSP
5 Performance Evaluation
6 Conclusion and Future Directions
References
Design and Development of Machine Learning Model for Osteoarthritis Identification
1 Introduction
2 Literature Review
3 Methodology
4 Development of a Novel Classification Model
5 Importance of Proposed Research
6 Conclusion
References
Author Index

Lecture Notes in Networks and Systems 171

H. S. Saini · Rishi Sayal · A. Govardhan · Rajkumar Buyya, Editors

Innovations in Computer Science and Engineering Proceedings of 8th ICICSE

Lecture Notes in Networks and Systems Volume 171

Series Editor: Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors:
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/15179

H. S. Saini · Rishi Sayal · A. Govardhan · Rajkumar Buyya Editors

Innovations in Computer Science and Engineering Proceedings of 8th ICICSE

Editors

H. S. Saini, Guru Nanak Institutions, Ibrahimpatnam, Telangana, India
Rishi Sayal, Guru Nanak Institutions, Ibrahimpatnam, Telangana, India
A. Govardhan, Jawaharlal Nehru Technological University, Hyderabad, Telangana, India
Rajkumar Buyya, CLOUDS Laboratory, The University of Melbourne, Melbourne, VIC, Australia

ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-33-4542-3 ISBN 978-981-33-4543-0 (eBook) https://doi.org/10.1007/978-981-33-4543-0 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This volume contains 84 papers that were presented at the Eighth International Conference on Innovations in Computer Science and Engineering (ICICSE-2020), held during August 28–29, 2020, at Guru Nanak Institutions, Hyderabad, India, in collaboration with the Computer Society of India (CSI) and with funding from the All India Council for Technical Education (AICTE). The aim of this conference is to provide a vibrant virtual international forum that brings together researchers, scientists, academicians, corporate professionals and technically sound students under one roof for a phenomenal, informative and interactive session, paving the way for research advancements in the field of computer science and engineering. ICICSE-2020 received more than 400 research papers from various sub-fields of computer science and engineering. Each submitted paper was meticulously reviewed by our review committee consisting of senior academicians, industry professionals and professors from premier institutions and universities.

• This conference was inaugurated and attended by top dignitaries such as Mr. Srini Santhanam, Vice President, S2 Integrators LLC, Atlanta, Georgia, USA; Dr. A. Govardhan, Professor and Rector, JNTU, Hyderabad; Dr. M. Manzoor Hussain, Professor and Registrar, JNTU, Hyderabad; and Mr. Aninda Bose, Senior Editor, Springer India Pvt. Ltd, India.
• This conference had a fantastic lineup of keynote sessions, webinar sessions by eminent speakers, and paper presentation sessions to present the latest outcomes related to advancements in computing technologies.
• The keynote and webinar sessions were conducted on cutting-edge technologies such as advancements in the field of artificial intelligence, advanced machine learning techniques, cybersecurity, and data science case studies; the invited speakers were Dr. Sujala Deepak Shetty, Professor, BITS Pilani, Dubai Campus, UAE; Mr. Kiran Naidu, Data Scientist, AW Rostamani, Dubai, UAE; Dr. G. Shanmugarathinam, Professor and CISCO Certified Ethical Hacker, Presidency University, Bengaluru, India; and Dr. B. Sateesh Kumar, Professor, JNTUH, Hyderabad, India, respectively.


• The organizing committee of ICICSE-2020 takes the opportunity to thank the invited speakers, session chairs and reviewers for their excellent support in making ICICSE-2020 a grand success during this unprecedented pandemic time.
• The quality of the research papers is a credit to the respective authors and reviewers, who brought them up to the desired level of excellence. We are indebted to the program committee members and external reviewers for producing the best-quality research papers in a short span of time. We also thank the CSI delegates and AICTE for their valuable suggestions and funding in making this event a grand success.

H. S. Saini, Hyderabad, India
Rishi Sayal, Hyderabad, India
A. Govardhan, Hyderabad, India
Rajkumar Buyya, Melbourne, Australia

Contents

Static and Dynamic Activities Prediction of Human Using Machine and Deep Learning Models . . . 1
S. Valai Ganesh, Mohit Agarwal, Suneet Kr. Gupta, and S. Rajakarunakaran

Implementation of Braille Tab . . . 9
Tejas Kulkarni, Shikha Jha, Sunny Gupta, and Anuja Gote

Resized MIMO Antenna for 5G Mobile Antenna Applications . . . 19
S. Subramanyam, S. Ashok Kumar, and T. Shanmuganantham

Molecule Four-Port Antenna is Utilizing Detachment Progress of MIMO . . . 27
K. Venugopal Rao, S. Ashok Kumar, and T. Shanmuganantham

IOT-Based Underwater Wireless Communication . . . 33
Gitimayee Sahu and Sanjay S. Pawar

Pattern Prediction Using Binary Trees . . . 43
T. Aditya Sai Srinivas, Ramasubbareddy Somula, Karrothu Aravind, and S. S. Manivannan

Fruit Recognition Using Deep Learning . . . 53
P. Balakesava Reddy, Somula Ramasubbareddy, D. Saidulu, and K. Govinda

Cross-Domain Variational Capsules for Information Extraction . . . 63
Akash Nagaraj, K. Akhil, Akshay Venkatesh, and H. R. Srikanth

Automotive Accident Severity Prediction Using Machine Learning . . . 73
Niva Mohapatra, Shreyanshi Singh, Bhabendu Kumar Mohanta, and Debasish Jena

Analysis of Quality of Experience (QoE) in Video Streaming Over Wi-Fi in Real Time . . . 79
M. Vijayalakshmi and Linganagouda Kulkarni

Self Driven UGV for Military Requirements . . . 87
Hrishikesh Vichore, Jaishankar Gurumurthi, Akhil Nair, Mukesh Choudhary, and Leena Ladge

Vehicular Ant Lion Optimization Algorithm (VALOA) for Urban Traffic Management . . . 99
Ruchika Kumari and Rakesh Kumar

Dynamic and Incremental Update of Mined Association Rules Against Changes in Dataset . . . 115
N. Satyavathi and B. Rama

E-Governance Using Big Data . . . 123
Poonam Salwan and Veerpaul Kaur Maan

Implementation of Voice Controlled Hot and Cold Water Dispenser System Using Arduino . . . 135
K. Sateesh Kumar, P. Udaya Bhanu, T. Murali Krishna, P. Vijay Kumar, and Ch. Saidulu

Future Smart Home Appliances Using IoT . . . 143
Pattlola Srinivas, M. Swami Das, and Y. L. Malathi Latha

Multilingual Crawling Strategies for Information Retrieval from BRICS Academic Websites . . . 153
Shubam Bharti, Shivam Kathuria, Manish Kumar, Rajesh Bhatia, and Bhavya Chhabra

Missing Phone Activity Detection Using LSTM Classifier . . . 161
Abhinav Rastogi, Arijit Das, and Aruna Bhat

Suvarga: Promoting a Healthy Society . . . 171
R. L. Priya, Gayatri Patil, Gaurav Tirodkar, Yash Mate, and Nikhil Nagdev

Multi-task Data Driven Modelling Based on Transfer Learned Features in Deep Learning for Biomedical Application . . . 185
N. Harini, B. Ramji, V. Sowmya, Vijay Krishna Menon, E. A. Gopalakrishnan, V. V. Sajith Variyar, and K. P. Soman

Punjabi Children Speech Recognition System Under Mismatch Conditions Using Discriminative Techniques . . . 195
Harshdeep Kaur, Vivek Bhardwaj, and Virender Kadyan

Effective Irrigation Management System for Agriculture Using Machine Learning . . . 205
S. T. Patil, M. S. Bhosale, and R. M. Kamble

IoT-Based Smart Irrigation System . . . 215
Mithilesh Kumar Pandey, Deepak Garg, Neeraj Kumar Agrahari, and Shivam Singh


A Hybrid Approach for Region-Based Medical Image Compression with Nature-Inspired Optimization Algorithm . . . 225
S. Saravanan and D. Sujitha Juliet

Attention Mechanism-Based News Sentiment Analyzer . . . 235
Sweta Kaman

Interactive Chatbot for COVID-19 Using Cloud and Natural Language Processing . . . 241
Patel Jaimin, Patel Nehal, and Patel Sandip

Investigating the Performance of MANET Routing Protocols Under Jamming Attack . . . 251
Protiva Sen and Mostafizur Rahman

Classification of Skin Cancer Lesions Using Deep Neural Networks and Transfer Learning . . . 259
Danny Joel Devarapalli, Venkata Sai Dheeraj Mavilla, Sai Prashanth Reddy Karri, Harshit Gorijavolu, and Sri Anjaneya Nimmakuri

Security Features in Hadoop—A Survey . . . 269
Gousiya Begum, S. Zahoor Ul Huq, and A. P. Siva Kumar

Optical Character Recognition and Neural Machine Translation Using Deep Learning Techniques . . . 277
K. Chandra Shekar, Maria Anisha Cross, and Vignesh Vasudevan

COVID-19 Touch Project Using Deep Learning and Computer Vision . . . 285
Chatla Venkat Rohit and GRS Murthy

Flood Relief Rover for Air and Land Deployment (FRRALD) . . . 297
Jewel Moncy John, Justin Eapen, Jeffin John, Ebin Joseph, and Abraham K Thomas

An Enhanced Differential Evolution Algorithm with Sorted Dual Range Mutation Operator to Solve Key Frame Extraction Problem . . . 307
M. Aathira and G. Jeyakumar

Annotation for Object Detection . . . 317
P. Myna, R. V. Anirudh, Brundha Rajendra Babu, Eleanor Prashamshini, and Jyothi S. Nayak

Development of Self Governed Flashing System in Automotives Using AI Technique . . . 327
N. Sankarachelliah, V. Rijith Kumar, P. Senthilram, S. Valai Ganesh, T. Selva Sundar, S. Godwin Barnabas, and S. Rajakarunakaran

Comparison Between CNN and RNN Techniques for Stress Detection Using Speech . . . 333
Bageshree Pathak, Snehal Gajbhiye, Aditi Karjole, and Sonali Pawar


Finding the Kth Max Sum Pair in an Array of Distinct Elements Using Search Space Optimization . . . 341
Deepak Ahire, Smriti Bhandari, and Kiran Kamble

Dynamic Trade Flow of Selected Commodities Using Entropy Technique . . . 353
Sharmin Akter Milu, Javed Hossain, and Ashadun Nobi

An Automated Bengali Text Summarization Technique Using Lexicon-Based Approach . . . 363
Busrat Jahan, Sheikh Shahparan Mahtab, Md. Faizul Huq Arif, Ismail Siddiqi Emon, Sharmin Akter Milu, and Md. Julfiker Raju

Location-Based Pomegranate Diseases Prediction Using GPS . . . 375
Rajshri N. Malage and Mithun B. Patil

Medical Image Enhancement Technique Using Multiresolution Gabor Wavelet Transform . . . 385
Kapila Moon and Ashok Jetawat

HOMER-Based DES for Techno-Economic Optimization of Grid . . . 393
R. Raja Kishore, D. Jaya Kumar, Dhonvan Srinu, and K. Satyavathi

Odor and Air Quality Detection and Mapping in a Dynamic Environment . . . 403
Raghunandan Srinath, Jayavrinda Vrindavanam, Rahul Rajendrakumar Budyal, Y. R. Sumukh, L. Yashaswini, and Sangeetha S. Chegaraddi

A Comparative Study on the Performance of Bio-inspired Algorithms on Benchmarking and Real-World Optimization Problems . . . 411
E. Lakshmi Priya, C. Sai Sreekari, and G. Jeyakumar

A Study on Optimization of Sparse and Dense Linear System Solver Over GF(2) on GPUs . . . 419
Prashant Verma and Kapil Sharma

Intracranial Hemorrhage Detection Using Deep Convolutional Neural Network . . . 429
K. Thirunavukkarasu, Anmol Gupta, Satheesh Abimannan, and Shahnawaz Khan

A Multi-factor Approach for Cloud Security . . . 437
Francis K. Mupila and Himanshu Gupta

An Innovative Authentication Model for the Enhancement of Cloud Security . . . 447
Francis K. Mupila and Himanshu Gupta


Substituting Phrases with Idioms: A Sequence-to-Sequence Learning Approach . . . 457
Nikhil Anand

A Composite Framework for Implementation of ICT Enabled Road Accident Prediction Using Spatial Data Analysis . . . 465
Dara Anitha Kumari and A. Govardhan

VISION AID: Scene Recognition Through Caption Generation Using Deep Learning . . . 473
Mathew Regi and Mathews Abraham

Effect of Hybrid Multi-Verse with Whale Optimization Algorithm on Optimal Inventory Management in Block Chain Technology with Cloud . . . 483
C Govindasamy and A. Antonidoss

Bottleneck Feature Extraction in Punjabi Adult Speech Recognition System . . . 493
Shashi Bala, Virender Kadyan, and Vivek Bhardwaj

A Study of Machine Learning Algorithms in Speech Recognition and Language Identification System . . . 503
Aakansha Mathur and Razia Sultana

Plant Leaf Disease Detection and Classification Using Machine Learning Approaches: A Review . . . 515
Majji V. Appalanaidu and G. Kumaravelan

Single-Channel Speech Enhancement Based on Signal-to-Residual Selection Criterion . . . 527
Ramesh Nuthakki, Junaid Abbas, Ayesha Afnan, Faisal Ahmed Shariff, and Akshaya Hari

Evolutionary Algorithm for Solving Combinatorial Optimization—A Review . . . 539
Anisha Radhakrishnan and G. Jeyakumar

Effect of J48 and LMT Algorithms to Classify Movies in the Web—A Comparative Approach . . . 547
Prashant Bhat and Pradnya Malaganve

A System to Create Automated Development Environments Using Docker . . . 555
N. S. Akhilesh, M. N. Aniruddha, Anirban Ghosh, and K. Sindhu

Novel Methodologies for Processing Structured Big Data Using Hadoop Framework . . . 565
Prashant Bhat and Prajna Hegde


Intelligent Cane for Assistant to Blind and Visual Impairment People . . . 573
Meet Patel, Hemal Ahir, and Falgun Thakkar

A Comprehensive Survey on Attacks and Security Protocols for VANETs . . . 583
Aminul Islam, Sudhanshu Ranjan, Arun Pratap Rawat, and Soumayadev Maity

Analysis, Visualization and Prediction of COVID-19 Pandemic Spread Using Machine Learning . . . 597
Snigdha Sen, B. K. Thejas, B. L. Pranitha, and I. Amrita

Study of Behavioral Changes and Depression Control Mechanism Using IoT and VR . . . 605
Pavan Kumar Katkuri and Archana Mantri

Sentiment Analysis on Hindi–English Code-Mixed Social Media Text . . . 615
T. Tulasi Sasidhar, B. Premjith, K. Sreelakshmi, and K. P. Soman

Accident Risk Rating of Streets Using Ensemble Techniques of Machine Learning . . . 623
Akanksha Rastogi and Amrit Lal Sangal

Skin Detection Using YCbCr Colour Space for UAV-Based Disaster Management . . . 633
S. J. Arya, A. Asish, B. S. Febi Shine, J. L. Sreelakshmi, and Elizabeth Varghese

Lie Detection Using Thermal Imaging Feature Extraction from Periorbital Tissue and Cutaneous Muscle . . . 643
Prajkta Kodavade, Shivani Bhandigare, Aishwarya Kadam, Neha Redekar, and Kiran P. Kamble

Voting Classification Method with PCA and K-Means for Diabetic Prediction . . . 651
Anupama Yadav, Harsh K. Verma, and Lalit Kumar Awasthi

Hybrid Model for Heart Disease Prediction Using Random Forest and Logistic Regression . . . 657
Hemant Kumar Sharma and Amrit Lal Sangal

Detection of Android Malware Using Machine Learning Techniques . . . 663
Sonal Pandey, C. Rama Krishna, Ashu Sharma, and Sanjay Sharma

The Predictive Genetic Algorithm (GA) Load Management Mechanism for Artificial Intelligence System Implementation (AI) . . . 677
T. Pushpatha and S. Nagaprasad


Continuous Recognition of 3D Space Handwriting Using Deep Learning . . . 693
Sagar Maheshwari and Sachin Gajjar

Automated SQL Grading System . . . 701
Shohna Kanchan, Samruddhi Kalsekar, Nishita Dubey, Chelsea Fernandes, and Safa Hamdare

Error Analysis with Customer Retention Data . . . 709
V. Kaviya, V. Harisankar, and S. Padmavathi

Prediction Based Task Scheduling for Load Balancing in Cloud Environment . . . 719
Suresh Chandra Moharana, Amulya Ratna Swain, and Ganga Bishnu Mund

Test Case Generation Using Adequacy-Based Genetic Algorithm . . . 727
Ruchika Malhotra and Shivani Pandey

Performance Analysis of π, AL and CT for Consistency Regularization Using Semi-Supervised Learning . . . 737
Rishita Choubey and Koushik Bhattacharyya

An Energy-Efficient PSO-Based Cloud Scheduling Strategy . . . 749
Ranga Swamy Sirisati, M. Vishnu Vardhana Rao, S. Dilli Babu, and M. V. Narayana

A Pronoun Replacement-Based Special Tagging System for Bengali Language Processing (BLP) . . . 761
Busrat Jahan, Ismail Siddiqi Emon, Sharmin Akter Milu, Mohammad Mobarak Hossain, and Sheikh Shahparan Mahtab

A High Performance Pipelined Parallel Generative Adversarial Network (PipeGAN) . . . 769
Rithvik Chandan, Niharika Pentapati, Rahul M. Koushik, and Rahul Nagpal

Electroencephalogram-Based Classification of Brain Disorders Using Artificial Intelligence . . . 779
Laxmi Raja and R. Santhosh

Parallel Antlion Optimisation (ALO) and Grasshopper Optimization (GOA) for Travelling Salesman Problem (TSP) . . . 787
G. R. Dheemanth, V. C. Skanda, and Rahul Nagpal

Design and Development of Machine Learning Model for Osteoarthritis Identification . . . 795
Naidu Srinivas Kiran Babu, E. Madhusudhana Reddy, S. Jayanthi, and K. Rajkumar

Author Index . . . 803

Editors and Contributors

About the Editors

Dr. H. S. Saini, Managing Director of Guru Nanak Institutions, obtained his Ph.D. in the field of computer science. He has over 30 years of experience at university/college level in teaching UG/PG students and has guided several B.Tech. and M.Tech. projects and six Ph.D. scholars. He has published/presented above 90 high-quality research papers in international and national journals and proceedings of international conferences. He has published six books with Springer. He is a lover of innovation and is an advisor for the NBA/NAAC accreditation process to many institutions in India and abroad. He is chief editor of many innovative journals and has chaired various international conferences.

Dr. Rishi Sayal, Associate Director, Guru Nanak Institute of Technical Campus, has completed his B.E. (CSE), M.Tech. (IT) and Ph.D. (CSE). He obtained his Ph.D. in computer science and engineering in the field of data mining from the prestigious Mysore University of Karnataka State. He has over 28 years of experience in training, consultancy, teaching and placements. His current areas of research interest include data mining, network security and databases. He has published a wide number of research papers in international conferences and journals. He has guided many UG and PG research projects, and he is a recipient of many research grants from government funding agencies. He is co-editor of various innovative journals and has convened international conferences.

Dr. A. Govardhan is presently Professor of computer science and engineering, Rector, and Executive Council Member, Jawaharlal Nehru Technological University Hyderabad (JNTUH), India. He did his Ph.D. from JNTUH. He has 25 years of teaching and research experience. He is a member of advisory boards and academic boards and a technical program committee member for more than 85 international and national conferences. He is a member of boards of governors and academic councils for a number of colleges. He has three monographs and ten book chapters with Springer, Germany. He has guided 85 Ph.D. theses, 1 M.Phil. and 135 M.Tech. projects. He has published 555 research papers in international/national journals/conferences including IEEE, ACM, Springer, Elsevier and Inderscience. He has delivered more than 100 keynote speeches and invited lectures. He has chaired 22 sessions at international/national conferences in India and abroad. He has research projects (completed/ongoing) worth Rs. 1.159 crores.

Dr. Rajkumar Buyya is Redmond Barry Distinguished Professor and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He is also serving as Founding CEO of Manjrasoft Pvt. Ltd., a spin-off company of the university, commercializing its innovations in cloud computing. He served as Future Fellow of the Australian Research Council during 2012–2016. He received his Ph.D. from Monash University, Melbourne, Australia, in 2002. He has authored/co-authored over 625 publications. He has co-authored five textbooks and edited proceedings of over 26 international conferences. He is one of the highly cited authors in computer science and software engineering (h-index=134, g-index=298 and 95,300+ citations). He has edited proceedings of over 25 international conferences published by prestigious organizations, namely the IEEE Computer Society Press and Springer Verlag.

Contributors

M. Aathira Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
Junaid Abbas Department of Electronics and Communication Engineering, Atria Institute of Technology, Bengaluru, India
Satheesh Abimannan School of Computer Science and Engineering, Galgotias University, Greater Noida, India
Mathews Abraham Department of Information Technology, Rajagiri School of Engineering and Technology, Ernakulam, Kerala, India
T. Aditya Sai Srinivas Computer Science Department, G. Pullaiah College of Engineering and Technology, Kurnool, India
Ayesha Afnan Department of Electronics and Communication Engineering, Atria Institute of Technology, Bengaluru, India
Mohit Agarwal Bennett University, Greater Noida, India
Neeraj Kumar Agrahari Department of Computer Application, National Institute of Technology Kurukshetra, Kurukshetra, India


Hemal Ahir G H Patel College of Engineering and Technology, Vallabh Vidhyanagar, Gujarat, India Deepak Ahire Walchand College of Engineering, Sangli, Maharashtra, India K. Akhil Department of Computer Science, PES University, Bengaluru, India N. S. Akhilesh BMS College of Engineering, Bangalore, India I. Amrita Department of CSE, Global Academy of Technology, Bengaluru, Karnataka, India Nikhil Anand Internshala, Gurugram, India M. N. Aniruddha BMS College of Engineering, Bangalore, India R. V. Anirudh Computer Science and Engineering, B.M.S. College of Engineering, Basavanagudi, Bangalore, Karnataka, India A. Antonidoss Department of Computer Science and Engineering, Hindustan Institute of Technology and Science, Chennai, India Majji V. Appalanaidu Department of Computer Science, Pondicherry University Karaikal Campus, Karaikal, Pondicherry, India Karrothu Aravind Computer Science and Engineering, GMRIT Engineering College, Razam, India S. J. Arya Department of Electrical & Electronics Engineering, Mar Baselios College of Engineering & Technology, Thiruvananthapuram, India S. Ashok Kumar Jyothishmathi Institute of Technological Sciences, Karimnagar, India A. Asish Department of Electrical & Electronics Engineering, Mar Baselios College of Engineering & Technology, Thiruvananthapuram, India Lalit Kumar Awasthi Department of Computer Science and Engineering, Dr. B R Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India Brundha Rajendra Babu Computer Science and Engineering, B.M.S. College of Engineering, Basavanagudi, Bangalore, Karnataka, India Naidu Srinivas Kiran Babu Department of Computer Applications, Career Point University, Kota, India Shashi Bala Chitkara University Institute of Engineering & Technology, Chitkara University, Rajpura, Punjab, India P. Balakesava Reddy Information Technology, VNRVJIET, Hyderabad, Telangana, India
Gousiya Begum CSE Department, JNTU, Ananthapuramu, India


Smriti Bhandari Department of Computer Science and Engineering, Annasaheb Dange College of Engineering and Technology, Ashta, Maharashtra, India Shivani Bhandigare Department of Computer Science and Engineering, Walchand College of Engineering, Sangli, India Vivek Bhardwaj Chitkara University Institute of Engineering & Technology, Chitkara University, Rajpura, Punjab, India Shubam Bharti Department of Computer Science and Engineering, Punjab Engineering College, Chandigarh, India Aruna Bhat Department of Computer Science and Engineering, Delhi Technological University, Delhi, India

Prashant Bhat School of Computational Sciences and Information Technology, Garden City University, Bengaluru, Karnataka, India Rajesh Bhatia Department of Computer Science and Engineering, Punjab Engineering College, Chandigarh, India Koushik Bhattacharyya Computer Science and Engineering, Dream Institute of Technology, Kolkata, India M. S. Bhosale Department of Computer Science and Engineering, TKIET, Warnanagar, Kolhapur, India Rahul Rajendrakumar Budyal Department of ECE, Nitte Meenakshi Institute of Technology, Bengaluru, India Rithvik Chandan Department of Computer Science and Engineering, PES University, Bangalore, India Sangeetha S. Chegaraddi Department of ECE, Nitte Meenakshi Institute of Technology, Bengaluru, India Bhavya Chhabra Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India Rishita Choubey Computer Science and Engineering, Dream Institute of Technology, Kolkata, India Mukesh Choudhary SIES Graduate School of Technology, Navi Mumbai, India Maria Anisha Cross GNITC, Hyderabad, Telangana, India Arijit Das Department of Computer Science and Engineering, Delhi Technological University, Delhi, India Danny Joel Devarapalli Department of Computer Science and Engineering, Vignan Institute of Technology and Science, Hyderabad, Telangana, India G. R. Dheemanth Department of Computer Science and Engineering, PES University, Bengaluru, India


S. Dilli Babu Department of CSE, Vignan’s Institute of Management and Technology for Women, Hyderabad, India Nishita Dubey Department of Computer Engineering, St. Francis Institute of Technology, Mumbai, India Justin Eapen Faculty, Department of ECE, Saintgits College of Engineering, Kottayam, Kerala, India Ismail Siddiqi Emon Department of CSE, Feni University, Feni, Bangladesh Md. Faizul Huq Arif Department of ICT(DoICT), ICT Division, Dhaka, Bangladesh B. S. Febi Shine Department of Electrical & Electronics Engineering, Mar Baselios College of Engineering & Technology, Thiruvananthapuram, India Chelsea Fernandes Department of Computer Engineering, St. Francis Institute of Technology, Mumbai, India Snehal Gajbhiye Department of Electronics and Telecommunications, MKSSS’s Cummins College of Engineering for Women, Pune, India Sachin Gajjar Department of Electronics and Communication Engineering, Nirma University, Ahmedabad, Gujarat, India Deepak Garg Department of Computer Application, National Institute of Technology Kurukshetra, Kurukshetra, India Anirban Ghosh BMS College of Engineering, Bangalore, India S. Godwin Barnabas Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India E. A. Gopalakrishnan Center for Computational Engineering & Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India Harshit Gorijavolu Department of Computer Science and Engineering, Vignan Institute of Technology and Science, Hyderabad, Telangana, India Anuja Gote Department of Information Technology, Vidyalankar Institute of Technology, Mumbai, India A. Govardhan JNTUH, Hyderabad, India K. Govinda SCOPE, VIT University, Vellore, Tamilnadu, India C Govindasamy Department of Computer Science and Engineering, Hindustan Institute of Technology and Science, Chennai, India Anmol Gupta School of Computer Science and Engineering, Galgotias University, Greater Noida, India Himanshu Gupta Amity University, Noida, Uttar Pradesh, India


Suneet Kr. Gupta Bennett University, Greater Noida, India Sunny Gupta Department of Information Technology, Vidyalankar Institute of Technology, Mumbai, India Jaishankar Gurumurthi SIES Graduate School of Technology, Navi Mumbai, India Safa Hamdare Department of Computer Engineering, St. Francis Institute of Technology, Mumbai, India Akshaya Hari Department of Electronics and Communication Engineering, Atria Institute of Technology, Bengaluru, India N. Harini Center for Computational Engineering & Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India V. Harisankar Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India Prajna Hegde School of Computational Sciences and Information Technology, Garden City University, Bengaluru, Karnataka, India Javed Hossain Department of Computer Science and Telecommunication Engineering(CSTE), Noakhali Science and Technology University, Sonapur, Noakhali, Bangladesh Mohammad Mobarak Hossain Department of CSE, Asian University of Bangladesh, Dhaka, Bangladesh S. Zahoor Ul Huq CSE Department, GPREC, Kurnool, India Aminul Islam Department of Information Technology, Indian Institute of Information Technology Allahabad, Prayagraj, India Busrat Jahan Department of CSE, Feni University, Feni, Bangladesh Patel Jaimin Smt. K D Patel Department of Information Technology, Chandubhai S. Patel Institute of Technology (CSPIT), Faculty of Technology & Engineering (FTE), Charotar University of Science and Technology (CHARUSAT), Changa, Gujarat, India D. Jaya Kumar Department of ECE, Marri Laxman Reddy Institute of Technology and Management, Hyderabad, India S. Jayanthi Department of IT, Guru Nanak Institute of Technology, Hyderabad, India Debasish Jena Department of Computer Science and Engineering, IIIT Bhubaneswar, Bhubaneswar, Odisha, India Ashok Jetawat Faculty of Engineering, Pacific Academy of Higher Education and Research University, Udaipur, India


G. Jeyakumar Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Shikha Jha Department of Information Technology, Vidyalankar Institute of Technology, Mumbai, India Jeffin John Department of ECE, Saintgits College of Engineering, Kottayam, Kerala, India Jewel Moncy John Department of ECE, Saintgits College of Engineering, Kottayam, Kerala, India Ebin Joseph Department of ECE, Saintgits College of Engineering, Kottayam, Kerala, India Md. Julfiker Raju Department of CSE, Feni University, Feni, Bangladesh D. Sujitha Juliet Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India Aishwarya Kadam Department of Computer Science and Engineering, Walchand College of Engineering, Sangli, India Virender Kadyan Department of Informatics, School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India Samruddhi Kalsekar Department of Computer Engineering, St. Francis Institute of Technology, Mumbai, India Sweta Kaman Department of Science of Intelligence, IIT Jodhpur, Karwar, India Kiran Kamble Department of Computer Science and Engineering, Walchand College of Engineering, Sangli, Maharashtra, India Kiran P. Kamble Department of Computer Science and Engineering, Walchand College of Engineering, Sangli, India R. M. Kamble Department of Computer Science and Engineering, ADCET, ASTHA, Ashta, Kolhapur, India Shohna Kanchan Department of Computer Engineering, St. Francis Institute of Technology, Mumbai, India Aditi Karjole Department of Electronics and Telecommunications, MKSSS’s Cummins College of Engineering for Women, Pune, India Sai Prashanth Reddy Karri Department of Computer Science and Engineering, Vignan Institute of Technology and Science, Hyderabad, Telangana, India Shivam Kathuria Department Electrical Engineering, Punjab Engineering College, Chandigarh, India Pavan Kumar Katkuri Chitkara University Institute of Engineering and Technology, Punjab, India


Harshdeep Kaur Chitkara University, Institute of Engineering and Technology, Chitkara University, Punjab, India V. Kaviya Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India Shahnawaz Khan Department of Information Technology, University College of Bahrain, Saar, Bahrain Prajkta Kodavade Department of Computer Science and Engineering, Walchand College of Engineering, Sangli, India Rahul M. Koushik Department of Computer Science and Engineering, PES University, Bangalore, India Vijay Krishna Menon Center for Computational Engineering & Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India Linganagouda Kulkarni KLE Technological University, Hubli, India Tejas Kulkarni Department of Information Technology, Vidyalankar Institute of Technology, Mumbai, India Manish Kumar Department of Computer Science and Engineering, Punjab Engineering College, Chandigarh, India Bhabendu Kumar Mohanta Department of Computer Science and Engineering, IIIT Bhubaneswar, Bhubaneswar, Odisha, India G. Kumaravelan Department of Computer Science, Pondicherry University Karaikal Campus, Karaikal, Pondicherry, India Rakesh Kumar Department of CSE, CUH Mahendergarh, Mahendergarh, Haryana, India Dara Anitha Kumari Department of Computer Science, JNTUH, Hyderabad, India Ruchika Kumari Department of CSE, NITTTR, Chandigarh, India Leena Ladge SIES Graduate School of Technology, Navi Mumbai, India E. Lakshmi Priya Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Veerpaul Kaur Maan Giani Zail Singh Punjab Technical University, Bathinda, Punjab, India Sagar Maheshwari Department of Electronics and Communication Engineering, Nirma University, Ahmedabad, Gujarat, India Sheikh Shahparan Mahtab Department of EEE, Feni University, Feni, Chittagong Division, Bangladesh


Soumayadev Maity Department of Information Technology, Indian Institute of Information Technology Allahabad, Prayagraj, India Pradnya Malaganve Department Computational Science and IT, Garden City University, Bengaluru, India Rajshri N. Malage Department of CSE, N K Orchid College of Engineering and Technology Solapur, Solapur, India Y. L. Malathi Latha Department of CSE, Swami Vivekananda Institute of Technology, Secunderabad, Telangana State, India Ruchika Malhotra Department of Computer Science and Engineering, Delhi Technological University, New Delhi, India S. S. Manivannan SCOPE, VIT University, Vellore, India Archana Mantri Chitkara University Institute of Engineering and Technology, Punjab, India Yash Mate Computer Department, Vivekanand Education Society’s Education of Society Chembur, Chembur, Mumbai, India Aakansha Mathur Department of Computer Science, BITS Pilani, Dubai, United Arab Emirates Venkata Sai Dheeraj Mavilla Department of Computer Science and Engineering, Vignan Institute of Technology and Science, Hyderabad, Telangana, India Sharmin Akter Milu Department of Computer Science and Telecommunication Engineering(CSTE), Noakhali Science and Technology University, Sonapur, Noakhali, Bangladesh Niva Mohapatra Department of Computer Science and Engineering, IIIT Bhubaneswar, Bhubaneswar, Odisha, India Suresh Chandra Moharana KIIT Deemed to be University, Bhubaneswar, Odisha, India Kapila Moon Department of Electronics Engineering, Ramrao Adik Institute of Technology, Navi Mumbai, India Ganga Bishnu Mund KIIT Deemed to be University, Bhubaneswar, Odisha, India Francis K. Mupila Amity University, Noida, Uttar Pradesh, India T. Murali Krishna Department of ECE, Vignan’s Lara Institute of Technology and Science, Vadlamudi, AP, India GRS Murthy Department of Computer Science and Engineering, Avanthi Institute of Engineering and Technology, Vizianagaram, Andhra Pradesh, India P. Myna Computer Science and Engineering, B.M.S. College of Engineering, Basavanagudi, Bangalore, Karnataka, India


S. Nagaprasad Department of M.C.A., St.Ann’s College, Mehdipatnam, Hyderabad, Telanagana, India; Faculty of CS and CA, Tara Govt. College (A), Sangareddy, Telangana, India Akash Nagaraj Department of Computer Science, PES University, Bengaluru, India Nikhil Nagdev Computer Department, Vivekanand Education Society’s Education of Society Chembur, Chembur, Mumbai, India Rahul Nagpal Department of Computer Science and Engineering, PES University, Bengaluru, India Akhil Nair SIES Graduate School of Technology, Navi Mumbai, India M. V. Narayana Department of CSE, Guru Nanak Institutions Technical Campus, Hyderabad, India Jyothi S. Nayak Computer Science and Engineering, B.M.S. College of Engineering, Basavanagudi, Bangalore, Karnataka, India Patel Nehal Smt. K D Patel Department of Information Technology, Chandubhai S. Patel Institute of Technology (CSPIT), Faculty of Technology & Engineering (FTE), Charotar University of Science and Technology (CHARUSAT), Changa, Gujarat, India Sri Anjaneya Nimmakuri Department of Computer Science and Engineering, Vignan Institute of Technology and Science, Hyderabad, Telangana, India Ashadun Nobi Department of Computer Science and Telecommunication Engineering(CSTE), Noakhali Science and Technology University, Sonapur, Noakhali, Bangladesh Ramesh Nuthakki Department of Electronics and Communication Engineering, Atria Institute of Technology, Bengaluru, India S. Padmavathi Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India Mithilesh Kumar Pandey Department of Computer Application, National Institute of Technology Kurukshetra, Kurukshetra, India Shivani Pandey Department of Computer Science and Engineering, Delhi Technological University, New Delhi, India Sonal Pandey NITTTR Chandigarh, Chandigarh, India Meet Patel G H Patel College of Engineering and Technology, Vallabh Vidhyanagar, Gujarat, India Bageshree Pathak Department of Electronics and Telecommunications, MKSSS’s Cummins College of Engineering for Women, Pune, India


Gayatri Patil Computer Department, Vivekanand Education Society’s Education of Society Chembur, Chembur, Mumbai, India Mithun B. Patil Department of CSE, N K Orchid College of Engineering and Technology Solapur, Solapur, India S. T. Patil Department of CSE, Sanjay Ghodawat University, Kolhapur, India Sanjay S. Pawar Department of EXTC, UMIT, SNDT Women’s University, Mumbai, India Sonali Pawar Department of Electronics and Telecommunications, MKSSS’s Cummins College of Engineering for Women, Pune, India Niharika Pentapati Department of Computer Science and Engineering, PES University, Bangalore, India B. L. Pranitha Department of CSE, Global Academy of Technology, Bengaluru, Karnataka, India Eleanor Prashamshini Computer Science and Engineering, B.M.S. College of Engineering, Basavanagudi, Bangalore, Karnataka, India B. Premjith Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyappetham, Coimbatore, India R. L. Priya Computer Department, Vivekanand Education Society’s Education of Society Chembur, Chembur, Mumbai, India T. Pushpatha Department of M.C.A., St.Ann’s College, Mehdipatnam, Hyderabad, Telanagana, India; Faculty of CS and CA, Tara Govt. College (A), Sangareddy, Telangana, India Anisha Radhakrishnan Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, India Mostafizur Rahman Department of Electronics and Communication Engineering, Khulna University of Engineering and Technology, Khulna, Bangladesh Laxmi Raja Department of CSE, Faculty of Engineering, Karpagam Academy of Higher Education, Coimbatore, India R. Raja Kishore Department of ECE, Marri Laxman Reddy Institute of Technology and Management, Hyderabad, India S. Rajakarunakaran Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India K. Rajkumar School of Computer Science and Information Technology, DMI-St John the Baptist University, Mangochi, Malawi B. Rama Department of CS, Kakatiya University, Warangal, Telangana, India C. Rama Krishna NITTTR Chandigarh, Chandigarh, India


Somula Ramasubbareddy Information Technology, VNRVJIET, Hyderabad, Telangana, India B. Ramji Center for Computational Engineering & Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India Sudhanshu Ranjan Department of Information Technology, Indian Institute of Information Technology Allahabad, Prayagraj, India Abhinav Rastogi Department of Computer Science and Engineering, Delhi Technological University, Delhi, India Akanksha Rastogi Department of Computer Science and Engineering, Dr. B R Ambedkar National Institute of Technology, Jalandhar, Punjab, India Arun Pratap Rawat Department of Information Technology, Indian Institute of Information Technology Allahabad, Prayagraj, India E. Madhusudhana Reddy Department of CSE, Guru Nanak Institutions Technical Campus, Hyderabad, India Neha Redekar Department of Computer Science and Engineering, Walchand College of Engineering, Sangli, India Mathew Regi Department of Information Technology, Rajagiri School of Engineering and Technology, Ernakulam, Kerala, India V. Rijith Kumar Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India Chatla Venkat Rohit Department of School of Computing, Sastra University, Thanjavur, Tamil Nadu, India Gitimayee Sahu Department of EXTC, UMIT, Juhu, Mumbai, India C. Sai Sreekari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Ch. Saidulu Department of ECE, Vignan’s Lara Institute of Technology and Science, Vadlamudi, AP, India D. Saidulu Information Technology, Guru Nanak Institutions Technical Campus, Hyderabad, Telangana, India V. V. Sajith Variyar Center for Computational Engineering & Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India Poonam Salwan I.K. Gujral Punjab Technical University, Jalandhar, Punjab, India Patel Sandip Smt. K D Patel Department of Information Technology, Chandubhai S. Patel Institute of Technology (CSPIT), Faculty of Technology & Engineering (FTE), Charotar University of Science and Technology (CHARUSAT), Changa, Gujarat, India


Amrit Lal Sangal Department of Computer Science and Engineering, Dr. B R Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India N. Sankarachelliah Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India R. Santhosh Department of CSE, Faculty of Engineering, Karpagam Academy of Higher Education, Coimbatore, India S. Saravanan Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India K. Sateesh Kumar Department of ECE, Vignan’s Lara Institute of Technology and Science, Vadlamudi, AP, India K. Satyavathi Department of ECE, Nalla Malla Reddy Engineering College, Hyderabad, India N. Satyavathi Department of CSE, JNTUH, Hyderabad, Telangana, India T. Selva Sundar Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India Protiva Sen Department of Electronics and Communication Engineering, Khulna University of Engineering and Technology, Khulna, Bangladesh Snigdha Sen Department of CSE, Global Academy of Technology, Bengaluru, Karnataka, India P. Senthilram Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India T. Shanmuganantham Pondicherry University, Puducherry, India Faisal Ahmed Shariff Department of Electronics and Communication Engineering, Atria Institute of Technology, Bengaluru, India

Ashu Sharma Mindtree Hyderabad, Hyderabad, India Hemant Kumar Sharma Department of Computer Science and Engineering, Dr. B R Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India Kapil Sharma Department of Information Technology, Delhi Technological University, New Delhi, Delhi, India Sanjay Sharma C3i, IIT Kanpur, Kanpur, India K. Chandra Shekar JNTUH, Hyderabad, Telangana, India K. Sindhu BMS College of Engineering, Bangalore, India Shivam Singh Department of Computer Application, National Institute of Technology Kurukshetra, Kurukshetra, India Shreyanshi Singh Department of Computer Science and Engineering, IIIT Bhubaneswar, Bhubaneswar, Odisha, India


Ranga Swamy Sirisati Department of CSE, Vignan’s Institute of Management and Technology for Women, Hyderabad, India A. P. Siva Kumar MGIT, Hyderabad, India V. C. Skanda Department of Computer Science and Engineering, PES University, Bengaluru, India K. P. Soman Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyappetham, Coimbatore, India Ramasubbareddy Somula Information Technology, VNRVJIET, Hyderabad, India V. Sowmya Center for Computational Engineering & Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India J. L. Sreelakshmi Department of Electrical & Electronics Engineering, Mar Baselios College of Engineering & Technology, Thiruvananthapuram, India K. Sreelakshmi Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyappetham, Coimbatore, India H. R. Srikanth Department of Computer Science, PES University, Bengaluru, India Raghunandan Srinath SenZopt Technologies, Bengaluru, India Pattlola Srinivas Department of CSE, Malla Reddy Engineering College (Autonomous), Hyderabad, Telangana State, India Dhonvan Srinu Department of ECE, Marri Laxman Reddy Institute of Technology and Management, Hyderabad, India S. Subramanyam Jyothishmathi Institute of Technological Sciences, Karimnagar, India Razia Sultana Department of Computer Science, BITS Pilani, Dubai, United Arab Emirates Y. R. Sumukh Department of ECE, Nitte Meenakshi Institute of Technology, Bengaluru, India Amulya Ratna Swain KIIT Deemed to be University, Bhubaneswar, Odisha, India M. Swami Das Department of CSE, Malla Reddy Engineering College (Autonomous), Hyderabad, Telangana State, India Falgun Thakkar G H Patel College of Engineering and Technology, Vallabh Vidhyanagar, Gujarat, India B. K. Thejas Department of CSE, Global Academy of Technology, Bengaluru, Karnataka, India


K. Thirunavukkarasu School of Computer Science and Engineering, Galgotias University, Greater Noida, India Abraham K Thomas Department of ECE, Saintgits College of Engineering, Kottayam, Kerala, India Gaurav Tirodkar Computer Department, Vivekanand Education Society’s Education of Society Chembur, Chembur, Mumbai, India T. Tulasi Sasidhar Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyappetham, Coimbatore, India P. Udaya Bhanu Department of ECE, Vignan’s Lara Institute of Technology and Science, Vadlamudi, AP, India S. Valai Ganesh Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India Elizabeth Varghese Department of Electrical & Electronics Engineering, Mar Baselios College of Engineering & Technology, Thiruvananthapuram, India Vignesh Vasudevan NIT, Trichy, Tamil Nadu, India Akshay Venkatesh Department of Computer Science, PES University, Bengaluru, India K. Venugopal Rao Jyothishmathi Institute of Technological Sciences, Karimnagar, India Harsh K. Verma Department of Computer Science and Engineering, Dr. B R Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India Prashant Verma Department of Information Technology, Delhi Technological University, New Delhi, Delhi, India Hrishikesh Vichore SIES Graduate School of Technology, Navi Mumbai, India P. Vijay Kumar Department of ECE, Vignan’s Lara Institute of Technology and Science, Vadlamudi, AP, India M. Vijayalakshmi KLE Technological University, Hubli, India M. Vishnu Vardhana Rao Department of CSE, Vignan’s Institute of Management and Technology for Women, Hyderabad, India Jayavrinda Vrindavanam Department of ECE, Nitte Meenakshi Institute of Technology, Bengaluru, India Anupama Yadav Department of Computer Science and Engineering, Dr. B R Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India L. Yashaswini Department of ECE, Nitte Meenakshi Institute of Technology, Bengaluru, India

Static and Dynamic Activities Prediction of Human Using Machine and Deep Learning Models S. Valai Ganesh, Mohit Agarwal, Suneet Kr. Gupta, and S. Rajakarunakaran

Abstract Recent advances in smart phones and computing technologies have played a vital role in people's lives. The major aim of this paper is to develop a model that detects basic dynamic human activities, such as walking, climbing stairs and coming down stairs, and basic static activities, such as sitting, standing and laying, using a person's smart phone and computer. The results of conventional machine learning models such as Logistic Regression, SVC and Decision Tree are compared with a recurrent deep neural network model, the Long Short-Term Memory (LSTM) network. The LSTM is proposed to detect human behavior based on the Human Activity Recognition (HAR) dataset. The data are monitored and recorded with the aid of the accelerometer and gyroscope sensors in the user's smart phone. The HAR dataset was collected from 30 persons performing different activities with a smart phone attached to their waists. The model is evaluated with respect to accuracy and efficiency. The designed activity recognition system can be extended to other tasks such as predicting abnormal human actions or detecting disease from human actions. The overall accuracy improved to 95.40%. Keywords Human activity recognition · LSTM · Sensors · Smart phones · Recurrent neural network · Gyroscope · Accelerometer

S. Valai Ganesh (B) · S. Rajakarunakaran Ramco Institute of Technology, Rajapalayam, Tamil Nadu 626117, India e-mail: [email protected] S. Rajakarunakaran e-mail: [email protected] M. Agarwal · S. Kr. Gupta Bennett University, Greater Noida, India e-mail: [email protected] S. Kr. Gupta e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_1


1 Introduction Human behavior detection from static and dynamic motions is a recent technology that identifies human activities through computer and smart phone systems. A typical human behavior detection dataset is the 'Activity Recognition Using Smart Phones Dataset' available on the internet. Input data can be taken from several sorts of devices, such as sensors for capturing images, recording audio, and monitoring pressure, orientation and acceleration. The rapid development of communication between humans and computers and between humans and smart phones has led to identifying human activities in every aspect of life. More importantly, the recent introduction of GPUs and deep learning [1, 2] algorithms has enabled human behavior detection applications in areas such as athletic competition, smart home automation and health care or monitoring for the elderly. In the current scenario, two types of methods are available for human behavior detection: the first uses live images of human behavior and the second uses wearable sensors [3, 9]. Using the sensors (gyroscope and accelerometer) in a smart phone [5], data such as acceleration and orientation are recorded under several variations. Accelerometer [4, 6] and gyroscope readings are taken from 30 volunteers (referred to as subjects) while performing static activities like sitting, standing or laying and dynamic activities like walking, walking upstairs and walking downstairs. Accelerometer readings are divided into gravity acceleration and body acceleration readings, each of which is three-dimensional. Each sensor signal is preprocessed with noise filters. The remaining portion of the article is organized as follows. Section 2 discusses related research work completed by the research community. Dataset particulars of the proposed work are provided in Sect. 3. Sections 4 and 5 discuss the machine learning models and the LSTM model, respectively. The experimental results are discussed in Sect. 6. The article ends in Sect. 7 with a comparison of the machine learning models' accuracy with the deep learning model's accuracy, along with possible extensions of the work with new deep learning models in the future.

2 Related Work Bayat et al. [4] developed two different models for detecting human activities: one named "in-hand" and another "in-pocket". Six different activities are detected: fast walk, slow walk, running, stairs-up, stairs-down and dancing. A tri-axial accelerometer is used to detect the activities. Six different classification methods are adopted for this work and their results are compared. A testing accuracy of up to 91.15% is achieved on everyday activities using the accelerometer.


Bulbul et al. [5] predicted human behavior using smartphones. Sensors such as the accelerometer and gyroscope are used to predict human behaviors. The dataset contains information on nine individuals performing three different dynamic activities (walking, climbing up the stairs, climbing down the stairs) and three different static activities (sitting, standing and laying). Input data are sampled at a frequency of 50 Hz, and the signals are received and saved for every segment. The designed models are first trained with 80% of the total dataset and tested with the remaining 20%. Models are developed, observed and tested using fivefold cross-validation. Various conventional machine learning classification models such as Decision Trees and SVM were used in this work. Attal et al. [3] provide an overview of various techniques to detect human behavior from wearable inertial sensing units. Sensors were located on different portions of the human body; in particular, the sensing devices were placed around the lumbar region and the waist for detecting various static and dynamic activities. A sequence of forty tasks was chosen for the study. Among the conventional machine learning classification techniques used in their study, k-Nearest Neighbors produced the best accuracy.

3 HAR Dataset Accelerometer and gyroscope readings are taken from 30 human beings (referred to as subjects) while performing the following six classes (labels): static classes (standing, sitting, laying) and dynamic classes (walking, walking upstairs, walking downstairs). Accelerometer outputs are separated into two parameters, namely gravity acceleration and body acceleration readings, each having three components x, y and z. Gyroscope readings represent angular velocities along the three axes. Jerk signals are derived from the body acceleration readings. Fourier transforms are applied to the above time-domain readings to calculate frequency-domain readings. There are 561 features in the dataset, and each window of readings is a data point of 561 features for a subject. The data from the thirty subjects are randomly split so that 70% (21 subjects) form the training data and the rest form the test data. Each data point corresponds to one of the six activity classes.
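The paper does not list its data-loading code; the sketch below shows one way the 561-feature windows and activity labels could be read, assuming the standard file layout of the public 'Human Activity Recognition Using Smartphones' dataset (X_train.txt, y_train.txt and so on). The directory names are assumptions.

import numpy as np
import pandas as pd

# Assumed layout of the public HAR dataset: each row of X_<split>.txt is one
# window of 561 features; each row of y_<split>.txt is its activity label (1..6).
ACTIVITIES = {1: "WALKING", 2: "WALKING_UPSTAIRS", 3: "WALKING_DOWNSTAIRS",
              4: "SITTING", 5: "STANDING", 6: "LAYING"}

def load_split(split):                      # split is "train" or "test"
    X = pd.read_csv(f"HAR/{split}/X_{split}.txt",
                    delim_whitespace=True, header=None).values
    y = pd.read_csv(f"HAR/{split}/y_{split}.txt", header=None).values.ravel()
    return X, y

X_train, y_train = load_split("train")      # 21 subjects (70%)
X_test, y_test = load_split("test")         # remaining 9 subjects
print(X_train.shape, sorted(np.unique(y_train)))   # e.g. (7352, 561) [1, 2, 3, 4, 5, 6]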

4 Machine Learning Models The HAR dataset was initially tested by developing models using conventional machine learning methods, namely Logistic Regression, Linear SVC, SVM, Random Forest, Decision Tree and Gradient Boosting. The results of these models are shown in the experimental results section.
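A minimal scikit-learn sketch of how the six conventional classifiers could be compared on the 561-feature vectors; the hyperparameters are left at library defaults, which is an assumption since the paper does not state them.

from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC, SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVC": LinearSVC(max_iter=5000),
    "SVM (RBF)": SVC(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Gradient Boosting": GradientBoostingClassifier(),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)               # X_train, y_train from the loader sketched above
    pred = clf.predict(X_test)
    print(f"{name:20s}"
          f" acc={accuracy_score(y_test, pred):.4f}"
          f" prec={precision_score(y_test, pred, average='weighted'):.4f}"
          f" rec={recall_score(y_test, pred, average='weighted'):.4f}"
          f" f1={f1_score(y_test, pred, average='weighted'):.4f}")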


Fig. 1 Precision and recall comparison of machine learning methods

Fig. 2 Accuracy and F1-score comparison of machine Learning methods

The precision results of all machine learning models are compared in Fig. 1. The Linear SVC and SVM models produced almost the same value, whereas the Decision Tree model produced the lowest value. The recall comparison shows that Linear SVC produced a somewhat better value than the other machine learning models. Similarly, the other two parameters, accuracy and F1 score, are compared in the bar chart of Fig. 2. There, the Linear SVC model provides better accuracy than the other five machine learning models, and Decision Tree provides the lowest accuracy; Linear SVC also shows a good F1 score compared with all the others. Among the six machine learning models, Linear SVC produced the best results, Decision Tree produced the worst, and SVM provided decent results.

5 Deep Learning Model-LSTM The Long Short-Term Memory model is selected in this work. LSTMs are able to learn long-term dependencies and work extremely well for sequential modeling, i.e., the ability to predict what comes next in a sequence. Problems such as vanishing and exploding gradients normally occur during the backpropagation-through-time (BPTT) process, and the LSTM overcomes these problems. LSTMs have a chain-like structure; on the other


Fig. 3 Overview of LSTM model

hand, the repeating module has a different structure. There are four interacting layers in an LSTM [7]. The interacting layers are equipped with pointwise operations and activation functions such as the sigmoid or hyperbolic tangent. In the usual diagram, merging lines indicate concatenation, forking lines denote copying, and the copied content is fed to different portions of the interacting layers. The LSTM is able to forget information or append incoming data to the cell state by means of structures called gates (Fig. 3). Two packages are used with the LSTM: Hyperopt and Hyperas. Hyperopt is an open-source Python library for performing optimization over serial and parallel search spaces, which may include real-valued, discrete and conditional dimensions. Hyperas is a wrapper around Hyperopt for optimizing Keras models. The softmax activation function is used here to predict the six different classes. Softmax provides a probability distribution over the outputs and is used when multiple classes are involved. In this work there are six class labels, and the softmax function provides a probability for each class; the output is predicted from the highest probability. The softmax function is normally located in the final layer of a classification problem.
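A minimal Keras sketch of the single-layer LSTM (Model-I of Table 1), assuming the raw inertial signals are framed as 128 time steps of 9 channels as in the public HAR raw data; the dropout rate and the optimizer are illustrative assumptions, not the paper's settings.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

n_timesteps, n_channels, n_classes = 128, 9, 6   # 128-sample windows of 9 inertial signals

model = Sequential([
    # 64 LSTM units over 9 input channels -> 4*(64*(9+64)+64) = 18,944 parameters,
    # matching the lstm_1 row of Table 1.
    LSTM(64, input_shape=(n_timesteps, n_channels)),
    Dropout(0.5),                                 # illustrative dropout rate
    Dense(n_classes, activation="softmax"),       # probability over the six activity classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# Training sketch: raw_train has shape (num_windows, 128, 9) and labels_train is in 0..5.
# history = model.fit(raw_train, labels_train, epochs=50,
#                     validation_data=(raw_test, labels_test))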

6 Experimental Results In this work, human activities are recognized based upon movements. The experiment is performed with Python version 3. Initially, the work started with a single LSTM layer, then expanded to a two-layer LSTM and to an LSTM with hyperparameter tuning over 15 evaluations. The outcomes of the LSTM models are shown in Tables 1, 2 and 3 (Fig. 4). The same HAR dataset is trained and tested with the deep learning models: single-layer LSTM, two-layer LSTM and LSTM with tuned hyperparameters. The single-layer LSTM produced around 92.40% validation accuracy, the two-layer LSTM produced slightly improved results of around 92.43%, and the LSTM with tuned hyperparameters provides around 95.40% accuracy using the Hyperopt modules. The Linear SVC model and the LSTM with


Table 1 LSTM single layer output results—Model-I

Classifier            Output shape      Parameters
lstm_1 (LSTM)         (None, 64)        18,944
Dropout_1 (dropout)   (None, 64)        0
Dense_1 (dense)       (None, 6)         342
Epochs: 50            Accuracy: 92.40%

Table 2 LSTM two layer output results—Model-II

Classifier            Output shape      Parameters
lstm_2 (LSTM)         (None, 128, 64)   18,944
Dropout_2 (dropout)   (None, 128, 64)   0
lstm_3 (LSTM)         (None, 56)        27,104
Dropout_3 (dropout)   (None, 56)        0
Dense_2 (dense)       (None, 6)         342
Epochs: 50            Accuracy: 92.43%

Table 3 LSTM hyperparameters—Model-III

Classifier            Output shape      Parameters
lstm_4 (LSTM)         (None, 32)        5376
Dropout_4 (dropout)   (None, 32)        0
Dense_3 (dense)       (None, 6)         198
Epochs: 30            Evaluations: 15   Accuracy: 95.40%

Fig. 4 a LSTM validation single layer; b LSTM validation two layer; c LSTM validation above two layer


tuned hyperparameters produce almost the same validation results. In upcoming work, a Convolutional Neural Network based model and a ResNet based model will be developed and checked for their accuracy levels. By adopting new deep learning models, many more labels of human activity can be predicted under both normal and abnormal conditions.
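The search space used for the 15 Hyperopt evaluations is not given in the paper; the sketch below shows a plausible setup using plain hyperopt, with an assumed space over the number of LSTM units and the dropout rate, and a hypothetical build_and_train helper that wraps the Keras model sketched earlier.

from hyperopt import fmin, tpe, hp, Trials, STATUS_OK

space = {
    "units": hp.quniform("units", 32, 128, 8),    # assumed search range for LSTM units
    "dropout": hp.uniform("dropout", 0.2, 0.6),   # assumed search range for dropout
}

def objective(params):
    # build_and_train is a hypothetical helper that builds the Keras LSTM with these
    # hyperparameters, trains it for 30 epochs and returns the validation accuracy.
    val_acc = build_and_train(units=int(params["units"]), dropout=params["dropout"])
    return {"loss": -val_acc, "status": STATUS_OK}

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=15, trials=trials)          # 15 evaluations, as in Table 3
print("Best hyperparameters:", best)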

7 Conclusion and Future Work Smart phone applications are not limited to communications and networking. Various deep learning models are slowly being incorporated into smart phones in order to collect data for various activities, and HAR is one of the important outcomes of this capability. The LSTM produces 95.40% testing accuracy. In the future, we are planning to add more classes, develop other models such as CNN and ResNet models, and check their validation accuracy levels. Acknowledgements We are thankful to RAMCO Institute of Technology and Bennett University for providing expertise that greatly assisted the research, although they may not agree with all of the interpretations provided in this paper.

References
1. Agarwal, M., Kaliyar, R.K., Singal, G., Gupta, S.K.: FCNN-LDA: a faster convolution neural network model for leaf disease identification on apple's leaf dataset. In: 2019 12th International Conference on Information & Communication Technology and System (ICTS), pp. 246–251. IEEE (2019)
2. Agarwal, M., Sinha, A., Gupta, S.K., Mishra, D., Mishra, R.: Potato crop disease classification using convolutional neural network. In: Smart Systems and IoT: Innovations in Computing, pp. 391–400. Springer (2020)
3. Attal, F., Mohammed, S., Dedabrishvili, M., Chamroukhi, F., Oukhellou, L., Amirat, Y.: Physical human activity recognition using wearable sensors. Sensors 15(12), 31314–31338 (2015)
4. Bayat, A., Pomplun, M., Tran, D.A.: A study on human activity recognition using accelerometer data from smartphones. Procedia Comput. Sci. 34, 450–457 (2014)
5. Bulbul, E., Cetin, A., Dogru, I.A.: Human activity recognition using smartphones. In: 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), pp. 1–6. IEEE (2018)
6. Kwapisz, J.R., Weiss, G.M., Moore, S.A.: Activity recognition using cell phone accelerometers. ACM SigKDD Expl. Newslett. 12(2), 74–82 (2011)
7. Kwon, M.C., Choi, S.: Recognition of daily human activity using an artificial neural network and smartwatch. Wirel. Commun. Mob. Comput. 2018 (2018)
8. Polu, S.K., Polu, S.K.: Human activity recognition on smartphones using machine learning algorithms. Int. J. Innov. Res. Sci. Technol. 5(6), 31–37 (2018)
9. Sousa Lima, W., Souto, E., El-Khatib, K., Jalali, R., Gama, J.: Human activity recognition using inertial sensors in a smartphone: an overview. Sensors 19(14), 3213 (2019)
10. Sun, J., Fu, Y., Li, S., He, J., Xu, C., Tan, L.: Sequential human activity recognition based on deep convolutional network and extreme learning machine using wearable sensors. J. Sens. 2018 (2018)

Implementation of Braille Tab Tejas Kulkarni, Shikha Jha, Sunny Gupta, and Anuja Gote

Abstract The Braille tab is an electronic device that performs the various functions of a general e-book reader, but for visually impaired people; the tab has features that make it user-friendly and convenient both for those who already know the Braille language and for students. The objective of this paper is to demonstrate the various stages of development of the tab, covering both its software and hardware aspects. An Android application called "Braille Tab Companion," which is compatible with the tab, has also been developed to improve the user experience and helps in accessing the various features of the tab. Education is a special focus here, as both students and teachers can use the system for classroom learning as well as self-learning by various means; a Braille equivalent of all the information is displayed. The paper contains a detailed illustration and discussion of the software and hardware aspects, including the application and the electronic components used in the system. Keywords Braille tab · Android app · Firebase · Shift register · Multiplexer

1 Introduction The digital revolution has brought tremendous changes in the field of education. One can access a pool of knowledge at their fingertips. This revolution has made our society more educated and liberal, due to which the overall standard of living has improved. But for the visually impaired, acquiring knowledge is not easy even in 2020. Efforts T. Kulkarni (B) · S. Jha · S. Gupta · A. Gote Department of Information Technology, Vidyalankar Institute of Technology, Mumbai, India e-mail: [email protected] S. Jha e-mail: [email protected] S. Gupta e-mail: [email protected] A. Gote e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_2


have been made to provide them knowledge using audiobooks and Braille-printed books. However, research has shown that dependence on audiobooks reduces cortical plasticity, an important factor in cognitive development [1]. A part of the socioeconomic strata is not able to afford information sources for the blind because only a limited number of books are available in Braille lipi. Often, these books are bulky and expensive, which limits their accessibility in social circles. In recent years, refreshable tactile displays have been developed, thereby allowing the blind to access information available online. In most of these projects, more emphasis was given to the hardware aspect, making them accurate but not focused on ease of use. Hence, the need to develop a system ensuring easy access to Braille tabs was realized and worked upon.

2 Related Work After conducting a thorough survey of existing systems, it was learnt that diverse research has been performed on the hardware aspect of tactile displays. Actuators composed of piezoelectric material, shape-memory alloys and solenoids have been used to raise the individual pins of a Braille character [2]. The bending characteristics of electroactive polymers have been utilized to provide hydraulic actuation of Braille dots [3]. Pneumatic signals were used to raise the dots of several Braille cells arranged in a row [4]. An MCU was designed to convert Chinese or English text into Braille, play music, and provide keyboard and other display features [5]. Character recognition was used for system development [6]. An Arduino-based tab was created for Devanagari-to-Braille conversion [7]. Solidification of liquid-state alloys was used to allow locking of Braille dots [8]. However, a few drawbacks have been noted in the above systems: (a) they are not commercially viable, (b) the actuator mechanism can break when excessive pressure is applied, (c) the one-actuator-per-dot mechanism can make the device bulky, and (d) the refresh rate is not satisfactory in all the available tabs [9].

3 Proposed System • The Braille tab is, in effect, an e-book reader for the visually impaired, with special emphasis given to its usage in educational institutions. For testing purposes, LEDs are used, which can easily be replaced with solenoid actuators for making tactile displays [10] (see Fig. 1). • The device uses a hierarchical architecture consisting of shift registers and multiplexers for accessing each character individually. This makes it energy efficient and reduces maintenance costs. • Android applications are helpful for making any device user-friendly. Many inbuilt features help users access the app and help developers create such apps without losing focus on the main objective of the application.


Fig. 1 Selecting 1st horizontal line

• "Braille Companion" acts as a mediator between the user and the Braille tab, helping the user utilize the system effectively by providing a smooth flow. The Android application is designed keeping in mind that most of the users will be visually impaired; hence, TalkBack descriptions and large buttons are provided for improving the user experience.

4 Implementation and Working (Software) • To provide a secure system and hassle-free usage, a biometric lock is enabled for the app. Only fingerprints that are already registered for unlocking the device and saved in the TEE are valid. An authenticated user is prompted accordingly (Fig. 3). • The application uses the Android ID, which is unique, assigned during initial boot-up, and remains unchanged unless a factory reset is performed. It is used to identify the user uniquely in the Firebase database. • The absence of a user profile indicates that the user is not registered, so the user is redirected to the registration page (Fig. 4a). • Users enrolled in the institution must fill in their Institute ID, after which they will be redirected to another login page (Fig. 4b). After successful registration, the data is uploaded to Firebase. To control accessibility, such a user needs approval from the admin. Newly arrived requests are listed as shown in Fig. 7. • Unless the admin takes action, the user must wait, and the user is prompted accordingly if a login attempt is made. • If the admin accepts the request, a push notification is sent using Cloud Messaging and Volley. Depending on the admin's action, the user profile is updated on Firebase (Fig. 6). • A teacher and a student can both choose the purpose of using the tab through two modes: "Classroom" and "Self-Learning" (Fig. 8). • After joining a classroom, the student can access the data available for that particular classroom.


Fig. 2 Selecting 1st character from first line

Fig. 3 Login page

• For a non-institutional user, classroom mode is not available, and only self-learning mode is accessible. • After a mode is selected, the user has to choose the type of data to be uploaded (Fig. 9). • In document mode, .txt files can be selected, which are then uploaded to Firebase Storage, and their download link is uploaded to Firebase for the tab to access the


Fig. 4 Registration request

Fig. 5 Firestorage

Fig. 6 Notification

file. Once the operation is done successfully, the user is prompted accordingly (Figs. 5 and 10).


Fig. 7 Registration request

Fig. 8 Select mode

• "Dictate" mode uses Google's speech-to-text API (see Fig. 11) to perform speech-to-text conversion. The output string is confirmed using the TalkBack system and then uploaded to Firebase (Table 1). Table 1 maps the flag values set by the application to the download location for the tab in Firebase. For example, when a teacher uses the dictate feature in self-learning mode, the flag values of "role-setup-mode" will be "1-2-2," respectively, and the tab can download the string from androidID//Text//my. The flag values help the tab identify which mode was used most recently by the user, thereby indicating what data to display on the tab (Fig. 10).


Table 1 Chart for understanding different parameters before fetching data

Role | Setup | Mode | Upload place | Download from
1 | 1 | 1 | Teacher: classroom—upload doc | Link//Classroom
1 | 1 | 2 | Teacher: classroom—dictate | Text//Classroom
1 | 2 | 1 | Teacher: self-learning—upload doc | Link//My
1 | 2 | 2 | Teacher: self-learning—dictate | Text//My
2 | 1 | X | Student: join classroom | Depending on what teacher uploaded
2 or 10 | 2 | 1 | Student: self-learning—upload doc | Link//My
2 or 10 | 2 | 2 | Student: self-learning—dictate | Text//My

Fig. 9 Select file type

Fig. 10 "Dictate" mode
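The companion app itself is an Android application; purely as a language-neutral sketch of the flag-and-path convention in Table 1, the following uses the Firebase Admin SDK for Python (firebase_admin). The database paths, bucket name and credential file are illustrative assumptions, not the paper's actual Firebase configuration.

import firebase_admin
from firebase_admin import credentials, db, storage

# Assumed project configuration; not the paper's actual credentials or URLs.
cred = credentials.Certificate("service-account.json")
firebase_admin.initialize_app(cred, {
    "databaseURL": "https://example-project.firebaseio.com",
    "storageBucket": "example-project.appspot.com",
})

def teacher_dictate_self_learning(android_id, text):
    # Teacher uses "dictate" in self-learning mode: flags role-setup-mode = 1-2-2,
    # and the tab later reads the dictated string under the Text//My path (Table 1).
    db.reference(f"{android_id}/flags").set({"role": 1, "setup": 2, "mode": 2})
    db.reference(f"{android_id}/Text/My").set(text)

def teacher_upload_doc_classroom(android_id, classroom, txt_path):
    # Teacher uploads a .txt file in classroom mode (flags 1-1-1): the file goes to
    # storage and its download URL is written where the tab expects it (Link//Classroom).
    blob = storage.bucket().blob(f"docs/{classroom}/{txt_path}")
    blob.upload_from_filename(txt_path)
    blob.make_public()                            # simplification; the app may use signed URLs
    db.reference(f"{android_id}/flags").set({"role": 1, "setup": 1, "mode": 1})
    db.reference(f"Link/{classroom}").set(blob.public_url)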


Fig. 11 “Upload File” mode

Fig. 12 LED matrix

5 Implementation and Working (Hardware) • In the proposed system, a 3 × 3 Braille tab is implemented, in which each character is accessed sequentially. • In this system, for selecting a particular horizontal line, a shift register connected directly to the micro-controller is used (Fig. 1). Each horizontal line is provided with its own shift register responsible for accessing each of its three characters individually (Fig. 2). Once all first-line characters are processed, the second line is selected; the characters in the second line are then processed, and the same process is repeated for the third line. • When a character is accessed, its mux is enabled by Vcc supplied through the shift register. The requirement for i/o ports is minimized, because the same data is provided to all the mux(s), but only one is enabled at a time.


Fig. 13 User profile in use

• Once a mux is enabled, its select lines are used to light an LED as required. Here, only one LED glows at a time, but due to the high processing speed of the micro-controller and persistence of vision, it appears that all the LEDs are glowing simultaneously. • For flawless communication, the Android ID of the user's device is already registered in the tab.
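The paper does not list the micro-controller firmware; the plain-Python simulation below only illustrates the addressing scheme described in the bullets above: one shift register selects the horizontal line, the per-line shift registers enable one cell's multiplexer, and the mux select lines then drive one dot (LED) at a time. The Braille dot patterns shown are the standard encodings for a few letters; all pin-level details are left out as assumptions.

# Each Braille cell has six dots, numbered 1..6 (two columns of three).
# A character maps to its set of raised dots; on the prototype a raised dot = LED on.
BRAILLE_DOTS = {                                  # small illustrative subset of standard Braille
    "a": {1}, "b": {1, 2}, "c": {1, 4}, "d": {1, 4, 5}, "e": {1, 5},
}
ROWS, COLS = 3, 3                                 # the prototype is a 3 x 3 grid of cells

def refresh(text_grid):
    # Scan the grid the way the hardware does: line by line, cell by cell, dot by dot.
    # Only one dot is driven at any instant; persistence of vision (or a latching
    # actuator) makes the whole display appear static.
    for row in range(ROWS):                       # first shift register: select horizontal line
        for col in range(COLS):                   # per-line shift register: enable one cell's mux
            ch = text_grid[row][col]
            for dot in sorted(BRAILLE_DOTS.get(ch, set())):
                # the mux select lines carry the binary code of (dot - 1); here we just log it
                print(f"line {row}, cell {col}, char '{ch}': drive dot {dot} "
                      f"(mux select = {dot - 1:03b})")

refresh([["a", "b", "c"],
         ["d", "e", "a"],
         ["b", "c", "d"]])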

6 Results See Figs. 12 and 13.

7 Conclusion In this paper, a system for creating a user-friendly Braille tab was proposed. An implementation based on this proposed system was carried out, and the software and hardware aspects of the system were discussed. Special emphasis was placed on the application side in order to improve the user experience. Future adaptations of the Braille tab are possible. The tab, which is currently used for pre-defined classrooms, can be further developed to create customized classrooms. A keyboard can also be added to the tab depending upon user requirements. The tab can be developed to support formats other than .txt files. The system can also be adapted for use as a book reader by designing a web portal.


References
1. Hamilton, R.H., Pascual-Leone, A.: Cortical Plasticity Associated with Braille Learning (1998)
2. Schmidt, R.N., Lisy, F.J., Prince, T.S., Shaw, G.S.: US Patent number: US6743021B2. Retrieved from https://patents.google.com/patent/US6743021B2/en (2002)
3. Yang, P.: US Patent number: US6881063B2. Retrieved from https://patents.google.com/patent/US6881063B2/en (2005)
4. Sutherland, N.B.: US Patent number: US3659354A. Retrieved from https://patents.google.com/patent/US3659354A/en (1972)
5. Xiaoli, H., Tao, L., Bing, H., Qiang, C., Qiang, X., Qiang, H.: Electronic reader for the blind based on MCU. In: 2010 International Conference on Electrical and Control Engineering, pp. 888–890. Wuhan (2010)
6. Wajid, M., Kumar, V.: E-Braille documents: novel method for error free generation. Image Process. Commun. 19(4), 21–26 (2014)
7. Gupta, R., Singh, P.K., Bhanot, S.: Design and implementation of Arduino based refreshable braille display controller. Indian J. Sci. Technol. 9, 33 (2016)
8. Soule, C.W., Lazarus, N.: Reconfigurable Braille display with phase change locking. Smart Mater. Struct. 25(7), 075040 (2016)
9. Gote, A., Kulkarni, T., Jha, S., Gupta, S.: A review of literature on braille tab and the underlying technology. In: 2020 5th International Conference on Devices, Circuits and Systems (ICDCS), pp. 333–335. Coimbatore, India (2020)
10. Yang, T.-H., Lee, J.-S., Lee, S.S., Kim, S.-Y., Kwon, D.-S.: Conceptual design of new microactuator for tactile display. In: 2007 International Conference on Control, Automation and Systems, pp. 1306–1309, Seoul (2007)

Resized MIMO Antenna for 5G Mobile Antenna Applications S. Subramanyam, S. Ashok Kumar, and T. Shanmuganantham

Abstract A resized MIMO antenna for 5G mobile antenna applications is established on a self-isolation property. The suggested antenna has been miniaturized by using two vertical stubs, which are inserted into the self-isolated antenna. With the help of the isolation elements, the four-antenna MIMO system can achieve good efficiency. The antenna, designed on an FR4 substrate, contains two different elements: a T-shaped feeding element and perpendicular stubs inserted into the naturally self-isolated antenna component. Here, the four-antenna MIMO system achieves its target without any use of decoupling components. Antenna models are constructed and simulated, and good agreement is obtained in simulation and analysis using the IE3D simulator. Keywords Communication of 5G · MIMO applications · Mobile terminal · Compact self-isolated antenna

1 Introduction User equipment can gain many advantages from the fifth generation (5G), such as higher transmission rates and lower latency compared with the present 4G system. MIMO antenna systems with multiple antennas (more than three antennas) are capable of achieving higher transmission rates from the 5G antenna. In this paper, the MIMO antenna can achieve high isolation by using a larger number of antennas [1], but, due to the restricted space in mobile phones, it is not possible to use a large number


of antennas. There are a few techniques to improve isolation, such as decoupling elements, orthogonal polarization, ground structures and neutralization lines [2–4]. Most conventional systems consist of only two antennas, but the signal strength of two antennas is weak. Antenna isolation was improved by using a four-antenna MIMO system [5–7]. More antennas are present in a single system, and more antennas in a device give higher data rates and upload speeds. Usually, the size of such an antenna is large, but in this work the size of the antenna is decreased by around 30% by using the vertical stubs [8–10]. The U-structured antenna element can be used as both a decoupling element and a radiating element; this unique feature gives the MIMO antenna very good isolation.

2 Self-isolated Antenna Configurations and the System Operating Sequence The self-isolated antenna configuration is displayed in Fig. 1. The self-isolated antenna element consists of three parts: a T-shaped component, a U-shaped component and two vertical stubs. Here, the T-shaped element acts as the feeding component. The working of the self-isolated antenna is explained through the four-antenna MIMO system. The height and length of the self-isolated configuration are given as H and L (H = 13.6 mm and L = 17.4 mm), and the parameters p, t, h and q describe the stubs and components. The values of p, q, t and h are as follows: p = 3.4 mm, q = 11.9 mm, t = 5.1 mm, h = 10.7 mm, as shown in Fig. 1. The U-shaped component is grounded; the values of H and L are kept constant, but it is possible to change the values of h, t, q and p. Without the vertical stubs, the element length can be estimated from the operating frequency.
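As a rough back-of-the-envelope check of how the element length relates to the operating frequency (L = 17.4 mm at about 2.4 GHz on a substrate with dielectric constant 3.4, per the text above), the short sketch below computes the quarter wavelength in the dielectric. Treating the element as an approximately quarter-wave resonator and using the bulk permittivity instead of an effective permittivity are simplifying assumptions for illustration, not the paper's design equations.

import math

c = 3.0e8                             # speed of light, m/s
f = 2.4e9                             # operating frequency, Hz
eps_r = 3.4                           # dielectric constant quoted in the text

lam0 = c / f                          # free-space wavelength, ~125 mm
lam_d = lam0 / math.sqrt(eps_r)       # wavelength in the dielectric, ~68 mm
print(f"free-space wavelength : {lam0 * 1e3:.1f} mm")
print(f"guided wavelength     : {lam_d * 1e3:.1f} mm")
print(f"quarter wavelength    : {lam_d / 4 * 1e3:.1f} mm (compare with L = 17.4 mm)")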

Fig. 1 Single-MIMO antenna configuration


In this paper, the proposed antenna operates at about 2.4 GHz (2.3–2.5 GHz) for the four-port MIMO antenna, and the same frequency is used for the single antenna. The single antenna and the four-antenna system share the same dielectric constant, loss tangent, and top surface, with dielectric constant = 3.4, tan δ = 0.02 and top surface (t) = 1.524. The values of H and L are the same for both antenna types, but the total lengths differ; the length of the four-antenna system is 130.3 mm. Here, the antenna size is decreased by about 40% from the original height. There should be a small gap between the U-shaped component and the T-shaped component, and this spacing gives good impedance matching in the final antenna system. MIMO technology leverages multipath behaviour by using multiple smart transmitters and receivers; it is a wireless technique used to increase channel capacity and is also referred to as spatial multiplexing.
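As an illustrative aside (not part of the paper's analysis), the spatial-multiplexing gain of a 4 × 4 link can be sketched with the standard ergodic capacity C = log2 det(I + (SNR/Nt) H Hᴴ) for an i.i.d. Rayleigh channel; the SNR value below is an arbitrary assumption, not a measured figure.

    import numpy as np

    def mimo_capacity(snr_db, n_tx=4, n_rx=4, trials=2000, seed=0):
        """Average Shannon capacity (bps/Hz) of an i.i.d. Rayleigh MIMO channel."""
        rng = np.random.default_rng(seed)
        snr = 10 ** (snr_db / 10.0)
        total = 0.0
        for _ in range(trials):
            # Complex Gaussian channel matrix, unit average power per entry
            h = (rng.standard_normal((n_rx, n_tx)) +
                 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
            m = np.eye(n_rx) + (snr / n_tx) * (h @ h.conj().T)
            total += np.log2(np.linalg.det(m).real)
        return total / trials

    # At an assumed 30 dB SNR, a 4x4 link averages in the mid-30s bps/Hz,
    # the same order as the ~34 bps/Hz figure quoted for the system.
    print(round(mimo_capacity(30.0), 1), "bps/Hz")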

3 A Compact-Based Self-isolated Antenna Component System with Four-antenna MIMO

In the mobile terminal, the four-port MIMO antenna system is positioned at the boundary, with overall dimensions of 130.3 mm and 13.6 mm. The 4 × 4 MIMO system carries four different signals from the four transmitting antennas, and with this setup the user equipment can receive better signals. The four-antenna structure measures 127.7 mm × 13.6 mm × 1.2 mm, and the dimensions of the small substrate can be seen in Fig. 2. The four antennas have four ports, each fed with 50 Ω impedance matching. The distance from the end of one antenna to the next antenna is denoted d, and the relation between D and d is D = d + Length, where D = 38.2 mm and d = 20.9 mm. The values L = 17.4 mm and H = 13.65 mm are kept constant, while the values of d and D may vary. The antenna system operates well in the 2.4 GHz frequency band with the following specification: t = 5.1 mm, h = 10.7 mm, p = 3.4 mm and

Fig. 2 Four-antenna system with MIMO


Fig. 3 S-parameter display of self-isolated MIMO system

q = 11.9 mm. Very good isolation is obtained between adjacent antenna elements, and the radiation pattern, current distribution, and total efficiency are good; for the 2.4 GHz band, the antenna efficiency is more than 90%. The current flowing on the antenna and the ground plane illustrates the isolation mechanism of the MIMO configuration. The data rate achieved with the 2.4 GHz band is greater than 34 bps/Hz. The MIMO antenna return loss is influenced by the factor t and is well matched to the 50 Ω feed at the resonant frequency; the operating frequency shifts with the stub length. For better performance and proper gain, the antenna structure with the specified dimensions was verified at the resonant frequency of 2.4 GHz, showing a reflection coefficient of −30 dB as exhibited in Fig. 3. The 2D radiation patterns of this four-port MIMO antenna behave well at the different angles shown in Figs. 4 and 5. These figures show a maximum gain of 1 dBi and an efficiency of 90% at the operating frequency, with gain and directivity between 4.5 and 5 dBi. A 4 × 4 MIMO system consists of four antennas; in general, a device with more antennas costs more because of the extra hardware and draws a little extra power for the additional wireless hardware.
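As a small side calculation (not given in the paper), the −30 dB reflection coefficient reported above can be translated into the reflection magnitude, VSWR and reflected power fraction as follows:

    def match_metrics(s11_db):
        """Convert an S11/return-loss value in dB to |Gamma|, VSWR and reflected power."""
        gamma = 10 ** (s11_db / 20.0)          # |S11| as a linear magnitude
        vswr = (1 + gamma) / (1 - gamma)
        reflected_pct = 100.0 * gamma ** 2     # share of incident power reflected
        return gamma, vswr, reflected_pct

    g, v, p = match_metrics(-30.0)
    print(f"|Gamma| = {g:.3f}, VSWR = {v:.2f}, reflected power = {p:.2f}%")

With −30 dB this gives |Γ| ≈ 0.032, VSWR ≈ 1.07 and about 0.1% of the incident power reflected, i.e. a very well matched port.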


Fig. 4 Elevation designed gain

4 Conclusion

A resized four-port antenna system for 5G mobile antenna applications has been presented. The antenna system is based on a compact antenna element that is naturally self-isolated. The MIMO antenna is confirmed by simulation and analysis, and it achieves good isolation without any decoupling element or isolation components. Without reducing the efficiency, the radiating element of the implemented antenna also acts as a decoupling element. Because of its reduced size, the proposed MIMO system is a good choice for 5G mobile in portable systems.


Fig. 5 Azimuth designed gain

Acknowledgements The authors would like to thank JNTUH for supporting this project. This research was supported by the TEQIP III Collaborative Research Scheme, JNTUH.

References 1. Teja, R., Kumar, S.A., Thangavelu: CPW-fed inverted six shaped antenna design for internet of things (IoT) applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019 2. Sahithya, V., Kumar, S.A., Thangavelu, S.: Design of CPW fed Antenna for WIMAX Applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019 3. Ravali, S., Kumar, S.A., Thangavelu, S.: Design of a CPW fed detective solid bowtie antenna for satellite applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019 4. Kumar, S.A., Thangavelu, S.: Design of clover slot antenna for biomedical applications. Alexandria Eng. J. 56, 313–317 (2016) 5. Andrews, J.G., et al.: What will 5G be? IEEE J. Sel. Areas Commun. 32(6), 1065–1082 (2018) 6. Kumar, S.A., Thangavelu, S.: CPW fed monopole implantable antenna for 2.45 GHz ISM band applications. IJEL 3(3), 152–159 (2015) 7. Kumar, S.A., Thangavelu, S.: CPW fed implantable Z-monopole antennas for ISM band biomedical applications. IJMWT 7, 529–533 (2015)


8. Kumar, S.A., Thangavelu, S.: Implantable CPW fed rectangular patch antenna for ISM band biomedical applications. IJMWT 6(1), 101–107 (2014) 9. Kumar, S.A., Shanmuganantham, T.: Design of CPW-fed inverted six shaped antenna for IoT applications. TEEM, Springer (2020) 10. Kumar, S.A., Thangavelu, S.: Design and performance of textile antenna for wearable applications. TEEM 19(5), 352–355 (2018)

Molecule Four-Port Antenna is Utilizing Detachment Progress of MIMO K. Venugopal Rao, S. Ashok Kumar, and T. Shanmuganantham

Abstract A four-element antenna is proposed for wireless applications in order to investigate multiple-input multiple-output (MIMO) characteristics. A hexagon molecule-shaped fractal structure is used as the radiating element. To obtain better isolation without any additional decoupling structure, the antenna elements are placed orthogonally to each other, and a C-shaped slot is placed on each radiating element to obtain a band-notched response in the WLAN band. The designed antenna exhibits a stable omnidirectional radiation pattern. An acceptable impedance bandwidth over the 3.5–3.9 GHz range is shown, with a return loss better than −20 dB at the resonant frequency. The performance of the antenna is designed and simulated on the basis of its characteristics and MIMO parameters. The isolation level between the elements is adequate for MIMO applications, and the simulated and measured results show that the antenna is suitable for high-density MIMO operation.

Keywords Communication of 5G · MIMO applications · Mobile terminal · Wireless applications

K. Venugopal Rao (B) · S. Ashok Kumar
Jyothishmathi Institute of Technological Sciences, Karimnagar, India
e-mail: [email protected]
S. Ashok Kumar
e-mail: [email protected]
T. Shanmuganantham
Pondicherry University, Puducherry 605014, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_4

1 Introduction

In modern devices, MIMO systems with multiple antennas are designed to be as small as possible. One of the resulting design constraints is that the spacing between the MIMO receiver elements should be as short as possible, but small spacing between antenna elements leads to mutual-coupling problems. A number of methods have been proposed, such as relocating the elements, placing a neutralization strip, using modified equivalent designs, and further decoupling structures. The radiating elements can be arranged perpendicular to one another and fed by coplanar waveguide (CPW) feeds to enhance isolation, although the use of CPW occupies a large area. Isolation is also enhanced by placing a neutralization line between the antenna elements so that the coupling current on the ground plane is cancelled out. Mutual coupling is further reduced by introducing L-shaped slots in the elliptical radiator and an elliptical slot in the ground plane between perpendicularly placed fractal-shaped antennas [1]. To enhance isolation, a T-shaped stub can be used inside the radiating element in combination with a meander-line feed; for still more isolation, I-shaped stubs on the surface and F-shaped slots in a common ground can be used instead of the T-stub. Most conventional systems consist of only two antennas, but the signal strength of two antennas is weak. Antenna isolation is improved by using a four-antenna MIMO system [5–7]; more antennas in a single device give higher data rates and upload speeds. The antenna size is usually large, but in this work it is decreased by around 30% by using the vertical stubs [8–10]. The U-structured antenna element is used both as a decoupling element and as a radiating element, which is the unique feature that gives this MIMO antenna very good isolation.

2 Antenna Configurations and the System Operating Sequence

Figure 1 shows the diagrammatic representation of the UWB MIMO antenna. The multiple-input multiple-output system consists of four monopoles, each element fed by a 50 Ω micro-strip line. Good isolation is achieved by the perpendicular orientation of the elements. The hexagon-molecule fractal is applied at the edges of the geometry to achieve the wideband behaviour.

3 A Compact-Based Self-isolated Antenna Component System with Four-antenna MIMO

In the mobile terminal, the four-port MIMO antenna system is positioned at the boundary, with overall dimensions of 130 mm and 13 mm. The 4 × 4 MIMO system carries four different signals from the four transmitting antennas, and with this setup the user equipment can receive better signals. The four-antenna structure measures 130 mm × 13 mm × 1.2 mm, and the dimensions of the small substrate can be seen in Fig. 2.


Fig. 1 Geometrical view of antenna

Fig. 2 S-parameter display

The data rate achieved with the 3.7 GHz band is greater than 34 bps/Hz. The MIMO antenna return loss is influenced by the factor t and is well matched to the 50 Ω feed at the resonant frequency; the operating frequency shifts with the stub length. For better performance and proper gain, the antenna structure with the specified dimensions was verified at the resonant frequency of 3.7 GHz, with a return loss below −22 dB at 3.7 GHz as shown in Fig. 2. The 2D radiation patterns of this four-port MIMO antenna behave well at the different angles shown in Figs. 3 and 4. Figure 5 shows a maximum gain of 1 dBi and an efficiency of 90% at the operating frequency.
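Isolation is reported here through the S-parameters; a related MIMO figure of merit not quoted in the paper is the envelope correlation coefficient (ECC). For a nearly lossless two-port antenna pair it can be approximated from the S-parameters (the formula usually attributed to Blanch et al.); the numeric values below are placeholders, not measured data for this antenna.

    import numpy as np

    def ecc_from_s(s11, s12, s21, s22):
        """Envelope correlation coefficient of a 2-port antenna from complex S-parameters.
        Valid mainly for low-loss antennas."""
        num = abs(np.conj(s11) * s12 + np.conj(s21) * s22) ** 2
        den = ((1 - abs(s11) ** 2 - abs(s21) ** 2) *
               (1 - abs(s22) ** 2 - abs(s12) ** 2))
        return num / den

    # Placeholder values: -22 dB reflection at both ports, -20 dB coupling.
    refl = 10 ** (-22 / 20)
    coup = 10 ** (-20 / 20)
    print(ecc_from_s(refl, coup, coup, refl))   # well below the usual 0.5 design criterion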


Fig. 3 Elevation plane

4 Conclusion

In this paper, a four-port antenna system for 5G mobile antenna applications has been presented. The antenna system is based on a compact antenna element that is naturally self-isolated. The MIMO antenna is confirmed by simulation and analysis, and it achieves good isolation without any decoupling element or isolation components. Without reducing the efficiency, the radiating element of the implemented antenna also acts as a decoupling element. Because of its reduced size, the proposed MIMO system is a good choice for 5G mobile in portable systems.


Fig. 4 Azimuth plane

Fig. 5 Field gain


Acknowledgements The authors would like to thank JNTUH for supporting this project. This research was supported by the TEQIP III Collaborative Research Scheme, JNTUH.

References 1. Sahithya, V., Kumar, S.A., Thangavelu, S.: Design of CPW fed antenna for WIMAX applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019 2. Ravali, S., Kumar, S.A., Thangavelu, S.: Design of a CPW fed detective solid bowtie antenna for satellite applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019 3. Kumar, S.A., Thangavelu, S.: Design of clover slot antenna for biomedical applications. Alexandria Eng. J. 56, 313–317 (2016) 4. Kumar, S.A., Thangavelu, S.: Design of CPW-fed inverted six shaped antenna for IoT applications. TEEM, Springer (2020) 5. Kumar, S.A., et al.: CPW fed monopole implantable antenna for 2.45 GHz ISM band applications. IJEL 3(3), 152–159 (2015) 6. Kumar, S.A., Thangavelu, S.: CPW fed implantable Z-monopole antennas for ISM band biomedical applications. IJMWT 7, 529–533 (2015) 7. Kumar, S.A., et al.: Implantable CPW fed rectangular patch antenna for ISM band biomedical applications. IJMWT 6(1), 101–107 (2014) 8. Kumar, S.A., et al.: Design of implantable CPW fed monopole antenna for ISM band applications. TEEM 15(2), 55–59 (2014) 9. Kumar, S.A., et al.: Design and performance of textile antenna for wearable applications. TEEM 19(5), 352–355 (2018) 10. Teja, R., Kumar, S.A., Shanmuganantham, T.: CPW fed inverted six shaped antenna design for internet of things (IoT) applications. In: IEEE IMICPW 2019, NIT Trichy, 22–24 May 2019

IOT-Based Underwater Wireless Communication Gitimayee Sahu and Sanjay S. Pawar

Abstract Underwater wireless communication is an advanced research area that needs to be explored extensively. The topic is highly significant for various purposes, ranging from aquatic pollution control, marine-life monitoring and water-quality monitoring to signal transmission, and different sensors can be used under water for these applications. For signal transmission under water, sound waves and optical signals have been used extensively in the past; their major challenges are low data rate, attenuation and backscattering due to suspended particles. To meet these challenges, a prototype of an underwater RF wireless network is developed here. It not only establishes the wireless network but also tracks underwater devices and updates their location in the IOT cloud. It also measures temperature, vibration and water quality using underwater sensors and updates the readings on the www.thingspeak.com website. The established underwater wireless network provides coverage up to 1.2 km in diameter and 50 m in depth, which is highly considerable. Further optimization can be done to enhance the range.

Keywords Underwater module · Terrestrial module · Arduino Uno · HC12 · ESP8266 WiFi module

G. Sahu (B)
Department of EXTC, UMIT, Juhu, Mumbai, India
e-mail: [email protected]
S. S. Pawar
Department of EXTC, UMIT, SNDT Women’s University, Mumbai, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_5

1 Introduction

Three-fourths of the earth's surface is covered with water in the form of oceans, rivers and seas, and this largely unexplored underwater environment needs to be examined. Fruitful experimentation has always depended on the available technologies, and recent improvements have opened the way to underwater exploration using different sensors at each level. Hence, the underwater sensor network (UWSN) is an emerging research area for a variety of applications such as (i) aquatic surveillance, (ii) river and sea pollution monitoring and control, (iii) oceanographic data compilation and commercial exploitation of the aquatic environment and (iv) marine monitoring. Signal transmission over an underwater wireless network is a fusion of wireless technology with miniaturized sensors for smart sensing, communication and intelligent computing. Underwater wireless networks have significant applications in the military and navy, in marine monitoring, and in various industrial settings such as marine fish farms, reduction of organic-waste deposition on the seabed and the fight against pollution. A UWSN is a network of autonomous sensor nodes [1]. The sensor nodes are geographically dispersed in order to sense various underwater properties such as salinity, pressure and temperature. The nodes may be mobile or fixed and are connected wirelessly through communication modems to transfer the sensed data [2], which can then be used by different services for the welfare of living things. Underwater communication is primarily carried out by a group of wireless nodes communicating their data to a gateway node, which relays the data to the closest control station.

The objective of this research work is to establish an aquatic wireless network under the water. The idea originated from the crashing of inter-continental aeroplanes in mid-air and their falling into the sea. For example, the disappearance of the Malaysian aeroplane MH370 from ATC in March 2014 is one of the most incredible aviation mysteries of the twenty-first century: the plane abruptly turned back towards Malaysia and then towards the Indian Ocean, and the Malaysian government could not find the whereabouts of the plane, the people or the equipment inside the ocean. If an aquatic wireless network can be created under the water, then any device falling into the sea can be detected, tracked, and its location updated in the IOT cloud, which makes this research work innovative and significant. The objective is therefore to investigate and design a prototype that establishes a wireless network under the water for communication, locates and tracks underwater devices, and updates their position in the IOT cloud. The main use cases are (i) establishing an underwater wireless network and (ii) communication and transfer of data between various devices. The network also helps to track cellular devices, to find their exact location, i.e. latitude and longitude, and to update it in the IOT cloud, and it can monitor and control the quality of the water. Aquatic surveillance, oceanographic data compilation and commercial exploitation of the aquatic environment are the major objectives of the work; other applications include marine monitoring, coastal-area surveillance, oil-rig maintenance and collection of data from under water.


1.1 Literature Review

Considerable research related to underwater communication is currently in progress. The main research lines aim to increase the distance and bandwidth while reducing the energy consumption of underwater devices, in order to extend the network lifetime. Underwater communication research focuses especially on the use of optical signals, electromagnetic waves and acoustic or ultrasonic waves; each methodology has its own significance, with its advantages and drawbacks. Some findings related to this work are: (a) devices using optical communication have a high propagation speed, but strong backscattering by suspended particles occurs and performance is affected by the turbidity of the water, so they are not a better option for large areas [3–5]; (b) devices using acoustic waves have a very low sensitivity threshold and can reach large distances (over 20 km), but the drawback is the low data rate (0 b/s–20 kb/s), which is determined by the low carrier frequency, high attenuation and reflections [6, 7]; (c) for higher data rates, the radio-frequency (RF) technique can be used, which achieves data rates of up to 100 Mbit/s over short distances. Electromagnetic (EM) waves in the high-frequency range are a better choice for an underwater wireless communication system because they are less affected by reflection and refraction in shallow water than acoustic waves. Oubei et al. [3] explained underwater wireless communication using optical communication, which uses visible light to transmit data in the underwater environment and can carry large amounts of information over a wide bandwidth using unlicensed spectrum and low power. Saeed et al. [4] also discussed underwater wireless networking and localization using optical communications. Adib et al. [6], researchers from the MIT Media Lab, designed an underwater transmitter that sends a sonar signal to the water's surface, causing tiny vibrations that correspond to the 1s and 0s transmitted; above the surface, an extremely sensitive receiver reads these tiny vibrations and decodes the sonar signal. The system is called translational acoustic-RF communication (TARF). Acoustic transmitting beacons can be realized like an aeroplane's black box: the beacon transmits a signal every second, and the TARF system can be used to pick it up. Ranjan et al. [8] described an underwater wireless network using sensors and autonomous underwater vehicles (AUVs), in which the AUVs communicate, cooperate and exchange data among each other to carry out sensing and monitoring functions. The underwater communication network (UWCN) has found increasing use in a widespread range of applications such as autonomous underwater vehicles (AUVs), coastal surveillance systems, environmental research, oil-rig maintenance, collection of data for water monitoring and linking submarines to land. The paper is organized as follows: Section 1 presents the introduction and a brief literature review, Section 2 presents the system model, Section 3 presents the results and discussion and Section 4 presents the conclusion.


2 System Model

An underwater wireless communication system can use a sensor network to monitor environmental conditions below the water. Figure 2 shows the block diagram of the system model. The sensors are connected to an Arduino Uno, which communicates through the RF antenna attached to it. The underwater sensors collect all the information below the water and convey it via the Arduino to the RF antenna, which then communicates with the other antennas. The system establishes communication between the devices, tracks the devices, finds their location, i.e. latitude and longitude, and updates it in the IOT cloud. Further optimization can be done for range expansion and for a higher number of attached devices, to enhance the depth of the established network. Figure 1 shows the prototype of the underwater wireless network. The technology of underwater wireless communication provides a solution for transmitting and receiving information between two different media. Here, a prototype wireless network has been developed, with coverage in the range of the transmitted frequency. The two WiFi modules are ESP8266 boards, which work as transmitter and receiver. Transceiver module 1, specified as the terrestrial unit operating at 2.4 GHz, consists of a NodeMCU and an HC-12 wireless module interfaced with the on-chip Arduino board. The HC-12 has a broadcasting range of up to 1 km and an operating band of 433.4–473.0 MHz, and the unit can communicate on a hundred or more live transmitting frequency channels. It transmits bit-coded messages from the terrestrial to the water medium.

Fig. 1 Underwater wireless network [1]


Fig. 2 Block diagram of system model

2.1 Module 1 (Terrestrial Module)

Figure 3 shows the transceiver block diagram. The terrestrial module consists of four main components: (i) Arduino, (ii) WiFi (ESP8266), (iii) HC-12 and (iv) NodeMCU. The Arduino works as the controller for all of these serially connected devices. First, the Arduino Uno triggers the ESP8266 WiFi module, which functions as a hotspot and operates in the ISM band. After successful interfacing of the hotspot with the WiFi module, a coded message is displayed on a 16 × 2 LCD. After connecting to the nearby RF network, it acts as an access point to which many cellular devices within a geographical range of approximately 1.5 km can connect. The data rate is nearly 10 kbps for the terrestrial communication. The interfaced HC-12 module transmits and receives data within the specified frequency range; it supports half-duplex transmission, and its quoted sensitivity applies at a data rate of 5 kbps. The paired transceiver antennas are capable of communicating beyond a 1 km range, thus providing an adequate coverage distance. The NodeMCU (ESP8266) is a server-based communicating device interconnected with the on-chip Arduino, HC-12 and WiFi module. When these devices are ready to transmit a ping message from the underwater module, the NodeMCU acts as the transmitter towards the IOT cloud via the ThingSpeak website. This module also measures temperature via a sensor and checks the salinity level of the water.
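The cloud-update step itself is a plain HTTP request to the ThingSpeak update endpoint. The device firmware runs on the NodeMCU, but the call can be sketched in Python as below; the write API key and the field-to-sensor mapping are placeholders, not values from the paper.

    import requests

    THINGSPEAK_URL = "https://api.thingspeak.com/update"
    WRITE_API_KEY = "XXXXXXXXXXXXXXXX"   # placeholder channel write key

    def push_reading(temperature_c, latitude, longitude):
        """Send one set of sensor/location values to a ThingSpeak channel."""
        payload = {
            "api_key": WRITE_API_KEY,
            "field1": temperature_c,   # field assignments are an assumption
            "field2": latitude,
            "field3": longitude,
        }
        resp = requests.get(THINGSPEAK_URL, params=payload, timeout=10)
        # ThingSpeak returns the new entry id, or 0 if the update was rejected
        return resp.text

    print(push_reading(26.4, 19.1076, 72.8263))   # example values only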


Fig. 3 Block diagram of transreceiver

2.2 Module II (Underwater Module)

The underwater transceiver consists of an on-chip Arduino, an ESP8266 WiFi module, an HC-12 and a piezoelectric sensor. The HC-12 module connects to the upper layer and has half the range, i.e. up to a depth of about 5 ft. When any cellular device comes within about 0.5 km of the underwater HC-12, it gets attached to the HC-12 and the location of the lost cellular device can be traced. The HC-12 communicates with the Arduino, which forwards the location to the upper HC-12 module, and the upper module then relays the information to www.thingspeak.com (i.e. the IOT cloud). The HC-12 is a half-duplex 20 dBm (100 mW) transmitter paired with a receiver that has −117 dBm (2 × 10⁻¹⁵ W) sensitivity at 5000 bps, used with an external antenna. These transceivers are capable of communicating up to 1 km over an open-air interface and provide adequate coverage and high throughput.
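A quick link-budget sketch (an illustration, not from the paper) shows why the 20 dBm transmit power and −117 dBm sensitivity are comfortable for the quoted ~1 km open-air range; free-space propagation at 433 MHz with 0 dBi antennas is assumed, and underwater attenuation would consume much of the remaining margin.

    import math

    def free_space_path_loss_db(distance_km, freq_mhz):
        """Free-space path loss in dB (distance in km, frequency in MHz)."""
        return 32.44 + 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz)

    tx_power_dbm = 20.0        # HC-12 transmit power (100 mW)
    sensitivity_dbm = -117.0   # HC-12 receiver sensitivity at 5000 bps
    max_loss = tx_power_dbm - sensitivity_dbm          # 137 dB allowable loss
    fspl_1km = free_space_path_loss_db(1.0, 433.4)     # ~85 dB at 1 km in air

    print(f"allowed loss {max_loss:.0f} dB, FSPL(1 km) {fspl_1km:.1f} dB, "
          f"margin {max_loss - fspl_1km:.1f} dB")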

3 Results and Discussion

The research work provides an underwater wireless network for short communication distances, i.e. near-field communication (NFC). It can also be used to establish a wireless network under the surface of sea water. The network can be used to connect cellular devices, track them and localize them, i.e. find the GPS position and store the location in the IOT cloud. It can also be used for precision monitoring and for controlling pollution of the water arriving from neighbouring localities and industries.

Fig. 4 Module I, i.e. RF terrestrial module

Figure 4 shows the RF terrestrial module, which is switched ON first and obtains broadband service from the nearby base station, i.e. the primary server. The server pings the NodeMCU, and the HC-12 transmitter and receiver then turn ON, indicated by blinking. This triggers the ESP8266 WiFi module, which operates in the ISM band, i.e. 2.4 GHz. Once the whole system starts to respond, the WiFi system transmits and connects to module II, i.e. the underwater module shown in Fig. 5. Module II activates, communicates with the terrestrial module and tracks the nearby devices that fall within its range. The HC-12 is interfaced with the NodeMCU, which updates the location of the tracked device on www.thingspeak.com. The sensors connected to module II are a DS18B20 temperature sensor, a piezoelectric ceramic ring transducer (SMR3515T55) and a Waspmote Smart Water quality-monitoring sensor. They measure temperature, vibration and water quality and update the readings in the IOT cloud. Waspmote is a portable smart water-quality monitoring sensor that detects whether there is any chemical leakage into the water; it checks various water-quality parameters such as pH level, dissolved oxygen (DO), oxidation-reduction potential (ORP) and salinity level. Figure 6 displays the location, i.e. latitude and longitude, of the submerged device under the water.


Fig. 5 Module II, i.e. underwater module with sensors

Fig. 6 Displaying the location of the device under water

The various use cases include (i) military applications, (ii) monitoring of marine activities, (iii) industrial applications, for example fish farming, and (iv) reduction of waste deposition on the sea bed. The main challenge of an underwater wireless network is that water is a conducting, lossy medium, unlike the air interface, so the coverage range of the network is small and more base stations (BS) need to be deployed for proper and adequate coverage. Deployment of a BS inside or above the surface of the water is also a major issue: since water flows, a fixed deployment of the BS is not possible. Using buoys, drones or short-range waterproof BSs can be placed on the surface of the sea water to establish wireless communication.


4 Conclusion and Future Scope

This research is highly significant since it establishes a wireless network under water. The system can communicate between devices and track and locate devices under water, and it updates the GPS location in the IOT cloud for further reference. It helps with marine monitoring and with sensing and controlling the quality of the saline water; it also measures temperature and vibration and monitors water quality, i.e. pH level, dissolved oxygen (DO) and salinity level, and updates the values in the IOT cloud, which helps reduce the deposition of organic waste on the sea bed. The developed system provides coverage up to a range of 1.2 km in diameter and 50 m in depth, which is adequate in a lossy medium such as water compared with other related literature. Further optimization can be done for range expansion, a larger number of attached devices and greater depth of the established network, as well as to increase the signal quality and reduce losses due to backscattering and reflections.

References 1. Felemban, E., Shaikh, F.K., Qureshi, U.M.: Underwater sensor network applications: a comprehensive survey. Sage J. (2015). https://doi.org/10.1155/2015/896832 2. Khalid, M.A., Shah, P.A., Iqbal, K., Gillani, S., Ahmad, W., Nam, Y.: Underwater wireless sensor networks: a review of recent issues and challenges. Wirel. Commun. Mobile Comput. 6470359, 20 (2019) 3. Oubei, H.M., Durán, J.R., Janjua, B., Wang, H.-Y., Tsai, C.-T., Chi, Y.-C., Ng, T.K., Kuo, H.-C., He, J.-H., Alouini, M.-S., Lin, G.-R., Ooi, B.S.: Wireless optical transmission of 450 nm, 3.2 Gbit/s 16-QAM-OFDM signals over 6.6 m underwater channel. OSA Tech. Digest Opt. Soc. Am. 23(18), 23302–23309 (2016) 4. Saeed, N., Celik, A., Al-Naffouri, Y.Y., Alouini, M.-S.: Camera based optical communications, localization, navigation, and motion capture: a survey. Ad Hoc Netw. (2018) 5. Oubei, H.M. et al.: Light based underwater wireless communications. Jpn. J. Appl. Phys. (2018) 6. Jang, J., Adib, F.: Underwater backscatter networking. In: SIGCOMM, Aug 19, pp. 19–23 (2019). Beijing, China 7. Gussen, C.M.G., Diniz, P.S.R., Campos, M.L.R., Martins, W.A., Costa, F.M., Gois, J.N.: A survey of underwater wireless communication technologies. J. Commun. Inf. Syst. 31(1) (2016) 8. Ranjan, A., Ranjan, A.: Underwater wireless communication network. Adv. Electron. Electr. Eng. 3(1), 41–46 (2013)

Pattern Prediction Using Binary Trees T. Aditya Sai Srinivas, Ramasubbareddy Somula, Karrothu Aravind, and S. S. Manivannan

Abstract In this busy world, no one has spare time, and technology is being developed every day to increase efficiency. On this front, a word predictor is a small step that increases efficiency many times over, with applications in various areas such as texting and search engines. To develop the word predictor program, this project uses the Trie data structure together with a stored file of words to predict the words the user may be thinking of. The project compares the implementation of word completion using binary trees with that of binary tries; the proposed method uses word prediction based on binary trees and shows that the existing binary-trie implementation takes longer than the proposed approach. Auto-complete is a feature that helps the user find what they want by predicting the value in the search box: it starts predicting searches related to the first few letters or words typed by the user, and it works best when the typed words are common, such as when addressing an email.

Keywords Prediction · Binary tree · Trie

T. Aditya Sai Srinivas
Computer Science Department, G. Pullaiah College of Engineering and Technology, Kurnool 518002, India
R. Somula (B)
Information Technology, VNRVJIET, Hyderabad 500090, India
e-mail: [email protected]
K. Aravind
Computer Science and Engineering, GMRIT Engineering College, Razam 532001, India
S. S. Manivannan
SCOPE, VIT University, Vellore 632014, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_6


1 Introduction

The "Auto-complete" feature starts predicting words when the user enters the first few letters of the word to be searched. When the user enters the first letter, auto-complete displays all the words beginning with that letter, so the writer can select a word from the predicted values instead of typing the text in full, which saves a lot of time for users [1, 2]. Sometimes the predicted words are the ones recently searched by the user. Language modelling and Augmentative and Alternative Communication (AAC) devices are used in the word-prediction process to predict the most frequently and commonly used words, and the user can also enter words into the prediction dictionaries using the word-prediction software [3, 4]. The objectives of this work are:
• To understand the dynamic tree data structure used in developing the program.
• To understand the "trie" data structure being used in the program.
• To construct a strong and efficient algorithm for the program that is editable and can later be used as a module in larger software.
• To develop an efficient real-time program with fast processing and an industrial application.
In this program, the Trie data structure is used to search the data in an ordered fashion. This structure is also known as a radix, prefix or digital tree, and it stores the data in a dynamic set in which the keys are strings. A node in the tree does not store any information about the key; instead, the position of the node defines the key. All the successors of a node share a common prefix of the string associated with that node, and the root is associated with the empty string. Values tend to be associated only with leaves and with some inner nodes that correspond to keys of interest (Fig. 1).
Fig. 1 Searching a node using trie
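A minimal Python sketch of the trie just described (children keyed by character, word endings flagged, and prefix lookup returning all completions); the word list used below is a placeholder, not the paper's dictionary file.

    class TrieNode:
        def __init__(self):
            self.children = {}      # character -> TrieNode
            self.is_word = False    # True if the path from the root spells a stored word

    class Trie:
        def __init__(self):
            self.root = TrieNode()

        def insert(self, word):
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True

        def complete(self, prefix):
            """Return every stored word that starts with the given prefix."""
            node = self.root
            for ch in prefix:
                if ch not in node.children:
                    return []
                node = node.children[ch]
            results, stack = [], [(node, prefix)]
            while stack:
                cur, text = stack.pop()
                if cur.is_word:
                    results.append(text)
                for ch, child in cur.children.items():
                    stack.append((child, text + ch))
            return results

    t = Trie()
    for w in ["apple", "applet", "apply", "ant"]:   # placeholder dictionary
        t.insert(w)
    print(t.complete("appl"))    # ['apple', 'applet', 'apply'] in some order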


Fig. 2 Types of binary tree

A compact prefix tree is used for space optimization. In the example shown above, predictions are made at the nodes based on the information in the preceding node, and the final nodes carry the prefix of the earlier node [5–10].

Binary Trie
A binary tree is a data structure in which each node has two links, known as the left child and the right child. When a general tree is converted to a binary tree, the left-most child of a parent becomes its left child, and all remaining children become right children of their siblings (Fig. 2). To traverse a binary tree, this project uses three different types of traversal:
• Post-order: traverse the left child, then the right child, and finally the root (LRV).
• In-order: traverse the left child, then the root, and finally the right child (LVR).
• Pre-order: traverse the root, then the left child, and finally the right child (VLR).
In the auto-complete binary tree, the traversal first visits the root and then its left child, moving through right children until the second letter is found; it then continues through the left child of the matching node. To search for the word ARE, first visit the root node A and go to its left child N. Compare it with the second letter of the word; since it is not the same, move to its right child R and compare. It is the same, so move on to its left child (Fig. 3).
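For reference, the three traversal orders listed above can be written directly from their definitions; the small tree below mirrors the ARE walk just described (A's left child is N, R is N's right sibling, E is R's left child) and is only an illustration.

    class Node:
        def __init__(self, value, left=None, right=None):
            self.value, self.left, self.right = value, left, right

    def post_order(n):
        return post_order(n.left) + post_order(n.right) + [n.value] if n else []  # L R V

    def in_order(n):
        return in_order(n.left) + [n.value] + in_order(n.right) if n else []      # L V R

    def pre_order(n):
        return [n.value] + pre_order(n.left) + pre_order(n.right) if n else []    # V L R

    root = Node("A", Node("N", None, Node("R", Node("E"))))
    print(post_order(root), in_order(root), pre_order(root))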


Fig. 3 Finding the prefix ARE

2 Background

The main aim of this study was to investigate whether word processing is helpful to people, especially people with disabilities who find writing difficult, with a particular focus on teaching children to use word processing. An activity was conducted among children: in the first case the children wrote their stories by hand, while in the other case they used word processing and word prediction for writing. Differences were noted in spelling, grammatical errors and the use of legible words, and the results varied; the differences clearly showed the importance of word processing and word prediction [11–15]. Word processing with word prediction improves the legibility and spelling of written assignments completed by some children with learning disabilities and handwriting difficulties. Many students with physical disabilities find it difficult to write fluently. One type of assistive technology developed to improve accuracy in writing is word-prediction software, although there is a lack of research supporting its use by individuals with physical disabilities [16–20]. One study examined word prediction and word processing for the accuracy of draft papers written by physically disabled people; the results indicated no effect on writing speed, but the approach shows promise in decreasing spelling and typographical errors [21–25]. Writing is a medium of human communication that involves interaction between physical and cognitive skills. Physically disabled people find writing difficult and have to overcome several barriers; since most of the opportunities individuals gain are based on their writing skills, technological development in this area is needed. One such technology is assistive technology, which helps increase typing fluency. The main motive of this line of study is to improve the writing skills of physically disabled people. An alternating-treatment design was used in which diverse


physically disabled people were recruited. The words correct per minute (WCPM) and the grammatical errors were recorded for further investigation [26–30]. The recruited people were allowed to type for three minutes using the word processor and word prediction, in order to check which of the two was more efficient for fluent writing. The most widely used websites include library websites and the online search tools used by young people to look things up, so providing automated search features is important for obtaining relevant results in their academic research scenarios; this feature helps the user end up with the best hit. Auto-completion has been very productive and affordable for people who have disabilities and find writing difficult, and it is easily made available compared with speech-to-text technology or special input devices. The only disadvantage of this feature concerns queries of the form: for a document set A and an alphabetical range B of words, compute the set of all word-in-document pairs (a, b) from the collection such that a belongs to A and b belongs to B; this leads to a dependence on the size of the underlying document collection. Python is the most frequently used language for testing such advanced features, compared with scripting languages such as AutoHotkey [31–37].

3 Proposed Method

Implementation of binary tree
The binary tree data structure instantiates a pointer of node type and initializes its left and right child nodes to NULL. It contains methods to add a new word to the dictionary and to search for the location of a partial word, returning a node pointer, as well as methods to parse the binary tree and return all the results in a vector of strings.
Methods
• void addWord(string): adds the given string to the binary tree if it does not already exist. This function works on the interfaces provided by the node class to append a child, i.e. to add the characters of the word to the binary tree.
• Node* searchWord(string, bool): returns a pointer to the node that contains the last character of the string by traversing the binary tree. If the bool flag is false, it returns the pointer to the node even if it does not mark a complete word; if the flag is true, it returns the pointer only if the node marks a word.
• bool autoComplete(string, vector<string>&): returns true if auto-complete can be performed on the word entered by the user, else returns false. It calls the two utility functions searchWord and parseTree.
• void parseTree(Node*, string&, vector<string>&, bool&): traverses the binary tree and appends the words to the vector<string>&.


Parsing the Tree
Method: void parseTree(Node*, string&, vector<string>&, bool&)
This module traverses the binary tree and appends each word to the vector of strings passed by reference to the parseTree function. It uses the following algorithm (a Python sketch of the same idea is given below):
1. If a left child exists, go left.
2. If a right child exists, go right; otherwise return to the previous node.
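A compact Python rendering of the same auto-complete structure (the paper's implementation uses C++-style Node pointers and vectors); the node layout and helper names below are illustrative, not the authors' exact code. The left link holds the next character of a word and the right link holds an alternative character at the same position.

    class Node:
        def __init__(self, ch):
            self.ch = ch
            self.left = None      # first child: next character of a word
            self.right = None     # sibling: alternative character at this depth
            self.is_word = False

    class AutoCompleteTree:
        def __init__(self):
            self.root = None

        def add_word(self, word):
            self.root = self._add(self.root, word, 0)

        def _add(self, node, word, i):
            if i == len(word):
                return node
            ch = word[i]
            if node is None:
                node = Node(ch)
            if node.ch != ch:                       # not this character: try siblings
                node.right = self._add(node.right, word, i)
                return node
            if i == len(word) - 1:
                node.is_word = True
            else:
                node.left = self._add(node.left, word, i + 1)
            return node

        def search_prefix(self, prefix):
            """Return the node holding the last character of the prefix, or None."""
            node, i = self.root, 0
            while node and i < len(prefix):
                if node.ch == prefix[i]:
                    if i == len(prefix) - 1:
                        return node
                    node, i = node.left, i + 1
                else:
                    node = node.right
            return None

        def complete(self, prefix):
            end = self.search_prefix(prefix)
            if end is None:
                return []
            results = [prefix] if end.is_word else []
            self._collect(end.left, prefix, results)
            return results

        def _collect(self, node, text, results):
            # go left for deeper characters, right for alternatives at this depth
            while node:
                word = text + node.ch
                if node.is_word:
                    results.append(word)
                self._collect(node.left, word, results)
                node = node.right

    t = AutoCompleteTree()
    for w in ["are", "art", "and", "apple"]:    # placeholder dictionary
        t.add_word(w)
    print(t.complete("ar"))    # ['are', 'art']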

4 Result Analysis

In Fig. 4, the y-axis represents the time in seconds and the x-axis represents the number of iterations for the binary tree and trie data structures for the prefix appl. The binary tree approach is better for the prefix appl (Tables 1 and 2).

Fig. 4 Time comparison for prefix appl

Table 1 Time taken for both approaches

Iteration   Time taken in BT (s)   Time taken in trie (s)
1           0.016                  0.031
2           0.015                  0.035
3           0.025                  0.027
4           0.016                  0.026

Table 2 Time comparison for different prefix

Iteration   Time taken in BT (s)   Time taken in trie (s)
1           0.016                  0.030
2           0.016                  0.031
3           0.015                  0.031
4           0.015                  0.026

Fig. 5 Time comparison for prefix Gan

In Fig. 5, the y-axis represents the time in seconds and the x-axis represents the number of iterations for the binary tree and trie data structures for the prefix Gan. The binary tree approach is better for the prefix Gan.

5 Conclusion

Word predictors have applications in messaging applications such as WhatsApp, web search engines, word processors, command-line interpreters, etc. The original need for word-prediction software was to help people with physical disabilities by increasing their typing speed and reducing the number of keystrokes needed to complete a word or a sentence. On this front, this project has developed a word-prediction program using the binary tree data structure, which increases the efficiency of the user by at least 10%.


References 1. Sturm, J.M., Rankin-Erickson, J.L.: This report that mind mapping helps students with learning disabilities to enhance their writing skills. Learn. Disabilities Res. Practice 17, 124–139 (2002) 2. Todman, J., Dugard, P.: Single-Case and Small-N Experimental Designs: A Practical Adviser to Randomization Tests. Lawrence Erlbaum Associates, Mahwah, NJ (2001) 3. Tumlin, J., Heller, K.: Using word prediction software, writing becomes more easier to mild disabilities. J. Special Educ. Technol. 19(3) (2004). https://jset.unlv.edu/19.3/tumlin/first.html 4. Weller, H.G.: Evaluating the effect of computer-based methods to support science teaching. J. Res. Comput. Educ. 28, 461–485 (1996) 5. Zhang, Y.: Technology and the writing skills of students with learning disabilities. J. Res. Comput. Educ. 32, 467–478 (2000) 6. Basu, S., Kannayaram, G., Ramasubbareddy, S., Venkatasubbaiah, C.: Improved genetic algorithm for monitoring of virtual machines in cloud environment. In: Smart Intelligent Computing and Applications, pp. 319–326. Springer, Singapore (2019) 7. Somula, R., Sasikala, R.: Round robin with load degree: an algorithm for optimal cloudlet discovery in mobile cloud computing. Scal. Comput. Practice Exper. 19(1), 39–52 (2018) 8. Somula, R., Anilkumar, C., Venkatesh, B., Karrothu, A., Kumar, C. P., Sasikala, R.: Cloudlet services for healthcare applications in mobile cloud computing. In: Proceedings of the 2nd International Conference on Data Engineering and Communication Technology, pp. 535–543. Springer, Singapore (2019) 9. Somula, R.S., Sasikala, R.: A survey on mobile cloud computing: mobile computing+ cloud computing (MCC= MC + CC). Scal. Comput. Pract. Experi. 19(4), 309–337 (2018) 10. Somula, R., Sasikala, R.: A load and distance aware cloudlet selection strategy in multi-cloudlet environment. Int. J. Grid High Perform. Comput. (IJGHPC) 11(2), 85–102 (2019) 11. Somula, R., Sasikala, R.: A honey bee inspired cloudlet selection for resource allocation. In: Smart Intelligent Computing and Applications, pp. 335–343. Springer, Singapore (2019) 12. Nalluri, S., Ramasubbareddy, S., Kannayaram, G.: Weather prediction using clustering strategies in machine learning. J. Comput. Theor. Nanosci. 16(5–6), 1977–1981 (2019) 13. Sahoo, K.S., Tiwary, M., Mishra, P., Reddy, S.R.S., Balusamy, B., Gandomi, A.H.: Improving end-users utility in software-defined wide area network systems. In: IEEE Transactions on Network and Service Management (2019) 14. Sahoo, K.S., Tiwary, M., Sahoo, B., Mishra, B.K., RamaSubbaReddy, S., Luhach, A.K.: RTSM: response time optimisation during switch migration in software-defined wide area network. In: IET Wireless Sensor Systems (2019) 15. Somula, R., Kumar, K.D., Aravindharamanan, S., Govinda, K.: Twitter sentiment analysis based on US presidential election 2016. In: Smart Intelligent Computing and Applications, pp. 363–373. Springer, Singapore (2020) 16. Sai, K.B.K., Subbareddy, S.R., Luhach, A.K.: IOT based air quality monitoring system using MQ135 and MQ7 with machine learning analysis. Scal. Comput. Practice Experi. 20(4), 599– 606 (2019) 17. Somula, R., Narayana, Y., Nalluri, S., Chunduru, A., Sree, K.V.: POUPR: properly utilizing user-provided recourses for energy saving in mobile cloud computing. In: Proceedings of the 2nd International Conference on Data Engineering and Communication Technology, pp. 585– 595. Springer, Singapore (2019) 18. 
Vaishali, R., Sasikala, R., Ramasubbareddy, S., Remya, S., Nalluri, S.: Genetic algorithm based feature selection and MOE Fuzzy classification algorithm on Pima Indians Diabetes dataset. In: 2017 International Conference on Computing Networking and Informatics (ICCNI), pp. 1–5. IEEE (2017, Oct)


19. Somula, R., Sasikala, R.: A research review on energy consumption of different frameworks in mobile cloud computing. In: Innovations in Computer Science and Engineering, pp. 129–142. Springer, Singapore (2019); Kumar, I.P., Sambangi, S., Somukoa, R., Nalluri, S., Govinda, K.: Server security in cloud computing using block-chaining technique. In: Data Engineering and Communication Technology, pp. 913–920. Springer, Singapore (2020) 20. Kumar, I.P., Gopal, V.H., Ramasubbareddy, S., Nalluri, S., Govinda, K.: Dominant color palette extraction by K-means clustering algorithm and reconstruction of image. In: Data Engineering and Communication Technology, pp. 921–929. Springer, Singapore (2020) 21. Nalluri, S., Saraswathi, R.V., Ramasubbareddy, S., Govinda, K., Swetha, E.: Chronic heart disease prediction using data mining techniques. In: Data Engineering and Communication Technology, pp. 903–912. Springer, Singapore (2020) 22. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: Task scheduling based on hybrid algorithm for cloud computing. In: International Conference on Intelligent Computing and Smart Communication 2019, pp. 415–421. Springer, Singapore (2020) 23. Srinivas, T.A.S., Ramasubbareddy, S., Govinda, K., Manivannan, S.S.: Web image authentication using embedding invisible watermarking. In: International Conference on Intelligent Computing and Smart Communication 2019, pp. 207–218. Springer, Singapore (2020) 24. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: A unified platform for crisis mapping using web enabled crowdsourcing powered by knowledge management. In: International Conference on Intelligent Computing and Smart Communication 2019, pp. 195–205. Springer, Singapore (2020) 25. Saraswathi, R.V., Nalluri, S., Ramasubbareddy, S., Govinda, K., Swetha, E.: Brilliant corp yield prediction utilizing internet of things. In: Data Engineering and Communication Technology, pp. 893–902. Springer, Singapore (2020) 26. Baliarsingh, S.K., Vipsita, S., Gandomi, A.H., Panda, A., Bakshi, S., Ramasubbareddy, S.: Analysis of high-dimensional genomic data using map reduce based probabilistic neural network. Comput. Methods Progr. Biomed. 105625 (2020) 27. Lavanya, V., Ramasubbareddy, S., Govinda, K.: Fuzzy keyword matching using N-gram and cryptographic approach over encrypted data in cloud. In: Embedded Systems and Artificial Intelligence, pp. 551–558. Springer, Singapore (2020) 28. Revathi, A., Kalyani, D., Ramasubbareddy, S., Govinda, K.: Critical review on course recommendation system with various similarities. In: Embedded Systems and Artificial Intelligence, pp. 843–852. Springer, Singapore (2020) 29. Mahesh, B., Kumar, K.P., Ramasubbareddy, S., Swetha, E.: A review on data deduplication techniques in cloud. In: Embedded Systems and Artificial Intelligence, pp. 825–833. Springer, Singapore (2020) 30. Sathish, K., Ramasubbareddy, S., Govinda, K.: Detection and localization of multiple objects using VGGNet and single shot detection. In: Emerging Research in Data Engineering Systems and Computer Communications, pp. 427–439. Springer, Singapore (2020) 31. Pradeepthi, C., Geetha, V.V., Ramasubbareddy, S., Govinda, K.: Prediction of real estate price using clustering techniques. In: Emerging Research in Data Engineering Systems and Computer Communications, pp. 281–289. Springer, Singapore (2020) 32. Maddila, S., Ramasubbareddy, S., Govinda, K.: Crime and fraud detection using clustering techniques. In: Innovations in Computer Science and Engineering, pp. 135–143. Springer, Singapore (2020) 33. 
Rakshitha, K., Rao, A.S., Sagar, Y., Ramasubbareddy, S.: Demonstrating broadcast aggregate keys for data sharing in cloud. In: Innovations in Computer Science and Engineering, pp. 185– 193. Springer, Singapore (2020) 34. Ramasubbareddy, S., Srinivas, T.A.S., Govinda, K., Manivannan, S.S.: Comparative study of clustering techniques in market segmentation. In: Innovations in Computer Science and Engineering, pp. 117–125. Springer, Singapore (2020) 35. Ramasubbareddy, S., Srinivas, T.A.S., Govinda, K., Manivannan, S.S.: Crime prediction system. In: Innovations in Computer Science and Engineering, pp. 127–134. Springer, Singapore (2020)


36. Sahoo, K.S., Tiwary, M., Sahoo, S., Nambiar, R., Sahoo, B., Dash, R.: A learning automatabased DDoS attack defense mechanism in software defined networks. In: Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, pp. 795–797 (2018, Oct) 37. Sahoo, K.S., Sahoo, S., Sarkar, A., Sahoo, B., Dash, R.: On the placement of controllers for designing a wide area software defined networks. In: TENCON 2017–2017 IEEE Region 10 Conference, pp. 3123–3128. IEEE (2017, Nov)

Fruit Recognition Using Deep Learning P. Balakesava Reddy, Somula Ramasubbareddy, D. Saidulu, and K. Govinda

Abstract This paper discusses fruit classification using data collected from the Fruits_360 dataset. Using this data, a neural network is trained to identify the fruit, and a neural-network system is built using deep learning and image-processing concepts. The proposed work uses convolutional neural networks to build the model and also uses ResNet to obtain image-classification results with deep learning. To meet the resource requirements of the work, the Google Cloud Vision API is used, which provides the GPU needed to analyse the data in the images; the paper also discusses in depth how image classification is done using deep learning concepts. The deep learning model built here classifies a given image into one of nine categories: Apple, Avocado, Banana, Cherry, Cocos, Kiwi, Mango, Orange, Lemon. The model can also be implemented in a mobile version.

Keywords Deep learning · ResNet · Prediction · Network

P. Balakesava Reddy · S. Ramasubbareddy (B)
Information Technology, VNRVJIET, Hyderabad, Telangana, India
e-mail: [email protected]
D. Saidulu
Information Technology, Guru Nanak Institutions Technical Campus, Hyderabad, Telangana, India
K. Govinda
SCOPE, VIT University, Vellore, Tamilnadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_7

1 Introduction

Convolutional neural networks are designed after the neural networks of the human brain. The human brain needs many real-life experiences before it recognizes a situation again and produces the appropriate response [1]. Similarly, here, training data is given to the convolutional network, and this data is used to train the network for the later validation of images [2]; in this sense, the convolutional network concept is similar to human neural networks and is designed in the same manner. In the testing process, the given image is read and compared with many other images to find the image nearest to it based on probabilities. The network processes the whole image set up to the last image even if it has already found an accurate match, so feeding it poor samples produces a more complicated network, and achieving highly accurate output becomes more difficult and sometimes impossible [3]. In convolution, images fall into two groups: a black-and-white image forms a 2D array, while a coloured image forms a 3D array. Because they are different, the values assigned to the pixels differ when they are given to the CNN [4]. For a black-and-white image, each pixel is assigned a value between 0 and 255 representing its intensity, whereas a coloured image is a combination of red, green and blue, with a separate layer for each colour and each channel ranging from 0 to 255; for example, a pixel value of (255, 105, 180) defines a pink pixel in the image [5]. From this we know the colour of the given image, which is taken as the input of the network. First, the boundaries of the image are set by detecting its edges: the data inside the boundaries is sent as 1s and the remaining part of the image as 0s, which marks the location of the fruit boundaries. A 3 × 3 matrix is then used as a feature detector (a kernel) that scans the image and extracts the data. The detector is placed so that the cell in its first row and first column sits inside the boundary of the selected image region; it is then moved across to the other end and down to the next row, which produces the feature map. The feature map reduces the number of pixels, shrinking the input image so that it takes less time to process. The larger the stride, the smaller the feature map, and the smaller the feature map, the lower the accuracy; some information is also lost, which can reduce accuracy. However, the purpose is to take only the main content of the image and remove the extra parts, which also improves performance: reducing the input image means concentrating only on the main features of the input, so detection focuses on the important data instead of useless data in the image. The rectified linear unit maintains the non-linearity of the image during the convolution operation by removing the negative elements and keeping only the positive values of the data [3]. Besides the regular feature map, there is a pooled feature map, whose processing differs: a 2 × 2 box is placed at the top-left corner, as with the 3 × 3 detector, and moved towards the opposite end of each row. Using a stride of 2 pixels results in a 3 × 3 pooled feature map, and strides of 2 are very commonly used. The whole point of pooling is to account for small distortions in the image. Using these techniques, the model can be created [6–10].
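A small numerical sketch (not from the paper) of the two operations just described: a 3 × 3 feature detector slid across an image, a ReLU, and 2 × 2 max pooling with a stride of 2. The toy image and kernel values are arbitrary.

    import numpy as np

    def convolve2d(image, kernel, stride=1):
        """Valid convolution (cross-correlation, as in most CNN layers)."""
        kh, kw = kernel.shape
        out_h = (image.shape[0] - kh) // stride + 1
        out_w = (image.shape[1] - kw) // stride + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
                out[i, j] = np.sum(patch * kernel)
        return out

    def max_pool(feature_map, size=2, stride=2):
        out_h = (feature_map.shape[0] - size) // stride + 1
        out_w = (feature_map.shape[1] - size) // stride + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                out[i, j] = feature_map[i * stride:i * stride + size,
                                        j * stride:j * stride + size].max()
        return out

    image = np.random.randint(0, 256, size=(6, 6)).astype(float)              # toy grayscale image
    kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)      # simple edge detector
    fmap = np.maximum(convolve2d(image, kernel), 0)   # ReLU keeps only positive responses
    print(max_pool(fmap).shape)    # 6x6 -> 4x4 feature map -> 2x2 pooled map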


2 Literature Survey Previous work has applied neural network and deep learning concepts to image classification. The papers reviewed below discuss counting fruits of various kinds from a given bunch. One work aims at locating and counting red and green pepper fruits among a large bunch of fruits; around 28,000 images of various plants were used for training and validation. The process involves two steps: processing a single image, and then integrating all the views to obtain an accurate count. Convolutional neural network concepts are used there as well; the network is trained using RGB images, which are further normalized in two dimensions. Another paper, on apple production prediction, shows how to obtain the edges and cross-sectional area of the fruits, including the cross-sectional area of the ripened part. The damaged part of a fruit can be detected based on its texture and color and compared with other testing data using the k-nearest neighbor algorithm, which predicts the accuracy. Similar techniques are also used in face detection and vehicle detection based on linear projections and image analysis [1–4]. The present model uses the concepts of a convolutional neural network (CNN) with five kinds of layers: a convolution layer, followed by a rectified linear unit (ReLU) layer, a pooling layer, fully connected layers, and finally a loss layer. RGB images with a pixel size of 100 × 100 are used [5]. Convolution is an operation over two functions that produces a third function; the result carries the characteristics of the two functions used to derive it. The convolution layer does the same: the input image data is convolved to form a resulting function that feeds the prediction of the output [11–15]. A ReLU layer is used to increase the non-linear properties of the input data. Pooling layers are used to reduce the dimensions and the number of computations; a 2 × 2 pooling filter with a stride of 2 is used, which reduces the input to one fourth of its size. The layers from a regular neural network are called fully connected layers; the connections between the neurons of one layer and the next are made here [15–20].
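As a rough, hedged illustration of the five-layer pipeline just outlined (convolution, ReLU, pooling, fully connected layers and a loss layer) for 100 × 100 RGB inputs, a Keras sketch could look as follows; the filter count, dense size and class count are illustrative assumptions rather than values taken from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 7  # illustrative assumption

model = models.Sequential([
    layers.Input(shape=(100, 100, 3)),                  # 100 x 100 RGB input
    layers.Conv2D(16, (3, 3), activation="relu"),       # convolution + ReLU layer
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),   # 2 x 2 pooling with stride 2
    layers.Flatten(),
    layers.Dense(256, activation="relu"),               # fully connected layer
    layers.Dense(num_classes, activation="softmax"),    # class probabilities
])

# The "loss layer" corresponds to the loss attached at compile time.
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```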

3 Proposed Method All the images in the dataset were pre-processed using the TensorFlow ImageDataGenerator. Since there are some null entries for age, they are filled with the mean age in the dataset. To avoid overfitting and exploding or vanishing gradients, all the images in the dataset were normalized by subtracting their mean and dividing by their standard deviation. Since deep learning models can only be fed numeric data, we cannot directly feed the raw images to the model, so we convert the images into numeric tensors.


These numeric tensors are multi-dimensional arrays, usually three-dimensional. Gray-scale images are two-dimensional, with pixel values between 0 and 255, whereas colored images are three-dimensional; the third dimension is the color dimension, which generally has three channels, red, green and blue, called the RGB channels. As with gray-scale images, the pixel values in these channels vary between 0 and 255. Accordingly, all the images in the dataset are converted to numeric tensors with multiple channels. The training and testing data were split in an 80:20 ratio.

In the present paper, the size of the training set is increased by data augmentation: parameters such as rotation range, vertical flip, horizontal flip, height shift range, width shift range and zoom range are applied to the images. With these parameters, each image is randomly cropped, rotated, zoomed and flipped at different angles, so the model does not see repeated copies of the base image but instead learns better by observing the image in different ways; hence there is little chance of overfitting the model through data augmentation. The parameter values used in this paper are: rotation range = 0.5, zoom range = 0.5, width shift range = 0.5, height shift range = 0.5, horizontal flip = False, vertical flip = False. Normalizing the data is very important in deep learning, because the training data contains feature values with very different ranges of distribution, and the learning-rate-driven update in each dimension differs from the others; we may be greatly increasing the correction in one weight dimension while decreasing it in another, and training also takes a long time without normalization. In this paper, we normalized the training and test data by subtracting their mean values and then dividing by their standard deviation, so the values are brought onto a comparable scale. The speed of training also increases, since gradient updates become easier, and the accuracy gradually improves [21–25].

The proposed method uses convolutional neural networks, which play a major role in solving problems related to computer vision. Convolutional neural networks are similar to general neural networks and consist of an input, hidden layers and an output. The activation function used in this work is the rectified linear unit (ReLU). The layers following the convolution layers are max pooling layers, which help reduce the dimensions, preserve spatial invariance and output the high-level features of the input. To avoid overfitting and gradient explosion or vanishing, some dropout layers are added in between. Since the model is mainly built from convolutional layers and an output over seven classes is needed, the layers are flattened at the end of the model and some dense layers are added along with dropout layers; the final layer is a softmax layer, since the data contains multiple classes.
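One plausible way to express the augmentation and normalization described above with the Keras ImageDataGenerator named in the text is sketched below; the directory layout and batch size are assumptions, the per-sample mean/std normalization is one interpretation of the normalization described, and the augmentation parameter values are the ones listed in this section.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation parameters as listed in the text; directory layout is assumed.
train_datagen = ImageDataGenerator(
    rotation_range=0.5,
    zoom_range=0.5,
    width_shift_range=0.5,
    height_shift_range=0.5,
    horizontal_flip=False,
    vertical_flip=False,
    validation_split=0.2,                # 80:20 train/validation split
    samplewise_center=True,              # subtract the mean of each image
    samplewise_std_normalization=True,   # divide by the standard deviation
)

train_gen = train_datagen.flow_from_directory(
    "fruits/train", target_size=(100, 100),
    batch_size=32, class_mode="categorical", subset="training")
val_gen = train_datagen.flow_from_directory(
    "fruits/train", target_size=(100, 100),
    batch_size=32, class_mode="categorical", subset="validation")
```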


Fig. 1 Convolution neural network using ResNet flow diagram

Since there are seven classes, the final layer is built with seven neurons. In the output, we get seven probability values; each neuron outputs the probability of the input image belonging to its class, and the values of the seven neurons sum to one (Fig. 1). The proposed method uses the TensorFlow framework for designing, training and evaluating the deep learning model. The model is built mainly using transfer learning, i.e., a pre-trained model that was previously trained on a large dataset is reused to build our model. The pre-trained model used here is ResNet, which was trained on the ImageNet dataset of about 14 million images covering more than 1000 classes (categories), and its weights were saved after training. At the start of training, these ImageNet weights are assigned to the pre-trained model, and on top of it some customized convolutional and max pooling layers are added. At first, the model is trained with the weights of the pre-trained model frozen, meaning the pre-trained part keeps its initial weights and they are not updated; during this phase only the parameters of the added layers are updated. The performance of the model improves after unfreezing the weights, because the whole model is then trained on the particular dataset provided by us. The parameters of the model are updated by back-propagation: after the loss between the actual and predicted output is calculated, the parameters are updated with respect to the loss value [26–30]. The loss between the actual and predicted output is calculated with the categorical cross-entropy function, and the optimizer used to update the parameters is RMSprop with an initial learning rate of 0.0001.


Table 1 Network structure parameters

Layer                  | Dimension parameter | Output
Convolutional layer    | 3 × 3 × 4           | 16
Max pooling            | 2 × 2, stride 2     | –
Convolutional layer    | 3 × 3 × 16          | 32
Max pooling            | 2 × 2, stride 2     | –
Convolutional layer    | 3 × 3 × 32          | 64
Max pooling            | 2 × 2, stride 2     | –
Convolutional layer    | 3 × 3 × 64          | 128
Max pooling            | 2 × 2, stride 2     | –
Fully connected layer  | 3 × 3 × 128         | 1024
Fully connected layer  | 1024                | 256
Softmax                | 256                 | 60

As the pre-trained convolutional model and the added artificial neural network layers are combined, the model becomes more complex and sophisticated during the beginning stage of creating the convolutional neural network. Here, the artificial neural network plays a vital role in making the convolutional network more capable of categorizing an image: it takes the extracted data, integrates the features and makes the convolutional network more efficient. Since the data contains many classes, the softmax function has to be used [31–37] (Table 1).
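The transfer-learning setup described in this section can be sketched in TensorFlow/Keras as below; this is not the authors' code. The frozen ResNet base with ImageNet weights, the seven-class softmax head, the categorical cross-entropy loss and the RMSprop optimizer with learning rate 0.0001 follow the text, while the exact added layers and the learning rate used after unfreezing are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import ResNet50

# Pre-trained ResNet base with ImageNet weights, without its original classifier head.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(100, 100, 3))
base.trainable = False  # first training phase: freeze the pre-trained weights

model = models.Sequential([
    base,
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),  # added custom layers (assumed sizes)
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),  # seven output classes
])

model.compile(
    optimizer=optimizers.RMSprop(learning_rate=1e-4),  # initial learning rate 0.0001
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# After the added layers converge, the base can be unfrozen and training continued,
# e.g. with a smaller (assumed) learning rate:
# base.trainable = True
# model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-5),
#               loss="categorical_crossentropy", metrics=["accuracy"])
```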

4 Result 4.1 Validation and Testing Results After Each Epoch

Epoch 31/40 95/95 [==============================]—3 s 31 ms/step—loss: 0.0558—acc: 0.9784—val_loss: 0.0126—val_acc: 0.9959
Epoch 32/40 95/95 [==============================]—3 s 31 ms/step—loss: 0.0602—acc: 0.9764—val_loss: 0.0034—val_acc: 0.9995
Epoch 33/40 95/95 [==============================]—3 s 30 ms/step—loss: 0.0603—acc: 0.9763—val_loss: 0.0618—val_acc: 0.9808
Epoch 34/40 95/95 [==============================]—3 s 31 ms/step—loss: 0.0544—acc: 0.9785—val_loss: 0.0254—val_acc: 0.9906
Epoch 35/40 95/95 [==============================]—3 s 31 ms/step—loss: 0.0703—acc: 0.9720—val_loss: 0.0472—val_acc: 0.9837


Fig. 2 Accuracy after 40 epochs

Epoch 36/40 95/95 [==============================]—3 s 31 ms/step—loss: 0.0551—acc: 0.9788—val_loss: 0.0671—val_acc: 0.9787
Epoch 37/40 95/95 [==============================]—3 s 30 ms/step—loss: 0.0510—acc: 0.9800—val_loss: 0.0124—val_acc: 0.9925
Epoch 38/40 95/95 [==============================]—3 s 32 ms/step—loss: 0.0558—acc: 0.9781—val_loss: 0.0173—val_acc: 0.9934
Epoch 39/40 95/95 [==============================]—3 s 30 ms/step—loss: 0.0481—acc: 0.9810—val_loss: 0.0038—val_acc: 0.9979
Epoch 40/40 95/95 [==============================]—3 s 31 ms/step—loss: 0.0407—acc: 0.9839—val_loss: 0.0417—val_acc: 0.9889
(Figs. 2 and 3).

5 Conclusion An effective algorithm for detecting and tracking objects has been explained, along with its drawbacks and efficiency, with the aim of overcoming the issues of recognition and of tracking related to movement and appearance. A major application of fruit detection can be observed in vision-based AI systems, where the identification and tracking of individual objects play a major role. For any fruit tracking algorithm, the initial step is to locate the fruit in the respective frame; although there are numerous algorithms, choosing the accurate location of the fruit has been a difficult task.


Fig. 3 Confusion matrix

A CNN is widely used by all kinds of researchers for fruit detection. Tracking follows the fruit detection, and there are many algorithms available to track the objects. For future work, the same algorithm can be applied to more and different kinds of objects, with more advanced filters for noise reduction.

References 1. O’Shea, K., Nash, R.: An Introduction to Convolutional Neural Networks. ArXiv e-prints (2015) 2. Albawi, S., Abed Mohammed, T., Alzawi, S.: Understanding of a Convolutional Neural Network (2017). https://doi.org/10.1109/ICEngTechnol.2017.8308186 3. Khan, A., Sohail, A., Zahoora, U., Saeed, A.: A Survey of the Recent Architectures of Deep Convolutional Neural Networks (2019) 4. Zhang, F., Hu, M.: Memristor-Based Deep Convolution Neural Network: A Case Study (2018) 5. Bambharolia, P.: Overview of convolutional neural networks (2017) 6. Basu, S., Kannayaram, G., Ramasubbareddy, S., Venkatasubbaiah, C.: Improved genetic algorithm for monitoring of virtual machines in cloud environment. In: Smart Intelligent Computing and Applications, pp. 319–326. Springer, Singapore (2019) 7. Somula, R., Sasikala, R.: Round robin with load degree: an algorithm for optimal cloudlet discovery in mobile cloud computing. Scal. Comput. Practice Exper. 19(1), 39–52 (2018) 8. Somula, R., Anilkumar, C., Venkatesh, B., Karrothu, A., Kumar, C.P., Sasikala, R.: Cloudlet services for healthcare applications in mobile cloud computing. In: Proceedings of the 2nd International Conference on Data Engineering and Communication Technology, pp. 535–543. Springer, Singapore (2019) 9. Somula, R.S., Sasikala, R.: A survey on mobile cloud computing: mobile computing + cloud computing (MCC= MC + CC). Scal. Comput. Practice Experi. 19(4), 309–337 (2018)


10. Somula, R., Sasikala, R.: A load and distance aware cloudlet selection strategy in multi-cloudlet environment. Int. J. Grid High Perform. Comput. (IJGHPC) 11(2), 85–102 (2019) 11. Somula, R., Sasikala, R.: A honey bee inspired cloudlet selection for resource allocation. In: Smart Intelligent Computing and Applications, pp. 335–343. Springer, Singapore (2019) 12. Nalluri, S., Ramasubbareddy, S., Kannayaram, G.: Weather prediction using clustering strategies in machine learning. J. Comput. Theor. Nanosci. 16(5–6), 1977–1981 (2019) 13. Sahoo, K.S., Tiwary, M., Mishra, P., Reddy, S.R.S., Balusamy, B., Gandomi, A.H.: Improving end-users utility in software-defined wide area network systems. In: IEEE Transactions on Network and Service Management 14. Sahoo, K.S., Tiwary, M., Sahoo, B., Mishra, B.K., RamaSubbaReddy, S., Luhach, A.K.: RTSM: response time optimisation during switch migration in software-defined wide area network. In: IET Wireless Sensor Systems 15. Somula, R., Kumar, K.D., Aravindharamanan, S., Govinda, K.: Twitter sentiment analysis based on US presidential election 2016. In: Smart Intelligent Computing and Applications, pp. 363–373. Springer, Singapore (2020) 16. Sai, K.B.K., Subbareddy, S.R., Luhach, A.K.: IOT based air quality monitoring system using MQ135 and MQ7 with machine learning analysis. Scal. Comput. Practice Experi. 20(4), 599– 606 (2019) 17. Somula, R., Narayana, Y., Nalluri, S., Chunduru, A., Sree, K.V.: POUPR: properly utilizing user-provided recourses for energy saving in mobile cloud computing. In: Proceedings of the 2nd International Conference on Data Engineering and Communication Technology, pp. 585– 595. Springer, Singapore (2019) 18. Vaishali, R., Sasikala, R., Ramasubbareddy, S., Remya, S., Nalluri, S.: Genetic algorithm based feature selection and MOE Fuzzy classification algorithm on Pima Indians diabetes dataset. In: 2017 International Conference on Computing Networking and Informatics (ICCNI), pp. 1–5. IEEE (2017, Oct) 19. Somula, R., Sasikala, R.: A research review on energy consumption of different frameworks in mobile cloud computing. In: Innovations in Computer Science and Engineering, pp. 129–142. Springer, Singapore (2019); Kumar, I.P., Sambangi, S., Somukoa, R., Nalluri, S., Govinda, K.: Server security in cloud computing using block-chaining technique. In: Data Engineering and Communication Technology, pp. 913–920. Springer, Singapore (2020) 20. Kumar, I.P., Gopal, V.H., Ramasubbareddy, S., Nalluri, S., Govinda, K.: Dominant color palette extraction by K-means clustering algorithm and reconstruction of image. In: Data Engineering and Communication Technology, pp. 921–929. Springer, Singapore (2020) 21. Nalluri, S., Saraswathi, R.V., Ramasubbareddy, S., Govinda, K., Swetha, E.: Chronic heart disease prediction using data mining techniques. In: Data Engineering and Communication Technology, pp. 903–912. Springer, Singapore (2020) 22. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: Task scheduling based on hybrid algorithm for cloud computing. In: International Conference on Intelligent Computing and Smart Communication 2019, pp. 415–421. Springer, Singapore (2020) 23. Srinivas, T.A.S., Ramasubbareddy, S., Govinda, K., Manivannan, S.S.: Web image authentication using embedding invisible watermarking. In: International Conference on Intelligent Computing and Smart Communication 2019, pp. 207–218. Springer, Singapore (2020) 24. 
Krishna, A.V., Ramasubbareddy, S., Govinda, K.: A unified platform for crisis mapping using web enabled crowdsourcing powered by knowledge management. In: International Conference on Intelligent Computing and Smart Communication 2019, pp. 195–205. Springer, Singapore (2020) 25. Saraswathi, R.V., Nalluri, S., Ramasubbareddy, S., Govinda, K., Swetha, E.: Brilliant corp yield prediction utilizing internet of things. In: Data Engineering and Communication Technology, pp. 893–902. Springer, Singapore (2020) 26. Baliarsingh, S.K., Vipsita, S., Gandomi, A.H., Panda, A., Bakshi, S., Ramasubbareddy, S.: Analysis of high-dimensional genomic data using mapreduce based probabilistic neural network. Comput. Methods Progr. Biomed. 105625 (2020)


27. Lavanya, V., Ramasubbareddy, S., Govinda, K.: Fuzzy keyword matching using N-gram and cryptographic approach over encrypted data in cloud. In: Embedded Systems and Artificial Intelligence, pp. 551–558. Springer, Singapore (2020) 28. Revathi, A., Kalyani, D., Ramasubbareddy, S., Govinda, K.: Critical review on course recommendation system with various similarities. In: Embedded Systems and Artificial Intelligence, pp. 843–852. Springer, Singapore (2020) 29. Mahesh, B., Kumar, K.P., Ramasubbareddy, S., Swetha, E.: A review on data deduplication techniques in cloud. In: Embedded Systems and Artificial Intelligence, pp. 825–833. Springer, Singapore (2020) 30. Sathish, K., Ramasubbareddy, S., Govinda, K.: Detection and localization of multiple objects using VGGNet and single shot detection. In: Emerging Research in Data Engineering Systems and Computer Communications, pp. 427–439. Springer, Singapore (2020) 31. Pradeepthi, C., Geetha, V.V., Ramasubbareddy, S., Govinda, K.: Prediction of real estate price using clustering techniques. In: Emerging Research in Data Engineering Systems and Computer Communications, pp. 281–289. Springer, Singapore (2020) 32. Maddila, S., Ramasubbareddy, S., Govinda, K.: Crime and fraud detection using clustering techniques. In: Innovations in Computer Science and Engineering, pp. 135–143. Springer, Singapore (2020) 33. Rakshitha, K., Rao, A.S., Sagar, Y., Ramasubbareddy, S.: Demonstrating broadcast aggregate keys for data sharing in cloud. In: Innovations in Computer Science and Engineering, pp. 185– 193. Springer, Singapore (2020) 34. Ramasubbareddy, S., Srinivas, T.A.S., Govinda, K., Manivannan, S.S.: Comparative study of clustering techniques in market segmentation. In: Innovations in Computer Science and Engineering, pp. 117–125. Springer, Singapore (2020) 35. Ramasubbareddy, S., Srinivas, T.A.S., Govinda, K., Manivannan, S.S.: Crime prediction system. In: Innovations in Computer Science and Engineering, pp. 127–134. Springer, Singapore (2020) 36. Sahoo, K.S., Tiwary, M., Sahoo, S., Nambiar, R., Sahoo, B., Dash, R.: A learning automatabased DDoS attack defense mechanism in software defined networks. In: Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, pp. 795–797 (2018, Oct) 37. Sahoo, K.S., Sahoo, S., Sarkar, A., Sahoo, B., Dash, R.: On the placement of controllers for designing a wide area software defined networks. In: TENCON 2017–2017 IEEE Region 10 Conference, pp. 3123–3128. IEEE (2017, Nov)

Cross-Domain Variational Capsules for Information Extraction Akash Nagaraj, K. Akhil, Akshay Venkatesh, and H. R. Srikanth

Abstract In this paper, we present a characteristic extraction algorithm and the Multi-domain Image Characteristics Dataset of characteristic-tagged images to simulate the way a human brain classifies cross-domain information and generates insight. The intent was to identify prominent characteristics in data and use this identification mechanism to auto-generate insight from data in other unseen domains. An information extraction algorithm is proposed which is a combination of Variational Autoencoders (VAEs) and Capsule Networks. Capsule Networks are used to decompose images into their individual features and VAEs are used to explore variations on these decomposed features. Thus, making the model robust in recognizing characteristics from variations of the data. A noteworthy point is that the algorithm uses efficient hierarchical decoding of data which helps in richer output interpretation. Noticing a dearth in the number of datasets that contain visible characteristics in images belonging to various domains, the Multi-domain Image Characteristics Dataset was created and made publicly available. It consists of thousands of images across three domains. This dataset was created with the intent of introducing a new benchmark for fine-grained characteristic recognition tasks in the future. Keywords Machine reasoning · Image information · Capsule networks · Variational autoencoders · Hierarchical decoding.



1 Introduction The machine reasoning domain [1], a part of the machine learning umbrella, deals with extracting information from latent data, decoding it and reasoning out the decisions made by machine learning systems. Machine reasoning is a two-step process; the generation of information and the generation of reasoning from this information. We extract information by training the model on a few domains and testing the model on a new domain. In doing so, the model discovers information from the new domain. Though this might not seem like machine reasoning in the truest sense, it does generate information from latent data. With this paper, we aim to solve a small problem in this vast domain: Simulate the way a human brain classifies cross-domain information and generates insight, by identifying prominent characteristics in data and use this identification mechanism to auto-generate insight from data in unseen domains. A part of machine reasoning is transfer learning [2]. It stores the knowledge gained from tackling one problem and applies it to another problem which is related to the previous problem solved. Our model incorporates transfer learning to transfer latent information across domains, known as Domain Adaptation [3].

1.1 Domain Adaptation Domain adaptation is a field that deals with machine learning as well as transfer learning. Domain Adaptation can be used when the goal is to learn from one source distribution and apply the learning to a different target distribution related to the source. Scenarios, in which there are multiple source distributions present, are called multi-source domain adaptations. Research being done in this field addresses a major issue—the need to determine a model’s capacity to accurately accept data from a given target domain and label that data accordingly. The challenge arises because the model is trained on a different source domain. Unsupervised learning algorithms [4] that are implemented without using domain adaptation assume that the examples are independent and identically distributed.

2 Dataset 2.1 Introduction The dataset introduced in this paper, the Multi-domain Image Characteristic Dataset [5], consists of thousands of images sourced from the internet. Each image falls under one of three domains—animals, birds or furniture. There are five types under each domain. There are 200 images of each type, summing up the total dataset to


3000 images. The master file consists of two columns; the image name and the visible characteristics in that image. Every image was manually analysed and the characteristics for each image was generated, ensuring accuracy. Images falling under the same domain have a similar set of characteristics. For example, pictures under the Birds domain will have a common set of characteristics such as the color of the bird, the presence of a beak, wing, eye, legs, etc. Care has been taken to ensure that each image is as unique as possible by including pictures that have different combinations of visible characteristics present. This includes pictures having variations in the capture angle, etc.

2.2 Why Our Dataset Is Required? At the time of our research, there was a dearth of publicly available datasets that contain visible characteristics in images belonging to various domains. The proposed dataset [5] addresses this, as it has the following features: • describes visible characteristics present in every picture. • contains at least hundreds of pictures belonging to multiple domains, and also contains multiple types within each domain. This is crucial to train our model accurately. • contains unique pictures belonging to a type that fall under a certain domain. This is accomplished by collecting pictures that have different combinations of visible characteristics, different angles in which the object was captured, etc.

2.3 Training and Testing We recommend a test-train split of 600 samples (20%) and 2400 samples (80%). A .txt file with the images to be included in the test and train splits is included, with no overlap between the sets. Following the train-test split as mentioned would help ensure consistency of experiments reported on the Multi-domain Image Characteristics Dataset.
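The paper does not give loader code; purely as an illustration of the recommended 80/20 split, the snippet below assumes hypothetical file names (master.csv, train_split.txt, test_split.txt) for the master file and the provided split lists.

```python
import pandas as pd

# Hypothetical file names: the dataset ships a master file of image names and
# characteristics plus .txt files listing the recommended train/test split.
master = pd.read_csv("master.csv")                 # columns: image_name, characteristics
train_ids = set(open("train_split.txt").read().split())
test_ids = set(open("test_split.txt").read().split())

train_df = master[master["image_name"].isin(train_ids)]   # ~2400 samples (80%)
test_df = master[master["image_name"].isin(test_ids)]     # ~600 samples (20%)
assert train_ids.isdisjoint(test_ids)                      # no overlap between the sets
print(len(train_df), len(test_df))
```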

3 Approach 3.1 Variational Capsules Variational capsules are a combination of capsule networks [6] and variational autoencoders [7]. The capsules generated from capsule networks follow a known prior distribution, and new capsules can be sampled from each of them. They are a


natural fit for the model presented in this paper, as they provide a rich representation of image data and are robust to tiny variations in the decoupled features of the image.

3.2 Cross Domain Variational Capsules Cross-Domain Variational Capsules are an enhancement to Variational Capsules introduced in the previous subsection. After the latent representation of Variational Capsules is generated for the input image data, this representation is fed to the Information Decoder. The Information Decoder performs the hierarchical decoding of the rich latent information available from the capsules. In comparison with traditional decoders, our decoder preserves the hierarchical relationship—constructed by the capsules—between features in the data. It leverages the depth of information (in the form of a vector) available for each feature to construct a multi-hot vector identifying the important characteristics from a vocabulary of words spanning all the domains in scope. The representation can also be leveraged to store cross-domain information to perform information extraction across them. The Cross-domain Variational Capsule model is divided into two parts: Creating the latent representation (Variational Autoencoders and Capsule Networks) and Generating insights from that representation (a tailor-made deep network is used for this). A high-level overview of the model can be seen in Fig. 1.

3.3 The Model Let w[lower, higher] be a matrix where (lower, higher) are the dimensions of lower-level and higher-level capsules respectively. The depth of the vector (dimensions) is achieved by stacking m feature maps together. The vector output of the 32 lower capsules is sent to all the higher-level capsules.

Fig. 1 Model design


Essentially, from the squash function it can be inferred that a lower-level capsule sends information mainly to the higher-level capsule whose centroid is closest to its own output, as this connection is reinforced; it enforces a level of agreement or disagreement between the capsules in different layers. The squash function is

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2}\,\frac{s_j}{\|s_j\|} \qquad (1)$$

3.4 Learning Algorithm A prediction vector $\hat{u}_{j|i}$ is the prediction from capsule $i$ for the output of capsule $j$. If the activity vector $v_j$ is in close agreement with the prediction vector $\hat{u}_{j|i}$, we strengthen the connection $b_{ij}$. This is the routing algorithm introduced in capsule networks. The "agreement" coefficient is

$$a_{ij} = \langle \hat{u}_{j|i},\, v_j \rangle \qquad (2)$$

The routing algorithm runs for a fixed number of inner iterations, which is a hyper-parameter of the capsule network model. A pass starts with $b_{ij} = 0$ for all capsules $i$ in the lower level and corresponding capsules $j$ in the higher level. A normalization function is applied to $b_{ij}$; we define

$$c_{ij} = \mathrm{softmax}(b_{ij}) \qquad (3)$$

An agreement-weighted sum is calculated,

$$s_j = \sum_i c_{ij}\, \hat{u}_{j|i} \qquad (4)$$

After squashing this sum, we get

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2}\, \hat{s}_j \qquad (5)$$

where $\hat{s}_j = s_j/\|s_j\|$ is the corresponding unit vector. Finally, we update the weight of the connection,

$$b_{ij} = b_{ij} + \hat{u}_{j|i} \cdot v_j \qquad (6)$$

This process is performed for all pairs of adjacent capsule layers.
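Purely as an illustration of Eqs. (1)–(6), the NumPy sketch below reimplements squashing and routing-by-agreement in the standard dynamic-routing style (softmax taken over the higher-level capsules); the capsule counts and dimensions are arbitrary and this is not the authors' code.

```python
import numpy as np

def squash(s):
    """Squash function of Eqs. (1)/(5): shrink the vector norm into [0, 1)."""
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + 1e-9)

def softmax(b, axis):
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def routing_by_agreement(u_hat, num_iters=3):
    """Dynamic routing over predictions u_hat[i, j, :] from lower capsule i to higher capsule j."""
    n_lower, n_higher, _ = u_hat.shape
    b = np.zeros((n_lower, n_higher))             # routing logits, input to Eq. (3)
    for _ in range(num_iters):
        c = softmax(b, axis=1)                    # coupling coefficients, Eq. (3)
        s = np.einsum("ij,ijk->jk", c, u_hat)     # agreement-weighted sum, Eq. (4)
        v = squash(s)                             # higher-level capsule outputs, Eq. (5)
        b = b + np.einsum("ijk,jk->ij", u_hat, v) # update connections with agreement, Eq. (6)
    return v

# Toy example: 32 lower capsules predicting 10 higher capsules of dimension 16
u_hat = np.random.randn(32, 10, 16)
v = routing_by_agreement(u_hat)
print(v.shape)  # (10, 16)
```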


3.5 Losses The total loss is defined as

$$TL = \text{Marginal Loss} + \alpha \cdot \text{Capsule Loss} + \beta \cdot \text{KL Divergence Loss} \qquad (7)$$

where α, β, and γ are constants. It is important to note that the Reconstruction Loss is not relevant for our model. However for capsule training purposes, we chose to keep it.

3.5.1 Capsule Loss

The capsule loss $L_c$ for each capsule is

$$L_c = T_c \max(0, m^{+} - \|v_c\|)^2 + \lambda (1 - T_c)\max(0, \|v_c\| - m^{-})^2 \qquad (8)$$

where $T_c$ is 1 if an object of class $c$ is present (if a relevant object is present, the capsule agrees with the lower-level capsule), $m^{+}$ is the threshold for $\|v_c\|$ if $T_c = 1$, $m^{-}$ is the threshold for $\|v_c\|$ if $T_c = 0$, and $\lambda$ is a learning hyper-parameter (the negative-sample loss rate).

3.5.2 Marginal Loss/Hinge Loss

The hinge loss is

$$L_M = \max(0, 1 - t \cdot y) \qquad (9)$$

where t is the target and y is the output.

3.5.3 KL Divergence Loss

Following [8], let $Z$ be a latent variable, $X$ the data distribution, $Q$ the encoder network, $P$ the decoder (generative) network and $E$ the expectation. Then

$$\log P(X) - D_{KL}\big(Q(Z|X)\,\|\,P(Z|X)\big) = E\big(\log P(X|Z)\big) - D_{KL}\big(Q(Z|X)\,\|\,P(Z)\big) \qquad (10)$$

Equation (10) is the variational autoencoder objective function. The left-hand side of the objective can be interpreted as a lower bound on $\log P(X)$, which describes our data; the error is the KL divergence term, which lowers the bound on $P(X)$. The maximum likelihood estimate [9] (MLE) can be obtained by maximizing $\log P(X|Z)$ and minimizing the difference between the true latent distribution $P(Z)$ and a simple Gaussian distribution $Q(Z|X)$.


Variational autoencoders deal with constructing the underlying distribution of the prior. To achieve this, a reparameterization trick is used to reconstruct the distribution from the trained $\mu$ and $\log(\sigma^2)$ of the prior. Log variance is used instead of the true variance $\sigma^2$ as it is less volatile and more numerically stable. $D_{KL}$ is to be minimized toward the prior $P(Z) = N(0, I)$. Let $Q(Z|X)$ be Gaussian with parameters $\mu(x)$ and $\Sigma(x)$; these are the trainable capsules' mean and (log) variance. The $D_{KL}$ between these two distributions is computed in closed form:

$$D_{KL}\big[N(\mu(x), \Sigma(x))\,\|\,N(0, I)\big] = \tfrac{1}{2}\big(\mathrm{tr}(\Sigma(x)) + \mu(x)^{T}\mu(x) - k - \log\det(\Sigma(x))\big) \qquad (11)$$

where $k$ is the dimension of the Gaussian distribution, $\mathrm{tr}(\cdot)$ is the trace (the sum of the diagonal elements) and $\det(\cdot)$ is the determinant. Written per dimension, the loss is

$$L_{KL} = \tfrac{1}{2}\sum_{k}\big(\sigma_k^2(X) + \mu_k^2(X) - 1 - \log \sigma_k^2(X)\big) \qquad (12)$$
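A short TensorFlow sketch of Eq. (12) and of the reparameterization trick mentioned above is given below for illustration; it is not the authors' implementation and assumes the per-capsule mean and log-variance are available as tensors.

```python
import tensorflow as tf

def kl_divergence_loss(mu, log_var):
    """Closed-form KL divergence between N(mu, sigma^2) and N(0, I), Eq. (12)."""
    return 0.5 * tf.reduce_sum(
        tf.exp(log_var) + tf.square(mu) - 1.0 - log_var, axis=-1)

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps
```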

4 Experiments and Results 4.1 Model Evaluation 4.1.1 Metrics The model's objective dictates that it should be tolerant of noisy characteristics but not of missing ones. Because of this unequal weightage given to false positives and false negatives, accuracy is a poor evaluation metric; hence, the model uses recall and precision instead. To achieve the objective, the recall must be high, while the precision may be low.
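To make the metric choice concrete, the short sketch below computes micro-averaged precision and recall over multi-hot characteristic vectors; the toy vectors are invented for illustration and show how noisy-but-complete predictions yield high recall with low precision, matching the behaviour described here.

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Micro-averaged precision and recall over multi-hot characteristic vectors."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy multi-hot ground truth and predictions over a small characteristic vocabulary
y_true = np.array([[1, 0, 1, 0], [0, 1, 1, 0]])
y_pred = np.array([[1, 1, 1, 1], [1, 1, 1, 1]])   # noisy but complete predictions
print(precision_recall(y_true, y_pred))            # low precision, recall = 1.0
```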

4.1.2 Evaluation To evaluate the performance of the Cross-domain Variational Capsule model, we used the Multi-domain Image Characteristics Dataset. We trained the model on three domains: Animals, Birds and Furniture. To test the model, we used cross-validation with a 20–80 test-train split. A simple end-to-end supervised training of image versus characteristics gave poor results. We also made sure that the capsules were trained sufficiently to accurately generate the rich vector representation for each class. Hence, the model is trained on two levels:

• The Variational Capsule setup is a typical Capsule Network with the output capsules duplicated to be the mean and variance for each capsule. It is trained with the image as the input and the classification as output. This setup uses a modified Capsule Routing algorithm to train both sections simultaneously.


• The Information Decoder is a hierarchical neural network (where the nodes in a layer are connected to only its parent in the previous layer). It is trained with the image as input and its corresponding characteristics as output.

4.2 Results The results obtained by our algorithm on the Multi-domain Image Characteristic Dataset are shown in Table 1. As seen, the value of precision is low while the value of recall is high, because recall depicts the capability of the model to identify the relevant characteristics, while precision depicts the proportion of the characteristics identified by the model that are actually correct. Although the accuracy of the model as a whole is quite low (at about 18%), considering precision and recall shows that the model can successfully identify characteristics in image data. A point worth noting: the F1-score is a metric that finds the balance between precision and recall, and it was not a relevant metric to consider in our case, as all the classes had an equal number of data points. A sample output is seen in Fig. 2, showing the probabilities of the characteristics identified in a sample image of a dog from the proposed dataset [5].

Table 1 Model results

Metric    | Value
Recall    | 0.7666
Precision | 0.0024

Fig. 2 Sample output: characteristic identification from a sample image


5 Conclusion A cross-domain information extraction algorithm using Variational Capsules that learns to extract individual characteristics from image data is proposed. The aim of this algorithm is not to improve an existing model but to satisfactorily solve the relatively recent problem of identifying prominent characteristics of data. This algorithm preserves the relationship developed between features in capsules, using hierarchical decoding as opposed to fully-connected layers. It is also very data efficient, working with a limited number of data points on multi-domain information and is also robust to noise owing to the use of Variational Capsules. Our algorithm was evaluated using the Multi-Domain Image Characteristics Dataset, confirming that it successfully extracts characteristics (or information in general) from image data. The algorithm can also work on any form of data supported by capsules. Potential applications of our algorithm are numerous as information extraction is used in a wide number of fields. Image characteristic extraction is also very versatile and is used in a plethora of fields ranging from autonomous driving to astronomy.

5.1 Future Enhancements Future enhancements include experimentation with different data formats (audio, text, etc.) and characteristic recognition methods. Applying the above algorithm to different data formats, and extracting characteristics from the data, we aim to best represent the underlying characteristics of all formats of data. An additional improvement would be to improve the efficiency and speed of the proposed algorithm, drawing inspiration from similar real-time approaches [10].

References 1. Bottou, L.: From machine learning to machine reasoning. Mach. Learn. 94(2), 133–149 (2014) 2. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2009) 3. Sun, B., Feng, J., Saenko, K.: Return of frustratingly easy domain adaptation. In: Thirtieth AAAI Conference on Artificial Intelligence (2016) 4. Barlow, H.B.: Unsupervised learning. Neural Comput. 1(3), 295–311 (1989) 5. Nagaraj, A.K.A., Venkatesh, A.: Multi-domain Image Characteristic Dataset. https://www. kaggle.com/grassknoted/multidomain-image-characteristics-dataset (2020) 6. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Advances in Neural Information Processing Systems, pp. 3856–3866 (2017) 7. Doersch, C.: Tutorial on variational autoencoders. Stat 1050, 13 (2016) 8. Hershey, J.R., Olsen, P.A.: Approximating the kullback Leibler divergence between gaussian mixture models. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing—ICASSP’07, vol. 4, pp. IV–317. IEEE (2007)


9. Myung, I.J.: Tutorial on maximum likelihood estimation. J. Math. Psychol. 47(1), 90–100 (2003) 10. Nagaraj, A., Sood, M., Srinivasa, G.: Real-time automated answer scoring. In: 2018 IEEE 18th International Conference on Advanced Learning Technologies (ICALT), pp. 231–232. IEEE (2018)

Automotive Accident Severity Prediction Using Machine Learning Niva Mohapatra, Shreyanshi Singh, Bhabendu Kumar Mohanta, and Debasish Jena

Abstract Prediction of the automotive accident severity plays a very crucial role in the smart transportation system. The main motive behind our research is to find out the specific features which could affect the vehicle accident severity. In this paper, some of the classification models, specifically logistic regression, artificial neural network, decision tree, k-nearest neighbors and random forest, have been implemented for predicting the accident severity. All the models have been verified, and the experimental results prove that these classification models have attained considerable accuracy. The results of this research can be used in the smart transportation system to predict if the road accident will be slight, severe or fatal, in accordance with the top three features as predicted from the machine learning model. Keywords Logistic regression · Artificial neural network · Decision tree · K-nearest neighbors · Random forest · Machine learning

1 Introduction Road accidents are an increasing cause of concern in today's world. They result in injuries, damage to property and even death, and they also cause heavy monetary losses. Many researchers have tried to examine the significant features that can affect automotive accident severity [1, 2]. The main aim of this research project is the enhancement of the safety of people by extracting the particular features that determine how severe an accident will be.


The authors in [3] proposed "MagTrack" for detecting the road surface condition using machine learning. Classification based on the road accidents dataset is performed using machine learning methodologies. Based on the prediction, people can be made aware of the severity of an impending accident by notifying them through text messages. Applying machine learning (ML) methodologies to the available dataset can help to understand the features that play a critical role in the severity of accidents and help prevent such accidents in the future.

1.1 Organization of the Paper Rest of the sections are arranged as follows. In Sect. 2, some of the related works on the prediction of crash severity are described briefly. The proposed model is presented in Sect. 3. It is then followed by implementation of the model and results analysis in Sect. 4. Further, our paper sums up with a conclusion and any possible future work that can be extended from our research in Sect. 5.

2 Literature Review Predicting the severity of road accidents has been a major challenge globally. Iranitalab et al. [4] showed that ignoring crash costs would result in miscalculation while selecting the correct prediction algorithm; they developed a crash-cost-based approach to compare prediction models of accident severity and investigated various clustering algorithms. Alkheder et al. [5] proposed an artificial neural network algorithm to predict the severity of injuries in road accidents; for better accuracy of the ANN classifier, the dataset was split into three specific clusters using the K-means clustering (KC) algorithm, and the outcomes after clustering revealed a remarkable enhancement in the accuracy of the ANN classifier. Zong et al. [6] compared two ML modeling approaches, Bayesian networks and regression models, and concluded that the Bayesian network is more efficient than regression models for predicting accident severity. Hashmienejad et al. [7] contrived a multi-objective genetic algorithm (GA) for optimizing and identifying protocols in accordance with the metrics of confidence, comprehensibility and support. Kunt et al. [8] predicted accident severity by feeding twelve crash-related features into GA, pattern search and ANN algorithms, and concluded that the ANN algorithm obtained the highest R-value, i.e., ANN provided the best prediction. Surveys on the security and privacy issues of the Internet of Things (IoT) have noted that machine learning could be used for addressing security issues [9, 10]. Although most of the previous works presented the effects of various classification models, there has been no specific contribution that compares the accuracy of five classification models taken together.


Therefore, we have collectively applied these models, and the accuracy of all the above-mentioned algorithms is compared so that the most efficient algorithm for predicting accident severity can be found.

3 Proposed Architecture and Methodology In the architecture depicted in Fig. 1, there is a roadmap that demonstrates the communication among various vehicles, roadside units (RSUs), a base station and a GPS module. All of these are connected within a closed network, which ensures that only authenticated devices are allowed to communicate. The machine learning model is fitted into the vehicle. We have assumed automation in this model, which implies that the vehicle sensors provide proper signals to the ML algorithm. Various ML algorithms are compared and the selected features are clustered using k-means clustering. The most efficient ML model is trained, and it predicts the accident severity: if the inputs are matched, the model predicts 0 (slight), 1 (severe) or 2 (fatal). The predicted accident severity is communicated to the driver beforehand through SMS using the way2sms API. The workflow is shown in the flowchart of Fig. 2. Firstly, data preprocessing takes place, which includes merging datasets, dropping rows and columns that contain null values, resampling unbalanced data by oversampling and undersampling to avoid bias, creating dummies out of categorical variables and dropping variables containing the same information. Standardization is done to transform the input features onto a comparable scale, and it is performed before PCA to prevent input features with higher or wider ranges from illegitimately dominating those with low variance.

Fig. 1 Architecture of the accident severity prediction model


Fig. 2 Flowchart of the accident severity prediction model

K-fold cross-validation is used to produce a less biased result; here, ten splits have been chosen. The random forest model [11] can handle datasets with higher dimensionality. The decision tree algorithm is used since it is quite resistant to outliers [12]. The artificial neural network algorithm [13] is used as it needs less statistical training. The KNN algorithm classifies a new data point based on its similarity to the available data points [14]. In our research, we have also used multinomial LR classification, which can deal with three or more classes [15]. The mean accuracy and standard deviation of these five classification models are compared. The selected features are then clustered using k-means clustering, and the most accurate ML model is trained on the clustered data to predict the severity of the impending accident.
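A rough scikit-learn sketch of the comparison just described (standardization, PCA, ten-fold cross-validation, and the five classifiers) is shown below; it is not the authors' code, the data is a placeholder, the PCA variance fraction is an assumption, and scikit-learn's MLPClassifier stands in for the ANN used in the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# X: preprocessed accident features, y: severity class (0 slight, 1 severe, 2 fatal)
X, y = np.random.rand(300, 20), np.random.randint(0, 3, 300)   # placeholder data

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),  # multinomial with the lbfgs solver
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(),
    "ANN (MLP stand-in)": MLPClassifier(max_iter=1000),
}

cv = KFold(n_splits=10, shuffle=True, random_state=42)  # ten-fold cross-validation
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95), clf)
    scores = cross_val_score(pipe, X, y, cv=cv)
    print(f"{name}: mean acc {scores.mean():.4f}, std {scores.std():.4f}")
```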

4 Implementation Details and Result Analysis Here, the authors have used a 64-bit operating system and an x64-based processor with 8.00 GB of installed memory (RAM). The system has an Intel(R) Core(TM) i5-8250U CPU on a laptop manufactured by HP (Hewlett-Packard), with Windows 10 booted by default. Python (version 3.6.10) is the programming language used in this project, and the front-end/UI technology is Flask (version 1.1.2).


Fig. 3 Screenshot of the comparison of various ML models

The integrated development environments (IDEs) used are Jupyter Notebook and PyCharm. The Way2sms API is used for sending mobile alerts to the drivers or to doctors at nearby hospitals. The comparison of the five ML models, after implementing all the machine learning classifiers, is summed up in Fig. 3. The best ML model as per our research is the artificial neural network (ANN), with a mean accuracy of 73.98%. We also concluded that the three most important conditions affecting automotive accident severity are the age of casualty, the number of vehicles and the casualty class (pedestrian).

5 Conclusion and Future Work Due to development of Information and Communications Technology (ICT) and other emerging technology like Internet of Things (IoT), cloud computing, Artificial Intelligence (AI) and machine learning, transportation systems are now referred to as smart transportation. The number of vehicles has also rapidly increased creating lots of traffic congestion and accidents in daily basis. In this work, authors have used various machine learning models and concluded that Artificial Neural Network (ANN) model has the highest mean accuracy of 72.94% and a standard deviation of 2.71%. The authors then clustered the accident severity feature into three classes (using K-Means Clustering) as per ANN classification model, which helped in predicting if the road accident is slight, severe or fatal. The paper also concluded that the three most important conditions which can affect the automotive accident severity are the age of casualty, number of vehicles and casualty class-pedestrian. Severity prediction of road accidents is very useful in the smart transportation system. It is high time that we need to implement this in our roads to save people’s lives. It can alert the driver beforehand about the accident on his/her mobile phones using the way2sms API, so that the driver will be careful while driving (Proactive approach). In future, this work can be extended to alert the nearby hospital to the doctors about the accident on their mobile phones using the same API, so that the hospital will take immediate actions to save the victim (Reactive approach).


References 1. Chong, M., Abraham, A., Paprzycki, M.: Traffic accident data mining using machine learning paradigms. In Fourth International Conference on Intelligent Systems Design and Applications (ISDA’04), Hungary, pp. 415–420 (2004) 2. Chong, M.M., Abraham, A., Paprzycki, M.: Traffic accident analysis using decision trees and neural networks. arXiv preprint cs/0405050 (2004) 3. Dey, M.R., Satapathy, U., Bhanse, P., Mohanta, B.K., Jena, D.: MagTrack: detecting road surface condition using smartphone sensors and machine learning. In: TENCON 2019—2019 IEEE Region 10 Conference (TENCON), pp. 2485–2489. IEEE (2019) 4. Iranitalab, A., Khattak, A.: Comparison of four statistical and machine learning methods for crash severity prediction. Accid. Anal. Prev. 108, 27–36 (2017) 5. Alkheder, S., Taamneh, M., Taamneh, S.: Severity prediction of traffic accident using an artificial neural network. J. Forecast. 36(1), 100–108 (2017) 6. Zong, F., Xu, H., Zhang, H.: Prediction for traffic accident severity: comparing the Bayesian network and regression models. Math. Probl. Eng. 2013 (2013) 7. Hashmienejad, S.H.A., Hasheminejad, S.M.H.: Traffic accident severity prediction using a novel multi-objective genetic algorithm. Int. J. Crashworth. 22(4), 425–440 (2017) 8. Kunt, M.M., Aghayan, I., Noii, N.: Prediction for traffic accident severity: comparing the artificial neural network, genetic algorithm, combined genetic algorithm and pattern search methods. Transport 26(4), 353–366 (2011) 9. Mohanta, B.K., Jena, D., Satapathy, U., Patnaik, S.: Survey on IoT security: challenges and solution using machine learning. Artificial Intelligence and Blockchain Technology, Internet of Things, p. 100227 (2020) 10. Mohanta, B.K., Satapathy, U., Jena, D.: Addressing security and computation challenges in IoT using machine learning. In: Advances in Distributed Computing and Machine Learning, pp. 67–74. Springer, Singapore (2020) 11. Mohapatra, N., Shreya, K., Chinmay, A.: Optimization of the random forest algorithm. In: Advances in Data Science and Management, pp. 201–208. Springer, Singapore (2020) 12. Tanha, J., van Someren, M., Afsarmanesh, H.: Semi-supervised self-training for decision tree classifiers. Int. J. Mach. Learn. Cybern. 8(1), 355–370 (2017) 13. Da Silva, I.N., Spatti, D.H., Flauzino, R.A., Liboni, L.H.B., dos Reis Alves, S.F.: Artificial neural networks, p. 39. Springer, Cham (2017) 14. Yu, B., Song, X., Guan, F., Yang, Z., Yao, B.: k-Nearest neighbor model for multiple-time-step prediction of short-term traffic condition. J. Transp. Eng. 142(6), 04016018 (2016) 15. Yin, M., Zeng, D., Gao, J., Wu, Z., Xie, S.: Robust multinomial logistic regression based on RPCA. IEEE J. Sel. Top. Signal Process. 12(6), 1144–1154 (2018)

Analysis of Quality of Experience (QoE) in Video Streaming Over Wi-Fi in Real Time M. Vijayalakshmi and Linganagouda Kulkarni

Abstract Over the years, video traffic has become increasingly dominant in wireless and mobile networks. In order to assess users' satisfaction with the services, a measure has to be considered that depicts the delight or annoyance of the users' experience with those services. Quality of Experience is one such measure: it focuses on the experience of the users with the services delivered, unlike quality of service (QoS), which focuses on the media or network itself. In addition to video transmission, Quality of Experience introduces a user-experience-driven strategy that focuses on contextual and human factors, which is helpful because it expresses user experience both objectively and subjectively. Hence, in order to enhance viewers' experience, measuring the Quality of Experience of the services along with network and system factors proves to be beneficial. We aim to analyze the Quality of Experience of users in the university. The data gives insight into the various parameters that affect the transmission of video, or of any data in that regard. The quality of the transferred videos is assessed by the end users by rating their experience. We aim to provide objective and subjective measures of Quality of Experience by analyzing the factors affecting Quality of Experience and the users' experience, respectively. Keywords Mean opinion score (MOS) · Quality of experience (QoE) · Quality of service (QoS)

1 Introduction User satisfaction is important for any service provider, since it is decisive in determining the success of the service. Hence, quality of service based on the user's perception plays an important role. Quality of Experience (QoE) is one such measure that reflects this user perception.


It shows how satisfied the customer is with a certain service and represents how well the service fulfills the user's expectation [1]. In video streaming and related applications, the user's viewing experience plays a major role in determining whether the user wants to use the service again or discard it forever. A user will continue to avail the services of the same network provider depending on the experience of the services offered, be it video buffering and loading time or the quality of transmission. With the increasing demand for multimedia services such as video transmission, there is a need to develop performance-based evaluation metrics for video services/applications. Although there are many video quality metrics, such as peak signal-to-noise ratio (PSNR), jitter and bandwidth, that objectively measure the quality of video between the clients, the user's views are not considered in this kind of quality evaluation, so these metrics are incapable of representing the true experience of users. Quality of Experience (QoE) is a user-centric quality strategy that overcomes the shortcomings of the above quality metrics: QoE is the degree of satisfaction or dissatisfaction of the user with an application or service. Various factors drive the Quality of Experience for video consumption, which in turn plays a key role in the perceived quality of the service. This paper aims to analyze the various factors affecting the quality of video transmission over the university Wi-Fi and thereby analyze the Quality of Experience of the users with respect to the videos sent over the network. The quality of the videos is assessed subjectively through the collective ratings given by the users; these ratings constitute the mean opinion score (MOS). MOS in this context is a numerical measure of the human-judged overall quality of an experience, on a rating scale from 1 to 5, with 1 indicating a bad and 5 an excellent experience. The quality of the video is evaluated objectively by objective video quality models, where several independent variables such as bit rate, length and PSNR are fit against the results obtained in a subjective quality evaluation using regression techniques. Finally, the objectively predicted values are compared with the subjective scores available as the mean opinion score (MOS).

1.1 Motivation for Analysis of QOE One of the most popular online services today is video streaming; it occupies more than 77% of all consumer Internet traffic [2] as per the Cisco Visual Networking Index. Users demand a high Quality of Experience (QoE) while using these video services on wireless networks such as Wi-Fi. This poses a challenge for network administrators in environments such as university campuses, and also for service providers, for whom guaranteeing the best possible QoE becomes consequential. This leads to challenges in optimizing network resources while providing a better experience to end-users. Hence, QoE becomes a key metric for both network providers and end-users.


2 Design for Analysis of QOE

2.1 Measurement Methodology

Setup: Figure 1 shows the proposed framework for the analysis of QoE. The analysis of QoE of videos is done over the university Wi-Fi network. The network condition at a place depends on the health of the access point the user is connected to. For our analysis, places in the campus are selected for the transfer of the videos under study according to three network conditions, i.e., good, medium, and poor, based on the performance and health of the access points. The access points (APs) are Aruba AP-135 units [3]. Videos are sent from the sender (Client 1) to the receiver (Client 2) over the selected places of the campus. Data and the changes during the transfer of the videos are extracted from the Aruba controller software [4], where the MAC address and IP address identify the devices involved in the transfer. The FFmpeg tool [5] is used to extract the required video characteristics of the received videos. FFmpeg is a video framework with a large collection of coding libraries. It is also used to calculate the PSNR of the received videos; PSNR is a video quality metric and performance indicator. Receivers rate their experience, and the MOS from all the receivers is tabulated and compared with the MOS obtained from the QoE metrics.

Test videos: A total of 22 videos are transferred from Client 1 to Client 2 under different network conditions. The videos are of variable length and resolution and in MP4 format.

Fig. 1 Overview of the proposed system
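The paper does not list the exact commands used to obtain PSNR with FFmpeg; as an illustrative sketch only (the file names and log path below are assumptions), FFmpeg's built-in psnr filter can compare the received video against the original, for example invoked from Python:

```python
# Hypothetical sketch: PSNR of a received video versus the original via FFmpeg's
# psnr filter; "received.mp4", "original.mp4", and "psnr.log" are assumed names.
import subprocess

cmd = [
    "ffmpeg",
    "-i", "received.mp4",               # degraded video captured at Client 2
    "-i", "original.mp4",               # reference video sent from Client 1
    "-lavfi", "psnr=stats_file=psnr.log",
    "-f", "null", "-",                  # decode and compare only, write no output file
]
subprocess.run(cmd, check=True)         # per-frame PSNR goes to psnr.log; the average
                                        # PSNR appears in FFmpeg's summary output
```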


2.2 QoE Metrics

PBR: The bit rate is the number of bits per second. It determines the size and quality of the video: the higher the bit rate, the better the quality. Hence, a higher bit rate may provide excellent video quality.

Dropped frames: When the connection to the server is unstable, or when problems occur such as random disconnections due to firewall/anti-virus/security software, routers, etc., frames are dropped. Some of the video frames are dropped in order to lower the traffic, which may lead to disconnection from the streaming server. Due to congestion during the transfer of videos, the dropped frames are resent, and these constitute the retried frames.

PSNR: Peak signal-to-noise ratio is the ratio between the maximum power of a signal and the power of the corrupting noise. A logarithmic decibel scale is used to express PSNR, as many signals have a wide dynamic range. PSNR is used to detect the presence of dropped frames and the location of dropped frames in a video.
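For reference (this expression is not written out in the paper), the usual definition in terms of the mean squared error (MSE) between the reference and received frames, with MAX_I the maximum pixel value (255 for 8-bit video), is

\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right) = 20 \log_{10}\!\left(\frac{\mathrm{MAX}_I}{\sqrt{\mathrm{MSE}}}\right) \ \mathrm{dB},

so a lower MSE (less corruption, fewer dropped or damaged frames) gives a higher PSNR.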

3 Related Work

Former work [6] introduced a machine learning technique that explains the QoE ground truth for all video applications. In contrast, we focus on analyzing the factors affecting QoE for video streaming over a university Wi-Fi network and on comparing subjective and objective MOS, which depicts the Quality of Experience of the users. Other work [7] analyzes video streaming over mobile networks by considering MOS metrics; the proposed models combine clustering and logistic regression methods on a realistic data set. Compared to this work, our model uses different models such as random forest, ridge, linear, and Lasso regression for analysis on a Wi-Fi network in real time in a university scenario. The authors of [8] proposed an analysis of QoE for video streaming by considering MOS metrics; the proposed model combines the K-means clustering method and logistic regression, and experiments conducted on realistic datasets reached precisions of 96.94, 97.13, and 97.54% on dataset 1, dataset 2, and dataset 3, respectively. The authors of [9] developed a model based on Markov chains for user experience using adaptive streaming in a dynamic environment and buffer-based DASH clients for switching frequency. The authors of [10] proposed an SDN control plane approach to the multimedia transmission problem, employing video encoding based on the latest standard. In paper [11], the authors explain the importance of video streaming in the Wi-Fi environment and how helpful it is to stream video under Wi-Fi conditions.


4 Data Analysis

4.1 User Study

The dataset comprises 22 videos of variable length, resolution, codec, and size, all in MP4 format. These videos are sent over the Wi-Fi network and received by the receiver (Client 2). The receivers are then asked to rate their experience based on the quality of the received videos on a scale of 1-5. All the ratings are collected, and these constitute the subjective MOS. The subjective MOS gives the user's perception of video quality.

4.2 Objective and Subjective MOS

The subjective MOS is taken from the users' ratings, while the objective MOS is calculated from the characteristics of the received video and the different parameters depicting the network conditions. The train and test data are split in the ratio 7:3. The QoE metrics, subjective MOS, and other video characteristics are taken as the input features (x), and the objective MOS is predicted for the test data. The subjective and objective MOS are then compared (Fig. 2). Machine learning techniques are used to predict the objective mean opinion score. Different models such as linear regression, Lasso regression, ridge regression, AdaBoost, and random forest have been implemented, and their accuracy and mean absolute error have been calculated. Linear regression predicts the value of the dependent variable (objective MOS) from the given independent variables (QoE metrics); applying this model gave a mean absolute error of 5.023. To reduce the over-fitting caused by simple linear regression and to reduce complexity, regularized techniques such as ridge and Lasso regression are used; these gave mean absolute errors of 0.867 and 0.327, respectively. The random forest model achieved an accuracy of 0.483 (48.3%). The predicted MOS and the actual MOS have been compared (Table 1).

Fig. 2 Architecture of the proposed system (Client 1 sends video over Wi-Fi; the controller logs packet transmission data; Client 2 receives the video; QoE metrics are extracted, analyzed, and used to predict MOS scores)

Table 1 Learning models for prediction of objective MOS

S. No.  Model name         Mean absolute error
1       Linear regression  5.023
2       Lasso regression   0.327
3       Random forest      48.3 (accuracy)
4       Ridge regression   0.867
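As a rough illustration of the comparison above (a sketch under assumptions, not the authors' code), the same family of regressors can be fitted with scikit-learn and scored by mean absolute error on a held-out 30% split; the placeholder feature matrix X and MOS vector y below stand in for the QoE metrics and subjective ratings collected in the study.

```python
# Hypothetical sketch: predicting objective MOS from QoE features with several
# regressors and comparing their mean absolute error (MAE) on a 7:3 split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.random((22, 4))                          # placeholder features for 22 videos
y = rng.integers(1, 6, size=22).astype(float)    # placeholder subjective MOS (1-5)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Linear regression": LinearRegression(),
    "Ridge regression": Ridge(alpha=1.0),
    "Lasso regression": Lasso(alpha=0.1),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: MAE = {mean_absolute_error(y_te, model.predict(X_te)):.3f}")
```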

5 Conclusion

Measuring the Quality of Experience plays a major role in determining users' satisfaction with a service. The videos sent and received over the network under different performance states of the access points (good, medium, and low) show how the experience of the users is affected by these parameters. The comparison between the subjective and objective mean opinion score (MOS) depicts the users' experience based on their perception of quality and the quality of the received videos based on the different parameters that contribute to it, respectively. This helps in understanding how users perceive the quality of the videos, as well as the different network and video parameters that determine the quality of the transferred videos.

6 Future Work

In the future, this understanding will be helpful in applying techniques such as adaptation and optimization to enhance the experience of users in video streaming. Furthermore, the Quality of Experience of users can be analyzed for videos of different formats.

References 1. Dai, Q.: A survey of quality of experience. In: Lehnert, R. (ed.) Energy-Aware Communications. Springer, Berlin Heidelberg, pp. 146–156 (2011) 2. Pepper, R.: Cisco visual networking index (VNI) global mobile data traffic forecast update. Tech. Rep. (2013) 3. Aruba ap-135. Available at http://content.etilize.com/user-manual/1023377357.pdf


4. Aruba controller software. Available at https://www.arubanetworks.com/products/networking/ gateways-and-controllers/ 5. Dasari, M., Sanadhya, S., Vlachou, C.: Ffmpeg. Available at https://www.ffmpeg.org/about. html 6. Kim, K.H., Das, S.R.: Scalable ground-truth annotation for video QOE modeling in enterprise Wi-Fi. In: 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS). IEEE, pp. 1–6 (2018) 7. Wang, Q., Dai, H., Wang, H., Wu, D.: Data-driven QOE analysis on video streaming in mobile networks. In: 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), December 2017, pp. 1115–1121 (2017) 8. Wang, Q., Dai, H.-N., Wang, H., Wu, D.: Data-driven QoE analysis on video streaming in mobile networks. In: 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), pp. 1115–1121. IEEE (2017) 9. Poojary, S., El-Azouzi, R., Altman, E., Sunny, A., Triki, I., Haddad, M., Jimenez, T., Valentin, S., Tsilimantos, D.: Analysis of QoE for adaptive video streaming over wireless networks. In: 2018 16th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), pp. 1–8. IEEE (2018) 10. Awobuluyi, O., Nightingale, J., Wang, Q., Alcaraz-Calero, J.M.: Video quality in 5G networks: context-aware QoE management in the SDN control plane. In: 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, pp. 1657–1662. IEEE (2015) 11. Zhu, X., Schierl, T., Wiegand, T., Girod, B.: Video multicast over wireless mesh networks with scalable video coding (SVC). In: Visual Communications and Image Processing 2008, vol. 6822, p. 682205. International Society for Optics and Photonics (2008)

Self Driven UGV for Military Requirements Hrishikesh Vichore, Jaishankar Gurumurthi, Akhil Nair, Mukesh Choudhary, and Leena Ladge

Abstract Soldiers of any nation are engaged in close combat against terrorists, where human life is at stake. Unmanned ground vehicles are used to reduce the loss of human life, as it may be impossible to have a human operator present at the location. The vehicle has a set of sensors to observe the environment, mainly four cameras and a gun loaded on top of the UGV acting as a turret. The bot has two modes of operation. For autonomous driving, the accuracy of the self-driving model is about 74%, and it makes decisions from real-time camera feeds using an image processing algorithm with an accuracy of about 95%. For manual driving, a human operator controls it from a remote control centre over the Internet, with protection against one of the biggest threats to remotely controlled vehicles, the man-in-the-middle (MITM) attack.

Keywords Unmanned ground vehicle · Tensorflow · Convolutional neural network · Raspberry pi · Object detection · Carla

H. Vichore (B) · J. Gurumurthi · A. Nair · M. Choudhary · L. Ladge SIES Graduate School of Technology, Navi Mumbai, India e-mail: [email protected] J. Gurumurthi e-mail: [email protected] A. Nair e-mail: [email protected] M. Choudhary e-mail: [email protected] L. Ladge e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_11


1 Introduction

Carrying out military operations requires a lot of manpower these days, so the security of life becomes the most important and prevailing question. To address this, the concept of the Unmanned Ground Vehicle (UGV) was introduced. A UGV is a mechatronic robot used in place of humans to carry out life-threatening tasks such as surveillance, bomb disposal, and shooting on the spot. Although it is a military robot, its applications are not limited to defence systems; it can also be used for domestic purposes, for example as a toy car, a cleaning bot, or a payload-carrying bot. People used to criticize this idea, but lately they have become more receptive to it. Critical operations such as rescue operations are a decisive use case for this technology. To reduce the latency and the decision time, the bot is self-driven. With 74% accuracy, the bot is able to avoid obstacles and correct its position, which is very helpful in rescue operations. The vehicle uses various components such as cameras, a servo motor, stepper motors, and geared brushless DC motors. The cameras observe the environment and detect human locations and movements. The vehicle is controlled autonomously, and if needed, a user in a remote location can override the decisions of the autonomous bot. The vehicle is armed with a weapon mounted on a turret, which automatically changes its direction and follows the target. A person remotely operating the vehicle receives the live video feed from the camera and can trigger the weapon according to his own decision. Security is a major issue in this type of vehicle: spoofing and man-in-the-middle (MITM) attacks are the common attacks used in security breaches. To secure the vehicle, such attacks are averted using techniques such as two-way authentication protocols and Strict Transport Layer Security (STLS).

2 Literature Survey

Thomas et al. [1] describe the design and implementation of an unmanned ground vehicle (UGV) for surveillance and bomb disposal. This paper gives a general idea of surveillance, although the feedback is provided through a haptic glove system. Following this idea, Noor et al. [2] describe the design and implementation of a remote-operated multi-direction unmanned ground vehicle (UGV). Xbee Pro and a PIC micro-controller were used to achieve this; a 16-bit Microchip microcontroller embedded with the Xbee Pro communicates at a variable baud rate via the UART protocol and controls the direction of the wheels. For securing the communication, Sandhya and Devi [3] provide countermeasures for thwarting the MITM attack. Along with the existing approaches, a new way was discussed in the paper, which includes the generation of security overheads and allocating a separate channel for security.


Wang et al. [4] provide methods for performing an MITM attack along with its defence; handicapping the attack was also analyzed. Quantum key distribution was used to distribute the cryptographic key. The correspondents had no way to verify each other's key, which created problems and noticeably delayed the communication. For increasing the speed of object detection while keeping the accuracy the same, Kanimozhi et al. [5] propose a lightweight model to reduce the computational overhead; MobileNet was used for this purpose, Single Shot Multi-Box Detection was used to increase accuracy and identify household items in real time, and the TensorFlow Object Detection API was also used in this process. The insights of Gu and Xu [6] proved very helpful in making object detection fast. In that paper, a vision-based real-time circular ground marker detection method is presented, used to guide a small autonomously controlled robotic UAV (RUAV) to pick up a ground target object. The method is based on the RHT algorithm and runs entirely on the GPU via the Microsoft DirectX 11 API, since the on-board CPU does not have sufficient computing power. To further improve object detection performance, Talukdar et al. [7] apply transfer learning: the use of synthetic images and pre-trained convolutional neural networks offers a promising approach to improve the object detection performance of deep neural networks. The object detection performance of various deep CNN architectures is also studied, with Faster-RCNN proving to be the most suitable choice, achieving the highest mAP of 70.67. Transfer learning was used to increase the accuracy of Google's TensorFlow object detection API and extend it to synthesized data. For the self-driving car and its testing in the CARLA environment, the research of Dworak et al. [8] proved very resourceful; although that paper uses LiDAR for object detection, an RGB camera sensor of the environment can also be used. The data generated from CARLA was used to train a CNN model to create a self-driving experience. For navigation of self-driven cars using CNNs, Duong et al. [9] provided a very innovative way of dealing with the complications of using Markov models, eliminating the need for generative models and reducing a colossal task to one simple model.

3 Existing System

Existing UGVs allow soldiers to spot enemies on patrol or waiting to ambush them, which can help save lives. They can adjust strategies based on the surroundings, can be used to detonate explosives, help in firefights and combat as well as in supplying ammunition, and can spot explosives or human opposition before soldiers are harmed in combat. Some of the drawbacks of existing systems are the following: bandwidth is always a problem with wireless solutions and even some wired solutions; the battery may discharge during a mission; an autonomous system may misfire at some other person rather than the enemy; and they are expensive, the big disadvantage being the
cost these vehicles bring to the military. They require specific programming, and many engineers must spend countless hours testing and designing them. They can also be destroyed before they have benefited any soldiers.

4 Network Architecture

4.1 Convolutional Neural Network (CNN)

Most experiments for self-driving cars have involved some variation of generative models, viz. hidden Markov models. That method is computationally very expensive, and a simpler solution is needed; a CNN is one such solution. The feed from the camera fitted on the vehicle acts as the data on which training happens. Accompanying the camera feeds is one more feature, viz. the steering angle. All this information is fed to the CNN model for training, and the error is minimized using the back-propagation algorithm. The training files are created by recording the CARLA environment, which captures the speed, direction, and steering angle. The data is recorded and stored in files with the .npy extension. Each .npy file was about 185 MB, and there were 106 such files in total, making the entire dataset around 19.1 GB. Even though the recording was done at 1280 × 720p, it was later resized to 480 × 270p to make the images more CNN friendly. The recording was done at 25 frames per second. The architecture used for training the model was the Inception v3 model; since Inception focuses mainly on computational cost, this model was chosen, and all the pre-trained weights of the Inception v3 model were reused during training. There are 5 epochs overall, i.e., the entire training is repeated 5 times, but the fit for every file is done just once with a batch size of 15. The learning rate for this training was 1e-3. The average accuracy came out to be 74%. To avoid overfitting, a dropout rate of 0.3 was set. Training took around 8 hours on an Nvidia GTX 1070 GPU (Fig. 1).
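As an illustrative sketch only (not the authors' training script; the classification head, data-loading step, and the count of 9 driving commands are assumptions based on the description above), the transfer-learning setup with a pre-trained Inception v3 backbone, 480 × 270 frames, a learning rate of 1e-3, and a dropout of 0.3 could look as follows in Keras:

```python
# Hypothetical sketch of the Inception v3 transfer-learning setup described above.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 9  # straight, left, right, reverse, forward/reverse-left/right, no key

base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(270, 480, 3))
base.trainable = False                      # reuse the pre-trained weights

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),                    # dropout rate from the paper
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])

# frames: (N, 270, 480, 3) float32 images, labels: one-hot (N, 9), both loaded
# from the recorded .npy files (loading code omitted).
# model.fit(frames, labels, epochs=5, batch_size=15)
```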

Fig. 1 Training of the model

4.2 Equations

Equation of CNN (2D convolution):

G[m, n] = (f * h)[m, n] = \sum_{j} \sum_{k} h[j, k] \, f[m - j, n - k]    (1)

Equation of Sobel filters:

K_x = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}, \quad K_y = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}    (2)

Equation of Gradient Intensity:

|G| = \sqrt{I_x^2 + I_y^2}    (3)

\Theta(x, y) = \arctan\left( \frac{I_y}{I_x} \right)    (4)

Equation of Gaussian Blur:

H_{ij} = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(i - (k + 1))^2 + (j - (k + 1))^2}{2\sigma^2} \right)    (5)

1 \le i, j \le (2k + 1)    (6)

Equation of Hough Line Transform:

r = x \cos\theta + y \sin\theta    (7)


4.3 CARLA

CARLA is an open source simulator for researching self-driving cars. Its dynamic and free-roam capability, along with different weather patterns and game mechanics that change accordingly, makes it the front runner for self-driving experiments. As of writing this paper, the latest stable version available for Windows 10 is 0.9.5. Along with the standard RGB camera, CARLA has 7 different sensors which can also be used to make the self-driving experience even more realistic. In the experiments, every frame from CARLA was picked and a Gaussian blur was applied (Fig. 2) to bring out the outlines of the lane and discard every other component that does not matter; to zero in on the lanes and avoid crossing them, this is a necessary step. Before applying the Gaussian filter, Canny edge detection was also used, with a lower threshold of 150 and an upper threshold capped at 220; these values were set after some experimentation, which found that the algorithm works best at these values. A 5 × 5 mask was applied for the Gaussian filter. To extract the shape of the lanes, the Hough line transform was used, a major reason being that it can recognize the shape even if it is broken or distorted to some extent. Finally, the blurred images with the detected lines were imposed on the original images, giving back the original frames with two green lines indicating the lanes (Fig. 2b). The car should always stay between these lanes, and crossing them counts as an error. These images with the lanes imposed on them, along with the vehicle, are used for training the model. The prediction part of the model outputs an image of size 480 × 270p, which is then scaled up to 1280 × 720p. The choices for the prediction are straight, left, right, reverse, forward left, forward right, reverse left, reverse right, and lastly no keys pressed.

Fig. 2 Gaussian filter: (a) Gaussian filter applied on CARLA; (b) lane detection for CARLA using the Gaussian filter
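For illustration (a sketch under assumptions, not the project's code: the frame path, the Hough parameters, and the blur-before-Canny ordering shown here are choices of this example), the lane-marking pipeline of Gaussian blur, Canny edges with thresholds 150/220, Hough lines, and a green overlay could be written with OpenCV as:

```python
# Hypothetical sketch of the lane-detection pipeline described in Sect. 4.3.
import cv2
import numpy as np

frame = cv2.imread("carla_frame.png")            # one recorded CARLA frame (assumed path)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)      # 5 x 5 Gaussian mask
edges = cv2.Canny(blurred, 150, 220)             # thresholds from the paper

lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                        threshold=50, minLineLength=40, maxLineGap=20)
line_img = np.zeros_like(frame)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(line_img, (x1, y1), (x2, y2), (0, 255, 0), 3)   # green lane lines

result = cv2.addWeighted(frame, 0.8, line_img, 1.0, 0)           # impose on original
cv2.imwrite("lanes.png", result)
```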


5 Proposed System

5.1 Objectives

• It identifies the person as well as the gun with the help of the object detection API from Google.
• It increases security.
• The bot can be triggered from a remote location.
• It can identify a person at night as well.
• The bot is made self-driven.

The operation of the weapons through triggers and the central controller is shown, along with how the camera system helps in triggering the weapon. The flowchart (Fig. 3) shows how the UGV is controlled and how the cameras feed the object classification model so as to detect any intruding person; based on this, the weapons system is triggered.

5.2 Advantages

• Increases mobility.
• Loss of human life is reduced.
• Voice or autonomous controlling of the bot.
• Invulnerable to spoofing and MITM.

Fig. 3 UGV control system


6 Methodology

We are using the TensorFlow object detection API, a pre-trained deep learning model, to detect the number of persons along with the weapon they are using. The current version of the object detection API is 9.0.2. MobileNet V3 and the COCO database are used for training the model, for which ResNet is used. A protocol buffer, Protobuf version 3.0.0, is used, which helps in maintaining the kernel-based interaction of the API for scheduling the jobs. For the vehicle, iron is used instead of aluminium to maintain ground clearance and stability. Two brushless DC planetary motors are used, and two batteries with a 12 V 9 A configuration are used in parallel. To handle such a heavy load, an RKI 1341 is used, providing safety to the micro-controller. The micro-controller used here is the Raspberry Pi 3B+. The Rpi is present on the vehicle for processing all the inputs, which includes controlling the hardware as well as managing the video feed input and providing it to the server. As a safety precaution, a 100 F capacitor is used along with a 1000k resistor acting as a shield. End-stop switches are used to make the vehicle move forward, backward, left, and right. The vehicle is armed with a turret; originally an air pressure gun was used, but reloading in that case was very difficult, so a gear-based model gun is used in its place. A NEMA 17 stepper motor is used to turn the turret in 30 steps. Security is a major concern in this type of system, and one of the most tangible attacks in this situation is the man-in-the-middle attack. To prevent this, a two-step authentication is used, with the service provided by NGROK; SHA-512 hashing is also used. NGROK creates a tunnel through its servers, providing proxy servers for safety. The connection is a TCP/STLS connection, which maintains security throughout. The bot is contacted through the Internet; no hotspot module is connected, but a GSM module is used.
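As an illustrative sketch only (the exported model path, frame size, and score threshold are assumptions, not the project's configuration), a detector exported with the TensorFlow object detection API can be run on a single camera frame as follows:

```python
# Hypothetical sketch: running an exported TensorFlow object-detection SavedModel
# on one camera frame to look for persons/weapons above a confidence threshold.
import numpy as np
import tensorflow as tf

detector = tf.saved_model.load("exported_detector/saved_model")  # assumed path
frame = np.zeros((480, 640, 3), dtype=np.uint8)        # placeholder camera frame
inputs = tf.convert_to_tensor(frame[np.newaxis, ...])  # batch of one uint8 image
outputs = detector(inputs)                              # standard detection signature

scores = outputs["detection_scores"][0].numpy()
classes = outputs["detection_classes"][0].numpy().astype(int)
for cls, score in zip(classes, scores):
    if score > 0.5:
        print(f"class id {cls}: confidence {score:.2f}")
```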

7 Experimental Set up

Figure 4a shows an image of the bot after completely assembling it with all the cameras and the gun. Scapy (Fig. 4b) is used with Python to perform a man-in-the-middle attack on Kali Linux, and Metasploit (Fig. 4c) is used on Kali Linux to perform the man-in-the-middle attack with different types of sniffers. A Raspberry Pi (Fig. 4d) is used to provide Internet connectivity to the bot and transfer the camera feeds over it. Figure 4e shows an example of how the TensorFlow object detection API works and classifies different types of objects, and Fig. 4f shows a diagram of how a CNN is used to build self-driven cars from camera feeds and steering angles.


Fig. 4 Experimental setup: (a) bot image; (b) ARP spoofing, part 1; (c) ARP spoofing, part 2; (d) camera and Internet connectivity via Rpi; (e) object detection API; (f) flowchart for the self-driving car

8 Results

8.1 Model Training and Security of the Bot

To make the system secure and avoid MITM attacks, NGROK (Fig. 5a) was used. It provides tunnels from a public endpoint to locally running services; Fig. 5a shows the NGROK tunnel in working condition. The training accuracy (Fig. 5b) of the model caps out at just a hair below 80% after more than 30 thousand iterations of training. The validation accuracy (Fig. 5c) caps at an average of around 74%, with clear signs of over-fitting here and there.


Fig. 5 Accuracy results and security: (a) NGROK secure tunnelling; (b) training accuracy; (c) validation accuracy

8.2 Testing of the Model

The following are examples of testing the self-driving car. It can be clearly seen that in some locations the car crosses the road and enters the pedestrian path before correcting its way and returning to the street. This can prove very harmful and can be eliminated by incorporating more training data (Fig. 6).

Fig. 6 Testing of the model on CARLA

9 Conclusion

Concerning the scope decided for the project, we have successfully implemented an Unmanned Ground Vehicle for threat detection and elimination. It is a prototype bot that can be used in war-prone areas to detect threats remotely using a classification model that shows the objects detected in the camera, and it can be remotely controlled over the Internet along with the camera feed. CARLA is a continuous and dynamic environment, and thus the data collected from it provides a very
realistic approach towards the self-driving car. The CNN proves to be a better method than hidden Markov models, which are generative and computationally expensive. The accuracy of the model is around 74%.

10 Future Scope

Due to limited computational resources, the data on which the model was trained was limited, so more data can be collected, and different optimizers and loss functions can be tried out as well. For the vehicle, a more lightweight metal can be used to reduce its weight, and the object detection API can be made faster and more efficient. The reliability of the model can be increased, because it still runs out of bounds, which can be a very serious threat. Over-fitting should also be reduced.


Acknowledgements Every aspect of this idea that was brought to fruition would not have been possible without the tremendous support of the authors' families. We would also like to thank the SIES Graduate School of Technology and the HOD of the IT Dept., Dr. Lakshmisudha Kondaka, for allowing us to work on the project and supporting us throughout.

References 1. Thomas, S., Devi, A.: Design and implementation of unmanned ground vehicle (UGV) for surveillance and bomb detection using haptic arm technology. In: 2017 International Conference on Innovations in Green Energy and Healthcare Technologies (IGEHT), Coimbatore, pp. 1–5 (2017) 2. Noor, M.Z.H., Zain, S.A.S.M., Mazalan, L.: Design and development of remote-operated multidirection unmanned ground vehicle (UGV). In: 2013 IEEE 3rd International Conference on System Engineering and Technology, Shah Alam, pp. 188–192 (2013) 3. Sandhya, S., Devi, K.A.S.: Contention for man-in-the-middle attacks in bluetooth networks. In: 2012 Fourth International Conference on Computational Intelligence and Communication Networks, Mathura, pp. 700–703 (2012) 4. Wang, Y., Wang, H., Li, Z., Huang, J.: Man-in-the-middle attack on BB84 protocol and its defence. In: 2009 2nd IEEE International Conference on Computer Science and Information Technology, Beijing, pp. 438–439 (2009) 5. Kanimozhi, S., Gayathri, G., Mala, T.: Multiple real-time object identification using single shot multi-box detection. In: 2019 International Conference on Computational Intelligence in Data Science (ICCIDS), Chennai, India, pp. 1–5 (2019) 6. Gu, A., Xu, J.: Vision based ground marker fast detection for small robotic UAV. In: 2014 IEEE 5th International Conference on Software Engineering and Service Science, Beijing, pp. 975–978 (2014) 7. Talukdar, J., Gupta, S., Rajpura, P.S., Hegde, R.S.: Transfer learning for object detection using state-of-the-art deep neural networks. In: 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, pp. 78–83 (2018) 8. Dworak, D., Ciepiela, F., Derbisz, J., Izzat, I., Komorkiewicz, M., Wójcik, M.: Performance of LiDAR object detection deep learning architectures based on artificially generated point cloud data from CARLA simulator. In: 2019 24th International Conference on Methods and Models in Automation and Robotics (MMAR), Miedzyzdroje, Poland, pp. 600–605 (2019) 9. Duong, M., Do, T., Le, M.: Navigating self-driving vehicles using convolutional neural network. In: 2018 4th International Conference on Green Technology and Sustainable Development (GTSD), Ho Chi Minh City, pp. 607–610 (2018)

Vehicular Ant Lion Optimization Algorithm (VALOA) for Urban Traffic Management Ruchika Kumari and Rakesh Kumar

Abstract In the past years, various routing methods have been advanced for VANETs. Routing protocols that utilize various parameters have been found to be most suitable for vehicular networks because of their efficiency in coping with the dynamic environment (DE) changes caused by vehicular network mobility. These parameters include link stability, network speed, and environmental conditions. This research article presents a traffic-management-based routing protocol for VANETs suitable for an urban city background. The novel method is an improved version of the dynamic source routing (DSR) energy-based protocol. The developed protocol, termed effective DSR, uses an ant-based method to search for a path that has optimized network connectivity. It is supposed that every vehicle node has an id for paths such as roads and streets. Utilizing data carried in small control network packets called ANTLIONs, the vehicle nodes evaluate the distance and energy of roadside or street segments for the network connections. ANTLION data packets are generated by the vehicles in particular areas. To find the best route between the source and sink node, the source vehicle selects the route on roads with the minimum total distance and energy for the complete path. The fitness function of the planned routing protocol has been identified, and its performance has been evaluated through simulation parameters. The experimental outcomes show that the PDR is improved by 10% compared with the existing protocol (VACO: vehicle ant colony optimization) by utilizing the vehicle ant lion optimization (VALO) method. In addition, the E2D and network overhead (NO) are also mitigated.

Keywords Vehicle ant colony optimization (VACO) · Vehicle ant lion optimization (VALO) · Dynamic environment (DE) · VANET · Dynamic source routing (DSR)

R. Kumari (B)
Department of CSE, NITTTR, Chandigarh, India
e-mail: [email protected]
R. Kumar
Department of CSE, CUH Mahendergarh, Mahendergarh, Haryana, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_12


1 Introduction

In the past few years, enhancements in ITS have been motivated by the need to reduce traffic jams, mitigate time complexity and overhead, and improve transportation management and traffic safety. To attain the major communication needs of both safety and non-safety applications in the VANET situation, there is a requirement to advance vehicle communication (VC) and smart communications (SCs). A VANET is a sub-type of a MANET in which vehicles communicate with each other and with nearby static roadside devices (RS). Its communications include various models such as V2V and V2I. VANET is a developing topology aimed at providing wireless communication (WC) between moving vehicles and also between vehicles and infrastructure stations [1]. The main goal of a VANET is to provide safety-related data to vehicles. Vehicles interchange status data such as speed and location in periodic messages known as beacons, to generate awareness among neighboring vehicles, improve safety, and decrease the rate of accidents. Figure 1 shows a characteristic vehicular network setup, where vehicle-to-infrastructure communication can be utilized to access position services or obtain traffic data, and vehicle-to-vehicle communication can be employed to warn about difficulties or to reach out-of-coverage vehicle nodes (CNs) through multihop messaging. Vehicular communication is used in several applications with highly varied requirements. The probable applications of VANET are safety oriented, convenience oriented, and commercial oriented. Some of the uses of VANET networks are safety, weather conditions, and traffic management [2]. The existing routing protocol described here as a hybrid approach (VACO) is a combination of two algorithms, GSR and the ACO optimization algorithm.

Fig. 1 Illustration of VANET scenes: accident information (V2V) and V2I to send data to emergency services [18]


Fig. 2 Categorization of routing methods in VANET: topology-based, position-dependent, cluster-dependent, geocast, and broadcasting routing protocols

Fig. 3 Research algorithm process: network initialization; searching the vehicle nodes for data transmission; calculating the coverage distance (with Dist_formula); developing a DSR routing protocol (signal broadcasting, reply, and maintenance); implementing the novel VALO algorithm; evaluating the parameters; stop

Geographical source routing (GSR) is the combination of positional and topology-based routing. For route selection and shortest-path search it uses the Dijkstra approach, and it utilizes a reactive position-based service to obtain the receiver address; the best path is then used to forward data packets. The routing structure of the GSR protocol follows a top-down operation. The application layer provides the interface with users and the other layers for the transmission of data. The transport layer utilizes UDP and TCP for controlling congestion over the VANET [3]. The network layer provides a path to the information through RLS. The physical layer provides the wireless transmission, which emphasizes IEEE802.11a, IEEE802.11b, and IEEE802.11c.
Issues such as hidden data and flooding can be prevented at the time of communication in this approach. The ant colony optimization (ACO) approach is an algorithm inspired by the foraging behavior of ants. Pheromone is a hormone that can be detected by ants; it attracts them, and ants follow the maximum pheromone concentrations. The approach includes the concepts of group specialization, regulation, and selection rules, and it is based on swarm intelligence, i.e., on natural evolution and the shared behavior of animals. The algorithm depends on the collective behavior of swarms of a specific insect species: intelligent behavior emerges from the complex interaction of thousands of autonomous swarm members [4], and the communication is natural, with no supervision. In existing research work, the authors of [5] present a traffic-aware, location-dependent routing method for VANET, using an improved version of the geographical source routing (GSR) method. The VACO method finds an optimal path between a source and a sink and searches for route connectivity in the network. VACO searches routes to improve network performance, but its maintenance cost is high and packet delivery is only improved by about 10%. The remainder of this research article is organized as follows: Sect. 2 defines the classification of numerous routing protocols of vehicular ad hoc networks. Section 3 reviews previous routing protocols, which cannot adequately satisfy the routing requirements in these vehicular networks due to the dynamic behavior of VANETs, a consequence of traffic situations and problems. Section 4 describes the research policies using the dynamic source routing (DSR) protocol, the ant lion optimization (ALO) algorithm, and the research methodology. Sections 5 and 6 present the experimental result analysis, the mathematical formulas, the comparison between the proposed and existing routing protocols (VACO, DSR, and VALO), and the conclusion and future scope.

2 Routing Scenarios

The routing process is responsible for discovering and maintaining routes between origin and destination hops, and it is essential for network operations. Routing protocols are used for interchanging node data among the nodes in the network with the least network overhead. VANET routing protocols may be classified by different factors such as the routing algorithm used, the routing information, similarities between protocols, and the underlying network protocols. VANET routing protocols can be categorized as follows [6].


2.1 Topology-Based Protocol

These protocols use the link data that is present in the system to perform the forwarding of data; they use the connection information that exists in the vehicular network to achieve packet forwarding. They are categorized as follows [7].

2.1.1 Pro-active Routing Protocol

In this routing protocol, routing information such as the next hop is maintained in the background rather than on transmission demand. Data packets are regularly broadcast and distributed between the hops to maintain the routes, and a routing table is built in each hop that identifies the next node toward the receiver hop. The main benefit of this routing protocol is that no path discovery is required, since the route to the receiver hop is already kept in the background, so the protocol has minimum latency. It is known as a table-driven routing protocol. The protocol runs periodically because the topology information is exchanged between the hops in the system.

2.1.2 Re-Active Routing Protocol

This protocol contains a route discovery stage, in which query packets are disseminated through the system to search for a route and complete the task. These routing protocols are known as on-demand routing protocols because the route is discovered and updated only when information is to be transferred.

2.1.3 Hybrid Protocol

This protocol is developed to decrease the control overhead of pro-active routing protocols and to reduce the initial path discovery delay of reactive routing protocols [8].

2.2 Position-Dependent Routing Protocol

This is a group of routing approaches in which geographical positioning data is shared to select the nearest forwarding nodes. The data packet is forwarded, in the absence of mapping data, to the one neighbor node that is nearest to the receiver hop. This is essential routing because no global path from sender to receiver hop needs to be generated and maintained. These protocols include the position-based greedy vehicle-to-vehicle protocol and the delay-tolerant protocol.


2.3 Cluster-Dependent Routing Protocol

In this protocol, vehicles that are placed close to each other form clusters. Every cluster has a unique cluster head (CH) that is accountable for intra- and inter-cluster maintenance. Intra-cluster hops interconnect using direct links, while inter-cluster communication is carried out by the cluster head (CH).

2.4 Geo-Cast Routing Protocol

This is mainly a position-based multicast routing protocol. Its major objective is to transmit packets from the source to all connected hops in a specified geographical area. Vehicles placed outside the zone are not notified, to prevent an unexpected dangerous response. This protocol can be regarded as multicast provision within a geographical region; in the receiver area, unicast routing is utilized to send data packets [8].

2.5 Broadcasting Routing Protocol

This is also mainly a position-based multi-route protocol. Its main goal is to transmit data packets from the source to all connected hops in a specified geographical area. Vehicles placed outside the zone are not notified, to prevent an unexpected dangerous response. This protocol is determined as the multicast provision in a geographical area [9].

3 Prior Work

This section elaborates on a survey of various research articles in VANET. As described already, routing procedures are a major problem in vehicular ad hoc networks. Goudarzi et al. [10] presented research on a traffic-aware, position-based routing protocol for VANET that is appropriate for the city scenario. The routing protocol is an improved version of the geographical source routing (GSR) protocol. The GSR protocol is combined with an ant-based approach to search for the path that has the optimum connection. It is assumed that each vehicle has a digital map of the paths, consisting of the distributed routes. The data is contained in smaller control packets, known as ants, and each vehicle computes the weight of each route segment related to that connection. The ant data packets are established by the vehicles in street areas. The optimum path is searched between the sender and receiver, where the sender vehicle recognizes the route on the mapped streets with the least weight for the
whole path. The fitness function of the planned protocol was identified, and its performance was evaluated through simulation outcomes, which showed that the PDR was enhanced by more than 10% for speeds up to 70 km/h compared with the VANET routing protocol based on VACO. Mejdoubi et al. [11] presented a segmented predictive road traffic management scheme for VANET. It aims at recognizing the traffic on the road and regularly adapting the path at every junction to decrease the driving time and prevent congestion; the communication between the vehicles and roadside units provides the traffic prediction, which is acquired by the segmented method. Nawaz and Sattar [12] analyzed traffic in rural and urban areas using vehicular ad hoc network routing protocols. In this research, several protocols were studied, namely AODV, DSDV, and DSR. The exploration was achieved in both rural and urban zones, and the examination was performed based on packet drop, vehicle density, throughput, and end-to-end delay. The obtained outcomes were investigated in terms of low packet drop and maximum throughput: DSR gives better outcomes compared with AODV and DSDV in rural regions, and AODV gives good performance in comparison with DSR in conditions of low density. Saha et al. [13] proposed research through simulation parameters of different cities. This research presented a comparative trial of different mobility scenarios of vehicular ad hoc networks in three well-known Indian metros. The AODV routing protocol was used for the simulation results, and the comparative analysis among protocols was done based on packet drop, throughput, and the complete time taken by the simulator to simulate the given system. Durga et al. [14] addressed reliable information dissemination in vehicular ad hoc networks; collision avoidance and traffic improvements are further areas of importance in the intelligent vehicle framework, and widespread and efficient data exchange between vehicles is a significant part of a considerable number of ITS applications. Guo et al. [15] implemented a real-time application for monitoring the traffic environment. In this research, they first proposed an effective real-time traffic information sharing mechanism that depends on a distributed transportation framework with RSUs, which has lower computing complexity and less redundancy.

4 Research Policies

To mitigate the existing issues, the major purpose of this proposed work is to develop a routing technique for vehicular networks that enhances:

• PDR (packet delivery ratio)
• E2D (end-to-end delay)
• RO (routing overhead)

The research technique utilizes the dynamic source routing (DSR) protocol and the ant lion optimization (ALO) approach (together, VALO), hence improving the delivery rate, network overhead, and end-to-end delay (E2D).


4.1 Dynamic Protocol

DSR is a basic routing protocol in which the source places the series of intermediate hops in the header of the data packet as a source route. In this protocol, the header is copied into the query packet by each intermediate hop that forwards it. The receiver then retrieves the route from the query and utilizes it to reply back to the sender hop. In case the receiver hop returns multiple paths, the source hop will receive and store multiple paths to the receiver hop, while the other hops use the same links in the present path [13]. DSR is a re-active protocol that depends on the source route method, and it mainly relies on a link-state convention in which the sender initiates the route request on an on-demand basis [16].
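The following toy sketch (a simplification under assumptions, not the full DSR protocol) illustrates the source-route idea: a route request floods the network breadth-first, each copy accumulating the hop list, and the first copy reaching the destination yields the source route carried back in the reply.

```python
# Toy sketch of DSR-style route discovery over an assumed abstract topology.
from collections import deque

def dsr_route_discovery(topology, source, destination):
    """Flood a route request (RREQ); each copy carries the accumulated hop list,
    which becomes the source route returned in the route reply (RREP)."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        route = queue.popleft()
        last = route[-1]
        if last == destination:
            return route                      # source route carried back in the RREP
        for neighbour in topology.get(last, []):
            if neighbour not in visited:      # each node forwards a given RREQ once
                visited.add(neighbour)
                queue.append(route + [neighbour])
    return None                               # no route found: route error case

# Example links between vehicle ids assumed to be within communication range.
topology = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(dsr_route_discovery(topology, "A", "E"))   # -> ['A', 'B', 'D', 'E']
```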

4.2 Ant Lion Optimization Process

The ant lion optimization (ALO) algorithm is an evolutionary approach that searches the solution space through randomly generated candidate solutions [13]. The group of candidates is driven toward the exact global optimum rather than a random value, and the method is used for resolving problems with internal and external constraints [14]. Hence, the required optimal value is established through randomized alterations of the candidate values, and the method explores many possible results to reach the desired optimum rather than the local optima [17]. The proposed process is as follows. Initially, a vehicular ad hoc network is created with x-coordinates (network length) and y-coordinates (network width), and the simulation parameters such as vehicle nodes, energy, vehicle ids, and data packet rates are defined. The source node and destination node are then identified in the VANET, the coverage set is created, and the coverage set distance, range, and matrix of the VANET are calculated. The dynamic source routing (DSR) algorithm is developed to send a route request from one node to an intermediate node; the route request is forwarded from one vehicular node to another for data or packet transmission, and if the request is accepted, the intermediate node replies back. In case a route error occurs, the third DSR phase (route maintenance) is used. After that, the performance parameters such as PDR, routing overhead (RO), and E2D are evaluated. In the proposed algorithm, the ALO algorithm is applied on top of this process to optimize the routes, and the performance metrics (PDR, RO, and E2D) are evaluated and compared with the existing VACO traffic control protocol.
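The sketch below is a toy illustration (positions, energies, weights, and candidate routes are all assumed values, and it stands in for, rather than implements, the ALO search) of the fitness idea described above: candidate routes are scored by total link distance plus a penalty for low residual energy, and the route with the minimum score is selected.

```python
# Hypothetical fitness evaluation for candidate routes (all values are assumed).
import math

positions = {"S": (0, 0), "A": (250, 100), "B": (400, 350), "D": (600, 500)}
energy = {"S": 1.0, "A": 0.8, "B": 0.6, "D": 0.9}   # residual energy per node

def route_fitness(route, w_dist=1.0, w_energy=500.0):
    """Lower is better: total hop distance plus a penalty for low node energy."""
    dist = sum(math.dist(positions[a], positions[b])
               for a, b in zip(route, route[1:]))
    energy_penalty = sum(1.0 - energy[n] for n in route)
    return w_dist * dist + w_energy * energy_penalty

candidates = [["S", "A", "B", "D"], ["S", "A", "D"], ["S", "B", "D"]]
best = min(candidates, key=route_fitness)   # the route such a search would favour
print(best, round(route_fitness(best), 1))
```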


5 Simulation Result

The simulation tool used is MATLAB, a high-level programming language and interactive environment for numerical and mathematical programming developed by MathWorks. Tables 1 and 2 show the network simulation parameters: a network area of 2000 m × 2000 m; a data communication range of 300 m; the DSR protocol for data communication in a sequential manner; vehicle nodes sending data from one hop to another (0, 5, 10, 15, 20, …, 30); 5 RSUs acting as receivers; and a coverage distance of 100. Network performance is calculated based on PDR, network overhead, and delay.

Table 1 Differentiation of various routing protocols in VANET [10]

Protocol type               Topology based          Position based   Cluster based           Geocast based           Broadcast based
FM: Forwarding method       WMH: Wireless multihop  Heuristic        WMH: Wireless multihop  WMH: Wireless multihop  Wireless multihop
Method recovery             No                      No               Yes                     No                      No
The need for a digital map  No                      No               Yes                     No                      No
Structure need              Yes                     Yes              No                      Yes                     Yes
Scenario                    City area               City area        City area               High-way area           High-way area

Table 2 Simulation parameters

Parameters     Values
Network area   2000 m × 2000 m
vnode          5, 10, 15, 20, 25, 30, …
Range          300 m
Energy         Randomly
Parameters     PDR, E2D, and RO
Data packets   Randomly
RSU            5


Fig. 4 Process of the DSR protocol

The mathematical formulas used in this research work are given below.

Packet delivery ratio (PDR): the proportion of the number of packets received to the number of packets transferred; the performance of the network improves with an increase in PDR. Mathematically, it is given as

\mathrm{PDR} = \frac{\text{Number of packets received}}{\text{Number of packets transferred}} \times 100

Network overhead: the proportion of the number of routing (control) packets generated to the total number of packets generated.

\text{Network overhead} = \frac{\text{Number of routing packets generated}}{\text{Total number of packets generated}}

End-to-end delay: the network performance metric defined as the average of the delays of every data packet received by the sink hop, measured from the time a data packet is forwarded by the sender hop.

\text{End-to-end delay} = \text{Packet received time} - \text{Packet transmitted time}
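For illustration only (the counter names are assumptions), the three metrics above can be computed from simple packet counters and timestamps as follows.

```python
# Hypothetical helpers for the metrics defined above.
def packet_delivery_ratio(packets_received, packets_transferred):
    """PDR in percent: received / transferred * 100."""
    return packets_received / packets_transferred * 100.0

def network_overhead(routing_packets_generated, total_packets_generated):
    """Ratio of routing (control) packets to all packets generated."""
    return routing_packets_generated / total_packets_generated

def end_to_end_delay(receive_times, send_times):
    """Average per-packet delay: mean of (receive time - send time)."""
    delays = [r - s for r, s in zip(receive_times, send_times)]
    return sum(delays) / len(delays)

# Example with assumed numbers: 707 of 1000 packets delivered -> PDR = 70.7 %
print(packet_delivery_ratio(707, 1000))
```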


Figure 4 shows the deployment of the VANET with the receiver. The vehicular ad hoc network area is calculated based on the network length and network width, and the start node for packet transmission and the destination node are searched in the vehicular ad hoc network. The coverage set is created, and the distance between the source node and the destination node in the VANET is calculated; the network distance is obtained from the coverage range and matrix of the VANET. The network also defines the vehicle node ID: when a user sends data from one node to another, a unique id from 100 up to 500 is assigned, and if the unique id is exceeded, the overload increases and delays occur in the VANET. Figure 5 shows the path maintenance process in the VANET. In particular, route maintenance in the DSR protocol needs no periodic data packets at any level in the network. For instance, DSR does not need any route broadcasting or neighbor-detection data packets and does not depend on such functional data. The entirely on-demand behavior and the lack of periodic activity allow the number of overhead data packets produced by DSR to scale down. When a hop starts to transfer, the pattern of the communication changes, and the route data packet identifies the route. In contrast to route discovery, a hop can recognize multiple paths to the receiver hop, which permits a rapid response to route changes that may take place.

Fig. 5 Path maintenance procedure

110

R. Kumari and R. Kumar

Fig. 6 Route maintenance in VANET

Figure 6 shows that the sender forwards a path request message (PREQ). Every node that receives the path request forwards it again to its neighboring nodes. If the sink node (receiver) gets the path request, it responds to the sender with a route reply message (PREP). The start node thus obtains the shortest route and forwards data packets along that specific route. Path maintenance is accountable for the failure of connections: in case an intermediate hop detects a path breakage, it forwards an error message to the source node. Figure 7 shows the comparative analysis of the VALO, DSR, and VACO algorithms. VALO is the proposed routing protocol, which optimizes the selected route and the network performance. VACO is a traffic-aware routing protocol in which ants are transferred by a well-organized broadcasting mechanism to control network issues. The DSR routing protocol is used for route broadcasting and route reply, and for route maintenance when problems occur in the network. VALO reduces the end-to-end delay compared with the VACO and dynamic source routing methods. Figure 8 shows the comparison of the proposed and existing routing protocols, DSR, VACO, and the VALO optimization algorithm. VALO provides fast signal broadcasting and high data transmission from the source to the destination vehicle nodes. VACO, a traffic routing protocol, manages route errors and recovers the lost information from the valid route in the network. DSR handles route searching, maintenance, and replying in an accurate manner. The PDR performance of VALO is higher compared with the VACO and DSR routing protocols.


Fig. 7 Comparison—end to end delay (ms)

Fig. 8 Comparison—packet delivery ratio (%)

Figure 9 shows the comparison of routing overhead among the VALO, VACO, and DSR routing protocols; the overhead of VALO is lower compared with the VACO and DSR routing protocols. Tables 3 and 4 show the network performance parameters PDR, NO, and E2D. For the proposed VALO algorithm, the PDR is 70%, the delay is 0.3 ms, and the overhead is 0.00057 bytes. For the VACO algorithm, the PDR is 55%, the delay is 0.8, and the overhead is 0.150 bytes. For the DSR routing protocol, the PDR is 60%, the delay is 0.3, and the overhead is 0.0013 bytes. The PDR of VALO is thus about 10% higher compared with the VACO and DSR routing protocols in the VANET.


Fig. 9 Comparison— routing overhead (Byte)

Table 3 Simulation result analysis

Parameters               Values
V_network                2000 m × 2000 m
Range of communication   300
Protocol                 Dynamic Source Routing (DSR)
Vehicle node             10, 20, 30, 40, …
RSU                      1, 2, 3, 4, 5
Coverage distance        100
Parameters               PDR (%), Overhead (byte), and Delay (ms)

Table 4 Performance parameters with DSR and the proposed DSR_ALO

Parameters        DSR Protocol   DSR_ALO
PDR (%)           0.60–60        0.707–70.7
Delay (ms)        0.3            0.2
Overhead (Byte)   0.0013         0.00057

Table 5 Comparison between proposed and existing protocol

Parameters        VACO      DSR protocol   VALO
PDR (%)           0.55–55   0.60–60        0.707–70.7
Delay (ms)        0.8       0.3            0.2
Overhead (Byte)   0.150     0.0013         0.00057


6 Conclusion and Future Scope

In conclusion, VANET is a most prominent form of communication. A VANET consists of vehicles and is a subtype of MANET that provides connections between nearby vehicles and between vehicles and nearby roadside devices, but it differs from other systems in several characteristics. Precisely, the movement of hops in a VANET is constrained by the road topology, so that road data is available and one may be able to predict the future position of vehicles. Moreover, VANET provides substantial computing, transmission, and sensing abilities, as well as regular communication and the energy to support these functions. The protocol was made aware of the traffic situations on street segments; hence, small packets in the form of ants were used to sample the traffic situation and update the vehicle route data. The existing protocol does this through ant colony optimization (VACO). The main issue in the existing research is that it did not consider the vehicle traffic situation on the streets along the route to assure connectivity; hence, the VALO method was developed to overcome this issue. A new VALO technique was developed to improve the traffic rate in urban environments. VALO is a technique to find the route, manage it, and deliver maximum data from sender to receiver in a VANET. The routing protocols in VANET were also studied, the DSR routing protocol was developed to find the route array, send requests, and store information in the VANET, and the ant lion optimization (ALO) technique was implemented through a convergence and fitness function to improve the performance. Along with that, the performance was evaluated based on the PDR, E2D, and network overhead using VALO and compared with the existing parameters: its PDR is at least 10% higher than that of the other compared routing protocols for various numbers of vehicle nodes. In future work, the experimental outcomes are expected to show that the planned protocol (EEFFA) offers better network performance than the energy-efficient OLSR and firefly optimization methods, improving the routing overhead and delay compared with the previous routing protocols.


Dynamic and Incremental Update of Mined Association Rules Against Changes in Dataset N. Satyavathi and B. Rama

Abstract Association rule mining (ARM) in data mining provides quality association rules based on support and confidence measures. These rules are interpreted by domain experts for making well-informed decisions. However, there is an issue with ARM when the dataset is subjected to changes from time to time. Discovering rules by reinventing the wheel, in other words by scanning the entire dataset every time, consumes more memory, processing power, and time. This is still an open problem due to the proliferation of different data structures being used for extracting frequent item sets. We propose an algorithm for updating mined association rules when dataset changes occur. The algorithm, known as FIN_INCRE, exploits the preorder coded tree (POC-Tree) used by the FIN algorithm for fast item set mining. The proposed algorithm outperforms the traditional approach as it mines association rules incrementally and dynamically updates the mined association rules. Keywords Association rule mining · POC-Tree · FIN-INCRE · Incremental mining · Support · Confidence

1 Introduction

Association rule mining (ARM) has numerous applications, such as the analysis of sales data and the discovery of latent relationships among attributes in medical datasets, to mention a few. ARM has two important phases: the discovery of frequent item sets and the production of association rules from the results of the first phase. Different association rule mining algorithms have been evaluated [1, 2]. The difference between algorithms lies in the data structures they use. For instance, the nodeset is the data structure used in [3] for reducing time and space complexity, which made fast item set mining possible. However, there was a need for incremental association rule mining algorithms that are expected to generate association rules incrementally, without rescanning the entire database when a database update occurs. We found that the FIN algorithm [3] and the POC-Tree associated with it provide faster and more efficient mining of incremental association rules. FIN_INCRE is the algorithm proposed for discovering association rules incrementally; it exploits the POC-Tree and the underlying data structure of the FIN algorithm. The paper is structured as follows. Section 2 presents related work on mining association rules. Section 3 presents the proposed methodology for incremental and fast mining of frequent item sets. Section 4 provides the procedure for generating interesting association rules. The conclusion and future scope of the research are provided in Sect. 5.

2 Related Work

This section reviews the literature on association rule mining. ARM has been a persistent topic in the domain of data mining for a number of years; plentiful research is found on ARM, and it has proved its utility. Many algorithms were developed for incremental association rule mining: DB-tree and PotFP-tree [4], AFPIM [5], IFP-Growth [6], CAN tree [7], EFPIM [8], CP-tree [9], FUFP-tree [10], and BIT-FP-Growth [11]. A new approach called IRARM [12] was developed for mining relational association rules. A system [13] was developed for incremental mining in which the original database is represented in the form of a COMVAN tree, and frequent item sets are mined using that tree. Many approaches have thus been developed for mining incremental association rules, but these algorithms still suffer from the following drawbacks:

• They require scanning the original database many times.
• Some work only in the case of insertions.
• Some work only in the case of deletions.
• They do not work in the case of a support change.
• The data structure used in mining is not efficient in terms of time complexity/space complexity.

Hence, an algorithm for ARM that overcomes the drawbacks of the existing algorithms must be developed. Such an algorithm is proposed here by enhancing the FIN algorithm for efficient mining of incremental association rules.


3 Proposed Methodology

The proposed algorithm works well (i) when the original database is changed with new transactions, (ii) when some of the old transactions are deleted, and (iii) when the user-specified threshold changes. The FIN algorithm is used for discovering frequent item sets from a database D. When D is subjected to new records, removal of existing records, or a change in the user-specified support, items that are frequent may become infrequent and infrequent item sets may become frequent. The proposed incremental mining algorithm FIN_INCRE (shown in Fig. 1) finds the items that became infrequent after adding new transactions and deletes them from the original POC-Tree, and it also finds the items that became frequent after adding new transactions and adds them to the original POC-Tree. In this way, the POC-Tree is updated to improve the performance of FIN_INCRE significantly. The algorithms UPOCinINS (shown in Fig. 2), UPOCinDEL (shown in Fig. 3), and UPOCinSup (shown in Fig. 4) are used to update the POC-Tree in the case of insertions, deletions, and support change, respectively.

Fig. 1 FIN_INCRE algorithm
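The pseudocode of FIN_INCRE and of the three update routines is given only as figures and is not reproduced here. As a rough illustration of the incremental idea described above, the sketch below maintains a flat item-support table (a plain counter, not the POC-Tree/nodeset structure the paper actually uses) and re-derives the frequent items after insertions, deletions, or a support change, scanning only the changed transactions.

```python
from collections import Counter

class IncrementalItemCounts:
    """Toy incremental maintenance of frequent 1-items.
    FIN_INCRE updates a POC-Tree; here a flat Counter stands in for it."""

    def __init__(self, transactions, min_support):
        self.n = 0
        self.counts = Counter()
        self.min_support = min_support   # relative threshold, e.g. 0.5
        self.insert(transactions)

    def insert(self, new_transactions):
        # Scan only the newly added transactions (no rescan of old data).
        for t in new_transactions:
            self.counts.update(set(t))
        self.n += len(new_transactions)

    def delete(self, old_transactions):
        # Subtract counts for removed transactions.
        for t in old_transactions:
            self.counts.subtract(set(t))
        self.n -= len(old_transactions)

    def set_min_support(self, min_support):
        # Support change: only the threshold moves, no rescan is needed here.
        self.min_support = min_support

    def frequent_items(self):
        threshold = self.min_support * self.n
        return {item for item, c in self.counts.items() if c >= threshold}

db = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
inc = IncrementalItemCounts(db, min_support=0.5)
print(inc.frequent_items())        # {'a', 'b', 'c'}
inc.insert([["d"], ["d"], ["d"]])  # after the update, items may gain or lose frequent status
print(inc.frequent_items())
```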


Fig. 2 UPOCinINS algorithm



Fig. 3 UPOCinDEL algorithm

4 Generating Association Rules

Association rules can be generated from frequent item sets by using the "confidence" measure. However, an enormous number of rules may be generated, which may or may not be interesting to the user, so post-processing of the rules is required. To find the interesting rules, several evaluation measures can be used [14, 15].
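As a concrete illustration of the confidence-based step (the additional interestingness measures of [14, 15] are not shown), a minimal sketch follows: given the supports of the frequent item sets, a rule X => Y is kept only when confidence(X => Y) = support(X ∪ Y) / support(X) reaches a user-given threshold. The support values here are hypothetical and would normally come from the frequent item sets produced by FIN_INCRE.

```python
from itertools import combinations

# Hypothetical support counts for frequent item sets (would come from FIN_INCRE).
supports = {
    frozenset("a"): 6, frozenset("b"): 7, frozenset("c"): 6,
    frozenset("ab"): 4, frozenset("ac"): 4, frozenset("bc"): 5,
    frozenset("abc"): 3,
}

def rules_from_itemset(itemset, supports, min_conf):
    """Yield rules X -> Y with confidence = sup(itemset) / sup(X) >= min_conf."""
    items = frozenset(itemset)
    for r in range(1, len(items)):
        for antecedent in map(frozenset, combinations(items, r)):
            conf = supports[items] / supports[antecedent]
            if conf >= min_conf:
                yield set(antecedent), set(items - antecedent), round(conf, 2)

for lhs, rhs, conf in rules_from_itemset("abc", supports, min_conf=0.7):
    print(lhs, "->", rhs, "confidence:", conf)
```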


Fig. 4 UPOCinSup algorithm

5 Conclusions and Future Scope

In this paper, FIN_INCRE, an ARM algorithm that is incremental in nature, is proposed. The algorithm exploits the nodeset data structure and the POC-Tree. The nodeset consumes less memory as it encodes each node of the POC-Tree; thus, the proposed algorithm runs much faster than other incremental ARM algorithms. The FIN_INCRE algorithm requires scanning the original dataset only once. After scanning the dataset, it generates the POC-Tree, from which it produces the item sets that occur frequently. These are then used to generate association rules. When new instances are inserted into the original dataset, the algorithm scans only the newly added instances. It then updates the POC-Tree and the frequent item sets before actually updating the mined association rules. In future, we will continue this research with the FIN_INCRE algorithm using distributed datasets. Another interesting future work is to explore FIN_INCRE with streaming data.

References 1. Satyavathi, N., Rama, B., Nagaraju, A.: Present State-of-the-Art of association rule mining algorithms. Int. J. Eng. Adv. Technol. (IJEAT) 9(1). ISSN 2249–8958 (2019) 2. Satyavathi, N., Rama, B., Nagaraju, A.: Present State-of-the-art of dynamic association rule mining algorithms. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 9(1). ISSN 2278-3075 (2019)


3. Hong, Z., Deng, H., Sheng Long, L.V.: Fast mining frequent itemsets using Nodesets. Exp. Syst. Appl. 41(10), 4505–4512 (2014) 4. Ezeife, C.I., Su, Y.: Mining incremental association rules with generalized FP-tree. In: Advances in Artificial Intelligence, Lecture Notes in Computer Science, vol. 2338. Springer, Berlin, Heidelberg (2002) 5. Koh, J.L., Shieh, S.F.: An efficient approach for maintaining association rules based on adjusting FP-tree structures. In: Proceedings of the DASFAA, pp. 417–424. Springer, Berlin Heidelberg, New York (2004) 6. Tong, Y., Baowen, X., Fangjun, W.: A FP-tree based incremental updating algorithm for mining association rules. 5, 703–710 (2004) 7. Leung, C.K., Khan, Q.I., Hoque, T.: CanTree: a tree structure for efficient incremental mining of frequent patterns. In: Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM’05) (2005) 8. Li, X., Deng, X., Tang, S.: A fast algorithm for maintenance of association rules in incremental databases. In: Proceeding of International Conference on Advance Data Mining and Applications, pp. 56–63 (2006) 9. Tanbeer, S.K., Ahmed, C.F., Jeong, B.S., Lee, YK.: CP-tree: a tree structure for single-pass frequent pattern mining. In: Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, vol. 5012. Springer, Berlin, Heidelberg (2008) 10. Hong, T.P., Lin, J.W., We, Y.L.: Incrementally fast updated frequent pattern trees. Exp. Syst. Appl. 34, 2424–2435 (2008) 11. Totad, S.G., Geeta, R.B., Prasad Reddy, P.V.G.D.: Batch incremental processing for FP-tree construction using FP-growth algorithm. Knowl. Inform. Syst. 33(2), 475–490 (2012) 12. Diana-Lucia, M., Gabriela, C., Liana, C.: A new incremental Relational association rules mining approach. In: International Conference on Knowledge Based and Intelligent Information and Engineering Systems, KES2018, Belgrade, Serbia (2018) 13. Gupta, A., Tiwari, A., Jain, S.: A system for incremental association rule mining without candidate generation. Int. J. Comput. Sci. Inform. Sec. (IJCSIS) 17(7) (2019) 14. Tan, P.N., Kumar, V., Srivastava, J.: Selecting the right objective measure for association analysis. Knowl. Discov. Data Min. 29(4), 293–313 (2004) 15. Liu, B., Hsu, W., Chen, S., Ma, Y.: Analyzing the subjective interestingness of association rules. Intell. Syst. Appl. 15, 47–55 (2000). https://doi.org/10.1109/5254.889106. IEEE

E-Governance Using Big Data Poonam Salwan and Veerpaul Kaur Maan

Abstract The continuous advancements in the field of ICT and the constant efforts of the Central and State governments have been the foremost forces behind the successful launch and reinforcement of e-governance in India. With the help of the public and private sectors, governments are encouraging organizations towards interoperability, so that data can be stored and processed from a central location, which further enhances decision-making. This fast-growing data is turning into big data. The tools used to study and analyse big data with great speed and accuracy are known as big data analytics. These big datasets can be text, audio, video, pictures, etc. As the use of e-governance datasets increases, citizens expect datasets to be analysed and processed with greater speed and accuracy. This paper shows the relationship between e-governance and big data, its implementation around the globe, initiatives taken by India to establish e-governance, and some challenges in implementing big data with e-governance projects. Keywords E-governance · Big data · Big data analytics · Interoperability

1 Introduction

E-governance refers to the process of delivering government services electronically. It helps to maintain the essence of real democracy by making government procedures transparent, and it helps to establish government of the people, for the people, and by the people by making government officers accountable and responsible for their duties. With the help of the private sector [1, 2], the Central and State governments are conducting seminars, workshops, and advertisements to encourage citizens towards e-governance. With all such initiatives, the transactional amount of data has been increasing so fast that a traditional database management system cannot be used to deal with such exponentially growing data. This also affects decision-making, as more than half of the data remains unprocessed due to changes in its type. In this paper, Sect. 1 discusses how to manage this continuously growing data related to e-governance. Section 2 discusses big data in e-governance and its features. Section 3 discusses the role of big data in e-governance projects across the globe and India's initiatives to adopt big data. Section 4 discusses some challenges that may occur while using big data analytics in e-governance.

2 E-Governance and Big Data—An Insight

The main notion of e-governance is to provide a better socio-economic-political environment to citizens [3]. In 2006, the Indian government initiated the National e-Governance Plan (NeGP). Initially, it comprised 27 Mission Mode Projects (MMPs) of the State and Central governments and 8 integrated MMPs; later, another four projects were added to the NeGP. All these projects led to the generation of a huge amount of data, which is known as big data [4, 5]. The most popular example of e-governance-based big data is Aadhar-UID. The term big data refers to datasets whose size and capacity are beyond the capabilities of a traditional database system. These datasets may be structured, semi-structured, or unstructured in nature and cannot be dealt with by a traditional database system. "The size or amount of data under big data varies from company to company; i.e. one company's big data may not be as big as other company's big data" [6].

2.1 Characteristics of Big Data

Basically, all datasets that satisfy the characteristics of the 3Vs—Volume, Velocity, and Variety—are considered big data (Fig. 1). The technique used to study and process mixed-type datasets at a faster speed is called big data analytics. Big data analytics processes big data by dividing datasets into equal-sized parts [7] and storing them on different computers, known as nodes, in a cluster of computers. In this way, big data analytics makes the processing faster and more accurate.
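A minimal sketch of the split-process-combine idea described above: the dataset is divided into equal-sized chunks and each chunk is processed by a separate worker (standing in for the nodes of a cluster), after which the partial results are merged. This is only an illustration on a single machine, not a real cluster deployment, and the example data and counting task are hypothetical.

```python
from multiprocessing import Pool

def process_chunk(chunk):
    # Partial computation on one "node": count records per category.
    counts = {}
    for record in chunk:
        counts[record] = counts.get(record, 0) + 1
    return counts

def split(data, n_chunks):
    # Divide the dataset into (nearly) equal-sized chunks.
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = ["tax", "health", "tax", "transport", "health", "tax"] * 1000
    chunks = split(data, n_chunks=4)
    with Pool(processes=4) as pool:
        partials = pool.map(process_chunk, chunks)
    # Merge the partial results from all workers.
    total = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = total.get(key, 0) + value
    print(total)
```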

2.2 Phases of Big Data

The different phases through which data becomes big data [7] are as follows (Fig. 2):

• Big data generation: This phase refers to different sources generating huge amounts of data at great speed.
• Big data acquisition: This phase refers to collecting the data that turns into big data from different resources, distributing data among other resources, or pre-processing the data.
• Big data storage: This phase refers to the management skills needed to store big data in such a way that its accessibility and availability are enhanced.
• Big data analytics: This phase refers to the analysis of structured/unstructured/mixed datasets to forecast future trends or predictions.

Fig. 1 Characteristics of big data

Fig. 2 Phases of big data

2.3 Features of Big Data

The important features of big data are as follows:

• It is capable of managing dynamic types of data; i.e. it can easily manage structured, semi-structured, and unstructured data.
• It can easily manage a great volume of datasets produced at great velocity.
• It is scalable in nature; i.e. its setup can be modified as and when required.
• It has a very wide range of analytic techniques meant for different types of data, which help to study different patterns or trends from processed/unprocessed data.
• It helps to take important decisions based on the analysis of current trends.

3 Adoption of Big Data in E-Governance Projects

Earlier, when the digital form of data was not available, the veteran leaders of the government were expected to use their wisdom and past experience to make decisions [8]. In the present era, big data analytics helps in decision-making using digitized datasets. Almost 90% of the datasets generated through different resources are of an unstructured type. Big data analytic techniques provide the facility to explore unknown or hidden facts through the dissemination and processing of data in different phases. Figure 3 shows how different types of datasets are collected, refined, and synthesized to get the required data from the datasets [9]. The private sector has started using big data analytics to maximize profit by studying market trends, consumer behaviour, expectations, etc. Government departments are using it for the growth and development of their citizens. Governments are also making laws and implementing policies to ensure security and privacy at all phases of big data processing, for the validity of the information. Many countries of the world, like the US, UK, and Japan, have already started projects using big data analytic techniques to make future predictions [10].

Fig. 3 Overview of different phases of big data processing


3.1 E-Governance and Big Data Across the Globe

Here is an analysis of various countries running e-governance projects based on big data analytics [10, 11].

• The Australian government has been using big data analytics to provide better services to its citizens. The Australian Customs and Border Protection Service (ACBPS) is using big data analytics to ensure the security of its borders.
• The UK government allotted £189 million for big data research, with major emphasis given to the agriculture industry.
• The government of France has allocated €11.5 million to proposals related to 7 big data processing projects.
• The Norwegian government has been using big data analytics for the health care of its citizens.
• The Indian government has invested Rs. 5630 crores in the UID project to provide a unique ID to its citizens.

The United Nations Department of Economic and Social Affairs (UN DESA) conducts the E-Governance Development Survey [12–14] every two years. This survey helps to find out the e-readiness of different countries and calculates the E-Government Development Index (EGDI) using human development-related parameters. The details of these parameters are as follows:

1. Online Service Index (OSI): It checks whether countries are following the minimum level of Web Content Accessibility Guidelines or not.
2. Telecommunication Infrastructure Index (TII): It checks communication-related aspects of the nation, such as total computer users per 100 people, total telephone connections per 100 people, total Internet connections per 100 people, total mobile users per 100 people, and total broadband users per 100 people.
3. Human Capital Index (HCI): This parameter checks the literacy rate, enrolment and level of education at the primary and secondary levels, and skill development.

After calculating the above parameters, the EGDI is obtained as a composite index based on the weighted average of these parameters. The possible values of this index lie between zero (minimum) and 1 (maximum).

EGDI = (1/3) × OSI + (1/3) × TII + (1/3) × HCI

The EGDI 2018 report (Table 1) shows Denmark at the top rank with an index value of 0.9150. India, through its constant efforts, has achieved the 96th global rank in the EGDI report with an index value of 0.5669 [12, 15]. The obvious question that comes to mind is: are the ranks scored by different countries the result of continuous efforts [16] or the result of efforts invested over two years only? The answer to this question can be understood with the help of Table 2, which shows the consolidated status of different countries on the basis of the EGDI biannual reports of 2014, 2016, and 2018.
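A short check of the weighting scheme, using the Denmark and India rows that appear in Table 1 below (component values reproduced from the table; small rounding differences are expected):

```python
def egdi(osi, tii, hci):
    # EGDI is the simple average of the three component indices.
    return (osi + tii + hci) / 3

# Rows from Table 1 (UN e-government survey 2018).
print(round(egdi(1.0000, 0.7978, 0.9472), 4))  # Denmark: 0.9150
print(round(egdi(0.9514, 0.2009, 0.5484), 4))  # India:   0.5669
```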


Table 1 E-Governance Development Index (EGDI) 2018 survey report

Rank | Country           | EGDI 2018 | OSI 2018 | TII 2018 | HCI 2018
1    | Denmark           | 0.9150    | 1.0000   | 0.7978   | 0.9472
2    | Australia         | 0.9053    | 0.9722   | 0.7436   | 1.0000
3    | Republic of Korea | 0.9010    | 0.9792   | 0.8496   | 0.8743
4    | UK                | 0.8999    | 0.9792   | 0.8004   | 0.9200
5    | Sweden            | 0.8882    | 0.9444   | 0.7835   | 0.9366
11   | USA               | 0.8769    | 0.9861   | 0.7564   | 0.8883
65   | China             | 0.6811    | 0.8611   | 0.4735   | 0.7088
94   | Sri Lanka         | 0.5751    | 0.6667   | 0.3136   | 0.7451
96   | India             | 0.5669    | 0.9514   | 0.2009   | 0.5484
117  | Nepal             | 0.4748    | 0.6875   | 0.2413   | 0.4957

Source UN e-government survey 2018

Table 2 Biannual comparison of EGDI

Rank | Country   | EGDI 2014 | EGDI 2016 | EGDI 2018
1    | Denmark   | 0.8162    | 0.8510    | 0.9150
2    | Australia | 0.9103    | 0.9143    | 0.9053
3    | Korea     | 0.9462    | 0.8915    | 0.9010
4    | UK        | 0.8695    | 0.9193    | 0.8999
5    | Sweden    | 0.8225    | 0.8704    | 0.8882
6    | USA       | 0.8748    | 0.8420    | 0.8769
7    | China     | 0.5450    | 0.6071    | 0.6811
8    | Sri Lanka | 0.5418    | 0.5445    | 0.5751
9    | India     | 0.3834    | 0.4637    | 0.5669
10   | Nepal     | 0.2344    | 0.3458    | 0.4748

Source UN e-government survey 2014, 2016, and 2018

Fig. 4 Biannual comparison of EGDI survey 2014–2018 (bar chart of the EGDI index, axis "Countries and their ranks" versus "EGDI Index", with one bar each for EGDI 2014, EGDI 2016, and EGDI 2018; chart not reproduced)


The pictorial representation (Fig. 4) further helps to understand the difference in parameters, the growth rate of e-governance, and the e-readiness of various countries at different time intervals. It indicates that e-governance is a long-term project seeking continuous efforts, time, money, and management for its successful implementation.

3.2 E-Governance and Big Data in India

The Indian governments have initiated many e-governance-based projects at the Central level, at the State level, or with the integration of both, for the citizens' welfare [17]. The most prestigious project is UID (Aadhar Card), in which the government has invested Rs. 5630 crores to uniquely identify citizens. This project uses big data analytic techniques as it deals with huge amounts of mixed datasets that need to be processed in real time at great speed. In order to make India a Digital India, the governments are trying to make citizens aware of, and able to use, all the government services. As the digitization of public departments takes place, the problem of maintaining a huge amount of data using traditional databases also becomes severe. Thus, big data analytics does not only support e-governance but also provides various techniques to easily store and process huge amounts of datasets with great speed and accuracy. That is how big data has been proving its worth in e-governance projects.

E-governance projects at the Central level: Table 3 shows the detailed list of Mission Mode Projects initiated in India [18]. Some of the MMPs initiated and implemented at the Central level are:

• Digitization of government offices: The Department of Administrative Reforms and Public Grievances (DAR&PG) and the National Informatics Centre (NIC) worked together for computerization and its successful implementation in all government departments [19] to make the system more transparent.
• Issuance of Unique Identification (UID): The idea of this project was first triggered and discussed in 2006 [20]. This project stores all the related information like name, address, retina scan, and finger impressions.
• Income tax (IT): This project enables citizens to file income tax [21] on an anytime, anywhere basis. This project encourages issuing PAN cards to citizens, which are further linked with the citizen's account. Citizens can also track the status of their returns or refunds online.
• Central Excise and Customs: This project facilitates trade and industry by simplifying the customs and excise processes [22], filing of returns, reconciliation, e-registration for excise and service tax, etc.
• Insurance: This project provides speedy processing of claims, online issuance of policies on the web, etc., through interoperability [23].

E-governance projects at the State level: Some of the Mission Mode Projects initiated and implemented at the State level are as follows:


Table 3 Mission Mode Projects (MMPs)

S. No. | Central MMPs | State MMPs | Integrated MMPs
1.  | Banking | Agriculture | CSC
2.  | Central Excise and Customs | Commercial taxes | –
3.  | Income tax (IT) | e-District | e-Courts
4.  | Insurance | Employment Exchange | e-Procurement
5.  | MCA 21 | Land Records (NLRMP) | EDI for eTrade
6.  | Passport | Municipalities | National e-Governance Service Delivery Gateway
7.  | Immigration, Visa and Foreigners Registration and Tracking | e-Panchayats | –
8.  | Pension | Police | –
9.  | e-Office | Road transport | –
10. | Posts | Treasuries Computerization | –
11. | UID | PDS | –
12. | – | Education | e-Biz
13. | – | Health | India Portal

Source https://meity.gov.in/content/mission-mode-projects

• Agriculture: The main objective of this MMP is to inform farmers [24] about seeds, types of soil and matching crops, fertilizers, pesticides, government schemes, weather forecasts, etc.
• Commercial taxes: The main objectives taken care of by this project are e-filing of returns [25], refunds, e-payment of taxes, online dealer ledgers, etc.
• Education: Education is the common concern of both the Central and State governments [26]. Thus, the Ministry of Human Resource Development (MHRD) established a centralized structure that is implemented by the State governments.
• E-municipalities: Digitization of the state-level municipalities is another very important initiative taken by the Central government [27] under the e-governance plan.
• Digitization of land records: The main objective of this project is to digitize the existing land records to avoid the chances of human mistakes [28].
• Employment Exchange: This project helps employers and employees to match their requirements and find the best fit using online resources [29].

Integrated e-governance projects: Other than the projects mentioned above, there are many projects requiring coordination between the Central and the State governments for the welfare of the citizens, for example, land records, education, entertainment, etc. Some of the integrated projects and their objectives are as follows.


• Road transport: This project created a unified scheme (across states and union territories) to computerize transport offices for efficient and quick management of driving licences and certificates [30].
• E-Procurement: This project helps to make procurement processes simple, transparent, and result-oriented [31] using the Internet.
• EDI for eTrade: Electronic data interchange (EDI) for online trade provides electronic delivery of services (24 × 7), increased transparency, reduced time and cost, etc. [32].
• E-Biz: This project provides Government-to-Business (G2B) services [33] by sharing updated online information, an easy-to-access website, etc.

4 Challenges of Using Big Data Analytic Techniques

Big data analytic techniques have proved their worth in e-governance-based projects. Still, there seem to be some challenges or gaps to overcome for the successful use of big data in e-governance [3, 34].

• Threat to privacy: Big data analytic techniques need to process personal details of citizens, like UIDs, bank details, health details, and sale or purchase information, for analysis. If this personal information is not used appropriately, it may lead to a safety threat.
• Ethical versus unethical: As the end-users (citizens) are neither aware nor informed that their personal details have been shared for future analysis, this act inclines towards the unethical use of power for accessing sensitive information.
• Security of data: The e-governance projects' datasets, placed on distant servers, may be exposed to intentional or unintentional threats to sensitive datasets.
• Lack of skilled resources: There is a deficiency of skilled resources to maximize the utilization of big data analytics by finding out hidden patterns or details.
• Reliability of information: The reliability of these reports mainly depends on the capabilities and intentions of the enabled resource generating that report.

5 Conclusion

E-governance has been transforming the whole world. Paper files have now been turned into computerized files, stored and maintained in repositories placed at far-off places. Big data analytic techniques have been adding sophistication to e-governance by providing detailed insights into hidden patterns in datasets. Big data analytic techniques have also been overcoming the traditional DBMS problems of storing, sharing, and processing huge volumes of high-velocity datasets at greater speed. Big data analytics also has some issues or risks related to the safety, security, and accessing of datasets. Technocrats are continuously working to provide safeguards against all the odds being faced while using big data analytic techniques. The Indian government is also working to make India a Digital India. Various e-governance projects have been implemented at the Central and the State levels for the welfare of the citizens. The most popular project, i.e. UID, has been using big data analytic techniques to store and process huge amounts of data. Thus, the integration of e-governance and big data should be encouraged to make Indian cities smart cities and India a Digital India. This will also help the Indian government in decision-making, better planning, and the management of resources for the welfare of citizens.

References 1. WSP International Management Consulting: Report on Public-Private Partnerships in India 2. Infrastructure in India: The Economist (Magazine) (2012) 3. Benchmarking E-government: A Global Perspective. United Nations Division for Public Economics and Public Administration 4. Preet, N., Neeraj, S., Manish: Role of big data analytics in analyzing e-governance projects. Gian Jyoti E-J. 6(2), 53–63 (2016) 5. Big data: Wikipedia. https://en.wikipedia.org/wiki/Big_data 6. Poonam, S., Mann, V.K.: IJRTE 8(6), 1609–1615 (2019). https://www.ijrte.org/wp-content/upl oads/papers/v8i6/F7820038620.pdf 7. Chen, M., Mao, S., Liu, Y.: Big data: a survey. Mob. Netw. Appl. 171–209 (2014) 8. Zaid, S.A.: Case Study: Impact of Leadership on Governance in Public Administration. Academia 9. Big Data and Analysis. FOSTECH & Company 10. Rajagopalan, M.R, Solaimurugan, V.: Big data framework for national e-governance plan. In: 11th International Conference on ICT and Knowledge Engineering. IEEE (2013) 11. Sridhar, V.: E-paper—Big Data’s Big Governance Impact (2017) 12. E-governance Development Index survey report from UN DESA (2018). https://drive.google. com/file/d/1FZT5zDfTa-ejvPh9c1Zu1w51DoMOefw1/view 13. E-governance Development Index survey report from UN DESA (2016). https://drive.google. com/file/d/1C-wGuGkLEIY4pwM-cO7Nv2xjM2_IvJbI/view 14. E-governance Development Index survey report from UN DESA (2014). https://drive.google. com/file/d/1BrSZ7zfsPGLd6t6AiHyynsZLhCEFZjmY/view 15. E-governance and Digital India Empowering Indian Citizens Through Technology, September 2015 16. Chap 8—E-governance in India: Initiatives, Opportunities and Pr0spects to Bridge the Digital Divide 17. Mohanty, P.K.: Using e-Tools for Good Governance and Administrative Reforms. Academia 18. Mission Mode Projects (MMPs) of India: Ministry of Electronics and Information Technology. Government of India 19. National Portal of India: www.india.gov.in 20. UID: http://www.censusindia.gov.in/ 21. Income Tax Department: www.incometaxindia.gov.in 22. Central Excise and Custom: http://www.cbec.gov.in/ 23. Insurance: http://financialservices.gov.in/ 24. Agriculture: http://agricoop.nic.in/ 25. Commercial Taxes: https://dor.gov.in/ 26. Ministry of Human Resource Development: www.mhrd.gov.in, www.edudel.nic.in 27. E-municipalities: http://tte.delhigovt.nic.in/wps/wcm/connect/doit_udd/Urban+Development/ Home

28. Digitization of land records: http://noidaauthorityonline.com/land-record.html 29. Employment exchange: www.labour.nic.in/ 30. Road Transport: http://morth.nic.in/ and http://meity.gov.in/ 31. E-procurement: https://commerce.gov.in/ 32. Electronic Data Interchange (EDI) for Trade (eTrade): https://commerce.gov.in/ 33. E-Biz Homepage: http://dipp.nic.in/ 34. Sachdeva, S.: White Paper on E-Governance Strategy in India (2002)


Implementation of Voice Controlled Hot and Cold Water Dispenser System Using Arduino K. Sateesh Kumar, P. Udaya Bhanu, T. Murali Krishna, P. Vijay Kumar, and Ch. Saidulu

Abstract The water dispenser is a system which can be used to dispense drinking water at various work and commercial places. Due to its extensive usage among the public, the demand for these water dispensers is increasing day by day. Health-conscious users, or users simply following their preference, prefer hot or cold water. Even though a plethora of water dispensers is available in the market, there is still scope to improve their performance. Existing microcontroller-unit-based water dispenser systems face common problems: they are button operated, and water is wasted through overflow or when no glass/container is present. In this paper, these problems are addressed with a hardware design. A novel voice-controlled water dispenser is proposed that maintains choice-based dispensing under voice control using an Arduino Nano. The system also avoids the wastage of water. Keywords Arduino · Microcontroller unit · Voice controlled · Water dispenser

1 Introduction

The water dispenser is a system which can be used to dispense drinking water at various work and commercial places, from schools to corporate workplaces, including hospitals. Due to its extensive usage among the public, the demand for these water dispensers is increasing day by day [1, 2]. These dispensers are used to dispense water at low (cool), normal (room temperature), and high (hot) temperatures with different microcontroller-based circuits/designs; even soft drinks are offered with this technology. Particularly in pandemic situations, people all over the world are very conscious about their health, especially about drinking water. On the other side, embedded systems are rapidly developing to address various real-time issues in our day-to-day life [2]. This creates a large market space for innovative and smart water dispensers on a global scale as an application of embedded systems. With this motivation, we propose an Arduino Nano-based voice-controlled hot and cold water dispenser system. This paper is organized as follows. Section 1 presents the introduction to the research problem, followed by the background in Sect. 2. A detailed discussion of the proposed method is presented in Sect. 3. The algorithm of the proposed method is presented in Sect. 4. Results are presented in Sect. 5, and the conclusion follows.

2 Background

An insight into the background is presented in this section. As an impact of the digitalization process, all household appliances, from basic needs to entertainment, have been transformed into a smart mode, and this applies extensively to water dispensers. In the early stage, water dispensers were equipped with heating coils and controlling circuits which were analog in nature. Due to the development of embedded systems, microcontroller-based units have replaced these analog systems [3]. Performance-wise, heating of water, controlling of temperature, speed, and accuracy are good in these digital systems [4, 5]. A standard Microcontroller Unit (MCU) is efficient in power management too [6]. ATMEL family-based water dispensers can also supply hot water through specially designed outlets made of copper or steel [7] and have a mixing mechanism to produce water at a required temperature level [6]. Apart from the basic functioning, water dispensers are also getting smarter, with interesting features like mixing of water to get the required temperature, monitoring of the water level, the mineral level in the water, and the quality of water, and sensor-based ON and OFF, etc., with the Internet of Things (IoT) [7, 8]. Most embedded machines are operated with a power supply and are better than power-free water dispensers [9]. The motive of this paper is to design a smart system that avoids wastage of water and is more user friendly, rather than to study the chemical attributes [4] of water.

The existing water dispenser [9] was implemented with the AT89S52 microcontroller and programmed in embedded C. This microcontroller unit can handle and interface with the I/O modules. Whenever the user presses a particular button on the dispenser, it sends the information to the microcontroller, which is connected to DC motors. According to the information received as input, the corresponding motors are activated and dispense the water. The heating coils present in the dispenser heat the water; the temperature sensor is used to sense the temperature of the water tank, and the power supply is given to the heating coils to heat the water. Some of the drawbacks of the existing system are as follows.

• Water overflow will occur at the dispenser when no glass is present.
• The buttons of the dispenser must be operated continuously until the required water level is reached.

These drawbacks are addressed and resolved in the proposed voice-based water dispenser system, which has more advanced hardware and higher computational power than the existing one.

3 The Proposed System

The block diagram of the proposed Arduino Nano-based water dispenser system is represented in Fig. 1. The proposed voice-choice-based water dispenser system works with an Arduino Nano board. This Nano board is more advanced than the Arduino Uno [5, 10, 11]. The system is an improved version of the existing 89S52-based system in hardware design and in functionality (software). The novelty of the proposed system over the existing one is that there is feedback between the user (input) and the outlet (output), which reduces the wastage of water and improves the performance of the system. The description of the proposed water dispenser is categorized into two parts: (i) hardware and (ii) software.

(Block diagram: the voice command module, crystal oscillator, power supply, temperature sensor, and IR sensor feed the Arduino Nano, which drives the relay and pump motor for hot water, the relay and pump motor for cold water, and the LCD display.)

Fig. 1 Block diagram of the proposed water dispenser using Arduino


3.1 Hardware

The heart of the hardware section of the water dispenser is the Arduino Nano, which replaces the AT89S52, whose performance is lower than that of the Arduino.

Arduino Nano: The Arduino Nano is a product of Arduino. It is a flexible, advanced microcontroller board with broad peripheral support. It supports the functionality of its earlier version, the Uno, in a smaller size, and it can produce analog and digital outputs to control the peripherals.

Power supply: The power supply required for this water dispenser is a maximum of 12 V.

Voice recognition module (VR-3): In this module, the voice(s) of the user are recorded and stored. When the user later gives a command, it is compared with the database, and the response is given as either hot or cold, as opted by the user.

Crystal oscillator: The crystal oscillator provides the clock to the Arduino board and is assembled on the board itself. The crystal frequency is of the order of 6 MHz.

Relay: The relay is an electromechanical component that acts as an automatic switch to drive the pump motor drivers (L298N) to dispense the water of choice through the outlets. Separate relays are used for the hot and cold outlets.

IR sensor: The IR sensor is used to identify the presence of the container or glass at the outlet by transmitting and receiving infrared signals. The water level is also measured using the IR sensor.

LCD display: In the proposed method, a 16 × 2 LCD is used to display the status of the container, the status of the water being dispensed, and any error messages. The outputs of the LCD display are discussed in the results section.

Temperature sensor: The temperature sensor senses the temperature of the hot and cold water tanks. Mixing of water is also possible based on the user's choice.

3.2 Software (Arduino 1.8.10)

The Arduino Integrated Development Environment (IDE) is used to program the proposed water dispenser in C++. This IDE provides the environment to store and process the voice commands given by the user.


4 Algorithm of the Proposed Method

The algorithm of the proposed method is presented in this section (a simulation sketch of this flow follows the list).

1. Power on the unit; it displays "Please place glass" on the LCD.
2. If the glass is not detected by the IR sensor, it keeps displaying "Please place glass" on the LCD.
3. If the glass is detected by the IR sensor, it displays "Glass detected, give your input" on the LCD.
4. Now, the user can give a voice command through the VR3. The recorded voice is compared with the database.
5. If the voice matches the database, the voice command goes to the Arduino module.
6. The Arduino dispenses the water, i.e., HOT or COLD, based on the choice.
7. Check whether the glass is full or not using the IR sensor.
8. Dispense the water continuously until the glass is full.
9. If the glass is full, display the message "Take your water" on the LCD.
10. Stop dispensing the water.
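The firmware itself is written in C++ in the Arduino IDE and is not listed in the paper; the following is only a host-side Python simulation of the control flow in the steps above, with the IR sensor, VR3 voice module, LCD, and relays replaced by stub functions. All names here are illustrative, not the actual firmware API.

```python
import random

# --- Stubs standing in for the IR sensor, VR3 voice module, LCD, and relays ---
def glass_detected():  return random.random() < 0.8                    # IR presence reading
def glass_full():      return random.random() < 0.3                    # IR level reading
def recognize_voice(): return random.choice(["HOT", "COLD", None])     # VR3 database match
def lcd(message):      print("LCD:", message)
def pump(choice, on):  print(f"{choice} pump {'ON' if on else 'OFF'}")

def dispense_cycle(max_ticks=20):
    lcd("Please place glass")                # step 1
    if not glass_detected():                 # step 2: no container, keep waiting
        lcd("Please place glass")
        return
    lcd("Glass detected, give your input")   # step 3
    choice = recognize_voice()               # steps 4-5: compare with stored voices
    if choice is None:
        lcd("Voice not recognized")
        return
    pump(choice, True)                       # step 6: dispense HOT or COLD
    for _ in range(max_ticks):               # steps 7-8: fill until the glass is full
        if glass_full():
            break
    pump(choice, False)                      # step 10: stop dispensing
    lcd("Take your water")                   # step 9

dispense_cycle()
```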

5 Results

This section provides a brief discussion of the results with the help of the corresponding screenshots. Figure 2 shows the overall design of the proposed voice-based smart water dispenser using Arduino and the message "Please place Glass". Figure 3 shows the message "Glass detected give voice input," and Fig. 4 shows the display of "HOT water" and "Cold water" on the LCD after the voice command given by the user is accepted.

Fig. 2 Overall design of the proposed water dispenser using Arduino


Fig. 3 Displaying “Glass detected give the voice”, when system is on

Fig. 4 Display of message “Hot water” and “Cold water” in LCD

6 Conclusion

A voice-controlled water dispenser using the Arduino Nano is proposed with more user-friendly accessibility. The system is an improved version, in hardware and in functionality, of the existing 89S52-based model. Avoidance of water wastage is an added advantage of this system, which makes it well suited for homes and commercial areas, and it needs less maintenance than power-free water dispensers. The proposed Arduino-based system can save water by implementing an IR sensor-based container detection and overflow detection mechanism.


References 1. Huang, P.P.: The effect of different temperature water intake to the resting heart rate variability. Department of Physical Education, Fu Jen Catholic University, Magisterial Thesis (2005) 2. Reverter, F., Gasulla, M., Palhls-Areny, R.: Analysis of power-supply interference effects on direct sensor-to-microcontroller interfaces. IEEE Trans. Instrum. Meas. 56(1), l71–177 (2007) 3. Jinxiong, X., Dong, Z., Yuying, W., et al.: A design of energy-saving drinking dispenser based on fuzzy memory control. J. Inspection Quarantine 20(3), 30–33 (2010) 4. Huang, J., Xie, J.: Intelligent water dispenser system based on embedded systems. In: Proceedings of 2010 IEEE/ASME International Conference on Mechatronic and Embedded Systems and Applications, pp 279–282. Qingdao (2010) 5. Cheng, W.Z., Cheng, R.Z., Shuo-Yanchou: Power saving for IoT enabled water dispenser system. In: 2019 42nd International Conference on Tele Communications and Signal Processing (2019) 6. Huang, C.J., Tsai, F.T.: Research and development of a practical water dispenser. In: International Conference on Applied System Innovation (ICASI), pp. 1225–1228. Sapporo (2017) 7. Zhongren, C., Fangjing, C., Yanfeng, Z.: Development and application of an external intelligent power saver for drinking water dispenser. Shan Xi Electronic Technology (2012) 8. Smart Systems and IoT: Innovations in Computing. Springer Science and Business Media LLC (2020) 9. Ariffin, S.H.S., Baharuddin, M.A., Fauzi, M.H.M., Latiff, N.M.A., Yusof, S.K.S., Latiff, N.A.A.: Wireless water quality cloud monitoring system with self-healing algorithm. In: 2017 IEEE 13th Malaysia International Conference on Communications (MICC), pp. 218–223 (2017) 10. Yen, Y., Chou, Z., Hou, M., Wang, X.: The design of intelligent water supply device based on MCU. In; 2015 IEEE 5th International Conference on Electronics Information and Emergency Communication, pp. 388–391. Beijing (2015) 11. Aisuwarya, R., Hidyathi, Y.: Implementation of Ziegler-Nichols PID tuning method on stabilizing temperature of hot-water dispenser. In: 2019 16th International Symposium on Research (QIR), International Symposium on Electrical and Computer Engineering (2019)

Future Smart Home Appliances Using IoT Pattlola Srinivas, M. Swami Das, and Y. L. Malathi Latha

Abstract The Internet of Things consists of physical devices and objects that collect, store, and analyze data. In-home appliances, activities and functions are still largely operated manually. IoT uses technical product elements, and through in-home appliances it is bringing rapid changes to society. IoT systems are designed, developed, controlled, and monitored in various applications such as health, transport, agriculture, and home appliances. We propose a framework model for future smart home appliances using IoT which helps the developer to build infrastructure and home automation applications according to the user specifications and requirements. The proposed model is the best solution for using smart home applications with sensors, communication, smart home operations, and control and monitoring through mobile apps and Arduino. The system will provide security and smart home automation. In future, it will be extended to develop intelligent smart applications with an integrated environment and reporting applications. Keywords IoT · Home security · SmartPhone · Smart home appliances · Automation

1 Introduction

The Internet of Things is a network of physical devices, objects, and sensors with network connectivity, which are used to collect, exchange, store, and analyze data. IoT applications in home automation feature energy management, protection, and safety. Around 1990, home automation used the Internet and devices. Around 2000, home network systems used a smartphone with apps for remote monitoring applications; around 2010, smart home applications using IoT and AI technologies were used for context-aware systems; and now, in 2020, intelligent smart home appliances use IoT, AI, and machine learning to record, store, and analyze data and to allow remote control and access according to the information in context. The IoT market is expected to reach about US $137.9 billion by 2023, with a steady growth rate. Home appliances are operated by smartphone with Wi-Fi as the communication protocol. The Internet of Things is a new technology used in various important applications like smart homes, health, energy servers, defense, monitoring, transport, traffic management, infrastructure management, water, and the built environment. IoT consists of components, networks, and sensors that can be integrated to read, store, and analyze information. The essential IoT technologies are WSN, RFID, middleware, cloud computing, and IoT software applications, and IoT services are required in various applications worldwide. According to world statistics, there are 20.4 billion IoT devices in 2020, and 64 billion IoT devices are expected to be in use by 2025. With the growing popularity of home systems, security is most important; IoT-based security helps guarantee the availability of services. Today, smart home technology advancements are mostly used for general human aid and intelligent smart home IoT services. IoT is changing human life; home appliances are used at home and in the office, in every domestic space: lighting, the dishwasher, gardening, air conditioning, etc. Sensor-controlled smart devices use smartphones or tablets with a Wi-Fi connection to collect the sensor data, which allows the data to be read, stored, and analyzed. Gardening uses automatic sprinklers in the smart automation infrastructure of a house, with Wi-Fi or Bluetooth as the communication medium. In 2020, about 20 billion devices are connected in health care, and advanced IoT products are used in home appliances and house automation. IoT is rapidly changing society and broadening the scope of such devices. Future home appliances will use IoT systems for TV, lighting, heating, and refrigerator operations, with the IoT devices connected and communicating efficiently. The paper is organized as follows: Sect. 2 describes the literature survey, Sect. 3 describes the proposed model, Sect. 4 presents the discussion, and Sect. 5 gives the conclusion and future scope.

2 Literature Survey

2.1 Related Work

Lee et al. [1] emphasize the essential elements, products, and services of IoT technology. Kumar Mandula et al. [2] discussed IoT-based applications in health care, the home, etc., using microcontroller-based automation and mobile apps. Mussab Alaa et al. [3] studied 229 IoT-related articles on technological advancements in smart home applications, smart homes, apps, and IoT databases, and classified the papers into an IoT smart home survey. Vignesh et al. [4] proposed a home automation model that accesses and controls devices remotely from smartphones with the use of WSN and cloud networking. Timothy Malche et al. [5] proposed the Frugal Laboratories IoT (FLIP) architecture, which uses sensor-based alerting, monitoring, controlling, and intelligence in smart home applications. Swetha et al. [6] studied systems to monitor electrical appliances such as lights and fans in smart homes using sensors and the Internet. According to Min Li et al. [7], smart home applications are one of the most important parts of smart grid usage, in which users respond to services when designing a smart home with an electricity service. Petnik et al. [8] proposed a cloud-based home care service with an integration layer. Heetae Yang et al. [9] studied smart home service functions; the authors collected 216 samples from Korea based on personal characteristics and behavior. Jian Mao et al. [10] studied IoT functionality and security with machine learning algorithms, which play a significant role in smart home systems. Hana Jo et al. [11] studied smart home IoT-related technology in which integrated devices are used to organize each device in a network to perform activities. Majid Al-Kuwari et al. [12] proposed smart home automation with IoT-based sensing and monitoring, controlling the smart home with intelligent automation through design, sensing, and monitoring. Shradha Somani et al. [13] proposed IoT-based smart security that provides home automation using software, sensors, and actuators. Ahmed et al. [14] studied IoT quality assurance; IoT applications are growing in various domains such as security, e-health, smart cities, and defense. Batalla et al. [15] proposed an architecture to provide security and availability. According to Khalaf et al. [16], smart home control activity uses IoT sensors, processing, and applications.

2.2 Problem Definition

To design and develop a model for an IoT-based smart home appliance system with automated activities based on sensors, data processing, and a control and monitoring system in the smart home environment.

3 Proposed Model

The architecture mainly consists of users, devices, network communication, controlling, and application services. The proposed system model is shown in Fig. 1: an IoT-based home application and reporting system covering the design and development of infrastructure and application services. At the initial stages of the architecture, the developer needs to investigate the needs, design the features of the IoT home automation appliance system, and collect the different specifications according to the operations.


Fig. 1 IoT-based home application and reporting system (block diagram: users and a smartphone app connect over networks to Arduino-based control and monitoring of IoT home appliances, including light, security, smart speaker, refrigerator, smart TV, door, temperature, and other sensors)

Table 1 Smart environment applications

Technology      Elements
Network size    Small, medium, and large
Users           Home users
Energy          Rechargeable battery
Internet        Wi-Fi, Bluetooth
Data            Local, sensor, and remote data
IoT devices     Smart mobile, RFID, protocols, and apps
Analysis        Data storage, analysis, and reporting

The IoT elements comprise, for each component, the hardware, middleware, storage, computing tools, data analysis, and visualization. The system is used by mobile and remote users; IoT home appliance users are the stakeholders of the user functions [17, 18]. The IoT home appliances will use the smart environment technology elements described in Table 1. The IoT sensors collect data and communicate it to the IoT applications according to the sensor information and user operations. The sensors support data management, processing, and analysis for home health and entertainment, processing events based on the information and running actions in services. Sensors will collect the data; for example, sensors will detect light, refrigerator, door, security, smart speaker, smart TV, temperature, water level, air quality, video, sound, pressure, humidity, infrared, vibration, and ultrasonic signals. The communication system will use gateways, protocols, firmware, and networks in the home appliances to collect the information, for example, RFID, Wi-Fi, WSN, and satellite; home automation using IoT relies on software, hardware, and networking, and the automation uses sensors, hardware, software, apps, and communication protocols. Application services in IoT smart home appliances include home safety, gardening management and security, air and water quality monitoring, a voice assistant for natural language, smart watch, smart lock, and smart energy meter, all depending on the smart home IoT home automation gateway. A smart mobile communicates with and controls these functions, connecting to the control and monitoring Arduino. Home automation will provide functions according to the stakeholders' specifications, with characteristic features such as house automation, functions, and security services. IoT is emerging in personal, home, and enterprise settings, using mobile operations for data collection, tracking, home maintenance application services, and optimization, and home automation offers flexibility in the use of IoT home appliances. Open-source IoT platforms such as Home Assistant, Domoticz, and openHAB provide IoT device security with message queuing, device administration, data collection, analysis, visualization, and integrity of services. Other IoT application areas are transport, agriculture, production, environment, industry, safety, and retail. Network-connected objects will provide security and utilization of applications.

3.1 Sensors and Network Communication The proposed architecture ensures the availability of services through efficient use of technologies such as CCTV, door sensors, and smart lock sensors, with a gateway collecting the data using communication protocols and sending alert or alarm warning information to the users. Security, availability, and response according to events must be provided, and preserving security and privacy is an important task in smart homes [19].

3.2 Smart Home—Functions The smart home functions in the proposed smart IoT home system provide various operations according to the user specifications, using smart control from mobile phones: for example, electric light On/Off, refrigerator On/Off, door Open/Closed, security On/Off, smart speaker On/Off, smart TV On/Off, and temperature control On/Off. In addition to home appliance control using the smartphone, notifications are sent through Email, SMS, and other channels, and other applications such as a solar power system and smart parking can be added according to the user specifications.

3.3 Smart Home—Automation Home IoT applications provide home automation through network service providers with quality of service. Standard traffic network management and the Bluetooth security protocol are used, optimizing data transmission over channels such as ZigBee and 5G networks to automate the response of actions in home automation by sending signals from unidirectional sensors through interfaces and controllers, which send commands to actuators (outputs).


It uses residential gateway sensors for light and temperature, with the smartphone providing a functional interface to control the actuators (i.e., light home automation based on WSN technology, ZigBee, and Wi-Fi technology) through an Android-based smartphone, covering the functionalities of smart home systems [20]. House automation systems are connected to the Internet, which communicates with the user. Sensors help to provide security for the user: all security terminals, alarms, and records use sensor data such as video and the door interface, and energy is used efficiently. The security system uses automatic doors, safety mechanisms, and alarm systems [21]. The home assistant process and home controlling include collecting, storing, and analyzing data. Home automation systems trigger commands based on configuration; a smart home can also trigger actions based on the past behavior of a user, and apps are used to control devices from mobile phones and tablets. The model provides a scenario to design, control, and monitor the smart home system [22, 23].

3.4 Smart Home—Controlling and Monitoring with Smart Mobile Apps and Functions The home automation uses IoT-based sensing and monitoring platforms that combine the sensors and signals of the smart home automation, communicating over Wi-Fi, Bluetooth, and other links. The functional principle follows Algorithm 1. The sensors and signals of the devices read data for home security, smart home interior design, and intelligent lighting; the hardware (Arduino board), software IoT design, residential connectivity, and home ecosystem devices, components, software, and sensors are set up to monitor the functions of the smart home technology chosen for the site of smart home construction. The architecture design provides all amenities using IoT-based smart home technology according to the user specifications, with functions such as energy monitoring, health, smart parking, and smart gardening. Operations and applications are controlled according to the sensor data through mobile app functions, for example, light on/off.

Algorithm 1 // Pseudocode for Home Appliances
Input: mobile interface functions ON/OFF
Output: Light ON/OFF
Begin
Step 1. Read the input data from the sensors and set up the communication credentials
Step 2. Use the Arduino board console with the smartphone app to operate the functions according to the sensor data
Step 3. Set up the functions and operate the home appliances using the sensor data over the communication link
Step 4. Control the signals: using the sensors, switch the home appliances on/off as required
Step 5. Report the information to the user
End

Developing a smart home with home safety, monitoring of required functions, smart parking, and video surveillance, the smart home uses hardware, software, and sensors and utilizes household applications that optimize home application systems through low-cost maintenance and energy savings (i.e., remote adjustments and improved efficiency). Example applications are lighting (light on/off), stove on/off, water on/off, video monitoring and video conferencing with family and friends, and security through video surveillance control with automatic alerts, SMS, and detection. Preventive decision systems, such as healthcare applications, will use Bluetooth technology to measure blood pressure and temperature, provide safety, alarms, and activity monitoring, guide exercise, diet, food, and preventive measures for daily-life activities, and deliver SMS, alerts, and functional operations to the stakeholders [24, 25].
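To make the pseudocode concrete, the following is a minimal illustrative Python sketch of Algorithm 1, not the authors' implementation: it assumes the Arduino streams comma-separated sensor readings over a serial link (via pyserial), and the threshold, port name, and notification step are placeholder values.

```python
# Minimal sketch of Algorithm 1 (hypothetical values; not the authors' implementation).
# Assumes the Arduino streams "sensor,value" lines over a serial port and that an
# appliance is controlled by writing a simple command back to the board.
import serial  # pip install pyserial

LIGHT_THRESHOLD = 300          # hypothetical ambient-light cut-off
PORT, BAUD = "/dev/ttyUSB0", 9600

def notify_user(message: str) -> None:
    # Placeholder for the app/SMS/e-mail notification step (Step 5).
    print("NOTIFY:", message)

def control_loop() -> None:
    board = serial.Serial(PORT, BAUD, timeout=2)        # Step 1: set up communication
    while True:
        raw = board.readline().decode(errors="ignore").strip()
        if not raw:
            continue
        sensor, _, value = raw.partition(",")            # e.g. "light,245"
        if sensor == "light" and value.isdigit():
            command = b"LIGHT_ON\n" if int(value) < LIGHT_THRESHOLD else b"LIGHT_OFF\n"
            board.write(command)                         # Steps 3-4: operate the appliance
            notify_user(f"light switched based on reading {value}")  # Step 5

if __name__ == "__main__":
    control_loop()
```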

4 Discussion The IoT-based smart home will improve efficiency: automated home lighting and energy management through effective use of mobile applications that can control air conditioning and security using wireless technologies. The main IoT challenges are data management, home care systems, home appliance consumption, home safety, home security, device connectivity, and software reliability. The system uses networking, hardware, software, communication protocols, switches, and sensors, with well-defined interfaces between them, to provide security, planning, configuration, and monitoring of connectivity for home applications with time-sensitive networking: end-to-end communication, time synchronization, latency, reliability, resources, features, options, configurations, and protocols. The recommendations for the smart home include reducing energy usage, warnings about defective appliances in the home, smartphones in health care for medical guidelines and health assistance, fire systems, security systems, home safety, security, connectivity, and scientific analysis. A smart home is a combination of subsystems built on advanced technologies; it interacts with the user and enhances safety, interaction, and convenience, optimizing people's lifestyle. It supports remote operations, allowing the home to be monitored and interacted with through a mobile device while working remotely. The smart home realizes real-time, secure meter reading, and the timing process supports network, business, and intelligent services such as home air conditioning, power information services, home appliances, and electricity management. In the future, smart home applications can be extended to communicating service applications, where intelligent interactive terminals let users issue commands to smart home appliances and security equipment systems.


5 Conclusion and Future Scope Sensor networks are used to take measurements for understanding the environment, natural resources, and the urban environment, and IoT and the Internet are moving toward future specifications driven by user applications. Pervasive communication, IoT, and smart connectivity create computer systems that interact with objects and sensors through smart devices such as smartphones and smart watches. The Internet of Things helps humans in home automation and household operations that respond with actions. The proposed model will help developers of IoT smart home appliances that use sensors, hardware, RF transmitters and receivers, user interfaces, processors, data collection, analysis and reporting systems, and effective utilization of the required functional services. In the future, smart tags will be used for logistics and vehicles in heterogeneous systems, and intelligent smart traffic applications will operate automatically. In the future Internet, a worldwide Internet of Things of many objects, smart connectivity, network resources, smartphones, devices, and intelligent environments, smart home appliances with integrated systems will be used to identify suspicious activity through video surveillance and report events under critical conditions.

References
1. Lee, I., Lee, K.: The Internet of Things (IoT): applications, investments, and challenges for enterprises. Bus. Horiz. 58, 431–440 (2015)
2. Kumar, M., Ramu, P., Murty, C.H.A.S., Magesh, E., Lunagariya, R.: Mobile-based home automation using Internet of Things (IoT). In: 2015 (ICCICCT). IEEE, pp. 340–34 (2015)
3. Alaa, M., Zaidan, A.A., Zaidan, B.B., Talal, M., Kiah, M.L.M.: A review of smart home applications based on Internet of Things. J. Netw. Comput. Appl. 1–36 (2017). Elsevier
4. Vignesh, G., Sathiya Narayanan, M., Abubakar, B.: Customary Homes to Smart Homes Using Internet of Things (IoT) and Mobile Application. IEEE, pp. 1059–1063 (2017)
5. Malche, T., Maheshwary, P.: Internet of Things (IoT) for building smart home system. In: International Conference on I-SMAC (IoT in Social, Mobile, Analytics, and Cloud) (I-SMAC 2017). IEEE, pp. 65–70 (2017)
6. Swetha, S., Suprajah, S., Vaishnavi Kanna, S., Dhanalakshmi, R.: An intelligent monitor system for home appliances using IoT. In: International Conference on Technical Advancements in Computers and Communications. IEEE, pp. 106–109 (2017)
7. Li, M., Gu, W., Chen, W., He, Y., Wu, Y., Zhang, Y.: Smart home: architecture, technologies and systems. In: ICICT-2018. Elsevier, pp. 393–400 (2018)
8. Petnik, J., Vanus, J.: Design of Smart Home Implementation Within IoT with Natural Language Interface. Elsevier, pp. 174–179 (2018)
9. Yang, H., Lee, W., Lee, H.: IoT smart home adoption: the importance of proper level automation. Hindawi J. Sens. 1–11 (2018)
10. Mao, J., Lin, Q., Bian, J.: Application of learning algorithms in smart home IoT system security. Math. Found. Comput. 63–76 (2018)
11. Jo, H., Yoon, Y.I.: Intelligent smart home energy efficiency model using artificial TensorFlow engine. Hum. Cent. Comput. Inf. Sci. 8(9), 1–8 (2018)
12. Al-Kuwari, M., Ramadan, A., Ismael, Y., Al-Sughair, L., Gastli, A., Benammar, M.: Smart-home automation using IoT-based sensing and monitoring platform. In: 2018 IEEE 12th (CPE-POWERENG 2018), Doha, pp. 1–6 (2018)


13. Somani, S., Solunke, P., Oke, S., Medhi, P., Laturkar, P.P.: IoT based smart security and home automation. In: 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, pp. 1–4 (2018)
14. Ahmed, B.S., Bures, M., Frajtak, K., Cerny, T.: Aspects of quality in the Internet of Things (IoT) solutions: a systematic mapping study. IEEE Access 7, 13758–13780 (2019)
15. Batalla, J.M., Gonciarz, F.: Deployment of the smart home management system at the edge: mechanisms and protocols. Neural Comput. Appl. 31, 1301–1315 (2019)
16. Khalaf, R., Mohammed, A., Essa, E., Ali, H.: Controlling smart home activities using IoT. In: 2019 International Conference on Computing and Information Science and Technology and Their Applications (ICCISTA), Kirkuk, Iraq, pp. 1–6 (2019)
17. Bhat, O., Bhat, S., Gokhale, P.: Implementation of IoT in smart homes. Int. J. Adv. Res. Comput. Commun. Eng. 6(12), 149–154 (2017)
18. Shah, H.: Home Automation Using IoT. https://www.simform.com
19. Batalla, J.M., Gonciarz, F.: Deployment of the smart home management system at the edge: mechanisms and protocols. Neural Comput. Appl. 31, 1301–1315 (2019). https://doi.org/10.1007/s00521-018-3545-7
20. https://www.slideshare.net/shohin/iot-home-automation-using-arduino-cayenne
21. https://www.businesswire.com/news/home/20200102005197/en/Prominent-IoT-TechnologyLeader-Showcase-Newest-Must-Have
22. Cheruvu, S., Kumar, A., Smith, N., Wheeler, D.M.: Demystifying Internet of Things Security: Successful IoT Device/Edge and Platform Security Deployment. Springer, pp. 347–411 (2020)
23. Linskell, J., Dewsbury, G.: Home automation system. In: Handbook of Electronic Assistive Technology. Elsevier (2019)
24. Kadima, M.N., Jafari, F.: A customized design of smart home using Internet of Things. In: ICIME 2017. ACM, pp. 83–86 (2017)
25. Gubbi, J., Buyya, R., Music, S., Palaniswami, M.: Internet of Things (IoT): a vision, architectural elements and future directions. Future Gener. Comput. Syst. 29, 1645–1660 (2013)

Multilingual Crawling Strategies for Information Retrieval from BRICS Academic Websites Shubam Bharti, Shivam Kathuria, Manish Kumar, Rajesh Bhatia, and Bhavya Chhabra

Abstract This paper proposes a web crawler for finding details of Indian origin academicians working in foreign academic institutions. While collecting the data of Indian origin academicians, we came across BRICS nations. In BRICS, except South Africa, all other countries have university websites in native languages. Even if the English version is available, it is with lesser data that can’t make the decision of whether an academician is of Indian origin or not. This paper proposes a translation method of the data from the main website in the native language to English language. It is to be noted that google translation on such website does not give output in the desired manner. We discover the area of translation using various APIs as well as other techniques available for the same like UNL, NER (provides a supportive role for translation), NMT, etc. Also, we will explore Stanford NER and segmenter for these operations. Keywords Web crawler · Indian origin academicians · BRICS · Language translation

S. Bharti (B) · M. Kumar · R. Bhatia Department of Computer Science and Engineering, Punjab Engineering College, Chandigarh, India e-mail: [email protected] M. Kumar e-mail: [email protected] R. Bhatia e-mail: [email protected] S. Kathuria Department Electrical Engineering, Punjab Engineering College, Chandigarh, India e-mail: [email protected] B. Chhabra Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_17


1 Introduction The heart of any search engine is the data collected by its web crawler. A web crawler can be defined as a program that browses the Internet, storing links to and information about the pages it visits. There is a type of web crawler that focuses only on the web pages which are relevant according to a pre-defined set of keywords/topics; these are called focused crawlers [1]. For finding the information of Indian origin academicians [2], various constraints were devised and different educational domains were studied. After this, a relevancy-checking-based mechanism was developed that can guide the focused crawling. Our developed crawler was running successfully for all the countries in which the academic websites are in English, but when we tried to apply the same approach to non-English websites, problems arose. This paper discusses the problems that came up while crawling the BRICS nations, which have a high number of Indian origin academicians. However, these websites are mostly in their native languages and have to be translated via the Google Translate API, which is handled automatically by browsers but not by web crawlers, which directly visit the seed URL provided. This process yields very poor results, as the keywords used to train the crawler were in the English language.
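As a concrete illustration of the focused-crawling idea described above, here is a minimal Python sketch; it is not the crawler of [2], and the seed URL handling, keyword list, and page limit are assumptions made only for this example (it relies on the requests and BeautifulSoup libraries).

```python
# Minimal keyword-guided (focused) crawl sketch; keywords and limits are illustrative only.
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

KEYWORDS = {"faculty", "professor", "department", "staff"}   # hypothetical relevance terms

def relevant(text: str) -> bool:
    text = text.lower()
    return any(k in text for k in KEYWORDS)

def focused_crawl(seed: str, max_pages: int = 50):
    queue, seen, hits = deque([seed]), {seed}, []
    while queue and len(hits) < max_pages:
        url = queue.popleft()
        try:
            page = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        soup = BeautifulSoup(page, "html.parser")
        if relevant(soup.get_text(" ")):
            hits.append(url)                       # keep pages likely to list academicians
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if link.startswith(seed) and link not in seen:   # stay within the university site
                seen.add(link)
                queue.append(link)
    return hits
```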

2 Problem Statement A. Language problem for academic websites of BRICS nations. BRICS is an association of five nations, namely Brazil, Russia, India, South Africa, and China. As we need the data only from non-Indian nations, India is excluded. For South Africa, the language of the university websites is English, so there is no need for translation. For Brazil, the official language of most websites is Portuguese, and some are in Spanish; both these languages make no changes to the names, and hence extracting the data needs no translation there either. For China, all the academic websites are in Chinese, and hence the names need to be translated as well. But here, a mere Google Translate does not work, as it sometimes gives the literal meaning of the word. B. Extracting data from the translated text. From the converted texts derived from the webpages, the required data needs to be extracted. The data to be fetched includes anything that might prove useful for the purpose of the project, like name, department name, specializations, contact, etc. For Chinese websites, recognizing the above-mentioned data from a translated webpage is difficult. Hence, we need to handle this challenge as well.


Hence, we aim to enhance the current crawler [2], with the capability to handle these different languages, convert them to English based on the translation mechanism needed for that particular instance.

3 Techniques Used 3.1 UNL: Universal Networking Language UNL is a technique in which the given words/sentences are converted into Universal words and vice-versa. Thus, it focuses on taking a source language and converting it into a language which is independent of the given languages, whether source or target. As shown in Fig. 1, the system of UNL consists of the two core processes, namely UNLization and NLization, which are explained below. These two processes are explained in detail. (1) UNLization (using IAN): Interactive analyser (IAN) is a tool based on JAVA. It is a web application used for the process of UNLization. Its input is a natural language (source language), and it converts the given words into UNL form which is language independent. (2) NLization(using EUGENE): This is an online software tool. It was developed by UNDL organization. It is similar to IAN and was released in 2012 [3]. The UNL created in UNLization process is given as input to it. UNL Components: The various components of UNL are discussed next.

Fig. 1 System architecture of UNL


(a) Universal Words (UW): Universal expressions consist of nodes, which are formed, or represented, by the Universal Words. Two other components, namely relations and attributes, are combined to represent these words. The following format, given in Eq. (1), is used to represent a Universal Word in UNL:

<UW> = <headword>[(<constraint list>)]   (1)

To demonstrate this process, we provide the following English expression in (2).

English sentence: Man drives car.   (2)

The Universal Words here are man(icl>person)@singular, drive(icl>travel>do, agt>thing), and car(icl>object)@singular.

(b) Relations: Relations are the links that exist between two UWs. The relation names, which are then used to make UNL expressions, come from a pre-decided set of names. (c) Attributes: The subjective nature of a Universal Word in a sentence is depicted with the help of attributes.

3.2 NER: Named Entity Recognition NER is an initial step in data extraction. The major objective of NER is to locate and classify the named entities in the provided text and allocate them to several pre-defined categories. These categories can be persons, organizations, expressions of time, locations, monetary values, symbols, percentages, quantities, etc. One example of this mapping is shown below in Fig. 2, where the sentence is used to classify a person and an organization.

Fig. 2 An example of NER
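The kind of labeling shown in Fig. 2 can be reproduced with an off-the-shelf NER tool. The snippet below is a small illustrative sketch using spaCy as a stand-in (the paper itself also explores Stanford NER); the model name and the example sentence are assumptions made only for this example.

```python
# Illustrative NER pass using spaCy (a stand-in; the paper also explores Stanford NER).
import spacy

nlp = spacy.load("en_core_web_sm")          # small English model, assumed to be installed
doc = nlp("Sundar Pichai joined Google in 2004.")

for ent in doc.ents:
    # Prints entity text and label, e.g. "Sundar Pichai PERSON", "Google ORG", "2004 DATE"
    print(ent.text, ent.label_)
```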


3.3 Direct APIs Many direct APIs, like pytrans, googletrans, and the Java translation library, were used for the translation of Spanish and Portuguese. Direct use of googletrans is enough for these languages and gives high accuracy for the names.
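A minimal sketch of such a direct API call with googletrans is shown below; the exact call behavior depends on the library version, and the sample Portuguese string is only illustrative.

```python
# Sketch of direct API translation with googletrans (behavior depends on the library version).
from googletrans import Translator

translator = Translator()
result = translator.translate("Departamento de Engenharia Elétrica", src="pt", dest="en")
print(result.text)   # for Portuguese/Spanish, personal names pass through essentially unchanged
```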

3.4 NMT: Neural Machine Translation Neural machine translation (NMT) [3] has proved to be one of the most powerful and efficient approaches to the task of natural language translation. Whereas statistical translation uses only data for translation, it also raises issues of incorrect sentence translation, as the results often make no sense after the process. One of the NMT models is the encoder-decoder structure. This architecture, as shown in Fig. 3, comprises two recurrent neural networks (RNNs) used together in tandem to create a translation model. After coupling it with the power of attention mechanisms, this architecture can achieve impressive results [4].

4 Methodology For BRICS, we modified this approach to bring in translation as well, which makes the final approach as follows:
1. A list of university URLs is fed to the crawler, which visits them.
2. It extracts those URLs which might have faculty member names in the pages.
3. Applying NER on those pages, we can find proper nouns, which are mostly names.
4. These names, in the case of BRICS nations, are translated to the English language.

Fig. 3 Structure of NMT translator

5 Data Gathering The following datasets were used for the filtering process.


5.1 Dataset of Names The list of Indian names required for matching has been taken from Kaggle [5]. Also used was the list of names from our ongoing project, which maintains a database of Indian origin academicians in the US and UK. Further names were extracted from Lok Sabha and parliamentary election records [6], since to be a candidate in these elections one needs to be an Indian citizen.

5.2 List of Universities To start the crawling, we need the URLs of the academic websites that act as seed URLs for the crawler to reach the academicians. The following are the approaches we use to get the list of all the seed URLs of the home pages of the universities in the BRICS nations: 1. Higher education boards and sources [7] 2. Seed URLs using Google Maps crawling [8].

6 Counter-Intuitive Approach The proposed approach to deal with the change in names caused by literal word translation in Google Translate is as follows. We compare the English version of a name with its translation in Chinese so that we can map the Chinese translation back to the original Indian name [9]. The first part of this is creating a mapping of Indian to Chinese names. For this, we used Google Translate to find the Chinese translation of each Indian name, translated that Chinese name back to English to check the consistency of the translation, and then repeated the process one more time. This was done for 33,000 Indian names derived from the datasets mentioned above. As shown in Fig. 4, the first column is the original name, and each consecutive column is the Google Translate translation of the previous column. The reader can clearly observe the total change in the name from column A to column E after these translations. So, the chain of translations is: English -> Chinese -> English -> Chinese -> English. In the first case, we look for Indian names in an English article that contains Indian names, counting the Indian names manually from the NER-labeled PERSON entities in the text and also making a count by comparing with the dataset above. Next, we convert the English article to a foreign language, say Chinese, using Google Translate and then search for the Indian names in Chinese in the translated Chinese text using the lists of names in columns B and D. Surprisingly, the counts in the two cases come out different. Here, in the translated Chinese text file, we use a regex to remove any English words present in the file. Next, the translated Chinese text from above is converted back to English and tested for the names using NER as in the first case.

Fig. 4 Sample data set of names
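The back-and-forth mapping described above (columns A to E in Fig. 4) can be sketched roughly as follows; this is an illustrative outline only, again assuming googletrans, with a tiny placeholder name list standing in for the 33,000 names.

```python
# Sketch of the back-and-forth (E -> C -> E -> C -> E) name mapping described above.
# googletrans is assumed; the name list is a small placeholder, not the project dataset.
import pandas as pd
from googletrans import Translator

translator = Translator()
names = ["Ramesh Kumar", "Anita Sharma"]     # placeholder names

rows = []
for name in names:
    chain = [name]                           # column A: original English name
    src, dest = "en", "zh-cn"
    for _ in range(4):                       # columns B-E: alternating translations
        chain.append(translator.translate(chain[-1], src=src, dest=dest).text)
        src, dest = dest, src
    rows.append(chain)

mapping = pd.DataFrame(rows, columns=list("ABCDE"))
print(mapping)                               # columns B and D are the Chinese forms
```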

7 Results Figure 5 shows the results of translating the entities of the Chinese websites. The first row shows the number of entities we get after running NER on the sites: 1818. In this corpus, 2869 entities were designated as PERSON, and these matched 2239 names from our dataset of names, whose size is 30,000. The total number of matches over all entities is 35,821 (this includes names as well as other text on the website pages). Now, as shown in Fig. 4, we translated the 30,000 names back and forth between English and Chinese. Using the final translated names from those results, we match them again with the NER entities. This time, the matches for the English text were reduced to 450, and the Chinese matches were reduced to 317; similarly for rows 8 and 9, which match the Chinese versions of the words. Applying all these translations to the text, the final number of matches with the dataset, as shown in row 11, is 7814, which is a decrease of 78.18% from row 4, where the matches for the original text were 35,821.

Fig. 5 Sample results for Chinese website

8 Conclusion This paper discussed the usage of various techniques and the results obtained after applying some of those viable techniques on the corpus of data. The results can be divided into three categories for the BRICS nations: A. South Africa: The data is already in English and hence needs no translation, so direct extraction of the data is possible. B. Brazil: The languages of the websites are Portuguese and Spanish. Both these languages make no changes to the names of people, and hence the names need not be translated. C. Russia and China: For Russian and Chinese, the character set is different from that of English, with some letters, like M and V, and others not present in one or both of these scripts. Hence, to generalize, any language which has the same character set as English is easily translatable using currently available methods, but other languages need different methods to achieve the same. A combination of NER and NMT-based translators seems a viable option for this purpose.

References
1. Kumar, M., Bhatia, R., Rattan, D.: A survey of Web crawlers for information retrieval. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 7(6), e1218 (2017)
2. Kumar, M., Bindal, A., Gautam, R., Bhatia, R.: Keyword query based focused Web crawler. Procedia Comput. Sci. 125, 584–590 (2018)
3. Neural Machine Translation (online article). https://towardsdatascience.com/neural-machinetranslation-15ecf6b0b. Last accessed 1 March 2020
4. Wang, X., Zhu, C., Li, S., Zhao, T., Zheng, D.: Neural machine translation research based on the semantic vector of the tri-lingual parallel corpus. In: 2016 International Conference on Machine Learning and Cybernetics (ICMLC), Jeju, pp. 69–74. https://doi.org/10.1109/icmlc.2016.7860879
5. https://www.kaggle.com/chaitanyapatil7/indian-names/version/1 [online dataset]
6. https://github.com/datameet [a community of Data Science enthusiasts]
7. https://www.ugc.ac.in/oldpdf/Consolidated%20list%20of%20All%20Universities.pdf
8. https://github.com/shivamkathuria/Google-Maps-Crawler [to get code of developed crawler]
9. Creekmore, L.: Named entity recognition and classification for entity extraction. District Data Labs

Missing Phone Activity Detection Using LSTM Classifier Abhinav Rastogi, Arijit Das, and Aruna Bhat

Abstract We propose a smartphone application that aids a user in finding his lost phone. In this application, we try to identify the cases where a mobile phone can be separated from the user by training a classifier. The application identifies specific events where a mobile phone goes away from the user and records cumulative sensor data. The obtained sensor data can be used for further analysis of the mobile phone's surroundings, which could narrow down the search domain. A trivial example is that GPS cannot help when you have forgotten where the phone has been placed. So, the application can gather information about the surroundings at the last recognized event and make the search more effective. Keywords Mobile sensor · Smart phone detection · Long short-term memory · Recurrent neural network

A. Rastogi (B) · A. Das · A. Bhat
Department of Computer Science and Engineering, Delhi Technological University, Delhi 110042, India
e-mail: [email protected]
A. Das
e-mail: [email protected]
A. Bhat
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_18

1 Introduction Science and technology in today's world have advanced to a great extent, in an effort to make human life easier and, importantly, more comfortable. A lot of the information we might need can be stored in a smartphone, so a mobile phone acts as our secondary brain, not only remembering but also reminding, seeing, and listening, covering basically all the senses humans have. With all these abilities, a smartphone has been such a helpful companion in our daily life; what if you lost it or forgot it at some place, and it was low on battery, causing it to turn off eventually? We would be left clueless. There are not many efficient ways to find the phone; the existing solutions give you a GPS location, which is helpful only when outside, and the mobile needs to be alive when the location request is triggered. What if you were doing some task at your home and forgot your phone, such as placing it on a bookshelf while picking books? GPS can just say that it is in your home but nothing more. As said, a smartphone has a sensing capability like that of humans, and we want to use this to our advantage. A mobile phone can be made to sense the surroundings, such as lighting, coordinates (indoor positioning where possible), and sound signals (to identify whether the environment is silent or noisy), along with GPS, at the instant the phone gets separated from the user [5]. On sensing, that is, recording the sensor information, it can be uploaded to a server, thus providing surrounding information in all cases when the phone identifies the trained instances. In this paper, we propose a mobile phone monitoring application. The application serves as an aid, helping narrow down the details of the surroundings when a phone gets separated from the user. The existing solutions make the user query for the location after they realize that the phone is lost; the application can then give the GPS location by contacting the mobile phone, provided the mobile phone is on. Our application, in contrast, addresses the problem of the phone being separated from the user through the question "How or when can a phone be separated from the user?", that is, through the trigger events which indicate when a phone gets separated from the user. With this approach, the user can get the latest information about the phone's location before it is dead (turned off). The trigger events can be a phone getting dropped from the pocket or hands unnoticed, placing it at a location and forgetting the location due to some other important distraction, or a mobile phone being stolen from the user [2, 6, 7].

2 Related Work Losing a phone can be considered one of the most common problems a mobile user could face, yet one that is difficult to recover from. On losing a phone, a user would like to know where the phone currently is and wants the phone not to be misused. Mobile phone operating systems like Android and iOS have apps, Find your phone and Find My iPhone, that can query the device for its location. The disadvantage of this approach is that the phone needs to be alive (on) when the query is made; otherwise, the phone cannot respond to the query. A smartphone can learn the location of a neighborhood not only through physical coordinates but also in a logical way: it can sense the light ambiance and sound in a neighborhood, describing logically whether the location is a quiet or noisy place and a dark or bright area. Research is also being done in this field, where a mobile phone can locate users in adjacent stores, solving the problem through logical localization [5]. Added to the application, this narrows the search domain, making the work easier for users. The other approach is finding the trigger event of the phone being lost and alerting the user; [1] proposed a framework called iGuard which triggers an alarm when it identifies a theft. Their paper addresses the problem by analyzing the activity at the moment of theft: they described how the sensor signatures of a mobile phone differ when it is taken out by a thief versus by the owner, thus alerting the user instantly on theft. However, the disadvantage of this approach is that feature extraction is done manually by the researchers, which is a very cumbersome task; for the activity of a missing phone, it is possible to miss the actual factors responsible for differentiating two different actions. The approach of our application aligns with their paper, that is, how or when a mobile phone could be separated from the user. In this paper, we make use of a deep learning approach to classify our sensor data for different activities.

3 System Design At a high level, the application can be divided into communication among three components: (i) the mobile phone, to monitor the user's activity, (ii) a classifier that takes in the sensor information from the mobile phone and identifies the trigger events, and (iii) an online platform, where the sensor information about the neighborhood is logged automatically for further logical analysis when the trigger event is identified. The following are the assumptions for the app to be functional: (i) the application needs to be started manually and continuously monitors the user activity for triggers, (ii) the mobile phone must be on when the trigger event occurs, and (iii) the mobile phone must have the Internet on in order for the sensor information to be logged to an online platform. Figure 1 gives a high-level overview of the mentioned system. The most important and challenging component of the system is identifying the trigger event. The classifier must differentiate short-duration activities, like taking the phone out of a pocket, using the features extracted from the sensor signatures of the various activities. Once a trigger has been identified, the system can log sensor information onto an online platform for further analysis.

Fig. 1 High-level working


Fig. 2 Algorithm

4 Algorithm/Method Design Figure 2 describes the algorithm of the application. Information shall be logged to the online platform directly in both cases, when the phone is dropped and when the phone is stolen. But in the case of a mobile phone being placed on a surface, the event lasts longer, and the phone shall remain stationary unless another activity is noted. Therefore, the algorithm checks whether the mobile is inactive, and only then does it log the sensor information. The inactivity of the mobile phone can be verified through sensor readings, and the mic can be used to know if the user is around [3]. This is a rather complex case, because the mobile could be separated from the user while he/she is still aware that the phone is nearby. Therefore, the system needs all the possible factors that can assure that the user is nearby.

5 Experimental Setup 5.1 Preliminary In this section, we first develop various scenarios where a person can be separated from his phone. Specifically, we take three different situations into consideration: (i) the user places the phone on a surface and forgets about it, (ii) the user drops the phone while walking or standing, and (iii) the user's phone is stolen from his pocket by a thief or perpetrator. The implementation in this paper is focused on creating different signatures for each of these activities and then training a classifier to demonstrate how mobile sensor data could be used to warn the user about the missing phone problem in real time.

Fig. 3 Testing accuracy as predicted by our model

5.2 Data Collection We used the AndroSensor application available on the Google Play store to collect sensor data while the user performs several motions. We use the values of the accelerometer, gyroscope, linear acceleration, and gravity sensors for our application. Experiments are performed on four Android phones (Pixel 2, Samsung S7, Samsung Note 3, Samsung S8). Data for all sensors is collected at a frequency of 10 Hz. We found four volunteers (2 female and 2 male) between the ages of 19 and 23 to perform the experiments. In the first experiment, the user walks with the phone in his hand and accidentally drops it on the ground. As part of the second experiment, a volunteer walks with the phone in his pocket and then performs the activity of taking the phone out of his pocket. In the third experiment, while the first volunteer walks at a normal pace, the second volunteer slyly performs the act of stealing the phone from the person's pocket. In the fourth experiment, the user places his phone on a plain surface. Further experiments comprised the user walking, standing, and sitting with the phone in his/her pocket or hand. Each experiment is performed by a volunteer over seven sets of 6 min each. In each set, the specific activity is repeated periodically such that we collect 50 samples per volunteer per activity, leading to a total of (50 × 7 × 4) 1400 samples. In addition to that, all experiments are video recorded, and the data collected from the filters is labeled manually by comparing the sensor data to the video frame by frame.

5.3 Data Processing Collected sensor data was passed through a low pass filter to remove noise. Next, data collected from each experiment per volunteer was sampled with a window size of 3 s with 50% overlap. Since the frequency of data collection is 10 Hz, we end up with 30 samples of data per input. Each sample is further represented by 12 features, where the twelve entities comprise the x, y, z values of each of the four sensors involved, namely the accelerometer, gyroscope, linear acceleration, and gravity sensors. Data processing has been done using Python's SciPy and NumPy libraries.
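The windowing step described above (3 s windows at 10 Hz with 50% overlap, giving 30 samples of 12 features each) can be sketched with NumPy as follows; the function and array names are illustrative, not taken from the authors' code.

```python
# Sketch of the 3 s / 50%-overlap windowing (10 Hz => 30 samples per window, 12 features).
import numpy as np

def segment(signal: np.ndarray, window: int = 30, overlap: float = 0.5) -> np.ndarray:
    """signal: (n_timesteps, 12) filtered sensor stream -> (n_windows, 30, 12) array."""
    step = int(window * (1 - overlap))            # 15-sample hop for 50% overlap
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

stream = np.random.randn(600, 12)                 # one minute of dummy data at 10 Hz
windows = segment(stream)
print(windows.shape)                              # (39, 30, 12)
```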

5.4 Training In most of the previous models for theft detection, feature extraction is done manually by the researchers. For example, [1] proposed a framework called iGuard, where the authors explicitly look for a specific signature in both activities: the phone being taken out by the user himself and the phone being stolen by a perpetrator. Their paper describes how, in the event of a user taking out the phone himself, the speed of the user first decreases, then the phone is taken out, and then normal speed is resumed; in the other case of the perpetrator taking out the user's phone, the phone is first taken out and then the speed of the perpetrator increases. These scenarios, though true for most cases, are highly specific, and in an activity of theft, the user or the perpetrator may not act like the model figures they are represented as in the application. Feature extraction is a highly cumbersome task and requires precise feature engineering [4]. For the activity of a missing phone, it is possible that we may miss out on the actual factors responsible for differentiating two different actions. Also, as we consider more and more sensors in our application, it is a challenging task to analyze the effect of each sensor on different activities. To counter the above scenarios, we make use of a deep learning approach to classify our data for different activities. We make use of an LSTM recurrent neural network to classify the mobile sensor data. The advantage of using an LSTM is that, while giving accurate results, it does the feature engineering for us [4]; we can also avoid the hassle of doing a whole lot of signal processing before using the sensor data in our model. We wanted to use an RNN for our activity identification model because we were dealing with a sequence of sensor data, and also because we wanted the neural network to learn the hidden features that differentiate two different activities.
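As the paper does not list the exact network configuration, the following is a plausible minimal sketch of such an LSTM classifier in TensorFlow/Keras over the (30 timesteps × 12 features) windows; the layer sizes, dropout, optimizer, and the seven-class label set are assumptions made only for illustration.

```python
# Minimal Keras LSTM classifier over (30 timesteps x 12 features) windows.
# Layer sizes, dropout, and optimizer are assumptions, not the authors' reported settings.
import tensorflow as tf

NUM_CLASSES = 7   # e.g. walk, sit, stand, place-on-table, drop, take-out-self, taken-by-thief

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30, 12)),        # one window: 30 samples x 12 sensor features
    tf.keras.layers.LSTM(64),                      # learns temporal features of the activity
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training call (X_train: (n, 30, 12) windows, y_train: integer activity labels):
# model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, batch_size=32)
```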

5.5 Implementation The classifier has been implemented in Python using TensorFlow. Along with that, all sensor processing, data preprocessing, and data analysis have been done using the scikit-learn, NumPy, matplotlib, and pandas libraries in Python, along with Jupyter Notebooks.

6 Performance Evaluation In this section, we evaluate the performance of our model under various scenarios to show its accuracy and robustness. We have tested our model using Android phones (Pixel 2, Samsung S7, Samsung Note 3, Samsung S8). The sampling rate is set to 10 Hz. This sampling rate is conducive to our model because each activity of taking the phone out, or of the phone being stolen by the perpetrator, takes roughly 3 s; a sampling rate of 10 Hz therefore gives us 30 readings per sensor per time window, which is good enough to train our model. All experiments have been conducted in four different scenarios: (i) the library, (ii) an open area, (iii) an open area with people around, and (iv) an office-like setting. To evaluate the accuracy of our model, we randomly segment our dataset into training and test sets in the ratio 4:1. Figure 4 highlights the accuracy of our model, which is 95.52%, which is fairly good. We also calculate the precision (96.03%), recall (95.51%), and F1-score (95.50%) for our model. Figure 3 shows how the loss over both training and testing data decreases as the number of training iterations increases; the accuracy of the model also increases with the number of training iterations.

Fig. 4 Training and testing losses and accuracies


In the confusion matrix depicted below, we find that while the activities of sitting, standing, and placing the phone on the table are detected without a miss, certain segments (25) of walking are misclassified as the phone being dropped or as taking the phone out of the pocket. Apart from that, only four segments of the activity of the phone being taken out by the user were classified as it being stolen by a thief, and only eight segments of the phone being stolen by a thief were classified as the phone being taken out by the user (Figs. 5 and 6).

Fig. 5 Confusion matrix normalized to percentage of total dataset

Fig. 6 Confusion matrix showing the accuracy of our model using different colors


7 Conclusion The proposed solution can successfully identify the triggers with around 95.5% accuracy and can thus send the log information for an effective search. But the application has a few drawbacks. The application is of no use when the trigger event happens while the mobile phone is turned off; a possible solution is that a signal can be transmitted from the phone even when it is turned off, using the BIOS battery, so that minor log information can be embedded in the signal and sent for recovery. The app also has to be opened manually in order to check for triggers. The phone being dropped and the user falling down along with the phone can give the same results; hence, the classifier can be further trained for such scenarios. Overall, the proposed application can successfully recognize the trigger events, which play a vital role in automated sensor logging. This helps remove, through automation, the window between a user recognizing that the phone is missing and responding to that, thus helping in finding the phone.

References
1. Jin, M., He, Y., Fang, D., Chen, X., Meng, X., Xing, T.: iGuard: a real-time anti-theft system for smartphones. IEEE Trans. Mob. Comput. 17(10), 2307–2320 (2018). https://doi.org/10.1109/tmc.2018.2798618
2. Liu, X., Wagner, D., Egelman, S.: Detecting phone theft using machine learning, pp. 30–36 (2018). https://doi.org/10.1145/3209914.3209923
3. Chang, S., Lu, T., Song, H.: SmartDog: real-time detection of smartphone theft. In: IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), Chengdu, pp. 223–228 (2016). https://doi.org/10.1109/ithings-greencom-CPSCom-SmartData.2016.61
4. Pulver, A., Lyu, S.: LSTM with working memory. In: International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, pp. 845–851 (2017). https://doi.org/10.1109/ijcnn.2017.7965940
5. Uddin, M.P., Nitu, A.: A tracking application for lost/stolen android phones using face detection (2015)
6. Carrara, F., Elias, P., Sedmidubsky, J., Zezula, P.: LSTM-based real-time action detection and prediction in human motion streams. Multimedia Tools Appl. 78, 27309–27331 (2019)
7. Senyurek, V.Y., Imtiaz, M.H., Belsare, P., Tiffany, S., Sazonov, E.: A CNN-LSTM neural network for recognition of puffing in smoking episodes using wearable sensors. Biomed. Eng. Lett. 10, 195–203 (2020)

Suvarga: Promoting a Healthy Society R. L. Priya, Gayatri Patil, Gaurav Tirodkar, Yash Mate, and Nikhil Nagdev

Abstract In India, over 22% of the population is below the poverty line. This poverty pushes people onto the streets, which in the future transforms into slums. These slums, as they are not planned, lack certain necessities like electricity, sanitary services, and basic hygiene resources, making them a hub for the spread of diseases. In essence, the primary aim of this paper is to identify the leading causes of diseases in slum areas of Mumbai using data collected from IoT modules, health checkup drives, and various government authorities. With this information, the concerned civic authorities and slum residents will be alerted regarding the danger so that necessary action can be taken. This, in turn, promotes a healthier society in various slum regions of India. Keywords Internet of things (IoT) · Slum management · Sanitation · Decision tree · LSTM · Air quality index · Water quality index

R. L. Priya · G. Patil (B) · G. Tirodkar · Y. Mate · N. Nagdev Computer Department, Vivekanand Education Society’s Education of Society Chembur, Chembur, Mumbai 400074, India e-mail: [email protected] R. L. Priya e-mail: [email protected] G. Tirodkar e-mail: [email protected] Y. Mate e-mail: [email protected] N. Nagdev e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_19


1 Introduction According to the United Nations (UN 2009) estimates, only 4% of the terrestrial surface is occupied by cities [1]. Though the percentage is so low, more than half the world population stays in these cities which eventually generates a huge imbalance in the world resources as this section consumes three-quarters of the world’s natural resources. For upgrading these slums, commonly, the first action taken is to demolish the slums and reallocate the residents but since 1970 there have been multiple recommendations by authors such as Turner (1972) which suggest otherwise. This gives birth to the concept of upgrading slums and their residents to a better standard of habitation [2]. The paper uses the approach of data analysis and deep learning to have a better understanding of this approach and provide solutions for implementing the same.

2 Literature Survey Slum management has always been a major issue in a city like Mumbai [3]. Current research by the SRA maps the information of residents on a website with the help of drones, and the government has gathered demographic data using this method [4]. Although this method covers many data fields, major fields like health factors and pollution measures are ignored. Improved infrastructure can prove to be a major catalyst for achieving major sustainable goals. Taking this aspect into consideration, the UN-Habitat Opinion Survey method, which was based on the nature of social reality and the perspective of the researchers, was applied to slum residents in Africa. The analysis showed that infrastructure in Africa can be primarily developed with the help of a proper water supply, road networks, and telecommunication [4]. To understand the positive and negative implications of upgrading slums, a case study was conducted in the Moravia neighborhood of Medellin. The principles of urban design strategies and urban rehabilitation programs were identified through technical documents and qualitative and quantitative data collected through surveys at the community level [5]. A non-integrated framework was adopted to evaluate the suitability of the interior design of a low-income multipurpose apartment to provide enhanced IEQ; here, an expert opinion survey was taken, AHP-TOPSIS was performed, and the final optimized solution was generated [6]. That research included only the interior design and not other parameters like pollution, health, and other geographical aspects.


3 Proposed System 3.1 Overview To aid the health situation in the country, the model proposes a novel approach of building an Internet of things (IoT)-based intelligence system. The model will provide regular updates about the chances of epidemics, symptoms to spot those diseases, and emergency contacts of concerned doctors, along with a few home remedies. It also alerts the government authorities regarding the danger, with the aim of creating the necessary awareness among the government authorities and the slum residents.

3.2 System Architecture The system is composed of various modules such as data collection from various sources (IoT, BMC health data, water data, sanitation, survey data), preprocessing, feature extraction, training, and testing models, displaying the final output on the web and mobile application as listed below and shown in Fig. 1.

Fig. 1 System architecture of Suvarga


4 Detailed Architecture of Suvarga The detailed workflow of Suvarga as shown in Fig. 2 describes the data collection from heterogeneous sources with data preprocessing to build a better prediction model.

Fig. 2 Detailed architecture of Suvarga


Table 1 Component in air quality monitoring device

Component name   Features and description
MQ135            Gases, including NH3, NOx, alcohol, benzene, smoke, and CO2, are detected by this air quality sensor
MQ2              Combustible gas and smoke at concentrations from 300 to 10,000 ppm are detected by the semiconductor gas sensor
MQ3              This sensor is used to detect leakage of flammable gases (LPG), methane

4.2 Water Quality Monitoring System Poor water quality is a big issue, especially in slum regions. To test the water quality, the model uses the BMC water quality monitoring module. The data gives out the pH, dissolved oxygen, BOD, COD, etc. Over ten years of data is collected.

4.3 Sanitation Sanitation data from BMC gives the distribution of toilets for men and women in the Chembur region. It provides information on the number of toilets with respect to the number of people.

4.4 Algorithms Used The model was trained using algorithms like long short-term memory (LSTM) and decision trees. Later it was compared to choose the best algorithm.

4.5 Intelligent System The proposed system aims to build an intelligent system for promoting healthy living in various slum regions of India. It consists of mainly two components such as the

176

R. L. Priya et al.

prediction or analysis model and data visualization model. Such components are designed to display the analysis obtained from the analysis model and visualized in graphical formats via the web application.

4.6 Prediction Model Using the data collected from the IoT module for reading air quality parameters and others collected from various other sources via government authorities such as BMC and MPCB, an analysis algorithm is run to predict the values of air/water and correlation among various features of the dataset. LSTM algorithm is applied to data, and the correlation among features is found using Pearson’s correlation formula available in the pandas’ library.

4.7 Data Visualization The final outcome of all the analysis needs to be presented to the layman in terms understandable; hence, a user-friendly web app is built for the government authorities as well as the slum residents. Each can access multiple features of the web app like predictive analysis of air and water quality in the future, basic care, and home available remedies to prevent oneself and loved ones from epidemics.

5 Implementation 5.1 Need for Real-Time Monitoring The aggregated data provided by the government is useful in data analysis over long time but there can be mishaps. To address this problem, Suvarga has developed a network of IoT devices that would be installed in the slums. These IoT devices would act as a network, continuously monitoring the various air quality parameters.

5.2 Experimental Setup of Air Quality Monitoring Device The IoT device consists of a microcontroller called ‘NodeMCU.’ It is capable of being interfaced with gas sensors and also transmits data over a Wi-Fi network. The sensors that are interfaced with this microcontroller are MQ135, MQ3, and MQ2 sensors.

Suvarga: Promoting a Healthy Society

177

Fig. 3 Experimental setup of the air quality real-time monitoring device

Three different IoT devices are fitted at three corners of the slum area. The devices act in unison, forming a mesh and transferring data to a common device that acts as the source and forms a server. The data received over the IoT module is sent to a centralized Cayenne data visualization server, where it can be visualized in real time and plotted on a live graph. A trigger is activated when a sensor gives a value that crosses a threshold, indicating that an accident has taken place; an instant notification in the form of an SMS and email alert is sent simultaneously to the concerned government authority. With this, the government can send instant relief or take the necessary actions to pacify the toxic environment. Figure 3 shows the experimental setup of the air quality real-time monitoring device.
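As a rough illustration of the trigger logic just described, the sketch below checks incoming sensor readings against fixed limits and fires a notification callback. The sensor names, threshold values, and the notify() helper are illustrative assumptions, not the exact Cayenne configuration used here.

```python
# Hedged sketch of the threshold-trigger logic; thresholds and sensor names
# are assumed values, and notify() stands in for the SMS/email alert channel.
THRESHOLDS_PPM = {"MQ135": 400, "MQ2": 1000, "MQ3": 500}

def check_readings(readings, notify):
    """readings: dict mapping sensor name -> ppm value."""
    for sensor, value in readings.items():
        limit = THRESHOLDS_PPM.get(sensor)
        if limit is not None and value > limit:
            # A reading above the limit indicates a possible leak or accident.
            notify(f"{sensor} reading {value} ppm exceeds the limit of {limit} ppm")

# Example: print() stands in for the SMS/email notifier.
check_readings({"MQ135": 380, "MQ2": 1250, "MQ3": 120}, notify=print)
```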

5.3 Slum Health Drive Data Analysis

A slum health drive was conducted for the residents of the slum adjoining VESIT on 25 January 2020. The parameters obtained from the health drive are name, gender, age, weight, height, poverty status, toilet, drainage linked to the house, waste collection system, compost pit, source of water, washing of clothes and utensils, alcohol, diabetes, hypertension, cholesterol, level of education, Aadhaar card, authorized electricity connection, bank account, computer literacy, and source of income. Figure 4 shows the distribution of residents by blood pressure status, and Fig. 5 represents the percentage of people who were of healthy weight, overweight, or underweight.


Fig. 4 Blood pressure ratio

Fig. 5 Weight ratio

5.4 Sanitation Data Analysis

Open defecation has been an onerous issue for a while, causing a variety of health-related issues. The team of researchers collected sanitation data for the Chembur region from the government authorities through BMC offices. The dataset obtained was a CSV file encompassing various parameters including hypertension, fever, asthma, communicable diseases, etc., and comprises 172 rows (records) and 7 columns (attributes). A ward-by-ward analysis is done to check whether proper sanitation facilities exist in every ward so as not to strain the resources. The groupby method in pandas (a Python library that deals with data frames) is used to group the data by ward. To find out whether all wards have a commensurate number of toilets, a pie chart is plotted showing the distribution of toilets in the region, and a discrepancy is observed: while the number of toilets in ward number 154 soars as high as 32, the number of toilets in ward number 149 is a meager 3. The total number of toilets in the region is 2669. According to research, the recommended number of people per toilet is 100, but using the population data obtained, the actual number of persons per toilet is close to 457 (Fig. 6).
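A minimal pandas sketch of this ward-wise aggregation is given below; the file name, the column names ('ward', 'toilets'), and the population figure are assumptions used only to illustrate the groupby step, since the exact headers of the BMC CSV are not reproduced here.

```python
import pandas as pd

# Hedged sketch of the ward-wise sanitation analysis; file name and the
# 'ward'/'toilets' column names are assumed, not taken from the BMC file.
df = pd.read_csv("sanitation_chembur.csv")

toilets_per_ward = df.groupby("ward")["toilets"].sum()   # toilets in each ward
total_toilets = toilets_per_ward.sum()                    # reported as 2669 in the text

region_population = 1_200_000                             # placeholder; substitute the census figure
persons_per_toilet = region_population / total_toilets
print(persons_per_toilet)

# Pie chart of the toilet distribution across wards (requires matplotlib).
toilets_per_ward.plot.pie(title="Distribution of toilets across wards")
```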


Fig. 6 People per toilet

5.5 Health Data Analysis

The health data was obtained from hospitals in the vicinity of the area, where cases of malaria and other respiratory infections occurred sporadically. The dataset obtained from BMC offices was month-wise historical data for the years 2017 and 2018 and contained the name of the dispensary, the month under consideration, and the average levels of health-related parameters such as the total number of people suffering from asthma, malaria, URTIs, and heart diseases, to name a few. The air and water quality data encompassed historic data for the past four years, whereas the health data procured from the BMC authorities covered records from 2017 to 2018. To avoid any aberrations, air and water quality data of 2017 and 2018 are taken into account along with health data of the same two years. Correlation between all the parameters is obtained, and the results are stored in a correlation matrix. All the correlations stored in the matrix are then sorted in ascending order using quicksort. The variables having the highest correlation are shown in Fig. 7. However, the correlations above show the relation between attributes of the same table. A correlation between air quality parameters and URTIs is therefore established and, in the same way, malaria is correlated with the biological oxygen demand of the water. According to research, there is an association between upper respiratory tract infection and respirable suspended particulate matter (RSPM) [7]. An increase in particulates is detrimental to health as it causes a variety of conditions related to the respiratory tract. The analysis conducted (Fig. 8) also indicates a direct positive association between the two factors.
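The correlation step can be sketched as below; the file names and the column names ('month', 'urti_cases', 'rspm') are assumptions used only to illustrate the pandas-based Pearson correlation and sorting described above.

```python
import pandas as pd

# Hedged sketch of the correlation analysis; file and column names are assumed.
health = pd.read_csv("bmc_health_2017_2018.csv")
env = pd.read_csv("air_water_quality_2017_2018.csv")
merged = health.merge(env, on="month")

corr = merged.corr(method="pearson")          # Pearson correlation matrix

# Flatten the matrix, drop self-correlations, and sort the coefficients,
# mirroring the sorted correlation matrix described in the text.
pairs = corr.stack().reset_index()
pairs.columns = ["feature_a", "feature_b", "r"]
pairs = pairs[pairs.feature_a != pairs.feature_b].sort_values("r")

print(pairs.tail(10))                         # strongest positive associations
print(corr.loc["urti_cases", "rspm"])         # reported as ~0.55 in the text
```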

6 Results and Analysis

The comparative study of the regression algorithms shows that Decision Tree Regression achieves the lowest error rate when evaluated with the regression error metrics of mean squared error (MSE), mean absolute error (MAE), and R2 score.


Fig. 7 Correlation matrix

Fig. 8 URTI versus RSPM


Fig. 9 R2 score

The other regression algorithms tested on these metrics are Lasso Regression, Lasso Lars Regression, Bayesian Regression, and Random Forest Regression (Fig. 9).

6.1 Mean Squared Error

Mean squared error is a metric that tells how close the predicted points are to the actual points on the regression line.

$$ L = \frac{1}{N} \sum \left( \hat{Y} - Y \right)^{2} \qquad (1) $$

where L is the loss, $\hat{Y}$ the predicted output, Y the actual value, and N the number of samples. The results for MSE indicate that Lasso Lars Regression gives the maximum error of 381.74, while Decision Tree Regression gives the least MSE of 5.65. Random forests perform better than most of the algorithms except decision trees, giving an MSE of 17.19. The results are shown in Fig. 10.

6.2 Mean Absolute Error

Mean absolute error is the measure of the average absolute difference between the actual value and the predicted value.


Fig. 10 Mean squared error

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |x_i - x| \qquad (2) $$

where n is the number of error terms and $|x_i - x|$ are the absolute errors. From the chart in Fig. 11, we can infer that the MAE for decision trees is the least at 0.64, while Bayesian and Lasso Regression give the highest MAE among the five, indicating poor performance. The MAE values of the other algorithms lie between these two, as represented by the bar chart.
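The five regressors and the three metrics can be compared with scikit-learn roughly as in the sketch below; the synthetic data from make_regression is only a stand-in for the actual Suvarga features and health targets, so the printed numbers will not match the reported values.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LassoLars, BayesianRidge
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Stand-in data; the real study uses air/water/sanitation features and health targets.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Lasso": Lasso(),
    "Lasso Lars": LassoLars(),
    "Bayesian": BayesianRidge(),
}

for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    print(f"{name:14s} MSE={mean_squared_error(y_test, pred):8.2f} "
          f"MAE={mean_absolute_error(y_test, pred):6.2f} "
          f"R2={r2_score(y_test, pred):5.2f}")
```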

7 Conclusion and Inferences

The research undertaken particularly focuses on improving the health and sanitation facilities in the slums of the Chembur, Mumbai region. Harnessing the potential of artificial intelligence, data analysis, and the Internet of things (IoT), the proposed system predicts the patterns in air quality, water quality, and sanitary facilities and establishes a strong interdependence between these environmental and sanitary factors and the health of the individuals residing there. Through the sanitation data, it was found that the number of people per toilet was 456.955, which is far higher than the ideal ratio of 100 people per toilet. A correlation is also established between the number of malaria patients in the hospitals in the vicinity of the slums and the water quality index. It was concluded that URTI and RSPM have a correlation of 0.547853, which is a significant correlation.


Fig. 11 Mean absolute error

Data from the NGO health drive indicated that 57.15% of residents were either overweight or underweight. Moreover, the R2 score obtained from decision trees has a value of 0.99, indicating almost perfect prediction. The government could use the findings of this research to take appropriate actions to assuage the detrimental effects of poor well-being, unclean surroundings, and a polluted environment on the dwellers of slums.

References

1. United Nations: World Population Prospects: 2009 Revision. Population and Development Division, Department of Economic and Social Affairs (2009)
2. Ahmed, I.: Building resilience of urban slums in Dhaka, Bangladesh. Procedia Soc. Behav. Sci. 218 (2016)
3. Dikhle, S., Lakhena, R.: GIS-based slum information management system. In: 17th Esri India User Conference (2017)
4. Arimah, B.: Infrastructure as a catalyst for the prosperity of African cities. In: Urban Transitions Conference, Shanghai (2016)
5. Vilar, K., Cartes, I.: Urban design and social capital in slums. Case study: Moravia's neighborhood, Medellin (2004–2014)
6. Sarkar, A., Bardhan, R.: Improved indoor environment through an optimised ventilator and furniture positioning: a case of slum rehabilitation housing, Mumbai, India. Accepted 1 Dec 2019
7. Li, Y.R., Xiao, C.C., Li, J., Tang, J., Geng, X.Y., Cui, L.J., Zhai, J.X.: Association between air pollution and upper respiratory tract infection in hospital outpatients aged 0–14 years in Hefei, China: a time series study. Public Health 156, 92–100 (2018)

Multi-task Data Driven Modelling Based on Transfer Learned Features in Deep Learning for Biomedical Application

N. Harini, B. Ramji, V. Sowmya, Vijay Krishna Menon, E. A. Gopalakrishnan, V. V. Sajith Variyar, and K. P. Soman

Abstract Accurate automatic identification and localization of spine vertebrae points in CT scan images is crucial in medical diagnosis. This paper presents an automatic feature extraction network, based on a transfer-learned CNN, in order to handle the availability of limited samples. The 3D vertebrae centroids are identified and localized by an LSTM network, which is trained on CNN features extracted from 242 CT spine sequences. The model is further trained to estimate age and gender from the LSTM features. Thus, we present a framework that serves as a multi-task data driven model for identifying and localizing spine vertebrae points, age estimation, and gender classification. The proposed approach is compared with benchmark results obtained by testing 60 scans. The advantage of the multi-task framework is that it does not need any additional information other than the annotations on the spine images indicating the presence of vertebrae points.

Keywords Vertebrae localization · Transfer learning · Multi-task model · LSTM · CT spine volumes · Convolutional neural network

1 Introduction

Automatic identification and localization of spine vertebrae points from Computerized Tomography (CT) spine scans is challenging due to the homogeneity and symmetry of the structure of each vertebra [12]. The main goal of this work is to transfer learn from existing image networks, to overcome the limitations of limited


data samples in deep learning, adapting them for biomedical application. We also extend the model to classify gender and predict the age of the patients using the features extracted from spine CT volumes. This extension might help in analysing spine diseases for a particular time or place, even when there is no metadata. The proposed method uses transfer learning to extract the feature descriptor of each spine scan image from each spine sequence. Those feature descriptors are then fed to an LSTM network combined with dense layers, to localize the spine vertebrae for each spine sequence. The final dense layer features are used to classify the gender and to predict the age of the patient using a random forest classifier and regressor, respectively. We also report a comparative study of our results with previous techniques that have used the same dataset. Furthermore, the age prediction and gender classification are evaluated using standard cross validation on 10, 20 and 30% of the available training set. The following are the major contributions and novelty of the proposed method:

1. Limited availability of data is handled with transfer-learned feature extraction for spine CT volumes in the MICCAI 2014 challenge dataset.
2. The LSTM network is utilized to extract continuity information from feature descriptors of the spine volumes, where each feature descriptor is handled as an instance of the spine scan.
3. A novel extension to identify the age and gender of the patient.

2 Literature Survey

A good amount of spine scan images is necessary to model a machine learning problem yielding high localization accuracy. This creates challenges where the availability of data is scarce due to legal and medical bindings. Furthermore, to perform segmentation or localization tasks, annotating large volumes of data (by domain experts) is necessary, which is a tedious and challenging task. Generally, a low amount of data is dealt with using traditional augmentation techniques such as translation and rotation [9, 10] and GANs [7], as proposed for classification [5] and segmentation [2]. Though generation of data helps to improve many computer vision applications, generating spine scans comparable to the original is risky and highly challenging. So we have used another approach: transfer learning a feature extraction network, which leverages existing experience without any synthetic data generation [13]. Transfer learning approaches using pre-trained CNN networks have improved the results on various medical imaging applications, as shown in [8, 13]. Among other methods for localization of spine vertebrae, Glocker et al.'s random forest (RF) regression and Hidden Markov Model achieved benchmark results on the MICCAI 2014 Computational Challenge on Vertebrae Localization and Identification [6] dataset. It uses hand-crafted features from CT spine volumes. Chen et al. proposed three stages: coarse vertebra candidate localization, vertebrae identification using JCNN, and localization refinement with a shape regression model [3]. This method employs a binary


RF classifier, based on HOG features extracted from CT volumes, to separate vertebrae from non-vertebrae, trained using the ground truths provided by domain experts. The method succeeded in improving accuracy at the expense of complex computations. Liao et al. proposed a multi-task model that provides short- and long-range contextual information using a CNN and a bi-directional RNN (Recurrent Neural Network) [11]. Wang et al. combined Deep Stacked Sparse Auto Encoder (SSAE) contextual features and a Structured Regression Forest to identify and localize spine vertebrae [14]. This method was evaluated with the MICCAI 2014 test set and 38 local datasets, and the results were compared with previous techniques. None of these methods have used transfer-learned feature extraction in vertebrae localization and identification tasks.

3 Methodology

The overall architecture of the proposed work is shown in Fig. 1. It has three stages, where each stage serves as a feature extractor for the next stage.

Dataset Description: MICCAI 2014 conducted a challenge titled "Vertebrae Localization and Identification" which consists of 242 training and 60 testing scans of CT spine volumes. Each scan in the training and testing set is manually annotated with 3-dimensional vertebrae centroids with labels; the metadata such as age and gender of the patient is available only for the training set. Within each scan in the

Fig. 1 Overall architecture proposed for the identification and localization of vertebrae centroids and estimation of age and gender


dataset there is a varying number of CT images, between 31 and 511 grayscale images of size 512 × 512. The total vertebrae centroids available are 7 cervical, 12 thoracic, 5 lumbar and 2 sacrum.

Stage 1—Transfer Learned Feature Descriptors: The extraction of feature descriptors from the CT volumes is an essential step to train an automatic localization model. Each scan is converted to a feature vector using a pre-trained Dense network. This network is trained on the ImageNet dataset, and the final Global Average Pooling (GAP) layer is selected to extract the feature for each image in the scan volumes. Each scan has N images, represented as $\{I_n\}_{n=1}^{N}$, where each image I is converted to f(I), a transfer-learned feature descriptor of vector length 1664, since the final GAP layer of the Dense network yields the same.

Stage 2—Vertebrae Localization using LSTM layer: In the converted training set, each scan represented as f(I) for n in the range 1 to N has vertebrae centroids $V = \{C_1, C_2, C_3, \ldots, S_1, S_2\}$, where each element is a 3D point represented as (x, y, z). It consists of a total of 26 vertebrae centroids, resulting in a vector length of 78 (26 * 3), and the missing centroids are set to zero. The feature descriptors (vectors) for the training set are used to train an LSTM layer combined with a dense layer; the final fully connected regression layer gives the centroid points.

Stage 3—Age and Gender Identification: From the trained LSTM network, the resultant feature before the regression layer serves as the feature descriptor for each scan. In this module, each scan is represented as f(s), where s is the CT spine volume and f(s) is the feature extracted from a dense layer with 256 neural nodes. Those features are used to train a random forest regressor to estimate the age and a classifier to identify the gender. Random forests are ensemble machine learning algorithms which can handle both regression and classification [4].
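A minimal Keras sketch of the Stage-1 feature extractor is given below; the input size, the stacking of the grayscale slice into three channels, and the preprocessing call are assumptions about implementation details not spelled out in the text, but the GAP output of DenseNet169 does have length 1664 as stated.

```python
import numpy as np
import tensorflow as tf

# Stage-1 sketch: ImageNet-pretrained DenseNet169 with global average pooling,
# giving one 1664-dimensional descriptor per CT slice.
extractor = tf.keras.applications.DenseNet169(
    weights="imagenet", include_top=False, pooling="avg")

# Placeholder for a preprocessed CT slice: grayscale stacked to 3 channels and
# resized to 224 x 224 (the resizing choice is an assumption).
gray = np.random.rand(224, 224).astype("float32")
rgb = np.stack([gray, gray, gray], axis=-1)[np.newaxis] * 255.0
rgb = tf.keras.applications.densenet.preprocess_input(rgb)

descriptor = extractor.predict(rgb)   # shape (1, 1664)
print(descriptor.shape)
```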

4 Experiments and Results

The feature descriptors obtained by transfer learning are embeddings, which are used for vertebrae centroid identification and age and gender prediction, so a relevant metric should be chosen to evaluate the extracted features. Multiple pre-trained CNN networks are evaluated based on the cosine distance between these embeddings (which are our feature descriptors). The premise is that each scan maps to a different target vector V, so the cosine distance between images belonging to different scans should indicate divergence, while all the images of the same scan are expected to show proximity. The cosine distance between the images of a scan is computed as follows:

$$ D_{\text{same}} = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} \frac{A_i \cdot B_j}{|A_i| \times |B_j|} \qquad (1) $$

where $D_{\text{same}}$ represents the distance for a scan with N images. Similarly, the same formula is used to calculate the distance between two scans, in which A is the feature


vector of the image belonging to one scan and B is the feature vector of the image belonging to another scan. The cosine distance between two scans of N1 and N2 images, respectively, is computed as follows:

$$ D_{\text{diff}} = \frac{1}{N_1(N_2-1)} \sum_{i=1}^{N_1} \sum_{j=1, j \neq i}^{N_2} \frac{A_i \cdot B_j}{|A_i| \times |B_j|} \qquad (2) $$
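The sketch below computes these quantities with NumPy for two scans of random stand-in descriptors; the normalization mirrors Eqs. (1) and (2) as written above.

```python
import numpy as np

def mean_pairwise_cosine(A, B, same_scan):
    """Mean pairwise cosine value between row-vector sets A and B,
    following the normalization of Eqs. (1) and (2)."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    sims = A @ B.T                              # all pairwise cosine values
    if same_scan:
        np.fill_diagonal(sims, 0.0)             # drop the i == j terms
        return sims.sum() / (A.shape[0] * (A.shape[0] - 1))
    return sims.sum() / (A.shape[0] * (B.shape[0] - 1))

# Random stand-ins for 1664-dimensional DenseNet169 descriptors of two scans.
scan_a = np.random.rand(40, 1664)
scan_b = np.random.rand(55, 1664)
print("D_same:", mean_pairwise_cosine(scan_a, scan_a, same_scan=True))
print("D_diff:", mean_pairwise_cosine(scan_a, scan_b, same_scan=False))
```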

The cosine distances obtained for the experimented pre-trained networks are tabulated in Table 1. The best model is the one which has the largest difference between the estimated D_same and D_diff. D_same is estimated for all 242 scans and the mean is computed. Similarly, D_diff is estimated between the 242 scans, which produces 58,322 values ((242 * 242) − 242), and the mean is computed. In this way, Densenet 169 is selected as the Stage-1 network, giving a feature descriptor of length 1664 for each image in the scans, with which the Stage-2 network is trained to identify and localize the vertebrae centroids. The extracted feature descriptors are of varying sequence length. Each sequence contains the features of one scan, represented as a 2D matrix of size N × 1664, where N is the number of images in the scan. The maximum sequence length (511) is selected, to which all the sequences are zero padded. These sequences are fed to the LSTM network as shown in Fig. 1. Various experiments on selecting the network are performed and analysed based on the mean localization error. The results of the different experiments are tabulated in Table 2, which shows the difference between the results obtained before and after the preprocessing steps applied to the target vector. The target vector is heavily sparse, as not all the centroid points are available for all the scans, and it is scaled and transformed based on the minimum and maximum values of the centroid points. The comparative result analysis is based on the Mean Localization Error (MLoE), calculated as the mean of the distances (in mm) between each predicted vertebrae centroid and the manual annotations. From Table 2, it is evident that the LSTM cells returning the hidden state of every instance, together with the GAP layer and the preprocessing steps, improved the obtained results.

Table 1 Comparison between pre-trained networks for feature extraction based on the cosine distance

Model           Cosine distance (same)   Cosine distance (diff)   Difference in distance
Inception V3    0.246                    0.308                    0.061
Densenet 169    0.230                    0.319                    0.089
VGG 19          0.016                    0.021                    0.004
Xception        0.320                    0.406                    0.086
Resnet 50       0.036                    0.054                    0.050


Table 2 Comparison between the experiments performed on localization of vertebrae centroids

Model                                        MLoE (before preprocessing)   MLoE (after preprocessing)
LSTM(1024) + Dense(256) + Dense(78)          34.81                         17.96
LSTM(1024) + Dense(256) + GAP + Dense(78)    28.72                         16.63
LSTM(512) + Dense(256) + Dense(78)           28.04                         16.11
LSTM(512) + Dense(256) + GAP + Dense(78)     26.33                         14.71
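For concreteness, the best-performing configuration in Table 2 (LSTM(512) + Dense(256) + GAP + Dense(78)) might be assembled in Keras roughly as sketched below; the masking of zero padding, the ReLU activation and the optimizer are assumptions, not details taken from the paper.

```python
import tensorflow as tf

# Sketch of the LSTM(512) + Dense(256) + GAP + Dense(78) localization head,
# operating on zero-padded sequences of 1664-dim descriptors (max length 511).
inputs = tf.keras.Input(shape=(511, 1664))
x = tf.keras.layers.Masking(mask_value=0.0)(inputs)        # ignore the zero padding
x = tf.keras.layers.LSTM(512, return_sequences=True)(x)     # hidden state for every slice
x = tf.keras.layers.Dense(256, activation="relu")(x)        # assumed activation
x = tf.keras.layers.GlobalAveragePooling1D()(x)             # the GAP step discussed in the text
outputs = tf.keras.layers.Dense(78)(x)                      # 26 centroids x 3 coordinates

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")                  # assumed training settings
model.summary()
```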

Table 3 Comparison of proposed results with benchmark results

                      Glocker [6]                    JCNN [3]                       Proposed
Region     Counts     Id.Rate (%)   Mean    Std      Id.Rate (%)   Mean    Std      Id.Rate (%)   Mean    Std
All        657        74.04         13.20   17.83    84.76         8.82    13.04    85.60         14.71   18.95
Cervical   188        88.76         6.81    10.02    91.84         5.12    8.22     86.44         12.01   15.56
Thoracic   324        61.74         17.35   22.30    76.38         11.39   16.48    81.85         10.87   14.25
Lumbar     113        79.86         13.05   12.45    88.11         8.42    8.62     88.22         12.14   12.84

$$ V = \frac{V - \min(V)}{\max(V) - \min(V)} \qquad (3) $$

The LSTM network mentioned without a GAP layer returns only the output of the final LSTM cell, not of every cell. Extracting the information provided by every LSTM cell and averaging the features vertically (GAP) before the regression layer (Dense(78)) provides a lower localization error. Furthermore, the results obtained by the best LSTM network are compared with the previous benchmark results that used the same test set evaluation, as shown in Table 3. Though the network fails to provide localization with lower mean and standard deviation error with respect to the cervical and lumbar centroids, it gives better and more uniform identification accuracy across all types of centroids, as shown in Fig. 2a. The method provides better results than Glocker [6] without any classification of images as vertebrae or background, as presumed by JCNN [3]. The information on whether an image in the CT spine volume is a vertebra or background is not available in the challenge dataset and requires domain-expert knowledge. Chen et al. reported the variation of identification rate for all the vertebrae centroids [3], where the identification rate ranges between 30 and 100% and the thoracic region experiences a lower identification rate. In Fig. 2a and b, the identification rate and the localization error for all the centroids are shown. The proposed method has an identification rate varying between 70 and 90%, resulting in a balanced estimation of vertebrae centroids.


Fig. 2 a Identification rate of the predicted vertebrae centroids. b Localization error of the predicted vertebrae centroids

Table 4 Comparison on gender estimation between different validation splits

Validation split    10%                  20%                  30%
Train / Validation  216 / 25             192 / 49             168 / 73
Accuracy            0.64                 0.653                0.6164
F1 score            0.7272               0.7462               0.7142
Confusion matrix    [[4, 9], [0, 12]]    [[7, 13], [4, 25]]   [[10, 14], [14, 35]]

Stage 3 in the proposed model is a novel approach that extends the use of the features extracted from the Stage-2 network. The MICCAI 2014 challenge provided the meta information for every scan in the training set, and it was not utilized in any previous approach, as the main goal is to identify and localize the vertebrae centroids [1]. This extension is evaluated with different validation splits on the training set, since metadata is available only for the training set. The features obtained from the GAP layer of the LSTM network are used to estimate the gender and age. In the training set, the distribution of the data between the two genders is imbalanced, so a class-weighted model with more weightage given to the Female class is trained on different validation splits, as shown in Table 4. In all three splits, the accuracy, F1 score and confusion matrix obtained on testing the validation data are compared. The model is evidently capable of estimating the gender of the patient from the spine scans with an F1 score of 0.70. The same features used to train the gender classifier are utilized to identify the age of the patients. The age range of the patients available in the training data varies between 10 and 100, which is a wide space, so the target variable (age) is transformed into the range between 0 and 1 and trained using a random forest regressor. The results obtained by evaluating the different validation splits, as experimented in gender identification, are compared based on Mean Absolute Error (MAE) in Table 5.


Table 5 Mean absolute error (MAE) between the age ranges among different validation splits

Validation split (%)   10–20   20–30   30–40   40–50   50–60   60–70   70–80   80–90   90–100   MAE on validation
10                     0       6.22    0       2.84    1.31    5.70    4.90    0       0        5.74
20                     0       6.54    1.13    2.33    4.44    3.70    4.14    6.25    0.7      5.97
30                     0       7.74    1.23    2.46    4.60    3.64    4.22    5.98    1.10     5.59

As the estimation of the age is achieved with a mean absolute error of not more than 6, the model can evidently estimate the age from spine scans with a small difference from the actual age of the patient. The error obtained for the age ranges in the validation data is tabulated in Table 5. The error rate is lower for scans belonging to patients aged between 30 and 80, which might be due to the larger amount of training data for these age ranges.
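Stage 3 can be sketched with scikit-learn as below; the class-weight choice, the split, and the random stand-in features are illustrative assumptions (the real inputs are the 256-dimensional GAP-layer features of the trained LSTM network).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, mean_absolute_error

# Random stand-ins for the 256-dim scan descriptors, genders and ages.
rng = np.random.default_rng(0)
feats = rng.random((242, 256))
gender = rng.integers(0, 2, 242)          # 0 = female, 1 = male (assumed coding)
age = rng.uniform(10, 100, 242)

f_tr, f_va, g_tr, g_va, a_tr, a_va = train_test_split(
    feats, gender, age, test_size=0.10, random_state=0)   # the 10% split of Table 4

# Weighted classifier to counter the gender imbalance noted in the text.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(f_tr, g_tr)
print("F1:", f1_score(g_va, clf.predict(f_va)))

# Age scaled to [0, 1] before regression, as described above.
reg = RandomForestRegressor(random_state=0)
reg.fit(f_tr, (a_tr - 10) / 90.0)
pred_age = reg.predict(f_va) * 90.0 + 10
print("MAE (years):", mean_absolute_error(a_va, pred_age))
```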

5 Conclusion and Future Work

Clinical spine diagnosis and spine disease trend analysis can be assisted by a multi-task data driven model that localizes spine vertebrae centroids and identifies the age and gender using spine CT volumes. The proposed model handles the disadvantages of a limited-sample dataset through transfer-learned feature extraction, using a novel performance analysis to find the right transfer learning network. The model also performs uniformly across all types of spine vertebrae, producing identification rates between 70 and 90%. The novel extension of the model is not just limited to identifying age and gender but can also be extended to cluster the scans belonging to the same patient. This is tabled for future research, since the metadata in the challenge dataset also includes the annotation of scans belonging to the same patient. Though the proposed algorithm outperforms the benchmark results in identification rate of the vertebrae centroids, the model can be hyper-tuned to further reduce the localization error.

References

1. http://csi-workshop.weebly.com/challenges.html
2. Bowles, C., Chen, L., Guerrero, R., Bentley, P., Gunn, R., Hammers, A., Dickie, D.A., Hernández, M.V., Wardlaw, J., Rueckert, D.: GAN augmentation: augmenting training data using generative adversarial networks. arXiv preprint arXiv:1810.10863 (2018)
3. Chen, H., Shen, C., Qin, J., Ni, D., Shi, L., Cheng, J.C., Heng, P.A.: Automatic localization and identification of vertebrae in spine CT via a joint learning model with deep neural networks. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 515–522. Springer (2015)
4. Cutler, A., Cutler, D.R., Stevens, J.R.: Random forests. In: Ensemble Machine Learning, pp. 157–175. Springer (2012)
5. Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H.: GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321, 321–331 (2018)
6. Glocker, B., Zikic, D., Konukoglu, E., Haynor, D.R., Criminisi, A.: Vertebrae localization in pathological spine CT via dense classification from sparse annotations. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 262–270. Springer (2013)
7. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
8. Hon, M., Khan, N.M.: Towards Alzheimer's disease classification through transfer learning. In: 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1166–1169. IEEE (2017)
9. Hussain, Z., Gimenez, F., Yi, D., Rubin, D.: Differential data augmentation techniques for medical imaging classification tasks. In: AMIA Annual Symposium Proceedings, vol. 2017, p. 979. American Medical Informatics Association (2017)
10. Kwasigroch, A., Mikołajczyk, A., Grochowski, M.: Deep neural networks approach to skin lesions classification—a comparative analysis. In: 2017 22nd International Conference on Methods and Models in Automation and Robotics (MMAR), pp. 1069–1074. IEEE (2017)
11. Liao, H., Mesfin, A., Luo, J.: Joint vertebrae identification and localization in spinal CT images by combining short- and long-range contextual information. IEEE Trans. Med. Imaging 37(5), 1266–1275 (2018)
12. Schmidt, S., Kappes, J., Bergtholdt, M., Pekar, V., Dries, S., Bystrov, D., Schnörr, C.: Spine detection and labeling using a parts-based graphical model. In: Biennial International Conference on Information Processing in Medical Imaging, pp. 122–133. Springer (2007)
13. Van Opbroek, A., Ikram, M.A., Vernooij, M.W., De Bruijne, M.: Transfer learning improves supervised image segmentation across imaging protocols. IEEE Trans. Med. Imaging 34(5), 1018–1030 (2014)
14. Wang, X., Zhai, S., Niu, Y.: Automatic vertebrae localization and identification by combining deep SSAE contextual features and structured regression forest. J. Digit. Imaging 32(2), 336–348 (2019)

Punjabi Children Speech Recognition System Under Mismatch Conditions Using Discriminative Techniques

Harshdeep Kaur, Vivek Bhardwaj, and Virender Kadyan

Abstract It is a very difficult challenge to recognize children's speech with Automatic Speech Recognition (ASR) systems built using adult speech. In such ASR tasks, significantly deteriorated recognition efficiency is observed, as noted by several earlier studies. It is primarily related to the significant inconsistency between the two groups of speakers in their acoustic and linguistic attributes. One of the numerous causes of this mismatch is that adult and child vocal organs are of substantially different dimensions. Discriminative approaches are noted for dealing extensively with the effects arising from these differences. Specific parameter variations have been introduced, with boosted parameters and iteration values, to achieve the optimum value for the boosted maximum mutual information (bMMI) and feature-space bMMI (fbMMI) acoustic models. Experimental results demonstrate that the feature-space discriminative approaches achieve a significant reduction in the Word Error Rate (WER). It is also shown that fbMMI achieves better performance than bMMI and fMMI. Recognition of children and the elderly will need even more study if we are to examine these age groups' features in existing and future speech recognition systems.

Keywords ASR · Children speech recognition · Kaldi toolkit · Discriminative techniques · Acoustic mismatch

1 Introduction

Speech recognition devices are usually trained on adult data. While the latest speech recognition technology is not yet ideal even for adults, the task of building children's


spoken dialog applications faces far greater challenges [1]. Children are an important segment of users that will benefit from advances in multimedia technology. In multimedia games and computer instructional material, children are one of the primary potential users of computers for conversational interaction. Children are generally comfortable and happy using spoken language interfaces. In order to make ASR interfaces more interesting to interact with, it is important that they understand and adapt the language of the user to match or complement the speech of the user [2]. The topic of children's speech recognition has been gaining attention in recent years [3]. Wilpon's study [4] shows that recognition of children's speech is more difficult than that of adults. He also noticed that some formant information is missing in children's speech at telecommunications bandwidth, and he lowered the count of Linear Predictive Coding (LPC) coefficients in his recognizer to account for this trend. Another study [1] explored the utility of frequency warping to account for shifts in the frequency spectrum owing to the smaller vocal tracts of boys. Some of the mistakes were considered to be related to grammar problems. Further focus had to be given to the features related to the ASR of children. It is well recognized that young children's speech habits differ significantly from those of adults. Differences between the acoustic properties of speech in children and adults have been studied in [5]. This form of degradation has been examined for digit and phrase recognition tasks in [6]. In this paper, we discuss some aspects of our work on a Punjabi children's speech recognition system developed under mismatched acoustic conditions for continuous speech. We use discriminative techniques in our work; such discriminative training techniques are key components of current systems and a major area of speech recognition research [7]. These techniques achieved substantial improvements for small-vocabulary tasks on small datasets. Standard databases such as TIMIT and ATIS are available for foreign languages such as English, but the key obstacle in Punjabi speech research, or any other Indian language, is the lack of resources such as speech and text corpora. In this paper, the confusion among word patterns is handled on a small child training dataset using discriminative methods to enhance the separation between correct and incorrect word sequences. Apart from the introduction in Sect. 1, the rest of the paper is organized as follows. A part of the relevant work is presented in Sect. 2. Section 3 describes the theoretical background. The experimental setup is given in Sect. 4, and the experimental results are provided in Sect. 5. The work is finally summarized in Sect. 6.

2 Related Work

Li and Huang [8] proposed an auditory-based feature extraction algorithm. The authors applied Cochlear Filter Cepstral Coefficient (CFCC) features for speaker


identification to resolve the conditions of acoustic mismatch between testing and training environments. Typically, a system's output drops significantly when it is trained on clean speech and examined on noisy data. In this type of situation, CFCC performs better than the Mel Frequency Cepstral Coefficient (MFCC) baseline under three mismatched conditions, namely car noise, white noise, and babble noise. Both MFCC and CFCC work well when the data is clean, but the precision of MFCC decreases at a signal-to-noise level of 6 decibels, whereas CFCC can still achieve higher precision than MFCC. CFCC does better than PLP and RASTA under white noise and performs similarly to PLP and RASTA under car and babble noise. Giuliani and Gerosa [5] examined speech comprehension in children in the context of the phoneme identification process. They analyzed phone recognition by comparing two experimental configurations, in which children obtained a lower accuracy of 77.30% with respect to the 79.43% accuracy obtained for adult phone recognition. The outcomes for many of the child speakers were as strong as for adults. With respect to the reference system under mismatch conditions, Vocal Tract Length Normalization (VTLN) gave a relative reduction of 10.5 and 5.3% for adults and children, respectively. When they recognized children's speech with the baseline system, they obtained a low recognition performance of 58.11%; when they applied VTLN, they obtained a better recognition performance of up to 66.43% on the same dataset with the system trained on children. Das et al. [1] conducted several experiments with children's data to develop a speech recognition system for children. Using certain command-and-control data, they found a gain from frequency warping. They designed the acoustic and language models and analyzed word recognition results in different configurations, where a WER of 10.4 was achieved by the constructed method. Li and Russell [9] studied speech recognition quality in a small children's community. Grammar is proposed to be a major influence on the quality of speech recognition. Using a personalized dictionary improves the performance of the ASR, but the change is small. Quality deterioration due to poor speech, combined with degradation due to the use of telecommunications bandwidth, is proposed to account for most of the recorded differences in performance between adults and children. Lee et al. [3] published a set of temporal and acoustic parameters calculated from a recently compiled speech sample of 436 participants between 5 and 18 years of age and 56 adults. Their findings indicated that a major trend correlated with the growth of speech in normal children is the decrease in amplitude and in subject variation of both temporal and spectral acoustic parameters with age. Arunachalam et al. [2] focused on a discourse analysis of child-machine spoken language interactions. Their results indicate that, with no obvious gender differences, younger children are less likely to use polite expressions and more likely to make direct requests for information compared to older ones. Narayanan and Potamianos [10] reported results on the feasibility of creating a children's conversational framework. Speaker normalization and model adaptation were used to improve the performance of speech recognition.


Overall, the prototype was a positive first effort to build a children's multimodal program with an emphasis on conversational language. Kathania et al. [11] investigated the possibility of deliberately adjusting the pitch of children's voices to minimize the reported pitch differences between the two speaker classes. Such an explicit pitch reduction is acknowledged to give a significant improvement in recognition quality. The feasibility of the suggested methods was tested on ASR models trained on adult speech using different acoustic modelling strategies, i.e., Gaussian Mixture Models (GMM), subspace GMM, and Deep Neural Networks (DNN). It is observed that the suggested approaches were highly effective in all the modelling paradigms that were studied. Shahnawazuddin et al. [12] introduced their efforts to improve the quality of a keyword spotting system for children's speech under a limited-data scenario. They addressed two different ways of implementing prosody modification effectively. Data augmentation based on prosody modification also helps to improve quality for adult voices. The pitch of the children's test utterances is lowered, and the speaking rate is increased significantly. The performance attained by prosody adjustment is much higher than the output obtained through the use of VTLN. It is also observed that data augmentation is very effective in improving the performance of KWS for children's speech.

3 Theoretical Background

3.1 Discriminative Techniques

Discriminative training is a supervised learning approach that minimizes the mismatch between the model's labels and the recognition outcomes. This article focuses mainly on MMI, though other training methods are available. The objective function of MMI is given as

$$ F_{\mathrm{MMI}}(\lambda) = \sum_{r=1}^{R} \log \frac{p_{\lambda}(\{x_t\}_r \mid H_{s_r})^{K} \, p_{L}(s_r)}{\sum_{s} p_{\lambda}(\{x_t\}_r \mid H_s)^{K} \, p_{L}(s)} \qquad (1) $$

where R is the number of training utterances and $\{x_t\}_r$ is the feature sequence of the r-th utterance. The acoustic model parameters are optimized with the generalized Baum-Welch algorithm. $H_{s_r}$ and $H_s$ are the HMM sequences for the correct transcription $s_r$ and for a recognition hypothesis $s$, respectively. $p_{\lambda}$ is the acoustic model likelihood, K is the acoustic scale, and $p_L$ is the language model probability. Utterances involving multiple errors need to be treated more intensively, and reliability is improved by taking phoneme accuracies into account.


$$ F_{\mathrm{bMMI}}(\lambda) = \sum_{r=1}^{R} \log \frac{p_{\lambda}(\{x_t\}_r \mid H_{s_r})^{K} \, p_{L}(s_r)}{\sum_{s} p_{\lambda}(\{x_t\}_r \mid H_s)^{K} \, p_{L}(s) \, e^{-b A(s, s_r)}} \qquad (2) $$

where $A(s, s_r)$ is the phoneme accuracy of hypothesis s with respect to the reference $s_r$, and the boosting factor b > 0 controls the strength of the boosting. We compare MMI and bMMI performance with maximum likelihood (ML) performance.
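As a small numerical illustration of Eq. (2), the sketch below computes the bMMI log-ratio for one utterance from toy scores; the scaled log-likelihoods, language-model log-probabilities and phoneme accuracies are made-up numbers, and index 0 is taken to be the reference path.

```python
import numpy as np

def bmmi_utterance(loglik_scaled, loglm, acc, b=0.25):
    """bMMI contribution of one utterance (Eq. 2) given, for each lattice
    hypothesis s: K-scaled acoustic log-likelihood, LM log-probability and
    phoneme accuracy A(s, s_r). Hypothesis 0 is the reference s_r."""
    den_terms = loglik_scaled + loglm - b * acc        # log of p^K * p_L * exp(-b*A)
    numerator = loglik_scaled[0] + loglm[0]            # reference-path term
    denominator = np.logaddexp.reduce(den_terms)       # log of the summed denominator
    return numerator - denominator

# Toy values for three hypotheses (reference first).
print(bmmi_utterance(np.array([-100.0, -98.0, -101.0]),
                     np.array([-5.0, -6.0, -4.0]),
                     np.array([30.0, 27.0, 25.0])))
```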

3.1.1 Discriminative Feature Transformation

Aside from discriminative model training, a feature transformation based on the discriminative training criteria is also useful. The present approach estimates a matrix M which projects high-dimensional non-linear features to low-dimensional transformed features, as given in Eq. (3):

$$ y_t = x_t + M h_t \qquad (3) $$

where $x_t$ is the original K-dimensional feature, $h_t$ is the non-aligned L-dimensional feature, and $y_t$ is the transformed feature. The dimensions of matrix M are K × L. We verify the utility of fMMI and its boosted feature-space extension (fbMMI). The features are constructed in the same manner in both of these, but the training objective function differs. After substituting y of Eq. (3) for x, we get the objective function of fMMI from Eqs. (1) and (2):

$$ F_{f\mathrm{MMI}}(M) = \sum_{r=1}^{R} \log \frac{p_{\lambda}(\{y_t\}_r \mid H_{s_r})^{K} \, p_{L}(s_r)}{\sum_{s} p_{\lambda}(\{y_t\}_r \mid H_s)^{K} \, p_{L}(s)} \qquad (4) $$

The derivative of the objective function F with respect to M is

$$ \frac{\partial F}{\partial M} = \left[ \frac{\partial F}{\partial y_1} \cdots \frac{\partial F}{\partial y_{T_f}} \right] \left[ h_1 \cdots h_{T_f} \right]^{T} \qquad (5) $$

where T denotes the transpose and $T_f$ is the total number of data frames. The objective function of fbMMI is constructed similarly. The optimal matrix M is obtained by gradient descent. An N-component GMM is obtained by clustering the Gaussians of the original triphone acoustic models into N components and re-estimating their parameters to generate the features. The non-aligned features $h_t$ are computed as

$$ h_t = \left[ p_{t,n} \frac{x_{t,1} - \mu_{n,1}}{\sigma_{n,1}}, \; \ldots, \; p_{t,n} \frac{x_{t,K} - \mu_{n,K}}{\sigma_{n,K}}, \; \alpha p_{t,n} \right]^{T} \qquad (6) $$

where $\mu_{n,i}$ and $\sigma_{n,i}$ are the mean and variance of dimension i of the n-th Gaussian component, and α is a scaling factor. For each frame, $p_{t,n}$ is the posterior of Gaussian component n, computed such that all posteriors except the Q-best


are set to zero. This is done to minimize the computational cost by making sure that $h_t$ is sparse.

4 Experimental Setup

The experiments were performed to confirm the efficiency of discriminative sequence training. The analysis was carried out on the Punjabi children's speech corpus under mismatch conditions. Separate systems were trained using the two speech corpora (child/adult). The first is a child speech recognition system; the corpus and results are given in [13]. The second system was trained on the adult speech corpus and tested on the child speech corpus. The adult speech corpus consists of a total of 3353 utterances. Both datasets were used to analyze the performance of discriminative techniques under mismatch conditions. One recognition system for the mismatch condition was trained on 3353 utterances spoken by adults and tested on 1440 utterances spoken by child speakers. The other recognition system was trained on 3859 utterances spoken by adult and child speakers and tested on 1653 utterances spoken by child speakers. All children's and adult speech recordings were sampled at 16,000 Hz. The acoustic models were trained on these databases, and a corresponding 5k language model was used. An input speech signal was analyzed using the 13 standard MFCC + Delta + Double Delta coefficients to produce acoustic features. Linear Discriminant Analysis (LDA) was later applied to the derived acoustic features and improved training on the limited-vocabulary dataset substantially. The 13 MFCC coefficients, spliced over neighbouring frames, resulted in 117 dimensions, which were further reduced to 40 dimensions through LDA. These features were then used with HMM state alignments based on the triphone layout. Using high-dimensional acoustic features alone degraded the efficiency of the model and made it harder to adapt the feature space; therefore, MLLT, which uses the state-conditional covariances of the combined feature space, was used to transform the feature space. Feature-space maximum likelihood linear regression was applied to achieve a substantial improvement and was therefore helpful for speaker adaptation.
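A rough Python sketch of this front-end (13 MFCCs with deltas, frame splicing to 117 dimensions, then LDA to 40) is shown below; the librosa parameters, the splicing context and the use of scikit-learn's LDA with placeholder labels are assumptions standing in for the Kaldi implementation actually used.

```python
import numpy as np
import librosa
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Placeholder 5-second 16 kHz signal standing in for a Punjabi utterance.
y = np.random.randn(5 * 16000).astype(np.float32)

# 13 MFCCs per frame; delta and double-delta give the 39-dim baseline features.
mfcc = librosa.feature.mfcc(y=y, sr=16000, n_mfcc=13)            # (13, T)
base = np.vstack([mfcc, librosa.feature.delta(mfcc),
                  librosa.feature.delta(mfcc, order=2)])          # (39, T)

# Splice the 13 raw MFCCs over a +/-4 frame context: 13 x 9 = 117 dimensions.
frames = mfcc.T                                                   # (T, 13)
spliced = np.concatenate(
    [np.roll(frames, s, axis=0) for s in range(-4, 5)], axis=1)   # (T, 117)

# LDA to 40 dimensions; real labels are triphone-state alignments, replaced
# here by a cyclic placeholder so the example runs end to end.
labels = np.arange(len(spliced)) % 48
lda = LinearDiscriminantAnalysis(n_components=40)
reduced = lda.fit_transform(spliced, labels)
print(base.shape, reduced.shape)
```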

5 Experimental Results

Discriminative training is performed in both feature and model space using a variant of the MMI criterion called boosted (or margin-based) MMI (bMMI). The objective function used is a modification of bMMI which uses a frame-based, state-based loss function instead of a phone-based accuracy measure. Result analysis was conducted in this segment utilizing four variants of the MMI discriminative approach, MMI, bMMI, fMMI, and fbMMI, to obtain the following results:


Table 1 Word error rate (WER%) using the MMI and boosted MMI models at boost value 0.25, and fMMI and fbMMI at iteration value 3

Dataset (train/test)    MMI      bMMI     fMMI     fbMMI
Adult/child             63.72    61.62    55.25    53.49
Adult, child/child      21.32    20.23    17.93    16.06

• Variation in error rate with the MMI boost value.
• Variation in error rate with the number of iterations in fbMMI.

The system was developed using the Kaldi toolkit [14] and was built on the Ubuntu operating system, which is a Linux platform. System recognition efficiency was evaluated in terms of WER. The results for child speech recognition are given in [13]. Additionally, the output of the recognition system trained on mixed data from the adult and child corpora is shown in Table 1.
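The WER metric itself can be computed with a standard word-level edit-distance recursion, as in the short sketch below (this is the generic definition, not a Kaldi-specific scoring script).

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return 100.0 * d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the child speaks punjabi", "the child speak punjabi"))  # 25.0
```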

5.1 Experimental Results with Varying Boosted Parameter Values

The primary step in the discriminative training sequence is the generation of lattices for the numerator and denominator. It was found that the number of lattices also appears to decrease with the forced alignment of the transcriptions. We adapted Kaldi's MMI recipes to fit our data. Four iterations (by default) of stochastic gradient descent are used for the MMI process. Further, we used the boosting factor with MMI to robustly boost the likelihood of paths with more errors. bMMI performs better than MMI due to the refined constant learning rate of 0.00001 and I-smoothing as regularization, as shown in Table 1. Moreover, the boosting factor was investigated with values varied over [0.25, 0.5, 0.75, 1.0, 2.0], and the system obtained the lowest WER at a bMMI boost value of 0.25.

5.2 Experimental Results with Varying Number of Iteration Values

Before starting feature-space discriminative training, 250 Gaussians with a silence weight of 0.5 were employed to train the diagonal Gaussian mixture. A total of eight iterations, with a boost value of 0.25 and a learning rate of 0.001, were employed for training feature-space bMMI. The denominator states are thus used, and the lattice is re-scored on all eight iterations,


leading to the feature transformation needed for robust discriminative training in the feature space. The obtained WER shows that the system achieved its best output at fbMMI iteration value 3, as shown in Table 1.

6 Conclusion and Future Work

This paper presents a Punjabi-language speech recognition system for children's speech under mismatch conditions. The experiments were repeated for the child and adult speech corpora described in Sect. 4. Discriminative techniques were explored for both matched and mismatched training and testing conditions. The framework presented was developed using one kind of discriminative technique and its variants, i.e., MMI, bMMI, fMMI, and fbMMI. It was shown that significant improvements were achieved by using discriminative training methods for limited-vocabulary tasks on small datasets. Recognition efficiency declines significantly under mismatched conditions; the WER for mismatched conditions is considerably higher than for matched conditions. The primary cause of the loss of performance is the inconsistency between the mismatched speech data. The boosting factor was investigated with different values, and the system obtained the lowest WER at a bMMI boost value of 0.25. The obtained WER shows that the system achieved its best output at fbMMI iteration value 3. From the results presented in this paper, fbMMI appears to be a more promising technique than MMI, bMMI, and fMMI. In mismatched speech recognition, acoustic properties of the speech signal such as pitch, formant frequencies, fundamental frequency, and speaking rate play an important role in achieving good performance. There are many differences in the acoustic properties of children's and adults' speech signals. So, in the future, by enhancing the pitch and acoustic properties of the children's speech signal, the performance of the Punjabi children's speech recognition system can be increased.

References

1. Das, S., Nix, D., Picheny, M.: Improvements in children's speech recognition performance. In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP'98, vol. 1, pp. 433–436. IEEE (1998)
2. Arunachalam, S., Gould, D., Andersen, E., Byrd, D., Narayanan, S.: Politeness and frustration language in child-machine interactions. In: Seventh European Conference on Speech Communication and Technology (2001)
3. Lee, S., Potamianos, A., Narayanan, S.: Acoustics of children's speech: developmental changes of temporal and spectral parameters. J. Acoust. Soc. Am. 105(3), 1455–1468 (1999)
4. Wilpon, J.G., Jacobsen, C.N.: A study of speech recognition for children and the elderly. In: 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, vol. 1, pp. 349–352. IEEE (1996)
5. Giuliani, D., Gerosa, M.: Investigating recognition of children's speech. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'03), vol. 2, pp. II-137. IEEE (2003)
6. Potamianos, A., Narayanan, S., Lee, S.: Automatic speech recognition for children. In: Fifth European Conference on Speech Communication and Technology (1997)
7. Heigold, G., Ney, H., Schluter, R., Wiesler, S.: Discriminative training for automatic speech recognition: modeling, criteria, optimization, implementation, and performance. IEEE Signal Process. Mag. 29(6), 58–69 (2012)
8. Li, Q., Huang, Y.: An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions. IEEE Trans. Audio Speech Lang. Process. 19(6), 1791–1801 (2010)
9. Li, Q., Russell, M.J.: An analysis of the causes of increased error rates in children's speech recognition. In: Seventh International Conference on Spoken Language Processing (2002)
10. Narayanan, S., Potamianos, A.: Creating conversational interfaces for children. IEEE Trans. Speech Audio Process. 10(2), 65–78 (2002)
11. Kathania, H.K., Ahmad, W., Shahnawazuddin, S., Samaddar, A.B.: Explicit pitch mapping for improved children's speech recognition. Circ. Syst. Sig. Process. 37(5), 2021–2044 (2018)
12. Shahnawazuddin, S., Maity, K., Pradhan, G.: Improving the performance of keyword spotting system for children's speech through prosody modification. Digit. Signal Proc. 86, 11–18 (2019)
13. Kaur, H., Kadyan, V.: Feature space discriminatively trained Punjabi children speech recognition system using Kaldi toolkit. Available at SSRN 3565906 (2020)
14. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Silovsky, J.: The Kaldi speech recognition toolkit. In: IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society (2011)

Effective Irrigation Management System for Agriculture Using Machine Learning

S. T. Patil, M. S. Bhosale, and R. M. Kamble

Abstract Farming of different crops is a major income source for farmers in India, but there are many factors that affect the farming business. One of the important factors is an efficient water supply for the crop. The work in this paper proposes an effective irrigation system that helps to increase the productivity of crops by regulating the water requirement of the crop with the help of a machine learning approach. Images of different farming lands are studied and classified depending upon the soil type and its properties, such as the water requirement in different conditions. Image processing is applied to images of the land to understand the current soil condition. This phase is followed by the application of decision trees and random forests to decide whether water is required or not. If the answer is yes, then linear regression is used to calculate the duration of water flow.

Keywords Classification · Regression · Decision tree · Random forest · Agriculture · Image processing

1 Introduction

Agriculture is an important sector of the Indian economy as it contributed 17–18% of the country's total GDP (2018). Indian farmers grow different crops in different parts of the country, depending upon the weather conditions of the region and the soil type. In addition to the weather conditions and the properties of the soil, effective watering of the crop is one of the important aspects of farming. Dr. Vibha Dhawan in [1]


explained that good seeds and fertilizers fail to achieve their full potential if crops are not watered optimally. The freshwater requirement of industry and agriculture is increasing rapidly, but uncertainty in rainfall, limitations of reservoirs, and low groundwater levels are a big threat to the farming/agriculture sector. To cope with this situation and to make better and more efficient use of the existing water, we need an effective irrigation management system. Sugarcane is a major crop in southern Maharashtra, and a proper percentage of water and moisture in the soil/land yields better productivity. Sugarcane is a very water-intensive crop as it requires a large quantity of water. Currently, sugarcane farmers do not have any economical resource/equipment to guide them towards effective and efficient watering of the farm. They guess the water requirement based on their past experience and flood the water into the farm, which leads to huge wastage of water. Some farmers use sprinklers to minimize water wastage. In recent years, some low-cost sensors have been developed to detect the water content of soil [Development of low-cost sensor]. In [determination of soil moisture], Praveen et al. designed IoT-based sensors for an irrigation system, but they do not suggest what quantity of water is required to maintain the required moisture. The work presented in this paper estimates the water requirement of the crop based on a given image of the soil using a machine learning approach. In the next section, a literature survey is given. The third section explains the methods used for data collection. The fourth section explains the methodology. The fifth section presents the result analysis and includes numerical comparisons of these methods. The last section gives concluding comments.

2 Literature Survey dos Santos [2] presented the relation between soil and moisture with the help of images taken by a digital camera and adjusted for white balance. In his study, he derived an equation relating the moisture present in the soil to the captured image, using data provided by the Federal University of Vicosa. Fitton [3] studied farming in Africa, China, Europe, and Asia and observed that proper watering can increase productivity; he also explained how water affects the productive capacity of land. Too much or too little water decreases productivity, whereas the proper amount increases it. Ashok [4] studied how captured images can be used to detect and treat different plant diseases: after an image is captured, some preprocessing is applied, and the resulting histogram is used to classify or detect the disease on the plant. This study is helpful for detecting disease from an image. Khan [5] used a decision tree method to estimate how much water is available in a region and also tried to estimate the soil moisture, showing how data mining can be used for estimating regional water availability. This estimation helps in obtaining a good crop.


Dhawan [1] presented a detailed analysis of water and agriculture in India. One outcome of that research is that water efficiency in the country should be increased by making the best use of available technologies.

3 Dataset For our study, we selected eight pieces of land of the same size, differing in water-holding capacity. Information about each piece of land was collected in the form of four digital images, one from each corner, together with physical soil samples. The collected soil samples were analyzed for moisture content and other properties. The wind flow rate and the temperature of the day are also important factors that affect soil moisture, so while collecting soil samples, the wind flow rate and temperature of the region were recorded and stored with the images. We then studied the total quantity of water irrigated for sugarcane over the crop period and measured the final sugarcane yield. For the experimental study, we prepared a dataset of images taken twice a day from every piece of land for six months of the sugarcane crop cycle, excluding the rainy season. In total, we collected nearly 11,520 images, 5760 soil samples, and 1440 temperature and wind flow readings. We considered the standard moisture content expected to be present in the soil to increase productivity at every stage of the crop.

4 Proposed System Architecture and Methodology The practical workflow of our system is shown in Fig. 1. First, digital images were captured at a height of 1 foot from the surface of the soil. The captured images are then taken as input to our system. Since the final analysis depends on the captured images, they are preprocessed to remove unwanted parts. The images are stored in RGB format at a size of 256 × 256 and also in an 8 × 8 representation for further processing. The images are classified into three classes depending on the water requirement of the soil: low, moderate, and high. From this classification [5], we can also obtain the current moisture in the soil. The images are then provided to decision trees [4], whose output is fed to a random forest [4], which decides whether water is required for the soil or not. If irrigation is required, linear regression is used to decide how much water is required. The following equation is used to calculate the water need of the crop:

W = EM − CM    (1)

Fig. 1 System architecture (image acquisition → image preprocessing → image classification → random forest → irrigation required / not required → linear regression: how much water is required; the dataset feeds the classification stage)

where W is the quantity of water required, EM is the expected moisture in the soil, and CM is the current moisture in the soil. The expected moisture (EM) depends on the selected sugarcane variety, the sowing date, the current date, and the soil type; this information was obtained from government offices and agriculture departments of the region. The time for which water should be irrigated is calculated as

T = W / F    (2)

where T is the time in minutes and F is the flow rate of water from the irrigation system installed in the field. We also calculate the total water consumption (TWC) for the month using

TWC = Σ (F × T)    (3)


The algorithm is as follows.
Input: Images of the soil.
Output: Quantity of water required, expressed as an irrigation time.
Step 1: Capture the image.
Step 2: Preprocess the image.
Step 3: Find the class of the image and obtain the values of EM and CM.
Step 4: Run the random forest.
Step 5: If the output of step 4 is no, go to step 8.
Step 6: Input EM, CM, current temperature, and wind flow rate to the linear regression.
Step 7: Obtain T from the linear regression.
Step 8: End.
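As an illustration of how the above steps and Eqs. (1)–(3) could be wired together, the following Python sketch uses scikit-learn; the toy image features, the randomly generated training data, and the flow-rate value are placeholders and not the authors' actual dataset or trained models.

# Illustrative sketch of Steps 1-8 and Eqs. (1)-(3); features, training data
# and the flow rate F are hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def image_features(image_rgb):
    # Toy features: per-channel mean and standard deviation of the soil image.
    return np.concatenate([image_rgb.mean(axis=(0, 1)), image_rgb.std(axis=(0, 1))])

# Placeholder training data standing in for the labelled field dataset.
X_toy = rng.random((200, 6))
y_need = rng.integers(0, 2, 200)                  # 1 = irrigation required
y_minutes = rng.random(200) * 60                  # observed watering durations

rf_irrigation = RandomForestClassifier(n_estimators=50).fit(X_toy, y_need)
lr_duration = LinearRegression().fit(rng.random((200, 4)), y_minutes)

def irrigation_decision(image_rgb, em, cm, temperature, wind, flow_rate):
    """Classify the soil image, decide, then estimate the watering time T."""
    x = image_features(image_rgb).reshape(1, -1)
    if rf_irrigation.predict(x)[0] == 0:          # Steps 4-5: irrigation not required
        return False, 0.0, 0.0
    w = em - cm                                   # Eq. (1): water requirement
    baseline_minutes = w / flow_rate              # Eq. (2): T = W / F
    # Steps 6-7 refine T with a regression over (EM, CM, temperature, wind).
    regression_minutes = float(lr_duration.predict([[em, cm, temperature, wind]])[0])
    return True, baseline_minutes, regression_minutes

def total_water_consumption(flow_rate, durations):
    return sum(flow_rate * t for t in durations)  # Eq. (3): TWC = sum(F * T)

required, t_formula, t_model = irrigation_decision(
    rng.random((256, 256, 3)), em=32.0, cm=21.0,
    temperature=30.0, wind=4.0, flow_rate=0.5)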

5 Experimental Results Initially, we checked the performance of the developed system. For this, the collected data were divided into training data and test data. The training data were used to train the model, the model was then tested on the test data, and the results are shown below. The average accuracy of the model is approximately 84% (Fig. 2). For the experimental evaluation, we primarily focused on the number of units of electricity and the water consumed for one crop cycle of sugarcane, and we also recorded the total production. We studied the sugarcane crop cycle for three years: for the first two years, we observed normal farming of the crop, and for the third-year crop cycle, we implemented our proposed system. The experimental results for electricity consumption are given below. We also considered the distance from the water resource and combined the results according to this distance.

Fig. 2 Performance measure of the model


Considering the electricity consumption, the results clearly show that after the system was implemented, the electricity consumption was reduced compared with the previous two years. This is because the system specifies a time duration for which water is allowed to flow, which was not considered previously (Figs. 3, 4, 5, and 6). Looking at the water consumption, it is likewise clearly lower than in previous years after the implementation of the system (Fig. 7). The effective irrigation system supplies the proper amount of water, which keeps the soil moisture at the required level and thereby increases the productivity of the land (Fig. 8).

Fig. 3 Electrical consumption in number of units (low water requirement)

Fig. 4 Electrical consumption in number of units (moderate water requirement)


Fig. 5 Electrical consumption in number of units (high water requirement)

Fig. 6 Electrical consumption in number of units (combine)

6 Conclusion In India, a highly water-intensive crop like sugarcane is one of the major crops grown by farmers. Water plays an important role in the growth of sugarcane, and its optimal use gives better growth and better productivity. Machine learning techniques can be used effectively to achieve this. The results make it clear that our system decreases water consumption and electricity consumption while increasing crop productivity.


Fig. 7 Water consumption in number of units (combine)

Fig. 8 Sugarcane productivity (combine)

In the future, an IoT system can be implemented for starting and stopping the irrigation system. The system can also be extended to guide farmers in selecting the type of sugarcane and other crops.


References
1. Dhawan, V.: Water and agriculture in India. In: Background Paper for the South Asia Expert Panel During the Global Forum for Food and Agriculture (GFFA) (2017)
2. dos Santos, J.F.C.: Use of digital images to estimate soil moisture. Sci. Direct (2016)
3. Fitton, N.: Global Environment Change. Elsevier, Amsterdam (2019)
4. Ashok, J.M.: Agricultural plant disease detection and its treatment using image processing. IJSRD (2015)
5. Khan, S.A.: An approach to predict soil nutrients and efficient irrigation for agriculture with spatial data mining. IJSRD (2015)
6. Aruna, D.D.: A survey on different disease and image processing techniques in sugarcane crops. IJSRD (2016)
7. Balew, A.: The Egyptian Journal of Remote Sensing and Space Science. Elsevier, Amsterdam (2020)
8. Sneht, S.H.: Land use land cover change detection of Gulbarga City using remote sensing and GIS. IJSRD (2014)
9. BenDor, E.: Using imaging spectroscopy to study soil properties. Remote Sens. Environ. https://doi.org/10.1016/j.rse.2008.09.019
10. Tomar, M.: Development of low cost soil moisture sensor. IEEE ViTECoN (2019)
11. Barapatre, P., Patel, J.: Determination of soil moisture using various sensors for irrigation water management. IJITEE 8 (2019)
12. Wang, W., Liu, K.: Remote sensing image-based analysis of the urban heat island effect in Shenzhen, China. Elsevier Book 110 (2019)
13. Peng, J., Jia, J.: Seasonal contrast of the dominant factors for spatial distribution of land surface temperature in urban areas. Elsevier Book 215 (2018)

IoT-Based Smart Irrigation System Mithilesh Kumar Pandey, Deepak Garg, Neeraj Kumar Agrahari, and Shivam Singh

Abstract An IoT-based smart irrigation system is used to automate farming. It can control the quantity of water that flows at desired intervals and maintain the desired humidity and soil moisture level for crop protection and improvement, saving farmers' time. In this irrigation system, all tasks are performed automatically using these technologies, and all processes can be handled through a mobile phone. Sensors sense the field and inform the microcontroller of changes in moisture and temperature. The microcontroller then reads the parameters measured by the sensors and transfers them to the server and to users through the MQTT protocol, which is fast, so the user can easily access all the information on a mobile device. In this paper, we review this area and its related applications in depth and, based on this review, highlight all the relevant features and functions. Keywords IoT · Raspberry · Sensor · Cloud computing · MQTT protocol

1 Introduction The Internet of Things is the concept of connecting any device (so long as it has an on/off switch) to the Internet and to other connected devices. The IoT is a huge network of connected things and people, all of which gather and share data about the way they are used and about the environment around them.

M. K. Pandey · D. Garg · N. K. Agrahari (B) · S. Singh Department of Computer Application, National Institute of Technology Kurukshetra, Kurukshetra, India e-mail: [email protected]
M. K. Pandey e-mail: [email protected]
D. Garg e-mail: [email protected]
S. Singh e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_23


Irrigation is the process of providing water to plants at the desired intervals. Irrigation allows us to grow crops and re-vegetate disturbed soils in arid areas. It delivers water at the proper time, in the proper amount, and at the right location in the field, which plays an essential role in plant growth. Controlling water remotely is a difficult task; the control becomes even harder during water shortages, which may otherwise damage the crop. By using sensors such as moisture and rain sensors, the water delivered for irrigation can be managed effectively by analyzing the condition of the soil and the weather. Soil moisture sensors accurately measure the soil moisture, and based on that data the field is irrigated automatically with much less human intervention. Smart irrigation, that is, irrigating in a modern way, offers a variety of strategies that can be adopted so that the yield grows and production increases.

2 Related Work In [1], cloud computing, wireless sensors, UAVs, and communication technologies are used, and various IoT-based systems are presented with respect to farm applications. In [2], Message Queuing Telemetry Transport (MQTT) is used to communicate with devices; it saves power and is therefore inexpensive, and it requires much less human intervention and less maintenance for agricultural fields. In [3], an automated irrigation system monitors and maintains the desired soil moisture through automatic watering; it uses an ATMEGA328P microcontroller and soil moisture sensors, and the sensor readings are transmitted to a ThingSpeak channel to produce graphs for analysis. In [4], threshold values for climatic conditions such as humidity, temperature, and moisture are tested; the system also senses the intrusion of animals and delivers alerts via SMS directly to the farmer's mobile phone using a GSM module.

3 The Necessity to Use the Cloud Setting up a wireless sensor network (WSN) environment requires attention to many things, such as suitable operating systems, adequate RAM, and storage. Additionally, the host PC must run continuously for long periods. A virtual platform based on the cloud is needed to satisfy these requirements. The Sensefinity Machinates cloud platform, besides storing all the measurements acquired from the WSN, is also responsible for identifying data sources and performing data validation, partitioning, and processing; the latter includes running the irrigation algorithm to detect whenever the vegetation needs irrigation [5].


Fig. 1 Principle of cloud computing for IoT

Advantages of using cloud storage:
1. Data can be accessed from anywhere.
2. Hardware requirements and costs are reduced.
3. The security of the measurements improves (Fig. 1).

4 Benefits
1. Saves money and water.
2. Reduces the need to carry and store water.
3. Makes garden maintenance convenient and easy.

5 Different Features
The main features are the following:
1. Easy to install and operate.


2. A custom timetable for watering can be designed.
3. Water is used efficiently.
4. The system is handled through a mobile phone application.

6 Literature Review In [1], several of these factors are taken into consideration, and the role of various technologies, particularly IoT, in making the farm smarter and more efficient to meet future expectations is presented; accordingly, cloud computing, UAVs, wireless sensors, and communication technologies are explained in detail [1]. In [2], a system requiring much less human intervention and minimal maintenance for agricultural land is proposed, which can be used easily by all farmers; time is saved and water is used adequately without waste. MQTT is used, which can communicate with various devices; its low bandwidth and low power consumption make the proposed system cost effective [2]. The framework in [3] also enables real-time remote tracking of the current environmental state of the field, and modern technology can be incorporated to lower the cost [3]. The system in [4] generates an irrigation timetable based on real-time data sensed from the field and data from a weather repository, and it can advise farmers on whether or not there is a need for irrigation [4]. When farms are irrigated appropriately, crop yield improves, so [6] designed a smart farming tool based on IoT together with a moisture sensor [6]. In [7], the threshold voltages for the sensors are chosen from long histories of temperature and soil moisture values; the threshold values can differ depending on the crop and plantation, and in the future a device-reading algorithm can be introduced to process the data and reduce hardware complexity [7]. In [8], a channel is created with an open-source IoT platform to store and display the soil moisture data and also to control the irrigation over the Internet [8]. Traditional techniques take longer and waste the available water at a higher rate, which leads to using more water than required [9]. The system proposed in [5] pursues the integration of such systems with the strengths offered by cloud computing and can be applied to rural applications [5]. All of these systems feature sensor designs aimed at power efficiency, cost efficiency, maintainability, and ease of use [10].


Fig. 2 Raspberry Pi 4

7 Hardware Description 7.1 Raspberry Pi 4—Model B The Raspberry Pi is like a small PC: it is lightweight and has an ARM processor. It provides an HDMI port, a Wi-Fi module, USB ports, and an Ethernet port. The Raspberry Pi can run operating systems such as Raspbian, Kali Linux, Snappy Ubuntu, and Arch Linux ARM. It has no HDD or SSD, but a micro SD card can be inserted into the Raspberry Pi to boot its operating system (Fig. 2).

7.2 Software Used To implement the system, software must first be installed on the Raspberry Pi 4. Two programs and one operating system need to be downloaded: the first program is Win32 Disk Imager, the second is SD Card Formatter, and the operating system for the Raspberry Pi is Raspbian. The programming language used on the Raspberry Pi is Python.


Fig. 3 Primary circuit diagram of a fixed regulated power supply

7.3 Power Supply Every digital circuit requires a regulated power supply. Here, we describe how a regulated supply is obtained from the mains (Fig. 3).

7.4 Data Acquisition System An advanced data acquisition system is selected on a single chip that integrates analog and digital hardware. The MCP3208, shown in Fig. 4, is the ADC IC that converts analog signals to 12-bit digital values. It is programmable for either single-ended or differential pair inputs. Its differential nonlinearity is ±1 LSB, its integral nonlinearity is ±1 LSB, and it uses a successive approximation register (SAR) architecture. The sample-and-hold capacitor acquires the signal for 1.5 clock cycles, starting at the fourth rising edge of the serial clock, and the chip converts the sampled value into 12 bits at a rate of up to 100 ksps. Fig. 4 Data acquisition chip


Fig. 5 Soil moisture sensor

Fig. 6 Temperature sensor

7.5 Soil Moisture Sensor The precision soil moisture sensor, shown in Fig. 5, consists of probes that are inserted into the soil. When current flows through the probes, the resistance between them varies with the amount of moisture in the soil, so the current conducted changes with the soil condition. This variable resistance is the criterion used to estimate the fraction of moisture in the soil.

7.6 Temperature Sensor (LM35) The LM35 series are precision integrated-circuit temperature sensors, shown in Fig. 6, whose output voltage is linearly proportional to the Centigrade temperature.
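To make the sensor chain concrete, the following Python sketch shows how a Raspberry Pi could read the soil-moisture probe and the LM35 through the MCP3208 over SPI using the spidev library; the channel assignments, reference voltage, and moisture calibration are assumptions, not values given in the paper.

# Minimal sketch: reading MCP3208 channels from a Raspberry Pi with spidev.
# Channel wiring, V_REF and the moisture calibration are assumed values.
import time
import spidev

spi = spidev.SpiDev()
spi.open(0, 0)                   # SPI bus 0, chip-select 0
spi.max_speed_hz = 1_000_000

V_REF = 3.3                      # assumed ADC reference voltage
SOIL_CH, TEMP_CH = 0, 1          # assumed sensor channels

def read_adc(channel):
    """Read one 12-bit sample (0-4095) from an MCP3208 channel."""
    cmd = [0b00000110 | (channel >> 2), (channel & 0b11) << 6, 0]
    reply = spi.xfer2(cmd)
    return ((reply[1] & 0x0F) << 8) | reply[2]

def lm35_celsius(raw):
    # LM35 output is 10 mV per degree Celsius.
    return (raw * V_REF / 4096.0) * 100.0

def soil_moisture_percent(raw):
    # Crude linear mapping from the raw reading to a moisture fraction;
    # real probes need calibration against known wet/dry readings.
    return 100.0 * (1.0 - raw / 4095.0)

while True:
    print("moisture %.1f %%  temperature %.1f C" %
          (soil_moisture_percent(read_adc(SOIL_CH)),
           lm35_celsius(read_adc(TEMP_CH))))
    time.sleep(5)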

7.7 Buzzer A buzzer, shown in Fig. 7, is used in the proposed system to provide a notification that the water pump has been turned ON or OFF. A buzzer is an audio signalling device that produces its sound mechanically, electromechanically, or piezoelectrically.

Fig. 7 Buzzer

8 Block Diagram See Fig. 8. Fig. 8 Flow chart of the process used


Fig. 9 Image of output

9 Result
• Figure 9 shows the output of MQTT clients receiving parameter values from different sensors.
• Using the MQTT protocol, all sensor parameters are transmitted to the clients.
• If "Crop/node" is used as the MQTT topic, then multiple clients subscribed to the same topic can receive information from different publishers placed in different areas of the field.
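A small sketch, assuming a hypothetical broker host and JSON payload, of how readings could be published to and received from the "Crop/node" topic with the paho-mqtt Python client (client creation shown in the paho-mqtt 1.x style; newer versions also take a CallbackAPIVersion argument).

# Publishing and receiving sensor readings over MQTT with paho-mqtt.
# Broker address, topic layout and payload format are assumptions.
import json
import paho.mqtt.client as mqtt

BROKER = "broker.example.com"    # hypothetical broker host
TOPIC = "Crop/node"

def on_message(client, userdata, msg):
    reading = json.loads(msg.payload.decode())
    print("received from", msg.topic, reading)

# Subscriber side (e.g. the farmer's application).
subscriber = mqtt.Client()
subscriber.on_message = on_message
subscriber.connect(BROKER, 1883)
subscriber.subscribe(TOPIC, qos=1)
subscriber.loop_start()

# Publisher side (e.g. the Raspberry Pi node in the field).
publisher = mqtt.Client()
publisher.connect(BROKER, 1883)
payload = json.dumps({"moisture": 41.2, "temperature": 29.5, "pump": "ON"})
# retain=True lets a client that connects later still see the last reading,
# matching the behaviour described in the conclusion.
publisher.publish(TOPIC, payload, qos=1, retain=True)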

10 Conclusion An IoT-based smart irrigation system reduces human supervision, water usage, and the labor associated with normal irrigation processes. Using simple electronic parts, such a smart irrigation system can be built at low cost. To avoid wasting water and to use it efficiently, a smart irrigation system is very important; it can also increase the production of fruits and vegetables. In all the processes of a smart irrigation system, the MQTT protocol plays the most important role, providing fast transmission of information. A benefit of the MQTT protocol is that when clients are out of range of the node network, the information is still sent, and whenever clients come back into range and connect to that node network, they can see the information that was sent earlier.


References
1. Ayaz, M., Ammad-Uddin, M., Sharif, Z., Mansour, A., Aggoune, E.-H.M.: Internet-of-Things (IoT)-based smart agriculture: toward making the fields talk. IEEE Access (2019)
2. Islam, M.M., Hossain, M.S., Reza, R.K., Nath, A.: IoT based automated solar irrigation system using MQTT protocol in Charandeep Chakaria. IEEE (2019)
3. Dokhande, A., Bomble, C., Patil, R., Khandekar, P., Dhone, N., Gode, C.: A review paper on IoT based smart irrigation system. IJSRCSEIT (2019)
4. Sushanth, G., Sujatha, S.: IoT based smart agriculture system. IEEE (2018)
5. Saraf, S.B., Gawali, D.H.: IoT based smart irrigation monitoring and controlling system. IEEE (2017)
6. Mishra, D., Khan, A., Tiwari, R., Upadhay, S.: Automated irrigation system—IoT based approach. IEEE (2018)
7. Nageswara Rao, R., Sridhar, B.: IoT based smart crop-field monitoring and automation irrigation system. IEEE (2018)
8. Benyezza, H., Bouhedda, M., Djellou, K.: Smart irrigation system based on ThingSpeak and Arduino. IEEE (2018)
9. Pernapati, K.: IoT based low-cost smart irrigation system. IEEE (2018)
10. Vaishali, S., Suraj, S., Vignesh, G., Dhivya, S., Udhayakumar, S.: Mobile integrated smart irrigation management and monitoring system using IoT. IEEE (2017)

A Hybrid Approach for Region-Based Medical Image Compression with Nature-Inspired Optimization Algorithm S. Saravanan and D. Sujitha Juliet

Abstract Medical modalities generate a massive amount of digital volumetric data for analyzing and diagnosing medical problems. To preserve data quality while reducing storage space, compression proves to be an efficient methodology in the digital world. Medical images represent body features for diagnostic purposes and therefore need to be compressed without a loss of information. Reducing redundancies and representing the data more compactly over the region of interest of an image addresses this problem. The proposed methodology uses a region-based active contour method driven by the bat algorithm to segment an image into a region of interest and a non-region of interest. The region of interest is compressed by a lossless integer-based Karhunen-Loeve transform, while the non-region of interest is compressed by a lossy Karhunen-Loeve transform. The results suggest that the proposed method improves the region segmentation of a medical image, which leads to a high PSNR, high SSIM, and a high-quality compressed image. Keywords Medical image compression · Region of interest · Nature-inspired algorithm · BAT

1 Introduction Medical image compression is an efficient enabler of image archiving in hospitals. Computed tomography, magnetic resonance imaging, ultrasound, electrocardiography, X-ray, and mammography are among the popular modalities that generate vast amounts of medical data. Compression techniques are able to process substantial volumetric

S. Saravanan (B) · D. S. Juliet Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India e-mail: [email protected] D. S. Juliet e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_24


data to achieve rapid interactivity, context-oriented images, and quantitative analysis [1]. Lossy, lossless, and hybrid methods are the main categories of compression techniques. Representing a medical information source without significantly degrading its quality is achieved through lossless compression. Two crucial factors involved in image compression are redundancy and irrelevance: removing duplicate information from the image is called redundancy reduction, whereas discarding image information that goes unnoticed is called irrelevance reduction. A hybrid compression method applies two or more algorithms to an image to achieve the best visual quality output. JPEG and JPEG2000 [2] are well-known standards; in their lossless modes the output image retains the quality of the input image, which characterizes a lossless compression algorithm. DCT and DWT are the most extensively used transforms in image compression models. Transform coding and predictive coding play significant roles in lossless compression; predictive coders such as JPEG-LS and CALIC use a single predictor to achieve lossless compression. In this article, we propose a region-based medical image compression model in Sect. 3, where the region of interest and non-region of interest are segmented using a region-based active contour method, further driven by the BAT optimization algorithm. The region of interest is processed with the integer-based KL transform (IKLT), which realizes an integer-to-integer mapping by factorization to achieve lossless image compression, while the non-region of interest is compressed with the KL transform. Results are analyzed in Sect. 4 against the existing compression methods IDTT [3] and IDWT [4].

2 Related Works In telemedicine, a medical image compression algorithm is essential to overcome the problems of storage space and transmission time. Image compression is broadly classified into lossy and lossless techniques, while hybrid compression combines two or more algorithms to achieve a high-quality compressed image. Region-based medical image compression has become an efficient methodology built around the selection of a region of interest. DCT, DWT, and KLT are the most widely used transforms in image compression. As medical data need to be compressed without a loss of quality, lossless compression schemes such as JPEG-2000, JPEG-LS, and CALIC have been developed. Image segmentation plays a key role in region-based compression, using techniques such as thresholding, edge detection, anatomical region separation, and diseased-area separation with K-means [5], fuzzy c-means [6], active contour methods [7], and so on. Multilevel thresholding [8] has been proposed for achieving higher PSNR and SSIM metrics. Active contour methods have also proved efficient in separating the region of interest from the non-region of interest in medical images. The advantages of active contour methods over classical segmentation methods are sub-pixel accuracy at object edges and a simple formulation for


energy minimization, together with the ability to produce smooth, closed contour results. Comparing the types of active contour methods, namely the edge-based model [9] and the region-based model [10], the region-based active contour method proves to be efficient when compared with the Chan-Vese method [11]. Metaheuristic algorithms are furthermore used to drive a segmentation algorithm toward an optimized output. Particle swarm optimization applied to segmentation has been shown to be efficient for compressing images with high quality, and bat optimization with a fuzzy encoding method proves efficient when compared with traditional transform-based compression models. Using an optimization technique to tune the active contour method can therefore yield an efficient segmented image output. An integer-based transform [12] achieves a lossless image compression model, as shown in [3], whereas quantizing the transformed output to nearby values results in a lossy compression technique. Integer-to-integer conversion is achieved through the integer-based transform model to produce a lossless compression technique.

3 Proposed Method The findings of the survey indicate that a region-based compression model is an efficient way to obtain a losslessly preserved region of interest. In this section, image segmentation is performed with a region-based active contour method, which is optimized using the BAT algorithm. The segmented region of interest is fed into the integer-based KL transform for decorrelation, and the non-region of interest is compressed using the KL transform, as illustrated in Fig. 1.

3.1 Region-Based Active Contour Segmentation In order to segment the region of interest and non-region of interest from the medical database, an active contour method of segmentation is utilized. Compared with classic segmentation methods such as edge detection [13], thresholding, and region growing, the active contour method attains sub-pixel accuracy at region boundaries and is easy to formulate in the energy minimization context.

Fig. 1 Flowchart of the proposed methodology (input medical image → region-based active contour method driven by the BAT algorithm → ROI compressed with IKLT and non-ROI compressed with KLT → compressed image)


As a result, it achieves smooth and closed contours. The region-based active contour method implemented in this article aims to identify each region of interest using a region descriptor that guides the movement of the active contour; it works based on intensity homogeneity [14]. The article [14] overcomes the homogeneity assumption problem of region-based active contours, and that formulation is adopted in the proposed method. The energy to be minimized is

ε = ∫_Ω Σ_{i=1}^{N} ( ∫_{Ω_i} K(y − x) |I(x) − b(y) c_i|^2 dx ) dy    (1)

In this notation, J is the true image and I the observed image, Ω is the image domain, and c_1, …, c_N are N distinct constant values on the disjoint regions Ω_1, …, Ω_N; segmentation results from minimizing the energy function in Eq. (1). Here b denotes the intensity-inhomogeneity (bias) component, and K is a kernel function chosen as a truncated Gaussian, as in [14]. A metaheuristic algorithm is combined with the active contour method to obtain the optimal segmented region of the medical data; the BAT algorithm [15] is used to adaptively select the external energy weights and escape the local minima of the classical contour method. BAT optimization is a metaheuristic algorithm based on the echolocation behaviour of natural bats. Each bat i has a frequency f_i, position x_i, and velocity v_i, initialized together with a loudness A_i and pulse rate r_i. Over the iterations, the velocity and position of the bats are updated as

f_i = f_min + (f_max − f_min) β    (2)

v_i^j(t) = v_i^j(t − 1) + (x̂^j − x_i^j(t − 1)) f_i    (3)

x_i^j(t) = x_i^j(t − 1) + v_i^j(t)    (4)

where f_min and f_max are the minimum and maximum frequencies, β is a random number drawn from the interval [0, 1], and x̂ is the current global best position. Random walks around the best solution are created using Eq. (5), and Eqs. (6) and (7) are used to update the loudness and pulse rates:

x_new = x_old + ε A(t)    (5)

A_i(t + 1) = α A_i(t)    (6)

r_i(t + 1) = r_i(0) (1 − e^{−γ t})    (7)


where α and γ are constants, A(t) denotes the average loudness of all bats at time t, and the strength and direction of the random walk are controlled by the random variable ε ∈ [−1, 1]. Figure 2 compares the region-based active contour method with the existing Chan-Vese active contour method and shows that the region-based segmentation works more effectively. Table 1 summarizes the BAT optimization algorithm.

Fig. 2 Segmentation results using Chan-Vese contour method and region-based active contour method

Table 1 BAT optimization algorithm

Objective function f(x), where x = (x_1, …, x_n)
Step 1: Initialize the position x_i, velocity v_i, and pulse frequency f_i for the bat population
Step 2: Initialize the pulse rates r_i and loudness A_i; set the maximum number of iterations T
Step 3: For each iteration t in T, repeat steps 4 to 16
Step 4: For each bat b_i, repeat steps 5 to 14
Step 5: Create new solutions using Eqs. (2)–(4)
Step 6: IF (rand > r_i)
Step 7: Choose a solution among the best solutions
Step 8: Create a local solution around the best solution
Step 9: END IF
Step 10: IF (rand < A_i) and (f(x_i) < f(Global_Best))
Step 11: Accept the new solution
Step 12: Increase r_i and reduce A_i
Step 13: END IF
Step 14: END FOR
Step 15: Rank the bats and find Global_Best
Step 16: END FOR
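A compact Python sketch of the standard bat algorithm of [15], following Eqs. (2)–(7) and the steps in Table 1; the sphere objective and the parameter values are illustrative only, whereas in the paper the fitness would score the active-contour segmentation.

# Minimal bat algorithm following Eqs. (2)-(7) and Table 1.
# The sphere objective and the parameter values are illustrative only.
import numpy as np

def bat_algorithm(objective, dim=2, n_bats=20, iterations=100,
                  f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_bats, dim))        # positions x_i
    v = np.zeros((n_bats, dim))                  # velocities v_i
    A = np.ones(n_bats)                          # loudness A_i
    r0 = rng.uniform(0, 1, n_bats)               # initial pulse rates r_i(0)
    r = r0.copy()
    fitness = np.array([objective(xi) for xi in x])
    best = x[fitness.argmin()].copy()

    for t in range(1, iterations + 1):
        for i in range(n_bats):
            beta = rng.random()
            f = f_min + (f_max - f_min) * beta            # Eq. (2)
            v[i] = v[i] + (best - x[i]) * f               # Eq. (3)
            candidate = x[i] + v[i]                       # Eq. (4)
            if rng.random() > r[i]:                       # local random walk
                candidate = best + 0.01 * rng.uniform(-1, 1, dim) * A.mean()  # Eq. (5)
            f_new = objective(candidate)
            if rng.random() < A[i] and f_new < fitness[i]:
                x[i], fitness[i] = candidate, f_new       # accept the new solution
                A[i] *= alpha                             # Eq. (6)
                r[i] = r0[i] * (1 - np.exp(-gamma * t))   # Eq. (7)
            if f_new < objective(best):
                best = candidate.copy()
    return best, objective(best)

best, value = bat_algorithm(lambda z: float(np.sum(z ** 2)))  # toy objective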


3.2 Integer Karhunen-Loeve Transform The KLT is a linear transformation that removes redundancy by decorrelating the data, which makes compression effective; however, rounding its real-valued output leads to a lossy compression model. To achieve lossless compression, an integer-based KLT (IKLT) is used, which applies a matrix factorization that produces integer outputs. The main advantages of the IKLT over the KLT are lossless conversion, strong energy compaction, an efficient lifting scheme, and improved linearity; moreover, no data are lost during compression and decompression with the IKLT. The factorization process [12] decomposes the matrix of eigenvectors into four matrices P, L, U, and S, where P is a permutation (pivoting) matrix, L and S are lower triangular elementary reversible matrices (TERMs), and U is an upper TERM. Equation (8) denotes the integer-to-integer version of the transform matrix A of an image. The non-region of interest is compressed by the linear KL transform, which results in lossy compression.

Ã : Z^N → Z^N,   Ã = P L̃ Ũ S̃    (8)
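The numpy/scipy sketch below illustrates the KLT decorrelation step and a permutation/lower/upper factorization of the transform matrix in the spirit of Eq. (8); it does not reproduce the exact reversible integer (TERM) factorization of [12], and the 8 × 8 image blocks used here are synthetic.

# KLT decorrelation of image blocks plus a PLU factorisation of the transform
# matrix. This only illustrates the idea behind Eq. (8); the exact reversible
# integer (TERM) factorisation of [12] is not reproduced here.
import numpy as np
from scipy.linalg import lu

def klt_basis(blocks):
    """blocks: (n_blocks, block_size) array of flattened image blocks."""
    centred = blocks - blocks.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]           # strongest components first
    return eigvecs[:, order]                    # KLT transform matrix A

rng = np.random.default_rng(1)
roi_blocks = rng.integers(0, 256, (500, 64)).astype(float)   # toy 8x8 blocks

A = klt_basis(roi_blocks)
coefficients = roi_blocks @ A                   # decorrelated coefficients
reconstructed = coefficients @ A.T              # A is orthogonal, so A^-1 = A^T

# Factorising A into permutation/lower/upper factors, in the spirit of
# A~ = P L~ U~ S~ from Eq. (8); here it is an ordinary LU decomposition.
P, L, U = lu(A)
print(np.allclose(P @ L @ U, A), np.allclose(reconstructed, roi_blocks))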

4 Results and Discussion After combining the compressed region of interest and non-region of interest, performance metrics such as peak signal-to-noise ratio, mean square error, compression ratio, and SSIM are used for evaluation. The sample input images considered for evaluation and the output compressed images obtained are illustrated in Fig. 3.

PSNR = 10 log_10(255^2 / MSE)    (9)

Peak signal-to-noise ratio is a parameter for assessing the quality of the compressed image and is defined in Eq. (9). The mean square error is defined in Eq. (10), and the compression ratio is the size of the input image divided by the size of the compressed image, as given in Eq. (11).

MSE = (1/N) Σ_i Σ_j (f(x, y) − F(x, y))^2    (10)

Compression Ratio (CR) = (size of the original image) / (size of the compressed image)    (11)

Structural similarity (SSIM) is an important metric used to measure the similarity between the input and output images. The equation for SSIM is defined in Eq. (12).


Fig. 3 Input images with compressed images using proposed methodology

SSIM = [(2 μ_x μ_y + C_1)(2 σ_{xy} + C_2)] / [(μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2)]    (12)
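Equations (9)–(12) can be computed directly in numpy; the sketch below assumes 8-bit grayscale images and, for brevity, computes SSIM from global statistics rather than the usual sliding-window form.

# Quality metrics from Eqs. (9)-(12) for 8-bit grayscale images. SSIM here is
# a global-statistics version; practical evaluations usually use a windowed
# SSIM (e.g. skimage.metrics.structural_similarity).
import numpy as np

def mse(original, compressed):                        # Eq. (10)
    diff = original.astype(float) - compressed.astype(float)
    return np.mean(diff ** 2)

def psnr(original, compressed):                       # Eq. (9)
    return 10.0 * np.log10(255.0 ** 2 / mse(original, compressed))

def compression_ratio(original_bytes, compressed_bytes):   # Eq. (11)
    return original_bytes / compressed_bytes

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):   # Eq. (12)
    x, y = x.astype(float), y.astype(float)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))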

The proposed method is compared with existing algorithms, namely the integer-based DWT [4] and the integer-based DTT [3], and the results are analyzed in Table 2. The proposed method outperforms the existing compression algorithms. The SSIM values in Table 2 show that the proposed method is able to recover a high-quality output image, with a highest similarity of 0.9998. Bold values in Table 2 indicate the highest value achieved in comparison with the other algorithms, and they show that the region-based compression technique works efficiently on medical images in terms of separating the region of interest from the non-region of interest. The method also achieves a high-quality compressed image with a PSNR of up to about 43 dB. Figure 4 presents the PSNR analysis.


Table 2 Comparison of the proposed method with other existing algorithms for image compression

Images | Methodology | PSNR | CR | MSE | SSIM
Image 1 | IDWT | 40.14 | 4.10 | 2.19 | 0.9971
Image 1 | IDTT | 42.01 | 4.49 | 2.53 | 0.9975
Image 1 | IKLT (proposed) | 42.39 | 4.58 | 2.03 | 0.9998
Image 2 | IDWT | 40.64 | 3.91 | 2.16 | 0.9972
Image 2 | IDTT | 40.16 | 4.18 | 2.20 | 0.9979
Image 2 | IKLT (proposed) | 40.86 | 4.29 | 2.09 | 0.9996
Image 3 | IDWT | 40.26 | 4.40 | 2.17 | 0.9991
Image 3 | IDTT | 41.10 | 4.80 | 2.09 | 0.9996
Image 3 | IKLT (proposed) | 41.40 | 4.93 | 1.90 | 0.9997
Image 4 | IDWT | 41.92 | 4.25 | 2.48 | 0.9971
Image 4 | IDTT | 42.07 | 4.72 | 2.60 | 0.9979
Image 4 | IKLT (proposed) | 43.21 | 4.91 | 2.32 | 0.9997
Image 5 | IDWT | 41.20 | 3.72 | 2.40 | 0.9989
Image 5 | IDTT | 42.81 | 4.04 | 2.15 | 0.9994
Image 5 | IKLT (proposed) | 42.13 | 4.12 | 2.23 | 0.9998

Fig. 4 Comparison of PSNR achieved using proposed and existing algorithms

5 Conclusion Region-based medical image compression is achieved with high PSNR and CR values through a region-based active contour method optimized by the metaheuristic BAT algorithm. The integer-based KL transform is used to decorrelate the image and produce the compressed region of interest through an integer-to-integer factorization, while the KL transform compresses the non-region of interest; the combination of the two compressed parts achieves a high similarity index. Thus, an efficient region-based image compression model is obtained that achieves a high-fidelity compressed image.


References
1. Gonzalez, R.C., Woods, R.E., Masters, B.R.: Digital Image Processing, Third Edition. J. Biomed. Opt. 14(2), 029901 (2009). https://doi.org/10.1117/1.3115362
2. Skodras, A., Christopoulos, C., Ebrahimi, T.: The JPEG 2000 still image compression standard. IEEE Signal Process. Mag. 18(5), 36–58 (2001). https://doi.org/10.1109/79.952804
3. Xiao, B., Lu, G., Zhang, Y., Li, W., Wang, G.: Lossless image compression based on integer Discrete Tchebichef Transform. Neurocomputing 214, 587–593 (2016). https://doi.org/10.1016/j.neucom.2016.06.050
4. Nagendran, R., Vasuki, A.: Hyperspectral image compression using hybrid transform with different wavelet-based transform coding. Int. J. Wavelets Multiresolut. Inf. Process. 17(2), 1–21 (2019). https://doi.org/10.1142/s021969131941008x
5. Chen, X., Zhou, Y., Luo, Q.: A hybrid monkey search algorithm for clustering analysis. Sci. World J. 2014 (2014). https://doi.org/10.1155/2014/938239
6. Vincent, C.S., Janet, J.: An enhanced N-pattern hybrid technique for medical images in telemedicine. Procedia Comput. Sci. 79, 305–313 (2016). https://doi.org/10.1016/j.procs.2016.03.040
7. Palanivelu, L.M., Vijayakumar, P.: Effective image segmentation using particle swarm optimization for image compression in multi application smart cards. In: Proceedings of the World Congress on Information and Communication Technologies, WICT 2011, pp. 535–539 (2011). https://doi.org/10.1109/wict.2011.6141302
8. Horng, M.H.: Multilevel thresholding selection based on the artificial bee colony algorithm for image segmentation. Expert Syst. Appl. 38(11), 13785–13791 (2011). https://doi.org/10.1016/j.eswa.2011.04.180
9. Xie, W., Li, Y., Ma, Y.: PCNN-based level set method of automatic mammographic image segmentation. Optik (Stuttg) 127(4), 1644–1650 (2016). https://doi.org/10.1016/j.ijleo.2015.09.250
10. Zuo, Z., Lan, X., Deng, L., Yao, S., Wang, X.: An improved medical image compression technique with lossless region of interest. Optik—Int. J. Light Electron Opt. 126(21), 2825–2831 (2015). https://doi.org/10.1016/j.ijleo.2015.07.005
11. Mandal, D., Chatterjee, A., Maitra, M.: Robust medical image segmentation using particle swarm optimization aided level set based global fitting energy active contour approach. Eng. Appl. Artif. Intell. 35, 199–214 (2014). https://doi.org/10.1016/j.engappai.2014.07.001
12. Hao, P., Shi, Q.: Matrix factorizations for reversible integer mapping. IEEE Trans. Signal Process. 49(10), 2314–2324 (2001). https://doi.org/10.1109/78.950787
13. Kiran, R., Kamargaonkar, C.: Region separation techniques for medical. 1314–1325 (2016). https://doi.org/10.15680/ijirset.2016.0502021
14. Li, C., Huang, R., Ding, Z., Gatenby, J.C., Metaxas, D.N., Gore, J.C.: A level set method for image segmentation in the presence of intensity inhomogeneities with application to MRI. IEEE Trans. Image Process. 20(7), 2007–2016 (2011). https://doi.org/10.1109/TIP.2011.2146190
15. Yang, X.S.: A new metaheuristic bat-inspired algorithm. Stud. Comput. Intell. 284, 65–74 (2010). https://doi.org/10.1007/978-3-642-12538-6_6

Attention Mechanism-Based News Sentiment Analyzer Sweta Kaman

Abstract Sentiment analysis is the task of determining the feeling or opinion expressed in a chunk of text. One of its crucial applications is to classify news articles into three fundamental sentiment categories: negative, positive, and neutral. By considering the sentiment of a news article, we can figure out whether the writer's sentiment is negatively or positively oriented. Multiple models have been constructed to analyze news articles, but none of them operates at both the sentence level and the level of the whole document. In this paper, I propose a method that performs this task at both levels with accurate results, using an LSTM network and a deep learning framework, namely an attention network. Keywords Sentiment analyzer · News web crawler · Attention mechanism · Deep learning · Text classification · Semeval 2016 · LSTM · NLP

1 Introduction The global news media, a very important source of information linked to all of our lives, is associated with numerous biased and unbiased press groups, according to [1]. We would like to believe that these news sources, which influence each one of us in many different ways, deliver true stories to the public and not sugar-coated ones. However, some news channels, press groups, and websites are corrupt or dedicated to a political party and intentionally publish fake news, hate speech, and similar content that creates an alarming and disturbing environment. They are so heavily engaged in publishing negative stories that they have forgotten their role in society, namely enlightening the citizens of a country with reality and spreading positivity and hope. An attention mechanism [2] allows a neural network to pay attention to only a specific part of an input sentence while generating a translation, much like a human translator. The task of sentiment analysis [3], together with the attention mechanism, will help us to identify such websites and news sources

S. Kaman (B) Department of Science of Intelligence, IIT Jodhpur, Karwar, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_25


by paying attention to each of the negative words present in an article and uncovering its intentions, which ultimately keeps us from following such sources and helps sustain an environment filled with positivity. The goal of the model proposed in this paper is to analyze the sentiment of an article and classify it as negative, positive, or neutral with high accuracy. The structure of this paper is as follows. Section 2 discusses the pros and cons of existing methods; Sect. 3 describes the dataset used in the project and the method for preparing the train and test datasets; Sect. 4 elaborates the proposed methodology step by step; Sect. 5 presents the experimental results and predictions of the proposed model; and Sects. 6 and 7 conclude the paper with the research accomplishments and future work of this project.

2 Related Work Analyzing the sentiment of news articles has been a trend for the last few years, and numerous models have been proposed to perform this task with high accuracy. One method was proposed by Lim et al. [4], in which the authors used a machine learning technique to predict the opinion of articles with business headlines. The task has also been varied by classifying text into two categories, i.e., "good news and bad news," as proposed by Alvarez et al. [5], who focused only on positive and negative sentiments. In my opinion, however, analysis at both the sentence and the document level, covering all three sentiments, is important when classifying an article; this is not included in the existing models, and the methodology for it is described in the next sections of this paper.

3 Data Preparation 3.1 Train Dataset SemEval-2016 Task 4 [6] is used as the training data for my model, consisting of a combination of the training and additional datasets. The training dataset alone lacked the necessary amount of data to train the model, so I assembled all the additional data files and training files into a single file of 53,368 sentences, since a larger corpus generally yields higher accuracy. For the training data, I explicitly constructed three basic sentiment columns corresponding to each training sentence, named negative, positive, and neutral. The value in each of these columns denotes the presence or absence of the corresponding sentiment tag: if the value under the neutral column is 1, then the sentiment


Fig. 1 An instance of training data (Source Jupyter Notebook)

of the sentence is neutral; if the value is 0, it is not, and similarly for the other columns (see Fig. 1).
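A small pandas sketch of how the three indicator columns could be built from a labelled sentence file; the file name, separator, and label strings are assumptions based on the description above.

# Building the negative/positive/neutral indicator columns described above.
# File name, separator and label values are assumed for illustration.
import pandas as pd

train = pd.read_csv("semeval2016_train.tsv", sep="\t",
                    names=["id", "label", "sentence"])
for sentiment in ["negative", "positive", "neutral"]:
    train[sentiment] = (train["label"] == sentiment).astype(int)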

3.2 Test Dataset BeautifulSoup [7], a Python library that parses XML and HTML files, is used in the proposed model to create a news crawler. The input to the crawler is a news article from any news source, and the output is a set of preprocessed, clean, tokenized sentences from the article. These sentences are compiled together with the whole article itself to construct the test data: if the number of sentences in the article is n, then the (n + 1)-th row of the file contains the article itself, for document-level analysis. Three columns are explicitly generated as described in Sect. 3.1, but here the values are initialized to 0.
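A minimal sketch of such a crawler with requests, BeautifulSoup, and NLTK; the article URL and page structure are hypothetical, and real news sites may need site-specific parsing.

# Sketch of the news crawler: fetch an article page, extract paragraph text
# with BeautifulSoup, split it into sentences, and append the full article as
# the final row. The URL and page structure are hypothetical.
import requests
from bs4 import BeautifulSoup
import nltk

nltk.download("punkt", quiet=True)

def crawl_article(url):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    article = " ".join(paragraphs)
    sentences = nltk.sent_tokenize(article)
    rows = sentences + [article]            # row n+1 holds the whole article
    return [{"text": t, "negative": 0, "positive": 0, "neutral": 0} for t in rows]

test_rows = crawl_article("https://example.com/some-news-article")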

4 Proposed Methodology After performing some basic preprocessing on the train and test datasets, GloVe embeddings [8] (global vectors for word representation), developed by a group of Stanford researchers, are used. They capture the global properties of the corpus, unlike word embeddings such as Word2Vec [9], which consider only local context. GloVe uses the co-occurrence (count) matrix, which helps extract the semantic relationships between words and predict words that are semantically consistent with the words around them. This also reduces the dimension of the word vectors; the GloVe embeddings used in this model are 300-dimensional. The text is then converted to sequences and padded, after which the shape of the training data is (53,368, 150) and the shape of the final test data is (310, 150). After splitting the training data into training and validation sets with a ratio of 80:20, the data are fed into a neural network that uses an attention mechanism and an LSTM layer, which help decide what type of output to generate. The activation function used is relu, the optimizer used to minimize the loss is rmsprop, and the loss function is binary_crossentropy. A brief summary of the network is displayed in Fig. 2.
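A minimal Keras sketch matching this description (sequence length 150, 300-dimensional embeddings, an LSTM encoder, a simple additive attention layer, and relu/rmsprop/binary_crossentropy); the vocabulary size, LSTM width, and exact attention formulation are assumptions, since they are not reported in the text, and the GloVe weight matrix would be supplied to the Embedding layer in practice.

# Minimal Keras sketch of the LSTM + attention classifier described above.
# Vocabulary size, LSTM width and the attention formulation are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

MAX_LEN, EMB_DIM, VOCAB = 150, 300, 50_000      # VOCAB is a placeholder

inputs = tf.keras.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB, EMB_DIM)(inputs)    # GloVe weights would go here
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)

# Simple additive attention: score each timestep, softmax over time, then
# take the attention-weighted sum of the LSTM states.
scores = layers.Dense(1, activation="tanh")(h)
weights = layers.Softmax(axis=1)(scores)
context = layers.Dot(axes=1)([weights, h])
context = layers.Flatten()(context)

dense = layers.Dense(64, activation="relu")(context)
outputs = layers.Dense(3, activation="sigmoid")(dense)   # negative/positive/neutral

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, validation_split=0.2, batch_size=256, epochs=25)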


Fig. 2 Summary of the model (Source Jupyter Notebook)

5 Results The model described in Sect. 4 was trained with a batch size of 256 for 25 epochs. After training, the model achieved an accuracy of 0.9186 and a loss of 0.1885, as shown in Fig. 3. The validation loss and accuracy are 0.1948 and 0.9257, respectively.

5.1 Final Predictions The final output consists of predicted scores for the candidate sentiments corresponding to each row of the test dataset. There are 310 rows in total, of which the last row (ID 309) represents the whole article and the remaining rows are its sentences. The three sentiment columns contain predicted scores, and the highest of the three decides the final sentiment of the row. The last row of the output is the document-level prediction for the news article; its highest score is 0.999970, under the column "neutral," as illustrated in the figure. This affirms that

Fig. 3 Statistics after training the model


Fig. 4 Final predictions at sentence and document level (Source Jupyter Notebook)

the document-level sentiment is neutral. The rest of the rows in the final output are the sentence-level predictions for the test dataset, as shown in Fig. 4.

6 Conclusions This project successfully contributes to the task of analyzing the sentiment of news sources. The three elementary sentiments, i.e., neutral, negative, and positive, are predicted with an accuracy of 0.9186 by using an attention mechanism at both the sentence and document level. The model is not limited to predicting sentiments for news articles; it can be further modified and applied to numerous other fields of natural language processing.

7 Future Work The task of sentiment analysis has an ample number of application areas, and this number will keep increasing in the coming years. This project can be further improved by using pre-trained deep learning models such as XLNet [10] or BERT [11]. The news articles used here can be replaced by conversations between people, enabling the task of deception detection. The approach could also be used to determine the emotions and mental health of people


during a pandemic such as COVID-19 [12], or to detect fake news on multiple platforms, since such news spreads chaos among people. Acknowledgements This project was executed thanks to the inspirational ideas and teachings I received from the many remarkable projects of Dr. L. Dey, chief scientist at TCS Research and Innovation, India.

References
1. Eveland, J.W.P., Shah, D.V.: The impact of individual and interpersonal factors on perceived news media bias. Polit. Psychol. 24(1), 101–117 (2003)
2. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
3. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2), 1–135 (2008)
4. Lim, S.L.O., Lim, H.M., Tan, E.K., Tan, T.P.: Examining machine learning techniques in business news headline sentiment analysis. In: Computational Science and Technology, pp. 363–372. Springer, Singapore (2020)
5. Alvarez, G., Choi, J., Strover, S.: Good news, bad news: a sentiment analysis of the 2016 Election Russian Facebook Ads. Int. J. Commun. 14, 27 (2020)
6. Nakov, P., Ritter, A., Rosenthal, S., Sebastiani, F., Stoyanov, V.: SemEval-2016 task 4: sentiment analysis in Twitter. arXiv preprint arXiv:1912.01973 (2019)
7. Chandrika, G.N., Ramasubbareddy, S., Govinda, K., Swetha, E.: Web scraping for unstructured data over web. In: Embedded Systems and Artificial Intelligence, pp. 853–859. Springer, Singapore (2020)
8. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
9. Rong, X.: Word2vec parameter learning explained. arXiv preprint arXiv:1411.2738 (2014)
10. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R., Le, Q.V.: XLNet: generalized autoregressive pretraining for language understanding. In: Advances in Neural Information Processing Systems, pp. 5754–5764 (2019)
11. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
12. Mehta, P., McAuley, D.F., Brown, M., Sanchez, E., Tattersall, R.S., Manson, J.J.: COVID-19: consider cytokine storm syndromes and immunosuppression. Lancet 395(10229), 1033–1034 (2020)

Interactive Chatbot for COVID-19 Using Cloud and Natural Language Processing Patel Jaimin, Patel Nehal, and Patel Sandip

Abstract The spread of COVID-19 is making it very hard for healthcare departments and governments to answer people's queries, owing to limited front-desk assistance. This problem can be addressed with cutting-edge technology such as artificial intelligence. The purpose of this study is to implement an AI-powered chatbot that gives users the necessary information about the disease in a conversational way. With the help of NLP technology, the cloud, and a messaging application, chatbots can be built very easily. The flow of the conversation can be defined in an AI service with strong natural language capabilities. We have built a chatbot that answers common questions and, on top of that, predicts the sign and severity of COVID-19 based on the user's symptoms. As users interact with the bot, the NLP model can be made more accurate by training it further to understand the meaning of users' questions. Keywords COVID-19 · Health care · Cloud computing · Artificial intelligence · Natural language processing · Chatbot

1 Introduction In late December 2019, a new disease began spreading rapidly in China; it has been named "coronavirus disease 2019," also known as "COVID-19" [1]. Within a few weeks, COVID-19 had spread rapidly outside China all over the world. On March 11, the World Health Organization declared it a pandemic

P. Jaimin · P. Nehal (B) · P. Sandip Smt. K D Patel Department of Information Technology, Chandubhai S. Patel Institute of Technology (CSPIT), Faculty of Technology & Engineering (FTE), Charotar University of Science and Technology (CHARUSAT), Changa, Gujarat 388421, India e-mail: [email protected]
P. Jaimin e-mail: [email protected]
P. Sandip e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_26


[2]. COVID-19 is a respiratory disease that spreads through person-to-person contact. The incubation period [3] of this disease is between 2 and 14 days. The primary symptoms are fever, cough, and shortness of breath, although studies found that a majority of cases in India, up to 80%, are asymptomatic [4]. Older adults with underlying medical conditions such as heart or lung disease or diabetes are at higher risk of developing severe COVID-19 illness [3]. At the time of writing, there is no vaccine or treatment available for COVID-19, so the only way to help slow the spread of the virus is through non-pharmaceutical interventions [5] taken by people and communities. In response, the government is working closely with public health partners. In this pandemic situation, government and health departments are receiving a high volume of requests for COVID-19 information, which is very difficult to process in a short time. Thanks to technological advancements, chatbots [6] can play a leading role by bridging the gap between patients and clinicians. Because of recent advances in natural language processing [7], chatbots can recognize the meaning of a user's message (its "intent" [8]) and reply with the relevant information accordingly. Hence, users can ask their specific questions directly and save much of the time spent going through a traditional website interface [9]. Chatbots reduce healthcare costs when used in place of a human, or they can assist humans as a preliminary step in assessing a condition and providing self-care recommendations [10]. This project aims to spread awareness about COVID-19 by giving relevant answers to users' questions and predicting the severity of COVID-19 from the symptoms a user enters. Beyond this, this research shows how the cutting-edge field of artificial intelligence [11] becomes useful in this kind of pandemic situation with the help of the cloud.

2 Literature Review

Table 1 surveys 17 chatbots regarding COVID-19. Among these bots, we found only one conversational bot, Providence St. Joseph. Some bots provide primary information about COVID-19 in the form of question–answer, while some check the sign or severity of COVID-19 based on the user's symptoms.

3 Proposed Model

Figure 1 depicts the COVID-19 assessment bot flowchart. There are a total of three services used in this system: (i) the messenger application [12], which is responsible for the user interface; (ii) IBM Watson Assistant [13], the natural language processing service from IBM, which is responsible for detecting the intent of the text message and delivering an appropriate response back to the messenger application; and (iii) AWS Lambda [14], a cloud service where custom code can be executed, which is responsible for predicting the severity and sign of COVID-19.


Table 1 Chatbot survey

Bot name | Platform | AI service | Conversational | Symptoms checker | Q/A
World Health Organization's health alert | Whatsapp (+41 79 893 18 92) | No | No | No | Yes
Cleveland clinic's tool [15] | Webchat | No | No | Yes | No
Cobot-19 | Whatsapp (+91 7948058218) | No | No | No | Yes
Coronavirus self-checker [16] | Webchat | Yes | No | Yes | No
COVID-19_Mohw [17] | Facebook Messenger | No | No | No | Yes
COVID-19-assessment-chatbot-template [18] | Webchat | No | No | Yes | No
COVID-19-cases-tracker [19] | Webchat | No | No | No | No
COVID-19-faq-chatbot [20] | Webchat | No | No | No | Yes
Delhi Govt—corona helpline | Whatsapp (+91 8800007722) | No | No | No | Yes
Gog covid-19 helpline | Whatsapp (+91 7433000104) | No | No | No | Yes
GOV.UK | Whatsapp (+44 7860064422) | No | No | No | Yes
MyGov corona helpdesk | Whatsapp (+91 9013151515) | No | No | Yes | Yes
MyGov corona hub [21] | Facebook Messenger | No | No | No | Yes
Project baseline [22] | Webchat | No | No | Yes | No
Providence St. Joseph [23] | Webchat | Yes | Yes | Yes | Yes
Symptoms checker [24] | Webchat | No | No | Yes | No
TS Gov Covid Info | Whatsapp (+91 9000658658) | No | No | No | Yes

The flow of this system starts with the user text entered in the messenger application, which is sent to IBM Watson Assistant to recognize the meaning of the text. The Assistant identifies the intent of the text message and sends the response back to the messenger application. The response is defined in the dialog of that particular intent. If the Assistant identifies the intent as the assessment test intent, a POST webhook request is made to AWS Lambda, where custom code is executed and the prediction of COVID-19 sign and severity is returned as the webhook response, which the Assistant sends back to the messenger application as a dialog response.
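The paper's webhook function is implemented in Node.js (Sect. 4); purely as an illustrative sketch of the kind of handler described here, the following Python Lambda handler shows the general shape of such a webhook endpoint. The symptom names and scoring rule are hypothetical, not the authors'.

```python
import json

# Hypothetical symptom weights; the authors' actual scoring logic is not published.
SYMPTOM_WEIGHTS = {
    "fever": 2,
    "dry_cough": 2,
    "shortness_of_breath": 3,
    "sore_throat": 1,
    "loss_of_taste_or_smell": 2,
}

def lambda_handler(event, context):
    """Entry point AWS Lambda invokes when the assistant calls the webhook (sketch)."""
    symptoms = event.get("symptoms", [])  # e.g. ["fever", "dry_cough"]
    score = sum(SYMPTOM_WEIGHTS.get(s, 0) for s in symptoms)

    if score >= 5:
        severity = "high - please contact a healthcare provider"
    elif score >= 2:
        severity = "moderate - monitor symptoms and self-isolate"
    else:
        severity = "low - no typical COVID-19 signs reported"

    # The assistant reads this JSON body as the webhook response.
    return {"statusCode": 200, "body": json.dumps({"severity": severity})}
```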


Fig. 1 System workflow for COVID-19 assessment

4 Implementation

The COVID-19 assessment bot is implemented in the node.js [25] programming language. The user interface of this bot is the Facebook Messenger application. A Facebook page is required to give a unique identity to the bot. The messenger application is set up from Facebook for Developers [26]. Subscribing the Facebook page from the messenger application links the page to the application so that the user can communicate with the page through Messenger. In order to make the bot conversational, an AI service is essential which can recognize the meaning of the user's message and respond accordingly. We have chosen IBM Watson Assistant for this bot. In Watson Assistant, we defined some intents and entities. An intent is responsible for capturing the meaning of a text message [27], and an entity is responsible for recognizing stored values [28]. In Watson Assistant, we have to define dialogs and set their input in the form of an intent or entity. When Watson Assistant recognizes the set intent, the dialog is triggered and sends the response text configured in that dialog. Besides this, a webhook is required to run custom code on an external server to get a response text.


In this application, we have used the AWS Lambda service to predict COVID-19 sign and severity based on the user's inputted symptoms. Lastly, the Facebook Messenger application needs to be linked with Watson Assistant to make the bot conversational. When a user enters some text, it is sent by the messenger application to IBM Watson Assistant, which performs natural language processing and gives the response text based on the meaning of the user's text. In this way, a user gets a personalized experience with the bot. It saves the user's time and effort in getting the required information, as they can directly ask their query instead of going through the structured design of websites. Figure 2 depicts the chatbot profile, Fig. 3 the welcome message, Fig. 4 the question–answer interaction and Fig. 5 the symptoms checker.

Fig. 2 Chatbot profile


Fig. 3 Welcome message

5 Conclusion

In this demo bot, we are successfully able to give information to the user based on their message. This type of healthcare bot does not require a dedicated server and storage. It can be easily implemented in the cloud with the help of AI services such as IBM Watson Assistant [13], Google Dialogflow [29], and Microsoft Azure [30]. In addition, we can easily integrate it into a website or any other messaging platform. Thus, a chatbot is very useful, especially in this type of pandemic situation, when healthcare departments can lower their burden of answering users' primary queries with the help of conversational chatbots.

Fig. 4 Question–answer

Fig. 5 Symptoms checker

References
1. Wu, Y.-C., Chen, C.-S., Chan, Y.-J.: The outbreak of COVID-19. J. Chin. Med. Assoc. 83(3), 217–220 (2020). https://doi.org/10.1097/jcma.0000000000000270
2. https://www.cdc.gov/mmwr/volumes/69/wr/mm6918e2.htm?s_cid=mm6918e2_w
3. https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html
4. https://www.indiatoday.in/newsmo/video/what-are-asymptomatic-covid-19-cases-16704222020-04-24
5. https://www.cdc.gov/nonpharmaceutical-interventions/index.html
6. https://en.wikipedia.org/wiki/Chatbot
7. https://en.wikipedia.org/wiki/Natural_language_processing
8. https://www.nlpworld.co.uk/nlp-glossary/i/intent/
9. Valtolina, S., Barricelli, B.R., Di Gaetano, S.: Communicability of traditional interfaces VS chatbots in healthcare and smart home domains. Behav. Inf. Technol. 39(1), 108–132 (2020)
10. Fadhil, A.: Beyond patient monitoring: conversational agents role in telemedicine and healthcare support for home-living elderly individuals. arXiv preprint arXiv:1803.06000 (2018)
11. https://en.wikipedia.org/wiki/Artificial_intelligence
12. https://developers.facebook.com/docs/messenger-platform/
13. https://cloud.ibm.com/docs/services/assistant?topic=assistant-getting-started#getting-started
14. https://docs.aws.amazon.com/lambda/latest/dg/welcome.html
15. https://my.clevelandclinic.org/landing/preparing-for-coronavirus
16. https://www.cdc.gov/coronavirus/2019-ncov/testing/diagnostic-testing.html
17. https://www.messenger.com/t/COVID19.MOHW.BW
18. https://hellotars.com/chatbot-templates/coronavirus-covid19-fight/NkBd08/covid-19-assessment-chatbot-template
19. https://hellotars.com/chatbot-templates/coronavirus-covid19-fight/VJXnAu/covid-19-casestracker
20. https://hellotars.com/chatbot-templates/coronavirus-covid19-fight/NJJrH-/covid-19-faq-chatbot
21. https://www.facebook.com/MyGovIndia/
22. https://www.projectbaseline.com/study/covid-19/
23. https://coronavirus.providence.org/
24. https://www.buoyhealth.com/symptom-checker/
25. https://nodejs.org/en/docs/
26. https://developers.facebook.com/docs
27. https://cloud.ibm.com/docs/assistant?topic=assistant-intents
28. https://cloud.ibm.com/docs/assistant?topic=assistant-entities
29. https://dialogflow.com/
30. https://azure.microsoft.com/en-in/

Investigating the Performance of MANET Routing Protocols Under Jamming Attack

Protiva Sen and Mostafizur Rahman

Abstract Mobile ad hoc networks are a genre of wireless networks whose nodes can act as both routers and hosts and which can organize themselves dynamically without using static infrastructure. Because of the lack of central administration and rapid topological changes, they are greatly affected by various security attacks. A jamming attack is a physical-layer attack which decreases network performance by isolating nodes from communication with their neighbors. This paper aims to find out the network performance under jamming attack for three routing protocols: geographical routing protocol (GRP), optimized link state routing protocol (OLSR) and ad hoc on-demand distance vector (AODV). These protocols are simulated with respect to the performance parameters network load, throughput and delay using the Riverbed simulator. Finally, the outcomes of different scenarios are compared to find the better performing protocols in case of a jamming attack. Keywords MANET · Jamming attack · Riverbed · AODV · GRP · OLSR

1 Introduction

In recent days, the mobile ad hoc network (MANET) has gained popularity because of its dynamic characteristics and mobility; it can also handle any kind of change that happens within the network. Moreover, it is not mandatory to have any central management [1]. Every node in the network plays a role in discovering routes and maintaining connections with other nodes around it. A great benefit of MANET is that it can be created in any place, at any time and under any natural conditions without the necessity of any pre-installed infrastructure [2].


Note that wireless networks face more security challenges than wired networks because of the mobility of the nodes [3, 4]. A pulse jamming attack is one of the most serious denial of service (DoS) attacks; it prevents information transmission between a genuine sender and receiver. Sometimes a malicious node can detect the original signal and disrupt the communication [5]. In this investigation paper, we present the performance of three routing protocols, AODV, OLSR and GRP, using medium FTP, medium email and low database traffic with respect to the performance parameters throughput and delay. The same scenarios are implemented under a pulse jamming attack to find the best performing protocol with and without the attack.

1.1 Jamming Attack

A jamming attack is a well-known DoS attack. Due to the characteristics of wireless communication, MANET is subject to security attacks [3, 6]. The effect of a jamming attack is to prohibit nodes from transmitting and receiving data packets on the network [1]. It is caused by a jammer device that continuously transmits a radio signal matching the radio frequency of the sender node [2, 7]. The performance of the network is decreased by affecting network load, throughput, end-to-end delay, data dropped, etc. (Fig. 1).

Fig. 1 Jamming attack

2 Related Works

Singh and Gupta [1] analyzed the performance of MANET with and without jamming attack. The AODV routing protocol was selected for simulation. The network performance was surveyed with simulation results of delay, data dropped and network load.


From the comparison, it was concluded that the jamming attack is responsible for decreasing network performance. Rao et al. [2] investigated the performance of MANET routing protocols such as AODV, DSR, GRP and OLSR. Eight performance metrics were used to compare the simulation results. Based on these results, OLSR's performance proved better than the others. Popli and Raj [3] configured a network with high mobility and the AODV routing protocol. The network in normal condition was compared with the network under jamming attack for end-to-end delay and throughput. Jassim [8] presented the effect of jamming attack on WLAN: the attack decreased the throughput and increased the delay. To mitigate the jamming attack, PCF was enabled in the guard nodes.

3 MANET Routing Protocols

MANET routing protocols are classified depending on different routing criteria. Based on the way routing information is obtained and maintained, MANET routing protocols are categorized into proactive routing protocols (table-driven), reactive routing protocols (on-demand) and geographical routing protocols (hybrid).

3.1 AODV Routing Protocol

The ad hoc on-demand distance vector routing protocol is used for mobile nodes in MANET and can handle thousands of nodes at a time. It performs route table management to find only one destination route instead of multiple routes, and no other nodes are required to maintain it. The main features of this protocol are that it can adapt quickly and requires lower processing and lower utilization of the network [9]. Three message formats are used in this protocol. The first is the route request (RREQ), which is broadcast and used to discover a new route to the receiver node. The second is the route reply (RREP), which is unicast in reply to an RREQ flood, and the third is the route error (RERR), which is a re-broadcast message used only when at least one unreachable destination is found [10].

3.2 Geographical Routing Protocol (GRP)

A familiar routing protocol for mobile networks is GRP, which deals with the position of the source node. Because it combines the robustness of proactive and reactive routing protocols, it is called a hybrid routing protocol [2]. The source node is responsible for collecting the information needed to find the best route, and according to this information, data packets start to be transmitted. A great disadvantage of this protocol is its complexity and overhead [10].


3.3 Optimized Link State Routing Protocol (OLSR)

A suitable protocol for random traffic and large networks is the optimized link state routing protocol. It uses multipoint relay (MPR) nodes to forward packets rather than flooding them [2]. It performs hop-by-hop routing and delivers its packets to the destination by following the shortest route. Its proactive behavior makes available routes immediately accessible when needed [11]. The distributed character of its design ensures that no central entity is required. OLSR is suited to dense networks in which a considerable number of nodes communicate frequently [10].

4 Experimental Details

In this section, the required simulation tool along with the simulation setups is described.

4.1 Simulation Tool

In this paper, the Riverbed 17.5 modeler is used for network simulation. It was previously known as the OPNET simulator; Riverbed is the modified version of OPNET. It is specialized for network research and development. It enables the user to design communication networks with different devices and protocols, test security, and simulate the network with different performance parameters and applications [7]. Several experiments on wireless technologies have been carried out with it regarding development problems and their solutions.

4.2 Simulation Setup

The simulation setup describes MANET's performance for three different protocols. For each protocol, two scenarios are designed to compare its performance with and without jamming attack. A campus network is implemented with 20 mobile nodes in a 10 × 10 km area. Each scenario is simulated for 250 s with a seed value of 128. For traffic generation on the network, the applications used are email (medium load), FTP (medium load) and database (low load) (Tables 1 and 2).

Table 1 MANET parameters

Name | Value
Mobility model | Random waypoint
Mobility speed | 10 m/s
Traffic type | Email, FTP, database
Ad hoc routing protocol | AODV, OLSR, GRP
Pause time | 50 s
Trajectory | Vector
Packet size | 16,000 bits
Physical characteristics | Direct sequence
Data rate | 11 Mbps
Transmit power (W) | 0.005
Packet-reception threshold (dBm) | −95
RTS threshold (bytes) | 128
Short retry time | 4
Long retry time | 7
Max receive lifetime (s) | 0.5
Buffer size (bits) | 1,024,000

Table 2 Jammer parameters

Name | Value
No. of jammers | 3
Trajectory | Vector
Jammer band base frequency (MHz) | 2402
Jammer bandwidth | 100,000
Jammer transmit power (W) | 0.001
Pulse width | 1.0

5 Results and Analysis

The simulation outcomes are compared and studied in this section. Jammer nodes are placed inside the network to compare the performance of this new network with its normal condition. This comparison is accomplished by observing throughput, delay, traffic sent, traffic received, etc.


Fig. 2 Performance of AODV (with and without jamming attack)

5.1 Comparison of Jamming Attack Under AODV Protocol

In the first scenario, the AODV protocol is configured without any jammer node. This scenario is then modified by introducing jammer nodes, and the performance of the new scenario is compared with the previous one. From the simulation results, it is clear that jammer nodes decrease the performance of the network by creating unwanted traffic. They reduce throughput from 4.5 megabits to 3.0 megabits and increase delay from 3.6 to 4.8 s. Figure 2 shows the performance of AODV in the normal condition and under attack.

5.2 Comparison of Jamming Attack Under OLSR Protocol

The same networks are configured with the OLSR protocol with and without jamming attack. Figure 3 shows the performance parameters of the OLSR protocol without and with jamming attack. When the routing traffic sent is compared, it is almost 114,000 bits without the jammer, whereas this value reduces to almost 72,000 bits in the presence of the jammer. The results of both simulations are then compared using the throughput and delay of the network. After introducing jammer nodes, throughput reduces from 4.0 megabits to 2.52 megabits and delay rises from 3.9 to 5.75 s, which is caused by congestion in the network. As a result, the overall performance falls.
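For a quick sense of the relative impact, the percentage changes implied by the figures reported in Sects. 5.1 and 5.2 can be computed directly; the short sketch below uses only the numbers quoted above.

```python
def pct_change(before, after):
    # Percentage change from the no-attack value to the under-attack value.
    return (after - before) / before * 100

# Figures quoted in Sects. 5.1 and 5.2 (throughput in megabits, delay in seconds).
scenarios = {
    "AODV throughput": (4.5, 3.0),
    "AODV delay": (3.6, 4.8),
    "OLSR throughput": (4.0, 2.52),
    "OLSR delay": (3.9, 5.75),
}

for name, (before, after) in scenarios.items():
    print(f"{name}: {pct_change(before, after):+.1f}%")
# AODV throughput: -33.3%   AODV delay: +33.3%
# OLSR throughput: -37.0%   OLSR delay: +47.4%
```

The throughput loss and delay increase are both larger for OLSR than for AODV, which is consistent with the conclusion drawn in Sect. 6.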


Fig. 3 Performance of OLSR (without and with jamming attack)

5.3 Comparison of Jamming Attack Under GRP Protocol

In this case, two scenarios are again simulated for the GRP protocol: one with jammer nodes and another without. The performance parameters of the GRP protocol are compared for these two scenarios, as shown in Fig. 4. This comparison gives clear evidence of the degradation of network performance during a jamming attack. The throughput and delay of the network under the GRP protocol are also affected by the jamming attack, which decreases the number of packets reaching the destination during run time.

Fig. 4 Performance of GRP (without and with jamming attack)


6 Conclusion

Due to the nature of the wireless medium between the sender node and the receiver node, MANET is highly susceptible to different attacks. These attacks are responsible for degrading network performance. The objective of this research was to find a reliable wireless routing protocol in the face of jamming attack. The networks under the AODV, OLSR and GRP protocols are all severely affected by the jamming attack. Among these three protocols, OLSR shows the worst performance in terms of traffic sent, delay, throughput, traffic received and network load. From the observed results of these three protocols, it can be concluded that the GRP and OLSR protocols are more vulnerable to jamming attack. On the contrary, AODV is verified as the best performer under jamming attack among the aforementioned three protocols. For this reason, configuring the network with the AODV protocol is the best choice to withstand a jamming attack. The security of wireless networks is now a great concern. This research work could be further expanded to other security attacks like the wormhole attack and the byzantine attack, and prevention mechanisms against these attacks could also be discussed.

References
1. Singh, J., Gupta, S.: Impact of jamming attack in performance of mobile ad hoc networks. Int. J. Comput. Sci. Trends Technol. (IJCST) 5(3), 184–190 (2017)
2. Rao, Y.C., Kishore, P., Prasad, S.R.: Riverbed modeler simulation-based performance analysis of routing protocols in mobile ad hoc networks. Int. J. Recent Technol. Eng. (IJRTE) 7(6S), 350–354 (2019)
3. Popli, P., Raj, P.: Effect of jamming attack in mobile ad hoc environment. Int. J. Sci. Eng. Technol. Res. (IJSETR) 5(5), 1521–1526 (2016)
4. Yadav, N., Kumar, V.: Securing ad hoc network by mitigating jamming attack. Int. J. Adv. Res. Comput. Eng. Technol. (IJARCET) 4(6), 2502–2506 (2015)
5. Bandaru, S.: Investigating the effect of jamming attacks on wireless LANs. Int. J. Comput. Appl. (0975–8887) 99(14), 5–9 (2014)
6. Manickam, P., Baskar, T.G., Girija, M., Manimegalai, D.: Performance comparisons of routing protocols in mobile ad hoc networks. Int. J. Wirel. Mob. Netw. (IJWMN) 3(1), 98–106 (2011)
7. Jasim, S.I.: PCF investigation to improve the performance of TORA-based MANET against jamming attacks. Int. J. Comput. Sci. Eng. Survey (IJCSES) 5(3), 17–28 (2014)
8. Jassim, S.I.: Investigate the integration of PCF in WLAN to improve its performance against attackers. J. Univ. Babylon Pure Appl. Sci. 26(5), 241–255 (2018)
9. Modi, S., Singh, P., Rani, S.: Performance improvement of mobile ad hoc networks under jamming attack. Int. J. Comput. Sci. Inf. Technol. 5(4), 5197–5200 (2014)
10. Baxla, S., Nema, R.: Performance analysis of ODV, OLSR, DSR and GRP routing protocols of ad hoc networks. Int. J. Innovative Res. Dev. 2(5), 888–900 (2013)
11. Jacquet, P., Muhlethaler, P., Clausen, T., Laouiti, A., Qayyum, A., Viennot, L.: Optimized link state routing protocol for ad hoc networks. In: Proceedings, IEEE International Multi Topic Conference, IEEE INMIC 2001, Technology for the 21st Century, pp. 62–68. IEEE (2001)

Classification of Skin Cancer Lesions Using Deep Neural Networks and Transfer Learning

Danny Joel Devarapalli, Venkata Sai Dheeraj Mavilla, Sai Prashanth Reddy Karri, Harshit Gorijavolu, and Sri Anjaneya Nimmakuri

Abstract Skin cancer is among the life-threatening cancers, but unlike most cancers, skin cancer is observable and can be detected in early stages, yet not many are aware of its detectability. There are mainly three types of skin cancer: basal cell carcinoma, squamous cell carcinoma, and melanoma, where melanoma is the most dangerous type with a very low survival rate. Skin cancers are not painful most of the time, but they appear visibly distressing, which makes them easily detectable, as the cancer is nothing but abnormal growth of skin cells. A person can check whether a skin lesion is cancerous by taking a picture, and deep neural networks can be used to classify the type of cancer. This is done by collecting several clinical images of cancerous skin lesions, applying segmentation and noise removal, and feeding them to a deep neural network to train on before detecting cancerous lesions. Our data comes from the HAM10000 dataset, the ISIC Archive, and images scraped from the Web. Every class has 3552 images, for a total of 10,656 images; image augmentation was used to generate images so that all classes have an equal number of images. The first model was a basic CNN that was trained several times, changing the hyperparameter values to fine-tune the model, which gave us 86.5% accuracy. We then implemented transfer learning with the ImageNet weights of different ImageNet models, where ResNet101 gave us the highest accuracy of 95.6%. We have deployed this as a Web application using JavaScript and tensorflow.js.


Keywords Transfer learning · Deep neural networks · Skin cancer · Image classification

1 Introduction

There are many types of skin diseases and skin cancers, and some of them can be fatal. Skin cancers must be treated in the earlier stages, or they prove to be deadly in the long run. Many people neglect skin lesions, which can cost them their life. In rural areas, where hospitals are not well equipped, there is a struggle to detect the disease in its early stages. So the need for automatic skin disease prediction is increasing for patients as well as dermatologists. The currently available diagnosis procedure consists of long laboratory procedures and takes two weeks for the patient to get their biopsy results, but this system enables users to predict skin disease using image processing. With a predictor system, a patient or dermatologist can easily check whether a lesion is malignant, so that cancer can be treated in the early stages. We collected images of the three most common skin cancers, and the prediction system is used to predict these three skin cancers, namely melanoma, squamous cell carcinoma (SCC), and basal cell carcinoma (BCC). Early detection of melanoma can potentially improve the survival rate, yet nearly 30,000 people die yearly in the USA alone. Skin cancers do not cause any pain most of the time, but they are visible, as the cancer is nothing but abnormal growth of skin cells. In this paper, we discuss our prior study in the Literature Review, present our study and the workflow of its implementation in the Methodology section, and report the Results, which are promising when transfer learning is used. The Future Scope section of this paper discusses possible real-world implementations of this work.

1.1 Objective

Primary objective: Our primary objective is to classify skin cancers into their types by building the best model using a convolutional neural network and transfer learning. Second objective: Our second objective is to learn how convolutional neural networks work on sample data like this, to study and carefully observe other architectures such as VGG16/19, InceptionV2/V3, and ResNet50/101, which have excelled in the ILSVRC throughout the years, and to study the results with and without transfer learning.


2 Literature Review

There has been much research on not only skin cancer detection/classification but also many other skin diseases; some of the previously published papers are discussed here. Ansari and Sarode [1] proposed a technique to classify a melanoma lesion only. After a very careful study of skin cancers and their anatomy, they performed three preprocessing techniques (grayscale conversion, noise removal, and image enhancement) and used a support vector machine classifier; the image is segmented and then fed into the fit function for training, as melanoma is tested based on shape. They used a very small sample size, which cannot capture the variety of real-world images. He et al. [2] presented computer-aided clinical skin disease diagnosis using CNN and object detection models, showing the influence of the object detection technique in this approach: it can increase accuracy and decrease computation cost by removing unwanted background learning. They used an ensemble learning approach to obtain the final output and two datasets, Skin-10 (which contains images belonging to 10 classes of skin diseases, with a total of 10,218 images) and Skin-100 (which contains 100 classes of common skin diseases with a total of 19,807 images). The best accuracy achieved was 79.01% for Skin-10 and 53.54% for Skin-100; we believe the reason for the low accuracy is class imbalance in the dataset they mention, with the model sensitive toward the high-volume class, which may have led to the poor performance. The importance of object detection and ensemble learning methods is emphasized. Jana et al. [3] researched skin cancer cell detection using image processing, showing the role of segmentation and feature extraction in image processing, and proposed a technique to remove unwanted features in the area of interest, such as hair. Ambad [4] presented an image analysis system to detect skin diseases with a workflow comprising basic steps and a two-level classifier, which is a great idea: the first classifier detects if a lesion is normal or defective and, if it is infected, the second classifier classifies whether the lesion is melanoma, psoriasis, or dermo. A two-way classifier is a good approach.

3 Methodology

This project aims to determine the class of a skin cancer lesion, i.e., to assign it to its correct class of skin cancer, by training a deep convolutional neural network along with other segmentation methods. The major area of concentration and evaluation is defining the number of epochs and the batch sizes well enough to fit the data non-linearly. The workflow of this research follows the steps below, and a clear explanation and description are given for every step.


3.1 Data Preprocessing

Data cleaning is done by removing unwanted images and cropping the images so that the lesion is in focus. After data collection, the images are uploaded to Google Drive in different folders (HAM10000, Web, social media). Some images contained watermarks and clinical markings on lesions which might confuse the model; these pictures have been either avoided or edited using the tool GIMP. The images are resized to 224 × 224 × 3, since most of the ImageNet models use this image format.

3.1.1 Generating Data

This step produces the final image data that is ready to be fed to a deep neural network. Our dataset is imbalanced, which means our classes have different numbers of images: for melanoma we found many images, since it is very common, but fewer images for squamous and basal cell carcinoma (melanoma 1500+, SCC 700+, and BCC 900+ images). With class-imbalanced data, there is a greater chance that the model might overfit to, or be sensitive to only, the class with the highest number of images. To avoid this vulnerability to overfitting, we have used the ImageDataGenerator class, which generates images with the specified image augmentations.

Image Augmentation. Brownlee [5] describes image augmentation as a technique used to artificially expand the dataset. This is helpful when we are given a dataset with very few data samples. Of the image augmentation parameters generally used to increase the data sample count, we did not use the ones that would distort the image too much and thereby remove lesion features/information. We generated 3552 images for each class, bringing the total dataset to 10,656 image files. The parameters and their respective values are as follows: brightness_range = [0.3, 1.0], zoom_range = [0.5, 1.0], horizontal_flip = True, rotation_range = 90, and vertical_flip = True.
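As an illustrative sketch only (not the authors' exact script), the augmentation settings listed above map directly onto Keras' ImageDataGenerator; the directory layout and rescaling shown here are assumptions.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation parameters quoted in the text; everything else is left at defaults.
datagen = ImageDataGenerator(
    brightness_range=[0.3, 1.0],
    zoom_range=[0.5, 1.0],
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=90,
    rescale=1.0 / 255,        # assumption: pixel values scaled to [0, 1]
)

# Assumed folder layout: data/train/<class_name>/*.jpg for the three classes.
train_gen = datagen.flow_from_directory(
    "data/train",
    target_size=(224, 224),   # image size used by the ImageNet models in the paper
    batch_size=32,
    class_mode="sparse",      # matches the sparse_categorical_crossentropy loss used later
)
```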

3.2 Prediction Techniques

In deep learning, model training is the most time-consuming job. To overcome this, we trained our models on Google Colab, which allows users to use its compute-engine backend with integrated GPUs and TPUs for free, with almost 12 GB of RAM. We started by building a simple CNN architecture to train on our data.

3.2.1 CNN—Basic Architecture

A CNN architecture starts with feature extraction, followed by pooling layers and fully connected layers, and finishes with classification. Feature extraction is performed by alternating convolution layers with subsampling or pooling layers. Classification is performed with dense or fully connected layers followed by a final softmax layer. For image classification, a CNN architecture performs better than a fully connected feedforward neural network. Deotte [6] describes the elements of a basic CNN architecture as follows.
• Filters is the number of desired feature maps.
• Kernel size is the size of the convolution kernel. A single number 5 means a 5 × 5 convolution.
• Padding is either 'same' or 'valid'. Leaving this blank results in padding = 'valid'. If padding is 'valid', then the size of the new layer's maps is reduced by kernel_size − 1. For example, if you perform a 5 × 5 convolution on a 28 × 28 image (map) with padding = 'valid', then the next layer has maps of size 24 × 24. If padding is 'same', then the size is not reduced.
• Activation is applied during forward propagation. Leaving this blank results in no activation.
• We used 'ReLU' activation for every layer and softmax at the end; we also used a dropout of 0.4 (40%) to generalize the data, thereby avoiding overfitting.
Our architecture takes an input of size (28, 28, 1), a grayscale image, and consists of:
• Two convolutional layers with 32 feature maps and kernel_size 3 × 3 with 'ReLU' activation, and one convolutional layer with 32 feature maps, kernel_size 5 × 5 and stride 2.
• Two convolutional layers with 64 feature maps and kernel_size 3 × 3 with 'ReLU' activation, and one convolutional layer with 64 feature maps, kernel_size 5 × 5 and stride 2.
• A flatten layer followed by a fully connected layer (dense, 128), a dropout of 0.4, and a dense layer of 3 (the number of classes present).
• The model is trained with batch_size = 32 and epochs = 100. We added batch normalization after every layer and two dropouts, one of 0.4 after two layers and one of 0.5 before the final fully connected layer, and we used kernel_regularizer = l2(0.001) and bias_regularizer = l2(0.001). A minimal Keras sketch of this architecture is given after this list.
The validation accuracy after 100 epochs was 0.8647 and the training accuracy was 0.9947, but this was not our best model. We next trained with ILSVRC models that were designed for the ImageNet dataset. We first train our data without ImageNet weights, solely on their architectures, to see the results, and then use the transfer learning method to fine-tune the models with ImageNet pretrained weights. We used ResNet101, VGG and InceptionResNetV2 with and without transfer learning, i.e., with weights set to None and, for transfer learning, to 'ImageNet'. We created a pickle object file of our prepared image data in Google Drive to avoid extracting the data on every run, loading it directly from Drive instead.
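The following is a minimal Keras sketch of the basic CNN described above, under the stated assumptions (28 × 28 × 1 grayscale input, three output classes). It is an illustration rather than the authors' exact code; layer-ordering details not specified in the text are guesses.

```python
from tensorflow.keras import layers, models, regularizers

def build_basic_cnn(input_shape=(28, 28, 1), num_classes=3):
    """Basic CNN sketched from the description in Sect. 3.2.1."""
    reg = regularizers.l2(0.001)
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Block 1: two 3x3 convolutions, then a strided 5x5 convolution
        layers.Conv2D(32, 3, activation="relu", padding="same",
                      kernel_regularizer=reg, bias_regularizer=reg),
        layers.BatchNormalization(),
        layers.Conv2D(32, 3, activation="relu", padding="same",
                      kernel_regularizer=reg, bias_regularizer=reg),
        layers.BatchNormalization(),
        layers.Conv2D(32, 5, strides=2, activation="relu", padding="same"),
        layers.Dropout(0.4),
        # Block 2: the same pattern with 64 feature maps
        layers.Conv2D(64, 3, activation="relu", padding="same",
                      kernel_regularizer=reg, bias_regularizer=reg),
        layers.BatchNormalization(),
        layers.Conv2D(64, 3, activation="relu", padding="same",
                      kernel_regularizer=reg, bias_regularizer=reg),
        layers.BatchNormalization(),
        layers.Conv2D(64, 5, strides=2, activation="relu", padding="same"),
        # Classifier head
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_basic_cnn()
# model.fit(x_train, y_train, batch_size=32, epochs=100, validation_split=0.2)
```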

3.2.2 Without Transfer Learning

We used keras.applications for the VGG16, InceptionResNetV2 and ResNet101 architectures, following the documentation provided by Keras, with the weights parameter set to None. The input images were resized to 224 × 224 × 3, since most of the ImageNet models use this image format, and include_top was set to False. To each base we added a global average pooling 2D layer and a dropout of 0.5 for VGG16 and ResNet101 or 0.4 for InceptionResNetV2, all with 'ReLU' activation, and a dense layer of 3 for the three classes in the dataset. All the models use 'sparse_categorical_crossentropy' as the loss function, since one image belongs to only one type of cancer, and metrics = ['accuracy'].
ResNet101. Fung [7] describes a residual neural network (ResNet) as an artificial neural network (ANN) based on constructs known from pyramidal cells in the cerebral cortex. Residual neural networks utilize skip connections or shortcuts to jump over some layers. ResNet-N is a deep residual network that is N layers deep. It is a subclass of convolutional neural networks, and ResNet is most popularly used for image classification.
• We used the stochastic gradient descent (SGD) optimizer with a learning rate of 0.01 and a momentum of 0.9.
• With ResNet101's layers, trainable params: 42,558,979, non-trainable params: 105,344, with batch_size 32, 20 epochs, and validation_split 0.2. TRAINING ACCURACY = 0.891 and VALIDATION ACCURACY = 0.8116.
VGG16. Tewari [8] describes VGG16 as a CNN model which achieved 92.7% top-5 test accuracy on the ImageNet dataset. This network has a very simple architecture using only 3 × 3 convolutional layers stacked on top of each other in increasing depth.
• We used the Adam optimizer with a learning rate of 0.0001.
• With VGG16's architecture, total params: 3,588,003, trainable params: 3,588,003, non-trainable params: 0, with batch_size 32, 100 epochs, and validation_split 0.2. TRAINING ACCURACY = 0.9981 and VALIDATION ACCURACY = 0.8918.
InceptionResNetV2. Raj [9] describes an Inception layer as a combination of a 1 × 1 convolutional layer, a 3 × 3 convolutional layer and a 5 × 5 convolutional layer, with their output filter banks concatenated into a single output vector forming the input of the next stage.
• We used the Adam optimizer with a learning rate of 0.0001, the same as for VGG16.
• With InceptionResNetV2's architecture, total params: 54,451,939, trainable params: 54,391,395, non-trainable params: 60,544, with batch_size 32, 10 epochs, and validation_split 0.2. TRAINING ACCURACY = 0.9575 and VALIDATION ACCURACY = 0.866.

3.2.3 With Transfer Learning

Marcelino [10] explains that, with transfer learning, instead of starting the learning process from scratch, we start from patterns that have been learned when solving a different problem, such as ImageNet in our case. This way we leverage previous learning and avoid starting from scratch, which saves a lot of time. There are three strategies to implement transfer learning: (1) train the entire model, (2) train some layers and leave the others frozen, and (3) freeze the convolutional base. Of the three strategies, the second is used when the dataset is small, and our dataset of 10 k images is small compared with the 14 million images of ImageNet. We used keras.applications for the VGG16, InceptionResNetV2 and ResNet101 architectures, following the documentation provided by Keras, with the weights parameter set to 'ImageNet'. The input images were resized to 224 × 224 × 3, since most of the ImageNet models use this image format, and include_top was set to False. All the models use 'sparse_categorical_crossentropy' as the loss function, since one image belongs to only one type of cancer, and metrics = ['accuracy'].
VGG16—ImageNet weights
• We added a convolutional layer with 64 feature maps and kernel_size (3, 3), a max pooling 2D layer with pool_size = 2, a flatten layer, a fully connected layer of 256, a dropout of 0.5, and a fully connected layer of 3 with 'softmax' activation.
• We used the Adam optimizer with a learning rate of 0.0001.
• With VGG16's architecture, total params: 12,278,915, trainable params: 12,278,915, non-trainable params: 0, with batch_size 32, 100 epochs, and validation_split 0.2.
• TRAINING ACCURACY = 0.9785 and VALIDATION ACCURACY = 0.9009. This is an acceptable result, but we cannot certainly call it an optimal model, as the difference between the accuracies shows evidence of overfitting.
InceptionResNetV2—ImageNet weights
• We added a flatten layer to the loaded model, a dropout of 0.4, and finally a dense layer of 3 with 'softmax' activation.
• We used the Adam optimizer with a learning rate of 0.0001.
• With InceptionResNetV2's architecture, total params: 54,451,939, trainable params: 54,391,395, non-trainable params: 60,544, with batch_size 32, 10 epochs, and validation_split 0.2.
• TRAINING ACCURACY = 0.9927 and VALIDATION ACCURACY = 0.9314. This is an acceptable result, but we


Fig. 1 ResNet101 learning curves with transfer learning

cannot certainly call it an optimal model; the difference also does not show much evidence of overfitting.
ResNet101—with ImageNet weights
• For ResNet101, we added a global average pooling 2D layer, a dropout of 0.4, and finally a dense layer of 3 with 'softmax' activation (a minimal Keras sketch of this model is given below).
• We used the Adam optimizer with a learning rate of 0.0001, as for the rest of the models; the slow learning rate gave us better results.
• With ResNet101's architecture, total params: 42,664,323, trainable params: 42,558,979, non-trainable params: 105,344, with batch_size 32, 100 epochs, and validation_split 0.2.
• TRAINING ACCURACY = 0.9992 and VALIDATION ACCURACY = 0.9563, which is an acceptable result and almost an optimal model. This has been the best result so far, with training and testing accuracies having very little difference, which means that the model trained well on the training data of 8516 images, generalized well without overfitting, and can predict new data with 95.63% accuracy. Compared with the ResNet101 model without any pretrained weights, we can say the transfer learning method outperformed the traditional approach.
• The learning curves (in Fig. 1) also do not show much evidence of overfitting compared with ResNet101 without pretrained weights.
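A minimal Keras sketch of the ResNet101 transfer-learning model described above, under the stated settings (ImageNet weights, global average pooling, dropout 0.4, three-class softmax, Adam at 1e-4); it is an illustration rather than the authors' exact script. Strategy 2 (freezing some layers) would additionally set layer.trainable = False on part of the base.

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import ResNet101

def build_resnet101_transfer(num_classes=3):
    """ResNet101 with ImageNet weights and the small head described in Sect. 3.2.3."""
    base = ResNet101(weights="imagenet", include_top=False,
                     input_shape=(224, 224, 3))
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dropout(0.4)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_resnet101_transfer()
# model.fit(x_train, y_train, batch_size=32, epochs=100, validation_split=0.2)
```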

4 Results

The experimental results for the input images of skin cancers, obtained by means of the transfer learning approach, are shown below. From the results obtained with the deep learning algorithms (in Table 1), it can be concluded that ResNet101 is the best algorithm for predicting the class of skin lesions.

Table 1 Accuracies of different deep learning algorithms (with ImageNet weights)

Algorithm | Training accuracy | Validation accuracy
VGG16 | 0.9785 | 0.9001
InceptionResNetV2 | 0.9927 | 0.9317
ResNet101 | 0.9992 | 0.9563

5 Conclusion

The project's key goal is to predict the type of a skin lesion from its image with the highest possible accuracy by means of the transfer learning approach. Several architectures have been trained with different learning rates, epochs and batch sizes; the ResNet101 architecture with ImageNet weights has given us the best accuracy yet published or recorded for identifying the type of a skin cancer lesion, 95.63%, with a training accuracy of 99.92%, and we do not see the model overfitting in this case. Also, an ensemble approach, which has been reported to give better results, is implemented using a basic voting mechanism written in Python. We have tried to deploy this model as a Web application, but we got a few errors with the Express server and the tensorflow.js version. Our second goal of understanding how these deep neural networks work and knowing how to implement and fine-tune them to get better results is achieved.

5.1 Future Scope

Since the project identifies the cancer lesion type, it can be used by both dermatologists and patients. Before sending the clinical image for the biopsy test, dermatologists can run the lesion through the model and, based on the results, focus on validating whether the lesion belongs to the class that the model has specified. This would cut down the 2- to 3-week delay for biopsy results. If the model predicts inaccurately in certain conditions, it can be retrained with the wrongly classified images to better learn the features it missed in the first round of learning. For better reliability, because we cannot solely trust a machine for the final prediction and take its result as a final answer, we can compare the performance of dermatologists and the machine by having both the dermatologists and the model classify a set of images, validating them against their predictions, and evaluating the performance of the model against an experienced doctor.


References
1. Ansari, U.B., Sarode, T.: Skin cancer detection using image processing. Int. Res. J. Eng. Technol. (IRJET) (2017), Mumbai, India. Available at: https://www.irjet.net/archives/V4/i4/IRJET-V4I4702.pdf
2. He, X., Wang, S., Shi, S., Tang, Z.: Computer-aided clinical skin disease diagnosis using CNN and object detection models, Nov 2019, China. Available at: https://www.researchgate.net/publication/337413270_Computer-Aided_Clinical_Skin_Disease_Diagnosis_Using_CNN_and_Object_Detection_Models
3. Jana, E., Subban, R., Saraswathi, S.: Research on skin cancer cell detection using image processing. In: IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Dec 2017, India. Available at: https://ieeexplore.ieee.org/document/8524554
4. Ambad, P.S.: An image analysis system to detect skin diseases. IOSR J. VLSI Sig. Process. (IOSR-JVSP) (2016), India. Available at: https://pdfs.semanticscholar.org/014e/75f75274d4b8a75ae3e2356556f7450fdb5a.pdf
5. Brownlee, J.: How to configure image data augmentation? (2019). Available at: https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deeplearning-neural-networks/
6. Deotte, C.: Basic CNN architecture (2018). Available at: https://www.kaggle.com/cdeotte/howto-choose-cnn-architecture-mnist
7. Fung, V.: An overview of ResNet and its variants (2017). Available at: https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035
8. Tewari, S.: CNN architecture series—VGG16 with implementation (Part I) (2019). Available at: https://medium.com/datadriveninvestor/cnn-architecture-series-vgg-16-with-implementation-part-ibca79e7db415
9. Raj, B.: A simple guide to the versions of inception networks (2018). Available at: https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
10. Marcelino, P.: Transfer learning from pre-trained models (2018). Available at: https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751

Security Features in Hadoop—A Survey

Gousiya Begum, S. Zahoor Ul Huq, and A. P. Siva Kumar

Abstract Extensive usage of Information and Communication Technology applications, including online banking, e-commerce, retail, social media, smartphone apps, etc., is responsible for the creation of extremely large amounts of digital data every hour, which is termed Big Data. To store and process Big Data, an open source framework known as Apache Hadoop is needed. It was initially developed for internal use at Yahoo with limited security features. Later, Hadoop was made open source and distributed under the Apache License, and many developers contributed to its development. In this process, many authentication and authorization services were developed and distributed as part of the Hadoop framework. In this paper, the security features of Hadoop and their limitations are discussed, and the future scope of work with respect to Hadoop security is highlighted. Keywords HDFS · Map reduce · Security · Kerberos · ACL

1 Introduction

The large amount of data collected is known as Big Data. The data is collected from various sources like social media, databases, etc. Initially, Big Data characteristics were specified by 3 V's: variety, velocity and volume. Volume specifies the size of the data to be stored; a recent study forecast that 1.8 zettabytes of data were created in 2011 alone [1].


Around 2.5 quintillion bytes of data are created every day, and this is increasing every single second. Variety specifies whether the data is structured or unstructured [2]. Velocity specifies how fast the data is coming in and how fast it has to be processed. Apart from the 3 V's, today the list has been extended to 51 V's: variety, velocity, volume, volatility, validity, veracity, vanilla, voice, visualization, victual, value, viability, verification, verbosity, vexed, versatility, voluntariness, vet, vulpine, variability, viscosity, vocabulary, verdict, venue, versed, violation, vibrant, versioning, vagueness, vitality, virality, visibility, vastness, varmint, vantage, valor, varnish, veer, vaticination, vane, varifocal, vault, veil, virtuosity, vivification, vogue, voodooism, voyage, verve and venturesomeness [3]. The data collected from various sources has to be processed, and traditional system resources are not enough to process this data, so Hadoop is used. Hadoop is a fully distributed, massively parallel framework powered by Apache which stores Big Data and processes it in a distributed fashion across various computers that form clusters, using programming languages such as Java. Hadoop has its own file system (HDFS), which is placed above the host computer's file system [4]. HDFS is used to store Big Data and MapReduce is used to process it. Hadoop uses a scale-out technique. YARN (Yet Another Resource Negotiator) is used as the resource manager [5, 6]. The organization of the paper is as follows: Sect. 2 describes the Hadoop architecture, Sect. 3 the literature survey, Sect. 4 Hadoop security techniques, and Sect. 5 the conclusion and future work.

2 Hadoop Architecture

Figure 1 describes the Hadoop architecture. In this architecture, data is divided into blocks and these blocks are distributed on different nodes/machines.

Fig. 1 Architecture of hadoop framework


In each machine there are some splits, and for each split a mapper is run; this is parallelism inside parallelism and is called massive parallelism. The client stores a file on HDFS to perform some operation on it, so it sends a request to the NameNode to store the data. The NameNode stores the file's metadata and tells the client the location of the data nodes where the file has to be stored [7]. The client then stores the file in those particular data nodes. After successfully storing the file, the DataNode sends an acknowledgment to the client. The data is replicated on a minimum of three systems to avoid data loss. A heartbeat is sent from the data nodes to the NameNode to report on their blocks and also to tell the NameNode that the data node is still running. If the client wants to perform processing, the Job Tracker handles it: it asks the NameNode where the data is stored, and the Task Tracker performs the required functions. During processing, the two functions performed are Map and Reduce. Once the two functions are completed, the outputs from all data nodes are collected and reduced to a single output. To report on its current functioning, the Task Tracker also sends a heartbeat to the Job Tracker. The main distributors of Hadoop software [8, 9] are (1) MapR, (2) Cloudera, (3) Hortonworks, (4) Microsoft HDInsight, (5) IBM InfoSphere BigInsights and (6) Pivotal HD.
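The Map and Reduce functions described above can be illustrated with a minimal word-count job written for Hadoop Streaming, a standard Hadoop facility that lets the mapper and reducer be supplied as programs reading stdin and writing stdout. The sketch below is illustrative only; the script name and invocation convention are assumptions.

```python
#!/usr/bin/env python3
"""Minimal word-count mapper and reducer for Hadoop Streaming (illustrative sketch)."""
import sys
from itertools import groupby

def mapper():
    # Emit "<word>\t1" for every word on every input line (the Map function).
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop sorts the mapper output by key, so identical words arrive grouped together.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    # Run e.g. as "wordcount.py map" for the mapper and "wordcount.py reduce" for the
    # reducer, passed to the hadoop-streaming jar via its -mapper and -reducer options.
    mapper() if sys.argv[1:] == ["map"] else reducer()
```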

3 Literature Survey

Hadoop was developed with little security; security implementation for Hadoop started in 2009. Hadoop distributors like Cloudera, Hortonworks and MapR have proprietary security features, but these features are not present in the Apache releases of Hadoop. The security of Hadoop is based on four levels, as per Cloudera [10]:
• Authentication: it establishes which users can be authenticated.
• Authorization: it establishes which authenticated users can access how much data and what data.
• Audit: it monitors when the data is accessed, where the data is accessed from and how the data is accessed.
• Encryption: it establishes how the data is protected when it is at rest or moving.
A. Authentication/Perimeter Security: This was added in 2010, and the aim of providing authentication is that clients accessing the cluster should be genuine and servers of the cluster should be authenticated. One of the concepts used for authentication is Kerberos. It is an authentication protocol [11, 12] used in networks to provide authentication for applications. It uses the concept of tickets and is based on symmetric-key cryptography. First, the user authenticates himself/herself with the Authentication Service (AS) [30] by providing a password, and a Ticket Granting Ticket (TGT) is issued and stored in the cache. When the user wants to access any service, he/she sends the TGT to the Ticket Granting Service (TGS) [13] and gets a service ticket, which is used to authorize access to the service. Though Kerberos is a good solution for authentication, it also has some disadvantages:


• If the user used the TGT for every MapReduce job, the Key Distribution Center (KDC) would quickly become a bottleneck, traffic would increase, and there is a chance of a Distributed Denial of Service attack.
• Kerberos tickets are not renewed frequently, so hackers may capture the tickets and use the system.
• Deployed code should be Kerberos-compliant, so separate planning and testing are needed to add authentication to the code.
• If the KDC fails, HDFS or MapReduce will not work.
• The KDC does not have any strategy to identify authentication breaches.
To reduce the above disadvantages, Delegation Tokens are used. The user uses a Delegation Token to authenticate with servers, and MapReduce processes use these delegation tokens to authenticate themselves to the NameNode whenever they want to access HDFS. Delegation Tokens use an HMAC mechanism and are stored as a HashMap on the server, where the key contains public information and the value contains private information (a conceptual sketch of this HMAC mechanism is given in the code below).
Block Access Token [11]: For data block accesses on data nodes, the client should be authenticated, so the NameNode provides a Block Access Token, which is valid only for a limited time (default 10 h) and cannot be renewed after that time. In [14], an authentication protocol based on the Trusted Platform Module (TPM) was developed, through which authentication is provided to Hadoop's internal components. The problem with this TPM approach is that it assumes the NameNode is trustworthy.
B. Authorization: In HDFS, authorization is primarily governed by file permissions. Access to a file or directory in HDFS requires permissions. Similar to Linux systems, permissions like read, write and delete are given to the owner, group and others. Any member of the group defined in dfs.permissions.superusergroup on the NameNode can read, write or delete any file and directory. HDFS supports three additional special permissions: sticky, setgid and setuid. The sticky bit is used for directories, such as /tmp, where all users should have write access to the directory but only the owner of the data can delete it. Hadoop enables authorization based on ACLs (Access Control Lists) [15, 16]. It supports access control lists on the job queues by controlling which users can submit jobs to queues and which users can administer a queue. Apache Sentry was developed in order to resolve such issues of access control; in Apache Sentry [17], fine-grained role-based access control (RBAC) gives administrators the flexibility to control what can be accessed by users. Extended ACLs were introduced from Hadoop version 2.4 and are enabled on the NameNode by setting the configuration property dfs.namenode.acls.enabled to true in hdfs-site.xml. Authorization in Hadoop is also supported at the service level by setting the hadoop.security.authorization variable to true in core-site.xml. This checks which users or groups of users can access which protocols so as to stop unauthorized access. The actual authorization policies are configured in the hadoop-policy.xml file.
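The HMAC idea behind delegation tokens can be sketched conceptually in a few lines of Python. This is a simplified illustration of the mechanism only, not Hadoop's actual token format or key management.

```python
import hmac, hashlib, json

SERVER_SECRET = b"master-key-known-only-to-the-namenode"   # illustrative secret

def issue_token(owner, renewer, max_date):
    """Return the public part of a token plus an HMAC the server can later verify."""
    public = json.dumps({"owner": owner, "renewer": renewer, "maxDate": max_date})
    mac = hmac.new(SERVER_SECRET, public.encode(), hashlib.sha256).hexdigest()
    return public, mac

def verify_token(public, mac):
    """Recompute the HMAC over the public part and compare in constant time."""
    expected = hmac.new(SERVER_SECRET, public.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, mac)

public, mac = issue_token("alice", "yarn", "2021-12-31")
assert verify_token(public, mac)                                   # valid token
assert not verify_token(public.replace("alice", "mallory"), mac)   # tampered token
```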


HDFS, MapReduce, and YARN all support authorization at the service level. MapReduce and YARN do not control access to data; they only control access to cluster resources such as memory, disk, CPU, and network I/O.

C. Auditing: Auditing means keeping track of what users and services are doing in the cluster [18]. In HDFS, audit.log records user activities such as creating a file or changing file permissions, while SecurityAuth-hdfs.audit is used for auditing at the service level. The log4j settings involved are log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit and log4j.category.SecurityLogger. Auditing in MapReduce focuses on end-user queries and jobs and uses mapred-audit.log; SecurityAuth-mapred.audit is used for service-level authorization auditing. The corresponding log4j settings are log4j.logger.org.apache.hadoop.mapred.AuditLogger and log4j.category.SecurityLogger.

D. Encryption: Encryption covers both data-at-rest and data-in-transit encryption.

1. HDFS Data-at-Rest Encryption: Data-at-rest encryption encrypts data at the application layer before it is sent in transit and reaches storage. It runs above the operating-system layer and requires only Hadoop packages or hardware. Within HDFS, the directory paths to be encrypted are grouped into encryption zones. Each file in an encryption zone is encrypted with a unique data encryption key (DEK) [19]. Because plain-text DEKs are never persisted, a zone-level key, the encryption zone key (EZK), is used to encrypt each DEK into an encrypted DEK (EDEK). EZKs must not be stored in HDFS, since that would make decryption easy; they must be accessed through a secure key server. In large enterprises, the actual key storage is handled by a dedicated hardware security module (HSM). The Hadoop Key Management Server (KMS) sits between HDFS clients and the key server: it handles both EZKs and DEKs, communicates with the key server, and decrypts EDEKs. The KMS talks to the key server through a Java API called the KeyProvider (a conceptual sketch of this envelope-encryption idea is given at the end of this subsection).

2. Data-in-Transit Encryption: SSL/TLS are the protocols used to secure data moving through the network; they can secure any socket connection and rely on a certificate authority (CA). Hadoop uses RPC, TCP/IP, and HTTP [20, 21] to communicate over the network: RPC calls are used by API clients of MapReduce, the JobTracker, TaskTrackers, the NameNode, and DataNodes; HDFS uses TCP/IP sockets for data transfer; and MapReduce shuffles use HTTP.

(a) Hadoop RPC Encryption: Hadoop's RPC implementation supports SASL, which provides integrity, confidentiality, and authentication depending on the configured protection level.


The RPC protection level is configured in core-site.xml with the hadoop.rpc.protection property.

(b) HDFS TCP/IP Encryption: HDFS transfers data between clients and DataNodes over direct TCP/IP sockets. To encrypt these transfers, dfs.encrypt.data.transfer is set to true in hdfs-site.xml.

(c) Hadoop HTTP Encryption: HTTPS is used to encrypt HTTP-based data in transit, such as the MapReduce shuffle.
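The DEK/EZK/EDEK scheme described under data-at-rest encryption is an instance of envelope encryption. The sketch below is a conceptual Python illustration using the `cryptography` package, not HDFS's actual key-handling code: a fresh per-file DEK encrypts the data, and a zone key wraps the DEK so that only the wrapped EDEK needs to be stored alongside the file.

```python
# pip install cryptography  (assumption: available in the environment)
from cryptography.fernet import Fernet

# Zone-level key (EZK): kept in a key server/HSM, never stored in HDFS
ezk = Fernet.generate_key()
zone_cipher = Fernet(ezk)

def encrypt_file(plaintext: bytes):
    """Encrypt data with a fresh per-file DEK, then wrap the DEK with the EZK."""
    dek = Fernet.generate_key()              # data encryption key (DEK)
    ciphertext = Fernet(dek).encrypt(plaintext)
    edek = zone_cipher.encrypt(dek)          # encrypted DEK (EDEK), safe to store
    return ciphertext, edek

def decrypt_file(ciphertext: bytes, edek: bytes) -> bytes:
    """Unwrap the DEK with the EZK (the KMS's job), then decrypt the data."""
    dek = zone_cipher.decrypt(edek)
    return Fernet(dek).decrypt(ciphertext)

if __name__ == "__main__":
    ct, edek = encrypt_file(b"sensitive record")
    assert decrypt_file(ct, edek) == b"sensitive record"
```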

4 Some Security Techniques

Data Leakage Prevention (DLP) technology [22], introduced around the year 2000, is used to protect data against leakage. Its drawback is that once data has been removed, DLP can no longer protect it. Verizon released a white paper [23] on cloud security whose model is divided into four security layers:

• Base: takes care of physical security.
• Logical: checks the integrity, availability, and confidentiality of data and network resources; it has network, compute, management, and storage sublayers.
• Value-Added: provides private IP network, firewall, and VPN capabilities.
• Governance, Risk and Compliance: ensures that the security measures of the three layers above are in place.

Based on the Verizon model, the cloud company Twilio introduced a security architecture that runs Hadoop reliably on Amazon S3 services, using S3 policies and ACLs.

According to [24], because MapReduce computation is distributed in nature, it is exposed to a variety of attacks:

• Impersonation attack: an illegitimate user poses as a legitimate user (for example, via brute force) and runs MapReduce jobs, resulting in data leakage.
• Denial of Service attack: an attacker stops mappers or reducers from functioning or being accessed by flooding them with undesirable tasks.
• Replay attack: an attacker resends previous tasks to the DataNodes to keep them continuously busy.
• Eavesdropping attack: an attacker observes input data and the intermediate and final outputs without performing the MapReduce computation.
• Man-in-the-middle attack [25]: an attacker modifies or corrupts the computing code exchanged between two legitimate parties.

Secure MapReduce computation therefore requires proper authorization, authentication, restricted access, confidentiality, and validated input to the mapper and reducer classes. The author in [11] proposes a Bull Eye Algorithm for Hadoop.


This algorithm allows data to be read or written only by authorized persons and, when deployed, checks that the data is encrypted for better protection; only the highly confidential data stored on a DataNode is checked. A second approach given in [11], the NameNode approach, increases the availability and security of the data by using two NameNodes, one master and one slave. Name Node Security Enhance (NNSE) provides these two redundant NameNodes, and both use the Bull Eye Algorithm.

Apache Knox [26, 27] is a framework for securing Hadoop clusters. It is a REST (Representational State Transfer) API gateway through which clients interact with the cluster via a single access point. System administrators manage authentication using LDAP [28] and Active Directory, and through Knox they can enforce HTTP header-based federated identity management and audit hardware on the clusters. Apache Ranger [17] is a centralized framework for managing resource-level policies; it provides tools and techniques to standardize security across Hadoop clusters and also provides authorization in Hadoop. Apache Rhino provides a security solution for the Hadoop ecosystem: it is a framework based on a crypto codec that offers block-level encryption of data in Hadoop, token-based authentication, and an SSO solution. It also supports the key distribution and management functions needed to encrypt data blocks during MapReduce jobs, and it provides an audit logging framework [29].

5 Conclusion and Future Scope

In this paper, the levels of security defined by Cloudera, namely authentication, authorization, audit, and encryption, have been discussed. At each level, features have been added to provide security, but each level has its limitations. We also discussed further techniques for securing Hadoop. The existing features clearly indicate that Hadoop's security framework is confined to authentication and authorization: there is no mechanism to detect malicious JARs or harmful code in Pig scripts and Hive queries, and even a legitimate user may execute a JAR containing harmful code. A concrete solution to detect such code and prevent it from executing on HDFS is therefore needed.

References 1. Tang, Y., Yang, J.: Secure Deduplications of General Computations. Columbia University 2. Geczy, P.: Big data characteristics. Macrotheme Rev. 3(6), 94–104 (2014) 3. Khan, N., Naim, A., Hussain, M.R., Ahmad, N., Qamar, S.: The 51 V’s of big data: survey technologies characteristics opportunities issues and challenges. In: Proceedings of ACM Omni-layer Intelligent Systems Conference (COINS’19), ACM, Heraklion, Crete, May (2019) 4. Martis, M., Pai, N.V., Pragathi, R.S., Rakshatha, S., Dixit, S.: Comprehensive survey on hadoop security. Springer Nature Singapore Pte Ltd, (2019) 5. Horwitz, J., Nugent, A., Halper, F., Kaufman, M.: Big Data for Dummies. Wiley (2013)


6. Akshata, Chandrashekhar, B.S.: An execution for security scheme in hadoop. MAT J. Comput. Sci. Eng. Softw. Testing 4(2) 7. Das, D., O’Malley, O., Radia, S., Zhang, K.: Adding Security to Apache Hadoop. HortonWorks (2011) 8. Erraissi, A., Belangour, A., Tragha, A.: A big data hadoop building blocks comparative study. Int. J. Comput. Trends Technol. (IJCTT) 48(1):336 (2017). ISSN: 2231-2803 http://www.ijcttj ournal.org 9. Securosis: Securing Hadoop: Security Recommendations for Hadoop Environments. Securosis. White paper, Mar 2016 (2014-06-13). Knox Gateway Available: http://knox.apache.org/ 10. Bhatal, G.S., Singh, A.: Big data: hadoop framework vulnerabilities, security issues and attacks. Elsevier 11. Saraladevi, B., Pazhaniraja, N. Paul, P.V., Saleem Basha, M.S, Dhavachelvan. P.: Big data and Hadoop—a study in security perspective. In: 2nd International Symposium on Big Data and Cloud Computing (ISBCC’2015). Elsevier 12. Kohl, J., Neuman, C.: The Kerberos network authentication service (V5). (2017) 13. O’Malley, O., Zhang, K., Radia, S., Marti, R., Harrell, C.: Hadoop security design. Yahoo, Inc., Tech. Rep (2009) 14. Dou, Z., Khalil, I., Khreishah, A., Al-Fuqaha, A., Robust insider attacks countermeasure for hadoop: design and implementation. IEEE Syst. J. (2017) 15. Shetty, M.M., Manjaiah, D.H., Hemdan, E.E.D.: Policy-Based access control scheme for securing hadoop ecosystem. Springer Nature Singapore Pte Ltd. (2019) 16. Narayana, S., Securing hadoop implement robust end-to-end security for your Hadoop ecosystem. Packt Publishing 17. Gupta, M., Patwa, F., Sandhu, R.: An attribute-based access control model for secure big data processing in hadoop ecosystem. In: ABAC’18, Mar 21 (2018), Tempe, AZ, USA, 13 18. Spivey, B., Echeverria, J.: Hadoop security: protecting your big data platform. O’Reilly Publishers (2015) 19. Cloudera Security Report, Cloudera version Cloudera Enterprise version 5.5x (2016) 20. Perwej, Y.: The hadoop security in big data: a technological viewpoint and analysis. Int. J. Sci. Res. Comput. Sci. Eng. 7(3), 1–14 (2019) 21. Parmar, R.R., Roy, S., Bhattacharyya, D., Bandopadhyay, S.K., Kim, T.-H.: Large-Scale encryption in the hadoop environment: challenges and solutions. IEEE (2017) 22. Security and Privacy in the Era of Big Data. The SMW, a Technological Solution to the Challenge of Data Leakage, Arenci/ National Consortium for Data Science White Paper 23. Sharif, A., Cooney, S. Gong, S.: Current security threats and prevention measures relating to cloud services, hadoop concurrent processing, and big data. In: IEEE International Conference on Big Data, Washington, DC, USA (2015) 24. Philip Derbekoa, S.D.E.G.S.S.: Security and privacy aspects in MapReduce on clouds: a survey. Comput. Sci. Rev. 1–28 (2016) 25. Butt, K.K., Li, G., Rehman, M.O.U.: Comparative analysis of hadoop security Ad-Ons. In: IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC) (2019) 26. Sharma, P.P., Navdeti, C.P.: Securing big data hadoop: a review of security issues, threats and solution. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 5(2), 2126–2131 (2014) 27. Vavilapalli, V.K., Murthy, A.C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, R, Lowe, J., Shah, H., Seth, S., Saha, B., Curino, C., OMalley, O., Radia, S., Reed, B., Baldeschwieler, E.: Apache hadoop YARN: yet another resource negotiator. SoCC13, Santa Clara, California, USA, Oct (2013) 28. 
Priyadharshini, M., Baskaran, R., Srinivasan, M.K., Rodriques, P.: A framework for securing web services by formulating a collaborative security standard among prevailing WS-* security standards. Springer CCIS, Springer, Heidelberg, USA, Sep. 2012, Service, vol. 193, pp. 269– 283 (2012). https://doi.org/10.1007/978-3-642-22726-4_29 29. Kim, S.-H., Lee, I.-Y.: Data block management scheme based on secret sharing for HDFS. In: 10th International Conference on Broadband and Wireless Computing, Communication and Applications (2015)

Optical Character Recognition and Neural Machine Translation Using Deep Learning Techniques K. Chandra Shekar, Maria Anisha Cross, and Vignesh Vasudevan

Abstract Over the years, the applications of text detection and text translation have expanded across various fields. Many researchers have applied deep learning algorithms to text detection and text translation separately. We propose a hybrid methodology that combines NMT with OCR to improve text detection and translation from an image. In this paper, we present techniques to detect and recognize Hindi text in a given image and translate it into English, and vice versa. To achieve this, we combine two concepts: optical character recognition (OCR) and neural machine translation (NMT). This hybrid scheme yields optimized features. Keywords Optical character recognition · Neural machine translation · Convolutional recurrent neural networks · Long short-term memory · Recurrent attention model · Encoder–Decoder model

1 Introduction

Deep learning, predominantly used in AI and machine learning applications, enables a system to learn in a human-like way and to improve its capability through training data. Deep learning methods [1] can learn feature representations through unsupervised or supervised learning, with progressively higher and more abstract layers. Deep learning is currently used in image applications, big data analysis, machine translation, and speech recognition.


Optical character recognition (OCR) is the mechanical or electronic conversion of images of printed, typed, or handwritten text into encoded text that a computer system can process, edit, and store as a text file. The images may include scanned documents, photographs of documents, photographs of scenes containing text, legal forms, street signs, vehicle number plates, shipping container numbers, and ID cards. We use this concept to detect and recognize the text in the images of our problem statement [2].

The recurrent attention model (RAM) is based on the idea that when the human eye is shown a scene, certain parts of the image attract its attention, and the eye gathers information by focusing on those parts first. In the model, the image is cropped to various sizes around a common center, and glimpse vectors are formed from the salient features of each cropped version. The glimpse vectors are passed to a location network, which uses an RNN to predict the next part of the image to focus on; that location becomes the next input to the glimpse network. The model thus inspects additional parts of the image, each time performing backpropagation to check whether the information from the previous glimpses is sufficient to reach a high level of accuracy. RAM achieves high accuracy by "glimpsing" at the image from different points of view and then classifying the output. DRAM uses two RNNs: a location RNN to predict the next glimpse location and a classification RNN dedicated to predicting class labels.

Convolutional recurrent neural networks (CRNNs) combine two of the most prominent neural network families: a convolutional neural network (CNN) followed by a recurrent neural network (RNN). A CRNN works in layers that break images into segments, recognizes relationships among characters, and then produces the output text.

Machine translation is a technique to translate a sentence from one language to another with the help of computerized systems, without any human assistance [3]. Various methodologies exist for building such systems [1], but a more robust strategy is needed to make them more efficient than the current ones. Jiajun Zhang and Chengqing Zong [4] gave an extensive overview of the use of DNNs in machine translation from two perspectives: indirect application, which attempts to improve standard MT systems, and direct application, which adopts DNNs to build a purely neural MT model. A well-trained network drives the framework toward its objective, which is to produce a more effective translation system capable of achieving good accuracy. A statistics-based system can be built from training data in which text has been translated into several languages: thousands of possible translations are considered, and the probability of each is estimated from the training data. This is done in three stages:

Step 1: Divide the original sentence into multiple pieces.
Step 2: Locate every possible translation for each piece.
Step 3: Generate all possible sentences and locate the most likely one.


Statistical models are a challenge to build and maintain, since constructing multiple pipelines and managing large amounts of training data is an excessive amount of work. To overcome this, we use two concepts: RNNs and the encoder–decoder model. Used together, they allow us to build a self-training translation model. A recurrent neural network (RNN) is a variant of a neural network in which the previous state is fed back as part of the next input. RNNs are designed to capture the sequential nature of data and use its patterns to predict the next likely outcome; the next most likely word is predicted from the first few words of a sentence. Three kinds of RNN cells are commonly used: the simple RNN, the LSTM, and the GRU. The simple RNN limited the training of deep RNNs, so the long short-term memory (LSTM) cell was developed to address this vanishing-gradient problem, and the gated recurrent unit (GRU) was built to simplify the LSTM. Both the GRU and the LSTM have been shown to be substantially better than the simple RNN, and in general the LSTM performs better; LSTM cells consistently beat GRU cells in our tests. The final objective of our NMT model is to identify the language of the detected input text and translate it into the desired language, which is returned as output. Specifically, we need a way to transform sentences into a data format that can be fed into a machine learning model; this is done by converting the textual data into numeric form using encoders. The key advantages of this methodology are the ability to handle variable-length input and output sequences of text and the ability to train a single end-to-end model directly on the source and target sentences.

2 Related Work

Vijaya Kumar Reddy et al. [5] proposed an alternative neural network approach for the recognition of handwritten Hindi characters. G. Vamvakas et al. [6] proposed a complete OCR methodology for identifying text in historical documents, applicable to either handwritten or printed documents. Shashi Pal Singh et al. [7] found that RNNs and RAEs give better results in text processing than other neural networks. Sarkhel et al. [8] proposed a multi-scale deep quad-tree-based feature extraction technique for the recognition of offline handwritten characters of popular Indic scripts. Shahnawaz et al. [9] proposed a neural network-based methodology for machine translation from English to Hindi.


3 Proposed Methodology

Different models and functions have been used in the research reviewed so far for text detection and text translation. In this study, we propose a model capable of performing both tasks by using OCR and NMT in the following way:

Step 1: Image preprocessing
• Removal of noise present in the image.
• Removal of the ambient background.
• Handling of varying lighting conditions.

Step 2: Using an LSTM cell as a component of a CRNN to divide the image into columns, identify relationships between characters, and then generate the text (a sketch of this architecture is given after this list).
• A standard convolutional neural network (CNN) forms the first layer; it breaks the image into features and divides it into feature columns.
• These columns are fed to a deep bidirectional long short-term memory (LSTM) cell, which models the relationships between the characters.
• The output of the LSTM cell is passed to a transcription layer, which takes the character sequence, including redundant characters, and uses a probabilistic approach to clean the output.

Step 3: Using the LSTM cell as a component of an encoder–decoder model to identify the language of the detected text and translate it.
• Each word is transformed into a one-hot encoding vector, which is then fed into the model. A one-hot encoding vector is simply a vector with "0" at every index except for a "1" at the single index corresponding to that word. Each word therefore has a distinct one-hot vector, and every word in the dataset can be represented by a numerical index.
• To develop the encoding, the sentence is fed into the RNN word by word. The final result, obtained after the last word is processed, is a set of values that represents the entire sentence.
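As a rough illustration of the CRNN described in Step 2, the following Keras sketch stacks convolutional feature extraction, a reshape into a column sequence, and a bidirectional LSTM, with a per-timestep softmax standing in for the transcription layer. The input size, character-set size, and layer widths are assumptions for illustration, not the exact configuration used in this work.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 64          # assumed character-set size (including a blank symbol)
IMG_H, IMG_W = 32, 128    # assumed grayscale input size

inputs = layers.Input(shape=(IMG_H, IMG_W, 1))

# CNN: extract features and reduce height so each column becomes one timestep
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D((2, 2))(x)              # 16 x 64
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D((2, 2))(x)              # 8 x 32

# Reshape the feature map into a sequence of 32 column vectors
x = layers.Permute((2, 1, 3))(x)                # width, height, channels
x = layers.Reshape((32, 8 * 64))(x)

# Bidirectional LSTM models relationships between characters along the width
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)

# Per-timestep character probabilities; a CTC-style transcription step would
# normally decode this sequence into the final text
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

crnn = models.Model(inputs, outputs)
crnn.summary()
```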

Fig. 1 Encoder–decoder model


• As shown in Fig. 1, two RNNs are placed back to back: the first RNN generates the encoding that represents the recognized sentence, and the second RNN takes that encoding and applies the same logic in reverse to decode the sentence. By training the second RNN on parallel corpora, it can be made to decode the sentence into Hindi or any other target language. A sketch of this encoder–decoder arrangement follows.
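The back-to-back RNN arrangement of Fig. 1 corresponds to a standard sequence-to-sequence model. The sketch below is a minimal Keras version with LSTM cells; the vocabulary sizes and latent dimension are placeholders, and the model actually trained in this work may differ in depth and preprocessing.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

SRC_VOCAB, TGT_VOCAB = 8000, 8000   # assumed vocabulary sizes
LATENT = 256                        # assumed size of the sentence encoding

# Encoder: reads the source sentence and summarizes it in its final states
enc_inputs = layers.Input(shape=(None,))
enc_emb = layers.Embedding(SRC_VOCAB, LATENT)(enc_inputs)
_, state_h, state_c = layers.LSTM(LATENT, return_state=True)(enc_emb)

# Decoder: starts from the encoder states and emits the target sentence
dec_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(TGT_VOCAB, LATENT)(dec_inputs)
dec_seq, _, _ = layers.LSTM(LATENT, return_sequences=True,
                            return_state=True)(dec_emb,
                                               initial_state=[state_h, state_c])
dec_outputs = layers.Dense(TGT_VOCAB, activation="softmax")(dec_seq)

seq2seq = models.Model([enc_inputs, dec_inputs], dec_outputs)
seq2seq.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
seq2seq.summary()
```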

4 Results and Discussions

For our experiments, we used the Devanagri character dataset and the Street View Text dataset to train our model to locate and recognize text in Hindi and English.

Street View Text Dataset: Dealing with images that involve ambient noise, lighting issues, and image artifacts is a highly demanding OCR task, and legacy OCR algorithms cannot normally process the images in this dataset. A sample image is shown in Fig. 2. The dataset provides only word-level annotations (no character bounding boxes) and can therefore be used only for
• recognition of cropped, lexicon-driven words, and
• detection and recognition of lexicon-driven words in the full image.

Devanagri Character Dataset: This dataset contains 1800 samples of 36 characters obtained from 25 different writers of the Devanagri script. Each character is stored in a distinct comma-separated text file of roughly 4 KB. The organized datasets mirroring the 36 classes are stored in folders, with 50 samples inside each class folder. A sequence of pen-tip coordinates from pen-down to pen-up movement is treated as one stroke, as shown in Fig. 3; the pattern of strokes made in a pen movement is captured by a digitizer.

Fig. 2 Street view text data


Fig. 3 Devanagri character data

Table 1 Statistics of the IIT Bombay English Hindi dataset

             Language   Train        Test     Dev
#Sentences              1,492,827    2507     520
#Tokens      eng        20,667,259   57,803   10,656
             hin        22,171,543   63,853   10,174
#Types       eng        250,782      8957     2569
             hin        343          8489     2625

Fig. 4 Final translation

IIT Bombay English Hindi Parallel Corpus: We used the IIT Bombay English Hindi Parallel Corpus [10] to train our model to translate the detected and recognized text from Hindi to English and vice versa. The approach proposed in this study generates enhanced and optimized features by using LSTM cells both in the CRNN and in the encoder–decoder model. The statistics for the number of sentences, tokens, and types in the different data splits are given in Table 1. The trained model was able to detect the text and translate it, as shown in Fig. 4.

5 Conclusion

In this paper, we proposed an alternative neural network framework for detecting and recognizing Hindi text in an image and translating it into English, and vice versa. We used LSTM cells in both the CRNN and the encoder–decoder model to build a system that performs OCR and NMT together. The approach was trained and tested on a standard, user-defined dataset collected from different users. Future work will concentrate on improving the current recognition and translation results by integrating OCR and NMT more efficiently, and on moving toward hybrid, generic intelligent systems to further improve recognition and translation accuracy.


References 1. Cheragui, M.A.: Theoretical Overview of machine translation. African University, Adrar, Algeria, Icwit (2012) 2. Ciregan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012 3. Hutchins, W.J.: Machine translation: past, present, future. (Ellis Horwood Series in Computers and their Applications. Chichester, Ellis Horwood, 1986. 382p. ISBN: 0-85312-788-3 4. Zhang, J., Zong, C.: Deep neural network in machine translation. In: Institute of Automation, Chinese Academy of Sciences, IEEE International Conference on Computer, Communications and Electronics (2017) 5. Reddy, R.V.K., Babu, U.R.: Handwritten hindi character recognition using deep learning techniques. Int J Comput Sci Eng (2019) 6. Vamvakas, G., Gatos, B., Stamatopoulos, N., Perantonis, S.J.: A complete optical character recognition methodology for historical documents. In: IEEE Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”, GR-153 10 Agia Paraskevi, Athens, Greece, 2008 7. Singh, S.P., Kumar, A., Darbari, H., Singh H., Rastogi, A., Jain, S.: AAI, center for development of advanced computing, Pune, India. In: Conference: International Conference on Computer, Communications, and Electronics (Comptelix), 2017 8. Sarkhel, R., Das, N., Das, A., Kundu, M. Nasipuri, M.: A multi-scale deep quad tree-based feature extraction method for the recognition of isolated handwritten characters of popular indic scripts. Pattern Recogn. (2017) 9. Shahnawaz, Mishra, R.B.: A neural network-based approach for english to hindi machine translation. Int. J. Comput. Appl. 53 (2012) 10. Kunchukuttan A, Mehta P, Bhattacharyya P.: The IIT Bombay english-hindi parallel corpus. In: Language Resources and Evaluation Conference (2018)

COVID-19 Touch Project Using Deep Learning and Computer Vision Chatla Venkat Rohit and GRS Murthy

Abstract The world is suffering from the COVID-19 pandemic, and out of empathy we want to serve society in whatever way we can. We propose using technology to help maintain physical distancing and to track the sanitizing behavior of each person at public places, including ATMs and supermarkets. Since monitoring every person manually is very difficult, we use detection models to solve the problem. Our model can be deployed easily and integrated into a network of existing CC cameras so that society benefits directly. The model runs at close to 60 FPS on average, with a maximum of 92 FPS, as a trade-off between speed and accuracy, and it detects a hand touching a door, chair, or any surrounding object, with the corresponding details, at an accuracy score of 96%. Keywords Computer vision · Cloud functions · YOLOv3 · Darknet · Pub/sub pipeline · Transfer learning

1 Introduction

The COVID-19 pandemic has spread across the world like wildfire through unnoticed physical contact between people. Keeping in mind that public places are visited daily, we propose a model that tracks the contacts people's hands make with the objects in the camera's field of view and alerts them whenever they bring a hand close to the face without washing or sanitizing it. In this work, we consider ATMs and shopping markets as the places that most need this model for limiting the spread of COVID-19. Hand movements and hand sanitization can both be detected. Every detail of a touch, with timestamps and skeleton information, is stored in a database, and


using cloud functions, notifications are triggered to citizens whenever they touch their face without having washed or sanitized their hands. Our models also sense whether a person is wearing a mask and gloves before granting access to public places. Continuous camera monitoring can be achieved by installing the model at places like ATMs and shopping malls, where doors are touched frequently, and supermarkets, where people come into contact with many objects (fruits, vegetables, etc.). By integrating our model with existing CC cameras, officials need to sanitize only the touched areas instead of the whole market, saving the cost and time of excessive cleaning. This will also help officials track the travel and contact history of COVID-19-positive patients.

2 Related Works

Many projects adopt YOLOv3 for different purposes, but to our knowledge none tracks COVID-related touches and alerts users based on the detection of human movements and actions. Recently, India launched its main contact-tracing technology, the Arogya Setu app [1], which tracks nearby users and alerts a client if a COVID-positive patient is reported in their vicinity. However, it is fully effective only when all users keep Bluetooth and GPS turned on continuously so that tracking is uninterrupted, whereas our model can be integrated into shopping malls and ATMs seamlessly as a simple software plug-in to existing CC cameras, without installing any sensors. Arogya Setu also has privacy issues, such as the possibility of tampering with its internal database, which we hope will be solved.

3 Methodology

If we can monitor which objects people make contact with, we can help prevent the disease from spreading. Our project uses YOLOv3 for object detection to recognize the touches made by people near ATMs, places that we visit frequently (Fig. 1).

3.1 Working of YOLOv3

The YOLO model is fast and accurate enough to recognize different objects and is based on a regression mechanism. The model processes each frame at runtime and updates its detections as required. It uses a convolutional neural network (CNN) for real-time object detection [2].


Fig. 1 Architecture diagram of corona touch project

Each frame is divided into regions with associated bounding-box probabilities, and the boxes are then weighted to distinguish between different objects [3]. A sketch of running the detector on a frame is given below.
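A minimal sketch of running a YOLOv3 network on a frame with OpenCV's DNN module is shown below. It assumes a Darknet config/weights pair is available locally; the file names and thresholds are placeholders, and the project's own wrapper scripts may differ.

```python
import cv2
import numpy as np

# Assumed local files: a trained Darknet config and weights pair
net = cv2.dnn.readNetFromDarknet("yolov3-custom.cfg", "yolov3-custom.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect(frame, conf_threshold=0.5):
    """Return (class_id, confidence, (x, y, w, h)) tuples for one frame."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    boxes = []
    for output in net.forward(layer_names):
        for det in output:                     # det = [cx, cy, bw, bh, obj, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > conf_threshold:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                x, y = int(cx - bw / 2), int(cy - bh / 2)
                boxes.append((class_id, confidence, (x, y, int(bw), int(bh))))
    return boxes
```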

3.2 Preprocessing Steps and Training

We preprocessed images of the human hand, doors, and Common Objects in Context (COCO). We used Open Images Dataset V6 [4] with the OIDv4_Toolkit to download the images and wrote a custom script to convert the Protobuf-formatted labels into the YOLOv3 coordinate format [5, 6]. YOLOv3 was trained starting from the darknet53 convolutional pre-trained weights and classes: the classes chair, door, and human hand were trained directly, and transfer learning was used for the COCO objects. Because this setup is used with a custom model, we wrote Python scripts as a wrapper around the original Darknet repository; these scripts detect and track (Deep SORT) human hands touching the objects. To make the dataset serve our needs, we cropped out unnecessary background and focused on the required foreground objects. A sketch of the label conversion is given below.
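The label conversion mentioned above maps corner-style box coordinates to YOLO's normalized center format (class id, x_center, y_center, width, height, all relative to the image size). A small sketch of that conversion is shown here; the exact field layout of the downloaded annotations is an assumption.

```python
def to_yolo(xmin, ymin, xmax, ymax, img_w, img_h, class_id):
    """Convert corner coordinates (pixels) to YOLO's normalized center format."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a 200x300 box at (100, 150) in a 1280x720 image, class 2 (human hand)
print(to_yolo(100, 150, 300, 450, 1280, 720, 2))
```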

3.3 Darknet

Darknet is an open-source neural network framework written in C and CUDA [7]. It can be built with two optional dependencies: OpenCV and CUDA.


Fig. 2 Overlapping model for tracking

Computation can be accelerated by shifting processing from the CPU to the GPU through Darknet's CUDA build. Using OpenCV alongside Darknet gives the model more flexibility for detecting objects in images and videos, and this flexibility makes Darknet a popular choice.

3.4 OpenCV

In our project, Darknet/YOLOv3 is used for custom training, and OpenCV has to be built from scratch as a binding layer because Darknet/YOLOv3 is written in C. After custom training, we used custom Python wrapper functions for detecting and tracking (Deep SORT) objects. OpenCV handles image and video manipulation such as opening the stream, and the contours (rectangular boxes) are also drawn using OpenCV [8]. OpenCV is a computer vision library with more than 2500 optimized algorithms for computer vision and machine learning [9]; these algorithms can track camera coordinates or movements, extract 2D and 3D models from images, and manipulate camera, video, and image streams [10]. We use box-overlap tests to detect both the touch event and the hand-coming-to-face event: for example, if the red box is the door (Fig. 2) and the white box is the hand, then whenever the white box touches, overlaps, or falls inside the red box, the hand is considered to be touching the door [6]. A sketch of this overlap test follows.
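A minimal version of the box-overlap test described above is sketched below. Boxes are given as (x, y, width, height); the function reports whether two boxes intersect, which is how a hand box touching a door or chair box can be flagged.

```python
def boxes_overlap(box_a, box_b):
    """Return True if two (x, y, w, h) rectangles intersect."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # No overlap if one box lies entirely left/right of or above/below the other
    if ax + aw < bx or bx + bw < ax:
        return False
    if ay + ah < by or by + bh < ay:
        return False
    return True

# Example: a detected hand box and a door box
hand = (400, 220, 60, 60)
door = (380, 100, 200, 400)
print(boxes_overlap(hand, door))   # True -> treat as "hand touching door"
```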

3.5 Posenet

We used Posenet to detect the hand-washing/sanitization action. Our function returns pose information (left/right wrist points, etc.), skeleton information (elbow and arm lines, etc.), keypoints, and a confidence/accuracy score for each property. PoseNet is a robust, real-time monocular relocalization system: a convolutional neural network trained to regress the 6-DOF camera pose from a single RGB image end to end, with no additional engineering or graph optimization [11]. Posenet's poses and skeletons helped us build a model


Fig. 3 Detecting wash/sanitize action

for the custom washing/sanitizing actions, using custom coordinate calibrations (Fig. 3); a sketch of such a wrist-keypoint check is given below.
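How wrist keypoints might be turned into a wash/sanitize decision can be sketched as follows. The keypoint format (a dict of named parts with x/y and a score) and the sink region are assumptions for illustration; the actual calibration used in the project is not reproduced here.

```python
# Assumed keypoint format: {"leftWrist": {"x": ..., "y": ..., "score": ...}, ...}
SINK_REGION = (500, 300, 200, 150)   # assumed (x, y, w, h) of the wash basin in the frame
MIN_SCORE = 0.5

def inside(region, x, y):
    rx, ry, rw, rh = region
    return rx <= x <= rx + rw and ry <= y <= ry + rh

def is_washing(keypoints) -> bool:
    """Flag a wash action when both wrists are confidently inside the sink region."""
    wrists = [keypoints.get("leftWrist"), keypoints.get("rightWrist")]
    return all(
        w is not None and w["score"] >= MIN_SCORE and inside(SINK_REGION, w["x"], w["y"])
        for w in wrists
    )

# Example
pose = {"leftWrist": {"x": 560, "y": 360, "score": 0.9},
        "rightWrist": {"x": 610, "y": 345, "score": 0.8}}
print(is_washing(pose))   # True
```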

3.6 Tensorflow.js

We used Tensorflow.js for hand and face detection, which in turn is used for the hand-coming-closer-to-face action (Fig. 4). Deep SORT is used for frame-by-frame tracking, assigning distinct IDs to similar objects so that each object can be tracked throughout the video feed once it has been detected in a frame; if its presence is lost, tracking stops for the static-camera video feed. To reach our goal, we tested a few models, gathered their meta-information, and kept the one that outperformed the others. We mainly used YOLOv3 and the Deep SORT algorithm [2], customized with Posenet and OpenCV as described above, for the use cases shown in the following section.

Fig. 4 Overlapping model for hand to face detection


4 Scenarios for Use Case and Working Applications

4.1 Introduction to the In-Home-Based Model

Whenever you touch any item in your house, say a chair, with your hand, the model tracks this through the CC camera feed (24 × 7) [6], and the corresponding metadata is sent to the database. Along with the metadata, the database holds three key attributes: (1) Touch, (2) Wash, and (3) Detect. All are preset to 0 and are binary-valued (logic high denoted by 1).

How cloud functions and pub/sub are used: Whenever a touch between a human hand and a chair/door is detected (Fig. 5), a cloud function is triggered to change the Touch variable to 1 and store the metadata (object coordinates, timestamp, contours, etc.) in the database. Whenever you wash your hands (Fig. 6), a cloud function uses the Posenet/ml5.js pose and skeleton information to set Wash to 1, store the wash metadata in the database [12], and reset Touch to 0; otherwise Touch remains 1 and Wash remains 0 (Fig. 7). Whenever the hand coming toward the face is detected (Fig. 8), a cloud function sets the Detect value to

Fig. 5 Hand touching chair tracked (1) Touch == 1: notify chair coordinates, etc.

Fig. 6 Posenet’s coordinates, poses, and skeletons info (2) Wash == 1: wash details


Fig. 7 Wash action detected using Posenet/ml5.js

Fig. 8 Hand detected while bringing closer to face using webcam on laptop and notified for the same (3) touch == ‘1’ && detect == ‘1’

1; when no hand-to-face movement is detected, the value is changed back to 0 in real time. Whether to buzz the alarm is then decided by simply checking the attributes: the alert and mobile notification are fired only when both Touch and Detect are 1, although notifications can also be sent at other stages depending on the attribute states (high/low) [6]. A sketch of this decision logic is given below.
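The Touch/Wash/Detect logic can be summarized in a few lines. The sketch below is a simplified, self-contained version of what the cloud functions are described as doing; the event names and the notification call are placeholders, not the project's actual cloud API.

```python
state = {"touch": 0, "wash": 0, "detect": 0}

def notify(message):
    # Placeholder for a push notification / alarm trigger
    print("ALERT:", message)

def handle_event(event):
    """Update the binary attributes and fire an alert when needed."""
    if event == "hand_touched_object":
        state["touch"] = 1
    elif event == "hands_washed":
        state["wash"] = 1
        state["touch"] = 0          # washing clears the pending touch
    elif event == "hand_near_face":
        state["detect"] = 1
    elif event == "hand_away_from_face":
        state["detect"] = 0

    # Alarm only when an unwashed touch coincides with a hand near the face
    if state["touch"] == 1 and state["detect"] == 1:
        notify("Hand near face after touching an object without sanitizing")

# Example sequence
for e in ["hand_touched_object", "hand_near_face"]:
    handle_event(e)
```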

4.2 ATM/Supermarket Scenario

In a real-world deployment, the in-home model can be adjusted to particular needs. In ATMs, a session scheme can be used for each person, covering the complete transaction cycle from entering to leaving the ATM. This session is used for authorization/identity (Fig. 9), so that a person's touches are stored in their own table in the database instead of relying on complex and less accurate face detection. The person's actions, such as touching the door (Fig. 10) or the currency, are then recorded; if they sanitized their hands before entering the ATM, the display board outside shows a safe sign, otherwise it shows the touch history (touch details) of others who did not follow the rules (Fig. 11), without revealing their identity. The mechanism can also support personal hygiene by adding additional rules such as sanitizing after touching the door or currency, which will


Fig. 9 ATM card authenticator (outside view)

Fig. 10 Person not touching the door

Fig. 11 Person touching the door tracked

not cause the alarm to buzz; otherwise, as in the in-home scenario of working in front of a PC/laptop, the notifications and alarm are triggered whenever a person brings their hands close to the face. The same applies to supermarkets: existing CC cameras can monitor people's touches and send that information on the fly to the inventory department, which can then sanitize those areas during non-working hours. Using session schemes, the touch history of individuals can be sent to the corresponding customers.


Fig. 12 Graphical Representation of: (i) Accuracy score (dynamic stream), (ii) GPU (training) time, (iii) memory used, and (iv) speed (FPS) of mask_R-CNN, SSD, CNN, and YOLOv3 algorithms, respectively

5 Results

5.1 Comparison of YOLOv3 Versus SSD Versus R-CNN Versus Mask R-CNN: Verdict and Test Images

See Fig. 12.

5.2 Observations of the Experimental Analysis

All the graphs in Fig. 12 are plotted by averaging over 500 images / a 2-min video used as validation/testing data, with an 80/20 split into training and testing sets. For average accuracy on a dynamic stream, YOLOv3 is ahead; R-CNN leads in average accuracy on a static stream, while Mask R-CNN tends to detect noise. The FPS graph shows that YOLOv3 is strongest for dynamic motion; since YOLOv3 is a sparse, macro-object detector, it further fits our need for dynamic scenes. We therefore opted for YOLOv3 for its faster predictions (once an object is detected, the detection remains usable for the next 10 s). The average FPS is well below the maximum FPS shown in


Fig. 13 Testing images of (i) mask_R-CNN, (ii) SSD, (iii) R-CNN, and (iv) YOLOv3

the graph, but an average of 40 FPS is still a good score for YOLOv3. Although GPU time and memory usage during training are higher for YOLOv3, training is essentially a one-time process, so this is not a practical problem in our case. Because we work with macro-scale objects (Fig. 13), the resolution is reduced by a cloud-function trigger before the frames are sent to the algorithm; this raises the FPS, bumping our average to close to 60 FPS with a maximum of 92 FPS. With this trade-off between speed and accuracy, we achieve the project goal of detecting a hand touching a door, chair, or any surrounding object along with the relevant details, while accuracy remains close to 96%; the graphs are evidence of this. Reducing the resolution by roughly an order of magnitude also lowers GPU computation, making the system cost-effective on GPU/CPU [6]. The training was carried out on the following GPU configuration: 1 × Tesla K80, compute capability 3.7, 2496 CUDA cores, 12 GB GDDR5 VRAM.

6 Conclusion and Future Works

Based on the experimental analysis and best-fit conditions, we found YOLOv3 with a pre-trained model and custom training weights (transfer learning) to be the best fit compared with other algorithms such as Mask R-CNN, SSD, and R-CNN. YOLOv3's speed and manageable accuracy suit our goal: even after compressing the image stream and fine-tuning it to our requirements, the accuracy remains on par for our project. By using this model, we can bring a small change to society by making people aware of their touches. In the future, it could be used for social-distance management by using drones or traffic CC cameras to check whether two people keep a minimum distance apart, and for detecting whether a person is wearing a mask and gloves before allowing them into public places by installing mask-authentication barriers (airports, crowded areas, etc.). Overall, the scenarios used in this project illustrate how the same system can be deployed in different cases with fine adjustments in real-world applications.


References 1. Arogya Setu App: https://www.mygov.in/aarogya-setu-app/ 2. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. (2018), arXiv:1804.02767v1 3. Tian, Y., Yang, G., Wang, Z., Wang, H., Li, E., Liang, Z.: Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agric. 157, 417–426 (2019). https://doi.org/10.1016/j.compag.2019.01.012 4. Dataset: https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train& type=detection&c=%2Fm%2F0k65p 5. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 10.1109/CVPR.2017.690 6. Covid App Extra files/pics: http://www.cvrrocket.ga/projects/touch_app 7. Darknet repository. https://github.com/AlexeyAB/darknet 8. He, K., Zhang, X., Ren, S. et al.: Deep residual learning for image recognition. In: Computer vision and pattern recognition (cs.CV). (2015b), arXiv:1512.03385 9. https://www.pyimagesearch.com/2018/11/12/yolo-object-detection-with-opencv 10. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) 11. Kendall, A., Grimes, M., Cipolla, R.: PoseNet: a convolutional network for real-time 6-DOF camera, (2015). arXiv:1505.07427 12. https://youtu.be/iwSNO8O-xvY (Project Video (Demo in RealTime))

Flood Relief Rover for Air and Land Deployment (FRRALD) Jewel Moncy John, Justin Eapen, Jeffin John, Ebin Joseph, and Abraham K Thomas

Abstract An integrated drone-and-rover setup is used for the rapid recovery of people affected by a natural disaster. The rover has an onboard camera, and the visuals are relayed to a remote operator for control. The rover moves on its four wheels, and the operator manually drives both the rover and the drone using the system cameras. The rover also has a built-in GPS and PIR sensor for giving the exact location and helping in the accurate detection of victims. Using a face recognition algorithm, the operator identifies the victim; the recognition is performed against a database of people in the disaster-prone area. Medical staff can also retrieve a patient's medical information through the facial recognition algorithm for rapid medical support and recovery. A fleet of these systems can be deployed so that search and rescue can be carried out efficiently and lives can be saved. Keywords Disaster · Drone · Rover · Mapping · Rescue · Detection · Medical


1 Introduction

In catastrophic circumstances such as floods, tornadoes, or hurricanes, one of the main obstacles facing rescue and recovery teams is finding and identifying survivors and casualties at the earliest opportunity. In such situations, rescue teams often cannot determine whether anyone is alive under the rubble, which eventually worsens the disaster. Accidents can have such a destructive impact on the body that it becomes difficult to distinguish a person from surrounding material such as mud. As a result, many people may lose their lives and the situation can become uncontrollable. Rescue teams find it difficult to save people because they cannot locate them quickly enough to provide emergency medical care.

The availability of and demand for drones have risen considerably in recent years thanks to technical breakthroughs that have equipped them with sophisticated technologies such as multi-functional sensors, position trackers, and built-in cameras. Aerial drones [1], already common in industry, are also used in surveillance, military operations, and disaster relief investigations. Their flexibility and compactness make it easier to perform activities that are potentially dangerous to humans.

A literature survey was conducted on the floods that occurred in the state of Kerala, India in 2018 and 2019. They occurred in July and August, when the rainfall was above the normal average, at about 120%. The dams filled up, the government opened all of them, and a severe flood resulted. More than 700 people died, and the flood cost the state more than USD 5.6 billion. The rescue teams found it very difficult to assess the extent of the disaster and took a long time to find the victims trapped under debris from the landslides that accompanied the flood. They used helicopters for rescue and situation assessment, which proved inefficient and expensive.

The idea is to create a rover that is both waterproof and suitable for all terrains and is carried by a drone. The drone flies and locates possible victims with its onboard camera, while the drone-and-rover setup is controlled remotely by an operator. An onboard PIR sensor on the rover helps detect people trapped under debris in hard-to-reach areas. The rover is deployed when possible victims are found and can reach places such as under debris, tunnels, or inside houses where the drone cannot fly. Critical data such as the rover's position, which gives the victim's location, are conveyed to the remote operator, who passes the location to the rescue team so that it can reach the site and provide medical help. Simultaneously, the remote operator can identify the person using face recognition and retrieve the victim's medical history, such as blood group and allergies to medicines, which is helpful for the medical team.


2 Drone Design

The drone provides the lift needed to reach the affected area quickly [2] and is integrated with the rover. Its parts include the following [3].

Frame: The frame supports the essential control electronics and motors of the quadcopter. The design should be light, compatible with the rover design, and able to carry the rover with ease. It also supports the four motors that provide lift, and the geometry must keep adequate separation between the motor blades to avoid collisions. A Hobbyking SK450 frame is used; it provides enough mounting holes for screwing the other parts to the frame.

Brushless Motors: The lift and thrust are provided by four brushless motors fixed on the four arms of the quad frame. Brushless DC motors are used instead of brushed DC motors because they provide a greater thrust-to-weight ratio. The motors are controlled through the ESCs and other sensors so that the correct rpm is maintained and the motion can be controlled across the six DOF. The kV rating indicates the motor speed per volt applied; these motors are rated at 1000 kV. The rating of 12 A for 60 s indicates the maximum current that can be drawn. Each motor weighs approximately 275 g.

Propellers: Carbon fiber propellers of 12 in length, 4.5 in pitch, 36.5 g weight, and 6 mm shaft diameter are used. Four propellers are required for the four motors. The propellers must be light, balanced to reduce vibrations, and must not overheat the motors.

ESC: Each motor needs its own electronic speed controller (ESC). The ESC accepts commands as a pulse-width-modulated control signal and outputs the corresponding motor speed. The current rating of the ESC is the maximum current it can deliver without overheating the motor. A 25 A four-in-one controller is used to deliver power to the motors so that the required rpm is maintained and the drone stays stable.

Transmitter and Receiver: The radio transmitter is operated by the pilot or remote operator. It sends signals, mostly in the 2.4 GHz band, to the receiver, which processes them and passes them to the flight control unit. A higher power rating increases the range, which matters because the operator works through cameras rather than line-of-sight. A swarm of drones can act as repeaters to extend the range further. The transmitter is used to control all six DOF.

LIPO Battery: Lithium polymer batteries provide high torque to the drone motors. LIPO batteries are preferred over Li-ion batteries because of their much higher discharge capacity and their high energy density relative to their weight. A cell typically has an output voltage of 3.7 V.

Attitude Sensor: The orientation and attitude are controlled using the attitude sensors, essentially a gyroscope and an accelerometer. A six-axis inertial measurement unit (IMU) is used, consisting of an accelerometer


Fig. 1 Inertial frame of a free body

and gyroscope on the same unit. Without the attitude sensors, the drone would simply tip over and could not fly. The drone must be able to control all six degrees of freedom with as few motors as possible. The six degrees of freedom consist of translation and rotation along the x, y, and z axes, as shown in Fig. 1: the translational DOF are X, Y, and Z, and the rotational DOF are roll, pitch, and yaw. All aspects of flight are manipulated by applying different thrust values to the individual motors. When all motors run at the same rpm, the drone lifts straight up; to move in a particular direction, for example to the left, the rpm of the two motors on the left is reduced while the rpm of the motors on the opposite side is increased. A sketch of this motor-mixing idea is given below.
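How throttle, roll, pitch, and yaw commands combine into four individual motor speeds can be illustrated with a standard quadcopter mixing rule. This is a generic sketch of the idea described above, not the firmware actually flashed on this drone; the sign convention assumes an X-configuration with motors numbered front-left, front-right, rear-right, rear-left.

```python
def mix_motors(throttle, roll, pitch, yaw):
    """
    Combine normalized commands (-1..1 for roll/pitch/yaw, 0..1 for throttle)
    into four motor outputs for an X-configuration quadcopter.
    Motor order: front-left, front-right, rear-right, rear-left.
    """
    m_fl = throttle + roll + pitch - yaw
    m_fr = throttle - roll + pitch + yaw
    m_rr = throttle - roll - pitch - yaw
    m_rl = throttle + roll - pitch + yaw
    # Clamp to the valid ESC range
    return [max(0.0, min(1.0, m)) for m in (m_fl, m_fr, m_rr, m_rl)]

# Example: hover with a small roll command to the left
print(mix_motors(throttle=0.5, roll=-0.1, pitch=0.0, yaw=0.0))
# Left motors slow down, right motors speed up -> the drone banks to the left
```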

3 Rover Fabrication

The rover was originally planned to be 3D printed, but the overall design proved fragile when printed, so laser cutting was used instead. The design was made in SolidWorks. As can be seen from the side view in Fig. 2a, a belt drive was also designed, but it was difficult to laser cut during fabrication; Fig. 2b shows the 3D view of the rover. To laser cut the model, each frame was reduced from 3D to 2D, that is, a 2D outline was made and cut on the laser cutter. Acrylic sheet of 5 mm thickness is used for the structure and 8 mm for the wheels; the wheels turned out to be weak, so the thickness of both front wheels was increased to 10 mm. To make the design lighter, materials such as carbon fiber or plastic can be used instead of acrylic.

4 Rover Motion and Control

The rover's motion is controlled by an Arduino. A dedicated microcontroller is used to increase efficiency: communication between the Raspberry Pi and its subsystems improves when an independent microcontroller handles the motors. An Arduino Uno is programmed and used as the microcontroller to


Fig. 2 SolidWorks model a side view, b 3D view

control the system. It is connected to an HC-05 Bluetooth module for communication. Four DC motors, each rated 500 rpm at 12 V DC, drive the rover, which is powered by a 12 V DC rechargeable LIPO battery. Since the Arduino outputs 5 V DC while the motors run at 12 V DC, an L298 H-bridge motor driver board powers all four motors at 12 V DC and also powers the Arduino from its 5 V DC output. The circuit diagram in Fig. 3 shows the connections with a single motor driver. A DC motor has two polarities, and reversing the polarity reverses the direction of rotation. Let us name the front two motors M1A and M2A, and similarly the back two motors M1B and M2B. For example, to get a


Fig. 3 Circuit diagram of motor control

forward motion, M1A can be set high and M1B low; the direction is reversed when M1A is low and M1B is high. For the HC-05 Bluetooth module, TX is connected to RX and RX to TX between the Bluetooth module and the Arduino.

5 Mapping Using GPS

The GPS module is used to obtain the victim's position on Google Maps [4]. The GPS receiver computes latitude, longitude, and time from the signals of at least three satellites. The module is connected to the Raspberry Pi [5] through four pins, VCC, GND, TX, and RX: TX is connected to RX and RX to TX between the Raspberry Pi and the GPS module, and VCC and GND go to the corresponding pins on the Pi. The data is received in raw form (NMEA sentences) and converted to coordinates on the Raspberry Pi. The converted coordinates are sent to an online server, in this case ThingSpeak, since it is free. The latitude (Fig. 4a) and longitude (Fig. 4b) are uploaded to the server using its write API keys [6], and the server plots the received data as shown in Fig. 4a, b. With the help of Google API keys under the developer options, the coordinates of the victim's location can be plotted on Google Maps, as shown in Fig. 5. A sketch of this parse-and-upload step is given below.
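A minimal sketch of reading the GPS module on the Raspberry Pi and pushing coordinates to ThingSpeak might look like the following. The serial port, API key, and field numbers are placeholders; the project's actual script is not reproduced here.

```python
import serial            # pip install pyserial
import pynmea2           # pip install pynmea2
import requests

PORT = "/dev/serial0"                     # assumed UART of the Raspberry Pi
WRITE_API_KEY = "YOUR_THINGSPEAK_KEY"     # placeholder channel write key

def read_fix(ser):
    """Read NMEA sentences until a GGA fix with coordinates is found."""
    while True:
        line = ser.readline().decode("ascii", errors="ignore").strip()
        if line.startswith("$GPGGA"):
            msg = pynmea2.parse(line)
            if msg.latitude and msg.longitude:
                return msg.latitude, msg.longitude

def upload(lat, lon):
    """Push latitude/longitude to ThingSpeak fields 1 and 2."""
    requests.get("https://api.thingspeak.com/update",
                 params={"api_key": WRITE_API_KEY,
                         "field1": lat, "field2": lon},
                 timeout=10)

if __name__ == "__main__":
    with serial.Serial(PORT, 9600, timeout=1) as ser:
        lat, lon = read_fix(ser)
        upload(lat, lon)
        print("Uploaded:", lat, lon)
```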


Fig. 4 ThingSpeak data a latitude, b longitude

Fig. 5 Location on Google Maps

6 Human Detection Using PIR The passive infrared (PIR) sensor detects people at a range. The human body emits radiation in the 0.7–300 µm band. The PIR has two slots, each made of a material sensitive to infrared (IR) radiation. When the sensor is idle, both slots receive the same amount of radiation, for example from the room or walls. A positive differential change is created when a human body passes in front of the first half of the sensor; when the human body leaves, a negative differential change between the two halves is detected. These pulse changes are detected and communicated to the Raspberry Pi. The PIR module has three pins—VCC, GND, and control. The PIR module detects people up to about 8 m away, allowing quick detection of humans [2]. The remote operator can then detect people with ease and can also know the location of the victims with the help of the GPS module. On detection of a human body, the PIR sensor gives a high signal to the microcontroller. Figure 6 shows the influence of a human or animal body on the PIR sensor. The PIR module is connected to the Raspberry Pi, and the data received from the Pi is uploaded to the online server ThingSpeak in the same way as for the GPS module. If a human is detected, the operator gets a notification. Figure 7 shows the data received on the ThingSpeak server: if the system shows "2", it indicates the presence of a human; on the other hand, if it shows "1", the system


Fig. 6 Influence of PIR on humans

Fig. 7 PIR data in ThingSpeak

conveys the absence of a human being. However, this system works with continuous integration with the camera module for accurate results.
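A minimal sketch of this PIR polling and upload step (the GPIO pin number, the RPi.GPIO library, the ThingSpeak field number, and the update interval are assumptions; the write key is a placeholder):

```python
# Sketch: poll the PIR output pin on the Raspberry Pi and push the presence
# code used above (2 = human detected, 1 = no human) to ThingSpeak.
import time
import requests
import RPi.GPIO as GPIO

PIR_PIN = 17                               # assumed BCM pin wired to the PIR output
WRITE_KEY = "YOUR_THINGSPEAK_WRITE_KEY"    # placeholder upload key

GPIO.setmode(GPIO.BCM)
GPIO.setup(PIR_PIN, GPIO.IN)

try:
    while True:
        code = 2 if GPIO.input(PIR_PIN) else 1    # high output means detection
        requests.get("https://api.thingspeak.com/update",
                     params={"api_key": WRITE_KEY, "field3": code},
                     timeout=10)
        time.sleep(15)                             # respect the free-tier update interval
finally:
    GPIO.cleanup()
```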

7 Camera and Face Recognition The Pi camera is connected to the Raspberry Pi and provides about 5-megapixel clarity. Instead of the Raspbian OS, MotionEye OS has to be installed; this OS is used for surveillance with the Raspberry Pi. The data received from the Raspberry Pi is sent to the operator via the Internet [7]. The operator can get the visuals by entering the Raspberry Pi's IP address, and this IP address is integrated with MATLAB for visualization. A database is used to store the relevant pictures of all the people; in the case of an emergency, this dataset can be used for face recognition. The system is trained with AlexNet, and the visuals received from the Pi are used for matching. The datasets in Fig. 8a, b are trained using MATLAB. Each person's dataset consists of more


Fig. 8 Dataset of a person 1, b person 2

Fig. 9 Detection of a person 1, b person 2

than 120 pictures. The training is done with stochastic gradient descent with momentum (SGDM) for 20 epochs. 90% of the data is used for training and the remaining 10% for testing. The testing is done using MATLAB with test sets of persons 1 and 2. As shown in Fig. 9a, b, persons 2 and 1 are detected successfully. The system can be extended to retrieve critical information such as the patient's blood group, medical history, etc., along with the name. This can enable a rapid response from the rescue and medical teams.
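The training above is done in MATLAB with AlexNet and SGDM; purely as an illustration of the same transfer-learning recipe, an equivalent sketch in PyTorch (the dataset folder, image size, batch size, and learning rate are assumptions) could look like:

```python
# Sketch of the AlexNet transfer-learning setup, expressed in PyTorch instead
# of MATLAB; dataset path, epochs and learning rate are assumed values.
import torch
from torch import nn, optim
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((227, 227)), transforms.ToTensor()])
data = datasets.ImageFolder("faces_dataset", transform=tfm)    # one folder per person
n_train = int(0.9 * len(data))                                 # 90/10 split as above
train_set, _ = torch.utils.data.random_split(data, [n_train, len(data) - n_train])
train_loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)

model = models.alexnet(pretrained=True)
model.classifier[6] = nn.Linear(4096, len(data.classes))       # new output layer
opt = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)     # SGD with momentum (SGDM)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):                                        # 20 epochs as above
    for images, labels in train_loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
```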

8 Conclusion FRRALD can help in the effortless and rapid rescue of victims affected by floods and other disasters, so that they can get immediate medical attention. The system can assist rescue teams in recognizing the effect of the catastrophe that has taken place, and the rescue team can reach trapped people by receiving their location. The GPS accuracy was good: some tests were carried out using the GPS and Google Maps


and it gave accurate results. Tests were carried out in which the coordinates from the system were compared against known coordinates. With the help of the camera, the operator gets a clear-cut view of the scenario and the effects of the disaster that has taken place. The PIR and camera can assist in detecting and identifying trapped people, and the face recognition is useful for getting important information about the patient for rapid medical support. For this research work, a free ThingSpeak server is used, but for real-time communication and operation, and to send the acquired data, a stronger backhaul network and server can be used. The challenges that FRRALD faces include accurate detection of victims trapped deep within mud; improving the deep learning and MATLAB algorithms to better visualize the face of the victim remains a major limitation. The cost of construction, or of integrating a drone that can carry the payload weight of the rover, can make the overall cost high. Increasing the accuracy of the tracked location and relaying it to the operator can also be a problem. The system can be fitted with a thermal camera to get an overall better visualization. The system can also be upgraded by sending an army or fleet of FRRALDs [8], to reduce the time taken in detecting victims and to act as repeaters to increase the range.

References 1. Pedersen, J.: Use of UAVs in the NGO world. In: CRS Conference—ICT4 Development, Nairobi, Kenya, Mar 25–28 (2014) 2. Rivera, A.J.A., Villalobos, A.D.C., Monje, J.C.N., Mariñas, J.A.G., Oppus, C.M.: Post-disaster rescue facility: human detection and geolocation using aerial drones. In: 2016 IEEE Region 10 Conference (TENCON) 3. Alwateer, M., Loke, S.W.: On-drone decision making for service delivery: concept and simulation. In: 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops) 4. Tariq, R., Rahim, M., Aslam, N., Bawany, N., Faseeha, U.: DronAID: a smart human detection drone for rescue. In: 2018 15th International Conference on Smart Cities: Improving Quality of Life Using ICT & IoT (HONET-ICT) 5. Parvu, P., et al.: Autonomous system for image geo-tagging and target recognition. Aerospace Conference, in press, May 2014, pp. 1–26 6. Câmara, D.: Cavalry to the rescue: drones fleet to help rescuers operations over disasters scenarios. In: 2014 IEEE Conference on Antenna Measurements & Applications (CAMA) 7. Gaszczak, A., Breckon, T.P., Han, J.: Real-time people and vehicle detection from UAV imagery. In: Proceedings of SPIE: Intelligent Robots and Computer Vision XXVIII: Algorithms and Techniques, San Francisco, California (2011) 8. Besada, J.A., Bernardos, A.M., Bergesio, L., Vaquero, D., Campaña, I., Casar, J.R.: Drones-as-a-service: a management architecture to provide mission planning, resource brokerage and operation support for fleets of drones. In: 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)

An Enhanced Differential Evolution Algorithm with Sorted Dual Range Mutation Operator to Solve Key Frame Extraction Problem M. Aathira and G. Jeyakumar

Abstract This paper proposes a modified Differential Evolution (DE) algorithm in which the conventional mutation operation of DE is replaced by a 'sorted population' based mutation operation. This mutation operation, proposed by the authors, differs from the conventional one in the way it selects the candidates for the mutation process and in the values it sets for the mutation scale factor (F). To verify its superiority, the modified DE was implemented on 14 standard benchmarking problems. A comparative study based on the results obtained revealed that the proposed algorithm provided optimal solutions in less time for higher dimensional problems. Next, the experiments were extended to the key frame extraction problem for videos. This part of the experiment combined the conventional SSIM (Structural Similarity Index) approach of key frame extraction with the proposed DE. The results showed that the proposed DE gave comparatively better results than classical DE. Keywords Differential evolution · Mutation · Modified mutation · Video analytics · Key frame extraction · SSIM approach

1 Introduction Evolutionary Algorithms (EAs) are a family of systematic random search algorithms popularly used for solving real-world optimization problems. The researcher community of EAs focuses on various aspects of the algorithms, which include analysing their theoretical properties, modifying their algorithmic structure, integrating them with other similar algorithms, designing strategies to control/tune their parameters, and testing their applicability to different real-world M. Aathira (B) · G. Jeyakumar Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] G. Jeyakumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_33


optimization problems. It is also common in the research community for researchers to propose innovations to the algorithmic structures of EAs and then to test the modified EAs on real-world optimization problems. Along similar lines, this paper proposes to modify the mutation component of the Differential Evolution (DE) algorithm (proposed in [1]) and to solve the key frame extraction problem of video analytics with the modified algorithm. The remaining part of the paper is organized as follows—Sect. 2 discusses related works, Sect. 3 introduces the proposed mutation strategy, Sect. 4 explains the design of the experimental setup used for this study, Sect. 5 presents and discusses the results obtained on the benchmarking functions, Sect. 6 verifies the novelty of the proposed mutation on a video analytics problem, and Sect. 7 concludes the paper.

2 Related Works This section summarizes popular amendments made to the mutation component of DE and approaches to key frame extraction from video. An enhanced DE algorithm with multiple mutation strategies and self-adapting control parameters was proposed in [2]. In [3], based on the fitness value, the individuals of each generation are ranked and divided into a better part and a worse part; to perform the mutation, two individuals are chosen either from the worse part or from the better part. A modified mutation named dual preferred learning mutation (DPLM) was proposed in [4]. The DPLM simultaneously learns behaviours from the individual with better fitness (BFI) and the individual with better diversity (BDI). A Revised Mutation DE, ReDE, was proposed in [5]; ReDE used two control parameters and two types of populations. In [6], the authors proposed a diversity-based base vector selection mechanism for the mutation; this idea was extensively evaluated and reported in [7] by the same authors. A different base vector selection strategy, which selects the centroid of the top 3 candidates as the base vector, was proposed in [8]. In [9], two novel variants of Differential Evolution, called centroid differential evolution (CDE) and differential evolution with a local search (DELS), were proposed. [10] proposed a novel version of the mutation operator for DE inspired by the biological phenomenon called hemostasis. The authors of [11] proposed a set-based mutation operator which works on the causal matrix. There are also attempts to investigate and propose new algorithmic structures of DE, viz. dynamic DE and distributed DE ([12–14]), etc. In video analytics, extracting key frames from video supports many applications, viz. video summarization, content-based retrieval, and object detection. There are numerous approaches proposed in the literature for this purpose. [15] proposed a Euclidean-distance-based strategy. In [16], an entropy-based approach was proposed. The approach proposed in [17] used an improved histogram algorithm. An algorithm based on optimized key frame difference was proposed in [18]. The chi-square histogram algorithm was used in [19], and [20] introduced a formula to calculate the difference between the current and the next frame. The integration of


Structural Similarity Index Method (SSIM) with classical DE algorithm to solve the key frame extraction problem was first introduced in [21]. An extensive comparative study on the conventional SSIM, entropy method and the Euclidean method and their integration with DE was presented in [22]. It was reported that the DE unified algorithms showed high accuracy. Following the DE_SSIM proposed in [21], this paper proposes to integrate the proposed mutation modified DE algorithm with SSIM approach to detect the key frames from a set of traffic surveillance videos. The proposed mutation method is described in the next section.

3 Proposed Mutation Strategy The general logic of DE's mutation (named differential mutation) is to add the scaled difference of two (or more) candidates in the population to another candidate in the population. Based on the way these candidates are selected, there are many different mutation operators available for DE. The proposed mutation strategy is a modified version of the 'rand' mutation. The population is sorted in ascending order of the objective function values and divided into two partitions—promising and non-promising. The candidates are indexed from 0 to NP − 1. Hence, the index range of the promising region is [0, NP/2] and that of the non-promising region is [NP/2, NP − 1]. For mutating the candidates in the promising and non-promising regions, the random candidates are selected within the same region, i.e. within [0, NP/2] and [NP/2, NP − 1], respectively. The F value is set randomly in the range [0, 0.5] and [0.5, 1], respectively. In both cases, the base vector is the candidate with the best fitness value among the three random candidates. The proposed mutation operator is named 'sorted dual range mutation' (sdrm). The DE with sdrm is denoted as DE sdrm henceforth in the paper. The design of the experimental setup is discussed in the next section (Sect. 4).
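A minimal sketch of the sdrm operator described above (representing candidates as NumPy rows and using the DE/rand-style difference form is an assumption; crossover and selection would follow as in standard DE):

```python
# Sketch of the 'sorted dual range mutation' (sdrm). pop is an (NP, d) array,
# fitness holds each row's objective value (minimization). Candidates are
# mutated only with partners from their own half of the sorted population,
# with F drawn from that half's own range. Assumes each half has >= 3 rows.
import numpy as np

def sdrm_mutation(pop, fitness, rng=np.random.default_rng()):
    NP = len(pop)
    order = np.argsort(fitness)                 # sort ascending by objective value
    sorted_pop, sorted_fit = pop[order], fitness[order]
    half = NP // 2
    mutants = np.empty_like(sorted_pop)
    for i in range(NP):
        if i < half:                            # promising region
            lo, hi, f_lo, f_hi = 0, half, 0.0, 0.5
        else:                                   # non-promising region
            lo, hi, f_lo, f_hi = half, NP, 0.5, 1.0
        r = rng.choice(np.arange(lo, hi), size=3, replace=False)
        base, a, b = r[np.argsort(sorted_fit[r])]   # base = fittest of the three
        F = rng.uniform(f_lo, f_hi)
        mutants[i] = sorted_pop[base] + F * (sorted_pop[a] - sorted_pop[b])
    return mutants
```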

4 Design of Experiments The objective of this experiment is to investigate the performance of DE sdrm and classical DE (cDE). The comparison of the algorithms was done based on their performance on benchmarking problems and on a video surveillance problem at traffic signals. The parameters for DE are the number of candidates in the population—population size (ps), the size of each candidate—dimension (d), the mutation scale factor—F, the probability of crossover—crossover rate (Cr), the maximum number of generations (MaxGen), and the maximum number of trial runs (Mtr). The values for these parameters were set constant before the start of the DE sdrm run, except for F. The values for F were chosen randomly in the ranges specified in the proposed mutation strategy (sdrm) for every candidate in the population. The experiment was repeated for two


different population sizes, 60 and 200. The summary of the parameter settings is: ps = 60 and 200; d = 30; F in [0, 1]; Cr = 0.5; MaxGen = 10; and Mtr = 50. The performance metrics used in the experiments were the average of solutions (AOS) and the speed. The AOS was measured as the average of the solutions obtained over the Mtr trial runs. The speed was measured with two metrics—the number of function evaluations (nFE) and the execution time (ExeTime). The values measured for each run are reported for discussion in Sect. 5.

5 Results and Discussions The classical DE (cDE) and the proposed DE (DE sdrm) were implemented to solve the benchmarking functions chosen in the experimental setup. The AOS, nFE, and ExeTime measured for cDE and DE sdrm, for ps = 60, are presented in Table 1. The results indicate that the proposed DE sdrm outperformed cDE on all the performance metrics only for f2. DE sdrm outperformed cDE on two metrics together (AOS and ExeTime) for three functions—f3, f8, and f13. DE sdrm outperformed cDE only by nFE for two functions—f4 and f10, and only by ExeTime for five functions—f1, f7, f9, f11, and f14. Except for f5, for all other functions DE sdrm outperformed cDE on at least one of the metrics. The summary of inferences is that cDE was good in AOS, while DE sdrm was good in speed, in both ExeTime and nFE.

Table 1 The AOS, nFE and ExeTime for ps = 60 (good results are marked in bold in the original)

Function | cDE AOS | cDE ExeTime | cDE nFE | DE sdrm AOS | DE sdrm ExeTime | DE sdrm nFE
f1  | 55,270.07  | 0.0020 | 660 | 75,252.14 | 0.0018 | 660
f2  | 109,338.55 | 0.0040 | 660 | 11,374.32 | 0.0034 | 606
f3  | 427.49     | 0.0047 | 660 | 418.03    | 0.0045 | 660
f4  | 20.57      | 0.0048 | 660 | 3.79E+27  | 0.0049 | 606
f5  | 6.02E+09   | 0.0021 | 660 | 5.91E+11  | 0.0021 | 660
f6  | 88.25      | 0.0020 | 660 | 88.07     | 0.0020 | 660
f7  | 1.82E+08   | 0.0039 | 660 | 2.59E+08  | 0.0031 | 660
f8  | 58,148.70  | 0.0024 | 660 | 52,925.40 | 0.0023 | 660
f9  | 76.62      | 0.0060 | 660 | 98.16     | 0.0053 | 660
f10 | 484.59     | 0.0040 | 660 | 919.78    | 0.0043 | 606
f11 | 528.03     | 0.0054 | 660 | 552.34    | 0.0044 | 606
f12 | 860.42     | 0.0053 | 660 | 506.95    | 0.0053 | 660
f13 | 2559.65    | 0.0053 | 660 | 2393.92   | 0.0052 | 660
f14 | 1522.02    | 0.0073 | 660 | 2033.85   | 0.0071 | 660


Table 2 The AOS, nFE and ExeTime for ps = 200

Function | cDE AOS | cDE ExeTime | cDE nFE | DE sdrm AOS | DE sdrm ExeTime | DE sdrm nFE
f1  | 50,092.57 | 0.005 | 2200 | 55,856.26 | 0.005 | 2200
f2  | 76,148.29 | 0.010 | 2200 | 70,590.85 | 0.010 | 2200
f3  | 389.05    | 0.012 | 2200 | 382.37    | 0.011 | 2020
f4  | 20.58     | 0.012 | 2200 | 1.19E+25  | 0.011 | 1840
f5  | 2.13E+09  | 0.005 | 2200 | 6.80E+08  | 0.004 | 1480
f6  | 83.80     | 0.005 | 2200 | 77.09     | 0.004 | 2200
f7  | 2.07E+08  | 0.010 | 2200 | 2.00E+08  | 0.010 | 2200
f8  | 58,462.30 | 0.007 | 2200 | 40,353.00 | 0.005 | 1480
f9  | 70.50     | 0.015 | 2200 | 64.29     | 0.014 | 2020
f10 | 169.30    | 0.011 | 2200 | 1024.88   | 0.003 | 580
f11 | 583.61    | 0.013 | 2200 | 530.26    | 0.014 | 2200
f12 | 788.16    | 0.013 | 2200 | 713.74    | 0.015 | 2200
f13 | 2103.64   | 0.013 | 2200 | 2048.07   | 0.012 | 2020
f14 | 1791.78   | 0.017 | 2200 | 1451.90   | 0.019 | 2200

The same experiments were repeated for cDE and DE sdrm on the 14 benchmarking problems, however with ps = 200. The values measured for the performance metrics of cDE and DE sdrm are presented in Table 2. The superiority of DE sdrm was clearly evident: DE sdrm outperformed cDE in 11, 9, and 7 of the 14 function cases by AOS, ExeTime, and nFE, respectively. DE sdrm outperformed cDE on all three metrics for five benchmarking functions—f3, f5, f8, f9, and f13; by speed (both ExeTime and nFE) for two functions—f4 and f10; and by AOS and ExeTime for one function—f6. Thus, the experiments on the benchmarking functions showed that the proposed DE sdrm performs better than the classical DE in both solution quality and speed. To further validate the superiority of DE sdrm, its performance was assessed on the problem of extracting key frames from video. The experimental details and the observations gathered are presented in the next section.

6 Validation of DEsdrm on Video Analytics Problem Numerous evolutionary-algorithm-based frameworks have been proposed in the literature for extracting key frames from given videos. In this experiment, cDE and DE sdrm were implemented for key frame extraction to demonstrate the efficiency of DE sdrm. The video was first converted into frames, and 75 frames were taken. The objective of this experiment was set to extract 10 key frames from these 75 frames.
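The DE fitness used in this experiment, described in the next paragraph, is the ASSIM (average SSIM) of the selected frame set. A minimal sketch of such a fitness, assuming OpenCV for frame handling, scikit-image for SSIM, and an all-pairs average (the paper does not state which frame pairs are averaged):

```python
# Sketch of the ASSIM fitness for key frame extraction: a candidate is a set
# of frame indices, and its fitness is the average pairwise SSIM of those
# frames (lower means the selected frames are more dissimilar).
import itertools
import cv2
from skimage.metrics import structural_similarity as ssim

def load_gray_frames(video_path, max_frames=75):
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    return frames

def assim_fitness(candidate, frames):
    # candidate: e.g. a list of 10 frame indices proposed by DE
    pairs = list(itertools.combinations(candidate, 2))
    return sum(ssim(frames[i], frames[j]) for i, j in pairs) / len(pairs)
```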


The values set for the DE parameters were—ps = 10, D = 10, F = random (or constant (0.9)), Cr = 0.6, MaxGen = 10, and Mtr = 3, 5, or 50 depending on the trial. A population with 10 candidates was initialized; each candidate in the population was a set of 10 random frames. The fitness of the candidates was measured as the ASSIM (Average Structural Similarity Index) value of the frames in the set. The experiments were repeated for three trials, each with different independent runs, in order to get a better comparative analysis of the algorithms. In Trial 1, Mtr was set to 3, and the F values were chosen in the ranges [0, 0.5] and [0.5, 1] for the promising and non-promising regions; the proposed DE sdrm failed to outperform cDE, as the average ASSIM value of DE sdrm was higher than that of cDE. In Trial 2, the F values were set differently for each region of the population—0.5 and 0.9 for the promising and non-promising regions—and Mtr was set to 5. On average, DE sdrm outperformed cDE with a marginal difference of 0.0039, although cDE outperformed DE sdrm in 3 out of 5 runs; this showed the comparable performance of DE sdrm with cDE. It is worth noting that the average performance of DE sdrm improved in Trial 2 compared to Trial 1. In Trial 3, the F value was set constant at 0.9 for both the promising and the non-promising regions, and Mtr was set to 50. It was found that the proposed DE sdrm generated key frames with lower ASSIM values compared to cDE. The experimental results recorded are presented in Table 3. The cDE and DE sdrm algorithms were compared by different metrics measured on the ASSIM values obtained over the 50 runs. The best, worst, and average ASSIM values over the 50 runs were lower for DE sdrm compared to the corresponding values of cDE. It is observed from the results that, on comparing the corresponding runs of cDE and DE sdrm, DE sdrm significantly outperformed cDE in all 50 runs. The pairwise difference between the ASSIM values attained by the algorithms in each run is also reported in the results. Table 3 Experimental results for Trial 3, Video 1

Details                            | cDE                                  | DE sdrm
Key frames                         | 1, 1, 3, 23, 34, 43, 50, 51, 64, 70  | 1, 1, 5, 24, 28, 39, 48, 51, 58, 75
Comparison by ASSIM: Best          | 0.6250                               | 0.5853
Comparison by ASSIM: Worst         | 0.7195                               | 0.6308
Comparison by ASSIM: Worst − Best  | 0.0945                               | 0.0455
Comparison by ASSIM: Average       | 0.6654                               | 0.6045
Comparison by pair: +              | 0                                    | 50
Comparison by pair: −              | 50                                   | 0
Min_Diff = 0.0051; Max_Diff = 0.1205; Avg_Diff = 0.0608


Fig. 1 Key frames a cDE and b DE sdrm

The average difference found was 0.0608. This showed the reasonable performance enhancement achieved by the proposed DE sdrm algorithm. The key frames extracted by the cDE and the DE sdrm are depicted in Fig. 1a, b, respectively, for a reference. Thus, the superiority of the proposed DEsdrm algorithm was proven on a set of 14 benchmarking problems and a video analytics problem.

7 Conclusions This paper proposed a novel mutation strategy named 'sorted dual range mutation' (sdrm) for the Differential Evolution (DE) algorithm. The DE in which the classical mutation operator is replaced with sdrm was named DE sdrm. To prove the novelty of sdrm, classical DE and DE sdrm were implemented to solve a set of 14 benchmarking problems and a key frame extraction problem. The results of the benchmarking experiments showed that DE sdrm could outperform cDE significantly for higher dimensions. For the key frame extraction problem, three trials were carried out with different F values. The results revealed a trend of performance enhancement of DE sdrm from Trial 1 to Trial 3 and proved the superiority of the proposed DE sdrm algorithm on the key frame extraction problem. The superiority of DE sdrm was well evident in the chosen video.


Overall, the experiments on the benchmarking and key frame extraction problems demonstrated the novelty of the sdrm mutation. The sdrm follows a strategy of exploring and exploiting the population in every generation, from the beginning to the end of the evolution of DE. This strategy can be further analysed by comparing it with other similar mechanisms in the literature.

References 1. Rainer, S.: Differential evolution-a simple and efficient adaptive scheme for global optimization over continuous spaces. Tech Report Int. Comput. Sci. Inst. (1995) 2. Attia, M., Arafa, M., Sallam, E.A., Fahmy, M.M.: An Enhanced differential evolution algorithm with multi-mutation strategies and self-adapting control parameters. Int. J. Intell. Syst. Appl. 11(4), 26–38 (2019) 3. Zhou, Y., Li, X., Gao, L.: Adaptive differential evolution with intersect mutation and repaired crossover rate. Appl. Soft Comput. 13(1), 390–401 (2013) 4. Duan, M., Yang, H., Liu, H., Chen, J., Duan, M., et al.: A differential evolution algorithm with dual preferred learning mutation. Appl. Intell. 49, 605–627 (2019) 5. Ramadas, M., Abraham, A.: Revised mutation strategy for differential evolution algorithm. In: Metaheuristics for Data Clustering and Image Segmentation-Intelligent Systems Reference Library, vol. 152, pp 57–65 (2019) 6. Gokul, K., Pooja, R., Gowtham, K., Jeyakumar, G.: A Self-switching base vector selection mechanism for differential mutation of differential evolution algorithm. In: International Conference on Communication and Signal Processing (2017) 7. Gokul, K., Pooja, R., Jeyakumar, G.: Empirical evidences to validate the performance of selfswitching base vector based mutation of differential evolution algorithm. In Proceedings of 7th International Conference on Advances in Computing, Communications and Informatics, pp. 2213–2218 (2018) 8. Salehinejad, H., Rahnamayan, S., Tizhoosh, H.R.: CenDE: centroid-based differential evolution. In: Proceedings of IEEE Canadian Conference on Electrical & Computer Engineering (CCECE) 9. Ali, Musrrat, Pant, Millie, Nagar, Atulya: Two new approach incorporating centroid based mutation operators for differential evolution. World J. Model. Simul. 7(1), 16–28 (2011) 10. Prabha, Shashi, Yadav, Raghav: Differential evolution with biological-based mutation operator. Eng. Sci. Technol. Int. J. 23(2), 253–263 (2020) 11. Jing, S.-Y.: Set-Based differential evolution algorithm based on guided local exploration for automated process discovery. In: Foundations and Applications of Process-based Modeling of Complex Systems, Complexity, vol. 2020, (2020) 12. Jeyakumar, G., ShunmugaVelayutham, C.: Differential evolution and dynamic differential evolution variants—an empirical comparative performance analysis. Int. J. Comput. Appl. (IJCA) 34(2), 135–144 (2012) 13. Jeyakumar, G., Shunmuga Velayutham, C.: Distributed mixed variant differential evolution algorithms for unconstrained global optimization. Memetic Comput. 5(4), 275–293 (2013) 14. Jeyakumar, G., Shunmuga Velayutham, C.: Distributed heterogeneous mixing of differential and dynamic differential evolution variants for unconstrained global optimization. Soft Comput. 18(10), 1949–1965 (2014). Springer 15. Wang, L., Zhang, Y., Feng, J.: On the Euclidean distance of images. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), (2005) 16. Algur, S.P., Vivek, R.: Video key frame extraction using entropy value as global and local feature. arXiv:1605.08857 (cs.CV), (2016)


17. Liu, G., Zhao, J.: Key frame extraction from MPEG video stream. In: Proceedings of Second Symposium International Computer Science and Computational Technology (2009) 18. Liu, H., Meng, W., Liu, Z.: Key Frame extraction of online video based on optimized frame difference. In: Proceedings 9th International Conference on Fuzzy Systems and Knowledge Discovery (2012) 19. Ramender, G., Pavani, M., Kishore Kumar, G.: Evolving optimized video processing and wireless transmission system based on arm-cortex-a8 and gsm. Int. J. Comput. Netw. Wirel. Mobile Commun. 3(5), (2013) 20. Liu, H., Pan, L., Meng, W.: Key frame extraction from online video based on improved frame difference optimization. In: Proceedings of 14th International Conference on Communication Technology (ICCT) (2012) 21. Abraham, K.T., Ashwin, M., Sundar, D., Ashoor, T., Jeyakumar, G.: An evolutionary computing approach for solving key frame extraction problem in video analytics. In: Proceedings of ICCSP-2017—International Conference on Communication and Signal Processing (2017) 22. Abraham, K.T., Ashwin, M., Sundar, D., Ashoor, T., Jeyakumar, G.: Empirical comparison of different key frame extraction approaches with differential evolution based algorithms. In: Intelligent Systems Technologies and Applications, ISTA 2017 Advances in Intelligent Systems and Computing, vol. 683, pp. 317–326 (2018)

Annotation for Object Detection P. Myna, R. V. Anirudh, Brundha Rajendra Babu, Eleanor Prashamshini, and Jyothi S. Nayak

Abstract Computer vision is an important, growing area of research. It requires large datasets for training; such datasets are often inaccessible for financial reasons or do not exist for specialized needs. This paper discusses an annotation tool designed for convenient data annotation. The aim is to enable easy manual annotation of images. Annotation accuracy has also been compared, under a case study, between detection by humans and detection by YOLO9000. Keywords Annotation · Object detection · Computer vision

1 Introduction Video and image processing are highly researched fields and are predicted to continue expanding for a significant period. The improvement of computing capabilities and easy access to video and image recording gadgets have enabled the development of computer vision applications in surveillance, disease detection, autonomous vehicle design, etc. Since most real-world applications are highly sensitive, it is imperative to train and test machine learning algorithms on huge datasets.

P. Myna · R. V. Anirudh · B. R. Babu · E. Prashamshini (B) · J. S. Nayak Computer Science and Engineering, B.M.S. College of Engineering, Basavanagudi, Bangalore, Karnataka 560019, India e-mail: [email protected] P. Myna e-mail: [email protected] R. V. Anirudh e-mail: [email protected] B. R. Babu e-mail: [email protected] J. S. Nayak e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_34


Niche applications, such as those in biology and astronomy, often do not have annotated datasets or easily accessible high-quality images. Thus, manual image collection and annotation become the only option [1]. Another important application of this tool is to compare the accuracy of algorithm-based object detection to the accuracy of detection by the human eye. The prime focus of this paper is to discuss the design of a manual annotation tool and to check its accuracy with respect to algorithm-based annotation. Annotation of an image means associating critical extra information with the image/diagram. In this tool, all persons and objects in the image are identified and assigned the correct labels. YOLO9000 is a real-time object detection algorithm used for classifying objects in the annotation tool. Intersection over Union (IoU) is an evaluation metric popularly used to check object detection accuracy. This tool provides a feature to check the IoU between the human-annotated image and the annotation done by the tool [2].

2 Related Work 2.1 Annotation Tools Over the years, many automatic or semi-automatic annotation tools have been developed [3]. Most of them work using pre-trained weights or targets. Thus, for applications where no targets exist, manual annotation becomes a necessity.

2.2 Object Detection Algorithm Object detection tools, supported by advancements in technology, have been recently put into production. While accuracy, size and speed issues are still persistent, they are no longer major hindrances. Two-step object detection [4], such as CNN (Convolutional Neural Network), R-CNN, Fast R-CNN, and Faster R-CNN models, generally surpass their one step counterparts in accuracy. The first step, region proposal, checks for regions in the image that have a significant probability of being an object and then generates relevant coordinates. The second step, object detection, takes the generated regions as inputs and performs classification. On the other hand, single-step object detection models combine locating and classifying into a single step and thus have higher speeds and memory efficiency despite their simplicity. Some examples of these models are Single Shot MultiBox Detector Model, Retina Net and You Only Look Once (YOLO).


The object detection algorithm which has been tested for accuracy in this paper is a version of YOLO [5]. YOLO detects objects and provides a confidence score for how accurate the detection of the object is. YOLO employs regression and compacts the whole detection pipeline to one network. A single head iterates through sections of the image and processes the same using a few convolutional layers to get a feature map. Then, offsets are calculated to get an anchor box. This system of anchor and offsets is reported to decrease training time. A threshold confidence score of 30% is generally used while generating object detection outputs. YOLO9000 [6] is a more optimized version and is a better fit here. The use of Siamese Networks [7] has helped to train with the limited annotated surveillance data that is available.

2.3 Currently Available Annotated Datasets As computer vision applications expand, newer annotated datasets for specific needs are required. To provide context, some commonly used datasets are discussed briefly: 1. Common Objects in Context (COCO) dataset [4]: This dataset is of 328,000 images and 91 object classes of objects in their natural surroundings. It has labels for commonly seen objects such as cat, car and eye glasses. This dataset was annotated by a tool called coco-annotator. 2. ImageNet dataset [8]: This mammoth dataset contains 12 subtrees with 5247 synsets of classifications and 3.2 million images. This dataset contains more detailed labels like Egyptian cat, freight car, passenger car and sunglasses. This dataset was hand-annotated. 3. SUN dataset [9]: This dataset focuses on scene categorization with 397 categories and 130,519 images. This contains images with object labels such as door, car and tree, as well as scene labels such as cafeteria, farm and elevator. This dataset was hand-annotated.

2.4 Metric The metric chosen to measure object detection capabilities in this paper is Intersection over Union (IoU) [10]. Intersection over Union calculation requires: 1. The actual hand-labelled bounding boxes, referred to as ground-truth bounding boxes. 2. The bounding boxes predicted as output by the object detection model. Figure 1 explains how Intersection over Union is calculated. An IoU score greater than 0.5 usually indicates good detection.
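A minimal sketch of this computation for two boxes given as top-left and bottom-right corners (the same coordinate convention used later in Sect. 4):

```python
# Intersection over Union for two axis-aligned boxes given as
# (x1, y1, x2, y2), i.e. top-left and bottom-right corners.
def iou(box_a, box_b):
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```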


Fig. 1 Intersection over Union formula [10]

3 Implementation The input to the system was videos of busy streets. A script was run to extract frames from the video. Then, frames were run through YOLO9000 and also annotated manually. The accuracy of detection was calculated using IoU. The frontend for the application was implemented using ReactJS. MongoDB was used for its ability to store semi-structured and unstructured data.
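The extraction script itself is not reproduced in the paper; a minimal sketch of such a script with OpenCV (the file name and the sampling probability are assumptions) could be:

```python
# Sketch of the frame-extraction step: sample frames from a street video at
# random intervals and save them as images for annotation.
import random
import cv2

def extract_frames(video_path, out_prefix="frame", keep_prob=0.05):
    cap = cv2.VideoCapture(video_path)
    saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if random.random() < keep_prob:          # keep roughly 1 frame in 20
            cv2.imwrite(f"{out_prefix}_{saved:04d}.jpg", frame)
            saved += 1
    cap.release()
    return saved

print(extract_frames("busy_street.mp4"))         # placeholder file name
```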

3.1 System Architecture Images are uploaded to the tool, where each is displayed with two layers: the actual image and a transparent layer above it on which annotation is done. Up to 100 images can currently be uploaded at once. Humans annotate each of these images manually by drawing boxes around each person in the image. If required, YOLO9000 can be run on the images as well to detect objects classified as 'people'. On saving, a file with the original images, the human annotation details, and the YOLO9000 annotation coordinates is stored on the local system. Input to the system is uploaded as images or as a video through a script that extracts frames. As inputting images individually would be cumbersome during testing, short videos were input to the system, and frames were extracted from these videos at random intervals and processed (Fig. 2).

3.2 Interface Design A single page application, using ReactJS, has been created to provide access to the annotation tool. The user flow has been crafted to be simple and intuitive for users. The application has a drawable area, where the image to be annotated is layered with a canvas. The user can proceed to manually annotate the displayed image by


Fig. 2 System architecture

drawing boxes around persons, using their mouse. When the user saves the annotations, all coordinates are stored at the backend. Further, the user can run a comparison with the YOLO9000 for the annotated images and download all the results.

3.3 Design The frontend is built with ReactJS as a single page application (SPA). The application is created using create-react-app, and each of the page components is dynamic. HTTP requests are made from the ReactJS app, to the backend. The backend consists of an Application Programming Interface (API) written in Go language and NodeJS. The annotated images are stored using Mongo Atlas cloud services. A Docker image of the application is created which is used to create a Docker container. The container is hosted on AWS cloud services thus ensuring security and scalability (Fig. 3).

4 Experimental Setup Suitable images of people were collected. Then boxes, called bounding boxes, were drawn around the object of interest using two sets of coordinates. The coordinates are denoted by (X1, Y1) and (X2, Y2) such that X1 and Y1 are the coordinates of the top-left corner of the object, and X2 and Y2 are the coordinates of the bottom-right


Fig. 3 Design

corner of the object. Coordinates are measured with the top-left corner of the image as the origin. Using the two sets of coordinates, all four corners of the object section of the image can be represented as:
(X1, Y1)—Top-left coordinate of object
(X2, Y1)—Top-right coordinate of object
(X1, Y2)—Bottom-left coordinate of object
(X2, Y2)—Bottom-right coordinate of object
A two-part experiment was set up to record coordinates as follows:
(a) Human Annotation: The authors of this paper manually recorded the coordinates of boxes around people in the images.
(b) Machine Learning Algorithm: The images are annotated by the chosen ML algorithm, where emphasis is laid on a particular label/class of objects. Additionally, most algorithms give a confidence score for the objects detected in the images. This paper explores using the YOLO9000 object detection algorithm in the tool.


5 Case Study: YOLO9000 5.1 Overview of YOLO9000 YOLO9000 works well on images with abundant noise and thus is selected for the accuracy comparison in this paper. YOLO9000 has been tested for object detection on the ImageNet detection validation set and has received a score of 19.7mAP (mean Average Precision). On testing with the COCO dataset, YOLO9000 has scored 16.0mAP for the 156 classes not in COCO [6].

5.2 Dataset Description Images of Indian urban and rural locations were used. A good mixture of images of busy streets, markets and other public spaces was used. Four hundred images were used to check the versatility of YOLO9000, especially to check its application in monitoring crowded Indian public spaces.

5.3 Experimental Setup YOLO9000 [6] has been chosen as the object detecting algorithm for its capability to provide coordinates of bounding boxes around 9000 objects and to provide confidence scores. This case study focuses on detection of the ‘people’ label. The images were uploaded to the tool, where each image was manually annotated using the annotation tool. Finally, the coordinates for people detected by YOLO9000 were evaluated with respect to those detected by humans, using IoU. It is unrealistic to expect the model to predict the exact coordinates of any detected object. By considering the area of overlap between the ground-truth bounding boxes and the predicted coordinates, the closeness of values generated by the model and by hand labelling can be measured (Fig. 4). In the accuracy analysis, the IoU is computed for each object (person) detected. Further, the detected bounding boxes are looped over, and the IoU for each is computed. In order to measure IoU, each bounding box labelled as ‘people’ is checked with all the possible ground truths. Then, the maximum IoU is considered for that specific bounding box (non-max suppression). The above is repeated for each bounding box detected by the YOLO9000 algorithm. To tackle the possibility that people detected in the ground truth are completely ignored by the YOLO9000 algorithm, the difference in the number of people detected by YOLO9000 and the number of people annotated for ground truth is calculated.
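A sketch of the scoring just described, reusing the iou() helper sketched in Sect. 2.4 (the greedy best-match and the zero padding follow the description above; function and variable names are assumptions):

```python
# Sketch of the per-image accuracy analysis: each YOLO9000 'people' box is
# matched to its best ground-truth box by IoU, ground-truth people that were
# never detected contribute zeros, and the scores are averaged.
def image_iou_score(pred_boxes, gt_boxes):
    scores = [max((iou(p, g) for g in gt_boxes), default=0.0) for p in pred_boxes]
    missed = max(0, len(gt_boxes) - len(pred_boxes))   # people YOLO9000 missed entirely
    scores.extend([0.0] * missed)
    return sum(scores) / len(scores) if scores else 0.0

def dataset_iou_score(per_image_pairs):
    # per_image_pairs: list of (pred_boxes, gt_boxes) tuples, one per image
    return sum(image_iou_score(p, g) for p, g in per_image_pairs) / len(per_image_pairs)
```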


Fig. 4 Comparison of annotation by a humans and b YOLO9000

Fig. 5 Case study results: annotation by YOLO9000. Red bounding boxes represent annotation by humans, and blue ones represent annotation by YOLO9000

These unaccounted detections of people are added with zero values to the list of the people detected for IoU calculation. Then, IoU scores are averaged over images and finally over the entire dataset (Table 1).

Table 1 IoU scores for case studies

Scenario                             | Resolution | Density of people | IoU                  | Fig.
Distant                              | Medium     | Low               | 0.30967884374807597  | 5a
Blurry image with clustered objects  | Low        | High              | 0.518701104056638    | 5b
Well-distributed objects             | High       | Low               | 0.7732527620999491   | 5c


5.4 Results Efficient image annotation is possible using this tool. Also, this tool enables streamlined, convenient testing of object detection and text detection algorithms. This would be a significant convenience during the development of algorithms related to computer vision. On comparing manual annotation to YOLO9000 annotation, an IoU score of 0.3005834618310959 was obtained for our dataset. On assuming that the human eye has an accuracy of 100% in detecting people, YOLO9000 scored about 30%. This shows that human annotation might be more reliable for a variety of sensitive needs.

6 Conclusion and Future Enhancements This tool helps to conveniently annotate large sets of images. Functionality to allow annotation by YOLO9000 has also been implemented. The use of open-source software has made the tool inexpensive and thus accessible. The YOLO9000 feature currently processes an image in approximately 4 s on a 1.6 GHz dual-core system; the use of higher-capacity processors would greatly improve its speed. Additionally, the use of improved metrics might help us better evaluate and compare human annotation to machine annotation.

References 1. Russell, B.C., Torralba, A., Murphy, K.P., et al.: LabelMe: a database and web-based tool for image annotation. Int. J. Comput. Vis. 77, 157–173 (2008) 2. Cheng, Q., Zhang, Q., Fu, P., Tu, C., Li, S.: A survey and analysis on automatic image annotation. Pattern Recogn. 79:242–259 (2018) 3. Zhang, D., Islam M.M., Lu, G.: A review on automatic image annotation techniques. Pattern Recogn. 45(1):346–362 (2012) 4. Lin, T.-Y., et al.: Microsoft coco: common objects in context. In: European Conference on Computer Vision. Springer, Cham (2014) 5. Redmon, J., et al.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016) 6. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017) 7. Koch, G., Zemel, R., Salakhutdinov, S.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop, vol. 2 (2015) 8. Deng, J., et al.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2009) 9. Xiao, J., et al.: Sun database: large-scale scene recognition from abbey to zoo. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE (2010) 10. Rosebrock, Adrian: Intersection over Union (IoU) for object detection-Py image search. Machine Learning, Object Detection, Tutorials (2016)

Development of Self Governed Flashing System in Automotives Using AI Technique N. Sankarachelliah, V. Rijith Kumar, P. Senthilram, S. Valai Ganesh, T. Selva Sundar, S. Godwin Barnabas, and S. Rajakarunakaran

Abstract We develop an intelligent system that automatically turns the indicator in an automobile (particularly a four-wheeler) ON or OFF by drawing input from sensors. Almost 50% of drivers fail to use indicators while changing lanes or overtaking a vehicle. This leads to vehicle accidents and may cause serious issues. The proposed system comprises a steering angle sensor and an optical sensor for vehicle detection and tracking, and it also incorporates OpenCV (an AI tool) for lane detection; the system is adaptable to the current situation. Keywords Vehicle · Indicator · Intelligence system · Automatic · OpenCV

N. Sankarachelliah (B) · V. Rijith Kumar · P. Senthilram · S. Valai Ganesh · T. Selva Sundar · S. Godwin Barnabas · S. Rajakarunakaran Ramco Institute of Technology, Rajapalayam, Tamil Nadu 626117, India e-mail: [email protected] V. Rijith Kumar e-mail: [email protected] P. Senthilram e-mail: [email protected] S. Valai Ganesh e-mail: [email protected] T. Selva Sundar e-mail: [email protected] S. Godwin Barnabas e-mail: [email protected] S. Rajakarunakaran e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_35


1 Introduction New technologies are continual developments and updates of technologies that already exist. Today, most of our surroundings are equipped with intelligent systems, which play a vital role in our day-to-day life; they are involved in almost every activity to improve performance and enhance human ability. An intelligent system is one which is programmed to think like humans and mimic their actions [1]. The Intelligent System (IS) can be described as a device that integrates information into machine-handling applications. Intelligent devices often perform complex automatic processes that are not feasible under the conventional programming model. A human–machine interface helps the driver to perform various tasks; for example, an intelligent system can be used to control the turn signals in automobiles. The current turn signalling system requires the driver to turn the signal on/off for the required turn. The remainder of the article is structured as follows. Section 2 describes the necessity of the device and Sect. 3 the problem identification. Existing systems are discussed in Sect. 4 and our proposed system in Sect. 5. The experimental findings are discussed in Sect. 6, followed by the conclusion and future work in Sect. 7.

2 Necessity of the Device The testing is carried out on a four-wheeler car. For making turns, changing direction, or passing another car, drivers normally use the indicators [2]. However, many drivers either fail to signal or do not turn the indicator off while changing from one lane to another [3, 4]. Though failing to signal may look like a minor violation, a lot of car crashes occur when turning without notice or when switching lanes.

3 Problem Identification A study conducted by automotive engineers shows that nearly 48% of drivers failed to turn the indicator off while changing lanes or making a turn, and similarly 25% failed to turn the indicator on while making a turn [4]. Further study shows that drivers skip the turn signal nearly 20 crore times a day, which amounts to nearly 7500 crore times a year. This creates more problems than mere disturbance while driving. These numbers show an alarming rate of increase in this problem, and it is happening globally. No solution has been made to date to address this issue, and the whole present system is dependent on driver input. A driver's mistake on the road threatens not only that driver's safety but also that of the following cars. A single act of neglect quickly impacts a variety of individuals.


4 Existing Systems 4.1 Conventional Turn Indicator The conventional turn indicator is a fully manually controlled system which requires the driver to turn the signal on/off for the required turn. Often, this system may delay a driver's response in triggering the turn signal. Some drivers do not trigger the turn signal because their hands need to be taken off the steering wheel to turn the light on. The approach is even more challenging for less experienced drivers.

4.2 ORVM ORVM stands for Outside Rear View Mirror, shown in Fig. 1. Indicators are mounted on the rear-view mirror to make sure a nearby driver can quickly locate the signal and respond correctly, particularly when a vehicle driving parallel to the car has already crossed the traditional indicator mounted on the windshield. This is also very suitable for a U-turn, because these signals are clearly visible from a perpendicular perspective. Fig. 1 ORVM

4.3 Automatic Vehicle Turn Indicator Using Speech Recognition (Not Yet on the Market) This system actuates the indicator by recognizing the driver's speech, which is done with the help of the Google Maps voice assistant [5].

5 Proposed Solution The proposed system is designed to automatically turn the indicator on/off and completely eliminate manual operation during overtaking and lane changes. Currently, the


proposed system focuses only on two-way traffic. The system draws input from various devices and sensors. The framework comprises three segments: (1) a camera data source, (2) a steering angle data source, and (3) a ranging sensor. When vehicle A leaves its current lane, it crosses a lane line, with or without a second vehicle B in proximity. Automatic activation may occur when the system processes data from the device on the first vehicle to determine whether or not it is crossing the lane [6]; if lane crossing by the first vehicle (A) is detected, the turn signal is activated. In the case of an unmarked lane, automatic activation may occur when the system processes data from the device on the vehicle to determine whether any vehicle (B) is in front of it and whether the distance between the two vehicles is decreasing; if so, the turn signal is activated. For prior indication, the velocity difference is inversely related to the distance: with a relative velocity between A and B of VB − VA = 40 km/h, the distance between A and B decreases and there is a higher chance of an overtake (Fig. 2). Fig. 2 Relative velocity
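Purely as an illustration of this prior-indication rule (the thresholds and the sensor interface are assumptions, not values from the paper):

```python
# Illustrative decision rule for the 'prior indication' case: if a slower
# vehicle is ahead and the gap is closing fast enough, arm the turn signal.
def should_signal_overtake(own_speed_kmph, lead_speed_kmph, gap_m,
                           min_closing_kmph=20, max_gap_m=50):
    closing = own_speed_kmph - lead_speed_kmph    # positive when the gap shrinks
    return closing >= min_closing_kmph and gap_m <= max_gap_m

# Example: approaching a vehicle 40 km/h slower with a 30 m gap -> indicator armed.
print(should_signal_overtake(100, 60, 30))        # True
```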

The introduction of an Autopilot [7] mode enhances the safety and comfort features of the vehicle. Autopilot is designed to support the driver with the most burdensome parts of driving. Autopilot adds new features to make the Tesla safer and more reliable over time and improves current functionality; it allows the car to automatically steer, accelerate, and brake within its path. Present Autopilot functions require active supervision by the driver. Block Diagram: Figure 3 shows a flowchart of an exemplary SGF embodiment.


Fig. 3 Process flow

6 Experimental Results The system block diagram was proposed, and software for lane detection and tracking was designed using OpenCV (an AI tool) [6]. The blinkers are automatically switched on/off by the controller, which requires a program (code) that was also designed for the implementation (Fig. 4). Fig. 4 Lane detection and tracking
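A minimal sketch of such an OpenCV lane-detection step (Canny edges plus a probabilistic Hough transform; the thresholds, region of interest, and file name are assumptions):

```python
# Sketch of OpenCV-based lane line detection on a single road frame:
# grayscale -> blur -> Canny edges -> keep the lower region -> Hough lines.
import cv2
import numpy as np

def detect_lane_lines(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blur, 50, 150)

    # Region of interest: a triangle covering roughly the lower half (assumed)
    h, w = edges.shape
    mask = np.zeros_like(edges)
    roi = np.array([[(0, h), (w, h), (w // 2, h // 2)]], dtype=np.int32)
    cv2.fillPoly(mask, roi, 255)
    masked = cv2.bitwise_and(edges, mask)

    # Probabilistic Hough transform returns candidate lane segments
    lines = cv2.HoughLinesP(masked, 1, np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=100)
    return [] if lines is None else [l[0] for l in lines]

frame = cv2.imread("road_frame.jpg")      # placeholder input frame
print(len(detect_lane_lines(frame)))
```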

This can help avoid the situation in which almost 50% of drivers fail to use indicators while changing lanes or overtaking another vehicle.

7 Conclusion and Future Work Compared to the conventional system, the proposed system is efficient and completely removes the need for manual intervention when a turn indication is required. Through this, accidents may be prevented. The project's future work is to extend our system to all


types of vehicles and to all-way traffic, and to switch to alternate vision solutions; it will then be practically tested. Acknowledgements We would like to express our sincere thanks and gratitude to our college management for providing excellent infrastructure, laboratory and computing facilities to complete this research work successfully.

References 1. https://www.igi-global.com/dictionary/intelligent-system/15045 2. Yusuf, M.M., Karim, T., Saif, A.S.: A robust method for lane detection under adverse weather and illumination conditions using convolutional neural network. In: Proceedings of the International Conference on Computing Advancements, pp. 1–8 (2020) 3. http://www.foxbusiness.com/features/2012/05/04/half-drivers-dont-use-turn-signals 4. Ponziani, R.: Turn signal usage rate results: A comprehensive field study of 12,000 observed turning vehicles. In: SAE Technical Paper. SAE International (2012). https://doi.org/10.4271/ 2012-01-0261 5. Divakar, A., Krishnakumar, S., et al.: Automatic vehicle turn indicator using speech recognition. Int. J. Recent Technol. Eng. (IJRTE) 8, 6697–6700 (2019) 6. https://towardsdatascience.com/tutorial-build-a-lane-detector-679fd8953132 7. https://www.tesla.com/autopilot

Comparison Between CNN and RNN Techniques for Stress Detection Using Speech Bageshree Pathak, Snehal Gajbhiye, Aditi Karjole, and Sonali Pawar

Abstract The profession of maintaining law and order is not an easy task; it is an inherently stressful job. Due to an increase in crime, policemen's working hours have also increased, resulting in poor psychological health and an increased risk of suicide. Hence, we are building software for the detection of stressed and non-stressed speech for policemen. We propose to develop a system for Central Police Research (CPR) using machine learning techniques, identifying whether a person is in a stressed or non-stressed condition using the Python language. We use two techniques, Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN), to detect stress in speech. Keywords Police · Machine learning · Feature extraction · Supervised learning · NN · MFCC · CNN · RNN

1 Introduction Speech is an expression of ideas and thoughts using articulate vocal sounds. Stress is a mental, physical, or emotional factor which cause mental or bodily tension. In this research work, we are using machine learning techniques to determine whether an individual is in stress or non-stress condition, given an audio recording. The database is generated in two ways for this research work which are database generated at B. Pathak · S. Gajbhiye · A. Karjole (B) · S. Pawar Department of Electronics and Telecommunications, MKSSS’s Cummins College of Engineering for Women, Pune, India e-mail: [email protected] B. Pathak e-mail: [email protected] S. Gajbhiye e-mail: [email protected] S. Pawar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_36


the CPR department, and voice samples recorded from the Internet. In the training phase, the recorded samples need to be converted into an appropriate format and provided to the preprocessor, which applies different processing techniques such as noise reduction, silence removal, etc. The preprocessing output is given to feature extraction; we have used Mel-frequency cepstral coefficients. The representation of the power spectrum of speech is called the Mel-frequency cepstrum (MFC). MFCC is one of the most efficient techniques for feature extraction, and the extracted features are further given to a supervised learning algorithm, that is, the CNN and RNN techniques. CNN generates fixed-size output by taking fixed-size input. RNN, on the other hand, can handle arbitrary input/output lengths, but would typically require much more data than CNN because it is more complex.

2 Literature Survey Kouzani and Kipli [1] present depression detection using MRI. To find whether a person is in a depressed or normal condition, the brain's structural MRI and its volumetric features are investigated to determine the features that contribute to more accurate depression detection. It gives accuracy up to 80%, but the drawback of this existing system is that it is costly. Lee and Kan [2] have researched depression detection using EEG. They state that in a recent study on this topic, electroencephalography (EEG) is used for analyzing brain waves, since the brain's electrical activity is measured by EEG. They obtained an accuracy of about 70%, but as a drawback, people usually do not prefer to go through EEG for the detection of depression; even though it gives high accuracy, people do not prefer this process. In [3], the main target of the review was to discover the occurrence of stress, anxiety, or depression in patients having pathologies affecting the voice. The pathologies focused on were MTD and PVFMD because of their presumed connection to the mental condition of the patients. In [4], feature extraction treats speech as the impulse response of the vocal tract convolved with the glottal excitation source signal, with IAIF used to remove vocal tract effects and the SWIPE algorithm used for fundamental frequency estimation and feature selection. The classification technique compares stressed and neutral PDF curves. The features extracted in that paper are formants, BFCC, PLP, MFCC, and energy. If this existing system had used neural network algorithms, it would surely have added to its accuracy. Alghowinem [5] uses a speech signal dataset and extracts linguistic and acoustic features such as energy, intensity, loudness, jitter, HNR, and MFCC, followed by a support vector machine classification technique. The databases used are the Berlin database, the eNTERFACE database, and an expressive speech database; the databases used for developing this system are highly efficient. In [6], the ORI-DB database is used with spectral feature extraction and a Support Vector Machine (SVM) classifier. The


accuracy obtained is between 80 and 84.5%, owing to prior knowledge of the emotions considered with the help of speech processing. In this system, higher accuracy is achieved because the dataset samples are known. Hence, a drawback of this research work is that only known samples of the dataset can be processed; the system is not efficient for unsupervised techniques.

3 Database Generation The database is generated in two ways for this research work.

3.1 Database Generated at CPR The database is generated at the CPR department in collaboration with Cummins College of Engineering and SNDT Arts and Commerce College. We visited the training center of the police to collect speech samples and took around 50 voice samples of officers, both stressed and non-stressed. We also provided questionnaires to all officers in the form of a Google form. Based on their answers to the questionnaires, the psychology department of SNDT validated the stressed and non-stressed speeches. This validation was done for a better understanding of the database and to cross-verify the result after obtaining the output as stressed or non-stressed.

3.2 Recording Speech Samples We have collected stressed and non-stressed speech samples from media coverage and YouTube videos. Considering recent incidents such as the nationwide pandemic situation of COVID-19, speeches of acid attack survivors, and victims of various other judicial cases, the stressed speech samples were collected from such situational videos. The non-stressed speech samples were taken from family members and relatives while they were in a non-stressed phase.

4 Methodology See Fig. 1.


Fig. 1 Block diagram

Fig. 2 Plot for speech signal

4.1 Signal Pre-processing Audio channels, sample rate, and bit depth are the audio properties that need preprocessing. The Librosa package in Python is used for audio processing. In this research work, we used Librosa's load() function for preprocessing. It has a default sampling rate of 22.05 kHz; it normalizes the data and flattens the audio channels to mono. Figure 2 shows the speech signal after preprocessing. The duration of each audio sample is set to 3 s.
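A minimal Python sketch of this step is given below, assuming the clips are stored as .wav files; the file name and the pad-or-trim policy used to enforce the 3 s duration are our own illustrative choices, not details taken from the paper.

```python
import librosa
import numpy as np

def preprocess(path, duration=3.0):
    # librosa.load resamples to 22.05 kHz by default, mixes down to mono,
    # and returns floating-point samples scaled to [-1, 1]
    signal, sr = librosa.load(path)
    target_len = int(sr * duration)
    # enforce a fixed 3 s length: pad short clips with zeros, trim long ones
    if len(signal) < target_len:
        signal = np.pad(signal, (0, target_len - len(signal)))
    else:
        signal = signal[:target_len]
    return signal, sr

signal, sr = preprocess("officer_01.wav")   # hypothetical file name
print(signal.shape, sr)                     # (66150,), 22050 for a 3 s clip
```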

4.2 Feature Extraction In this research work, we have used Mel-frequency cepstral coefficients. The representation of the power spectrum of a speech signal is called the Mel-frequency cepstrum (MFC). MFCC is considered to be among the most efficient techniques for feature extraction, and it is


Fig. 3 Plot for MFCC data

further given to a supervised learning algorithm, namely the CNN and RNN techniques. Figure 3 shows the plot of the MFCC data. We have taken 13 MFCC coefficients per frame for our dataset, and there are a total of 259 frames for each audio sample.
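The extraction itself reduces to a single librosa call, as in the sketch below; the hop length of 256 samples is an assumption chosen so that a 3 s clip at 22.05 kHz yields roughly the 259 frames mentioned above, and is not stated in the paper.

```python
import librosa

def extract_mfcc(signal, sr, n_mfcc=13, hop_length=256):
    # 13 MFCC coefficients per frame; librosa returns shape (n_mfcc, n_frames)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
    return mfcc.T  # (n_frames, n_mfcc), roughly (259, 13) for a 3 s clip

signal, sr = librosa.load("officer_01.wav", duration=3.0)  # hypothetical file name
features = extract_mfcc(signal, sr)
print(features.shape)
```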

4.3 Classification The classification of stressed data and non-stressed data has been done using two classifiers that are RNN and CNN.

4.3.1 RNN

RNN is a supervised machine learning technique and one of the types of artificial neural networks. Derived from feedforward neural networks, an RNN uses its internal state to process variable-length sequences of inputs. The term RNN is used to denote two classes of networks with a similar structure, one having infinite impulse response and the other having finite impulse response. Here, we have used Long Short-Term Memory (LSTM), which is an RNN architecture. Feedback connections are present in LSTM. It can process single data points as well as complete sequences of data such as video or speech. An LSTM unit commonly comprises a cell, an input gate, an output gate, and a forget gate. The cell can remember values over arbitrary time intervals, and the flow of data in and out of the cell is controlled by the three gates. LSTM systems are suitable for processing, classifying, and making predictions using time series data, as there can be delays of unknown duration between significant


time series events. While training a traditional RNN, a vanishing gradient problem is encountered; LSTM was developed to deal with this problem.
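For illustration only, a compact Keras model in this spirit is sketched below; the layer sizes, dropout rate, and optimizer are assumptions and not the authors' reported configuration, while the input shape (259, 13) and the 10 training epochs follow the description in this paper.

```python
import tensorflow as tf

def build_lstm_model(n_frames=259, n_mfcc=13):
    # a single LSTM layer over the MFCC frame sequence, followed by a
    # sigmoid output for the binary stressed / non-stressed decision
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_frames, n_mfcc)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_lstm_model()
model.summary()
# model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val))
```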

4.3.2 CNN

CNN is a supervised machine learning technique. A CNN involves an input layer, an output layer, and multiple hidden layers. The hidden layers contain a series of convolutional layers, and the ReLU layer is normally used as the activation layer. There are additional layers such as fully connected, pooling, and normalization layers; these are also hidden layers. The activation function and the final convolution are used to mask the inputs and outputs of the hidden layers. In a convolutional layer, stride, depth, and zero padding are the three hyperparameters that control the size of the output. The formula to calculate the number of neurons that fit in a given volume is [(i − k + 2p)/s] + 1, where i is the input size, k is the kernel (receptive field) size of the convolutional layer neurons, p is the zero padding, and s is the stride.
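The formula can be checked with a few lines of Python; this helper is purely illustrative and the example numbers are not taken from the paper.

```python
def conv_output_size(i, k, p, s):
    # number of neurons that fit along one dimension: [(i - k + 2p) / s] + 1
    return (i - k + 2 * p) // s + 1

print(conv_output_size(259, 3, 1, 1))  # 259: kernel 3 with padding 1 and stride 1 keeps the size
print(conv_output_size(259, 3, 0, 2))  # 129: no padding, stride 2 roughly halves it
```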

5 Results A loss value indicates how poor the model's prediction is on a single epoch. In the case of RNN, we have taken 10 epochs, and for CNN, 50 epochs have been taken. If the model's prediction is correct while validating the data, then the loss will be zero; otherwise, the loss will be higher. The model loss of training and testing data for CNN and RNN is shown in Figs. 4 and 5, respectively. We have taken the epoch on the X-axis and the loss on the Y-axis to show the model loss. We used 104 speech samples to train the model. For RNN, we got 85.58% accuracy, and for CNN, we got 81.73% accuracy. The confusion matrix is a table that describes the performance of the classifier using the results of the validation data. The confusion matrices shown in Tables 1 and 2 for the validation data are obtained from the two classifiers, RNN and CNN, respectively. We can verify the models' accuracy from the confusion matrix tables.


Fig. 4 Model loss of RNN algo

Fig. 5 Model loss of CNN algo

Table 1 Confusion matrix for RNN
 | Stress | Non-stress
Stress | 51 | 8
Non-stress | 7 | 38

Table 2 Confusion matrix for CNN
 | Stress | Non-stress
Stress | 54 | 5
Non-stress | 14 | 31


6 Conclusion In this research work, we have developed software to detect whether a person is under stress or not. Our research work is dedicated to the police department. For this research work, we used Python 3.7 and the Spyder IDE to implement the Python code. We generated the database by two means: firstly from the CPR department's trainee officers, and secondly by using media coverage and YouTube videos for stressed and non-stressed speech samples. The database which we collected from the CPR department has been verified by the psychology department of SNDT college. We have used the CNN and RNN artificial neural network techniques to get the final result. The testing accuracy obtained by the CNN technique is 81.73%, and that by the RNN technique is 85.58%. Therefore, from the results of both techniques, we conclude that RNN has greater accuracy than CNN on our database. Acknowledgements We would like to express our special thanks and gratitude to the Central Police Research (CPR) Department, who gave us the opportunity to do this project and to contribute toward the betterment of the health of police officials.

References 1. Kouzani, A.Z., Kipli, K.: Evaluation of feature selection algorithms for detection of depression from brain SMRI scans. Adv. Comput. Sci. Appl. Technol. (ACSAT) (2013) 2. Lee, P.F., Kan, D.P.X.: Decrease alpha waves in depression: an electroencephalogram (EEG) study. In: International Conference on Biosignal Analysis, Processing and Systems (ICBAPS) (2015) 3. Dietrich, M., Abbott, K.V., Gartner-Schmidt, J., Rosen, C.A.: The frequency of perceived stress, anxiety, and depression in patients with common pathologies affecting voice. J. Voice 22(4) (2008) 4. Simantiraki, O., Giannakakis, G., Pampouchidou, A.: Stress detection from speech using spectral slope measurement. Pervasive Comput. Paradig. Mental Health (2016) 5. Alghowinem, S.: A comparative study of different classifiers for detecting depression in speech: multi classifier system. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2013) 6. Stolar, M.N., Lech, M., Allen, N.B., Stolar, S.J.: Detection of Adolescent depression from speech using optimized spectral roll-off parameters. Biomed. J. 2, 10 (2018) 7. Fung, P., Zuo, X., Li, T.: A multilingual database of natural stress emotion. In: Proceeding of the 8th International Conference on Language Resources and Evaluation (LREC’12) (2012) 8. Hawila, S., Tomba, K., Dumoulin, J., Khaled, O.A., Mugellini, E.: Stress detection through speech analysis. In: Proceeding of the 15th International Joint Conference on e-Business and Telecommunication (ICETE) (2018)

Finding the Kth Max Sum Pair in an Array of Distinct Elements Using Search Space Optimization Deepak Ahire , Smriti Bhandari , and Kiran Kamble

Abstract The algorithm aims to find the Kth max sum pair of two indices of an array of N (N ≥ 2) distinct elements [a1 , a2 , a3 , …, an ]. If the sum of values represented by the 2 indices of a single pair in array A is the same as that of any other pair, i.e., if P(i, j) and P(m, n) are 2 distinct pairs and if (A[i] + A[j] = A[m] + A[n]), then the pair containing the index which represents the maximum of all 4 values represented by indices of the 2 pairs in the array obtains the highest priority, i.e., if (A[m]>A[i]>A[n]>A[j]), then the pair containing the index m obtains the highest priority. The purpose of this algorithm is to optimize the computation of recommendations on real time platforms. At the time of making a purchase on e-commerce platforms, with millions of options available in the product catalog, the algorithm can be used to recommend the best complementary product that can be bought as a pair with the main product or two all together different products of same type as of main product which can be bought as a combo or a pair. Not only the top recommendations, but random recommendations are also necessary so that the customers get a good breadth or variety of the available products in the catalog. In this paper, we propose an algorithm which can be used to address both the scenarios in real time and conclusively, it is evident that the time and space complexities are independent of K.

All the authors have an equal contribution towards work. D. Ahire (B) Walchand College of Engineering, Sangli, Maharashtra, India e-mail: [email protected] S. Bhandari Department of Computer Science and Engineering, Annasaheb Dange College of Engineering and Technology, Ashta, Maharashtra, India e-mail: [email protected] K. Kamble Department of Computer Science and Engineering, Walchand College of Engineering, Sangli, Maharashtra, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_37


Keywords Algorithm · k-max-sum-pair · Searching · Sorting · Product-recommendation · Real-time-searching · Search-space-optimization

1 Introduction Searching is presently one of the most tedious tasks. With increases in the amount of data every second, searching can consume a substantial amount of CPU time depending upon the data organization and searching mechanisms used. Not only the processors, but it also taxes the users performing online search [1, 2]. More importantly, regarding the retrieval of data in real time, for example, in the case of an e-commerce platform, Amazon found that a delay of a fraction of a second can cost several percentage points of sales, as discussed in [3]. A Harvard business review discusses how the design of the product page also affect the online sales [4]. In addition to processing time and page design, the consumer demands are also affected by the availability of substitutes and complements, as discussed in [5]. Not only swiftly rendering customers’ requirements but, sales promotion is also important to change their perception and purchasing behaviour [6]. One of the most important incentives of sales promotion is product bundling or combo [7]. Customers generally tend to buy combos instead of one main product if offered at the same price or lower price. According to the study discussed in [8], it was found that the customers placed a perceived value on combo meals, even if it would cost the same when choosing items a la carte. People also prefer combo meals even there is no discount [8]. Results reported in [9], on basis of experiments provided empirical evidence that customer preferred bundles in circumstances when the searching cost was reduced by the availability of the combos as a choice. As customers expect the swift and graceful experience because of the fact that humans are generally bad at choosing when plenty of options are available, described in [10], loading all possible recommendations on the page is not a feasible option as it would also consume a lot of time. Instead, the top K matching combos can be recommended to the customer, which is analogous to the “Top N Video Ranker” technique used by Netflix as discussed in [11]. Not only the top K matching recommendations, but completely random recommendations are also useful for customers so that they get a good breadth or variety of the available products in the catalog as discussed in [11, 12]. The random recommendations, not only provide a good breadth of available products, but act as a choice for customers in terms of price, brand, current situations, publicity, and many more. These can also be used to promote new, popular, non-recent and non-popular items which would have not been found out by the users as described in [13]. A customer may like the recommended combos, but may reject it on the basis of price as discussed in [14]. Customers also buy combos with high cost than their preferred price limit if they get a better product quality, brand or for something extra which they might not have considered while buying in the first place [15]. Therefore, computing and suggesting random recommendations are also crucial with the top K recommendations taking swiftness into account. In this paper, we propose an algorithm which can be used


to address both the scenarios in real time. At the time of making a purchase on e-commerce platforms, with millions of options available in the product catalog, the algorithm can be used to recommend the best complementary product that can be bought as a pair with the main product, or two altogether different products of the same type as the main product that can be bought as a combo or a pair. Once the customer filters the products and adds the main product (the product of his/her choice) into the cart, the algorithm proposed in this article comes into play. It has to suggest pairs of products from a list or collection of relevant or similar products with respect to the main product. This list of relevant products is already computed by the recommendation engine on the basis of several metrics such as the commonality index, which represents the relatedness of an item to other relevant items [16]. The list is not the same at all times, as the engine is constantly learning in the backend considering several factors; but, for a particular instant of time, this list of relevant products can be used in real time to suggest pairs of products or combos. This problem can be solved by finding the Kth Max Sum Pair in an array of distinct elements. Here, for example, the elements of the array can represent the commonality indices assigned to other products with respect to the main product. For a pair or combo of related products, the commonality index can be computed as the summation of the commonality indices of the individual products. Computing the Kth Max Sum Pair in real time is tricky, as this task is both compute and space intensive, because there can be millions of products related to the main one and the maximum number of possible pairs or combos is K_max = \binom{N}{2} = N(N − 1)/2, where N is the size of the list of related products. This paper is organized as follows: Section 2 describes the abbreviations used, an example to explain the problem statement, and the related work. The proposed approach is presented in Sect. 3 with an algorithm and complexity analysis. The experimental setup, results, and a detailed discussion of the results for the implementation with different datasets are reported in Sect. 4. Finally, Sect. 5 provides the conclusion.

2 Problem Statement 2.1 Abbreviations Table 1 lists the abbreviations used in this manuscript.

2.2 Example The following is an example that aims to explain the use case:


Table 1 Abbreviations used in this manuscript
Abbreviation | Definition
A | Input array (0-based indexing is used)
MAX_SUM | Sum of the pair having the maximum sum (A[N − 1] + A[N − 2]) after A is sorted
MIN_SUM | Sum of the pair having the minimum sum (A[0] + A[1]) after A is sorted
N | Size of the array
P(i, j) | Pair of indices i and j, where j > i, i < N, and j < N
P-QUE | Priority queue for holding pairs P(i, j) and obeying PRI-1 and PRI-2
PRI-{i} | Priority for maintaining pairs in the queue; here, i represents the priority number
PRI-1 | Non-increasing order of pair sum
PRI-2 | If the sum of the values represented by the 2 indices of a single pair in array A is the same as that of any other pair, for example, if P(i, j) and P(m, n) are 2 distinct pairs and (A[i] + A[j] = A[m] + A[n]), then the pair containing the index which represents the maximum of all 4 values represented by the indices of the 2 pairs obtains the highest priority; for example, if A[m] > A[i] > A[n] > A[j], then the pair containing the index m obtains the highest priority
S | Set for holding unique pairs P(i, j)
TARGET_PAIR | The required Kth pair

Consider an array of distinct elements, A = {1, 2, 3, 4}, which represents the list of commonality indices of the products. The maximum value, 4, belongs to the main product, and the other three belong to the related products. Thus, the set of possible pairs (representing pairs of indices of the array A, sorted according to PRI-1 and PRI-2, as mentioned in Table 1) is {(2, 3), (1, 3), (0, 3), (1, 2), (0, 2), (0, 1)}. Notice that the 3rd pair and the 4th pair in this ordering represent an equal sum, i.e., (A[0] + A[3]) = (A[1] + A[2]), but the 3rd pair obtains the higher priority as A[3] > A[2] > A[1] > A[0]. Therefore, if K = 3, then the answer is P(0, 3), which means that the combo of the main product and the product having a commonality index of 1 (as A[0] = 1) stands 3rd in the list of combos with respect to PRI-1 and PRI-2.
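This ordering can be reproduced with a short brute-force check in Python; the snippet below only verifies the example (it enumerates all pairs, which is the naive approach discussed in Sect. 2.3, not the proposed algorithm), and the tie-break key is one way of encoding PRI-2.

```python
from itertools import combinations

A = [1, 2, 3, 4]
# PRI-1: non-increasing pair sum; PRI-2: ties broken by the larger maximum element
pairs = sorted(combinations(range(len(A)), 2),
               key=lambda p: (-(A[p[0]] + A[p[1]]), -max(A[p[0]], A[p[1]])))
print(pairs)         # [(2, 3), (1, 3), (0, 3), (1, 2), (0, 2), (0, 1)]
print(pairs[3 - 1])  # K = 3 -> (0, 3)
```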

2.3 Related Work The algorithm devised in this article is inspired by a similar use case based on two arrays. The use case aims to find the first K maximum sum pairs from all the possible sum pairs using the two given arrays, as discussed in [17–20]. For our scenario, we


need an approach which works for a single array. The naive approach is to compute a set of all possible pairs P(i, j) and sort them according to PRI-1 and PRI-2. After sorting, the first K maximum sum pairs are returned. There are \binom{N}{2} distinct pairs that can be formed from a list of N elements. Therefore, for sorting, it will take O(\binom{N}{2} * log(\binom{N}{2})) = O(N^2 * log(N)) time complexity and O(\binom{N}{2}) space complexity. A more optimised approach is to limit the search space as discussed in [17–20]. An identical approach was used to devise an optimised algorithm which works for a single array, provided in Algorithm 1.

Algorithm 1 Find_Kth_Max_Sum_Pair(A, K)
Finds the Kth Max Sum Pair in an array of distinct elements
Pre: A is the array containing distinct elements, K is a constant
Post: Array A is sorted
Return: The TARGET_PAIR
1: Sort the array A in a non-decreasing order.
2: Enqueue the pair having MAX_SUM, i.e., P(N−2, N−1), into P-QUE.
3: Initialise temporary variable dequeue_count = 0.
4: Initialise S to an empty set (to avoid insertion of duplicate pairs into P-QUE).
5: Loop (P-QUE is not empty and dequeue_count ≠ K−1) do
   5.1: Dequeue the P-QUE front item (let it be P(i, j)).
   5.2: Increment dequeue_count by 1.
   5.3: Insert the dequeued pair P(i, j) into set S.
   5.4: Enqueue new pair P(i−1, j), if not present in the set S, if i−1 ≥ 0 and (i−1) ≠ j.
   5.5: Enqueue new pair P(i, j−1), if not present in the set S, if j−1 ≥ 0 and (j−1) ≠ i.
   End Loop
6: Return P-QUE front (the front item is the required TARGET_PAIR).
End Algorithm 1
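A minimal Python sketch of Algorithm 1 is shown below; it uses heapq as the priority queue and deduplicates pairs at enqueue time, which is one reasonable reading of the pseudocode rather than the authors' exact implementation.

```python
import heapq

def find_kth_max_sum_pair_v1(A, K):
    """Algorithm 1 sketch: expand the frontier of candidate pairs with a priority queue."""
    A = sorted(A)
    N = len(A)
    # heapq is a min-heap, so negate the keys: PRI-1 (pair sum), PRI-2 (largest element)
    key = lambda i, j: (-(A[i] + A[j]), -max(A[i], A[j]))
    start = (N - 2, N - 1)              # the MAX_SUM pair
    pq = [key(*start) + (start,)]
    seen = {start}                      # the set S: avoid enqueuing a pair twice
    for _ in range(K - 1):
        _, _, (i, j) = heapq.heappop(pq)
        for a, b in ((i - 1, j), (i, j - 1)):
            if a >= 0 and b >= 0 and a != b and (a, b) not in seen:
                seen.add((a, b))
                heapq.heappush(pq, key(a, b) + ((a, b),))
    return pq[0][2]                     # indices into the sorted array

print(find_kth_max_sum_pair_v1([1, 2, 3, 4], 3))  # (0, 3)
```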

Rather than computing all the possible pairs, the focus is to generate only the first K Max Sum Pairs. We are enqueuing the pair once and also dequeuing it from the queue. Therefore, for each pair, there are 2 operations (enqueue into and dequeue from P-QUE). Therefore, for K number of pairs, the time complexity is equal to O(K * 2 * log(K)), that is, O(K * log(K)), as the maximum number of pairs possible in this case is of the squared order of the size of the input (K_max = \binom{N}{2}). The factor of log(K) is generated because of the max heap operations. Gerald's O(1) time priority queue [21] is significant in reducing the factor of log(K) and thus finally reducing the time complexity to O(K). A. Mirzaian and E. Arjomandi devised an O(N) time algorithm for a similar use case for selecting the Kth smallest element in a matrix which is the cartesian sum of 2 sorted vectors of real numbers, each of size N [22]. For our scenario, we have to compute the Kth Max Sum Pair using a single array. For K = 1, we can just find the maximum sum pair in the given array, and the pair can be computed in O(N) time complexity and O(1) space complexity as discussed in [23]. The case for K = \binom{N}{2} is equivalent to finding the minimum sum pair in the given array, and the pair can be computed in O(N) time complexity and O(1) space complexity as discussed in [24].


Table 2 All pairs and corresponding pair sums from the example in Sect. 2.2
Pairs | Corresponding pair sum | Number of pairs (having pair sum ≥ corresponding pair sum)
(2, 3) | 7 | 1
(1, 3) | 6 | 2
(0, 3) | 5 | 4
(1, 2) | 5 | 4
(0, 2) | 4 | 5
(0, 1) | 3 | 6
The pair sum is the sum of the values in array A at the indices represented by the pair, and the corresponding pair sum is the pair sum of the pair mentioned in the respective row

3 Proposed Approach Even the optimized version, that is, Algorithm 1, has both time and space complexities of the squared order of N (K_max = \binom{N}{2}). It would consume a substantial amount of CPU time and hence lead to a greater user response time when applied to a real-life scenario where the basic size of the input starts with a value greater than or equal to a million. Therefore, we devised an algorithm that finds the answer in time and space complexities that are independent of K. Before we dive into the approach, the following facts are worth examining:
• We know that the TARGET_PAIR will have a pair sum, which we will call TARGET_SUM.
• MIN_SUM ≤ TARGET_SUM ≤ MAX_SUM.
• Table 2 forms the basis of the development of the approach.
• The number of pairs with a pair sum greater than or equal to a given sum can be computed in time complexity O(N * log(N)). This can be achieved by subtracting the number of pairs with pair sum less than the given sum from the total number of possible pairs. The number of pairs with pair sum less than the given sum can be computed in time complexity O(N) for a sorted array as discussed in [25]; for an unsorted array, it takes an extra factor of log(N) to sort the array (a short sketch of this counting step is given below).
The key is to find the greatest TARGET_SUM that follows Eq. 1:
Number of pairs (having pair sum ≥ TARGET_SUM) ≥ K   (1)

Having computed the greatest TARGET_SUM, we know that there are at most N/2 such pairs that have a pair sum equal to the greatest computed TARGET_SUM. Thus, our new search space is now reduced to a size of at most N/2. The TARGET_PAIR lies in the newly generated search space.
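For illustration, the counting step referred to above can be written as a two-pointer scan over the sorted array; the helper below is our own sketch, not code from [25].

```python
def count_pairs_at_least(A_sorted, s):
    """Number of pairs (i, j), i < j, with A_sorted[i] + A_sorted[j] >= s, in O(N)."""
    n = len(A_sorted)
    total = n * (n - 1) // 2
    below, i, j = 0, 0, n - 1          # count pairs with sum < s, then subtract
    while i < j:
        if A_sorted[i] + A_sorted[j] < s:
            below += j - i             # A_sorted[i] pairs with every element up to index j
            i += 1
        else:
            j -= 1
    return total - below

print(count_pairs_at_least([1, 2, 3, 4], 5))  # 4, matching Table 2
```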


We need an offset to find the Kth pair. We cannot directly return the Kth pair in the new search space, so we need to subtract the count of pairs having a pair sum greater than the greatest computed TARGET_SUM. Therefore, let us define a function F(givenSum), which returns the number of pairs having a pair sum ≥ givenSum. Then,

New Offset (K_New) = K − F(greatest computed TARGET_SUM + Δ)   (2)

Note that, in Eq. 2, a very small value Δ is added to the greatest computed TARGET_SUM before it is passed to the function F, as we want the count of pairs having a pair sum strictly greater than the greatest computed TARGET_SUM. Note: Δ can take any value appropriate to the datatype of the input array elements. For example, if the datatype of the array is integer, then Δ = 1, or, if the datatype of the array is floating-point, Δ = 0.001. Finally, the K_New-th pair in the new search space is the required TARGET_PAIR.

3.1 Proposed Algorithm The proposed approach to solve the problem is provided in Algorithm 2.

Algorithm 2 Find_Kth_Max_Sum_Pair(A, K)
Finds the Kth Max Sum Pair in an array of distinct elements
Pre: A is the array containing distinct elements, K is a constant
Post: Array A is sorted
Return: The TARGET_PAIR
1: Sort the array A.
2: Assign MIN_SUM = A[0] + A[1].
3: Assign MAX_SUM = A[N−2] + A[N−1].
4: Binary Search on TARGET_SUM (lower_bound = MIN_SUM, upper_bound = MAX_SUM, K):
   4.1: Find the greatest TARGET_SUM, such that F(TARGET_SUM) ≥ K
   End Binary Search on TARGET_SUM.
5: Generate the new search space having pairs that have a pair sum equal to the greatest computed TARGET_SUM.
6: Calculate the New Offset (K_New) = K − F(greatest computed TARGET_SUM + Δ).
7: Return the K_New-th pair from the new search space.
End Algorithm 2
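A hedged Python sketch of Algorithm 2 for integer inputs (so Δ = 1) is given below; the helper names are ours, the pair counting reuses the two-pointer idea sketched in Sect. 3, and the code assumes 1 ≤ K ≤ N(N − 1)/2.

```python
def find_kth_max_sum_pair_v2(A, K):
    """Algorithm 2 sketch: binary search on the target pair sum (integer inputs, delta = 1)."""
    A = sorted(A)
    n = len(A)

    def count_at_least(s):
        # pairs (i, j), i < j, with A[i] + A[j] >= s, via a two-pointer scan
        total, below, i, j = n * (n - 1) // 2, 0, 0, n - 1
        while i < j:
            if A[i] + A[j] < s:
                below += j - i
                i += 1
            else:
                j -= 1
        return total - below

    # step 4: greatest TARGET_SUM with F(TARGET_SUM) >= K
    lo, hi = A[0] + A[1], A[-1] + A[-2]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_at_least(mid) >= K:
            lo = mid
        else:
            hi = mid - 1
    target_sum = lo

    # step 5: new search space = pairs whose sum equals target_sum,
    # recorded in PRI-2 order (the maximum element decreases as j moves left)
    equal_pairs, i, j = [], 0, n - 1
    while i < j:
        s = A[i] + A[j]
        if s == target_sum:
            equal_pairs.append((i, j))
            i, j = i + 1, j - 1
        elif s < target_sum:
            i += 1
        else:
            j -= 1

    # steps 6-7: offset into the new search space
    k_new = K - count_at_least(target_sum + 1)
    return equal_pairs[k_new - 1]

print(find_kth_max_sum_pair_v2([1, 2, 3, 4], 3))  # (0, 3)
```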

3.2 Complexity Analysis Time Complexity: For sorting the array, it takes O(N * log(N)), and for the binary search and finding the greatest TARGET_SUM, O(N * log(MAX_SUM − MIN_SUM)). Therefore, Time Complexity = O(N * log(N)) + O(N * log(MAX_SUM − MIN_SUM)) = O(N * max(log(N), log(MAX_SUM − MIN_SUM))).


Space Complexity: For the generation of New Search Space: O(N/2). Therefore, Space Complexity = O(N).

4 Experimental Setup, Results and Discussion Both the algorithms were implemented, and multiple tests were performed using the environment mentioned in Table 3. Tables 4, 5, 6, 7, 8 and 9 describe the results of the tests carried out on an unsorted array containing N distinct elements having values in the range (1, N). Test 1 is a comparison of average runtimes for both the algorithms.

Average Runtime = (\sum_{i=1}^{K} runtime to compute the ith pair) / K   (3)

For Algorithm 1,

\sum_{i=1}^{K} runtime to compute the ith pair = 2 * (1 log(1) + 2 log(2) + 3 log(3) + ... + K log(K)) ≈ K^2 * log(K) ≈ O(N^4)   (4)

From Eqs. 3 and 4, it is evident that for Algorithm 1 the time complexity to compute the average runtime is O(N^4); therefore, we have limited the input size for comparison to 200, to compute the average runtime in polynomial time. For Test 2, the average runtime for greater values of N was computed by limiting K, that is, 1 ≤ K ≤ N. In Test 3, the average runtime was computed for the first half of the range of values of K, that is, 1 ≤ K ≤ N(N − 1)/4, whereas for Test 4, it was for the second half, that is, N(N − 1)/4 ≤ K ≤ N(N − 1)/2. Test 5 was carried out for a constant input size of N = 10^4, so that the average runtime could be computed for greater values of K in polynomial time. Test 6 was carried out solely on Algorithm 2 for greater values of both N and K.

Table 3 Testing environment for both algorithms
Environment type | Value
Testing platform | Google cloud
Machine type | n1-standard-4
CPU platform | Intel Haswell
OS image | Debian GNU/Linux 10 (buster)
Implementation language | C++
RAM | 15 GB
Number of vCPUs | 4
Runtime calculation | Using chrono library in C++

Table 4 Results for test 1 (average runtime in seconds)
Input size (N) | Algorithm 1 | Algorithm 2 | % Reduction in average runtime
2 | 0.000005 | 0.000005 | 0
10 | 0.000068 | 0.000006 | 91.176
15 | 0.000176 | 0.000010 | 94.318
25 | 0.000560 | 0.000024 | 95.714
50 | 0.002677 | 0.000070 | 97.385
75 | 0.006618 | 0.000135 | 97.96
100 | 0.012644 | 0.000185 | 98.537
105 | 0.013935 | 0.000192 | 98.622
110 | 0.015522 | 0.000204 | 98.686
120 | 0.018851 | 0.000223 | 98.817
130 | 0.022641 | 0.000243 | 98.927
150 | 0.031912 | 0.000342 | 98.928
200 | 0.059142 | 0.000470 | 99.205

Table 5 Results for test 2 (average runtime in seconds)
Input size (N) | Algorithm 1 | Algorithm 2
200 | 0.000439 | 0.000467
400 | 0.000944 | 0.001175
1000 | 0.002737 | 0.003605
2000 | 0.005621 | 0.008752
5000 | 0.015746 | 0.029666
10000 | 0.034195 | 0.068591

Table 6 Results for test 3 (average runtime in seconds)
Input size (N) | Algorithm 1 | Algorithm 2 | % Reduction in average runtime
10 | 0.000031 | 0.000006 | 80.645
50 | 0.001219 | 0.000074 | 93.929
100 | 0.005690 | 0.000186 | 96.731
150 | 0.013735 | 0.000344 | 97.495
200 | 0.025961 | 0.000475 | 98.170
250 | 0.042503 | 0.000599 | 98.591


Table 7 Results for test 4 (average runtime in seconds)
Input size (N) | Algorithm 1 | Algorithm 2 | % Reduction in average runtime
10 | 0.000102 | 0.000007 | 93.137
50 | 0.003794 | 0.000071 | 98.129
100 | 0.017793 | 0.000186 | 98.955
150 | 0.044392 | 0.000344 | 99.225
200 | 0.084147 | 0.000473 | 99.438

Table 8 Results for test 5 (average runtime in seconds, N = 10^4)
K | Algorithm 1 | Algorithm 2 | % Reduction in average runtime
10^2 | 0.001313 | 0.059648 | −4442.879
10^3 | 0.006624 | 0.067545 | −919.701
10^4 | 0.063618 | 0.067740 | −6.479
10^5 | 0.732708 | 0.077039 | 89.486
10^6 | 9.744984 | 0.073913 | 99.242
10^7 | 126.241947 | 0.070026 | 99.945
(10^4)(10^4 − 1)/2 = K_max | 717.129630 | 0.009545 | 99.999

Table 9 Results for test 6: Algorithm 2 runtime in seconds
Input size (N) | K = 10^6 | K = 10^7 | K = 10^8 | K = 10^9 | K = 10^10 | K = 10^11
10^4 | 0.079911 | 0.075864 | NA | NA | NA | NA
10^5 | 1.172436 | 1.148921 | 1.022144 | 1.113623 | NA | NA
10^6 | 14.508822 | 14.440534 | 15.067522 | 16.353549 | 15.003332 | 15.336468

From Table 4, it is evident that, for N ≤ 200, there was more than 90% reduction in average runtime as compared to Algorithm 1. From Test 2 and Test 5, it is evident that Algorithm 1 performs better than Algorithm 2, when 1 ≤ K ≤ N . From Tables 6 and 7, it is evident that, % reduction in average runtime for higher values of K is greater as compared to the lower values. For values of K ≥ N , the Algorithm 2 outperforms Algorithm 1. Table 8 depicts that as K increases, the % reduction in runtime for Algorithm 2 increases. Test 5 and Test 6, act as a proof that it is possible to calculate


the pairs with lower priorities or higher in rank, in real time using Algorithm 2, giving them a fair chance to be recommended to the customer with saving more than 89% of average CPU time as compared to Algorithm 1.

5 Conclusion In this manuscript, we addressed the importance of both top K and random recommendations. We discussed why the existing algorithms have a high response time and can’t provide a fair chance to the pairs with lower priority. We proposed an optimized algorithm that finds the Kth max sum pair in time and space complexities independent of K and proved why it is a feasible real time searching option by carrying out various tests for different values of N, K and an input array of commonality indices thus, supporting a catalog of a million products.

References 1. Sohail, S., et al.: Product recommendation techniques for ecommerce—past, present and future. Int. J. Adv. Res. Comput. Eng. Technol. 1(9), 219–225 (2012) 2. Gayle, L.: How Product Bundling Can Boost Your E-Commerce Sales. https://returnonnow. com/2018/08/how-product-bundling-boost-ecommerce/ (2018) 3. Einav, Y.: Amazon Found Every 100ms of Latency Cost them 1% in Sales. https://www. gigaspaces.com/blog/amazon-found-every-100ms-of-latency-cost-them-1-in-sales (2019) 4. Harmeling, C. et al.: How to Design Product Pages that Increase Online Sales. https://hbr. org/2019/11/how-to-design-product-pages-that-increase-online-sales 5. Rousu, M., et al.: The effects of selling complements and substitutes on consumer willingness to pay: evidence from a laboratory experiment. Can. J. Agric. Econ. Revue canadienne d’agroeconomie. 56(2), 179–194 (2008) 6. Ai, W., Yazdanifard, R.: The review of how sales promotion change the consumer’s perception and their purchasing behavior of a product. Glob. J. Manage. Bus. Res. E Mark. 15(5), 32–37 (2015) 7. Foubert, B.: Product Bundling: Theory and Application. University of Antwerp, Faculty of Applied Economics, Working Papers (1999) 8. Sharpe, K., Staelin, R.: Consumption effects of bundling: consumer perceptions, firm actions, and public policy implications. J. Pub. Policy Mark. 29(2), 170–188 (2010) 9. Harris, J., Blair, E.: Consumer preference for product bundles: the role of reduced search costs. J. Acad. Mark. Sci. 34(4), 506–513 (2006) 10. Schwartz, B.: The Paradox of Choice. Harper Perennial, New York (2004) 11. Gomez-Uribe, C., Hunt, N.: The netflix recommender system. ACM Trans. Manage. Inf. Syst. 6(4), 1–19 (2016) 12. What the difference between global and random recommendations?. https://support. shippingeasy.com/hc/en-us/articles/115005400683-What-the-difference-between-globaland-random-recommendations 13. Hopfgartner, F.: News recommendation in real-time. In: Smart Information Systems: Computational Intelligence for Real-Life Applications, pp. 169–170. Springer International Publishing (2015)


14. Zhao, Q., et al.: E-commerce recommendation with personalized promotion. In: Proceedings of the 9th ACM Conference on Recommender Systems—RecSys ’15, pp. 19–226 (2015) 15. Shanthi, R.: Customer Relationship Management. MJP Publisher (2019) 16. Linden, G., et al.: Collaborative Recommendations Using Item-to-Item Similarity Mappings (2020) 17. Agrawal, N., Sharma, S.: K maximum sum combinations from two arrays—GeeksforGeeks. https://www.geeksforgeeks.org/k-maximum-sum-combinations-two-arrays/ 18. Gangwar, A.: N Max Sum Pairs. https://discuss.codechef.com/t/n-max-sum-pairs/14769 19. Liu, S.: N Max Pair Combinations. https://shengqianliu.me/heaps-and-maps/n-max-paircombinations 20. K maximum sum combinations from two arrays—Tutorialspoint.dev—TutorialsPoint.dev. https://tutorialspoint.dev/data-structure/heap-data-structure/k-maximum-sumcombinations-two-arrays 21. Paul, G.: A complexity O(1) priority queue for event driven molecular dynamics simulations. J. Comput. Phys. 221(2), 615–625 (2007) 22. Mirzaian, A., Arjomandi, E.: Selection in X + Y and matrices with sorted rows and columns. Inf. Process. Lett. 20(1), 13–17 (1985) 23. Mittal, N.: Find the Largest Pair Sum in an Unsorted Array—GeeksforGeeks. https://www. geeksforgeeks.org/find-the-largest-pair-sum-in-an-unsorted-array/ 24. Ojha, D.: Smallest Pair Sum in an array—GeeksforGeeks. https://www.geeksforgeeks.org/ smallest-pair-sum-in-an-array/ 25. Mittal, N.: Count Pairs in a Sorted Array Whose Sum is Less than x—GeeksforGeeks. https:// www.geeksforgeeks.org/count-pairs-array-whose-sum-less-x/

Dynamic Trade Flow of Selected Commodities Using Entropy Technique Sharmin Akter Milu, Javed Hossain, and Ashadun Nobi

Abstract The global entropy and uniformity of three major commodities that exported most worldwide and all commodities combinedly were observed from 1995 to 2018. It is found that global entropy and uniformity of manufactured goods chiefly classified by material are higher than two other products machinery, transport equipment, and crude materials, inedible, except fuels, and it was fluctuating in two products. In 2018, they fall remarkably in manufactured goods and crude materials, inedible, except fuels. Further, local entropy and number of trade partners of two world’s most influencing countries China and USA were investigated and compared it with world’s average value of local entropy and trade partners. It is seen that local entropy and trade partners of two countries are much higher than world’s average value except some early cases of local entropy of China. It is also observed that when local entropy and number of partners both countries declined together, world’s average values fall significantly. Keywords International trade · Export · Global entropy · Uniformity · Trade partnership

S. A. Milu · J. Hossain · A. Nobi (B) Department of Computer Science and Telecommunication Engineering (CSTE), Noakhali Science and Technology University, Sonapur, Noakhali 3814, Bangladesh e-mail: [email protected] S. A. Milu e-mail: [email protected] J. Hossain e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_38

1 Introduction Economic transactions are made among the countries for the purpose of providing a nation with commodities it lacks in exchange for those commodities that it produces in abundance. This is called the export-import relationship or worldwide trade [1–4]


relationship. It is established among nations depending on their economic, political, and social relationships and can be thought of as the driving force for the economic growth of a country. Globalization and regionalization have occurred during the last two decades in terms of international trade. In recent years, by viewing the international trade system as an interdependent complex network, a huge amount of fast-growing literature has been built, where nodes represent countries and edges represent trade relationships [5–11]. In the weighted network approach, each link carries the trade value, which is known as its weight. The weight is the amount that a country trades with a partner country in a year [12]. Trade volumes are also broadly distributed [7, 13, 14]. To observe the number of trade partners over time, how trade values are distributed with the diverging number of trade partners, and how different the volumes are among the trading countries, a study has been carried out with the entropy technique [15]. In this paper, we have worked with three main products that are exported extensively and associated with a large number of trade partnerships. The products are: (1) manufactured goods chiefly classified by material, (2) machinery and transport equipment, and (3) crude materials, inedible, except fuels. We also took the export value aggregated over all the products, which we name all commodities. The aim of the study is to find the global entropy for these products all over the world and how uniform the trade was. We also found the local entropy and the number of trade partners for the two countries most involved in international trade, China and the USA, and compared them with the average local entropy and trade partners of the whole world for the three individual products and all commodities.

2 Data Analysis The trade data used here is from the United Nations (UN) COMTRADE database. We studied the whole concept for a time period from 1995 to 2018, about 24 years, for 168 countries. To build an international trade network, we constructed a matrix where each cell contains the trade value that a country exported to a partner country. If there is a trade relationship between any two countries (e.g., France, India, Japan, China, USA), then we set the trade value, and otherwise 0.

3 Methods 3.1 Global Entropy We have worked with the exported value for measuring the global entropy. We took the values Y_ij(t) ≥ 0; here, trade flows from country i to j in the year t.


We used the normalized value of the exported volume, calculated as y_ij(t) = Y_ij(t)/Y(t), where the total trade value is Y(t) = \sum_{k,l}^{N} Y_{kl}(t). Then, the global entropy for each commodity in a given year is determined as [16–18]

S(t) = -\sum_{i,j} y_{ij}(t) \log_2 y_{ij}(t)

This quantity carries information about which pairs of countries form partnerships for trading a product. If the global entropy increases, the total number of trading country pairs is increasing; on the other hand, if the entropy decreases, trade is concentrated in only some specific pairs of countries.

3.2 Uniformity Uniformity gives information about how heterogeneous or homogeneous the trade was. To calculate the uniformity, we need the total number of partners involved in trading products, P(t) = \sum_{i,j: y_{ij}(t) > 0} 1. Then, the uniformity is the ratio U(t) = S(t)/\log_2 P(t).
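For illustration, both quantities can be computed from a trade-value matrix with a few lines of NumPy; the 3 × 3 matrix below is a made-up example, not COMTRADE data.

```python
import numpy as np

def global_entropy_and_uniformity(Y):
    """Y[i, j] = export value from country i to country j in a given year."""
    y = Y / Y.sum()                    # normalized trade flows y_ij(t)
    nz = y[y > 0]
    S = -np.sum(nz * np.log2(nz))      # global entropy S(t)
    P = np.count_nonzero(Y)            # number of trading pairs P(t)
    U = S / np.log2(P)                 # uniformity U(t)
    return S, U

Y = np.array([[0., 5., 1.],            # toy 3-country export matrix (illustrative only)
              [2., 0., 0.],
              [4., 3., 0.]])
print(global_entropy_and_uniformity(Y))
```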

3.3 Local Entropy The amount of information in the trade partnership of a single country in a year is calculated by

s_i(t) = -\sum_{j} f_{ij}(t) \log f_{ij}(t)

The trade flux is normalized locally as f_ij(t) = Y_ij(t)/Y_i(t), where Y_i(t) = \sum_{j} Y_{ij}(t), and the local trade entropy of a country i in year t is then calculated. The world's average local entropy is calculated by

s_avg(t) = \sum_{i} s_i(t) / N

Here, N = 168 (the total number of countries considered).


3.4 Trade Partnership The trade partnership of a country is the total number of countries with which it trades. We investigated the number of partnerships for the export value of some specific countries:

p_i(t) = \sum_{j: y_{ij}(t) > 0} 1

Here, y_ij(t) represents the export value. The average partnership of the world, K(t), is calculated over all the countries considered in the study to compare it with a specific country. In this case,

K(t) = \sum_{i} p_i(t) / N
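A corresponding sketch for the per-country quantities and their world averages, on the same kind of illustrative matrix, is:

```python
import numpy as np

def local_entropy_and_partners(Y):
    """Row-wise quantities: s_i(t), p_i(t), and their averages s_avg(t) and K(t)."""
    row_sums = Y.sum(axis=1, keepdims=True)
    f = np.divide(Y, row_sums, out=np.zeros_like(Y), where=row_sums > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(f > 0, f * np.log(f), 0.0)
    s = -terms.sum(axis=1)                 # local entropy s_i(t) of each exporter
    p = np.count_nonzero(Y > 0, axis=1)    # number of export partners p_i(t)
    return s, p, s.mean(), p.mean()        # the averages are taken over all N countries

Y = np.array([[0., 5., 1.],
              [2., 0., 0.],
              [4., 3., 0.]])
s, p, s_avg, K = local_entropy_and_partners(Y)
print(s, p, s_avg, K)
```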

4 Results 4.1 Global Entropy and Uniformity We have calculated the global entropy and uniformity of the exported trade value for the products manufactured goods chiefly classified by material; machinery and transport equipment; crude materials, inedible, except fuels; and all commodities. In Fig. 1a, we can see that the global entropy of all commodities and of manufactured goods is higher than that of the two other products during the whole time period. It means that manufactured goods are the most exported products, involving more trade partners and obviously a higher trade value. The global entropy was gradually increasing till 2007 and was almost constant afterwards, except for a sharp fall in the global entropy of manufactured goods in 2018. The global entropy of crude materials is higher than that of machinery and transport equipment till 2006, and then they interchange with each other with some fluctuations. A sharp fall in crude materials is also seen in 2018, as in manufactured goods. In Fig. 1b, we explain the uniformity of trade all over the world. Higher uniformity means that the trade was homogeneous over the world; in other words, it can be said that the trade was evenly distributed, and the influence of any specific country was smaller at higher uniformity. On the other hand, lower uniformity means heterogeneity, or unevenly distributed trade. Like the global entropy, a transition is seen in the uniformity of both manufactured goods and crude materials in 2018.


Fig. 1 a Global entropy for all commodities; manufactured goods chiefly categorized by material; machinery and transport equipment; crude materials, inedible, except fuels. b Uniformity for all commodities; manufactured goods chiefly categorized by material; machinery and transport equipment; crude materials, inedible, except fuels

4.2 Local Entropy We calculated local entropy for the most influential two countries: China and USA in world trade and compared it with average local entropy of the world. In Fig. 2a, in 1995, the local entropy of manufactured goods chiefly classified by material for China is its lowest value; then, it increased gradually over time, and it takes the highest value of entropy after 2010. A sharp fall is shown in 2018. In the case of USA, it is decreasing till 2000, and then, it is increased up to 2017. In 2018, the entropy of USA is fallen as like china. In world’s average local entropy, the value remains same almost in whole time period, but in 2018, it is declined due to the fall of both china and USA. In machinery and transport goods (Fig. 2b), local entropy is increasing with small fluctuation for China, and there is a rapid fall in 2018. While, the entropy of USA is increased from 2000 to 2006 and slightly declined from 2007. We see in this product that there is no significant change in world’s average local entropy. Like two other products, in crude materials, China’s local entropy has fallen in 2018 (Fig. 2c). It was increasing with some significant fluctuations up to 2017 and fell perniciously in 2018. It causes a fall in world’s average value as in the manufactured products. On the other hand, USA has an opposite change in 2018 with an upward transition. It means that the influence of China is higher than USA in this product. And finally, we considered all the combined products named all commodity, we saw that local entropy of China was also fallen drastically in 2018 which impacted the average local entropy of the world deficiently (Fig. 2d).


Fig. 2 Local entropy for a manufactured goods chiefly classified by material, b machinery and transport equipment, c crude materials, inedible, except fuels, d all commodity

4.3 Export-Trade Partnership Analysis In trade partnership analysis, we took the most influential two countries: China and USA in world trade to see their number of trade partners and compared it with average number of partners of the world in the same manner of local entropy. In Fig. 3a, we see partners of China and USA are almost same in whole time period, and there is a sharp fall to 75 for both countries for manufactured goods. It affected to the average partnership of the world trade as we see a downward transition in world average partnership which means that trade partnership of China and USA has a very strong effect on world trade. In machinery and transport goods (Fig. 3b), the partnership in China in 2018 has fallen sharply. For this reason, the average world partnership has decreased slightly which means that it effected world trade of these products a little.


Fig. 3 Trade partnership for a manufactured goods chiefly categorized by material, b machinery and transport equipment, c crude materials, inedible, except fuels, d all commodity

Like two other products, in crude materials, China’s number of partners has fallen in 2018 (Fig. 3c). The number of partners of China was gradually increasing with a little fluctuation and fell perniciously in 2018. In all commodity, we saw that the partnership of China was also fallen drastically which impacted a little bit in the average partnership of the world (Fig. 3d). We can see a common change in every product for china in 2018 which effected world trade slightly. But when China and USA both’s partnership fallen, we see a great impact in world trade as seen in manufactured goods. We also found that one thing is in all products China’s fall which was constant in 2018, but the fall of USA was only in manufactured goods in trade partnership.


5 Conclusion The global entropy describes the trade relationship among countries, and uniformity represents how much uniform the trade was for the considered individual products and all commodity. From three products, we have found that manufactured goods were involved to higher global entropy and uniformity and two other products with some fluctuating values. From the local entropy and trade partnership analysis, we found that local entropy and the partnership of China and USA are much higher than world’s average except some early years of local entropy of China. That’s why we can say that the two countries have an impactful influence in world trade. We also noticed that China’s local entropy and trade partnership drastically fall at 2018 in almost all products which have an small effect on world trade but when local entropy and trade partnership of China and USA fell together it effected world trade significantly that we have seen in manufactured goods in both local entropy and trade partnership. Acknowledgement This work is fully funded and supported by ICT Division, Ministry of Posts, Telecommunications and Information Technology, Bangladesh under the ICT fellowship scheme.

References 1. Eaton, J., Kortum, S.: Technology, geographyand trade. Econometrica 70(5), 1741–1779 (2002) 2. Helpman, E., Melitz, M., Rubinstein, Y.: Estimating trade flows: trading partners and trading volumes. Quart. J. Econ. 123(2), 441–487 (2008) 3. Rose, A.K.: Dowereally know that the WTO increases trade? Am. Econ. Rev. 94(1), 98–114 (2004) 4. Foschi, R., Riccaboni, M., Schiavo, S.: Preferential attachment in multiple trade networks. Phys. Rev. 90, 022817 (2014) 5. Serrano, M.A., Boguná, M.: Topology of the world trade web. Phys. Rev. E 68, 015101 (2003) 6. Garlaschelli, D., Loffredo, M.I.: Structure and evolution of the world trade network. Physica A 355, 138–144 (2005) 7. Fagiolo, G., Reyes, J., Schiavo, S.: World-trade web: topological properties, dynamics, and evolution. Phys. Rev. E 79, 036115 (2009) 8. Riccaboni, M., Schiavo, S.: Structure and growth of weighted networks. New J. Phys. 12, 023003 (2010) 9. De Benedictis, L., Tajoli, L.: The world trade network. World Econ. 34, 1417–1454 (2011) 10. Riccaboni, M., Rossi, A., Schiavo, S.: Global networks of trade and bits. J. Econ. Interac. Coord. 8, 33–56 (2013) 11. Riccaboni, M., Schiavo, S.: Stochastic trade networks. J. Complex Netw. forthcoming (2014) 12. Fagiolo, G., Reyes, J., Schiavo, S.: The evolution of the world trade web: a weighted–network analysis. J. Evol. Econ. 20, 479–514 (2010) 13. Bhattacharya, K., Mukherjee, G., Saramäki, J., Kaski, K., Manna, S.S.: The international trade network: weighted network analysis and modelling. J. Stat. Mech. P02002 (2008) 14. Cha, M.-Y., Lee, J.W., Lee, D.-S.: Complex networks and minimal spanning trees in international trade network. J. Korean Phys. Soc. 56, 998 (2010) 15. Oh, C.-Y., Lee, D.-S.: Entropy of international trades. Phys. Rev. E95, 052319 (2017) 16. Shannon, C.E.: A Mathematical Theory of Computation Bell. Syst. Tech. J. 27, 379 (1948)


17. Pulliainen, K.: Entropy measures for international trade. Swedish J. Econ. 72, 40 (1970) 18. Lei, H., Chen, Y., Li, R., He, D., Zhang, J.: Maximum entropy for the international division of labor. PLoS ONE 10, e0129955 (2015)

An Automated Bengali Text Summarization Technique Using Lexicon-Based Approach Busrat Jahan, Sheikh Shahparan Mahtab, Md. Faizul Huq Arif, Ismail Siddiqi Emon, Sharmin Akter Milu, and Md. Julfiker Raju

Abstract There is enough resources for English to process and obtain summarize documents. But this thing is not directly applicable for Bengali language as there is lots of complexity in Bengali, which is not same to English in the context of grammar and sentence structure. Again, doing this for Bengali is harder as there is no established tool to facilitate research work. But this necessary as 26 crore people use this language. So, we have gone for a new approach Bengali document summarization. Here, the system design has been completed by preprocessing the i/p (input) doc, tagging the word, replacing pronoun, sentence ranking, respectively. Pronoun replacement has been added here to minimize the rate of swinging pronoun in the output summary. As the pronoun replacement, we have gone ranking sentences according to sentence frequency, numerical figures (both in digit and word version) and document title. Here, if the sentence has any word that exists in title also taken into our account. The similarity between two sentences has been checked to deduct one as that causes less redundancy. The numerical figure also makes an impact, so they were also identified. We have taken over 3000 newspaper and books documents words has been trained according to grammar. And two B. Jahan  I. S. Emon  Md. Julfiker Raju Department of CSE, Feni University, Feni, Bangladesh e-mail: [email protected] I. S. Emon e-mail: [email protected] Md. Julfiker Raju e-mail: julfi[email protected] S. S. Mahtab (&) Department of EEE, Feni University, Feni, Chittagong Division, Bangladesh e-mail: [email protected] Md. Faizul Huq Arif Department of ICT(DoICT), ICT Division, Dhaka, Bangladesh e-mail: arifi[email protected] S. A. Milu Department of CSTE, Noakhali Science and Technology University, Noakhali, Bangladesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_39


documents have been checked by the design system to evaluate the efficiency of designed summarizer. From the evaluation system, it is been found that the recall, precision, F-score are 0.70 as it is 70%, 0.82 as it is 82%, 0.74 as it is 74%, respectively. Keywords Text summarizer learning POS tagging



 BTS  Bengali  NLP  Python  Machine

1 Introduction Text summarization is the process of summarizing a text or document. There are many summarization tools for the English language, and there is also some work on automated Bengali text or document summarization, but from an application standpoint those tools do not seem very suitable. Summaries are categorized in two ways: the extractive and the abstractive approach. Most of the summarization methods for Bengali text are extractive [1]. In an automated text summarization process, a text is delivered to the computer, and the computer returns a less redundant extract or abstract of the original text(s). Text abstraction is the process of producing an abstract, or a summary, of an extract by selecting a significant portion of the information from one or more texts [1–3]. Thus, the summary conveys the meaning of the source, and sometimes extraction results in data loss. These methods are also not able to create a plain text from related hierarchical texts. Extractive summarization is preferred for its lower complexity compared with abstractive summarization. We can use grammatical rules in conjunction with mathematical rules for forming sentences to decrease unnecessary errors. Again, it can be used for creating new, plain text from multiple texts, which enables reducing the size of the text summary [4]. Rafel et al. state that the extractive summarizer satisfies all the basic requirements; this method has three stages: text analysis, sentence ranking/scoring, and summarization [5].

2 Literature Review The summarization of single Bangla documents is reviewed in this section. The area of Bangla text summarization began several years back as a new research direction. Previously, most of the work in the text summarization domain was done on the basis of sentence prohibition. A survey of different text summarization techniques is presented in [1]; the authors analyzed various methods for text and implemented an extraction-based Bangla text summarizer. We have found that the method proposed by Jones [4] provides a summary of a text without reading the full text. The main steps in his method


are: (i) preprocessing, (ii) sentence scoring/ranking, and (iii) summary generation. It also uses term frequency (TF), inverse document frequency (IDF), and positional value (PV). The method presented by Haque et al. [5] summarizes Bangla documents using an extraction-based summarization technique. The four major steps of their method are: (i) preprocessing, (ii) sentence scoring/ranking, (iii) sentence clustering, and (iv) summary generation. Efat et al. [6] suggested an extraction-based summarization method that acts on Bangla documents and is capable of summarizing a single document. There are two major steps in their proposed method: (i) preprocessing, and (ii) sentence scoring/ranking and summarization. The method of Das and Bandyopadhyay [7] identifies sentiment from the text, combines it, and finally produces the text summary. They used a sentiment model to restore and integrate sentiment. The integration is based on theme clustering (K-means) and document-level theme relational graph algorithms, and finally the summary is generated by selecting sentences with the standard PageRank algorithm used in information retrieval.

3 Suggested Method To tag words successfully, we have employed two tagging systems: a general tagging system and a special tagging system. The special tagging system refines and updates the general tags.

3.1

General Tagging

Every word is tagged (as noun, pronoun, adjective, verb, preposition, etc.) by using a lexicon database [2] and SentiWordNet [3]. Both resources contain a limited number of predefined words. Using the lexicon database, words can be tagged as “JJ” (adjective), “NP” (proper noun), “VM” (verb), “NC” (common noun), “PPR” (pronoun), etc. SentiWordNet, on the other hand, lists words with the tags “a” (adjective), “n” (noun), “r” (adverb), “v” (verb) and “u” (unknown). Based on these predefined word lists, we experimented on 200 Bangla news documents and found that 70% of the words can be tagged. Bangla words (especially verbs) are particularly interesting [1]. Although we use word stemming to identify the base form of a word, not all inflected verbs can be stemmed; in fact, verb identification is very difficult because Bangla has many suffixes. For example, depending on tense and person, the English word “do” takes only the forms “doing”, “did” and “does”, whereas the corresponding Bangla word has many more forms. In the present continuous tense, the word “কর” (kor-do) has three main forms depending on whether the subject is first, second or third person.


It can be “করছি” (doing) for the first person, “করছ” (doing) for the second person and “করছেন” (doing) for the third person, respectively. The forms of the verb for the different Bangla words meaning “you” are also different: “আপনি করছেন” (you are doing), “তুমি করছ” (you are doing) and “তুই করছিস” (you are doing), all in the present continuous tense and second person. Thus, the word “কর” (do) may take the following forms: “করে” (do), “করেন” (do), “করিস” (do), “করি” (do), “করছে” (doing), “করছেন” (doing), “করছ” (doing), “করছিস” (doing), “করছি” (doing), “করেছে” (did), “করেছেন” (did), “করেছ” (did), “করেছিস” (did), “করেছি” (did), “করুক” (do), “করুন” (do), “করল” (did), “করলেন” (did), “করলে” (did), “করলি” (did), “করলাম” (did), “করত” (do), “করতেন” (did), “করতে” (did), “করতিস” (did), “করতাম” (did), “করতেছি” (doing), “করতেছ” (doing), “করতেছেন” (doing), “করছিল” (doing), “করছিলেন” (doing), “করছিলে” (doing), “করছিলি” (doing), “করছিলাম” (doing), “করেছিল” (doing), “করেছিলেন” (doing), “করেছিলে” (doing), “করেছিলি” (doing), “করেছিলাম” (doing), “করবে” (do), “করবেন” (do), “করবি” (do), “করব” (do), “করো” (do). The complexity of the verb in Bangla therefore cannot be compared with English, yet verb identification is very important for language processing because the verb is the main word of a sentence. A list of suffixes is considered for the final check, for example: “ইতেছিস” (itechhis), “তেছিস” (techhis), “ইতিস” (itis), “ইলে” (ile), “ইবি” (ibi), etc. If a word ends with one of these suffixes, it is tagged as a verb. The result of word tagging improved from 68.12% (before using the suffix list [4]) to 70% (after using it). This step produces a preliminary tagging, which may be updated in the next step, where certain words are specifically tagged as acronym, named entity, occupation, etc. [8–11].
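A minimal sketch of the two-stage idea described above, first a lexicon lookup and then a suffix check that tags remaining words as verbs, is given below. The tiny lexicon and suffix list are illustrative stand-ins for the lexicon database [2], SentiWordNet [3] and the full suffix table used in this work.

```python
# Minimal sketch of lexicon lookup followed by suffix-based verb tagging.
# The lexicon and the suffix list are tiny illustrative samples, not the full
# resources used in the paper.
LEXICON = {"ভালো": "JJ", "ঢাকা": "NP", "সে": "PPR"}          # word -> POS tag
VERB_SUFFIXES = ["ইতেছিস", "তেছিস", "ইতিস", "ছিলাম", "ছেন", "ইলে", "ইবি", "ছি", "ছ"]

def tag_word(word):
    if word in LEXICON:
        return LEXICON[word]
    # Check the longest suffixes first, mirroring the longest-match rule.
    for suffix in sorted(VERB_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            return "VM"            # tagged as a verb
    return "UNK"                   # left for the special-tagging stage

if __name__ == "__main__":
    for w in ["ভালো", "করছেন", "করছিলাম", "শিক্ষক"]:
        print(w, tag_word(w))
```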

3.2

Special Tagging

After general tagging, special tagging is applied to identify words as acronyms, elementary forms, numerical figures, repetitive words and names of occupations, organizations and places.

1. Checking for English acronyms: A word formed from the initials of other words is called an acronym, such as “ইউএনও” (UNO), “ওআইসি” (OIC), “ইউএসএ” (USA). To examine such words, we split them into letters, e.g. “ইউএনও” (UNO) into “ইউ” (U), “এন” (N), “ও” (O), and match every letter. All English letters can be written in Bangla: A as “এ”, B as “বি”, C as “সি”, D as “ডি”, … W as “ডাব্লিউ”, X as “এক্স”, Y as “ওয়াই”, Z as “জেড”. Sorting them in descending order of string length, so that W (“ডাব্লিউ”) comes first and A (“এ”) last, ensures the longest match: “এম” (M) is then matched as “এম” (M) rather than as “এ” (A). This check shows a 98% success rate (a small code sketch of this longest-match check and of the recurrent-word check appears after this list).
2. Checking for Bangla elementary tags: Bangla letters separated by spaces, such as “আ ক ম” (A K M) and “এ বি ম” (A B M), are tagged as Bangla elementary forms. Based on our research, the accuracy of this check is 100%.
3. Checking for recurrent words: Recurrent words are a special form of word combination in which the same word is placed twice consecutively, for example “ঠান্ডাঠান্ডা” (thandathanda—cold cold), “বড়বড়” (boroboro—big big), “ছোটছোট” (chotochoto—small small), etc. Some words are partially repeated, such as “খাওয়াদাওয়া” (khawadawa—eat). We found 100% accuracy in identifying recurrent/repetitive words.
4. Checking for numerical figures: Three conditions are examined to recognize numerical representations in words and digits: (a) the first part of the word is a digit, 0 (০), 1 (১), 2 (২), …, 9 (৯), or a number word from “এক” (one), “দুই” (two), “তিন” (three), “চার” (four) up to “নিরানব্বই” (ninety nine); the decimal point (.) is also considered when examining numbers written in digits; (b) the next part (if any) is “শত” (hundred), “হাজার” (thousand), etc.; and (c) finally, the word can have suffixes such as “টি” (this), “টা” (this), “এন” (en), etc. In experiments on our sample test documents, 100% of the numerical forms were found from both digits and words.
5. Checking for names of occupations: Occupation words are very helpful for identifying human named entities; if a word is recognized as an occupation, the immediately following words can be considered to find a named entity. We collected occupation entries for Bangladesh from different online sources into a table, such as “শিক্ষক” (shikkhok-master) and “সাংবাদিক” (sangbadik-journalist). Every word is matched against this table, and if a match is found it is tagged as an occupation; for example, “শিক্ষক” (shikkhok-master) extends to “প্রধান শিক্ষক” (prodhanshikkhok-Head master) and so on. This check identifies about 96% of occupations.
6. Checking for names of organizations: Any type of word may be an element of an organization name. From our analysis: (a) the complete name of an organization may be followed by its acronym in parentheses, for example “দূর্নীতিদমনকমিশন(দূদক)” “Durniti Domon Commission (DUDOK) Anti Corruption Commission (ACC)”; (b) the last part of an organization name may contain certain words, such as “লিমিটেড” (limited), “বিদ্যালয়” (biddaloy-school), “মন্ত্রণালয়” (montronaloy-ministry), etc. If such a word is present in the text according to point (b), the three words immediately before it are checked; when they are found to be nouns, named entities or blocked (stop) words, the whole phrase is treated as an organization name. The organization name is correctly accepted on the basis of point (b) about 85% of the time [13–20].
7. Checking for names of places: A table of place names of Bangladesh contains 800 names covering divisions, districts, upazilas and municipalities, where the top level is the division, the second level is the district and the third level is the upazila or municipality in an area-based separation. In addition, we analysed the names of 230 countries and their capitals. In this way, about 91% of place names can be identified in our experiment.
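The sketch below illustrates the longest-match acronym check of item 1 and the recurrent-word check of item 3. The letter table is a small subset of the full A–Z mapping, and the function names are ours, not part of the described system.

```python
# Minimal sketch of two special-tagging checks: English acronyms written in
# Bangla letters, and recurrent (repeated) words.
BANGLA_LETTERS = ["ডাব্লিউ", "এক্স", "ওয়াই", "জেড", "ইউ", "এন", "এম", "এ", "বি", "সি", "ও"]

def is_acronym(word):
    # Try the longest letters first so that, e.g., "এম" (M) is not consumed as "এ" (A).
    letters = sorted(BANGLA_LETTERS, key=len, reverse=True)
    rest = word
    while rest:
        for letter in letters:
            if rest.startswith(letter):
                rest = rest[len(letter):]
                break
        else:
            return False           # some part of the word is not a spelled-out letter
    return True

def is_recurrent(word_a, word_b):
    # Same word written twice in a row, e.g. "বড়বড়" split into two tokens.
    return word_a == word_b

if __name__ == "__main__":
    print(is_acronym("ইউএনও"))        # UNO -> True
    print(is_acronym("শিক্ষক"))       # ordinary word -> False
    print(is_recurrent("বড়", "বড়"))   # True
```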

4 Experimental Results Sample input Title: দুই ভাই-বোনের ময়না তদন্ত হয়েছে, মামলা হয়নি Text: রাজধানীরবনশ্রীতেদুইভাইবোনেররহস্যজনকমৃত্যুরঘটনায়এখনোমা মলাহয়নি।শিশুদেরবাবামামলাকরবেনবলেজানিয়েছেপরিবার।দুইশিশুরলাশেরময়ন াতদন্তহয়েছে।তাঁদেরগ্রামেরবাড়িজামালপুরেলাশদাফনকরাহবে।খাবারেরনমুনা পরীক্ষারফলাফলএখনোপাওয়াযায়নি।শিশুদের বাবা আমানউল্লাহর বন্ধু জাহিদুল ইসলাম আজ মঙ্গলবার বেলা সোয়া ১১ টার দিকে প্রথম আলোকে এসব কথা জানিয়েছেন।রামপুরাথানারভারপ্রাপ্তকর্মকর্তা (ওসি) রফিকুল ইসলাম বলেন, এখনো মামলা হয়নি।পরিবারের পক্ষ থেকে আজ মামলা হতেপারে।জিজ্ঞাসা বাদের জন্য চায়নিজ রেস্তোরাঁর ব্যবস্থাপক, কর্মচারী, পাচককে থানায় নেওয়া হয়েছে।চায়নিজ রেস্তোরাঁ থেকে আগের দিন আনা খাবার গতকাল সোমবার দুপুরে গরম করে খেয়ে ঘুমিয়ে পড়ে নুসরাত আমান (১২) ও আলভী আমান (৬)। এরপর তারা আর জেগে ওঠেনি। অচেতন অবস্থায় হাসপাতালে নেওয়া হলে চিকিৎসকেরা তাদের মৃত ঘোষণা করেন।পরিবারের অভিযোগের ভিত্তিতে পুলিশ জিজ্ঞাসাবাদের জন্য ওই রেস্তোরাঁর মালিককে থানায় নিয়ে গেছে। নুসরাত ভিকারুননিসা নূন স্কুল অ্যান্ড কলেজের পঞ্চম ও আলভী হলিক্রিসেন্ট স্কুলে নার্সারি শ্রেণির শিক্ষার্থী। তাদের বাবা মো. আমান উল্লাহ ব্যবসায়ী ও মা জেসমিন আক্তার গৃহিণী। এই দম্পতির এই দুটি সন্তানই ছিল। চায়নিজ রেস্তোরাঁ থেকে আগের দিন আনা খাবার গতকাল সোমবার দুপুরে গরম করে খেয়ে ঘুমিয়ে পড়ে নুসরাত আমান(১২) ও আলভী আমান(৬)। এরপর তারা আর জেগে ওঠেনি। অচেতন অবস্থায় হাসপাতালে নেওয়া হলে চিকিৎসকেরা তাদের মৃত ঘোষণা করেন। পরিবারের অভিযোগের ভিত্তিতে পুলিশ জিজ্ঞাসাবাদের জন্য ওই রেস্তোরাঁর মালিককে ওই দিনই থানায় নিয়ে গেছে।


Getting Summary of Sample Title: দুইশিশুরলাশেরময়নাতদন্তহয়েছে। Text:রামপুরাথানারভারপ্রাপ্তকর্মকর্তা (ওসি) রফিকুলইসলামবলেন, এখনোমামলাহয়নি। দুইভাইবোনেরময়নাতদন্তহয়েছে,মামলাহয়নিরাজধানীরবনশ্রীতেদুইভ াইবোনেররহস্যজনকমৃত্যুরঘটনায়এখনোমামলাহয়নি।শিশুদেরবাবামামলাকর বেনবলেজানিয়েছেপরিবার। পরিবারেরপক্ষথেকেআজমামলাহতেপারে। শিশুদের বাবা আমানউল্লাহর বন্ধু জাহিদুল ইসলাম আজ মঙ্গলবার বেলা সোয়া ১১টার দিকে প্রথম আলোকে এসব কথা জানায় । See Figs. 1 and 2.

4.1

Co-selection Measures

In co-selection measures, the principal evaluation metrics are [12]: (i) Precision (P): the number of sentences occurring in both the system-generated summary and the ideal summary divided by the number of sentences in the system-generated summary.

Fig. 1 Sentence scoring of sample document


Fig. 2 Mean deviance of sample document

\mathrm{Precision}(P) = (A \cap B) / A

where "A" denotes the number of sentences obtained by the summarizer and "B" denotes the number of relevant sentences in the target (ideal) summary.

(ii) Recall (R): the number of sentences occurring in both the system-generated summary and the ideal summary divided by the number of sentences in the ideal summary.

\mathrm{Recall}(R) = (A \cap B) / B

(iii) F-measure: the integrated measure that incorporates both precision and recall.

F\text{-}score = (2 \times P \times R) / (P + R)

The evaluation results of the first ten documents are given in Table 1.
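These co-selection measures can be computed directly from the two sentence sets; the short sketch below does so, treating each summary as a set of sentences (an illustrative helper, not the evaluation code used in this work).

```python
def co_selection_scores(system_summary, ideal_summary):
    """Precision, recall and F-score over sets of sentences (co-selection)."""
    A = set(system_summary)          # sentences produced by the summarizer
    B = set(ideal_summary)           # sentences in the ideal (reference) summary
    overlap = len(A & B)
    precision = overlap / len(A) if A else 0.0
    recall = overlap / len(B) if B else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

if __name__ == "__main__":
    system = ["s1", "s2", "s3", "s4", "s5"]
    ideal = ["s1", "s2", "s6", "s7"]
    print(co_selection_scores(system, ideal))   # (0.4, 0.5, 0.444...)
```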

Table 1 Result of precision, recall and F-score

Document No.     Precision (P)    Recall (R)    F-score
1                0.84             0.71          0.76
2                0.79             0.72          0.75
3                0.82             0.69          0.74
4                0.82             0.68          0.74
5                0.79             0.71          0.74
6                0.82             0.73          0.75
7                0.78             0.72          0.73
8                0.85             0.70          0.75
9                0.85             0.71          0.76
10               0.84             0.71          0.76
Average score    0.82             0.70          0.74

5 Conclusion We have developed an automatic Bengali document summarizer using Python as the programming platform. There are ample resources for processing and summarizing English documents, but they are not directly applicable to Bengali, whose grammar and sentence structure differ considerably from English. Work on Bengali is also harder because there is no established tool to facilitate research, yet it is necessary as about 26 crore people use this language. We have therefore proposed a new approach to Bengali document summarization. The system design comprises preprocessing the input document, tagging the words, replacing pronouns and ranking sentences, respectively. Pronoun replacement is included to minimize the rate of dangling pronouns in the output summary. After pronoun replacement, sentences are ranked according to sentence frequency, numerical figures (both in digit and word form) and the document title; a sentence containing any word that also appears in the title is taken into account. The similarity between two sentences is checked so that one of them can be removed, which reduces redundancy. Numerical figures also have an impact, so they were identified as well. We have taken over 3000 words from newspaper and book documents, which were trained according to grammar, and two documents were checked by the designed system to evaluate the efficiency of the summarizer. From the evaluation, the recall, precision and F-score are found to be 0.70 (70%), 0.82 (82%) and 0.74 (74%), respectively.


References 1. Radev, D.R., Hovy, E., McKeown, K.: Introduction to the special issue on summarization. J. Comput. Linguist. 28(4), 399–408 (2002) 2. Hamou-Lhadj, A., Lethbridge, T.: Summarizing the content of large traces to facilitate the understanding of the behaviour of a software system. In: Proceedings of the 14th IEEE International Conference on Program Comprehension (ICPC), pp. 181–190. IEEE, (2006) 3. Hovy, E.: Automated text summarization. In: Mitkov, R. (ed.) The Oxford Handbook of Computational Linguistics, pp. 583–598. Oxford University Press (2005) 4. Jones, K.S.: Automatic summarizing: factors and directions. In: Advances in Automatic Text Summarization, pp. 1–12 (1999) 5. https://blog.frase.io/ 6. Dongmei, A., Yuchao, Z., Dezheng, Z.: Automatic text summarization based on latent semantic indexing. J. Artif. Life Robot. 15(1), 25–29 (2010) 7. Kunder, M.D.: The size of the world wide web. Online. Available. http://www. worldwidewebsize.com. Accessed 15 Feb 2015 8. Chakma, R., et al.: Navigation and tracking of AGV in ware house via wireless sensor network. In: 2019 IEEE 3rd International Electrical and Energy Conference (CIEEC), Beijing, China, pp. 1686–1690 (2019). https://doi.org/10.1109/cieec47146.2019.cieec-2019589 9. Emon, I.S., Ahmed, S.S., Milu, S.A., Mahtab, S.S.: Sentiment analysis of bengali online reviews written with english letter using machine learning approaches. In: Proceedings of the 6th International Conference on Networking, Systems and Security (NSysS ’19). Association for Computing Machinery, New York, pp. 109–115 (2019). doi: https://doi.org/10.1145/ 3362966.3362977 10. Ahmed, S.S., et al.: Opinion mining of Bengali review written with English character using machine learning approaches. In: Bindhu V., Chen J., Tavares J. (eds.) International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol. 637. Springer, Singapore (2020). https://doi.org/10.1007/978981-15-2612-1_5 11. Milu, S.A., et al.: Sentiment Analysis of Bengali reviews for data and knowledge engineering: a Bengali language processing approach. In: Bindhu V., Chen J., Tavares J. (eds.) International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol. 637. Springer, Singapore (2020). https://doi.org/10.1007/ 978-981-15-2612-1_8 12. Munir, C., Ibrahim, K., Mofazzal, H.C.: Bangla VasarByakaran. Ideal publication, Dhaka (2000) 13. Ferreira, R., de Souza Cabral, L., Freitas, F., Lins, R.D., de Frana Silva, G., Simske, S.J., Favaro, L.: A multi-document summarization system based on statistics and linguistic treatment. Expert Syst. Appl. 41(13), 5780–5787 (2014) 14. Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958) 15. Foong, O.M., Oxley, A., Sulaiman, S.: Challenges and trends of automatic text summarization. Int. J. Inf. Telecommun. Technol. 1(1), 34–39 (2010) 16. Azmi, A.M., Al-Thanyyan, S.: A text summarizer for arabic. J. Comput. Speech Lang. 26(4), 260–273 (2012) 17. Karim, M.A., Kaykobad, M., Murshed, M.: Technical challenges and design issues in bangla language processing. Published in the United States of America by Information Science Reference (an imprint of IGI Global) (2013) 18. Islam, M.T., Masum, S.: Bhasa: a corpus based information retrieval and summarizer for bengali text. In: Proceedings of the 7th International Conference on Computer and Information Technology (2004)


19. Uddin, M.N., Khan, S.A.: A study on text summarization techniques and implement few of them for bangla language. In: Proceedings of the 10th International Conference on Computer and Information Technology (ICCIT-2012), pp. 1–4. IEEE (2007) 20. Sarkar, K.: Bengali text summarization by sentence extraction. In: Proceedings of International Conference on Business and Information Management (ICBIM-2012), pp. 233–245. NIT Durgapur (2012)

Location-Based Pomegranate Diseases Prediction Using GPS Rajshri N. Malage and Mithun B. Patil

Abstract In India, agriculture is a most important and essential field that plays a major role in the economy and in daily life. Different types of crops and fruits are cultivated in the country. Pomegranate is one of the major commercial fruits grown in India, but it is prone to many diseases triggered by uneven climatic conditions. Weather forecasting technology using GPS is therefore important, effective and beneficial for pomegranate farmers to protect the plants from different diseases and maintain their immunity. In this research paper, we have designed a system that predicts the weather at the plant's location to detect and forecast pomegranate diseases and provide prevention tips. It sends an alert message to the cultivator, based on which decisions can be made. Keywords Pomegranate diseases · Accuweather · Segmentation · Weather forecast · Global positioning system (GPS)

1 Introduction Pomegranate is one of the commercial horticulture products of India and many other countries because of its high medicinal value across the globe. Many diseases affect pomegranate cultivation, in the plant or in the fruit, and degrade the quality and quantity of fruit production, which in turn harms the income of agriculturists and human health; forecasting and prediction of pomegranate diseases is therefore an important issue. To address this problem, we have designed an Android mobile application especially for pomegranate diseases. We use the GPS system to identify the location of the orchard and obtain the five-day weather forecast for that farm, and according to the weather situation we forecast and predict the occurrence of diseases along with their solutions, which in turn prevents the plant from being affected. Pomegranate cultivators can use the system/application to get information about the daily weather and changes in environmental conditions; the application mainly provides information about weather-related diseases and appropriate solutions for them. Spray treatment of pomegranate plants generally depends on humidity and temperature, so in our application we have designed a system that forecasts diseases from the available weather information. Variation in weather conditions requiring spray treatment, as shown in Fig. 2, is caused by the changeable climate that affects the whole pomegranate plant and can cause a huge loss for the cultivator. According to pomegranate researchers, the plant requires about 16 months of healthy growth, and if it is attacked by a disease, it can be destroyed within a few days. In pomegranate cultivation, exact information for disease prediction is therefore an important issue. The pomegranate is prone to many diseases such as bacterial blight and Mars disease, shown in Fig. 1, which reduce the yield and its medicinal importance.

Fig. 1 a Mars disease on pomegranate plant. b Bacterial blight on pomegranate plant

R. N. Malage (B) · M. B. Patil
Department of CSE, N K Orchid College of Engineering and Technology Solapur, Solapur, India
e-mail: [email protected]
M. B. Patil
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_40

Fig. 2 Temperature and humidity

2 Literature Review With recent developments in information and communication technology (ICT) for farmers, several computer-based techniques have become available to agricultural and horticultural cultivators. Cultivators often cannot contact agricultural or horticultural experts because of their distant availability, and they are unable to identify disease symptoms because of the complexity of disease patterns. In existing systems, images captured from plant leaf surfaces provide a partial solution, as remote agricultural experts can instantly view the image for disease diagnosis and extend advice to remote areas manually or by telephone. Most diseases of this plant, however, depend on environmental conditions, so we have designed an application that forecasts diseases of the pomegranate plant based on weather conditions. A few related works on forecasting these diseases are discussed here. Pawara et al. [1] designed a system for the detection of pomegranate disease using machine learning algorithms and the Internet of things; the main goal of their work is to monitor the whole pomegranate plant to avoid diseases. Dubey and Jalal [2] present an image processing-based technique for identification and detection of pomegranate disease in which image segmentation, local binary patterns, color coherence vector-mean clustering, histograms and complete local binary patterns are used for feature extraction. Islam et al. [3] proposed a method that integrates machine learning and image processing to allow disease diagnosis from leaf images; the authors combined image segmentation with a multiclass SVM to develop an automated and easily accessible system. Bhange and Hingoliwala [4] explain pomegranate disease detection using image processing (image processing algorithms and the K-map algorithm); it is an accurate system, but it uses only images to detect pomegranate diseases. Dhakate and Ingole [5] diagnose pomegranate plant diseases using image processing and neural network algorithms to deal with the pathology issue of disease classification. Gaikwad and Karande [6] introduce disease detection and grading of pomegranate fruit using digital image processing, where image processing is required for enhancing the image. Lamani et al. [7] predicted plant diseases from weather forecasts using data mining; plant disease determination is both an art and a science, and plant diseases are an essential problem that lowers the quantity and reduces the quality of agricultural production. Their proposed system uses segmentation techniques such as k-means clustering and deep neural network learning to predict disease from weather features of the orange plant.

3 Proposed System The proposed architecture for forecasting pomegranate diseases is shown in Fig. 3; it consists of the different phases explained below.

3.1 Processing In this phase, weather information such as humidity, temperature, wind speed, possibility of rain and timing of sun rays is collected, since these are the major parameters for forecasting diseases in the pomegranate plant.

Fig. 3 Pomegranate diseases forecasting system

3.2 Segmentation In this phase, the processed data is grouped based on parameters that can be used for efficient calculation and analysis of the stored information; the phase also takes care of the potential challenges of implementing appropriate data segmentation and data quality tools with customer data validation.

3.3 Feature Extraction This is the process in which dimensionality is reduced into manageable groups for processing. In this phase, various features are selected and combined, which effectively reduces the amount of data processed while still accurately and completely describing the original dataset. Feature extraction reduces the resources needed without losing important information.

3.4 Preprocessing In this phase, the data are normalized, relevant data are selected for processing and missing data are corrected, so that unreliable, noisy and irrelevant data are ignored during processing.

3.5 Testing In general, testing is finding out how well something works by examining it carefully. In this phase, testing of the pomegranate disease forecast based on weather conditions is performed.

3.6 Classification This is the phase of organizing items into groups based on their type, that is, their systematic arrangement into groups or categories according to established criteria.

3.7 Detection Plant detection is the process of matching a specimen plant to a known taxon. The ability to identify plants allows us to assess many important pasture variables that are critical to management, such as range condition, proper stocking rate and wildlife habitat quality.
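As an illustration only, the sketch below maps a weather forecast to disease alerts with simple threshold rules. The numerical thresholds and alert texts are hypothetical placeholders and are not the rules implemented in the proposed application.

```python
# Illustrative sketch of the forecasting step: map forecast weather to disease
# alerts with simple rules. The threshold values are hypothetical placeholders.
def disease_alerts(temperature_c, humidity_pct, rain_prob_pct):
    alerts = []
    if temperature_c >= 30 and humidity_pct >= 70:
        alerts.append("Bacterial blight risk: plan spray treatment")
    if rain_prob_pct >= 60:
        alerts.append("High rain probability: postpone spraying")
    if humidity_pct >= 80:
        alerts.append("Fungal spot risk: inspect leaves and fruit")
    return alerts or ["No disease alert for the forecast period"]

if __name__ == "__main__":
    # Example 3-day forecast: (temperature in degrees C, humidity %, rain probability %)
    forecast = [(32, 75, 20), (29, 60, 70), (27, 50, 10)]
    for day, reading in enumerate(forecast, start=1):
        print(f"Day {day}:", disease_alerts(*reading))
```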


4 Result and Discussion The result provides weather forecasting information on humidity, temperature, wind speed, precipitation and sun rays, displayed for three days, as shown in Fig. 4. The information is notified to the cultivator, who can then decide when to spray according to the precipitation and wind speed. Figure 5 shows the detailed forecasting of weather-based diseases such as Fulkidi, Ali, Pityadhekun and Mava. These diseases reduce the quality and quantity of the fruit, lowering the market price of pomegranate and creating a big impact on the farmer's income.

Fig. 4 Weather forecast

Fig. 5 Diseases occurred


Figure 6 shows the temperature graph; temperature is the primary factor affecting the rate of plant development, and a rise in temperature may affect the plant and its productivity. The temperature information helps the farmer protect the pomegranate fruit from sun burning and cover the farm. Figure 7 shows the wind speed; this result guides the farmer in scheduling the daily spray, so money and spray can be saved, since wind direction and velocity have a significant influence on crop growth. Figure 8 shows the humidity, i.e. the amount of wetness or water vapor in the air, which can also be used to predict rainfall.

Fig. 6 Temperature graph

Fig. 7 Wind speed graph


Fig. 8 Humidity graph

5 Conclusion In this paper, a system that forecasts pomegranate plant diseases based on environmental conditions and informs the cultivator, which is the main need for a high pomegranate yield, is designed and implemented. The designed system/application forecasts pomegranate plant diseases based on the weather conditions at the plant's location obtained through GPS. It considers weather information such as temperature, humidity, wind speed, possibility of rain and timing of sun rays to detect pomegranate diseases, which in turn helps the farmers/cultivators to increase the yield and quality of the fruit and the pomegranate plant.

References 1. Pawara, S., Navalem, D., Patil, K., Mahajan, R.: Detection of pomegranate disease using machine learning and internet of things. In: IEEE 3rd International Conference for Convergence in Technology (I2CT) (2018) 2. Dubey, S.R., Jalal, A.S.: Detection and classification of tomato vegetable diseases using complete local binary patterns IEEE. In: Third International Conference on Computer and Communication Technology, vol. 3, pp. 247–251 (2012) 3. Islam, M., Dinh, A., Wahid, K., Bhowmik, P.: Detection of potato diseases using image segmentation and multiclass support vector machine. In: IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE) (2017) 4. Bhange, M., Hingoliwala, H.A.: Pomegranate disease detection using image processing. Procedia Comput. Sci. 280–288 (2015) 5. Dhakate, M., Ingole, A.: Diagnosis of pomegranate plant diseases using neural network. In: Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG) (2015) 6. Gaikwad, D.S., Karande, K.J.: Image processing approach for grading and identification of diseases on pomegranate fruit: an overview. Int. J. Comput. Sci. Inf. Technol. 7(2), 519–522 (2016) 7. Lamani, S.B., Ravikumar, K., Jamal, A.: Pomegranate fruits disease classification with fuzzy c mean clustering. Int. J. Adv. Eng. Res. Dev. 5(2) (2018) 8. Kaur, K., Kaur, M.: Prediction of plant disease from weather forecasting using data mining. Int. J. Future Revolution Comput. Sci. Commun. Eng. 4(4) (2018)


9. Sowmya, G.M., Chandan, V., Kin, S.: Disease detection in pomegranate leaf using image processing technique. Int. J. Sci. Eng. Technol. Res. (IJSETR) 6(3) (2017) 10. Li, Q., Wang, M., Gu, W.: Computer vision based system for tomato surface defect detection. Comput. Electron. Agric. 36, 215–223 (2002) 11. Mehl, P.M., Chao, K., Kim, M., Chen, Y.R.: Detection of defects on selected tomato cultivars using hyperspectral and multispectral image analysis. Appl. Eng. Agric. 18, 219–226 (2002) 12. Wang, Y., Cui, Y., Huang, G.Q., Zhang, P., Chen, S.: Study on vegetable quality inspection based on its surface color in produce logistics. In: International Conference on Manufacturing Automation (2010) 13. Chaerle, L., Lenk, S., Hagenbeek, D., Buschmann, C., Straeten, D.V.D.: Multicolor fluorescence imaging for early detection of the hypersensitive reaction to tobacco mosaic virus. J. Plant Physiol. 164(3), 253–262 (2007) 14. Singh, V., Varsha, A.K.: Detection of unhealthy region of plant leaves using image processing and genetic algorithm. In: 2015 International Conference on Advances in Computer Engineering and Applications (ICACEA) IMS Engineering College, Ghaziabad, India 15. Chaudhary, M., Chavan, R., Durgawali, S., Ghodeswar, A.: Smart agriculture: detection of disease in plants using image processing. In: International Conference on Innovative and Advanced Technologies in Engineering 16. Mithun, P., Aishwarya, K., Nikita, S., Aishwarya, G.: Android based application for fruit quality analysis. Int. J. Innovative Res. Sci. Eng. Technol. 12(6) (2016) 17. Doddaraju, P., Kumar, P., Gunnaiah, R., Gowda, A.A., Lokesh, V., Pujer, P., Manjunatha, G.: Reliable and early diagnosis of bacterial blight in pomegranate caused by Xanthomonas axonopodis pv punics sensitive PCR technique 18. Sharma, J., Sharma, K.K., Kumar, A., Mondal, K.K., Thalor, S., Maity, A., Gharate, R., Chinchur, S., Jadhav, V.T.: Pomegranate bacterial blight: symptomatology and rapid inoculation technique for Xanthomonas axonopodis pv punicae. J. plant Pathol. 19. Jain, K., Desai, N.: Pomegranate the cash crop of India: a comprehensive review on agricultural practices diseases. Int. Res. Health Sci. Res.

Medical Image Enhancement Technique Using Multiresolution Gabor Wavelet Transform Kapila Moon and Ashok Jetawat

Abstract Medical images are used for the analysis and diagnosis of particular medical disorders or diseases. Hence, medical image enhancement is a necessary and challenging step before further processing through computer vision systems; it assists the subsequent segmentation, detection and prediction of diseases such as cancer, tumors and other disorders. Most medical images obtained from various sources are dark and noisy, which requires an efficient image enhancement technique that preserves the content of the images. In this paper, an image enhancement technique based on the multiresolution Gabor wavelet transform is presented. The Gabor wavelet transform has demonstrated multiresolution capabilities with better texture enhancement, which helps to improve the quality of medical images. Experiments on a public dataset of low-illumination medical images show better qualitative and quantitative performance in terms of enhancement parameters and visual testing. Finally, the obtained outcomes are compared with prominent methods published in the literature. Keywords Image enhancement · Medical images · Multiresolution · Gabor wavelet transform · Low illumination

K. Moon (B) Department of Electronics Engineering, Ramrao Adik Institute of Technology, Navi Mumbai, India e-mail: [email protected] A. Jetawat Faculty of Engineering, Pacific Academy of Higher Education and Research University, Udaipur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_41


1 Introduction Medical images are mostly captured in low-light environments and are affected by noise and low contrast. Therefore, it is necessary to apply image enhancement techniques so as to make them suitable for further processing through computer vision systems. These systems process medical images for segmentation, detection and prediction of diseases such as cancer, tumors and other disorders. During daytime or under better illumination, the image quality may be sufficient for its intended application, but especially at night and under low-illumination circumstances the image quality may be worse and affect the correct diagnosis of the disease [1–3]. Some images obtained under low illumination are depicted in Fig. 1. Many image enhancement techniques based on the spatial and transform domains have been proposed by researchers and developers. Most spatial domain techniques are based on histogram equalization for contrast enhancement or on adaptive mean filtering, which uses statistical methods and models to remove noise, whereas transform domain techniques apply frequency domain tools such as the Fourier transform and the wavelet transform for image enhancement and contrast stretching. Nowadays, machine learning techniques are also being explored to enhance the region of interest and support further processing of medical images. In this paper, a technique based on the multiresolution Gabor wavelet transform for medical image enhancement is presented. Our contribution can be summarized as follows: first, a fixed Gabor wavelet transform is applied; secondly, a variable Gabor wavelet transform is applied at several resolutions; and finally, the responses are summed to obtain a noise-free, contrast-stretched image. The rest of the paper is organized as follows: related work is discussed in Sect. 2, the presented methodology is explained in Sect. 3, experimental results are elaborated in Sect. 4 and the conclusion is given in Sect. 5.

Fig. 1 Some examples of medical images obtained under low illumination


2 Related Work Many researchers, scientists, medical practitioners, engineers and medical imaging equipment developers and manufacturers have expressed the necessity of image enhancement techniques for correct analysis and diagnosis [4–6]. Numerous image enhancement techniques based on the spatial and transform domains have been proposed. Several spatial domain techniques are based on histogram equalization and mainly focus on image contrast [7]; however, it is observed that dark regions within an image are not restored appropriately. Single-scale retinex (SSR) [8], multiscale retinex (MSR) [9] and multiscale retinex with color restoration (MSRCR) [10] techniques based on the frequency domain approach have been proposed but are unsuitable for medical images that need better contrast. Other techniques apply basic image processing algorithms such as erosion, median filtering, dilation, outlining and edge detection for proper extraction of the region of interest and subsequent segmentation to detect cancer nodules [11–13]. Nowadays, machine learning techniques are being explored to enhance the region of interest and support further processing, but these techniques depend completely on a network that learns the enhancement function, on an appropriate training set of medical images and on the availability of a large database [14, 15]. The convolutional neural network (CNN) is now widely applied to detect abnormalities in medical images. Training the CNN is the most important step and necessitates an appropriate dataset of quality images from which feature vectors can be extracted to train the network, but most medical images are obtained with unnecessary noise and distortion [16–18]. Therefore, a medical image enhancement technique for dark and noisy images is required that enhances features in the dark regions of the image, removes noise and assists further processing for better analysis and diagnosis.

3 Multiresolution Gabor Wavelet Transform The Fourier transform is one of the best tools to obtain the frequency response of audio, image and video signals. However, the Fourier transform loses the spatial localization of the frequency content, which makes it inappropriate for image restoration and further processing, especially through a convolutional neural network (CNN). The Gabor wavelet transform offers a better multiresolution approach that represents the texture of an image. We apply Gabor filters to extract global features from the whole medical image. The 2-D Gabor function can be specified by the frequency of the sinusoid w and the standard deviations \sigma_x and \sigma_y of the Gaussian envelope, as shown in Eq. (1):

g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}\, \exp\!\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2}+\frac{y^2}{\sigma_y^2}\right)+2\pi j w x\right]    (1)


Let g(x, y) be the mother Gabor wavelet; the filter coefficients can be obtained by appropriate dilations and rotations of g(x, y), as depicted in Eq. (2):

g_{mn}(x, y) = a^{-m}\, g(\tilde{x}, \tilde{y})    (2)

where m specifies the scale and n the orientation of the wavelets; m and n are integers with m = 0, 1, 2, …, M − 1 and n = 0, 1, 2, …, N − 1. The integers M and N represent the total number of scales and orientations applied in the wavelet transform, respectively, and the rotated coordinates are given by Eqs. (3) and (4):

\tilde{x} = a^{-m}(x\cos\theta + y\sin\theta)    (3)

\tilde{y} = a^{-m}(-x\sin\theta + y\cos\theta)    (4)

where a > 1 and \theta = 2\pi n/N. Let I(x, y) be the gray level of an input medical image; the convolution of this image I with a Gabor kernel G_{mn} is given by Eq. (5):

G_{mn}(x, y) = \sum_{s}\sum_{t} I(x - s, y - t)\, g^{*}_{mn}(s, t)    (5)

where s and t are the filter mask size variables and g^{*}_{mn} is the complex conjugate of the Gabor function g_{mn}. Applying the Gabor filters to the whole medical image at different orientations and scales yields an array of magnitudes; these magnitudes at the different scales and orientations are finally summed to obtain a noise-free, contrast-stretched image.
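A small sketch of a Gabor filter bank along these lines is shown below, using OpenCV's built-in Gabor kernels. The scale and orientation counts, kernel size and wavelength schedule are illustrative choices rather than the exact parameters of the presented method, and the input file name is hypothetical.

```python
import cv2
import numpy as np

def gabor_enhance(gray, num_scales=3, num_orientations=4):
    """Apply a bank of Gabor filters and sum the response magnitudes."""
    accumulated = np.zeros_like(gray, dtype=np.float64)
    for m in range(num_scales):
        for n in range(num_orientations):
            theta = n * np.pi / num_orientations          # orientation
            lambd = 8.0 / (2 ** m)                        # wavelength shrinks with scale
            kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                        lambd=lambd, gamma=0.5, psi=0,
                                        ktype=cv2.CV_64F)
            response = cv2.filter2D(gray.astype(np.float64), cv2.CV_64F, kernel)
            accumulated += np.abs(response)               # magnitude at scale m, orientation n
    # Stretch the summed magnitudes back to the 8-bit display range.
    return cv2.normalize(accumulated, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

if __name__ == "__main__":
    image = cv2.imread("brain_slice.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
    if image is not None:
        cv2.imwrite("brain_slice_enhanced.png", gabor_enhance(image))
```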

4 Experimental Results The experiments use the established dataset [19], a brain tumor dataset containing 3064 images from 233 patients with three kinds of brain tumor: meningioma (708 slices), glioma (1426 slices) and pituitary tumor (930 slices). Low-light images from this dataset are enhanced through the multiresolution Gabor wavelet transform. To quantify our results appropriately, three parameters were explored: mean average error (MAE), peak signal-to-noise ratio (PSNR) and image enhancement factor (IEF), defined in Eqs. (8), (6) and (9), respectively. MAE, PSNR and IEF need a base image for their evaluation.

\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{255^2}{\frac{1}{m\,n}\, se}\right)    (6)

where m and n are the dimensions of the images and

se = \sum_{x=1}^{m}\sum_{y=1}^{n} |I(x, y) - O(x, y)|^2    (7)

where I(x, y) and O(x, y) are the base input image and the output/restored image, respectively.

\mathrm{MAE} = \sum_{x=1}^{m}\sum_{y=1}^{n} |I(x, y) - O(x, y)|    (8)

\mathrm{IEF} = \frac{\sum_{x=1}^{m}\sum_{y=1}^{n} |In(x, y) - I(x, y)|^2}{\sum_{x=1}^{m}\sum_{y=1}^{n} |I(x, y) - O(x, y)|^2}    (9)

where In(x, y) is the low-illumination input image. Results obtained through various techniques, namely histogram equalization (HE), the discrete wavelet transform (DWT) with the Haar wavelet and the discrete Fourier transform (DFT), are depicted in Fig. 2, and the performance parameters MAE, PSNR and IEF are tabulated in Table 1. They clearly indicate the performance with respect to qualitative and quantitative analysis: our method achieves comparatively higher IEF and PSNR and lower MAE. Figure 3 shows the images restored from various low-illumination medical brain images taken from the dataset.
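The three parameters can be computed directly from Eqs. (6)–(9); a short sketch, assuming 8-bit grayscale arrays of equal size, is given below. The simulated test images in the example are placeholders.

```python
import numpy as np

def quality_metrics(base, restored, low_light):
    """MAE, PSNR and IEF as in Eqs. (6)-(9); arrays must share the same shape."""
    base = base.astype(np.float64)
    restored = restored.astype(np.float64)
    low_light = low_light.astype(np.float64)
    se = np.sum((base - restored) ** 2)                   # Eq. (7)
    mae = np.sum(np.abs(base - restored))                 # Eq. (8); divide by base.size for a per-pixel mean
    psnr = 10 * np.log10(255.0 ** 2 / (se / base.size))   # Eq. (6)
    ief = np.sum((low_light - base) ** 2) / se            # Eq. (9)
    return mae, psnr, ief

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.integers(0, 256, (64, 64))
    low = np.clip(base * 0.3, 0, 255)                               # simulated low-illumination input
    out = np.clip(base + rng.normal(0, 2, base.shape), 0, 255)      # simulated restored output
    print(quality_metrics(base, out, low))
```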

Fig. 2 Experimental results obtained on public dataset through various methods (panels: input image captured under low illumination; output images for HE, DFT, DWT, Gabor filtered and our work)

Table 1 Performance parameters

Technique     HE       DFT       DWT         Our work
MAE           85.64    41.84     2.58        0.017
PSNR (dB)     8.61     12.81     33.50       63.55
IEF           38.18    107.94    15,096.0    1,311,900

5 Conclusion In this paper, an image enhancement technique based on the multiresolution Gabor wavelet transform is presented. Medical images are mostly captured in low-light environments and are affected by noise and low contrast. The convolutional neural network (CNN) is now widely applied to detect abnormalities in medical images, and training the CNN necessitates an appropriate dataset of quality images from which feature vectors can be extracted. Therefore, it is utmost necessary to apply image enhancement techniques to make medical images suitable for further processing through computer vision systems. Experimental results on several low-illumination medical images demonstrate the best results in terms of enhancement parameters and visual testing. The Gabor wavelet transform, through its multiresolution approach, demonstrates better image enhancement compared with the DWT, the DFT and histogram equalization, especially for low-illumination images. Thus, it is preferable to apply the multiresolution Gabor wavelet transform for preprocessing of medical images to assist further diagnosis and analysis.

Fig. 3 Experimental results obtained on public dataset through multiresolution Gabor wavelet transform (our work): input image captured under low illumination and output image (our work)

References 1. Kadir, T., Gleeson, F.: Lung cancer prediction using machine learning and advanced imaging techniques. Transl. Lung Cancer Res. 7(3), 304–312 (2018) 2. Makaju, S., Prasad, P.W.C., Alsadoon, A., Singh, A.K., Elchouemi, A.: Lung cancer detection using CT scan images. Procedia Comput. Sci. 125, 107–114 (2018) 3. Zhang, G., Jiang, S., Yang, Z., Gong, L., Ma, X., Zhou, Z., Bao, C., Liu, Q.: Automatic nodule detection for lung cancer in CT images: a review. Comput. Biol. Med. 103, 287–300 (2018) 4. Zhang, J., Xia, Y., Cuia, H., Zhang, Y.: Pulmonary nodule detection in medical images: a survey. Biomed. Signal Process. Control 43, 138–147 (2018) 5. Uzelaltinbulat, S., Ugurb, B.: Lung tumor segmentation algorithm. Procedia Comput. Sci. 120, 140–147 (2017) 6. Nithila, E.E., Kumar, S.S.: Segmentation of lung nodule in CT data using active contour model and Fuzzy C-mean clustering. Alexandria Eng. J. 55, 2583–2588 (2016) 7. Abdullah-Al-Wadud, M., Kabir, M.H., Dewan, M.A.A., Chae, O.: A dynamic histogram equalization for image contrast enhancement. IEEE Trans. Consum. Electron. 53(2), 593–600 (2007) 8. Jobson, D.J., Rahman, Z.-U., Woodell, G.A.: Properties and performance of a center/surround retinex. IEEE Trans. Image Process. 6(3), 451–462 (1997) 9. Rahman, Z., Jobson, D.J., Woodell, G.A.: Multi-scale retinex for color image enhancement. In: Proceedings of 3rd IEEE International Conference on Image Processing, pp. 1003–1006 (1996)


10. Jobson, D.J., Rahman, Z.-U., Woodell, G.A.: A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans. Image Process. 6(7), 965–976 (1997) 11. Sharma, D., Jindal, G.: Identifying lung cancer using image processing techniques. In: International Conference on Computational Techniques and Artificial Intelligence (ICCTAI), pp. 115–120 (2011) 12. Chaudhary, A., Singh, S.S.: Lung cancer detection on CT images by using image processing. In: IEEE International Conference on Computing Sciences (ICCS), pp. 142–146 (2012) 13. Gupta, A., et al.: Methods for increased sensitivity and scope in automatic segmentation and detection of lung nodules in CT image. In: IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 375–380 (2015) 14. Shen, L., Yue, Z., Feng, F., Chen, Q., Liu, S., Ma, J.: MSRnet: low-light image enhancement using deep convolutional network. Available https://arxiv.org/abs/1711.02488 (2017) 15. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016) 16. Jakimovski, G., Davcev, D.: Using double convolution neural network for lung cancer stage detection. Appl. Sci. 9(427), 1–12 (2019) 17. Chapaliuk, B., Zaychenko, Y.: Deep learning approach in computer-aided detection system for lung cancer. In: IEEE International Conference on System Analysis and Intelligent Computing (SAIC), Ukraine (2018) 18. Li, Z., Li, L.: A novel method for lung masses detection and location based on deep learning. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM), America (2017) 19. https://figshare.com/articles/brain_tumor_dataset/1512427/5

HOMER-Based DES for Techno-Economic Optimization of Grid R. Raja Kishore, D. Jaya Kumar, Dhonvan Srinu, and K. Satyavathi

Abstract This study presents the techno-economic feasibility of a grid-connected distributed energy system (DES), or micro-grid, for a large technical institute. It concentrates on how to optimize the electricity drawn from the grid by supplying as much renewable energy as reasonably possible, and in addition it integrates green vehicle transportation such as hydrogen gas and electric cars, which are necessary elements of sustainability in the proposed system. The work starts by collecting the institute's monthly electrical load data, climate data and associated monetary data with the aim of carrying out a feasibility study of a renewable energy supply system. Different scenarios are developed according to the project needs, and the scenarios are modelled with the HOMER software. The study concludes with a direct comparison of the economic feasibility, renewable energy fraction and emissions among all the systems in search of an appropriate sustainable solution. This study will provide helpful insights to the relevant stakeholders and policymakers in the development of grid-connected distributed energy systems. The analysis is performed with the HOMER software, which can simulate hundreds or even thousands of system configurations; HOMER simulates the operation of a hybrid micro-grid for an entire year, in time steps from one minute to 60 min. Keywords DES modelling · HRES · HOMER size and cost optimization

R. Raja Kishore (B) · D. Jaya Kumar · D. Srinu Department of ECE, Marri Laxman Reddy Institute of Technology and Management, Hyderabad, India e-mail: [email protected] D. Jaya Kumar e-mail: [email protected] D. Srinu e-mail: [email protected] K. Satyavathi Department of ECE, Nalla Malla Reddy Engineering College, Hyderabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_42


1 Introduction In the present scenario, every country places considerable importance on sustainable development and energy security; hence, hybrid renewable energy sources are becoming more significant. Energy is an essential need for enhancing the income and improving the quality of life of individuals [1]. Developing countries that are growing their economies are in great demand of electricity access to facilitate their economic and industrial growth. These days, renewable energy resources are one of the promising approaches to addressing numerous issues: climate change, desertification, the greenhouse effect and so on are leading the world towards a sustainable energy era [2]. Natural and sustainable resources such as wind, solar, geothermal, tidal, wave and hydroelectric power offer clean alternatives to fossil fuels [3]; they are omnipresent, abundant, free, clean and easily accessible even in isolated and undeveloped places. One of the challenges in developing a renewable energy system with the least socio-economic problems is to design a new form of fuel supply [4]; such energy, generated repeatedly from the same source, needs to be appropriately designed and evaluated from the beginning stages [5]. The unreliable nature of these energy sources is one of the shortcomings in their development, particularly as more reliable energy sources must be available to take over when it is essential to serve the load [6]. This inadequacy, coupled with a large initial expense and a heavy dependence on weather and climate conditions, makes it essential to combine various renewable resources into a hybrid system that can be used more flexibly, cost-effectively, reliably and efficiently [7]. Careful planning and assessment should ensure the effective implementation of such a hybrid power system: training the operators, obtaining the participation of the community in electrification programs, supervising the installation and commissioning, having a maintenance structure in force and following up the maintenance and reporting are all essential for the successful implementation of a hybrid power system.

2 HOMER Software HOMER Pro is micro-grid software, and HOMER Energy is the worldwide standard for optimizing micro-grid design in all sectors, from village power and island utilities to grid-connected campuses and military installations [3]. It was originally developed at the National Renewable Energy Laboratory and is now enhanced and distributed by HOMER Energy. The hybrid optimization model for multiple energy resources (HOMER) nests three powerful tools in one software product, so that the engineering and the economic sides work side by side, as shown in Fig. 1.


Fig. 1 HOMER Pro screenshot

2.1 Simulation At its core, HOMER is a simulation model: it attempts to simulate a viable system for all possible combinations of the equipment that you wish to consider. Depending on how you set up your problem, the HOMER software can simulate hundreds or even thousands of system configurations. HOMER simulates the operation of a hybrid micro-grid for an entire year, in time steps from one minute to 60 min.
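HOMER itself is proprietary, so purely to illustrate what an hourly energy-balance simulation of a PV–battery–grid system involves, a toy sketch is given below. The dispatch rule, component sizes and efficiencies are arbitrary assumptions and do not represent HOMER's internal models.

```python
# Toy hourly energy balance for a grid-connected PV + battery system.
# All sizes, efficiencies and the dispatch rule are illustrative assumptions.
def simulate_year(load_kwh, pv_kwh, battery_kwh=50.0, battery_eff=0.9):
    soc = 0.0                      # battery state of charge (kWh)
    grid_import = 0.0
    for demand, pv in zip(load_kwh, pv_kwh):
        surplus = pv - demand
        if surplus >= 0:
            # Charge the battery with excess PV; any remainder is spilled.
            soc = min(battery_kwh, soc + surplus * battery_eff)
        else:
            deficit = -surplus
            discharge = min(soc, deficit)
            soc -= discharge
            grid_import += deficit - discharge   # remainder bought from the grid
    return grid_import

if __name__ == "__main__":
    hours = 8760
    load = [20.0 if 6 <= h % 24 <= 22 else 0.0 for h in range(hours)]  # kWh per hour
    pv = [15.0 if 8 <= h % 24 <= 17 else 0.0 for h in range(hours)]
    print(f"Annual grid import: {simulate_year(load, pv):.0f} kWh")
```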

2.2 Optimization HOMER examines every possible combination of system types in a single run and then ranks the systems according to the optimization variable of interest. HOMER Pro features an optimization process that considerably simplifies the task of identifying least-cost options for micro-grids or other distributed generation of electrical power systems. The HOMER Optimizer is a patented "derivative-free" optimization process designed specifically to work in HOMER, as shown in Fig. 2.

Fig. 2 Optimization window

3 Case Study This case study was developed to test the ability of the multilevel optimization method to analyse sites with different climate conditions. For this, we consider the Marri Laxman Reddy Institute of Technology and Management campus, which has three blocks (main block, SR block and MV block) and is located in Hyderabad, Telangana (Fig. 3). The college campus is selected as it exhibits distinct climate conditions for solar energy. Table 1 shows the mean horizontal insolation. Since we are interested in analysing the influence of climate conditions, we fixed the load profile for the campus; consequently, for this case, we used the sample load profile shown in Fig. 4, which mimics the campus's average electricity demand of 168.4 kWh/day.

Fig. 3 Proposed location

3.1 Renewable Energy Resources (Solar Resource) Data The solar resource input data for HOMER consist of the monthly averaged daily insolation incident on a horizontal surface (kWh/m2/day) from the NASA Surface Meteorology and Solar Energy (SSE) website. NASA gives monthly averaged values from 10 years of data. Owing to the short distance between them, the location data of the city of Hyderabad is used as the location data of the college, Marri Laxman Reddy Institute of Technology and Management, in this study. The following location data is used to obtain the solar radiation data:

Table 1 Monthly average solar global horizontal irradiance

Month        Clearness index    Daily radiation (kWh/m2/day)
January      0.645              5.060
February     0.663              5.820
March        0.646              6.360
April        0.616              6.510
May          0.580              6.280
June         0.447              4.840
July         0.395              4.260
August       0.395              4.180
September    0.452              4.529
October      0.532              4.790
November     0.607              4.850
December     0.630              4.740

Fig. 4 Power demand for a sample week of an average sized campus

Latitude: 17° 35′ 39.51″ N; Longitude: 78° 24′ 59.28″ E; Time Zone: UTC+5:30 (New Delhi). Table 1 presents the obtained solar radiation data for the Marri Laxman Reddy Institute of Technology and Management campus. The solar resource raw data input to the software is the average global horizontal radiation measured at 10-min intervals over two years. In addition to the solar resource data, the latitude and longitude of the area are used as inputs, and the time zone is another parameter to be set. The college is at latitude 17° 35′ 39.51″ N and longitude 78° 24′ 59.28″ E, with a time zone of UTC+5:30. The annual solar radiation available at the study location is 5.18 kWh/m2/day according to HOMER. HOMER evaluates the PV array power for the year on an hourly basis and uses the latitude value to compute the average daily radiation from the clearness index and vice versa. The yearly averaged daily solar insolation here was found to be 5.18 kWh/m2/day. The efficiency of the PV array is not a HOMER input, because the software does not specify the PV array size in terms of m2 but in kW of rated capacity. The rated capacity is the amount of power the PV module delivers under standard test conditions (STC) and accounts for the panel efficiency.
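The annual figure quoted above can be checked against the monthly values of Table 1; the simple (unweighted) monthly mean computed below closely reproduces the 5.18 kWh/m2/day reported by HOMER.

```python
# Monthly average daily radiation (kWh/m2/day) from Table 1.
monthly_radiation = {
    "Jan": 5.060, "Feb": 5.820, "Mar": 6.360, "Apr": 6.510,
    "May": 6.280, "Jun": 4.840, "Jul": 4.260, "Aug": 4.180,
    "Sep": 4.529, "Oct": 4.790, "Nov": 4.850, "Dec": 4.740,
}
annual_average = sum(monthly_radiation.values()) / len(monthly_radiation)
print(f"Annual average daily radiation: {annual_average:.2f} kWh/m2/day")  # ~5.18
```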


By working with rated capacity, HOMER does not need to handle the efficiency explicitly, since two modules with different efficiencies (and the same area) would simply be assigned different sizes. Solar resource data was downloaded on 4/21/2020 at 11:21:02 AM from the NASA Surface Meteorology and Solar Energy database:
• Cell Number: 107258
• Cell Dimensions: 1 Degree × 1 Degree
• Cell Midpoint Latitude: 17.5
• Cell Midpoint Longitude: 78.5
• Annual Average Radiation: 5.18 kWh/m2/day.

4 Description of Hybrid Renewable Energy System The proposed energy system is expected to meet the electricity load demand of the community, which also includes classrooms. The renewable energy sources considered here are mainly solar and wind; because of the intermittent nature of renewable energy, a battery bank is employed as the storage system. In this configuration, a two-way (bidirectional) converter is included: it converts AC power to DC to charge the battery and converts the stored DC power back to AC to supply the AC loads, since AC power is required by all the consumers. Part of the input values to the software are given according to size and quantity; the other components, the solar PV and the converter, also vary in size. The simulated model of the hybrid architecture considered in this paper is presented in Fig. 5.
Fig. 5 Architecture of the selected technologies of the hybrid system


5 Size and Cost Optimization Immediately after selecting the component technologies from the HOMER library, the electrical load is entered into the modeling tool. The primary load input is entered on a 24-h basis, and from that the software models a peak load. It additionally builds the month-to-month load from the 24-h input data. This paper describes the primary electrical load and its inputs. HOMER generates a weekend load and loads for August, January and the remaining months from the 24-h input data, which portrays the diurnal variation of the primary load profile of the college. Figure 4 shows the primary load demand and indicates that the load profile changes during the day. The load goes to zero from midnight to 6:00 in the morning, and the demand rises from 6:00 to 9:00. Around lunch time, i.e. from 12:00 PM to 2:00 PM, there is a greater demand for power. There is also a greater demand around dinner time; the peak hours are from 6:00 PM to 12:00 AM (midnight). This clearly demonstrates that electricity is consumed mostly for lighting purposes.

5.1 Cost Data and Size Specifications of Each Component Since the chief purpose of the work is to investigate the best power system configuration that meets the requirements with minimum NPC and COE, these are the basic criteria for the selection of the power system components in this work. The cost of equipment was estimated on the basis of current costs available in the market. Initial capital cost: the total installed cost of purchasing and installing the components at the beginning of the project. O&M cost: the cost of maintaining and operating the system. All the components of this scheme are considered in this project with variable operational and maintenance costs. Miscellaneous O&M costs in HOMER include emission damages, capacity shortage penalty, and fixed operation and maintenance cost. Replacement cost: worn-out components must be replaced at the end of their lifetime. This cost differs from the initial capital cost because not all parts of a component need to be replaced at the end of the life cycle, and costs borne by donors may offset or reduce the initial cost; however, the replacement cost may not include items such as travel cost.
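As background for the NPC and COE selection criteria mentioned at the start of this subsection, the standard HOMER definitions (assumed here, since the paper does not restate them) are:

\mathrm{NPC} = \frac{C_{\mathrm{ann,tot}}}{\mathrm{CRF}(i, R_{\mathrm{proj}})}, \qquad
\mathrm{CRF}(i, N) = \frac{i(1+i)^{N}}{(1+i)^{N} - 1}, \qquad
\mathrm{COE} = \frac{C_{\mathrm{ann,tot}}}{E_{\mathrm{served}}}

where C_{ann,tot} is the total annualized cost, i the annual real interest rate, R_{proj} the project lifetime, CRF the capital recovery factor, and E_{served} the total electrical load served per year.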


Table 2 Size and cost of PV panel

Size of PV (kW)   Capital cost ($)   O&M cost ($/year)   Life span of PV (year)   Considered sizes (kW)
1                 500                250                 15                       0, 100, 200

5.2 Solar PV Size and Cost After considering different products with respect to cost, the product was chosen from the stated company because of its low cost, and it is expected to give efficient service for a considerably long time. We considered a 50 kW solar array composed of 250 W panels delivered by the Generic PV Company. The panel is known as Generic Flat Plate PV, built with mono-crystalline silicon, with an efficiency of 20.4% and a price range from $1.16 to $1.31/W. The installation cost is taken as 60% of the PV price. The operation and maintenance cost is expected to be 1% per year; other details are found in Table 2.

6 Simulation Results and Discussions Optimization results are presented in both overall and categorized form, showing the most workable power system structures that are suitable for the given load and resource inputs; the feasible solutions appear in increasing order of net present cost. The categorized table gives the least-cost configuration among all the setups, while the overall optimization results present all the feasible system combinations ranked by NPC. The net present cost was the basis for selecting the power systems. Parameters such as low excess electricity generation, low capacity shortage and high renewable fraction are used to characterize the power generation schemes in order to test their technical feasibility. Optimization results for the selected hybrid power system are shown in Fig. 6.

7 Conclusion A viable system design of a renewable energy system using distributed energy resources for application at the college has been configured for different cases of connection of renewable sources. Case studies have been carried out considering solar as the renewable source under different cases. A further case study was completed by reconsidering the energy load and the resource availability for electricity production at the site. The scaled annual average electrical load at the college was estimated as 168.4 kWh/day with a peak load of 27.34 kW. The electrical analysis shows that the remaking scenarios are not an economical fit for the current situation.


Fig. 6 HOMER results

Under the conditions considered in this study, the relatively low NPC of the system depends strongly on the price at which power can be sold to the grid. Selling electricity therefore plays an important role in the economic suitability of the system; such an agreement would make the energy system much more economically viable for the college, supply power to the state power grid, reduce CO2 emissions, and contribute to increased renewable energy use and increased availability of power supply.


Odor and Air Quality Detection and Mapping in a Dynamic Environment Raghunandan Srinath, Jayavrinda Vrindavanam, Rahul Rajendrakumar Budyal, Y. R. Sumukh, L. Yashaswini, and Sangeetha S. Chegaraddi

Abstract Biodegradable wastes, if not collected regularly, can pollute the environment and surroundings and can be a major health risk. This paper proposes a cost-effective technique that can detect garbage on pavements and at decentralized collection points through odor sensing. The system is mounted on a moving vehicle and consists of an MQ series sensor which detects the foul smell. The sensor is designed to send the information, which is a value that indicates the level of toxicity of the smell, together with the location obtained from a global positioning system fitted in the vehicle. The information is sent with the support of a LoRa (Long Range) network and the cloud. The location-wise level of toxicity captured on a master screen would support the authorities in prioritizing the areas to be cleaned up first and would also support monitoring the results of the action. Keywords Odor · Air quality · Heltec ESP32 · LoRa · MQ series sensor · MapBox (API) · GPS · Firebase (database)

R. Srinath SenZopt Technologies, Bengaluru, India e-mail: [email protected] J. Vrindavanam (B) · R. R. Budyal · Y. R. Sumukh · L. Yashaswini · S. S. Chegaraddi Department of ECE, Nitte Meenakshi Institute of Technology, Bengaluru, India e-mail: [email protected] R. R. Budyal e-mail: [email protected] Y. R. Sumukh e-mail: [email protected] L. Yashaswini e-mail: [email protected] S. S. Chegaraddi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_43


1 Introduction The waste disposal management systems of cities around the world have been facing challenges on account of ever-increasing urbanization and the concomitant rising volume of waste generated and littered. Public waste bins are filling up faster than ever, and inevitably many of the bins overflow prior to collection, not only causing bad odors and garbage overflow near the dumping area, but also posing health hazards and environmental pollution, as overflowing and uncollected waste bins are a perfect location for the growth of bacteria, insects, and vermin. The flies that feed on the rubbish can spread diseases like typhoid, gastroenteritis, and other major illnesses. Overflowing garbage smell causes various respiratory diseases and a fall in air quality, with adverse health effects in the form of disease-causing pathogens entering the human body through breathing and contact. Though there can be quite a few substances in the waste, in general the air gets polluted with gases like carbon dioxide (carbonic acid gas), nitrous oxide (laughing gas), ammonia, and methane. In daily life, we identify polluted air through the smell of odors, which are usually caused by the decomposition of biodegradable items. To ensure timely collection of food and other perishable wastes at risk of degeneration, municipal authorities in certain localities have introduced static sensors in areas like markets and other specific locations. Since wastes are generated at wider locations within the cities, timely collection of biodegradable wastes assumes importance, as delays can pollute the air and the nearby environment. Instead of installing a few sensors at specific locations, this paper proposes placing the sensors in moving objects like vehicles and tracking all sorts of odors so that city-wide odors can be sensed and appropriate action can be initiated. When the sensors are fitted to a large number of vehicles, inputs on air odor can be obtained from multiple locations, which is found to be a better and more effective approach that can support a cleaner environment. The odor sensing device proposed in this paper continuously detects, measures, and monitors odorous gaseous contaminants. The solution incorporates Odor Atmospheric Dispersion Modeling (OADM) for predicting the odor impact on the surrounding area depending on meteorological conditions. With the help of meteorological data, the odor sensing device can trace the odorant dispersion plume induced by conditions like wind speed and wind direction. The odor sensing device uses LoRa, a low-power wide-area network (LPWAN) technology, which is one of the most cost-effective approaches in such conditions. The odor sensor is implemented using chemical sensors (MQ-2, MQ-3, MQ-9, MQ-135, etc.) and air quality sensors. Whenever the threshold point of a chemical sensor is reached, the sensor data is sent to the LoRa gateway together with the location (longitude and latitude) of the vehicle. A LoRa main hub is placed within every 3–5 km radius. The LoRa receiver receives this data and sends it to the cloud. From the cloud, the municipality can take action to clean that area. All this data is sent to the corporation for the upkeep of cleanliness and keeping the environment clean.


In the present scenario, in most countries the sensors introduced by the municipal authorities are static and accordingly placed only in a few select locations. Given the increased attention to cleanliness among cities (for example, the Swachh Bharat Abhiyan (Clean India Mission) and the ranking of cities based on cleanliness in India), the project can support the relevant authorities in ensuring better living conditions by reducing hazardous odors and enabling timely waste collection using the proposed system, which is a moving sensor that can detect foul smell in any part of the city. Section 1 above is the introduction, and Sect. 2 explains the literature review. The proposed system is discussed in Sect. 3. Results are analyzed in Sect. 4, and Sect. 5 concludes the paper.

2 Literature Review In the year 2004, a study brought out a detection instrument to identify odor pollution in the environment [1]. Keeping in view the criticality of waste management for ensuring better public health, a number of studies were thereafter carried out on the organic components and chemical compositions present in odor. A measurement method was introduced for odor emission capacity to describe the amount of odorants present. Odor compounds can also be recognized by means of detection instruments like gas chromatography. Yet another method introduced has been the E-nose, an instrument developed to approximate the biological olfaction system. This system comprises electronic chemical sensors with partial specificity and an appropriate pattern recognition system, capable of recognizing simple or complex odors. In another study [2] on odor detection methods, such as olfactometry and chemical sensors, the authors examined the state of both human and instrumental sensing currently used for the detection of odors. The olfactometric techniques employing a panel of trained experts were discussed, and the strong and weak points of odor assessment through human detection were highlighted. Further, the paper also discussed the merits and demerits of instrumental sensory methods and chemical sensors. The limitation of dynamic olfactometry is that it provides point odor concentration data, which is not sufficient for a full evaluation. There are also studies that have attempted comparisons and integrations between olfactometry and E-nose, and their outcomes were listed. Another paper viewed that using more than one approach is required for a better understanding of olfactory nuisance cases. Monitoring household garbage odors in urban areas through distribution maps was proposed in [3], which introduced an e-nose mounted on a bicycle with sensors such as the MQ series (2, 9) and TGS series (2620, 2602a, 2602b). The limitation is the use of a bicycle for monitoring the waste; furthermore, the bicycle is equipped with expensive devices such as a laptop, GPS module, and e-nose. Other work by Deepak and Neeta [4] surveyed odor detection and sensors in the year 2017. The paper reviewed various odor detection systems and sensors that can be employed in the real world for the detection, identification, and classification of the various odors present in the air.


Surface acoustic wave sensors were created to detect multiple volatile organic compounds. Metal oxide sensors also possess the capability to detect volatile compounds within a range. New sensors, called biosensors, were introduced based on the biological olfactory system, and gas chromatography systems enable the detection of VOCs. The various odor detection sensors give in-depth knowledge about the different aspects of odor used during detection and classification, and also increase the efficiency of odor detectors. Dabholkar et al. [5] proposed a method for illegal dumping detection. The authors demonstrated that the application of a deep learning approach can facilitate automatic detection of illegal dumping. The authors explored multiple design trade-offs to achieve better accuracy with a smaller memory footprint. Okokpujie [6] and others introduced the idea of a smart air pollution monitoring system to continuously track air quality, with the measured air quality indicators displayed on a screen. The authors developed a platform named "ThingSpeak," where the collected indicators were displayed. The purpose of the system was to enhance public awareness of air quality. This monitoring device was capable of delivering real-time measurements of air quality. A smart air quality monitoring system with LoRa (Long Range) WAN was proposed by Thu [7] and others in the year 2018. The smart air quality monitoring system, which the paper described as an end-to-end system, was implemented in Yangon, the business capital of Myanmar. The smart system, according to the paper, allowed users to access an online dashboard to monitor the real-time status of the air quality and also had a feature that allowed users to retrieve past data by themselves. The smart system also has the feature of adding sensor nodes and gateways in case the implementation team decides to extend the scope of the monitored area. Yet another intervention has been the IoT-based E-tracking system [8] that enables monitoring of garbage. The paper viewed the proposed application as economical apart from being a long-range automated garbage monitoring system. The system was capable of generating real-time data and analysis, supported through a web portal and an Android application. Through the Android application, the system sends a notification to the garbage collector about the garbage level along with its ID and location using a GPS module. This feature also supported route optimization. In addition, the proposed system made use of a machine learning model to predict the impact of air pollution based on the collected air quality parameters for a given period of time. This enabled pro-active management of resources and their deployment and also contributed to enhanced efficiency in terms of time and cost. Unlike the above studies, the present implementation is novel and cost effective, as it uses vehicles as a platform for fixing the sensors, and the online connectivity is achieved through LoRa.


3 Proposed System The proposed system uses the MQ-2 gas sensor, which is highly sensitive to LPG, i-butane, propane, methane, alcohol, hydrogen, and smoke. The MQ-135 gas sensor is sensitive to ammonia, sulfide, and benzene vapor, and also to smoke and other harmful gases. The Heltec ESP32 module is an integrated system consisting of an ESP32, LoRa, and an OLED display. Initially, the device senses the odor values using the MQ series sensors placed within the device. The smell is classified as bad or good based on the threshold that has been set. If the detected smell is bad, then the sensor will send the information value to the LoRa module, and it will also fetch the location of the vehicle. LoRa, as already explained in the literature, is a wireless technology that provides long-range, low-power, and secure data transmission for M2M and Internet of Things (IoT) applications. LoRa is based on chirp spread spectrum modulation, which has low-power characteristics like Frequency Shift Keying (FSK) modulation but can be used for long-range communication. The sensor will send the sensor data together with the location of the vehicle (latitude and longitude) to the LoRa gateway, provided that the threshold value is reached. The location is fetched using the GPS module. The LoRa receiver will receive the information and send it to the cloud; from the cloud, everyone can access the map. LoRa is a low-powered module and can be connected to the vehicle battery using voltage regulators. The block diagram of the proposed device is shown in Fig. 1. Advantages of the proposed model are as follows: • Quality of the air can be measured. • Prevents spreading of disease by detecting the foul smell of dead animals. • Municipal authorities will get the locations to be cleaned.

Fig. 1 Block diagram of the proposed system. Firebase—to fetch database, MapBox—for maps, PyQt5—for creating GUI, MQ series sensors—odor detection, LoRa—to receive and send the data, ESP32—microcontroller


Fig. 2 Flow chart of the proposed system

The flow chart of the proposed system is shown in Fig. 2. The sensor is used to classify the smell as bad or good. If the detected smell is bad, then the sensor sends the information value to the LoRa module and also fetches the location of the vehicle.
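The node-side decision loop of Fig. 2 can be illustrated with the minimal simulation sketch below. It is not the authors' firmware: the helper functions, threshold values, and the dummy GPS fix are hypothetical stand-ins for the ADC reads, GPS parser, and LoRa uplink of the real device.

// Illustrative sketch of the node loop: read MQ sensor values, compare with
// thresholds, and "send" the reading plus the GPS position when a threshold
// is exceeded. All helpers and thresholds are hypothetical placeholders.
#include <cstdio>
#include <cstdlib>

struct GpsFix { double lat; double lon; };

int readMQ2()    { return std::rand() % 1024; }   // stand-in for an ADC read
int readMQ135()  { return std::rand() % 1024; }
GpsFix readGPS() { return {0.0, 0.0}; }           // dummy fix; a real node parses NMEA

// Stand-in for a LoRa uplink; a real node would frame and transmit a packet.
void loraSend(int mq2, int mq135, const GpsFix& fix) {
    std::printf("uplink: mq2=%d mq135=%d lat=%.4f lon=%.4f\n",
                mq2, mq135, fix.lat, fix.lon);
}

const int MQ2_THRESHOLD   = 400;   // assumed calibration thresholds
const int MQ135_THRESHOLD = 300;

int main() {
    for (int i = 0; i < 10; ++i) {                // a few simulated sensing cycles
        int mq2 = readMQ2(), mq135 = readMQ135();
        // Report only "bad" air, which keeps LoRa airtime and power low.
        if (mq2 > MQ2_THRESHOLD || mq135 > MQ135_THRESHOLD) {
            loraSend(mq2, mq135, readGPS());
        }
    }
    return 0;
}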

4 Results and Discussion In the proposed system, as shown in Fig. 3, the sensor is mounted on vehicles plying in the city and supports city-wide tracking of odor across multiple locations. This ensures that considerable data points are gathered, and the municipal authorities can initiate action depending upon the intensity of the odor. The GUI shown in Fig. 4 is a representative screen which provides the data where garbage is detected by the vehicle. The data provided is in terms of latitudinal and longitudinal coordinates that enable us to spot the location of the garbage. Data 1 and data 2 correspond to the values of the MQ sensors, i.e., MQ-2 and MQ-135, respectively. If the sensor data exceeds the threshold value, it indicates a foul smell. Live data is highlighted on the map whenever the value of the sensors surpasses the threshold value, as shown in Fig. 5. The bar graph on the right indicates the intensity of the foul smell, which is depicted with different colors.


Fig. 3 Snapshots of the vehicle sensing the odor

Fig. 4 Snapshot of the GUI

The foul smell with the highest intensity is represented by red color, and the lowest intensity values by blue. The color code enables the municipal authorities to give priority to the red hotspot areas over the remaining colors.

5 Conclusion The paper has introduced a vehicle-mounted odor detection and tracking system supported by the LoRa network and a GUI. The results on the map indicate the areas where the odor intensity is higher, and accordingly a waste collection plan can be initiated.


Fig. 5 Snapshot of map showing the locations to be cleaned, as detected from the vehicles

Further, the system can also be used as a supporting indicator for placing right-sized bins, keeping in view the volumes of waste generation or repeated triggers.

References 1. Yuwono, A., Lammers, P.S.: Odor pollution in the environment and the detection instrumentation. Agric. Eng. Int. CIGR J. Sci. Res. Dev. 6 (2004) 2. Brattoli, M., Gennaro, G., Pinto, V., Loiotile, A.D., Lovascio, S., Michele, P.: Odor detection methods: olfactometry and chemical sensors. Proc. J. Sens. 11(5), 5290–5322 (2011) 3. Monroy, G., Gonzalez, J.J., Sanchez-Garrido, C.: Monitoring Household Garbage Odors in Urban Areas Through Distribution Maps. Department of System Engineering and Automation IEEE Sensors. Valencia, November (2014) 4. Aeloor, D., Patil, N.: A Survey on Odor Detection and Sensors. Department of Computer Engineering (2011) 5. Dabholkar, A., Muthiyan, B., Shilpa, S., Swetha, R., Jeon, H., Gao, J.: Smart illegal dumping detection. In: 2017 IEEE Third International Conference on Big Data Computing Service and Applications (2017) 6. Okokpujie, K., Noma-Osaghae, E., Modupe, O., John, S., Oluwatosin, O.: Smart air pollution monitoring system. Int. J. Civil Eng. Technol. (IJCIET) 9(9), 799–809 (2018). ISSN: 0976-6308 and ISSN: 0976-6316 7. Thu, M.Y., Htun, W., Aung, Y.L., Shwe, P., Tun, N.M.: Smart Air Quality Monitoring System With LoRaWAN (2018) 8. Gokhale, M., Chaudhari, P., Jadhav, N., Wagh, R., Smita, K.: IOT based E-tracking system for waste management. In: The IEEE International Conference on Internet of Things and Intelligence System (2018)

A Comparative Study on the Performance of Bio-inspired Algorithms on Benchmarking and Real-World Optimization Problems E. Lakshmi Priya, C. Sai Sreekari, and G. Jeyakumar

Abstract Biologically inspired computing, shortly known as bio-inspired computing (BiC), follows models from biology to solve problems by computing. The main objective of the study presented in this paper is to present the working principles of three BiC algorithms with different biological bases. The algorithms considered were the genetic algorithm (GA), particle swarm optimization (PSO), and simulated annealing (SA). These algorithms were implemented to solve a set of benchmarking problems and a real-world image segmentation problem, and their performances were compared. The performance metrics used for the comparison were the solution obtained (So), the number of generations (NoG), and the execution time (ExeTime). It was observed from the results of the benchmarking problems that PSO gave better solutions, followed by SA and GA. For the real-world problem, it was concluded from the results that GA segmented the image better than SA and PSO. Keywords Bio-inspired algorithms · Particle swarm optimization · Genetic algorithm · Simulated annealing · Image segmentation · Comparative study

1 Introduction Nature, which exhibits diversity, dynamicity, complexity, robustness and fascinating phenomena, is a great source of inspiration for solving hard and complex problems in computer science (CS). E. Lakshmi Priya · C. Sai Sreekari · G. Jeyakumar (B) Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] E. Lakshmi Priya e-mail: [email protected] C. Sai Sreekari e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_44


For the past decades, numerous research efforts have been concentrated on solving the optimization problems around us by taking inspiration from nature. Thus, the field of study bio-inspired computing (as we call it) emerged to solve computer science problems using models of biology. It is observed across the CS community that BiC algorithms provide optimal solutions with lower computation requirements. Thus, BiC algorithms are gradually gaining huge prominence. There exist many algorithms under BiC, and each of them differs from the others in the way it solves the optimization problem. The algorithmic structure of each algorithm is matched with the biological model it follows. Hence, articles describing the working of BiC algorithms are much appreciated by practitioners as well as researchers in the community. Following this tendency, this paper proposes to compare the performance of GA, PSO and SA on well-defined benchmarking problems and a real-world medical image problem. The remaining part of the paper is organized with Sect. 2 for related works, Sect. 3 for presenting the results and discussion, and finally Sect. 4 for the conclusions.

2 Related Works A comparative study of the results of five algorithms: GA, PSO, the artificial bee colony (ABC) algorithm, the invasive weed optimization (IWO) algorithm, and the artificial immune (AI) algorithm, on some standard benchmark multivariable functions was presented in [1]. The comparison of ant colony optimization (ACO) and PSO for optimizing the membership functions of a fuzzy logic controller was presented in [2]. Another comprehensive review and comparative study of BiC algorithms was presented in [3]. A comprehensive review of the significant bio-inspired algorithms that are popularly applied in sentiment analysis is presented in [4]. A comparative study of four bio-inspired algorithms (GA, PSO, DBDE, and BSO) in finding an optimal energy saving pattern for an intelligent Internet of things-based system was presented in [5]. The authors of [6] proposed a hybrid bio-inspired algorithm for load balancing and scheduling among cloudlets; the proposed algorithm was compared with the firefly algorithm, ACO, and ABC. Aswanth et al. [7] presented a comparison of the firefly algorithm, the symbiotic organism search algorithm, the harmony search algorithm, and the k-means algorithm for clustering sensor nodes in wireless sensor networks. A similar study comparing algorithms for their performance in solving the autonomous landing problem of an unmanned aerial vehicle was presented in [8]. Following the above-mentioned trend of research, this paper proposes to compare the performance of GA [9], PSO [10] and SA [11] on a set of four benchmarking problems and a medical image segmentation problem.


3 Results and Discussions The experimental study was performed in two phases. Phase I compared the performance of the algorithms on four benchmarking functions chosen from CEC 2005 [12]. Phase II used the algorithms to solve a medical image segmentation problem and compared their performance. Phase I: GA, PSO and SA were used to solve the benchmark functions with varying dimensions. The dimensions (d) used for this study were d = 2, d = 5 and d = 10. The benchmark functions [12] taken for this phase were Ackley, Rastrigin, Griewank and Bent Cigar; these functions differ from one another in their basic properties. The performance metrics measured (solution obtained (So), number of generations (NoG), and execution time (ExeTime)) for GA, PSO and SA on the above benchmarking functions are presented in Table 1.

Table 1 Results for benchmarking functions

                      d = 2                           d = 5                           d = 10
Results       GA        PSO        SA          GA        PSO        SA          GA        PSO       SA
Ackley function
  So          4.05      4.44 e−16  0.00        7.73      8.93 e−08  0.17        7.118     0.01      18.20
  NoG         299       299        29,999      299       299        29,999      299       299       29,999
  ExeTime     0.98      0.796      2.13        1.02      1.09       2.28        1.35      1.656     2.886
Rastrigin function
  So          1.07      0.0        0.026       9.21      1.03       0.026       18.01     5.43      0.603
  NoG         299       299        29,999      299       299        29,999      299       299       29,999
  ExeTime     0.9538    0.721      1.859       0.965     1.177      1.859       1.276     1.51      2.81
Griewank function
  So          0.528     2.220 e−16 3.877       1.549     0.029      16.64       20.56     0.072     13.15
  NoG         299       299        29,999      299       299        29,999      299       299       29,999
  ExeTime     0.9171    0.753      1.879       1.028     1.044      2.381       1.405     1.476     2.811
Bent Cigar function
  So          1.09 e6   2.53 e−33  0.136       25.8 e6   1.59 e−08  4.100       598 e6    132.34    5.17
  NoG         299       299        29,999      299       299        29,999      299       299       29,999
  ExeTime     0.766     0.722      1.954       0.861     0.953      2.133       1.186     1.301     2.309

As shown in Table 1, for the Ackley function, PSO gives the best solutions at all the dimensions. GA takes less execution time as the dimension increases, PSO takes more execution time as the dimension increases, and SA takes more generations to solve the function.

It is also noticed that the performance of the algorithms decays as the dimension increases. For the Rastrigin function, PSO gives the best solutions at the lower dimension and SA gives the best solutions at the higher dimensions. The execution time taken by GA was less than that of the other algorithms at higher dimensions, while PSO takes more execution time as the dimension increases. The results for the Griewank function show that PSO performs superior, consistently, in all the dimensions in giving the best solutions. The ExeTime of GA was less than that of the other algorithms for higher dimensions. For the Bent Cigar function, for d = 2 and d = 5, PSO outperformed the others in So; at d = 10, SA gave the best So. For d = 2, the ExeTime is also less for PSO; however, for higher dimensions GA took less ExeTime compared to the other algorithms. For the unimodal functions (Griewank and Bent Cigar), PSO gave the best So consistently in all the dimensions, GA was able to get its solutions faster, i.e., with less execution time, and SA could produce good results in a few higher-dimensional cases, though with more generations. For the multimodal functions (Ackley and Rastrigin), PSO was found superior in producing the best results for varying dimensions, GA was found to take less execution time at higher dimensions, PSO took more execution time as the dimension increased, and SA was found to take more generations for the multimodal functions as well. The Phase I comparative study revealed that PSO is good at solving all the functions irrespective of their problem characteristics; however, PSO takes more execution time as the dimension increases, whereas GA takes less time as the dimension increases. SA gives better results at higher dimensions (but not consistently) and takes more generations to solve the benchmarking functions.

Phase II: The Phase II of the comparative study solves a medical image segmentation problem using GA, PSO and SA and compares their performance. Image segmentation is the process of partitioning an image into many segments and is primarily used to locate objects and boundaries in an image. Medical image segmentation plays an essential role in computer-aided diagnosis systems, which are used in various applications such as microscopy, X-rays, and MRI scans. Image segmentation is considered a most essential medical imaging process as it extracts the region of interest. The input image taken for segmentation is depicted in Fig. 1. The steps followed by GA, PSO, and SA to solve the image segmentation problem are described below. The output images obtained using GA, PSO and SA are shown in Fig. 1.
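To make the Phase I setup concrete, the sketch below evaluates two of the four benchmark objectives (Ackley and Rastrigin) for a d-dimensional candidate. The formulas are the standard textbook definitions, given here as background since the paper does not restate them.

// Standard Ackley and Rastrigin objectives used as minimization benchmarks.
#include <cmath>
#include <vector>

const double PI = 3.14159265358979323846;

double ackley(const std::vector<double>& x) {
    const double d = static_cast<double>(x.size());
    double sumSq = 0.0, sumCos = 0.0;
    for (double xi : x) { sumSq += xi * xi; sumCos += std::cos(2.0 * PI * xi); }
    return -20.0 * std::exp(-0.2 * std::sqrt(sumSq / d))
           - std::exp(sumCos / d) + 20.0 + std::exp(1.0);   // global minimum 0 at x = 0
}

double rastrigin(const std::vector<double>& x) {
    double s = 10.0 * static_cast<double>(x.size());
    for (double xi : x) s += xi * xi - 10.0 * std::cos(2.0 * PI * xi);
    return s;                                                // global minimum 0 at x = 0
}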

Fig. 1 Images a input, b GA’s output, c PSO’s output, d SA’s output


3.1 Image Segmentation Using GA The image is divided into subimages, and GA is applied to each subimage starting with a random initial population. Each individual is evaluated using an arbitrary fitness function. The best-fit individuals are selected and mated to produce offspring forming the next generation. A morphological operation is used to form the new generation with the help of crossover and mutation operators. The algorithm finally terminates and gives the final segmented subimage. The segmented subimages are combined to form the final image. The execution time taken by GA is 37.16 s.
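A generic GA skeleton of the kind described above is sketched below for binary-coded candidates; it is not the authors' implementation. The subimage decomposition and morphological post-processing are omitted, and evaluate() is a toy stand-in (a real application would plug in a segmentation-quality measure).

// Generic GA skeleton: tournament selection, one-point crossover, bit mutation.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using Genome = std::vector<uint8_t>;               // binary-coded candidate

double evaluate(const Genome& g) {                 // toy "OneMax" fitness stand-in
    double s = 0; for (auto b : g) s += b; return s;
}

std::mt19937 rng(42);

Genome tournament(const std::vector<Genome>& pop, const std::vector<double>& fit) {
    std::uniform_int_distribution<std::size_t> pick(0, pop.size() - 1);
    std::size_t a = pick(rng), b = pick(rng);
    return fit[a] > fit[b] ? pop[a] : pop[b];
}

Genome crossoverAndMutate(Genome p1, const Genome& p2, double pm) {
    std::uniform_int_distribution<std::size_t> cut(1, p1.size() - 1);
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::size_t c = cut(rng);
    std::copy(p2.begin() + c, p2.end(), p1.begin() + c);    // one-point crossover
    for (auto& bit : p1) if (coin(rng) < pm) bit ^= 1;       // bit-flip mutation
    return p1;
}

Genome runGA(std::size_t popSize, std::size_t genomeLen, std::size_t generations) {
    std::uniform_int_distribution<int> bit(0, 1);
    std::vector<Genome> pop(popSize, Genome(genomeLen));
    for (auto& g : pop) for (auto& b : g) b = static_cast<uint8_t>(bit(rng));
    for (std::size_t gen = 0; gen < generations; ++gen) {
        std::vector<double> fit(popSize);
        for (std::size_t i = 0; i < popSize; ++i) fit[i] = evaluate(pop[i]);
        std::vector<Genome> next;
        for (std::size_t i = 0; i < popSize; ++i)
            next.push_back(crossoverAndMutate(tournament(pop, fit),
                                              tournament(pop, fit), 0.01));
        pop.swap(next);
    }
    return pop.front();   // in practice, track and return the best individual seen
}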

3.2 Image Segmentation Using PSO A particular threshold level is set. For each particle in the population: update the particle's fitness in the search space, update the particle's best fitness in the search space, and move the particle in the population. Then, for each particle: if the swarm gets finer, reward the swarm and extend the particle's and the swarm's life; else remove the particle and decrease the swarm's life. Extend the swarm to breed, and it is considered for the next iteration. Delete the failed swarms and reset the threshold counter. The execution time taken by PSO is 33.39 s.
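Behind the particle steps above lies the canonical PSO velocity/position update, sketched here as background; the inertia and acceleration coefficients shown are typical textbook values, not ones reported by the authors, and the swarm life-cycle rules of the paper are not modeled.

// Canonical PSO update: v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x); x += v.
#include <cstddef>
#include <random>
#include <vector>

struct Particle {
    std::vector<double> x, v, pbest;   // position, velocity, personal best position
    double pbestFit;
};

void updateParticle(Particle& p, const std::vector<double>& gbest,
                    std::mt19937& rng,
                    double w = 0.7, double c1 = 1.5, double c2 = 1.5) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (std::size_t d = 0; d < p.x.size(); ++d) {
        double r1 = u(rng), r2 = u(rng);
        p.v[d] = w * p.v[d]
               + c1 * r1 * (p.pbest[d] - p.x[d])    // pull toward personal best
               + c2 * r2 * (gbest[d]  - p.x[d]);    // pull toward swarm best
        p.x[d] += p.v[d];
    }
}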

3.3 Image Segmentation Using SA The image is divided into subimages, and SA is applied to each subimage. Initialize the temperature to T. Calculate the energy U of the conformation. Alter the system using an appropriate Gaussian perturbation. Calculate the new energy U1 of the altered system and calculate the change in energy of the system as well [det(U) = U − U1]. If [det(U) > 0], accept the altered system as the new conformation. Else, accept the altered system as the new conformation with probability exp[det(U)/KT]. Reduce the temperature according to the cooling schedule. Repeat the above steps until the temperature cools to a considerably low value. Now SA has been applied to one subimage; repeat the above steps for each subimage and combine all the subimages to get the final segmented image. The execution time taken by SA is 40.98 s. On comparing the resultant images and the execution times taken by GA, PSO and SA, the following inferences are recorded from the Phase II comparative study. (1) GA segmented the image better, followed by SA and PSO. (2) PSO takes less execution time but with poorer quality of result. (3) The best-to-worst order of the algorithms based on the clarity of the output image is GA, SA and PSO. (4) The best-to-worst order of the algorithms based on the execution time taken is PSO, GA and SA.
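The acceptance step of Sect. 3.3 can be written out as the generic skeleton below; the Gaussian perturbation, the energy function, and the cooling factor are placeholders for the authors' subimage-specific choices, with a toy sum-of-squares energy standing in for the real one.

// Simulated annealing skeleton: perturb, compute dU = U - U1, accept if dU > 0,
// otherwise accept with probability exp(dU / (k*T)); then cool the temperature.
#include <cmath>
#include <random>
#include <vector>

double energy(const std::vector<double>& s) {        // toy energy: sum of squares
    double e = 0; for (double v : s) e += v * v; return e;
}

std::vector<double> anneal(std::vector<double> s, double T, double Tmin,
                           double cooling, double k, std::mt19937& rng) {
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double U = energy(s);
    while (T > Tmin) {
        std::vector<double> cand = s;
        cand[rng() % cand.size()] += gauss(rng);     // Gaussian perturbation
        double U1 = energy(cand);
        double dU = U - U1;                          // positive when the move improves
        if (dU > 0 || u(rng) < std::exp(dU / (k * T))) { s = cand; U = U1; }
        T *= cooling;                                // cooling schedule
    }
    return s;
}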


4 Conclusions This paper analyzed the working and the performance of three widely used bio-inspired algorithms, namely GA, PSO, and SA. An elaborate comparative study was performed in two phases. In Phase I, four benchmarking functions with different characteristics were solved by GA, PSO and SA, and their performance was compared using three performance metrics. The experiments were done with different problem dimensions. This phase identified that PSO consistently outperformed the other algorithms in producing optimal solutions at all the dimensions. GA was found to take less execution time, but not with good solutions. A few interesting higher-dimensional cases were observed where SA was able to perform better than GA and PSO. In Phase II, a medical image segmentation problem was solved by GA, PSO and SA, and their performances were compared based on solution quality and execution time. The observations were that GA was good at producing good solutions, PSO was good at solving the problem faster, and there was no remarkable performance by SA. This contradictory performance of the algorithms on the benchmarking problems and the real-world problem needs to be investigated further with a more extensive experimental setup and different optimization problems.

References 1. Krishnanand, K.R., Nayak, S.K., Panigrahi, B.K., Rout, P.K.: Comparative study of five bioinspired evolutionary optimization techniques. In: Proceedings of 2009 World Congress on Nature and Biologically Inspired Computing (NaBIC), pp. 1231–1236 (2009) 2. Castillo, O., Martinez-Marroquin, R., Melin, P., Veldez, F., Soria, J.: Comparative study of bio-inspired algorithms applied to the optimization of type-1 and type-2 fuzzy controllers for an autonomous mobile robot. Inf. Sci. 192, 19–38 (2012) 3. Kalaiarasi, S., Sirramya, P., Edreena, P.: A review and comparative stud of bio-inspired algorithms. Int. J. Appl. Eng. Res. 9(23), 23435–23448 (2014) 4. Yadav, A., Vishwakarma, D.K.: A comparative study on bio-inspired algorithms for sentiment analysis. In: Cluster Computing (2020) 5. Romero-Rodriguez, W.J.G., Baltazar, R., Zamudio, V., Casillas, M., Alaniz, A.: Comparative study of bio-inspired algorithms applied to illumination optimization in an ambient intelligent environment. Smart Innov. Syst. Technol. 148 (2020) 6. Shobana, S., Radhika, N.: Efficient cloudlet provisioning using bio-inspired hybrid algorithm in mobile cloud computing. J. Adv. Res. Dyn. Control Syst. 10(5), 1672–1678 (2018) 7. Aswanth, S.S., Gokulakannan, A., Sibi, C.S., Ramanathan, R.: Performance study of bioinspired approach to clustering in wireless sensor networks. In: Proceedings of 3rd International Conference on Trends in Electronics and Informatics (2019) 8. Harun Surej, I., Ramanathan, R.: A Performance study of bio-inspired algorithms in autonomous landing of unmanned aerial vehicle. In: Proceedings of Third International Conference on Computing and Network Communications (2019) 9. Holland, J.H.: Adaptation in Natural and Artificial System. MIT press, Cambridge, USA (1975) 10. Russell, E., James, K.: Particle swarm optimization. Proc. IEEE Int. Conf. Neural Netw. 4, 1942–1948 (1995)


11. Van Laarhoven, P.J.M., Aarts, E.H.L.: Simulated Annealing: Theory and Applications, pp. 7–15 (1987) 12. Chen, Q., Liu, B., Zhang, Q., Liang, J., Sugunathan, P., Qu, B.: Problem definitions and evaluation criteria for CEC 2015. In: Proceedings of Special Session on Bound Constrained Single-Objective Computationally Expensive Numerical Optimization (2015)

A Study on Optimization of Sparse and Dense Linear System Solver Over GF(2) on GPUs Prashant Verma and Kapil Sharma

Abstract There are various crypt-analytic techniques where solving a large dense or sparse system of linear equations over a finite field becomes a challenge due to high computation. For instance, problems like the NFS for factorization of large integers, cryptanalysis of symmetric ciphers, the discrete log problem, and algebraic attacks involve solving large sparse or dense linear systems over a finite field. Here, we consider the GF(2) finite field. Gaussian elimination is the popular and relevant method for solving large dense systems, while the Block Lanczos and Block Wiedemann algorithms are well known for solving large sparse systems. However, the time complexity of these popular methods makes them computationally demanding, and hence the concept of parallelism is compulsory for such methods. In addition, the availability of high-end parallel processors and accelerators such as general-purpose graphics processing units (GPGPUs) allows computationally intensive problems to be solved in reasonable time. The accelerators with thousands of cores available today exploit the memory bandwidth and take advantage of multi-level parallelism on multi-node and multi-GPU units. Here, we consider Nvidia GPUs like Kepler, Pascal, and Volta along with CUDA and MPI. Also, CUDA-aware MPI leverages GPU-Direct RDMA and P2P for inter- and intranode communication. Keywords Cryptography · GPGPUs · GPU-direct P2P · MIMD · RDMA

1 Introduction In today's world, the growth of digital information has increased rapidly; therefore, information security is imperative for the security requirements of the digital world. P. Verma (B) · K. Sharma Department of Information Technology, Delhi Technological University, New Delhi, Delhi, India e-mail: [email protected] K. Sharma e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_45


There are various crypt-analytic techniques where solving a large system of linear equations over a finite field becomes a challenge due to high computation. The system can be either dense or sparse depending on the algorithms and problems defined in cryptography. For instance, the integer factorization problem using the number field sieve (NFS) [1] algorithm, the discrete log problem, cryptanalysis of symmetric ciphers, and the algorithms used in algebraic attacks involve handling large systems of linear equations (either dense or sparse) over a finite field, i.e., GF(2). Gaussian elimination is the popular and relevant method to handle large dense systems. In order to achieve results in a short span of time, it is hard to define the hotspots for parallelism to a high extent. Additionally, such huge systems cannot fit into the memory of a single node; therefore, an effective solver based on the latest parallel hardware platforms and the Gaussian elimination approach should be available. Block Lanczos [2, 3] and Block Wiedemann [4–6] are the popular methods to solve such compute-intensive problems, but the time complexity of such systems is cubic, so they are computationally slow and practically not feasible. To solve compute-intensive problems in a reasonable amount of time, accelerator units such as general-purpose graphics processing units (GPGPUs) are employed. It is now very popular to create supercomputers in the form of clusters where each node hosts multiple GPUs. If we look at the top 500 supercomputer list [7], we find that most of them are GPU based. Thus, it is necessary to develop applications in a way that can be efficiently scaled over multiple GPGPUs and nodes. The original method for the Block Lanczos algorithm [8, 9] is roughly split into three steps: preprocessing, Lanczos iterations, and post-processing. At densities greater than 10%, Block Lanczos is quite costly in terms of performance. This paper describes the research work on optimizations carried out on an existing GPU-enabled code for the Gaussian elimination and Block Lanczos algorithms. The optimization exercise started with understanding and performance profiling of the existing methods. The next section gives the details of the literature review. Section 3 explains the parallel methodology of Gaussian elimination and Block Lanczos over GF(2) [3, 10]. The optimization on multiple hardware platforms is explained in Sect. 4. Section 5 shows the results on different hardware platforms for the performance and scalability of Gaussian elimination and Block Lanczos for dense and sparse systems over GF(2), and finally Sect. 6 concludes the paper.

2 Literature Review In order to solve dense and sparse systems of linear equations over GF(2), the methods that are available were originally implemented serially [11, 12]. Parallel implementations are also available [13], but they are not optimized for the latest hardware platforms and hence do not fully utilize the resources of existing technology. Nvidia has introduced a series of accelerator cards that allow researchers to parallelize their applications and solve bigger problems in a reasonable amount of time [14]. Figure 1 shows the architecture of an Nvidia GPU, in which grids, blocks, and threads are arranged.


Fig. 1 GPUs grid, block, and thread

For solving large systems of dense linear equations, Gaussian elimination is a prominent area for researchers, and research on optimizing its parallel version has received less focus. Koc and Arachchige [15] proposed a Gaussian elimination algorithm over the finite field GF(2) and implemented it on the geometric arithmetic parallel processor known as GAPP. Parkinson and Wunderlich [16] proposed parallel Gaussian elimination for the finite field GF(2), which was deployed on the parallel array processor named ICL-DAP. Bogdanov et al. [17] used hardware with a parallel architecture to solve Gaussian elimination over the finite field GF(2) quickly; this architecture was implemented on a field-programmable gate array (FPGA). In addition, the authors also evaluated a possible implementation based on an ASIC architecture. All these solutions can solve only small dense systems over the finite field GF(2) and are very costly, relying on special kinds of hardware platforms. Albrecht and Pernet [18] proposed a solution for dense systems of linear equations over the finite field GF(2). The solution uses multicore architectures, is very efficient, and is part of the Method of Four Russians (M4RI) library [19]. This solution shows performance results for 64 × 64 K linear systems of equations and shows that their method is comparable to the implementation by Allan Steel [20] for solving Gaussian elimination over the finite field GF(2) using the MAGMA library. Solving Gaussian elimination over GF(2) on general-purpose GPUs has not yet been the focus of work; this is the first work to solve Gaussian elimination over GF(2) on GPGPUs.


The challenge with a sparse matrix is to reduce the substantial memory requirements by storing only the nonzero elements. Depending on the sparsity factor, distinct data structures can be utilized to save an enormous amount of memory. Formats that store only nonzero elements can be divided into two main groups. The first group consists of those that support modification efficiently, for instance, dictionary of keys (DOK), list of lists (LOL), or coordinate list (COO); these are typically used for constructing the matrices. The second group consists of those that support efficient access and matrix operations, such as compressed sparse column (CSC) or compressed sparse row (CSR) [21, 22]. Figure 3 shows the storage representation of dense and sparse matrix formats. The systems of interest have on the order of hundreds of thousands of unknowns [23, 24]. Therefore, an efficient optimized Block Lanczos solver for large sparse systems should be available which can run on a multiple instruction multiple data (MIMD) architecture, shown in Fig. 2, a kind of cluster of multiple nodes, each consisting of one or multiple graphics processing units. This study shows how Gaussian elimination for dense systems and Block Lanczos for sparse systems leverage parallel hardware and scale efficiently over a MIMD architecture with hybrid technology [25].

Fig. 2 Multiple GPU devices across multiple nodes using MPI and CUDA

Fig. 3 Sparse matrix storage format representation
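To make the CSR discussion above concrete, a minimal CSR representation for a GF(2) matrix is sketched below. Because every stored entry equals 1 over GF(2), only the column indices and row pointers are kept, and a matrix–vector product reduces to XOR accumulation. This is an illustrative host-side sketch under these assumptions, not the authors' data structure.

// CSR storage for a sparse GF(2) matrix: no value array is needed because every
// stored entry is 1; y = A*v becomes an XOR accumulation per row.
#include <cstdint>
#include <vector>

struct CsrGF2 {
    int nRows = 0;
    std::vector<int> rowPtr;   // size nRows + 1
    std::vector<int> colIdx;   // column index of each nonzero
};

// v and y hold 64 stacked GF(2) vectors packed into one 64-bit word per entry,
// as is typical for blocked methods over GF(2).
void spmvGF2(const CsrGF2& A, const std::vector<uint64_t>& v,
             std::vector<uint64_t>& y) {
    y.assign(A.nRows, 0ULL);
    for (int r = 0; r < A.nRows; ++r)
        for (int k = A.rowPtr[r]; k < A.rowPtr[r + 1]; ++k)
            y[r] ^= v[A.colIdx[k]];          // GF(2) addition is XOR
}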


3 Research Methodology and Analysis Given a system of linear equations over GF(2), the task is to find the equations that are linearly dependent on others and remove them. Consider a system of equations where the number of equations equals the number of variables and is of order O(10^5) or higher.

4 Linear System Solver Over GF(2) for Dense The system is of the form A * x = B (mod 2), where the matrix A is dense (about 50% of its elements are nonzero) and the number of rows is greater than the number of columns. All arithmetic operations are over GF(2), which means that addition and multiplication are equivalent to logical XOR and logical AND, respectively. Gaussian elimination to solve a large dense system of equations has the following steps:

4.1 Generate Random Matrices
[1] Generate entries of A and x with a pseudo-random number generator
[2] Compute A * x = B (a sketch of this step is given after this list)
[3] Solve the linear system [A, B]
[4] Compare the computed solution of the linear system with the reference.
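A host-side sketch of step [2] with bit-packed rows is shown below. Packing 64 matrix entries per 64-bit word turns the GF(2) "multiply-and-add" into a word-wide AND followed by a parity (XOR-fold) of the result; the packing layout is an assumption for illustration.

// B = A * x over GF(2) with rows packed 64 columns per 64-bit word.
// Multiplication over GF(2) is AND; addition is XOR, so each output bit is the
// parity of the popcount of (row AND x).
#include <cstdint>
#include <vector>

uint64_t parity64(uint64_t w) {
    w ^= w >> 32; w ^= w >> 16; w ^= w >> 8;
    w ^= w >> 4;  w ^= w >> 2;  w ^= w >> 1;
    return w & 1ULL;
}

// A: nRows x nWords packed rows; x: nWords packed vector; B: one bit per row.
std::vector<uint8_t> matVecGF2(const std::vector<std::vector<uint64_t>>& A,
                               const std::vector<uint64_t>& x) {
    std::vector<uint8_t> B(A.size(), 0);
    for (std::size_t r = 0; r < A.size(); ++r) {
        uint64_t acc = 0;
        for (std::size_t w = 0; w < x.size(); ++w) acc ^= A[r][w] & x[w];
        B[r] = static_cast<uint8_t>(parity64(acc));   // parity of the folded AND words
    }
    return B;
}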

4.2 Generate Linear System Using LFSR
[1] The LFSR is initialized to a random input
[2] It is clocked multiple times to produce multiple bits of output
[3] The output bits are expressed as linear combinations of the initial condition (see the sketch after this list)
[4] With enough equations, a linear system of equations can be formed
    a. The initial condition of the LFSR is the unknown
[5] Solve the linear system
    a. Compare the computed initial condition with the reference
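The idea in step [3], that each LFSR output bit is a linear (XOR) combination of the unknown initial-state bits, can be captured by clocking the LFSR symbolically, tracking for every register cell the set of initial-state bits it depends on. The sketch below assumes a Fibonacci-style LFSR with a hypothetical tap mask and at most 64 state bits; it is an illustration, not the authors' generator.

// Build rows of a GF(2) linear system from LFSR output: clock the register
// symbolically, keeping for each cell a bitmask of the initial-state unknowns
// it depends on; each emitted output bit's mask is one equation row.
#include <cstdint>
#include <vector>

std::vector<uint64_t> lfsrEquationRows(int n, uint64_t taps, int nOutputs) {
    // cellDeps[i] = mask of unknowns x_0..x_{n-1} whose XOR equals cell i.
    std::vector<uint64_t> cellDeps(n);
    for (int i = 0; i < n; ++i) cellDeps[i] = 1ULL << i;      // cell i starts as x_i

    std::vector<uint64_t> rows;
    for (int t = 0; t < nOutputs; ++t) {
        rows.push_back(cellDeps[n - 1]);                      // output = last cell
        uint64_t feedback = 0;
        for (int i = 0; i < n; ++i)
            if (taps & (1ULL << i)) feedback ^= cellDeps[i];  // XOR of tapped cells
        for (int i = n - 1; i > 0; --i) cellDeps[i] = cellDeps[i - 1];  // shift
        cellDeps[0] = feedback;
    }
    return rows;   // pair each row with the observed output bit to form [A, B]
}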

4.3 Single and Multi-GPU Gaussian Elimination
[1] The matrix is split into parts row-wise
[2] Each GPU owns exactly one part
[3] All processing (3 kernels) on a part is done by the owner GPU
[4] All operations are done in parallel by the GPUs
[5] Consensus about the pivot is achieved after the findPivot operation (a simplified elimination kernel is sketched after this list).
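One of the kernels in such a pipeline is the row-reduction (elimination) step. A simplified single-GPU CUDA sketch is shown below: each thread owns one row below the pivot and, if that row has a 1 in the pivot column, XORs the packed pivot row into it. The multi-GPU partitioning, pivot consensus, and transposed storage of Sect. 4.4 are omitted; this is an illustration, not the authors' kernel, and the bit-packing layout is an assumption.

// Eliminate the pivot column from all rows below the pivot (GF(2) row reduction).
// Rows are bit-packed, 64 columns per 64-bit word; XOR is the GF(2) row addition.
#include <cstdint>

__global__ void eliminateBelow(uint64_t* mat, int nRows, int nWords,
                               int pivotRow, int pivotCol) {
    int row = pivotRow + 1 + blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nRows) return;

    int word = pivotCol >> 6;                      // word holding the pivot column
    uint64_t bit = 1ULL << (pivotCol & 63);
    if (mat[(size_t)row * nWords + word] & bit) {  // row has a 1 in the pivot column
        // Columns left of the pivot are already zero in rows below, so start here.
        for (int w = word; w < nWords; ++w)
            mat[(size_t)row * nWords + w] ^= mat[(size_t)pivotRow * nWords + w];
    }
}

// Host-side launch for one pivot step (illustrative):
//   int threads = 256, blocks = (nRows - pivotRow - 1 + threads - 1) / threads;
//   eliminateBelow<<<blocks, threads>>>(dMat, nRows, nWords, pivotRow, pivotCol);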

4.4 Optimization
[1] Performance is heavily influenced by the memory access pattern
[2] How should A be stored?
[3] Finding the pivot prefers column-major storage; extracting the pivot row prefers row-major
[4] One coalesced and one strided memory access is unavoidable
[5] Row reduction works better with column-major storage
[6] Store the transpose of A instead of A to improve the memory access pattern.

5 Block Lanczos Solver Over Finite Field or GF(2) for Sparse The initial method for the Block Lanczos algorithm is roughly split into three steps: preprocessing, Lanczos iterations, and post-processing, shown in Fig. 4. In the preprocessing step, operations such as memory allocation, initialization, and loading of the linear system data are done. The Lanczos step involves the iterative part of the code that computes the solution, and finally, in the post-processing step, the solution is written to a file. The optimization work that has been explored is as follows.
Fig. 4 Handling steps in Block Lanczos algorithm


5.1 Better Test Data Generation The method requires sparse linear systems as input for benchmarking the performance. A new data generating module should be present, which is faster and can generate arbitrary relations between columns of the matrix.

5.2 Optimization of SpMV and SpMTV Operations The Lanczos step involves repeated calls to two GPU kernels, the sparse matrix–vector multiplication (SpMV) and the sparse matrix transpose–vector multiplication (SpMTV). The high percentage share of these two kernels makes them the primary candidates for optimization. The performance of both kernels is improved with the following techniques. The SpMV and the SpMTV are both matrix–vector multiplications. A matrix–vector multiplication is composed of multiple dot products. Multiple dot products can be executed in parallel, and a warp (a vector of 32 threads) is dedicated to computing one dot product. The dot product operation involves two steps: first, pointwise multiplication, and second, adding all multiplication results together. The pointwise multiplication can be done in parallel by each thread of the warp. However, adding the multiplication results together is a reduction operation, and thus the threads need to cooperate. The Kepler architecture introduced four shuffle instructions: __shfl(), __shfl_down(), __shfl_up(), and __shfl_xor(). Figure 5 shows the shuffle-down operation on 8 threads. Shuffle instructions allow faster cooperation between threads of the same warp; effectively, threads can read registers of other threads in the same warp. The reduction operation in the new version of SpMV is implemented using shuffle instructions. The shuffle-based reduction performs better than even the shared-memory atomics-based implementation. This modification leads to better work distribution among the threads of a warp and reduces warp divergence significantly. The warp-level approach also results in more coalesced memory access.
Fig. 5 Warp shuffle instruction
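A CUDA sketch of a warp-per-row SpMV with a shuffle-based XOR reduction is given below. It follows the idea described above but uses the modern __shfl_down_sync() form of the intrinsic; the CSR layout and the 64-bit packed block vectors are assumptions for illustration, not the authors' exact kernel.

// Warp-per-row SpMV over GF(2): each of the 32 lanes XOR-accumulates v[col] for a
// subset of the row's nonzeros, then the partial results are combined with a
// shuffle-based XOR reduction and lane 0 writes the row result.
#include <cstdint>

__global__ void spmvGF2Warp(const int* __restrict__ rowPtr,
                            const int* __restrict__ colIdx,
                            const uint64_t* __restrict__ v,
                            uint64_t* __restrict__ y, int nRows) {
    int lane = threadIdx.x & 31;
    int row  = (blockIdx.x * blockDim.x + threadIdx.x) >> 5;   // one warp per row
    if (row >= nRows) return;

    uint64_t acc = 0;
    for (int k = rowPtr[row] + lane; k < rowPtr[row + 1]; k += 32)
        acc ^= v[colIdx[k]];                 // GF(2) addition of packed block words

    // Warp-level XOR reduction using shuffle instructions (no shared memory).
    for (int offset = 16; offset > 0; offset >>= 1)
        acc ^= __shfl_down_sync(0xffffffffu, acc, offset);

    if (lane == 0) y[row] = acc;
}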


6 Conclusions This paper presents a study on the optimization of a scalable solution for solving large sparse and dense systems of linear equations over the finite (Galois) field for binary, i.e., GF(2). These solvers are utilized as a library for various cryptography and cryptanalysis applications like the integer factorization problem using NFS, cryptanalysis of ciphers, DLP, algebraic attacks, etc. The research work explored CUDA and MPI to leverage the multi-level parallelism available in multi-socket, multi-GPU systems. Many optimization techniques for solving large dense and sparse systems are discussed, highlighting the capabilities of the device kernels and the excellent scalability on multi-GPU architectures. At higher densities (>10%), Block Lanczos is quite costly in terms of performance; for such cases, even a dense solver such as Gaussian elimination can be tried. The SpMV and SpMTV are essentially matrix–vector operations: in SpMV the matrix is in the normal format, while in SpMTV the matrix is in the transposed format. This difference leads to a large change in the performance of the code; the transpose multiply is 3–4x slower than the normal multiply. The overhead of this approach (keeping an explicitly transposed copy) is, in terms of execution time, the time needed for transposing the matrix and, in terms of memory, doubling the storage space of the matrix. Future research is to explore hotspots in a program which are massively parallel and offload them to the GPGPUs. We also focus on the out-of-memory case, where the system of linear equations exceeds the memory space of an individual GPU.

References 1. Wang, Q., Fan, X., Zang, H., Wang, Y.: The space complexity analysis in the general NFS integer factorization. Theor. Comput. Sci. 630, 76–94, (2016). ISSN: 0304–3975, https://doi. org/10.1016/j.tcs.2016.03.028 2. Sengupta, B., Das, A.: Use of SIMD-based data parallelism to speed up sieving in integerfactoring algorithms. IACR Cryptol. 44 (2015) 3. Intel Corp.: Technical Report. https://en.wikipedia.org/wiki/Lanczos algorithm (2009) 4. Giorgi, P., Lebreton, R.: Online order basis algorithm and its impact on the block Wiedemann algorithm. In: Proceedings of 39th International Symposium on Symbolic and Algebraic Computation (ISSAC’14), pp. 202–209. ACM (2014) 5. Huang, A.G.: Parallel Block Wiedemann-Based GNFS Algorithm for Integer Factorization. Master thesis, St. Francis Xavier University, Canada (2010) 6. Zhou, T., Jiang, J.: Performance modeling of hyper-scale custom machine for the principal steps in block Wiedemann algorithm. J. Supercomput. 1–23 (2016) 7. Top 500 list—Nov 2017. https://www.top500.org/list/2019/11/ 8. Summit: Oak Ridge National Laboratory’s Next High-Performance Supercomputer. https:// www.olcf.ornl.gov/olcfresources/computesystems/summit 9. Flesch, I.: A new parallel approach to the Block Lanczos algorithm for finding null spaces over GF (2). Master thesis, Utrecht University, The Netherlands (2006) 10. Thomé, E.: A Modified Block Lanczos Algorithm with Fewer Vectors. arXiv:1604.02277 11. Yang, L.T., Huang, Y., Feng, J., Pan, Q., Zhu, C.: An improved parallel block Lanczos algorithm over GF (2) for integer factorization. Inf. Sci. 379, 257–273 (2017). ISSN 0020-0255, https:// doi.org/10.1016/j.ins.2016.09.052


12. Xu, T.L.: Block Lanczos-Based Parallel GNFS Algorithm for Integer Factorization. Master thesis, St. Francis Xavier University, Canada (2007) 13. Yang, L.T., Xu, L., Yeo, S.S., Hussain, S.: An integrated parallel GNFS algorithm for integer factorization based on Linbox Montgomery block Lanczos method over GF (2). Comput. Math. Appl. 60(2), 338–346 (2010) 14. Reaño, C., Silla, F.: Performance evaluation of the NVIDIA pascal GPU architecture. In: 2016 IEEE 18th International Conference on High Performance Computing and Communications, pp. 1234–1235. Sydney, NSW (2016). https://doi.org/10.1109/HPCCSmartCity-DSS. 2016.0173 15. Koc, K., Arachchige, S.N.: A fast algorithm for gaussian elimination over GF (2) and its implementation on the GAPP. J. Parallel Distrib. Comput. 13(1), 118–122 (1991) 16. Parkinson, D., Wunderlich, M.: A compact algorithm for gaussian elimination over GF (2) implemented on highly parallel computers. Parallel Comput. 1(1), 65–73 (1984) 17. Bogdanov, A., Mertens, M.C., Paar, C., Pelzl, J., Rupp, A.: A parallel hardware architecture for fast gaussian elimination over GF (2). In: 14th IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 237–248 (2006) 18. Albrecht, M.R., Bard, G.V., Pernet, C.: Efficient dense gaussian elimination over the finite field with two elements. CoRR, abs/1111.6549, 2011 19. M4ri library. https://github.com/malb/m4ri 20. Bosma, W., Cannon, J., Playoust, C.: The magma algebra system I: the user language. J. Symbol. Comput. 24(3–4), 235–265 (1997) 21. Buluç, A., Fineman, J.T., Frigo, M., Gilbert, J.R., Leiserson, C.E.: Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks. In: Proceedings of the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures (SPAA ’09). Association for Computing Machinery, New York, NY, USA, pp. 233–244. https://doi. org/10.1145/1583991.1584053 22. Vastenhouw, B., Bisseling, R.H.: A two-dimensional data distribution method for parallel sparse matrix-vector multiplication. SIAM Rev. 47(1), 67–95 (2004) 23. Zamarashkin, N.L., Zheltkov, D.A.: GPU based acceleration of parallel block Lancoz solver. Lobachevskii J. Math. 39(4), 596–602 (2018) 24. GPU acceleration of dense matrix and block operations for lanczos method for systems over large prime finite field. Supercomput. RuSCDays Ser. Commun. Comput. Inf. Sci. 793, 14–26 (2017) 25. Gupta, I., Verma, P., Deshpande, V., Vydyanathan, N., Sharma, B.: GPU-accelerated scalable solver for large linear systems over finite fields. In: 2018 Fifth International Conference on Parallel, Distributed and Grid Computing (PDGC), Solan Himachal Pradesh, India, pp. 324–329 (2018). https://doi.org/10.1109/PDGC.2018.8745743

Intracranial Hemorrhage Detection Using Deep Convolutional Neural Network K. Thirunavukkarasu, Anmol Gupta, Satheesh Abimannan, and Shahnawaz Khan

Abstract A brain hemorrhage is a serious medical emergency involving bleeding inside the cranium. Intracerebral hemorrhage leads to severe neurological symptoms on one side of the body, such as loss of consciousness, numbness, or paralysis, and often needs swift and intensive therapy. Specialists review the patient's cranial medical images to locate the intracranial bleeding; this is a complex and often time-consuming process. This research presents a convolutional neural network approach for automatic brain hemorrhage detection from computed tomography scans. Convolutional neural networks are a powerful image-recognition technique. This research evaluates a deep neural network optimized for the detection and quantification of intraparenchymal, subdural/epidural, and subarachnoid hemorrhage on CT scans. The dataset used for this research includes 180 GB of 3D head CT studies (more than 1.5 million 2D images). All provided images are in the DICOM format used for medical imaging. Keywords Intracranial hemorrhage detection · Deep convolutional neural network

1 Introduction Intracranial hemorrhage (ICH) is classified as a debilitating illness [1]. It is one of the leading causes of death and injury and can cause a stroke. Intracranial hemorrhage is identified as bleeding inside the skull. Traumatic brain injury (TBI) is among the leading causes of death and disability in the USA, representing nearly 30% of all injury-related deaths in 2013. There is a high risk of TBI transforming into a secondary brain injury that can lead to insensitivity. If it remains untreated, it may lead to death, and it is considered clinically critical.
K. Thirunavukkarasu (B) · A. Gupta · S. Abimannan School of Computer Science and Engineering, Galgotias University, Greater Noida, India e-mail: [email protected]
S. Khan Department of Information Technology, University College of Bahrain, Saar, Bahrain
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_46


Fig. 1 Types of hemorrhage

Intracranial hemorrhage is divided into five subtypes (Fig. 1) based on its location in the brain: intraparenchymal (IPH), intraventricular (IVH), subdural (SDH), subarachnoid (SAH), and epidural (EDH). Intracranial hemorrhage that occurs within brain tissue is often called intracerebral hemorrhage. Detecting hemorrhages, even mild ones, is difficult; specially trained radiologists need a high degree of concentration for the study. Computed tomography (CT) of the head [2] is the workhorse medical imaging method adopted worldwide to diagnose neurological accidents. The CT image quality and quick acquisition time make it a more suitable diagnostic method for the primary evaluation of intracranial hemorrhage than magnetic resonance imaging. A CT scan produces a series of images; it uses X-ray beams to capture brain tissues with varying intensities based on the magnitude of X-ray absorption in the tissue. CT images are displayed by means of a windowing system: the Hounsfield unit (HU) numbers are converted into grayscale values [0, 255] according to the window width and window level parameters of the windowing method. CT grayscale images, however, are constrained by poor signal-to-noise ratio, low contrast, and a large proportion of artifacts. One particular challenge is to recognize miniature, subtle anomalies in a massive 3D volume with near-perfect sensitivity. Deep convolutional neural networks are a spectacular branch of machine learning for visual tasks that has gained considerable focus in recent years due to its striking success in various computer vision applications, such as object recognition, detection, and segmentation. Deep CNNs have achieved significant results [3, 4].
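As a concrete illustration of the windowing step described above, the sketch below converts raw DICOM pixel values to Hounsfield units and then maps a chosen window to [0, 255] grayscale; the specific window level and width shown (a typical brain window) are illustrative assumptions rather than values taken from this paper.

```python
import numpy as np

def window_image(pixels, slope, intercept, level=40, width=80):
    """Map raw CT pixel values to a [0, 255] grayscale image.

    pixels: raw pixel array from the DICOM file
    slope, intercept: RescaleSlope / RescaleIntercept from the DICOM header
    level, width: window level and window width in Hounsfield units (HU)
    """
    hu = pixels * slope + intercept            # convert raw values to HU
    lo, hi = level - width / 2, level + width / 2
    hu = np.clip(hu, lo, hi)                   # keep only the chosen window
    gray = (hu - lo) / (hi - lo) * 255.0       # rescale the window to [0, 255]
    return gray.astype(np.uint8)

# Example with synthetic data standing in for one CT slice.
raw = np.random.randint(0, 3000, size=(512, 512))
img = window_image(raw, slope=1.0, intercept=-1024.0)
```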


Convolutional neural networks discover the associations between the input images' pixels by extracting characteristic features through pooling and convolution. The features discovered using the learned kernels at each layer vary in complexity [5]: the first layers extract basic features such as edges, and later layers extract more complex, high-level features. The convolution operation in CNNs is supported by three main properties: (i) a weight-sharing mechanism to handle 2D images or 3D data such as volumetric images and videos [6, 7], (ii) local connectivity that exploits the input topology through 2D or 3D kernels, and (iii) slight shift-invariance generated by the pooling layer [8]. Recently proposed very deep CNN architectures replace the traditional convolutional layer with more robustly represented modules while using limited computational resources [9, 10]. Szegedy et al. [11] introduced inception modules that could extract multi-scale features from the input feature maps and effectively decrease the number of parameters. The inception module was developed with a more uniform, simplified architecture compared to past versions, and the performance achieved for the large-scale image classification task was therefore of top class. For years, traditional supervised machine-learning approaches [12, 13] were developed using well-engineered algorithms and programmed training techniques. The method consisted of taking the raw data, describing its content with low-dimensional feature vectors (using detailed prior knowledge of the problem at hand), and feeding the vectors into a trainable classifier. Although the classifier was useful for other purposes, the features were not generic in principle, and the precision of the method depended on how well the heuristics were designed. For array-like data such as images or video sequences, the most important type of deep neural network is the convolutional neural network. From a top-level standpoint, the concept behind a CNN is to identify the compositional hierarchy of features that objects in real-life scenes exhibit [14]. The next section of this research describes the materials and methods used in the detection of intracranial hemorrhage, including the detailed architecture, algorithm, and implementation procedure of the proposed method. Section 3 presents the results along with a discussion comparing the proposed system with existing systems. Finally, Sect. 4 concludes this research and proposes directions for future research.

2 Materials and Methods The paper proposes a method using deep CNN and weighted multilabel focal loss for the classification of intracranial hemorrhages into epidural hemorrhage, intraparenchymal hemorrhage, intraventricular hemorrhage, subarachnoid [15] hemorrhage, and subdural hemorrhage.



Fig. 2 Target class distribution of the training data

Dataset The paper uses the RSNA intracranial hemorrhage dataset [16] for the analysis of intracranial hemorrhage. The dataset contains 4,516,818 DICOM-format images of five different types of intracranial hemorrhage, together with associated metadata, labelled with the help of 60 volunteers. Figure 2 shows the distribution of the training data. Since a standard training and validation split was not provided for the dataset, we split it 70:30. Image augmentation techniques such as rotation, zoom, scale, and translation were applied before splitting the dataset. Adversarial cross-validation was also performed while evaluating model performance to avoid any data leakage.

Proposed Deep CNN Architecture A deep CNN with regularizing layers [17] such as max pooling and dropout is used to obtain embedded features of the CT scans, which are then classified into four different classes using a fully convolutional neural network. The architecture of the model with its corresponding input and output shapes is shown in Fig. 3. Since the total number of trainable parameters is 5,147,716, which could result in overfitting during training, batch normalization, max pooling, and dropout layers are applied to the model, which increases the generalizability of the proposed model.

Loss Function Our dataset is highly imbalanced, with very few images of epidural hemorrhage while the other forms of hemorrhage have roughly the same distribution. Because of this imbalance, loss functions such as categorical cross-entropy did not reach the global minimum, so a weighted-class approach and a weighted multilabel focal loss were used to address the problem; the weighted multilabel focal loss handled the class imbalance problem (Table 1) very well. The weighted multilabel focal loss used in our methodology is given as

L = \frac{1}{N}\sum_{n=1}^{N}\sum_{m=1}^{M} w_m\,[a + b]    (1)

where a = (1 - \alpha)\,(1 - y_{n,m})^{\gamma}\, t_{n,m}\, \ln(y_{n,m}) and b = \alpha\, y_{n,m}^{\gamma}\,(1 - t_{n,m})\, \ln(1 - y_{n,m}).


Fig. 3 Architecture of the proposed method with layer type and output shape
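Figure 3 itself is not reproduced here, so the following Keras-style sketch illustrates the kind of regularized CNN described above: convolution blocks with batch normalization, max pooling, and dropout, feeding a dense head with sigmoid outputs for multilabel prediction. The layer sizes, counts, and input shape are illustrative assumptions and do not reproduce the authors' 5,147,716-parameter model.

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(224, 224, 1), n_classes=5):
    """Illustrative regularized CNN for multilabel hemorrhage classification."""
    m = models.Sequential()
    m.add(layers.Input(shape=input_shape))
    for filters in (32, 64, 128):
        m.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        m.add(layers.BatchNormalization())   # regularization, as described in the paper
        m.add(layers.MaxPooling2D(2))
        m.add(layers.Dropout(0.25))
    m.add(layers.GlobalAveragePooling2D())
    m.add(layers.Dense(256, activation="relu"))
    m.add(layers.Dropout(0.5))
    # Sigmoid outputs: each hemorrhage subtype is an independent label.
    m.add(layers.Dense(n_classes, activation="sigmoid"))
    return m

model = build_model(n_classes=5)   # five hemorrhage subtypes in the dataset
# Stand-in loss for the sketch; the paper uses a weighted multilabel focal loss.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```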


Table 1 Results comparison

MT          REG    AUG    Objective function    Log loss
CNN         No     No     CCE                   0.6923
Deep CNN    Yes    Yes    CCE                   0.8144
Deep CNN    Yes    Yes    CCE + WC              0.8529
Deep CNN    Yes    Yes    WFL                   0.9721

MT model type; REG regularization; AUG augmentation; CCE categorical cross entropy; WC weighted classes; WFL weighted focal loss

L = \frac{1}{N}\sum_{n=1}^{N}\sum_{m=1}^{M} w_m\,\left[c \cdot \ln(y_{n,m,t})\right]    (2)

where c = (1 - \alpha_t)\,(1 - y_{n,m})^{\gamma}, w is the class weight, \alpha is the weighing factor, and \gamma is the focusing parameter, which is tuned in the range [0, 5]. It was observed that, on moving from \gamma = 0 to \gamma = 5, the evaluated loss [18] had higher contributions from the wrongly classified imbalanced classes.
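To make the loss concrete, a minimal NumPy sketch of a weighted multilabel focal loss in the spirit of Eq. (1) is given below. It follows the common focal-loss sign convention (the loss is the negative of the weighted log terms); the default values of α, γ, and the class weights are assumptions for illustration, not values reported in the paper.

```python
import numpy as np

def weighted_multilabel_focal_loss(y_pred, y_true, class_weights,
                                   alpha=0.25, gamma=2.0, eps=1e-7):
    """Illustrative weighted multilabel focal loss.

    y_pred: (N, M) predicted probabilities
    y_true: (N, M) binary targets t_{n,m}
    class_weights: (M,) per-class weights w_m
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Focal terms: the (1 - p)^gamma factor down-weights well-classified examples.
    pos = alpha * (1.0 - y_pred) ** gamma * y_true * np.log(y_pred)
    neg = (1.0 - alpha) * y_pred ** gamma * (1.0 - y_true) * np.log(1.0 - y_pred)
    per_example = -(pos + neg) * class_weights      # apply per-class weights w_m
    return per_example.sum(axis=1).mean()           # average over the N examples

# Toy usage with assumed shapes and weights.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(8, 5)).astype(float)
y_pred = rng.random(size=(8, 5))
w = np.array([4.0, 1.0, 1.0, 1.0, 1.0])   # e.g. up-weight the rare epidural class
print(weighted_multilabel_focal_loss(y_pred, y_true, w))
```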

3 Result and Discussion After careful experimentation with different methodologies, and evaluation using the log-loss metric, we found that our proposed method with a deep CNN architecture and weighted focal loss [19] performs very well and achieves 97% accuracy. In comparison with the other methodologies, the key takeaway is that regularization techniques and proper augmentation were the key methods that helped in achieving the top accuracy. Figure 4 shows the training and validation accuracy of the proposed model trained for 40 epochs, while Fig. 5 shows its training and validation losses.

4 Conclusion and Future Work Although the number of hemorrhages in the test sets is low, especially if broken down by type, the findings provide important insights. Intraparenchymal hemorrhages were recognised at the highest rate; typically, they were hyperattenuating and enclosed by normal tissue. Epidural hemorrhage was evident straight away.


Fig. 4 Training and validation accuracy of the proposed model

Fig. 5 Training and validation loss of the proposed model

Missed subdural hemorrhage was primarily hypoattenuating relative to healthy tissue. Subarachnoid hemorrhages are difficult to detect: they are typically thin, with blood filling the sulci, which are cortex fissures. We use a deep convolutional neural network to detect brain hemorrhage. The proposed method, with a deep CNN architecture and weighted focal loss, is 97% accurate. This motivates future research on the identification of multiple pathologies from brain CT scans. The proposed solution should not be misconstrued as a credible replacement for real radiologists in the field. In short, the suggested deep CNN system demonstrates the ability to be used as a tool for emergency exams. Still, the method has been tested on a limited test set and is subject to further experimentation before real-world implementation.


References 1. Mandybur, T.I.: Intracranial hemorrhage caused by metastatic tumors. Neurology 27(7), 650– 650 (1977) 2. https://www.pnas.org/content/116/45/22737 3. Rao, A.A., Patel, M.D.: Deep 3D convolution neural network for CT brain hemorrhage classification. In: Proc. SPIE 10575, Medical Imaging 2018: Computer-Aided Diagnosis, 105751C (27 Feb 2018). https://doi.org/10.1117/12.2293725 4. Khan, S.N., Usman, I.: Amodel for english to urdu and hindi machine translation system using translation rules and artificial neural network. Int. Arab J. Inf. Technol. 16(1), 125–131 (2019) 5. https://stats.stackexchange.com/questions/362988/in-cnn-do-we-have-learn-kernel-values-atevery-convolution-layer 6. https://datascience.stackexchange.com/questions/26755/cnn-how-does-backpropagationwith-weight-sharing-work-exactly 7. Bashir, T., Usman, I., Khan, S., Rehman, J.U.: Intelligent reorganized discrete cosine transform for reduced reference image quality assessment. Turkish J. Electr. Eng. Comput. Sci. 25(4), 2660–2673 (2017) 8. https://stats.stackexchange.com/questions/121703/what-does-shift-invariant-mean-in-convol utional-neural-network 9. Khan, A., Sohail, A., Zahoora, M.M.E., Qureshi, A.S.: A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. (2019). https://doi.org/10.1007/s10462020-09825-6 10. Shahnawaz, Mishra, R.B.: An English to Urdu translation model based on CBR, ANN and translation rules. Int. J. Adv. Intell. Paradig. 7(1), 1–23 (2015) 11. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9 12. Kotsiantis, S.B.K.: Supervised machine learning: a review of classification techniques. Informatica 31, 249–268 (2007) 13. Khan, S., Kannapiran, T.: Indexing issues in spatial big data management. In: International Conference on Advances in Engineering Science Management and Technology (ICAESMT)2019, Uttaranchal University, Dehradun, India (Mar, 2019) 14. L’azaro-Gredilla, M., Liu, Y., Phoenix, D.S., George, D.: Hierarchical compositional feature learning. arXiv:1611.02252 [Online], https://arxiv.org/pdf/1611.02252.pdf 15. Thorgood, M., Adam, S.A., Nlann, J.: Fatal subarachnoid haemorrhage in young women: role of oral contraceptives. Brit. Med. J. 283, 762 (1981) 16. https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data 17. https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/ 18. https://machinelearningmastery.com/loss-and-loss-functions-for-training-deep-learning-neu ral-networks/ 19. Multi-class classification with focal loss for imbalanced datasets [Online]. https://www.dlo logy.com/blog/multi-class-classification-with-focal-loss-for-imbalanced-datasets/

A Multi-factor Approach for Cloud Security Francis K. Mupila and Himanshu Gupta

Abstract Cloud computing is known for its complexity regarding different models of deployment and services that it offers. However, security remains a massive hindrance in its development. Hence, a multi-factor approach to secure the cloud environment proposed by this paper relies on authentication and auditing as the fundamental elements for sustaining the privacy of information in the cloud environment. These are needful assets to counter various threats and attacks on the cloud service provider as well as at the user end. This paper proposes a multi-factor approach through which a user’s identity is verified securely; as well as a means to build trust between the client and the cloud service provider by allowing proper visibility of the user’s activities. Keywords Authentication · Auditing · Cloud computing · Cloud security · User’s trust

1 Introduction Security in cloud computing has been a broad concern, since cloud computing works in terms of the services provided to users; therefore, its security should be applied proportionately. Following the architecture of cloud computing, namely SaaS, PaaS, and IaaS, each level requires particular attention to its security, as the levels do not face the same threats. Moreover, the cloud service provider should dedicate and implement the appropriate security role for the needed application or resources so as not to slow its performance. The security of the cloud environment aims to ensure
F. K. Mupila (B) · H. Gupta Amity University, sector 125, Noida, Uttar Pradesh, India e-mail: [email protected]
H. Gupta e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_47


the user's trust in their data and to prevent vulnerabilities from being exploited. It also aims to prevent threats that cause errors in the infrastructure and to reduce the likelihood of attacks occurring. Cloud computing is a virtual concept by nature. It has its complexities, arising from services and techniques running in the background, such as its elasticity, scalability, and ubiquity; therefore, it becomes hard to secure and easy to breach. Data protection encompasses a wide variety of laws, technologies, and policies to protect data, applications, and the ever-expanding cloud computing infrastructure. The cloud security alliance (CSA) published a detailed study on the top 12 information security risks [1], which are listed below.

• Data breaches
• Insufficient Identity, Credential, and Access Management
• Vulnerable IPS and APIs
• Vulnerabilities of the system
• Account hijacking
• Malicious insiders
• Advanced persistent threats
• Data loss
• Inadequate due diligence
• Abuse and nefarious use of cloud services
• Denial of services
• Shared technology vulnerabilities

There are various approaches to address these threats, such as strong end-to-end encryption, better isolation of resources, strong authentication, and monitoring and auditing, among others. In this regard, this paper proposes a multi-factor approach whose focus is on authentication, auditing, and monitoring, composed of steps between the cloud service provider and the end-user to manage trust and the integrity of the user's data. By definition, authentication and authorization are two different terms, but they have a similar implementation as well. Authentication is performed to verify the user before accessing any resources, while authorization focuses on providing the right access to the right user. Multi-factor authentication (MFA) is defined as an authentication scheme where a computer user is granted access only after successfully presenting two or more pieces of evidence (or factors) to an authentication mechanism, for example, knowledge (something only the user knows), possession (something only the user has), and inherence (something only the user is) [2]. The multi-factor authentication approach preserves the confidentiality and integrity of the users. It deploys multiple means of authentication, which guards against attacks such as the man-in-the-cloud attack, the man-in-the-middle attack, phishing attacks, and so on, which may result in the modification of data. The end user's concerns are about the storage and location of their data, as they have no physical access to the data centre; this creates a significant trust problem with the service provider to which they are subscribed. One of the schemes to build trust between the user and the cloud service provider is by


implementing a reliable authentication system which guarantees the data's integrity and offers proper visibility to the user over their data. Auditing and monitoring the cloud-based application is an essential feature, as it helps to recognise any suspicious activity in the network while tracking and analysing the infrastructure. As mentioned above, the overall idea of this paper is composed of six steps, which emphasise authentication, auditing, and monitoring; these are intended to strengthen the trust of the cloud consumer and provide a new way to secure the cloud computing environment.

2 Related Work In view of my proposed work, there are findings of other researchers whose works have been highlighted regarding the security of cloud computing. Ganesh V. Gujar proposed STEP-2 user authentication in which a dynamic token from the hash table is sent to the user’s email ID; then, the token value is required for the step-2 authentication at the user interface. An additional feature for the session management in which the dynamic token generated from hash table will remain valid up to a particular session only. Once the user logs out from the cloud environment, the token expires [3]. Prachi Soni proposed a multi-factor authentication security framework in cloud computing. Here, an elliptic curve point algorithm is implemented, executing ten steps to assure authentication and authorization. However, data confidentiality, integrity, and then control access are based on attribute certificate. This technique is used here to give the power and control to the client by using the combination of cryptography. This also gives access control to keep the data safe from vulnerabilities [4]. Sabout Nagaraju has suggested SecAuthn: Secure authentication of multi-factor authentication for the cloud computing environments. Four steps, notably key credential, username and password, bio-metric fingerprint, and OTP have been used as the multi-factor’s aspects. Then, station-to-station Diffie-Hellman key exchange is used to prepare, encrypt and share one-time session keys, which is followed by the hashed credentials checked in the authentication servers using only the original credentials. The authentication scheme proposed offers true security to the cloud user’s credentials with the aid of GNY logic [5]. Kashif Munir, stated that an in-depth security strategy must be enforced to protect against threats to the integrity and safety of their applications and data. This security line includes firewalls, identification and prevention of intruders, reputation management, log review, and malware protection. Such protection can be used by prudent organisations and service providers in their cloud infrastructure to provide data protection so they can have a leverage on cloud computing before their competitors [6].


Despite a number of efforts to address this problem, issues such as identification, privacy, personalization, integration, protection, and scalability remain major barriers to cloud adoption. These are only a few of the related works, but they show that the security of cloud computing remains an important concern.

3 Proposed Model This work aims to reinforce the security of cloud computing. Considering that a single factor is not enough to secure a cloud-based environment, it is proposed that a multi-factor approach can provide more security to the environment and establish trust between the user and the cloud service provider. The proposed work offers a secure environment, presenting a technique to keep track of all activity and a reliable authentication process that gives power to the client so that the latter has control over their activities and data. These processes verify the user's credentials and monitor their access control over the resources. As previously mentioned, authentication is crucial for information privacy; therefore, this work has been divided such that authentication is used to mitigate various cyber-attacks on cloud computing, while monitoring the user's activities and their access control forms the second part of this work, where the focus is on granting the user visibility over their log records and the ability to track their activities. This follows the six steps of the proposed framework, in which security responsibilities are shared between the client and the cloud service provider in order to avoid a lengthy process on only one side of the communication. Consequently, one of the most important difficulties in integrating cloud-based security is to ensure unified access and accountability across the various domains; a mixture of public, private, and even hybrid cloud-based services makes the integration of security services a key task, despite several established networks [7]. The lack of visibility creates gaps in the overall safety of an organisation's network, making it difficult to see attacks. In the old network architecture, all structures inside the wall of an organisation, that is, under the control of the organisation, did not pose an essential challenge to maximum visibility in the network. However, when the cloud is adopted, some control is lost and consequently full visibility is no longer achievable. Visibility is the main takeaway, since devices that you cannot see cannot be secured [8].

3.1 Overview of the Proposed Approach The concerns of this work apply to the cloud service provider side as well as the user side. Of the six steps which make up this process, steps one and four are executed at the client side, whereas steps two, three, five, and six are executed at the service provider side.


Fig. 1 Brief description of interaction to the cloud infrastructure

A private cloud is known to be more secure, since all data are within the boundary (firewall) and are not available to the general public. In a public cloud, meanwhile, the subscriber or client does not know the structure of the data centre, in particular which server processes the data, how the network is implemented, or how secure the environment is. As a matter of fact, there is an urgent need to focus more on preventing breaches of confidence than on post-service lack of accountability, so as to diminish the concerns that hinder its progress and to fully benefit from the unprecedented advantages that cloud computing has to offer. An effective, standardised trust management system is required for individuals and organisations to adequately utilise the potential benefits of cloud computing technology [9]. Figure 1 illustrates the basic interaction between the client and the cloud service provider; authentication is the first operation to take place in order to verify the user's credentials.

3.2 Detailed Description and Working Principle
(1) Step 1 The first step of the proposed model concerns a standard login process, which means the user has to enter their credentials to access the requested resource in the cloud. However, to mitigate identity theft, a unique verification method needs to be set up to confirm the email address the user uses to register for the cloud service, and some other parameters to be used in further steps are established.
(2) Step 2 After the user has gained access to the cloud services, it is the responsibility of the cloud service provider to monitor all the activities performed by the user. The first task of the cloud service provider is to send a login report to the client through the email address given at registration time, containing details of the login such as:
• Location
• Device IP address
• Device type
• Device MAC address
• Time


Fig. 2 Steps of the proposed model

The log report is essential in preventing the privacy leakage of the user's data. The detailed report not only provides accurate information about the login session but also enables the user to trace their activities. Figure 2 demonstrates the steps followed in the proposed model at the user end as well as at the cloud service provider.

(3) Step 3 In this method, session time is introduced not only to establish trust between the user and the service provider but also to maintain the confidentiality and integrity of data. At the time of registration or subscription to the cloud, the user must determine the duration of their daily session, which allows the service provider to convey a new password to the user's email address once the scheduled session expires. The session is suspended until the user logs in again. The user needs to log in again with a new password, which is shared by the service provider through the registered email address, to keep the session active. The email sent to the user contains an encrypted password of at least 12 characters. The user is required to determine their decryption method at registration time to be able to decrypt the new password and resume the session.
(4) Step 4 Once the user logs in again with the new password, the session resumes unless there is any mismatch with the password sent. Accordingly, if an attacker is present, he or she is automatically logged out of the session. Most importantly, all connected devices that fail to reconnect using the new password are removed from the session.
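As an illustration of Step 3, the sketch below generates a random password of at least 12 characters and encrypts it before it would be emailed to the user. The choice of Fernet symmetric encryption is an assumption made for this sketch; the paper leaves the encryption and decryption method to be chosen by the user at registration time.

```python
import secrets
import string
from cryptography.fernet import Fernet  # assumed choice of symmetric cipher

def generate_session_password(length=12):
    """Random password of at least 12 characters, as required in Step 3."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

# Key agreed with the user at registration time (assumption for this sketch).
key = Fernet.generate_key()
cipher = Fernet(key)

new_password = generate_session_password()
encrypted_password = cipher.encrypt(new_password.encode())

# encrypted_password would be emailed to the registered address;
# the client decrypts it with the shared key to resume the session.
assert cipher.decrypt(encrypted_password).decode() == new_password
```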


(5) Step 5 The cloud service provider ensures the security of the client over the remote network, considering that the security methods applied by the client or subscriber will no longer be supervised by them. This step provides more visibility of the user's activity. After the client has entered the new password, the cloud service provider performs another action to maintain visibility of the user's activities: another login report is sent to the user at the registered email address, containing:
• Current image of the client
• Screenshot of the current page
• Monitored report of the previous session
• IP address confirmation
• Location confirmation
(6) Step 6 The final step consists of a final report sent from the cloud service provider to notify the client about the completion of the current session. The challenges of cloud security are not insurmountable; with the right partners, technology, and foresight, companies can leverage the benefits of cloud technology. A trusted administration service can be cloud independent, but trust techniques and evaluation features must be consistent with the IaaS, PaaS, or SaaS cloud model underlying this approach [10]. We argue that a multi-factor strategy has the capability to establish potential trust; the management viewpoint and techniques are crucial. These six steps are believed to be essential to the secure functioning of the cloud. They are user friendly because of the way security is shared between the user side and the service provider, and they are quick, reliable, and robust.

4 Research Analysis The proposed model overcomes some of the threats and attacks cited in the table below. Besides granting authority to the users, control is another critical factor that builds trust; in fact, we trust a system less when we do not have much control over our assets. There is no way, of course, to ensure that the cloud is fully safe for customers. The importance of trust differs between organisations, depending on the nature of the data; therefore, the less confidence a company places in the cloud provider, the more it needs technology to monitor its data [11]. Table 1 indicates some attacks and threats which challenge the cloud environment and prevent the consumer from having complete trust in the service provider.


Table 1 Possible problems in cloud

Threat                Security control                      Attack                          Security control
Migration to cloud    Strong authentication                 Zombie attack                   Strong authentication
Cloud API             Strong authentication                 Attack on monitoring            Monitoring with IDS/IPS
Insider attack        Monitoring                            Spoofing attack                 Strong authentication
Data loss             Strong authentication and auditing    Back door and channel attack    Strong authentication
Risk profiling        Monitoring                            Phishing attack                 Strong authentication
Identity theft        Strong authentication                 Man in the middle attack        Strong authentication and encryption

5 Future Work Despite its limitations, this work provides valuable elements: reliable authentication and proper visibility of activities, which give authority to the user and establish trust between the user and the cloud service provider. Further research should focus on securing the email address in such a way as to prevent an intruder or attacker from gaining access to and possessing the new password. Besides, the implementation of a secure authentication method is also favoured.

6 Conclusion In summary, this paper argued that visibility of activities to the user is a valuable asset because it brings the trust needed to adequately utilise the potential benefits of cloud computing technology. This work identifies the types of cloud services that the technique supports and develops a suitable trust management system. In addition to authentication and authorization procedures, we consider the use of an audit monitoring system to monitor all successful and unsuccessful authentication and access attempts to be a genuine way to build trust and to assess attacks. For this reason, both the client and the cloud service provider are responsible for maintaining security.

References 1. Walker, K.: The treacherous twelve’ cloud computing top threats. In: RSA Conference Booth #S2614, SAN FRANCISCO, Cloud Security Alliance, 29 Feb 2016


2. Multi-Factors Authentication, From Wikipedia, the free encyclopaedia. https://en.wikipedia. org/wiki/Multi-factor_authentication 3. Gujar, G.V., Sapkal, S., Korade, M.V.: STEP-2 user authentication for cloud computing. Int. J. Eng. Innov. Technol. (IJEIT) 2(10), ISSN: 2277-3754 ISO 9001:2008 Certified Apr 2013 4. Soni, P., Sahoo, M.: Multi-factor authentication security framework in cloud computing. Int. J. Adv. Res. Comput. Sci. Softw. Eng. (IJARCSSE) 5(1), 1065–1071 (2015). ISSN: 2277 128X 5. Nagaraju, S., Parthiban, L.: SecAuthn: provably secure multi-factor authentication for the cloud computing systems. Ind. J. Sci. Technol. (IJST) 9(9) (2016). ISSN (Online): 0974-5645 6. Munir, K., Palaniappan, S.: Secure cloud architecture. Adv. Comput. Int. J. (ACIJ) 4(1) (2013) 7. SDxCentral Staff “What is Cloud-Based Security” Topic Hub/Security, 17 Oct 2015 1:57 PM. https://www.sdxcentral.com/security/definitions/what-is-cloud-based-security/ 8. Sarai, S.: Building the new network security architecture for the future. In: SANS Institute, Information Security Reading Room, SANS White Paper, Jan 2018 9. Khan, M.S., Warsi, M.R., Islam, S.: Trust management issues in cloud computing ecosystems. In: International Conference on Sustainable Computing in Science, Technology and Management (SUSCOM) (2019) 10. Noor, T.H., Sheng, Q.Z., Maamar, Z., Zeadally, S.: Managing trust in the cloud: state of the art and research challenges. IEEE Computer Society, IEEE Xplore 2016 11. Khan, K.M., Malluhi, Q.: Establishing trust in mobile cloud computing. J. ICIC Express Lett. 9(6), 1713–1718

An Innovative Authentication Model for the Enhancement of Cloud Security Francis K. Mupila and Himanshu Gupta

Abstract Cloud computing provides different types of services deployed in different models; thus, its security has become a paramount concern in the IT field. In this paper, a conceptual framework is proposed to mitigate various authentication threats by introducing an encrypted certificate and a token built up using the user's geographical location, in such a way as to enhance security and protect users against data loss and unauthorized access by unauthorized users and hackers. Keywords Authentication · Cloud computing · Cloud security · Web API · API gateway

1 Introduction Cloud computing is a significant aspect of the development of computing technology. It allows big and small organizations to manage and access their infrastructures, applications, storage, networks, directories, and platforms at various data centers on a distributed system through the Internet. Cloud computing is deployed as per the user's interest and needs, as a public cloud, a private cloud, a hybrid cloud, or a community cloud, and is offered through service models such as Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS), Network as a Service (NaaS), and Directory as a Service (DaaS).

F. K. Mupila (B) · H. Gupta Amity University, sector 125, Noida, Uttar Pradesh, India e-mail: [email protected] H. Gupta e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_48


According to its definition in computer science, authentication refers to the process of verifying the identity of a connected device. In other words, authentication refers to how an application determines who you are. It is not to be confused with authorization, which refers to how an application limits access to users; authentication and authorization are two requirements that must both be met [1]. This verification process can be done through five forms, as follows:
Type 1: What the user knows, such as a password or PIN,
Type 2: What the user has, such as a phone, token, or smart card,
Type 3: What the user is, such as fingerprint, retina, and voice recognition,
Type 4: Where the user is, such as the location or network,
Type 5: What the user prefers, such as a user signature or pattern.
In order to avoid attacks such as the man in the cloud, the man in the browser, or any other attack, the cloud service provider must set up a robust, reliable, and trustworthy mechanism to prevent and detect unauthorized users from accessing the data or resources. Even though no security method lasts long without being tampered with, this paper proposes a conceptual framework to enhance the security of cloud computing by introducing an encrypted certificate and a token built up using the user's geographical location, to enhance security and protect against data loss and unauthorized access. Figure 1 shows the login interface which is used to enter the credentials.
Fig. 1 Login interface



2 Importance of Security The IT industry is evolving day by day. Companies, organizations, agencies and more institutions are migrating from on-premise environment to cloud environment. In this process, the complexity of cloud intensifies. Cloud is not simply used for data storage, but it ensures in keeping data highly available, exchanging information between client and cloud service provider, handling the communication channel and so on. Cloud’s complexity makes it vulnerable which can result in data loss [2]. Therefore, cloud’s security is a significant aspect to be taken seriously. By definition, cloud security is a collection of system, technology, software and controls used to secure virtualized cloud computing infrastructure, IP, data, application, services and related cloud infrastructure [3]. Nevertheless, the users must use a secure connection to access the cloud environment, and cloud service provider must maintain a high-security level, for instance, using a robust encryption system to keep the integrity of the data. Besides, it is essential to remember that increasing the security to a certain extent reduces the performance of functionality. Therefore, both the safety and the performance are to be kept almost at an equal level [4]. One of the most dangerous attacks in the cloud environment is the insider attack. Thus, the cloud service provider must ensure that workers who have physical access to the servers at the data center conduct routine background checks so that the data center is monitored for suspicious activities [5]. Subsequently, the cloud service provider must ensure proper data isolation and logical storage segregation since more than one customer’s data are stored on the same server. This sharing of resources can contribute to information leakage, and 75% of security problems are triggered by this sharing of resources [6]. In addition, the primary concern of cloud protection is the secrecy and privacy of data. Therefore, the system should be sufficient and very scalable so that safety would not be an additional requirement, but rather it should exist as an essential feature of the system at all levels (computers, communications and service-level agreement) [7]. Accordingly, with the growth of cloud technology, authentication remains a key factor as it ensures that the user is the one claimed [8]. Trust, confidentiality, integrity, availability, authentication and authorization are the most critical security problems in cloud computing [9].

3 Related Work Several techniques and methods have been deployed to provide sustainable authentication; some of these techniques and works on authentication are listed below: (1) Certificate-based authentication is commonly used in industry today because a digital certificate is incorporated to authenticate the customer. The first time a user uses a service, a unique certificate is installed on their


device, and when the user enters the service, the service asks the device for that specific certificate, and access is given only if the certificate is valid [10]. (2) Server authentication with location verification: the aim of this paper is to address the problem of Web authentication. Here, the authors leverage the server location as a second factor of authenticity by introducing location-based server authentication, preventing server impersonation at any cost even if the victim server's secret is known by the attacker [11]. (3) Security algorithms for cloud computing have been reviewed, covering symmetric algorithms for different encryption and encoding strategies, and it was concluded that AES is a good choice for key encryption and that MD5 is faster for encoding. In addition, this can be improved by using 1024-bit RSA and 128-bit keys with an RSA-AES encryption algorithm, which maintains the data protection of cloud-based applications. The private key cannot be derived under AES even though the attacker is provided the public keys [12]. The authentication breach is identified as the root of data losses in the cloud environment; once it is addressed, customers can be assured that the integrity of their data stored in the cloud infrastructure is secure, just as a tree is secured by its roots in the soil.

4 Proposed Model This work aims to provide a secure procedure through which the user's credentials are entered and verified to gain access to the cloud environment. For this purpose, the conceptual framework proposes an encrypted certificate and a token built up to provide trusted authentication and authorization between the clients and the cloud service provider by using the client's geographical location. Additionally, with the help of the Geolocation API available in Google Chrome 55 and other up-to-date Web browsers, the HTML5 Geolocation API is used to gather the user's geographical location. Even though this can touch on the user's privacy, as mentioned in the new privacy regulations, the position is not obtainable unless the user consents to it [13]. Using this feature allows the Web browser to notify the Web server of the user's accurate location. There are, however, a large number of factors, technological, geographical, and even physical, that influence how precise this feature is when implemented in the real world [14]. Subsequently, this work takes place in three phases, each of which performs and executes a particular task in such a way as to enhance security over the process of identifying users before they gain access to their resources. The API gateway involved in this work acts as a reverse proxy to collect the user's requests and redirect them to the microservice in charge. It also decreases security breaches, as only one public IP address is exposed publicly [15].


Fig. 2 Flow of the authentication cycle

Figure 2 shows the cycle that the authentication in this proposed work follows; the user verification operation is consolidated into the three phases below.

4.1 Working Principle of the Proposed Work 1. Phase 1 This phase incorporates the communication between the client service running in the Web browser and the API gateway. The client's requests for a Web page from the Web server are handled by the API gateway, as it is hosted inside the Web server and acts as an entry point. Generally, the API gateway accommodates the SSL certificate, authentication, authorization, and many more microservices according to the Web server's configuration. Additionally, the API gateway is configured to receive and manage all the static requests from clients: it receives the client's request and then forwards the login page. Besides, the API gateway is configured to share the cryptographic hash function and the cryptographic key with the client service so that the HMAC algorithm can be executed on the client-service side. A hash function is preferred here because of its swiftness in computing and its ability to minimize any duplication of the output value. 2. Phase 2 After the exchange is established, the client service sends the user's credentials along with the user's location, obtained via the browser's Geolocation API, to the authentication server. The content sent through the network to the authentication server is a hash-based message authentication code. From this


point onward, the integrity and confidentiality of the user's credentials and location are guaranteed. The cryptographic hash function used for this model is SHA3-512. The authentication server stores the result if it is correct, so that a further request cannot use the same details (the username and password) to request access to the resources; this procedure guards against one of the challenging threats faced by the cloud service provider, the replay attack. The sub-processes of this phase are as follows: first, the authentication server generates the JWT token to be sent to the client service; the token contains a new claim, adding the user's location to the payload. Following that, the authentication server and the Web server execute a key-distribution exchange in order to use the RSA cryptosystem. Finally, an encrypted certificate containing the user's location is shared between the authentication server and the Web server to verify the authenticity of the received token, owing to the additional claim added. Beyond being a public-key cryptosystem, RSA is used in this model to sign the certificate and counter any alteration of the message. Figure 3 shows how the exchange of the token, the key, and the certificate takes place. There is also an important aspect, namely the refresh token: every new request sent from the client service has to bring its location up to date at the authentication server, and then the operation to obtain the JWT token takes place again.
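A minimal sketch of the client-side part of Phase 2 is shown below: the credentials and the browser-reported location are serialized and authenticated with HMAC over SHA3-512 using the key shared by the API gateway in Phase 1. The message layout, field values, and key size are illustrative assumptions.

```python
import hmac
import hashlib
import secrets

# Key shared by the API gateway with the client service in Phase 1 (assumed 32 bytes).
shared_key = secrets.token_bytes(32)

def build_authentication_message(username, password, location):
    """Serialize the credentials and the browser-reported location (illustrative layout)."""
    return "|".join([username, password, location]).encode("utf-8")

def client_hmac(message, key):
    """HMAC over the message using SHA3-512, as described for Phase 2."""
    return hmac.new(key, message, hashlib.sha3_512).hexdigest()

msg = build_authentication_message("alice", "s3cret-pass", "28.5355,77.3910")
tag = client_hmac(msg, shared_key)

# The authentication server recomputes the tag with the same key and compares
# it in constant time before accepting the credentials.
assert hmac.compare_digest(tag, client_hmac(msg, shared_key))
```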

Fig. 3 Exchange done in phase 2 of the proposed model


3. Phase 3 The encrypted token reaches the client service and is then sent to the Web server, which validates the JWT token. The pieces of information held in the certificate are then verified with the private key of the RSA cryptosystem before access to the resources is granted. Such security techniques secure the data transfer, the user interface, the separation of data, the storage of data, and the user's access control [16]. This research uses a token to make security decisions and to store tamper-proof information about an individual device. Although a token is usually used to carry only cryptographic details, it is also capable of carrying additional free-form data that can be added when the token is produced. A lack of good authentication can result in illegal disclosure of cloud-domain user accounts, which can contribute to breaches of privacy. Similarly, the absence of authorization in cloud computing leads to infringements of privacy when unauthorized parties enter the user's database [17].

4.2 Program Code

4.2.1 String of Code Representing the Token (Output)

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c


4.2.2


Decoded JWT Token with Extra Claim (the additional claim is enc_loc)

HEADER: ALGORITHM & TOKEN TYPE
{
  "alg": "HS256",
  "typ": "JWT"
}

PAYLOAD: DATA
{
  "sub": "1234567890",
  "name": "John Doe",
  "iat": 1516239022,
  "enc_loc": "1s0x390cfd5b347eb62d:0x52c2b7494e204dce"
}

VERIFY SIGNATURE
HMACSHA256(
  base64UrlEncode(header) + "." + base64UrlEncode(payload),
  your-256-bit-secret
)
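For illustration, the snippet below shows how a token carrying the additional location claim could be issued and verified with the PyJWT library. The library choice, the secret, and the claim values are assumptions for this sketch rather than details taken from the paper.

```python
import time
import jwt  # PyJWT (assumed library choice for this sketch)

SECRET = "your-256-bit-secret"   # placeholder; a real deployment uses a strong secret

def issue_token(user_id, name, encoded_location):
    """Issue an HS256 JWT that carries the extra location claim in its payload."""
    payload = {
        "sub": user_id,
        "name": name,
        "iat": int(time.time()),
        "enc_loc": encoded_location,   # additional claim described above
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")

def verify_token(token):
    """Validate the signature and return the payload, including the location claim."""
    return jwt.decode(token, SECRET, algorithms=["HS256"])

token = issue_token("1234567890", "John Doe",
                    "1s0x390cfd5b347eb62d:0x52c2b7494e204dce")
print(verify_token(token)["enc_loc"])
```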

5 Future Work It will be important for future research to investigate whether a request is from an authorized user. This requires efficient and secure data transactions that guarantee the integrity and confidentiality of the user's data by implementing a secure authentication technique. Most users prefer easy passwords, which become easy for an attacker to guess, and even the best password can be stolen through brute-force and dictionary attacks. Taking this into consideration, future work should consist of a more in-depth analysis of the complexity present in the cloud environment, and a hybrid encryption system is recommended to strengthen security.


5.1 Conclusion The main conclusion drawn is that, even though there is no completely robust or stringent technique for implementing security in the Web environment, this paper presents a model to authenticate the user with the help of the user's geolocation. A cloud security architecture works effectively only when the correct defensive implementations are in place, and it is considered efficient only when it can recognize the issues that arise in security management. To this end, the user's access to the resource is granted after the token and the certificate are validated.

References 1. Turner, D.M.: Digital authentication: the basics. In: Cryptomathic. Archived from the Original on 14 Aug 2016. Retrieved 9 Aug 2016 2. Gharami, S., Dinakaran, M.: Sequential mathematical solution for authentication and authorization technique implementing encryption methodology creating a secure transaction using various methods also at the quantum level. In: IOP Conference Series: Materials Science and Engineering 2017 3. From Wikipedia, the free encyclopedia, https://en.wikipedia.org/wiki/cloud_computing_sec urity 4. Bhardwaj, A., Goundar, S.: A framework to define the relationship between cybersecurity and cloud performance. Comput. Fraud Secur. (2019) 5. Indu, I.A., Rubesh Bhaskar, P.M., Vidhyacharan: Identity and access management in a cloud environment. Mech. Challenges Eng. Sci. Technol. Int. J. 21, 574–588 (2018) 6. Wueest, C., Barcena, M.B., O’Brien, L.: Mistakes in the IAAS Could Put Your Data At Risk. https://www.symantec.com/content/en/us/enterprise/media/security_response/Whi tepapers/mistakes-in-the-iaas-cloud-could-put-your-data-at-risk.pdf. May 2015 7. Subramanian, N., Jeyaraj, A.: Recent security challenges in cloud computing. J. Comput. Eelectr. Eng. 71, 28–42 (2018) 8. Farooq, H., Lokhande, T.S., Rajeshri, R.: A review on cloud computing security using authentication techniques. Int. J. Adv. Res. Comput. Sci. 8(2) (2017) 9. Kshetri, N.: “Privacy and security issues in cloud computing” The role of institutions and institutional evolution. Telecommun. Policy 37, 372–386 (2013) 10. From Wikipedia, the free encyclopaedia. https://en.wikipedia.org/wiki/basic_access_authentic ation 11. Yu, D.-Y., Ranganathan, A., Masti, R.J.: Salve: server authentication with location verification. In: International Conference on Mobile Computing and Networking, Mobicom 2016 12. Bhardwaj, A., Subrahmanyam, G.V.B., Avasthi, V., Sastry, H.: Security algorithms for cloud computing. In: International Conference on Computational Modelling and Security CMS (2016) 13. The World’s Largest Web Developer. https://www.w3schools.com/html/html5_geolocation.asp 14. Rich, B.: Everything You Ever Wanted to Know About Html5 Geolocation Accuracy. Feb 2018. https://www.storelocatorwidgets.com/blogpost/20453/everything_you_ever_wanted_ to_know_about_html5_geolocation_accuracy 15. Bush, T.: API Gateway. 11 June 2019. https://nordicapis.com/what-is-an-api-gateway/ 16. Gonzalez, N., Miers, C., Redigolo, F., Simplicio, M., Carvalho, T., Naslund, M., Pourzandi, M.: A Quantitative Analysis of Current Security Concerns and Solutions for Cloud Computing. Springer (2012) 17. Raju, B., Swarna, P., Rao, M.: Privacy and security issues of cloud computing. Int. J. (2016)

Substituting Phrases with Idioms: A Sequence-to-Sequence Learning Approach Nikhil Anand

Abstract In this paper, a sequence-to-sequence model is proposed for translating sentences without idioms into sentences with idioms. The problem is challenging in two ways: predicting the correct idiom based on context, and generating the correct sentence using the idiom given the complex semantic and syntactic rules of language. Sequence-to-sequence learning has gained popularity in the past few years due to its surprising results on machine translation tasks. This work is based on sequence-to-sequence learning of word sequences along with their part-of-speech tags to predict sentences with correct idiomatic phrases. Results show that the models achieve higher BLEU scores when part-of-speech tags are used as input sequences. These observations show the prominence of part-of-speech tags in identifying hidden writing patterns in the language. Keywords Machine learning · NLP · Encoder–decoder · RNN · POS tags

1 Introduction Communication has evolved in thousands of years. Humans have covered a very long journey, starting from cave paintings to modern language. Cave paintings, ideograms, petroglyphs, pictograms, and writing, all these communication techniques have the same common idea of conveying meanings from one individual to another or one group to another. Language is an ordered system of communication that has emerged in the past thousands of years and is continuously evolving. New words, phrases, and proverbs are continuously being added to the languages. Apart from time, language also got reshaped among a community and group of people. This leads to variation in the same language in different periods and within different groups. Idioms are part of any language that has a metaphorical meaning which is different from the actual meaning of words comprising it. These phrases amplify the sentence N. Anand (B) Internshala, Gurugram, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_49


when they are used. New idioms are added from time to time, gaining popularity and becoming part of our daily use. Semantic and syntactic rules specific to any language make linguistics difficult, and irregularities also emerge due to different writing styles. This paper explores the possibility of augmenting natural text by substituting phrases with idioms. It is a development of the previous work on implementing an idiom recommendation system using POS tagging and sentence parsing [1]. The earlier work was entirely based on handcrafted rules. In the field of NLP, we have observed that the irregularities in natural language restrict the effectiveness of rule-based methods for any task. The impressive results of neural networks in various areas of natural language processing have influenced this work. From sentiment analysis to anomaly detection, from text generation to image captioning, deep learning has shown its unmatched capabilities [2–5]. In this work, a sequence-to-sequence model is proposed that translates a sentence without any idioms into a sentence with idiomatic phrases based on the context. Different variations of RNN encoder–decoder models are used for the experiment without explicitly defining any syntactic rules. The paper is framed in the following manner: Sect. 2 presents the literature survey, Sect. 3 the methodology, Sect. 4 the experimental setup, Sect. 5 the results, and Sect. 6 the conclusion.

2 Literature Survey 2.1 Encoder–Decoder The growing popularity of deep learning has given rise to various architectures for different applications in machine learning. From image recognition to machine translation, each task has specialized deep learning architectures [6, 7]. Encoder–decoder is one such neural network architecture, used for image compression, neural machine translation, anomaly detection [3, 7, 8], etc. The encoder–decoder architecture contains two connected components known as the encoder and the decoder. When the encoder receives a source sequence, it reads the sequence and converts it to a low-dimensional hidden state feature vector; this process is called encoding. The decoder reverses the process by transforming the low-dimensional vector back into a sequence; this process is called decoding. Since the encoder–decoder architecture is an end-to-end machine learning model, its intermediate results are not directly visible, and it can be seen as mapping a source sequence to a target sequence via an intermediate hidden layer that acts as a feature extractor. In machine translation, the encoder–decoder architecture has been implemented successfully before, and different variations have been implemented in the past few years. An RNN encoder–decoder was proposed for statistical machine translation [7]. Another similar approach with LSTM layers as encoder and decoder was proposed


for machine translation, which achieved higher BLEU scores even for much longer sentences [9].
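The encoding–decoding cycle described in this subsection can be made concrete with a minimal sketch; the Keras layers, vocabulary size and hidden dimension below are illustrative assumptions rather than the configuration used in this work.

```python
from tensorflow.keras.layers import Dense, Embedding, Input, LSTM
from tensorflow.keras.models import Model

VOCAB, HIDDEN, MAXLEN = 5000, 128, 20          # hypothetical sizes

# Encoder: reads the source sequence and compresses it into low-dimensional hidden states.
enc_in = Input(shape=(MAXLEN,))
enc_emb = Embedding(VOCAB, HIDDEN)(enc_in)
_, state_h, state_c = LSTM(HIDDEN, return_state=True)(enc_emb)

# Decoder: expands the encoded states back into the target sequence, one word at a time.
dec_in = Input(shape=(MAXLEN,))
dec_emb = Embedding(VOCAB, HIDDEN)(dec_in)
dec_out, _, _ = LSTM(HIDDEN, return_sequences=True,
                     return_state=True)(dec_emb, initial_state=[state_h, state_c])
out = Dense(VOCAB, activation="softmax")(dec_out)

model = Model([enc_in, dec_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```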

2.2 Recurrent Neural Network Recurrent neural networks (RNN) are specialized neural networks that give importance to the order of the input in long sequential inputs [10]. The gated architectures of recurrent neural networks, such as LSTM and GRU, have gained popularity in recent years due to their capability in capturing sequential regularities. There are two major problems associated with RNNs: the vanishing gradient and the exploding gradient [11, 12]. Long short-term memory, popularly known as LSTM, is a solution to the vanishing gradient problem in recurrent neural networks [13]. It solves the problem by using a gated architecture. Gates are used at each input state to decide how much of the new input should be written to the memory cell and how much of the content of the current memory cell should be forgotten. The LSTM architecture is defined as:

$s_j = R_{\mathrm{LSTM}}(s_{j-1}, x_j) = [c_j; h_j]$  (1)

$c_j = f \odot c_{j-1} + i \odot z$  (2)

$h_j = o \odot \tanh(c_j)$  (3)

$i = \sigma(x_j W^{xi} + h_{j-1} W^{hi})$  (4)

$f = \sigma(x_j W^{xf} + h_{j-1} W^{hf})$  (5)

$o = \sigma(x_j W^{xo} + h_{j-1} W^{ho})$  (6)

$z = \tanh(x_j W^{xz} + h_{j-1} W^{hz})$  (7)

$y_j = O_{\mathrm{LSTM}}(s_j) = h_j$  (8)

$s_j \in \mathbb{R}^{2 d_h},\ x_j \in \mathbb{R}^{d_x},\ c_j, h_j, i, f, o, z \in \mathbb{R}^{d_h},\ W^{x\circ} \in \mathbb{R}^{d_x \times d_h},\ W^{h\circ} \in \mathbb{R}^{d_h \times d_h}$  (9)

Here, $c_j$ and $h_j$ are the memory and hidden state components, respectively. There are three gates, i, f and o, which stand for the input, forget and output gates. The gated recurrent unit, popularly known as GRU, is an alternative to LSTM. The LSTM architecture is hard to explain, and its complexity makes it hard to analyze [14].
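To make the gate arithmetic of Eqs. (1)–(8) concrete, a single LSTM step can be transcribed directly into NumPy as below; the weight names and dimensions are illustrative, and this is a sketch rather than the implementation used in this work.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_j, h_prev, c_prev, W):
    """One LSTM step; W maps names like 'xi', 'hi', ... to weight matrices as in Eqs. (4)-(7)."""
    i = sigmoid(x_j @ W["xi"] + h_prev @ W["hi"])   # input gate, Eq. (4)
    f = sigmoid(x_j @ W["xf"] + h_prev @ W["hf"])   # forget gate, Eq. (5)
    o = sigmoid(x_j @ W["xo"] + h_prev @ W["ho"])   # output gate, Eq. (6)
    z = np.tanh(x_j @ W["xz"] + h_prev @ W["hz"])   # candidate update, Eq. (7)
    c_j = f * c_prev + i * z                        # memory cell update, Eq. (2)
    h_j = o * np.tanh(c_j)                          # hidden state, Eqs. (3) and (8)
    return h_j, c_j                                 # together they form s_j = [c_j; h_j], Eq. (1)
```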


There are computational constraints with LSTM networks as well. The GRU architecture overcomes these shortcomings. It has fewer gates than LSTM and does not have a separate memory cell. The GRU architecture is defined as:

$s_j = R_{\mathrm{GRU}}(s_{j-1}, x_j) = (1 - z) \odot s_{j-1} + z \odot \tilde{s}_j$  (10)

$z = \sigma(x_j W^{xz} + s_{j-1} W^{sz})$  (11)

$r = \sigma(x_j W^{xr} + s_{j-1} W^{sr})$  (12)

$\tilde{s}_j = \tanh(x_j W^{xs} + (r \odot s_{j-1}) W^{sg})$  (13)

$y_j = O_{\mathrm{GRU}}(s_j) = s_j$  (14)

$s_j, \tilde{s}_j \in \mathbb{R}^{d_s},\ x_j \in \mathbb{R}^{d_x},\ z, r \in \mathbb{R}^{d_s},\ W^{x\circ} \in \mathbb{R}^{d_x \times d_s},\ W^{s\circ} \in \mathbb{R}^{d_s \times d_s}$  (15)

In a bidirectional RNN, each element of the sequence is encoded based on both its past and future contexts [15]. Two different RNNs, one processing the sequence from left to right and the other from right to left, are concatenated together. These networks are efficient when features are extracted from the context window around a word. The bidirectional RNN is defined as:

$\mathrm{biRNN}(x_{1:n}, i) = y_i = [\mathrm{RNN}_{\mathrm{forward}}(x_{1:i}); \mathrm{RNN}_{\mathrm{backward}}(x_{n:i})]$  (16)
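In practice, Eq. (16) is usually realized with a wrapper layer that runs the two directions and concatenates their outputs; a minimal Keras sketch (sizes are hypothetical) is shown below.

```python
from tensorflow.keras.layers import Bidirectional, Input, LSTM
from tensorflow.keras.models import Model

inp = Input(shape=(20, 128))                       # (time steps, feature size)
bi = Bidirectional(LSTM(64, return_sequences=True),
                   merge_mode="concat")(inp)       # forward and backward states concatenated
print(Model(inp, bi).output_shape)                 # (None, 20, 128)
```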

2.3 Part-of-Speech Tagging Parts of speech are word categories in any language; they include nouns, verbs, adjectives, determiners, adverbs, etc. POS tagging is a technique in which POS tags are assigned to words and word sequences. POS tagging techniques are classified into two categories: supervised and unsupervised. Supervised POS tagging techniques use probabilities for assigning the POS tags: a large tagged corpus is used for training, and the probabilistic approach applied while tagging considers unigrams, bigrams, trigrams, hidden Markov models, etc. Due to the sequential training of the POS tagger, these show the best results for sequential data only [16]. Rule-based taggers utilize grammatical information and a handcrafted set of rules for assigning POS tags; these are among the earliest tagging practices. The unsupervised approach is not as accurate as the supervised approach, although some


recent work has filled the gap between unsupervised and supervised approaches for POS tagging by using bilingual graph-based projections [17].
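As a small illustration of supervised tagging with a pre-trained model, NLTK's perceptron tagger can be applied to one of the dataset sentences; the choice of NLTK here is an assumption for illustration, not the tagger prescribed by this work.

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("We will reach there in a short period of time")
print(nltk.pos_tag(tokens))
# e.g. [('We', 'PRP'), ('will', 'MD'), ('reach', 'VB'), ('there', 'RB'), ...]
```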

3 Methodology In this paper, a sequence-to-sequence model is proposed for substituting phrases with idiomatic expressions. This work is an extension of the previous work on idiom recommendation based on syntactic structure using rule-based methods. The method is influenced by previous work on neural machine translation using sequence-to-sequence learning [9]. Two different idioms, in a while and for a while/awhile, are used. To train the model in such a way that it can identify the correct idiom, the correct version of the idiom and the correct position of the idiom in the sentence, part-of-speech tag sequences are used along with the word sequences. The goal of the model is to estimate the conditional probability of $y_{1:m}$, where the input sequence $x_{1:n}$ is concatenated with $POS_{1:n}$. The recurrent neural network first obtains the fixed-dimensional feature vector $c = \mathrm{RNN}_{\mathrm{Encoder}}(x_{1:n}; POS_{1:n})$ by stepping through the input time sequence. A conditional generator $\mathrm{RNN}_{\mathrm{Decoder}}(c)$ is then used for stepping through the output time steps, with a softmax over all the words in the vocabulary, to obtain $y_{1:m}$. For the experiment, input text sequences, output text sequences and POS tag sequences are label encoded. Then, these sequences are padded to convert them to fixed-size vectors. These fixed-size vectors are then used to train the encoder–decoder framework (Fig. 1). Four different versions of recurrent neural networks are used for the experiment: SimpleRNN, GRU, LSTM and Bi-LSTM. These specialized architectures are trained on word sequences alone as input and on word sequences concatenated with the part-of-speech tag sequences.
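The preprocessing pipeline described above, that is, label encoding the word and POS sequences, padding them to a fixed length and concatenating them as encoder input, can be sketched as follows; the tokenizers, lengths and toy data are illustrative assumptions.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = ["stay for a short period of time and rest"]   # toy input sentence
pos_tags  = ["VB IN DT JJ NN IN NN CC NN"]                  # its POS tag sequence

word_tok, pos_tok = Tokenizer(), Tokenizer()
word_tok.fit_on_texts(sentences)
pos_tok.fit_on_texts(pos_tags)

MAXLEN = 12
x_words = pad_sequences(word_tok.texts_to_sequences(sentences), maxlen=MAXLEN, padding="post")
x_pos   = pad_sequences(pos_tok.texts_to_sequences(pos_tags),   maxlen=MAXLEN, padding="post")

encoder_input = np.concatenate([x_words, x_pos], axis=-1)   # word ids followed by POS ids
print(encoder_input.shape)                                  # (1, 24)
```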

4 Experimental Setup 4.1 Dataset For the experiment, 1275 sentences are collected from different web sources. The dataset contains the pair of two sentences—sentence without idiom and sentence with idiom phrase. These sentences can be classified between the two idioms—in a while and for a while/awhile based on the context. The second idiom can be in two forms—for a while or awhile based on the semantic rules of language. The sample data is shown in Table 1. The left column has the input sentences while the right column has the same sentences with idiomatic expressions replacing phrases.


Fig. 1 Proposed architecture with concatenated word embedding and part-of-speech embedding as the input for the encoder–decoder framework

Table 1 Three sample sentences and their corresponding sentences with idiomatic expression

Sentences | Sentence with idiom
We will reach there in a short period of time | We will reach there in no time
Stay for a short period of time and rest | Stay awhile and rest
I was on crutches for a short period of time | I was on crutches for a while

4.2 Evaluation Metrics For measuring the performance of sequence-to-sequence learning models, various quantitative methods are available, such as word error rate (WER), multi-reference word error rate (mWER), BLEU score, subjective sentence error rate (SSER) and information item error rate (IIER) [18].

Table 2 BLEU-4 scores from different architectures with and without concatenating the POS tag embedding as input along with word embedding

Layers in model | BLEU-4 score
SimpleRNN | 0.9420
SimpleRNN with concatenated POS tags | 0.9454
GRU | 0.9572
GRU with concatenated POS tags | 0.9570
LSTM | 0.9476
LSTM with concatenated POS tags | 0.9568
Bi-LSTM | 0.9623
Bi-LSTM with concatenated POS tags | 0.9653

BLEU is a machine translation evaluation score that compares the n-grams in the machine-generated text to the n-grams in the reference text. For this model, we have used a cumulative score from 1-gram to 4-gram, also called BLEU-4 [19].
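The cumulative BLEU-4 score used here can be computed, for example, with NLTK, giving equal 0.25 weights to the 1- to 4-gram precisions; the sentences below are toy examples and the use of NLTK is an illustrative assumption.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference  = [["stay", "awhile", "and", "rest"]]      # tokenized reference sentence(s)
hypothesis = ["stay", "awhile", "and", "rest"]        # tokenized model output

score = sentence_bleu(reference, hypothesis,
                      weights=(0.25, 0.25, 0.25, 0.25),            # cumulative BLEU-4
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 4))                                # 1.0 for an exact match
```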

5 Results The model is evaluated on 10% of the total data. To evaluate the machine-generated text, we have used the BLEU-4 score; the scores for the different models are shown in Table 2. From these results, we observe that POS tag sequences concatenated with word sequences performed comparatively better than word sequences alone. Except for the GRU, every other RNN architecture performed noticeably better with the concatenated inputs, and for the GRU the difference between the two versions is insignificant. The qualitative evaluation has also shown that models with concatenated POS tags predict idioms more accurately.

6 Conclusions A sequence-to-sequence model is introduced for translating input sentences into output sentences with an idiomatic phrase. It is observed that part-of-speech tags as features improve the model's ability to capture hidden features and semantic rules. The qualitative evaluation has shown the importance of part-of-speech tags as features for predicting the correct idiom and capturing the context. The results further suggest that the bidirectional LSTM model performs best among all the specialized RNN architectures. The results can be further improved by using a larger dataset. This model can be extended to more idioms and to deeper encoder–decoder networks. The use of attention layers can also be considered for further improving the results.


References
1. Anand, N.: Idiom recommendation using POS tagging and sentence parsing. In: Kumar, A., Paprzycki, M., Gunjan, V. (eds.) ICDSMLA 2019. Lecture Notes in Electrical Engineering, vol. 601. Springer, Singapore (2020)
2. Tang, D., Qin, B., Liu, T.: Document modeling with gated recurrent neural network for sentiment classification. In: Proceedings of Conference on Empirical Methods in Natural Language Processing—EMNLP 2015, pp. 1422–1432 (2015, September). https://doi.org/10.18653/v1/d15-1167
3. Sakurada, M., Yairi, T.: Anomaly detection using autoencoders with nonlinear dimensionality reduction. In: ACM International Conference Proceeding Series, vol. 2, pp. 4–11 (2014, December). https://doi.org/10.1145/2689746.2689747
4. Marcheggiani, D., Perez-Beltrachini, L.: Deep graph convolutional encoders for structured data to text generation, pp. 1–9 (2018). https://doi.org/10.18653/v1/w18-6501
5. You, Q., Jin, H., Wang, Z., Fang, C., Luo, J.: Image captioning with semantic attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2016, pp. 4651–4659 (2016, December). https://doi.org/10.1109/cvpr.2016.503
6. Calderon, A., Roa, S., Victorino, J.: Handwritten digit recognition using convolutional neural networks and Gabor filters. In: Proceedings of the 2003 International Congress on Computational Intelligence (CIIC), pp. 1–8 (2003)
7. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734 (2014). https://doi.org/10.3115/v1/d14-1179
8. Cheng, Z., Sun, H., Takeuchi, M., Katto, J.: Deep convolutional autoencoder-based lossy image compression. In: Proceedings of 2018 Picture Coding Symposium (PCS 2018), pp. 253–257 (2018). https://doi.org/10.1109/pcs.2018.8456308
9. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. Adv. Neural. Inf. Process. Syst. 4(January), 3104–3112 (2014)
10. Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990). https://doi.org/10.1207/s15516709cog1402_1
11. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: 30th International Conference on Machine Learning (ICML 2013), no. PART 3, pp. 2347–2355 (2013)
12. Pascanu, R., Mikolov, T., Bengio, Y.: Understanding the exploding gradient problem. In: 30th International Conference on Machine Learning (ICML 2013), no. PART 3, pp. 2347–2355 (2013)
13. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
14. Dey, R., Salemt, F.M.: Gate-variants of gated recurrent unit (GRU) neural networks. In: Midwest Symposium Circuits System, vol. 2017, pp. 1597–1600 (2017, August). https://doi.org/10.1109/mwscas.2017.8053243
15. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Sig. Process. 45(11), 2673–2681 (1997). https://doi.org/10.1109/78.650093
16. Ratnaparkhi, A.: A maximum entropy model for part-of-speech tagging. Ann. Neurol. 5(1), 133–142 (1996)
17. Das, D., Petrov, S.: Unsupervised part-of-speech tagging with bilingual graph-based projections. In: ACL-HLT 2011—Proceedings of 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 600–609 (2011)
18. Tomás, J., Mas, J.À., Casacuberta, F.: A quantitative method for machine translation evaluation, pp. 27–34 (2003). https://doi.org/10.3115/1641396.1641401
19. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a method for automatic evaluation of machine translation. Ann. Phys. 371(23), 311–318 (2002). https://doi.org/10.3115/1073083.1073135

A Composite Framework for Implementation of ICT Enabled Road Accident Prediction Using Spatial Data Analysis Dara Anitha Kumari and A. Govardhan

Abstract Matching the growth of nations, road surface transportation in each country has increased to its maximum capacity. The increased traffic has left very little room for maintaining the roads and keeping them up to the mark for handling these higher volumes of traffic. Moreover, the traditional method of road maintenance is manual and highly time consuming. Due to this, many developed and underdeveloped nations face the problem of under-maintained roads, which in turn leads to road accidents. Road accidents not only reduce the effectiveness of a country in matching the growth of industrial development but also threaten human life, which is highly unacceptable. Thus, automation of accident prediction is the need of current research. Many parallel research outcomes have aimed to solve this problem by analysing road traffic volume. However, many researchers, cited further in this work, have shown that accidents on the road surface are caused by road conditions rather than by traffic volume. Hence, this work proposes a novel framework demonstrating the use of ICT-enabled methods for predicting accident-prone zones by analysing road conditions. This work demonstrates nearly 90% accuracy for noise reduction, nearly 98% accuracy for road surface defect detection and nearly 98% accuracy for predicting accident-prone zones, making road surface transportation a much safer option. Keywords Correlation · Adaptive · Location dependent · Accident possibility prediction · Regression

D. A. Kumari (B) Department of Computer Science, JNTUH, Hyderabad, India e-mail: [email protected] A. Govardhan JNTUH, Hyderabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_50


1 Introduction Indian cities are among the fastest developing metropolises in the world. With the growth of the economy, the population of big and medium-sized cities is continuously increasing along with rising living standards. The future strength of India lies in its urban territories; therefore, it is crucial to develop these environments. Smart city solutions best serve the needs of citizens to live a safe, convenient and happy life. The mission to develop smart cities in India consists of many diverse tasks, and one of the paths towards a smart city is better road conditions. The road network acts as the principal network to smooth the progress of trade, transport, social assimilation and financial development. It provides better accessibility, flexibility and reliability and thereby offers the greatest advantage of economies of scale. According to NHAI, 60% of goods and 80% of passenger traffic are carried by road. Among all modes of transport, road transport is preferable for short-distance connectivity. Traffic in India's mega cities is increasing day by day due to the increase in population, and the number of vehicles has been growing at an average rate of 10.16% per annum over the decade. Geographic coordinates in the plane can be used intuitively with respect to one's present location, in which case the x-axis points to the local north. More formally, such coordinates can be obtained from three-dimensional coordinates using a map projection. It is impossible to map the curved surface of the Earth onto a flat map surface without distortion. The compromise frequently chosen, called a conformal projection, preserves angles and length ratios, so that small circles are mapped as small circles and small squares as squares. Further, the rest of the paper is organized such that in Sect. 2 the problem is identified and listed for finding the solution in the next phase, in Sect. 2.1 the proposed architecture is furnished, in Sect. 3 the comparative benefits are listed against parallel research outcomes and, finally, this work presents the final research conclusion in Sect. 4.

2 Problem Identification In this section of the work, the problem is identified and presented. Based on the recommendations collected from various research attempts, several approaches have been implemented and studied in order to establish a process that automates the maintenance cycle of roads. Nevertheless, the complete cost propositions of those models are not fully justified. Adding to this complexity, road conditions are captured under various lighting conditions and with a variety of capture devices, differing in quality and method of capture. A number of research attempts have been carried out to detect road condition based on potholes. Nonetheless, the detection process is highly time complex and delays the maintenance process [4–6]. Further, the majority of the parallel research outcomes failed to measure


multiple potholes in a single image and cannot distinguish the potholes based on the urgency of repair. Thus, this work defines a new dimension of pothole detection for road images that contain a high number of potholes in a single image and makes the process faster by reducing the chance of false detection. It has been observed that preventive road-repair measures can make the road surface last longer and can save significant time compared with the maintenance needed for heavily damaged rebuild operations. Nonetheless, while potholes causing major problems on road surfaces are easily visible, the cracks that will eventually become potholes cannot always be seen by the human eye. Henceforth, after this understanding of the problem, the proposed architecture is presented in the next section of this paper.

2.1 Proposed Architecture In this section of the work, the proposed architecture is furnished in Fig. 1. Considering the gaps in recent research, this work identifies the following major steps to be taken in order to make a significant contribution to this research domain: • Improving the input image quality is the basic requirement for improving the accuracy of detecting road conditions. • The first and foremost challenge of this direction of research is to formulate a method or framework to identify various types of noise and adaptively remove them from the images.

Fig. 1 Proposed architecture of the automated framework


• This work extracts the parameters for determining the existence of potholes as a major outcome. • Yet another outcome of this work is to classify the potholes based on the urgency of repair. • A further outcome of the work is to automate the detection facility to provide timely maintenance alerts and deliver better road conditions in India. • In addition, maintenance tasks demand suitable weather conditions, which are difficult to predict. Maintenance work started with no knowledge of the weather has had to be aborted, causing further delay and further decay of the road conditions. Thus, recent research demands prediction of the road condition so that potholes can be given higher priority, cracks can be considered for immediate repair and patch works can be ignored during automation. • The major outcome of this work is to build an automated framework to analyse and predict road damage and recommend scheduled maintenance tasks with 100% accuracy in order to enable better surface transport. The proposed algorithms have already been discussed in other works by the same authors [1–3]. Henceforth, in the next section of this work, the proposed framework is compared with the other parallel research outcomes.

3 Comparative Analysis As this work has already been demonstrated in various parts in the previous sections, the final comparative analysis is carried out in this section [1]. Firstly, for research objective 1, the noise reduction comparisons are furnished in Table 1 [2], and the results are visualized graphically in Fig. 2 [6–9]. Secondly, for research objective 2, the clustering comparisons are furnished in Table 2, and the results are visualized graphically in Fig. 3 [3]. Finally, for research objective 3 [9–12], the prediction accuracy comparisons are furnished in Table 3, and the results are visualized graphically in Fig. 4.

Table 1 Noise reduction comparative analysis

Research outcome | Missing value detection and reduction accuracy (%) | Outlier detection and reduction accuracy (%)
Ertürk et al. [4] | 58.32 | 58.49
Çeşmeci et al. [5] | 62.84 | 61.61
Proposed method | 90.00 | 90.00

Fig. 2 Noise reduction comparative analysis

Table 2 Clustering accuracy comparative analysis

Research outcome | Accuracy (%)
Kanarachos et al. [6] | 90
Bello-Salau et al. [7] | 94
Bayer et al. [8] | 91
Azhar et al. [9] | 90
Proposed method | 98

Fig. 3 Clustering accuracy comparative analysis

Table 3 Prediction accuracy comparative analysis

Research outcome | Accuracy (%)
Trajectory-based, Cai 2015 [10] | 90
Dense trajectory, Wang [11] | 84
Auto parking, Mahmood [12] | 90
Proposed method | 98

Fig. 4 Prediction accuracy comparative analysis

Henceforth, it is natural to realize that the proposed automated framework has outperformed the parallel research works. Further, in the next section of this paper, the final research conclusion is presented.

4 Conclusion In order to match the current trend of research, this work proposes a novel framework for predicting the road accident-prone zones on a live map. This work maps the zones with the coordinates such as longitude and latitude from the maps. To achieve the higher accuracy of the prediction, in the first phase of the research, this work deploys three algorithms such as Adaptive Moment-Based Spatial Image Noise Detection and Removal Algorithm (AMBSI-NDR) for reduction of the noises from the image data, which is separated from the spatial data, Adaptive Logistic Correlation-Based Missing Value Identification and Replacement Algorithm (ALCMVIR) for missing value reduction method from the textual data extracted from


the spatial information and Correlative Logistic Correction-Based Outlier Identification and Removal Algorithm (CLC-OIR) for reducing the outliers from the textual data extracted from the spatial information. During this phase, the work demonstrates nearly 90% accuracy. In the second phase of this research, this work presents a fourth algorithm, Parametric Extraction and Pragmatic Clustering for Defect Detection (PE-PC-DD), for clustering the defects based on the extracted parameters. During this phase, the work demonstrates nearly 98% accuracy. Finally, in the third phase of the research, this work showcases the algorithm called Correlation-Based Adaptive Location-Dependent Accident Possibility Prediction (CBA-LD-APP) for predicting the accident-prone zones with nearly 98% accuracy. Thus, this work demonstrates complete autonomy for accident prediction and accident-prone zone mapping and must be considered one of the benchmarks in this domain of research.

References
1. Dara, A.K., Govardhan, A.: Noise reduction in spatial data using machine learning methods for road condition data. Int. J. Adv. Comput. Sci. Appl. 11. https://doi.org/10.14569/ijacsa.2020.0110120
2. Dara, A.K., Govardhan, A.: Parametric extraction of the road conditions spatial data and detection of defects using pragmatic clustering method. Int. J. Eng. Adv. Technol. (IJEAT) 9(3) (2020). ISSN 2249-8958
3. Dara, A.K., Govardhan, A.: Detection of coordinate based accident-prone areas on road surface using machine learning methods. Int. J. Comput. Eng. Inf. Technol. (IJCEIT) 12(3) (2013). E-ISSN 2412-8856
4. Ertürk, A., Çeşmeci, D., Güllü, M.K., Gerçek, D., Ertürk, S.: Integrating anomaly detection to spatial preprocessing for endmember extraction of hyperspectral images. In: Proceedings of IEEE Geoscience and Remote Sensing Symposium (IGARSS), pp. 1087–1090 (2013)
5. Ertürk, A., Çeşmeci, D., Güllü, M.K., Gerçek, D., Ertürk, S.: Endmember extraction guided by anomalies and homogeneous regions for hyperspectral images. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 7(8), 3630–3639 (2014)
6. Kanarachos, S., Christopoulos, S.R.G., Chroneos, A., Fitzpatrick, M.E.: Detecting anomalies in time series data via a deep learning algorithm combining wavelets, neural networks and Hilbert transform. Expert Syst. Appl. 85, 292–304 (2017)
7. Bello-Salau, H., Aibinu, A.M., Onumanyi, A.J., Onwuka, E.N., Dukiya, J.J., Ohize, H.: New road anomaly detection and characterization algorithm for autonomous vehicles. Appl. Comput. Inf. (2018). [online] Available https://doi.org/10.1016/j.aci.2018.05.002
8. Bayer, F.M., Kozakevicius, A.J., Cintra, R.J.: An iterative wavelet threshold for signal denoising. Sig. Process. 162, 10–20 (2019)
9. Azhar, K., Murtaza, F., Yousaf, M.H., Habib, H.A.: Computer vision based detection and localization of potholes in asphalt pavement images. In: 2016 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 1–5 (2016, May)
10. Cai, Y., Wang, H., Chen, X., et al.: Trajectory-based anomalous behaviour detection for intelligent traffic surveillance. IET Intell. Transp. Syst. 9(8), 810–816 (2015)
11. Wang, H., Klaser, A., Schmid, C., et al.: Action recognition by dense trajectories. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, pp. 3169–3176 (2011)
12. Mahmood, Z., Haneef, O., Muhammad, N., et al.: Towards a fully automated car parking system. IET Intell. Transp. Syst. 13, 293–302 (2019)

VISION AID: Scene Recognition Through Caption Generation Using Deep Learning Mathew Regi and Mathews Abraham

Abstract Visually impaired individuals rely heavily on their alternative senses, such as acoustic signals and touch, to comprehend the world outside. It is incredibly tough for a visually handicapped individual to perceive objects without feeling them, but there can be times when physical contact between the individual and the object is risky or deadly. This paper presents a real-time object recognition application to aid the visually impaired. A camera-linked mobile phone with systematised orientation provides input to a computing device for real-time object detection. The proposed project utilises a convolutional neural network (CNN) to recognise pre-trained items in captured imagery and a recurrent neural network (RNN) with LSTM for caption generation. Here, a caption dataset is utilised for training the captioning model. After training, these neural models can generate captions for objects, and the network output can then be conveyed to those with visual impairment by converting the generated captions to audio. Experimental outcomes on the MS-COCO dataset show that our design outperforms the state of the art. Keywords Object recognition · Caption generation · CNN · RNN · LSTM

1 Introduction In this age, where most applications solely benefit the healthy ones, it is essential to create a device for guiding the visually challenged. Generally, these impaired individuals depend on the assistance of others to guide them through. Unfortunately, there could be scenarios, where help may not be easily available or the blind may get fooled. M. Regi (B) · M. Abraham Department of Information Technology, Rajagiri School of Engineering and Technology, Ernakulam, Kerala, India e-mail: [email protected] M. Abraham e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_51


Taking these issues into consideration, it was proposed to design a modern application favouring the visually impaired. In this technological era, with strides towards progress in every sphere, the blind must not get left behind. This application aims to provide them with a better understanding of the world around them. Currently, a few aids like spectacles, Braille or even a walking stick are used to tide over the impairment and get on with their lives. This proposed project utilises a convolutional neural network model for recognising objects along with a recurrent neural network for generating captions. The captured imagery is described automatically to the blind person by converting the generated text to speech, without external help.

2 Related Works Various methods have been designed for generating captions from images. This section includes some of the important works done in the area of caption generation using deep learning techniques. Khademi et al. [1] propose a contextual and focussed deep architecture for the caption generation of images. The proposed architecture uses a bidirectional grid LSTM. This captures visual aspects of an RGB imagery as input and learns its intricate space patterning based on a dual-dimensional background, by choosing or disregarding its input. Often, region-grounded versions elucidate features of those entities and their link in the images. For caption generation of images, it integrates characteristics from grid LSTM with these versions, utilising dual-layer bidirectionally. A new approach based on region-based deep learning [2] method is recommended to generate captions for imagery. This consists of recurrent neural network (RNN) attribute predictor, region-based object detector, encoder-decoder language producer fixed with dual RNNs to create meaningful explanations of the given imagery. It uses R-CNN architecture to detect objects and encoder-decoder RNN model to generate sentences. The IAPR TC-12 dataset enables the evaluation process. In paper [3], multilayer dense attention architecture is proposed to generate image captions. Faster R-CNN is used to obtain imagery features, and LSTM helps decode the multilayer dense attention architecture. Thus, caption text is produced. The model’s overall architecture is performed on encoder-decoder format, which is split into two levels: bottom-up attention and top-down attention. The first mechanism is proposed to extract image regions, and the second mechanism is to produce the relevant captions in each time series. It is evaluated on various datasets like MS-COCO, Flickr, Chinese-AI. The method proposed in paper [4] uses a cascade recurrent neural network (CRNN) to generate image captions. CRNN uses a cascade network to generate the captions of images. This network can utilise in-depth meaningful contexts present in the imagery. Unlike the conventional MRNN, CRNN comprises front-end and back-end network, linked to obtain visual language interfaces from two sides. Here,


a stacked gated recurrent unit is built with two hidden layers that extend the depth of the RNN and thus capture meaningful correlations between images and sentences. Its back-end network has been developed to extract semantic context in both the forward and backward directions to predict words. It transfers the knowledge acquired by the front-end network as the initial setting and feeds sentences in reverse to the back-end system. The efficacy of CRNN is confirmed on the MS-COCO dataset.

3 Proposed System This proposed system is a real-time scene capturing application that guides vision-impaired individuals. The application captures the image of a scene and delivers the description of the scene in an audible format. Thereby, users understand what objects are in front of them through a camera-aligned smartphone, reducing the risk of accidents. The layout of the recommended device is explained in Fig. 1. Initially, the user activates the application by shaking the mobile, and the camera starts taking pictures.

Fig. 1 System layout


Then, the picture is sent to the server, where the weight file is stored for predicting the caption. The MS-COCO dataset [5] is employed to train the network. For generating captions, a lot of image data is essential. Varied image datasets like Flickr30k, Flickr8k, MS-COCO, SBU, Pascal and more can be easily accessed. MS-COCO is the latest and possibly the most popularly utilised and systematised dataset. It has 82,783 images for training and 40,504 for both testing and validation, and each image comes with five captions. The current model is trained with the MS-COCO dataset, which is used extensively for network training and testing. Initially, a pre-trained convolutional neural network (CNN) with the VGG19 architecture is used for preprocessing the image, and the output is given to an RNN to generate descriptions for images. Subsequently, the generated captions are saved in a text file and handed over to the mobile. Next, the caption in text format is converted into speech by using a text-to-speech API and given back to the visually impaired user. The major steps of the proposed system are described herewith.

3.1 Object Detection and Recognition VGG-19 object detector is implemented to detect the object efficiently. VGG-19 is a CNN architecture that is 19 layered, wherein number 19 represents the total layers with trainable weights. This comprises 16 convolutional levels besides 3 wholly connected ones (Fig. 2) [6]. The VGG-19 consists of five sets of convolution layers. Of this, two of them have 64 filters and the next set has 2 convolution layers with 128 filters. This is followed

Fig. 2 VGG19 architecture

Table 1 Example of object attributes

Attribute | Examples
Colour | Black, white, grey, blue, green, etc.
Shape | Long, circle, round, rectangle, square, etc.
Pattern | Spotted, striped
Texture | Rough, furry, smooth, shiny, metallic, wooden, wet, etc.

by a set of 4 convolution layers having 256 filters. Next, 2 sets have 4 convolution layers each, having 512 filters. Max pooling layers between each set of convolution layers have 2 × 2 filters with a stride of 2 pixels. The output of the last pooling layer is flattened and sent to a fully connected layer, whose output is fed to another similar layer and finally to a layer with 1000 neurons. All these layers are ReLU activated. Finally, there is a softmax layer that outputs a vector representing the probability distribution over a list of outcomes. The convolution layers and fully connected layers have trainable weights; the max pooling layers help decrease the size of the input imagery, while softmax is utilised for the final decision making. The system takes a (224, 224, 3) RGB image as input from the MS-COCO dataset for training. After training, the network is capable of detecting an object in the scene.
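As a sketch of how a pre-trained VGG-19 can be used in this role, the classification head can be removed so that the network yields a compact feature vector for a captured scene; the pooling choice and the image path below are illustrative assumptions, not the exact setup of this work.

```python
import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.preprocessing import image

# Pre-trained VGG-19 without the final classification layers; global average
# pooling turns the last convolutional maps into a single 512-dimensional vector.
base = VGG19(weights="imagenet", include_top=False, pooling="avg",
             input_shape=(224, 224, 3))

img = image.load_img("scene.jpg", target_size=(224, 224))    # hypothetical captured image
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
features = base.predict(x)                                    # shape (1, 512)
```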

3.2 Attribute Prediction Here, we utilise RNN-based attribute classification [7]. Research shows that RNNs benefit varied spheres of machine learning, including caption generation, machine translation, etc. RNNs are employed here because of their capability to effectively predict the attributes that must be reported for the prescribed set of features. The RNNs utilised in this work are word based and use the LSTM [8] architecture. At test time, the CNN helps to obtain imagery features, which are then used for predicting multiple attributes, one at a time. Each prediction depends on the extracted image features in combination with the previously generated terms. The model keeps producing appropriate attributes until the assigned STOP token is generated, i.e., when the RNN concludes that no other attribute can be used to describe the imagery given its features and the formerly produced attributes (Table 1) [2].

3.3 Caption Generation The captions generated thereof are more explanatory than those brought forth by other prevailing research studies, in terms of features and object recognition details. Hence, ideally, the MS-COCO dataset is utilised for the purpose of training and evaluation. Owing to its descriptions, this proposed system is superior when compared to other popular databases.


Both CNN and RNN are utilised to generate captions [9, 10]. The network is trained on the MS-COCO dataset, in which each image is paired with five captions. To hasten training, each image is pre-encoded to its feature vector. Since the captions may contain a large number of unique terms, one-hot word encoding is not used; instead, the trained embedding layer outputs each word as a vector of shape (1, 128). An LSTM architecture is used to generate the captions. During the training period, the network learns how to develop descriptions for images through analysis of the provided dataset. After training, a weight model is formed which contains all the learned weights of the network. The test image in vector form is fed as input to the weight model to create the captions. Overall, both the CNN- and RNN-based object and attribute estimations are very effective for generating highly meaningful sentences.
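A compact sketch of such a CNN-feature plus LSTM captioning model, in the common "merge" formulation, is given below; the layer sizes, vocabulary size and maximum caption length are assumptions for illustration rather than the exact configuration used here.

```python
from tensorflow.keras.layers import Dense, Dropout, Embedding, Input, LSTM, add
from tensorflow.keras.models import Model

VOCAB, MAXLEN, FEAT = 8000, 34, 512             # hypothetical sizes

img_in = Input(shape=(FEAT,))                   # pre-extracted VGG-19 feature vector
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

txt_in = Input(shape=(MAXLEN,))                 # partial caption generated so far
txt_emb = Embedding(VOCAB, 128, mask_zero=True)(txt_in)
txt_vec = LSTM(256)(Dropout(0.5)(txt_emb))

merged = Dense(256, activation="relu")(add([img_vec, txt_vec]))
out = Dense(VOCAB, activation="softmax")(merged)    # distribution over the next word

model = Model([img_in, txt_in], out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```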

3.4 Application Development The proposed caption generation mobile app is developed with React Native, a JavaScript framework. It is used to write real, natively rendered mobile apps for Android and iOS devices. Based on React, Facebook's JavaScript library for building user interfaces, it targets mobile platforms rather than web browsers and easily enables simultaneous development for both Android and iOS. Our application is created to work on Android devices. As a blind aid, the user can activate the application with a gesture such as shaking the mobile. The sensor event listener uses the sensor manager, which notifies the app whenever it receives sensor data; the accelerometer sensor and the sensor manager are used to detect whether a shake has occurred. The camera activity commences only if the measured value exceeds the configured threshold; otherwise the camera activity is not triggered. Camera activation enables the user to capture images. The saved mobile imagery is fed as input to the trained system that generates captions for images. The caption is then saved into a text document, directed back to the mobile and converted into speech. The text-to-speech API is used to convert the generated captions into an audible format for the blind, similar to human speech.
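The chapter performs the final text-to-speech step with a mobile TTS API; purely as an illustration of that conversion step on the server side, the gTTS library can turn a generated caption into an audio file (an assumption for illustration, not the tool used by the authors).

```python
from gtts import gTTS

caption = "a vase filled with purple flowers"      # a generated caption
gTTS(text=caption, lang="en").save("caption.mp3")  # audio file that can be played back to the user
```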

4 Evaluation Results For evaluating the performance of this system to generate image captions, we use the BLEU score to compare our model with other existing models. There are different evaluation metrics like BLEU, ROUGE, METEOR, etc. for evaluating description generation.


Table 2 BLEU evaluation report

Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4
CRNN [4] | 0.691 | 0.514 | 0.376 | 0.275
Deep [11] | 0.713 | 0.539 | 0.403 | 0.304
Show and tell [12] | 0.718 | 0.504 | 0.357 | 0.25
Adaptive [13] | 0.742 | 0.58 | 0.439 | 0.332
Our model | 0.751 | 0.592 | 0.448 | 0.341

The bilingual evaluation understudy (BLEU) score is a method to compare a predicted caption with a reference caption. If they match, the score will be 1, and a complete mismatch results in 0. Through this evaluation, the closeness between a system-produced caption and the original dataset caption is determined. The BLEU evaluation of existing methods has been employed for comparison and to help us calculate the efficiency of our model. BLEU-1, BLEU-2, BLEU-3 and BLEU-4 are cumulative scores that refer to the computation of individual n-gram scores for all orders from one to n, combined with a weighted geometric mean. An individual n-gram score is an analysis of matching grams of a selected order only, like single words (one-gram) or word pairs (two-gram or bigram). By default, a cumulative 4-gram BLEU score, also known as BLEU-4, is computed. The cumulative and individual one-gram BLEU use the same weights. The two-gram weights assign 50 percent to each of the one-gram and two-gram scores, and the three-gram weights are 33 percent for each of the one-, two- and three-gram scores. The weights for BLEU-4 are 25 percent for each of the one-gram, two-gram, three-gram and four-gram scores. Table 2 depicts the comparison of the BLEU scores of different models and shows that our model generates captions better than the other existing models. The proposed model is executed with the TensorFlow framework on an NVIDIA TESLA T4 GPU. The number of epochs was set to 50 for training. After the completion of the epochs, the trained CNN–RNN-based captioning model, consisting of the trained weights used for testing, was obtained. The real-time caption generation from the developed application is shown in Fig. 3: on shaking the mobile, the image is taken and its corresponding caption is generated in the form of speech. The results obtained from the Android application are shown in Fig. 3. The generated captions are (a) a vase filled with purple flowers, (b) a clock on a table, (c) a car parked in front of a window.

5 Conclusion Today, modern technology has grown by leaps and bounds. This can be harnessed efficiently for creating a device that will aid the blind to live fuller lives. The proposed


Fig. 3 Generated captions from the application

system will provide them with a better understanding of their surroundings and make them more independent. Our project therefore aims to develop a user-friendly application that can guide the visually impaired in our society. The proposed system focuses on generating captions for varied images. This Android application generates meaningful sentences for images captured by the camera-aligned smartphone of the blind user and then speaks out the captions formed for the benefit of the visually impaired. As future work, the system could generate captions more precisely if extended to video input; some issues like out-of-focus or blurred imagery could be solved by utilising video as the input to the system. The efficiency of the network helps tide over any delays in caption generation for images fed as input to the server. The accuracy of prediction can be increased through high-quality datasets and efficient training.

References
1. Khademi, M., Schulte, O.: Image caption generation with hierarchical contextual visual spatial attention. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, pp. 2024–20248 (2018). https://doi.org/10.1109/CVPRW.2018.00260
2. Kinghorn, P., Zhang, L., Shao, L.: A region-based image caption generator with refined descriptions. Elsevier (2018). https://doi.org/10.1016/2017.07.0140925-2312/2017
3. Wang, E.K., Zhang, X., Wang, F., Wu, T., Chen, C.: Multilayer dense attention model for image caption. IEEE Access 7, 66358–66368 (2019). https://doi.org/10.1109/ACCESS.2019.2917771
4. Wu, J., Hu, H.: Cascade recurrent neural network for image caption generation. Electron. Lett. 53(25), 1642–1643 (2017)
5. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. Lecture Notes in Computer Science, vol. 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_48
6. ResearchGate: Illustration of the network architecture of VGG-19 model: conv means convolution, FC means fully connected. https://www.researchgate.net/figure/llustration-of-the-network-architecture-of-VGG-19-model-conv-means-convolution-FC-means_fig2_325137356
7. Wu, Q., Shen, C., Wang, P., Dick, A., van den Hengel, A.: Image captioning and visual question answering based on attributes and external knowledge. IEEE Trans. Pattern Anal. Mach. Intell. (2017). https://doi.org/10.1109/tpami.2017.2708709
8. Poghosyan, A., Sarukhanyan, H.: Long short-term memory with read only unit in neural image caption generator. IEEE Comput. Sci. Inf. Technol. (2017). https://doi.org/10.1109/csitechnol.2017.8312163
9. Kumar, N.K., Vigneswari, D., Mohan, A., Laxman, K., Yuvaraj, J.: Detection and recognition of objects in image caption generator system: a deep learning approach. In: 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), Coimbatore, India, pp. 107–109 (2019). https://doi.org/10.1109/ICACCS.2019.8728516
10. Luo, R.C., Hsu, Y., Wen, Y., Ye, H.: Visual image caption generation for service robotics and industrial applications. In: 2019 IEEE International Conference on Industrial Cyber Physical Systems (ICPS), Taipei, Taiwan, pp. 827–832 (2019). https://doi.org/10.1109/ICPHYS.2019.8780171
11. Ren, Z., Wang, X., Zhang, N., Lv, X., Li, L.-J.: Deep reinforcement learning based image captioning with embedding reward. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017). https://doi.org/10.1109/cvpr.2017.128
12. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. In: Proceedings of the International Conference on Machine Learning, pp. 2048–2057 (2015)
13. Lu, J., Xiong, C., Parikh, D., Socher, R.: Knowing when to look: adaptive attention via a visual sentinel for image captioning. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp. 3242–3250 (2017). https://doi.org/10.1109/CVPR.2017.345

Effect of Hybrid Multi-Verse with Whale Optimization Algorithm on Optimal Inventory Management in Block Chain Technology with Cloud C Govindasamy and A. Antonidoss

Abstract One of the important works of supply chain management is optimal inventory control. Optimal inventory control techniques aim to minimize the supply chain cost by efficiently managing the inventory. This paper aims to analyze the influence of a hybrid of Multi-Verse Optimization (MVO) and the Whale Optimization Algorithm (WOA), termed the Whale-based Multi-Verse Optimization Algorithm (WMVO), on optimal inventory management in block chain under the cloud sector. Costs such as transaction cost, inventory holding cost, shortage cost, transportation cost, time cost, setup cost, back-ordering cost and quality improvement cost are considered for deriving the multi-objective model. The effectiveness of the proposed hybrid algorithm is analyzed by varying the Travelling Distance Rate (TDR) from 0.2 to 1.2, and the model is evaluated with the assistance of block chain under the cloud sector. Keywords Supply chain management · Optimal inventory control · Whale-based multi-verse optimization algorithm · Block chain · Transaction cost · Inventory holding cost · Shortage cost · Transportation cost · Time cost · Setup cost · Back-ordering cost · Quality improvement cost

1 Introduction The continuity of organizations in this provident world is about the merit of controlling inventories. In most of the fabricating organizations, there must be few kinds of inventory varieties like; effectiveness of material on the technique that is not yet completed raw materials that are progressing to be sorted via generation and finished outcomes for sales that are managed for the organization. The enhancement of the organization is achieved by the best inventory control strategies [1]. C. Govindasamy (B) · A. Antonidoss Department of Computer Science and Engineering, Hindustan Institute of Technology and Science, Chennai, India e-mail: [email protected] A. Antonidoss e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_52


Inventory management is an analytical management problem for many companies, whether medium-sized, large or small. Block chains are a distributed framework and decision model based on an asymmetric encryption algorithm, and they offer distinctive benefits over existing transaction techniques and data storage. One of their key contributions is efficient inventory flow management. Production techniques are adopted for certain inventories, and retailers and wholesalers hold the necessary sufficient inventories. The basic objective is to attain a balance between a high rate of turnover (ROT) and low inventory, since proper inventory levels allow the organization's materials to be handled very efficiently.

2 Literature Review Although there were several inventory management models, still there exist various challenges that have to be resolved in the future. Integrated research framework [2] enhances the behaviour of inventory control systems, and it also reduces the inventory-related costs. But, it does not consider the inconsistent and changing dates of expiry in various groups of received orders, and it also does not consider the up-todate behavioural factors in healthcare. The two-stage stochastic programming model [3] is sufficiently adaptable, and it also reduces the present target levels to minimize the total cost and wastage. Still, it does not involve the feasibility regarding the hospital in a network. Holistic Mixed Integer Linear Programming (MILP) model [4] permits the dynamic inventory management, and interacting pumping runs, and it also quickly proves the optimality. However, it does not manually assign the starting products inside the pipeline, and it influences both the Central Processing Unit (CPU) time and the solution quality. Simultaneous Equation Modelling (SEM) [5] provides the concurrent and associated relationship among the demand simulation impact and the sales impact, and it also provides various methods for producing the resulted goods. Yet, it has limitations of data availability, and it also does not differentiate and compute the predicting accuracy of every system. Continuous-time scheduling model [6] accomplishes the optimization of depot inventory management and multiproduct pipeline transportation. But there exist few challenges among the realistic applications and the developed work. These challenges motivated to analyze the influence W-MVO on optimal inventory management in block chain under the cloud sector.


3 Major Assumptions and Structure of Proposed Inventory Management Model 3.1 Structure and Assumptions The three-echelon supply chain inventory model is composed of n suppliers, a manufacturer, a transport chain and o distributors between every echelon. The system is split into manufacturers, suppliers and distributors. A few presumptions are articulated as in [7].

3.2 Problem Definition Parameters of the inventory cost: The term ho^2_{1j} represents the holding cost of the final product j, where the index 1 represents the manufacturer. tc^1_{n1k} indicates the transportation cost of raw material k from supplier n to the manufacturer. sc_{oj} denotes the shortage cost of product j for distributor o. oc^2_{1oj} denotes the fixed order cost of the final product j from distributor o to the manufacturer. ho^1_{nk} represents the holding cost of raw materials at supplier n. ho^3_{oj} denotes the holding cost of the final product at distributor o. tc^2_{1oj} represents the transportation cost of the final product j from the manufacturer to distributor o. oc^1_{n1k} represents the fixed order cost of raw material k from the manufacturer to supplier n. de_{oj}(tim) represents the need for raw materials from the manufacturer during the time tim. In^2_{1j}(tim) denotes the real-time inventory of the completed product during the time tim. In^1_{nk}(tim) represents the real-time inventory of raw material k at supplier n during the time tim. In^3_{oj}(tim) represents the real-time inventory of the completed product for distributor o during the time tim (In^1_{nk}(tim), In^2_{1j}(tim) and In^3_{oj}(tim) are non-negative integers).
Parameters of the time cost: ve^2_{1oj} denotes the delayed transportation cost of the final product j from the manufacturer to distributor o. tr^2_{1oj} denotes the delayed transit time of the final product j from the manufacturer to distributor o. ve^1_{n1k} represents the delayed transportation cost of raw material k from supplier n to the manufacturer. tr^1_{n1k} represents the delayed transit time of raw material k from supplier n to the manufacturer. k = 1, 2, ..., K denotes the index of raw material; n = 1, 2, ..., N denotes the index of supplier inventory; j = 1, 2, ..., J represents the index of the final product; u = 1, 2, ..., U represents the index of the time period; o = 1, 2, ..., O represents the index of distributor inventory.
Parameter initialization of remaining costs: Let the cost of each item be represented as Ic_1, Ic_2, ..., Ic_j, where j represents the finished product. The additional cost to improve quality is represented as Ac_1, Ac_2, ..., Ac_j, in which j represents


the finished product. The supplier setup cost is represented as As_1, As_2, ..., As_n, where n denotes the number of suppliers. The manufacturer setup cost is represented as Am. The distributor setup cost is represented as Ad_1, Ad_2, ..., Ad_n, where n represents the number of distributors.

4 Contribution of Whale-Based Multi-Verse Optimization for Inventory Management 4.1 Proposed Architecture The multi-level three-echelon supply chain is shaped with ’manufacturers, suppliers and distributors’. The architecture of block chain under the cloud sector is represented in Fig. 1. In the developed technique, the five inventory management parameters are integrated via block chain technology inside the cloud environment. These parameters are optimized with the help of the developed W-MVO algorithm. The multi-objective function involves several cost functions. Therefore, with the help of W-MVO, these

Fig. 1 Architecture of proposed inventory management (supplier, manufacturer and distributor linked through block chain under the cloud; W-MVO minimizes the cost functions: transaction, inventory holding, shortage, transportation, time, setup, quality improvement and back-ordering costs)


costs are minimized, and the finally obtained optimal solution is linked to every distributor and stored in the cloud with the help of block chain. The completed optimal solution of each distributor is safeguarded and is not displayed to the other distributors.

4.2 Proposed W-MVO The usage of optimization algorithms has gained high attention among scientists [8]. Multi-Verse Optimization (MVO) [9] is inspired by the multi-verse theory associated with the big bang. Although it has several advantages, it also suffers from disadvantages: a binary version and a multi-objective variant are not provided. Therefore, to overcome these disadvantages, the Whale Optimization Algorithm (WOA) is integrated into it, and the resultant algorithm is known as W-MVO. WOA [10, 11] is a nature-inspired meta-heuristic technique, which has the capacity to handle different problems. When compared with other optimization algorithms, WOA has many advantages, such as its exploration and exploitation capabilities. The two optimization techniques are integrated to generate a hybrid optimization algorithm. Generally, in the conventional MVO, if ran2 < WEP the mechanism is updated using Eq. (1), and if ran2 ≥ WEP, the same solution is retained. But in the proposed W-MVO, if ran3 < 0.5, the solution is updated using Eq. (1) of MVO; otherwise, if ran2 ≥ WEP, the location of the individual is updated using the WOA according to Eq. (2).

c_gp = C_g + TDR × (ub_g − lb_g) × ran4 + lb_g,   if ran3 < 0.5 and ran2 < WEP
c_gp = C_g − TDR × (ub_g − lb_g) × ran4 + lb_g,   if ran3 ≥ 0.5 and ran2 < WEP
c_gp = c_gp,                                       if ran2 ≥ WEP          (1)

c(r + 1) = H′ · e^(c·ran) · cos(2π·ran) + c*(r)          (2)

In the above equations, r represents the iteration, c represents the solution, c in the exponent represents a constant, c* represents the location of the prey, H′ = |c*(r) − c(r)| denotes the distance of the whale to the prey, ran represents a random number in the interval [−1, 1], and · represents element-by-element multiplication.
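As an illustration of the switching rule in Eqs. (1) and (2), the following Python sketch updates a population of candidate solutions. The array names (positions, best, lb, ub), the spiral constant b and the final clipping to the bounds are assumptions made for illustration only; this is a sketch of the update logic, not the authors' implementation.

```python
import numpy as np

def wmvo_update(positions, best, lb, ub, wep, tdr, b=1.0):
    """One W-MVO position update: MVO-style move when ran2 < WEP (Eq. 1),
    WOA spiral move otherwise (Eq. 2). `best` is the current best solution."""
    new_pos = positions.copy()
    n_sol, n_dim = positions.shape
    for p in range(n_sol):
        for g in range(n_dim):
            ran2, ran3, ran4 = np.random.rand(3)
            if ran2 < wep:                               # Eq. (1): MVO update
                step = tdr * (ub[g] - lb[g]) * ran4 + lb[g]
                new_pos[p, g] = best[g] + step if ran3 < 0.5 else best[g] - step
            else:                                        # Eq. (2): WOA spiral update
                ran = np.random.uniform(-1, 1)
                dist = abs(best[g] - positions[p, g])    # H' = |c*(r) - c(r)|
                new_pos[p, g] = dist * np.exp(b * ran) * np.cos(2 * np.pi * ran) + best[g]
    return np.clip(new_pos, lb, ub)                      # keep solutions inside the bounds
```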


Fig. 2 Solution encoding (the encoded solution contains the inventory and transit-time parameters In^1_{nk}(tim), In^2_{1j}(tim), In^3_{oj}(tim), tr^1_{n1k} and tr^2_{1oj}, together with the failure probabilities P_1, ..., P_ne and the backorder counts N_1, ..., N_be)

4.3 Solution Encoding The actual tuning or optimization of the parameters aims to reduce the multi-echelon supply chain cost of the inventory technique. Along with these parameters, the probability of failure P_n and the number of backorders N_n are also taken as decision variables in the solution encoding. The boundary value of P_n lies in the range [0.3, 0.9], and the boundary value of N_n lies in the range [0, 5]. The diagrammatic representation of the solution encoding is given in Fig. 2. Here, j represents the finished product. The proposed W-MVO algorithm is employed to acquire the minimum objective function, thereby obtaining a better solution for inventory management.

4.4 Objective Model The aim is to reduce the 'multi-level inventory cost'.
1. Transaction cost: It involves the transaction cost between the manufacturer and suppliers and between manufacturers and distributors as in Eq. (3).

O = Σ_{n=1}^{N} Σ_{k=1}^{K} O^1_{n1k} + Σ_{o=1}^{O} Σ_{j=1}^{J} O^2_{1oj}    (3)

2. Inventory holding cost: It involves the cost of the manufacturer, suppliers and distributors as in Eq. (4).

H = Σ_{n=1}^{N} Σ_{k=1}^{K} ho^1_{nk} · In^1_{nk}(tim) + Σ_{j=1}^{J} ho^2_{1j} · In^2_{1j}(tim) + Σ_{o=1}^{O} Σ_{j=1}^{J} ho^3_{oj} · In^3_{oj}(tim)    (4)

3. Shortage cost: It is defined by the need of raw materials from the manufacturer de_{oj}(tim), the shortage cost sc_{oj} and the real-time inventory of the final product In^2_{oj}(tim) as in Eq. (5).

S = Σ_{o=1}^{O} Σ_{j=1}^{J} sc_{oj} · (de_{oj}(tim) − In^2_{oj}(tim))    (5)

4. Transportation cost: It involves the transport cost between distributors and manufacturers and between manufacturers and suppliers as in Eq. (6).

Tr = Σ_{n=1}^{N} Σ_{k=1}^{K} tc^1_{n1k} · In^1_{nk}(tim) + Σ_{o=1}^{O} Σ_{j=1}^{J} tc^2_{1oj} · In^2_{1j}(tim)    (6)

5. Time cost: It involves the time cost between distributors and manufacturers and between the manufacturer and the suppliers as in Eq. (7).

T = Σ_{n=1}^{N} Σ_{k=1}^{K} ve^1_{n1k} · tr^1_{n1k} + Σ_{o=1}^{O} Σ_{j=1}^{J} ve^2_{1oj} · tr^2_{1oj}    (7)

6. Setup cost: It is defined as the cost sustained to get equipment prepared to process a divergent quantity of goods as in Eq. (8).

Sec = Σ_{n=1}^{N} As_n [ Σ_{k=1}^{K} tc^1_{n1k} · In^1_{nk}(tim) ] + Am + Σ_{o=1}^{O} Ad_o [ Σ_{j=1}^{J} tc_{ok} · In^2_{kj}(tim) ]    (8)

7. Quality improvement cost: It is defined as the probability of additional costs incurred that result after the finished product as in Eq. (9).

QIC = Σ_{k=1}^{K} P_k · Ac_j    (9)

In the above equation, P represents the probability, and Ac represents the additional cost.
8. Back-ordering cost: It is defined as a type of cost that is sustained by an inventory when it is not able to complete an order and must finish it after some time later as in Eq. (10).

BOC = Σ_{k=1}^{K} Ic_j · b_j    (10)

Here, Ic represents the item cost, and b denotes the number of backorders. The multi-echelon supply chain inventory model's objective function is given in Eq. (11).

Z = α(O + H + Tr + S + Sec) + βT + γ(QIC + BOC)    (11)


In the above equation, the values of α, β and γ are represented as α = 0.5, β = 0.2 and γ = 0.3.
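To make the role of the weights concrete, the following minimal Python sketch assembles Eq. (11) from already-computed component costs. The numeric values are placeholders; the component costs themselves would come from Eqs. (3)-(10).

```python
def total_inventory_cost(O, H, Tr, S, Sec, T, QIC, BOC,
                         alpha=0.5, beta=0.2, gamma=0.3):
    """Weighted multi-objective cost of Eq. (11):
    Z = alpha*(O + H + Tr + S + Sec) + beta*T + gamma*(QIC + BOC)."""
    return alpha * (O + H + Tr + S + Sec) + beta * T + gamma * (QIC + BOC)

# Illustrative component values only; in the model they are computed from Eqs. (3)-(10).
z = total_inventory_cost(O=120.0, H=340.5, Tr=210.0, S=55.0,
                         Sec=80.0, T=40.0, QIC=12.5, BOC=30.0)
```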

5 Results and Discussion 5.1 Simulation Setup The developed inventory management in block chain technology under the cloud sector was implemented in MATLAB 2018a, and the analysis was executed. The characteristics of the developed technique were analyzed by taking into account three test cases. The total population size was set to 10, and the maximum number of rounds was 1000. The behaviour of the developed W-MVO was evaluated through an analysis of the proposed W-MVO and a statistical analysis obtained by varying the travelling distance rate over 0.2, 0.4, 0.6, 0.8, 1.0 and 1.2.

5.2 Analysis of Proposed W-MVO The analysis of the proposed W-MVO is graphically represented in Fig. 3. The travelling distance rate of the W-MVO is varied from 0.2 to 1.2, and the analysis is performed. In Fig. 3a, for test case 1, the proposed W-MVO performs well for any travelling distance rate under consideration. The cost functions obtained for all TDR values overlap, and therefore it is concluded that the proposed W-MVO attains the minimum cost function for any TDR. At the 500th iteration, the cost function of the proposed W-MVO is maximum at TDR = 1.0. Hence, it can be concluded that the proposed W-MVO is well suited for inventory management in block chain under the cloud sector.

6 Conclusion This paper analyzed the influence of the hybrid W-MVO on optimal inventory management in block chain under the cloud sector. Costs such as transaction cost, inventory holding cost, shortage cost, transportation cost, time cost, setup cost, back-ordering cost and quality improvement cost were considered for deriving the multi-objective model. The effectiveness of the proposed hybrid algorithm was analyzed by varying the TDR value, and the model was evaluated with the assistance of block chain under the cloud sector. Moreover, from the analysis, the cost function of the proposed W-MVO is maximum at TDR = 1.2. Thus, it can be concluded that the proposed


Fig. 3 Algorithmic analysis of the proposed W-MVO for inventory management in block chain under cloud sector by varying the TDR for ‘a test case 1, b test case 2 and c test case 3’

W-MVO-based block chain under the cloud sector performed effectively when it was analyzed with various TDR values.

References 1. Chukwuemeka, G.H., Onwusoronye, O.U.: Inventory management: pivotal in effective and efficient organizations. A case study. J. Emerg. Trends Eng. Appl. Sci. 4(1), 115–120 (2013) 2. Saha, E., Ray, P.K.: Modelling and analysis of inventory management systems in healthcare: a review and reflections. Comp. Ind. Engine. 137, 1–16 (2019) 3. Dillon, M., Oliveira, F., Abbasi, B.: A two-stage stochastic programming model for inventory management in the blood supply chain. Int. J. Prod. Econ. 187, 27–41 (2017) 4. Mostafaei, H., Castro, P.M., Relvas, S., Harjunkoski, I.: A holistic MILP model for scheduling and inventory management of a multiproduct oil distribution system. Omega. 1–47 (2019) 5. Chuang, C.-H., Zhao, Y.: Demand stimulation in finished-goods inventory management: empirical evidence from general motors dealerships. Int. J. Prod. Econ. 208, 208–220 (2019)


6. Yu, L., Chen, M., Xu, Q.: Simultaneous scheduling of multi-product pipeline distribution and depot inventory management for petroleum refineries. 220 (2020) 7. Wang, Y., Geng, X., Zhang, F., Ruan, J.: An immune genetic algorithm for multi-echelon inventory cost control of IOT based supply chains. IEEE Access. 6, 8547–8555 (2017) 8. Rajakumar, B.R.: Impact of static and adaptive mutation techniques on genetic algorithm. Int. J. Hybrid Intell. Syst. 10(1), 11–22 (2013) 9. Mirjalili, S., Mirjalili, S.M., Hatamlou, A.: Multi-verse optimizer: a nature-inspired algorithm for global optimization. Neural Comput. Appl. 27, 495–513 (2016) 10. Mirjalili, S., Lewis, A.: The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016) 11. Beno, M.M., Valarmathi, I.R., Swamy, S.M., Rajakumar, B.R.: Threshold prediction for segmenting tumour from brain MRI scans. Int. J. Imaging Syst. Technol. 24(2), 129–137 (2014)

Bottleneck Feature Extraction in Punjabi Adult Speech Recognition System Shashi Bala, Virender Kadyan, and Vivek Bhardwaj

Abstract In this paper, a bottleneck feature extraction technique with an MLP is applied to Punjabi adult speech recognition. Nowadays, neural networks are among the most widely used approaches for training and testing such systems. They help to estimate the posterior probabilities over the various phoneme sets. This input information can at some point become entangled, making it difficult to train it on Hidden Markov Model (HMM) based state-of-the-art systems. Here, a context-dependent model is trained with a Deep Neural Network (DNN) and after that with a Bottleneck Neural Network (BN-NN) system with the use of a Multi-Layer Perceptron (MLP). The baseline ASR is evaluated under different environment conditions with different modelling systems. To improve the performance of the system, an MLP-based supervised learning method utilizes data from adjoining voice frames to change the design of the deep neural network (DNN) by extracting the bottleneck features. Finally, the MLP features are used as input for the DNN-HMM and BN-NN state-of-the-art systems. This paper presents the larger improvement obtained by applying the MLP feature vector, with a relative improvement of 4.03% achieved on the Punjabi ASR by varying several attributes associated with the BN-NN and DNN-HMM modelling approaches. Keywords BN-NN · Mel-frequency cepstral coefficients (MFCC) · MLP · DNN-HMM

S. Bala · V. Bhardwaj Chitkara University Institute of Engineering & Technology, Chitkara University, Rajpura, Punjab, India e-mail: [email protected] V. Bhardwaj e-mail: [email protected] V. Kadyan (B) Department of Informatics, School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_53


1 Introduction Neural networks have become part of day-to-day human life over the past few years as we move steadily towards human-machine interaction. However, it has remained a major concern for many researchers how to recognize the patterns of a speech process. The need for pattern recognition gave rise to many techniques such as HMM and GMM [1–3]. Besides these, probabilistic bottleneck approaches [4–6] for HMM-GMM [1–3] acoustic modelling (e.g. MLP based) have additionally been investigated as an alternative methodology to the Hidden Markov Model system [1–3]. With regard to estimating posterior likelihoods, NN-based feature extraction with two hidden layers can be considered a procedure of non-linear feature transformation, while the BN-NN approach is utilized as a non-linear discriminative analysis that can be interpreted as a dimension-reduction method for the state-of-the-art framework. The BN-NN features are basically concatenated with MFCC, which serves as the output for posterior features. In the light of the ongoing success of deep neural networks in hybrid acoustic modelling, the initial step is the calculation of BN features as taken earlier []. In order to fuse state-of-the-art information into HMM-GMM, as [4] showed, the combination with BN-NN and various related ideas resulted in better system performance. Additionally, the MLP-NN is trained on MFCC feature extraction with a five-layer BN structure, where the neural network estimates phoneme posteriors. Consequently, we have built the BN-NN features on MFCC features. To begin with, the output layer is optimized; second, the further structure is used for MLP training on the output of the DNN-HMM and BN-NN linear transformation for Punjabi speech recognition. The rest of this paper is organized as follows: Sect. 2 gives related work on BN features, and a full description of the BN-NN is given in Sect. 3. In Sect. 4, the whole system overview is summarized for BN feature extraction. Section 5 describes the experimental setup with the corpus table. Results and analysis are reported in Sect. 6, and finally some conclusions in Sect. 7.

2 Related Work In [5], the authors generated bottleneck features with the help of an ANN structure, obtained a 33.3% WER and showed a 2.5% improvement over the HLDA-PLP baseline of the state-of-the-art system. Also, [7] presented bottleneck features for an LVCSR dataset with some reduction in the WER of the system, whereas [8] presented deep neural nets exploring rectified linear units for LVCSR and utilizing a Bayesian optimization approach for relative improvement of the system. In [4], the authors presented a system for training multilingual MLPs and likewise characterized the use of a language-dependent layer on top of the traditional three layers, which is used to derive phoneme posteriors. This methodology allows resources to be shared across languages without needing to develop a common phoneme set. Likewise, Morgan et al. [9] proposed a novel method to manage multi-layer perceptron factor analysis in which a five-layer MLP with a normalized linear bottleneck layer can outperform a three-layer MLP system using MFCC alone. While talking about bottlenecks, in [1], Michael et al. presented DNN-based acoustic modelling, finding that it can match ASR system performance on Aurora without defining any noise. Kadyan et al. [3] have described various normalizations of the database using RASTA channel standardization of features before input to the MLP, obtaining an 18% relative improvement in WER.


manage multi-layer perceptron factor analysis using five-layer MLP with a normalized linear bottleneck layer can outperform three-layer MLP system using MFCC alone. Therefore, while taking about bottleneck, in [1], Michael et al. presented DNN-based acoustic modelling for finding that they can match ASR system production with Aurora without definite any noise. Kadyan et al. [3] have describe various normalizing databases with using RASTA channel standardization of feature before input to the MLP getting 18% relative improvement of WER.

3 Theoretical Background 3.1 Bottleneck This approach is presented by Grezl et al. [10] which can be translated as a non-direct dimensionality reduction method it fundamentally dependent on MLP approach, where the internal layers has a small hidden unit, like the size of another hidden layer. These layer makes a limitation/necessity in the system that must have the option to produce compressed features after compelling the dimensionality reduction. Therefore, bottleneck features can be derived using both unsupervised and supervised method [11]. In supervised training, decoder is used to train acoustic model in several languages and conditions [4–6, 10]. The system comprises of an encoder and a decoder as shown in Fig. 1. The input consists classifier with hidden vector x encoded to hidden layer h which calculates the posterior probability over HMM state. x is encoded to hidden layer h by a non-linear activation function σ, using learned weight matrix W (1) and bias vector b(l) as follows: Fig. 1 Structure of bottleneck feature with decoder

496

S. Bala et al.

  h = w(1) x + b(1)

(1)

after that, input layer is decoded from the hidden layer to produce a reconstructed layer y using learned weight matrix W (2) and bias layer b(2) as follows:   y = w(1) h + b(1)

(2)

The autoencoder parameter θ = (W (1) , b(1) ), (W (2) , b(2) ) is learned using backpropagation algorithm by minimizing the mean square error (MSE) loss(m) as defined: MMSE (θ ) =

1 m MSE (x, y) d

(3)

The learning process attempts to minimize the prediction error L (x, y) with respect to the parameter θ = (W (1) , b(1) ), (W (2) , b(2) ), …, (W (L) , b(L) ). Typically, the loss function in MLP is the cross-entropy error function [12]. Bottleneck features provide more effective information while preserving enough information of the original input features.
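A minimal NumPy sketch of the forward pass of Eqs. (1)-(3), assuming a sigmoid activation, illustrative layer sizes (39-dimensional input, 13-dimensional bottleneck) and randomly initialized weights; the back-propagation training itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_bn = 39, 13                                                 # illustrative dimensions
W1, b1 = rng.standard_normal((d_bn, d_in)) * 0.1, np.zeros(d_bn)    # encoder parameters
W2, b2 = rng.standard_normal((d_in, d_bn)) * 0.1, np.zeros(d_in)    # decoder parameters

x = rng.standard_normal(d_in)        # one input feature vector
h = sigmoid(W1 @ x + b1)             # Eq. (1): bottleneck (hidden) representation
y = sigmoid(W2 @ h + b2)             # Eq. (2): reconstruction of the input
mse = np.mean((x - y) ** 2)          # Eq. (3): reconstruction error to be minimized
```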

4 System Overview The bottleneck-NN based Punjabi ASR has been described in Fig. 2. First, the system is trained and tested on bottleneck-NN. For evaluating the accuracy, the front-end

Fig. 2 Block diagram summarizing the BN-NN features for enhancing Punjabi ASR system


feature extraction technique MFCC is used in the BN-NN based ASR solution. To improve its performance, the BN-NN-based ASR is trained on MLP features using the Kaldi toolkit [13]. In the training and testing phases, the input speech is processed using a 20 ms window at a frame rate of 100 Hz with a pre-emphasis factor of 0.97. The extracted input frames are converted into the frequency domain through the DFT, which helps to remove the phase information from the short-term spectrum output. The Fourier signal is additionally passed through 25 filter banks. Finally, the DCT is used to transform the Mel-frequency spectrum, which effectively delivers several sets of de-correlated cepstral coefficients; these coefficients are used to remove higher-order information. The output obtained is 13 default coefficients with a splicing factor: a context of 9 frames, with 4 on the left and 4 on the right, has been analyzed. The main output obtained with the 13 default MFCCs is trained with HMM modelling. Following the feature extraction procedure, the monophones and the Δ + ΔΔ (delta + delta-delta) triphone features are computed on the tri2 models. Further, these Δ + ΔΔ features are joined with the static cepstra into a 39-dimensional feature vector. To improve the performance of the framework, LDA and MLLT techniques have been applied to obtain 40-dimensional time-spliced reduced features. These features are further processed with the fMLLR approach in the tri4 model, where speaker adaptive training is used to handle the tri4 features. The output has been obtained on triphone modelling, and the model has been given to the baseline GMM-HMM and DNN-BN approaches. For GMM-HMM, a three-state HMM using eight diagonal-covariance mixtures per state is used; a total of 2500 leaves and 30,000 Gaussians are selected. Further, the DNN system is trained with a tanh non-linearity model with variation in the hidden layers. For improvement of the DNN-HMM system, the learning rate and epochs are tuned with a mini-batch size of 512.
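The sketch below illustrates, under simplifying assumptions, the two feature expansions mentioned above: appending delta and delta-delta coefficients to the 13 static MFCCs (giving 39 dimensions) and splicing a 9-frame context (giving 117 dimensions) before LDA. Simple finite differences stand in for the toolkit's regression-based deltas.

```python
import numpy as np

def add_deltas(mfcc):
    """Append first- and second-order differences to static MFCC frames (T x 13)."""
    delta = np.gradient(mfcc, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return np.hstack([mfcc, delta, delta2])

def splice(feats, context=4):
    """Stack each frame with `context` left and right neighbours (9 frames in total),
    repeating the edge frames as padding."""
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(feats)] for i in range(2 * context + 1)])

mfcc = np.random.randn(100, 13)        # placeholder static 13-dim MFCC frames
feats_39 = add_deltas(mfcc)            # 100 x 39: static + delta + delta-delta
feats_117 = splice(mfcc, context=4)    # 100 x 117: 9-frame context, as fed to LDA
```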

5 Experimental Setup For the experiments, we employ two sets of corpora, with 422 phonetically rich sentences and connected words generated from the 5000 most frequent words. Later, these sets were combined to form a single set. Repetitions of the unique sentences and words were further produced by 20 different speakers. A Roman transcription was made from the audio, keeping the linguistic characteristics of the Punjabi language in mind. There are 7 male and 13 female speakers in the synthetically created dataset. We divide the collected dataset into two sets: 70% for training and the remaining 30% for testing. Table 1 represents the training and testing partitions of the combined dataset. To analyze the performance on the obtained dataset, two parameters were employed, i.e. word error rate (WER) and relative improvement (RI).


Table 1 Corpus specification of bottleneck feature extraction

Type                      | Test              | Train
No. of speakers           | 6                 | 14
Language used             | Punjabi
Type of data              | Phonetically rich sentences and isolated words
Age of speakers           | 18–26
Total no. of audio files  | 1211              | 2866
Gender                    | 3 male, 3 female  | 4 male, 10 female

6 Results and Discussion This section reports the accuracy of the system. To measure the performance of the ASR utilizing the BN-NN and DNN-HMM, metrics such as word accuracy, WA = 100 × ((TW − S − D − I)/TW), and word error rate, WER = (S + D + I)/TW, are used, where TW is the total number of words and S, D and I are the numbers of substitutions, deletions and insertions.
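The two metrics can be written directly as small functions; the counts below are illustrative placeholders.

```python
def word_error_rate(S, D, I, TW):
    """WER = (S + D + I) / TW for substitution, deletion and insertion counts."""
    return (S + D + I) / TW

def word_accuracy(S, D, I, TW):
    """WA = 100 * (TW - S - D - I) / TW."""
    return 100.0 * (TW - S - D - I) / TW

print(word_error_rate(S=20, D=5, I=7, TW=800))   # 0.04
print(word_accuracy(S=20, D=5, I=7, TW=800))     # 96.0
```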

6.1 Performance Measure in Clean Environment of GMM-HMM with MFCC and DNN For the entire dataset, training and testing were done using the corpus of the system, i.e. speaker-independent in the training phase and speaker-dependent in the testing phase. The acoustic models were trained on these datasets, and a corresponding language model of size 5 k was utilized. The input speech signal was processed to generate acoustic features using 13 static MFCC + Delta + Double Delta coefficients. It was likewise seen that a later linear discriminative analysis (LDA) reprojects these extracted acoustic features and essentially improved the training of the small-vocabulary dataset. The initial 13 MFCC features in combination with nine frames resulted in 117 dimensions, which were further reduced to 40 dimensions through the LDA approach. Apart from this, these features were additionally utilized on HMM state alignments using triphone models (Table 2).

Table 2 WER verified through acoustic modelling with MFCC

Model    | Mono | Tri1 | Tri2 | Tri3 | Tri4 | DNN  | BN-NN
WER (%)  | 6.50 | 8.09 | 8.07 | 7.88 | 5.76 | 4.12 | 4.03


Table 3 WER verifying learning rate on different modelling techniques

Learning rate | 0.005–0.0005 | 0.010–0.0010 | 0.015–0.0015 | 0.020–0.0020
DNN           | 4.12         | 3.98         | 4.03         | 4.11
BN-NN         | 4.03         | 4.06         | 4.05         | 3.80

6.2 Performance Measure Through Learning Rate For successful training of the DNN-HMM system, different variations were trained by varying key parameters, i.e. the learning rate, the number of iterations and the number of epochs, as follows. Further, to boost the accuracy of the explored system, different values of the learning rate are analyzed with the gradient descent algorithm, where the layer weights are updated through the gradient of the error in order to compress the error rate. Therefore, the system is examined using a predetermined piecewise-constant learning rate schedule, which clearly specifies when to change the learning rate and to what value [12]. Table 3 shows the results obtained for the learning rate; the system finally achieves its maximum efficiency for DNN and BN-NN at 0.010–0.0010 and 0.020–0.0020, respectively.
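One possible reading of such a predetermined piecewise-constant schedule is sketched below; the epoch boundaries and intermediate rates are assumptions chosen only to decay from 0.010 to 0.0010 over 30 epochs.

```python
def piecewise_constant_lr(epoch, boundaries=(10, 20, 25),
                          rates=(0.010, 0.005, 0.002, 0.0010)):
    """Return the learning rate for a given epoch from a fixed piecewise-constant
    schedule: rates[i] is used until epoch reaches boundaries[i]."""
    for boundary, rate in zip(boundaries, rates):
        if epoch < boundary:
            return rate
    return rates[-1]

schedule = [piecewise_constant_lr(e) for e in range(30)]   # 0.010 decaying to 0.0010
```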

6.3 Performance Measure Through Epochs To find the best results, passing the entire dataset through the neural network once is not enough. Basically, an impulse in an input signal is caused by the 'vocal folds' closing according to the pitch, and such an instant is called an 'epoch'. The major excitation in the vocal tract is due to the glottal pulse, where significant excitation takes place at each epoch location. The calculation simply involves working out the difference between the observed and desired output for each unit, then adding up all these squared differences for each output unit and for each input signal [21]. Therefore, in Table 4, the epoch values for the audio file inputs range over 15, 20, 25 and 30, where the system obtains the finest result at epochs_30 for BN-NN, with no change in the DNN-HMM system.

Table 4 WER verifying epochs on different modelling techniques

No. of epochs | Epochs_15 | Epochs_20 | Epochs_25 | Epochs_30
DNN           | 4.10      | 4.12      | 4.04      | 4.03
BN-NN         | 4.14      | 4.03      | 4.97      | 3.64


7 Conclusion The work proposed here focuses on the effect of the feature vector on Punjabi-language ASR with BN-NN. To further extend its effectiveness, these variations have been projected onto the DNN-HMM system. Prior to model training, optimal values of the learning rate and number of epochs are selected to produce effective results. Overall, the system is evaluated on original and synthetic speech corpora, where a gain has been obtained through fMLLR speaker-adaptive training of the network, which gives the finest performance of the system. The output of the system with BN-NN achieved a relative improvement of 3.33% over the conventional GMM-HMM and DNN-HMM systems.

References 1. Seltzer, M. L., Yu, D., Wang, Y.: An investigation of deep neural networks for noise robust speech recognition. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 7398–7402 (2013) 2. Patel, I., Rao, Y.S.: Speech recognition using hmm with mfcc-an analysis using frequency specral decomposion technique. Sig. Image Process. Int. J. (SIPIJ) 1(2), 101–110 (2010) 3. Kadyan, V., Mantri, A., Aggarwal, R.K.: Improved filter bank on multitaper framework for robust Punjabi-ASR system. Int. J. Speech Technol. 1–14 (2019) 4. Yu, D., Seltzer, M.L.: Improved bottleneck features using pretrained deep neural networks. In: Twelfth Annual Conference of the International Speech Communication Association (2011) 5. Grézl, F., Karafiat, M., Burget, L.: Investigation into bottle-neck features for meeting speech recognition. In: Tenth Annual Conference of the International Speech Communication Association (2009) 6. Grézl, F., & Karafiát, M.: Hierarchical neural net architectures for feature extraction in ASR. In: Eleventh Annual Conference of the International Speech Communication Association (2010) 7. Veselý, K., Karafiát, M., Grézl, F.: Convolutive bottleneck network features for LVCSR. In: 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 42–47 (2011) 8. Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8609–8613. IEEE (2013) 9. Morgan, N.: Deep and wide: multiple layers in automatic speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(1), 7–13 (2011) 10. Grézl, F., Karafiát, M., Kontár, S., Cernocky, J.: Probabilistic and bottle-neck features for LVCSR of meetings. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07, vol. 4, pp. IV-757. IEEE (2007) 11. Valente, F., Magimai-Doss, M., Wang, W.: Analysis and comparison of recent mlp features for lvcsr systems. In: Twelfth Annual Conference of the International Speech Communication Association (2011) 12. Essays, UK. (November 2018). Speech Recognition using Epochwise Back Propagation. Int. J. Comput. Appl. 0975 – 8887. Retrieved from https://www.ukessays.com/essays/computerscience/speech-recognition-using-epochwise-8817.php?vref= 13. Yegnanarayana, B., Murty, K.S.R.: Event-based instantaneous fundamental frequency estimation from speech signals. IEEE Trans. Audio Speech Lang. Process. 17(4), 614–624 (2009) 14. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(1), 30–42 (2011)


15. Bourlard, H., Morgan, N.: Continuous speech recognition by connectionist statistical methods. IEEE Trans. Neural Netw. 4(6), 893–909 (1993) 16. Grézl, F., Karafiat, M., Janda, M.: Study of probabilistic and bottle-neck features in multilingual environment. In: 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 359–364. IEEE (2011, December) 17. Grezl, F., Fousek, P.: Optimizing bottle-neck features for LVCSR. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4729–4732. IEEE (2008) 18. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J.: The Kaldi speech recognition toolkit. In: IEEE 2011 Workshop on Automatic Speech Recognition and Understanding (No. CONF). IEEE Sig. Process. Soc. (2011) 19. Rodríguez, L.J., Torres, I.: Comparative study of the baum-welch and viterbi training algorithms applied to read and spontaneous speech recognition. In: Iberian Conference on Pattern Recognition and Image Analysis, pp. 847–857. Springer, Berlin, Heidelberg (2003) 20. Senior, A., Heigold, G., Ranzato, M. A., Yang, K.: An empirical study of learning rates in deep neural networks for speech recognition. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6724–6728

A Study of Machine Learning Algorithms in Speech Recognition and Language Identification System Aakansha Mathur and Razia Sultana

Abstract Speech recognition is a broad topic that primarily involves sub-topics like language identification, speaker identification, speech emotion recognition, speech-to-text systems, text-to-speech systems, dialogue systems and much more. Human beings are quickly able to recognize or identify a language because of the corpora of knowledge built over the years; however, it is a challenging task to have a machine identify a spoken language. So, building a system that can correctly identify multiple languages irrespective of the dialect and speaker characteristics is an interesting area of research. One benefit of such a LID system is that the barrier between people caused by language differences will be broken. Such a system will further the progress of globalization. The latest developments of machine learning in speech and language are described as a detailed state of the art in this paper. Keywords Machine learning · Speech recognition · Support vector machines · Classification · Language

1 Introduction Every language across the world can be recognized by a machine learning algorithm using an identification pattern. A language identification system (LID) aims to detect the language spoken in an audio speech signal or file. Most LID systems proposed in the research consist of two stages: 1. Feature extraction stage: this primarily involves extraction of audio signal features like melody and stress. 2. Classification stage. Before the feature extraction stage, the input speech signals are preprocessed. A. Mathur (B) · R. Sultana Department of Computer Science, BITS Pilani, Dubai, United Arab Emirates e-mail: [email protected] R. Sultana e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_54


1.1 Preprocessing The preprocessing steps include the following.

1.1.1 Pre-emphasis

The pre-emphasis stage primarily involves smoothening the frequency spectrum. This method improves the efficiency of the system by amplifying the high-frequency part of signal.

1.1.2 Framing

Framing is the method of dividing a speech signal into frames. Each frame is usually 20–30 milliseconds long. The frames overlap each other for some milliseconds; a common overlap time is 20 milliseconds.

1.1.3 Windowing

Windowing lessens the discontinuities at the frame boundaries. A window function is applied to every frame; a commonly used window function is the Hamming window.
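The three preprocessing steps above can be sketched in a few lines; the sampling rate, pre-emphasis coefficient and the 30 ms frame length with 20 ms overlap are illustrative choices consistent with the ranges quoted in this section.

```python
import numpy as np

def preprocess(signal, sr=16000, alpha=0.97, frame_ms=30, overlap_ms=20):
    """Pre-emphasis, framing and Hamming windowing of a 1-D speech signal."""
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])   # pre-emphasis
    frame_len = int(sr * frame_ms / 1000)
    shift = frame_len - int(sr * overlap_ms / 1000)                       # framing with overlap
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // shift)
    frames = np.stack([emphasized[i * shift:i * shift + frame_len]
                       for i in range(n_frames)])
    return frames * np.hamming(frame_len)                                 # windowing

frames = preprocess(np.random.randn(16000))   # 1 s of placeholder audio at 16 kHz
```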

1.2 Machine Learning in Speech Processing The classification stage involves selecting a classifier, passing the extracted features to the classifier and identifying the language. Earlier research had involved using only audio signal processing techniques for language identification. In other words, only signal processing techniques were used for both stages in a LID system. A common signal processing method used for classification purposes is vector quantization method. However, as research progressed, many researchers began using machine learning classification algorithms like Gaussian mixture models (GMM), decision trees (DT), K-nearest neighbours (K-NN), support vector machines (SVM), artificial neural networks (ANN) and deep neural networks (DNN). These machine learning classifiers performed very well in a LID system. This also highlights the application of machine learning in speech recognition. While the classification stage of a LID system has used machine learning techniques, the feature extraction stage predominantly uses signal processing techniques.


1.3 Types of Features There are various speech signal features that are extracted for discriminating between languages. There are two types of features: low-level and high-level speech features. The different types of low-level features are acoustic features, phonotactic features and prosodic features. The different types of high-level features are lexical and syntactic features. Most of the research is focused on using low-level features for language identification.

1.3.1 Acoustic Features

Another stream of features called acoustic features is obtained through two techniques: linear prediction and mel frequency cepstral coefficients (MFCC) based. The linear prediction techniques give linear predictive coding (LPC), linear prediction cepstral coefficients (LPCC) and perceptual linear predictive features (PLP). The MFCC features are widely used in research because of their robustness and ability of eliminating speaker-dependent features.

1.3.2 Auditory Features

Furthermore, MFCC, PLP and RASTA-PLP are auditory features, while MFCC and LPCC are static features. Auditory features use the filter bank method for extracting features and are inspired by the human hearing system. Static features involve dividing speech signals into frames to obtain static characteristics; these static characteristics vary with time. Another notable feature type in speech processing is the phonotactic feature, involving the study of phonemes and the sound system of a language. The smallest unit in a language is the phoneme, and phonemes are used to construct meaningful parts of a language. A phoneme itself does not have a meaning. Phonology is a field concerned with the functioning of the sounds in a language; the objective of the sound functions is to make speech meaningful. Prosodic features like melody (pitch), stress, intonation, duration of speech and rhythm are extracted in the research.

1.3.3 Lexical Features

Lexical features are a type of high-level features and deal with a language’s word structure. The research with lexical features primarily involves extracting words from the speech and building word-level LID systems.

1.3.4 Syntactic Features

Syntactic features are concerned with the order of words and the sentence structure of a language. Not many LID systems have been constructed using syntactic features. Researchers have used the extracted features individually and in combination in LID systems. Often researchers compensate for noise in the input speech signal to improve the performance of the LID system; it is up to the researcher whether noise should be compensated or not. However, recent research has attempted to identify sub-languages. For instance, the Indian subcontinent consists of many sub-languages, and researchers have tried to identify sub-languages such as Tamil, Hindi, Punjabi or Assamese. The data sets used by researchers primarily consist of speeches from local news and radio broadcasts. For instance, researchers using Indian languages derived their data set from All India Radio broadcasts or the Doordarshan Television Network. Moreover, the data sets consist of male and female speakers and speakers with different dialects to add variability. The extracted features are input to the classification algorithm. The classifier first trains itself on these features and then recognizes the language in an unknown audio signal. So, researchers have further split the classification stage into a learning phase and a recognition phase. Various studies have been conducted to improve the learning phase, which in turn improves the performance of the LID system. Let us now look at the chronological development of research in language identification in recent years. The next four sections explain a few of the LID models, followed by the conclusion.

2 Language Identification Model I The objective of the research [1] was to build a LID for three types of Indonesian languages. The research extracted high-level speech features and phonotactic features. The research used two phonotactic feature extraction methods: 1. Phone recognition followed by language modelling (PRLM) 2. Parallel phone recognition followed by language modelling (PPRLM).

2.1 Methodology The research analysed and compared the performance of the two phonotactic methods. The input to PRLM is a speech signal. The PRLM method first performs phone recognition and then classifies the phones into the target languages. The PRLM system consists of a single universal phone recognizer. The universal phone recognizer is created using an n-gram statistical model; that is, the likelihood of a sequence of phones appearing in a certain language is calculated. Phone recognition from a speech signal tabulates a log-likelihood for each language.


The language of a speech sample is determined by the maximum log-likelihood value. The PPRLM method uses multiple phone recognizers. Each phone recognizer identifies the language for the phones of a speech signal, and each acts as a language model for a different language. The log-likelihood values are tabulated from each language model and compared against each other; the language in a speech sample is determined by the maximum log-likelihood value. The research used a phone recognizer developed by the Brno University of Technology, which was used to identify phones in four languages: Czech, English, Mandarin and Russian. Eighteen speech recordings (three languages × six speakers) are in the database, with an equal number of male and female speakers. The speech clips are sampled at a frequency of 16 kHz.

2.2 Data set and Putting into Practice The data set is divided into a training subset, a development subset and a test subset. The research experimented with the n-gram models for the PRLM and PPRLM methods. In the PRLM experimentation, the research trains four spoken language identification systems using Czech, English, Hungarian and Russian and then tests the systems on the three Indonesian languages. The four systems are tested with different n-gram statistical models, with the value of n ranging from 3 to 10. Confusion matrices for the PRLM experiments are derived. It was observed that the English and Russian phone recognizers gave the highest accuracies of 77.42 and 75.94%, respectively. The PPRLM experiments consist of two language identification systems. The first system creates interpolated models by using all the phone recognizers for Czech, English, Hungarian and Russian and then tokenizes the phones. The second language identification system uses the two phone recognizers that gave the highest accuracy in the PRLM experiments; the research selected the phone recognizers of English and Russian. The two language identification systems are also tested with the three Indonesian languages and with the different n-gram statistical models.
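A toy sketch of the PRLM decision rule described above: each language is represented by n-gram probabilities over decoded phone sequences, and the language with the maximum log-likelihood is selected. The model contents and language names below are placeholders.

```python
import math

def ngram_log_likelihood(phones, lm, n=3, floor=1e-6):
    """Sum of log probabilities of the phone n-grams under a language model
    given as {ngram_tuple: probability}; unseen n-grams receive a floor value."""
    score = 0.0
    for i in range(len(phones) - n + 1):
        score += math.log(lm.get(tuple(phones[i:i + n]), floor))
    return score

def identify_language(phones, language_models, n=3):
    """PRLM decision: the language with the maximum log-likelihood wins."""
    return max(language_models,
               key=lambda lang: ngram_log_likelihood(phones, language_models[lang], n))

lms = {"lang_A": {("a", "b", "a"): 0.2}, "lang_B": {("b", "a", "b"): 0.3}}  # toy models
print(identify_language(["a", "b", "a", "b", "a"], lms))                    # "lang_A"
```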

3 Language Identification Model II The research [2] proposed a language identification model that identifies the following five languages: Arabic, Chinese, English, Korean and Malay. The data set consisted of ten speakers and each of them spoke the different languages mentioned earlier. So, the total number of recordings was 50 (ten speakers × five languages).


3.1 Methodology The preprocessing is the first step in the LID. The preprocessing step consists of amplification of the speech signal. The speech signal was amplified because it was a weak signal, and it could not be used as an input. Another preprocessing procedure was removing the silence in speech recordings and removing the background noise. The pre-emphasis stage performed noise removal in the speech and emphasized the higher frequencies in the speech signal. There were two ways to implement pre-emphasis stage. One way was pre-emphasis as a fixed coefficient filter. The second way was pre-emphasis as an adaptive coefficient filter. In the second way, the coefficient was adjusted with time according to a speech’s autocorrelation value. The pre-emphasis causes spectral flattening. This results in the signal being less vulnerable to the finite precision effects in subsequent signal processing.

3.2 Procedural Steps in a Nutshell The speech was divided into frames of 50 milliseconds, with the frames overlapping every 20 milliseconds. The research assumed that the speech signal was stationary over each frame. The research increased the correlation of the linear predictive coding (LPC) in order to decrease the discontinuity between the beginning and end of each frame; this was done by windowing each frame with a Hamming window. Then, the proposed system passed the windowed frames through the fast Fourier transform and mel-frequency warping, obtaining the mel spectrum. The logarithm of the mel spectrum gave the MFCCs. The model derived these features because MFCC features are robust. Once the MFCC features were extracted, they were passed to the classification stage. The research used the vector quantization (VQ) method as the classifier. The VQ technique is a classic technique in audio processing. The process of approximating feature vectors, which causes quantization of multiple values, is known as the quantization process. The research created a codebook which is used by VQ; the objective of the codebook is to act as a descriptor for the vector quantizer. The codebook contains a set of fixed prototype vectors, where each vector is called a codeword. The VQ process matches the input vector to a codeword in the codebook; to perform this task, the VQ method needs a distortion measure. The index of the matched codeword replaces the input vector, and the index should indicate the codeword with the smallest distortion in the codebook. So, minimization of distortion is the goal of the VQ technique.
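A small sketch of VQ-based classification as described above, assuming one codebook per language built with k-means and classification by minimum average distortion; the MFCC arrays are random placeholders.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

def train_codebook(features, codebook_size=64):
    """Build a VQ codebook (set of codewords) from MFCC frames via k-means."""
    codebook, _ = kmeans(features.astype(float), codebook_size)
    return codebook

def average_distortion(features, codebook):
    """Mean distance between each frame and its nearest codeword."""
    _, dist = vq(features, codebook)
    return dist.mean()

def classify(features, codebooks):
    """Pick the language whose codebook gives the minimum average distortion."""
    return min(codebooks, key=lambda lang: average_distortion(features, codebooks[lang]))

books = {lang: train_codebook(np.random.randn(500, 13), 16)
         for lang in ("arabic", "english")}                    # toy per-language codebooks
print(classify(np.random.randn(200, 13), books))
```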


3.3 Data set The research divided the data set into testing and training sets. The training data set consisted of speech recordings from four males and one female, and the testing data set consisted of speech recordings from two males and three females. The research optimized the frequency parameter and the codebook size parameter and observed their effects on the recognition rate.

3.4 Evaluation Results The audio files were set to two frequencies: 8 and 16 kHz. The recognition rate for all five languages was higher at a 16 kHz sampling frequency than at 8 kHz. The recognition rate is the ability of the classifier to correctly classify the audio signals into the different languages. The average recognition rate for the five languages was 78%. A limitation of the research was the lack of experimentation with machine learning classifiers such as SVM, K-NN, ANN and K-means clustering.

4 Language Identification Model III The next research proposal [3] applied machine learning procedure to build a LID that uses MFCC and K-NN, a machine learning classifier. The LID is used to identify Arunachal languages.

4.1 Data set The data set consisted of speech files from five Arunachal languages. Speech recordings were taken from All India Radio local news broadcasts. The data set consisted of speech files of 4 min duration.

4.2 Procedural Steps in a Nutshell The first stage of the system is feature extraction stage. The research extracted MFCC features. The MFCC features were extracted because MFCC has production and perception of the speech similar to that of a human being. The logarithmic perception of loudness and pitch is imitated by the MFCC. MFCC features do not include speaker-dependent features. The MFCC feature extraction technique involves the


following steps: framing, windowing, discrete Fourier transformation, mel filter bank and discrete cosine transformation (DCT). Framing is a process of dividing speech signals into frames, and these frames overlap each other. One minute of a speech signal yields a sequence of 5000 13-dimensional feature vectors. The discrete Fourier transformation is a process to convert the speech signal from the time domain to the frequency domain.

4.3 Machine Learning Algorithm The classification stage follows the feature extraction stage. The research chose the K-NN algorithm for the classification task. The extracted MFCC features are passed to the K-NN algorithm for language identification of a speech signal. The training data set consisted of 20 min of the speech file, and the testing data set contained speech samples of 20 s time length.
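A minimal scikit-learn sketch of this classification stage; the utterance-level feature vectors, labels and the choice of k are placeholders, not the authors' configuration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data: one mean MFCC vector per utterance and its language label.
X_train = np.random.randn(100, 13)
y_train = np.random.choice(["adi", "apatani", "galo", "idu", "tagin"], 100)
X_test = np.random.randn(10, 13)

knn = KNeighborsClassifier(n_neighbors=5)   # K-NN classifier
knn.fit(X_train, y_train)
predicted_languages = knn.predict(X_test)
```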

4.4 Extended Evaluation The research further experimented by changing the test data. The test data was changed from 20 s speech signals to 10 s speech signals. It was observed that the correct prediction accuracies of Adi, Apatani, Galo, Idu and Tagin were 77%, 65.8%, 94%, 97% and 83%. The Adi language was misclassified into Apatani, Galo, Idu and Tagin with a misclassification rate of 1.5%, 20%, 1.5% and 0.5%, respectively. The Apatani language was misclassified into Adi, Galo, Idu and Tagin with a misclassification rate of 14.6%, 14.1%, 1.4% and 3.9%, respectively. The Galo language was misclassified into Adi, Apatani, Idu and Tagin with a misclassification rate of 3.6%, 1.2%, 0.2% and 0.9%, respectively. The Idu language was misclassified into Adi, Apatani, Galo and Tagin with a misclassification rate of 0.6%, 0.4%, 0.8% and 0.2%, respectively. The Tagin language was misclassified into Adi, Apatani, Galo and Idu with a misclassification rate of 5.3%, 5.1%, 4.5% and 1.8%, respectively. The research did not explore other classification algorithms when MFCC features are used.

5 Language Identification Model IV Another similar work constructs [4] a language identification system that identifies four different Indian languages: Tamil, Telugu, Hindi and Kanada using machine learning algorithm.


5.1 Methodology The language identification system takes a speech signal as input and classifies it into one of the four Indian languages by performing computations on the signal. The classifiers that the research uses are the decision tree and SVM.

5.2 Data set The proposed language identification system consists of several steps: MFCC feature generator, feature vectors, training data and classifier. The data set consists of audio files in waveform audio file format (WAV). The speech recordings are obtained from news broadcasts of Doordarshan Television Network. The data set consists of 5 h of speech recording for each language. The data set is divided into two: testing and training data. Before feature extraction, the speech files are preprocessed. The preprocessing step involves removing silence in the speech recordings. This is done by using short-term energy function.

5.3 Procedure in a Nutshell The system proposed by the research has a feature extraction step that involves extracting MFCC features. The MFCC features remove the harmonics from speech signals, thereby eliminating speaker-dependent characteristics. The MFCC extraction technique involves the following steps. The first step is framing the signal into short frames of length 20 milliseconds with a frame shift of 10 milliseconds. The next step is computing the periodogram estimate of the power spectrum; the output of this step is the mel spectrum. Following this, the mel filter bank is applied to the power spectra and the energy in each filter is summed; this step is also called mel-scale filtering, and its output is the mel-frequency spectrum. The next step is taking the logarithm of all the filter bank energies. The DCT is applied to the logarithm of the filter bank energies; the DCT coefficients 2–13 are kept and the others are discarded. From the discrete cosine transformation, we get the mel cepstral coefficients. Once the MFCCs are obtained, the MFCC feature values are saved in a comma-separated values (CSV) file.
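The steps listed above (periodogram, mel filter bank, log, DCT keeping coefficients 2-13) can be sketched as follows; the FFT size, number of filters and the input frames are illustrative assumptions.

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(sr, n_fft, n_filters=26):
    """Triangular mel filter bank of shape (n_filters, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:centre] = (np.arange(left, centre) - left) / max(centre - left, 1)
        fbank[m - 1, centre:right] = (right - np.arange(centre, right)) / max(right - centre, 1)
    return fbank

def mfcc_from_frames(frames, sr=16000, n_fft=512, n_ceps=12):
    """Periodogram -> mel-scale filtering -> log -> DCT, keeping coefficients 2-13."""
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft       # periodogram estimate
    energies = power @ mel_filterbank(sr, n_fft).T                  # mel filter bank energies
    return dct(np.log(energies + 1e-10), type=2, axis=1, norm="ortho")[:, 1:n_ceps + 1]

mfcc = mfcc_from_frames(np.random.randn(98, 320))   # placeholder 20 ms frames at 16 kHz
```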


5.4 Machine Learning Algorithm The SVM classifier finds optimal hyperplanes that perform data separation with minimal to no errors. Support vectors are those training points that are closest to the optimal hyperplanes. Each training sample is stored as a row in the CSV file, and this file is passed to the support vector machine and decision tree classifiers for training. The research assessed the performance of the classifiers by calculating the detection rate. The classifiers classify the unknown test speech signals.
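A minimal scikit-learn sketch of training the two classifiers from the saved CSV file; the file name, the "language" label column and the train/test split are assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical CSV layout: one row per sample, MFCC feature columns plus a
# "language" label column (the file and column names are assumptions).
data = pd.read_csv("mfcc_features.csv")
X, y = data.drop(columns=["language"]), data["language"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (SVC(kernel="rbf"), DecisionTreeClassifier()):
    model.fit(X_train, y_train)
    print(type(model).__name__, "accuracy:", model.score(X_test, y_test))
```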

5.5 Evaluation and Results The detection rate is the ratio of the number of correctly classified keywords to the sum of the numbers of correctly classified, incorrectly classified and rejected keywords. The accuracies for Tamil, Telugu, Hindi and Kannada when SVM was used as the classifier were 0.4, 0.2, 0.28 and 0.33, respectively; when the decision tree was used, they were 0.8, 0.67, 0.2 and 0.22, respectively. The research obtained overall accuracies of 76% and 73% when using support vector machines and decision trees, respectively. The research used only one type of spectral characteristic, MFCC, and did not explore other features like prosodic features.

6 Conclusion The current report studies the significant research in speech recognition that has taken place from 2016 to 2020. From this report, it can be observed that machine learning has clear applications in speech recognition. Moreover, the interaction between machine learning and audio signal processing is also significant. A typical language identification system consists of a feature extraction stage and a classification stage. The data sets used by the researchers predominantly consist of speech recordings from local news broadcasts, and the researchers brought variability to their data sets by incorporating speeches by male and female speakers. Researchers have applied various feature extraction techniques, and machine learning classifiers like SVM, DT and NN have been used. Some types of neural networks that have been used are artificial neural networks, probabilistic neural networks [5], deep belief NN and FFBPNN. Some researchers have broken down the classification stage into a learning phase and a recognition phase and have tried to optimize the learning phase by building learning models, using the extreme learning machine approach to create them. Moreover, researchers have tried to


optimize the extreme learning machine approach by using different optimization approaches. Lastly, the current research uses utterances of words in speech as the data set. More research needs to be done on language identification systems for continuous speech signals.

References 1. Safitri, N. E., Zahra, A., Adriani, M.: Spoken language identification with phonotactics methods on Minangkabau, Sundanese, and Javanese Languages. In: SLTU, pp. 182–187 (2016, January) 2. Gunawan, T.S., Husain, R., Kartiwi, M.: Development of language identification system using MFCC and vector quantization. In: 2017 IEEE 4th International Conference on Smart Instrumentation, Measurement and Application (ICSIMA), pp. 1–4 (2017, November). IEEE 3. Nyodu, K., Sambyo, K.: Automatic identification of arunachal language using K-nearest neighbor algorithm. In: 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), pp. 213–216 (2018, October). IEEE 4. Venkatesan, H., Venkatasubramanian, T.V., Sangeetha, J.: Automatic language identification using machine learning techniques. In: 2018 3rd International Conference on Communication and Electronics Systems (ICCES), pp. 583–588 (2018, October). IEEE 5. Sulthana, A.R., Gupta, M., Subramanian, S., Mirza, S.: Improvising the performance of imagebased recommendation system using convolution neural networks and deep learning. Soft Comput., 1–14 (2020)

Plant Leaf Disease Detection and Classification Using Machine Learning Approaches: A Review Majji V. Appalanaidu and G. Kumaravelan

Abstract Early detection of plant diseases will certainly increase the productivity of agricultural products. In addition, identifying the type of disease affecting plant leaves is a cumbersome task for human beings. Hence, in recent years, image processing techniques combined with machine learning algorithms have provided an accurate and reliable mechanism to detect and classify plant diseases. We deliver a comprehensive study on the identification and classification of plant leaf diseases using image processing and machine learning techniques, discuss common infections, and review the lines of investigation pursued in the various phases of a plant disease detection system. Finally, the problems and future developments in this area are explored and identified. This review should help investigators learn about image processing and machine learning applications in the field of plant disease detection and classification. Keywords Image processing · Plant disease · Machine learning · Classification

1 Introduction The detection of diseases in plants is an essential issue, and it should be controlled entirely in the field of agricultural science. In particular, crop loss due to diseases in developing countries like India adversely affects economic growth and nutritional standards, because almost 70% of the population depends on agriculture. Thus, detecting disease in crops/plants at an early stage plays a vital role. Besides, a few diseases have no visible symptoms, and farmers do not have enough knowledge of these diseases; in such cases, they fail to identify them. Therefore, the necessity of an automated system to detect the type of plant disease and its severity level becomes more critical. Recently, image processing techniques with machine learning (ML) algorithms have proved to be a prominent approach for automatic plant leaf recognition and categorization of diseases. Figure 1 shows the overall architecture of an automated plant disease detection and classification system. It typically involves a two-step practice. The first step consists of image processing routines, namely image acquisition, a method to capture images of the infected parts of the plant leaf through an RGB camera; image pre-processing, a method that removes noise in the captured image through filters; image segmentation, a method that extracts the diseased portion from the chosen image; and finally feature extraction, a method to derive values from the segmented image. The second step consists of a classification process through an ML algorithm, which discriminates healthy from infected plant leaves. The organization of this paper is as follows: Section 2 presents a categorization of plant diseases. Section 3 describes the various modules involved in the process of plant leaf disease detection and classification systems. Section 4 discusses the results of previous research works. Section 5 concludes this paper along with future work directions.

M. V. Appalanaidu (B) · G. Kumaravelan, Department of Computer Science, Pondicherry University Karaikal Campus, Karaikal, Pondicherry, India. e-mail: [email protected]; G. Kumaravelan e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_55

Fig. 1 Architecture of an automated plant disease detection and classification system (image acquisition, image preprocessing, image segmentation, feature extraction, and classification into healthy or diseased leaf)


2 Categorization of Plant Diseases Generally, plant diseases are caused by biotic and abiotic factors. Biotic factors are living organisms such as bacteria, fungi, and viruses, while abiotic factors are non-living causes such as excess temperature, insufficient sunlight, and chemical substances from industrial outlets. Abiotic factors are mostly avoidable because they are less dangerous, non-infectious, and non-transmissible. Thus, farmers are more worried about the biotic factors, which affect the agricultural farm in terms of both the quality and the quantity of the crop products. Figure 2 shows the various types of biotic factors, namely bacteria, fungi, and viruses, through which plant leaves are affected. Soft spot, spot, and wilt are examples of bacterial diseases that normally affect plants like potato, corn, and bean. Mildew, rots, and spots are examples of fungal diseases that normally affect plant leaves like carrots, beetroot, and beans. Dwarfing, distortion, and mosaic are examples of viral diseases that normally affect plants like tobacco, pepper, potato, tomato, eggplant, cucumber, and petunia. Figure 3 shows the leaves of plants affected by the various types of biotic factors.

Fig. 2 Types of various biotic factors influencing plant diseases

Fig. 3 Leaves of the plants affected by various types of biotic factors

3 Process of Leaf Disease Detection and Classification System This section elaborates on the various processing modules used in the development of leaf disease detection and classification systems.

3.1 Image Acquisition In this step, investigators have used well-known datasets, namely the plant village dataset, integrated pest management (IPM) images, and American Phytopathological Society (APS) images. Most of the works have considered a single crop rather than a full-fledged dataset [1–5]. Some experimenters use scanned images [6, 7], and a few research workers have used self-collected images. A robust leaf disease detection system depends on images captured under realistic environmental conditions. The list of datasets used by the various researchers is shown in Table 1.

Table 1 List of datasets used by various researchers

S. No. | Name of the dataset | Data collected from the source | Maximum number of images | Used by the number of researchers
1 | Apple | Agriculture research institution at University of Tehran | 320 | 1
2 | Soybean | Plant village dataset | 4775 | 2
3 | Bitter gourd | Self-collected images | 470 | 1
4 | Cassava | An experimental field at the Khaphaengsaen campus, Kasetsart University, Nakhon Pathom, Thailand | 160 | 1
5 | Cotton | Self-collected images | 290 | 4
6 | Groundnuts | Self-collected images | 400 | 1
7 | Jujube | Self-collected images | 45 | 1
8 | Paddy | Paddy fields, Shivamogga district, Karnataka state, India | 330 | 4
9 | Potato | Plant village dataset | 300 | 3
10 | Rice | Self-collected images | 500 | 4
11 | Tomato | Self-collected images | 800 | 5
12 | Watermelon | Watermelon nursery in Kuala Selangor | 200 | 1
13 | Wheat | Self-collected images | 800 | 2

3.2 Image Preprocessing This step brings all images to a fixed size using a resize function. Various filters are then applied for noise removal and image enhancement; if the captured images are noise-free, the subsequent steps give better results. Mean and median filters have been used to eliminate unwanted artefacts such as dust, dewdrops, water drops, insects, and shadows that appear on the image, while Wiener filters are used to remove blurring from the leaf image. The list of preprocessing functions is shown in Table 2.

Table 2 List of preprocessing functions

S. No. | Pre-processing | Filters/functions
1 | Noise removal | Mean, median, and Wiener filters
2 | Image enhancement | Image contrast, image resize, image cropping, image smoothing
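A minimal preprocessing sketch along the lines of Table 2, assuming OpenCV is available; the file name, target size, and filter choices are illustrative and not taken from any of the surveyed works.

```python
import cv2

def preprocess_leaf(path, size=(256, 256)):
    """Resize, denoise, and contrast-enhance a leaf image (illustrative only)."""
    img = cv2.imread(path)                    # BGR image from disk
    img = cv2.resize(img, size)               # fixed size for later stages
    img = cv2.medianBlur(img, 3)              # remove dust/dewdrop noise
    # Contrast enhancement on the luminance channel only
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.equalizeHist(l)
    return cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)

# processed = preprocess_leaf("leaf.jpg")  # hypothetical file name
```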

3.3 Image Segmentation Image segmentation splits the image into different segments and extracts the unhealthy portion of the leaf image. A list of the various segmentation techniques used by the researchers is shown in Table 3.

Table 3 List of various segmentation techniques used by the researchers

S. No. | Segmentation method | Used by the number of authors | Highest classification accuracy
1 | Active contour model | 1 | 85.52
2 | Binary threshold | 1 | 89.6
3 | Color threshold segmentation | 7 | 99.8
4 | Edge detection using Sobel | 2 | 98.75
5 | Fermi energy | 2 | 92.2
6 | Fuzzy c-means | 1 | NA
7 | Genetic algorithm | 1 | 95.7
8 | GAACO | 1 | 91.3
9 | Global threshold | 1 | 88
10 | Grab cut algorithm | 1 | NA
11 | Improved histogram segmentation | 1 | 90
12 | k-means | 12 | 100
13 | Otsu thresholding | 4 | 98
14 | PSO | 1 | 97.4
15 | Ring segmentation | 1 | 90
16 | SOFM | 1 | 97.3
17 | Thresholding and masking | 1 | 100
18 | YCbCr color space | 1 | 98
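Since k-means is the most widely used segmentation method in Table 3, the sketch below clusters pixel colours with scikit-learn to produce a label map from which the diseased region can then be selected by colour; this is an assumption-laden illustration, not the procedure of any specific surveyed paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segment(rgb_image, k=3):
    """Cluster pixels by colour and return a label map of shape (H, W)."""
    h, w, _ = rgb_image.shape
    pixels = rgb_image.reshape(-1, 3).astype(float)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
    return labels.reshape(h, w)

# Random image standing in for a real leaf photograph
img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
label_map = kmeans_segment(img, k=3)
print(np.unique(label_map))
```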

Table 4 A list of features used by the researchers

S. No. | Features | Used by the number of authors | Highest classification accuracy
1 | Color | 2 | 100
2 | Shape | 1 | 75
3 | Texture | 6 | 99.8
4 | Color, shape | 5 | 97
5 | Color, texture | 6 | 97.3
6 | Color, texture, shape | 8 | 97.3
7 | Discrete wavelet transform | 1 | 89.6
8 | Fractional-order Zernike moments | 1 | 97.3
9 | LBP features | 1 | 95
10 | Geometrical | 1 | 76.5
11 | SIFT | 1 | 93.3
12 | Eigen | 1 | 90
13 | Hu moments | 1 | 85.52

3.4 Feature Extraction The final image processing step is feature extraction, which reduces the image data and helps identify the disease. Color features have been extracted with the color moment method (CMM) and the color co-occurrence matrix (CCM); mean and standard deviation are examples of color features. Shape features have been extracted with the minimum enclosing rectangle (MER); area, perimeter, and diameter are a few examples. The gray-level co-occurrence matrix (GLCM) extracts texture features, of which contrast, entropy, and homogeneity are examples. The CCM is also used to extract combinations of color and texture. A list of features used by the researchers is shown in Table 4.
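As an illustration of GLCM-based texture features, the sketch below uses scikit-image to compute a few of the descriptors mentioned above from a grayscale leaf patch; it assumes scikit-image 0.19 or later (where the functions are named graycomatrix/graycoprops), and the patch is a random stand-in.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_patch):
    """Texture features from the gray-level co-occurrence matrix."""
    glcm = graycomatrix(gray_patch, distances=[1],
                        angles=[0, np.pi / 2], levels=256,
                        symmetric=True, normed=True)
    return {prop: float(graycoprops(glcm, prop).mean())
            for prop in ("contrast", "homogeneity", "energy", "correlation")}

patch = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in patch
print(glcm_features(patch))
```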

3.5 Existing Machine Learning Algorithms for Plant Disease Classification

Support Vector Machine (SVM): The authors in [8] classify five diseases of the banana leaf. They collected a total of 106 images with a digital camera; 60 images were used for training and 46 for testing, and the SVM performed the classification with 95.7% accuracy. The authors in [9] classify soybean leaves with three diseases, using 3341 images for training and 1434 for testing. They divide the whole dataset into three models: model 1 uses 50% of the images for training and 50% for testing, model 2 uses 60% and 40%, and model 3 uses 70% and 30%. Among the three models, the highest classification accuracy of 62.53% was achieved by model 3. The authors in [10] classify two diseases of grape leaves using 400 JPEG images collected from the well-known plant village benchmark dataset; 225 images were used for training and 175 for testing, and the SVM achieved 97.3% accuracy. The authors compared the proposed model with NN, ANN, and fuzzy set theory algorithms and concluded that the proposed model gives the best precision. The authors in [11] classify two diseases of potato leaves with images from the plant village benchmark dataset, using 180 images for training and 120 for testing; a multiclass SVM classifier performs the classification with an accuracy of 95%. The authors in [12] classify four diseases of the wheat leaf, with 150 training images and 50 testing images; the proposed multiple classifier system (MCS) performs the classification with 95.1% accuracy.

K-Nearest Neighbor (KNN): The authors in [13] categorize two diseases of the paddy leaf. The image is first segmented with the global threshold method to separate the unhealthy region, after which geometric features are extracted from the segmented images and submitted to the KNN classifier; 198 images were used for training and 132 for testing, and KNN classifies the paddy leaf diseases with 76.59% accuracy. The authors in [14] classify five kinds of corn leaf diseases captured with a digital camera; KNN classifies them with an accuracy of 90%. The authors in [15] classify two diseases of soybean leaves, with 100 training images and 44 testing images, achieving 75% accuracy. The authors in [16] classify two diseases of the paddy leaf; the SVM and KNN classifiers achieve accuracies of 93.3% and 91.10%, respectively, with 90 training images and 30 testing images, and the authors conclude that, of the two classifiers, KNN gives the best performance. The summary of all machine learning classification algorithms is shown in Table 5.

Naïve Bayes (NB): The authors in [17] present an efficient technique to classify healthy and diseased okra leaves. They test the proposed method on 79 leaf images, using 49 for training and 30 for testing; the Naïve Bayes classifier categorizes healthy and unhealthy leaves with 87% accuracy.

Neural Network (NN): The authors in [18] propose Enhanced Particle Swarm Optimization (EPSO) to classify root rot, leaf blight, bacterial blight, micronutrient, and wilt diseases of the cotton leaf. The reduced features are applied to SVM and BPNN classifiers, with 270 images used for training and 120 for testing, classifying the various cotton leaf diseases with 94% accuracy; the authors conclude that BPNN is the better of the two classifiers. The authors in [19] develop a system to identify white rot, anthracnose, rust, ascochyta spot, and witches' broom on jujube leaves. Eleven shape, four texture, and nine color features are extracted from the segmented images.
Lastly, these features are applied as input to a neural network classifier, which identifies the various jujube leaf diseases with an accuracy of 85.33%.

Table 5 Summary of classification techniques

Classification technique | Author and year | Plant name | Number of diseases | Training images | Testing images | Classification accuracy
SVM | Vijai Singh et al. (2017) | Banana | 5 | 60 | 46 | 95.7
SVM | Sukhvir Kaur et al. (2018) | Soya bean | 3 | 3341 | 1434 | 62.53
SVM | P. Kaur et al. (2019) | Grapes | 2 | 225 | 175 | 97.3
SVM | Islam et al. (2017) | Potato | 2 | 180 | 120 | 95
SVM | Tian et al. (2010) | Wheat | 4 | 150 | 50 | 95.1
KNN | M. Suresha et al. (2017) | Paddy | 2 | 198 | 132 | 76.5
KNN | S. W. Zhang et al. (2015) | Corn | 5 | 90 | 10 | 90
KNN | S. Shrivastava et al. (2014) | Soybean | 2 | 100 | 44 | 75
KNN | K. J. Mohan et al. (2016) | Rice | 3 | 90 | 30 | 93.3
NB | D. Mondal et al. (2015) | Okra | 2 | 40 | 39 | 87
NN | Revathi et al. (2014) | Cotton | 2 | 270 | 120 | 94
NN | W. Zhang et al. (2013) | Jujube | 6 | 30 | 15 | 85.33
NN | M. Ramakrishnan (2015) | Groundnuts | 1 | 360 | 40 | 97.41
DT | H. Sabrol et al. (2016) | Tomato | 5 | 117 | 266 | 97.3
DT | H. Sabrol et al. (2016) | Tomato | 5 | 598 | 150 | 77.6

The authors in [20] investigate a method to identify the Cercospora disease of groundnut using BPNN. A total of 400 images were collected for the proposed method. The RGB image is first converted to HSV for color generation and description, the background is then removed using a thresholding algorithm, and finally texture and color features are extracted from the segmented image and passed to the BPNN classifier, which performs the classification with an accuracy of 97.41%.

Decision Tree (DT): The authors in [21] classify five diseases of tomato plant images; the decision tree performs the classification with an accuracy of 97.3%, and the authors conclude that the combination of features gives the best classification accuracy. The authors in [22] suggest a model to automate plant disease recognition and categorization from images of tomato leaves and stems. All the pictures are first segmented with the Otsu thresholding technique to separate the diseased part, then ten color features are extracted from the segmented image and stored in a feature vector; finally, these extracted features are submitted to a decision tree, which performs the classification with 78% accuracy.
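A hedged sketch of the classification step that these works share: given a feature matrix X and labels y (random stand-in data below), train SVM, k-NN, and decision tree classifiers with scikit-learn and compare their accuracies on a held-out split. The split ratio and parameters are illustrative, not those of the cited papers.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical feature vectors (e.g., colour + texture features) and labels
X = np.random.rand(300, 12)
y = np.random.randint(0, 3, 300)      # three disease classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "DT": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```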

4 Discussions Image processing has proved an effective method for identifying and diagnosing plant disease, with the digital camera replacing the human eye and learning/optimization algorithms replacing the human brain. The above review clarified different methods for identifying and classifying various plant leaf infections, and several observations follow from it. Table 1 lists the datasets used by the various authors and shows that the maximum number of images has been taken from the plant village dataset. Table 2 describes the different preprocessing functions applied by the various authors. Table 3 lists the segmentation methods and indicates that k-means and thresholding methods have the best performance among all the segmentation techniques. Table 4 presents the different types of features and feature combinations used by the various authors; color features alone and combinations of features perform best among all the features. The results in Table 5 show that the NN classifier performs best among all the classifiers with respect to the classification performance measures, with a classification accuracy of 97.41%. The SVM and DT classifiers perform next best, with the same classification accuracy of 97.3%. The KNN classifier yields the next best classification performance, with an accuracy of 93.3%. Finally, NB shows the lowest classification performance, at 87%.

5 Conclusions This review paper describes the various image processing and machine learning strategies used in the detection and classification of diseases of different plants. A detailed list of image processing methods has been explained individually, and a comparison of different classification approaches has been clearly described. From the above review, researchers may implement new algorithms, and a better understanding of the methods should help achieve better outcomes. A mixture of as yet unexplored methods for processing, feature selection, and training may also improve detection and classification. By developing mobile applications, immediate solutions can be made available to farmers, and web portals may be built to provide online solutions for plant disease.


References 1. Mohanty, S.P., Hughes, D., Salathe, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1–10 (2016) 2. Ipm images. https://www.ipmimages.org/about/. Accessed 15 May 2017 3. APS Image database. https://imagedatabase.apsnet.org/search.aspx. Accessed 16 May 2017 4. Pujari, J.D., Yakkundimath, R.S., Jahagirdar, S., Byadgi, A.M.: Quantitative detection of soybean rust using image processing techniques. J. Crop Prot. 5(1), 75–87 (2015) 5. Rumpf, T., Mahlein, A.K., Steiner, U., Oerke, E.C., Dehne, H.W., Plumer, L.: Early detection and classification of plant diseases with support vector machines based on hyperspectral reflectance. Comput. Electron. Agric. 74(1), 91–99 (2010) 6. Pires, R.D.L., Goncalves, D.N., Orue, J.P.M., Kanashiro, W.E.S., Rodrigues, J.F., Machado, B.B., Gonçalves, W.N.: Local descriptors for soybean disease recognition. Comput. Electron. Agric. 125, 48–55 (2016) 7. Phadikar, S., Sil, J., Das, A.K.: Rice diseases classification using feature selection and rule generation techniques. Comput. Electron. Agric. 90, 76–85 (2013) 8. Singh, V., Misra, A.K.: Detection of plant leaf diseases using image segmentation and soft computing techniques. Inf. Process. Agric. 4(1), 41–49 (2017). (Elsevier) 9. Kaur, S., Pandey, S., Goel, S.: Semi-automatic leaf disease detection and classification system for soybean culture. IET Image Process. 12(6), 1038–1048 (2018) 10. Kaur, P., Pannu, HS., Malhi, AK.: Plant disease recognition using fractional-order Zernike moments and SVM classifier. Neural Comput. Appl. pp. 1–20 (2019). (Springer) 11. Islam, M., Dinh, A., Wahid, K., Bhowmik, P.: Detection of potato diseases using image segmentation and multiclass support vector machine. In: 2017 IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 1–4. IEEE (2017) 12. Tian, Y., Zhao, C., Lu, S., Guo, X.: SVM-based Multiple classifier system for recognition of wheat leaf diseases. In: Proceedings of 2010 Conference on Dependable Computing (CDC ‘2010), pp. 2–6 (2010) 13. Suresha, M., Shreekanth, K.N., Thirumalesh, BV.: Recognition of diseases in paddy leaves using knn classifier. In: 2nd IEEE International Conference for Convergence in Technology (I2CT 2017), pp. 663–666 (2017) 14. Zhang, S.W., Shang, Y.J., Wang, L.: Plant disease recognition based on plant leaf image. J Anim Plant Sci 25(Suppl. 1), 42–45 (2015) 15. Shrivastava, S., Hooda, D.S.: Automatic brown spot and frog eye detection from the image captured in the field. Am. J. Intell. Syst. 4(4), 131–134 (2014) 16. Mohan, K.J., Balasubramanian, M., Palanivel, S.: Detection and recognition of diseases from paddy plant leaf images. Int. J. Comput. Appl. 144(12), 34–41 (2016) 17. Mondal, D., Kole, D.K.: Detection and classification technique of yellow vein mosaic virus disease in okra leaf images using leaf vein extraction and Naive Bayesian classifier. In: IEEE International Conference on Soft Computing Techniques and Implementations (ICSCTI) (2015, October) 18. Revathi, P., Hemalatha, M.: Cotton leaf spot disease detection utilizing feature selection with skew divergence method. Int. J. Sci. Eng. Technol. 3(1), 22–30 (2014) 19. Zhang, W., Guifa, T., Chunshan, W.: Identification of jujube trees diseases using a neural network. Int. J. Light Electron. Opt. 124(11), 1034–1037 (2013) 20. Ramakrishnan, M.: Groundnut leaf disease detection and classification by using a backpropagation algorithm. In: IEEE International Conference on Communications and Signal Processing (ICCSP), pp. 
0964–0968 (2015, April) 21. Sabrol, H., Kumar, S.: Tomato plant disease classification in digital images using classification tree. In: International Conference on Communication and Signal Processing, IEEE, pp. 1242– 1246 (2016) 22. Sabrol, H., Kumar, S.: Intensity-based feature extraction for tomato plant disease recognition by classification using a decision tree. Int. J. Comput. Sci. Inf. Secur. 14(9), 622–626 (2016)

Single-Channel Speech Enhancement Based on Signal-to-Residual Selection Criterion Ramesh Nuthakki, Junaid Abbas, Ayesha Afnan, Faisal Ahmed Shariff, and Akshaya Hari

Abstract Over the last 40 years, researchers and engineers have proposed a great many speech enhancement algorithms to reduce noise, but little effort has been made to improve speech comprehensibility. The prime aim of this paper is to improve speech quality and comprehensibility by examining the application of a binary mask in conditions that are unfavorable for hearing-impaired or normal listeners who find the speech incomprehensible. Gain functions like the Wiener filter and spectral subtraction aim to attenuate the signal when speech is absent or the estimated SNR is low and to retain the signal when speech is present and the estimated SNR is high. For this approach, access to accurate SNR estimates and estimates of the background noise spectrum is needed. Even in extremely low SNR conditions (SNR < −5 dB), this aim is attainable. The method is applicable in real time in hearing aids, mobile phones, and speech-activated machines. Keywords Speech comprehensibility · Ideal binary mask · Parametric gain estimator · STOI · SSNR

R. Nuthakki · J. Abbas · A. Afnan · F. A. Shariff (B) · A. Hari Department of Electronics and Communication Engineering, Atria Institute of Technology, ASKB Campus, 1st Main Rd, Ags Colony, Anandnagar, Hebbal, Bengaluru 560024, India e-mail: [email protected] R. Nuthakki e-mail: [email protected] J. Abbas e-mail: [email protected] A. Afnan e-mail: [email protected] A. Hari e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_56


1 Introduction Human speech signal typically degrades due to various surrounding environmental conditions. Background noise is one of the most influential factors that cause deterioration of the speech standard and comprehensibility. Background noise is either stationary or non-stationary and isn’t mostly correlated but additive to the speech spectrum [1, 2]. Speech enhancement focuses on improvement in speech condition by the use of different algorithms. The objective of this improvement algorithm is to achieve an enhancement in both comprehensibility and overall quality of deteriorated speech by the use of audio signal processing techniques. Although many advances are made in developing enhancement algorithm that suppresses the background noise and improves overall quality of speech, significantly less progress has been made in developing an algorithm that enhances speech comprehensibility [3–7]. Former studies performed on normal-hearing people have reportedly proved large improvements in comprehensibility using an ideal binary mask method. It distinguishes the noise dominated and speech dominated units, which is then calculated and implemented onto the input noisy spectrum to get the noise-suppressed spectrum. This mask was customized to keep the time-frequency (T-F) areas where the masker (noise) was dominated by the target speech (local SNR > 0 dB) and to eliminate T-F units where masker is more (local SNR < 0 dB). A Bayesian classifier is used to indicate the efficiency of the binary mask in enhancing the speech comprehensibility [8]. The elimination or retention of T-F bins by the application of a binary mask is determined by the masker spectrum underestimation or overestimation criteria. This proves necessary because numerous available algorithms that estimate noise underestimate its power-spectrum density (psd). Another mask can alternatively be synthesized by the application of restrictions on dual speech degradation types that the gain function can initiate [5]. When the gain function is applied, it gives a variation in spectral amplitudes. Consequently, the attenuation and amplification distortions occur. Research has shown that between amplification and attenuation distortions, the former causes more damage compared to the latter. Incidentally, the improved speech that is obtained has attenuation distortion and is proven to have more comprehensibility compared to the noisy speech. Hence, an ideal binary mask is applied to the enhanced spectrum to construct a speech signal comprising of only the attenuation distortion [4, 9]. The binary mask procedure used can substantially enhance comprehensibility of speech sentences that are deteriorated by background noise of even −10 dB SNR levels. The proposed method nullifies the intrusion of noise from the targeted speech by using the mask and enhances the SNR of speech, thus improving the efficiency of speech communication by reducing listener fatigue and increasing listening comfort [8, 10].


2 Proposed Method

2.1 Signal Residual Selection Criterion

Consider a clean speech signal z(n) corrupted by noise d(n) that is not correlated with z(n). The corrupted speech y(n) is

y(n) = z(n) + d(n)    (1)

Fig. 1 Procedure to build the mask in magnitude domain

Figure 1 shows the blocks used to build the mask in the magnitude domain. The noisy signal is partitioned into frames of 20 ms, with 50% overlap between adjoining frames. A Hanning window is applied to each speech frame, followed by the short-time Fourier transform. Multiplying the noisy spectrum Y(k, m_i) by the gain G(k, m_i) gives an estimate of the clean speech spectrum; here G(k, m_i) is expressed in terms of the a priori SNR, \hat{Z}(k, m_i) denotes the estimate of the clean spectrum, m_i is the frame index, and k denotes the frequency bin [9, 11]:

\hat{Z}(k, m_i) = G(k, m_i)\, Y(k, m_i)    (2)

After computing the estimated spectrum, the binary mask is formulated by restricting the anomalies caused by inaccuracies in the noise spectrum estimation. In particular, if \hat{Z}(k, m_i) \le 2 Z(k, m_i), the binary mask lets the spectrum pass through; otherwise it masks the spectrum. Usually, the processed speech contains both noise underestimation and overestimation. The spectral estimate is compared against the true spectrum for every T-F unit; the units that satisfy the constraint are retained, and the ones that do not are eliminated. Applying the ISTFT yields the enhanced speech in the time domain [12]. A parametric Wiener gain filter is used as the gain function; it was chosen for its low computational complexity, easy implementation, and its efficiency with respect to speech comprehensibility, unlike other more sophisticated noise-reducing algorithms [2]. The parametric Wiener filter gain is

G(k, m_i) = \left( \frac{\mathrm{SNR}_{\mathrm{prio}}(k, m_i)}{\delta + \mathrm{SNR}_{\mathrm{prio}}(k, m_i)} \right)^{\omega}    (3)

where SNR_prio is the a priori SNR, calculated with the decision-directed rule

\mathrm{SNR}_{\mathrm{prio}}(k, m_i) = \alpha\, \frac{\hat{Z}^2(k, m_i - 1)}{\lambda_{\hat{D}}(k, m_i - 1)} + (1 - \alpha)\, \max\!\left( \frac{Y^2(k, m_i)}{\lambda_{\hat{D}}(k, m_i)} - 1,\; 0 \right)    (4)

Here α is a smoothing constant that controls SNR_prio; its value is 0.98. The background noise variance estimate is represented by \lambda_{\hat{D}} [3, 5, 13].
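A minimal NumPy sketch of Eqs. (3)–(4), assuming the noisy magnitude spectrum of the current frame, the enhanced magnitude of the previous frame, and noise-variance estimates are already available; the gain is taken in the reconstructed form of Eq. (3), and the variable names and spectra below are illustrative stand-ins.

```python
import numpy as np

def parametric_wiener_gain(snr_prio, delta=3.0, omega=0.3):
    """Parametric Wiener gain, Eq. (3); delta = omega = 1 gives the Wiener filter."""
    return (snr_prio / (delta + snr_prio)) ** omega

def a_priori_snr(Y_mag, Z_prev_mag, noise_var, noise_var_prev, alpha=0.98):
    """Decision-directed a priori SNR estimate, Eq. (4)."""
    snr_post = np.maximum(Y_mag ** 2 / noise_var - 1.0, 0.0)
    return alpha * (Z_prev_mag ** 2 / noise_var_prev) + (1.0 - alpha) * snr_post

# Example on a single frame of random spectra (stand-ins for STFT magnitudes)
Y = np.abs(np.random.randn(257))
Z_prev = np.abs(np.random.randn(257))
lam = np.full(257, 0.5)
xi = a_priori_snr(Y, Z_prev, lam, lam)
Z_hat = parametric_wiener_gain(xi) * Y       # Eq. (2): enhanced magnitude
```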

2.2 Channel Selection Algorithms The block diagram of the process involved for the aforementioned SNRESI -based algorithm is shown in Fig. 2. Unlike the SNR rule, the SNRESI rule selects channels from the enhanced (noise-suppressed) spectrum rather than from the noise-corrupted spectrum. The noise-reduction block shown may include any conventional noise reduction algorithm. The choice of algorithm will not influence performance, at least in terms of intelligibility [11]. If Zˆ (k, mi ) > Z (k, mi ), it indicates noise overestimation, and Zˆ (k, mi ) < Z (k, mi ) indicates noise underestimation distortion. Normally, both are present in the processed speech.

Fig. 2 Block diagram representing two different channel selection algorithms


Fig. 3 Plot of SNRESI versus the ratio of enhanced (| Zˆ |) to clean (|Z |) spectra

The impact of these gain-induced distortions on the comprehensibility of speech, in steady noise and competing-talker conditions, is assessed as shown in Fig. 3. The T-F units were confined to three regions using an ordinary noise-reduction algorithm (square-root Wiener): Region I contains only attenuation distortion; Region II contains only amplification distortion of less than 6 dB; and Region III contains amplification distortion greater than 6 dB. Combining the first two regions and denoting them Region I+II gives the constraint

\hat{Z}(k, m_i) \le 2\, Z(k, m_i)    (5)

In Region I, SNR_ENH(k) ≤ SNR(k) leads to \hat{Z}(k, m_i) \le Z(k, m_i), which gives the condition in this region. In Region II, SNR(k) < SNR_ENH(k) ≤ SNR(k) + 6 dB, which gives the condition in that region. Lastly, the Region III constraint follows because in this region SNR_ENH(k) > SNR(k) + 6 dB. From these definitions of the three regions, it is clear that, to maximize SNR_ESI (and hence maximize comprehensibility), the estimated magnitude spectra \hat{Z}(k, m_i) must be retained in both Regions I and II [3, 4].
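The retention rule for Region I+II (Eq. 5) reduces to an element-wise comparison between the enhanced and clean magnitude spectra; a sketch under the assumption that both are available as NumPy arrays (random stand-ins below).

```python
import numpy as np

def region_I_II_mask(Z_hat_mag, Z_mag):
    """Binary mask that keeps T-F units satisfying |Z_hat| <= 2|Z| (Eq. 5)."""
    return (Z_hat_mag <= 2.0 * Z_mag).astype(float)

# T-F units violating the constraint (amplification distortion > 6 dB) are zeroed
Z_hat = np.abs(np.random.randn(257, 100))   # enhanced magnitude spectrogram
Z = np.abs(np.random.randn(257, 100))       # clean magnitude spectrogram
masked = region_I_II_mask(Z_hat, Z) * Z_hat
```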

3 Objective Measures The initial speech signal and the enhanced speech signal are generally used to calculate the objective quality measures. In either frequency or time domain, every speech slot’s distortion measure average is taken to evaluate speech distortion. The objective measures computed in our project are segmental signal-to-noise ratio (SSNR) and short-time objective intelligibility (STOI).


3.1 SSNR The SSNR can be estimated in both the time and frequency domains; the time-domain approach is the simplest measure used to evaluate a speech enhancement algorithm. The original and processed signals are time-aligned and phase errors are rectified. SSNR is defined as

\mathrm{SSNR} = \frac{10}{M} \sum_{m=0}^{M-1} \log_{10}\!\left( \frac{\sum_{i=Nm}^{Nm+N-1} Z^2(i)}{\sum_{i=Nm}^{Nm+N-1} \bigl( Z(i) - \hat{Z}(i) \bigr)^2} \right)    (6)

where Z(i) is the initial (clean) signal, \hat{Z}(i) is the enhanced signal, M denotes the number of signal frames, and N is the frame length (20 ms). The SSNR is based on the average of the SNRs over all frames of the signal. A potential issue with this estimate is that, during the periods of silence in a speech signal (which are plentiful in all human conversations), the signal energy is very low, which results in highly negative SSNR values that bias the overall assessment. To resolve this, the quiet frames are excluded by comparing short-time energy against a threshold, and the per-frame SNR values are restricted to the range (−10, 35 dB), thereby avoiding the use of a speech silence detector. The SSNR is computed from the clean and processed signals after passing both through perceptual weighting filters; the segmental SNR is then based on the outputs of these filters [14].
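A sketch of the segmental SNR of Eq. (6) with the per-frame clamping to [−10, 35] dB mentioned above; the frame length, sample rate, and test signals are illustrative assumptions.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=320, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs (dB), each clamped to [lo, hi] as in Eq. (6)."""
    n_frames = len(clean) // frame_len
    snrs = []
    for m in range(n_frames):
        z = clean[m * frame_len:(m + 1) * frame_len]
        e = z - enhanced[m * frame_len:(m + 1) * frame_len]
        snr = 10.0 * np.log10(np.sum(z ** 2) / (np.sum(e ** 2) + 1e-12) + 1e-12)
        snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(snrs))

# Example with synthetic signals standing in for clean/enhanced speech
clean = np.random.randn(16000)
enhanced = clean + 0.1 * np.random.randn(16000)
print(segmental_snr(clean, enhanced))
```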

3.2 STOI This technique is designed for a sample rate of 10,000 Hz, so as to cover the frequency span relevant for comprehensibility; signals with other sample rates must be resampled. Moreover, we presume that the clean and estimated signals are time-aligned. First, both signals are segmented with 50% overlap to obtain a T-F representation: frames of 256 samples are Hann-windowed, zero-padded to 512 samples, and a 1/3-octave band analysis is carried out by grouping DFT channels. In total, 15 one-third-octave bands are used, with the lowest center frequency set to 150 Hz. Let \hat{Z}(k, m_i) denote the kth DFT channel of the m_i-th frame of the clean signal. The T-F unit corresponding to the norm of the jth 1/3-octave band is given by

Z_j(m_i) = \sqrt{ \sum_{k = k_1(j)}^{k_2(j) - 1} \bigl| \hat{Z}(k, m_i) \bigr|^2 }    (7)

where k_1 and k_2 are the 1/3-octave band edges rounded to the closest DFT channel. The T-F representation of the processed signal is obtained in the same manner. The intermediate comprehensibility assessment for a single T-F unit, denoted d_j(m_i), depends on a region of N successive T-F units from both Z_j(m_i) and \hat{Z}_j(m_i), with m_i \in M, where

M = \{ (m_i - N + 1), (m_i - N + 2), \ldots, m_i - 1, m_i \}    (8)

A normalization is first performed by scaling every T-F unit of \hat{Z}_j(m_i) by

\alpha = \left( \sum_{m_i} Z_j(m_i)^2 \Big/ \sum_{m_i} \hat{Z}_j(m_i)^2 \right)^{1/2}    (9)

so that its energy equals the clean signal energy inside that T-F region. Then, to lower-bound the signal-to-distortion ratio (SDR), \alpha \hat{Z}_j(m_i) is clipped, where the SDR is defined as

\mathrm{SDR}_j(m_i) = 10 \log_{10}\!\left( \frac{Z_j(m_i)^2}{\bigl( \alpha \hat{Z}_j(m_i) - Z_j(m_i) \bigr)^2} \right)    (10)

Hence,

\hat{Z}' = \max\!\left( \min\!\left( \alpha \hat{Z},\; Z + 10^{-\beta/20} Z \right),\; Z - 10^{-\beta/20} Z \right)    (11)

where \hat{Z}' denotes the clipped and normalized T-F unit and \beta indicates the SDR lower bound. The intermediate comprehensibility assessment is given by the correlation coefficient between the processed and unprocessed T-F units,

d_j(m_i) = \frac{ \sum_{m_i} \left( Z_j(m_i) - \frac{1}{N} \sum_{l} Z_j(l) \right) \left( \hat{Z}'_j(m_i) - \frac{1}{N} \sum_{l} \hat{Z}'_j(l) \right) }{ \sqrt{ \sum_{m_i} \left( Z_j(m_i) - \frac{1}{N} \sum_{l} Z_j(l) \right)^2 } \sqrt{ \sum_{m_i} \left( \hat{Z}'_j(m_i) - \frac{1}{N} \sum_{l} \hat{Z}'_j(l) \right)^2 } }    (12)

Finally, the overall comprehensibility measure is obtained by taking the mean of the intermediate assessments over all frames and bands,

d = \frac{1}{J M} \sum_{j, m_i} d_j(m_i)    (13)
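The intermediate intelligibility measure of Eq. (12) is simply the sample correlation between N consecutive clean and processed band envelopes; a sketch assuming the 1/3-octave band envelopes have already been computed (random stand-in data below).

```python
import numpy as np

def intermediate_d(Z_band, Z_hat_band):
    """Correlation coefficient between clean and processed envelopes, Eq. (12)."""
    z = Z_band - Z_band.mean()
    zh = Z_hat_band - Z_hat_band.mean()
    denom = np.sqrt(np.sum(z ** 2) * np.sum(zh ** 2)) + 1e-12
    return float(np.sum(z * zh) / denom)

# N = 30 consecutive frames of one 1/3-octave band (stand-in data)
Z_band = np.abs(np.random.randn(30))
Z_hat_band = Z_band + 0.2 * np.random.randn(30)
print(intermediate_d(Z_band, Z_hat_band))  # averaging over bands/frames gives Eq. (13)
```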

It is clear from Table 2 that there is a significant improvement in SSNR and STOI for speech signals corrupted by random, babble, helicopter and car noises for the δ and ω values of the parametric Wiener gain filter shown. The values were chosen so as to obtain a trade-off between overall signal quality and comprehensibility.

4 Subjective Measures A total of 8 listeners, 4 males and 4 females, were asked to participate in a listening test. Both the enhanced speech signal and the noisy sentences were played, and the listeners were asked to rate the enhanced signal out of 5 with respect to the following subjective quality parameters: background (BAK), signal (SIG), and overall (OVL). The unprocessed speech was deteriorated by random, car, babble, and helicopter noises at 0 and −5 dB SNR, respectively [15]. The listeners' scores are shown in Table 1; from these scores, there is a clear improvement in speech quality for the enhanced speech signal (Table 2).

Table 1 Subjective measure analysis

Noise type | SNR (dB) | BAK | SIG | OVL
Random noise | 0 (δ = 4.8, ω = 0.2) | 3.8 | 4.0 | 4.1
Random noise | −5 (δ = 3.3, ω = 0.4) | 2.2 | 2.3 | 2.9
Babble noise | 0 (δ = 1.3, ω = 0.2) | 3.9 | 4.2 | 4.3
Babble noise | −5 (δ = 2.2, ω = 0.7) | 2.6 | 2.7 | 3.0
Helicopter noise | 0 (δ = 3.0, ω = 0.3) | 3.8 | 4.2 | 4.5
Helicopter noise | −5 (δ = 2.9, ω = 0.3) | 3.7 | 4.1 | 4.4
Car noise | 0 (δ = 4.8, ω = 0.3) | 3.2 | 3.7 | 4.1
Car noise | −5 (δ = 4.5, ω = 0.4) | 2.3 | 2.9 | 4.0

Table 2 Objective measure analysis

Noise type | Input SNR (dB) | δ | ω | SSNR (dB), parametric Wiener filter | SSNR (dB), Wiener filter (δ = 1, ω = 1) | STOI, parametric Wiener filter | STOI, Wiener filter (δ = 1, ω = 1)
Helicopter noise | 0 | 3.0 | 0.3 | 14.7530 | 11.8087 | 0.934 | 0.856
Helicopter noise | −5 | 2.9 | 0.3 | 10.5136 | 9.9602 | 0.905 | 0.817
Random noise | 0 | 4.8 | 0.2 | 12.6998 | 9.4059 | 0.958 | 0.842
Random noise | −5 | 3.3 | 0.4 | 10.1681 | 8.8039 | 0.895 | 0.788
Babble noise | 0 | 1.3 | 0.2 | 3.8180 | 1.2307 | 0.921 | 0.804
Babble noise | −5 | 2.2 | 0.7 | 0.4231 | 0.5082 | 0.698 | 0.684
Car noise | 0 | 4.8 | 0.3 | 11.7970 | 11.0471 | 0.953 | 0.882
Car noise | −5 | 4.5 | 0.4 | 10.4678 | 10.1048 | 0.912 | 0.858


Fig. 4 Mean comprehensibility score

5 Mean Comprehensibility Score Figure 4 shows the mean percentage of words identified by listeners with normal hearing. It is evident from the figure that intelligibility improved when the noise-distortion constraints were applied in the magnitude domain, and degraded for the Wiener-processed and unprocessed stimuli [3]. In Fig. 4, UN represents the values derived from unprocessed speech. There was a significant improvement in performance when the proposed binary mask was applied: at −5 dB, performance increased from 25% with unprocessed stimuli (UN) to 97% with the proposed binary mask (\hat{Z}(k, m_i) \le 2 Z(k, m_i)), and at 0 dB it increased from 65% with unprocessed stimuli to 99% with the proposed binary mask.

6 Spectral Analysis Spectrograms are used to illustrate the time-varying spectral attributes. The spectrogram plots for the magnitude domain are displayed in Fig. 5; these spectrograms were obtained for −5 dB and 0 dB input SNR levels. The spectrograms show that the method recovers the voiced and silent edges as well as the formants in the magnitude domain.

7 Results and Conclusion The new binary mask approach was implemented with the parametric Wiener gain filter using MATLAB. Different subjective and objective tests were carried out; for the objective tests, the parameters calculated were SSNR and STOI in the time domain. The tests were run for different combinations of δ and ω of the parametric Wiener gain filter for different background noises at 0 and −5 dB SNR levels. The objective scores show an obvious improvement in the SSNR values for sentences degraded by helicopter, car, random, and babble noises at 0 and −5 dB SNR. The subjective tests also show improvement in overall speech enhancement quality and speech comprehensibility, and the mean comprehensibility scores likewise suggest improved intelligibility for the proposed binary mask channel selection criterion.

Fig. 5 Spectrograms of helicopter noise a SNR = 0 dB b SNR = −5 dB and spectrograms of car noise c SNR = 0 dB d SNR = −5 dB (panels: clean speech, noisy speech, Wiener filter processed, enhanced speech signal)

8 Future Scope In the future, improvements in speech comprehensibility and overall speech enhancement quality should be achievable for signals degraded by noise at SNR levels as low as −10 dB. Further improvement in other objective measures, such as SDR and PESQ, can also be obtained and verified.


References 1. Naik, D.C., Sreenivasa Murthy, A., Nuthakki, R.: A literature survey on single channel speech enhancement techniques. Int. J. Sci. Technol. Res. 9(3). ISSN 2277-8616 2. Rangachari, S., Loizou, P.C.: A noise-estimation algorithm for highly non-stationary environments. Speech Commun. 4, 220–231 (2006). (TX 75083-0688, 2005 Elsevier B.V) 3. Kim, P., Loizou, P.C.: Gain-Induced Speech Distortions and the Absence of Intelligibility Benefit with Existing Noise-Reduction Algorithms. Department of Electrical Engineering, University of Texas at Dallas, Richardson, Texas. 75080 VC 2011 Acoustical Society of America. https://doi.org/10.1121/1.3619790. pp. 1581–1596. Accepted 2 July 2011 4. Kim, G., Loizou, P.C.: Why do Speech-Enhancement Algorithms not Improve Speech Intelligibility? Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75080, USA 978-1-4244-4296-6/10/2010 IEEE 4738 ICASSP 2010 5. Nuthakki, R., Sreenivasa Murthy, A., Naik, D.C.: Single channel speech enhancement using a new binary mask in power spectral domain. In: Proceedings of the 2nd International Conference on Electronics, Communication and Aerospace Technology (ICECA 2018) IEEE Conference Record # 42487; IEEE Xplore ISBN:978-1-5386-0965-1 6. Nuthakki, R., Sreenivasa Murthy, A., Naik, D.C.: Modified magnitude spectral subtraction methods for speech enhancement. In: 2017 International Conference on Electrical, Electronics, Communication, Computer and Optimization Techniques (ICEECCOT). IEEE. 978-1-53862361-9/17/2017 7. Nuthakki, R.: Speech enhancement techniques. Int. J. Adv. Res. Sci. Eng. 6(8) (2017, August) 8. Kim, G.: Binary mask estimation for noise reduction based on instantaneous SNR estimation using Bayes risk minimisation. Electron. Lett. 51(6), 526–528 (2015, 19 March) 9. Nuthakki, R., Sreenivasa Murthy, A.: Enhancement of speech intelligibility using binary mask based on noise constraints. Int. J. Recent Technol. Eng. (IJRTE). 8(3) (2019, September). ISSN: 2277-3878 10. Li, N., Loizou, P.C.: Factors influencing intelligibility of ideal binary-masked speech: implications for noise reduction. Acoust. Soc. Am. (2008). https://doi.org/10.1121/1.2832617 11. Nuthakki, R., Sreenivasa Murthy, A., Naik D.C.: Enhancement of speech intelligibility using binary mask based on channel selection criteria. Int. J. Recent Technol. Eng. (IJRTE) 8(5) (2020, January). ISSN: 2277-3878 12. Chen, F., Loizou, P.C.: Impact of SNR and Gain-Function Over- and Under-Estimation on Speech Intelligibility. Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75083-0688, USA. Accepted 8 Sept 2011 13. Kim, G., Loizou, P.C.: A New Binary Mask Based on Noise Constraints for Improved Speech Intelligibility. Department of Electrical Engineering, University of Texas at Dallas, USA, ISCA 1632, 26–30 Sept 2010, Makuhari, Chiba, Japan INTERSPEECH 2010 14. Ma, J., Hu, Y., Loizou, P.C.: Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions. In: 2009 Acoustical Society of America. https://doi.org/10.1121/1.3097493 15. Hu, Y., Loizou, P.C.: Subjective Comparison and Evaluation of Speech Enhancement Algorithms. Elsevier B.V (2006)

Evolutionary Algorithm for Solving Combinatorial Optimization—A Review Anisha Radhakrishnan and G. Jeyakumar

Abstract Evolutionary computing (EC) has attained remarkable competency in both research and industry, and its efficiency in addressing combinatorial optimization problems (COPs) has gained wide popularity. The exploration of bio-inspired algorithms for solving COPs has experienced a notable shift from classic algorithms to hybridized and co-evolutionary algorithms. This paper presents a broad study of the different approaches that exist for solving COPs. In particular, a detailed explanation of evolutionary algorithms (EAs) and their usage in solving COPs is presented. The study also details the algorithmic adjustments to be made to EAs and their possible integration with other meta-heuristics in order to make them apt for solving COPs. Keywords Evolutionary algorithm · Combinatorial optimization problem · Discrete optimization problem · Continuous optimization problem

1 Introduction An evolutionary algorithm (EA) is a global, generic, population-based, parallel search optimization technique inspired by natural evolution. Traditionally, evolutionary programming (EP), evolution strategies (ES), genetic algorithms (GA), and genetic programming (GP) form its family. Other bio-inspired algorithms in the domain are particle swarm optimization (PSO), ant colony optimization (ACO), biogeography-based optimization (BBO), cuckoo search (CS), artificial bee colony (ABC), learning classifier systems (LCS), differential evolution (DE), and estimation of distribution algorithms (EDA). Combinatorial optimization is a branch of mathematics where an optimal solution is found from a finite set of possible solutions. Over the last decades, researchers have explored EAs extensively for solving complex COPs. Although several approaches are available to solve COPs, EAs have outperformed other approaches in solving COPs and finding solutions in polynomial run time [1]. The practical applications of combinatorial optimization can be seen across real-world domains. As stated in the "No Free Lunch" theorem [2], there is no single globally optimal algorithm that can solve all problems. This survey emphasizes real-world COP applications where EAs are applied. It reviews the approaches embedded with EAs to solve COPs and presents an analysis of how EAs solve real-world and benchmark COPs, along with performance measurements. The rest of this article is structured as follows: Sect. 2 introduces combinatorial optimization problems (COPs), Sect. 3 discusses how EAs are used for solving COPs and summarizes them based on application areas, and Sect. 4 concludes the article.

A. Radhakrishnan, Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, India, e-mail: [email protected]; G. Jeyakumar (B), Amrita Vishwa Vidyapeetham, Ettimadai, India, e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_57

2 Combinatorial Optimization Problems Problems that have multiple feasible solutions but one or more best solutions are called optimization problems, and the process of searching for the best solution among the feasible ones is termed optimization. Optimization problems are categorized into different types. COPs are those whose optimal solutions lie within a finite set of possible solutions; this set is defined by a set of conditions and is too large to search exhaustively. The mathematical techniques for finding the optimal solutions to COPs involve finding an ordering of a finite set of objects (solution components) that satisfies the given conditions. COPs are harder to solve than continuous optimization problems; however, advancements in algorithm design methodologies and computing technologies have made solving COPs easier. There are two categories of approaches for formulating algorithms to solve COPs: (1) exact approaches and (2) heuristic approaches. The exact approach follows a brute-force strategy, and the complexity involved in generating all possible solutions is high. Hence, the idea of finding approximate solutions that are good enough was brought into the picture; the heuristic approaches follow this idea. They do not guarantee that the exact solution will be found but find approximate solutions that are good enough for the problem at hand [3]. This has led to numerous general-purpose heuristics for solving complex COPs in reasonable time, classified as constructive heuristics, meta-heuristics, approximation algorithms, and hyper-heuristics. Constructive heuristics start the process by generating an "empty solution," which is then extended to obtain a complete solution. Meta-heuristics [4] are problem-independent algorithmic frameworks that provide guidelines for constructing optimization algorithms [5]; the most popular meta-heuristics are evolutionary algorithms (EAs) [6], Tabu search [7], simulated annealing [8], and ant colony optimization [9]. The approximation algorithms are a special class of heuristics which guarantee near-optimal solutions within a limited, specified error threshold from the global optimal solution. Integrating operations research and artificial intelligence techniques, the hyper-heuristic approaches aim at developing general algorithms able to generate problem-specific algorithms. The objective of this paper is to present an overview of how evolutionary algorithms (EAs) are used to solve COPs.

3 Evolutionary Algorithms for COPs EAs are stochastic, approximate optimization methods that belong to the subclass of evolutionary computation (EC). The layout of an EA includes population initialization, fitness evaluation, mutation, crossover, and selection; the fittest individuals survive and are added to the population for the next generation. The foremost step in an EA is to initialize the population: the set of initial candidate solutions (also called individuals or chromosomes). Chromosomes can be represented as integers, floats, binary strings, or trees depending on the application, and the number of chromosomes in a population is termed the population size (Np). Different initialization techniques exist, categorized by randomness, compositionality, and generality [10]. Once the population is initialized, the next task is to evaluate the potential of the solutions; two approaches are suggested for this, problem-based and evolutionary-based [11]. The next step is to perform crossover and mutation, the evolutionary operators that produce offspring; the logic of the mutation and crossover strategies is specific to each classical EA, and several variants with improved mutation and crossover strategies have been proposed in the research community. The selection operators choose the candidates for the next generation. A generic sketch of this loop for a permutation-coded COP is given below. Over the past two decades, research in evolutionary computing (EC) has extensively explored the use of EAs for solving COPs. Since EAs are stochastic search algorithms, their probability of finding good approximate solutions from the finite solution space of a COP early in the optimization process is high. An EA solving a COP needs to consider several aspects, and there are different approaches for finding solutions to COPs using EAs. Some EAs are designed specifically for the COP at hand; EAs that were proposed mainly for the continuous domain cannot be applied directly to COPs, and modifications to their representations are needed. These modifications can be broadly classified as ensembles for COPs, evolutionary-operator-based hybridization, and co-evolution [12]. There are several model-based approaches based on ensembles for discrete optimization. Discrete problems can be represented as categorical variables, strings, trees, graphs, permutations, and ordinal integers, and the strategies are (i) naive approach, (ii) custom modeling, (iii) discrete model, (iv) mapping, (v) feature extraction, and (vi) similarity-centered modeling. A summary of various COPs solved by EAs, other meta-heuristics, and their hybridizations is presented in Tables 1, 2 and 3.
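As a concrete illustration of the EA layout described above, the sketch below evolves permutations for a small travelling-salesman instance using tournament selection, order crossover, and swap mutation. The operators, parameters, and the toy distance matrix are generic textbook choices, not those of any surveyed work.

```python
import random

def tour_length(tour, dist):
    """Total length of a closed tour over the distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def order_crossover(p1, p2):
    """OX: copy a slice of p1, fill the rest in the order the cities appear in p2."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    fill = [c for c in p2 if c not in child]
    for i in range(len(p1)):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def evolve(dist, pop_size=50, generations=200, pm=0.2):
    """Minimal generational EA over permutations (tournament + OX + swap mutation)."""
    n = len(dist)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        for _ in range(pop_size):
            p1 = min(random.sample(pop, 3), key=lambda t: tour_length(t, dist))
            p2 = min(random.sample(pop, 3), key=lambda t: tour_length(t, dist))
            child = order_crossover(p1, p2)
            if random.random() < pm:                  # swap mutation
                i, j = random.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            new_pop.append(child)
        pop = new_pop
    return min(pop, key=lambda t: tour_length(t, dist))

# Tiny symmetric toy instance
n = 8
dist = [[abs(i - j) for j in range(n)] for i in range(n)]
best = evolve(dist)
print(best, tour_length(best, dist))
```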


Table 1 Summary for algorithms for COPs (for electrical power systems)

Reference | Algorithm used | Technique
Optimal power flow with ecological emission [13] | Enhanced ACO | Modified structure
Optimal chiller loading using minimum power consumption [14] | Fish algorithm | Hybridization
Optimal reactive power dispatch problem [15] | Ant lion optimizer | Global optimizer
Optimal integration of renewable energy sources [16] | PSO | Modified PSO with operators of DE
Optimal reactive power dispatch problems [17–19] | Enhanced firefly algorithm, teaching-learning-based algorithms, gravitational search algorithm | Hybridization of GA and LS

Table 2 Summary for algorithms for COPs (for routing, traveling salesman, scheduling, and planning)

Reference | Algorithm used | Technique
University time table scheduling [20] | Simulated annealing + GA | Hybridization
Parallel machines manufacturing scheduling [21] | Symbiotic organisms search + simulated annealing | Hybridization
Job scheduling [22] | PSO + simulated annealing | Hybridization
Vehicle routing problem [23] | Tabu search | Modified structure
Vehicle routing problem [24] | Modified PSO | Hybridization
Manufacture scheduling [25] | Hybrid EDA (Markov network-based EDA) | Hybridization
Job scheduling [26] | EDA | Hybridization
Hybrid dynamic berth allocation planning problem [27] | Chemical reaction optimization | Hybridization
Flexible job scheduling [28] | PSO | Hybridization
Constraint shortest path problem [29] | PSO + VNS (variable neighborhood search) | Hybridization (GA + LS)
Traveling salesman problem [30] | ACO + 3-Opt algorithm | Hybridization

Table 3 Summary for algorithms for COPs (for pattern recognition: feature selection, classification, clustering)

Reference | Algorithm used | Technique
Feature selection and classification [31] | ACO + BCO | Hybridization
Feature selection [32] | Artificial bee colony and gradient boosting decision tree | Hybridization
High-dimensional classification [33] | PSO, competitive swarm optimizer (CSO) | Modified structure
Handwritten signature verification [34] | Artificial immune systems | Modified structure
Feature selection in big data [35] | Fish swarm optimization | Modified structure

4 Conclusion This paper presented a survey on using EAs and other meta-heuristics for solving combinatorial optimization problems (COPs). As many real-world COPs are NP-

hard problems adapting EAs for approximate solutions is the most invited possibility. EAs are designed for solving continuous parameter problems, but they are not directly adaptable to discrete domain. We need to have proper mapping method to represent them. Applying genetic operator results in real values, thus proper mapping and searching techniques for global solution is effectively possible. Studies have proved that EAs perform better when they are modified for solving COPs. This paper summarized several COPs solved by EAs with suitable changes made in the algorithmic structure and hybridization.

References 1. Puchinger, J., Raidl, G.R.: Combining metaheuristics and exact algorithms in combinatorial optimization: a survey and classification. In: Mira, J., Álvarez, J.R. (eds.) Artificial Intelligence and Knowledge Engineering Applications: A Bioinspired Approach. IWINAC 2005. Lecture Notes in Computer Science, vol. 3562, pp 41–53. Springer, Berlin (2005) 2. Wolpert, D.H., Macreedy, W.G.: No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1(1), 67–82 (1997) 3. Montiel, O., Díaz Delgadillo, F.J.: Reducing the size of combinatorial optimization problems using the operator vaccine by fuzzy selector with adaptive heuristics. Math. Prob. Eng. (2015) 4. Osman, I.H., Kelly, J.P.: Meta-Heuristics: An Overview. Meta-Heuristics, pp. 1–21. Springer, Boston (1996) 5. Glover, F., Sörensen, K.: Metaheuristics. Scholarpedia 10(4), 6532 (2015) 6. Back, T.: Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, Oxford (1996) 7. Glover, F., Laguna, M.: Tabu search. Handbook of Combinatorial Optimization, pp. 2093–2229. Springer, Boston (1998) 8. Van Laarhoven, P.J.M., Aarts, E.H.L.: Simulated annealing. Simulated ANNEALING: THEORY AND applications, pp. 7–15. Springer, Dordrecht (1987) 9. Dorigo, M., Di Caro, G.: Ant colony optimization: a new meta-heuristic. In: Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), vol. 2. IEEE (1999)


10. Kazimipour, B., Li, X,. Qin, A.K.: A review of population initialization techniques for evolutionary algorithms. In: 2014 IEEE Congress on Evolutionary Computation (CEC). IEEE (2014) 11. Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation. Soft. Comput. 9(1), 3–12 (2005) 12. Bartz-Beielstein, T., Zaefferer, M.: Model-based methods for continuous and discrete global optimization. Appl. Soft Comput. 55, 154–167 (2017) 13. Raviprabakaran, V., Subramanian, R.C.: Enhanced ant colony optimization to solve the optimal power flow with ecological emission. Int. J. Syst. Assur. Eng. Manag. 9(1), 58–65 (2018) 14. Zheng, Z., Li, J.: Optimal chiller loading by improved invasive weed optimization algorithm for reducing energy consumption. Energy Build. 161, 80–88 (2018) 15. Mouassa, S., Bouktir, T., Salhi, A.: Ant lion optimizer for solving optimal reactive power dispatch problem in power systems. Int. J. Eng. Sci. Technol. 20(3), 885–895 (2017) 16. Lorestani, A., Ardehali, M.M.: Optimal integration of renewable energy sources for autonomous tri-generation combined cooling, heating and power system based on evolutionary particle swarm optimization algorithm. Energy 145, 839–855 (2018) 17. Liang, R.-H., et al.: An enhanced firefly algorithm to multi-objective optimal active/reactive power dispatch with uncertainties consideration. Int. J. Electr. Power Energy Syst. 64, 1088– 1097 (2015) 18. Ghasemi, M., et al.: Solving optimal reactive power dispatch problem using a novel teaching– learning-based optimization algorithm. Eng. Appl. Artif. Intell. 39, 100–108 (2015) 19. Chen, G., et al.: Optimal reactive power dispatch by improved GSA-based algorithm with the novel strategies to handle constraints. Appl. Soft Comput. 50, 58–70 (2017) 20. Fredrikson, R., Dahl, J.: A comparative study between a simulated annealing and a genetic algorithm for solving a university timetabling problem (2016) 21. Ezugwu, A.E., Prayogo, D.: Symbiotic organisms search algorithm: theory, recent advances and applications. Expert Syst. Appl. 119, 184–209 (2019) 22. Tang, H., et al.: Flexible job-shop scheduling with tolerated time interval and limited starting time interval based on hybrid discrete PSO-SA: An application from a casting workshop. Appl. Soft Comput. 78, 176–194 (2019) 23. Archetti, C., et al.: An iterated local search for the traveling salesman problem with release dates and completion time minimization. Comput. Oper. Res. 98, 24–37 (2018) 24. Norouzi, N., Sadegh-Amalnick, M., Tavakkoli-Moghaddam, R.: Modified particle swarm optimization in a time-dependent vehicle routing problem: minimizing fuel consumption. Optim. Lett. 11(1), 121–134 (2017) 25. Gen, M., et al.: Advances in hybrid EDA for manufacturing scheduling with uncertainty: part I. In: International Conference on Management Science and Engineering Management. Springer, Cham (2018) 26. Hao, X., et al.: Effective multiobjective EDA for bi-criteria stochastic job-shop scheduling problem. J. Intell. Manuf. 28(3), 833–845 (2017) 27. De, Arijit, et al.: A hybrid dynamic berth allocation planning problem with fuel costs considerations for container terminal port using chemical reaction optimization approach. Ann. Oper. Res. 1–29 (2018) 28. Nouiri, M., et al.: An effective and distributed particle swarm optimization algorithm for flexible job-shop scheduling problem. J. Intell. Manuf. 29(3), 603–615 (2018) 29. 
Marinakis, Y., Migdalas, A., Sifaleras, A.: A hybrid particle swarm optimization–variable neighborhood search algorithm for constrained shortest path problems. Eur. J. Oper. Res. 261(3), 819–834 (2017) 30. Mahi, M., Baykan, O.K., Kodaz, H.: A new hybrid method based on particle swarm optimization, ant colony optimization and 3-opt algorithms for traveling salesman problem. Appl. Soft Comput. 30, 484–490 (2015) 31. Shunmugapriya, P., Kanmani, S.: A hybrid algorithm using ant and bee colony optimization for feature selection and classification (AC-ABC Hybrid). Swarm Evol. Comput. 36, 27–36 (2017)


32. Rao, H., et al.: Feature selection based on artificial bee colony and gradient boosting decision tree. Appl. Soft Comput. 74, 634–642 (2019) 33. Gu, S., Cheng, R., Jin, Y.: Feature selection for high-dimensional classification using a competitive swarm optimizer. Soft. Comput. 22(3), 811–822 (2018) 34. Parmar, M., et al.: State of art survey signature verification techniques 2019. Asian J. Convergence Technol. (AJCT) 5(3), 91–96 (2020) 35. Manikandan, R.P.S., Kalpana, A.M.: Feature selection using fish swarm optimization in big data. Cluster Comput. 22(5), 10825–10837 (2019)

Effect of J48 and LMT Algorithms to Classify Movies in the Web—A Comparative Approach

Prashant Bhat and Pradnya Malaganve

Abstract Social media websites such as Facebook, YouTube, Twitter, etc., are convenient platforms for sharing one's views about multimedia. Millions of videos are uploaded to YouTube every day, in categories such as comedy, sports, news, advertisements, and movie trailers. Data mining researchers are therefore attracted to classification techniques that can discover hidden information and knowledge from such huge video data. The goal of this research is to classify and predict movie trailer videos as poor, good, very good, or excellent movies based on metadata such as likes, dislikes, comments, ratings, and budget. The present work attempts to provide an effective mining result for classifying social media movies, which are labelled based on a particular class and other related attributes of the same dataset. A 10-fold cross-validation test is applied to the J48 and LMT decision tree algorithms, and a comparative analysis is made based on the confusion matrix and accuracy rate.

Keywords Classification · J48 decision tree · LMT decision tree · Social media

1 Introduction

Social media data is very large in size, and much of it is noisy, fuzzy, incomplete, and unstructured in nature. At the same time, it is an essential task to handle such data and discover useful information or knowledge from it. Every day, about a million GB of movie videos are uploaded to social media websites such as Facebook, YouTube, and Instagram [1].

P. Bhat · P. Malaganve (B) Department Computational Science and IT, Garden City University, Bengaluru, India e-mail: [email protected] P. Bhat e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_58


In the proposed work, the rating metadata is used to classify each movie as poor, good, very good, or excellent to watch. A rating is a mark in the range 0–10 given to a movie trailer on YouTube to convey the viewer's opinion about it; the average of the ratings given by different viewers is assigned to the movie and gives users a hint when deciding whether to watch it. The present dataset contains several attributes related to YouTube and Twitter. Another metadata attribute, budget [2], is also considered for comparison; it is divided into three labels: high-budget movie, average-budget movie, and low-budget movie [3, 4]. In this work, the WEKA tool [5] is used to pre-process and classify the dataset considering the class and other related metadata. The J48 [6] and LMT decision trees are used for training and testing the dataset with a 10-fold cross-validation test [7], and a comparative analysis is made between J48 and LMT [8]. The rest of the paper is organised into a literature review, the proposed methodology (including a sample dataset and attribute descriptions), the confusion matrices of both the J48 and LMT decision tree algorithms, findings, and the conclusion.

2 Proposed Model

Figure 1 represents the proposed model for the comparison analysis of the J48 and LMT decision trees based on the confusion matrix and accuracy rate. The data is extracted from the social media platforms YouTube and Twitter and stored in a .CSV file for pre-processing. The pre-processed data is classified into classes such as poor movie, good movie, very good movie, and excellent movie based on the metadata, using the LMT and J48 data mining algorithms. A 10-fold cross-validation test is applied to the pre-processed dataset to divide it into training and testing data and obtain an efficient classification result. Finally, the accuracy rate and confusion matrix of both the LMT and J48 algorithms are generated using WEKA, and the results of the two algorithms are compared and analysed to determine which algorithm provides the best accuracy when classifying the whole dataset of movie trailer videos. A simplified sketch of this evaluation pipeline is given below.
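To make the evaluation step concrete, the sketch below reproduces a comparable 10-fold cross-validation comparison using scikit-learn as a stand-in for WEKA: DecisionTreeClassifier approximates the C4.5-style J48 tree, and LogisticRegression loosely stands in for the logistic models used at LMT leaves. The file name movies.csv, the feature columns, and the label column are assumptions for illustration, not the paper's actual export.

```python
import pandas as pd
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical pre-processed dataset exported from the .CSV stage
data = pd.read_csv("movies.csv")
X = data[["Views", "Likes", "Dislikes", "Comments", "Aggregate followers"]]
y = data["ratings_label"]  # nominal class: poor / good / very good / excellent

models = {
    "J48-like decision tree": DecisionTreeClassifier(),
    "LMT-like logistic model": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    # 10-fold cross-validation: every instance is used for both training and testing
    predictions = cross_val_predict(model, X, y, cv=10)
    print(name)
    print("  accuracy:", round(accuracy_score(y, predictions) * 100, 2), "%")
    print(confusion_matrix(y, predictions))
```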

2.1 J48 Decision Tree

J48 is an open-source implementation of the C4.5 decision tree algorithm. The algorithm follows a divide-and-conquer strategy and uses pruning while constructing the tree [9]. Attribute splits are commonly chosen using the information gain (entropy) measure [10]. The result is a tree structure with a root node, intermediate nodes, and leaf nodes, where each node holds a decision that helps to derive the final result [11].

Fig. 1 Proposed model for the comparison analysis of the J48 and LMT decision trees: data extracted from YouTube and Twitter → data selection (.CSV file) → data pre-processing → classification (data mining algorithms LMT and J48) → 10-fold cross-validation (training/testing) → confusion matrix and accuracy rate → knowledge discovery → comparison analysis

2.2 LMT Decision Tree

A logistic model tree (LMT) is a classification model with an associated supervised training algorithm. It combines decision tree learning with logistic prediction: the tree has logistic regression models at its leaves, which together provide a piecewise (section-wise) model [10, 12].


2.3 Attribute Details of Table 1

• Movie: The dataset contains 232 different movie names, stored under the Movie attribute column.
• Year: The year in which each movie was released on screen.
• Ratings: The rating the movie has received, based on which the movie can be classified.
• Genre: The genre or category of the movie.
• Gross: The gross collection of the movie after its release.
• Budget: The total budget required to make the movie.
• Screens: The number of screens in the USA on which the movie was released.
• Sequels: The number of sequels made after the movie.
• Sentiment: The sentiment score of the movie.
• Views: The number of views of the movie trailer on YouTube.
• Likes: The number of likes of the movie trailer on YouTube.
• Dislikes: The number of dislikes of the movie trailer on YouTube.
• Comments: The number of comments on the movie trailer on YouTube.
• Aggregate followers: The aggregate actor followers on Twitter.

Table 1 lists the attributes of the dataset with their respective data types. To label the class as poor movie, good movie, very good movie, or excellent movie, the “Ratings” attribute is considered, and its numeric values are converted to nominal values using the ranges shown in Table 2.

Table 1 Dataset descriptions

| Name of attribute | Data type | Description |
|---|---|---|
| Movie | Nominal | Name of the movie |
| Year | Numeric | Year in which the movie was projected on the screens |
| Ratings | Numeric | Ratings on the movie |
| Genre | Numeric | Genre |
| Gross | Numeric | Gross income in USD |
| Budget | Numeric | Budget in USD |
| Screens | Numeric | Number of screens in the USA |
| Sequel | Numeric | Sequel |
| Sentiment | Numeric | Sentiment score of the movie |
| Views | Numeric | Number of views of the movie trailer on YouTube |
| Likes | Numeric | Number of likes of the movie trailer on YouTube |
| Dislikes | Numeric | Number of dislikes of the movie trailer on YouTube |
| Comments | Numeric | Number of comments on the movie trailer on YouTube |
| Aggregate followers | Numeric | Aggregate actor followers on Twitter |

Table 2 Classification of rating attribute

| Rating range | Label |
|---|---|
| 0–3 | Poor |
| 3.1–5 | Good |
| 5.1–8 | Very good |
| 8.1–10 | Excellent |

The attribute “Budget” also carried numeric values, which were converted to nominal values to classify the dataset more effectively. The nominal values of the “Budget” attribute are labelled as shown in Table 3.

Table 3 Classification of budget attribute

| Budget range | Label |
|---|---|
| Budget < 800,000 | Low budget movie |
| 800,000 ≤ Budget < 16,000,000 | Average budget movie |
| Budget ≥ 16,000,000 | High budget movie |
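As a concrete illustration of the discretization defined in Tables 2 and 3, the following sketch bins the numeric Ratings and Budget attributes into the nominal labels with pandas before the data is handed to the classifiers; the file name and exact column names are assumptions.

```python
import pandas as pd

data = pd.read_csv("movies.csv")  # hypothetical export of the movie dataset

# Table 2: convert numeric ratings (0-10) into nominal class labels
data["ratings_label"] = pd.cut(
    data["Ratings"],
    bins=[0, 3, 5, 8, 10],
    labels=["Poor", "Good", "Very good", "Excellent"],
    include_lowest=True,
)

# Table 3: convert the numeric budget (USD) into nominal budget labels
data["budget_label"] = pd.cut(
    data["Budget"],
    bins=[0, 800_000, 16_000_000, float("inf")],
    labels=["Low budget movie", "Average budget movie", "High budget movie"],
    include_lowest=True,
)
```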

3 Findings and Results

3.1 10-Fold Cross-Validation Test

Here, the dataset can be divided into a chosen number of folds. With 10 folds, the dataset is split into 10 different sets. In the first iteration, the first set is used as the testing dataset and the remaining 9 sets as the training dataset; in the second iteration, the second set is used for testing and the remaining 9 for training, and so on. Hence, every part of the dataset is eventually used both for training and for testing, as sketched below.
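A minimal sketch of the fold rotation described above, using scikit-learn's KFold; X and y are assumed to be a pandas DataFrame of features and a Series of class labels produced by the pre-processing step.

```python
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

kf = KFold(n_splits=10, shuffle=True, random_state=1)
accuracies = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # One fold is held out for testing; the other nine folds form the training set
    model = DecisionTreeClassifier().fit(X.iloc[train_idx], y.iloc[train_idx])
    acc = accuracy_score(y.iloc[test_idx], model.predict(X.iloc[test_idx]))
    accuracies.append(acc)
    print(f"fold {fold}: accuracy = {acc:.3f}")

print("mean accuracy over 10 folds:", sum(accuracies) / len(accuracies))
```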

3.2 Comparison Analysis of the J48 and LMT Decision Trees

Figure 2 shows that the accuracy of the J48 decision tree is 85.71% and the accuracy of the LMT decision tree is 86.58%; therefore, LMT gives a better result than J48. Observing the confusion matrices in Fig. 3, J48 correctly classified 193 instances as very good movies, 5 instances as good movies, and 0 instances as excellent movies; the dataset contains no poor movies (ratings below 3), so the confusion matrix covers only three classes. LMT, in turn, correctly classified 199 instances as very good movies, 1 instance as a good movie, and 0 instances as excellent movies over the same three classes [13, 14].


Fig. 2 J48 and LMT

=== Confusion Matrix (J48) ===
   a    b    c    <-- classified as
 193    6    2  |  a = Very good movie
  13    5    0  |  b = Good movie
  12    0    0  |  c = Excellent movie

=== Confusion Matrix (LMT) ===
   a    b    c    <-- classified as
 199    2    0  |  a = Very good movie
  17    1    0  |  b = Good movie
  12    0    0  |  c = Excellent movie

Fig. 3 Confusion matrices of J48 (top) and LMT (bottom)

4 Conclusion

As shown in Sect. 3.2, the comparison of the J48 and LMT decision trees indicates that the accuracy of LMT is higher than that of J48, since LMT correctly classified a greater number of instances. Based on the analysis of the confusion matrices, the classification accuracy, and the other measures shown in Fig. 2 (kappa statistic, mean absolute error, root mean squared error, relative absolute error, and root relative squared error), which play a vital role in classifying instances correctly, we conclude that the LMT decision tree is the most suitable classification method for classifying the movie trailer dataset with good efficiency and accuracy.

5 Future Work

In the future, we intend to examine whether all high-budget movies are rated excellent or very good, and whether low-budget and average-budget movies can also achieve good ratings.


References 1. Sharma, A.K., Sahni, S.: A comparative study of classification algorithms for spam email data analysis. Int. J. Comput. Sci. Eng. (IJCSE). 3(5) (2011). ISSN 0975-3397 2. Rangaswamy, S., Ghosh, S., Jha, S., Ramalingam, S.: Metadata extraction and classification of YouTube videos using sentiment analysis. In: 2016 IEEE International Carnahan Conference on Security Technology (ICCST) 3. Algur, S.P., Bhat, P., Kulkarni, N.: Educational data mining: classification techniques for recruitment analysis. Int. J. Modern Educ. Comput. Sci. 2, 59-65 (2016). (Published Online February 2016 in MECS). http://www.mecs-press.org/.10.5815/ijmecs.2016.02.08 4. Bansal, A., Gupta, C.L., Muralidhar, A.: A sentimental analysis for youtube data using supervised learning approach. Int. J. Eng. Adv. Technol. (IJEAT) 8(5, (2019, June). ISSN 2249-8958 5. Weka—Data Mining Machine Learning Software. Available at http://www.cs.waikato.ac.nz/ ml/weka/ 6. Kalmegh, S.R.: Comparative analysis of WEKA data mining algorithm random forest, Randomtree and LADTree for classification of indigenous news data. Int. J. Emerg. Technol. Adv. Eng. www.ijetae.com. 5(1) (2015, January). ISSN 2250-2459, ISO 9001:2008 Certified 7. Bhat, P., Malaganve, P., Hegde, P.: A new framework for social media content mining and knowledge discovery. Int. J. Comput. Appl. (0975 – 8887) 182(36) (2019, January) 8. Kalmegh, S.: Analysis of WEKA data mining algorithm REPTree, simple cart and randomtree for classification of Indian News. Int. J. Innov. Sci. Eng. Technol. (IJISET) 2(2) (2015, February) 9. Nahar, N., Ara, F.: Liver disease prediction by using different decision tree techniques. Int. J. Data Mining Knowl. Manag. Process (IJDKP) 8(2) (2018, March) 10. Algur, S.P., Bhat, P.: Web video mining: metadata predictive analysis using classification techniques. Int. J. Inf. Technol. Comput. Sci. 2, 68–76 (2016). (Published Online February 2016 in MECS) 11. Algur, S.P., Bhat, P.: Abnormal web video prediction using RT and J48 classification techniques. Int. J. Comput. Sci. Eng. 4(6), 101–107 (2016, June). E-ISSN 2347-2693 12. Malika, H., Tiana, Z.: A framework for collecting youtube meta-data. In: Peer-Review Under Responsibility of the Conference Program Chairs. Published by Elsevier B.V. https://doi.org/ 10.1016/j.procs.2017.08.347 13. Algur, S.P., Bhat, P., Ayachit, N.H.: Educational data mining: RT and RF classification models for higher education professional courses. Int. J. Inf. Eng. Electron. Bus. 2, 59-65 (2016). (Published Online March 2016 in MECS, http://www.mecs-press.org/) https://doi.org/10.5815/ ijieeb.2016.02.07 14. Vadhanam, B.R.J., Mohan, S., Ramalingam, V.V., Sugumaran, V.: Performance comparison of various decision tree algorithms for classification of advertisement and non advertisement videos. Indian J. Sci. Technol. 9(48) (2016, December). https://doi.org/10.17485/ijst/2016/ v9i48/102098

A System to Create Automated Development Environments Using Docker

N. S. Akhilesh, M. N. Aniruddha, Anirban Ghosh, and K. Sindhu

Abstract In software development, there are often a great many dependencies that need to be set up and managed before the actual process of development can begin. For instance, before one can start doing Java development, one would need to install and set up the JRE, the JVM and, optionally, Gradle and Maven. Additionally, when working in a team, maintaining consistency in the dependency versions used by everyone becomes necessary as well, since version clashes can often lead to unpredictable behavior and incompatibility issues. To resolve all this, there exist tools such as package managers and the more commonly used Docker. Docker is a tool often used in development to achieve cross-platform automated dependency management and uniformity between development and production environments. While Docker is an incredibly useful tool, it does have a learning curve associated with it, and novice programmers would need to understand concepts such as virtualization before they can start using Docker and benefiting from its various features. All of this motivates the need for an application or system that would allow developers to leverage the power of Docker without requiring any knowledge of it. In this paper, we illustrate and implement such a system, one which allows even novice programmers to easily and effortlessly create automated development environments that leverage Docker under the hood.

Keywords Docker · Automation · Dependency management

N. S. Akhilesh (B) · M. N. Aniruddha · A. Ghosh · K. Sindhu BMS College of Engineering, Bangalore, India e-mail: [email protected] URL: https://bmsce.ac.in/home/Information-Science-and-Engineering-About M. N. Aniruddha e-mail: [email protected] A. Ghosh e-mail: [email protected] K. Sindhu e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_59


1 Introduction Modern applications are often quite complex. Generally, they are composed of a number of software each playing a vital role in the application (e.g., a MEAN stack application uses MongoDB as its database, Express for routing, AngularJS for its frontend and NodeJS for its back-end). Setting up and managing each of these software dependencies (as well as each of their own internal dependencies) can be quite cumbersome, especially in a team of people working on the application. This is where Docker comes in. Docker is a tool that (among other things) allows one to define all the software dependencies of any application in a configuration file called a Dockerfile (or docker-compose.yml file) and feed that configuration file into Docker which will then use the file to create a development environment which has all the dependencies mentioned in the file automatically installed and set up. And in a team, the configuration file can easily be shared via git to ensure uniformity across the team over the application’s dependencies. By automating much of the dependency setup process, Docker has not only solved a great deal of problems related to dependency management (such as version clashes, complications involved in version updates and OS-level interference) but has also made the actual process of development easier [1]. Needless to say, Docker is an incredibly useful software, and its wide scale adoption and use in the industry is reflective of this. In this paper, we focus primarily on Docker’s ability to automate dependency management as we believe that a great deal of programmers (particularly novice programmers and students) can benefit from this feature. Novice programmers in particular often face difficulty setting up dependencies for a language, tool or framework before they can start using it (Ex: Setting up Ruby on Rails on Windows or setting up a MEAN stack application). Docker can be useful in this situation, but it does have an associated learning curve and people who are new to programming or unfamiliar with the topic may need to understand things such as virtualization before they are able to understand Docker. In this paper, we propose a system or application that takes the form of an integrated development environment (IDE) that uses Docker under the hood to set up environments for any language, tool or framework which people can immediately start working with. The end product should be a code editor similar to VS code but one where a developer can additionally type out a few pieces of information (such as a language and its version) and the editor will then automatically set up an environment based on that information in which the developer can start coding. In essence, this is a system which abstracts on top of Docker to allow for its use without having to know how to write a Dockerfile or docker-compose.yml file. Such a system would be useful to novice programmers, students (who want a learn various languages, tools and frameworks without having to worry about setting them up), computer labs (since this one system can replace a number of languages, tools and frameworks that would otherwise need to be set up individually) and developers who are interested in leveraging the power of multiple languages side by side in a Jupyter notebook style operating environment.


2 Literature Survey 2.1 Docker Docker is a tool used to create quick execution environments known as containers to perform various tasks. This concept of using containers to manage various functionalities is known as containerization. Containerization in many ways is the evolution of virtualization [2]. It was standard practice that if one wanted to run several different tasks without those tasks interfering with them, you would run them on individual virtual machines, and a large part of cloud services such as AWS, GCP, Azure still uses virtual machines to provide isolated serviceable environments for clients. But one of the major disadvantages with virtual machines is the amount of overhead they bring. An Ubuntu virtual machine, often contains a large amount of software such as a Web browser, a GUI, a text editor that are largely useless if all you want to do with it is run a REST server. Containers help solve this problem because unlike virtual machines which are entire OSs that take up a certain amount of hardware resources and space, containers are nothing more than lightweight isolated process groups which can do many of the same things that a virtual machine can but without bringing the large amount of overhead associated with virtual machines [3, 4]. Containers can be started and stopped near instantaneously since they are nothing more than specialized processes unlike virtual machines which need time to boot up an entire OS. Containers also take up significantly less resources and can be automated extremely easily [5]. Docker containers have been shown to be faster than virtual machines in boot up times, calculation times, random read, write and mixed speeds and sequential read and write speeds [6]. Containers are able to achieve all these benefits by using technologies which are already provided by the Linux kernel: Control groups (CGroups) and kernel namespaces [3]. Control groups is a special feature of the Linux kernel which allows for the easy allocation, management and monitoring of resources for a given process. It can also be used to set limits to the amount of resources (RAM, CPU, etc.) that a process can use [7]. Kernel namespace can be used to isolate process groups from each other. A container can be assigned a specific namespace, and all processes and resources inside the container are scoped to that namespace and cannot access anything outside that namespace [8]. All in all, these various technologies come together in Docker to create automated isolated lightweight execution environments. Docker has become mainstream in the industry. It is commonly used as a tool for the development, testing and deployment of microservice-based architectures since containers are ideal environments in which microservices can be run in [9]. This is largely because Docker automates the process of setting up networks and connections between the containers. Additionally, Docker gained a great deal of traction in the field of DevOps. Another major benefit provided by Docker is that it creates uniformity between the development, staging and production environments since the same configuration can be used for both environments [10].


2.2 Docker in Research Sample work and implementation are essential in many aspects of scientific research, being able to reproduce the work of specific research has become very vital to its verification and validation by researchers and domain experts. Though reproducing computer software seems significantly simpler than replicating the physical environments of some experiments, the ever-changing nature of software today and the challenges of interoperable dependencies in software can make this task a serious challenge. This is where Docker can prove to be extremely useful as it stands as a far superior solution to existing solutions such as workflow systems and virtual machines. Carl Boettiger illustrates this in his paper where he uses an R statistical environment setup in various conditions (including Docker) and compares them [11]. Additionally, Docker is also useful in automating the various tasks of a workflow system like makeflow and workqueue. Containers can be connected to various points of a workflow’s infrastructure, and there have been several methods produced to manage containers’ images that need to be shared for the execution of tasks [12]. All of this hints to Docker’s extensive use in the field of research, and an area of interest for this paper since replicating environments to test and review peer research is a vital aspect of the field.

2.3 Electron JS

Electron is an open-source framework that can be used for desktop application development. It was created and is maintained by GitHub, which used it to build the Atom editor. Electron combines the Chromium browser and the NodeJS runtime to create fully functioning desktop applications. Because of this, the UI of an application can be developed using standard HTML, CSS and JS, while the core logic of the application is written in NodeJS. A majority of Electron's APIs are written in C++ and Objective-C and are exposed to the core logic via NodeJS bindings [13].

3 Existing Solutions Automation in development is not a new concept. There have been tools to do this even before Docker. So if you are a developer, what are some of the ways you could tackle common issues that arise when working on a coding project such as dependency hell, poor documentation (which can often make it difficult to setup, initiate and work on existing projects) and code rot (referring to code changing behavior due to external circumstances such as OS updates or bug fixes in the languages used by the software) [11].


One approach is to use OS package managers such as APT (Ubuntu), HomeBrew (MacOS) or chocolatey (Windows) to automate installation and management of dependencies and then use workflow systems such as MakeFlow to provide a simple and automated means to build and execute code (instead of having to rely on documentation). Additionally, to avoid code rot, one could set up a controlled environment using virtual machines in which to develop the code in. To make life easier, one could also use a tool like vagrant which automates the process of setting up and managing virtual machines. While the above-mentioned approach is fairly popular (especially in research), it requires a developer to understand and be able to use a large set of tools such as package managers and workflow systems. Additionally, virtual machines can be slow and resource intensive, thus slowing down the speed of active development at the cost of providing consistency and predictability. A better approach would leverage Docker which as discussed above can solve all the issues we discussed earlier: dependency hell, poor documentation and code rot (via configurable isolated lightweight shareable environments called containers for the code to run in). Docker has seen a large deal of adoption in industries and corporations for its ability to automate the development workflow and make the lives of developers easier. So if you are an experienced developer, you could just learn Docker and achieve all the benefits offered by the system we intend to propose in the next section, but Docker is not a simple tool. If you are novice programmer, a student or someone who is completely new to development, you will be less likely to learn something like Docker (since you have not even learnt a language yet) and thus cannot benefit from all the features it provides.

4 Proposed System

As discussed in the previous section, Docker alone already achieves a great deal of automation; its only flaw is its associated learning curve, which can deter people who are new to programming. Therefore, in this paper, we propose a solution that abstracts the features of Docker and provides them to the user through an easy-to-use, simplified interface, allowing the user to leverage Docker without knowing how to use it. Our proposed system is an IDE in which a user enters a few details in a form, and the IDE then uses that information to set up a development environment by creating the required Dockerfiles and docker-compose.yml files, as well as the required language files for the specific language, tool or framework that the user wishes to work with. A rough sketch of this idea follows.
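As a rough illustration of how such an abstraction could work, the sketch below turns a few form inputs into a generated Dockerfile without the user writing one. The form fields and the produced file are simplified assumptions for illustration, not the actual templates used by the proposed system.

```python
from pathlib import Path

def generate_dockerfile(form, project_dir="."):
    """Create a minimal Dockerfile from the details a user enters in the IDE's form."""
    # form is assumed to look like: {"language": "node", "version": "14", "entry": "index.js"}
    image = f"{form['language']}:{form['version']}"
    lines = [
        f"FROM {image}",
        "WORKDIR /app",
        "COPY . /app",
        f'CMD ["{form["language"]}", "{form["entry"]}"]',
    ]
    path = Path(project_dir) / "Dockerfile"
    path.write_text("\n".join(lines) + "\n")
    return path

# Example: what a novice user might type into the form for a NodeJS project
generate_dockerfile({"language": "node", "version": "14", "entry": "index.js"})
```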


The proposed system will also allow for these development environments to be shareable. This is done by allowing for each development environment to be created using a minimal configuration file. Adding this configuration file to any normal project will make it compatible with the system, and we refer to such a project as a recipe. These projects (recipes) can then be shared and managed via Git allowing for them to be community driven and customizable. The team behind the proposed system will maintain official recipes for various languages that will act as both stable defaults and base recipes. Any individual can then build on these official recipes to create customized and personalized recipes, and this would be especially useful in private organizations where development teams might have their own custom setups for products.

5 Mechanism

Assuming that all the OS and software requirements are fulfilled, the application works in the following way (let us assume that a user of the application wishes to execute some code in NodeJS). First, the application pulls a recipe template for NodeJS from GitHub/GitLab/BitBucket (the official template by default, although a custom one can be specified by the user); recall that a recipe is merely a project with a special configuration file, and template here refers to a Handlebars template. The template is stored in a special directory reserved for the application by the OS, which is usually:

• Windows XP—C:/Documents and Settings/USERNAME/Application Data
• Windows 7 and above—C:/Users/USERNAME/AppData/Roaming
• MacOS—/Users/USERNAME/Library/Preferences
• Linux—/home/USERNAME/.local/share

Then, the application gets any inputs that were specified by the creator of the recipe and renders them as a form to the user. The application takes the output of the form, uses it to fill out the recipe template, and places the completed recipe in a directory local to the project in which the user is working (usually a directory called “judip_recipes”). After the recipe is added, it appears on the frontend as a codeblock where the user can enter code of the corresponding type (in this case NodeJS), and the entered code gets saved to the locally stored recipe. Finally, the application reads the newly installed recipe’s configuration file, which contains “execute” and “execute_background” keys that the application uses to execute the recipe.
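The following sketch illustrates this final step under stated assumptions: the recipe.json layout with "execute" and "execute_background" keys, the image field, and the use of the Docker SDK for Python are illustrative guesses at how such a step could look, not the project's actual implementation.

```python
import json
from pathlib import Path

import docker  # Docker SDK for Python (docker-py)

def run_recipe(project_dir, recipe_name, background=False):
    """Read a recipe's config and execute its command inside a container."""
    recipe_dir = Path(project_dir) / "judip_recipes" / recipe_name
    config = json.loads((recipe_dir / "recipe.json").read_text())  # hypothetical config file

    # Pick the foreground or background command from the recipe config
    command = config["execute_background"] if background else config["execute"]

    client = docker.from_env()
    # Mount the recipe directory into the container and run the command in it
    return client.containers.run(
        image=config["image"],      # e.g. "node:14-alpine" for a NodeJS recipe
        command=command,            # e.g. "node index.js"
        volumes={str(recipe_dir): {"bind": "/workspace", "mode": "rw"}},
        working_dir="/workspace",
        detach=background,          # foreground runs return their output; background runs keep going
    )
```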


Fig. 1 NodeJS code being executed by the IDE in an expanded codeblock

6 Results

To demonstrate the proposed system in use, we developed a PoC desktop application implemented as a command line interface (CLI) built using NodeJS and a graphical user interface (GUI) built using ElectronJS. The CLI contains the core logic of the application that interacts with Docker under the hood to provide environments to code in, and the GUI provides an interface through which the user can interact with the CLI seamlessly. The screenshots below depict the application running code written in NodeJS on a laptop with a Windows 10 OS, an Intel i5-6300HQ processor, 8 GB of RAM, an Nvidia GTX 960M graphics card and a 256 GB SanDisk SSD (Figs. 1 and 2). The execution was performed using the official NodeJS recipe for the proposed system, which can be found at https://github.com/izalus; this repository also contains all the code used to create the application (both CLI and GUI) as well as official recipes for other languages such as C, C++, Java and Python.


Fig. 2 NodeJS code being executed by the IDE in an un-expanded codeblock

7 Conclusion

In this paper, we illustrated the concept of a recipe, a configuration that the application uses to recreate development environments and that enables them to be shared easily via any public distribution platform such as GitHub or BitBucket. We built an application using a range of technologies (NodeJS, Electron, ReactJS, Git and Docker) that allows one to create automated development environments for various languages, tools and frameworks. The application is split into a command line interface that is shared through NPM and a graphical user interface that is shared using GitHub or Bintray.

References 1. Willis, J.: Docker and the Three Ways of DevOps. https://goto.docker.com/rs/929-FJL-178/ images/20150731-wp_docker-3-ways-devops.pdf 2. Turnbull, J.: The Docker Book: Containerization is the New Virtualization (2014) 3. Docker. https://www.docker.com/ 4. Linux Containers. https://linuxcontainers.org/ 5. Merkel, D.: Docker: Lightweight Linux Containers for Consistent Development and Deployment. https://www.seltzer.com/margo/teaching/CS508.19/papers/merkel14.pdf 6. Rad, B.B., Bhatti, H.J., Ahmadi, M.: An introduction to Docker and analysis of its performance. IJCSNS Int. J. Comput. Sci. Netw. Secur. 17(3) (2017). http://paper.ijcsns.org/07_ book/201703/20170327.pdf 7. Linux control groups. http://man7.org/linux/man-pages/man7/cgroups.7.html 8. Linux namespaces. http://man7.org/linux/man-pages/man7/namespaces.7.html 9. Anderson, C.: Docker [Software engineering]. IEEE Softw. 32(3), 102-c3 (2015). https:// ieeexplore.ieee.org/document/7093032


10. Zhang, Q., Liu, L., Pu, C., Dou, Q., Wu, L., Zhou, W.: A Comparative Study of Containers and Virtual Machines in Big Data Environment (2018). https://arxiv.org/pdf/1807.01842.pdf 11. Boettiger, C.: An Introduction to Docker for Reproducible Research, with Examples from the R Environment (2014) 12. Zheng, C., Thain, D.: Integrating Containers into Workflows: A Case Study Using Makeflow, Work Queue, and Docker (2015) 13. Electron. https://www.electronjs.org/

Novel Methodologies for Processing Structured Big Data Using Hadoop Framework

Prashant Bhat and Prajna Hegde

Abstract There are many tools and techniques to store and process data, but traditional systems fail to handle big data because of its structure, size, and other characteristics. For this reason, many new tools have been developed, and Hadoop is one of them. Hadoop is a framework that contains many tools to manage big data. Apache Hadoop includes a tool called Hive, which can be used to process big data in structured form. There are many ways in which big data can be processed, but if a user is not well versed in programming and knows a query language like SQL, the required information can still be retrieved by using Hive. Using Apache Hive not only reduces the number of lines of code but also saves the programmer's time. This paper explains the working of Hive along with an illustration of how useful data can be retrieved using HiveQL, and presents an effective way of achieving big data analytics using Hadoop Hive.

Keywords Big data · Big data analysis · Hadoop · Hive · MapReduce

1 Introduction

The data obtained in huge volumes from different sources can be in any form: structured, unstructured, or even semi-structured. Based on whether the data is structured, unstructured, or semi-structured, an appropriate tool can be selected in order to extract useful data. The challenge is to obtain something of value for the user from the huge amount of data gathered from various sources [1]. Analysis of big data [2] yields information that users can apply to implement new ideas in their business, thereby increasing its efficiency. Getting information from huge datasets can also help in monitoring financial transactions.

565

566

P. Bhat and P. Hegde

retrieval can be done from big data [3] that can be used in healthcare, crime, and other fields. The data that flows into the system needs to be analyzed effectively and quickly to get useful data [4]. Hadoop enables storing and processing of big data in distributed systems across a cluster of computers using simple programming model. Hadoop MapReduce is one of the data processing technique which can be applied to perform big data analytics. Another Hadoop tool Apache Hive can be used to process data quickly. Hive is a data warehouse system which allows user to write query similar to SQL and helps to get appropriate answer to the query. Hive [5] is used to analyze huge amount of data stored in HDFS. This paper gives insight into working with big data using Hive. Also, this paper explains the flow of data in Hive system, and the way by which query is processed [6]. This paper also focusses on characteristics of Hive system.

2 Related Work Authors Kan et al. [7] presented a paper, where they say Electronic Health Record (HER) store information in digital format. Due to technological innovations, data in HER is increasing. Effective techniques are needed to store and analyze and interpret heterogeneous data in HER. In this paper, authors focused on techniques by which data in HER can be analyzed, and required information can be retrieved. Hive queries are executed in HDFS. This paper also shows the use of tableau as a data analysis technique to get meaningful information from visual graph. Authors Potharaju et al. [8, 9] say Hadoop is not a software. It cannot be downloaded directly into the computer. Hadoop is a framework that contains many tools. It makes use of ad hoc queries and analyses huge datasets stored in Hadoop. SQL like language is facilitated by Hive called HiveQL. In this paper, authors presented simple examples for using Hive using Hadoop. This paper also explains how to create table and store data in table along with getting data from table when required. Cumulative CPU time, time required for fetching records from files are also explained in this paper. In this paper, author says Apache Hive which provides analytical power to the users and organizations. Hence, it has become in practice standard for SQL. Hive is created by Facebook in 2008. This paper compares Apache Hive with Impala, Shark, and also HAWQ. This paper explains the strength of Hive which has become enterprise SQL data warehouse. Authors Thusoo et al. [10] presented a paper, where they say that warehouse system is becoming expensive as datasets to be analyzed are growing with high frequency. MapReduce programming model needs user to write custom programs which make it low-level model. This paper explains Hive architecture, Hadoop, HiveQL language. HiveQL allows user to add custom MapReduce scripts to queries. HiveQL contains tables, supportive data type, arrays, maps, etc. Along with this, Hive also has meta store which contains schemas and statistics that can be used for data

Novel Methodologies for Processing Structured …

567

exploration, query optimization, and compilation. Authors have explained structure of Hive warehouse in this paper. Author Gupta [11] presented a paper, where he says it is very difficult as well as costlier to process huge amount of data in traditional way. Popular framework called Hadoop written in java is used by companies like Facebook, Yahoo, etc. These companies use Hadoop to process huge data in commodity hardware. Hive is used to process structured data using query processing called HiveQL which is similar to SQL. HiveQL is introduced by Facebook. Hive contains metastore, system catalog which are useful while doing query optimization, data exploration.

3 Proposed Methodology Data growing exponentially with time can be stored and processed using framework called Hadoop. It provides different tools like MapReduce, Hive, Tez, etc. MapReduce is a programming model which processes data using two steps called map and reduce. User needs to write lengthy programs in order to work with data. Rather than writing long codes, it is easy to query dataset to retrieve information. Hive is built to work with structured data in the way same as SQL. Hive can be used to query data which is in huge volume same as SQL. Hive is a warehouse infrastructure developed on top of Hadoop. Hadoop Hive architecture is a solution to manage big data. It works on data which is stored in HDFS. Hive uses language called HiveQL to query data. Hive works on the principal of write once and read many times. Hive is a mirage which processes data using MapReduce but no need to write long code for the user. Hive query is converted into MapReduce program, and data is retrieved and provided to the user. Hive is just a translator which makes work of user much easier (Fig. 1).

3.1 UI User interface is an interface between user and Hive. It allows user to communicate with Hive. It provides Hive which provides command line interface, web interface, and thrift server to the users to submit their queries.

3.2 Metastore It stores structure details of partition, tables. Information like number of columns in table, data types of databases are stored in metastore. Hive uses derby SQL server as default metastore.

568

P. Bhat and P. Hegde

MapReduce

User Interface 1 Execute query

Web Interface

Name Node

Hive CLI

Resource Manager

Thrift 6 Execute

HDFS Execution En3 Get metadata

2 Get plan Driver

Compiler 5 Send plan

Metastore 4 Send metadata

Fig. 1 Hive data flow

3.3 Executing Engine Its work is to execute the plan developed by compiler. To execute the work plan, it interacts with name node, resource manager. It communicates with data node, where actual data is stored. It also communicates bidirectionally with metastore to perform data definition language operations. After communication with Hadoop daemons like data node, name node, and job tracker execution engine executes query on HDFS. The result generated is sent to user interface via driver.

4 Characteristics of Hive Hive is a data warehouse infrastructure which resides on the top of Hadoop and is used to analyze big data. Some of the characteristics of Hive are as follows (Fig. 2): • Large data: Hive is a tool that can be used to process data with huge volume. • Language: Hive uses a query language called HiveQL. • Table structure: Hive stores data in table format. That is, it stores data in terms of rows and columns. • Data analysis: Hive is used to retrieve useful information from large dataset. Hence, it helps in data analysis. • Storage: Hive works on data stored in Hadoop distributed file system. • Multi-user: More than one user can query data stored in Hadoop distributed file system at the same time using HiveQL language provided by Hive.

Novel Methodologies for Processing Structured …

569

Large Data Language

Multi User

Hive Table Structure

Storage Data Analysis

Fig. 2 Characteristics of hive

Hive can be used to query huge database which cannot be queried using structured query language (SQL). Queries are converted into series of MapReduce jobs. Hence, user need to to write long MapReduce codes. Hive uses a query language called HiveQL to get useful work done (Fig. 3). Consider a dataset of Zomato restaurants in India. It is a huge dataset which can be effectively queried using Hive. It contains following attributes: • Res_id: It represents restaurants id. • Name: It represents name of the restaurant. • Establishment: It gives details of restaurant whether it is dhaba, quick bites, casual, etc. • Url: It represents url address. • City: represents the city name, where the restaurant is located at. • City_id: It represents city id number. • Locality: It represents locality of restaurant. • Latitude: It gives latitude coordinate of restaurant. • Longitude: It gives longitude coordinate of restaurant.

Fig. 3 Zomato India dataset

570

• • • • • • • • • • • • • • • •

P. Bhat and P. Hegde

Zipcode: It represents zipcode of restaurant. Country_id: It represents id of country in which restaurant resides. Locality_verbose: It gives locality of restaurant along with city. Cuisines: It gives details about type of cuisines available in restaurant. Timings: It represents timings during which restaurant provides service. Average_cost_for_two: It represents cost for two customers in different cuisines. Price_range: It gives details about range of price for food. Currency: It represents type of currency. Highlights: It represents highlight of the corresponding restaurant. Aggregate_rating. Rating_text: It represents text based on ratings. Votes: It represents number of ratings. Photo_count: It gives details about photo count. Opentable_support: It represents whether open table support is provided or not. Delivery: It represents whether online delivery or not Takeaway: It represents whether takeaway service is there or not.

As mentioned, this dataset has number of columns and rows. The data is huge and cannot be queries using SQL [12]. This can be made possible with the help of Hive. The process will take time but task can be made possible. Big data analysis can be done using Hive by executing series of queries. Here, an attempt is made to retrieve information from Zomato restaurant dataset. To analyze this dataset [13], first database called "dataset" is created. This can be done using following command. • CREATE DATABASE dataset; Next, a table called zomato1 is created in database that we have created. This can be done by following command. • USE dataset; • CREATE TABLE zomato1 (Res_id, ineteger, Name string, Establishment string, Url string, Address string, City string, City_id string, Locality string, Latitude float, Longitude float, Zipcode integer, Country_id integer, Locality_verbose string, Cuisines string, Timings string, Average_cost_for_two integer, Price_range integer, Currency string, Highlights string, Aggregate_rating float, Rating_text string, Votes integer, Photo_count integer, Opentable_support integer, Delivery integer, Takeaway integer). row format delimited. fields terminated by “→”; Above commands create a database called dataset. And creates a table called zomato1 inside “dataset” database. Next step is to load dataset into the created table that is zomato1. To do so, following command is used. • LOAD DATA LOCAL INPATH ‘/home/prajna/Desktop/zomato_restaurants_in_ India.txt’ into table zomato1;

Novel Methodologies for Processing Structured …

571

Fig. 4 Output sample

Once dataset has been added to the table, it can be queried as required by the user. Now, dataset has been places in Hadoop distributed file system. The useful information can be retrieved from the dataset stored in Hadoop distributed file system by writing the query in HiveQL language. A. Display the number of restaurants in Panaji. SELECT COUNT(*) from zomato1 WHERE locality = “Panaji”; This query gives the total umber of restaurants which provide Zomato service in the city Panaji. B. Display different names of cities which are mentioned in the dataset. SELECT DISTINCT(city) FROM zomato1; This query returns different names of the restaurants given in the dataset. C. Display names and average cost per two person of restaurants located in Amritsar. SELECT name, average_cost_per_two from zomato1 where city = “Amritsar”; This dataset gives the list of restaurant names and respective cost per two people. D. How many restaurants are providing zomato service in Udupi? SELECT COUNT(CITY) from zomato1; This query gives total number of restaurants in Udupi city (Fig. 4).

5 Conclusion In this paper, we discuss how data flows in Hive and how it processes data. Along with characteristics of Hive, this paper explains some of the novel examples for

572

P. Bhat and P. Hegde

creating, storing, and retrieving useful information using Hive QL command. Big data is mainly recognized by volume, velocity, and variety. It can be structured, unstructured, and semi-structured. Big data can be analyzed by using technique like MapReduce. But MapReduce expects user to write code to get useful information from stored data. But if the data stored is in structured format, then it can be analyzed using Hadoop Hive which required user to write query instead of long programming code. It not only saves the time of user but also helps user who does not have much expertise in coding. It processes the data by storing it in the form of rows and columns, i.e., in table format. Hive converts query written in Hive QL language to MapReduce tasks and processes the data.

References 1. Peng, X., Liu, L., Zhang, L.: A hive -based retrieval optimization scheme for long-term storage of massive call detail records. IEEE Access 1–1. https://doi.org/10.1109/Access.2019.2961692 2. Shakhovska, N., Veres, O., Mariia, H.: Generalized formal model of big data. ECONTECHCHMOD Int. Q. J. 5(2), 33–38 3. Kapil, G., Agrawal, A., Khan, R.A.: Big data security issues. Asian J. Comput. Sci. Technol. 7(2), 128–133 4. Pandey, P., Satsangi, C.S.: Comparative performance using Hadoop ecosystem-PIG and HIVE through rendering of duplicates. ICANI2018. https://doi.org/10.1007/978-981-13-2673-8_11 5. Krishna Mohan, K.V.N.: Query optimization in big data Hadoop using hive 4(1), 2347–9272 (2016) 6. Pushpalatha, N., Sudheer, P.: Data processing in big data by using hive interface 3(4), 2321– 7782 (2015) 7. Kan, K., Cheng, X., Kim, S.H., Jin, Y.: Apache hive-based big data analysis of health care data. Int. J. Pure Appl. Math. 119(18), 237–259 (2018) 8. Potharaju, S.P., Shanmuk Srinivas, A., Tirandasu, R.K.: Case study of hive using Hadoop. Int. J. Eng. Res. Technol. 3(11) (2014). ISSN: 2278–0181 9. Pushpa, S.K., Manjunath, T.N.: Analysis of airport data using Hadoop-hive: a case study. Int. J. Comput. Appl. 0975–8887 (2016) 10. Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Zhang, N., Antony, S., Liu, H., Murthy, R.: Hive a petabyte scale data warehouse using Hadoop. In: Proceedings of 26th International Conference on Data Engineering, California, USA, pp. 996–1005. https://doi.org/10.1109/ ICDE.2010.5447738 11. Gupta, A.: HIVE-processing structured data in Hadoop. Int. J. Sci. Eng. Res. 8(6), 2229–5518 (2017) 12. Patel, N.: Analyzing of vehicle registration trend in NY using HBase, pig, hive and MapReduce. https://doi.org/10.13140//RG.2.2.18574.92488 13. Amiripalli, S.S., Tirandasu, R.K.: Case study of hive using hive. Int. J. Curr. Eng. Sci. Res. 1(3), 2393–8374 (2014) 14. Dubey, A.; Big data. Int. J. Eng. Serv. Manage. Res. 5, 9–12 (2018). https://doi.org/10.29121/ ijetmr.v5.i2.2018.606. 15. Manike, C., Nanda, A.K., Gajulagudem, T.: Hadoop Scalability and Performance Testing in Homogeneous Clusters. https://doi.org/10.1007/978-3-030-30577-2_81

Intelligent Cane for Assistant to Blind and Visual Impairment People
Meet Patel, Hemal Ahir, and Falgun Thakkar

Abstract Everyone wants the freedom to go anywhere and everywhere in their life, but some cannot have it due to their compromised vision. An electronic mobility aid is proposed in this paper to help make their life easier and more convenient. The National Institutes of Health of the United States published an article on NCBI reporting the problems faced by blind people in navigation and obstacle detection. This work aims to overcome these problems for visually impaired persons by using technology and Internet of Things (IoT) platforms such as the ThingSpeak and IFTTT servers. The mobility aid helps a guardian or family member navigate the user. In addition, if the person tumbles, an alert message is sent to their close ones so that help can reach them. Existing systems tend to be bulky, so to overcome this restriction, all the components of this device are fitted within the stick itself.
Keywords Blind stick · Internet of Things · GPS module · ESP8266

1 Introduction According to the latest research conducted by the World Health Organization, there are at least 2.2 billion people suffering from vision impairment or blindness, of whom around 1 billion have a vision impairment that could have been prevented or has yet to be addressed [1]. In India, there are about 40 million blind people, of whom 1.6 million are children [2]. The major reasons are infections, diabetic retinopathy,

M. Patel (B) · H. Ahir · F. Thakkar G H Patel College of Engineering and Technology, Bakrol Road, Vallabh Vidhyanagar, Gujarat, India e-mail: [email protected] H. Ahir e-mail: [email protected] F. Thakkar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_61


age-related macular degeneration, cataract, and glaucoma [1]. Among all these, cataract is the most common cause of blindness. Blind people may require assistance in certain circumstances but do not always get it, so this assistive aid is designed to guide them in place of the old and traditional white cane or guide dogs. Adopting assistive technology helps make their life comfortable. Some remarkable work has been done in the field of electronic mobility aids, which is discussed in the next section; a smart stick is one such electronic travel aid, but it has some hurdles, so an updated system is proposed to overcome them. In this system, navigation of the user is possible, and the guardian of the user also gets alert emails and calls whenever the user tumbles. The major problem with existing systems is that they are bulky and complicated for a user to use and understand; to avoid this, all the modules in this system are implemented within the stick, and it is foldable, which makes it convenient in terms of mobility.

2 Related Works Many significant works have already been done in this domain; several researchers have come up with good ideas and built their projects, ameliorating traditional mobility aids by adding various electronic sensors. Different designs of electronic travel aids (ETAs) are discussed below along with their functionality.

2.1 Design and Implementation of Mobility Aid for Blind People All the electronics are implemented in a jacket mounted with five ultrasonic sensors: one sensor detects potholes or stairs, another detects obstacles near the head, and the remaining three cover the right, left, and front directions [3]. A salient feature is that the microcontroller finds the minimum value from these three ultrasonic sensors and notifies the user about obstacles through voice commands pre-installed on a micro SD card [3]. The downside of this system is that people will not find it comfortable to wear all the time.

2.2 Smart Stick for Blind and Visually Impaired People This device was developed by Mukesh Agrawal and Atma Ram Gupta. This version of the stick includes ultrasonic and water sensors [4]. Both these sensors help in


obstacle detection and water detection, as their names suggest. The stick is integrated with the SIM808 module, which supports the Global System for Mobile communication (GSM) and merges GPS technology for satellite navigation [4]. A SIM card is used to enable communication identical to that of regular cell phones. The drawback of this gadget is that its design is highly complex, and the modules do not fit inside the stick.

3 Proposed System 3.1 Electronic Component and Their Usage Ultrasonic sensor. The working principle of the ultrasonic sensor is identical to that of a radar system. The basic difference between sonar and ultrasonic sensing is that sonar is used underwater with both high and low frequencies, while ultrasonic sensing is used on the terrain surface and only uses high frequencies. The electrical signals provided to the ultrasonic sensor are converted into acoustic waves and vice versa; the ultrasonic wave is a type of acoustic wave. The ultrasonic sensor (HC-SR04) generates an acoustic wave at 40 kHz, well above the roughly 18 kHz upper limit of human hearing, and the wave travels through the free medium [5]. It provides a range from 2 to 400 cm. When the microcontroller applies a 10 µs pulse to the trigger pin, the sensor emits a burst of eight acoustic waves and a timer is started at the same instant [5]. The timer stops immediately after the reflected acoustic waves are received. The primary aim of this sensor in the blind stick is obstacle detection (potholes, staircases, and many more) by measuring the time difference between transmitting and receiving the signal; the distance can be calculated using this formula:

Distance = (Time Taken × Speed of Sound) / 2
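As a minimal illustration of this timing calculation (not the authors' firmware, and assuming a speed of sound of roughly 343 m/s, i.e. 0.0343 cm/µs), the conversion from echo time to distance can be written as:

```python
# Sketch only: convert an HC-SR04 echo time into a distance.
# Assumes the speed of sound is ~343 m/s (0.0343 cm/us) at room temperature.
SPEED_OF_SOUND_CM_PER_US = 0.0343


def echo_time_to_distance_cm(echo_time_us: float) -> float:
    # Divide by 2 because the wave travels to the obstacle and back.
    return (echo_time_us * SPEED_OF_SOUND_CM_PER_US) / 2.0


if __name__ == "__main__":
    print(echo_time_to_distance_cm(580))    # ~10 cm
    print(echo_time_to_distance_cm(2900))   # ~50 cm
```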

MPU-6050 IMU. This module has six outputs, three from the accelerometer and three from the gyroscope, so it is also known as a six-axis motion tracking or six-degrees-of-freedom device [6]. It uses a Micro-Electromechanical System (MEMS) and the Coriolis effect to calculate force and angle in the respective plane. The accelerometer gives gravitational acceleration, and the gyroscope gives angular velocity (the rate of change of angular position over time) along the x-, y- and z-axes. This module uses I2C as its communication protocol with other devices [6]. The MPU-6050 IMU also has an embedded thermistor and a Digital Motion Processor (DMP); the DMP is used to compute motion-processing algorithms. This sensor plays a vital role by giving the angle of the stick with respect to the ground. A pictorial representation of the MPU-6050 IMU is given in Fig. 1.
Speaker and vibration motor. A speaker of 8 Ω is connected to obtain maximum output. Both these components alert the user about an obstacle or problem on their way. If the person falls, then the speaker activates and will divert the public toward


Fig. 1 MPU6050 [9]

them so that they can get help. Also, as the distance between the obstacle and the user decreases, the intensity of the vibration motor increases.
GPS NEO-6M. GPS (Global Positioning System) works on the basic mathematical principle of trilateration: the position is determined by calculating the distance between the receiver and the satellites. Nowadays, the NEO-6 module series is very popular due to its cost-effectiveness, on-board memory chip, miniature package and high performance; it also has a ceramic patch antenna and a backup battery [7]. This series is based on the u-blox NEO-6M GPS engine. The module works well with a DC input in the 3.3–5 V range [7]. UART and USB are some well-known communication protocols supported by this module. The GPS module sends raw data in the form of NMEA messages [7]. The user's 2D location (latitude and longitude) can be determined by the receiver using at least three satellites, and movement can be tracked. By using four or more satellites in view, the receiver can determine the user's 3D position (latitude, longitude, and altitude).
Node MCU. In this project, we have used a NodeMCU DevKit board having the ESP8266 as its microcontroller. This microchip integrates a Wi-Fi SoC with low power consumption. The ESP8266 chip operates between 3 and 3.6 V [8]. The NodeMCU module has a total of 30 pins, of which 17 are GPIO pins carrying all the peripheral duties: ADC channel, UART interface, PWM output, and SPI, I2C and I2S interfaces [8]. It includes four power pins: three 3.3 V pins and one Vin pin. This microcontroller connects the system to the Internet and makes the blind stick a part of the IoT so that it can be accessed from anywhere in the world. The microcontroller board is shown in Fig. 2.
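As an illustration of how the latitude and longitude mentioned above can be pulled out of a raw NMEA sentence, a generic sketch is given below; this is not the firmware used in the prototype, and the example $GPGGA sentence is made up:

```python
# Sketch only: extract decimal-degree latitude/longitude from a $GPGGA NMEA sentence.
def parse_gpgga(sentence: str):
    fields = sentence.split(",")
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")

    def to_decimal(value: str, hemisphere: str, deg_digits: int) -> float:
        # NMEA encodes position as (d)ddmm.mmmm; convert to decimal degrees.
        degrees = float(value[:deg_digits])
        minutes = float(value[deg_digits:])
        decimal = degrees + minutes / 60.0
        return -decimal if hemisphere in ("S", "W") else decimal

    lat = to_decimal(fields[2], fields[3], 2)   # ddmm.mmmm
    lon = to_decimal(fields[4], fields[5], 3)   # dddmm.mmmm
    return lat, lon


# Hypothetical sentence, for illustration only.
print(parse_gpgga("$GPGGA,123519,2235.1234,N,07256.5678,E,1,08,0.9,545.4,M,46.9,M,,"))
```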

3.2 Design and Development This stick has features like emailing and calling since it is connected to the Internet; a person who has access to the account linked with the stick on the server can see the location of the user on Google Maps. All the sensors discussed


Fig. 2 Node MCU [10]

above are installed inside the stick for the user's best comfort. The modules used in this device are smaller than 4 cm. As blind people use conventional sticks with a rigid design, we have replaced this with a foldable mechanism. In addition, a panic button is included in the stick for emergency purposes; it automatically calls the user's guardian or relative, and at the same time an email incorporating the latitude and longitude of the user is sent. The ultrasonic sensors are arranged on the outer surface of the cane in such a way that they cover all the obstacles that come in the user's direction and alert the user about them.

Fig. 3 2D prototype of cane


Fig. 4 Block diagram

3.3 Hardware Assembly As the NodeMCU is the microcontroller of this system, it is connected to all the modules. The NodeMCU is connected not only to the MPU-6050 but also to the GPS module using the I2C protocol. Two General Purpose Input/Output (GPIO) pins of the NodeMCU are connected to the echo and trigger pins of the ultrasonic sensor (HC-SR04); this helps the system calculate the distance between an obstacle and the stick. Two other GPIO pins function as outputs and warn the user about an obstacle on their way: the first is connected to the speaker, and the second to the vibration motor. One more pin acts as an input and is connected to the panic button for emergency purposes. Most of the sensors are oriented inside the stick so that it becomes more convenient for the user to hold, as shown in Fig. 4.
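If the firmware were written in MicroPython (the authors do not state their toolchain), the wiring described above could be declared roughly as follows; all GPIO numbers here are hypothetical placeholders, not the actual pins used in the prototype:

```python
# Sketch only: hypothetical GPIO assignment for the NodeMCU (ESP8266) wiring
# described above, written for MicroPython's machine module.
from machine import Pin

TRIG = Pin(12, Pin.OUT)                 # ultrasonic trigger (hypothetical pin)
ECHO = Pin(14, Pin.IN)                  # ultrasonic echo (hypothetical pin)
SPEAKER = Pin(4, Pin.OUT)               # audio alert output
VIBRATION = Pin(5, Pin.OUT)             # vibration motor output
PANIC = Pin(0, Pin.IN, Pin.PULL_UP)     # panic button input (active low)
```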

4 Working As shown in Fig. 5, when the switch is turned on, the microcontroller starts controlling the sensors through its GPIO pins. First, the microcontroller checks whether it is connected to Wi-Fi; if not, it waits until the connection is established. Once the connection is established, it activates the trigger pin of the ultrasonic sensor, which sends acoustic waves. These acoustic waves are reflected when they strike an obstacle in front of the user, so by calculating the time difference we can find the distance between the obstacle and the user. Depending on the distance, the user is notified by the vibration of the motor. When the distance between the user and the obstacle is 40–60 inches, the motor vibrates at an interval of 1000 ms; as the distance decreases below 20 inches, the vibration becomes constant and a predefined tune is played. Next, the inclination of the stick is checked through the position of the MPU-6050. It gives the position of the module in terms of the x-, y- and z-axes, and these parameters are converted to roll,


Fig. 5 Flowchart

pitch, and yaw; by monitoring the roll parameter, the system analyses whether the user has fallen or not. If the user tumbles, the stick will also fall; to navigate them, data from the GPS module is fetched, and at the same time the IFTTT server triggers links related to its webhook service, which calls and sends emails to the guardian or relatives of the user. The call simply informs them to check the email, while the email contains the location of the user at one click. This directs the user's guardian or relative through the ThingSpeak server, which shows the latitude and longitude of the user pinned on Google Maps. The whole process continues checking all aspects of the system until the system is switched off.
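The decision logic of this flow can be sketched as below. This is only an illustrative Python version of the flowchart: the distance thresholds follow the text and Table 1, while the roll threshold of 60° is an assumption, not a value given by the authors:

```python
# Sketch only: distance-to-feedback mapping and roll-based fall check
# mirroring the flow described above.
def vibration_interval_ms(distance_in: float):
    """Return the vibration interval in ms, 0 for continuous vibration,
    or None when there is no obstacle ahead."""
    if distance_in > 60:
        return None            # no obstacle ahead, motor stays off
    if distance_in > 40:
        return 1000            # obstacle 40-60 in. ahead
    if distance_in > 20:
        return 100             # obstacle 20-40 in. ahead
    return 0                   # closer than 20 in.: continuous vibration + tune


FALL_ROLL_THRESHOLD_DEG = 60.0  # assumed threshold, not stated in the paper


def has_fallen(roll_deg: float) -> bool:
    # If the stick's roll angle exceeds the threshold, treat it as a fall
    # and start the GPS fetch + IFTTT/ThingSpeak alert chain.
    return abs(roll_deg) > FALL_ROLL_THRESHOLD_DEG
```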


5 Results The system was checked outdoors under normal conditions, and the results are as expected. When the stick falls, the microcontroller triggers the IoT platform services IFTTT and ThingSpeak. A call is received informing the relative to check the email, which contains a link as shown in Fig. 6; when that link is clicked, a web page with the pinned location on Google Maps opens, as shown in Fig. 7. Two ultrasonic sensors are connected in this system; their readings are taken as input and shown in Table 1. In this table, the output of the vibration motor varies according to the range measured by the ultrasonic sensors: when the range is between 40 and 60 in., vibration occurs at an interval of 1000 ms; if the obstacle is in the range of 20–40 in., the vibration interval becomes 100 ms; and if the object is closer than 20 in., the vibration is constant and a pre-installed tune is played. Table 2 contains the time taken by the system to connect to Wi-Fi and the trigger times from the system to the server and from the server to the user's relatives when the user tumbles. These values depend largely on the Internet speed of the user and of the user's relatives; the readings were taken with a 25–30 Mbps connection for the user and 35–40 Mbps for the user's relatives.

Fig. 6 Email sent by server

Fig. 7 Location of user
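The alert itself boils down to two HTTP requests, one to an IFTTT webhook and one to ThingSpeak. A hedged sketch is shown below; the event name, API keys and field numbers are placeholders, not the values used by the authors:

```python
# Sketch only: pushing a fall alert to IFTTT (call/email applets) and logging
# the location to ThingSpeak. All keys and names below are placeholders.
import requests

IFTTT_EVENT = "stick_fall"          # hypothetical event name
IFTTT_KEY = "YOUR_IFTTT_KEY"        # placeholder
THINGSPEAK_KEY = "YOUR_WRITE_KEY"   # placeholder


def send_fall_alert(lat: float, lon: float) -> None:
    # Trigger the IFTTT webhook; the linked applets place the call and send the email.
    requests.post(
        f"https://maker.ifttt.com/trigger/{IFTTT_EVENT}/with/key/{IFTTT_KEY}",
        json={"value1": lat, "value2": lon},
        timeout=10,
    )
    # Log latitude/longitude to a ThingSpeak channel for the map view.
    requests.get(
        "https://api.thingspeak.com/update",
        params={"api_key": THINGSPEAK_KEY, "field1": lat, "field2": lon},
        timeout=10,
    )
```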


Table 1 Real-time performance of ultrasonic sensor and vibration motor
S. No. | Ultrasonic sensor 1 (inches) | Ultrasonic sensor 2 (inches) | Vibration motor delay (ms) | Intimation
1 | 79.2 | 104.2 | – | No obstacle ahead
2 | 59.93 | 153.68 | 1000 | Obstacle 40–60 in. ahead
3 | 77.77 | 50.45 | 1000 | Obstacle 40–60 in. ahead
4 | 25.49 | 28.64 | 100 | Object 20–40 in. ahead
5 | 25.49 | 48.02 | 100 | Object 20–40 in. ahead
6 | 10.52 | 28.67 | Continuous vibration | Object 0–20 in. ahead
7 | 29.45 | 5.45 | Continuous vibration | Object 0–20 in. ahead

Table 2 Real time taken by system–server–user's relatives
Feature | Case no. | Time taken by system to trigger the server (ms) | Response time from server to user's close ones (s)
Wi-Fi connection | 1 | 6218 | –
Wi-Fi connection | 2 | 6505 | –
Wi-Fi connection | 3 | 3156 | –
Call | 1 | 1325 | 33
Call | 2 | 1184 | 31.25
Call | 3 | 1109 | 27.56
Email | 1 | 1347 | 18.17
Email | 2 | 1240 | 15.51
Email | 3 | 1045 | 17.45



6 Conclusion and Future Work The model is a simple foldable blind stick with many features that is easy for users to operate. The system is designed to replace the old and traditional blind stick which visually impaired people have been using for a long time. The main motive is to provide visually impaired people with affordable assistive technology costing around 3000–4000. However, it has limitations: the user must be accompanied by a smartphone with a 24/7 Internet connection. In future, artificial intelligence can be installed so that the user can easily operate the device through voice commands and get feedback in the form of voice.

References
1. World Health Organization: Blindness and Vision Impairment (2019). https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment. Accessed 26 June 2020


2. The Tribune: India Home to 20 Percent of World's Visually Impaired. https://www.tribuneindia.com. Accessed 26 June 2020
3. Sourab, B.S., Ranganatha Chakravarthy, H.S.: Design and implementation of mobility aid for blind people. In: International Conference on Power and Advanced Control Engineering (ICPACE), pp. 290–294, Bangalore, India (2015)
4. Agrawal, M.P., Gupta, A.R.: Smart stick for the blind and visually impaired people. In: Proceedings of the 2nd International Conference on Inventive Communication and Computational Technologies (ICICCT 2018), pp. 290–294, Coimbatore, India (2018)
5. The Working Principle, Applications and Limitations of Ultrasonic Sensors. https://www.microcontrollertips.com/principle-applications-limitations-ultrasonic-sensors-faq/. Last accessed 2020/05/07
6. MPU-6000 and MPU-6050. https://howtomechatronics.com/tutorials/arduino/arduino-and-mpu6050-accelerometer-and-gyroscope-tutorial. Last accessed 2020/05/07
7. U-blox 6 GPS Modules DataSheet. https://www.u-blox.com/sites/default/files/products/documents/NEO-6_DataSheet_%28GPS.G6-HW-09005%29.pdf. Last accessed 2020/05/07
8. Insight into ESP8266 NodeMCU Features. https://lastminuteengineers.com/esp8266-nodemcu-arduino-tutorial/. Last accessed 2020/05/07
9. Elementz Engineering. https://www.elementzonline.com/mpu6050-gy-521-3-axis-analoggyro-sensors-accelerometer-module
10. NodeMCU Pinterest. https://pl.pinterest.com/pin/645492559069569752/

A Comprehensive Survey on Attacks and Security Protocols for VANETs
Aminul Islam, Sudhanshu Ranjan, Arun Pratap Rawat, and Soumayadev Maity

Abstract The increasing demand for improving road traffic and driver safety has brought attention to the Intelligent Transportation System (ITS), which later came to be termed the Vehicular Ad-hoc Network (VANET). Its main goal is to enhance roadway efficiency and traffic safety. Many issues arise in VANET while implementing privacy and security measures. Since this network is vulnerable to security attacks, numerous security requirements need to be fulfilled. In this survey, we have emphasized finding the limitations of the existing papers in this field. Going through the fundamentals of VANET, we have illustrated its communication methods. We then discuss the application areas and security services in the contiguous sections. Finally, possible attacks in VANET are thoroughly discussed.
Keywords VANET · V2I · V2V · ITS · Security · Privacy · Authentication · Attacks

1 Introduction A large number of vehicles can be seen running on the roads of a city. Road traffic controllers manually direct vehicles to reduce traffic congestion and prevent road accidents, but without wireless communication technology it is really a
A. Islam (B) · S. Ranjan · A. P. Rawat · S. Maity Department of Information Technology, Indian Institute of Information Technology Allahabad, Prayagraj, India e-mail: [email protected] S. Ranjan e-mail: [email protected] A. P. Rawat e-mail: [email protected] S. Maity e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_62


hectic job for them. Unknowingly, they may direct the traffic to an already busy road. Also, they may not be aware of some emergency vehicles which may be stuck in traffic away from their eyesight. To solve all these kinds of problems, the Intelligent Transportation System (ITS) was introduced, which provides two types of communication, i.e. vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I). Here the infrastructure involves two basic components, the Road Side Unit (RSU) and the Trusted Authority (TA), installed alongside the road. Later, ITS came to be termed the Vehicular Ad-hoc Network (VANET), which uses the functionalities of the Mobile Ad-hoc Network (MANET). The network architecture of VANET consists of three major components, i.e. the On-Board Unit (OBU), the Roadside Unit (RSU) and the Trusted Authority (TA), as shown in Fig. 1. In VANET, every vehicle is assumed to be equipped with an OBU device that also comprises different components, e.g. a Global Positioning System (GPS) receiver, micro-sensors, etc. The OBU takes advantage of the Dedicated Short Range Communication (DSRC) protocol, which is based on IEEE 802.11p (5.9 GHz) radio technology, to communicate among the vehicles. It also uses a Tamper-Proof Device (TPD) to store the secret information of the vehicle. The TPD is assumed to be more secure, as it is considered infeasible for a malicious node to access the stored data. The OBU periodically warns the driver about traffic-related information like speed, location, direction and road condition to avoid traffic jams and road accidents. Further, this information is sent to the RSU, which verifies all the received information and rebroadcasts it with warnings to other vehicles. Moreover, the RSU is responsible for all the authentication work to lighten the burden of the TA, whereas the TA plays the major role of registering all the OBUs and RSUs. The TA has high computational and storage capabilities compared to the other components and also maintains a database of the vehicles so that a malicious node can be removed from the network by tracing back to the origin of its messages.

Fig. 1 VANET network architecture


Every vehicle in the VANET broadcasts safety messages which may contain the vehicle's information (e.g. speed, position), and these need to be processed before being transmitted to other vehicles because a malicious vehicle may deliberately send misleading messages that can disrupt the VANET. It is also required to secure the personal information of a vehicle (e.g. id, car number) and prevent other nodes in the network from accessing it. This gives rise to the requirement for security services. Verifying every message sequentially at the RSU may not satisfy the timing requirement of the VANET. Suppose there are 200 vehicles and each one sends a message every 300 ms that needs to be signed; consequently, an RSU will have to verify approximately 650 messages per second, which is not practical. Moreover, storing and managing the public key certificates is also a communication overhead. To overcome this, an ID-based group verification scheme was suggested in which a batch of messages is verified at a time, which significantly reduces the time overhead; however, it also has some drawbacks. The rest of the paper is structured as follows: Sect. 2 briefly describes the application areas of VANET. The required security services of VANET are described in Sect. 3. Possible attack types are mentioned in Sect. 4. Section 5 presents a detailed discussion of existing papers. Finally, Sect. 6 presents the concluding remarks of the paper.
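As a quick check of the verification-load figure quoted above (a back-of-the-envelope estimate using the numbers from the text, not measured traffic):

```python
# Sketch only: verification load at an RSU if 200 vehicles each send one
# signed safety message every 300 ms.
vehicles = 200
message_interval_s = 0.3

messages_per_second = vehicles / message_interval_s
print(messages_per_second)  # ~666.7, i.e. well over 650 verifications per second
```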

2 Application Areas The interaction of the OBU with the RSU has helped solve the traffic congestion problem. There are a number of applications of VANET in real life; some of them are as follows.

2.1 Safety Related Applications These applications enable a vehicle to gather information from its sensors or by communicating with other vehicles; this information is then used in safety messages, which enhance the traffic system and make it safer. They provide a secure ad-hoc network by notifying users about emergency services and privacy-related precautions.

2.2 Intelligent Transportation Applications These applications observe traffic patterns and manage them accordingly. They enhance the delivery of traffic information by improving the efficiency and accuracy of traffic detection.


2.3 Comfort Applications These applications relate directly to the comfort of passengers and drivers. They keep users updated about their vicinity, such as the locations of the nearest fuel station, ATMs, food courts and restaurants with their price lists. While interfacing with the RSUs, they also provide entertainment-related applications such as online games and the nearest cinema hall's location.

3 Security Services Security is the basic requirement in VANET that enhances the network by protecting its users, data and services. To make a vehicular network trustworthy and efficient, its security services should be effective. Some basic security services are as follows.

3.1 Availability In VANET, availability ensures that all the required resources and services are available to all legitimate vehicles during wireless communication. Since all other services depend on the availability of resources, it is one of the most crucial security services.

3.2 Integrity Integrity assures the vehicles of the network that the data they share during communication is not altered or modified in transit. It is an important security service because, in its absence, a hacker can modify the data, which may cause traffic congestion or, in some cases, accidents.

3.3 Authentication The authentication service makes sure that the vehicle sending a safety message to the RSU is an authorized user. In addition, the receiver can be sure about the legitimacy of the sender via a pseudonym. Thus it allows only legitimate vehicles to communicate in the VANET.


3.4 Confidentiality Confidentiality basically assures the vehicles or users that their messages will not be read by any illegitimate user in the network. It is achieved by encrypting the transmitted message.

3.5 Non-Repudiation With this service, the vehicle which has sent a message cannot deny this fact. It therefore works as proof of the sender for the receiver of the message in the VANET. It can also help in tracing an unauthorized vehicle.

4 Possible Attack Types So far, we have discussed the vehicular ad-hoc network, and we know that VANET is vulnerable to attacks. In VANET, attacks can be defined as stealing or manipulating a vehicle's information and using it for wrong purposes. In this section, we describe the major possible attack types in VANET.

4.1 Attacks on Availability
4.1.1 Spamming Attack
Spams are unsolicited messages (e.g. advertisements). They are of no use to the vehicle's driver or the traveller; they are only meant to consume bandwidth, which may cause high latency. Due to the lack of a central administration, they are difficult to control.

4.1.2 Broadcast Tampering Attack
This attack refers to modifying or manipulating the broadcast information. Hackers may add new messages or hide precautions, which may result in road jams and, in some cases, accidents.

4.1.3 Denial of Service Attack (DoS)
If an attacker tries to jam the communication medium and restrict legitimate users from accessing the network resources, this comes under a DoS attack. This attack is performed by flooding the RSUs with requests.

4.1.4 Malware Attack
Malware is malicious software carried mostly by an insider. It is intended to steal relevant information. Malware (e.g. a worm or virus) could be installed in vehicles during a software update.

4.2 Attacks on Authentication
4.2.1 Replay Attack
This is an attack where a fraudulent vehicle captures another vehicle's safety message and replays the manipulated message for its own use, which may cause traffic congestion. This attack may lead to failures in the network.

4.2.2 Sybil Attack
In a Sybil attack, multiple fake vehicles are created to send fake safety messages, which may force other vehicles to change their route and result in traffic jams.

4.2.3 Masquerading Attack
This attack involves forging an identity for unauthorized access to the VANET. It is intended to gain the personal information of an authorized vehicle and may be used to send wrong messages in the network.

4.2.4 Global Positioning Attack
Global positioning is used to locate a vehicle in real time. This attack involves providing a wrong location to the other vehicles.

4.2.5 Message Tampering Attack
A message tampering attack involves altering useful information. In this attack, a malicious vehicle may discard, alter or drop the information shared by an authorized vehicle in the VANET.

4.2.6 Impersonation Attack
As the name suggests, an impersonation attack refers to impersonating an authorized vehicle (OBU) to send manipulated messages for the attacker's own purpose.

4.2.7 Tunneling Attack
The attacker performs a tunneling attack with the intention of analysing the traffic by linking two parts of the vehicular network with the help of a tunnel, where the tunnel refers to a communication medium.

4.3 Attack on Confidentiality We have already mentioned the importance of the confidentiality of data in the VANET. In this attack, the attacker tries to obtain valuable information from legitimate users and may disclose it to other vehicles, which may destroy the communication network.

4.4 Attack on Non-repudiation Non-repudiation provides assurance about the sender of a message, meaning that the sender cannot deny it [12]. Two users should be identified uniquely. This attack happens when two or more users share the same key for communication; in such cases, tracing the unauthorized user is difficult.

5 Survey on Existing Research As we can see, road traffic is increasing day by day, and unresponsive behaviour of drivers may cause a traffic jam and sometimes even an accident. To overcome this, many researchers have proposed different security protocols, but these protocols have various vulnerabilities. In this section, we have classified those protocols into


three different categories, i.e. Public Key Infrastructure (PKI) based schemes, Elliptic Curve Cryptography (ECC) schemes and Identity-Based Signature (IBS) schemes. Later, we compare those protocols and analyse them briefly. This classification is shown in Table 1.

5.1 Public Key Infrastructure Based Schemes Asymmetric key cryptography, also known as public-key cryptography, plays a very significant role in VANET. The rationale for using this technique is to ensure data safety in the communication network. In this technique, key pairs consisting of a public key and a private key are used to encrypt and decrypt the safety messages. Every message transmitted over the network needs to be digitally signed to make the network reliable and secure. In PKI, OBUs and RSUs are authenticated by a trusted third party. Moreover, digital certificates, also known as public-key certificates, are used to provide authentication in PKI. In 2007, Raya et al. [11] suggested a PKI-based protocol where all the traffic-related information is signed and verified through the certificate authority (CA). In order to start a communication, a large number of certificates with identities are generated in advance and are randomly selected by a vehicle. In 2008, Zhang et al. [9] suggested an authentication scheme for OBU-to-RSU communication, in which the vehicle communicates with the RSU with the help of key pairs, a certificate and a Hash Message Authentication Code (HMAC). Later in the same year, Lu et al. [8] suggested an authentication scheme for OBU-to-RSU communication in VANET. In this protocol, the TA generates the system parameters like the public key, private keys and certificates by using Bilinear Pairing (BP) for signing and verifying traffic-related information. Each vehicle obtains an anonymous certificate whenever it is in the range of an RSU. This protocol minimizes the moving-track attack since the vehicles do not reveal their real identity. In 2013, Wasef et al. [6] suggested a PKI-based authentication scheme which works for both OBU-to-OBU and OBU-to-RSU communication. In this protocol, the authors reduce the time taken to check the certificate revocation list (CRL) by replacing it with a keyed hash message authentication code (HMAC). To make the network secure and reliable, it is an overhead for the TA to maintain, and for vehicles to store, a large number of certificates. After analyzing the protocols, we have found that the Raya et al. [11] protocol is vulnerable to a traceability attack, and the TA faces overhead due to a large certificate revocation list. In contrast, in the Zhang et al. [9] protocol, every vehicle needs to be notified by the RSU to check whether it is valid or not, which results in heavy message and transmission overhead and makes the ad-hoc network slow. Due to heavy operations, the Lu et al. [8] protocol still faces computational overhead. Moreover, in the Wasef et al. [6] protocol, because of the global nature of the key, the key-update process requires a high transmission delay.
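Several of the schemes above rely on a keyed hash message authentication code to cheapen message verification. The snippet below is only a generic illustration of HMAC tagging and verification in Python, not a reconstruction of any of the surveyed protocols; the key and message contents are made up:

```python
# Sketch only: generic HMAC tagging/verification of a safety message,
# illustrating the primitive used by schemes such as [6] and [9].
import hashlib
import hmac

SHARED_KEY = b"example-group-key"  # placeholder key, not from any protocol


def tag_message(message: bytes) -> bytes:
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()


def verify_message(message: bytes, tag: bytes) -> bool:
    expected = hmac.new(SHARED_KEY, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


msg = b"speed=42;lat=25.43;lon=81.84;ts=1591700000"
t = tag_message(msg)
print(verify_message(msg, t))         # True
print(verify_message(msg + b"!", t))  # False: tampering is detected
```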


Table 1 Comprehensive analysis of different protocols
Classification | Year | Author name | Type of communication | Objectives | Limitations
Public key infrastructure based schemes | 2007 | Raya et al. [11] | V2I | Improves security | Traceability attack, overload on the TA
Public key infrastructure based schemes | 2008 | Zhang et al. [9] | V2I | Improves computation overhead | Transmission overhead is increased
Public key infrastructure based schemes | 2008 | Lu et al. [8] | V2I | Moving track attack, reduces storage cost | Computational overhead
Public key infrastructure based schemes | 2013 | Wasef et al. [6] | V2V, V2I | Average verification delay is efficient | High transmission delay
Elliptic curve cryptography schemes | 2019 | Cui et al. [3] | V2I | Reduces the storage and communication cost | Still computational cost is high
Elliptic curve cryptography schemes | 2019 | Ming et al. [2] | V2I | Reduces the communication cost by 44% | High computational cost
ID-based signature schemes | 2008 | Zhang et al. [10] | V2I | Reduces delay in signature verifications, transmission overhead | Impersonation attack, traceability attack
ID-based signature schemes | 2010 | Chim et al. [7] | V2V | Impersonation attack, traceability attack | Still vulnerable to impersonation attack
ID-based signature schemes | 2013 | Horng et al. [5] | V2V | Impersonation attack | Still vulnerable to traceability attack
ID-based signature schemes | 2018 | Li et al. [4] | V2I | Full key exposure attack, forgeability attack | Not efficient for message signing and verification
ID-based signature schemes | 2020 | Ali et al. [1] | V2I | Computation overhead, forgeability attack | Communication overhead increases due to PKG


5.2 Elliptic Curve Cryptography Schemes Elliptic curve cryptography (ECC) was first proposed by Neal Koblitz and Victor S. Miller in 1985. It provides a high level of security in VANET at low cost. Considering points on the elliptic curve, it generates the public and private keys. This algorithm uses smaller keys compared to RSA (Rivest–Shamir–Adleman) and DSA (Digital Signature Algorithm), due to which it takes less computational power. In addition, it requires less space and less bandwidth, and it takes less time to generate keys and to encrypt or decrypt the data. In 2019, Cui et al. [3] analysed and found weaknesses in group-based and pseudonym protocols. The authors showed that these schemes lack many functionalities, such as the need to manage the CRL and distribute certificates to vehicles. With such schemes, the vehicles need to store the certificates and key pairs, which is very bulky, and managing certificate revocation lists requires large computational and storage capabilities. Many schemes are available on the basis of a TA, but they are very difficult to implement in the real world. So, the authors suggested a new semi-trusted ECC-based authentication scheme in which the receiver does not need to worry about the CRL and the vehicle does not need to store it either. In the same year, Ming et al. [2] suggested a scheme based on ECC for V2I communication. According to the scheme, the RSU can handle a vast number of messages in very little time. The suggested scheme also fulfils all the security requirements and is provably secure in the random oracle model. This scheme uses neither Bilinear Pairing (BP) nor the map-to-point operation; thus it reduces the computation delay of signing and verifying messages, and the scheme is appropriate for real-life applications. ECC protocols have the benefit of requiring less computational power compared to other encryption schemes. After analyzing the protocols, we have found that for batch verification the computation cost of the Cui et al. [3] protocol is higher than that of the Ming et al. [2] protocol, because the Ming et al. [2] protocol takes (2n + 2) scalar multiplication operations in ECC, whereas the Cui et al. [3] protocol takes (n + 2) scalar multiplication operations, n small-scale multiplication operations and 2n addition operations in ECC along with 2n one-way hash function operations. Besides, in both protocols, every user authenticated once is assumed not to be maliciously affected in the near future.
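As a generic illustration of the primitive these schemes build on (and explicitly not an implementation of [2] or [3]), the following sketch signs and verifies a single safety message with ECDSA over the NIST P-256 curve using the Python `cryptography` package:

```python
# Sketch only: ECC-based signing/verification of one safety message.
# This is plain ECDSA for illustration, not any of the surveyed VANET protocols.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

private_key = ec.generate_private_key(ec.SECP256R1())   # OBU's key pair
public_key = private_key.public_key()

message = b"speed=42;lat=25.43;lon=81.84;ts=1591700000"
signature = private_key.sign(message, ec.ECDSA(hashes.SHA256()))

try:
    public_key.verify(signature, message, ec.ECDSA(hashes.SHA256()))
    print("signature valid")
except InvalidSignature:
    print("signature invalid")
```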

5.3 Identity-Based Signature Schemes Identity-Based Cryptography (IBC) is a type of asymmetric key cryptography which was proposed by Adi Shamir in 1984. In this approach, some significant identity of the user, like a name, mobile number or email id, can be used as the public key. It also reduces the overhead of maintaining the public keys. In an identity-based signature (IBS) scheme, the user's identity is used as the public key, and the corresponding


private keys are generated with the help of a Private Key Generator (PKG). In IBS, the private keys are used to sign the safety messages. The scheme can be defined in four phases:
• Setup phase: In the first phase of the ID-based signature scheme, the PKG generates the system parameters (master key), which are distributed across the vehicles.
• Key extraction: In this phase, a private key is generated using the vehicle's unique id and the master key for communication purposes.
• Signing phase: In this phase, the message is signed using a timestamp and the previously derived private key.
• Verification phase: Finally, the signed message is verified using the verification algorithm.
In 2008, Zhang et al. [10] addressed the issue in OBU-to-RSU communication that when the RSU receives a large number of signatures, it cannot verify them within the 300 ms time span due to storage problems, so there must be a delay in verifying all the signatures. The authors proposed an ID-based protocol to overcome these issues. Since the protocol is identity-based, no certificate is needed, and therefore the transmission overhead problems can be reduced. In this protocol, the authors used a batch verification technique based on bilinear pairing to overcome the delay in verifying a huge number of signatures. In 2010, Chim et al. [7] raised issues with the Zhang et al. [10] protocol, stating that it is impuissant to an impersonation attack and depends heavily on the TPD; if the TPD is compromised, then the whole network will suffer. To overcome this, they suggested the first software-based group communication protocol using a Bloom Filter (BF) and Binary Search Techniques (BST). In this scheme, there is no need for an RSU to share information within a batch. It also takes advantage of BP and reduces the number of operations to improve its efficiency. In 2011, Horng et al. [5] addressed the issue in the previous Chim et al. [7] protocol and found that it is still vulnerable to an impersonation attack, so the authors suggested a new authentication scheme to overcome it. Being a software-based protocol, it does not rely on the hardware. In this protocol, a vehicle can generate a pseudo-identity to transfer a message to another vehicle so that the real identities of vehicles are not revealed; only the TA can disclose the identity of a vehicle whenever it is required. In 2018, Li et al. [4] suggested an ID-based message authentication scheme that takes advantage of an ID-based signature and a ring signature along with BP. After analysing its security, they showed that this protocol can defend against a key exposure attack and a forgeability attack. In April 2020, Ali et al. [1] suggested an identity-based conditional privacy-preserving authentication (ID-CPPA) scheme for V2I communication that relies on BP. It uses a one-way hash function, due to which the processing of messages at the RSU can be done efficiently. It also allows batch verification and ensures resistance to the forgeability attack under the Inverse Computational Diffie-Hellman problem in the oracle model. The above protocols use the Bilinear Pairing (BP) approach, which requires heavy operations and therefore incurs a high computational cost. VANET needs to be more secure and capable of refraining the attacker from accessing the network. For


that reason, we have analysed the aforementioned papers and found that the Zhang et al. [10] and Chim et al. [7] protocols are still unable to overcome impersonation attacks. Moreover, the Zhang et al. [10] and Horng et al. [5] protocols need to provide security against the traceability attack. We have noticed that the Li et al. [4] protocol uses bilinear pairing, which increases the computational delay. Along with this, the Li et al. [4] protocol makes use of a ring signature and an ID-based signature; these are all heavy operations that make message signing and verification inefficient. Finally, we have observed that the Ali et al. [1] protocol still faces high communication overhead due to the PKG.

6 Conclusion The Vehicular Ad-hoc Network fulfils the emerging requirements of vehicles for building the Intelligent Transportation System, and it can be seen that in the past few years researchers have concentrated on improving the security and privacy of VANET. The rationale behind VANET is to implement it in the real world and provide a better traffic system. In this paper, we have discussed the security services, possible attack types and communication methods in VANET. Finally, we have illustrated the benefits and drawbacks of the existing papers. It is expected that this paper will give a clear overview of the protocols already suggested for VANET and will open a door for researchers to extend security in the VANET.

References
1. Ali, I., Li, F.: An efficient conditional privacy-preserving authentication scheme for vehicle-to-infrastructure communication in VANETs. Veh. Commun. 22, 100228 (2020). https://www.sciencedirect.com/science/article/abs/pii/S221420961930275X
2. Ming, Y., Cheng, H.: Efficient certificateless conditional privacy-preserving authentication scheme in VANETs. Mobile Information Systems 2019 (2019). https://www.hindawi.com/journals/misy/2019/7593138/
3. Cui, J., Wu, D., Zhang, J., Xu, Y., Zhong, H.: An efficient authentication scheme based on semi-trusted authority in VANETs. IEEE Trans. Veh. Technol. 68(3), 2972–2986 (2019). https://ieeexplore.ieee.org/document/8629275
4. Li, J., Liu, Y., Zhang, Z., Li, B., Liu, H., Cheng, J.: Efficient ID-based message authentication with enhanced privacy in wireless ad-hoc networks. In: 2018 International Conference on Computing, Networking and Communications (ICNC), Maui, HI, pp. 322–326 (2018). https://ieeexplore.ieee.org/document/8390287
5. Horng, S., et al.: b-SPECS+: batch verification for secure pseudonymous authentication in VANET. IEEE Trans. Inf. Forensics Secur. 8(11), 1860–1875 (2013). https://ieeexplore.ieee.org/document/6576161
6. Wasef, A., Shen, X.: EMAP: expedite message authentication protocol for vehicular ad hoc networks. IEEE Trans. Mob. Comput. 12(1), 78–89 (2013). https://ieeexplore.ieee.org/document/6081877


7. Chim, T.W., et al.: SPECS: secure and privacy enhancing communications schemes for VANETs. Ad Hoc Netw. 9(2), 189–203 (2011). https://www.sciencedirect.com/science/article/abs/pii/S1570870510000648
8. Lu, R., Lin, X., Zhu, H., Ho, P., Shen, X.: ECPP: efficient conditional privacy preservation protocol for secure vehicular communications. In: IEEE INFOCOM 2008 - The 27th Conference on Computer Communications, Phoenix, AZ, pp. 1229–1237 (2008). https://ieeexplore.ieee.org/document/4509774
9. Zhang, C., Lin, X., Lu, R., Ho, P.: RAISE: an efficient RSU-aided message authentication scheme in vehicular communication networks. In: 2008 IEEE International Conference on Communications, Beijing, pp. 1451–1457 (2008). https://ieeexplore.ieee.org/document/4533317
10. Zhang, C., et al.: An efficient identity-based batch verification scheme for vehicular sensor networks. In: IEEE INFOCOM 2008 - The 27th Conference on Computer Communications. IEEE (2008). https://www.researchgate.net/publication/4334277_An_Efficient_Identity-Based_Batch_Verification_Scheme_for_Vehicular_Sensor_Networks
11. Raya, M., Hubaux, J.-P.: Securing vehicular ad hoc networks. J. Comput. Secur. 15(1), 39–68 (2007). https://www.researchgate.net/publication/37439204_Securing_Vehicular_Ad_Hoc_Networks
12. Khan, S., Khan Pathan, A.: Wireless Networks and Security, vol. 10, pp. 978–3. Springer (2013). https://link.springer.com/book/10.1007%2F978-3-642-36169-2

Analysis, Visualization and Prediction of COVID-19 Pandemic Spread Using Machine Learning
Snigdha Sen, B. K. Thejas, B. L. Pranitha, and I. Amrita

Abstract Over the years, human beings have faced several health issues related to the spread of viruses. After Spanish flu, Nipah, and Ebola, COVID-19 has now thrown a serious threat to society all over the world. As the infection rate is increasing exponentially, prevention, proper measurement and strategic action are the need of the hour to combat this pandemic. This paper focuses on analyzing a COVID-19 dataset using numerous machine learning (ML) algorithms, visualizing the results and evaluating the performance of the best algorithm. The virus outbreak has caused thousands of deaths across the world and is considered a pandemic according to WHO reports. There are a number of methods for reducing the risk of infection, such as predicting the risk of infection, screening patients, using chatbots to analyze the risk of infection, and identifying and speeding up drug development. In this paper, we mainly experimented with KNN, ANN, SVM, linear (LR) and polynomial regression (PR) methods to learn about and analyze the pandemic spread. To achieve this, we have considered the COVID-19 dataset of Karnataka state. Mostly, district-wise confirmed, active and death cases have been considered for this work. In addition, we have also analyzed gender-wise infection spread and presented a cumulative dashboard of the overall district-wise active, confirmed and recovered cases of Karnataka.
Keywords COVID-19 · Machine learning · Seaborn · Dashboard · Visualization

S. Sen (B) · B. K. Thejas · B. L. Pranitha · I. Amrita Department of CSE, Global Academy of Technology, Bengaluru, Karnataka, India e-mail: [email protected] B. K. Thejas e-mail: [email protected] B. L. Pranitha e-mail: [email protected] I. Amrita e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_63


1 Introduction Originating from the Wuhan market, China, in December 2019, the virus slowly started stretching its tentacles over the entire world. Previously termed the 2019 Novel coronavirus [1], COVID-19 is caused by a group of viruses with the scientific name Orthocoronavirinae, or simply Coronavirinae, that can infect animals, mainly mammals. Because one of its major symptoms is acute respiratory syndrome, the ICTV termed it SARS-CoV-2. After entering the human body, it may even kill infected humans, especially those with comorbidities. The transmission of this virus through the coughing and sneezing of infected people is very rapid, and hence measures need to be taken to stop its spread. The COVID-19 pandemic has set foot across the globe and is majorly impacting countries' economies across industries and businesses. According to the WHO, there are 7,039,918 confirmed cases and 404,396 confirmed deaths worldwide (as per the 09-06-2020 report) [1], which is a serious issue to be considered. When the entire world is worried about the pandemic, as computer science engineers we try to explore how ML algorithms and data analysis can assist us in tackling and controlling this pandemic better. Being residents of Karnataka state, we used the state dataset for our case study. We evaluate the performance of various ML algorithms for data analysis of Karnataka; however, these algorithms can be applied to other datasets as well. The manuscript is organized as follows. Section 2 discusses the relevant literature survey. The dataset is described in Sect. 3. We present the experimental setup and discuss the results of various ML algorithms and data visualization in Sect. 4. Lastly, we conclude with possible future work.

2 Literature Survey In the last few years, AI, machine learning and deep learning have been setting notable footprints in data analysis across every sector. Due to COVID-19, researchers all over the world seek help from data scientists to analyze and predict the spread so that the situation can be handled in a better and more organized way. Benvenuto et al. [2] discussed the ARIMA model, based on the autoregressive integrated moving average, which is useful for predicting COVID-19 spread and forecasting disease prevalence. Narinder et al. [3] evaluated and compared the performance of support vector machine, polynomial regression, deep neural networks and recurrent neural networks using long short-term memory (LSTM) on COVID-19 data from Johns Hopkins University, and finally reported that PR offers the best prediction results, with a low root mean square error (RMSE) over the other approaches for confirmed, death and recovered cases. In their paper [4], the time series method described by Deb et al. helps estimate the reproduction rate of COVID-19. The authors also concentrated on the usage of various data analysis and statistical tools to find patterns of the virus outbreak so that early precautions can be taken. Working in the same research orientation,


a scientific model of SARS-CoV-2 transmission was developed and proposed by Kucharski et al. [5] using datasets focusing on confirmed, death and recovery cases inside Wuhan and the rest of the world. Later, Lauer et al. [6] worked on the incubation period of COVID-19 and mentioned that the time period can be 5–14 days. Using deep learning, Narin et al. [7] explored the usage of CNNs for automatic disease detection from x-ray images. They evaluated the performance of three CNN models, ResNet50, InceptionV3 and InceptionResNetV2, and showed that ResNet50 offers 98% accuracy in classifying infected patients, outperforming the other two models. Apart from that, a lot of other research is going on in this field. Scientists use ML and DL [8] for screening and creating antibodies to cure COVID-19 disease, which has proved a great success. These algorithms even help to recognize suitable antibody candidates in a quicker and more cost-effective way than the traditional approach, which will certainly accelerate cure therapies for viruses. Columbia University students [8] launched a startup app, EVQLV, which is helping to generate millions of antibodies quickly. The MIT-developed model [9] is the first which directly uses data from the coronavirus itself and works on the integration of machine learning and standard epidemiology for better prediction. At Rensselaer Polytechnic Institute (RPI), researchers are also utilizing ML to analyze the effects of social distancing. Nguyen [10] conducted a survey on how various AI, IoT and deep learning methods are effective and prompt in the battle against the COVID-19 pandemic using CT scan images and reported the performance of those methods in terms of accuracy.

3 Description of Dataset Used Mainly, we used data from Kaggle and Johns Hopkins University. The first reported confirmed case in Karnataka was on March 9, 2020, so our dataset contains data from that day till June 5, 2020 for analysis.

4 Experimental Setup and Result Data analysis has been done using Python in Jupyter notebook with libraries from Matplotlib and Seaborn for visualization. In Fig. 1, we have reported district-wise active versus confirmed versus recovered cases captured during March 2020 till June 8, 2020.
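A minimal sketch of this kind of district-wise visualization is shown below. The file name and column names (`district`, `confirmed`, `active`, `recovered`) are assumptions for illustration, not the actual schema of the Kaggle/Johns Hopkins files used:

```python
# Sketch only: district-wise confirmed/active/recovered bar plot with
# pandas + matplotlib; file and column names are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("karnataka_covid19.csv")          # hypothetical file
district_totals = df.groupby("district")[["confirmed", "active", "recovered"]].max()

district_totals.plot(kind="bar", figsize=(12, 6))
plt.ylabel("Number of cases")
plt.title("Confirmed vs active vs recovered cases in Karnataka")
plt.tight_layout()
plt.show()
```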

4.1 Dashboard Creation We have built a dashboard using Microsoft Excel for the number of COVID-19 cases in Karnataka as on May 21, 2020. Some techniques like conditional formatting,


Fig. 1 Confirmed versus active versus recovered cases in Karnataka

Fig. 2 Karnataka COVID-19 dashboard

pivot table and a few basic formulas were used in Excel for obtaining the desired dashboard. A provision for users to compare districts has also been incorporated. Users can also visualize the districts which are above a certain limit. Here, the input for this type of formatting must be given by the user (Fig. 2).

4.2 Comparative Analysis of ML Algorithms The day-wise rise in confirmed cases has been modelled here, keeping the daily increase as the target variable. The dataset for Karnataka has been used up to June 5, 2020. For SVR, the RBF kernel is used; KNN is used with k = 3; and an ANN with a single hidden layer with the ReLU activation function and a linear activation function in the last layer is built using Keras. The network was trained for 10 epochs with a batch size of 10, and the reported MAE is 797.3837.
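A sketch of the ANN configuration described above is given below; the hidden-layer width, optimizer and loss are assumptions, since the text only specifies the activation functions, epochs and batch size, and the data arrays are placeholders:

```python
# Sketch only: single-hidden-layer ANN regressor in Keras, mirroring the
# configuration described in the text; layer width and optimizer are assumed.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# X: day index, y: daily increase in confirmed cases (dummy values shown).
X = np.arange(1, 90, dtype="float32").reshape(-1, 1)
y = (np.random.rand(89) * 300).astype("float32")     # placeholder targets

model = Sequential([
    Dense(16, activation="relu", input_shape=(1,)),  # single hidden layer
    Dense(1, activation="linear"),                   # linear output layer
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=10, batch_size=10, verbose=0)
```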

Analysis, Visualization and Prediction of COVID-19 Pandemic … Linear regression (LR)

SVR

601

Polynomial regression

KNN

Fig. 3 Comparative study

Table 1 RMSE error
Model | RMSE | r² score
Linear regression (LR) | 52.6 | 0.51
SVR | 72.7 | 0.075
Polynomial regression (PR) | 28.9 | 0.85
KNN | 25.4 | 0.72
ANN | 1287.6 | −0.63

Table 2 Forecasting confirmed cases
Date | Predicted value (confirmed cases) | Actual value (confirmed cases)
1/6/20 | 3130 | 3221
2/6/20 | 3337 | 3408
3/6/20 | 3556 | 3796
4/6/20 | 3788 | 4063
5/6/20 | 4033 | 4320

MAE and RMSE are evaluation metrics for regression models. Linear regression does not fit the COVID-19 data well, whereas polynomial regression with degree 5 works best (Fig. 3). From Table 1, polynomial regression gives the best overall fit among all the algorithms (highest r² score), so we used PR for forecasting the day-wise confirmed cases depicted in Table 2.
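A hedged sketch of the degree-5 polynomial fit and the short-range forecast reported in Table 2 is shown below; the data arrays are placeholders, as the actual Karnataka series is not reproduced here:

```python
# Sketch only: degree-5 polynomial regression over the day index and a
# five-day-ahead forecast, in the spirit of Table 2; the data below is dummy.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

days = np.arange(1, 85).reshape(-1, 1)               # day index since 9 March
confirmed = (0.6 * days.ravel() ** 2).astype(float)  # placeholder cumulative counts

poly = PolynomialFeatures(degree=5)
model = LinearRegression().fit(poly.fit_transform(days), confirmed)

pred = model.predict(poly.transform(days))
print("RMSE:", np.sqrt(mean_squared_error(confirmed, pred)))

future_days = np.arange(85, 90).reshape(-1, 1)       # next five days
print(model.predict(poly.transform(future_days)))
```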

4.3 Data Analysis and Visualization In Fig. 4, we show an interactive pie chart and KDE plot to find the percentage of confirmed, recovered, active and deceased cases for each district of Karnataka. The brighter region consists of safer districts, and the darker region consists of districts which are more prone to COVID-19. The date-wise numbers of male and female confirmed cases are analyzed, and districts are categorized based on infection spread using KNN. Seventeen


Fig. 4 Visualization using seaborn and matplotlib

districts are in the critical zone. The critical zone is determined by the percentage of recovered victims with respect to the total affected victims.
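The critical-zone computation reduces to a simple ratio per district; a small sketch is shown below, where the column names and the 50% cut-off are assumptions used only for illustration:

```python
# Sketch only: flag districts whose recovery percentage is below a threshold;
# column names, file name and the threshold are hypothetical.
import pandas as pd

df = pd.read_csv("karnataka_covid19.csv")              # hypothetical file
latest = df.groupby("district")[["confirmed", "recovered"]].max()

latest["recovery_pct"] = 100 * latest["recovered"] / latest["confirmed"]
critical = latest[latest["recovery_pct"] < 50]          # assumed cut-off
print(critical.sort_values("recovery_pct"))
```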

5 Conclusion To save the world from the jaws of this pandemic, more collaboration between the medical fraternity and data scientists should be promoted. Through this paper, we have tried to highlight the impact and potential of machine learning tools in fighting this disease quickly. Collecting more datasets and exploring other ML algorithms can be part of further research for even better prediction. Considering the Indian population, the early lockdown helped reduce the number of infected cases and the death rate. Still, there is a long way to go by maintaining social distance, avoiding crowded places and using sanitizer and masks. So, stay healthy, stay safe.


References
1. WHO coronavirus (COVID-19). Retrieved June 10, 2020 from https://www.who.int/emergencies/diseases/novel-coronavirus-2019
2. Benvenuto, D., Giovanetti, M., Vassallo, L., Angeletti, S., Ciccozzi, M.: Application of the ARIMA model on the COVID-2019 epidemic dataset. Data Brief 105340 (2020)
3. Punn, N.S., Sonbhadra, S.K., Agarwal, S.: COVID-19 epidemic analysis using machine learning and deep learning algorithms. Preprint https://doi.org/10.1101/2020.04.08.20057679 (2020)
4. Deb, S., Majumdar, M.: A time series method to analyze incidence pattern and estimate reproduction number of COVID-19. arXiv preprint arXiv:2003.10655 (2020)
5. Kucharski, A.J., Russell, T.W., Diamond, C., Liu, Y., Edmunds, J., Funk, S., Eggo, R.M., et al.: Early dynamics of transmission and control of COVID-19: a mathematical modelling study. Lancet Infect. Dis. (2020)
6. Lauer, S.A., Grantz, K.H., Bi, Q., Jones, F.K., Zheng, Q., Meredith, H.R., Azman, A.S., Reich, N.G., Lessler, J.: The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Ann. Intern. Med. (2020)
7. Narin, A., Kaya, C., Pamuk, Z.: Automatic detection of coronavirus disease (COVID-19) using x-ray images and deep convolutional neural networks. https://arxiv.org/ftp/arxiv/papers/2003/2003.10849.pdf
8. Kent, J.: Data scientists use machine learning to discover COVID-19 treatments. https://healthitanalytics.com/news/data-scientists-use-machine-learning-to-discover-covid-19-treatments (As on June 10, 2020)
9. Gallagher, M.B.: Model quantifies the impact of quarantine measures on COVID-19 spread. http://news.mit.edu/2020/new-model-quantifies-impact-quarantine-measures-covid-19-spread-0416 (As on June 10, 2020)
10. Nguyen, T.T.: Artificial intelligence in the battle against coronavirus (COVID-19): a survey and future research directions. Preprint https://doi.org/10.13140/rg.2.2.36491.23846 (2020)

Study of Behavioral Changes and Depression Control Mechanism Using IoT and VR Pavan Kumar Katkuri and Archana Mantri

Abstract Internet of things (IoT) has cemented its place as one of the critical technologies for providing solutions to present-day issues. Despite massive advancement in technology, one person still dies every 40 s due to depression or mental health-related issues. Mental health deterioration and depression are key issues that need to be addressed in the healthcare domain. This paper studies behavioral changes and how IoT and virtual reality (VR) can be used to identify mental disorder or depression-related issues and help the user avoid the critical stages of depression.

Keywords Mental disorder · Depression · Internet of things · Virtual exposure · Virtual reality · Early stage

1 Introduction

In this millennium, we have seen many technologies being used to address issues in various domains like health care, transport and public or civil services. Health care has progressed with many innovations, including the tracking of health and fitness and the provision of interventions to maintain one's health. However, as per the report by the World Health Organization (WHO), one person dies every 40 s [1]. Though there is vast research on various diseases and health conditions, there is limited research focused on diagnosing and healing depression [2]. As per the estimation of the WHO, depression will be one of the top three pandemic diseases by the year 2030 [1]. Hence, there is a tremendous necessity to address the mental health-related issues which cause depression and emotional imbalance.

P. K. Katkuri (B) · A. Mantri
Chitkara University Institute of Engineering and Technology, Punjab, India
e-mail: [email protected]
A. Mantri
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_64


Depression and anxiety patients need to be treated with utmost care and support to cure them and bring them back to normal, but most of the time this end result is not achieved. Depression leads to low mood and a lack of interest in activities, which can be identified by the change in behavior or actions. If these patients are not identified at an early stage, their disorder could blow out of proportion and lead them to suicidal thoughts. Many patients fail to recognize that they are suffering from a disorder; if identified on time, some could be cured depending on their stage of depression, some might end up committing suicide, and some could undergo treatment at regular intervals for a lifetime. So, it is essential for the patient to get diagnosed at an early stage and be cured in time, thus avoiding the critical stages of depression [3]. As tracking human emotions and health is a key element in curing depression patients, IoT devices are deployed for tracking and monitoring emotional health. For this, a suitable mechanism should be able to analyze the data generated by IoT devices and VR, identify the level of depression and then generate recommended actions to help the patient recover.

2 Assistance for Depressive Disorder Victims

2.1 Early Stage Identification

Depression is becoming an increasingly severe problem in society. It is therefore necessary to identify patients with mental health disorders at an early stage. Depression symptoms can be detected and monitored using IoT devices by observing behavioral changes without human interaction. It can be advantageous to monitor the patient's behavior when she/he is alone, instead of monitoring in the ambience of a psychiatric hospital or when surrounded by family or friends. In order to identify the behavioral change, the diagnosing system first needs to be trained with the patient's normal behavior so that it can detect deviations and report to the user/patient at an early stage. The system can identify the change in behavior, which may lead to depression, based on a few parameters:

(i) Feeling of guilt: can be measured with sensors.
(ii) Activity: measuring the energy level with which actions are performed.
(iii) Sleep: can be tracked with IoT devices or wearable devices.
(iv) Depressed mood: can be measured with IoT devices, wearable devices or even the smartphone, using parameters like calls, messages, images, etc.
(v) Suicidal tendency: browsing negative content over the Internet.
(vi) Levels of anxiety: IoT devices/wearable devices.
(vii) Weight: monitoring weight loss continuously.
(viii) Breath and heart rate: measured with IoT devices.
(ix) Phobias, panic attacks.
(x) Obsessive-compulsive disorder (OCD).
(xi) Post-traumatic stress disorder (PTSD).
(xii) Relationship with others.

Fig. 1 Mechanism to identify behavioral changes

Mental ill health is wide-ranging, with different symptoms and severities, and is generally characterized by abnormal thoughts, emotions, behavior, relationships with others, etc. The aforesaid symptoms are some of the factors that cause mental disorder, and it is crucial that they be continuously monitored at all locations and at all times. The data generated by the deployed system can be analyzed based on questionnaires, surveys, the Hamilton depression scale, self-report ratings, the BDI [4], etc. The stage of depression/mental disorder of the patient can be identified using the aforesaid symptoms (Fig. 1).
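As a purely illustrative sketch of how such monitored parameters might be combined into a stage estimate, the toy scorer below maps a handful of readings to the mild/moderate/severe stages used later; the parameter names, weights and thresholds are hypothetical, not taken from the paper.

```python
# Hypothetical daily readings aggregated from IoT/wearable devices.
readings = {
    "sleep_hours": 4.5,        # from a wearable sleep tracker
    "activity_level": 0.2,     # normalized 0-1 energy/step score
    "negative_browsing": 0.7,  # fraction of flagged content, 0-1
    "anxiety_score": 0.6,      # normalized sensor-derived score, 0-1
}

def depression_score(r):
    """Toy weighted score in [0, 1]; weights are illustrative only."""
    score = 0.0
    score += 0.3 * max(0.0, (7 - r["sleep_hours"]) / 7)   # sleep deficit
    score += 0.2 * (1 - r["activity_level"])              # low activity
    score += 0.3 * r["negative_browsing"]                 # risky browsing
    score += 0.2 * r["anxiety_score"]                     # anxiety
    return score

def stage(score):
    # Map the score to the mild/moderate/severe stages discussed in Sect. 2.2.
    if score < 0.4:
        return "mild"
    if score < 0.7:
        return "moderate"
    return "severe"

s = depression_score(readings)
print(s, stage(s))
```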

2.2 Monitoring the Victims

Depression and other mental disorders can be classified into three stages: mild, moderate and severe. Psychiatrists, hospitals and medical treatments for depression are available, yet depression remains a critical problem because many patients are neither identified nor treated in the early stages. If patients are identified early and given utmost care and empathy, most cases could be cured. However, many patients cannot even recognize on their own that they are suffering from a mental disorder. Patients with strong psychological and physical health can recover even from the severe stage, but their percentage is very low. Once the stage of the mental disorder is identified based on the score, the IoT system can be used to (1) provide interventions between doctors and patients, (2) enable individuals to engage actively in their own health care, (3) support proactive, preventive and personalized healthcare delivery to individuals around the world and (4) significantly decrease unnecessary hospital admissions and associated costs [5]. If the patient is identified in the early stages, then the system will be able to provide recommendations based on the health status of the user. Suggestions play a crucial role when the patient is identified at an early stage. Figure 2 shows the recommendations based on the depression disorder [6]. For example, if the user is sleep deprived, which might also lead to depression, this can be identified early and recommendations can be provided, initially to listen to music and/or meditate, and to make the user conscious of the importance of adequate sleep for good health. If the problem persists, the recommendations can be changed; depending on the stage, even medical assistance can be recommended. So, it is crucial to provide recommendations from the very early stage, identified by monitoring the health symptoms of the person. A few other recommendations that can be generated based on the user's depression symptoms are: watching a movie, doing yoga, meditating, showing good past moments from mobile images/videos, strolling in the cool breeze with family or friends, playing sports, reading books and/or suggesting books to read, suggesting mobile apps that help the patient sleep or meditate, growing plants or gardening, etc. These activities can keep the patient engaged and active, both physically and mentally.

Fig. 2 Recommendations based on depression disorder

Fig. 3 Remote health monitoring [9]

Recommendations should vary from person to person and be categorized based on the age and health condition of the user. The IoT system should generate the recommendations by taking the following into consideration:

(i) Gender of the user.
(ii) Age group, varying from children to old age.
(iii) Emotions or symptoms measured.
(iv) The context in which the abnormal behavior is recorded.
(v) The location of the user (Fig. 3).

Based on the above factors, the recommendation is to be generated by the system, and the sensors used for measurement play a vital role in generating the data. These recommendations need to be changed based on the state of the mental disorder. Simply having an automated mechanism will not cure the patients; it can only help identify the problem and suggest recommendations. It is always necessary, and beneficial, to have people who can take care of the patient and show love and empathy. If the patient is suffering from mild depression, this is the stage where the state and progress of the patient are regularly reported to family, friends or whoever the patient maintains frequent contact with. This contact data can be taken from the patient's phone and analyzed to measure the patient's emotional connection with those contacts. When the patient is about to reach the severe stage, the report has to be shared with the doctor. These recommendations, the continuous monitoring of the patient and the collected data will precisely help in identifying the depression or mental disorder. Data thus has a significant impact on analyzing depression or mental disorders.

3 Wearable Devices and Sensors

A vulnerable segment of those with a mental illness is termed severe mental illness (SMI), a condition associated with psychosis and other extreme states. These patients may resist treatment, particularly medication; this is where sensors and IoT systems are beneficial, and one reason we have seen the penetration of wearable devices for health monitoring. The sensors in these devices can help doctors and family members ensure that patients take their medication regularly as prescribed. Various devices like Mimo, Sproutling, Withings Home, EmoSpark, Apple Watch, Jawbone and Fitbit have proved that people are not only attracted to wearable devices but also become inclined to improve a few parameters of their fitness. Some of the advanced devices can monitor blood pressure, sugar levels, heart rate, sleep, oxygen level, water consumption, etc. Apple's new cognitive kit can also help patients share their live moods with doctors; it includes games which record the reactions of patients and therefore act like a psychometric test. This shows that IoT can be a good solution for diagnosing and monitoring patients.

4 Diagnostic Algorithms

Is it vital to diagnose patients faster, and does this have a greater impact on the rate of recovery? The latest research suggests that IoT systems with VR technologies help in diagnosing patients faster when compared with traditional methodology, and combining them with AI algorithms further improves and speeds up the diagnosis and the subsequent treatment. IoT combined with VR/AI algorithms has detected signs of clinical depression three months earlier than the medical provider's diagnosis. With the help of these advanced systems, changes can be monitored and analyzed accurately. In a recent test, patients underwent an examination of their mood and thought patterns, and this information was used to guide them through cognitive behavioral therapy (CBT) skills; this AI-based approach proved better than other mechanisms. As mental health is difficult to measure, we need better wearable tools to measure our vital statistics. IoT-driven smart concepts supported by algorithms are a win-win situation for both the rural masses, who get cured, and their doctors, who can quickly predict illness before it develops. These applications and systems have a high potential to save countless lives. AI and machine learning, for instance, could learn from the symptoms, treatments and


Fig. 4 Mechanism for detection of depression

outcomes for a specific condition and provide insights to the physician that would be difficult to arrive at individually (Fig. 4).

5 VR in Treating Clinical Disorder Victims

Some of the major behavioral changes that may lead to depression or a mental disorder are phobias and panic attacks. VR technologies provide a great means of support in assessing and treating clinical disorder patients [9]. Many past reviews have discussed the clinical implications and findings of VR for disorder patients and assessed their quality [10]. Various pilot studies, open trials and randomized controlled trials (RCTs) which implemented VR treatment were reviewed; these compared the effectiveness of VR treatment with other treatments or no treatment, and showed that VR treatment provided better outcomes than other treatment or no treatment. VR-enabled treatment can provide better outcomes for disorders such as "fear of heights," "fear of flying," "spider phobia," "social phobia," "obesity," "fear of public speaking," etc. (Table 1). VR technology supports various treatments like "VR-assisted cognitive behavior therapy," "VR-based cognitive treatment," "VR exposure" and "VR therapy." These treatments provided better results in treating clinical attendees and referred patients who were facing depression and behavioral changes, but the evidence for the efficacy of VR treatment still needs to be established.

6 Research

Research shows that one in every 20 persons suffers from depression at some stage of their life. Many researchers are working on designing various IoT systems to detect depression and track emotions in diverse age groups.


Table 1 Controlled trials

Behavioral changes | Sample/size | Treatment | Comparison | Outcome
Fear of height | CA/37 | VR exposure | No treatment | VR > NT
Fear of flying | CA/30 | VR exposure with physiological feedback | In vivo exposure | VR > IV
Social phobia | RP/36 | VR exposure | CBT group | VR = GCBT
Obesity | CA/216 | VR therapy | CBT group, no treatment | VR > CBTP
Fear of speaking | Students/17 | VR exposure | Self-exposure | VR > SE
Spider phobia | Students/40 | VR exposure with tactile cues | No treatment | VR > NT

CA = clinical attendees, RP = referred patients

Multiple mechanisms like speech features, facial expressions, text patterns, algorithms and energy levels have been used to identify depression and provide recommendations for the patients. Wearable IoT technology, smart healthcare, virtual reality [11], artificial intelligence (AI), EEG signal processing and so on are all undergoing extended trials for optimization in this field.

7 Conclusion and Future Scope

Depression/mental disorder is one of the major problems in humans, causing symptoms that affect how one thinks, feels and even performs daily activities. This paper studies behavioral changes and the importance of IoT mechanisms and VR technologies, which can help in identifying and curing depression/mental disorders. Such a system will be able to monitor the user's activities and emotions, and the data can then be used to determine the mental disorder. This mechanism can help in monitoring patients continuously and in reporting to the psychiatrist and family members, with recommendations generated based on the patient's mental disorder stage. Many wearable devices are being introduced into the market regularly; these devices can be used for monitoring emotions and detecting symptoms based on the data. They can be further improved to provide better service by combining IoT systems with artificial intelligence or machine learning algorithms. IoT can also be combined with other technologies like AR or VR to provide an AR interface, a reliable self-assessment service based on real practitioners' knowledge, a direct pathway to relevant Rethink Mental Illness resources, and personalized support and interventions. Thus, an IoT system, when combined with AR/VR, AI or ML, can give better results and thereby significantly control the death rate.


References

1. https://www.who.int/news-room/detail/09-09-2019-suicide-one-person-dies-every-40-seconds
2. Anumala, H., Busetty, S.M., Bharti, V.: Leveraging IoT device data for emotional health. In: International Internet of Things Summit, pp. 487–501. Springer, Cham (2015)
3. Deepika Mathuvanthi, P., Suresh, V., Pradeep, C.: IoT powered wearable to assist individuals facing depression symptoms (2019)
4. Beck, A.T., Ward, C.H., Mendelson, M., Mock, J., Erbaugh, J.: An inventory for measuring depression. Arch. Gen. Psychiatry 4, 561–571 (1961)
5. Zois, D.S.: Sequential decision-making in healthcare IoT: real-time health monitoring, treatments and interventions. In: 2016 IEEE 3rd World Forum on Internet of Things (WF-IoT), pp. 24–29. IEEE (2016)
6. Ali, S., Kibria, M.G., Jarwar, M.A., Kumar, S., Chong, I.: Microservices model in WoO based IoT platform for depressive disorder assistance. In: 2017 International Conference on Information and Communication Technology Convergence (ICTC), pp. 864–866. IEEE (2017)
7. Vaseem, A., Sharma, S.: Depression: a survey on the Indian scenario and the technological work done. Int. J. Eng. Res. Technol. (IJERT) 08(03) (2019)
8. https://www.c-sharpcorner.com/UploadFile/f88748/internet-of-things-applications/
9. Gregg, L., Tarrier, N.: Virtual reality in mental health. Soc. Psychiatry Psychiatr. Epidemiol. 42(5), 343–354 (2007)
10. Glantz, K., Rizzo, A., Graap, K.: Virtual reality for psychotherapy: current reality and future possibilities. Psychother. Theor. Res. Pract. Training 40, 55–67 (2003)
11. Katkuri, P.K., Mantri, A., Anireddy, S.: Innovations in tourism industry and development using Augmented Reality (AR), Virtual Reality (VR). In: TENCON 2019–2019 IEEE Region 10 Conference (TENCON), Kochi, India, pp. 2578–2581 (2019). https://doi.org/10.1109/tencon.2019.8929478

Sentiment Analysis on Hindi–English Code-Mixed Social Media Text T. Tulasi Sasidhar, B. Premjith, K. Sreelakshmi, and K. P. Soman

Abstract Social media has been experiencing an enormous amount of activity from millions of people across the globe over the last few years. This has resulted in the accumulation of a substantial amount of textual data and opened up several opportunities for analysis. Sentiment analysis and classification is one such task, where the opinion expressed in a text is identified and classified accordingly. It becomes even trickier for code-mixed text due to the free style of writing, which lacks a proper syntactic structure. In this paper, we work on Hindi–English code-mixed texts obtained from the SentiMix shared task of SemEval-2020. We create a novel customized embedding model for feature generation from Hindi–English code-mixed texts and classify them into sentiments such as positive, neutral and negative using deep learning techniques. It is observed that the attention-based CNN-Bi-LSTM model achieves the best performance of all models, with a 70.32% F1-score.

Keywords Sentiment analysis · Word2Vec · fastText · Long short-term memory (LSTM) · Attention mechanism

T. Tulasi Sasidhar (B) · B. Premjith · K. Sreelakshmi · K. P. Soman
Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
e-mail: [email protected]
B. Premjith
e-mail: [email protected]
K. Sreelakshmi
e-mail: [email protected]
K. P. Soman
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_65

1 Introduction

Social media platforms like Facebook, Twitter and Instagram have seen a phenomenal range of interactions across the world. These platforms are flooded with all sorts of data like texts, videos and images, and among all of these, textual communication is a prime source of research due to its abundant usage.


The number of people engaging in social networking is increasing exponentially each day across an enormous variety of aspects. This has opened a huge scope for analyzing and understanding people's behavioral patterns and leveraging them for improvement in several fields, e.g., getting feedback for a product, reviewing public opinion on a new government policy, obtaining the verdict on a movie and so on. The series of methods and techniques used to understand human polarity by extracting relevant information from textual data is termed sentiment analysis [1]. Traditionally, sentiment analysis and classification means analyzing the polarity of the expressed opinion and categorizing it as positive, neutral or negative. Sentiment analysis is one of the subcategories and a prime area of research within natural language processing. Many advanced models have achieved state-of-the-art classification results on texts expressed in a monolingual manner, such as English, Spanish or Chinese. But people from multilingual societies like India tend to use a different style of writing, as they are more likely to be influenced by at least two languages, and code-mixed [2] writing is one such text pattern. Code mixing is the phenomenon of transliterating and mixing a native and a foreign language. An example of such text is illustrated below.

• Sarkar ne corona time pe lockdown shuru karke spread ko control kar diya.

The above example is a Hindi–English code-mixed text, and we can see that Hindi words like "Sarkar ne", "shuru karke" and "kar diya" are written in Roman script. Analyzing this type of sentence and classifying it based on the expressed sentiment is still an active research topic and is difficult when compared to traditional text classification. The lack of pretrained models and quality annotated corpora makes the task even trickier. In this paper, we conduct experiments on such data: we chose Hindi–English code-mixed texts and attempt to classify them into the buckets positive, negative or neutral. A novel way of creating a customized embedding model is proposed for better feature generation, and deep learning models are used to classify the sentences. The paper is structured as follows. Section 2 provides a detailed description of existing work in the field of sentiment classification. The details of the dataset used for experimentation are given in Sect. 3. In Sect. 4, a detailed flow along with a description of each step followed while conducting the experiments is provided, and the paper is concluded in Sect. 5.

2 Related Works

Code-mixed text classification in the context of Indian languages is still an active research area. The lack of annotated data, and the involvement of a complex and semantically rich language like Hindi along with English, make it trickier. The most crucial step in dealing with textual data is to obtain a relevant numerical representation for it. Word embeddings are one such learned representation, which convert words to vectors


while preserving the context among them. Cha et al. [3] proposed a work on encapsulating models by forming word embedding clusters for the evaluation of text. They used Bag-of-Words, word2vec, fastText and Doc2Vec to fabricate semantic embedding features that are highly beneficial for assessing text readability, and among them fastText gave the best performance. In the context of code-mixed text, Braja et al. [4] presented a summary of a shared task in which texts are classified based on the sentiment expressed in them, using two different code-mixed corpora (Hindi–English and Bengali–English). A brief account of each team's approach, in terms of the features and models used, is provided. The top two performing teams used GloVe and fastText word embeddings; fastText along with a CNN layer to grab sub-word features and a bi-directional long short-term memory network (Bi-LSTM) to capture sequential information gave the top classification performance. Shalini et al. [5] proposed an approach for classifying code-mixed texts in Indian languages based on the sentiment embedded in them. They introduced the first annotated Kannada–English corpus by grabbing Facebook comments using the API. The proposed model used Doc2Vec and fastText for feature vector generation, and a machine learning model (SVM) and deep learning networks such as the convolutional neural network (CNN) and Bi-LSTM were used for classification. The method was validated on Bengali–English and Hindi–English corpora acquired from a shared task, achieving 60.22% and 72.20% using Bi-LSTM on EN-HI and EN-BE, and 71.50% using CNN on the EN-KA dataset. In many code-mixed texts, a subset of words constitutes the entire context of the sentence; in order to achieve better classification, it is important to weigh each word. This can be carried out by a neural network with an attention mechanism [6] incorporated in it. Zhou et al. introduced an LSTM model with an attention mechanism for classifying sentiment in cross-language texts [7]. A word2vec model trained on both English and Chinese was used to generate feature vectors. They used a neural network with an attention mechanism trained in combination with bilingual bi-directional LSTMs to model the word sequences, achieving 82.4, 84.1 and 81.3% accuracy on NLP&CC datasets. The main challenges in distinguishing emotions in code-mixed texts are exploring the monolingual and bilingual content of each text and identifying the useful words from the context. Wang et al. [8] addressed these challenges by proposing a bilingual attention network (BAN) model which accumulates the important word features from both languages to construct feature vectors and integrates the vectors with high attention weights to predict the emotion. The related works cited for this task support the fact that sequential models along with an attention mechanism enhance classification, but the lack of a state-of-the-art pretrained embedding model [9, 10], especially in the Hindi–English code-mixed domain, results in sub-par accuracy values. In this work, we propose to fabricate a customized embedding model which gives a better numerical representation of the texts so that better classification can be achieved.


3 Dataset Description

The dataset used for the sentiment analysis experiments is obtained from the SentiMix shared task organized at SemEval-2020 [11]. The detailed distribution of the data is shown in Table 1. The task is to classify Hindi–English code-mixed text based on the sentiment expressed in it; the sentiment labels considered are positive, neutral and negative. The dataset contains 14,000 sentences for training, 3000 for validation and 3000 for testing. Each text is annotated with its sentiment, and word-level language labels are also provided.

4 Experiments and Results

This section is organized as follows. Section 4.1 illustrates the preprocessing steps used for cleaning the data. The first phase of experiments, results and output analysis is provided in Sect. 4.2. Section 4.3 describes the reason for and the procedure of fabricating a customized embedding model, and the experiments conducted with feature vectors from the customized embedding model are provided in Sect. 4.4.

4.1 Preprocessing

The texts in the data are extracted from social media platforms and are filled with information like usernames, URLs, hashtags and special characters, so preprocessing is required to remove the irrelevant information from the sentences. As each data point is split into words, the first step is to concatenate them to form sentences. After that, all usernames (which in general start with @) and hashtags (#) are removed from the sentences. Special characters like multiple dots and smileys, along with additional spaces, are removed, and each sentence is converted to lower case. These preprocessed sentences are used for all the experiments carried out in this work.
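A small sketch of the preprocessing just described; the regexes are our own approximation of the listed steps, not the authors' exact code.

```python
import re

def preprocess(tokens):
    """Join word-level tokens into a sentence and strip social-media noise."""
    text = " ".join(tokens)                            # each data point arrives as word tokens
    text = re.sub(r"@\w+", " ", text)                  # usernames starting with @
    text = re.sub(r"#\w+", " ", text)                  # hashtags
    text = re.sub(r"http\S+|www\.\S+", " ", text)      # URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())   # smileys, dots, other symbols; lower case
    return re.sub(r"\s+", " ", text).strip()           # collapse extra spaces

print(preprocess(["@user", "Sarkar", "ne", "lockdown", "shuru", "karke", "#covid", "..."]))
```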

Table 1 Dataset description

Label | Train data | Validation data | Test data
Positive | 4634 | 982 | 1000
Neutral | 5264 | 1128 | 1100
Negative | 4102 | 890 | 900


4.2 Experiments-1

It is clear from the literature survey that pretrained bilingual embedding models generate better feature vectors for code-mixed sentence classification. As they were already trained on sentences with similar patterns, they establish relatively better semantic relations between words and generate better numerical representations for them. Hence, as the first level of experiments, a domain-specific pretrained model is utilized [12]. Initially, every preprocessed sentence is tokenized. Word2vec from the gensim library is used to load the pretrained model and retrain it with the tokenized sentences; the skip-gram method is used, and the model is retrained for 10 epochs. Sequential models like LSTM and Bi-LSTM are used along with CNN-headed models. Each model is trained for 15 epochs, and the test results are tabulated in Table 2. It is evident from the results that there is large scope for improvement. A retrospective analysis found that a huge number of unique words are introduced by this data to the pretrained embedding model. As many new words are present in the dataset, it is hard to generate relevant embeddings with the available model. Hence, we propose to fabricate a customized word embedding model.
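The retraining step can be sketched with gensim roughly as follows; the pretrained-model path and the sample sentence are placeholders.

```python
from gensim.models import Word2Vec

# tokenized_sentences: list of token lists produced by the preprocessing step (sample shown).
tokenized_sentences = [["sarkar", "ne", "lockdown", "shuru", "karke", "spread", "control", "kar", "diya"]]

# Load a domain-specific pretrained model (path is a placeholder) and
# continue training it on the SentiMix sentences for 10 epochs.
model = Word2Vec.load("pretrained_codemixed_w2v.model")
model.build_vocab(tokenized_sentences, update=True)     # register newly seen words
model.train(tokenized_sentences,
            total_examples=len(tokenized_sentences),
            epochs=10)

vector = model.wv["lockdown"]   # per-word feature vector used by the downstream models
```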

4.3 Customized FastText Embedding Model

As the retrained word2vec model contains many newly introduced words, we decided to create a customized embedding model for this dataset. It is also evident from the literature survey that fastText is one of the word embedding models that produce better feature vectors. These observations directed us to use a fastText embedding model for the further set of experiments. The first stage of this work is to collect tweets similar to the texts in the experimental data. The Python library tweepy is used to scrape more code-mixed texts. Initially, all the words in the data are tokenized, and n-grams with n ranging from 1 to 5 are collected from them. The collected n-grams are used as keywords and given as input to the tweepy library, which collects all the tweets containing the n-gram within the text. All the collected tweets are manually refined, and tweets which have relevant information and are code-mixed in nature are filtered. In total, 110,000 code-mixed texts are collected from social media and other sources. The gensim fastText library is used to create the embedding model: the skip-gram mechanism is selected, and a fastText model is fabricated by training it on the collected data for 10 epochs.
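A sketch of fabricating the customized embedding model with gensim's FastText; the corpus sample and file name are illustrative, and vector_size follows the embedding dimension reported in Table 4.

```python
from gensim.models import FastText

# corpus: ~110,000 scraped code-mixed texts, one token list per text (tiny sample shown).
corpus = [["sarkar", "ne", "lockdown", "shuru", "karke"],
          ["movie", "bahut", "achhi", "thi"]]

# Train a skip-gram fastText model (sg=1) for 10 epochs, as described above.
ft = FastText(sentences=corpus, vector_size=300, sg=1, epochs=10, min_count=1)
ft.save("custom_codemixed_fasttext.model")

print(ft.wv["lockdown"][:5])   # subword-aware vectors, available even for rare words
```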

Table 2 Experiments-1 results

Model | Accuracy | Precision | Recall | F1-score
LSTM | 0.5543 | 0.5671 | 0.5613 | 0.5572
CNN-LSTM | 0.5634 | 0.5742 | 0.5680 | 0.5672
Bi-LSTM | 0.5602 | 0.5627 | 0.5659 | 0.5640
CNN-BiLSTM | 0.5763 | 0.5729 | 0.5896 | 0.5759


4.4 Experiments-2

In this phase, the customized fastText model is used to generate feature vectors, and the experiments are conducted using the same deep learning models as in Experiments-1. There is a significant rise in classification results. At this stage, the misclassified sentences are examined retrospectively, and it is observed that most of them are either very long or very short; this highly arbitrary nature of the sentence lengths is responsible for false classification. So, an attention mechanism is added to the architecture to identify and capture the relevant information according to context, which results in better performance than the models already experimented with. The architecture of the top-performing model is shown in Fig. 1, and the results of each experimented model are given in Table 3. The metrics for measuring the quality of classification are accuracy, recall, precision and F1-score. The confusion matrix of the test results for the best-performing model is shown in Fig. 2, which displays the class-wise classification performance. In comparison with the Experiments-1 results, a surge in the accuracy and F1-score of the deep learning models can be seen in Experiments-2. It is evident that the customized fastText bilingual embedding model gives better feature vectors and the attention mechanism helps in handling sentences whose lengths are highly arbitrary. The optimal hyperparameters of the best-performing model are given in Table 4. Initially, we started with an embedding vector size of 100 and, to observe the change in the results, experimented by varying the vector size from 200 to 400. There was an improvement in results up to 300, but at 400 the results started decreasing, so the optimal embedding dimension was fixed at 300. Various activation functions like ReLU and Tanh were tried, and Tanh gave the best performance. We experimented by varying the number of epochs from 5 to 15 and found that after 10 epochs the model was overfitting, so we stopped at 10 epochs.
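The top-performing architecture can be sketched in Keras roughly as follows; layer sizes follow Table 4, but this is our reconstruction under stated assumptions, not the authors' exact model (self-attention via tf.keras.layers.Attention stands in for their attention mechanism, and MAX_LEN/VOCAB are illustrative).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

MAX_LEN, VOCAB, EMB_DIM, NUM_CLASSES = 50, 20000, 300, 3   # illustrative sizes

inp = layers.Input(shape=(MAX_LEN,))
x = layers.Embedding(VOCAB, EMB_DIM)(inp)                  # fastText vectors would be loaded here
x = layers.Conv1D(300, 10, padding="same", activation="tanh")(x)   # 300 filters of size 10
x = layers.Bidirectional(layers.LSTM(350, return_sequences=True))(x)
x = layers.Attention()([x, x])                             # self-attention over time steps
x = layers.GlobalAveragePooling1D()(x)
out = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```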

Fig. 1 Overview of attention-based Bi-LSTM model

Table 3 Experiments-2 results

Model | Accuracy | Precision | Recall | F1-score
LSTM | 0.6335 | 0.6371 | 0.6196 | 0.6282
CNN-LSTM | 0.6476 | 0.6484 | 0.6560 | 0.6476
Bi-LSTM | 0.6423 | 0.6409 | 0.6496 | 0.6452
CNN-BiLSTM | 0.6524 | 0.6667 | 0.6546 | 0.6579
CNN-BiLSTM + Attention | 0.7016 | 0.7060 | 0.7016 | 0.7032


Fig. 2 Confusion matrix of best performing model

Table 4 Hyperparameters of best performing model

Hyperparameter | Selected value
Embedding dimension | 300
Optimizer | Adam
Loss function | Categorical cross entropy
Activation | Tanh (Bi-LSTM), Softmax (dense layer)
Epochs | 10
Batch size | 100
No. of Bi-LSTM units | 350
No. of conv. filters | 300
Size of conv. filters | 10 × 1

The number of Bi-LSTM units was varied from 100 to 400, and at 350 we observed better classification. In summary, all the hyperparameters were selected by trial and error.

5 Conclusion

Sentiment classification of Hindi–English code-mixed text is carried out in this work. The data for the experimentation is obtained from the SentiMix shared task of SemEval-2020, and the target labels are positive, neutral and negative. In the first phase of experiments, a pretrained word2vec model is retrained with the data and feature vectors are generated; a CNN-headed Bi-LSTM sequential model gives the best performance, with a 57% F1-score. In order to improve the classification, a customized fastText bilingual embedding model is fabricated, and an attention mechanism is used to deal with arbitrary sentence lengths. It is observed that, out of all the experimented models, the attention-based CNN-BiLSTM gives the best performance in terms of F1-score. It is evident from the confusion matrix that it also gives better class-wise performance.

References

1. Mäntylä, M.V., Graziotin, D., Kuutila, M.: The evolution of sentiment analysis–a review of research topics, venues, and top cited papers. Comput. Sci. Rev. 27, 16–32 (2018)
2. Sreelakshmi, K., Premjith, B., Soman, K.P.: Detection of hate speech text in Hindi–English code-mixed data. Procedia Comput. Sci. 171, 737–744 (2020)
3. Cha, M., Gwon, Y., Kung, H.T.: Language modeling by clustering with word embeddings for text readability assessment. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 2003–2006. ACM (2017)
4. Patra, B.G., Das, D., Das, A.: Sentiment analysis of code-mixed Indian languages: an overview of SAIL-Code-Mixed Shared Task @ ICON-2017. arXiv preprint arXiv:1803.06745 (2018)
5. Shalini, K., Ganesh, H.B., Kumar, M.A., Soman, K.P.: Sentiment analysis for code-mixed Indian social media text with distributed representation. In: 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1126–1131. IEEE (2018)
6. Chen, H., Sun, M., Tu, C., Lin, Y., Liu, Z.: Neural sentiment classification with user and product attention. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1650–1659 (2016)
7. Zhou, X., Wan, X., Xiao, J.: Attention-based LSTM network for cross-lingual sentiment classification. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 247–256 (2016)
8. Wang, Z., Zhang, Y., Lee, S., Li, S., Zhou, G.: A bilingual attention network for code-switched emotion prediction. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1624–1634 (2016)
9. Kamble, S., Joshi, A.: Hate speech detection from code-mixed Hindi-English tweets using deep learning models. arXiv preprint arXiv:1811.05145 (2018)
10. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
11. Patwa, P., Aguilar, G., Kar, S., Pandey, S., PYKL, S., Gambäck, B., Chakraborty, T., Solorio, T., Das, A.: SemEval-2020 task 9: overview of sentiment analysis of code-mixed tweets. arXiv e-prints, arXiv-2008 (2020)
12. Sasidhar, T.T., Premjith, B., Soman, K.P.: Emotion detection in Hinglish (Hindi + English) code-mixed social media text. Procedia Comput. Sci. 171, 1346–1352 (2020)

Accident Risk Rating of Streets Using Ensemble Techniques of Machine Learning Akanksha Rastogi and Amrit Lal Sangal

Abstract Increased vehicular traffic and a lack of expert drivers on the street, coupled with adverse conditions and poor maintenance of streets, are responsible for the increase in traffic accidents. Hence, prediction of traffic collisions is of paramount importance for their mitigation. Street traffic analysis and prediction can be a dedicated approach to ensure safe and reliable street networks. The primary objective of this research is to assign an accurate accident risk factor to each street using machine learning models on the identified dataset. For automated and accurate prediction, various ensemble models of machine learning are applied, and their performance is compared with the naive models.

Keywords Receiver operating characteristic (ROC) · Support vector machine (SVM) · Decision tree (DT) · K-nearest neighbor (KNN) · Root mean square error (RMSE)

1 Introduction

In recent times, increased urbanization has resulted in a much higher count of vehicles on streets, which has given rise to numerous troubles, such as traffic congestion, accidents and air pollution. These issues have caused immense physical and economic loss as well as human casualties. The Global Status Report on Road Safety 2015, representing statistics from 180 nations, reveals that around 1.25 million traffic fatalities occur worldwide every year, with the maximum traffic mortality rates in lower-income nations. According to the National Vital Statistics Reports 2017, traffic accidents are responsible for around 36,000 deaths in the USA. The urgency of the moment is to improve traffic safety and reduce the number of deaths.

A. Rastogi (B) · A. L. Sangal
Department of Computer Science and Engineering, Dr. B R Ambedkar National Institute of Technology, Jalandhar, Punjab, India
e-mail: [email protected]
A. L. Sangal
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_66


Some collisions have been found to occur because of road structures, whereas others can be attributed to human error. Although progress has been made in strengthening regulations for on-road protection and making cars safer, studies indicate that the pace of reform remains very slow. With the help of traffic data and deep learning, traffic flow forecasting has enabled individuals to avoid large jams and accidents by choosing routes with lower congestion. Large-scale traffic data and machine learning may likewise provide a promising solution to predict and reduce the risk of traffic casualties. One significant task in accident avoidance is to develop an effective traffic safety score prediction system. If a traffic safety score for a specific area can be predicted, it can be forwarded to nearby drivers to make them careful or to help them select a safer street. However, precise prediction of traffic accidents is difficult, as many interrelated causes influence a crash. Ensemble models give machine learning an advantage: they combine the results of several models and allow better predictive performance than a single model. This potential makes it beneficial to apply ensemble models of machine learning to the problem mentioned above. The main objective is to investigate risk levels of roads using traffic accident data. The purpose of this paper is to analyze accident data, classify streets into various accident risk levels and apply machine learning models to accurately predict the risk levels of the streets, so that future research can build a device that helps drivers avoid collision-prone areas and advises alternative approaches to alleviate accident recurrence and severity. The remainder of the paper is structured as follows: Sect. 2 reviews the literature, Sect. 3 presents the proposed accident risk assignment method, Sect. 4 discusses model evaluation, and Sect. 5 concludes the paper with future directions.

2 Literature Survey

Considerable effort has been devoted to identifying the main factors or distinct road patterns that can cause traffic collisions. For instance, Oh suggested that disrupted traffic flow is one of the causes provoking accidents [1]. Based on a loop detector dataset and a crash dataset, they found that the 5-min deviation of vehicle speeds immediately before a car accident is a notable indicator of a crash. Even though several accident indicators have been proposed, they could not address exact accident prediction, because many factors have complicated relations with car accidents. Spatio-temporal dependence is a moving part of traffic; Yue assessed the dependence of traffic movement on space and time using cross-correlation reasoning and showed its importance in traffic prediction [2]. Dauwels proposed unsupervised learning approaches to infer the spatio-temporal patterns in large-scale traffic speed forecasting [3]. Pan proposed a model intended to forecast the spatio-temporal effect of incidents on neighboring traffic based on real-time traffic data [4]. Yu proposed a spatio-temporal recurrent convolutional neural network to capture the spatial interdependencies and temporal behavior of network-wide traffic [5]. The progress of AI technology has pushed researchers toward real-time traffic accident prediction. Lv considered factors based on the Euclidean metric and used a k-nearest neighbor approach to predict car accidents [6]. Park assembled a large amount of highway crash data for Seoul and built a prediction workflow based on k-means clustering and logistic regression [7]. Chen gathered a human mobility dataset in Japan and used a stacked denoising autoencoder to infer real-time traffic risk. One drawback of these studies is that they did not incorporate the temporal patterns of the traffic impact itself into the models; without this information, the predictive power of the model can be reduced. In general, the literature on collision injury severity shows that serious thought has been given to modeling crash severity, but prediction of the injury outcome has not been a core concern. Statistical models are used more often in crash severity modeling compared with AI techniques, while AI methods are mostly used as prediction tools. DT, NB, SVM, and RF have been used in crash severity modeling with varying popularity.

3 The Proposed Risk Assignment Method

The proposed process for the risk assignment to each street is shown in Fig. 1. The detailed description of all the steps is given in the subsections following the figure.

Fig. 1 An overview of the risk assignment process


3.1 Dataset Selection

The two datasets used in this research are the vehicle dataset and the vehicular accident dataset obtained from the Chicago Data Portal of the City of Chicago for the years 2015 to 2019.

3.2 Data Preprocessing

Based on the common report number attribute, we merged both datasets. We also cleaned the data to include only on-road and passenger vehicle types, while excluding unknown values in the traffic type, lighting condition and weather columns. The attribute 'num-passengers' does not include the driver of the vehicle. Hence, if 'num-units' = 1, meaning only one driver was involved in the crash, we add 1 to get the total number of people in the vehicle; if 'num-units' > 1, meaning more than one vehicle was involved in the crash, we add that value to the total number of passengers. From the 1610 samples, 30% were kept for testing and the remaining 70% were used to train the model.
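A rough pandas sketch of this merge-and-clean step; the file and column names follow the description above but are assumptions about the Chicago crash data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

crashes = pd.read_csv("crashes.csv")      # vehicular accident dataset (assumed file names)
vehicles = pd.read_csv("vehicles.csv")    # vehicle dataset

# Merge on the common report number and drop unknown categorical values.
df = crashes.merge(vehicles, on="report_number")
for col in ["traffic_type", "lighting_condition", "weather"]:
    df = df[df[col] != "UNKNOWN"]

# num_passengers excludes drivers, so add one driver per involved unit.
df["total_people"] = df["num_passengers"] + df["num_units"].clip(lower=1)

# 70/30 train-test split of the remaining samples.
train, test = train_test_split(df, test_size=0.3, random_state=42)
```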

3.3 Feature Engineering

Feature selection is the process of choosing the attributes that can make the predicted variable more accurate, or eliminating those attributes that are irrelevant and can diminish model precision and quality. Correlation is an approach to understand the relationship between multiple variables and attributes in the dataset: it shows whether one or more attributes depend on, cause, or are associated with other attributes. The correlation analysis shows that accident severity can be determined based on the number of injuries and the physical damage in the accident data.

3.4 Assigning the Danger Score

We assign the danger score according to the number of injuries per person in the vehicle and the physical damage to the vehicle, since these two factors have the maximum correlation. The number of injuries per person involved in the accident is found by dividing the total number of injuries in the accident by the total number of people involved. We chose to assign four danger score ratings to cover all accidents, depending on the amount of injuries. We checked the unique values of the damage and then decided what weight to add to it in the computation of the danger score. There are three unique values of the damage in monetary terms (1500), so we assign rating values 1, 2 and 3, respectively, and then apply a weight 'w' in the multiplication. With a weight of 0.5, we have eight different scores to bin into four categories: the danger score is 1 if the current score is 0.5 or 1; 2 if the current score is 1.5 or 2; 3 if the current score is 3 or 4; and 4 if the current score is 4.5 or 5. Then, we assign the accidents into three bins as the combined danger score.
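The scoring rule just described can be sketched as follows; the damage-to-rating mapping and bin edges mirror the text, but this is an illustrative reconstruction rather than the authors' exact code.

```python
def danger_score(injuries, people, damage_rating, w=0.5):
    """Injury rate per person plus a weighted damage rating (1, 2 or 3)."""
    raw = injuries / max(people, 1) + w * damage_rating   # weight w = 0.5 as in the text
    # Bin the raw score into the four danger levels described above.
    if raw <= 1.0:
        return 1
    if raw <= 2.0:
        return 2
    if raw <= 4.0:
        return 3
    return 4

print(danger_score(injuries=2, people=3, damage_rating=3))   # -> 3
```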

3.5 Applying Machine Learning Models

After assigning the danger score to each accident, we applied machine learning models to estimate the accuracy of the assigned danger scores. We applied basic machine learning models like logistic regression, SVM, KNN and decision tree, and ensemble models like random forest and gradient boosting. The implementation results of these models are described in the next section.
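A compact sketch of fitting and comparing the naive and ensemble classifiers; the feature matrix here is synthetic placeholder data standing in for the engineered accident features.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import classification_report

# Placeholder features and 3-class danger labels standing in for the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1610, 6))
y = rng.integers(1, 4, size=1610)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "K-nearest neighbor": KNeighborsClassifier(),
    "Support vector machine": SVC(),
    "Decision tree": DecisionTreeClassifier(),
    "Random forest": RandomForestClassifier(),
    "Gradient boosting": GradientBoostingClassifier(),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, clf.predict(X_test)))
```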

4 Model Evaluation

We implemented basic machine learning models and ensemble models and compared the results based on metrics like precision, recall and F1-score in Table 1, alongside other methods used in the study for prediction such as the SVM, KNN and DT models. We also show the performance of the models on ROC curves: Figs. 2, 3, 4, 5, 6 and 7 provide the ROC curve of each model's performance on the given dataset for the three danger score classes. The results obtained show that gradient boosting is better than all other models; the gradient boosting model has the smallest RMSE across the different prediction models for all the classes.

Table 1 Parameters based on the performance of models

Model | Precision | Recall | F1-score | Accuracy | RMSE
Logistic regression | 62 | 51 | 56 | 51.19 | 63.97
K-nearest neighbor | 86 | 84 | 84 | 83.62 | 47.96
Support vector machine | 91 | 89 | 89 | 85.82 | 40.46
Decision tree | 81 | 90 | 85 | 85.97 | 39.25
Random forest | 86 | 84 | 84 | 85.84 | 39.35
Gradient boosting | 94 | 93 | 93 | 90.26 | 31.20


Fig. 2 ROC for logistic regression

Fig. 3 ROC for k-nearest neighbor

5 Conclusions and Future Scope

In the last decade, the analysis and prediction of street traffic have become a subject of continuous research in various sub-fields of computer science. In this paper, we listed and discussed various approaches proposed for the accident risk rating assignment of streets, including machine learning and ensemble techniques. From the results obtained, we can conclude that gradient boosting, an ensemble model of machine learning, performed better than the simple random forest in terms of accuracy and the other parameters.


Fig. 4 ROC for support vector machine

Fig. 5 ROC for decision tree

Future work can be extended to develop a more successful model that outperforms the accuracy achieved by gradient boosting. Hyperparameter tuning can also be used to optimize the models with the best-performing parameters. Moreover, the system could be extended to include more accident-related features that would help drivers choose the safest path out of the various available routes, and the model should also be able to report the risk of all the available streets. Future research will focus on confirming the superiority of DNNs as traffic accident severity classification/prediction models using the existing datasets of the relevant


Fig. 6 ROC for random forest

Fig. 7 ROC for gradient boosting

literature, and on putting forward a set of independent parameters that are not only salient but also sufficient for traffic accident severity prediction.


References

1. Oh, C., Oh, J.-S., Ritchie, S., Chang, M.: Real-time estimation of freeway accident likelihood. In: 80th Annual Meeting of the Transportation Research Board, Washington, DC (2001)
2. Yue, Y., Yeh, A.G.-O.: Spatiotemporal traffic-flow dependency and short-term traffic forecasting. Environ. Plann. B Plann. Des. 35(5), 762–771 (2008)
3. Asif, M.T., Dauwels, J., Goh, C.Y., Oran, A., Fathi, E., Xu, M., Dhanya, M.M., Mitrovic, N., Jaillet, P.: Spatiotemporal patterns in large-scale traffic speed prediction. IEEE Trans. Intell. Transp. Syst. 15(2), 794–804 (2014)
4. Pan, B., Demiryurek, U., Shahabi, C., Gupta, C.: Forecasting spatiotemporal impact of traffic incidents on road networks. In: 2013 IEEE 13th International Conference on Data Mining (ICDM), pp. 587–596. IEEE (2013)
5. Yu, H., Wu, Z., Wang, S., Wang, Y., Ma, X.: Spatiotemporal recurrent convolutional networks for traffic prediction in transportation networks. Sensors 17(7), 1501 (2017)
6. Lv, Y., Tang, S., Zhao, H.: Real-time highway traffic accident prediction based on the k-nearest neighbor method. In: International Conference on Measuring Technology and Mechatronics Automation, ICMTMA'09, vol. 3, pp. 547–550. IEEE (2009)
7. Park, S.-H., Kim, S.-M., Ha, Y.-G.: Highway traffic accident prediction using VDS big data analysis. J. Supercomput. 72(7), 2815–2831 (2016)

Skin Detection Using YCbCr Colour Space for UAV-Based Disaster Management S. J. Arya, A. Asish, B. S. Febi Shine, J. L. Sreelakshmi, and Elizabeth Varghese

Abstract The crushing impact of inappropriate disaster management and the increased death rate during a disaster compel the search for powerful disaster management. Rapid technological advancement and research on unmanned aerial vehicles (UAVs) have encouraged their use in disaster management. A UAV captures images of the disaster site, and further analysis with image processing techniques helps in human detection through a skin detection technique. This study involves skin detection using the YCbCr colour space. Experimental tests were performed both outdoors and indoors, and the results depend largely on lighting conditions and various environmental factors. The UAV was hovered about 15 m above the ground to capture the outdoor samples. This technique helps to obtain better output, such that humans can be detected based on their skin, enabling quick recovery and reducing both mortality and the risk to rescue operators.

Keywords Unmanned aerial vehicle · Image processing · Disaster management · Skin detection

S. J. Arya (B) · A. Asish · B. S. Febi Shine · J. L. Sreelakshmi · E. Varghese
Department of Electrical & Electronics Engineering, Mar Baselios College of Engineering & Technology, Thiruvananthapuram, India
e-mail: [email protected]
A. Asish
e-mail: [email protected]
B. S. Febi Shine
e-mail: [email protected]
J. L. Sreelakshmi
e-mail: [email protected]
E. Varghese
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_67


1 Introduction

Disaster surveillance is an unpredictable procedure and is still developing, as there is a compelling need for effective disaster management. Technological progress has brought refined changes to disaster management, and one of them is surveillance using unmanned aerial vehicles. Successful disaster management is required for fast recovery and restoration, thereby giving solid guidance to rescue operators. Management becomes difficult at post-disaster scenes because of huge uncertainty and turbulent conditions. This paper conducts a study on the YCbCr colour space technique for skin detection, which can be utilized for disaster victim surveillance using a UAV [1]. Skin detection has been a developing area for the past ten years, and many techniques have been proposed. Skin colour extraction is a basic element of the skin detection process [2]. The different characteristics of images are brightness, variance, visibility and saturation [2]. Human skin detection is challenging because of the huge range of skin colours from one region to another [3]. A colour space can be defined by two or three different colour components, and different applications like TV broadcasting and graphics suit different colour spaces [2]. Numerous papers discuss texture segmentation procedures such as Gabor filters, edge detection, content-based image retrieval, thresholding, Markov random fields, supervised and unsupervised segmentation, clustering techniques, and region-based and histogram-based strategies [4]. A significant property of Gabor filters is that they have optimal joint localization, or resolution, in both the spatial and the spatial-frequency domains [5]. In 2011, a texture-based skin detection algorithm was proposed that covers skin detection via HSV as well as texture segmentation using the grey-level distribution [5]; it concludes that the algorithm based on the greyscale distribution is effective and yields better results. The referred papers show that the YCbCr colour space is well suited for skin detection in complex images. Since this paper deals with skin detection at a disaster site, it adopts the YCbCr colour space and greyscale conversion for better results. By coordinating image processing techniques with a UAV, a quick and effective rescue facility can be enabled, benefiting disaster surveillance. Image processing is a technique of modifying an image and changing its attributes to obtain the desired characteristics. The best-suited image processing technique for disaster surveillance is skin detection, which identifies the presence of humans trapped at a disaster site through skin colour identification. Human body detection is done using skin colour recognition, a procedure by which skin-coloured pixels are separated from captured images or video. Skin colour, along with lighting conditions, is the deciding factor for skin recognition. The captured image is broken down into single pixels, which are classified into the desired output in the YCbCr colour space. Thus, for developing a fast and cheap rescue facility with better efficiency, a skin identification technique linked with drone technology opens a new path in disaster surveillance.



Fig. 1 Block diagram

2 Methodology

This paper describes an experimental study on skin detection in the YCbCr colour space. MATLAB is used for the image processing, and the results indicate the feasibility of the selected skin detection method, which will help to build an effective system for disaster victim surveillance using a UAV. Once a sample is loaded, it is converted from normal RGB values into the YCbCr colour space. Different morphological operations are then applied to make skin detection efficient. Along with this colour segmentation model, the paper aims to check the efficiency of skin detection on the greyscale value distribution; for this purpose, the MATLAB code is extended to plot histograms and to detect skin colour in the greyscale range as well. In YCbCr, the effectiveness of the procedure is analysed by plotting the histogram of every component. The UAV, mounted with a camera, captures pictures of the disaster site, which are received at the ground station. Processing the pictures taken by the UAV during flight from a height in MATLAB for skin identification opens a way to discover casualties and ensures a quick recovery. Figure 1 shows the block diagram of the system, which has a ground-board system and an air-board system.

3 Image Processing

Image processing is a growing field with a wide range of uses and plays a major role in surveillance. It is a technique of modifying the characteristics of an image and changing its attributes to get the desired output according to the interest of the user. The fundamental steps in every kind of image processing remain the same: images are imported using an image acquisition process, the images are analysed through software, the captured images are manipulated to produce the desired features, and different filtering techniques are applied to remove noise. Skin detection is done by separating skin-coloured pixels from non-skin-coloured pixels.

3.1 YCbCr Colour Space

YCbCr colour space is an encoded form of the RGB colour space used for video streaming and compression. Since this representation makes it easy to discard some redundant colour information, it finds application in image and video compression standards such as JPEG, MPEG-1, MPEG-2 and MPEG-4. The simplicity of the transform and the explicit separation of the luminance and chrominance components make the YCbCr colour model attractive. In this arrangement, luminance information is stored as a single component ('Y'), and chrominance information is stored as two colour-difference components ('Cb' and 'Cr'). 'Cb' represents the difference between the blue component and a reference value, and 'Cr' represents the difference between the red component and a reference value. It is a linear conversion of RGB [4]. YCbCr values can be obtained from the RGB colour space as per Eqs. 1-3 [4]:

Y = 0.299R + 0.587G + 0.114B    (1)

Cr = R − Y    (2)

Cb = B − Y    (3)

The YCbCr colour model representation, with all three components, is shown in Fig. 2.

Fig. 2 [4] YCbCr colour model
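For illustration, Eqs. 1-3 can be written as a short Python/NumPy sketch (this is not the authors' MATLAB code; the assumption that the RGB planes are scaled to [0, 1] is ours):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Apply Eqs. 1-3 to an H x W x 3 RGB array with values in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b   # Eq. 1: luminance component
    cr = r - y                               # Eq. 2: red colour-difference
    cb = b - y                               # Eq. 3: blue colour-difference
    return y, cb, cr
```

The histograms of the three components discussed in Sect. 4 can then be obtained by simply binning each returned plane.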



4 Results of Skin Detection

The skin tone selected for the study is of the moderate colour range: 0 < H < 50 and 23 < S < 68 [2]. RGB-level to grey-level conversion takes place as per the code, and the converted range used for skin is y > 0.2, 0.3 < cb < 0.44 and cr > 0.47. Skin detection of both an indoor image and an outdoor image captured during the flight of the UAV is done in the YCbCr colour space.
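A hedged NumPy sketch of the thresholding rule quoted above (the exact scaling applied to the Y, Cb and Cr planes before thresholding is not stated in the text, so treating them as values normalised to [0, 1] is an assumption):

```python
import numpy as np

def skin_mask(y, cb, cr):
    """Boolean mask of candidate skin pixels using the ranges reported in
    Sect. 4: y > 0.2, 0.3 < cb < 0.44 and cr > 0.47 (planes assumed in [0, 1])."""
    return (y > 0.2) & (cb > 0.3) & (cb < 0.44) & (cr > 0.47)
```

Morphological operations, as mentioned in Sect. 2, can then be applied to the mask to remove isolated false detections.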

4.1 Skin Detection Results in YCbCr

See Figs. 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14.

Fig. 3 Indoor sample

Fig. 4 Skin pixels

Fig. 5 Skin pixels in colour

Fig. 6 Non-skin pixels in colour

Fig. 7 Original colour image

Fig. 8 Histogram of Y image


Fig. 9 Histogram of Cb image

Fig. 10 Histogram of Cr image

Fig. 11 Outdoor sample

Fig. 12 Skin pixels




Fig. 13 Skin pixels in colour

Fig. 14 Non-skin pixels in colour

4.2 Results

The RGB colour space is not commonly used here because of its non-uniform nature. The YCbCr colour space shows better results and transformation. As the disaster site contains more complex situations, the YCbCr colour space will be effective for the skin detection purpose. Figure 3 is an indoor sample. Skin pixels are extracted and shown in Fig. 4. The skin part and non-skin part are marked in Figs. 5 and 6, respectively. The histogram of the image can be used for better analysis of the picture. The original colour image obtained by processing the sample using the code is shown in Fig. 7. Histograms of Y, Cb and Cr are given in Figs. 8, 9 and 10, respectively. Figure 11 is the outdoor sample that was taken using the UAV from an approximate height of 15 metres. Skin pixel identification in greyscale is shown in Fig. 12. Figures 13 and 14 are the skin pixels in colour and the non-skin pixels in colour, respectively. The results were accurate for the samples that were used for the test.

5 Conclusion

Skin detection is a leading area in human body identification and analysis and is applied in many emerging technologies, such as face detection, where human skin colour acts as an elementary cue for detection. The segmentation of human skin colour depends upon the colour space selected, as the skin colour distribution largely depends on the colour space. Images under various conditions of orientation, illumination, shadow, pose and in-plane rotation can be distinguished, and the technique is applied in diverse applications such as video compression and recognition technology, although this remains difficult for computer vision systems. Considering the various limitations in this field, there is not yet a hundred per cent solution for skin detection, and it is still under development.

References 1. Mbaitiga, Z., Fuji, S., Minori, S.: Rapid human body detection in disaster sites using image processing from unmanned aerial vehicle (UAV) cameras. In: ICIIBMS 2018, Track 2: Artificial Intelligent, Robotics, and Human-Computer Interaction, Bangkok, Thailand 2. Lei, Y., Hui, L., Xiaoyu, W., Dewei, Z., Jun, Z.: An algorithm of skin detection based on texture. In: 4th International Congress on Image and Signal Processing 2011, pp. 1822–1825 (2011) 3. Shaik, K.B., Ganesan, P., Kalist, V., Sathish, B.S., Jenitha, J.M.M.: Comparative study of skin color detection and segmentation in HSV and YCbCr color space. Procedia Comput. Sci. 57, 41–48 (2015) 4. Ahmed, E., Crystal, M., Dunxu, H.: Skin detection—a short tutorial. Encyclopedia of Biometrics, pp. 1218–1224. Springer, Berlin, Heidelberg (2009) 5. Kolkur, S.: Human skin detection using RGB, HSV and YCbCr colour models. Adv. Intell. Syst. Res. 137, 324–332 (2017)

Lie Detection Using Thermal Imaging Feature Extraction from Periorbital Tissue and Cutaneous Muscle Prajkta Kodavade, Shivani Bhandigare, Aishwarya Kadam, Neha Redekar, and Kiran P. Kamble

Abstract This contribution addresses the problem of detecting deception during an interrogation by taking thermal images of the face. When a person is lying, the stress on the face increases the blood flow in the periorbital tissue and cutaneous muscles. In the proposed work, we have collected a dataset of such thermal images and developed algorithms to extract features from the generated dataset in order to train a neural network model, which in turn classifies an input image or video frame as deception or truth. Using the proposed approach, the obtained F1 scores for baseline, truth, direct lie and indirect lie are 54%, 46%, 67% and 77%, respectively, and the overall accuracy is 60.53%. Keywords Thermal images · Neural networks

1 Introduction According to several studies, the differentiation between the liars and non-liars is very poorly detected by normal as well as expert people. The well-known method, i.e., polygraphy includes different sensors which measure a person’s blood pressure, P. Kodavade (B) · S. Bhandigare · A. Kadam · N. Redekar · K. P. Kamble Department of Computer Science and Engineering, Walchand College of Engineering, Sangli, India e-mail: [email protected] S. Bhandigare e-mail: [email protected] A. Kadam e-mail: [email protected] N. Redekar e-mail: [email protected] K. P. Kamble e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_68




respiration activity, etc. Mostly, the polygraph technique succeeds in detecting lies with an accuracy of 90%, but its major drawbacks are the time taken and the dependence on the quality of the expert people conducting the test. This drawback can be overcome by using an automated deception detector which detects lies from facial and behavioral changes. The lie detector uses a thermal image as its input. It uses the temperature of the skin captured by the thermal camera, which varies due to the blood flow resulting from varying emotions. This technique seems promising, as it is hard to control one's emotions [1]. This change in blood flow is mostly observed in the forehead and periorbital regions of the face, which suggests that the relevant patterns for detecting lies and truth are to be found in these regions.

2 Literature Survey

Rajoub et al. [2] presented a lie detection approach based on observing thermal features in the region of interest, i.e., the periorbital region; the approach uses machine learning on these features. Bhowmik et al. [3] proposed a solution for facial feature detection in which features such as the eyes, nose and mouth, which do not vary with rotation, scale or image noise, are detected using the Harris interest point detection algorithm. Wu et al. [4] presented thermal face recognition and also learned important features like the nose, eyes and mouth from the raw dataset using a CNN architecture; the recognition rate is still affected by conditions like head rotation, expression variation and illumination variation. Kyal et al. [5] presented a way to identify a human face from a thermal image efficiently; the feature extraction task was done using a histogram plot, and techniques such as object boundary analysis and thresholding were applied to the images to detect the face efficiently. George et al. [2] analyzed the count of eye blinks and the duration of blinks for truth and lie responses. Analysis of 50 people over some sample questions showed that both the count and the duration are higher in the case of deception. Responses were grouped on the basis of maximum blink duration and maximum blink count for both lie and truth responses, and responses where no blinking was observed were categorized as a no-blink category.

3 Methodology

The methodology used for the present research includes recording facial skin temperature using a thermal camera and processing the captured images on a mobile device using deep learning techniques.



3.1 Architecture

Figure 1 demonstrates the top-level architecture of the project model. A thermal camera is plugged into a smartphone, and video is recorded. The recorded video in the smartphone gallery is then transferred to the application. This video is fetched on the local machine where the trained model is already located; the model is downloaded from the deep learning server after training with a sufficiently large dataset. The processed input image is then given as input to the script, and the respective response is generated using the pre-trained model, which is ultimately sent back to the user via the application.

3.2 Data Acquisition

Data will be acquired by using a thermal camera, the Seek Compact Thermal Imager for Android, with a temperature range of −20 to 1000 °C. During each interview session, thermal measurements of the participant's face will be obtained by using the Android smartphone attached to the thermal camera. The dataset can be obtained by using the following methods: 1. Surveillance of participants and interviewing them. 2. Asking a subject to describe another person. 3. Interviews of different people, who can have different base body temperatures under normal circumstances.

Fig. 1 Architecture of Lie detection system



These varying temperatures can affect or change the heat maps, and they can also worsen the accuracy of detecting a lie. To get over this issue, the initial few seconds of every recording will be used as the baseline for each person; in this case, the concerned person will sit normally without getting into any pastime or answering. The dataset is based on two profile case studies and a particular mock crime. A total of ten participants are considered for the demo interview, with each interview further divided into four parts: (1) baseline, (2) true, (3) direct lie and (4) indirect lie. For testing purposes, we will be using a mock crime video.

3.3 Data Processing

The data processing begins with identifying and cropping the subject's face [6]. This action is followed by detecting the maximum intensity point in the image (this refers to the nasal tip, which is closest to the thermal camera). Taking this as the reference point, the locations of the forehead and eye regions are calculated [7]. These regions are then cropped from the image for creation of the dataset. The dataset thus consists of forehead and periorbital regions, which are obtained by refining the previously cropped images; these cropped images are stored as the dataset. A sample of the processing is shown in Figs. 2 and 3.
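A minimal OpenCV/NumPy sketch of this cropping step (the band geometry used to locate the forehead and eye regions relative to the reference point is an illustrative assumption, not the authors' exact values):

```python
import cv2

def crop_regions(thermal_gray, face_box):
    """Locate the reference point (brightest pixel, taken here as the nasal tip)
    inside the detected face box of a single-channel thermal frame, then cut out
    forehead and periorbital bands.  The band geometry below is illustrative."""
    x, y, w, h = face_box
    face = thermal_gray[y:y + h, x:x + w]
    _, _, _, (_, ref_row) = cv2.minMaxLoc(face)        # row of the max-intensity pixel
    forehead = face[: h // 4, :]                        # assumed: top quarter of the face
    eyes = face[max(0, ref_row - h // 4): ref_row, :]   # assumed: band above the reference
    return forehead, eyes
```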

Fig. 2 Periorbital regions to be cropped from the eye region

Fig. 3 The cropped forehead and eye region

3.4 Experimenting

The data acquired from the above step is used for experiments on a deep learning (DL) machine. The dataset, with its four parts (baseline, true, direct lie and indirect lie), is uploaded to the DL server. The server has two NVIDIA V100 architecture GPU cards and 128 GB DDR4 ECC RAM. The model is trained using AlexNet [8] and different algorithms available on the server [9]. We varied the epochs and other parameters to obtain higher accuracy. The model is trained using the processed dataset, which contains 22,000 images for baseline, 33,912 for truth and 11,000 each for direct and indirect lie. The test dataset contains 6752 test images for baseline, 7892 for truth and 3375 each for the direct and indirect lie classes.
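As an illustration of this training step, a hedged PyTorch sketch of fine-tuning AlexNet for the four classes is given below (the paper does not give the training scripts, hyper-parameters or data layout on the DL server, so the directory structure, epoch count and optimizer settings are assumptions; torchvision 0.13+ is assumed for the weights API):

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Assumed layout: train/<class_name>/... with the four classes as folder names.
tfms = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_ds = datasets.ImageFolder("train", transform=tfms)
loader = torch.utils.data.DataLoader(train_ds, batch_size=64, shuffle=True)

model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)  # ImageNet weights
model.classifier[6] = nn.Linear(4096, 4)   # baseline / truth / direct / indirect lie
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

model.train()
for epoch in range(10):                    # epoch count is an assumption
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```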

4 Dataset

The dataset is acquired by using a thermal camera, the Seek Compact Thermal Imager for Android, with a temperature range of −20 to 1000 °C (evening). The thermal camera features are as follows: an image resolution of 640 × 480 pixels, a frame rate of 12 FPS, a 256 × 156 thermal sensor and a 36-degree field of view; it works day and night. During each interview session, thermal measurements of the participants' faces were obtained by using the Android smartphone attached to the thermal camera. The dataset was obtained by using the following methods: 1. Surveillance of people and interviewing them. 2. Asking a subject to describe another person. 3. Different subjects may have different base body temperatures under usual circumstances. 4. These differences can affect and worsen the accuracy of lie detection. 5. To get over this issue, the initial few seconds of every recording were used as the baseline for each person. 6. In this case, the concerned person sat normally without getting into any pastime or answering. The dataset is based on two profile case studies and a particular mock crime. A total of ten participants were considered for the demo interview, with each interview further divided into four parts: true, direct lie, indirect lie and baseline. For testing purposes, we used a mock crime video.



Fig. 4 Overall structure of the dataset

The subject was given a character profile to learn for 10 min. Four sessions were conducted for each subject: a baseline session, a truth session, a direct lie session and an indirect lie session. Figure 4 gives the overall distribution of the dataset used in the project; it sums up the total number of videos to 41. Using the videos of various participants narrows the error zone, thereby widening the scope of the project. Table 1 gives the overall distribution of the questions. The baseline questions consist of general questions like what is your name, which city do you belong to, etc. The true session consisted of questions where the participant gave genuine answers. The direct lie session consisted of questions where the person lied directly, without hiding the truth or making up a story; this was achieved as the person was given the story plot before the session. The indirect lie session consisted of questions where the participants made up a story and lied.

5 Experimental Setup

• Session recording using the thermal Android app provided with the camera.
• The camera should be placed approximately 20 cm away from the subject's face.
• Sessions to be conducted in the evening so as to get sharp edges.

Table 1 Distribution of questions

Session     Number of questions
Baseline    7
True        50
Direct      25
Indirect    25
Total       107

6 Result Analysis

The trained model is able to distinguish between truth and lie correctly, whereas finer distinctions, such as between direct and indirect lies, are not detected. Tables 2 and 3 give the confusion matrices for classification into the two classes truth and lie and for classification into the four classes baseline, truth, direct lie and indirect lie, respectively. Table 2 shows that the overall accuracy of the model when classifying an image as truth or lie is 100%. Table 3 shows that the F1 scores for baseline, truth, direct lie and indirect lie are 54%, 46%, 67% and 77%, respectively, and the overall accuracy based on this is 60.53%.

Table 2 For classification as true or lie

                True      Lie       Overall classification   Precision
True            27,000    0         27,000                    100%
Lie             0         22,000    22,000                    100%
Overall truth   27,000    22,000    –                         –
Recall          100%      100%      –                         –

Table 3 For classification in four classes, i.e., baseline, true, direct lie and indirect lie

                Baseline   Truth     Direct lie   Indirect lie   Precision   F1 score
Baseline        8000       8000      0            0              50%         54%
Truth           5390       5610      0            0              51%         46%
Direct lie      0          0         6050         4950           55%         67%
Indirect lie    0          0         1000         10,000         90.90%      77%
Overall truth   13,390     13,610    7050         14,950         –           –
Recall          59.74%     41.22%    85.816%      66.89%         –           –







7 Conclusion

We proposed a solution to detect lies in an investigation interview with minimal physical intervention and good accuracy. Our solution works sufficiently well when classifying between lie and not lie, whereas a clear classification between direct and indirect lies is less accurate. Different cues, such as eye blinking or looking down, can be taken into consideration in order to improve the accuracy of the classification between direct and indirect lies. Acknowledgements We appreciate S. H. Bhandari, Minal Parchand, Pankhudi Bhonsle and Komal Kotyal for building the concrete foundation which helped us to accomplish this work, and also the Department of Computer Science and Engineering, WCE, Sangli for continuous support and valuable guidance.

References 1. Marzec, M., Koprowski, R., Wrobel, Z.: Method of face localization in thermograms. Biocybern. Biomed. Eng. (2014) 2. Rajoub, B.A., Zwiggelaar, R.: Thermal facial analysis for deception detection. IEEE Trans. Inf. Forensics Secur. 9(6), 1015–1023 (2014) 3. Bhowmik, M.K., Shil, S., Saha, P.: Feature points extraction of thermal face using harris interest point detection. In: International Conference on Computational Intelligence: Modeling Techniques and Applications (CIMTA) (2013) 4. Wu, Z., Peng, M., Chen, T.: Thermal face recognition using convolutional neural network. In: 2016 International Conference on Optoelectronics and Image Processing 5. Kyal, C.K., Poddar, H., Reza, M.: Detection of human face by thermal infrared camera using MPI model and feature extraction method. In: 2018 4th International Conference on Computing Communication and Automation (ICCCA) 6. Latif, M.H., Md. Yusof H., Sidek, S.N., Rusli, N.: texture descriptors based affective states recognition- frontal face thermal image. In: 2016 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES) 7. Abd Latif, M. H, Md. Yusof, H, Sidek, S.N, Rusli, N.: Implementation of GLCM features in thermal imaging for human affective state detection. In: 2015 IEEE International Symposium on Robotics and Intelligent Sensors 8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105, Curran Associates, Inc (2012) 9. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Weinberger densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition

Voting Classification Method with PCA and K-Means for Diabetic Prediction Anupama Yadav, Harsh K. Verma, and Lalit Kumar Awasthi

Abstract Data mining is a technology by which valuable information can be extracted from massive volumes of data; large patterns in big databases can be explored and analyzed using statistical and artificial intelligence techniques. The goal of this research work is to predict diabetes disease accurately with machine learning algorithms such as PCA, K-means, random forest, multilayer perceptron (MLP), and naive Bayes. The diabetes prediction model has various steps, including data preprocessing, feature extraction with the help of PCA, and classification with a voting classifier. The fundamental focus of this research is to improve the prediction accuracy; to this end, a voting classifier is introduced for diabetes prediction. Keywords Diabetes prediction · PCA · K-means · Voting · Logistic regression

1 Introduction

Diabetes is a frequent chronic malady that severely affects the health of a human being. An increase in the blood sugar level beyond the normal range is the main feature of this disease, and imperfect insulin secretion or impaired genetic effects are its main causes. In this disease, the human body either does not generate sufficient insulin or becomes inefficient in using the generated insulin in a proper way. If this disease is not treated in time, it can cause harm to a person's nerves, eyes, kidneys, and other organs. The first type generally affects youngsters below

1 Introduction Diabetes is a frequent chronic malady. This disease severely affects the health of a human being. The increase in blood sugar level from the normal range is the main feature of this disease. Imperfect insulin secretion or impaired genetic effects are the main causes of this disease. In this disease, human body either doesn’t generate sufficient insulin or becomes inefficient in the usage of generated insulin in proper way. If this disease is not treated in proper time, then it can cause harm to a person’s nerves, eyes, kidneys, and other organs. The first type generally affects youngsters below A. Yadav (B) · H. K. Verma · L. K. Awasthi Department of Computer Science and Engineering, Dr. B R Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India e-mail: [email protected] H. K. Verma e-mail: [email protected] L. K. Awasthi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_69




thirty years of age. Some medical indications of this disease are increased thirst, frequent urination, high blood sugar levels, etc. It is not possible to cure this disease with oral medicines alone; in many cases, insulin is given to the body through injection. The second type of this disease mainly occurs in middle-aged and old people, in whom it is mainly caused by obesity, high blood pressure, dyslipidemia, arteriosclerosis, and other maladies. Future trends can be predicted, and hidden patterns can be discovered, using data mining. There are various methods in data mining by which relevant information can be extracted, including classification, clustering, association rules, regression, outlier detection, etc. The technology of data mining is gaining a lot of popularity in the healthcare sector, and it is a leading tool set for clinical databases. Nowadays, the use of data mining algorithms for generating clinical predictions has become quite common. Over the past few years, many researchers have theorized that medically assistive support and prediction patterns can be acquired from the crucial data of a patient. Most of the research in the area of disease prediction analysis is focused on increasing the accuracy rate. The data should be in an understandable format for carrying out the analysis.

2 Literature Survey

A database of diabetes patients was employed to provide analysis of the diabetes malady. The use of KNN and Bayesian algorithms on the dataset of diabetes patients was suggested, and several diabetes features were extracted for the analysis of these algorithms to predict the disease [1]. A risk prediction model for type-2 diabetes was recommended, designed on the basis of an ensemble learning technique. The selection of optimal attributes was done by RF-WFS, and the XGBoost (extreme gradient boosting) classifier was used for selecting the best features. Several performance parameters were compared in this work to validate the efficiency of the recommended classifiers, which showed more accurate prediction results as compared to other existing classifiers [2]. A medical case was considered using electronic health records from different sources related to diabetes patients. The naive Bayes and SVM data mining classification algorithms were employed to implement the analysis, which focused on diabetes prediction from health records; the superior algorithm was identified by comparing the precision of both algorithms [3]. GMM, support vector machine, artificial neural network, ELM, and logistic regression were the data mining techniques applied to diagnose diabetes in its early phase; the experimental outcomes demonstrated that better accuracy was achieved with the ANN than with the other methods [4].



An experiment on the WEKA tool was conducted using four classifiers, namely SVM, random forest, simple CART, and naive Bayes, for diabetes prediction. The comparison of these classifiers was performed in terms of accuracy and the time for training and testing. The SVM classifier performed better than naive Bayes, RF, and simple CART for predicting diabetes, and the results obtained after testing demonstrated the efficiency of the suggested model [5]. Prediction models were built from diagnostic medical datasets for extracting knowledge, which proved efficient for diabetes prediction in patients. Diabetes mellitus was predicted using SVM, naive Bayes, KNN, and decision tree (C4.5) machine learning algorithms on data related to young people, and the greatest accuracy was obtained from the decision tree (C4.5) [6]. The prediction of diabetes was also carried out using ANN, K-means, and RF methods; the highest accuracy, evaluated at 75.7%, was achieved by the artificial neural network, which can help medical professionals make treatment decisions [7]. A new model based on data mining to diagnose and predict diabetes disease at an early stage was also given, and the algorithm could be utilized for different types of data. K-means is simple but very sensitive to the initial locations of the cluster centers, which determines the ultimate clustering result; an adequately clustered dataset was obtained from it for the logistic regression model. The main motive of that work was to improve the accuracy rate of k-means along with logistic regression. It was evaluated from the results that the accuracy of both algorithms was improved by principal component analysis. The recommended model included three algorithms: principal component analysis (PCA), k-means for clustering, and logistic regression for classification. The test outcomes showed that the PCA algorithm improved the k-means approach: the k-means algorithm showed a 25% improvement in accuracy rate, while logistic regression showed a 1.98% higher accuracy rate [8].

3 The Proposed Diabetes Prediction Model

Following are the various phases for the diabetic prediction (Fig. 1):

Fig. 1 Diabetes disease prediction model

1. Dataset input: The diabetes dataset obtained from the UCI database is used for this prediction. The dataset is comprised of 768 sample female patients from the Arizona, USA population who were examined for diabetes. The dataset has a total of 8 attributes, namely pregnancies (preg), glucose (plas), blood pressure (pres), skin thickness (skin), insulin, BMI, diabetes pedigree function and age, with one target class (0 or 1).
2. Attribute selection: In this phase, the technique of PCA is utilized to decrease the dimensionality of the data. PCA is applied to select the most relevant attributes from the large number of attributes. The selection of relevant attributes may lead to a reduction in execution time, as high-dimensional data is extremely complex to process due to inconsistencies in the features.
3. Clustering: In this phase, the k-means clustering algorithm is used for better classification. The process in which similar objects are grouped together is called clustering; the objects are grouped according to their characteristics. K-means clustering is one of the least complex algorithms, which uses an unsupervised learning method to resolve known clustering problems. The steps involved in k-means clustering are:
Step 1 Randomly set up k points called the group centroids.
Step 2 The elbow curve can be used to dictate the value of k (the number of clusters).
Step 3 Calculate the separation between the data points and the group centroids using the Euclidean distance formula.
Step 4 On the basis of minimum distance, data points are allocated to the nearest clusters.
Step 5 Calculate the mean value, including the new data point, for every cluster to find the new centroid.

Step 6 Repeat the previous two steps iteratively till the group centroids stop changing their positions.
4. Classification: In this stage, a voting classification algorithm is utilized for the diabetic forecast. Voting is one of the easiest methods of consolidating the predictions from numerous machine learning algorithms. This voting classifier is a combination of random forest, naive Bayes, and multilayer perceptron.
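A hedged scikit-learn sketch of the pipeline described above (PCA for feature reduction, k-means clustering, and a voting ensemble of random forest, naive Bayes and MLP); the number of principal components, the hyper-parameters, and the way the cluster labels are passed to the classifier are assumptions, since the text does not state them:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def run_pipeline(X, y, n_components=5):
    """PCA -> k-means -> voting classifier (RF + NB + MLP) for the Pima data.

    Appending the cluster label as an extra feature is one possible wiring;
    the paper does not spell out the exact combination."""
    X_red = PCA(n_components=n_components).fit_transform(X)   # feature reduction
    clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X_red)
    X_aug = np.column_stack([X_red, clusters])
    X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, test_size=0.3)
    vote = VotingClassifier(estimators=[("rf", RandomForestClassifier()),
                                        ("nb", GaussianNB()),
                                        ("mlp", MLPClassifier(max_iter=500))])
    vote.fit(X_tr, y_tr)
    return accuracy_score(y_te, vote.predict(X_te))
```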

4 Model Evaluation and Result Comparison from Previous Work

This work focuses on diabetic prediction. The data is taken from the UCI database. The dataset has 8 attributes and is of multivariate type for the prediction analysis. Different methods are implemented and compared in terms of certain parameters like accuracy, precision, and recall. In the proposed method, PCA, k-means and voting classification approaches are implemented for diabetic prediction. The voting classification method is a combination of multilayer perceptron (MLP), random forest, and naive Bayes classifiers. We have applied the following models on the dataset, whose results are given in Tables 1 and 2.

Table 1 Models performance parameters

Model                                                 Precision   Recall   Accuracy
PCA + logistic regression                             71          72       72.07
PCA + Naive Bayes                                     69          69       69.48
PCA + SVM                                             71          72       72.07
PCA + K-means + logistic regression                   97          97       97.40
Proposed method (PCA + K-means + voting classifier)   98          98       98.05

Table 2 Comparison from previous work

S. No.   Author           Model                                  Accuracy
1        Zhu et al. [8]   PCA + K-means + logistic regression    97.40
2        Our approach     PCA + K-means + voting classifier      98.05

5 Conclusion and Future Scope

In this paper, the various steps involved in diabetes prediction are described. The technique of PCA is used for feature reduction, and the k-means clustering algorithm is used to cluster alike and diverse types of data. Finally, the voting classification method is implemented for the diabetic and non-diabetic prediction. It is examined that the proposed method has higher precision, accuracy, and recall values as compared to the existing methods. The techniques proposed in previous research works use different sets of algorithms like k-means, SVM, logistic regression, and other machine learning algorithms for prediction, whereas the proposed approach is a combination of PCA, k-means, and voting classification. The proposed model gives an accuracy of about 98.05%, which is better than the accuracies previously achieved in the papers mentioned above. In future, the proposed method can be further extended by utilizing transfer learning techniques for diabetes forecasting.

References 1. Shetty, D., Rit, K., Shaikh, S., Patil, N.: Diabetes disease prediction using data mining. In: 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), pp. 1–5, Coimbatore (2017) 2. Xu, Z., Wang, Z.: A risk prediction model for type 2 diabetes based on weighted feature selection of random forest and XGBoost ensemble classifier. In: 2019 Eleventh International Conference on Advanced Computational Intelligence (ICACI), pp. 278–283, Guilin, China (2019) 3. Raj, R.S., Sanjay, D.S., Kusuma, M., Sampath, S: Comparison of support vector machine and Naïve Bayes classifiers for predicting diabetes. In: 2019 1st International Conference on Advanced Technologies in Intelligent Control, Environment, Computing and Communication Engineering (ICATIECE), pp. 41–45, Bangalore, India (2019) 4. Komi, M., Li, J., Zhai, Y., Zhang, X.: Application of data mining methods in diabetes prediction. In: 2017 2nd International Conference on Image, Vision and Computing (ICIVC), pp. 1006– 1010, Chengdu (2017) 5. Mir, A., Dhage, S.N.: Diabetes disease prediction using machine learning on big data of healthcare. In: 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), pp. 1–6, Pune, India (2018) 6. Faruque, M.F., Asaduzzaman, Sarker, I.H.: Performance analysis of machine learning techniques to predict diabetes Mellitus. In: 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), pp. 1–4, Cox’sBazar, Bangladesh (2019) 7. Alam, T.M., Iqbal, M.A., Ali, Y., Wahab, A., Abbas, Z.: A model for early prediction of diabetes. Inform. Med. Unlocked 16, 100204 (2019) 8. Zhu, C., UwaIdemudia, C., Feng, W.: Improved logistic regression model for diabetes prediction by integrating PCA and K-means techniques. Inform. Med. Unlocked 17 (2019)

Hybrid Model for Heart Disease Prediction Using Random Forest and Logistic Regression Hemant Kumar Sharma and Amrit Lal Sangal

Abstract Data mining is a method by which valuable information is mined from raw data, and in prediction analysis futuristic outcomes are forecasted using current information. It facilitates more useful, efficient, and economical management of health resources through the recognition of risks, the prediction of disease in people, and the prediction of the length of hospital stays. This research work deals with the prediction of heart disease. Several steps are included in heart disease prediction, among them preprocessing, feature selection and classification. A hybrid scheme based on Random Forest (RF) and logistic regression is introduced: the features are selected using RF, and Logistic Regression (LR) is implemented for classification. The performance of the recommended model is analysed in this research in terms of accuracy, precision, and recall. The accuracy obtained in predicting heart disease with this model is evaluated as 95.08%. Keywords Heart disease prediction · Naive Bayes · Random forest · Logistic regression

1 Introduction The use of data mining technology in healthcare sector has revolutionized the task of disease prediction. The role of this technology in heart disease prediction is quite significant. At present, a lot of data mining techniques are being used to detect and extract valuable information from the medical dataset with minimum user inputs and hard work. With the time, researchers have found several methods for implementing data mining in medical domain so that different types of heart diseases can H. K. Sharma (B) · A. L. Sangal Department of Computer Science and Engineering, Dr. B R Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India e-mail: [email protected] A. L. Sangal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_70




be predicted accurately. The performance of data mining differs from technique to technique being used and the features selected. In general, the clinical datasets in the medical domain are redundant and unpredictable. Therefore, there is the need of prior and appropriate preparations for implementing data mining approaches. Following are the various techniques commonly used in data mining: a. Association: One of the known data mining technique in which the relationship among particular items of similar transaction is used to discover a certain pattern which is known as association. For instance, the relationship of various attributes used for analysis in heart disease prediction is known through association technique. All the risk factors needed for disease prediction are used [1]. b. Classification: On the basis of machine learning, another classic data mining technique designed is classification. Every object in the dataset is categorized into one of the predefined set of classes through classification. The various mathematical techniques are used in this method. c. Clustering: The objects with similar property are clustered together to generate a meaningful cluster using an automatic approach known as clustering. The classes are defined by the clustering technique as well, and the objects are placed in them. Further, in the predefined classes, the classification objects are assigned. For instance, it is possible to cluster the list of patients with similar risk factors using clustering when predicting heart disease. Therefore, the patients with high blood sugar and relevant risk factors can be separated [2]. d. Prediction: The relation among independent variables and dependent variables are discovered by another data mining technique called prediction.

2 Literature Survey

There are several factors that are responsible for any sort of coronary illness. The Naive Bayesian (NB) algorithm was considered, which forms the Smart Heart Disease Prediction (SHDP) system; an accuracy of around 89% is shown by the proposed approach [3]. To resolve heart disease prediction related issues, ensemble techniques were used, and an accuracy of 85.48% was achieved by the proposed technique [4]. To improve the accuracy of predicting cardiovascular diseases, a hybrid of random forest with a linear model was used, and an improved accuracy level of around 88.7% was achieved in that research [5]. Another work focused on adapting the SVM and apriori algorithms to predict heart disease; medical profiles based on various factors were collected and used, and the patients more likely to get heart disease were predicted [6]. For the medical fraternity and patients, the usage of appropriate technology support proved to be highly beneficial, and data mining techniques could be used to resolve such issues; the accuracies of naive Bayes and decision tree were compared in that research [7].



To identify the risk in a highly accurate manner, a heart disease prediction system was proposed in which a new system was designed using data mining techniques. Frequent pattern growth association mining was applied on the dataset of patients to provide strong association rules; using this method, doctors can explore the data and predict heart disease accurately [8].

3 The Proposed Heart Disease Prediction Model

See Fig. 1.

3.1 Dataset Selection

The Cleveland dataset has been widely used for heart disease prediction. This dataset has 14 attributes.

Fig. 1 Heart disease model



3.2 Data Preprocessing

Data preprocessing is performed so that data mining techniques can be applied to complete data and a meaningful analysis can be achieved. The performance of the training model is improved by providing clean and noise-free data to the feature selection process.

3.3 Feature Selection

A subset of highly discriminative features is picked by feature selection to diagnose the disease; the discriminating features belonging to the available classes are selected by the feature selection process. In the proposed method, the RF model is used for feature selection. The RF model takes 100 as the estimator value and generates a tree structure of the most relevant features; it selects the features which are most relevant or important for heart disease prediction. After that, the selected features are given to k-means to do the clustering. Two clusters are formed, as our target variable has two classes, yes or no.

3.4 Classification

To categorize the given features for performing disease prediction, the selected features are mapped to the training model, where each separate class represents a kind of heart disease. The logistic regression model is applied for the classification and takes the k-means output as input. In this research work, two classes are defined, heart disease and no heart disease; that is, which persons have a probability of heart disease and which do not.
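A hedged scikit-learn sketch of Sects. 3.3-3.4 (random forest with 100 estimators for feature selection, k-means with two clusters, logistic regression for the final classification); how the cluster labels are passed to the classifier is not fully specified in the text, so appending them as an extra feature is an assumption:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def predict_heart_disease(X, y):
    # 60:40 train/test split, as used in Sect. 4
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6)

    # Random forest (100 trees) ranks the features; keep the most important ones
    selector = SelectFromModel(RandomForestClassifier(n_estimators=100)).fit(X_tr, y_tr)
    X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

    # Two k-means clusters (the target has two classes), appended as a feature
    km = KMeans(n_clusters=2, n_init=10).fit(X_tr_sel)
    X_tr_aug = np.column_stack([X_tr_sel, km.predict(X_tr_sel)])
    X_te_aug = np.column_stack([X_te_sel, km.predict(X_te_sel)])

    clf = LogisticRegression(max_iter=1000).fit(X_tr_aug, y_tr)
    return accuracy_score(y_te, clf.predict(X_te_aug))
```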

4 Model Evaluation and Result Comparison from Previous Work

A variety of models, such as decision tree, Naive Bayes (NB), Multilayer Perceptron (MLP), and an ensemble of Random Forest (RF), NB, and MLP, are applied on the dataset. The results of the above models are compared in terms of accuracy, precision, and recall. It is analyzed that the accuracy of the proposed model is 95.08%, which is the highest among the compared models for heart disease prediction. The dataset is divided in a ratio of 60:40; 60% is used for training and the remaining 40% for testing (Fig. 2; Tables 1 and 2).



Fig. 2 Accuracy comparison of models

Table 1 Parameters based on performance of models

Model                      Precision   Recall   Accuracy
Decision tree              75          75       75.41
Naive Bayes                84          84       83.61
Multilayer perceptron      85          84       83.61
(NB + RF + MLP)            86          85       85.25
Proposed method            95          95       95.08

Table 2 Comparison from previous work

S. No.   Author                              Model                                            Accuracy
1        Anjan Nikhil Repaka et al.          Naive Bayes [3]                                  89.77
2        C. Beulah Christalin Latha et al.   Ensemble classification [4]                      85.48
3        Our approach                        Proposed random forest and logistic regression   95.08

5 Conclusion and Future Scope

Heart disease is a term that covers any disorder related to the heart; problems involving the blood vessels, the circulatory system, and the heart are defined as cardiovascular disease. It is observed in this work that heart disease prediction is very challenging because of the large number of features involved. Various models are tested for heart disease prediction, such as decision tree, naive Bayes, multilayer perceptron, and an ensemble classifier. A novel model in which random forest and logistic regression are integrated is introduced for the prediction: feature selection is carried out using random forest, and logistic regression performs the classification.



The recall, accuracy, and precision obtained from the proposed model are computed as about 95%. In future, the proposed model can be further improved using deep learning methods.

References 1. Duff, F.L., Muntean, C., Cuggia, M., Mabo, P.: Predicting survival causes after out of hospital cardiac arrest using data mining method. In: Medinfo, pp. 1256–1259 (2004) 2. Frawley, W.J., Piatetsky-Shapiro, G., Matheus, C.J.: Knowledge discovery in databases: an overview. AI Mag. 13(3), 57–57 (1992) 3. Repaka, A.K., Ravikanti, S.D., Franklin, R.G.: Design and Implementing Heart Disease Prediction Using Naives Bayesian. IEEE (2019) 4. Latha, C.B.C., Carolin Jeeva, S.: Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Inform. Med. Unlocked 16, 100203 (2019) 5. Mohan, S., Thirumalai, C., Srivastava, G.: Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7 6. Sowmiya, C., Sumitra, P.: Analytical Study of Heart Disease Diagnosis Using Classification Techniques. IEEE (2017) 7. Priyanka, N., Kumar, P.R.: Usage of data mining techniques in predicting the heart diseases— Naïve Bayes and decision tree. In: 2017 International Conference on Circuit ,Power and Computing Technologies (ICCPCT), pp. 1–7, Kollam (2017) 8. Chauhan, A., Jain, A., Sharma, P., Deep, V.: Heart disease prediction using evolutionary rule learning. In: 2018 4th International Conference on Computational Intelligence and Communication Technology (CICT), pp. 1–4 (2018)

Detection of Android Malware Using Machine Learning Techniques Sonal Pandey, C. Rama Krishna, Ashu Sharma, and Sanjay Sharma

Abstract With the increase in the popularity of the internet and the Android operating system, the number of active internet users and their daily activity on Android devices is also increasing, which is why malware writers are increasingly targeting Android devices. Rapidly evolving malware is a major issue, and there is a need for Android malware detection to secure the system. Signature-based technologies work efficiently for known malware but fail to detect unknown or new malware. Academia is continuously working on machine learning and deep learning techniques to detect advanced malware in today's scenario. For machine learning, the feature vector and a sufficient dataset are very important. In this paper, we develop and implement an approach for the detection of unknown malware with a high detection rate. Keywords Malware · Metamorphic malware · Android · Machine learning

1 Introduction

In recent years, Android has overtaken many other mobile operating systems to become one of the most popular and versatile mobile platforms in the world. International Data Corporation (IDC) shared a report on the global market share [1] for smartphone operating systems; this report shows that in the third quarter of 2018, 86.8% of the total market was held by the Android operating system.

1 Introduction In recent years, android has overtaken many other mobile operating systems to become one of the most popular and versatile mobile platforms in the world. International Data Corporation (IDC) shared a report on global market share [1] for the S. Pandey (B) · C. Rama Krishna NITTTR Chandigarh, Chandigarh, India e-mail: [email protected] C. Rama Krishna e-mail: [email protected] A. Sharma Mindtree Hyderabad, Hyderabad, India e-mail: [email protected] S. Sharma C3i, IIT Kanpur, Kanpur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_71




Developers prefer Android over other smartphone operating systems for developing applications because it is completely open source. On the other hand, users tend to opt for Android smartphones due to the availability of low- to high-end models, ease of use, customization, a high level of multitasking, custom ROMs, support for a large number of applications, etc.

1.1 Android Background

Android is an operating system developed by Google; it is an open-source mobile operating system based on the Linux kernel and has been released under the Apache v2 open-source license. The following sections provide an overview of the Android system architecture, benign and malware applications, Android tools and techniques to distinguish them, and previous research work done in this field.

1.2 Malware and Their Types

A software program which is intended to attack a system without the user's permission and to perform unauthorized actions is recognized as malicious software [2]. As new variants of malware are introduced every day by malware developers, malware detection is becoming a difficult task. Malicious software is a term used for computer viruses, spyware, trojans, worms, and so on. In today's digital world, they are big threats because highly skilled hackers are increasingly creating customized malware to disrupt industries and conduct military espionage [3].

1.3 Malware Analysis Approaches

The detection process is carried out by signature-based, heuristic, normalization, and machine learning techniques [4], which in turn rely on static, dynamic, and hybrid analysis. The analysis approach describes how a detection technique gathers the information that is further used for detecting malicious software.
Static Analysis: Static analysis is employed to extract features from code, gathered by disassembling the program with the use of a disassembler tool, which are further used to distinguish between malware and benign applications [5].
Dynamic Analysis: Dynamic analysis, also referred to as behavioral analysis, involves executing the malicious program and monitoring its behavior, system interaction, and the effect on the host machine [6].



Hybrid Analysis: It includes both static and dynamic approaches for malware analysis. It first inspects the malware code by static examination, followed by a dynamic analysis approach, to achieve a more complete examination [7].

1.4 Malware Detection Techniques

The purpose of detection methods is to study a program's behavior and verify whether it is malicious or benign. Robust malware detection relies upon the capability of handling obfuscated malware efficiently [2]. Two commonly used obfuscation techniques in the generation of second-generation malware are polymorphism and metamorphism. To battle threats and attacks from malware, anti-malware software is created, which primarily relies on the presumption that the structure of malware does not change considerably. The following are the techniques for malware detection:
1. Signature-Based: The signature-based technique is an easy and effective way of detecting known malware [8]. To combat threats and attacks from malware, antivirus companies use signature-based techniques: unique byte sequences are extracted when the malware is classified, and these are used as a signature.
2. Heuristic-Based: In the heuristic-based detection technique, artificial intelligence is used together with signature-based detection to enhance efficiency [9].
3. Malware Normalization: In malware normalization, a normalizer takes the obfuscated form of the malware, removes the obfuscation applied to the program, and creates a normalized executable.
4. Machine Learning: In the last couple of years, machine learning techniques have been gaining popularity for malware detection. Tom Mitchell describes machine learning as the study of computer algorithms that improve through experience [10].

2 Background Study

Carrying out a literature review is extremely important in any research project, as it establishes the need for the work. For assessment of the issue and the proposed solution, various research papers and related books from international conferences, journals, and symposia were examined. Schultz et al. [11] introduced machine learning for malware detection; the authors utilized portable executable (PE) features, byte n-grams, and strings for feature extraction [11]. The classifiers used by the authors were Ripper, Naive Bayes, and multi-Naive Bayes for training and testing. Thus, since 2001, machine learning has played a crucial role in unknown malware detection.

666

S. Pandey et al.

Allix et al. [12] introduced a novel approach in 2014 to extract the control flow graph from the application program, which is a more expressive representation than n-grams. The authors used a sizeable dataset (over 50,000) of Android applications and evaluated machine learning classifiers, viz. random forest, J48, LibSVM, and JRip, using ten-fold cross-validation. Feizollah et al. [13] showed the effectiveness of explicit and implicit Intents for Android malware detection. The evaluation was done on 5560 malware samples and 1846 benign samples. They achieved 91% accuracy by utilizing Android Intents, 83% using Android permissions, and, by combining both, a detection rate of 95.5% [13]. Sun et al. [14] introduced SigPID, which is based on permission analysis to detect malicious applications. The authors extracted 135 permissions from the dataset but used only 34 permissions (25% of the total permissions) to differentiate between malicious and benign applications [6]. They used a support vector machine (SVM) for model training and report an accuracy of 93.62% within the dataset and 91.4% for unknown malware [14]. Tao et al. [15] studied hidden patterns of malware in real-world Android applications. The authors extracted sensitive APIs that are utilized in malware and implemented an automatic malware recognition system to detect unknown Android malware [15]. They conducted a comprehensive study using 31,185 benign and 15,336 malware samples and obtained an F1 score of 98.24. Rashidi et al. [9] introduced an Android resource-usage risk assessment framework called XDroid. They utilized the Drebin malware dataset and confirmed that their approach could achieve up to 82% precision [9]. Zhu et al. [16] proposed a highly efficient and low-cost approach that extracts permissions and sensitive APIs as features and uses an ensemble rotation forest for model training. The authors used 2130 samples to train the model and obtained 88.26% detection accuracy with 88.40% sensitivity at a precision of 88.16%. Opcodes play an important role in malware detection, as Sanjay et al. [17] used opcode frequency for malware detection in their approach. They used the Fisher score feature selection algorithm for relevant feature selection, applied several classifiers available in the Weka machine learning tool, and obtained almost 100% detection accuracy. Recently, Ashu et al. [17] examined five classifiers on the Drebin dataset using opcode occurrence as a feature and obtained an accuracy of 79.27% with the functional tree classifier for malicious application detection. Sahin et al. [18] proposed a permission-based Android malware framework to recognize malicious applications. In contrast to other studies, the authors proposed a permission-weight approach, after which K-nearest neighbor (KNN) and Naive Bayes (NB) algorithms were used, achieving 90.76% accuracy. According to the authors, the proposed approach gives better results than the others.



3 Proposed Methodology

In this section, we explain our methodology for the detection of malicious Android applications utilizing machine learning. Figure 1 displays the structure of our proposed method, in which we perform the following steps:

• Dataset collection
• Feature extraction
• Feature selection
• Classification of malware and benign

3.1 Dataset Collection We collect malicious and benign APKs of Android applications from AndroZoo [7], which is a growing repository of Android applications. AndroZoo contains applications collected from various sources, including the Google Play Store marketplace. The dataset we use for the analysis contains 15,000 malware and 15,000 benign Android application package (APK) files. We also check the secure hash algorithm (SHA) value of the applications to ensure that each sample is unique.

Fig. 1 Flow chart of our approach
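As a concrete illustration of the SHA-based uniqueness check, the short Python sketch below deduplicates a folder of collected APKs by their SHA-256 digests. The directory layout and helper names are hypothetical; the paper does not specify its exact tooling.

```python
# Illustrative sketch (not the paper's code): keep only one APK per SHA-256 digest
# so that every sample in the malware/benign collections is unique.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def unique_apks(apk_dir: str) -> dict:
    """Map each distinct SHA-256 value to one representative APK path."""
    seen = {}
    for apk in Path(apk_dir).glob("*.apk"):
        seen.setdefault(sha256_of(apk), apk)   # keep the first file per digest
    return seen

# Example usage (hypothetical directories):
# malware = unique_apks("dataset/malware")
# benign  = unique_apks("dataset/benign")
# print(len(malware), len(benign))
```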


3.2 Feature Extraction In this phase, we extract features from the dataset that we have collected. In this work, we perform static analysis of Android applications to identify malicious applications. For feature extraction, we use the Apktool and Androguard [19] reverse engineering tools. We extract seven categories of features, namely requested Android Open Source Project (AOSP) permissions, requested third-party permissions, providers, activities, receivers, services, and opcodes. All categories of features are extracted based on their occurrence in the application. During the initial stage of feature extraction, we obtain a huge number of features in each category. We therefore first filter the features with the highest frequency of occurrence in the dataset. Figures 2, 3, 4, 5, 6, 7 and 8 show the top 20 most frequent features in our dataset for each category. The information extracted as features is as follows (an illustrative extraction sketch is given after the feature list below): 1. Permissions: Permissions are used to protect the privacy of an Android user, and a few applications also need permissions to access users' sensitive data such as short message service (SMS), contacts, etc. Some applications also request third-party permissions which are not listed in the Android Open Source Project [20]. Combinations of permissions sometimes reflect malicious behavior. Therefore, we extract two types of permissions as features: AOSP and third-party permissions. Figure 2 shows a comparison of the top 20 Android Open Source Project permissions in malware and benign samples. Figure 3 shows a comparison of the top 20 third-party permissions in malware and benign samples.

Fig. 2 Comparison graph of top 20 AOSP permission in malware and benign


Fig. 3 Comparison graph of top 20 third-party permission in malware and benign

Fig. 4 Comparison graph of top 20 activity in malware and benign

2. Activity: An activity is a crucial component of an Android application, and the manner in which activities are launched and put together is an essential part of the platform's application model. Figure 4 shows a comparison of the top 20 activities in malware and benign samples. 3. Opcodes: Opcodes play an essential role in the execution of the application [5]. During the literature review, we found that, in static analysis, operational codes are the basic building blocks of application execution. Figure 5 represents a comparison of the top 20 opcodes present in malware and benign samples.


Fig. 5 Comparison graph of top 20 opcodes in malware and benign

Fig. 6 Comparison graph of top 20 service in malware and benign


Fig. 7 Comparison graph of top 20 provider in malware and benign

Fig. 8 Comparison graph of top 20 receivers in malware and benign

4. Service: A service is an application component that can execute long-running operations in the background, and it does not provide a user interface [1]. Figure 6 represents a comparison of the top 20 services present in malware and benign samples. 5. Provider: A provider manages access to a central repository of data and is part of an Android application, which often provides its own user interface (UI) for working with the data. Figure 7 represents a comparison of the top 20 provider features present in malware and benign samples.


6. Receiver: Receivers respond to broadcast messages from other applications or from the system itself. These messages are sometimes called events or intents. Figure 8 represents a comparison of the top 20 receiver features present in malware and benign samples.
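As mentioned in Sect. 3.2, a minimal sketch of this kind of static extraction is given here, assuming Androguard's AnalyzeAPK interface; the helper function, its return format, and the opcode-counting loop are illustrative rather than the paper's actual scripts.

```python
# Sketch of static feature extraction with Androguard (assumed interface):
# requested permissions, activities, services, receivers, providers, and
# Dalvik opcode occurrence counts for one APK.
from androguard.misc import AnalyzeAPK

def extract_features(apk_path: str) -> dict:
    a, d, dx = AnalyzeAPK(apk_path)          # APK object, DEX objects, analysis object
    features = {
        "permissions": a.get_permissions(),  # AOSP + third-party requested permissions
        "activities": a.get_activities(),
        "services": a.get_services(),
        "receivers": a.get_receivers(),
        "providers": a.get_providers(),
    }
    # Occurrence counts of Dalvik opcodes across all methods
    opcode_counts = {}
    for dex in d:
        for method in dex.get_methods():
            for instruction in method.get_instructions():
                name = instruction.get_name()
                opcode_counts[name] = opcode_counts.get(name, 0) + 1
    features["opcodes"] = opcode_counts
    return features
```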

3.3 Feature Selection During the feature extraction phase, we extract a total of 696 features. However, the efficiency of a machine learning model decreases with a large number of features, which also increases the time needed to train and test the model. We apply information gain and the correlation coefficient as feature selection algorithms to remove irrelevant features. During the literature review, we found that researchers widely use these feature reduction algorithms [21, 22]. During the feature selection process, we remove those features which either do not contribute to the model's performance or degrade it. In the case of correlation coefficient feature selection, we select 180 features whose ranking score is greater than 0.1189, and in the case of information gain, 106 features are selected whose score is greater than 0.103665.
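A minimal scikit-learn sketch of the two filters, assuming a numeric feature matrix X and binary labels y, is shown below; mutual_info_classif is used here as a stand-in estimate of information gain, and the thresholds are the ones quoted above.

```python
# Sketch of the two feature-ranking filters described above.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_features(X: np.ndarray, y: np.ndarray,
                    ig_threshold: float = 0.103665,
                    corr_threshold: float = 0.1189):
    # Information-gain style ranking
    ig_scores = mutual_info_classif(X, y, random_state=0)
    ig_mask = ig_scores > ig_threshold

    # Absolute Pearson correlation of each feature with the class label
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    corr_mask = corr > corr_threshold

    return ig_mask, corr_mask

# ig_mask, corr_mask = select_features(X, y)
# X_ig  = X[:, ig_mask]    # e.g. 106 features retained in the paper
# X_cor = X[:, corr_mask]  # e.g. 180 features retained in the paper
```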

3.4 Classification This section discusses the machine learning classifiers that are utilized in our work. For classification, we use four different supervised machine learning classifiers, namely random forest [4], eXtreme Gradient Boosting (XGBoost) [23], decision tree [24], and k-nearest neighbors (KNN) [25]. These ML classifiers are widely used in the malware detection domain, and one reason to select tree-based classifiers is that they are very robust [20, 26–29]. They perform well on a large variety of problems and also capture dependencies in ways that linear models cannot. We split the dataset in a 70–30% ratio for training and testing the models, respectively. We also perform parameter tuning during training and testing to obtain higher classifier performance. To analyze the performance of our models, we run experiments using ten-fold cross-validation.
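A possible realization of this setup with scikit-learn and the separately installed XGBoost package is sketched below; the specific hyperparameters are illustrative defaults, not the tuned values used in the paper.

```python
# Sketch of the classification setup described above: a 70/30 train-test split,
# the four classifiers, and ten-fold cross-validation for accuracy.
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

classifiers = {
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

def evaluate(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=0, stratify=y)
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X_train, y_train, cv=10, scoring="accuracy")
        clf.fit(X_train, y_train)
        print(f"{name}: 10-fold CV accuracy = {scores.mean():.4f}, "
              f"hold-out accuracy = {clf.score(X_test, y_test):.4f}")
```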

4 Experimental Results Firstly, we applied ten-fold cross-validation on the dataset with a single feature vector, i.e., (opcodes, receivers, providers, etc.), and after that on the combined features. Finally, we applied ten-fold cross-validation on the dataset with the features selected using information gain and correlation. Table 1 summarizes the accuracy of each classifier with a single feature vector, without feature selection, and finally with the selected features. In the case of random forest with the information gain feature selection algorithm on the combined features, we obtained the maximum accuracy, i.e., 99.19%.


Table 1 Experimental results

Features/classifiers                                  Random forest   XGBoost    Decision tree   KNN
Activity                                              0.759036        0.747992   0.757363        0.731928
AOSP                                                  0.911312        0.878179   0.899933        0.873159
Opcodes                                               0.953846        0.920013   0.92838         0.929384
Third-party permissions                               0.723896        0.72423    0.725904        0.691767
Providers                                             0.60107         0.601405   0.60107         0.567603
Receivers                                             0.716867        0.705823   0.716867        0.669344
Service                                               0.727242        0.722557   0.727242        0.683735
Combined all 7 features without feature selection     0.978246        0.985609   0.977912        0.9334
Combined all 7 features with information gain         0.991968        0.988956   0.98929         0.930723
Combined all 7 features with correlation              0.990629        0.985609   0.985609        0.925033

5 Conclusion and Future Scope We presented a machine learning approach based on seven types of features with two feature selection algorithms, i.e., information gain and the correlation coefficient, to detect and analyze malicious Android apps. Specifically, we used seven types of features, viz. requested AOSP permissions, requested third-party permissions, providers, activities, receivers, services, and opcodes, with the two feature selection algorithms. In our approach, we used four classifiers for classification and obtained 99.1968% accuracy with random forest. The proposed approach can be extended by analyzing apps dynamically. As future work, another research direction is to combine static and dynamic analysis, in which different machine learning classifiers are used to analyze both the source code and the dynamic features of applications in a run-time environment.

References 1. International Data Corporation: Smartphone Market Share. https://www.idc.com/promo/smartphone-market-share/os. (Nov, 2019) 2. Sharma, A., Sahay, S.K.: Evolution and Detection of Polymorphic and Metamorphic Malwares: A Survey. arXiv:1406.7061 (2014) 3. Stone, R.: A Call to Cyber Arms (2013)


4. Dogru, N., Subasi, A.: Traffic accident detection using random forest classifier. In: 2018 15th Learning and Technology Conference (L&T), pp. 40–45. IEEE (2018) 5. Sharma, S., Krishna, C.R., Sahay, S.K.: Detection of advanced malware by machine learning techniques. In: Soft Computing: Theories and Applications, pp. 333–342. Springer (2019) 6. Shabtai, A., Moskovitch, R., Elovici, Y., Glezer, C.: Detection of malicious code by applying machine learning classifiers on static features: a state-of-the-art survey. Inf. Secur. Tech. Rep. 14(1), 16–29 (2009) 7. Allix, K., Bissyand´e, T.F., Klein, J., Le Traon, Y.: Androzoo: Collecting millions of android apps for the research community. In: Proceedings of the 13th International Conference on Mining Software Repositories, pp. 468–471. MSR ’16, ACM, New York, NY, USA (2016). https://doi.org/https://doi.org/10.1145/2901739.2903508 8. Griffin, K., Schneider, S., Hu, X., Chiueh, T.C.: Automatic generation of string signatures for malware detection. In: International Workshop on Recent Advances in Intrusion Detection, pp. 101–120. Springer (2009) 9. Rashidi, B., Fung, C., Bertino, E.: Android resource usage risk assessment using hidden markov model and online learning. Comput. Secur. 65, 90–107 (2017) 10. Dietterich, T.G.: Machine learning in ecosystem informatics and sustainability. In: Twenty-First International Joint Conference on Artificial Intelligence (2009) 11. Schultz, M.G., Eskin, E., Zadok, F., Stolfo, S.J.: Data mining methods for detection of new malicious executables. In: Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001, pp. 38–49. IEEE (2000) 12. Allix, K., Bissyand´e, T.F., J´erome, Q., Klein, J., State, R., Le Traon, Y.: Largescale machine learning-based malware detection: confronting the” 10-fold cross validation” scheme with reality. In: Proceedings of the 4th ACM Conference on Data and Application Security and Privacy, pp. 163–166 (2014) 13. Narudin, F.A., Feizollah, A., Anuar, N.B., Gani, A.: Evaluation of machine learning classifiers for mobile malware detection. Soft. Comput. 20(1), 343–357 (2016) 14. Li, J., Sun, L., Yan, Q., Li, Z., Srisa-an, W., Ye, H.: Significant permission identification for machine-learning-based android malware detection. IEEE Trans. Industr. Inf. 14(7), 3216–3225 (2018) 15. Tao, G., Zheng, Z., Guo, Z., Lyu, M.R.: Malpat: mining patterns of malicious and benign android apps via permission-related Apis. IEEE Trans. Reliab. 67(1), 355–369 (2017) 16. Zhu, H.J., You, Z.H., Zhu, Z.X., Shi, W.L., Chen, X., Cheng, L.: Droiddet: effective and robust detection of android malware using static analysis along with rotation forest model. Neurocomputing 272, 638–646 (2018) 17. Sharma, A., Sahay, S.K.: An investigation of the classifiers to detect android malicious apps. In: Information and Communication Technology, pp. 207–217. Springer (2018) 18. Sahın, ¸ D.Ö., Kural,O.E., Akleylek, S., Kili¸c, E.: New results on permission based static analysis for android malware. In: 2018 6th International Symposium on Digital Forensic and Security (ISDFS), pp. 1–4. IEEE (2018) 19. Androguard. https://androguard.readthedocs.io/en/latest/ (Dec, 2019) 20. Milosevic, N., Dehghantanha, A., Choo, K.K.R.: Machine learning aided android malware classification. Comput. Electr. Eng. 61, 266–274 (2017) 21. Jimenez, J.H., Goseva-Popstojanova, K.: Malware detection using power consumption and network traffic data. In: 2019 2nd International Conference on Data Intelligence and Security (ICDIS), pp. 53–59. IEEE (2019) 22. 
Zhang, Z., Chang, C., Han, P., Zhang, H.: Packed malware variants detection using deep belief networks. MATEC Web Conf. 309, 02002 (2020) 23. Zhang, Y., Huang, Q., Ma, X., Yang, Z., Jiang, J.: Using multi-features and ensemble learning method for imbalanced malware classification. In: 2016 IEEE Trustcom/BigDataSE/ISPA, pp. 965–973. IEEE (2016) 24. Gunnarsdottir, K.M., Gamaldo, C.E., Salas, R.M., Ewen, J.B., Allen, R.P., Sarma, S.V.: A novel sleep stage scoring system: combining expert-based rules with a decision tree classifier. In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 3240–3243. IEEE (2018)


25. Cunningham, P., Delany, S.J.: k-nearest neighbour classifiers. Multiple Classifier Syst. 34(8), 1–17 (2007) 26. Alam, M.S., Vuong, S.T.: Random forest classification for detecting android malware. In: 2013 IEEE International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing, pp. 663–669. IEEE (2013) 27. Firdausi, I., Erwin, A., Nugroho, A.S., et al.: Analysis of machine learning techniques used in behavior-based malware detection. In: 2010 Second International Conference on Advances in Computing, Control, and Telecommunication Technologies, pp. 201–203. IEEE (2010) 28. Kruczkowski, M., Niewiadomska-Szynkiewicz, E.: Comparative study of supervised learning methods for malware analysis. J. Telecommun. Inf. Technol. (2014) 29. Wang, J., Li, B., Zeng, Y.: Xgboost-based android malware detection. In: 2017 13th International Conference on Computational Intelligence and Security (CIS), pp. 268–272. IEEE (2017)

The Predictive Genetic Algorithm (GA) Load Management Mechanism for Artificial Intelligence System Implementation (AI) T. Pushpatha and S. Nagaprasad

Abstract The next generation of cloud infrastructure will allow the network to use its resources more flexibly and effectively. Load balancing is one of the key issues of cloud computing; it distributes tasks over many nodes to ensure that no single node becomes overwhelmed or underused. For applications that depend on the cloud almost every day, it must be guaranteed that all criteria are fulfilled in a limited span of time for optimal performance. A genetic algorithm (GA) approach to cloud load balancing is given in this article. The urgency of a request is considered during population initialization, and the emphasis is on modeling the system in question. Real-life systems have additional objectives that can be combined with our algorithm. The suggested method is modeled using CloudAnalyst, which makes a simulation of the cloud infrastructure feasible. The results reveal the viability of a quantitative workload management approach that helps manage workloads with an improved use of computational capacity. This article offers a new approach to genetic algorithm (GA) load control. To reduce the load on any single node, the algorithm manages the cloud workload. The proposed load balancing strategy was evaluated with the CloudAnalyst model. The simulation findings for a typical sample system show that the suggested algorithm exceeded existing methods such as FCFS, round robin (RR), and the stochastic hill climbing (SHC) local search algorithm. Keywords Cloud computing · Load balancing · OLB · GA

T. Pushpatha (B) · S. Nagaprasad Department of M.C.A., St.Ann’s College, Mehdipatnam, Hyderabad, Telanagana, India e-mail: [email protected] Faculty of CS and CA, Tara Govt. College (A), Sangareddy, Telangana, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_72



1 Introduction Load balancing is one of the key issues of virtualization and cloud management. There are large-scale load balancing projects, but in cloud computing it remains a significant topic, and numerous research programs are under way [1], because the architecture of the cloud is generic and the problem is common. Traditional load-balancing algorithms were designed for homogeneous and dedicated resources and therefore cannot operate successfully in cloud computing [2]. The heterogeneity, complexity, and flexibility of cloud technology mean that conventional load balancing algorithms cannot be applied directly to cloud infrastructure. Cloud computing is a rapidly growing network system that delivers services to its consumers with the aid of remote resources, including storage, computing facilities, software development platforms, and tool testing environments [1, 3]. This allocation of resources is made by service providers, for example under the 'Software as a Service' (SaaS) and 'Infrastructure as a Service' (IaaS) models, respectively [2]. Cloud computing is a demand-based, pay-as-you-go (PAYG) [4] service network. Adobe, Microsoft, Twitter, SAP, Oracle, VMware, IBM, and other big players, primarily IT firms, are among the key players in this development [1, 3]. The cloud platform is usually described under two different headings. The first is defined by the way the cloud provider operates; this is why the three main forms SaaS, PaaS, and IaaS are widely used [5, 6]. The other concerns the size, connectivity, management, sophistication, and accessibility of the cloud model: the NIST summary offers four deployment models, namely private, public, community, and hybrid [7]. Load balancing relates to the ways in which operations are spread fairly across the data center infrastructure to improve cloud computing performance. The primary goal of load balancing focuses on either the client or the service provider: the user wants to minimize the execution time of its own tasks regardless of any other network activity, while the service provider's objective is to improve the turnaround time and allocate the available resources effectively. The problem is divided into four steps which reflect a load-handling solution. (1) Load calculation: a load estimate is first necessary to assess the load imbalance; workload calculation involves different activities to evaluate the balance of operations. (2) Load start-up: performed if an imbalance occurs once the loads have been specified for all VMs. Load imbalance is an unforeseen phenomenon on the part of the CSP which undermines the capability and trust of system resources along with the guaranteed quality of service (QoS) under the agreed service level agreement (SLA). Load balancing (LB) is essential under these circumstances, and this subject is of particular interest to researchers. In cloud computing, load balancing may be accomplished at the physical machine or VM level [1].


2 Literature Survey Cloud computing is recognized as one of the newest computing technologies and is designed not only for universities but also for companies. As an end-user platform, the cloud includes virtualized, streamlined, and substantial resources. Indeed, it has a major advantage in offering computation entirely as a business service. There are thousands of machines, and across the entire cloud it is not feasible to allocate resources manually; therefore, we focus on the principle of virtualization. Innovative equipment maintenance, certified applications, and staff training options are offered by the cloud infrastructure. Cloud computing is entirely Internet-based, and millions of machines are linked to the web. Online computing provides servers, bandwidth, applications, networking, and more, and cloud platforms are versatile for users in the context of virtualization. Figure 2.1 illustrates the concept of cloud computing design. The concept behind cloud computing is virtualization, which aggregates vast computing resources in order to maximize their utilization. Four layers of cloud computing were proposed by Foster et al. (2008). The fabric layer includes computer, device, and network resources. The resource layer includes the hardware image of the virtualization methodology. The platform layer provides the middleware framework for end users, and the application layer contains the web client. One of the big problems of virtualization is load balancing. Key load balancing studies remain important topics for cloud infrastructure, and various research activities are ongoing because of the generic cloud infrastructure and the distinctive nature of the problem. Classic load balancing algorithms can only be used with standard, dedicated services, so they cannot serve the cloud infrastructure properly. Efficiency and scalability are facets of the cloud architecture that conventional load balancers cannot handle directly in cloud computing. M. Randles et al. have investigated a decentralized honeybee-inspired strategy of load balancing, which is a nature-inspired solution. It manages loads through local server activity. The performance of the system is enhanced by an expanded set of features, but it does not improve as the system size grows. It is ideally tailored to situations in which a specific community of service users is needed. Z. Samson et al. introduced, for a transparent, distributed computing system, a load balancing solution that combines ant-colony and complex network theories. Such an approach reduces complexity, can be adapted to different environments, provides good fault tolerance, and is highly adaptable, thereby improving system efficiency. The system uses small-world properties without requiring a complex load balancer.


3 Proposed System and Methodology For VMs in cloud storage, a common load balancing technique was adopted. This requires regional policy experience to render load management decisions. Normal performance is improved similarly by load balancing, and fault reduction is not taken into account. The card approach, which involves load balancing and a distributed rate control system, has been introduced by Hamilton et al. and acts as an integrated tool for cloud management. Brazilian et al. consider the regression model

$Y_i = f(X_i, \beta) + e_i$  (1)

$f(X_i, \beta) = \beta_0 + \beta_1 X_i$  (2)

For data centers with automated cloud virtualization and computing, vector dot methodology has been implemented. The dot component is used to distinguish nodes by utility requirements. The algorithm in the illustration aims to resolve the problem of load balance for capital delivery.

$Y_i = \beta_0 + \beta_1 X_i + e_i$  (3)

$\sum_i e_i^2 = \sum_i \left(Y_i - f(X_i, \beta)\right)^2$  (4)

Nevertheless, the approach does not tackle the reduction of costs, that is, the expense of load allocation, which may take longer than the actual measurement time (Fig. 1). It aims to mitigate storage costs and to benefit from decreased data transmission. However, in order to optimize the distribution and migration of the data using the linear algorithm, such algorithms require simultaneous data processing and migration, implementing a master-slave load balance.

Fig. 1 Predicted bandwidth requirement


Fig. 2 System architecture design

$\sum_i e_i^2 = \sum_i \left(Y_i - (\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i})\right)^2 = 0$  (5)

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i$  (6)

$\sum_i e_i^2 = \sum_i \left(Y_i - (\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i})\right)^2 = 0$  (7)

Nevertheless, this method just addresses static load balancing. This indicates that the Lagrange multiplier is estimated for the transmitted weight and therefore an efficient functional weights balance converting algorithm in Euclidean form (Fig. 2). The development technology of a hybrid grid and cloud infrastructure [8] reduces the operating system's length and overall management.

$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$  (8)

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$  (9)

$\sigma_{\hat{\beta}_0} = \sigma_{\varepsilon} \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}} = \sigma_{\hat{\beta}_1} \sqrt{\frac{\sum x_i^2}{n}}$  (10)

To address the cost and the time period in question, this approach yields better outcomes in a shorter time. Similar types of topics have been considered.

3.1 Machine Learning Algorithms with BSP Paradigm Typically, distributed ML uses the BSP model, as in, for example, Spark. The computation in BSP consists of a set of T super steps separated by synchronization barriers. A super step is defined as the series of operations between two successive synchronization points. In each super step, all computation nodes conduct iterative calculations simultaneously.

$\sum_{i=1}^{n} \sum_{k=1}^{p} x_{ij} x_{ik} \hat{\beta}_k = \sum_{i=1}^{n} x_{ij} y_i, \quad j = 1, \ldots, p.$  (11)

$(\mathbf{X}^{\top}\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}^{\top}\mathbf{Y},$  (12)

$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y}.$  (13)

The nodes then enter a synchronization barrier and wait. The parameter server updates and passes the global parameters to all computation nodes once the calculations are finished by all the computation nodes and the updates are agreed upon.

$\rho\!\left(y \mid X, \beta, \sigma^{2}\right) \propto (\sigma^{2})^{-\frac{n}{2}} \exp\!\left(-\frac{1}{2\sigma^{2}} (y - X\beta)^{\top}(y - X\beta)\right).$  (14)

$\hat{\beta} = (X^{\top}X)^{-1} X^{\top} y$  (15)

All computation nodes then proceed together to the next super step. The synchronization limits are thus enforced. This synchronization allows the parallel ML algorithm under the BSP model to be serialized, ensuring that parameter updates are globally consistent and that the execution of the algorithm is correct.
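The following numpy sketch illustrates how the normal equations of Eqs. (11)–(13) can be solved in a BSP-style fashion: each node computes partial sums over its own data shard, the sums are combined at the synchronization barrier, and the global coefficients are then obtained. The sharding scheme and function names are assumptions for illustration.

```python
# Illustrative BSP-style least squares: per-node partial sums, then one global solve.
import numpy as np

def local_sums(X_shard, y_shard):
    """Per-node super-step: partial X^T X and X^T y for one data shard."""
    return X_shard.T @ X_shard, X_shard.T @ y_shard

def bsp_least_squares(shards):
    """Aggregate the per-node sums (the synchronization barrier) and solve
    the normal equations (X^T X) beta = X^T y, i.e. Eqs. (12)-(13)."""
    p = shards[0][0].shape[1]
    XtX = np.zeros((p, p))
    Xty = np.zeros(p)
    for X_shard, y_shard in shards:          # one entry per computation node
        A, b = local_sums(X_shard, y_shard)
        XtX += A
        Xty += b
    return np.linalg.solve(XtX, Xty)         # more stable than an explicit inverse

# Example with two hypothetical nodes:
# rng = np.random.default_rng(0)
# X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
# beta = bsp_least_squares([(X[:50], y[:50]), (X[50:], y[50:])])
```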

3.2 The Efficiency of the BSP Machine Model If there is an unbalanced cluster load, the efficacy of the BSP system model will decrease considerably. For e.g., the Ho analysis shows that, if the LDA model is run on 32 BSP devices, the synchronization barrier is six times higher than iterations [9]. Some cases, thus removing the problem of straggler. The styling threshold of DSP is low, but the cluster load balance is implied. The straggler problem cannot


be solved because when the system nodes are added, the DSP does not fully fit the cluster charge. The following threshold limits analyses are provided.

3.3 DSP-Based Load Equilibrium Adaptation Method As all computation nodes are modified synchronously and the iterative check quantity from each device node is calculated through the usage of the output model, the controller mechanism generates the function performance via the Ganglia machine control unit. A-DSP provides a system for adjusting load balances dependent on DSP.

4 Prediction and Simulation Method The main parameter method components are the centralized management framework, the application unit for output insurance, the centralized synchronous control device, and the redistribution task framework. For global model parameters, the standardized parameter control architecture is applied (Fig. 3). More estimates at strong nodes are delegated by rapidly converting the estimated sum of the iteration of each node into the actual iteration period between nodes, thus essentially growing the cluster load and the configuration exercise. Transform FR makes less node slower time and faster node measurement.

$vs^{2} = (y - X\hat{\beta})^{\top}(y - X\hat{\beta}) \quad \text{and} \quad v = n - k,$  (16)

$\rho\!\left(\beta, \sigma^{2}\right) = \rho\!\left(\sigma^{2}\right) \rho\!\left(\beta \mid \sigma^{2}\right),$  (17)

Fig. 3 Aggregates of sample data

$\rho\!\left(\sigma^{2}\right) \propto (\sigma^{2})^{-\frac{v_{0}}{2}-1} \exp\!\left(-\frac{v_{0} s_{0}^{2}}{2\sigma^{2}}\right).$  (18)


Fig. 4 A basic design with three stages

$\rho\!\left(\beta \mid \sigma^{2}\right) \propto (\sigma^{2})^{-\frac{k}{2}} \exp\!\left(-\frac{1}{2\sigma^{2}} (\beta - \mu_{0})^{\top} \Lambda_{0} (\beta - \mu_{0})\right).$  (19)

The working load array records the training number calculated with the next iteration of the algorithm node 1. Three computers with one system and database transfers are included on the site level. Only through the flow lines from thirds. It is represented when not all operator pairs connect in the following traffic matrix (Fig. 4). The coding was passed to the manager of the dependency program and an unbiased operation evaluation. It receives the job and tests if it is totally isolated or requires multiple jobs.

$\mu_{n} = (X^{\top}X + \Lambda_{0})^{-1}\left(X^{\top}X \hat{\beta} + \Lambda_{0} \mu_{0}\right).$  (20)

It explores the relations between various tasks if several tasks are involved. Both the job queue and the dependency queue are taken into consideration. The work items are directed to the scheduler so that child tasks are scheduled one after another.

4.1 The Load Balancing in the IWRR The dependency job list contains tasks that depend on other VM tasks. Once all of the child tasks in this set are fulfilled, the parent task of the VM is delegated, while the independent queue includes the remaining duties. The scheduler comes with a separate work queue and a dependency feature.

$\rho\!\left(\beta, \sigma^{2} \mid y, X\right) \propto \rho\!\left(\beta \mid \sigma^{2}, y, X\right) \rho\!\left(\sigma^{2} \mid y, X\right),$  (21)

$\Lambda_{n} = X^{\top}X + \Lambda_{0}, \quad \mu_{n} = \Lambda_{n}^{-1}\left(X^{\top}X \hat{\beta} + \Lambda_{0} \mu_{0}\right),$  (22)


The scheduler selects the correct machine based on the IWRR algorithm. The scheduler gathers the details of the resource planner.

$a_{n} = a_{0} + \frac{n}{2}, \quad b_{n} = b_{0} + \frac{1}{2}\left(y^{\top}y + \mu_{0}^{\top}\Lambda_{0}\mu_{0} - \mu_{n}^{\top}\Lambda_{n}\mu_{n}\right).$  (23)
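For clarity, the closed-form posterior update behind Eqs. (20)–(23) can be written compactly as below; this is a standard conjugate Bayesian linear-regression update, and the prior parameters passed in are assumptions rather than values from the paper.

```python
# Conjugate Bayesian linear-regression update: Gaussian prior (mu0, Lambda0) on beta,
# inverse-gamma prior (a0, b0) on sigma^2; variable names mirror the equations.
import numpy as np

def posterior_update(X, y, mu0, Lambda0, a0, b0):
    n = X.shape[0]
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)                          # OLS estimate, Eq. (15)
    Lambda_n = XtX + Lambda0                                          # Eq. (22)
    mu_n = np.linalg.solve(Lambda_n, XtX @ beta_hat + Lambda0 @ mu0)  # Eq. (20)
    a_n = a0 + n / 2.0                                                # Eq. (23), first part
    b_n = b0 + 0.5 * (y @ y + mu0 @ Lambda0 @ mu0
                      - mu_n @ Lambda_n @ mu_n)                       # Eq. (23), second part
    return mu_n, Lambda_n, a_n, b_n
```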

It tests the processing power of the VMs and then uses the suggested algorithm to determine the right VM for the particular task. Every VM provides comprehensive details on the task execution list, task split list, and job custody.

4.2 Load Balance Measures with Percentage from VM The load balancer checks the percentage of jobs assigned at the VM level. The load on the VMs is calculated by means of each VM's job execution list; if the proportion is less than 1, the scheduler marks that VM for the task.

$b_{n} = b_{0} + \frac{1}{2}\left(y^{\top}y + \mu_{0}^{\top}\Lambda_{0}\mu_{0} - \mu_{n}^{\top}\Lambda_{n}\mu_{n}\right).$  (24)

$p(y \mid m) = \int p(y \mid X, \beta, \sigma)\, p(\beta, \sigma)\, d\beta\, d\sigma$  (25)

$p(y \mid m) = \frac{1}{(2\pi)^{n/2}} \sqrt{\frac{\det(\Lambda_{0})}{\det(\Lambda_{n})}} \cdot \frac{b_{0}^{a_{0}}}{b_{n}^{a_{n}}} \cdot \frac{\Gamma(a_{n})}{\Gamma(a_{0})}$  (26)

$p(y \mid m) = \frac{p(\beta, \sigma \mid m)\, p(y \mid X, \beta, \sigma, m)}{p(\beta, \sigma \mid y, X, m)}$  (27)

The least-used VM will be allocated when usage falls below 20 per cent; the scheduler will be informed of the right VM for the job. Before the right server is located, the job will be assigned to this machine.

$P(Z \mid X) = \frac{P(X \mid Z)\,P(Z)}{\int_{Z} P(X, Z)\, dZ} = \frac{P(X \mid Z)\,P(Z)}{P(X)}$  (28)

The configured data centers contain hosts and VMs with the corresponding elements. The resources are checked for idleness and heavy loads so as to move worker demands efficiently to an acceptable location.
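The assignment rule described above can be sketched as follows; the VM data structure, the field names, and the handling of the 20% threshold are illustrative assumptions about how such a check might be coded, not the authors' implementation.

```python
# Sketch of the VM-selection rule: if some VM is below the utilization threshold,
# the least-used VM gets the job; otherwise the job goes to the VM with the lowest
# utilization, i.e. proportionally to capacity as in weighted round robin.
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    capacity: float      # e.g. MIPS of the virtual machine
    load: float = 0.0    # currently assigned work, in the same units

    @property
    def utilization(self) -> float:
        return self.load / self.capacity

def select_vm(vms, job_size, threshold=0.20):
    idle = [vm for vm in vms if vm.utilization < threshold]
    if idle:
        chosen = min(idle, key=lambda vm: vm.load)            # least-used VM
    else:
        chosen = min(vms, key=lambda vm: vm.utilization)      # proportional to capacity
    chosen.load += job_size
    return chosen
```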


5 Results The following order ranks the heterogeneous VMs from the highest to the lowest computing power. Among homogeneous workloads in heterogeneous settings, more work is allocated to the higher-capability VMs.

$D_{KL}(Q \parallel P) = \sum_{Z} Q(Z)\left[\log Q(Z) - \log P(Z, X)\right] + \log P(X)$  (29)

The WRR takes into consideration the ratio of each VM's capability to the overall VM resources and assigns a proportionate amount of work.

$D_{KL}(Q \parallel P) = \mathbb{E}_{Z}\left[\log Q(Z) - \log P(Z, X)\right] + \log P(X)$  (30)

When the least loaded is able to complete all of the works current in extremely loaded worker in the shortest possible period. It is the next step it does. Based from the previous equation, the LDVs will be allocated long work, so the execution period will be postponed (Fig. 5). The scheduler then checks the estimated completion date for each of the loaded VMs and covers the estimated period for a VM at the real completion date of the set. Consequently, the least likely period of depletion was calculated in one of the VMs from the above measurements, and the function was then allocated to this VM. At the end of the task is the load balance in the IWRR with the work time. Also, for heterogeneous area data centers, this method is ideal (Fig. 6). Solid diurnal task weights developed by the Cicada professional tracking algorithm. Nearly, all weights are given 24 h earlier by the algorithm (Fig. 7). The following instruments are listed. The data storage module helps to store each computer node’s subtractive sets. The node reads the sharing of workload already. The goal of this paper is to adapter to the dynamic workload distribution (Fig. 8). The output of the Cicada estimation algorithm versus past. When a tale rises, the sum of past requires to be considered. For a projection in all but one event, Cicada requires fewer than 10 ms (Fig. 9). Fig. 5 Figure relative L-2 error

The Predictive Genetic Algorithm (GA) Load Management … Fig. 6 Index of X-indicates matrix

Fig. 7 Results CDF

Fig. 8 Speed of prediction calculations

Fig. 9 CDF 2 spatial variations


Fig. 10 Index X-2

The SSP requires multiple training sessions to be performed by each the slowest node completes this iteration by default. The global model parameters are then satisfied in order to adjust local model parameters (Fig. 10). The degree of the frequency of synchronization challenges is minimized, and SSP model testing costs are lowered. At the same time, if the output is greater than PE, a space-free CPU/PE may only be used to execute one operation at any time. The intern performs higher job migrations in the WRR and RR algorithms. This sum of migration is also important in the smaller number of resources in WRR and RR algorithms (Fig. 11). Displays this type for reservation. Rather, a ‘over-subcontracted data clumters’ (VOC) model is ideally adapted to authors who recognize there are no program traffic trends in this curriculum vitae (quotation [37] and [61]). This model makes groups with oversubscribed virtual machine interaction as seen in Figs. 3, 4 and 5b. The VOC model needs two additional parameters (Fig. 12). The estimate shows that the violation in SLA enhancement is less than 0.286. Figure 6 points out the effects of the JCR with and without SVM. JCR has a redline impact with SVM, while blue line reflects JCR’s effects without SVM. The figure shows that the JCR increases above 0.538 (Fig. 13). Fig. 11 Virtual switch network results


Fig. 12 Error

Fig. 13 CDF 4 relative error fraction

This is an example of the topology of the egoistic location of the network. In order to place J1 and J3, the gourmet positioning algorithm also requires the rate-1 approach as it places (Fig. 14). Effect on the fulfillment of requests using the field data instead of the Cicada estimates as seen. Cicada is proposing to develop software in this respect for some

Fig. 14 Oversubscriptions factor


55–75%. The average increase is 11–18% in total and 25–26% in the amount of applications that have been modified by Cicada. These are near the numbers published.

6 Conclusion and Enhancement in Future The improved weighted round robin algorithm helps VMs to operate in and out of most compatible VMs. In the three different circumstances of the atmosphere cycle, there are three distinct phases. Initial placement concentrates on the strengthened weighted round robin algorithms to provide job requirements for msv participants, based on the capabilities of the VM and the required working time. The dynamic planner is ready for the loading and completion time of all configured VMs for all configured VMs. The minimum time for completion of this specific role was then defined for one of the VMs based on the above calculations. The weighing device for the ring robin is at the end of each game. When complete, the load is spread consistently within the facilities (VMs) involved over any VMs and idling periods. The results of the success analysis and tests with this algorithm have shown that the improved weighted ring royal algorithm is appropriate for heterogeneous work with heterogeneous devices relative to the other ring and weighted circular algorithm. This algorithm calls the QoS key parameter reaction times.

References 1. Simar, P.S., Anju, S., Rajesh, K.: Analysis of load balancing algorithms using cloud analyst. Int. J. Grid Distrib. Comput. 9(9), 11–24 (2016) 2. Maguluri, S.T., Srikant, R., Ying, L.: Stochastic models of load balancing and scheduling in cloud computing clusters. In: INFOCOM Proceedings IEEE, pp. 702–710 (2012) 3. Desyatirikova, E.N., Kuripta, O.V.: Quality management in IT service management based on statistical aggregation and decomposition approach. In: 2017 International Conference “Quality Management, Transport and Information Security, Information Technologies” (IT&QM&IS), pp. 500–505. https://doi.org/10.1109/ITMQIS.2017.8085871 4. Cheng, D., Rao, J., Guo, Y., Jiang, C., Zhou, X.: Improving performance of heterogeneous map reduce clusters with adaptive task tuning. IEEE Trans. Parallel Distrib. Syst. 28(3), 774–786 (2016) 5. Chiang, M.L., Luo, J.A., Lin, C.B.: High-reliable dispatching mechanisms for tasks in cloud computing. In: BAI2013 International Conference on Business and Information, Bali, Indonesia, p. 73, 7–9 July 2013 6. Mohapatra, S., Smruti Rekha, K., Mohanty, S.: A comparison of Four Popular Heuristics for Load Balancing of Virtual Machines in Cloud Computing 7. Kundu, S., Rangaswami, R., Dutta, K., Zhao, M.: Application Performance Modeling in a Virtualized Environment. In: Proceedings of IEEE HPCA, Jan 2010 8. Chiang, M.-L., Hsieh, H.-C., Tsai, W.-C., Ke, M.-C.: An improved task scheduling and load balancing algorithm under the heterogeneous cloud computing network. In: 2017 IEEE 8th International Conference on Awareness Science and Technology (iCAST). https://doi.org/10. 1109/icawst.2017.8256465


9. von Laszewski, G., Wang, L., Younge, A.J., He, X.: Power-aware scheduling of virtual machines in DVFS-enabled clusters. In: IEEE International Conference on Cluster Computing and Workshops, New Orleans, LA, pp. 1–10 (2009) 10. Kaneria, O., Banyal, R.K.: Analysis and improvement of load balancing in cloud computing. In: International Conference on ICT in Business Industry and Government (ICTBIG), Jan 2016 11. Ajila, S.A., Bankole, A.A.: Cloud client prediction models using machine learning techniques. In: 37th Annual International Computer Software and Applications Conference, Kyoto, Japan (2013) 12. Lyu, H., Li, P., Yan, R., Luo, Y.: Load forecast of resource scheduler in cloud architecture. In: 2016 International Conference on Progress in Informatics and Computing (PIC) 13. Shakir, M.S., Razzaque, A.: Performance comparison of load balancing algorithms using cloud analyst in cloud computing. In: 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON). https://doi.org/10.1109/uemcon.2017. 8249108 14. Kumar, M., Sharma, S.C.: Dynamic load balancing algorithm for balancing the workload among virtual machine in cloud computing. In: 7th International Conference on Advances in Computing and Communications, ICACC-2017, 22–24 Aug 2017, Cochin, India 15. Volkova, V.N., Chemenkaya, L.V., Desyatirikova, E.N., Hajali, M., Khodar, A., Osama, A.: Load balancing in cloud computing. In: 2018 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus). https://doi.org/10.1109/eiconrus.2018. 8317113 16. Wang, Y., Ren, Z., Zhang, H., Hou, X., Xiao, Y.: “Combat Cloud-Fog” network architecture for internet of battlefield things and load balancing technology. In: 2018 IEEE International Conference on Smart Internet of Things (SmartIoT).https://doi.org/10.1109/smartiot. 2018.00054 17. Li, J., Qiu, M., Niu, J.-W., Chen, Y., Ming, Z.: Adaptive resource allocation for preempt able jobs in cloud systems. In: 10th International Conference on Intelligent System Design and Application, pp. 31–36 (2011) 18. Shi, J.Y., Taifi, M., Khreishah, A.: Resource planning for parallel processing in the cloud. In: IEEE 13th International Conference on High Performance and Computing, pp. 828–833 (2011) 19. Goudarzi, H., Pedram, M.: Multi-dimensional SLA-based resource allocation for multi-tier cloud computing systems. In: IEEE International Conference on Cloud Computing, pp. 324– 331 (2011) 20. Dhiman, G., Marchetti, G., Rosing, T.: vGreen: a system for energy efficient computing in virtualized environments. In: Conference of ISLPED 2009 San Francisco, California ,USA, pp. 19–21 (2009) 21. Jin, H., Deng, L., Wu, S., Shi, X., Pan, X.: Live virtual machine migration with adaptive, memory compression. In: IEEE International Conference on Cluster Computing and Workshops, New Orleans, LA, pp. 1–10 (2009) 22. Pattanaik, P.A., Roy, S., Pattnaik, P.K.: Performance study of some dynamic load balancing algorithms in cloud computing environment. In: 2015 2nd International Conference on Signal Processing and Integrated Networks (SPIN) 23. Li, B., Li, J., Huai, J., Wo, T., Li, Q., Zhong, L.: EnaCloud: an energy-saving application live placement approach for cloud computing environments. In: IEEE International Conference on Cloud Computing, Bangalore, pp. 17–24 (2009). (2) (PDF) VM Allocation in cloud computing using SVM. Available from https://www.researchgate.net/publication/336022132_VM_Alloca tion_in_cloud_computing_using_SVM. Accessed 16 Mar 2020

Continuous Recognition of 3D Space Handwriting Using Deep Learning Sagar Maheshwari and Sachin Gajjar

Abstract In this paper, we attempt to present novel input methods that help enable a hands-free interface through recognition of 3D handwriting. The motion is detected wirelessly by the use of the inertial measurement unit (IMU) of the Arduino 101 board. Two different approaches are discussed. One approach is to use the pattern matching engine (PME) of the Intel® Curie™ module on an Arduino 101 mounted on the back of the hand. The second approach feeds the IMU input to a well-structured recurrent neural network. The spotting of handwriting segments is done by a support vector machine. The former approach, being short of memory, is less preferred than the latter. The deep learning approach can continuously recognize random sentences. The model was trained on a freely definable vocabulary of 1000 words and was tested by only one person, achieving a word error rate as low as 2%. Keywords Air writing · Arduino · Deep learning · Recurrent neural networks · Support vector machine

1 Introduction Hand gestures are a pervasive, common, and significant piece of a communicated language. The advent of recognition of 3D hand gestures has enticed variety of research concerns in various emerging fields namely pattern recognition, computer vision, and human–computer interaction. There are several ways of sensing gestures, one of which is a low-cost method of sensing through hand mounted sensors that include accelerometers and gyroscopes [1]. Operations like writing or comprehending text or some different convoluted tasks entail more expressive capacity than a limited bunch of secluded gestures [2]. This paper presents novel approaches S. Maheshwari (B) · S. Gajjar Department of Electronics and Communication Engineering, Nirma University, Ahmedabad, Gujarat 382481, India e-mail: [email protected] S. Gajjar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_73


that combine the intuition gathered from gestures to express it in the form of handwriting, specifically as a text output. Several challenges arise. First, in everyday life, the gestures are not limited to specific handwriting segments, but also include the normal day-to-day activities, introducing a lot of irrelevance in the text input interface. The handwriting segments should be identified beforehand in the continuous stream of data. Secondly, as the accelerometer data is noisy, it should be filtered before sending it to the recognition stage. Third, the actual text input must be recognized in the whole data stream. For continuous recognition, we use two approaches. The first approach involves the use of Arduino 101 [3]. The Intel® Curie™ module embedded on the Arduino 101 provides us a pattern matching engine that can be used to recognize the gestures. The second approach is to use the 3-axes accelerometer of Arduino 101 and divide the process into 2 stages. The first stage is the spotting stage, which involves the use of a support vector machine [4] to classify between the writing and non-writing segments. The second stage uses recurrent neural networks for recognition of the gestures [5]. While the existing proposed scheme is based on recognition of text, this can be utilized as a base for any type of gesture recognition scheme which is built on a primeval alphabet of freely definable gestures. The first approach lacks suitable memory for large datasets; i.e., it is limited to only 128 bytes of memory per neuron for 128 neurons. The second approach, however, can be applied to large definable vocabularies, larger than V1K. Following is the organization of rest of the paper. Section 2 discusses the related work. Recognition of gestures using Arduino 101 and deep learning are discussed in Sects. 3 and 4, respectively, finally followed by conclusion.

2 Related Work Recent research suggests the paradigm shift to mobile device computing by facilitating hands-free action. Gestures allow fostering interface that is independent of any handheld tool. Hence, allowing faultless incorporation into day-to-day activities. Mini-projectors portray the display on a rigid exterior in front of the subject and the gesture is tracked via a camera or any other medium [6]. However, the approach depends on sensory input and hence would perform poorly in case of continuous stream recognition. Other researchers propose that 3D communication is doable without any sort of graphical output. The operator needs to imagine a blank surface that serves the purpose of the screen [7]. Handwriting can be predicted as text lacking any optical or sensory feedback, a method that is used here. In any accelerometer data, the spotting of relevant signal segments is necessary. This is possible by employing a binary classifier to detect probable segments and then classify the gesture afterward [7]. This approach, however, introduces latency, and therefore, the overhead involved reduces the efficiency of the recognition system. The other method is to sort the input constantly and eliminate any irrelevant outputs. Gesture recognition using accelerometer data has been experimented heavily previously where normally numerous secluded motions are expressed and sorted [8]. Many researchers propose


a variety of methods to recognize gestures through accelerometer input. Though various researchers have proposed various methods, they are either built on a very exhaustive and a primitive vocabulary or lacks in recognition of continuous stream of input. To this end, this paper discusses two independent approaches to recognize gestures using accelerometer data. The said approaches solve the issue of limited vocabulary and a continuous stream of input.

3 Gesture Recognition Using Arduino 101 Arduino 101 is a development kit that comprises Intel® Curie™ module, intended to assimilate the low power usage of the core with elevated ease-of-use [3]. The Arduino has capacities of low energy Bluetooth and consists of an onboard 6-axes accelerometer/gyroscope. It consists of 2 miniature cores, a 32-bit ARC architecture core, and an x86 (Quark), both of which are clocked at 32 MHz. The real-time operating systems (RTOS) and the associated framework designed by Intel are both open-source [3].

3.1 Deep Learning on Intel® Curie™ The pattern matching engine (PME) of Curie™ works as an engine for concurrent data recognition having 128 concurrent processing elements (PE), each with a 128-byte input vector, 128 bytes of model memory, and 8-bit arithmetic units. It supports two classification techniques, radial basis function and k-nearest neighbors, and supports 127 contexts. Arduino 101 provides the CuriePME API that can be used to train and classify gestures. Additionally, the module also provides an inertial measurement unit with 6 degrees of freedom, and each sensor sample ($s_o$) can be represented as a 6-dimensional vector of the corresponding accelerometer/gyroscope values:

$s_o = (\mathbf{a}, \mathbf{g}) = \left((a_x, a_y, a_z), (g_x, g_y, g_z)\right)$  (1)

As stated earlier, the QuarkSE core on the Curie module comes with 128 neurons, with 128 bytes of memory per neuron. Hence, there is a trade-off between memory and the data that can be classified. Figure 1 shows the glove for gesture recognition. We propose a system, shown in Fig. 2, which is user-dependent and gives comparatively poor performance in terms of word error rate in a person-independent setup. This system gives better performance when the dataset comprises words of at most 3 syllables. The system gives 100% accuracy when single letters are to be classified. Continuous recognition of words is also possible with this setup but is not recommended due to memory constraints. For instance, drawing the letter A takes almost 2 s, which is 200 samples at 100 Hz. Now, the 3-axes accelerometer values stored as 'int' are 4 bytes each, which makes


Fig. 1 Prototype of gesture recognition glove

Fig. 2 Prototype of gesture recognition glove

up 2400 bytes per letter. But for 128 neurons, our pattern can be no larger than 128 bytes. So, discarding at least 95% of the data without affecting the results requires the use of under-sampling, after which the maximum size of 128 bytes per letter can be achieved. Also, to remove the noisy data, we use an averaging filter. From the above discussion, it is clear that memory management is not efficient with this system, which leads to a lot of data being wasted. The CuriePME library is mostly used for (1) learning patterns, (2) classifying and recognizing patterns, and (3) storing and retrieving pattern-matching knowledge. The use of the CurieBLE library provides the wireless function [9]. Figure 3 shows the raw and noisy accelerometer data. Figure 4 shows the accelerometer data under-sampled to a total of 45 samples and mapped from 0 to 255. Due to the memory constraints, i.e., only 128 bytes per pattern and poor memory management, we propose a new method for gesture recognition with the use of deep learning.
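A small Python sketch of this pre-processing chain (averaging filter, under-sampling, and mapping to 0–255 so the pattern fits the 128-byte limit) is given below; the window size and the per-axis sample budget are illustrative assumptions.

```python
# Smooth the raw 3-axis accelerometer stream, keep a fixed number of evenly spaced
# samples, and map them to byte values for the PME pattern.
import numpy as np

def moving_average(samples: np.ndarray, window: int = 5) -> np.ndarray:
    """Simple averaging filter applied independently to each axis."""
    kernel = np.ones(window) / window
    return np.column_stack([np.convolve(samples[:, i], kernel, mode="same")
                            for i in range(samples.shape[1])])

def undersample_to_bytes(samples: np.ndarray, n_keep: int = 42) -> np.ndarray:
    """Keep n_keep evenly spaced samples per axis and map them to 0-255.
    With 3 axes, 42 samples per axis give 126 bytes, under the 128-byte limit."""
    idx = np.linspace(0, len(samples) - 1, n_keep).astype(int)
    kept = samples[idx]
    lo, hi = kept.min(), kept.max()
    scaled = (kept - lo) / (hi - lo if hi > lo else 1.0)
    return (scaled * 255).astype(np.uint8).flatten()

# raw = np.loadtxt("letter_A.csv", delimiter=",")   # hypothetical 200x3 recording
# pattern = undersample_to_bytes(moving_average(raw))
```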


Fig. 3 Drawing ‘A,’ raw accelerometer data

Fig. 4 Under sampled version of ‘A’

4 Gesture Recognition Using Deep Learning This approach is a more robust approach for gesture recognition. It can be divided into spotting stage and the recognition stage. The combination of the two stages introduces no overhead and there is no effect on the accuracy of word detection. The process can be seemingly pipelined and real-time detection of gestures is possible.


Fig. 5 SVM architecture

4.1 Spotting Stage The role of the spotting stage is to uniquely classify the writing and the non-writing segments in the accelerometer data. The intuition for the spotting stage is derived from Amma et al. [4]. The segments that are correctly recognized as writing segments are then carried forward to the recognition stage. The stage uses a binary support vector machine (SVM) classifier with an RBF kernel (C = 126, γ = 2). For use on continuous data streams and in real time, the sliding-window approach is more suitable. The overlapping sliding windows are classified and accumulated to be sent to the recognition stage. A window of length 0.9 s and a shift of 0.1 s is used in the approach. Figure 5 depicts the architecture of the spotting stage. In the figure, green and red segments show writing and non-writing segments, respectively. Visual inspection shows that the handwriting part has higher frequency and amplitude than the non-writing part. For each window $w_t$, the SVM classifier $C(w_t)$ returns 1 when a handwriting segment is detected and returns 0 otherwise. One sensor sample $s_t$ is categorized as handwriting motion if at least a single window containing $s_t$ is categorized as a handwriting segment [4].

$C(s_t) = \max_{k:\, s_t \in w_k} C(w_k)$  (2)

This system is biased toward the detection of writing motion. Also, minute pauses while writing do not result in any gaps in the detected writing segments. All real-time experimental results show that the chosen values are suitable for the model. As the system is biased, a high recall of 98.2% and a low precision of 32% are attained in the process. Compared with the results in [4], these values are reasonable.
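The sliding-window spotting rule of Eq. (2) can be sketched as follows; the per-window features used here are simple statistics standing in for the actual feature set, while the SVM hyperparameters follow the values quoted above.

```python
# Overlapping windows (0.9 s length, 0.1 s shift at 100 Hz) are classified by a
# binary RBF-kernel SVM; a sample is labelled handwriting if any window containing
# it is classified as handwriting (the max in Eq. 2).
import numpy as np
from sklearn.svm import SVC

WIN, SHIFT = 90, 10          # 0.9 s window, 0.1 s shift at 100 Hz

def window_features(window: np.ndarray) -> np.ndarray:
    # Simple per-axis statistics as a stand-in for the real feature set
    return np.concatenate([window.mean(axis=0), window.std(axis=0)])

def spot_handwriting(stream: np.ndarray, clf: SVC) -> np.ndarray:
    labels = np.zeros(len(stream), dtype=int)
    for start in range(0, len(stream) - WIN + 1, SHIFT):
        window = stream[start:start + WIN]
        if clf.predict(window_features(window).reshape(1, -1))[0] == 1:
            labels[start:start + WIN] = 1        # max over overlapping windows
    return labels

# clf = SVC(kernel="rbf", C=126, gamma=2).fit(train_features, train_labels)
# mask = spot_handwriting(sensor_stream, clf)
```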

4.2 Recognition Stage The purpose of gesture detection is to build a robust classifier, and hence several deep learning models that exhibit temporal dynamic behavior come into play. These state-of-the-art models include gated recurrent units (GRU) [5], long short-term memory (LSTM) [5], and recurrent neural networks (RNN) [5]. We discuss the RNN for this stage. RNNs are utilized for processing time-sequence data. From the input layer to the output layer in a conventional neural network, the layers are fully connected, which is not appropriate for time-series data. Hence, in an RNN, the present output is also related to the past output: the network remembers the previous output and applies this information when calculating the present output. Theoretically, an RNN can manage infinite time-series data; in practice, to reduce complexity, the present state is only associated with the past few states, according to need [5]. Averaged 3D acceleration and averaged 3D angular rate features are extracted from the inertial measurement unit of Arduino 101. Figure 6 shows the RNN structure that is used. The network is described by Eqs. 3 and 4:

Fig. 6 RNN architecture

$h_t = f\!\left(w_h h_{t-1} + w_i x_t\right)$  (3)

$y_t = f\!\left(w_o h_t\right)$  (4)

5 Conclusion In the initial part of the work, a wearable system of gesture input that is adept in recognizing text input written in air centered on the IMU of Arduino 101 is suggested. With the use of CuriePME, the system works well on detection of gestures


containing words with utmost 3 syllables and works with 100% efficiency when a single syllable is input. However, dataset is limited because of lack of memory and more memory is required to expand the vocabulary. To avoid memory constraints, a new method using deep learning was used. During the spotting stage, 98% recall and 32% precision was achieved. The network was trained on a very small vocabulary (V1K). Experiments were conducted on a dataset of approximately 300 words. A WER of 2% was attained. In the future, the proposed system will be tested on a versatile dataset of large vocabulary V8K and above. Acknowledgements The work is funded by IDEA LAB Program at Institute of Technology, Nirma University, India under contract IDEA-2019-EC-02.

References 1. Cheng, H., Yang, L., Liu, Z.: Survey on 3D hand gesture recognition. IEEE Trans. Circuits Syst. Video Technol. 26(9), 1659–1673 (2016) 2. Amma, C., Schultz, T.: Airwriting: demonstrating mobile text input by 3D-space handwriting. In: Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI’12) (2012) 3. “Arduino -Arduino101”, Arduino.cc, 2020 [Online]. Available https://www.arduino.cc/en/ guide/arduino101. Accessed: 05 Apr 2020 4. Amma, C., Georgi, M., Schultz, T.: Airwriting: hands-free mobile text input by spotting and continuous recognition of 3D-space handwriting with inertial sensors. In: 2012 16th International Symposium on Wearable Computers, Newcastle, pp. 52–59 (2012) 5. Du, T., Ren, X., Li, H.: Gesture recognition method based on deep learning. In: 2018 33rd Youth Academic Annual Conference of Chinese Association of Automation (YAC), Nanjing, pp. 782–787 (2018) 6. Chen, F., et al.: WristCam: a wearable sensor for hand trajectory gesture recognition and intelligent human–robot interaction. IEEE Sens. J. 19(19), 8441–8451 (2019) 7. Gustafson, S., Bierwirth, D., Baudisch, P.: Imaginary interfaces: spatial interaction with empty hands and without visual feedback. In: Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology (UIST’10) (2010) 8. Elmezain, M., Al-Hamadi, A., Michaelis, B.: Hand trajectory-based gesture spotting and recognition using HMM. In: 2009 16th IEEE International Conference on Image Processing (ICIP), Cairo, pp. 3577–3580 (2009) 9. Support for Intel® Curie™ Modules”, Intel, 2020 [Online]. Available https://www.intel.com/con tent/www/us/en/support/products/94036/boards-and-kits/intel-curie-modules.html. Accessed: 05 Apr 2020

Automated SQL Grading System Shohna Kanchan, Samruddhi Kalsekar, Nishita Dubey, Chelsea Fernandes, and Safa Hamdare

Abstract A grading system is a procedure used by teachers to assess and evaluate a student’s educational performance. The method employed by tutors in a general grading system for grading structured query language (SQL) assignments is laborious, time consuming, and inaccurate. In this paper, an automated grading system for SQL queries is proposed, which thus provides an efficient way to assess a student’s performance by awarding appropriate scores. The automated grading system is implemented for partial marking using PostgreSQL. The front-end of our system is implemented as a database-driven website in Django. We believe that the system will be very useful to global educational systems. Keywords Automated grading · Partial marking · PostgreSQL · Canonicalization · Structured query language

1 Introduction An automated grading system provides efficient means for tutors to check the student’s understandings of certain concepts in order to determine comprehension. Considering the incrementing proportion of students, the automation of this process highly enhances the overall efficiency. The student queries are graded by comparing S. Kanchan · S. Kalsekar (B) · N. Dubey · C. Fernandes · S. Hamdare Department of Computer Engineering, St. Francis Institute of Technology, Mumbai, India e-mail: [email protected] S. Kanchan e-mail: [email protected] N. Dubey e-mail: [email protected] C. Fernandes e-mail: [email protected] S. Hamdare e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_74


each of its components with the components of the correct query [1, 2]. An initial approach is to award full marks if the query under consideration is correct; i.e., the correctness of the student SQL query is evaluated by comparing both its result and the query itself with those of the instructor query. However, there are cases, as given below [3], wherein the queries look similar but give different results.

Instructor query: SELECT Department.Dept_Id, Employee.Name FROM Department RIGHT JOIN Employee ON Department.Employee_Id = Employee.Employee_Id ORDER BY Department.Dept_Id

Student query: SELECT Department.Dept_Id, Employee.Name FROM Department RIGHT JOIN Employee ON Department.Employee_Id = Employee.Employee_Id ORDER BY Employee.Employee_Id

In such a scenario, our system allocates partial marks based on the matching of the student query with the correct query. A student may write some parts of the query correctly, and in such cases, partial marks should be allotted by taking the weighted attributes and predicates into consideration. Partial marking, thus, incorporates various sub-techniques under query pre-processing for awarding partial marks to incorrect student queries. Canonicalization of the student query as well as the instructor query is required so that their syntactic variations can be compared. The canonicalized queries are then broken into components, and these components are compared. Moreover, canonicalization may not guarantee an optimal result due to deviations in the form in which the student query and instructor query are canonicalized, even though they are equivalent. As a result, various pre-processing techniques are required. Dividing a given query into different attributes and performing initial pre-processing is an integral step toward awarding partial scores. Depending upon the syntactic variations, techniques such as attribute disambiguation, WITH clause elimination, BETWEEN predicate elimination, normalization of relational predicates, and join processing are performed. In this paper, we summarize the pre-processing techniques which are required to allot partial marks to the student queries.
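To make the pre-processing step concrete, the snippet below sketches two of the simple canonicalization transformations mentioned above (case/whitespace normalization and BETWEEN predicate elimination) in plain Python. It is only an illustration of the idea; the function names and the regular-expression approach are our own assumptions, not the implementation used in the system.

```python
import re

def normalize(query: str) -> str:
    # Collapse whitespace and upper-case keywords so that trivial
    # formatting differences do not affect the comparison.
    query = " ".join(query.strip().rstrip(";").split())
    keywords = ["select", "from", "where", "between", "and", "order by", "group by"]
    for kw in keywords:
        query = re.sub(rf"\b{kw}\b", kw.upper(), query, flags=re.IGNORECASE)
    return query

def eliminate_between(query: str) -> str:
    # Rewrite "x BETWEEN a AND b" as "x >= a AND x <= b" so that both
    # syntactic variants canonicalize to the same predicate form.
    pattern = re.compile(r"(\S+)\s+BETWEEN\s+(\S+)\s+AND\s+(\S+)", re.IGNORECASE)
    return pattern.sub(r"\1 >= \2 AND \1 <= \3", query)

student = "select name from Employee where salary between 1000 and 2000;"
print(eliminate_between(normalize(student)))
# SELECT name FROM Employee WHERE salary >= 1000 AND salary <= 2000
```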

2 Literature Review The techniques for data generation in the X-Data system were further extended in order to include a much larger variety of queries and mutations. This data was used in building a SQL grading system. The testing for the accuracy of the datasets generated by this X-Data system was conducted by using SQL queries that students submitted as a part of their DBMS course. This system did not support, or supported only partially, some SQL features which included sub-queries in a query, queries containing arithmetic operations, and identifier replacement mutations. It also did


not support the functionality of assigning partial marks to examine the extent of correctness of the student query [3, 4]. A system was presented that took a database application program as input. It generated datasets, and using these datasets, unit tests were carried out to test the accuracy of the functions with queries in the application. The techniques that were used were based on mutation testing and static program analysis. Java applications that used JDBC or Hibernate APIs were examined. The system could not handle all areas of SQL query mutations. It would not suggest correct queries based on the datasets [2, 5]. An automated SQL assignment grading system was developed using the object-oriented design (OOD) technique and the model-view-controller (MVC) framework. The system consisted of two main parts: assignment management and an automated SQL grader. Instructors could manage their assignment and student information conveniently anytime and anywhere via the internet. The automated SQL grader was designed to support four DBMSs: MariaDB, MySQL, PostgreSQL, and Microsoft SQL Server. In this system, grading on SQL outputs was not applicable for SQL with comparison operators. The partial marking system was absent [6]. The scope of the X-Data system was increased by including the functionality of assigning partial marks to student queries. In the comparison of the student and instructor query, the system was able to check many more syntactic features. Due to this, the system was able to be fully automated and scalable to huge class sizes such as those of massive open online courses (MOOCs). Canonicalization of sub-queries was not taken into account in this system. Canonicalization of DISTINCT placement in FROM clause sub-queries versus outer queries was another area of future work [1, 7].

3 Challenges Identified Canonicalization and partial marking of sub-queries or nested sub-queries needs to be done by de-correlation. This system would make use of a method of dividing the constraints and putting them into a table which will contain the following fields: Predicates, Projections, Relations, Group By, and Having Clause. This technique is different from the one mentioned in the research paper on test data generation. The biggest challenge identified is the ability to grade the student query according to specifications explicitly indicated by the instructor, which must be reflected in the student query. These specifications must be taken into account since they may vary in different cases. The query execution time of the student query must also be compared with that of the instructor query, and the same must be reflected on the portal. A suggestion will be given to the student if the query has taken more execution time, but marks will not be deducted. The automated SQL partial marking system [1] is not open-source and not freely available for use.


4 Problem Definition A generic assessment process for grading of structured query language traditionally follows two approaches; mainly by either comparing the input query with the optimal set of queries or by executing the respective input query. Considering the manual effort and the accompanying error in both the above-mentioned approaches, an automated grading approach proves to be more efficient and effective. Therefore, we model an automated SQL grading system which includes partial marking of SQL queries.

5 Proposed System Methodology 5.1 Flow-Chart of the System
Step 1 An instructor can create SQL assessment tests and can provide model answers for the same in the instructor mode.
Step 2 The instructor will enter the required keywords and assign corresponding weights to entities of the model answer query.
Step 3 In the student mode, the student attempts the assessment test and submits answers for each question.
Step 4 Once the query is submitted, the student query and the tutor query as well as their outputs are evaluated by the matching criteria, and if all mentioned conditions are satisfied, the student is awarded full marks.
Step 5 If the query is incorrect, the student gets the justification for the same in the learning mode along with appropriate partial scores (Fig. 1).

5.2 Algorithm
1. Evaluation of the student query using the X-Data system (X-Data generates multiple datasets to kill mutations).
2. Canonicalization, i.e., removal of syntactic variations of the student and instructor query to make the queries comparable.
3. Pre-processing, which includes attribute disambiguation, BETWEEN predicate elimination, normalization of relational predicates, and join processing.
4. Generating equivalence classes of attributes.
5. Join minimization.
6. Handling functional dependencies, which includes canonicalizing ORDER BY attributes, comparing GROUP BY attributes, and canonicalizing duplicate removal.
7. Deconstruction of SQL queries into components such as the SELECT list, FROM clause, WHERE clause, GROUP BY, HAVING, etc.


Fig. 1 Flow-chart of the assessment system

8. Component and sub-part (attribute) matching.
9. Computation of marks using a weighted technique.
In Fig. 2, both the instructor and student queries are segregated into elements comprising the basic selection clause, WHERE clause predicates, FROM clause, operators, etc. These elements are further divided into sub-parts like Predicates, Projections, Relations, Group By, and Having Clauses. For each component of the instructor query, the sub-parts from the instructor query are matched with the corresponding sub-parts from the student query. Missing sub-parts are penalized by giving marks for that component in proportion to the number of instructor query sub-parts that are actually present. Extraneous sub-parts in the student query are not penalized; marks are computed in this manner for each sub-part and added to get a mark for each component [1].
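A minimal sketch of this component-wise matching and weighted mark computation is given below. The component names, weights, and data structures are illustrative assumptions only; they show the proportional-marking idea rather than the actual implementation of the grader.

```python
def grade_query(instructor: dict, student: dict, weights: dict) -> float:
    """Each query is represented as a dict mapping a component name
    (e.g. 'projections', 'relations', 'predicates', 'order_by') to the
    set of sub-parts appearing in that component."""
    total = 0.0
    for component, inst_parts in instructor.items():
        if not inst_parts:
            continue
        stud_parts = student.get(component, set())
        # Marks for a component are proportional to the number of
        # instructor sub-parts present in the student query; extraneous
        # student sub-parts are not penalized.
        matched = len(inst_parts & stud_parts)
        total += weights.get(component, 1.0) * matched / len(inst_parts)
    return total

instructor_q = {
    "projections": {"Department.Dept_Id", "Employee.Name"},
    "relations": {"Department", "Employee"},
    "predicates": {"Department.Employee_Id = Employee.Employee_Id"},
    "order_by": {"Department.Dept_Id"},
}
student_q = dict(instructor_q, order_by={"Employee.Employee_Id"})

# With 2 marks per component, only the ORDER BY attribute differs here,
# so 3 of 4 components match fully under these assumed weights.
print(grade_query(instructor_q, student_q, {c: 2.0 for c in instructor_q}))  # 6.0
```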

6 Performance Evaluation Parameter Following are the parameters based on which the performance will be evaluated:


Fig. 2 Component-wise partial marking (the instructor and student queries from the example above are decomposed into projections, relations, join predicates, and ORDER BY attributes; only the ORDER BY attribute differs, and the student query is awarded 6.5 marks)

1. User-friendly interface: The interface must be easily accessible and comprehensible by the user (student and tutor).
2. Easy integration with existing systems: The system should offer flexibility with respect to installation and upgradation.
3. Processing time: Time taken in evaluating queries and displaying aggregate marks.
The accuracy of the system will be evaluated based on the required execution time and how well the awarded partial marks correlate with the marks assigned by the tutor. Unit testing of each program will be performed using black-box testing, followed by integration, system, and acceptance testing.

7 Experimental Setup
Hardware requirements: A system with 2 GB RAM and 2 GB storage.
Software requirements: Python, Django.
Database used: PostgreSQL.
The user interface proposed in this project is classified into two working modes: instructor mode and student mode. The fundamental aim of the system is to provide interfaces with easy navigation and interaction, thus enhancing the educational


Fig. 3 Sample user interface

experience. The two corresponding modes will maintain a user-specific profile for every student and teacher, also providing authorization for the same. Furthermore, the student mode is classified into two profiles: Learning and Assessment. In the instructor mode, the questions and respective solutions will be provided. The instructor also defines the keywords and assigns the weights. For example, in the sample GUI given in Fig. 3, the instructor has provided keywords such as INNER JOIN and GROUP BY. Then, in the student mode, the student can view the marks allotted to them and the optimal query execution time taken.

8 Conclusion In the course of our research, we have studied the various approaches contributing to SQL query evaluation processes, especially highlighting their syntactic mutations. We have tried to address the challenges identified in conventional grading systems through our developed system. In this work, we primarily focus on standardization and canonicalization techniques to process and evaluate SQL queries for assignment evaluation, as well as on enhancing the learning environment by improving efficiency. The system provides a clear indication of correctly and incorrectly assessed queries up to a specific efficiency rate for the result instances tested against the system. Considering 50 queries and comparing the expected and actual results, the system achieves an efficiency rate of 76%. We have also


successfully studied the flow of the entire system regarding the query processing and its various techniques involved.

References 1. Chandra, B., Joseph, M., Radhakrishnan, B., Acharya, S., Sudarshan, S.: Automated grading of SQL queries. In: 2019 IEEE 35th International Conference on Data Engineering (ICDE) 2. Neumann, T., Moerkotte, G.: A combined framework for grouping and order optimization. In: VLDB, pp. 960–971 (2004) 3. Chandra, B., Chawda, B., Kar, B., Maheshwara Reddy, K.V., Shah, S., Sudarshan, S.: Data generation for testing and grading SQL queries. VLDB J. 24(6), 731–755 (2015) 4. Paulley, G.N., Larson, P.-A.: Exploiting uniqueness in query optimization. In: CASCON, pp. 804–822 (1993) 5. Agrawal, P., Chandra, B., Venkatesh Emani, K., Garg, N., Sudarshan, S.: Test data generation for database applications. In: IEEE 34th International Conference on Data Engineering (ICDE) (2018) 6. Singporn, P., Vichianroj, P., Trongratsameethong A.: ASQLAG—automated SQL assignment grading system for multiple DBMSs. J. Technol. Innov. Tertiary Educ. 1(1), 41–59 (2018) 7. Silberschatz, A., Korth, H.F., Sudarshan, S.: Database System Concepts, 6th edn. McGraw Hill (2010)

Error Analysis with Customer Retention Data V. Kaviya, V. Harisankar, and S. Padmavathi

Abstract Churn prediction is required by most of the service companies to improve their business. The machine learning approaches concentrate on the selection of algorithms or features to improve the accuracy of churn prediction. Algorithms to understand what went wrong and why a prediction is not accurate are needed to improve the system. This paper gives special attention to the error analysis of those approaches and the overall analysis of the dataset. This paper analyses the working of various machine learning approaches for customer retention prediction based on bank customer’s transaction data. It also gives a detailed error analysis using distance and similarity metrics like Mahalanobis distance, Hamming distance, and Jaccard Similarity Score. It provides a ranking for the features in the dataset based on error analysis and also lists their importance in a quantified manner by removing highly ranked features. Keywords Churn prediction · Data mining · Error analysis · Machine learning · Similarity metrics

1 Introduction In the digital era, vast amounts of data are generated from various sources like Health care, retail, telecommunications, banking, social networking sites, etc. Due to the sharp growth of data, researchers and decision-makers often find it difficult to analyse the data with efficiency and obtain beneficial and worthy conclusions.For V. Kaviya (B) · V. Harisankar · S. Padmavathi Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] V. Harisankar e-mail: [email protected] S. Padmavathi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_75


any business, customer retention is the primary key to its long-term survival. Customer retention is the act of retaining customers by undertaking activities to prevent them from defecting to other peer companies. According to the Harvard Business Review, a company can raise its profits by 25–85% if its customer retention rate is increased by 5% [1]. Therefore, companies are in need of accurate analytical models that can identify the fruitful customers based on their personal, behavioral, and demographic data. The analysis used in this paper is carried out on a dataset of transactions of bank clients [2]. This paper intends to analyse and inspect the working of numerous machine learning approaches for the dataset using different evaluation metrics, and to focus on error analysis based on the wrong predictions to give a clear understanding of the reason for the deviation from the general trend. This paper begins with Sect. 2, where a detailed assessment of the already existing methodologies is presented. This section also discusses the limitations of these methodologies. In Sect. 3, a description of the overall working mechanism of the study, with results of all the approaches along with the error analysis, is presented. Finally, the paper ends with Sect. 4, where conclusions are presented.

2 Literature Review Most of the studies conducted by the machine learning community on churn prediction have used datasets from the telecom industry [3–6], and very few studies have taken datasets from the banking industry [7, 8]. In [2], the study compared the results of the decision tree algorithm with both the Spark ML package and the Spark MLlib package in handling enormous data and found that the Spark ML package performed better. In data pre-processing, it can be seen that feature selection, random under-sampling or over-sampling, data cleaning, feature extraction, standardization, and encoding of categorical and continuous attributes have a significant impact on the prediction of the model. Prediction techniques like CART, SVM, Random Forest, MLP (neural networks), Naive Bayes, and DT [3, 4, 9] are used to a large extent, and it has been found that traditional models like DT and SVM perform better compared to neural network and clustering models. However, most of the studies in the literature have not covered error analysis of the wrong predictions. There are various types of errors related to machine learning and data analytics. The training and testing error samples are important for error analysis, and the testing error samples are considered the most important since they aid in assessing the potential performance of a given predictive model on fresh and unseen data. Therefore, the current study is unique in combining all the widely used pre-processing techniques into a single study and then establishing a solid ground for the classification errors and the deviation from the general trend by performing error analysis using distance and similarity metrics.


3 Methodology 3.1 Dataset The dataset taken in this analysis contains details of bank clients and is freely accessible on Kaggle. It consists of transactional details of 10,000 customers. The features of the dataset include Row Number, Customer ID, Surname, Credit Score (a 3-digit number that quantifies a person’s capacity to pay back the acquired amount), Geography (the locality of the clients across the three nations where the bank is working), Gender, Age, Balance, IsActiveMember, Estimated Salary, Tenure (the time of having the account in months), NumOfProducts (number of accounts the individual has), HasCrCard (binary variable to indicate whether the client has a credit card), and Exited (binary variable to denote whether the client has left the bank).

3.2 Data Preprocessing Studies have demonstrated that preprocessing has a remarkable impact on the prediction of the model. First, the insignificant attributes (Row Number, CustomerID, and Surname) were dropped. Categorical variables like Gender and Geography are encoded using one-hot encoding, and min-max normalization is adopted for feature value normalization. One of the main problems faced by any classification model is class bias, where there is an unequal distribution between the classes of a target variable. The SMOTE (synthetic minority oversampling technique) algorithm is utilized, which works by synthesizing new minority class samples to move the classifier learning bias towards the minority class. ROC scores improved from 74.62 to 75.55% for RandomForest after applying SMOTE.
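The preprocessing pipeline described above can be sketched with pandas, scikit-learn, and imbalanced-learn roughly as follows. The file name, random seed, and exact column names are assumptions for illustration, not the authors' script.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from imblearn.over_sampling import SMOTE

df = pd.read_csv("churn.csv")  # assumed file name for the Kaggle bank-churn data

# Drop insignificant attributes.
df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])

# One-hot encode the categorical variables.
df = pd.get_dummies(df, columns=["Geography", "Gender"])

X = df.drop(columns=["Exited"])
y = df["Exited"]

# Min-max normalization of the feature values.
X_scaled = MinMaxScaler().fit_transform(X)

# SMOTE synthesizes minority-class samples to counter class bias.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_scaled, y)
print(X_res.shape, pd.Series(y_res).value_counts().to_dict())
```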

3.3 Evaluation Metrics The performance of the model is assessed using accuracy and AUC-ROC. Accuracy is the ratio between the number of correctly predicted samples and the total number of samples. The ROC-AUC score reveals the ability of the model to differentiate between classes: the larger the score, the better the model is at distinguishing between classes.


3.4 Observation The accuracy and ROC-AUC scores for all algorithms are tabulated in Table 1. From the table, it can be seen that Random Forest and XGBoost gave better accuracy and ROC-AUC scores compared to the other algorithms. Random Forest is an ensemble of decision trees. Each node within the decision trees is a condition on one feature, used to group similar values from the dataset. For the classification algorithm, the condition is based on Gini impurity. The feature importance is computed based on the amount of influence that each feature has in decreasing the weighted impurity. For example, in Random Forest, the final feature importance is the average of the values across all the trees. Feature importance for Random Forest and XGBoost is shown in Fig. 1, and it can be observed that ‘Age’, ‘NumOfProducts’, and ‘Balance’ are the most dominant features in both algorithms. Figure 2 gives the feature importance based on the Random Forest algorithm when (a) wrongly classified samples alone are taken and (b) an equal number of correct and wrongly classified samples are used. It can be observed that ‘CreditScore’, ‘Age’, and ‘Balance’ are the most dominant features in the first scenario. When features were analysed individually, customers with a balance less than 20,000 and customers with an age greater than forty had high retention. Table 3 shows the number of samples incorrectly classified and the count of common incorrectly classified samples between the two algorithms. These samples are considered for error analysis.
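A brief sketch of how such impurity-based feature importances and scores can be obtained with scikit-learn and XGBoost is shown below; it assumes the preprocessed X, X_res, and y_res from the earlier snippet and is not the authors' exact training script.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X_res, y_res, test_size=0.2, random_state=42)

for model in (RandomForestClassifier(random_state=42), XGBClassifier()):
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(type(model).__name__,
          "accuracy:", round(accuracy_score(y_test, pred), 4),
          "roc-auc:", round(roc_auc_score(y_test, pred), 4))
    # Impurity-based importances, averaged over the trees of the ensemble.
    print(dict(zip(X.columns, model.feature_importances_.round(3))))
```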

Table 1 Accuracy and roc-score of selected models
Model                  Accuracy   Roc-Auc score
RandomForest           87.1       74.62
XGBoost                86.88      74.62
SVM                    86.05      68.13
Naive Bayes            78.40      66.51
Logistic regression    81.45      59.68

Fig. 1 Feature importance of XGBoost and RandomForest

Table 2 Legend for Figs. 1 and 2
Label   Feature               Label   Feature
f1      Geography (France)    f7      Balance
f2      Geography (Spain)     f8      NumOfProducts
f3      CreditScore           f9      HasCrCard
f4      Gender                f10     IsActiveMember
f5      Age                   f11     EstimatedSalary
f6      Tenure

Fig. 2 Feature importance using wrongly classified and an equal number of correct and wrong samples

Table 3 Values wrongly predicted between two algorithms
Country    Xgboost   RandomForest   Common values
France     101       103            90
Spain      60        65             55
Germany    81        90             71

3.5 Error Analysis In order to identify the common features that caused the errors, we have analyzed the similarity and distance of the features belonging to the error samples.

3.5.1 Mahalanobis Distance

Mahalanobis distance is a measure which considers the unequal variances and correlations between features to find the distance between two data elements in the space


defined by the features. This algorithm has been used for object classification in [10]. Equation (1) explains how to compute the Mahalanobis distance, which was first introduced in [11]:

D^2 = (x − m)^T C^{−1} (x − m)    (1)

where D^2 is the Mahalanobis distance, x is the vector of data, m is the vector of mean values of the independent variables, C^{−1} is the inverse covariance matrix of the independent variables, and T indicates that the vector should be transposed.
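Equation (1) can be evaluated directly with NumPy as sketched below. This is a generic illustration of the formula, not the authors' exact analysis code; the toy data is an assumption.

```python
import numpy as np

def mahalanobis_sq(x, X):
    """Squared Mahalanobis distance of sample x from the distribution of X
    (rows = samples, columns = independent variables), as in Eq. (1)."""
    m = X.mean(axis=0)                               # vector of mean values
    C_inv = np.linalg.inv(np.cov(X, rowvar=False))   # inverse covariance matrix
    d = x - m
    return float(d.T @ C_inv @ d)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                        # 200 samples, 3 features
print(mahalanobis_sq(np.array([2.0, -1.0, 0.5]), X))
```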

3.5.2 Hamming Distance

Hamming distance is the minimum number of substitutions required to turn one string into the other. Since Hamming distance works on binary data, all continuous values were binned and converted into the minimum number of categories possible. In [12], Hamming distance has been used for fault analysis of various circuits.
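As an illustration, the Hamming distance between the binned feature values of two samples can be computed as below. Whether the paper uses the raw count or the normalized fraction is not stated, so the normalized form (which lies in [0, 1], like the values in Table 4) is an assumption, and the binned categories are made up.

```python
import numpy as np

def hamming(u, v, normalize=True):
    """Number (or fraction) of positions at which two equal-length
    sequences of binned/categorical values disagree."""
    u, v = np.asarray(u), np.asarray(v)
    mismatches = np.count_nonzero(u != v)
    return mismatches / u.size if normalize else mismatches

# Binned feature vectors of two error samples (categories are illustrative).
a = ["age_40+", "bal_low",  "prod_1", "geo_FR", "cred_650-750"]
b = ["age_40+", "bal_high", "prod_1", "geo_DE", "cred_650-750"]
print(hamming(a, b))  # 0.4 -> the samples disagree on 2 of 5 binned features
```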

3.5.3 Jaccard Similarity Score

The Jaccard similarity estimates the likeness between finite sample sets and is characterized as the cardinality of the intersection of the sets divided by the cardinality of their union. In [13], Jaccard similarity has been used to find dissimilarity between the frames of a video to detect motion, wherein pixels are tokenized and hashed. The Mahalanobis distance is calculated between each feature and the target variable to find the within-class variance. The feature with the highest Mahalanobis distance is the most confusing. The feature with the maximum between-class variance also has the minimum within-class variance. The between-class variance of each feature is calculated by using the Hamming distance and the Jaccard similarity score. The error analysis sample set is split into two classes: (a) samples which were wrongly predicted as Exited and (b) samples which were wrongly predicted as Not-Exited. A distance measure can be considered as an inverse of a similarity measure. The features that contribute to wrong prediction are the ones which have smaller values of distance and larger values of similarity in the error samples. These are the confusing features that result in wrong prediction. The following details can be concluded from the observations in Table 4. Concerning Mahalanobis distance, ‘Gender’, ‘Geography’, and ‘NumOfProducts’ are the confusing features as they have the highest Mahalanobis distance. With respect to Hamming distance, the similarity in the dominant features, ‘Credit Score’ and ‘Balance’, has resulted in the wrong classification. It can be observed that ‘Balance’ and ‘NumOfProducts’ are the most dominant features based on Fig. 1, but ‘Balance’ and ‘NumOfProducts’ were found to be confusing features based on the error analysis. To quantify the importance of these features, they were removed and the respective results are tabulated in Table 5.

Table 4 Results of error analysis
Feature               Binned features               Mahalanobis distance   Hamming distance   Jaccard similarity score
Geography (Germany)   -                             21.4934                0.3846             0.6153
Credit score          Cred_Score(550, 650)          17.5018                0.2857             0.7142
                      Cred_Score(750, 850)                                 0.2527             0.7472
Gender                -                             22.2626                0.4065             0.5934
Balance               Balance (140,000, 260,000)    16.4628                0.2637             0.7362
NumOfProducts         NumOfProducts (1, 4)          21.0203                0.3956             0.6043

Table 5 Accuracy and roc-score after removing confusing features
Features removed    RandomForest                  XgBoost
                    Accuracy    Roc-auc score     Accuracy    Roc-auc score
None                87.1        74.62             86.88       74.62
NumOfProducts       83.95       66.35             84.6        66.21
Credit score        86.45       73.64             86.5        73.11

There is a drop in accuracy from 87.1 to 83.95 and in ROC score from 74.62 to 66.35 when ‘NumOfProducts’ alone is removed. The features in the dataset were ranked based on their importance with respect to both RandomForest and XGBoost, for the whole sample set and when wrongly classified samples alone are taken. When the whole sample set was taken, it could be seen that Age, Credit Score, Balance, and NumOfProducts were the top-ranked features for both algorithms. When wrongly classified samples alone were taken, it can be observed that the same features (Age, Credit Score, and Balance) are the most dominant.

4 Conclusion and Future Work Error analysis and identifying the features causing the error is very important in this machine learning age. This paper has considered the Customer retention data for error analysis. The data is tested and analysed with various classifiers and it has been observed that Random Forest and XGboost algorithms perform very well for the data set under consideration. The misclassified data of these two classifiers is considered


for error analysis. The features are ranked based on Gini impurity which is used in the default Scikit-Learn implementation. Based on the observations, features that were found to be the confusing features were ranked higher in the feature importance ranking with respect to both Random Forest and XGBoost. When these particular features alone were removed, accuracy dropped to a great extent. This affirms the fact that an algorithm gives more significance to an attribute that is not capable of separating a data sample into two classes. The classifier performance is analyzed and top ranking features are listed under three circumstances: actual data with less bias, with 50% error samples and with 100% error samples. This paper lists the possible confusion features that are responsible for misclassification and compares with the actual data. The study can be widened by assessing and performing error analysis with datasets from various sources to identify a general pattern and to check whether this stratified error analysis can be generalized. Feature ranking for various sampling techniques and diverse machine learning algorithms needs to be explored to get a clearer understanding of their influence on feature ranking.

References 1. Reichheld, F.F., Sasser, E.: Zero defections: quality comes to services. Harvard Bus. Rev. 68(5), 105–111 (1990) 2. Sayed, H., Abdel-Fattah, M.A., Kholief, S.: Predicting potential banking customer churn using apache spark ML and MLlib packages: a comparative study. Int. J. Adv. Comput. Sci. Appl. 9, 674–677 (2018). https://doi.org/10.14569/IJACSA.2018.091196 3. Sahar, F.: Machine-learning techniques for customer retention: a comparative study. Int. J. Adv. Comput. Sci. Appl. 9 (2018). https://doi.org/10.14569/IJACSA.2018.090238 4. Au, T., Ma, G., Li, S.: Applying and evaluating models to predict customer attrition using data mining techniques. J. Comp. Int. Manage. 6(1), 10 (2003) 5. Qureshi, S.A., Rehman, A.S., Qamar, A.M., Kamal, A., Rehman, A.: Telecommunication subscribers’ churn prediction model using machine learning. In: Eighth International Conference on Digital Information Management (ICDIM 2013), Islamabad, pp. 131–136 (2013). https:// doi.org/10.1109/ICDIM.2013.6693977 6. Umayaparvathi, V., Iyakutti, K.: Applications of data mining techniques in telecom churn prediction. Int. J. Comput. Appl. 42, 5–9 (2012). https://doi.org/10.5120/5814-8122 7. He, B., Shi, Y., Wan, Q., Zhao, X.: Prediction of customer attrition of commercial banks based on SVM model. Procedia Comput. Sci. 31, 423–430 (2014). https://doi.org/10.1016/j.procs. 2014.05.286 8. Devi Prasad, U., Madhavi, S.: Prediction of churn behaviour of bank customers using data mining tools. Indian J. Mark. 42(9), 25–30 (2012) 9. Xia, G., Jin, W.: Model of customer churn prediction on support vector machine. Syst. Eng. Theor. Pract. 28, 71–77 (2008). https://doi.org/10.1016/S1874-8651(09)60003-X 10. Natarajan, V., Bharadwaj, L.A., Krishna, K.H., Aravinth, J.: Urban objects classification from HSR -HTIR data using gaussian and mahalanobis distance classifiers. In: Proceedings of the 2018 IEEE International Conference on Communication and Signal Processing (ICCSP 2018), Chennai, pp. 1041–1045 (2018) 11. Mahalanobis, P.C.: On the generalized distance in statistics. Proc. Natl. Inst. Sci. India 2(1), 49–55 (1936)


12. Chandini, B., Nirmala, Devi M.: Analysis of circuits for security using logic encryption. In: Thampi S., Madria S., Wang G., Rawat D., Alcaraz Calero J. (eds.) Security in Computing and Communications. SSCC, Communications in Computer and Information Science, vol. 969. Springer, Singapore (2018) 13. Srenithi, M., Kumar, P.: Motion detection algorithm for surveillance videos. In: Pandian, D., Fernando, X., Baig, Z., Shi, F. (eds.) Proceedings of the International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB). ISMAC 2018. Lecture Notes in Computational Vision and Biomechanics, vol. 30, pp. 955–964. Springer Netherlands (2019)

Prediction Based Task Scheduling for Load Balancing in Cloud Environment Suresh Chandra Moharana, Amulya Ratna Swain, and Ganga Bishnu Mund

Abstract The exponential growth in demand for computing resources laid the foundation of cloud computing technology. Cloud computing enables the provision of virtual resources in terms of Virtual Machines (VMs) to service user requests. The user tasks are scheduled on these VMs for their accomplishment. However, the services of cloud computing are web-based, and hence the workload over the VMs gets updated dynamically. In order to handle the dynamic workload, smarter task scheduling heuristics need to be incorporated in the cloud models. The absence of a proper task scheduling scheme may result in uneven load distribution across VMs, leading to inefficient utilization of resources. In this work, a prediction based task scheduling scheme is proposed that handles the dynamically changing workload efficiently. It has been seen that the proposed model lessens the load imbalance level across VMs as compared to contemporary task scheduling models. Keywords Task scheduling · VM · Load balancing · MAD

1 Introduction The increasing demand for computing resources enabled cloud computing to provide unlimited resources to the end-user. Cloud computing is based on distributed computing concepts and it offers services to users over the web. It arranges infrastructure, platform, and software as services as per the pay-per-use model [1, 2]. Besides, cloud computing also reduces the cost of building and managing infrastructure by providing scalable virtualized resources to the end-user [3]. Virtualization [4] helps cloud S. C. Moharana (B) · A. R. Swain · G. B. Mund KIIT Deemed to be University, Bhubaneswar, Odisha 751024, India e-mail: [email protected] A. R. Swain e-mail: [email protected] G. B. Mund e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_76


computing to provide scalable resources, in terms of Virtual Machines (VMs), to the end-user. The user tasks are allocated to these VMs for their accomplishment. However, the enormous growth in the need for these resources is motivating cloud service providers to use their resources towards fulfilling more user service requests [5, 6]. This may lead to an uneven distribution of workload across these VMs, leading to inefficient usage of computing resources. Hence, there is a necessity of distributing the workload evenly across the VMs, leading to load balancing. This will not only use the resources efficiently but also satisfy the QoS requirements effectively. Further, it is also observed that the allocation of resources to tasks in the cloud is NP-complete [7]. So, it is a challenging task to build up a task scheduling model for cloud resources. The above-mentioned challenge can be addressed by designing an appropriate task scheduling heuristic that balances the load across VMs, leading to the effective usage of resources. Further, in the cloud model the workload is always unpredictable and hence requires a dynamic scheduling model to handle the unpredictability. In this work, the objective is to design a prediction based task scheduling model that leads to load balancing across the available VMs. At first, the overloaded VMs get selected by using an upper threshold. The upper threshold is chosen dynamically using a popular statistical method, considering the historic CPU usage information of the VMs. Then, the tasks that need to be remapped to VMs other than the current one are selected using a pre-defined heuristic. Finally, the identified tasks are rescheduled from the overloaded VMs to the non-overloaded ones in order to achieve load balancing. The load imbalance level parameter is taken into account to compare the proposed scheme with the contemporary models. After performing extensive experimentation, it has been seen that the proposed model decreases the load imbalance level marginally as contrasted with the current methodologies. So, the proposed model not only achieves load balancing but also handles the dynamic workload efficiently. The remainder of the paper is organized as follows. The next section presents a summary of the literature closely linked to the current work. Section 3 provides the details of the proposed system model. In Sect. 4, the assessment of the proposed model is featured. Finally, the last section highlights the concluding remarks and future directions.

2 Related Work The assignment of tasks to VMs, famously known as task scheduling in the cloud model, has been widely studied in the literature. Among these studies, scheduling of tasks for balancing the load in the cloud has taken a reasonable place. As per the literature [8], the objective of load balancing schemes is not only to distribute the load evenly across VMs but also to maximize the utilization of computing resources. Milani and Navimipour [9] have discussed the different load balancing schemes applicable to the cloud environment. Besides, they have also mentioned the challenges faced by load balancing algorithms. Patel et al. [10] also studied the varied load balancing


schemes in the cloud environment. They have classified the mentioned load balancing schemes into various categories and highlighted the pros as well as cons of each method. The resource allocation problem can be classified as NP-complete in nature. So, there is a need for developing heuristic and meta-heuristic schemes for addressing this problem. Freund et al. [11] discussed the max–min and min–min heuristics for scheduling tasks in a distributed environment. He et al. [12] suggested a modification of the min–min heuristic by taking QoS into consideration. The literature [13, 14] has focused on task scheduling schemes in the cloud environment that take QoS into account. Umarani et al. [15] have presented an ant colony optimization based meta-heuristic model for scheduling of tasks; however, it suffers from prolonged waiting time. Cho et al. [16] proposed a hybrid meta-heuristic approach towards addressing the scheduling problem, taking both ant colony optimization and particle swarm optimization into consideration. The task scheduling heuristics termed round-robin and random are presented in the literature [17, 18]. The work presented by Rimal et al. [18] used round-robin for even distribution of load across computing resources, but it does not take the loads of VMs into consideration. It has been observed that load balancing schemes in the cloud environment can also be either migration based or prediction based. Migration based task scheduling schemes transfer the running tasks from overloaded VMs to the non-overloaded ones without service disruption. A particle swarm optimization based task mapping scheme is proposed in [19], in which rather than migrating the overburdened VM, the tasks on the overburdened VM are moved to achieve load balancing. Wu et al. [20] have presented a prediction based task mapping scheme relying upon previous data. The focus of the authors is to predict the VM needs in advance and schedule the tasks accordingly to accomplish load balancing. Bala and Chana [21] presented a predictive approach to identify the overloaded and underloaded VMs and highlighted a migration based scheme of task scheduling from the over-burdened VMs to the under-burdened ones to balance the load across VMs. After reviewing the literature, the following research gaps are identified. It has been observed that prediction based task scheduling schemes based on statistical techniques are missing. Alongside this, most of the literature has considered only a single parameter for taking decisions in their models. It has also been found that the underutilized virtual machines are taken into account in only a few pieces of literature. In this proposed scheme, the authors have addressed some of the gaps mentioned above. In the next section, the details of the proposed task scheduling scheme will be discussed.

3 System Design In the proposed model, the authors have presented a prediction based task mapping scheme for achieving load balancing. For this model, the tasks are assumed to be independent of one another. At first, the underloaded hosts are detected by considering the lower threshold as 15% CPU


usage. The tasks available on these hosts are randomly placed over the available VMs. The next job is to identify the overloaded VMs. In order to detect the overloaded VMs, an upper threshold is computed based on a statistical method. As suggested by Beloglazov and Buyya [22], the median absolute deviation (MAD) has been employed to decide upon the upper threshold. The MAD value is computed by taking the previous CPU usage values of the available VMs and is used to predict the upper threshold value dynamically. The technique used for computing the MAD value is given below:

v_MAD = median(|h_CPU_i − h_CPU|) for 1 ≤ i ≤ n    (1)

where n represents the count of currently active VMs, v_MAD represents the MAD value, h_CPU_i represents the previous CPU usage value of the ith VM, and h_CPU represents the mean CPU usage value of the n active VMs. Then, the upper threshold value (u_THR) gets predicted using the rule

u_THR = 1 − k · v_MAD for 0 ≤ k ≤ 1    (2)
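A small Python sketch of Eqs. 1 and 2 is given below; the sample CPU-usage history and the resulting list of overloaded VMs are illustrative assumptions (the paper's own implementation is in Java).

```python
import statistics

def upper_threshold(cpu_history, k=0.7):
    """Eqs. 1-2: MAD of the historic CPU usage values, then u_THR = 1 - k * v_MAD."""
    mean_cpu = statistics.mean(cpu_history)
    v_mad = statistics.median(abs(c - mean_cpu) for c in cpu_history)
    return 1 - k * v_mad

# Historic CPU utilization (fraction of capacity) of the active VMs.
history = [0.35, 0.42, 0.80, 0.91, 0.55, 0.62]
u_thr = upper_threshold(history)
overloaded = [i for i, c in enumerate(history) if c > u_thr]
print(round(u_thr, 3), overloaded)
```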

Given Eq. 2, as the value of k moves towards 1, u_THR gets a lesser value; as k moves towards 0, it gets a larger value. The value of k decides the aggressiveness of the VM for the accomplishment of the tasks assigned to it. In the proposed scheme, the k value is assumed to be the standard value of 0.7. The complete proposed system model is provided in Fig. 1 for reference. In the second phase of the presented model, task selection is carried out after the selection of the overloaded VMs. For each task running on the available VMs, a matrix is maintained that keeps a record of the workload (in MIPS) and the priority of each one. The selection value of a task is computed as

α · t_LOAD + (1 − α) · t_PRT    (3)

Fig. 1 Proposed system model


The task with the highest value is selected for migration from the previously selected VMs. Nevertheless, a random strategy is applied to break ties in task selection. The α value is meant for keeping the balance between the task load (t_LOAD) and its priority (t_PRT), and it must be chosen according to the environmental requirements. In the proposed model, the α value is chosen as 0.6, as it leads to the best results. It is worth mentioning that, at the VM selection phase, the VMs are segregated into either overloaded or non-overloaded ones. After a task gets selected from an overloaded VM, it is migrated to a non-overloaded VM. This process continues in iterations in order to achieve load balancing across the VMs. As a result, the computing resources are also utilized efficiently. The next section will highlight the assessment of the presented model and compare the findings with the existing approaches.
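The weighted selection rule of Eq. 3 can be illustrated as follows; the task attributes and the tie-breaking via random choice are shown schematically and are not the authors' Java implementation.

```python
import random

def select_task(tasks, alpha=0.6):
    """tasks: list of (task_id, load_in_mips, priority) tuples for one overloaded VM.
    Returns the task with the highest alpha*t_LOAD + (1-alpha)*t_PRT (Eq. 3),
    breaking ties randomly."""
    best_score = max(alpha * load + (1 - alpha) * prio for _, load, prio in tasks)
    candidates = [t for t in tasks
                  if abs(alpha * t[1] + (1 - alpha) * t[2] - best_score) < 1e-9]
    return random.choice(candidates)

tasks_on_vm = [("t1", 400, 2), ("t2", 250, 5), ("t3", 400, 2)]
print(select_task(tasks_on_vm))  # t1 or t3, chosen at random (they tie on the score)
```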

4 Experimental Evaluation and Results The experimental environment consists of an HP ProBook system with a Core i5 8th-generation processor, 8 GB of RAM, and the Ubuntu 18.04 OS. The proposed model is implemented in the Java programming environment. The collection framework in Java has been used for realizing the proposed scheme. One list is created in Java for each VM. The CPU requirements of the user tasks are represented as random values in a pre-defined range, and these values are stored in the list. This simulates the scheduling of tasks to VMs. Then, the proposed model is executed over many iterations. In the first experiment, 20 VMs are considered, whereas the second experiment considers 40 VMs for analyzing the performance of the presented scheme against contemporary approaches. The performance metric Load_Imbalance_Level has been used to measure the performance of the presented model against the existing schemes. The Load_Imbalance_Level parameter can be mathematically defined as

Load_Imbalance_Level = (1/n) Σ_{i=1}^{n} (c_LOAD_i − c_LOAD_{i+1})    (4)

where c_LOAD_i represents the CPU load of the ith VM after applying the proposed scheme. The values of c_LOAD_i have been recorded at intervals of five iterations to analyze the effectiveness of the presented scheme. It has been observed that the presented scheme outperforms the existing model in terms of Load_Imbalance_Level in both experiments, as shown in Fig. 2. As the Load_Imbalance_Level gets reduced in the presented scheme, the CPU load difference among the active VMs gets minimized as well, leading to load balancing. The next section will present the concluding remarks and provide future directions as well.
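For illustration, the snippet below computes a load-imbalance value of this form. Note the assumptions: Eq. 4 as printed is ambiguous about the last index and does not show absolute values, so the sketch takes the mean of absolute successive load differences, which matches the stated intuition of minimizing the CPU load difference among active VMs; the sample load vector is also made up.

```python
def load_imbalance_level(c_load):
    """Mean absolute difference between the CPU loads of successive VMs
    (an interpretation of Eq. 4; see the caveat in the text above)."""
    diffs = [abs(c_load[i] - c_load[i + 1]) for i in range(len(c_load) - 1)]
    return sum(diffs) / len(c_load)

# CPU load of each active VM after applying the scheduling scheme.
print(load_imbalance_level([0.72, 0.55, 0.61, 0.58, 0.60]))  # lower is better
```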


Fig. 2 Performance assessment of proposed model

5 Conclusion The scheduling of user service requests to the virtual machines in the cloud environment to balance the load is extensively studied in the literature. However, prediction based task scheduling using the statistical approaches can be an extra addition. In this work, a prediction based task scheduling scheme is presented based on the median absolute deviation. The work aims to reduce the CPU load difference among the active VMs to achieve load balancing among the VMs. The experiment results suggest that the presented model outperforms the contemporary scheduling schemes in terms of load imbalance level. In the future, the plan is to incorporate multiple parameters like memory usage, network bandwidth into the proposed model and analyze its performance.


References 1. Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., Zaharia, M.: A view of cloud computing. Commun. ACM 53(4), 50–58 (2010) 2. Armbrust, M., Fox, A., Griffith, R., Joseph, A., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., Zaharia, M.: Above the Clouds: A Berkeley View of Cloud Computing. Rep. UCB/EECS-2009-28, University of California at Berkley, USA (2009) 3. Yousafzai, A., Gani, A., Noor, R.M., Sookhak, M., Talebian, H., Shiraz, M., Khan, M.K.: Cloud resource allocation schemes: review, taxonomy, and opportunities. Knowl. Inf. Syst. 50(2), 347–381 (2017) 4. Panda, B., Moharana, S.C., Das, H., Mishra, M.K.: Energy aware virtual machine consolidation for load balancing in virtualized environment. In: 2019 International Conference on Communication and Electronics Systems, pp. 180–185. IEEE, India (2019) 5. Singh, S., Chana, I.: Cloud resource provisioning: survey, status and future research directions. Knowl. Inf. Syst. 49(3), 1005–1069 (2016) 6. Arunarani, A., Manjula, D., Sugumaran, V.: Task scheduling techniques in cloud computing: a literature survey. Future Gener. Comput. Syst. 91, 407–415 (2019) 7. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NPCompleteness. Freeman, San Francisco (1979) 8. Subrata, R., Zomaya, A.Y., Landfeldt, B.: Game-theoretic approach for load balancing in computational grids. IEEE Trans. Parallel Distrib. 19(1), 66–76 (2008) 9. Milani, A.S., Navimipour, N.J.: Load balancing mechanisms and techniques in the cloud environments: systematic literature review and future trends. J. Netw. Comput. Appl. 71, 86–89 (2016) 10. Patel, D.K., Tripathy, D., Tripathy, C.R.: Survey of load balancing techniques for grid. J. Netw. Comput. Appl. 65, 103–119 (2016) 11. Freund, R.F., Gherrity, M., Ambrosius, S.L., Campbell, M., Halderman, M., Hensgen, D.A., Keith, E.G., Kidd, T., Kussow, M., Lima, J.D., Mirabile, F., Moore, L., Rust, B., Siegel, H.J.: Scheduling resources in multi-user, heterogeneous, computing environments with SmartNet. In: Proceedings of the 7th Heterogeneous Computing Workshop, pp. 184–199. IEEE, USA (1998) 12. He, X., Sun, X., Von, L.G.: QoS guided min-min heuristic for Grid task scheduling. J. Comput. Sci. Technol. 18(4), 442–451 (2003) 13. Wu, X., Deng, M., Zhang, R., Zeng, B., Zhou, S.: A task scheduling algorithm based on QoS-driven in cloud computing. Procedia Comput. Sci. 17, 1162–1169 (2013) 14. Ali, H.G.E.D.H., Saroit, I.A., Kotb, A.M.: Grouped tasks scheduling algorithm based on QoS in cloud computing network. Egypt. Inform. J. 18(1), 11–19 (2017) 15. Umarani, S.G., Maheswari, V.U., Shanthi, P., Siromoney, A.: Tasks scheduling using ant colony optimization. J. Comput. Sci. 8(8), 1314–1320 (2012) 16. Cho, K.M., Tsai, P.W., Tsai, C.W., Yang, C.S.: A hybrid meta-heuristic algorithm for virtual machine scheduling with load balancing in cloud computing. Neural Comput. Appl. 26(6), 1297–1309 (2014) 17. Lee, Y.C., Zomaya, A.Y.: Energy efficient utilization of resources in cloud computing systems. J. Supercomput. 60(2), 268–280 (2012) 18. Rimal, B.P., Choi, E., Lumb, I.: A taxonomy and survey of cloud computing systems. In: Fifth International Joint Conference on INC, IMS and IDC, pp. 44–51. IEEE, South Korea (2009) 19. Ramezani, F., Lu, J., Hussain, F.K.: Task-based system load balancing in cloud computing using particle swarm optimization. Int. J. Parallel Prog. 42(5), 739–754 (2013) 20. 
Wu, H.S., Wang, C.J., Xie, J.Y.: TeraScaler ELB—an algorithm of prediction-based elastic load balancing resource management in cloud computing. In: 27th International Conference on Advanced Information Networking and Applications Workshops, pp. 649–654. IEEE, Spain (2013)


21. Bala, A., Chana, I.: Prediction-based proactive load balancing approach through virtual machine migration. Eng. Comput. 32(4), 581–592 (2016) 22. Beloglazov, A., Buyya, R.: Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurr. Comput.: Pract. Exp. 24(13), 1397–1420 (2011)

Test Case Generation Using Adequacy-Based Genetic Algorithm Ruchika Malhotra and Shivani Pandey

Abstract Generating test cases is one of the most time- and effort-consuming problems in software testing. Many efforts have been made to automate this problem so as to make the procedure of software testing more efficient. The major part of these solutions involves the use of evolutionary techniques. Genetic algorithm is associated with automating the problem of test case generation since early 1990s. This paper presents an alternative way of using genetic algorithm for test case generation. It involves adequacy-based approach where the mutants are incorporated into the source code while generating the test cases. This approach will not only help in producing efficient results but also will reduce ample amount of time taken in the process. The results show that the intended approach undergoes an effective decline in the obtained number of test cases when compared to the path testing approach. Keywords Test case · Test case generation · Evolutionary techniques · Genetic algorithm · Adequacy-based approach

1 Introduction In conventional life cycle of developing the software, the software testing procedure takes nearly half of the development budget, more than half of the total development time, and maximum effort compared to all the other phases [1]. The process of software testing comprises of three main phases considering test cases: (i) generation, (ii) execution, and (iii) evaluation which can be singularly described as: Test case generation is the process which involves developing the relevant test cases in accordance with a particular software system. Further, the test cases are executed for the verification of software functionalities in the process called test case execution. The R. Malhotra · S. Pandey (B) Department of Computer Science and Engineering, Delhi Technological University, New Delhi, India e-mail: [email protected] R. Malhotra e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_77


third phase, known as test case evaluation, involves recording the test cases which were useful and provided value to the entire process of software testing [2]. The most crucial one out of all these phases is test case generation as it takes the maximum cost, effort, and time out of all the three phases [1], and its execution requires a certain level of expert knowledge. The process of automating test case generation can greatly reduce the overall development time and cost of the software testing procedure which in turn will reduce the time and cost of overall software development procedure [3]. There has been a lot of research done in the field involving automation of software testing. Most of the work regarding automation of test case generation involves the use of evolutionary techniques. Genetic algorithm is the most used evolutionary technique in automatic test case generation [4]. It imitates the process of the natural biological evolution built on the notion: ‘survival of the fittest’ given by Charles Darwin. This algorithm involves an initial population which evolves into a better population in each generation by allowing the reproduction of the individuals partaking high fitness and discarding the individuals with low fitness values. It involves having an initial population which is checked for its fitness followed by selection of parents having high fitness values to generate a new population which contains the offsprings using the process of crossover and mutation of genes. This new population is then evaluated. This algorithm is iterated until a pre-decided stopping criterion is met [5]. Software testing is broadly categorized into functional testing and structural testing. The process of functional testing is used for checking the functionalities of the software system [6], and structural testing emphases the correctness in structure or hierarchy of the system [7]. Structural testing is further classified into reliabilitybased criteria and adequacy-based criteria. The reliability-based criteria outputs a reliable set of test cases which proves the correctness of the program while adequacybased criteria brings out the fault finding capacity of the test suite generated [8]. This paper uses adequacy-based testing criteria along with incorporating mutation analysis alongside the process of test case generation which will save a substantial amount of time in the whole process. It is an extension of work in [9] and uses the concept with better technology and enriched dataset to get efficient results which prove the fidelity of the technique. The rest of the paper is systematized as: Sect. 2 covers methodology employed in the paper. Section 3 discusses experimental studies, technologies and parametric settings, and the results obtained by the process. Section 4 carries the conclusions and the possibilities of the future work.

2 Methodology This paper proposes an adequacy-based criterion for the generation of test cases. The most common practice for examining test case adequacy is ‘mutation analysis’, which is usually done after the test cases have been generated. This paper proposes a

Fig. 1 Steps of the genetic algorithm in the form of a flowchart (initial population → fitness evaluation → selection procedure → crossover/mutation → stopping condition → end)

technique which uses mutation analysis alongside test case generation, which saves an ample amount of time and automatically generates adequate test cases as the output. The typical process of the genetic algorithm is shown in the form of a flowchart in Fig. 1. In the proposed method, we first generate mutants in the program by making slight variations in the source code. Then, we record the difference between the original and the mutated statement of the source code and generate the respective constraints accordingly. The solutions to these constraints represent the test cases. Subsequently, using the rules given in [10], we construct the fitness function for the source code. This fitness function is then fed to the genetic algorithm for the generation of the test cases. The resulting test data has the capability to kill other mutants, recorded at the initial level, along with the current one. We then examine the status of the other mutants; if any mutant is still alive, we repeat the process until all the mutants are killed. Figure 2 lists the steps of the proposed process in detail; an illustrative sketch follows the listing.


1. Identify the working source code for a program.
2. Generate the mutants in definite statements of the source code.
3. For a particular mutant, precisely record the differences between the mutated and original statements.
4. Generate the constraints for the program according to the recorded differences.
5. Generate the fitness function for the program using the constraints generated.
6. Feed the generated fitness function to the genetic algorithm module for generating the test data.
7. If all other mutants are killed by the generated test data, end the process.
8. Else, consider the next live mutant and go to step 3.
9. Repeat steps 7 and 8 until all the mutants are processed.
10. End.

Fig. 2 Steps showing the proposed method in detail
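To make the flow in Fig. 2 concrete, the following is a minimal Python sketch of the outer loop. The helpers generate_mutants, build_constraints, build_fitness, run_ga and kills are hypothetical names standing in for steps 2–7 of the listing; the sketch is illustrative only, not the implementation used in this work.

```python
# Illustrative sketch of the adequacy-based loop of Fig. 2. All helper functions
# passed in as arguments are hypothetical stand-ins for steps 2-7 of the listing.

def adequacy_based_generation(source_code, generate_mutants, build_constraints,
                              build_fitness, run_ga, kills, n_mutants=5):
    mutants = generate_mutants(source_code, n_mutants)        # step 2
    live = set(mutants)
    test_suite = []
    while live:
        mutant = live.pop()                                   # steps 3 and 8: next live mutant
        constraints = build_constraints(source_code, mutant)  # step 4
        fitness = build_fitness(constraints)                  # step 5
        test_data = run_ga(fitness)                           # step 6: the GA module
        test_suite.append(test_data)
        live = {m for m in live if not kills(test_data, m)}   # step 7: other mutants may also die
    return test_suite
```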

Table 1 Details of the dataset

S. No. | Program | Description | LOC
P1 | snake.c | Snake and bits game | 564
P2 | bike.c | Bike racing game | 581
P3 | pacman.cpp | Pacman game | 647
P4 | sal.c | Snake and ladder game | 867
P5 | heli.cpp | Helicopter attack game | 959
P6 | helilearn.cpp | Helicopter driving game | 1048
P7 | fortell.cpp | Fortune teller game | 1059
P8 | tank.c | Tank game | 1092

3 Experimental Studies
3.1 Dataset
In this work, we have used the source codes of eight real-time programs, which range from 564 to 1092 lines of code. All these programs are developed in C/C++. All the selected source codes are game-based programs and are fully functional. The details of the programs used are shown in Table 1.

3.2 Fitness Function Construction
Once we have finalized the dataset, we consider each program source code individually and apply the process described in Sect. 2 until the desired results are obtained.
Mutant generation. A mutant in a source code is introduced by an intentional alteration of the source code [11]. For each program considered, we have chosen five mutants by introducing a slight variation in the source code. While choosing the


mutants, we have to make sure that we choose mutants whose execution leads the program onto a path different from the expected one; this is how a mutant is identified and killed. We have recorded each of these mutants carefully along with its processing status during the execution. For each mutant, a total of 10 runs are iterated in each program.
Constraint generation. This is the crucial step which ensures the correctness of the obtained test data [8]. After the generation of mutants, we record the differences between the mutated statement and the original statement of the source code. We record these differences in a specific format to generate the constraints for the mutant. The solutions to these constraints give us the required test cases for the specific program.
Fitness function. The main element in a genetic algorithm is its fitness function. Based on the value of the fitness function, the algorithm decides the goodness measure of each individual in the population [12]. The performance of the algorithm depends solely on how effective the associated fitness function is; hence, constructing the fitness function is the most crucial step in the execution of the genetic algorithm. The procedure adopted for the construction of the fitness function is the one followed in [9]. It generates the fitness function for the mutant in consideration, which is then fed to the genetic algorithm module of Python to obtain the desired results.
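As an illustration of how a recorded difference can turn into a constraint and then a fitness value, consider a hypothetical mutant (not one of the paper's actual mutants) and a branch-distance-style fitness; the exact construction rules follow [9, 10] and are not reproduced here.

```python
# Hypothetical mutant: the original statement "if (score > 100)" is mutated to
# "if (score >= 100)". The two branches differ only when score == 100, so the
# constraint that kills the mutant is score == 100. A branch-distance-style
# fitness, to be minimised by the GA, can then be written as:

def fitness(candidate):
    score = candidate[0]          # one gene of a GA individual
    return abs(score - 100)       # 0 exactly when the killing constraint holds

# A GA that drives this fitness to 0 yields test data with score == 100, i.e. a
# test case whose outcome distinguishes the mutant from the original program.
```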

3.3 Parametric Settings
The settings of the adequacy-based algorithm are as follows (an illustrative sketch using these settings follows the list):
1. Initial population size: 100
2. Genetic algorithm software: GA module of Python
3. Selection technique: roulette wheel selection
4. Representation scheme: double vector
5. Crossover technique: single-point crossover
6. Mutation rate: 0.01
7. Crossover probability: 0.09
8. Number of generations (maximum): 40
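Below is a compact, self-contained sketch of a GA configured with the settings listed above. The gene encoding, the bounds, and the run_ga interface are placeholders, since the specific 'GA module of Python' used by the authors is not identified; the sketch is an approximation, not the authors' configuration.

```python
import random

POP_SIZE, GENERATIONS = 100, 40
P_CROSSOVER, P_MUTATION = 0.09, 0.01
GENE_LOW, GENE_HIGH, N_GENES = -1000, 1000, 2   # placeholder double-vector encoding

def roulette_select(pop, scores):
    # Lower fitness is better here, so scores are inverted into selection weights.
    weights = [1.0 / (1.0 + s) for s in scores]
    return random.choices(pop, weights=weights, k=1)[0]

def single_point_crossover(a, b):
    point = random.randint(1, N_GENES - 1)
    return a[:point] + b[point:]

def mutate(ind):
    return [random.uniform(GENE_LOW, GENE_HIGH) if random.random() < P_MUTATION else g
            for g in ind]

def run_ga(fitness):
    pop = [[random.uniform(GENE_LOW, GENE_HIGH) for _ in range(N_GENES)]
           for _ in range(POP_SIZE)]
    best = min(pop, key=fitness)
    for _ in range(GENERATIONS):
        scores = [fitness(ind) for ind in pop]
        children = []
        for _ in range(POP_SIZE):
            p1, p2 = roulette_select(pop, scores), roulette_select(pop, scores)
            child = single_point_crossover(p1, p2) if random.random() < P_CROSSOVER else p1[:]
            children.append(mutate(child))
        pop = children
        best = min(pop + [best], key=fitness)
        if fitness(best) == 0:      # killing constraint satisfied: adequate test data found
            break
    return best
```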

3.4 Experimental Results
For each program mentioned in Table 1, we have computed test cases for the proposed technique by taking five mutants in each source code, generating their respective constraints and fitness functions, and subsequently feeding them to the genetic algorithm module of Python. For comparing the efficiency of this adequacy-based technique, we have used the most widely used technique from the reliability-based testing criteria: the path testing technique [13].


Table 2 Total test cases generated as per adequacy-based and reliability-based testing techniques

S. No. | Program source code | Adequacy-based technique (proposed technique) | Reliability-based technique (path testing technique)
P1 | snake.c (Snake and bits game) | 43 | 111
P2 | bike.c (Bike racing game) | 64 | 104
P3 | pacman.cpp (Pacman game) | 117 | 199
P4 | sal.c (Snake and ladder game) | 33 | 415
P5 | heli.cpp (Helicopter attack game) | 124 | 172
P6 | helilearn.cpp (Helicopter driving game) | 107 | 179
P7 | fortell.cpp (Fortune teller game) | 100 | 373
P8 | tank.c (Tank game) | 165 | 241

For the path testing technique, we have constructed control flow graphs (CFGs) for each of the program source codes mentioned in Table 1 and then chose five unique paths from those CFGs [14]. We have followed the method given in [9] for the construction of the fitness functions for each of these paths. Upon the construction of these fitness functions, we feed them to our genetic algorithm module, as we did for the proposed technique; in the same manner, this technique is also iterated for 10 runs. Table 2 shows the total number of test cases generated by both techniques, namely the adequacy-based technique (proposed technique) and the reliability-based technique (path testing technique). For each of the five mutants and paths selected in each program, the total number of unique test cases is recorded for each of the 10 runs. We then take the average of the values for each mutant or path separately. Any decimal value is approximated by rounding down to the floor to obtain an integer. After this, we take the sum of all these approximated values; these are the values shown in Table 2 (the sum of the approximated average values over the 10 runs for the mutants/paths of each program). The comparison between the two techniques is done on the basis of two measures: (i) the total number of test cases generated and (ii) the time taken for generating the test cases. The proposed method generates considerably fewer test cases than the path testing technique. The main reason is that the proposed technique only generates adequate test cases, while the path testing technique generates non-adequate test cases as well. DeMillo and Offutt have stated that an adequate test case set is responsible for the failure of all faulty versions of the considered program [6]. Adequacy mainly focuses on detection of faults by the

Fig. 3 Percentage reduction obtained in the number of test cases (x-axis: program number 1–8; y-axis: % reduction in test cases)

test case set rather than on proving correctness, and this makes it a better alternative. We compare both techniques on the basis of the percentage reduction in test cases, which is calculated by the formula used in [9], shown as Eq. 1 below:

$$\text{percentage reduction} = \frac{TPT - TOT}{TPT} \tag{1}$$

Here, TPT denotes the number of test cases generated by the path testing technique and TOT denotes the number of test cases generated by our proposed technique. The bar chart in Fig. 3 shows the comparison between the two techniques based on the above formula: a 27.9–92.04% reduction is observed in the number of test cases yielded by the proposed technique in comparison with the reliability-based technique. For the path testing method, five unique paths are taken into consideration for each program, and a fitness function is constructed for each of these five paths separately before feeding them to the genetic algorithm module. In the proposed technique, however, we have chosen five mutants which, upon execution, kill one or more of the other mutants. Therefore, we have to construct the fitness function only for those mutants which are not killed by their fellow mutants. For instance, if 3 out of 5 mutants are killed in the process, we have to construct the fitness function for only two mutants, saving 60% of the time compared with the path testing technique. Hence, an ample amount of time is saved in the proposed technique compared with the time taken by the path testing technique. The bar chart in Fig. 4 shows the percentage reduction in time for the proposed method on the considered dataset when compared with the path testing technique: the time taken in generating the test cases undergoes a 20–60% reduction in the proposed technique in comparison with the reliability-based technique.
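For reference, the percentage reductions plotted in Fig. 3 can be recomputed directly from Eq. 1 and the Table 2 values, for example with the following short script (the numbers are taken from Table 2):

```python
# Percentage reduction (Eq. 1) recomputed from the Table 2 values.
tpt = {"P1": 111, "P2": 104, "P3": 199, "P4": 415,
       "P5": 172, "P6": 179, "P7": 373, "P8": 241}   # path testing technique
tot = {"P1": 43,  "P2": 64,  "P3": 117, "P4": 33,
       "P5": 124, "P6": 107, "P7": 100, "P8": 165}   # proposed technique

for p in sorted(tpt):
    reduction = 100 * (tpt[p] - tot[p]) / tpt[p]
    print(f"{p}: {reduction:.2f}% reduction")
# P5 gives about 27.9% and P4 about 92.0%, consistent with the 27.9-92.04% range of Fig. 3.
```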

Fig. 4 Percentage reduction in time taken in generating the test cases (x-axis: program number 1–8; y-axis: % reduction in time)

4 Conclusion
In this work, a genetic algorithm has been used for the generation of adequate test cases. We have adopted the concept of generating test cases with simultaneous mutation analysis so as to automatically generate adequate test cases and save additional time. We have taken a rich dataset of eight real-time programs that range up to 1092 lines of source code, implemented the technique on this dataset, and obtained promising results. The results of the implementation were compared against the most used reliability-based technique, the path testing technique. Considering the number of generated test cases, we have recorded a reduction of up to 92.04% in comparison with the path testing technique, which is substantially better than the parent research. In terms of the time taken to generate these test cases, we have recorded savings of up to 60%. Hence, on both comparison criteria, the proposed technique has shown significantly better results than reliability-based testing. Thus, the proposed technique is promising in the area of automatic test case generation. Future aspects of this work include the implementation of other heuristic algorithms following the proposed adequacy-based concept, such as particle swarm optimization, the bat algorithm, and the artificial bee colony algorithm, to verify the efficiency of the technique independent of the algorithm. Further, an automatic tool can be developed for implementing the proposed technique to save the time and effort of developing the code for each attempt.

References 1. Beizer, B.: Software testing techniques. Dreamtech Press (2003) 2. Singh, Y.: Software Testing. Cambridge University Press (2012)


3. McMinn, P.: Search based software test data generation: a survey. Softw. Test. Verif. Reliab. 14(2), 105–156 (2004) 4. Chuaychoo, N., Kansomkeat, S.: Path coverage test case generation using genetic algorithms. J. Telecommun. Electron. Comput. Eng. (JTEC) 9(2–2), 115–119 (2017) 5. Korel, B.: Automated software test generation. IEEE Trans. Softw. Eng. 16(8), 870–879 (1990) 6. Duran, J.W., Ntafos, S.C.: An evaluation of random testing. IEEE Trans. Softw. Eng. 10(4), 438–443 (1984) 7. Jones, B.F., Sthamer, H.H., Eyres, D.E.: Automatic structural testing using genetic algorithms. Softw. Eng. J. 299–306 (1996) 8. DeMillo, R., Offutt, A.J.: Constraint-based automatic test data generation. IEEE Trans. Softw. Eng. 17(9), 900–910 (1991) 9. Malhotra, R., Garg, M.: An adequacy based test data generation technique using genetic algorithms. J. Inf. Process. Syst. 7(2), 363–384 (2011) 10. Chen, Y., Zhong,Y.: Automatic path-oriented test data generation using a multi-population genetic algorithm. In: Fourth International Conference on Natural Computation, pp. 566–570. IEEE (2008) 11. Haga, H., Suehiro, A.: Automatic test case generation based on genetic algorithm and mutation analysis. In: IEEE International Conference on Control System, Computing and Engineering, pp. 119–123. IEEE (2012) 12. Xanthakis, S., Ellis, C.: Application of genetic algorithm to software testing. In: Proceedings of 5th International Conference on Software Engineering and its Applications, pp. 625–636. Toulouse, France (1992) 13. Nirpal, P.B., Kale, K.V.: Using genetic algorithm for automated efficient software test case generation for path testing. Int. J. Adv. Netw. Appl. 2(6), 911–915 (2011) 14. Dahal, K., Hossain, A.: Test data generation from UML state machine diagrams using GAs. In: International Conference on Software Engineering Advances, pp. 834–840. IEEE (2007)

Performance Analysis of π, AL and CT for Consistency Regularization Using Semi-Supervised Learning Rishita Choubey and Koushik Bhattacharyya

Abstract A semi-supervised learning problem starts with a series of labeled data points as well as some data points for which labels are not known. The primary motive of a typical semi-supervised model is to categorize some of the unlabeled data by means of the labeled information set. The training procedures used in the semi-supervised learning paradigm employ different forms of consistency loss. These losses are driven by small perturbations of the inputs and parameters, and they help improve generalization performance in comparison with supervised learning. This research article analyzes the performance of π, active learning (AL) and interpolation consistency training (ICT). Keywords Semi-supervised learning · π model · Active learning · Interpolation consistency training · Consistency regularization · Supervised learning · Neural network · Consistency-based model

1 Introduction
Deep learning models [1] yield better results when trained with an ample amount of supervised data. In real-life scenarios, obtaining a large-scale labeled dataset can be challenging, since the construction of such datasets is usually cost-incurring. Here semi-supervised learning [2, 3] comes into play. By virtue of standard development procedures like reinforcement learning models or generative adversarial networks (GANs) [4, 5], large-scale labeled datasets can be composed, and their potency can be further improved by the implementation of consistency enforcing models. These models are trained using unlabeled data and aim at stabilizing the predictions when subjected to input perturbations. They are widely used for training audio


recognition models and are further utilized in research-oriented fields of medicine and other technologies. After a detailed analysis, we have determined the accuracy of the following consistency enforcing models [6–8] based on the error percentage in their predictions. Hence, our research article presents a performance analysis of the best-known consistency regularization models — the π model, the AL model and the CT model — applied to elementary neural network approaches using conventional datasets, namely CIFAR-10 and SVHN. This gives a clear perception of the three models and of the best among the three.

2 Literature Review
The demand for consistency-based models is rapidly increasing due to the widespread use of semi-supervised learning. One example of such consistency-based models was proposed by Samuli Laine and Timo Aila [4], where self-ensembling [9] is introduced. Here, concurrent predictions for unknown labels are made using the outputs of the network at various time intervals. The proposed theory heavily relies on the concepts of dropout regularization and input augmentation [10]. The drawback is that dropout, when increased beyond a certain threshold, results in overfitting or underfitting of the model. Intuitively, a higher dropout rate introduces variance in some of the layers, which eventually degrades the overall performance of the model; hence, heavy dropout is not used much nowadays. On the other hand, input augmentation is also computationally very expensive (only rotation and scaling are cheap). Another model, grounded in training consistency-based methods using stochastic weight averaging (SWA), was proposed by Ben Athiwaratkun, Marc Finzi, Pavel Izmailov and Andrew Gordon Wilson. It is a recent approach where a modified learning rate is used and the mean of the weights along the arc of stochastic gradient descent (SGD) [7] is evaluated. However, SWA does not give optimum predictions and can often be slow for large datasets, depending on the learning rate of the SGD. Further research includes the concept of active learning (AL) [11], authored by Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis and Tomas Pfister. This is a combination of data labeling and model training. Minimization of the labeling cost is achieved by primarily selecting data of higher value. In AL models based on a pool mechanism, easily accessible unlabeled data are used for selection purposes, but not for training the model. The performance of the AL model is better than that of the π model, but it cannot be considered the optimum model since its error rate is also considerably elevated.


3 Consistency-Based Models
In the semi-supervised setting, we have access to labeled data $D_L = \{(x_i^L, y_i^L)\}_{i=1}^{N_L}$ and unlabeled data $D_U = \{x_i^U\}_{i=1}^{N_U}$. Given two perturbed inputs $x'$, $x''$ of $x$ and the perturbed weights $\omega'_f$ and $\omega'_g$, the consistency loss penalizes the difference between the predicted probabilities $f(x'; \omega'_f)$ and $g(x''; \omega'_g)$. This loss is typically the mean squared error or the KL divergence:

$$l_{cons}^{MSE}(\omega_f, x) = \left\| f(x'; \omega'_f) - g(x''; \omega'_g) \right\|^2 \quad \text{or} \quad l_{cons}^{KL}(\omega_f, x) = \mathrm{KL}\!\left( f(x'; \omega'_f) \,\|\, g(x''; \omega'_g) \right) \tag{1}$$

The total loss used to train the model can be written as:

$$L(\omega_f) = \underbrace{\sum_{(x,y) \in D_L} l_{CE}(\omega_f; x, y)}_{L_{CE}} \; + \; \lambda \underbrace{\sum_{x \in D_L \cup D_U} l_{cons}(\omega_f, x)}_{L_{cons}} \tag{2}$$

where, for classification, $L_{CE}$ is the cross-entropy between the model predictions and the supervised training labels. The parameter $\lambda > 0$ controls the relative importance of the consistency term in the overall loss.
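A small PyTorch-style sketch of Eqs. (1) and (2) is given below, assuming f and g are two stochastic forward passes of the same classifier (e.g. with different dropout and augmentation); it illustrates the shape of the loss, not the exact implementation evaluated in this article.

```python
import torch
import torch.nn.functional as F

def consistency_loss(p_f, p_g, kind="mse"):
    # p_f, p_g: predicted class probabilities for two perturbed views of x (Eq. 1).
    if kind == "mse":
        return F.mse_loss(p_f, p_g)
    # F.kl_div(log_q, p) computes KL(p || q), so this returns KL(p_f || p_g).
    return F.kl_div(p_g.log(), p_f, reduction="batchmean")

def total_loss(logits_labeled, labels, p_f, p_g, lam=1.0):
    # Eq. (2): supervised cross-entropy on D_L plus the weighted consistency term.
    return F.cross_entropy(logits_labeled, labels) + lam * consistency_loss(p_f, p_g)
```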

3.1 Π-Model
The Π-model can also be seen as a simplification of the Γ-model of the ladder network by Rasmus et al. [2, 12, 13], a previously presented network architecture for semi-supervised learning.
Algorithm (Π-model pseudocode):
Require: x_i = training stimuli
Require: L = set of training input indices with known labels
Require: y_i = labels for labeled inputs i ∈ L
Require: w(t) = unsupervised weight ramp-up function
Require: f_θ(x) = stochastic neural network with trainable parameters θ
Require: g(x) = stochastic input augmentation function
for t in [1, num_epochs] do
    for each minibatch B do
        z_{i∈B} ← f_θ(g(x_{i∈B}))    ▷ evaluate network outputs for augmented inputs
        z̃_{i∈B} ← f_θ(g(x_{i∈B}))    ▷ again, with different dropout and augmentation
        loss ← −(1/|B|) Σ_{i∈(B∩L)} log z_i[y_i]    ▷ supervised loss component
               + w(t) · (1/(C|B|)) Σ_{i∈B} ||z_i − z̃_i||²    ▷ unsupervised loss component
        update θ using, e.g., ADAM    ▷ update network parameters
    end for
end for
return θ
The network is evaluated for each training input x_i twice, resulting in prediction vectors z_i and z̃_i. The loss function consists of two components. The first component is the standard cross-entropy loss, evaluated for labeled inputs only. The second component, evaluated for all inputs, penalizes different predictions for the same training input x_i by taking the mean squared difference between the prediction vectors z_i and z̃_i. To combine the supervised and unsupervised loss terms, the latter is scaled by the time-dependent weighting function w(t).
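A minimal PyTorch sketch of one Π-model update is given below. The toy network, the augmentation g, the optimizer settings and the ramp-up shape are assumptions made for illustration; they are not the configuration used in [4] or in the experiments of Sect. 4.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stochastic classifier (dropout makes the two passes differ); assumed, not CNN-13.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 256), nn.ReLU(),
                      nn.Dropout(0.5), nn.Linear(256, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

def g(x):
    # Toy stochastic input augmentation: small additive noise.
    return x + 0.1 * torch.randn_like(x)

def w(t, T, w_max=1.0):
    # Placeholder sigmoid-shaped ramp-up of the unsupervised weight.
    x = min(t / (0.25 * T), 1.0)
    return w_max * math.exp(-5.0 * (1.0 - x) ** 2)

def pi_model_step(x, y, labeled_mask, t, T):
    """One Pi-model update on a minibatch x of shape (B, 3, 32, 32); y holds labels
    for the rows where labeled_mask is True (assumes some labeled inputs per batch)."""
    model.train()
    logits1 = model(g(x))                       # first stochastic pass
    logits2 = model(g(x))                       # second pass: different dropout/noise
    z1, z2 = F.softmax(logits1, dim=1), F.softmax(logits2, dim=1)
    sup = F.cross_entropy(logits1[labeled_mask], y[labeled_mask])   # labeled inputs only
    unsup = F.mse_loss(z1, z2)                  # mean over C*|B| terms of ||z - z~||^2
    loss = sup + w(t, T) * unsup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```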

3.2 Consistency-Based Semi-Supervised AL (Active Learning)
This approach incorporates a semi-supervised learning (SSL) objective in the training phases of AL [8, 11]. The model combines the notion of minimizing sensitivity to perturbations with the idea of inducing "consistency," i.e., imposing similarity of predictions when the input is perturbed in a way that would not change its perceptual content. For consistency-based semi-supervised training, a common choice of loss is:

$$L_u(x, M) = D\!\left( P(\hat{Y} = l \mid x, M),\; P(\hat{Y} = l \mid \tilde{x}, M) \right) \tag{3}$$

where $D$ is a distance function such as the KL divergence or the L2 norm [4, 14], and $\tilde{x}$ denotes a perturbation (augmentation) of the input $x$. Specifically, a simple metric $C$ is proposed that measures the inconsistency across perturbations. There are various ways to quantify consistency; the following is used due to its empirically observed superior performance:

$$C(B, M) = \sum_{x \in B} \varepsilon(x, M), \quad \text{where} \quad \varepsilon(x, M) = \sum_{l=1}^{J} \mathrm{Var}\!\left( P(\hat{Y}=l \mid x, M), P(\hat{Y}=l \mid \tilde{x}_1, M), \ldots, P(\hat{Y}=l \mid \tilde{x}_N, M) \right) \tag{4}$$

Here $J$ is the number of response classes and $N$ is the number of perturbed samples $\{\tilde{x}_1, \ldots, \tilde{x}_N\}$ of the original input data $x$. The AL training loop itself follows the same consistency-regularized procedure as the Π-model pseudocode above (two stochastic passes per input, a supervised loss on the labeled indices, and a ramped-up unsupervised consistency term, with θ updated using, e.g., ADAM); $C$ is then used as the selection criterion for better integration of the AL selection mechanism into the SSL training framework.
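The inconsistency metric of Eq. (4) can be sketched in a few lines of NumPy; here probs holds the predicted class probabilities for an input and its N perturbed copies, and all names are illustrative.

```python
import numpy as np

def epsilon(probs):
    """probs: array of shape (N + 1, J) with P(Y_hat = l | .) for an input and its
    N perturbed copies; returns the per-sample inconsistency of Eq. (4)."""
    return np.var(probs, axis=0).sum()           # variance over perturbations, summed over classes

def batch_inconsistency(batch_probs):
    """batch_probs: iterable of (N + 1, J) arrays, one per unlabeled sample in the pool B."""
    return sum(epsilon(p) for p in batch_probs)  # C(B, M) of Eq. (4)
```

Samples with the largest ε are the least consistent under perturbation and would be the first candidates selected for labeling in the AL loop.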

3.3 ICT Model
Interpolation consistency training (ICT) is a simple and computationally efficient algorithm for training deep neural networks [15] in the semi-supervised learning paradigm. The ICT method encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. ICT moves the decision boundary to low-density regions of the data distribution in classification problems. It has been observed that ICT's performance, when applied to standard neural network architectures, is optimal [9, 16] on the CIFAR-10 and SVHN benchmark datasets.
Algorithm (ICT-model pseudocode):
Require: f_θ(x) = neural network with trainable parameters θ
Require: f_θ'(x) = mean teacher with θ' equal to a moving average (MA) of θ
Require: D_L(x, y) = collection of the labeled samples
Require: D_UL(x) = collection of the unlabeled samples
Require: α = rate of the moving average
Require: w(t) = ramp function for increasing the importance of consistency regularization
Require: T = total number of iterations
Require: Q = random distribution on [0, 1]
Require: Mix_λ(a, b) = λa + (1 − λ)b
for t = 1, ..., T do
    Sample {(x_i, y_i)}_{i=1}^{B} ∼ D_L(x, y)    ▷ labeled minibatch
    L_S = CrossEntropy({(f_θ(x_i), y_i)}_{i=1}^{B})    ▷ supervised loss
    Sample {u_j}_{j=1}^{U}, {u_k}_{k=1}^{U} ∼ D_UL(x)    ▷ sample two unlabeled minibatches
    {ŷ_j}_{j=1}^{U} = f_θ'({u_j}_{j=1}^{U}), {ŷ_k}_{k=1}^{U} = f_θ'({u_k}_{k=1}^{U})    ▷ compute fake labels
    Sample λ ∼ Q    ▷ sample an interpolation coefficient
    u_m = Mix_λ(u_j, u_k), ŷ_m = Mix_λ(ŷ_j, ŷ_k)    ▷ interpolation
    L_US = ConsistencyLoss({(f_θ(u_m), ŷ_m)}_{m=1}^{U})    ▷ e.g., mean squared error
    L = L_S + w(t) · L_US    ▷ total loss
    g_θ ← ∇_θ L    ▷ compute gradients
    θ' ← αθ' + (1 − α)θ    ▷ update moving average of parameters
    θ ← Step(θ, g_θ)    ▷ e.g., SGD, Adam
end for
return θ
ICT regularizes semi-supervised learning by encouraging consistent predictions

$$f(\alpha u_1 + (1 - \alpha) u_2) = \alpha f(u_1) + (1 - \alpha) f(u_2) \tag{5}$$

at interpolations $\alpha u_1 + (1 - \alpha) u_2$ of unlabeled points $u_1$ and $u_2$. ICT learns $f_\theta$ in a semi-supervised manner using a teacher $f_{\theta'}$, where $\theta'$ is an exponential moving average of $\theta$. During training, the predictions are encouraged to satisfy

$$f_\theta(\mathrm{Mix}_\lambda(u_j, u_k)) \approx \mathrm{Mix}_\lambda(f_{\theta'}(u_j), f_{\theta'}(u_k)) \tag{6}$$

while giving correct predictions for the labeled examples $x_i$. Given the mixup [8, 17] operation

$$\mathrm{Mix}_\lambda(a, b) = \lambda \cdot a + (1 - \lambda) \cdot b \tag{7}$$

interpolation consistency training trains a prediction model $f_\theta$ to provide consistent predictions at interpolations of unlabeled points:

$$f_\theta(\mathrm{Mix}_\lambda(u_j, u_k)) \approx \mathrm{Mix}_\lambda(f_{\theta'}(u_j), f_{\theta'}(u_k)), \quad \text{where } \theta' \text{ is a moving average of } \theta. \tag{8}$$
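A PyTorch-style sketch of one ICT update following the pseudocode and Eqs. (5)–(8) is shown below; the toy networks, the Beta distribution for λ, the linear ramp for w(t) and the optimizer are assumptions made for illustration rather than the settings of [15].

```python
import copy
import torch
import torch.nn.functional as F
from torch.distributions import Beta

# Toy student/teacher pair; the teacher is a moving average of the student (assumed shapes).
student = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 32 * 3, 256),
                              torch.nn.ReLU(), torch.nn.Linear(256, 10))
teacher = copy.deepcopy(student)
optimizer = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9)

def ict_step(x_l, y_l, u_j, u_k, t, T, alpha=0.999, w_max=1.0):
    """One ICT update: (x_l, y_l) labeled minibatch; u_j, u_k two unlabeled minibatches."""
    lam = Beta(1.0, 1.0).sample()                            # lambda ~ Q on [0, 1]
    with torch.no_grad():                                    # fake labels from the teacher
        y_j = F.softmax(teacher(u_j), dim=1)
        y_k = F.softmax(teacher(u_k), dim=1)
    u_m = lam * u_j + (1 - lam) * u_k                        # Mix_lambda(u_j, u_k), Eq. (7)
    y_m = lam * y_j + (1 - lam) * y_k
    sup = F.cross_entropy(student(x_l), y_l)                 # L_S
    unsup = F.mse_loss(F.softmax(student(u_m), dim=1), y_m)  # L_US: consistency at the mix
    loss = sup + w_max * min(t / (0.25 * T), 1.0) * unsup    # L = L_S + w(t) * L_US (linear ramp)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                                    # theta' <- alpha*theta' + (1-alpha)*theta
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(alpha).add_(p_s, alpha=1 - alpha)
    return loss.item()
```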


4 Experiments
4.1 Implementation
It is common practice in semi-supervised learning experiments, in order to cut down expenses, to label only a minor part of the training data and leave the rest unlabeled. To achieve accurate predictions, standardized procedures [18, 19] were used. The CIFAR-10 dataset comprises 60,000 color images of size 32 × 32 each, split between 50 K training and 10 K test images. The ten classes of the dataset comprise images, mainly of common objects around us, namely buildings, dogs, ships and cars. Similarly, there are 73,257 training samples and 26,032 testing samples in the SVHN dataset, each of dimension 32 × 32; each sample is a detailed image of the number of a particular house (with the digits 0–9 as the ten classes mentioned earlier). Further, in the case of CIFAR-10, we resize each image by zero-padding it on each side by 2 px; the obtained image is then restored to its original size (32 × 32) by arbitrary cropping, and a new image is obtained this way. Likewise, for SVHN, each image is zero-padded on each side by 2 px and the same cropping is done, producing a new image of resolution 32 × 32 px. This process is followed by zero-mean and unit-variance image whitening. At first, it is observed that the most convenient samples for consistency regularization are located adjacent to the decision boundary: if any trivial perturbation δ is added to such an unsupervised sample u_j, it eventually pushes the resultant u_j + δ to the opposite side of the decision boundary. Hence, u_j + δ can be considered a decent point for the application of consistency regularization techniques, since it violates the assumption of low-density separation. On the contrary, the unlabeled points secluded from the decision boundary, i.e., having higher margins, do not experience these violations. Hence, for a low-margin unlabeled point u_j, a particular perturbation δ can result in u_j and u_j + δ lying on opposite sides of the decision boundary. Instead, consider the interpolations u_j + δ = Mix(u_j, u_k) (Fig. 1 shows an example), where u_k is another randomly chosen unlabeled instance. There are three possibilities: first, the unlabeled samples u_j and u_k belong to the same cluster; second, the samples are located in discrete clusters but are parts of a single class; and third, the samples are situated in separate clusters and belong to unlike classes. This elucidation suggests that interpolations between arbitrary unlabeled samples will probably lie in low-density areas; hence, consistency-based regularization can be applied at such interpolations. Intuitively, decision boundaries with a large margin yield better results. Mixup is one of the methods where prediction models are compelled to change linearly between the samples; in this way, the decision boundary and the classes have a substantial distance between them. The model f_θ, on being trained, predicts the "fake label" Mix(f_θ'(u_j), f_θ'(u_k)) at location Mix(u_j, u_k), and mixup is thereby extended to the semi-supervised learning setting (Fig. 1 shows an example), where θ' is considered to be the MA (i.e., cumulative average of the groups of subsets

Fig. 1 Proposed block diagram for consistency regularization using semi-supervised learning (labeled image i → supervised loss; unlabeled images j and k → mixed image m(j, k) → consistency loss; total loss = supervised loss + consistency loss), where l_i → correct predictions for labeled images and u_j, u_m, u_k → low-margin unlabeled sample images

of the main dataset) of θ. By virtue of stochastic gradient descent (SGD), for each iteration t, the parameters θ are revised so as to minimize L = L_S + w(t) · L_US, where L_S is the loss incurred by the typical cross-entropy mechanism on the supervised samples D_L and L_US is the interpolation consistency regularization term. These consistency losses are computed over minibatches, which can be either supervised or unsupervised, and the significance of L_US (the consistency regularization term) is amplified over successive iterations by means of the ramp function w(t). The unlabeled samples u_j and u_k form a couple of minibatches, from which L_US is evaluated, and their fake labels are assessed accordingly.
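The zero-pad-by-2-px, random-crop and whitening pipeline described above might be sketched as follows in NumPy; the function names are illustrative.

```python
import numpy as np

def pad_and_crop(img, pad=2):
    """img: (H, W, C) array. Zero-pad each side by `pad` px, then randomly crop
    back to the original resolution, producing a new image."""
    h, w, _ = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = np.random.randint(0, 2 * pad + 1)
    left = np.random.randint(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

def whiten(img, eps=1e-8):
    """Zero-mean, unit-variance normalisation of a single image."""
    img = img.astype(np.float64)
    return (img - img.mean()) / (img.std() + eps)
```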

4.2 Performance Analysis
The experiment is conducted using the CNN-13 (ConvNet-13) architecture along with the Wide Residual Network WRN-28-2 architecture. The CNN-13 architecture is considered the standard benchmark architecture in recent state-of-the-art SSL methods [20]. We use its variant (i.e., the input layer is devoid of any additional Gaussian noise) as instigated in the π-model [7]. The results, i.e., the error percentages, are provided for the CIFAR10 and SVHN datasets (see Table 1). In each experiment, we report attributes like the mean, variance and standard deviation across 3 independently run trials. The value of the consistency coefficient w(t) is initially set to 0.0 and is then ramped up to its maximum over the first quarter of the total number of epochs using the typical sigmoid routine.


Table 1 Outcomes for various models on CIFAR10 (5000 labels) and SVHN (1000 labels) datasets

S. No. | Model | CIFAR10: 5000 labeled, 60,000 unlabeled (test error %) | SVHN: 1000 labeled, 76,277 unlabeled (test error %)
1 | π | 29.66 ± 2.34 | 17.21 ± 3.01
2 | AL | 18.67 ± 1.23 | 9.01 ± 1.01
3 | ICT | 6.79 ± 0.12 | 2.54 ± 0.04

The consistency loss was determined by computing the sum of squared distances between the target variable and the predicted values (mean squared error). Table 1 shows the outcomes for the various well-known consistency regularization models on the CIFAR10 (5000 labels) and SVHN (1000 labels) datasets. It is observed that, for both the CIFAR10 and SVHN datasets, ICT attains better results than the other models. An SSL algorithm can be evaluated by comparing its performance against a novel algorithm using supervised learning. Hence, this research article shows an effective comparison of the three sophisticated algorithms, administered as π, ICT and AL, in Table 1. After successful completion of the experiment, it is observed that the ICT method outperforms the other models in this test, resulting in a twofold reduction of the error obtained in the case of CIFAR10 and a drastic four-fold reduction for the SVHN dataset. Additionally, in Table 1, it is perceived that ICT considerably cuts down the test error compared with robust SSL approaches; for example, for 5000 labeled samples, it brings down the error percentage of the best-affirmed approach by almost 25%. In general, it is noticed that for a handful of labeled data, the lower the values of the max-consistency coefficient and α, the better the validation errors obtained. For SVHN, the test errors obtained by ICT are competitive with other well-known SSL methods (Table 1). The SSL algorithm which uses WRN-28-2 yields the least error percentage for either of these algorithms. To find the actual efficiency of ICT relative to these semi-supervised learning algorithms, the experiments were conducted on the Wide ResNet-28-2 architecture; the outcomes are listed in Table 1. ICT proves to be more efficient on the CIFAR10 and SVHN datasets than the other models.

5 Conclusion
Machine learning [1] has had a radical influence in various fields, yet its application is often constrained by the high cost of labeled data. Advancements in SSL techniques [18] bridge the gap for those implementations where obtaining labeled data is cost-incurring. In this article, we have conducted a performance analysis of the best-known consistency-based models, namely π, AL and ICT, using CIFAR10 and SVHN, from which we have observed that ICT yields the optimal result (having the least prediction error). ICT has two benefits when compared to other methods using


semi-supervised learning. Firstly, it is a very simple model demanding almost no surplus computation, which is not the case for techniques using adversarial perturbations or generative model training. Secondly, it outpaces robust reference baselines on two standard datasets, despite being devoid of extensive hyperparameter tuning.

References
1. Cohn, D., Atlas, L., Ladner, R.: Improving generalization with active learning. Mach. Learn. 15(2), 201–221 (1994)
2. Rasmus, A., Berglund, M., Honkala, M., Valpola, H., Raiko, T.: Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems (2015)
3. Sajjadi, M., Javanmardi, M., Tasdizen, T.: Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In: Proceedings of the International Conference on Neural Information Processing Systems, NIPS'16, pp. 1171–1179, USA (2016). Curran Associates Inc. ISBN: 978-1-5108-3881-9
4. Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. In: International Conference on Learning Representations (2017)
5. Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C.: Mixmatch: A Holistic Approach to Semi-Supervised Learning (2019)
6. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
7. Athiwaratkun, B., Finzi, M., Izmailov, P., Wilson, A.G.: There are many consistent explanations of unlabeled data: why you should average. In: International Conference on Learning Representations (2019)
8. Luo, Y., Zhu, J., Li, M., Ren, Y., Zhang, B.: Smooth neighbors on teacher graphs for semi-supervised learning. In: CVPR (2018)
9. French, G., Mackiewicz, M., Fisher, M.: Self-ensembling for visual domain adaptation. In: International Conference on Learning Representations (2018)
10. Mohammadi, M., Al-Fuqaha, A., Guizani, M., Oh, J.: Semisupervised deep reinforcement learning in support of IoT and smart city services. IEEE Internet Things J. 5(2), 624–635 (2018). https://doi.org/10.1109/JIOT.2017.2712560
11. Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning, 1st edn. The MIT Press (2010). ISBN 0262514125, 9780262514125
12. Yazıcı, Y., Foo, C.-S., Winkler, S., Yap, K.-H., Piliouras, G., Chandrasekhar, V.: The Unusual Effectiveness of Averaging in GAN Training (2018)
13. Gao, M., Zhang, Z., Yu, G., Arik, S.O., Davis, L.S., Pfister, T.: Consistency-Based Semi-Supervised Active Learning: Towards Minimizing Labeling Cost (2019). arXiv:1910.07153
14. Berthelot, D., Raffel, C., Roy, A., Goodfellow, I.: Understanding and Improving Interpolation in Autoencoders Via an Adversarial Regularizer (2019)
15. Bachman, P., Alsharif, O., Precup, D.: Learning with pseudo-ensembles. In: Advances in Neural Information Processing Systems, pp. 3365–3373 (2014)
16. Goodfellow, I.J., Vinyals, O., Saxe, A.M.: Qualitatively characterizing neural network optimization problems. In: International Conference on Learning Representations (2015)
17. Clanuwat, T., Bober-Irizar, M., Kitamoto, A., Lamb, A., Yamamoto, K., Ha, D.: Deep Learning for Classical Japanese Literature. arXiv:1812.01718 (2018)
18. Oliver, A., Odena, A., Raffel, C., Cubuk, E.D., Goodfellow, I.J.: Realistic evaluation of deep semi-supervised learning algorithms. In: ICLR Workshop (2018)


19. Park, S., Park, J., Shin, S.-J., Moon, I.-C.: Adversarial dropout for supervised and semi-supervised learning. In: AAAI (2018)
20. Balcan, M.-F., Broder, A., Zhang, T.: Margin based active learning. In: International Conference on Computational Learning Theory, pp. 35–50. Springer (2007)

An Energy-Efficient PSO-Based Cloud Scheduling Strategy Ranga Swamy Sirisati, M. Vishnu Vardhana Rao, S. Dilli Babu, and M. V. Narayana

Abstract Cloud computing provides useful services to users with extensive and scalable resources that are virtualized over the Internet. It is defined as a collection of communication and computing resources located in the data-center. The on-demand service is subject to QoS, load balance, and certain other constraints that have a direct effect on the user's consumption of the resources controlled by the cloud infrastructure. It is considered a popular method because of the several advantages provided by a cloud infrastructure. The primary goal of a cloud scheduling algorithm is to bring down the completion time (the execution cost) of the task graph. The start time and the finish time of each task node influence the completion time (cost) of the task graph, and the sort order of the task nodes is an essential aspect that influences the start and finish time of every task node. In a hybrid cloud, efficient particle-swarm-based cloud scheduling is important because users need to maintain the security of the hybrid cloud. Researchers have suggested a variety of scheduling algorithms for the cloud. This paper proposes particle swarm optimization (PSO)-based optimal cloud scheduling; effective results are obtained with an efficient fuzzy-based PSO cloud scheduling. Keywords Cloud scheduling · Particle swarm optimization · Cloud tasks · Load balance · Fuzziness



1 Introduction
Cloud computing delivers computing services and resources that include user applications, processing power, networks, specialized corporate services, and data storage space. Cloud computing permits users to make use of software and hardware managed by cloud service providers without knowledge of the underlying servers. The main advantage of moving to the cloud is the scalability of the application. Unlike grids, the scalability of cloud resources permits real-time provisioning of resources to meet the application's requirements. The various cloud services, such as storage and data transfer capacity, are used to bring down expenses. New scheduling approaches have been proposed to cope with the network characteristics between the clients and their resources; they may combine some ideas of traditional scheduling with new techniques to ensure efficient scheduling [1]. Typically, these tasks are scheduled according to the clients' needs. Scheduling algorithms were first executed on grids, and the reduced performance faced in grids also creates a need for implementing scheduling in clouds. Further, this enables workflow management systems to meet the QoS requirements of applications, as opposed to the conventional approach needed earlier in common multi-client grid environments. The various cloud services, such as data transfer capacity, resources, processing, and storage, have to be accessible at a low cost. Comparable environments are not easy to create on grid resources: each grid site has a different configuration, which can result in additional effort each time the application is ported to another site [2]. VMs further permit an application developer to create a completely customized and portable environment for their application, as shown in Fig. 1. A traditional way to do this is to use each client task directly as the base unit, incurring scheduling overhead per application. The main issue is the relationship between this overhead and the different ways in which overhead costs arise for resources in cloud systems; in the case of a significant number of simple tasks, the cost is decreased if it can be shared, as it can be for complex tasks. The on-demand service provided by the cloud results in the need for newer scheduling strategies, which have been proposed by combining traditional scheduling concepts with new scheduling parameters like the cost of efficient scheduling, job migration, energy consumption, and bandwidth [3].

1.1 Performance Parameters of Cloud Scheduling
Several scheduling parameters are discussed below.
Make-span (completion time): the time difference between the start and the finish of an entire sequence of tasks in a schedule.


Fig. 1 A cloud scheduling strategy

Resource cost: determined in a cloud by the capacity of the resources and the time they are occupied; more powerful resources result in a higher cost.
Scalability: the capacity to deal with and perform under an increased workload, with the capability to enhance the resources effectively.
Reliability: the ability of the system to continue working even in a situation of failure.
Resource utilization: a parameter that defines how effectively the system utilizes its resources.

1.2 Existing Algorithms
Based on a straightforward method of classification, these algorithms are grouped into two categories: batch-mode heuristic scheduling algorithms (BMHA) and online-mode heuristic algorithms (OMHA). BMHA places jobs in queues and creates sets of jobs based on time slices; the jobs are collected and completed within the predefined time slices. The following are some of the models in this classification:

• First come first served (FCFS) scheduling algorithm
• Round robin (RR) scheduling algorithm
• Min–Min scheduling algorithm
• Max–Min scheduling algorithm

The second model schedules the numerous jobs as and when they arrive. Since, in a cloud, the machines are heterogeneous and the speed of every workstation varies quickly, online-mode heuristic scheduling algorithms are well suited to the cloud environment. The most fit task scheduling algorithm (MFTF) is an ideal example of an online-mode heuristic scheduling algorithm. The process of scheduling in a cloud has three stages:
• Resource discovering and filtering — the data-center agent is aware of the present status of the resources available in the cloud (generally VMs) and of the remaining resources that could become available.
• Resource selection — on the basis of the obtained information about the status of the cloud resources and the queued jobs, the cloud scheduler makes decisions about the creation and deletion of cloud nodes (VMs) so as to suit the set of jobs to be run.
• Job submission — in this stage, the job is submitted to the selected resource.

2 Literature Review
Static scheduling versus dynamic scheduling: in static scheduling, all information on the status of the resources available in the cloud and on the requirements of the jobs is known in advance. In dynamic scheduling, the jobs enter the system dynamically, and the scheduler must make resource-allocation decisions within a stipulated time. The main improvement of dynamic scheduling over static scheduling is that the system does not need to know the runtime behavior of the application before running it. In centralized scheduling, there is a centralized scheduler or a set of distributed schedulers that make global scheduling decisions; therefore, there is more control over resources, the scheduler can continuously monitor all available resources, and implementation is easier. The disadvantage, however, is the lack of scalability, performance, and fault tolerance. In decentralized scheduling, there is no central entity that controls the resources; the lower-level schedulers, called local resource managers (LRM), manage and maintain the job queues. Energy-aware scheduling: with the quick growth of cloud computing, large-scale data-centers play a vital role, and the energy consumption of these distributed systems (DS) is now a prominent issue that is receiving much attention. Most application scheduling approaches have not considered the energy cost of the network devices, even though these devices account for a large portion of the consumption


of power in large data-centers. A model for minimizing the energy consumption of the servers and network devices has therefore been developed. Gang scheduling is an efficient time-sharing technique applied in parallel and distributed systems, in which every job requires a number of processors according to its degree of parallelism; the tasks are executed based on their arrival and dispatch. In the cloud-computing setup, job migration with mutable workloads and varying job sizes and types fits high-performance computing in the cloud. The methodologies proposed by different authors are summarized in Table 1.

3 Problem Statement
The scheduling algorithms prevalent in clouds include time-based and cost-based scheduling algorithms. A novel compromised-time-cost scheduling algorithm has been proposed that considers the characteristics of cloud computing to accommodate instance-intensive, cost-constrained workflows by compromising between the execution time and the cost, with user input enabled on the fly. Particle swarm optimization (PSO)-based heuristics for the scheduling of workflow applications: a PSO-based heuristic schedules applications onto cloud resources considering both the computation cost and the transmission cost; it is applied to a workflow application by varying its computation and communication costs. The experimental results proved that PSO was able to achieve good cost savings and a proper distribution of the workload over the resources. An improved cost-based algorithm for task scheduling: to map the available tasks efficiently in the cloud, there is a need to improve the traditional activity-based costing, which has been addressed by a task scheduling strategy for the cloud environment. This algorithm divides the user tasks into three lists based on task priority; it measures the resource cost and the computation performance, thus improving the computation-to-communication ratio.

4 Proposed Energy-Efficient PSO (EPSO) Based Cloud Optimal Scheduling
Several strategies and algorithms for cloud scheduling have recently been proposed, and the benefit obtained from cloud-computing technology depends heavily on the cloud scheduling method or algorithm. Fuzzy genetic-based cloud scheduling compares favorably with older fuzzy-based cloud scheduling. Here, cloud scheduling based on a swarm optimization algorithm is implemented with a set of efficient fuzzy particles to improve upon fuzzy gene-based task scheduling; an energy-efficient particle swarm optimization is used to make cloud scheduling more efficient, with the flow of the PSO algorithm shown in Fig. 2.


Table 1 Various scheduling algorithms with parameters

Author (Refs.) | Methodology proposed | Parameters worked | Algorithm theme | Computing model
[4] | Pheromone strategy of updation | Completion time | self-reliant | Grid model
[5] | Pheromone strategy of updation | Reliability, completion time, execution cost | Large-scale workflows | Grid model
[6] | Initialization of methods | Deadline constraint, execution cost | Workflows | Grid simulation model
[7] | Pheromone strategy of updation | Deadline constraint, execution cost | Time-varying workflows | Grid model
[8] | Internal search | Completion time | Workflows | Cluster model
[9] | Types of ant agents | SLA constraints of throughput, power usage, response time | self-reliant | All
[10] | Basic ant colony optimization | Scheduling | self-reliant | CloudAnalyst
[11] | Historical scheduling to forecast future demand of cloud resources | Power usage | VM placement | Own Java-based simulation toolkit
[12] | Vector algebra | Resource utilization, power usage | VM placement | Cloud simulation model
[13] | Ant colony optimization | Deadline constraint, execution cost | Workflows | Grid model
[14] | Load balancing in VM | Scheduling, energy consumption, SLA violation | self-reliant | CloudSim
[3] | Ant colony optimization | Completion time of the last job | self-reliant | Grid model
[2] | Load balancing in VM | Completion time, load balancing | self-reliant | CloudSim
[15] | Pheromone strategy of updation | Completion time, load balancing | self-reliant | Grid simulation model
[16] | Pheromone nature in virtual machines | Power usage | VM placement | Cloud simulation model
[17] | Load hot spots utilization ACO | Scheduling | self-reliant | Cloud model
[18] | Pheromone strategy of updation | Completion time | self-reliant | Grid simulation model (GridSim toolkit)
[19] | Pheromone strategy of updation | Scheduling | self-reliant | Not mentioned
[20] | Online environment | Response time, throughput | VM placement | CloudSim
[21] | Basic ant colony optimization | Completion time | self-reliant | Cloud simulation model (CloudSim toolkit)
[22] | ACO and PSO (hybrid) | Resource utilization ratio, completion time | Workflows | Cloud simulation model (MATLAB 7.0)
[23] | Dynamic load balancing | Scheduling | self-reliant | Java

Particle swarm optimization is a population-based adaptation strategy that mimics the social behavior of bird flocks or fish schools. In the PSO system, each candidate solution is called a particle. Each particle moves with a velocity through the search space and adjusts it dynamically depending on its own experience and that of its neighboring particles. Mathematically, the particles are updated according to Eqs. (1) and (2):

$$C_{id}(x+1) = w_i \cdot C_{id}(x) + p_1 q_1 \left[ c_{id}(x) - m_{id}(x) \right] + p_2 q_2 \left[ C_{gd}(x) - m_{id}(x) \right] \tag{1}$$

$$P_{id}(x+1) = P_{id}(x) + A \cdot C_{id}(x+1) \tag{2}$$

Here, C_id denotes the velocity of the ith particle in dimension d, P_id and m_id denote the particle (cloud resource) positions, c_id and C_gd are the personal-best and global-best positions, x is the iteration index, w_i is the previous (inertia) weight, p_1 and p_2 are the cognitive and societal factors with random coefficients q_1 and q_2, and A is the step factor of the particle vector.
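To illustrate how the updates of Eqs. (1) and (2) can drive cloud scheduling, the following is a minimal Python sketch of PSO applied to a task-to-VM assignment. The continuous-to-VM decoding, the makespan fitness, and the parameter values are assumptions made for illustration; they are not the exact EPSO formulation of this paper.

```python
import random

def pso_schedule(exec_time, n_particles=30, iters=100, w=0.7, p1=1.5, p2=1.5):
    """exec_time[t][v]: execution time of task t on VM v. Illustrative fitness:
    makespan of the decoded assignment. Returns the best task-to-VM mapping."""
    n_tasks, n_vms = len(exec_time), len(exec_time[0])

    def makespan(position):
        loads = [0.0] * n_vms
        for t, x in enumerate(position):
            v = int(round(x)) % n_vms                 # decode dimension -> VM index
            loads[v] += exec_time[t][v]
        return max(loads)

    pos = [[random.uniform(0, n_vms - 1) for _ in range(n_tasks)] for _ in range(n_particles)]
    vel = [[0.0] * n_tasks for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=makespan)

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_tasks):
                q1, q2 = random.random(), random.random()
                # Eq. (1): velocity update with inertia, cognitive and social terms
                vel[i][d] = (w * vel[i][d]
                             + p1 * q1 * (pbest[i][d] - pos[i][d])
                             + p2 * q2 * (gbest[d] - pos[i][d]))
                # Eq. (2): position update (step factor A = 1 here)
                pos[i][d] += vel[i][d]
            if makespan(pos[i]) < makespan(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest + [gbest], key=makespan)
    return [int(round(x)) % n_vms for x in gbest]
```

In this encoding each particle dimension corresponds to one task, and rounding the continuous value yields the VM index; other encodings (e.g., discrete or fuzzy-weighted) are equally possible under the same update rules.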


Fig. 2 The proposed PSO algorithm flow

5 Experimental Results
The performance of the proposed method was analyzed using the CloudSim simulation toolkit. The CloudSim toolkit supports researchers in modeling the cloud-computing environment and was released by the Cloud Computing and Distributed Systems Laboratory at the


University of Melbourne. Additionally, it provides features for the modeling and simulation of cloud-computing environments. In CloudSim, users model their tasks as cloudlets; each cloudlet contains attributes such as the file size and the number of instructions to execute. The fuzzy-based genetic algorithm (FGA), standard PSO, basic genetic algorithm (GA), and fuzzy logic (FL) approaches are used for comparison with the proposed EPSO algorithm. The results are presented in Tables 2 and 3, and the corresponding bar charts are shown in Figs. 3 and 4. The broker schedules the virtual machines associated with the cloudlet schedule, which has the advantage of creating agent-driven processes. In CloudSim, virtual machines are created on hosts, as described in the VM class; the provisioning of hosts is based on an agent that assigns each VM to a host. Data-centers can manage many hosts, and agents can dynamically change the host and VM

50

100

150

200

EPSO

382

556

690

906

FGA

404

578

712

924

PSO

430

600

730

1000

GA

458

626

758

1052

FL

488

700

848

1310

Table 3 Computational cost of tasks in the cloud

Algorithm / No. of tasks | 50 | 100 | 150 | 200
EPSO | 6200.46 | 12,001.56 | 17,200.64 | 22,027.08
FGA | 6409.72 | 12,973.84 | 18,091.2 | 24,249.5
PSO | 6801.1 | 13,981.34 | 20,203.78 | 26,534.18
GA | 7201.44 | 15,044.46 | 21,890.5 | 28,503.7
FL | 8045.64 | 16,245 | 22,904 | 30,984.68

Fig. 3 Makespan vs tasks: time taken for execution by the various algorithms (in s) (x-axis: algorithm — EPSO, FGA, PSO, GA, FL; y-axis: time in s)

Fig. 4 Execution cost vs tasks: computational cost for the various algorithms (x-axis: algorithm — EPSO, FGA, PSO, GA, FL; y-axis: computational cost)

system. Various parameters are used to evaluate the performance of the FGA-based scheduling in a cloud-computing environment; these include completion time (duration), implementation cost, percentage of resource utilization, speed, and efficiency.

6 Conclusion
Since cloud computing provides resources based on demand, it is called on-demand resource provisioning based on a subscription, with a central remote server maintaining the data and applications. Owing to its reliability, fault tolerance, effective communication and speed, cloud computing is now a fast-emerging technology, and it offers several scheduling algorithms for solving real-world computing resource provisioning. In this paper, an energy-efficient fuzzy particle swarm optimization algorithm was developed to provide optimum scheduling in the cloud. Optimization is an important step in the scheduling process; owing to the lack of optimization, scheduling processes that use only a genetic algorithm or fuzzy logic yield poorer results than the proposed cloud scheduling. The proposed approach gave good results, and its performance was much better than that of the other algorithms. Therefore, it is concluded that the fuzzy particle swarm optimization algorithm is effective for cloud scheduling.

References 1. Ge, J.W., Yuan, Y.S.: Research of cloud computing task scheduling algorithm based on improved genetic algorithm. Appl. Mech. Mater. 347, 2426–2429 (2013) 2. Li, K., Xu, G., Zhao, G., Dong, Y., Wang, D.: Cloud task scheduling based on load balancing ant colony optimization. Sixth Annu. Chinagrid Conf. 2011, 3–9 (2011). https://doi.org/10. 1109/ChinaGrid.2011.17 3. Kousalya, K.: To improve ant algorithm’ s grid scheduling using local search. Int. J. Comput. Cogn. 7, 47–57 (2009)


4. Bagherzadeh, J., MadadyarAdeh, M.: An improved ant algorithm for grid scheduling problem using biased initial ants. In: 3rd International Conference on Computer Research and Development, pp. 373–378 (2011). https://doi.org/10.1109/CSICC.2009.5349368 5. Chen, W.-N., Zhang, J.Z.J.: An ant colony optimization approach to a grid workflow scheduling problem with various QoS requirements. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 39, 29–43 (2009). https://doi.org/10.1109/TSMCC.2008.2001722 6. Chen, W.-N., Zhang, J., Yu, Y.: Workflow scheduling in grids: an ant colony optimization approach. IEEE Congr. Evol. Comput. 3308–3315 (2007) 7. Chen, W., Shi, Y., Zhang, J.: An ant colony optimization algorithm for the time-varying workflow scheduling problem in grids. IEEE Congr. Evol. Comput. 875–880 (2009) 8. Chiang, C.-W., Lee, Y.-C., Lee, C.-N., Chou, T.-Y.: Ant colony optimisation for task matching and scheduling. IEE Proc. Comput. Digit. Tech. 153, 373–380 (2006). https://doi.org/10.1049/ ip-cdt 9. Chimakurthi, L., Madhu Kumar, S.: Power efficient resource allocation for clouds using ant colony framework. Available from arXiv:11022608 (2011) 10. Dam, S., Mandal, G., Dasgupta, K., Dutta, P.: An ant colony based load balancing strategy in cloud computing. Adv. Comput. Netw. Inform. 2, 403–413 (2014). https://doi.org/10.1007/ 978-3-319-073507 11. Feller, E., Rilling, L., Morin, C.: Energy-aware ant colony based workload placement in clouds. In: Proceedings of 12th IEEE/ACM International Conference on Grid Computing, pp. 26–33 (2011). https://doi.org/10.1109/Grid.2011.13 12. Ferdaus, M.H., Murshed, M., Calheiros, R.N., Buyya, R.: Virtual machine consolidation in cloud data centers using ACO metaheuristic. In: Euro-Par 2014 Parallel Process, pp. 306–317. Springer (2014). https://doi.org/10.1007/978-3-319-09873-9 13. Hu, Y., Xing, L., Zhang, W., Xiao, W., Tang, D.: A knowledge-based ant colony optimization for a grid workflow scheduling problem. In: Adv. Swarm Intell. Notes Comput. Sci. 241–248 (2010). https://doi.org/10.1007/978-3-642-38703-6 14. Khan, S., Sharma, N.: Effective scheduling algorithm for load balancing (SALB) using ant colony optimization in cloud computing. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4, 966–973 (2014) 15. Liu, A.L.A., Wang, Z.W.Z.: Grid task scheduling based on adaptive ant colony algorithm. In: International Conference on Management e-Commerce eGovernment, pp. 415–418. IEEE (2008). https://doi.org/10.1109/ICMECG.2008.50 16. Liu, X., Zhan, Z., Du, K., Chen, W.: Energy aware virtual machine placement scheduling in cloud computing based on ant colony optimization. In: Proceedings of Conference on Genetic and Evolution Computing, pp. 41–47. ACM (2014). https://doi.org/10.1145/2576768.2598265 17. Lu, X., Gu, Z.: A load-adapative cloud resource scheduling model based on ant colony algorithm. In: IEEE International Conference on Cloud Computing Intelligent System 2011, pp. 296–300. https://doi.org/10.1109/CCIS.2011.6045078 18. Mathiyalagan, P., Suriya, S., Sivanandam, S.N.: Modified ant colony algorithm for grid scheduling. Int. J. Comput. Sci. Eng. 02, 132–139 (2010) 19. Nishant, K., Sharma, P., Krishna, V., Gupta, C., Singh, K.P., Nitin, et al.: Load balancing of nodes in cloud using ant colony optimization. In: UKSim 14th International Conference on Computing Model Simulation, pp. 3–8 (2012). https://doi.org/10.1109/UKSim.2012.11 20. Pacini, E., Mateos, C., Garino C.G.: Balancing throughput and response time in online scientific clouds via ant colony optimization. 
Adv. Eng. Softw. 84, 31–47 (2015) 21. Tawfeek, M.A., El-Sisi, A., Keshk, A.E., Torkey, F.A.: Cloud task scheduling based on ant colony optimization. In: 8th International Conference on Computer Engineering System, pp. 64–69 (2013). https://doi.org/10.1109/ICCES.2013.6707172


22. Wen, X., Huang, M., Shi, J.: Study on resources scheduling based on ACO algorithm and PSO algorithm in cloud computing. In: Proceedings of 11th International Symposium Distribution Computing Application to Business Engineering Science, pp. 219–222 (2012). https://doi.org/ 10.1109/DCABES.2012.63 23. Zhang, Z., Zhang, X.: A load balancing mechanism based on ant colony and complex network theory in open cloud computing federation. Int. Conf. Ind. Mechatron. Autom. 2, 240–243 (2010). https://doi.org/10.1109/ICINDMA.2010.5538385

A Pronoun Replacement-Based Special Tagging System for Bengali Language Processing (BLP) Busrat Jahan, Ismail Siddiqi Emon, Sharmin Akter Milu, Mohammad Mobarak Hossain, and Sheikh Shahparan Mahtab

Abstract Natural language processing (NLP) is one of the most important elements of human–machine interaction and of machine learning systems. In the world, over 27 crore people use Bengali as their first language, and it has its own writing system, so processing Bengali is an important task for natural language processing. In this research work, we demonstrate an upgraded parts-of-speech (POS) tagging system for the Bengali language, in which a special tagging scheme is combined with general grammatical parts of speech and considers suffixes for verbs, achieving a 68% success rate. We have also added place names, occupation names, Bengali person names, Bengali repeated words, Bengali digits in both written and numeric form, English acronyms, and organization names in both cases. The success rate is 70% for general tagging and 76% for special tagging, which is the highest achieved so far. This tagging system can be used for Bengali language processing (BLP) tasks such as sentiment analysis for Bengali, Bengali text summarization, etc. Keywords POS tagging · Bengali POS tagging · Special tagging · BLP · NLP Bengali · Bangla POS tagging B. Jahan · I. S. Emon Department of CSE, Feni University, Feni, Bangladesh e-mail: [email protected] I. S. Emon e-mail: [email protected] S. A. Milu Department of CSTE, Noakhali Science and Technology University, Noakhali, Bangladesh e-mail: [email protected] M. M. Hossain Department of CSE, Asian University of Bangladesh, Dhaka, Bangladesh e-mail: [email protected] S. S. Mahtab (B) Department of EEE, Feni University, Feni, Chittagong Division, Bangladesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_80


1 Introduction

Nowadays, we share, store, write, and publish text through the revolutionary advances of the Internet, hardware, and software. In this regard, a new era of information explosion is impending. Users often find each retrieved document very lengthy, which is tedious and time consuming to read [1]. Therefore, automatic text summarization is needed to process the huge amount of Internet data efficiently [2]. Unlike English, which has seen a large number of systems developed to cater to it, other languages are less fortunate [3], so the development of text summarization has made no mentionable progress for other languages, specifically Bangla. It should be stated that we are Bangladeshi people, and our national language is Bangla. Bangla is the fourth largest language in the Indo-European language family and the sixth largest in the world in terms of the number of native speakers. Bangla is also the seventh largest spoken language in the world out of 3500 languages. Bangla is the mother tongue of the Bangladeshi people and the second largest spoken language of some states in India. According to the 2015 economic survey, 62.2% of the people in Bangladesh are educated, and most of them are accustomed only to the Bengali language [4]. Summarization of scientific documents, literature, news documents, books, etc. may be required today; the online content of Bangla news documents is growing very fast, and people are reading it regularly [6]. Bangla news documents, such as the online portals of Bangla magazines, are also increasing rapidly, electronic Bangla text is expanding in the cyber world with no borders, and there are many great lessons from people, and so on. Very few research works have been done for handling this expanding amount of Bangla text. In addition, more research work needs to be done for the community of Bengali-speaking people, especially for retrieving Bangla information [7].

2 Methodology

Research work on Bangla text documents is much more difficult because of the following issues:
i. According to our study, automated tools for research work are rarely available for the Bengali language.
ii. Although a similar tool is under development with limited features, there is no lexicon-based dictionary like WordNet for Bangla.
iii. Researchers have worked in different directions, and there is very little consolidation.
iv. There is a lack of free and open-source software [8].
v. The Bengali language originated from Sanskrit and largely maintains inconsistent rules; for proper recognition of the structure of sentences, the subjects and objects of all the sentences need to be identified, which is not easy in Bangla compared to English.


Despite these difficult issues, this method focuses on these difficulties in the output of Bangla information retrieval methods. In retrieval output where the corresponding nouns are missing, some sentences can be extracted with dangling pronouns. The user would not get an appropriate message from the summary, and there is a chance of misunderstanding the text because of these pronouns. So, these dangling pronouns need to be replaced by their corresponding nouns [9]. Based on an analysis of Bangla news documents, it has been observed that the noun related to a pronoun exists in the immediately preceding sentence or in the second immediately preceding sentence 88.63% of the time. The purpose of this work is to make the output of Bangla information retrieval methods free from dangling pronouns; otherwise, a user may misunderstand the text, because a single dangling pronoun is sufficient for sending an incorrect message. In this situation, a method to solve this problem with the following contributions is proposed here [10, 11]: (i) verify the nature of a word using dependency parsing, in which the inactive words are tagged; (ii) identify pronouns and distinguish subject or object; (iii) detect the nature of every word of a Bangla text sentence as noun, pronoun, verb, subject, object, digit, acronym, organization, person, and place name, etc., in two tagging phases, general and special tagging; (iv) identify the nouns related to pronouns and replace the pronouns in the appropriate format [12]. From the authors' research, this gives a much better result for pronoun replacement in Bangla than other approaches. Some rules are used here after analyzing news documents; special tagging, dependency parsing, and subject and object identification are performed, and all pronoun replacements are done after their completion, based on these rules. To carry out this work, we selected 3000 Bangla news documents [collected from the most popular Bangladeshi newspaper, the Daily Prothom-Alo (February 2020)].
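A minimal sketch of the pronoun-replacement heuristic described above is given below. It assumes the sentences are already POS-tagged (tag names such as "PPR", "NP" and "NC" follow Sect. 2.1), and taking the last noun of a preceding sentence is an illustrative assumption rather than the authors' exact rule.

# Sketch of the heuristic: a dangling pronoun is replaced by a noun found in one of
# the two immediately preceding sentences (where such a noun exists 88.63% of the time).

def replace_dangling_pronouns(tagged_sentences):
    """tagged_sentences: list of sentences, each a list of (word, tag) pairs."""
    result = []
    for idx, sentence in enumerate(tagged_sentences):
        new_sentence = []
        for word, tag in sentence:
            if tag == "PPR":  # dangling pronoun candidate
                # Look back at the two immediately preceding sentences for a noun.
                for prev in (idx - 1, idx - 2):
                    if prev >= 0:
                        nouns = [w for w, t in tagged_sentences[prev] if t in ("NP", "NC")]
                        if nouns:
                            word, tag = nouns[-1], "NP"   # replace pronoun with that noun
                            break
            new_sentence.append((word, tag))
        result.append(new_sentence)
    return result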

2.1 General Tagging

All words are tagged (as noun, pronoun, adjective, verb, preposition, etc.) by using a lexicon database [1] and SentiWordNet [2]. Using the lexicon database, we can tag the words as "JJ" (adjective), "NP" (proper noun), "VM" (verb), "NC" (common noun), "PPR" (pronoun), etc. On the other hand, SentiWordNet has a list of words with the tags "a" (adjective), "n" (noun), "r" (adverb), "v" (verb), and "u" (unknown). Based on these predefined lists of words, we experimented on 200 Bangla news documents and found that 70% of the words can be tagged. Although we use word stemming to find the base form of a word, a verb cannot be stemmed when it is not in its active form. In fact, identifying verbs is very difficult because there are many suffixes in Bangla. For instance, depending on the tense and person, the English word "do" may become "doing," "did," or "does," but, on the other hand, the word may have many more forms in Bangla. To consider the present continuous tense like, “


(kor-do), three main forms of this word can only have depend on the first, second, ” (doing) for first person, “ ” (doing) and third person. Also it can be “ ” (doing) for third person, respectively. To consider for second person and “ the present continuous tense like, “ ” (kor-do), three main forms of this word ” can only have depend on the first, second, and third person. Also it can be “ ” (doing) for second person and “ ” (doing) (doing) for first person, “ for third person, respectively. All these meanings for the forms of verbs of “you” are ” (you are doing), “ ” (you also different in Bangla. As, “ ” (you are doing) where those terms are specified in present are doing), “ ” (do) may have continuous tense and also with second person. Thus, this word “ ” (do), “ ” (do), “ ” (do), “ ” (do), “ ” the next forms: “ (doing), “ ” (doing), “ ” (doing), “ ” (doing), “ ” (doing), “ ” (did), “ ” (did), “ ” (did), “ ” (did), “ ” (did), “ ” (do), “ ” (do), “ ” (did), “ ” (did), “ ” (did), “ ” ” (did), “ ” (do), “ ” (did), “ ” (did), “ ” (did), “ ” (did), ” (doing), “ ” (doing), “ ” (doing), (did), “ ” (doing), “ ” (doing), “ ” (doing), “ ” (doing), “ “ ” (doing), “ ” (doing), “ ” (doing), (doing), ” (doing), “ ” (doing), “ ” (do), “ ” (do), “ ” “ (do), “ ” (do), “ ” (do). However, verb identification plays a vital role for language processing because this is the main root of a sentence. Thus, there is no any comparison between the complexity of verb in Bangla and English. A list of suffixes ” (itechhis), “ ” is considered as for the final checking in following: “ ” (itis), “ ” (ile), “ ” (ibi), etc. The result of the percentage (techhis), “ of word tagging has been amplified from 68.12% (before using the list of suffixes [4]) to 70% (after using the list of suffixes). Some tagging we get in this step can be an initial tag and some tags updating in the next steps. Again, certain words will be specifically tagged as acronym, named unit, occupation, etc., in the next step.
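The two-step idea above (lexicon lookup first, then a longest-suffix check to tag verbs) can be sketched as follows; the lexicon entries and the romanized suffixes are placeholders standing in for the Bengali strings, which are not reproduced here.

# Sketch of the two-step tagging: look the word up in a lexicon first, and only if
# that fails check a suffix list (longest suffix first) to tag it as a verb.

LEXICON = {"bhalo": "JJ", "chhele": "NC", "dhaka": "NP"}        # placeholder entries
VERB_SUFFIXES = sorted(["itechhis", "techhis", "itis", "ile", "ibi"],
                       key=len, reverse=True)                    # longest match first

def tag_word(word):
    if word in LEXICON:
        return LEXICON[word]
    for suffix in VERB_SUFFIXES:
        if word.endswith(suffix):
            return "VM"          # verb, identified by its suffix
    return "UNK"                 # left for the special-tagging step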

2.2 Special Tagging After general tagging, special tagging was introduced to identify the words as acronym, elementary form, numerical figure, repetitive words, name of occupation, organization, and places. 1. Examining for English acronym: When the words are formed by the initials of the other words, then it is called acronym. Such as, “ ” (UNO), “ ” ” (USA), etc. For examining these kinds of words, when we (OIC), “ can separate these words that like: “ ” (UNO) to match with “ ” (U), ”, “ ” (O), those are matched every letter of the words. Actually, we can “ write all English letters in Bangla like: A for (“ ”), B for (“ ”), C for (“ ”), ”), X for (“ ”), Y for (“ ”), Z for D for (“ ”), … W for (“ ”), and if we can sort them by descending order depend on their string (“


lengths, where W (“ ”) will be in the first place and A (“ ”) will be in the last place, then match every letter of the words. It is important in descending ” (M) does order that is always used to ensure the longest match. Such as, “ not match with “ ” (A), but it will match with “ ” (M). This experiment shows that 98% success rate for this case. ” 2. Studying for Bangla elementary tag: Bangla letters with spaces, like: “ ” (A B M), etc. These letters will be tagged as Bangla primary (A K M), “ tag. We have gotten based on research; the accuracy of the elementary result is 100%. 3. Studying for repetitive words: Repetitive words are special form of word combination, where same word can be placed for two times consecutively. For example, ” (thandathanda—cold cold), “ ” (boroboro—big big), “ “ ” (chotochoto—small small), etc. There are some words; they are ” (khawadawa—eat). We have found partially repeated such as “ 100% accuracy on identifying repetitive words. 4. Studying for numerical form: There are three conditions for recognizing the numerical representation in words and digits which are examined as follows: (a) It is formed by following the first part of the word, like as, 0 for ( ), 1 for ” (one), “ ” (two), “ ” (three), ( ), 2 for ( ), …, 9 for ( ) or “ ” (four) to “ ” (ninety nine). The decimal point (.) is also “ considered when examining the numerical form from digits. ” (hundred), “ ” (thousand), (b) The next part (if any) is followed by: “ etc. ” (en), (c) Finally, it can have suffixes such as, “ ” (this), “ ” (this), “ etc. After the experiment on our sample test documents, 100% numerical form can be found from both numerical values and text documents. 5. Studying for name of occupation: Occupation has a significant word and for the human named entity identification, occupation is very much helpful by which named entity can be recognized. If we get any word as occupation, then we may consider the immediate next some words to find out named entity. We have retrieved some entries for the occupation of Bangladesh from ” (shikkhok—master), “ ” (sangbadika table such as, “ journalist), etc. Every word has matched with these words (that we collected from different online source), and if any matches are found, then tagged as ” (shikkhok-master) will turn into “ ” occupation. Here, “ (prodhanshikkhok-Head master), and so on. From this study, it may identify 96% for occupation. 6. Studying for name of organization: Name of organization is an important factor, where any type of word may be the element of organizational name. From our analysis, it has been given below: a.

The following complete name of the organization, which is depended on the acronym of the name that is together with this parenthesis. For


” “RajdhaniUnnayanKartiexample, “ pakkh (RAJUK)-Anticorruption Commission (ACC).” Depending on the total number of letters in the acronym, if there is any acronym bounded with parentheses. Then, before the acronym of the number of same words are tagged as the name of organization. In this case, the acronym can be added to the initial letters of the word immediately after the commencement of the acronym; otherwise, this process will not be applicable. Research shows that after name of acronym in parentheses may be found for the name of organization 85%. (b) The organization name with last part may contain certain words. Such as, ” (limited-limited), “ ” (biddaloy-school), “ ” “ ” (kong–kong), etc., [5]. Along with the (montronaloy-ministry), “ above point. If any such of words are presented in the text according to the point (b), then immediately check the three words of the particular word. Consisting of more than three words and then also selecting three words of the organization, it will be considered sufficient for the purpose. Uncertainty when the three words are found as noun, name entity or any blocked word, then call them the name of an organization. It is found that the organizations name can be accepted for 85% times based on the point (b). 7. Studying for name of place: There is a table the name of places of Bangladesh, it is made with 800 names for the list of division, district, upazila, and municipality. Here, the top level is division, second level is district, and third level is upazila or municipality in area-based separation. In addition, we have analyzed 230 countries names and their capitals. In this way, about 91% of the place names can be identified in our experiment.
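For illustration, two of the special-tagging checks above can be sketched as follows; the Bangla spellings of the English letters are placeholders, and the assumption that a repetitive word is written as one concatenated token follows the transliterations given above (e.g. thandathanda).

# Sketch of two special-tagging checks. "<A>", "<E>", "<M>" stand in for the real
# Bangla spellings of English letters; sorting them by descending length implements
# the longest-match rule described in item 1.

LETTER_SPELLINGS = {"<E>": "E", "<M>": "M", "<A>": "A"}   # placeholder map
SORTED_LETTERS = sorted(LETTER_SPELLINGS, key=len, reverse=True)

def is_repeated_word(token):
    """Repetitive words: the same word written twice consecutively."""
    half = len(token) // 2
    return len(token) % 2 == 0 and token[:half] == token[half:]

def match_acronym(token):
    """Greedily consume the token using the longest-matching letter spellings."""
    letters, i = [], 0
    while i < len(token):
        for spelling in SORTED_LETTERS:
            if token.startswith(spelling, i):
                letters.append(LETTER_SPELLINGS[spelling])
                i += len(spelling)
                break
        else:
            return None          # some part of the token is not a letter spelling
    return "".join(letters)      # e.g. an English acronym such as "MA"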

3 Results and Discussion

The experimental results for the word-tagging success rate in each phase are given in Table 1. The experiment was conducted on 32,143 words from 200 test documents. In the results of special tagging (shown in Table 2), it was found that some types of words were identified with 100% accuracy, namely acronyms, initials, and numerical figures from digits and words. The procedures followed specific patterns to identify these types (acronym, initial, numerical figure from digits and words) and are not based on any limited number of predefined words.

Table 1 Results of word tagging of different phases

Phases of word tagging            Number of tagged words    Percentage of tagged words (%)
General tagging                   21,896                    68.12
Considering suffixes for verb     22,500                    70.00
Special tagging                   26,098                    76.98


Table 2 Exploratory results of special tagging for different types of words

Type of word                         Success rate (%)
English acronym                      98
Repeated words                       100
Bengali name                         100
Digit                                100
Occupation                           96
Place name                           91
Organization name (both cases)       85

Table 3 Results on number of pronoun replacements for 200 news documents

Total pronouns    Unaffected    Incorrectly exchanged/replaced    Properly exchanged/replaced
301               71            15                                215

These specific patterns are the main reason for the 100% success rate. However, some types of words could not be identified completely: occupation was identified with 96% accuracy, the name of organization (by considering acronyms) with 85%, and the names of humans and places with 100% and 91%, respectively. These procedures utilize lists of predefined words to identify occupations, names of organizations, names of humans, and places.

3.1 Results on Replacement of Pronouns

For the 200 evaluated documents, we counted the pronouns manually and cross-checked them with our program. The results of pronoun replacement and the number of pronouns are given in Table 3.

4 Conclusion

Natural language processing (NLP) is one of the most important elements of human–machine interaction and of machine learning systems. Over 27 crore people worldwide use Bengali as their first language, and it has its own writing system, so processing Bengali is an important task for natural language processing. In this research work, we have demonstrated an upgraded parts-of-speech (POS) tagging system for the Bengali language, in which a special tagging scheme is combined with general grammatical parts of speech and considers suffixes for verbs, achieving a 68% success rate. We have also added place names, occupation names, Bengali person names, Bengali repeated words, Bengali digits in both written and numeric form, English acronyms, and organization names in both cases. The success rate is 70% for general tagging and 76% for special tagging, which is the highest achieved so far. This tagging system can be used for Bengali language processing (BLP) tasks such as sentiment analysis for Bengali, Bengali text summarization, etc.

References 1. Azmi, A.M., Al-Thanyyan, S.: A text summarizer for arabic. J. Comput. Speech Lang. 26(4), 260–273 (2012) 2. Miller, G.A.: Wordnet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995) 3. Indian Statistical Institute: A Lexical Database for Bengali 2015 [Online]. Available https:// www.isical.ac.in/∼lru/wordnetnew/index.php/site/aboutus. Accessed 28 Oct 2015 4. Chakma, R. et al.: Navigation and tracking of AGV in ware house via wireless sensor network. In: 2019 IEEE 3rd International Electrical and Energy Conference (CIEEC), pp. 1686–1690. Beijing, China, 2019. https://doi.org/10.1109/CIEEC47146.2019.CIEEC-2019589 5. Milu, S.A., et al.: Sentiment analysis of Bengali reviews for data and knowledge engineering: a Bengali language processing approach. In: Bindhu, V., Chen, J., Tavares, J. (eds.) International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol. 637. Springer, Singapore (2020). https://doi.org/10.1007/978-98115-2612-1_8 6. Notes for Students: Rule Based System, Nov 2000 [Online]. Available https://www.jpaine.org/ students/lectures/lect3/node5.html. Accessed 01 Apr 2017 7. Gpedia: Gpedia, your encyclopaedia [Online]. Available www.gpedia.com/bn. Accessed 25 June 2016 8. BdJobs.com: Occupation in Bangladesh, Name of Occupation in Largest Job Site in Bangladesh, Feb 2016 [Online]. Available https://bdjobs.com. Accessed 25 June 2016 9. Emon, I.S., Ahmed, S.S., Milu, S.A., Mahtab, S.S.: Sentiment analysis of Bengali online reviews written with English letter using machine learning approaches. In: Proceedings of the 6th International Conference on Networking, Systems and Security (NSysS ’19). Association for Computing Machinery, New York, NY, USA, pp. 109–115 (2019). https://doi.org/10.1145/ 3362966.3362977 10. Khan, M.F.S., Mahtab, S.S.: PLC based energy-efficient home automation system with smart task scheduling. In: 2019 IEEE Sustainable Power and Energy Conference (iSPEC), pp. 35–38. Beijing, China, 2019.https://doi.org/10.1109/iSPEC48194.2019.8975223 11. Ahmed, S.S., et al.: Opinion mining of bengali review written with english character using machine learning approaches. In: Bindhu, V., Chen, J., Tavares, J. (eds.) International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol. 637. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-261 2-1_5 12. Mahtab, S.S., Monsur, A., Ahmed, S.S., Chakma, R., Alam, M.J.: Design and optimization of perovskite solar cell with thin ZnO insulator layer as electron transport. In: 2018 International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE), pp. 1–4. IEEE, Gazipur, Bangladesh (2018). https://doi.org/10.1109/ICAEEE.2018.8643012

A High Performance Pipelined Parallel Generative Adversarial Network (PipeGAN) Rithvik Chandan, Niharika Pentapati, Rahul M. Koushik, and Rahul Nagpal

Abstract Generative Adversarial Networks (GANs) are gaining popularity with applications in unsupervised, supervised as well as reinforcement learning for generating images, videos, and other artefacts. However, the inherently sequential nature of GANs is an Achilles heel to widespread adoption. In this paper, we propose and experimentally evaluate a novel sophisticated pipelined parallel version of GANs by dividing the training process into different balanced pipeline stages. Our experimental evaluation of the proposed technique shows significant performance gain up to 30% and 23% with an average speed-up close to 23% and 15% as compared to the serial implementation in the context of NumPy and Pytorch respectively when used to accurately classify real and fake images from standard MNIST and Fashion MNIST datasets. Keywords Generative adversarial networks · Parallelism · Pipeline parallelism · Performance · MNIST

R. Chandan · N. Pentapati · R. M. Koushik · R. Nagpal (B) Department of Computer Science and Engineering, PES University, Bangalore, India e-mail: [email protected] R. Chandan e-mail: [email protected] N. Pentapati e-mail: [email protected] R. M. Koushik e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_81

1 Introduction

Generative Adversarial Networks [1] are gaining popularity with extensive use in the generation of realistic images of objects, people, and scenes, among others. Other applications of GANs include image-to-image translation tasks such as translating photos of night to day and autumn to spring. A GAN is usually composed of two


contending neural networks co-trained at the same time, involving forward and backward propagation. The co-training, though effective, suffers from the curse of tight dependencies, posing a significant parallelism challenge, and is not amenable to traditional data and task parallelization techniques. Pipeline parallelization divides the work into different subtasks that constitute pipeline stages, thereby overlapping execution as data passes through the stages. If the stages are kept sufficiently balanced, this style of parallelism offers good speed-up even in the presence of dependencies which render data and task parallelism impotent. Pipeline parallelism is ideal for accelerating GANs involving thousands of batches in each epoch by overlapping batches. In this paper, we describe and experimentally evaluate our novel and sophisticated pipelined parallel implementation of GANs that accurately classifies and generates new images with significant speed-up. Our main contributions are as follows: 1. Design and implementation of a novel and sophisticated pipelined parallel implementation of GANs. 2. Detailed experimental evaluation of our scheme on the standard MNIST and Fashion MNIST datasets demonstrating speed-up of up to 30% with an average speed-up of 20%. The rest of the paper is organized as follows. We discuss related work in Sect. 2, followed by our methodology and pipelined parallel algorithm in Sect. 3. We discuss our implementation in Sect. 4 followed by a hard-nosed experimental evaluation of our technique in Sect. 5. We present the results in Sect. 6 and conclude in Sect. 7.

2 Related Work

PipeDream [2] parallelizes deep neural networks by separating model layers into various stages, using an optimization algorithm to map each stage to a GPU. Every stage performs a forward propagation and passes the result to the next, and the loss is calculated at the final stage. This loss is propagated backward and the weights of the model are updated. PipeDream pipelines the training of internal layers; however, it does not pipeline the training process across multiple networks. GPipe [3] proposes a batch-splitting algorithm for fast training of large-scale deep neural networks. During forward propagation, mini-batches are divided into smaller batches and are pipelined over accelerators. Similarly, during backward propagation, the gradients from each smaller batch are aggregated for every mini-batch to update the parameters used in the model. Although this technique attempts pipelined parallelism in deep neural networks, GPipe does not consider adversarial settings peculiar to GANs. Another approach to specifically parallelizing the training of Convolutional Neural Networks (CNNs) [4] strives to maximize the overlap of communication with computation. It uses a thread once the gradients are computed, followed by concurrent communication of the data generated during backpropagation and computations from different layers. This approach is not focused on GANs.


All the earlier techniques are specific to pipelining or parallelizing within a single neural network and do not take into consideration any adversarial setting like that of a GAN, which separates our work from these earlier proposals. The earlier techniques, as discussed, focus mostly on performance enhancements while training the internal layers of the neural network. Our technique specifically resolves the difficult challenge of pipelining the training process itself.

3 Methodology

3.1 Training

Figure 1 shows the various stages of computation and evaluation used in the training of the GAN, with the input training data of images split into multiple mini-batches of size d. These stages, with their input and output, are outlined as follows:
• Discriminator Real Forward Propagation: Takes a mini-batch of real images and outputs a probability P_real that differentiates real from fake images.
• Gradient Compute Discriminator Real: Takes the probability P_real and determines the loss using the function in (1). This is followed by padding the computed loss with a cost function used to compute gradients for each discriminator layer.

L_discriminator_real(P_real) = 1/P_real    (1)

• Discriminator Weight Update Real: The discriminator’s weights are updated using gradients. • Generator Fake Forward Propagation: Forward propagation of the generator is performed by transforming a random noise into d images. These images are labeled as Fake Images. • Discriminator Fake Forward Propagation: Fake Images are passed to the discriminator which emits a probability Pfake .

Fig. 1 Stages of GAN training


• Gradient Compute Discriminator Fake: Gradients are computed using the probability P_fake based on the loss function specified in (2).

L_discriminator_fake(P_fake) = 1/P_fake    (2)

• Discriminator Weight Update Fake: Weights are updated based on gradients computed in the Gradient Compute Discriminator Fake stage. • Generator Forward Propagation: A new set of fake images is generated using the same noise used to train the discriminator in order to train the generator. • Discriminator Generator Forward Propagation: These images are passed to discriminator to emit probabilities. • Gradient Compute Generator: Probabilities are used to calculate gradients. • Generator Weight Update: Generator weights are updated based on the gradients computed. A total of 40 epochs are performed on the MNIST and Fashion MNIST datasets as the generated images reach a desirable quality at this point.

3.2 Pipelining Algorithm To implement pipelining in the GAN training process, the training data is split into mini-batches and the 11 stage pipeline as described in Sect. 3.1 is used for training. These overlapped executions of stages are interconnected in a pipeline structure to provide efficiency. Figure 2 shows the pipelined structure using a timing diagram for the abbreviated stages. A column denotes a point in time along with the units running at that instant. As multiple units run simultaneously, pipeline parallelism is harnessed. At the beginning, training data consisting of the images are split into mini-batches which are maintained in a queue called the BatchQ. A mini-batch from this queue is passed to stage 1 and in the next time step, it is propagated to stage 2. Simultaneously, the next mini-batch enters stage 1 and so on. The pipeline is full and runs at maximum capacity after 10 such iterations. The first 10 iterations are marked by the HandleStart() function in Algorithm 1. When all functional units are busy, mini-batches are sent to the functional unit queue one after another in a pipelined fashion. Once the mini-batch finishes all the stages of the functional unit queue, that mini-batch is popped from the BatchQ. These sequences of steps are done repeatedly for e number of times. Dependency analysis is performed in the kernel part of the pipeline. The results are shown in Fig. 3. Nodes marked in grey are Weight Updates which causes Read After Write (RAW) and Write After Write (WAW) dependencies. These dependencies occur during shared weight update that causes conflicts. These conflicts are resolved by explicit synchronization mechanisms such as locks or mutexes.
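A minimal sketch of this pipelining scheme is shown below (an illustrative reconstruction, not the authors' PipeGAN code): one thread is spawned per busy stage at each time step, batches are shifted one stage forward after every step, and a lock serializes the shared weight updates to resolve the RAW/WAW conflicts mentioned above. The stage functions are placeholders.

# Illustrative pipeline scheduler: mini-batches flow through the 11 stages with one
# thread per busy stage per time step; a lock guards the shared weight updates.

import threading

weight_lock = threading.Lock()          # resolves RAW/WAW conflicts on shared weights

def make_stage(name):
    def stage(batch):
        if "WeightUpdate" in name:      # weight updates must be serialized
            with weight_lock:
                pass                    # placeholder: update shared weights here
        return batch                    # pass the (transformed) batch onward
    return stage

STAGES = [make_stage(n) for n in [
    "DRealFwd", "GradDReal", "DWeightUpdateReal", "GFakeFwd", "DFakeFwd",
    "GradDFake", "DWeightUpdateFake", "GFwd", "DGenFwd", "GradG", "GWeightUpdate"]]

def run_pipeline(batch_queue):
    in_flight = [None] * len(STAGES)            # batch currently held by each stage
    while batch_queue or any(b is not None for b in in_flight):
        threads = []
        for i, batch in enumerate(in_flight):   # all busy stages run concurrently
            if batch is not None:
                threads.append(threading.Thread(target=STAGES[i], args=(batch,)))
        for t in threads: t.start()
        for t in threads: t.join()
        # shift every batch one stage forward and feed the next mini-batch into stage 0
        in_flight = [batch_queue.pop(0) if batch_queue else None] + in_flight[:-1]

run_pipeline([f"mini-batch-{i}" for i in range(4)])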


Fig. 2 Pipelining time diagram

Fig. 3 Dependence graph for functional units

4 Implementation In this section, we describe our implementation including the pipelined parallel implementation as described above. The generator and discriminator networks have architectures similar to traditional neural networks. Each layer consists of a collection of nodes that operate on a received weighted input and transforms it with a set of mostly non-linear functions. The layers then pass these values as output to the next layer that in turn performs similar actions. The generator consists of 11 layers where the activation function for 10 layers is LeakyRelu. The output activation function used is tanh. These are standard activation functions used in a generator model. The numbers of nodes in each layer are as follows—100, 128, 256, 512, 1024, 784. The discriminator consists of 4 layers where the activation function for 3 hidden layers is LeakyRelu. These are standard activation functions used in a discriminator model. The output activation function used is sigmoid. The numbers of nodes in each layer are as follows—784, 512, 256, 1.
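A plain-PyTorch reading of the layer sizes listed above is sketched below; it is an approximation for illustration (the text lists six node counts for the generator), not the authors' exact code.

# Generator 100 -> 128 -> 256 -> 512 -> 1024 -> 784 with LeakyReLU and a tanh output;
# discriminator 784 -> 512 -> 256 -> 1 with LeakyReLU and a sigmoid output.

import torch
import torch.nn as nn

def mlp(sizes, out_act):
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        # hidden layers use LeakyReLU; the final layer uses the given output activation
        layers.append(nn.LeakyReLU(0.2) if i < len(sizes) - 2 else out_act)
    return nn.Sequential(*layers)

generator = mlp([100, 128, 256, 512, 1024, 784], nn.Tanh())       # noise -> 28x28 image
discriminator = mlp([784, 512, 256, 1], nn.Sigmoid())             # image -> P(real)

noise = torch.randn(64, 100)            # a mini-batch of 64 noise vectors
fake_images = generator(noise)          # shape: (64, 784)
p_fake = discriminator(fake_images)     # shape: (64, 1)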


Algorithm 1: Pipelining Method
Input: Queue containing mini-batches (BatchQ), functional unit queue (FuncQ), HandleStart function, total epochs (e), number of mini-batches (num_batches), number of functional units (n), createThread and startThread functions to implement threading

length_pipe = n + num_batches - 1;
for epoch in range(e) do
    threads = [];
    for iter in range(length_pipe) do
        if iter < n then
            HandleStart();
        else if (len(BatchQ) > n) then
            BatchQ.pop();
            input = BatchQ(n);
            for (i in range(2, 9)) do
                threads.push(threadCreate(FuncQ[i]));
            end
            threads.push(threadCreate(FuncQ[1], input));
        else
            for (leftover in range(BatchQ)) do
                threads.push(threadCreate(FuncQ[leftover]));
            end
        end
    end
    startThreads();
end

We have used the standard Modified National Institute of Standards and Technology (MNIST) dataset having a large number of handwritten digits as well as Fashion MNIST dataset with 28 × 28 grayscale images, from 10 classes namely apparel types to further corroborate our results. Figures 4 and 5 depict generated images using these datasets. A serial, a non-pipelined parallel and pipelined parallel version of GAN is implemented using Python. The non-pipelined parallel version exploited vectorization and data parallelism wherever possible along with concurrent execution in the weight update phases. Two pipelined versions using NumPy and PyTorch implement the pipelined architecture as explained earlier. Pipelined NumPy version has the advantage of not requiring backward compatibility, and the Pipelined PyTorch implementation generates quality MNIST and Fashion MNIST images at even higher performance.


Fig. 4 MNIST Images generated by Pipelined Pytorch-GAN

Fig. 5 Fashion MNIST Images generated by Pipelined Pytorch-GAN

Fig. 6 Limitations in non-pipelined parallel implementation

5 Experimental Evaluation The experimental setup consists of a GPU enabled Google Colab with 12.72GB RAM, 68.4GB disk space, and a Python3 Google Compute Engine GPU having multiple cores to leverage the pipeline parallel execution. The implementation of non-pipelined parallel version shows that the overhead of spawning the threads not only steals any possible benefits but essentially takes more time compared to serial


versions, as depicted in Fig. 6, due to tight dependencies. However, pipelining can work even in the presence of dependencies by overlapping execution, leading to a significant amount of speed-up.

Fig. 7 Execution times of pipelined and serial NumPy and PyTorch versions

6 Results and Analysis

Figure 7 depicts the speed-up obtained by our pipelined NumPy and PyTorch implementations compared to the respective serial versions. We observe a significant performance gain of up to 30%, with an average speed-up of 23%, for NumPy. The serial PyTorch GAN implementation has some pre-existing optimizations, resulting in a relatively smaller speed-up compared to NumPy. However, our pipelined PyTorch version still gained significantly in performance, with a maximum speed-up of 23% and an average speed-up of 15% compared to the pre-optimized serial version in the context of PyTorch.

7 Conclusion

Generative Adversarial Networks (GANs) are growing in popularity, with extensive applications in a variety of learning techniques for generating as well as differentiating artefacts. However, their inherently sequential nature has inhibited parallelization of GANs in the past. We design and implement a novel and sophisticated pipeline-parallel GAN (PipeGAN) by dividing the training process into different stages. Our experimental evaluation demonstrates that the proposed pipeline-parallel technique achieves significant performance gains of up to 30% and 23%, with average speed-ups close to 23% and 15%, as compared to the serial implementations in the context of NumPy and PyTorch, respectively, while accurately classifying real and fake images from the standard MNIST and Fashion MNIST datasets. In the future, we intend to experiment with various buffering and caching techniques in our pipeline-parallel implementation in the quest for further performance gains.


References 1. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, 2672–2680 (2014) 2. Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N.R., Ganger, G.R., Gibbons, P.B., Zaharia, M.: PipeDream: generalized pipeline parallelism for DNN training. In: Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP ’19). Association for Computing Machinery, New York, NY, USA, pp. 1–15 (2019) 3. Huang, Y., Cheng, Y., Chen, D., Lee, H., Ngiam, J., Le, Q.V., Chen, Z.: Efficient Training of Giant Neural Networks using Pipeline Parallelism, GPipe (2018) 4. Lee, S., Jha, D., Agrawal, A., Choudhary, A., Liao, W.: Parallel deep convolutional neural network training by exploiting the overlapping of computation and communication. In: 2017 IEEE 24th International Conference on High Performance Computing (HiPC), Jaipur, pp. 183– 192 (2017)

Electroencephalogram-Based Classification of Brain Disorders Using Artificial Intelligence Laxmi Raja and R. Santhosh

Abstract Electroencephalogram (EEG) is a medically advanced screening technology currently used to classify various brain disorders and problems. In this paper, we have proposed a new framework for acquiring EEG signals so that it can be beneficial to various researchers of the field. Ag/AgCl electrodes are used to obtain EEG signals. Depending on the requirement of particular studies, different number of channels can be used. The electrodes are placed over the scalp using gel and signals obtained. The data is pre-processed to remove unwanted noise/disturbance. Dual tree complex discrete wavelet transform (DTCWT) was used to transform the data, so that redundancy is reduced to a minimum. Signals were classified using Gaussian mixture model (GMM). We present this model in detail which can be used for studies involving collection of EEG data in medical illnesses. Keywords Electroencephalogram · DTCWT · GMM

L. Raja (B) · R. Santhosh Department of CSE, Faculty of Engineering, Karpagam Academy of Higher Education, Coimbatore, India e-mail: [email protected] R. Santhosh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_82

1 Introduction

Electroencephalography (EEG) is an inexpensive and versatile tool which has been used for the last 85 years to investigate the electrical signals generated in the brain. Advancements in digital technology have made EEG cheap and user-friendly and have provided effective pre-processing and classification of signals which cannot easily be observed by the naked eye [1]. EEG has a very high temporal sensitivity and is used to evaluate cerebral functioning. The list of clinical uses of EEG is long, including evaluation of epilepsy, sleep disorders and brain death, and monitoring the depth of anesthesia. Most EEG signals range between 1 and 20 Hz and comprise bands such as alpha, beta, theta and delta [2]. In this study, EEG datasets of people with various brain disorders were obtained and used. The EEG recordings were obtained by placing scalp electrodes using the International 10–20 system (Fig. 1). The brain is inside the skull, covered by the scalp outside [3]. All the cells in the human body have a resting membrane potential and can produce electrical signals. So, an EEG records not only brain electrical activity but also extra-neuronal signals (Fig. 2); these are termed artifacts. As a result, pre-processing or filtering was used to remove the unwanted noise present in the EEG data, which lies around the range of 0.4–100 Hz. Moreover, electrical or power-line interference is usually present at around 50 Hz [4, 5]; this was removed using a notch filter. After pre-processing, the dual-tree complex wavelet transform (DTCWT) was used to extract features from the signals, and a Gaussian mixture model (GMM) classifier was used to classify the EEG signals to identify brain disorders.

Fig. 1 International 10–20 system of placing EEG electrodes and composite signal of EEG rhythms

Fig. 2 International 10–20 system of placing EEG electrodes and composite signal of EEG rhythms


2 EEG Signal Pre-processing

Pre-processing of EEG signals is a very significant step, required in order to eliminate unwanted data and interference present in the EEG signal. These unwanted data mixed with the original data are called noise, which can be both internal and external; they may lead to misinterpretation of the EEG and to wrong clinical decisions. Removing these unwanted artifacts or noise is the first step of pre-processing. The artifacts present in the signal can be classified into two types, namely physiological and extra-physiological. Artifacts or noise arising anywhere in the body except from the brain are called physiological noise. For example, the heart produces electrical activity which appears in the EEG and is more pronounced in short-necked individuals and people with artificial pacemakers; mechanical artifacts from the heart, such as the pulse artifact and the ballistocardiographic artifact, also exist. Sweat and slow roving eye movements create low-amplitude artifacts, and muscular activities like chewing, swallowing and facial movements can lead to noise. Extra-physiological noise covers all other noise generated from the remaining sources, such as equipment and environment. Electrode pops are a type of artifact caused by spontaneous discharge arising between the skin and the gel, and poor electrode placement also produces differences in impedance. Power-line interference is caused by the electrical connections present around the system and is embedded in the EEG data at 50–100 Hz [6]. Three commonly used filters are high- and low-frequency filters and notch filters. Low-frequency filters remove signals with low frequencies and allow high-frequency signals, so they are called high-pass filters; high-frequency filters are similarly called low-pass filters [7]. Notch filters filter out activity at a specific frequency instead of a range (see Fig. 3). Power-line interference was removed using a notch filter. Band-pass filters are those which allow only a certain range of frequencies to pass through. Wavelet-transform virtual instrumentation functions were used to remove the extra-physiological artifacts present in the signals. Here, time-domain signals were converted to the frequency domain; as a result, redundant data are removed and more accuracy is obtained (see Fig. 4).
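As an illustration of these pre-processing steps, the following SciPy sketch applies a 50 Hz notch filter followed by a band-pass filter; the sampling rate and cut-off values are assumptions, not taken from the paper.

# Hedged sketch of the filtering pipeline: 50 Hz notch for power-line interference,
# then a Butterworth band-pass for the EEG band of interest.

import numpy as np
from scipy.signal import iirnotch, butter, filtfilt

fs = 256.0                    # assumed sampling rate in Hz

def preprocess_eeg(raw, fs=fs, notch_freq=50.0, band=(0.5, 40.0)):
    # Remove power-line interference at 50 Hz with a narrow notch filter.
    b_notch, a_notch = iirnotch(notch_freq, Q=30.0, fs=fs)
    x = filtfilt(b_notch, a_notch, raw)
    # Keep only the EEG band (here 0.5-40 Hz) with a 4th-order Butterworth filter.
    b_band, a_band = butter(4, band, btype="bandpass", fs=fs)
    return filtfilt(b_band, a_band, x)

cleaned = preprocess_eeg(np.random.randn(10 * int(fs)))   # 10 s of dummy signal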


Fig. 3 Effect of 5 Hz filter

3 Feature Extraction by Dual-Tree Complex Wavelet Transform (DTCWT)

The wavelet transform is the process of decomposing a signal into wavelet (approximation and detail) coefficients. Data compression, motion estimation, classification and denoising are some of the problems which can be addressed using the wavelet transform, and it helps to preserve the symmetry, smoothness and shape that are important for obtaining correct coefficients. The DTCWT uses a pair of high-pass and low-pass wavelet filters at each scale, which results in complex wavelet coefficients with real and imaginary parts; this property is applied in areas of pattern recognition and signal processing [8, 9]. When the real signals need to be further processed or transformed back, an inverse transform is required, which is also provided by this transform. A representation of this method is given in Fig. 5, and datasets of denoised EEG signals are given in Fig. 6a and b.
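A possible sketch of DTCWT-based feature extraction is given below, assuming the open-source dtcwt Python package (the paper does not name a library); the choice of the mean absolute coefficient per level as the feature is also an assumption made for illustration.

# Sketch: forward 1-D DTCWT of an EEG segment and a simple per-level feature vector.

import numpy as np
import dtcwt

def dtcwt_features(signal, nlevels=4):
    transform = dtcwt.Transform1d()
    pyramid = transform.forward(signal.reshape(-1, 1), nlevels=nlevels)
    feats = []
    for level_coeffs in pyramid.highpasses:          # complex coefficients per level
        feats.append(np.mean(np.abs(level_coeffs)))  # magnitude summary of each level
    feats.append(np.mean(np.abs(pyramid.lowpass)))   # coarse approximation
    return np.array(feats)

features = dtcwt_features(np.random.randn(1024))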


Fig. 4 Artifact removal techniques

Fig. 5 Scheme of DTCWT


Fig. 6 a and b Datasets of denoised and real-time EEG signals

4 Gaussian Mixture Model Classifier

The two basic categories of classifiers are deterministic and statistical. Deterministic classifiers require initialization of the unlabeled parameters, and the search takes place only in the search space [10]. Statistical classifiers, on the other hand, consider only threshold values of density functions. Here, we deal with the Gaussian mixture model, an unsupervised learning method that can find patterns without using class labels [11, 12]. The expectation–maximization (EM) technique is used to find the number of data points that belong to each cluster [13–15]. The cluster means and covariances are then calculated, producing a cluster covariance for the signals used. By partitioning these clusters, we can differentiate and find the classification pattern (see Fig. 7). As a result, a standard mean and covariance matrix are developed, which helps to analyze the independent Gaussian distributions. Using these patterns, we can eventually distinguish the EEG signals of people with brain disorders from those of normal people.

Fig. 7 Distributed Gaussian models
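One way to realize this classification step is sketched below using scikit-learn's GaussianMixture (the paper does not name a library): one mixture is fitted per class, and a new recording is assigned to the class under which its features are more likely. The feature matrices are placeholders.

# Sketch: per-class GMMs fitted with EM, then classification by log-likelihood.

import numpy as np
from sklearn.mixture import GaussianMixture

X_normal = np.random.randn(100, 5)            # placeholder features of healthy subjects
X_disorder = np.random.randn(100, 5) + 2.0    # placeholder features of patients

gmm_normal = GaussianMixture(n_components=3, covariance_type="full").fit(X_normal)
gmm_disorder = GaussianMixture(n_components=3, covariance_type="full").fit(X_disorder)

def classify(x):
    # score_samples returns the log-likelihood of x under each fitted mixture
    ll_normal = gmm_normal.score_samples(x.reshape(1, -1))[0]
    ll_disorder = gmm_disorder.score_samples(x.reshape(1, -1))[0]
    return "disorder" if ll_disorder > ll_normal else "normal"

print(classify(np.random.randn(5) + 2.0))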

5 Conclusion

The scope of this paper is the real-time analysis of people with brain disorders. With an exponentially increasing population and a parallel increase in disorders, it is very difficult to expect manual diagnosis and personal inspection for each person. Moreover, some minute patterns will be missed by human eyes due to the extent of the data for each person. As a result, analyzing the EEG signals of such people seems to be the best solution. In this paper, we conclude that the combination of the DTCWT and a Gaussian mixture model classifier seems best suited for this purpose.

References 1. Teplan, M.: Fundamentals of EEG measurement. Measur. Sci. Rev. 2, Section 2 (2002) 2. Britton, J.W., Frey, L.C., Hopp, J.: Electroencephalography (EEG): an introductory text and atlas of normal and abnormal findings in adults, children, and infants. American Epilepsy Society. Chicago (2016) 3. Fahmie, M., Bin, I., Rodzi, M.: EEG Acquisition Using Labview. Faculty Electronics and communication Engineering, Kolej University Teknikal Kebangsaan, Malaysia, May 2006 4. Adalarasu, K.: Detection of early onset of driver fatigue using multimodal bio signal. Department of biotechnology, Indian institute of technology, Chennai India, February 2010 5. Arman, S.I., Ahmed, A., Syed, A.: Cost-effective EEG signal acquisition and recording system. Int. J. Biosci. Biochem. Bioinform. 2(5) (2012) 6. Khatwani, P., Tiwari, A.: A survey on different noise removal techniques of EEG signals. Int. J. Adv. Res. Comput. Commun. Eng. 2(2). ISSN 2319-5940 (2013) 7. Gurumurthy, S., VudiSai Mahit, Ghosh, R.: Analysis and simulation of brain signal data by EEG signal processing technique using MATLAB. Int. J. Eng. Technol. (IJET) 5(3), ISSN 0975-4024 (2013) 8. Kingsbury, N.: The dual tree complex wavelet transform: a new technique for shift invariance and directional filters. University of Cambridge, Cambridge CB2 1PZ 9. Slimen, I.B., Boubchir, L., Mbarki, Z., Seddik, H.: EEG epileptic seizure detection and classification based on dual-tree complex wavelet transform and machine learning algorithms. J. Biomed. Res. 34(3), 151–161. https://doi.org/10.7555/JBR.34.20190026 10. Cao, M.: Practice on classification using gaussian mixture model course project report for COMP-135 (2010) 11. Lakshmi, R., Prasad, T.V., Prakash, C.: Survey on EEG signal processing methods. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4(1). ISSN 2277-128X (2014) 12. Raj, A., Deo, A., Kumari, M., Tripathi, S.: A review on automated detection, classification and clustering of epileptic EEG using wavelet transform and soft computing techniques. Int. J. Innov. Res. Sci. Eng. 17. ISSN 2347-320 (2016) 13. Patel, R.: A real time frequency analysis of the electroencephalogram using lab view. A Thesis Submitted to the Faculty of New Jersey Institute of Technology in Partial Fulfillment of the Requirements for the Degree of Master of Science in Biomedical Engineering, Department of Biomedical Engineering, January 2002 14. Varunadikkarapatti, V.: Optimal EEG channels and rhythm selection for task classification. A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Masters of Science in Engineering, Madras University, India (2004) 15. Raja, L., Arunkumar, B.: A comparative study of various artificial neural network classifiers for EEG based autism spectrum disorder diagnosis. J. Adv. Res. Dyn. Control Syst. 11(1) (2019)

Parallel Antlion Optimisation (ALO) and Grasshopper Optimization (GOA) for Travelling Salesman Problem (TSP) G. R. Dheemanth, V. C. Skanda, and Rahul Nagpal

Abstract We present our parallel high-performance version of the adapted Antlion and Grasshopper meta-heuristic algorithms to solve the Travelling Salesman problem. Our detailed experimental evaluation reveals significant improvements over the traditional Genetic and Ant-Colony based solutions in both accuracy and speed with a performance gain of up to 4× thereby making it possible to solve Travelling salesman problems for a large number of cities. Keywords ALO · GOA · ACO · TSP · Combinatorial · Optimisation · Parallelism

G. R. Dheemanth · V. C. Skanda · R. Nagpal (B) Department of Computer Science and Engineering, PES University, Bengaluru, India e-mail: [email protected] G. R. Dheemanth e-mail: [email protected] V. C. Skanda e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Lecture Notes in Networks and Systems 171, https://doi.org/10.1007/978-981-33-4543-0_83

1 Introduction

The Travelling Salesman Problem (TSP) is to find the shortest path that goes through all the cities and returns to the first city in the path, given the direct path length between each pair of cities. This problem has no known exact optimal polynomial-time algorithm and is NP-complete. Many heuristic and meta-heuristic algorithms strive to find near-optimal solutions; meta-heuristic algorithms iteratively generate a vast number of random solutions and search for global optima. Ant Colony Optimisation (ACO) [5] and the Genetic Algorithm (GA) [4] have been applied widely to solve TSP. In this paper, we propose our adapted ALO and GOA algorithms to solve TSP. We have also developed parallel versions of these algorithms and have evaluated them against the earlier proposed ACO and GA. Our experimental evaluation reveals that our proposed ALO and GOA


algorithms are faster and more accurate in finding a solution to TSP even for a large number of cities. Our main contributions are as follows: 1. We adapted ALO and GOA targeting TSP; to the best of our knowledge, this is the first attempt in this direction. 2. We have additionally developed and implemented parallel versions of our ALO and GOA for TSP. 3. We performed a detailed hard-nosed experimental evaluation of the accuracy of our algorithms as well as the speedup of the parallel versions of our ALO and GOA, which shows a significant gain in both accuracy and performance as compared to the earlier state-of-the-art ACO and GA algorithms. The rest of the paper is organized as follows. We explain related work in Sect. 2, followed by our ALO and GOA algorithms as adapted to solve TSP in Sects. 3 and 4, respectively, including parallel versions of these algorithms. We analyze our results in Sect. 5 and conclude in Sect. 6 with a mention of future directions for this work.

2 Related Work
The Held–Karp dynamic programming solution [1, 2] brings the O(n!) time complexity down to O(n^2 · 2^n), with a space complexity of order O(n · 2^n), and is one of the best known exact solutions to TSP. However, the algorithm is still exponential, with a space requirement that grows exponentially with the number of cities, so this solution quickly becomes prohibitive even at 25–30 cities. Therefore, meta-heuristics such as ACO [5] and GA [4] have been applied to TSP. GA evolves a collection of possible solutions, known as the phenotype, towards a better solution by altering and rearranging solutions and selecting the fittest solutions for the next generation based on the principle of "survival of the fittest". However, GA tends to converge frequently towards local optima, thereby missing the optimal solution. ACO exploits an ant's capability to find the shortest path to a destination containing food, with each path travelled by an ant associated with a pheromone trail that aids in path tracing. The intensity of the pheromone trail is proportional to the quality of a sub-path, a property that is used to build better solutions while each ant decides which sub-path to take next. However, ACO does not scale well and is therefore not efficient for large-scale combinatorial optimization problems, such as solving TSP for a large number of cities. ALO mimics the interaction of antlions and their prey, which are ants, with various random operators used to find optimal solutions; it has recently been used to successfully solve a variety of engineering problems, including training neural networks. GOA is a population-based algorithm that mimics the movement of a swarm of grasshoppers and their social interaction, where each grasshopper in the swarm represents a solution to the optimization problem. In contrast to earlier applications of ALO and GOA, which are primarily in the context of continuous optimization problems, we have adapted and parallelized ALO and GOA for discrete optimization problems.
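For reference, the following minimal Held–Karp sketch (illustrative only, not code from this paper) makes the complexity argument above concrete; dist is assumed to be an adjacency matrix of pairwise path lengths.

from functools import lru_cache

def held_karp(dist):
    """Exact TSP tour length via Held-Karp dynamic programming.
    dist[i][j] is the direct path length between cities i and j."""
    n = len(dist)

    @lru_cache(maxsize=None)
    def best(mask, last):
        # mask: bitmask of visited cities (city 0 is the fixed start);
        # last: the city we are currently at.
        if mask == (1 << n) - 1:          # all cities visited: close the tour
            return dist[last][0]
        res = float("inf")
        for nxt in range(n):
            if not mask & (1 << nxt):     # try every unvisited city
                res = min(res, dist[last][nxt] + best(mask | (1 << nxt), nxt))
        return res

    return best(1, 0)                      # start the tour at city 0

The memo table holds O(n · 2^n) states, so even n = 30 already needs roughly 3 × 10^10 entries, which is why the exact approach becomes prohibitive at a few dozen cities.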

3 Adaptation of ALO Algorithm to TSP
ALO makes use of random operators to avoid local optima, in contrast to other meta-heuristic algorithms. In this paper, we propose a novel discrete version that uses an array of cities to model the TSP path, in contrast to the proposal in [3], which is targeted at optimizing continuous parameters and cannot be used to solve TSP. In our implementation, as outlined in Algorithm 1, each ant and antlion is a permutation of the cities, and fitness is the total path length of the array of cities. Ants randomly walk in a bounded area (the search space), represented as an array of numbers initialized to the maximum distance between two cities (the upper bound). We propose a "random permutation of cities" as the random walk. Since a random permutation can take an ant out of the search space, we also propose a new normalization function that tries to bring the maximum number of inter-city distances within the search space. The normalization function iterates through the cities in the path: if the distance between two cities i and j is greater than the upper bound, a city k is found such that the distance between i and k is less than the distance between i and j, and then k and j are swapped. This process repeats until the end of the array is reached. During the random walk, ants can sometimes fall into the trap of an antlion, which is simulated using a roulette wheel to select the antlion that traps an ant; the roulette wheel gives more preference to fitter antlions. To simulate elitism, which is a salient feature of ALO, the antlion with the minimum fitness, i.e. minimum path length, is taken as the better solution (the elite). Since the elite antlion makes the best traps and affects all ants, after the random walk of an ant its position is updated taking into consideration both the elite antlion and the antlion randomly selected using the roulette wheel. This updated path can sometimes violate the TSP path by containing a city more than once, so a function was developed to correct the path. To simulate the sliding of the ants towards the antlion, the search space is gradually reduced as the number of iterations increases.
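As an illustration of the random walk, normalization and fitness described above, the following is a minimal Python sketch under our reading of the text; dist is the adjacency matrix, upper_bound is the current search-space bound, and the helper names are ours, not the authors'.

import random

def random_walk(path):
    """Random walk of an ant: a random permutation of the city array."""
    new_path = path[:]
    random.shuffle(new_path)
    return new_path

def normalize(path, dist, upper_bound):
    """Try to bring as many consecutive inter-city distances as possible
    within the current search-space bound by swapping in closer cities."""
    path = path[:]
    for i in range(len(path) - 1):
        if dist[path[i]][path[i + 1]] > upper_bound:
            # find a later city k that is closer to city i than its successor j
            for k in range(i + 2, len(path)):
                if dist[path[i]][path[k]] < dist[path[i]][path[i + 1]]:
                    path[i + 1], path[k] = path[k], path[i + 1]   # swap k and j
                    break
    return path

def fitness(path, dist):
    """Fitness of an ant or antlion: total length of the closed tour."""
    return sum(dist[path[i]][path[(i + 1) % len(path)]] for i in range(len(path)))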

3.1 Parallel ALO for TSP
We have observed that ALO, as described in Algorithm 1, can be effectively parallelized either to improve accuracy by increasing the number of search agents within a fixed time budget, or to improve speed for a fixed number of search agents, or any suitable combination of both based on the available budget. We have capitalized on these observations in our implementation by parallelizing the initialization of the search agents as well as the calculation of their fitness, since these steps have no inter-agent dependencies.
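A minimal sketch of this kind of agent-level parallelism is shown below, using Python's multiprocessing as one possible backend; the paper does not name the framework actually used, so this is an assumption for illustration only.

from multiprocessing import Pool
import random

def random_tour(n_cities):
    """Independent initialization of one search agent (a random permutation)."""
    tour = list(range(n_cities))
    random.shuffle(tour)
    return tour

def tour_length(args):
    tour, dist = args
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def evaluate_population(population, dist, workers=8):
    """Fitness of every ant/antlion computed in parallel; the agents are
    independent, so this step is embarrassingly parallel."""
    with Pool(workers) as pool:
        return pool.map(tour_length, [(tour, dist) for tour in population])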


Algorithm 1: Pseudocode of the ALO algorithm
Input: graph in the form of an adjacency matrix, list of lower and upper bounds
Output: path length of the optimal TSP tour
  Initialise the first population of ants and antlions randomly
  for each ant in ants do
    calculate fitness
  end
  for each antlion in antlions do
    calculate fitness
  end
  repeat
    for each ant in ants do
      select an antlion using the roulette wheel
      update the upper bound
      generate a random permutation (random walk)
      update the position of the ant
    end
    for each ant in ants do
      calculate fitness
    end
    if fitness(elite) is greater than fitness(maxFitness(ants)) then
      update elite
      update leastFitness(antlions) with elite
    else
      continue
    end
  until the end criterion is satisfied

4 Adaptation of GOA Algorithm to TSP
GOA is a population-based meta-heuristic algorithm in which each grasshopper represents a solution. In contrast to [6], which targets continuous optimization, we have proposed and implemented a discrete version of GOA that uses an array of cities to represent the TSP path, with each grasshopper path initialized as a random permutation of the cities. Fitness is modelled as the total path length, i.e. the sum of the weights of the edges in the path, and the grasshopper with the minimum fitness, or minimum path length, is considered the best agent. The overall process, as described in Algorithm 2, is as follows. First, the grasshoppers are initialized with random permutations of the cities. A new grasshopper-update function for the TSP problem is also introduced, in contrast to what is proposed in [6]. The modification of a grasshopper's position depends on three criteria: (a) the current position of the grasshopper, (b) a random grasshopper's position, and (c) the best grasshopper's position. The random grasshopper's position simulates the function s, which calculates the social forces as in [6]. The main interactions between grasshoppers are attractive and repulsive forces. These interactions are modelled as a comfort zone around every grasshopper where the repulsive force is greater than the attractive force. A parameter c is used to represent this comfort zone. Initially, this parameter is high, allowing the grasshoppers to explore large parts of the search space. Over the iterations, the value of the parameter is reduced, leading to the movement and convergence of the grasshoppers.
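The paper does not spell out the exact discrete update operator, but a segment-copy update along the following lines captures the three ingredients (the current tour, a random grasshopper's tour and the best tour) together with the shrinking comfort parameter c; this is an illustrative sketch, not the authors' code.

import random

def update_grasshopper(current, rand_gh, best_gh, c):
    """Discrete position update driven by the current tour, a random
    grasshopper's tour (social forces) and the best tour found so far.
    The comfort parameter c in (0, 1] shrinks over the iterations, so early
    updates explore (lean on the random grasshopper) while later updates
    converge (lean on the best grasshopper)."""
    n = len(current)
    guide = rand_gh if random.random() < c else best_gh

    # copy a random segment from the guide tour, then fill the remaining
    # positions with the other cities in their current order (keeps the
    # result a valid permutation, i.e. a valid TSP path)
    i, j = sorted(random.sample(range(n), 2))
    segment = guide[i:j + 1]
    rest = [city for city in current if city not in segment]
    return rest[:i] + segment + rest[i:]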

4.1 Parallel GOA for TSP
The parallelization strategy used for GOA is similar to that used for ALO: both the initialization of the search agents and the fitness calculation are parallelized.

Algorithm 2: Pseudocode of the GOA algorithm
Input: graph in the form of an adjacency matrix, list of lower and upper bounds
Output: path length of the optimal TSP tour
  Initialize the first population of grasshoppers randomly
  for each grasshopper in the swarm do
    calculate fitness
  end
  T = best search agent
  repeat
    for each search agent do
      update the position of the current search agent
      calculate fitness
    end
    update T if there is a better solution
  until the end criterion is satisfied


5 Performance Evaluation
Figure 1 compares the accuracy of ALO, GOA, ACO and GA with reference to the Held–Karp dynamic programming algorithm for progressively larger numbers of cities. We observe that our GOA and ALO algorithms perform best in accuracy, whereas the Genetic Algorithm (GA) performs worst. Figure 2 depicts the speedup of the parallel versions of ALO, GOA, ACO and GA compared to the corresponding serial versions. Our ALO and GOA algorithms are on average 1.5× and 4× faster than their serial versions, respectively, on our system with a Ryzen 7 1700 8-core CPU, 16 GB of memory and an Nvidia GTX 1050 Ti GPU. There is little speedup for most of the algorithms on a small number of cities because the overhead of thread spawning steals away any parallelism benefit.

Fig. 1 Comparison of accuracy of various algorithms

Fig. 2 Comparison of time taken by the serial and parallel versions of the algorithms on a Ryzen 7 with a GTX 1050 Ti

6 Conclusion and Future Directions
In this paper, we presented our adapted ALO and GOA algorithms for TSP, along with exclusively designed and developed parallel high-performance versions of these algorithms. We implemented serial and parallel versions of ALO and GOA and experimentally evaluated their accuracy and performance against other well-known algorithms. Our experimental results revealed that our ALO and GOA algorithms perform better in terms of accuracy as well as performance compared to the other algorithms, and are on average 1.5× and 4× faster than their serial versions, respectively. In the future, we plan to implement and evaluate these algorithms in a distributed environment.

References
1. Bellman, R.: Dynamic programming treatment of the travelling salesman problem. J. ACM (JACM) 9(1), 61–63 (1962)
2. Held, M., Karp, R.M.: A dynamic programming approach to sequencing problems. J. Soc. Ind. Appl. Math. 10(1), 196–210 (1962)
3. Mirjalili, S.: The ant lion optimizer. Adv. Eng. Softw. 83, 80–98 (2015)
4. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA, USA (1996)
5. Moyson, F., Manderick, B.: The Collective Behavior of Ants: An Example of Self-organization in Massive Parallelism. Vrije Universiteit Brussel, Artificial Intelligence Laboratory (1988)
6. Saremi, S., Mirjalili, S., Lewis, A.: Grasshopper optimisation algorithm: theory and application. Adv. Eng. Softw. 105, 30–47 (2017). https://doi.org/10.1016/j.advengsoft.2017.01.004

Design and Development of Machine Learning Model for Osteoarthritis Identification
Naidu Srinivas Kiran Babu, E. Madhusudhana Reddy, S. Jayanthi, and K. Rajkumar

Abstract The calcaneus, or heel bone, is one of the strongest and largest bones in the human body and is located in the foot. It allows the foot to flex during normal walking movements. In recent years, many people in the age group of 35 to 50 years have fallen victim to osteoarthritis (calcaneal shift), which causes continuing serious impacts. Such diseases lead to ailments of the knee. If the ache in the knee is not relieved by physiotherapy or medication, the affected person may be confined to bed and have to undergo a calcaneal osteotomy. This motivates the development of a user-friendly application that loads calcaneus images into a model developed for calcaneal osteotomy. In general, machines are available to predict the occurrence of calcaneal shift, but they fail to predict and analyze its other subtypes in the foot. This research work focuses on developing a model to predict and analyze the subtypes of calcaneal shift occurring in the foot.
Keywords Calcaneal shift · Deep convolution neural network · Osteotomy · Osteoarthritis · Binary classification



1 Introduction
Osteoarthritis is a type of arthritis that affects a large number of people worldwide. It occurs when the cartilage at the ends of the bones wears out over time. In some cases, osteoarthritis may also occur due to irregular foot mechanics, such as flat feet (pes planus) or high arches. Flatfoot is a common foot condition that affects patients from all walks of life in India. The musculotendinous component is one of the important components of pes planus, and it can be aggravated by deficiencies in the circulatory system or by increased weight loading. Flatfoot is caused by a deterioration of the medial longitudinal arch of the foot and is usually associated with hindfoot valgus and abduction of the forefoot, as shown in Fig. 1.

Fig. 1 External symptoms of pes planus: a collapse of the fallen medial arch, b malalignment of the hindfoot and c forefoot abduction

No consistent method has been developed to treat flatfoot malalignment. Conservative treatment, such as orthoses and immobilization, and surgical correction are used to abate the symptoms. The surgeries used to approach flatfoot malalignment may consist of combinations of tendon transfers, medial column plantar flexor osteotomies, stabilization of the medial column and different hindfoot osteotomies [1, 2]. There is agreement among surgeons on which surgical procedures to perform and follow. The commonly followed hindfoot osteotomy methodologies are the medializing calcaneal osteotomy, the Evans calcaneal osteotomy and calcaneocuboid distraction. In recent years, the calcaneal Z-osteotomy has been adopted to approach seriously deformed flatfoot, but this method has not been reviewed and discussed objectively in this research work.

There are diverse options available to study the deformity of flatfoot and the surgical corrections used to treat it. The first option is to observe human subjects, but this is time consuming and costly, and there is only a limited number of measurements available for treating flatfoot without harming the patient. Moreover, it is very tricky to make direct comparisons between different medical procedures because of the confounding variables generated by the treatments given to specific patients [3–6]. Computational models are significantly influential, but they are restricted to a certain degree by the robustness of their development and the rigorousness of their validation; a number of issues, such as anatomical accuracy, tissue mechanical properties and biofidelic boundary conditions, arise in such models. Cadaveric models are efficient tools that have often been adopted to investigate more invasive biomechanics. In cadaveric models, in vitro lower limbs are loaded in a physiological simulation, which presents an efficient way to explore foot mechanics and diverse treatment methods [7–9]. It permits measuring parameters that would not be ethically feasible on living human beings; for example, quantifying motion with bone pins is a very difficult task with living human beings. Although various studies have been carried out on flatfoot, most of the research is quasi-static (or static) and examines the forefoot at midstance, or at selected specific locations in stance. Dynamic gait simulators are recently developed simulators that enable researchers to perform completely dynamic simulations with cadaveric specimens [6, 10].

2 Literature Review
After an extensive review of the related work, it is observed that the aim of the statistical model used is to enhance the efficiency of a computational foot system by using artificial neural networks. Human joints are used in computational foot modeling to observe joint deformation and kinematics and also to study how joint function is affected by joint structure. More particularly, research works carried out using these computational foot models have studied many topics, including joint motion and the positions of the related bones under simulated load, the forces exerted on joints by injury or day-to-day routine activities, and the positioning of hardware for correcting a defect.

A. Evaluating the calcaneal load from the footprint of a human standing upright using a 3D scanner. This work carried out extensive research to find the relationship between the footprint load and the footprint depths in the calcaneal area of a human standing upright. The footprint depths, which are the deformations in the calcaneal area, were obtained by extracting z-values from 3D scans of the foot. In this study, a force-sensing resistor sensor was placed over the shoe in the calcaneal area, and the peak loads were then estimated from the footprint. Twenty patients were selected to carry out these findings [1, 11, 12].


In this study, a notable difference was observed in calculating the calcaneal loading due to the plantar foot position of the patients. It was observed that a plantar foot position that bends toward the front, back or side also affects the result. A 3D scanner can be used to estimate the calcaneal loading during a standing posture. The benefit of this method is that it calculates the pressure or load at the part of the footprint that contacts the surface, using the desired footprint depth location instead of the maximum footprint.

B. Experimental results of the calcaneal lengthening osteotomy to approach pes planovalgus and evaluate the position of the foot. The efficiency of calcaneal lengthening was evaluated using an adapted Evans osteotomy method for handling pes planovalgus and restoring the normal position of the foot. The technique was carried out using the adapted Evans technique on 11 patients of different age groups with pes planovalgus distortion. Five patients had cerebral palsy, one had sequelae of myelomeningocele, one had sensorimotor polyneuropathy, and four were assessed as idiopathic. Of the 11 patients, ten had undergone long-term conventional therapy preoperatively, while one had not undergone any surgery to cure the distortion of the foot [6, 10, 13]. Clinical assessment was performed on ten different parameters, and radiographic assessment on seven different parameters on standard anteroposterior and other radiographs; these assessments were carried out over 15 months. After the thorough assessment, it was observed that union was attained in the patients after seven weeks. The clinical result was excellent in 17 feet, good in 3 feet, fair in 1 foot and poor in 1 foot. Radiographically, five feet were assessed as excellent, 13 as good and four as fair. The calcaneal lengthening was 7.3 mm, and no distortion or overcorrection occurred during or after the treatment. Also, before the surgery, five of the patients were able to walk on the heel of their foot without any support [6, 9].

C. Deep CNN model for the classification of sentences. The deep CNN model used for sentence classification has three filter region sizes (2, 3 and 4), with two filters for each size, as shown in Fig. 2. The filters in this model perform convolution over the sentence matrix and produce feature maps of variable length. Following this step, 1-max pooling is carried out over each of the maps. From the six maps, univariate features are created and merged to formulate a feature vector for the penultimate layer. Subsequently, this feature vector is sent as input to the final softmax layer to perform the classification; since binary classification is performed here, it produces two outputs, normal and defected [10, 14].
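The following is a minimal Keras sketch of such a multi-region-size text CNN; it is illustrative only (not the code of [14] or of this work), and the vocabulary size, embedding dimension and sentence length are assumed values.

import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size, embed_dim, max_len = 5000, 128, 64      # assumed hyperparameters

inp = layers.Input(shape=(max_len,), dtype="int32")
emb = layers.Embedding(vocab_size, embed_dim)(inp)          # sentence matrix

pooled = []
for region_size in (2, 3, 4):                               # three region sizes
    conv = layers.Conv1D(filters=2, kernel_size=region_size,
                         activation="relu")(emb)            # two filters per size
    pooled.append(layers.GlobalMaxPooling1D()(conv))        # 1-max pooling

features = layers.Concatenate()(pooled)                     # six univariate features
out = layers.Dense(2, activation="softmax")(features)       # normal vs. defected

model = Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])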


Fig. 2 Illustration of the deep CNN model for sentence classification [14]

3 Methodology
The research work on the proposed method is partitioned into four parts, namely data collection, data preprocessing, fusion and feature extraction, and pattern classification. First, in the data collection phase, images are collected from individuals. In the second phase, the images are preprocessed and a deep convolutional neural network is applied to them. In the third phase, the important features are extracted from the images. During the final phase, the softmax algorithm, a pattern recognition algorithm, is used to recognize the patterns as either normal or defected.
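As an illustration of this four-phase pipeline, the following is a minimal Keras sketch; it is not the authors' implementation, and the image size, directory layout (data/normal, data/defected) and network depth are assumptions made only for the example.

import tensorflow as tf
from tensorflow.keras import layers, models

# Phase 1: collected calcaneus images organised by class on disk (assumed layout)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data", image_size=(224, 224), color_mode="grayscale", batch_size=16)

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(224, 224, 1)),   # phase 2: preprocessing
    layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),                      # phase 3: feature extraction
    layers.Dense(2, activation="softmax"),                    # phase 4: normal vs. defected
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)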


Fig. 3 Deep CNN softmax model for classification of calcaneal image

4 Development of a Novel Classification Model
The goal of developing the novel classification model is to take a collection of sample test-case inputs and provide reliable coverage at an identifiable depth in the test space. This leads to a set of test cases focused on triggering the functionality of the foot independent of the model used for the implementation. This softmax model based on a deep CNN can be divided into three steps:
1. Defining the operational scope of the softmax model. This step includes the data acquisition and preprocessing stages of osteoarthritis/flatfoot identification.
2. Identifying and enumerating the attributes in the images and their values, respectively. This step includes analyzing the preprocessed image, which is fed as input to the CNN softmax algorithm.
3. Applying the deep CNN softmax model to the data/images and classifying the images as either normal or defected (Fig. 3).
The detailed architectural model for the automatic detection of flatfoot images and the prediction of results is shown in Fig. 4. It is similar to a digital stain that identifies the image regions that are required and most relevant for classifying the flatfoot as either normal or defected. This research work proposes a model for the classification of flatfoot using the CNN softmax algorithm. The efficacy of this model will be tested on varying foot images to detect distortion, and its efficiency parameters will be evaluated by comparison with comparative algorithms.

Fig. 4 Architectural model for classification of a calcaneal image using the softmax algorithm (pipeline blocks in the figure: image, image representation via a convolutional auto encoder, and a softmax classifier producing a visual interpretable prediction)

5 Importance of Proposed Research
The proposed research on developing a novel classification model is intended to predict and approach calcaneal shift in the foot of flatfoot patients. This model helps to reduce the time complexity, and the accuracy of the prediction result is also better compared to the conventional method of predicting or measuring the calcaneal shift. As it takes less time and is accurate in predicting the calcaneal shift, patients can undergo preventive measures in the early stages to avoid difficult effects on joints such as the hip, pelvis, knee and spine. This method can therefore be used to predict the flatfoot problem before the deformity occurs.

6 Conclusion
Calcaneal lengthening osteotomy is used for pain relief and notable clinical and radiographic improvement in the forefoot and hindfoot for symptomatic pes planovalgus. Different feeding techniques can be implemented to further enhance the model. Further work can be carried out to detect the congenital anomalies related to this calcaneal shift. The program code can also be adapted to design standards other than the Indian Standard by incorporating the necessary modifications.

References
1. Albon, T.: Plantar force distribution for increasing heel height within women's shoes. Physics, The College of Wooster, Wooster, Ohio, December 2011
2. Wibowo, D.B., Gunawan, D.H., Agus, P.: Estimation of foot pressure from human footprint depths using 3D scanner. AIP Conf. Proc. 1717 (2016)


3. Wright, R.W., Boyce, R.H., Michener, T., Shyr, Y., McCarty, E.C., Spindler, K.P.: Radiographs are not useful in detecting arthroscopically confirmed mild chondral damage. Clin. Orthop. Relat. Res. 245–25 (2006)
4. Urry, S., Wearing, S.: Arch indexes from ink footprints and pressure platforms are different. Foot 15(2), 68–73 (2005)
5. Hunt, A.E., Fahey, A.J.: Static measures of calcaneal deviation and arch angle as predictors of rearfoot motion during walking. Aust. J. Phys. 46, 9–17 (2000)
6. Hsu, T.C., et al.: Comparison of the mechanical properties of the heel pad between young and elderly adults. Arch. Phys. Med. Rehabil. 79, 1101–1104 (1998)
7. Barrett, S.L., O'Malley, R.: Plantar fasciitis and other causes of heel pain. Am. Fam. Phys. 59(8), 2200–2206 (1999)
8. Nass, D., Hennig, Treek, V.: The thickness of the heel pad loaded by bodyweight in obese and normal weight adults. Biomechanics Laboratory, University of Essen, Germany, D 45117 (2000)
9. Pinto, C.C., Marques, M., Ramos, N.V., Vaz, M.A.P.: 3D modelling for FEM simulation of an obese foot. ResearchGate Conference Paper, January 2010
10. Filardi, V.: Flatfoot and normal foot a comparative analysis of the stress shielding. 15(3), 820–825 (2018)
11. Baldonado, M., Chang, C.-C.K., Gravano, L., Paepcke, A.: The Stanford digital library metadata architecture. Int. J. Digit. Libr. 108–121 (1997)
12. ScanPod3D: 3D Scanner Mini and Scansoft for Foot Orthotic. Vismach Technology Ltd. www.scanpod3d.com (2013)
13. Lee, D.G., Davis, B.L.: Assessment of the effects of diabetes on midfoot joint pressures using a robotic gait simulator. Foot Ankle Int. 30(8), 767–772 (2009)
14. Kim, J.: Convolutional neural network (CNN) perform text classification with word embeddings. In: Towards Data Science, Dec 3, 2017

Author Index

A Aathira, M., 307 Abbas, Junaid, 527 Abimannan, Satheesh, 429 Abraham, Mathews, 473 Aditya Sai Srinivas, T., 43 Afnan, Ayesha, 527 Agarwal, Mohit, 1 Agrahari, Neeraj Kumar, 215 Ahire, Deepak, 341 Ahir, Hemal, 573 Akhilesh, N. S., 555 Akhil, K., 63 Amrita, I., 597 Anand, Nikhil, 457 Aniruddha, M. N., 555 Anirudh, R. V., 317 Antonidoss, A., 483 Appalanaidu, Majji V., 515 Aravind, Karrothu, 43 Arya, S. J., 633 Ashok Kumar, S., 19, 27 Asish, A., 633 Awasthi, Lalit Kumar, 651

B Babu, Brundha Rajendra, 317 Babu, Naidu Srinivas Kiran, 795 Balakesava Reddy, P., 53 Bala, Shashi, 493 Begum, Gousiya, 269 Bhandari, Smriti, 341 Bhandigare, Shivani, 643 Bhardwaj, Vivek, 195, 493 Bharti, Shubam, 153 Bhat, Prashant, 547, 565 Bhat, Aruna, 161 Bhatia, Rajesh, 153 Bhattacharyya, Koushik, 737 Bhosale, M. S., 205 Budyal, Rahul Rajendrakumar, 403

C Chandan, Rithvik, 769 Chegaraddi, Sangeetha S., 403 Chhabra, Bhavya, 153 Choubey, Rishita, 737 Choudhary, Mukesh, 87 Cross, Maria Anisha, 277

D Das, Arijit, 161 Devarapalli, Danny Joel, 259 Dheemanth, G. R., 787 Dilli Babu, S., 749 Dubey, Nishita, 701

E Eapen, Justin, 297 Emon, Ismail Siddiqi, 363, 761

F Faizul Huq Arif, Md., 363 Febi Shine, B. S., 633 Fernandes, Chelsea, 701



G Gajbhiye, Snehal, 333 Gajjar, Sachin, 693 Garg, Deepak, 215 Ghosh, Anirban, 555 Godwin Barnabas, S., 327 Gopalakrishnan, E. A., 185 Gorijavolu, Harshit, 259 Gote, Anuja, 9 Govardhan, A., 465 Govinda, K., 53 Govindasamy, C, 483 Gupta, Anmol, 429 Gupta, Himanshu, 437, 447 Gupta, Suneet Kr., 1 Gupta, Sunny, 9 Gurumurthi, Jaishankar, 87

H Hamdare, Safa, 701 Hari, Akshaya, 527 Harini, N., 185 Harisankar, V., 709 Hegde, Prajna, 565 Hossain, Javed, 353 Hossain, Mohammad Mobarak, 761 Huq, S. Zahoor Ul, 269

I Islam, Aminul, 583

J Jahan, Busrat, 363, 761 Jaimin, Patel, 241 Jaya Kumar, D., 393 Jayanthi, S., 795 Jena, Debasish, 73 Jetawat, Ashok, 385 Jeyakumar, G., 307, 411, 539 Jha, Shikha, 9 John, Jeffin, 297 John, Jewel Moncy, 297 Joseph, Ebin, 297 Julfiker Raju, Md., 363 Juliet, D. Sujitha, 225

K Kadam, Aishwarya, 643 Kadyan, Virender, 195, 493 Kalsekar, Samruddhi, 701 Kaman, Sweta, 235 Kamble, Kiran, 341 Kamble, Kiran P., 643 Kamble, R. M., 205 Kanchan, Shohna, 701 Karjole, Aditi, 333 Karri, Sai Prashanth Reddy, 259 Kathuria, Shivam, 153 Katkuri, Pavan Kumar, 605 Kaur, Harshdeep, 195 Kaviya, V., 709 Khan, Shahnawaz, 429 Kodavade, Prajkta, 643 Koushik, Rahul M., 769 Krishna Menon, Vijay, 185 Kulkarni, Linganagouda, 79 Kulkarni, Tejas, 9 Kumaravelan, G., 515 Kumari, Dara Anitha, 465 Kumari, Ruchika, 99 Kumar, Manish, 153 Kumar Mohanta, Bhabendu, 73 Kumar, Rakesh, 99

L Ladge, Leena, 87 Lakshmi Priya, E., 411

M Maan, Veerpaul Kaur, 123 Maheshwari, Sagar, 693 Mahtab, Sheikh Shahparan, 363, 761 Maity, Soumayadev, 583 Malaganve, Pradnya, 547 Malage, Rajshri N., 375 Malathi Latha, Y. L., 143 Malhotra, Ruchika, 727 Manivannan, S. S., 43 Mantri, Archana, 605 Mate, Yash, 171 Mathur, Aakansha, 503 Mavilla, Venkata Sai Dheeraj, 259 Milu, Sharmin Akter, 353, 363, 761 Mohapatra, Niva, 73 Moharana, Suresh Chandra, 719 Moon, Kapila, 385 Mund, Ganga Bishnu, 719 Mupila, Francis K., 437, 447 Murali Krishna, T., 135 Murthy, GRS, 285 Myna, P., 317

N Nagaprasad, S., 677 Nagaraj, Akash, 63 Nagdev, Nikhil, 171 Nagpal, Rahul, 769, 787 Nair, Akhil, 87 Narayana, M. V., 749 Nayak, Jyothi S., 317 Nehal, Patel, 241 Nimmakuri, Sri Anjaneya, 259 Nobi, Ashadun, 353 Nuthakki, Ramesh, 527

P Padmavathi, S., 709 Pandey, Mithilesh Kumar, 215 Pandey, Shivani, 727 Pandey, Sonal, 663 Patel, Meet, 573 Pathak, Bageshree, 333 Patil, Gayatri, 171 Patil, Mithun B., 375 Patil, S. T., 205 Pawar, Sanjay S., 33 Pawar, Sonali, 333 Pentapati, Niharika, 769 Pranitha, B. L., 597 Prashamshini, Eleanor, 317 Premjith, B., 615 Priya, R. L., 171 Pushpatha, T., 677

R Radhakrishnan, Anisha, 539 Rahman, Mostafizur, 251 Rajakarunakaran, S., 1, 327 Raja Kishore, R., 393 Raja, Laxmi, 779 Rajkumar, K., 795 Rama, B., 115 Rama Krishna, C., 663 Ramasubbareddy, Somula, 53 Ramji, B., 185 Ranjan, Sudhanshu, 583 Rastogi, Abhinav, 161 Rastogi, Akanksha, 623 Rawat, Arun Pratap, 583 Reddy, E. Madhusudhana, 795 Redekar, Neha, 643 Regi, Mathew, 473 Rijith Kumar, V., 327 Rohit, Chatla Venkat, 285

S Sahu, Gitimayee, 33 Saidulu, Ch., 135 Saidulu, D., 53 Sai Sreekari, C., 411 Sajith Variyar, V. V., 185 Salwan, Poonam, 123 Sandip, Patel, 241 Sangal, Amrit Lal, 623, 657 Sankarachelliah, N., 327 Santhosh, R., 779 Saravanan, S., 225 Sateesh Kumar, K., 135 Satyavathi, K., 393 Satyavathi, N., 115 Selva Sundar, T., 327 Sen, Protiva, 251 Sen, Snigdha, 597 Senthilram, P., 327 Shanmuganantham, T., 19, 27 Shariff, Faisal Ahmed, 527 Sharma, Ashu, 663 Sharma, Hemant Kumar, 657 Sharma, Kapil, 419 Sharma, Sanjay, 663 Shekar, K. Chandra, 277 Sindhu, K., 555 Singh, Shivam, 215 Singh, Shreyanshi, 73 Sirisati, Ranga Swamy, 749 Siva Kumar, A. P., 269 Skanda, V. C., 787 Soman, K. P., 185, 615 Somula, Ramasubbareddy, 43 Sowmya, V., 185 Sreelakshmi, J. L., 633 Sreelakshmi, K., 615 Srikanth, H. R., 63 Srinath, Raghunandan, 403 Srinivas, Pattlola, 143 Srinu, Dhonvan, 393 Subramanyam, S., 19 Sultana, Razia, 503 Sumukh, Y. R., 403 Swain, Amulya Ratna, 719 Swami Das, M., 143

T Thakkar, Falgun, 573 Thejas, B. K., 597 Thirunavukkarasu, K., 429 Thomas, Abraham K, 297 Tirodkar, Gaurav, 171 Tulasi Sasidhar, T., 615


U Udaya Bhanu, P., 135

V Valai Ganesh, S., 1, 327 Varghese, Elizabeth, 633 Vasudevan, Vignesh, 277 Venkatesh, Akshay, 63 Venugopal Rao, K., 27 Verma, Harsh K., 651 Verma, Prashant, 419 Vichore, Hrishikesh, 87 Vijayalakshmi, M., 79 Vijay Kumar, P., 135 Vishnu Vardhana Rao, M., 749 Vrindavanam, Jayavrinda, 403

Y Yadav, Anupama, 651 Yashaswini, L., 403