Intelligent Systems and Sustainable Computing: Proceedings of ICISSC 2021 (Smart Innovation, Systems and Technologies, 289) 9811900108, 9789811900105

The book is a collection of the best selected research papers presented at the International Conference on Intelligent Systems and Sustainable Computing (ICISSC 2021).


Language: English · Pages: 695 [662] · Year: 2022


Table of contents :
Conference Committee
Preface
Contents
About the Editors
Unsupervised Learning for the Automatic Management of Scientific Aspect in Academic Conferences
1 Introduction
2 Related Work
3 Methods' Description
3.1 Topic Modeling
3.2 Word Embedding
3.3 Detection of Out of Scope Papers Task
3.4 Sessions Creation and Conference Program Schedule
4 Experimental Results
4.1 Out of Scope Detection
4.2 Clustering Papers into Sessions
5 Conclusion
References
Context Sensing from Deployed IoT Sensors for Improved Public Transport Systems
1 Introduction
1.1 IoT Architecture
1.2 Vehicular Context-Awareness System
2 Related Work
3 Problem Statement and Case Study Area
3.1 Problem Statement
3.2 Case Study Area
4 Proposed Solution
4.1 Proposed System Architecture
4.2 IoT-Based Prototype Hardware Components
4.3 Data Collection Using the Prototype
5 Conclusion
References
College Admissions Counseling Using Intelligent Question Answering System
1 Introduction
2 Related Works
3 Methodology
3.1 The Proposed Model
3.2 Word Embedding
3.3 Bi-LSTM
3.4 Question Answering System
4 Experiments
4.1 Dataset
4.2 Result
5 Conclusion
References
Juvenile Crime Categorization with EM Clustering
1 Introduction
2 Related Work
3 Methodology
4 Experiments, Results and Discussions
5 Conclusions
References
Review of Interoperability Issues Influencing Acceptance and Adoption of Cloud Computing Technology by Consumers
1 Introduction
2 Review of Literature
2.1 Software as a Service (SaaS)
2.2 Platform as a Service (PaaS)
2.3 Infrastructure as a Service (IaaS)
3 Cloud Computing Implementation Challenges
3.1 Common Cloud Computing Issues
3.2 Issue of Interoperability of Resources Across Heterogeneous Cloud Platforms
4 Solutions to Cloud Computing Challenges
4.1 Results of Survey on Interoperability Solutions in the Cloud
5 Conclusion
References
Time Series Analysis Using Random Forest for Predicting Stock Variances Efficiency
1 Introduction
2 Literature Survey
3 Methodology
4 Conclusion
References
Prediction of COVID-19 Severity Level Using XGBoost Algorithm: A Machine Learning Approach Based on SIR Epidemiological Model
1 Introduction
2 Literature Review
3 Methodology
3.1 System Model
4 Model Evaluation
5 Conclusion
References
A Blockchain-Based Architecture for Implementing Job Recruitment Operations
1 Introduction
2 Blockchain Technology
3 Proposed Permissioned Blockchain Platform
3.1 Model
3.2 Phases of the Model
4 Implementation of the Network
5 Conclusion
References
Analysis of the Interest in the Profession of Tutor on the Internet (A Case Study of Google Trends)
1 Introduction
2 Research Methods
3 Results
3.1 Analysis of the Interest in the Search Query “Tutor” on the Internet in Ukraine Using the Web Service Google Trends
3.2 Assessment of the Relationship Closeness Between the Search Query “Tutor” Level of Popularity in Ukraine and the Average Percentage of People Who Passed the External Independent Testing (EIT)
4 Discussion and Conclusions
References
A PID Temperature Controller for Passive Solar Water Heaters with a Low-Cost Three-Way Valve
1 Introduction
2 Methodology
2.1 System Overview
2.2 Experimental Model Development
2.3 Control Algorithm Development
2.4 System Software Design
3 Results and Discussion
3.1 Characterization of the Three-Way Valve
3.2 Temperature Control with the Experimental Model
3.3 Temperature Control with the Passive-Heater Model
4 Conclusions
References
MATLAB/Simulink-Based Simulation Model for Minimizing the Torque Ripple Reduction in Brushless DC Motor
1 Introduction
2 Related Works
3 The Benefits of the Proposed Model
4 Proposed System—BLDC Motor Using Fuzzy Logic Controller
5 Results and Discussion
6 Conclusion
References
Improving Accuracy in Facial Detection Using Viola-Jones Algorithm AdaBoost Training Method
1 Introduction
2 Related Work
2.1 Face Detection
2.2 Accuracy Measurement
2.3 Viola-Jones Algorithm
2.4 Advantages of Viola-Jones Algorithm
3 Proposed System
3.1 Viola-Jones Algorithm
3.2 Procedure of Viola-Jones Algorithm
3.3 Stages of Viola-Jones Algorithm
4 Experimental Result
5 Conclusion
References
Comparative Analysis Model for Lambda Iteration, Genetic Algorithm, and Particle Swarm Optimization-Based Economic Load Dispatch for Saudi Arabia Network System
1 Introduction
2 Saudi Arabia Network Studied System
3 Lambda Iteration
3.1 Problem Formulation of Lambda Iteration for ELD
3.2 Neglecting Losses and Including Generations Limits
3.3 Including Losses and Generations Limits
4 Genetic Algorithm
5 PSO Algorithms
6 Results and Discussion
7 Conclusion
References
A Novel Approach for Providing Security for IoT Applications Using Machine Learning and Deep Learning Techniques
1 Introduction
2 Related Work
3 Proposed Learning Models
3.1 One Class SVM Algorithm
3.2 Deep Autoencoder Algorithm
4 Evaluation Methodology
5 Experimental Results
6 Conclusion and Future Work
References
Analysis of Bangla Keyboard Layouts Based on Keystroke Dynamics
1 Introduction
2 Previous Works
3 Bangla Keyboard Layouts
4 Analysis Criteria
5 Experimental Design
6 Key-Stroke Data Collection
7 Evaluation Corpus
7.1 Text Selection and Preparation
8 Methodology
9 Results
10 Conclusion
References
Improved Efficiency of Semantic Segmentation using Pyramid Scene Parsing Deep Learning Network Method
1 Introduction
2 Related Work
2.1 Semantic Segmentation
2.2 Pixel Accuracy Metrics Used in Semantic Segmentation
3 Proposed Methodology
3.1 Advantages of Pyramid Scene Parsing Network
3.2 Working of PSPNet
3.3 Dilated Convolution
4 Experimental Results
5 Conclusion
References
Fractional Approach for Controlling the Motion of a Spherical Robot
1 Introduction
2 System Modeling
3 Controller Design
4 Simulation Results
5 Conclusion
References
Progression of Metamaterial for Microstrip Antenna Applications: A Review
1 Introduction
2 Evolution of Metamaterials
3 Nicholson-Ross-Weir Method (NRW)—RF Material Measurement
3.1 Mathematical Formulation-Parametric Retrieval Method
4 Discussion
5 Conclusion
References
Natural Language Processing of Text-Based Metrics for Image Captioning
1 Introduction
2 Related Work
3 Proposed System
4 Implementation
4.1 Data Preprocessing
4.2 Extract Features
4.3 Formatting Dataset
4.4 Building Model
4.5 Evaluating the Model Performance
5 Results and Discussion
6 Conclusion
References
Li-Fi Enables Reliable Communication of VLT for Secured Data Exchange
1 Introduction
2 Working Model
2.1 Energy Impart Section
2.2 Receiver Section
3 Implementation
3.1 Data Interpreting Unit
4 Results and Discussion
5 Conclusion and Future Work
References
Design of Dual-Stack, Tunneling, and Translation Approaches for Blockchain-IPv6
1 Introduction
1.1 IPv4
1.2 IPv6
2 Literature Survey
3 Proposed Work
3.1 Dual-Stack
3.2 Tunneling
4 Proposals in Tunneling
4.1 Translation
4.2 Translation Layout
5 Simulation Analysis
6 Conclusion
References
Wavelet-Based Aerial Image Enhancement Tools and Filtering Techniques
1 Introduction
2 Research Findings
2.1 Wavelet-Based Satellite Image Enhancement Techniques
2.2 Geo-correction and Orthorectification Techniques in Enhancing Aerial Images
2.3 Image Enhancement Using HE
2.4 Filtering Techniques Employed in Image Enhancement
3 Conclusion
References
Efficiency Optimization of Security and Safety in Cooperative ITS Communication for QoS Service
1 Introduction
2 Related Works
2.1 Standards of Security in C-ITS
2.2 C-ITS Involving Security-QoS-Safety Trade-Offs
2.3 Capture of Precise Vehicle Safety Awareness
3 Safety Awareness Metrics
3.1 Infrastructure-Centric Standards for Safety Awareness
3.2 Awareness Metrics Based on Vehicle-Centric Safety
4 Result
4.1 Network Performance
5 Conclusion
References
Classification of Medicinal Plants Using Machine Learning
1 Introduction
2 Related Works
3 Methodology
4 Results and Discussion
5 Comparison of Results of SVM and ResNet50
6 Conclusion
References
Performance Measures of Blockchain-Based Mobile Wallet Using Queueing Model
1 Introduction
2 Literature Review
3 Role of Blockchain in the Mobile Wallets and Mobile Payments
4 Model Description and Queueing Model
5 Numerical Results
6 Conclusion
References
Conventional Data Augmentation Techniques for Plant Disease Detection and Classification Systems
1 Introduction
2 Related Works
3 Materials and Methods
3.1 Dataset
3.2 Methodologies Adopted to Enhance the Dataset
4 Conclusion
References
Directionality Information Aware Encoding for Facial Expression Recognition
1 Introduction
2 Literature Survey
3 Proposed Approach
3.1 Robinson Compass Mask
3.2 Encoding
4 Experimental Results
4.1 Results Over 7-class
4.2 Results Over 6-class
5 Conclusion
References
MRI Breast Image Segmentation Using Artificial Bee Colony Optimization with Fuzzy Clustering and CNN Classifier
1 Introduction
1.1 Related Study
2 Materials and Methods
2.1 Image Dataset
2.2 Median Filters
2.3 Artificial Bee Colony Optimization
2.4 Fuzzy C Means Clustering
2.5 Convolutional Neural Network Classifier
2.6 Objective and Subjective Measures
3 Result and Discussion
4 Conclusion
References
Human Action Recognition in Still Images Using SIFT Key Points
1 Introduction
2 Methodology
2.1 Pre-processing
2.2 Generating SIFT Key Points
2.3 Statistical Information of SIFT Key Points
3 Experiment and Results
4 Conclusion
References
Detecting Credit Card Fraud Using Majority Voting-Based Machine Learning Approach
1 Introduction
1.1 Logistic Regression
1.2 Gradient Boosting
1.3 Random Forest
1.4 K-Nearest Neighbors
1.5 Voting Classifier
2 Related Literary Work
3 Experimental Setup
3.1 Dataset Description
3.2 About Anaconda Framework
4 Result and Discussion
5 Conclusion
References
Evaluation and Performance Analysis of Magnitude Comparator for High-Speed VLSI Applications
1 Introduction
2 Proposed Comparator Circuits
2.1 Proposed 11T 1-bit Comparator Circuit 1
2.2 Proposed 10T 1-bit Comparator Circuit 2
2.3 Proposed 2-bit Comparator Circuit
3 Simulation Results
4 Conclusions
References
Artificially Ripened Mango Fruit Prediction System Using Convolutional Neural Network
1 Introduction
2 Related Work
2.1 Motivation and Contribution
3 Proposed CNN-based Artificially Ripened Mango Fruit Prediction System
3.1 Image Acquisition and Preprocessing
3.2 Feature Extraction
3.3 Feature Extraction and Training for CNN-based Mango Fruit Prediction
4 Results and Discussion
4.1 Experimental Setup
5 Conclusion
References
Quality and Dimensional Analysis of 3D Printed Models Using Machine Vision
1 Introduction
2 Methodology
2.1 Image Acquisition and Capturing
2.2 Edge Detection
2.3 Morphological Operations
2.4 Detection Algorithms for 3D Printed Models
3 Experimental Results
3.1 Shape Defect Detection
3.2 Chip Defect Detection
3.3 Color Defect Detection
3.4 Dimension Defect Detection
4 Conclusion
References
Traffic Density Determination Using Advanced Computer Vision
1 Introduction
2 Assumptions and Limitation
3 Methodology
3.1 Single Shot Detection (SSD)
3.2 Faster R-CNN
4 Data Collection and Preparation
4.1 Data Sources
4.2 Video Capture and Frame Extraction
4.3 Data Annotation
4.4 Data Analysis and Initial Findings
5 Model Building and Research
5.1 Model Framework Selection
5.2 Model Selection
5.3 Faster R-CNN Inception V2
6 Result and Discussion
7 Conclusion
References
Analyzing and Detecting Fake News Using Convolutional Neural Networks Considering News Categories Along with Temporal Interpreter
1 Introduction
2 Problem Statement
3 Data Analysis
3.1 Datasets
3.2 Text Analysis
3.3 Image Analysis
4 Architecture
4.1 Text Image Convolutional Neural Network
4.2 Text Convolutional Neural Network
4.3 System Workflow
5 Implementation
5.1 Text Branch
5.2 Image Branch
6 News Category Classification
7 Temporal Interpreter
7.1 Sliding Lookback Window Analysis
8 Experiments
9 Conclusion and Future Scope
References
Automatic Student Attendance and Activeness Monitoring System
1 Introduction
2 Methodology
3 Implementation
3.1 Database Design Using XAMPP
3.2 Setting Up the Coding Environment
3.3 Home Window
3.4 Student Details Window
3.5 Dataset Training Functionality
3.6 Continuous Monitoring of the Students
3.7 Saved Photo Collection
3.8 Attendance Window
3.9 Help and Support
4 Results and Discussion
5 Conclusion
References
Greenput Algorithm for Minimizing Power and Energy Consumption in Hybrid Wireless Sensor Networks
1 Introduction
2 Related Works
3 Flow Control and Representations
4 Greenput Algorithm
5 Conclusion
References
Cloud-Based Face and Face Mask Detection System
1 Introduction
1.1 Overview
1.2 Conventional Methods
1.3 Related Works
2 Proposed System
2.1 Methodology
2.2 Applications/Advantages
3 Simulations/Implementation Results
3.1 Results with Description
3.2 CNN Model
4 Conclusion and Future Works
4.1 Conclusion
4.2 Future Works
References
Design of Delay-Efficient Carry-Save Multiplier by Structural Decomposition of Conventional Carry-Save Multiplier
1 Introduction
2 Conventional Carry-Save Multiplier
2.1 Half Adder
2.2 Full Adder
3 Modified Carry-Save Multiplier
4 Results and Discussion
5 Conclusion
References
Speech Intelligibility Quality in Telugu Speech Patterns Using a Wavelet-Based Hybrid Threshold Transform Method
1 Introduction
1.1 Speech Processing
1.2 Denoising Techniques
1.3 Wavelet Transforms
2 Literature Survey
2.1 Noisy Speech Enhancement
2.2 Existing Method
2.3 Problem Identification
3 Proposed Method
4 Results and Discussion
5 Conclusion and Future Scope
References
Landslide Susceptibility for Communities Based on Satellite Images Using Deep Learning Algorithms
1 Introduction
2 Methodology
2.1 Image Acquisition and Preprocessing
2.2 Pre-trained CNN and Classification of the Mixed Images
2.3 Hazard Prediction
3 Results and Discussion
4 Conclusion
References
Comparative Study on Estimation of Remaining Useful Life of Turbofan Engines Using Machine Learning Algorithms
1 Introduction
2 Related Work
3 Proposed Model
3.1 CNN + LSTM Ensemble Model
3.2 Long Short-Term Memory Model
3.3 Statistical Models
4 Results
5 Conclusion
References
Machine Learning-Based Framework for Malware Analysis in Android Apps
1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Key Contributions
2 Literature Review
3 Android Malware Detection
3.1 Feature Extraction
3.2 Feature Selection
3.3 Training ML Model
4 Proposed Methodology
4.1 Model Building
4.2 App Prediction
5 Dataset
6 Results
7 Conclusion and Future Work
References
A Survey on Byzantine Agreement Algorithms in Distributed Systems
1 Introduction
2 Analysis of the Algorithms
2.1 The Byzantine Generals Problem
2.2 Solution to Byzantine Algorithm When Both Links and Processors Are Subjected to Hybrid Faults
2.3 A Randomized Protocol to Solve Byzantine Agreement
2.4 Protocol to Solve Byzantine Agreement in Cloud Environment
2.5 Degradable Byzantine Agreement
2.6 Protocol to Have Distributed Agreement When Both Links and Nodes are Subjected to Faults
3 Conclusions
References
An Autonomous Intelligent System to Leverage the Post-harvest Agricultural Process Using Localization and Mapping
1 Introduction
2 Scope and Contribution
3 Previous Work
4 Proposed Approach
5 Simulation and Results
5.1 Simultaneous Localization and Mapping for the Un-mapped Model
5.2 Experimental Analysis for the Un-mapped Model of the Farm
6 Conclusion
References
Zero-Day Attack Detection Analysis in Streaming Data Using Supervised Learning Techniques
1 Introduction
1.1 Anatomy of a Zero-Day Malware
1.2 Zero-Day Exploit Market
2 Related Works
2.1 Renowned Zero-Day Vulnerabilities
3 Methodology
3.1 CICIDS 2017 Dataset
3.2 Experimental Setup
3.3 Random Forest
3.4 Random Tree
3.5 Naïve Bayes
3.6 Hoeffding Tree
3.7 Analysis Tool Used
4 Experimental Result Analysis
4.1 Dataset Processing
4.2 Result Analysis
5 Conclusion
References
Comparative Evaluation of Machine Learning Methods for Network Intrusion Detection System
1 Introduction
2 Prior Survey
3 Network Anomalies
3.1 Network Configuration Anomalies
3.2 Security-related Anomalies
4 Network Anomalies Detection
4.1 Intrusion Detection System (IDS)
5 Network Anomalies Identification Techniques
5.1 Misuse Detection
5.2 Anomaly Detection
5.3 Hybrid Approach
6 Machine Learning
6.1 Supervised Learning
6.2 Unsupervised Learning
7 Datasets
7.1 KDD Cup 1999 Datasets
8 Evaluation Metrics
9 Machine Learning Tools
9.1 WEKA
10 Experiments and Results
11 Conclusion
References
Rare Pattern Mining from Data Stream Using Hash-Based Search and Vertical Mining
1 Introduction
2 Related Work
2.1 Basic Terminology
2.2 Rare Pattern Mining from a Data Stream
2.3 Hybrid Eclat Algorithm for Rare Pattern Mining from Data Stream: HEclat-RPStream
2.4 Extending Two Items to Generate Large Itemsets
2.5 Structure of Two-Level Hash Table
3 Experimental Results
4 Conclusion
References
A Novel ARDUINO Based Self-Defense Shoe for Women Safety and Security
1 Introduction
1.1 Literature Survey and Existing System
2 Proposed System
2.1 Methodology
3 Results
4 Advantages
5 Conclusion
6 Future Scope
References
A New Supervised Term Weight Measure Based Machine Learning Approach for Text Classification
1 Introduction
2 Dataset Characteristics
3 A Term Weight Measure-Based ML Approach for Text Classification
4 Term Weight Measures (TWMs)
4.1 Term Frequency and Inverse Document Frequency (TFIDF)
4.2 Term Frequency and Relevance Frequency (TFRF) Measure
4.3 Term Frequency Inverse Document Frequency and Inverse Class Frequency (TFIDFICF) Measure
4.4 Proposed Class Specific Supervised Term Weight Measure (CSTWM)
5 Experimental Results of Term Weight Measures
6 Conclusions and Future Scope
References
Machine Learning-Based Human Activity Recognition Using Smartphones
1 Introduction
2 Proposed Work
3 Model Testing and Results
3.1 Random Forest Model
3.2 SVM Model
3.3 KNN Algorithm
3.4 Artificial Neural Network
4 Conclusion
References
A Feature Selection Technique-Based Approach for Author Profiling
1 Introduction
2 Dataset Characteristics
3 Feature Selection Techniques-Based Approach for Author Profiling
3.1 Chi-Square (CHI2) Measure
3.2 Mutual Information (MI)
3.3 Information Gain (IG)
3.4 Proposed Distributional Class Specific Correlation-Based FS Technique (DCC)
4 Experimental Results
4.1 Experimental Results with Most Frequent Terms
4.2 Experiment with Feature Selection Techniques
5 Conclusion and Future Scope
References
Detection of Fake News Using Natural Language Processing Techniques and Passive Aggressive Classifier
1 Introduction
2 Literature Survey
3 Proposed Work
3.1 Dataset
3.2 Data Preprocessing
3.3 Train-Test Split of Dataset
3.4 Feature Extraction
3.5 TF-IDF Vectorizer
3.6 Passive Aggressive Classifier
4 Algorithm and Implementation
4.1 Approach
5 Experimental Results and Discussion
6 Conclusion and Future Scope
References
Efficiency Analysis of Pre-trained CNN Models as Feature Extractors for Video Emotion Recognition
1 Introduction
2 Related Works
3 Methodology
3.1 Input Generation
3.2 Training of Models
3.3 Datasets Used
4 Result Analysis
5 Conclusion and Future Works
References
Mapping of Computational Social Science Research Themes: A Two-Decade Review
1 Introduction
2 Research Methods
3 Result and Discussion
3.1 Most Productive Organizational Affiliations in Computational Social Science Research
3.2 Most Individual Researcher in Computational Social Science Research
3.3 The Computational Social Science Sector's Annual Publications
3.4 Research Theme Map
4 Conclusion
References
Diagnosis of Pneumonia with Chest X-Ray Using Deep Neural Networks
1 Introduction
2 Methodology
2.1 Data Collection
2.2 Data Preprocessing
3 Proposed System
4 Results
5 Conclusion
References
High Performance Algorithm for Content-Based Video Retrieval Using Multiple Features
1 Introduction
2 Shot Boundary Detection
3 Key Frame Extraction
4 Proposed Spatio-temporal Feature Extraction
4.1 CBVR System Design
5 Performance Evaluation
6 Conclusion
References
Smart Recruitment System Using Deep Learning with Natural Language Processing
1 Introduction
2 Pre-processing
3 Proposed Work
4 Semantic Analysis
5 Result
6 Conclusion
References
Early Diagnosis of Age-Related Macular Degeneration (ARMD) Using Deep Learning
1 Introduction
2 Literature Survey
3 Preprocessing and Feature Extraction
4 Wet/Dry/No Disease
5 Role of Deep Learning on ARMD
6 Conclusion
References
Recent Trends in Calculating Polarity Score Using Sentimental Analysis
1 Introduction
2 Sentence Level Sentimental Analysis
3 Text Tokenization and Sentiment Scores
4 The Average Sentiment and Variance
References
Author Index


Smart Innovation, Systems and Technologies 289

V. Sivakumar Reddy · V. Kamakshi Prasad · D. N. Mallikarjuna Rao · Suresh Chandra Satapathy, Editors

Intelligent Systems and Sustainable Computing Proceedings of ICISSC 2021

Smart Innovation, Systems and Technologies Volume 289

Series Editors
Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-Sea, UK
Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK

The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes, in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought.

The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions.

High-quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles.

Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST), SCImago, DBLP. All books published in the series are submitted for consideration in Web of Science.

More information about this series at https://link.springer.com/bookseries/8767

V. Sivakumar Reddy · V. Kamakshi Prasad · D. N. Mallikarjuna Rao · Suresh Chandra Satapathy, Editors

Intelligent Systems and Sustainable Computing Proceedings of ICISSC 2021

Editors

V. Sivakumar Reddy
Malla Reddy University
Hyderabad, India

V. Kamakshi Prasad
Department of Computer Science Engineering
Jawaharlal Nehru Technological University
Hyderabad, India

D. N. Mallikarjuna Rao
Federation University at IIBIT
Sydney, NSW, Australia

Suresh Chandra Satapathy
KIIT University
Bhubaneswar, India

ISSN 2190-3018  ISSN 2190-3026 (electronic)
Smart Innovation, Systems and Technologies
ISBN 978-981-19-0010-5  ISBN 978-981-19-0011-2 (eBook)
https://doi.org/10.1007/978-981-19-0011-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Conference Committee

Chief Patron Sri. CH. Malla Reddy, Founder Chairman, MRGI

Patrons
Sri. CH. Mahendar Reddy, Secretary, MRGI
Sri. CH. Bhadra Reddy, President, MRGI

Conference Chair Dr. V. Sivakumar Reddy, Vice-Chancellor, Malla Reddy University, Hyderabad

Convener Prof. K. Kailasa Rao, Dean, School of Engineering

Publication Chair Dr. Suresh Chandra Satapathy, Professor, KIIT, Bhubaneswar


Co-convener Prof. P. Sanjeeva Reddy, Dean, International Studies

Organizing Chair Prof. G. S. Naveen Kumar, HOD, CSE (Data Science), MRUH

Organizing Secretaries
Dr. E. V. Reddy, HOD, CSE, MRUH
Dr. Thayyaba Khatoon Mohammed, HOD, CSE (AI & ML), MRUH
Dr. Meeravali Shaik, HOD, CSE (Cyber Security), MRUH

Coordinators
Dr. Rajasekar Rangasami, Professor, CSE (AI & ML)
Dr. B. V. V. Siva Prasad, Professor, CSE
Dr. Sudheer Nidamanuri, Professor, CSE (DS)
Dr. Arun Singh Chauhan, Professor, CSE (CS)
Dr. K. Vijaya Sekhar Reddy, Professor, MBA

Organizing Committee
Dr. V. Dhanunjana Chari, Dean, School of Sciences
Dr. P. S. V. Srinivasa Rao, HOD, I Year Engineering
Dr. M. Ravikanth, Professor, CSE
Dr. Magesh Kumar, Professor, CSE (AI & ML)
Dr. Pranayath Reddy, Professor, CSE (AI & ML)
Dr. K. Srikanth, Professor, CSE (Data Science)
Prof. V. Ramachandran, CSE (AI & ML)
Mr. T. Rama Rao, CSE
Dr. G. Nanda Kishor Kumar, Professor, CSE
Ms. K. Lakshmi Madhuri, CSE (Data Science)
Mr. T. Vinay Simha Reddy, CSE (AI & ML)
Mr. T. Sanjeeva Rao, School of Engineering


Session Chairs
Prof. Yu-Dong Zhang, School of Informatics, University of Leicester, UK
Dr. Jinshan Tang, George Mason University, Virginia, United States
Dr. Sanju Tiwari, Universidad Autonoma de Tamaulipas, Mexico
Dr. Naga Mallikarjuna Rao Dasari, Federation University @IIBIT, Australia
Dr. Rama Murthy Garimella, Mahindra University, Hyderabad, India
Dr. Bhanu Murthy Bhaskara, Ex-Professor, University of Majmaah, Saudi Arabia
Dr. Midhun Chakkaravarthy, Lincoln University College, Malaysia
Prof. K. Venkat Rao, Dean Academics, Andhra University, Visakhapatnam, India
Dr. Ilaiah Kavati, National Institute of Technology, Warangal, India
Dr. E. Suresh Babu, National Institute of Technology, Warangal, India
Dr. M. Ramesh, Professor, RVR & JC College of Engineering, Guntur, India

Preface

The International Conference on Intelligent Systems and Sustainable Computing (ICISSC-2021) was successfully organized by Malla Reddy University during September 24–25, 2021, at Hyderabad. The objective of this conference was to provide opportunities for the researchers, academicians and industry persons to interact and exchange the ideas, experience and gain expertise in the cutting-edge technologies pertaining to soft computing and signal processing. Research papers in the above-mentioned technology areas were received and subjected to a rigorous peer-reviewed process with the help of program committee members and external reviewers. The ICISSC-2021 received a total of 605 papers out of which 38 papers are from twenty-one countries of abroad, each paper was reviewed by more than two reviewers, and finally, 60 papers were accepted for publication in scopus-indexed Springer book series “Smart Innovation, Systems and Technologies (SIST).” Our sincere thanks to our Chief Guest Prof. D. N. Reddy, Hon’ble Chancellor, MRUH, and our Guests of Honor and Keynote Speakers Dr. Suresh Chandra Satapathy, Professr and Dean R&D, KIIT; Dr. Aninda Bose, Senior Editor, Springer Publications, India; Prof. Yu-Dong Zhang, University of Leicester, UK; Dr. Jinshan Tang, George Mason University, Virginia, USA; Dr. Mufti Mohhamad, Department of Computing and Technology, Nottingham Trent University, UK; Dr. Naga Mallikarjuna Rao Dasari, Federation University @IIBIT, Australia; Dr. Bhanu Murthy Bhaskara, Ex-Professor, University of Majmaah, Saudi Arabia; Dr. Midhun Chakkaravarthy, Lincoln University College, Malaysia; Dr. Sanju Tiwari, Universidad Autonoma de Tamaulipas, Mexico; Prof. Ganapati Panda, Professorial Fellow, IIT Bhubaneswar, India; and Dr. Rama Murthy Garimella, Professor, Mahindra University Ecole Centrale School of Engineering, India, for extending their support and cooperation. We would like to express our gratitude to all session chairs, viz. Dr. M. 
Ramakrishna Murthy, ANITS, Visakhapatnam; Prof. K. Venkat Rao, Dean Academics, Andhra University, Visakhapatnam; Dr. Ilaiah Kavati, National Institute of Technology, Warangal; Dr. E. Suresh Babu, National Institute of Technology, Warangal; Dr. M. Ramesh, Professor, RVR and JC College of Engineering, Guntur; Dr. K. Mallikharjuna Lingam, Professor and Head, ECE, MRCET Campus, Hyderabad; ix

x

Preface

Dr. G. Sharada, Professor and Head, CSE, MRCET Campus, Hyderabad, for extending their support and cooperation. We are indebted to the program committee members and external reviewers who have produced critical reviews in a short time. We would like to express our special gratitude to publication chair Dr. Suresh Chandra Satapathy, KIIT, Bhubaneswar, for his valuable support and encouragement till the successful conclusion of the conference. We express our heartfelt thanks to our Chief Patron Sri. CH. Malla Reddy, Founder Chairman, MRGI; Patrons Sri. CH. Mahendar Reddy, Secretary, MRGI; Sri. CH. Bhadra Reddy, President, MRGI; Conference Chair Dr. V. Sivakumar Reddy, Vice-Chancellor, MRUH; Convener Prof. K. Kailasa Rao, Dean, School of Engineering; and Co-Convener Prof. P. Sanjeeva Reddy. We would also like to thank the organizing chair Prof. G. S. Naveen Kumar, HOD, CSE (Data Science), Organizing Secretaries; Dr. E. V. Reddy, HOD, CSE; Dr. Thayyaba Khatoon Mohammed, HOD, CSE (AI & ML); and Dr. Meeravali Shaik, HOD, CSE (Cyber Security), for their valuable contribution. Our thanks also go to all coordinators Dr. Rajasekar Rangasami, Professor, CSE (AI & ML); Dr. B. V. V. Siva Prasad, Professor, CSE; Dr. Sudheer Nidamanuri, Professor, CSE (DS); Dr. Arun Singh Chauhan, Professor, CSE (CS); Dr. K. Vijaya Sekhar Reddy, Professor, MBA, and the organizing committee as well as all the other committee members Dr. V. Dhanunjana Chari, Dean, School of Sciences; Dr. P. S. V. Srinivasa Rao, HOD, I Year Engineering; Dr. Pranayath Reddy, Professor, CSE (AI & ML); Dr. Magesh Kumar, Professor, CSE (AI & ML); Dr. K. Srikanth, Professor, CSE (DS); Dr. G. Nanda Kishor Kumar, Professor, CSE; Dr. M. Ravikanth, Professor, CSE; Ms. K. Lakshmi Madhuri; Prof. V. Ramachandran; Mr. T. Rama Rao; Mr. T. Vinay Simha Reddy; and Mr. T. Sanjeeva Rao for their contribution in successful conduct of the conference. 
Last, but certainly not least, our special thanks go to all the authors, without whom the conference would not have taken place. Their technical contributions have made our proceedings rich and praiseworthy.

Hyderabad, India

V. Sivakumar Reddy Vice Chancellor, MRUH

Contents

Unsupervised Learning for the Automatic Management of Scientific Aspect in Academic Conferences . . . 1
Abdeldjaouad Nusayr Medakene, Abdelhadi Mohammed Ali Eddoud, and Khadra Bouanane

Context Sensing from Deployed IoT Sensors for Improved Public Transport Systems . . . 15
Evariste Twahirwa, James Rwigema, and Raja Datta

College Admissions Counseling Using Intelligent Question Answering System . . . 27
Bui Thanh Hung

Juvenile Crime Categorization with EM Clustering . . . 39
Lalitha Saroja Thota, Ravinder Reddy Baireddy, Suresh Babu Changalasetty, and Rambabu Pemula

Review of Interoperability Issues Influencing Acceptance and Adoption of Cloud Computing Technology by Consumers . . . 49
Gabriel Terna Ayem, Salu George Thandekkattu, and Narasimha Rao Vajjhala

Time Series Analysis Using Random Forest for Predicting Stock Variances Efficiency . . . 59
Parnandi Srinu Vasarao and Midhun Chakkaravarthy

Prediction of COVID-19 Severity Level Using XGBoost Algorithm: A Machine Learning Approach Based on SIR Epidemiological Model . . . 69
Labeba Tahsin and Shaily Roy

A Blockchain-Based Architecture for Implementing Job Recruitment Operations . . . 79
P. C. Sherimon, Vinu Sherimon, Arjun B. Dev, Mammen Jacob Ambrayil, and Mooneesah Abeer

Analysis of the Interest in the Profession of Tutor on the Internet (A Case Study of Google Trends) . . . 89
Oleh Karyy, Ihor Kulyniak, Oksana Ivanytska, Liubov Halkiv, and Ivan Zhygalo

A PID Temperature Controller for Passive Solar Water Heaters with a Low-Cost Three-Way Valve . . . 101
Chanh-Nghiem Nguyen, Ngo Thanh The, and Luong Vinh Quoc Danh

MATLAB/Simulink-Based Simulation Model for Minimizing the Torque Ripple Reduction in Brushless DC Motor . . . 115
Issa Etier, C. Arul Murugan, and Nithiyananthan Kannan

Improving Accuracy in Facial Detection Using Viola-Jones Algorithm AdaBoost Training Method . . . 127
Kodeti Haritha Rani and Midhun Chakkaravarthy

Comparative Analysis Model for Lambda Iteration, Genetic Algorithm, and Particle Swarm Optimization-Based Economic Load Dispatch for Saudi Arabia Network System . . . 139
Abdulrhman Alafif, Youssef Mobarak, and Hussain Bassi

A Novel Approach for Providing Security for IoT Applications Using Machine Learning and Deep Learning Techniques . . . 155
M. V. Kamal, P. Dileep, and M. Gayatri

Analysis of Bangla Keyboard Layouts Based on Keystroke Dynamics . . . 165
Shadman Rohan, Koushik Roy, Pritom Kumar Saha, Sazzad Hossain, Fuad Rahman, and Nabeel Mohammed

Improved Efficiency of Semantic Segmentation using Pyramid Scene Parsing Deep Learning Network Method . . . 175
Pichika Ravikiran and Midhun Chakkaravarthy

Fractional Approach for Controlling the Motion of a Spherical Robot . . . 183
Quoc-Dong Hoang, Le Anh Tuan, Luan N. T. Huynh, and Tran Xuan Viet

Progression of Metamaterial for Microstrip Antenna Applications: A Review . . . 193
Renju Panicker and Sellakkutti Suganthi

Natural Language Processing of Text-Based Metrics for Image Captioning . . . 203
Sudhakar Sengan, P. Vidya Sagar, N. P. Saravanan, K. Amarendra, Arjun Subburaj, S. Maheswari, and Rajasekar Rangasamy

Li-Fi Enables Reliable Communication of VLT for Secured Data Exchange . . . 213
N. Venkata Ramana Gupta, G. Rajeshkumar, S. Arun Mozhi Selvi, Sudhakar Sengan, Arjun Subburaj, S. Priyadarsini, and Rajasekar Rangasamy

Design of Dual-Stack, Tunneling, and Translation Approaches for Blockchain-IPv6 . . . 221
Srihari Varma Mantena, S. Jayasundar, Dilip Kumar Sharma, J. Veerappan, M. Anto Bennet, Sudhakar Sengan, and Rajasekar Rangasamy

Wavelet-Based Aerial Image Enhancement Tools and Filtering Techniques . . . 233
P. Ramesh, V. Usha Shree, and Kesari Padma Priya

Efficiency Optimization of Security and Safety in Cooperative ITS Communication for QoS Service . . . 243
M. Mohan Rao, Sreenivas Mekala, Rajaram Jatothu, G. Ravi, and Sarangam Kodati

Classification of Medicinal Plants Using Machine Learning . . . 255
Rohit Sunil Meshram and Nagamma Patil

Performance Measures of Blockchain-Based Mobile Wallet Using Queueing Model . . . 269
R. Kavitha, R. Rajeswari, Pratyusa Mukherjee, Suchismita Rout, and S. S. Patra

Conventional Data Augmentation Techniques for Plant Disease Detection and Classification Systems . . . 279
Srinivas Talasila, Kirti Rawal, and Gaurav Sethi

Directionality Information Aware Encoding for Facial Expression Recognition . . . 289
A. Vijaya Lakshmi and P. Mohanaiah

MRI Breast Image Segmentation Using Artificial Bee Colony Optimization with Fuzzy Clustering and CNN Classifier . . . 303
R. Sumathi and V. Vasudevan

Human Action Recognition in Still Images Using SIFT Key Points . . . 313
M. Pavan and K. Jyothi

Detecting Credit Card Fraud Using Majority Voting-Based Machine Learning Approach . . . 327
V. Akshaya, M. Sathyapriya, R. Ranjini Devi, and S. Sivanantham

Evaluation and Performance Analysis of Magnitude Comparator for High-Speed VLSI Applications . . . 335
Madhusudhan Reddy Machupalli, Krishnaveni Challa, Murali Krishna Bathula, G. Rajesh Kumar, and Raja Posupo

Artificially Ripened Mango Fruit Prediction System Using Convolutional Neural Network . . . 345
V. Laxmi and R. Roopalakshmi

Quality and Dimensional Analysis of 3D Printed Models Using Machine Vision . . . 357
S. H. Sarvesh, Kempanna Chougala, J. Sangeetha, and Umesh Barker

Traffic Density Determination Using Advanced Computer Vision . . . 375
Narayana Darapaneni, M. S. Narayanan, Shakeeb Ansar, Ganesh Ravindran, Kumar Sanjeev, Abhijeet Kharade, and Anwesh Reddy Paduri

Analyzing and Detecting Fake News Using Convolutional Neural Networks Considering News Categories Along with Temporal Interpreter . . . 387
Viraj Desai, Neel Shah, Jinit Jain, Manan Mehta, and Simran Gill

Automatic Student Attendance and Activeness Monitoring System . . . 405
Naveena Narayana Poojari, J. Sangeetha, G. Shreenivasa, and Prajwal

Greenput Algorithm for Minimizing Power and Energy Consumption in Hybrid Wireless Sensor Networks . . . 417
S. Beski Prabaharan and Saira Banu

Cloud-Based Face and Face Mask Detection System . . . 427
V. Muthumanikandan, Prashant Singh, and Rithwik Chithreddy

Design of Delay-Efficient Carry-Save Multiplier by Structural Decomposition of Conventional Carry-Save Multiplier . . . 435
M. Venkata Subbaiah and G. Umamaheswara Reddy

Speech Intelligibility Quality in Telugu Speech Patterns Using a Wavelet-Based Hybrid Threshold Transform Method . . . 449
S. China Venkateswarlu, N. Uday Kumar, D. Veeraswamy, and Vallabhuni Vijay

Landslide Susceptibility for Communities Based on Satellite Images Using Deep Learning Algorithms . . . 463
Aadityan Sridharan, A. S. Remya Ajai, and Sundararaman Gopalan

Comparative Study on Estimation of Remaining Useful Life of Turbofan Engines Using Machine Learning Algorithms . . . 473
Eshika Shah, Nishil Madhani, Aditya Ghatak, and Abhiram Ajith Kumar

Machine Learning-Based Framework for Malware Analysis in Android Apps . . . 485
Anil Kumar Naik and K. V. Pradeepthi

A Survey on Byzantine Agreement Algorithms in Distributed Systems . . . 495
P. C. Sachin and Dua Amit

An Autonomous Intelligent System to Leverage the Post-harvest Agricultural Process Using Localization and Mapping . . . 507
Amitash Nanda and Deepak Ahire

Zero-Day Attack Detection Analysis in Streaming Data Using Supervised Learning Techniques . . . 517
B. Ida Seraphim and E. Poovammal

Comparative Evaluation of Machine Learning Methods for Network Intrusion Detection System . . . 531
Sunil Kumar Rajwar, Pankaj Kumar Manjhi, and Indrajit Mukherjee

Rare Pattern Mining from Data Stream Using Hash-Based Search and Vertical Mining . . . 543
Sunitha Vanamala, L. Padma Sree, and S. Durga Bhavani

A Novel ARDUINO Based Self-Defense Shoe for Women Safety and Security . . . 553
Swarnalatha Pasupuleti, Sattibabu Gummarekula, V. Preethi, and R. V. V. Krishna

A New Supervised Term Weight Measure Based Machine Learning Approach for Text Classification . . . 563
T. Raghunadha Reddy, P. Vijaya Pal Reddy, and P. Chandra Sekhar Reddy

Machine Learning-Based Human Activity Recognition Using Smartphones . . . 573
A. Vinay Kumar, M. Neeraj, P. Akash Reddy, and Ameet Chavan

A Feature Selection Technique-Based Approach for Author Profiling . . . 583
D. Radha and P. Chandra Sekhar

Detection of Fake News Using Natural Language Processing Techniques and Passive Aggressive Classifier . . . 593
K. Varada Rajkumar, Pranav Vallabhaneni, Krishna Marlapalli, T. N. S. Koti Mani Kumar, and S. Revathi

Efficiency Analysis of Pre-trained CNN Models as Feature Extractors for Video Emotion Recognition . . . 603
Diksha Mehta, Janhvi Joshi, Abhishek Bisht, and Pankaj Badoni

Mapping of Computational Social Science Research Themes: A Two-Decade Review . . . 617
Agung Purnomo, Nur Asitah, Elsa Rosyidah, Andre Septianto, and Mega Firdaus

Diagnosis of Pneumonia with Chest X-Ray Using Deep Neural Networks . . . 627
E. Venkateswara Reddy, G. S. Naveen Kumar, G. Siva Naga Dhipti, and Baggam Swathi

High Performance Algorithm for Content-Based Video Retrieval Using Multiple Features . . . 637
G. S. Naveen Kumar and V. S. K. Reddy

Smart Recruitment System Using Deep Learning with Natural Language Processing . . . 647
Ranganath Ponnaboyina, Ramesh Makala, and E. Venkateswara Reddy

Early Diagnosis of Age-Related Macular Degeneration (ARMD) Using Deep Learning . . . 657
Pamula Udayaraju and P. Jeyanthi

Recent Trends in Calculating Polarity Score Using Sentimental Analysis . . . 665
K. Srikanth, N. Sudheer, G. S. Naveen Kumar, and Vijaysekhar

Author Index . . . 675

About the Editors

V. Sivakumar Reddy is Professor in Computer Science and Engineering and Vice Chancellor, Malla Reddy University, and has more than 25 years of combined experience in teaching and industry. He is an alumnus of the Indian Institute of Technology (IIT) Kharagpur, where he obtained his Ph.D., and is versatile in multidisciplinary specializations in Signal Processing & Communications as well as Computer Science Engineering. His laurels include more than 150 publications in reputed national and international conferences and journals. He is Fellow of IETE, Life Member of ISTE, Member of IEEE, and Member of CSI. He was awarded "Best Teacher" in three consecutive academic years with citation and cash award. He is the recipient of the "India Jewel Award" for outstanding contribution to research in the field of Engineering and Technology. He was Member of the Board of Studies for JNT University, Hyderabad, India.

V. Kamakshi Prasad completed his Ph.D. in speech recognition at the Indian Institute of Technology Madras and his M.Tech. in Computer Science and Technology at Andhra University in 1992. He has more than 22 years of teaching and research experience. His areas of research and teaching interest include speech recognition and processing, image processing, pattern recognition, ad-hoc networks, and computer graphics. He has published several books, chapters, and research papers in peer-reviewed journals and conference proceedings. He is also Editorial Board Member of the International Journal of Wireless Networks and Communications and Member of several academic committees.

D. N. Mallikarjuna Rao is currently working as Professor in the Faculty of Science and Technology, Federation University @IIBIT, Australia. He obtained his Ph.D. in Computer Science at the University of South Australia and his Master of Technology at Jawaharlal Nehru Technological University, India. He has more than 20 years of teaching and research experience. He has experience in developing mathematical models, designing and developing algorithms, and data analysis, with expertise in computational techniques and computer networks. He has published more than 15 reputed journal papers in computation, networking principles, and graph theoretical analysis, and has experience in designing computational methodologies to solve engineering problems.

Suresh Chandra Satapathy is currently working as Professor at KIIT Deemed to be University, Odisha, India. He obtained his Ph.D. in Computer Science Engineering from JNTUH, Hyderabad, and his Master's degree in Computer Science and Engineering from the National Institute of Technology (NIT), Rourkela, Odisha. He has more than 27 years of teaching and research experience. His research interests include machine learning, data mining, swarm intelligence studies, and their applications to engineering. He has more than 98 publications to his credit in various reputed international journals and conference proceedings. He has edited many volumes from Springer AISC, LNEE, SIST and LNCS, and he is also Editorial Board Member of a few international journals. He is Senior Member of IEEE and Life Member of the Computer Society of India. Currently, he is National Chairman of Division-V (Education and Research) of the Computer Society of India.

Unsupervised Learning for the Automatic Management of Scientific Aspect in Academic Conferences Abdeldjaouad Nusayr Medakene , Abdelhadi Mohammed Ali Eddoud, and Khadra Bouanane

Abstract An academic conference is of great importance in the scientific community as it provides a platform for scientists and researchers to share their research findings, exchange ideas and establish new cooperative relationships with research groups around the world. Tasks related to the management of a scientific conference can be categorized according to two main aspects: the technical or scientific aspect, and the administrative and logistical aspect. While tasks in both aspects are challenging and time- and effort-consuming, the scientific aspect of a conference requires, in addition, a high level of expertise. Despite the great support that conference management systems provide to conference organizers, most of them focus mainly on the administrative and logistical aspect, while tasks of the scientific aspect are less considered. In this work, we make use of several unsupervised algorithms to automate some tasks of the scientific aspect where expertise, time and effort are required, namely the detection of out-of-scope papers and the creation of an accurate conference program.

1 Introduction

The organization of an academic conference is a prestigious event in the scientific community as it provides a platform for scientists and researchers to share their research findings, exchange ideas and establish new cooperative relationships with research groups around the world. However, making such an event exceptional requires time and tremendous effort, since there is a plethora of tasks that must be achieved in a limited amount of time, especially if the main concern of the organizers is to preserve or build a high reputation for their conference. This makes web-based information management systems, commonly named Conference Management Systems (CMSs), indispensable tools that aim to assist the organizers in performing many time-consuming and complex tasks.

A. N. Medakene · A. M. A. Eddoud · K. Bouanane (B) Department of Computer Science and IT, Kasdi Merbah University, Ouargla, Algeria e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_1


On the other hand, the organization of a conference generally involves several people in different roles performing several tasks. A detailed description of these roles and related tasks can be found in [21, 23]. Tasks related to the management of a scientific conference can be categorized according to two main aspects: the technical or scientific aspect, and the administrative and logistical aspect. While tasks in both aspects are challenging and time- and effort-consuming, those related to the scientific aspect require, in addition, a high level of expertise. These tasks are usually achieved by the chairmen or members of the Technical Program Committee (TPC). For instance, we can mention:

- Technical program committee composition: the selection of the technical program committee members is highly related to the conference scope and topics. In this task, the PC chairmen must look for experts with relevant profiles to take part in the TPC.

- Plagiarism and out-of-scope papers detection: while multiple plagiarism checkers can be used to detect plagiarism in submitted manuscripts, detecting out-of-scope papers is usually done manually. To do so, TPC members are requested to take a look at each submitted manuscript and decide whether it fits the scope of the conference or not. Basically, this requires high expertise and time, especially with a large number of submissions, which makes this task more tedious.

- Assignment of papers to reviewers: a good assignment allows an impartial assessment of submitted manuscripts, which positively impacts the conference reputation. However, finding an appropriate, equitable and fair assignment of papers to reviewers is considered a challenging and time-consuming task.

- Preparation of the conference program: this task requires a high level of expertise and needs to be accomplished in two steps. Based on the available resources (time and locations), the TPC chairmen must first organize accepted papers into sessions for oral and poster presentations. Each session should contain papers that share some common topics and must be labeled accordingly. Once this step is finished, the schedule is established, where several sessions can be planned to be held in parallel. However, one should consider potential conflicts. For instance, sessions that involve presentations from the same author must not be scheduled in the same time slot. Furthermore, it is better to schedule sessions with related topics in different time slots to allow participants to benefit from the event.

While the automation of the paper-reviewer assignment task is integrated in many existing CMSs and has been the focus of extensive studies over decades [3, 7, 10, 12, 15, 19, 24, 26], the automation and integration of the other tasks remain less considered. In this work, we are concerned with two of the tasks described above, namely the detection of out-of-scope manuscripts and the preparation of the conference program schedule. For this purpose, we propose to use efficient unsupervised methods to automate the aforementioned tasks. The remainder of the paper is structured as follows. Section 2 provides an overview of related work. The automation of the aforementioned tasks is described in Sect. 3. In Sect. 4, the results of computational experiments are presented; the paper is concluded in Sect. 5.


2 Related Work

As mentioned above, CMSs have become necessary tools that aim to assist the organizers in performing many tasks and thus make the organization process easier. For instance, we can cite EasyChair [8], Conftool [6], Openconf [16], ConfMaster [5], Confious [18] and many others. The main tasks that are usually handled by CMSs include email notification; online submission, collection and download of papers and abstracts; reviewer management; conflict-of-interest management; assignment of papers; collection of reviews; preparation of the proceedings; and even registration and payment of conference fees. On the other hand, many existing works propose to enhance such systems by automating and integrating other tasks. In addition to a new assignment algorithm, the authors in [20] present methods for conflict-of-interest detection, a poster setup plan and conference participant support tasks, and in [21], the same authors employed hierarchical clustering techniques to automatically propose a compilation of the papers into scientific sessions. Reinhardt et al. [23] propose to integrate common features of social media in CMSs to make scientific event management more social and awareness-supporting. In [27], the authors performed modifications on Openconf to include registration and payment acceptance.

3 Methods' Description

In the current work, we focus on two tasks related to the scientific aspect of the conference. Usually, these tasks are accomplished manually by the TPC chairmen or members and thus require time, effort and high expertise:

1. Detection of out-of-scope papers: in many conferences, TPC members or chairmen are requested to check the submitted papers and make a decision about their fitness with regard to the topics of the conference.

2. Preparation of the conference program: this consists of first creating sessions for both oral talks and posters according to common themes, then establishing the schedule based on the available resources.

To do so, several steps are performed, as shown in Fig. 1. Given a conference (which will be referred to as Conf), to perform any task for Conf, our framework first performs the following operations:

Fig. 1 A global description of the framework

3.1 Topic Modeling

This involves modeling Conf's topics by training a Latent Dirichlet Allocation (LDA) topic model, as described in [13, 14, 22], on a dataset of documents that are relevant to Conf's topics. In our case, we trained an LDA model on the document-term matrix of the pre-processed training set of the PeerRead dataset [11], which consists of over 14K papers submitted to top-tier venues including ACL, NIPS and ICLR.1 For an arbitrary conference, the system could be modified to dynamically build a corpus specifically for the conference, given the conference's list of topics, which can be chosen from a predefined and updatable list of topics. The LDA model infers a set of q topics (q is given) T = {t_k | k ∈ [q]} from its training corpus and, for each submitted paper in P = {P_i | i ∈ [n]}, predicts P_i's topic probability distribution vector, denoted z_i, given Conf's topics T.
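As a concrete sketch of this step (using scikit-learn's `LatentDirichletAllocation` in place of the paper's exact training setup; the toy corpus and q = 2 are illustrative assumptions, not the PeerRead configuration):

```python
# Sketch: infer each submitted paper's topic distribution z_i with an LDA model.
# The toy corpus and q = 2 are illustrative; the paper trains on PeerRead.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "neural networks deep learning image classification",
    "reinforcement learning policy gradient agents",
    "word embeddings language models text corpora",
    "convolutional networks image segmentation features",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)          # document-term matrix

q = 2                                         # number of topics (given)
lda = LatentDirichletAllocation(n_components=q, random_state=0).fit(X)

submitted = ["deep convolutional networks for image recognition"]
z = lda.transform(vectorizer.transform(submitted))  # topic distribution z_i
print(z.shape)                                # (1, q); each row sums to 1
```

Each row of `z` is a probability distribution over the q topics, which is exactly the feature vector z_i used by the downstream tasks.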

3.2 Word Embedding

In addition to the topic model, the semantic information of words is obtained by training a Word2vec word embedding model on the combined corpus of Conf's LDA model corpus (in our case, PeerRead) and the corpus of Conf's pre-processed submitted papers. The Word2vec model outputs, for each word in the model's vocabulary, its learned word vector, which will be used to calculate the semantic similarity between words using a suitable similarity measure (such as cosine similarity). Once the topic and word models have been built, the framework then proceeds to perform the main tasks as follows:

1 https://github.com/allenai/PeerRead


3.3 Detection of Out of Scope Papers Task

As mentioned previously, this is a tedious task that is usually accomplished manually by the TPC chairmen or members. In this paper, we propose an algorithm which adapts the method presented in [22] for this task. Originally, the authors in [22] used an LDA model to extract topics and feature words for each topic, and a Word2vec model to express the semantic information of words in documents. They trained both models solely on the subject documents, and used both LDA's probabilities and Word2vec's vectors to calculate what they referred to as the "relevancy" of a document. Instead of training the LDA and Word2vec models on the subject documents only (in our case, the submitted papers), we propose to train the LDA model using the corpus described in Sect. 3.1, since its documents are guaranteed to be in-topic. On the other hand, we train the Word2vec model on the corpus described in Sect. 3.2. By doing so, the Word2vec model grasps the submitted papers' words' semantics from their context in the submitted papers, and it integrates well with the LDA model by modeling their vocabulary as well. Together, the topic and word models prove useful in determining the relevancy of a submitted paper.

Submitted Papers' Relevancy Calculation. First, for each word $w_j$ of every submitted paper, the vectors extracted by Word2vec are used to calculate the cosine similarity between $w_j$ and each feature word $w_f$ under topic $t_k$. The relevancy between the word $w_j$ and the topic $t_k$, denoted $\mathrm{Rel}_{w,t}(w_j, t_k)$, is then defined as the probability-weighted sum of the cosine similarities between $w_j$ and a number $N_f$ of feature words under topic $t_k$:

$$\mathrm{Rel}_{w,t}(w_j, t_k) = \sum_{f=1}^{N_f} P(w_f \mid t_k)\,\cos(w_j, w_f) \qquad (1)$$

where $\cos$ denotes the cosine similarity between the Word2vec feature vectors of the words $w_j$ and $w_f$. Then, the relevancy between the word $w_j$ and the submitted paper $p_i$, denoted $\mathrm{Rel}_{w,p}(w_j, p_i)$, is defined as the probability-weighted sum of the relevancies between $w_j$ and all topics of $p_i$:

$$\mathrm{Rel}_{w,p}(w_j, p_i) = \sum_{k=1}^{q} P(t_k \mid p_i)\,\mathrm{Rel}_{w,t}(w_j, t_k) \qquad (2)$$

After that, the values $\mathrm{Rel}_{w,p}(w_j, p_i)$ of all words $w_j$ in the submitted paper $p_i$ are accumulated and divided by the number of words $J_i$ in the paper to obtain the total relevancy of the submitted paper, denoted $\mathrm{Rel}_p(p_i)$:

$$\mathrm{Rel}_p(p_i) = \frac{1}{J_i} \sum_{j=1}^{J_i} \mathrm{Rel}_{w,p}(w_j, p_i) \qquad (3)$$

Finally, off-topic submitted papers are filtered out according to a predetermined threshold on $\mathrm{Rel}_p$. We also note that in their original paper, the authors in [22] defined $\mathrm{Rel}_p(p_i)$ without the factor $\frac{1}{J_i}$; we added that factor to normalize a document's relevancy with respect to its size.
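Equations (1)-(3) translate directly into a few lines of code. The sketch below uses randomly generated stand-ins for the Word2vec vectors and the LDA probabilities; all names, shapes and the threshold value are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
q, Nf, Ji, dim = 3, 5, 10, 8   # topics, feature words per topic, words in paper, embedding dim

W_paper = rng.normal(size=(Ji, dim))        # Word2vec vectors of the paper's words w_j
W_feat = rng.normal(size=(q, Nf, dim))      # vectors of the N_f feature words per topic
P_wt = rng.dirichlet(np.ones(Nf), size=q)   # P(w_f | t_k), each row sums to 1
z_i = rng.dirichlet(np.ones(q))             # P(t_k | p_i): the paper's topic distribution

def cos(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rel_word_topic(wj, k):                  # Eq. (1)
    return sum(P_wt[k, f] * cos(wj, W_feat[k, f]) for f in range(Nf))

def rel_word_paper(wj):                     # Eq. (2)
    return sum(z_i[k] * rel_word_topic(wj, k) for k in range(q))

rel_paper = float(np.mean([rel_word_paper(W_paper[j]) for j in range(Ji)]))  # Eq. (3)

THRESHOLD = 0.0   # in practice, tuned on papers known to be in-topic
print("relevancy:", rel_paper, "in scope:", rel_paper >= THRESHOLD)
```

Because the weights P(w_f | t_k) and P(t_k | p_i) each sum to one, the resulting relevancy is a convex combination of cosine similarities and therefore always lies in [-1, 1], which makes choosing a threshold straightforward.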

3.4 Sessions Creation and Conference Program Schedule

As mentioned at the beginning of this section, the preparation of the conference program goes through two steps: first, the generation of sessions for oral and poster presentations, and then the scheduling of these sessions in a meaningful way. Considering the available time slots and locations, we must group, in each session, a predefined number of papers that share a common theme. Thus, sessions can be generated by performing a clustering under size constraints on the clusters. In this paper, the generation of sessions is accomplished by a new algorithm based on Bregman Hard Clustering [2]; a cluster labeling method is then proposed. This new algorithm uses the BregMeans++ with Local Search method [25] as an initializer and is modified to take into account cluster size constraints, since each session's (i.e., cluster's) size is specified by the PC chairmen beforehand. Once the sessions are generated, establishing the conference program can be formulated as a coloring problem, where any of the many existing algorithms may be used to solve it. Our approach is detailed as follows.

Clustering the Accepted Papers into Sessions Task. To cluster the accepted papers into sessions, we propose an algorithm based on Bregman Hard Clustering (BHC), introduced by the authors in [2]. Since the feature vector used to represent a paper P_i is its topic probability distribution vector z_i, we make use of a divergence measure suitable for these features; in our case, we choose the Jensen-Shannon divergence, denoted JSD. The proposed clustering algorithm consists of three steps:

S1. Initialization Step: For this step, we use the method proposed in [25], BregMeans++ with Local Search, which is an improvement of BregMeans++ [1]. BregMeans++ with Local Search combines BregMeans++ with the local search strategy, whose main purpose is to further improve the initial centroid assignment by an iterative process based on a single swap operation.

S2. Assignment Step: Standard BHC does not take into account prior information about cluster sizes, and given that the PC chairmen would want to specify exactly how many papers fit into each session, standard clustering does not suffice. For this reason, we propose to incorporate the constraint defined in [9] into BHC.
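A minimal sketch of this size-constrained assignment step (the JSD implementation and the greedy fill-until-full rule below are our illustrative reading of the constraint, not the paper's exact code; variable names are assumptions):

```python
import numpy as np

def jsd(p, q_):
    """Jensen-Shannon divergence between two topic distributions."""
    p, q_ = np.asarray(p, float), np.asarray(q_, float)
    m = 0.5 * (p + q_)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q_, m)

def assign(papers, centroids, capacities):
    """Assign each paper to the nearest centroid whose session is not yet full."""
    counts = [0] * len(centroids)
    labels = []
    for z in papers:
        open_sessions = [s for s in range(len(centroids)) if counts[s] < capacities[s]]
        s = min(open_sessions, key=lambda t: jsd(z, centroids[t]))
        counts[s] += 1
        labels.append(s)
    return labels

papers = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]   # topic vectors z_i
centroids = [[0.85, 0.15], [0.15, 0.85]]
labels = assign(papers, centroids, capacities=[2, 2])
print(labels)
```

Each paper goes to the closest centroid among sessions that still have free slots, so no session can exceed the size fixed by the PC chairmen.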

Unsupervised Learning for the Automatic …


We denote with ζs the number of papers in session s, and with Ns the number of sessions. The algorithm then assigns the point (paper) z to the nearest available cluster (session), i.e., to the cluster

s* = argmin over s' ∈ {s' ∈ [Ns] : |Cs'| < ζs'} of JSD(z, μs'),

where μs' denotes the centroid of cluster Cs'.
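To illustrate the size-constrained assignment step, the following is a minimal Python sketch (not the authors' implementation: the greedy one-pass assignment and the function names are our own simplification of the constrained BHC step):

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two topic distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def assign_papers(papers, centroids, capacities):
    """Assign each paper to the nearest session (by JSD) that still
    has a free slot, i.e. a session s with |C_s| < zeta_s."""
    counts = [0] * len(centroids)
    labels = []
    for z in papers:
        order = np.argsort([jsd(z, c) for c in centroids])
        s = next(int(s) for s in order if counts[s] < capacities[s])
        counts[s] += 1
        labels.append(s)
    return labels
```

In the full algorithm this assignment alternates with a centroid-update step until convergence; a session whose quota ζs is reached simply drops out of the argmin.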

Upon approval from the Ministry of Civil Services, the vacancies become visible to the jobseeker participants, who can then post a job application asset onto the network. The ApplyJob transaction is used to apply for a job:

transaction ApplyJob {
  o JobApplication jobApplication
}

A Blockchain-Based Architecture for Implementing …


Later, the ministry validates the status of each application and informs the job seekers. The AssignJob transaction is used to assign the job to the concerned applicant; it updates the status of the job vacancy and approves the job:

transaction AssignJob {
  o JobVacancy jobVacancy
  o JobApplication jobApplication
  o JobVacancyApprovalStatus jobVacancyApprovalStatus
  o JobVacancyStatus status
}

The UpdateStatus transaction is used to update the status of a candidate's job application:

transaction UpdateStatus {
  o JobApplication jobApplication
  o JobApplicationStatus jobApplicationStatus
}

To execute the above transactions defined in the model file, the logic file needs to be deployed on the blockchain network. The Hyperledger Composer Playground [17] tool is used to create the business network; it provides a platform to create business network cards, test them, and export the business network in archive format. Figure 2 presents the testing environment of Composer, which displays two instances of the JobSeeker participant.

Fig. 2 Testing environment of composer playground


P. C. Sherimon et al.

5 Conclusion

The design of the proposed network of a blockchain-based job portal, as well as the definitions of the important transactions, is presented in this article. The suggested network was designed to improve the conventional technique, which is expensive and time-consuming. The assets will be secured on the blockchain, and all transaction details will be available in the registries maintained by the blockchain network, enabling data security and proper accountability. The deployment of the above-mentioned model with API compatibility will be the focus of future work in this area. Also, for each transaction defined in the model file, functions should be defined as smart contracts in the script file. Future work could also include an interactive front-end for interacting with the network. Funding: The research leading to these results has received funding from The Research Council (TRC) of the Sultanate of Oman under the Block Funding Program BFP/RGP/ICT/18/114.

References

1. Allessie, D., Sobolewski, M., Vaccari, L., Pignatelli, F., European Commission, Joint Research Centre: Blockchain for digital government: an assessment of pioneering implementations in public services (2019). Accessed 05 June 2021. [Online]. Available: https://doi.org/10.2760/942739
2. Blockchain Dubai. Dubai Blockchain Strategy. Smart Dubai. https://www.smartdubai.ae/initiatives/blockchain. Accessed 05 June 2021
3. Keršič, V., Štukelj, P., Kamišalić, A., Karakatić, S., Turkanović, M.: A blockchain and AI-based platform for global employability. In: Blockchain and Applications, pp. 161–168. Cham (2020). https://doi.org/10.1007/978-3-030-23813-1_20
4. Pinna, A., Ibba, S.: A blockchain-based decentralized system for proper handling of temporary employment contracts (2017). Accessed 05 June 2021. [Online]. Available: http://arxiv.org/abs/1711.09758
5. Design of Recruitment Management Platform Using Digital Certificate on Blockchain. http://xml.jips-k.org/full-text/view?doi=https://doi.org/10.3745/JIPS.03.0121. Accessed 05 June 2021
6. Yeom, S., Choi, S., Chi, J., Park, S.: Blockchain-based employment contract system architecture allowing encrypted keyword searches. Electron. 10(9), Art. no. 9 (2021). https://doi.org/10.3390/electronics10091086
7. Onik, M.H., Miraz, M.H., Kim, C.-S.: A recruitment and human resource management technique using blockchain technology for industry 4.0, p. 6 (2018)
8. Hamrani, N.R.A., Hamrani, A.R.A.: People of determination (disabilities) recruitment model based on blockchain and smart contract technology. Technol. Invest. 12(3) (2021). https://doi.org/10.4236/ti.2021.123008
9. Haber, S., Stornetta, W.S.: How to time-stamp a digital document. In: Advances in Cryptology: CRYPTO '90, pp. 437–455. Berlin, Heidelberg (1991). https://doi.org/10.1007/3-540-38424-3_32
10. Nakamoto, S.: Bitcoin: A peer-to-peer electronic cash system, p. 9
11. Buterin, V.: Ethereum: A next-generation smart contract and decentralized application platform, p. 36
12. Hyperledger: Open Source Blockchain Technologies. Hyperledger. https://www.hyperledger.org/. Accessed 05 June 2021
13. Kim, H.J.: Technical aspects of blockchain. In: Kent Baker, H., Nikbakht, E., Stein Smith, S. (eds.) The Emerald Handbook of Blockchain for Business, pp. 49–64. Emerald Publishing Limited (2021). https://doi.org/10.1108/978-1-83982-198-120211006
14. ILTANET. https://www.iltanet.org/. Accessed 05 June 2021
15. Frankenfield, J.: Permissioned blockchains. Investopedia. https://www.investopedia.com/terms/p/permissioned-blockchains.asp. Accessed 02 Sept 2021
16. Sherimon, V., Sherimon, P.C., Ismaeel, A.: JobChain: An integrated blockchain model for managing job recruitment for ministries in Sultanate of Oman. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 11(2) (2020). https://doi.org/10.14569/IJACSA.2020.0110252
17. Using Playground | Hyperledger Composer. https://hyperledger.github.io/composer/v0.19/playground/playground-index. Accessed 05 June 2021

Analysis of the Interest in the Profession of Tutor on the Internet (A Case Study of Google Trends) Oleh Karyy, Ihor Kulyniak, Oksana Ivanytska, Liubov Halkiv, and Ivan Zhygalo

Abstract Using the Web service Google Trends, we analyzed the dynamics of changes in the popularity of the search query “tutor” in Ukraine based on the distribution of search frequencies. With the help of the same service, the paper ranks the regions of Ukraine by the degree of popularity of the search query “tutor.” To confirm the hypothesis of a relationship between the popularity of the search query “tutor” and the average percentage of school graduates who passed the external independent testing (EIT) in Ukraine, the values of the association and contingency coefficients were calculated, as well as Fisher's exact criterion for groups of regions with lower and higher average values.

1 Introduction The problem of education has always been and remains relevant in any society. Teachers, parents, scientists, politicians are trying to answer the question of how to make the learning process effective and how to improve the quality of education. Currently, more and more parents are turning to private tutors for help in improving the quality and level of education of their children.

O. Karyy · I. Kulyniak (B) · O. Ivanytska · L. Halkiv · I. Zhygalo Lviv Polytechnic National University, Bandera Street 12, Lviv 79013, Ukraine e-mail: [email protected] O. Karyy e-mail: [email protected] O. Ivanytska e-mail: [email protected] L. Halkiv e-mail: [email protected] I. Zhygalo e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_9


In the last few years, this phenomenon has become widespread, which allows us to assume that tutoring is gradually becoming part of the learning process. Whereas tutoring used to be seen as a necessity for a certain period (for example, due to a delay in learning caused by illness), nowadays tutors often accompany a child during the entire period of study, from preparation for school to admission to a higher educational institution. The popularity of tutoring is growing rapidly: on average, the demand for online tutoring services grows by 15% every year. The data of the sociological survey “Dynamics of the attitude of Ukrainians to EIT,” conducted by the sociological group “Rating” from 3 to 8 May 2018, show that more than half (52%) of those who personally passed the EIT, or whose children did, used tutors' services in preparation for the test, while 43% did not use such services [1]. Tutoring services are also in demand abroad. The results of the international Ukrainian-Polish sociological project “Youth of the Borderland: Drohobych–Przemyśl” give grounds to claim the significant popularity of tutoring among Polish youth: 62% of graduates of Polish educational institutions turned to tutors [2]. In the Russian Federation, the results of a poll provide the following figures: tutors' services were used by 31% of Moscow students; in primary school, a private individual teacher is needed for 13% of children, in grades 5–7 for 28% of pupils, and in grades 8–11 up to 68% of pupils work with tutors [3]. A survey of 522 students of Surgut State Pedagogical University showed that during preparation for the single state exam (SSE), 41% of the surveyed students sought the help of individual tutors [4]. In the source [5], the following data are given for other post-Soviet countries: in Kazakhstan, 59.9% of first-year students received additional private educational services, in Georgia 76%, and in Tajikistan 64.8%.
The highest rate of use of tutoring services is noted in Azerbaijan (95%), followed by Ukraine (82%) and Georgia (81.2%); the lowest is 49.9% in Tajikistan. In Slovakia, 56% of first-year students admit that they went to tutoring or preparatory courses in the last grade of secondary school. In Hungary, 75% of primary and secondary school pupils had additional classes [6]. With the introduction in Ukraine of such a form of graduates' knowledge-level assessment as the external independent testing (EIT), tutoring as an auxiliary form of private education and preparation for this control is becoming increasingly important [7]. Tutoring is mainly provided as a service to prepare entrants to pass the EIT for further admission to higher educational institutions and, to a lesser extent, to prepare children for school and to pass various tests and examinations both at schools and at other educational institutions [8, 9]. That is why research and analysis of the role of tutoring in the development of the education system are extremely relevant. This problem is especially acute for the following subjects: school graduates, tutors, and the state as represented by secondary schools. The demand for and effectiveness of additional individual classes necessitate a further study of tutoring as a multifaceted social phenomenon, which reflects the relevance of the chosen research topic. The Web service Google Trends has long been used successfully to estimate the frequency of search queries in various fields: politics, sociology, economics, IT, medicine, communications, etc. Among the most important works, we note the


following: Petrenko [10] analyzed the methodology of using software to study the impact of Internet communication on public consciousness under the conditions of a hybrid conflict in Ukraine. Liashenko [11] put forward and tested the hypothesis of the integrity, interaction, and interdependence of the concepts of social and information security using the tool Google Trends. Seung-Pyo Jun, Hyoung Sun Yoo, and San Choi [12] conducted comprehensive and objective research into how the public use of big data from Web searches has affected research and, furthermore, discussed the implications of Google Trends in terms of big data utilization and application. In addition, the Web service Google Trends is used by scientists to model and predict economic indicators [13, 14], to assess the level of interest in and model the topic of innovation [15], and to predict tourist arrivals and overnight stays [16]. The main research objectives of the study are to justify the relevance of and growth in demand for tutoring as an auxiliary form of private education and preparation for exams or other forms of control. Taking into consideration the fact that in Ukraine the services of a tutor are mainly used to prepare for the external independent testing, the authors formulated the following hypothesis: there is a connection between the level of popularity of the search query “tutor” and the average percentage of people who passed the external independent testing (EIT).

2 Research Methods

The frequency of search queries on a particular topic is a symptom of what society is interested in and which issues are the focus of public attention. One of the tools for analyzing interest in the profession of “tutor” on the Internet is the Web service Google Trends. Google Trends (https://www.google.com/trends) is Google's public Web service, based on Google search, which shows how often a certain term is searched relative to the total volume of search queries on the Internet in different countries of the world and in different languages. The Web service analyzes and generates a large amount of information that is available online, presents the results of the search frequency distribution over a certain time interval, and groups countries, regions, and cities in a ranking by the search popularity of a word. As is known, the easiest and cheapest way to advertise tutoring services is on the Internet. The Internet is also the fastest source for students to find information about tutoring services. In addition, the Internet is one of the most accessible sources for researching a large array of data that is easy to find at any time and in the shortest possible time (compared with newspaper or magazine ads). That is why it is proposed to use the Web service Google Trends to identify the interest in and popularity of the search term “tutor” on the Internet. The proposed theoretical research model is presented in Fig. 1. The theoretical model allows us to estimate the connection between (1) the search query “tutor” level of popularity (determined by the analysis of the interest in the search query “tutor” on the Internet in Ukraine using the Web service Google Trends)


Fig. 1 Proposed theoretical research model (Source Built by the authors). The diagram links the analysis of interest in the search query “tutor” on the Internet in Ukraine (via the Web service Google Trends) to the search query's level of popularity, split into below average [0–50] and above average [51–100] groups, and, through hypothesis H, to the average percentage of people who passed the external independent testing (EIT), split into below average and above average groups based on data of the Higher Educational Institutions of Ukraine.

and (2) the average percentage of people who passed the external independent testing (EIT) in Ukraine (determined on the basis of data of the Higher Educational Institutions of Ukraine). To confirm the hypothesis (H), the values of association and contingency coefficients are calculated, as well as Fisher’s exact criterion for groups of regions with above and below average values.

3 Results

3.1 Analysis of the Interest in the Search Query “Tutor” on the Internet in Ukraine Using the Web Service Google Trends

In the Web service Google Trends, we analyzed the degree of interest in the search term “tutor” for the periods 01.01.2007–01.01.2021 and 01.08.2019–01.06.2020. These periods were chosen to capture the mood of searchers on the Internet over the entire span since the introduction of the EIT (2008) and over the most recent year. Graphs of search query popularity were constructed, where the horizontal axis represents time and the vertical axis shows how often the search term “tutor” was searched relative to the total number of search queries in Ukraine. The numbers on the graph show the popularity of the search term “tutor” relative to the highest point on the graph for the given region and period. We should note that 100 is the popularity peak of the search term “tutor,” 50 means that its popularity is half as high, and 0 means that there was not enough data about the search term “tutor” on the Internet for analysis (see Figs. 2 and 3). The category “Vacancies and education” was selected from the list of categories to exclude searches that are not the subject of our study (for example, the title of the movie “Tutor”). In Russian, the word “tutor” coincides in spelling with Ukrainian, which allowed us to integrate the search queries of both Ukrainian-speaking and Russian-speaking citizens of Ukraine. According to Fig. 2, we can trace an upward trend, which indicates the growing interest of Ukrainian citizens in the search term “tutor” every year. In Figs. 2 and 3, we can trace the seasonality: the period of the biggest interest in the year


Fig. 2 Popularity of the search query “tutor” in Ukraine for the period 01.01.2007–01.01.2021 (Source Built by the authors using the web service Google Trends)

Fig. 3 Popularity of the search query “tutor” in Ukraine for the period 01.08.2019–01.06.2020 (Source Built by the authors using the web service Google Trends)

is in September, and the smallest in April–May. The peak of popularity of the search query “tutor” for the period 01.01.2007–01.01.2021 in Ukraine falls in September 2018. Thus, we can assume that most entrants start looking for tutors in September and for the long run. Inquiries in other months may indicate a desire to find a new tutor, or a tutor for the few months (3–4) before the EIT passing period. Another factor smoothing the seasonality is the search for tutoring services by students preparing for exams and by employees of enterprises and organizations (for example, to learn English), since their requests, unlike those of applicants, are not tied to the time of the EIT. Also, with the help of the Web service Google Trends, a ranking of regions according to the popularity of the search query “tutor” was constructed (Fig. 4). Ranking the territorial units of Ukraine allows us to find out where the search term “tutor” was most popular during the selected period. During the period 01.08.2019–01.06.2020, the highest popularity indicator is typical for the Lviv (100) and Khmelnytskyi (99) regions, and the lowest for the Cherkasy and Luhansk regions (5). The highest interest for the period 01.01.2007–01.01.2021 is typical for Ternopil (100), Ivano-Frankivsk (86), and Lviv (82), and the lowest for Kharkiv (10). During the period 01.08.2019–01.06.2020, the Web service Google Trends singled out only one city, Kyiv (100); in other cities, the proportion of searches relative to this leader is so small that the application ignores them and does not analyze them.
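The 0–100 index described above can be reproduced from raw frequencies: each value is divided by the peak of the series and scaled to 100. A minimal sketch (the function name and sample counts are our own illustration, not Google's implementation):

```python
def trends_scale(counts):
    """Rescale raw search counts to the Google Trends 0-100 index:
    100 is the peak of the series, 50 means half the peak interest."""
    peak = max(counts)
    if peak == 0:
        # Google Trends reports 0 when there is not enough data
        return [0] * len(counts)
    return [round(100 * c / peak) for c in counts]

# e.g. hypothetical monthly counts peaking in the third month:
# trends_scale([120, 60, 240]) -> [50, 25, 100]
```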


Fig. 4 Ranking of regions of Ukraine by the degree of popularity of the search query “tutor” (Source Built by the authors using the Web service Google Trends). The values shown are: Lviv 100, Khmelnytskyi 99, Ternopil 79, Rivne 78, Zhytomyr 74, Volyn 74, Vinnytsia 65, Ivano-Frankivsk 60, Kyiv 56, Poltava 53, Kyiv city 48, Transcarpathian 48, Chernivtsi 38, Zaporizhzhia 34, Dnipropetrovsk 25, Kherson 23, Sumy 23, Odesa 17, Donetsk 17, Mykolaiv 16, Kirovohrad 14, Kharkiv 13, Chernihiv 6, Cherkasy 5, Luhansk 5.

3.2 Assessment of the Relationship Closeness Between the Search Query “Tutor” Level of Popularity in Ukraine and the Average Percentage of People Who Passed the External Independent Testing (EIT)

One of the resulting indicators of the quality of education in Ukraine is the percentage of people who have passed the external independent testing (EIT). In 2020, participants had the opportunity to take external examinations in Ukrainian language and literature (obligatory for all), history of Ukraine, English, Spanish, German and French, biology, geography, mathematics, physics, and chemistry. The results of the external independent testing were determined in two stages. At the first stage, the test score of the EIT participant was determined. At the second stage, based on the test score, the rating of the participant was determined on a 200-point scale, which is used in compiling the rating list of applicants to Higher Educational Institutions of Ukraine. To obtain the results on the 200-point scale, tables converting test scores into a rating scale from 100 to 200 points were used; these tables were published by the Ukrainian Evaluation Center only after checking the correctness of the tasks of each test participant and determining the threshold score “passed/failed.”


The threshold score was determined by a group of experts who analyzed the actual performance of the test tasks by the participants and, based on this analysis, set the number of test scores that determines the threshold test score for each subject. EIT participants who did not reach the threshold test score are considered to have failed the test. In our study, the average percentage of people who passed the EIT is taken as the indicator showing the ratio of the number of EIT participants who passed the threshold test score to the total number of people who came to the EIT. According to the results of the external testing in 2020, the lowest percentage of people who passed the EIT is 85.38%, for the Transcarpathian region, and the highest is 93.60%, for the city of Kyiv [17]. Let us evaluate the closeness of the connection between the popularity level of the search query “tutor” and the average percentage of people who passed the external independent testing by constructing and analyzing a table of mutual conjugation of features. The popularity level of the search query “tutor” is measured on a scale of [0–100]; therefore, we can distinguish two groups: below average [0–50] and above average [51–100]. Similarly, we propose to distinguish two groups of percentages of persons who passed the EIT: below the average level [85.38–89.49] and above the average level [89.50–93.60]. To determine the closeness of the connection between these two features, each of which consists of only two groups, we use the method of mutual conjugation table analysis and calculate the Yule association coefficient as well as the Pearson contingency coefficient. The Yule association coefficient is calculated by the formula:

A = (ad − bc) / (ad + bc) = (8·9 − 7·1) / (8·9 + 7·1) = 0.8227 > 0.5,

where a, b, c, d are the numbers of objects that fall into each of the four zones formed by the division of the two features into two groups. The Pearson contingency coefficient is calculated by the formula:

K = (ad − bc) / √((a + b)(b + d)(a + c)(c + d)) = (8·9 − 7·1) / √((8 + 7)(7 + 9)(8 + 1)(1 + 9)) = 0.4422 > 0.3.

The contingency coefficient is always less than the association coefficient. The connection is considered confirmed if A ≥ 0.5 or K ≥ 0.3. Table 1 presents the grouping of the regions of Ukraine by the popularity level of the search query “tutor” and the average percentage of people who passed the external independent testing. To confirm or refute the null hypothesis of whether there are statistically significant differences between the level of popularity of the search query “tutor” and the average percentage of people who passed the external independent testing, we use Pearson's


Table 1 Data on the relationship between the popularity level of the search query “tutor” and the average percentage of people who passed the EIT (in 2020). Columns: groups of the average percentage of people who passed the EIT.

Groups of the popularity level of the search query “tutor” | Below average level [85.38–89.49] | Above average level [89.50–93.60] | Total
Below average level [0–50]                                 | a = 8                             | b = 7                             | 15
Above average level [51–100]                               | c = 1                             | d = 9                             | 10
Total                                                      | 9                                 | 16                                | 25

criterion χ² and compare the result with the critical value. However, since in our data (Table 1) the value in each cell does not exceed 10, Fisher's exact test is used for the analysis instead. Fisher's exact test is used to compare two or more relative indicators that characterize the frequency of a certain feature with two values. For our study, we use a two-sided test, as we estimate the differences in frequencies in both directions, i.e., the probability of both a higher and a lower frequency in the experimental group compared with the control group. The probability of the observed table is calculated by the formula:

P = [(a + b)! · (c + d)! · (a + c)! · (b + d)!] / [a! · b! · c! · d! · N!]   (1)

where N is the total number of studied features in the two groups. If the value of Fisher's exact criterion is less than the critical one (P = 0.05), it is concluded that there are statistically significant differences in the frequency of the result depending on the influence of the risk factor. For our data, formula (1) gives the probability of the observed table, [(8 + 7)! · (1 + 9)! · (8 + 1)! · (7 + 9)!] / [8! · 7! · 1! · 9! · 25!] ≈ 0.0315, and summing the probabilities of all tables with the same margins that are no more likely than the observed one yields the two-sided value P = 0.04045 < 0.05.
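The three statistics can be checked with a few lines of Python (a sketch using the paper's 2×2 table; the function names are ours). Note that `fisher_two_sided` sums, over all tables with the same margins, the probabilities of those no more likely than the observed one, which is what yields the reported 0.04045, while the single-table factorial formula alone gives ≈0.0315:

```python
from math import comb, sqrt

a, b, c, d = 8, 7, 1, 9          # Table 1 cell counts
A = (a*d - b*c) / (a*d + b*c)    # Yule association coefficient
K = (a*d - b*c) / sqrt((a+b)*(b+d)*(a+c)*(c+d))  # Pearson contingency

def table_prob(a, b, c, d):
    """Hypergeometric probability of a single 2x2 table with fixed margins."""
    n = a + b + c + d
    return comb(a+b, a) * comb(c+d, c) / comb(n, a+c)

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p: sum the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    p_obs = table_prob(a, b, c, d)
    r1, c1, n = a + b, a + c, a + b + c + d
    return sum(p for x in range(max(0, c1 - (n - r1)), min(r1, c1) + 1)
               if (p := table_prob(x, r1 - x, c1 - x, n - r1 - c1 + x))
               <= p_obs + 1e-12)

print(A, K, fisher_two_sided(a, b, c, d))
```

The exact values are A = 65/79 ≈ 0.8228 and K ≈ 0.4423; the paper truncates these to 0.8227 and 0.4422.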

Thus, the values of the coefficients of association, contingency, and Fisher’s exact criterion indicate that the relationship between the popularity of the search query “tutor” and the average percentage of people who passed the external independent testing is confirmed and is relatively strong.

4 Discussion and Conclusions

The following aspects have been explored using the Web service Google Trends:
• the popularity of the search query “tutor” was determined;
• trends in the change of popularity of the search query “tutor” were revealed;


• seasonality in the interest in the profession of “tutor” in the market was analyzed;
• the influence of geographical factors on the popularity of the search query “tutor” was researched.

Today, tutoring is a common practice of a teacher's additional classes with one student or a group to improve the mastery of the material or to prepare separately for tests or exams. Tutoring became widespread in the early 1990s, and the problem of this phenomenon began to be studied in the early 2000s. Teachers, parents, scientists, and politicians are trying to answer the question of how to make the learning process effective and how to improve the quality of education. Currently, more and more parents are turning to tutors for help in improving the quality and level of their children's education. In the last few years, this phenomenon has become widespread; according to opinion polls, about 50–60% of pupils seek the services of tutors for EIT preparation, which suggests that tutoring is gradually becoming part of the learning process involving all its subjects: students, tutors, and government agencies. By investigating the level of popularity of search queries, public educational institutions can predict participants' success in passing the external independent testing. External independent testing as a form of knowledge control in Ukraine was introduced in 2008. However, society still does not have a clear opinion about the effectiveness of this evaluation system. Taking into account the lack of consensus in society on the effectiveness of the EIT and tutoring, our study is a significant contribution as a scientific basis for decision-making by the Ministry of Education and Science of Ukraine and other educational institutions.
The calculated values of the association coefficient (A = 0.8227 > 0.5) and the contingency coefficient (K = 0.4422 > 0.3), as well as Fisher's exact criterion (P = 0.04045 < 0.05), allow us to conclude that the relationship between the popularity level of the search query “tutor” and the average percentage of people who passed the external independent testing is confirmed and is relatively strong. These results are consistent with the findings of the authors in [18], who confirmed the high prevalence and popularity of tutoring among prospective applicants and analyzed the changes that have occurred since the introduction of the EIT. Similar results are given in the report [17], which states that the highest external evaluation scores were awarded to graduates who used the services of tutors and preparatory courses. However, the development of tutoring leads to increased social inequality and inequality of educational opportunities (Vakhovskyi and Bocharova [19]) and can help only some categories of graduates, rather than all of them, to prepare better for the EIT (Hladkevych and Zayats [20]). One of today's challenges is that the development of tutoring reduces the role of the school in preparing for the final stage of training, the EIT or state final certification. Among the limitations, it should be noted that the functions, tasks, and responsibilities of a tutor in Ukraine are not the same as those embedded in the meaning of the term “tutor” in other countries. In addition, different countries have different requirements for independent testing. These facts do not allow us to generalize the


results of the study or to fully state the relationship between the level of popularity of the search query “tutor” and the average percentage of people who have passed independent testing in other countries. The Web service Google Trends was selected for this research because of certain advantages over traditional sources of information collection. Firstly, Web search data are available in near real time, while collecting traditional data can take months or years. Secondly, everyone has free, open, and unimpeded access to the data, and no significant financial costs are required to collect them. Thirdly, search data are poorly controlled and therefore provide a high level of transparency. However, Google Trends is not designed to collect or analyze the frequency of a large number of search queries, which is why some Internet marketers ignore it. But for analyzing seasonality and current trends, or for forecasting demand, it is one of the best free tools. As Google Trends shows the level of interest in a topic (search query) relative to the highest rate over a period of time, we were not able to estimate the demand of those who do not have an Internet connection or who used social connections (relatives, friends) to find a tutor. The results of the study can be used by tutors and by companies that provide tutoring services. For example, the identified seasonality in the interest in the profession of “tutor” will allow companies to build an effective marketing plan, intensifying the advertising campaign in the months in which the search for tutors is most active. The results on the relationship between the popularity level of the search query “tutor” and the average percentage of people who passed the external independent testing can be taken into account by the authorities to form a state order for specific regions or educational institutions.
Identifying the popularity of providing tutoring services in a particular subject will allow educational institutions to form an optimal curriculum, taking into account the requests of school graduates. The results of the research are limited to the study of interest in the profession of “tutor” in Ukraine; therefore, in the future, it is necessary to pay attention to the study of interest in the profession of “tutor” in other countries.


A PID Temperature Controller for Passive Solar Water Heaters with a Low-Cost Three-Way Valve

Chanh-Nghiem Nguyen, Ngo Thanh The, and Luong Vinh Quoc Danh

Abstract In this paper, we propose a PID temperature controller for passive solar water heaters, based on regulating an electric valve developed from a low-cost three-way valve integrated with a servomotor. An experimental model was developed to characterize the three-way valve and to find the parameters of the controller. The control algorithm was embedded in an Arduino Nano kit to control the opening angle of the designed valve for mixing hot and cool water from the inlets. By adjusting the hot and cool water mixing ratio, the mixed outlet water temperature can be maintained at the desired value. The response of the PID model was also compared to that of a fuzzy logic model. This study also determined the optimal value of the initial valve opening angle to achieve the required energy efficiency and response time. Experimental results show that the system response using the PID controller had an overshoot of less than 3%, a settling time of about 110 s, and a temperature error of ±1 °C. The research results provide an advanced feature for passive solar water heaters that contributes to energy saving, comfort, and safety for users.

C.-N. Nguyen · L. V. Q. Danh (B)
Can Tho University, Can Tho City, Vietnam
e-mail: [email protected]
C.-N. Nguyen
e-mail: [email protected]
N. T. The
Can Tho Vocational College, Can Tho City, Vietnam
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_10

1 Introduction

The recent demand for solar energy, a clean and sustainable energy source for civil and industrial purposes, is increasing rapidly. In particular, the use of solar energy to heat water for household use is becoming more and more popular, with a global solar thermal capacity of 480 GW in 2018 [1, 2]. Using solar energy significantly reduces electricity consumption and contributes to reducing carbon dioxide (CO2) emissions from burning fossil fuels. The South-central and South regions of Vietnam have a total of 2000–2500 h of sunshine per year, and the solar radiation intensity is in the range of 4.9–5.7 kWh/m2/day [3], which is very suitable for the deployment of passive solar water heaters. Passive solar water heaters are highly durable, have low maintenance costs, and do not require electricity to operate. However, to obtain water at the desired outlet temperature, three-way mixing valves are used and adjusted manually. This adjustment takes time and carries a risk of hot water scalds, especially for children and the elderly. Microcontroller techniques have been applied to temperature control for water heaters [4–8]. Microcontroller units (MCUs) were used to regulate the frequency and working time of the heating (or cooling) elements to keep the water temperature at the outlet constant. These approaches have drawbacks such as low accuracy in temperature control, slow response, and complex hardware designs. Such limitations can be overcome by microcontrollers embedded with a proportional–integral–derivative (PID) control algorithm, owing to its simple structure and effectiveness for many dynamic processes [9–14]. However, most published works have focused on temperature controllers for electric water heaters and active solar water heaters. To the best of our knowledge, few efforts have been made to control the water temperature of passive solar heaters, which are widely used and in increasing demand. Therefore, we developed a PID-controlled passive solar water heater using an electrically controlled valve. This valve was fabricated from a low-cost three-way valve and a servomotor to regulate the mixing ratio of the hot and cool water. The remainder of the paper is organized as follows.
Section 2 presents the methodology, including the structure of the designed system and the experimental model development, the control algorithm development, and the system software design. Experimental results are discussed in Sect. 3. Finally, conclusions are presented in Sect. 4.

2 Methodology

2.1 System Overview

The principle diagram of the passive solar water heater integrated with a temperature controller is presented in Fig. 1. The proposed control system design consists of five main parts: a microcontroller unit (MCU), an electric three-way valve, a temperature sensor, an RF transceiver module, and a handheld remote controller.

Microcontroller unit (MCU). An Arduino Nano kit [15] was used as the central processing unit for the control system. Its main task was to control the three-way valve for adjusting the hot and cool water mixing ratio for a set temperature. The MCU also performed data transmission with the handheld remote controller via the RF transceiver module.


Fig. 1 Structure of the temperature controller for the solar passive water heaters

Electric three-way valve. This valve was developed for mixing hot and cold water from the inlets so that the outlet water could be maintained at the desired temperature. The opening angle of the valve was controlled by the MCU based on the control algorithm.

Temperature sensor. The DS18B20 digital temperature sensor from Maxim Integrated [16] was used to measure the water temperature at the outlet of the three-way valve.

Handheld remote controller. The handheld remote controller was developed to set the desired water temperature at the outlet wirelessly. It consists of an Arduino Nano kit, an LCD, a keyboard, and an RF transceiver module, as shown in Fig. 2.

RF transceiver module. The HC-11 433-MHz RF transceiver module [17] was used for data transmission between the MCU and the handheld remote controller, as shown in Fig. 3.

Fig. 2 Block diagram of the handheld remote controller


Fig. 3 Display of set temperature and the current temperature on a the handheld remote controller and b the MCU of the developed system

2.2 Experimental Model Development

An experimental model was developed to determine the parameters for the temperature controller, as shown in Fig. 4. In the model, a hot and a cool water tank were used to emulate the hot water source from the solar heater and the conventional tap water source, respectively. At each of the two inlets and the outlet of the three-way

Fig. 4 Experimental setup of controlling water temperature (hot and cool water tanks; controlled three-way valve; flow sensors FI-01, FI-02, and FI-03; temperature sensors TI-01, TI-02, and TI-03 at the hot inlet, cool inlet, and outlet)


Fig. 5 Principle diagram of the system hardware for collecting data from sensors

valve, a DS18B20 digital temperature sensor [16] and a YF-S201 flow rate sensor [18] were installed. An Arduino Nano kit was used to collect the water flow rate and temperature at a particular inlet or the outlet of the three-way valve. Then, it sent the sensors’ data to a PC (Fig. 5) in which a MATLAB-Simulink program was built for simulating the control process of the three-way valve (Fig. 6).

2.3 Control Algorithm Development

In this study, PID and fuzzy controllers were developed, and their performance was experimentally analyzed.

PID control model. A closed-loop PID control model was developed for the water temperature controller of a passive solar heater. The input of the PID controller is the difference between the temperature of the mixed water (TI-03) and the desired temperature. The output is the opening angle of the three-way valve. The Ziegler–Nichols method based on the closed-loop frequency response [19] was applied to determine the parameters of the PID controller in this study. The parameters of the PID controller were then empirically tuned for implementation in the temperature controller.


Fig. 6 MATLAB-Simulink program for simulating the operation of the three-way valve

Fuzzy control model. In this study, a Mamdani fuzzy controller was also built for comparison purposes, using the MATLAB fuzzy logic toolbox. It has two input variables (e, de) and one output variable, the opening angle of the three-way control valve. The first input variable e is the difference between the temperature of the mixed water (TI-03) and the desired temperature; the second input variable de is the derivative of e. The input and output variables each have five membership functions of the default MATLAB function type trimf, which were automatically constructed within the defined ranges, as shown in Table 1. The fuzzy rules are summarized in Table 2, based on which the 3D surface plot representing the input–output relationship was also established, as shown in Fig. 7.

Table 1 Configurations and parameters of the fuzzy inference system

| Variable | Range       | Membership function names                                                      | Type  |
|----------|-------------|--------------------------------------------------------------------------------|-------|
| Input e  | [−65, 65]   | Very small, small, zero, large, very large                                     | trimf |
| Input de | [−0.8, 0.8] | Steep falling slope, falling slope, constant, rising slope, steep rising slope | trimf |
| Output   | [30, 140]   | Very small, small, average, large, very large                                  | trimf |


Table 2 Fuzzy rules for controlling the opening angle of the valve

| e \ de     | Steep falling slope | Falling slope | Constant | Rising slope | Steep rising slope |
|------------|---------------------|---------------|----------|--------------|--------------------|
| Very small | Very small          | Very small    | Small    | Average      | Average            |
| Small      | Very small          | Small         | Small    | Average      | Large              |
| Zero       | Small               | Small         | Average  | Large        | Large              |
| Large      | Small               | Average       | Large    | Large        | Very large         |
| Very large | Average             | Large         | Large    | Very large   | Very large         |

Fig. 7 3D surface plot of the fuzzy control rules

2.4 System Software Design

Figure 8 shows the flowchart of the control program for the temperature controller embedded in the MCU. According to this algorithm, the MCU receives commands from the handheld remote controller. If there is no signal from the handheld unit and the flow rate at the outlet is zero, the system is put into sleep mode to save energy. To ensure users' safety, the MCU will also activate a safety valve if the cool water tank runs out and only hot water is fed into the system. This function is performed based on evaluating the system's overshoot value: if this value exceeds 10% within 60 s, the system alerts the user by activating the buzzer in the handheld remote controller. In addition, a normally closed solenoid valve is installed at the hot water inlet to stop the hot water supply if the system loses power.


Fig. 8 Flowchart of the control program embedded in the MCU

3 Results and Discussion

3.1 Characterization of the Three-Way Valve

In this study, the authors proposed using a commercial three-way valve combined with an LD-27MG servomotor [20] to form the electrically controlled valve (Fig. 9). By controlling the servomotor, the MCU could change the opening angle of the three-way valve to adjust the mixing ratio of hot and cool water from the inlets for a desired water temperature at the outlet. Figure 10 depicts the experimental scheme used to investigate the linearity characteristics of the three-way valve. Three flow sensors, FI-01, FI-02, and FI-03, were used to measure the flow rates of the hot water, the cool water, and the mixed water, respectively. Experimental results showed that the opening angle of the valve was in the range of [30°, 140°]. Ideally, opening angles of 30° and 140° corresponded to 100% hot water flow and 100% cool water flow, respectively. The relationship between the flow rates at the three-way valve's inlets and outlet as the valve's opening angle was continuously increased and decreased is depicted in Figs. 11 and 12, respectively. A significant amount of hot water could be supplied


Fig. 9 Commercial three-way valve (a), the designed three-way valve (b)

Fig. 10 Experimental diagram for investigating characteristics of the three-way valve

with an opening angle smaller than 100°. Similarly, the cool water flow became noticeable at opening angles larger than 80° (Fig. 11). However, to reduce the cool water input to an insignificant level, the valve's opening angle should be adjusted below 70° (see Fig. 12). A steady water flow rate at the valve outlet could be maintained when sufficient hot and cool water flow was allowed. The nonlinearity of the three-way valve implies that the valve's initial opening angle should be close to 100°, so that only a minimal amount of hot water is supplied initially, allowing better control of the outlet water temperature with respect to safety and system response.


Fig. 11 Water flow rate when increasing the valve opening angle

Fig. 12 Water flow rate when decreasing the valve opening angle

3.2 Temperature Control with the Experimental Model

As discussed in Sect. 2.2, the gains of the PID controller were initially determined by the Ziegler–Nichols method based on the closed-loop frequency response of the experimental model. The PID gains were then manually tuned (KP = 0.15, KI = 0.1, KD = 0.001) and implemented in the water temperature controller of the experimental model. Based on the characterization results of the three-way valve, the initial opening angle of the valve was chosen at 90° rather than 140°, for safety reasons and to reduce the amount of cool water wasted during initial operation. Figure 13 shows the system response when the reference input t_ref was set at 45 °C and the initial opening angle of 90° was applied. It had a rise time of only about 5 s and an 11.5% overshoot. The control system could adapt well to changes in the reference input (Fig. 14), with a minor overshoot of about 2.3% (calculated for a reference input of 50 °C). The significant overshoot shown in Fig. 13 might be partly due to a


Fig. 13 System response at a set temperature of 45 °C and initial opening angle of 90°

Fig. 14 System response with changes in the reference input

significant temperature difference between the hot and cool water supplies and the nonlinearity of the three-way valve. The fuzzy controller was also implemented in the experimental model. As depicted in Fig. 15, the fuzzy controller's response had no overshoot, a steady-state error of about 5%, a rise time of about 29 s, and a settling time of about 59 s. Extensive adjustment of the valve's opening angle was observed even for minor regulation of the water temperature (see Fig. 15, time > 160 s). This behavior might have a negative effect on the lifetime of the valve.

Fig. 15 Response of the system with fuzzy controller (plot legend: temperature and valve opening angle)

3.3 Temperature Control with the Passive-Heater Model

The control system was implemented under actual conditions, as illustrated in Fig. 1. The hot water was supplied by the passive solar water heater, and the cool water was obtained directly from the local water supply. Based on the experimental results in Sect. 3.2, the gains of the PID controller were fine-tuned again as KP = 0.18, KI = 0.12, and KD = 0.0025. The responses to changes in the reference input are depicted in Fig. 16. The largest overshoot was about 7%, corresponding to the first reference temperature of 45 °C. When the reference input was changed to 50 °C, the system had an overshoot of only 3%, and when it was changed to 40 °C, only a slight overshoot could be observed. The system had a settling time of about 110 s and a steady-state error of about ±1 °C.

Fig. 16 Response of the PID-based temperature controller under real-world conditions


Fig. 17 Response of the fuzzy controller under real-world conditions

The fuzzy controller was also implemented for performance comparison. As shown in Fig. 17, the system response of the fuzzy controller was relatively slow to changes in the reference input. It might also require a longer settling time compared with the PID controller. Experimental results also showed that the rate of adjustment of the valve opening angle was minimal with the PID controller during its operation. Therefore, energy consumption could be reduced, and the lifetimes of the three-way valve and the servomotor could be preserved. In addition, because it is designed to operate from 12 V DC power sources, the system can be integrated with a solar power supply, allowing the device to operate safely and independently of the main power supply.

4 Conclusions

The design and implementation of a temperature controller for passive solar water heaters have been introduced in this paper. The proposed control system was based on a PID controller that adjusted the hot and cool water mixing ratio to maintain a desired mixed water temperature. This study introduced a low-cost solution using commercial three-way valves combined with servomotors to build electrically controlled valves. An optimal value of the initial valve opening angle was also determined for energy efficiency and response time. The experimental results showed that the PID temperature controller offered better performance than a fuzzy controller. The research results provide an advanced feature for passive solar water heaters that contributes to energy saving, comfort, and safety for users.


References

1. Gautam, A., Chamoli, S., Kumar, A., Singh, S.: A review on technical improvements, economic feasibility and world scenario of solar water heating system. Renew. Sustain. Energy Rev. (2017)
2. Statista: Solar thermal capacity globally 2018, https://www.statista.com/statistics/1064418/solar-thermal-energy-cumulative-capacity-globally/
3. Le, V., Long, T.: The attraction of renewable energy sources in the South-central and Southern regions (in Vietnamese), http://tapchicongthuong.vn/bai-viet/suc-hut-nguon-nang-luong-tai-tao-o-nam-trung-bo-va-mien-nam-63450.htm
4. Yin, Y.G.: Design and application of solar water heater intelligent control system. In: 2009 International Conference on Energy and Environment Technology, pp. 580–583 (2009)
5. Hasan, M.R., Arifin, K., Rahman, A., Azad, A.: Design, implementation and performance of a controller for uninterruptible solar hot water system. In: 2011 IEEE 18th International Conference on Industrial Engineering and Engineering Management (2011)
6. Odigwe, I.A., Ologun, O.O., Olatokun, O.A., Awelewa, A.A., Agbetuyi, A.F., Samuel, I.A.: A microcontroller-based active solar water heating system for domestic applications. Int. J. Renew. Energy Res. 3, 837–845 (2013)
7. Li, S.: The circuit design on thermostat control of solar water heater's water temperature. Appl. Mech. Mater. 513–517, 3560–3563 (2014)
8. Tasnin, W., Choudhury, P.K.: Design and development of an automatic solar water heater controller. In: International Conference on Energy, Power and Environment: Towards Sustainable Growth (2015)
9. Singh, P., Chouhan, H.: Design of a microcontroller based programmable PID controller for temperature control. J. Mater. Sci. Mech. Eng. 2, 17–19 (2015)
10. Koulouras, G., Alexandridis, A., Karabetsos, S., Grispos, S., Stoumpis, P.C.G., Koulouris, A., Nassiopoulos, A.: An embedded PID temperature control scheme with application in a medical microwave radiometer. J. Eng. Sci. Technol. Rev. 9, 56–60 (2016)
11. Rusia, P.: Digital implementation of PID controller using FPGA for temperature control. Int. J. Sci. Eng. Res. 8, 1774–1777 (2017)
12. Zhang, J., Li, H., Ma, K., Xue, L., Han, B., Dong, Y., Tan, Y., Gu, C.: Design of PID temperature control system based on STM32. In: IOP Conference Series: Materials Science and Engineering, p. 072020 (2018)
13. Mekali, H.V., Somashekhar, K.R., Gowda Baragur, A., Arfan, M., Surendra, S., Nayak, K.C.: Design and development of automatic temperature control system for solar water heater system. In: IEEE 7th International Conference on Power and Energy, pp. 19–22 (2018)
14. Madugu, J.S., Vasira, P.G.: Modeling and performance evaluation of P, PI, PD and PID temperature controller for water bath. ASRJETS 7, 186–200 (2018)
15. Arduino Official Store: Arduino Nano, https://store.arduino.cc/usa/arduino-nano
16. Maxim Integrated: DS18B20—Programmable Resolution
17. Mikroelectron: 433MHz RF Wireless UART Module (HC-11), https://mikroelectron.com/Product/433MHz-RF-Wireless-UART-Module-HC-11-HC11/
18. ElectroPeak: YF-S201 Hall Effect Water Flow Meter Sensor, https://electropeak.com/yf-s201-water-flow-sensor
19. Åström, K.J., Murray, R.M.: Feedback Systems: An Introduction for Scientists and Engineers. Princeton University Press (2021)
20. NaveStar Technology Ltd: Digital Servo Motor with 270 Control Angle LD-27MG Full Metal Gear, https://www.navestar.com/p/digital-servo-motor-with-270-control-angle-ld-27mg-fullmetal-gear/

MATLAB/Simulink-Based Simulation Model for Minimizing the Torque Ripple Reduction in Brushless DC Motor

Issa Etier, C. Arul Murugan, and Nithiyananthan Kannan

Abstract The large torque ripple of BLDC machines is a major drawback that limits their dynamic performance in motor speed drives. An average torque control algorithm and a fuzzy logic technique are used in this investigation. The focus of the proposed research is a torque ripple mitigation mechanism for a brushless DC motor based on detecting the back electromotive force (EMF). No sophisticated observer is required to extract the rotor position information needed to produce the excitation field. Hall effect sensors are not used in this system to determine the rotor position; instead, the voltage and current are taken as feedback. MATLAB/Simulink is used to simulate the proposed method. Simulation results show that the proposed approach is capable of reducing torque ripple and major lower-order harmonics while increasing device stability.

I. Etier
Department of Electrical Engineering, The Hashemite University, Zarqa, Jordan
e-mail: [email protected]
C. Arul Murugan
Department of Electronics and Telecommunication Engineering, Karpagam College of Engineering, Coimbatore, India
e-mail: [email protected]
N. Kannan (B)
Department of Electrical Engineering, Faculty of Engineering, King Abdulaziz University, Rabigh, Saudi Arabia
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_11

1 Introduction

Brushless direct current (BLDC) motors have gained importance over the past few decades due to their high efficiency and their immunity to electromagnetic interference (EMI) problems. Because current is commutated electronically rather than transferred through brushes to a moving armature, they suit a wide range of industrial applications. One of the most common disadvantages of BLDC drives is the torque irregularity caused by phase current commutation. Estimating the rotor position for this method is also demanding, particularly in safety-critical applications. During commutation, a mismatch between the incoming and outgoing phase currents results in torque ripple. Using the current gradient, ripple in the BLDC motor can also be reduced; the proposed system does so with a space vector modulation technique and an average torque control process. The sensor-less technique is used to overcome the disadvantages of sensored BLDC motors. The inverter output voltages and currents are used as inputs to a back-EMF observer stage, which calculates the approximate speed value. Fuzzy logic is applied to give the drive a quick response and to reduce the maximum error in the output. The fuzzy controller output is used to govern the three-phase inverter switching by the method of space vector pulse width modulation.

2 Related Works

Dividing the commutation interval of the three-phase currents into three parts has been suggested. Alternative methods of changing the dc-bus voltage were proposed using control algorithms that switch the three-phase inverter, and modified power converter topologies have also been suggested for changing the dc-bus voltage. A compensation technique that avoids the limitations of position sensors was proposed in [1], applying the control task to a BLDC motor drive that does not use a DC-link capacitor. The controller's simplicity allows it to be implemented on low-cost microcontroller platforms with limited resources; the main drawback is that it increases the motor drive's overall complexity. A novel DTC technique was proposed in [2, 3] for controlling BLDC motors, in which the switching frequencies of the IGBT power switches are restricted through feedback. This is accomplished by replacing the two-level torque controller with a three-level torque controller [4]. A torque ripple reduction method for BLDC drives based on IVSC was presented in [5, 6]. In both the two-phase conduction mode and the commutation mode, torque ripple is minimized by exploiting the energy saving potential of the back-EMF waveforms. A commutation control with a two-phase or three-phase switching mode is used during commutation; the three-phase switching mode reduces performance and increases switching losses. In [7–10], the phase currents are shaped to preferred profiles throughout the commutation cycles, a direction that continues to develop with fast power switches, the aim being to fit the drive directly on a brushless machine. Various other works have been reported on ripple reduction and application-oriented simulation processes.

3 The Benefits of the Proposed Model

• The proposed method eliminates torque ripple over a broader variety of applications.
• Instead of Hall effect sensors, the voltage and current are used as feedback, lowering the system's cost.
• The method of fuzzy logic is used. The benefits of the fuzzy logic controller are:
  a. More energy is saved by the motor at start-up or when it is working at a lower load.
  b. The controller's cost and complexity are minimized.
• It maintains a constant motor speed as the load changes.
• The system efficiency is improved.

4 Proposed System—BLDC Motor Using Fuzzy Logic Controller

A sensor-less drive is used to operate the BLDC motor. Position sensors are not used in a sensor-less BLDC motor drive; instead, position information is obtained indirectly. The benefits of a sensor-less drive include fewer feedback units, increased system stability, decreased system size, and reduced hardware cost. Sensor-less drives are also unaffected by mechanical or environmental constraints. In the proposed system, the voltage and current at the inverter output are used to estimate the back EMF and, from it, the speed. This method employs the space vector modulation technique. The rotor position is also estimated for torque production and is then used to calculate the torque ripple level. Figure 1 depicts the proposed system's block diagram.

Fig. 1 Block diagram of sensor-less BLDC motor with fuzzy logic controller


The energy flowing into the device in an ideal machine changes with the current excitation, following electromechanical energy conversion theory:

dWe = dWm + dWmech + dWloss

(1)

The energy contained in the magnetic field achieves dynamic equilibrium. As a result, the energy change in the air gap expresses the ripple factor. Sensors measure the dc-bus voltage and current, as previously stated. Over a single control cycle, integrating the input energy is independent of the ripple production. The benefits of a fuzzy logic controller are that it does not rely on a mathematical model, relying instead on linguistic rules derived from machine operation experience, and that it can handle nonlinear functions with more robust performance. Given the short switching period, it is fair to assume that the system's behavior remains constant over one switching cycle. In one switching cycle, the relationship between the input energy and the average torque Tav can be expressed as

dWe = (1/η) dWmech = (Δθ/η) Tav = k Tav

(2)

According to (2), the fluctuation of the DC input energy reflects the variation of the electric machine torque, so the average torque Tav may be computed from the energy flowing into the system in each control cycle. The effect of the coefficient k can be absorbed into the fuzzy logic control gains. In traditional motor control, the output of the fuzzy controller is usually taken as the electromagnetic torque command; here, the speed control gains are set up such that the controller's output is the energy command at the end of each control cycle.
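The per-cycle computation implied by (2) can be sketched numerically. This is a hedged illustration under the reconstruction Tav = η·ΔWe/θ (i.e., k = θ/η); all numeric values are invented for the example.

```python
def average_torque(delta_we, theta, eta):
    """Average torque over one switching cycle from the DC input energy
    delta_we (J), rotor angle traversed theta (rad), and efficiency eta,
    using T_av = delta_We / k with k = theta / eta."""
    k = theta / eta
    return delta_we / k

# Hypothetical cycle: 0.05 J drawn while the rotor sweeps 0.04 rad at 92% efficiency.
print(round(average_torque(0.05, 0.04, 0.92), 2))  # 1.15 (N*m)
```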

a. Pulse width modulation for three-phase VSI using space vector

Figure 2 depicts the topology of a three-leg voltage source inverter. Because the input lines must never be short-circuited and the output current must always be continuous, a voltage source inverter can assume only eight distinct switching topologies, shown in Fig. 3. Six of these eight topologies (the nonzero switching states) produce a nonzero output voltage, while the other two (the zero switching states) produce zero output voltage by short-circuiting the windings.

b. Voltage space vector

In space vector modulation (SVM) for a three-leg VSI, the three-phase quantities are represented as vectors in a two-dimensional (αβ) plane; this is summarized here for the sake of completeness. For the topology shown in Fig. 4, the line voltages Vab, Vbc, and Vca are

Vab = Vg   (3)

Vbc = 0   (4)

MATLAB/Simulink-Based Simulation …

Fig. 2 Topology of a three-leg voltage source inverter

Fig. 3 Switching-state topologies with voltage source inverter


Fig. 4 Topology of αβ plane

Fig. 5 αβ plane with nonzero voltage vectors

Vca = −Vg   (5)

In Fig. 4, this topology generates the effective voltage vector V1(pnn). Following the same logic, the positions of V1–V6 are depicted in Fig. 5, with the tips of these vectors marked by the dotted line. Within the hexagon, a sector is defined as the area enclosed by two adjacent vectors; accordingly, the sectors in Fig. 5 are numbered 1 to 6.

c. Modulation of space vector

In Fig. 6, the desired three-phase voltages at the inverter's output are expressed by an equivalent vector V rotating counterclockwise; its magnitude and angle determine the switching times through straightforward trigonometric calculation.
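The dwell-time calculation behind standard two-level SVM can be sketched as below. This is the textbook formulation rather than code from the paper, and the numeric operating point is an assumption.

```python
import math

def svm_dwell_times(v_ref, theta, v_dc, t_s):
    """Sector and dwell times for standard two-level SVM.
    v_ref: reference vector magnitude (V), theta: its angle (rad, 0..2*pi),
    v_dc: DC-link voltage (V), t_s: switching period (s)."""
    sector = int(theta // (math.pi / 3)) % 6 + 1    # which of the 6 sectors
    alpha = theta - (sector - 1) * math.pi / 3      # angle inside the sector
    m = math.sqrt(3) * v_ref / v_dc                 # modulation index
    t1 = t_s * m * math.sin(math.pi / 3 - alpha)    # time on the lagging active vector
    t2 = t_s * m * math.sin(alpha)                  # time on the leading active vector
    t0 = t_s - t1 - t2                              # remaining time on zero vectors
    return sector, t1, t2, t0

# Hypothetical operating point: 100 V reference at 0.2 rad, 400 V DC link, 100 us period.
sector, t1, t2, t0 = svm_dwell_times(100.0, 0.2, 400.0, 1e-4)
print(sector)  # 1: an angle of 0.2 rad lies in the first sector
```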

d. Fuzzy logic controller

The fuzzy logic system plays a crucial role in the control of nonlinear systems and in industrial applications where automation and control are critical. A fuzzy logic controller is constructed from fuzzy inference systems with input and output membership functions; once the fuzzy sets and rules are designed, the drive can be managed accordingly. Fuzzy logic (FL) is a type of multi-valued logic that deals with ambiguous or imprecise problems, making it possible to reach a firm conclusion from imprecise, vague, ambiguous, missing, or noisy input information.

Fig. 6 Output voltage vector representation in the αβ plane

The five stages of a fuzzy inference method are as follows:

1. Fuzzification of the input variables.
2. Application of the fuzzy operator in the rule's antecedent.
3. Implication from the antecedent to the consequent.
4. Aggregation of the consequents across the rules.
5. Defuzzification of the output variables.
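The five stages above can be illustrated with a tiny Mamdani controller. This sketch is not the paper's 49-rule controller: the 3×3 rule table and the triangular membership sets are assumptions, chosen only to show fuzzification, min-based inference, max aggregation, and centroid defuzzification.

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    left = (x - a) / (b - a + 1e-12)
    right = (c - x) / (c - b + 1e-12)
    return np.maximum(np.minimum(left, right), 0.0)

u = np.linspace(-1.0, 1.0, 201)          # output universe (normalized command)
sets = {"N": (-1.0, -1.0, 0.0), "Z": (-1.0, 0.0, 1.0), "P": (0.0, 1.0, 1.0)}

def mu(val, term):
    a, b, c = sets[term]
    return float(trimf(np.array([val]), a, b, c)[0])

# Assumed 3x3 rule table: (error term, delta-error term) -> output term.
rules = {("N", "N"): "N", ("N", "Z"): "N", ("N", "P"): "Z",
         ("Z", "N"): "N", ("Z", "Z"): "Z", ("Z", "P"): "P",
         ("P", "N"): "Z", ("P", "Z"): "P", ("P", "P"): "P"}

def mamdani(e, de):
    agg = np.zeros_like(u)
    for (te, tde), tout in rules.items():
        w = min(mu(e, te), mu(de, tde))                          # stages 1-2: fuzzify + AND
        a, b, c = sets[tout]
        agg = np.maximum(agg, np.minimum(w, trimf(u, a, b, c)))  # stages 3-4: imply + aggregate
    if agg.sum() == 0.0:
        return 0.0
    return float(np.sum(u * agg) / np.sum(agg))                  # stage 5: centroid

print(abs(mamdani(0.0, 0.0)) < 1e-6)   # True: zero error gives a near-zero command
print(mamdani(0.8, 0.2) > 0.0)         # True: positive error gives a positive command
```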

The Mamdani inference method with centroid defuzzification has been chosen for the proposed scheme; the centroid method is used because it is simple to implement and takes less time to compute. The fuzzy controller's rule base consists of 49 rules.

e. Parameters of BLDC motor used in simulation

In the suggested method, a sensor-less scheme is used for BLDC motor speed estimation, as shown in Fig. 7. The feedback is taken in the form of voltage and current, which is converted to back EMF; this back EMF is then used for speed estimation. The fuzzy logic controller processes the reference speed and the estimated speed as inputs, and its output signal is passed to the space vector sequence block. The key parameters of the proposed motor are shown in Table 1.

5 Results and Discussion

a. Three-phase inverter voltage waveform

Figure 8 depicts the voltage waveform of the three-phase inverter.

b. Waveform of the motor with back EMF

Figure 9 depicts the motor’s back EMF waveform.


Fig. 7 BLDC motor simulation with fuzzy logic controller

Table 1 BLDC motor key parameters

Power: 60 W
Stator phase resistance: 2.875 Ω
Rated speed: 1500 rpm
Number of poles: 4
Stator phase inductance: 8.5 mH
Rated voltage: 24 V
Flux linkage: 0.175 Wb

c. Motor speed characteristics

The motor speed characteristics are shown in Fig. 10.

d. Torque characteristics of the system

The torque characteristics with the fuzzy logic controller are shown in Fig. 11.


Fig. 8 Voltage waveform of the three-phase inverter

Fig. 9 Waveform of the motor with back EMF

Fig. 10 Motor speed characteristics


Fig. 11 Torque characteristics with fuzzy logic controller

e. Comparison of fuzzy logic controller and PI controller with torque ripple

The zoomed torque characteristic of the system with the PI controller over the time range 0.025–0.045 s is illustrated in Fig. 12, and the corresponding zoomed characteristic with the fuzzy logic controller is displayed in Fig. 13. With the fuzzy logic controller, the torque ripple is reduced to 1.2 Nm in magnitude.
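The ripple figures quoted here are peak-to-peak amplitudes read off the zoomed windows; quantifying them from simulation samples is straightforward. The sample traces below are invented solely to show the computation.

```python
def torque_ripple(samples):
    """Peak-to-peak torque ripple over a window of torque samples (N*m)."""
    return max(samples) - min(samples)

# Invented sample traces standing in for the 0.025-0.045 s windows.
pi_trace    = [2.9, 4.1, 2.8, 4.2, 3.0, 4.0]
fuzzy_trace = [3.3, 3.9, 3.2, 4.0, 3.4, 3.8]
print(round(torque_ripple(pi_trace), 2))     # 1.4
print(round(torque_ripple(fuzzy_trace), 2))  # 0.8: the smaller ripple case
```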

Fig. 12 Zoomed torque characteristic with PI controller


Fig. 13 Zoomed torque characteristic of the system with fuzzy logic controller

6 Conclusion

The average torque control method is used to minimize the BLDC motor's torque ripple. To reduce hardware cost, increase system stability, and shrink the system, a sensor-less BLDC motor is used. The proposed framework was simulated with the aid of MATLAB/Simulink, and the torque ripple reduction of the BLDC motor with a proportional-integral controller and with a fuzzy logic controller was evaluated through simulation studies. With the space vector modulation method, the fuzzy logic controller produces better results. The approximate speed was determined using the feedback current and voltage from the inverter's output, and the corresponding pulses were sent to the three-phase inverter through a fast-response fuzzy logic controller. As can be seen, the torque in the fuzzy logic controlled system exhibits fewer overshoots. The obtained results validate the proposed system's efficacy: it decreases torque ripple while also improving system stability and performance. In the future, a BLDC motor may be designed such that torque ripple is minimized over a wider speed range.

References

1. Samitha, H.K., et al.: A torque ripple compensation technique for a low cost brushless DC motor drive. IEEE Trans. Ind. Electron. 99 (2015)
2. Masmoudi, M., et al.: Direct torque control of brushless DC motor drives with improved reliability. IEEE Trans. Ind. Appl. 50(6) (2014)
3. Liu, Y., et al.: Commutation-torque-ripple minimization in direct-torque-controlled PM brushless DC drives. IEEE Trans. Ind. Appl. 43(4) (2007)
4. Shao, J.: An improved microcontroller-based sensorless BLDC motor drive for automotive applications. IEEE Trans. Ind. Appl. 42(5) (2006)


5. Xia, C., et al.: Torque ripple reduction in BLDC drives based on reference current optimization using integral variable structure control. IEEE Trans. Ind. Electron. 61(2) (2014)
6. Huang, X., et al.: A single sided matrix converter drive for a brushless DC motor in aerospace applications. IEEE Trans. Ind. Electron. 59(9), 3542–3552 (2012)
7. Singh, B., et al.: A voltage-controlled PFC Cuk converter-based PMBLDCM drive for air-conditioners. IEEE Trans. Ind. Appl. 48(2), 832–838 (2012)
8. VimalRaj, S., Suresh Kumar, G., Thomas, S., Kannan, N.: MATLAB/SIMULINK based simulations on state of charge on battery for electrical vehicles. J. Green Eng. 9(2), 255–269 (2019)
9. Sureshkumaar, G., Kannan, N., Thomas, S.: MATLAB/SIMULINK based simulations of KY converter for PV panels powered LED lighting system. Int. J. Power Electron. Drive Syst. 10(4), 1885–1893 (2019)
10. Etier, I., Murugan, C.A., Kannan, N., Venkatesan, G.: Measurement of secure power meter with smart IOT applications. J. Green Eng. 10(12), 12961–12972 (2020)

Improving Accuracy in Facial Detection Using Viola-Jones Algorithm AdaBoost Training Method

Kodeti Haritha Rani and Midhun Chakkaravarthy

Abstract Facial detection and recognition is a major problem in image processing. Face identity is important for identification, search, and authentication purposes. Facial detection is trivial for humans but very difficult for a computational device: the human face has many complicated dimensions of variation, making it hard for a machine to identify. The proposed methodology distinguishes human faces in the given multiple images using the Viola-Jones algorithm framework, and improves accuracy with respect to the false positive and true negative information in the data by using AdaBoost training methods and cascading classification.

1 Introduction

Face recognition is an interesting area of research and remains a major, long-standing challenge. Face detection plays a crucial role in automatic face recognition systems: it locates the faces present in an image by differentiating them from all other features and objects. Face recognition plays a major role in applications such as security systems, railway stations, airports, banks, credit and debit card verification, passport verification, and identifying criminals. The main goal of face detection is to determine whether a given image contains human faces. The proposed system identifies only human faces in the given images, irrespective of age, gender, etc.; any animal faces present are ignored. For facial detection, the proposed system's Viola-Jones algorithm framework works on Haar feature selection, and AdaBoost and cascading training methods are used to increase accuracy. The main goal of the proposed methodology is to detect only human faces from the given input images with high accuracy and better efficiency compared to other existing methods.

K. H. Rani (B) · M. Chakkaravarthy, Department of Computer Science and Engineering, Lincoln University College, Kota Bharu, Malaysia
e-mail: [email protected]
M. Chakkaravarthy e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_12

2 Related Work

2.1 Face Detection

Human faces are detected in the captured images using the Viola-Jones algorithm, a mechanism mainly used for locating and detecting objects [1]. A defining property of this algorithm is that training is slow but detection is fast. It uses Haar basis feature filters to measure intensities, so it avoids expensive multiplications [2].

2.2 Accuracy Measurement

Haar feature detection and feature extraction detect the faces with the highest likelihood, and further accuracy is added by the AdaBoost method: accuracy is measured using the AdaBoost training algorithm, which improves face detection accuracy in the Viola-Jones framework. The numbers of correct and incorrect predictions are summarized as counts in a confusion matrix, where the diagonal elements indicate the number of faces correctly classified in each class and the sum of all elements is the total number of images used for testing [3].
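The accuracy computation described here can be sketched directly from such a confusion matrix; the matrix values below are invented for the example.

```python
def accuracy_from_confusion(cm):
    """Accuracy = sum of the diagonal (correct) / sum of all entries (total tested)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

# Hypothetical 2-class (face / non-face) confusion matrix.
cm = [[90, 10],   # 90 faces classified as faces, 10 missed
      [5, 95]]    # 5 non-faces mistaken for faces, 95 rejected
print(accuracy_from_confusion(cm))  # (90 + 95) / 200 = 0.925
```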

2.3 Viola-Jones Algorithm

In 2001, Paul Viola and Michael Jones proposed the Viola-Jones algorithm for detecting various objects. It can be trained to detect multiple object classes, but it is especially associated with the problem of face detection. The Viola-Jones algorithm was the first methodology to give accurate object detection results in real time; due to its high detection accuracy and processing speed, a minimum of two frames per second can be handled while differentiating faces from non-faces [4]. Given an image (the algorithm works only with grayscale images), the program examines numerous smaller subregions and attempts to discover a face by looking for explicit attributes in each subregion [5]. Because an image can have numerous faces


of varying sizes, it must verify many possible positions and scales. In this algorithm, Viola and Jones used Haar-like properties to detect faces [6].

2.4 Advantages of Viola-Jones Algorithm

1. Fast: for real-time applications, a minimum of two frames per second must be handled.
2. Robust: accurate detection results, i.e., a high true-positive rate and a very low false-positive rate.
3. Facial detection only: it aims only to differentiate faces from non-faces.

3 Proposed System

3.1 Viola-Jones Algorithm

In 2001, two computer vision researchers, Paul Viola and Michael Jones, presented the Viola-Jones algorithm. Although it is an older methodology, Viola-Jones remains highly dominant, and its use in real-time facial detection has proven quite effective [7]. The algorithm is very slow to train but detects faces in real time with remarkable speed.

3.2 Procedure of Viola-Jones Algorithm

The Viola-Jones algorithm has two phases:

1. Training
2. Detection

The face detection algorithm was implemented for frontal faces, so it identifies faces best when they are viewed frontally rather than from the side [8]. Before detection, the facial image is converted to grayscale, which makes it easier to detect the object accurately while using less data [9]. The algorithm first detects the face in the grayscale image and then finds the corresponding location in the colored image. It draws a box, examines it for a frontal face, and then moves the box through every tile in the picture in fixed increments; the features detected within these boxes collectively help determine the locations of the faces in the given image [4].
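The grayscale conversion and the exhaustive box scanning described above can be sketched as follows. This is a simplified single-scale sliding window; the real detector also varies the window scale, and the window size and step here are assumptions.

```python
import numpy as np

def to_grayscale(rgb):
    """Standard luma conversion so detection can work on a single channel."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def sliding_windows(gray, size=24, step=12):
    """Yield every size x size sub-window the detector would examine."""
    h, w = gray.shape
    for r in range(0, h - size + 1, step):
        for c in range(0, w - size + 1, step):
            yield r, c, gray[r:r + size, c:c + size]

img = np.random.rand(48, 48, 3)            # stand-in for a captured frame
gray = to_grayscale(img)
windows = list(sliding_windows(gray))
print(len(windows))  # 3 x 3 = 9 windows for a 48x48 frame with step 12
```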


Fig. 1 Four important stages of Viola-Jones algorithm [11]

3.3 Stages of Viola-Jones Algorithm

Viola-Jones has four stages [10, 11]:

1. Selection of Haar features
2. Creating an integral image
3. AdaBoost training
4. Cascading classifiers (Fig. 1)

3.3.1 Selection of Haar Features

The Haar wavelet is a sequence of functions analogous to Fourier analysis, and the features are derived from it. The Haar cascade classifier applies the Haar wavelet technique to analyze pixels in the image using square-shaped functions, and it uses the integral image representation to compute the detected Haar-like features efficiently [12]. Some common properties found in most facial images are as follows:

1. The eye region is darker than the upper cheeks.
2. The nose bridge region is brighter than the eyes.
3. The positions of facial parts such as the eyes, nose, mouth, and forehead are relatively consistent.

Haar features act just like the kernels in a convolutional neural network, except that the values of these kernels are fixed by hand through the Haar feature design rather than learned in training (Fig. 2).

Fig. 2 Haar-like features [8, 10, 12, 14]

The researchers of the Viola-Jones algorithm identified the following three types of Haar-like features:

1. Edge features
2. Line features
3. Four-sided (four-rectangle) features

These Haar feature measures assist the computational device in identifying the image. Two important features for facial detection, the eyebrows and the nose, are described by horizontal and vertical edge features, respectively [11, 12]. Moreover, each feature has its own value when the image is observed, which makes it easy to interpret. When a Haar-like measure is overlaid on a grid image, each square in the grid represents a pixel; many pixels together form a large grid for a given feature [13]. The darkness of each region is then represented by numbers, and the value of a particular feature of the face is obtained by adding the pixel values under one half of the feature and subtracting them from the sum under the other half [14].

3.3.2 Creating an Integral Image

The integral image makes it possible to calculate a feature's value with only a few operations, however large its rectangles, so the exhaustive calculations can be performed almost immediately and we can quickly check whether a region fits the criteria for a feature. Each entry of the integral image is the sum of all pixels above and to the left of that position in the original image. To evaluate a feature, take the four corners of the feature's rectangle in the integral image, add the two diagonal corner values, and subtract the other two sums. The integral image method makes the computation considerably less intensive and saves a great deal of time in any face detection framework.

Training

In training, we train the computer vision model to recognize the features: we supply labeled data, and the model subsequently adapts from it to make predictions [15]. The algorithm ultimately sets a minimum threshold to determine whether something can be classified as a feature or not. Viola-Jones rescales the input to 24 × 24 pixels and checks for the candidate features within that window. A large amount of facial image data must be streamed to the algorithm so that it can be trained well; Viola and Jones fed their algorithm 4960 face images. Non-facial images must also be supplied so that the algorithm can differentiate between the two subsets of facial and non-facial outputs; Viola-Jones used 9544 non-facial images. Some of these pictures may look like components of a face, but the algorithm learns which features are likely to belong to a face and which are not.
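The four-corner rectangle-sum trick can be made concrete. This is a standard summed-area-table sketch, not the authors' code; the 4×4 test image and the two-rectangle feature layout are assumptions for illustration.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[r, c] = sum of img[:r, :c]."""
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r, c, h, w):
    """Sum of the h x w rectangle with top-left corner (r, c), in four lookups."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def haar_edge_feature(ii, r, c, h, w):
    """Two-rectangle (edge) feature: sum of the left half minus the right half."""
    return rect_sum(ii, r, c, h, w // 2) - rect_sum(ii, r, c + w // 2, h, w // 2)

img = np.arange(16, dtype=float).reshape(4, 4)   # tiny stand-in "image"
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))          # 10.0 = 0 + 1 + 4 + 5
print(haar_edge_feature(ii, 0, 0, 4, 4)) # -16.0: the right half is brighter here
```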

3.3.3 Adaptive Boosting Training Method (AdaBoost)

The algorithm learns from the images we supply and can identify the false positives and true negatives in the data, allowing it to become more precise [16]. We would obtain the most suitable and exact model once all potential positions and combinations of the features have been examined, but training can be very extensive because of the many possibilities and combinations that must be checked for every edge or image. Consider an equation for our Haar features that describes the success rate, with f1, f2, and f3 as the features and w1, w2, and w3 as the corresponding weights of the features:

F(x) = w1 f1(x) + w2 f2(x) + w3 f3(x)

Each feature fi is regarded as a weak classifier, while F(x) is known as a strong classifier: even if no single weak classifier gives the proper output on its own, a combination of two or more weak classifiers yields a strong classifier [7, 8]. As the amount of face image data increases, the strong classifier gets stronger and stronger; such a combination is called an ensemble. With the image features at hand, the AdaBoost algorithm plays a vital role in identifying the most significant and best features when queried. For example, say you have 20 pictures comprising 10 facial images and 10 non-facial images, and you need to identify the single best feature and use it to make a face prediction. Suppose the model classifies 7 of the 10 faces correctly (true positives) and 3 of the 10 non-faces correctly (true negatives): the remaining errors are 3 false negatives, the actual faces whose features this prediction did not find, and 7 false positives, the non-facial images in which it detected the feature.
In the subsequent stage, adaptive boosting selects a feature that best complements the current strongest feature; there is no need to search for the overall second-best feature, because what matters is the best supplement to the current combination. To do this, it increases the importance of the images that were classified wrongly and identifies the next feature that fits these images, in effect increasing the weight of these images in the overall calculation. In this way, each newly added feature ends up with a weight reflecting its contribution. Once the computation is tuned and all true and false results can be determined accurately, we proceed to the following stage: the cascading training method.
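The reweighting loop described above can be sketched with threshold "stumps" as the weak classifiers. This toy, one-feature version only shows the mechanics (weighted error, weak-learner weight, boosting of misclassified samples); the data are invented and this is not the Viola-Jones training code.

```python
import numpy as np

def adaboost_stumps(X, y, rounds=5):
    """Minimal AdaBoost with threshold stumps on a single feature column.
    X: (n,) feature values, y: labels in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)                   # sample weights, start uniform
    model = []
    for _ in range(rounds):
        best = None
        for t in np.unique(X):                # candidate thresholds
            for s in (1, -1):                 # stump polarity
                pred = s * np.where(X > t, 1, -1)
                err = w[pred != y].sum()      # weighted error of this stump
                if best is None or err < best[0]:
                    best = (err, t, s, pred)
        err, t, s, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err) # weak-learner weight
        w *= np.exp(-alpha * y * pred)        # boost the misclassified samples
        w /= w.sum()
        model.append((alpha, t, s))
    return model

def predict(model, X):
    F = sum(a * s * np.where(X > t, 1, -1) for a, t, s in model)
    return np.sign(F)                          # the strong classifier F(x)

X = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
y = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost_stumps(X, y)
print((predict(model, X) == y).all())  # True on this separable toy set
```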

F(x) = w1 f1(x) + w2 f2(x) + w3 f3(x) + …

3.3.4 Cascading


Fig. 3 Average face. https://github.com/Lovepreet-Singh-LPSK/Face-Recognition

Cascading is another training method that improves both the accuracy and the speed of the model. We start by taking a part of the image and checking for the best feature in that part. If that feature is present, we look for the second feature in the same part of the image; if the feature is not present in the sub-window, we do not examine that part further and simply reject it. We continue in the same way through the full sequence of featured parts, ignoring the non-featured parts. If every assessment had to repeat every feature on every sub-window, it would take a great deal of time; the cascading training method speeds this process up, and the machine produces results much more quickly.
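The early-rejection idea can be sketched generically; the stage score functions and thresholds below are placeholders, not trained Haar stages.

```python
def cascade_detect(window, stages):
    """Run a window through a cascade: each stage is (score_fn, threshold).
    Reject as soon as one stage's score falls below its threshold."""
    for score, threshold in stages:
        if score(window) < threshold:
            return False        # early rejection: most non-faces exit here cheaply
    return True                 # survived all stages -> candidate face

# Hypothetical toy stages, cheapest first.
stages = [
    (lambda w: sum(w) / len(w), 0.2),   # cheap mean-intensity test
    (lambda w: max(w) - min(w), 0.5),   # costlier contrast test
]
print(cascade_detect([0.4, 0.9, 0.1], stages))  # True: passes both stages
print(cascade_detect([0.0, 0.1, 0.0], stages))  # False: rejected by stage 1
```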

4 Experimental Result

See Figs. 3, 4, 5 and 6.

5 Conclusion

A face detection method based on the Viola-Jones algorithm is presented in this paper. Some augmentation techniques are applied before the training and testing process. The network achieves an accuracy of 98.6% compared with other face recognition techniques; it was also tested on real-time images and shows excellent performance and strong robustness. The algorithm works more effectively than the existing algorithms, and the test result on the real-time image


Fig. 4 Eigen faces. https://github.com/Lovepreet-Singh-LPSK/Face-Recognition

dataset also proves the accuracy of the proposed system. This paper uses a webcam to capture images of the person and create the dataset, and the face is identified from it using the Viola-Jones algorithm. To increase accuracy, the proposed system uses AdaBoost and cascading methods; using AdaBoost in the face detection process increases the accuracy.


Fig. 5 Sample output. https://github.com/Lovepreet-Singh-LPSK/Face-Recognition

Fig. 6 Accuracy versus value of input. https://github.com/Lovepreet-Singh-LPSK/Face-Recognition

References

1. Boda, R., Priyadarsini, M.J.P.: Face detection and tracking using KLT and Viola-Jones. ARPN J. Eng. Appl. Sci. 11(23), 13472–13476 (2016)
2. Deshpande, N.T., Ravishankar, S.: Face detection and recognition using Viola-Jones algorithm and fusion of LDA and ANN. IOSR J. Comput. Eng. 18(6), 1–6 (2016). https://doi.org/10.9790/0661-1806020106
3. Devadas, A., James, S.P.: Face detection and recognition through Viola Jones algorithm and CNN, pp. 5406–5410 (2020)
4. Balasuriya, L.S.: Frontal view human face detection and recognition (2000)
5. Boruah, D., Sarma, K.K., Talukdar, A.K.: Different face regions detection based facial expression recognition. In: 2nd International Conference on Signal Processing and Integrated Networks, SPIN 2015, pp. 459–464 (2015). https://doi.org/10.1109/SPIN.2015.7095280
6. Nayak, J.S., Indiramma, M.: Efficient face recognition with compensation for aging variations. In: 4th International Conference on Advanced Computing, ICoAC 2012 (2012). https://doi.org/10.1109/ICoAC.2012.6416839
7. Da'San, M., Alqudah, A., Debeir, O.: Face detection using Viola and Jones method and neural networks. Int. Conf. Inf. Commun. Technol. Res. ICTRC 1, 40–43 (2015). https://doi.org/10.1109/ICTRC.2015.7156416
8. Kaur, J., Sharma, A., Cse, A.: Performance analysis of face detection by using Viola-Jones algorithm. Int. J. Comput. Intell. Res. 13(5), 707–717 (2017). http://www.ripublication.com
9. Promoteur, A., Droogenbroeck, V.: Facial recognition using deep neural networks. Master thesis (2018). http://matheo.uliege.be
10. Hoang, M.P., Le, D., De Souza-Daw, T., Nguyen, T.D., Thang, M.H.: Extraction of human facial features based on Haar feature with Adaboost and image recognition techniques. In: 2012 4th International Conference on Communications and Electronics, ICCE 2012, pp. 302–305 (2012). https://doi.org/10.1109/CCE.2012.6315916
11. Tikoo, S., Malik, N.: Detection of face using Viola Jones and recognition using back propagation neural network (2017)


12. Winarno, E., Hadikurniawati, W., Nirwanto, A.A., Abdullah, D.: Multi-view faces detection using Viola-Jones method. J. Phys. Conf. Ser. 1114(1) (2018). https://doi.org/10.1088/1742-6596/1114/1/012068
13. Pandey, S., Sharma, S.: An optimistic approach for implementing Viola Jones face detection algorithm in database system and in real time. Int. J. Eng. Res. V4(07), 1118–1122 (2015). https://doi.org/10.17577/ijertv4is070758
14. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2001)
15. Mahapatra, R.K.: A study on face detection in images. Thesis, National Institute of Technology Rourkela, pp. 1–31 (n.d.)
16. Ismael, K.D., Irina, S.: Face recognition using Viola-Jones depending on Python. Indonesian J. Electr. Eng. Comput. Sci. 20(3), 1513–1521 (2020). https://doi.org/10.11591/ijeecs.v20.i3.pp1513-1521
17. Cen, K.: Study of Viola-Jones real time face detector. Ai2 (Allen Inst. Artif. Intell.) 9(10), 1–94 (2016)
18. Human face detection and recognition. Thesis (2011)

Comparative Analysis Model for Lambda Iteration, Genetic Algorithm, and Particle Swarm Optimization-Based Economic Load Dispatch for Saudi Arabia Network System

Abdulrhman Alafif, Youssef Mobarak, and Hussain Bassi

Abstract The main objective of this work is to implement genetic algorithm (GA)-based economic load dispatch (ELD) for a Saudi Arabia system network, with a comparative analysis. Different mathematical and probabilistic methods are used to solve ELD problems. This paper presents GA and particle swarm optimization (PSO) for solving the ELD problem and compares them with a conventional technique, namely lambda iteration. The fuel cost is compared for the Saudi Arabia system network. All results were obtained in the MATLAB programming environment and show that GA solves the ELD problem better than the other conventional methods for this power system network.

1 Introduction

The well-organized and optimal economic operation of power systems has always been a critical concern in the electrical power field. In recent times, it has become vital for companies to run the power system network at the lowest cost while satisfying customer demand at all times and remaining profitable. With the limited availability of generating units and the large rise in power demand, fuel prices, and resource constraints, the committed units have to serve the anticipated load demand, despite deviations in fuel price and uncertainty in the load demand forecast at regular intervals, in the most optimal way [1–6]. ELD is a method to assign output to the generating units according to the load demand so as to minimize operating cost. GA is one of the traditional heuristic methods, and it can

A. Alafif · Y. Mobarak (B) · H. Bassi, Department of Electrical Engineering, Faculty of Engineering, King Abdulaziz University, Rabigh, Saudi Arabia
e-mail: [email protected]
A. Alafif e-mail: [email protected]
H. Bassi e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_13


succeed with high probability at calculating the optimal values of various functions without requiring derivatives [7–10]. PSO was developed by the American social psychologist James Kennedy and the engineer Russell Eberhart. It is an optimization procedure inspired by the swarm intelligence of birds and fish, and even by human behavior; its agents, referred to as particles, swarm across the search space starting from many initial random guesses [11–14]. PSO is an algorithm capable of optimizing nonlinear, multidimensional problems, and it generally reaches good solutions efficiently while requiring minimal parameterization [15–20]. The swarm communicates the current best and shares the global-level best so as to focus on the best solutions. Since its development, around twenty different variants of PSO have appeared and have been applied to most areas of hard optimization problems [21–26]. It is a population-based stochastic procedure modeled on social behaviors observed in flocking birds. Each particle, which represents a solution, flies through the search space with a velocity that is dynamically adjusted according to its own and its companions' past behaviors, so the particles tend to fly toward better search areas over the course of the search process [27–31].

2 Saudi Arabia Network Studied System

The Saudi Arabia electrical network studied system is divided into four areas: east, center, west, and south. Each of the four areas comprises its electrical network, its generation stations, their placement, and their types; Saudi Arabia's electrical network includes gas, steam, combined-cycle, and solar power stations. The system selected is a 28-bus system developed at the Faculty of Engineering at Rabigh. The total power produced during 2010 was approximately 240,064 GWh, a 10.4% increase compared with 2009, of which the generation activity of the Saudi Electricity Company produced 78%. The total delivered energy during 2010 reached 212,263 GWh, a 9.7% increase over 2009. Electricity service was delivered to 297,000 new customers, growing the customer base to 5,997,553 customers, a 5% increase over 2009. The total electric power generation during 2010 reached 50,000 MW. From these data, we simulated the Saudi Arabia network with 13 generators: three in the east area and three in the center area, with tie-line power connected, plus five in the west area and two in the south area. The first constraint is the power balance between generation, demand, and losses:

Pg − Pd − Pl = 0

The second constraint is the operating limit of each generator, so each generator produces a power between its minimum and maximum limits [32]:

Pmin ≤ Pg ≤ Pmax

For a group of generators within a power plant whose turbines are supplied from a single boiler, the extremes of the boiler operating conditions determine the limits:

Pmin,k ≤ Pk ≤ Pmax,k,  k = 1, 2, 3, …, number of generators.
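The two dispatch constraints just stated can be checked programmatically; the three-unit snapshot below is illustrative only and is not data from the studied network.

```python
def dispatch_feasible(p_g, p_d, p_loss, limits, tol=1e-6):
    """Check the ELD constraints for a candidate dispatch:
    power balance sum(Pg) - Pd - Ploss = 0 and Pmin <= Pg <= Pmax per unit."""
    balance_ok = abs(sum(p_g) - p_d - p_loss) < tol
    limits_ok = all(lo <= p <= hi for p, (lo, hi) in zip(p_g, limits))
    return balance_ok and limits_ok

# Hypothetical 3-unit snapshot (MW); all values are assumptions.
print(dispatch_feasible([120.0, 200.0, 90.0], 400.0, 10.0,
                        [(50, 300), (50, 250), (50, 150)]))  # True: 410 = 400 + 10
```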

Comparative Analysis Model for Lambda Iteration …


In the Saudi Arabia network, each generating unit has its own single boiler, so this boiler-group constraint is not active here. For simplicity, we neglected the spare (spinning) capacity constraint, which is required to account for load-forecasting error and sudden changes in load demand:

Σ Pg ≥ Pl + Psp
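The balance and limit constraints above can be sketched as a simple feasibility check; the function name and the three-unit numbers below are illustrative placeholders, not taken from the Saudi network data.

```python
def check_dispatch(pg, pmin, pmax, pd, pl, tol=1e-6):
    """Feasibility check for the ELD constraints described above:
    generator limits Pmin <= Pg <= Pmax and the power balance
    sum(Pg) - Pd - Pl = 0 (within a small tolerance)."""
    limits_ok = all(lo <= p <= hi for p, lo, hi in zip(pg, pmin, pmax))
    balance_ok = abs(sum(pg) - pd - pl) <= tol
    return limits_ok and balance_ok

# Hypothetical three-unit dispatch: 450 MW generated, 440 MW demand + 10 MW losses
feasible = check_dispatch([100, 150, 200], [50, 50, 50], [300, 300, 300], 440, 10)
```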

The Saudi Arabia system is modelled as four interconnected areas; a one-line diagram of the system is given in Fig. 1. The system is basically composed of thirteen generating units, twenty-eight buses, and fifteen loads [33].

Fig. 1 Saudi Arabia network interconnected studied system


A. Alafif et al.

3 Lambda Iteration

3.1 Problem Formulation of Lambda Iteration for ELD

The ELD problem is now formulated mathematically for the lambda-iteration method, considering the scheduling of real power generating units. The variation of the fuel cost F of an individual generator with its real power output Pg is given by a smooth quadratic fuel-cost function, and the total cost for the power plant is the sum over all generators (units):

F(Pg) = Σ_{i=1}^{Ng} Fi(Pgi)    (1)

The fuel-cost curve of each unit as a function of its active power is

F(Pg) = αi + βi Pg + γi Pg²  [$/h]    (2)

where αi, βi, and γi are cost constants. From (1) and (2):

F(Pg) = Σ_{i=1}^{Ng} (αi + βi Pgi + γi Pgi²)    (3)

Neglecting losses, the fuel cost is minimized subject to the power balance equation. In the simplest ELD problem, all units are taken to be on one bus, all interconnected with the load:

Pd = Pg, or Pd − Pg = 0    (4)

where Pd is the sum of all load demand in the system (MW). From (4):

Pg = Pd = (λ − β) / (2γ), or λ = β + 2γ Pd    (5)

Each generating unit also has maximum and minimum generating limits, which serve as a further form of constraint.
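As a concrete illustration of Eqs. (1)–(5), the lambda-iteration idea can be sketched as a bisection on the incremental cost λ, with each unit's output clipped to its limits; the cost coefficients and limits below are hypothetical, not the Saudi network data.

```python
def lambda_iteration(beta, gamma, pmin, pmax, pd, tol=1e-6, iters=200):
    """Lambda-iteration ELD sketch (losses neglected): bisect on the
    incremental cost lambda until total generation meets demand Pd.
    beta, gamma are the cost coefficients of Eq. (2)."""
    lo, hi = 0.0, 1000.0
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        # Per-unit output from Eq. (5), clipped to the generator limits
        p = [min(max((lam - b) / (2 * g), lo_p), hi_p)
             for b, g, lo_p, hi_p in zip(beta, gamma, pmin, pmax)]
        if sum(p) < pd:
            lo = lam   # total generation too low -> raise lambda
        else:
            hi = lam   # total generation too high -> lower lambda
        if abs(sum(p) - pd) < tol:
            break
    return lam, p

# Hypothetical three-unit system serving 850 MW of demand
lam, p = lambda_iteration([7.0, 7.85, 7.97], [0.004, 0.003, 0.005],
                          [100, 100, 50], [600, 400, 200], 850)
```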


3.2 Neglecting Losses and Including Generation Limits

In an economically operated system, several generators may run at their maximum MW limits; base-load generators in particular stay at their maximum limits except during off-peak periods. With generation restricted between maximum and minimum limits, the output of each unit is bounded by [34]:

Pi,min ≤ Pi ≤ Pi,max    (6)

∂Fi/∂Pi = λ for Pi,min < Pi < Pi,max    (7)

∂Fi/∂Pi ≤ λ for Pi = Pi,max    (8)

∂Fi/∂Pi ≥ λ for Pi = Pi,min    (9)

The optimality condition when losses are neglected and generation limits are included reduces to

∂Fi/∂Pi = βi + 2γi Pi    (10)

λ = βi + 2γi Pi    (11)

3.3 Including Losses and Generation Limits

When energy is transferred over long transmission distances, the line losses are one of the major issues with a strong impact on dispatch. Figure 2 shows a flowchart of the lambda-iteration ELD. The transmission losses are given by the B-coefficient formula

Ploss = Σ_{i=1}^{m} Σ_{j=1}^{m} Pi Bij Pj + Σ_{i=1}^{m} B0i Pi + B00    (12)

The power balance equation constraint becomes

Pd = Σ_{i=1}^{m} Pi − Ploss    (13)
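Eq. (12) can be evaluated directly as a quadratic form; the sketch below shows this, with a two-unit B matrix chosen purely for illustration (not network data).

```python
import numpy as np

def transmission_loss(p, B, B0, B00):
    """Kron/B-coefficient loss formula of Eq. (12):
    Ploss = p^T B p + B0 . p + B00."""
    p = np.asarray(p, dtype=float)
    return float(p @ B @ p + B0 @ p + B00)

# Illustrative two-unit loss coefficients (placeholders)
B = np.array([[0.0003, 0.00001],
              [0.00001, 0.0002]])
B0 = np.array([0.0001, 0.0002])
ploss = transmission_loss([100.0, 200.0], B, B0, 0.05)  # MW
```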


Fig. 2 Flowchart of lambda iteration-based ELD

The optimality condition when losses and generation limits are included reduces to the coordination equation

∂Fi/∂Pi + λ (∂Ploss/∂Pi) = λ    (14)

∂Ploss/∂Pi = 2 Σ_{j=1}^{m} Bij Pj + B0i    (15)

Substituting (10) and (15) into (14), we have

βi + 2γi Pi + λ (2 Σ_{j=1}^{m} Bij Pj + B0i) = λ    (16)


From (16), the power generated by unit i can be found as

Pi = [λ(1 − B0i) − βi − 2λ Σ_{j=1, j≠i}^{m} Bij Pj] / [2(γi + λ Bii)]    (17)

Neglecting the B0i and off-diagonal Bij terms, (17) can be simplified to

Pi = (λ − βi) / (2(γi + λ Bii))    (18)

The net power mismatch is calculated as

Pnet = Σ_{i=1}^{m} Pi − Pd − Ploss    (19)

4 Genetic Algorithm

GA is a heuristic optimization procedure inspired by natural evolution. In a GA, an initial random population is created, and the features of each candidate solution are encoded in a chromosome. Each individual is evaluated and ranked according to how well it satisfies the objective function, and it is then assigned a probability of reproduction. The fittest individuals are much more likely to be reproduced (selection), and therefore to pass on their features. The combination (crossover) of parental genes yields a successive generation, and mutation may alter the chromosomes of some individuals. It is expected that some members of the new generation inherit the best features of their parents and may represent a better solution to the ELD problem [35]. The main GA components are: the population, a set of chromosomes, which may be program elements or any data structure; reproduction, in which parents are selected at random with selection chances weighted in proportion to chromosome fitness; variation operators, triggered stochastically, of which the main kind is crossover; and evaluation, which interprets a chromosome and assigns it a fitness value, the only link between a standard GA and the problem it is solving. In a generational GA, the whole population is replaced at every iteration, whereas in a steady-state GA only a few members are replaced in each generation. The advantages of GA are that the concept is easy to understand, it supports multi-objective optimization, it improves as time progresses, already obtained solutions can be reused with little effort, and its modular construction makes the model flexible. GA can be used as a substitute when analytic methods are too slow or too complex, and it can also be used as an investigative tool to explore new methods. Figure 3 shows the flowchart of GA-based ELD [36].
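The selection-crossover-mutation loop described above can be sketched for the loss-free ELD problem as follows; the penalty weight, population size, and operator choices are illustrative, not the authors' settings, and the three-unit data are hypothetical.

```python
import random

def ga_dispatch(alpha, beta, gamma, pmin, pmax, pd,
                pop_size=60, gens=200, seed=1):
    """Minimal GA sketch for loss-free ELD: minimize total quadratic
    fuel cost with a penalty on the demand-balance violation."""
    rng = random.Random(seed)
    n = len(beta)

    def cost(p):
        fuel = sum(a + b * x + g * x * x
                   for a, b, g, x in zip(alpha, beta, gamma, p))
        return fuel + 1e4 * abs(sum(p) - pd)   # demand-balance penalty

    # Initial random population within the generator limits
    pop = [[rng.uniform(lo, hi) for lo, hi in zip(pmin, pmax)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=cost)
        elite = pop[:pop_size // 2]            # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            pa, pb = rng.sample(elite, 2)
            child = [(x + y) / 2 for x, y in zip(pa, pb)]  # crossover
            k = rng.randrange(n)                           # mutation
            child[k] = min(max(child[k] + rng.gauss(0, 5), pmin[k]), pmax[k])
            children.append(child)
        pop = elite + children
    best = min(pop, key=cost)
    return best, cost(best)

best, best_cost = ga_dispatch([500, 400, 300], [7.0, 7.85, 7.97],
                              [0.004, 0.003, 0.005],
                              [100, 100, 50], [600, 400, 200], 850)
```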


Fig. 3 Flowchart of GA-based ELD

5 PSO Algorithms

Multiple agents called particles swarm around the search space, starting from some initial random guesses. The swarm communicates the current best and shares the global best so as to focus on the quality solutions. Since its development, there have been about 20 different variants of particle swarm optimization, and they have been applied to almost all areas of tough optimization problems. The basic idea of the algorithm is to create a swarm of particles that move through the space around them looking for their goal, the location that best fits their needs as given by a fitness function. A neighborhood must be defined for each particle: one option is to compute distances and take the closest particles, but the most widely used neighborhood is a "social" one, simply a list of neighbors regardless of where they are. In that case no distance needs to be defined, which is a great advantage, for in some cases,


Fig. 4 Mathematical model

particularly for discrete spaces, any such definition would be quite arbitrary. The most commonly used neighborhood is the circular one shown in Fig. 4. The figure is almost self-explanatory: each particle is numbered and placed on a virtual circle according to its number, and the neighborhood of a given particle is constructed by taking its neighbors on this circle. In the mathematical model of the algorithm, the particles cooperate to discover the best location by updating their velocities and positions according to the following rules:

Vi(t + 1) = W Vi(t) + C1 r1 (Pi(t) − Xi(t)) + C2 r2 (g(t) − Xi(t))

Xi(t + 1) = Xi(t) + Vi(t + 1)

W = Wmax − ((Wmax − Wmin) / max_iteration) · iteration

where Vi is the velocity of the ith particle, Xi its position, Pi(t) its personal best, g(t) the global best, C1 and C2 are the acceleration coefficients, and r1 and r2 are random numbers. The three components (inertia, cognitive, and social) combine to create the new velocity vector. Figure 5 shows the flowchart of PSO-based ELD.
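The velocity and position updates above, with the linearly decaying inertia weight W, can be sketched as follows; the hyperparameter values are common defaults, not the authors' settings, and the sphere test function is purely illustrative.

```python
import numpy as np

def pso(fitness, lb, ub, n_particles=30, iters=100,
        w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    """PSO sketch following the update rules above:
    v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x); x = x + v,
    with inertia w decayed linearly from w_max to w_min."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x = rng.uniform(lb, ub, size=(n_particles, lb.size))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.apply_along_axis(fitness, 1, x)
    g = pbest[np.argmin(pbest_f)].copy()       # global best
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / iters  # inertia decay
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)
        f = np.apply_along_axis(fitness, 1, x)
        better = f < pbest_f                     # update personal bests
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, float(pbest_f.min())

# Usage: minimise a simple quadratic (sphere) fitness function
best, best_f = pso(lambda p: float(np.sum(p ** 2)), [-5, -5], [5, 5])
```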


Fig. 5 Flowchart of PSO-based ELD

6 Results and Discussion

The lambda-iteration, GA, and PSO methods are applied to solve the ELD problem and lessen the generation cost of the units. Although the computations are complex, they provide optimal solutions.

Case study: the GA and PSO methods are implemented on the Saudi Arabia network with thirteen generating units, twenty-eight buses, and fifteen loads, and results are obtained from GA, PSO, and lambda iteration; the whole computer program was developed in the MATLAB environment. Table 1 presents the ELD results for the Saudi Arabia network with thirteen units and twenty-eight buses using the lambda-iteration, GA, and PSO methods, comparing total fuel price, power losses, number of iterations, and lambda for the load conditions of the individual generators.

Table 1 shows a power demand of 50,000 MW for all methods. The fuel price is about 2,049,950 $/h with power losses of 5162.2 MW using the lambda-iteration method, about 2,037,700 $/h with power losses of 5033.6 MW using the GA method, and about 1,840,770 $/h with power losses of 4987 MW using the PSO method. The results in Table 1 show that the power losses with GA are lower than with lambda iteration, and lower still with PSO; likewise, the fuel price obtained with GA is better than with lambda iteration, and PSO is better than both other methods. The graphs present the ELD solution by the GA and PSO methods: the fitness-value function is the total fuel cost of the thirteen-unit generating system (Figs. 6 and 7), and the current best individual gives the best value for each of the thirteen generating units (Figs. 8 and 9). Figure 6 gives the total fuel price for GA as about 2,037,700 $/h, and Fig. 7 shows the reduction

Table 1 The result of Saudi Arabia network using lambda iteration, GA, and PSO

Number of gen.        Lambda iteration   GA          PSO
Pd (MW)               50,000             50,000      50,000
P1 (MW)               9995.5             9999.1      9695.9
P2 (MW)               4500               4499.9      4113.2
P3 (MW)               3351.4             3479.8      3181.4
P4 (MW)               5000               4999.6      4777.1
P5 (MW)               3443.5             3497.5      3277.8
P6 (MW)               3000               2997.9      2591
P7 (MW)               3500               3499.9      3409.4
P8 (MW)               4500               4498.8      4425.4
P9 (MW)               4382.5             4495.2      4328.2
P10 (MW)              3000               2999.5      2827.7
P11 (MW)              2852.2             2968.5      2425.5
P12 (MW)              3000               2999.2      2919.5
P13 (MW)              3000               2999.1      2801.5
Ploss (MW)            5162.2             5033.6      4987
F total ($/h)         2,049,950          2,037,700   1,840,770
Number of iteration   7                  51          63
Lam ($/MWh)           64.4828            60.3470     61.9186

Fig. 6 Fitness value of thirteen units system generating with twenty-eight bus by GA


Fig. 7 Fitness value of thirteen units system generating with twenty-eight bus by PSO

Fig. 8 Current best individual of thirteen units system generating with twenty-eight bus by GA

of total fuel price to about 1,840,770 $/h by PSO. Figure 8 shows the resulting power of each unit by GA, around P1 = 10,000 MW, P2 = 4500 MW, P3 = 3480 MW, P4 = 5000 MW, P5 = 3500 MW, P6 = 3000 MW, P7 = 4500 MW, P8 = 4495 MW, P9 = 3000 MW, P10 = 2968 MW, P11 = 3000 MW, P12 = 3000 MW, and P13 = 2911.2 MW, while Fig. 9 shows the corresponding results by PSO, around P1 = 9695 MW, P2 = 4113 MW, P3 = 3181 MW, P4 = 4777 MW, P5 = 3278 MW, P6 = 2591 MW, P7 = 3409 MW, P8 = 4425 MW, P9 = 4328 MW, P10 = 2828 MW, P11 = 2426 MW, P12 = 2919 MW, and P13 = 2919 MW.


Fig. 9 Current best individual of thirteen units system generating with twenty-eight bus by PSO

7 Conclusion

ELD is a significant task in electrical power systems, since power must be supplied at the least cost, which helps profitability. In this work, the ELD problem is solved for the Saudi Arabia network with thirteen generating units, twenty-eight buses, and fifteen loads using three widespread techniques, lambda iteration, GA, and PSO, with the help of MATLAB m-file code. The computational results show that PSO and GA give better results than lambda iteration, with PSO performing best in terms of fuel cost and power losses, as detailed by the fitness values and current best individuals.

References

1. Baldonado, M., Chang, C.-C.K., Gravano, L., Paepcke, A.: The Stanford digital library metadata architecture. Int. J. Digit. Libr. 1, 108–121 (1997)
2. Pratap Nair, M., Nithiyananthan, K.: An effective cable sizing procedure model for industries and commercial buildings. Int. J. Electr. Comput. Eng. 6(1), 34–39 (2016)
3. Warwick, W.M.: A Primer on Electric Utilities, Deregulation, and Restructuring of U.S. Electricity Markets. Pacific Northwest National Laboratory (2002)
4. Vidhya, K.: An power efficient implementation of SDR using matlab and simulink with adaptive modulation and demodulation techniques. J. Green Eng. 10(9), 4584–4597 (2020)
5. Blanco, M.I.: The economics of wind energy. Renew. Sustain. Energy Rev. 13(6–7), 1372–1382 (2009)
6. Li, X.: Study of multi-objective optimization and multi-attribute decision making for economic and environmental power dispatch. Electr. Power Components Syst. 37(10) (2009)
7. Sachan, A., Gupta, A.K.: A review of MPPT algorithms employed in wind energy conversion systems. J. Green Eng. 6(4), 385–402 (2017)
8. Kalpana, N., Venu Gopala Rao, M.: Impact of crow search algorithm to minimize transmission system power losses. J. Green Eng. 10(12), 12851–12864 (2020)
9. Faseela, C.K., Vennila, H.: Economic and emission dispatch using whale optimization algorithm. Int. J. Electr. Comput. Eng. (IJECE) 8, 1297–1304 (2018)
10. Chen, F., Huang, G.H., Fan, Y.R., Liao, R.F.: A nonlinear fractional programming approach for environmental-economic power dispatch. Int. J. Electr. Power Energy Syst. 78, 463–469 (2016)
11. Touil, A., et al.: Economic and emission dispatch using cuckoo search algorithm. Int. J. Electr. Comput. Eng. 9, 3384–3390 (2019)
12. Nithiyananthan, K., Mobarak, Y., Alharbi, F.: Application of cloud computing for economic load dispatch and unit commitment computations of the power system network. Adv. Intell. Syst. Comput. 1179–1189 (2020)
13. Paulraj, D.: A gradient boosted decision tree-based sentiment classification of twitter data. Int. J. Wavelets Multiresolution Inf. Process. 18(4) (2020)
14. Modiri-Delshad, M., Kaboli, S.H.A.: Backtracking search algorithm for solving economic dispatch problems with valve-point effects and multiple fuel options. Energy 116, 637–649 (2016)
15. Mobarak, Y.A., Kannan, N.: Vertically integrated utility power system structures for Egyptian scenario and electricity act. J. Adv. Res. Dyn. Control Syst. 11(7), 766–772 (2019)
16. Beigvand, S.D., Abdi, H., La Scala, M.: Combined heat and power economic dispatch problem using gravitational search algorithm. Electr. Power Syst. Res. 133, 160–172 (2016)
17. Krishnasamy, U., Nanjundappan, D.: Hybrid weighted probabilistic neural network and biogeography based optimization for dynamic economic dispatch of integrated multiple-fuel and wind power plants. Int. J. Electr. Power Energy Syst. 77, 385–394 (2016)
18. Zhang, Q., Zou, D., Duan, N., Shen, X.: An adaptive differential evolutionary algorithm incorporating multiple mutation strategies for the economic load dispatch problem. Appl. Soft. Comput. 78, 641–669 (2019)
19. Rao, R., Jaya, V.: A simple and new optimization algorithm for solving constrained and unconstrained optimization problems. Int. J. Ind. Eng. Comput. 7(1), 19–34 (2016)
20. Awad, M., Tawila, M., Alnakeeb, H.: The development of trans-border energy supply to the South Mediterranean Coast particularly electricity and natural gas. In: 8th World Congress, Sydney, Australia, 5–9 Sept 2004
21. Nikmehr, N., Ravadanegh, S.: Optimal power dispatch of multimicrogrids at future smart distribution grids. IEEE Trans. Smart Grid 6(4), 1648–1657 (2015)
22. Neelakandan, S.: An automated exploring and learning model for data prediction using balanced CA-SVM. J. Ambient Intell. Human. Comput. 66 (2020)
23. Al-Shetwi, A.Q., Alomoush, M.I.: A new approach to the solution of economic dispatch using genetic algorithm. J. Eng. Technol. 7(1), 40–48 (2016)
24. Kavitha Priya, C.J.: An analysis of types of protocol implemented in internet of things based on packet loss ratio. In: Proceedings of International Conference of Information and Communication Technology for Competitive Strategies, pp. 1–6 (2016)
25. Bora, T., Mariani, V., Coelho, L.: Multiobjective optimization of the environmental economic dispatch with reinforcement learning based on non-dominated sorting genetic algorithm. Appl. Therm. Eng. 146, 688–700 (2019)
26. Subbulakshmi, P., Ramalakshmi, V.: Honest auction based spectrum assignment and exploiting spectrum sensing data falsification attack using stochastic game theory in wireless cognitive radio network. Wirel. Pers. Commun. Int. J. 102(2), 799–816 (2018)
27. Cheng, Y.-S., et al.: A particle swarm optimization-based power dispatch algorithm with roulette wheel redistribution mechanism for equality constraint. Renew. Energy 88, 58–72 (2016)
28. Thangavel, R.: Resource selection in grid environment based on trust evaluation using feedback and performance. Am. J. Appl. Sci. 10(8), 924–930 (2013)
29. La, C., Balachander, K.: Design of power system stabilizer for multi-machine systems using modified swarm optimization algorithm with wind energy generation. J. Green Eng. 11(1), 156–178 (2021)
30. Zhang, H.: An improved particle swarm optimization algorithm for dynamic economic dispatch problems. Int. J. Innov. Res. Eng. Manage. (IJIREM) 3(4), 264–266 (2016)
31. Gholamghasemi, M., Akbari, E., Asadpoor, M.B., Ghasemi, M.: A new solution to the nonconvex economic load dispatch problems using phasor particle swarm optimization. Appl. Soft. Comput. 79, 111–124 (2019)
32. Lotfi, H., Dadpour, A., Samadi, M.: Solving economic dispatch in competitive power market using improved particle swarm optimization algorithm. Int. J. Smart Electr. Eng. 6(1), 35–41 (2017)
33. Jadoun, V.K., Gupta, N., Niazi, K.R., Swarnkar, A.: Modulated particle swarm optimization for economic emission dispatch. Int. J. Electr. Power Energy Syst. 73, 80–88 (2015)
34. Low, S.H.: Optimal Power Flow for Future Smart Grid. Clean Energy Institute, University of Washington (2015)
35. Ebeed, M., Kamel, S., Jurado, F.: Optimal power flow using recent optimization techniques. In: Classical and Recent Aspects of Power System Optimization, pp. 157–183 (2018)
36. Keerthivasan, K., Pavan Kumar Reddy, Y.: A multilevel H-bridge cascaded using a single dc source inverter for automobile applications. Int. J. Innov. Sci. Eng. Res. (IJISER) 5(10), 93–98 (2018)

A Novel Approach for Providing Security for IoT Applications Using Machine Learning and Deep Learning Techniques M. V. Kamal, P. Dileep, and M. Gayatri

Abstract In the contemporary era, the Internet of things (IoT) is becoming a very popular technology. It has paved the way for many use cases that provide more advantages to humans at various levels. IoT essentially combines physical and digital things to give a seamless experience in terms of controlling and communicating, and it relies on sensor networks that continuously produce data. In IoT use cases, the usage of heterogeneous standards and original equipment manufacturers leads to several security vulnerabilities. The existing methods for providing security need enhancement in terms of analysing data in real time and preventing cyberattacks on IoT applications. In this paper, we propose machine learning (ML) and deep learning-based solutions towards IoT security. A one-class SVM is used for detecting attacks, and a deep autoencoder is also used to analyse data and find attacks. Both approaches are found to be useful for near real-time detection of attacks on IoT networks. Experiments are made with an IoT-modelled real-time dataset. The experimental results revealed that the one-class SVM showed slightly better performance than the deep autoencoder; nevertheless, both are found to be useful to detect and prevent cyberattacks, and they can complement the overall security strategy for IoT use cases.

1 Introduction

In the contemporary era, the Internet of things (IoT) is becoming a very popular technology. It has paved the way for many use cases that provide more advantages to humans at various levels. IoT essentially combines physical and digital things to give a seamless experience in terms of controlling and communicating, and it relies on sensor networks that continuously produce data. In IoT use cases, the usage of heterogeneous standards and original equipment manufacturers leads to several security vulnerabilities. There are many existing solutions in the literature that are based on either ML or deep learning paradigms. Some of the ML models are found in [1, 2] and [3], whilst

M. V. Kamal · P. Dileep · M. Gayatri (B)
School of Computer Science, Malla Reddy College of Engineering and Technology, Hyderabad, Telangana State, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_14


M. Kamal et al.

deep learning models are found in [4–6] and [7]. Al-Garadi et al. [1] made a survey on ML and deep learning methods that are used to detect cyberattacks. Roldan et al. [2] investigated and developed an intelligent ML-based framework for detection of cyberattacks. Shafique et al. [3] studied different architectures based on ML for IoT security. Amanullah et al. [4] particularly used big data technologies and deep learning to develop a model for IoT security; they found that feature selection could improve the performance of detection models. Sagduyu et al. [5] proposed an adversarial network model using deep learning to detect cyber security attacks. Otoum et al. [6] used deep learning models to provide protection mechanisms for IoT applications, incorporating learning enhancements for better performance. Roukounaki et al. [7] worked on different IoT security datasets in order to find a scalable approach to building a deep learning or machine learning model for IoT security. From the literature, it is understood that there are many approaches using either deep learning or ML for IoT security. In this paper, we propose both deep learning and ML models to examine the utility of the two kinds of models. Our contributions in this paper are as follows:

1. We built a framework to facilitate two learning models: a one-class SVM from machine learning and a deep autoencoder from deep learning.
2. The two models are presented with their functionality using two algorithms.
3. A prototype is built using a Python data science platform to evaluate the proposed framework and find the performance statistics of both models.

The remainder of the paper is structured as follows. Section 2 reviews literature on different models for cyber security in IoT use cases. Section 3 presents the proposed learning models. Section 4 presents the evaluation methodology used. Section 5 presents the results of the empirical study. Section 6 concludes the paper and gives directions for the future scope of the research.

2 Related Work There are many security issues with networks in IoT integrated use cases. Malhotra et al. [8] explored different security issues in such environments in different domains. Hussain et al. [9] also investigated on security issues in IoT applications. They found that there have been efforts using deep learning and ML approaches to detect attacks in IoT scenarios. Amanullah et al. [4] particularly used big data technologies and deep learning to develop a model for IoT security. They found that feature selection could improve performance of detection models. Al-Garadi et al. [1] made a survey on ML and deep learning methods that are used to detect cyberattacks. Babar et al. [10] on the other hand proposed a management engine to monitor IoT use case to have secure resilient approach. Roldan et al. [2] investigated and developed an intelligent ML-based framework for detection of cyberattacks. Sagduyu et al. [5] proposed an adversarial network model using deep learning to detect cyber security attacks. Zikria et al. [11] focussed on the deep learning-based models towards monitoring IoT


scenarios to protect them from cyberattacks. Jafari et al. [12] explored fingerprinting of IoT devices that are participating in a specific use case. This could help them to develop attack detection models particularly the attacks launched by compromised IoT devices. Abbas et al. [13] identified different IoT security risks and classified them into different categories. Otoum et al. [6] used deep learning models to have protection mechanisms for IoT applications. They incorporated learning enhancements to have better performance. Li et al. [14] investigated on the concept of system statistics to learn and model IoT security. They found its feasibility and relevance. Roukounaki et al. [7] worked on different IoT security datasets in order to find a scalable approach in having a deep learning or machine learning model for IoT security. Sharma et al. [15] studied on the detection of anomalies in IoT networks using deep learning-based models. Dawoud et al. [16] defined a security architecture for IoT-based software defined networking (SDN) phenomenon and found its suitability. Uprety et al. [17] used reinforcement learning paradigms to protect IoT systems from malicious attacks. Shafique et al. [3] studied different architectures based on ML for IoT security. Other important researches include deep learning-based intrusion detection [18], AI-enabled security [19], and ML-based solutions [20]. From the literature, it is understood that there are many approaches using either deep learning or ML for IoT security. In this paper, we proposed both deep learning and ML models to examine the utility of the two kinds of models.

3 Proposed Learning Models

The proposed methodology is based on the fact that there is a need for automated learning from large volumes of data. Fortunately, IoT use cases have a lot of associated historical data that can be used as a training dataset, which enabled the empirical study with the proposed framework. Two learning models are used for the detection of attacks on IoT use cases: first, an ML-based model using a one-class SVM, and then a deep learning-based autoencoder model. The overview of the proposed framework is shown in Fig. 1. As presented in Fig. 1, the given DDb dataset is subjected to pre-processing, and the pre-processed data are given to the ML or deep learning algorithm. The one-class SVM works as an unsupervised model with a decision function that detects attack traffic in the arriving traffic data. The deep autoencoder, on the other hand, has its own mechanism consisting of encoding and decoding to arrive at a decision when detecting attack traffic. As presented in Fig. 2, the given data are first encoded: the encoding process reduces the representation of the data using representation learning. The code representation of the data in the hidden layer is then decoded, reconstructing the inputs so that malicious traffic scenarios can be detected from the reconstruction.

[Fig. 1 block diagram: DDb dataset (real-time IoT network traffic) → pre-processing → deep autoencoder / one-class support vector machine → cyberattack detection system → cyberattack detection]

Fig. 1 Proposed methodology for cyberattack detection

Fig. 2 Deep autoencoder for attack traffic detection

3.1 One Class SVM Algorithm

The one-class SVM algorithm uses an unsupervised learning model that takes the dataset as input and produces detection results.


Algorithm: One Class SVM
Input: DDb dataset D
Output: Attack detection results R

1. Start
2. Initialize training data vector T1
3. Initialize testing data vector T2
4. Initialize compressed data vector C
5. (T1, T2) ← PreProcess(D)
6. Model ← TrainOneClassSVM(T1)
7. R ← Predict(Model, T2)
8. Print confusion matrix
9. Compute performance statistics
10. Display confusion matrix
11. Display performance statistics
12. Return R
13. End

Algorithm 1: One class SVM algorithm

As presented in Algorithm 1, it takes DDB dataset as input and produces attack detection results. The data are subjected to pre-processing. Then, the one class SVM model is trained from the training set. After learning from the data and generation of knowledge model, it is capable of predicting possible attacks. The attack detection results with several details such as confusion matrix, performance statistics, and actual prediction values are returned.
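A minimal sketch of Algorithm 1 using scikit-learn's OneClassSVM follows; since the DDb dataset is not reproduced here, synthetic traffic features stand in for T1 and T2, and the chosen `nu` and `gamma` values are illustrative defaults.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the DDb traffic features: normal traffic
# clusters tightly, attack traffic lies far away in feature space.
rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(500, 3))   # T1: normal-only training
normal_test = rng.normal(0.0, 1.0, size=(100, 3))
attack_test = rng.normal(8.0, 1.0, size=(20, 3))     # far from normal traffic

# Train the one-class model on normal traffic, then predict on test data
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(normal_train)
pred_normal = model.predict(normal_test)   # +1 = inlier, -1 = outlier
pred_attack = model.predict(attack_test)

detection_rate = float(np.mean(pred_attack == -1))
false_alarm = float(np.mean(pred_normal == -1))
```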

3.2 Deep Autoencoder Algorithm

The deep autoencoder algorithm also uses an unsupervised learning model that takes the dataset as input and produces detection results. However, it has different phases: encoding, decoding, and reconstruction.


Algorithm: Deep autoencoder
Input: DDb dataset D
Output: Attack detection results R

1. Start
2. Initialize training data vector T1
3. Initialize testing data vector T2
4. Initialize compressed data vector C
5. (T1, T2) ← PreProcess(D)
6. Learn from T1
7. C ← Encoding(T2)
8. C1 ← Decoding(C)
9. R ← Reconstruction(C1, T2)
10. Compute reconstruction error
11. Compute performance metrics
12. Display reconstruction error
13. Display performance statistics
14. Return R
15. End

Algorithm 2: Deep autoencoder algorithm

As presented in Algorithm 2, the deep autoencoder learns from given data and performs operations such as encoding, decoding, and reconstruction. After reconstruction, it will be able to detect possible attack traffics in the test data.
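The encode-decode-reconstruct loop of Algorithm 2 can be illustrated with a one-hidden-layer autoencoder trained by gradient descent in NumPy; this is a simplified stand-in for the deep autoencoder, and the synthetic data, network size, and 99th-percentile threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(400, 4))        # normal traffic features
d, k, lr = X.shape[1], 2, 0.01                 # 4-d input, 2-d code
W1 = rng.normal(0, 0.1, (d, k))                # encoder weights
W2 = rng.normal(0, 0.1, (k, d))                # decoder weights

for _ in range(500):                           # encode -> decode -> MSE descent
    H = np.tanh(X @ W1)                        # encoding (hidden code)
    R = H @ W2                                 # decoding (reconstruction)
    E = R - X                                  # reconstruction error
    gW2 = H.T @ E / len(X)
    gW1 = X.T @ ((E @ W2.T) * (1 - H ** 2)) / len(X)
    W1 -= lr * gW1
    W2 -= lr * gW2

def recon_error(x):
    """Per-sample mean squared reconstruction error."""
    return np.mean((np.tanh(x @ W1) @ W2 - x) ** 2, axis=1)

# Flag traffic whose reconstruction error exceeds the normal-traffic range
threshold = np.percentile(recon_error(X), 99)
attacks = rng.normal(6.0, 1.0, size=(20, 4))   # anomalous traffic features
flags = recon_error(attacks) > threshold
```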

4 Evaluation Methodology

The evaluation methodology is based on extracting the ground truth from the given dataset. Once the ground truth is available, the algorithms' performance can be computed using the observations shown in Fig. 3.

Fig. 3 Confusion matrix-based ground truth and predicted values comparison


The number of true positives is computed as the count of positively detected samples that match the ground truth; likewise, the number of true negatives is the count of negatively detected samples that match the ground truth. False positives and false negatives are counted in the same fashion. From these counts, certain metrics are derived, known as sensitivity, specificity, accuracy, and the kappa statistic.
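The derived metrics can be computed directly from the confusion-matrix counts. The example below uses the one-class SVM counts reported in Table 1, treating the normal ("False") class as positive, which is the convention the reported figures imply; the function name is our own.

```python
def metrics(tp, fn, fp, tn):
    """Derive the performance metrics of Section 4 from
    confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    detection_rate = tp / (tp + fn + fp + tn)   # positives over all samples
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, detection_rate, balanced_accuracy

# One-class SVM counts from Table 1, normal ("False") class as positive
sens, spec, dr, bacc = metrics(tp=484441, fn=0, fp=18, tn=3955)
```

Rounding to four decimals reproduces the SVM row of Table 3 (1, 0.9955, 0.9919, 0.9977).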

5 Experimental Results

Experiments are made with the two algorithms, one-class SVM and deep autoencoder. The results are observed in terms of the different metrics mentioned above. Tables 1 and 2 show the confusion matrix values for one-class SVM and deep autoencoder, respectively. As presented in Fig. 4, the row index and MSE are shown on the horizontal and vertical axes, respectively. The results reveal that the reconstruction error is kept to a minimum, reflecting the efficiency of the deep autoencoder.

Table 1 Confusion matrix for one class SVM

                     Really (ground truth)
Prediction     False        True
False          484,441      18
True           0            3,955

Table 2 Confusion matrix for deep autoencoder

                     Really (ground truth)
Prediction     False        True
False          968,750      1,610
True           0            6,469

Fig. 4 Reconstruction error with deep autoencoder


Table 3 Performance comparison between one class SVM and deep autoencoder

Algorithm         Sensitivity   Specificity   Detection rate   Balanced accuracy
SVM               1             0.9955        0.9919           0.9977
Deep autoencoder  1             0.8007        0.9917           0.9004

Fig. 5 Performance comparison of the two models

As presented in Table 3, the experimental results of the two detection models are provided in terms of sensitivity, specificity, detection rate, and balanced accuracy. As presented in Fig. 5, one class SVM and the deep autoencoder are compared across these metrics. Both models achieve the highest possible sensitivity of 1. However, the specificity, detection rate, and balanced accuracy show that SVM is slightly better than the deep autoencoder. The deep autoencoder trains itself implicitly from the available data and is capable of performing well in the presence of large volumes of data.

6 Conclusion and Future Work

In this paper, we proposed machine learning (ML) and deep learning-based solutions for IoT security. One class SVM is used for detecting attacks. We also used a deep autoencoder to analyse the data and find attacks. Both approaches are found to be useful for near real-time detection of attacks on IoT networks. In the presence of cyberattacks, the traditional approaches to providing security need to be integrated with ML and deep learning-based solutions, because such methods can quickly learn from large volumes of data and provide near real-time results. This makes security systems more realistic and useful. The use of the two approaches in this paper showed that they are highly accurate and can be used to protect IoT networks. Experiments were conducted with an IoT-modelled real-time dataset. The experimental results revealed that one class SVM showed slightly better performance than the deep autoencoder. Nevertheless, both are found to be useful to detect and prevent cyberattacks, and they can complement the overall security strategy for IoT use cases. In the future, we intend to extend our learning models with feature selection approaches and ensemble learning to improve performance further.



18. Idrissi, I., Azizi, M., Moussaoui, O.: IoT security with deep learning-based intrusion detection systems: a systematic literature review. In: 2020 Fourth International Conference on Intelligent Computing in Data Sciences (ICDS), pp. 1–10 (2020) 19. Wu, H., Han, H., Wang, X., Sun, S.: Research on artificial intelligence enhancing internet of things security: a survey. IEEE Access, pp. 1–22 (2020). 20. Tahsien, S.M., Karimipour, H., Spachos, P.: Machine learning based solutions for security of Internet of Things (IoT): a survey. J. Netw. Comput. Appl. 102630, P1–18 (2020)

Analysis of Bangla Keyboard Layouts Based on Keystroke Dynamics

Shadman Rohan, Koushik Roy, Pritom Kumar Saha, Sazzad Hossain, Fuad Rahman, and Nabeel Mohammed

Abstract Keyboards are the primary devices for interaction with computer platforms and have been a central topic in HCI research for decades. The study of their layout design is important in deciding their efficiency, practicality and adoption. Multiple keyboard layouts have been developed for Bangla without any rigorous study for their comparison. In this paper, we take a quantitative data-driven approach to compare their efficiency. Our evaluation strategy is based on the key-pair stroke timing with data collected from standard QWERTY English keyboards. Our experiments conclude that the Bijoy keyboard layout is the most efficient design for Bangla among the four layouts studied. This quantitative approach can lay the groundwork for further study of these layouts based on other criteria.

1 Introduction

The QWERTY keyboard layout currently dominates computer keyboard users' preference. Computer keyboards originally inherited their QWERTY layout from typewriters, back in 1874. Although the layout is sub-optimal in many ways, the study in [17] concluded that the steep adoption curve of any new arrangement makes mass adoption of some other design highly unlikely. Given the popularity of QWERTY keyboards, the pioneers of Bangla keyboards simply mapped the Bangla characters over it [1]. Most Bangla keyboard layouts are adopted from preexisting Bangla typewriter layouts. We could not locate any clear data or studies justifying the placement choices on these layouts, which leaves ground for investigating this problem using methodologies that have been developed in modern times. In this paper, we perform a quantitative analysis and comparison of several Bangla keyboard layouts using a data-driven approach.

S. Rohan · K. Roy · P. K. Saha · S. Hossain · N. Mohammed (B) Apurba-NSU R&D Lab, Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh e-mail: [email protected]
F. Rahman Apurba Technologies Ltd., Mohakhali New DOHS, Dhaka, Bangladesh
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_15



2 Previous Works

Only a few studies have compared Bangla keyboard layouts. Studies such as [5, 16] that investigated optimal keyboard design frequently needed to develop a framework for comparison before optimization, so there is some overlap between studies on keyboard comparison and on keyboard optimization. The study in [18] proposed a layout based on character and fingering frequency, and [13] used data mining techniques to build an optimal layout. Asikhia and Ehondor [3] made some efforts to redesign the QWERTY layout for improved ergonomics. Sattar et al. [18] followed a statistical approach in creating their layout: they collected Bangla writings and documents from various categories and then mapped the Bangla characters to keyboard keys in sorted order of frequency. The data mining technique in [13] used association rules [21] and the apriori algorithm [9]. They collected a large number of characters from different sources, analyzed their frequency and listed them in descending order. The relations among them were then investigated, and strong association rules satisfying minimum support and confidence thresholds were formed; these rules were later used to determine suitable key positions. In the data mining method, the support and confidence values had to be optimized for the design, and no functions were proposed to evaluate the performance of the models. However, the research in [13] did conduct comparative studies with the Bijoy keyboard: they calculated hand switching and left- and right-hand load and showed that their keyboard performed better on these criteria. There are other approaches, such as [20], which used simulated annealing to modify the keyboard layout to optimize the finger travel distance between characters.

3 Bangla Keyboard Layouts

Fixed keyboard layouts are based on the QWERTY layout, which is meant for Latin scripts and has been around since the era of typewriters. Bangla has far more unique characters than Latin (73 unique characters, including 11 vowels, 39 consonants, 10 digits and 13 signs), and all of these characters need to be mapped. Among fixed keyboard layouts for Bangla, the most popular are the Bangla Jatiyo Keyboard, Bijoy Keyboard, Probhat, Baishakhi and Uni Gitanjali. In Bangladesh, the Jatiyo Keyboard developed by the Bangladesh Computer Council is considered the standard layout, whereas Baishakhi and Uni Gitanjali are used in Indian governmental work. Phonetic layouts convert words typed in English into Bangla. Avro is currently the most popular phonetic keyboard layout in the world; it is listed in the Unicode Consortium's keyboard layout resources and is the built-in keyboard for the Bengali Wikipedia. The ease of learning of phonetic layouts comes with certain challenges, such as inflexibility in predictive text and compatibility issues with certain popular software products, e.g., Adobe Photoshop [4].

4 Analysis Criteria

As Bangla keyboard layouts lack any rigorous study that looks into scientific methods for key placement or sets up a basis for comparison, we rely heavily on research around the QWERTY layout to extract meaningful design principles that apply to Bangla. For a device to be considered effective, it should improve both comfort and productivity. Certain general design principles for keyboards furnished by ISO 9241-410 [12], EN ISO 9241-410:2008/A1:2012 [11] and HF-STD-001 [2] are widely accepted as necessary to create an effective and ergonomic keyboard.

A primary objective in the keyboard optimization problem is to increase typing speed, which naturally leads to the question of how to measure keyboard layouts in terms of the typing speed they may facilitate. Brouillette et al. [6] suggest reducing the average time needed to move between two characters, which can be calculated using an adaptation of Fitts' law [8] shown in Eq. (1):

\bar{t} = \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{P_{ij}}{IP} \log_2\!\left(\frac{D_{ij}}{W_i} + 1\right)    (1)

where IP denotes the calculated performance index for a device, P_{ij} gives the frequency of each digraph, D_{ij} is the distance between the two keys, and W_i is the width of each key.
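To make Eq. (1) concrete, here is a small sketch that evaluates the frequency-weighted Fitts'-law time for a hypothetical two-key layout; the digraph frequencies, distances, key widths and IP value are all made-up illustrative numbers.

```python
import math

def average_movement_time(P, D, W, IP):
    # Eq. (1): sum over all digraphs (i, j) of (P_ij / IP) * log2(D_ij / W_i + 1).
    n = len(W)
    return sum(P[i][j] / IP * math.log2(D[i][j] / W[i] + 1)
               for i in range(n) for j in range(n))

# Hypothetical two-key layout: P holds digraph frequencies, D inter-key
# distances (in key widths), W key widths, IP the device performance index.
P = [[0.1, 0.4], [0.4, 0.1]]
D = [[0.0, 2.0], [2.0, 0.0]]
W = [1.0, 1.0]
print(average_movement_time(P, D, W, IP=4.9))
```

A layout that places frequent digraphs on nearby keys lowers this average time, which is exactly the intuition the evaluation below builds on.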

5 Experimental Design

We take a data-driven approach to quantify, analyze and compare the efficiency of some common Bangla keyboard layouts. Our approach requires stroke timings for all possible key-pairs, which are difficult to collect. We sidestep this issue by working only with the high-frequency key-pairs. This exclusion has very little effect on our final experiments, since the rare key-pairs have very low frequency and therefore contribute very little to the final cost estimate.

The major problem with our approach was finding expert typists on all the keyboards for data collection. While it was impossible to find expert typists for all layouts, the long adaptation period required for a typist to adjust to a new layout made training one impractical. We develop a heuristic to solve this issue by collecting data from English typists and estimating the cost of each Bangla layout by extrapolating from this data. A large corpus of Bangla texts is used for evaluation. The exact methodology for collecting the evaluation corpus is described in the Evaluation Corpus section, and the methodology for quantifying and comparing the efficiency of these keyboards is described in the Methodology section. At the end, we discuss the results and evaluate possible drawbacks of our current approach in the Results and Conclusion sections. The flowchart in Fig. 1 demonstrates the complete workflow of our experiments.

Fig. 1 Experimental workflow flowchart


6 Key-Stroke Data Collection

The study in [14] laid out several conditions for data collection, which we roughly followed. A group of ten volunteers was invited to type a randomly selected article from the English Wikipedia [19] on a computer with a standard QWERTY keyboard. It is important to note that the volunteers were typing English text. A key logger running in the background registered all key-tap timings, and the difference between press times was used to calculate the switching time between key-pairs, as demonstrated in Fig. 2. The keystroke data were collected in two sessions from each volunteer: the first session was for training and consisted of 15 typing practice attempts, and the second was for actual data collection and consisted of 10 further typing attempts. At the beginning, each subject was allowed a pre-training rehearsal of 5 typing attempts to familiarize them with the experiment. The travel times between all typed key-pairs were calculated; for multiple occurrences of the same key-pair, the average was taken. The high-frequency key-pairs are shown in Table 1.

7 Evaluation Corpus

7.1 Text Selection and Preparation

To build a corpus with textual diversity, three sources were utilized: Banglapedia [10] served as the source of wiki-like descriptive texts, Bangla WikiSource [7] was used for collecting stories and novels, and the popular daily Prothom Alo was scraped as the source of news data. After extracting the 1000 highest-frequency key-pairs from each domain, we found a large intersection of 73.9% among the key-pairs, as shown in Fig. 3. The top five most common key-pairs for each layout, extracted from the keystroke transformation of the corpus, are shown in Table 1.
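A minimal sketch of the key-pair counting behind Table 1; the keystroke stream below is a made-up stand-in for a layout's keystroke transformation of the corpus.

```python
from collections import Counter

def top_keypairs(keystrokes, k=5):
    # Count consecutive key-pairs in a keystroke stream and report the k
    # most frequent, with their relative frequency in percent.
    pairs = Counter(a + b for a, b in zip(keystrokes, keystrokes[1:]))
    total = sum(pairs.values())
    return [(p, round(100 * n / total, 2)) for p, n in pairs.most_common(k)]

stream = "fvcvfvfbvfvcfv"  # hypothetical keystroke stream for one layout
print(top_keypairs(stream, k=3))
```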

Fig. 2 Switching time was calculated using Press–Press Time difference between two consecutive keys
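The press–press differencing and averaging shown in Fig. 2 can be sketched as follows; the timestamped log is a made-up example (times in milliseconds).

```python
from collections import defaultdict

def keypair_switch_times(events):
    # events: chronological (key, press_time_ms) tuples from the key logger.
    # Switching time for a key-pair is the difference between consecutive
    # press times; repeated occurrences of the same pair are averaged.
    sums, counts = defaultdict(float), defaultdict(int)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        sums[k1 + k2] += t2 - t1
        counts[k1 + k2] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}

log = [("t", 0), ("h", 120), ("e", 250), ("t", 400), ("h", 560)]
print(keypair_switch_times(log))
```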


Table 1 Top 5 common key-pairs for each layout

Bijoy        National     Inscript     Avro
fv (3.11%)   kj (2.38%)   hv (3.30%)   ar (2.67%)
cv (2.65%)   jk (2.33%)   gv (3.21%)   ,r (2.53%)
fb (1.90%)   hj (1.95%)   cv (2.90%)   er (2.23%)
vf (1.67%)   jm (1.83%)   gw (2.89%)   ,z (2.18%)
vc (1.63%)   jh (1.81%)   hb (2.81%)   an (2.09%)

Fig. 3 Venn diagram of high-frequency key-pairs. Note that the diagram is not drawn to scale

8 Methodology

We devise our evaluation methodology based on [15], which formulates a simple metric to measure the efficiency of a layout, given in Eq. (2):

\text{Cost} = \sum_{\alpha=A}^{Z} \sum_{\beta=A}^{Z} F_{\alpha\beta}\, T_{pos(\alpha),\,pos(\beta)}    (2)

For a given \text{Corpus}_{bn} = (c_1, c_2, c_3, \ldots, c_n), the set of unique character-pairs C and its corresponding frequency set F = \{f_1, f_2, f_3, \ldots, f_m\} are defined in Eqs. (3) and (4):

C = \{(c_i, c_{i+1}) : c_i \in \text{Corpus}_{bn}\}    (3)

f_j = \sum_{i=1}^{n-1} \mathbf{1}_{(c_i,\,c_{i+1}) = k_j}, \quad k_j \in C    (4)

where c_i is the ith character in \text{Corpus}_{bn}, f_j is the frequency of the key-pair k_j \in C, and n is the total number of characters. We propose a simple adaptation of the formulation in Eq. (2) with a new cost C_{En \to Bn}, where the frequency F comes from the Bangla domain, F^{bn} (represented by our corpus), and the switching time T^{en} is collected from typists typing in English on a standard three-row QWERTY keyboard:

C_{En \to Bn} = \sum_{\alpha=A}^{Z} \sum_{\beta=A}^{Z} F^{bn}_{\alpha\beta}\, T^{en}_{pos(\alpha),\,pos(\beta)}    (5)

where F^{bn}_{\alpha\beta} is the frequency of each consecutive character pair appearing in our Bangla corpus and T^{en}_{pos(\alpha),\,pos(\beta)} is the switching time between the corresponding key positions on a standard three-row keyboard. In this setting, C_{En \to Bn} is simply the metric that should be minimized to improve layout efficiency. This process is further described in Algorithm 1.

Algorithm 1: Cost of a layout

Input: Bangla character-pair frequencies F_bn = {f_1, f_2, ..., f_m} and key-pair switching times T_en = {t_1, t_2, ..., t_m}
Output: Cost of the layout

procedure Keyboard-Cost(f[], t[])
    Cost ← 0
    for i = 1 → m do
        Cost ← Cost + f_i · t_i
    end for
    return Cost
end procedure
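Algorithm 1 reduces to a frequency-weighted sum over key-pairs. A minimal sketch with made-up frequencies and timings; pairs missing from the timing data are skipped, mirroring the restriction to high-frequency pairs.

```python
def layout_cost(pair_freq, switch_time_ms):
    # Eq. (5) / Algorithm 1: Cost = sum of frequency * switching time over
    # the key-pairs for which timing data is available.
    return sum(f * switch_time_ms[p]
               for p, f in pair_freq.items() if p in switch_time_ms)

# Hypothetical inputs: frequencies from a Bangla corpus mapped to keystrokes,
# switching times (ms) measured from English QWERTY typists.
freq = {"fv": 311, "cv": 265, "fb": 190}
time_ms = {"fv": 142.0, "cv": 155.0, "fb": 160.0}
print(layout_cost(freq, time_ms))
```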

9 Results

The results of our experiments are reported in Table 2, which places the Bijoy layout as the most efficient. The National keyboard suffers a significant drop in efficiency due to the dispersion of high-frequency characters across separate layers. While the phonetic layout, Avro, performed poorly, it still dominates users' preferences owing to its low adoption time.

Table 2 Efficiency measurements for each layout

Keyboard layout   Key-pair cost
Bijoy             217,747
National          244,625
Inscript          234,579
Avro              253,213


Needless to say, these results are influenced by the corpus of text used to extract the statistics. However, given the diversity of texts used in this study, we believe these conclusions have merit.

10 Conclusion

We compared Bangla keyboards based on a measured metric that employs key-pair stroke timing. The results are interesting and set up a quantitative framework for comparison that can lead to better layout arrangements. However, it is important to note that this study builds on some very strong assumptions: it is still unknown whether key-pair tap timings persist across languages, layouts, or the text being typed. Approaches such as [20], which uses slightly different heuristics to measure the efficiency of a keyboard, may be explored in the future. Our study was conducted on a small sample of 10 subjects, without a statistical justification for the sample size. A study on a much broader scale may reveal new insights into these layouts and users' relation to them.

References

1. Ahamed, S.: An amazing journey from Shahid Lipi to Avro. The Daily Star. https://www.thedailystar.net/news-detail-136160
2. Ahlstrom, V., Longo, K.: Human Factors Design Standard (HF-STD-001). Atlantic City International Airport, NJ: Federal Aviation Administration William J. Hughes Technical Center (2003, amended/updated 2009)
3. Asikhia, O., Ehondor, S.: Ergonomics design of computer keyboard layout. J. Sci. Multidisc. Res. 2 (2010)
4. Bin Iqbal, F.: Which Bangla keyboard is for you. The Daily Star. https://www.thedailystar.net/shout/satire/news/which-bangla-keyboard-you-1826149
5. Bowman, D.A., Rhoton, C.J., Pinho, M.S.: Text input techniques for immersive virtual environments: an empirical comparison. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 46, pp. 2154–2158. SAGE Publications, Los Angeles, CA (2002)
6. Brouillette, A., Sarmah, D., Kalita, J.: Multi-objective optimization for efficient Brahmic keyboards. In: Proceedings of the Second Workshop on Advances in Text Input Methods, pp. 29–44 (2012)
7. Contributors, W.: Main page (2004). https://bn.wikipedia.org/wiki [Online]. Accessed 2 July 2021
8. Fitts, P.M.: The information capacity of the human motor system in controlling the amplitude of movement. J. Exp. Psychol. 47(6), 381 (1954)
9. Hegland, M.: The apriori algorithm: a tutorial. In: Mathematics and Computation in Imaging Science and Information Processing, pp. 209–262 (2007)
10. Islam, S.: Banglapedia: National Encyclopedia of Bangladesh, vol. 3. Asiatic Society of Bangladesh (2003)
11. Ergonomics of human-system interaction, Part 410: Design criteria for physical input devices, Amendment 1. Standard, International Organization for Standardization, Geneva, CH (2012)


12. Ergonomics of human-system interaction, Part 410: Design criteria for physical input devices. Standard, International Organization for Standardization, Geneva, CH (2008)
13. Kamruzzaman, S., Alam, M., Masum, A.K.M., Hassan, M., et al.: Optimal Bangla keyboard layout using data mining technique. arXiv preprint arXiv:1009.4982 (2010)
14. Lee, J., Cho, H., McKay, R.B.: A rapid screening and testing protocol for keyboard layout speed comparison. IEEE Trans. Hum.-Mach. Syst. 45(3), 371–384 (2015)
15. Light, L., Anderson, P.: Designing better keyboards via simulated annealing. AI Expert 8(9), 20–27 (1993)
16. MacKenzie, I.S., Zhang, S.X.: The design and evaluation of a high-performance soft keyboard. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 25–31 (1999)
17. Noyes, J.: The QWERTY keyboard: a review. Int. J. Man-Mach. Stud. 18(3), 265–281 (1983)
18. Sattar, M.A., Pathan, A.M.K., Ali, M.A.: Development of an optimal Bangla keyboard layout based on character and fingering frequency. In: National Conference on Computer Processing of Bangla, pp. 38–46. Citeseer (2004)
19. Wikipedia contributors: Main page (2004). https://en.wikipedia.org/wiki [Online]. Accessed 2 July 2021
20. Yang, N., Mali, A.D.: Modifying keyboard layout to reduce finger-travel distance. In: 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 165–168. IEEE (2016)
21. Zhang, C., Zhang, S.: Association Rule Mining: Models and Algorithms, vol. 2307. Springer (2003)

Improved Efficiency of Semantic Segmentation using Pyramid Scene Parsing Deep Learning Network Method

Pichika Ravikiran and Midhun Chakkaravarthy

Abstract While semantic segmentation is useful for object detection and scene perception, traditional methods are limited in the level of detail they can recover from a given image or scene. In this paper, we propose a deep learning-based semantic segmentation algorithm built on the pyramid scene parsing method, which assigns a category label to each pixel. Training and testing experiments were carried out on public datasets, resulting in high mean accuracy and good intersection over union (IoU).

1 Introduction

Scene perception is the visual understanding of an environment as viewed by an observer at any given time. It includes relative information about locations and expectations about what other kinds of objects might be encountered. Object detection is a computer vision technique that allows us to identify and locate objects in an image or video. Because it provides both object identification and localization, object detection can be used to count the objects in a scene and to determine and track their precise pixel locations, all while accurately labeling them. Scene parsing [1], based on semantic segmentation, is a fundamental topic in computer vision. The goal is to assign each pixel in the image a category label. Scene parsing gives a complete perception of the scene: it predicts the label, location, and shape of each element. Scene parsing based on semantic segmentation has potential applications in automatic driving and robot sensing.

P. Ravikiran (B) · M. Chakkaravarthy Department of Computer Science and Engineering, Lincoln University College, Kota Bharu, Malaysia e-mail: [email protected]
M. Chakkaravarthy e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_16


2 Related Work

2.1 Semantic Segmentation

Semantic segmentation [2, 3] understands an image at the pixel level and assigns each pixel in the image an object class or label. Before deep learning took over computer vision, people used approaches such as random forest-based classifiers for semantic segmentation. One of the popular initial deep learning approaches was patch classification, where each pixel was classified separately using a patch of the image around it; the main reason to use patches was that classification networks usually have fully connected layers and therefore require fixed-size images. Fully convolutional networks (FCNs) later introduced CNN architectures for dense prediction without any fully connected layers. This allowed segmentation maps to be generated for images of any size and was also much faster than the patch classification approach. Almost all subsequent state-of-the-art approaches to semantic segmentation adopted this paradigm.

Apart from fully connected layers, one of the main problems with using CNNs for segmentation is pooling layers. Pooling layers increase the field of view and are able to aggregate context while discarding the "where" information. However, semantic segmentation requires the exact alignment of class maps and thus needs the "where" information to be preserved. Two different classes of architectures evolved in the literature to tackle this issue. The first is the encoder-decoder architecture [4]: the encoder gradually reduces the spatial dimension with pooling layers, and the decoder gradually recovers the object details and spatial dimension, usually with shortcut connections from encoder to decoder to help the decoder recover object details better. U-Net is a popular architecture from this class. Architectures in the second class use what are called dilated/atrous convolutions [5] and do away with pooling layers; the deep learning-based semantic segmentation algorithm in this paper, based on the pyramid scene parsing method, follows this approach.

2.2 Pixel Accuracy Metrics Used in Semantic Segmentation

The most commonly used metrics [6] for semantic segmentation are the pixel accuracy and IoU [6, 7]. Pixel accuracy is computed as [10]:

\text{Pixel Accuracy} = \frac{\sum_{i=0}^{k} \text{True positive}_{ii}}{\sum_{i=0}^{k} \sum_{j=0}^{k} \text{False positive}_{ij}}

IoU is the area of overlap between the predicted segmentation and the ground truth divided by the area of union between the predicted segmentation and the ground truth.
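The two metrics can be sketched as follows for integer label maps; the tiny prediction/ground-truth arrays are illustrative only.

```python
import numpy as np

def pixel_accuracy(pred, gt):
    # Fraction of pixels whose predicted label matches the ground truth.
    return (pred == gt).mean()

def mean_iou(pred, gt, num_classes):
    # Per-class intersection over union, averaged over classes that appear
    # in either the prediction or the ground truth.
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

gt = np.array([[0, 0, 1], [0, 1, 1]])
pred = np.array([[0, 1, 1], [0, 1, 1]])
print(pixel_accuracy(pred, gt), mean_iou(pred, gt, num_classes=2))
```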


Fig. 1 Pyramid scene parsing network [7, 8]

3 Proposed Methodology

3.1 Advantages of Pyramid Scene Parsing Network

In complex scene parsing, identifying objects and open-vocabulary stuff improves accuracy, avoids representative failure cases, and improves pixel-level label prediction (Fig. 1) [7].

Mismatched relationship: context is important in recognizing a scene. For example, it is likely that a "boat" is over a "river," while it is unlikely that a "car" is over a "river." Correct knowledge of this relationship increases the ability to classify a segment correctly.

Confusing categories: there are often confusing categories in a dataset. A network may predict some parts of an object as "skyscraper" while others are predicted as "building"; the result should be one of them, not both, and the relationship between categories could resolve this.

Inconspicuous classes: objects can exist at various scales in a scene. Traditional FCNs do not account for these varying scales, resulting in discontinuous predictions across scales. One potential reason for this problem is the small receptive field of the network and its inability to pay attention to certain sub-regions, overlooking the global scene category. PSPNet uses a pyramid pooling module to increase the receptive field without a drastic decrease in output resolution or an increase in parameter/layer count, which overcomes the above issues.

3.2 Working of PSPNet

Given an input image, PSPNet uses a pretrained CNN with the dilated network strategy to extract the feature map; the final feature map size is 1/8 of the input image. On top of this map, the pyramid pooling module gathers context information. Using a 4-level pyramid, the pooling kernels cover the whole, half of, and small portions of the image, and the pooled maps are fused as the global prior. The prior is then concatenated with the original feature map in the final part of the network, followed by a convolution layer that generates the final prediction map.

FCNs owe their name to their architecture, which is built only from locally connected layers, such as convolution, pooling, and up-sampling layers; no dense layer is used in this kind of architecture. Generally, an FCN consists of two main parts: an encoder-decoder architecture for pixel-wise object representation and a softmax layer for pixel-wise assignments. In the encoder-decoder architecture, an input image is encoded by several convolutional and pooling layers and then decoded by one or more up-sampling layers. The softmax layer assigns each pixel in the input image to one of the classes based on the outputs of the encoder-decoder architecture. Therefore, the outputs of the encoder-decoder architecture, called the pixel-wise feature maps, are considered a feature representation of the input image [6].
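A NumPy sketch of the pool-and-fuse step described above, under simplifying assumptions: nearest-neighbour upsampling, no 1×1 channel-reduction convolutions, and bin sizes that divide the spatial extent.

```python
import numpy as np

def pyramid_pooling(feat, bins=(1, 2, 3, 6)):
    # Pyramid-pooling sketch on a (C, H, W) feature map: average-pool into
    # b x b regions for each pyramid level, upsample back to H x W by nearest
    # neighbour, and concatenate everything along channels as a global prior.
    c, h, w = feat.shape
    outs = [feat]
    for b in bins:
        ys = np.linspace(0, h, b + 1, dtype=int)
        xs = np.linspace(0, w, b + 1, dtype=int)
        pooled = np.empty((c, b, b))
        for i in range(b):
            for j in range(b):
                pooled[:, i, j] = feat[:, ys[i]:ys[i + 1],
                                       xs[j]:xs[j + 1]].mean(axis=(1, 2))
        # Nearest-neighbour upsample of the b x b map back to h x w.
        up = pooled[:, (np.arange(h) * b) // h][:, :, (np.arange(w) * b) // w]
        outs.append(up)
    return np.concatenate(outs, axis=0)

feat = np.random.default_rng(0).normal(size=(4, 12, 12))
prior = pyramid_pooling(feat)
print(prior.shape)  # 4 channels x (1 original + 4 pyramid levels)
```

The 1×1 level is the global average of each channel, which is what lets every output pixel see the whole-image context.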

3.3 Dilated Convolution

Standard convolution (left) versus dilated convolution (right)

At the summation, the constraint is s + l·t = p, i.e.,

(F *_l k)(p) = \sum_{s + lt = p} F(s)\, k(t)

so some points are skipped during convolution. When l = 1, this is the standard convolution; when l > 1, it is a dilated convolution. Dilated convolution, also known as atrous convolution, is specifically designed for dense prediction. With an extra parameter to control the rate of dilation, it aggregates multi-scale context information without sacrificing spatial resolution. However, dilated convolution comes with the so-called gridding problem. In order to increase the receptive field while retaining resolution, PSPNet uses dilated (atrous) convolutions. Dilated convolutions "inflate" the kernel by inserting holes between the kernel elements; an additional parameter, the dilation rate, indicates how much the kernel is widened, i.e., how many spaces are inserted between kernel elements (Fig. 2).
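A 1-D sketch of the s + l·t = p summation, written in the cross-correlation form common in deep learning libraries; the signal and kernel values are made up.

```python
import numpy as np

def dilated_conv1d(F, k, l):
    # Dilated convolution (cross-correlation form): the output at position p
    # sums F over taps spaced l apart, i.e. the terms F[p + l*t] * k[t],
    # matching the s + l*t = p constraint up to the indexing convention.
    out_len = len(F) - l * (len(k) - 1)
    return np.array([sum(F[p + l * t] * k[t] for t in range(len(k)))
                     for p in range(out_len)])

F = np.array([1., 2., 3., 4., 5., 6.])
k = np.array([1., 1., 1.])
print(dilated_conv1d(F, k, l=1))  # l = 1: standard convolution
print(dilated_conv1d(F, k, l=2))  # l = 2: same kernel, wider receptive field
```

With l = 2 the same 3-tap kernel spans 5 input samples, showing how dilation widens the receptive field without adding parameters.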

4 Experimental Results

Cityscapes [8] contains 5000 finely annotated images with 19 semantic classes. The images are in 2048 × 1024 resolution and were captured from 50 different cities. There will

Improved Efficiency of Semantic Segmentation using Pyramid …


Fig. 2 Dilated convolution puts a spacing between values in a kernel with the parameter “rate,” which allows better control on increasing the receptive field while the resolution does not decrease

be training, validation, and test splits of the densely annotated images (Fig. 3) [8]. Given an input image, PSPNet [7] uses a pretrained CNN with the dilated network strategy to extract the feature map; the final feature map size is 1/8 of the input image. On top of the map, the pyramid pooling module is used to gather context information. Using the 4-level pyramid, the pooling kernels cover the whole, half of, and small portions of the image, and they are fused as the global prior. The prior is then concatenated with the original feature map in the final part of the network, followed by a convolution layer that generates the final prediction map.
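The pyramid pooling idea can be sketched in NumPy as follows. The bin sizes (1, 2, 3, 6) follow the 4-level pyramid, but the nearest-neighbour upsampling and the absence of the 1 × 1 channel-reduction convolutions are simplifications of the actual PSPNet module:

```python
import numpy as np

def pyramid_pooling(fmap, bins=(1, 2, 3, 6)):
    """Sketch of pyramid pooling on a single (H, W) feature map.

    Each level average-pools the map into bins x bins regions and
    upsamples the result back to (H, W) by nearest-neighbour repeat;
    the levels are stacked with the original map as the global prior.
    """
    H, W = fmap.shape
    levels = [fmap]
    for b in bins:
        hs = np.array_split(np.arange(H), b)
        ws = np.array_split(np.arange(W), b)
        pooled = np.array([[fmap[np.ix_(hi, wi)].mean() for wi in ws]
                           for hi in hs])
        # nearest-neighbour upsample back to the input resolution
        up = np.repeat(np.repeat(pooled, [len(h) for h in hs], axis=0),
                       [len(w) for w in ws], axis=1)
        levels.append(up)
    return np.stack(levels)            # (1 + len(bins), H, W)

prior = pyramid_pooling(np.arange(36, dtype=float).reshape(6, 6))
```

The coarsest level (one bin) is the global average of the map, which is what lets the prior carry whole-image context alongside the local features.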

5 Conclusion

The proposed pyramid scene parsing network (PSPNet) [7] is useful for the semantic segmentation of both indoor and outdoor scenes, and it uses the pyramid pooling module to implement semantic segmentation. While semantic segmentation is useful for object detection and scene perception, PSPNet improves accuracy in the training and testing experiments on the public Cityscapes dataset, achieving a high mean accuracy of 84.6 and an intersection over union (IoU) of 0.8672.
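The intersection-over-union metric reported above amounts to the following per-class computation (shown on toy label maps, not the Cityscapes results):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over the classes present.

    pred, target: integer label maps of the same shape. Classes absent
    from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0, 1], [1, 1, 2]])
target = np.array([[0, 1, 1], [1, 1, 2]])
```

Here class 0 scores 1/2, class 1 scores 3/4, and class 2 scores 1/1, so the mean IoU is 0.75.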


P. Ravikiran and M. Chakkaravarthy

Fig. 3 Examples of PSPNet results on cityscapes dataset [8]

References

1. Yuan, Y., Huang, L., Guo, J., Zhang, C., Chen, X., Wang, J.: OCNet: object context network for scene parsing (2018). http://arxiv.org/abs/1809.00916
2. Li, B., Shi, Y., Qi, Z., Chen, Z.: A survey on semantic segmentation. In: IEEE International Conference on Data Mining Workshops (ICDMW), pp. 1233–1240 (2019). https://doi.org/10.1109/ICDMW.2018.00176
3. Wu, G., Li, Y.: CyclicNet: an alternately updated network for semantic segmentation. Multimedia Tools Appl. 80(2), 3213–3227 (2021). https://doi.org/10.1007/s11042-020-09791-9
4. Alam, M., Wang, J.F., Guangpei, C., Yunrong, L., Chen, Y.: Convolutional neural network for the semantic segmentation of remote sensing images. Mobile Netw. Appl. 26(1), 200–215 (2021). https://doi.org/10.1007/s11036-020-01703-3
5. Chen, L.-C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation (2017). http://arxiv.org/abs/1706.05587
6. Lateef, F., Ruichek, Y.: Survey on semantic segmentation using deep learning techniques. Neurocomputing 338(5), 321–348 (2019). https://doi.org/10.1016/j.neucom.2019.02.003
7. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), pp. 6230–6239 (2017). https://doi.org/10.1109/CVPR.2017.660


8. Ko, T.Y., Lee, S.H.: Novel method of semantic segmentation applicable to augmented reality. Sensors 20(6) (2020). https://doi.org/10.3390/s20061737
9. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018). https://doi.org/10.1109/TPAMI.2017.2699184
10. Fang, H., Lafarge, F.: Pyramid scene parsing network in 3D: improving semantic segmentation of point clouds with multi-scale contextual information. ISPRS J. Photogramm. Remote Sens. 154, 246–258 (2019). https://doi.org/10.1016/j.isprsjprs.2019.06.010
11. Anonymous: Towards bridging semantic gap to improve semantic segmentation. ICCV 2019 submission, 1(c), 1–5 (2019)
12. Thoma, M.: A survey of semantic segmentation, pp. 1–16 (2016). http://arxiv.org/abs/1602.06541
13. Yan, J., Zhong, Y., Fang, Y., Wang, Z., Ma, K.: Exposing semantic segmentation failures via maximum discrepancy competition. Int. J. Comput. Vision 129(5), 1768–1786 (2021). https://doi.org/10.1007/s11263-021-01450-2
14. Yuan, Y., Huang, L., Guo, J., Zhang, C., Chen, X., Wang, J.: OCNet: object context for semantic segmentation. Int. J. Comput. Vision 129(8), 2375–2398 (2021). https://doi.org/10.1007/s11263-021-01465-9

Fractional Approach for Controlling the Motion of a Spherical Robot

Quoc-Dong Hoang, Le Anh Tuan, Luan N. T. Huynh, and Tran Xuan Viet

Abstract Spherical robots are mobile robots that move by shifting their center of gravity to create the torque needed to roll. The angle of the body installed inside the robot determines the robot's posture and position. The robot has only one contact point with the floor when moving; therefore, an inappropriate control input signal can produce large-amplitude vibrations of the main body. This problem has a significant impact on tracking control quality and causes awkward robot movements. To address the issue, this research proposes a new fractional backstepping control law that ensures both the accuracy and the stability of the given robot's movement. Simulation results compared with classical backstepping control illustrate the effectiveness of the work.

1 Introduction

The spherical rolling robot is a form of mobile robot with a variety of distinct advantages. A waterproof shell, for example, completely covers and protects the entire system, and the robot has only one point of contact with the ground. As a result, it is more adaptable and smoother than other mobile robots, and it can drive on a variety of surfaces and environments, including water.

Halme proposed the first prototype of a spherical-ball robot in 1996 [1]. Following that, the robot was improved and developed in a variety of ways. A special feature of the spherical robot is that, when moving, it must maintain a small angle of the body to create the rolling moment; in other words, the shaking angle of the body generates the energy essential for the robot's movement. Unlike other underactuated systems such as cranes [2, 3] or excavators [4, 5], this swing angle therefore needs to be controlled to keep it to a minimum while still ensuring movement, instead of removing it entirely. Most of the previous studies did not mention this angle but focused only on controlling and building the motion trajectory [6], except for a few studies such as [7].

Regarding existing studies of spherical robots, intelligent control [8], path planning [9], and hierarchical sliding mode [10] are all examples of controllers for such robot systems. Due to its robustness [11] and short finite-time convergence [12], sliding mode control [13, 14] has been attracting a growing number of researchers for a long time; in fractional sliding mode [12], these superior features are even more pronounced. These characteristics make it a good fit for the present system. This study proposes a 2D dynamic model of rolling motion control and designs a sliding mode [13] backstepping control using the underactuated model [3]. The tracking errors of the actuated and unactuated parts of the state variables are converged to the origin using Lyapunov stability; as a result, the robot is driven to its desired position and velocity, and the main body's vibrational angle is also reduced. The main contribution of this research is the development of a fractional backstepping control law (FBSC) for a spherical robot, and its effectiveness is shown by comparison with conventional backstepping control [7].

Q.-D. Hoang · L. A. Tuan
Institute of Mechanical Engineering, Vietnam Maritime University, Hai Phong, Vietnam
e-mail: [email protected]

L. N. T. Huynh (B)
Thu Dau Mot University, Binh Duong, Vietnam
e-mail: [email protected]

T. X. Viet
Faculty of Electrical and Electronic Engineering, Vietnam Maritime University, Hai Phong, Vietnam
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_17

2 System Modeling

Figure 1 illustrates the coordinate system for the underactuated dynamic model, in which θss is the spherical shell's angle, θw is the wheel's angle, and θmd is the driving mechanism's angle in the vertical plane.

Fig. 1 Platform of spherical robot and the front view


The matrix form of the system dynamics is provided as

$$M(\chi_S)\ddot{\chi}_S + B\dot{\chi}_S + C(\chi_S,\dot{\chi}_S)\dot{\chi}_S + G(\chi_S) = U, \tag{1}$$

where $\chi_S = [\theta_w, \theta_{md}]^T$, $\dot{\chi}_S = [\dot{\theta}_w, \dot{\theta}_{md}]^T$, and $\ddot{\chi}_S = [\ddot{\theta}_w, \ddot{\theta}_{md}]^T$ are the state-variable vectors, and $U = [u_w \;\; 0]^T$ is the system's input signal, with $u_w$ representing the torque applied to the wheel motor. Here, the matrices in the model are computed as

$$M(\chi_S) = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix},\quad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix},\quad C = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix},\quad G = \begin{bmatrix} 0 \\ g_2 \end{bmatrix}. \tag{2}$$

The elements of the mass matrix $M(\chi_S)$ of the dynamic system are given by

$$m_{11} = J_w + \frac{k_w^2 m_{md} r_w^2 + k_w^2 m_{ss} r_w^2 + k_w^2 m_w r_w^2}{k_{sh}^2} + \frac{J_{ss} k_w^2 r_w^2}{k_{ss}^2 r_{ss}^2},$$

$$m_{12} = m_{21} = \frac{1}{k_{sh}}\Big( k_w m_w r_w\big(r_{ss} + \cos\theta_{md}\,(r_w - k_{sh} r_{ss})\big) + k_w m_{md} r_w\big(r_{ss} - d\cos\theta_{md}\big) + k_w m_{or} r_w r_{ss} + k_w m_{ss} r_{ss} r_w \Big) + \frac{J_{ss} k_w r_w}{k_{sh} r_{ss}},$$

$$m_{22} = \Big( k_{sh}^2 r_{ss}^3 m_w k_{ss}^2 + J_{md} k_{ss}^2 k_{sh} r_{ss}^2 + k_{sh}^2 m_{md} k_{ss}^2 r_{ss}^3 + k_{sh}^2 m_w k_{ss}^2 r_w^3 - 2 k_{ss}^2 k_{sh} m_w r_{ss}^3 \cos\theta_{md} + d^2 k_{sh}^2 k_{ss}^2 r_{ss} + k_{ss}^2 k_{sh} m_w r_{ss} r_w^2 - 2 k_{ss}^2 k_{sh} m_w r_w r_{ss}^2 - 2 d k_{sh} k_{ss}^2 m_{md} r_{ss}^2 \cos\theta_{md} + J_{ss} k_{sh} r_{ss}^2 + 2 k_{ss}^2 k_{sh} m_w r_{ss}^2 r_w \cos\theta_{md} + k_{sh} m_{ss} k_{ss}^2 r_{ss}^3 \Big)\big(k_{sh}^2 k_{ss}^2 r_{ss}^2\big)^{-1}.$$

The components of the damping matrix $B$ are

$$b_{11} = b_w + \frac{b_{ss} k_w^2 r_w^2}{k_{ss}^2 k_{sh}^2 r_{ss}^2}, \qquad b_{12} = b_{21} = \frac{b_{ss} k_w r_w}{k_{ss}^2 k_{sh} r_{ss}}, \qquad b_{22} = b_{md} + \frac{b_{ss}}{k_{ss}^2}.$$

The elements $c_{12}$ and $c_{22}$ of the matrix of Coriolis and centrifugal coefficients $C(\chi_S, \dot{\chi}_S)$ are

$$c_{12} = \frac{1}{k_{sh}}\Big( d k_w m_{md} r_w \dot{\theta}_{md} \sin\theta_{md} - k_w m_w r_w \dot{\theta}_{md} \sin\theta_{md}\,(r_w - k_{sh} r_{ss}) \Big),$$

$$c_{22} = \frac{1}{k_{ss}^2 k_{sh} r_{ss}}\Big( b_{ss} k_{sh} r_{ss} + b_{md} k_{ss}^2 k_{sh} r_{ss} + k_{ss}^2 k_{sh} m_w r_{ss}^3 \dot{\theta}_{md} \sin\theta_{md} + d k_{ss}^2 k_{sh} m_{md} r_{ss}^2 \dot{\theta}_{md} \sin\theta_{md} - k_{ss}^2 k_{sh} m_w r_{ss}^2 \dot{\theta}_{md} \sin\theta_{md} \Big) - \frac{b_{ss}}{k_{ss}^2} - b_{md}.$$

The gravity vector $G(\chi_S)$ is given by


$$g_2 = \frac{1}{k_{sh} k_{ss}^2 r_{ss}}\Big( g k_{ss}^2 k_{sh} m_w r_{ss}^2 \sin\theta_{md} + d g k_{ss}^2 k_{sh} m_{md} r_{ss} \sin\theta_{md} - g k_{ss}^2 k_{sh} m_w r_{ss} r_w \sin\theta_{md} \Big).$$

Here, $r_w$ and $r_{ss}$ are the radii of the wheels and the outer shell, $d$ is the distance between the ball's center and the center of the body, and $r_{md}$ is the distance between the center of the ball and the axis of the two wheels. The masses of the main body, the wheel, and the shell are denoted by $m_{md}$, $m_w$, and $m_{ss}$, and the moments of inertia of the body, the wheel, and the outer ball by $J_{md}$, $J_w$, and $J_{ss}$, respectively. $k_{ss}$, $k_w$, and $k_{sh}$ are coefficients depending on the surface's properties and on the ratio of the inner rolling radius to the radius of the spherical shell.
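The structure of Eq. (1), solved for the accelerations, can be sketched numerically. All matrix values below are made-up placeholders rather than the robot's identified parameters; the point is the shape of the computation:

```python
import numpy as np

# Eq. (1) rearranged: chi_dd = M^{-1} (U - B chi_d - C chi_d - G).
M = np.array([[0.9, 0.2], [0.2, 0.5]])      # mass matrix M(chi_S), placeholder
B = np.array([[0.05, 0.01], [0.01, 0.04]])  # damping matrix, placeholder
C = np.array([[0.0, 0.02], [0.0, 0.03]])    # Coriolis/centrifugal, placeholder
G = np.array([0.0, 0.6])                    # gravity vector [0, g2], placeholder
U = np.array([1.0, 0.0])                    # input [u_w, 0]

chi_d = np.array([0.3, -0.1])               # current velocities [theta_w_dot, theta_md_dot]
chi_dd = np.linalg.solve(M, U - (B + C) @ chi_d - G)
```

Using `np.linalg.solve` instead of explicitly inverting `M` is the usual choice: it is cheaper and numerically better conditioned for a small symmetric mass matrix.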

3 Controller Design

In this section, we propose a backstepping control strategy based on fractional-order calculus. It not only solves the nonlinear control problem for trajectory tracking but also guarantees the elimination or minimization of the unactuated states, the biggest challenge in controlling an underactuated system. The FBSC scheme is illustrated in Fig. 2. The fundamentals of fractional calculus are provided in [15, 16]. To design the controller, the entire system (1) is decoupled as follows:

$$m_{11}(\chi_S)\ddot{\chi}_u + m_{12}(\chi_S)\ddot{\chi}_a + b_{11}\dot{\chi}_u + b_{12}\dot{\chi}_a + c_{11}(\chi_S,\dot{\chi}_S)\dot{\chi}_u + c_{12}(\chi_S,\dot{\chi}_S)\dot{\chi}_a = 0, \tag{3}$$

$$m_{21}(\chi_S)\ddot{\chi}_u + m_{22}(\chi_S)\ddot{\chi}_a + b_{21}\dot{\chi}_u + b_{22}\dot{\chi}_a + c_{21}(\chi_S,\dot{\chi}_S)\dot{\chi}_u + c_{22}(\chi_S,\dot{\chi}_S)\dot{\chi}_a + g_2(\chi_S) = u_w, \tag{4}$$

Fig. 2 Diagram of FBSC


where the state vectors of the actuated and unactuated parts are, respectively, defined as $\chi_a = \theta_w$ and $\chi_u = \theta_{md}$. The targets of the controller design are not only to reduce the system vibrations but also to drive the robot along its desired trajectories; that is, the system states $\chi_S$, comprising $\chi_a$ and $\chi_u$, should be forced to their desired values $\chi_{at}$ and $\chi_{ut}$, respectively. The fractional sliding function $\zeta_f(t)$ is selected as

$$\zeta_f(t) = D_t^{\alpha}(\chi_a - \chi_{at}) + \lambda_1(\chi_a - \chi_{at}) + \sigma\big(D_t^{\alpha}\chi_u + \lambda_2\chi_u\big), \tag{5}$$

where $\lambda_1 > 0$, $\sigma > 0$, and $\lambda_2 > 0$ are design constants. $D_t^{\alpha} g(t)$ is the fractional-order derivative of the function $g(t)$ [12, 16], computed in the sense of Riemann–Liouville and Caputo, respectively, as

$$D_t^{\alpha} g(t) = \frac{1}{\Gamma(m-\alpha)}\frac{d^m}{dt^m}\int_0^t (t-\tau)^{m-\alpha-1}\, g(\tau)\, d\tau \tag{6}$$

and

$${}^{C}D_t^{\alpha} g(t) = \frac{1}{\Gamma(m-\alpha)}\int_0^t (t-\tau)^{m-\alpha-1}\, g^{(m)}(\tau)\, d\tau \tag{7}$$

with $\alpha \in [m-1, m)$ and $m \in \mathbb{Z}^{+}$. The backstepping sliding mode control law based on fractional-order calculus is designed as

$$u_w = q_1(\chi_S,\dot{\chi}_S)\dot{\chi}_u + q_2(\chi_S,\dot{\chi}_S)\dot{\chi}_a + n(\chi_S) + p(\chi_S)\ddot{\chi}_{at} + p(\chi_S) D_t^{2-2\alpha}\Big( -\lambda_1 D_t^{\alpha}\tilde{\chi}_a - \sigma\big(D_t^{2\alpha}\chi_u + \lambda_2 D_t^{\alpha}\chi_u\big) - \lambda_1\zeta_f - \lambda\,\mathrm{sgn}\,\zeta_f \Big), \tag{8}$$

which stabilizes the system model (3), (4) asymptotically. Here, $\tilde{\chi}_a = \chi_a - \chi_{at}$ denotes the tracking error; $\lambda > 0$ is a controller parameter; and the functions $p(\chi_S)$, $q_1(\chi_S,\dot{\chi}_S)$, $q_2(\chi_S,\dot{\chi}_S)$, and $n(\chi_S)$ are provided by

$$p(\chi_S) = m_{22}(\chi_S) - m_{21}(\chi_S) m_{11}^{-1}(\chi_S) m_{12}(\chi_S),$$

$$n(\chi_S) = -m_{21}(\chi_S) m_{11}^{-1}(\chi_S) g_1(\chi_S) + g_2(\chi_S),$$

$$q_1(\chi_S,\dot{\chi}_S) = -m_{21}(\chi_S) m_{11}^{-1}(\chi_S) b_{11} - m_{21}(\chi_S) m_{11}^{-1}(\chi_S) c_{11}(\chi_S,\dot{\chi}_S) + b_{21} + c_{21}(\chi_S,\dot{\chi}_S),$$

$$q_2(\chi_S,\dot{\chi}_S) = -m_{21}(\chi_S) m_{11}^{-1}(\chi_S) b_{12} - m_{21}(\chi_S) m_{11}^{-1}(\chi_S) c_{12}(\chi_S,\dot{\chi}_S) + b_{22} + c_{22}(\chi_S,\dot{\chi}_S).$$

To investigate the system stability, an auxiliary Lyapunov function is chosen as

$$\Upsilon_0(t) = 0.5\,\tilde{\chi}_a^2(t). \tag{9}$$

The fractional derivative of order $\alpha$ of this function is

$$D_t^{\alpha}\Upsilon_0(t) = D_t^{\alpha}\big(0.5\,\tilde{\chi}_a^2(t)\big). \tag{10}$$

A virtual input is further defined as

$$D_t^{\alpha}\tilde{\chi}_a(t) = \zeta_f - \lambda_1\tilde{\chi}_a(t). \tag{11}$$

Based on [16],

$$D_t^{\alpha}\Upsilon_0(t) \le 0.5\,\tilde{\chi}_a\big(\zeta_f - \lambda_1\tilde{\chi}_a(t)\big). \tag{12}$$

Once $\zeta_f$ converges to the origin, i.e., $D_t^{\alpha}\tilde{\chi}_a = -\lambda_1\tilde{\chi}_a$, we have $D_t^{\alpha}\Upsilon_0(t) \le -0.5\,\lambda_1\tilde{\chi}_a^2 \le 0$, which satisfies the stability condition in [16]. The Lyapunov function is then asymptotically stable around the equilibrium point $\tilde{\chi}_a = 0$. The primary Lyapunov function is selected as

$$\Upsilon(t) = \Upsilon_0(t) + 0.5\,\zeta_f^2 \ge 0. \tag{13}$$

Its $\alpha$-order fractional derivative is $D_t^{\alpha}\Upsilon(t) = D_t^{\alpha}\Upsilon_0(t) + D_t^{\alpha}\big(0.5\,\zeta_f^2\big)$. We have

$$D_t^{\alpha}\Upsilon(t) \le -0.5\,\lambda_1\tilde{\chi}_a^2 + D_t^{\alpha}\big(0.5\,\zeta_f^T(t)\zeta_f(t)\big) \le -0.5\,\lambda_1\tilde{\chi}_a^2 + \zeta_f^T D_t^{\alpha}\zeta_f(t). \tag{14}$$

Applying the control law (8), the $\alpha$-order fractional derivative of the selected Lyapunov function (13) is calculated as

$$D_t^{\alpha}\Upsilon(t) \le -0.5\,\lambda_1\tilde{\chi}_a^2 + \zeta_f D_t^{\alpha}\Big( D_t^{2\alpha-2}\ddot{\tilde{\chi}}_a + \lambda_1 D_t^{\alpha}\tilde{\chi}_a + \sigma\big(D_t^{2\alpha}\chi_u + \lambda_2 D_t^{\alpha}\chi_u\big) \Big) \le -0.5\,\lambda_1\tilde{\chi}_a^2 + \zeta_f\big(-\lambda_1\zeta_f - \lambda\,\mathrm{sgn}\,\zeta_f\big). \tag{15}$$

Using the condition (11), the following can be computed:

$$D_t^{\alpha}\Upsilon(t) \le -0.5\,\lambda_1\tilde{\chi}_a^2 - \lambda_1\zeta_f^2 - \lambda\,\zeta_f\,\mathrm{sgn}\,\zeta_f. \tag{16}$$


Based on [16], both $\tilde{\chi}_a$ and $\zeta_f$ satisfy the asymptotic stability conditions. It can be inferred that $D_t^{\alpha}\zeta_f + \lambda_1\zeta_f + \lambda\,\mathrm{sgn}\,\zeta_f = 0$, so $\lim_{t\to\infty}\zeta_f = 0$ and $\lim_{t\to\infty}\tilde{\chi}_a = 0$. Besides, the unactuated part ensures $D_t^{\alpha}\chi_u + \lambda_2\chi_u = 0$, because $D_t^{\alpha}\Upsilon(t) \le -0.5\,\lambda_1\tilde{\chi}_a^2 - \lambda_1\zeta_f^2 - \lambda\,\zeta_f\,\mathrm{sgn}\,\zeta_f$ and $D_t^{\alpha}\Upsilon_0(t) \le -0.5\,\lambda_1\tilde{\chi}_a^2 \le 0$. On this basis [16], $\chi_u$ attains asymptotic stability around its equilibrium.
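In practice the fractional derivatives used throughout this section are evaluated numerically. One common scheme, equivalent to the Riemann–Liouville definition (6) for smooth functions, is the Grünwald–Letnikov sum, sketched below (the step size and test function are illustrative):

```python
import math

def gl_fractional_derivative(f, t, alpha, h=1e-3):
    """Grunwald-Letnikov approximation of D_t^alpha f at time t.

    D^alpha f(t) ~ h^(-alpha) * sum_k (-1)^k * binom(alpha, k) * f(t - k*h).
    The signed binomial coefficients are generated by the recurrence
    c_{k+1} = c_k * (k - alpha) / (k + 1), starting from c_0 = 1.
    """
    N = int(round(t / h))
    acc, coeff = 0.0, 1.0
    for k in range(N + 1):
        acc += coeff * f(t - k * h)
        coeff *= (k - alpha) / (k + 1)
    return acc / h ** alpha

# check against the closed form: D^0.5 of f(t) = t is t^0.5 / Gamma(1.5)
approx = gl_fractional_derivative(lambda s: s, t=1.0, alpha=0.5)
exact = 1.0 / math.gamma(1.5)
```

The scheme is first-order accurate in the step size h, which is usually sufficient for simulating fractional controllers such as the FBSC above.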

4 Simulation Results

The results of the fractional backstepping control (FBSC) are illustrated in comparison with the conventional backstepping approach (CBSC) in Figs. 3, 4, 5, 6, 7, and 8. The rolling motion tracking control of the spherical shell is shown in Figs. 3 and 4. The two controllers yield different control performances: with FBSC (α = 1.05), the tracking errors and the oscillations are significantly reduced, and the convergence time of the errors to zero is also shortened, as shown in Figs. 7 and 8. The results are similar for the controls of the wheels and the main body, as shown in Figs. 5 and 6. For a quantitative comparison of the controllers, the RMSE of the tracking errors is used, with the statistics provided in Table 1.

Fig. 3 Spherical shell's angle

Fig. 4 Shell’s angular velocity


Fig. 5 Wheel’s angle

Fig. 6 Wheel’s angular velocity

Fig. 7 Body’s angle

Fig. 8 Body’s angular velocity

Table 1 RMSEs of controllers

State variable   RMSE CBSC   RMSE FBSC
θw               0.351       0.134
θ̇w               0.295       0.131
θmd              0.012       0.094
θ̇md              0.33        0.19
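The RMSE figures in Table 1 amount to the following computation, shown here on a synthetic tracking signal rather than the actual simulation data:

```python
import numpy as np

def rmse(reference, tracked):
    """Root-mean-square error between a reference trajectory and the
    tracked response, the comparison standard used for Table 1."""
    reference, tracked = np.asarray(reference), np.asarray(tracked)
    return float(np.sqrt(np.mean((reference - tracked) ** 2)))

t = np.linspace(0.0, 10.0, 501)
reference = np.sin(t)                 # desired trajectory (synthetic)
tracked = np.sin(t) + 0.1             # response with a constant 0.1 offset
```

A constant offset of 0.1 between the signals yields an RMSE of exactly 0.1, which is how a single scalar summarizes the tracking error of each controller.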


5 Conclusion

In this research, a backstepping control law based on fractional-order fundamentals is provided for a two-dimensional model of a spherical robot. The oscillations generated by the unactuated states of the system model are minimized, and the tracking performance is enhanced. Numerical computation results and comparisons with conventional backstepping control demonstrate the effectiveness of the designed controller.

Acknowledgments This research was supported by the Institute of Mechanical Engineering, Vietnam Maritime University.

References

1. Halme, A., Schonberg, T., Wang, Y.: Motion control of a spherical mobile robot. In: Proceedings of the 4th IEEE International Workshop on Advanced Motion Control (AMC'96-MIE), vol. 1, pp. 259–264 (1996)
2. Dong, H.Q., Lee, S., Ba, P.D.: Double-loop control with proportional-integral and partial feedback linearization for a 3D gantry crane. In: 2017 17th International Conference on Control, Automation and Systems (ICCAS), pp. 1206–1211 (2017)
3. Cuong, H.M., Dong, H.Q., Van Trieu, P., Tuan, L.A.: Adaptive fractional-order terminal sliding mode control of rubber-tired gantry cranes with uncertainties and unknown disturbances. Mech. Syst. Signal Process. 154, 107601 (2021)
4. Hoang, Q.-D., Park, J., Lee, S.-G.: Combined feedback linearization and sliding mode control for vibration suppression of a robotic excavator on an elastic foundation. J. Vib. Control, 1077546320926898 (2020)
5. Hoang, Q.-D., Park, J.-G., Lee, S.-G., Ryu, J.-K., Rosas-Cervantes, V.A.: Aggregated hierarchical sliding mode control for vibration suppression of an excavator on an elastic foundation. Int. J. Precis. Eng. Manuf. (2020)
6. Chowdhury, A.R., Soh, G.S., Foong, S.H., Wood, K.L.: Experiments in robust path following control of a rolling and spinning robot on outdoor surfaces. Rob. Auton. Syst. 106, 140–151 (2018)
7. Dong, H.Q., Lee, S.-G., Woo, S.H., Le, T.-A.: Back-stepping approach for rolling motion control of an under-actuated two-wheel spherical robot. In: 2020 20th International Conference on Control, Automation and Systems (ICCAS), pp. 233–238 (2020)
8. Kayacan, E., Kayacan, E., Ramon, H., Saeys, W.: Adaptive neuro-fuzzy control of a spherical rolling robot using sliding-mode-control-theory-based online learning algorithm. IEEE Trans. Cybern. 43(1), 170–179 (2013)
9. Bhattacharya, S., Agrawal, S.K.: Design, experiments and motion planning of a spherical rolling robot. In: Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065), vol. 2, pp. 1207–1212 (2000)
10. Dong, H.Q., et al.: Kinematic model-based integral sliding mode control for a spherical robot. In: 2019 19th International Conference on Control, Automation and Systems (ICCAS), pp. 635–640 (2019)
11. Tuan, L.A., Joo, Y.H., Tien, L.Q., Duong, P.X.: Adaptive neural network second-order sliding mode control of dual arm robots. Int. J. Control. Autom. Syst. 15(6), 2883–2891 (2017)


12. Le, A.T.: Neural observer and adaptive fractional-order back-stepping fast terminal sliding mode control of RTG cranes. IEEE Trans. Ind. Electron. (2020)
13. Hoang, Q.-D., Lee, S.-G., Dugarjav, B.: Super-twisting observer-based integral sliding mode control for tracking the rapid acceleration of a piston in a hybrid electro-hydraulic and pneumatic system. Asian J. Control 21(1), 483–498 (2019)
14. Tuan, L.A., Cuong, H.M., Van Trieu, P., Nho, L.C., Thuan, V.D., Anh, L.V.: Adaptive neural network sliding mode control of shipboard container cranes considering actuator backlash. Mech. Syst. Signal Process. 112, 233–250 (2018)
15. Li, Y., Chen, Y., Podlubny, I.: Mittag–Leffler stability of fractional order nonlinear dynamic systems. Automatica 45(8), 1965–1969 (2009)
16. Trigeassou, J.C., Maamri, N.: Analysis, Modeling and Stability of Fractional Order Differential Systems 2: The Infinite State Approach. Wiley (2019)

Progression of Metamaterial for Microstrip Antenna Applications: A Review

Renju Panicker and Sellakkutti Suganthi

Abstract This article provides an overview of the evolution of metamaterials (MTM) and all the aspects related to metamaterial development for antenna applications. It will be a useful collection of information for antenna researchers working on metamaterial applications. It gives an insight into the various metamaterial structures utilized along with miniature antenna designs. Different types of design parameters studied by previous researchers are showcased to give a better perception of metamaterial usage.

1 Introduction

Metamaterials are an extraordinary field of research dealing with man-made, artificially engineered materials. A metamaterial is a miraculous optical medium that interacts with light unlike any other material. In 1968, Veselago [1] studied the characteristic properties of dielectric permittivity and magnetic permeability, and in 2001 the negative refraction property opened new avenues of research. The first application of MTM was described by Pendry [2] in 2000, where MTM can provide negative permittivity and permeability. MTM is an artificially engineered medium loaded with different types of antennas. This paper studies various metamaterials and metasurfaces that contributed to the improvement of low-profile antennas, and explains the progression of MTM with insight into previous works. Figure 1 explains how the refractive nature of MTM differs from that of conventional materials.

R. Panicker (B) Christ University, Bangalore, India S. Suganthi RF & Microwave Research Laboratory, Electronics, and Communication Engineering, Christ University, Bangalore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_18



Fig. 1 Metamaterial characteristics based on Epsilon and Mu value [3]

2 Evolution of Metamaterials

Metamaterials are artificially engineered dielectrics having a negative refractive index at specific frequencies and are used in antennas. Studies of metamaterials using light for communication created photonic materials. In the twentieth century, Pendry [2] demonstrated that varying the chemical composition of a dielectric changes its properties; such materials were titled with the Greek word "meta," meaning "beyond." The different names for MTM are left-handed material (LHM), negative permeability metamaterial (MNG), epsilon-negative metamaterial (ENG), double negative metamaterial (DNG), and negative refractive index (NRI) material. Metamaterial-loaded transmission lines, split ring resonators (SRR)/complementary SRR, MTM ground planes, MTM absorbers, and MTM for cloaking are the popular MTM used for MSA applications. Split ring resonators have been researched with various patterns, such as circular SRR and rectangular SRR, which are reviewed thoroughly in this paper and in Table 1. Caloz and Itoh [4] in 2002 explained that in a left-handed material (LHM), light propagates in reverse to the energy flow direction, leading to a negative index of refraction. There has also been an evolution of metamaterials dimension-wise: Engheta [5], in the same year 2002, gave an idea of a 1D cavity resonator, and Ishimaru et al. [6] in 2003 continued the work in 3D by presenting non-magnetic inclusions. Ozbay [7] in 2003 studied the propagation characteristics measurement of


Table 1 Summary of various MTM-based antennas

Refs.  Model/shape           Operating frequency (GHz)  Substrate (εr)            Characteristics            Software used         Outcome
[4]    Lumped-element LH-TL  3                          RT Duroid (2.2)           Left-handed (LH) material  —                     Compact chip configuration
[9]    Patch-loaded MTM      0.5                        FR4 epoxy (4.4)           MNG                        CST Microwave Studio  Miniaturized
[19]   H-SRR                 20                         FR4 epoxy (4.4)           NRI                        HFSS                  Appreciable gain
[13]   R-SRR                 2.4                        Rogers Duroid 5880 (2.2)  —                          HFSS v18.2            High gain and bandwidth
[15]   Pi-model              2.5                        RO4003 (3.38)             RH-LH                      HFSS                  Increased antenna efficiency, multiband
[14]   C-SRR                 2.4                        FR4 epoxy (4.4)           DNG                        HFSS v18              Improved gain

LHM in air in the microwave frequency range, the first of its type. The metamaterial was an SRR, as in Fig. 2. In the DNG region, an amplitude of 0.3 dB per unit cell was measured, the highest measured so far. Ziolkowski [8] in 2003 exhibited the design, fabrication, and testing of different metamaterials (especially DNG) in the X-band, as in Fig. 3. In 2006, Bilotti et al. [9] demonstrated that if the patch size is minimized, then the antenna parameters (gain, bandwidth, radiation pattern) may degrade along with a decrease in surface waves. The MTM loaded in the substrate has a core with a negative real part of μ. A dimension of λ/10 and a split ring resonator (SRR) are implemented, as shown in Fig. 4a, b. Fig. 2 SRR made of copper [7]


Fig. 3 Nonplanar MTM geometry [8]

Fig. 4 a SRRs are included to study the MNG behavior, b circular patch excited by coaxial probe [9]

Palandoken et al. [10] in 2009 proposed a compact novel MSA having a 6-unit-cell (2 × 3) array of left-handed material (LHM), with a dipole element as the LHM unit cell, as shown in Fig. 5. In 2010, Palandoken et al. [11] proposed a miniaturized antenna using a fractal MTM with negative permittivity. Suganthi et al. [12] in 2012 presented similar work on C-SRR but using a Hilbert curve fractal antenna (HCFA) on an FR4 substrate; this structure is also used as a defected ground structure (DGS). Again in 2018, Suganthi et al. [13] proposed a hexagonal patch antenna using SRR fabricated on a Rogers Duroid 5880 substrate (2.2) operating at 2.4 GHz (Fig. 6). Shashikumar and Suganthi in 2020 proposed a novel hybrid metamaterial unit cell [14], as in Fig. 7, using an FR4 substrate with five SRRs at the bottom of the antenna and a hexagonal ring made of six triangles, which showed double negative (DNG) properties at 2.45 GHz.


Fig. 5 Array of left-handed material (LHM) and a dipole element [11]

Fig. 6 Hexagonal patch antenna a front view, b rear view [13]

Fig. 7 Hexagonal ring made of six triangles [14]

Karthik et al. [15] in 2020 discuss an RH-LH dual-band rectenna, as in Fig. 8. The operating frequency is 2.5 GHz for Wi-Fi, and the Wi-Max operating frequency is 3.6 GHz. The metamaterial provides multiband/broadband operation and increases antenna efficiency. An omnidirectional pattern is observed at both bands. The harvested power is

$$P_{harvested} = P_t G_t G_r \left(\frac{\lambda}{4\pi d}\right)^2 \left(\frac{\eta}{100}\right) \tag{1}$$
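Equation (1) can be evaluated directly. The sketch below uses illustrative values (unity gains, 1 m range, 50% rectifier efficiency), not measured rectenna figures:

```python
import math

def harvested_power(p_t, g_t, g_r, wavelength, d, efficiency_pct):
    """Friis-based harvested power of Eq. (1):
    P = P_t * G_t * G_r * (lambda / (4 pi d))^2 * (eta / 100).
    Gains are linear (not in dB); efficiency_pct is the rectifier
    conversion efficiency in percent.
    """
    return (p_t * g_t * g_r
            * (wavelength / (4 * math.pi * d)) ** 2
            * (efficiency_pct / 100.0))

# 2.5 GHz (lambda = 0.12 m), 1 W transmit, unity gains, 1 m range, 50 %
p = harvested_power(1.0, 1.0, 1.0, 0.12, 1.0, 50.0)
```

The inverse-square factor dominates: doubling the distance d cuts the harvested power by a factor of four, which is why rectenna efficiency matters so much at range.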


Fig. 8 RH-LH dual-band rectenna [15]

A Skyworks SMS-7630 Schottky diode is used for the voltage doubler circuit, and the harvested power can be calculated using the Friis transmission equation shown in Eq. (1). Metamaterials can also be used as absorbers (MMA): Venneri et al. [16] in 2017 proposed a unit-cell MTM absorber using a Minkowski fractal geometry to obtain a miniaturized size, as in Fig. 9. The structure is simulated using the method of moments (MoM). This proposal is useful for indoor applications to avoid multipath in wireless systems. Kaur et al. [17] in 2020 proposed an MMA, as in Fig. 10, which is very compact and useful for UWB. The design consists of an inverted L-shape and a diagonally placed rectangular shape. This design is useful in stealth technology, thermal imaging, etc.

Fig. 9 MTM absorber with unit cell structure [16]


Fig. 10 Proposed metamaterial absorber [17]

3 Nicolson-Ross-Weir (NRW) Method - RF Material Measurement

The material properties of a metamaterial can be retrieved using the NRW method [18]. The NRW method is based on measuring the reflection from, and transmission through, a homogeneous sample of isotropic material under specified illumination conditions. NRW provides a direct calculation of both ε (permittivity) and μ (permeability) from the S-parameters, since the propagation of an electromagnetic field through objects depends on μ and ε.

3.1 Mathematical Formulation - Parameter Retrieval Method

The parameter retrieval method [18] uses the S-parameters to calculate the complex μ and ε of an MTM unit element. The S-parameters are given by Eqs. (2) and (3):

$$S_{11} = \frac{R_{01}\left(1 - e^{j2nkd}\right)}{1 - R_{01}^2\, e^{j2nkd}} \tag{2}$$

$$S_{12} = \frac{\left(1 - R_{01}^2\right) e^{jnkd}}{1 - R_{01}^2\, e^{j2nkd}} \tag{3}$$
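A minimal sketch of the NRW-style retrieval, assuming the slab model of Eqs. (2) and (3) with R01 = (z − 1)/(z + 1) and the principal logarithm branch (real extractions must handle branch ambiguity and lossy samples more carefully), is:

```python
import numpy as np

def s_params(n, z, k, d):
    """Forward slab model, Eqs. (2)-(3), with R01 = (z - 1)/(z + 1)."""
    R01 = (z - 1) / (z + 1)
    e = np.exp(1j * n * k * d)                 # e^{jnkd}
    denom = 1 - R01**2 * e**2
    return R01 * (1 - e**2) / denom, (1 - R01**2) * e / denom

def nrw_retrieve(S11, S12, k, d):
    """Recover eps and mu from S11, S12 (principal branch assumed)."""
    # wave impedance from the standard closed form
    z = np.sqrt(((1 + S11)**2 - S12**2) / ((1 - S11)**2 - S12**2))
    # isolate e^{jnkd}, then the refractive index n
    e = S12 / (1 - S11 * (z - 1) / (z + 1))
    n = np.log(e) / (1j * k * d)
    return n / z, n * z                        # eps = n/z, mu = n*z

k, d = 2 * np.pi / 0.3, 0.005                  # illustrative wavenumber, thickness
S11, S12 = s_params(n=2.0, z=0.5, k=k, d=d)    # slab with eps = 4, mu = 1
eps, mu = nrw_retrieve(S11, S12, k, d)
```

The round trip recovers ε = 4 and μ = 1 because nkd is kept well below π, so the principal branch of the logarithm is the correct one; for electrically thick or dispersive samples, branch tracking across frequency is required.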

4 Discussion

An overview of metamaterial-based MSAs is shown in Table 1.

5 Conclusion

This paper discusses an overview of MTM-based microstrip antennas of different shapes and dimensions, reviewing antenna parameters such as dimension,


operating frequency, characteristics, and software used. The evolution of MTM from LH/RH-TL to metasurfaces, and its usage as an absorber, is taken into consideration. From this study, we can conclude that MTM can be utilized with various antenna structures for UWB applications, energy harvesting, filtering, and absorption. Different MTM shapes are analyzed, including circular rings, split rings, MTM-based fractal antennas, and many more. This paper is useful for beginners to understand MTM antennas. MTM-based antennas improve the performance of MSA in terms of gain intensification, wider bandwidth, and reduced size. From the study undertaken, it is also learned that the majority of researchers adopted the Nicolson-Ross-Weir (NRW) method for verifying the metamaterial nature, utilized simulation software such as HFSS and CST Microwave Studio, and validated their simulated results with measurements using an anechoic chamber and a vector network analyzer. Hence, MTM-based antennas, especially SRR-based ones, have wide scope in research and commercial applications.

References

1. Veselago, V.G.: The electrodynamics of substances with simultaneously negative values of ε and μ. Soviet Phys. Uspekhi 10(4), 517–526 (1968)
2. Pendry, J.: Manipulating the near field with metamaterials. Opt. Photon. News, 33–37 (2004)
3. Smith, D.R.: Homogenization of metamaterials by field averaging. J. Opt. Soc. Am. B 23(3), 391 (2006)
4. Caloz, C., Itoh, T.: Application of the transmission line theory of left-handed (LH) materials to the realization of a microstrip LH line. IEEE (2002)
5. Engheta, N.: Metamaterials: fundamentals and applications in the microwave and optical regimes. Proc. IEEE 99(10) (2011)
6. Ishimaru, A., Lee, S.-W., Kuga, Y., Jandhyala, V.: Generalized constitutive relations for metamaterials based on the quasi-static Lorentz theory. IEEE Trans. Antennas Propag. 51(10) (2003)
7. Ozbay, E., Aydin, K., Cubukcu, E., Bayindir, M.: Transmission and reflection properties of composite double negative metamaterials in free space. IEEE Trans. Antennas Propag. 51(10) (2003)
8. Ziolkowski, R.W.: Design, fabrication, and testing of double negative metamaterials. IEEE Trans. Antennas Propag. 51(7) (2003)
9. Bilotti, F., Alù, A., Engheta, N., Vegni, L.: Miniaturized circular patch antenna with metamaterial loading. In: EuCAP 2006, Nice, France, ESA SP-626 (2006)
10. Palandoken, M., Grede, A., Henke, H.: Broadband microstrip antenna with left-handed metamaterials. IEEE Trans. Antennas Propag. 57(2) (2009)
11. Palandoken, M., et al.: Fractal negative-epsilon metamaterial. IEEE (2010)
12. Suganthi, S., Raghavan, S., Kumar, D., Hosimin Thilagar, S.: A compact Hilbert curve fractal antenna on metamaterial using CSRR. In: PIERS Proceedings, Kuala Lumpur, Malaysia, Mar 27–30 (2012)
13. Suganthi, S., Patil, D.D., Raghavan, S.: Performance of hexagonal patch antenna influenced by split ring resonator. In: 2019 TEQIP III Sponsored International Conference on Microwave Integrated Circuits, Photonics and Wireless Networks (IMICPW), Tiruchirappalli, India, pp. 278–282 (2019)
14. Shashi Kumar, D., Suganthi, S.: Novel hybrid metamaterial to improve the performance of a beamforming antenna. J. Phys. Conf. Ser. (2020)

Progression of Metamaterial for Microstrip Antenna …

201

15. Chandrasekaran, K.T., et al.: Compact dual-band metamaterial-based high-efficiency rectenna. IEEE Antennas Propag. Mag., 1045–9243 (2020) 16. Venneri, F., et al.: Fractal-shaped metamaterial absorbers for multi-reflections mitigation in the UHF-band. IEEE Antennas Wirel. Propag. Lett. (2017) 17. Kaur, M., Singh, H.S.: Experimental verification of super-compact ultra-wideband (UWB) polarization and incident angle-independent metamaterial absorber. Int. J. Microw. Wirel. Technol., 1–11 (2020) 18. Numan, A.b., Sharawi, M.S.: Extraction of material parameters for metamaterials using a full-wave simulator. IEEE Antennas Propag. Mag. 55(5) (2013) 19. Bose, S., Ramraj, M., Raghavan, S.: Design, analysis and verification of hexagon split ring resonator based negative index metamaterial. In: IEEE India Annual Conference 2012 (2012)

Natural Language Processing of Text-Based Metrics for Image Captioning Sudhakar Sengan , P. Vidya Sagar , N. P. Saravanan , K. Amarendra , Arjun Subburaj , S. Maheswari , and Rajasekar Rangasamy

Abstract Rapidly increasing interest has focused on natural language processing and computer vision research that automatically produces descriptive sentences for images. A key success factor is understanding the content of an image and generating reliable, appropriately organized descriptive sentences. In this paper, a model is proposed that uses the VGG-16 network to develop image descriptions and a long short-term memory (LSTM) network to compose appropriate words into generated captions. This paper determines the NLP method's usefulness using the Flickr8K dataset and demonstrates that the model delivers accurate performance as measured by the BLEU metric. The BLEU metric is an automated method for measuring a machine translation's performance by assessing the effectiveness of text translated from

S. Sengan (B)
Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India

P. V. Sagar · K. Amarendra
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, A.P., India
e-mail: [email protected]

N. P. Saravanan
Department of Computer Science and Engineering, Kongu Engineering College, Perundurai, Tamil Nadu 638060, India

A. Subburaj
Managing Director, Tranxit Technology Solutions Pvt. Ltd., Chennai, Tamil Nadu, India
e-mail: [email protected]

S. Maheswari
Department of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamil Nadu, India
e-mail: [email protected]

R. Rangasamy
Department of Computer Science and Engineering-AI & ML, School Engineering, Malla Reddy University, Hyderabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_19


one natural language into another. The results are validated with example images and their generated captions, which satisfy the NLP algorithm.

1 Introduction Generating a caption for an image is an artificial intelligence (AI) challenge in which a descriptive sentence must be produced for an input image. It requires both computer vision (CV), to recognize the image's content, and a natural language processing (NLP) algorithm, to express that content in a well-formed sentence. Image captioning has many applications, including editing tools, automated systems, image indexing, assistance for people with vision problems, and social media platforms such as Twitter and Facebook. State-of-the-art performance has recently been achieved with deep learning (DL) approaches, and DL methods have been shown to achieve optimal results on caption generation problems. Instead of requiring complex data preparation or a pipeline of precisely built models, a single end-to-end framework can be trained to predict a text caption. We use the conventional BLEU metric to evaluate our method's output on the Flickr8K dataset [1]. The research outcomes show that our proposed method performs better in the performance assessment than conventional image captioning models. The organization of this paper is as follows: Sect. 1 is the introduction, and existing research findings are described in Sect. 2. Section 3 presents text-based metrics for image captioning using the proposed NLP method and its implementation, and the results are analyzed and discussed in Sect. 4. Finally, Sect. 5 concludes the paper.

2 Related Work The problem of image captioning, and proposed improvements to it, has existed since the Internet's invention and its widespread use as a means to exchange images. Researchers from multiple backgrounds have put forward several learning algorithms and techniques. One influential approach implemented a deep neural network (NN) with non-saturating neurons and an efficient Graphics Processing Unit (GPU) implementation of the convolution operation, and succeeded in reducing overfitting by using a regularization strategy called dropout. A new database, called ImageNet, was introduced by Li [2]: a comprehensive set of images developed using the WordNet structure core. ImageNet organizes specific categories of images in a densely populated syntactic and semantic hierarchical structure. Besides learning visual information and language from their interaction, the method of [3] used images and sentence descriptions jointly. Their study presented a recurrent neural network model that generates new image descriptions conditioned on image features. The authors of [4] suggested a method for the automated production of a natural language description of an image, which would


significantly help facilitate the interpretation of images. The proposed multi-modal neural network approach, involving image detection and localization segments, is similar to the graphic social scheme and can automatically describe images' content. Because LSTM units are complex and inherently sequential across time, for machine translation and conditional image generation [5] suggested a convolutional neural network (CNN) instead. They experimented extensively on large datasets consisting of various content types with multiple network architectures and introduced a unique model demonstrating notable improvements in captioning accuracy over previously suggested models [6].

3 Proposed System The paper’s real work is that the features are removed from the metaphors using the pre-trained model of VGG-16 and then fed to the model of LSTM along with the training captions. The trained model can then create captions for any images provided to it [7]. The dataset used here is the FLICKR 8-K [8], consisting of around 8091 images for each image along with five captions (Fig. 1). Then the BLEU metric is used to find accuracy. The technique works by measuring n-grams in the candidate translation into n-grams in the comparison text, with each token 1-g/unigram and a large-gram comparative analysis for each word pair. The comparative study was carried out differently based on grammatical structure. A BLEU [9] systems integrator’s cognitive factor task compares the candidate’s n-grams and the number

Fig. 1 LSTM


of matches against the n-grams of the reference text. These matches are location-independent [10]. The more matches there are, the better the candidate translation is. The following dependencies are needed for this work (Figs. 2 and 3): Keras, TensorFlow GPU, pre-trained VGG-16 weights, NLTK, and Matplotlib [11].

Fig. 2 VGG-16

Fig. 3 System design


4 Implementation 4.1 Data Preprocessing Importing the appropriate libraries: importing all the relevant libraries, such as Keras, Pandas, NLTK, Matplotlib, and TensorFlow [12]. • Configuring: We now set up the GPU memory to be used for training, such that 95% of the available GPU memory is used. • Importing the dataset: The Flickr image dataset and its respective captions are imported [13]. • Understanding the dataset: A few images and their captions are plotted to learn more about the dataset [14]. • Cleaning captions for further processing: Punctuation, single-character words, and numerical values must be removed from the caption dataset before it is fed to the model, because an uncleaned dataset does not produce good captions for the photos [15]. • Caption analysis: The top 50 terms that appear in the cleaned dataset are plotted. • Adding start and end tokens to each caption: Start and end tokens must be added to every caption so that the model can tell where a caption begins and ends, since each caption has a different length.
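The cleaning and token-adding steps above can be sketched as follows. The `startseq`/`endseq` token names and the exact cleaning rules are illustrative assumptions, not taken from the paper:

```python
import re

def clean_caption(caption):
    """Lowercase, strip punctuation and digits, drop one-character words,
    and wrap the result in start/end tokens."""
    caption = caption.lower()
    caption = re.sub(r"[^a-z ]", " ", caption)          # remove punctuation/digits
    words = [w for w in caption.split() if len(w) > 1]  # drop 1-char tokens
    return "startseq " + " ".join(words) + " endseq"

print(clean_caption("A dog's 2 paws, in the grass!"))
# → "startseq dog paws in the grass endseq"
```

Wrapping every caption with the same start and end tokens lets the decoder learn a single, consistent signal for when to begin and stop generating words.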

4.2 Extract Features To extract features from the images, we load the VGG-16 model and its weights. The last layer of VGG-16 is omitted since we only use it to extract features rather than classify objects. The features of all images in the dataset are extracted: the VGG-16 model gives 4096 features for a 224 × 224 input image. • Plotting similar images from the dataset: We first build clusters to find which images belong together. PCA is then used to reduce the 4096 features obtained from VGG-16 feature extraction down to 2. The clusters are plotted, and a few examples from each cluster are shown.
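The 4096-to-2 reduction described above can be sketched with a generic PCA-via-SVD implementation in NumPy. This is not the authors' code; the random matrix below merely stands in for a batch of 4096-dimensional VGG-16 feature vectors:

```python
import numpy as np

def pca_2d(features):
    """Project high-dimensional feature vectors onto their top-2
    principal components."""
    X = features - features.mean(axis=0)           # center the data
    # SVD of the centered matrix: rows of Vt are the principal directions,
    # ordered by explained variance.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                            # shape (n_samples, 2)

rng = np.random.default_rng(0)
feats = rng.normal(size=(12, 16))                  # stand-in for (n, 4096) features
coords = pca_2d(feats)
print(coords.shape)                                # (12, 2)
```

The two resulting coordinates per image can then be scatter-plotted to visualize which images cluster together.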

4.3 Formatting Dataset • Merging Data: Merging the images and the training captions • Tokenizing: Tokenizing the captions into their words for further processing of the captions. Since the model does not take texts as input, it is essential to translate them into vectors.


• Data splitting: The dataset is split into training and test sets. • Padding: Each caption is padded to the maximum caption length. • Processing: The captions and photos are processed into the appropriate shape for the model.
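The tokenizing and padding steps can be illustrated with a minimal sketch. The vocabulary scheme (word-to-index mapping with index 0 reserved for padding) is a common convention assumed here for illustration:

```python
def build_vocab(captions):
    """Map each word to an integer index; 0 is reserved for padding."""
    words = sorted({w for cap in captions for w in cap.split()})
    return {w: i + 1 for i, w in enumerate(words)}

def encode_and_pad(caption, vocab, max_len):
    """Convert a caption to integer IDs and right-pad with zeros."""
    ids = [vocab[w] for w in caption.split()]
    return ids + [0] * (max_len - len(ids))

vocab = build_vocab(["startseq a dog endseq"])
print(encode_and_pad("startseq a dog endseq", vocab, 6))
# → [4, 1, 2, 3, 0, 0]
```

Padding every sequence to the same length is what allows variable-length captions to be batched into a single fixed-shape tensor for the model.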

4.4 Building Model LSTM is a kind of recurrent neural network (RNN). By default, an LSTM can store information in memory for a long time. LSTM is used for processing, forecasting, and classification built on time-sequence data. A recurrent neural network is ideal for sequences, lists, and other language processing issues, allowing information to be transferred from one step of the network to the next. An LSTM can learn long-term dependencies and operates exceptionally well on a wide range of problems. Although long-term dependencies are problematic for plain RNNs, LSTMs are specifically designed to avoid the long-term dependency problem. • Training the LSTM model: Train the LSTM model for several epochs, generating captions on a small set of images. After training finishes, we can test the model's output on some test images to determine whether the captions produced are good enough. If they are, we can generate captions for the full test dataset.

4.5 Evaluating the Model Performance We have to test the model's prediction capabilities on the test dataset after training. Forecast-accuracy metrics used for other tasks are not applicable here; for text assessment, we have a metric called the BLEU score. Bilingual Evaluation Understudy (BLEU) compares a candidate text with one or more reference texts. By generating captions for the entire test set and computing a BLEU score, we can check the quality of the produced captions and distinguish good captions from bad ones. Sometimes, due to the complex nature of the images, the generated captions are not acceptable.
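The BLEU computation can be sketched in plain Python. This follows the standard definition (clipped n-gram precision combined with a brevity penalty), not necessarily the authors' exact implementation, which presumably relies on NLTK:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        precisions.append(clipped / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: penalize candidates shorter than the closest reference.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * geo_mean

cand = "a dog runs through the grass".split()
print(bleu(cand, [cand]))   # a perfect match scores 1.0
```

Note that the matches are position-independent, which is exactly the property described in Sect. 3.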

5 Results and Discussion The image captioning framework was evaluated to validate that we could generate captions broadly comparable to human-written captions. We plot related images from the dataset using PCA (Figs. 4, 5, 6, 7 and 8).


Fig. 4 Dataset using PCA

Fig. 5 PCA embedding

6 Conclusion To produce the required captions for the pictures, the model has been successfully trained with suitable parameters. By fine-tuning the model with distinct hyperparameters, caption generation has been constantly improved. A higher BLEU score indicates that the generated captions are very close to the actual captions of the images. In the future, this work can be extended with a knowledge-driven


Fig. 6 Loss versus epoch of LSTM

Fig. 7 Bad caption

Fig. 8 Good caption

model for analysis. A semantic contextual model that can semi-automate the feature engineering process can also be developed as a stand-alone system.


References

1. Recht, B., Roelofs, R., Schmidt, L., Shankar, V.: Do ImageNet classifiers generalize to ImageNet? In: Proceedings of the 36th International Conference on Machine Learning, PMLR 97, pp. 5389–5400 (2019)
2. Li, H.: Learning to Rank for Information Retrieval and Natural Language Processing, 2nd edn. Morgan & Claypool (2014)
3. https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip
4. Isahara, H.: Resource-based natural language processing. In: International Conference on Natural Language Processing and Knowledge Engineering, Beijing, pp. 11–12 (2007)
5. Khan, N.S., Abid, A., Abid, K.: A novel natural language processing (NLP)-based machine translation model for English to Pakistan sign language translation. Cogn. Comput. 12, 748–765 (2020)
6. Kłosowski, P.: Deep learning for natural language processing and language modelling. In: Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Poznan, Poland, pp. 223–228 (2018)
7. Fiok, K., Karwowski, W., Gutierrez, E., Reza-Davahli, M.: Comparing the quality and speed of sentence classification with modern language models. Appl. Sci. 10, 3386 (2020)
8. Mahima, Y., Ginige, T.N.D.S.: Graph and natural language processing based recommendation system for choosing machine learning algorithms. In: 12th International Conference on Advanced Infocomm Technology (ICAIT), Macao, China, pp. 119–123 (2020)
9. Hassan, R.O., Mostafa, H.: Implementation of deep neural networks on FPGA-CPU platform using Xilinx SDSoC. Analog Integr. Circ. Sig. Process. 106, 399–408 (2021)
10. Riofrio, G.E.C., Sucunuta, M.E.E.: Using natural language processing in the design of a multilingual architecture for representation of language. In: 2nd International Conference on Software Technology and Engineering, San Juan, PR, USA, pp. V2-17–V2-21 (2010)
11. Dong, R., Liu, M., Li, F.: Multilayer convolutional feature aggregation algorithm for image retrieval (2019)
12. Aung, S.P.P., Pa, W.P., New, T.L.: Automatic Myanmar image captioning using CNN and LSTM-based language model. In: Language Resources and Evaluation Conference (LREC 2020), pp. 139–143 (2020)
13. Sehgal, S., Sharma, J., Chaudhary, N.: Generating image captions based on deep learning and natural language processing. In: 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), pp. 165–169 (2020)
14. Zhao, Z.Q., Zheng, P., Xu, S., Wu, X.: Object detection with deep learning: a review. IEEE Trans. Neural Netw. Learn. Syst. arXiv:1807.0551 (2019)
15. Zong, Z., Hong, C.: On application of natural language processing in machine translation. In: 3rd International Conference on Mechanical, Control and Computer Engineering (ICMCCE), Huhhot, China, pp. 506–510 (2018)

Li-Fi Enables Reliable Communication of VLT for Secured Data Exchange N. Venkata Ramana Gupta , G. Rajeshkumar , S. Arun Mozhi Selvi , Sudhakar Sengan , Arjun Subburaj , S. Priyadarsini , and Rajasekar Rangasamy

Abstract Li-Fi is a new method of sending and receiving secure data between nodes using light energy. When data are transferred to the receiver node, the sender node's light blinks continuously, much like a remote control. A Wi-Fi network, by contrast, has significant drawbacks such as bandwidth variance between nodes: some nodes achieve the nominal transmission speed while others suffer low data rates. Li-Fi addresses this issue by using the same light-based link on both the sender and receiver sides, so the receiver node can transfer data at high speed even when the sending device is far away. The data transfer speed will be 9600 bits per second, which is fast enough to prevent lagging. The Li-Fi system has mainly been used in

N. V. R. Gupta
Department of Computer Science and Engineering, Prasad V. Potluri Siddhartha Institute of Technology, Kanuru, Vijayawada, India

G. Rajeshkumar
Department of Computer Science and Engineering, Erode Sengunthar Engineering College, Perundurai, Erode, Tamil Nadu, India

S. A. M. Selvi
Department of Computer Science and Engineering, DMI St. John The Baptist University, Mangochi, Malawi

S. Sengan (B)
Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu, India

A. Subburaj
Managing Director, Tranxit Technology Solutions Pvt. Ltd., Chennai, Tamil Nadu, India
e-mail: [email protected]

S. Priyadarsini
Department of Computer Science and Engineering, P.S.R Engineering College, Sivakasi, Tamil Nadu, India

R. Rangasamy
Department of Computer Science and Engineering-AI & ML, School Engineering, Malla Reddy University, Hyderabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al.
(eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_20


wireless nodes and IoT devices with embedded technology, a colour sensor, and a microcontroller. The sender node's light source communicates with the destination node's colour sensor, which transforms the light into digital information. Such applications are mostly found in IoT devices serving user needs.

1 Introduction When an electric current flows forward through a PN-junction diode, it emits light. Charge-carrier recombination occurs in the LED: energy is released as light and heat when an electron from the N-side and a hole from the P-side combine. The LED is made of a colourless semiconductor material and emits light through the diode junction. LED bulbs use a microchip to turn electrical energy into light, which illuminates the tiny light sources to create visible light. Compared to conventional incandescent and fluorescent light bulbs, this process saves up to 90% on electricity. LEDs are focussed light sources and are, in fact, extremely adaptable. Low-voltage coloured LEDs are frequently used as decorations on everything from a strip of lights on the underside of a bar to commonplace devices like your phone, and large high-voltage ones are used in football stadiums to cover a large enough area so that both players and spectators can see everything that is going on [1]. In simple devices, such as communication between two nodes, a single light source and receiver are used. Multiple optical channels and the visual-spatial use of additional light sources are possible with the sensor [2]. High-bandwidth data communication is possible using visible light in visible light communication (VLC); the data are transmitted by modulating the intensity of light emitted by a light source (Fig. 1). M. George Craford, a bachelor's student of Holonyak at the university, is credited with inventing the first yellow LED and a brighter red LED in 1972. In 1976, Thomas P. Pearsall invented a higher-brightness light-emitting diode for fibre-optic cables in telecommunication services [3]. The prime target is to develop a message, sound

Fig. 1 Block diagram of Li-Fi


recording, or other monitoring applications using Li-Fi technology to deal with the bandwidth issue we have with RF signals. Li-Fi is used to create a more reliable, cost-effective, secure, and fast connection. The organization of the paper is as follows: Sect. 1 is the introduction, and the working model of the proposed work is presented in Sect. 2. Section 3 contains the proposed Li-Fi model, and the result analysis is explained in Sect. 4. Finally, Sect. 5 concludes the research.

2 Working Model For data transmission, Li-Fi employs visible light via overhead lighting. Visible light communications (VLCs) are used to transmit data, which makes this possible. A VLC module has two essential parts: a light source to transmit signals, and at least one device with a photodiode to receive them. The future of wireless communication lies with Li-Fi (light fidelity), which is similar to Wi-Fi in many ways; wireless high-speed networking and bidirectional communication are some of this technology's key features [4]. The transmitter and receiver are the two main components of a Li-Fi system. At the transmitter section, the input signal is modulated with specific timing before being sent as zeros and ones using LED light: LED bulb flashes represent the 0s and 1s. A photodiode at the receiver end receives the LED flashes, amplifies the signal, and produces the output [5]. The transmitter section of the Li-Fi platform's circuit comprises the input, a timer control circuit, and an LED light source. Text, voice, or other types of data can be fed into the transmitter. The timer circuit provides the time interval between bits, which are communicated to the receiver end in the form of LED flashes. The circuit is clear and straightforward: it consists of three transistors, a few resistors, three capacitors, one potentiometer, and one 1 W LED. The entire circuit is set up as a common-base amplifier (Fig. 1).

2.1 Energy Impart Section It is informative to see how a source of energy such as Li-Fi can help energy-harvesting sensors and assist the bidirectional transmission of data in this research (Fig. 2). The integration of Li-Fi and energy-harvesting embedded wireless sensors may enable appealing functionalities and provide substantial advantages in next-generation buildings [6]. The Li-Fi connectivity is envisioned as follows: (i) Because of the power delivered by the Li-Fi light source, the wireless-power-transfer harvesting sensors can immediately obtain enough power and thus do not face a power problem. The dynamic combination of LED and energy-harvesting sensors acts as an 'easy charger' for the wireless sensing system. (ii) When compared to existing


Fig. 2 Transmission unit

RF electromagnetic technologies, Li-Fi allows for much faster transmission speeds. Energy-harvesting sensors could thus efficiently deliver environmental parameters for control purposes, and future monitoring applications will benefit from the high-speed characteristics. (iii) Energy-harvesting sensors could aid in determining the LED lights' coverage area [7]. In VLC, the sender section processes the data and sends it to the receiver. The data are encoded into the light source using several components in the sender section, including [8–10]:

• The HyperTerminal programme installed on this node
• MAX232 IC
• Switching circuit
• Laser diode

The transmitter module is composed of a logic board that controls the electrical inputs and outputs of the LED, as well as a microcontroller that manages various LED functions. An optoisolator, an open-collector hex inverter, and other components are included on the PCB. A 9 V battery powers the transmitter. An optoisolator connects a standard RS-232 signal from a computer to the circuit's driver section. At the optoisolator's input, a resistor/diode configuration converts the voltage levels of an RS-232 signal to a signal appropriate for the optoisolator's LED [11, 12] (Fig. 3).

2.2 Receiver Section An NPN phototransistor serves as the receiving sensor. Even though the light frequency band is in the visible spectrum (670 nm), the phototransistor's wide response band (550–1050 nm) is large enough to detect the intense light


Fig. 3 Transmitter

Fig. 4 Receiver unit

beam. The phototransistor signal is cleaned up and squared by buffering it with a pair of Schmitt-trigger buffers. An embedded system converts the second buffer's output to an RS-232 signal (Fig. 4).

3 Implementation Li-Fi, or light fidelity, is a form of data transfer that uses visible light to send information. Due to the increased bandwidth provided by this technology,


large volumes of data can be communicated at high speeds. The following elements comprise the proposed Li-Fi functioning system:

• Graphical user interface
• Input analyzer phase
• Data transmitting phase
• Send and receive phase

3.1 Data Interpreting Unit Li-Fi, a potential future form of data transfer, is a novel substitute for radio waves. It is highly secure, easy to access, and less expensive than the alternatives. Because it is visible light communication (VLC), it transmits information and illumination using electromagnetic waves ranging from 400 to 800 THz. Varying the LED's flash rate generates digital strings of 1s and 0s, which encodes the data in light. The LED intensity is modulated so rapidly that the flicker is imperceptible to the human eye. Li-Fi is extremely simple to use: on one end there is a light emitter, such as an LED, and on the other there is a light-sensing device. When the LED is turned ON, the receiver registers a binary 1; when it is turned OFF, it registers a binary 0. To construct a message, flash the LED repeatedly, or use an array of LEDs of a few different colours to achieve data rates in the hundreds of Mbps range.
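The on/off encoding described above can be sketched as a simple bit-level codec. The 8-bits-per-character framing is an illustrative assumption; a real Li-Fi link would add clock recovery and error handling on top of this:

```python
def text_to_led_flashes(message):
    """Encode text as the on/off (1/0) flash pattern of an LED,
    8 bits per character, most significant bit first."""
    return "".join(format(ord(ch), "08b") for ch in message)

def led_flashes_to_text(bits):
    """Decode a received flash pattern back into text."""
    frames = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return "".join(chr(int(b, 2)) for b in frames)

pattern = text_to_led_flashes("Hi")
print(pattern)                        # the LED on/off sequence for "Hi"
print(led_flashes_to_text(pattern))   # round-trips back to "Hi"
```

Each `1` in the pattern corresponds to the LED being ON for one bit interval and each `0` to it being OFF, which is exactly the binary signalling described above.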

4 Results and Discussion This Li-Fi system can transfer any data format, including text, image, and multimedia data, between the two nodes at high speed. The primary requirement for Li-Fi is a direct line of sight (LOS) between the sender and receiver; because of this, Li-Fi can only be used as a local area network (LAN) within a limited geographic area. This technology, however, is faster and more accurate than radio-based wireless transmission. Data are sent and received through VLC, which serves as the primary medium, using the visible light spectrum between 375 and 780 nm. To fully develop the technology, many businesses and their research and development units have invested in continuous Li-Fi research, achieving speeds of about 800 Mbps (Fig. 5).

5 Conclusion and Future Work In the not-too-distant future, Li-Fi will emerge as a powerful technology: it circumvents radio spectrum limitations and offers a high data transfer rate. A better and less expensive alternative to Wi-Fi is to use light-based hotspots to transmit


Fig. 5 Li-Fi data transfer process

data wirelessly from each light bulb. There are many advantages to using this technology over Wi-Fi, and if it proves commercially viable, every light bulb can be used as a hotspot to transmit data wirelessly. Light fidelity could be a major game-changer in data communication. This new and advanced technology is still undergoing continuous research to correct problems and errors and to address issues related to light fidelity.

References

1. Valiveti, H.B., Polipalli, T.R.: Light fidelity handoff mechanism for content streaming in high-speed rail networks. In: 8th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), pp. 488–492. IEEE (2017)
2. Thayananthan, V., Abdulkader, O., Jambi, K., Bamahdi, A.M.: Analysis of cybersecurity based on Li-Fi in green data storage environments. In: IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud), pp. 327–332 (2017)
3. Ghosh, D., Chatterjee, S., Kothari, V., Kumar, A., Nair, N., Lokesh, E.: An application of Li-Fi based wireless communication system using visible light communication. In: International Conference on Opto-Electronics and Applied Optics (Optronix), pp. 1–3 (2019)


4. Monisha, M., Sudheendra, G.: Li-Fi: light fidelity technology. In: International Conference on Current Trends in Computer, Electrical, Electronics and Communication (CTCEEC), pp. 818–821 (2017)
5. Haas, H., Yin, L., Wang, Y., Chen, C.: What is LiFi? J. Light. Technol. 34, 1533–1544 (2016)
6. Ayyash, M., et al.: Coexistence of Wi-Fi and LiFi toward 5G: concepts, opportunities and challenges. IEEE Commun. Mag. 54(2), 64–71 (2016)
7. Soni, N., Mohta, M., Choudhury, T.: The looming visible light communication Li-Fi: an edge over Wi-Fi. In: International Conference on System Modeling & Advancement in Research Trends (SMART), pp. 201–205 (2016)
8. Shanmughasundaram, R., Prasanna Vadanan, S., Dharmarajan, V.: Li-Fi based automatic traffic signal control for emergency vehicles. In: 2nd International Conference on Advances in Electronics, Computers and Communications (ICAECC), pp. 1–5 (2018)
9. Kalaiselvi, V.K.G., Sangavi, A.: Li-Fi technology in traffic light. In: 2nd International Conference on Computing and Communications Technologies (ICCCT), pp. 404–407 (2017)
10. Andreev, V.V.: Wireless technologies of information transmission based on the using of modulated optical radiation (Li-Fi communication system): state and prospects. In: Systems of Signal Synchronization, Generating and Processing in Telecommunications (SYNCHROINFO), pp. 1–4 (2018)
11. Wu, Y.T.: Research trends in technological pedagogical content knowledge (TPACK) research: a review of empirical studies published in selected journals from 2002 to 2011. Br. J. Edu. Technol. 44(3), E73–E76 (2013)
12. Bhalerao, M., Sonavane, V., Kumar, V.: A survey of wireless communication using visible light. Int. J. Adv. Eng. Technol. 15(2), 188–197 (2013)

Design of Dual-Stack, Tunneling, and Translation Approaches for Blockchain-IPv6 Srihari Varma Mantena , S. Jayasundar , Dilip Kumar Sharma , J. Veerappan , M. Anto Bennet , Sudhakar Sengan , and Rajasekar Rangasam

Abstract IPv6 is the most recent version of the Internet protocol (IP). Compared with IPv4, IPv6 offers a series of benefits, including more efficient packet processing. With each passing day, IPv4 addresses come closer to exhaustion, so migration from IPv4 to IPv6 is now required. This study uses tunneling, dual-stack, and translation methods to transition from IPv4 to IPv6. Using such techniques, data can be sent from an IPv4 user to an IPv6 client. With the dual-stack method, IPv4 packets are transmitted on an IPv4 network and IPv6 packets on an IPv6 network. The proposed tunneling method uses the same channel to carry both IPv6 and IPv4 packets. A translation method for IPv4-to-IPv6 transmission has also

S. V. Mantena
Department of Computer Science and Engineering, SRKR Engineering College, Bhimavaram, India

S. Jayasundar
Department of Computer Science and Engineering, Idhaya Engineering College for Women, Chinnasalem, Tamil Nadu, India

D. K. Sharma
Department of Mathematics, Jaypee University of Engineering and Technology, Guna, Madhya Pradesh, India

J. Veerappan
Department of Electronics and Communication Engineering, Saveetha School of Engineering, Chennai, Tamil Nadu, India
e-mail: [email protected]

M. A. Bennet
Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
e-mail: [email protected]

S. Sengan (B)
Department of Computer Science and Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu, India

R. Rangasam
Department of Computer Science and Engineering-AI & ML, School Engineering, Malla Reddy University, Hyderabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_21

221

222

S. V. Mantena et al.

been implemented. The dual-stack, tunneling, and translation method has high intercommunication versatility, efficiency, and consistency. Only with a wire shark’s help, the procedure of this channel be examined (a packet analyzer).

1 Introduction The Internet is a high-speed system of interconnected computer networks that use the standard versions of the Internet Protocol (IP), IPv4 and IPv6. Because these protocols are not directly compatible, network providers and users are pressured to decide whether to support more than one protocol for their communication networks; this paper aims to demonstrate the interoperability of IPv6 and IPv4. Our goal is to enable IPv4 users to reach IPv6 users, and IPv6 users to interact with IPv4 users who cannot upgrade to the most recent IP. Deploying IPv6 is more expensive than retaining IPv4 [1]. Using our configuration, we were able to make CISCO routers consistent with both IPv4 and IPv6 users. The study's principles apply to routing instances in the IPv6 network using open-source dynamic protocols, and performance measures and statistics are analyzed and reported as a result [2].

1.1 IPv4 IPv4 is the fourth version of the Internet Protocol. It is one of the core protocols of standards-based internetworking on the Internet and other packet-switched networks. In 1983, the ARPANET became the first network to use IPv4 in production. Despite the ongoing deployment of IPv6, IPv4 still routes the majority of Internet traffic today. The IP is the protocol that defines and enables communication between devices on the Internet at the network layer of the protocol suite. In short, it is the foundation of the worldwide Web. It uses logical addressing to identify hosts and to route packet data from a source host toward the next router, one hop closer to the destination host on a different network. IPv4 is a connectionless protocol that provides a best-effort service model, which means that delivery, in-order arrival, and the avoidance of duplicate delivery are not guaranteed. The transport layer addresses these considerations, which also include data integrity [3].

1.2 IPv6 IPv6 is the latest version of IP, a communications protocol that provides a system for identifying and locating computers on networks and routing traffic across the Internet. Addressing is extended and generalized, multicast is used more widely, and the provision of services is further optimized. In designing the protocol, device mobility, security, and configuration aspects were taken into account. IPv6 is a packet-switched Internet layer protocol that enables end-to-end datagram transmission across multiple IP networks, in close accordance with the design rules established in the preceding protocol version, IPv4. In addition to providing more addresses, IPv6 also incorporates characteristics not present in IPv4. It makes it easier to configure addresses, renumber networks, and announce routers when changing connectivity providers [4]. It simplifies packet processing in routers by placing the responsibility for packet fragmentation at the endpoints. Setting the host ID part of an address to 64 bits standardizes the IPv6 subnet size. RFC 4291 defines the IPv6 addressing architecture and allows three kinds of transmission: unicast, anycast, and multicast. Many Web users now have access to IPv6, mainly because the destination address space has expanded [5]. The paper is organized as follows: Sect. 1 introduces the article, and the literature survey is presented in Sect. 2. Section 3 contains the proposed dual-stack, tunneling, and translation methods on IPv6, and Sect. 4 presents the results and discussion of the proposed method. Finally, Sect. 5 concludes the research work.
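The 128-bit address size and the fixed 64-bit host portion described above can be illustrated with Python's standard ipaddress module (a small sketch; the 2001:db8:: addresses are documentation examples, not values from this paper):

```python
import ipaddress

# A global unicast IPv6 address in the usual compressed hexadecimal notation.
addr = ipaddress.IPv6Address("2001:db8:85a3::8a2e:370:7334")
print(addr.exploded)   # full 8-group form: 2001:0db8:85a3:0000:0000:8a2e:0370:7334

# Fixing the host-ID (interface identifier) part at 64 bits standardizes
# the subnet size: a /64 prefix leaves 2**64 interface identifiers.
net = ipaddress.IPv6Network("2001:db8:85a3::/64")
print(net.num_addresses == 2**64)   # True

# IPv4 addresses are 32 bits; IPv6 addresses are 128 bits.
print(ipaddress.IPv4Address("192.0.2.1").max_prefixlen)   # 32
print(addr.max_prefixlen)                                 # 128
```
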

2 Literature Survey The fourth revision of the Internet Protocol is known as IPv4. IPv4 is a packet-switched protocol that runs over link-layer networks (e.g., Ethernet). By far, the most widely used Internet layer protocol is IPv4. It uses a best-effort delivery model. IPv4 allows two or more computers to communicate with each other. Although the lack of adequate address capacity in the original Internet design is the leading cause of IPv4 exhaustion, many other factors have compounded the situation. Each of them increased competition for the limited pool of addresses, often in places the protocol's designers did not anticipate [6]. The rapid advancement of technology has resulted in an increasing number of Internet-connected devices. Many transition technologies are available, each with unique expectations, operation and maintenance theory, and ease of access to IPv6. Implementation analysis can help network engineers discover the proper transition mechanism for their environment. One such work focuses on evaluating the various transition mechanisms with the help of a packet tracer; that analysis shows that dual-stack performed better than tunneling. The key distinction between IPv4 and IPv6 headers is that IPv4 source and destination addresses are 32 bits long, whereas IPv6 source and destination addresses are 128 bits long. A network is a group of devices interconnected so that they can communicate with each other. Another study assesses tunneling and dual-stack efficiency, as they are the most commonly used approaches. All configurations were implemented in GNS3 according to specific configuration commands [7].


This article provides an overview of transition mechanism behavior, with and without VPN protocols. The authors assess two transition processes: 4to6 and 6to4. Both mechanisms have certain benefits and drawbacks. In this research, the communication metrics established were throughput, delay, jitter, DNS efficiency, DNS delay, and DNS jitter for both TCP and UDP [8]. Dual-stack is the most versatile and recommended way to deploy IPv6 in existing IPv4 environments, and it can be combined with tunneling technologies, which can be identified by capturing packets from routers. With the growing number of communication devices, IPv4 address exhaustion is unavoidable; migration from IPv4 to IPv6 is thus necessary. One project aims to design an IPv6 address scheme for the Indian Financial Network (INFINET) and evaluate different migration strategies for IPv6. After the scheme was created, it was implemented on a test bench with a dual-stack configuration, using an IP address management tool to manage IPv6 addresses and DNS with BIND 9 for URL resolution. With the growth of Internet of Things (IoT) intelligent devices and the world's trend toward a converged network environment, the Internet Protocol address becomes the major logical infrastructure for every kind of voice and data communication, leading to exhaustion of the 32-bit IPv4 address space. Various issues such as security, service quality, addressing, routing management, and address depletion have arisen with the IPv4 addressing infrastructure. Information and communication technology service providers are migrating to IPv6 together with cloud computing and software-defined networks, where "migration in unity" is conceived to enter the new era of IT-based companies and services. IPv4 and IPv6 are not interoperable by themselves; moving to an IPv6 operating network is, therefore, a gradual process [9].
This paper suggested a transitional network step after addressing the service providers' migration strategies with different transition techniques. IPv6 offers numerous seamless features, making it much better than its predecessor, the IPv4 protocol. IPv4 is currently the standard and has been installed in almost all Web technologies; the conversion from IPv4 to IPv6 is, therefore, very challenging. Many techniques, such as CIDR and NAT, have been developed to postpone this transition [10]. However, the fact remains that the pool of IPv4 addresses is being spent, and the ultimate result is a move to IPv6. One study has two objectives: firstly, to highlight the issues of changing from IPv4 to IPv6, and secondly, to provide a bridge so that end-users can continue to use all IPv4 services. The aim is to gauge the event and the demands that will probably arise during the IPv4-to-IPv6 transition. DSTM allows both protocols to run at the same time, and the results indicate a seamless transition from IPv4 to IPv6 [11]. Exhaustion of IPv4 contributed to a different IP version, IPv6. Internet networks are hybrids of IPv4 and IPv6 networks while the change from IPv4 to IPv6 is in progress. This proposal defines the critical compatibility data among IPv4-IPv6 methods. Dual-stack is a compatible IPv4-IPv6 method that operates both IPv4 and IPv6 in one node. The 6to4 tunneling method encapsulates IPv6 packets inside IPv4 packets so that IPv6 networks can communicate via an IPv4 network; in this research work,


dual-stack and tunneling processes are incorporated together [12]. This study investigates transmission latency and throughput using TCP/UDP as transport protocols in various scenarios, through empirical observations of both dual-stack and tunneling. This work provides a functional strategic perspective before attempting to deploy IPv6 in a network. IPv6 is now the most developed and technologically advanced version, while IPv4 has lower speed and very little address space. The cost of moving from IPv4 to IPv6 networks is notable; we have found that tunneling, dual-stacking, and translation deliver excellent results compared to other approaches.

3 Proposed Work We recommend the dual-stack, tunneling, and translation methodologies for IPv4-to-IPv6 transmission. We also shortened the configuration by combining the ports of the router. CISCO routers are preferred in this work because they give sound output and speed compared to other routers. We apply the dual-stack, tunneling, and translation methods in turn [13].

3.1 Dual-Stack A dual-stack network is a network in which IPv4 and IPv6 are enabled on all nodes. This matters most for the router since, typically, the router is the first node on a specific system that receives traffic from outside the network. Dual-stack means that devices can run IPv4 and IPv6 in parallel [14]. It allows hosts to reach IPv4 and IPv6 content simultaneously, offering a versatile coexistence method. The solution can operate over either Layer 2 or Layer 3 services, as shown in Fig. 1.

Fig. 1 Dual-stack mapping


Fig. 2 Dual-stack layout

We use the same router to connect IPv4 and IPv6, and the entire conversion procedure is executed in one router. This approach minimizes conversion costs and allows users to keep their old IP [15]. This is the most commonly used and recommended conversion method, as shown in Fig. 2. By combining all of the ports in a single statement, this strategy shortens the configuration, which makes programming the router much easier. The latency of data transmission was also reduced with our proposed system.
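At the host level, the coexistence the dual-stack method provides can be sketched with Python's standard socket module: a single IPv6 socket that, with the IPV6_V6ONLY option cleared, also accepts IPv4 clients as mapped ::ffff:a.b.c.d addresses. This is an illustrative host-side analogue, not the paper's router configuration:

```python
import socket

# Dual-stack listener: one IPv6 socket serving both protocol families.
sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
# Clearing IPV6_V6ONLY lets the kernel hand IPv4 peers to this socket
# as IPv4-mapped IPv6 addresses (::ffff:a.b.c.d).
sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
sock.bind(("::", 0))   # "::" = all interfaces; port 0 = any free port
sock.listen(5)
print("dual-stack listener ready on", sock.getsockname()[:2])
sock.close()
```
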

3.2 Tunneling Tunneling, also known as "port forwarding," is the transmission of data intended for use only within a private, usually corporate, network through a public network, in such a way that the public network's routing nodes are unaware that the transmission is part of a private network. Tunneling is usually done by encapsulating the private network data and protocol information within public network transmission units, so that the private network protocol data appears to the public network as ordinary traffic. Tunneling allows information to be sent over the Internet on behalf of a private network, as seen in Fig. 3.


Fig. 3 Tunneling map

4 Proposals in Tunneling Not all routers on the network path between two IPv6 nodes can be expected to implement IPv6, and all links must keep working during the transition. Tunneling addresses this by encapsulating IPv6 packets within IPv4 packets and sending them through IPv4 routers. We have a combined program for both IPv4 and IPv6. The routers have been configured to transfer data at a faster rate than before. Besides, if the primary route is unavailable, our configuration sends the data over an alternative route if one is available.
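The encapsulation step can be sketched as a minimal IPv4 header with protocol number 41 (IPv6-in-IPv4, as used in 6in4 tunneling) prepended to an IPv6 packet. This is a simplified illustration only: the header checksum is omitted and the addresses are placeholders, not the paper's configuration:

```python
import struct

def encapsulate_6in4(ipv6_packet: bytes, src_v4: bytes, dst_v4: bytes) -> bytes:
    """Wrap an IPv6 packet in a minimal IPv4 header (protocol 41).
    The checksum field is left at 0 for brevity."""
    total_len = 20 + len(ipv6_packet)
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,   # version=4, IHL=5 (20-byte header)
        0,              # DSCP/ECN
        total_len,      # total length
        0, 0,           # identification, flags/fragment offset
        64,             # TTL
        41,             # protocol 41 = IPv6-in-IPv4
        0,              # checksum (omitted in this sketch)
        src_v4, dst_v4)
    return header + ipv6_packet

# Placeholder IPv6 payload (40-byte header starting with version nibble 6)
# between two hypothetical tunnel endpoints 192.0.2.1 and 192.0.2.2.
frame = encapsulate_6in4(b"\x60" + b"\x00" * 39,
                         b"\xc0\x00\x02\x01", b"\xc0\x00\x02\x02")
print(len(frame))   # 60: 20-byte IPv4 header + 40-byte IPv6 header
```
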

4.1 Translation NAT is a method of converting a private IPv4 address to a public IPv4 address; here, translation provides network transmission between IPv6 and IPv4. A border device (translator) between the two IP networks is required to allow native IPv6 hosts in the IPv6-only network to interact with native IPv4 hosts in the IPv4-only network and vice versa, as shown in Fig. 4. In translation mechanisms, one protocol is always converted into the other.
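One standard stateless way such a translator maps addresses is RFC 6052's well-known NAT64 prefix 64:ff9b::/96, in which the 32-bit IPv4 address is embedded in the low bits of an IPv6 address. The sketch below uses Python's ipaddress module; it illustrates the mapping mechanism only, since the paper does not specify which translation scheme its routers use:

```python
import ipaddress

NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")   # RFC 6052 well-known prefix

def v4_to_nat64(v4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the NAT64 prefix (the IPv6 side of the translator)."""
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address)
                                 + int(ipaddress.IPv4Address(v4)))

def nat64_to_v4(v6: ipaddress.IPv6Address) -> ipaddress.IPv4Address:
    """Recover the original IPv4 address from the low 32 bits on the way back."""
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)

mapped = v4_to_nat64("192.0.2.33")
print(mapped)                # 64:ff9b::c000:221
print(nat64_to_v4(mapped))   # 192.0.2.33
```
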


Fig. 4 Translation method

Fig. 5 Translation layout

4.2 Translation Layout This technique requires changing the Internet protocol of each packet. Alternatively, we could use routers suitable for both IPv4 and IPv6, as shown in Fig. 5. We have reduced the configuration, which allows IPv4 packets to be processed faster and injected into IPv6.

5 Simulation Analysis The bar chart and Table 1 describe the total time taken to transmit and convert IPv4 to IPv6. Initially, we sent 100 MB of data between IPv4 users and obtained a transmission speed of 3.4 Mbps with a latency of 21. Transmitting from IPv4 to IPv6 improved the throughput to 5 Mbps. Increasing the data size toward GBs likewise produced good outcomes, as we predicted (Fig. 6). The transmission and transformation times from IPv4 to IPv6 are represented in the bar chart and Table 2. We sent 100 MB of data to the IPv4 user, and the


Table 1 Dual-stack output and bar chart

Packet size | IPv4 Latency | IPv4 Throughput (Mbps) | IPv6 Latency | IPv6 Throughput (Mbps)
100         | 21           | 3.4                    | 20           | 5
1000        | 26           | 38.4                   | 23           | 43.4
2000        | 36           | 35.5                   | 36           | 55.8
3000        | 48           | 75                     | 37           | 81

Fig. 6 Transmit and convert IPv4 to IPv6

Table 2 Tunneling output (TAP tunneling) and bar chart

Packet size | IPv4 Latency | IPv4 Throughput (Mbps) | IPv6 Latency | IPv6 Throughput (Mbps)
100         | 13           | 7.6                    | 21           | 4.7
1000        | 19           | 52.6                   | 24           | 41.6
2000        | 30           | 66.6                   | 33           | 55.8
3000        | 36           | 83.3                   | 40           | 81

transmission rate was 7.6 Mbps, with a latency of 13. Transmitting from IPv4 to IPv6 gave a speed of 4.7 Mbps. When we raised the data size toward GBs, we got the expected good results (Fig. 7).

Fig. 7 Transmission and transformation times from IPv4 to IPv6

The transmission and conversion times from IPv4 to IPv6 are depicted in the bar chart and Table 3. We sent 100 MB of data to the IPv4 user; the transmission speed was 7.6 Mbps, and the latency was 12. Similarly, when we attempted to transmit


Table 3 Translation output and bar chart

Packet size | IPv4 Latency | IPv4 Throughput (Mbps) | IPv6 Latency | IPv6 Throughput (Mbps)
100         | 12           | 7.6                    | 19           | 4
1000        | 16           | 52.6                   | 21           | 40.6
2000        | 28           | 66.6                   | 30           | 52.7
3000        | 31           | 83.3                   | 26           | 78

Fig. 8 Outcome of increased data size to GB

from IPv4 to IPv6, we only got 4 Mbps. When we increased the data size to GBs, we got the expected excellent results (Fig. 8).
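The throughput figures in Tables 1, 2, and 3 relate transfer size and elapsed time in the usual way; the helper below shows the conversion. The 235-second duration is a back-calculated illustration, not a measurement reported in the paper:

```python
def throughput_mbps(size_mb: float, seconds: float) -> float:
    """Throughput in megabits per second for a transfer of size_mb megabytes."""
    return size_mb * 8 / seconds

# e.g. a 100 MB transfer completing in roughly 235 s gives about 3.4 Mbps,
# the order of magnitude of the dual-stack IPv4 entry for the smallest size.
print(round(throughput_mbps(100, 235), 1))   # 3.4
```
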

6 Conclusion IPv6 introduces new features and functions to resolve many of IPv4's constraints. IPv6 enhances router efficiency and is simple to set up, so now is the time to concentrate on IPv6. Many of the modifications in IPv6 are merely cosmetic, but the address space has been expanded, and an important barrier to the Internet's continued growth has been removed. IPv6 also has more flexibility and expandability than IPv4, with security support built into the protocol definition. In this work, IPv6 packets are carried over existing IPv4 infrastructure using the GNS3 simulation tool, demonstrating interoperability. Some of IPv6's advantages are self-evident: more addressing space, built-in QoS, and improved routing performance and services. However, several obstacles must be overcome before IPv6 can be fully implemented. We all require IPv6, given the increased address space needed for the growth of IP appliances that we now hear about every week; cars that are IP-ready are already on their way. IPv6 has the potential to solve a variety of issues. Since the early 1990s, many of the best programmers and engineers have been working on IPv6. Hundreds of RFCs were written, covering topics such as expanded addressing, simplified header format, flow labeling, authentication, and confidentiality. Expanded addressing raises the address length from 32 to 128 bits. It also includes newer unicast and multicast methodologies and hexadecimal notation for IP addresses.


References
1. Ashraf, Z., Yousaf, M.: Optimized routing information exchange in hybrid IPv4-IPv6 network using OSPFV3 & EIGRPv6. Int. J. Adv. Comput. Sci. Appl. 8(4), 220–229 (2017)
2. Dasgupta, S., Roy, P.J., Sharma, N., Misra, D.D.: Application of IPv4, IPv6 and dual stack interface over 802.11ac, 802.11n and 802.11g wireless standards. In: 3rd International Conference on Advances in Electronics, Computers and Communications (ICAECC), pp. 1–6 (2020)
3. Inforzato, et al.: Scip and IPSEC over NAT/PAT routers. United States Patent Application Publication, Pub. No. US 2019/0097968 A1, 28 Mar 2019
4. Jain, V., Tiwari, D., Singh, S., Sharma, S.: Impact of IPv4, IPv6 and dual stack interface over wireless networks. Int. J. Comput. Netw. Inf. Secur. (2018)
5. Kao, Y., Liu, J., Ke, Y., Tsai, S., Lin, Y.: Dual-stack network management through one-time authentication mechanism. IEEE Access 8, 34706–34716 (2020)
6. Lencse, G., Kadobayashi, Y.: Methodology for the identification of potential security issues of different IPv6 transition technologies: threat analysis of DNS64 and stateful NAT64. Comput. Secur. 77(1), 397–411 (2018)
7. Liu, J.C., Ke, Y.-Q., Kao, Y.-C., Tsai, S.-C., Lin, Y.-B.: A dual-stack authentication mechanism through SNMP. In: Proceedings of the 4th International Symposium on Mobile Internet Security, pp. 1–13 (2019)
8. Ordabayeva, G.K., Othman, M., Kirgizbayeva, B., Iztaev, Z.D., Bayegizova, A.: A systematic review of transition from IPv4 to IPv6. In: Proceedings of the 6th International Conference on Engineering & MIS (ICEMIS'20), Article 6, pp. 1–15. Association for Computing Machinery, New York (2020)
9. Samaan, S.S.: Performance evaluation of RIPng, EIGRPv6 and OSPFv3 for real-time applications. J. Eng. 24(1), 111–122 (2018)
10. Samad, F., Abbasi, A., Memon, Z.A., Aziz, A., Rahman, A.: The future of internet: IPv6 fulfilling the routing needs in internet of things. Int. J. Future Gener. Commun. Network. 11(1), 13–22 (2018)
11. Shalini Punithavathani, D., Radley, S.: Performance analysis for wireless networks: an analytical approach by multifarious Sym Teredo. Sci. World J., Article ID 304914 (2014)
12. Sheryl, R., Shalini, P.: Real time simulation of routing virtualization over a testbed designed for the various IPv4-IPv6 transition techniques. Asian J. Inf. Technol. 13(9), 485–493 (2014)
13. Sochor, T., Sochorova, H.: Dynamic routing protocol convergence in simulated and real IPv4 and IPv6 networks. In: Silhavy, R. (ed.) Cybernetics and Automation Control Theory Methods in Intelligent Algorithms, CSOC 2019, Advances in Intelligent Systems and Computing, vol. 986 (2019)
14. Toyota, Y., Nakamura, O.: Dynamic control method of explicit address mapping table in IPv6 single-stack network. In: 21st Asia-Pacific Network Operations and Management Symposium (APNOMS), pp. 37–42 (2020)
15. Yan, Z., Wang, H.C., Park, Y.J., Lee, X.: Performance study of the dual-stack mobile IP protocols in the evolving mobile internet. IET Netw. 4(1), 74–81 (2015)

Wavelet-Based Aerial Image Enhancement Tools and Filtering Techniques P. Ramesh, V. Usha Shree, and Kesari Padma Priya

Abstract Image enhancement procedures offer a broad range of options for manipulating images to produce appealing graphical outcomes. The approach to enhancement is determined by the task at hand, image quality, observer characteristics, and viewing conditions. This survey paper provides an overview of resolution enhancement strategies and analyzes their performance. The work also discusses a wide variety of filtering techniques and georectification applications currently on the market. The review shows that image enhancement techniques that do not use wavelets have the downside of missing high-frequency information, which causes blurring. Also, the CWT method is almost shift-invariant, resulting in improved efficiency. Combining the wavelet transform with other enhancement and filtering methods yields fewer artifacts than other methods for hyperspectral satellite images and likewise improves the accuracy of a satellite image in terms of MSE and PSNR.

P. Ramesh (B): JNTU, Anantapur, Andhra Pradesh, India. V. U. Shree: Department of Electronics and Communication Engineering, Joginpally B.R. Engineering College, Hyderabad, Telangana, India. K. P. Priya: Department of Electronics and Communication Engineering, University College of Engineering, JNTU, Kakinada, Andhra Pradesh, India.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_22

1 Introduction In the image processing domain, the enhancement of satellite images is a fruitful research topic. The enhancement process aims to make an image look original so that certain remote-sensing activities can be applied appropriately. Satellite image enhancement techniques give remotely sensed images more options for improving their visual quality. The efficient and frequently used method for collecting information from the earth's surface is called aerial imaging. The images are captured at elevations between hundreds of meters and hundreds of kilometers by airplanes, unmanned aerial vehicles (UAVs), or satellites. There is a growing demand for aerial images as they are used in numerous defense and non-defense applications. There are many merits to the credit of UAVs, including minimal operational cost compared to traditional airplanes. Moreover, launching them in rocky areas is easy, and their usage is far safer for human beings. Thus, UAVs fitted with cameras are made viable by these merits [1]. The panoramic view of the landscape and the complete coverage of the search area widen when airplanes with wide-angle lenses fly at higher altitudes; however, this also reduces the resolution [2]. Locally insufficient brightness in aerial images is due to bad climatic conditions that introduce objects into the image, the restricted range of the camera, and the thermal characteristics of electronic devices that lower the signal-to-noise ratio. Preliminary processes such as enrichment, retrieval, and deblurring, followed by feature detection, are carried out when the recorded images are not inspected directly by the human visual system. Artifact detection, tracking, and categorization utilize the feature detection technique [3]. Juxtaposing the recorded image with the detected image is the main idea of improvement methods. This primarily concerns the scene's significant details, derived directly from images that are too dark because of a lack of brightness or contrast. These global processing-based enrichment systems lead to a single mapping between the input and output intensity spaces [4]. There are also local image enrichment methods in which the output value is based not on the input pixel value alone but on the pixel values in the pixel's neighborhood. Even under different lighting, these techniques enhance the local contrast. An outline of the current optimization techniques, geo-correction, and filtering-based preprocessing methods is discussed in this literature review.
Various techniques have been proposed to enhance digital images that can be used to improve satellite imagery. A survey of numerous enhancement methods for satellite images has been carried out, reviewing each enhancement model, method, and technique (Figs. 1 and 2). Image enhancement falls into two taxonomies: spatial domain enhancement (SDE) and frequency domain enhancement (FDE). The spatial domain method deals directly with the pixel values of the image. While enhancing an image,

Fig. 1 Image feature extraction process


Fig. 2 Examples of image fusion

the frequency domain method (FDM) operates on the Fourier transform of the image. This paper discusses wavelet transform (WT)-based image enhancement techniques and models, as well as filtering and geo-correction models [5]. The organization of this paper is as follows: Sect. 1 gives the introduction, and Sect. 2 presents the research findings on image enhancement models and their performance analysis. Finally, Sect. 3 concludes the research.

2 Research Findings 2.1 Wavelet-Based Satellite Image Enhancement Techniques The fast Fourier transform algorithm is a popular method that appropriately assesses stationary signals in which the fundamental frequencies possess non-finite coherence time. However, this method fails to ascertain accurately the time of occurrence of a specific frequency. Later, Gabor presented the short-time Fourier transform with fixed-size windows to fix this Fourier transform issue; however, it still cannot depict local modifications in frequency content, which motivated the change to the wavelet transform for signal analysis. The WT provides a multi-scale, time-frequency-localized image description. Wavelets are the fundamental operation for representing signals and imagery under the WT, analogous to the sine and cosine functions in the Fourier domain. Since the WT represents a specified image at many resolutions, a frame image can be decomposed into components at multiple resolution levels, and the individually maintained high-pass and low-pass components can be optimized by the researchers. Author [6] considers the background study and the research work related to frequency domain-based image enrichment. Based on this review, the FDM, which is best for contrast enhancement, can study an image under various resolutions and track minor variations at different levels. However, they found that the execution time of the DWT approach is prolonged and its implementation complicated. Moreover, the DWT also falls short in handling noise amplification and uneven images. The operational cost is very high in DCT, and it is more difficult to choose appropriate transform principles. A significant demerit of the Fourier transform technique is that it cannot produce an image enhanced uniformly over lighting. The proposed work in [7] enriches dark images using a technique based on a locally transformed histogram. Since this method does not apply the conversion technique to the entire histogram of the input image, it is better than the other methods; instead, a small portion of the input image histogram is used for applying the conversion. Compared to existing methods, the empirical results clearly show that this technique performs very well. The researchers recommended a novel technique that first filters noise from spacecraft images using multiple noise and filter sequences. This is followed by a process that combines a resolution enrichment technique, based on merging the high-frequency sub-band images acquired by DWT with the input image, and a contrast enrichment technique. The procedure is equipped with an SVD transform of the LL band image obtained from DWT, with both processes running correspondingly. While resolution enrichment operates on the HL, LH, and HH sub-bands, brightness enrichment operates on the LL sub-band.
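The sub-band split described above (LL, LH, HL, HH) can be sketched as a one-level 2-D Haar decomposition in plain Python. This is a minimal sketch using averaging normalization; real implementations typically use a wavelet library:

```python
def haar2d(img):
    """One-level 2-D Haar decomposition of an even-sized grayscale image
    (list of lists) into LL, LH, HL, HH sub-bands."""
    def split_rows(m):
        # Low-pass (average) and high-pass (difference) along each row.
        lo = [[(r[i] + r[i + 1]) / 2 for i in range(0, len(r), 2)] for r in m]
        hi = [[(r[i] - r[i + 1]) / 2 for i in range(0, len(r), 2)] for r in m]
        return lo, hi
    def transpose(m):
        return [list(c) for c in zip(*m)]
    lo, hi = split_rows(img)                 # filter rows
    ll, lh = (transpose(x) for x in split_rows(transpose(lo)))   # then columns
    hl, hh = (transpose(x) for x in split_rows(transpose(hi)))
    return ll, lh, hl, hh

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
ll, lh, hl, hh = haar2d(img)
print(ll)   # coarse approximation: [[3.5, 5.5], [11.5, 13.5]]
```

The LL band is the coarse approximation used for brightness enrichment; the LH, HL, and HH bands hold the horizontal, vertical, and diagonal detail used for resolution enrichment.
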
The proposed approach's performance was tested on spacecraft imagery with various noise types and filters. The PSNR and visual outcomes show the median filter's better performance in reducing salt-and-pepper noise, and the Wiener filter's for speckle, Poisson, and Gaussian noise. Author [8] addressed a serious problem faced in WT-based optimization of medical images, i.e., the technique for mining high-frequency information. In this algorithm, the high-frequency wavelet sub-images are decomposed using the Haar transform, which helps to mine high-frequency information effectively. The effective enhancement of a medical image is also made possible using this technique. The results show that the proposed algorithm can effectively optimize image contrast while preserving its edge property. An image is decomposed using two-level WTs; due to smooth threshold filtering, the wavelet decomposition level is preferably less than four. Author [9] proposed a method aimed at the enhancement of biomedical images. This method uses a fusion of wavelets: the first wavelet, D'Mayes, is applied once the scale-invariant feature transform (SIFT) has been applied. Author [10] uses wavelet-based image fusion to enhance degraded imagery by fusing histogram equalization (HE) and sharpening outputs in his proposed method. First, to get the basic images, HE and sharpening are executed separately on the same degraded image. Using distinct rules, these basic images are combined in the wavelet domain to obtain the enriched image.


The trial's outcome shows a significant improvement in the brightness and sharpness of degraded imagery using the proposed algorithm, which in parallel supplies sufficient detail and suitable brightness in the enriched imagery [10]. A wavelet-based dynamic range compression process was recently proposed to enhance the graphic properties of digital imagery taken under uneven illumination in high dynamic range scenes [11]. The quick image enrichment algorithm performs dynamic range compression while conserving local brightness and a sound interpretation. This is the most reliable candidate for aerial imagery applications, such as image rendition for army and security purposes. Air safety is ensured by further applying this algorithm to video streaming. This study presents the recent version of the planned algorithm, which can optimize aerial imagery so that the optimized imagery is better suited to direct human visual surveillance. Sturdiness and excellent image quality are the outcomes of applying this algorithm to various aerial images. Table 1 compares the different image enhancement models that are taken as inspiration for this research work.

2.2 Geo-correction and Orthorectification Techniques in Enhancing Aerial Images A map usually transforms a 3D environment into a 2D picture. Rebuilding the correct imaging geometry is called geocoding, because each image pixel's position has a global relationship. Generally, remote-sensing data are influenced by internal distortions due to the sensor or the platform, and by distortions caused by topographic effects. Removing the impact of distortion is needed for a precise resolution (sub-meter level), mainly to provide a base for collecting spatial information. Besides, geometric distortions restrain the placement and coregistration of topographic data with other geospatial layers of data, as stated. Image distortion is eliminated using a technique called "orthorectification," which produces an "ortho-image" with features positioned as they would be in a planimetric map. Proper computation of distance is otherwise not feasible, because the scale is inconsistent across the image [12]. It is common for an ortho-image to be derived from aerial photographs or satellite images; however, the idea of orthorectification can be extended to any 2D remotely sensed data in a 3D environment. Many papers address the queries related to distortion and topography modifications for radar images by matching SAR data with a digital topography model. A technique was proposed to orthorectify satellite or aerial radar images using terrain altitude information extracted from stereo-radar. The latest studies performed this association with satellite or aerial data [13].

P. Ramesh et al.

Table 1 Different image enhancement models

Methodology | Application | Pros and cons
S-WT | Magnetic resonance (MR) imaging | Pros: the complicated structural characteristics of the enhanced MR images are maintained. Cons: excessive noise amplification in relatively homogeneous areas of the image
N-SCT | Image processing | Pros: provides a shift-invariant directional multiresolution image representation. Cons: original high-frequency attributes are partly lost because of decimation in the CT; edge characteristics may disappear from the reconstructed images
LWT | Image enhancement | Pros: using LWT and SVD, the algorithm becomes faster and more memory-efficient. Cons: spectral information, color fidelity, and the image's visual quality may be degraded
NST | Image enhancement | Pros: the quantity of information and the clarity of the actual image are increased. Cons: high computation time due to its vast structure and number of iterations
WT | Medical image enhancement | Pros: enhances image details while preserving edge features effectively. Cons: no attention given to over-sharpening of edges or to atmospheric effects
DCT | Dark and low-contrast image improvement | Pros: increases image contrast and color detail by optimizing bistable system parameters without losing image or color information. Cons: discontinuity of the boundary between excessively treated and less treated wedges in the output is the major problem
DWT and SVD | Satellite image enhancement | Pros: yields better visual outcomes. Cons: outperforms the traditional approach but incurs a higher computational cost

Wavelet-Based Aerial Image Enhancement Tools and Filtering …


2.3 Image Enhancement Using HE

HE is the most common method for image enhancement, as it is simple and performs relatively well on nearly all kinds of images. The image's gray levels are remapped according to the probability distribution of the input gray levels. This section analyzes various HE techniques used in image enhancement [14]. Many researchers argue that HE makes enhancing brightness and image quality straightforward. Kim addressed the brightness-shift problem as early as 1997 and proposed brightness preserving bi-histogram equalization (BPBHE) for contrast improvement, in which a single point, taken as the mean intensity value, separates the shaded and illuminated regions. A later study reached a different conclusion, presenting the median intensity value as more accurate than the mean as the separation point. Because the outcomes still showed discrepancies, minimizing the mean contrast difference between the input and output images was recommended, since compared with BBHE and DSIHE such a separation point is more selective and precise. Quadrant dynamic histogram equalization (QDHE) [15] is an emerging technique in which the histogram is first partitioned into four sub-quadrants around the median value of the input image; after normalization of each sub-histogram, the image is remapped. The promising feature of QDHE is that it enhances images without intensity saturation, noise amplification, or over-enhancement. Another study proposed a novel HE-based approach that corrects lighting in face images; its main idea is the combination of gamma correction and the compression function of the retinal filter, viz., GAMMA-HM-COMP. The efficiency of this retinal-filter-based enhancement technique can be compared against three traditional enhancement techniques: HE, gamma correction, and log transformation.
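Classic global HE, the building block that BPBHE and QDHE apply to sub-histograms, can be sketched as follows (a minimal NumPy illustration; the function and parameter names are ours, not from the cited works):

```python
import numpy as np

def histogram_equalize(img, levels=256):
    """Remap gray levels through the normalized CDF of the input histogram.
    BBHE/QDHE apply the same remapping independently to sub-histograms split
    at the mean/median to preserve brightness."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(float)
    cdf_min = cdf[cdf > 0].min()                       # first occupied bin
    cdf = (cdf - cdf_min) / (cdf[-1] - cdf_min + 1e-12)
    lut = np.clip(np.round(cdf * (levels - 1)), 0, levels - 1).astype(np.uint8)
    return lut[img]
```

A low-contrast image whose pixels occupy only two gray levels is stretched to the full output range, which is exactly the contrast gain (and the brightness shift) that BBHE-style methods try to moderate.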

2.4 Filtering Techniques Employed in Image Enhancement

Image enhancement is performed chiefly to improve an image's appearance so that visual interpretation and analysis become significantly simpler. The enhancement process comprises two functions: contrast stretching, which increases the tonal variation between different features in a scene, and spatial filtering, which enhances or suppresses particular spatial patterns in an image. In 2009, a filtering technique that sharpens image edges improved colored spacecraft imagery; a two-stage data process using convolution and Laplacian methods was implemented to perform HE. DWT- and singular value decomposition-based contrast enhancement techniques were subsequently introduced. Bettahar proposed a
new procedure for enhancing color pictures via a scalar-diffusion-shock-filter coupling model. This technique reduces blur and noise in degraded images, achieving noise and blur reduction in color images without producing artificial colors. Another work proposed two new image filters, FDF and NDF, whose basic principle is to measure the distance between image pixels and their neighbors in order to build decision rules for correcting irregular pixel values. Better images are produced when FDF and NDF use a 5 × 5 kernel rather than the usual 3 × 3 kernel. The output of the proposed filters and of well-known mean filters was studied by calculating PSNR and MSE, and the superiority of the proposed filters is evident from this comparison. The method proposed by Masahiko Sakata curbs Poisson noise and produces high-quality images from low-dose acquisitions while enhancing image brightness accurately: a threshold value determined by the BayesShrink method is multiplied by a factor that accounts for the local average around the pixel of interest and is then used to reduce the Poisson noise. Besides, the "translation-invariant denoising" notion was introduced to minimize artifacts in the Poisson noise reduction process. Another author suggests a combined denoising and brightness enhancement filter to improve the image quality of ovarian cancer scans for CAD and to focus on the region of interest (ROI) of the affected areas; the approach employs a Wiener filter to remove noise and CLAHE for brightness enhancement. A further algorithm performs two-stage filtering and brightness enhancement of X-ray images: an adaptive median filter and a bilateral filter suppress hybrid noise, including Gaussian and impulsive noise, while the critical structures in the images, e.g., edges, are preserved.
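The median filtering step mentioned above, which is effective against impulsive (salt-and-pepper) noise, can be sketched in a few lines. This is a plain, unoptimized illustration, not the adaptive variant used in the cited algorithm:

```python
import numpy as np

def median_filter(img, k=3):
    """Slide a k x k window over the image and replace each pixel by the
    window median. Unlike mean filtering, an isolated impulse never
    dominates the output, so spikes are removed while edges survive."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')   # replicate borders
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out
```

A single corrupted pixel in an otherwise flat region is restored exactly, since the median of the window ignores the outlier.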
Then, image brightness is improved using gray-level morphology and contrast limited adaptive histogram equalization (CLAHE). HE is the most frequently used brightness enhancement technique, but during enhancement it tends to alter the image's mean brightness, whereas preserving the original brightness is necessary to avoid visual artifacts. Therefore, to preserve the original brightness to some extent, BBHE was proposed and mathematically evaluated. Yet BBHE does not handle some cases because an extreme level of preservation is needed; its extension is MMBEBHE, which in turn is unsuitable for overly detailed images. A newer method removes these drawbacks by performing the enhancement with MMBEBHE-based brightness expansion management. During optimization, any noise embedded in the image is amplified as well; this can be prevented by passing the optimized image through a median filter, the method of choice for suppressing impulse noise. The median filter performs order-statistic smoothing over a bounded input window rather than direct arithmetic averaging. Speckle noise affects medical ultrasound images, a predominant application with poorly illuminated images. One paper aims to improve the efficiency of the preprocessing phase for medical ultrasound images while maintaining their essential features. That work proposes an image-dependent brightness enhancement method based on automatic cumulative HE and gamma correction. It also presents a denoising algorithm called gamma correction with exponentially adaptive threshold (GCEAT), which combines gamma correction for brightness enhancement with a recent wavelet-based adaptive soft thresholding method. Adopting GCEAT for image denoising also complements other optimization and denoising techniques. Empirical results showing subtle brightness on synthetic and real ultrasound images indicate that the proposed approach surpasses current brightness enhancement techniques, with promising PSNR, MSE, SSIM, and AI scores on medical ultrasound images (Fig. 3).

Fig. 3 Wavelet factor processes of image processing
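The two ingredients of GCEAT described above, gamma correction for brightness and wavelet-style soft thresholding for denoising, can be sketched as follows (an illustrative simplification; the exponentially adaptive threshold of the cited method is not reproduced):

```python
import numpy as np

def gamma_correct(img, gamma=0.5):
    """Power-law remap of intensities in [0, 1]; gamma < 1 brightens
    dark regions more strongly than bright ones."""
    return np.clip(img, 0.0, 1.0) ** gamma

def soft_threshold(coeffs, t):
    """Soft thresholding of wavelet detail coefficients: shrink every
    magnitude toward zero by t, zeroing anything below t (the core
    operation of BayesShrink-style denoising)."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)
```

In a full pipeline the soft threshold would be applied to the detail sub-bands of a wavelet decomposition, with `t` estimated per sub-band, before inverse transformation and gamma correction.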

3 Conclusion

Image processing offers a wide range of image fusion processes. The choice among them depends on the task at hand, the image content, viewer characteristics, and viewing conditions. This survey chapter discusses the ideas behind, and the performance of, distinct resolution optimization methods, together with an extensive collection of filtering techniques and the geo-rectification tools available on the market. Image enhancement systems that do not consider wavelets are prone to missing high-frequency content, leading to blurring, whereas CWT techniques, which are approximately shift-invariant, show better performance. Compared with other methods, this work merges the WT with further enhancement and filtering methods to generate fewer artifacts for hyperspectral satellite imagery; the performance is also optimized in terms of MSE and PSNR.

References

1. Aamir, M., Rahman, Z., Pu, Y.-F., Ahmed, W., Gulzar, K.: Satellite image enhancement using wavelet-domain based on singular value decomposition. Int. J. Adv. Comput. Sci. Appl. 10 (2019)
2. Bhardwaj, A., Wadhwa, A., Verma, V.: Image enhancement in lifting wavelet transform domain. In: AIP Conference Proceedings, vol. 2061 (2019)
3. Chen, S.D., Ramli, A.R.: Minimum mean brightness error bi-histogram equalization in contrast enhancement. IEEE Trans. Consum. Electron. 49(4), 1310–1319 (2003)
4. Hussain, K., Rahman, S., Rahman, M., Mostafijur, S., Khaled, A.-A.-W.M., Khan, M., Shoyaib, M.: A histogram specification technique for dark image enhancement using a local transformation method. IPSJ Trans. Comput. Vis. Appl. 10 (2018)
5. Jha, R.K., Chouhan, R., Aizawa, K., Biswas, P.: Dark and low-contrast image enhancement using dynamic stochastic resonance in discrete cosine transform domain. APSIPA Trans. Sig. Inf. Process. (2013)
6. Jobson, D.J., Rahman, Z., Woodell, G.A., Hines, G.D.: A comparison of visual statistics for the image enhancement of FORESITE aerial images with those of major image classes. In: Visual Information Processing XV, Proc. SPIE 6246 (2006)
7. Kirti, K., Kumar, D.: Biomedical image enhancement using wavelets. Proc. Comput. Sci. 48, 513–517 (2015)
8. Luo, S.M.: An image enhancement algorithm combining wavelet transform with image fusion. Appl. Mech. Mater. 1832 (2012)
9. Mansoor, A., Khan, Z., Khan, A.: An application of fuzzy morphology for enhancement of aerial images. In: International Conference on Advances in Space Technologies, pp. 143–148 (2008)
10. Mishro, P.K., Agrawal, S., Panda, R., Hansdah, K.T.: MR image enhancement using stationary wavelet transform based approach. In: 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, pp. 1–6 (2020)
11. Muna, F.A.-S., Abdul, N., Saiyd, M.A.: Colored satellites image enhancement using wavelet and threshold decomposition. Int. J. Comput. Sci. 8(5), 33–41 (2011)
12. Mustafa, W., Yazid, H., Khairunizam, W., Jamlos, M., Ibrahim, Z., Mohamad Razlan, Z., Shahriman, A.: Image enhancement based on discrete cosine transform (DCT) and discrete wavelet transform (DWT): a review. In: IOP Conference Series: Materials Science and Engineering, vol. 557 (2019)
13. Qu, Z., Xing, Y., Song, Y.: Image enhancement based on pulse coupled neural network in the nonsubsample shearlet transform domain. Math. Probl. Eng. (2019)
14. Salaheldin, A., Nagaty, K., Elarif, T.: An enhanced histogram matching approach using the retinal filter's compression function for illumination normalization in face recognition. In: International Conference on Image Analysis and Recognition, pp. 873–883 (2008)
15. Unaldi, N., Temel, S., Asari, V.K., Rahman, Z.: An automatic wavelet-based nonlinear image enhancement technique for aerial imagery. In: 4th International Conference on Recent Advances in Space Technologies, Istanbul, Turkey, pp. 307–312 (2009)

Efficiency Optimization of Security and Safety in Cooperative ITS Communication for QoS Service M. Mohan Rao, Sreenivas Mekala, Rajaram Jatothu, G. Ravi, and Sarangam Kodati

Abstract Technologies like the Internet of things (IoT) have gradually become a trending research hotspot. Communication services, such as those offered by communication enterprises, are crucial for social networks. The cooperative intelligent transport system (C-ITS) is a leading technology that enables safety and security while traveling on roads by utilizing wireless communications. This is possible only when we improve the quality standards and efficiency of communication services, which enhances the reliability of communication. Vehicles regularly share their mobility information with nearby traffic zones, and by utilizing cooperative awareness messages (CAMs), the road-side-unit infrastructure can support secure applications that generate a local dynamic map. As these applications are generated by C-ITS, they help assure human safety as well as trustworthy and secure communications. The traditional quality of service (QoS)-based Web composition scheme is very beneficial for meeting user requirements, yet it has a few drawbacks in achieving the desired standards of reliability. In this paper, we briefly study the relationship between security and quality of service (QoS) for achieving vehicle safety awareness in the C-ITS system, and we suggest an algorithm that integrates security computing with QoS to enhance security credibility and communication service efficiency.

M. M. Rao Department of CSE, Ramachandra College of Engineering, Vatluru, Andhra Pradesh, India S. Mekala Department of IT, Sreenidhi Institute of Science and Technology, Hyderabad, Telangana, India R. Jatothu · S. Kodati (B) Department of CSE, Teegala Krishna Reddy Engineering College, Hyderabad, Telangana, India G. Ravi Department of CSE, MRCET, Hyderabad, Telangana, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_23

M. M. Rao et al.

1 Introduction

Wireless communication between vehicles is a new technology that enables passenger safety, traffic efficiency, and awareness of traffic flow on the roads while driving [1]. This technology, called the cooperative intelligent transport system (C-ITS), is intended to provide an accurate view of vehicular flow in a zone as well as to support safety applications [2, 3]: cooperative awareness, roundabout crossing, warning notification, lane change safety, intersection safety, and so on. Furthermore, with the help of an enhanced vision of the neighborhood traffic, traffic management and user comfort applications are realized [4]. Advances in information reception techniques, computer technology, communication mechanisms, and smart hardware, together with fast iterations and updates, have led to the huge number of technologies emerging today in various disciplines [5]. Knowledge-intensive service industries continuously serve the nation and society, most particularly companies providing communication services; these enterprises are the base of social network communication [6, 7]. Highly trusted communication guaranteed to the user and society assures improved standards of quality communication services, so that more people can enjoy the advantages brought by high-end technology. C-ITS stations include communicative components such as vehicles, infrastructure RSUs, central traffic centers, and personal computing devices [8]. Vehicles act as mobile ITS stations and run applications for the convenience of drivers; fixed ITS stations at junctions and along the infrastructure are usually called road side units (RSUs).
Central traffic centers provide traffic operations and service facilities that help manage city-level traffic and ITS communications. RSUs collect traffic information on the roads and communicate with the central traffic center to manage traffic flow and improve application performance. Thanks to this development, personal and mobile devices such as smartphones can also access the different services provided by ITS. For vehicle transmission, a traffic awareness map of the surroundings is implemented and safety messages are delivered; such a map is called a local dynamic map (LDM). The periodic safety messages are called basic safety messages (BSMs) in the WAVE standard and cooperative awareness messages (CAMs) in the ETSI standard [9, 10]. Generated at a rate of 1–10 Hz, these safety messages provide vehicles with a current view of the traffic flow in the surrounding zones, and an exact LDM is built by the cooperative awareness application. Drivers then decide their route based on the information displayed in the LDM. These data must be maintained securely, as a malicious user may inject false information into the network, which could result in accidents or fake messages. The usual security threats in ITS include Sybil attacks, denial-of-service attacks, man-in-the-middle attacks, and
eavesdropping [11–13]. To overcome these threats, maintain security, and meet reliability requirements, communication service efficiency is improved using QoS techniques and improved semantic matching in the complete combinational algorithm; a number of scholars have researched and analyzed this semantic matching technique [14–16]. Web composition algorithms have been optimized based on QoS, with further optimization based on context-related semantics. Although such algorithms improve flexibility to some extent, they may still have disadvantages in maintaining the accuracy and credibility of security services [17–20].

2 Related Works

To enhance the precision of semantic similarity in the entire system, a number of research scholars have suggested techniques based on semantic matching algorithms. The elliptic curve digital signature algorithm (ECDSA) is utilized for safe CAM transmission; with the ECDSA scheme, the processor needs a specific time span for signing and verifying CAMs. CAMs generate a high network load, which results in congestion, and security verification requires additional time. This leads to long delays and packet loss. Therefore, it is crucial to identify the trade-off between vehicle security and QoS for safety-critical applications. A crucial measure of reliability in C-ITS is the estimation of the quality of a vehicle's awareness and the exactness of its LDM. Various metrics, such as the packet delivery ratio and the awareness quality level, are used in the literature for estimating vehicle awareness. However, these metrics only count the CAMs received from neighbors; they do not measure the precision and value of the safety information, which also influence vehicle safety awareness. The impact of security on QoS in C-ITS is therefore evaluated with an accurate awareness metric. Contributions: The main aim of this paper is to estimate the effect of security on QoS and vehicle safety awareness in C-ITS. To achieve this, we first develop a set of standards for categorizing infrastructure-centric and vehicle-centric safety awareness metrics (SAMs) that determine the precision and timeliness of the information provided in the LDM. Infrastructure and vehicles may view safety awareness differently depending on the applications supported in C-ITS.
The infrastructure represents vehicle awareness by considering the percentage of traffic flow identified in neighboring zones and the actual position error in the vehicles' mobility information. This information can be corrupted by GPS error or by the time elapsed since the information was last received. Apart from this, vehicles use heading information to select the critical neighbors in the domain for the awareness calculation. Load functions are used in the awareness calculation to prioritize the position error of vehicles
with small time headways, which pose a higher safety concern. Security procedures are implemented as defined in the ITS standards: each CAM is signed with an ECDSA-based signature prior to transmission. At the receiver, the CAM is placed in a security first-in-first-out (FIFO) queue and verified when its turn comes. Some vehicle CAMs may not be examined within the packet timeout duration; those CAMs are removed from the queue, resulting in packet loss due to security. Using the produced studies, we identify the relationship among security, QoS, and the safety awareness standards under various road traffic density conditions. In this paper, we analyze the important security and safety concepts in C-ITS.
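The receiver-side security FIFO with packet timeouts described above can be modeled with a small simulation. This is our illustrative model with abstract time units; it is not taken from the ETSI specification, and the function name is an assumption:

```python
def process_security_queue(arrivals, verify_time, timeout):
    """Simulate the security FIFO at a receiver: each CAM waits its turn for
    signature verification; a CAM whose queueing wait exceeds `timeout` when
    its turn comes has expired and is dropped (packet loss due to security).

    arrivals    -- arrival times of CAMs, sorted ascending
    verify_time -- time to verify one signature
    timeout     -- maximum tolerable queueing delay per CAM
    Returns (indices of verified CAMs, indices of dropped CAMs)."""
    verified, dropped = [], []
    clock = 0.0
    for i, t in enumerate(arrivals):
        clock = max(clock, t)        # verifier may be idle until the CAM arrives
        if clock - t > timeout:
            dropped.append(i)        # expired while queued; skip verification
            continue
        clock += verify_time         # occupy the verifier for this CAM
        verified.append(i)
    return verified, dropped
```

The model makes the trade-off visible: when CAMs arrive faster than `1/verify_time`, the queue backlog grows and later CAMs time out, exactly the security-induced loss discussed in the text.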

2.1 Standards of Security in C-ITS

For secure CAM transmission in C-ITS, both the WAVE and the ETSI standards implement a cross-layer security framework. In the WAVE model, the IEEE 1609.2 standard [14] describes the security-related services, whereas ETSI TS 103 097 specifies security in the ETSI standard. Without loss of generality, we describe the ETSI security framework, as the two are largely similar. ETSI TS 103 097 specifies security headers, certificate formats, and security profiles, and it serves as the core security component for ITS applications [15]. The ETSI specification defines these elements to ensure the interoperability of secured messages among the various ITS stations, both vehicles and road side units (RSUs).

2.2 C-ITS Involving Security-QoS-Safety Trade-Offs

Security is a crucial component for maintaining data transmission reliability in C-ITS. However, securing a CAM adds overhead in terms of both size and processing time. At the transmitter, each CAM is signed using the ECDSA-256 with SHA-256 algorithm, which increases the CAM size; this in turn increases the packet transmission time and occupies the wireless channel for longer durations. At the receiver, the CAM is stored in order in the security queue and, before being passed to the application layer, the packet is verified again using the ECDSA-256 with SHA-256 algorithm.


2.3 Capture of Precise Vehicle Safety Awareness

Evaluating vehicle safety awareness is a major task in C-ITS. Each ITS station stores data about moving vehicles and road traffic in a database called the local dynamic map (LDM). The data are obtained from different ITS elements, such as vehicles, RSUs, command centers, and on-board traffic sensors, to build this data store. A vehicle maintains the traffic information of the surrounding zones in its local LDM using CAMs. In addition, a traffic command center or RSU can gather traffic flow information from a huge geographical area to build a global LDM. Using this information, vehicles can identify efficient and optimal routes, and efficient techniques can be developed for managing the traffic flow smartly in the city. A performance indicator is developed for vehicle safety awareness that determines the exactness and timeliness of the information received in CAMs and the precision of the LDM.
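A minimal LDM sketch, assuming a flat per-neighbor store with age-based expiry (real ETSI LDMs are layered and considerably richer; class and method names are ours), could look like this:

```python
import time

class LocalDynamicMap:
    """Minimal LDM sketch: latest CAM state per neighbor, aged out after
    max_age seconds, modeling the 'timeliness' aspect of LDM precision."""

    def __init__(self, max_age=1.0):
        self.max_age = max_age
        self.entries = {}   # station_id -> (timestamp, position, heading)

    def update(self, station_id, position, heading, t=None):
        """Record the newest CAM from a neighbor (t defaults to wall clock)."""
        self.entries[station_id] = (t if t is not None else time.time(),
                                    position, heading)

    def neighbors(self, now=None):
        """Return only the entries still fresh enough to trust."""
        now = now if now is not None else time.time()
        return {sid: e for sid, e in self.entries.items()
                if now - e[0] <= self.max_age}
```

The `max_age` cutoff mirrors how stale CAM data (e.g., after consecutive packet losses) silently degrades the vehicle's view of its surroundings.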

3 Safety Awareness Metrics

To enhance the estimation of the safety awareness of vehicles moving in a zone in C-ITS, we developed a set of metrics for the road infrastructure (RSUs or traffic command centers) and for vehicles. These metrics are off-line safety awareness measures and can be a beneficial tool for investigating C-ITS performance. They are the C-ITS counterparts of the packet delivery ratio and AQL used in vehicular network simulation studies, and they help analyze different protocols and their effect on safety awareness. The hypothesis behind these metrics is that every ITS station has a unique view of vehicle awareness that depends on the applications it supports. In the LDM hierarchy of C-ITS, each ITS station manages its own safety awareness metric. The city-level traffic command center (TCC) is connected with different RSUs, maintains a global LDM, and assesses safety awareness by the percentage of exactly identified zones surrounding the road side units. Each RSU, in addition, maintains a local LDM that measures the precision of the safety information collected from CAMs for zone-based traffic flow awareness. Lastly, vehicles, which require the information for driving decisions, maintain their own LDMs; their awareness measurement can be further refined by ranking errors based on severity, since some errors cause higher safety concern.


3.1 Infrastructure-Centric Standards for Safety Awareness We suggested a standard known as standardized error base for safety awareness level ˆ (SAL) used for calculating awareness on traffic safety. We can observe that RSU maintaining LDM consists of exhibiting positions of overall vehicles within the extent of its communication area through CAMs. A false view of surrounding traffic is provided at the RSU with an outdated LDM is due to the inaccuracies in GPS, and packet collisions are the result. This result in generating QoS for infrastructure ˆ enhance supportive applications might be decreased. The metric standards of (SAL) the safety awareness measures by considering the received CAM and the position ˆ errors in it. To estimate (SAL), every RSU j first analyzes the error at position ∈kT ( j, n) of a single neighbor n within the zone k and at time “T ” that is exact variation of the distances among original and suggested neighboring position is calculated as follows:    T  T 2 2 T T T ∈ K ( j, n) xact (n) − xadv ( j, n) + yact (n) − yadv ( j, n)

3.2 Awareness Metrics Based on Vehicle-Centric Safety This is a heading-based filtration mechanism, it just accepts vehicles that are critical from the neighboring zone, i.e., those have accident threats on potentiality. It follows a loaded scheme that gives priority to the positions basing on errors from surrounding zones vehicles basing on their time variations. Filtration basing on vehicle heading: Every vehicle maintains its heading information and that of its neighbors in the LDM. This heading information is provided either by the GPS or calculated based on the current and the previous position of the vehicles. Heading can be measured with respect to the true north. The headingbased filtering algorithm is shown in Algo. 1. For each neighbor vehicle n, vehicle i finds the absolute differences of headings used in HEADING_DIFF process in Algo. 1. This process gives result with the smallest variation among these headings that ranges between 0° and 180°. In this scenario, mod (x1 + x2 ) represents x1 modulo x2 . If the difference among these heading is below 45°, we presume that surrounding zones vehicles travel in the similar route with the similar route like vehicles A and D. Awareness estimation can be achieved by the neighboring vehicles for observing vehicles that are traveling in the similar segment but they are movement is in reverse direction. The another rule in the NEIGHBOR_FILTERING process is represented in Algo. 1. All neighboring vehicles which are present below the 45° arc but differed by 180° such as vehicles B and C are included in this condition. The conditions of vehicle i are fulfilling by surrounding vehicles except the vehicles that have passed the route earlier like vehicle C. This is achieved by utilizing the changed vector Din , and it is calculated with the help of GPS as shown in algorithm. At last, all surrounding

Efficiency Optimization of Security and Safety in Cooperative …

249

vehicles in other segments, such as vehicles E and F, are discarded. Ignoring these vehicles is safe because, their routes being different, they pose no danger to vehicle i; an error in the position information of those vehicles would not change vehicle i's awareness, and vehicle i can afford to receive their updates at a lower frequency.

Algorithm 1: Heading-based neighbor filtering for awareness evaluation at vehicle i

procedure NEIGHBOR_FILTERING()
  for n = 1 to V_k^T(i) do
    if HEADING_DIFF(h_i, h_n) < 45 then
      keep neighbor n                              -- same direction
    else if HEADING_DIFF(h_i, h_n) > 157.5 then    -- within 22.5° of opposite
      if D_in > 0 then
        keep neighbor n                            -- oncoming, not yet passed
      else
        discard neighbor n                         -- already passed
      end if
    else
      discard neighbor n                           -- different segment
    end if
  end for
end procedure

procedure HEADING_DIFF(h_i, h_n)
  diff = h_i - h_n
  hdiff = mod(diff + 180, 360) - 180
  return |hdiff|
end procedure
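The filtering rules of Algorithm 1 can be implemented compactly; the 45° threshold follows the text, the "within 22.5° of opposite" test is our reading of the garbled condition in the original, and the function names and neighbor data layout are assumptions:

```python
def heading_diff(h_i, h_n):
    """Smallest absolute angle between two headings in degrees, in [0, 180]."""
    d = (h_i - h_n + 180.0) % 360.0 - 180.0
    return abs(d)

def filter_neighbors(neighbors, h_i):
    """Keep same-direction neighbors (difference < 45°) and oncoming
    neighbors (within 22.5° of the opposite direction) that have not yet
    passed (approach indicator D_in > 0); discard everything else.

    neighbors -- dict: id -> (heading_degrees, D_in)"""
    kept = []
    for n, (h_n, d_in) in neighbors.items():
        diff = heading_diff(h_i, h_n)
        if diff < 45.0:
            kept.append(n)                  # vehicles like A and D
        elif diff > 157.5 and d_in > 0:     # vehicles like B (not C)
            kept.append(n)
    return kept
```

The `mod(diff + 180, 360) - 180` trick folds any raw difference into (-180°, 180°] before taking the absolute value, so headings such as 10° and 350° correctly yield 20°.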

This algorithm enhances the traditional quality of service (QoS)-based Web composition scheme so that user requirements are acquired to a great extent. Its operation is depicted in Fig. 1. The flowchart shows the optimized cooperative ITS communication for QoS. Initially, a C-ITS application that provides safety is selected. The application data are processed using the semantic matching algorithm, and the VANET similarity index compares the data coming from the semantic matching block. After that, the parameters are evaluated from the obtained data using the C-ITS parameters. Finally, the data and parameters are updated.

Fig. 1 Optimized cooperative ITS system communication for QoS

4 Results

The results are classified into three parts, concentrating on network response time, security response time, and safety awareness performance.

4.1 Network Performance

For network performance, we present the performance results without the security mechanism in place. Hence, CAMs of 200 bytes are transmitted without a signature.

4.1.1 Inter-arrival Time of Packet

The inter-arrival time of the packets is presented here. Ideally, the inter-arrival time should equal the CAM generation interval of 100 ms. Nevertheless, packet loss due to collisions results in noticeably larger inter-arrival times. Specifically, we observe inter-arrival times of 20–190 ms as the transmitter–receiver separation k ranges from 50 to 350 m. At a vehicle density of 1200 vehicles/km², the inter-arrival time rises to 912 ms at a separation distance of 500 m. This indicates that, on average, 9 consecutive CAMs from a vehicle may be lost at high vehicle density within communication range, thereby impacting awareness. However, vehicles separated by more than 300 m pose no serious danger to safety applications, so this does not degrade the effective performance standards.

4.1.2 Packet Delivery Ratio

The packet delivery ratio is depicted in Fig. 2. For a given vehicle density, the fraction of successfully received packets decreases as the separation between transmitter and receiver grows. Within a 100 m neighborhood, CAMs are delivered at rates above 60% across the considered vehicle densities. At 150 m, the probability of receiving packets from neighboring vehicles drops to 48% at a density of 1200 vehicles/km². At separations up to 300 m, the delivery ratio falls to 12%, which is not critical for safety applications. As k increases, packet loss grows because of the higher collision rate caused by hidden nodes and the propagation loss due to fading.

Fig. 2 Delivery ratio of packets at various densities of vehicles


Table 1 Packet delivery ratio depending on vehicle densities

Distance (m)   100/km²   200/km²   600/km²   1200/km²
50             0.9       0.8       0.7       0.6
100            0.8       0.7       0.6       0.5
150            0.7       0.6       0.5       0.4
200            0.6       0.5       0.5       0.3
250            0.5       0.4       0.3       0.2
300            0.4       0.3       0.2       0.1

Fig. 3 Accuracy and security of cooperative ITS system communication for QoSs

The values read off the above graph are summarized in Table 1. Figure 3 shows the accuracy and security of the cooperative ITS system communication for QoS. From this figure, it can be observed that the accuracy of the system is very high and that strong security is provided for the processed data.

5 Conclusion

This paper analyzes the relationship between reliability and safety based on QoS for C-ITS. It enables a precise estimation of safety-awareness measures through novel metrics that take the accuracy of LDM information into account, along with the number of CAMs received from the vehicles. Position errors of surrounding vehicles are considered at the infrastructure nodes for predicting average awareness. A vehicle-heading-based filtering mechanism that retains only safety-critical neighbors is suggested for computing awareness.


In addition, vehicles use weighted position errors and their average value to analyze the awareness measures. Close neighbors are weighted more heavily to satisfy stringent safety requirements and concerns. These metrics assign headway time intervals to neighboring zones based on the error weights. The presented results show that the suggested metrics represent safety awareness better than the metrics currently used for vehicular ad hoc networks (VANETs). This supports the development of higher-standard safety methods for C-ITS and allows analysis of the effect of security signing and verification speed on the awareness of vehicles moving at various times and under various road conditions. From the outcome, we observe that a 1.5 GHz processor with average signing/verification speed may lead to insufficient awareness of neighboring traffic at the ITS stations, which is not acceptable for C-ITS safety applications. Hence, a 3 GHz processor should be used, together with dedicated congestion-control techniques, to maintain the desired standards of QoS and safety-awareness measures.


Classification of Medicinal Plants Using Machine Learning

Rohit Sunil Meshram and Nagamma Patil

Abstract Nowadays, people lack information about the plants around them and their medicinal value. If someone wants to learn about medicinal plants, they must consult a person with deep knowledge of these plants and their uses. To address this problem, current technology can provide a tool that helps ordinary people learn more about medicinal plants, and many machine learning techniques can be applied to classify medicinal plants with high accuracy. Many species of medicinal plants exist on Earth, but classifying a particular medicinal plant is very difficult without prior knowledge of the plants. Information about medicinal plants has been collected by scientists and local communities. Such knowledge is generally passed from generation to generation, and the information and its contents may change along the way. Given the current situation, machine learning technology can be used to build a tool that helps solve the medicinal plant classification problem: after feature extraction, a machine learning model can readily classify medicinal plants.

1 Introduction

Classification of plant species is very helpful for a variety of users, such as botanists, physicians, pharmaceutical laboratories, and the general public. Consequently, many researchers have taken an interest in this domain by developing automated systems for plant species classification. A wide variety of plant species can be found on Earth, some of which have medicinal value, which makes this domain very important. Some plant species are close to extinction and call for conservation, but for that we first have to classify the plant species correctly.

R. S. Meshram (B) · N. Patil
Department of Information Technology, National Institute of Technology Karnataka, Surathkal, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_24

To classify a plant species, first



we have to find an expert botanist, who will classify the plant based on his knowledge. The most reliable method for classifying plant species is, in fact, manual classification: we identify a species from the physical appearance and texture of the leaf. These are therefore the leaf characteristics, used by experts to identify a particular species, that must be considered when developing an automated system. Machine learning is a subset of artificial intelligence. ML models learn from the data available to them and predict unseen data with some accuracy. There are two types of machine learning techniques:
• Supervised Learning
• Unsupervised Learning
In supervised learning, the model learns from a labeled dataset, which provides the answer key the algorithm uses to evaluate its accuracy. In unsupervised learning, the model analyzes and clusters an unlabeled dataset by finding hidden patterns in it.

2 Related Works

To protect plants, or simply to obtain information about them, we first need to classify the plant species correctly. In many works the plant is classified from texture-based features [1]. Considering only one type of feature is not sufficient, so in other works the researchers also consider additional feature types, such as shape-based features [2, 3]. Besides leaf features, it is very important to preprocess the dataset before it is given to the model for training; hence many works give special importance to dataset preprocessing, which provides clean data to the model [4]. These works also have limitations, such as the small number of plant species considered for classification [5, 6]. Current work on medicinal plant classification also makes use of transfer learning for deep-neural-network-based plant classification. Transfer learning is attractive because the pretrained models have been trained on millions of images across roughly 1000 classes; such models include ResNet50, VGG16, and Inception. The objective of our project is to classify medicinal plants using the best machine learning models (Table 1).

3 Methodology

A. SVM

SVM stands for Support Vector Machine, a supervised machine learning algorithm. SVM provides data analysis for regression and classification. The

Table 1 Literature survey

Authors | Methodology | Advantages | Limitations
Sathwik and Yasaswini [4] | Classification of selected medicinal plant leaves using texture analysis | Classification is based on textural features and produces better results | Boundary recognition is missing
Venkataraman [7] | Computer vision-based feature extraction of leaves for classification of medicinal values of plants | Shape features used | Textural features are missing
Sandeep Kumar [8] | Leaf feature-based approach for automated classification of medicinal plants | Preprocessing is done, followed by image segmentation and feature extraction | Only 10 different plant species are considered
Liu and Huang [9] | Plant leaf recognition | Well-aligned images with constant background and nearly no color variation | Method holds for clean images only

SVM algorithm creates the best possible decision boundary that separates the N-dimensional space into different classes by placing each data point in the correct class. This decision boundary is called a hyperplane, as shown in Fig. 1: the figure depicts the maximum-margin hyperplane and margins for an SVM trained with samples from two classes, where the samples lying on the margin are called support vectors.

(1) Preprocessing: Data preprocessing is a crucial step, as it provides a cleaned and relevant dataset that can then be used in further steps such as classification or regression. The preprocessed data will be given to the SVM to predict a given image. The preprocessing steps are as follows:

(1) RGB to grayscale image conversion.
(2) Gaussian filter for image smoothing.
(3) Otsu's thresholding method for adaptive image thresholding.
(4) Morphological transformation for closing holes.
(5) Contours for boundary extraction.
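Step (3), Otsu's adaptive thresholding, would normally come from a library such as OpenCV (`cv2.threshold` with the Otsu flag); the standalone NumPy sketch below only shows what the step computes, namely the gray level that maximizes the between-class variance of the image histogram:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level that maximizes between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = np.dot(np.arange(256), hist)  # sum of all pixel intensities
    w_b = sum_b = 0                         # background weight and intensity sum
    best_t, best_var = 0, 0.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b                   # foreground weight
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b                   # background mean
        m_f = (sum_all - sum_b) / w_f       # foreground mean
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

For a leaf image with a roughly bimodal histogram (dark leaf, bright background), the returned threshold separates the two modes, giving the binary mask used in steps (4) and (5).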

(2) Feature Extraction: In feature extraction, the total number of features is reduced by creating a new dataset from the features of the available dataset. When a dataset contains a large number of features, it becomes difficult to fit the model to it, so we keep only the features that have the greatest impact on the prediction (Fig. 2).
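As a small illustration of this step (a sketch, not the paper's exact code; the helper names are ours), a few of the shape- and color-based descriptors can be computed directly from a binary leaf mask and an RGB image with NumPy:

```python
import numpy as np

def color_features(rgb):
    """Mean and standard deviation of each of the R, G, B channels."""
    return [float(f(rgb[..., c])) for c in range(3) for f in (np.mean, np.std)]

def shape_features(mask):
    """A few of the shape descriptors used for leaves, from a binary mask.
    Perimeter-based descriptors such as circularity need a contour and are
    omitted in this NumPy-only sketch."""
    area = float(mask.sum())
    rows, cols = np.nonzero(mask)
    length = float(rows.max() - rows.min() + 1)   # physiological length
    width = float(cols.max() - cols.min() + 1)    # physiological width
    aspect_ratio = length / width
    rectangularity = area / (length * width)      # how much of the box is leaf
    return [area, length, width, aspect_ratio, rectangularity]
```

Concatenating such vectors per image yields the tabular dataset on which the classifier is later trained.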


Fig. 1 Support vector machine

Fig. 2 Block diagram of proposed methodology

The following leaf features are extracted from the preprocessed image [10, 11]:
(1) Shape-based features: physiological length, physiological width, area, perimeter, aspect ratio, rectangularity, circularity.
(2) Color-based features: mean and standard deviation of the R, G, and B channels.
(3) Texture-based features: contrast, correlation, inverse difference moment, entropy.
(3) Model Building and Testing: After preprocessing, the data are ready to use. We first build a model that fulfills the project requirements, train it, and then test it. A Support Vector Machine classifier is used as the model to classify the plant species. StandardScaler is used to scale the features, and GridSearchCV is used for parameter tuning to find better hyperparameters for the model.
B. ResNet50
ResNet stands for Residual Network, a type of Artificial Neural Network (ANN). ResNet works by using skip connections, i.e., jumping over some layers, as shown in Fig. 3. A typical ResNet is implemented with a double or triple


layer. In ResNet there are two main reasons for adding skip connections: the first is to avoid the vanishing-gradient problem, where adding more layers leads to higher training error; the second is that skipping effectively simplifies the network, which speeds up learning because signals propagate through fewer layers. The number 50 in ResNet50 denotes that the network is 50 layers deep.

(1) Identity Block: The identity block is the building block of ResNet50. In the identity block, the activation of a layer is skipped forward to a deeper layer of the network. Theoretically, the training error should decrease as more layers are added to a neural network, but in practice, for a traditional network, the training error starts to increase again after a certain depth. This problem does not occur with ResNet.

(2) Convolutional Block: The convolutional block is almost the same as the identity block, but it has a shortcut path that changes the dimensions so that the input and output match; the identity block is used where the input and output dimensions are unchanged. We are not using transfer learning; instead, we implement ResNet50 in Keras, since transfer learning also has limitations, including negative transfer and overfitting.

C. Dataset

The dataset used is the Flavia leaves dataset, which contains 1833 images of 30 plant species in total, with the breakpoints and names provided for the leaves. After preprocessing, features are extracted from the processed image on the basis of shape, texture, and color (Fig. 4).
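The skip-connection idea behind the identity block can be shown in a few lines of NumPy. This is a conceptual sketch of the residual computation out = ReLU(F(x) + x), not the Keras layers used in our implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def identity_block(x, w1, w2):
    """Minimal identity (residual) block: two weight layers form F(x),
    and the skip connection adds the input back before the final ReLU,
    so the block can fall back to the identity mapping when F(x) ~ 0."""
    fx = relu(x @ w1) @ w2      # F(x), the residual mapping to be learned
    return relu(fx + x)         # skip connection: ReLU(F(x) + x)
```

With w1 = w2 = 0 the block reduces to ReLU(x); this is why stacking such blocks does not degrade training the way stacking plain layers can.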

4 Results and Discussion

A. SVM

(1) Gaussian Filter: A Gaussian filter is a linear filter used to reduce image noise by blurring; it can reduce contrast and blur the edges of the image (Fig. 5).
(2) Boundary Extraction: In boundary extraction, the edge of a binary image is extracted by digital processing of the image; the technique is also used to extract edges of color and grayscale images. The boundary-extracted image is shown in Fig. 6.
(3) Image Background Subtraction: Here we explore background subtraction for plant leaf images captured with a mobile camera, as shown in Fig. 7. The background-subtracted images are then treated as input to the plant leaf classification system, and various filters are applied to these images as well.


Fig. 3 Canonical form of a residual neural network: the activation from layer l−2 skips over layer l−1

(4) Medicinal Plant Classification: For classification of the chosen plant, we use the CSV file generated from the extracted features of the dataset, along with some imported dependencies. After loading the file, we split the dataset into training and testing sets: 70% of the data is used for training and 30% for testing in this project. After the split, we perform feature scaling, which is very important: without it, a machine learning algorithm tends to weigh larger values higher and treat smaller values as lower, regardless of the units of the values. After these adjustments, we apply the SVM to the dataset, and after obtaining predictions we perform parameter tuning. This method, provided by the sklearn library, lets us define a set of candidate values for the model's parameters; it trains on the data and identifies the best estimator from the combinations of parameter values. Dimensionality reduction also brings several advantages: the possibility of overfitting is lower, and the model generalizes better to new data. Reducing the number of input variables of a model, keeping only the important ones, is known as dimensionality reduction; here, Principal Component Analysis (PCA) from linear algebra is used, which performs the reduction automatically. With all these adjustments complete, the model is ready to classify an unseen plant. When testing, we can expect an accuracy of about 90%, considering all the parameters handled during feature extraction.
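The split, scale, reduce, and tune procedure described above can be sketched with scikit-learn. Synthetic data stands in for the feature CSV, and the parameter grid is illustrative rather than the one actually used:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for the CSV of extracted leaf features (rows = leaf images).
X, y = make_classification(n_samples=150, n_features=16, n_informative=10,
                           n_classes=3, random_state=0)

# 70% of the data for training, 30% for testing, as in the text.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

# Feature scaling, PCA dimensionality reduction, then the SVM classifier.
pipe = Pipeline([("scale", StandardScaler()),
                 ("pca", PCA(n_components=8)),
                 ("svm", SVC())])

# GridSearchCV tries each parameter combination and keeps the best estimator.
grid = GridSearchCV(pipe, {"svm__C": [1, 10],
                           "svm__gamma": ["scale", 0.01]}, cv=3)
grid.fit(X_tr, y_tr)
accuracy = grid.score(X_te, y_te)
```

Putting the scaler and PCA inside the pipeline ensures they are fitted only on each training fold during the grid search, avoiding leakage into the held-out folds.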


Fig. 4 Medicinal plant leaf dataset with 1833 images



Fig. 5 Smoothing image using Gaussian filter

Fig. 6 Boundary extraction using contours

B. ResNet50

ResNet50 is commonly used as a model pretrained on millions of images, which gives better and faster predictions of plant species. Here, however, we implement ResNet50 in Keras and train it on a custom dataset. The same dataset is used for both models: 30 medicinal plant species with 1833 images in total [12, 13]. We train this model for 40 epochs with a batch size of 32. After the numbers of training and validation images are determined, the model is trained on the provided dataset. The model accepts images of shape 64 × 64, which can be changed to 224 × 224. After training, testing, and validation, we obtain an accuracy of 91%, as shown in Fig. 9; the corresponding confusion matrix is shown in Fig. 11.


Fig. 7 Background subtraction of plant leaf images captured from a mobile camera

Fig. 8 Performance of SVM

5 Comparison of Results of SVM and ResNet50

After implementing both models on the custom dataset, we can observe the difference in their accuracies, as shown in Figs. 8 and 9. Similarly, the corresponding confusion matrices for both models are shown in Figs. 10 and 11. Figure 12 compares the performance of SVM and ResNet50 on the dataset of 30 medicinal plant species; the performance of the two models is compared using precision, recall, and F1-score for each class of medicinal plants.


Fig. 9 Performance of ResNet50

Fig. 10 Confusion matrix for SVM



Fig. 11 Confusion matrix for ResNet50

6 Conclusion

This work analyzed the prediction accuracy of two different classifiers on the same dataset with 30 classes. Our main goal was to measure the accuracy of the different models on the same dataset and analyze the performance of both. It should be noted that there are very many species of medicinal plants, so the leaves of two different medicinal plants may be nearly indistinguishable in shape and texture; this explains part of the difference between the accuracies of the models. Alternatively, a transfer learning technique could be used to achieve a higher prediction accuracy; the lesson here is that more layers and more training generally yield higher accuracy. From the above results and analysis, we can easily conclude


Fig. 12 Performance of SVM and ResNet50 on dataset classes

that the ResNet50 model implemented with Keras on the custom dataset outperforms the SVM model, achieving an accuracy of 91%, while the accuracy of the SVM is still respectable at 89%. We can also conclude that neural networks are well suited to object categorization problems, which have a wide range of applications and can easily be integrated into various platforms; the desired model can be generated with nominal computational requirements.

References

1. Papineni, K.: BLEU: a method for automatic evaluation of MT. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002
2. Chaki, J., Parekh, R., Bhattacharya, S.: Plant leaf recognition using ridge filter and curvelet transform with neuro fuzzy classifier. In: Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics, New Delhi, India, 2016
3. Turkoglu, M., Hanbay, D.: Leaf-based plant species recognition based on improved local binary pattern and extreme learning machine. Phys. A Stat. Mech. Appl. (2019)
4. Sathwik, R.V.A.G.T., Yasaswini, R.: Classification of selected medicinal plant leaves using texture analysis. In: 4th ICCCNT, IEEE-31661, Tiruchengode, India, July 4–6, 2013
5. Habiba, U., Rasel Howlader, M., Aminul Islam, M., Faisal, R.H., Mostafijur Rahman, M.: Automatic medicinal plants classification using multi-channel modified local gradient pattern with SVM classifier. In: Joint 2019 8th International Conference on Informatics, Electronics and Vision (ICIEV) and 3rd International Conference on Imaging, Vision and Pattern Recognition (IVPR), Spokane, WA, USA, 07 Oct 2019
6. Dileep, M.R., Pournami, P.N.: AyurLeaf: a deep learning approach for classification of medicinal plants. In: TENCON 2019—2019 IEEE Region 10 Conference (TENCON), Kochi, India, 12 Dec 2019
7. Venkataraman, M.N.D.: Computer vision based feature extraction of leaves for identification of medicinal values of plants. In: IEEE International Conference on Computational Intelligence and Computing Research, Chennai, India, 15–17 Dec 2016
8. Sandeep Kumar, V.T.E.: Leaf feature based approach for automated identification of medicinal plants. In: International Conference on Communications and Signal Processing, Melmaruvathur, India, 3–5 Apr 2014
9. Liu, A., Huang, Y.: Plant leaf recognition. In: Conference Proceedings (2016)
10. Amuthalingeswaran, C., Sivakumar, M., Renuga, P.: Identification of medicinal plants and their usage by using deep learning. In: Proceedings of the Third International Conference on Trends in Electronics and Informatics (ICOEI 2019), IEEE Xplore, Tirunelveli, India, 10 Oct 2019
11. Manoj Kumar, P., Surya, C.M., Gopi, V.P.: Identification of ayurvedic medicinal plants by image processing of leaf samples. In: IEEE International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), Kolkata, India, 25 Dec 2017
12. Hu, R., Lin, C., Xu, W., Liu, Y., Long, C.: Ethnobotanical study on medicinal plants used by Mulam people in Guangxi, China. J. Ethnobiol. Ethnomed. (2020)
13. Simion, I.M., Casoni, D., Sârbu, C.: Classification of Romanian medicinal plant extracts according to the therapeutic effects using thin layer chromatography and robust chemometrics. J. Pharmaceutical Biomed. Anal. (2019)

Performance Measures of Blockchain-Based Mobile Wallet Using Queueing Model

R. Kavitha, R. Rajeswari, Pratyusa Mukherjee, Suchismita Rout, and S. S. Patra

Abstract Smart, cashless payments through mobile wallets are a recent digital payment alternative in developing countries used to achieve competitive advantage. Blockchain is an emerging digitally managed technology (BT) that allows ubiquitous monetary transactions among untrusted entities. In a blockchain, it is impossible to change records without notifying all participants and validating signatures; hence it overcomes the security challenges that exist in mobile wallets. BT can drive a paradigm shift in which transactions become more efficient through a reduced number of intermediaries and faster payment procedures among the parties. In this paper, we develop a queueing model that evaluates in detail the characteristics of a mobile wallet system and measures the various performances of a blockchain-based mobile wallet system.

1 Introduction

In the current dynamic world, blockchain makes enormous provisions across various industries for performing any type of transaction, mainly payment transactions, approval procedures, auditing processes, and other financial dealings. Blockchains offer a cryptocurrency-backed secure mechanism that enables governments, banks, and other financial institutions to conduct safe digital payments through mobile applications. One recent phenomenon in the industry is the mobile wallet, a mobile app that replaces the physical wallet and enables payment transactions through

R. Kavitha · R. Rajeswari
Department of Commerce, Periyar University, Salem, India
P. Mukherjee
School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, India
S. Rout
Department of CSE, Silicon Institute of Technology, Bhubaneswar, India
S. S. Patra (B)
School of Computer Applications, KIIT Deemed to be University, Bhubaneswar, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_25



the mobile device, where the phone becomes a digital purse. In India, there is a sharp increase in the use of mobile wallets for payments such as fuel recharges, flight bookings, hotel payments, restaurants, and other utility payments. Acceptance of mobile wallets in India is growing very fast, taking a leap in P2P payments as well as in purchasing goods and services, thanks to the easy adoption of smartphones and revolutions in the telecom industry. According to Reserve Bank of India data, total mobile payment transactions increased by 110%, from 124.3 crore in February to 253.2 crore in May this year in India. However, consumers still lag in adoption because of the easy availability of cash and because they place less trust in mobile wallets, with their security concerns, than in trusted banks. Thus, security and trust are the major factors holding back the adoption of mobile wallets, and there is an urgent need for a highly trackable, auditable, and tamper-proof solution for the mobile wallet industry. Blockchain is one such versatile and suitable technology for conducting any type of transaction in a safe and secure manner. This study proposes a conceptual model that uses blockchain in mobile wallets to solve the above-mentioned problem.

2 Literature Review

Blockchain is an extensively explored field used in a variety of use cases such as healthcare, smart agriculture, and smart cities [1–4]. With the Industry 4.0 revolution, the e-wallet has become a favored means of payment and is poised to overtake cash transactions and cash expenditure in the coming years. Because of their convenience and ease of use, mobile payments, e-wallets, and online payments have rapidly become mainstream. In the Bitcoin blockchain, block construction plays an important role. From the transactional perspective, a freshly issued transaction waits in the memory pool of a miner node; next, it is assimilated into a new block; it is ultimately confirmed and appended to the blockchain upon completion of the corresponding mining. Since a block includes many transactions, the block-construction procedure may be modeled as a queueing system with single arrivals and batch service, where a group of requests is served simultaneously [5, 6]. Blockchain systems with bulk service have been studied in [7] using a queueing model, and many systems with both bulk arrival and bulk service have been studied extensively and applied to several use cases [6, 8, 9]. To validate the transaction-confirmation procedure with a queueing model, the service-time distribution must be formulated. In the Bitcoin blockchain, the time interval between successive block generations, which can be designated as the service time for transactions, follows the exponential distribution.
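To make the batch-service idea concrete, a toy discrete-event simulation of such a single-arrival, batch-service queue can be written as follows. This is an illustrative sketch, not code from the cited works, and the parameters are arbitrary:

```python
import random

def simulate_batch_queue(lam, mu, batch_size, n_events=5000, seed=1):
    """Toy M/M^b/1 dynamics: single Poisson arrivals at rate lam; the server
    waits for a full batch of batch_size transactions, then serves the whole
    batch in one exponential service time with rate mu (one block mined)."""
    rng = random.Random(seed)
    clock, queue, served = 0.0, 0, 0
    next_arrival = rng.expovariate(lam)
    next_service = float("inf")          # server idle until a full batch waits
    for _ in range(n_events):
        if next_arrival <= next_service:         # next event: an arrival
            clock = next_arrival
            queue += 1
            next_arrival = clock + rng.expovariate(lam)
            if queue >= batch_size and next_service == float("inf"):
                next_service = clock + rng.expovariate(mu)
        else:                                    # next event: a batch served
            clock = next_service
            queue -= batch_size
            served += batch_size
            next_service = (clock + rng.expovariate(mu)
                            if queue >= batch_size else float("inf"))
    return served, queue
```

The number of transactions served is always a multiple of the batch size, mirroring how confirmations arrive a block at a time rather than one by one.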

Performance Measures of Blockchain-Based Mobile Wallet …


Fig. 1 Proposed blockchain-based mobile wallet

3 Role of Blockchain in Mobile Wallets and Mobile Payments Mobile payments are generally classified into two categories: transactions where the merchant is remote, and payment at the merchant site. They can all be implemented through near-field-communication-based mobile payment, QR-code-based payment, or audio signals. Though many products are available in the market, such as Google Pay, Apple Pay, and Alipay, with different communication channels, they share certain similarities: in the first stage, the mobile device checks the credentials by verifying the account details and possibly credit or debit card information, and in the second stage, the actual payment is processed by the bank. Cryptocurrency, or e-cash, was introduced in the 1990s and was popular, but it relied on a centralized party, the bank, to achieve its design goals. Mobile wallets can leverage key benefits from blockchain, as the wallet is becoming an important and essential tool for mobile users. Blockchain is a sequence of blocks and follows a decentralized mechanism of book-keeping. With the help of a hash function, the blocks are linked to each other, and a proof field is used to verify the validity of the blocks. The basic idea of blockchain is that transactions are embedded in blocks. Due to immutability and public accessibility, there is no chance of double-spending in the system as long as the majority of the participants in the system are honest. The proposed blockchain-based mobile wallet architecture is shown in Fig. 1.
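To make the hash-linked book-keeping idea concrete, it can be sketched as follows. This is an illustrative sketch only, not the proposed wallet's actual implementation; the block fields are simplified and the proof-of-work field is omitted:

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 digest of the block's payload (previous hash + transactions)."""
    payload = {k: block[k] for k in ("prev_hash", "transactions")}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def make_block(transactions, prev_hash):
    """Create a block that stores its own payload hash."""
    block = {"prev_hash": prev_hash, "transactions": transactions}
    block["hash"] = block_hash(block)
    return block

def chain_valid(chain):
    """Valid iff every block's hash matches its payload and links its predecessor."""
    for i, blk in enumerate(chain):
        if blk["hash"] != block_hash(blk):
            return False
        if i > 0 and blk["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True
```

Tampering with any confirmed transaction changes that block's payload hash and breaks the link to every later block, which is what makes such a ledger trackable and auditable.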

4 Model Description and Queueing Model The blockchain-based mobile wallet system can be modeled as a queueing system and is represented in Fig. 2. The arrival rate of transactions to the system is λ. “b”


R. Kavitha et al.

Fig. 2 Queueing model for blockchain-based mobile wallet [10]

number of transactions are brought in a block, and let μ be the service rate. The system is modeled as a Markovian single-server M/M^b/1 queueing system, where the server serves the entire batch of transactions in the queue, waiting until the batch has b transactions. Let πn(t) denote the probability that the system has n transactions at time t; πn−1(t), πn+b(t), and πn(t + h) are defined analogously. The stochastic balance equations can be derived as follows:

πn(t + h) = πn+b(t) · Prob(no arrival and b services) + πn(t) · Prob(no arrival and no service) + πn−1(t) · Prob(one arrival and no service)  (1)

= πn+b(t)(1 − λh)μh + πn(t)(1 − λh)(1 − μh) + πn−1(t)λh(1 − μh)  (2)

[πn(t + h) − πn(t)] / h = πn+b μ − πn λ − πn μ + πn−1 λ,  as h → 0 (terms of order λh·μh vanish)  (3)

In the steady state, the LHS equals 0:

0 = −(λ + μ)πn + πn+b μ + πn−1 λ,  n ≥ b  (4)

0 = −λπn + πn+b μ + πn−1 λ,  1 ≤ n < b  (5)

0 = −λπ0 + πb μ  (6)

The solution of Eq. (4) can be written in the geometric form

πn = C r0^n  (n ≥ b − 1, 0 < r0 < 1)  (7)

Here, this form of πn is applicable only for n ≥ b − 1. From Eq. (6),

πb = (λ/μ) π0 = C r0^b  (8)

Hence, C and πn can be obtained as

C = λπ0 / (μ r0^b),  πn = (λπ0/μ) r0^(n−b)  (n ≥ b − 1)  (9)

To get π0, we use the b − 1 stationary equations given in (4)–(6):

μ πn+b = λπn − λπn−1  (1 ≤ n ≤ b)  (10)

The geometric form for πn (when n ≥ b − 1) can be substituted for πn+b in the previous equation, giving

π0 r0^n = πn − πn−1  (1 ≤ n < b)  (11)

These equations can be solved iteratively starting from n = 1, or it can be noted that

πn = C1 + C2 r0^n  (12)

Direct substitution of (12) into (11) implies

C2 = −π0 r0 / (1 − r0)  (13)

At n = 0, the boundary condition gives


C1 = π0 − C2  (14)

This implies

πn = π0 (1 − r0^(n+1)) / (1 − r0)  (1 ≤ n ≤ b − 1),
πn = (λπ0/μ) r0^(n−b)  (n ≥ b − 1).  (15)

Since Σ_{n=0}^{∞} πn = 1, from Eq. (15) we have

π0 = [ 1 + Σ_{n=1}^{b−1} (1 − r0^(n+1))/(1 − r0) + (λ/μ) Σ_{n=b}^{∞} r0^(n−b) ]^(−1)
   = [ 1 + (b − 1)/(1 − r0) − r0^2 (1 − r0^(b−1))/(1 − r0)^2 + λ/(μ(1 − r0)) ]^(−1)
   = [ (μ r0^(b+1) − (λ + μ) r0 + λ + bμ(1 − r0)) / (μ(1 − r0)^2) ]^(−1)  (16)

where r0 is the unique root in (0, 1) of the characteristic equation μ r^(b+1) − (λ + μ) r + λ = 0, obtained by substituting the geometric form (7) into (4); using this equation, (16) reduces to π0 = (1 − r0)/b.
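The stationary distribution can be checked numerically. The sketch below is not the authors' code; it assumes stability (λ < bμ) and that r0 is the root in (0, 1) of μr^(b+1) − (λ + μ)r + λ = 0, obtained by substituting the geometric form into balance equation (4), under which the normalization reduces to π0 = (1 − r0)/b:

```python
def mmb1_stationary(lam, mu, b, n_max=2000):
    """Stationary distribution of the M/M^b/1 bulk-service queue (truncated at n_max)."""
    assert lam < b * mu, "stability requires lambda < b*mu"
    f = lambda r: mu * r ** (b + 1) - (lam + mu) * r + lam
    lo, hi = 1e-12, 1.0 - 1e-12  # f(lo) > 0 and f(hi) < 0 under stability
    for _ in range(200):         # bisection for the root r0 in (0, 1)
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    r0 = (lo + hi) / 2.0
    pi0 = (1.0 - r0) / b         # closed form implied by Eq. (16)
    pis = [pi0]
    for n in range(1, n_max):
        if n <= b - 1:
            pis.append(pi0 * (1.0 - r0 ** (n + 1)) / (1.0 - r0))  # Eq. (15), 1 <= n <= b-1
        else:
            pis.append(pi0 * lam * r0 ** (n - b) / mu)            # Eq. (15), n >= b-1
    return r0, pis
```

For λ = 1, μ = 1, b = 2 this yields r0 = (√5 − 1)/2 ≈ 0.618, and the probabilities sum to 1 up to truncation, a quick consistency check on Eqs. (15)–(16).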

5 Numerical Results To demonstrate the pertinence of the blockchain-based mobile wallet system, several numerical results have been computed. Table 1 shows the expected waiting time in the queue for the transactions, and Table 2 illustrates the variance of the waiting time in the queue. The expected waiting time decreases when the block size b is larger. The expected waiting time also decreases as λ decreases. Figures 3, 4, 5, and 6 show several performance measures of the system.

Table 1 Expected waiting time in the waiting queue by the transactions

| b  | λ = 14, μ = 6 | λ = 14, μ = 8 | λ = 12, μ = 6 | λ = 12, μ = 8 | λ = 10, μ = 6 | λ = 10, μ = 8 |
|----|---------------|---------------|---------------|---------------|---------------|---------------|
| 1  | 7.27263 | 6.39335 | 6.75678 | 6.13232 | 6.13942 | 5.64558 |
| 2  | 6.27271 | 5.43034 | 5.75674 | 5.10345 | 5.45941 | 4.94046 |
| 3  | 5.77156 | 5.13447 | 5.13458 | 4.82938 | 4.94091 | 4.13249 |
| 6  | 4.72634 | 4.14242 | 4.72634 | 3.67772 | 4.72635 | 3.14543 |
| 12 | 4.17377 | 3.55247 | 4.17374 | 3.13362 | 4.17377 | 2.66452 |
| 24 | 3.67233 | 3.13223 | 3.67238 | 2.86571 | 3.67233 | 2.32566 |

Table 2 Variance of waiting time in the queue by the transactions

| b  | λ = 14, μ = 6 | λ = 14, μ = 8 | λ = 12, μ = 6 | λ = 12, μ = 8 | λ = 10, μ = 6 | λ = 10, μ = 8 |
|----|---------------|---------------|---------------|---------------|---------------|---------------|
| 1  | 48.84842 | 35.47488 | 35.48484 | 35.30302 | 35.38934 | 25.89392 |
| 2  | 35.76374 | 25.93833 | 25.47832 | 25.23035 | 24.59456 | 24.38381 |
| 3  | 25.13321 | 24.73748 | 24.45437 | 17.82387 | 24.39392 | 15.93935 |
| 6  | 15.39396 | 17.57646 | 16.59491 | 15.55342 | 15.39397 | 9.123211 |
| 12 | 16.36467 | 10.48948 | 16.44557 | 9.393936 | 14.39393 | 8.383944 |
| 24 | 15.57473 | 11.54343 | 8.939334 | 10.39307 | 5.293935 | 9.303435 |

Fig. 3 Buffer size versus blocking

6 Conclusion Blockchain and distributed ledger technology saw their inception through cryptocurrencies. Over the years, several research communities and industries have started leveraging this technology in an array of use cases, and many mathematical models have been built to prove the authenticity of blockchain-based applications. Whether these use cases will deliver the required performance is the question of prime interest. To answer this, the current paper implements an M/M^b/1 queueing model that assesses various performance measures of the mobile wallet system. Future endeavors include studies of multiple parameters and their corresponding impact under different distributions.

Fig. 4 Buffer size versus length of the probability system

Fig. 5 Buffer size versus length of the queue



Fig. 6 Buffer size versus waiting time in the system

References
1. Egala, B.S., Pradhan, A.K., Badarla, V.R., Mohanty, S.P.: Fortified-chain: a blockchain-based framework for security and privacy assured internet of medical things with effective access control. IEEE Internet Things J. (2021)
2. Mohanty, S.P., Yanambaka, V.P., Kougianos, E., Puthal, D.: PUFchain: a hardware-assisted blockchain for sustainable simultaneous device and data security in the internet of everything (IoE). IEEE Consum. Electron. Mag. 9(2), 8–16 (2020)
3. Puthal, D., Mohanty, S.P., Kougianos, E., Das, G.: When do we need the blockchain? IEEE Consum. Electron. Mag. 10(2), 53–56 (2021)
4. Patra, S.S., Misra, C., Singh, K.N., Gourisaria, M.K., Choudhury, S., Sahu, S.: qIoTAgriChain: IoT blockchain traceability using queueing model in smart agriculture. In: Blockchain Applications in IoT Ecosystem, pp. 203–223. Springer (2021)
5. Goswami, V., Patra, S.S., Mund, G.B.: Performance analysis of cloud computing centers for bulk services. Int. J. Cloud Appl. Comput. (IJCAC) 2(4), 53–65 (2012)
6. Mukherjee, P., Barik, R.K., Pradhan, C.: A comprehensive proposal for blockchain-oriented smart city. In: Security and Privacy Applications for Smart City Development, pp. 55–87. Springer, Cham
7. Goswami, V., Patra, S.S., Mund, G.B.: Performance analysis of cloud with queue-dependent virtual machines. In: 2012 1st International Conference on Recent Advances in Information Technology (RAIT), pp. 357–362. IEEE (2012)
8. Patra, S.S.: Energy-efficient task consolidation for cloud data center. Int. J. Cloud Appl. Comput. (IJCAC) 8(1), 117–142 (2018)
9. Dutta, A., Bhattacharyya, A., Misra, C., Patra, S.S.: Analysis of encryption algorithm for data security in cloud computing. In: Smart Computing Techniques and Applications, pp. 637–644. Springer, Singapore (2021)
10. Mukherjee, P., Patra, S.S., Barik, R.K., Pradhan, C., Barik, L.: hQChain: leveraging towards blockchain and queueing model for secure smart connected health. Int. J. E-Health Med. Commun. (IJEHMC) 12(3) (2021)

Conventional Data Augmentation Techniques for Plant Disease Detection and Classification Systems Srinivas Talasila, Kirti Rawal, and Gaurav Sethi

Abstract A tremendous improvement has been seen in the development of plant leaf disease detection and classification systems using convolutional neural networks over the last few years. The performance of these detection algorithms depends on a large amount and a wide variety of data. However, collecting such data depends on many parameters, such as weather conditions, varying illumination, and the non-occurrence of diseases in a specific period. Data augmentation techniques are essential to overcome this issue. This work discusses various data augmentation techniques applied to enlarge a black gram leaf disease dataset, which is further used to train customized black gram plant leaf disease detection and classification systems.

1 Introduction Deep learning has advanced many computer vision applications, such as image classification and localization, image reconstruction, image synthesis, object detection, object segmentation, and many others. All these applications rely firmly on a large amount and a wide variety of data. Data augmentation is a powerful tool for increasing the generalizability of deep learning models, which is affected by overfitting of the training

S. Talasila (B) · K. Rawal · G. Sethi School of Electronics and Electrical Engineering, Lovely Professional University, Phagwara 144411, Punjab, India K. Rawal e-mail: [email protected] G. Sethi e-mail: [email protected] S. Talasila Malla Reddy College of Engineering and Technology, Hyderabad 500100, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_26



model. Overfitting occurs because of diverse issues such as data classification difficulty, noise in the training images, and sparse datasets. Many techniques are available to increase the generalizability of deep learning models, such as dropout, batch normalization, transfer learning, pretraining, one-shot, and zero-shot learning [1]. But data augmentation tackles the overfitting problem at its root and helps networks reduce the distance between the points of the training set and those of the validation and testing sets. It is well known that deep learning models perform better when trained on massive datasets. However, collecting and labeling such immense datasets in the area of plant disease detection and classification is a difficult, laborious, and expensive task. Recent research has shown that deep learning-based plant disease detection and classification algorithms exhibit greater classification accuracy when trained on larger datasets [2–5] and somewhat lower accuracy when trained on sparse datasets [6]. It is hard to build enormous datasets for identifying and classifying plant diseases because of the non-prevalence of plant diseases at a particular time, researchers' limited knowledge of plant diseases, environmental changes while capturing images in the cultivation fields, etc. Class imbalance is another serious issue to overcome. These difficulties have led to data augmentation in the agriculture sector, particularly in the field of plant disease detection and classification. Data augmentation is the process of creating a wide variety of data from limited available data. One can apply data augmentation techniques manually or artificially to increase a model's performance. Presently, many data augmentation techniques are available, categorized into basic image manipulations and deep learning approaches.
Flipping, cropping, rotating, color transformations, noise injection, kernel filters, and random erasing come under basic image manipulations, while adversarial training, neural style transfer, and GAN-based data augmentation come under deep learning approaches. The use of particular data augmentation techniques depends on data availability and the author's intuition. This section discussed the need for image data augmentation to identify and classify plant leaf diseases; Sect. 2 presents the related works; Sect. 3 presents the methodologies adopted to increase the dataset and their results, followed by the conclusion and future scope in Sect. 4.

2 Related Works Employing deep neural networks to classify plant leaf diseases has been popular since Mohanty et al. [7] achieved remarkable classification accuracy on the PlantVillage database. Joshi et al. [8] employed basic image manipulation techniques such as Gaussian blur, rotation, and scaling to increase the robustness of VirLeafNet, which classifies the yellow mosaic virus into three infection categories (healthy, mild, and severe). Hossain et al. [9] adopted rotation, illumination switching, and vertical/horizontal symmetry augmentation techniques to increase the versatility of their DSCPLD system. Uğuz et al. [10] compared their proposed CNN architecture's


performance with and without data augmentation. Experimental results showed that the success rate is higher when the model is trained on augmented images; for augmentation, the authors considered zooming [0.5, 1.0], brightness [0.2, 1.0], horizontal flip, and 90º rotation. Gnanasekaran et al. [11] utilized the contrast limited adaptive histogram equalization (CLAHE) technique to enhance low-contrast images before applying height and width shifts, zooming, shearing, scaling, cropping, and flipping augmentation techniques; they employed augmentation to overcome the class imbalance issue in their adopted dataset. Karlekar et al. [12] proposed a soybean leaf disease classification network called SoyNet. In this work, the authors adopted scaling blur, rotation blur, PCA jittering, color jittering, and noise-adding augmentation techniques along with traditional rotation, scaling, and flipping to balance the individual classes. Nazki et al. [13] developed an unsupervised image-to-image translation method using generative adversarial networks (GANs) to improve the performance of plant disease detection algorithms. In this work, the authors introduced AR-GAN, which improves on CycleGAN's unsupervised image-to-image translation by introducing an activation reconstruction module consisting of a feature extraction network that calculates an activation reconstruction loss (ARL). Yuwana et al. [14] employed GANs and DCGANs for augmenting data and observed that DCGANs exhibit superior performance with the DenseNet architecture. Bin et al. [15] proposed a novel LeafGAN model to generate synthetic grape leaf disease images; the Fréchet inception distance metric was used to check the quality of the LeafGAN output, and it is evident that the proposed model is better than DCGAN and WGAN. Cap et al. [16] introduced LeafGAN to enhance a cucumber disease dataset. After that, they trained the ResNet-101 model with the images obtained using LeafGAN and vanilla CycleGAN; the experimental results noted that the network achieved better diagnostic performance when trained on LeafGAN outputs. Shin et al. [17] employed the rotation technique with three different angles to study the effect of augmentation on classification accuracy.

3 Materials and Methods 3.1 Dataset The dataset used in this work was collected from natural cultivation fields at Nagayalanka, Andhra Pradesh, India. It was formed by collecting 1000 plant leaf images in five categories (four disease categories and a healthy category) of the black gram crop. The aim of composing this dataset is to design a customized deep learning algorithm for the identification and classification of black gram plant leaf diseases. Firstly, image preprocessing techniques were applied to the acquired images. After that, by applying image segmentation methods, the leaf regions were extracted from


the complex background and were then used as input to the classification algorithms. But deep learning-based classification models give better results when trained on a wide variety of data. For that reason, to enhance the data, the conventional data augmentation techniques discussed in Sect. 3.2 were employed.

3.2 Methodologies Adopted to Enhance the Dataset

3.2.1 Rotation

Rotation is a widely used image manipulation technique that changes the position of all pixels by rotating them through a specific angle θ. Let (x1, y1) be the coordinates of a point in the input image and (x2, y2) the coordinates of the corresponding point in the output image after rotation by a user-specified angle θ about the center of rotation (x0, y0). After rotation, the output image coordinates can be obtained by Eqs. 1 and 2:

x2 = (x1 − x0) cos θ − (y1 − y0) sin θ + x0

(1)

y2 = (x1 − x0 ) ∗ sin θ + (y1 − y0 ) ∗ cos θ + y0

(2)

In this work, 45º, 90º, 135º, 180º, 225º, 270º, and 315º rotation angles were used to enhance the dataset (results shown in Fig. 1a–h).

Fig. 1 Rotation augmentation a original image, b 45º rotation, c 90º rotation, d 135º rotation, e 180º rotation, f 225º rotation, g 270º rotation, h 315º rotation
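The coordinate mapping of Eqs. 1 and 2 can be sketched directly. This is illustrative only; real augmentation pipelines typically apply the (inverse) transform per pixel with interpolation through a library routine such as OpenCV's or PIL's rotation:

```python
import math

def rotate_point(x1, y1, theta_deg, x0=0.0, y0=0.0):
    """Eqs. 1 and 2: rotate (x1, y1) by theta_deg about the centre (x0, y0)."""
    t = math.radians(theta_deg)
    x2 = (x1 - x0) * math.cos(t) - (y1 - y0) * math.sin(t) + x0
    y2 = (x1 - x0) * math.sin(t) + (y1 - y0) * math.cos(t) + y0
    return x2, y2
```

For example, a 90º rotation about the origin maps the point (1, 0) to (0, 1).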


Fig. 2 Mirror symmetry augmentation a original image, b horizontal symmetry, c vertical symmetry

3.2.2 Mirror Symmetry

Mirror symmetry creates a reflected replica of an image that looks similar to the input image but is reversed in the direction perpendicular to the mirror axis. The horizontal mirror symmetry of an image is obtained by mirror-reflecting the input image across the horizontal axis, and is also called flipping (shown in Fig. 2b). The vertical mirror symmetry is obtained by mirror-reflecting the input image across the vertical axis, and is also called flopping (shown in Fig. 2c). Let us consider an input image f(x0, y0) having width ω and height h; after employing horizontal and vertical mirror symmetry, the new pixel coordinates are given by Eqs. 3 and 4, respectively:

x1 = ω − x0,  y1 = y0  (3)

x1 = x0,  y1 = h − y0  (4)
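Treating a grayscale image as a list of pixel rows, the two symmetries of Eqs. 3 and 4 reduce to slice reversals. A sketch (note that the equations use image dimensions directly, while the code below is 0-indexed, so the mapping is w − 1 − x0 and h − 1 − y0):

```python
def mirror_horizontal(img):
    """Left-right reversal of each pixel row: x1 = w - 1 - x0 (cf. Eq. 3)."""
    return [row[::-1] for row in img]

def mirror_vertical(img):
    """Top-bottom reversal of the row order: y1 = h - 1 - y0 (cf. Eq. 4)."""
    return img[::-1]
```

For a 2×3 grid [[1, 2, 3], [4, 5, 6]], the horizontal mirror reverses each row and the vertical mirror swaps the two rows.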

3.2.3 Illumination Corrections

Plant leaf disease detection algorithms provide better results when trained on real cultivation field images. But images captured under cultivation conditions may contain external disturbances like shadows, sand, dust, daylight variations, and foggy weather. To increase robustness and to enhance the information in the acquired images, sharpening, brightness, and contrast image processing techniques were used. Sharpening creates an image with highlighted edges and exemplary attributes of the input image. It is mainly used to overcome blurring and to focus on specific regions, improving the quality of the input image. For sharpening the input image f(x, y), the second-order-derivative Laplacian is applied according to Eq. 5:

s(x, y) = f(x, y) + k [∇²f(x, y)]

(5)


Fig. 3 Illumination correction augmentation a original image, b–e using different contrast limits for input and output image

where s(x, y) is the sharpened image and k may be 1 or −1. Brightness adjustment simply changes the intensity value of all pixels in an image by a constant: the image becomes brighter if a positive value is added to all pixel values and darker if a negative value is added. Contrast, on the other hand, is the difference between the brightness of objects in an image. By changing the slope of the transfer function, one can adjust the contrast: a slope larger than '1' increases the contrast of an image, while a slope less than '1' decreases it. Figure 3 depicts the original image and its illumination-corrected versions using different contrast limits for the input and output images.
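Both operations can be sketched on a grayscale image stored as nested lists. The 4-neighbour Laplacian kernel and the mid-grey pivot of 128 for the contrast transfer function are common conventions assumed here, not choices fixed by the text:

```python
def laplacian_sharpen(img, k=-1):
    """Eq. 5: s = f + k * Laplacian(f), with the 4-neighbour Laplacian
    f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4*f(x,y).
    Borders are copied unchanged; no clipping is shown."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x])
            out[y][x] = img[y][x] + k * lap
    return out

def brightness_contrast(img, slope=1.0, offset=0):
    """Linear transfer function about mid-grey: slope > 1 raises contrast,
    slope < 1 lowers it; offset shifts brightness. Output clipped to [0, 255]."""
    adjust = lambda p: min(255, max(0, round(slope * (p - 128) + 128 + offset)))
    return [[adjust(p) for p in row] for row in img]
```

With k = −1, a pixel brighter than its neighbours is amplified further, which is the edge-highlighting effect described above.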

3.2.4

Random Shifting or Translation

With the help of shifting operations, one can shift all pixels of an image in one direction, either horizontally or vertically, while keeping the same image dimensions. For example, in plant disease datasets most images are captured with the leaf at the center, so a network trained on these images may give good results only on that type of image. To overcome this positional bias, shifting or translation techniques are helpful. Figure 4 depicts the outcomes of a vertical shift and a horizontal shift of an input image.

Fig. 4 Shifting or translation augmentation a original image, b vertical shift by 250 pixels, c horizontal shift by 250 pixels
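A minimal sketch of the translation operation on a nested-list image; the fill value for vacated positions is an assumption (black here), and pixels shifted outside the frame are discarded:

```python
def translate(img, dx=0, dy=0, fill=0):
    """Shift the image by dx columns and dy rows while keeping its dimensions;
    vacated positions take `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out
```

For instance, shifting [[1, 2], [3, 4]] one column to the right yields [[0, 1], [0, 3]].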

3.2.5 Noise Injection

Data collected from the cultivation field are seldom clean, yet we train deep learning models with clean data, which results in poor performance during testing. So, adding noise to the input data helps enhance the data and gives better results on noisy data as well. Noise augmentation of the training data increases the robustness of the training process and minimizes the generalization error. In this work, Gaussian, Poisson, salt-and-pepper, and speckle noises were adopted for augmentation. The most commonly used noise is Gaussian noise, also known as electronic noise, generated by adding a random Gaussian function to the image. Gaussian noise has a probability density function equal to that of the normal distribution, as shown in Eq. 6:

p(z) = (1 / (σ√(2π))) e^(−(z−μ)² / (2σ²))

(6)

where p is the probability density function of the Gaussian random variable z, with mean μ and variance σ². Here, to generate Gaussian noise on the input, zero-mean Gaussian white noise with variance 0.01 was considered. Poisson noise, also referred to as shot noise or photon noise, is a fundamental type of uncertainty related to the measurement of light, inherent to the quantized nature of light and to independent photon detections. Poisson noise has the probability mass function shown in Eq. 7:

P(k) = λ^k e^(−λ) / k!

(7)

where λ is a positive number equal to the expected number of occurrences per unit time. Salt-and-pepper noise is a sort of impulse noise applied to an image by adding both random bright and random dark pixels throughout the image. It is also called data-drop noise, because it is obtained by randomly dropping original values. This noise degrades the image quality, as pixel values are randomly changed to either the maximum (white, salt) or the minimum (black, pepper), i.e., '255' or '0', respectively. Speckle noise, also called multiplicative noise or granular noise, is intrinsically present in an image and deteriorates its quality. It is generated by multiplying random values with different pixels of an image. Its probability distribution function follows the gamma distribution shown in Eq. 8:

F(g) = g^(α−1) e^(−g/a) / ((α − 1)! a^α)

(8)

Figure 5 represents the outcomes of Gaussian, Poisson, salt and pepper, and speckle noises of an input image.


Fig. 5 Noise injection augmentation a Gaussian noise, b Poisson noise, c salt-and-pepper noise, d speckle noise
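Two of the noise models above can be sketched as follows for an image normalised to [0, 1]; the clipping back into range and the 50/50 salt-versus-pepper split are assumptions of this sketch:

```python
import random

def gaussian_noise(img, mean=0.0, var=0.01):
    """Add zero-mean Gaussian white noise (Eq. 6) to an image in [0, 1],
    clipping the result back into range."""
    sd = var ** 0.5
    return [[min(1.0, max(0.0, p + random.gauss(mean, sd))) for p in row]
            for row in img]

def salt_and_pepper(img, amount=0.05):
    """Flip roughly a fraction `amount` of pixels to 0.0 (pepper) or 1.0 (salt)."""
    out = [row[:] for row in img]
    for row in out:
        for x in range(len(row)):
            if random.random() < amount:
                row[x] = random.choice([0.0, 1.0])
    return out
```

With variance 0.01, as used in this work, the standard deviation of the added Gaussian noise is 0.1 on the normalised intensity scale.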

Several other techniques are available besides the above augmentation techniques, such as color jittering, image mixing, and random erasing. The reason for not applying them is that they may mislead the required disease information in the images. In this work, only basic augmentation techniques were considered; as an extension, we intend to employ deep learning-based augmentation techniques in the near future.

4 Conclusion The success rate of CNN-based plant disease identification and classification systems depends on data diversity. But collecting such an amount of diverse data is time-consuming and laborious. In this situation, data augmentation plays a major role and can improve the performance of the selected models. This work discusses the available traditional data augmentation techniques that were employed for enhancing the dataset. The presented work will help readers expand their narrow datasets and gain knowledge of conventional data augmentations that can improve the performance of their models.

References
1. Shorten, C., Khoshgoftaar, T.: A survey on image data augmentation for deep learning. J. Big Data 6 (2019). https://doi.org/10.1186/s40537-019-0197-0
2. Ferentinos, K.P.: Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 145, 311–318 (2018). https://doi.org/10.1016/j.compag.2018.01.009
3. Kamal, K.C., Yin, Z., Wu, M., Wu, Z.: Depthwise separable convolution architectures for plant disease classification. Comput. Electron. Agric. 165. https://doi.org/10.1016/j.compag.2019.104948
4. Arsenovic, M., Karanovic, M., Sladojevic, S., Anderla, A., Stefanović, D.: Solving current limitations of deep learning based approaches for plant disease detection. Symmetry 11, 21 (2019). https://doi.org/10.3390/sym11070939
5. Too, E., Yujian, L., Njuki, S., Yingchun, L.: A comparative study of fine-tuning deep learning models for plant disease identification. Comput. Electron. Agric. https://doi.org/10.1016/j.compag.2018.03.032
6. Srivastava, S., Kumar, P., Mohd, N., Singh, A., Gill, F.: A novel deep learning framework approach for sugarcane disease detection. SN Comput. Sci. 1, 87 (2020). https://doi.org/10.1007/s42979-020-0094-9
7. Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016). https://doi.org/10.3389/fpls.2016.01419
8. Joshi, R., Kaushik, M., Dutta, M., Srivastava, A., Choudhary, N.: VirLeafNet: automatic analysis and viral disease diagnosis using deep-learning in Vigna mungo plant. Ecol. Inform. 61. https://doi.org/10.1016/j.ecoinf.2020.101197
9. Hossain, S.: Plant leaf disease recognition using depth-wise separable convolution-based models. Symmetry 13 (2021). https://doi.org/10.3390/sym13030511
10. Uğuz, S., Uysal, N.: Classification of olive leaf diseases using deep convolutional neural networks. Neural Comput. Appl. 33. https://doi.org/10.1007/s00521-020-05235-5
11. Gnanasekaran, S., Opiyo, G.: A predictive machine learning application in agriculture: cassava disease detection and classification with imbalanced dataset using convolutional neural networks. Egypt. Inf. J. 22. https://doi.org/10.1016/j.eij.2020.02.007
12. Karlekar, A., Seal, A.: SoyNet: soybean leaf diseases classification. Comput. Electron. Agric. 172, 105342 (2020). https://doi.org/10.1016/j.compag.2020.105342
13. Nazki, H., Yoon, S., Fuentes, A., Park, D.: Unsupervised image translation using adversarial networks for improved plant disease recognition. Comput. Electron. Agric. 168. https://doi.org/10.1016/j.compag.2019.105117
14. Yuwana, R., Fauziah, F., Heryana, A., Krisnandi, D., Kusumo, R., Pardede, H.: Data augmentation using adversarial networks for tea diseases detection. Jurnal Elektronika dan Telekomunikasi 20, 29 (2020). https://doi.org/10.14203/jet.v20.29-35
15. Bin, L., Tan, C., He, J., Wang, H.: A data augmentation method based on generative adversarial networks for grape leaf disease identification. IEEE Access, 1–1 (2020). https://doi.org/10.1109/ACCESS.2020.2998839
16. Cap, Q., Uga, H., Kagiwada, S., Iyatomi, H.: LeafGAN: an effective data augmentation method for practical plant disease diagnosis. IEEE Trans. Autom. Sci. Eng., 1–10 (2020). https://doi.org/10.1109/TASE.2020.3041499
17. Shin, J., Chang, Y., Heung, B., Nguyen-Quang, T., Al-Mallahi, A., Price, G.: Effect of directional augmentation using supervised machine learning technologies: a case study of strawberry powdery mildew detection. Biosyst. Eng. 194. https://doi.org/10.1016/j.biosystemseng.2020.03.016

Directionality Information Aware Encoding for Facial Expression Recognition A. Vijaya Lakshmi and P. Mohanaiah

Abstract Facial expression recognition has gained huge research interest in the field of computer vision. However, the main issue is designing an appropriate face descriptor that can capture the facial variations of different expressions. Toward this objective, in this paper we propose a new face descriptor based on facial muscle movements. We employ the Robinson compass mask to capture these movements and derive the edge responses in eight directions. Next, we propose a new texture encoding scheme based on the local binary pattern, called the direction aware binary pattern (DABP), which encodes the direction and magnitude of the edge responses. Finally, the face is described through a histogram descriptor and fed to the KNN algorithm for classification. For experimental validation, we simulated the proposed approach over the FER-2013 dataset, and the performance is measured through recognition accuracy.

1 Introduction In the current world, facial expressions play an important part because they carry a huge amount of information regarding human beings' intentions. The amount of information conveyed by facial expressions in interpersonal communication is not possible with traditional elements like voice and hand gestures [1]. A major drawback of the conventional electronic intelligent teaching system is that feedback is collected from students in the form of voice. This is not a viable solution, and the shortcomings of such a system are solved through facial expressions. For this reason, facial expression recognition (FER) has gained huge interest in computer vision research. FER has widespread applications in several

A. Vijaya Lakshmi (B) Department of ECE, JNTUA, Anantapur, A.P, India P. Mohanaiah Department of ECE, N.B.K.R Institute of Science and Technology, Vidyanagar, A.P, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_27


fields like human–computer interaction technology, the medical field, distance education, automatic counseling systems, facial expression synthesis, lie detection, music for mood, fatigue detection, and computer-aided training [2]. For instance, interactions between robots and human beings become much more flexible if the robot is able to recognize facial expressions; similarly, robots can provide physical assistance to patients who are not able to talk. Additionally, FER technology has also enabled novel achievements in film animation and the synthesis of facial images. Hence, FER has become significant for today's society, which motivated us to develop an automatic FER system with high recognition efficiency. Typically, the FER system is composed of two stages: face image representation and classification. In the former stage, the input face image is described with descriptors such that the hidden properties of facial expressions are revealed. In the next stage, the classifier classifies the emotion based on the descriptor. The face descriptor must be designed in such a way that the FER system becomes robust to facial variations like noise, facial deformations, lighting conditions, and facial poses. Moreover, the face descriptor must ensure greater inter-class variance. Most researchers in FER have concentrated on the design of an efficient face descriptor. According to the theme of feature descriptor design, the earlier methods are broadly classified into appearance-based methods and geometric-based methods [3]. The appearance-based methods consider template matching, and every template is treated as a vector of pixels. On the other hand, geometric-feature-based FER has considered "principal component analysis (PCA)" [4] and "multi-layer neural network (ML-NN)" approaches to obtain a lower-dimensional feature space for face images.
However, the computational complexity of feature-based methods is observed to be larger than that of methods based on template matching, while the latter are sensitive to the position, orientation, size, and scale of the face image [5]. Local binary pattern-based FER [6] is one of the most widely employed feature extraction techniques for FER. Even though LBP features have shown significant performance in the recognition of facial expressions, LBP is observed to suffer from considerable information loss. To solve this problem, we propose a new face descriptor based on LBP. Our approach employs an enhanced version of LBP to encode the facial features. Before encoding, the input facial image is preprocessed to extract features in multiple orientations. For this purpose, we employ a strong edge detector, the Robinson edge operator, in eight directions. Once the edge features are calculated for all directions, we apply a texture encoding technique based on the directionality information. Finally, the entire face is described through a histogram descriptor and fed to a classifier. For classification, we employ a simple and effective classifier, the KNN classifier. The rest of the paper is structured as follows: a detailed literature survey is presented in Sect. 2; the full details of the proposed FER method are given in Sect. 3; simulation experiment details are deliberated in Sect. 4; and concluding remarks are provided in the final section.

Directionality Information Aware Encoding for Facial Expression …


2 Literature Survey

Many appearance-based approaches have been proposed for expression recognition from facial images. They can be categorized into global and local representations. In global representation, the entire image is considered, while in local representation, the image is first segmented into local divisions, which are then subjected to feature representation. Fisherfaces [7] and Eigenfaces [8, 9] are the best examples of global representation, in which the input images are treated as 1D vectors; however, they are not robust to pose and illumination variations in the facial image. On the other hand, local representations are more effective: they can explore the hidden properties of images such that the FER system becomes robust and can recognize emotions in any type of environment. LBP [10–12] is one of the most popular and effective local representation methods; it encodes the texture information of the facial image. Recently, Cao et al. [13] applied LBP over the facial image to encode the facial texture variations. For LBP calculation, the input facial image is preprocessed and segmented into equal-sized non-overlapping regions. Once the image is represented through binary patterns, they are fed to a "support vector machine (SVM)" for classification. Similarly, Kauser and Sharma [14] also applied segmentation over the facial image, but they segmented facial parts such as the mouth, nose, and eyes. Over every facial part, they applied LBP and represented it with local textures; for classification, they used an artificial neural network. Goyani and Patel [15] proposed an extension of LBP called "local mean binary pattern (LMBP)" in which the face is described through its shape and texture. In LMBP, the encoding is done with the help of the mean of a local block of nine pixels.
Unlike the LBP process, which encodes based on the center pixel intensity, LMBP considers the mean of the entire block for encoding. This approach also proposed a novel template matching strategy called "histogram normalized absolute difference (HNAD)" for the comparison of LMBP histograms. The major problem with LBP and its variants is limited recognition accuracy due to huge information loss. To resolve the deficiencies of LBP, Guo et al. [16] developed a new version of LBP named "extended local binary pattern (ELBP)" and combined it with the "K-L transform (KLT)." Here, ELBP is employed for feature extraction, and then a covariance matrix is employed to reduce the dimensionality, aiming at the extraction of the main feature vectors; for classification, they employed the SVM algorithm. Muqeet et al. [17] also employed LBP to encode the facial features of the image. Before LBP, they employed an "interpolation-based directional wavelet transform (DIWT)" which adopts quadtree partitioning of the facial image to facilitate adaptive direction selection. Then, LBP histograms are extracted from top-level DIWT sub-bands to derive the local descriptive features. Chao et al. [12] proposed the "expression-specific local binary pattern (ESLBP)" which considers the fiducial points of the face image and emphasizes the partial information of the human face. Next, for the purpose of maximizing the relation


between expression classes and features of faces, they proposed a "class-regularized locality preserving projection (cr-LPP)" whose aim is the maximization of class independence and the simultaneous preservation of local feature similarity through dimensionality reduction. Zhao et al. [18] proposed extracting dynamic and static information from video sequences for FER. Initially, they applied an incremental discriminative deformable face alignment method to locate the facial points, thereby handling in-plane head rotation, followed by the extraction of the facial region from the background. Then, they applied a spatiotemporal LBP for feature extraction and combined it with a Gabor multi-orientation fusion histogram to reflect the dynamic and static texture information of facial expressions. For classification, they employed a multi-class support vector machine. Considering the non-linear manifold structure of facial images, Zhao and Zhang [19] proposed a new method named "kernel discriminant isometric mapping (KDIsomap)" [20]. This mapping aims to extract features in a kernel Hilbert space that have maximum inter-class variance and minimum intra-class variance. Initially, the facial image is processed through LBP, and then KDIsomap is applied for dimensionality reduction; for classification, they employed a "nearest neighbor classifier (NNC)" based on the Euclidean distance.

3 Proposed Approach

Here, we explore the details of our proposed FER system. The major contribution of this work is the development of a new face emotion descriptor based on an edge mask. We employ the Robinson edge mask for the extraction of edge features in eight directions. Once the facial image is filtered through the Robinson compass mask, the results are subjected to the newly proposed texture encoding technique. For texture encoding, we apply a new method developed based on the directional information. Once the face image is described through the face descriptor, it is fed to a KNN classifier for emotion recognition. Figure 1 depicts the simple schematic of the developed FER system.

Fig. 1 Schematic of proposed FER system
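The classification stage of the pipeline can be illustrated with a minimal nearest-neighbor sketch in Python (the chapter's experiments use MATLAB). The Euclidean distance, k = 3, and the toy descriptors below are assumptions for illustration only; the chapter does not specify the distance measure or the value of k.

```python
import numpy as np

def knn_predict(train_hists, train_labels, query_hist, k=3):
    """Classify one face histogram descriptor by majority vote among its
    k nearest training descriptors (Euclidean distance)."""
    dists = np.linalg.norm(train_hists - query_hist, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = train_labels[nearest]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]

# Toy usage: two well-separated descriptor clusters for "happy" vs "sad".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.05, (5, 8)), rng.normal(1.0, 0.05, (5, 8))])
y = np.array(["happy"] * 5 + ["sad"] * 5)
print(knn_predict(X, y, np.full(8, 0.95)))  # query near the "sad" cluster
```

In the actual system the descriptors would be the concatenated DABP block histograms rather than the random toy vectors used here.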


3.1 Robinson Compass Mask

In the FER system, facial expressions directly relate to the shape of the face, meaning the shape of facial images varies with variations in facial expressions. If the face descriptor employed for face representation is able to acquire such variations, the FER system becomes most effective and can recognize emotions with high accuracy. Moreover, if only such features are processed for further encoding, the computational burden on the FER system is also reduced. Hence, the extraction of the most discriminative features for every emotion is of prime importance in the design of a FER system. In a facial image, the magnitudes of pixels at boundaries are high compared with the magnitudes of pixels in smooth regions. For example, in the happy expression, a boundary arises at the left and right cheeks; the magnitudes at these boundaries deviate greatly from the magnitudes of smooth regions like the forehead. To acquire such features, we employ an edge operator called the Robinson edge operator, which extracts the boundaries of facial images and the corresponding edge responses. Basically, the Robinson compass mask is a directional mask with eight directional filters. For every direction, it has one filter, and by convolving the image with that filter, we get the edge response in that direction. Designing the filters for the multiple directions is a simple task, accomplished by shifting the filter coefficients. The eight directions are named East (E), South East (SE), South (S), South West (SW), West (W), North West (NW), North (N), and North East (NE). The Robinson compass mask is an antisymmetric kind of mask: it produces similar edge responses with opposite signs in opposite directions. For instance, the magnitudes of the edge responses for west and east are the same but opposite in sign.
For example, if the edge response of a pixel in the east direction is 234, then the edge response of the same pixel in the west direction is −234. Thus, out of the eight edge responses, we need to compute only four. The filters of the Robinson compass mask in the eight directions are shown in Fig. 2, and the corresponding magnitude responses are shown in Fig. 3.
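As an illustration of the mask construction and the sign symmetry described above, the following Python sketch builds the eight masks by rotating the outer ring of a Sobel-style East mask and computes the edge responses with only four convolutions. The base coefficients are the commonly used Robinson values and are an assumption, since Fig. 2 is not reproduced here.

```python
import numpy as np

# Commonly cited Robinson base mask (East direction, Sobel-style coefficients).
EAST = np.array([[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]])

# Clockwise positions of the eight outer cells of a 3x3 mask.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def robinson_masks():
    """Return the 8 directional masks; each mask is the previous one with
    its outer ring of coefficients rotated by one position."""
    masks = [EAST.copy()]
    for _ in range(7):
        prev, nxt = masks[-1], np.zeros_like(EAST)
        vals = [prev[r, c] for r, c in RING]
        vals = vals[1:] + vals[:1]                 # rotate the ring one step
        for (r, c), v in zip(RING, vals):
            nxt[r, c] = v
        masks.append(nxt)
    return masks

def edge_responses(img, masks):
    """Correlate the image with the first four masks only; the other four
    responses are just sign flips, as noted in the text."""
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    first4 = [np.array([[np.sum(pad[r:r + 3, c:c + 3] * m)
                         for c in range(w)] for r in range(h)])
              for m in masks[:4]]
    return first4 + [-e for e in first4]

masks = robinson_masks()
# Opposite directions carry the same coefficients with opposite signs.
assert all(np.array_equal(masks[i + 4], -masks[i]) for i in range(4))
```

By construction, masks 5 through 8 are the negations of masks 1 through 4, so only four convolutions are ever needed per image.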

3.2 Encoding

After the computation of the edge responses in eight directions through the Robinson compass mask, we apply an encoder to represent every output with respect to its texture features. For this purpose, we propose an improved version of LBP called the direction aware binary pattern (DABP). Unlike simple LBP, this approach takes the edge responses as inputs and encodes the magnitude as well as the directional information in every direction. Instead of eight edge responses, we consider only four responses, because the mask is antisymmetric: the edge responses of opposite directions are equal in magnitude but opposite in sign. Consider the masks in the eight directions, represented as M_1, M_2, M_3, M_4, M_5, M_6, M_7, and M_8; initially,


Fig. 2 Robinson compass masks in eight directions

Fig. 3 Edge responses of the Robinson compass masks in eight directions


the original face image is convolved with all these masks. For simplicity, we perform the convolution with only four masks and derive four edge responses; for the remaining four edge responses, we simply change the signs of the first four. The main reason behind this simple computation is the symmetry between masks in opposite directions. For instance, if the first four masks M_1, M_2, M_3, and M_4 represent the directions East, Northeast, North, and Northwest, then the remaining four masks have the same filter coefficients but with opposite signs. Thus, the edge responses obtained through the first four masks are mirrors of those obtained through the remaining four. The major advantage of Robinson compass mask-based edge response extraction is its low computational burden. Furthermore, Lahdenoja [21] proved that patterns with high symmetry levels have a higher probability of occurrence in facial images; for this reason, the Robinson compass masks are effective in representing facial images with symmetrical features. Our new encoding scheme encodes the magnitude as well as the direction of the four edge responses as follows. Consider X to be a facial image of size M × N, where M is the number of rows and N the number of columns. The edge responses of X with the four masks are computed as

E_i = X ∗ M_i,  i ∈ {1, 2, 3, 4}    (1)

where E_i is the edge response of image X after its convolution with the i-th Robinson compass mask M_i. From the obtained edge responses, for every pixel, we search for the top positive and top negative responses. Let P(x, y) be the positive direction of the pixel at location (x, y) and N(x, y) its negative direction; they are calculated according to

P(x, y) = argmax_i (E_i(x, y))    (2)

and

N(x, y) = argmin_i (E_i(x, y))    (3)

In our method, we represent every pixel with eight bits. Among the eight bits, the first six are derived from the positive and negative directions P(x, y) and N(x, y), respectively. For the remaining two bits, a normal LBP-style thresholding is applied over the top positive and top negative edge responses. To do this, we first locate the pixels corresponding to the second-ranked positive and negative responses; then the center pixel magnitude is subtracted from the magnitudes of these two pixels, and based on the sign of the result, either 0 or 1 is allocated. For the encoding of the Robinson mask edge responses, the center pixel has only four neighbor pixels because we consider only four responses. Among the four pixels, we encode two pixels' information with respect to their directions, and the remaining pixels' information is encoded with magnitude information; hence the information loss, which is the major drawback of LBP, is reduced. For the remaining two pixels, the encoding is done as follows:

P_2 = S(p_i − c_i)    (4)

and

N_2 = S(n_i − c_i)    (5)

where

S(x) = 1 if x ≥ 0, and 0 if x < 0    (6)

Here, p_i is the positive pixel of the input image at the same position in the edge response, n_i is the negative pixel of the input face image at the same position in the edge response, and c_i is the center pixel. The final pattern of a pixel is obtained by the horizontal concatenation of the results of Eqs. (2), (3), (4), and (5). Let DABP(x, y) be the binary pattern of the pixel at location (x, y); it is calculated as

DABP(x, y) = [P(x, y)  P_2  N(x, y)  N_2]    (7)

Finally, over the encoded image, we apply histogram computation. For histogram computation, we divide the encoded image into equal-sized blocks {B_1, B_2, ..., B_N}. A histogram is computed for every block; let H_i be the histogram of the i-th block, computed as

H_i(c) = Σ_{(x,y) ∈ B_i} C(DABP(x, y) = c)    (8)

where (x, y) signifies the position of a pixel in the block B_i, c is a binary DABP code, DABP(x, y) is the binary DABP code of the pixel at position (x, y), and C is the counting measure that accumulates the pixels satisfying the condition. Next, the final histogram of the entire image is calculated by concatenating the histograms of all the blocks in the image:

H = ⊕_{i=1}^{N} H_i    (9)


where N denotes the total count of blocks into which the input facial expression image is segmented, and ⊕ signifies the operation of concatenation. In this work, we apply spatial concatenation; thus, the final histogram of the image contributes more toward the recognition of facial expressions.
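The encoding and histogram steps of Eqs. (1)–(9) can be summarized in the following sketch. It implements one plausible reading of the 8-bit layout (3 bits for P, 1 for P_2, 3 bits for N, 1 for N_2) and, as a simplification, thresholds the second-ranked responses directly instead of subtracting the center pixel of the original image; both choices, and the 8 × 8 block size, are assumptions.

```python
import numpy as np

def dabp_codes(responses):
    """responses: (8, H, W) stack of directional edge responses.
    Per pixel: 3 bits for the strongest-positive direction P (Eq. 2), one
    sign bit P2 from the 2nd-ranked response (Eq. 4), 3 bits for the
    strongest-negative direction N (Eq. 3), and one sign bit N2 (Eq. 5),
    concatenated as in Eq. (7). The exact bit layout is an assumption."""
    R = np.asarray(responses, dtype=float)
    order = np.argsort(R, axis=0)                  # ascending per pixel
    P = order[-1]                                  # argmax direction, Eq. (2)
    N = order[0]                                   # argmin direction, Eq. (3)
    second_hi = np.take_along_axis(R, order[-2][None], axis=0)[0]
    second_lo = np.take_along_axis(R, order[1][None], axis=0)[0]
    P2 = (second_hi >= 0).astype(np.uint8)         # S(.) thresholding, Eq. (4)
    N2 = (second_lo >= 0).astype(np.uint8)         # Eq. (5)
    return (P << 5) | (P2 << 4) | (N << 1) | N2    # 8-bit code, Eq. (7)

def dabp_histogram(codes, block=8):
    """Blockwise histograms (Eq. 8) concatenated into one descriptor (Eq. 9)."""
    h, w = codes.shape
    hists = [np.bincount(codes[r:r + block, c:c + block].ravel(), minlength=256)
             for r in range(0, h, block) for c in range(0, w, block)]
    return np.concatenate(hists)
```

The resulting descriptor is what would be fed to the KNN classifier; with an H × W image and 8 × 8 blocks it has (H/8)(W/8) × 256 bins.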

4 Experimental Results

For the experimental validation of our FER system, we used the standard FER-2013 face emotion dataset. For simulation purposes, we used MATLAB software. On FER-2013, we conducted a vast set of experiments, and the performance is assessed with the help of several performance metrics, like accuracy, false-negative rate, F-score, precision, and recall. Regarding the class problem, we conducted two case studies, with 6 classes and 7 classes. In the 7-class problem, we consider neutral as an emotion, while in the 6-class problem, we neglect the neutral emotion.

4.1 Results Over 7-class

The facial expression recognition 2013 (FER-2013) database [22] has a total of 35,887 facial images. This dataset covers seven emotions: angry, surprise, sad, neutral, happy, fear, and disgust. The resolution of each grayscale facial image in FER-2013 is 48 × 48. Some examples from this dataset are shown in Fig. 4. The numbers of facial images employed for training and testing are shown in Table 1. The emotion recognition results after the simulation of the proposed DABP model over FER-2013 are shown in Table 2. From this table, we can observe that the emotions that have similar attributes are most often misclassified. For example, consider the happy emotion, in which the major muscle movements are observed in the rise of the cheeks on both the left and right sides. The same movements can also be observed for

Fig. 4 Sample images of FER-2013: a angry, b disgust, c fear, d happy, e neutral, f sad, and g surprise


Table 1 Simulation set of FER-2013

Emotion   Training  Testing  Total
Angry     800       195      995
Disgust   88        45       133
Fear      820       206      1026
Happy     1400      368      1768
Neutral   990       246      1236
Sad       966       250      1216
Surprise  635       166      801

Table 2 Confusion matrix of the FER-2013 for 7-class problem

          Angry  Disgust  Fear  Happy  Neutral  Sadness  Surprise  Total
Angry     168    0        7     0      0        12       8         195
Disgust   2      31       4     0      2        6        0         45
Fear      14     5        162   15     0        5        5         206
Happy     8      12       3     317    4        3        21        368
Neutral   10     18       32    5      129      47       5         246
Sadness   5      10       15    10     33       167      10        250
Surprise  1      5        5     19     5        5        126       166
Total     208    81       228   366    173      245      175       1476

surprise emotion, but an open mouth is observed in surprise, while an open mouth with teeth is observed in the happy emotion. Similarly, consider the sad emotion, in which almost all facial muscles are relaxed, resembling the motions of the neutral emotion. Hence, for the given 250 test sad emotion images, the system recognized 167 as sad and 33 as neutral. The same situation can be observed vice versa: for the given 246 test neutral emotion images, the system recognized 129 as neutral and 47 as sad. For the given 368 test happy emotion images, the system recognized 317 as happy and 21 images as surprise; vice versa, for the given 166 test surprise emotion images, the system recognized 126 as surprise and 19 images as happy. Based on these observations, we can understand that emotions with similar directional movements may get recognized incorrectly. The performance metrics calculated from the results in the confusion matrix are stipulated in Table 3. In total, we measure five performance metrics: recall, precision, F-score, FNR, and FPR. The basis for these metric computations is true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP); based on these primary counts, the secondary metrics are computed according to their standard formulae. From Table 3, it can be noticed that the Happy emotion gained maximum recall and the Neutral emotion gained minimum recall, with approximate values of 0.8614 and 0.5243, respectively. Next, the Happy emotion gained maximum precision, and the Disgust emotion gained


Table 3 Performance metric for FER-2013 for 7-class problem

Emotion   Recall  Precision  F-score  FNR     FPR
Angry     0.8615  0.8076     0.8337   0.1384  0.1923
Disgust   0.6888  0.3827     0.4920   0.3111  0.6172
Fear      0.7864  0.7105     0.7465   0.2135  0.2894
Happy     0.8614  0.8661     0.8637   0.1385  0.1338
Neutral   0.5243  0.7456     0.6157   0.4756  0.2543
Sad       0.6680  0.6816     0.6747   0.3320  0.3183
Surprise  0.7590  0.7200     0.7390   0.2409  0.2800

minimum precision, with approximate values of 0.8661 and 0.3827, respectively. Next, the Happy emotion gained the maximum F-score, and the Disgust emotion gained the minimum F-score, with approximate values of 0.8637 and 0.4920, respectively. The values of FNR and FPR simply follow an inverse relation with recall and precision, respectively; hence, their values and the corresponding emotions lie opposite to the above.
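The entries of Table 3 can be reproduced directly from the confusion matrix in Table 2, as the short sketch below shows. Note that the tabulated "FPR" equals 1 − precision (the false discovery rate) rather than the usual FP/(FP + TN); this matches the inverse relation stated above.

```python
import numpy as np

labels = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sadness", "Surprise"]
# Confusion matrix from Table 2 (rows = actual class, columns = predicted).
cm = np.array([
    [168,  0,   7,   0,   0,  12,   8],
    [  2, 31,   4,   0,   2,   6,   0],
    [ 14,  5, 162,  15,   0,   5,   5],
    [  8, 12,   3, 317,   4,   3,  21],
    [ 10, 18,  32,   5, 129,  47,   5],
    [  5, 10,  15,  10,  33, 167,  10],
    [  1,  5,   5,  19,   5,   5, 126],
])
tp = np.diag(cm)
recall = tp / cm.sum(axis=1)            # TP / (TP + FN)
precision = tp / cm.sum(axis=0)         # TP / (TP + FP)
f_score = 2 * precision * recall / (precision + recall)
fnr = 1 - recall                        # "FNR" column of Table 3
fdr = 1 - precision                     # tabulated as "FPR" in Table 3

print(round(recall[0], 4), round(precision[3], 4))  # → 0.8615 0.8661
```

Running this reproduces, for example, the Angry recall (168/195 ≈ 0.8615) and the Happy precision (317/366 ≈ 0.8661) reported in Table 3.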

4.2 Results Over 6-class

Under this simulation, we consider the same FER-2013 facial images but neglect the neutral emotion. FER-2013 has seven expression classes; to check the impact of the number of classes on the proposed FER system, we conduct an additional simulation study in which the neutral facial images are excluded. The simulation setup for the 6-class problem is the same as the one used for the 7-class simulation, and the corresponding confusion matrix and performance metrics are shown in Tables 4 and 5, respectively. From these simulation results, we can see that removing the neutral emotion significantly improves emotion recognition: when the number of classes is larger, the system confuses and misclassifies emotions more often, and this confusion is reduced when the number of classes is reduced. The confusion matrix of the 6-class problem is shown in Table 4 where the effect

Table 4 Confusion matrix of the FER-2013 for 6-class problem

          Angry  Disgust  Fear  Happy  Sadness  Surprise  Total
Angry     173    6        2     2      10       2         195
Disgust   6      33       4     0      2        0         45
Fear      14     6        168   6      8        4         206
Happy     8      8        4     323    4        21        368
Sadness   20     10       25    10     175      10        250
Surprise  6      3        4     16     3        134       166
Total     227    66       207   357    202      171       1230


Table 5 Performance metric for FER-2013 for 6-class problem

Emotion   Recall  Precision  F-score  FNR     FPR
Angry     0.8871  0.7621     0.8199   0.1128  0.2378
Disgust   0.7333  0.5000     0.5945   0.2666  0.5000
Fear      0.8155  0.8115     0.8135   0.1844  0.1884
Happy     0.8777  0.9047     0.8910   0.1222  0.0952
Sad       0.7000  0.8663     0.7743   0.3000  0.1336
Surprise  0.8072  0.7836     0.7952   0.1927  0.2163

of neutral emotion is distributed among the remaining emotions. At the same time, more of the individual emotions are detected correctly. From Table 5, the Angry emotion gained maximum recall and the Sad emotion gained minimum recall, with approximate values of 0.8871 and 0.7000, respectively. Next, the Happy emotion gained maximum precision and the Disgust emotion gained minimum precision, with approximate values of 0.9047 and 0.5000, respectively. Next, the Happy emotion gained the maximum F-score and the Disgust emotion gained the minimum F-score, with approximate values of 0.8910 and 0.5945, respectively. The average accuracy of the different simulation studies is shown in Fig. 5, where we compare it with the accuracies of existing methods. As can be seen from this figure, the accuracy of the proposed DABP is the highest in both simulation studies, while the poorest performance is observed for LBP. The main reason behind this poor recognition performance is information loss when encoding the pixel intensities: LBP discards a large amount of information and hence suffers from lower accuracy. Next, LMBP has better recognition accuracy than LBP. Even though LMBP follows the same LBP process, it encodes the pixel based on the mean

Fig. 5 Accuracy at different simulation studies (accuracy (%) of LBP [11], LMBP [13], and DABP for the 7-class and 6-class simulations)


of the patch. It tries to preserve some information, and the FER recognition accuracy increases due to this process. The proposed approach shows excellent performance in both simulation studies because it encodes the magnitude as well as the direction information, which provides the FER system with more information regarding the muscle movements. On average, the accuracy of DABP in the 6-class simulation is observed as 81.23546%, while for LBP and LMBP it is observed as 74.3265% and 77.4412%, respectively. Next, on average, the accuracy of DABP in the 7-class simulation is observed as 75.6696%, while for LBP and LMBP it is noticed as 70.8964% and 72.2222%, respectively.

5 Conclusion

In this paper, we proposed a new method called DABP for the recognition of expressions from facial images through directional information encoding. Unlike simple LBP and its variants such as LMBP, which encode only the pixel intensities, DABP encodes the magnitude and direction of the edge responses of facial images. Initially, the edge responses expose the edges and boundaries of the facial image that arise with the movement of expression. Next, DABP encodes the edges in multiple directions with an appropriate encoding, thereby reducing the information loss. Moreover, the directional information captures the directionality of muscle movements. Simulation experiments were conducted over the FER-2013 face dataset, and the performance was assessed with the help of accuracy, precision, and recall. On average, the accuracies are observed as 78.55698%, 73.6535%, and 72.1145% for DABP, LMBP, and LBP, respectively.

References

1. Mehrabian, A., Russell, J.A.: An Approach to Environmental Psychology. MIT Press, Cambridge, MA, USA (1974)
2. Kumaria, J., Rajesha, R., Poojaa, K.M.: Facial expression recognition: a survey. In: Proceedings of Second International Symposium on Computer Vision and the Internet, pp. 486–491 (2015)
3. Wang, N., Gao, X., Tao, D., Yang, H., Li, X.: Facial feature point detection: a comprehensive survey. Neurocomputing 275, 50–65 (2018)
4. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cognit. Neurosci. 3(1), 71–86 (1991)
5. Bartlett, M.S., Littlewort, G., Fasel, I., Movellan, J.R.: Real time face detection and facial expression recognition: development and applications to human computer interaction. In: Proceedings of Computer Vision and Pattern Recognition Workshop, p. 53 (2003)
6. Shan, C., Gong, S., McOwan, P.W.: Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis. Comput. 27(6), 803–816 (2009)
7. Anggo, M., Arapu, L.: Face recognition using fisherface method. In: 2nd International Conference on Statistics, Mathematics, Teaching, and Research, vol. 028, 012119 (2018)
8. Chakrabarti, D., Dutta, D.: Facial expression recognition using Eigen spaces. Proc. Technol. 10, 755–761 (2013)
9. Mangala, B., Divya, S., Prajwala, N.B.: Facial expression recognition by calculating euclidian distance for Eigen faces using PCA. In: International Conference on Communication and Signal Processing (ICCSP), Chennai, India (2018)
10. Ahonen, T., Hadid, A., Pietikäinen, M.: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 2037–2041 (2006)
11. Feng, X., Pietikainen, M., Hadid, A.: Facial expression recognition based on local binary pattern. Pattern Recognit. Image Anal. 17, 592–598 (2007)
12. Chao, W.L., Ding, J.J., Liu, J.Z.: Facial expression recognition based on improved local binary pattern and class-regularized locality preserving projection. Signal Process. 117, 1–10 (2015)
13. Cao, N.T., Ton-That, A.H., Choi, H.I.: Facial expression recognition based on local binary pattern features and support vector machine. Int. J. Pattern Recogn. Artif. Intell. 28(06), 1456012 (2014)
14. Kauser, N., Sharma, J.: Facial expression recognition using LBP template of facial parts and multilayer neural network. In: Proceedings of International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, pp. 445–449 (2017)
15. Goyani, M.M., Patel, N.: Recognition of facial expressions using local mean binary pattern. Electron. Lett. Comput. Vis. Image Anal. 16(1), 54–67 (2017)
16. Guo, M., Hou, X., Ma, Y.: Facial expression recognition using ELBP based on covariance matrix transform in KLT. Multimed. Tools Appl. 76, 2995–3010 (2017)
17. Muqeet, M.A., Holambe, R.S.: Local binary patterns based on directional wavelet transform for expression and pose invariant face recognition. Appl. Comput. Inf. 15(2), 163–171 (2017)
18. Zhao, L., Wang, Z., Zhang, G.: Facial expression recognition from video sequences based on spatial-temporal motion local binary pattern and gabor multi-orientation fusion histogram. Math. Problems Eng. 2017, Article ID 7206041, 12 pages (2017)
19. Zhao, X., Zhang, S.: Facial expression recognition based on local binary patterns and kernel discriminant Isomap. Sensors 11, 9573–9588 (2011)
20. Choi, H., Choi, S.: Robust kernel isomap. Pattern Recogn. 40, 853–862 (2007)
21. Lahdenoja, O., Laiho, M., Paasio, A.: Reducing the feature vector length in local binary pattern based face recognition. In: IEEE International Conference on Image Processing (ICIP 2005), vol. 2, pp. II–914 (2005)
22. Mollahosseini, A., Chan, D., Mahoor, M.H.: Going deeper in facial expression recognition using deep neural networks. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Lake Placid, NY, USA, 7–10 Mar 2016

MRI Breast Image Segmentation Using Artificial Bee Colony Optimization with Fuzzy Clustering and CNN Classifier

R. Sumathi and V. Vasudevan

Abstract Breast cancer is the leading disease among women, and many lose their lives without proper treatment; early diagnosis helps to reduce the death rate. In the clinical procedure, identifying the tumor takes considerable processing time, and extracting the abnormal part sometimes leads to errors. To overcome this problem and help the radiologist, we propose an automated framework that integrates artificial bee colony optimization and fuzzy C-means clustering to extract the cancerous part and classify the tumor as benign or malignant with CNN classification. Various objective measures like entropy, eccentricity, contrast, correlation, homogeneity, mean, variance, kurtosis, dice similarity index, and Hausdorff distance, and subjective measures like precision, recall, accuracy, MSE, and PSNR were evaluated to ensure the efficiency and accuracy of segmentation and classification of MRI breast images with T1- and T2-contrast-enhanced images. Our hybrid approach yields 98% segmentation accuracy and 98.6% classification accuracy, and it takes an average of 5 s for processing. We utilized public datasets like DDSM, MIAS, and the INbreast dataset for validation.

1 Introduction

Many women in the age group of 30–60 face breast-related diseases due to stress, food habits, family genes, and so on. Abnormal cells form a cancerous mass and spread by growing new cells that affect other parts; if the disease is known at an earlier stage, it can be rectified by diagnosis and lives can be saved. Radiologists prefer the MRI scanning procedure for studying the detailed behavior of abnormal parts of a tumor, such as size, shape, and color. A mammogram is used at the initial stage, but it is not suitable for dense breasts with masses, so MRI scanning is used for detailing the

R. Sumathi (B) Department of Computer Science and Engineering, School of Computing, Kalasalingam Academy of Research and Education, Krishnankoil, India

V. Vasudevan School of Computing, Kalasalingam Academy of Research and Education, Krishnankoil, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_28


features without radiation side effects. With MRI, breast image sequences like T1-weighted, T2-weighted, T1-axial, and T2-axial images are obtained, and the images are processed using various soft computing and machine learning techniques. Artificial bee colony (ABC) optimization is a nature-inspired metaheuristic optimization technique that produces optimal solutions for many real-time applications like stock market analysis, sales profit prediction, and medical disease classification. Its main focus in the medical field is to search for and select the best optimal features for cancer cell detection and classification. Clustering is an unsupervised learning method that groups data by similarity measures, such as the intensity values of pixels in an image. It also recognizes and categorizes clusters in various dimensions like 1D, 2D, and 3D. Among the many clustering methods, K-means and fuzzy C-means play a vital role in image segmentation and classification. Fuzzy C-means (FCM) reduces the overlapping problem and classifies data points based on their features effectively, so FCM is used in many segmentation algorithms. To ensure the quality of image segmentation, various objective and subjective measures are applied to verify the accuracy of the proposed approach.
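The FCM alternation mentioned above (weighted centroid update followed by membership update) can be sketched as follows. This is an illustrative implementation with assumed parameters (fuzzifier m = 2, a fixed iteration count, random initialization), not the ABC-tuned versions surveyed in the related work.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=50, seed=0):
    """X: (n, d) feature matrix (e.g., pixel intensities). Returns the
    cluster centers and the (n, c) fuzzy membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # rows sum to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]          # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)              # membership update
    return centers, U

# Toy example: bright vs dark pixel intensities.
X = np.array([[0.05], [0.10], [0.12], [0.90], [0.95], [0.88]])
centers, U = fuzzy_c_means(X, c=2)
hard = U.argmax(axis=1)   # defuzzified labels: dark and bright pixels separate
```

In the segmentation pipeline, the role of ABC would be to tune parameters such as the initial centers or the number of clusters, which plain FCM must otherwise guess.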

1.1 Related Study

Enhanced bee colony optimization (EBCO) [1] was combined with the firefly algorithm (FFA) to segment the tumor part of mammogram breast images, ensuring an optimal threshold for accurate tumor segmentation and yielding 96% accuracy. A metaheuristic hybrid of ABC and whale optimization (HAW) imitates the food-search behaviour of the employed bees; ABC is integrated with HAW [2] for feature selection, and the tumor is classified using HAW-RP, HAW-GD, and HAW-LM, with accuracy better than the state of the art; validation was done with online datasets such as WBCD, WDBC, WPBC, DDSM, MIAS, and INbreast. The Gbest-guided artificial bee colony (GABC) [3] was integrated with a fuzzy-clustering-optimized neural network for classifying mammogram breast cancer images; GABC is used chiefly for choosing the best parameters for the classifier, and the combination yields 99% classification accuracy, validated on the WBC dataset. To ensure a globally optimal solution and to fine-tune the centroids and the fixed clusters, FCM was integrated with the ABC algorithm [4] for classifying tumors with 9 attributes and 3 classes, obtaining results in less computational time; the limitation of this approach is the choice of the initial clusters and the fixed number of classes. In [5], FCM, EM, and SFCM were analysed for segmenting mammogram breast images; SFCM produced the best segmentation accuracy in terms of Jaccard coefficient (JC), volumetric similarity (VS), variation of information (VOI), and global consistency error (GCE), but only 9 images were used for validation.
Inspired by the intelligent swarming behaviour of honey bees, ABC was proposed to segment breast DCE-MR images [6]; its segmentation accuracy was compared with a SOM-based K-means clustering algorithm and reached 97%, though only a limited number of images were used for validation. With the

MRI Breast Image Segmentation Using Artificial Bee Colony …


combination of principal component analysis with fuzzy artificial bee colony clustering (PCA + FABC) [7], the performance of feature-set selection was compared with existing clustering methods such as FCM, FPSO, and FABC, and PCA + FABC outperformed them all. Otsu's multilevel thresholding [8] was used as the fitness function inside the ABC algorithm for segmenting mammogram breast images collected from the mini-MIAS database; segmentation accuracy, assessed by peak signal-to-noise ratio (PSNR), structural similarity index (SSI), and computational time, was far better than Otsu and multi-Otsu thresholding. The major limitation across these studies is that only a limited number of images were used for validation, and segmentation accuracy still needs improvement. Our proposed approach therefore integrates ABC with FCM to detect the abnormal part in MRI breast cancer images in a short span of time, and classifies the images as normal or abnormal using a CNN classifier. Various objective and subjective measures are evaluated to ensure the accuracy.

2 Materials and Methods

2.1 Image Dataset

MRI breast images with sequences such as T1-W, T2-W, and T1-A are collected from the DDSM, MIAS, and INbreast datasets. Overall, 200 images (80 benign and 120 malignant) were collected from these online datasets. The flow diagram of our approach is shown in Fig. 1.

Fig. 1 Flow diagram for the proposed approach: read 256 × 256 MRI images → apply median filters for preprocessing → apply ABC with FCM to find the optimal solution and extract the tumor part of the breast images → classify the tumor with the CNN classifier → perform objective and subjective measures to ensure efficiency


2.2 Median Filters

To enhance the contrast level for detailed observation of the medical image, to highlight sharpness, and to remove noisy information, the median filter is preferred [9]:

A(i, j) = \mathrm{median}\{ f(x, y) \}    (1)

where f(x, y) is the input image and A(i, j) is the median-filtered image.
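As an illustration of Eq. (1), a sliding-window median filter can be sketched in a few lines of NumPy. This is an editor-added sketch, not the authors' implementation; the 3 × 3 window size and reflection padding are assumptions.

```python
import numpy as np

def median_filter(img, k=3):
    """Naive k x k median filter (Eq. (1)); edges handled by reflection padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

# An isolated impulse (a typical noise artefact) is replaced by the local median.
noisy = np.array([[10, 10, 10],
                  [10, 255, 10],
                  [10, 10, 10]], dtype=float)
print(median_filter(noisy)[1, 1])  # -> 10.0
```

The same effect is obtained in practice with a library call (e.g. a 2-D median filter routine); the explicit loop above only makes Eq. (1) concrete.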

2.3 Artificial Bee Colony Optimization

The artificial bee colony algorithm simulates the behaviour of honey bees with three groups of bees in the colony: employed bees, onlooker bees, and scout bees. The employed bees find food sources, the onlooker bees evaluate and choose among the food sources, and the scout bees search for new locations. The ABC algorithm finds the optimal solution based on a fitness function. Let x = (x_1, x_2, \ldots, x_n) represent the food sources and D the problem size. Each food source is initialised as

x_{ij} = x_j^{\min} + \phi \, (x_j^{\max} - x_j^{\min})    (2)

where x_j^{\min} and x_j^{\max} are the bounds for index j and \phi is a real number in the range [0, 1]. An employed bee produces a candidate solution by performing a local search around a neighbouring food source:

V_{ij} = x_{ij} + \phi_{ij} \, (x_{ij} - x_{kj})    (3)

where k is a random food source index, k \in \{1, 2, \ldots, SN\} with k \neq i, and \phi_{ij} is a random number in the range [−1, 1]. The fitness is computed from the objective value f_i as

\mathrm{Fitness}_i = \begin{cases} \dfrac{1}{1 + f_i}, & f_i \geq 0 \\ 1 + |f_i|, & f_i < 0 \end{cases}    (4)

For the onlooker bees, a food source is picked based on the probability value p_i, computed as follows:

p_i = 0.9 \, \frac{\mathrm{Fitness}_i}{\max_i(\mathrm{Fitness}_i)} + 0.1    (5)

From the ABC optimization procedure, the global solution is obtained based on the fitness function; its objective criterion is computed using fuzzy C-means for identifying the initial cluster points.
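The bookkeeping of Eqs. (2)–(5) can be sketched as follows. This is an editor-added NumPy illustration; the colony size, the search bounds, and the placeholder objective are assumptions, not values from the paper (the paper's actual objective comes from the FCM criterion).

```python
import numpy as np

rng = np.random.default_rng(0)
n_food, dim = 10, 2          # colony size and problem size D (assumed values)
lo, hi = 0.0, 255.0          # search bounds, e.g. intensity thresholds

# Eq. (2): random initial food sources inside the bounds
x = lo + rng.random((n_food, dim)) * (hi - lo)

def objective(src):
    """Placeholder objective f_i; the paper uses the FCM criterion instead."""
    return np.sum((src - 128.0) ** 2)

f = np.array([objective(s) for s in x])

# Eq. (4): map objective values to fitness
fitness = np.where(f >= 0, 1.0 / (1.0 + f), 1.0 + np.abs(f))

# Eq. (5): onlooker-bee selection probability
p = 0.9 * fitness / fitness.max() + 0.1
print(p.min() >= 0.1, p.max() <= 1.0)  # -> True True
```

Note how Eq. (5) keeps every probability in [0.1, 1.0], so even the worst food source retains a small chance of attracting an onlooker bee.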

2.4 Fuzzy C Means Clustering

FCM is used to extract the abnormal part of the breast image: FCM [10] chooses the initial clusters, and its output is embedded into ABC to obtain the global solution. The membership of data point i in cluster j is

\mu_{ij} = \frac{1}{\sum_{k=1}^{c} (d_{ij} / d_{ik})^{2/(m-1)}}    (6)

and the cluster centroids are

v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^m \, x_i}{\sum_{i=1}^{n} \mu_{ij}^m}, \quad j = 1, 2, \ldots, c    (7)

The objective function of FCM is

J(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^m \, \| x_i - v_j \|^2    (8)

where \| x_i - v_j \| is the Euclidean distance between the ith data point and the jth cluster centroid.
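One iteration of the update rules in Eqs. (6)–(7) can be written compactly in NumPy. This is an editor-added sketch under the usual FCM conventions (fuzzifier m = 2 and a small eps to avoid division by zero are assumptions); it is not the authors' code.

```python
import numpy as np

def fcm_step(X, V, m=2.0, eps=1e-9):
    """One FCM iteration: Eq. (6) membership update, then Eq. (7) centroid update."""
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + eps   # (n, c) distances
    # Eq. (6): u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
    u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)), axis=2)
    # Eq. (7): v_j = sum_i u_ij^m x_i / sum_i u_ij^m
    um = u ** m
    V_new = (um.T @ X) / um.sum(axis=0)[:, None]
    return u, V_new

X = np.array([[0.0], [0.1], [0.9], [1.0]])   # toy 1-D "intensities"
V = np.array([[0.2], [0.8]])                 # two initial centroids
for _ in range(20):
    u, V = fcm_step(X, V)
print(np.round(V.ravel(), 2))                # centroids move toward the two groups
```

By construction, each row of the membership matrix sums to 1, which is the property ABC exploits when it refines the cluster centres found by FCM.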

2.5 Convolutional Neural Network Classifier

From the segmented output, various features are extracted using GLCM and fed into a CNN classifier [11] with two classes, normal and abnormal. The classifier learns to categorise images [12] based on the features obtained from the segmented images; classification accuracy is assessed with a 70% training and 30% testing split.
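A minimal sketch of the GLCM step described above: the co-occurrence matrix for one pixel offset, plus two common texture features (contrast and energy). This is an editor-added illustration; the offset, the quantisation level, and the choice of features are assumptions, since the paper does not list its exact GLCM feature set.

```python
import numpy as np

def glcm_features(img, levels=8, dx=1, dy=0):
    """Grey-level co-occurrence matrix for one offset, plus contrast and energy."""
    q = (img.astype(float) / img.max() * (levels - 1)).astype(int)  # quantise grey levels
    M = np.zeros((levels, levels))
    h, w = q.shape
    for i in range(h - dy):
        for j in range(w - dx):
            M[q[i, j], q[i + dy, j + dx]] += 1                      # count level pairs
    M /= M.sum()
    a, b = np.indices(M.shape)
    contrast = np.sum(M * (a - b) ** 2)
    energy = np.sum(M ** 2)
    return contrast, energy

# A perfectly uniform image has zero contrast and maximal energy.
c, e = glcm_features(np.ones((16, 16)))
print(c, e)  # -> 0.0 1.0
```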


2.6 Objective and Subjective Measures

Objective measures such as MSE and PSNR are used to assess segmentation accuracy, while the subjective measures precision, recall, and accuracy verify the quality and efficiency of the segmentation against expert segmentation or gold-standard ground-truth images:

\mathrm{MSE} = \frac{1}{xy} \sum_{i=0}^{x-1} \sum_{j=0}^{y-1} [A1(i, j) - B1(i, j)]^2    (9)

\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}}    (10)

\mathrm{Precision} = \frac{TP}{TP + FP}    (11)

\mathrm{Recall} = \frac{TP}{TP + FN}    (12)

\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}    (13)
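The five measures of Eqs. (9)–(13) are straightforward to compute; a short editor-added NumPy sketch follows, with made-up confusion-matrix counts used only for the example.

```python
import numpy as np

def mse(a, b):
    return np.mean((a.astype(float) - b.astype(float)) ** 2)          # Eq. (9)

def psnr(a, b):
    return 10.0 * np.log10(255.0 ** 2 / mse(a, b))                    # Eq. (10)

def precision_recall_accuracy(tp, fp, tn, fn):
    precision = tp / (tp + fp)                                        # Eq. (11)
    recall = tp / (tp + fn)                                           # Eq. (12)
    accuracy = (tp + tn) / (tp + fp + tn + fn)                        # Eq. (13)
    return precision, recall, accuracy

# Hypothetical counts: 90 true positives, 10 false positives, 85 true negatives, 15 false negatives.
p, r, acc = precision_recall_accuracy(tp=90, fp=10, tn=85, fn=15)
print(round(p, 3), round(r, 3), round(acc, 3))  # -> 0.9 0.857 0.875
```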

3 Result and Discussion

The proposed integration of ABC with FCM provides efficient segmentation across the various MRI breast image sequences. Figure 2a shows the input image, Fig. 2b the median-filtered output, and Fig. 2c the ABC-FCM segmented output. The features used for extraction and their values are shown in Fig. 3; the extracted feature values are comparatively better than existing approaches. For the subjective measures (precision, recall, and accuracy), the proposed method yields 98.1% segmentation accuracy, compared with 97.1% for K-means, 96.3% for FCM, and 97.6% for FCM-PSO; for classification, the proposed method yields 98.8% accuracy, which is far better than KNN with 97% and ANN with 97.5%. Figures 4 and 5 show the segmentation and classification accuracy comparisons. The proposed method yields an average MSE of 0.003 and a PSNR of 57.8 for all MRI breast image sequences, and segmentation takes an average of 5 s, far better than the FCM and K-means clustering approaches. With the flexibility of FCM, the cluster points are grouped, and ABC is integrated to provide the optimal solution and produce efficient output. Noise reduction is


Fig. 2 a Input image, b median filtered image, c ABC-FCM output


Fig. 3 Feature extraction values (bar chart, y-axis: %; extracted values: 0.98, 0.78, 0.76, 0.93, 0.97, 0.94, 0.89, 0.88, 0.78, 0.22, 0.03)

Fig. 4 Segmentation accuracy comparison (y-axis: %; K-means 97.1, FCM 96.3, FCM-PSO 97.6, proposed 98.1)

Fig. 5 Classification accuracy comparison (y-axis: %; KNN 97, ANN 97.5, CNN 98.8)

ensured with the objective measures, and accurate tumor detection and classification are ensured with the subjective measures.


4 Conclusion

The integration of ABC with FCM proves suitable for the segmentation and classification of MRI breast images. The MSE and PSNR values show that the segmentation results obtained by the proposed algorithm have good immunity to noise interference. Various sequences of images from the DDSM, MIAS, and INbreast datasets were validated with the ABC approach to confirm segmentation and classification accuracy, and the computational time for segmentation of the proposed method is far better than that of existing approaches; to ensure the quality of the automation, various objective and subjective measures were performed to prove its efficiency.

References

1. Sivaramakrishnan, A., Karnan, M.: Medical image segmentation using firefly algorithm and enhanced bee colony optimization. In: International Conference on Information and Image Processing (ICIIP-2014), pp. 316–321 (2014)
2. Stephan, P., Stephan, T., Kannan, R.: A hybrid artificial bee colony with whale optimization algorithm for improved breast cancer diagnosis. Neural Comput. Appl. (2021)
3. Addeh, J., Pourmandia, M.: Breast cancer diagnosis using fuzzy feature and optimized neural network via the Gbest-guided artificial bee colony algorithm. Comput. Res. Progress Appl. Sci. Eng. 1, 152–159 (2014)
4. Kaur, J., Nazeer, K.A.: An improved clustering algorithm based on fuzzy C-means and artificial bee colony optimization. In: 2018 International Conference on Computing, Power and Communication Technologies (GUCON), pp. 1089–1094 (2018)
5. Nagi Reddy, V., Subba Rao, P.: Comparative analysis of breast cancer detection using K-means and FCM & EM segmentation techniques. Ingénierie des Systèmes d'Information 23, 173–187 (2018)
6. Nagi Reddy, V., Subba, R., Janaki Sathya, D., Geetha, K.: Quantitative comparison of artificial honey bee colony clustering and enhanced SOM based K-means clustering algorithms for extraction of ROI from breast DCE-MR images. Int. J. Recent Trends Eng. Technol. 8, 51–56 (2013)
7. Gomathi, C., Velusamy, K.: Enhancing performance of the fuzzy artificial bee colony clustering algorithm based on principal component analysis. Int. J. Sci. Res. Comput. Sci. Appl. Manage. Stud. 7, 1–5 (2018)
8. Kumar, M.A., Ramadevi, Y.: Multi-Otsu's image segmentation for mammogram using artificial bee colony (ABC) algorithm. Annals of R.S.C.B. 25, 12353–12362 (2021)
9. Abdallah, Y.M.Y., Hayder, A., Wagiallah, E.: Automatic enhancement of mammography images using contrast algorithm. Int. J. Sci. Res. (IJSR) 3, 1885–1889 (2014)
10. Herdangkoo, M., Yazdi, M., Rezvani, M.H.: Segmentation of MR brain images using FCM improved by artificial bee colony algorithm. In: Proceedings of the 10th IEEE International Conference on Information Technology and Applications in Biomedicine (2010)
11. Ting, F.F., Tan, Y.J., Sim, K.S.: Convolutional neural network improvement for breast cancer classification. Expert Syst. Appl. 120, 103–115 (2019)
12. Himabindu, G.M., Murty, R., et al.: Extraction of texture features and classification of renal masses from kidney images. Int. J. Eng. Technol. 7, 1057–1063 (2018)

Human Action Recognition in Still Images Using SIFT Key Points M. Pavan and K. Jyothi

Abstract Recognition of human action in still images is a challenging problem due to the unavailability of movement information of the human body parts. In this work, we address this problem by generating scale-invariant feature transform (SIFT) key interest points over the human body and using the statistical information of the SIFT key points to represent human actions. We applied standard classifiers to the model and compared their efficiency using the K-fold and hold-out validation techniques. For the experiment, video frames from the KTH action dataset are used as static images. Our method is simple and recognizes human actions with 85.8% accuracy.

1 Introduction

Human action recognition in still images is an essential research topic in computer vision. As many images are available over the Internet, developing an efficient method for analysing human actions is much needed, since it would automate many human-action-oriented applications. In human action recognition, the machine identifies and labels the actions performed by a human based on the positions of the body parts. Although considerable research has been carried out on action recognition from video, only a few works use static images [1]. Video-based action recognition has drawbacks: it requires more processing time because video is more complex than images, multiple actions may be associated with a single video, and the background may change dynamically, as with moving traffic. Image-based action recognition overcomes these drawbacks easily, but it has its own challenges: there is no information about the movement of the body parts, and recognition depends entirely on spatial information rather than temporal information. Recognition of the

M. Pavan (B) · K. Jyothi Information Science and Engineering, JNN College of Engineering, Shivamogga, Karnataka, India e-mail: [email protected] K. Jyothi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_29


Fig. 1 SIFT key interest points generated on the human action images: (a) boxing, (b) clapping, (c) waving, (d) jogging, (e) running, (f) walking

human action using still images is useful in many applications, such as image annotation, where action verbs can be annotated with images; image retrieval based on human action or behaviour; reduction of the video frame count for video-based action recognition; and human–machine interaction, where the machine identifies human actions from still images. A few related research works have been carried out on human action recognition using still images as input. Girish et al. [2] worked on action recognition involving objects; they considered interest regions in the image for identifying the action. Zhang et al. [3] addressed action recognition in still images with minimum annotation effort; their work also considered human–object interaction for identifying the human action. Raja et al. [4] used a graphical model containing six nodes, encoding the positions of five body parts and the action label, to recognize the action. Thurau and Hlavac [5] defined pose primitives in each image as unique cues for a specific action. These methods are based on body-pose estimation [6, 7], graphical models [4], and human–object interactions [2], and they use manually annotated bounding boxes in the action images to identify the human body. These methods are not automatic: extra work is needed to identify the human in the image, and the number of annotated training images available for action recognition is restricted because of the extra cost associated with manual annotation. In our proposed method, we generate scale-invariant feature transform (SIFT) key interest points [8] on the human body, as shown in Fig. 1, and use the statistical information of these feature key points to train the model. We then apply different classifiers to recognize the human action. Several researchers [9, 10] have used SIFT feature descriptors to recognize human actions in still images, but their accuracy is very low.
The paper is organized as follows: the second part discusses the methodology of the current work, the third part discusses the experimental results obtained for the proposed method, and the last part concludes the work.


Fig. 2 Block diagram of human action recognition system

Fig. 3 Input image and intensity-adjusted image

2 Methodology

The block diagram of the human action recognition system is shown in Fig. 2. It consists of the following steps: pre-processing, generating SIFT key interest points, obtaining the statistical information of the generated SIFT key points, and classification using the trained database for predicting the human action.

2.1 Pre-processing

The input image is enhanced using a contrast adjustment method in which the intensities of the input image between 0.3 and 0.7 are mapped to the range 0.1–0.9 with γ value 0.5. The contrast-adjusted image is shown alongside the input image in Fig. 3. To remove noise in the image, we apply a low-pass Wiener filter [11] over a 5×5 neighbourhood. This filter estimates the local mean and variance around each pixel and filters noise based on the noise variance of the image:

\mu = \frac{1}{NM} \sum_{n_1, n_2 \in \eta} g(n_1, n_2)    (1)

\sigma^2 = \frac{1}{NM} \sum_{n_1, n_2 \in \eta} g^2(n_1, n_2) - \mu^2    (2)

\hat{g}(n_1, n_2) = \mu + \frac{\sigma^2 - \nu^2}{\sigma^2} \left( g(n_1, n_2) - \mu \right)    (3)


where \eta is the N × M local neighbourhood of each pixel in the image g and \nu^2 is the noise variance. The image is further enhanced by applying a two-dimensional median filter, Eq. (4), where S_{x,y} is the rectangular sub-image window of size 3×3 centred at point (x, y). We apply the two filters to obtain SIFT key points more accurately on the human body, which is necessary for our action recognition system. The result is shown in Fig. 4.

I(x, y) = \mathrm{median}_{(s,t) \in S_{x,y}} \{ g(s, t) \}    (4)
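The adaptive Wiener step of Eqs. (1)–(3) can be sketched directly in NumPy. This is an editor-added illustration; estimating the noise variance ν² as the mean of the local variances is an assumption (the common convention when no noise power is supplied), and the 5 × 5 window matches the text.

```python
import numpy as np

def wiener_filter(g, k=5, noise_var=None):
    """Adaptive Wiener filter of Eqs. (1)-(3) over a k x k neighbourhood."""
    pad = k // 2
    padded = np.pad(g.astype(float), pad, mode="reflect")
    h, w = g.shape
    mu = np.empty((h, w))
    var = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            win = padded[i:i + k, j:j + k]
            mu[i, j] = win.mean()        # Eq. (1): local mean
            var[i, j] = win.var()        # Eq. (2): local variance
    if noise_var is None:                # estimate v^2 as the mean local variance
        noise_var = var.mean()
    gain = np.clip(var - noise_var, 0.0, None) / np.where(var == 0, 1.0, var)
    return mu + gain * (g - mu)          # Eq. (3)

rng = np.random.default_rng(1)
img = 100.0 + rng.normal(0.0, 10.0, (32, 32))   # flat patch plus Gaussian noise
out = wiener_filter(img)
print(out.std() < img.std())  # -> True: the noise variance is reduced
```

Where the local variance is close to the noise variance (flat regions), the gain is near zero and the output approaches the local mean; near strong structure the gain approaches one and detail is preserved.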

2.2 Generating SIFT Key Points

We generate scale-invariant key points in the action image using the method of Lowe [8], with the parameters modified to suit the needs of the action recognition system. SIFT generates key points in a two-step procedure: (1) scale-space extrema detection and (2) key point localization.

S(x, y, \sigma) = G(x, y, \sigma) * I(x, y)    (5)

G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2}    (6)

In scale-space extrema detection, we find locations that are invariant to scaling, which is accomplished by applying the Gaussian kernel function G to the input image I as shown in Eq. (5), where * is the convolution operator. In the current work, we use the variance value σ = 1.7321 in Eq. (6).

\mathrm{DoG}(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = S(x, y, k\sigma) - S(x, y, \sigma)    (7)

Fig. 4 Image after applying Wiener and median filter
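The scale-space and DoG construction of Eqs. (5)–(7) can be reproduced in a few lines; this editor-added NumPy sketch builds one DoG level with the σ = 1.7321 used in the text (the kernel radius and k = √2 are assumptions, not values from the paper).

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised 2-D Gaussian of Eq. (6)."""
    radius = int(3 * sigma)
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def convolve(img, kern):
    """Direct 'same'-size convolution of Eq. (5); slow but dependency-free."""
    pad = kern.shape[0] // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kern.shape[0], j:j + kern.shape[1]] * kern)
    return out

sigma, k = 1.7321, np.sqrt(2)                 # sigma from the text; k assumed
img = np.zeros((21, 21))
img[10, 10] = 1.0                             # impulse test image
# Eq. (7): DoG = S(x, y, k*sigma) - S(x, y, sigma)
dog = convolve(img, gaussian_kernel(k * sigma)) - convolve(img, gaussian_kernel(sigma))
print(dog[10, 10] < 0)  # -> True: DoG responds negatively at the centre of a bright dot
```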


To detect stable key points, we convolve the image with the difference-of-Gaussian (DoG) function using 3 octaves at 3 levels, as given by Eq. (7), where k is a constant factor; the result is shown in Fig. 5. To find the extrema points in the image, each sample point is compared with its eight surrounding neighbours in the current image and its nine neighbours in the scales above and below. A sample point is selected if it is larger or smaller than all twenty-six of its neighbours. Next, points that are low in contrast are rejected based on the Taylor expansion of the scale-space function:

\mathrm{DoG}(\mathbf{x}) = \mathrm{DoG} + \frac{\partial \mathrm{DoG}^T}{\partial \mathbf{x}} \mathbf{x} + \frac{1}{2} \mathbf{x}^T \frac{\partial^2 \mathrm{DoG}}{\partial \mathbf{x}^2} \mathbf{x}    (8)

Fig. 5 DoG operation with 3 octaves in 3 levels


\hat{\mathbf{x}} = - \left( \frac{\partial^2 \mathrm{DoG}}{\partial \mathbf{x}^2} \right)^{-1} \frac{\partial \mathrm{DoG}}{\partial \mathbf{x}}    (9)

The function value \mathrm{DoG}(\hat{\mathbf{x}}) in Eq. (10) is used to remove the extrema points with low contrast, taken as less than 0.1 in the current work:

\mathrm{DoG}(\hat{\mathbf{x}}) = \mathrm{DoG} + \frac{1}{2} \frac{\partial \mathrm{DoG}^T}{\partial \mathbf{x}} \hat{\mathbf{x}}    (10)

Further, extrema points that are poorly localized along an edge are identified and removed by the method of Harris et al. [12], which has the following steps: first, the 2×2 Hessian matrix (HS) of Eq. (11) is computed at each extremum point; next, the sum of the eigenvalues is obtained from the trace of HS and their product from its determinant, as given by Eq. (12):

HS = \begin{pmatrix} \mathrm{DoG}_{xx} & \mathrm{DoG}_{xy} \\ \mathrm{DoG}_{xy} & \mathrm{DoG}_{yy} \end{pmatrix}    (11)

\mathrm{Tr}(HS) = \mathrm{DoG}_{xx} + \mathrm{DoG}_{yy} = \alpha + \beta, \qquad \mathrm{Det}(HS) = \mathrm{DoG}_{xx} \mathrm{DoG}_{yy} - (\mathrm{DoG}_{xy})^2 = \alpha\beta    (12)

Let r be the ratio between the eigenvalue of highest magnitude α and that of lowest magnitude β, such that α = rβ. Then

\frac{\mathrm{Tr}(HS)^2}{\mathrm{Det}(HS)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r\beta + \beta)^2}{r\beta^2} = \frac{(r + 1)^2}{r}    (13)

The expression (r + 1)^2 / r is minimal when the eigenvalues are equal and increases with r. To retain accurate SIFT key interest points, Eq. (14) must be satisfied:

\frac{\mathrm{Tr}(HS)^2}{\mathrm{Det}(HS)} < \frac{(r + 1)^2}{r}    (14)

In the proposed method, we use r = 50. Further, to remove key points that lie far away from the human body, we discard outlier data using the lower 3rd and upper 97th percentile threshold values.

\mathrm{Dist} = \left( \sum_{i=1}^{2} |x_i - y_i|^2 \right)^{1/2}    (15)

Even though the key points are generated accurately on the human body, a few key points are almost the same and indistinguishable. To remove these overlapped key points, we calculate the Euclidean distance between neighbouring key points using Eq. (15) and eliminate key points closer than the threshold distance value of 3.
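The overlap-removal step with the Euclidean distance of Eq. (15) can be sketched greedily. This is an editor-added illustration; the keep-first tie-breaking strategy is an assumption, since the paper does not specify which of two overlapping points is retained.

```python
import numpy as np

def remove_close_points(pts, min_dist=3.0):
    """Greedily drop key points closer than min_dist (Eq. (15)) to a kept point."""
    kept = []
    for p in pts:
        if all(np.linalg.norm(p - q) >= min_dist for q in kept):
            kept.append(p)
    return np.array(kept)

pts = np.array([[10.0, 10.0],
                [11.0, 10.5],     # ~1.1 px from the first point: an overlap
                [30.0, 40.0]])
print(len(remove_close_points(pts)))  # -> 2: the two overlapping points collapse to one
```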


2.3 Statistical Information of SIFT Key Points

From the generated key interest points, we select twelve statistics of the SIFT key interest points as parameters for training our model. They are as follows:

• Distances between each pair of the topmost, bottommost, leftmost, and rightmost points.
• Percentage of points distributed above, below, to the left, and to the right of the midpoint axes.
• Standard deviation of the x and y coordinate values of the points.

Distances are measured with the Euclidean distance of Eq. (15). The expressions used for identifying the rightmost (R), leftmost (L), topmost (T), and bottommost (B) points are shown in Eqs. (16)–(19), respectively, and the corresponding points in different actions are shown in Fig. 6:

R(x, y) = \arg\max_{i=1..n} \mathrm{Point}(x_i, y)    (16)

L(x, y) = \arg\min_{i=1..n} \mathrm{Point}(x_i, y)    (17)

T(x, y) = \arg\max_{i=1..n} \mathrm{Point}(x, y_i)    (18)

B(x, y) = \arg\min_{i=1..n} \mathrm{Point}(x, y_i)    (19)

UP = \mathrm{Count}(\mathrm{Points}(x, y) > B(x, y) + (T(x, y) - B(x, y))/2)    (20)

BP = \mathrm{Count}(\mathrm{Points}(x, y) < B(x, y) + (T(x, y) - B(x, y))/2)    (21)

LP = \mathrm{Count}(\mathrm{Points}(x, y) < L(x, y) + (R(x, y) - L(x, y))/2)    (22)

RP = \mathrm{Count}(\mathrm{Points}(x, y) > L(x, y) + (R(x, y) - L(x, y))/2)    (23)

Fig. 6 Topmost, bottommost, leftmost, and rightmost SIFT key interest points generated on the human action images: (a) boxing, (b) clapping, (c) waving, (d) jogging, (e) running, (f) walking


Next, we count the number of points above (UP), below (BP), to the left of (LP), and to the right of (RP) the midpoint axes using Eqs. (20)–(23), respectively; from these counts we calculate the percentage of points scattered around the midpoint axes. Finally, the standard deviations of the x coordinates (SD of x) and the y coordinates (SD of y) of the key points are computed with Eqs. (24) and (25).

\mathrm{SD\ of\ } x = \sqrt{ \frac{\sum_i |x_i - \mu_x|^2}{\mathrm{Number\ of\ Points}} }    (24)

\mathrm{SD\ of\ } y = \sqrt{ \frac{\sum_i |y_i - \mu_y|^2}{\mathrm{Number\ of\ Points}} }    (25)
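Putting Eqs. (16)–(25) together, the twelve-dimensional feature vector can be sketched as below. This is an editor-added NumPy sketch; the ordering of the six pairwise distances is an assumption, since the paper does not specify it.

```python
import numpy as np

def action_features(pts):
    """Twelve statistics of Sect. 2.3 from an (n, 2) array of SIFT key points."""
    x, y = pts[:, 0], pts[:, 1]
    T, B = pts[y.argmax()], pts[y.argmin()]          # topmost / bottommost, Eqs. (18)-(19)
    L, R = pts[x.argmin()], pts[x.argmax()]          # leftmost / rightmost, Eqs. (16)-(17)
    # six pairwise distances between the four extreme points (Eq. (15))
    ext = [T, B, L, R]
    dists = [np.linalg.norm(a - b) for i, a in enumerate(ext) for b in ext[i + 1:]]
    # percentage of points on each side of the midpoint axes, Eqs. (20)-(23)
    mid_y = (T[1] + B[1]) / 2.0
    mid_x = (L[0] + R[0]) / 2.0
    pcts = [100.0 * (y > mid_y).mean(), 100.0 * (y < mid_y).mean(),
            100.0 * (x < mid_x).mean(), 100.0 * (x > mid_x).mean()]
    # standard deviations of the coordinates, Eqs. (24)-(25)
    return np.array(dists + pcts + [x.std(), y.std()])

feats = action_features(np.random.default_rng(0).random((50, 2)))
print(feats.shape)  # -> (12,)
```

The six distances, four percentages, and two standard deviations account for the twelve parameters fed to the classifiers.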

3 Experiment and Results

We conducted the experiment in MATLAB using the standard benchmark KTH dataset [13]. The KTH dataset contains six actions (boxing, hand clapping, hand waving, jogging, running, and walking) performed by twenty-five subjects, captured over a homogeneous background; these are treated as classes 1 to 6, respectively, in our experiment. The video frames are 160×120 pixels in size. We selected 6 static image frames from each of the 25 subjects for each of the 6 actions captured in the outdoor scenario, for a total of 900 images. The input images are grey-scale and resized to a fixed size of 256×256. We cross-validated our model using the K-fold cross-validation technique with K = 5 and the hold-out validation technique with a 25% hold-out. We used different classifiers for the experiment; Table 1 shows the results obtained for each. As shown in the table, we obtained 80.2% accuracy with the weighted KNN classifier under K-fold validation; the corresponding confusion matrix is shown in Fig. 8. The K-fold confusion matrix shows that the true-positive rates for the boxing, hand waving, and running actions are above 80%, while the walking action has a low true-positive rate of 68%. Further, the false discovery rate is below 20% for boxing, hand waving, and running, and 26% for hand clapping, jogging, and walking. Next, with the hold-out validation technique, we obtained 85.8% accuracy with the cubic SVM classifier; the corresponding confusion matrix is shown in Fig. 11. The hold-out confusion matrix shows true-positive rates above 95% for boxing, hand waving, and running, and 71% for walking. Further, the false discovery rate is below 10% for boxing, hand waving, and running, and 25% for hand clapping and walking.

Table 1 Accuracy result for human action recognition

Classifier name              | K-fold (%) | Hold-out (%)
Weighted KNN                 | 80.2       | 84.4
Ensemble of bagged trees     | 79.3       | 79.6
Fine KNN                     | 78.4       | 82.7
Cubic SVM                    | 78.2       | 85.8
Ensemble with subspace KNN   | 78.1       | 81.8

Bold values mark the maximum accuracy, obtained with the weighted KNN (K-fold) and the cubic SVM (hold-out) classifiers.

Fig. 7 Confusion matrix for weighted KNN for K-fold validation
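The K = 5 protocol of Table 1 partitions the 900 images into five disjoint folds; a small editor-added sketch of the index bookkeeping follows (the shuffling seed is arbitrary).

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Shuffle n sample indices and split them into k disjoint folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(900, k=5)          # 900 images, as in the experiment
print([len(f) for f in folds])           # -> [180, 180, 180, 180, 180]
```

Each fold serves once as the test set while the remaining four train the classifier, and the five accuracies are averaged to give the K-fold column of Table 1.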

4 Conclusion

We have presented work on human action recognition in still images using statistical information obtained from SIFT key points generated on the human body. The method uses no manual annotation or bounding-box values for detecting the human body region, which reduces the extra work. We have identified twelve different useful parameters of the SIFT key points for identifying human actions efficiently. We applied different classifiers to check the performance of the proposed model and verified it using the K-fold and hold-out validation techniques on the standard KTH dataset.

Fig. 8 Confusion matrix of true positive rates for weighted KNN for K-fold validation

Fig. 9 Confusion matrix of false discovery rates for weighted KNN for K-fold validation

Fig. 10 Confusion matrix for cubic SVM for hold-out validation

Fig. 11 Confusion matrix of true positive rates for cubic SVM for hold-out validation


Fig. 12 Confusion matrix of false discovery rates for cubic SVM for hold-out validation

References

1. Guo, G., Lai, A.: A survey on still image based human action recognition. Pattern Recogn. 47(10), 3343–3361 (2014)
2. Girish, D., Singh, V., Ralescu, A.: Understanding action recognition in still images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2020)
3. Zhang, Y., et al.: Action recognition in still images with minimum annotation efforts. IEEE Trans. Image Process. 25(11), 5479–5490 (2016)
4. Raja, K., et al.: Joint pose estimation and action recognition in image graphs. In: 2011 18th IEEE International Conference on Image Processing. IEEE (2011)
5. Thurau, C., Hlavác, V.: Pose primitive based human action recognition in videos or still images. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2008)
6. Yang, W., Wang, Y., Mori, G.: Recognizing human actions from still images with latent poses. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE (2010)
7. Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human-object interaction activities. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE (2010)
8. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
9. Delaitre, V., Laptev, I., Sivic, J.: Recognizing human actions in still images: a study of bag-of-features and part-based representations. In: BMVC 2010—21st British Machine Vision Conference (2010)
10. Shapovalova, N., et al.: On importance of interactions and context in human action recognition. In: Iberian Conference on Pattern Recognition and Image Analysis. Springer, Berlin, Heidelberg (2011)
11. Lim, J.S.: Two-dimensional signal and image processing. Englewood Cliffs (1990)


12. Harris, C.G., Stephens, M.: A combined corner and edge detector. In: Alvey Vision Conference 15(50) (1988)
13. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), vol. 3. IEEE (2004)

Detecting Credit Card Fraud Using Majority Voting-Based Machine Learning Approach V. Akshaya, M. Sathyapriya, R. Ranjini Devi, and S. Sivanantham

Abstract Credit cards have become a common payment mode for both offline and online purchases as a result of new developments in communication and electronic-commerce systems, resulting in a substantial rise in transaction-related fraud. Every year, fraudulent credit card purchases cost companies and customers a considerable amount of money, and fraudsters actively attempt to use new technologies and methods to commit fraud. The identification of fraud-prone transactions has become a major factor influencing the adoption of e-payment systems. As a consequence, there is an inevitable need for accurate algorithms for detecting fraudulent transactions on credit cards. This paper introduces an intelligent method for detecting fraud in credit card transactions. Popular machine learning algorithms, namely logistic regression, gradient boosting, random forest, k-nearest neighbor, and a voting classifier, are evaluated against a credit card dataset. On comparing the different approaches, the paper shows that the voting classifier provides the best accuracy of 89.25% and an F1-score of 0.633.

1 Introduction

With the increasing interest in running businesses on the Internet and the growth of electronic monetary transactions in the cashless economy, accurate fraud detection has become a crucial factor in securing such transactions. In e-business Web applications, one of the primary payment methods is the credit card, while payment on delivery (POD) is the least used. Statistics show that even in a small country like Malaysia, credit card purchases numbered 3200 lakh in 2011 and rose to 3600 lakh in 2015. The increase in credit card purchases has amplified the threat of fraud as well. Fraudsters hide details such as their location and identity on the Internet. These adverse fraudsters

V. Akshaya · M. Sathyapriya · R. Ranjini Devi Department of CSE, IFET College of Engineering, Villupuram, India S. Sivanantham (B) Department of IT, Adhiyamaan College of Engineering, Hosur, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_30


threaten the financial industry. Merchants are affected the most by these adversaries and have to bear the losses incurred by such attacks. When an adversary attempts to make purchases using credit card details without the owner's knowledge, this is referred to as credit card fraud. Credit card fraud costs billions of dollars due to the illegitimate use of credit cards and a lack of effective protection mechanisms [1]. It is difficult to draw an exact figure for the losses because credit card companies are usually hesitant to reveal such information, so the public is rarely aware of the monetary losses caused by credit card fraud. Credit card usage without proper protection results in billion-dollar financial losses; credit card fraud costs the world $22.8 billion. Fraud detection is considered a major challenge in machine learning for a variety of reasons: for instance, the data continue to evolve rapidly over time with new attacks appearing every now and then, seasonality, and many other factors. Depending upon the behavior of transactions, various approaches can be used to separate legitimate transactions from fraudulent ones [2]. These models are primarily classified into supervised and unsupervised learning algorithms. Identification and elimination of fraud cases is an operative solution for mitigating illegal credit card usage. Many data mining and machine learning methods have evolved, and are still evolving, for detecting fraudulent credit card transactions [3]. In recent times, numerous machine learning algorithms such as logistic regression, random forest, support vector machines, hidden Markov models, and Naïve Bayes classification have been employed to improve the accuracy of fraud detection [4].
In this paper, the popular machine learning techniques logistic regression, random forest, gradient boosting, k-nearest neighbors, and a voting classifier are applied to the discovery of credit card fraud. The performance evaluation of these methods is done on a real-world credit card payment transaction dataset using standard criteria. The predictions of random forest, logistic regression, k-nearest neighbors, and gradient boosting are combined by the voting classifier to obtain an accurate fraud detection result [5]. In this typical hybrid implementation, majority voting (the voting classifier) proves to be reliable.

1.1 Logistic Regression

Logistic regression (LR) is a statistical approach used to evaluate a dataset in which the outcome is determined by one or more independent variables. The outcome of such a classification or regression is dichotomous: each prediction is usually binary in nature (0/1, no/yes, false/true), and the binary outcomes are stored in dummy variables. Logistic regression is treated as a special case of linear regression [6]. Its working principle is to estimate the probability of the occurrence of an event by fitting the data to a logistic function.
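As a toy illustration of this principle, a fitted logistic model turns a linear score into a probability and thresholds it into a binary outcome. The weights, bias, and feature values below are invented for the example, not learned from the paper's dataset:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_fraud(features, weights, bias, threshold=0.5):
    """Binary outcome (0 = legitimate, 1 = fraud) from a linear score.

    `weights` and `bias` would normally come from fitting; here they are
    purely illustrative values.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(z)
    return int(p >= threshold), p

# Hypothetical transaction with two features
label, prob = predict_fraud([2.5, -1.0], weights=[1.2, 0.8], bias=-0.5)
```

Here z = -0.5 + 3.0 - 0.8 = 1.7, so the probability is about 0.85 and the transaction is flagged as fraud.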

Detecting Credit Card Fraud Using Majority Voting-Based Machine …


1.2 Gradient Boosting

Bagging is a common ensemble technique in which multiple standalone predictors are constructed independently and their predictions are united by simple averaging. Boosting is another ensemble approach in which the predictors are built consecutively, each new predictor attempting to correct the errors of the ensemble built so far [7]. Gradient boosting (GB) is a popular machine learning idea typically applied to solve regression and classification problems. The specialty of gradient boosting is its production of a strong prediction model as an ensemble of weak prediction models such as decision trees. The predictor is constructed in a phase-by-phase fashion similar to existing boosting methods, and it generalizes the process by permitting optimization of a differentiable loss function in each phase.
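The stage-wise construction can be sketched from scratch. This toy, squared-error boosting with single-split "stumps" on 1-D data, is only a conceptual illustration of the residual-fitting idea, not the configuration used in the paper:

```python
# Toy gradient boosting for regression: each stage fits a depth-1 "stump"
# to the residuals (the negative gradient of squared error) of the current
# ensemble, then the ensemble is nudged by a small learning rate.

def fit_stump(x, residuals):
    """Best single-split predictor: mean residual on each side of a threshold."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for xi, r in zip(x, residuals) if xi <= t)
               + sum((r - rm) ** 2 for xi, r in zip(x, residuals) if xi > t))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gradient_boost(x, y, n_stages=50, lr=0.1):
    """Return the in-sample fitted values after `n_stages` boosting rounds."""
    pred = [sum(y) / len(y)] * len(y)  # stage 0: constant model
    for _ in range(n_stages):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return pred

# Invented 1-D data with two plateaus; the ensemble gradually fits both
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.1, 0.9, 3.0, 3.2, 2.9]
fitted = gradient_boost(x, y)
```

After 50 small corrective steps, the fitted values track the two plateaus closely, which is exactly the consecutive error-correcting behavior described above.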

1.3 Random Forest

Random forest (RF) is one of the most efficient data mining techniques and is widely used in machine learning. As the name implies, this algorithm grows a forest: a collection of decision trees, usually trained through the bagging concept [8]. The hyperparameters of a random forest are more or less similar to those of its decision trees, and there is no compulsion to combine a decision tree with a bagging classifier. In a random forest, additional randomness is added to the model as the trees are grown: when splitting a node, the method does not search for the most significant feature overall but for the best feature among a random subset of features. This random approach achieves a wide range of diversity and hence a better prediction model. Additionally, unlike a normal decision tree, which searches all features, RF chooses the best possible feature from the random subset when setting thresholds for classification, and the trees can be made even more random by using random thresholds.

1.4 K-Nearest Neighbors

Neighbors-based classification is a lazy yet simple machine learning method. KNN never attempts to build an internal model as other ML methods do; it simply stores the instances of the training data. Classification is performed by a majority vote among the k neighbors closest to the data point, where


k is a user-preferred variable [9]. KNN classification is simple to implement; its other advantages are robustness toward noisy data and effectiveness in dealing with voluminous data.
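A minimal from-scratch sketch of this majority-vote rule, where the only "training" is storing the data (the points and labels are hypothetical):

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest stored points
    (squared Euclidean distance, which preserves the nearest-neighbor order)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, query)), label)
        for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D transaction features labeled 'legit' / 'fraud'
points = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.15, 0.15)]
labels = ['legit', 'legit', 'fraud', 'fraud', 'legit']
pred = knn_predict(points, labels, query=(0.2, 0.2), k=3)  # → 'legit'
```

Choosing an odd k avoids ties in two-class problems such as fraud detection.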

1.5 Voting Classifier

The voting classifier (VC) is a meta-classifier that links two or more analogous machine learning classifiers, even conceptually different ones. Classification is done through majority voting [9]. Figure 1 shows the ensemble solution provided by the voting classifier. The VC is not a conventional classifier in itself but can be treated as a wrapper for a group of well-trained, conceptually dissimilar classifiers that are trained and evaluated side by side so that the individual peculiarities of each technique are balanced out.
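The hard-voting rule itself is simple enough to sketch directly. The base-model predictions below are hypothetical; note that with an even number of voters a 2–2 tie resolves here by insertion order, whereas real libraries define their own tie-breaking:

```python
from collections import Counter

def hard_vote(predictions_per_model):
    """Majority vote across base classifiers, per sample.

    `predictions_per_model` is a list of prediction lists, one per model,
    all of the same length.
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical predictions (1 = fraud) from LR, GB, RF, and KNN on 4 transactions
lr  = [0, 1, 1, 0]
gb  = [0, 1, 0, 0]
rf  = [1, 1, 1, 0]
knn = [0, 0, 1, 0]
ensemble = hard_vote([lr, gb, rf, knn])  # → [0, 1, 1, 0]
```

The wrapper view in the text corresponds exactly to this: each base model votes independently, and the meta-classifier only counts votes.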

2 Related Work

In recent times, researchers have reached different research frontiers in developing robust credit card fraud detection systems (CFDS) using machine

Fig. 1 Working of voting classifier


learning. In this section, we discuss a few noteworthy research findings on CFDS developed using machine learning approaches. Natekin and Knoll have presented a work using the gradient boosting model (GBM) and advocate that gradient boosting gives the best performance on par with conventional bagging and boosting techniques [7]. Randhawa et al. have deliberated on AdaBoost and majority voting, and the authors recommend majority voting since it offers an ideal detection rate when combined with the AdaBoost classifier [10]. Biau et al. have implemented random forest and logistic regression and highlight that logistic regression provided the optimal result together with random forest [8]. Ravisankar et al. have worked with support vector machines, logistic regression, MLFF, and GMDH; the authors state that logistic regression provides optimal accuracy compared to the rest of the methods [11]. Chen et al. proposed an integrated model with random forest, decision tree, rough set theory, and a back-propagation neural network to construct a fraud detection framework for the corporate finance industry, and the authors suggest that such hybrid models help to make more accurate classifications than other hybrid classifiers [12]. Li et al. established a hybrid model combining principal component analysis and random forest, used it to detect fraudulent transactions in an insurance dataset, and observed that the hybrid predictors achieve optimal efficacy in terms of time and cost [13]. Most existing research on CFDS focuses on improving the accuracy rate in detecting frauds. The outcome of the above review reveals that hybrid machine learning approaches offer improved performance compared to methods used individually.

3 Experimental Setup

For experimentation, we have chosen logistic regression, gradient boosting, k-nearest neighbors, and random forest to work with the voting classifier. The voting classifier considers the best parameters used by the classification algorithms to give the top accuracy, precision, recall, and F1-score, and it is commonly used for improving classification accuracy. The entire experiment is staged in the Anaconda framework in a Jupyter notebook, and the performance of the classifiers is evaluated using the Kaggle credit card dataset.

3.1 Dataset Description

The Kaggle credit card dataset is a familiar real-world dataset that is often used for machine learning research. It is composed of payment transactions made using credit cards, with a subset of transactions labeled as fraudulent. The anonymized attributes are labeled in sequence from V1 to V28, and the dataset is built strictly avoiding identifying components beyond the data points themselves; the metadata does not supply any larger, considerable information. The main additional attributes are the time of the transaction and the amount involved: time is recorded to the second, and the amount transacted within a specific interval is taken into account for the amount attribute. The dataset used here is composed of 80% normal (non-fraud) transactions and 20% fraud transactions.

3.2 About the Anaconda Framework

Anaconda Navigator is a popular machine learning platform. The two major open-source languages used in Anaconda for machine learning are R and Python. Developing machine learning applications in standalone mode is difficult due to the absence of relevant packages; data analysis, in particular predictive analytics, performs best in the Anaconda framework, which supplies the user with all the application constructs required to convert a proposed ML model into a usable application. The Navigator contains many IDEs for the effective development of programming models in many languages. In our experimentation phase, we have used the Jupyter notebook as the IDE. The Jupyter notebook is simple to handle and supports diverse dataset formats such as ARFF, C4.5, XRFF, JSON instance files, BSI, CSV, and MATLAB ASCII files. Regarding functionality, major data analysis techniques such as data preprocessing, association, clustering, regression, and classification can be staged in Anaconda easily [14]. There are many other data mining and machine learning tools that implement the discussed ML algorithms, but the Anaconda framework is well suited to implementing complex ensemble classifiers such as the voting classifier. As the initial step, the raw data are read using appropriate data readers in the preferred language, followed by the preprocessing step, where the data are cleansed using filters and other preprocessors. Once the data are preprocessed, the ML models are selected, and logistic regression, random forest, gradient boosting, k-nearest neighbors, and the voting classifier are applied in order; the best predictions are found by majority voting among the individual predictors. k-fold validation is adopted to achieve optimal accuracy, precision, recall, and F1-score.
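A minimal sketch of the k-fold splitting step of this workflow, using contiguous folds and omitting the shuffling or stratification a real pipeline would typically add:

```python
def kfold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs splitting range(n_samples) into
    k contiguous folds; every sample appears in exactly one test fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

# Example: 10 samples split into 5 folds of 2 test samples each
folds = list(kfold_indices(10, k=5))
```

Each model is fit on the train indices and scored on the test indices of every fold, and the per-fold metrics are averaged, which is how the "with validation" numbers in the next section would be obtained.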

4 Result and Discussion

The result estimation includes the comparison of accuracy, precision, recall, and F1-score. These performance metrics are derived from the confusion matrix obtained after classification; the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts are used for the calculation [15]. Accuracy, precision, recall, and F1-score are calculated as

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 × (Precision × Recall) / (Precision + Recall)
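These standard definitions can be computed directly from the confusion-matrix counts; the counts below are hypothetical, for illustration only:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts,
    guarding against division by zero in degenerate cases."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical counts: 40 frauds caught, 50 legitimate kept, 5 of each error
acc, prec, rec, f1 = classification_metrics(tp=40, tn=50, fp=5, fn=5)
```

With these counts, accuracy is 0.90 and precision, recall, and F1 all equal 40/45 ≈ 0.889.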


The experimentation is done in two stages. In the first stage, the five discussed classifiers are employed without validation and the outcomes of the classification are documented; in the second stage, the five classification systems are executed with k-fold validation. The averages of accuracy, F1-score, precision, and recall with and without validation are recorded in Table 1. At every stage, the voting classifier provides a decent result with the LR, GB, RF, and KNN combination [16]. Looking at the accuracy measured for the discussed machine learning models, the voting classifier gives the optimal performance, as shown in Table 1. As far as the F1-score is concerned, logistic regression, random forest, and the voting classifier provide more or less equivalent, near-optimal results. For precision, random forest provides the best result while logistic regression gives the least, with the GB and KNN classifiers giving average performance. Comparing recall, the results show that logistic regression is the best among the models, gradient boosting the least, and the rest are on an average scale. The above comparisons show that, without any suspicion, the voting classifier gives the optimal overall performance for the credit card fraud detection problem: it combines LR, GB, KNN, and RF and outstrips their individual performance.

Table 1 Performance comparison of the approaches (averages)

Classifier used        Accuracy (%)   F1-score   Precision   Recall
Logistic regression    84.60          0.614      0.065       0.921
Gradient boosting      82.20          0.578      0.912       0.348
K-nearest neighbors    83.56          0.592      0.869       0.675
Random forest          84.33          0.627      0.945       0.775
Voting classifier      89.25          0.633      0.923       0.783


5 Conclusion

The proposed credit card fraud detection system is hybrid in nature because it involves logistic regression, gradient boosting, random forest, k-nearest neighbors, and a voting classifier, and this hybrid approach consistently outperforms the techniques used individually. After analyzing the results, it is evident that the voting classifier offers the optimal accuracy rate and F1-score of 89.25% and 0.633, respectively; at the same time, random forest offers a good precision rate of 0.945, and logistic regression is best with a recall rate of 0.921.

References

1. Zhang, X., Han, Y., Xu, W., Wang, Q.: HOBA: a novel feature engineering methodology for credit card fraud detection with a deep learning architecture. Inf. Sci. (2019)
2. Carneiro, N., Figueira, G., Costa, M.: A data mining based system for credit-card fraud detection in e-tail. Decis. Support Syst. 95, 91–101 (2017)
3. Lebichot, B., Le Borgne, Y.-A., He-Guelton, L., Oblé, F., Bontempi, G.: Deep-learning domain adaptation techniques for credit cards fraud detection. In: Proceedings of the INNS Big Data and Deep Learning Conference, Genoa, Italy, pp. 7888–7900 (2019)
4. John, H., Naaz, S.: Credit card fraud detection using local outlier factor and isolation forest. Int. J. Comput. Sci. Eng. 7(4), 1060–1064 (2019)
5. Adewumi, A.O., Akinyelu, A.A.: A survey of machine-learning and nature-inspired based credit card fraud detection techniques. Int. J. Syst. Assur. Eng. Manag. 8, 937–953 (2017)
6. Peng, C.Y.J., Lee, K.L., Ingersoll, G.M.: An introduction to logistic regression analysis and reporting. J. Educ. Res. 96(1), 22–31 (2010)
7. Natekin, A., Knoll, A.: Gradient boosting machines, a tutorial. Front. Neurorobot. 7(21), 1–21 (2013)
8. Biau, G.: Analysis of a random forests model. J. Mach. Learn. Res. 13(7), 1063–1095 (2012)
9. Chen, H., Lin, Y., Tian, Q., Xu, K.: A comparison of multiple classifier combinations using different voting-weights for remote sensing image classification. Int. J. Remote Sens. 39(11), 208–219 (2018)
10. Randhawa, K., Loo, C.K., Seera, M., Lim, C.P., Nandi, A.K.: Credit card fraud detection using AdaBoost and majority voting. IEEE Access 6, 14277–14284 (2018)
11. Ravisankar, P., Ravi, V., Raghava Rao, G., Bose, I.: Detection of financial statement fraud and feature selection using data mining techniques. Decis. Support Syst. 50(2), 491–500 (2011)
12. Chen, F.H., Chi, D.J., Zhu, J.Y.: Application of random forest, rough set theory, decision tree and neural network to detect financial statement fraud: taking corporate governance into consideration. In: Proceedings of the International Conference on Intelligent Computing, pp. 221–234. Springer, Berlin (2014)
13. Li, Y., Yan, C., Liu, W., Li, M.: A principle component analysis based random forest with the potential nearest neighbour method for automobile insurance fraud identification. Appl. Soft Comput. 70, 1000–1009 (2017)
14. Anaconda documentation. Available online: https://www.anaconda.org
15. Powers, D.M.W.: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2(1), 37–63 (2011)
16. Sivanantham, S., Dhinagar, S.R., Kawin, P., Amarnath, J.: Hybrid approach using machine learning techniques in credit card fraud detection. In: Suresh, P., Saravanakumar, U., Hussein, A.S.M. (eds.) Advances in Smart System Technologies. Advances in Intelligent Systems and Computing, vol. 1163. Springer, Singapore (2021)

Evaluation and Performance Analysis of Magnitude Comparator for High-Speed VLSI Applications Madhusudhan Reddy Machupalli, Krishnaveni Challa, Murali Krishna Bathula, G. Rajesh Kumar, and Raja Posupo

Abstract In the architecture of central processing units (CPUs) and microcontrollers, the magnitude (digital) comparator plays a vital role. The performance of the comparator degrades due to the critical path delay created by the internal complexity of the circuit. To overcome this, two novel 1-bit comparator circuits are designed using CMOS logic structures (a combination of pass transistor logic (PTL) and transmission gate logic (TGL)). The proposed 1-bit comparator circuits are compared with the existing ones and exhibit a better power delay product (PDP). The simulation results are obtained with the Mentor Graphics tool at VDD = 1 V using 22-nm CMOS technology. The work is extended to the design and implementation of a 2-bit magnitude comparator using the proposed 1-bit comparator circuits. The proposed 2-bit comparator circuit occupies less area on the silicon chip and consumes less power than the existing magnitude comparators.

1 Introduction

The design of digital circuits with high speed and low power is a challenging task nowadays. The magnitude comparator is one example of a digital circuit that plays a significant role in the architecture of CPUs and microcontrollers (i.e., the comparison of bits). As the bit size of the comparison increases, the architecture of the comparator grows in parallel; with this, the execution time and wiring complexity create

M. R. Machupalli (B)
ECE, K.S.R.M College of Engineering, Kadapa 516003, India

Krishnaveni Challa · G. Rajesh Kumar
ECE, JNTUK, Kakinada 533003, India

M. K. Bathula
ECE, Koneru Lakshmaiah Education Foundation, Green Fields, Vaddeswaram, Guntur 522502, India

R. Posupo
ECE, RISHI MS Institute of Engineering and Technology for Women, Hyderabad 500090, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_31


the increase of critical path delay [1]. The size and complexity of the comparator circuit mainly depend on the device geometries and the transistor count; using CMOS logic structures, the comparator circuit can be designed to consume less power and area. The 1-bit and 2-bit comparator circuits used for binary number comparison are basic arithmetic components in high-speed VLSI circuits [1, 2]. The existing topologies of the 2-bit comparator circuit are as follows. The standard CMOS logic style-based 2-bit comparator requires 54 transistors; the logic style is regular, low power, and simple to construct, but it exhibits more delay and utilizes a large area on the silicon chip. The design and implementation of a TGL style-based 2-bit comparator require 74 transistors; full-swing output voltage levels are obtained with this logic style, but due to the larger number of internal nodes, the circuit complexity, and the transistor count, its power consumption is higher. The design of a 2-bit comparator using PTL consists of 40 transistors; it consumes less area, with the transistor count reduced by around 42% and 26% compared to TGL and CMOS logic, respectively, but degradation of the output logic levels takes place [1, 2]. The hybrid logic (a combination of CMOS and PTL) requires 46 transistors; the width-to-length ratios of the transistors are not equally balanced, so it exhibits a high propagation delay [2]. The half-adder logic-based 2-bit comparator requires 32 transistors; because the comparator outputs A < B, A = B, and A > B pass through pass transistors, degradation of the output takes place [1]. The modified TGL-based 2-bit comparator requires 30 transistors; it occupies less silicon area, with the transistor count reduced by around 60%, 44%, 25%, and 6% compared to TGL, CMOS, PTL, and half-adder logic, respectively, and full-swing output voltage levels are obtained with this logic style [1].
However, due to the large number of internal nodes and the wiring complexity, the propagation delay is high, and time-skew problems can lead to short circuits. No more than three TGs can be connected in cascade due to the charge-sharing problem; a buffer circuit must be connected after every three TGs [3]. The remainder of the paper is organized as follows. To overcome the above limitations, two novel 1-bit comparator circuits are designed using CMOS logic structures (a combination of PTL and TGL); their design, and the extension to the implementation of a 2-bit magnitude comparator using the proposed 1-bit comparator circuits, is explained in Sect. 2. The simulation results, carried out with the Mentor Graphics tool at a 1 V supply voltage using 22-nm CMOS technology for the 1-bit and 2-bit comparator circuits, and the performance analysis are discussed in Sect. 3. Finally, the paper ends with conclusions.

2 Proposed Comparator Circuits

The 1-bit comparator circuit has two inputs (A, B) and three outputs (A < B, A = B, A > B). It compares the inputs and asserts the output indicating whether A is greater than (A > B), less than (A < B), or equal to (A = B) B. The truth table and block diagram of the 1-bit comparator are shown in [4]. The design and working principle of the proposed 1-bit comparator circuits are explained in this section.

Fig. 1 Circuit diagram of proposed 11 T 1-bit comparator circuit_1
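The intended behavior can be captured in a short functional model. This is a logic-level sketch only, not a transistor-level simulation; the A = B output is formed as the NOR of the other two outputs, as in circuit_1:

```python
def comparator_1bit(a, b):
    """Behavioral model of the 1-bit magnitude comparator: returns the
    (A < B, A = B, A > B) outputs for single-bit inputs."""
    lt = (not a) and b          # A < B
    gt = a and (not b)          # A > B
    eq = not (lt or gt)         # A = B as NOR of the other two outputs
    return int(lt), int(eq), int(gt)

# Full truth table over both input bits
table = {(a, b): comparator_1bit(a, b) for a in (0, 1) for b in (0, 1)}
```

For every input combination exactly one of the three outputs is high, matching the comparator truth table referenced in [4].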

2.1 Proposed 11 T 1-bit Comparator Circuit_1

The circuit diagram of the proposed 1-bit comparator circuit_1 is shown in Fig. 1. The design of circuit_1 requires 11 transistors: transistors M1, M2 form an AND gate (output A < B); transistors M3, M4 form an AND gate (output A > B); and transistors M5, M6, M7 form a NOR gate (output A = B). The operation of the circuit is as follows: if we assume the inputs A = ’0’ and B = ’1’, then the output A < B = ’1’ and the outputs A > B = ’0’ and A = B = ’0’ [4]. The AND gates are designed using a combination of PTL and TGL called hybrid logic; with this logic, the transistor count is reduced, good delay performance is exhibited, and full-swing output voltage levels are obtained. The NOR gate is designed according to pseudo-nMOS logic. Compared to CMOS logic, the transistor count is minimized, but the higher static power dissipation is the one drawback [3]. This drawback is rectified by using an XNOR gate instead of the NOR gate in the proposed 10 T 1-bit comparator circuit_2.

2.2 Proposed 10 T 1-bit Comparator Circuit_2

The circuit diagram of the proposed 1-bit comparator circuit_2 is shown in Fig. 2. The design of circuit_2 requires 10 transistors: transistors M1, M2 form an AND gate (output A < B); transistors M3, M4 form an AND gate (output A > B); and transistors M5, M6 form an XNOR gate (output A = B). The operation of the circuit is as follows: if we assume the inputs A = ’1’ and B = ’1’, then the output A = B = ’1’ and the outputs A > B = ’0’ and A < B = ’0’ [4]. The AND gates are designed using a combination of PTL and TGL called hybrid logic; with this logic, the transistor count is reduced, a better power delay product is


Fig. 2 Circuit diagram of proposed 10 T 1-bit comparator circuit_2

exhibited, and full-swing output voltage levels are obtained. The XNOR gate is also designed using hybrid logic; compared to CMOS and pseudo-nMOS logic, the transistor count is minimized, and the circuit therefore occupies less area on the silicon chip. As the bit size of the comparison increases, the architecture of the comparator grows in parallel, and with it the execution time and wiring complexity, which increase the critical path delay; the size and complexity of the comparator mainly depend on the device geometries and transistor count [1, 2]. To address this, a 2-bit magnitude comparator is designed using the proposed 10 T 1-bit comparator circuit_2.

2.3 Proposed 2-bit Comparator Circuit

The truth table and block diagram of the standard 2-bit comparator are shown in [1, 5, 6]. The circuit diagram of the proposed 2-bit comparator using the proposed 10 T 1-bit comparator circuit_2 is shown in Fig. 3. The design of the proposed 2-bit comparator requires two 1-bit comparators, three 2-input AND gates, and two 2-input OR gates, for a total of 30 transistors.

Fig. 3 Circuit diagram of proposed 2-bit comparator using proposed 10 T 1-bit comparator circuit_2

The operation of the circuit is as follows: it compares the 2-bit inputs from the most significant bit (MSB) to the least significant bit (LSB). If we assume the inputs A1A0 =


’01’ and B1B0 = ’11’, then the output A < B = ’1’ and the outputs A > B = ’0’ and A = B = ’0’ [6–8]. The proposed 2-bit comparator is designed using a combination of PTL [3] and TGL [3] called hybrid logic; with this logic, the transistor count is reduced, low power consumption and better delay performance are exhibited, and full-swing output voltage levels are obtained. Compared to the existing designs, the proposed 2-bit comparator uses a lower transistor count and therefore occupies less area on the silicon chip. The performance analysis of the proposed 2-bit comparator shows significant improvements over the six existing magnitude comparators.
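The MSB-first cascade can likewise be expressed as a behavioral sketch (logic level only; the hybrid PTL/TGL implementation and transistor sizing are outside this model):

```python
def comparator_1bit(a, b):
    """1-bit magnitude comparator stage: returns (A < B, A = B, A > B)."""
    lt = int((not a) and b)
    gt = int(a and (not b))
    eq = int(not (lt or gt))
    return lt, eq, gt

def comparator_2bit(a1, a0, b1, b0):
    """2-bit magnitude comparator built from two 1-bit comparators:
    the MSB stage decides unless the MSBs are equal, in which case
    the LSB stage decides."""
    lt1, eq1, gt1 = comparator_1bit(a1, b1)   # MSB stage
    lt0, eq0, gt0 = comparator_1bit(a0, b0)   # LSB stage
    lt = lt1 or (eq1 and lt0)                 # A < B
    gt = gt1 or (eq1 and gt0)                 # A > B
    eq = eq1 and eq0                          # A = B
    return int(lt), int(eq), int(gt)

# Example from the text: A1A0 = 01, B1B0 = 11 → A < B
result = comparator_2bit(0, 1, 1, 1)  # → (1, 0, 0)
```

The combining layer uses a handful of 2-input AND/OR terms, mirroring the gate budget quoted for the proposed design, and the model can be checked exhaustively against integer comparison over all 16 input combinations.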

3 Simulation Results

The input/output analysis of the proposed 1-bit comparator circuit_1 and circuit_2 is simulated using the Pyxis schematic Mentor Graphics tool in 22-nm CMOS technology, with a supply voltage of VDD = 1 V. The simulated input and output waveforms of the proposed 1-bit comparator circuit_1 and circuit_2 for A < B, A = B, and A > B are shown in Figs. 4 and 5. In the simulated waveforms, the inputs (A, B) are applied in all possible combinations, i.e., 00 to 11.

Fig. 4 Simulated input and output waveforms of the proposed 11 T 1-bit comparator circuit_1

Fig. 5 Simulated input and output waveforms of the proposed 10 T 1-bit comparator circuit_2

The aspect ratios of the transistors are kept at a minimum for the proposed 2-bit and 1-bit comparator circuits, i.e., width = 0.4 µm and length = 0.022 µm for pMOS and width = 0.2 µm and length = 0.022 µm for nMOS. For this purpose, BSIM3 (LEVEL 53) nMOS.1 and pMOS.1 transistor models with Vtn = 0.31 V and Vtp = −0.29 V are used. The layouts of the proposed 1-bit comparator circuit_1 and circuit_2, shown in Figs. 6 and 7, are drawn using the Pyxis layout Mentor Graphics tool with 130-nm CMOS technology design rules. Because of space constraints, the layouts of the existing designs are not shown, but the numerical values of the area occupied by the 1-bit comparator circuits are given in Table 1. By using the Euler's path approach [3], the area of the proposed 1-bit comparator circuit_2 is reduced to 3.82 µm² on the silicon chip, a reduction of 37%, 7.5%, and 4.7% compared to the CMOS, PTL, and half-adder logic comparators, respectively.

Fig. 6 Layout of the proposed 11 T 1-bit comparator circuit_1

Fig. 7 Layout of the proposed 10 T 1-bit comparator circuit_2


Table 1 Comparison of design metrics for proposed and existing 1-bit magnitude comparators

Comparator logic structure          Transistor count   Area (µm²)   Propagation delay (ns)   Power consumption (µW)   Power delay product (fJ)
CMOS logic [2]                      28                 6.12         6.03                     0.45                     2.713
Transmission gate logic (TGL) [6]   36                 7.35         7.14                     0.66                     4.712
Pass transistor logic (PTL) [1]     14                 4.13         3.56                     0.31                     1.103
Hybrid logic [1]                    20                 4.86         4.92                     0.37                     1.820
Modified TGL [2]                    14                 4.15         3.52                     0.29                     1.020
Half-adder logic [2]                12                 4.01         3.39                     0.26                     0.881
Proposed 11 T logic                 11                 3.87         3.27                     0.25                     0.817
Proposed 10 T logic                 10                 3.82         3.16                     0.23                     0.726

The performance analysis and post-layout simulation results of the proposed and alternative implementations of the 1-bit and 2-bit comparator circuits with VDD = 1 V at 22-nm CMOS technology are tabulated in Tables 1 and 2. The comparison with existing reported designs demonstrates the benefit of the proposed 1-bit and 2-bit comparators, which exhibit a better power delay product (PDP). Transistor sizes have been optimized to minimize the PDP. Among all the 2-bit comparator circuits, the proposed 2-bit comparator has the minimum PDP, reduced by 29%, 7.8%, and 2% compared with the PTL, half-adder logic, and modified TGL circuits, respectively. While optimizing the transistor sizes of the 1-bit and 2-bit comparators, it is possible to decrease the delay of all the comparators without significantly increasing the power consumption.
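The quoted PDP reductions can be reproduced directly from the 2-bit design values in Table 2:

```python
# PDP (power-delay product) values in fJ for the 2-bit designs, from Table 2
pdp = {
    'PTL': 3.636,
    'half-adder': 2.803,
    'modified TGL': 2.629,
    'proposed 10 T': 2.583,
}

def reduction_pct(baseline, proposed):
    """Percentage PDP reduction of the proposed design over a baseline."""
    return 100.0 * (baseline - proposed) / baseline

savings = {name: round(reduction_pct(v, pdp['proposed 10 T']), 1)
           for name, v in pdp.items() if name != 'proposed 10 T'}
# savings → {'PTL': 29.0, 'half-adder': 7.8, 'modified TGL': 1.7}
# (the text rounds the modified-TGL figure of about 1.7% up to 2%)
```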

4 Conclusions

The hybrid-CMOS design style gives the designer more freedom to select different modules in a circuit depending upon the application. In this paper, two energy-efficient novel 1-bit comparator circuits have been proposed, and the design has been extended to a 2-bit comparator. The simulation results show that the proposed 1-bit and 2-bit comparator circuits have better performance in terms of propagation delay, power dissipation, and PDP at a supply voltage of 1 V than most of the conventional


Table 2 Comparison of design metrics for proposed and alternative 2-bit magnitude comparators

Comparator logic structure          Transistor count   Area (µm²)   Propagation delay (ns)   Power consumption (µW)   Power delay product (fJ)
CMOS logic [2]                      54                 22.96        8.40                     0.62                     5.208
Transmission gate logic (TGL) [6]   74                 29.41        9.52                     0.94                     8.948
Pass transistor logic (PTL) [1]     40                 15.72        7.13                     0.51                     3.636
Hybrid logic [1]                    46                 17.53        7.64                     0.55                     4.202
Modified TGL [2]                    30                 13.40        6.26                     0.42                     2.629
Half-adder logic [2]                32                 13.68        6.52                     0.43                     2.803
Proposed 10 T logic                 30                 13.15        6.15                     0.42                     2.583

and existing comparator circuits, owing to the novel design modules proposed in this paper. The proposed 1-bit comparator circuit_2 occupies an area of about 3.82 µm² on a silicon chip, a reduction of 37%, 7.5%, and 4.7% in area when compared with the CMOS, PTL, and half-adder logic comparators. Among all the 2-bit comparator circuits, the proposed 2-bit comparator has the minimum PDP, reduced by 29%, 7.8%, and 2% relative to the PTL, half-adder logic, and modified TGL circuits. Therefore, the proposed 2-bit comparator remains one of the best contenders for designing high-speed CPUs and microprocessor circuits with low power consumption, a small silicon area, and reduced energy consumption.

Acknowledgments We are very grateful to the college authorities of KKR & KSR Institute of Technology and Sciences, Guntur for providing a Research & Development Laboratory to carry out the research work.

References

1. Sorwar, A., Rangon, M.M.T., Sojib, E.A., Chowdhury, M.S.A., Dipto, M.A.Z., Siddique, A.H.: Design of a high-performance 2-bit magnitude comparator using hybrid logic style. In: 11th ICCCNT 2020—IIT Kharagpur, pp. 1–6. IEEE Xplore, India (2020)
2. Mukherjee, D.N., Panda, S., Maji, B.: Performance evaluation of digital comparator using different logic styles. IETE J. Res. 64, 422–429 (2018)
3. Kang, S., Leblebici, Y.: CMOS Digital Integrated Circuits: Analysis and Design, 3rd edn., pp. 295–302. Tata McGraw-Hill, New Delhi (2003)
4. Sharma, A., Singh, R., Kajla, P.: Area efficient 1-bit comparator design by using hybridized full adder module based on PTL and GDI logic. Int. J. Comput. Appl. 82, 5–13 (2013)

Evaluation and Performance Analysis of Magnitude Comparator …

343

5. Kumar, D., Kumar, M.: Design of low power two-bit magnitude comparator using adiabatic logic. In: 2016 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), pp. 1–6. IEEE Xplore, Thailand (2017)
6. Anjuli, S.A.: Two-bit magnitude comparator design using different logic styles. Int. J. Eng. Sci. Invention 2, 13–24 (2013)
7. Aggarwal, M., Mehra, R.: Performance analysis of magnitude comparator using different design techniques. Int. J. Comput. Appl. 115, 12–15 (2015)
8. Sharma, A., Sharma, P.: Area and power efficient 4-bit comparator design by using 1-bit full adder module. In: 2014 International Conference on Parallel, Distributed and Grid Computing, pp. 1–6. IEEE Xplore, India (2015)

Artificially Ripened Mango Fruit Prediction System Using Convolutional Neural Network

V. Laxmi and R. Roopalakshmi

Abstract India is one of the chief producers and consumers of mangoes. Mango being a climacteric fruit, there is high demand during the season, because of which mangoes are often ripened artificially with ripening agents like calcium carbide. Artificial ripening agents pose serious health hazards to the consumer, so it is important to know whether a mango fruit is artificially or naturally ripened. Meanwhile, computer vision-based techniques are evolving for quality assurance, classification, and defect and disease detection of fruits. Hence, automatic detection of artificially ripened mango fruit helps consumers make quicker decisions than manual identification allows. The most successful deep learning model, the convolutional neural network (CNN), has made remarkable achievements in the identification, defect detection, and classification of fruits. This paper proposes a CNN-based artificially ripened mango fruit prediction system that uses binary cross entropy for loss reduction. The model classifies artificially ripened mango fruits with an increased rate of accuracy and better loss reduction, and shows good promise as an artificially ripened mango fruit prediction system.

1 Introduction

Mango is one of the most consumed seasonal fruits globally. Mango is usually harvested at an un-ripened stage and allowed to ripen naturally. Due to increasing customer demand, vendors globally make use of artificial ripening agents like calcium carbide to ripen the fruit. This causes many ill effects on the health of consumers. The ability to identify such artificially ripened mango fruit using

V. Laxmi (B)
Department of Information Science and Engineering, BNMIT, Bengaluru, Karnataka 560085, India
e-mail: [email protected]

R. Roopalakshmi
Department of Computer Science and Engineering, MIT, Manipal, Karnataka 576104, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_32

computer vision helps consumers avoid consuming such fruits and protects their health. To ensure the safety of consumers' health, artificially ripened mango fruit identification should be supported by automated image processing tools. At present, computer vision and image processing technology has a great advantage in making decisions at a faster rate. To develop an understanding of image classification, traditional machine learning classification techniques like AdaBoost, SVM, etc., have to be understood from the ground up. Most of these traditional classifier techniques use shallow structures to deal with complex classification problems. Therefore, the potential of deep learning techniques such as convolutional neural networks (CNNs) has to be highlighted over the traditional classifiers. CNNs [1] have become the most successful deep learning model in the computer vision industry. CNN models with pretraining and fine-tuning speed up performance [2]. The advantage of the CNN is its ultra-fast processing for precise prediction and decision-making. This paper first generates a database for Indian mango varieties like Alphonso, Badam, Mallika, and Neelam. Then, a new CNN-based artificially ripened mango fruit prediction system is presented. The proposed technique is a fine-tuned CNN model with binary cross entropy infused for loss reduction and a better prediction rate. This paper is organized as follows: Sect. 2 provides an overview of fruit and vegetable classification, quality assessment, defect and disease detection in fruits, and ripening stages of fruit. Section 3 presents the proposed CNN-based artificially ripened mango fruit prediction system infused with binary cross entropy. The performance and evaluation of the model are then presented in Sect. 4. Finally, concluding remarks and an outlook are given in Sect. 5.

2 Related Work

The application of image processing and computer vision has seen vast growth in the agricultural industry. A number of approaches have been proposed for fruit classification, quality assessment, defect detection, and grading problems. Most of the methods use color and shape features only. Anurekha and Sankaran propose classification and grading of mangoes using a genetic adaptive neuro-fuzzy inference system (GANFIS) [3]. The approach extracts features like shape, texture, and color from 2D mango images. A genetic algorithm is used for feature selection, and the method applies a fuzzy inference procedure to perform classification and grading. The method also evaluates multi-feature class similarity measures to perform classification. This approach achieved a sensitivity of 98.05%, a specificity of 97.39%, and an accuracy of 99.18%. Nevertheless, machine learning techniques might reduce the computational speed in comparison with this system.

Thinh et al. in [4] propose an artificial intelligence-based mango classification for quality assessment. The proposed system combines image processing and artificial intelligence for classifying mangoes. Quality assessment of mangoes

is based on color, volume, size, shape, and fruit density. CCD cameras are used to capture the images, and split-layer processing is done to determine mass and volume, which helps determine the maturity and sweetness of mangoes. The major drawback of the system is its use of CCD cameras to capture images.

Bhargava and Bansal in [5] present a review of various techniques suitable for fruit and vegetable quality inspection. The paper discusses various methods for image acquisition, preprocessing, segmentation, feature extraction, and classification. Under image acquisition, ultrasound, infrared, and tomographic image acquisition techniques are summarized. For pixel preprocessing, color space transformation (CST); hue, saturation, and intensity (HSI) color space; and monochromatic image preprocessing techniques are discussed, and some segmentation techniques are also covered. Among all the techniques discussed, RGB-based color extraction with deep learning methods for classification is missing.

Renuka Devi and Tamizharasan in [6] propose disease prediction in the pre- and post-harvest stages of mango cultivation. This method detects seven diseases found in the mango fruit, and the level of infection, using a hybrid multiclass neural SVM classifier with an RMBF feature extraction algorithm. The evaluation is done against classifiers like KNN, fuzzy logic, and Naive Bayes. This system gives a prediction accuracy of 98.4%, but the large computational storage space consumed by RMBF feature extraction is one of its downsides. Another statistical-feature- and SVM-based automatic classification of mangoes is proposed by Santi Kumari Behera.

Zhao et al. in [7] developed a simple machine vision system to detect tomatoes in a greenhouse. The detection works in two steps: (1) Haar-like features are extracted from gray-scale images and classified with an AdaBoost classifier; (2) an average pixel value (APV)-based color analysis approach is used to eliminate false negative classifications in the result. The proposed system gives a promising result of more than 96% in detecting ripe tomatoes in a real-world environment. The false negative rate is about 10%, and 3.5% of tomatoes are not spotted. A major weakness of the system is its use of Haar-like feature extraction techniques.

Mazen in [8] proposed an artificial neural network-based ripeness classification of bananas. To find the ripening stages of bananas, an automatic computer vision system is proposed. A four-class homemade database is prepared, to which an ANN-based framework is applied to extract features like color, brown spot development, and Tamura statistics to classify the banana fruit's ripening stage. The proposed technique is compared with SVM, Naive Bayes, KNN, and decision tree classifiers; an accuracy of 97.75% is achieved by the system. One limitation of the ANN is that it is computationally expensive.

Jogin et al. focus on feature extraction using CNN and deep learning in [9]. Existing image classification procedures are faster at run time and more accurate than ever before. Deep learning models are able to learn multiple levels of image representation when they are trained well. Convolutional neural networks can learn basic shapes in the first layer and then learn image features in subsequent layers, which leads to more precise image classification. The idea of the CNN is inspired by the hierarchical organization of neurons described by Hubel and Wiesel in 1962 through their study of the visual cortex of the cat. In the computer vision field, it was a vital discovery and

helped in understanding the working of the visual cortex in human beings and animals. In this paper, image features are extracted by means of a CNN using the concept of deep learning.

From the survey carried out above, one can conclude that the methods discussed concentrate on traditional computer vision-based machine learning techniques used to classify different fruits for quality assessment, defect detection, ripeness measurement, and disease detection. Most of the techniques discussed use traditional classifiers like SVM, AdaBoost, K-means, decision tree, and Naive Bayes, which need a particular file format (.csv) as input. This file must contain the features extracted for classification purposes. This is a major disadvantage compared with modern deep learning techniques like the ANN and CNN, which remove the burden of supplying a particular file format for classification. Further, a CNN not only extracts the features of a given image but also stores the weights of the extracted features in its neurons for training purposes, which is a great advantage for classification. Hence, this paper proposes a CNN-based artificially ripened mango fruit prediction system to achieve better processing speed and greater accuracy.

2.1 Motivation and Contribution

In the literature survey, many efforts are focused on quality assessment, defect detection, ripeness measurement, and disease detection of fruits and vegetables. Fewer efforts are directed toward classification/detection/prediction of artificially ripened fruits, especially mango. From another perspective, using deep learning techniques for color-, texture-, and shape-based feature extraction for a clear prediction of artificially ripened mango fruit helps a consumer make faster decisions. So, to contribute to this area, this paper proposes a CNN-based artificially ripened mango fruit prediction system. This paper also presents an algorithm for DFI construction and dominant color extraction, used as a feature extraction technique for the AdaBoost and SVM classifiers only, whereas feature extraction within the CNN is also described, and binary cross entropy for loss reduction is incorporated within the CNN in order to achieve greater accuracy for the prediction system.

3 Proposed CNN-based Artificially Ripened Mango Fruit Prediction System

Detecting artificially ripened fruits is one of the vital tasks in the computer vision research field. This is because consumption of artificially ripened fruits like mango causes health hazards such as diarrhea, vomiting, stomach aches, and nausea. The manual approach to identifying artificially ripened mango fruit is slow. Therefore, this article

proposes an artificially ripened mango fruit prediction system that facilitates decision-making at a much faster rate. To begin with, images are acquired and then preprocessed by performing cropping via a contour-masking operation so that the images are ready for feature extraction. Features based on raw RGB color values are extracted within the CNN; then decision-making and prediction of artificially ripened mango fruit are carried out.

3.1 Image Acquisition and Preprocessing

The proposed mango fruit prediction model needs a dataset consisting of mango fruit images, created for both training and testing purposes. In total, 80 mangoes belonging to four varieties, viz., Badam, Neelam, Alphonso, and Mallika, are considered. They are divided into two sets: one set is allowed to ripen naturally, while the other set is chemically ripened using a ripening agent (calcium carbide). After the ripening procedure is completed, the mango images are taken using a Canon PowerShot digital camera. After image acquisition and before training the CNN, the generated dataset goes through preprocessing. Firstly, cropping by contouring is done in order to extract the areas of interest. The smallest horizontal and vertical coordinates from the top left corner of the contour are indexed, and the image is cropped at these coordinates. The cropped image is then processed further for prediction. The contour-cropped and masked image of the preprocessing step is shown in Fig. 1.

Fig. 1 Sample output for contour cropping and masking
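The cropping step above can be sketched as follows. This is a minimal illustration using a binary foreground mask and a NumPy bounding-box crop rather than the authors' exact contour routine; the threshold value is chosen arbitrarily for the example.

```python
import numpy as np

def crop_to_contour(image, mask):
    """Crop `image` to the bounding box of the foreground `mask`.

    `mask` is a boolean array marking fruit pixels (e.g. obtained by
    thresholding or contour filling); the smallest row/column indices of
    the foreground give the top-left corner of the crop.
    """
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return image[r0:r1 + 1, c0:c1 + 1]

# Toy example: a 6x6 "image" with a 2x3 bright region.
img = np.zeros((6, 6), dtype=np.uint8)
img[2:4, 1:4] = 200
cropped = crop_to_contour(img, img > 128)
print(cropped.shape)  # (2, 3)
```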

Table 1 Primary dominant color values obtained from the proposed dominant color extraction algorithm

| Primary dominant color | Min | Mean | Max | Avg | SD |
|---|---|---|---|---|---|
| NRM | 66 | 183 | 248 | 0.24 | 0.06 |
| ARM | 123 | 222 | 255 | 0.18 | 0.04 |

3.2 Feature Extraction

To determine whether a given image is naturally or chemically ripened, feature extraction plays a vital role. For classification using the AdaBoost and SVM classifiers, dominant color values of features with respect to RGB are extracted with the help of MPEG-7 dominant color features [10]. To calculate color frequencies for RGB values, a 3D gamut of the image is built as given in [11, 12]. Along with this, a dominant frequency image (DFI) is produced from the pixel frequencies in the original image. DFI construction is mostly needed in order to find the dominant colors. The accuracy of value extraction from the DFI is improved so that the dominant colors are obtained; an improved procedure for DFI construction is used for dominant color extraction. For the primary dominant color, it is observed that naturally ripened mango (NRM) fruit varies in the range 140–205, compared with artificially ripened mango (ARM) ranging from 210 to 245. Secondary and tertiary dominant colors also differ significantly in value between naturally and artificially ripened images. Further, Table 1 indicates the primary dominant color variation of both naturally and artificially ripened fruits in terms of min, mean, max, avg, and SD. These extracted features are stored in a data file of CSV format and labeled accordingly to pass through the SVM and AdaBoost classifiers for classification.
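As an illustration of how the primary dominant color separates the two classes, the sketch below computes a channel's dominant value as the histogram mode and applies the NRM (140–205) and ARM (210–245) ranges reported above. The function names and the toy inputs are hypothetical; this is not the paper's DFI-based extraction procedure.

```python
import numpy as np

def dominant_value(channel):
    """Return the most frequent intensity (the histogram mode) of a channel."""
    hist = np.bincount(channel.ravel(), minlength=256)
    return int(np.argmax(hist))

def classify_by_dominant_color(channel):
    """Label a mango image channel using the reported dominant-color ranges:
    140-205 -> naturally ripened (NRM), 210-245 -> artificially ripened (ARM)."""
    v = dominant_value(channel)
    if 140 <= v <= 205:
        return "NRM"
    if 210 <= v <= 245:
        return "ARM"
    return "unknown"

# Toy channels whose dominant intensities fall in each reported range.
nrm_like = np.full((4, 4), 180, dtype=np.uint8)
arm_like = np.full((4, 4), 230, dtype=np.uint8)
print(classify_by_dominant_color(nrm_like))  # NRM
print(classify_by_dominant_color(arm_like))  # ARM
```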

3.3 Feature Extraction and Training for CNN-based Mango Fruit Prediction

A feature map for the data in machine learning is formed to solve the classification problem. This applies to every problem, each of which has its own unique set of data and tactics. Among deep learning models, a CNN can accept an input image, assign learnable weights and biases to the various aspects/objects present in the image, and differentiate between them. The preprocessing needed by ConvNets is minimal compared with other classification algorithms. In primitive methods, filters are hand-engineered; with adequate training, a CNN learns these filters/features itself. For CNN-based feature extraction, each image of dimension 780 × 540 is considered. For the 3 color channels R, G, and B, the raw pixel values of the image are held. The convolutional layer computes the local dot product of the weights and input, given by y = f(x*w), where x is the raw pixel value of the R channel and w is the weight. The resulting output dimensions depend upon the number of filters used and

Fig. 2 Block diagram of the CNN for the proposed system

the filter size of the subsequent layer. These extracted feature values are stored in the neurons of the hidden layers for training purposes. A differentiable function is used to transform every layer's input into its output, and only a few such distinct layer types are available. Normally, convolutions are slow compared with max pooling during forward and backward propagation, so for a deep network the training step takes a lot of time. Figure 2 shows the block diagram for training the layers of the CNN-based mango fruit prediction system. There is an input layer and four hidden layers, among which the activation layer is ReLU, which does not change the dimensions of the layers. To reduce the dimensions, a max pooling layer is used for down-sampling. The proposed system has a fixed filter size, and the (CNN–ReLU–max pooling) block is repeated ten times. Finally, the numbers of weights and neurons are gradually reduced in the fully connected layer. One output layer with two neurons represents the two-class prediction/classification for naturally ripened and artificially ripened mango, respectively. A CNN can effectively capture the spatial and temporal dependencies of the image. The proposed architecture performs better by reducing the number of entangled parameters in the image dataset. Reusability of weights is also supported in the proposed CNN architecture, which reduces the original image into a much simpler form for processing. In order to get a good prediction, the critical features have to be extracted without any feature loss. Class scores are computed in the fully connected layers. The binary cross entropy log probability is applied to the two output neurons so that class scores can be mapped. The mathematical model for binary cross entropy is shown in Fig. 3, where y is the label and the predicted probability of the point, p(y), is

Fig. 3 Binary cross entropy mathematical model

computed for all N points. Looking at the formula, each point with label y = 1 contributes log(p(y)) to the loss, and conversely, y = 0 contributes log(1 − p(y)). The rectified linear unit (ReLU) is the activation function used by the proposed system; its mathematical definition is y = max(0, x), where x is the input to a neuron. A hyperparameter named epoch decides the number of times the training algorithm works through the complete training dataset; the number of epochs decides how many times the weights of the network are updated. In this model, the epoch value is chosen to be ten, with a batch size of ten. The pooling layer is responsible for lowering the spatial size of the convolved features. This is done to reduce the dimensionality of the data and lower the computational load of processing it. The kernel keeps the maximum value from the portion of the image it covers, since max pooling is used by the proposed model. De-noising is performed to discard any noisy activations. The model is enabled successfully after the completion of the above-mentioned process. Finally, the model flattens the output, and the result is fed to the fully connected neural network for classification.
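The building blocks described above (binary cross entropy loss, ReLU activation, and 2 × 2 max pooling) can be sketched in NumPy as follows. This is an illustrative implementation of the standard definitions, not the authors' training code.

```python
import numpy as np

def binary_cross_entropy(y, p):
    """Mean BCE loss over labels y in {0, 1} and predicted probabilities p.
    Each y = 1 point contributes -log(p); each y = 0 point -log(1 - p)."""
    y, p = np.asarray(y, float), np.asarray(p, float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def relu(x):
    """Rectified linear unit: y = max(0, x)."""
    return np.maximum(0, x)

def max_pool_2x2(x):
    """2x2 max pooling: halves the spatial size by keeping each window's
    maximum value (the down-sampling step applied after ReLU)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feat = np.array([[1., -2., 3., 0.],
                 [-1., 5., -3., 2.],
                 [0., 1., 2., -4.],
                 [3., -1., 0., 6.]])
pooled = max_pool_2x2(relu(feat))          # block maxima of the ReLU output
print(pooled)
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```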

4 Results and Discussion

4.1 Experimental Setup

To assess the performance of the proposed mango fruit prediction model, a home dataset consisting of mango fruit images is produced for both training and testing purposes. Around 1100 fruit images are generated and labeled separately as NRM fruits and ARM fruits. The mango fruit prediction model is proposed to determine whether a given mango image is naturally ripened or artificially ripened. Compared with the AdaBoost and SVM classifiers, the proposed convolutional neural network (CNN)-based mango fruit prediction model provides better accuracy. Figure 4 illustrates a few sample photographs of mangoes from different capture angles. MPEG-7 dominant color features are considered for differentiating naturally and artificially ripened mango fruits in the AdaBoost and SVM classifiers. ARM fruits are labeled as class 1 fruits, and NRM fruits are labeled as class 2 fruits. Along with shape, the RGB values of the MPEG-7 dominant color features are used as a discriminating factor between class 1 and class 2 fruits. However, this analogy is excluded in the proposed system by considering CNN-based feature extraction for color and texture. To measure the performance of the proposed system, a binary (2 × 2) confusion matrix is considered. Figure 5 shows the confusion matrix for the proposed mango fruit prediction model using the CNN. The model achieves a 96.67% prediction rate. Compared with the AdaBoost and SVM classifier performance, the proposed CNN-based mango fruit prediction model delivers higher and better performance, as shown in Fig. 6.

Fig. 4 Sample snapshots of mango images (vertical/horizontal axis, top/side/front views)

Fig. 5 Confusion matrix for CNN-based mango fruit prediction system

Fig. 6 CNN-based mango fruit prediction system precision graph

Graphical results show that the accuracy of the AdaBoost and SVM classifiers is less than that of the proposed CNN-based mango fruit prediction model. From experimental observation, the model accuracy for the AdaBoost classifier is 86.44%; though the results of AdaBoost are satisfactory, its false positive (FP) prediction rate is high. Similarly, the model accuracy for the SVM classifier is 83.46%, but its FP rate is not convincing. Precision and cost are the two measures considered for the proposed model's prediction tests. Precision is defined as the ability of the prediction model to exactly predict an instance of a particular class from the datasets. On the other hand, cost is the inability of the prediction model to correctly predict an instance of a particular class from the datasets:

Precision = TP/(TP + FN)  (1)

and

Cost = TP/(TP + FP)  (2)

where TP, FP, and FN are the number of true positive, false positive, and false negative predictions for the considered class. The overall precision percentage for the CNN-based mango fruit prediction system is 96.67%, and the overall cost percentage is 2.29%. The misclassification percentage for the training dataset is found to be 0.25%, and for the testing dataset it is 0.32%; therefore, the overall misclassification is 0.28%. Figure 6 shows the proposed model's precision graph. Observations show that, for the training datasets, there is linear growth in precision, while variation is observed for the testing datasets. Figure 7 shows the proposed model's cost graph. Experimental observations show a linear decrease in the cost of the model as the epochs increase, for both the training and testing datasets.

To check the quality of the proposed system, around 400 NRM and 400 ARM images are considered for training purposes. Similarly, about 200 NRM and 250

Fig. 7 CNN-based mango fruit prediction system cost graph

ARM images are considered for testing purposes. It is observed that the proposed CNN-based mango prediction system shows a promising prediction rate for both class 1 and class 2.
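The two measures defined in Eqs. (1) and (2) can be computed directly from confusion-matrix counts, as sketched below. The counts used here are made-up illustration values, not the counts from the paper's confusion matrix.

```python
def precision_eq1(tp, fn):
    """Eq. (1) as defined in this paper: Precision = TP / (TP + FN)."""
    return tp / (tp + fn)

def cost_eq2(tp, fp):
    """Eq. (2) as defined in this paper: Cost = TP / (TP + FP)."""
    return tp / (tp + fp)

# Hypothetical confusion-matrix counts for one class.
tp, fp, fn = 29, 2, 1
print(f"precision = {precision_eq1(tp, fn):.4f}")  # 0.9667
print(f"cost = {cost_eq2(tp, fp):.4f}")
```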

5 Conclusion

A CNN-based fruit prediction model for detecting artificially ripened mangoes is discussed. The proposed model uses color, texture, and shape features, as described in Sect. 3.3, for detecting artificially ripened mangoes between the two mango classes. The proposed system's performance compared with traditional classification algorithms like AdaBoost and SVM is optimal. An overall accuracy of 96.67% is achieved for naturally ripened mangoes and 97.74% for artificially ripened mangoes. The proposed model is simple and has a highly efficient recognition rate. The classification speed is 72 s for 1100 mangoes, which makes it a profitable and productive deep learning and computer vision tool for consumers. Further, focus can be given to minimizing the training and testing run time for mango fruit classification by applying transfer learning or other deep learning methods to semi-supervised or supervised machine learning algorithms.

References

1. Lee, S., Chen, T., Yu, L., Lai, C.: Image classification based on the boost convolutional neural network. IEEE Access 6, 12755–12768 (2018)
2. Xin, M., Wang, Y.: Research on image classification model based on deep convolution neural network. J. Image Video Proc. 2019, 40 (2019). https://doi.org/10.1186/s13640-019-0417-8
3. Anurekha, D., Sankaran, R.A.: Efficient classification and grading of mangoes with GANFIS for improved performance. Multimed. Tools Appl. 79, 4169–4184 (2020)
4. Thinh, N.T., Duc Thong, N., Cong, H.T., Thanh Phong, N.T.: Mango classification system based on machine vision and artificial intelligence. In: 2019 7th International Conference on Control, Mechatronics and Automation (ICCMA), pp. 475–482 (2019). https://doi.org/10.1109/ICCMA46720.2019.8988603
5. Bhargava, A., Bansal, A.: Fruits and vegetables quality evaluation using computer vision: a review. J. King Saud Univ.—Comput. Inf. Sci. 33(3), 243–257 (2021). ISSN 1319-1578. https://doi.org/10.1016/j.jksuci.2018.06.002
6. Renuka Devi, M., Tamizharasan, A.: Prediction and classification of disease in mango fruit using hybrid multiclass neural SVM. JCR 7(10), 1770–1778 (2020). https://doi.org/10.31838/jcr.07.10.318
7. Zhao, Y., Gong, L., Zhou, B., Huang, Y., Liu, C.: Detecting tomatoes in greenhouse scenes by combining AdaBoost classifier and colour analysis. Biosyst. Eng. 148, 127–137 (2016). ISSN 1537-5110
8. Mazen, F.M.A., Nashat, A.A.: Ripeness classification of bananas using an artificial neural network. Arab. J. Sci. Eng. 44, 6901–6910 (2019). https://doi.org/10.1007/s13369-018-03695-5

9. Manjunath, J., Mohana, M., Madhulika, M., Divya, G., Meghana, R., Apoorva, S.: Feature extraction using convolution neural networks (CNN) and deep learning. pp. 2319–2323 (2018). https://doi.org/10.1109/RTEICT42901.2018.9012507
10. Laxmi, V., Roopalakshmi, R.: A novel framework for detection of chemically ripened mango fruits using dominant colour descriptors. Advances in Parallel Computing, vol. 37, pp. 18–27. Intelligent Systems and Computer Technology (2020). ISSN 0927-5452. https://doi.org/10.3233/APC200114
11. Mark Hayworth: Color Frequency Image. https://www.mathworks.com/matlabcentral/fileexchange/28164-color-frequency-image. MATLAB Central File Exchange (2019)
12. Laxmi, V., Roopalakshmi, R.: Detection of chemically matured mango fruits using Laplacian descriptors and scale determinants (November 21, 2020). In: Proceedings of the 2nd International Conference on IoT, Social, Mobile, Analytics and Cloud in Computational Vision and Bio-Engineering (ISMAC)

Quality and Dimensional Analysis of 3D Printed Models Using Machine Vision

S. H. Sarvesh, Kempanna Chougala, J. Sangeetha, and Umesh Barker

Abstract The main issue in the manufacturing and 3D printing industries is quality and dimensional analysis of the 3D printed models. Quality and dimensional analysis are manual at present, and mistakes in these steps affect product trust and quality. This research work creates a system that is able to detect defects in a 3D printed model. This work can aid several 3D printing industries by reducing the manual labor needed for inspection and for monitoring the process, which in turn reduces labor cost and time. The prime aspect of this research is to ensure the quality of 3D printed models. Quality assessment is done by checking for defects in shape and color and for chips, and in dimensional analysis, we check dimensional aspects such as the height and length of the 3D printed models. The incorporated methodology is tested on several 3D printed models.

1 Introduction

3D printing is a breakthrough technology with applications in various fields such as civil engineering and manufacturing industries. New uses of 3D printing are arising nearly continuously, and as this technology keeps penetrating more widely and deeply across industrial, maker, and consumer sectors, this is only set to increase. 3D printed models find their uses in practically every field. The advancement of 3D printing has seen a rapid growth in the number of organizations adopting the technology. The applications and use cases differ across businesses, but broadly include tooling aids, visual and even end-use parts, and functional prototypes. It takes many hours to print even a small 3D model, but sometimes, due to certain issues, the print will not be up to the mark, and hours of printing time are wasted. In order to prevent

S. H. Sarvesh (B) · K. Chougala · J. Sangeetha · U. Barker
Department of Computer Science and Engineering, Ramaiah Institute of Technology, Bengaluru, India

J. Sangeetha
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_33

this, we need to optimize the quality of each print: we should monitor the print, find any cracks or defects that occur during the process, alert the user regarding the issue, and also verify the printed models. People can discover such defects without prior knowledge of a defect-free sample. Defects are seen as inhomogeneities in regularity and orientation fields. Two different approaches are used: the first characterizes structural defects as regions of abruptly falling regularity; the second, as perturbations of the dominant orientation. Both approaches can be used in machine vision.

Machine vision is emerging as a most necessary technology in the production lines of all industries, for controlling product quality and for defect management. It can be used to detect defects in a wide range of products, from various mechanical products to civil construction works. One such research [1] developed an integrated living product model used in civil construction works for quality control. This system is capable of finding design-change defects and checking the thickness of walls, columns, beams, and flat slabs. As the defects are detected in the early phases of construction, it is cost effective. In research [1], the authors created an as-planned product model which includes pre-planned building-design and process data obtained from scheduling software and design systems. They collected data for the as-built model using two types of laser scanners; the as-built model describes the actual construction work going on. Finally, the defects are identified in the defect model, which is the difference between the as-planned and as-built models.

In another research [2], a machine vision system is proposed for assessing tool wear. It is a system to monitor cutting tools, including their shape and dimensions.
In [2], a fiber-optic light source is targeted at the area of interest, and a camera with 768 × 493-pixel resolution captures the cutting tool for assessment. The entire image analysis was executed on a 486 DX-66 microcomputer. Image noise was removed with a moving-window averaging technique, and texture operators were used to map pixels of the same texture to the same brightness, allowing the accurate separation of pixels required for tool wear assessment. The researchers in [3] applied machine vision and image processing techniques in agriculture and proposed a methodology to detect diseased leaves from digital images. The proposed system detects bacterial, viral, and fungal disease symptoms on leaves. A Markov random field (MRF) model is employed for color segmentation and is combined with an edge detection technique to identify edges accurately. The principal methods used to detect plant diseases are back-propagation networks (BPN), support vector machines (SVM), and K-means clustering. In [4], the authors built an approach to automate the quality control and inspection of construction models; the model classifies specifications as "applicable," "not applicable," or "possibly applicable" against the construction process. The roughness of machined surfaces can also be measured accurately with machine vision [5], using the scattered and specular components of a reflected beam: a smooth surface reflects a laser beam in the specular direction, while a rougher surface scatters the beam in all directions.
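The moving-window averaging used in [2] for noise removal can be sketched as a simple mean filter. The following pure-Python version is only an illustration (the window size and the list-of-lists image representation are our assumptions, not details from [2]):

```python
def mean_filter(img, k=3):
    """Smooth a grayscale image (list of rows) with a k x k moving-window average."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # collect the window, clipped at the image borders
            vals = [img[ny][nx]
                    for ny in range(max(0, y - r), min(h, y + r + 1))
                    for nx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(vals) / len(vals)  # the window mean suppresses isolated noise
    return out

# A single noisy spike is flattened toward its neighbourhood mean
noisy = [[10, 10, 10], [10, 100, 10], [10, 10, 10]]
smoothed = mean_filter(noisy)
```

After filtering, the spike of 100 at the centre drops to the window mean of 20, which is the behaviour the averaging step relies on.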

Quality and Dimensional Analysis of 3D Printed Models …


The authors in [6] conducted research on chili disease detection and proposed an approach based on captured leaf colors, the color of the leaf serving as an important factor in detecting plant disease. Images of leaves are captured using a digital camera and processed in MATLAB, while a graphical user interface built in LabVIEW provides access to the leaf disease detection system. After capturing the image, features such as color are extracted and used in color-matching applications such as color identification and image color segmentation. The approach proposed in [6] is a fast and efficient way of detecting a diseased chili leaf. The research in [7] introduced a four-wheeled robot to assess steel bridges. The robot uses the attractive force produced by magnets to travel freely on steel surfaces and can climb any surface of a steel bridge while carrying approximately 4 kg of instruments, relying on magnetic force alone without any extra power source. The data collected by the robot are transmitted over a wireless local area network to a laptop at ground level for real-time monitoring and further processing. In [8], a disease detection system was built that uses principal component analysis and neural networks to recognize images of diseased plants. Another research [9] developed a software application in the C programming language for image analysis, made up of two modules: (i) a module for system training and (ii) a module for image capturing and processing. The system takes less than 300 ms for image capturing and processing; with today's more advanced computers, this time could be reduced to below 50 ms. Wave reflection techniques can be used to detect cracks on a surface [10], since the reflection from cracked areas is less intense than that from plain areas.

The researchers in [10] proposed a hyperspectral reflectance imaging technique to detect bruises on "Jonagold" apples. Neural networks are efficient and well suited for recognition and classification because the input image does not have to be transformed into another representation space [11]. The research in [12–14] proposed machine vision models to find fractures in the lower bridge surface automatically from captured images. The system is composed of a digital camera and a video recorder; the robot is placed below the bridge surface and moves automatically along the length of the bridge for continuous image capturing and crack assessment. In [15], the authors worked on a method to assess the quality of ceramic tiles and cracks in concrete. The quality analysis includes detecting long cracks, color defects, and holes in factory-made ceramic tiles; images that contain only defects make the detection process easier through image processing principles and morphological operations [15].

This research work aims to provide efficient and consistent defect detection techniques for the 3D printing industry, specifically for 3D models. Our 3D models were printed using a 3D printer and contain square, pentagon, and circle shapes. A square or pentagon shape is expected from each printed model; any other shape is detected as an error model, and the circle-shaped model was printed for testing purposes. We trained the model using 4500 images of 3D printed models, with the pixel values of each image stored in an Excel file. Our 3D models were printed from raw plastic material, and during printing, the models must receive proper cooling. Sometimes, due to surrounding environmental conditions, the models do not cool properly, which leads to edge retention and deformation; hence, it is important to detect such chip defects. In rare situations, the 3D printer's color-ink nozzle misplaces color on the model because of nozzle leakage, so it is also important to detect color defects. In this research, we consider a red 3D model as the error-free model; models of any other color are detected as error models. A 3D printer prints models layer by layer by extruding the raw plastic material, and printer malfunctions can cause the material to overflow or underflow, producing dimension defects. Hence, it is also important to detect dimension defects; here, we consider a model 38 mm in length and 26.47 mm in height as the error-free model. Our techniques detect shape, chip, color, and dimension defects in 3D printed models with the help of morphological operations, discussed in detail in Sects. 2.4.1, 2.4.2, 2.4.3, and 2.4.4, respectively. This paper is organized as follows: Sect. 2 presents the adopted methodology of image acquisition and capturing, edge detection, morphological operations, and detection algorithms for 3D printed models. Section 3 discusses the 3D printed models, hardware setup, software tools used in the system, and experimental results. Finally, Sect. 4 presents the conclusions of our study as well as the advantages and impact of this work on the 3D printing industry.

2 Methodology

The defects that can occur in 3D printed models and that we detect using machine vision are shape defects, chip defects, color defects, and dimension defects. To detect them, we adopt a procedure consisting of image acquisition and capturing, edge detection, and morphological operations, followed by the detection algorithms for 3D printed models.

2.1 Image Acquisition and Capturing

In this research work, we use images suitable for the human visual scene to identify objects, processed with morphological techniques. Initially, we select images with surface defects in them; to simplify the process, object characteristics are inferred from their visual qualities. The images of the 3D printed model are captured using the Pi camera module, converted into gray scale, and then passed to a sorting stage. Using this sorting technique, we can detect various types of faults in the 3D printed model through different algorithms; rather than using the regular algorithms as-is, we fine-tune them to detect particular faults through their properties. Here, we use techniques such as noise reduction, image smoothing, histogram equalization, and intensity adjustment.
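The histogram equalization step mentioned above can be sketched as follows. This is a minimal, generic implementation of the standard CDF-remapping technique, not the authors' exact code; the flat-list image representation is our assumption:

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of grayscale values (0..levels-1)."""
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # cumulative distribution function of the gray levels
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    # remap each gray level so the output histogram is approximately flat
    table = [round((cdf[v] - cdf_min) / (n - cdf_min) * (levels - 1)) if n > cdf_min else v
             for v in range(levels)]
    return [table[p] for p in pixels]

# Low-contrast pixels clustered around 100..103 get spread across the full range
flat = equalize_histogram([100, 101, 102, 103])
```

Spreading a narrow band of intensities over the full 0 to 255 range is what makes faint surface detail visible to the later defect-detection stages.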

2.2 Edge Detection

Edge detection is a vital part of this process: we identify objects through their edges, which can be done by examining the object's surface. The effects of light and shadow must be taken into account, since they cause the pixel values to vary. There are two stages in identifying the edges of 3D printed models:

1. Finding the exact pixels belonging to edges in the image, which is done by detecting uneven gradients.
2. Gathering the edge points, which provides the paths along which the edge lines and curves run.

To improve the detection process, thresholding needs to be handled carefully, as it enhances the image segmentation and thereby keeps the focus on the prime object in the image, the 3D printed model. Each object has its own contrast level, which must be fine-tuned when recognizing the image.
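The two stages above, finding pixels with uneven gradients and then thresholding them into edge points, can be sketched with a Sobel operator. This is an illustrative stand-in (the paper does not name the specific gradient operator used); the threshold value is our assumption:

```python
def sobel_edges(img, thresh=100):
    """Mark pixels where the Sobel gradient magnitude exceeds thresh."""
    h, w = len(img), len(img[0])
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(gx_k[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            gy = sum(gy_k[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            # stage 1: uneven gradients reveal edge pixels;
            # stage 2 would then link them into contour paths
            if (gx * gx + gy * gy) ** 0.5 > thresh:
                edges[y][x] = 1
    return edges

# A vertical step from dark (0) to bright (255) produces a column of edge pixels
img = [[0, 0, 255, 255]] * 4
edges = sobel_edges(img)
```

In a real pipeline the threshold would be tuned per object, as the text notes, since each model has its own contrast level.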

2.3 Morphological Operations

Morphological operations are techniques for deriving binary images from gray scale images irrespective of the objects' shapes. A gray scale or binary image is given as input, and in the output image, each pixel value is computed from the corresponding input pixel and its neighborhood, which preserves the shape properly. Applying thresholding to the gray scale image yields a binary image; applying the morphology algorithm to this binary image visibly improves the segmentation, and by thresholding against the background, the defects can be obtained by comparison. The images are then sorted by size and shape. In this research work, machine vision is used specifically to identify defects. The accuracy of the imaging framework is tied to the local roughness in contrast induced by deformation, independently of color contrast; surface defect and color defect detection thus rely on two vital physical properties that do not depend on the type of material used. The parts of the 3D printed models analyzed may be plain or textured, and to improve the result, a dilation operation should be incorporated.
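The dilation operation recommended above can be sketched on a binary mask. This is a textbook 3 × 3 binary dilation, shown here for illustration rather than as the authors' implementation:

```python
def dilate(mask, iterations=1):
    """Binary dilation with a 3 x 3 structuring element: a pixel becomes 1
    if any pixel in its neighbourhood is 1 (fills small gaps, thickens shapes)."""
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        out = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                out[y][x] = int(any(
                    mask[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))))
        mask = out
    return mask

# A single foreground pixel grows into a 3 x 3 block after one dilation
mask = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
grown = dilate(mask)
```

Erosion is the dual operation (replace `any` with `all`); dilation followed by erosion is the closing used to fill small gaps along an object's outline.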


2.4 Detection Algorithms for 3D Printed Models

This section presents the procedures created for recognizing the diverse range of 3D printed model defects. Figure 1 depicts the primary process that is carried out first: the picture of the 3D printed model is fed as input, its intensity is adjusted, its histogram is equalized, and the result is returned as output. The output of the primary process is fed as input to the next stage, where the images undergo separate processing depending on which defect is being detected. The result of the primary process, with histogram and intensity fixed, is thus taken as input for recognizing the various deformations and proceeds through several further stages to give the final resultant image.

2.4.1 Shape Defect Detection Algorithm

Fig. 1 Primary process of the defect detection algorithm

Fig. 2 Shape defect detection algorithm

Fig. 3 Chip defect detection algorithm

Figure 2 shows the shape defect detection algorithm. The output image from the primary process is converted to gray scale in the image converter; then, in the edge detection step, we concentrate on the edges of the 3D printed model, and a contour is applied around it. Fill gaps are used to discriminate defective pixels: if the pixel formation is uneven or the shape does not appear to have a proper edge, this is flagged. The morphology operation separates the uneven pixels; further along the process, noise reduction removes attention from unwanted pixels, and smoothing the object yields a clean, clear image of the 3D printed model containing the defect.
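One simple way to separate the square, pentagon, and circle outlines handled by this algorithm is the circularity measure 4πA/P² of the extracted contour. This is our illustrative choice, not necessarily the criterion the authors used: a circle scores 1.0, a regular pentagon about 0.865, and a square about 0.785, and the thresholds below are assumptions:

```python
import math

def classify_shape(contour):
    """Classify a closed contour (list of (x, y) vertices) by circularity 4*pi*A / P^2."""
    n = len(contour)
    # shoelace formula for the enclosed area
    area = abs(sum(contour[i][0] * contour[(i + 1) % n][1]
                   - contour[(i + 1) % n][0] * contour[i][1]
                   for i in range(n))) / 2
    perimeter = sum(math.dist(contour[i], contour[(i + 1) % n]) for i in range(n))
    c = 4 * math.pi * area / perimeter ** 2   # 1.0 for a circle, lower for polygons
    if c >= 0.93:
        return "circle"      # printed only as a test model in this work
    if c >= 0.82:
        return "pentagon"
    if c >= 0.74:
        return "square"
    return "defect"          # far below any expected regular outline

def regular_polygon(n, r=1.0):
    """Vertices of a regular n-gon, standing in for an extracted contour."""
    return [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
            for k in range(n)]
```

For example, `classify_shape(regular_polygon(4))` returns `"square"` (c ≈ 0.785), while a 64-gon approximating a circular contour returns `"circle"`; a ragged, chipped outline has a long perimeter relative to its area and falls into the defect band.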

2.4.2 Chip Defect Detection Algorithm

As we can see in Fig. 3, in the chip defect detection algorithm, the output image from the primary process is given to edge detection, in which a contour is applied around the 3D printed model. Fill gaps are then applied to discriminate the defective pixels, and the morphology operations separate the uneven pixels and check whether the inner shape lacks a proper edge. Later in the process, noise reduction is applied to remove unwanted pixels, and smoothing of the object yields a clear picture of the defect.

2.4.3 Color Defect Detection Algorithm

In the color defect detection algorithm, morphological operations are applied to the input image from the primary process, after which the image passes to the scene change detection (SCD) morphology step. Each color has its own RGB (red, green, and blue) value that can be detected. We preset the value so that only red is accepted, and the system raises an error if the hue-saturation-value (HSV) reading is out of bounds, i.e., any color that does not match the preset red value is treated as a defect. The image then goes through a noise reduction process to remove unwanted pixels and obtain a clear image; the algorithm is shown in Fig. 4.
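The preset-red HSV check can be sketched with the standard library's `colorsys` module. The hue tolerance, saturation/value floors, and the red-fraction threshold below are our illustrative assumptions, not values from the paper:

```python
import colorsys

def is_red(rgb, sat_min=0.5, val_min=0.3, hue_tol=15 / 360):
    """Return True if an (R, G, B) pixel (0-255) falls in the preset red HSV band."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))
    # red wraps around the hue circle, so accept hues near 0 or near 1
    return (h <= hue_tol or h >= 1 - hue_tol) and s >= sat_min and v >= val_min

def color_defect(pixels, min_red_fraction=0.9):
    """Flag a model region as a colour defect when too few pixels match the preset red."""
    red = sum(is_red(p) for p in pixels)
    return red / len(pixels) < min_red_fraction
```

A region of mostly red pixels such as `[(200, 10, 10)] * 10` passes, while the blue model of Fig. 18, roughly `[(10, 10, 200)] * 10`, is out of the preset hue band and is flagged as a defect.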


Fig. 4 Color defect detection system

Fig. 5 Dimension defect detection algorithm

2.4.4 Dimension Defect Detection Algorithm

Figure 5 shows the implementation of the dimension defect detection algorithm, in which the input image is compared with preset data; after the comparison, utilizing morphology, we can isolate the differences in dimension and recognize variations from the preset value. The image from the primary process is first given to an inverse function, used to make the image clearer so that the edges are refined enough for their dimensions to be calculated. It then goes to the morphology operation to separate uneven pixels; noise reduction is applied to remove unwanted pixels, followed by fill gaps to discriminate the defective pixels, and we obtain the final image annotated with its dimensions.
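The comparison against the preset dimensions can be sketched as follows. The preset length and height are the values given in the paper (38 mm × 26.47 mm); the mm-per-pixel scale and the tolerance are illustrative assumptions:

```python
# Preset dimensions of the error-free reference model (values from the paper);
# the mm-per-pixel scale and tolerance below are illustrative assumptions.
PRESET_LENGTH_MM = 38.0
PRESET_HEIGHT_MM = 26.47

def check_dimensions(length_px, height_px, mm_per_px=0.1, tol_mm=0.5):
    """Convert a measured bounding box to millimetres and compare with the preset."""
    length_mm = length_px * mm_per_px
    height_mm = height_px * mm_per_px
    errors = []
    if abs(length_mm - PRESET_LENGTH_MM) > tol_mm:
        errors.append(f"length off by {length_mm - PRESET_LENGTH_MM:+.2f} mm")
    if abs(height_mm - PRESET_HEIGHT_MM) > tol_mm:
        errors.append(f"height off by {height_mm - PRESET_HEIGHT_MM:+.2f} mm")
    return errors  # an empty list means the model is within tolerance

# 380 x 264.7 px at 0.1 mm/px matches the preset; a model 8 mm short in length does not
ok = check_dimensions(380, 264.7)
short = check_dimensions(300, 264.7)
```

The second call mirrors the experiment in Sect. 3.4, where a model offset by 8 mm from the 38 mm preset length is reported as a length error.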

3 Experimental Results

In this research work, we discuss how to detect the defects in 3D printed models. The hardware and software environments used to execute this work are described below. Figure 6 shows the hardware setup, and Table 1 describes the hardware components; Fig. 6a, b are the top and side views of the setup. In Fig. 6, item 1 is the 3D printed model, 2 is the Raspberry Pi 4, 3 is the Pi camera, and 4 is the LED lights. The software tools used for this work are:

1. PuTTY, used for establishing communication between the Raspberry Pi and the computer over a local area network (LAN).
2. VNC Viewer, used to obtain the graphical interface of the Raspberry Pi on which the operating system (OS) is installed.

Fig. 6 Hardware setup of the research work: (a) top view image; (b) side view image

Table 1 Hardware setup description

Sl No. | Components       | Description
1      | 3D printed model | The object used for inspection and quality control
2      | Raspberry Pi 4   | The microprocessor board used for the research work
3      | Pi camera        | Used to capture the image in real time
4      | LED lights       | Used to increase the illumination on the 3D printed model

As shown in Fig. 7, we considered a cube-shaped 3D printed model with dimensions of 38 mm in length and 26.47 mm in height. The main aim of this work is to detect the defects in the cube-shaped 3D printed models shown in Fig. 8. The types of defects identified are discussed below.

Fig. 7 3D printed model


Fig. 8 3D printed models considered in our work

3.1 Shape Defect Detection

In shape defect detection, we check whether there is any defect in the outer shape of the 3D printed model. The outer shapes of the considered models are circle, square, and pentagon. Our area of interest is only the square or pentagon: a square- or pentagon-shaped model is declared error-free, while a circle-shaped or any other model input during the process is declared an error model. The input 3D printed model with a square shape is shown in Fig. 9, and the algorithm should correctly detect its outer shape as a square. In Fig. 10, we can see that the algorithm correctly detects the outer shape of the 3D printed model as a square; hence, it is an error-free model.

Fig. 9 Input of the 3D printed model without shape defect

Fig. 10 System detected input image

Fig. 11 Input for the shape defect

Similarly, as shown in Fig. 11, when a circle or any other shape (i.e., other than square or pentagon) is input to the system, the system detects an error in the shape, as shown in Fig. 12. Thus, in this research work, we were able to identify defects in the shape of the 3D printed model: whichever shape is our area of interest, the same shape should be encountered throughout the process, and if some other shape appears, the model is detected as defective.

Fig. 12 Output from the system after detecting shape defect

3.2 Chip Defect Detection

In chip defect detection, we check whether the 3D printed model has any edge retention, deformation, or torn-off edges; if so, we declare the model an error model, and otherwise an error-free model. We applied this algorithm to a number of 3D printed models. Our area of interest is a well-formed model with proper edges on the outer shape. The input 3D printed model without a chip defect is shown in Fig. 13, a square model with proper edges, and the algorithm should correctly detect its outer shape as a square. In Fig. 14, we can see that the algorithm correctly detects the outer shape of the 3D printed model as a square; hence, it is an error-free model. Similarly, as shown in Fig. 15, when a 3D printed model with a torn shape (i.e., without proper edges) is input to the system, the system detects an error in the shape, as shown in Fig. 16. Thus, in this research work, we were able to identify chip defects in 3D printed models: whichever shape is our area of interest, the same shape should be encountered throughout the process, and if a printing error or some other interruption produces a model with chip defects, the model is detected as defective.

Fig. 13 Input 3D printed model without chip defect

Fig. 14 Output from the system without chip defect

Fig. 15 Image of 3D printed model with chip defect

Fig. 16 Output from the system after detecting chip defect


3.3 Color Defect Detection

In color defect detection, the system tries to find any irregularities in the color of the 3D printed model. Our area of interest is the red color: in the 3D printing process, the entire model is printed with the same filament, and if, due to some error, it happens to be printed using a filament of another color, the system should report an error. We therefore use this algorithm to detect whether any input 3D printed model has a color defect. Figure 17 shows the system detecting the red color of a 3D printed model and applying a contour around it. In Fig. 18, when the system encounters a different color, such as a blue 3D printed model, it does not detect the model, since the system is trained on red only, and no contour is drawn around the model; hence, it reports an error in the color. Thus, in this research work, we were able to find color defects in 3D printed models: whichever color is our area of interest, the same color should be encountered throughout the process, and if some other color appears, the system detects the model as defective.

Fig. 17 The output from the system detecting a red 3D printed model

Fig. 18 The output from the system showing an error in color

3.4 Dimension Defect Detection

Dimension defect detection checks the 3D printed model for the proper height and length. Our area of interest is the 3D printed model with the exact dimensions of 38 mm in length and 26.47 mm in height. If there is a system error, the machine will print a compressed model with length or height defects, or the filament may run out before the printing process is complete; in either case, we do not get a model with the correct dimensions, so there is a defect. In Fig. 19, we can see the actual dimensions of the 3D printed model measured with a ruler; the system detects the image, measures its dimensions, and finds that they are all within the specified parameters: the height and width of the 3D printed model are intact and within the specified limits, as shown in Fig. 20. We then use the dimension defect detection algorithm to check for a dimension defect, such as a defect in height or length. Figure 21 shows the dimensions measured with vernier calipers; we can observe that the length of this 3D printed model is offset by 8 mm from the preset value (i.e., 38 mm). Figure 22 depicts the output obtained from the system, stating an error in the length of the 3D printed model detected by the dimension defect detection algorithm. Thus, in this research work, we were able to identify defects in dimension, as in Fig. 22, where the system shows an error in length. Our work can be applied to verifying the modular design of 3D printed models; it can also be applied in the brick manufacturing industry, conveyor inspection modules, and construction material manufacturing units, serving as smart vision for machine inspection.

Fig. 19 Input of the 3D printed model without dimension defect

Fig. 20 Output from the system showing no defect in dimension

Fig. 21 Input for dimension defect measured with vernier calipers

Fig. 22 Output of dimension defect from the system

4 Conclusion

The focus of this research work is detecting defects in 3D printed models using machine vision techniques such as image processing and morphological operations. With this work, we can build a better framework for resolving quality issues in the 3D printing industry by automating the defect detection system. We cannot rely on human beings to monitor the 3D printing process at all times, and it is also difficult for humans to sort defects with precision, as achieving that accuracy takes great effort; we therefore need an autonomous system that does the job for us. Consistent quality cannot be assured manually over time, which can result in large numbers of low-quality products. With the help of this work, we can successfully distinguish the type of defect present in a 3D printed model using the proposed defect detection techniques. This robotized sorting framework has the great advantage of bringing economic benefits to a company: it increases plant effectiveness and minimizes other expenses with only a slight change to the process. Using machine vision, we are able to identify all the considered defects, namely shape, color, chip, and dimension defects of the 3D printed models, and these techniques enable the creation of an automatic defect detection system that speeds up the working pipeline. In future work, an IR blaster could be implemented to check whether there is internal damage that can affect the 3D printed part.

References

1. Akinci, B., Boukamp, F., Gordon, C., Huber, D., Lyons, C., Park, K.: A formalism for utilization of sensor systems and integrated project models for active construction quality control. Autom. Constr. 15(2), 124–138 (2006)
2. Kurada, S., Bradley, C.: A machine vision system for tool wear assessment. Tribol. Int. 30(4), 295–304 (1997)
3. Gavhale, K.R., Gawande, U.: An overview of the research on plant leaves disease detection using image processing techniques. IOSR J. Comput. Eng. (IOSR-JCE) 16(1), 10–16 (2014)
4. Boukamp, F., Akinci, B.: Automated processing of construction specifications to support inspection and quality control. Autom. Constr. 17(1), 90–106 (2007)
5. Dhanasekar, B., Mohan, N.K., Bhaduri, B., Ramamoorthy, B.: Evaluation of surface roughness based on monochromatic speckle correlation using image processing. Precis. Eng. 32(3), 196–206 (2008)
6. Husin, Z.B., Shakaff, A.Y.B.M., Aziz, A.H.B.A., Farook, R.B.S.M.: Feasibility study on plant chili disease detection using image processing techniques. In: 2012 Third International Conference on Intelligent Systems Modelling and Simulation, pp. 291–296. IEEE (2012)
7. Pham, N.H., La, H.M.: Design and implementation of an autonomous robot for steel bridge inspection. In: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 556–562. IEEE (2016)
8. Wang, H., Li, G., Ma, Z., Li, X.: Image recognition of plant diseases based on principal component analysis and neural networks. In: 2012 8th International Conference on Natural Computation, pp. 246–251. IEEE (2012)
9. Blasco, J., Aleixos, N., Moltó, E.: Machine vision system for automatic quality grading of fruit. Biosyst. Eng. 85(4), 415–423 (2003)
10. Xing, J., De Baerdemaeker, J.: Bruise detection on 'Jonagold' apples using hyperspectral imaging. Postharvest Biol. Technol. 37(2), 152–162 (2005)
11. Fleyeh, H., Dougherty, M.: Road and traffic sign detection and recognition. In: Proceedings of the 16th Mini-EURO Conference and 10th Meeting of EWGT, pp. 644–653 (2005)
12. Oh, J.-K., Jang, G., Oh, S., Lee, J.H., Yi, B.-J., Moon, Y.S., Lee, J.S., Choi, Y.: Bridge inspection robot system with machine vision. Autom. Constr. 18(7), 929–941 (2009)
13. Wu, M., Phoha, V.V., Moon, Y.B., Belman, A.K.: Detecting malicious defects in 3D printing process using machine learning and image classification. In: ASME International Mechanical Engineering Congress and Exposition, vol. 50688, p. V014T07A004. American Society of Mechanical Engineers (2016)
14. Lim, R.S., La, H.M., Shan, Z., Sheng, W.: Developing a crack inspection robot for bridge maintenance. In: 2011 IEEE International Conference on Robotics and Automation, pp. 6288–6293. IEEE (2011)
15. Paraskevoudis, K., Karayannis, P., Koumoulos, E.P.: Real-time 3D printing remote defect detection (stringing) with computer vision and artificial intelligence. Processes 8(11), 1464 (2020)

Traffic Density Determination Using Advanced Computer Vision Narayana Darapaneni, M. S. Narayanan, Shakeeb Ansar, Ganesh Ravindran, Kumar Sanjeev, Abhijeet Kharade, and Anwesh Reddy Paduri

Abstract Estimation of traffic density is a fundamental component of an automated traffic evaluation and monitoring system. Its estimation can be used in several traffic monitoring and management applications, from identification of congested traffic flow to macroscopic traffic management in any environment. In this paper, we evaluated MobileNet-SSD and faster R-CNN for estimating parameterized traffic density in any given traffic image. SSD is capable of handling multiple shapes, sizes, and view angles of objects; MobileNet-SSD, an optimized and faster algorithm for mobile devices, is a model cross-trained from the SSD to the MobileNet architecture. Faster R-CNN adds a region proposal network (RPN) to generate region proposals instead of the selective search algorithm, which adds significant processing time to R-CNN-based models. The RPN uses anchor boxes for object detection; producing region proposals within the network is faster and better optimized to the data. In this study, we pick a specific use case for the application of the MobileNet-SSD and faster R-CNN frameworks. The advantages and shortcomings of MobileNet-SSD and faster R-CNN were analyzed using feeds from multiple traffic cameras and the publicly available labeled MIO-TCD dataset. Multiple faster R-CNN models were evaluated on both TensorFlow 1.x and TensorFlow 2.x. Based on the evaluation, and as a compromise between speed and accuracy, faster R-CNN Inception V2 trained on the COCO dataset was finally chosen as the backbone and trained on a combination of the MIO-TCD localization dataset and a manually labeled dataset captured from live camera feeds. The model uses TensorFlow 1.x as its framework.

N. Darapaneni Northwestern University, Great Learning, Evanston, USA M. S. Narayanan · S. Ansar · G. Ravindran · K. Sanjeev · A. Kharade · A. R. Paduri (B) Great Learning, Bangalore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_34


1 Introduction

With rapid urbanization and higher spending power, traffic congestion is becoming a common but serious problem, and the lines between peak-hour and non-peak-hour traffic are being blurred progressively [1–3]. Congestion in urban areas is a major problem for both developed and developing countries [4]. Experts feel the eventual solution may be an autonomous, driverless driving system [5], but we already have a basic infrastructure in place. Taking this as a constraint, how can we improve traffic flow and reduce congestion?

1. Build new infrastructure.
2. Optimize current infrastructure by leveraging innovative technologies using IoT and AI.
3. A mix of both, given the environmental dependencies related to urbanization and technology reach.

The focus of this study is not on optimizing traffic flow [6–8] but on estimating the density of traffic, the fundamental parameter on which algorithms to optimize and manage traffic flow may be built [9–12]. In this study, we used live traffic feeds from three traffic cameras across different geographies. The algorithm is trained on images augmented with blurred and lower-resolution versions, for better generalization and for predicting on lower-quality images. MobileNet-SSD [13, 14] and faster R-CNN were evaluated in this study to compute vehicle density; both algorithms have been optimized and industry-validated for object detection with very high accuracy [15–17]. The current use of AI in traffic monitoring is primarily restricted to monitoring traffic violations and incidents, a mostly reactive process [18–20]. Hence, there is significant scope for researching traffic flow and management solutions, as the use of object detection combined with real-time optimization algorithms is currently very limited [21]. This study builds upon prior studies using traffic camera feeds, extracting frames to count vehicles on the road and arrive at a traffic density score subject to the various conditions of the drivable surfaces, which is subsequently assigned a qualitative traffic density based on defined thresholds [22].

Traffic Density Determination Using Advanced Computer Vision

2 Assumptions and Limitations
The following assumptions were made.
1. Only four types of vehicles were considered:
   a. Cars
   b. Buses
   c. Trucks (includes small and large trucks)
   d. Bikes (includes scooters, motorcycles, and bicycles).
2. Rain and fog conditions were not considered. However, day and night conditions were considered for training purposes.
3. Traffic density is a subjective score. Heavy traffic at a specific intersection may be light or medium traffic at a different intersection. Hence, a given region of interest is also associated with traffic threshold values.
4. Traffic density may be applied in the context of a flow or occupancy measure of vehicles against drivable surfaces [13]. However, we have scoped this project to subjective analysis of static vehicles measured against defined traffic thresholds.
5. This study focuses on estimating traffic density characteristics against defined thresholds by extracting frames and applying faster R-CNN techniques. The aim is to calculate traffic density by leveraging existing traffic-cam videos for estimating macroscopic traffic from live camera feeds in combination with publicly available vehicle classification datasets.
6. Model training combines live traffic-cam feeds and publicly available traffic datasets. The classes in scope of this project are cars, trucks (of any size), buses, and bikes (motorcycles, scooters, and bicycles). Any other object is treated as a background class.

3 Methodology

3.1 Single Shot Detection (SSD)
The single shot detector, or SSD as it is normally referred to, takes one shot to detect multiple objects in a given image. Its inference times are fast enough for real-time video rendering with prediction, and its prediction accuracy is fairly good. SSD divides the image into multiple grids, with each grid cell detecting objects within it. Each grid cell in an SSD has fixed-size anchor boxes, or priors, to address various object sizes [23]. Computationally, it is orders of magnitude less expensive than a sliding-window approach. Figure 1 illustrates the SSD architecture. VGG-16 is the core network, including the fully connected layers. VGG-16 is commonly used for transfer learning owing to its high-performance image classification with good accuracy [14]. A set of new convolutional layers is then added to the architecture—these are the layers that make the SSD framework possible [24]. By adding these layers, we get a fully convolutional network with no restriction on image size. What makes SSD work?
1. As the network goes deeper, we keep reducing the volume size, as in a typical CNN.
2. Every convolutional layer is connected to the final (detection) layer. This helps localize object detection at various scales in a single forward pass.


N. Darapaneni et al.

Fig. 1 The model architecture of SSD

3. SSD utilizes fixed-size bounding boxes, or priors, based on the actual images.
The above makes the network fully feedforward, giving SSDs their speed and efficiency.
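The grid-and-priors idea above can be illustrated with a short sketch. The scales and aspect ratios below are hypothetical placeholders, not the exact SSD configuration, and a real SSD generates priors over several feature maps:

```python
from itertools import product

def default_boxes(grid_size, scale, aspect_ratios):
    """Generate center-form default (prior) boxes for one feature map.

    Each cell of a grid_size x grid_size feature map gets one box per
    aspect ratio, centered on the cell. Coordinates are normalized to
    [0, 1]. Illustrative only: real SSD adds extra scales per location.
    """
    boxes = []
    for i, j in product(range(grid_size), repeat=2):
        cx = (j + 0.5) / grid_size  # box center, normalized
        cy = (i + 0.5) / grid_size
        for ar in aspect_ratios:
            w = scale * ar ** 0.5   # wider boxes for larger aspect ratios
            h = scale / ar ** 0.5
            boxes.append((cx, cy, w, h))
    return boxes

priors = default_boxes(grid_size=4, scale=0.2, aspect_ratios=[1.0, 2.0, 0.5])
print(len(priors))  # 4 x 4 cells x 3 ratios = 48
```

The detection head then predicts, for every prior, class scores plus small offsets to that prior's center and size, which is what makes the whole pass feedforward.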

3.2 Faster R-CNN
Faster R-CNN, unlike the single-shot SSD, is a two-stage target detection algorithm consisting of:
1. A region proposal network (RPN) that shares the image's convolutional features to propose regions.
2. A detection network.

The speed is slightly lower than SSD's, but this is offset by higher accuracy; the trade-off depends on the use case [25]. The selective search of fast R-CNN is replaced with a network for faster inference times. The algorithm splits the image into smaller sectional areas, which are sent through convolutional filters to extract feature maps. Faster R-CNN was used in this study to obtain slightly higher accuracy. Figure 2 shows the block diagram of the faster R-CNN architecture. Training Process: Feature extraction in faster R-CNN uses an established base network; in this study, we used the Inception V2 feature extractor. The extracted features are passed through the RPN, which slides over the feature map using anchor points. Using two fully connected layers, the regions are determined, including the bounding-box coordinates and the probability of belonging to one of the classes or the background. The next stage generates regions of interest (ROI), which pass through an ROI pooling layer to estimate the final class probabilities and bounding boxes [26]. In our study, these bounding boxes and classifications were superimposed over the ground truth using non-maximum suppression (NMS) for



Fig. 2 The model architecture of faster R-CNN

eliminating overlapping bounding boxes. This study uses the TensorFlow 1 application programming interface (API).
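The NMS step described above can be sketched in plain Python. This is a minimal greedy NMS for illustration; the study itself relies on the TensorFlow 1 API implementation:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring boxes, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    for k in order:
        # keep this box only if it does not overlap a kept box too much
        if all(iou(boxes[k], boxes[j]) < iou_threshold for j in keep):
            keep.append(k)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]: the two overlapping boxes collapse to one
```

The IoU threshold here is the same hyperparameter listed later among the tuning options for the models.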

4 Data Collection and Preparation

4.1 Data Sources
An extensive search was done for freely available datasets; however, it was finally decided to use live streaming data for this project, collected from four different sources [27]:
1. Live traffic camera feed—Bulgaria/Varna: https://www.youtube.com/watch?v=BGCytWLOmyA
2. Live traffic camera feed—Taiwan/Taipei: https://www.youtube.com/watch?v=INR-B7FwhS8
3. Live traffic camera feed—Shinjuku/Tokyo: https://www.youtube.com/watch?v=RQA5RcIZlAM
4. MIO-TCD [28] vehicle localization annotated dataset: http://podoce.dinf.usherbrooke.ca/challenge/dataset/

The MIO-TCD [28] dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The following changes were made to the MIO-TCD [28] dataset in the context of this project. The original 12 labels were rationalized into 4 labels, and the rest were treated as a background class:
1. Articulated truck/pickup truck/single unit truck—rationalized to truck.
2. Motorized vehicle/non-motorized vehicle/pedestrian—labels removed.
3. Car/work van—rationalized to car.
4. Bus—remained as bus.
5. Bicycle/motorcycle—rationalized to bike.

4.2 Video Capture and Frame Extraction
Individual frames from the live traffic camera feeds were extracted as follows:
1. The YouTube link response was analyzed to extract the underlying original data source.
2. The original data source URL was used as a network stream, the output was streamed to a local file, and the video was saved as an MP4 file. A few utilities were analyzed, and this was finally done using VLC player [29].
3. VLC player was used to extract frames from each of these video files. The video files were 30 fps, and every 60th frame was extracted—effectively, an image every 2 s.
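The extraction schedule in step 3 (every 60th frame of a 30 fps stream, i.e., one image every 2 s) can be sketched as follows. This helper only computes which frame indices are kept; the actual extraction was done with VLC, and the 10-minute clip length is a hypothetical example:

```python
def frames_to_extract(total_frames, step=60):
    """Indices of frames to save: every `step`-th frame (0, 60, 120, ...)."""
    return list(range(0, total_frames, step))

fps = 30                      # frame rate of the saved MP4 files
step = 60                     # keep every 60th frame
clip_frames = 10 * 60 * fps   # a hypothetical 10-minute clip

indices = frames_to_extract(clip_frames, step)
print(len(indices), step / fps)  # -> 300 frames kept, one every 2.0 seconds
```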

4.3 Data Annotation
The labels for the MIO-TCD [28] dataset were available along with the images. However, we had to label and annotate the data extracted from the traffic cameras. For this purpose, labelImg (https://github.com/tzutalin/labelImg) was used, and the data were annotated in Pascal VOC format.
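A Pascal VOC annotation file produced by labelImg is plain XML; a minimal sketch of reading the boxes back with the standard library follows. The filename and coordinates are made-up sample values, and only the VOC tags relevant here are shown:

```python
import xml.etree.ElementTree as ET

# A stripped-down Pascal VOC annotation with one labeled object.
VOC_SAMPLE = """
<annotation>
  <filename>frame_000120.jpg</filename>
  <object>
    <name>car</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""

def read_voc(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        objects.append((
            obj.findtext("name"),
            *(int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")),
        ))
    return root.findtext("filename"), objects

name, objs = read_voc(VOC_SAMPLE)
print(name, objs)  # -> frame_000120.jpg [('car', 48, 240, 195, 371)]
```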

4.4 Data Analysis and Initial Findings
Based on the above, we expect good localization and precision for cars relative to all other classes. However, since we have generalized the truck and car labels as mentioned in Sect. 4.1, accuracy and precision may be affected. The actual figures are presented in the model evaluation section later in this report (Table 1).

5 Model Building and Research

5.1 Model Framework Selection
TensorFlow and PyTorch [30, 31] were the two frameworks we analyzed for training and building our object detection model. Within TensorFlow, we evaluated the 1.x and 2.x versions of the TensorFlow object detection API. Eventually, it was decided that

Table 1 Distribution of objects across images

Object | Count   | Distribution %
Bike   | 9516    | 2.78
Bus    | 11,357  | 3.31
Car    | 261,638 | 76.31
Truck  | 60,368  | 17.61
Total  | 342,879 | 100.00

Total no. of images processed: 111,548
Total no. of objects across images: 342,879

we would go ahead with TensorFlow 1.x for training the model, taking the following into consideration:
1. Stability of the framework.
2. Usability and community support of the framework.
3. Familiarity with the framework.

TensorFlow 2.x was also evaluated, but at the time the framework was somewhat unstable due to the high number of changes and releases. Hence, it was decided to use the more stable TensorFlow 1.x framework for object detection.

5.2 Model Selection
Models pre-trained on the COCO dataset were picked for evaluation. The TensorFlow model zoo has a list of pre-trained models available for download and use. If the object categories in a classification dataset are a subset of the classes the model was trained on, the model can be used directly for inference. However, since we changed the labels and wanted to re-train the model specifically for a small subset of classes, we took only the pre-trained model and trained it fully on all images from our dataset. The following models were evaluated, and the results are compared in a later section:
1. SSD MobileNet v1 COCO.
2. Faster R-CNN ResNet Inception V2.
3. Faster R-CNN Inception V2 COCO.

The following models from TensorFlow 2 were also evaluated but subsequently dropped due to the instability of the framework:
1. EfficientDet D4 1024 × 1024.
2. Faster R-CNN Inception ResNet V2 640 × 640.

The SSD models provided good inference times but struggled with distant and small object detection. Faster R-CNN ResNet Inception V2 provided better accuracy and detection but had very high inference times. Faster R-CNN Inception V2 provided a good balance between inference time and accuracy [32].



5.3 Faster R-CNN Inception V2
The Inception network (GoogLeNet) redefined CNN classifiers, which until then had simply stacked layers upon layers to go deeper. The Inception network achieved higher performance and better accuracy by going a bit wider. It had a lot of inherent complexity but achieved a top-5 error rate of 6.67% in 2014, a significant reduction from 15.3% (AlexNet, 2012) and 14.8% (ZFNet, 2013). More complex versions with iterative improvements over the earlier ones were released subsequently. Inception was designed to address variation in object size, overfitting of very deep networks, and the computational expense of stacking large convolutions. The network was wider, with multiple filter sizes. Figure 3 shows the naïve Inception module, and Fig. 4 shows the Inception module with dimension reduction. GoogLeNet has 9 Inception modules stacked linearly. It is 27 layers deep including all the pooling layers. It is a deep classifier and suffers from vanishing gradient

Fig. 3 Naïve inception module. (Source Inception V1)

Fig. 4 Inception module with dimension reductions. (Source Inception v1)



Fig. 5 Inception network. (Source Inception V1)

Fig. 6 Model comparison (train/test values)

problems. To handle this, the authors introduced two additional classifiers (the purple boxes in Fig. 5). They applied softmax to the outputs of two of the Inception modules and computed an auxiliary loss over the same labels. The total loss function is a weighted sum of the auxiliary losses and the real loss; the weight used in the paper was 0.3 for each auxiliary loss. Hence, the Inception network provided a good option for evaluating multiple shapes and sizes of the same object, and it formed the backbone CNN of our faster R-CNN.
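The weighted-sum loss described above reduces to a one-line combination. The loss values below are illustrative placeholders, not numbers from the paper:

```python
def googlenet_loss(main_loss, aux_losses, aux_weight=0.3):
    """Total GoogLeNet training loss: main loss + 0.3 * each auxiliary loss.

    The auxiliary classifiers are used only during training to inject
    gradient into the middle of the network; at inference time only the
    main classifier's output is kept.
    """
    return main_loss + aux_weight * sum(aux_losses)

# Two auxiliary heads, as in the original GoogLeNet paper:
total = googlenet_loss(1.10, [1.40, 1.25])  # 1.10 + 0.3 * (1.40 + 1.25)
print(total)
```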



6 Result and Discussion
The objective of this project is not a heavy focus on object detection, but on determining how dense the traffic is in a given image. Traffic density varies from point to point and intersection to intersection. A line of 30 cars at one signal may be light traffic, while a line of just 10 cars at a different intersection may be heavy traffic. It is a subjective determination based on the specific point of capture of the traffic images. Hence, a threshold for traffic density is established before prediction starts. The threshold can vary for each intersection, and we take it as an input. Since we are not focusing heavily on the accuracy of object detection, we can use a lower confidence threshold; here, we evaluate the images at a confidence threshold of 0.5. The classes identified by this model are:
a. Car (this includes van and minivan types)
b. Truck (any sort of truck)
c. Bus
d. Bike (this could be a bicycle, motorcycle, or scooter).

Since the contribution to the traffic density varies with the type of vehicle, we assign weights to the individual classes as follows:
a. Bike: 1
b. Car: 4 (empirical assumption that 4 bikes can occupy the same space as 1 car)
c. Bus: 16 (empirical assumption that 4 cars can occupy the same space as 1 bus)
d. Truck: 24 (empirical assumption that 6 cars can occupy the same space as 1 truck; 6 is used to average between smaller trucks and huge trailer trucks).

Once the score is computed, it is compared against qualitative thresholds to identify the state of traffic in a given image. A sample scoring could be:
a. Density scores 0–20: light traffic.
b. Density scores 21–60: medium traffic.
c. Density scores 61–90: heavy traffic.
d. Density scores 91–120: very heavy traffic.
e. Density scores 121 and above: traffic jam.
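Putting the class weights and sample thresholds together, the scoring step can be sketched as follows. The thresholds here are the sample values above and would be configured per intersection; the detection counts are made up:

```python
CLASS_WEIGHTS = {"bike": 1, "car": 4, "bus": 16, "truck": 24}

# Sample per-intersection thresholds as (upper bound, qualitative label).
THRESHOLDS = [
    (20, "light traffic"),
    (60, "medium traffic"),
    (90, "heavy traffic"),
    (120, "very heavy traffic"),
]

def density_score(counts):
    """Weighted vehicle count from detector output, e.g. {'car': 12, 'bus': 1}."""
    return sum(CLASS_WEIGHTS[cls] * n for cls, n in counts.items())

def density_label(score, thresholds=THRESHOLDS):
    """Map a numeric score to its qualitative traffic state."""
    for upper, label in thresholds:
        if score <= upper:
            return label
    return "traffic jam"

score = density_score({"car": 10, "bike": 3, "bus": 1, "truck": 1})
print(score, density_label(score))  # -> 83 heavy traffic
```

Because both the weights and the thresholds are plain inputs, the same detector output can be re-scored per intersection without retraining anything.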

The mean average precision and inference times of all three models evaluated are presented in Fig. 6.

7 Conclusion
The results of the three base models used for this project were compared. Considering the training and testing values, the models seem to generalize well. However, there is significant scope for improving the mAP values for the bike class.



Systematic tuning of the following hyperparameters could potentially yield much better mAP and, more importantly, recall values:
1. Learning rates.
2. Image augmentation options.
3. Number of epochs/iterations.
4. Number of regions.
5. Usage of dropout layers.
6. Image resizing.
7. IoU threshold for non-maximum suppression.

References
1. Animated GIF editor and GIF maker, Ezgif.com. [Online]. Available: https://ezgif.com/video-to-jpg. [Accessed: 07 Apr 2021]
2. Free online MP4 to JPEG converter, Filezigzag.com. [Online]. Available: https://www.filezigzag.com/mp4-jpeg-en.aspx. [Accessed: 07 Apr 2021]
3. MP4 to JPG, Onlineconverter.com. [Online]. Available: https://www.onlineconverter.com/mp4-to-jpg. [Accessed: 07 Apr 2021]
4. Yanagisawa, H., Yamashita, T., Watanabe, H.: A study on object detection method from manga images using CNN. In: 2018 International Workshop on Advanced Image Technology (IWAIT), pp. 1–4 (2018)
5. Guan, T., Zhu, H.: Atrous faster R-CNN for small scale object detection. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), pp. 16–21 (2017)
6. Youtube.com. [Online]. Available: https://www.youtube.com/watch?v=1h_lDu_wvR0. [Accessed: 07 Apr 2021]
7. [Online]. Available: http://www.telesens.co/2018/03/11/object-detection-and-classification-using-r-cnns/. [Accessed: 07 Apr 2021]
8. Liu, Y.: An improved faster R-CNN for object detection. In: 2018 11th International Symposium on Computational Intelligence and Design (ISCID), vol. 02, pp. 119–123 (2018)
9. Abbas, S.M., Singh, D.S.N.: Region-based object detection and classification using faster R-CNN. In: 2018 4th International Conference on Computational Intelligence and Communication Technology (CICT), pp. 1–6 (2018)
10. Raj, B.: A simple guide to the versions of the inception network. Towards Data Science, 29 May 2018. [Online]. Available: https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202. [Accessed: 07 Apr 2021]
11. Gandharv, J.: Google's Lenet (Inception net). Medium, 18 Oct 2019. [Online]. Available: https://medium.com/@jubergandharv/googles-lenet-inception-net-f256c2976955. [Accessed: 07 Apr 2021]
12. Chen, J., Tan, E., Li, Z.: A machine learning framework for real-time traffic density detection. Int. J. Pattern Recognit. Artif. Intell. 23(07), 1265–1284 (2009)
13. Tsang, S.-H.: Review: SSD—single shot detector (object detection). Towards Data Science, 03 Nov 2018. [Online]. Available: https://towardsdatascience.com/review-ssd-single-shot-detector-object-detection-851a94607d11. [Accessed: 07 Apr 2021]
14. Search for a dataset—the Datahub, Datahub.io. [Online]. Available: https://old.datahub.io/dataset?tags=traffic. [Accessed: 07 Apr 2021]
15. Image-based learning to measure traffic density using a deep convolutional neural network, Researchgate.net. [Online]. Available: https://www.researchgate.net/publication/319166906_Image-Based_Learning_to_Measure_Traffic_Density_Using_a_Deep_Convolutional_Neural_Network. [Accessed: 07 Apr 2021]
16. Nam, D., Lavanya, R., Jayakrishnan, R., Yang, I., Jeon, W.H.: A deep learning approach for estimating traffic density using data obtained from connected and autonomous probes. Sensors (Basel) 20(17), 4824 (2020)
17. Doulamis, A.D., Doulamis, N.D., Kollias, S.D.: An adaptable neural-network model for recursive nonlinear traffic prediction and modeling of MPEG video sources. IEEE Trans. Neural Netw. 14(1), 150–166 (2003)
18. Traffic density estimation using deep learning, Stackexchange.com. [Online]. Available: https://stats.stackexchange.com/questions/359903/traffic-density-estimation-using-deep-learning. [Accessed: 07 Apr 2021]
19. Morgunov, A.: How to train your own object detector using TensorFlow object detection API. Neptune.ai, 30 Oct 2020. [Online]. Available: https://neptune.ai/blog/how-to-train-your-own-object-detector-using-tensorflow-object-detection-api. [Accessed: 07 Apr 2021]
20. Morgunov, A.: TensorFlow object detection API: best practices to training, evaluation & deployment. Neptune.ai, 22 Dec 2020. [Online]. Available: https://neptune.ai/blog/tensorflow-object-detection-api-best-practices-to-training-evaluation-deployment. [Accessed: 07 Apr 2021]
21. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
22. Object localization and detection, Gitbooks.io. [Online]. Available: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html. [Accessed: 07 Apr 2021]
23. Data.world. [Online]. Available: https://data.world/datasets/traffic. [Accessed: 07 Apr 2021]
24. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv [cs.CV] (2015)
25. Top 10 vehicle and cars datasets for machine learning, Lionbridge.ai, 10 Jul 2019. [Online]. Available: https://lionbridge.ai/datasets/250000-cars-top-10-free-vehicle-image-and-video-datasets-for-machine-learning/. [Accessed: 07 Apr 2021]
26. Darapaneni, N., et al.: Computer vision based license plate detection for automated vehicle parking management system. In: 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), pp. 0800–0805 (2020)
27. Welcome to Webcamtaxi, Webcamtaxi.com. [Online]. Available: https://www.webcamtaxi.com/en/. [Accessed: 07 Apr 2021]
28. Luo, Z., Charron, F.B., Lemaire, C., Konrad, J., Li, S., Mishra, A., Achkar, A., Eichel, J., Jodoin, P.-M.: MIO-TCD: a new benchmark dataset for vehicle classification and localization. IEEE Trans. Image Process. (2018). [Online]. Available: http://podoce.dinf.usherbrooke.ca/challenge/dataset/. [Accessed: 07 Apr 2021]
29. VideoLAN: VLC: official site—free multimedia solutions for all OS! [Online]. Available: https://www.videolan.org/. [Accessed: 07 Apr 2021]
30. Udacity India: Tensorflow or PyTorch: the force is strong with which one? Medium, 24 Apr 2018. [Online]. Available: https://medium.com/@UdacityINDIA/tensorflow-or-pytorch-the-force-is-strong-with-which-one-68226bb7dab4. [Accessed: 07 Apr 2021]
31. PyTorch versus TensorFlow—a detailed comparison, Tooploox.com, 02 Oct 2020. [Online]. Available: https://www.tooploox.com/blog/pytorch-vs-tensorflow-a-detailed-comparison. [Accessed: 07 Apr 2021]
32. The battle of speed versus accuracy: single-shot versus two-shot detection meta-architecture, Clear.ml, 08 Mar 2020. [Online]. Available: https://clear.ml/blog/the-battle-of-speed-accuracy-single-shot-vs-two-shot-detection/. [Accessed: 07 Apr 2021]

Analyzing and Detecting Fake News Using Convolutional Neural Networks Considering News Categories Along with Temporal Interpreter Viraj Desai, Neel Shah, Jinit Jain, Manan Mehta, and Simran Gill

Abstract With the significant increase in data generation and the ease of sharing information, the sharing of fake information has increased too. This paper proposes a system that calculates the probability of an article being real and classifies it into human-readable result categories—mostly real, likely real, neutral, likely fake and mostly fake. Both implicit and explicit text and image features present in the news article are used as inputs to the text-image convolutional neural network (TICNN), which outputs a probability of realness that falls into one of the five predefined result categories. If an image is not present in the news article, only the text features are used. The news category of the article is also predicted and used to determine the trending factor of that news category at the time of use. News articles are divided into five categories—political, business, entertainment, sports and technology—and an empirical study is conducted to identify the correlation between the percentage of news which is fake (the fake fraction) and the trend fraction for each news category for different lookback window sizes. Furthermore, the temporal interpreter calculates a delta value based on the appropriate correlation constant and a defined trending threshold. This calculated value is then incorporated into the TICNN model's output probability of realness, which makes the entire system more accurate and the output more user readable.

V. Desai (B) · N. Shah · J. Jain · M. Mehta Veermata Jijabai Technological Institute, Mumbai, India J. Jain e-mail: [email protected] M. Mehta e-mail: [email protected] S. Gill Indian Institute Of Information Technology, Allahabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_35



V. Desai et al.

1 Introduction
Fake news is viewed as one of the greatest threats to democracy and trust among people [1]. Fake news has always been in the limelight whenever a major world event takes place, like the 2016 US Presidential elections [2]. Its reach can easily be seen in the impact it had on the elections of one of the most powerful countries on the planet [3]. Our economies face a similar threat, as news has an obvious impact on stock market fluctuations and massive trades, and the massive amounts of money involved in the markets make it lucrative to exploit the power of fake news to affect them [4]. Online platforms like Facebook and Twitter are an easy medium for the spread of fake news, and people relying on such sources for news are potential victims [5]. The rise of social media and its popularity is also a major reason why researchers and students are increasingly interested in determining whether a news article is real or fake. About 67% of Americans used modern means like social networking websites such as Facebook, Twitter, WhatsApp and similar applications for getting their news in mid-2017 [6]. Studies in the areas of social psychology and communications suggest that humans lack the ability to detect whether what they hear is true or whether they are being fooled [7]. This makes fake news a growing concern in the modern world, and the accurate detection of fake news the need of the hour. We attempt to provide a solution to this challenging problem in this paper.

2 Problem Statement
To design a fake news detection model that processes the text and image present in a news article and computes the probability of the article's trueness. The model should make suggestions on how the results should be interpreted by a user and report the insights from the analysis of such news articles. Mathematically, given a set of n articles that each contain both text and image, we can represent the data as a set of tuples as shown in Formula (1):

A = {(a_t[p], a_i[p]) : t ∈ TextSet, i ∈ ImageSet, p ∈ {1, ..., n}}    (1)

The output labels are denoted as 0 for fake and 1 for real news. A set of features and categories can be extracted from the given text, while implicit and explicit features can be extracted from the given image. The objective is to build a model that predicts the potential label of a given news article and informs the user if the news belongs to one of the following types: mostly fake, likely fake, neutral, likely true, mostly true. The model should also determine the category to which a particular news article belongs and, if that category has been trending in the past 'd' days (where d is a predefined interval), change the way the output of the



model should be interpreted by a user by referring to the trend of how the fraction of fake articles for that category in each period varies when that category starts to gain popularity.

3 Data Analysis
To derive quantitative and qualitative findings from the raw data, we carried out a comprehensive investigation of the text and image information in the news articles. There are a few observable differences between real and fake news [8]. We investigated the text and image information using a plethora of computational methods, including computational linguistics, textual implicit features, sentiment analysis and other image and text features. The quantitative details of the dataset used for the fake-versus-real news classification are mentioned in this section.

3.1 Datasets
The dataset used1 consists of both fake and real news with a total of 20,015 news articles: 11,941 fake and 8074 real. The dataset [9] used in this study was made available by the authors of [8]. It is based on the dataset collected by Megan Risdal on Kaggle [10], which includes text and metadata from about 244 websites and consists of articles that are all fake. For the distinction between real and fake news, we used the title, text and image information. However, only 6940 images are present in the entire dataset, so image analysis is not done for articles that have no related image. The news classifier incorporates a different labeled dataset of 2225 documents from the BBC website, consisting of articles from 2004 to 2005, used for training the news category classifier [11]. This dataset is solely used for training the news classifier and is not used in any other part of the fake news detector.

3.2 Text Analysis
Text in news articles can express the mood and emotion of the writer. This makes the analysis of text an important step in determining whether a news article is fake or real. There are either noticeable differences in the text, like the use of unnecessary bold characters, capital words or word counts, or subtle differences, like a certain pattern connecting to the mindset of the writer. Meaningless numbers and words like

https://drive.google.com/file/d/1FZWQZZwaku1AoLXtwqsNlvBiQKncc7KD.



no title, IN, Cancer and NOT intermittently occur in news that is fake [12]. The frequent words in real news articles indicate a contrasting trend, as words like takeaways, Quotable, Carson and Notable appear more often. News with no title or hyperlinks is one of several irregularities found in the dataset and has been dealt with accordingly.
Computational Linguistics. This deals with features present in the text, which often differ based on the language of the text. All the news articles in our dataset are in English. The implicit features are patterns or traits present in the text which cannot be identified directly by a human reader. These implicit features are captured internally by the neural network and are not possible to quantify or capture in the paper. Hence, we analyze our dataset based on just the explicit text features in this section.
Sentence length, word count and average words per sentence. The thinking and behavior of the person writing fake news can be understood using the principle of reverse psychology [13]. Hence, we analyze features like word count, sentence length and average words per sentence. Writers of real news generally follow professional standards, defined by permissible sentence lengths and adequate ranges of words present in the article, i.e., the variation in word count is limited. The standard deviation in word count is 957.11 for fake news and 628.34 for real news; it is greater for fake news than for real news. The writer's psychology is also highlighted by the number of sentences in a news article [13]. Fake news has on average 34.49 sentences, while this number is 51.92 for real news; so, there are fewer sentences in fake news as compared to real news. Real news writers prefer to write concise sentences with fewer words than the sentences present in fake news.
There are on average 19.08 words per sentence in fake news, while the number is 16.76 for real news. This means that writers of fake news tend to write longer sentences, which can reduce the reader's ability to deduce and understand. The standard deviations of the sentence count, word count and words per sentence (average word count) features are all higher for fake news than for real news.
Exclamation marks, question marks and capital letters. Exclamations are used to stress an idea or emotion to the reader. The message carried by fake news is generally twisted so that there is a doorway for emotions to be attached, laying emphasis on the viewpoint of the writer. Exclamations catch the reader's attention at such points. This is done using punctuation marks like exclamation marks and question marks. The average number of exclamation marks used in fake news is almost 4 times that used in real news. Fake news has more rhetorical questions to intensify the emotion associated with it; rhetorical questions are more frequently used in fake news than in real news [14]. Hence, the average number of question marks is analyzed for the fake and real news present in the dataset. There are indeed more question marks used on average in fake news than in real news: fake news has on average 1.70 question marks while real news has 1.18, a significant difference of 44% more question marks in fake news. The



usage of capital words serves a similar purpose to the punctuation marks, i.e., to emphasize a particular idea. Capital words, especially in bold, draw the reader's attention more quickly to that part of the sentence. This gives the writer command over the reader's interpretation and imprints their ideas on the reader's mind. On average, 7.37 capital words are present in fake news, while the average is 4.73 for real news, i.e., 60% more capital words are present in fake news than in real news. Hence, we infer that on average more capital words are used in fake news than in real news.
Negative words. The usage of negative words like no, not and neither is also analyzed. This analysis is crucial for understanding the writer's state of mind. Fake news has a mean negative word count of 0.147, while real news has a mean negative word count of 0.556. So, there is higher usage of negations in real news than in fake news. Fake news writers tend to avoid negations, as they need to be more forceful in their statements. This is supposedly to make the reader believe their statements and mitigate their odds of getting caught in a contradiction. Real news articles persuade users through sound arguments, while fake news persuades users through heuristics [15].
Sentiment Analysis. Real and fake news have altogether different sentiments [16]. The majority of the news articles present in the dataset have positive sentiment. In the case of fake news, as shown in Fig. 1, the fractions of negative and positive sentiment occurrences are comparable: approximately 55% of the articles are positive, 40% are negative and 5% are neutral. Figure 2 shows the distribution of sentiments for real news. Real news articles with negative sentiment form a smaller fraction, whereas the contrary is observed for articles with positive sentiment.
This observation leads to the inference that fake news is equally inclined toward positive as well as negative sentiment, but real news is inclined more toward positive sentiment.
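The surface-level features analyzed above are straightforward to compute. The following sketch is our own illustrative code, not the authors' implementation; the negation word list is an assumption based on the examples given in the text.

```python
import re

# Assumed negation list (the paper names "no, not, neither" as examples).
NEGATIONS = {"no", "not", "neither", "nor", "never"}

def explicit_features(text: str) -> dict:
    """Extract the surface-level ('explicit') features discussed above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Fully capitalized words of length > 1 (to skip 'I', 'A').
    capital_words = [w for w in words if w.isupper() and len(w) > 1]
    return {
        "sentence_count": len(sentences),
        "word_count": len(words),
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "exclamation_marks": text.count("!"),
        "question_marks": text.count("?"),
        "capital_words": len(capital_words),
        "negations": sum(1 for w in words if w.lower() in NEGATIONS),
    }

feats = explicit_features("SHOCKING news! Is this not AMAZING? No one knew.")
```

Averaging such feature dictionaries over the fake and real subsets of a dataset reproduces the kind of comparison reported above.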

Fig. 1 Sentiment analysis for fake news

392

V. Desai et al.

Fig. 2 Sentiment analysis for real news

Liars tend to use more negative emotions due to guilt and fear of getting caught [17, 18] and hence we can infer that fake news has a higher fraction of negative sentiment.

3.3 Image Analysis Images play a major role in deceiving people and in making them believe the fake news related to the false images. We observed that the most common deception traits employed are the use of (1) irrelevant images, (2) altered/doctored images and (3) low-resolution images. These aim to capture the reader's attention and curiosity. People have an extremely limited ability to detect and locate manipulations of real-world scenes [19]. Hence, image analysis forms a crucial part of our fake news detection. The differences between real and fake images are scrutinized on several factors, such as the number of faces present in the image and the resolution of the image. The analysis of these surface-level features provides interesting insights into the differences between real and fake images. It is observed that real images contain 3.05 faces on average, while fake images contain 1.49 faces on average, implying that real images generally contain more faces than fake images. Furthermore, there is a significant difference in the resolution of images used in real and fake news. On average a real image has a resolution of 972 × 649 while a fake image has an average resolution of 754 × 461, significantly lower than its real counterpart.


Fig. 3 Architecture model of TICNN

4 Architecture 4.1 Text Image Convolutional Neural Network The model uses both the text and the image branches to predict whether a news article is real or fake [8]. The model consists of two major branches, text and image, where the text branch is further divided into implicit and explicit branches. Therefore, three sub-branches are present: text explicit, text implicit and image. The layers used in the neural networks for each branch can be viewed in the model architecture shown in Fig. 3. The model is further discussed in the implementation section.

4.2 Text Convolutional Neural Network When no image is present, the article is classified as fake or real based only on its textual features. The TCNN model may be viewed as a sub-model, or subset, of the TICNN model. The TCNN model contains only two branches: the text explicit branch and the text implicit branch. The implementation and design of these branches are the same as for the corresponding branches of the TICNN model. The layers used in the neural networks for each branch can be viewed in the model architecture


Fig. 4 Architecture model for TCNN

present in Fig. 4. Please refer to the implementation section for further information regarding these branches.

4.3 System Workflow Figure 5 represents the workflow of the entire fake news detection system. The text convolutional neural network (TCNN) consists only of the text branch, which deals with the text part of the news article and contains both the text explicit and implicit sub-branches. The TCNN is used for predicting the trueness probability of the input news article when the article does not have an image associated with it or the image is not accessible. If the image is available, the TICNN model is used to predict the trueness probability. The trueness probability is then combined with the ranges provided by the temporal interpreter module to give a result category. The result categories are mostly real, likely real, neutral, likely fake and mostly fake, with ranges [1.0,0.8), [0.8,0.6), [0.6,0.4), [0.4,0.2) and [0.2,0.0], respectively. Simultaneously, the news category of the article is predicted by the news classifier model and provided to the trend analyzer. The news categories used in our study are business, politics, entertainment, sport, tech and others. The category 'others' contains those news articles which cannot be deterministically placed under the other five categories. The trend analyzer filters all the news articles present in the lookback window belonging to the news category under consideration. Using these results, it calculates the trend fraction and fake fraction as described in the temporal interpreter section. If the category is not trending in the lookback period, then there are no


Fig. 5 Workflow model of entire fake news detection system

changes made to the result category ranges. However, if the category is trending the temporal interpreter module changes these ranges by a value delta which is computed using the formula described in the temporal interpreter section.
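The mapping from trueness probability to result category, including the delta shift of the interior range bounds, can be sketched as follows. This is an illustrative helper written by us, not the paper's code; the half-open range conventions follow the text above.

```python
CATEGORIES = ["mostly real", "likely real", "neutral", "likely fake", "mostly fake"]

def categorize(trueness: float, delta: float = 0.0) -> str:
    """Map a trueness probability in [0, 1] to a result category.

    Default bounds are [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]; only the interior
    bounds are shifted by delta, the corners 1.0 and 0.0 stay fixed.
    """
    bounds = [1.0] + [b + delta for b in (0.8, 0.6, 0.4, 0.2)] + [0.0]
    for name, lower in zip(CATEGORIES, bounds[1:]):
        if trueness > lower:          # e.g. mostly real covers (0.8, 1.0]
            return name
    return CATEGORIES[-1]             # trueness == 0.0 falls in [0.2, 0.0]

# A positive delta (positive trend/fake correlation) enlarges the fake
# categories' ranges and shrinks the real ones, as described above.
```

For example, `categorize(0.5)` yields `"neutral"` with the default ranges, while a delta of 0.05 moves a borderline article such as `categorize(0.82, 0.05)` down into `"likely real"`.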

5 Implementation 5.1 Text Branch The kernel size of the 1D convolutional layer used in this model is 4, and we use 10 filters for this layer. Two dropout layers are used to improve the model's generalization. For the textual explicit branch, we add a dense layer with 128 neurons after the input layer, and then a batch normalization layer to deal with the covariate shift of the hidden unit values. Then we use a ReLU layer, which


is important as it helps to keep the mean activation close to 0 and the activation standard deviation close to 1. The reason for choosing ReLU as the activation layer is that both the sigmoid and tanh functions saturate; modern deep learning uses activation functions that do not easily saturate, like the rectified linear unit, instead of functions like tanh or sigmoid [20]. The model architecture has been covered in the Architecture section. Explicit text features branch. This branch deals with explicit features, which are surface-level features. Explicit text features such as linguistic features are provided as inputs to this branch. The explicit branch involves the creation of a model for classifying fake and real news based only on explicit text features. These features are selected based on the results of the data analysis section. They include the sentiments, the eight parameters of emotion according to Plutchik's model of emotion, the text length, the number of capital words, the number of capital letters, etc. The explicit branch analyzes the news article based on features that can be observed directly, and does not deal with the hidden patterns in the text. Implicit text features branch. This branch deals with hidden patterns within the text, like the style of writing and other features which human readers cannot identify just by reading the text. Convolutional neural networks are used to discover such patterns. The 1D convolutional layer scans through the entire text in a window of a predefined size. For example, if a window size of 3 is defined, it scans through 3 words at a time and searches for hidden patterns in the text. These features, however, cannot be plotted on a graph or analyzed as we did for the explicit features. Model Implementation. We merge the two text branches by concatenating the output vectors from each channel, i.e., the implicit and explicit branches.
This concatenated vector is then processed to give a number as output, which is passed to the temporal interpreter, where it is assigned one of the five categories mentioned earlier. The output of the CNN is a probability of an article being real. GloVe was used to embed the words in our model. This forms the text convolutional neural network model, which is later combined with the image branch to form the text image convolutional neural network model. GloVe, or Global Vectors, captures the global corpus statistics directly. The Euclidean distance between two word vectors provides an effective way to capture the linguistic or semantic similarity between two words. It also reveals rare but relevant words that are often not captured by an average human's vocabulary. However, the intricate relationships between two words can rarely be captured by a single number, and hence GloVe uses vector differences to capture the meaning specified by words and their juxtaposition. For example, the vector differences man-woman and king-queen will be roughly equal.
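The vector-difference property can be illustrated with hand-crafted toy vectors. These are not real GloVe embeddings (real ones have hundreds of dimensions), but the arithmetic is the same: the offset between "man" and "woman" matches the offset between "king" and "queen".

```python
import math

# Toy 2-D "embeddings" crafted so that the analogy holds exactly.
emb = {
    "man":   (1.0, 0.0),
    "woman": (1.0, 1.0),
    "king":  (2.0, 0.0),
    "queen": (2.0, 1.0),
}

# king - man + woman should land nearest to queen.
target = tuple(k - m + w
               for k, m, w in zip(emb["king"], emb["man"], emb["woman"]))
nearest = min(emb, key=lambda word: math.dist(emb[word], target))
```

With real embeddings the match is only approximate, so the nearest neighbour of the analogy vector is taken, exactly as `min` over Euclidean distances does here.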


5.2 Image Branch This branch extracts the implicit features from the image present in the news article in order to identify its originality. The number of faces present and the image height and width, i.e., its resolution, are the inputs to this model. The multi-task cascaded convolutional network (MTCNN) [21] model is used for face detection. This model uses multiple scaled versions of the input image to recognize faces of different sizes. There are three convolutional networks, P-Net, R-Net and O-Net, within this model, and each network uses the output of the preceding one to increase the precision of face recognition. P-Net is trained to propose all possible bounding-box candidates. R-Net increases the confidence level by refining inaccurate coordinates, eliminating candidates that overlap with higher-confidence bounding boxes, and rescaling the coordinates of the scaled images back to the original image. O-Net uses R-Net's output and finds the coordinates of facial features such as the eyes, nose and mouth. The final output is a dictionary whose size gives the total number of faces present. The height and width of the image are calculated using the Pillow (PIL) library in Python. The model architecture has been covered in the Architecture section. Model Implementation. The input layer, with the three parameters number of faces, width and height of the image, feeds its succeeding dense layer of 128 neurons. The output from this dense layer is fed to the batch normalization layer. This layer improves the performance, speed and stability of the neural network by normalizing the inputs to zero mean and a uniform distribution, which accelerates the training process. Finally, the output of this layer is passed to the ReLU activation layer, chosen because it provides better generalization compared to other activation layers used in deep learning.
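Assembling the image branch's three inputs can be sketched as below. The MTCNN detection list is stubbed with a hypothetical entry here (the real detector returns one dictionary per detected face), while the resolution really is read via Pillow's `Image.size`, as described above.

```python
from PIL import Image

def image_features(img: Image.Image, detections: list) -> tuple:
    """Return (num_faces, width, height) for the image branch's input layer."""
    width, height = img.size       # Pillow exposes resolution as (w, h)
    num_faces = len(detections)    # size of MTCNN's per-face output
    return (num_faces, width, height)

# Stand-in for a news article image, at the average real-image resolution
# reported in the image analysis section.
img = Image.new("RGB", (972, 649))
# Hypothetical single-face MTCNN result, stubbed for illustration.
feats = image_features(img, detections=[{"box": [10, 10, 50, 50]}])
```

The resulting triple is exactly the three-parameter vector fed to the 128-neuron dense layer described above.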

6 News Category Classification The news category is predicted by the news classifier for each news article present in the news dataset, and a category column is added to the dataset. The news classifier is trained on the BBC dataset mentioned in the datasets section. The articles are classified into one of the following six categories: business, entertainment, politics, tech, sport and others. Articles whose category cannot be determined by our classifier, because the predicted probabilities are less than the category threshold, are assigned the 'others' category. The accuracy of the news classifier on the testing dataset created from the BBC dataset is 96.4%. Figure 6 shows the category-wise distribution of the news articles present in the news dataset. The news categories play an integral role in the temporal interpreter, which is discussed next.
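The fallback to 'others' can be sketched as follows. The threshold value is an assumption for illustration; the paper does not state it.

```python
CATEGORY_THRESHOLD = 0.5  # assumed value, not given in the paper

def assign_category(probs: dict) -> str:
    """Pick the top predicted category, falling back to 'others' when the
    classifier's best probability is below the category threshold."""
    best = max(probs, key=probs.get)
    return best if probs[best] >= CATEGORY_THRESHOLD else "others"

confident = assign_category({"politics": 0.9, "sport": 0.1})
unsure = assign_category({"politics": 0.3, "sport": 0.25, "tech": 0.2})
```

This keeps low-confidence predictions out of the five named categories, which matters because the temporal interpreter draws per-category conclusions.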


Fig. 6 Percentage of articles from dataset present in each news category

7 Temporal Interpreter The temporal interpreter is the part of the model that deals with time-related affairs. The dataset contains a published column which captures the timestamp at which each article was published. The temporal interpreter aims to normalize the probability predicted by the models against the current trends in the news reporting world. It leverages this data and keeps a record of the categories of articles that were trending in the past n days, which we call the lookback window. The duration of the window can be changed according to the prevailing situation and needs. The fractions of the different news categories among all the news articles present in the lookback window are calculated next. If the fraction for a particular news category crosses a predefined threshold, we brand the category as a trending category. For a trending news category, the percentage of fake articles may differ from when it is not trending. For example, for the politics category the percentage of fake news articles may increase near elections, i.e., when politics is trending [22]; the 2016 United States presidential election is a prime example. The model analyzes these trends in misinformation and investigates the behavior of each news category. The trending fraction (T) is the fraction of news articles, out of the total articles in the lookback window, belonging to a particular category. The fake fraction (F) is the fraction of news articles which are fake out of all news articles belonging to the category under consideration in the lookback window. Since the dataset contains news articles published in a period of approximately two months, we used different


lookback window sizes of 3, 5, 7 and 9 days for each news category. The 'others' category is not analyzed since it would be difficult to derive an inference for uncategorized news articles. The correlation coefficient between the trend fraction and the fake fraction is calculated for each news category and lookback window. The correlation matrix stores the correlation coefficients for all news categories and lookback window sizes under consideration. We classify a news category as trending in a particular lookback window if the trend fraction is greater than the threshold trend fraction, which was set to 0.2 during our analysis. The delta is directly proportional to the trend fraction and the absolute value of the correlation coefficient: the higher the correlation value, the higher the absolute value of delta. The delta is calculated using Formulas (2) and (3).

δ = T(c − τ)/n,  c > 0  (2)

δ = T(c + τ)/n,  c < 0  (3)

Here c is the correlation coefficient and T is the trend fraction for the news category. τ acts as a control variable and its value is determined empirically; we found that the best results were achieved when τ is 0.3. n is the number of result categories, which is 5: mostly real, likely real, neutral, likely fake and mostly fake. The default ranges for classifying a news article, based on the model's output probability of realness, into the mostly real, likely real, neutral, likely fake and mostly fake categories are [1.0,0.8), [0.8,0.6), [0.6,0.4), [0.4,0.2) and [0.2,0.0], respectively. Using the delta calculated for the instance, we change the ranges of the categories: delta is added to all the bounds of the ranges except the corner boundaries at 1.0 and 0.0. When the trending category has a positive correlation, the ranges of the fake categories grow while the ranges of the real categories shrink; for a negative correlation the opposite occurs. The categorization thus takes into consideration the correlation between the trend fraction and the fake fraction for the news category of the article under consideration. We shall investigate each category individually in the following section.
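Under these definitions, the trend fraction, fake fraction and delta can be sketched as follows; the window contents are illustrative, while τ = 0.3 and n = 5 follow the text.

```python
def trend_and_fake_fraction(window, category):
    """window: list of (category, is_fake) pairs already restricted to the
    lookback window.  Returns (T, F) for the given category."""
    in_cat = [entry for entry in window if entry[0] == category]
    T = len(in_cat) / len(window) if window else 0.0
    F = sum(1 for _, fake in in_cat if fake) / len(in_cat) if in_cat else 0.0
    return T, F

def delta(T, c, tau=0.3, n=5):
    """Formulas (2) and (3): c is the trend/fake correlation coefficient,
    tau = 0.3 was found empirically, n = 5 result categories."""
    return T * (c - tau) / n if c > 0 else T * (c + tau) / n

# Illustrative 4-article lookback window.
window = [("politics", True), ("politics", False),
          ("sport", False), ("politics", False)]
T, F = trend_and_fake_fraction(window, "politics")   # T = 0.75, F = 1/3
d = delta(T, c=-0.94)                                # strong negative correlation
```

With T = 0.75 and c = −0.94 (roughly the politics row of Table 1), delta is 0.75 × (−0.94 + 0.3)/5 = −0.096, i.e., the real categories' ranges grow.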

7.1 Sliding Lookback Window Analysis All the plots discussed in this section have the fake fraction, i.e., the fraction of fake news in that category within a specific window, on the y-axis, and the trend fraction, i.e., the fraction of news articles of that category within the same window, on the x-axis. The scatter plots for the different news categories and window sizes are available online at https://drive.google.com/drive/u/2/folders/1EEWO3Q3h7zI11qcCF9NlrALbxBr3-z2U. The goal is to look for correlation between the two


Table 1 Correlation matrix for news category and window size

Window size   Business   Politics   Entertainment   Sports   Tech
3             0.082      −0.938     −0.081          0.783    0.887
5             0.021      −0.941     0.176           0.795    0.901
7             0.203      −0.943     −0.248          0.964    0.826
9             0.367      −0.941     −0.565          0.903    0.861
variables mentioned above. The correlation coefficients between the fake fraction and trend fraction for each window size and news category are presented in Table 1. Business news category. Based on the correlation coefficients in Table 1, there is an extremely weak positive correlation between the fake fraction and trend fraction for business news. However, the correlation becomes stronger as the window size increases. For larger window sizes we find an interesting nonlinear correlation in this category, forming the shape of an upside-down parabola. Politics news category. The correlation coefficients in Table 1 indicate a very strong negative correlation between the fake fraction and trend fraction for the politics news category. This leads to the counterintuitive conclusion that when political news is trending, the fraction of news which is fake decreases. The root cause may be that the total number of political news articles published increases far more than the number of fake political news articles when some important political event is trending. Entertainment news category. No firm conclusion can be drawn regarding the correlation between the fake fraction and trend fraction for entertainment news based on the coefficients in Table 1. Many outliers are present due to the sparse presence of entertainment news in the dataset. For a window size of 9 days there is a moderate negative correlation, suggesting that over that window the fake fraction decreases as entertainment news trends. Sports news category. The correlation coefficients in Table 1 indicate a strong positive correlation between the fake fraction and trend fraction for the sports category. This leads to the conclusion that when sports news is trending, the fraction of news which is fake also increases. The correlation is stronger for the larger window sizes of 7 and 9 days. Tech news category.
The correlation coefficients present in Table 1 show a strong positive correlation between the fake fraction and trend fraction for tech news. There is no significant change in the correlation coefficient with respect to window size. We can infer that the percentage of fake tech news increases with respect to the increase in its trend fraction.
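Each cell of Table 1 is the correlation coefficient between two series (per-window trend fractions and fake fractions of one category). Assuming the standard Pearson definition, a cell can be computed as sketched below with illustrative data, not the paper's.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative sliding-window series: trend fraction rising while the
# fake fraction falls, as in the politics category.
trend = [0.1, 0.2, 0.3, 0.4]
fake = [0.4, 0.3, 0.2, 0.1]
c = pearson(trend, fake)
```

A perfectly inverse relationship like this yields c = −1.0; the politics column's values near −0.94 indicate something close to it.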


8 Experiments Five separate experiments were conducted for both the TICNN and TCNN models, and the mean accuracy over these runs is reported. If an image is present in the news article along with the text, we use the TICNN; otherwise, the TCNN is used. The dataset has 20,015 articles without an image and 6940 articles with a related image. The experiment setup has two configurations: first a 90–10 train-test split (~18 k articles for training and ~2 k for testing) is used, and then a 70–30 train-test split (~14 k and ~6 k articles for training and testing, respectively). The number of epochs is set to 10 and the maximum text length to 2000 for both models. The TCNN gives a mean accuracy of 91.5% over these experiments, and the TICNN gives a mean accuracy of 93%. The trends in accuracy and loss for the TICNN over the epochs are shown in Fig. 7; similarly, the accuracy and loss for the TCNN are plotted against the corresponding epochs in Fig. 8. scikit-learn was used for calculating performance metrics like precision, recall and the F1 measure; these values were calculated from the confusion matrix obtained using scikit-learn. The average results and metrics calculated over the experiments are summarized in Table 2. The accuracy of the fake news detection system without the temporal interpreter was 92.8%, and it was 93.2% with the temporal interpreter. Hence, the addition of

Fig. 7 Model accuracy and model loss for TICNN

Fig. 8 Model accuracy and model loss for TCNN


Table 2 Performance metrics for CNN models

Model   Accuracy   Precision   Recall   F1 measure
TCNN    0.9120     0.9383      0.8047   0.8664
TICNN   0.9329     0.9722      0.8452   0.8566

the temporal interpreter to the system improved the accuracy by 0.4 percentage points. The accuracy of the entire system was calculated by dividing the number of correctly predicted news articles by the total number of predictions made by the system.
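The metrics in Table 2 follow the standard definitions computed from a confusion matrix. The sketch below uses illustrative counts, not the paper's actual confusion matrix.

```python
def metrics(tp, fp, fn, tn):
    """Standard classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts: 80 true positives, 5 false positives,
# 15 false negatives, 100 true negatives.
acc, prec, rec, f1 = metrics(tp=80, fp=5, fn=15, tn=100)
```

Note that F1 simplifies to 2·TP/(2·TP + FP + FN), so here f1 = 160/180 ≈ 0.889; the high-precision/lower-recall pattern mirrors the TCNN and TICNN rows of Table 2.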

9 Conclusion and Future Scope Fake news spreads farther than the truth because humans are more likely to spread it [23]. Fake news needs to be dealt with as quickly and as accurately as possible. This paper provides a more accurate approach for detecting fake news and provides insights into how the amount of fake news fluctuates with trending news. For the first time, the trends in the news were factored into the detection. The temporal interpreter proved to be a valuable addition to the system, since an increase in accuracy was observed. For future work, firstly, we plan to enlarge our dataset by adding more recent and diverse news articles from more diverse sources. Secondly, we plan to incorporate nonlinear correlations into the temporal interpreter. Thirdly, we plan to use a news category classifier which classifies news into a greater number of categories, with a similar number of articles in each category, to improve the temporal interpreter. Lastly, we plan to analyze human behavior and social network structures [8] to identify and create better feature inputs for the models.

References

1. A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities. arXiv:1812.00315 [cs.CL]
2. Pogue, D.: How to stamp out fake news. Scientific American (2017)
3. Silverman, C.: This analysis shows how viral fake election news stories outperformed real news on Facebook. BuzzFeed News (2016)
4. Rapoza, K.: Can 'fake news' impact the stock market? (2017)
5. Schulz, A.: Where populist citizens get the news: an investigation of news audience polarization along populist attitudes in 11 countries (2019)
6. https://www.journalism.org/2017/09/07/news-use-across-social-media-platforms-2017/
7. Rubin, V.: On deception and deception detection: content analysis of computer-mediated stated beliefs (2010)
8. TI-CNN: Convolutional Neural Networks for Fake News Detection. arXiv:1806.00749 [cs.CL]
9. https://drive.google.com/open?id=0B3e3qZpPtccsMFo5bk9Ib3VCc2c
10. Kaggle: Getting real about fake news, https://www.kaggle.com/mrisdal/fake-news
11. Greene, D., Cunningham, P.: Practical solutions to the problem of diagonal dominance in kernel document clustering. In: Proc. ICML (2006)


12. Kessler, J.: Scattertext: a browser-based tool for visualizing how corpora differ
13. Verigin, B., Meijer, E., Bogaard, G., Vrij, A.: Lie prevalence, lie characteristics and strategies of self-reported good liars
14. Rubin, V., Conroy, N., Chen, Y.: Towards news verification: deception detection methods for news discourse
15. This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News. arXiv:1703.09398 [cs.SI]
16. Liu, B.: Sentiment analysis and opinion mining
17. Pennebaker, J., King, L.: Linguistic styles: language use as an individual difference
18. Markowitz, D., Hancock, J.: Linguistic obfuscation in fraudulent science
19. Nightingale, S., Wade, K., Watson, D.: Can people identify original and manipulated photos of real-world scenes
20. Empirical Evaluation of Rectified Activations in Convolutional Network. arXiv:1505.00853 [cs.LG]
21. Xiang, J., Zhu, G.: Face recognition based on MTCNN and convolutional neural network
22. Allcott, H., Gentzkow, M., Chuan, Y.: Trends in the diffusion of misinformation on social media (2019)
23. Vosoughi, S., Roy, D., Aral, S.: The spread of true and false news online

Automatic Student Attendance and Activeness Monitoring System Naveena Narayana Poojari, J. Sangeetha, G. Shreenivasa, and Prajwal

Abstract Managing student attendance is a very important function of an educational institution, and institutions manage this task in their own ways. Tracking each and every student is a job which demands high precision, and tracking the attendance as well as the activeness of students throughout the class, while also teaching the subject, is a tedious task; it is nearly impossible where the student count is very high. In our work, we have used the computational power of computers and implemented an automated system which monitors both the attendance and the activeness of each individual in its covered area and stores the data in a database for later use. Our model is multitasking, has reasonable computational speed and can replace the work of manually taking attendance in a very effective and fast way. Since this work also focusses on activeness detection, it is a multitasking system. Nowadays, many students wear spectacles, and our work can easily recognize faces wearing spectacles; the activeness system also works well on students wearing spectacles.

1 Introduction We know that the traditional method of taking attendance has many disadvantages. We live in a world where there can be thousands of employees working under a single person, or thousands of students in a college but very few faculty. If we start taking attendance for each and every person one by one, it may take a whole day just to mark the attendance, and recording a timestamp along with each entry makes this even harder. This is never a choice for any such big institution. Now think of having to monitor attendance along with each student's drowsiness status: this is nearly impossible to do manually through human effort alone. In the literature, attendance monitoring systems have mainly been implemented using machine learning techniques. Many typical

N. N. Poojari (B) · J. Sangeetha · G. Shreenivasa · Prajwal Department of Computer Science and Technology, Ramaiah Institute of Technology, Bangalore, Karnataka, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_36



problems are being solved using existing technologies. Taking attendance in a school or college is time-consuming work: in a given class period, teachers have to teach the subject and also note students' attendance manually. To overcome this, in [1], fingerprint, RFID wireless, iris and face recognition methods have been used; since these methods still have some drawbacks, a filtering system was applied to the Euclidean distances calculated by the Eigenfaces, Fisherfaces and local binary pattern methods. In [2], a convolutional neural network (CNN) model is used to develop a face recognition system. The CNN model includes face recognition, dataset training, data entry and attendance monitoring, and the work focusses on recognizing multiple students from an input video stream. To achieve optimal (i.e. good-quality) images, the authors of [3] developed an image acquisition technique for optimal face recognition; a Raspberry Pi camera is used, since it supports high resolution and captures video at a high rate. In [4], the researchers focussed on a face recognition automatic attendance monitoring system using the 'Eigenfaces' technique. All the above techniques are used only for marking attendance automatically or for detecting drowsiness. In other words, all the existing systems that we know of are limited to performing only one task (i.e. tracking either attendance or activeness, but not both). The proposed system aims to take attendance automatically and also to monitor every student's activeness during the class hour (i.e. the system is capable of multitasking), which helps the teacher to find out which student is not active in the class and take necessary actions, like informing the student's parents. The system scans the eyes of students whilst they are in the class and detects whether they are drowsy or active during the class hour.
This system is capable of detecting all the faces in the video feed and monitoring them. In this way, we have come up with an effective, dynamic and automated approach. This paper is organized as follows: Sect. 2 presents the methodology, flowchart and algorithms used. Section 3 describes the environment setup of this research work, dataset training, face detection and recognition, and attendance management. Section 4 summarizes the results obtained in this study. Section 5 concludes our study and outlines possible future work.

2 Methodology In the proposed system, we have achieved the goal of monitoring both the attendance and the activeness simultaneously and continuously; in order to achieve this, we have used an organized approach. Figure 1 shows the flowchart of the applied methodology. In this research work, we have used three algorithms. They are local binary pattern histogram (LBPH) algorithm [5–7], histogram of oriented gradients (HOG) algorithm [8–10], and convolutional neural network (CNN) algorithm [11–13]. Here is the brief introduction of each of the three:


Fig. 1 Flowchart of the applied methodology

LBPH algorithm: This is one of the best performing algorithms for texture description. It is very easy to implement, takes little CPU power and gives satisfactory results. The Fisherfaces and Eigenfaces algorithms fail when there is a luminance error and do not work efficiently under changing lighting conditions, whereas the LBPH algorithm works under all lighting conditions. This algorithm is computationally fast even on less capable devices. A study of these three algorithms [14] also declared LBPH the best amongst them. Hence, we have chosen the LBPH algorithm. HOG algorithm: As the name suggests, this algorithm works on gradients in the image, since images have distinguishable edges. It is an object detection algorithm, and its results and required computational power are slightly lower than CNN's. CNN algorithm: This is a deep learning technique used in neural networks. In simple words, it is based on a mathematical operation called convolution. A CNN stores the data in compressed form without losing the important features of the input data. This algorithm needs high computational power to execute.
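The core LBP step behind LBPH can be sketched as follows: each of a pixel's 8 neighbours is compared with the centre to form an 8-bit code, and histograms of these codes describe texture. Adding a constant brightness to every pixel leaves the comparisons, and hence the code, unchanged, which is the source of LBPH's robustness to lighting changes (this is an illustrative sketch, not OpenCV's implementation).

```python
def lbp_code(patch):
    """Basic LBP code for the centre pixel of a 3x3 intensity patch."""
    c = patch[1][1]
    # 8 neighbours read clockwise from the top-left corner.
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, p in enumerate(neighbours):
        if p >= c:                 # neighbour at least as bright as centre
            code |= 1 << bit
    return code

# Illumination invariance: shifting every pixel by a constant does not
# change any neighbour-vs-centre comparison, so the code is identical.
bright = [[11, 12, 13], [14, 15, 16], [17, 18, 19]]
dark = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

LBPH then splits a face image into a grid of regions, histograms the codes per region, and concatenates the histograms as the face descriptor.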


3 Implementation The entire model is implemented using Visual Studio (VS) Code as the code editor, Python as the programming language and XAMPP (the cross-platform Apache, MariaDB, PHP and Perl stack) [15–17] as the application used for database design. The coding part is done entirely in Python because of its huge range of features, and the coding environment is set up using VS Code. The system is a software application which can monitor both the attendance and the activeness of the student simultaneously. The graphical user interface (GUI) is designed using the 'TkInter' module [18–20]. Other features, like face recognition and drowsiness detection, are mainly built on the 'OpenCV' [21–24] and 'face recognition' libraries of Python. The implementation of our work is explained below in detail.

3.1 Database Design Using XAMPP

We have designed the required database using the XAMPP application, in which we have used the Apache Web server [25] and the MySQL [26] language. In this work, we have created a database in our local system. To create it, first install and set up XAMPP. Then open a Web browser (e.g. Firefox) and search for 'localhost'. In 'phpMyAdmin', create a local database in which all the details (i.e. name of the department, course, year, semester, ID, name, division, roll number, gender, date of birth, email, phone number, address, teacher, and whether a photo has been taken) are stored. Any user can modify this structure and use it as per their convenience.
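As a hedged sketch of the student-details table described above, the snippet below uses Python's built-in sqlite3 in place of the paper's MySQL/phpMyAdmin setup, so it runs without a server; the table and column names are our assumptions based on the fields listed in the text:

```python
# Sketch of the student-details schema, using sqlite3 instead of MySQL.
# Column names are illustrative assumptions derived from the fields in the text.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        dep TEXT, course TEXT, year TEXT, semester TEXT,
        student_id TEXT PRIMARY KEY, name TEXT, division TEXT,
        roll_no TEXT, gender TEXT, dob TEXT, email TEXT,
        phone TEXT, address TEXT, teacher TEXT,
        photo_sample TEXT  -- 'Yes' once photo samples have been taken
    )
""")
conn.execute(
    "INSERT INTO student VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
    ("CSE", "BE", "2021", "6", "S01", "Asha", "A",
     "17", "F", "2001-01-01", "asha@example.com",
     "9999999999", "Udupi", "Dr. Rao", "No"))
conn.commit()
row = conn.execute("SELECT name FROM student WHERE student_id = 'S01'").fetchone()
```

In the actual system the equivalent table would be created once in phpMyAdmin and accessed from Python through a MySQL connector.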

3.2 Setting Up the Coding Environment

We have installed Visual Studio Code on our Linux machine and used Firefox as the browser for database management. A directory has to be created to store all the photos the system takes during the dataset-generation process; this process is explained in detail in Sect. 3.4. The 'haarcascade_frontalface_default.xml' file should be downloaded and must be present in the working directory. We also have to create an empty file named 'classifier.xml', which stores all the generated dataset encodings.

Automatic Student Attendance and Activeness Monitoring System


Fig. 2 Main window of the proposed system

3.3 Home Window

Our entire GUI is designed and implemented using the 'TkInter' module of Python. The main window is used for navigation through the different functionalities of the system, which are implemented using the 'OpenCV' and 'face recognition' modules. Figure 2 shows the main window of our work. The GUI has a user-convenient navigation system.

3.4 Student Details Window

This window has functionalities for adding, deleting, and updating the student details in the database. The structure of this window is shown in Fig. 3a. First of all, we have to add every student or individual who is going to undergo the attendance-tracking procedure. Whenever a student detail is incorrect and

Fig. 3 a Student details window, b generating dataset


we have to change it, we can do that simply by using the update function. The delete function removes the information of a student. The button named 'take photo samples' has the functionality of generating the dataset. Once a student ID is selected and this button is clicked, the Webcam is turned on and starts taking the desired number of photo samples, as shown in Fig. 3b. The system is set to take 50 face photos (used in the training phase) and stores them in a directory named after the respective student ID.

3.5 Dataset Training Functionality

This functionality is used to train on the dataset (i.e. 50 face photos per student). We have used three different training algorithms (LBPH, CNN, and HOG) and designed the system such that face recognition can be done using any of them. Once the encoding is completed for each face, the programme writes the encoded data to the 'classifier.xml' file.
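The matching step that the trained 'classifier.xml' enables can be pictured as nearest-neighbour search over stored per-student histograms. The sketch below is a hedged illustration, not the OpenCV trainer the system uses; the chi-square distance and the toy four-bin histograms are our assumptions:

```python
# Illustrative nearest-neighbour matching over stored LBP histograms, standing
# in for the OpenCV LBPH recognizer used by the system. Distances and the toy
# data here are assumptions, not the paper's trained model.

def chi_square(h1, h2):
    """Chi-square distance between two histograms (smaller = more similar)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def recognize(unknown, trained):
    """Return the student ID whose stored histogram is closest to `unknown`."""
    return min(trained, key=lambda sid: chi_square(unknown, trained[sid]))

trained = {"S01": [4, 0, 2, 1], "S02": [0, 5, 0, 3]}
probe = [3, 1, 2, 1]
match = recognize(probe, trained)
```

In the real system each student's 50 training photos would contribute to the stored model, and the probe histogram would come from the live video frame.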

3.6 Continuous Monitoring of the Students

This functionality provides methods to monitor a student's activeness and attendance continuously and update them to the excel sheet and the database, respectively. When invoked, it turns on the Webcam, reads the video stream, and detects the faces in it. It then compares the encodings of the detected faces with the pre-encoded data in the 'classifier.xml' file; if a match is found, the database and excel sheet are updated. Alongside face recognition, the eye aspect ratio (EAR) [27, 28] is calculated to detect the activeness of the students. For each student, the system keeps a count variable that is initially set to zero. Whenever the EAR falls below 0.25 for 200 continuous frames, the system considers the student drowsy and increases the count variable by 1. If the count variable is zero, the system considers the student to be in an active state; if it is between 0 and 5, in a drowsy state; and if the student has been drowsy 5 or more times (count variable >= 5), in a sleepy state.
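The activeness bookkeeping above can be sketched directly in Python. The threshold (0.25), the 200-frame window, and the count cut-offs come from the text; the function names are our assumptions, and the computation of EAR itself from the six eye landmarks is taken as given:

```python
# Sketch of the per-student activeness bookkeeping described in the text.
# EAR values are assumed to arrive one per video frame.

EAR_THRESHOLD = 0.25     # below this the eye is treated as closed
DROWSY_FRAMES = 200      # consecutive frames that make one drowsy episode

def update_count(ear_stream):
    """Count drowsy episodes: EAR below threshold for 200 consecutive frames."""
    count, run = 0, 0
    for ear in ear_stream:
        if ear < EAR_THRESHOLD:
            run += 1
            if run == DROWSY_FRAMES:   # one episode per full 200-frame run
                count += 1
                run = 0
        else:
            run = 0
    return count

def activeness_state(count):
    """Map the episode count to the state labels used in the text."""
    if count == 0:
        return "active"
    return "sleepy" if count >= 5 else "drowsy"
```

For example, a student whose EAR stays below 0.25 for 1000 straight frames accumulates five episodes and is labelled sleepy.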

3.7 Saved Photo Collection

This functionality is used to view the dataset of each student taken during the dataset-generation process.


Fig. 4 Attendance management window

3.8 Attendance Window

This functionality is reached through the 'attendance' button in the main window, which redirects us to the window shown in Fig. 4. The import button imports all the details from a '.csv' file and displays them in the table on the right; after this, the data are easy to analyze. Using the export button, we can export all the displayed data to another '.csv' file, with the option to select the directory in which the data file should be stored. There are also reset and update buttons, which reset and update the attendance details, respectively.
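The import/export round trip described above is plain CSV handling. The sketch below uses Python's csv module; the column layout of the attendance file is an assumption, since the paper does not spell it out:

```python
# Hedged sketch of the attendance window's import/export buttons using the csv
# module; the id/name/status columns are assumed, not taken from the paper.
import csv
import io

def import_attendance(csv_text):
    """Parse attendance rows from CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def export_attendance(rows):
    """Serialize attendance rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

data = "id,name,status\nS01,Asha,Present\n"
rows = import_attendance(data)
```

In the GUI, `import_attendance` would read from the file the user picks and `export_attendance` would write to the chosen directory.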

3.9 Help and Support

This functionality displays our contact information, so any user of our work can resolve a query by contacting us. We have included our mail addresses and a hyperlink to a WhatsApp group for group discussion if wanted. There is also an exit button on the home window, used simply to exit the GUI.


4 Results and Discussion

Many of the existing systems described in the literature review are time consuming, sometimes require tremendous computational power, and focus only on attendance monitoring. The proposed system overcomes many of these drawbacks. In this research work, we have added a very effective and useful feature, drowsiness detection, and the system works even when the student is wearing spectacles. It is a very useful system for every institution that needs an effective, dynamic, and automated monitoring system (covering both attendance and drowsiness), and it can be implemented in any educational institution; to monitor the students, a surveillance camera has to be installed in every classroom. As shown in Fig. 5a, the proposed system detects as well as recognizes the faces of the students, displaying their ID, roll number, name, and department name, and it can detect multiple faces. The attendance of the students is updated in the database (in .csv format). The system also detects the activeness state (active, drowsy, or sleepy) of the students; in Fig. 5b, it has detected the drowsy state of a student. In this work, we are able to detect the activeness of the student even when he/she is wearing spectacles. Additional improvements based on the environment and the methods used can also be added to the system. This research work makes use of new technologies, which in turn help to perform attendance and activeness monitoring effectively. System efficiency for a dataset of 1500 photos (collected for 30 students) is displayed in Table 1. The testing was done on a personal laptop; the 'time taken' column represents the time taken by the system to train the dataset.

Fig. 5 a Face detected and recognized, b drowsiness detected

Table 1 Comparison of the algorithms

Algorithm   Accuracy (%)   CPU power (%)   Time taken (s)
LBPH        76.66          20              61.80
HOG         83.33          55              639.28
CNN         90.00          100             608.32


The accuracy of the LBPH algorithm is measured using formula (1). In our case, the total number of faces is T = 30, out of which the system recognized C = 23 faces correctly. Hence, the accuracy calculated according to Eq. (1) is A = (23/30) × 100 = 76.66%.

A = (C/T) × 100.    (1)

Similarly, we have calculated the accuracies of HOG and CNN, which are 83.33% and 90.00%, respectively. The CPU power values in Table 1 were read from the system monitor during the execution of each algorithm.
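Formula (1) is a one-line computation. As a quick check, the correct-recognition counts for HOG and CNN implied by the reported percentages (25/30 and 27/30, respectively) are assumptions back-solved from Table 1, not figures stated in the paper:

```python
# Accuracy formula (1): A = (C / T) * 100. The HOG (25) and CNN (27) counts are
# assumptions back-solved from the reported percentages.
def accuracy(correct, total):
    return correct / total * 100

lbph = accuracy(23, 30)   # about 76.67; the paper truncates to 76.66
hog = accuracy(25, 30)    # about 83.33
cnn = accuracy(27, 30)    # 90.00
```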

5 Conclusion

Our work has many applications packed into one place, and its simple framework makes the entire system easy to use. To the best of our knowledge, no system in the past literature monitors both the attendance and the activeness state of the student. In this research work, we have used three algorithms (the local binary pattern histogram algorithm, the histogram of oriented gradients algorithm, and the convolutional neural network algorithm), of which the CNN algorithm gave the highest accuracy (90.00%). Since our system is effective, dynamic, and automated, it is a good choice for any educational institution; at the user end, it is very easy to use and improves the task of student monitoring, converting a very time-consuming task into a simple, easy, and automated one. As we are in the pandemic era, where everybody is supposed to wear a face mask, in the future we can add a feature wherein the student's face is recognized even when he/she is wearing a mask.

References

1. Samet, R., Tanriverdi, M.: Face recognition-based mobile automatic classroom attendance management system. In: 2017 International Conference on Cyberworlds (CW), pp. 253–256. IEEE (Sep 2017)
2. Chowdhury, S., Nath, S., Dey, A., Das, A.: Development of an automatic class attendance system using CNN-based face recognition. In: 2020 Emerging Technology in Computing, Communication and Electronics (ETCCE), pp. 1–5. IEEE (Dec 2020)
3. Fung-Lung, L., Nycander-Barúa, M., Shiguihara-Juárez, P.: An image acquisition method for face recognition and implementation of an automatic attendance system for events. In: 2019 IEEE XXVI International Conference on Electronics, Electrical Engineering and Computing (INTERCON), pp. 1–4. IEEE (Aug 2019)
4. Helmi, R.A.A., bin Eddy Yusuf, S.S., Jamal, A., Abdullah, M.I.B.: Face recognition automatic class attendance system (FRACAS). In: 2019 IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS), pp. 50–55. IEEE (June 2019)


5. Mohammed, M.A., Zeebaree, D.Q., Abdulazeez, A.M., Zebari, D.A., Fadhil, Z.D., Ahmed, F.Y., Rashed, E.M.: Machine learning algorithm for developing classroom attendance management system based on Haar cascade frontal face. In: 2021 IEEE Symposium on Industrial Electronics and Applications (ISIEA), pp. 1–6. IEEE (July 2021)
6. Bhavana, D., Kumar, K.K., Kaushik, N., Lokesh, G., Harish, P., Mounisha, E., Tej, D.R.: Computer vision based classroom attendance management system with speech output using LBPH algorithm. Int. J. Speech Technol. 23(4), 779–787 (2020)
7. Shanthi, S., Nirmaladevi, K., Pyingkodi, M., Selvapandiyan, P.: Face recognition for automated attendance system using LBPH algorithm. J. Crit. Rev. 7(4), 942–949 (2020)
8. Abuzar, M., bin Ahmad, A., bin Ahmad, A.A.: A survey on student attendance system using face recognition. In: 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), pp. 1252–1257. IEEE (June 2020)
9. Akay, E.O., Canbek, K.O., Oniz, Y.: Automated student attendance system using face recognition. In: 2020 4th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), pp. 1–5. IEEE (Oct 2020)
10. Tamilkodi, R.: Automation system software assisting educational institutes for attendance, fee dues, report generation through email and mobile phone using face recognition. Wirel. Pers. Commun. 1–18 (2021)
11. Agarwal, L., Mukim, M., Sharma, H., Bhandari, A., Mishra, A.: Face recognition based smart and robust attendance monitoring using deep CNN. In: 2021 8th International Conference on Computing for Sustainable Global Development (INDIACom), pp. 699–704. IEEE (March 2021)
12. Sanivarapu, P.V.: Multi-face recognition using CNN for attendance system. In: Machine Learning for Predictive Analysis, pp. 313–320. Springer, Singapore (2021)
13. Derkar, P., Jha, J., Mohite, M., Borse, R.: Deep learning-based paperless attendance monitoring system. In: Advances in Signal and Data Processing, pp. 645–658. Springer, Singapore (2021)
14. Özdil, A., Özbilen, M.M.: A survey on comparison of face recognition algorithms. In: 2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–3. IEEE (Oct 2014)
15. Raskar, R.B.: XAMPP Installation, Configuration, php-mysql Connectivity on Web Technology (2020)
16. Faraj, K.H.A., Ahmed, K.H., Al Attar, T.N.A., Hameed, W.M., Kanbar, A.B.: Response time analysis for XAMPP server based on different versions of Linux operating system. Sci. J. Cihan Univ.-Sulaimaniya 4(2), 102–114 (2020)
17. Friends, A.: XAMPP Apache + MariaDB + PHP + Perl (2020)
18. Herath, M.H.M.N.D.: Unit-14 Python Graphical User Interface Development. Indira Gandhi National Open University, New Delhi (2021)
19. Interface, G.U.: Tkinter GUI
20. Moore, A.D.: Python GUI Programming with Tkinter: Develop Responsive and Powerful GUI Applications with Tkinter. Packt Publishing Ltd (2018)
21. Mridha, K., Yousef, N.T.: Study and analysis of implementing a smart attendance management system based on face recognition technique using OpenCV and machine learning. In: 2021 10th IEEE International Conference on Communication Systems and Network Technologies (CSNT), pp. 654–659. IEEE (June 2021)
22. Bussa, S., Mani, A., Bharuka, S., Kaushik, S.: Smart attendance system using OpenCV based on facial recognition. Int. J. Eng. Res. Technol. 9(03), 54–59 (2020)
23. Dalwadi, D., Mehta, Y., Macwan, N.: Face recognition-based attendance system using real-time computer vision algorithms. In: International Conference on Advanced Machine Learning Technologies and Applications, pp. 39–49. Springer, Singapore (Feb 2020)
24. Gupta, N., Sharma, P., Deep, V., Shukla, V.K.: Automated attendance system using OpenCV. In: 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), pp. 1226–1230. IEEE (June 2020)
25. Both, D.: Apache web server. In: Using and Administering Linux, vol. 3, pp. 215–234. Apress, Berkeley, CA (2020)


26. Jose, B., Abraham, S.: Performance analysis of NoSQL and relational databases with MongoDB and MySQL. Materials Today: Proceedings 24, 2036–2043 (2020)
27. Srivastava, S.: Driver's drowsiness identification using eye aspect ratio with adaptive thresholding. In: Recent Trends in Communication and Electronics, pp. 151–155. CRC Press (2021)
28. Maior, C.B.S., das Chagas Moura, M.J., Santana, J.M.M., Lins, I.D.: Real-time classification for autonomous drowsiness detection using eye aspect ratio. Expert Syst. Appl. 158, 113505 (2020)

Greenput Algorithm for Minimizing Power and Energy Consumption in Hybrid Wireless Sensor Networks

S. Beski Prabaharan and Saira Banu

Abstract We explore the issue of energy efficiency in wireless networks with transmission power control, given multiple connections to be carried inside a bounded network, and address the joint scheduling and power-control problem with the objective of minimizing the total transmission energy under the constraint that all transmissions are served. We propose a distributed throughput-optimal power-allocation algorithm for wireless sensor networks. The problem is difficult because the non-convexity of the underlying optimization denies an efficient solution even in a centralized setting. We observe that in a relay-based wireless network, throughput-based fairness severely limits the performance of relay users whose energy is constrained. To address this issue, we propose the notion of max–min energy-efficiency throughput fairness, under which a relay user's energy is utilized to obtain better throughput. We further propose an efficient data-rate allocation algorithm to achieve this fairness objective in a multi-hop relay wireless network. Finally, we propose a Greenput algorithm that satisfies the optimality conditions, thereby achieving (almost) 100% throughput.

1 Introduction

Resource allocation in multi-hop wireless networks involves joint routing, scheduling, and power-distribution problems that are difficult to solve together [1]. Because of this, most current work considers a simplified configuration in which all nodes in the network use a constant transmission power, reducing the problem to link scheduling. In recent years, the prevalence of mobile devices, including iPads, iPhones, and Android smartphones, as well as other Internet applications, such as

S. Beski Prabaharan (B) School of Computer Science and IT, Jain Deemed-To-Be University, Bangalore, India
S. Banu Department of Computer Science and Engineering, School of Engineering, Presidency University, Yelahanka, Bangalore, Karnataka, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_37


S. Beski Prabaharan and S. Banu

Web search, Facebook, and YouTube, has greatly increased the demand on wireless networks [2]. Accordingly, wireless networks must operate efficiently within the available radio spectrum, and the scarcity of these resources motivates many resource-management schemes, especially those concerned with fairness. In such settings, the relay users of a multi-hop network may be penalized by low-rate links that limit client throughput, because more than one channel competes for access over time. Time-based fairness [2] allows each client a fair share of transmission time, and has been extended to the relay context; however, low-rate links may still dominate performance [3]. To address this problem, energy-saving transmission schemes have been developed; these must further take into account that the energy consumed by a schedule determines what a reasonable rate allocation is. We focus on this issue: for relay-based fairness, we consider a fairness strategy that depends on energy utilization. In particular, we propose max–min energy-efficiency throughput fairness, which guarantees the max–min throughput per unit of energy. This is fair in the sense that a user is allocated a higher data rate if it consumes more energy per unit time. An important goal of the allocation is to maximize the average achievable rate under the available power budget [4]. Energy efficiency matters both to operators and to end users. From the operator's point of view, energy efficiency lowers operating costs and contributes to reducing the carbon footprint. From the end user's point of view, the energy saved extends battery lifetime, so energy must be utilized properly to avoid outages [5].
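The max–min fairness notion the paper builds on can be computed by classic progressive filling. The sketch below is a hedged illustration of plain max–min fair rate allocation (not the paper's energy-efficiency variant); the capacity and demand figures are illustrative assumptions:

```python
# Progressive-filling sketch of max-min fair allocation: repeatedly split the
# remaining capacity equally among users whose demand is not yet satisfied.
# Capacities/demands are illustrative; the paper's variant weights by energy.

def max_min_fair(capacity, demands):
    """Allocate `capacity` so no user can gain without hurting a smaller
    allocation; no user receives more than it demands."""
    alloc = {u: 0.0 for u in demands}
    unsatisfied = set(demands)
    remaining = capacity
    while unsatisfied and remaining > 1e-12:
        share = remaining / len(unsatisfied)
        for u in list(unsatisfied):
            give = min(share, demands[u] - alloc[u])
            alloc[u] += give
            remaining -= give
            if alloc[u] >= demands[u]:
                unsatisfied.remove(u)
    return alloc

alloc = max_min_fair(10, {"a": 2, "b": 8, "c": 10})
```

With capacity 10 and demands 2, 8, and 10, user "a" is fully satisfied at 2 and the leftover is split evenly, giving "b" and "c" 4 each; the paper's proposal replaces the equal split with one weighted by energy consumed per unit time.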

2 Related Works

In the past, extensive research has been conducted in the wireless LAN area, covering energy-aware transmission, interference mitigation, MAC protocols, energy saving, and channel assignment. For example, several articles address the channel-assignment problem and propose different models in which the channel plays a prominent role. One common idea is to use a power controller that lowers the transmission power before sending, in order to obtain additional energy savings [6]. Most power-control work, however, targets cellular and ad hoc settings and does not address multi-channel or hidden-channel issues in its calculations. A selection of the main studies and their key ideas is presented as follows:


Wong et al. focused on how network topology, channel settings, and packet lengths determine the ideal power level that ensures minimal energy consumption when transmitting data elements. Building on this, the authors proposed, for a given packet length, an additional power-control protocol for ad hoc networks using two-state Markov channel models; they then presented and discussed the significant energy savings achieved by their schemes [7]. Manikandan et al. give three different power-control systems within the IEEE 802.11 standard, applied when interference drops in a cellular-like organization. They partition the network into cells for their calculations and consider the buffering behaviour of data across different spatial situations as well as over time. The results obtained show that the proposed power-management system achieves a significant reduction in the level of contention [8]. Jiah Hu et al. describe a power-control technique that operates close to the noise floor and copes with heavy fading in the channel; their paper gives a routing system with power control for IEEE 802.11 ad hoc networks [9]. The developers measured real transmission-power control parameters using 802.11b cards, for example CISCO Aironet 350, with mobile connections at various power levels; in this scheme, energy efficiency is achieved by managing information directly through a versatile, portable, frequently used application [10]. Related work on energy-capability optimization problems includes, but is not limited to, [11]. Runga et al. research energy efficiency for links consisting of a transmitter-receiver pair, jointly determining energy and performance, and incorporating both the power transmitted and the power consumed in the hardware; such schemes usually do not extend to capturing bandwidth. In our article, we deal with a similar problem, but we consider the multi-cell situation, which is more challenging [12].


3 Flow Control and Representations

In this article, we use flow controllers that can deterministically bound the queues in the network. We combine the Lyapunov optimization method with the proposed scheduling mechanism to construct routing, scheduling, and optimized flow control for end-user wireless sensor networks. The network utility maximization problem is usually based on the assumption that all buffered data have infinite lifetime. The proposed algorithms instead work with finite buffers, assuming static routing and traffic rates within the transit zone. In this paper, we consider a general traffic arrival rate, inside or outside the transit region, where network lifetime is limited by buffered input, and dynamic routing is selected to achieve the maximum possible network throughput. The contributions of the proposed system can be listed as follows:

• We use flow-control algorithms with high network utilization and deterministically bounded latencies for all buffered regions in the network. In addition, internal buffer sizes are obtained from overflow results.
• We show that there is a trade-off between buffer size and achievable utility.
• We show that the queueing delay at the input can be used for monitoring flow control, although it affects the queue backlog.
• We show by simulation that the control algorithm works very well in the congested-traffic regime. In particular, it achieves near-optimal utility with very low delay, bounded by the ready-queue inputs.

Energy efficiency in wireless sensor networks is well motivated, and there is huge interest in studying balancing mechanisms that achieve energy efficiency at each protocol level. In this paper, time delay is also accounted for through the network-lifetime parameters. We propose a Greenput algorithm that selects a dynamic frame in each network calibration for a given power distribution. Latency and energy efficiency are the two major factors, and our proposed Greenput algorithm is used to reduce the transmission power and save energy, penalizing excess packets that would shorten network lifetime (Fig. 1).

Fig. 1 Closeup of nodes 1 and 2



Fig. 2 Kernel space support vector framework in WSN

The optimal route in the forward direction (link (1, 2)) is shown in green, and the optimal route in the reverse direction (link (2, 1)) in blue. Network simulations show that the same performance can be achieved at minimum cost, minimizing the power consumed by the network. The signal-to-noise ratio is calculated and the performance metrics are also measured. Energy management is implemented with linear operations formulated in the network space; the WSN kernel space is shown in Fig. 2.

4 Greenput Algorithm

In this article, bandwidth, lifetime, energy consumption, and throughput issues are considered, and Greenput algorithms are designed to improve the energy efficiency of networks. Tassiulas and Ephremides introduced the stability region, used here in the WSN kernel space: the set of all arrival-rate vectors that can be continuously sustained, together with a single routing and scheduling policy that achieves 100% throughput and stabilizes the network whenever the arrival rate is within the stability region. Low-latency design in wireless networks allows the network bandwidth to be used maximally, and handling it is an important task in both network theory and simulation. In this paper, following Tassiulas and Ephremides, we propose a Greenput routing and scheduling algorithm for achieving maximum network throughput along with more efficient network-management algorithms (Fig. 3).


Fig. 3 Granular space representation of WSN nodes

Algorithm: Greenput

Input: delay vector in the n-th frame, Y(n) = (y1(n), y2(n), …, yN(n))
Output: n-th frame length Tn

1: (Queue empty) If no queue has a delay in the n-th frame, set Tn = Tmin.
2: Otherwise, run Step 5.
3: (Power-saving mode) Compute the TDMA schedule {ti, i = 1, 2, …, N}; set Tn = Σ_{i=1}^{N} ti = Tmax and output the TDMA frame.
4: Otherwise, solve the minimum-time scheduling problem.
5: (Optimal power mode) Take the minimum-time schedule {t_g, g ∈ H} with Σ_{g∈H} t_g ≥ Tmax; set Tn = Σ_{g∈H} t_g and use the minimum-time schedule for the queue.
6: Otherwise, split the minimum-time graph into two sub-kernel spaces: the subspace whose groups have size greater than 1, and the subspace whose groups have size equal to 1.
7: (Mixed power-saving mode) Schedule the groups of size greater than 1 first, then place the remaining yi(n) in the TDMA scheme of Algorithm 2, using ỹi(n), i = 1, 2, …, n in (24) and T̃max in (30) as its inputs. Let Tn be the sum of the frame lengths associated with groups of size greater than 1 plus the length of the new TDMA sub-frames.
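The frame-length decision at the heart of the algorithm (Steps 1, 3, and 5) can be sketched as follows. This is a heavily hedged illustration: the paper's minimum-time scheduling subproblem is abstracted away as given per-node slot demands, and the Tmin/Tmax values are illustrative assumptions:

```python
# Hedged sketch of the Greenput frame-length decision. The minimum-time
# scheduling subproblem is abstracted as the per-node slot demands `slots`;
# T_MIN and T_MAX are illustrative, not values from the paper.

T_MIN, T_MAX = 4, 16

def frame_length(slots):
    """Pick the TDMA frame length for one Greenput frame.

    slots -- per-node slot demands for this frame (empty queues -> all zero).
    """
    total = sum(slots)
    if total == 0:           # Step 1: no backlog, use the shortest frame
        return T_MIN
    if total <= T_MAX:       # Step 3: power-saving mode, pad up to Tmax
        return T_MAX
    return total             # Step 5: the minimum-time schedule sets the frame

n = frame_length([1, 2, 3])  # light load: power-saving mode applies
```

The point of the padding in the middle branch is that a longer frame under light load lets nodes transmit at lower power, which is where the energy saving comes from.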

A multi-mode scheme selects the optimal transmission-power distribution for access points (APs) to minimize the overall interference between them, while ensuring that each user's signal exceeds a pre-set threshold. Three transmission power levels are considered, namely minimum, medium, and maximum. For users taking


Table 1 Result of energy and power consumption status using Greenput

Charact  Areas  CON M (N/SL)  CON SD (N/SL)  MIG M (N/SL)  MIG SD (N/SL)  ABU M (N/SL)  ABU SD (N/SL)
C        90     1.075/1.069   0.024/0.028    1.08/1.069    0.012/0.026    1.076/1.057   0.013/0.02
C        116    1.065/1.084   0.02/0.027     1.07/1.09     0.012/0.02     1.066/1.074   0.011/0.021
L        90     1.007/1.044   0.001/0.008    1.007/1.048   0.001/0.007    1.007/1.045   0.001/0.007
L        116    1.004/1.03    0.001/0.005    1.004/1.032   0.001/0.005    1.004/1.031   0.001/0.004
W        90     0.995/0.807   0.007/0.018    0.997/0.804   0.001/0.017    0.995/0.8     0.002/0.017
W        116    0.996/0.834   0.004/0.016    0.997/0.828   0.001/0.015    0.9962/0.822  0.002/0.016

over coverage from both access points, performance is measured with the two APs at the same power level: the combinations min–min, min–mid, mid–max, and max–max give similar speeds at all positions (Table 1 and Fig. 4). Time is divided into equal slots, large enough for one packet transmission, and the slots are grouped into frames. Some TDMA schemes focus on minimizing the frame size under the constraint that each node or feedback loop is assigned at least one slot; in this work, however, the frame length is determined by the one-hop delay. Each node generates a random number of fixed-length packets to be transmitted in a TDMA frame; this constitutes a request to send data. Packets that have not been transmitted within a certain period of time will

Fig. 4 Simulation results of Greenput algorithm using MATLAB


be reactivated. A pass-through data packet (such as one from sensor g located in the pool) is forwarded in each TDMA frame to the next node location; in the following TDMA frame, this is treated as a transmission request from that node's area to the receiver.

5 Conclusion

In this article, we have examined throughput optimization under energy-consumption limits in relay-based wireless networks. With large user transfer capacity, overall system throughput improves under throughput-based efficiency; the end user, however, is at a disadvantage due to limited power. We therefore propose the max–min form of energy-efficiency throughput fairness. The task of optimizing the data transfer rate to implement the proposed fairness objective is non-linear. We developed the Greenput algorithm to decompose the data-rate allocation in order to achieve max–min energy efficiency, bandwidth, and fairness using a numerical approach, and we confirm the benefits of the proposed fairness policies through extensive simulation.

References

1. Manikandan, S., Chinnadurai, M.: Effective energy adaptive and consumption in wireless sensor network using distributed source coding and sampling techniques. Wirel. Pers. Commun. (2021). https://doi.org/10.1007/s11277-021-08081-3
2. Davaslioglu, K., Ayanoglu, E.: Quantifying potential energy efficiency gain in green cellular wireless networks. IEEE Commun. Surv. Tuts. 16(4), 2065–2091 (2019)
3. Lin, X., Shroff, N.B., Srikant, R.: A tutorial on cross-layer optimization in wireless networks. IEEE J. Select. Areas Commun. 24(8), 1452–1463 (2016)
4. Sharma, G., Shroff, N.B., Mazumdar, R.R.: On the complexity of scheduling in wireless networks. In: ACM MOBICOM, Los Angeles, CA (2016)
5. Modiano, E., Shah, D., Zussman, G.: Maximizing throughput in wireless networks via gossiping. In: ACM SIGMETRICS/Performance, Saint-Malo, France (2016)
6. Eryilmaz, A., Ozdaglar, A., Modiano, E.: Polynomial complexity algorithms for full utilization of multi-hop wireless networks. In: IEEE INFOCOM, Anchorage, AK (2017)
7. Sanghavi, S., Bui, L., Srikant, R.: Distributed link scheduling with constant overhead. In: ACM SIGMETRICS (2017)
8. Gupta, A., Lin, X., Srikant, R.: Low-complexity distributed scheduling algorithms for wireless networks. In: IEEE INFOCOM, Anchorage, AK (2017)
9. Manikandan, S., Raju, K., Lavanya, R., Gokila, R.G.: Web enabled data warehouse answer with application. Appl. Sci. Rep. 21(3), 84–87 (2018). https://doi.org/10.15192/PSCP.ASR.2018.21.3.8487
10. Feng, D., Jiang, C., Lim, G., Cimini, L.J., Feng, G., Li, G.: A survey of energy-efficient wireless communications. IEEE Commun. Surv. Tuts. 15(1), 167–178 (2013)
11. Tan, G., Guttag, J.: Time-based fairness improves performance in multi-rate wireless LANs. In: USENIX Annual Technical Conference (2014)
12. Ebert, J.P., Stremmel, B., Wiederhold, E., Wolisz, A.: An energy-efficient power control approach for WLANs. J. Commun. Netw. 197–206 (2015)

Greenput Algorithm for Minimizing Power and Energy Consumption …

425

13. Elena, L.A., Casademont, J.: Transmit power control mechanisms in IEEE 802.11. In: International Conference on Communications and Mobile Computing, pp. 731–736. ACM, New York, NY, USA (2016)
14. Sheth, R.H.: An Implementation of Transmit Power Control in 802.11b Wireless Networks. University of Colorado, Boulder (2012)
15. Manikandan, S., Dhanalakshmi, P., Priya, S., Mary Odilya Teena, A.: Intelligent and deep learning collaborative method for E-learning educational platform using TensorFlow. Turk. J. Comput. Math. Educ. 12(10), 2669–2676 (2021)

Cloud-Based Face and Face Mask Detection System V. Muthumanikandan, Prashant Singh, and Rithwik Chithreddy

Abstract Security has always been an essential concern in both residential and official buildings. A home security system offers several benefits beyond keeping owners and their property safe from intruders. Home automation has become such an important part of our lives that nearly every house now contains at least three smart devices. By scheduling tasks or learning our routines, these smart devices make our lives easier by offloading those tasks onto the devices. In this project, the aim is to design and implement a security system with facial detection that is implemented on the cloud, so that the cost of the system can be reduced and its efficiency improved. The system comprises several subsystems, including SMS notification, face detection, and face mask detection.

1 Introduction

Recognition refers to technologies that can identify or verify the identity of subjects in images and videos. The first facial recognition algorithms were developed in the early seventies. Since then, their accuracy has improved to the point that face recognition is often preferred over other biometric techniques traditionally regarded as reliable, such as fingerprint recognition or iris recognition. One of the key features that makes face recognition more attractive than other biometric methods is its simple, non-intrusive nature.

V. Muthumanikandan (B) School of Computer Science and Engineering (SCOPE), Vellore Institute of Technology, Chennai, India e-mail: [email protected] P. Singh · R. Chithreddy School of Electronics Engineering (SENSE), Vellore Institute of Technology, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_38

1.1 Overview

Over the years, face recognition methods have undergone significant changes. Traditional methods used hand-crafted features, such as edge and texture descriptors, in combination with machine learning techniques such as principal component analysis (PCA), linear discriminant analysis (LDA), and support vector machines (SVMs). The difficulty of engineering features that were robust to the different variations occurring in unconstrained environments made researchers focus on specialized techniques for each type of variation, such as age-invariant methods and pose-invariant methods.

1.2 Conventional Methods

Geometry-Based Methods—Kelly's and Kanade's doctoral theses at the beginning of the seventies are regarded as the first scientific works on automatic face recognition. They proposed using specialized edge and contour detectors to determine the locations of facial features, and then saving the relative positions of, and distances between, those features. Later authors showed that matching image components against a gradient template gives higher recognition accuracy than comparing geometric properties; however, the geometry-based method is faster and requires less memory.

Holistic Methods—Most of these methods work by projecting face images onto a low-dimensional space that removes or hides the non-essential details and variations not needed for tasks like recognition. One of the best-known approaches applies principal component analysis (PCA) to a set of training images to find the eigenvectors that capture the most variance in the dataset.
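The PCA idea behind eigenfaces can be shown in miniature. The sketch below is illustrative only: it uses toy "images" flattened to two pixels and finds the first principal component by power iteration, whereas real eigenface systems do the same computation on vectors with thousands of pixels.

```python
def dominant_eigenvector(cov, iters=200):
    """Power iteration: returns the unit eigenvector with the largest
    eigenvalue of a 2x2 covariance matrix, i.e. the first principal component."""
    v = [1.0, 0.0]
    for _ in range(iters):
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = [w[0] / norm, w[1] / norm]
    return v

# Toy "face images" flattened to 2 pixels each (invented values).
data = [(2.0, 1.9), (0.5, 0.6), (1.5, 1.4), (1.0, 1.1)]
mean = [sum(p[i] for p in data) / len(data) for i in (0, 1)]
centered = [(x - mean[0], y - mean[1]) for x, y in data]
cov = [[sum(a[i] * a[j] for a in centered) / len(data) for j in (0, 1)]
       for i in (0, 1)]
pc1 = dominant_eigenvector(cov)  # the direction of greatest variance
```

Because the two "pixels" are strongly correlated in the toy data, the first principal component points along the diagonal; projecting each image onto a few such components is exactly the dimensionality reduction the holistic methods rely on.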

1.3 Related Works

A face detection and recognition system based on Raspberry Pi, using Haar detection and PCA with the eigenface algorithm, where Python was used to run the hardware setup, is presented in [1]. Smart home systems can be connected to a face recognition security system using Raspberry Pi; eigenfaces were used for feature extraction along with a PCA classifier in [2]. A recognition technique in which the image is captured, unwanted parts are eliminated, the alignment is set to be perpendicular, and feature extraction is applied is described in [3]. The use of cloud computing and edge-based AI concepts, which can benefit areas such as face recognition in a secure environment, was presented in [4]. An intelligent security system was created using image processing for face identification, where only the

facial part of the picture, which is required for feature extraction, is taken and sent to a database, as described in detail in [5]. An IoT-based surveillance system that can be used in buildings, houses, or marketplaces using RPi and Wi-Fi technology, whose important aspects are live video streaming and remote alerts, is presented in [6]. A system using Microsoft Kinect sensor technology and Azure cloud technology for facial recognition, with possible areas of usage including home automation, security, and the automotive industry, was presented in [7]. A comparison of various algorithms used for facial detection and recognition based on factors such as eigenvalues and vector scores is presented in [8]. A transient plane was proposed to handle failures occurring during transmission [9]. Building a robust architecture using artificial intelligence and the Internet of Things to implement a more advanced security system is presented in [10]. To recover from failures, a fast rerouting technique was proposed in [11].

2 Proposed System

2.1 Methodology

The project is mainly divided into the following subsections:

Face Detection Using Microsoft Face—The main component of this project is deploying a Node-RED flow that uses a variety of services, such as Microsoft Azure, Twilio, and IP Webcam, to detect whether any person is present at the doorstep. The step-by-step flow of the program is as follows:

• The PIR sensor/push button triggers image capturing.
• The IP camera captures the image, which is saved in the local directory.
• A Wi-Fi router connects the Webcam to the Raspberry Pi and the Raspberry Pi to the Internet.
• Using the Microsoft Face API, the captured image is tested for face detection.
• Node-RED links all the services: reading the image, processing it via Microsoft Face on Microsoft Azure, and Twilio for the message service.
• The result is messaged to the user via the Twilio service.

Mask Detection Using CNN Model—Transfer learning has been used to train the CNN model. Transfer learning is based on using pre-trained CNN models from the libraries and building a different CNN model on top of them. By doing transfer learning, the need to train all the layers is eliminated: the bottom layers of the model are frozen, and layers suited to the output condition are trained and added on top. The VGG16 model is used as the base for training the custom CNN model. Additionally, the ImageDataGenerator class can be used for augmenting the dataset to give better

Fig. 1 Block diagram

results. The ImageDataGenerator randomly rotates, rescales, and horizontally flips the training data for each epoch, thereby effectively creating more data. For deploying the project, Google Colab is used. Keras and OpenCV are mainly used for training, importing, and saving the models. NumPy is used for handling array data. Matplotlib is used for plotting various graphs and performance analyses. Sklearn metrics is used to display the confusion matrix. The block diagram of the proposed system is given in Fig. 1.

2.2 Applications/Advantages

This system can be installed at doorways in homes, where it will continuously monitor activity at the doorstep. It will also check whether a person is wearing a mask and accordingly alert both the user and the person. For face detection, Microsoft Azure Face is used; it is very fast and is among the few cloud-based face detection services still available, as many such services are being deprecated. For face mask detection, the use of transfer learning along with data augmentation harnesses the true capabilities of CNN models without requiring a large dataset.

3 Simulations/Implementation Results

3.1 Results with Description

Three test cases, or message outputs, can be obtained while executing the Node-RED flow:

Camera Not Working: the Node-RED flow is triggered by the PIR, but the camera is not able to capture images and save them on the Raspberry Pi.

Some Activity Detected: the PIR triggers the Node-RED flow and the camera captures an image successfully, but no person is detected in the image.

Someone Arrived at Your Doorstep: a person is successfully detected in the image using Microsoft Azure Face.
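The three cases above amount to a two-flag decision; a minimal sketch of that logic (the function name and message strings are illustrative stand-ins for the actual Node-RED function node):

```python
def doorstep_message(image_captured: bool, face_detected: bool) -> str:
    """Map the two status flags produced by the flow to the SMS text."""
    if not image_captured:
        return "Camera Not Working"        # PIR fired but no image was saved
    if not face_detected:
        return "Some Activity Detected"    # image saved, but no face in it
    return "Someone Arrived at Your Doorstep"

print(doorstep_message(True, True))
```

In the real flow, `image_captured` corresponds to the file-read node succeeding and `face_detected` to a non-empty response from the face-detection service.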

3.2 CNN Model

The pre-trained VGG16 model has been used: its base layers are frozen, and the custom layers are added and trained on top of it. The model is trained for 10 epochs. The training versus validation accuracy and training versus validation loss for different epochs are displayed in the following figures (Figs. 2 and 3). Training loss is obtained while training the model on the training dataset, and validation loss is obtained while testing the model on the validation dataset; the model should have a low validation loss. The results for training and validation loss are shown in Fig. 2. Similarly, the accuracy obtained while training the model on the training dataset is the training accuracy, while the accuracy obtained while testing the

Fig. 2 Training loss and validation loss for different epochs

Fig. 3 Training accuracy and validation accuracy for different epochs

Table 1 Classification report of the model

              Precision   Recall   F1-score   Support
Mask          1.00        0.95     0.98       43
No mask       0.93        1.00     0.96       26
Accuracy                           0.97       69
Macro avg     0.96        0.98     0.97       69
Weighted avg  0.97        0.97     0.97       69

model with the validation dataset is the validation accuracy. The results obtained are shown in Fig. 3. Table 1 shows the classification report, which gives the precision, recall, F1-score, and support of the final CNN model.
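The entries in Table 1 can be reproduced from the underlying confusion counts. The sketch below is illustrative: the exact counts (41 correct mask, 2 mask-as-no-mask, 0 no-mask-as-mask, 26 correct no-mask) are inferred from the reported support and recall values, not stated in the paper.

```python
# Inferred confusion counts (support: 43 mask images, 26 no-mask images).
tp_mask, fn_mask = 41, 2      # mask images: correct / predicted as no-mask
tp_nomask, fn_nomask = 26, 0  # no-mask images: correct / predicted as mask

prec_mask = tp_mask / (tp_mask + fn_nomask)      # of all predicted "mask"
rec_mask = tp_mask / (tp_mask + fn_mask)         # of all true "mask"
prec_nomask = tp_nomask / (tp_nomask + fn_mask)  # of all predicted "no mask"
rec_nomask = tp_nomask / (tp_nomask + fn_nomask)
f1 = lambda p, r: 2 * p * r / (p + r)            # harmonic mean
accuracy = (tp_mask + tp_nomask) / 69

print(round(prec_mask, 2), round(rec_mask, 2), round(f1(prec_mask, rec_mask), 2))
print(round(prec_nomask, 2), round(rec_nomask, 2), round(f1(prec_nomask, rec_nomask), 2))
print(round(accuracy, 2))
```

Rounded to two decimals, these values match the Mask and No mask rows and the overall accuracy reported in Table 1.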

4 Conclusion and Future Works

4.1 Conclusion

The results obtained are satisfactory. Node-RED provided accurate results as planned from the start, and the flows functioned as intended. In terms of face mask detection, the proposed system outperformed some of the pre-existing models; the precision and accuracy are around 97%, which is a strong score.

In this project, the different capabilities of the Node-RED programming tool are clearly demonstrated, especially the ease with which different services can be connected in a single program. The graphical UI made it easy to visualize the program flow. The transfer learning used to train the CNN model shows the true capabilities of deep learning models for image identification.

4.2 Future Works

Face recognition technology, if paired with artificial intelligence (AI) and deep learning (DL), can prove beneficial for many industries, such as airports, mobile phone manufacturers, law enforcement agencies, and home appliance manufacturers. Face recognition technologies can be used to monitor and prevent violence and crime. Mobile phone manufacturers can use face recognition for biometric device security to create a more secure environment. The use cases of face detection and recognition are therefore countless across applications in security, user interactivity, and automation.

References

1. Gupta, I., Patil, V., Kadam, S.: Face detection and recognition using Raspberry Pi (2016)
2. Gunawan, T., Gani, M.H.H., Rahman, F.D.A., Kartiwi, M.: Development of face recognition on Raspberry Pi for security enhancement of smart home systems. Indonesian J. Electr. Eng. Inf. 5, 317–325 (2017). https://doi.org/10.11591/ijeei.v5i4.361
3. Akshay, A.P., Vrushsen, P.P.: Face recognition system (FRS) on cloud computing for user authentication (2013)
4. Zeng, J., Li, C., Zhang, L.J.: A face recognition system based on cloud computing and AI edge for IoT. In: Liu, S., Tekinerdogan, B., Aoyama, M., Zhang, L.J. (eds.) Edge Computing—EDGE 2018. Lecture Notes in Computer Science, vol. 10973. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94340-4_7
5. Balla, P.B., Jadhao, K.T.: IoT based facial recognition security system. In: 2018 International Conference on Smart City and Emerging Technology (ICSCET), pp. 1–4. Mumbai (2018). https://doi.org/10.1109/ICSCET.2018.8537344
6. Sruthy, S., George, S.N.: WiFi-enabled home security surveillance system using Raspberry Pi and IoT module. In: 2017 IEEE International Conference on Signal Processing, Informatics, Communication, and Energy Systems (SPICES), pp. 1–6. Kollam (2017). https://doi.org/10.1109/SPICES.2017.8091320
7. Dobrea, D., Maxim, D., Ceparu, S.: A face recognition system based on a Kinect sensor and Windows Azure cloud technology. In: International Symposium on Signals, Circuits and Systems ISSCS 2013, pp. 1–4. Iasi, Romania (2013). https://doi.org/10.1109/ISSCS.2013.6651227
8. Chawla, D., Trivedi, M.C.: A comparative study on face detection techniques for security surveillance. Adv. Intell. Syst. Comput. 531–541 (2017). https://doi.org/10.1007/978-981-10-3773-3_52
9. Vanamoorthy, M., Chinnaiah, V.: Congestion-free transient plane (CFTP) using bandwidth sharing during link failures in SDN. Comput. J. 63(6), 832–843 (2020). https://doi.org/10.1093/comjnl/bxz137

10. Verma, R.K., Singh, P., Panigrahi, C.R., Pati, B.: ISS: intelligent security system using facial recognition. In: Panigrahi, C.R., Pati, B., Mohapatra, P., Buyya, R., Li, K.C. (eds.) Progress in Advanced Computing and Intelligent Engineering. Advances in Intelligent Systems and Computing, vol. 1198. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-6584-7_10
11. Muthumanikandan, V., Valliyammai, C.: Link failure recovery using shortest path fast rerouting technique in SDN. Wirel. Pers. Commun. 97, 2475–2495 (2017). https://doi.org/10.1007/s11277-017-4618-0

Design of Delay-Efficient Carry-Save Multiplier by Structural Decomposition of Conventional Carry-Save Multiplier M. Venkata Subbaiah and G. Umamaheswara Reddy

Abstract Multiplication is an essential operation in many signal and image processing applications. In this paper, alternative structures are proposed for binary multiplication using the carry-save addition process by structural decomposition of the conventional carry-save multiplier (CSM). The proposed structures are helpful to achieve parallel computation by reducing the number of levels required to reach the ripple carry adder (RCA) stage. The 8-bit and 16-bit multipliers are designed and coded in Verilog HDL. The functional verification and synthesis of the circuits are done in Xilinx Vivado 2017.2 on target device ‘xc7z010clg400-1’ of the Zynq 7000 family as well as on ‘xc7s50fgga484-1’ of the Spartan 7 family. Further, the performance of the circuits is compared with respect to the number of LUTs and critical path delay.

1 Introduction

The multiplier is one of the essential building blocks in signal and image processing applications such as convolution, filtering, the discrete cosine transform (DCT), the discrete Fourier transform (DFT), and the fast Fourier transform (FFT) [1–4]. Multipliers of large bit size are used in public-key cryptosystems such as elliptic curve cryptography (ECC) and RSA [5]. The performance of the above-stated applications is greatly influenced by the performance of the multiplier. The multiplication of two binary numbers is achieved in three steps: generation of the partial products, their accumulation, and generation of the final product by an RCA. Over the years, many techniques have been proposed for the multiplication operation [6–12]; a few recent works are listed as follows. Pramod et al. [13] proposed an efficient signed CSM where the full adder (FA) was replaced by an improved FA in the partial product reduction tree and a modified square root carry select adder was employed in place of the RCA. A low-power and low-area array multiplier with

M. Venkata Subbaiah (B) · G. Umamaheswara Reddy
Department of Electronics and Communication Engineering, S. V. University College of Engineering, Sri Venkateswara University, Tirupati, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_39

a carry-save adder is proposed in [14], where the RCA was removed by feeding the carry bits from the current bit position of the last stage of the partial product reduction tree to the next left column. In [15], a multiplier based on a hybrid parallel adder is proposed to improve the speed of multiplication. A power-efficient Wallace tree multiplier (WTM) is proposed in [16], based on a power-efficient 7:3 counter built from multiplexer and EX-OR gates. In [17], an approach is proposed to design a high-speed error-free multiplier in which a 15:4 counter was constructed using a new 5:3 counter. Although the WTM significantly reduces the number of levels in the computation, its structure is irregular. In this paper, two alternative structures are proposed for binary multiplication by structurally decomposing the conventional CSM. The proposed structures help reduce the number of levels required to reach the RCA stage. With the proposed structures, 8-bit and 16-bit multipliers are designed; unlike the WTM, the proposed multipliers are regular in structure. The remaining sections of the paper are sequenced as follows. Section 2 describes the conventional CSM, and the modified CSM is discussed in Sect. 3. The performance of several multipliers is compared in Sect. 4, and Sect. 5 concludes the paper.

2 Conventional Carry-Save Multiplier

The 8-bit multiplication [18, 19] with multiplicand A = A7 A6 A5 … A1 A0 and multiplier B = B7 B6 B5 … B1 B0 is shown in Fig. 1. The partial products are obtained by multiplying the multiplicand A with each bit of the multiplier B starting from the least significant bit (LSB); they are then added together to produce the final product P

Fig. 1 8-bit binary multiplication [18, 19]

Fig. 2 Logic diagram of 8-bit conventional carry-save multiplier [20]

= P15 P14 P13 … P1 P0. Pij represents the partial product bits, given by Pij = Ai.Bj, where i = 0 to 7 and j = 0 to 7. Figure 2 shows the logic diagram of the 8-bit conventional CSM [20]. In the multiplication operation, the carry produced in the ith bit position in the jth step is applied to the (i + 1)th bit position in the (j + 1)th step. To get the final product, the circuit requires seven levels of carry-save addition and an RCA stage. The worst-case delay (WCD) path [21] of the circuit is highlighted with the half adder (HA) and FAs (yellow-colored). The WCD [13] is given by Eq. (1):

τWorst-Case = τAND + τHA-Cout + 6τFA-Sum + 7τFA-Cout   (1)

where τWorst-Case is the WCD of the circuit, τAND is the time delay (TD) introduced by the 2-input AND gate from its inputs to its output, τHA-Cout is the TD introduced by the HA from its inputs to the output carry 'Cout', τFA-Sum is the TD introduced by the FA from its inputs to the output 'Sum', and τFA-Cout is the TD introduced by the FA from its inputs to the output carry 'Cout'. Similarly, for the 16-bit multiplier, the WCD is given by Eq. (2):

τWorst-Case = τAND + τHA-Cout + 14τFA-Sum + 15τFA-Cout   (2)

For an N-bit multiplier, the WCD can be generalized as Eq. (3):

τWorst-Case = τAND + τHA-Cout + (N − 2)τFA-Sum + (N − 1)τFA-Cout   (3)

The circuit uses two adder circuits, half adder and full adder, to perform the addition of partial products.
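The partial-product generation, carry-save reduction, and final RCA stage described above can be modelled behaviourally. The sketch below is illustrative (Python rather than the paper's Verilog, with `csm_multiply` as an invented name), not the synthesized circuit:

```python
def csm_multiply(a: int, b: int, n: int = 8) -> int:
    """Behavioural sketch of an n-bit carry-save multiplier: generate the
    partial products Pij = Ai.Bj, reduce them column-wise with carry-save
    addition, then ripple-carry add the final two rows (the RCA stage)."""
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> j) & 1 for j in range(n)]
    cols = [[] for _ in range(2 * n)]      # column k holds bits of weight 2^k
    for i in range(n):
        for j in range(n):
            cols[i + j].append(A[i] & B[j])
    level = 0
    while any(len(c) > 2 for c in cols):   # carry-save reduction levels
        level += 1
        nxt = [[] for _ in range(2 * n)]
        for k, c in enumerate(cols):
            while len(c) >= 3:             # full adder: 3 bits -> sum + carry
                x, y, z = c.pop(), c.pop(), c.pop()
                nxt[k].append(x ^ y ^ z)
                if (x & y) | (y & z) | (x & z):
                    nxt[k + 1].append(1)   # carry saved into the next column
            if len(c) == 2 and level == 1:  # half adder in the first level
                x, y = c.pop(), c.pop()
                nxt[k].append(x ^ y)
                if x & y:
                    nxt[k + 1].append(1)
            nxt[k].extend(c)
        cols = nxt
    product, carry = 0, 0                  # final RCA over at most two rows
    for k in range(2 * n):
        s = sum(cols[k]) + carry
        product |= (s & 1) << k
        carry = s >> 1
    return product

assert csm_multiply(0xB7, 0x5C) == 0xB7 * 0x5C
```

Each pass of the `while` loop corresponds to one carry-save level in Fig. 2: carries are not propagated within a level but deferred to the next column of the next level, which is what keeps every level's delay independent of the word length.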

2.1 Half Adder

A half adder is a combinational circuit that accepts two 1-bit inputs X and Y and produces two 1-bit outputs, the sum and the carry 'Cout'. The sum and carry 'Cout' may be expressed as in Eqs. (4) and (5) [18], respectively:

Sum = X ⊕ Y = ((X.(X.Y)′)′ . (Y.(X.Y)′)′)′   (4)

Cout = X.Y = ((X.Y)′)′   (5)

The block diagram and logic diagram of the HA [18] are shown in Fig. 3a and b, respectively. The TD required for producing the 'Sum' and 'Cout' outputs from the inputs is given by Eqs. (6) and (7), respectively:

τHA-Sum = 3τNAND   (6)

τHA-Cout = 2τNAND   (7)

Fig. 3 a Block diagram and b Logic diagram of half adder [18]

where τHA–Sum denotes the TD from inputs to ‘Sum’ output of the HA, τHA–Cout denotes the TD from inputs to output carry ‘Cout’ of the HA, and τNAND denotes the TD introduced by the 2-input NAND gate from its inputs to output.
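Equations (4)–(7) can be checked with a small gate-level sketch. The model below is illustrative (not from the paper): it builds the half adder purely from 2-input NAND gates, and the comments count the gate levels that give the 3τNAND and 2τNAND delays.

```python
def nand(x: int, y: int) -> int:
    return 1 - (x & y)

def half_adder(x: int, y: int):
    """NAND-only half adder realizing Eqs. (4) and (5)."""
    n1 = nand(x, y)       # level 1: (X.Y)'
    cout = nand(n1, n1)   # level 2: ((X.Y)')' = X.Y  -> Cout after 2 NANDs
    n2 = nand(x, n1)      # level 2: (X.(X.Y)')'
    n3 = nand(y, n1)      # level 2: (Y.(X.Y)')'
    s = nand(n2, n3)      # level 3: Sum = X xor Y    -> Sum after 3 NANDs
    return s, cout

# Exhaustive check against the XOR/AND truth table.
for x in (0, 1):
    for y in (0, 1):
        assert half_adder(x, y) == (x ^ y, x & y)
```

The longest path to 'Sum' passes through three NAND gates and the path to 'Cout' through two, matching Eqs. (6) and (7).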

2.2 Full Adder

A full adder is a combinational circuit that accepts three 1-bit inputs X, Y, and Z and produces two 1-bit outputs, the sum and the carry 'Cout'. Of the three inputs, two are significant input bits and the third is the carry coming from the previous bit position. The sum and carry 'Cout' may be expressed as in Eqs. (8) and (9) [18], respectively:

Sum = X ⊕ Y ⊕ Z   (8)

Cout = X.Y + Y.Z + X.Z = ((X.Y)′ . ((X ⊕ Y).Z)′)′   (9)

The block diagram and logic diagram of the FA [18] are shown in Fig. 4a and b, respectively. The TD required for producing the 'Sum' and 'Cout' outputs from the inputs is given by Eqs. (10) and (11), respectively:

τFA-Sum = 6τNAND   (10)

τFA-Cout = 5τNAND   (11)

where τFA-Sum denotes the TD from the inputs to the 'Sum' output of the FA and τFA-Cout denotes the TD from the inputs to the output carry 'Cout' of the FA. In the conventional CSM, although all the bits of the partial products are generated simultaneously, the carry-save addition process means that more levels are involved

Fig. 4 a Block diagram and b Logic diagram of full adder [18]

in generating the final product. To decrease the number of levels to reach the RCA stage and to achieve parallel computation, two alternative structures are proposed using structural decomposition of conventional CSM.
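Combining the generalized expression of Eq. (3) with the gate delays of Eqs. (7), (10), and (11), the conventional CSM's worst-case delay can be tallied in units of τNAND. A sketch (the 2-input AND gate is assumed here to cost one NAND-equivalent delay, an assumption the paper does not state):

```python
def conventional_csm_wcd(n: int, t_and: int = 1) -> int:
    """Worst-case delay of an N-bit conventional CSM in units of tau_NAND,
    per Eq. (3): tau_AND + tau_HA-Cout + (N-2)*tau_FA-Sum + (N-1)*tau_FA-Cout."""
    t_ha_cout = 2   # Eq. (7)
    t_fa_sum = 6    # Eq. (10)
    t_fa_cout = 5   # Eq. (11)
    return t_and + t_ha_cout + (n - 2) * t_fa_sum + (n - 1) * t_fa_cout

print(conventional_csm_wcd(8))   # 1 + 2 + 36 + 35 = 74
print(conventional_csm_wcd(16))  # 1 + 2 + 84 + 75 = 162
```

The growth is linear in N and dominated by the FA terms, which is exactly what the proposed structural decomposition attacks by shortening the chain of carry-save levels.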

3 Modified Carry-Save Multiplier

Structure-1. Here, the partial products are equally divided into two groups: group 1 and group 2. The final product of multiplication is obtained in three steps as follows, and the same is depicted in Fig. 5 for the 8-bit CSM.

1. The partial products in group 1 and group 2 are simultaneously added by carry-save adders (HA and FAs) until two rows of binary bits are obtained from each group.
2. The results obtained in step 1 are properly aligned and added by carry-save adders (HA and FAs) until two rows of binary bits are obtained.
3. Finally, the two rows of binary bits are added with the help of an RCA to obtain the final product.

The WCD path of the circuit is highlighted with the HA and FAs (yellow-colored). The WCD is given by Eq. (12):

τWorst-Case = τAND + τHA-Sum + τHA-Cout + 3τFA-Sum + 9τFA-Cout + τXOR   (12)

where τXOR is the TD incurred by the 2-input XOR gate from its inputs to its output. Similarly, for the 16-bit multiplier, the WCD is given by Eq. (13):

τWorst-Case = τAND + τHA-Sum + τHA-Cout + 7τFA-Sum + 21τFA-Cout + τXOR   (13)

Structure-2. Here, the partial products are equally divided into four groups: group 1, group 2, group 3, and group 4 [11, 22]. The final product of multiplication is obtained in four steps as follows, and the same is illustrated in Fig. 6 for the 8-bit CSM.

1. The partial products in group 1, group 2, group 3, and group 4 are simultaneously added by carry-save adders (HAs).
2. The results obtained from group 1 and group 2, as well as from group 3 and group 4, in step 1 are properly aligned and added simultaneously by carry-save adders (HA and FAs) until two rows of binary bits are obtained in each case.
3. The results obtained from group I (group 1 and group 2) and group II (group 3 and group 4) are properly aligned and added simultaneously by carry-save adders (HA and FAs) until two rows of binary bits are obtained.
4. Finally, the two rows of binary bits are added with the help of an RCA to obtain the final product.

The WCD path of the circuit is highlighted with the HA and FAs (yellow-colored). The WCD is given by Eq. (14):

Fig. 5 Logic diagram of 8-bit modified carry-save multiplier using structure-1

τWorst-Case = τAND + 2τHA-Sum + τHA-Cout + 3τFA-Sum + 8τFA-Cout + 2τXOR   (14)

Similarly, for the 16-bit multiplier, the WCD is given by Eq. (15):

τWorst-Case = τAND + 2τHA-Sum + τHA-Cout + 4τFA-Sum + 23τFA-Cout + 2τXOR   (15)

Fig. 6 a, b and c Logic diagram of 8-bit modified carry-save multiplier using structure-2

Table 1 lists the hardware requirements of the conventional and proposed modified CSMs. From the table, it is noticed that the number of 2-input AND gates is the same for all the multipliers of the same bit size. Compared to the conventional CSM, the HA count of modified CSM 1 (using structure-1) of any bit size is almost double, while the HA count of modified CSM 2 (using structure-2) of any bit size is considerably higher than that of the conventional CSM. A generalized expression for the HA count of modified CSM 2 is not given in the table, since there is no clear relation between the 8-bit and 16-bit counts; however, it may be possible to obtain a generalized expression when higher bit sizes are considered. The number of FAs of modified CSM 1 is one less than that of the conventional CSM of any bit size, while the same is two less in the case of modified CSM 2.

Table 2 lists the number of levels required to reach the RCA stage of the conventional and modified CSMs. From the table, it is observed that the number of levels required in the conventional CSM is one less than the bit size. The number of levels required in modified CSM 1 is exactly half of the bit size. The generalized expression for the number of levels required in modified CSM 2 given in the table is applicable to any bit size except the 8-bit case. The modified CSM 1 and CSM 2 are feasible from 8 bits onwards.

The two-dimensional (2D) DFT is an important operation for transforming information from the spatial domain to the frequency domain, while the 2D inverse DFT (IDFT) transforms the information from the frequency domain back to

Table 1 Hardware requirements of the conventional and modified CSMs

Type of the multiplier         No. of 2-input   No. of half   No. of full    No. of 2-input
                               AND gates        adders        adders         XOR gates
Conventional CSM [20]  8-bit   64               8             48             –
                       16-bit  256              16            224            –
                       N-bit   N²               N             N(N − 2)       –
Modified CSM 1         8-bit   64               14            47             1
                       16-bit  256              30            223            1
                       N-bit   N²               2(N − 1)      N(N − 2) − 1   1
Modified CSM 2         8-bit   64               39            46             2
                       16-bit  256              44            222            2
                       N-bit   N²               –             N(N − 2) − 2   2

Table 2 Number of levels required to reach the RCA stage of conventional and modified CSMs

Type of the multiplier   8-bit   16-bit   N-bit
Conventional CSM [20]    7       15       N − 1
Modified CSM 1           4       8        N/2
Modified CSM 2           5       6        (N/4) + 2
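The generalized level counts in Table 2 can be checked programmatically; a quick sketch (function names are illustrative):

```python
def levels_conventional(n: int) -> int:
    return n - 1                 # Table 2: N - 1 carry-save levels

def levels_csm1(n: int) -> int:
    return n // 2                # Table 2: N/2 levels for structure-1

def levels_csm2(n: int) -> int:
    # (N/4) + 2 holds from 16 bits onward; the 8-bit case is the stated
    # exception, needing 5 levels instead of the formula's 4.
    return 5 if n == 8 else n // 4 + 2

print(levels_conventional(8), levels_csm1(8), levels_csm2(8))     # 7 4 5
print(levels_conventional(16), levels_csm1(16), levels_csm2(16))  # 15 8 6
```

For 16 bits the level count drops from 15 to 8 (structure-1) and to 6 (structure-2), which is the source of the critical-path improvement reported in Sect. 4.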

the spatial domain. Equation (16) gives the 2D DFT [2] for an image f(m, n) of size M × N:

F(k, l) = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} f(m, n) e^{−j2π(mk/M + ln/N)},
for k = 0, 1, 2, …, M − 1 and l = 0, 1, 2, …, N − 1   (16)

where f(m, n) represents a digital image of size M × N and the exponential term is the basis function corresponding to each point F(k, l) in the Fourier (frequency) space. Similarly, f(m, n) is obtained by applying the 2D IDFT [2] to F(k, l), as given by Eq. (17):

f(m, n) = (1/MN) Σ_{k=0}^{M−1} Σ_{l=0}^{N−1} F(k, l) e^{j2π(mk/M + ln/N)},
for m = 0, 1, 2, …, M − 1 and n = 0, 1, 2, …, N − 1   (17)

Addition and multiplication are the two major operations in 2D DFT and IDFT computations, which are often found in image processing applications. The performance of the 2D DFT and IDFT computations is determined mostly by the performance of these two operations, particularly multiplication, which is more time-consuming than addition. If such multiplications are performed by the proposed modified CSM 1 or CSM 2, the performance of the 2D DFT and IDFT computations would be enhanced significantly.
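Equations (16) and (17) can be exercised directly. The sketch below is a naive pure-Python illustration (not an FFT, and the 2 × 2 test image is invented) that round-trips a small image through the 2D DFT and IDFT:

```python
import cmath

def dft2(f):
    """2D DFT per Eq. (16); f is an M x N image as a list of lists."""
    M, N = len(f), len(f[0])
    return [[sum(f[m][n] * cmath.exp(-2j * cmath.pi * (m * k / M + n * l / N))
                 for m in range(M) for n in range(N))
             for l in range(N)] for k in range(M)]

def idft2(F):
    """2D inverse DFT per Eq. (17), including the 1/(MN) factor."""
    M, N = len(F), len(F[0])
    return [[sum(F[k][l] * cmath.exp(2j * cmath.pi * (m * k / M + n * l / N))
                 for k in range(M) for l in range(N)) / (M * N)
             for n in range(N)] for m in range(M)]

img = [[1, 2], [3, 4]]
back = idft2(dft2(img))
assert all(abs(back[m][n] - img[m][n]) < 1e-9 for m in range(2) for n in range(2))
```

Every term of the double sum involves one complex multiplication, which is why the multiplier's delay dominates the cost of the transform.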

4 Results and Discussion

Each of the circuits described is coded in Verilog HDL. Functional verification and synthesis of all these circuits are done in Xilinx Vivado 2017.2; a few random input values are applied to verify their functionality. Tables 3 and 4 show the synthesized results of the conventional and modified CSMs, namely the number of LUTs and the critical path delay (CPD) in nanoseconds (ns), on the target device ‘xc7z010clg400-1’ of the Zynq 7000 family and ‘xc7s50fgga484-1’ of the Spartan 7 family, respectively. The number of LUTs occupied by the 8-bit and 16-bit conventional CSM, modified CSM 1, and modified CSM 2 is the same in Tables 3 and 4; however, the CPD in ns differs due to the underlying technology of the targeted devices. The number of LUTs occupied by the 16-bit modified CSM 1 is lower, and that of the 16-bit modified CSM 2 lower still, compared with the conventional counterpart, due to the broken vertical carry chain. However, the number of LUTs occupied by the 8-bit modified CSM 1 is higher, and that of the 8-bit modified CSM 2 higher still, as

Design of Delay-Efficient Carry-Save Multiplier by Structural …


Table 3 Performance details of conventional and modified CSM on the target device ‘xc7z010clg400-1’ of Zynq 7000 family

Parameter                | Conventional CSM [20] |        | Modified CSM 1 |        | Modified CSM 2 |
                         | 8-bit  | 16-bit        | 8-bit  | 16-bit | 8-bit  | 16-bit
Number of LUTs           | 81     | 550           | 88     | 459    | 120    | 375
Critical path delay (ns) | 16.34  | 26.54         | 14.63  | 23.45  | 14.60  | 22.32

Table 4 Performance details of conventional and modified CSM on the target device ‘xc7s50fgga484-1’ of Spartan 7 family

Parameter                | Conventional CSM [20] |        | Modified CSM 1 |        | Modified CSM 2 |
                         | 8-bit  | 16-bit        | 8-bit  | 16-bit | 8-bit  | 16-bit
Number of LUTs           | 81     | 550           | 88     | 459    | 120    | 375
Critical path delay (ns) | 16.21  | 27.35         | 14.14  | 22.69  | 14.29  | 21.74

expected, compared with the conventional CSM. The reduction in LUT count does not occur at 8 bits because of the comparable number of levels required to reach an RCA stage and the increase in HA count, compared with the conventional counterpart. Further, from Table 3, it is observed that the CPD of the 8-bit and 16-bit modified CSM 1 is decreased by 10.46% and 11.64%, respectively, in comparison with the 8-bit and 16-bit conventional CSM. Similarly, the CPD of the 8-bit and 16-bit modified CSM 2 is decreased by 10.65% and 15.90%, respectively. From Table 4, it is observed that the CPD of the 8-bit and 16-bit modified CSM 1 is decreased by 12.77% and 17.04%, respectively, and that of the 8-bit and 16-bit modified CSM 2 by 11.84% and 20.51%, respectively, in comparison with the conventional CSM. Further, the synthesized results of the modified CSMs are compared with the existing 16-bit array multiplier, 16-bit WTM, 16-bit WTM using counters, and 16-bit conventional CSM obtained on the target device ‘xc7s50fgga484-1’ of the Spartan 7 family, as shown in Table 5.
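The quoted CPD reductions follow directly from the table values; a short script reproduces them to within rounding:

```python
# CPD reductions (%) recomputed from the values in Tables 3 and 4.
def reduction(base, new):
    return 100 * (base - new) / base

# (8-bit CPD, 16-bit CPD) for each multiplier on each device
zynq = {"conv": (16.34, 26.54), "csm1": (14.63, 23.45), "csm2": (14.60, 22.32)}
spartan = {"conv": (16.21, 27.35), "csm1": (14.14, 22.69), "csm2": (14.29, 21.74)}

for name, dev in (("Zynq 7000", zynq), ("Spartan 7", spartan)):
    for m in ("csm1", "csm2"):
        r8 = reduction(dev["conv"][0], dev[m][0])
        r16 = reduction(dev["conv"][1], dev[m][1])
        print(f"{name} {m}: 8-bit {r8:.2f}%, 16-bit {r16:.2f}%")
# The printed figures agree with the 10.46-20.51% reductions quoted above,
# to within 0.01 from rounding.
```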

5 Conclusion

In this paper, 8-bit and 16-bit CSMs were designed using the proposed alternative structures and were coded in Verilog HDL. The designed multipliers were simulated and synthesized in Xilinx Vivado 2017.2 on the target device ‘xc7z010clg400-1’

Table 5 Performance comparison of proposed and existing multipliers

Type of the multiplier         | Number of LUTs | Critical path delay (ns)
16-bit array multiplier [23]   | 600            | 36.08
16-bit WTM [23]                | 437            | 23.33
16-bit WTM using counters [24] | 369            | 22.48
16-bit conventional CSM [20]   | 550            | 27.35
16-bit modified CSM 1          | 459            | 22.69
16-bit modified CSM 2          | 375            | 21.74

of the Zynq 7000 family and ‘xc7s50fgga484-1’ of the Spartan 7 family. The 16-bit modified CSM 1 achieved a reduction of 37.10%, 2.74%, and 17.04% in CPD when compared to the 16-bit array multiplier, 16-bit WTM, and 16-bit conventional CSM, respectively. However, its CPD is 0.90% higher than that of the 16-bit WTM using counters. Similarly, the 16-bit modified CSM 2 achieved a reduction of 39.74%, 6.81%, 3.30%, and 20.51% in CPD when compared to the 16-bit array multiplier, 16-bit WTM, 16-bit WTM using counters, and 16-bit conventional CSM, respectively. Further, the CPD of the proposed multipliers can be decreased by replacing the RCA with fast adder circuits such as the carry select adder and parallel prefix adders.

References

1. Proakis, J.G., Manolakis, D.G.: Digital Signal Processing–Principles, Algorithms and Applications, 4th edn. Pearson Education (2007)
2. Rafael, C.G., Richard, E.W.: Digital Image Processing, 3rd edn. Pearson Education International (2008)
3. Freeny, S.L.: Special-purpose hardware for digital filtering. Proc. IEEE 63(4), 633–648 (1975). https://doi.org/10.1109/PROC.1975.9797
4. KavyaShree, D., Samundiswary, P., Gowreesrinivas, K.V.: High speed multipliers using counters based on symmetric stacking. In: 2019 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–6 (2019). https://doi.org/10.1109/ICCCI.2019.8822185
5. Asif, S., Kong, Y.: Design of an algorithmic Wallace multiplier using high speed counters. In: 2015 Tenth International Conference on Computer Engineering and Systems (ICCES), pp. 133–138 (2015). https://doi.org/10.1109/ICCES.2015.7393033
6. Wallace, C.S.: A suggestion for a fast multiplier. IEEE Trans. Electron. Comput. EC-13(1), 14–17 (1964). https://doi.org/10.1109/PGEC.1964.263830
7. Habibi, A., Wintz, P.A.: Fast multipliers. IEEE Trans. Comput. C-19(2), 153–157 (1970). https://doi.org/10.1109/TC.1970.222881
8. Dhurkadas, A.: Faster parallel multiplier. Proc. IEEE 72(1), 134–136 (1984). https://doi.org/10.1109/PROC.1984.12827


9. Oklobdzija, V.G., Villeger, D.: Improving multiplier design by using improved column compression tree and optimized final adder in CMOS technology. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 3(2), 292–301 (1995). https://doi.org/10.1109/92.386228
10. Park, B.-I., Park, I.-C., Kyung, C.-M.: A regular layout structured multiplier based on weighted carry-save adders. In: Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040), pp. 243–248 (1999). https://doi.org/10.1109/ICCD.1999.808432
11. Bickerstaff, K.C., Swartzlander, E.E., Schulte, M.J.: Analysis of column compression multipliers. In: Proceedings 15th IEEE Symposium on Computer Arithmetic (ARITH-15), pp. 33–39 (2001). https://doi.org/10.1109/ARITH.2001.930101
12. Waters, R.S., Swartzlander, E.E.: A reduced complexity Wallace multiplier reduction. IEEE Trans. Comput. 59(8), 1134–1137 (2010). https://doi.org/10.1109/TC.2010.103
13. Patali, P., Kassim, S.T.: An efficient architecture for signed carry save multiplication. IEEE Lett. Comput. Soc. 3(1), 9–12 (2020). https://doi.org/10.1109/LOCS.2020.2971443
14. Ravi, N., Satish, A., Prasad, T.J., Rao, T.S.: A new design for array multiplier with trade off in power and area, pp. 533–537 (2011). arXiv preprint arXiv:1111.7258
15. Thamizharasan, V., Kasthuri, N.: High-speed hybrid multiplier design using a hybrid adder with FPGA implementation. IETE J. Res. 1–9 (2021). https://doi.org/10.1080/03772063.2021.1912655
16. Solanki, V., Darji, A.D., Singapuri, H.: Design of low-power Wallace tree multiplier architecture using modular approach. Circ. Syst. Sig. Process. 40, 4407–4427 (2021). https://doi.org/10.1007/s00034-021-01671-3
17. Hemanth Krishna, L., Neeharika, M., Janjirala, V., Veeramachaneni, S., Mahammad, S.N.: Efficient design of 15:4 counter using a novel 5:3 counter for high-speed multiplication. IET Comput. Digit. Tech. 15, 12–19 (2021). https://doi.org/10.1049/cdt2.12002
18. Morris Mano, M., Ciletti, M.D.: Digital Design: With an Introduction to the Verilog HDL, 5th edn. Pearson Education (2013)
19. Mallavarapu, R.K., Rao, T., Vasavi, S.: Design and implementation of an eight bit multiplier using twin precision technique and Baugh–Wooley algorithm. Int. J. Sci. Res. Publ. 3(4), 1–6 (2013)
20. Suman, S., Singh, N.P., Selvakumar, R., Saini, H.: Design of 32-bit cell based carry-save combinational multiplier with reduced area and propagation delay. J. Phys.: Conf. Ser. 1804, 1–8 (2021)
21. Balasubramanian, P., Maskell, D., Naayagi, R.T., Mastorakis, N.: Early output quasi-delay-insensitive array multipliers. Electronics 8(4), 444, 1–14 (2019). https://doi.org/10.3390/electronics8040444
22. Selvi, T.: FPGA implementation of 8-bit multiplier with reduced delay time. Int. J. Comput. Commun. Eng. 665–668 (2013)
23. Ram, G.C., Rani, D.S., Balasaikesava, R., Sindhuri, K.B.: Design of delay efficient modified 16-bit Wallace multiplier. In: 2016 IEEE International Conference on Recent Trends in Electronics, Information and Communication Technology (RTEICT), pp. 1887–1891 (2016). https://doi.org/10.1109/RTEICT.2016.7808163
24. Kumar, S., Sasamal, T.N.: Verilog implementation of high-speed Wallace tree multiplier. In: Sharma, R., Mishra, M., Nayak, J., Naik, B., Pelusi, D. (eds.) Green Technology for Smart City and Society. Lecture Notes in Networks and Systems, vol. 151, pp. 457–469. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-8218-9_38

Speech Intelligibility Quality in Telugu Speech Patterns Using a Wavelet-Based Hybrid Threshold Transform Method S. China Venkateswarlu, N. Uday Kumar, D. Veeraswamy, and Vallabhuni Vijay

Abstract This paper proposes a multiband spectral subtraction algorithm in which the quality of speech is enhanced along with its intelligibility. In daily life, speech is the principal means of conveying a message to its destination. In industrial areas, additional noise may be added to the speech signal, causing a disturbance so that the original signal cannot be recovered perfectly; to remove this noise, different algorithms such as spectral subtraction and the Wiener filter are used. These two algorithms are evaluated with objective measures such as SNR, SSNR, MSE, PSNR, NRMSE and PESQ, while multiband spectral subtraction is evaluated with measures such as SNR, segmental SNR, frequency segmental SNR, cepstrum distance and overall SNR. By using different transform techniques such as the Haar transform and the Daubechies transform, we can remove the noise and improve the quality of the speech signal.

1 Introduction

Improving voice quality and intelligibility remains an open and unresolved challenge; it has been an active study area for many years. The demand for communication headsets for industrial use and new applications such as hands-free networking and automatic voice recognition devices have fuelled

S. C. Venkateswarlu (B) · V. Vijay
Department of Electronics and Communication Engineering, Institute of Aeronautical Engineering, Dundigal, Hyderabad 500043, India
V. Vijay e-mail: [email protected]
N. U. Kumar
Department of Electronics and Communication Engineering, Marri Laxman Reddy Institute of Technology and Management, Dundigal, Hyderabad 500043, India
D. Veeraswamy
Department of Electronics and Communication Engineering, Hyderabad Institute of Technology and Management, Hyderabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_40



development in this subject. Non-stationary noise is one of the major issues for the current state of technology. Standard algorithms can detect non-stationary noise, but output efficiency decreases as the background noise level increases. Hence, algorithms that increase the efficiency of speech communication in industrial and heavily noisy environments are needed to improve speech communication. Coefficient thresholding methods such as binary masking and the wavelet transform have been used widely to improve speech; little work has been done on modulation-channel methods. Yet optimum performance in speech quality and intelligibility could not be achieved with either strategy. In other words, it is difficult to find effective ways to suppress noise at low SNR. Previous approaches had limitations, including the introduction of peaks (musical noise), many iterations, and low speech quality and intelligibility.

1.1 Speech Processing

Speech processing refers to the analysis of speech signals and the application of signal processing techniques to them. Some techniques used in speech processing are the following. Dynamic time warping is an algorithm for measuring the similarity between two time series. Generally speaking, dynamic time warping determines the optimal match between two sequences under certain restrictions and rules on the time series: the optimal match is the one which satisfies all constraints and rules and has the minimum cost, where the cost is computed as the sum of absolute differences between the values of each matched pair of indices. Artificial neural networks are built on a set of connected units or nodes, called artificial neurons, that loosely model the biological brain. Each connection can relay a signal from one artificial neuron to another; the main point is that these algorithms are realised in software. Finally, speech processing is mainly regarded as a special case of digital signal processing.

1.2 Denoising Techniques

Speech denoising is a long-standing problem. Given an input noisy signal, we want to filter out the unwanted noise that damages the signal of interest. One can imagine someone speaking during a video call while a piece of music is playing in the background. In this case, it is the responsibility of the speech denoising system to remove the background noise and thereby enhance the speech signal [1]. This technology is particularly important for video and audio conferencing, among many other applications, where noise can greatly reduce speech intelligibility.


Fig. 1 Denoising using wavelet transform

Denoising, or function estimation, requires reconstructing the signal as faithfully as possible from measurements of a useful signal corrupted by noise (Fig. 1).

The figure is a pictorial presentation of denoising using the wavelet transform. It starts with an input noisy speech pattern, which is passed through the wavelet transform. A threshold is then estimated, and one of two methods, soft thresholding or hard thresholding, is applied. The result then goes through the inverse wavelet transform, finally yielding the denoised speech: the speech passes through while the noise alongside it is removed. This is the process of the denoising technique.
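The pipeline of Fig. 1 (wavelet transform, threshold estimation, thresholding, inverse transform) can be sketched end to end with a single-level Haar transform and soft thresholding. This is a pure-NumPy illustration, not the authors' implementation; the threshold value `lam` and the test signal are assumptions:

```python
import numpy as np

def haar_fwd(x):
    # Single-level orthonormal Haar DWT (x must have even length).
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return a, d

def haar_inv(a, d):
    # Exact inverse of haar_fwd.
    x = np.empty(2 * a.size)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def soft(c, lam):
    # Soft threshold: shrink coefficients toward zero.
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)

a, d = haar_fwd(noisy)        # wavelet transform
d = soft(d, lam=0.3)          # threshold the (mostly noise) detail band
denoised = haar_inv(a, d)     # inverse wavelet transform

# The denoised signal is closer to the clean one than the noisy input was.
assert np.mean((denoised - clean) ** 2) < np.mean((noisy - clean) ** 2)
```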

1.3 Wavelet Transforms

In several areas of mechanics, engineering, seismography, electronic data processing, etc., wavelet transforms are widely used in the analysis, encoding and reconstruction of signals. The wavelet transform, which breaks a function into a series of wavelets, is an alternative representation [1].

1.3.1 Different Wavelet Transforms

HAAR: The first and shortest wavelet is the Haar wavelet, which is the starting point for any discussion of wavelets. The Haar wavelet is a discontinuous, step-like function. It is the same wavelet as the Daubechies db1 wavelet [9].

DAUBECHIES: Ingrid Daubechies, perhaps the brightest star in the realm of wavelet research, developed what are called compactly supported orthonormal wavelets, thereby making discrete wavelet analysis practicable [9].


2 Literature Survey

For adaptive noise cancellation, Widrow et al. suggested Wiener filters based on both noise and voice. A complex spectrum of clean speech (in time-domain form) is easily retrieved using the linear model between the observed and projected signals. In the Wiener filter, the clean voice spectral amplitude is estimated, and the phase is retrieved directly from the noisy signal. This approach is suitable for stationary digital signal processor (DSP) applications, where the ideal Wiener filter minimises the measurement error. In the Wiener filter, a further number of iterations were carried out [2].

2.1 Noisy Speech Enhancement

The noise reduction problem in this paper is to recover the desired clean speech signal x(n) from the observed signal y(n), where n ≥ 0 is the discrete time index:

y(n) = x(n) + s(n)    (1)

where s(n) is background noise that is uncorrelated with the speech x(n). Non-stationary signals are used as input data for the analysis and implementation of the proposed method. In the frequency domain, short-time Fourier transforms (STFTs) are used to estimate the clean speech patterns from the noisy speech patterns [3]:

Y(k, m) = S(k, m) + U(k, m)    (2)

where Y(k, m), S(k, m) and U(k, m) are the STFTs of the noisy speech, the clean speech and the noise, respectively, with frequency bin k ∈ {0, 1, 2, 3, …, K − 1} and time frame m. The variance of Y(k, m) is needed for further study of the threshold because the speech and noise are uncorrelated by assumption [6]. Recently, speech enhancement has become an essential component of speech coding and speech recognition technologies. Speech enhancement has two main components: speech estimation and noise power estimation. The speech estimation is based on the mathematical speech model, the distortion criterion and the estimated noise [2].

2.2 Existing Method

The existing model uses both soft and hard thresholding, each of which has advantages and disadvantages. To overcome some of the disadvantages, soft and hard thresholding are combined; the result is called the hybrid threshold [4].

Hybrid Threshold: A few drawbacks are present in soft thresholding and in hard thresholding individually when used as noise reduction methods. To overcome these drawbacks, the two techniques are combined into a new kind of thresholding technique, the hybrid thresholding method. Soft thresholding removes the discontinuity of the signal, while in hard thresholding the discontinuity remains: a coefficient is sometimes kept and sometimes killed. The combined technique is more efficient and is named hybrid thresholding [4] (Fig. 2).

In Fig. 2, the input passes through windowing and then through the FFT (fast Fourier transform). The speech signal then goes to noise estimation and on to spectral subtraction, continues to the complex spectrum and then to the IFFT (inverse fast Fourier transform), and finally through overlap-add to the enhanced speech signal. Here we get a clear signal without noise [5].

Principle of the spectral subtraction method: Consider a noisy signal derived from independent additive noise:

y[n] = s[n] + d[n]    (3)

where y[n] is the sampled noisy speech, s[n] the clean speech and d[n] the additive noise. The noise is assumed to be additive with zero mean and uncorrelated with the clean speech. Due to the non-stationary and time-varying nature of speech signals,

Fig. 2 Block diagram of spectral subtraction method


its representation is in the short-time Fourier transform, which gives the following transformation:

Y(ω, k) = S(ω, k) + D(ω, k)    (4)

By removing a noise estimate from the received signal, the speech can be approximated:

|Ŝ(ω)|² = |Y(ω)|² − |D̂(ω)|²    (5)

Averaging recent speech-pause frames yields an estimate of the noise spectrum:

|D̂(ω, k)|² = λ(ω, k) |D̂(ω, k − 1)|² + (1 − λ(ω, k)) |Y(ω, k)|²    (6)

where λ(ω, k) is a smoothing factor and M is the number of consecutive frames averaged. Spectral subtraction can be viewed as a filter by manipulating Eq. (4), so that the estimate is the product of the noisy speech spectrum and a spectral subtraction filter (SSF); in its general power form,

|Ŝ(ω)|ᵇ = |Y(ω)|ᵇ − |D̂(ω)|ᵇ,  b > 0    (7)

where H(ω) is the gain function, also known as the spectral subtraction filter (SSF). H(ω) is a zero-phase filter with magnitude response in the range 0 ≤ H(ω) ≤ 1. The over-subtraction factor is

α = α₀ + (SNR − SNRmin) · (αmin − α₀)/(SNRmax − SNRmin),  SNRmin < SNR < SNRmax    (8)

To reconstruct the signal, a phase estimate of the speech is needed; it is taken from the noisy signal. The a posteriori SNR in a frame is calculated by

SNR(ω, k) = 10 log₁₀ [ |Y(ω, k)|² / ( (1/m) Σ_{p=1}^{m} |D̂(ω, k − p)|² ) ]    (9)

The estimated speech signal is recovered in the time domain by inverse Fourier transforming Ŝ(ω) using the overlap-add technique. Although this spectral subtraction method reduces the majority of the noise, it also has some drawbacks, such as its strong dependence on the noise estimate (Fig. 3).

The flow chart in Fig. 3 starts with the noisy speech signal, that is, noise added to the speech signal. It continues to time–frequency analysis with windowing, and then to wavelet decomposition, where the wavelet is decomposed. Next comes hybrid thresholding which, as described above, is the mixture of soft and hard thresholding. The signal then passes to the IWT (inverse wavelet transform) and finally to the enhanced speech signal, in which the noise present alongside the signal has been eradicated [4].
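The power-domain subtraction of Eq. (5), with the noisy phase reused for reconstruction, can be illustrated on a single frame (a NumPy sketch, not the authors' code; in practice the noise PSD would come from speech-pause frames per Eq. (6), whereas here the true noise spectrum is used for simplicity, and the spectral floor is an assumption):

```python
import numpy as np

def spectral_subtract(noisy_frame, noise_psd, floor=1e-3):
    # |S_hat|^2 = |Y|^2 - |D_hat|^2 (Eq. (5)), clamped to a small
    # spectral floor to avoid negative power, with the noisy phase
    # reused for the inverse transform.
    Y = np.fft.rfft(noisy_frame)
    power = np.abs(Y) ** 2 - noise_psd
    power = np.maximum(power, floor * np.abs(Y) ** 2)
    S = np.sqrt(power) * np.exp(1j * np.angle(Y))
    return np.fft.irfft(S, n=noisy_frame.size)

rng = np.random.default_rng(1)
n = 256
clean = np.sin(2 * np.pi * 20 * np.arange(n) / n)
noise = 0.2 * rng.standard_normal(n)
noise_psd = np.abs(np.fft.rfft(noise)) ** 2   # idealised noise estimate
enhanced = spectral_subtract(clean + noise, noise_psd)

# The enhanced frame is closer to the clean tone than the noisy one.
assert np.mean((enhanced - clean) ** 2) < np.mean(noise ** 2)
```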


Fig. 3 Hybrid thresholding

above, this is the mixture of soft and hard thresholding. The signal passes through this. And next to IWT, which is integer wavelet transform. And then finally to enhance speech signal. Here, enhanced meaning, the noise which is present with the signal is eradicated. And the pure enhanced signal is shown [4]. Wiener Filter: The Wiener filter (WF) is a kind of filter that reduces the mean square error (MSE). The GF–gain function of WF Wiener (ω) is written in the form of the power spectral density (PSD) of clean speech of the noise Pd (ω). ⎛

⎞ 2 |y (ω)| ⎜ ω=ki i ⎟ S N Ri (d B) = 10 log10 ⎝ 2 ⎠ ki+1   ˆ ω=ki  Di (ω) ki+1

(10)

The fixed gain (FG) at every frequency levels and their requirements to estimate the PSD of the clean signal and noise are before filtering. This is the drawback of Wiener filter. So, we use adaptive WF to round the approximation of WF gain function. 2  2    2   ˆ  ˆ  (11)  > β.|Yi (ω)|2  S(ω) = |y(ω)|2 − αi .δi . Dˆ i (ω) , i f  S(ω)

456

S. C. Venkateswarlu et al.

where ki < ω < ki+1

2.3 Problem Identification

During communication between two people in a laboratory, some noise is added to the original speech signals of the speaker. This noise may be due to environmental disturbance, or people around the speakers may create the disturbance. As a result, the quality of the speech is degraded and the message cannot be transmitted effectively. This is a problem we face in daily life, and to rectify it we have an algorithm which enhances the speech quality [5].

3 Proposed Method

Multiband spectral subtraction is an algorithm which removes the noise in speech signals using different transform techniques. In real-life situations, whenever information is transmitted, a carrier signal is added to the message signal in order to travel a longer distance; some noise is also added on the transmitter side, and it must be removed on the receiver side before the actual information is retrieved [5]. Sometimes, due to environmental issues, this noise is not discarded completely and causes disturbances. So, in order to remove it, we use this algorithm with different wavelet transform techniques such as Haar, Daubechies, etc., to denoise the speech signals. There are thresholding techniques such as hard thresholding and soft thresholding [9].

Hard Thresholding: a wavelet thresholding technique in which coefficients whose values are smaller than the threshold λ are set to zero.

Soft Thresholding: a wavelet thresholding technique in which, in addition, the surviving non-zero coefficients are shrunk towards zero (Fig. 4).

Multiband spectral subtraction algorithm: The assumption behind the multiband spectral subtraction method is that the added noise is stationary and uncorrelated with the clean voice signal [6]:

|Ŝᵢ(ω)|² = |Yᵢ(ω)|² − αᵢ · δᵢ · |D̂ᵢ(ω)|²,  if |Yᵢ(ω)|² − αᵢ · δᵢ · |D̂ᵢ(ω)|² > β · |Yᵢ(ω)|²;
|Ŝᵢ(ω)|² = β · |Yᵢ(ω)|² otherwise    (12)
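The hard and soft thresholding rules defined above can be written in a few lines (an illustrative sketch, not the authors' code):

```python
import numpy as np

def hard_threshold(c, lam):
    # Keep-or-kill: coefficients with |c| <= lam are zeroed,
    # the others pass through unchanged (discontinuous at |c| = lam).
    return np.where(np.abs(c) > lam, c, 0.0)

def soft_threshold(c, lam):
    # Surviving coefficients are additionally shrunk by lam,
    # which removes the discontinuity at |c| = lam.
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

c = np.array([-2.0, 0.5, 0.2, 0.9, 3.0])
print(hard_threshold(c, 1.0))   # only -2.0 and 3.0 survive, unchanged
print(soft_threshold(c, 1.0))   # they survive but are shrunk to -1.0 and 2.0
```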

where kᵢ < ω < kᵢ₊₁, and

αᵢ = αmax + (SNRᵢ − SNRmin) · (αmin − αmax)/(SNRmax − SNRmin),  SNRmin < SNRᵢ < SNRmax

n > (3 ∗ Si + Sj) and c > (2 ∗ Si + Sj) + 2 ∗ (Ti + Tj), where u = (n − 1)/3

GPBA is the best algorithm available with respect to the number of faulty processors permitted and the number of rounds it takes to perform the protocol.

2.3 A Randomized Protocol to Solve Byzantine Agreement [3]

The previous algorithms were deterministic algorithms proposed for a general network. The algorithm proposed in this paper is a randomized algorithm used to solve the Byzantine agreement problem where ‘t’ out of ‘n’ nodes can be faulty. This algorithm works only for synchronous systems. The total number of rounds needed for this algorithm is O(t/log n), and the total message

A Survey on Byzantine Agreement Algorithms …

bits is O(n²t/log n). The algorithm is independent of how the failures are distributed over the network. The performance mentioned above can be improved to a constant number of rounds and O(n²) message bits, assuming that the faulty nodes are distributed uniformly. This randomized algorithm works for any number of nodes ‘n’ if n ≥ 3t + 1, where ‘t’ is the maximum number of faulty nodes allowed. The algorithm is simple and efficiently implemented and hence is of great use in real implementations. When performance is considered, a deterministic algorithm's performance is lower than this randomized algorithm's performance, and it does not need any cryptographic techniques. The randomized algorithm discussed here uses global coin tosses for randomization. The network is assumed to be reliable and fully connected. At every round, coins might be tossed by some processors for some of the calculations. Not-faulty processors, if they toss a coin, send the value they got from the coin toss, instead of sending incorrect values as the faulty processors may. Once the algorithm starts, every processor sends its initial value ‘v’ chosen from a group of values ‘V’. The aim of this randomized algorithm is that every processor will have an answer once it has sent its messages. The answers need to satisfy two requirements:

1. Agreement: The answers given by all the not-faulty processors are the same.
2. Validity: If there is one value with which all the processors start the algorithm, then at the end of the algorithm the correct answer will be that value.

This randomized algorithm has three properties.

Validity: If there is one value with which all the not-faulty processors start the algorithm, then during the second round of the first epoch, every not-faulty processor agrees on that same value.

Agreement: If the not-faulty processors agree for the first time on a value during an epoch s, then by the second round of the next epoch, all of the not-faulty processors agree on that value.

Termination: In every epoch there will be at least one value such that all the not-faulty processors agree on the correct value by the end of the next epoch.

Under the assumption that all the faulty processors are spread evenly, the proposed randomized algorithm is expected to run in constant time. There are four principal limitations this algorithm faces:

1. The algorithm limits the total number of defective nodes: the maximum number of defective nodes allowed is at most 1/3 of the total nodes present. It is proven that no cryptographic algorithm exists which can perform correctly if the maximum allowable number of defective nodes is more than 1/3 of the total nodes.
2. The network is assumed to provide reliable communication.
3. It is assumed that the faulty processors cannot determine the coin tosses that the not-faulty processors will make in the future.
4. During every round, it is assumed that the faulty processors cannot subvert any not-faulty processors.


P. C. Sachin and D. Amit

4. During every round, it is assumed that the not-faulty processors will not subvert any processors. To achieve agreement using this randomized algorithm, the rounds which is pert formed is O( logn ) and also the random bits taken by every processor is above 1. This randomized algorithm is preferable when all the faulty processors are evenly spread across the network because the expected running time becomes a constant when the faulty processors are uniformly distributed.

2.4 Protocol to Solve Byzantine Agreement in a Cloud Environment [4]

In recent years, cloud computing has grown substantially and its popularity keeps increasing. The Byzantine problem also exists in cloud computing systems, which consist of huge amounts of memory and processors and contain networks made up of fast transmission links. In this domain, every node has to interact and communicate with the other nodes in order to satisfy the needs of the customers. One of the major problems in the cloud environment is therefore reliability, where faulty nodes must be taken care of. Hence, achieving Byzantine agreement in a cloud system is the goal of the algorithm. The concept used to achieve Byzantine agreement is the ‘Early Stopping Protocol (ESP)’, which is used to make sure that all nodes agree on the same value early, in separate rounds. This algorithm can also be applied to mixed-fault failures, and it takes an optimal number of rounds when compared to other algorithms in the cloud domain. Therefore, the proposed algorithm can resolve Byzantine agreement in the cloud domain and also improves reliability, which is one of the important factors to be considered. The goal of the ‘Fault Diagnosis Agreement (FDA)’ problem is that the not-faulty processors use the least number of message transfers to find the processors which are faulty. This algorithm uses ESP so that the number of messages exchanged is minimized, improving reliability and efficiency in the cloud computing environment. It also considers hybrid faults on the processors and, hence, deals with both dormant and arbitrary faults. The following conditions must be met:

Agreement: Every not-faulty processor can determine the faulty components.

Fairness: All the not-faulty processors are determined as not-faulty and the faulty processors as faulty; not-faulty processors should not be falsely determined as faulty.

The proposed algorithm, the ‘Early Diagnosis Cloud Agreement (EDCA)’ protocol, solves Byzantine agreement. In a cloud environment, EDCA can be applied to a network in which the maximum allowable number of faulty processors is present, and the total number of rounds taken to perform the algorithm is optimal. The total number of rounds


needed to exchange messages is r = min{Sr + 2, Si + 1}. The EDCA protocol has three phases, namely the ‘message exchange’ phase, the ‘decision-making’ phase, and the ‘fault diagnosis’ phase, and it helps to increase reliability in cloud computing environments. For the EDCA algorithm to work correctly, the total number of processors must satisfy n > ⌊(n − 1)/3⌋ + 2Si + Sj and c > 2Si + Sj, where c is the connectivity of the network, Si the number of arbitrary faults and Sj the number of dormant faults. The maximum number of rounds needed is min{Sr + 2, Si + 1}. The ‘Early Diagnosis Cloud Agreement’ integrates the ideas of ‘Early Stopping’ and ‘Fault Diagnosis’ so that Byzantine agreement can be achieved efficiently and quickly in the cloud computing domain. Only min{Sr + 2, Si + 1} rounds of messages are needed so that every not-faulty processor can agree on the correct value in a cloud computing domain (Table 2).
2.5 Degradable Byzantine Agreement

The algorithms presented so far require every non-faulty node to agree on an answer that is the same in all fault-free processors. With this algorithm, Byzantine agreement can be achieved if the total number of faults is at most t (t < n/2), and a degraded form of agreement can still be accomplished if the number of faults u exceeds t. In the degraded case, every non-faulty processor decides on one of two values, one of which is guaranteed to be the default value. In this protocol, the channels perform calculations on an input received from the sender, and the value generated by each channel is delivered to an outside node. This node determines the value received most often from the channels and uses it to find the majority value (Table 3). The algorithm must ensure the following conditions: 1. The outside node obtains the right value by finding the majority value, provided the system has 3t channels with a single sender, at most t of the channels are faulty, and the sender is not faulty. 2. Every non-faulty node ends up in the same state, where the total number of defective nodes permissible is t. The main motivation of this algorithm is that, especially in safety-critical systems, it is much better to use a default value than a wrong value. If the total number of nodes is n, then the fault-free nodes can reach t/u degradable agreement only if n > 2t + u. The network connectivity must be at least t + u + 1 for the algorithm to work properly. If there are at most t faulty processors, the algorithm achieves full Byzantine agreement, i.e., every non-faulty node reaches a common agreement; if the number of defective nodes exceeds t (but is at most u), only the degraded form of agreement is guaranteed.

Table 2 Comparison of the surveyed Byzantine agreement algorithms

Algorithm | Protocol type | Fault tolerance | Rounds | Network | Fault model | Timing model
The Byzantine generals problem [1] | Deterministic | t, n >= 3m + 1 | m + 1 | Reliable and fully connected | Single fault type | Synchronous
Byzantine agreement in the presence of mixed faults on processors and links [2] | Deterministic | n > 3Si + Sj, c > (2Si + Sj) + 2(Ti + Tj) | ⌊(n − 1)/3⌋ + 1 | Unreliable and general network | Mixed fault type | Synchronous
A simple and efficient randomized Byzantine agreement algorithm [3] | Randomized | t, n >= 3t + 1 | O(t / log n) | Reliable and fully connected | Mixed fault type | Synchronous
An agreement under early stopping and fault diagnosis protocol in a cloud computing environment [4] | Deterministic | Si + Sj, n > ⌊(n − 1)/3⌋ + 2Si + 2Sj, c > 2Si + Sj | min{Sr + 2, Si + 1} | General network | Mixed fault type | Synchronous
Degradable Byzantine agreement [5] | Deterministic | u, u >= t, n > 2t + u, c > t + u | u + 1 | General network | Mixed fault type | Synchronous
Distributed agreement in the presence of processor and communication faults [6] | Deterministic | t, n >= 3t + 1 | t + 1 | General network | Mixed fault type | Synchronous

Table 3 Least number of total processors required for t/u degradable agreement

t \ u    1    2    3    4    5
1        4    5    6    7    8
2        –    7    8    9    10
3        –    –    10   11   12
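The entries in Table 3 follow directly from the bound n > 2t + u: the least integer satisfying it is 2t + u + 1, defined only for u >= t. A small helper of our own (not from the paper) reproduces the table:

```python
def min_processors(t, u):
    """Least n for t/u degradable Byzantine agreement, from n > 2t + u.

    t -- fault count up to which full agreement is guaranteed
    u -- fault count up to which degraded agreement is still guaranteed
    Returns None when u < t, where t/u degradable agreement is undefined
    (the dashes in Table 3).
    """
    if u < t:
        return None
    return 2 * t + u + 1

# reproduces the Table 3 row for t = 2: -, 7, 8, 9, 10
row_t2 = [min_processors(2, u) for u in range(1, 6)]
```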


2.6 Protocol to Have Distributed Agreement When Both Links and Nodes Are Subjected to Faults

This protocol presents a solution to the Byzantine problem in which a processor's inability to send or receive messages is also considered a failure. Processors may fail in both the sending and the receiving of messages, i.e., a faulty processor can halt abruptly and can also fail to receive or send messages required by the protocol. A faulty processor can even produce messages independent of those sent by the loyal processors. However, the message agreed upon by the loyal nodes must be correct and the same for every non-faulty node. This algorithm therefore addresses more realistic models of failure. Given a less-restricted network, processor, and fault model, the algorithm provides the optimal solution in terms of time and messages sent and received. When the network and processor fault types are more restrictive, it performs the same as the other algorithms surveyed; because it also covers the less-restrictive failure types, it applies to a wider range of problems. If the total number of nodes is n and the number of nodes that commit send faults is at most t, with n > t + 1, then the algorithm guarantees that Byzantine agreement is achieved. Note that no bound is placed on the number of receive faults, because the algorithm treats receive faults as benign; only send faults are relevant. The complexity of the early stopping protocol is O(f·n^2), because that is the number of bits used for message sending and receiving. This complexity is identical to that of the 'crash-fault protocol' [7], but since this algorithm also handles the less-restrictive failures, it performs better across a greater variety of problems.
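The survey compares the protocol's O(f·n^2) complexity to the classic crash-fault protocol [7]. As a hedged illustration of that baseline (our own toy model, not the protocol of [6]), f + 1 rounds of synchronous flooding suffice for agreement when at most f processors crash, even if a crashing node delivers to only some recipients before going silent:

```python
def crash_flood_agreement(values, crash_plan, rounds):
    """Synchronous flooding sketch under crash faults.

    With at most f crashes and f + 1 rounds, every surviving node ends up
    with the same value set and can decide deterministically (here: min).

    values     -- initial value of each node (node i has values[i])
    crash_plan -- node -> (crash_round, recipients reached before crashing)
    """
    n = len(values)
    known = [{v} for v in values]
    crashed = set()
    for r in range(rounds):
        incoming = [set() for _ in range(n)]
        for i in range(n):
            if i in crashed:
                continue  # a crashed node stays silent forever
            if i in crash_plan and crash_plan[i][0] == r:
                receivers = crash_plan[i][1]  # partial delivery, then crash
                crashed.add(i)
            else:
                receivers = range(n)
            for j in receivers:
                incoming[j] |= known[i]
        for j in range(n):
            known[j] |= incoming[j]
    # decisions of the nodes that never crash
    return {min(known[i]) for i in range(n) if i not in crash_plan}

# adversarial schedule: node 0 reaches only node 1, node 1 only node 2;
# with f = 2 crashes, f + 1 = 3 rounds still yield a single decision
decisions = crash_flood_agreement([0, 5, 7, 9], {0: (0, [1]), 1: (1, [2])}, rounds=3)
```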

3 Conclusions

All the algorithms surveyed work only with synchronous processors; if even one processor is asynchronous, achieving Byzantine agreement becomes complicated. Likewise, all of the algorithms work only if the total number of nodes in the system is greater than three times the number of faulty nodes; if this condition is not satisfied, cryptographic methods are needed to ensure agreement among the processors. However, the complexity of performing Byzantine agreement differs from algorithm to algorithm. The Byzantine Generals Problem was one of the first algorithms proposed and became the basis for all the others. The 'randomized protocol to solve Byzantine agreement' performs better than the Byzantine generals problem, but the randomized algorithm analyzed outperforms the other algorithms only in a fully connected network. The 'protocol to solve Byzantine agreement in a cloud environment' is the optimal solution for Byzantine agreement in a cloud setting. The 'solution to the Byzantine problem when both links and processors are subjected to hybrid faults' performs much better than the degradable Byzantine algorithm, but the

506

P. C. Sachin and D. Amit

latter allows more faulty processors than the former. Hence, from the analysis of the algorithms above, we can say that each algorithm's performance varies with the scenario, and different algorithms perform best in different settings.

References

1. Lamport, L., Shostak, R., Pease, M.: The Byzantine generals problem. ACM Trans. Program. Lang. Syst. 4(3), 382–401 (1982)
2. Chor, B., Coan, B.A.: A simple and efficient randomized Byzantine agreement algorithm. IEEE Trans. Softw. Eng. 11(3), 531–539 (1985)
3. Siu, H.-S., Chin, Y.-H., Yang, W.-P.: Byzantine agreement in the presence of mixed faults on processors and links. IEEE Trans. Parallel Distrib. Syst. 9(4), 335–345 (1998)
4. Chiang, M.L., Chen, C.L., Hsieh, H.C.: An agreement under early stopping and fault diagnosis protocol in a cloud computing environment. IEEE Access 6, 44868–44875 (2018)
5. Vaidya, N.H., Pradhan, D.K.: Degradable Byzantine agreement. IEEE Trans. Comput. 44(1), 146–150 (1995)
6. Perry, K.J., Toueg, S.: Distributed agreement in the presence of processor and communication faults. IEEE Trans. Softw. Eng. SE-12(3), 477–482 (1986)
7. Lamport, L., Fischer, M.: Byzantine generals and transaction commit protocols. SRI International (1982)
8. Dolev, D.: The Byzantine generals strike again. J. Algorithms 3(1), 14–30 (1982)
9. Cristian, F., Aghili, H., Strong, H.R.: Atomic broadcast: from simple message diffusion to Byzantine agreement. In: Proceedings of the Symposium on Fault-Tolerant Computing, pp. 200–205 (1985)
10. Thambidurai, P., Park, Y.-K.: Interactive consistency with multiple failure modes. In: Proceedings of the Symposium on Reliable Distributed Systems, pp. 93–100 (1988)
11. Meyer, F.J., Pradhan, D.K.: Consensus with dual failure modes. IEEE Trans. Parallel Distrib. Syst. 2(2), 214–222 (1991)
12. Lincoln, P., Rushby, J.: A formally verified algorithm for interactive consistency under a hybrid fault model. In: Proceedings of the Symposium on Fault-Tolerant Computing, pp. 402–411 (1993)
13. Babaoglu, O., Drummond, R.: Streets of Byzantium: network architectures for fast reliable broadcasts. IEEE Trans. Softw. Eng. 11(6), 546–554 (1985)
14. Yan, K.Q., Chin, Y.H.: An optimal solution for consensus problem in an unreliable communication system. In: Proceedings of the International Conference on Parallel Processing, pp. 388–391 (1988)
15. Yan, K.Q., Chin, Y.H., Wang, S.C.: Optimal agreement protocol in Byzantine faulty processors and faulty links. IEEE Trans. Knowl. Data Eng. 4(3), 266–280 (1992)

An Autonomous Intelligent System to Leverage the Post-harvest Agricultural Process Using Localization and Mapping

Amitash Nanda and Deepak Ahire

Abstract Agriculture accounts for the principal source of income in a country like India. Farmers produce a prolific amount of crops yearly. Crop production involves four critical phases: harvesting, storage, processing, and market access. The post-harvest phase of crop production is very crucial for valuable production, since various fruits and vegetables remain fresh for only a short duration if they do not reach the market in time. Manual labor and data handling are less efficient; hence, precision agriculture requires efficient crop monitoring and data collection using a mobile robot. In this work, we propose two scenarios, one for a mapped and the other for an unmapped model of the farm. The mapped model is a two-dimensional replica of the fruit farm. The FireBird V ATmega 2560 robotics research platform is used in this research to traverse the abstraction of the fruit farm: it spans the mapped model, collects the fruit crates, and transmits container data to the truck abstraction for transportation. The robot covers the shortest path from the source to the destination nodes by parallelizing Dijkstra's algorithm using the OpenMP API. For the unmapped model of the fruit farm, the GraphSLAM algorithm generates a map, and the Adaptive Monte Carlo Localization algorithm is implemented on a virtual robot. The simulation experiment is conducted in the Gazebo environment on the Robot Operating System platform.

1 Introduction

Indian agriculture plays a vital role in the country's economy: it accounts for 15.4% of India's GDP and employs almost half of the country's workforce [1]. The four critical phases of crop production are harvesting, storage, processing, and early market access. Post-harvest losses occur for many reasons, and that

A. Nanda (B) Department of Instrumentation and Electronics Engineering, College of Engineering and Technology, Bhubaneswar, Odisha, India
D. Ahire Department of Computer Science and Engineering, Walchand College of Engineering, Sangli, Maharashtra, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_45


Fig. 1 Data for estimated crop losses during harvest and post-harvest

incurs enormous losses to farmers and the Government. Figures 1 and 2, drawn from Jha et al. (2015), display the commodities and their post-harvest losses [2]. The Government takes various measures and provides facilities to the farmers; still, farmer suicides continue. Current advancements in technology should replace conventional post-harvesting methodologies: recent technology has developed traditional agricultural practices and made them more flexible and reliable to use. Precision agriculture leads to a more accurate and controlled way of farming [3]. The intervention of technologies like GPS, control systems, a wide range of sensors, autonomous robots, drones, and precision software has advanced agricultural processes. In India, primary crop production is seasonal, but demand persists throughout the year; when demand increases, it becomes necessary to match consumers' needs. The data provided in [4] state that post-harvest losses in India amount to approximately US$ 14.33 bn; in addition, the daily waste of crops is worth US$ 19.4 million. The main reasons for these enormous losses are rejection at the farm gates, delays in the distribution process, and poor transportation. Moreover, the current system is unstable, manual, and prone to human error. When farmers delay transporting fresh fruits or vegetables to the market at the proper time, the produce remains fresh only for a short duration, and the lack of cold-storage facilities leads to a massive waste of fresh fruits and vegetables. Present crop-production methodologies require radical transformation and innovative autonomous intelligent systems to reduce post-harvest losses. In large fruit farms, collecting fruit crates and transferring them to the proper transportation channel within the time frame is crucial. The current system involves manual labor, which delays the process. The use of an autonomous robotic unit reduces human effort and performs a series of complex actions.


Fig. 2 Data for existing crop losses during post-harvest

In this research, we modified Dijkstra's algorithm used for the shortest path in our previous work: we now harness the computing power of all the available logical cores by parallelizing Dijkstra's algorithm using the OpenMP API. For an unmapped farm model, it is quite difficult for a robotic unit to traverse the entire area and find the shortest path; doing so consumes a lot of energy and still delays the process. In this case, we generate a map of the entire farm using Simultaneous Localization and Mapping: the GraphSLAM algorithm generates the map using the gmapping method. The research experiment is conducted in the simulated Gazebo environment on the Robot Operating System (ROS) platform. The burger model of the turtle bot is used for traversing the arena on the ROS platform, and the localization of the FireBird V is demonstrated using the turtle bot in the abstraction by means of the Adaptive Monte Carlo Localization algorithm.

2 Scope and Contribution

This research tries to reduce post-harvest agricultural losses in India using modern technologies. Automation has become a significant technology in today's world and has delivered compelling results in the farming industry [13]. Moreover, the use of the developed algorithm has improved the system's efficiency and sustainability. The localization and mapping techniques used in this project optimize the route of the robot, and the autonomous intelligent system reduces human error and improves post-harvest crop collection and transportation facilities.


3 Previous Work

In our previous research, we created an abstraction of the fruit farm [5] in which a physical robot traversed a 2D arena. The physical robot was connected to a computer simulation of a 3D model of the farm generated using the Blender 3D modeling tool; the process starts with the Blender interface in the application window and displays the movement of the robot. We used the FireBird V ATmega 2560 robotics research platform to stand in for the actual robot required on a real farm. It used the localization concept of belief-state elimination and covered the shortest path of the fruit-farm arena using Dijkstra's algorithm. The robot's aim is to find the fastest route, reach the fruit crates, and drop them into the rotating disc, which is the abstraction of the actual truck. The robot acts as the coordinator node, while the Blender simulation and the rotating disc act as router nodes. The three nodes communicate among themselves using the XBee module, similarly to [12]. Figure 3 displays the working model of the previous research.

4 Proposed Approach

Proposed Algorithm: In the previous research, we used a single-source shortest-path approach that lacked a parallel solution. In this research, we harness the computing power of all the available logical cores by parallelizing Dijkstra's algorithm using the OpenMP API [7]. This approach gives fast results and is simultaneously work-efficient. Unlike the adjacency-matrix version proposed in [6], we use our own custom adjacency-list version with private queues having shared and private scopes. To ensure that each thread does an equal amount of work and that execution enters the critical section the least number of times, we use the OpenMP thread ID (TID) to calculate custom bounds for the adjacency-list iterator, as presented in step 6.7. We tested the algorithm on the non-free-tier versions of the compute engines and the standalone servers used in [8]. To ensure the reproducibility of the algorithm's output, we provide the code implementation, the comprehensive test cases and the test-case generator as a V1.0 release on Zenodo [9] and GitHub [10].


Algorithm Parallel Dijkstra (G, DIST_S, S, K)
Computes the shortest paths from the source node to all other nodes in parallel.
Pre: G is the given DAG, S is the given source node, K is the number of threads to use, and DIST_S is a one-dimensional array used to store the single-source shortest paths from source S to all other nodes of graph G.
Post: DIST_S contains the shortest paths to all nodes from source S.
Return: None.

1: Initialize an empty shared priority queue, SHARED_PRIORITY_QUEUE.
2: Initialize all elements of DIST_S to INF.
3: Initialize a one-dimensional boolean array, VISITED, to ensure that each node is processed only once after its removal from SHARED_PRIORITY_QUEUE.
4: Initialize the source node S and set DIST_S[S] = 0.
5: Push the source node S into SHARED_PRIORITY_QUEUE.
6: Loop (while SHARED_PRIORITY_QUEUE is not empty) do
  6.1: popped_node = SHARED_PRIORITY_QUEUE.top().
  6.2: Pop the top node of the shared queue, i.e., SHARED_PRIORITY_QUEUE.pop().
  6.3: If (VISITED[popped_node] == True) {
    6.3.1: Continue. }
  6.4: VISITED[popped_node] = True.
  6.5: Initialize a PRIVATE_PRIORITY_QUEUE. This queue has private scope, so each thread/processor has its own private queue.
  6.6: Initialize a variable TID to fetch the thread ID of the respective thread. This variable has private scope.
  6.7: Define the OpenMP parallel region using an OpenMP pragma.
  OpenMP parallel region with K threads and private scope for TID and PRIVATE_PRIORITY_QUEUE {
    6.7.1: Fetch the value of TID, i.e., TID = omp_get_thread_num().
    6.7.2: Fetch the size of the adjacency list of popped_node, i.e., adj_size = G.adjacency_list[popped_node].size().
    6.7.3: Initialize the chunk size per thread, STEP = ceil(adj_size / K).
    6.7.4: Initialize a variable PARTS = ceil(adj_size / STEP).
    6.7.5: Initialize a variable FROM that acts as the starting position of the adjacency-list iterator with respect to TID, i.e., FROM = (TID * STEP).
    6.7.6: Initialize a variable TO that acts as the ending position of the adjacency-list iterator with respect to TID, i.e., TO = ((TID + 1) * STEP).
    6.7.7: If (TID + 1 == PARTS) {
      6.7.7.1: TO = adj_size. }
    6.7.8: Loop (list iterator it = FROM; it != TO; it++) do
      6.7.8.1: If (VISITED[G.adjacency_list[popped_node][it]] == False) {
        6.7.8.1.1: Initialize a new node, i.e., new_node = G.adjacency_list[popped_node][it].
        6.7.8.1.2: If (DIST_S[new_node] > DIST_S[popped_node] + |e(popped_node, new_node)|) {
          6.7.8.1.2.1: DIST_S[new_node] = DIST_S[popped_node] + |e(popped_node, new_node)|.
          6.7.8.1.2.2: Update the distance of new_node.
          6.7.8.1.2.3: Push new_node into the private priority queue, i.e., PRIVATE_PRIORITY_QUEUE.push(new_node). } }
    End Loop
    6.7.9: Define the OpenMP critical region using an OpenMP pragma.
    OpenMP critical region {
      6.7.9.1: Loop (while PRIVATE_PRIORITY_QUEUE is not empty) do
        6.7.9.1.1: Push all entries of the private queue into the shared queue, i.e., SHARED_PRIORITY_QUEUE.push(PRIVATE_PRIORITY_QUEUE.top()).
        6.7.9.1.2: PRIVATE_PRIORITY_QUEUE.pop().
      End Loop
    } /* End of critical region */
  } /* End of parallel region */
End Loop
End Algorithm Parallel Dijkstra
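The structure above — a shared priority queue, per-thread slices of the popped node's adjacency list, and private result buffers merged in a critical section — can be sketched in runnable form with a thread pool. This is our own illustrative Python port, not the authors' OpenMP/C++ implementation, and the example graph is made up:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from math import ceil, inf

def parallel_dijkstra(adj, source, k=4):
    """Single-source shortest paths; adj maps node -> [(neighbor, weight)].

    Each popped node's edge list is split into k slices (the FROM/TO bounds
    of the pseudocode); workers return candidate relaxations, which the main
    loop merges into the shared queue (the critical section).
    """
    dist = {v: inf for v in adj}
    dist[source] = 0
    visited = set()
    shared_pq = [(0, source)]

    def scan(node, lo, hi):
        # private work: propose relaxations for one slice of the edge list
        out = []
        for nbr, w in adj[node][lo:hi]:
            if nbr not in visited and dist[node] + w < dist[nbr]:
                out.append((dist[node] + w, nbr))
        return out

    with ThreadPoolExecutor(max_workers=k) as pool:
        while shared_pq:
            d, node = heapq.heappop(shared_pq)
            if node in visited:
                continue
            visited.add(node)
            edges = adj[node]
            step = ceil(len(edges) / k) or 1
            slices = [(i, min(i + step, len(edges)))
                      for i in range(0, len(edges), step)]
            futures = [pool.submit(scan, node, lo, hi) for lo, hi in slices]
            for fut in futures:
                for nd, nbr in fut.result():  # merge private results
                    if nd < dist[nbr]:
                        dist[nbr] = nd
                        heapq.heappush(shared_pq, (nd, nbr))
    return dist

# hypothetical 4-node weighted graph
adj = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
shortest = parallel_dijkstra(adj, 0, k=2)
```

Stale reads of `dist` inside a worker only produce extra candidates, which the merge step re-checks, so the result matches serial Dijkstra for non-negative weights.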


Fig. 3 Working model of the system in 2D abstraction and 3D simulation model

5 Simulation and Results

5.1 Simultaneous Localization and Mapping for the Unmapped Model

This work demonstrates a system in which a map is generated for the unmapped model of the farm using the concept of Simultaneous Localization and Mapping; the resulting map is then used to traverse the farm and compute the robot's locations. The localization of the turtle bot is performed using Monte Carlo Localization, also known as particle filter localization. This algorithm determines the robot's rotation and location (together termed its pose) using sensors. The robot starts from a random place with no knowledge of its surroundings; as it moves, it shifts the particles to predict the unknown state.

5.2 Experimental Analysis for the Unmapped Model of the Farm

Simulation Environment This research uses the Robot Operating System platform for simulation. The entire environment is designed in the Gazebo simulator; ROS and Gazebo run in a Linux environment and process real-time data. This work creates a 3D environment before testing the actual robot on a real farm, which is the essential requirement for moving the simulated robot over the model and its locations. The model and building editor tools in the Gazebo plugin create the environment in Xacro and URDF format. The simulation environment requires a simulated robot to complete the task. According to the proposed setting, the turtle bot is compatible, so out of the burger, waffle, and waffle pi models, the burger model is used in our Gazebo environment.

Localization Generating a map is a crucial factor for an unmapped simulated model, and navigation with the turtle bot is an essential part of this environment. Many algorithms can perform the complete navigation stack, but each has specific errors. This research combines Monte Carlo Localization with the Adaptive Monte Carlo Localization method. The sensors used for navigation are the LiDAR and RGB-D camera mounted on the turtle bot; they detect noise and obstacles in the environment.

Sensor Fusion The algorithm works in two processes: updating the motion and sensor values, and re-sampling. The LiDAR and camera sense the surroundings (known as the belief in Monte Carlo Localization) and the previous belief data is sent to the robot. The model updates and senses both the measurement and motion pose in the environment, and then detects the noise in the given belief. To compare the error and find the probability, small particles are assigned following the particle filter method. Through this process a weight is calculated, which is the mismatch between the actual and the predicted measurement. The weights drive the re-sampling cycle; after correction of the data, the particles return to the measurement and motion update. This loop continues until a satisfactory graph of the environment is built.

GMapping GraphSLAM is a SLAM process that covers the entire path and map. Habibie et al. demonstrated the mapping of a fruit plant in a simulated Gazebo environment using Simultaneous Localization and Mapping [11]. The map is generated from poses, features, motion, and measurement constraints.
After this, the robot's course and a map of the environment are created; the process uses the information matrix and vector method, storing all map values in a matrix and a vector indexed by particle. RTAB-Map, or real-time appearance-based mapping, is a graph-based SLAM approach: the map is generated with the help of the loop-closure function and an inverted-index method, and it is stored with working memory (WM) and long-term memory (LTM). This process is termed RTAB-Map memory management, and the complete procedure is termed gmapping. The map is launched using a package called RViz, a 3D visualization tool for ROS. The key_teleop package moves the turtle bot inside the environment. The designed 3D model of the farm and the SLAM experimental analysis can be seen in Figs. 4, 5, 6, and 7.

Path Planning In this process, the robot decides the shortest path among all the routes it has traversed to that point. It returns an optimized way from the starting point to the destination with less noise and disturbance.
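The predict-weight-resample cycle described under Sensor Fusion can be condensed into a minimal one-dimensional particle filter. This toy sketch is our own (the real system uses the ROS AMCL package with LiDAR and RGB-D input); the landmark position and noise figures are invented for illustration:

```python
import random

def particle_filter_step(particles, control, measurement, landmark, noise=0.1):
    """One predict-weight-resample cycle of particle filter localization,
    reduced to one dimension for clarity."""
    # 1. Motion update: move every particle by the commanded displacement,
    #    adding Gaussian noise to model imperfect odometry.
    moved = [p + control + random.gauss(0, noise) for p in particles]
    # 2. Measurement update: weight = how well the predicted range reading
    #    (distance to the landmark) matches the actual sensor reading.
    weights = [1.0 / (1e-6 + abs((landmark - p) - measurement)) for p in moved]
    # 3. Resampling: redraw the particle set in proportion to the weights.
    return random.choices(moved, weights=weights, k=len(moved))

# usage sketch: robot actually at x = 3, landmark at x = 10, range reading 7
random.seed(0)
particles = [random.uniform(0, 10) for _ in range(500)]
for _ in range(30):
    particles = particle_filter_step(particles, control=0.0,
                                     measurement=7.0, landmark=10.0)
estimate = sum(particles) / len(particles)  # concentrates near x = 3
```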


Fig. 4 SLAM graph generation I

Fig. 5 SLAM graph generation II

Fig. 6 SLAM graph generation III


Fig. 7 SLAM graph generation IV

6 Conclusion

In this project, we presented two scenarios for precision farming. Drawing on concepts from our previous research, we developed the shortest-path algorithm for the first case; to the best of our knowledge, applying parallel Dijkstra's algorithm to such scenarios is the first of its kind. Mapping the entire farm, collecting the fruit crates, and transmitting container data to the transportation unit require an efficient system, so the robot needs to traverse the shortest path and perform the desired actions. Experimental analysis for the unmapped model of the farm gives excellent results in a simulated environment. It is essential to generate the map of a large farm in the simulation environment rather than traversing it with the real robot to find the shortest path: testing with the actual robot requires enormous energy consumption and is not an efficient way of spanning the farm. This research therefore creates an abstraction of the farm in a simulated environment and generates a map of the model using simultaneous localization and mapping. This research will improve the post-harvest agricultural process and help transport crops to the market on time.

References

1. Sector-wise contribution of GDP of India: http://statisticstimes.com/economy/sectorwise-gdpcontribution-of-india.php. Last accessed 16 July 2021
2. Jha, S.N., Vishwakarma, R.K., Ahmad, T., Rai, A., Dixit, A.K.: Report on assessment of quantitative harvest and post-harvest losses of major crops and commodities in India. ICAR-All India Coordinated Research Project on Post-Harvest Technology, ICAR-CIPHET, P.O.-PAU, Ludhiana, 141004 (2015)
3. Stafford, J.V.: Implementing precision agriculture in the 21st century. J. Agric. Eng. Res. 76(3), 267–275 (2000). https://doi.org/10.1006/jaer.2000.0577. ISSN 0021-8634


4. Reducing post-harvest losses in India: key initiatives and opportunities. http://intellecap.com/wp-content/themes/intellecap/pdf/Public-Facing-Report.pdf. Last accessed 16 July 2021
5. Nanda, A., Swain, K.K., Reddy, K.S., Agarwal, R.: Transporter: an autonomous robotics system for collecting fresh fruit crates for the betterment of the post harvest handling process. In: 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 577–582 (2020). https://doi.org/10.1109/ICACCS48705.2020.9074439
6. Ye, Z.: An implementation of parallelizing Dijkstra's algorithm. Presentation (2021). https://cse.buffalo.edu/faculty/miller/Courses/CSE633/Ye-Fall-2012-CSE633.pdf
7. Dagum, L., Menon, R.: OpenMP: an industry standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5(1), 46–55 (1998). https://doi.org/10.1109/99.660313
8. Ahire, D., Bhandari, S., Kamble, K.: Finding the Kth max sum pair in an array of distinct elements using search space optimization. In: Innovations in Computer Science and Engineering, pp. 341–352 (2021). https://doi.org/10.1007/978-981-33-4543-0_37
9. Ahire, D.: adeepak7/Finding_all_edges_on_any_shortest_path_between_two_given_nodes_of_DAG_using_parallel_computing: First Release (Version v1.0). Zenodo (2021). https://doi.org/10.5281/zenodo.4893226
10. Ahire, D.: Finding all edges on any shortest path between two given nodes of a DAG using parallel computing (Version 1.0) [Computer software] (2021). https://doi.org/10.5281/zenodo.4893226
11. Habibie, N., Nugraha, A.M., Anshori, A.Z., Ma'sum, M.A., Jatmiko, W.: Fruit mapping mobile robot on simulated agricultural area in Gazebo simulator using simultaneous localization and mapping (SLAM). In: 2017 International Symposium on Micro-Nano Mechatronics and Human Science (MHS). https://doi.org/10.1109/MHS.2017.8305235
12. Sakya, G., Gautam, A.: Smart agriculture system using adhoc networking among Firebird V bots. Int. J. Innov. Adv. Comput. Sci. (IJIACS) 5(10) (2016). ISSN 2347-8616
13. Swathi, R., Mahalakshmi, T., Srinivas, A.: Vision based plant leaf disease detection on the color segmentation through Fire Bird V robot. GRD J. Glob. Res. Dev. J. Eng. 1(4), 2455–5703 (2016)

Zero-Day Attack Detection Analysis in Streaming Data Using Supervised Learning Techniques

B. Ida Seraphim and E. Poovammal

Abstract In the last decade, protecting sensitive information and online resources from network intrusion has been a critical challenge in the field of cybersecurity. The intrusion detection system provides an essential base for network defence, and it requires adaptive techniques that can handle zero-day attacks. A zero-day attack imposes severe damage on a network or system: attackers take hold of vulnerabilities in the network and perform the zero-day exploit on the Web. This paper presents machine learning techniques to detect zero-day exploits with higher accuracy, in less time, and with a reduced false alarm rate. The CICIDS dataset, which includes zero-day attacks, is streamed. The system is implemented using random forest, random tree, Naïve Bayes and Hoeffding tree techniques and evaluated against performance measures such as accuracy, detection time and memory usage. The Hoeffding tree was found to give an accuracy of 99.97% in 5.94 s with a memory usage of 0.08 MB.

1 Introduction

Due to the unexpected growth in Internet usage, the secure communication of digital information is crucial nowadays. Intrusion has become a significant issue in almost all fields, including government sectors, commerce and industry. An intrusion detection system (IDS) capable of identifying zero-day attacks is the need of the hour. An IDS is a computer hardware device or software application that detects or monitors the presence of an intruder trying to break into a network or a system. Machine learning (ML) techniques help build a strong IDS. However, existing IDSs can detect only the known attacks with high accuracy and are unsuccessful in identifying new unknown attacks, called zero-day attacks [1].

B. I. Seraphim (B) · E. Poovammal Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulthur, Chennai, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_46


IDS has a broad scope, ranging from antivirus software to hierarchical systems that screen the complete network's traffic. Based on architecture, IDS is of two types: the host intrusion detection system (HIDS) and the network intrusion detection system (NIDS). HIDS examines the computer's system calls, modifications in the file system and deviations in application log data to find intrusions. NIDS examines the traffic in the network and changes observed in network protocols and payloads for suspicious activities. Based on the detection method, IDS is again of two types: anomaly-based IDS and signature-based IDS. Signature-based detection identifies an attack by observing patterns and signatures and comparing them with already stored signatures; a signature is a kind of pattern that describes a known attack or exploit [2]. Such techniques require frequent updates of the database with new signatures and are not suitable for novel attacks. Anomaly-based detection detects harmful events by finding the deviation of the observed traffic from normal traffic. Its main drawback is a high false alarm rate (FAR), which arises when previously unseen (but legitimate) system behaviours are considered anomalies. Three main limitations contribute to the challenges in network security [3]: (1) the voluminous growth of network data; (2) the in-depth monitoring needed to improve efficacy and accuracy; and (3) the variety of protocols and data that pass through the network. A zero-day attack exploits a vulnerability in software that the attacker discovers before the vendor is aware of it; the attacker can easily exploit that vulnerability knowing that no defences are available [4]. The name is zero-day because zero days pass between the discovery of the vulnerability and the first attack [5]. The main limitation of current IDSs is that they rely on predefined patterns and signatures.
As a result, many zero-day attacks remain undetected, which intensifies the consequences of stolen sensitive information, denial of service, etc. [1].

1.1 Anatomy of a Zero-Day Malware

The framework of zero-day malware is as follows and is also shown in Fig. 1.

Fig. 1 Anatomy of a zero-day attack: looking for vulnerabilities → creating an exploit → observing systems with the vulnerability → planning the attack → infiltration → launch of the zero-day exploit

Zero-Day Attack Detection Analysis in Streaming …

1.1.1 Looking for Vulnerabilities

Attackers search software code for vulnerabilities, or even buy vulnerabilities on the black market, and use them to gain unauthorized access to a system [4].

1.1.2 Creating an Exploit

The attackers create code or a program to exploit the vulnerabilities in the system or software [4].

1.1.3 Observing Systems Affected by the Vulnerability

The attackers can monitor for and identify systems that suffer from the vulnerability using bots, automated scanners and other techniques [4].

1.1.4 Planning the Attack

The attacker plans the attack in either a targeted or non-targeted manner. In a targeted attack, the attacker carries out a detailed investigation to identify the best way to penetrate the system. In a non-targeted attack, the attacker uses bots or large phishing campaigns to gain access to as many vulnerable systems as possible [4].

1.1.5 Infiltration

The attacker penetrates through the perimeter defences of an organization or personal devices [4].

1.1.6 Launch of a Zero-Day Exploit

Now, the attacker can launch a zero-day exploit on the compromised system with vulnerabilities [4].

1.2 Zero-Day Exploit Market

There are three categories of zero-day exploit market: the white, grey and black markets.


Fig. 2 Top vendors affected by zero-day attack

1.2.1 White Market

The white market mainly deals with finding patches for vulnerabilities that have been reported to the vendor, fixing the security holes in the system. Its participants buy and sell vulnerability information and use it for a good cause.

1.2.2 Grey Market

The grey market mainly comprises military and intelligence agencies, which buy zero-day exploits and vulnerability information about targeted systems.

1.2.3 Black Market

The black market mainly consists of cybercrime organizations, which buy and sell exploits to gain unauthorized access to a system or organization. The main goal of the black market is to break into systems, steal sensitive information or infect networks.

The most significant zero-day attacks target products from top companies like Microsoft and Adobe. Microsoft products such as Windows, Internet Explorer and Office are highly vulnerable and are the top products affected by zero-day attacks [6]. Second in line is Adobe, whose Flash product is highly vulnerable to and affected by zero-day attacks. Figure 2 shows the top vendors affected by the zero-day attack. The data are taken from Project Zero by Google, which maintains vulnerability data by product, vendor, etc. The dataset includes details of vulnerabilities from 2014 to date (2021).

2 Related Works

Hindy et al. [1] proposed an autoencoder that detects zero-day attacks. Existing outlier-based detection suffers from a vast number of false negatives, and present IDSs cannot detect zero-day attacks because they rely entirely on signatures. Two benchmark datasets, NSL-KDD and CICIDS2017, were used for evaluation, and the performance of the autoencoder was compared with a one-class SVM. The autoencoder is well suited for zero-day attack detection compared to the one-class SVM: its zero-day detection accuracy ranges between 89 and 99% on NSL-KDD and from 75 to 98% on CICIDS2017. Both techniques maintain excellent detection accuracy with a low false-positive rate.

Kunang et al. [7] used autoencoders for feature selection, selecting the best features from the dataset and passing them to an SVM for attack detection. The model, using autoencoder feature selection and SVM for multi-class classification, gives an overall detection accuracy of 86.96% and a precision of 88.65%.

Kaur et al. [8] proposed a hybrid technique that combines anomaly- and signature-based techniques to detect zero-day attacks. They proposed a layered architecture comprising detection, analysis and resource layers. Unknown attacks are detected in the detection layer; behaviour analysis of the captured traffic is done in the analysis layer; and the resource layer provides the hardware resources that execute components from the detection and analysis layers. All three layers work in parallel to improve performance, and the popular SVM technique gives good performance in attack detection. In a one-class SVM, the available training data belong to a single normal class, because normal data are easy to obtain for training while collecting abnormal data is impractical.
Newly arriving traffic that does not fit this class is labelled as out of class. The proposed model gives an accuracy of 98% and a false-positive rate of 2%; compared with a honeynet system, it reduces the response time to a great extent.

Radhakrishnan et al. [9] surveyed zero-day attacks. Malware is software that damages the computer or network internally, and malware analysis covers malware detection and the malware's source, intent, functionality and impact. A large number of zero-day attacks target Microsoft and Adobe products, and the authors analyzed many famous zero-day attacks.


2.1 Renowned Zero-Day Vulnerabilities

Aurora: Aurora is a famous zero-day attack that focussed on organizations like Adobe, Rackspace, Juniper Networks and Google [9]. The name comes from the file path of the malware where it resides.

Stuxnet: Stuxnet is the most significant zero-day vulnerability. It replicates through the network and through detachable devices until the target is reached [9]. It steals digital certificates and hides from antivirus software; isolation cannot contain the Stuxnet malware [9].

RSA APT Breach: This attack stole RSA SecurID two-factor authentication data using an advanced persistent threat (APT). It used remote-code vulnerabilities to induce attacks in Adobe Flash Player and sent mail with malicious content links to the victim system [9].

Red October: Kaspersky Labs discovered this vulnerability in 2012. The attack mainly targets government agencies and uses an advanced persistent threat (APT) to infect victims, who receive mail with malicious Word or Excel documents [9].

Dark Leech attack: These attacks target Web servers and compromised more than 40,000 of them, delivering Nymaim ransomware to the compromised systems. Most of the compromised systems were running Apache Web servers [9].

Most Adobe and Microsoft products are vulnerable to various attacks. The authors also analyzed the methods used in multiple papers; the main aim of their project is malware analysis of cryptojackers and zero-day attacks, and they studied existing methods in order to devise novel methods to detect the attacks.

Verkerken et al. [10] focus on anomaly-based detection of zero-day attacks. They used four unsupervised machine learning techniques, two of which were self-supervised: principal component analysis (PCA), isolation forest, one-class SVM and an autoencoder, evaluated on the CICIDS 2017 dataset. The models were assessed on classification performance and computational complexity using the area under the receiver operating characteristic curve (AUROC). The autoencoder outperformed all the other models with an AUROC of 0.978 and acceptable computational complexity, followed by the one-class SVM (0.970), isolation forest (0.958) and PCA (0.937).

Zoppi et al. [11] discuss the problems in zero-day attacks and review unsupervised techniques for zero-day attack detection. To identify zero-day attacks, they applied a question–answer approach and conducted a quantitative analysis. They took a recent dataset and debated how features impact detection performance; meta-learning reduces the misclassification rate in detecting zero-day attacks.

3 Methodology

This section describes the dataset used, the experimental setup, the classification techniques and the performance evaluation of the ML techniques on the dataset.

3.1 CICIDS 2017 Dataset

The study uses the CICIDS 2017 machine learning CSV data from the Canadian Institute for Cybersecurity site. This dataset consists of real traffic, labelled data, zero-day attacks, complete packet captures, etc. Because it contains zero-day attack data, it is very useful for zero-day attack detection. The CICIDS 2017 dataset contains eight different traffic monitoring sessions in comma-separated value format; the CSV files consist of benign and anomalous traffic and cover 14 different types of attack [12]. It includes extra features that are not present in the popular NSL-KDD dataset, and these additional features lead to more accurate detection of various zero-day attacks. The CICIDS dataset represents real-world network traffic collected from Monday to Friday: Monday's traffic is only benign, while brute force FTP, DoS, Heartbleed, brute force SSH, Web attack, infiltration, botnet and DDoS attacks are performed on the traffic from Tuesday to Friday [13].

The main shortcoming of the CICIDS dataset is that it is scattered across eight files, which are difficult to process individually; when the files are combined, the result is a massive volume of data. The dataset contains duplicate columns and missing values that must be handled correctly, and there is a huge class imbalance: benign instances occupy more than 80% of the dataset, biasing results towards the majority class. As a result, detection accuracy drops and the false alarm rate increases [14]. There are many ways to handle the class imbalance problem in a dataset [15, 16]. One major solution is class relabelling, done by partitioning the majority class or combining the minority classes to improve the prevalence ratio and thus reduce the imbalance.
In CICIDS, it is difficult to split the majority class, so we decided to combine the minority classes into common attack classes [14]. Minority classes with comparable characteristics merge to form a new class [13]. Table 1 shows the new labelling used for the minority classes. The class prevalence ratio improves after


Table 1 New class labelling for CICIDS dataset

New class label   Old class label
Normal            Benign
Botnet            Bot
Brute force       FTP-Patator, SSH-Patator
DDOS              DoS Golden eye, DoS Hulk, DoS Slow httptest, DoS Slowloris
Infiltration      Infiltration
Port scan         Port scan
Web attack        Web attack-brute force, Web attack-SQL injection, Web attack-XSS

combining the minority classes. To further enhance the class prevalence ratio, all the minority classes are merged into one minority class named anomaly.
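The relabelling of Table 1 can be sketched with pandas. The old label strings below follow the table; the exact spellings vary between releases of the CICIDS 2017 CSV files (e.g. 'BENIGN' vs 'Benign'), so treat the dictionary keys as assumptions to be checked against the actual data.

```python
import pandas as pd

# Mapping from Table 1: original CICIDS 2017 labels -> merged classes.
# Label spellings are assumptions; verify them against the real CSV files.
RELABEL = {
    "Benign": "Normal",
    "Bot": "Botnet",
    "FTP-Patator": "Brute force",
    "SSH-Patator": "Brute force",
    "DoS Golden eye": "DDOS",
    "DoS Hulk": "DDOS",
    "DoS Slow httptest": "DDOS",
    "DoS Slowloris": "DDOS",
    "Infiltration": "Infiltration",
    "Port scan": "Port scan",
    "Web attack-brute force": "Web attack",
    "Web attack-SQL injection": "Web attack",
    "Web attack-XSS": "Web attack",
}

def relabel(df: pd.DataFrame, col: str = "Label") -> pd.DataFrame:
    out = df.copy()
    out[col] = out[col].map(RELABEL)
    # Second step from the text: collapse every attack class into a single
    # "Anomaly" class to further improve the prevalence ratio.
    out["Binary"] = out[col].where(out[col] == "Normal", "Anomaly")
    return out

# Toy demonstration with a handful of rows.
df = pd.DataFrame({"Label": ["Benign", "DoS Hulk", "FTP-Patator"]})
print(relabel(df))
```

The `Binary` column is only needed for the two-class (normal vs anomaly) experiments; the merged multi-class labels in `Label` already reduce the imbalance.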

3.2 Experimental Setup

The experiment was conducted using the MOA framework on Windows 10 (64-bit), with a Ryzen 7 processor and 16 GB RAM. The CICIDS 2017 dataset, which includes zero-day attack data, is streamed using the MOA framework, and supervised machine learning algorithms are applied for analysis. As the dataset is huge, only part of it is considered for analysis [12]:

1. Remove the redundant features in the dataset and relabel the classes.
2. Split the dataset into training and testing sets.
3. Apply machine learning techniques to the dataset, compare their performance and conclude.

The research analyzes which machine learning techniques can be applied to the dataset to detect zero-day attacks most accurately. The work uses random tree (RT), Naïve Bayes (NB), Hoeffding tree (HT) and random forest (RF). Ali Hadi [17] states that the random forest technique gives good performance in attack detection, along with excellent scalability and efficiency. The Naïve Bayes technique can identify attack classes faster than other techniques, and the model has low complexity. These techniques can therefore detect attacks effectively.


3.3 Random Forest

The random forest is a supervised ensemble learning classifier. A collection of classifiers is called a forest when it uses decision trees as the ensemble members. The random forest technique uses the bagging method for training and can be used for both classification and regression [12]. The main aim of including the bagging model is to increase overall performance; because of this strong performance, it is often used for anomaly detection.
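Bagging as used by the random forest can be illustrated with scikit-learn, where the out-of-bag score directly exploits the bootstrap samples: each record is scored only by the trees that did not see it during training. The synthetic data below is a stand-in for labelled traffic records, not the CICIDS data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for labelled traffic records; not the CICIDS data itself.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each tree trains on a bootstrap sample (bagging); oob_score evaluates every
# record on the trees that did not include it in their bootstrap sample.
clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
clf.fit(X, y)
print(f"out-of-bag accuracy: {clf.oob_score_:.2f}")
```

The out-of-bag estimate is a convenient built-in validation score that comes for free with bagging and needs no separate hold-out set.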

3.4 Random Tree

A random tree is a decision tree built on randomly selected attributes. A decision tree consists of nodes and branches, where nodes represent the tested feature and branches hold the outcomes; the leaves finally provide the class label to which the data belongs. Some researchers have used the random tree technique for anomaly detection [12].

3.5 Naïve Bayes

Naïve Bayes is a popular and widely used statistical technique based on the Bayes theorem [12]. It assumes that the influence of a feature value on a class is independent of the other feature values. Naïve Bayes classifiers are easy to build and can be used efficiently on larger datasets [18]. This simplifying assumption on the attributes is why it is called naïve. Researchers have also used the Naïve Bayes technique for anomaly detection.
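A toy illustration of the independence assumption with scikit-learn's Gaussian Naïve Bayes, which fits one Gaussian per feature per class. The two "traffic" features and their values are invented for illustration and are not from the paper's experiments.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy two-feature traffic records: [packets_per_second, mean_payload_size].
# Labels: 0 = benign, 1 = anomaly. Purely illustrative numbers.
X = np.array([[10, 500], [12, 480], [11, 510], [900, 40], [950, 35], [880, 50]])
y = np.array([0, 0, 0, 1, 1, 1])

# One Gaussian per feature per class; features are treated as independent
# given the class, which is the "naïve" assumption.
model = GaussianNB().fit(X, y)
print(model.predict([[11, 495], [910, 45]]))  # expect [0 1]
```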

3.6 Hoeffding Tree

The Hoeffding tree is an incremental machine learning technique suitable for handling large datasets. It is a tree-like structure with root, leaf and test nodes, where a leaf node represents the prediction of a class [19]. In streaming data, each record must be classified within a single pass; the Hoeffding tree uses an incremental tree-building approach to do so.
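The single-pass, test-then-train (prequential) evaluation that stream frameworks such as MOA perform can be sketched in Python. scikit-learn has no Hoeffding tree, so an `SGDClassifier` updated with `partial_fit` stands in as the incremental learner; the synthetic stream and all parameters are illustrative assumptions, not the authors' setup.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def stream(n=2000):
    """Yield one labelled record at a time, as a stream framework would."""
    for _ in range(n):
        y = int(rng.integers(0, 2))
        x = rng.normal(loc=3.0 * y, scale=1.0, size=2)  # class-dependent mean
        yield x, y

# Prequential (test-then-train) loop: each record is first used for a
# prediction and then for an incremental update, so data is seen only once.
model = SGDClassifier(random_state=0)
correct, seen = 0, 0
for i, (x, y) in enumerate(stream()):
    if i > 0:  # the model must see one example before it can predict
        correct += int(model.predict([x])[0] == y)
        seen += 1
    model.partial_fit([x], [y], classes=[0, 1])
print(f"prequential accuracy: {correct / seen:.3f}")
```

Prequential accuracy is the metric MOA reports in this style of experiment: every prediction is made on data the model has not yet trained on.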


3.7 Analysis Tool Used

The experiment uses the massive online analysis (MOA) framework, a Java-based open-source framework that efficiently handles large data streams [18]. The CICIDS 2017 dataset is streamed using MOA, and performance is evaluated by applying the machine learning techniques. The simulation runs on a Ryzen 7 processor with 16 GB RAM under the Windows 10 operating system.

4 Experimental Result Analysis

This section covers dataset processing, the machine learning techniques applied and a discussion of their results.

4.1 Dataset Processing

The CICIDS 2017 dataset consists of eight machine learning CSV files, which are combined into a single CSV file and converted to ARFF format using the WEKA tool; the dataset is then loaded and streamed using the MOA framework. The dataset consists of 78 features, with the 79th feature denoting the class label. One of the features, 'Fwd Header Length', is duplicated, so the redundant column must be removed [12], leaving 77 features and one class label column. As discussed earlier, the CICIDS dataset suffers from high class imbalance, so we relabel the classes to reduce it. After relabelling, the dataset is divided into training and testing sets: 70% of the data is used for training and 30% for testing.
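The preprocessing steps above (combine the session files, drop the duplicated column, split 70/30) can be sketched as follows. The tiny in-memory frames and the 'Label' column name are stand-ins for the real CSV files; a real run would read the eight session files first.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def preprocess(frames, label_col="Label"):
    """Combine the per-session frames, drop duplicate-named columns
    (e.g. the redundant 'Fwd Header Length'), drop missing values and
    return a stratified 70/30 train/test split."""
    df = pd.concat(frames, ignore_index=True)
    df = df.loc[:, ~df.columns.duplicated()]  # keep first of duplicated columns
    df = df.dropna()                          # the text notes missing values
    X = df.drop(columns=[label_col])
    y = df[label_col]
    return train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

# Tiny stand-in frames instead of the real session CSV files.
frames = [pd.DataFrame({"f1": range(10), "Label": ["A"] * 5 + ["B"] * 5})
          for _ in range(2)]
X_train, X_test, y_train, y_test = preprocess(frames)
print(len(X_train), len(X_test))  # 14 6  (70/30 of 20 rows)
```

Stratifying the split keeps the class prevalence ratio identical in the training and testing sets, which matters given the imbalance discussed in Sect. 3.1.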

4.2 Result Analysis

The dataset is streamed using the MOA framework, and the machine learning techniques random tree (RT), Naïve Bayes (NB), Hoeffding tree (HT) and random forest (RF) are applied. Their performance is analyzed using metrics such as accuracy, execution time and memory used. Figure 3 shows that the Hoeffding tree performs exceptionally well in detecting zero-day attacks compared to NB, RF and RT. The detection accuracy of the Hoeffding tree is 99.97%, which is 0.15, 1.77 and 2.23% higher than NB, RF and RT, respectively. This shows that the Hoeffding tree detects zero-day attacks most accurately.


Fig. 3 Comparison of accuracy produced by ML techniques

We compared the time taken by the different machine learning techniques to detect zero-day attacks. From Fig. 4, it is evident that the random tree takes 4.36 s, significantly less than NB, RF and HT, while achieving an accuracy of 97.74%. The HT, whose detection accuracy is 99.97%, takes 5.94 s, more than NB, RF and RT; although its time is comparatively high, it still produces excellent detection accuracy. We also compared the memory used by the different machine learning techniques in detecting zero-day attacks. From Fig. 5, it is clear that NB occupies 0.03 MB of memory

Fig. 4 Comparison of time taken to detect zero-day attack


Fig. 5 Memory usage to detect zero-day attack

Table 2 Comparison of state-of-the-art methods

ML techniques                       Accuracy (%)
SVM [7]                             86.96
CART [20]                           80.3
KNN [20]                            79.4
C4.5 [20]                           81
OC-SVM [20]                         83.24
AdaBoost [21]                       97.53
Accuracy weighted ensemble [19]     95.30
Accuracy updated ensemble [19]      87.4
Random forest                       98.2
Naïve Bayes                         99.82
Random tree                         97.74
Hoeffding tree                      99.97

which is significantly less than RF, HT and RT. Next in line is HT, which occupies 0.08 MB of memory while giving the highest detection accuracy of 99.97%; although HT occupies slightly more memory than NB, it gives outstanding detection accuracy. Table 2 lists the accuracy obtained by various state-of-the-art methods. The Hoeffding tree gives a detection accuracy of 99.97%, which is 13.01, 19.67, 20.57, 18.97, 16.73, 2.44, 4.67, 12.57, 1.77, 0.15 and 2.23% higher than SVM, CART, KNN, C4.5, OC-SVM, AdaBoost, accuracy weighted ensemble, accuracy updated ensemble, random forest, Naïve Bayes and random tree, respectively.

5 Conclusion

This paper compares supervised learning techniques on streaming data to detect zero-day attacks. We analyzed four different ML techniques, namely


NB, RF, RT and HT. We found that the Hoeffding tree approach gives an outstanding detection accuracy of 99.97%, taking 5.94 s and 0.08 MB of memory. Although HT took more time than the other techniques, it showed low memory usage and high detection accuracy. The CICIDS dataset includes known zero-day attacks, which supervised learning can detect effectively; however, when an unknown zero-day attack appears, supervised learning may fail to detect it. In the future, combining supervised and unsupervised learning could detect unknown real-time zero-day attacks even more precisely, and multi-class classification of the various attacks can be performed.

Acknowledgements This publication is an outcome of the R&D work under MeitY's Visvesvaraya Ph.D. Scheme, Government of India, implemented by Digital India Corporation.

References

1. Hindy, H., Atkinson, R., Tachtatzis, C., Colin, J.N., Bayne, E., Bellekens, X.: Utilising deep learning techniques for effective zero-day attack detection. Electronics 9(10) (2020)
2. Green, C., Lee, B., Amaresh, S., Engels, D.W.: Comparative study of deep learning models for network intrusion detection. SMU Data Sci. Rev. 1 (2018)
3. Shone, N., Ngoc, T.N., Phai, V.D., Shi, Q.: A deep learning approach to network intrusion detection. IEEE Trans. Emerg. Top. Comput. Intell. 2 (2018)
4. Zero-Day Vulnerabilities, Exploits and Attacks: A Complete Glossary. https://www.cynet.com/network-attacks/zero-day-vulnerabilities-exploits-and-attacks-a-complete-glossary/, Cynet (2020)
5. Posey, B., Shea, S.: Zero-day (computer). https://searchsecurity.techtarget.com/definition/zero-day-vulnerability, TechTarget, August 2020
6. Hawkes, B.: 0day in the wild. Project Zero Team, Google (2019)
7. Kunang, Y.N., Nurmaini, S., Stiawan, D., Zarkasi, A., Jasmir, F.: Automatic features extraction using autoencoder in intrusion detection system. In: Proceedings of the 2018 International Conference on Electrical Engineering and Computer Science (ICECOS), Indonesia, pp. 219–224 (2018)
8. Kaur, R., Singh, M.: A hybrid real-time zero-day attack detection and analysis system. I.J. Comput. Netw. Inf. Secur. 9, 19–31 (2015)
9. Radhakrishnan, K., Mohan, R.R., Nath, H.V.: A survey of zero-day malware attacks and its detection methodology. IEEE Xplore (2019)
10. Verkerken, M., D'hooge, L., Wauters, T., Volckaert, B., De Turck, F.: Unsupervised machine learning techniques for network intrusion detection on modern data. In: 2020 4th Cyber Security in Networking Conference (CSNet), pp. 1–8 (2020). https://doi.org/10.1109/CSNet50428.2020.9265461
11. Zoppi, T., Ceccarelli, A., Bondavalli, A.: Unsupervised algorithms to detect zero-day attacks: strategy and application. IEEE Access 9, 90603–90615 (2021). https://doi.org/10.1109/ACCESS.2021.3090957
12. Kurniabudi, D.S., Darmawijoyo, Bin Idris, M.Y., Bamhdi, A.M., Budiarto, R.: CICIDS-2017 dataset feature analysis with information gain for anomaly detection. IEEE Access 8, 132911–132921 (2020). https://doi.org/10.1109/ACCESS.2020.3009843
13. Sharafaldin, I., Lashkari, A.H., Ghorbani, A.A.: Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: 4th International Conference on Information Systems Security and Privacy (ICISSP), Portugal (2018)
14. Panigrahi, R., Borah, S.: A detailed analysis of CICIDS2017 dataset for designing intrusion detection systems. Int. J. Eng. Technol. 7, 479–482 (2018)
15. Mera, C., Branch, J.W.: A survey on class imbalance learning on automatic visual inspection. IEEE Latin Am. Trans. 12, 657–667 (2014). https://doi.org/10.1109/TLA.2014.6868867
16. Longadge, R., Dongre, S.S., Malik, L.: Class imbalance problem in data mining: review. Int. J. Comput. Sci. Netw. 2 (2013). ISSN 2277-5420
17. Hadi, A.A.A.: Performance analysis of big data intrusion detection system over random forest algorithm. Int. J. Appl. Eng. Res. 13, 1520–1527 (2018)
18. Seraphim, B.I., Poovammal, E.: Analysis on intrusion detection system using machine learning techniques. In: Pandian, A., Fernando, X., Islam, S.M.S. (eds.) Computer Networks, Big Data and IoT, Lecture Notes on Data Engineering and Communications Technologies, vol. 66. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-0965-7_34
19. Akanksha, M.S., Khwaja Aamer, S.A.H.: Increasing efficiency of intrusion detection system using stream data mining classification. Int. J. Adv. Res. Ideas Innov. Technol. (IJARIIE) 3, 1253–1261 (2017)
20. Khraisat, A., Gondal, I., Vamplew, P., Kamruzzaman, J., Alazab, A.: Hybrid intrusion detection system based on the stacking ensemble of C5 decision tree classifier and one class support vector machine. MDPI Electron. 9, 1–18 (2020)
21. Abri, F., Siami-Namini, S., Adl Khanghah, M., Mirza Soltani, F., Siami Namin, A.: Can machine/deep learning classifiers detect zero-day malware with high accuracy? In: IEEE International Conference on Big Data, pp. 3252–3259 (2019)
22. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer Science & Business Media, New York (2017)

Comparative Evaluation of Machine Learning Methods for Network Intrusion Detection System

Sunil Kumar Rajwar, Pankaj Kumar Manjhi, and Indrajit Mukherjee

Abstract In this paper, we define outlier detection and its application areas. The most important field of outlier detection is network anomaly detection, which can be achieved by a network intrusion detection system. Several NIDSs have been developed and practically implemented. We present a comparative analysis of different machine learning methods for network anomaly detection. The standard KDD99 dataset is used worldwide for practical IDS evaluation; we use the 10% KDD99 dataset with the Weka software to analyze different machine learning algorithms. According to our observations, K-Star, random forest, BayesNet, logistic, IBk and decision table perform better than the other algorithms.

1 Introduction

The process of finding unexpected patterns in data is referred to as outlier detection [1]. These unexpected patterns are often referred to as anomalies, inconsistent observations, exceptions, glitches and defects. Outlier detection is widely used in several fields and for a wide range of application domains such as cyber security, financial transactions, surveillance, industrial flaw detection, tax fraud detection, insurance and many more.

An outlier detection system provides a systematic process to find anomalies in data that serve as meaningful information in a wide range of application domains. For example, an unexpected traffic pattern on a computer network can indicate that an attacker is sending confidential data to an unauthorized destination. Outlier detection techniques are also used to detect abnormal patterns in patients' medical records that may be symptoms of a new disease. Similarly, unexpected credit card transactions can indicate theft or abuse of credit cards [2]. Our research aims to provide a basic understanding of network anomalies and the different types of systems that detect them. We also evaluate different machine learning approaches for network anomaly detection and report their results on a benchmark network dataset using standard machine learning tools.

S. K. Rajwar (B): University Department of Computer Applications, Vinoba Bhave University, Hazaribag, Jharkhand, India
P. K. Manjhi: University Department of Mathematics, Vinoba Bhave University, Hazaribag, Jharkhand, India
I. Mukherjee: Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Jharkhand, India. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_47
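As a minimal illustration of flagging observations that deviate from normal behaviour, a z-score outlier detector can be sketched as follows. The threshold of 2.5 standard deviations is a common convention, and the traffic numbers are invented for illustration; neither comes from this paper.

```python
import numpy as np

def zscore_outliers(values, threshold=2.5):
    """Flag points more than `threshold` standard deviations from the mean.
    The threshold is a common convention, not a value from this paper."""
    x = np.asarray(values, dtype=float)
    z = np.abs((x - x.mean()) / x.std())
    return np.flatnonzero(z > threshold)

# Mostly steady traffic volumes with one obvious anomaly at index 9.
traffic = [100, 102, 98, 101, 99, 100, 103, 97, 100, 5000]
print(zscore_outliers(traffic))  # [9]
```

Real detectors are far more elaborate, but the core idea is the same: model normal behaviour, then flag observations that deviate from it by more than an accepted margin.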

2 Prior Survey

Network anomaly detection, or intrusion detection, is an important field of computer science dealing with network security and management. Several studies have been carried out in the field of network intrusion detection. Bhattacharyya et al. (2013) give comprehensive details about network anomalies, detection systems and the tools used; they also describe several recent systems developed for intrusion detection along with their advantages and disadvantages [3]. Hamid et al. (2016) provide a machine learning approach for IDS that is beneficial for upgrading several existing IDSs, enhancing performance and effectiveness in terms of accuracy and avoiding false positives [4].

3 Network Anomalies

Anomalies in a network generally refer to situations in which network operations deviate from normal network traffic [5]. Anomalies in a computer network can arise for a variety of reasons, including malfunctioning network devices, network traffic overload, denial-of-service attacks and network outages that interfere with the normal delivery of network services. These network anomalies obstruct the normal activity reflected in measurable network data. The behaviour of a normal network depends on a number of network-related factors, such as network traffic volume, the type of network data and the different applications running on the network. Accurate modelling of normal network behaviour remains an active area of research, especially the modelling of real-time network traffic [6]. For a network administrator, it is very difficult to observe the anomalies or rule violations that determine the fitness of the network. These conditions trigger an alarm to indicate a deviation from normal network behaviour and can occur before or during abnormal events; such deviations degrade the performance of the network. Network anomalies can be classified into two categories:


3.1 Network Configuration Anomalies

These anomalies occur due to improper configuration of the network. They include weak network architecture, improper configuration of network equipment, transmission of unprotected passwords and long-term use of the same password [7, 8].

3.2 Security-related Anomalies

These anomalies cover different types of attacks, including DoS attacks and network disruption. A DoS attack occurs when the services of the network are hijacked by malicious programs. Attackers may disable vital services such as DNS or shut down the network [9, 10]. Network bandwidth is hijacked, and legitimate users are disturbed by unnecessary flooding of the network [11, 12]. This type of anomaly produces a large volume of network traffic.

4 Network Anomalies Detection

The enormous growth of critical systems on networks and the development of specialized applications running in distributed environments are highlighting network security, since the integrity, availability and confidentiality of computing resources are compromised by attackers [3].

4.1 Intrusion Detection System (IDS)

The intrusion detection system (IDS) is the most important component in network management and anomaly identification. Finding relevant, hidden data in network traffic can be achieved by integrating data mining with the IDS at low execution time, which enables the network administrator to find anomalies early and take countermeasures to secure the network [13]. An intrusion detection system is located centrally in the network infrastructure to monitor and evaluate network activities. It consists of several components, including a data collection and preprocessing module, an intrusion identification module, an alert module and an intrusion models module. Figure 1 shows that the data collection module collects the network traffic and sends it to the preprocessing module. The preprocessing module removes noise and adds necessary attributes. The processed data are then sent to the intrusion identification module, which identifies and classifies intrusions according to their severity. The report module continuously checks the log and generates reports accordingly. If the


Fig. 1 General structure of intrusion detection system

report is normal, no action is taken; otherwise, an alarm is generated for reporting. Based on the report, the network administrator handles the situation in advance to avoid any intrusion into the network.

5 Network Anomalies Identification Techniques A network intrusion detection system (NIDS) is based on the premise that user behavior is observable and that intrusive behavior can be differentiated from normal network behavior [14, 15]. Although several network anomaly identification models are available, some issues complicate the design of an effective intrusion detection system:
• In the last decade, several network anomaly detection systems have been developed, but none is universally accepted and used.
• It is very difficult to separate noise from the data, because noise often looks like a real anomaly.
• There is a lack of publicly available standard network datasets for evaluating network intrusion detection.
• The constantly evolving nature of normal behavior limits the capabilities of current intrusion detection techniques. More powerful and sophisticated techniques are needed because intruders are aware of all current IDS systems.
In general, network anomaly detection systems fall into three categories:

Comparative Evaluation of Machine Learning Methods …


5.1 Misuse Detection Misuse detection finds abnormal behavior in network traffic by comparing it with known attack signatures [16]. Any match with a signature indicates a system intrusion. Misuse detection is more accurate than anomaly detection because of its lower false-alarm rate. Its main limitation is that the signature of each potential attack must be modeled. A further problem is that the signature database must be updated whenever new attacks appear.

5.2 Anomaly Detection In anomaly detection, the normal behavior of the network is built by modeling. User behavior is watched over time, and a model is created that effectively captures legitimate user behavior. Activities that deviate strongly from this model are considered anomalies. The main limitations of anomaly detection are modeling normal behavior and correctly identifying a user's behavior as normal. It also yields a high false-positive rate, which can make this form of detection ineffective and outdated [17].

5.3 Hybrid Approach The combination of misuse detection and anomaly detection is often referred to as hybrid detection. Its main advantage is that it combines the strengths of both misuse detection and anomaly detection. Its biggest disadvantage is the complexity that results from combining the two approaches.
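The hybrid idea, a signature check (misuse stage) followed by a deviation-from-baseline check (anomaly stage), can be shown in a few lines. This is a minimal sketch; the signatures, baseline rate, and threshold are all invented for illustration.

```python
# Hybrid detection sketch: misuse detection (signature matching) combined
# with anomaly detection (deviation from a modeled baseline).
# Signature names, rates, and thresholds are illustrative.

KNOWN_BAD_SIGNATURES = {"syn_flood", "ping_of_death"}

def hybrid_detect(event, baseline_rate, threshold=3.0):
    """Return the detection verdict for one traffic event.

    event: dict with a 'signature' label and an observed 'rate'.
    baseline_rate: modeled normal traffic rate.
    threshold: multiple of the baseline that counts as anomalous.
    """
    # Misuse stage: exact match against known attack signatures.
    if event["signature"] in KNOWN_BAD_SIGNATURES:
        return "intrusion (signature match)"
    # Anomaly stage: large deviation from the normal-behavior model.
    if event["rate"] > threshold * baseline_rate:
        return "intrusion (anomalous rate)"
    return "normal"

verdicts = [
    hybrid_detect({"signature": "syn_flood", "rate": 10}, baseline_rate=50),
    hybrid_detect({"signature": "http_get", "rate": 400}, baseline_rate=50),
    hybrid_detect({"signature": "http_get", "rate": 60}, baseline_rate=50),
]
```

Note that the second event carries no known signature and is caught only by the anomaly stage, which is exactly the complementarity the hybrid approach aims for.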

6 Machine Learning Machine learning deals with computer programs that learn automatically to predict complex patterns from data [18]. Nowadays, machine learning is widely used in decision-making, and network administrators widely use machine learning methods to improve the performance of intrusion detection systems. The most widely used machine learning algorithms are neural networks [19], support vector machines [20], and decision trees [21].


6.1 Supervised Learning Supervised learning is directly related to classification. In supervised learning, the program is trained with labeled data.

6.2 Unsupervised Learning Unsupervised learning is directly related to clustering: similar data are grouped together to find meaningful patterns. This paper focuses on a comparative evaluation of intrusion detection using supervised learning methods, including Naive Bayes, multilayer perceptron, K Star, decision table, J48, random forest, AdaBoost M1, OneR, and SMO. All of these methods are selected from the Classify tab of the Weka Explorer.

7 Datasets In supervised learning, datasets are an important component for training and testing models, and standard datasets are necessary for evaluating intrusion detection systems. A number of standard datasets have been developed over the last two decades. All of them were collected on dedicated network infrastructure with legitimate attack experiments to improve their quality. Dedicated projects all over the world have created benchmark datasets, including PREDICT, CAIDA, DEFCON, ADFA, NSL-KDD, KYOTO, ISCX 2012, and the ICX attack datasets [22]. As attackers develop enhanced skills and more advanced attacks, these datasets become obsolete. Nevertheless, it is essential for any network administrator to have sound knowledge of these datasets, as this helps them build their own network infrastructure and respond effectively to network intrusions in the future.

7.1 KDD Cup 1999 Datasets The KDD Cup 1999 dataset is a benchmark dataset widely used in intrusion detection research. Although the dataset is roughly 15 years old, it is still used extensively [23]. The main characteristics of the KDD 99 dataset are as follows: • It consists of a collection of 2 weeks of normal data and 5 weeks of attack instances. • The overall output is divided into 5 categories: normal, denial-of-service (DoS), user to root (U2R), remote to local (R2L), and probe.


Fig. 2 Attack distribution in 10% KDD99 dataset (total instances: 494,020): DoS 79%, normal 20%, probe 1%, U2R and R2L each below 1%

• In general, attack instances make up only about 2% of normal network traffic, but in KDD99, almost 80% of the instances are attacks, which makes this dataset a benchmark for evaluating any intrusion detection system. The full KDD99 dataset consists of 4,898,430 instances, which makes it unsuitable for most machine learning algorithms. In our experiment, we therefore use the 10% subset of the KDD99 dataset to reduce complexity; its attack distribution is shown in Fig. 2.
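The five-way grouping and the subsampling step can be illustrated as follows. The connection labels shown (smurf, neptune, satan, …) are genuine KDD99 labels, but the label list and its proportions are synthetic stand-ins for the real file, which the paper downloads separately.

```python
import random
from collections import Counter

# Grouping a few representative KDD99 connection labels into the five
# output categories, then drawing a random subsample, in the spirit of
# the experiment's use of a reduced dataset. The label list below is
# synthetic, with proportions loosely mimicking the 10% KDD99 subset.
CATEGORY = {
    "normal": "normal",
    "smurf": "DoS", "neptune": "DoS", "back": "DoS",
    "satan": "probe", "ipsweep": "probe", "portsweep": "probe",
    "guess_passwd": "R2L", "warezclient": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R",
}

random.seed(0)
labels = (["smurf"] * 790 + ["normal"] * 197 + ["satan"] * 8
          + ["guess_passwd"] * 4 + ["rootkit"] * 1)  # 1000 synthetic labels
sample = random.sample(labels, len(labels) // 10)     # 10% subsample

dist = Counter(CATEGORY[l] for l in sample)
```

With a large majority class, a uniform random subsample preserves the heavy DoS skew that makes KDD99 distinctive.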

8 Evaluation Metrics To evaluate the machine learning algorithms, we use the following metrics:

(a) Total Acc: Total accuracy is the percentage of correct predictions over the total number of instances.

Total Acc = (t positives + t negatives) / (t positives + f positives + t negatives + f negatives)

(b) Recall: The proportion of true positives that are correctly identified.

Recall = t positives / (t positives + f negatives)

(c) Precision: The proportion of predicted positives that are actually positive.

Precision = t positives / (t positives + f positives)

(d) F-Measure: The harmonic mean of recall and precision.

F-Measure = (2 * Precision * Recall) / (Precision + Recall)

(e) TP Rate: The proportion of positives that are correctly identified as positives.

TP Rate = t positives / (t positives + f negatives)

(f) TN Rate: The proportion of negatives that are correctly identified as negatives.

TN Rate = t negatives / (f positives + t negatives)

9 Machine Learning Tools 9.1 WEKA Weka is open-source software used by most researchers in data mining and knowledge discovery [24]. It provides a number of algorithms for supervised and unsupervised learning. The Weka Explorer includes methods for the major data mining problems, including regression, classification, association rule mining, and feature selection.


Fig. 3 Results of several machine learning methods

10 Experiments and Results In our experiment, we evaluate different machine learning techniques in the Weka environment. The minimum computational environment for the experiment is as follows: CPU: Intel Core i3 with a 2.3 GHz processor; RAM: 4 GB; O/S: 64-bit Windows 10. We begin by evaluating the performance of different machine learning techniques on a dataset of 21,679 instances, selected at random from the 10% KDD99 dataset. A total of 12 features are used for this experiment: 11 features extracted with the Weka tools, plus the type attribute for categorizing the different attacks. Figure 3 shows the results of the different machine learning techniques on this reduced dataset of 12 features.
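The comparison performed in Weka (train each classifier, measure its accuracy on held-out data) can be mimicked in pure Python with two trivial classifiers. This sketch is an analogue of the workflow, not a reproduction of the Weka experiment; the dataset and classifier choices are illustrative.

```python
# A pure-Python analogue of comparing classifiers on a labeled dataset,
# in the spirit of the Weka Classify tab. The tiny 2-D dataset and the
# two classifiers (majority baseline vs. 1-nearest-neighbor) are illustrative.

def majority_classifier(train):
    # Predict the most frequent training label for every instance.
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def one_nn_classifier(train):
    # Predict the label of the nearest training point
    # (1-NN, squared Euclidean distance).
    def predict(x):
        nearest = min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))
        return nearest[1]
    return predict

def accuracy(predict, test):
    return sum(predict(x) == y for x, y in test) / len(test)

train = [((0, 0), "normal"), ((0, 1), "normal"), ((9, 9), "attack")]
test = [((1, 0), "normal"), ((8, 9), "attack"), ((0, 2), "normal")]

results = {
    "majority": accuracy(majority_classifier(train), test),
    "1-NN": accuracy(one_nn_classifier(train), test),
}
```

On a class-imbalanced set like KDD99, the majority baseline matters: a classifier must beat it to demonstrate that it has actually learned the minority (attack or normal) classes.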

11 Conclusion This experiment evaluates different machine learning algorithms on the benchmark KDD99 dataset. The results show that among the ML techniques compared, K Star, random forest, BayesNet, logistic, IBk, and decision table perform better as


compared to the others. Due to the large number of false positives generated during intrusion detection, it is necessary to update the algorithms to enhance the efficiency of IDSs in the future. Our experiment also shows that the classification methods do not depend on all 41 attributes of the KDD99 dataset; we can extract the important attributes to obtain faster and more accurate results.

References 1. Chandola, V., Banerjee, A., Kumar, V.: Outlier Detection: A Survey. ACM Computing Surveys (2009) 2. Gogoi, P., Bhattacharyya, D.K., Borah, B., Kalita, J.K.: A survey of outlier detection methods in network anomaly identification. The Computer Journal 54(4) (2011) 3. Bhattacharyya, D.K., Kalita, J.K.: Network Anomaly Detection: A Machine Learning Perspective. Chapman and Hall/CRC, p. 366. ISBN 9781466582088 (2013) 4. Hamid, Y., Balasaraswathi, R., Sugumaran, M.: IDS using machine learning: current state of art and future directions. Br. J. Appl. Sci. Technol. 15(3), 1–22 (2016) 5. Khraisat, A., Gondal, I., Vamplew, P., Kamruzzaman, J.: Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity (2019) 6. Ye, T., Kalyanaraman, S., Harrison, D., Sikdar, B., Mo, B., Kaur, H.T., Vastola, K., Szymanski, B.: Network management and control using collaborative on-line simulation. Proc. CNDSMS (2000) 7. Thottan, M., Ji, C.: Using network fault predictions to enable IP traffic management. J. Netw. Syst. Manage. (2000) 8. Maxion, R., Feather, F.E.: A case study of ethernet anomalies in a distributed computing environment. IEEE Trans. Reliability 39, 433–443 (1990) 9. Vigna, G., Kemmerer, R.A.: NetSTAT: a network-based intrusion detection approach. Proc. ACSAC (1998) 10. Yang, J., Ning, P., Wang, X.S., Jajodia, S.: CARDS: a distributed system for detecting coordinated attacks. Proc. SEC, 171–180 (2000) 11. Wang, H., Zhang, D., Shin, K.G.: Detecting SYN flooding attacks. Proc. IEEE INFOCOM (2002) 12. Savage, S., Wetherall, D., Karlin, A.R., Anderson, T.: Practical network support for IP traceback. Proc. ACM SIGCOMM, 295–306 (2000) 13. Nadiammai, Hemalatha, M.: Effective approach toward intrusion detection system using data mining techniques. Egypt. Inform. J. 15 (2014) 14. Stallings, W.: Network and Internetwork Security: Principles and Practice. Prentice Hall, Englewood Cliffs 15. Verwoerd, T., Hunt, R.: Intrusion detection techniques and approaches. Elsevier Comput. Commun. 25(15), 1356–1365 (2002) 16. Anonymous: Intrusion detection FAQ. Available: http://www.sans.org/security-resources/idfaq/ (2010). Accessed 19 May 2010 17. Julian, S., Malki, H.: Network intrusion detection system using neural networks. In: ICNC'08, Fourth International Conference. IEEE (2008) 18. Machine learning. [Online] Available: https://en.wikipedia.org/wiki/Machine_learning (2015) 19. Tong, D.L., Mintram, R.: Genetic Algorithm-Neural Network (GANN): a study of neural network activation functions and depth of genetic algorithm search applied to feature selection. Int. J. Mach. Learn. Cybern. 1(1–4), 75–87 (2010) 20. Peddabachigari, S., Abraham, A., Thomas, J.: Intrusion detection systems using decision trees and support vector machines. Int. J. Appl. Sci. Comput. 11(3), 118–134 (2004)


21. Sindhu, S.S.S., Geetha, S., Kannan, A.: Decision tree based light weight intrusion detection using a wrapper approach. Elsevier Expert Syst. Appl. 39(1), 129–141 (2012) 22. Ahmed, M., Mahmood, A.N., Hu, J.: A survey of network anomaly detection techniques. Journal of Network and Computer Applications 60, 19–31 (2016) 23. KDD Cup 1999 Data. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (1999) 24. WEKA, http://www.cs.waikato.ac.nz/ml/weka/

Rare Pattern Mining from Data Stream Using Hash-Based Search and Vertical Mining Sunitha Vanamala, L. Padma Sree, and S. Durga Bhavani

Abstract Rare itemset mining is an emerging research domain in data mining. Patterns with low support and high confidence are referred to as rare patterns, and in certain application domains, such as the analysis of network logs, online customer purchase behavior, online banking transactions, sensor data, and stock market data, they are more interesting than frequent patterns. Many applications generate large volumes of continuous data streams, and identifying rare patterns in such streams requires efficient stream-processing algorithms. Many research articles on rare pattern mining exist for static databases, but algorithms designed for static databases cannot be applied directly to data streams. Hence, we need algorithms specifically designed for data stream processing to mine important rare patterns. Rare pattern mining from data streams is still at a budding stage, and only a few algorithms are available. To address this, we propose HEclat-RPStream, an Eclat-based method that mines rare patterns from a data stream using vertical mining with bitsets. The discovered patterns are maintained in a prefix-based rare pattern tree, which uses double hashing to maintain rare patterns over the stream. The algorithm uses both Breadth First Search (BFS) and Depth First Search (DFS) to discover interesting large itemsets, and a time-sensitive sliding window to capture the most recent patterns. A pruning technique based on two-itemsets is used to optimize performance. The experimental results of the proposed method demonstrate good performance in terms of execution time and the total number of rare patterns generated.

S. Vanamala (B) Department of CS, TSWRDCW, Warangal East, Warangal, Telangana, India L. P. Sree Department of ECE, VNR Vignan Jyothi Institute of Technology and Science, Hyderabad, Telangana, India e-mail: [email protected] S. D. Bhavani School of Information Technology, JNTUH, Hyderabad, Telangana, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_48


1 Introduction The extraction of frequent patterns is an important stage in association rule mining, since it reduces the search time and the number of association rules created. The classical formulation of association rules assumes that all items in the database have identical occurrence frequencies, so a single minSUP value is used for the entire database. However, because most real-world databases are non-uniform, mining frequent patterns (or association rules) with a single minSUP constraint causes the following issues: (i) if minSUP is set too high, patterns involving rare itemsets will not be found; (ii) to detect patterns involving both common and rare items, minSUP must be lowered to a relatively low value. However, because the common items will then be related with one another in all feasible ways, many of the resulting patterns may be meaningless for the user or application, and this may cause a combinatorial explosion, producing far too many patterns.

2 Related Work The subject of mining association rules from databases has been extensively addressed in the literature since the advent of association rules. Market basket analysis is a typical application, in which association rule mining examines how the items purchased by customers are related. Research into mining frequent itemsets from data streams falls into three types: landmark-window-based data stream mining, damped-window-based data stream mining, and sliding-window-based methods [1–4]. A two-phase approach was developed by Tanbeer et al. [5]. In the first phase, the items of the sliding window are kept in a compact tree structure, the CPS tree, in a predetermined order. In the second phase, the tree is reconstructed to derive all closed frequent itemsets. It is more efficient than the CET method, but the tree must be rebuilt for each item appearing for the first time, so it takes more space and time to compute. Deypir et al. [6] proposed three steps, window initialization, sliding window, and pattern generation, comprising window setup, an updating operation, and pattern generation. The sliding window is an attractive model for frequent pattern mining problems because it does not need to evaluate the complete history of incoming transactions and can handle just a narrow range of recent transactions. In the past, however, sliding window algorithms have required a large amount of memory and processing time. Gangin et al. [7] propose a method for finding the most recent patterns in a data stream using weighted maximal frequent itemsets and the sliding window technique, in which a single scan reads the database, loads the transactions, and finds the most recent frequent patterns using a sliding window model. The model's tree structure is limited, however, because they used an FP-like tree that needs more time, so


the new approach addresses the restrictions of the FP tree by using a B+ tree to increase performance and generate the frequent itemsets rapidly. To generate a set of rare itemsets, Huang et al. [8] introduced a new approach, SRP. The items in each newly arrived transaction are added into a prefix tree based on the FP tree approach. In most cases, an FP tree is constructed by putting all items of the transactions in descending order of support count. In the case of data streams, however, it is not possible to arrange the items in such an order. To solve this difficulty, a structure known as a connection table is used to keep track of the elements in the sliding window in canonical order. If an item has less than the minimum support, the path containing the item generates the entire set of rare itemsets. RP-Tree [9] avoids the costly itemset creation process by using pruning, and patterns are kept in a tree data structure based on the FP tree. Sunitha et al. [10] use multiple minimum item support (MIS) values to solve the problem of missing rare pattern sets with low support but high confidence, using bitsets and vertical pattern mining. Sunitha et al. [11] provided a method for analyzing data and identifying useful rare association rules; a sliding window approach with a vertical bit-sequence format is used to execute the rare association rule mining procedure, and rare associations can be found efficiently within a search. Almuammar et al. [12, 13] proposed methods to find frequent patterns in data streams; the algorithms use an adaptive learner to classify multiple data streams, multiple support thresholds, and tilted time windows for processing the stream data. Sunitha et al. [14] proposed an Eclat_RPGrowth technique based on Zaki's Eclat [15], a vertical pattern mining approach with bitsets; an efficient pruning strategy and a linear tree implementation are key points of this approach. Ghatage [17] proposed an effective compact sliding window tree structure with enhanced weight conditions to detect frequent patterns within a single database scan over a dynamic data stream. Borah and Nath [18] proposed a single-pass rare pattern mining technique to detect outliers in incremental databases.

2.1 Basic Terminology Let I = {i1, i2, i3, …, in} be the list of items in the database. The data stream is a continuous stream of incoming transactions, each arriving at a particular timestamp, denoted ds = {t1, t2, …, tn, tn+1, …}; each transaction contains a subset of the items of I. An itemset is a nonempty set of one or more items from I; an itemset with k items is referred to as a k-itemset. The support of an itemset X, denoted |X|, is the fraction of transactions in the database that contain X. Block: The incoming transactions are grouped into blocks based on a time unit; each block is a set of transactions, and the total number of transactions in a block is the block size |B|. |B| is time sensitive and may vary between blocks.


Frequent Itemset: An itemset X is referred to as frequent if its support is higher than the user-defined Minimum Support Threshold1 (MST1), i.e., |X| > MST1. Rare Itemset: An itemset is referred to as rare if its support is less than MST1 and greater than or equal to Minimum Support Threshold2 (MST2), i.e., MST2 <= |X| < MST1.
Algorithm 1: HECLAT-RPStream
blockid = 0;
prefixBvNodeList pbvlist;   // holds one-items
BlockSupports bsup;         // block-wise parameters
Globals gb;                 // support parameters of data stream
while (datastream)
begin
  blocksize = 0;
  while (currenttimeunit)
  begin
    updateitemslistwithnewtransaction();
    blocksize++;
  end
  bsup = updateSupportThresholdsArrays();
  filteroneitemslist(bsup);
  generateLargeItemsWithPrefixBasedBFSnDFS(rpt, gb, bsup);
  blockid++;
  displayRarepatterns();
end
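The two-threshold classification defined above can be written directly as code. This is a minimal sketch; the threshold values, the example supports, and the "noise" label for itemsets below MST2 are illustrative.

```python
# Classifying itemsets by the two support thresholds defined above:
# frequent if support > MST1, rare if MST2 <= support < MST1.
# Threshold values and supports are illustrative fractions of transactions.
MST1, MST2 = 0.30, 0.05

def classify(support):
    if support > MST1:
        return "frequent"
    if support >= MST2:
        return "rare"
    return "noise"  # below MST2: too infrequent to keep

supports = {"a": 0.50, "ab": 0.12, "abc": 0.02}
kinds = {itemset: s_kind for itemset, s_kind in
         ((name, classify(s)) for name, s in supports.items())}
```

The two thresholds carve the support range into three bands, so rare patterns can be reported without also admitting every vanishingly infrequent candidate.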

2.2 Rare Pattern Mining from a Data Stream The proposed approach finds rare and important patterns from the data stream with a time-sensitive sliding window and is based on the Eclat algorithm; the sliding window model is used because many applications require analysis of recent data rather than old data. The proposed method represents the data in vertical data format with bit vectors.
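The vertical bit-vector representation can be sketched in a few lines: one bitset per item, with bit i set when the item occurs in transaction i. This is an illustrative fragment (using Python integers as bitsets), not the paper's Java implementation.

```python
# Converting horizontal transactions into the vertical bit-vector format:
# one bitset per item; bit i is set when the item occurs in transaction i.
# The support of an itemset is the popcount of the AND of its items' bitsets.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]

bitvec = {}
for i, t in enumerate(transactions):
    for item in t:
        bitvec[item] = bitvec.get(item, 0) | (1 << i)

def support(*items):
    bits = bitvec[items[0]]
    for item in items[1:]:
        bits &= bitvec[item]         # intersection of transaction sets
    return bin(bits).count("1")      # number of supporting transactions
```

This is why the single-scan property holds: once the bitsets are built from one pass over the block, the support of any larger itemset comes from bitwise ANDs, with no further scan of the data.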

2.3 Hybrid Eclat Algorithm for Rare Pattern Mining from Data Stream: HEclat-RPStream The model uses two support thresholds, Minimum Support Threshold1 (MST1) and Minimum Support Threshold2 (MST2), and a window size, which is the maximum number of blocks present in each window. Blocks may overlap between windows, so within each window the data of already-processed blocks is reused; modifications are required only for the newly arrived block and for the oldest block, which expires upon


arrival of the new block. The model is capable of finding both frequent and rare itemsets from stream data, and a correlation measure, lift, may be used to filter the rules. In this article, we propose HEclat-RPStream; as the algorithm uses both Breadth First Search (BFS) and Depth First Search (DFS), we refer to it as a hybrid Eclat. The features of the algorithm are given below:
1. The algorithm is based on Eclat and uses a bitset to represent the transaction set of an itemset instead of a tidset. The important property of this representation is that the data is scanned only once to mine rare patterns, which is essential for stream processing, since the stream cannot be stored.
2. Each transaction block is scanned once to find the support of the one-itemsets. All items whose support >= MST2 are considered and used to generate large items, because a rare item in the current block may become frequent (or vice versa) later in the data stream, as described in Algorithm 1.
3. A prefix node is used to store itemsets, as it optimizes memory usage: the node stores the prefix only once for all suffix items, and an itemset is formed by concatenating the prefix and a suffix item. The prefix is the first (n-1) items of the itemset; the suffix is the nth item. A PrefixBVlist is also used, which initially holds two-item prefixbv nodes that are extended to generate large n-itemsets, where n is a user-defined parameter.
4. BFS is used to generate two-itemsets by intersecting the bitsets of the corresponding parent one-itemsets. To intersect any two itemsets, all items except the suffix item must be the same. For example, ab and ac can be intersected, while abd and acd cannot, because the first (n-1) items of abd and acd (ab and ac) are not the same; the same rule applies to higher-order itemsets. For each one-item, a prefixbv node is created with the given one-item as prefix, suffix items are generated for the eligible two-items, and an ArrayList is maintained that holds all generated prefixbv nodes. For example, if the given window has four one-items a, b, c, d, then 3 prefixbv nodes (for a, b, and c) with their corresponding suffix item lists are created and added to the ArrayList.
5. Large rare itemsets are generated using DFS with the two-itemset optimization; a stack data structure is used to implement the DFS. Generated large items are stored in a two-level hash map, as described in Algorithm 2. Algorithms 3 and 4 update the rare items from a prefix node into the hash table.

2.4 Extending Two-Items to Generate Large Itemsets Optimization Based on Two-Itemsets: To generate two-items, the algorithm uses breadth first search; all two-items are generated and stored in a hash table. These two-itemsets are then used to prune the number of candidates while generating large itemsets. For example, itemset {abcd} is joined with {abde} to generate the large 5-itemset {abcde}; however, if the itemset {de} does not satisfy the support criterion for a rare itemset, then {abcde} cannot satisfy it either, so before


joining itemsets, the support of the pair formed by the last item of the first itemset and the last item of the second itemset is checked in the two-itemsets hash table to prune unfruitful candidate rare itemsets. This pruning strategy reduces the number of intersection operations performed while generating large itemsets, and hence improves the algorithm's efficiency in terms of execution time.
Algorithm 2: Generate Large Items With Prefix-Based BFSnDFS
Input: oneItemlist, MST2, rootpotentialItems rpt, gb, bsup.
Output: Set of patterns with support > MST2.
numOfOneitems0
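The pruning check described above can be sketched as a join guard. This is an illustrative Python fragment under the strict Eclat join rule (all but the last item equal); the support values in the two-itemset table are invented.

```python
# Sketch of the two-itemset pruning: before joining two itemsets that
# share all but their last item, look up the support of the pair formed
# by the two differing suffix items. If that pair is below the minimum
# rare support, the join cannot yield a valid itemset and is skipped.
# Support values here are illustrative counts.
MST2 = 2
two_item_support = {("d", "e"): 1, ("c", "d"): 3}  # precomputed via BFS

def can_join(itemset1, itemset2):
    # Eclat join rule: both itemsets must share all but their last item.
    if itemset1[:-1] != itemset2[:-1]:
        return False
    pair = tuple(sorted((itemset1[-1], itemset2[-1])))
    # Prune: the suffix pair must itself satisfy the support bound.
    return two_item_support.get(pair, 0) >= MST2

joins = {
    "abcd+abde": can_join(("a", "b", "c", "d"), ("a", "b", "d", "e")),  # prefixes differ
    "abcd+abce": can_join(("a", "b", "c", "d"), ("a", "b", "c", "e")),  # pruned by {d,e}
    "abc+abd": can_join(("a", "b", "c"), ("a", "b", "d")),              # allowed
}
```

Because the check is a constant-time hash lookup, it is far cheaper than the bitset intersection it avoids.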

2.5 Structure of Two-Level Hash Table To store and maintain the patterns generated from the data stream, the tree is implemented as a linear tree using an ArrayList. Each entry in the ArrayList is a hash map (level map); a level map keeps itemsets of the same size, so n level maps are created to store itemsets of up to length n, where n is the maximum itemset size maintained. The structure of the two-level hash map is shown in Fig. 1. The rare patterns extracted from the hash table are those that satisfy the support criterion given in the rare-itemset definition. Algorithm 3: insertOrUpdateOneItems 1.

if(globals.Blockid==0)

Fig. 1 Two-level hash table structure: the root is a list of level maps (Level0Map … LevelnMap); each level map points to a suffix map, and each suffix item entry is a linked list holding the support and bitset

2. create a levelwiseprefixmap and suffixmap with the items in the one-items list
3. else
4. if (globals.Blockid >= gb.windowsize)
5. begin
6. discountOldBlockSupports(bsup);
7. end
8. insert or update levelwiseprefixmap and suffixmap with the items in the one-items list

Algorithm 4: Insert Or Update Large Items
1. for (; rootSize … = gb.windowsize)
2.   discountOldBlockSupports(bsup);
3.   create or update levelwiseprefixmap and suffixmap with items in prefixnode
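The two-level structure of Fig. 1 and the insert-or-update operation of Algorithms 3 and 4 can be modeled with nested dictionaries. This is a minimal sketch; the field names and the block supports are illustrative, not the paper's Java structures.

```python
# A minimal model of the two-level hash table in Fig. 1: a list of level
# maps (one per itemset length), each mapping a prefix tuple to a suffix
# map, which maps the suffix item to its accumulated support and bitset.
# Structure and field names are illustrative.
MAX_LEN = 4
levels = [dict() for _ in range(MAX_LEN)]   # levels[k] holds (k+1)-itemsets

def insert_or_update(itemset, support, bitset):
    prefix, suffix = tuple(itemset[:-1]), itemset[-1]
    level = levels[len(itemset) - 1]
    suffix_map = level.setdefault(prefix, {})        # first-level lookup
    entry = suffix_map.setdefault(suffix, {"support": 0, "bitset": 0})
    entry["support"] += support                      # accumulate block support
    entry["bitset"] |= bitset

insert_or_update(("a", "b"), support=2, bitset=0b0101)
insert_or_update(("a", "b"), support=1, bitset=0b1000)   # later block
insert_or_update(("a", "b", "c"), support=1, bitset=0b0100)

entry_ab = levels[1][("a",)]["b"]
```

Storing the prefix once per suffix map mirrors the prefix-node memory optimization: the prefix key is shared by all of its suffix entries instead of being repeated per itemset.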

3 Experimental Results The algorithm is compared with the SRP tree algorithm on different datasets from the FIMI repository [16]; the details of the datasets are shown in Table 1. The graph in Fig. 2 compares the time complexity of the proposed HECLAT_RPStream with the SRP tree; our algorithm is more efficient in execution time than the SRP tree. The block size is taken as 25 K, and execution speed (in ms) is reported as the performance indicator. The experiments were carried out on an Intel Core i5 at 2.4 GHz running Windows 10 with 8 GB RAM, and the source code is written in Java. Table 2 describes the characteristics of the Ecoli dataset from the UCI machine learning repository [19]; it contains information about the localization sites of proteins. Table 3 shows the experimental values obtained with varying minimum frequent support and window sizes, which are input parameters of the algorithm. The execution time and the number of rare itemsets generated are proportional to the minimum frequent support.


Fig. 2 Performance comparison of HECLAT-RPstream and SRP tree

Table 1 Data sets and performance comparison

Dataset        | Block size (K) | MST1 (min rare support) | MST2 (min frequent support) | SRP Tree no. of itemsets | HECLAT_RPStream no. of itemsets | SRP Tree (ms) | HECLAT_RPStream (ms)
T10I4D100K     | 25             | 0.0001                  | 0.0005                      | 1161                     | 294,901                         | 12            | 82.896
T40I10D100K    | 25             | 0.003                   | 0.05                        | 4,734,806                | 2,590,964                       | 301           | 74.294
Kosark (250 K) | 25             | 0.001                   | 0.15                        | 35,623,519               | 241,396                         | 703           | 85.077

Table 2 Ecoli dataset

Number of instances  | 336
Number of attributes | 8

Table 3 Performance comparison on Ecoli dataset

S. No. | Min. freq. support | Min. rare support | Number of items | Time  | Window size
1      | 0.1                | 0.01              | 1856            | 0.246 | 2
2      | 0.2                | 0.01              | 2243            | 0.251 | 2
3      | 0.3                | 0.01              | 2412            | 0.229 | 2
4      | 0.4                | 0.01              | 2499            | 0.21  | 2
5      | 0.3                | 0.01              | 2412            | 0.229 | 2
6      | 0.3                | 0.01              | 2362            | 0.244 | 3
7      | 0.3                | 0.01              | 2805            | 0.261 | 4

Figure 3 depicts the number of rare itemsets generated for different support values with a window size of 2, where each block has 80 instances. Discounting is performed when the number of blocks exceeds the window size.
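The discounting step, subtracting the oldest block's contribution when the window is full, can be sketched with a deque of per-block counts. This is an illustrative fragment; the itemset counts and window size are invented.

```python
from collections import deque

# Sketch of time-sensitive sliding-window discounting: per-block support
# counts are kept in a deque; when the window is full, the oldest block's
# contribution is subtracted before the new block is added.
# Window size and counts are illustrative.
WINDOW_SIZE = 2
window = deque()    # per-block {itemset: count}
total = {}          # window-wide supports

def add_block(block_counts):
    if len(window) == WINDOW_SIZE:       # window full:
        expired = window.popleft()       # discount the oldest block
        for itemset, c in expired.items():
            total[itemset] -= c
    window.append(block_counts)
    for itemset, c in block_counts.items():
        total[itemset] = total.get(itemset, 0) + c

add_block({("a",): 5, ("a", "b"): 2})
add_block({("a",): 3})
add_block({("a", "b"): 4})   # the first block expires here
```

After the third block arrives, the first block's counts have been removed, so the totals reflect only the two most recent blocks, which is exactly the recency bias the sliding window is meant to provide.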


Fig. 3 Number of rare items generated versus minimum frequent support

4 Conclusion In this paper, we have shown an implementation of rare pattern mining from data streams with a time-sensitive sliding window; the algorithm is Eclat based and implemented with bitsets. Time sensitive means that the number of items in a block may vary depending on the time. The algorithm uses a two-level hash table for recording the generated large items: the first-level hash map is searched for valid prefixes, and the second level identifies the corresponding suffix item. In addition to the hash tree, the algorithm uses BFS and DFS while generating large items in each block. An optimization based on the support of two-itemsets is another important aspect of the algorithm; it improves the overall efficiency in terms of time complexity because it considerably reduces the number of intersections. The proposed algorithm may be extended to work with files when main memory is not sufficient. In our implementation, we have used various input support thresholds for analyzing the data. The proposed method has shown improved performance over other related algorithms. The major key points are: 1. two-itemset optimization, 2. prefix-node-based large-item generation, and 3. vertical mining with bitsets instead of tidsets.

References 1. Li, H.-F., Lee, S.-Y.: Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Syst. Appl. 36, 1466–1477 (2009) 2. Gaber, M.M., Zaslavsky, A., Krishnaswamy, S.: Mining data streams: a review. ACM SIGMOD Rec. 34(2), 18–26 (2005) 3. Koh, L., Shin, S.-N.: An approximate approach for mining recently frequent itemsets from data streams. Proc. DaWaK, 352–362 (2006) 4. Cheng, J., Ke, Y., Ng, W.: Maintaining frequent closed itemsets over a sliding window. J. Intell. Inf. Syst. 31, 191–215 (2008)


5. Tanbeer, S.K., Ahmed, C.F., Jeong, B.S., Lee, Y.K.: Efficient single-pass frequent pattern mining using a prefix-tree. Inf. Sci. 179(5), 559–583 (2009) 6. Deypir, M., Sadreddini, M.H., Hashemi, S.: Towards a variable size sliding window model for frequent item set mining over data streams. Comput. Ind. Eng. 63(1), 161–172 (2012) 7. Lee, G., Yun, U., Ryu, K.H.: Sliding window based weighted maximal frequent pattern mining over data streams. Expert Syst. Appl. 41, 694–708 (2014) 8. Huang, D., Koh, Y.S., Dobbie, G.: Rare pattern mining on data streams. In: Data Warehousing and Knowledge Discovery, Lecture Notes in Computer Science, vol. 7448, p. 30 (2012) 9. Tsang, S., Koh, Y.S., Dobbie, G.: RP-Tree: rare pattern tree mining. In: Cuzzocrea, A., Dayal, U. (eds.) DaWaK 2011. LNCS, vol. 6862, pp. 277–288. Springer, Heidelberg (2011) 10. Vanamala, S., Padma Sree, L., Durga Bhavani, S.: Efficient rare association rule mining algorithm. Int. J. Eng. Res. Appl. (IJERA) 3(3), 753–757 (2013) 11. Vanamala, S., Sree, L., Bhavani, S.: Rare association rule mining for data stream. In: International Conference on Computing and Communication Technologies, pp. 1–6 (2014) 12. Almuammar, M., Fasli, M.: Learning patterns from imbalanced evolving data streams. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 2048–2057 (2018) 13. Almuammar, M., Fasli, M.: Pattern discovery from dynamic data streams using frequent pattern mining with multi-support thresholds. In: Frontiers and Advances in Data Science (FADS) 2017 International Conference, pp. 35–40 (2017) 14. Vanamala, S., Padma Sree, L., Durga Bhavani, S.: Eclat_RPGrowth: finding rare patterns using vertical mining and rare pattern tree. In: Pandian, A., Fernando, X., Islam, S.M.S. (eds.) Computer Networks, Big Data and IoT. Lecture Notes on Data Engineering and Communications Technologies, vol. 66. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-0965-7_14 15. Zaki, M.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000) 16. Frequent itemset mining dataset repository. http://fimi.uantwerpen.be/data/ 17. Anil Ghatage, R.: Frequent pattern mining over data stream using compact sliding window tree and sliding window model. IRJET 02(15) (2015) 18. Borah, A., Nath, B.: Incremental rare pattern based approach for identifying outliers in medical data. Appl. Soft Comput. 85 (2019) 19. Frank, A., Asuncion, A.: UCI machine learning repository (2010). http://archive.ics.uci.edu/ml

A Novel ARDUINO Based Self-Defense Shoe for Women Safety and Security

Swarnalatha Pasupuleti, Sattibabu Gummarekula, V. Preethi, and R. V. V. Krishna

Abstract Women have become stronger and more autonomous in today's world, yet their safety remains a concern every year, every day, every minute, and every second. The government has enacted several preventive laws to deter molesters, but the rate of these crimes continues to rise. This paper proposes a smart shoe as a novel innovation to protect girls and women. The shoe contains a high-voltage shock system, built into our prototype using a stun gun, so the paper focuses on security and self-defense for women so that they feel safe and can defend themselves from an assailant. The electronic system in the shoe is made up of several modules, including GPS, GSM, a shock circuit, an LED, and a charging port. We used an ARDUINO to build this smart shoe for women's safety.

1 Introduction

Our first priority is the safety of women. It is important for everyone, from government to corporations to individuals, to come together to make India a safer place for women and girls to work, study, and live. A search on the Internet turns up numerous cases of women and girls being raped by cab drivers, molested by men posing as travel guides, and subjected to foot-binding, acid attacks, and child marriage in different states of India. "Once you feel some sort of trouble related to your safety in a place, it is hard to ever feel safe in that place again". Roughly one in three women and girls experiences molestation in her lifetime. Drawing on the existing systems, we therefore came up with the idea of equipping a shoe sole with an alerting and self-defense circuit. The confidence to move freely with a self-reliant safety gadget [1] at your side further improves women's safety. Our smart shoe causes no discomfort but guards the wearer like a warden or custodian.

S. Pasupuleti (B) · S. Gummarekula · V. Preethi · R. V. V. Krishna
Department of Electronics and Communication Engineering, Aditya College of Engineering and Technology, Surampalem, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_49


This type of reliable safety system is made feasible by embedding the entire self-defense and alerting system in a shoe sole. The sole is produced by a 3D printing process, which provides solidity and light weight even though it carries a rechargeable battery. Women and girls wearing this smart shoe will find it light and comfortable to carry the safety equipment with them anywhere. This paper proposes the design of a self-reliant security and defense system, the smart shoe [2], which fits the basic alerting essentials, such as a buzzer, calling, messaging, and location sending, independently in case of peril [3]. The idea is innovative and reliable for protecting girls and women, and comfortable to wear, because the 3D-printed sole remains lightweight even after being equipped with the security devices. The paper focuses on security and self-defense for women and girls so that they never feel deprived of their strength; it is high time for such a change. The electronic system fitted in the shoe consists of various modules such as GPS, GSM [4], a shock circuit [5], an LED, and a charging port. Hence, we implemented the smart shoe for women's safety using an ARDUINO [6].

1.1 Literature Survey and Existing System

There are many different sorts of existing systems, but none of them equals our smart shoe in terms of comfort and safety. Here is a quick rundown of some of them. The first is a pressure-sensor safety system [2]: when a portable pressure switch is pressed, a message is sent to the wearer's closest contacts; the drawback is that if she forgets the portable device, her safety is seriously compromised. The next is a one-touch system [4], which likewise relies on pressing a switch but adds video recording. We also came across safety bands [6], which send location and alert messages to the police and parents, but they are unsuitable because many girls develop skin allergies to the materials used to construct the band. Another existing system is the safety belt [7], which provides accurate locations and alert messages but was set aside over health concerns about gadgets worn near the kidneys. We also considered the electronic jacket [8] for women's protection, but its drawbacks are that the device is not detachable and is not water resistant. Like these, many existing methods are inefficient, unreliable, or cause skin allergies, so we came up with the innovative idea of fitting the same ARDUINO into a shoe sole so that women and girls feel comfortable and secure wherever and whenever they go out.


2 Proposed System

The proposed security system is built into a shoe sole, so the wearer's footwear itself performs the necessary security actions. The sole is produced by a 3D printing process to keep it lightweight, allowing the protection equipment to be carried comfortably and without strain. Both shoes are used to provide safety: one for self-defense and the other for alerting the police and registered mobile numbers. The right shoe sole carries the electric shock system, while the left shoe sole carries a GPS module to determine the location, which is sent to registered mobile numbers as latitude and longitude, a GSM module [9] to alert the registered people with an SMS, and a buzzer to alert people in the surroundings with an alarm. A charging port is provided to charge the battery inside the shoe, so no external wires are needed for its operation.

2.1 Methodology

The proposed system consists of different electronic components: GSM, GPS, ARDUINO, a charging port, and a shock circuit. The shock circuit's main purpose is to deliver a high-voltage electric pulse to the attacker in self-defense. The block diagrams below show how the entire system works in the right and left shoes.

2.1.1 Left Shoe

We use an ARDUINO UNO module as the heart of the left shoe (Fig. 1). The ARDUINO is connected to the GPS, GSM, vibrator, battery, button, and switch; the button is attached to the battery over the shoe. When the switch is turned ON, voltage is supplied to the ARDUINO by the battery. The vibrator is used to check the working condition of the circuit: its vibration alert indicates whether the circuit in the left shoe is ON or OFF. As soon as the switch is turned ON, the SIM808 starts up and the GPS and GSM are automatically turned ON as well. Then the

Fig. 1 Block diagram for LEFT SHOE


Fig. 2 Block diagram for RIGHT SHOE

location, with the latitude and longitude generated by the GPS, is sent to the police and the registered mobile numbers. An alert message, "I am in danger, help me", is also sent to the registered mobile numbers through GSM.
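The alert flow just described can be illustrated with a small software simulation. The function names, phone numbers, and message wording beyond the quoted alert text are hypothetical; on the prototype this logic runs on the ARDUINO UNO, with the SIM808 supplying the GPS fix and the GSM link.

```python
# Hypothetical simulation of the left-shoe alert flow (the real logic
# runs on the ARDUINO UNO with a SIM808 GPS/GSM module).

def build_alert_sms(latitude, longitude):
    """Compose the alert SMS sent to the registered numbers."""
    return ("I am in danger, help me. "
            f"Location: lat={latitude:.6f}, long={longitude:.6f}")

def on_switch_on(gps_fix, registered_numbers, send_sms):
    """Switch turned ON: read the GPS fix and alert every contact."""
    message = build_alert_sms(*gps_fix)
    for number in registered_numbers:
        send_sms(number, message)
    return message

# Collect outgoing messages in a list instead of driving a real GSM modem.
outbox = []
msg = on_switch_on((17.052861, 82.167999),
                   ["+911234500001", "+911234500002"],
                   lambda number, text: outbox.append((number, text)))
```

The same message is repeated to every registered number, matching the description of continuous location alerts to both the police and close contacts.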

2.1.2 Right Shoe

The right shoe (Fig. 2) is used to fight back against the aggressor and helps alert the people nearest to the wearer. It consists of a button and a switch. When the button is pressed, the circuit and buzzer are turned ON. The buzzer produces a sound that alerts people in the surroundings of the woman at the moment of attack. The push button is arranged at the toe section of the shoe sole (in the front part of the shoe). When the aggressor attacks, the woman can fight back in self-defense by hitting him with the shoe: the stun gun arranged inside the right shoe carries the press button, and when it touches the aggressor the button is pressed and the circuit delivers a very strong voltage pulse that stops him for a few minutes. It causes dizziness for some time, during which the woman can escape.

3 Results

Figure 3a shows the shock circuit, i.e., the stun gun with its press button, and Fig. 3b shows the shoe sole circuit setup; the sole is produced by a 3D printing process to keep it comfortable and lightweight for the women or girls wearing these shoes [10].


Fig. 3 a Stun gun shock circuit with press button. b Left shoe sole circuit setup

Figure 4 shows the shoe sole equipped with the security system, and Fig. 5 shows the connections of the ARDUINO [4] with the GSM, GPS [11], and a vibrator that indicates to the wearer that the circuit is turned ON. Figure 6 shows the location of the woman or girl, with latitude and longitude [12], within a fraction of a second and without delay, so that the police and her close contacts can know her location with certainty. A buzzer inside the shoe alerts [13, 14] the people in the surrounding area at the time of attack or abuse [15]. The shoe continuously sends location alerts to provide certainty in finding her [16].

Fig. 4 Left shoe sole circuit setup


Fig. 5 ARDUINO with GSM and GPS

Fig. 6 Location message with latitudes and longitudes

4 Advantages • The device is compact in size



For most consumers, it is frustrating that shoe sizes are not uniform across the industry: different brands and styles size shoes differently, and the United States, the United Kingdom, Europe, and Japan all use distinct sizing systems.
• It provides wireless connectivity thanks to the rechargeable battery
Wireless charging removes the need for cables to charge mobile phones, cordless appliances, and other electronic devices. The battery in any battery-powered device can be charged with a wireless charger simply by placing the appliance near a wireless power transmitter or an authorized charging station. As a result, the appliance's case can be totally sealed, if not impenetrable.
• Easy to install any new application
Writing code and uploading it to the board is simple with the open-source Arduino Software (IDE), which works with any Arduino board.
• Water resistant
Snow, ice, salt, or rain can easily destroy the material, so the shoe is protected with waterproofing sprays or conditioning lotions.
• Lightweight
Even though different equipment is carried for defense, the shoe remains lightweight because all the equipment sits inside the sole.
• Women's physical empowerment is partially fulfilled
By using these shoes, women will feel unharmed; being able to defend themselves from an attacker makes them feel physically empowered.

5 Conclusion

The SMART SHOE is a novel concept that focuses on women's safety and self-defense during harassment or other forms of assault. Many existing technologies cannot provide the same level of reliability as our smart shoe. Our suggested system addresses the adverse situations that women and girls have experienced in the past and in recent years, and helps them overcome such issues with technologically advanced devices such as our smart shoe. The shock circuit is employed to stop the aggressor's attack, while the alarm sound alerts people nearby to come to the wearer's aid. Sending an SMS describing her predicament and position through GPS and GSM is the essential component for saving women and girls from the hands of aggressors within the available time, without losing our loved ones.


6 Future Scope

We can also include audio and video recording, which would provide evidence of the crime. The shoe sole has space for storing a knife that can be used in an emergency. Our concept is a complete shoe with a smart security system that can be upgraded as new uses or technologies become available. Equipped with direction indicators, it could also be used by blind persons, allowing them to live independently.

References

1. Monisha, D.G., Monisha, M., Pavithra, G., Subhashini, R.: Women safety device and application-FEMME. Indian J. Sci. Technol. 9(10), 1–6 (2016). https://doi.org/10.17485/ijst/2016/v9i10/88898
2. Viswanath, N., Pakyala, N.V., Muneeswari, G.: Smart foot device for women safety. In: Proceedings of 2016 IEEE Region 10 Symposium TENSYMP 2016, no. 1, pp. 130–134 (2016). https://doi.org/10.1109/TENCONSpring.2016.7519391
3. Kodavati, B., et al.: GSM and GPS based vehicle location and tracking. 1(3), 616–625 (2013)
4. Priya, C., Ramya, C., Befy, D., Harini, G., Shilpa, S., Sivanikiruthiga, B.: One touch alarm for women's safety using Arduino. Int. J. Innov. Technol. Explor. Eng. 8(6), 258–260 (2019)
5. Gowrishankar, P., Logeswaran, T., Surendar, V., Shriram, B., Sivamuruganandam, J., Vasanth, S.: Women safety dress using stun gun technology. Int. J. Innov. Technol. Explor. Eng. 9(2), 2502–2505 (2019). https://doi.org/10.35940/ijitee.b7074.129219
6. Rai, P.K., Johari, A., Srivastava, S., Gupta, P.: Design and implementation of women safety band with switch over methodology using Arduino Uno. In: 2018 International Conference on Advanced Computation and Telecommunication (ICACAT) (2018). https://doi.org/10.1109/ICACAT.2018.8933713
7. Gadhe, S.B., et al.: Women anti-rape belt. Compusoft 4(4), 1632–1636 (2015)
8. Gadhave, S.N., Kale, S.D., Shinde, S.N., Bhosale, P.A.C.: Electronic jacket for women safety. Int. Res. J. Eng. Technol. 4(5) (2017). Available: https://www.irjet.net/archives/V4/i5/IRJET-V4I5170.pdf
9. Chaitanya Kumar, D., Adiraju, R.V., Pasupuleti, S., Nandan, D.: A review of smart green house farming by using sensor network technology. Adv. Intell. Syst. Comput. 1245, 849–856 (2021). https://doi.org/10.1007/978-981-15-7234-0_79
10. Wankhede, A.: Portable device for women security. Int. J. Res. Eng. Technol. 4(3), 65–67 (2015). https://doi.org/10.15623/ijret.2015.0403009
11. Bhilare, P., Mohite, A., Kamble, D., Makode, S., Kahane, R.: Women employee security system using GPS and GSM based vehicle tracking, no. 1, pp. 65–71 (2015)
12. Saranya, N.M.C.A., Mca, K.K.: Women safety application using Android mobile, pp. 1317–1319 (2015). https://doi.org/10.4010/2015.347
13. Journal, I., Engineering, C.: Ijarcet-Vol-3-Issue-3-966-968, vol. 3, no. 3, pp. 966–968 (2014)


14. Basavaraj Chougula, P.D., Naik, A., Monu, M., Patil, P., Das, P.: Smart girls security system, vol. 3, no. 4, pp. 281–284 (2014). Available: http://www.ijaiem.org/volume3issue4/IJAIEM-2014-04-30-088.pdf
15. Monisha, D.G., Monisha, M., Pavithra, G., Subhashini, R.: Women safety device and application-FEMME. Indian J. Sci. Technol. 9(10), 33–40 (2016). https://doi.org/10.17485/ijst/2016/v9i10/88898
16. Design and Development of GPS GSM Based Vehicle Tracking and Alert System for Commercial Inter.

A New Supervised Term Weight Measure Based Machine Learning Approach for Text Classification

T. Raghunadha Reddy, P. Vijaya Pal Reddy, and P. Chandra Sekhar Reddy

Abstract Text classification is a technique for assigning a class label to an anonymous document, that is, for classifying documents into known classes. The content of a text is the primary source for classifying data in text classification. Researchers have used the content of a text in different ways, such as most frequent terms, TFIDF scores of terms, and word and character N-grams. In this work, a term weight measure-based machine learning approach is proposed for text classification, in which we propose a new term weight measure to represent the importance of a term in a document. The terms that are most frequent in the dataset are extracted to represent the documents as vectors, and the value of each term in the vector representation is determined by a term weight measure; four term weight measures are used in this experiment to compute the weight of a term. Machine learning algorithms are trained on these vectors to generate the classification model, and the performance of the proposed system is evaluated with this model, using accuracy as the evaluation measure. Six machine learning algorithms and two datasets, IMDB and Enron Spam, are used in this work. The efficiency of the proposed term weight measure-based approach is good when compared with the results of popular approaches to text classification.

1 Introduction

The amount of textual information on the Internet is increasing tremendously through different social media platforms. Knowing the type of a text is very important for better organization of text and effective access to information on the Internet. Text classification (TC) is one such technique for classifying documents into different categories as

T. Raghunadha Reddy · P. Vijaya Pal Reddy (B)
Department of CSE, Matrusri Engineering College, Hyderabad, Telangana, India

P. Chandra Sekhar Reddy
Department of CSE, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, Telangana, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_50


well as detecting the class label of an anonymous document [1]. Researchers have proposed several approaches for text classification based on stylistic features, content-based features, word n-grams, character n-grams, term weight measures (TWMs), feature selection techniques, similarity-based techniques, machine learning (ML) techniques, and deep learning (DL) techniques [2]. In general, text classification approaches are divided into several steps: data collection, data preprocessing, feature extraction, dimensionality reduction, document vector representation, machine learning, and exploration of results. Several researchers have focused on developing TWMs to compute the term weight in a document [3]. TWMs are majorly categorized into two types, supervised TWMs (STWMs) and unsupervised TWMs (UTWMs), according to whether they use class membership information: STWMs use class membership information while computing the term weight, whereas UTWMs do not [4]. In this work, we concentrate on proposing a novel supervised TWM that determines term weights based on the term distributions in various classes. The proposed measure uses the way a term is distributed in the positive class of documents, in the negative class of documents, within a document, and across the total dataset to calculate the term weight in a document. A TWM-based machine learning method is proposed for text classification, in which TWMs determine the term values in the vector representation of documents. Various ML techniques, such as Decision Tree (DT), Gaussian Naïve Bayes (GNB), Logistic Regression (LR), Random Forest (RF), K-Nearest Neighbor (KNN) [5], Support Vector Machine (SVM), and linear SVM (LSVM), are used to generate the model and assess the efficiency of the proposed approach.
Two datasets, the Enron spam dataset and the IMDB movie reviews dataset, are used in this experiment. The performance of the proposed TWM is compared with various popular TWMs: TFIDF, TFRF, and TFIDFICF. The rest of this chapter is structured as follows. Dataset characteristics are presented in Sect. 2. The proposed approach is explained in Sect. 3. The existing and proposed term weight measures are described in Sect. 4. The experimental results are discussed in Sect. 5. The conclusions and future scope are given in Sect. 6.

2 Dataset Characteristics

The characteristics of the datasets are summarized in Table 1. Datasets play a major role in the text classification process: datasets collected from known sources with correct labels yield good accuracies in classification. In this work, two datasets, IMDB and Enron spam, are used.

Table 1 The dataset characteristics

| S. No. | Dataset name           | Number of documents/samples            |
|--------|------------------------|----------------------------------------|
| 1      | IMDB [6]               | 50,000 (positive and negative reviews) |
| 2      | Enron spam dataset [7] | Ham (3672 files), spam (1500 files)    |

3 A Term Weight Measure-Based ML Approach for Text Classification

In this work, a TWM-based machine learning approach is proposed for text classification, in which term weight measures and machine learning algorithms are combined to improve classification accuracy. The proposed model is shown in Fig. 1. First, preprocessing techniques such as tokenization, lowercase conversion, removal of punctuation, stop-word removal, and stemming are applied. Stop words are words commonly used in text, such as articles and prepositions, that are not useful in the classification process [8]. Stemming is a technique for converting a word into its root form [9]. After cleaning, the dataset contains only informative words that are useful for classification. The next step is extracting the important terms for the experiment: the most frequent terms are selected as important terms, and the documents are represented as vectors over these selected terms. The value of a term in the vector representation is determined by a TWM. Once the vector representation is ready, ML algorithms are trained on the document vectors to generate a classification model, which is used both to measure the performance of the proposed approach and to detect the class label of a test document.
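The steps above (preprocess, select the most frequent terms, build weighted document vectors) can be sketched in a few lines of Python. This is a toy illustration, not the chapter's implementation: the corpus and stop-word list are invented for the example, and raw term frequency stands in for the term weight measure.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of", "and"}  # toy stop-word list

def preprocess(document):
    """Tokenize, lowercase, and remove stop words."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_vocabulary(documents, k):
    """Select the k most frequent terms across the cleaned dataset."""
    counts = Counter(t for d in documents for t in preprocess(d))
    return [term for term, _ in counts.most_common(k)]

def vectorize(document, vocabulary, weight):
    """Represent a document as a vector of term weights."""
    tf = Counter(preprocess(document))
    return [weight(tf[term]) for term in vocabulary]

corpus = ["The movie is great and the acting is great",
          "The movie is a dull movie"]
vocab = build_vocabulary(corpus, 4)
vectors = [vectorize(d, vocab, weight=lambda tf: tf) for d in corpus]
```

In the chapter's setup, `weight` would be one of the four TWMs of Sect. 4, the vocabulary would hold up to 8000 terms, and the resulting vectors would be fed to the ML classifiers.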

4 Term Weight Measures (TWMs)

TWMs determine the value of a term based on its importance in a document. In this work, we propose a new TWM and compare it with three popular term weight measures.

4.1 Term Frequency and Inverse Document Frequency (TFIDF)

Fig. 1 The proposed approach

TFIDF assigns weight to terms based on the term frequency and the number of documents containing the term at least once [10]. Equation (1) is used to compute the TFIDF measure:

TFIDF(T_i, D_k) = TF_{ik} \times \log\left(\frac{N}{DF_i}\right)   (1)

In this measure, TF_{ik} is the occurrence count of term T_i in document D_k, N is the total count of documents in the whole dataset, and DF_i is the count of documents in the dataset that contain term T_i at least once.
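Equation (1) can be transcribed directly into code. The function below is an illustrative sketch; the chapter does not fix the base of the logarithm, so the natural log is assumed here.

```python
import math

def tfidf(tf_ik, n_docs, df_i):
    """TFIDF weight per Eq. (1): TF_ik * log(N / DF_i).
    Natural log is assumed; the chapter leaves the base unspecified."""
    return tf_ik * math.log(n_docs / df_i)

# A term occurring 3 times in a document, in a 100-document dataset
# where 10 documents contain the term:
weight = tfidf(3, 100, 10)
```

A term appearing in every document gets weight 0 (the log of 1), which matches the intuition that ubiquitous terms carry no discriminative information.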


4.2 Term Frequency and Relevance Frequency (TFRF) Measure

The TFRF measure considers the term's presence in positive-class and negative-class documents [11]. Equation (2) is used to compute the TFRF measure:

TFRF(T_i, D_k) = TF_{ik} \times \log\left(2 + \frac{A}{\max(1, C)}\right)   (2)

In this measure, TF_{ik} is the frequency of term T_i in the kth document, A is the count of positive-class documents that contain term T_i, and C is the count of negative-class documents that contain term T_i.
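Equation (2) as code, again as a hedged sketch: the chapter does not specify the base of the log, and the base-2 logarithm of the original TFRF formulation is assumed here.

```python
import math

def tfrf(tf_ik, a_pos, c_neg):
    """TFRF weight per Eq. (2): TF_ik * log(2 + A / max(1, C)).
    A: positive-class documents containing the term,
    C: negative-class documents containing it.
    Base-2 log is assumed; the chapter leaves the base unspecified."""
    return tf_ik * math.log2(2 + a_pos / max(1, c_neg))

# A term in 6 positive-class and 2 negative-class documents,
# occurring twice in the current document:
weight = tfrf(2, 6, 2)
```

The max(1, C) term guards against division by zero when no negative-class document contains the term.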

4.3 Term Frequency Inverse Document Frequency and Inverse Class Frequency (TFIDFICF) Measure

The TFIDFICF measure considers the term frequency, the count of documents containing the term, and the count of classes containing the term [12]. Equation (3) is used to determine the TFIDFICF measure:

TFIDFICF(T_i, D_k) = TF_{ik} \times \log\left(\frac{N}{DF_i}\right) \times \log\left(\frac{|C|}{CF_i}\right)   (3)

In this measure, TF_{ik} is the occurrence count of term T_i in document D_k, N is the total number of documents in the dataset, DF_i is the count of documents that contain term T_i, |C| is the count of classes in the dataset, and CF_i is the count of classes that contain term T_i.
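Equation (3) transcribed as an illustrative sketch (natural log assumed, as the chapter does not fix the base):

```python
import math

def tfidficf(tf_ik, n_docs, df_i, n_classes, cf_i):
    """TFIDFICF weight per Eq. (3):
    TF_ik * log(N / DF_i) * log(|C| / CF_i)."""
    return tf_ik * math.log(n_docs / df_i) * math.log(n_classes / cf_i)

# Term in 10 of 100 documents, concentrated in 1 of 2 classes:
weight = tfidficf(2, 100, 10, 2, 1)
```

Note that a term present in every class receives weight 0, since log(|C|/CF_i) vanishes when CF_i = |C|.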

4.4 Proposed Class Specific Supervised Term Weight Measure (CSTWM)

The proposed STWM considers different types of information about terms in the dataset, in other words, the way terms are distributed in the dataset. This measure gives more weight to terms that occur more often in the dataset, to terms that are discussed in fewer documents, to terms that are discussed in fewer classes, and to terms that occur more often in the positive class of documents than in the negative class.

CSTWM(T_i, D_k) = TF(T_i, D_k) \times \log\left(1 + \frac{|D|}{DF_i}\right) \times \log\left(1 + \frac{|C|}{CF_i}\right) \times \frac{A}{1 + C}   (4)

where |D| is the total number of documents in the dataset, |C| is the count of classes in the dataset, and CF_i is the count of classes in the dataset that contain the ith term.
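The proposed measure combines the four signals multiplicatively; a hedged transcription of Eq. (4) (natural log assumed, with A and C the positive- and negative-class document counts containing the term, as in TFRF):

```python
import math

def cstwm(tf, n_docs, df_i, n_classes, cf_i, a_pos, c_neg):
    """CSTWM weight per Eq. (4):
    TF * log(1 + |D|/DF_i) * log(1 + |C|/CF_i) * A/(1 + C)."""
    return (tf
            * math.log(1 + n_docs / df_i)
            * math.log(1 + n_classes / cf_i)
            * (a_pos / (1 + c_neg)))

# Two terms with identical frequency statistics but different class skew:
skewed = cstwm(2, 100, 10, 2, 1, a_pos=9, c_neg=1)
balanced = cstwm(2, 100, 10, 2, 1, a_pos=5, c_neg=5)
```

The class-skewed term receives the higher weight, matching the design goal of favoring terms concentrated in the positive class.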

5 Experimental Results of Term Weight Measures

The experiment is conducted with the four term weight measures and six machine learning algorithms for text classification. The 8000 terms that occur most frequently in the dataset are extracted for the vector representation of documents. The experiment started with 2000 terms, and the number of terms was increased by 2000 in each subsequent iteration; it was observed that text classification accuracies decreased when more than 8000 terms were used. The documents are represented as vectors over the extracted features, and the machine learning algorithms trained on these vectors give the accuracy of the proposed system. The text classification accuracies of the different classifiers on the IMDB dataset are shown in Table 2.

In Table 2, the combination of the 8000 most frequent terms and the RF algorithm achieved an accuracy of 0.848 when the TFRF measure was used to compute the term weights, and 0.851 when the TFIDFICF measure was used. The same combination achieved an accuracy of 0.859 when the proposed CSTWM was used. Overall, the proposed CSTWM attained the best accuracies for text classification compared with the other TWMs, and the RF classifier showed the best efficiency compared with the other machine learning algorithms.

Table 3 displays the TC accuracies obtained with the most frequent terms and the machine learning algorithms on the Enron Spam dataset. In Table 3, the combination of 4000 terms and the LR algorithm achieved an accuracy of 0.910 with the TFIDF measure and 0.913 with the TFRF measure. The combination of 6000 terms and the LSVM algorithm achieved an accuracy of 0.915 with the TFIDFICF measure, and the combination of the 8000 most frequent terms and the LSVM algorithm achieved an accuracy of 0.919 with the proposed CSTWM. Overall, the proposed CSTWM obtained good accuracies for text classification compared with the other TWMs. It was


Table 2 The accuracies of text classification on IMDB dataset (rows: term weight measure and machine learning algorithm; columns: number of most frequent terms)

| TWM      | Classifier | 2000  | 4000  | 6000  | 8000  |
|----------|------------|-------|-------|-------|-------|
| TFIDF    | LR         | 0.805 | 0.828 | 0.832 | 0.846 |
| TFIDF    | KNN        | 0.606 | 0.574 | 0.589 | 0.568 |
| TFIDF    | SVM        | 0.793 | 0.812 | 0.815 | 0.831 |
| TFIDF    | LSVM       | 0.821 | 0.824 | 0.812 | 0.828 |
| TFIDF    | GNB        | 0.731 | 0.691 | 0.637 | 0.635 |
| TFIDF    | DT         | 0.659 | 0.656 | 0.667 | 0.656 |
| TFIDF    | RF         | 0.821 | 0.819 | 0.821 | 0.839 |
| TFIDFICF | LR         | 0.805 | 0.827 | 0.839 | 0.849 |
| TFIDFICF | KNN        | 0.606 | 0.574 | 0.593 | 0.574 |
| TFIDFICF | SVM        | 0.793 | 0.811 | 0.814 | 0.835 |
| TFIDFICF | LSVM       | 0.821 | 0.827 | 0.819 | 0.830 |
| TFIDFICF | GNB        | 0.731 | 0.691 | 0.639 | 0.645 |
| TFIDFICF | DT         | 0.656 | 0.672 | 0.661 | 0.660 |
| TFIDFICF | RF         | 0.829 | 0.826 | 0.839 | 0.851 |
| TFRF     | LR         | 0.814 | 0.823 | 0.827 | 0.836 |
| TFRF     | KNN        | 0.579 | 0.582 | 0.574 | 0.567 |
| TFRF     | SVM        | 0.795 | 0.810 | 0.819 | 0.825 |
| TFRF     | LSVM       | 0.806 | 0.798 | 0.802 | 0.803 |
| TFRF     | GNB        | 0.731 | 0.691 | 0.629 | 0.618 |
| TFRF     | DT         | 0.665 | 0.668 | 0.659 | 0.658 |
| TFRF     | RF         | 0.818 | 0.828 | 0.841 | 0.848 |
| CSTWM    | LR         | 0.801 | 0.828 | 0.835 | 0.833 |
| CSTWM    | KNN        | 0.601 | 0.569 | 0.580 | 0.573 |
| CSTWM    | SVM        | 0.788 | 0.816 | 0.823 | 0.834 |
| CSTWM    | LSVM       | 0.803 | 0.796 | 0.803 | 0.807 |
| CSTWM    | GNB        | 0.731 | 0.692 | 0.621 | 0.618 |
| CSTWM    | DT         | 0.663 | 0.665 | 0.667 | 0.666 |
| CSTWM    | RF         | 0.830 | 0.834 | 0.845 | 0.859 |

identified that the LSVM and LR classifiers show the best performance compared with the other machine learning algorithms.


Table 3 The accuracies of TC on the Enron spam dataset (rows: term weight measure and machine learning algorithm; columns: number of most frequent terms)

| TWM      | Classifier | 2000  | 4000  | 6000  | 8000  |
|----------|------------|-------|-------|-------|-------|
| TFIDF    | LR         | 0.899 | 0.910 | 0.909 | 0.909 |
| TFIDF    | KNN        | 0.851 | 0.850 | 0.840 | 0.842 |
| TFIDF    | SVM        | 0.896 | 0.903 | 0.903 | 0.905 |
| TFIDF    | LSVM       | 0.869 | 0.878 | 0.878 | 0.877 |
| TFIDF    | GNB        | 0.731 | 0.769 | 0.769 | 0.801 |
| TFIDF    | DT         | 0.889 | 0.881 | 0.874 | 0.888 |
| TFIDF    | RF         | 0.905 | 0.903 | 0.901 | 0.906 |
| TFIDFICF | LR         | 0.905 | 0.913 | 0.911 | 0.909 |
| TFIDFICF | KNN        | 0.867 | 0.866 | 0.859 | 0.845 |
| TFIDFICF | SVM        | 0.901 | 0.894 | 0.895 | 0.886 |
| TFIDFICF | LSVM       | 0.906 | 0.910 | 0.915 | 0.911 |
| TFIDFICF | GNB        | 0.731 | 0.772 | 0.774 | 0.803 |
| TFIDFICF | DT         | 0.892 | 0.879 | 0.874 | 0.889 |
| TFIDFICF | RF         | 0.904 | 0.901 | 0.902 | 0.906 |
| TFRF     | LR         | 0.902 | 0.913 | 0.912 | 0.909 |
| TFRF     | KNN        | 0.866 | 0.866 | 0.863 | 0.853 |
| TFRF     | SVM        | 0.897 | 0.886 | 0.886 | 0.889 |
| TFRF     | LSVM       | 0.869 | 0.891 | 0.893 | 0.887 |
| TFRF     | GNB        | 0.729 | 0.773 | 0.774 | 0.809 |
| TFRF     | DT         | 0.889 | 0.882 | 0.878 | 0.885 |
| TFRF     | RF         | 0.906 | 0.901 | 0.902 | 0.906 |
| CSTWM    | LR         | 0.906 | 0.915 | 0.913 | 0.913 |
| CSTWM    | KNN        | 0.879 | 0.883 | 0.877 | 0.863 |
| CSTWM    | SVM        | 0.901 | 0.893 | 0.894 | 0.891 |
| CSTWM    | LSVM       | 0.907 | 0.910 | 0.905 | 0.919 |
| CSTWM    | GNB        | 0.733 | 0.778 | 0.780 | 0.908 |
| CSTWM    | DT         | 0.890 | 0.883 | 0.879 | 0.885 |
| CSTWM    | RF         | 0.904 | 0.903 | 0.899 | 0.907 |

6 Conclusions and Future Scope

In this work, a new term weight measure-based ML approach is proposed for text classification. Four TWMs are used to determine the value of a term in the document vector representation. A new TWM is proposed, and the proposed CSTWM attained the best accuracies for text classification compared with the existing TWMs. For the IMDB dataset, the combination of the RF classifier and the proposed CSTWM attained an accuracy of 0.859; for the Enron Spam dataset, the combination of the proposed term weight measure and the LSVM classifier attained an accuracy of 0.919. In future work, we plan to implement a new vector representation to avoid the problems of existing document representations, and to apply DL techniques to increase the accuracy of TC.

References

1. Raghunadha Reddy, T., Vishnu Vardhan, B., Vijayapal Reddy, P.: A survey on author profiling techniques. Int. J. Appl. Eng. Res. 11(5), 3092–3102 (2016)
2. Khatoon, T., Govardhan, A., Sujatha, D.: Improving document relevant accuracy by distinguish Doc2query matching mechanisms on biomedical literature. In: IEEE 10th International Conference on Cloud Computing, Data Science and Engineering, pp. 29–31 (2020)
3. Chandra Sekhar Reddy, P.: Gender classification using central fibonacci weighted neighborhood pattern flooding binary matrix (CFWNP_FBM) shape primitive features. Int. J. Eng. Adv. Technol. 8(6), 5238–5244 (2019). ISSN: 2249-8958
4. Raghunadha Reddy, T., Vishnu Vardhan, B., Vijayapal Reddy, P.: A document weighted approach for gender and age prediction. Int. J. Eng. Trans. B Appl. 30(5), 647–653 (2017)
5. Khatoon, T.: Query expansion with enhanced-BM25 approach for improving the search query performance on clustered biomedical literature retrieval. J. Digital Inform. Manag. 16(2) (2018)
6. https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
7. https://www.kaggle.com/wanderfj/enron-spam
8. Chandra Sekhar Reddy, P., Vara Prasad Rao, P., Kiran Kumar Reddy, P., Sridhar, M.: Motif shape primitives on fibonacci weighted neighborhood pattern for age classification. In: Wang, J., Reddy, G., Prasad, V., Reddy, V. (eds.) Soft Computing and Signal Processing. Advances in Intelligent Systems and Computing, vol. 900. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-3600-3_26
9. Avanthi, M., Chandra Sekhar Reddy, P.: Human facial expression recognition using fusion of DRLDP and DCT features. In: Satapathy, S.C., Bhateja, V., Favorskaya, M.N., Adilakshmi, T. (eds.) Smart Computing Techniques and Applications. Smart Innovation, Systems and Technologies, vol. 224. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-1502-3_20
10. Raghunadha Reddy, T., Vishnu Vardhan, B., Vijayapal Reddy, P.: Author profile prediction using pivoted unique term normalization. Indian J. Sci. Technol. 9(46) (2016)
11. Raghunadha Reddy, T., Vishnu Vardhan, B., Vijayapal Reddy, P.: Profile specific document weighted approach using a new term weighting measure for author profiling. Int. J. Intell. Eng. Syst. 9(4), 136–146 (2016)
12. Ren, F., Sohrab, M.G.: Class-indexing-based term weighting for automatic text classification. Inf. Sci. 236, 109–125 (2013)

Machine Learning-Based Human Activity Recognition Using Smartphones A. Vinay Kumar, M. Neeraj, P. Akash Reddy, and Ameet Chavan

Abstract Human activity recognition (HAR) research using artificial intelligence (AI) is being leveraged in many areas such as healthcare monitoring, surveillance, motion tracking, sports training, and assisted living. The proposed work develops a machine learning algorithm to analyze time-series signals collected from smartphone sensors such as the accelerometer and the 3D gyroscope. A further objective of the work is to reduce the large dimensionality of the datasets. Human activities are classified and predicted using a fusion of supervised and active learning models: logistic regression, QDA, k-nearest neighbor, support vector machine, an ANN classifier, decision tree, and Naïve Bayes. The algorithm compares model scores to attain maximum efficiency through dataset training and prediction. With the proposed algorithm, precision levels above 90% were obtained for six human activities.

1 Introduction A human activity recognition (HAR) system recognizes and records simple and complex actions of individuals over a period of time. Such recognition systems typically employ either pose analysis from video recordings or wearable wireless sensor networks. The former adds delay; the latter offers real-time analysis but is intrusive and complex to operate. This project implements a cost-effective, context-based data extraction system that uses smartphone sensors to identify and categorize human activity. The system is relatively simple and practical, and the activities of multiple volunteers can be detected at the same time. The research validates that the gyroscope and accelerometer in smartphones provide three-dimensional signal data. The signals are preprocessed, and the data are divided into training and testing sets in a 70:30 ratio. A. Vinay Kumar · M. Neeraj · P. Akash Reddy · A. Chavan (B) Department of Electronics and Communication Engineering, Sreenidhi Institute of Science and Technology, Hyderabad, Telangana 501301, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_51


Afterward, the models are trained on the provided datasets with fixed parameters, exploratory data analysis (EDA) is performed on the values, and activities are registered through active learning. The accelerometer and gyroscope report positions along the x, y, and z axes as the rotational placement of the device relative to the ground. Training uses active learning algorithms in which estimators are fitted to the model, and over-fitting and under-fitting are checked with ensemble methods. Given the classifier, feedback from testing lets the model learn again in a semi-supervised way to estimate and predict the discriminant values. Labels are assigned according to the number of activities, and comparative plots such as confusion matrices, bar plots, and curves of model run time are produced. Active learning intelligently queries the unlabeled samples, so the parameters are chosen to capture the movement properties in the data and to calculate percentage error rates, configuring the properties and conditions needed for practical implementation. We present an analysis of training models through machine learning to identify activities such as walking upstairs, standing, sitting, and moving forward or backward, using data from the accelerometer and the 3-axis gyroscope, and we tabulate and study the results by visualizing the collected data. Human activity recognition has traditionally relied on surveillance; in our proposed system, machine learning reduces the vast dimensionality of the data and gives accurate results in far less time, and the trained models can also be applied to other testing data without over-fitting or under-fitting the classifier.
Data processing offers considerable scope with inertial sensor data. Body-mounted sensors convey data through Internet of Things platforms, and the collected data are trained and tested, as shown in Fig. 1. Active learning is motivated by the reduction in time and labor needed to label abundant data, and the technique can be extended in future work. References [1, 2] designed a deep neural network structure to obtain a high-level representation of human activity, combining motion features and context features; the work includes the design of a scene prior feature and a scene context feature to capture the environment of interest. References [3, 4] identified that, for all ranking algorithms, we forecast 10 random animal

Fig. 1 Typical HAR system data processing flow


actions with greater precision by operating an Android application at a frequency of 1 Hz using a dense neural network.

2 Proposed Work The proposed work, as shown in Figs. 2 and 3, implements a cost-effective system for recognizing human activity from data generated by a smartphone. The activities are standing, sitting, laying, walking, walking upstairs, and walking downstairs. The dataset is gathered from the accelerometer, gyroscope, and other sensors embedded in the smartphone. The data are randomly divided in a 70:30 ratio into a training dataset and a testing dataset. Activity classification is carried out with machine learning models, namely random forest, support vector machine, artificial neural network, and K-nearest neighbor. The work

Fig. 2 Proposed HAR system block diagram

Fig. 3 Proposed HAR system data processing flow


contrasted the performance and accuracy of these models using confusion matrices and random simulation. The system trains on a dataset from the University of California, Irvine Machine Learning Repository [5]. The data are taken, cleaned, and normalized. To increase the accuracy and performance of the system, the dimensionality of the original data is reduced using the principal component analysis (PCA) technique. The experiments were carried out with a group of 30 volunteers. Each person performed the six activities carrying a smartphone (Samsung Galaxy S II) on the waist, whose embedded inertial sensors recorded 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50 Hz. The correctness of the system is determined by generating confusion matrices and by random simulations, and dimensionality is reduced using principal component analysis and linear discriminant analysis. The built-in accelerometer used by the UCI repository samples at 50 Hz with high sensitivity, while most of the energy of body motion lies below 20 Hz [6]; by the Nyquist criterion, a 50 Hz accelerometer is therefore sufficient. The insights obtained from the sensed activities are used for analysis. Each analyzed activity comprises 200-500 readings of volunteered data, corresponding to sequences of 5.4-15 s. All other features are generated from the axis directions, excluding the resultant accelerometer value. Feature variables for the dataset were calculated from the accelerometer signals in the time and frequency domains, including standard deviation, mean, signal magnitude, and signal frequency. The final product is a dataset of 563 dimensions, where 21 participants form the training dataset and the remaining 9 form the testing dataset.
Reduced data are then processed through various supervised machine learning algorithms like random forest, support vector machine, artificial neural network, and K-nearest neighbor to classify data into six categories.
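The preprocessing chain described above (normalize, reduce dimensionality with PCA, split 70:30, classify) can be sketched as follows with scikit-learn. The synthetic data, feature counts, and component counts here are placeholders, not the authors' actual configuration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))    # stand-in for the 563-dimension sensor features
y = rng.integers(0, 6, size=300)  # six activity labels

# Normalize, then reduce dimensionality with PCA as described in the text
X_std = StandardScaler().fit_transform(X)
X_red = PCA(n_components=10, random_state=0).fit_transform(X_std)

# 70:30 random split into training and testing data
X_tr, X_te, y_tr, y_te = train_test_split(X_red, y, test_size=0.3, random_state=0)
print(X_tr.shape, X_te.shape)  # (210, 10) (90, 10)
```

The reduced training matrix would then be fed to the classifiers discussed in the next section.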

3 Model Testing and Results 3.1 Random Forest Model Random forest is an ensemble learning method for regression, classification, and other tasks [7]. It works by constructing many decision trees at training time and outputs the class given by the mean prediction of the individual trees, which fends off over-fitting of the data [8]. In our experiments, we noticed that there was no benefit in building five hundred decision trees: the model operates in the same manner and yields the same output with only eighty trees, as shown in Fig. 4. From Fig. 5, it can be seen that the success rate of the random forest model does not change (Figs. 5 and 6).
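The observation that a smaller forest performs as well as a 500-tree one can be checked by sweeping the number of trees, as in Fig. 4. This is a hedged sketch on synthetic data (the dataset sizes and seeds are illustrative, not the paper's):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for the six-activity dataset; sizes are illustrative only
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Compare the test error rate for different forest sizes
for n_trees in (10, 80, 500):
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X_tr, y_tr)
    err = 1.0 - rf.score(X_te, y_te)
    print(n_trees, round(err, 3))
```

On such data the error curve typically flattens well before 500 trees, matching the paper's claim that about 80 trees suffice.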


Fig. 4 Measurements of rate of error with respect to no. of trees

Fig. 5 Behavior of rate of success in random forest model

Fig. 6 Variation of error rate w.r.t value of K

3.2 SVM Model SVMs classify data by constructing separating hyperplanes in feature space; nonlinear class boundaries are handled by mapping the inputs through a function φ and working with the kernel


Fig. 7 Rate of success in K-nearest neighbor

K(x_i, x_j) = φ(x_i) · φ(x_j). The mapping φ(x) is not necessarily an explicit function; instead, we are only interested in the kernel function itself [9]. The SVM can be scaled and tuned through its C value and its kernel choice (RBF or otherwise), and the configuration with the best score is selected.
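The tuning of C and the kernel choice described above can be sketched with a grid search; the grid values and the synthetic data are assumptions for illustration, not the paper's settings:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           random_state=0)

# Search over C and the kernel (RBF or linear) and keep the best-scoring model
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}, cv=3)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

`grid.best_estimator_` is then the SVM configuration with the best cross-validated score.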

3.3 KNN Algorithm KNN is a basic instance-based classifier. Labels are used to analyze the k value for accurate and precise results [10]. KNN is a lazy supervised learning algorithm that requires more time but attains a low error rate, so the k value needs to be estimated properly. From Fig. 6, we found that the error rate decreases from K = 40 to K = 65 and reaches its minimum at K = 65. From Fig. 7, it can be seen that the K-nearest neighbor success rate varies between 80 and 90%.
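A minimal pure-Python sketch of the KNN idea (majority vote among the k nearest points under Euclidean distance) follows; the two tiny activity clusters are made-up illustration data:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Tiny illustrative dataset: two clusters standing in for two activities
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["sitting", "sitting", "sitting", "walking", "walking", "walking"]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # → sitting
```

In practice k would be swept over a range (as in Fig. 6) and the value with the lowest validation error kept.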

3.4 Artificial Neural Network ANN models relate a set of input signals to a set of output signals using a model derived from our understanding of how a biological brain responds to stimuli from different input sources [11]. We used a single-hidden-layer feed-forward neural network package to train our dataset. The package trains the artificial neural network by back-propagation: the output error is derived and sent back into the network, and the weights are updated to


minimize the resulting error from each and every neuron [12]. The linear-output parameter was set to false by default, since we are dealing with a classification problem (Figs. 8 and 9). After calculating the principal components of the training set, we can use these components to project the testing data. This seems simple, but two underlying points must be understood: (a) applying principal component analysis and LDA to the entire data at once will cause features of the training dataset to "leak" into the testing dataset and will affect the predictive ability of our model [13]; (b) applying principal component analysis and LDA to the testing and training datasets separately will yield vectors that point in different directions and will therefore produce inappropriate results [13].

Principal component analysis (PCA) is used to minimize the number of features in our dataset. PCA is designed to reduce the dimensionality of a large dataset containing many interrelated variables while retaining as much as possible of the variation present in the dataset [14–16]. The principal components

Fig. 8 Changes in rate of error with respect to no. of nodes in hidden layer (ANN)

Fig. 9 Variations in rate of success in ANN classifier


are uncorrelated and are ordered by how much of the variation of the original variables they retain, and they are used to identify the activity, as shown in Figs. 10 and 11.

Fig. 10 2D projection of the data with PCA

Fig. 11 2D projection of data with t-SNE

Table 1 Model comparison

Model      Scores   Time taken (s)   Error (%)
LR         0.94     12.9             6.25
QDA        0.81     7                18.1
KNN        0.88     6.2              12
D. Tree    0.86     11               14
NB         0.77     7                –
SVM        0.95     10               4.63
ANN        0.92     5.2              7.8

Another method explored in the proposed work for reducing dimensionality, one well suited to high-dimensional datasets, is t-SNE (t-distributed stochastic neighbor embedding). From the results produced, this technique, though effective in clustering the dataset, has limitations compared to PCA with regard to performance and memory requirements.
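A 2D t-SNE projection like the one in Fig. 11 can be sketched as follows; the random data and the perplexity value are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))  # stand-in for the high-dimensional feature set

# Project to 2D for visualization; perplexity must be below the sample count
emb = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(X)
print(emb.shape)  # (60, 2)
```

Unlike PCA, t-SNE has no `transform` for new points and its cost grows quickly with sample count, which is one source of the performance and memory limitations noted above.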

4 Conclusion The proposed work shows that the SVM method predicts the activity of a person most accurately, with the highest accuracy score on scaled values; the ensemble techniques help in comparison with artificial neural networks and avoid over-fitting and under-fitting, and random forest also provides a good accuracy score. All the models discussed above (ANN, KNN, the random forest classifier, and so on) are firmly established machine learning mechanisms for recognizing the performed activity with accuracy. After reducing the 500 to 600 features to a smaller number by applying principal component analysis and LDA, our data analysis found that the support vector machine had the greatest efficiency (95.6%) in predicting human activities. SVM is most efficient because we used a Gaussian kernel on the reduced dataset. K-nearest neighbor took the least time to train (3 s), as it is the least complex and uses the Euclidean distance function, while the support vector machine took the longest to train (11.943 s).

References

1. Wei, L., Shah, S.: Human activity recognition using deep neural network with contextual information. In: VISIGRAPP (2017)
2. Masum, M., Kadar, A., Sadia, J., Erfanul, B., Md Golam Rabiul, A., Shahidul, K., Mohammad, A.: Human activity recognition using smartphone sensors: a dense neural network approach, pp. 1–6 (2019). https://doi.org/10.1109/ICASERT


3. Navya Sri, M., Ramakrishna Murty, M., et al.: Robust features for emotion recognition from speech by using Gaussian mixture model classification. In: International Conference and Published Proceedings in SIST Series, vol. 2, pp. 437–444. Springer (2017)
4. Maurer, U., Rowe, A., Smailagic, A., Siewiorek, D.: Location and activity recognition using eWatch: a wearable sensor platform. Ambient Intell. Everyday Life Lect. Notes Comput. Sci. 3864, 86–102 (2006)
5. Preece, S.J., Goulermas, J.Y., Kenney, L.P.J., Howard, D.: A comparison of feature extraction methods for the classification of dynamic activities from accelerometer data. IEEE Trans. Biomed. Eng. (2008)
6. Antonsson, E.K., Mann, R.W.: The frequency content of gait. J. Biomech. 18(1), 39–47 (1985)
7. Settles, B.: Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison (2010)
8. Lang, K., Baum, E.: Query learning can work poorly when a human oracle is used. In: Proceedings of the IEEE International Joint Conference on Neural Networks, pp. 335–340. IEEE Press (1992)
9. Zhu, X.: Semi-Supervised Learning with Graphs. PhD thesis, Carnegie Mellon University (2005)
10. Settles, B., Craven, M., Friedland, L.: Active learning with real annotation costs. In: Proceedings of the NIPS Workshop on Cost-Sensitive Learning, pp. 1–10 (2008)
11. Bao, L., Intille, S.: Activity recognition from user-annotated acceleration data. In: International Conference on Pervasive Computing, pp. 1–17. Springer (2004)
12. Bevilacqua, A., MacDonald, K., Rangarej, A., Widjaya, V., Caulfield, B., Kechadi, T.: Human activity recognition with convolutional neural networks. In: European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 542–551. Springer (2018)
13. Sun, L., Zhang, D., Li, B., Guo, B., Li, S.: Activity recognition on an accelerometer embedded mobile phone with varying positions and orientations. In: International Conference on Ubiquitous Intelligence and Computing, pp. 548–562. Springer
14. "Decision-tree," Feb 2021. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html (2021)
15. "Random-forest," Feb 2021. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html (2021)
16. "Support vector classifier," Feb 2021. [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html (2021)

A Feature Selection Technique-Based Approach for Author Profiling D. Radha and P. Chandra Sekhar

Abstract Author profiling is a technique for analyzing social media content, such as texts and images produced by users, and classifying authors into classes such as age, gender, language variety, and location according to the characteristics of their content. Authorship profiling techniques are extensively used in many applications, including forensic analysis, security, marketing, education, reputation management, fake profile prediction, and sentiment analysis. The prediction process starts with the identification of suitable stylistic features that differentiate the writing styles of authors. Most authors succeed in extracting a good number of stylistic features but do not achieve good accuracies for author profile prediction. Several researchers obtained good accuracies for profile prediction when content-based features such as most frequent words, most frequent word n-grams, part-of-speech tags, and characters were used. In this paper, the experiment starts with the terms that are most frequent in the dataset. We observed that accuracy does not increase for profile prediction because of irrelevant and redundant terms in the vocabulary set. The experiment then continues with feature selection algorithms to identify the most informative features while avoiding redundant ones. The documents are represented as vectors using these top informative features. In this work, we propose a new feature selection algorithm based on the distributional information of terms across classes. Various machine learning algorithms are used in this experiment to assess the efficiency of the proposed feature selection algorithm. The proposed approach attained better accuracies for the prediction of age and gender than most popular author profiling methods.

D. Radha Research Scholar, GITAM, Visakhapatnam, India Department of CSE, Malla Reddy College of Engineering and Technology, Hyderabad, India P. Chandra Sekhar (B) Department of CSE, GITAM, Visakhapatnam, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_52


1 Introduction The Internet has enabled interactive, massive communication and the exchange of information among people of different geographical areas, genders, ages, socio-economic levels, etc. Recently, social media has gained considerable popularity through services that make information easy to share, such as messaging, chats, and blogs. With the exponential growth of information on the Internet, crime has grown as well: threatening mails, harassing messages, and fake profiles on the Web. Finding the author information of a text has become a hot research topic in recent years to overcome the problems of false information. Authorship analysis (AA) is one such area, extracting useful author information by examining the characteristics of written texts [1]. Authorship profiling (AP) is a type of AA technique extensively used in text processing applications such as forensic analysis, marketing, education, security, and business. Developing models for author profiling requires a vast quantity of user-written data; social media environments such as Twitter, Facebook, and various Web forums contain this type of data. Researchers gather data from these platforms and extract stylistic features that differentiate the writing styles of authors. The feature extraction techniques used by most AP approaches belong to three categories: stylistic, content-dependent, and deep learning techniques [2, 3]. Researchers who experimented with content-based features such as n-grams of characters, words, and part-of-speech (PoS) tags observed improved accuracies for the prediction of author profiles. Some also identified a problem with the huge number of features and applied feature selection (FS) techniques to reduce the feature count.
In this work, the experiment was conducted with various feature selection algorithms, and a new FS algorithm is proposed to recognize the features most relevant for predicting the characteristics of an author. A feature selection algorithm computes a score for each term in the dataset, and the top-ranked terms are used in the representation of document vectors. The document vectors are passed to machine learning algorithms to generate the classification model, which is then used to detect the gender and age class labels of a test document. Various MLAs are considered in this work for evaluating the efficiency of the proposed feature selection algorithm. The PAN 2014 competition reviews dataset is used in this experiment, with gender and age as the author profiles. This work is organized into six sections. Information about the dataset is presented in Sect. 2. The importance of FS algorithms, various existing feature selection algorithms, and the proposed feature selection algorithm are discussed in Sect. 3. The experimental results of the feature selection algorithms for gender and age prediction are given in Sect. 4. The results are analyzed in Sect. 5, and the conclusions and future scope are presented in Sect. 6.

Table 1 The PAN 2014 competition reviews dataset

Profile    Class     Number of reviews
Gender     Male      2080
           Female    2080
Age        18–24     360
           25–34     1000
           35–49     1000
           50–64     1000
           65+       800

2 Dataset Characteristics PAN (plagiarism, authorship, and social software misuse) is a competition that organizes shared tasks such as plagiarism detection, authorship attribution, authorship verification, and author profiling, with the tasks changing every year. The author profiling task was introduced in 2013 with different datasets, where the profiles to predict are gender and age. In this work, the dataset was taken from the PAN 2014 competition reviews dataset [4]; its characteristics are presented in Table 1.

3 Feature Selection Techniques-Based Approach for Author Profiling Figure 1 shows the author profiling approach based on feature selection techniques. First, the training dataset is cleaned using preprocessing techniques such as stop-word removal and stemming. After cleaning the text, all terms are extracted from the dataset, and feature selection techniques are used to calculate a score for every term. The top-scored terms are selected and treated as a bag of words (BoW), and the documents are represented as vectors over these BoW features. Finally, the document vectors are passed to machine learning algorithms, which internally construct the classification model; test vectors are passed to this model to predict the age and gender of an author. The term frequency measure is used to calculate the value of a term in the vector representation. In this work, several popular FS techniques are used to recognize the best features, and a new feature selection technique is proposed to identify the most informative features and so enhance the accuracy of gender and age prediction. The experiment was conducted with three existing filter methods and one proposed filter method.
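The pipeline just described (preprocess, extract terms, score and keep the top-k, vectorize, classify) can be sketched with scikit-learn utilities. This sketch uses scikit-learn's built-in chi-squared scorer rather than the proposed DCC measure, and the toy texts and labels are placeholders for the PAN 2014 reviews:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy review texts with gender labels; real data would come from PAN 2014
texts = ["great phone love the camera", "battery died fast terrible",
         "love this dress fits well", "screen cracked poor build"] * 5
labels = ["F", "M", "F", "M"] * 5

pipe = make_pipeline(
    CountVectorizer(stop_words="english"),  # clean text and extract BoW terms
    SelectKBest(chi2, k=5),                 # keep only the top-scored terms
    MultinomialNB(),                        # build the classification model
)
pipe.fit(texts, labels)
print(pipe.predict(["love the camera on this phone"])[0])
```

In the paper's approach, the `chi2` scorer slot would be filled by MI, IG, CHI2, or the proposed DCC measure, and the classifier slot by NBM, SVM, or RF.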


Fig. 1 Proposed approach based on feature selection techniques

3.1 Chi-Square (CHI2) Measure The Chi-square (χ²) measure selects important features based on the dependency between a class and a feature [5]. A CHI2 score near 0 indicates the feature is independent of the class, while larger scores indicate stronger dependence; the higher the CHI2 score of a feature, the more informative it is for the class. Equation (1) is used to compute the CHI2


score of a term T_i in class C_j:

    χ²(T_i, C_j) = N × (a_i·d_i − b_i·c_i)² / [(a_i + b_i) × (a_i + c_i) × (b_i + d_i) × (c_i + d_i)]    (1)

where N is the number of documents in the dataset, a_i is the number of documents in class C_j that contain term T_i, b_i is the number of documents in class C_j that do not contain T_i, c_i is the number of documents outside class C_j that contain T_i, and d_i is the number of documents outside class C_j that do not contain T_i. The CHI2 score of a term T over all classes is computed using Eq. (2):

    χ²(T) = max_{j=1..m} χ²(T, C_j)    (2)

where m is the number of classes.
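Equations (1) and (2) can be computed directly from the four document counts; a minimal sketch with made-up counts (not real dataset values):

```python
def chi2_score(n, a, b, c, d):
    """CHI2 of a term for one class, following Eq. (1).
    n: total documents; a: in-class docs with the term; b: in-class docs
    without it; c: out-of-class docs with it; d: out-of-class docs without it.
    """
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (a + c) * (b + d) * (c + d)
    return num / den

def chi2_term(n, per_class_counts):
    """Eq. (2): the maximum per-class CHI2 score over all m classes."""
    return max(chi2_score(n, *counts) for counts in per_class_counts)

# Illustrative counts (a, b, c, d) for one term in two classes
print(round(chi2_term(100, [(30, 20, 10, 40), (10, 40, 30, 20)]), 3))  # → 16.667
```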

3.2 Mutual Information (MI) MI determines the relation between the features and the classes [6]: it measures the mutual dependency of a feature T_i and a category C_j. The MI between term T_i and class C_j is computed using Eq. (3):

    MI(T_i, C_j) = log( P(T_i | C_j) / P(T_i) )    (3)

where P(T_i | C_j) is the proportion of class-C_j documents that contain term T_i, and P(T_i) is the proportion of documents in all classes that contain T_i. The MI score of a term T over all classes is computed using Eq. (4):

    MI(T) = max_{j=1..m} MI(T, C_j)    (4)

where m is the number of classes.
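Equations (3) and (4) reduce to a log ratio of proportions. A sketch with made-up proportions (natural log is assumed here, since the base is not specified in the text):

```python
import math

def mi_score(p_t_given_c, p_t):
    """Eq. (3): MI(Ti, Cj) = log(P(Ti|Cj) / P(Ti)), using the natural log."""
    return math.log(p_t_given_c / p_t)

def mi_term(per_class_p, p_t):
    """Eq. (4): the maximum MI score over all classes."""
    return max(mi_score(p, p_t) for p in per_class_p)

# Illustrative proportions: the term occurs in 60% of class-1 docs,
# 10% of class-2 docs, and 35% of all docs (made-up numbers)
print(round(mi_term([0.60, 0.10], 0.35), 3))  # → 0.539
```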

3.3 Information Gain (IG) IG is a feature selection technique in text classification that measures the amount of information obtained for a given category by the presence or absence of a term in a document [7]. IG feature selection chooses the terms with the highest information gain.


The IG of a term T is computed using Eq. (5):

    IG(T) = − Σ_{j=1..m} P(C_j) log P(C_j) + P(T) Σ_{j=1..m} P(C_j | T) log P(C_j | T) + P(T̄) Σ_{j=1..m} P(C_j | T̄) log P(C_j | T̄)    (5)

where m is the number of classes, P(C_j) is the proportion of documents in class C_j relative to the total number of documents in the training dataset, P(T) and P(T̄) are the proportions of documents in the whole dataset that contain and do not contain term T, respectively, and P(C_j | T) and P(C_j | T̄) are the proportions of class-C_j documents that contain and do not contain term T, respectively.
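Equation (5) can be evaluated from the class and term proportions directly. A sketch with made-up two-class proportions (natural log assumed, as the base is unspecified):

```python
import math

def info_gain(p_c, p_t, p_c_given_t, p_c_given_not_t):
    """Eq. (5): information gain of term T over m classes.
    p_c[j]             = P(Cj)
    p_t                = P(T); P(not T) = 1 - p_t
    p_c_given_t[j]     = P(Cj | T)
    p_c_given_not_t[j] = P(Cj | not T)
    """
    h = -sum(p * math.log(p) for p in p_c if p > 0)               # class entropy
    g_t = p_t * sum(p * math.log(p) for p in p_c_given_t if p > 0)
    g_not = (1 - p_t) * sum(p * math.log(p) for p in p_c_given_not_t if p > 0)
    return h + g_t + g_not

# Made-up proportions for a two-class case: the term appears in 40% of
# documents and is strongly associated with the first class
print(round(info_gain([0.5, 0.5], 0.4, [0.9, 0.1], [0.23, 0.77]), 3))
```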

3.4 Proposed Distributional Class Specific Correlation-Based FS Technique (DCC) In this work, a new feature selection technique is proposed to recognize the terms that have a strong dependency with a class. Equation (6) is used to compute the DCC score of a term T_i in class C_j:

    DCC(T_i, C_j) = P(T_i) × (P(C_j | T_i) / P(C_j)) × (|C| / P(C | T_i)) × (P(C_j | T_i) / P(C̄_j | T_i)) − Σ_{k=1..N} correlation(T_i, T_k)    (6)

where P(T_i) is the proportion of documents in the whole dataset that contain term T_i, P(C_j) is the proportion of class-C_j documents in the whole dataset, P(C_j | T_i) is the proportion of class-C_j documents that contain T_i, P(C̄_j | T_i) is the proportion of documents outside class C_j that contain T_i, |C| is the number of classes, and P(C | T_i) is the proportion of all classes that contain term T_i. The correlation(T_i, T_k) is computed using Eq. (7):

    correlation(T_i, T_j) = Σ_{d ∈ docs} (T_{i,d} − T̄_i)(T_{j,d} − T̄_j) / sqrt( Σ_{d ∈ docs} (T_{i,d} − T̄_i)² × Σ_{d ∈ docs} (T_{j,d} − T̄_j)² )    (7)

where T_{i,d} and T_{j,d} are the numbers of occurrences of T_i and T_j in the d-th document, respectively, and T̄_i and T̄_j are the mean values of the T_i and T_j count vectors.
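Equation (7) is the Pearson correlation between two terms' per-document count vectors, which the DCC measure uses to penalize redundant terms. A minimal sketch with toy counts:

```python
import math

def correlation(t_i, t_j):
    """Eq. (7): Pearson correlation between two terms' per-document counts."""
    n = len(t_i)
    mi, mj = sum(t_i) / n, sum(t_j) / n
    num = sum((a - mi) * (b - mj) for a, b in zip(t_i, t_j))
    den = math.sqrt(sum((a - mi) ** 2 for a in t_i)
                    * sum((b - mj) ** 2 for b in t_j))
    return num / den

# Occurrence counts of two terms across five documents (toy values);
# the second term's counts track the first exactly, so they are redundant
print(round(correlation([1, 0, 2, 3, 0], [2, 1, 3, 4, 1]), 3))  # → 1.0
```

A term whose counts rise and fall with an already-selected term scores near 1 and is penalized by the subtraction in Eq. (6).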


Table 2 The accuracies of gender and age prediction when experimented with most frequent terms

Most frequent   Gender prediction           Age prediction
terms           NBM      SVM      RF        NBM      SVM      RF
2000            0.6178   0.6445   0.6547    0.5178   0.5513   0.5907
4000            0.6323   0.6606   0.6891    0.5347   0.5864   0.6155
6000            0.6607   0.6983   0.7208    0.5621   0.6109   0.6476
8000            0.6792   0.7154   0.7439    0.5850   0.6287   0.6654

In this measure, the distributional information of a term across all classes and its class-specific information are both considered, and redundant information is removed.

4 Experimental Results The experiment starts with the most frequent terms and then continues with the features identified by the FS algorithms.

4.1 Experimental Results with Most Frequent Terms The 8000 most frequent bag-of-words terms are used in this experiment. The experiment starts with 2000 terms, the number of terms is increased by 2000 in each iteration, and the experiment stops at 8000 terms, since the accuracies decreased when more than 8000 terms were used. Three MLAs are used to evaluate the accuracy of gender and age prediction. Table 2 displays the accuracies of gender and age prediction with up to 8000 bag-of-words terms. As shown in Table 2, the RF classifier achieved an accuracy of 0.7439 for gender prediction and 0.6654 for age prediction with the 8000 most frequent terms as features.

4.2 Experiment with Feature Selection Techniques In this experiment, the feature selection algorithms MI, IG, CHI2, and DCC are used for age and gender prediction; each selects the most informative features for document representation. Table 3 displays the accuracies of gender and age prediction when experimented with the NBM classifier and the top-ranked 8000 terms computed by the feature selection algorithms.


Table 3 The accuracies of gender and age prediction when experimented with NBM classifier

Top ranked   Gender prediction                      Age prediction
terms        MI       IG       CHI2     DCC         MI       IG       CHI2     DCC
2000         0.6423   0.6824   0.7217   0.7619      0.5939   0.6132   0.6680   0.6830
4000         0.6845   0.7187   0.7408   0.8063      0.6143   0.6357   0.6852   0.7128
6000         0.7109   0.7561   0.7894   0.8305      0.6409   0.6621   0.7123   0.7387
8000         0.7267   0.7719   0.8032   0.8456      0.6525   0.6886   0.7207   0.7491

The proposed DCC feature selection algorithm attained an accuracy of 0.8456 for gender prediction and 0.7491 for age prediction when the 8000 top-ranked terms are used for the vector representation of documents. Table 4 displays the accuracies of gender and age prediction when experimented with the SVM classifier and the top-ranked 8000 terms computed by the feature selection algorithms; here the proposed DCC algorithm attained an accuracy of 0.8612 for gender prediction and 0.7864 for age prediction. Table 5 displays the corresponding accuracies with the RF classifier, where the proposed DCC algorithm attained an accuracy of 0.8831 for gender prediction and 0.8108 for age prediction with the 8000 top-ranked terms.

Table 4 The accuracies of gender and age prediction when experimented with the SVM classifier

Top-ranked   Gender prediction                  Age prediction
terms        MI      IG      CHI2    DCC        MI      IG      CHI2    DCC
2000         0.6143  0.6711  0.7421  0.7983     0.5614  0.6321  0.6745  0.7113
4000         0.6589  0.7102  0.7877  0.8127     0.6057  0.6532  0.7078  0.7355
6000         0.6958  0.7363  0.8103  0.8433     0.6289  0.6965  0.7229  0.7661
8000         0.7157  0.7678  0.8262  0.8612     0.6523  0.7109  0.7512  0.7864

Table 5 The accuracies of gender and age prediction when experimented with the RF classifier

Top-ranked   Gender prediction                  Age prediction
terms        MI      IG      CHI2    DCC        MI      IG      CHI2    DCC
2000         0.6611  0.7132  0.7515  0.8175     0.5810  0.6122  0.6613  0.7507
4000         0.6823  0.7587  0.7848  0.8354     0.6055  0.6481  0.7064  0.7866
6000         0.7256  0.7959  0.8223  0.8667     0.6334  0.6860  0.7379  0.7921
8000         0.7567  0.8126  0.8557  0.8831     0.6559  0.7104  0.7658  0.8108

5 Conclusion and Future Scope

The AP task determines the demographic features of people by analyzing their texts. In this work, experiments were conducted with various feature selection algorithms, and a new feature selection algorithm was proposed. The proposed feature selection algorithm attained the best accuracies for gender and age prediction: the RF classifier achieved an accuracy of 0.8831 for gender prediction and 0.8108 for age prediction with the 8000 top-ranked features generated by the proposed FS algorithm. In future work, we plan to implement different term weight measures to assign suitable weights to the terms in the vector representation, and to find a suitable deep learning architecture with the best hyper-parameters to increase the accuracies of profile prediction.


Detection of Fake News Using Natural Language Processing Techniques and Passive Aggressive Classifier

K. Varada Rajkumar, Pranav Vallabhaneni, Krishna Marlapalli, T. N. S. Koti Mani Kumar, and S. Revathi

Abstract Today, in the era of technology, we spend most of our time on the Web and receive a great deal of information from various sources. As digital and future citizens, we have an obligation to fight the spread of counterfeit news and keep it from controlling our lives. Web-based media has an enormous impact on its users and can even deliver fake information carrying false accusations. We build a model using machine learning techniques to detect fake news on the Web. Fake news relates to circumstances in which objective facts are less influential in shaping belief than appeals to emotion and personal conviction. Using scikit-learn, we build a TF-IDF (term frequency-inverse document frequency) vectorizer on our dataset. We then initialize a passive aggressive classifier and fit the model. Finally, the accuracy value and the confusion matrix tell us how well our model fares. The algorithm is agile and can adjust to new inputs without any problem. Ultimately, it answers the question of what is "REAL" and what is "FAKE" in the spreading news.

1 Introduction

Fake news detection is a method for distinguishing fake news, which spreads vigorously, from real news. With the progress of technology, digital news is more widely exposed to users globally and contributes to the increased spread of deception and disinformation on the Web. Fake news circulates through popular platforms such as social media and the Internet. Spreading disinformation serves different aims, notably gaining favor in political elections, promoting businesses and products, or acting out of simple malice. People can be naïve, and fake news is hard to separate from ordinary news.

K. Varada Rajkumar (B) · P. Vallabhaneni · K. Marlapalli · T. N. S. Koti Mani Kumar · S. Revathi
Department of CSE, Sir C R Reddy College of Engineering, Eluru, Andhra Pradesh, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_53


We present our preliminary investigation of applying machine learning strategies to fake news detection. The fundamental goal is to identify fake information, which is a classic text classification problem with a straightforward premise: build a model that can distinguish between "real" news and "fake" news. Previously, there were only offline tools to detect fake news in huge data; that research was based only on the words in the dataset, used binary classification techniques, and achieved very low accuracy. So we propose an online learning model which acts as passive for correct input and turns aggressive for false or counterfeit input. We use a TF-IDF vectorizer and a passive aggressive classifier on the preprocessed data.

2 Literature Survey

A stacked ensemble of five machine learning classifiers was developed in [1]; the performance observed on the development set did not carry over to the competition because of a considerably more difficult blind test set. A prototype system that uses social argumentation to verify the validity of proposed facts and detect false news from the media is presented in [2]; it applies basic argumentation concepts in a graph-theoretic framework that also incorporates linked data principles and the semantic Web. Reference [3] is a conceptual study illustrating the unique features of satire. A B-TransE model for detecting fake news using knowledge graphs based on news content is proposed in [4], but two technical challenges remain: first, computation-oriented fact checking was not comprehensive enough to cover the relations that fake news detection needs; second, validating the correctness of the triples extracted from news articles is difficult. The authors used Kaggle's "Getting Real about Fake News" dataset together with some true articles from the media. In [5], datasets collected with the Twitter API, of which 15% of the news was fake and 45% real, were classified with SVM and random forest algorithms. Network-based clues are used for detecting fake news in [6], where fake news patterns in social networks are studied and represented at several levels: node-level, ego-level, triad-level, and overall network. This approach improves explainability in fake news feature engineering, but building the network models adds difficulty during model construction. Commonly used supervised machine learning algorithms include random forest, support vector machines, Naive Bayes [7], K-nearest neighbor (KNN), regression, and logistic regression.
In supervised learning, the algorithm learns insights from the training data that we provide to train the model, and this data directs the learning process [8]. The algorithms are trained on labeled data so that the model produces the targeted outputs. In the supervised setting, we have input values x and output values y, and the algorithm learns a mapping function y = f(x). The main goal is to find this mapping during training so that, given new inputs during testing, we can predict the outputs. When readers come across fake news while searching for real news, they may believe


that it is yet more fake news. We therefore built a comparative analysis of fake news detection using multinomial Naïve Bayes, a passive aggressive classifier, and a decision tree classifier, together with natural language processing techniques.

3 Proposed Work

3.1 Dataset

The data for fake news detection using machine learning were taken from a Kaggle dataset with a shape of 7796 × 4. The first column holds the news id; the second and third columns hold the title and the text, respectively; and the fourth column gives the label denoting whether the news is real or fake.
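Loading such a CSV into a DataFrame and checking its shape is straightforward with pandas; the inline frame below is a tiny stand-in for the Kaggle file, so its contents are illustrative:

```python
import pandas as pd

# Inline stand-in for the Kaggle CSV (columns: id, title, text, label).
df = pd.DataFrame({
    "id": [101, 102],
    "title": ["Economy grows steadily", "Shocking claim goes viral"],
    "text": ["GDP rose this quarter ...", "Anonymous sources say ..."],
    "label": ["REAL", "FAKE"],
})
print(df.shape)  # (2, 4); the full dataset has shape (7796, 4)
```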

3.2 Data Preprocessing

Data preprocessing is a technique to clean the data by removing unwanted characters, duplicate data, and stopwords. Figure 1 shows the preprocessing steps. The steps involved in preprocessing are:

1. Tokenization
2. Remove punctuation
3. Remove stopwords

The first step is tokenization, where the sentences of the text are split into individual words called tokens, making the prediction task easier for the machine learning model. The second step, removal of punctuation, strips the punctuation characters defined by the string module, which add no weight to fake news detection; we import the string module, which contains the most used punctuation marks. The third step, stopword removal, removes commonly used words such as "a", "an", "the", "in", and "and" (Fig. 1). Data cleaning also includes deleting duplicate or unimportant values from the dataset; repetitive observations reduce efficiency considerably, as the repeated data may weigh the result toward the correct or the incorrect side, producing unreliable outcomes.
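A minimal sketch of the three steps using only the standard library; the stopword list here is a small illustrative subset, not the full list used in the paper:

```python
import string

STOPWORDS = {"a", "an", "the", "in", "and", "of", "to", "is"}  # illustrative subset

def preprocess(text):
    # 1. Tokenization: split the sentence into individual words.
    tokens = text.lower().split()
    # 2. Remove punctuation using the constants in the string module.
    tokens = [t.strip(string.punctuation) for t in tokens]
    # 3. Remove stopwords and any tokens left empty.
    return [t for t in tokens if t and t not in STOPWORDS]

print(preprocess("The senator, in a statement, denied the fake report!"))
# ['senator', 'statement', 'denied', 'fake', 'report']
```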

3.3 Train-Test Split of Dataset

The train-test split helps in estimating the performance of the passive aggressive classifier, a machine learning algorithm. It is suitable for a large dataset and gives a


Fig. 1 Preprocessing steps

good estimate of the model's performance quickly. We divide the dataset into train and test data; the train set provides the ideal standard for evaluating the model. We split the dataset 80-20% into train and test datasets, respectively. Training the model is the most crucial part of machine learning, so 80% of the data is used as the train set to give the model higher accuracy.
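The 80-20 split can be sketched with scikit-learn's train_test_split; the placeholder texts and labels below stand in for the real dataset:

```python
from sklearn.model_selection import train_test_split

# Placeholders for the news texts and REAL/FAKE labels.
X = [f"news article {i}" for i in range(100)]
y = ["REAL" if i % 2 else "FAKE" for i in range(100)]

# 80% train / 20% test, as used in the paper; random_state fixes the shuffle.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7)
print(len(x_train), len(x_test))  # 80 20
```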

3.4 Feature Extraction

Feature extraction derives, from the existing features, a new set of features relevant to the model. A large number of words, phrases, and terms contribute to the learning process; these features may be relevant or irrelevant to the data, and irrelevant features can harm the accuracy of the proposed model. So, to train the proposed classifier, we extract features that can be converted to numeric values. In our system, we use term frequency-inverse document frequency.


3.5 TF-IDF Vectorizer

TF-IDF stands for term frequency-inverse document frequency. The TF-IDF vectorizer transforms a set of sample documents into a matrix of TF-IDF features, which is then passed to the passive aggressive classifier to fit the model. This approach, employed in search engines and spam filters, encodes words or tokens as continuous numbers rather than integers.

$$ w_{i,j} = tf_{i,j} \times \log\left(\frac{N}{df_i}\right) \quad (1) $$

• Term Frequency (TF): the count of a specific word appearing in the given document, i.e., the ratio between the number of times the word appears and the total number of words in the document:

$$ tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \quad (2) $$

• Inverse Document Frequency (IDF): indicates how significant a term is within the corpus. It gives more weight to rare words in the document and ignores words that are irrelevant to the detection:

$$ idf(w) = \log\left(\frac{N}{df_i}\right) \quad (3) $$

The TF-IDF results are used to find the words in the corpus that are useful for a query. A high TF-IDF value for a word implies a strong relationship with the document.
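Eqs. (1)-(3) can be checked numerically on a toy corpus; document contents and numbers below are illustrative:

```python
import math

# Toy corpus of N tokenized documents.
docs = [["fake", "news", "spreads", "fast"],
        ["real", "news", "matters"],
        ["fake", "news", "claims"]]
N = len(docs)

def tf(term, doc):                      # Eq. (2): n_ij / sum_k n_kj
    return doc.count(term) / len(doc)

def idf(term):                          # Eq. (3): log(N / df_i)
    df = sum(term in doc for doc in docs)
    return math.log(N / df)

def tfidf(term, doc):                   # Eq. (1): tf * idf
    return tf(term, doc) * idf(term)

# "fake" is in 2 of 3 documents, so it carries weight;
# "news" is in every document, so its IDF (and weight) is zero.
print(round(tfidf("fake", docs[0]), 4))  # 0.1014
print(round(tfidf("news", docs[0]), 4))  # 0.0
```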

3.6 Passive Aggressive Classifier

The passive aggressive classifier is an efficient online machine learning algorithm that remains passive for an accurate classification outcome and becomes aggressive for any misclassified or unexpected input. It is one of the valuable and effective tools in machine learning, used mainly when a large amount of data has to be processed. Like the perceptron, it does not need a learning rate parameter.

• Passive: If the given forecast is right, leave the model as it is and do not attempt any changes or improvements; the input is passive to the model and not enough to bring any changes.

• Aggressive: If the input gives wrong predictions, changes to the model are made; such inputs are the key data that drive efficient changes to the proposed model.
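This passive/aggressive behavior can be observed with scikit-learn's PassiveAggressiveClassifier via partial_fit, which applies one online update per call; the two-dimensional toy points below are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier

clf = PassiveAggressiveClassifier(random_state=0)
# Two seed samples establish an initial decision boundary.
clf.partial_fit(np.array([[2.0, 0.0], [0.0, 2.0]]), np.array([1, -1]),
                classes=[-1, 1])
w_before = clf.coef_.copy()

# A correctly classified point with a comfortable margin: the update step
# size is zero, so the model stays passive.
clf.partial_fit(np.array([[5.0, 0.0]]), np.array([1]))
stayed_passive = bool(np.allclose(w_before, clf.coef_))

# A badly misclassified point forces an aggressive weight correction.
clf.partial_fit(np.array([[0.0, 5.0]]), np.array([1]))
turned_aggressive = not np.allclose(w_before, clf.coef_)

print(stayed_passive, turned_aggressive)  # True True
```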


Fig. 2 Flowchart for detecting fake news

3.6.1 Accuracy Score

It gives the percentage of correct predictions on the test dataset: the ratio of true predictions to the overall number of predictions. It gives insight into how well our model works in comparison with previous models.

$$ \text{Accuracy} = \frac{\text{No. of correct predictions}}{\text{Total no. of predictions}} \quad (4) $$

3.6.2 Confusion Matrix

This is used for evaluating the performance of the proposed model. It gives the true positives, false positives, true negatives, and false negatives, comparing the values predicted by the model with the actual values of the target variable. Figure 2 shows the flowchart of detecting fake news.
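Both metrics are available in scikit-learn; a small sketch with illustrative labels:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy ground truth vs. model predictions (illustrative labels).
y_true = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE"]
y_pred = ["FAKE", "REAL", "REAL", "REAL", "FAKE", "FAKE"]

# Eq. (4): correct predictions / total predictions.
print(accuracy_score(y_true, y_pred))  # 4 correct of 6, i.e. 0.666...

# Rows are actual classes, columns predicted, in the order given by labels=.
print(confusion_matrix(y_true, y_pred, labels=["FAKE", "REAL"]))
# [[2 1]
#  [1 2]]
```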

4 Algorithm and Implementation

4.1 Approach

Step 1: Make the necessary imports: NumPy, pandas, and itertools, and, from scikit-learn, train_test_split, the TF-IDF vectorizer, the passive aggressive classifier, accuracy_score, and confusion_matrix.
Step 2: Read the data into a DataFrame and get the shape of the data.
Step 3: Preprocess the data using techniques such as tokenization, stemming, stopword removal, and punctuation removal.
Step 4: Get the labels from the DataFrame.
Step 5: Split the dataset into training and testing datasets.
Step 6: Initialize a TF-IDF vectorizer with the English stopword list and a maximum document frequency of 0.7 (terms with a higher document frequency are removed); the vectorizer turns the collection of raw documents into a matrix of TF-IDF features.
Step 7: Initialize a passive aggressive classifier and fit it on the TF-IDF training matrix and y_train.
Step 8: Predict on the transformed test set and calculate the accuracy.
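The steps above can be sketched end to end with scikit-learn; the tiny inline corpus stands in for the Kaggle CSV, so its contents and variable names are illustrative:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Steps 1-4: load data and labels (inline stand-in for the Kaggle dataset).
df = pd.DataFrame({
    "text": ["gdp rose three percent last quarter",
             "celebrity spotted with aliens in secret base",
             "city council approves new budget plan",
             "miracle pill cures every disease overnight",
             "rainfall expected across the region tomorrow",
             "moon landing was staged claims insider"] * 10,
    "label": ["REAL", "FAKE", "REAL", "FAKE", "REAL", "FAKE"] * 10,
})
labels = df["label"]

# Step 5: 80/20 train-test split.
x_train, x_test, y_train, y_test = train_test_split(
    df["text"], labels, test_size=0.2, random_state=7)

# Step 6: TF-IDF vectorizer; terms in more than 70% of documents are dropped.
vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
tfidf_train = vectorizer.fit_transform(x_train)
tfidf_test = vectorizer.transform(x_test)

# Steps 7-8: fit the passive aggressive classifier and evaluate.
clf = PassiveAggressiveClassifier(max_iter=50)
clf.fit(tfidf_train, y_train)
y_pred = clf.predict(tfidf_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2%}")
print(confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"]))
```

On the real dataset, the same pipeline is applied after the preprocessing of Sect. 3.2.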

5 Experimental Results and Discussion

First, we make the necessary imports: NumPy, pandas, itertools, train_test_split, the TF-IDF vectorizer, the passive aggressive classifier, accuracy_score, and confusion_matrix. We then load the data into a DataFrame, get its shape, and get the labels from the DataFrame. Preprocessing the data is the key step and must be done before analyzing the data under the defined model; tokenization, punctuation removal, and stopword removal are the important preprocessing techniques, eliminating all the unwanted content and making the evaluation of the model easy and efficient. These items are removed because they add no weight to our analysis and only consume extra time, so preprocessing is the crucial step in handling the data with a machine learning model. The dataset is then split into train and test sets: from the training set we extract features and fit the model, and on the testing set we predict using the model obtained on the training set. We send the split data to the TF-IDF vectorizer, which transforms it into a matrix of values given as input to the classifier; the passive aggressive classifier then fits the model on the given data. To assess correctness, the confusion matrix is calculated, recording the true positives, true negatives, false positives, and false negatives. With a large amount of input, i.e., a large training set, the efficiency of the model can be increased.
We then initialized the TF-IDF vectorizer with English stopwords and a maximum document frequency of 0.7 (words with a higher document frequency are removed). Stopwords are the most common words in a vocabulary and are filtered out before handling natural language data. The TF-IDF vectorizer changes the set of raw documents into a matrix of TF-IDF features, providing the feature set through feature extraction. Finally, we initialized the classification algorithm, the passive aggressive classifier, and fit it on the TF-IDF training matrix and y_train.


The passive aggressive classifier, being an online machine learning algorithm, learns from the input: it acts aggressively on inputs that contradict the model and remains passive on inputs that require no change. The train set is used for training the model and extracting features; we then provide the test dataset, evaluate the result, and check whether the model gives the correct output by comparing it with the actual labels, which defines the accuracy of the model. The accuracy score indicates how efficient the algorithm is and suggests changes to the proposed model. We predict on the test set from the TF-IDF vectorizer and calculate the accuracy of the predicted values with the accuracy_score() function from the sklearn metrics package. We compared the accuracy of the proposed model with previous models and achieved a higher accuracy than the others. In Fig. 3, SVM, CNN, and Naive Bayes are the existing systems whose accuracy values for detecting fake news are shown: SVM, a traditional machine learning algorithm, gives an accuracy of 73%; the deep learning model CNN gives 86%; and Naive Bayes gives 74.5%. The proposed system gives an accuracy of 93.37%, which is higher than the existing systems; therefore, the passive aggressive classifier shows the best results in detecting fake news. From the imported dataset, we obtained a fake news count of 3164 and a real news count of 3171, with an accuracy of 93.37%. A bar chart of the counts of fake and real news in our data provides insight into how to proceed. The proposed online machine learning algorithm is best suited when the dataset is very large, since processing huge data is not convenient for traditional models; this modern online algorithm helps to process data dynamically.

Fig. 3 Comparison of the proposed model with the existing models


6 Conclusion and Future Scope

In this paper, we made an attempt to verify the reliability of news stories based on their characteristics. We proposed an algorithm connecting various classification techniques with a model of the texts. It performed well, and the accuracy results were reasonably satisfying. For future work, we plan a finer study of the combination of feature extraction techniques and classifiers, so that we can choose a text representation model that performs well with a given classifier.

References

1. Chen, M., Thorne, J., Myrianthous, G., Wang, J.P., Vlachos, A.: Fake news instance detection using stacked ensemble of classifiers. In: Proceedings of the 2017 Workshop: Natural Language Processing Meets Journalism, pp. 80–83 (2017)
2. Sethi, R.J.: Crowdsourcing the verification of fake news and alternative facts. In: Proceedings of the 28th ACM Conference on Hypertext and Social Media, HT '17, pp. 315–316 (2017)
3. Conroy, N., Chen, V.R., Cornwell: Fake news or truth? Using satirical cues to detect potentially misleading news. In: Proceedings of the Second Workshop on Computational Approaches to Deception Detection, pp. 7–17 (2016)
4. Jeff, Z.P., Pavlova, S., Li, C., Li, N.: Content based fake news detection using knowledge graphs. In: Proceedings of the International Semantic Web Conference (ISWC), pp. 669–683 (2018)
5. Helmstetter, S., Paulheim, H.: Weakly supervised learning for fake news detection on Twitter. In: International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (2018)
6. Xinyi, Z., Zafarani, R.: Network-based fake news detection: a pattern-driven approach. In: Association for Computing Machinery, New York (2019)
7. Anushaya Prabha, T., Aisuwaruya, T., Vamsee Krishna Kiran, M., Vasudevan, S.K.: An innovative and implementable approach for online fake news detection through machine learning. J. Comput. Theor. Nanosci. 17(1), 130–135 (2021)
8. Joju, S., Kammath, P.S.: Analysis of fake news detection using machine learning. In: International Research Journal of Engineering and Technology. e-ISSN: 2395-0056 (2021)

Efficiency Analysis of Pre-trained CNN Models as Feature Extractors for Video Emotion Recognition

Diksha Mehta, Janhvi Joshi, Abhishek Bisht, and Pankaj Badoni

Abstract Emotion recognition is a complex task that involves understanding and scrutinizing the information conveyed by the human body in the form of various physiological signals. Earlier works used these signals to extract features manually, such as facial landmarks or Mel frequency cepstrum coefficients (MFCCs), to classify emotions. But with increasing computational power, neural networks are now widely used to automate this task. Many state-of-the-art convolutional neural network (CNN) models can be used to extract features and classify emotions. In this research, a quantitative comparison has been made between five state-of-the-art CNN models (DenseNet, ResNet, Inception, VGG16, and Xception, along with their variants) as feature extractors. Two popular audio-visual datasets, RAVDESS and SAVEE, have been preprocessed, and the resultant data is used as input to a multimodal model for emotion classification. This model is built by first fine-tuning pre-trained CNN models on the preprocessed audio and video data and then combining the fine-tuned models in a fusion network. The results obtained through this research will help select the appropriate CNN model in future works that use CNN models as feature extractors for emotion recognition.

D. Mehta
Applied Information Sciences, Hyderabad, India

J. Joshi
University of Massachusetts, Amherst, USA

A. Bisht
International Institute of Information Technology, Hyderabad, India

P. Badoni (B)
University of Petroleum and Energy Studies, Dehradun, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_54


1 Introduction

Emotions can be described as a set of predictable responses that humans tend to have in various situations. These responses often have accompanying physiological signals, for example, a faster heartbeat, changes in facial expression, changes in pitch of voice, and so on. When studied carefully, these changes can yield useful information applicable in various fields and domains. To avoid confusion while working with emotions, a two-dimensional system has emerged over the years that defines all emotions on two axes, valence and arousal [8]. As shown in Fig. 1, low valence and low arousal indicate an unpleasant feeling, describing emotions like sadness and depression; similarly, high valence and low arousal represent feelings like happiness. As research in deep learning and its various applications increases, the field of emotion detection and recognition is advancing rapidly. Many approaches in the literature make use of convolutional neural networks (CNNs), a class of neural networks specialized in analyzing images. A CNN architecture consists of three types of layers: convolution, max pooling, and classification [6]. A convolution layer consists of several nodes that extract features from input images using the convolution operation. A pooling layer reduces the dimension of these extracted features to ensure better accuracy of the output from the last layer, that is, the classification layer. Such an approach to image feature extraction has revolutionized computer vision and simplified the extraction of emotions. Over the years, several state-of-the-art CNN models trained on huge benchmark datasets, like ImageNet [7], CIFAR-10, CIFAR-100, MS-COCO [16], and SVHN [18], have emerged. Usage of these tried and tested architectures ensures accurate and reliable results and gives researchers room to explore new paths and further innovations. Some well-known architectures include VGG16 [25], GoogleNet [27], DenseNet [11], Xception [5], Inception [28], and so on.

Fig. 1 Two-dimensional system for emotions
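The convolution and pooling operations described above can be illustrated with plain NumPy (a hand-rolled sketch of the two operations, not a trainable CNN):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in CNN layers)."""
    h, w = kernel.shape
    out = np.empty((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling, shrinking each spatial dimension."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)    # toy 6x6 "image"
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # vertical edge detector

features = conv2d(image, edge_kernel)  # feature map, shape (5, 5)
pooled = max_pool(features)            # reduced map, shape (2, 2)
print(features.shape, pooled.shape)    # (5, 5) (2, 2)
```

In a real CNN, many such kernels are learned per layer and the pooled maps feed the classification layer.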


Recently, there has been an increase in combining all of these features (facial expressions, audio signals, physiological signals, body expressions) to build multimodal systems [3, 21, 26], and such a multimodal approach is found to yield more informative results [2, 4, 22, 33]. The study of facial expressions coupled with audio signals, in particular, is rapidly gaining importance due to the variety of its applications, ranging from gaming and website customization to education [14]. Moreover, these two signals are among the easiest to capture, requiring no additional or complicated sensors. In this paper, certain preprocessing steps are applied to the data, which is then given as input to a variety of state-of-the-art CNN models to provide a comparative analysis among them. DenseNet [11], ResNet [9], Inception [28], VGG16 [25], and Xception [5] have been evaluated by training them on two audio-video datasets, RAVDESS [17] and SAVEE [12], in terms of their efficacy in emotion recognition using a multimodal system of facial expressions and audio signals.

2 Related Works

A lot of research has been done on unimodal emotion recognition systems that consider audio, visual, and textual features individually. In [15], Li et al. present a detailed survey on facial expression recognition (FER), covering all the available and widely used databases, the common preprocessing steps required for FER, and state-of-the-art models for static and temporal facial data. Similarly, in [24], Schuller reviewed speech emotion recognition (SER) developments from the beginning of the concept to the latest approaches for recognizing emotions through audio/speech. Moving toward multimodal methods (facial, speech, physiological signals, etc.), researchers are now designing multimodal systems with high accuracy and robustness, and many have concluded in their studies that multimodal emotion recognition systems are better than unimodal ones. In [2], Busso et al. classified four emotions by using speech recordings, facial expressions, and both; this study concluded that when the two modalities were fused, the performance increased significantly. Likewise, in [22], Poria et al. compared results obtained from text, visual, and audio alone and then combined, also observing higher accuracies when the modalities are combined. Many researchers are finding different approaches to build multimodal systems by extracting features from recorded audio, video, and biosignals. In [31], Zhang et al. proposed a multimodal audio-video approach in which a multimodal deep CNN is evaluated on the RML audio-video database [30]; the approach includes two-step training, i.e., fine-tuning pre-trained face and audio models and then fusing them. Similarly, in [32], Zhang et al. proposed a fusion model (FM) called EMOdal which learns the common and modality-specific


information of audio and visual data and improves the inference ability by adopting a common loss. In [19], Noroozi et al. classified emotions in audio channels by extracting MFCCs, filter bank energies, and prosodic features; for facial expressions, they extracted the geometric relations of facial landmarks and frames summarizing each video, which were further classified with a neural network (NN) classifier. The three outputs (facial landmark relations, neural net weights, and audio features) were then fused to get the output of the multimodal model. In [13], Kim et al. proposed the informed segmentation and labeling approach (ISLA), which uses speech signals to segment the facial data: the face is segmented into upper and lower parts, and emotion is then recognized using these along with speech. In [29], Tzirakis et al. use a CNN and a deep residual network (ResNet) for acoustic and visual feature extraction, respectively, stacked with a two-layer long short-term memory (LSTM) network with 256 cells each to capture temporal information; the multimodal network is then trained end to end by feeding the extracted features into the LSTM, utilizing the weights of the trained unimodal models. Further, Table 1 shows an elaborate comparison between the above-mentioned approaches based on the results achieved on different databases.

Table 1 Review of multimodal emotion recognition systems

References | Title | Database | Results
[31] | Multimodal deep convolutional neural network for audio-visual emotion recognition | RML audio–video database | Testing accuracy: Audio-net = 66.17%; Video-net = 60.79%; Fusion-net = 74.32%
[32] | Poster abstract: multimodal emotion recognition by extracting common and modality-specific information | Large-scale real-world dataset collected from 33 TV shows and 20 movies | Testing accuracy: Audio-only = 52.20%; Image-only = 63.73%; Fused = 66.02%
[13] | ISLA: temporal segmentation and labeling for audio-visual emotion recognition | IEMOCAP, SAVEE databases | Highest accuracy: IEMOCAP = 67.22% (audio + left face); SAVEE = 86.01% (audio + upper face)
[29] | End-to-end multimodal emotion recognition using deep neural networks | RECOLA database of the AVEC 2016 research challenge | Predictions of arousal and valence on raw signals: 0.789 and 0.691, respectively

Efficiency Analysis of Pre-trained CNN Models …

607

3 Methodology

The proposed approach classifies emotions using a multimodal CNN model consisting of two input streams, a visual stream and an audio stream, each of which uses a state-of-the-art CNN model as a feature extractor. The outputs of these streams are concatenated and fed to a fusion network, which is then connected to a classification layer [31]. The proposed approach can thus be divided into two main parts:

1. Input Generation. The video files from the datasets are preprocessed to obtain the inputs for the two input streams of the multimodal model. This is described in detail in Sect. 3.1.
2. Training. The preprocessed output of the previous part is used to train the model. Training is done in two steps: first, the pre-trained models are fine-tuned, and then the fusion network is trained. This process is described in detail in Sect. 3.2.

The source code for our work can be found at the link given in the footnote.1

3.1 Input Generation

Visual Input Generation The dataset contains videos classified into various emotions. According to He et al. [10], the emotion depicted by a single frame can differ from that of the video as a whole, so the temporal relation between frames is important. Moreover, many details present in a frame, for example, the background, hair, and ears of the actors, do not contribute to the emotion. Hence, for video processing, extracted frames are cropped and aligned to remove unnecessary details, and then frames are subtracted to highlight the temporal relations between them, as shown in Fig. 2.

Extraction of Frames Frames are extracted from all videos at a fixed interval. After experimentation, this interval was set to 10; that is, every tenth frame of the video is considered, which proved sufficient to capture the underlying emotion of the video.
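The fixed-interval sampling described above amounts to a strided slice over the decoded frame stack. A minimal sketch (a small NumPy array stands in for the decoded video; a real pipeline would decode frames with a library such as OpenCV first):

```python
import numpy as np

def sample_frames(frames, interval=10):
    """Keep every `interval`-th frame (the paper settles on interval = 10)."""
    return frames[::interval]

# Toy stand-in for a decoded video: 95 frames of 4x4 grayscale pixels.
video = np.arange(95 * 16).reshape(95, 4, 4)
sampled = sample_frames(video, interval=10)
print(sampled.shape[0])  # 10 frames: indices 0, 10, ..., 90
```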

Fig. 2 Visual input generation

1 https://doi.org/10.5281/zenodo.5515542.


Detection of Region of Interest (ROI) and Alignment of Frames The major preprocessing step, subtraction, requires the faces in all frames of a video to be aligned for good results. For this, the ROI, i.e., the facial region including the eyes, nose, and mouth, is detected. Detectors such as Haar cascades and dlib can be used for this purpose; in this approach, dlib is used because it provides higher accuracy [1]. After detection of the ROI, the cropped frames are rotated and aligned using Algorithm 1.
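Algorithm 1 is not reproduced here, but the core of eye-based alignment is computing the angle by which a face must be rotated so that the eye line becomes horizontal. A sketch under the assumption that eye centres (x, y) have already been obtained from a landmark detector such as dlib's:

```python
import math

def alignment_angle(left_eye, right_eye):
    """Angle (degrees) of the line through the two eye centres.
    Rotating the cropped face by the negative of this angle levels the eyes.
    Eye centres would come from a landmark detector; here they are plain tuples."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

# A tilted face: the right eye sits 10 px lower than the left one.
angle = alignment_angle((30, 40), (70, 50))
print(round(angle, 1))  # 14.0
```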

Subtraction of Aligned Frames Subtraction is performed on the images to highlight the differences between them and accentuate the underlying emotion of the particular video [10]. The cropped and aligned frames of a video are subtracted until a single final image representing that video is computed. This final frame outlines the important features that depict the emotion of the video. Simple pixel-by-pixel subtraction is done using Eq. 1:

S(i, j) = |I1(i, j) − I2(i, j)|    (1)

where S is the subtracted image, I1 is the first image, and I2 is the second image. The complete steps are given in Algorithm 2.
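One plausible reading of the "subtract until a single image remains" step is a left-to-right fold of Eq. 1 over the frame stack, which can be sketched as follows (the widened integer type avoids uint8 wrap-around during subtraction):

```python
import numpy as np

def collapse_by_subtraction(frames):
    """Fold a stack of aligned frames into one image by repeatedly
    applying Eq. 1, |I1 - I2|, left to right (one plausible reading
    of the paper's 'subtract until a single image remains' step)."""
    result = frames[0].astype(np.int16)  # widen to avoid uint8 wrap-around
    for frame in frames[1:]:
        result = np.abs(result - frame.astype(np.int16))
    return result.astype(np.uint8)

a = np.full((2, 2), 200, dtype=np.uint8)
b = np.full((2, 2), 50, dtype=np.uint8)
c = np.full((2, 2), 120, dtype=np.uint8)
print(collapse_by_subtraction(np.stack([a, b, c])))  # all pixels: ||200-50|-120| = 30
```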

Audio Input Generation The proposed approach makes use of the mel-spectrogram for extracting the audio features [23]. The computed spectrogram is used as an input for the CNN models as shown in Fig. 3.


Fig. 3 Audio input generation

Computation of Spectrogram For the audio corresponding to each video in the dataset, 64 Mel-filter banks spanning 20–20,000 Hz are used to compute the log Mel-spectrogram of the entire signal using a 25 ms Hamming window and a hop length of 10 ms [20]. The spectrogram is computed using the short-time Fourier transform (Eq. 2):

STFT{x[n]}(m, ω) ≡ X(m, ω) = Σ_{n=−∞}^{∞} x[n] w[n − m] e^{−jωn}    (2)

where x[n] is the input signal, w[n] is the window function, and X is the short-time Fourier transform of the input signal. To convert the spectrogram into a Mel-spectrogram, the frequencies (denoted by f) are converted to the Mel scale using Eq. 3:

m = 2595 log10(1 + f/700)    (3)

Since the CNN models take three channels as input, the derivatives of the spectrogram are computed to obtain the static, delta (δ, the first derivative of the spectrogram), and delta-delta (δ², the second derivative) channels to provide as input to the model. The usefulness of the δ and δ² channels has already been demonstrated in earlier work [31]; therefore, these channels are used in our proposed approach. The three-channel spectrogram thus obtained is resized to match the input shape specification of the model. This is done at run time using bilinear interpolation.
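Equation 3 and the three-channel stacking can be sketched directly in NumPy. This is a minimal stand-in: a real pipeline would typically compute the log Mel-spectrogram with an audio library (e.g., librosa) and use its regression-based delta features, whereas here a random array plays the spectrogram and `np.gradient` approximates the temporal derivative:

```python
import numpy as np

def hz_to_mel(f):
    """Eq. 3: convert frequency in Hz to the Mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def delta(spec, order=1):
    """Simple temporal derivative of a (mels x frames) spectrogram,
    giving the delta (order=1) and delta-delta (order=2) channels."""
    d = spec
    for _ in range(order):
        d = np.gradient(d, axis=1)
    return d

spec = np.random.default_rng(0).random((64, 100))     # log-Mel stand-in
three_channel = np.stack([spec, delta(spec, 1), delta(spec, 2)], axis=-1)
print(three_channel.shape)  # (64, 100, 3): static, delta, delta-delta
```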

3.2 Training of Models

As shown in Fig. 5, the multimodal model is trained in two stages: the first stage fine-tunes the pre-trained CNN models on audio and visual data, and the second trains the fusion network. Both stages are explained in detail in the following subsections.


Fig. 4 Fine-tuning of pre-trained models

Fine-Tuning of Pre-trained Models Figure 4 shows the fine-tuning step, in which pre-trained CNN models trained on huge datasets such as ImageNet are loaded and modified by replacing the last (classification) layer with our own classification layer, which uses softmax as the activation function. These models are then fine-tuned separately on the facial and audio preprocessed data from the considered datasets.

Training of Fusion Network In this step, the fine-tuned audio and facial models are loaded from disk, and their parameters are made non-trainable so that the fine-tuned models act as feature extractors. The classification layers of the models are discarded, and the models are concatenated to obtain a joint feature vector. This feature vector is connected to a fully connected layer with 2048 hidden units, which is further connected to a classification layer that classifies the input into the different emotion classes, as shown in Fig. 5.

Hyperparameter Selection After experimentation, the following set of hyperparameters was found to be best suited for our purpose: batch size = 8, epochs = 200, optimizer = RMSprop, activation function = ReLU.
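The fusion forward pass described above can be sketched as plain NumPy arithmetic. This is an illustrative toy, not the authors' implementation: the 1536-dim feature vectors, 7-class output, and random weights are assumptions; in practice the frozen extractors and the 2048-unit fully connected layer would live in a deep learning framework and be trained with RMSprop as stated:

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Feature vectors as the frozen fine-tuned models would emit them;
# the 1536-dim size is an assumption for illustration only.
visual_feat = rng.random(1536)
audio_feat = rng.random(1536)

joint = np.concatenate([visual_feat, audio_feat])        # joint feature vector
W1 = rng.standard_normal((2048, joint.size)) * 0.01      # FC layer, 2048 units
hidden = np.maximum(0.0, W1 @ joint)                     # ReLU activation
W2 = rng.standard_normal((7, 2048)) * 0.01               # 7 emotion classes
probs = softmax(W2 @ hidden)                             # classification layer
print(probs.shape, round(probs.sum(), 6))  # (7,) 1.0
```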

3.3 Datasets Used

The following audio-visual datasets have been used for our experimentation, and the results obtained when they were passed to our model as input are presented below.


Fig. 5 Fusion network training

RAVDESS The Ryerson Audio-Visual Database of Emotional Speech and Song [17] is a validated set of audio-visual speech and song data in North American English. It contains 7356 files with unique filenames from 24 professional actors (12 male, 12 female), vocalizing two lexically matched statements in a neutral North American accent. Speech emotions include calm, happy, sad, angry, fearful, surprise, and disgust, while song emotions include neutral, calm, happy, sad, angry, and fearful. The dataset was validated by 319 raters.

SAVEE The Surrey Audio-Visual Expressed Emotion database [12] is an audio-visual dataset of expressed emotions covering the six basic emotions of anger, disgust, fear, happiness, sadness, and surprise, plus neutral. It has recordings from 4 male actors in 7 different emotions, amounting to 480 British English utterances in total. The sentences were chosen from the standard TIMIT corpus and phonetically balanced for each emotion. The recordings were evaluated by 10 subjects under audio-only, visual-only, and audio-visual conditions to check the quality of performance.

4 Result Analysis

In this section, the results obtained from training all the different versions of the considered CNN models are examined. Models are trained and tested on two datasets, RAVDESS [17] and SAVEE [12]. The performance of the models is compared, and the best-performing models are further scrutinized. In Table 2, the different versions of the CNN models are evaluated on the SAVEE dataset. Here as well, the fused model (FM) displays much better results. In the case of the SAVEE dataset, the visual recognition rate is higher than the acoustic one. The best accuracy obtained on SAVEE is 82.29% by the fused inception_resnet_v2 model, followed by 72.91% by the fused Xception and DenseNet121 models. VGG and ResNet, on the other hand, show lower accuracies.


It is observed that the inception_resnet_v2 and Xception models consistently show better accuracies for the fused model (FM) as well as for the separate visual (VM) and audio (AM) models. Figure 6 depicts the confusion matrices of inception_resnet_v2 and Xception trained on SAVEE, as these two models show the highest accuracies on this database. Neutral emotion is correctly classified by both models, as visible in Fig. 6a, b, while both models confuse the disgust class. In Table 3, the different versions of the CNN models are evaluated on the RAVDESS dataset. From the table, it is clear that the FM outperforms the visual and audio models (VM and AM). The best accuracy obtained on RAVDESS is 75.42% by the fused inception_resnet_v2 model, followed by

Table 2 Performance in terms of visual (VM), audio (AM), and fusion (FM) model accuracies for the considered CNN models trained and evaluated on the SAVEE dataset

CNN model | Version | VM (%) | AM (%) | FM (%)
DenseNet | 121 | 64.58 | 53.12 | 72.91
DenseNet | 169 | 62.50 | 51.04 | 71.87
DenseNet | 201 | 52.08 | 48.95 | 60.41
ResNet | 50 | 59.37 | 40.62 | 60.41
ResNet | 50v2 | 42.70 | 45.83 | 59.37
ResNet | 101 | 43.75 | 46.87 | 56.25
ResNet | 101v2 | 38.54 | 36.45 | 40.62
ResNet | 152 | 64.58 | 38.54 | 62.50
ResNet | 152v2 | 43.75 | 44.79 | 54.16
Inception | v3 | 41.67 | 48.96 | 53.12
Inception | resnetv2 | 72.91 | 56.25 | 82.29
VGG | 16 | 31.25 | 27.08 | 31.25
Xception | – | 73.95 | 42.70 | 72.91

Fig. 6 Confusion matrices of (a) inception_resnet_v2 and (b) Xception trained on SAVEE


Table 3 Performance in terms of visual (VM), audio (AM), and fusion (FM) model accuracies for the considered CNN models trained and evaluated on the RAVDESS dataset

CNN model | Version | VM (%) | AM (%) | FM (%)
DenseNet | 121 | 38.56 | 63.48 | 68.25
DenseNet | 169 | 35.83 | 66.21 | 72.69
DenseNet | 201 | 34.12 | 62.45 | 69.28
ResNet | 50 | 32.76 | 48.12 | 59.72
ResNet | 50v2 | 31.05 | 46.41 | 57.67
ResNet | 101 | 36.17 | 49.14 | 60.75
ResNet | 101v2 | 27.98 | 40.61 | 43.68
ResNet | 152 | 31.05 | 60.75 | 66.21
ResNet | 152v2 | 34.13 | 40.24 | 49.15
Inception | v3 | 28.66 | 53.92 | 51.19
Inception | resnetv2 | 53.92 | 61.09 | 75.42
VGG | 16 | 34.81 | 13.31 | 37.88
Xception | – | 47.78 | 67.91 | 71.67

Fig. 7 Confusion matrices of (a) inception_resnet_v2 and (b) Xception trained on RAVDESS

72.69% by the fused DenseNet169 model and 71.67% by the fused Xception model, while ResNet shows lower overall accuracy across all its versions. Figure 7 shows the confusion matrices of the two models with the highest accuracies on the RAVDESS dataset, i.e., the inception_resnet_v2 and Xception CNN models. In Fig. 7, it is visible that the inception_resnet_v2 model classifies the calm, happy, angry, and surprise emotions very well, whereas it is somewhat confused when recognizing the neutral emotion. Similarly, Xception classifies calm, angry, and surprise very well, like the first model; however, it misclassifies neutral as calm.

614

D. Mehta et al.

5 Conclusion and Future Works

In this work, five state-of-the-art CNN models and their various versions are compared based on their performance as facial and audio feature extractors for emotion detection. Inception_resnet_v2 outperforms all other models, with 75.42% accuracy on RAVDESS and 82.29% on SAVEE. Xception and DenseNet show the second-highest accuracies on both datasets. The work also shows that the fused model obtains better accuracies than the individual facial and audio models; the multi-modality therefore helps in obtaining better results. In the future, we plan to explore more CNN models as well as LSTMs for handling the temporal nature of the data, and to experiment with more hyperparameters to enhance the results.

References 1. Adouani, A., Ben Henia, W.M., Lachiri, Z.: Comparison of haar-like, hog and lbp approaches for face detection in video sequences. In: 2019 16th International Multi-Conference on Systems, Signals Devices (SSD), pp. 266–271 (2019) 2. Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C.M., Kazemzadeh, A., Lee, S., Neumann, U., Narayanan, S.: Analysis of emotion recognition using facial expressions, speech and multimodal information. In: Proceedings of the 6th International Conference on Multimodal Interfaces, pp. 205–211. ICMI ’04, Association for Computing Machinery, New York, NY, USA (2004). https://doi.org/10.1145/1027933.1027968 3. Canento, F., Fred, A., Silva, H., Gamboa, H., Lourenço, A.: Multimodal biosignal sensor data handling for emotion recognition. In: SENSORS, 2011 IEEE, pp. 647–650 (2011) 4. Castellano, G., Kessous, L., Caridakis, G.: Emotion recognition through multiple modalities: face, body gesture, speech, pp. 92–103. Springer Berlin Heidelberg, Berlin, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85099-1_8 5. Chollet, F.: Xception: deep learning with depthwise separable convolutions, pp. 1800–1807 (2017). https://doi.org/10.1109/CVPR.2017.195 6. Cire¸san, D.C., Meier, U., Masci, J., Gambardella, L.M., Schmidhuber, J.: Flexible, high performance convolutional neural networks for image classification. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, vol. 2, pp. 1237– 1242. IJCAI’11, AAAI Press (2011) 7. Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009) 8. Egger, M., Ley, M., Hanke, S.: Emotion recognition from physiological signal analysis: a review. Electron. Notes Theoret. Comput. Sci. 343, 35–55 (2019) 9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) 10. He, Z., Jin, T., Basu, A., Soraghan, J., Di Caterina, G., Petropoulakis, L.: Human emotion recognition in video using subtraction pre-processing. In: Proceedings of the 2019 11th International Conference on Machine Learning and Computing, pp. 374–379. ICMLC ’19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3318299. 3318321 11. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269 (2017)


12. Jackson, P., ul haq, S.: Surrey audio-visual expressed emotion (savee) database (2011) 13. Kim, Y., Provost, E.M.: Isla: temporal segmentation and labeling for audio-visual emotion recognition. IEEE Trans. Affective Comput. 10(2), 196–208 (2019) 14. Kołakowska, A., Landowska, A., Szwoch, M., Szwoch, W., Wróbel, M.R.: Emotion Recognition and Its Applications, pp. 51–62. Springer International Publishing, Cham (2014). https:// doi.org/10.1007/978-3-319-08491-6_5 15. Li, S., Deng, W.: Deep facial expression recognition: a survey. IEEE Trans. Affective Comput. 1-1 (2020) 16. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision—ECCV 2014, pp. 740–755. Springer International Publishing, Cham (2014) 17. Livingstone, S., Russo, F.: The Ryerson audio-visual database of emotional speech and song (ravdess): a dynamic, multimodal set of facial and vocal expressions in north American English. PLoS ONE 13 (2018) 18. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.: Reading digits in natural images with unsupervised feature learning. NIPS (01 2011) 19. Noroozi, F., Marjanovic, M., Njegus, A., Escalera, S., Anbarjafari, G.: Audio-visual emotion recognition in video clips. IEEE Trans. Affective Comput. 10(1), 60–75 (2019) 20. Paliwal, K.K., Lyons, J.G., Wójcicki, K.K.: Preference for 20–40 ms window duration in speech analysis. In: 2010 4th International Conference on Signal Processing and Communication Systems, pp. 1–4 (2010) 21. Piana, S., Staglianò, A., Odone, F., Camurri, A.: Adaptive body gesture representation for automatic emotion recognition. ACM Trans. Interact. Intell. Syst. 6(1) (2016). https://doi.org/ 10.1145/2818740 22. Poria, S., Chaturvedi, I., Cambria, E., Hussain, A.: Convolutional mkl based multimodal emotion recognition and sentiment analysis. 
In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 439–448 (2016) 23. Prasomphan, S.: Detecting human emotion via speech recognition by using speech spectrogram. In: 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–10 (2015) 24. Schuller, B.W.: Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends. Commun. ACM 61(5), 90–99 (2018). https://doi.org/10.1145/3129340 25. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arxiv:1409.1556 26. Soleymani, M., Asghari-Esfeden, S., Fu, Y., Pantic, M.: Analysis of EEG signals and facial expressions for continuous emotion detection. IEEE Trans. Affective Comput. 7(1), 17–28 (2016) 27. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015) 28. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Computer Vision and Pattern Recognition (CVPR) (2015). arxiv:1409.4842 29. Tzirakis, P., Trigeorgis, G., Nicolaou, M.A., Schuller, B.W., Zafeiriou, S.: End-to-end multimodal emotion recognition using deep neural networks. IEEE J. Sel. Topics Signal Process. 11(8), 1301–1309 (2017) 30. Wang, Y., Guan, L.: Recognizing human emotional state from audiovisual signals*. IEEE Trans. Multimed. 10(5), 936–946 (2008) 31. Zhang, S., Zhang, S., Huang, T., Gao, W.: Multimodal deep convolutional neural network for audio-visual emotion recognition. In: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pp. 281–284. ICMR ’16, Association for Computing Machinery, New York, NY, USA (2016). https://doi.org/10.1145/2911996.2912051


32. Zhang, W., Gu, W., Ma, F., Ni, S., Zhang, L., Huang, S.L.: Multimodal emotion recognition by extracting common and modality-specific information. In: Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems, pp. 396–397. SenSys ’18, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3274783. 3275200 33. Zheng, W., Liu, W., Lu, Y., Lu, B., Cichocki, A.: Emotionmeter: a multimodal framework for recognizing human emotions. IEEE Trans. Cybern. 49(3), 1110–1122 (2019)

Mapping of Computational Social Science Research Themes: A Two-Decade Review Agung Purnomo, Nur Asitah, Elsa Rosyidah, Andre Septianto, and Mega Firdaus

Abstract Research on computational social science continues to develop but is often limited to one country and/or one field. From the perspective of a bibliometric review, this study aims to visually map research and research trends in the field of computational social science on an international scale. The study used bibliometric techniques with secondary data from Scopus, analyzing 729 scientific documents published from 1999 to 2020. According to the research, the USA, the Massachusetts Institute of Technology, and Claudio Cioffi-Revilla were the most active affiliated country, institution, and individual scientist in computational social science research. Based on the identification of the body of knowledge accumulated over two decades of publication, this research proposes a grouping of computational social science research themes: computational methods, human, application of computer science, social science, and element of social media, abbreviated as the CHASE research themes.

A. Purnomo (B) Entrepreneurship Department, BINUS Business School Undergraduate Program, Bina Nusantara University, Jakarta 11480, Indonesia e-mail: [email protected] N. Asitah · E. Rosyidah · A. Septianto · M. Firdaus Institute for Research and Community Services, Universitas Nahdlatul Ulama Sidoarjo, Sidoarjo 61218, Indonesia e-mail: [email protected] E. Rosyidah e-mail: [email protected] A. Septianto e-mail: [email protected] M. Firdaus e-mail: [email protected] E. Rosyidah Environmental Engineering Department, Universitas Nahdlatul Ulama Sidoarjo, Sidoarjo 61218, Indonesia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_55

617

618

A. Purnomo et al.

1 Introduction

Big data has given researchers new ways to achieve high significance and effect, as well as to shift and improve the way we investigate social science phenomena [1]. The open data movement, together with developments in data analysis, has created new possibilities for the social science of crises [2]. Computational social science (CSS) is a modern area of research resulting from the convergence of social science with computer science and engineering [3]. Computational social science is the analysis of social processes using Web intelligence, software, and technologies capable of observing, diagnosing, and solving people's everyday problems [4]. In the presence of friends-of-friends factors, social media, social networks, customer opinions, and mutual emotions carry a lot of power [5]. Computational social science has led to a paradigm shift in communication in particular and social science in general [6]. CSS's key features include automated social network analysis, knowledge extraction systems, spatial information systems, social simulation models, and complexity processing [7]. Social science problems exist at the level of individuals, relationships, and the whole society [8]. User interactions are a reflection of social relationships [9, 10]. The field of CSS has developed rapidly, with thousands of documents published using experimental designs, large-scale observation, and data that was previously inadequate or unavailable to researchers [11]. Scientists will re-evaluate interpersonal and mass communication thanks to the intersection of social media, big data, social computing, and communication science [12]. Research related to computational social science in business, computer science, and social science has been carried out and developed at the international level over the last few years. However, previous studies on the topic of computational social science were typically restricted to one particular field [13, 14]. Not much has been reported that maps computational social science as a large, visualized picture on a global scale, year after year, using data from many published studies, and no publication has directly discussed the relationships among scholars and the influence of scholarly studies. One of the methods used to view research in general is the bibliometric method. Bibliometrics is a method for measuring and analyzing scientific references with a combination of mathematical and statistical methods; it is a statistical technique for analyzing bibliographic publication data such as peer-reviewed journal articles, reports, reviews, books, periodicals, conference proceedings, and related publications. Bibliometric methods have been widely used to present the relationship between quantitative methods and a research domain [15]. This study poses the research question: What are the mapping and trends of computational social science research under visual bibliometric analysis? From the perspective of a bibliometric review, this study aims to visually map research and research trends in the field of computational social science on an international scale.

Mapping of Computational Social Science Research …

619

2 Research Methods

This study used bibliometric analysis over a comprehensive literature database. To search for and classify relevant documents in the global Scopus database, this survey listed important keywords related to computational social science. We used the Scopus database as the main source of information because academics consider it a reliable source of scientific publications. The keyword "computational social science" was searched in the title, abstract, and author keywords to obtain the necessary data from the Scopus database. Data mining was limited to complete years so that each year contributes twelve full months of published data, using the search query TITLE-ABS-KEY ("computational social science") AND PUBYEAR < 2021, as of March 2021. In this step, we found 729 publications spanning 21 years, from 1999 to 2020. At this point, the Scopus result metadata was extracted in CSV format [16]. The Scopus Web site provides an analyze-search-results function that displays bibliometric information for the selected publications; we used this service to analyze and visualize the publication productivity of researchers, institutions, and countries, and to measure the number of annual publications and citations as well as the proportions of subject areas and source documents [17, 18]. In the next stage, we analyzed the collected documents using VOSviewer ver. 1.6.16 for co-occurrence analysis. This study employs the VOSviewer tool to create a network map of keywords for the research themes, together with a thorough co-occurrence analysis, keyword relationship analysis, and a fully systematic computation technique [19, 20]. Simple statistics and tables were calculated and tabulated using Microsoft Excel. Finally, the research results were synthesized and triangulated.
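The simple annual-count tabulation described above can be sketched with the standard library alone (the "Year" column name follows Scopus's usual CSV export header; the three-row CSV is a toy stand-in for the 729-document export):

```python
import csv
import io
from collections import Counter

def annual_counts(scopus_csv):
    """Tally publications per year from a Scopus CSV export string."""
    reader = csv.DictReader(io.StringIO(scopus_csv))
    return Counter(row["Year"] for row in reader)

# Tiny stand-in for the real 729-document export.
sample = """Authors,Title,Year
A,Paper one,2020
B,Paper two,2020
C,Paper three,2019
"""
print(annual_counts(sample))  # Counter({'2020': 2, '2019': 1})
```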

3 Result and Discussion

This section discusses the results based on the most productive organizational affiliations, nations, and individual researchers; the largest subject-area proportions; annual source documents; annual documents and cited papers; the research theme map; and authorship networks in computational social science.

620

A. Purnomo et al.

Fig. 1 Annual publication count of computational social science by institutional affiliation

3.1 Most Productive Organizational Affiliations in Computational Social Science Research

There were 1,505 affiliated organizations that have researched computational social science. The organization with the most computational social science publications was the Massachusetts Institute of Technology, USA (n = 25), followed by Northeastern University, USA (n = 24); Harvard University, USA (n = 20); George Mason University, Fairfax Campus, USA (n = 18); Stanford University, USA (n = 16); University of Southern California, USA (n = 16); Northwestern University, USA (n = 15); University of Oxford, UK (n = 15); University of Pennsylvania, USA (n = 14); Copenhagen Business School, Denmark (n = 14); and University of Warwick, UK (n = 14). There were 58 countries identified as having researched computational social science, with the USA the leading nation in computational social science publications (n = 374). The most productive research organizations in the study of computational social science were thus dominated by world-class universities from the USA (n = 8) and the UK (n = 2). This is because the USA and the UK are countries that support development and research, including in the computational social science area (Fig. 1).

3.2 Most Individual Researcher in Computational Social Science Research

There were 1,737 individual researchers who have researched computational social science. Cioffi-Revilla, C., from George Mason University, USA, was the researcher with the most publications in the field of computational social science (n = 13).


Fig. 2 Most individual computational social science publication researcher

The researcher ranking is followed by Moat, H.S. from the University of Warwick, UK (n = 12); Preis, T. from the University of Warwick, UK (n = 12); Vatrapu, R. from Copenhagen Business School, Denmark (n = 10); Hussain, A. from Copenhagen Business School, Denmark (n = 9); Jungherr, A. from Otto-Friedrich-Universität, Germany (n = 8); Lazer, D. from Northeastern University, USA (n = 8); Pentland, A. from the Massachusetts Institute of Technology, USA (n = 8); Bail, C.A. from Duke University, USA (n = 7); Jürgens, P. from Johannes Gutenberg-Universität Mainz, Germany (n = 7); and Lepri, B. from Fondazione Bruno Kessler, Italy (n = 7) (Fig. 2). The most productive individual researchers in the computational social science study thus mostly come from the USA (n = 4), the UK (n = 2), Denmark (n = 2), and Germany (n = 2).

3.3 The Computational Social Science Sector's Annual Publications

The number of annual international publications on computational social science shows an increasing trend. Figure 3 shows the publication peak in 2020, with 111 publications. Researchers have been publishing on computational social science since 1999. The field produced 104 documents in 2019, compared to 81 in 2018, 93 in 2017, 75 in 2016, and 65 in 2015.

622

A. Purnomo et al.

Fig. 3 Computational social science sector’s annual publications

3.4 Research Theme Map

The research theme map is a review that seeks to identify computational social science research based on keyword linkages between publications. The VOSviewer software was used to evaluate and visualize the keyword structure of the computational social science publications for the research theme map. A keyword had to occur in at least eight articles to be included; as a consequence, 117 of the 3,889 keywords met the criterion. Based on the research keywords, Fig. 4 represents five research theme groups for international academic publication of computational social science, which have been simplified and abbreviated as the CHASE research themes:

1. Computational method cluster (purple). This cluster contains the keywords agent-based model, complex networks, complex systems, computational methods, computational model, computer simulation, multi-agent systems, simulations, and social simulation. This cluster was linked to the majority of these keywords.
2. Human cluster (red). Adult, article, communication, female, human, human experiment, humans, male, personality, psychology, and social behavior. Many of these keywords relate to humans in some way.
3. Application of computer science cluster (green). The application of computer science appears in this cluster through keywords such as artificial intelligence, big data, computational linguistics, computational social science, data analytics, data science, information management, information retrieval, machine learning, natural language processing, and text mining.
4. Social science cluster (blue). This cluster was connected by the keywords behavioral research, social computing, social datum, social interactions, social sciences, and social science computing. Several of these keywords are concerned with social science.
5. Element of social media cluster (yellow). The keywords Facebook, information diffusion, online social networks, online systems, sentiment analysis, social media, social networks, and Twitter dominate this cluster. Many of these terms are associated with social media themes.

Mapping of Computational Social Science Research …

623

Fig. 4 Map of research themes
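For readers who want to reproduce the occurrence-threshold step (keep only keywords appearing in at least eight publications), a minimal sketch is shown below; the function name and the list-of-keyword-lists input format are assumptions for illustration, not part of VOSviewer:

```python
from collections import Counter

def select_keywords(papers_keywords, min_occurrences=8):
    """Keep keywords that occur in at least min_occurrences papers,
    mirroring the VOSviewer occurrence threshold used in the mapping."""
    counts = Counter(kw for kws in papers_keywords for kw in set(kws))
    return {kw for kw, c in counts.items() if c >= min_occurrences}
```

With the threshold of eight used in this study, any keyword appearing in fewer than eight papers is dropped before the co-occurrence map is drawn.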

4 Conclusion

The findings show that the number of international publications on computational social science has grown annually, together with the corresponding maps and visual trends. To categorize the body of knowledge produced over twenty-one years of academic publication, this research classifies the field's knowledge contributions into five themes: computational methods, human, application of computer science, social science, and element of social media, abbreviated as the CHASE research themes. Identifying these key themes clarifies the field's general backgrounds and subjects as well as its study gaps, and highlights where further practical studies are needed. All of this will assist new research into the disciplines' areas lacking advanced expertise and analysis. The potential of computational social science to contribute to entrepreneurs, managers, IT staff, marketers, and management information systems is a frequently studied theme.

References
1. Chang, R.M., Kauffman, R.J., Kwon, Y.: Understanding the paradigm shift to computational social science in the presence of big data. Decis. Support Syst. 63, 67–80 (2014). https://doi.org/10.1016/j.dss.2013.08.008
2. Burger, A., Oz, T., Kennedy, W.G., Crooks, A.T.: Computational social science of disasters: opportunities and challenges. Future Internet 11, 103 (2019). https://doi.org/10.3390/fi11050103
3. Edelmann, A., Wolff, T., Montagne, D., Bail, C.A.: Computational social science and sociology. Annu. Rev. Sociol. 46, 24.1–24.21 (2020). https://doi.org/10.1146/annurev-soc-121919-054621
4. Tao, X., Velasquez-Silva, J.D., Liu, J., Zhong, N.: Editorial: computational social science as the ultimate web intelligence. World Wide Web 23, 1743–1745 (2020). https://doi.org/10.1007/s11280-020-00801-2
5. Natali, F., Carley, K.M., Zhu, F., Huang, B.: The role of different tie strength in disseminating different topics on a microblog. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 203–207. ACM, New York (2017). https://doi.org/10.1145/3110025.3110130
6. Peng, T.-Q., Liang, H., Zhu, J.J.H.: Introducing computational social science for Asia-Pacific communication research. Asian J. Commun. 29, 205–216 (2019). https://doi.org/10.1080/01292986.2019.1602911
7. Cioffi-Revilla, C.: Computational social science. Wiley Interdiscip. Rev. Comput. Stat. 2, 259–271 (2010). https://doi.org/10.1002/wics.95
8. Zhang, J., Wang, W., Xia, F., Lin, Y.-R., Tong, H.: Data-driven computational social science: a survey. Big Data Res. 21, 100145 (2020). https://doi.org/10.1016/j.bdr.2020.100145
9. Xia, F., Liu, L., Jedari, B., Das, S.K.: PIS: a multi-dimensional routing protocol for socially-aware networking. IEEE Trans. Mob. Comput. 15, 2825–2836 (2016). https://doi.org/10.1109/TMC.2016.2517649
10. Maulana, F.I., Zamahsari, G.K., Purnomo, A.: Web design for distance learning Indonesian language BIPA. In: 2020 International Conference on Information Management and Technology (ICIMTech), pp. 988–991. IEEE, Jakarta (2020). https://doi.org/10.1109/ICIMTech50083.2020.9211175
11. Lazer, D.M.J., Pentland, A., Watts, D.J., Aral, S., Athey, S., Contractor, N., Freelon, D., Gonzalez-Bailon, S., King, G., Margetts, H., Nelson, A., Salganik, M.J., Strohmaier, M., Vespignani, A., Wagner, C.: Computational social science: obstacles and opportunities. Science 369(6507), 1060–1062 (2020). https://doi.org/10.1126/science.aaz8170
12. Cappella, J.N.: Vectors into the future of mass and interpersonal communication research: big data, social media, and computational social science. Hum. Commun. Res. 43, 545–558 (2017). https://doi.org/10.1111/hcre.12114
13. Garcia-Mancilla, J., Ramirez-Marquez, J.E., Lipizzi, C., Vesonder, G.T., Gonzalez, V.M.: Characterizing negative sentiments in at-risk populations via crowd computing: a computational social science approach. Int. J. Data Sci. Anal. 7, 165–177 (2019). https://doi.org/10.1007/s41060-018-0135-9
14. Gaffney, D., Matias, J.N.: Caveat emptor, computational social science: large-scale missing data in a widely-published Reddit corpus. PLoS ONE 13, e0200162 (2018). https://doi.org/10.1371/journal.pone.0200162
15. IGI Global: What is Bibliometric? https://www.igi-global.com/dictionary/bibliometric/49021
16. Purnomo, A., Asitah, N.: Publication dataset of computational social science (1999–2020). https://doi.org/10.7910/DVN/8R9JRO


17. Purnomo, A., Sari, Y.K.P., Firdaus, M., Anam, F., Royidah, E.: Digital literacy research: a scientometric mapping over the past 22 years. In: International Conference on Information Management and Technology (ICIMTech), pp. 108–113. IEEE (2020). https://doi.org/10.1109/ICIMTech50083.2020.9211267
18. Purnomo, A., Rosyidah, E., Firdaus, M., Asitah, N., Septianto, A.: Data science publication: thirty-six years lesson of scientometric review. In: International Conference on Information Management and Technology (ICIMTech), pp. 893–898. IEEE (2020). https://doi.org/10.1109/ICIMTech50083.2020.9211192
19. Ranjbar-Sahraei, B., Negenborn, R.R.: Research positioning and trend identification. TU Delft, The Netherlands (2017)
20. van Eck, N.J., Waltman, L.: Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 84, 523–538 (2010). https://doi.org/10.1007/s11192-009-0146-3

Diagnosis of Pneumonia with Chest X-Ray Using Deep Neural Networks E. Venkateswara Reddy, G. S. Naveen Kumar, G. Siva Naga Dhipti, and Baggam Swathi

Abstract Pneumonia has caused significant deaths worldwide and remains a threat to human health. Pneumonia is inflammation of the tiny air sacs caused by a serious lung infection, which creates congestion by filling the lungs with fluid and pus. People may suffer shortness of breath, a cough, fever, chest pain, chills, or fatigue, which may be a complication of a viral infection such as COVID-19 or the flu. The World Health Organization (WHO) reports that 33% of deaths in India are because of pneumonia. Diagnosis of pneumonia requires chest X-rays and CT scans to be evaluated by expert radiologists. However, the image quality of chest X-rays has some defects, such as low contrast, overlapping organs, and blurred boundaries, which has increased the need for an automatic system for detecting pneumonia efficiently. The proposed method uses a deep neural network to classify a large dataset. As deep learning is proven in analyzing medical images, convolution neural networks (CNNs) have attracted attention for disease classification. Image classification tasks are supported by features learned by CNN models pretrained on extensive datasets. In the proposed work, we evaluate the usefulness of pretrained CNN models used as feature extractors followed by various classifiers for the grouping of abnormal and normal chest X-rays. Data processing and augmentation are performed as and when required. In the proposed method, a CNN is created and trained using TensorFlow 2.0. An accuracy of 98.4% is accomplished by the suggested weighted classifier model. The anticipated model assists radiologists in the diagnosis of pneumonia.

1 Introduction

The emphasis of machine learning is to mimic the learning process of human beings. Deep learning is one such technology, which has made breakthroughs in computer vision and image processing. Machine learning algorithms have good potential in all sectors of medicine for diagnosis and thereby for making critical clinical decisions.

E. Venkateswara Reddy (B) · G. S. Naveen Kumar · G. Siva Naga Dhipti · B. Swathi
Department of CSE, Malla Reddy University, Hyderabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_56

627

628

E. Venkateswara Reddy et al.

It is promising to build tools based on deep learning frameworks to spot pneumonia from chest X-ray images. Even with the technical advancements available today, finding advanced treatment remains a challenge, because the shape, size, and position of pneumonia are imprecise and can vary a great deal. WHO reported that 18% of all deaths of children under five are attributable to pneumonia. Pneumonia can be treated with low-cost, low-tech medication and care, and prevented with simple interventions [1]. This shows an immediate need for research on technology-based diagnosis that can reduce pneumonia-related mortality, especially in children. The tests used to diagnose pneumonia are the CT scan, chest ultrasound, chest MRI, and chest X-ray [2]. As CT, ultrasound, and MRI scans consume more time than the X-ray procedure, the X-ray is preferred over the others for automation [3]. With the advancement of deep learning-based solutions in image processing, this capability can be made accessible to large populations at minimal cost [4, 5]. Deep learning techniques are advanced enough to handle even cases that radiologists find challenging to diagnose, where the features that describe the disease overlap with those of some other illness [6, 7]. Deep learning techniques resolve these issues with accuracy equal to or even greater than that of an average radiologist.

2 Methodology

2.1 Data Collection

The dataset that we use has about five thousand eight hundred images. The chest X-ray dataset, widely available on the Kaggle [8] platform, consists of 112,120 frontal chest X-ray images from 30,085 patients, which is sufficient for using deep learning with convolution neural networks trained in TensorFlow to diagnose pneumonia as normal, bacterial, or viral.

2.2 Data Preprocessing

The data required for diagnosis are collected from Kaggle, and the proposed model for pneumonia infection identification is described further in detail. Starting from the collection of data in the form of images, the entire method is divided into five stages, as shown in Fig. 1. With minimum time consumption, the input images are retrieved from the large database using a content-based visual retrieval technique.

Diagnosis of Pneumonia with Chest X-Ray …

629

Fig. 1 Flow of proposed pneumonia detection model

Data augmentation techniques are then used to enlarge the dataset. Image duplication problems are avoided with the support of content-based image retrieval techniques [9]. Data augmentation is the technique of enlarging the dataset by adding slightly modified copies of already existing data through rotation, translation, and scaling. This helps the model generalize better over images, as it sees each image in several different orientations. The newly derived artificial observations are called augmented observations, and these observations are used for training the classifiers. The data augmentation technique increases the count of images in each class to 6,000, so that there are 12,000 images to train the proposed neural network [6, 10]. The augmentation is implemented in the Python programming language with TensorFlow and an augmentation library, and the dataset is split into training, validation, and testing sets. Augmentation is a tricky part of medical imaging [11]: most augmentations cannot be used, as they might change the image content itself. That is the reason we use only augmentations that do not alter the diagnostic content, such as horizontal flips and random image cropping.
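The label-preserving augmentations described above (horizontal flips and random crops) can be sketched without any imaging library; images are represented here as plain 2D lists of intensities, and the helper names are illustrative rather than the paper's actual code:

```python
import random

def horizontal_flip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]

def random_crop(image, crop_h, crop_w, rng=random):
    """Cut a random crop_h x crop_w window out of the image."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def augment(image, n_copies, seed=0):
    """Produce n_copies augmented observations from one image."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        aug = horizontal_flip(image) if rng.random() < 0.5 else image
        aug = random_crop(aug, len(image) - 1, len(image[0]) - 1, rng)
        copies.append(aug)
    return copies
```

In a real pipeline the same idea is expressed with library primitives (e.g., TensorFlow's random-flip and random-crop preprocessing layers) applied on the fly during training.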

3 Proposed System

In this paper, we have proposed a model which uses chest X-ray images to diagnose pneumonia with deep learning techniques. In deep learning, we have k-nearest neighbors, logistic regression classifiers, and convolution neural networks to analyze the images. The convolution neural network is one of the techniques for image classification and image recognition in neural networks, in which data are processed by multiple layers of arrays. Image recognition is one area that predominantly uses convolution neural networks [12]. Convolution neural networks take two-dimensional arrays as input. From this 2D input, a CNN can extract spatial features by means of its kernels, which is not possible with other networks [13]. Detection of edges and distribution of colors boost these networks in image classification. Figure 2 represents pneumonia detection using convolution neural networks.

Fig. 2 Pneumonia detection using convolution neural network (CNN)

With a plain convolution network, validation errors are higher, as adding more layers leads to more complex learned functions, which can cause overfitting. Deeper networks can also have trouble back-propagating the gradient; this issue is commonly called the vanishing gradient problem [14, 15]. The overfitting problem can be handled using regularization and dropout, where a few random neurons are deactivated to help the other neurons learn [16]. The vanishing gradient problem can be handled using ReLU activation, which maximizes the gradient flow. Figure 3 shows the deeper ResNet: on the left is VGG-19, a 19-layer plain network, and on the right is a 152-layer ResNet; they look similar to each other except for the skip connections. The validation error for a 34-layer ResNet is significantly lower than for plain convolution, and the difference keeps increasing with the number of layers. ResNet won the ImageNet competition and proved itself one of the extraordinary innovations in the deep learning space [17].
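The skip connection that distinguishes ResNet from a plain network can be illustrated with a tiny numeric sketch: the block computes y = ReLU(F(x) + x), so the signal (and the gradient) always has a direct path through the identity term. The layer sizes and weight values below are hypothetical:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def dense(v, weights, biases):
    """A fully connected layer; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, biases)]

def residual_block(x, w1, b1, w2, b2):
    """y = ReLU(F(x) + x), where F is a two-layer transform."""
    fx = dense(relu(dense(x, w1, b1)), w2, b2)
    return relu([f + xi for f, xi in zip(fx, x)])
```

If the residual transform F learns nothing (all-zero weights), the block degenerates to the identity on non-negative inputs, which is why stacking many such blocks does not make a deep network harder to train.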

4 Results

We used a large number of chest X-rays to build a training model, as shown in Fig. 4, and evaluated the model to measure accuracy, precision, and recall, as shown in Fig. 5, while also taking into account the input data, as shown in Fig. 6. After processing the dataset using the CNN-based deep learning algorithm, we diagnose the patient with pneumonia with high accuracy; a sample is shown in Fig. 7. Figure 8 shows a sample validation image. The performance of the network on the testing dataset is evaluated after the training phase and is compared using the performance metrics accuracy, precision, and recall.


Fig. 3 Deeper ResNet, comparison of plain convolution and ResNet convolution

Fig. 4 Training the model



Fig. 5 Evaluating the loaded model

Fig. 6 Several X-ray images in the dataset used as input to the model


Fig. 7 Pneumonia detected X-ray’s by the model

Accuracy = (true positive + true negative) / (true positive + true negative + false positive + false negative)

Precision = true positive / (true positive + false positive)

Recall = true positive / (true positive + false negative)

The accuracy, precision, and recall for pneumonia images are 98%, 97%, and 100%, respectively, as shown in Fig. 9.
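The three metrics above follow directly from the confusion-matrix counts; a small self-contained implementation is sketched below (the function names are ours, not from the paper's code):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives/negatives for a binary classifier."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)
```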


Fig. 8 Sample validation image

Fig. 9 Graphs for precision, recall, accuracy, and loss

5 Conclusion

The availability of proficient radiologists is the foremost requirement for appropriately diagnosing any kind of thoracic disease. The intention of this paper is to deliver a pretrained CNN model and classifier as a basis for superior algorithms for detecting pneumonia in the foreseeable future, and to extend medical expertise to areas where the availability of radiologists is still inadequate. Deploying deep learning with such algorithms in the medical domain can be highly valuable for providing better healthcare services. In this process, we make use of 112,120 frontal chest X-ray images from 30,085 patients, which is good enough for applying deep learning with trained convolution neural networks. In this study, we compared a plain VGG-based CNN with a ResNet CNN, which look similar to each other except for the skip connections. The validation error for the 34-layer ResNet is significantly lower than for plain convolution, and the difference keeps increasing with the number of layers. ResNet won the ImageNet competition and proved itself one of the extraordinary innovations in the deep learning space.

References
1. Johnson, S., Wells, D.: Healthline Viral Pneumonia: Symptoms, Risk Factors, and More. Available online: https://www.healthline.com/health/viral-pneumonia. Accessed 31 Dec 2019
2. Pneumonia. Available online: https://www.radiologyinfo.org/en/info.cfm?pg=pneumonia. Accessed 31 Dec 2019
3. World Health Organization: Standardization of interpretation of chest radiographs for the diagnosis of pneumonia in children. Technical Report, World Health Organization, Geneva, Switzerland (2001)
4. Kallianos, K., Mongan, J., Antani, S., Henry, T., Taylor, A., Abuya, J., Kohli, M.: How far have we come? Artificial intelligence for chest radiograph interpretation. Clin. Radiol. 74, 338–345 (2019). https://doi.org/10.1016/j.crad.2018.12.015
5. Liu, N., Wan, L., Zhang, Y., Zhou, T., Huo, H., Fang, T.: Exploiting convolutional neural networks with deeply local description for remote sensing image classification. IEEE Access 6, 11215–11228 (2018). https://doi.org/10.1109/ACCESS.2018.2798799
6. Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz, C., Shpanskaya, K., et al.: CheXNet: radiologist-level pneumonia detection on chest X-rays with deep learning (2017). arXiv:1711.05225
7. Lakhani, P., Sundaram, B.: Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 284(2), 574–582 (2017)
8. https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
9. Naveen Kumar, G.S., Reddy, V.S.K.: Video shot boundary detection and key frame extraction for video retrieval. In: Proceedings of the Second International Conference on Computational Intelligence and Informatics, pp. 557–567. Springer, Singapore (2018)
10. Convolution neural network based on two-dimensional spectrum for hyperspectral image classification. Research Article, Open Access, Article ID 8602103 (2018). https://doi.org/10.1155/2018/8602103
11. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, pp. 234–241. Springer International Publishing, Cham, Switzerland (2015)
12. Analysis of MRI based brain tumor detection using RFCM clustering and SVM classifier. In: International Conference on Soft Computing and Signal Processing (ICSCSP-2018), Springer Series (2018)
13. Rajasenbagam, T., Jeyanthi, S., Pandian, J.A.: Detection of pneumonia infection in lungs from chest X-ray images using deep convolutional neural network and content-based image retrieval techniques. J. Ambient Intell. Human. Comput. (2021)
14. Baltruschat, I.M., Nickisch, H., Grass, M., Knopp, T., Saalbach, A.: Comparison of deep learning approaches for multi-label chest X-ray classification. Sci. Rep. 9, 1–10 (2019)
15. Abiyev, R.H., Ma'aitah, M.K.: Deep convolutional neural networks for chest diseases detection. J. Healthcare Eng. 12 (2018)


16. Venkateswara Reddy, E., Ramesh, M., Jane, M.: A comparative study of clustering techniques for big data sets using Apache Mahout. In: 3rd IEEE International Conference on Smart City and Big Data 2016, Sultanate of Oman (2016) 17. Kumar, G.N., Reddy, V.S.K.: Key frame extraction using rough set theory for video retrieval. In: Soft Computing and Signal Processing, pp. 751–757. Springer, Singapore (2019)

High Performance Algorithm for Content-Based Video Retrieval Using Multiple Features G. S. Naveen Kumar and V. S. K. Reddy

Abstract With the growth of technology, there has been a revolution in multimedia. The current text-based video retrieval method necessitates a large amount of manual intervention and provides very poor retrieval performance when the database size is large. Given the limitations of current algorithms, retrieving videos based on their own content outperforms traditional methods. Many algorithms exist to retrieve a video from a large database, but none of them reduces the time consumption enough to fulfill user requirements. The proposed system integrates spatiotemporal features by exploring the whole information of the video, which improves the efficiency of video retrieval. The proposed algorithm is an advanced spatiotemporal scale invariant feature transform (ST-SIFT). The system considers the color and motion features of the video to obtain the spatiotemporal feature. In this algorithm, we implement the HSV-CH technique for extraction of the color feature and a motion histogram technique to extract the video's motion feature. The CBVR algorithm's performance was examined using the TRECVID and YouTube datasets, and the retrieved videos demonstrate the proposed algorithm's high performance.

1 Introduction

Since recording devices and Internet bandwidth have become available at affordable costs, the video database has grown dramatically and significantly as a result of the widespread use of video surveillance, communication, entertainment, online education, and medical videos [1]. Conventional video retrieval search is based on the name of the video, its description, annotation, and tagged words. Consequently, the retrieved videos are not satisfactory to users, as many irrelevant videos are retrieved and ample human effort is involved. An ideal database management system has storage, indexing, and retrieval [2] that function effectively. Therefore, content-based video retrieval is essentially required

G. S. Naveen Kumar (B) · V. S. K. Reddy
Department of ECE, Malla Reddy College of Engineering & Technology, Hyderabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_57

637

638

G. S. Naveen Kumar and V. S. K. Reddy

to meet the needs of users in the present scenario. Features such as color, texture, shape, and motion in the video constitute the content [3]. The primary goal of the CBVR system is to extract video information whose content is very similar to the user's query [4]. It is very difficult to compare two videos directly, because their huge image content requires vast computation, resulting in a lot of time consumption. Instead of direct video comparison, it is better to compare the key features of both videos. Consequently, the developers of CBVR systems need to focus on two significant processes, i.e., feature extraction and similarity matching. Shot boundary detection and selective frame extraction are prerequisites for content-based video retrieval. A shot denotes a continuous capture of video. A shot boundary is detected if the variance between any two successive frames is larger than a threshold value. The repeated frames in a video, which is a sequence of frames, may comprise data redundancy [5]. Exclusive frames representing the entire video information are extracted from the video and are referred to as key frames. The efficiency of a video retrieval algorithm is decided by parameters such as feature extraction, feature selection, and the classifier [6]. Redundant and irrelevant features in the dataset are eliminated in the feature selection process to improve the classifier's performance [7]. Shot boundary detection supports different activities such as video retrieval, classification, indexing, and summarization; hence, it attracts the attention of many researchers. A few algorithms, such as pixel-based, block-based, and histogram-based methods, have been proposed to detect the shot boundaries of video images. In pixel-based methods, the sequence of frames is compared at the corresponding pixel positions [8]. This method is very simple and needs very few computations. Hence, it takes less computational time.

Due to the enormous growth of video on the Internet, the recent storage formats of video tend to be Moving Picture Experts Group (MPEG), AVI, and H.264, with standard compression schemes. Despite this widespread usage of compressed videos, most of the CBVR techniques reported in the literature operate in the pixel domain. These techniques first decompress the video files and then perform operations in the pixel domain, which obviously increases computational complexity in terms of time and space. Very few techniques have been reported that build a CBVR system in the frequency domain. Hence, there is a need to design a CBVR system that can operate in the frequency domain with reduced computational effort while simultaneously increasing the retrieval accuracy rate. Video retrieval is an essential task in many real-world applications, and research on it is still young owing to the exponential rise in video data worldwide. The performance of an object-based framework in video retrieval depends mainly on the success of its subtasks, such as video structure parsing, object segmentation, and representation.

Section 1 gives a complete introduction to this research work. Section 2 discusses the proposed methodology for shot boundary detection based on the scale invariant feature transform (SIFT) and distinguishes it from various algorithms proposed earlier for video retrieval. Section 3 describes the key frame extraction. Section 4 explains content-based video retrieval based on a hybrid color and motion histogram method. Performance evaluation is shown in Sect. 5. The last section presents the paper's conclusion and future scope.
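The pixel-based boundary test described above (declare a boundary when consecutive frames differ by more than a threshold) reduces to a few lines; frames are modeled as flattened lists of intensities, and the threshold value is an assumption for illustration:

```python
def frame_diff(f1, f2):
    """Sum of absolute pixel differences between two frames."""
    return sum(abs(a - b) for a, b in zip(f1, f2))

def detect_shot_boundaries(frames, threshold):
    """Return the indices at which a new shot starts."""
    return [i for i in range(1, len(frames))
            if frame_diff(frames[i - 1], frames[i]) > threshold]
```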

High Performance Algorithm for Content-Based Video …

639

2 Shot Boundary Detection

A shot is defined as a sequence of sequential pictures of non-stop video recorded with a single camera. A shot boundary separates two successive shots based on the similarity and dissimilarity of the video frames. A shot boundary can be classified as a cut boundary or a gradual boundary [9]. A fast transition from one shot to the next is identified as a cut boundary: the cut lies between the end frame of the first shot and the start frame of the next shot. In a gradual boundary, there is a continuous transition between frames [10]. To detect the shots, feature extraction must be performed to check the similarity and difference between two frames of a video. For the construction of a feature vector for a video, we adopted the SIFT technique.

SIFT keypoints of objects are first extracted from a set of reference images and stored in a database. An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on the Euclidean distance of their feature vectors. From the full set of matches, subsets of keypoints that agree on the object and its location, scale, and orientation in the new image are identified to filter out good matches. Each cluster of at least three features that agree on an object and its pose is then subjected to further detailed model verification, and outliers are discarded. Finally, the probability that a particular set of features indicates the presence of an object is computed, given the accuracy of fit and the number of probable false matches. Each SIFT keypoint specifies a 2D location, scale, and orientation, and each matched keypoint in the database has a record of its parameters relative to the training image in which it was found.

The similarity transformation implied by these four parameters is only an approximation to the full six-degree-of-freedom pose space of a 3D object, and it does not account for any non-rigid deformations. Lowe therefore used broad bin sizes of 30 degrees for orientation, a factor of two for scale, and 0.25 times the maximum projected training image dimension (using the predicted scale) for location. The SIFT key samples generated at the larger scale are given twice the weight of those at the smaller scale, which means the larger scale is in effect able to filter the most likely neighbors for checking at the smaller scale; this also improves recognition performance by giving more weight to the least-noisy scale. To avoid the problem of boundary effects in bin assignment, each keypoint match votes for the two closest bins in each dimension, giving a total of 16 entries for each hypothesis and further broadening the pose range.

Keypoint localization: the local image gradients are measured at the selected scale in the region around each keypoint. These measurements can be extended so that explicit edge detection is no longer required, for instance by searching for high levels of curvature in the image gradient. Such detectors may also respond to parts of the image that are not corners in the traditional sense (for instance, a small bright spot on a dark background may be detected) [11]. These points are often referred to as interest points, although the term "corner" is used by convention.

The extraction of SIFT features is shown in Fig. 1. The functionality of the proposed algorithm is explained with a block diagram of the different modules in Fig. 2. Each module performs a different function and submits its output as input to the next module.

Algorithm-1: Spatiotemporal SIFT algorithm
Input: Video from the database
Output: Detection of shot boundaries
Step 1: Choose a video dataset
Step 2: Take an input video from the database
Step 3: Extract the sequence of frames from the input video
Step 4: Extract the features of each frame using the ST-SIFT algorithm
Step 5: Identify the keypoints in each frame
Step 6: Match the salient points of the first frame with the consecutive frames up to the nth frame

Fig. 1 Major phases of the SIFT feature extraction

High Performance Algorithm for Content-Based Video …


Fig. 2 Proposed system for video shot detection

Step 7: Measure the similarity and compare it with a threshold value
Step 8: Detect shots based on the dissimilarity
Step 9: Repeat Steps 2–8 for all videos in the database.
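As a rough illustration of Steps 4–8, the sketch below detects shot boundaries by comparing consecutive frame features against a dissimilarity threshold. A simple intensity histogram stands in for the ST-SIFT keypoint features used in the paper, and all names, frame data, and the threshold value are illustrative.

```python
# Sketch of Steps 4-8: detect shot boundaries by comparing consecutive
# frame features against a dissimilarity threshold. A stand-in histogram
# feature replaces the paper's ST-SIFT keypoints; frames here are flat
# lists of pixel intensities (0-255).

def frame_histogram(frame, bins=8):
    """Quantize intensities into a normalized histogram."""
    hist = [0.0] * bins
    for p in frame:
        hist[p * bins // 256] += 1
    total = float(len(frame))
    return [h / total for h in hist]

def detect_shot_boundaries(frames, threshold=0.5):
    """Return indices i where a shot boundary falls between frame i-1 and i."""
    boundaries = []
    prev = frame_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = frame_histogram(frames[i])
        # L1 distance between histograms as the dissimilarity measure
        dist = sum(abs(a - b) for a, b in zip(prev, cur))
        if dist > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries

dark = [10] * 64      # frames of a dark shot
bright = [240] * 64   # frames of a bright shot
video = [dark, dark, dark, bright, bright]
print(detect_shot_boundaries(video))  # [3]: boundary at frame 3
```

In a full implementation, the histogram distance would be replaced by the fraction of matched ST-SIFT keypoints between the two frames.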

3 Key Frame Extraction

The proposed method characterizes frames by a global feature based on entropy values and selects one frame from each class as the representative key frame. It consists of two major steps: the first is the extraction of I-frames from the compressed MPEG video stream, and the second is the extraction of key frames based on the image information entropy value [12]. The information entropy of each frame in the shot is computed in order to select representative frames:

    Entropy = -Σ_{i=1}^{L} p(x_i) log p(x_i)    (1)

where p(x_i) is the probability of intensity value x_i.

When the remaining frames are considered, some key frames may be similar to one another, resulting in redundant key frames; the gray-level distribution in two otherwise equivalent frames may also differ, which again yields redundant key frames. This is shown in Fig. 3.


Fig. 3 Block diagram of key frame extraction

Algorithm-2: Image information entropy algorithm for selective frame extraction
Step 1: Take a shot/video as input
Step 2: Extract the I-frames for the entire shot
Step 3: Evaluate the image information entropy values for all I-frames
Step 4: Compare the entropy values with the threshold value
Step 5: Extract the key frames based on dissimilarity
Step 6: Repeat Steps 1–5 for all shots.
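A minimal sketch of Algorithm-2, assuming the I-frames have already been decoded from the MPEG stream and are given as flat lists of gray levels; the entropy of Eq. (1) is computed per frame, and a frame is kept whenever it differs sufficiently from the last key frame. The threshold and helper names are illustrative.

```python
# Sketch of Algorithm-2: compute information entropy for each candidate
# frame (Eq. 1) and keep a frame whenever its entropy differs from the
# last kept key frame by more than a threshold. I-frame decoding from
# the MPEG stream is assumed to be done already.
import math
from collections import Counter

def image_entropy(frame):
    """Entropy = -sum p(x_i) * log2 p(x_i) over gray-level probabilities."""
    counts = Counter(frame)
    n = float(len(frame))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def extract_key_frames(frames, threshold=0.5):
    key_frames = [0]                      # first frame is always kept
    last_e = image_entropy(frames[0])
    for i in range(1, len(frames)):
        e = image_entropy(frames[i])
        if abs(e - last_e) > threshold:   # dissimilar enough -> key frame
            key_frames.append(i)
            last_e = e
    return key_frames

flat = [128] * 100          # uniform frame, entropy 0
busy = list(range(100))     # varied frame, high entropy
print(extract_key_frames([flat, flat, busy, busy]))  # -> [0, 2]
```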

4 Proposed Spatio-temporal Feature Extraction

Color Feature Extraction: Color is the most significant feature for retrieving the spatial information of a video. Histogram analysis is used for the statistical representation of frame intensity values. It usually relies on the RGB color space, but two dissimilar frames may occasionally have similar RGB values, so this work specifically uses HSV values to distinguish frames [13].

(a) RGB to HSV Conversion: First, the RGB values are normalized by dividing by 255. Let Vmax denote the maximum of the red, green, and blue values and Vmin the minimum.


Hue Calculation

    H = 0,                                         if Vmax = Vmin
    H = 60° × (((g − b) / (Vmax − Vmin)) mod 6),   if Vmax = r
    H = 60° × ((b − r) / (Vmax − Vmin) + 2),       if Vmax = g
    H = 60° × ((r − g) / (Vmax − Vmin) + 4),       if Vmax = b

Saturation Calculation

    S = 0,                  if Vmax = 0
    S = 1 − Vmin / Vmax,    otherwise

Value Calculation

    V = Vmax

In the proposed system, the H values are quantized into 16 bins and the S and V values into 4 bins each, giving 16 × 4 × 4 = 256 bins (distinct colors). The bins with a non-zero pixel count are considered the color objects [14].

Effective Motion Feature Extraction Algorithm

The motion feature is used to find the correlation between consecutive frames within a video shot; its main purpose is to capture the temporal variations of a video. In the proposed algorithm, the motion histogram is quantized into 121 × 121 bins, in which 60 bins represent the positive direction, 60 bins the negative direction, and 1 bin represents no change. Traditional approaches use feature matching to detect motion vectors between two consecutive frames, but this results in complex computations [15]. As a result, this study instead uses the motion vectors already present in the MPEG video format as a substitute.
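The piecewise conversion above can be written directly in code. This sketch implements the formulas as stated (with the hue of a gray pixel set to 0) and cross-checks the hue against the standard library's colorsys module; the input color is illustrative.

```python
# Sketch of the RGB -> HSV conversion above: r, g, b are first normalized
# by 255; Vmax and Vmin are the largest and smallest channel values.
import colorsys

def rgb_to_hsv(r255, g255, b255):
    r, g, b = r255 / 255.0, g255 / 255.0, b255 / 255.0
    vmax, vmin = max(r, g, b), min(r, g, b)
    d = vmax - vmin
    if d == 0:
        h = 0.0                                # gray: hue undefined, use 0
    elif vmax == r:
        h = 60.0 * (((g - b) / d) % 6)
    elif vmax == g:
        h = 60.0 * ((b - r) / d + 2)
    else:                                      # vmax == b
        h = 60.0 * ((r - g) / d + 4)
    s = 0.0 if vmax == 0 else 1 - vmin / vmax  # saturation
    return h, s, vmax                          # value V = Vmax

h, s, v = rgb_to_hsv(255, 128, 0)              # an orange pixel
# Cross-check: colorsys returns hue as a fraction of a full turn.
assert abs(h - colorsys.rgb_to_hsv(1.0, 128 / 255, 0.0)[0] * 360) < 1e-9
print(round(h, 1), round(s, 2), round(v, 2))   # 30.1 1.0 1.0
```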

4.1 CBVR System Design

At present, CBVR systems operate on very low-level features, while a video's description varies from one individual's perspective to another. To search video content faster, the CBVR system therefore needs high-level features. To construct such a system, we must first know the type of video database we are dealing with [16], of which there are plenty at present. The user then decides the type of query used to search the video database, such as text based, object based, or image based.


Fig. 4 Block diagram for the proposed CBVR system

The main goal of a CBVR system is to retrieve, for a user's query, the database videos with similar content. A direct comparison of two videos requires many computations and a lot of time, so it is better to compare videos by their features, which makes the task easy. Feature extraction and similarity matching are therefore the two major tasks to be performed for efficient database retrieval in CBVR systems. The block diagram of the proposed algorithm is shown in Fig. 4.

5 Performance Evaluation

The performance of the proposed spatiotemporal feature extraction algorithm, the hybrid color and motion histogram (HCMH) algorithm, is evaluated on a database of three hours of news, commercials, documentaries, sports, movies, and cartoon video sequences. The video clips were captured at 30 frames per second with a resolution of 320 × 240 pixels.

    Recall = (Number of true videos retrieved) / (Total number of true videos)

    Precision = (Number of true videos retrieved) / (Total number of videos retrieved)

As shown in Figs. 5 and 6, the proposed algorithm produces better results in terms of precision and recall than the existing algorithm.
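A worked example of the two evaluation formulas, assuming the set of ground-truth ("true") videos for a query is known; the video IDs are illustrative.

```python
# Worked example of the recall and precision formulas: recall divides the
# correctly retrieved videos by all relevant ones, precision divides them
# by everything the system returned.
def recall_precision(retrieved, relevant):
    true_retrieved = len(set(retrieved) & set(relevant))
    recall = true_retrieved / len(relevant)       # true retrieved / all true
    precision = true_retrieved / len(retrieved)   # true retrieved / all retrieved
    return recall, precision

retrieved = ["v1", "v2", "v3", "v4"]   # videos the system returned
relevant = ["v1", "v2", "v5"]          # videos actually matching the query
r, p = recall_precision(retrieved, relevant)
print(r, p)  # 0.6666666666666666 0.5
```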


Fig. 5 Performance comparison of histogram clustering and HCMH in terms of precision

Fig. 6 Performance comparison of histogram clustering and HCMH in terms of recall

6 Conclusion

The major steps in the proposed CBVR system design are shot boundary detection and key frame extraction, performed using the SIFT and image information entropy techniques, respectively. These operations improve the efficiency and accuracy of the system and thereby strengthen the output of the proposed framework. With the presented algorithms, shot transition detection and key frame extraction are carried out efficiently during video retrieval.


References

1. Yu, L., Huang, Z., Cao, J., Shen, H.T.: Scalable video event retrieval by visual state binary embedding. IEEE Trans. Multimed. 18(8), 1590–1603 (2016)
2. Pont-Tuset, J., Farre, M.A., Smolic, A.: Semi-automatic video object segmentation by advanced manipulation of segmentation hierarchies. In: International Workshop on Content-Based Multimedia Indexing (2015)
3. Naveen Kumar, G.S., Reddy, V.S.K.: Detection of shot boundaries and extraction of key frames for video retrieval. Int. J. Knowl. Intell. Eng. Syst. 24(1), 11–17 (2020)
4. Gray, C., James, S., Collomosse, J.: A particle filtering approach to salient video object localization. In: IEEE International Conference on Image Processing, pp. 194–198 (2014)
5. Naveed, E., Mehmood, I., Baik, S.W.: Efficient visual attention-based framework for extracting key frames from videos. Sign. Process. Image Commun. 28(1), 34–44 (2013)
6. Guogang, H., Chen, C.W.: Distributed video coding with zero motion skip and efficient DCT coefficient encoding. In: 2008 IEEE International Conference on Multimedia and Expo, IEEE, pp. 777–780 (2008)
7. Wang, T., Wu, J., Chen, L.: An approach to video key-frame extraction based on rough set. In: 2007 MUE'07 International Conference on Multimedia and Ubiquitous Engineering, IEEE, pp. 590–596 (2007)
8. Gianluigi, C., Raimondo, S.: An innovative algorithm for key frame extraction in video summarization. J. Real-Time Image Proc. 1(1), 69–88 (2006)
9. Liu, G., Zhao, J.: Key frame extraction from MPEG video stream. In: 2010 Third International Symposium on Information Processing (ISIP), IEEE, pp. 423–427 (2010)
10. Xu, J., Yuting, S., Liu, Q.: Detection of double MPEG-2 compression based on distributions of DCT coefficients. Int. J. Pattern Recognit. Artif. Intell. 27(01), 1354001 (2013)
11. Uehara, T., Safavi-Naini, R., Ogunbona, P.: Recovering DC coefficients in block-based DCT. IEEE Trans. Image Process. 15(11), 3592–3596 (2006)
12. Shirahama, K., Matsuoka, Y., Uehara, K.: Event retrieval in video archives using rough set theory and partially supervised learning. Multimed. Tools Appl. 57(1), 145–173 (2012)
13. Yang, H.Y., Li, Y.W., Li, W.Y., Wang, X.Y., Yang, F.Y.: Content-based image retrieval using local visual attention feature. J. Vis. Commun. Image Represent. 25(6), 1308–1323 (2014)
14. Borth, D., Ulges, A., Schulze, C., Breuel, T.M.: Keyframe extraction for video tagging & summarization. Informatiktage 2008, 45–48 (2008)
15. Luo, Y., Junsong, Y.: Salient object detection in videos by optimal spatio-temporal path discovery. In: Proceedings of ACM International Conference on Multimedia, pp. 509–512 (2013)
16. Naveen Kumar, G.S., Reddy, V.S.K.: High-performance video retrieval based on spatiotemporal features. In: Microelectronics, Electromagnetics and Telecommunications, pp. 433–441. Springer, Singapore (2018)

Smart Recruitment System Using Deep Learning with Natural Language Processing Ranganath Ponnaboyina, Ramesh Makala, and E. Venkateswara Reddy

Abstract Recruitment is a worldwide industry, with millions of resumes uploaded daily to countless hiring websites as online hiring processes continue to evolve, yet finding suitable talent remains a strenuous task. Every job seeker organizes the same data blocks in a unique style and format, so resumes are an extreme example of unstructured data. Choosing a good resume from such a collection becomes simple with a resume parser, which extracts structured data from unstructured resumes: a set of instructions investigates and abstracts the resume/CV data into machine-understandable output, allowing the data to be stored and analysed automatically. This paper proposes a smart recruitment system (SRS) that combines resume classification using deep learning with natural language processing (NLP) techniques, automatic questionnaires that measure an applicant's technical proficiency, and syntactic and semantic similarity measurements to identify candidates suited to the company's required skill set. Named entity recognition is used to abstract the data significant to the hiring process. The proposed procedure picks up candidate data such as work experience and education regardless of whether the resume is a PDF or Word document. Parsing and ranking resumes makes the hiring process easy and efficient.

R. Ponnaboyina (B) Department of CSE, Acharya Nagarjuna University, Guntur, A.P., India
R. Makala Department of IT, RVR&JC College of Engineering, Guntur, A.P., India
E. Venkateswara Reddy Department of CSE, Malla Reddy University, Hyderabad, Telangana, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_58


1 Introduction

Technology today has made uploading a resume a single click for the job seeker. Every resume differs from the others in style, font, and presentation, which makes extracting information from a pile of resumes painful and difficult with human intervention alone. Hence the need for a smarter [1], enhanced system that extracts the required information and automatically screens out aspirants whose abilities are not relevant to the employer. Sorting and ranking resumes from unstructured data is a troublesome problem, and an intelligent algorithm is required to parse the information [2]; this is where deep learning comes into play. Resume parsing extracts data from a given resume, which can come in many different formats, fonts, and layouts [3]. As soon as the resume data is parsed, it is put in storage for further processing. The system ranks the resumes based on keyword matching and shows the most relevant ones to the employer. For this, we use named entity recognition (NER) with spaCy, which takes a string of text as input and identifies relevant nouns (people, places, and organizations) and other specific words, producing output such as a candidate's experience, level, and academic record.
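The keyword-matching ranking step described above can be sketched as follows. The paper first extracts entities with spaCy NER; this simplified stand-in instead scores already-extracted resume text directly against an employer's required skills using only the standard library, and all names and skill lists are illustrative.

```python
# Minimal sketch of the keyword-matching ranking step. A real pipeline
# would run spaCy NER first; here plain resume text is scored directly
# against the employer's required skills.
import re

def score_resume(text, required_skills):
    """Count how many required skills appear as whole words in the resume."""
    words = set(re.findall(r"[a-z+#]+", text.lower()))
    return sum(1 for skill in required_skills if skill.lower() in words)

def rank_resumes(resumes, required_skills):
    """Return (name, score) pairs, best match first."""
    scored = [(name, score_resume(text, required_skills))
              for name, text in resumes.items()]
    return sorted(scored, key=lambda ns: ns[1], reverse=True)

resumes = {
    "candidate_a": "5 years Python and SQL, deep learning with Keras",
    "candidate_b": "Java developer, some SQL exposure",
}
print(rank_resumes(resumes, ["Python", "SQL", "Keras"]))
# [('candidate_a', 3), ('candidate_b', 1)]
```

A production system would replace the bag-of-words match with the syntactic and semantic similarity measures the paper proposes.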

2 Pre-processing

Entity extraction is also called entity name extraction or named entity recognition (NER). The data from resumes can be extracted into named entities such as person name, organization, and values; this is called entity recognition, and it helps transform unstructured text into structured text [4]. The dataset is given to the algorithm as a PKL file rather than a text file and goes through a sequence of pre-processing phases.

Data Cleaning: The procedure of repairing or eradicating inappropriate, corrupted, wrongly formatted, duplicated, or incomplete data within the dataset is called data cleaning. When data is combined from multiple sources, the possibility of mislabelling and duplication is high, which leads to untrustworthy algorithms and outcomes, even though they may look correct.

Data Integration: Data integration is the process of merging data from several disparate sources.

Data Transformation: Merging unstructured data with structured data for further investigation is called data transformation in data pre-processing. It is much easier to discover patterns in standardized, well-structured data.

Data Discretization: Discretization is the process of segmenting the data into a set of intervals, which makes reading and investigating the information simpler and thus increases task efficiency. Because discretization converts a huge dataset into a set of categorical data, this step is also called data reduction.

Tokenization: Dividing the complete sentence of a text into individual objects known as tokens is called tokenization. Tokens can be words, characters, numbers, or symbols. Figure 1 shows code for tokenizing a sentence into units of meaning called tokens.


Data Discretization: Discretization is the process of segmenting the data into set of data intervals which makes the reading and investigation of information simple; hence, the task efficiency upturns. As discretization process alters the huge data set into a set of uncompromising data, this step of process is also called as data reduction mechanism. Tokenization: Dividing the complete sentence of a text into individual objects known as tokens is called tokenization. Examples of tokens can be words, characters, numbers or symbols. Below is code for tokenization of a sentence into units of meaning called tokens (Fig. 1). Stemming: Stemming is the process of reducing inflection to its root forms; this occurs in such a way that illustrating a group of relevant words further down the same stem, even if the root has no appropriate meaning, is possible [5–7] (Fig. 2). Lemmatization: Another technique similar to stemming is called lemmatization which gives a root word called lemma rather than a root stem [8, 9]. We will be getting a valid word that means the same thing after lemmatization. Parts of Speech: The process of labelling every single word in a sentence by the suitable part of speech is called tagging by parts of speech (POS) (Fig. 3). Chunking: Taking out the phrases from unstructured text is called chunking which breaks the sentence into phrases which are expedient than the single words to produce significant outcomes. Chunking plays a very vital role for extraction with labels. Significant outcomes. Chunking plays a very vital role for extraction with labels.

Fig. 1 Tokenization of a sentence of a text


Fig. 2 Stemming

Fig. 3 Parts of speech tagging

3 Proposed Work

In the proposed methodology, we use deep learning with natural language processing to parse resumes for particular companies [10]. The resumes received are parsed and ranked according to company requirements. A machine learning model that learns to perform tasks directly from images, text, or sound, processing the information and forming patterns for decision-making much as a human does, is called deep learning; it generally uses neural network architectures [11, 12]. With fast-growing machine learning technology, the serious need to recognize text with its unpredictable structure, implied meanings, and intent is raising the importance of natural language processing [13]. By translating machine language into human language, NLP makes tasks possible that were impossible before; it is used here to parse the resume information through the following stages [9]:

1. Lexical Analysis: This is the first phase of compilation, in which the source is scanned one character at a time and converted into meaningful lexemes or tokens.
2. Syntactic Analysis: The process of examining natural language in accordance with the rules of a prescribed grammar is called syntactic analysis. Grammar is essential for describing the syntactic structure of well-formed programs [1, 14].

Parsing: Parsing is to "resolve a sentence into its component parts and describe their syntactic roles". Parsing of a sentence is shown in Fig. 4 with the example [15] "having good experience in teaching data science and big data".

Fig. 4 Parse tree illustrating the syntactic analysis

4 Semantic Analysis

Semantic analysis is the process of understanding the meaning and interpretation of words, signs, and sentence structure [9]; here the computer comes close to understanding natural language as a human does [16, 17]. The schema below shows how the parser works internally: the training data consist of text with corresponding labels, which are converted into documents, and training then proceeds internally. The model's weights are updated through the update block, and an optimizer improves the efficiency of the training process (Figs. 5, 6 and 7).

652

Fig. 5 Internal working of parse

Fig. 6 Data set for training the model/algorithm

Fig. 7 SPACY NLP pipeline


Instead of generating a model from scratch, we used a pre-trained model so that we could leverage its NLP capabilities (Fig. 8). After applying spaCy NER, the labels shown in Fig. 9 are produced. This spaCy model can now be loaded anywhere we want to predict the required data.

Fig. 8 Passing of trained data to the NLP

Fig. 9 Labels produced using spacy NER


Fig. 10 Label entities in the shortlisted resumes

5 Result

The proposed system, which parses resumes with deep learning for natural language processing, offers the employer trustworthy interviewees who fit their requirements. The end result includes parsing resumes from sources in PDF/DOC formats and normalizing and tokenizing data entities to find the best candidates even from incomplete data. Companies and interviewees alike benefit from the proposed system. The final output of eligible entrants is displayed with the desired label entities (Fig. 10).

6 Conclusion

The proposed work builds a smart recruitment system (SRS) [1] that classifies and ranks resumes using deep learning with natural language processing (NLP). The raw information obtained from the resumes is normalized, parsed, and scored to display the top N candidates. We successfully converted different formats of resumes to text and parsed the significant information from them. We also merged the executive requirements into the resume score, making the system recruiter-specific. The proposed work delivers quality applicants to executives: based on the candidates' technical skills, the resumes are ranked in order.

References

1. Sowmya, V., Vishnu Vardhan, B., Bhadri Raju, M.S.V.S.: Improving semantic textual similarity with phrase entity alignment. Int. J. Intell. Eng. Syst. 10(4) (2017)
2. Javed, F., Luo, Q., McNair, M., Jacob, F., Zhao, M., Kang, T.S.: Carotene: a job title classification system for the online recruitment domain. In: 2015 IEEE First International Conference on Big Data Computing Service and Applications
3. Jayaraj, V., Mahalakshmi, V., Rajadurai, P.: Resume information extraction using feature extraction model. Am. Int. J. Res. Sci. Technol. Eng. Math. (2015)
4. Kopparapu, S.K.: Automatic extraction of usable information from unstructured resumes to aid search. In: Progress in Informatics and Computing (PIC), 2010 IEEE International Conference


5. Gugnani, A., Kasireddy, V.K.R., Ponnalagu, K.: Generating unified candidate skill graph for career path recommendation. In: 2018 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE (2018)
6. Kudatarkar, V.R., Ramannavar, M., Sidnal, N.S.: An Unstructured Text Analytics Approach for Qualitative Evaluation of Resumes, IJIRAE (2015)
7. Naveen Kumar, G.S., Reddy, V.S.K.: Detection of shot boundaries and extraction of key frames for video retrieval. Int. J. Knowl. Intell. Eng. Syst. 24(1), 11–17 (2020)
8. Sanyal, S., Ghosh, N., Hazra, S., Adhikary, S.: Resume parser with natural language processing, IJESC (2007)
9. Parkavi, A., Pandey, P., Poornima, J., Vaibhavi, G.S., Kaveri, B.W.: E-Recruitment System Through Resume Parsing, Psychometric Test and Social Media Analysis, IJARBEST (2019)
10. Fahad S.K.A., Yahya, A.E.: Inflectional review of deep learning on natural language processing. In: 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE). IEEE (2018)
11. Saxena, C.: Enhancing Productivity of Recruitment Process Using Data mining & Text Mining Tools, San Jose State University
12. Ayishathahira, S.: Combination of neural networks and conditional random fields for efficient resume parsing. In: International CET Conference on Control, Communication and Computing (IC4) (2018)
13. Bhaliya, N., Gandhi, J., Singh, D.K.: NLP Based Extraction of Relevant Resume using Machine Learning, IJITEE (2020)
14. Kumar, G.N., Reddy, V.S.K.: Key frame extraction using rough set theory for video retrieval. In: Soft Computing and Signal Processing, pp. 751–757. Springer, Singapore (2019)
15. Sowmya, V., Mantena, S.V.S., Raju, B., Vardhan, B.V.: Analysis of lexical, syntactic and semantic features for semantic textual similarity (IJCET) (2018)
16. "Intelligent hiring with resume parser and ranking using natural ..." https://www.ijircce.com/upload/2016/april/218Intelligent.pdf
17. Shivratri, P., Kshirsagar, P., Mishra, R., Damania, R., Prabhu, N.: Resume parsing and standardization (2015)

Early Diagnosis of Age-Related Macular Degeneration (ARMD) Using Deep Learning Pamula Udayaraju and P. Jeyanthi

Abstract Retinal diseases are becoming more complicated for humans. Among them, age-related macular degeneration (ARMD) is an eye disease that may cause vision loss. ARMD has two types: dry ARMD and wet ARMD. The dry form tends to get worse slowly, so most vision can be kept, whereas the wet form is a leading cause of permanent vision loss; if both eyes are affected, it can hurt a person's quality of life. Early detection of ARMD can prevent vision loss in elderly persons. Deep learning (DL) is an artificial intelligence (AI) technique that works well on medical images by generating patterns for decision-making. This paper discusses several preprocessing techniques, feature extraction techniques, and deep learning algorithms for the early diagnosis of ARMD. The performance of various algorithms is discussed on optical coherence tomography (OCT) images of dry and wet ARMD.

1 Introduction

Diagnosing diseases of the human body is a very difficult task, and it is a prime domain for applying artificial intelligence (AI). Companies such as Google [1] and GE Healthcare [2] are investing heavily in the health sector, and many healthcare systems focus on diagnosing diseases using different algorithms. Detecting and diagnosing eye disease is especially sensitive because of the unpredictability of those diseases.

P. Udayaraju (B) Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, India
P. Jeyanthi Department of Information Technology, Sathyabama Institute of Science and Technology, Chennai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_59


Previous experiments show that degenerative retinal change is related to the volume of drusen observed in SD-OCT imaging [2], and many other characteristics measured through SD-OCT may be useful as disease biomarkers. Specific artificial intelligence techniques (machine learning and deep learning) are increasingly being used for automated image analysis in SD-OCT [3]. For example, traditional machine learning procedures [4–8] were used to quantify specific disease features, and such techniques could also be used to identify hidden patterns that improve the prediction of disease progression or the response to treatment. Studies using deep learning [9, 10] have proved effective for classifying normal versus ARMD OCT images by directly analyzing the image pixel data. Regarding the prediction of ARMD progression based on OCT image biomarkers, two principal studies have been published previously [11]. These investigations propose an initial solution for the prediction of wet ARMD using conventional machine learning methods with limited accuracy (0.74 and 0.68 AUC). However, none of the predictive models considered sequential learning on the longitudinal OCT data captured during multiple visits. The estimated performance could be improved by a sequential deep learning model that considers multiple visits of the same patients in an end-to-end model to predict patient-specific trends in short- and long-term progression of ARMD. A practical challenge in implementing such a sequential deep learning model on longitudinal raw SD-OCT images is that a single SD-OCT volume typically contains 100 to 200 two-dimensional (2D) high-resolution images (B-scans), which makes the data dimension extremely large (number of visits × number of B-scan images × number of pixels per image) to handle in a computationally efficient manner. A deep learning model that can handle such a complex input space needs an enormous amount of training data to suppress optimistically biased estimates of performance. As a rule of thumb, the size of the dataset should be at least about 10× its dimension, which is impractical for most longitudinal clinical prediction cases, given the limited availability of data. This paper mainly focuses on discussing the various preprocessing techniques, feature extraction techniques, and deep learning algorithms for the diagnosis of ARMD.

2 Literature Survey

This section reviews the various algorithms that have been applied to OCT images, with a clear account of their performance. A deep CNN [12, 13], ResNet-50, was proposed using transfer learning and applied to OCT images for the classification of CNV, drusen, and DME; its accuracy is 96.1%. Because of the large number of parameters, transfer learning is adopted to exploit pre-trained performance, but this is a very heavy network and may not be suitable for real-time environments. A single layer used for the classification of normal retina, DME, drusen, and CNV was proposed in [14], with an accuracy of about 89.9% for CNV; two networks are utilized, one for segmentation and the other for classification. A dynamic and rapid algorithm to classify wet and dry ARMD was developed and applied to OCT dataset images [2, 15]. A novel technique was proposed for classifying DME and two stages of ARMD (drusen, CNV) against healthy OCT images: a multi-scale deep feature fusion (MDFF) with CNN classification, achieving sensitivity 99.6%, specificity 99.6%, and accuracy 99.6%. Because of its multi-scale nature, this network is more complicated, with a large number of learnable parameters. In Ref. [3], a DL-based ARMD classification is proposed in which a deep CNN classifies preprocessed OCT images; the accuracy is about 96.93% using an Inception V3 network. These metrics rely on a pre-trained network, and the proposed classifier is heavyweight, which makes it unsuitable for real-time execution. A dynamic classification of DME and ARMD applied to OCT images is explained in [4]; the accuracy is 97.1% using the AlexNet architecture, but with its large number of learnable parameters, real-time deployment of AlexNet is complex.

3 Preprocessing and Feature Extraction

Preprocessing is widely used to improve feature extraction on the selected images. Whereas a human observer recognizes the pathology in a given input image (such as geographic atrophy and drusen), AI algorithms classify the estimated pathology based on the pixels of an item (i.e., drusen). An OCT image is grayscale, and pixels that are washed out or missing carry no useful information. The pixel range of OCT images is 0 to 255 per color channel (e.g., RGB, saturation, value). Drusen appear in the OCT as small regions whose pixels are very bright. From the image histogram, properties such as the entropy, energy, and intensity of the affected region are calculated to classify whether drusen are present. To increase the contrast of the image, contrast limited adaptive histogram equalization (CLAHE) is used in [16]. Based on previous studies, a median filter is used to eliminate the black border, removing noise and smoothing the input image without decreasing contrast.
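The median-filter denoising step can be sketched as follows; this pure-Python 3×3 filter is a stand-in for what would normally be done with OpenCV or scipy.ndimage, and the toy image is illustrative. It shows how impulse noise is removed without reducing contrast.

```python
# Sketch of the median-filter step: each interior pixel is replaced by
# the median of its 3x3 neighborhood, which suppresses impulse noise
# while preserving edges and contrast. Real pipelines would use
# cv2.medianBlur or scipy.ndimage.median_filter instead.
import statistics

def median_filter3(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]          # borders are left unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = statistics.median(window)
    return out

noisy = [[10, 10, 10, 10],
         [10, 255, 10, 10],   # a single bright noise pixel
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
print(median_filter3(noisy)[1][1])  # 10: the impulse is removed
```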


Fig. 1 System architecture

4 Wet/Dry/No Disease

Many traditional algorithms for detecting dry ARMD, using SVM, naive Bayes (NB), probabilistic neural networks (PNNs), k-nearest neighbors (KNNs), and decision trees (DTs), were proposed by Mookiah and coworkers [17]. Three datasets (automatic retinal image analysis (ARIA), STARE, and a private dataset (PDS)) were used for training and testing. The SVM classifier shows the best performance compared with the other classifiers: on ARIA and STARE, accuracy reaches 95.7% for ARMD and 95% for normal [18]. Features such as energy, entropy, and the Gini index obtained from the discrete wavelet transform (a better image-denoising technique) also improved the accuracy of the SVM (94.10%) [19]. Figure 1 shows the system architecture based on the DL algorithms.

5 Role of Deep Learning on ARMD

Deep learning (DL) is a trending field used for feature extraction, training, and testing, in which conventional classifiers are replaced with multilayer neural networks capable of learning hidden patterns in data [20]. The basic building blocks of these neural networks are convolution, pooling, and fully connected layers, together forming a CNN; a CNN with 10 or more convolutional layers is considered a deep CNN (DCNN). This approach requires huge amounts of training data, which are generally not available in healthcare systems.

Experimental Results

The experiments are conducted using the Python programming language, with several packages such as NumPy, pandas, sklearn, and Matplotlib, each with its own advantages. Depending on the disease to be detected, the approaches are applied to various datasets such as OCT image datasets, the Messidor dataset, the IDRiD dataset, the OCT2017 dataset, and the Srinivasan2014 dataset. The performance of the algorithms is analyzed using parameters such as accuracy, sensitivity, specificity, and sometimes weighted error.

Early Diagnosis of Age-Related Macular Degeneration (ARMD) …

Table 1 Performance of several models

Models              Accuracy  Sensitivity  Specificity
Inception V3        92.90     96.60        94.34
Human expert 2 [3]  92.40     99.78        95.12
Inception V3 [4]    97.10     97.97        97.67
ResNet50-V1 [33]    99.57     99.60        99.79
MobileNet-V2 [3]    99.70     99.70        99.80
Xception [41]       99.80     99.91        99.92
OpticNet-71         99.85     99.85        99.95

True Positive (TP): the positive prediction is correct. If the system predicts that a person has the retinal disease, the person does have the disease.
True Negative (TN): the negative prediction is correct. If the system predicts that a person does not have the retinal disease, the person does not have the disease.
False Positive (FP): the system predicts that the person has the disease, but this is false.
False Negative (FN): the system predicts that the person does not have the disease, but this is false.

The following equations are used to calculate the parameters. The performance of the models is shown in Table 1 (Fig. 2).

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Fig. 2 Performance of several models (bar chart of accuracy, sensitivity, and specificity for each model)


P. Udayaraju and P. Jeyanthi

Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP)
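As a quick check of these formulas, the sketch below computes all three parameters from confusion-matrix counts (plain Python; the counts are made-up illustrative values):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, sensitivity, specificity

# Made-up counts for a hypothetical screening run of 200 images.
acc, sens, spec = metrics(tp=90, tn=95, fp=5, fn=10)
print(acc, sens, spec)  # 0.925 0.9 0.95
```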

6 Conclusion

In this paper, the diagnosis of ARMD using various preprocessing techniques, feature extraction techniques, and machine/deep learning algorithms applied to OCT and fundus images is discussed, and the performance is analyzed using parameters such as sensitivity, specificity, and accuracy. This paper also analyzed the various stages of ARMD and the different algorithms applied to them.

References

1. Krause, J., Gulshan, V., Rahimy, E., Karth, P., Widner, K., Corrado, G.S., Peng, L., Webster, D.R.: Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy. Ophthalmology 125(8), 1264–1272 (2018)
2. Das, V., Dandapat, S., Bora, P.K.: Multi-scale deep feature fusion for automated classification of macular pathologies from OCT images. Biomed. Signal Process. Control 54, 101605 (2019)
3. Hwang, D.-K., Hsu, C.-C., Chang, K.-J., Chao, D., Sun, C.-H., Jheng, Y.-C., Yarmishyn, A.A., Wu, J.-C., Tsai, C.-Y., Wang, M.-L., et al.: Artificial intelligence-based decision-making for age-related macular degeneration. Theranostics 9(1), 232 (2019)
4. Kaymak, S., Serener, A.: Automated age-related macular degeneration and diabetic macular edema detection on OCT images using deep learning. In: 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP), IEEE, pp. 265–269 (2018)
5. Lee, C.S., Baughman, D.M., Lee, A.Y.: Deep learning is effective for classifying normal versus age-related macular degeneration OCT images. Ophthalmol. Retina 1(4), 322–327 (2017)
6. Sun, Y., Li, S., Sun, Z.: Fully automated macular pathology detection in retina optical coherence tomography images using sparse coding and dictionary learning. J. Biomed. Optics 22(1), 016012 (2017)
7. Sidibe, D., Sankar, S., Lemaitre, G., Rastgoo, M., Massich, J., Cheung, C.Y., Tan, G.S.W., Milea, D., Lamoureux, E., Wong, T.Y., et al.: An anomaly detection approach for the identification of DME patients using spectral domain optical coherence tomography images. Comput. Methods Programs Biomed. 139, 109–117 (2017)
8. Perdomo, O., Rios, H., Rodríguez, F., Otálora, S., Meriaudeau, F., Müller, H., González, F.A.: Classification of diabetes-related retinal diseases using a deep learning approach in optical coherence tomography. Comput. Methods Programs Biomed. (2019)
9. Awais, M., Müller, H., Tang, T.B., Meriaudeau, F.: Classification of SD-OCT images using a deep learning approach. In: 2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), IEEE, pp. 489–492 (2017)
10. Rasti, R., Rabbani, H., Mehridehnavi, A., Hajizadeh, F.: Macular OCT classification using a multi-scale convolutional neural network ensemble. IEEE Trans. Med. Imaging 37(4), 1024–1034 (2017)


11. Seebock, P., Waldstein, S.M., Klimscha, S., Bogunovic, H., Schlegl, T., Gerendas, B.S., Donner, R., Schmidt-Erfurth, U., Langs, G.: Unsupervised identification of disease marker candidates in retinal OCT imaging data. IEEE Trans. Med. Imaging 38(4), 1037–1047 (2018)
12. Kermany, D.S., Goldbaum, M., Cai, W., Valentim, C.C.S., Liang, H., Baxter, S.L., McKeown, A., Yang, G., Wu, X., Yan, F., et al.: Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172(5), 1122–1131 (2018)
13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
14. Huang, L., He, X., Fang, L., Rabbani, H., Chen, X.: Automatic classification of retinal optical coherence tomography images with layer guided convolutional neural network. IEEE Signal Process. Lett. 26(7), 1026–1030 (2019)
15. Serener, A., Serte, S.: Dry and wet age-related macular degeneration classification using OCT images and deep learning. In: 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), IEEE, pp. 1–4 (2019)
16. Hijazi, M., Coenen, F., Zheng, Y.: Retinal image classification using histogram based approach. In: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–7. Barcelona (2010)
17. Mookiah, M., Acharya, U., Koh, J., Chua, C.K.: Decision support system for age-related macular degeneration using discrete wavelet transform. Med. Biol. Eng. Comput. 52, 781–796 (2014)
18. Mookiah, M.R.K., Acharya, U., Koh, J., Chandran, V.: Automated diagnosis of age-related macular degeneration using greyscale features from digital fundus images. Comput. Biol. Med. 53, 55–64 (2014)
19. Zheng, Y., Hijazi, M., Coenen, F.: Automated disease/no disease grading of age-related macular degeneration by an image mining approach. Invest. Ophthalmol. Vis. Sci. 53(13), 8310–8318 (2012)
20. Acharya, U., Hagiwara, Y., Koh, J., Salatha: Automated screening tool for dry and wet age-related macular degeneration (ARMD) using pyramid of histogram of orientated gradients (PHOG) and nonlinear features. Comput. Sci. 20, 41–51 (2017)

Recent Trends in Calculating Polarity Score Using Sentimental Analysis

K. Srikanth, N. Sudheer, G. S. Naveen Kumar, and Vijaysekhar

Abstract The current technological era has transformed traditional lifestyles in several domains; for example, publishing news and events has become faster with the advancement of information technology (IT). With the help of IT, immense amounts of data are published every minute of every day by millions of users, who give their opinions in the form of comments, blogs, reviews, and posts through blogs, social-media micro-blogging Websites, and many more. In this paper, we explore how sentiment analysis plays a vital role and has drawn the attention of data scientists.

1 Introduction

Sentiment analysis (SA) has drawn the attention of data scientists during the last two decades. The main reason for this is the creation of large volumes of user-generated content, such as reviews and tweets on various subjects [1]. Such data are usually unstructured and contain a mixture of all types of data. Machine learning models for business applications make use of this text for real-time decision-making [2]. The method of SA works in a systematic way as follows [3]:

(a) The message is split into sentences using sentence boundaries (such as ".").
(b) Each sentence is further divided into words separated by white spaces.
(c) Each word of the sentence is compared with a dictionary in which the polarity of the word is listed as a hash table; Jockers' polarity dictionary is one example.

K. Srikanth (B) · N. Sudheer · G. S. Naveen Kumar Department of CSE, Malla Reddy University, Hyderabad, India e-mail: [email protected] N. Sudheer e-mail: [email protected] Vijaysekhar Department of Management, Malla Reddy University, Hyderabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2_60



(d) Each word in a sentence is segregated as a polarity word (sentiment word) or a non-polarity word, with a corresponding polarity value (+1 for positive, 0 for neutral, and −1 for negative).
(e) The average sentiment and its standard deviation are calculated for the message, based on all of its sentences.

Though sentiment is often classified as binary (positive/negative), it can be fine-tuned to reflect the precision of polarity in the text, so that it can be expressed on a 5-point scale, viz., very negative, negative, neutral, positive, and very positive categories with a suitable numerical rating. This is known as fine-grained SA [4]. Tan et al. [3] studied the classification of reviews/documents by analyzing sentence-level polarity using linguistic arguments. Vinodhini and Chandrasekaran [5] made a survey of various methods and applications of SA and opinion mining and highlighted some pertinent challenges. Mukherjee and Bhattacharyya [6] studied the sentiment in product reviews by focusing on specific features instead of analyzing the review in general. Bakliwal et al. [7] reported various applications of SA in the context of political tweets. Various concepts related to SA were discussed by Khaled et al. [8]. SA of product reviews for big-data situations, like those generated by Amazon, was studied by Xing Fang and Justin Zhan (2015). Humans are subjective creatures, and opinions are important; being able to interact with people on that level has many advantages for information systems. Sentiment analysis, also called opinion mining, is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions toward entities such as products, services, organizations, individuals, issues, events, topics, and their attributes. It represents a large problem space. There are also many names for slightly different tasks, e.g., sentiment analysis, opinion mining, opinion extraction, sentiment mining, subjectivity analysis, affect analysis, emotion analysis, review mining, etc. In industry, the term sentiment analysis is more commonly used, but in academia both sentiment analysis and opinion mining are frequently employed; they basically represent the same field of study.
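Steps (a)-(e) above can be sketched in Python using a toy polarity dictionary (the words and scores below are invented for illustration; a real system would use a lexicon such as Jockers'):

```python
import statistics

# (c) Toy hash table of polarity words; values are invented.
LEXICON = {"good": 1, "great": 1, "bad": -1, "terrible": -1}

def message_sentiment(message):
    # (a) Split the message into sentences at sentence boundaries.
    sentences = [s for s in message.split(".") if s.strip()]
    scores = []
    for sentence in sentences:
        # (b) Split each sentence into words on white space.
        words = sentence.lower().split()
        # (c), (d) Look up each word; non-polarity words score 0.
        polarities = [LEXICON.get(w, 0) for w in words]
        scores.append(sum(polarities) / len(words))
    # (e) Average sentiment and its standard deviation over sentences.
    mean = sum(scores) / len(scores)
    sd = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, sd

mean, sd = message_sentiment("The food was good. The service was terrible.")
print(round(mean, 4), round(sd, 4))  # 0.0 0.3536
```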
The term sentiment analysis perhaps first appeared in [1], and the term opinion mining first appeared in (Dave et al. 2003). However, research on sentiment analysis and opinion mining had already been carried out by Das and Chen [2], Morinaga et al. (2002), Pang et al. (2002), Tong (2001), Turney (2002), and Wiebe (2000). Both sentiment analysis and opinion mining are used to calculate polarity scores of sentiments, which mark statements as positive or negative.

2 Sentence Level Sentimental Analysis

The polarity of each sentence is derived under the assumption that each sentence is written by a single person and expresses a single positive, negative, or neutral opinion/sentiment [6].


Early work has already been done on sentence-level analysis, identifying sentences based on their subjectivity. Most techniques were classification techniques. The process is divided into two tasks: first, identifying whether the sentence holds an opinion, and second, classifying each opinionated sentence with a polarity score (positive/negative). The advantage of sentence-level analysis lies in subjectivity classification [5, 6].

3 Text Tokenization and Sentiment Scores

"sentimentr" is a package in R software commonly used in machine learning models. It is loaded into the R environment using (a) install.packages("sentimentr") and (b) library(sentimentr), and can be obtained from the URL http://github.com/trinker/sentimentr. We also need a dictionary of sentiment words (several dictionaries are available), which is provided via install.packages("ndjson") and library(ndjson). For text tokenization, we use library(tidyverse) and library(tokenizers). The sentiment score of a text is evaluated in terms of the polarity (emotions expressed in a sentence) of the words in the various sentences of the text. Not all words have polarity or sentiment. A list of words having positive or negative polarities, prepared by experts, is known as a lexicon. There are many lexicons available, and each one has a potential application in a given context. For instance, the polarity of a word in the context of seeking answers from a student is different from that of the same word while evaluating a review of a movie or a car model. Jockers' lexicon (2017) is one such dictionary, which is used by R software for evaluating the sentiments of general reviews. The words used in a lexicon are known as polarity words and carry a score of +1 for positive sentiment and −1 for negative sentiment. All other words are treated as neutral and carry a score of 0. In SA, each word is compared with the indexed list of polarity words and segregated into positive, negative, and neutral words. However, SA also takes into account the neighboring words in the sentence, so as to add or remove the additional tone attached to the polarity word. For instance, the word "poor" has negative polarity, while "not poor" will be treated as positive. This is called contextual polarity, and n (a positive integer) words (known as valence shifters) before and after each polarity word are taken into account when calculating the polarity [7–9].
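The contextual-polarity idea can be illustrated with a simplified Python sketch; the toy lexicon and the single-word negation window are assumptions (sentimentr itself weights a window of n.before/n.after words rather than merely flipping the sign):

```python
LEXICON = {"poor": -1, "excellent": 1}   # toy polarity words
NEGATORS = {"not", "never", "no"}        # toy valence shifters

def contextual_polarity(sentence, n_before=1):
    """Score polarity words, flipping the sign when a negator occurs
    within the n_before words preceding the polarity word."""
    words = sentence.lower().split()
    score = 0
    for i, w in enumerate(words):
        if w in LEXICON:
            window = words[max(0, i - n_before):i]
            flip = -1 if any(v in NEGATORS for v in window) else 1
            score += flip * LEXICON[w]
    return score

print(contextual_polarity("the service is poor"))      # -1
print(contextual_polarity("the service is not poor"))  # 1
```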
In the case of a text having short length, like captions in newspapers, there will be only a few words that serve as valence shifters, and the sentiment is based purely on the polarity of the individual words. In the case of long tweets, we get a better resolution of words, and thereby the sentiment can be estimated by measuring the polarity of various segments of the text. More details can be had from https://medium.com/@ODSC/an-introduction-to-sentence-level-sentiment-analysis-with-sentimentr-ac556bd7f75a. Given a text T having k sentences, the sentiment in T is expressed as the arithmetic average of the sentiment from each of the k sentences of T. The R command to achieve this is


sentiment(text.var,
          polarity_dt = lexicon::hash_sentiment_jockers_rinker,
          valence_shifters_dt = lexicon::hash_valence_shifters,
          hyphen = "", amplifier.weight = 0.8,
          n.before = 5, n.after = 2,
          question.weight = 1, adversative.weight = 0.25,
          neutral.nonverb.like = FALSE, missing_value = 0, ...)

The output of this function produces a table showing:

1. element_id: the id number of the original vector (of words) passed to sentiment
2. sentence_id: the id number of the sentences within each element_id
3. word_count: word count of the text paragraph
4. sentiment: sentiment/polarity score

The sentiment score is given for each sentence of the text T. The score is, however, dependent on the type of lexicon used, and the R package uses an augmented version of Jockers' (2017) polarity dictionary.

4 The Average Sentiment and Variance

When a text contains multiple sentences, the sentiment is calculated as the arithmetic average of the polarity of all the sentences in the text. Consider the following review about Hyundai cars (source: Kaggle, Hyundai car review data set), which we denote by "tt."

I absolutely love my Azera. The performance is unmatch in its class. I have not had any issues with anything. I have not experienced the problems others are mentioning about how it handles bump. The luxury interior is unmatched for its class also. I would recommend this car to everyone. Just go drive one and you will be a believer also.

If we run the code

sentiment_counts[polarity > 0, ]
sentiment_counts[polarity < 0, ]
sentiment_counts[polarity == 0, ]

Output: The words having positive, negative, and neutral sentiments are extracted and shown in Table 1. We notice that only 7 of the 59 words are tagged with a positive polarity, while only 3 have negative polarity, across the 7 sentences of the text. All other words are non-sentiment words.

Sentiment by each sentence. Let s[k] denote the kth sentence of the text. Then the code sentiment_by(s[k]) produces the sentiment in s[k], denoted by avg_sentiment. Since there is only one sentence, the average and actual sentiment will be equal. The following code produces the sentence-wise sentiment of tt.

Code:

y <- get_sentences(tt)
vv <- sentiment(y)
vv

Output: The sentiment score and its standard deviation of each sentence in the review "tt" are shown in Table 2.

Table 1 Sentiment words and polarity values

Positive sentiment_words[polarity > 0, ]
#  words        polarity  n
1  love         0.75      1
2  experienced  0.60      1
3  performance  0.60      1
⋮
6  recommend    0.50      1
7  believer     0.40      1

Negative sentiment_words[polarity < 0, ]
#  words     polarity  n
1  bump      −0.5      1
2  issues    −1.0      1
3  problems  −1.0      1

Neutral sentiment_words[polarity == 0, ]
#   words  polarity  n
1   i      0         3
2   the    0         3
3   is     0         2
⋮
40  be     0         1
41  a      0         1


Table 2 Average sentiment and standard deviation from 7 sentences of the review

Sentence  word_count  Standard deviation  ave_sentiment
1         5           NA                  0.33541
2         7           NA                  0.15119
3         7           NA                  0.37796
4         13          NA                  −0.30509
5         9           NA                  0.33333
6         7           NA                  0.18898
7         11          NA                  0.18091

The arithmetic average of the sentiment over each of the 7 sentences gives 0.18038, which is the same as the value obtained by the command sentiment_by(tt). Since each sentence is treated as a single line of text, we do not get a standard deviation (SD). For the complete text of 7 sentences, however, the SD is 0.23205. In fact, we do not need to know the sentiment of each sentence of the text unless we wish to have a comparative analysis of sentiments across the various sentences. It is also possible to extract other statistical indicators of the sentiment score by using the following code.

y <- get_sentences(tt)
vv <- sentiment(y)
mean <- mean(vv$sentiment)
stdev <- sd(vv$sentiment)
median <- median(vv$sentiment)
q1 <- quantile(vv$sentiment, 0.25)
q3 <- quantile(vv$sentiment, 0.75)
iqr <- (q3 - q1)
print(c(mean, stdev, median, q1, q3, iqr))

The statistical summary of the full review is shown in Table 3. The average sentiment of the given text is, therefore, 0.18039, which is positive, but it has an SD of 0.23205, which is higher than the mean (average). This indicates that some sentences have a very large positive or negative sentiment rather than being stable throughout the text. This often happens in reviews because some sentences may carry extreme sentiment. In an ideal situation, we expect the SD to be smaller than the mean sentiment score.

Table 3 Summary of sentiment scores within the text of the review

Mean     SD       Median   Q1 (25% percentile)  Q3 (75% percentile)  IQR (interquartile range: Q3-Q1)
0.18039  0.23205  0.18898  0.16605              0.33437              0.16833
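The figures in Table 3 can be cross-checked outside R. The Python sketch below reproduces the mean, SD, median, and quartiles from the seven sentence scores of Table 2 (statistics.quantiles with method="inclusive" interpolates the same way as R's default quantile type 7):

```python
import statistics

# ave_sentiment values for the 7 sentences (Table 2).
scores = [0.33541, 0.15119, 0.37796, -0.30509, 0.33333, 0.18898, 0.18091]

mean = statistics.mean(scores)    # ~0.18038
stdev = statistics.stdev(scores)  # sample SD, ~0.23205
median = statistics.median(scores)
q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1                     # ~0.16832 (Table 3 rounds to 0.16833)
print(round(mean, 5), round(stdev, 5), round(median, 5),
      round(q1, 5), round(q3, 5))
```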


An alternative summary of the sentiment is the "median" instead of the "mean" of the sentiments of the sentences. In this case, the median sentiment is 0.18898, which is slightly more than the mean. The quartiles Q1 and Q3 denote the 25th and 75th percentiles of the score, and (Q3-Q1) is a measure of variation known as the interquartile range (IQR). In this text, out of 7 sentences, 25% have a score below 0.16605 while 25% have a score above 0.33437, and the IQR = 0.16833. The final activity in SA is to classify the text as positive or negative based on the score. One standard rule of classification is to tag the text as "positive" if the score is above +0.05, "negative" if the score is below −0.05, and "neutral" otherwise. Since the average score is positive and falls above 0.05, the given text conveys a positive sentiment. In the following section, we perform the analysis of sentiments from a database and report the findings.

Conclusions

From the analysis, we conclude the following.

1. The sentimentr package offers a wide range of analytical features for extracting the sentiment of reviews.
2. The ave_sentiment is used as a measure of the positivity of the opinion, along with its standard deviation.
3. In this research, each reviewer has provided (a) a numerical rating of the review on a 1–5 scale and (b) a title for the review.

Acknowledgments The authors are thankful to Prof. K. V. S. Sarma for his support in the statistical analysis and its interpretation.

References

1. Nasukawa, T., Yi, J.: Sentiment analysis: capturing favorability using natural language processing. In: Proceedings of K-CAP '03, 2nd International Conference on Knowledge Capture (2003)
2. Das, S., Chen, M.: Yahoo! for Amazon: extracting market sentiment from stock message boards. In: Proceedings of APFA-2001 (2001)
3. Tan, L.K.-W., Na, J.-C., Theng, Y.-L., Chang, K.: Sentence-level sentiment polarity classification using a linguistic approach. In: Digital Libraries: For Cultural Heritage, Knowledge Dissemination, and Future Creation, pp. 77–87. Springer, Heidelberg, Germany (2011)
4. Naveen Kumar, G.S., Reddy, V.S.K.: Detection of shot boundaries and extraction of key frames for video retrieval. Int. J. Knowl. Based Intell. Eng. Syst. 24(1), 11–17 (2020)
5. Vinodhini, G., Chandrasekaran, R.M.: Sentiment analysis and opinion mining: a survey. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2(6) (2012)
6. Mukherjee, S., Bhattacharyya, P.: Feature specific sentiment analysis for product reviews. In: Computational Linguistics and Intelligent Text Processing, pp. 475–487 (2012)
7. Bakliwal, A., Foster, J., van der Puil, J., O'Brien, R., Tounsi, L., Hughes, M.: Sentiment analysis of political tweets: towards an accurate classifier. In: Proceedings of the Workshop on Language in Social Media, pp. 49–58 (2013)


8. Khaled, A., Neamat, E.T., Ahmad, H.H.: Sentiment analysis over social networks: an overview. IEEE (2015)
9. Naveen Kumar, G.S., Reddy, V.S.K.: Video shot boundary detection and key frame extraction for video retrieval. In: Proceedings of the Second International Conference on Computational Intelligence and Informatics, pp. 557–567. Springer, Singapore (2018)

Author Index

A Abeer, Mooneesah, 79 Ahire, Deepak, 507 Ajai, A. S. Remya, 463 Ajith Kumar, Abhiram, 473 Akash Reddy, P., 573 Akshaya, V., 327 Alafif, Abdulrhman, 139 Amarendra, K., 203 Ambrayil, Mammen Jacob, 79 Amit, Dua, 495 Ansar, Shakeeb, 375 Arul Murugan, C., 115 Asitah, Nur, 617 Ayem, Gabriel Terna, 49

B Badoni, Pankaj, 603 Baireddy, Ravinder Reddy, 39 Banu, Saira, 417 Barker, Umesh, 357 Bassi, Hussain, 139 Bathula, Murali Krishna, 335 Bennet, M. Anto, 221 Beski Prabaharan, S., 417 Bhavani, S. Durga, 543 Bisht, Abhishek, 603 Bouanane, Khadra, 1

C Chakkaravarthy, Midhun, 59, 127, 175 Chandra Sekhar Reddy, P., 563, 583 Changalasetty, Suresh Babu, 39

Chavan, Ameet, 573 Chithreddy, Rithwik, 427 Chougala, Kempanna, 357

D Danh, Luong Vinh Quoc, 101 Darapaneni, Narayana, 375 Datta, Raja, 15 Desai, Viraj, 387 Dev, Arjun B., 79 Dileep, P., 155

E Eddoud, Abdelhadi Mohammed Ali, 1 Etier, Issa, 115

F Firdaus, Mega, 617

G Gayatri, M., 155 Ghatak, Aditya, 473 Gill, Simran, 387 Gopalan, Sundararaman, 463 Gummarekula, Sattibabu, 553 Gupta, N. Venkata Ramana, 213

H Halkiv, Liubov, 89 Hoang, Quoc-Dong, 183

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 V. S. Reddy et al. (eds.), Intelligent Systems and Sustainable Computing, Smart Innovation, Systems and Technologies 289, https://doi.org/10.1007/978-981-19-0011-2

675

Hossain, Sazzad, 165 Hung, Bui Thanh, 27 Huynh, Luan N. T., 183

I Ivanytska, Oksana, 89

J Jain, Jinit, 387 Jatothu, Rajaram, 243 Jayasundar, S., 221 Jeyanthi, P., 657 Joshi, Janhvi, 603 Jyothi, K., 313

K Kamal, M. V., 155 Kannan, Nithiyananthan, 115 Karyy, Oleh, 89 Kavitha, R., 269 Kharade, Abhijeet, 375 Kodati, Sarangam, 243 Koti Mani Kumar, T. N. S., 593 Krishna, R. V. V., 553 Krishnaveni Challa, 335 Kulyniak, Ihor, 89 Kumar, N. Uday, 449

L Laxmi, V., 345

M Machupalli, Madhusudhan Reddy, 335 Madhani, Nishil, 473 Maheswari, S., 203 Makala, Ramesh, 647 Manjhi, Pankaj Kumar, 531 Mantena, Srihari Varma, 221 Marlapalli, Krishna, 593 Medakene, Abdeldjaouad Nusayr, 1 Mehta, Diksha, 603 Mehta, Manan, 387 Mekala, Sreenivas, 243 Meshram, Rohit Sunil, 255 Mobarak, Youssef, 139 Mohammed, Nabeel, 165 Mohanaiah, P., 289 Mukherjee, Indrajit, 531 Mukherjee, Pratyusa, 269

Muthumanikandan, V., 427

N Naik, Anil Kumar, 485 Nanda, Amitash, 507 Narayanan, M. S., 375 Naveen Kumar, G. S., 627, 637, 665 Neeraj, M., 573 Nguyen, Chanh-Nghiem, 101

P Paduri, Anwesh Reddy, 375 Panicker, Renju, 193 Pasupuleti, Swarnalatha, 553 Patil, Nagamma, 255 Patra, S. S., 269 Pavan, M., 313 Pemula, Rambabu, 39 Ponnaboyina, Ranganath, 647 Poojari, Naveena Narayana, 405 Poovammal, E., 517 Posupo, Raja, 335 Pradeepthi, K. V., 485 Prajwal, 405 Preethi, V., 553 Priya, Kesari Padma, 233 Priyadarsini, S., 213 Purnomo, Agung, 617

R Radha, D., 583 Raghunadha Reddy, T., 563 Rahman, Fuad, 165 Rajesh Kumar, G., 213, 335 Rajeswari, R., 269 Rajwar, Sunil Kumar, 531 Ramesh, P., 233 Rangasam, Rajasekar, 221 Rangasamy, Rajasekar, 203, 213 Rani, Kodeti Haritha, 127 Ranjini Devi, R., 327 Rao, M. Mohan, 243 Ravi, G., 243 Ravikiran, Pichika, 175 Ravindran, Ganesh, 375 Rawal, Kirti, 279 Reddy, V. S. K., 637 Revathi, S., 593 Rohan, Shadman, 165 Roopalakshmi, R., 345 Rosyidah, Elsa, 617

Rout, Suchismita, 269 Roy, Koushik, 165 Roy, Shaily, 69 S Sachin, P. C., 495 Sagar, P. Vidya, 203 Saha, Pritom Kumar, 165 Sangeetha, J., 357, 405 Sanjeev, Kumar, 375 Saravanan, N. P., 203 Sarvesh, S. H., 357 Sathyapriya, M., 327 Selvi, S. Arun Mozhi, 213 Sengan, Sudhakar, 203, 213, 221 Septianto, Andre, 617 Seraphim, B. Ida, 517 Sethi, Gaurav, 279 Shah, Eshika, 473 Shah, Neel, 387 Sharma, Dilip Kumar, 221 Sherimon, P. C., 79 Sherimon, Vinu, 79 Shreenivasa, G., 405 Shree, V. Usha, 233 Singh, Prashant, 427 Siva Naga Dhipti, G., 627 Sivanantham, S., 327 Sree, L. Padma, 543 Sridharan, Aadityan, 463 Srikanth, K., 665 Srinu Vasarao, Parnandi, 59 Subburaj, Arjun, 203, 213 Sudheer, N., 665 Suganthi, Sellakkutti, 193 Sumathi, R., 303 Swathi, Baggam, 627

T Tahsin, Labeba, 69 Talasila, Srinivas, 279 Thandekkattu, Salu George, 49 The, Ngo Thanh, 101 Thota, Lalitha Saroja, 39 Tuan, Le Anh, 183 Twahirwa, Evariste, 15

U Udayaraju, Pamula, 657 Umamaheswara Reddy, G., 435

V Vajjhala, Narasimha Rao, 49 Vallabhaneni, Pranav, 593 Vanamala, Sunitha, 543 Varada Rajkumar, K., 593 Vasudevan, V., 303 Veerappan, J., 221 Veeraswamy, D., 449 Venkata Subbaiah, M., 435 Venkateswara Reddy, E., 627, 647 Venkateswarlu, S. China, 449 Viet, Tran Xuan, 183 Vijaya Lakshmi, A., 289 Vijaya Pal Reddy, P., 563 Vijaysekhar, 665 Vijay, Vallabhuni, 449 Vinay Kumar, A., 573

Z Zhygalo, Ivan, 89