Soft Computing and Signal Processing: Proceedings of 5th ICSCSP 2022 9811986681, 9789811986680

This book presents selected research papers on current developments in the fields of soft computing and signal processin

English Pages 668 [669] Year 2023

Table of contents:
Conference Committee
Preface
Contents
About the Editors
Microservice Architecture Observability Tool Analysis
1 Introduction
2 Literature Survey
3 Observability Goal
4 Logging Backend
5 Tracing Backend
6 Metrics Backend: Prometheus Using Grafana
7 Conclusion
References
Decentralized Payment Architecture for E-Commerce and Utility Transactions with Government Verified Identities
1 Introduction
2 Related Work
3 Transaction Process Fundamentals
4 Proposed Architecture
5 Proposed Software Architecture
6 Future Work
7 Results
8 Conclusion
References
Detection of Phishing Website Using Intelligent Machine Learning Classifiers
1 Introduction
2 Related Works
3 Tools and Techniques
3.1 Classification Algorithms
3.2 Performance Metrics
4 Experimental Methodologies
5 Results and Discussion
6 Conclusion
References
Bidirectional Gated Recurrent Unit (BiGRU)-Based Bitcoin Price Prediction by News Sentiment Analysis
1 Introduction
2 Literature Survey
3 News Headlines Data Preprocessing Techniques
4 Sentiment Analysis Using TextBlob
5 Forecasting Deep Learning Models
6 Proposed Bidirectional Gated Recurrent Unit Model with News Sentiment Analysis
7 Experimental Analysis and Results
8 Conclusion and Future Directions
References
How AI Algorithms Are Being Used in Applications
1 Introduction
1.1 Artificial Intelligence
1.2 The Promise of AI
1.3 What It Means to Learn
1.4 Work with Data
1.5 Machine Learning Application
1.6 Machine Learning Various Types
1.7 Select the Best Algorithm
2 Literature Review
2.1 Implement AI Applications
3 Conclusion
References
A Framework for Identifying Theft Detection Using Multiple-Instance Learning
1 Introduction
2 Related Works
2.1 Different Machine Learning Approaches
2.2 Feature Extraction
2.3 Comparative Exploration of Different Models
3 Proposed System
3.1 Dataset Descriptions
3.2 Video Preprocessing
3.3 Feature Extraction
3.4 Neural Network with Multiple-Instance Learning
3.5 Multi-Class Classification
3.6 Web Application Development
4 System Design
4.1 Preprocessing
4.2 Feature Extraction
5 Implementation Results and Discussion
6 Conclusion
References
An Efficient Reversible Data Hiding Based on Prediction Error Expansion
1 Introduction
2 Literature Survey
3 Proposed Scheme
3.1 Calculation of Prediction Error
3.2 Calculation of Fluctuation Value
3.3 Embedding Procedure
3.4 Extraction and Recovery Procedure
4 Experimental Results and Comparative Analysis
4.1 Comparison of Embedding Capacity
4.2 Comparison of PSNR
5 Conclusion
References
Deriving Insights from COVID-19
1 Introduction
1.1 Related Work
2 Literature Survey
3 Existing Model
4 Proposed Model
5 Working Methodology of Tableau
6 Tools Used
7 Experimental Results
8 Techniques
9 Conclusion
References
Smoke Detection in Forest Using Deep Learning
1 Introduction
2 Literature Survey
3 Proposed Work
3.1 Dataset
3.2 CNN Model
3.3 Inception V3 Architecture
3.4 Proposed Model
4 Results and Discussion
5 Conclusion
References
Pneumothorax Segmentation Using Feature Pyramid Network and MobileNet Encoder Through Radiography Images
1 Introduction
2 Related Works
3 Materials and Methods
4 Experiment and Results
5 Conclusion
References
Visual Learning with Dynamic Recall
1 Introduction
2 Related Work
3 Proposed Architecture
4 Proposed Algorithm
5 Results and Discussion
6 Conclusion
References
Machine Learning for Drug Discovery Using Agglomerative Hierarchical Clustering
1 Introduction
2 Related Work
3 Methodology
3.1 Molecular Fingerprint
3.2 Molecular Fingerprint Similarity
3.3 Proposed Methodology
3.4 Evaluation and Validity
4 Experiments and Results
4.1 Environment Setup
4.2 Experimental Investigations
5 Conclusion
References
Automated Photomontage Generation with Neural Style Transfer
1 Introduction
2 Related Work
3 Proposed Approach
4 Implementation Results
5 Analysis and Comparative Study
5.1 Based on Style and Content Swap
5.2 Based on Different Number of Iterations
5.3 Based on Different Extracted Layer
6 Challenges and Conclusion
References
Machine Learning Model Development Using Computational Neurology
1 Introduction
1.1 Biological Neuron
2 Neuronal Dynamics: SRM-0
2.1 Programming Language for the Computational Neurology
2.2 Dataset Description: MNIST
3 Proposed Methodology
3.1 Spike Train: Rate Encoding
4 Conclusion
References
Graph Theory-Based User Profile Extraction and Community Detection in LinkedIn—A Study
1 Introduction
2 Literature Review
3 Implementation
3.1 Dataset Description
3.2 Methodology
3.3 Louvain Community Detection Algorithm
3.4 Girvan–Newman Algorithm
3.5 Leiden Community Detection Algorithm
3.6 Walktrap Community Detection Algorithm
4 Results
5 Conclusion
References
A Hybrid Approach on Lexical Indian Sign Language Recognition
1 Introduction
2 Literature Survey
3 Methodology
4 Discussions
5 Conclusion
References
Defining the Convergence of Artificial Intelligence and Cyber Security with a Case Study of Email Spam Detection
1 Introduction
1.1 Challenges of Maintaining Cyber Security
2 Literature Review
3 Convergence of Artificial Intelligence and Cyber Security
3.1 Benefits of Convergence of Artificial Intelligence and Cyber Security
3.2 Current AI Tools Being Used in Cyber Security
4 Case Study of Email Spam Detection Through AI
4.1 Simulation Environment
4.2 Results and Key Findings
5 Conclusion and Future Scope
References
Determining the Attribute Priority Among the Maintenance and Support Phases in Software Development Phases Using Intuitionistic Fuzzy Analytical Hierarchy Process
1 Introduction
1.1 Fuzzy Set Theory
1.2 Intuitionistic Fuzzy Set
1.3 Analytical Hierarchy Process (AHP)
1.4 Fuzzy Analytical Hierarchy Process (FAHP)
1.5 Intuitionistic Fuzzy Analytical Hierarchy Process (IFAHP)
2 Literature Survey
3 Methodology
4 Implementation
4.1 Architecture Flow Diagram
4.2 Observation from the Experts
4.3 Weightage
4.4 Ranking
5 Conclusions
References
Impact of COVID-19 on Energy Generation in India
1 Introduction
2 Related Works
3 Data Source
4 COVID-19 Impact on Energy Sector
4.1 Impact on Total Power Generation During Various Phases of Lockdown
4.2 Impact on Thermal, Hydro, and Nuclear Power Plant Generation During Various Phases of Lockdown
5 COVID-19 Effect on Power Generation
5.1 Change in Thermal Power Production from 2019 to 2021
5.2 Change in Hydropower Production from 2019 to 2021
5.3 Change in Nuclear Power Production from 2019 to 2021
6 COVID-19 Effect on Energy Demand and Peak Demand
6.1 A Comparative Analysis of Peak Load Factor (PLF)
7 Deviation of Energy Generation
8 Conclusion
References
A Machine Learning-Based Approach for Enhancing Student Learning Experiences
1 Introduction
1.1 Objective
1.2 Scope
2 Proposed System
2.1 Accuracy, Precision, Recall, and F1-Score
2.2 Precision
3 Methodology
3.1 Machine Learning
3.2 Decision Tree
3.3 KNN Algorithm
3.4 SVM Algorithm
4 System Architecture
5 Result
6 Conclusion
References
Book Genre Classification System Using Machine Learning Approach: A Survey
1 Introduction
2 Related Work Done
2.1 Using Support Vector Machine (SVM), K-Nearest Neighbours (KNN), and Logistic Regression (LR)
2.2 Using Adaboost Classifier
2.3 Using Character Networks
2.4 Using Word2vec Algorithm and Several Machine Learning Models—CNN, RNN, GRU, and LSTM
2.5 Using Latent Dirichlet Allocation (LDA) and Softmax Regression Technique
2.6 Using FastText Approach
2.7 Using Mahalanobis Distance Based KNN Classifier
2.8 Using Novel Feature Weight Method—Gini Index
2.9 Using Vector Space Model
3 Conclusion
References
A New Method for Imbalanced Data Reduction Using Data Based Under Sampling
1 Introduction
2 Related Work
3 Proposed Work
3.1 Performance Evaluation
4 Experimental Results
5 Conclusion
References
Resource-Based Prediction in Cloud Computing Using LSTM with Autoencoders
1 Introduction
2 Related Work
3 Methodology
3.1 Dataset Description
3.2 System Overview
3.3 Network Design
4 Results and Discussion
4.1 Discussion
5 Conclusion
References
Arduino UNO-Based COVID Smart Parking System
1 Introduction
2 Literature Survey
3 Proposed Model
3.1 Proposed Architecture
3.2 Principle and Working
4 Results
5 Conclusion and Future Scope
References
Low-Cost Smart Plant Irrigation Control System Using Temperature and Distance Sensors
1 Introduction
2 Literature Survey
3 Proposed Model
4 Experiments and Results
5 Conclusion and Future Directions
References
A Novel Approach to Universal Egg Incubator Using Proteus Design Tool and Application of IoT
1 Introduction
2 Related Work
3 Structural Plan
4 Proposed System Design
5 Circuit Design and Flow Diagram
6 Results and Discussions
7 Conclusion
References
Dynamic Resource Allocation Framework in Cloud Computing
1 Introduction
2 Literature Review
3 General Architecture of the Proposed Framework
4 Proposed Algorithm
5 Experiments
5.1 Performance Evaluation
6 Conclusion
References
Region-Wise COVID-19 Vaccination Distribution Modelling in Tamil Nadu Using Machine Learning
1 Introduction
2 Literature Survey
3 Proposed Model
4 Analysis and Results
4.1 Data set
4.2 ARIMA
4.3 FbProphet
4.4 LSTM
5 Conclusion
References
Detection of Phishing Websites Using Machine Learning
1 Introduction
2 Related Work
3 Methodology
3.1 Data Collection
3.2 Feature Extraction
3.3 Data Preprocessing
3.4 Exploratory Data Analysis
3.5 Train–Test Split
4 Algorithms and Evaluation
4.1 Logistic Regression
4.2 K-Nearest Neighbor
4.3 Decision Tree Classifier
4.4 Random Forest Classifier
4.5 Support Vector Machine
4.6 Ensemble Learning
4.7 XGBoost Classifier
4.8 Neural Networks
5 Performance Evaluation
6 Results and Discussion
7 Conclusions
References
Computational Learning Model for Prediction of Parkinson’s Disease Using Machine Learning
1 Introduction
2 Related Work
3 Methodology
3.1 Data Source
3.2 Study Subjects
3.3 Demographics
3.4 Validation
3.5 Statistical Analysis
4 Results and Discussion
5 Conclusions
References
Eliminating Environmental Context for Fall Detection Based on Movement Traces
1 Introduction
2 Literature Survey
3 Algorithm
3.1 Structural Similarity
3.2 Contour Tracing Method
3.3 Neural Network
4 Methodology
4.1 Preprocessing Stage
4.2 Frame Differencing
4.3 Classification
5 Evaluation
5.1 Dataset
5.2 Evaluation Metrics
5.3 Results
6 Conclusion
References
Performance Analysis of Regression Models in Stock Price Prediction
1 Introduction
1.1 Stock Analysis
1.2 Analysis of Stock Data
2 Methodology
2.1 Regression Analysis
3 Implementation
3.1 Dataset Description
3.2 Sample Generation and Training
4 Results and Discussion
4.1 Linear Regression
4.2 Logistic Regression
4.3 Autoregression
4.4 Comparative Analysis of Linear Regression and Autoregression
4.5 Evaluation Criteria
5 Conclusion
References
Review on the Image Encryption with Hyper-Chaotic Systems
1 Introduction
2 Lyapunov Exponents
3 Hyper-Chaotic Systems and Its Applications in Cryptography
3.1 Entropy
3.2 NPCR and UACI
4 Discussion and Conclusion
References
Object Detection Using Mask R-CNN on a Custom Dataset of Tumbling Satellite
1 Introduction
2 Literature Review
3 Methodology
3.1 Video Frames Generation
3.2 Creating Annotation File
3.3 Training Mask R-CNN
3.4 Testing
3.5 Object Tracking with SORT
3.6 Crop the Satellite Object
3.7 Corner Detection Using Harris Corner Detector
4 Experiments
5 Performance
5.1 Performance Matrix
6 Conclusion
References
Decentralized Blockchain-Based Infrastructure for Numerous IoT Setup
1 Introduction
2 Related Work
3 Proposed System
4 Results and Discussions
5 Conclusion
References
Sentiment Analysis Toward COVID-19 Vaccination Based on Twitter Posts
1 Introduction
1.1 Background
1.2 Scope of the Project
2 Survey on COVID-19 Vaccination Strategies
2.1 Literature Survey
2.2 Challenges
2.3 Data Acquisition
2.4 Problem Statements
2.5 Analysis and Planning
3 Results and Discussions
4 Conclusion
References
Smart Cities Implementation: Australian Bushfire Disaster Detection System Using IoT Innovation
1 Introduction
2 Related Work
3 Research Methodology
4 Proposed Approach
5 Comparison with Another Technology
6 The Fire Detection System Architecture
7 Results
7.1 GPS Module Results on “Thing Speak” Platform
7.2 Flame Sensor Results on “Thing Speak” Platform
7.3 Temperature and Humidity Sensor Detection Results on “Thing Speak” Platform
7.4 Field Charts—Real-Time Data Captured from Thing Speak Cloud Platform
8 Conclusion
References
An Advanced and Ideal Method for Tumor Detection and Classification from MRI Image Using Gamma Distribution and Support Vector Machine
1 Introduction
2 Related Works
3 Proposed Work
3.1 Basic Model
4 Experimental and/or Analytical Work Completed in the Paper
4.1 Dataset
4.2 Gamma Distribution
4.3 Modeling, Analysis, and Design
4.4 Implementation and Testing
5 Modules
5.1 Data Acquisition
5.2 Preprocessing
5.3 Segmentation
6 Results and Analysis
7 Conclusion
References
Forecasting Stock Exchange Trends for Discrete and Non-discrete Inputs Using Machine Learning and Deep Learning Techniques
1 Introduction
1.1 Aim of the Project
1.2 Scope of the Project
2 Literature Survey
2.1 A Local and Global Event Sentiment-Based Efficient Stock Exchange Forecasting Using Deep Learning
2.2 Deep Learning-Based Feature Engineering for Stock Price Movement Prediction
3 System Analysis
3.1 Existing System
3.2 Proposed System
4 Implementation
5 System Design
6 Results
7 Conclusion
References
An Efficient Intrusion Detection Framework in Software-Defined Networking for Cyber Security Applications
1 Introduction
2 Preliminary Knowledge
2.1 Intrusion Detection Systems
2.2 Intrusion Detection Methods
2.3 Software-Defined Network
3 Methodology and Implementation
4 Proposed Intrusion Data-Based Clustering and Detection Scenarios
4.1 Clustering Algorithms Based on Traffic Data
4.2 SDN-Based IDS Using Deep Learning Model
5 Results and Discussions
5.1 Results of the Environmental Simulation of the Clustering Algorithms Based on Traffic Data
5.2 Simulation Results of the SDN-Based IDS Using Deep Learning Model
6 Conclusion and Future Work
References
Inter-Antenna Interference Cancellation in MIMO GFDM Using SVD-MMSE Method
1 Introduction
1.1 Related Works
2 System Model
3 Simulation Results
4 Conclusion
References
Categorization of Thyroid Cancer Sonography Images Using an Amalgamation of Deep Learning Techniques
1 Introduction
2 Related Works
3 Methodology
3.1 VGGNet Model
3.2 CNN-VGGNet-16 Ensemble Method
3.3 Performance Metrics
4 Results
4.1 Experimental Setup
5 Conclusions
References
Enhancement of Sensing Performance in Cognitive Vehicular Networks
1 Introduction
2 Related Literature
3 Proposed Work
4 Simulation
4.1 Phase 1: Creation of Cognitive Vehicular Network Scenario
4.2 Phase 2: Perform the Spectrum Sensing in the Environment
4.3 Phase 3: Access the Sensed Channel to Transmit Different Types of Messages
5 Results
6 Conclusion
References
Evaluating Ensemble Through Extracting the Best Information from Environment in Mobile Sensor Network
1 Introduction
2 Literature Survey
3 Research Methodology
3.1 Compressive Sensing
3.2 Adaptive Sampling Method
3.3 Sampling and Reconstructions
3.4 Error Correction
4 Working Procedure
4.1 Performance of the Proposed Method and Analysis
4.2 Mobile Sensor Network and Adaptive Sampling Data Set Used in the Experiment
5 Implementation
6 Conclusion
References
Design and Analysis of Monopulse Comparator for Tracking Applications
1 Introduction
1.1 Branch-Line Hybrid Couplers
1.2 Analysis of Hybrid Coupler
2 Design of Monopulse Comparator
3 Results and Discussions
4 Conclusion
References
Objective Parameter Analysis with H.265 Using Lagrangian Encoding Algorithm Implementation
1 Introduction
2 Existing Methodology
3 Proposed Method
4 Results and Discussions
5 Conclusion
References
Effective Strategies for Detection of Abnormal Activities in Surveillance Video Streams
1 Introduction
2 Abnormal Event Detection
2.1 Need for the Abnormal Detection
3 Related Work
4 System Architecture and Design
4.1 Preprocessing
4.2 Feature Learning
4.3 Spatial Convolution
5 Results and Discussion
6 Conclusion
References
Bending Conditions, Conductive Materials, and Fabrication of Wearable Textile Antennas: Review
1 Introduction
2 Literature Survey
3 Bending Scenarios Wearable Antennas
4 Conductive Materials
5 Conclusion
References
A 3D Designed Portable Programmable Device Using Gas Sensors for Air Quality Checking and Predicting the Concentration of Oxygen in Coal Mining Areas
1 Introduction
2 Related Works
3 The Hardware and Software Required
3.1 Arduino
3.2 128 * 64 OLED Display
3.3 Sensors
3.4 Code Language
4 Methodology
5 Results
6 Conclusion and Future Works
References
Gain Enhanced Single-Stage Split Folded Cascode Operational Transconductance Amplifier
1 Introduction
2 Standard Folded Cascode OTA Topology
2.1 Background and Related Work
3 Proposed Folded Cascode OTA Topology
3.1 Architecture
3.2 Design Specifications
4 Simulation Results
5 Conclusion
References
Secure IBS Scheme for Vehicular Ad Hoc Networks
1 Introduction
2 Related Works
3 Proposed Scheme
3.1 Registration
3.2 Login
3.3 Password Change
3.4 Pseudo ID Generation
3.5 V2R and R2V Authentication
3.6 Key Update
4 Performance Analysis
4.1 Computation Cost
4.2 Communication Cost
5 Conclusion
References
Secure Localization Techniques in Wireless Sensor Networks Against Routing Attacks Using Machine Learning Models
1 Introduction
2 Literature Review
3 Network Model
4 Materials and Methods
4.1 Datasets and Machine Learning Models
5 Experimental Setup
5.1 Performance Metrics
6 Result and Discussion
6.1 Attack Detection Analysis
7 Conclusion
References
Pothole Detection Approach Based on Deep Learning Algorithms
1 Introduction
2 Literature Survey
3 Methodology
3.1 R-CNN
3.2 YOLO
3.3 Dataset
4 Results
4.1 R-CNN
4.2 YOLO
5 Conclusion
References
Design of 7:2 Approximate Compressor Using Reversible Majority Logic Gate
1 Introduction
2 Proposed Reversible Majority Logic Gate
3 Reversible Majority Logic Gate Using 4:2 Compressors
4 Proposed Reversible Majority Logic Gate Using 7:2 Compressors
5 Results and Implementations of Reversible Majority Logic Gate Approximate Adders
6 Conclusion
References
Segmentation of Cattle Using Color-Based Skin Detection Approach
1 Introduction
2 Related Work
3 Proposed Method
3.1 Color Spaces
3.2 2D Color Histogram
3.3 Classifier Training
4 Experimental Results and Discussion
4.1 Training Results
4.2 Testing Results
5 Conclusion
References
Sinusoidal Oscillator Using CCCCTA
1 Introduction
2 Current Controlled Current Conveyor Trans-conductance Amplifier (CCCCTA)
3 Proposed Sinusoidal Oscillator
4 Simulation Results
5 Conclusion
References
Automation of Railway Gate Using Raspberry Pi
1 Introduction
2 Problem Statement
3 Literature Survey
4 Methodology
5 Proposed Method
6 Results and Discussion
7 Conclusion
8 Limitations
9 Future Scope
References
Comparative Analysis of Image Segmentation Methods for Mushroom Diseases Detection
1 Introduction
2 Literature Survey
3 Methodology and Method
3.1 K-means Clustering-Based Mushroom Disease Segmentation
3.2 Region of Interest-Based Mushroom Disease Segmentation
3.3 Color Threshold-Based Mushroom Disease Segmentation [27]
4 Results and Discussions on Disease Segmentation
5 Conclusion
References
An Effective Automatic Facial Expression Recognition System Using Deep Neural Networks
1 Introduction
2 Literature Survey
3 FER Database
4 Deep Neural Networks for Face Expression Recognition System
5 Experimental Results
6 Conclusion
References
Author Index

Smart Innovation, Systems and Technologies 313

V. Sivakumar Reddy · V. Kamakshi Prasad · Jiacun Wang · K. T. V. Reddy, Editors

Soft Computing and Signal Processing: Proceedings of 5th ICSCSP 2022

Smart Innovation, Systems and Technologies Volume 313

Series Editors
Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-Sea, UK
Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK

The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles.

Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST), SCImago, DBLP. All books published in the series are submitted for consideration in Web of Science.

Editors
V. Sivakumar Reddy, Malla Reddy College of Engineering and Technology, Hyderabad, Telangana, India
V. Kamakshi Prasad, Department of CSE, Jawaharlal Nehru Technological University Hyderabad, Hyderabad, Telangana, India
Jiacun Wang, Department of Computer Science and Software Engineering, Monmouth University, West Long Branch, NJ, USA
K. T. V. Reddy, Faculty of Engineering and Technology, Datta Meghe Institute of Medical Sciences, Sawangi, Maharashtra, India

ISSN 2190-3018
ISSN 2190-3026 (electronic)
Smart Innovation, Systems and Technologies
ISBN 978-981-19-8668-0
ISBN 978-981-19-8669-7 (eBook)
https://doi.org/10.1007/978-981-19-8669-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Conference Committee

Chief Patron
Sri. CH. Malla Reddy, Hon’ble Minister, Government of Telangana, Founder Chairman, MRGI

Patrons
Sri. CH. Mahendar Reddy, Secretary, MRGI
Sri. CH. Bhadra Reddy, President, MRGI

Conference Chair
Dr. V. S. K. Reddy, Director

Convener
Dr. S. Srinivasa Rao, Principal

Publication Chair
Dr. Suresh Chandra Satapathy, Professor, KIIT, Bhubaneswar

Co-conveners
Dr. P. H. V. Sesha Talpa Sai, Dean R&D
Dr. T. Venugopal, Dean-Students Welfare

Organizing Chair
Prof. P. Sanjeeva Reddy, Dean, International Studies

Organizing Secretaries
Dr. G. Sharada, HOD, (IT&AIML)
Dr. K. Mallikarjuna Lingam, HOD, ECE
Dr. S. Shanthi, HOD, CSE
Dr. M. V. Kamal, HOD, CSE (DS, IOT&CS)

Coordinators
Ms. P. Anitha, Associate Professor, ECE
Dr. M. Jayapal, Professor, CSE
Ms. M. Gaytari, Associate Professor, CSE
Dr. A. Mummoorthy, Professor, IT

Organizing Committee
Dr. M. Ramakrishna Murty, Professor, Department of CSE, ANITS, Visakhapatnam
Dr. B. V. N. S. Nagesh Deevi, Professor, ECE
Dr. B. Jyothi, Professor, ECE
Dr. M. Sucharitha, Professor, ECE
Dr. V. Chandrasekhar, Professor, CSE
Dr. R. Ramasamy, Professor, CSE
Ms. Renju Panicker, Associate Professor, ECE
Mr. M. Vazralu, Associate Professor, IT
Sri. B. Rajeswar Reddy, Administrative Officer

Web Developer
Mr. K. Sudhakar Reddy, Assistant Professor, IT

International and National Advisory Committee

Advisory Committee
Dr. Heggere Ranganath, Chair of Computer Science, University of Alabama in Huntsville, USA
Dr. Someswar Kesh, Professor, Department of CISA, University of Central Missouri, USA
Mr. Alex Wong, Senior Technical Analyst, Diligent Inc., USA
Dr. Ch. Narayana Rao, Scientist, Denver, Colorado, USA
Dr. Sam Ramanujan, Professor, Department of CIS and IT, University of Central Missouri, USA
Dr. Richard H. Nader, Associate Vice President, Mississippi State University, USA
Dr. Muralidhar Rangaswamy, WPAFB, OH, USA
Mr. E. Sheldon D. Wallbrown, Western New England University, Springfield, USA
Prof. Peter Walsh, Head of the Department, Vancouver Film School, Canada
Dr. Murali Venkatesh, School of Information Studies, Syracuse University, USA
Dr. Asoke K. Nandi, Professor, Department of EEE, University of Liverpool, UK
Dr. Vinod Chandran, Professor, Queensland University of Technology, Australia
Dr. Amiya Bhaumik, Vice Chancellor, Lincoln University College, Malaysia
Dr. Divya Midhun Chakkaravarthy, Lincoln University College, Malaysia
Dr. Hushairi bin Zen, Professor, ECE, UNIMAS
Dr. Bhanu Bhaskara, Professor at Majmaah University, Saudi Arabia
Dr. Narayanan, Director, ISITI, CSE, UNIMAS
Dr. Koteswararao Kondepu, Research Fellow, Scuola Superiore Sant’Anna, Pisa, Italy
Shri. B. H. V. S. Narayana Murthy, Director, RCI, Hyderabad
Prof. P. K. Biswas, Head, Department of E and ECE, IIT Kharagpur
Dr. M. Ramasubba Reddy, Professor, IIT Madras
Prof. N. C. Shiva Prakash, Professor, IISC, Bangalore
Dr. B. Lakshmi, Professor, Department of ECE, NIT, Warangal
Dr. G. Ram Mohana Reddy, Professor and Head, IT Department, NITK Surathkal, Mangalore, India
Dr. Y. Madhavee Latha, Professor, Department of ECE, MRECW, Hyderabad

Preface

The International Conference on Soft Computing and Signal Processing (ICSCSP-2022) was successfully organized by Malla Reddy College of Engineering and Technology, a UGC Autonomous Institution, during June 24–25, 2022, at Hyderabad. The objective of this conference was to provide opportunities for researchers, academicians, and industry professionals to interact, exchange ideas and experience, and gain expertise in cutting-edge technologies pertaining to soft computing and signal processing. Research papers in these technology areas were received and subjected to a rigorous peer-review process with the help of program committee members and external reviewers. ICSCSP-2022 received a total of 325 papers. Each paper was reviewed by more than two reviewers, and 60 papers were finally accepted for publication in the Springer SIST series. Our sincere thanks to Dr. Aninda Bose, Senior Editor, Springer Publications, India, and Dr. Suresh Chandra Satapathy, Prof and Dean R&D, KIIT, for extending their support and cooperation. We would like to express our gratitude to all session chairs, viz. Dr. M. Ramakrishna Murthy, ANITS, Visakhapatnam, Dr. Jiacun Wang, Monmouth University, USA, Dr. Ramamurthy Garimella, Mahindra University, Hyderabad, Dr. Divya Midhun Chakravarthy, Lincoln University College, Malaysia, Dr. Naga Mallikarjuna Rao Dasari, Federation University, Australia, Dr. Hushairi Zen, UNIMAS, Malaysia, Dr. Samrat Lagnajeet Sabat, University of Hyderabad, Dr. Bharat Gupta, NIT, Patna, and Dr. Mukil Alagirisamy, Lincoln University College, Malaysia, for extending their support and cooperation. We are indebted to the program committee members and external reviewers who produced critical reviews in a short time. We would like to express our special gratitude to publication chair Dr. Suresh Chandra Satapathy, KIIT, Bhubaneswar, for his valuable support and encouragement till the successful conclusion of the conference.
We express our heartfelt thanks to our Chief Patron Sri. CH. Malla Reddy, Founder Chairman, MRGI; Patrons Sri. CH. Mahendar Reddy, Secretary, MRGI, and Sri. CH. Bhadra Reddy, President, MRGI; Convener Dr. S. Srinivasa Rao, Principal; Co-Conveners Dr. P. H. V. Sesha Talpa Sai, Dean R&D, and Dr. T. Venugopal, Dean-Students Welfare; and Organizing Chair Prof. P. Sanjeeva Reddy, Dean, International Studies. We would also like to thank Organizing Secretaries Dr. G. Sharada, HOD, (IT&AIML); Dr. K. Mallikarjuna Lingam, HOD, ECE; Dr. S. Shanthi, HOD, CSE; and Dr. M. V. Kamal, HOD, CSE (DS, IOT&CS), for their valuable contribution. Our thanks also to all Coordinators and the organizing committee as well as all the other committee members for their contribution to the successful conduct of the conference. Last, but certainly not least, our special thanks to all the authors without whom the conference would not have taken place. Their technical contributions have made our proceedings rich and praiseworthy.

V. Sivakumar Reddy, Hyderabad, Telangana, India
V. Kamakshi Prasad, Hyderabad, Telangana, India
Jiacun Wang, West Long Branch, NJ, USA
K. T. V. Reddy, Sawangi, Maharashtra, India

Contents

Microservice Architecture Observability Tool Analysis
  Jay Parmar, Sakshi Sanghavi, Vivek Prasad, and Pooja Shah

Decentralized Payment Architecture for E-Commerce and Utility Transactions with Government Verified Identities
  J. Shiny Duela, K. Raja, Prashanth Umapathy, Rahul Rangnani, and Ashish Patel

Detection of Phishing Website Using Intelligent Machine Learning Classifiers
  Mithilesh Kumar Pandey, Munindra Kumar Singh, Saurabh Pal, and B. B. Tiwari

Bidirectional Gated Recurrent Unit (BiGRU)-Based Bitcoin Price Prediction by News Sentiment Analysis
  Jyotsna Malla, Chinthala Lavanya, J. Jayashree, and J. Vijayashree

How AI Algorithms Are Being Used in Applications
  Aleem Mohammed and Mohammad Mohammad

A Framework for Identifying Theft Detection Using Multiple-Instance Learning
  S. R. Aishwarya, V. Gayathri, R. Janani, Kannan Pooja, and Mathi Senthilkumar

An Efficient Reversible Data Hiding Based on Prediction Error Expansion
  Manisha Duevedi, Sushila Madan, and Sunil Kumar Muttoo

Deriving Insights from COVID-19
  Avinash Golande, Shruti Warang, Sakshi Bidwai, Rishika Shinde, and Sakshi Sukale

Smoke Detection in Forest Using Deep Learning
  G. Sankara Narayanan and B. A. Sabarish

Pneumothorax Segmentation Using Feature Pyramid Network and MobileNet Encoder Through Radiography Images
  Ayush Singh, Gaurav Srivastava, and Nitesh Pradhan

Visual Learning with Dynamic Recall
  G. Revathy, Pokkuluri Kiran Sree, S. Sasikala Devi, R. Karunamoorthi, and S. Senthil Vadivu

Machine Learning for Drug Discovery Using Agglomerative Hierarchical Clustering
  B. S. S. Sowjanya Lakshmi and Ravi Kiran Varma P

Automated Photomontage Generation with Neural Style Transfer
  Mohit Soni, Gaurang Raval, Pooja Shah, and Sharada Valiveti

Machine Learning Model Development Using Computational Neurology
  Soumen Kanrar

Graph Theory-Based User Profile Extraction and Community Detection in LinkedIn—A Study
  S. Sneha Latha, D. Lathika, T. Srehari, P. Yaswanthram, and B. A. Sabarish

A Hybrid Approach on Lexical Indian Sign Language Recognition
  Athul Mathew Konoor and S. Padmavathi

Defining the Convergence of Artificial Intelligence and Cyber Security with a Case Study of Email Spam Detection
  Siddhant Thapliyal, Mohammad Wazid, and D. P. Singh

Determining the Attribute Priority Among the Maintenance and Support Phases in Software Development Phases Using Intuitionistic Fuzzy Analytical Hierarchy Process
  S. Muthuselvan, S. Rajaprakash, H. Lakshmi Priya, Albin Davis, Ameer Ali, and V. Ashik

Impact of COVID-19 on Energy Generation in India
  Athira Krishnan and P. N. Kumar

A Machine Learning-Based Approach for Enhancing Student Learning Experiences
  T. Bhaskar and R. R. Tribhuvan

Book Genre Classification System Using Machine Learning Approach: A Survey
  Abhisek Sethy, Ajit Kumar Rout, Chandra Gunda, Sai Kiran Routhu, Srinu Kallepalli, and Geetha Garbhapu

A New Method for Imbalanced Data Reduction Using Data Based Under Sampling
  B. Manjula and Shaheen Layaq

Resource-Based Prediction in Cloud Computing Using LSTM with Autoencoders
  Adithya Babu and R. R. Sathiya

Arduino UNO-Based COVID Smart Parking System
  Pashamoni Lavanya, Mokila Anusha, Tammali Sushanth Babu, and Sowjanya Ramisetty

Low-Cost Smart Plant Irrigation Control System Using Temperature and Distance Sensors
  Jyotsna Malla, J. Jayashree, and J. Vijayashree

A Novel Approach to Universal Egg Incubator Using Proteus Design Tool and Application of IoT
  J. Suneetha, M. Vazralu, L. Niranjan, T. Pushpa, and Husna Tabassum

Dynamic Resource Allocation Framework in Cloud Computing
  Gagandeep Kaur and Sonal Chawla

Region-Wise COVID-19 Vaccination Distribution Modelling in Tamil Nadu Using Machine Learning
  M. Pradeep Gowtham and N. Harini

Detection of Phishing Websites Using Machine Learning
  Rahul Kumar, Ravi Kumar, Raja Kumar Sahu, Rajkumar Patra, and Anupam Ghosh

Computational Learning Model for Prediction of Parkinson’s Disease Using Machine Learning
  Ch. Swathi and Ramesh Cheripelli

Eliminating Environmental Context for Fall Detection Based on Movement Traces
  J. Balamanikandan, Senthil Kumar Thangavel, and Maiga Chang

Performance Analysis of Regression Models in Stock Price Prediction
  Manas Ranjan Panda, Anil Kumar Mishra, Samarjeet Borah, and Aishwarya Kashyap

Review on the Image Encryption with Hyper-Chaotic Systems
  Arghya Pathak, Subhashish Pal, Jayashree Karmakar, Hrishikesh Mondal, and Mrinal Kanti Mandal

Object Detection Using Mask R-CNN on a Custom Dataset of Tumbling Satellite
  P. C. Anjali, Senthil Kumar Thangavel, and Ravi Kumar Lagisetty

Decentralized Blockchain-Based Infrastructure for Numerous IoT Setup
  C. Balarengadurai, C. R. Adithya, K. Paramesha, M. Natesh, and H. Ramakrishna

Sentiment Analysis Toward COVID-19 Vaccination Based on Twitter Posts
  Vaibhav E. Narawade and Aditi Dandekar

Smart Cities Implementation: Australian Bushfire Disaster Detection System Using IoT Innovation
  Mohammad Mohammad and Aleem Mohammed

An Advanced and Ideal Method for Tumor Detection and Classification from MRI Image Using Gamma Distribution and Support Vector Machine
  S. K. Aruna, Rakoth Kandan Sambandam, S. Thaiyalnayaki, and Divya Vetriveeran

Forecasting Stock Exchange Trends for Discrete and Non-discrete Inputs Using Machine Learning and Deep Learning Techniques
  Teja Dhondi, G. Ravi, and M. Vazralu

An Efficient Intrusion Detection Framework in Software-Defined Networking for Cyber Security Applications
  Meruva Sandhya Vani, Rajupudi Durga Devi, and Deena Babu Mandru

Inter-Antenna Interference Cancellation in MIMO GFDM Using SVD-MMSE Method
  Nivetha Vinayakamoorthi and Sudha Vaiyamalai

Categorization of Thyroid Cancer Sonography Images Using an Amalgamation of Deep Learning Techniques
  Naga Sujini Ganne and Sivadi Balakrishna

Enhancement of Sensing Performance in Cognitive Vehicular Networks
  K. Jyostna and B. N. Bhandari

Evaluating Ensemble Through Extracting the Best Information from Environment in Mobile Sensor Network
  F. Asma Begum and Vijay Prakash Singh

Design and Analysis of Monopulse Comparator for Tracking Applications
  Srilakshmi Aouthu and Narra Dhanalakshmi

Objective Parameter Analysis with H.265 Using Lagrangian Encoding Algorithm Implementation
  Kiran Babu Sangeetha and V. Sivakumar Reddy

Effective Strategies for Detection of Abnormal Activities in Surveillance Video Streams
  Anikhet Mulky, Payal Nagaonkar, Akhil Nair, Gaurav Pandey, and Swati Rane

Bending Conditions, Conductive Materials, and Fabrication of Wearable Textile Antennas: Review
  Rajesh Katragadda and P. A. Nageswara Rao

A 3D Designed Portable Programmable Device Using Gas Sensors for Air Quality Checking and Predicting the Concentration of Oxygen in Coal Mining Areas
  M. Aslamiya, T. S. Saleena, A. K. M. Bahalul Haque, and P. Muhamed Ilyas

Gain Enhanced Single-Stage Split Folded Cascode Operational Transconductance Amplifier
  M. N. Saranya, Sriadibhatla Sridevi, and Rajasekhar Nagulapalli

Secure IBS Scheme for Vehicular Ad Hoc Networks
  J. Jenefa, S. Sajini, and E. A. Mary Anita

Secure Localization Techniques in Wireless Sensor Networks Against Routing Attacks Using Machine Learning Models
  Gebrekiros Gebreyesus Gebremariam, J. Panda, and S. Indu

Pothole Detection Approach Based on Deep Learning Algorithms
  Y. Aneesh Chowdary, V. Sai Teja, V. Vamsi Krishna, N. Venkaiah Naidu, and R. Karthika

Design of 7:2 Approximate Compressor Using Reversible Majority Logic Gate
  Vidya Sagar Potharaju and V. Saminadan

Segmentation of Cattle Using Color-Based Skin Detection Approach
  Diwakar Agarwal

Sinusoidal Oscillator Using CCCCTA
  Shailendra Bisariya and Neelofer Afzal

Automation of Railway Gate Using Raspberry Pi
  K. Sabarish, Y. Prasanth, K. Chaitanya, N. Padmavathi, and K. Umamaheswari

Comparative Analysis of Image Segmentation Methods for Mushroom Diseases Detection
  Y. Rakesh Kumar and V. Chandrasekhar

An Effective Automatic Facial Expression Recognition System Using Deep Neural Networks
  G. S. Naveen Kumar, E. Venkateswara Reddy, G. Siva Naga Dhipti, and Baggam Swathi

Author Index

About the Editors

V. Sivakumar Reddy is Professor at the Department of Electronics and Communication Engineering, Malla Reddy College of Engineering and Technology, and has more than 25 years of teaching and research experience. He completed his B.E. in Electronics and Communication Engineering at S. V. University, his M.Tech. in Digital Systems at JNT University and his Ph.D. in Electronics and Communication Engineering at IIT Kharagpur. His areas of research interest include multi-media signal processing and communication protocols. He has published more than 150 papers in peer-reviewed journals and reputed conferences. He is a member of several academic bodies, such as IETE, IEEE, ISTE and CSI. He is also a reviewer for several IEEE journals. He was named “Best Teacher” in three consecutive academic years, with citation and cash award. He is the recipient of the “India Jewel Award” for outstanding contribution to research in the field of Engineering and Technology.

V. Kamakshi Prasad is Professor in the Department of Computer Science and Engineering at JNTU Hyderabad. He completed his Ph.D. in speech recognition at the Indian Institute of Technology Madras and his M.Tech. in Computer Science and Technology at Andhra University in 1992. He has more than 23 years of teaching and research experience. His areas of research and teaching interest include speech recognition and processing, image processing, pattern recognition, ad hoc networks and computer graphics. He has published several books, chapters and research papers in peer-reviewed journals and conference proceedings. He is also an editorial board member of the International Journal of Wireless Networks and Communications and a member of several academic committees.

Jiacun Wang received a Ph.D. in Computer Engineering from Nanjing University of Science and Technology (NJUST), China, in 1991. He is currently Professor at the Computer Science and Software Engineering Department at Monmouth University, West Long Branch, New Jersey. From 2001 to 2004, he was a member of scientific staff at Nortel Networks in Richardson, Texas. Prior to joining Nortel, he was a research associate at the School of Computer Science, Florida International University (FIU) at Miami and Associate Professor at NJUST. He has published numerous books and research papers and is Associate Editor of several international journals. He has also served as a program chair, a program co-chair, a special sessions chair and a program committee member for several international conferences. He is the secretary of the Organizing and Planning Committee of the IEEE SMC Society and has been a senior member of IEEE since 2000.

K. T. V. Reddy, an alumnus of IIT Bombay, is presently working as Dean of the Faculty of Engineering and Technology, Datta Meghe Institute of Medical Sciences (Deemed to be University), Sawangi (Meghe), Wardha, Maharashtra, India. He has been teaching for 30 years. He was the presiding officer for the JEE, GATE, NEET, and other all-India exams. He has served as an advisor, general chair, and convener at over 200 national and international conferences and workshops. He has given over 150 expert talks and has more than 150 publications. He is an editor for the Journal of Signal and Image Processing (Taylor and Francis) and a reviewer for IEEE journals and conferences. From 2017 to 2019, he served as president of the Institution of Electronics and Telecommunications Engineers (IETE) in New Delhi, India.

Microservice Architecture Observability Tool Analysis

Jay Parmar, Sakshi Sanghavi, Vivek Prasad, and Pooja Shah

J. Parmar (corresponding author) · S. Sanghavi · V. Prasad · P. Shah
Institute of Technology, Nirma University, Gujarat, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_1

1 Introduction

Recently, owing to improving Information Technology (IT) infrastructure and advancing technologies, Internet usage has surged worldwide. The number of global Internet users has grown exponentially in recent times. Increasing Internet usage creates demand for high-performance applications that can efficiently handle the resulting data. However, monolithic architecture is not capable of providing high scalability and availability. The solution to this problem is microservice architecture, which is seeing rapid uptake in the current market. The main reason is that it is more scalable and reliable than the monolithic architecture and can adopt new technologies more quickly [1]. Many large-cap companies, such as Amazon and Netflix, have moved their applications and systems to the cloud, as it allows these organizations to scale their computing resources according to their usage [1].

However, working with microservices is challenging, especially for beginners. One of the substantial problems is achieving observability in microservices. Observability data consists of logs, traces, and metrics. This data helps developers observe applications much more efficiently. Logs help in understanding the various events in the application. Tracing and metrics data help in understanding the incoming request path and in evaluating the performance of the application, respectively. In addition, debugging errors in this complex architecture is difficult because numerous microservices communicate with each other. If a single service fails, detecting the error is cumbersome. Logs help identify the error, but its root cause remains undetermined. So we need a tracing tool that lets us determine the application’s execution flow. Also, when we host our application on a cloud platform, we need to manage Pod load according to the traffic on the application; this can be accomplished efficiently with Prometheus and Grafana, which are elaborated in later sections.

2 Literature Survey

Omar Al-Debagy and Peter Martinek [2] compare monolithic and microservice architectures. They establish that monolithic architecture performs well under small loads, while microservice architecture works well under greater loads. Benjamin Mayer and Rainer Weinreich [3] describe the concept of monitoring and managing microservices, which includes a system overview, runtime information of microservices, service interaction, etc. Cinque et al. [4] introduce a novel monitoring framework and perform passive tracing and log analysis. Stefan Haselböck and Rainer Weinreich [5] present decision guidance models, which are essential for selecting a microservice monitoring tool. Marie-Magdelaine et al. [6] demonstrate an observability framework for cloud-native microservices. They also explain that the observability framework can effectively analyze the internal behavior of microservices. Augusto Ciuffoletti [7] aims to deploy an automatic monitoring infrastructure for microservice applications. Specific consideration is given to preserving the distinction between core and custom functionalities and to the on-demand creation of a cloud service. Fábio Pina et al. [8] propose a system that collects logs of all activity in a microservice architecture, together with metrics data, without changing the source code; it thereby aims to be a nonintrusive monitoring tool. Cinque et al. [9] present a vision of associating microservice logs with black-box tracing. Their proposed approach achieves higher performance than built-in logging tools.

3 Observability Goal

Observability in microservices is an extension of monitoring that provides an understanding of the system’s internal state [6]. It includes logging, tracing, metrics, and alerting, which help in understanding the system better. This data is also beneficial for debugging applications efficiently. The observability design pattern helps in understanding and scanning microservice applications efficiently. This pattern consists of log aggregation, application metrics, audit logging, distributed tracing, exception tracking, and a health check API.

Fig. 1 Observability in microservice

Figure 1 gives a complete overview of the observability goal we want to achieve in this project. With the help of instrumentation libraries, we send application information to various backends. The objective is to achieve efficient observability by correlating logs, metrics, and tracing data, so that we can observe our microservices effectively. Figure 1 also shows the open-source tools that we have used as backends.
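One way to picture the correlation of logs and traces described above is to stamp every log record with the identifier of the request being handled. The following is a minimal sketch using only the Python standard library; it is not the instrumentation used by the authors, and the names `current_trace_id`, `TraceIdFilter`, `JsonFormatter`, `build_logger`, and `handle_request` are hypothetical.

```python
import contextvars
import io
import json
import logging
import uuid

# Hypothetical context variable holding the ID of the request being handled.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the active trace ID onto every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "service": record.name,
            "trace_id": record.trace_id,
            "message": record.getMessage(),
        })


def build_logger(service_name, stream):
    """Create a logger that writes trace-stamped JSON lines to the stream."""
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


def handle_request(logger):
    """Simulate one request: set a trace ID, then log under it."""
    trace_id = uuid.uuid4().hex
    token = current_trace_id.set(trace_id)
    try:
        logger.info("order received")
        logger.info("payment authorised")
    finally:
        current_trace_id.reset(token)
    return trace_id


buffer = io.StringIO()
orders_logger = build_logger("orders", buffer)
request_trace_id = handle_request(orders_logger)
log_lines = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

Because every line carries the same `trace_id`, a logging backend can filter all log lines that belong to one trace shown in a tracing backend, which is the kind of correlation the observability goal aims for.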

4 Logging Backend

Logging has been an essential part of observability, as logs help developers debug applications easily. In the microservice world, organizations require more interactive logging tools that help them debug applications efficiently. In this paper, we explore some of the logging tools and evaluate their performance.

ELK Stack

ELK stands for Elasticsearch, Logstash, and Kibana. Real-time data is collected by Logstash, a processing pipeline, and stored in Elasticsearch. Kibana visualizes this log data efficiently. For seamless log visualization, a lightweight log shipper is needed, because using Logstash to both collect and process the logs adds overhead. Various exporters are available to ship logging data to ELK. One such shipper is FileBeat. First, the logs are pushed through FileBeat to ELK; after processing by Logstash, they are stored in Elasticsearch. The ELK stack also provides App Performance Monitoring (APM). APM can fetch and visualize trace information, as shown in Fig. 2. In addition, it provides metrics related to those traces, such as latency and throughput. Having logs, metrics, traces, and alerts in the same place makes the ELK stack a great observability tool. Some of the features of the ELK stack are: (1) it provides a centralized log aggregation mechanism. (2) Real-time data analysis and visualization are possible in ELK. (3) It supports various shippers for shipping log data to ELK. (4) Massive amounts of data can be processed in a short interval. However, some of the areas where ELK might be improved are: (1) configuring self-hosted ELK is complex. (2) ELK consumes substantial computing resources even for simple operations.

Fig. 2 ELK tracing dashboard

Loki

Loki is a log aggregation system similar to Elasticsearch, but it is easier to set up and work with. It is inspired by Prometheus for log aggregation and is highly cost-effective and easy to operate. The Loki stack has three components: Promtail, Loki, and Grafana. The Promtail agent is responsible for locating the target, adding labels to the incoming log streams, and pushing them to the Loki instance. After connecting Loki with Grafana, we can start visualizing those logs on the dashboard. Some of the features of Loki are: (1) it is very cost-effective. (2) It provides higher scalability. (3) It is easy to plug into popular tools like Kubernetes and Grafana. However, Loki has certain areas where it might be improved, which are as follows: (1) it is not easy to perform complex queries on Loki. (2) It does not provide a dashboard as rich as the one provided by ELK.
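To make the Promtail step above more concrete, the sketch below builds the kind of JSON body that a log shipper pushes to Loki: log lines grouped into a stream identified by a label set, each line paired with a nanosecond timestamp. It only constructs the payload and does not contact a server; the helper name `build_loki_payload` and the label values are illustrative assumptions, and in a real deployment Promtail performs this step.

```python
import json
import time


def build_loki_payload(labels, lines, now_ns=None):
    """Group log lines into one labelled stream in Loki's JSON push format."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Each value is [timestamp in nanoseconds as a string, log line].
    values = [[str(now_ns + offset), line] for offset, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


payload = build_loki_payload(
    {"job": "orders", "level": "error"},  # hypothetical label set
    ["db timeout", "retrying"],
    now_ns=1_700_000_000_000_000_000,
)
payload_json = json.dumps(payload)
```

Such a body would typically be sent with an HTTP POST to the Loki push endpoint (`/loki/api/v1/push`). Because Loki indexes only the labels and not the log text, keeping the label set small is part of what makes it cost-effective, and it is also why complex queries are harder than in ELK.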

5 Tracing Backend

Logging tells us that there is an error in an application; however, it does not show the flow of requests, which is needed to understand the application efficiently. That is where tracing comes into the picture. We experimented with Jaeger, SigNoz, and Zipkin to visualize tracing data.

Jaeger

Jaeger is one of the great open-source tracing tools, with numerous functionalities. Jaeger helps us understand the execution of requests in the system [10]. Figure 3 shows the information for an individual request flow. We can see very useful information such as transaction time, duration, number of services, total number of spans, request type, requested endpoints, incoming IP address, and other request-related information. Additionally, it provides similar information on each span it processes, so that the execution flow can be understood better. Some of the features of Jaeger: (1) the Jaeger backend has no single point of failure and scales with business needs. (2) It can process several billion spans per day. (3) The Jaeger backend, Web UI, and instrumentation libraries have been designed from the ground up to support the OpenTracing standard. Scope of improvement in Jaeger: (1) Jaeger could offer a unified UI for metrics and traces for better observability. (2) There are no alert options available in Jaeger as of now. (3) There is no role-based access control for better team management. (4) Filtering components are limited in Jaeger; for example, we cannot run aggregates on filtered traces.

Fig. 3 Jaeger single request view

SigNoz

SigNoz is an open-source tool for metric and tracing data visualization. Figure 4 shows trace visualization in SigNoz and illustrates how convenient it is to understand request execution using SigNoz. Furthermore, we can monitor application performance, as seen in Fig. 5. One of the great features of SigNoz is that it allows correlation between metrics and tracing data. The SigNoz architecture includes the OpenTelemetry Collector, ClickHouse, the Query Service, and the Frontend. The OpenTelemetry Collector receives data in multiple formats from various applications and forwards it to ClickHouse for storage. The Query Service fetches data from ClickHouse and processes it before passing it to the frontend. The last component is the frontend, which provides a unified UI for logging and metrics data, including service map and alert functionality. Some of the features of SigNoz: (1) it provides correlation between metrics and tracing data; (2) advanced filtering options are available for data filtering; (3) it also includes alerting and service mapping. Scope of improvement in SigNoz: (1) the tool is very young in the industry. (2) It does not support logs as of now.

Fig. 4 Distributed tracing using SigNoz
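The per-request information that tracing tools display (duration, number of services, number of spans, and the execution flow) follows from a simple data model: a trace is a set of spans, each pointing to its parent. The sketch below illustrates that model under simplifying assumptions; the `Span` fields and the sample trace are hypothetical and much smaller than what Jaeger or SigNoz actually store.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str
    start_ms: int
    duration_ms: int


def summarize_trace(spans):
    """Return root operation, total duration, service count, and span count."""
    roots = [s for s in spans if s.parent_id is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root span")
    return {
        "root_operation": roots[0].operation,
        "total_duration_ms": roots[0].duration_ms,
        "services": len({s.service for s in spans}),
        "spans": len(spans),
    }


def render_tree(spans, parent_id=None, depth=0):
    """Return indented lines showing the execution flow, ordered by start time."""
    lines = []
    children = sorted(
        (s for s in spans if s.parent_id == parent_id),
        key=lambda s: s.start_ms,
    )
    for span in children:
        indent = "  " * depth
        lines.append(f"{indent}{span.service}:{span.operation} ({span.duration_ms} ms)")
        lines.extend(render_tree(spans, span.span_id, depth + 1))
    return lines


# Hypothetical trace of one checkout request crossing three services.
trace = [
    Span("t1", "a", None, "gateway", "POST /checkout", 0, 120),
    Span("t1", "b", "a", "orders", "create_order", 5, 80),
    Span("t1", "c", "b", "payments", "charge_card", 20, 50),
    Span("t1", "d", "a", "gateway", "render_response", 90, 25),
]
summary = summarize_trace(trace)
tree = render_tree(trace)
```

Printing `"\n".join(tree)` gives an indented view of the request path, which is a text analogue of the single-request views in the tracing dashboards.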

Fig. 5 Monitoring application performance using SigNoz

6 Metrics Backend: Prometheus Using Grafana

Metrics play a crucial role in tracking application performance. They give us helpful insights such as CPU usage, service rates, and many more. Prometheus is a monitoring system that scrapes application metrics data, and Grafana is a great data visualization tool. Figure 6 shows application metrics visualized in the Grafana dashboard. We use Grafana along with Prometheus because Prometheus provides very limited visualization capability. Some of the features of Prometheus and Grafana: (1) Prometheus has an effective query language for fetching and analyzing metrics data. (2) Grafana provides excellent data visualization. (3) We can use Grafana for alerting on application failures on the fly. (4) Grafana supports data grouping for more useful data visualization. Scope of improvement in Prometheus and Grafana: (1) Prometheus lacks a good visualization UI. (2) Prometheus does not have a long-term storage option. (3) Visualization libraries are limited in Grafana.

Fig. 6 Application performance monitoring in Grafana
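Prometheus scrapes metrics by periodically fetching a plain-text page that the application exposes. As a rough illustration of what such a page contains, the sketch below renders a counter in the Prometheus text exposition format. The function `render_counter` and the sample values are hypothetical, and label-value escaping is omitted for brevity; a real service would normally use an official client library rather than formatting the text by hand.

```python
def render_counter(name, help_text, samples):
    """Render a counter in the Prometheus text exposition format.

    samples maps a tuple of (label, value) pairs to the counter's current value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_text = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts, keyed by HTTP method and status code.
requests_total = {
    (("method", "GET"), ("status", "200")): 1027,
    (("method", "POST"), ("status", "500")): 3,
}
exposition = render_counter(
    "http_requests_total",
    "Total HTTP requests handled.",
    requests_total,
)
```

Each labelled line becomes a separate time series once scraped, which is what the Prometheus query language aggregates over and what Grafana panels plot.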

7 Conclusion

In this paper, we have discussed microservice architecture and the challenges within it. We worked on an observability design pattern that addresses issues related to observability in microservices. We have investigated and implemented various open-source tools such as Prometheus, Grafana, SigNoz, Jaeger, Loki, and the ELK stack. Our study will help others understand the capabilities of these tools. We conclude that there will never be a scenario in which one fixed tool suffices for observability in a microservice architecture. New tools keep emerging and can replace existing ones based on how they perform in the market, so we should always look for up-to-date tools in a microservice architecture.

References

1. Chen, R., Li, S., Li, Z.: From monolith to microservices: a dataflow-driven approach. In: 2017 24th Asia-Pacific Software Engineering Conference (APSEC), pp. 466–475, IEEE (2017)
2. Al-Debagy, O., Martinek, P.: A comparative review of microservices and monolithic architectures. In: 2018 IEEE 18th International Symposium on Computational Intelligence and Informatics (CINTI), pp. 000149–000154, IEEE (2018)
3. Mayer, B., Weinreich, R.: A dashboard for microservice monitoring and management. In: 2017 IEEE International Conference on Software Architecture Workshops (ICSAW), pp. 66–69, IEEE (2017)
4. Cinque, M., Della Corte, R., Pecchia, A.: Advancing monitoring in microservices systems. In: 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 122–123, IEEE (2019)
5. Haselböck, S., Weinreich, R.: Decision guidance models for microservice monitoring. In: 2017 IEEE International Conference on Software Architecture Workshops (ICSAW), pp. 54–61, IEEE (2017)
6. Marie-Magdelaine, N., Ahmed, T., Astruc-Amato, G.: Demonstration of an observability framework for cloud native microservices. In: 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pp. 722–724, IEEE (2019)
7. Ciuffoletti, A.: Automated deployment of a microservice-based monitoring infrastructure. Proc. Comput. Sci. 68, 163–172 (2015)
8. Pina, F., Correia, J., Filipe, R., Araujo, F., Cardoso, J.: Nonintrusive monitoring of microservice-based systems. In: 2018 IEEE 17th International Symposium on Network Computing and Applications (NCA), pp. 1–8, IEEE (2018)
9. Cinque, M., Della Corte, R., Pecchia, A.: Microservices monitoring with event logs and black box execution tracing. IEEE Trans. Serv. Comput. (2019)
10. Open source, end-to-end distributed tracing (2022)

Decentralized Payment Architecture for E-Commerce and Utility Transactions with Government Verified Identities

J. Shiny Duela, K. Raja, Prashanth Umapathy, Rahul Rangnani, and Ashish Patel

J. Shiny Duela · K. Raja · P. Umapathy (corresponding author) · R. Rangnani · A. Patel
Department of Computer Science Engineering, SRM Institute of Science and Technology Chennai, Chennai, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_2

1 Introduction

Blockchain is a highly adaptable technology that enterprises are adopting for their development. When purchasing things through e-commerce, most customers use their credit or debit cards. Payment gateways are used in these situations to maintain the integrity, authentication, and non-repudiation of card payments. Intermediary businesses such as payment gateway providers take part in the payment process and charge a transaction fee. Previously, such fees were not a major concern because credit card payments were typically greater in value and fewer in number. However, the growing availability of utility bill terminals, changes in tax deduction rates, increased value-added services offered by credit card companies, and an increase in the number of stores processing higher volumes of small payments are causing problems for the e-commerce market, prompting consumers and small businesses to raise concerns about the issue [1]. Blockchain-based technologies are projected to impact a wide range of business applications and procedures, with major implications for e-commerce. Given the potential of blockchain and related technology to construct trustless systems with separate assets, a variety of business models and established processes have been developed [2]. Also, modern data protection schemes do not ensure data privacy and integrity during the completion of an online payment transaction. There is no way to authenticate the transaction with any government ID [3]. Blockchain technology can help solve this problem because it uses component technologies like hashing, asymmetric cryptography, and public-key certificates.

Decentralized authentication on a registry that records agreements and transactions between participating nodes is part of the blockchain cryptocurrency system. This reduces total development and operating costs, as well as merchant fees, by simplifying the system’s structure and eliminating the requirement for external feature modules [4]. A blockchain payment system that preserves transaction integrity and authenticity among participating nodes is being developed [5]. Our primary goal is therefore to eliminate the third party in order to avoid paying large transaction fees for Internet shopping and energy bill payments [6]. Our secondary goal is to address the issues of data isolation, transaction delays, and the risk of data leakage [7]. We restrict access to our blockchain system to people with a confirmed government ID, so that the security of transactions is increased [8]. As the number of credit card terminals on the market grows, tax deduction rates change, credit card companies offer more value-added services, and the number of stores processing larger amounts of small payments grows [9], the e-commerce market faces growing challenges, prompting consumers and small businesses to raise concerns about the fees associated with electronic payment systems. Because it uses component technologies like hashing, asymmetric cryptography, and public-key certificates, blockchain technology may be used to solve this problem [10].
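The role of hashing in preserving transaction integrity can be illustrated with a hash-linked ledger: every block stores the hash of its predecessor, so changing an earlier transaction invalidates the chain. This is a minimal sketch under stated assumptions, not the architecture proposed in this paper; the function names and sample transactions are hypothetical, and signatures, government ID verification, and consensus are deliberately left out.

```python
import hashlib
import json


def block_hash(index, prev_hash, transaction):
    """Hash the block contents with SHA-256 over a canonical JSON encoding."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transaction": transaction},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transaction):
    """Append a block that links to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "transaction": transaction,
        "hash": block_hash(index, prev_hash, transaction),
    })


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = "0" * 64
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["transaction"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical payments recorded on the ledger.
ledger = []
append_block(ledger, {"payer": "alice", "payee": "shop", "amount": 250})
append_block(ledger, {"payer": "bob", "payee": "utility", "amount": 90})
valid_before = is_valid(ledger)

# Tampering with a recorded amount breaks the stored hash of that block.
ledger[0]["transaction"]["amount"] = 1
valid_after = is_valid(ledger)
```

In a deployed system each transaction would additionally be signed with the payer's private key and checked against a public-key certificate, which is where the asymmetric cryptography mentioned above comes in.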

2 Related Work Bitcoin, invented in 2008, is an innovative open-source P2P payment network that allows users to send or receive money online. Its proof-of-work consensus algorithm requires high-specification systems for mining, which rewards users with coins. Every transaction is maintained in a log in encrypted form. At present, most Web sites use third-party payment gateways for payments, and these charge a high service fee that is difficult for small businesses to cover. E-commerce payment model using blockchain [11] proposes a simple payment model that uses basic cryptocurrency features, such as the public key and private key, to eliminate the need for transaction intermediaries. Use of these keys ensures authentication and integrity of the transactions. The authors showcased a model based on a QR code that is displayed to the customer upon purchasing an item; the blockchain system confirms the debit and credit from customer to merchant and delivers a success report to both, confirming the order. Transactions made in this way are encrypted through the private and public keys. Blockchain, a shared log technology, can be applied to a wide range of applications. To reinforce the privacy protection of decentralized payment systems, many solutions have been proposed, but such a


system could be criminally exploited. In response to this, DCAP: A Secure and Efficient Decentralized Conditional Anonymous Payment System Based on Blockchain [12] proposed a new definition of Decentralized Conditional Anonymous Payment (DCAP) and the security requirements that go with it. To build a strong DCAP system, the authors designed a conditional anonymous payment scheme with four entities: certificate authority, user, manager, and blockchain network. Trusted nodes are necessary in the DCAP system to regulate user authority and trace the true identity of malicious users. As a result, they recommended a permissioned blockchain for DCAP. It contributes to enhanced anonymity and allows for hundreds of thousands of transactions per second, meeting the requirements of a regulated cryptocurrency system. Mobile payments have become more popular because they are convenient and fast. A Decentralized Data Privacy for Mobile Payment using Blockchain Technology [13] proposed a secure transaction pattern with blockchain technology to overcome the security issues of centralized payment-application data, which may suffer internal attacks. It bypasses the permission otherwise required from multiple service providers, so that payments can be made directly to any user. To cope with fraudulent or unrecognized transactions in the network, BPS: Blockchain-Based Decentralized Secure and Versatile Light Payment System [14] proposed a system that records any doubtful or unauthorized transaction and, after cancelling the transaction, sends it for further investigation; instead of being blocked, the account can be put on hold. The authors described the limitation of having a large number of users in a blockchain and noted that the structure of the blockchain is more difficult to understand than that of other payment systems.
With the growing global economy, global trade is scaling up rapidly and thus requires cross-border payments for quick and hassle-free transfers. Blockchain-based Cross-border E-business Payment Model [15] proposed a framework that discusses the hierarchy and functional modules of blockchain technology for making cross-border payments. The paper proposed a three-level chain structure and data classification management, building partnership chains with dealers, customers, banks, etc., across cross-border e-business regulators. Patterns for Blockchain-Based Payment Application [16] identified the life cycle of a token and presented 12 patterns covering the important areas that enable state transitions for blockchain-based token payment applications. The life cycle and annotated patterns give a systematic payment-oriented study and a guide to system interactions for effective use of the patterns.


3 Transaction Process Fundamentals

A. Blockchain A blockchain is a system for recording data in a way that makes it difficult or impossible to evade, hack, or deceive the system. It can also be regarded as a distinct type of database. The amount of data that can be stored on a blockchain is limited, though. So far, its most widely used application has been keeping track of transactions. Every transaction is saved on a public list, known as a blockchain, that can be accessed by anybody.

B. Bitcoin Bitcoin is a digital currency that may be transferred from one client to another over the peer-to-peer bitcoin network without the need for an intermediary. It has no single board of trustees or central bank. Bitcoins are created through the mining process. Each bitcoin is a virtual record kept in a computer's virtual wallet.

C. Cryptocurrency The term cryptocurrency refers to the cryptographic processes used to ensure the security of a transaction. Cryptocurrency is an electronic payment system that may be used to purchase and sell products. Several companies have developed their own monetary units, called tokens, that can be exchanged for products or services.

4 Proposed Architecture The entire payment process consists of several important parts and is cyclic. The customer visits the merchant's business and makes a payment through a third-party vendor, which redirects to the merchant's acquiring bank. Further steps follow, such as the credit card processor and the interchange of card data with the issuing bank, before the process returns to the customer with a report on the successful or unsuccessful transaction. The process is cyclic and has no specific starting or finishing point [17]. A distinct certificate, government-verified IDs, or supplementary functionality must be built as a component to enable a convenient utility bill payment system without a payment gateway. Blockchain technology, on the other hand, provides public key, private key, and digital signature features out of the box, allowing for the creation of a simple utility bill payment system without the need for a payment gateway. The present study proposes that blockchain technology can be used to create a model that does not require additional feature components. Although the model has a simple structure and does not require external feature components, it ensures transaction integrity and authentication between the merchant and the customer, as well as between the client and the blockchain system, cutting


Fig. 1 Architecture diagram of transactions in client server interactions

total system development and running expenses, as well as customer and merchant fees. In Fig. 1, we can see how the payment gateway uses all of its components together to work securely and efficiently. Payment transactions are governed by government IDs: a transaction needs a valid government ID in order to execute and is declined if no valid ID proof is present. After passing these checks, the user can access various services such as utility bill payments, mobile recharges, gas cylinder bookings, insurance payments, and many more online services. We present our payment system's software architecture in this part. Blockchain is a technology that allows the parties involved in a communication to carry out transactions without the involvement of a third party [18]. These transactions are verified and validated by miners, which are a specific type of node. A valid transaction is stored in a data structure known as a block. The execution of the current transaction depends on previously committed transactions, which helps to prevent double spending in the bitcoin system. In the chain of blocks, the hash of the previous block is used to link each block to its predecessor. A block consists of two parts: a block header and a list of transactions. Of all the candidate blocks in the network, the block with the highest consensus is accepted and added to the chain. The other blocks are referred to as orphan blocks and are eventually discarded by the network. Some transactions in orphan blocks have already been included in the legitimate block that was just added, but others may not yet have been considered; further mining operations must account for such transactions. For users, blockchain provides absolute transparency, which is both good and bad. On the one hand, by smoothing money flows, it enhances payment systems.
Users who do not want to disclose all of their payment information to everyone, on the other hand, may be concerned. With peer-to-peer transactions, users can send money directly from their accounts to another person. A decentralized payment system based on blockchain is therefore possible, and as a result the security issue can be easily resolved. Payments can be made anywhere in the world since blockchain has no territorial limits. The blockchain allows for real-time transactions; as a result, the rate of payment will increase dramatically [19]. To make peer-to-peer lending systems more efficient, our suggested architecture would integrate blockchain payment solutions to conduct auto-payments using smart contracts. This helps to eliminate intermediaries from the loan system, allowing the lender and the borrower to conduct business directly [20].
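The block structure described in this section, a header plus a list of transactions with each header carrying the hash of the previous block, can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the implementation used in this work; the field names `prev_hash` and `tx_digest` and the helper functions are hypothetical.

```python
import hashlib
import json


def sha256_json(obj) -> str:
    # Deterministic hash of any JSON-serializable object.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def make_block(prev_hash: str, transactions: list) -> dict:
    # A block has two parts: a header and the list of transactions.
    header = {"prev_hash": prev_hash, "tx_digest": sha256_json(transactions)}
    return {"header": header, "transactions": transactions}


def chain_is_valid(chain: list) -> bool:
    for i, block in enumerate(chain):
        # The header must commit to the transactions stored in the block.
        if block["header"]["tx_digest"] != sha256_json(block["transactions"]):
            return False
        # Every block after the first must point at its predecessor's header.
        if i > 0 and block["header"]["prev_hash"] != sha256_json(chain[i - 1]["header"]):
            return False
    return True


genesis = make_block("0" * 64, [])
b1 = make_block(sha256_json(genesis["header"]),
                [{"from": "customer", "to": "merchant", "amount": 5}])
b2 = make_block(sha256_json(b1["header"]),
                [{"from": "customer", "to": "utility", "amount": 2}])
chain = [genesis, b1, b2]
print(chain_is_valid(chain))
```

Because each header depends on the one before it, changing a committed transaction invalidates that block and every later block, which is what makes earlier transactions hard to rewrite.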

5 Proposed Software Architecture In this part, we present the software architecture of our payment system. Blockchain is a technology that allows the parties involved in a communication to carry out transactions without the involvement of a third party. These transactions are verified and validated by miners, which are a specific type of node. A valid transaction is stored in a data structure known as a block. The execution of the current transaction depends on previously committed transactions, which helps to prevent double spending in the bitcoin system. In Fig. 2, we can see the Web3 setup for our blockchain system; it illustrates the block structure as well as the chain of blocks. The hash of the preceding block is used to link the chain of blocks. A block has two parts: the block header and the list of transactions. Of the candidate blocks in the network, the block with the greatest consensus is accepted for inclusion. The other blocks are referred to as orphan blocks, and the network eventually rejects them. Some transactions in orphan blocks have already been included in the legitimate block that was recently added, but others have yet to be evaluated; future mining rounds handle such transactions with the same procedure. For users, blockchain provides absolute transparency, which is both good and bad. On the one hand, by smoothing money flows, it enhances payment systems. Users who do not want to disclose all of their payment information to everyone, on the other hand, may be concerned. With peer-to-peer transactions, users can send money directly from their accounts to another individual. A decentralized payment system based on blockchain is therefore possible, and as a result the security issue may be easily resolved. Payments may be made anywhere in the world since blockchain has no territorial limits. In Fig. 3, we can see the transactions taking place in real time when an order is executed. With the Web3 technology shown in Fig. 2, transactions and hashing between blocks in the blockchain perform better than with older alternatives. As a result, the rate of payment will rise drastically. Our proposed architecture will integrate blockchain payment solutions to process auto-payments using smart contracts to make lending more efficient in peer-to-peer


Fig. 2 Web3 implementation of blockchain-based payment gateway

Fig. 3 Blocks mining in the blockchain server through hashing

lending platforms. This helps remove intermediaries from the lending system and enables direct transactions between the lender and the borrower. In Fig. 4, our payment model is shown along with the pipeline of how all the modules work together. Our implementation of the payment gateway is divided into four key modules. After our server is set up and running on the peer network, our model runs: the miners start hashing the blocks, and transactions are ready to start. The flow of this architecture is as follows:

• B2C connection
• Transaction
• Hashing
• Blockchain update


Fig. 4 Architecture diagram of proposed payment gateway

In the first module, the wallets of the business and the client are connected for the transaction in the blockchain. The client's wallet is then given access to the hash in the e-commerce chain in order to request payment for the product. For security, the business sends the client a token and a hash value that requests the transaction fee in hashed format. During the transaction, the client holds the token and the hash for the transaction chain, which enables the client to start the payment. The client's cryptocurrency is sent through the B2C blockchain, visible to all but hashed uniquely for the business and the client only. After the client has sent the amount to the chain, our hashing algorithms hash the transaction uniquely for the buyer and the seller. This triggers our hash miners, which keep looking out for alterations in the blockchain. The hashing process ensures that our transaction stays public to everyone while in hashed format. This is important for tracing back any fraudulent transaction that anyone makes in the future. It also maximizes the safety of every block, as the chain is updated after each transaction. Finally, after the blocks have been hashed and updated, the transaction is processed, and the cryptocurrency is delivered successfully. The new transaction is then added to the chain, refreshing the hash values. This ensures that our transaction has been recorded truthfully and can be verified by anyone from the business or the client. In Fig. 5, we can see the transactions


Fig. 5 Transactions occurring in the server-side blockchain network

verified and the smart contracts in the blockchain server being modified to add a new verified transaction into the chain.
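The four-module flow described in this section (B2C connection, transaction, hashing, blockchain update) can be sketched as a toy in-memory ledger. This is only an illustration under stated assumptions and not the Web3 implementation used in this work: the `ToyLedger` class, the `VERIFIED_IDS` registry, and the wallet identifiers are hypothetical, and wallets are identified directly by a government ID string for simplicity.

```python
import hashlib
import json

# Hypothetical registry of government-verified IDs (an assumption for this sketch).
VERIFIED_IDS = {"GOV-1001", "GOV-2002"}


class ToyLedger:
    def __init__(self, balances: dict):
        self.balances = dict(balances)  # wallet ID -> balance
        self.chain = []                 # recorded transactions with their hashes

    def pay(self, payer: str, payee: str, amount: int):
        # Module 1, B2C connection: both wallets must belong to verified IDs.
        if payer not in VERIFIED_IDS or payee not in VERIFIED_IDS:
            return None
        # Module 2, transaction: reject non-positive amounts and overspending.
        if amount <= 0 or self.balances.get(payer, 0) < amount:
            return None
        # Module 3, hashing: hash the transaction together with the previous hash.
        prev = self.chain[-1]["hash"] if self.chain else "0" * 64
        tx = {"from": payer, "to": payee, "amount": amount, "prev": prev}
        tx_hash = hashlib.sha256(json.dumps(tx, sort_keys=True).encode()).hexdigest()
        # Module 4, blockchain update: apply the transfer and append the record.
        self.balances[payer] -= amount
        self.balances[payee] = self.balances.get(payee, 0) + amount
        self.chain.append({"tx": tx, "hash": tx_hash})
        return tx_hash


# Emulated wallet funded with 100 units, echoing the test wallet used in this work.
ledger = ToyLedger({"GOV-1001": 100})
print(ledger.pay("GOV-1001", "GOV-2002", 30))
print(ledger.balances)
```

Declining the payment before any hashing takes place keeps unverified or overdrawn requests off the chain entirely, which mirrors the ID gate described for the architecture.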

6 Future Work Even though the market is settling around Bitcoin and Ethereum, there are still several other cryptocurrencies to deal with, and none of them is interoperable with the others. Cross-border payment requires a set of standards to be followed, as it cannot be carried out with non-registered private currencies. Therefore, more liquid, registered currencies that use the same technology need to adopt the SWIFT standards to ease cross-border payments. Our proposed model eliminates the third-party payment gateways, which used to charge high fees for different payment methods. The ease of payment through UPI and QR-based payment is attracting users; the same approach can be adopted for cryptocurrencies. Payments can be made directly from one wallet to another with the same technology. The advantage would be that the chance of payment failure would become negligible, and payments could be made within seconds. This will also help in providing compact addresses instead of the long addresses currently used for cryptocurrencies. There would be a one-stop application for managing, transferring, and receiving payments. With a better UI and customer support, it will attract customers, and cryptocurrency payments will become popular. Future advances could range from giving non-liquid private currencies standards to operate on, and popularizing them, to making blockchain compatible with other systems.


Table 1 List of previous approaches in establishing a secure payment gateway and their shortcomings

Research paper | Novelty | Problems and issues
Blockchain-based cross-border E-business payment model [15] | Implemented payment system with custom hashes per client | Failed transactions at large scale, limited by old network protocols
BPS: Blockchain-based decentralized secure and versatile light payment system [14] | Implemented large-scale blockchain system for bulk transactions | Had a delay between verifications due to slow hashing process
E-commerce payment model using blockchain [11] | Increased transaction rates between parties | Outdated hashing algorithms that were not secure
A decentralized data privacy using blockchain technology [13] | Used better hashing algorithms | Transactions were not isolated
Our proposed architecture | Government verified identities with latest technology implementations | Overcomes all previous shortcomings with latest Web3 architecture implementation

7 Results When processing a payment in e-commerce settings, customers are made to pay an additional charge on top of the amount. Improving on various previous approaches, the proposed architecture offers a simple payment model that does not require a transaction intermediary or middleman, such as a public-key certificate or payment gateway, and we constructed a design to demonstrate its use. An experiment was carried out to see whether internal blockchain elements such as the public key, private key, and digital signature could be used to build a working electronic payment system without the need for extra modules. Table 1 lists the previous approaches and their shortcomings. Referring to Fig. 6, we can see the efficiency in hashing new blocks. This not only saved the extra charges that customers had to pay but also enabled faster payments during rush hours, without the slowdown in payment transfers that usually occurs in payment gateways due to the large number of users. Payment settlements were fast due to bank-to-bank linkage.

8 Conclusion The prime novelty was the introduction of government IDs linked to the accounts that are used to make utility bill payments, recharges, ticket bookings, etc. In Fig. 5, we can see a transaction completed in a utility service and the Ethereum being deducted from our emulated wallet, which held 100 ETH for testing purposes. The account wallet is attached to the individual's government-verified identification number. This


Fig. 6 Comparison between time taken in hashing transaction blocks

significantly reduces fraudulent exchanges in digital transactions and encourages users to adopt digital payments. In conclusion, a system was built in which the customer can make payments directly from his/her bank account to the merchant's bank account without any third party overseeing the payments.

References
1. Ismanto, L., Ar, H.S., Fajar, A.N., Bachtiar, S.: Blockchain as E-commerce platform in Indonesia. Information Systems Management Department, Universitas Bina Nusantara, Indonesia (2019)
2. Treiblmaier, H., Sillaber, C.: The impact of blockchain on e-commerce: a framework for salient research topics. Electron. Commer. Res. Appl. 48 (2021)
3. Li, X., Mei, Y., Gong, J., Xiang, F., Sun, Z.: A blockchain privacy protection scheme based on ring signature. IEEE Access 1–1 (2020). https://doi.org/10.1109/ACCESS.2020.2987831
4. Chen, Y., Bellavitis, C.: Blockchain disruption and decentralized finance: the rise of decentralized business models. J. Bus. Ventur. Insights (2020)
5. Liu, C., Xiao, Y., Javangula, V., Hu, Q., Wang, S., Cheng, X.: NormaChain: a blockchain-based normalized autonomous transaction settlement system for IoT-based E-commerce. IEEE Internet Things J. 6(3), 4680–4693 (2019). https://doi.org/10.1109/JIOT.2018.2877634
6. Nandan, G., Chakravorty, C.: Comparative study on cryptocurrency transaction and banking transaction. Global Trans. Proc. 2(2). ISSN 2666-285X (2021). https://doi.org/10.1016/j.gltp.2021.08.064
7. Paik, H.Y., Xu, X., Bandara, H.D., Lee, S.U., Lo, S.K.: Analysis of data management in blockchain-based systems: from architecture to governance. IEEE Access 1–1 (2019). https://doi.org/10.1109/ACCESS.2019.2961404
8. Kuperberg, M., Kemper, S., Durak, C.: Blockchain usage for government-issued electronic IDs: a survey. Blockchain and Distributed Ledgers Group, DB Systel GmbH, Frankfurt am Main, Germany
9. Arora, A., Sharma, M., Bhaskaran, S.: Blockchain technology transforms E-commerce for enterprises. In: Batra, U., Roy, N., Panda, B. (eds.) Data Science and Analytics. REDSET 2019. Communications in Computer and Information Science, vol. 1230. Springer, Singapore. https://doi.org/10.1007/978-981-15-5830-6_3
10. Lim, Y.H., Hashim, H., Poo, N., Poo, D.C.C., Nguyen, H.D.: Blockchain technologies in E-commerce: social shopping and loyalty program applications. Int. Conf. Hum. Comput. Interact. (2019)
11. Kim, S.I., Kim, S.H.: E-commerce payment model using blockchain. J. Ambient Intell. Hum. Comput. (2020)
12. Lin, C., He, D., Huang, X., Khan, M.K., Choo, K.K.R.: DCAP: a secure and efficient decentralized conditional anonymous payment system based on blockchain. IEEE Trans. Inf. Forensics Secur. 15 (2020)
13. Pillai, B.G., Madhurya, J.A.: A decentralized data privacy for mobile payment using blockchain technology. Int. J. Recent Technol. Eng. (IJRTE) 8(6), ISSN: 2277-3878 (2020)
14. Ahamed, S., Siddika, M., Islam, S., Anika, S., Anjum, A., Biswas, M.: BPS: Blockchain based decentralized secure and versatile light payment system. Asian J. Comput. Sci. Inf. Technol. (2021)
15. Li, X.H.: Blockchain-based Cross-border E-business payment model. In: 2nd International Conference on E-Commerce and Internet Technology (ECIT) (2021)
16. Lu, Q., Xu, X., Bandara, H.D., Chen, S., Zhu, L.: Patterns for Blockchain-based payment applications. arXiv:2102.09810v3, 18 Aug 2021
17. Zhao, H.: A cross-border E-commerce approach based on blockchain technology. Mob. Inf. Syst. 2021, Article ID 2006082 (2021)
18. Jiang, J., Chen, J.: Framework of blockchain-supported e-commerce platform for small and medium enterprises (2021)
19. Luo, X., Wang, Z., Cai, W., Li, X., Leung, V.C.M.: Application and evaluation of payment channel in hybrid decentralized ethereum token exchange. Blockchain Res. Appl. (1) (2020)
20. Tarakeswara Rao, B., Patibandla, R.S.M., Murty, M.R.: A comparative study on effective approaches for unsupervised statistical machine translation. In: International Conference and Published the Proceedings in AISC Springer Conference, vol. 1076, pp. 95–905 (2020)

Detection of Phishing Website Using Intelligent Machine Learning Classifiers Mithilesh Kumar Pandey, Munindra Kumar Singh, Saurabh Pal, and B. B. Tiwari

1 Introduction Due to the rapid growth of global networking and communication technologies, many of our everyday activities, such as social networking, electronic banking, and e-commerce, have moved online. The open, anonymous, and uncontrolled infrastructure of the Internet provides a platform for cyberattacks on an unprecedented scale, which creates serious security vulnerabilities for ordinary computer users. Although the caution and experience of the user are important, it is not possible to completely prevent users from falling for phishing scams [1]. This is because, to increase the success of phishing attacks, attackers also take into account the personality characteristics of the end user, especially in order to deceive relatively experienced users. Cyberattacks targeted at end users cause massive losses of sensitive or personal information, and even of money, for individuals; the total can reach billions of dollars a year. The term phishing is derived by analogy from "fishing" for victims, and this type of attack has attracted a great deal of attention from researchers in recent years. It is also an attractive technique for attackers, who set up fraudulent sites that imitate popular, legitimate Web sites on the Internet [2]. Furthermore, because of the increased use of smartphones, end users are not very careful when checking their social networks on the move. Attackers therefore target mobile device users to increase the efficiency of their attacks. In the literature, several studies focus on detecting phishing attacks. In recent surveys, the authors discuss the general characteristics of existing phishing techniques by categorizing the technical approaches used in these attacks, and highlight some practical and effective countermeasures [3]. Phishing attacks exploit the vulnerabilities of human users; therefore, additional support systems are needed to protect systems and users. The protection mechanisms fall into two main groups: increasing the awareness of users and using additional programs, as depicted in Fig. 1. A recent survey on phishing [4] emphasizes that whenever new solutions are proposed to overcome phishing attacks, attackers exploit the weaknesses of those solutions and introduce new types of attacks.

Fig. 1 Phishing detection approach

M. K. Pandey · M. K. Singh · S. Pal (B) Department of Computer Applications, VBS Purvanchal University, Jaunpur, Uttar Pradesh 222001, India e-mail: [email protected] B. B. Tiwari Department of Electronics and Communication, VBS Purvanchal University, Jaunpur, Uttar Pradesh 222001, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_3

2 Related Works This section discusses earlier work on phishing detection approaches that use machine learning algorithms (see Table 1).

3 Tools and Techniques This section gives the details of the six proposed classification algorithms and describes the performance metrics used in this article.


Table 1 Related work done so far

Author's name | Topic/Methodology/Finding
Cao et al. [5] | Proposes a system that builds a whitelist by recording the IP address of each site with a login user interface that the user has visited. When the user visits a site, the system warns the user if the recorded information for that site does not match. However, this method treats legitimate sites that the user visits for the first time as suspicious
Shyni et al. [6] | Proposes a method that combines natural language processing, machine learning, and image processing. A total of 61 features are used. They achieved a classification accuracy above 96% using a multi-classifier
Prakash et al. [7] | Using a mixture of blacklists and heuristics, they achieved FP and FN rates of 5% and 3%, respectively
Chandrasekaran et al. [8] | Proposed a technique to classify phishing based on structural properties of phishing emails. The authors used information gain (IG) to rank these features by their significance. They applied a one-class SVM to classify phishing emails based on the selected features. Their results claim a detection rate of 95% of phishing emails with a low false-positive rate
Jain and Gupta [9] | Proposed a technique that warns users on the Internet by means of a whitelist of legitimate sites that is updated automatically. The method has two phases: a domain and IP address matching module, and extraction of link features from the source code. According to the experimental results, a true positive rate of 86.02% and a false negative rate of 1.48% were obtained in this work
Fette et al. [10] | The authors tested 860 phishing emails and 6950 legitimate emails. The proposed method correctly identifies 96% of phishing emails with a false-positive rate of 0.1%. As the authors themselves point out, the implementation does not perform very well, which encourages further work in this field
Buber et al. [11] | The authors proposed a phishing detection system with 209 word-vector features and 17 NLP-based features. In a subsequent study, they focused on this issue and reported better results, with an accuracy of 7%
Babagoli et al. [12] | Used a nonlinear regression technique to decide whether a site is phishing. They use harmony search and support vector machine meta-heuristic algorithms to train the system. According to them, using only about 11,000 pages, harmony search provides the best accuracy, 94.13% and 92.80% for the training and testing processes, respectively

3.1 Classification Algorithms Logistic regression is a statistical analysis method used to predict a categorical outcome based on previous observations in a dataset. This method allows an algorithm used in a machine learning application to classify incoming data based on historical data. As more relevant data becomes available, the algorithm is expected to get better at predicting classifications [13]. K-nearest neighbor (K-NN) is an algorithm based on the supervised learning technique. K-NN assumes similarity between a new case and the available cases and puts the new case into the category that is most similar to the available categories. K-NN stores all the available data and classifies a new data point based on similarity [14]. K-NN can be used for regression as well as for classification, but it is mostly used for classification problems [15]. Decision tree is a supervised learning technique that is best suited to solving classification problems. It is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome. A decision tree has two kinds of node: the decision node and the leaf node [16]. Random forest is a popular machine learning algorithm. It is commonly used for classification and regression problems in ML. It combines multiple classifiers to solve a complex problem and improve the performance of the model. As the name implies, random forest is a classifier that contains a number of decision trees built on various subsets of the given dataset and takes the average to improve the predictive accuracy on that dataset [17]. Support vector machines (SVMs) are used for classification as well as regression problems. The goal of the SVM algorithm is to find the best line or decision boundary that can divide the n-dimensional space into classes. This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane [18, 19].
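To make the similarity-based classification of K-NN concrete, the following minimal Python sketch stores all training points and labels a new point by majority vote among its k nearest neighbors under Euclidean distance. It is an illustration only: the function name `knn_predict`, the value k = 3, and the three-feature toy data are assumptions for this example and are not taken from the dataset used in this work. Labels follow the convention 1 for phishing and −1 for legitimate.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    # Pair every stored point's distance to the query with its label, nearest first.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Made-up feature vectors (assumption): 1 = phishing, -1 = legitimate.
train_X = [(1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, -1, -1), (-1, -1, 1), (-1, 1, -1)]
train_y = [1, 1, 1, -1, -1, -1]

print(knn_predict(train_X, train_y, (1, 1, 0)))
print(knn_predict(train_X, train_y, (-1, -1, 0)))
```

K-NN has no training phase beyond storing the data, so all of the work happens at prediction time, which is why it is often called a lazy learner.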

3.2 Performance Metrics In the experiment, each test set was evaluated using six different machine learning algorithms. First, a confusion matrix is created for each learning algorithm. From the values in the confusion matrix, four statistics, namely accuracy, precision, F1-score, and recall, are computed to quantify the quality and effectiveness of the algorithms. In addition, the ROC curve is drawn to validate the findings from accuracy. These metrics, whose formulation is given in Eqs. (1–4), are also important for comparing the tested machine learning approaches [20].

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{F-Measure} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

where TP, TN, FP, and FN denote the numbers of true-positive, true-negative, false-positive, and false-negative classifications, respectively, and recall is also known as sensitivity.

Fig. 2 Proposed methodology
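Equations (1)–(4) can be evaluated directly from the four confusion-matrix counts. The sketch below is illustrative only: the function name `classification_metrics` and the example counts are assumptions and do not come from the paper's experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fn + fp)                  # Eq. (1)
    precision = tp / (tp + fp)                                  # Eq. (2)
    recall = tp / (tp + fn)                                     # Eq. (3), also called sensitivity
    f_measure = 2 * precision * recall / (precision + recall)   # Eq. (4)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to exercise the formulas
m = classification_metrics(tp=90, tn=80, fp=10, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The sketch does not guard against zero denominators (for example, a classifier that never predicts the positive class); a production version would need to handle those cases.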

4 Experimental Methodologies To develop a high-quality URL phishing detection system, several machine learning classifiers are applied to a phishing dataset. When developing the classifiers, we first converted each site into a format suitable for our machine learning algorithms. Each site is represented by a vector holding the value of each extracted feature (binary or continuous). The computer used to run this experiment is a 32-bit system with a processor speed of 3.20 GHz and 4 GB of RAM. More details on the workflow are provided in Fig. 2. In this work, we use tenfold cross-validation: the data are divided into 10 parts, of which 9 are used for training and 1 for testing, and the procedure is repeated so that every part serves as both training and testing data. This strategy guarantees that the training data are distinct from the test data. The dataset used for training and testing our machine learning classifiers contains the URLs of 11,054 sites; each sample has 30 site parameters and a class label identifying it as a phishing site or not (1 or −1). The dataset contains 6157 phishing sites and 4897 legitimate sites.
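The tenfold split described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the helper name `k_fold_indices` is hypothetical, and the samples are split in their stored order, whereas in practice they would usually be shuffled first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 11,054 samples, as in the dataset described in the text
folds = list(k_fold_indices(11054, k=10))
for i, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {i}: {len(train_idx)} training samples, {len(test_idx)} test samples")
```

Because the training and test index lists of a fold never overlap, the property stated in the text (training data distinct from test data) holds by construction.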

5 Results and Discussion ML includes two significant stages: the training stage and the testing stage. The predictive accuracy of a classifier depends solely on the information gained during the training process: if the information gain (IG) is low, the predictive accuracy will be low, whereas if the IG is high, the classifier's accuracy will be high. In this work, we trained and tested our classifiers using tenfold cross-validation. The dataset is divided into 10 parts; 9 of the 10 folds are used to train the classifier, and the information gained in the training stage is used to test the 10th part. This is repeated so that, by the end of the training and testing stage, every part has been used as both training and testing data. This strategy guarantees that the training data are distinct from the test data. In machine learning, this strategy is known to give a good estimate of how well a classifier generalizes.

Table 2 Accuracy (%) obtained by the classifiers

| Algorithm | Accuracy (%) |
|-----------|--------------|
| LR        | 92.67        |
| KNN       | 65.03        |
| DT        | 94.61        |
| RF        | 96.87        |
| SVM       | 68.83        |

We evaluated the classifiers on the phishing dataset to assess their performance; the full results are reported in Table 2. Among the algorithms tested, RF has the highest accuracy of 96.87%, with a precision of 97%, a recall of 96%, and an F1-score of 97%; this suggests that our approach will work effectively when applied to other phishing datasets, which are typically large. Table 3 shows the precision, recall, and F1-score of the classifiers for phishing and non-phishing sites. RF, DT, and LR give the highest precision, recall, and F1-scores. Table 3 also presents the confusion matrix, precision matrix, and recall matrix for the classifiers. Accordingly, RF and DT perform better than the other classifiers in terms of accuracy, recall, and precision.

The ROC curve shown in Fig. 3 depicts the trade-off between sensitivity (TPR) and specificity (1 − FPR). Classifiers whose curves lie closer to the upper left corner perform better. As a benchmark, a random classifier is expected to give points lying along the diagonal (FPR = TPR). The closer the curve comes to the 45° diagonal of the ROC space, the less accurate the test. Random forest has the highest AUC among all the classifiers, i.e., 99.5%.
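To make the ROC discussion concrete, the following sketch builds ROC points by sweeping the decision threshold over classifier scores and integrates them with the trapezoidal rule to obtain the AUC. The function names and the toy scores are assumptions for illustration; the sketch also assumes that all scores are distinct and that labels are 1 (phishing) or −1 (legitimate).

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) obtained by lowering the threshold one sample at a time."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    # Rank samples from the highest score to the lowest
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical scores: higher means "more likely phishing"
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, -1, 1, -1]
print(auc(roc_points(scores, labels)))
```

A classifier that ranks every phishing site above every legitimate site yields an AUC of 1.0, while points along the diagonal FPR = TPR correspond to an AUC of 0.5.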

Table 3 Precision, recall, and F1-score with confusion, precision, and recall matrix of the classifiers

| Algorithm | Class (non-phishing −1, phishing 1) | Precision | Recall | F1-score |
|-----------|-------------------------------------|-----------|--------|----------|
| LR        | −1                                  | 0.89      | 0.94   | 0.91     |
| LR        | 1                                   | 0.95      | 0.92   | 0.94     |
| KNN       | −1                                  | 0.60      | 0.63   | 0.61     |
| KNN       | 1                                   | 0.70      | 0.67   | 0.68     |
| DT        | −1                                  | 0.94      | 0.94   | 0.94     |
| DT        | 1                                   | 0.96      | 0.95   | 0.96     |
| RF        | −1                                  | 0.97      | 0.96   | 0.97     |
| RF        | 1                                   | 0.97      | 0.98   | 0.97     |
| SVM       | −1                                  | 0.53      | 0.69   | 0.60     |
| SVM       | 1                                   | 0.81      | 0.69   | 0.75     |

Fig. 3 ROC curve for comparison of classifiers

6 Conclusion Phishing has become a serious threat to global security and the economy. The rapid rise of new phishing domains and the spread of phishing attacks have made it difficult for people to keep up with recent incidents. Accordingly, in this article, we propose a content-based phishing detection method that addresses the weaknesses identified in existing approaches. The method achieves a high accuracy of 96.87% and an ROC AUC of about 99.5%.

References
1. Shein, E.: The gods of phishing. Infosecurity 8(2), 28–31 (2011)
2. Suresh, Y., Aarthi, E., Godekar, A., Geetha, R.: Detailed investigation: stratification of phishing websites assisted by user ranking mechanism. In: Proceedings of the International Conference on Intelligent Computing Systems (ICICS 2017–Dec 15th-16th 2017) organized by Sona College of Technology, Salem, Tamilnadu, India (2017)
3. Rao, S., Verma, A.K., Bhatia, T.: A review on social spam detection: challenges, open issues, and future directions. Expert Syst. Appl. 115742 (2021)
4. Miranda, M.J.: Enhancing cybersecurity awareness training: a comprehensive phishing exercise approach. Int. Manage. Rev. 14(2), 5–10 (2018)
5. Cao, Y., Han, W., Le, Y.: Anti-phishing based on automated individual white-list. In: Proceedings of the 4th ACM Workshop on Digital Identity (2008)
6. Shyni, C.E., Sarju, S., Swamynathan, S.: A multi-classifier based prediction model for phishing emails detection using topic modelling, named entity recognition and image processing. Circ. Syst. 7(09), 2507 (2016)
7. Prakash, P., Kumar, M., Kompella, R.R., Gupta, M.: Phishnet: predictive blacklisting to detect phishing attacks. In: 2010 Proceedings IEEE INFOCOM, pp. 1–5. IEEE (2010)
8. Chandrasekaran, M., Narayanan, K., Upadhyaya, S.: Phishing email detection based on structural properties. In: NYS Cyber Security Conference, vol. 3 (2006)
9. Jain, A.K., Gupta, B.B.: A novel approach to protect against phishing attacks at client side using autoupdated white-list. EURASIP J. Inf. Secur. (2016)
10. Fette, I., Sadeh, N., Tomasic, A.: Learning to detect phishing emails. In: Proceedings of the 16th International Conference on World Wide Web, pp. 649–656 (2007)
11. Buber, E., Diri, B., Sahingoz, O.K.: Detecting phishing attacks from URL by using NLP techniques. In: 2017 International Conference on Computer Science and Engineering (UBMK), vol. 28, pp. 337–342 (2017)
12. Babagoli, M., Aghababa, M.P., Solouk, V.: Heuristic nonlinear regression strategy for detecting phishing websites. Soft Comput. 1–13 (2018)
13. Chaurasia, V., Pal, S.: Ensemble technique to predict breast cancer on multiple datasets. Comput. J. bxab110 (2021). https://doi.org/10.1093/comjnl/bxab110
14. Chaurasia, V., Pal, S.: Applications of machine learning techniques to predict diagnostic breast cancer. SN Comput. Sci. 1(5), 1–11 (2020)
15. Yadav, D.C., Pal, S.: Prediction of thyroid disease using decision tree ensemble method. Hum. Intell. Syst. Integr. 2(1), 89–95 (2020)
16. Basak, J., Krishnapuram, R.: Interpretable hierarchical clustering by constructing an unsupervised decision tree. IEEE Trans. Knowl. Data Eng. 17(1), 121–132 (2005)
17. Chaurasia, V., Pandey, M.K., Pal, S.: Prediction of presence of breast cancer disease in the patient using machine learning algorithms and SFS. In: IOP Conference Series: Materials Science and Engineering, vol. 1099, no. 1, p. 012003. IOP Publishing (2021)
18. Kurani, A., Doshi, P., Vakharia, A., Shah, M.: A comprehensive comparative study of artificial neural network (ANN) and support vector machines (SVM) on stock forecasting. Ann. Data Sci. 1–26
19. Tarakeswara Rao, B., Patibandla, R.S.M., Murty, M.R.: A comparative study on effective approaches for unsupervised statistical machine translation. In: Embedded Systems and Artificial Intelligence, pp. 895–905. Springer, Singapore (2020)
20. Chaurasia, V., Pal, S.: Performance analysis of data mining algorithms for diagnosis and prediction of heart and breast cancer disease. Rev. Res. 3(8) (2014)

Bidirectional Gated Recurrent Unit (BiGRU)-Based Bitcoin Price Prediction by News Sentiment Analysis Jyotsna Malla, Chinthala Lavanya, J. Jayashree, and J. Vijayashree

1 Introduction Cryptocurrencies, like Bitcoin, Litecoin, and Ethereum, are digital assets, mostly used as a medium of payment [5–7, 21]. Bitcoin was introduced in 2009 and is the earliest cryptocurrency in the world. It has gained immense popularity and has attracted a huge consumer base owing to its ever-increasing market capitalization. The rise of cryptocurrencies has led to huge amounts of data being generated online in various forums, blogs, social media sites, etc. With the help of machine learning, deep learning, and natural language processing, these data can be used to find useful patterns related to Bitcoin price fluctuations. Quite a few researchers have obtained notable predictions of Bitcoin price patterns based on sentiment in social media platforms [12, 15, 17]. However, very few focus on the capability of media news to detect Bitcoin price behavior. News constantly changes the perception and sentiment of Bitcoin traders and investors and has an impact on their decisions. Our work helps to fill this gap by developing a deep learning model for Bitcoin price prediction using news sentiment analysis. Bidirectional GRU (BiGRU) is an extension of the GRU, a recurrent neural network unit consisting of update and reset gates. BiGRU uses two GRUs to process the input in both forward and backward directions and works extremely well with time series data. This paper proposes a neural network model for Bitcoin price prediction based on news sentiment analysis. The framework is composed of several steps. In the first step, the news headline data are preprocessed. In the second step, sentiment analysis is performed on the preprocessed news headline data. In the third step, various deep learning algorithms are compared to find the most efficient algorithm for prediction. The best performing algorithm is chosen for the prediction modeling. The resulting model outperforms the other models in accuracy. It can be used as a reliable Bitcoin price predictor to understand patterns in Bitcoin price movement. The remaining part of the paper is organized as follows. Section 2 consists of a Literature Survey. Section 3 consists of the News Headlines Data Preprocessing Techniques. Sentiment Analysis is discussed in Sect. 4. Section 5 describes the performance comparison of different deep learning algorithms. Section 6 is the proposed BiGRU prediction model. Experiments and Results are discussed in Sect. 7. Lastly, Sect. 8 concludes the work and highlights future directions.

J. Malla (B) · C. Lavanya · J. Jayashree · J. Vijayashree School of Computer Science and Engineering, Vellore Institute of Technology, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_4

2 Literature Survey Researchers have contributed several works in the field of predicting the Bitcoin market movement based on sentiment analysis [8, 12, 15–17, 20]. In [12], the authors forecasted cryptocurrency price fluctuations based on tweet sentiment causality and the volume of tweets, along with the daily closing prices and trading volumes of various cryptocurrencies. They found that Twitter sentiment could play a role in predicting cryptocurrency market movement. The authors in [15] extended this concept by developing a model that employs extreme gradient boosting regression (XGBoost) and Twitter sentiment to predict the price volatility of the ZClassic cryptocurrency. In [17], the authors developed KryptoOracle to predict next-minute Bitcoin prices using current Bitcoin prices and Twitter sentiment. XGBoost was used to develop KryptoOracle because of its high speed, high performance, and training simplicity. A hybrid model consisting of both LSTM and GRU was proposed in [18] for predicting the prices of two digital currencies, namely Litecoin and Monero. The proposed model achieved a mean absolute error (MAE) of 19.5 for predicting the prices of the digital currencies over an interval of 7 days. The model was able to outperform the existing LSTM model for predicting the prices of Litecoin and Monero. Five-day Bitcoin price prediction models using linear regression and decision tree techniques were compared in [19]. The models were trained on a Bitcoin dataset ranging from 2011 to 2019. The linear regression model was able to predict the Bitcoin prices better than the decision tree model. The authors in [8] tried to forecast the bi-hourly prices of various cryptocurrencies, such as Litecoin and Bitcoin, taking social factors into account. For this purpose, they developed a linear regression model based on a large volume of neutral, negative, and positive sentiments generated from tweets bi-hourly. The importance of various data preprocessing methods for Twitter sentiment analysis was studied in [20]. Sixteen different preprocessing techniques were tested. The authors stated that it is important to preprocess the data using techniques such as lemmatization, replacing shortened words, and removing punctuation.


The work in [16] aimed to detect Twitter users who use controversial terms when posting tweets related to Covid-19. Their model used different machine learning algorithms, such as random forest, logistic regression, support vector machine, multi-layer perceptron, and stochastic gradient descent, to detect such users. Random forest was found to give the highest accuracy for the input data.

3 News Headlines Data Preprocessing Techniques Before a large dataset such as the news headlines dataset can be used for prediction modeling, the data need to be properly cleaned and processed to increase the overall accuracy of the model and to decrease the computational time involved in prediction. Stop word removal and text-stemming are some of the techniques for preprocessing text-based data [1–4, 9, 13].

Text-Stemming Text-stemming is a text preprocessing technique widely used in natural language processing, text mining, and information retrieval systems. It extracts the root or stem form of a word by removing its affixes and grammatical inflections. This helps in faster searching of these words. It also reduces the total number of distinct words, since various forms of a word share the same meaning. The less variability there is in the text data, the less time natural language processing systems take to process the data and generate the output. After stemming, word forms with the same meaning are collapsed into a single stem, so the distinct words remaining in the data have different meanings.

Removing Stop Words Stop words occur very frequently in text data and mostly serve to connect words in sentences. These words do not add significant meaning to the data and can be safely ignored. They can be removed to decrease the quantity of text data, which would otherwise hinder fast information retrieval and data processing in natural language processing applications. Commonly occurring stop words in text data are ‘is’, ‘a’, ‘are’, ‘the’, etc.
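The two preprocessing steps can be sketched together in a few lines of Python. This is a deliberately crude illustration: the stop word set contains only the examples named in the text, and `naive_stem` is a hypothetical suffix stripper that is far simpler than an established stemmer such as Porter's. It is not the preprocessing code used in the paper.

```python
import re

# Only the stop words given as examples in the text; real lists are much longer
STOP_WORDS = {"is", "a", "are", "the"}

# Hypothetical suffix list for illustration, checked in this order
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(headline):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


print(preprocess("The traders are watching Bitcoin markets"))
# ['trader', 'watch', 'bitcoin', 'market']
```

The example shows both effects described in the text: the stop words disappear, and inflected forms such as "traders" and "watching" collapse to shorter stems.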

4 Sentiment Analysis Using TextBlob Sentiment analysis helps to understand the emotions and sentiment expressed in the data and to uncover hidden patterns based on the context. It helps to analyze the data and classify it depending on the needs of the work [10, 11]. TextBlob is a natural language processing Python library mainly used for sentiment analysis and for performing complex tasks on text data. It returns the subjectivity and polarity of the data. Polarity lies in the range [−1, +1], where +1 denotes a positive sentiment and −1 denotes a negative sentiment. Subjectivity quantifies how much personal opinion, as opposed to factual information, the text contains. The higher the subjectivity, the more personal opinion the text contains. Its values lie in the range [0, 1].
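The two scores can be illustrated with a toy lexicon-averaging scorer. This is only a stand-in written for this example and is not TextBlob's implementation: the lexicon entries, their values, and the function name `score` are all hypothetical. With TextBlob itself, the corresponding values are read from the `sentiment` property of a `TextBlob` object, which exposes `polarity` and `subjectivity`.

```python
# Hypothetical mini-lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1])
LEXICON = {
    "surges": (0.6, 0.5),
    "great": (0.8, 0.75),
    "crashes": (-0.7, 0.6),
    "terrible": (-1.0, 1.0),
}


def score(text):
    """Average the polarity and subjectivity of the words found in the lexicon."""
    hits = [LEXICON[word] for word in text.lower().split() if word in LEXICON]
    if not hits:
        # No opinion-bearing words: treat the text as neutral and objective
        return 0.0, 0.0
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity


for headline in ("Bitcoin surges after great news", "Bitcoin crashes", "Bitcoin price unchanged"):
    print(headline, "->", score(headline))
```

Because every lexicon value stays within the stated ranges, the averages also stay within [−1, +1] for polarity and [0, 1] for subjectivity.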

5 Forecasting Deep Learning Models Different machine learning algorithms have been used to analyze how social factors can indicate the market movements of Bitcoin and other cryptocurrencies. Research shows that long short-term memory (LSTM), gated recurrent units (GRUs), and bidirectional long short-term memory (BiLSTM) are commonly used for developing cryptocurrency market movement prediction models [14, 22].

Long Short-Term Memory (LSTM) LSTM is an improved variation of the vanilla RNN. It was developed to overcome the vanishing gradient problem. The LSTM network has three gates, namely the forget gate, the input gate, and the output gate. It also has a cell state, which carries information from the earlier stages to the later stages.

Gated Recurrent Unit (GRU) GRU is a newer version of the RNN. It works similarly to LSTM but is faster because it has no separate cell state. It contains two gates, namely the update gate and the reset gate. It works faster for smaller datasets since the information is stored directly in the hidden state.

Bidirectional Long Short-Term Memory BiLSTM is an improved variation of LSTM. It contains two LSTM units. This increases the total information available to the network, as one LSTM unit processes the input data in the forward direction and the second unit processes it in the backward direction.
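The update and reset gates mentioned for the GRU can be written compactly. The block below gives one common textbook formulation and is not taken from the paper; the symbols are assumptions for illustration: $x_t$ is the input at time $t$, $h_{t-1}$ the previous hidden state, $W$, $U$, and $b$ are learned weights and biases, $\sigma$ is the logistic sigmoid, and $\odot$ is the element-wise product. Some references swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
```

In a bidirectional model, one such recurrence runs over the sequence from first to last element and a second one from last to first, and the two hidden states at each time step are typically concatenated.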

6 Proposed Bidirectional Gated Recurrent Unit Model with News Sentiment Analysis We built a BiGRU-based prediction model based on sentiment analysis of news headlines data related to Bitcoin. The model is intended to achieve higher accuracy than the existing baseline models used for Bitcoin price prediction based on Twitter sentiment analysis. The BiGRU model builds on the advances of bidirectional LSTM and RNN models and incorporates newer modifications. The GRU unit replaces the plain neuron of the recurrent neural network model [2]. The BiGRU prediction model with sentiment analysis is composed of six stages: (1) News headlines preprocessing (2) Bitcoin historical dataset preprocessing (3) Sentiment scoring using TextBlob (4) Feature selection (5) Training BiGRU prediction model (6) Model validation. In stage 1, the news headlines data are preprocessed using stop word removal and text-stemming. Then, in stage 2, the Bitcoin historical dataset is preprocessed using min–max normalization. The third stage involves converting the news headlines into sentiment scores using sentiment analysis. TextBlob is a natural language processing Python library used for sentiment analysis. The subjectivity and polarity scores from the sentiment analysis are merged with the Bitcoin historical dataset. In the fourth stage, feature selection is performed on the merged dataset using the recursive feature elimination (RFE) technique. The selected features are used for prediction modeling. In the fifth stage, the BiGRU prediction model is developed. The model consists of two stacked hidden BiGRU layers to maximize performance. In stage 6, 10-fold cross-validation was employed. The merged dataset was divided into 10 parts; in turn, 9 parts were taken as the training set and one part as the test set. The average of all results was used as the evaluation metric for the performance of the model. Lastly, the model is validated through different performance measures. Figure 1 shows the system flow diagram of the proposed prediction model with news sentiment analysis.
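Stage 2 relies on min–max normalization, which rescales each column to the range [0, 1]. The sketch below shows the standard formula (v − min) / (max − min) in plain Python; the function name and the example closing prices are hypothetical and are not values from the paper's dataset.

```python
def min_max_normalize(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical closing prices, used only to exercise the function
closing_prices = [29000.0, 35000.0, 47000.0, 61000.0]
print(min_max_normalize(closing_prices))
# [0.0, 0.1875, 0.5625, 1.0]
```

When cross-validation is used, the minimum and maximum would normally be computed on the training folds only and then applied to the held-out fold, so that no information from the test data leaks into training.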

Fig. 1 System flow diagram of the proposed prediction model with news sentiment analysis

7 Experimental Analysis and Results Experimental Datasets Used The Bitcoin historical dataset was downloaded from coinmarketcap.com for the period ‘01-01-2021’ to ‘31-12-2021’. The main reason for using this time frame is that Bitcoin experienced massive fluctuations owing to various social factors during this time. The dataset contains the parameters ‘open’, ‘high’, ‘low’, ‘close’, ‘adjusted volume’, and ‘volume’. The news headlines related to Bitcoin were collected for the same period, using an API and some Web scraping, from online cryptocurrency news aggregators such as coindesk.com, cointelegraph.com, and cryptopanic.com.

Performance Measures The confusion matrix is widely used in determining the quality and performance of machine learning models. The output is a matrix that shows the performance assessment of the built model. Accuracy, precision, F1-score, and recall are the performance measures that can be determined from the confusion matrix and give an understanding of the quality of the model. In addition to these, the mean-squared error (MSE) is the mean of the squared differences between the predicted and real values in the dataset. Each of these measures helps us understand the overall quality of the model [1, 4]. We compared four recurrent neural network algorithms to determine which gave the best results for the performance measures indicated above. The algorithms are long short-term memory (LSTM), gated recurrent units (GRUs), bidirectional long short-term memory (BiLSTM), and bidirectional gated recurrent units (BiGRUs). We assessed the performance of these algorithms based on accuracy, precision, F1-score, recall, and mean-squared error (MSE). Further adjustments of the hyperparameters and the number of epochs used in the model can lead to higher accuracy and performance. Figure 2 illustrates the accuracy of the mentioned algorithms. BiGRU has the highest accuracy of 70%. Figure 3 shows the efficiency of the algorithms measured in terms of precision and recall. BiGRU has the highest precision and recall compared to the other algorithms. Figure 4 illustrates the F1-score of the above algorithms. BiGRU has the highest F1-score of 0.7. Figure 5 depicts the mean-squared error rate in percentage of the mentioned algorithms. BiGRU has the least error rate of about 0.28 compared to the other algorithms.
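The mean-squared error described above is the average of the squared differences between real and predicted values. A minimal sketch follows; the function name and the sample values are assumptions for illustration and are not results from the paper.

```python
def mean_squared_error(actual, predicted):
    """Mean of the squared differences between real and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical normalized prices and model outputs
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.5, 2.0]
print(mean_squared_error(actual, predicted))
```

Because the errors are squared, a single large miss contributes far more to the MSE than several small ones, which is why it is often reported alongside classification-style measures rather than instead of them.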

Fig. 2 Accuracy (%) versus algorithm

Fig. 3 Efficiency versus algorithm

Fig. 4 F1-score versus algorithm


Fig. 5 Error rate versus algorithm

8 Conclusion and Future Directions In this paper, we have developed a BiGRU Bitcoin price prediction model based on news sentiment analysis. Experimental analysis shows that the proposed BiGRU model outperforms various other state-of-the-art models on news headline data collected during the Covid-19 era. The model can also be adopted after the Covid-19 pandemic to help investors and traders in their decision-making. Future research would include considering the impact of other factors, such as Covid-19, on the Bitcoin market movement. Different sentiment analysis techniques can also be used to improve the model.

References
1. Bakagiannis, I., Gerogiannis, V.C., Kakarontzas, G., Karageorgos, A.: Machine learning product key performance indicators and alignment to model evaluation. In: 2021 3rd International Conference on Advances in Computer Technology, Information Science and Communication (CTISC), pp. 172–177 (2021). https://doi.org/10.1109/CTISC52352.2021.00039
2. Dai, J., Chen, C.: Text classification system of academic papers based on hybrid Bert-BiGRU model. In: 2020 12th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), vol. 2, pp. 40–44 (2020). https://doi.org/10.1109/IHMSC49165.2020.10088
3. Faidha, Y.F., Shidik, G.F., Fanani, A.Z.: Study comparison stemmer to optimize text preprocessing in sentiment analysis Indonesian e-commerce reviews. In: 2021 International Conference on Data Analytics for Business and Industry (ICDABI), pp. 135–139 (2021). https://doi.org/10.1109/ICDABI53623.2021.9655867
4. Gharib, M., Bondavalli, A.: On the evaluation measures for machine learning algorithms for safety-critical systems. In: 2019 15th European Dependable Computing Conference (EDCC), pp. 141–144 (2019). https://doi.org/10.1109/EDCC.2019.00035
5. Ibrahim, A., Kashef, R., Corrigan, L.: Predicting market movement direction for bitcoin: a comparison of time series modeling methods. Comput. Electr. Eng. 89, 106905 (2021). https://doi.org/10.1016/j.compeleceng.2020.106905, https://www.sciencedirect.com/science/article/pii/S0045790620307576
6. Ibrahim, A., Kashef, R., Li, M., Valencia, E., Huang, E.: Bitcoin Network Mechanics: Forecasting the BTC Closing Price Using Vector Auto-Regression Models Based on Endogenous and Exogenous Feature Variables. J. Risk Finan. Manage. 13(9) (2020). https://doi.org/10.3390/jrfm13090189, https://www.mdpi.com/1911-8074/13/9/189
7. Ibrahim, A., Corrigan, L., Kashef, R.: Predicting the Demand in Bitcoin Using Data Charts: A Convolutional Neural Networks Prediction Model. In: 2020 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 1–4 (2020). https://doi.org/10.1109/CCECE47787.2020.9255711
8. Jain, A., Tripathi, S., Dwivedi, H.D., Saxena, P.: Forecasting price of cryptocurrencies using tweets sentiment analysis. In: 2018 Eleventh International Conference on Contemporary Computing (IC3), pp. 1–7 (2018). https://doi.org/10.1109/IC3.2018.8530659
9. Kaplan, C., Aslan, C., Bulbul, A.: Cryptocurrency Word-of-Mouth Analysis via Twitter (2018)
10. Kariya, C., Khodke, P.: Twitter sentiment analysis. In: 2020 International Conference for Emerging Technology (INCET), pp. 1–3 (2020). https://doi.org/10.1109/INCET49848.2020.9154143
11. Khan, R., Rustam, F., Kanwal, K., Mehmood, A., Choi, G.S.: US based COVID-Tweets sentiment analysis using TextBlob and supervised machine learning algorithms. In: 2021 International Conference on Artificial Intelligence (ICAI), pp. 1–8 (2021). https://doi.org/10.1109/ICAI52203.2021.9445207
12. Kraaijeveld, O., Smedt, J.D.: The predictive power of public Twitter sentiment for forecasting cryptocurrency prices. J. Int. Finan. Mark. Inst. Money 65, 101188 (2020)
13. Ladani, D.J., Desai, N.P.: Stopword identification and removal techniques on TC and IR applications: a survey. In: 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 466–472 (2020). https://doi.org/10.1109/ICACCS48705.2020.9074166
14. Li, L., Arab, A., Liu, J., Liu, J., Han, Z.: Bitcoin options pricing using LSTM-based prediction model and blockchain statistics. In: 2019 IEEE International Conference on Blockchain (Blockchain), pp. 67–74 (2019). https://doi.org/10.1109/Blockchain.2019.00018
15. Li, T.R., Chamrajnagar, A.S., Fong, X.R., Rizik, N.R., Fu, F.: Sentiment-based prediction of alternative cryptocurrency price fluctuations using gradient boosting tree model. Frontiers Phys. 7 (2019). https://doi.org/10.3389/fphy.2019.00098
16. Lyu, H., Chen, L., Wang, Y., Luo, J.: Sense and sensibility: characterizing social media users regarding the use of controversial terms for COVID-19. IEEE Trans. Big Data 7(6), 952–960 (2021). https://doi.org/10.1109/TBDATA.2020.2996401
17. Mohapatra, S., Ahmed, N., Alencar, P.: KryptoOracle: A real-time cryptocurrency price prediction platform using twitter sentiments (2020)
18. Patel, M.M., Tanwar, S., Gupta, R., Kumar, N.: A deep learning-based cryptocurrency price prediction scheme for financial institutions. J. Inf. Secur. Appl. 55, 102583 (2020). https://doi.org/10.1016/j.jisa.2020.102583, https://www.sciencedirect.com/science/article/pii/S2214212620307535
19. Rathan, K., Sai, S.V., Manikanta, T.S.: Crypto-currency price prediction using decision tree and regression techniques. In: 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), pp. 190–194 (2019). https://doi.org/10.1109/ICOEI.2019.8862585
20. Symeonidis, S., Effrosynidis, D., Arampatzis, A.: A comparative evaluation of pre-processing techniques and their interactions for twitter sentiment analysis. Expert Syst. Appl. 110, 298–310 (2018). https://doi.org/10.1016/j.eswa.2018.06.022, https://www.sciencedirect.com/science/article/pii/S0957417418303683
21. Tan, X., Kashef, R.: Predicting the closing price of cryptocurrencies: a comparative study. In: Proceedings of the Second International Conference on Data Science, E-Learning and Information Systems. DATA ’19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3368691.3368728
22. Tanwar, S., Patel, N.P., Patel, S.N., Patel, J.R., Sharma, G., Davidson, I.E.: Deep learning-based cryptocurrency price prediction scheme with inter-dependent relations. IEEE Access 9, 138633–138646 (2021). https://doi.org/10.1109/ACCESS.2021.3117848

How AI Algorithms Are Being Used in Applications Aleem Mohammed and Mohammad Mohammad

1 Introduction Artificial intelligence is a technology applied in the modern smartwatch, a convenient wearable device that carries a whole host of measurement hardware: a temperature sensor, a heart rate sensor, a gyroscope, GPS, and a slew of other small pieces of technology. Following the collection of data from these sensors and receivers, an analysis can be carried out in order to apply an algorithm and produce the desired results. Based on the recognition task, an algorithm can be developed and trained to achieve the intended purpose. Such sensors were initially used [1–3] to evaluate human activities because they can be distributed across the human body. The purpose of this study is to investigate how smartwatches are used to keep track of activities. The technology is primarily concerned with comparing the results of mobile phone-based activity recognition with those of smartwatch-based activity recognition. A gyroscope sensor is used to improve the overall functionality of the wristwatch for activity recognition, which boosts the efficiency of the smartwatch.

A. Mohammed Computer Science, Sydney, NSW, Australia e-mail: [email protected] M. Mohammad (B) Department of Information Technology, Melbourne Institute of Technology, Sydney, NSW, Australia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_5

1.1 Artificial Intelligence

If a photo application has ever tagged your family or friends automatically, or an online map has ever rerouted you in the middle of a journey, you have already benefited from artificial intelligence (AI). Your life may even have been saved by an AI-assisted diagnosis of heart disease. There are so many promising developments in AI that it is difficult to grasp all of the ways in which it affects your life now and will affect it in the future. However, AI is not a universal remedy and, as with any developing technology, it can run into unexpected technical difficulties and raise new legal, ethical, and social problems [4].

1.2 The Promise of AI

Artificial intelligence holds great promise, particularly in the domains in which it is already applied. For example, there has been significant progress in using AI to help identify ailments, such as by examining brain scans or listening for heart murmurs [5], which allows healthcare providers to deliver better and more efficient service. AI is also being used for research synthesis, in which researchers take many years of clinical research and hundreds of publications and comb through them in search of new ideas and applications. Similarly, AI is being applied to streamlining logistics [6], whether for air travel, corporate travel, or ride sharing. Other uses include giving immediate responses to credit card and mortgage applications, identifying and investigating potential fraud in real time, classifying photos, and screening e-mail messages. The potential applications are numerous. Throughout, it is important to understand that artificial intelligence spans several linked fields and that people sometimes use their names interchangeably when talking about essentially the same thing. The idea behind AI is that of thinking machines: computers that can learn from experience as humans do, act without explicit instructions, and perform tasks such as visual perception, logical reasoning, and learning. A closely related field is machine learning, a broad set of computer algorithms that learn from data in order to find patterns and predict outcomes.
One particularly useful approach in machine learning and artificial intelligence has been the use of neural networks, which are algorithms with a set of hidden layers of nodes between the input and the output that provide intermediate processing between the two. Deep learning refers to a particularly well-developed kind of neural network: one with many hidden layers, which allows a wide range of intermediate processing. It has been responsible for some of the most exciting advances in artificial intelligence. Finally, there is predictive modeling and analytics, the general practice of building statistical models to anticipate specific outcomes, such as whether someone will need to be admitted to a hospital or will default on a mortgage. Predictive modeling can be done with AI or machine learning, or with other methods, although AI has made considerable progress, notably on complex predictive modeling problems.

The general idea has been around since the 1950s, when work began on building machines that could reason, reach conclusions, make decisions, and learn from mistakes. That was the intention, but the technology of the time was nowhere near where it is today, and while the idea attracted immediate interest and led to some exciting developments, it did not get very far. A quiet period followed in the 1960s, with much less progress. In the 1970s, researchers took a different approach and made progress by drawing on game theory, mathematics, and the methods of experimental psychology. This marked the second major phase of AI. A further breakthrough came in the 1990s, when IBM’s Deep Blue defeated world chess champion Garry Kasparov, something many people had believed a machine could not do. Since then, AI has gained the ability to defeat world champions in a variety of games, each step a notable achievement in creative, human-like problem-solving.

Most recently, in the 2010s, deep learning, a key component of modern AI, became practical. Computing power advanced to the point where it was feasible, and the amount of available data grew to the point where it could supply the algorithms with the raw material they need. This is a very brief timeline from the 1950s to the present day, a period that has seen an explosion of technological progress and AI applications [7].
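The description of hidden layers given earlier in this section can be made concrete with a minimal Python sketch of a forward pass. The layer sizes, the weights, and the function names are hypothetical values chosen for illustration; they do not come from any trained model or from the cited work.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: every node takes a weighted sum of all
    inputs, adds its bias, and applies the sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the input through each layer in turn (input -> hidden -> output)."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical, hand-picked parameters: 2 inputs -> 3 hidden nodes -> 1 output.
hidden = ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, -0.1, 0.2])
output = ([[1.0, -1.0, 0.5]], [0.0])
network = [hidden, output]

prediction = forward([1.0, 0.0], network)
print(prediction)  # a single value between 0 and 1
```

A deep network in this picture is simply a longer `network` list: each extra `(weights, biases)` pair adds one more hidden layer of intermediate processing. Training, which is what sets the weights, is not shown here.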

1.3 What It Means to Learn

Traditionally, computers had to be given specific instructions on what to do. Consider how you interact with computer systems: most applications are simply a series of explicit commands. That is why, when developing software for a specific purpose, such as banking software, you have to be as explicit as possible. You might write an instruction along the lines of “if a client attempts to withdraw cash and the amount exceeds their balance, then the transaction is canceled.” That is explicit programming: if you see X, then do Y. Machine learning is quite different. You are no longer writing explicit instructions. Instead, you give the computer the data and tools it needs to study the problem and work out how to solve it on its own, without being told what to do. This allows the computer to evolve, adapt, and learn from its mistakes on the basis of the data you have provided. That is not so different from how people learn. In machine learning, computers approach the data in much the same way as humans do. To begin, the machine tries something small, such as a small portion of the data. It then applies a statistical algorithm to determine how well the data fits together, searching for patterns with the algorithm it has been given. After that, the machine receives some feedback. Every time the machine learns something new, it adds that information to its store of data. In a sense, the computer is saving the information in its long-term memory so that it can develop and evolve as needed in the future [8]. Both the machine and the person come away with new knowledge and expertise.
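The banking rule quoted in this section is an example of an explicit, hand-written instruction. A minimal Python sketch of it follows; the function name and the return format are assumptions made for illustration and are not taken from any real banking system.

```python
def process_withdrawal(balance, amount):
    """Explicit rule: if the withdrawal exceeds the balance, cancel it.

    Returns the new balance and a status string."""
    if amount > balance:
        # The programmer decided this outcome in advance; nothing is learned.
        return balance, "canceled"
    return balance - amount, "approved"

print(process_withdrawal(100, 150))  # (100, 'canceled')
print(process_withdrawal(100, 40))   # (60, 'approved')
```

Every behavior of this program is fixed by the programmer in advance. In the machine learning setting described above, the decision boundary would instead be inferred from data.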

1.4 Work with Data

A large portion of computer science is still based on exact specifications and explicit instructions. Traditional programming means configuring the machine to accept your input and generate an output determined by a set of rules: a command goes in, and a preset response comes out. That works perfectly for a program that performs exact computations. Things become more difficult when people cannot explicitly tell the computer what to do. These situations call for a programming model that allows the machine to learn and gives it the ability to respond to the input it receives. This is where machine learning is best suited. Consider developing a program that needs to identify spam messages. These messages often carry unsolicited advertising or malware, among other things. You can easily write a word filter that removes messages containing common spam words, filtering on words such as gold, lottery, and winner. This can eliminate a large number of spam messages, but it is easy to fool: a sender can alter the word lottery to contain a zero. It can also produce a large number of false positives. You might have received an e-mail from a friend telling you a shaggy dog story about winning the lottery, and that e-mail would be deleted by mistake. Trying to solve these kinds of problems with carefully detailed instructions alone will not yield satisfactory results, because you cannot map each input to a preset response with certainty. Machine learning turns this around.
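The word filter described above, together with both of its weaknesses, can be illustrated with a short Python sketch. The three filter words come from the text; the function name and the sample messages are hypothetical.

```python
SPAM_WORDS = {"gold", "lottery", "winner"}

def is_spam(message):
    """Flag a message if any word, after stripping punctuation, is a spam word."""
    words = {word.strip(".,!?;:") for word in message.lower().split()}
    return bool(words & SPAM_WORDS)

# Caught: the message contains listed words.
print(is_spam("You are the lottery winner!"))   # True

# Evasion: swapping letters for digits slips past the fixed word list.
print(is_spam("You are the l0ttery w1nner!"))   # False

# False positive: a harmless story from a friend is flagged as spam.
print(is_spam("Funny story: my uncle thought he had won the lottery."))  # True
```

The rule itself behaves exactly as written. The difficulty is that no fixed list of words can anticipate every spelling a sender might use or every innocent context in which a word appears.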
Instead of entering commands, you enter data. In place of a predefined response, you use machine learning algorithms to help the machine learn how to respond. The first step is to separate your data into two sets: training data and test data. The training data is a smaller amount of data that you use to look for patterns. You then build a model that helps the machine make sense of the data through statistical algorithms. These algorithms help the machine make accurate predictions or identify patterns in your data. Consider how machine learning could be applied to our spam program. As a starting point, let us use 10,000 e-mail messages as our training data set, which we use to create and adjust our model before putting it to the test on our test data, which consists of over a million messages. The test data provides the machine with examples of spam that it has not seen before. You can then use a classification algorithm to split the e-mail into two groups: spam and regular messages [9]. This is referred to as binary classification. The machine accomplishes this by identifying the words or phrases that occur more often in spam messages. It then generates a score indicating the likelihood that a message is spam. Choosing the best classification algorithm is up to you as the machine learning practitioner. It is then your responsibility to tune the hyper-parameters of the algorithm so that it accurately predicts whether an e-mail message is spam. Once you are satisfied, the result serves as your initial model: a machine learning algorithm together with the hyper-parameters needed to make accurate predictions.

Keep in mind that, even though the programmer supplies the data, chooses the algorithms, and makes corrections, it is ultimately the machine that decides whether a message is delivered or treated as spam. In some cases, the programmer may not even know how the machine determined that a message was spam.
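The workflow in this section (split the data, learn which words occur more often in spam, produce a spam score, tune a hyper-parameter) can be sketched in plain Python. The text does not name a particular classifier, so this sketch assumes a naive Bayes classifier with Laplace smoothing. The handful of messages, the label names, and the smoothing value `alpha` are all hypothetical stand-ins for the 10,000 training messages mentioned above.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(messages):
    """Count how often each word appears in spam and in regular messages."""
    word_counts = {"spam": Counter(), "regular": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts["spam"]) | set(word_counts["regular"])
    return word_counts, label_counts, vocabulary

def spam_score(text, model, alpha=1.0):
    """Return the estimated probability that the message is spam.

    alpha is the smoothing hyper-parameter (an assumption of this sketch)."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    log_prob = {}
    for label in ("spam", "regular"):
        total_words = sum(word_counts[label].values())
        log_prob[label] = math.log(label_counts[label] / total_messages)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are ignored
                numerator = word_counts[label][word] + alpha
                denominator = total_words + alpha * len(vocabulary)
                log_prob[label] += math.log(numerator / denominator)
    # Turn the two log scores into a probability between 0 and 1.
    top = max(log_prob.values())
    weights = {label: math.exp(value - top) for label, value in log_prob.items()}
    return weights["spam"] / (weights["spam"] + weights["regular"])

# Hypothetical toy data: a training set and a separate, unseen test set.
training_data = [
    ("win the lottery now", "spam"),
    ("claim your gold prize winner", "spam"),
    ("free gold lottery winner", "spam"),
    ("lunch meeting at noon", "regular"),
    ("see you at the meeting", "regular"),
    ("project report attached", "regular"),
]
test_data = [
    ("lottery winner claim prize", "spam"),
    ("meeting report at noon", "regular"),
]

model = train(training_data)
predictions = [
    "spam" if spam_score(text, model) > 0.5 else "regular"
    for text, _ in test_data
]
print(predictions)  # ['spam', 'regular']
```

Here the programmer supplies the data and chooses the algorithm and `alpha`, but the word weights that drive each decision come from the training messages, which is the division of labor the section describes.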

1.5 Machine Learning Application

Many businesses are already using machine learning techniques. When you glance at the weather forecast, or wear a smartwatch that monitors your physical activity, you are already benefiting from machine learning. The technology can be useful to any organization that has a large amount of data and is looking for more effective ways to analyze and use it. Some applications, however, are a particularly good fit. Machine learning can help identify ways to improve your overall experience. A smartwatch can collect a large amount of data, such as temperature, speed, distance, gyroscope readings, and so on. When you are trying to lose weight, you enter your current weight and height and then set a target weight. The algorithm then determines how many calories you should burn each day through your usual activities, because it already has a record of your behavior and habitual daily activities. Obtaining massive amounts of data is relatively simple. What is difficult is extracting information from it. Machine learning can take all of this data and examine it to learn more about you. You should be able to lose weight more quickly if you change your routine, for example by running for longer than usual. A tool of this kind can analyze your behavior and identify ways in which it can better suit your needs.

The results for the personal and impersonal activity recognition models are presented in Tables 1 and 2, respectively. Results are quantified as classification accuracy, defined here as the percentage of classifications that correctly identify the activity the user is performing. Each model uses a single sensor; the watch accelerometer, the phone accelerometer, and the watch gyroscope are each evaluated in this way. From the same research, per-activity results are shown in Table 3.

Table 1 Overall accuracy for personal models

Algorithm   Phone accel (%)   Watch accel (%)   Watch gyroscope (%)
RF          75.5              93.3              79.0
J48         65.5              86.1              73.0
IB3         67.7              93.3              60.1
NB          77.1              92.7              80.2
MLP         77.0              94.2              70.0
Ave         72.6              91.9              72.4

Table 2 Overall accuracy for impersonal models

Algorithm   Phone accel (%)   Watch accel (%)   Watch gyroscope (%)
RF          35.1              70.3              57.5
J48         24.1              59.3              49.6
IB3         22.5              62.0              49.3
NB          26.2              63.8              53.5
MLP         18.9              64.6              57.7
Ave         25.3              64.0              53.5

Table 3 Per-activity accuracy for RF models

               Impersonal (%)                            Personal (%)
Activity       Watch accel   Phone accel   Watch gyro    Watch accel   Phone accel   Watch gyro
Walking        79.8          60.7          87.0          94.2          88.5          93.5
Jogging        97.7          93.8          48.6          99.2          68.8          98.1
Stairs         58.5          66.7          43.1          88.9          66.7          80.0
Sitting        84.9          26.9          70.5          97.5          87.0          82.2
Standing       96.3          65.9          57.9          98.1          73.1          68.6
Kicking        71.3          72.5          41.4          88.7          91.7          67.9
Dribbling      89.3          26.1          86.0          98.7          84.8          96.9
Catch          66.0          26.1          68.9          93.3          78.3          94.6
Typing         80.4          76.9          60.8          99.4          72.0          88.6
Handwriting    85.2          12.9          63.1          100.0         75.9          80.5
Clapping       76.3          40.9          67.9          96.9          77.3          95.6
Brush teeth    84.5          19.2          66.2          97.3          96.2          89.6
Fold clothes   80.8          8.3           37.8          95.0          79.2          73.1
Eat pasta      47.1          0.0           57.9          88.6          40.0          72.9
Eat soup       52.7          0.0           47.7          90.7          82.4          69.8
Eat sandwich   29.0          7.1           31.1          68.9          63.0          44.2
Eat chips      65.0          16.0          50.6          83.4          76.0          52.5
Drink          62.7          31.8          61.1          93.3          77.3          78.5
Overall        70.3          35.1          57.5          93.3          75.5          79.0

This may appear strange, but machines that learn more about you have become one of the most profitable industries on the planet. Companies such as Google, Facebook, and Apple all make use of this technology, using machine learning to understand you better. Much of the time, these systems identify patterns in data that humans would never notice. That is one of the most fascinating aspects of machine learning: it is not merely an imitation of human learning, and in some respects it is a more effective kind of learning. To uncover patterns, make decisions, and gain better insights, you must approach the problem in an entirely different way. In other words, if you want to apply machine learning in a company, you must have an idea of how the machine learns. You will then be able to begin collecting data that allows your software to understand your customers better [10].
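The classification accuracy used in Tables 1 to 3 is the percentage of predictions that match the activity actually performed. A minimal Python sketch of that calculation follows; the activity labels are hypothetical examples and are not data from the cited study.

```python
def classification_accuracy(predicted, actual):
    """Percentage of predictions that match the true activity labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Hypothetical labels: one of five activities is misclassified.
actual = ["walking", "jogging", "sitting", "typing", "walking"]
predicted = ["walking", "jogging", "standing", "typing", "walking"]

accuracy = classification_accuracy(predicted, actual)
print(accuracy)  # 80.0
```

Applying the same function to one activity at a time would give per-activity figures of the kind shown in Table 3.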

1.6 Machine Learning Various Types

You could think of machine learning as a fancy name for something that has been around for some time. Perhaps it is merely a more up-to-date way of describing statistics or a new way of talking about data science. However, when considering machine learning, the idea of learning itself is central. It is certainly true that machine learning involves statistics, and it can be a key part of your data science work. However, those are only the tools that the machine needs in order to learn; they are not the learning itself. Consider what it means to learn. What are some of the different strategies you use when learning a new subject? How can you take those strategies and apply them to machines?

Suppose you wanted to learn how to play chess. You could go about this in a few different ways. You might hire a chess tutor. The tutor would introduce you to the different chess pieces and how they move around the board. You would practice by playing against the tutor, who would watch your moves and help you when you made a mistake. After a while, the tutor would end the lessons, and you would go on to play against other people. Now imagine you were unable to find a tutor. Instead, you could go to a public park and watch several hundred skilled players play the game. You could not ask questions; you would simply sit quietly and observe. If you did this for long enough, you would most likely come to understand the game. You might not know the names of the chess pieces, but you would know the moves and strategies from your hours of observation. You could even combine the two approaches. A chess tutor could show you the basic rules, and then you would go back to watching other people play. You would have a high-level overview and the names of the chess pieces, but you would rely on observation to learn new strategies and improve your play.

These approaches correspond to the main types of machine learning. The first is supervised learning. Here, a data scientist acts as a tutor for the machine, training it by showing it the basic rules and giving it an overall strategy. The second is unsupervised learning, in which you simply have the machine make all of the observations on its own.
Although the machine may not know all of the different names and labels, it can find the patterns by itself. Finally, you can combine the two approaches and try semi-supervised learning. In this case, you train the machine only a little so that it gets a high-level overview; it then learns most of the rules and strategies by observing patterns. Each of the three approaches has its own advantages and disadvantages. Supervised learning requires a knowledgeable tutor: someone who is readily available, knows chess well, and can teach you how to play. Unsupervised learning requires access to a lot of data. You may not be able to go to a public park and watch dozens of different players. It also depends to some extent on whom you watch: you need to observe people who play well. Semi-supervised learning can suffer from problems on both fronts. With a bad tutor, it is much harder to learn from observation than with a good one. Conversely, if you have a good tutor but the people you observe are poor players, you may come to understand the game, but you will not learn to play it well [11]. Sometimes you are in a position to choose the best approach. At other times you simply have to make do with what is available. If you cannot find a tutor, you can still learn by watching people in public parks. If you have no access to a public park, you will have to do your best to find a good tutor. In some cases, semi-supervised learning is the only option available.

1.7 Select the Best Algorithm

As a machine learning practitioner, one of the first things you have to do is decide which algorithm is most appropriate for your project. In some cases, you will not have much choice. If your data is labeled, you will almost certainly want to use supervised learning. Labeled data makes clear both the input and the output. So, if you are building an application that helps you price your home, you will need a large amount of clearly labeled data. Labels are tags that make it easier to classify the data. In this case, they would include the zip code, the square footage, and the number of bathrooms. The machine does not need to find its own patterns. If your data is not labeled, you will probably use unsupervised learning and let the machine form its own clusters. You supply the machine with all of the data you have on different houses, and the machine determines which clusters make the most sense. Perhaps it groups together all of the houses in more walkable areas; it may use a criterion you had not even thought of. Once you have the clusters, you can draw conclusions from them. With large amounts of unlabeled data, k-means clustering will very often be an appropriate method, although the machine can be made to form clusters in other ways. When you have a large amount of labeled data, you may instead want to use regression, k-nearest neighbor, or decision trees.
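As an illustration of letting the machine form its own clusters, the following is a minimal k-means sketch in plain Python. The house data (size in hundreds of square feet, number of bathrooms), the choice of two clusters, and the use of the first k points as starting centroids are all assumptions made for this example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    # Naive initialization (an assumption of this sketch): the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Hypothetical unlabeled houses: (size in hundreds of square feet, bathrooms).
houses = [(8, 1), (30, 3), (9, 1), (10, 2), (32, 4), (28, 3)]

assignments, centroids = kmeans(houses, k=2)
print(assignments)  # [0, 1, 0, 0, 1, 1]
```

No labels are supplied; the grouping into smaller and larger houses emerges from the data alone. In practice the features would usually be rescaled first so that one of them does not dominate the distance.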
You can also try several algorithms and then look more closely at the results. Do not expect results right away, because this can take a long time and a lot of computing power. Suppose you are working with supervised learning and want to apply three different algorithms to your training data: decision trees, naive Bayes, and k-nearest neighbor. Once you have done that, you can compare the results and see which one had the highest accuracy. You might also try something known as ensemble modeling, in which you build different ensembles of machine learning algorithms. There are several ways to create ensembles: bagging, boosting, and stacking. Bagging creates several different versions of the same machine learning algorithm. Remember that decision trees can be built in many different ways, with many different predictors at the root node. With bagging, you grow a large number of different trees and see which ones give the best results; if the results vary, you can also pool them. Boosting attempts to improve accuracy by combining several different machine learning algorithms. You could, for example, use k-means clustering together with a decision tree: you take the leaves of the tree and let the machine decide whether they contain any interesting clusters. This is also an example of semi-supervised learning. Stacking combines several different machine learning algorithms, layering them on top of one another to improve accuracy. The team that won the Netflix Prize used a form of stacking called feature-weighted linear stacking. They built several separate predictive models and then stacked them on top of one another. In the same way, you could stack k-nearest neighbor on top of naive Bayes. Each layer might add only 0.01 to the accuracy, but over time this can add up to a significant improvement. Some winners of machine learning competitions have stacked more than 30 algorithms. The most important point is to think of each machine learning algorithm as a tool in its own right.
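Of the ensemble methods mentioned above, bagging is the simplest to sketch. The Python example below uses bagging in its usual form (bootstrap aggregating): each model is trained on a random resample of the data, and the predictions are pooled by majority vote. The one-level decision trees ("stumps"), the toy data, the number of models, and the random seed are all assumptions made for illustration.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a one-level decision tree on (x, label) pairs by trying every
    observed x as a threshold and keeping the split with the fewest errors."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for low_label, high_label in ((0, 1), (1, 0)):
            errors = sum(
                (low_label if x <= threshold else high_label) != y
                for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, low_label, high_label)
    _, threshold, low_label, high_label = best
    return lambda x: low_label if x <= threshold else high_label

def bagging(samples, n_models=15, seed=0):
    """Train each stump on a bootstrap resample and combine by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]  # sample with replacement
        models.append(train_stump(bootstrap))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

# Hypothetical labeled data: small x values belong to class 0, large to class 1.
samples = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]

ensemble = bagging(samples)
print(ensemble(1), ensemble(9))  # 0 1
```

Individual stumps can disagree near the class boundary because each sees a slightly different resample. Pooling their votes smooths out that variation, which is the motivation for bagging.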
You can experiment to find the most suitable algorithm, or you can combine a variety of algorithms to improve your accuracy over time [12]. AI techniques have been optimized and managed through computer networks [13], and in this way AI has emerged and developed within human society over time, with the fastest growth occurring between 2000 and 2015 [14]. The first keyword-based systematic mapping study of artificial intelligence has been conducted, with a focus on improving accuracy since the year 2000 [14]. In a MATLAB experimental setting, 30 target factors were coordinated into a different value; this approach showed good overall performance and reached the accuracy common in the field [15–18].

2 Literature Review

Artificial intelligence is the simulation of human intelligence by machines and computers. According to Y. Cho, Y. Nam, and Y.-J. Choi, artificial intelligence is being used for various purposes; one current use is detecting nutrient deficiencies in soil. Smart devices have become widely available in recent times. In today’s society, artificial intelligence can be put to use in a variety of contexts. It is becoming increasingly important because it can solve complicated problems effectively in many areas, including medicine, recreation, finance, academia, and so on. The use of AI is making our lives easier and more efficient in many ways [21]. Artificial intelligence is also helpful in finding solutions to difficult problems about the universe. Moriwaki et al. claim that AI could be useful for expanding our knowledge of the cosmos, including its operation and its origin. Information security is of the utmost importance for every company, and cyberattacks are becoming more common at an accelerating rate. Applying AI can make data more reliable and secure. Some systems, such as AEG bot and AI2 Platform, are used to detect software bugs and cyberattacks more accurately.

In the past four to five years, AI-enabled smartwatches have gained recognition. Fitness trackers are becoming increasingly popular, and numerous companies, such as Apple, Samsung, and Fitbit, are competing with one another to bring new models to market [22]. They release regular updates in order to remain competitive. The AI smartwatches currently on the market offer high standards of performance, design, and features. AI enables these smartwatches to monitor and record a user’s individual activities and sports, which makes it possible for users to keep fit and achieve their day-to-day health goals. This function explains the importance of smartwatches among people who are conscious of their fitness. Beneficial features include the ability to measure heart rate and monitor blood pressure. Some people believe that smartwatches are helping to bridge the gap between patients and their doctors by providing real-time health reports that allow for more timely treatment.

2.1 Implement AI Applications

Building an AI system involves reverse-engineering human traits and abilities in a machine and harnessing its computational power to surpass what humans can do. Machine learning (ML) teaches a machine to make inferences and decisions based on experience. It identifies patterns, analyzes past data, and infers the meaning of data points without relying on human experience. This automation saves firms time and helps them make better decisions when evaluating data. Deep learning is an ML technique that enables a machine to process inputs through layers in order to classify, infer, and predict outcomes. Neural networks work on principles similar to those of human neural cells. They are a series of algorithms that capture the relationships between underlying variables and process the data as a human brain does. Natural language processing (NLP) is concerned with a machine reading, understanding, and interpreting language. Once a machine understands what the user intends to communicate, it responds accordingly.

Computer vision algorithms interpret an image by analyzing its distinctive characteristics. This helps the machine classify images and learn from them, so that it can choose a better output based on previous observations. Cognitive computing algorithms analyze text, images, and speech in the way a human does and try to give the desired output.

3 Conclusion As with the human mind, artificial intelligence (AI) can learn and make decisions based on the available data and the most suitable algorithm, and in some cases it can do so faster. Because data collection is continuous, and more data generally supports more accurate judgments, AI may at times reach an unanticipated level of sophistication in its decision-making. Because of their dependability, smartwatches are likely to reach a new level of sophistication.

References
1. Cho, Y., Nam, Y., Choi, Y-J., Cho, W-D.: SmartBuckle: human activity recognition using a 3-axis accelerometer and a wearable camera. In: HealthNet ’08 Proceedings of the 2nd Int. Workshop on Systems and Networking Support for Health Care and Assisted Living Environments
2. Gyorbiro, N., Fabian, A., Homanyi, G.: An activity recognition system for mobile phones. Mob. Netw. Appl. 14(1), 82–91 (2008)
3. Maurer, U., Smailagic, A., Siewiorek, D., Deisher, M.: Activity recognition and monitoring using multiple sensors on different body positions. In: 2006 IEEE Proceedings on the International Workshop on Wearable and Implantable Sensor Networks, vol. 3, 5
4. Lv, J., et al.: Artificial intelligence-assisted auscultation in detecting congenital heart disease. Eur. Heart J. Digital Health 2(1), 119–124 (2021)
5. Thompson, W.R., et al.: Artificial intelligence-assisted auscultation of heart murmurs: validation by a virtual clinical trial. Pedia. Cardiol. 40(3), 623–629 (2019)
6. Ho, G.T.S., et al.: An intelligent information infrastructure to support the streamlining of integrated logistics workflow. Expert Systems 21(3), 123–137 (2004)
7. Haussler, D.: Quantifying inductive bias: AI learning algorithms and Valiant’s learning framework. Artif. Intell. 36(2), 177–221 (1988)
8. Moriwaki, N., et al.: Achieving general-purpose AI that can learn and make decisions for itself. Hitachi Rev. 65(6), 113 (2016)
9. Khandelwal, Y., Bhargava, R.: Spam filtering using AI. Artificial Intelligence and Data Mining Approach in Security Frameworks, 87–99 (2021)
10. Weiss, G.M., et al.: Smartwatch-based activity recognition: a machine learning approach. In: 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). IEEE (2016)
11. Ayodele, T.O.: Types of machine learning algorithms. New Adv. Mach. Learn. 3, 19–48 (2010)
12. Abu-Naser, S.S.: Developing visualization tool for teaching AI searching algorithms (2008)
13. Qadir, J., Yau, K.A., Imran, M.A., Ni, Q., Vasilakos, A.V.: IEEE access special section editorial: artificial intelligence enabled networking. IEEE Access 3, 3079–3082 (2015). https://doi.org/10.1109/ACCESS.2015.2507798
14. Liu, J., et al.: Artificial Intelligence in the 21st Century. IEEE Access 6, 34403–34421 (2018). https://doi.org/10.1109/ACCESS.2018.2819688
15. Zhou, Y., Xu, F.: Research on application of artificial intelligence algorithm in directed graph. Int. Conf. Comput. Intell. Inf. Syst. (CIIS) 2017, 116–120 (2017). https://doi.org/10.1109/CIIS.2017.26
16. Chen, L., Qiao, Z., Wang, M., Wang, C., Du, R., Stanley, H.E.: Which artificial intelligence algorithm better predicts the chinese stock market? IEEE Access 6, 48625–48633 (2018). https://doi.org/10.1109/ACCESS.2018.2859809
17. Wei, Y., et al.: A review of algorithm & hardware design for AI-based biomedical applications. IEEE Trans. Biomed. Circuits Syst. 14(2), 145–163 (2020). https://doi.org/10.1109/TBCAS.2020.2974154
18. Ray, S.: A quick review of machine learning algorithms. In: 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), pp. 35–39 (2019). https://doi.org/10.1109/COMITCon.2019.8862451
19. Lockhart, J.W., Pulickal, T., Weiss, G.M.: Applications of mobile activity recognition. Proceedings of the 2012 ACM UbiComp International Workshop on Situation, Activity, and Goal Awareness, Pittsburgh, PA

A Framework for Identifying Theft Detection Using Multiple-Instance Learning S. R. Aishwarya, V. Gayathri, R. Janani, Kannan Pooja, and Mathi Senthilkumar

S. R. Aishwarya · V. Gayathri · R. Janani · K. Pooja · M. Senthilkumar (B) Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Coimbatore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_6

1 Introduction A framework is suggested for generalized theft detection that uses a convolutional neural network (CNN) and multiple-instance learning (MIL) to identify abnormal events in surveillance videos and categorize them by type. Crime rates have been increasing at an alarming rate all over the world, which may lead to severe issues not only for the environment but also for humankind. Closed-circuit televisions (CCTVs) are mainly used to monitor and record footage. They are being used to identify crimes and theft and, over the years, have become more efficient. Detecting an anomalous event means finding a rare abnormal pattern among frequently occurring normal patterns. In most cases, this is a manual, tedious job. To save time and labour, developing intelligent algorithms would be beneficial. Automatic detection of crime and theft in video has gained significant attention in the past few years. CCTV footage can be used to identify crimes that have happened or even to prevent crimes from happening. The main goal of the theft detection system is to raise an alarm or signal when there is a deviation from normal patterns. The paper focuses on developing a CNN- and MIL-based model for video understanding and for predicting abnormal events. The model also attempts to predict the type of abnormal event. Thus, this model would be beneficial in predicting and preventing crime and ensuring the well-being of humankind.

2 Related Works The type of machine learning approaches, feature extraction techniques, and the kind of model used to obtain results are the key factors for evaluating the solution of the theft detection application.

2.1 Different Machine Learning Approaches In a fully supervised method, frame-level labelling is done along with video-level labelling, which is a very time-consuming process, as described in [1, 2]. Unsupervised anomaly detection has also been explored. The most common approach is to treat abnormal events as outliers with respect to a model trained on normal videos. The problem is that the boundary between normal and abnormal is ambiguous, because the training data cannot describe all possible normal patterns or behaviours. As a result, any new normal event may also deviate from the trained model and cause a false alarm. Other approaches assume that unseen anomalous videos cannot be reconstructed properly, so samples with higher reconstruction errors are considered abnormal events. But due to a lack of prior knowledge of abnormality, these methods can overfit the training data and might fail to discriminate between normal and abnormal events. The work in [3] compares unsupervised and weakly supervised approaches and reports better results with weak supervision. Using partially labelled (weakly supervised) data also gives better results than unsupervised approaches in [4, 5]. Hence, of the three methods, weakly supervised learning is the best choice for predicting anomalies: it needs far less labelling time than full supervision and performs better than unsupervised learning.

2.2 Feature Extraction One of the important functions of an anomaly detection application is deriving spatiotemporal features from the video clips to train the model. A temporal encoding–decoding network is proposed to find how normal and abnormal feature instances evolve [7]. The research work in [2] used the multimedia content description interface (MPEG-7) library for identifying knives in the considered frame. In [5, 8], the grouped positive and negative instances are passed to a three-dimensional residual network (3D ResNet) pre-trained on the Kinetics dataset. The method in [4] added a self-guided attention module to boost the feature encoder, which automatically focuses on anomalous regions in frames while extracting task-specific representations. The work in [1] suggested two-stream inflated 3D ConvNets (I3D) to obtain local spatiotemporal features. Most of the other researchers [5, 7, 9, 10] have used deep three-dimensional CNNs (C3D) to obtain the features. The C3D model is given an input video segment of 16 frames and outputs a 4096-element vector.

2.3 Comparative Exploration of Different Models A model is proposed where the extracted features are passed through a bidirectional long short-term memory (LSTM) network for temporal data processing [11]. The disadvantage is that stacking many bidirectional LSTM layers creates a vanishing gradient problem, and the model is relatively slow compared to others. The work in [12] showed that an LSTM encoder–decoder-based reconstruction model over normal time series is a viable approach. The authors of [13] proposed a hybrid autoencoder architecture, based on the LSTM encoder–decoder and the convolutional autoencoder, that extracts spatiotemporal features. The work in [14] learns both the local features and the classifiers in a single learning framework using a fully CNN-based autoencoder. The authors of [4] proposed a multiple-instance self-training framework with a two-stage pseudo-label generation and a self-guided attention boosted feature encoder. The authors of [3] proposed a robust temporal feature magnitude enabled MIL classifier, which takes the top k features to train snippet classifiers. They use a multi-scale temporal network since it learns multi-scale temporal features from pre-computed features. The authors of [5] suggested a deep MIL ranking model that enforces ranking only on the two instances with the highest anomaly score in the positive and negative categories. But in this model, it takes too many iterations for the network to start producing low scores for normal segments and high scores for abnormal segments. The authors of [7] proposed a multiple-instance deep ranking method where the maximum margin classifier with support vector machine formulation is extended to MIL. The work proposed in [15] also used the support vector machine strategy to classify normal and abnormal activities in ATM booths. They encoded the video streams using the histogram of gradients technique, and the K-means clustering algorithm performs the feature mapping. The authors of [5] introduced new ranking measures for learning the deep MIL (DMIL) without temporal annotations. Also, with the help of joint learning of deep motion and appearance features, a deep network with multiple ranking measures learned the context dependency. The authors of [9] proposed a clustering-based self-reasoning framework that treats anomaly detection as a binary problem and employs clustering algorithms to distribute all the fragments into two clusters. The authors of [16] used MOG2 to extract the background and eliminate the moving objects. The authors of [17] proposed temporally coherent sparse coding (TSC), in which similar neighbouring frames are encoded with similar reconstruction coefficients, and mapped the TSC to a special type of stacked recurrent neural network. Self-trained deep ordinal regression is applied to video anomaly detection [18, 19], which overcomes some demerits of existing methods. The work in [20] performs unsupervised learning to identify anomalies in imaging data. The authors of [21] proposed a model to find normal and abnormal events in sensor log files. Their approach achieved a higher accuracy rate using stacked long short-term memory, showing that it can learn long-range temporal dependencies faster with sparse representations and without preliminary knowledge of the time order information. CNN variants for different applications are reviewed in [22, 23]. Anomaly detection using thermal images is investigated by focusing mainly on three classes of features: textual features, colour features, and shape features [24, 25].

All related works mainly classify a video snippet as normal or abnormal. Some of them leave open problems such as robustness to noise and the misclassification of abnormal snippets as normal. Considering these problems, this work focuses on reducing their impact and on categorizing the type of anomaly detected, to give better results for theft detection. The work starts with video preprocessing: each video frame is resized and the frame rate is fixed, followed by extraction of visual features from the fully connected layer of the C3D network. The C3D features are computed for every 16-frame video clip, followed by l2 normalization. After that, a three-layered, fully connected network is built to process the extracted features, and the required parameters are chosen, such as dropout regularization, the activation function for each layer, the optimizer, and the learning rate. The anomaly score is calculated for each video instance. Then, MIL is used to design an objective function that must be optimized. Here, the MIL ranking loss function is redefined. As the next step, the anomaly score is calculated for each video. Using the anomalous video’s score and its corresponding instances, the video is mapped to one of the 13 available anomaly classes. Finally, a user-friendly website is built using the Django framework to accept the input video and display the predicted results. The final product is deployed using Heroku (a cloud application platform).

3 Proposed System

3.1 Dataset Descriptions UCF-Crime dataset: The University of Central Florida (UCF) crime dataset is a large-scale dataset of long videos with different scenes representing real-life situations. The dataset consists of 1900 videos divided into training and testing sets. The training set consists of 800 normal videos and 810 abnormal videos, and the test set includes 150 normal and 140 abnormal videos (290 videos in total). The abnormal videos in both training and testing cover 13 real-world anomalies: abuse, arrest, arson, assault, accident, burglary, explosion, fighting, robbery, shooting, stealing, shoplifting, and vandalism. The total dataset duration is 128 h. No temporal (frame-level) annotation is available in this dataset except for the testing videos. It is the biggest video anomaly dataset and the only one with multiple scenes from real surveillance videos. ShanghaiTech dataset: It is a medium-scale dataset that contains 437 different videos. It has 13 different scenes of 31,739 frames of resolution 489 × 856 pixels with different lighting conditions and camera angles. The training set has 175 normal and 63 abnormal videos. The test set has 155 normal and 44 abnormal videos. Snatch 1.0 dataset: It contains 35 chain-snatching theft incidents obtained from different surveillance cameras in Hyderabad, India. Finally, all three datasets are combined. While combining, the Snatch 1.0 dataset was divided into training and testing sets and added to the robbery class along with the other two datasets.

3.2 Video Preprocessing Resizing the video: video resizing increases or decreases the total number of pixels. To do this, the resize method of the VideoFileClip object is used. Each video is resized to 320 × 240 pixels. Keyframe extraction: the keyframes of a video summarize it in a smaller number of frames. The colour feature of the images is used. The histogram difference (d) between two consecutive frames is calculated using cv2.compareHist(H1, H2, method). The mean (μadh) and standard deviation (σadh) of the absolute differences are calculated, and the threshold t = μadh + σadh is computed. If d > t, the first frame of the pair is selected as a keyframe. The similarity in structural similarity index measure (SSIM) is determined through the correlation method.
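The thresholding rule above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes grayscale frames held as NumPy arrays, uses an L1 distance between normalized histograms as a stand-in for cv2.compareHist, and the function names and the bins parameter are hypothetical.

```python
import numpy as np

def gray_histogram(frame, bins=32):
    """Normalized intensity histogram of one frame (stand-in for a colour histogram)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def select_keyframes(frames, bins=32):
    """Return indices i where the histogram difference between frame i and
    frame i + 1 exceeds t = mean + standard deviation of all differences."""
    hists = [gray_histogram(f, bins) for f in frames]
    # Absolute histogram difference d for every pair of consecutive frames.
    diffs = np.array([np.abs(h2 - h1).sum() for h1, h2 in zip(hists, hists[1:])])
    if diffs.size == 0:
        return list(range(len(frames)))
    t = diffs.mean() + diffs.std()
    # The first frame of each pair with d > t is kept as a keyframe.
    return [i for i, d in enumerate(diffs) if d > t]
```

With a clip that contains a single abrupt scene change, only the frame just before the change is returned; a real pipeline would likely also keep the first frame of the video and apply the SSIM check mentioned in the text.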

3.3 Feature Extraction Important visual features are extracted from the input videos using the FC6 layer of the C3D network. C3D features are computed for every 16-frame video clip and then l2-normalized. Finally, the features of all 16-frame clips within a segment are averaged, which gives one feature vector per video segment. These features are fed into a three-layer fully connected network.
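The normalize-then-average step can be sketched as below, assuming the per-clip C3D features are already available as a 2-D array with one row per 16-frame clip. The number of segments per video is not stated in the text, so n_segments is a hypothetical parameter, and both function names are illustrative.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length (eps guards against zero rows)."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def segment_features(clip_features, n_segments):
    """l2-normalize per-clip features, then average the clips inside each segment."""
    clips = l2_normalize(np.asarray(clip_features, dtype=np.float64))
    if clips.shape[0] < n_segments:
        raise ValueError("need at least one clip per segment")
    # Split the clips into n_segments contiguous groups of (nearly) equal size.
    groups = np.array_split(clips, n_segments, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])
```

For example, 64 clip vectors of length 4096 and n_segments=32 would give a 32 × 4096 array, one row per segment.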

3.4 Neural Network with Multiple-Instance Learning A fully connected network is built on top of the 3D CNN features for training. Dropout regularization is applied to prevent overfitting, which helps the model differentiate normal and abnormal behaviour. A suitable activation function is chosen for every layer: ReLU is used for the initial layers, and the sigmoid function is used for the last layer. An optimizer suitable for the application is chosen for updating the network’s weights to reduce the loss. The AdaGrad optimizer is used since it maintains parameter-specific learning rates. Based on these parameters, a three-layered, fully connected network is built. The learning rate is adjusted according to the results obtained and is updated depending on the number of epochs. Sparsity and smoothness constraints are taken into consideration, and the MIL ranking loss is calculated by including these constraints. Each frame of the video is considered, and its corresponding anomaly score is calculated. If the score is high for a frame, it is concluded that an anomaly is detected.
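The text does not give the redefined loss, so the sketch below shows only the widely used hinge-based MIL ranking loss with smoothness and sparsity terms, in the form popularized by [10]. The weights lambda_smooth and lambda_sparse are placeholder values, and the function name is hypothetical.

```python
import numpy as np

def mil_ranking_loss(pos_scores, neg_scores, lambda_smooth=8e-5, lambda_sparse=8e-5):
    """Ranking loss for one abnormal (positive) bag and one normal (negative) bag.

    pos_scores / neg_scores: anomaly scores in [0, 1], one per temporal segment,
    listed in temporal order.
    """
    pos = np.asarray(pos_scores, dtype=np.float64)
    neg = np.asarray(neg_scores, dtype=np.float64)
    # Hinge term: the top-scoring abnormal segment should outrank the
    # top-scoring normal segment by a margin of 1.
    hinge = max(0.0, 1.0 - pos.max() + neg.max())
    # Smoothness: scores of adjacent segments in the abnormal video should vary gradually.
    smooth = np.sum(np.diff(pos) ** 2)
    # Sparsity: anomalies are assumed to occupy only a few segments.
    sparse = np.sum(pos)
    return hinge + lambda_smooth * smooth + lambda_sparse * sparse
```

Only the maximum score in each bag enters the hinge term, which is why video-level labels are sufficient for training.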

3.5 Multi-Class Classification There are 13 abnormal activity classes, and each is assigned a range of anomaly scores that is used to determine the type of anomaly for a new instance. Frames with low anomaly scores are treated as normal, so only frames with high anomaly scores are taken for classification. The frame at which a high variation in the score is recorded is used to locate the time span in which the abnormal activity occurs. Thus, the type of abnormal activity and the time at which it happens are recorded and displayed.
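A rough sketch of this range lookup is given below. The actual score ranges are not listed in the text, so EXAMPLE_RANGES holds made-up placeholder values for three of the 13 classes; the normal_threshold parameter, the use of the mean of the high-scoring frames as the video score, and the function name are likewise assumptions made for illustration.

```python
import numpy as np

# Placeholder, non-overlapping score ranges; NOT the values used by the authors.
EXAMPLE_RANGES = {
    "robbery": (0.50, 0.70),
    "stealing": (0.70, 0.85),
    "burglary": (0.85, 1.01),
}

def classify_video(frame_scores, class_ranges, normal_threshold=0.5):
    """Return (label, onset_frame) from per-frame anomaly scores."""
    scores = np.asarray(frame_scores, dtype=np.float64)
    # Only frames with high anomaly scores take part in classification.
    high = scores[scores >= normal_threshold]
    if high.size == 0:
        return "normal", None
    video_score = float(high.mean())
    label = next(
        (name for name, (lo, hi) in class_ranges.items() if lo <= video_score < hi),
        "unknown",
    )
    # The largest jump between consecutive scores marks where the activity starts.
    onset = int(np.argmax(np.abs(np.diff(scores)))) + 1 if scores.size > 1 else 0
    return label, onset
```

The returned onset is a frame index; dividing it by the frame rate would give the time at which the activity begins.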

3.6 Web Application Development Front-end development: a user-friendly interface is built with HTML, CSS, JS, and Bootstrap, with a web page for uploading the input video. Integration of front end and back end: the Django framework connects the front-end web pages with the deep learning model. Heroku is used for deploying the application.

4 System Design Figure 1 describes the flow of the proposed system design. A module-wise implementation of the proposed system is described in the following sub-sections.

Fig. 1 Flow diagram of the proposed system

4.1 Preprocessing The input for the proposed system is a video. Thus, the preprocessing for video datasets is implemented using three modules: resizing the video, frame extraction, and keyframe extraction. Resizing the video: the videos are resized so that frames of equal width and height are obtained during frame extraction, which matters in later stages such as obtaining insights from the frames through feature extraction. To do this, the resize method of the VideoFileClip object is used. Each video is resized to 320 × 240 pixels. Keyframe extraction: videos are converted to images (frames) to obtain their features. Frame extraction yields all the frames of the video, but only specific frames contain key information.

4.2 Feature Extraction Building the model: a C3D convolutional neural model extracts the features from the preprocessed input. The network architecture contains eight convolutional layers, five pooling layers, and three fully connected layers. The fully connected layers have a size of 4096 dimensions, where the first two layers have ReLU as an activation function, and the last layer has softmax as its activation function. The first fully connected layer is used to obtain the features. The C3D model is given an input video segment of 16 frames and outputs a 4096-element vector. Extracting the features and normalizing them: the application dataset contains two types of videos, normal and abnormal, and the test data contains both. The abnormal videos are the ones that include the anomalous activities to be predicted. So, these two types of videos are separated and used for training the model, and features are extracted from the normal and abnormal videos separately. After the processing, the model generates an npy file which consists of raw features from the dataset. These raw features are obtained after the prediction from the model and normalization of the output. These features are later processed to get the original features.

5 Implementation Results and Discussion In the preprocessing module, frames are extracted from the videos, and the essential information is obtained from all the available frames using keyframe extraction. The processes include resizing the video, frame extraction, and keyframe extraction, and each has its role. Resizing the video ensures that the extracted frames have a standard width and height, so they can be processed quickly to obtain the necessary information. Once the frames are obtained, video processing is similar to image processing. However, the number of frames obtained from a video may be large, as shown in Fig. 2: there are 83 frames after frame extraction. So, keyframe extraction was chosen to focus only on the frames that hold vital information. As shown in Fig. 3, only 28 keyframes are obtained.

Fig. 2 Frame extraction

Fig. 3 Keyframe extraction

Fig. 4 Visualization of different classes of anomalies

A weakly supervised approach is used to classify the video as normal or abnormal. Under feature extraction, the normal and abnormal videos are processed separately, and their features are obtained as a vector. This vector is then normalized and stored as an npy file. The features referring to different classes are obtained and stored in a single file. These raw features obtained from C3D are processed and sent to the neural network for training the model. Under neural network construction, a three-layered, fully connected network has been built using the Keras library to process the extracted features. The different classes of anomalies considered are visualized in Fig. 4 as a bar graph and a tag cloud. The interface for uploading a video for processing and obtaining results has been created. Figure 5 displays the video to be processed before resizing. Figure 6 shows the implementation progress on different modules of the proposed work.
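The forward pass of such a three-layer scoring network can be sketched in plain NumPy as below. The authors used Keras, and the layer widths are not given in the text; the default sizes (4096, 512, 32, 1) are borrowed from the architecture in [10] and are an assumption, as are the function names. Dropout is omitted because it is only active during training.

```python
import numpy as np

def init_scorer(rng, dims=(4096, 512, 32, 1)):
    """Random (weights, bias) pairs for a small fully connected network."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(dims, dims[1:])
    ]

def score_segments(features, params):
    """Map each segment feature vector to an anomaly score in (0, 1)."""
    h = np.asarray(features, dtype=np.float64)
    for k, (weights, bias) in enumerate(params):
        z = h @ weights + bias
        if k == len(params) - 1:
            h = 1.0 / (1.0 + np.exp(-z))   # sigmoid on the last layer
        else:
            h = np.maximum(z, 0.0)         # ReLU on the earlier layers
    return h[:, 0]
```

Each row of features yields one score, so a video split into 32 segments would produce 32 scores that can be passed to a MIL ranking loss.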

Fig. 5 Display of the video

Fig. 6 Progress of theft detection modules

6 Conclusion Video preprocessing, which includes frame resizing and keyframe extraction using histogram and SSIM-based techniques, has been done, and the results (keyframes) have been used for feature extraction. Features relating to different types of abnormal activity are obtained as vectors and fed to the neural network model for further classification. A three-layered, fully connected network has been built using the Keras library to process the extracted features. This framework focuses on identifying whether an abnormal activity occurs and on determining its type and the time frame in which it occurs. Most research on theft detection stops at identifying an abnormal activity. However, identifying the type of abnormal activity would allow appropriate action to be taken in time. The present work can be further enhanced by considering various theft cases and training the model accordingly. Theft in American countries cannot be expected to resemble theft in Asian countries. So, further research is needed to cover the different theft cases occurring in other countries.

References
1. Cheng, M., Cai, K., Li, M.: Rwf-2000: an open large scale video database for violence detection. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 4183–4190. IEEE (2021)
2. Zaheer, M.Z., Mahmood, A., Shin, H., Lee, S.I.: A self-reasoning framework for anomaly detection using video-level labels. IEEE Signal Process. Lett. 27, 1705–1709 (2020)
3. Feng, J.C., Hong, F.T., Zheng, W.S.: Mist: multiple instance self-training framework for video anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14009–14018 (2021)
4. Dubey, S., Boragule, A., Gwak, J., Jeon, M.: Anomalous event recognition in videos based on joint learning of motion and appearance with multiple ranking measures. Appl. Sci. 11(3), 1344 (2021)
5. Dubey, S., Boragule, A., Jeon, M.: 3D ResNet with ranking loss function for abnormal activity detection in videos. In: 2019 International Conference on Control, Automation and Information Sciences (ICCAIS), pp. 1–6. IEEE (2019)
6. Wan, B., Jiang, W., Fang, Y., Luo, Z., Ding, G.: Anomaly detection in video sequences: a benchmark and computational model (2021). arXiv:2106.08570
7. Grega, M., Matiolański, A., Guzik, P., Leszczuk, M.: Automated detection of firearms and knives in a CCTV image. Sensors 16(1), 47 (2016)
8. Parab, A., Nikam, A., Mogaveera, P., Save, A.: A new approach to detect anomalous behaviour in ATMs. In: 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 774–777. IEEE (2020)
9. Kamoona, A.M., Gosta, A.K., Bab-Hadiashar, A., Hoseinnezhad, R.: Multiple instance-based video anomaly detection using deep temporal encoding-decoding (2020). arXiv:2007.01548
10. Sultani, W., Chen, C., Shah, M.: Real-world anomaly detection in surveillance videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6479–6488 (2018)
11. Tian, Y., Pang, G., Chen, Y., Singh, R., Verjans, J.W., Carneiro, G.: Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning (2021). arXiv:2101.10030
12. Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., Shroff, G.: LSTM-based encoder-decoder for multi-sensor anomaly detection (2016)
13. Wang, L., Zhou, F., Li, Z., Zuo, W., Tan, H.: Abnormal event detection in videos using hybrid spatio-temporal autoencoder. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 2276–2280 (2018)
14. Hasan, M., Choi, J., Neumann, J., Roy-Chowdhury, A.K., Davis, L.S.: Learning temporal regularity in video sequences. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 733–742 (2016)
15. Viji, S., Kannan, R., Jayalashmi, N.Y.: Intelligent anomaly detection model for ATM booth surveillance using machine learning algorithm: intelligent ATM surveillance model. In: 2021 IEEE International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), Greater Noida, India (2021)
16. Wei, J., Zhao, J., Zhao, Y., Zhao, Z.: Unsupervised anomaly detection for traffic surveillance based on background modelling. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 129–1297 (2018)
17. Luo, W., Liu, W., Gao, S.: A revisit of sparse coding-based anomaly detection in stacked RNN framework. IEEE Int. Conf. Comput. Vis. 2017, 341–349 (2017)
18. Pang, G., Yan, C., Shen, C., van den Hengel, A., Bai, X.: Self-trained deep ordinal regression for end-to-end video anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12173–12182 (2020)
19. Ilse, M., Tomczak, J.M., Welling, M.: Attention-based deep multiple instance learning. arXiv:1802.04712v4 [cs.LG] (2018)
20. Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery. arXiv:1703.05921v1 [cs.CV] (2017)
21. Vinayakumar, R., Soman, K.P., Poornachandran, P.: Long short-term memory-based operation log anomaly detection. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 236–242 (2017)
22. Ravikumar, S., Vinod, D., Ramesh, G., Pulari, S.R., Mathi, S.: A layered approach to detect elephants in live surveillance video streams using convolution neural networks. J. Intell. Fuzzy Syst. 38(5), 6291–6298 (2020)
23. Aloysius, N., Geetha, M.: A review on deep convolutional neural networks. In: 2017 International Conference on Communication and Signal Processing (ICCSP), India (2017)
24. Mishra, C., Bagyammal, T., Parameswaran, L.: An algorithm design for anomaly detection in thermal images. In: Innovations in Electrical and Electronic Engineering. Springer (2021)
25. Krishnamoorthy, V., Mathi, S.: An enhanced method for object removal using exemplar-based image inpainting. In: 2017 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–5. IEEE (2017)

An Efficient Reversible Data Hiding Based on Prediction Error Expansion Manisha Duevedi, Sushila Madan, and Sunil Kumar Muttoo

M. Duevedi (B) · S. K. Muttoo: Department of Computer Science, University of Delhi, Delhi, India. e-mail: [email protected]
S. Madan: Lady Shri Ram College, University of Delhi, Delhi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_7

1 Introduction
With the boundless advancements in multimedia network technology, securing information has become the need of the hour. Several techniques such as cryptography, steganography, and encryption are used to safeguard the confidentiality, integrity, and authenticity of digital content. Broadly, data hiding covers both watermarking and steganography. Watermarking focuses on the protection of the ownership of digital content [1, 2], whereas steganography is the art of hiding secret data in a cover medium so as not to arouse an attacker's suspicion [3, 4]. Data hiding can be further categorized into two main types, reversible and irreversible, depending upon the kind of distortion caused in the cover media. In irreversible data hiding, it is not possible to recover the cover media after the secret data has been extracted. Reversible data hiding (RDH) removes this limitation, so that both the secret data and the cover media can be obtained in their original form at the receiver's end. RDH is important in scenarios where image distortion implies the loss of critical information, such as military applications, forensics, and medical science. Several RDH algorithms have been proposed so far, which are broadly classified into spatial-domain and transform-domain techniques [5–7]. Based on the embedding procedure, RDH in the spatial domain is categorized into techniques such as difference expansion (DE) [8–10], histogram shifting (HS) [11–14], and prediction error expansion (PEE) [15–20]. Among the DE-based techniques, Tian [8] introduced a high-capacity RDH algorithm that embeds secret data in the expandable difference values of the pixels.


Another approach, proposed in [9], works by bidirectional difference expansion of the approximate mean of two adjacent pixels. The scheme in [10] uses local prediction for DE: for each center pixel of a square block, it computes a least-squares predictor and uses the corresponding prediction error for expansion. In [11], the HS technique is used to embed secret data. It creates a separate residual image by computing residual values for all equal-sized blocks of the image. The histogram of this residual image is then used for embedding secret data. The scheme in [12] generates an n-dimensional histogram for data embedding. Other authors [13, 14] have also proposed RDH techniques based on histogram shifting and prediction error modification.

In this paper, we propose a PEE-based RDH scheme for grayscale images. Firstly, we divide the image in a checkerboard pattern into two planes, one white and one gray. Secondly, fluctuation values and predictions are computed for the pixels of one plane. Thirdly, half of the payload is embedded in the pixels of the selected plane, with smooth regions chosen for embedding according to the computed fluctuation values. After embedding in one plane, all the described steps are repeated for the second plane. Data extraction and image recovery are performed reversibly at the receiver's end. The experimental results show that the scheme provides a sufficiently good embedding capacity and a better stego-image quality.

The rest of the paper is organized as follows. Section 2 briefly describes the literature survey. Section 3 covers the proposed scheme in detail. Section 4 presents the experimental results and a comparative analysis of the proposed scheme. Section 5 concludes the paper.

2 Literature Survey
Several RDH algorithms based on PEE have been proposed so far. The concept of PEE arose from the combination of the DE and histogram shifting methods. In PEE-based RDH techniques, the image pixels are first predicted using some prediction mechanism. The prediction errors are then obtained as the difference between the original pixel value and the predicted pixel value. The histogram of the prediction errors forms a Laplacian-like distribution centered around 0, which is then shifted or expanded to incorporate the secret data bits. The scheme in [15] is a PEE-based scheme with a large embedding capacity and a low distortion rate. It identifies the flat and rough regions of the image using local complexity as a parameter, and uses flat regions to embed 2 bits and rough regions to embed 1 bit. In [20], a dual-image RDH scheme using prediction-error (PE) shift has been introduced. It uses a bidirectional shift strategy to embed more data at a lower distortion rate. The scheme in [21] embeds the signed-digit representation of the secret data using block-based prediction and the HS methodology. The authors of [22, 23] have also proposed PEE-based RDH schemes.

Further advancements in PEE-based RDH led to pixel value ordering (PVO)-based techniques for generating prediction errors. The PVO-based technique in [16] divides the image into equal-sized blocks. The pixels of each block are sorted, and the maximum and minimum valued pixels of the block are predicted using the ordering of pixels in the block. The peaks -1 and 1 of the prediction error histogram (PEH) are then modified to embed secret data. In [17], an improved version of [16] is presented, in which the prediction errors are generated using the pixel locations of the largest and second-largest pixels of the block. The peaks 0 and 1 of the PEH are then modified to hide secret data. The scheme in [19] also improves the original PVO scheme by reducing the size of the block so that a higher capacity of secret data can be accommodated in the image. The scheme in [18] is another improved PVO that dynamically identifies the number of embeddable pixels in a block depending upon the correlation of the pixels in the block. All these PVO-based techniques use an image block as the unit for prediction and embedding. The scheme in [24], on the other hand, implements a pixel-based PVO predictor (PPVO) for RDH. It performs prediction in a pixel-wise manner using a defined sorted context, and achieves a higher embedding capacity and better stego-image quality than the traditional PVO approaches. Wu et al. [25] improved the PPVO scheme by including closer pixels in the context to increase the prediction accuracy; their scheme uses multiple histograms and multi-sized prediction contexts to perform embedding. The concept of SVM [24] can also be used for pixel prediction in RDH schemes.

3 Proposed Scheme
In this section, we describe the method in detail. Firstly, the detailed process of the prediction scheme is introduced. Secondly, the calculation of local complexity and fluctuation estimation as a measure of smoothness is presented. Finally, the procedures of embedding and extracting are detailed.

3.1 Calculation of Prediction Error
As shown in Fig. 1, the original image is first divided into two sets X and Y in a checkerboard pattern. Set X is white, while set Y is gray. The pixels of set X are used to predict the pixels of set Y and vice versa. All the pixels of the image are predicted except the boundary pixels. Since the process of prediction is similar for sets X and Y, we take set X as an example to introduce the scheme in detail. For each pixel $P$ of set X, its four nearest neighbors $Y_1, Y_2, Y_3, Y_4$ are used for prediction as shown in Fig. 2.

1. Firstly, the mean $\mu$ of the four neighbors is calculated as follows:

\[ \mu = \frac{Y_1 + Y_2 + Y_3 + Y_4}{4}. \tag{1} \]


Fig. 1 Example to illustrate the division of images in a checkerboard pattern. White pixels belong to set X and gray pixels belong to set Y

Fig. 2 Example to demonstrate the four neighboring pixels of set Y used for prediction and fluctuation calculation of pixels of set X

2. The four means $\mu_i$, $i \in \{1, 2, 3, 4\}$, along the positive and negative diagonals are calculated as follows:

\[ \mu_i = \frac{Y_i + Y_{i+1}}{2}, \quad i \in \{1, 2, 3\}, \tag{2} \]

\[ \mu_4 = \frac{Y_4 + Y_1}{2}. \tag{3} \]

3. The difference between $\mu$ and each $\mu_i$, denoted by $d_i$, is obtained for $i \in \{1, 2, 3, 4\}$. The corresponding weight $w_i$ is calculated as follows:

\[ w_i = \begin{cases} \dfrac{1}{4} & \text{if } \sum_{j=1}^{4} d_j = 0, \\[2ex] \dfrac{\sum_{j=1}^{4} d_j}{1 + d_i} & \text{otherwise.} \end{cases} \tag{4} \]

The calculated weights $w_i$ are normalized to obtain $\omega_i$ for each of the four diagonals around pixel $P$, respectively, such that $\sum_{i=1}^{4} \omega_i = 1$.

4. The predicted value $\hat{P}$ for pixel $P$ is calculated as

\[ \hat{P} = \omega_1 \cdot \mu_1 + \omega_2 \cdot \mu_2 + \omega_3 \cdot \mu_3 + \omega_4 \cdot \mu_4. \tag{5} \]

The prediction error is calculated by

\[ e = P - \hat{P}. \tag{6} \]
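The prediction steps of this subsection can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the differences $d_i$ are assumed to be absolute values, and the predicted value is assumed to be rounded to the nearest integer before the error is taken (the text does not state either choice).

```python
def predict_pixel(y1, y2, y3, y4):
    """Weighted prediction of a pixel from its four neighbours, following Eqs. (1)-(5)."""
    y = [float(y1), float(y2), float(y3), float(y4)]
    mu = sum(y) / 4.0                                          # Eq. (1)
    # Eqs. (2)-(3): (Y1+Y2)/2, (Y2+Y3)/2, (Y3+Y4)/2, (Y4+Y1)/2
    mu_i = [(y[k] + y[(k + 1) % 4]) / 2.0 for k in range(4)]
    d = [abs(mu - m) for m in mu_i]                            # assumption: absolute differences
    total = sum(d)
    if total == 0:
        w = [0.25] * 4                                         # Eq. (4), first case
    else:
        w = [total / (1.0 + dk) for dk in d]                   # Eq. (4), second case
    w_sum = sum(w)
    omega = [wk / w_sum for wk in w]                           # normalise so the weights sum to 1
    return sum(o * m for o, m in zip(omega, mu_i))             # Eq. (5)


def prediction_error(p, y1, y2, y3, y4):
    """Eq. (6), with the prediction rounded half up (an assumption of this sketch)."""
    p_hat = int(predict_pixel(y1, y2, y3, y4) + 0.5)
    return p - p_hat
```

For a perfectly flat neighbourhood all $d_i$ are zero, so the first case of Eq. (4) applies and the prediction equals the common neighbour value.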

3.2 Calculation of Fluctuation Value
Since the process of fluctuation value calculation is the same for both sets X and Y, we take the pixels of set X to illustrate the method of calculating the fluctuation values.

1. The local complexity for each pixel $P$ of set X is calculated using the formula given below:

\[ \pi_P = |Y_1 - m| + |Y_2 - m| + |Y_3 - m| + |Y_4 - m|, \tag{7} \]

where $m = \sum_{i=1}^{4} Y_i / 4$ is the mean of the four neighboring pixels $Y_1$, $Y_2$, $Y_3$, and $Y_4$ of set Y as shown in Fig. 2.

2. The fluctuation value of a pixel gives a measure of its smoothness. It is determined using the local complexity of the pixel along with the average complexity of the alike (belonging to the same set) adjacent pixels. For instance, for pixel $X_1$ with only one alike adjacent pixel in Fig. 1, the fluctuation value $f_{X_1}$ is computed as

\[ f_{X_1} = \pi_{X_1} + \pi_{X_4}. \tag{8} \]

For pixel $X_2$ in Fig. 1 with two alike neighboring pixels, the fluctuation value $f_{X_2}$ is computed as

\[ f_{X_2} = \pi_{X_2} + \left( \frac{\pi_{X_4} + \pi_{X_5}}{2} \right). \tag{9} \]

Similarly, the fluctuation value $f_{X_4}$ for pixel $X_4$ with four alike neighboring pixels is computed as

\[ f_{X_4} = \pi_{X_4} + \left( \frac{\pi_{X_1} + \pi_{X_2} + \pi_{X_7} + \pi_{X_8}}{4} \right). \tag{10} \]
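A minimal sketch of the fluctuation calculation is given below. It assumes that the alike adjacent pixels of a pixel are its diagonal neighbours, which is how same-set pixels are arranged in a checkerboard; the dictionary layout and the function names are hypothetical and chosen only for illustration.

```python
def local_complexity(y1, y2, y3, y4):
    """Eq. (7): sum of absolute deviations of the four neighbours from their mean."""
    m = (y1 + y2 + y3 + y4) / 4.0
    return sum(abs(v - m) for v in (y1, y2, y3, y4))


def fluctuation(pi, r, c):
    """Eqs. (8)-(10): own complexity plus the average complexity of alike neighbours.

    `pi` maps (row, col) -> local complexity for the pixels of one set.
    Assumption: alike neighbours are the diagonal neighbours present in `pi`.
    """
    diagonal = [(r - 1, c - 1), (r - 1, c + 1), (r + 1, c - 1), (r + 1, c + 1)]
    alike = [pi[q] for q in diagonal if q in pi]
    if not alike:
        return pi[(r, c)]
    return pi[(r, c)] + sum(alike) / len(alike)
```

With one, two, or four alike neighbours available, the second term reduces to the forms of Eqs. (8), (9), and (10), respectively.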

3.3 Embedding Procedure
In our scheme, the embedding of data bits is performed using histogram shifting, due to which overflow or underflow may arise for pixels with the boundary intensity values 0 and 255 of a grayscale image. To address this problem, the pixels with intensity value 0 are modified to 1 and those with value 255 are modified to 254. A location map is generated for the image to keep track of the modified pixels: the modified pixels are marked with 1 and the rest with 0. This location map is compressed and embedded in the image as a part of the payload. The detailed embedding process for set X is as follows.

1. Firstly, the two topmost peaks $Pk_1$ and $Pk_2$ of the prediction error histogram (PEH) are identified to embed secret data.

2. The fluctuation values and prediction errors, estimated using the procedures in Sects. 3.1 and 3.2, are obtained in raster scan order. The fluctuation values are then sorted in ascending order to obtain the sequence $f_{X_{i_1}}, f_{X_{i_2}}, \ldots, f_{X_{i_n}}$. The prediction error sequence $e_{f_{X_{i_1}}}, e_{f_{X_{i_2}}}, \ldots, e_{f_{X_{i_n}}}$ is obtained by sorting the prediction errors in the ascending order of the fluctuation values.

3. Half of the payload is embedded in set X. The prediction error sequence $e_{f_{X_{i_1}}}, e_{f_{X_{i_2}}}, \ldots, e_{f_{X_{i_n}}}$ is modified using the following equation:

\[ e'_i = \begin{cases} e_i + b & \text{if } e_i = \max(Pk_1, Pk_2), \\ e_i - b & \text{if } e_i = \min(Pk_1, Pk_2), \\ e_i + 1 & \text{if } e_i > \max(Pk_1, Pk_2), \\ e_i - 1 & \text{if } e_i < \min(Pk_1, Pk_2), \end{cases} \tag{11} \]

where $b \in \{0, 1\}$ is the additional data bit, $Pk_1$ and $Pk_2$ are the two largest peaks, and $e'_i$ is the marked prediction error.

4. The corresponding original pixel value is modified by

\[ P'_i = \hat{P}_i + e'_i, \tag{12} \]

where $\hat{P}_i$ is the predicted value, $e'_i$ is the marked prediction error, and $P'_i$ is the marked pixel value.


Fig. 3 Flowchart of embedding procedure

5. All the pixels of set X are modified according to the above procedure to obtain the marked image of X. The marked image of set X is then used for pixel prediction and fluctuation calculation for set Y. A flowchart of the embedding procedure is shown in Fig. 3.
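The embedding steps above can be illustrated on a plain list of prediction errors. The sketch below is written under several assumptions that the text does not spell out: errors lying strictly between the two peaks are left unchanged, processing stops as soon as the payload has been embedded, and the histogram has at least two distinct values. The function names `two_top_peaks` and `embed` are hypothetical.

```python
from collections import Counter


def two_top_peaks(errors):
    """Return the two most frequent prediction-error values as (low, high)."""
    # Assumes the histogram contains at least two distinct error values.
    (p1, _), (p2, _) = Counter(errors).most_common(2)
    return min(p1, p2), max(p1, p2)


def embed(errors, fluct, bits):
    """Histogram-shifting embedding on prediction errors in the spirit of Eq. (11).

    errors: prediction errors in raster-scan order
    fluct:  fluctuation value of each pixel (same order as errors)
    bits:   payload bits (0 or 1)
    Returns the marked errors, the peaks (low, high), and the number of bits embedded.
    """
    low, high = two_top_peaks(errors)
    # Visit pixels from the smoothest (lowest fluctuation) to the roughest.
    order = sorted(range(len(errors)), key=lambda i: fluct[i])
    marked = list(errors)
    used = 0
    for i in order:
        if used == len(bits):      # assumption: stop once the payload is embedded
            break
        e = errors[i]
        if e == high:
            marked[i] = e + bits[used]
            used += 1
        elif e == low:
            marked[i] = e - bits[used]
            used += 1
        elif e > high:
            marked[i] = e + 1      # shift outward to make room
        elif e < low:
            marked[i] = e - 1
        # errors strictly between the peaks stay unchanged (assumption)
    return marked, (low, high), used
```

The marked pixel then follows from Eq. (12) by adding each marked error to the corresponding predicted value.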

3.4 Extraction and Recovery Procedure
The process of extraction and recovery is the inverse of the process of embedding. The data is first extracted from the marked pixels of set Y, and set Y is recovered. The recovered set Y is then used to extract data from the marked pixels of set X and to recover set X. We consider the marked set Y to illustrate the process in detail.

1. Determine the fluctuation values and marked prediction errors for all the pixels of set Y using the procedures discussed in Sects. 3.1 and 3.2. Since the fluctuation value for set Y is computed using the marked pixels of set X, its value is the same before and after embedding additional data in Y. The fluctuation values and marked prediction errors are first obtained in raster scan order. The fluctuation values are then sorted in ascending order to obtain the sequence $f_{Y_{i_1}}, f_{Y_{i_2}}, \ldots, f_{Y_{i_n}}$. The sequence of marked prediction errors $e'_{f_{Y_{i_1}}}, e'_{f_{Y_{i_2}}}, \ldots, e'_{f_{Y_{i_n}}}$ is obtained by sorting the prediction errors according to the ascending sequence of fluctuation values.

2. The additional data bits are extracted from the sequence $e'_{f_{Y_{i_1}}}, e'_{f_{Y_{i_2}}}, \ldots, e'_{f_{Y_{i_n}}}$ using the following equation:

\[ b = \begin{cases} 0 & \text{if } e'_i = Pk_1 \text{ or } e'_i = Pk_2, \\ 1 & \text{if } e'_i = \min\{Pk_1, Pk_2\} - 1 \text{ or } e'_i = \max\{Pk_1, Pk_2\} + 1, \end{cases} \tag{13} \]

where $b \in \{0, 1\}$ is the extracted data bit, and $Pk_1$ and $Pk_2$ are the peak values used in the process of embedding. The bits are extracted from the sequence until the half of the payload that was embedded in set Y has been extracted.

3. The prediction errors are recovered from the marked prediction errors $e'_{f_{Y_{i_1}}}, e'_{f_{Y_{i_2}}}, \ldots, e'_{f_{Y_{i_n}}}$ using the following equation:

\[ e_i = \begin{cases} e'_i - 1 & \text{if } e'_i = \max\{Pk_1, Pk_2\} + 1, \\ e'_i + 1 & \text{if } e'_i = \min\{Pk_1, Pk_2\} - 1, \\ e'_i & \text{if } e'_i = Pk_1 \text{ or } e'_i = Pk_2, \\ e'_i - 1 & \text{if } e'_i > \max\{Pk_1, Pk_2\}, \\ e'_i + 1 & \text{if } e'_i < \min\{Pk_1, Pk_2\}. \end{cases} \tag{14} \]

The original pixel value $P_i$ is recovered from the obtained prediction error $e_i$ using the equation

\[ P_i = \hat{P}_i + e_i, \tag{15} \]

where $P_i$ is the recovered pixel value, $\hat{P}_i$ is the predicted value of the pixel, and $e_i$ is the recovered prediction error.

4. All the pixels of set Y are recovered, and the same process is followed to extract the remaining half of the payload from set X and to recover the pixels of set X.

5. Lastly, the boundary pixels are recovered using the decompressed location map. For a 1 in the location map, 254 is changed to 255 and 1 is changed to 0. Finally, the whole image is perfectly recovered. A flowchart of the extraction and recovery procedure is shown in Fig. 4.
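The extraction and recovery steps can be sketched as follows, again on a plain list of marked prediction errors. This is an illustrative sketch under stated assumptions: the peaks and the payload length are assumed to be known to the receiver as side information, processing stops once the expected number of bits has been extracted, and the function names `extract` and `restore_boundary` are hypothetical.

```python
def extract(marked, fluct, low, high, n_bits):
    """Recover payload bits and original prediction errors (Eqs. 13 and 14).

    marked: marked prediction errors in raster-scan order
    fluct:  fluctuation values (unchanged by embedding)
    low, high: the two peaks used during embedding (assumed known)
    n_bits: number of payload bits carried by this set (assumed known)
    """
    order = sorted(range(len(marked)), key=lambda i: fluct[i])
    errors = list(marked)
    bits = []
    for i in order:
        if len(bits) == n_bits:    # assumption: stop once the payload is extracted
            break
        e = marked[i]
        if e == high or e == low:
            bits.append(0)         # Eq. (13), b = 0; error is already original
        elif e == high + 1:
            bits.append(1)         # Eq. (13), b = 1
            errors[i] = e - 1
        elif e == low - 1:
            bits.append(1)
            errors[i] = e + 1
        elif e > high:
            errors[i] = e - 1      # undo the outward shift
        elif e < low:
            errors[i] = e + 1
    return bits, errors


def restore_boundary(pixels, location_map):
    """Step 5: undo the 0 -> 1 and 255 -> 254 adjustment where the map holds a 1."""
    restored = []
    for p, flag in zip(pixels, location_map):
        if flag == 1 and p == 254:
            restored.append(255)
        elif flag == 1 and p == 1:
            restored.append(0)
        else:
            restored.append(p)
    return restored
```

Because the fluctuation values do not change during embedding, the receiver visits the errors in the same order as the sender, which is what makes the bit order and the stopping point reproducible.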

Fig. 4 Flowchart of extraction and recovery procedure


4 Experimental Results and Comparative Analysis
In this section, the performance of the proposed method is evaluated and analyzed through several experiments. The experiments have been performed on ten 512 × 512 grayscale test images available in the SIPI image database. We compare the results with four state-of-the-art schemes, those of Li et al. [16] (PVO), Peng et al. [17] (IPVO), Jung [19], and Jia et al. [13], to assess the performance of the proposed method in terms of image visual quality and embedding capacity.

4.1 Comparison of Embedding Capacity
Table 1 lists the maximum capacity that can be embedded by the proposed scheme and by the other state-of-the-art schemes. All the compared schemes are RDH schemes based on PEE. We have chosen a block size of 2 × 2 to implement PVO and IPVO, and a block size of 1 × 3 for Jung [19]. It can be observed from the table that the proposed scheme provides a better embedding capacity than PVO [16], IPVO [17], and Jung [19], and an embedding capacity almost equivalent to that of Jia et al. [13]. The schemes proposed in [16, 17], and [19] provide lower embedding capacities because they perform block-based prediction and use only the maximum and the minimum pixels of the block for embedding, i.e., only 2 bits can be embedded in a block, at its lower and higher ends. In contrast, [13] divides the image in a checkerboard pattern and performs prediction for each pixel using its four neighboring pixels. It modifies the two topmost peaks of the PEH to embed secret data, which results in a higher embedding capacity. Similarly, our proposed scheme also divides the image in a checkerboard pattern but uses different methodologies for prediction and fluctuation calculation. Also, embedding in our scheme is performed in the two topmost peaks of the PEH without considering zero points of the histogram. The scheme achieves a sufficiently large embedding capacity, as can be observed in Table 1.

Table 1 Comparison in terms of embedding capacity (in bits)

Image      PVO [16]   IPVO [17]   Jung [19]   Jia et al. [13]   The proposed scheme
Baboon     13,000     13,000      14,225      21,716            21,619
Airplane   38,000     52,000      46,368      97,449            96,787
Elaine     21,000     25,000      24,082      35,546            35,717
Tiffany    33,000     43,000      38,969      72,158            72,245
Man        28,273     35,000      29,422      68,138            66,292
House      31,040     46,000      37,442      88,281            84,650
Splash     41,496     53,000      47,156      97,430            97,338
Woman      43,752     58,000      50,898      114,652           116,724
Airport    25,135     27,000      26,768      45,737            44,977
Aerial     26,679     32,441      29,531      63,927            61,471

4.2 Comparison of PSNR
The comparative analysis of the image quality of the proposed scheme with PVO [16], IPVO [17], Jung [19], and Jia et al. [13], at capacities varying from 5000 bits to their maximum embedding capacities with a step size of 2000 bits, is shown in Fig. 5. We have chosen a block size of 2 × 2 to implement PVO and IPVO, and 1 × 3 for Jung [19]. Peak signal-to-noise ratio (PSNR) in dB has been used as the measure to quantify the image quality. It can be observed that the proposed scheme offers a better PSNR value than the PVO [16], IPVO [17], and Jung [19] schemes for all of the images. PVO and Jung [19] use blocks to perform embedding; however, these schemes do not apply any technique to select smooth blocks, which would yield a better marked image quality. The PSNR values of Jia et al. [13] and the proposed scheme differ only marginally, and their curves overlap for almost all the images in Fig. 5. However, the difference is noticeable for textured images like Baboon, for which the proposed scheme achieves a better marked image quality (55.02 dB versus 54.72 dB at a payload of 10,000 bits; see Table 2). The PSNR values for the images at a low embedding payload of 10,000 bits are given in Table 2. Compared with [13], the proposed scheme gives an equal or higher PSNR for nine out of ten images at this low embedding payload.
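For reference, PSNR for 8-bit grayscale images is conventionally computed from the mean squared error (MSE) between the original and the marked image as $10 \log_{10}(255^2 / \mathrm{MSE})$. The following is a minimal sketch of that standard definition; the function name and the nested-list image representation are assumptions made for illustration and are not taken from the paper.

```python
import math


def psnr(original, marked, peak=255):
    """PSNR in dB between two equally sized grayscale images given as nested lists."""
    flat_o = [p for row in original for p in row]
    flat_m = [p for row in marked for p in row]
    if len(flat_o) != len(flat_m):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_m)) / len(flat_o)
    if mse == 0:
        return math.inf            # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Since histogram shifting changes each pixel by at most 1, the MSE of a marked image stays small, which is why all values in Table 2 are well above 50 dB.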

5 Conclusion
In this paper, a PEE-based RDH scheme has been proposed that divides the image in a checkerboard pattern into two planes, such that the pixels of one plane are used for pixel prediction and fluctuation calculation of the other plane. Compared with four state-of-the-art schemes [16, 17, 19], and [13], the proposed scheme improves the PSNR value, especially for textured images like Baboon. The scheme also provides a significantly higher embedding capacity than the block-based schemes [16, 17, 19] and a capacity almost equivalent to that of [13].

Fig. 5 Performance comparison of PVO [16], IPVO [17], Jung [19], and Jia et al. [13] with the proposed scheme for test images a Airplane, b Baboon, c Cameraman, d Elaine, e House, f Splash, g Tiffany, h Airport, i Aerial, and j Woman

Table 2 Comparison in terms of PSNR (dB) for PVO [16], IPVO [17], Jung [19], Jia et al. [13], and the proposed scheme at a payload of 10,000 bits

Image      PVO [16]   IPVO [17]   Jung [19]   Jia et al. [13]   The proposed scheme
Baboon     53.84      53.53       51.67       54.72             55.02
Airplane   61.74      60.60       58.63       63.13             63.13
Elaine     56.04      55.91       54.67       57.75             57.75
Tiffany    59.64      59.12       57.05       60.95             60.98
Man        59.18      58.86       55.25       62.57             62.68
House      61.04      63.59       60.37       64.09             64.26
Splash     60.89      60.00       57.47       63.02             63.12
Woman      62.10      60.70       59.24       62.86             62.84
Airport    57.99      57.07       59.94       60.74             60.82
Aerial     59.73      59.50       55.31       62.15             62.18

References
1. Agarwal, N., Singh, A.K., Singh, P.K.: Survey of robust and imperceptible watermarking. Multimedia Tools Appl. 78(7), 8603–8633 (2019). https://doi.org/10.1007/s11042-018-7128-5
2. Qin, C., Ji, P., Zhang, X., Dong, J., Wang, J.: Fragile image watermarking with pixel-wise recovery based on overlapping embedding strategy. Signal Process. 138, 280–293 (2017). https://doi.org/10.1016/j.sigpro.2017.03.033
3. Kadhim, I.J., Premaratne, P., Vial, P.J., Halloran, B.: Comprehensive survey of image steganography: techniques, evaluations, and trends in future research. Neurocomputing 335, 299–326 (2019). https://doi.org/10.1016/j.neucom.2018.06.075
4. Zhang, J., Lu, W., Yin, X., Liu, W., Yeung, Y.: Binary image steganography based on joint distortion measurement. J. Vis. Commun. Image Represent. 58, 600–605 (2019). https://doi.org/10.1016/j.jvcir.2018.12.038
5. Hou, D., Wang, H., Zhang, W., Yu, N.: Reversible data hiding in JPEG image based on DCT frequency and block selection. Signal Process. 148, 41–47 (2018). https://doi.org/10.1016/j.sigpro.2018.02.002
6. Li, F., Mao, Q., Chang, C.-C.: Reversible data hiding scheme based on the Haar discrete wavelet transform and interleaving prediction method. Multimedia Tools Appl. 516, 5149–5168 (2017). https://doi.org/10.1007/s11042-017-4388-4
7. Wang, X., Li, X., Yang, B., Guo, Z.: Efficient generalized integer transform for reversible watermarking. IEEE Signal Process. Lett. (2010). https://doi.org/10.1109/LSP.2010.2046930
8. Tian, J.: Reversible data embedding using a difference expansion. IEEE Trans. Circ. Syst. Video Technol. 13(8), 890–896 (2003). https://doi.org/10.1109/TCSVT.2003.815962
9. Wang, W.: A reversible data hiding algorithm based on bidirectional difference expansion. Multimedia Tools Appl. 79(9–10), 5965–5988 (2020). https://doi.org/10.1007/s11042-019-08255-z
10. Dragoi, I.C., Coltuc, D.: Local-prediction-based difference expansion reversible watermarking. IEEE Trans. Image Process. 23(4), 1779–1790 (2014). https://doi.org/10.1109/TIP.2014.2307482
11. Tsai, H.L.Y.P., Hu, Y.C.: Reversible image hiding scheme using predictive coding and histogram shifting. Signal Process. 89, 1129–1145 (2009)
12. Li, T.Z.X., Li., Yang, B.: General framework to histogram-shifting-based reversible data hiding. IEEE Trans. Image Process. 22, 2181–2191 (2013) [Online]. Available: https://ieeexplore.ieee.org/document/6459018/
13. Jia, Y., Yin, Z., Zhang, X., Luo, Y.: Reversible data hiding based on reducing invalid shifting of pixels in histogram shifting (2019). https://doi.org/10.1016/j.sigpro.2019.05.020
14. Chen, X., Sun, X., Sun, H., Zhou, Z., Zhang, J.: Reversible watermarking method based on asymmetric-histogram shifting of prediction errors. J. Syst. Softw. 86(10), 2620–2626 (2013). https://doi.org/10.1016/j.jss.2013.04.086
15. Li, X., Yang, B., Zeng, T.: Efficient reversible watermarking based on adaptive prediction-error expansion and pixel selection. IEEE Trans. Image Process. 20(12), 3524–3533 (2011). https://doi.org/10.1109/TIP.2011.2150233
16. Li, X., Li, J., Li, B., Yang, B.: High-fidelity reversible data hiding scheme based on pixel-value-ordering and prediction-error expansion. Signal Process. 93(1), 198–205 (2013). https://doi.org/10.1016/j.sigpro.2012.07.025
17. Peng, B.Y.F., Li, X.: Improved PVO-based reversible data hiding. Digital Signal Process. 255–265 (2013) [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1051200413002479
18. Weng, S., Shi, Y.Q., Hong, W., Yao, Y.: Dynamic improved pixel value ordering reversible data hiding. Inf. Sci. 489, 136–154 (2019). https://doi.org/10.1016/j.ins.2019.03.032
19. Jung, K.: A high-capacity reversible data hiding scheme based on sorting and prediction in digital images, pp. 13127–13137 (2017). https://doi.org/10.1007/s11042-016-3739-x
20. Yao, H., Mao, F., Tang, Z., Qin, C.: High-fidelity dual-image reversible data hiding via prediction-error shift. Signal Process. 170 (2020). https://doi.org/10.1016/j.sigpro.2019.107447
21. Xie, X.Z., Chang, C.C., Hu, Y.C.: An adaptive reversible data hiding scheme based on prediction error histogram shifting by exploiting signed-digit representation. Multimedia Tools Appl. 79(33–34), 24329–24346 (2020). https://doi.org/10.1007/s11042-019-08402-6
22. He, W., Cai, Z.: Reversible data hiding based on dual pairwise prediction-error expansion. IEEE Trans. Image Process. 30, 5045–5055 (2021). https://doi.org/10.1109/TIP.2021.3078088
23. Li, S., Hu, L., Sun, C., Chi, L., Li, T., Li, H.: A reversible data hiding algorithm based on prediction error with large amounts of data hiding in spatial domain. IEEE Access 8, 214732–214741 (2020). https://doi.org/10.1109/ACCESS.2020.3040048
24. Qu, X., Kim, H.J.: Pixel-based pixel value ordering predictor for high-fidelity reversible data hiding. Signal Process. 111, 249–260 (2015). https://doi.org/10.1016/j.sigpro.2015.01.002
25. Wu, H., Li, X., Zhao, Y., Ni, R.: Improved PPVO-based high-fidelity reversible data hiding. Signal Process. 167 (2020). https://doi.org/10.1016/j.sigpro.2019.107264

Deriving Insights from COVID-19 Avinash Golande, Shruti Warang, Sakshi Bidwai, Rishika Shinde, and Sakshi Sukale

1 Introduction
The World Health Organization (WHO) declared COVID-19, a disease caused by a virus of the corona group, a global health emergency that poses a threat to the planet. The majority of the nations of the world have seen a massive number of COVID-19 cases from December 2019 onward. COVID-19 is more likely to occur in people who have a weakened immune system, are older, or have lung-related medical problems. The number of persons infected with coronavirus is growing by the day. As of 30th March 2022, Maharashtra has the highest number of COVID-19 cases among Indian states. We aim to improve the prediction model for attack rates and for the number of completed cases in the future, and to determine the point at which the total number of cases and the number of completed cases meet, that is, the time when there will be no new cases of coronavirus in India, under the assumption that the current environmental situation remains the same. Statistical methods for viral illness and the accompanying analytical techniques have become critical inputs in the development of control and prevention measures. Simulations allow us to evaluate different strategies in trial runs before actually implementing them on individuals or groups of individuals. Statistical estimates of this epidemic in India have been attempted by many researchers since the beginning of the COVID-19 cases.

A. Golande (B) · S. Warang · S. Bidwai · R. Shinde · S. Sukale JSPM’S Rajarshi Shahu College of Engineering, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_8


1.1 Related Work
In [1], a sensor platform is deployed for pandemic analysis, prognosis, and COVID-19 recognition. The framework can execute observational, explanatory, analytical, and regulatory studies. An epidemic is predicted using a neural network. Data analysis approaches are employed in this study for objective and systematic analysis, and additional data on the epidemic's numerous manifestations are provided. A forecasting study is then performed using a machine learning model that estimates the disease from distinct ailment criteria. The prescriptive analysis is provided to consumers by comparing the output of the neural network with that of various machine learning algorithms.

In [2], machine learning techniques are used to anticipate the appearance of COVID-19 in a range of nations. A supervised learning model was constructed to forecast the occurrence of COVID-19 transmission across many nations and the expected period when the virus may be eradicated. The outcome shows that the transmission would decrease significantly and would soon end. The study applies a decision tree algorithm to real-time global COVID-19 data. The algorithms used in this work, notably regression trees and logistic extrapolation, are effective models for forecasting sequence and timeline problems. The outcome reveals a 99 percent accuracy rate.

In [3], the goal of the research is to create a framework that can effectively anticipate future outbreak changes over long periods using previous case data, although the LSTM model has certain forecasting issues. Because typical LSTM models forecast with a problematic deviation from the data, the LSTM model is integrated with other models to improve its accuracy. According to the findings, the model is capable of accurately predicting confirmed cases.

In [4], the impact of strategies such as lockdown and isolation on the spread of COVID-19 is studied statistically, and a new statistical model is proposed to predict new events and infected cases in the real world. A new model of the situation created by the spread of COVID-19 is proposed. It is a tree-based model in which some people are quarantined and a few are left unseen (hidden nodes) for a variety of reasons, such as unidentified landmarks and concealed travel history, and these hidden nodes spread the disease to the public. The analysis and the results obtained indicate that no community spread has occurred in India so far, that is, most people are not infected or do not spread the disease due to lockdown or isolation.

In [5], the results of various machine learning approaches are compared in predicting outcomes according to patient risk-factor configurations. A random forest classifier is the most effective algorithm, with an F-beta measure of 0.788 and an accuracy of 0.789. Using the same range of risk markers, such as race, location, indications, and health conditions, the model described in this research may predict the outcome (i.e., death, discharge, or stabilization) for each patient diagnosed with COVID-19.

In [6], a SEIR compartmental model was created, in which the movement of people through the compartments was simulated using a system of simultaneous equations. Several scenarios were each simulated for 1000 runs.
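To make the SEIR idea concrete, the sketch below steps a textbook deterministic SEIR model forward in time with a simple Euler scheme. It is not the model of [6], which is stochastic and run 1000 times per scenario; the function name, the population size, the initial conditions, and the time step are assumptions chosen only for illustration. The basic reproduction number, incubation period, and infectious period used in the usage example (2.28, 5.1, and 7, assumed to be in days) are the values quoted for that study in the survey table of Sect. 2.

```python
def simulate_seir(n, e0, i0, r0_number, incubation_days, infectious_days, days, dt=0.1):
    """Deterministic SEIR model integrated with the Euler method.

    Returns a list of (S, E, I, R) tuples, one per day, starting at day 0.
    """
    sigma = 1.0 / incubation_days      # rate of moving from exposed to infectious
    gamma = 1.0 / infectious_days      # rate of moving from infectious to recovered
    beta = r0_number * gamma           # transmission rate, using R0 = beta / gamma
    s, e, i, r = float(n - e0 - i0), float(e0), float(i0), 0.0
    history = [(s, e, i, r)]
    steps_per_day = int(round(1.0 / dt))
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative run: a hypothetical population of one million with a few initial cases.
history = simulate_seir(1_000_000, 10, 5, 2.28, 5.1, 7, 120)
```

Each step only moves people between compartments, so the total population stays constant; interventions can be explored by lowering the transmission rate for part of the run.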

2 Literature Survey

Year: 2nd April 2020
Paper: COVID-19 outbreak modeling and projections in India
Tech/Algorithm: Geometric progression or series or patterned advance; tree-based model structure
Results: Lockdown in many stages is readable 197,200 cases after 6 days break after 15 days closure. Many COVID-19 patients were found in huge numbers
Accuracy: R-naught denotes the contagiousness of an infectious disease, which is 1.9 in this case. One of the infective nodes infects another in 2.3 days. The rate of recovery is 4 days
Dataset: Web site: World meters, WHO

Year: 31st Dec 2020
Paper: A framework for leveraging big analytics to anticipate pandemics
Tech/Algorithm: Internet of Things-based healthcare framework for analysis
Results: When compared to another machine learning model, a neural network-based model achieves 99 percent accuracy
Accuracy: Machine learning: < 94%; neural network: 99%
Dataset: COVID-19 dataset

Year: 10th April
Paper: The COVID-19 virus outbreak: A machine learning prediction study
Tech/Algorithm: Decision tree algorithm and linear regression model
Results: The results predicted COVID-19 diseases will decrease significantly during the first week of September 2021 when it will end soon
Accuracy: 99%
Dataset: Johns Hopkins University, WHO and Worldometer official Web site

Year: 31st August
Paper: By integrating the LSTM and the Markov technique, we were able to predict and analyze the COVID-19 epidemic trend
Tech/Algorithm: LSTM-Markov model
Results: The LSTM-Markov model can anticipate validated incidents successfully; predictable outcomes would aid in government decision-making in taking appropriate as well as effective action importance in life
Accuracy: More than 60%
Dataset: GitHub repository

Year: 2021
Paper: A stochastic mathematical model of the COVID-19 epidemic in India's health care
Tech/Algorithm: SEIR (susceptible, exposed, infectious, and recovered)
Results: With a rapid NPI center, the total number of cases, hospitalizations, ICU utilization, and mortality rates can be decreased by over 90%
Accuracy: In India, R0 = 2.28, N = 1375.98 million, incubation period = 5.1, infectious period = 7, epidemic development rate = 1.15
Dataset: Web site: World Meters, The Ministry of Health and Family Welfare Web site

Year: 2021
Paper: Patient outcomes predicted using machine learning approach
Tech/Algorithm: Random forest classifier
Results: The model developed in this study can predict the outcome (death, discarded, or stable) in each patient diagnosed with COVID-19 using the same collection of hazardous drugs, such as age, location, indications, and related illnesses
Accuracy: 78%
Dataset: John Hopkins University Center for Systems Science And Engineering (JHU CSSE)

3 Existing Model
• System Architecture: See Fig. 1
• Methodology: See Fig. 2

Fig. 1 Existing proposed architecture

Fig. 2 Existing methodology

4 Proposed Model
• System Architecture: See Fig. 3
• Methodology: See Fig. 4


Fig. 3 Proposed architecture

5 Working Methodology of Tableau
(a) Gather Requirements: As shown in Fig. 4, before we begin any development work, we gather all of the information necessary to develop and customize the dashboard. This is probably the most challenging of the dashboard best practices, because it is difficult to determine what the requirements are. It is therefore suggested to start by making a list of questions and then to look for data that can help answer them. We derived the requirements for this dashboard from existing data in an Excel file. We intended to design a COVID-19 variants dashboard that a user could quickly read to determine how effectively the health sector should operate to reduce the number of COVID-19 cases and deaths.
(b) Make Sketches: Drafting sketches is a must-do Tableau dashboard best practice. Before we start building dashboards, we need to know what they should look like. Sketching a few candidate designs makes it easier to arrive at the layout and chart types we want. Another reason to sketch is to avoid over-cluttering. We used three charts on our dashboard: bar charts, a tree map, and a world map.
(c) Choose Chart Types And Analysis: One of the best practices that many data analysts overlook is chart-type selection. It is vital to choose the right chart types for the views. It is always a good idea to work out what kind of analysis is needed and which kinds of charts will best report the results. For example, how fast do cases rise in each country? Or, which country is most affected by which variant?
(d) Divide The Dashboard Into Sections: One of the most important Tableau dashboard best practices is to divide the dashboard into sections. We need to break things down into bite-sized parts for better analysis and comprehension. Before returning to the drawing board, go back to the questions you want to be answered. We decided to split the dashboard into three sections: bar charts for categorical data, a tree map for grouped data in a hierarchical structure, and a world map for analyzing the global spread of COVID-19 variants.

Deriving Insights from COVID-19


Fig. 4 Proposed methodology

(e) Build your views: Knowing ahead of time which views to add can save a lot of time during development.
I. KPI graphs: One of the best dashboard optimization techniques is to use BANs to highlight the current KPIs so that readers can see the key findings immediately. This can be done quickly by selecting the relevant fields and using the Show Me text table option, or by using the names and values option. Use bar charts as well: showing how a measure compares across locations with a bar chart is another dashboard best practice. To make clear that the bar charts show distinct metrics, label each chart at the top and place it in its own box. To make a horizontal bar chart in Tableau, put the dimension field (Country in this case) on the Rows shelf and the measure (variants and cases) on the Columns shelf.


II. Use a tree map: One of the Tableau dashboard best practices is to use a tree map to display hierarchical data as a group of nested rectangles. Each level of the hierarchy is represented by a colored rectangle (branch) with smaller rectangles (leaves) nested within it.
III. Use world maps: Maps are among the most effective and simple chart types in Tableau. They are effective because they let us decode latitude and longitude combinations almost immediately, revealing similarities between geographic areas that would otherwise be difficult to find.
(f) Bring it all together: Combining the views on the dashboard so that it is both attractive and efficient is the second most significant criterion for a successful dashboard.
(g) Choose a layout and arrangement: This is where the earlier best practices pay off. Placing the views on the dashboard is much easier now that the layout has already been sketched. Building the dashboard as an arrangement of panels is recommended. We use layout containers to construct boxes for our charts before placing them on the dashboard. We used a 1400 × 1000 layout for this dashboard.
(h) Use dashboard actions to increase interactivity: Using dashboard actions to make the dashboard more responsive is one of Tableau's recommended practices.
(i) Include the necessary filters: Appropriate filters help us narrow the data more precisely and reach conclusions more readily.

6 Tools Used
Tableau: It is a dashboard-based data visualization tool that makes it simple to produce dynamic visual analyses. With these dashboards, non-technical analysts and end users can easily turn data into intelligible, interactive visuals, charts, and graphs. Working with the data in this way [7] makes it easier to leverage data from public sources, resulting in fresh and helpful insights.
Steps:
• Gather Document Requirements
• Make Sketches
• Choose Chart Types and Analysis
• Divide the Dashboard into Sections
• Build Your Views
• KPI Charts
• Use Bar Charts
• Use Line Charts
• Use Gauge Charts
• Bring It All Together
• Select a Layout and Arrangement
• Make Some Formatting Changes
• Add Interactivity Using Dashboard Actions

7 Experimental Results
See Fig. 5.
COVID-19 Data Analytics With Variants
Software: Tableau Public version
Data Input: Corona dataset from Kaggle

Fig. 5 Tableau screenshot


• Stepwise Dashboard Explanation
– Layout: The dashboard is divided into three main charts.
– Header Part: A horizontal header container holds the logo, title, total countries, total dates, and total variants, as shown in Fig. 5.
– Chart 1: A country-wise number-of-sequences chart, drawn as a horizontal bar chart. Countries are placed on the y-axis, and the aggregate number of sequences over 43 days is placed on the x-axis.
– Chart 2: A country-wise total-number-of-sequences chart, drawn as a tree map. The size and color density of each rectangle are set by the number sequences total column.
– Chart 3: A country-wise percentage-of-sequences chart, drawn as a world map. The color density of each country, region, or state is set by the percentage sequences column.
• Results of Every Chart
– Chart 1: The USA and the UK are the two countries most affected by the various variants of COVID-19, whereas countries such as Jamaica, Togo, and Kazakhstan were among the least affected.
– Chart 2: Again, the USA, UK, Germany, and Denmark, among others, are at the top for total number of sequences, while Belize, Monaco, and Moldova, among others, are at the bottom.
– Chart 3: This chart gives the big picture of the percentage sequences, that is, the share of a nation's population affected by different variants of COVID-19, expressed as a percentage. The USA, Canada, and Russia are at the top, while Thailand, Iraq, and Iran, among others, are at the bottom.

8 Techniques
• IoT-based healthcare framework for analysis: An Internet of Things healthcare architecture is developed to assess, anticipate, and identify the novel pandemic. We examined several types of data analysis methods (descriptive, diagnostic, predictive, and prescriptive) and offered professionals insights and foresight into the epidemic.
• Decision tree algorithm: It is a supervised learning method that can be used for both classification and regression problems, although it is most often used for classification tasks. In this tree-structured classifier, internal nodes represent the attributes of the dataset, branches represent the decision rules, and each leaf node gives the result.


• Linear regression: The name comes from the fact that the linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x). Because the relationship is linear, the model describes how the value of the dependent variable changes as the values of the independent variables change.
• LSTM-Markov model: The LSTM model is an improved form of the recurrent neural network (RNN) and is now widely used in domains such as handwritten character recognition, banking, and biochemical engineering. An input layer, hidden layers, and an output layer make up the model [8]. After passing through the input layer, the input data reach the hidden layers. The hidden layers are the most complex part and may consist of many layers. Each LSTM hidden layer is formed by three gate units and one memory unit. As the input data pass through the three gate units and the memory unit in turn, the valid information is kept in the memory unit while the invalid information is discarded, which allows upcoming values to be forecast.
• Random forest classifier: It is a classification algorithm that trains many decision trees on subsets of the dataset and averages their predictions to improve predictive accuracy. The more trees there are, the more accurate the result tends to be, and overfitting becomes less of a concern.
• Arithmetic progression: In an arithmetic progression (AP), the difference between any two consecutive terms has a fixed value, called the common difference. It is also called an arithmetic sequence [9]. For example, the common difference between successive terms is 2 both in the sequence of odd numbers and in the sequence of even numbers.
• SEIR model: The SEIR model splits the population into four compartments: susceptible, exposed, infected, and recovered. The population consists of the following four groups:
  • S is the fraction of susceptible individuals (those able to contract the disease),
  • E is the fraction of exposed individuals (those infected but not yet infectious),
  • I is the fraction of infective individuals (those capable of transmitting the disease),
  • R is the fraction of recovered individuals (those who have become immune).
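The SEIR compartments listed above can be illustrated with a small simulation. The following is only a minimal sketch, assuming the standard SEIR rate equations integrated with forward Euler steps; the function name `simulate_seir` and the values of `beta`, `sigma`, and `gamma` are hypothetical and are not fitted to any COVID-19 dataset.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate the SEIR equations with forward Euler steps.

    beta  : transmission rate (illustrative value supplied by the caller)
    sigma : rate at which exposed individuals become infectious
    gamma : recovery rate
    All compartments are fractions of the population and sum to 1.
    """
    s, e, i, r = s0, e0, i0, r0
    history = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_exposed = beta * s * i * dt      # S -> E
        new_infectious = sigma * e * dt      # E -> I
        new_recovered = gamma * i * dt       # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((step * dt, s, e, i, r))
    return history


# Illustrative parameters only; they are not fitted to any dataset.
history = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                        s0=0.99, e0=0.01, i0=0.0, r0=0.0, days=160)
peak_day, peak_infected = max(((t, i) for t, _, _, i, _ in history),
                              key=lambda pair: pair[1])
print(f"peak infected fraction {peak_infected:.3f} on day {peak_day:.0f}")
```

Because every transfer leaves one compartment and enters the next, the four fractions keep summing to 1 throughout the run, which is a quick sanity check on any SEIR implementation.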


9 Conclusion
We surveyed the literature and carried out analyses to obtain an overview of the COVID-19 situation and its outbreak in India as well as in other countries. We have also visualized how vaccination has helped in reducing the number of patients.

References
1. Ahmed, I., Ahmad, M., et al.: A framework for pandemic prediction using big data analytics. Big Data Res. 25 (2021)
2. Malki, Z., Atlam, E.S., Mohamed, A.A.: The COVID-19 pandemic: prediction study based on machine learning models. Environ. Sci. Pollut. Res. (2021)
3. Ma, R., Zheng, X., Wang, P.: The prediction and analysis of COVID-19 epidemic trend by combining LSTM and Markov method. Sci. Rep. (2021)
4. Arti M.K.: Modelling and Predictions for COVID-19 Spread in India. IEEE and Kushagra Bhatnagar (2020)
5. Alnazzawi, N.: Using machine learning techniques to predict COVID-19 patient outcomes. J. King Abdulaziz Univ. Comput. Inf. Technol. Sci. (2021)
6. Chatterjee, K., Chatterjee, K.: Healthcare impact of COVID-19 epidemic in India: a stochastic mathematical model. Med. J. Armed Forces (2020)
7. Nikhat A., Yusuf, P.: Data analytics and visualization using Tableau utilitarian for COVID-19 (Coronavirus). Global J. Eng. Technol. Adv. (2020)
8. Gadhvi, N.K.: Statistical analyses of COVID-19 cases in India. J. Infect. Dis. Epidemiol. (2020)
9. Kotwal, A., Yadav, A.K., Yadav, J.: Predictive models of COVID-19 in India: a rapid review. Med. J. Armed Forces (2020)

Smoke Detection in Forest Using Deep Learning
G. Sankara Narayanan and B. A. Sabarish

G. Sankara Narayanan · B. A. Sabarish, Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_9

1 Introduction
Early detection of forest fires is very important when it comes to containing the damage caused to the environment. Forest fires can cause a great amount of economic loss, and depending on their place of origin, they can also affect the livelihood of people. In recent years, forest fires caused by human activities have seen a major increase [1]. One of the methods used to detect fires in forests is a lookout station [2]. These lookout stations are essentially buildings built on mountain tops. They are often built in remote locations, and the fire watcher is expected to work with minimal to no supervision. However, these lookout stations are prone to blind spots and to human error due to the lack of supervision. Another existing method of detecting fire uses CO2 sensors. These sensors have high response times and can be triggered by non-fire-related increases in CO2 levels, which makes them inefficient and slow. Due to the shortcomings of the existing methods, exploring other methods of detection is essential. One such method is to use image processing with machine learning and deep learning techniques to detect smoke and fire in forests. From [3, 4], it is clear that deep neural networks (DNNs) are better than traditional classifiers for image classification. For multiclass image classification in particular, DNNs are preferred due to the importance of features like color, shape, and size [5]. In traditional machine learning, once feature extraction is complete, principal component analysis is done to filter out the relevant features. Deep learning models, in contrast, automatically extract features according to their architecture and use all the information gained in the decision-making process. The second section of this paper covers the literature survey of related works; the third section explains the different models and their architectures; the results are consolidated in the fourth section; and the fifth section contains the conclusion and future scope of the project.

2 Literature Survey
In [6], a detailed comparison of the existing early detection methods using remote optical sensing is carried out. The methods discussed are divided into three major categories: terrestrial, unmanned aerial vehicle (UAV), and spaceborne systems. In traditional terrestrial methods, optical sensors and cameras are used to extract color-based physical features. The problem with color-based classification systems is a high number of false alarms. These optical sensors can also be used in a deep learning system where feature extraction is done internally, which reduces the false alarm rate because the feature representations are better understood. UAVs are used to monitor forest cover from above the ground and to eliminate blind spots, which are a notable limitation of the terrestrial method. UAVs can cover more than one area due to their ability to move, and UAVs with GPS can detect smoke and fire more precisely. These systems are still prone to false alarms due to clouds and reflection of sunlight. Satellite imaging is used in the spaceborne systems. The satellites used are differentiated with respect to their orbit. Due to the high communication latency of GEO and SSO satellites, only low earth orbit satellites can be used for real-time fire detection.
In [7], an early smoke detection framework is proposed where support vector machines are used to detect and segment smoke in an image. Superpixel segmentation and superpixel merging are done to achieve this. The algorithm finds sets of adjacent pixels with similar characteristics and groups them together to form a superpixel. Once the superpixel blocks are formed, the algorithm classifies each block as smoke or non-smoke using SVM classifiers. SVMs are used because of the small sample size available and their ability to restrict over-learning.
In [8], the framework uses two convolutional networks side by side to classify an image as smoke or non-smoke. The RGB image is given to a ResNet model, and its corresponding dark channel image is given to a separate dark channel network. In this study, the feature maps of both networks are compared to find that the dark channel network extracts effective features that resemble the original image. The proposed framework is then evaluated against existing deep CNNs.
In [9], ensemble learning is used with three deep learners in parallel to detect fire. YOLOv5 and EfficientDet are used to detect fire in an image, but fire-like images affect their performance. To overcome this, a third deep learner is added: EfficientNet is used to classify between fire and non-fire images. A decision is made based on the results from all three learners. The use of object detectors like YOLOv5 and EfficientDet increases the false-positive rate, which is then offset by the EfficientNet classifier. This system detects and classifies only fire and non-fire images.
In [10], fire detection using SVM classifiers and CIFAR10 networks is compared. Due to the lack of data on forest fires, patch detection is incorporated. First, the model detects whether there are fire patches in the given image. If there are, the up-sampled image is given to the fire patch detector to find the precise location. This method of fire detection is better suited to dynamic surveillance.
In [11], the YOLOv3 architecture is used to detect fire and smoke in a given image. The image is acquired from a UAV; then, using cloud computing, the data are transferred to the ground station. The UAV has a local detection unit, which triggers the data transfer to the ground station where the fire diagnosis takes place. The gap in this study is that the proposed system works well for large forest fires, whereas its performance on small fire spots needs improvement.
In [12], Faster R-CNN is used with AlexNet, VGG16, and ResNet architectures. Using transfer learning, the weights of the pretrained models are modified to classify two classes: fire and non-fire images. Because R-CNN is used, ROIs are acquired and compared to the ground truth to form the training data, which is then used to train the models. Once the detection and localization of the fire object is done using transfer learning, spatial analysis using LDS models is carried out, and the result is fed into a VLAD encoder. The final classification is done with an SVM classifier.
In [13], different types of SqueezeNet models are used to detect smoke in an image. The models are compared for different learning rates and batch sizes. They are also compared with other deeper neural networks: VGG16, AlexNet, ShuffleNet, MobileNet, and Xception. The average runtimes of all these models are compared to find that the SqueezeNet models are much faster. The SqueezeNet models maintain high accuracy even in complex environments and with a large difference between the training and testing sets.
In [14], deep belief networks (DBNs) are used to detect smoke. The approach is compared with two existing smoke detection methods, the Toreyin and Zhao methods. Classification in this model is done using dynamic features of smoke like motion, energy, and flicker disorder. In [15], a quadcopter is designed with an IR camera module which can be used by firefighters during rescue operations. The quadcopter acts as a UAV that transmits live data to the base station. This paper focuses more on the operation of the drone than on the detection of fire.
In [16], a light CNN architecture is used to localize fire in a given image. Although fire in high-resolution images is localized with high accuracy, localization in low-resolution frames is less accurate. With the use of a Gaussian probability threshold method, the accuracy of localization in low-resolution images is improved. In [17], the image classification performance of two deep learning models is compared. The two models use a light CNN architecture and a VGG-like architecture, respectively. The inference from the paper is that the VGG-like architecture gives better performance, but at the cost of higher computation time.
In [18], the proposed model is targeted toward stationary surveillance for fire. The RGB color model is used to detect fire pixels using dynamic and chromatic disorder analysis. In [19], the YCbCr color space model is used to detect fire. The proposed system is aimed at replacing the conventional electronic sensors used to detect fires in smart buildings (Table 1).


3 Proposed Work
3.1 Dataset
The dataset used contains 377 images divided into four classes: No Smoke, Low Smoke, High Smoke, and Fire. The average pixel intensity of each smoke image is computed and normalized using min-max normalization. Based on the normalized value, the smoke images are placed in the High Smoke or Low Smoke class. The images are then resized to a resolution of 256 × 256 pixels; 80% of the images are used as training data, and the remaining 20% as validation data. For evaluation, images from different environments are collected to check the reliability of the model. The testing set contains 45 images.
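The dataset preparation described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the threshold used to separate the two smoke classes or say which direction of intensity means denser smoke, so the `threshold=0.5` value, the "higher intensity means High Smoke" rule, the sample intensities, and all function names are hypothetical.

```python
import random


def min_max_normalize(values):
    """Scale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # avoid division by zero for constant input
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def label_smoke(mean_intensities, threshold=0.5):
    """Assign 'High Smoke' or 'Low Smoke' from the normalized mean intensity.

    The 0.5 threshold and the direction of the rule are assumptions.
    """
    normalized = min_max_normalize(mean_intensities)
    return ["High Smoke" if n >= threshold else "Low Smoke" for n in normalized]


def train_val_split(items, train_fraction=0.8, seed=0):
    """Shuffle and split items into training and validation subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up average pixel intensities (0-255) for five smoke images.
intensities = [60.0, 95.0, 130.0, 180.0, 210.0]
labels = label_smoke(intensities)
train, val = train_val_split(range(377))   # 377 images, as in the dataset
print(labels, len(train), len(val))
```

With 377 images, an 80/20 split of this kind yields 301 training and 76 validation images.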

3.2 CNN Model
The input image after preprocessing goes through two convolutional layers of 50 and 125 filters, respectively. The two convolutional layers are stacked together to extract high-level features. Both have a kernel size of 3 × 3 and a stride of 1 [20]. This stack is followed by a max pooling layer, which reduces the spatial size of the representation and the number of parameters, and thereby reduces the computing time significantly [21]. A dropout layer that drops 25% of the input units is added to regularize the model and prevent it from overfitting the data. The last three layers are then repeated in the same order. A flatten operation converts the feature maps into a one-dimensional vector, which is followed by a dense layer with 500 output neurons and a ReLU activation function. 40% of the units are dropped, and the remainder are fed into another dense layer with 250 output neurons and a ReLU activation function. Right before the output layer, 30% of the units are dropped. A 50% dropout is widely used, but to avoid loss of information, we use 40% and 30% dropouts. The remaining units are fed into a final dense layer with 4 output neurons and softmax as the activation function, since the model classifies four classes [22]. The model layout can be seen in Fig. 1, and the summary in Fig. 2.
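To make the effect of the first convolutional stack and the pooling layer concrete, the sketch below traces the feature-map size and the number of trainable parameters through those three layers. It is a worked calculation, not the authors' code: the RGB (3-channel) input, the absence of padding, and the 2 × 2 pooling window with stride 2 are assumptions, since the text does not state them, and the helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Trainable weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


# Assumptions: 256 x 256 RGB input, no padding, 2 x 2 max pooling with stride 2.
size, channels = 256, 3
layers = []
for filters in (50, 125):                       # the two stacked 3 x 3 convolutions
    params = conv_params(3, channels, filters)
    size = conv_output_size(size, kernel=3)
    channels = filters
    layers.append((f"conv3x3-{filters}", size, channels, params))

size = conv_output_size(size, kernel=2, stride=2)   # max pooling has no weights
layers.append(("maxpool2x2", size, channels, 0))

for name, out_size, out_channels, params in layers:
    print(f"{name:12s} {out_size}x{out_size}x{out_channels}  params={params}")
```

Under these assumptions the pooling layer halves each spatial dimension (252 to 126), so every later layer sees a quarter as many positions, which is where the saving in computation comes from.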

Fig. 1 CNN model layout

3.3 Inception V3 Architecture
The Inception architecture is chosen due to its ability to extract high-level features of different sizes. Smoke and fire do not have a specific size or shape, which makes them challenging for a neural network to learn. This challenge is addressed with the use of the Inception architecture. An inception block uses three different convolutional layers with different filter sizes. The 1 × 1 convolutional layer is used to extract small features, and the 3 × 3 and 5 × 5 convolutional layers are used to extract large features. In the Inception V3 architecture, three types of inception blocks are used [24]. The convolutional layers are factorized to reduce the number of parameters and the computational time. In inception block A, the 5 × 5 convolutional layer is replaced with two 3 × 3 convolutional layers, which reduces the parameters by 28%. In inception blocks B and C, asymmetric factorization is implemented, where the 3 × 3 and 7 × 7 convolutional layers are replaced with 1 × 3, 3 × 1 and 1 × 7, 7 × 1 layers, respectively. In addition to these changes, the V3 architecture has an auxiliary classifier whose loss is added to the final classifier loss to overcome the vanishing gradient problem (Figs. 3 and 4). Due to these changes, the Inception V3 model is deeper, more efficient, and computationally less expensive than the previous V1 and V2 models (Fig. 5).
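The 28% figure quoted above follows directly from counting kernel weights. The short calculation below reproduces it and applies the same counting to the asymmetric factorizations; it assumes the number of input and output channels is the same before and after factorization and ignores biases, and the helper names are illustrative.

```python
def weights(kernel_h, kernel_w):
    """Weights per input/output channel pair for one convolution kernel."""
    return kernel_h * kernel_w


def reduction(original, factorized):
    """Fractional saving when `original` weights are replaced by `factorized`."""
    return 1 - factorized / original


# Block A: one 5 x 5 kernel replaced by two stacked 3 x 3 kernels.
block_a = reduction(weights(5, 5), 2 * weights(3, 3))

# Asymmetric factorization: n x n replaced by 1 x n followed by n x 1.
asym_3 = reduction(weights(3, 3), weights(1, 3) + weights(3, 1))
asym_7 = reduction(weights(7, 7), weights(1, 7) + weights(7, 1))

print(f"5x5 -> two 3x3 : {block_a:.0%}")
print(f"3x3 -> 1x3+3x1 : {asym_3:.0%}")
print(f"7x7 -> 1x7+7x1 : {asym_7:.0%}")
```

A 5 × 5 kernel has 25 weights while two 3 × 3 kernels have 18, hence the 28% saving; the saving from asymmetric factorization grows with the kernel size.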

Fig. 2 CNN model summary
Fig. 3 Inception V3 architecture compressed [23]
Fig. 4 Inception blocks A, B, and C (Left to Right) [24]

3.4 Proposed Model
The Inception V3 model has been pretrained on the ImageNet dataset for 1000 classes. For the model to work optimally on the fire and smoke task, the pretrained model should be modified. These modifications can be done by adding suitable layers to the fully connected layer of the existing model. In the proposed system, the existing fully connected layer is replaced with two dense layers with ReLU activation. The convolutional layers extract distinguishable features, and the fully connected layers classify these features. Therefore, an increase in the number of dense layers increases the model's decision-making capability in classifying the extracted features. The two added dense layers have 2048 and 1024 output neurons, respectively. One dropout layer is added after each of the dense layers to regularize the model and avoid overfitting. Since the dense layers take 1D input, a global average pooling layer is added to connect the output of the Inception model to the input of the dense layers. Global average pooling is used instead of the flatten method to reduce the number of parameters [25], which in turn reduces the computation time and prevents overfitting (Fig. 6). As mentioned earlier, since the model classifies four classes, a softmax activation is used in the output layer for the final classification.
Now that the model is modified, the next step is to train it on the fire and smoke dataset. As part of transfer learning, not all the available layers of the Inception model are retrained. The layers in the first two inception blocks are frozen, and the rest of the layers, along with the added layers, are trained on the fire and smoke dataset. Only the last inception blocks are trained because the fire and smoke dataset is much smaller than the ImageNet dataset. By freezing the layers, their weights are left untouched during training. This helps in initial feature extraction. After freezing the initial layers, the model is trained on the fire and smoke dataset with the Adam optimizer and categorical cross-entropy as the loss function. Categorical cross-entropy is well suited to multiclass classification, and the Adam optimizer is used due to its faster computation time [26], lower memory usage, and efficiency compared with other optimizers.
In the proposed system, the modified model can be used in a surveillance device for live monitoring of fire and smoke in forest cover. Many such surveillance devices can be grouped together to form a network, which can then be controlled and monitored from a control hub.

Fig. 5 Proposed system
Fig. 6 Layers added to the Inception V3 model
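The parameter saving from global average pooling mentioned in this section can be checked with a short calculation. The sketch below compares the size of the first added dense layer (2048 units, as in the text) when it is fed by a flatten operation versus global average pooling. The 8 × 8 × 2048 feature map is an assumption: it is the size Inception V3 produces for its standard 299 × 299 input, and a different input resolution changes the spatial size; the helper name is hypothetical.

```python
def dense_params(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs


# Assumed final feature map: 8 x 8 x 2048, the size Inception V3 produces for
# its standard 299 x 299 input; other input resolutions change height/width.
height, width, channels = 8, 8, 2048
first_dense_units = 2048            # first added dense layer, as in the text

flatten_inputs = height * width * channels   # flatten keeps every position
gap_inputs = channels                        # global average pooling: one value per channel

flatten_params = dense_params(flatten_inputs, first_dense_units)
gap_params = dense_params(gap_inputs, first_dense_units)

print(f"flatten + dense: {flatten_params:,} parameters")
print(f"GAP + dense    : {gap_params:,} parameters")
```

Under this assumption the pooled variant needs roughly 64 times fewer weights in that layer, which is consistent with the stated motivation of reducing computation and the risk of overfitting on a small dataset.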

4 Results and Discussion
From Tables 2 and 3, it can be inferred that the CNN model has a bias toward the fire class. This is due to the lack of depth in the model. Since fire and smoke do not have a specific shape or size, the model gives more weight to the color-based features of the image. Therefore, along with fire images, fire-like images are also classified as fire. Other smoke-class images are classified as the no smoke class because the model cannot extract sufficiently distinguishable features.
From Tables 4 and 5, the Inception model performs better than the CNN model, but the bias has shifted toward the high smoke and no smoke classes. In this model, only the last classification layer was changed, so that it predicts 4 classes instead of 1000. Although the performance is better than that of the CNN model, it can be improved using the proposed modifications.
The classification report and the confusion matrix of the proposed model can be seen in Tables 6 and 7, respectively. The performance has increased compared to the unmodified Inception V3 model. It is also worth noting that the false-negative rate drops to 2.2% for the proposed model, compared to 15.5% for the Inception V3 model and 24.4% for the CNN model (Table 8).

Table 1 Literature summary
Method        Findings
Terrestrial   • Often treated as a binary classification problem
              • Different deep learning architectures, classifiers and color models are used to classify and detect smoke and fire
              • Classification and detection are incorporated as part of transfer learning and ensemble learning
              • Focused more on stationary surveillance
UAV           • Method of acquiring the images is the major change
              • Follows the same principles in classification and detection as seen in the terrestrial method
              • Focused more on dynamic surveillance

Table 2 Classification report of CNN model
Metric      Value
Accuracy    0.5625
Precision   0.5625
Recall      0.5625
F1-score    0.5625

Table 3 Confusion matrix of CNN model
        No   Low   High   Fire
No       6     0      1      5
Low      4     4      2      0
High     7     0      4      2
Fire     0     0      0     13

Table 4 Classification report of Inception V3 model
Metric      Value
Accuracy    0.6667
Precision   0.6667
Recall      0.6667
F1-score    0.6667

Table 5 Confusion matrix of Inception V3 model
        No   Low   High   Fire
No      12     0      0      0
Low      5     2      3      0
High     0     0     12      1
Fire     2     0      5      6

Table 6 Classification report of proposed model
Metric      Value
Accuracy    0.79167
Precision   0.79167
Recall      0.79167
F1-score    0.79167

Table 7 Confusion matrix of proposed model
        No   Low   High   Fire
No      10     0      0      2
Low      1     5      2      2
High     0     0     10      3
Fire     0     0      0     13

Table 8 Comparison of the three models
Metric      CNN      Inception V3   Proposed model
Accuracy    0.5625   0.6667         0.79167
Precision   0.5625   0.6667         0.79167
Recall      0.5625   0.6667         0.79167
F1-score    0.5625   0.6667         0.79167
FN rate     24.4%    15.5%          2.2%
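The accuracy values in Table 8 can be recomputed from the confusion matrices in Tables 3, 5, and 7, as the sketch below shows. It reads rows as actual classes and columns as predicted classes; the tables do not label their axes, so this orientation is an assumption, chosen because it matches the discussion of the CNN results. The function names are illustrative.

```python
CLASSES = ["No", "Low", "High", "Fire"]

# Confusion matrices from Tables 3, 5, and 7. Rows are read here as actual
# classes and columns as predicted classes (an assumption; the tables do not
# label their axes, but this reading matches the discussion in the text).
MATRICES = {
    "CNN": [[6, 0, 1, 5], [4, 4, 2, 0], [7, 0, 4, 2], [0, 0, 0, 13]],
    "Inception V3": [[12, 0, 0, 0], [5, 2, 3, 0], [0, 0, 12, 1], [2, 0, 5, 6]],
    "Proposed": [[10, 0, 0, 2], [1, 5, 2, 2], [0, 0, 10, 3], [0, 0, 0, 13]],
}


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def missed_detections(matrix, negative_index=0):
    """Smoke or fire images predicted as the negative ('No') class."""
    return sum(row[negative_index]
               for k, row in enumerate(matrix) if k != negative_index)


results = {name: (accuracy(m), missed_detections(m)) for name, m in MATRICES.items()}
for name, (acc, missed) in results.items():
    print(f"{name:13s} accuracy={acc:.4f} missed={missed}")
```

In a single-label multiclass setting, micro-averaged precision and recall both equal accuracy, which would explain why the four metrics coincide in Tables 2, 4, and 6. Under the assumed orientation, the missed-detection counts are 11, 7, and 1; dividing them by the 45 test images stated in Sect. 3.1 gives approximately the false-negative rates reported in Table 8.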

5 Conclusion
The proposed model has the best performance of the three models tested. The changes made to the Inception architecture have directly impacted the false-negative rate. The false-negative predictions of the Inception V3 model were due to overfitting of the training data; when the dataset is small, models tend to overfit. As a result of the targeted modifications made to regularize the model for the fire and smoke dataset, the accuracy has increased to 79% with a false-negative rate of 2.2%. Due to its low false-negative rate, the proposed system can be used for the stationary surveillance of forest cover. As part of the future work of this project, localization of fire using the information gained from the feature maps of the convolutional layers in the proposed model is being explored. This method will avoid the use of a separate model for localization of fire, which reduces the computation time and cost significantly.

References 1. Juárez-Orozco, S. M., Siebe, C., Fernández y Fernández, D.: Causes and effects of forest fires in tropical rainforests: a bibliometric approach. Tropical Conservation Science 10: 1940082917737207 (2017) 2. Kucuk, O., Topaloglu, O., Altunel, A.O., Cetin, M.: Visibility analysis of fire lookout towers in the Boyabat State Forest Enterprise in Turkey. Environ. Monit. Assess. 189(7), 1–18 (2017) 3. Xin, M., Wang, Y.: Research on image classification model based on deep convolution neural network. EURASIP J. Image Video Process. 2019(1), 1–11 (2019) 4. Lorente, Ò., Riera, I., Rana, A.: Image classification with classic and deep learning techniques. arXiv:2105.04895 (2021)

Smoke Detection in Forest Using Deep Learning

105

5. Abu, M.A., Indra, N.H., Rahman, A.H.A., Sapiee, N.A., Ahmad, I.: A study on image classification based on deep learning and Tensorflow. Int. J. Eng. Res. Technol. 12(4), 563–569 (2019)
6. Barmpoutis, P., Papaioannou, P., Dimitropoulos, K., Grammalidis, N.: A review on early forest fire detection systems using optical remote sensing. Sensors 20(22), 6442 (2020)
7. Xiong, D., Yan, L.: Early smoke detection of forest fires based on SVM image segmentation. J. For. Sci. 65(4), 150–159 (2019)
8. Liu, Y., Qin, W., Liu, K., Zhang, F., Xiao, Z.: A dual convolution network using dark channel prior for image smoke classification. IEEE Access 7, 60697–60706 (2019)
9. Xu, R., Lin, H., Lu, K., Cao, L., Liu, Y.: A forest fire detection system based on ensemble learning. Forests 12(2), 217 (2021)
10. Zhang, Q., Xu, J., Xu, L., Guo, H.: Deep convolutional neural networks for forest fire detection. In: Proceedings of the 2016 International Forum on Management, Education and Information Technology Application. Atlantis Press (2016)
11. Jiao, Z., Zhang, Y., Xin, J., Mu, L., Yi, Y., Liu, H., Liu, D.: A deep learning based forest fire detection approach using UAV and YOLOv3. In: 2019 1st International Conference on Industrial Artificial Intelligence (IAI), pp. 1–5. IEEE (2019)
12. Barmpoutis, P., Dimitropoulos, K., Kaza, K., Grammalidis, N.: Fire detection from images using faster R-CNN and multidimensional texture analysis. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8301–8305. IEEE (2019)
13. Peng, Y., Wang, Y.: Real-time forest smoke detection using hand-designed features and deep learning. Comput. Electron. Agric. 167, 105029 (2019)
14. Kaabi, R., Sayadi, M., Bouchouicha, M., Fnaiech, F., Moreau, E., Ginoux, J.M.: Early smoke detection of forest wildfire video using deep belief network. In: 2018 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), pp. 1–6. IEEE (2018)
15. Divan, A., Kumar, A.S., Kumar, A.J., Jain, A., Ravishankar, S.: Fire detection using quadcopter. In: 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 1–5. IEEE (2018)
16. Kumar, S., Parameswaran, L., Oruganti, V.R.M.: Real-time building fire detection and segmentation in video using convolutional neural networks with gaussian threshold approach (2022)
17. Saiharsha, B., Diwakar, B., Karthika, R., Ganesan, M.: Evaluating performance of deep learning architectures for image classification. In: 2020 5th International Conference on Communication and Electronics Systems (ICCES), pp. 917–922. IEEE (2020)
18. Srishilesh, P.S., Parameswaran, L., Sanjay Tharagesh, R.S., Thangavel, S.K., Sridhar, P.: Dynamic and chromatic analysis for fire detection and alarm raising using real-time video analysis. In: International Conference on Computational Vision and Bio Inspired Computing, pp. 788–797. Springer, Cham (2019)
19. Sridhar, P., Parameswaran, L., Thangavel, S.K.: An efficient rule based algorithm for fire detection on real time videos. J. Comput. Theor. Nanosci. 17(1), 308–315 (2020)
20. Murphy, J.: An overview of convolutional neural network architectures for deep learning. Microway Inc 1–22 (2016)
21. Scherer, D., Müller, A., Behnke, S.: Evaluation of pooling operations in convolutional architectures for object recognition. In: International Conference on Artificial Neural Networks, pp. 92–101. Springer, Berlin, Heidelberg (2010)
22. Nwankpa, C., Ijomah, W., Gachagan, A., Marshall, S.: Activation functions: comparison of trends in practice and research for deep learning (2018). arXiv:1811.03378
23. Mahdianpari, M., Salehi, B., Rezaee, M., Mohammadimanesh, F., Zhang, Y.: Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery. Remote Sens. 10(7), 1119 (2018)
24. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
25. Lin, M., Chen, Q., Yan, S.: Network in network (2013). arXiv:1312.4400
26. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2014). arXiv:1412.6980

Pneumothorax Segmentation Using Feature Pyramid Network and MobileNet Encoder Through Radiography Images

Ayush Singh, Gaurav Srivastava, and Nitesh Pradhan

1 Introduction

Medical imaging incorporates methods and techniques for understanding medical images with the aid of algorithms. It plays a crucial role in the classification and treatment of diseases [1, 2]. One of the major techniques used in medical imaging is image segmentation, which breaks an image down into smaller segments based on parameters ranging from textures to shapes. It has been an area of active research with use cases in autonomous flight and navigation, satellite imaging, and medical imaging. With advances in deep learning, image segmentation has proven to be a boon in the diagnosis of a wide variety of diseases, ranging from brain tumors to cancers.

Pneumothorax is one such lung disease; it results in sudden breathlessness, either because of some underlying symptoms or with no symptoms at all [3]. Pneumothorax can be diagnosed clinically using a stethoscope. When the pneumothorax is small, however, X-ray scans are used to locate it, and a radiologist makes the diagnosis by examining the chest X-ray and recognizing the affected region. A small pneumothorax might heal on its own; otherwise, treatment includes inserting a needle between the ribs to remove excess air. Image segmentation can aid this process by segmenting the region of the pneumothorax, thus supporting a more confident diagnosis.

Several sophisticated methods, especially convolutional neural networks [4], have introduced a new paradigm in image segmentation. Segmentation tasks can be divided into two kinds: semantic segmentation and instance segmentation. The former classifies an image at the pixel level, while the latter is the more complex task of distinguishing individual instances of the same class in the image. One of the major roadblocks in medical imaging is the limited availability of labeled data in large quantities; this is addressed by U-Net [5], a state-of-the-art convolutional neural network for medical image segmentation. This network is based on an encoder-decoder model in which the encoder and decoder can be replaced by more efficient CNN architectures such as VGG-16 [6], ResNet50 [7], MobileNet [8], and EfficientNet [9], each of which builds on earlier designs. This paper investigates the training of U-Net with different backbone networks, along with other architectures such as FPN [10] and LinkNet [11], on the pneumothorax X-ray dataset provided by the Society for Imaging Informatics in Medicine (SIIM). We begin by reviewing earlier work in image segmentation in the next section, followed by a description of the dataset and the network architectures used. We share our experimental results in Section 4.

A. Singh · G. Srivastava · N. Pradhan (B)
Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India
e-mail: [email protected]
G. Srivastava
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_10

2 Related Works

Image segmentation has been an area of active research because of its wide range of applications in the engineering and medical fields. Today's state-of-the-art algorithms build on decades of research progress in this field. Earlier work in image segmentation made use of image thresholding [12], clustering [13], edge detection, and graph-based methods. In image thresholding, the image is first converted to grayscale, and a threshold is then applied to segment it. Clustering starts by considering each pixel as an individual cluster and then merges the clusters that have the least inter-cluster distance; K-means [14] is a commonly known clustering algorithm. Edge detection algorithms respond to major discontinuities in the image, i.e., they detect the boundaries of different objects by making use of 2D filters. Graph-based segmentation is the most famous of these methods and is still used in many state-of-the-art image segmentation algorithms. The most used graph-based algorithm is that of Felzenszwalb et al. [15], which first creates an undirected graph where every pixel in the image is a node and the difference between the intensities of two nodes is the weight of the edge connecting them. This algorithm is still widely used in current state-of-the-art deep learning networks to provide region proposals.

Deep learning methods, especially convolutional networks, have made significant improvements in image segmentation. Since their resurgence at ILSVRC 2012, where AlexNet [16], the largest CNN at that time, was introduced, CNNs have made significant contributions to signal processing tasks. Abedalla et al. [17] trained B-U-Nets, which comprised four networks (ResNet50, DenseNet, EfficientNetB4, and SE-ResNet50) as backbones. They combined BCE and the dice coefficient to form the loss function of the network, which achieved 0.8608 on the test set, among the highest dice scores on the SIIM pneumothorax dataset. Jarakar et al. [18] trained U-Net with ResNet as the backbone network on the same dataset, achieving a dice score of 0.84. Also notable is the work of Malhotra et al. [19], which incorporates Mask R-CNN with ResNet101 as the backbone FPN; this model has a lower loss than the one with ResNet50 as the backbone. Tolkachev et al. [20] examined U-Net with different backbone networks: ResNet34, SE-ResNext50, SE-ResNext101, and DenseNet121. They also tested their system against experienced radiologists, examining how a confident diagnosis can affect the treatment process. Their system achieved a dice coefficient of 0.8574. In this paper, we present a comparative study of the three most widely used network architectures, i.e., U-Net, LinkNet, and FPN, with each architecture trained with four different backbone networks.

3 Materials and Methods

Data Description
The dataset is provided by the Society for Imaging Informatics in Medicine (SIIM) on Kaggle. It consists of 12,000 DICOM files, each containing metadata about the image and the X-ray image itself in .jpg or .png format. In general, a Digital Imaging and Communications in Medicine (DICOM) file consists of a header and an image. The header contains information about the patient and a description of the image, such as pixel intensity and image dimensions. The image can come from any medical scan, such as MRI, X-ray, or ultrasound. The annotations are provided in the form of run-length encoding (RLE) along with image IDs. X-ray scans that do not show pneumothorax are marked −1.

Data Preprocessing
Since the dataset is in DICOM (.dcm) format, it must be preprocessed before it can be used for model training. The DICOM files are read using the pydicom library in Python. Some of the images have low contrast, so as a preprocessing step, their contrast is increased using histogram equalization [21]. The images are also resized to 128 × 128 and converted into NumPy arrays of shape 128 × 128 × 3. The rest of the content of each DICOM file is read into a Pandas DataFrame. The annotations are RLE-encoded, so they are decoded into NumPy arrays of dimension 128 × 128 × 1. The dataset was then split into training and validation sets in the ratio 8:2. A sample from the preprocessed dataset is shown in Fig. 1.
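The contrast enhancement and the 8:2 split described above can be illustrated with a minimal sketch. It assumes plain global histogram equalization on 8-bit gray levels (the cited reference [21] covers adaptive variants, which may be what was actually used) and an unshuffled split; the function names `equalize_histogram` and `split_train_val` are hypothetical and are not taken from the paper.

```python
def equalize_histogram(pixels, levels=256):
    """Globally histogram-equalize a flat list of integer gray levels in [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # constant image: there is no contrast to stretch
        return list(pixels)
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


def split_train_val(items, train_fraction=0.8):
    """Split a list into training and validation parts (8:2 by default, no shuffling)."""
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


# A low-contrast "image" whose gray levels all sit between 100 and 103.
low_contrast = [100, 100, 101, 102, 102, 103, 103, 103]
equalized = equalize_histogram(low_contrast)
train, val = split_train_val(list(range(10)))
```

After equalization the gray levels span the full 0 to 255 range while keeping their relative order, which is the effect the preprocessing step relies on.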


Fig. 1 Dataset visualization with both CXR image and after applying mask

Fig. 2 Graphical abstract of the proposed work

Image Segmentation
Image segmentation, at its crux, is classifying an image at the per-pixel level, i.e., assigning a class label to each pixel in the image. To divide an image into segments, we first extract features from the image. These features are captured using convolutional neural networks. The early layers of a CNN capture low-level pixel features, while later layers capture high-level features in the image. The task of the CNN is to produce a per-pixel prediction of every object identified in the image. This task of dividing an image at the pixel level can be approached in two ways: semantic segmentation and instance segmentation. Semantic segmentation treats all objects belonging to the same class as one, whereas instance segmentation treats every object belonging to the same class as distinct. Depending on the number of distinct classes to be identified in an image, segmentation can further be classified as single-class or multiclass. Since our project deals with only one class, i.e., small regions of pneumothorax, it is a single-class semantic segmentation task.

Network Architectures
The popularity of CNNs rose after Alex Krizhevsky [16] trained a deep convolutional neural network (AlexNet) that achieved the highest accuracy in ILSVRC 2012. Since then, research interest in CNNs has grown, which has led to the development of better, more efficient, and more robust networks. After AlexNet, the Inception network [23] was developed by Google. Bigger networks like AlexNet are more prone to over-fitting, and the larger the network, the more difficult it becomes to propagate gradients through it. The idea behind Inception was to use multiple kernel sizes for the convolution operation instead of a single fixed size, as this helps capture objects of different sizes in the image. Objects distributed throughout the image are captured by large kernels, while locally distributed objects are captured by smaller kernels. This makes the network wider instead of deeper, which helps gradient flow without being computationally expensive.

VGG-16: VGG is another widely used network. Instead of using larger kernels for convolution, it uses smaller 3 × 3 kernels. This significantly reduced the computational cost of the network and made training easier. One of the major problems in AlexNet and VGG is computational cost.

ResNet50: ResNet showed that networks can be made deeper and deeper without being too computationally expensive. It introduced the concept of the “skip connection”, which made gradient flow easier throughout the network.

MobileNet: MobileNet is a lightweight convolutional neural network designed to run on mobile phones and small devices such as Arduino and Raspberry Pi boards. It achieves this using depthwise separable convolutions, which significantly reduce the network's computational cost.

EfficientNetB7: EfficientNet highlighted the use of scaling for achieving higher accuracy. The width of the network can be scaled by adjusting the number of filters, the depth by adjusting the number of layers, and the resolution by adjusting the size of the input image; each of these helps improve accuracy. The network is scaled using the compound scaling method introduced in the EfficientNet paper.

Our experiment uses VGG-16, MobileNet, EfficientNetB7, and ResNet50 as backbone networks for the encoder-decoder segmentation models, i.e., U-Net, LinkNet, and FPN.
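To make the cost saving of depthwise separable convolutions concrete, the following sketch compares the number of weights in a standard convolution with the number in its depthwise separable counterpart. The layer sizes (a 3 × 3 kernel, 64 input channels, 128 output channels) are illustrative assumptions, not values from the paper, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = separable / standard
print(standard, separable, round(ratio, 3))
```

For this layer the separable form needs roughly 12% of the weights of the standard convolution; the ratio equals 1/c_out + 1/k², so the saving grows with the number of output channels.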


Network Architectures for Segmentation
An early contribution using CNNs for image segmentation was the fully convolutional network (FCN) [24]. FCNs are networks without fully connected layers at the end, so they use feature maps from the last convolution to make predictions. Since the last layers produce coarse feature maps, dense output in the final prediction is obtained by applying deconvolution to previous layers and adding the results to the output. The deconvolution network is another well-known image segmentation architecture which, unlike FCN, learns the deconvolution parameters. SegNet [25], based on an encoder-decoder architecture, was introduced along the lines of the deconvolution network, but instead of deconvolution, it uses upsampling to produce dense feature maps as output.

U-Net: U-Net [5] was introduced for biomedical imaging and is known as the state-of-the-art encoder-decoder network in the medical field. It can learn from a dataset with few labeled samples, which makes it suitable for biomedical segmentation. Like SegNet, U-Net uses upsampling in place of pooling in its decoder, which helps improve the resolution of the output layer. Like FCN, it has no fully connected layers, but it is a drastic improvement over FCN. Skip connections are introduced in the network; they carry information from earlier layers to later layers while also providing indirect pathways for smooth gradient flow.

LinkNet: LinkNet is similar to U-Net and was developed with both efficiency and performance in mind, using fewer parameters and FLOPs. LinkNet uses only 11.5 million parameters and 21.2 GFLOPs. Its efficiency allows it to be used for segmenting live video as well. It provided state-of-the-art results on the CamVid dataset.

Feature Pyramid Network: Unlike U-Net and LinkNet, the Feature Pyramid Network (FPN) is a pyramid feature network that makes use of bottom-up and top-down pathways for making predictions. High-level features are extracted in the bottom-up pathway, which increases the semantic value of the feature maps at later layers. The top-down pathway is then used to construct high-resolution layers from the semantically rich output of the bottom-up pathway. Features from the bottom-up pathway are added to the top-down layers for better detection of the objects in the image, while also acting as skip connections for the easy flow of gradients.
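The top-down merge described for FPN can be sketched on toy data. This is a simplified illustration, not the authors' implementation: it assumes single-channel feature maps stored as nested lists, nearest-neighbour 2× upsampling, and element-wise addition, and it omits the 1 × 1 lateral and 3 × 3 smoothing convolutions of the original FPN design. The names `upsample_nearest_2x`, `add_maps`, and `top_down_pathway` are hypothetical.

```python
def upsample_nearest_2x(fmap):
    """Double the height and width of a 2-D map by repeating each value."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_pathway(bottom_up):
    """bottom_up: maps ordered fine -> coarse, each half the size of the previous one."""
    pyramid = [bottom_up[-1]]  # start from the coarsest, most semantic map
    for lateral in reversed(bottom_up[:-1]):
        merged = add_maps(lateral, upsample_nearest_2x(pyramid[0]))
        pyramid.insert(0, merged)
    return pyramid


# Toy bottom-up outputs: 4 x 4, 2 x 2, and 1 x 1 maps.
c3 = [[1] * 4 for _ in range(4)]
c4 = [[10] * 2 for _ in range(2)]
c5 = [[100]]
p3, p4, p5 = top_down_pathway([c3, c4, c5])
```

Each pyramid level keeps the resolution of its bottom-up input while accumulating the coarser levels above it, which is how the fine maps gain semantic content.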

4 Experiment and Results In this section, we present a detailed overview of our experimental setup and the metric we used for evaluating different network architectures.


Experimental Setup
All 12 networks were trained using Kaggle kernels equipped with an NVIDIA P100 GPU with 16 GB of GPU memory, capable of 9.3 TFLOPS. Each network was trained for 50 epochs, which required 4–5 h of training time.

Intersection over Union
Intersection over Union (IoU) [26] is a metric for measuring the performance of a semantic segmentation model. It is calculated by dividing the number of pixels in the intersection of the predicted mask and the ground-truth mask by the number of pixels in their union. An IoU of 0 indicates that the predicted mask does not overlap the ground truth at all, and an IoU of 1 indicates a perfect match. It is calculated as shown in Eq. 1.

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} \tag{1}$$

where J is the Jaccard index and A and B are the two sets being compared (here, the predicted mask and the ground-truth mask). In multiclass segmentation, the IoU is calculated for each class separately and then averaged over all classes to give the overall IoU of the model.

Dice Coefficient
The dice coefficient is the harmonic mean of precision and recall [27]. It is calculated as twice the number of true positives (TP) divided by the sum of twice the number of TP, the number of false negatives (FN), and the number of false positives (FP). It is defined as shown in Eq. 2.

$$\text{Dice Coefficient} = \frac{2|X \cap Y|}{|X| + |Y|} \tag{2}$$
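Both metrics can be computed directly from pixel counts. The sketch below works on flattened binary masks and is only an illustration: the function names are hypothetical, the Jaccard loss is taken to be 1 − IoU (a common definition, assumed here rather than stated in the text), and two empty masks are scored as a perfect match by convention.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives for binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def iou(pred, truth):
    """Intersection over Union (Jaccard index), Eq. 1."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    return tp / union if union else 1.0  # assumed convention for two empty masks


def dice(pred, truth):
    """Dice coefficient, Eq. 2, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def jaccard_loss(pred, truth):
    """Assumed definition: one minus the Jaccard index."""
    return 1.0 - iou(pred, truth)


# Flattened toy masks: 2 overlapping pixels, 1 false positive, 1 false negative.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
j = iou(pred, truth)
d = dice(pred, truth)
```

For a single pair of masks the two metrics are related by Dice = 2J / (1 + J), so they always rank individual predictions in the same order; averages over many images need not satisfy this identity exactly.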

Results
We trained U-Net, FPN, and LinkNet on the pneumothorax dataset, experimenting with different backbone networks as encoders and decoders. These backbone networks include VGG-16, MobileNet, EfficientNet, and ResNet50. The main objective was to find the best network combination for the pneumothorax dataset. We used the Adam optimizer for training all 12 networks, with Jaccard loss as the training loss and Intersection over Union (IoU) and dice coefficient as the evaluation metrics. We start by examining the loss values of U-Net in Table 1, followed by the evaluation metric values of LinkNet in Table 2, and finally the performance of FPN in Table 3. Each table compares the backbone networks for one architecture based on the Jaccard loss, IoU, and dice coefficient obtained on the training and validation sets; the corresponding graphs follow the tables.


Table 1 Experimental results of different pre-trained encoders with U-Net architecture

| Backbone network | Training Jaccard loss | Training IoU | Training dice coefficient | Validation Jaccard loss | Validation IoU | Validation dice coefficient |
|---|---|---|---|---|---|---|
| VGG-16 | 0.34537 | 0.65460 | 0.78623 | 0.29936 | 0.70061 | 0.81916 |
| MobileNet | 0.46473 | 0.53533 | 0.68999 | 0.43130 | 0.56882 | 0.71826 |
| EfficientNetB7 | 0.43055 | 0.56955 | 0.71935 | 0.37270 | 0.62732 | 0.76591 |
| ResNet50 | 0.34410 | 0.65590 | 0.78797 | 0.32202 | 0.67802 | 0.80348 |

Table 2 Experimental results of different pre-trained encoders with LinkNet architecture

| Backbone network | Training Jaccard loss | Training IoU | Training dice coefficient | Validation Jaccard loss | Validation IoU | Validation dice coefficient |
|---|---|---|---|---|---|---|
| VGG-16 | 0.35843 | 0.66159 | 0.79254 | 0.31247 | 0.70429 | 0.82304 |
| MobileNet | 0.39980 | 0.62218 | 0.76251 | 0.36521 | 0.65366 | 0.78657 |
| EfficientNetB7 | 0.41013 | 0.61216 | 0.75510 | 0.35961 | 0.65936 | 0.79099 |
| ResNet50 | 0.38842 | 0.61157 | 0.75317 | 0.36906 | 0.63097 | 0.76852 |

Table 3 Experimental results of different pre-trained encoders with FPN architecture

| Backbone network | Training Jaccard loss | Training IoU | Training dice coefficient | Validation Jaccard loss | Validation IoU | Validation dice coefficient |
|---|---|---|---|---|---|---|
| VGG-16 | 0.47703 | 0.54882 | 0.70226 | 0.44197 | 0.58129 | 0.73027 |
| MobileNet | 0.28901 | 0.72649 | 0.83946 | 0.23795 | 0.77301 | 0.87055 |
| EfficientNetB7 | 0.40222 | 0.62012 | 0.76136 | 0.33454 | 0.68251 | 0.80799 |
| ResNet50 | 0.35407 | 0.64591 | 0.77967 | 0.32804 | 0.67201 | 0.79928 |

U-Net
In our experiment, we found that all the backbone networks produced similar Jaccard loss with the U-Net architecture; the loss varied only slightly among the backbones. The ResNet50 and VGG-16 backbones produced the lowest loss, with VGG-16 producing slightly better results on the validation set.

LinkNet
LinkNet produced similar results for VGG-16, MobileNet, and EfficientNetB7, while ResNet50 produced the lowest dice score. Among the three, the best-performing backbone was VGG-16, followed by EfficientNetB7.

Feature Pyramid Network (FPN)
The best combination was with MobileNet as the backbone network, followed by EfficientNetB7. VGG-16 produced the lowest dice score on the validation set. FPN with MobileNet performed the best among all 12 networks.

Next, we present the learning curves for Jaccard loss, Intersection over Union, and dice coefficient. Each of these curves shows the improvement of the networks over 50 epochs of training. Figs. 3 and 4 display the Jaccard loss, IoU, and dice coefficient of all 12 networks.

Jaccard Loss Graph
The graphs plot the learning curves of all 12 networks over 50 epochs of training. They show that MobileNet FPN has the lowest loss, while VGG-16 FPN has the highest.

Intersection over Union Graph
We can conclude from the graph that the highest IoU is achieved by MobileNet FPN and the lowest by MobileNet U-Net. We observe a steep increase in the MobileNet FPN curve, while VGG-16 FPN and MobileNet U-Net follow similar curves.

Fig. 3 Loss curve during training of different pre-trained encoders combined with U-Net, FPN, and LinkNet architecture

Fig. 4 Dice coefficient and Jaccard index curve during training of different pre-trained encoders combined with U-Net, FPN, and LinkNet architecture


Dice Coefficient
MobileNet FPN has the best dice score and is hence the best network combination among the 12 networks, while VGG-16 FPN and MobileNet U-Net are the poorest-performing combinations. U-Net with the ResNet50 backbone is the second-best combination.

5 Conclusion

In this paper, we investigated three well-known network architectures with four different backbone networks each, training 12 networks altogether to determine the best network combination for the pneumothorax segmentation dataset. We trained the networks with Jaccard loss and used IoU and dice coefficient as metrics. We conclude that the best network combination was FPN with the MobileNet backbone, while U-Net with MobileNet and FPN with VGG-16 were the worst-performing combinations. We communicated our findings with the aid of tables and graphs. Future work includes using other, more sophisticated architectures such as Mask R-CNN for segmentation and comparing them with our existing results.

References

1. Himabindu, G., Ramakrishna Murty, M.: Classification of kidney lesions using bee swarm optimization. Int. J. Eng. Technol. 7(2.33), 1046–1052 (2018)
2. Himabindu, G., Ramakrishna Murty, M.: Extraction of texture features and classification of renal masses from kidney images. Int. J. Eng. Technol. 7(2.33), 1057–1063 (2018)
3. MacDuff, A., Arnold, A., Harvey, J.: Management of spontaneous pneumothorax: British Thoracic Society pleural disease guideline 2010. Thorax 65(Suppl 2), ii18–ii31 (2010)
4. LeCun, Y., Haffner, P., Bottou, L., Bengio, Y.: Object recognition with gradient-based learning. In: Shape, Contour and Grouping in Computer Vision, pp. 319–345. Springer, Berlin, Heidelberg (1999)
5. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer, Cham (2015)
6. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv:1409.1556
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
8. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: efficient convolutional neural networks for mobile vision applications (2017). arXiv:1704.04861
9. Mingxing, T., Le, Q.: Efficientnet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)
10. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
11. Chaurasia, A., Culurciello, E.: Linknet: exploiting encoder representations for efficient semantic segmentation. In: 2017 IEEE Visual Communications and Image Processing (VCIP), pp. 1–4. IEEE (2017)
12. Gurung, A., Tamang, S.L.: Image segmentation using multi-threshold technique by histogram sampling (2019). arXiv:1909.05084
13. Naous, T., Sarkar, S., Abid, A., Zou, J.: Clustering plotted data by image segmentation (2021). arXiv:2110.05187
14. Andrecut, M.: K-Means Kernel Classifier (2020). arXiv:2012.13021
15. Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vision 59(2), 167–181 (2004)
16. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25 (2012)
17. Abedalla, A., Abdullah, M., Al-Ayyoub, M., Benkhelifa, E.: Chest X-ray pneumothorax segmentation using U-Net with Efficient-Net and ResNet architectures. PeerJ Comput. Sci. 29(7), e607 (2021). https://doi.org/10.7717/peerj-cs.607. PMID: 34307860; PMCID: PMC8279140
18. Jarkhar, K.: Pneumothorax segmentation: deep learning image segmentation to predict pneumothorax. arXiv:1912.07329
19. Malhotra, P., Gupta, S., Koundal, D., Zaguia, A., Kaur, M., Lee, H.N.: Deep learning-based computer-aided pneumothorax detection using chest X-ray images. Sensors 22(6), 2278 (2022). https://doi.org/10.3390/s22062278
20. Tolkachev, A., Sirazitdinov, I., Kholiavchenko, M., Mustafaev, T., Ibragimov, B.: Deep learning for diagnosis and segmentation of pneumothorax: the results on the kaggle competition and validation against radiologists. IEEE J. Biomed. Health Inform. 25(5), 1660–1672 (2020)
21. Pizer, S.M., Amburn, E.P., Austin, J.D., Cromartie, R., Geselowitz, A., Greer, T., ter Haar Romeny, B., Zimmerman, J.B., Zuiderveld, K.: Adaptive histogram equalization and its variations. Comput. Vis. Graphics Image Process. 39(3), 355–368 (1987)
22. Hong, S., Noh, H., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528 (2015)
23. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
24. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
25. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017)
26. Generalized intersection over union: a metric and a loss for bounding box regression. arXiv:1902.09630
27. Continuous dice coefficient: a method for evaluating probabilistic segmentations. arXiv:1906.11031

Visual Learning with Dynamic Recall

G. Revathy, Pokkuluri Kiran Sree, S. Sasikala Devi, R. Karunamoorthi, and S. Senthil Vadivu

1 Introduction Convolutional Neural Networks have long been used for computer vision tasks like image classification, image captioning, semantic segmentation and object recognition, to name a few. Traditionally, CNNs use large amounts of data to perform these tasks effectively. This could be a drawback in situations where adequate data is not available. Few-shot learning is the domain of research that addresses the problem of learning with minimal data. The idea of few-shot learning in the context of the base paper is to train a classification model on a dataset containing images belonging to ‘base’ categories and use said model to classify images belonging to ‘novel’ categories. A good few-shot learning system should ideally satisfy two requirements, (i) the novel categories have to be learnt quickly and (ii) the classification accuracy of the base classes should not deteriorate.

G. Revathy (B) School of Computing, SASTRA Deemed University, Thanjavur, Tamilnadu, India e-mail: [email protected] P. K. Sree Department of Computer Science and Engineering, Shri Vishnu Engineering College for Women(A), W.G Dt, VishnupurBhimavaram, AP, India S. S. Devi School of Computer Science, PPG College of Arts and Science, Coimbatore, Tamilnadu, India R. Karunamoorthi Department of Computer Science and Engineering, Kongu Engineering College, Perundurai, Tamilnadu, India S. S. Vadivu Department of IT, Arulmigu Meenakshi Amman College of Engineering, Thiruvannamalai, Tamilnadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_11


To achieve this, two technical novelties are introduced—the first being a cosine-similarity-based classifier and the second being an attention-based few-shot classification weight generator to generate weights for the novel categories. One common problem faced in few-shot learning models is that they have to be retrained to include the novel classes. The few-shot classification weight generator and the cosine-similarity-based classifier intrinsically overcome this problem.
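A cosine-similarity-based classifier can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it scores a feature vector against each class weight vector by the cosine of the angle between them, multiplied by a scale factor. The scale value of 10, the two-dimensional vectors, and the function names are assumptions made for the example. Because both vectors are normalized, a novel class weight with a very different magnitude can sit alongside the base class weights without dominating the scores.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_scores(feature, class_weights, scale=10.0):
    """Score = scale * cos(feature, weight) for every class weight vector."""
    z = l2_normalize(feature)
    scores = []
    for w in class_weights:
        w_hat = l2_normalize(w)
        scores.append(scale * sum(a * b for a, b in zip(z, w_hat)))
    return scores


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two base class weights plus one novel class weight with a much larger norm.
base_weights = [[1.0, 0.0], [0.0, 1.0]]
novel_weight = [5.0, 5.0]  # the magnitude has no effect after normalization
feature = [2.0, 2.0]
scores = cosine_scores(feature, base_weights + [novel_weight])
probs = softmax(scores)
predicted = probs.index(max(probs))
```

Here the feature points in the same direction as the novel class weight, so the third class receives the highest score regardless of the weight's length.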

2 Related Work

Few-shot learning is currently being explored extensively, and several methods have been proposed over the years to tackle this problem. One of the first attempts formulated it as an image matching task between the test sample and the novel class training data [2, 4, 5]; a Siamese neural network was trained for this purpose [3]. Matching networks meta-learn embedding functions for the training and test data using LSTMs and use similarity to classify the test data [5, 6]. Prototypical networks learn a prototype representation for each novel class and classify the test data by computing the distance between the prototype and the test data. In the proposed architecture, the classification weights also learn to be representative of their respective classes. Meta-learning techniques have been adapted in several ways for the task of few-shot learning. In MAML [7–10], a model is trained on several tasks to obtain an initial set of parameters that can be improved for the few-shot task with only a few gradient updates. The proposed architecture also contains a meta-learning component, namely the classification weight generator.

3 Proposed Architecture

The proposed architecture consists of a feature extractor module followed by a classifier module. The feature extractors considered are ResNet10, C64F, and C128F [1]; we use C128F in our implementation of the base paper. It consists of four “Conv Blocks”, each of which consists of a convolution layer with a non-linear activation function, a max pooling layer, and a batch normalization layer. The network is illustrated in Fig. 1, and the “Conv Block” is detailed in Fig. 2.

Visual Learning with Dynamic Recall


Fig. 1 C128F feature extractor

Fig. 2 Conv block

4 Proposed Algorithm

A two-stage training approach is proposed to train both modules effectively.

• In the first training stage, the feature extractor is trained and the classifier learns the base class weight vectors. The training set is

$$D_r = \bigcup_{b=1}^{k_b} \{x_{b,i}\}_{i=1}^{N_a} \tag{1}$$

where $D_r$ is the training set, $k_b$ is the number of base categories, $N_a$ is the number of training samples of a category, $b$ is the category of $x_b$, and $i$ indexes the training sample.

There are two main components: a ConvNet recognition model, which recognizes base and novel categories, and a few-shot classification weight generator, which dynamically generates classification weight vectors at test time.


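• ConvNet recognition model: it consists of a feature extractor and a classifier. The feature extractor computes the feature vector

$$z = F(x \mid \theta) \in \mathbb{R}^d \tag{2}$$

where $x$ is an input image and $\theta$ denotes the parameters of the feature extractor. The classifier holds a set of classification weight vectors

$$W^* = \{w_k^* \in \mathbb{R}^d\}_{k=1}^{K^*} \tag{3}$$

where $K^*$ is the number of $d$-dimensional classification weight vectors. For the base categories, the weight vectors are

$$W_{base} = \{w_k\}_{k=1}^{K_{base}} \tag{4}$$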

During the initial training phase, the parameters of the feature extractor and the classification weight vectors of the base categories are learned so that the model is able to recognize the base categories.

• In the second stage, the few-shot classification weight generator, a meta-learning mechanism, is trained with $K_{novel}$ novel categories:

$$D_{novel} = \bigcup_{n=1}^{K_{novel}} \{x_{n,i}\}_{i=1}^{N_n} \tag{5}$$

where $N_n$ is the number of training samples of the $n$-th novel category and $x_{n,i}$ is its $i$-th training sample from the ConvNet model. For each sample, a few-shot classification weight is generated.

• The feature extractor is frozen and only the classifier is trained. Out of the B base classes, N are chosen as “fake” novel classes, and K images from each “fake” novel class are used to train the classifier to learn weights for these classes. The novel class weights are learnt by the few-shot classification weight generator, which is implemented using two different methods.

Step 1: In the first method (feature averaging), the feature representations of the samples from each novel class are averaged with learnable weights to obtain the weights for the novel classes.

Step 2: In the second method (attention), an attention-based mechanism is used to obtain the weights for the novel classes using both the feature representations of the samples from the novel classes and the base class weights (Fig. 3).
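The feature averaging method can be illustrated with a short sketch. It is only an approximation of the idea described above: normalizing each feature to unit length before averaging is an assumption made here, and `phi_avg` is a hypothetical stand-in for the learnable weights (set to ones, so nothing is actually learned in this example).

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def feature_averaging_weight(support_features, phi_avg):
    """Build a classification weight for one novel class.

    support_features: feature vectors of the K training samples of the class.
    phi_avg: per-dimension scaling vector (hypothetical placeholder for the
             learnable weights mentioned in the text).
    """
    dim = len(phi_avg)
    normalized = [l2_normalize(z) for z in support_features]
    mean = [sum(z[d] for z in normalized) / len(normalized) for d in range(dim)]
    return [phi_avg[d] * mean[d] for d in range(dim)]

# Two support samples of one "fake" novel class in a toy 2-D feature space.
support = [[2.0, 0.0], [0.0, 4.0]]
phi = [1.0, 1.0]
w_novel = feature_averaging_weight(support, phi)
print(w_novel)  # [0.5, 0.5]
```

The attention-based method additionally combines such an average with the base class weights; it is not shown here.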


Fig. 3 Train stage 1

Step 3: In train stage 2, the authors use test samples from both the base and novel categories to compute the loss. This ensures that the model learns weights for the novel classes without forgetting the base classes. Train stage 1 is illustrated in Fig. 3, and train stage 2 is illustrated in Fig. 4.

Fig. 4 Train stage 2

Fig. 5 Loss graph of train stage 1


5 Results and Discussion

We evaluate our few-shot object recognition system on the Mini-ImageNet dataset [2], which includes 100 different categories with 600 images per category, each of size 84 × 84. For our experiments, we used the splits by Ravi and Larochelle [3], which include 64 categories for training, 16 categories for validation, and 20 categories for testing. The typical evaluation setting on this dataset is first to train a few-shot model on the training categories and then, at test time, to use the validation (or test) categories to form few-shot tasks on which the trained model is evaluated. The loss graphs and the few-shot accuracies are reported below. The code was written in Python.

Loss Graphs: see Figs. 5 and 6.

Fig. 6 Loss graph of train stage 2 [5-shot]

Fig. 7 5-way 5-shot validation accuracies

Fig. 8 5-way 5-shot test accuracies


Fig. 9 5-way 1-shot validation accuracies

Fig. 10 5-way 1-shot test accuracies

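Table 1 Accuracies using feature averaging method

| Model | 5-way 5-shot: Novel | 5-way 5-shot: Base | 5-way 5-shot: Both | 5-way 1-shot: Novel | 5-way 1-shot: Base | 5-way 1-shot: Both |
|---|---|---|---|---|---|---|
| Our implementation | 66.14% | 61.19% | 50.22% | 48.52% | 60.28% | 40.69% |

5-way 5-shot: 5 novel classes, 5 training samples per novel class. 5-way 1-shot: 5 novel classes, 1 training sample per novel class.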

Shot Accuracies: see Figs. 7, 8, 9 and 10. Feature Averaging: see Table 1.

6 Conclusion

The algorithm and architecture proposed by the base paper were implemented successfully with results very close to the ones reported in the paper. The implementations closely followed the recommended hyperparameters and other settings to ensure that the results would not have any inadvertent error.

References
1. Gidaris, S., Komodakis, N.: Dynamic few-shot visual learning without forgetting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
2. Wang, Y.X., Girshick, R., Hebert, M., Hariharan, B.: Low-shot learning from imaginary data. arXiv preprint arXiv:1801.05401 (2018)
3. Chunjie, L., Qiang, Y., et al.: Cosine normalization: using cosine similarity instead of dot product in neural networks. arXiv preprint arXiv:1702.05870 (2017)
4. Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400 (2017)
5. Hariharan, B., Girshick, R.: Low-shot visual recognition by shrinking and hallucinating features. arXiv preprint arXiv:1606.02819 (2016)
6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
7. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
8. Hoffer, E., Ailon, N.: Deep metric learning using triplet network. In: International Workshop on Similarity-Based Pattern Recognition, pp. 84–92. Springer (2015)
9. Revathy, G., et al.: Machine learning algorithms for prediction of diseases. Int. J. Mech. Eng. 7(1) (2022)
10. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

Machine Learning for Drug Discovery Using Agglomerative Hierarchical Clustering B. S. S. Sowjanya Lakshmi and Ravi Kiran Varma P

B. S. S. S. Lakshmi · Ravi Kiran Varma P (B) Maharaj Vijayaram Gajapathi Raj College of Engineering, Vizianagaram, AP, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_12

1 Introduction

Drug design and development play a prominent role for pharmaceutical organizations and chemical scientists. The process of discovering and designing drugs is known as drug discovery. It is a process that aims to identify a compound that can be used to cure as well as treat diseases [1]. Investigators typically find novel drugs by obtaining a new perspective on a disease process, allowing them to design a medicine to mitigate the disease’s effect. This method consists of identifying candidates, synthesis, categorization, validation, optimization, screening, and assays for therapeutic efficacy. In the process of drug development, the significance of a molecule has to be proved through research prior to clinical trials. Because of the high R&D and clinical trial budgets, drug discovery and development is an expensive process. From the time a new drug is discovered to the time it is offered to the public for treating patients, it takes over 12 years [2, 3]. For each effective medicine, the average cost of research and development is estimated to reach about $2 billion. This amount includes the expense of thousands of failures: just one molecule out of every 5000–10,000 that enter the research and development pipeline is approved. These figures explain why drug discovery and development takes so much time [4, 5]. Success requires a lot of resources, including the best scientific and logical brains, highly complex laboratories and technology, and multidimensional project management [6]. It also requires patience and good fortune. Ultimately, the drug discovery process delivers hope, confidence, and relief to billions of patients.

In contemporary research, ML techniques are prominently used in every stage of drug discovery [7, 8]. However, there are a few practical constraints [9, 10]. One advantage for researchers is the availability of large amounts of data in this domain. Disease selection, target hypothesis, lead identification, lead optimization, preclinical trial, clinical trial, and pharmacogenic identification are all steps in the drug discovery process. It takes several years to complete all of these steps successfully. Research is being conducted to improve the speed and efficiency of this procedure in order to fight disease with the drug. With the large feature space of drug discovery data, traditional methods have very high computational and time complexity, which wastes time and effort when a discovery attempt fails. ML techniques overcome these difficulties and aid in finding optimized solutions [11–13]. In recent drug discovery methods, lead identification and optimization replace the traditional drug phases, which take a greater toll in cost and time [14]. The objective is to incorporate machine learning techniques that help expedite drug discovery and development and thereby reduce its cost and processing time. For example, ciprofloxacin is an antibiotic used to treat a number of bacterial infections. The lead compound of ciprofloxacin is piperazine, and further synthesis of piperazine gives 38 different compounds.

In this paper, we work on compound similarity prediction, which is a part of lead identification. The dataset used is SDFfilemolV3000records; from this dataset, we take the structure and SMILES of each compound, generate molecular fingerprints from the SMILES using RDKit in a Jupyter Notebook, compute the similarity between compounds, and perform clustering based on that similarity. Drug discovery using traditional methods takes many years. Using machine learning algorithms, we can reduce the time taken to discover a new drug. Before a new compound can be used in any process, its reaction must be known, so compound similarity prediction is used to speed up the process.

The paper is organized as follows: Sect. 2 summarizes literature surveys and related work in the field. Section 3 describes the methodology used in this work. Section 4.1 explains the environment setup. Section 4.2 presents the performance analysis and experimental investigation. Section 5 details the conclusion and future work.

2 Related Work

Several researchers are working on machine learning approaches in drug discovery and development. Lo et al. [15] explained that machine learning approaches to drug discovery have been improving rapidly and achieving promising results. Supervised and unsupervised ML methods such as KNN clustering, Naïve Bayes (NB), support vector machines (SVMs), random forests (RFs), neural networks (NNs), and decision trees (DTs) are among those used in drug discovery. These methods also have certain limitations, such as a lack of data interpretability, overfitting, and the need for a large amount of data. The authors focused on utilizing ML over large datasets for a greater range of biological activity prediction.


The work by Priya et al. [16] focused on various machine learning techniques that are applied to achieve accurate prediction in drug discovery. During the past few decades, biological databases have grown immensely. The lead features are collected and applied to the machine learning model for better inhibitor prediction. The algorithms used in that work are NB, DT, RF, and SVM. In the drug discovery process, feature extraction and preprocessing of raw data play a major role in accurate prediction. Vamathevan et al. [17] state that machine learning approaches have also been investigated by large pharmaceutical companies for use in drug research and development. They identified uncertainties of various ML algorithms in the form of variation in accuracies; the test results on known records were better than those on unknown records. Patel et al. [18] used several ML methods, including NB, SVM, recurrent neural networks (RNNs), and convolutional neural networks (CNNs), for discovering drugs. However, they identified underfitting and overfitting issues in the results. Gu et al. [19] proposed a selection framework based on the silhouette coefficient and molecular fingerprints. Unsupervised clustering models were used in their work, including the k-means algorithm, the Birch algorithm, the spectral clustering algorithm, the Gaussian mixture model, the mini-batch k-means algorithm, and the hierarchical clustering algorithm. Syarofina et al. [20] validated data on 3079 DPP-4 inhibitor molecules gathered from the ChEMBL dataset. Performance was evaluated using the Calinski–Harabasz score, the silhouette coefficient, and the Davies–Bouldin index. Based on these findings, we use various machine learning algorithms to cluster similar compounds in large databases based on their chemical and biological properties, and we use various cluster validation methods.

3 Methodology

3.1 Molecular Fingerprint

A molecular fingerprint converts a molecule into a sequence of 1’s and 0’s (a bit vector) depending on the existence of certain chemical features. The molecules are supplied in the simplified molecular input line entry system (SMILES) format. Different types of molecular fingerprint are MACCS keys, atom-pair, Morgan, path-based, and topological torsion.

MACCS (molecular access system) keys are a binary fingerprint in which the existence or non-existence of substructure features is represented by a pattern of 1’s and 0’s. There are two sets of MACCS keys, one with 960 keys and the other with a subset of 166 keys. RDKit generates 167-bit-long fingerprints.

The Morgan circular fingerprint is a variation of the fingerprint produced by the Morgan algorithm. It represents the substructure formed by the neighboring atoms of each atom within a circular region defined by the bond radius. The default settings for the Morgan fingerprint are as follows: radius = 2, nBits = 2048.

The atom-pair fingerprint is constructed using, as the name suggests, pairs of atoms as well as their topological distances. The default settings for the atom-pair fingerprint are as follows: minimum length = 1, maximum length = 7, include chirality = True, nBits = 2048, nBitsPerEntry = 2.

The topological torsion fingerprint is a 2D structural fingerprint that is generated by identifying bond paths of four non-hydrogen atoms. The default settings for the topological torsion fingerprint are as follows: include chirality = True, nBits = 2048, nBitsPerEntry = 4.

A path-based fingerprint, similar to those produced by Daylight, finds all paths of a given length in a molecule. Those paths, together with information about the atoms and bonds along each path, define the substructure. The default settings for the path-based fingerprint are as follows: minPath = 1, maxPath = 7, fpSize = 2048, bitsPerHash = 2, useHs = True, tgtDensity = 0.0, minSize = 64.
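The notion of a key-based bit vector can be illustrated with a deliberately simplified sketch. It is not how MACCS keys or RDKit work: real structural keys are substructure patterns matched on the molecular graph, whereas the hypothetical `TOY_KEYS` below are plain substrings looked up in the SMILES text. The sketch only shows how the presence or absence of a feature maps to a 1 or 0 at a fixed bit position.

```python
# Hypothetical feature keys for illustration only. Substring matching on SMILES
# is NOT real substructure matching and will miss many equivalent notations.
TOY_KEYS = ["N", "O", "Cl", "C(=O)O", "c1ccccc1"]

def toy_key_fingerprint(smiles, keys=TOY_KEYS):
    """Return a list of bits: bit i is 1 if keys[i] occurs in the SMILES string."""
    return [1 if key in smiles else 0 for key in keys]

aspirin = "CC(=O)Oc1ccccc1C(=O)O"
benzene = "c1ccccc1"

print(toy_key_fingerprint(aspirin))  # [0, 1, 0, 1, 1]
print(toy_key_fingerprint(benzene))  # [0, 0, 0, 0, 1]
```

Both molecules set the last bit (the aromatic ring key), which is the kind of shared feature that a similarity measure on fingerprints picks up.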

3.2 Molecular Fingerprint Similarity

A similarity measure takes two molecules as input and returns a numerical value between 0 and 1 representing how ‘similar’ the molecules are. The molecular fingerprint similarity metrics in use are Tversky, Russel, Tanimoto, Cosine, Dice, and Sokal. In this work, the Tanimoto coefficient has been employed. It has the following form:

$$TC = \frac{N_{AB}}{N_A + N_B - N_{AB}} \tag{1}$$

Here, $N_A$ is the number of bits set in structure A, $N_B$ is the number of bits set in structure B, and $N_{AB}$ is the number of bits common to structures A and B. The similarity lies between 0.0 and 1.0; the higher the value, the greater the similarity.
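A worked example of the Tanimoto coefficient on two short bit vectors is given below as a sketch. The 8-bit fingerprints are made up for illustration, and returning 0.0 when neither fingerprint has any bit set is a convention assumed here, not something stated in the text.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length 0/1 bit lists."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    n_a = sum(fp_a)                                  # bits set in A
    n_b = sum(fp_b)                                  # bits set in B
    n_ab = sum(a & b for a, b in zip(fp_a, fp_b))    # bits set in both
    denominator = n_a + n_b - n_ab
    # Assumed convention: two empty fingerprints get similarity 0.0.
    return n_ab / denominator if denominator else 0.0

fp_a = [1, 1, 0, 1, 0, 0, 1, 0]   # N_A = 4
fp_b = [1, 0, 0, 1, 1, 0, 1, 0]   # N_B = 4, N_AB = 3

print(tanimoto(fp_a, fp_b))  # 3 / (4 + 4 - 3) = 0.6
print(tanimoto(fp_a, fp_a))  # 1.0
```

A distance suitable for clustering can then be taken as one minus the Tanimoto coefficient.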

3.3 Proposed Methodology

In the proposed system, a clustering algorithm is applied to group compounds that have similar chemical, biological, and physical characteristics as lead compounds in drug discovery. Cluster analysis determines how groups of objects are clustered. Clustering forms boundaries between the data elements based on certain similarity measures. Typically, clustering is of four types: density-based, partitioning, hierarchical, and fuzzy. This paper employs three prominent clustering methods: k-means, agglomerative hierarchical, and fuzzy c-means. The block diagram of the methodology is given in Fig. 1.

Fig. 1 Workflow of machine learning techniques in drug discovery

K-Means Clustering. It is an unsupervised ML algorithm in which each object is grouped into one of k clusters such that the similarity between objects within a cluster is high and the similarity between objects of different clusters is low. The user selects the value of k.
Input: D is a dataset, n is the number of objects, and k is the number of clusters.
Output: k clusters.
Steps:
1. Randomly pick k objects from D to be the initial cluster centroids.
2. Perform the following actions for each object in D:
   – Determine the distance between the current object and the centroids of the k clusters.
   – Assign the current object to the cluster whose centroid is closest.
3. Compute the “cluster center” of each cluster. These become the new cluster centroids.
4. Repeat steps 2–3 until the convergence criterion is satisfied.
5. Stop.

Agglomerative Hierarchical Clustering (AHC). It is a popular hierarchical clustering technique. It uses a bottom-up strategy to group the data into clusters: the method first treats each data point as a single cluster and then starts joining the clusters that are closest to each other. The merging process continues iteratively until all the clusters are merged. Common linkage criteria for measuring the distance between clusters are single linkage, complete linkage, average linkage, centroid distance, and Ward’s method. The hierarchy of clusters is represented as a tree (dendrogram).
Steps in the algorithm:
1. N clusters are formed by treating each data element as a separate cluster.
2. N−1 clusters are formed by merging the two clusters that are nearest.
3. N−2 clusters are formed by further merging the two closest clusters.
4. Step 3 is repeated until only a single cluster is left.
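The agglomerative steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses single linkage with Euclidean distance on toy 2-D points (the study clusters fingerprints by similarity), and the `n_clusters` argument, which stops merging early instead of continuing to a single cluster, is added here so that a flat clustering can be read off.

```python
import math

def single_linkage(cluster_a, cluster_b):
    """Cluster distance = distance between the closest pair of points."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def agglomerative(points, n_clusters=1):
    """Repeatedly merge the two nearest clusters until n_clusters remain."""
    clusters = [[tuple(p)] for p in points]      # every point starts alone
    while len(clusters) > n_clusters:
        best = None                              # (distance, i, j) of nearest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerative(points, n_clusters=3)
print(result)
# [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (5.0, 6.0)], [(10.0, 0.0)]]
```

The pairwise search makes this sketch cubic in the number of points, so it is meant for reading rather than for datasets of several thousand compounds.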


Fuzzy C-Means Clustering (FCM). It is a soft computing approach. Here, each data element is assigned a fuzzy membership in each cluster with respect to that cluster’s center. The closer a data element is to a center, the higher its membership. Steps in the algorithm: let the data points be $P = \{p_1, p_2, \ldots, p_n\}$ and the cluster centers be $O = \{o_1, o_2, \ldots, o_c\}$.

1. Select $c$ cluster centers randomly.
2. Calculate the fuzzy membership $\mu_{ij}$ using Eq. (2):

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}} \tag{2}$$

where $d_{ij}$ is the distance between the $i$-th data point and the $j$-th cluster center and $m$ is the fuzziness index.

3. Calculate the fuzzy centers $V_j$ using Eq. (3):

$$V_j = \frac{\sum_{i=1}^{n} (\mu_{ij})^m x_i}{\sum_{i=1}^{n} (\mu_{ij})^m}, \quad \forall j = 1, 2, \ldots, c \tag{3}$$

4. Repeat the above two steps until an optimized objective function $J$ is achieved or $\lVert U^{(k+1)} - U^{(k)} \rVert < \beta$, where $\beta$ is the stopping criterion with $0 \le \beta \le 1$, $k$ is the iteration count, and $U = (\mu_{ij})_{n \times c}$ is the fuzzy membership matrix.
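The membership and center updates of Eqs. (2) and (3) can be sketched as follows. Several choices are assumptions made for this illustration: the initial centers are passed in explicitly rather than drawn at random (to keep the run reproducible), the change in the membership matrix is measured by the largest absolute entry difference, and a data point that coincides with a center is given full membership there to avoid a division by zero.

```python
import math

def update_memberships(points, centers, m=2.0):
    """Eq. (2): mu_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # Assumed handling: a point lying on a center belongs fully to it.
            hits = dists.count(0.0)
            row = [1.0 / hits if d == 0.0 else 0.0 for d in dists]
        else:
            row = [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
        memberships.append(row)
    return memberships

def update_centers(points, memberships, m=2.0):
    """Eq. (3): V_j = sum_i mu_ij^m x_i / sum_i mu_ij^m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)))
    return centers

def fuzzy_c_means(points, initial_centers, m=2.0, beta=1e-6, max_iter=100):
    """Alternate Eqs. (2) and (3) until the memberships change by less than beta."""
    memberships = update_memberships(points, initial_centers, m)
    centers = list(initial_centers)
    for _ in range(max_iter):
        centers = update_centers(points, memberships, m)
        new_memberships = update_memberships(points, centers, m)
        change = max(abs(a - b)
                     for new_row, old_row in zip(new_memberships, memberships)
                     for a, b in zip(new_row, old_row))
        memberships = new_memberships
        if change < beta:
            break
    return centers, memberships

points = [(0.0,), (1.0,), (9.0,), (10.0,)]
centers, memberships = fuzzy_c_means(points, [(0.0,), (10.0,)])
print(centers)         # one center near the low pair, one near the high pair
print(memberships[0])  # first point belongs almost entirely to the first cluster
```

Each row of the membership matrix sums to one, which is what distinguishes the fuzzy assignment from the hard assignment of k-means.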

3.4 Evaluation and Validity

Validation is the testing phase in which the obtained clusters are examined to determine how well the clustering performs. The silhouette coefficient, the Davies–Bouldin index (DBI), and the Calinski–Harabasz score (CHS) are the cluster validation methods used in this study.

DBI assesses a clustering result by the amount and proximity of the data within and between the resulting clusters. The DBI approach is to maximize the distance between clusters and minimize the distance between points within a cluster. DBI can be defined as

$$\mathrm{DBI} = \frac{1}{k} \sum_{i=1}^{k} \max_{i \neq j} R_{i,j} \tag{4}$$


The number of clusters selected in this case is $k$. The ratio between clusters $i$ and $j$ is denoted by $R_{i,j}$. In DBI validation, a clustering is considered good if the within-cluster scatter is as small as possible (tight cohesion) and the separation between clusters is as large as possible. $R_{i,j}$ is formulated as follows:

$$R_{i,j} = \frac{\mathrm{SSW}_i + \mathrm{SSW}_j}{\mathrm{SSB}_{i,j}} \tag{5}$$

The sum of squares within cluster (SSW) is a cohesion metric for a cluster, defined as follows:

$$\mathrm{SSW}_i = \frac{1}{m_i} \sum_{j=1}^{m_i} d(x_j, c_i) \tag{6}$$

Here, $m_i$ is the number of data points in cluster $i$, $c_i$ is the centroid of cluster $i$, and $d(x_j, c_i)$ is the Euclidean distance between data point $x_j$ and that centroid. $\mathrm{SSB}_{i,j}$ (sum of squares between clusters) measures the separation between clusters $i$ and $j$.

The silhouette coefficient is one of the techniques used to assess the quality of the clusters produced by a clustering technique. The silhouette index compares the position of each element within its cluster. $\hat{S}(i)$ is stated as

$$\hat{S}(i) = \frac{D(i) - O(i)}{\max\{D(i), O(i)\}} \tag{7}$$

Here, $\hat{S}(i)$ is the silhouette value. $O(i)$ is the mean distance between element $i$ and the rest of the elements in the same cluster, and $D(i)$ is the mean distance between element $i$ and the elements of a different cluster. A higher silhouette value is better than a lower one, and a value close to zero is considered poor.

CHS is also known as the variance ratio criterion. A weak clustering is indicated by lower CHS values. CHS represents the ratio of the inter-cluster to the intra-cluster dispersion for a dataset $E$ of size $n_E$ that is divided into $k$ clusters. With $\mathrm{Tr}(B_k)$ the trace of the inter-cluster (between-cluster) scatter matrix and $\mathrm{Tr}(W_k)$ the trace of the intra-cluster (within-cluster) scatter matrix, CHS can be expressed as:

$$\mathrm{CHS} = \frac{\mathrm{Tr}(B_k)}{\mathrm{Tr}(W_k)} \times \frac{n_E - k}{k - 1} \tag{8}$$
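As a worked illustration of the Davies–Bouldin index in Eqs. (4)–(6), the sketch below evaluates a toy clustering by hand-checkable numbers. Taking the separation term as the Euclidean distance between the two centroids is an assumption made here (it is the usual choice), and the two toy clusters are invented for the example.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))

def davies_bouldin(clusters):
    """Davies-Bouldin index of a clustering given as a list of point lists."""
    centers = [centroid(c) for c in clusters]
    # Eq. (6): mean distance of the points of cluster i to its centroid.
    ssw = [sum(math.dist(p, ctr) for p in c) / len(c)
           for c, ctr in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Eq. (5), with the separation assumed to be the centroid distance;
        # the worst (largest) ratio over all other clusters is kept.
        total += max((ssw[i] + ssw[j]) / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k                                 # Eq. (4)

clusters = [
    [(0.0, 0.0), (0.0, 2.0)],     # centroid (0, 1), SSW = 1
    [(10.0, 0.0), (10.0, 2.0)],   # centroid (10, 1), SSW = 1
]
print(davies_bouldin(clusters))   # (1 + 1) / 10 = 0.2 (up to floating-point rounding)
```

Moving the two clusters closer together, or spreading their points out, raises the index, which matches the reading that lower DBI values indicate better clustering.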


4 Experiments and Results

4.1 Environment Setup

In this section, we discuss the hardware and software used to evaluate the performance of the algorithms on the dataset. We use a laptop with an Intel® Core™ i3 CPU and 4.0 GB RAM, running Windows 10. The dataset is openly accessible via the Marvin software and is in SDF file format. Anaconda Navigator is a desktop GUI tool included in the Anaconda distribution. It makes it easy to launch applications and manage packages and environments without using command-line commands. It mainly supports Python and R data science packages and manages library dependencies and environments with Conda. Machine learning and deep learning models can be developed and trained with Scikit-learn, TensorFlow, etc. In Anaconda Navigator, we created a separate environment called ‘my-rdkit-env’. The version of RDKit is 2021.09.2. In this environment, we use Jupyter Notebook as the Python development environment. Jupyter Notebook segregates the code into cells and executes them. It also helps with autocompletion of function names, prompts the syntax of a function, and provides documentation for the function being used. Jupyter Notebook displays graphs and charts without the need for any other application.

The dataset used in this study is SDFfilemolV3000records, downloaded from the DrugCentral database site (https://drugcentral.org/download) in SDF file format. The work involves collecting data with chemical structure and activity data, after which preprocessing is applied to extract features. The dataset contains around 4607 records, with fields such as ‘ID’, ‘PREFERRED_NAME’, ‘CAS_RN’, ‘SYNONYMS’, ‘URL’, ‘SMILES’, ‘Molecules’, ‘type’, ‘CAS’, ‘Name’, and ‘Ambiguous’.

4.2 Experimental Investigations

In this work, the k-means, agglomerative hierarchical, and fuzzy c-means clustering algorithms are used. The silhouette coefficient (SC), the Davies–Bouldin index (DBI), and the Calinski–Harabasz score (CHS) cluster validation methods are used for performance comparison. For better clustering, the silhouette coefficient and the Calinski–Harabasz score should be as high as possible, and the Davies–Bouldin index should be as low as possible.

For the k-means algorithm, validation produced a Davies–Bouldin index of 0.9420, a silhouette coefficient of 0.3500, and a Calinski–Harabasz score of 3422.3283. For the agglomerative hierarchical algorithm, validation produced a Davies–Bouldin index of 0.8570, a silhouette coefficient of 0.3344, and a Calinski–Harabasz score of 2902.8186. For the fuzzy c-means algorithm, validation produced a Davies–Bouldin index of 0.9950, a silhouette coefficient of 0.3303, and a Calinski–Harabasz score of 3390.8621.

Table 1 Results of the validation of the three algorithms

| Algorithms | No of clusters | SC | DBI | CHS |
|---|---|---|---|---|
| K-means | 3 | 0.3500 | 0.9420 | 3422.3283 |
| Agglomerative Hierarchical | 3 | 0.3344 | 0.8570 | 2902.8186 |
| Fuzzy C-means | 3 | 0.3303 | 0.9950 | 3390.8621 |

Fig. 2 Graphical comparison of K-means, AHC, and FCM with three different performance indicators, viz., SC, DBI, and CHS

From Table 1, it can be seen that agglomerative hierarchical clustering performs best on DBI, with a minimum of 0.8570; the highest SC, 0.3500, is recorded for k-means clustering; and the highest CHS, 3422.3283, is also produced by k-means. Figure 2 shows the graphical comparison of K-means, AHC, and FCM with three different performance indicators, viz., SC, DBI, and CHS.
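The comparison drawn from Table 1 can be reproduced with a short sketch. The values are copied from Table 1; the dictionary layout and the helper name `best_algorithm` are choices made for this illustration only.

```python
# Validation results copied from Table 1 (three clusters for each algorithm).
results = {
    "K-means": {"SC": 0.3500, "DBI": 0.9420, "CHS": 3422.3283},
    "Agglomerative Hierarchical": {"SC": 0.3344, "DBI": 0.8570, "CHS": 2902.8186},
    "Fuzzy C-means": {"SC": 0.3303, "DBI": 0.9950, "CHS": 3390.8621},
}

# SC and CHS should be maximized, DBI should be minimized.
higher_is_better = {"SC": True, "DBI": False, "CHS": True}

def best_algorithm(metric):
    """Name of the algorithm with the best value for the given metric."""
    pick = max if higher_is_better[metric] else min
    return pick(results, key=lambda name: results[name][metric])

for metric in ("SC", "DBI", "CHS"):
    print(metric, "->", best_algorithm(metric))
# SC -> K-means
# DBI -> Agglomerative Hierarchical
# CHS -> K-means
```

The three indicators do not agree on a single winner, so the ranking depends on which validation index is given priority.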

5 Conclusion

Drug discovery is a complicated process, and it takes several years to discover a novel drug. Machine learning algorithms are therefore needed to speed up the process of drug discovery. To design a new drug, we must know every drug’s reaction at every stage; this saves time in discovering a new drug for a given disease. The focus of this research is on speeding up the lead identification stage of the drug discovery process. It also focuses on fingerprint creation and on finding similarities using metrics such as the Tanimoto coefficient. We applied the k-means, agglomerative hierarchical, and fuzzy c-means clustering algorithms and clustered the compounds based on their similarity. We have successfully implemented compound similarity prediction, which helps anticipate the reaction of a new drug. Here, the agglomerative hierarchical clustering algorithm gives the best result in terms of the Davies–Bouldin index compared to the k-means and fuzzy c-means algorithms.

References
1. Prakash, N., Devangi, P.: Drug discovery. J. Antivir Antiretrovir 2, 063–068 (2010)
2. Deore, A.B., Dhumane, J.R., Wagh, R., Sonawane, R.: The stages of drug discovery and development process. Asian J. Pharm. Res. Dev. 7, 62–67 (2019)
3. Dara, S., Dhamercherla, S., Jadav, S.S., Babu, C.M., Ahsan, M.J.: Machine learning in drug discovery: a review. Artif. Intell. Rev. 55, 1947–1999 (2022)
4. Moffat, J., Vincent, F., Lee, J., Eder, J., Prunotto, M.: Opportunities and challenges in phenotypic drug discovery: an industry perspective. Nat. Rev. Drug Discov. 16, 531–543 (2017)
5. Mohs, R.C., Greig, N.H.: Drug discovery and development: role of basic biological research. Alzheimers Dement. (N Y) 3, 651–657 (2017)
6. DiMasi, J.A., Hansen, R.W., Grabowski, H.G.: The price of innovation: new estimates of drug development costs. J. Health Econ. 22, 151–185 (2003)
7. Baskin, I.I.: Practical constraints with machine learning in drug discovery. Expert Opin. Drug Discov. 16, 929–931 (2021)
8. Gupta, R., Srivastava, D., Sahu, M., Tiwari, S., Ambasta, R.K., Kumar, P.: Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol. Divers. 25, 1315–1360 (2021)
9. Mingbo, Z., Huipu, H., Zhili, X., Ming, C.: Applications of machine learning in drug discovery. Biomed. J. Sci. Tech. Res. 23, 17050–17052 (2019)
10. Liao, C., Sitzmann, M., Pugliese, A., Nicklaus, M.C.: Software and resources for computational medicinal chemistry. Future Med. Chem. 3, 1057–1085 (2011)
11. Talevi, A., Morales, J.F., Hather, G., Podichetty, J.T., Kim, S., Bloomingdale, P.C., Kim, S., Burton, J., Brown, J.D., Winterstein, A.G., Schmidt, S., White, J.K., Conrado, D.J.: Machine learning in drug discovery and development part 1: a primer. CPT Pharmacometrics Syst. Pharmacol. 9, 129–142 (2020)
12. Carracedo-Reboredo, P., Liñares-Blanco, J., Rodríguez-Fernández, N., Cedrón, F., Novoa, F.J., Carballal, A., Maojo, V., Pazos, A., Fernandez-Lozano, C.: A review on machine learning approaches and trends in drug discovery. Comput. Struct. Biotechnol. J. 19, 4538–4558 (2021)
13. Manne, R.: Machine learning techniques in drug discovery and development. Int. J. Appl. Res. 7, 21–28 (2021)
14. Begam, B.F., Kumar, J.S.: A study on cheminformatics and its applications on modern drug discovery. Procedia Eng. 38, 1264–1275 (2012)
15. Lo, Y.C., Rensi, S.E., Torng, W., Altman, R.B.: Machine learning in chemoinformatics and drug discovery. Drug Discov. Today 23, 1538–1546 (2018)
16. Priya, N., Shobana, G.: Application of machine learning models in drug discovery: a review. Int. J. Emerg. Technol. 10, 268–275 (2019)
17. Vamathevan, J., Clark, D., Czodrowski, P., Dunham, I., Ferran, E., Lee, G., Li, B., Madabhushi, A., Shah, P., Spitzer, M., Zhao, S.: Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 18, 463–477 (2019)
18. Patel, L., Shukla, T., Huang, X., Ussery, D.W., Wang, S.: Machine learning methods in drug discovery. Molecules 25, 5277 (2020)
19. Gu, L., Zhang, X., Li, K., Jia, G.: Using molecular fingerprints and unsupervised learning algorithms to find simulants of chemical warfare agents. J. Phys. Conf. Ser. 1684(012072), 1–13 (2020)
20. Syarofina, S., Bustamam, A., Yanuar, A., Sarwinda, D., Al-Ash, H.S., Hayat, A.: The distance function approach on the mini batch K means algorithm for the DPP-4 inhibitors on the discovery of type 2 diabetes drugs. Proc. Comp. Sci. 179, 127–134 (2021)

Automated Photomontage Generation with Neural Style Transfer Mohit Soni, Gaurang Raval, Pooja Shah, and Sharada Valiveti

1 Introduction

Transferring visual style from one image to another is a common problem for computer vision and deep neural networks in the era of machine learning and artificial intelligence. Neural style transfer extracts a style from one image so that it can be applied to the content of another image. Two inputs are needed: the first is the content image, and the second is the style image. Both inputs are processed through a convolutional neural network (CNN) [1]. The objective of the proposed work is to use text as the content and combine it with the style of another image with minimum loss. Suppose there is an image with a red background and one wants to apply custom text over it; this task can be done with the help of image editing applications.

M. Soni (B) · G. Raval · S. Valiveti Computer Science and Engineering Department, Institute of Technology, Nirma University, Ahmedabad 382481, India e-mail: [email protected] G. Raval e-mail: [email protected] S. Valiveti e-mail: [email protected] P. Shah Department of CSE, School of Technology, Pandit Deendayal Energy University, Gandhinagar 382007, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_13


However, with the help of neural computing approaches, this integration of style and content can be done in an automated way. Recently, some researchers have explored the applications of this idea and applied the same for style transfer as reported in the literature. The results obtained can be further fine-tuned by applying iterative corrections to the outcome of this transfer process. Section 2 discusses the related work in this area. Section 3 presents the proposed approach with a detailed explanation; results and related analysis are presented in Sects. 4 and 5 and conclusions in Sect. 6.

2 Related Work Several researchers have worked on neural style transfer-based approaches in the recent past, as summarized in this section. Kim et al. [2] used generative adversarial networks (GANs) with a different structure for neural style transfer. In that paper, European paintings are used as the input, and stochastic gradient descent is used to decrease the loss during neural style transfer. Gatys et al. [3] used a CNN to reconstruct the input image from 5 convolution layers of the original VGG-16. To prevent feature loss, the squared-error loss between the two feature representations is defined, and the total loss is minimized according to the equation below.

$$L = \alpha L_{content} + \beta L_{style} \qquad (1)$$

Liu et al. [4] discussed image style transfer without neural networks. Common techniques are texture synthesis, image analogy, and image filtering. Feed-forward networks are used for combining style and image while ensuring that both flexibility of style and computational efficiency are achieved, with a real-time effect. The authors of [5] proposed using neural style transfer in Japanese animation, in which the use of color in the output images is limited to make them similar to the ideal color. A CNN is used to reduce the workload of Japanese animation production; they could create images with similar styles and different content. The authors of [6] proposed creating human-like art using artificial intelligence. By selecting the style-to-content ratio, a computer can create art that looks as if it was created by humans. Zhao et al. [7] divided the loss function into two categories: a local loss function and a global loss function. The local loss function keeps the style details, while the global loss function preserves the structure of the image. The global loss function uses the feature maps of a neural network for optimization, whereas the local loss function divides the feature maps into blocks for optimization. Sheng et al. [8] proposed a mechanism through which neural style transfer can predict the ink tone, brush stroke, and yellowing that differentiates the Chinese


painting and the western art. The Chinese Painting Style Transfer algorithm preserves the Chinese painting style during style transfer [11]. Yuan et al. [9] generated artistic fonts with a machine learning method based on conditional generative adversarial networks. The first idea of their approach is to use two separate networks: the typeface network and the decorative network. The typeface network changes the shape of an input font, and the decorative network adds effects to the font. The second idea is to provide the skeleton and edges of the character as an auxiliary input to the typeface network to avoid distorting the font shape. Atarsaikhan et al. [10] created decorated logos, a task that normally requires photo editing skills. They introduced a new loss function based on the distance transform of the input image, which allows the shapes of the text and objects to be retained.

3 Proposed Approach This section discusses the proposed approach in detail. Figure 1 shows the architecture of the proposed approach. Dataset: For content and feature extraction, a model pre-trained on ImageNet classifies image content into various classes such as cars, animals, and airplanes. The model has been trained on more than 1 million ImageNet photographs, which helps to get the desired content from the image.

Fig. 1 Proposed architecture of NST


Model-Feature: Features are extracted using the pre-trained VGG-19 network. VGG-19 is trained on the ImageNet dataset and is widely used in tasks such as face detection and capture. In this work, neural style transfer is used where the model is frozen and the desired convolutional layer is selected from each convolutional block to extract the required features. In this approach, different combinations of style layers are extracted, and the results are compared. The outputs of these style layers are used to form a Gram matrix. The Gram matrix is a mathematical way to represent style in convolutional neural networks: entry (i, j) for layer l is the dot product of the vectorized feature maps i and j of that layer.

$$G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk} \qquad (2)$$
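As a minimal sketch of Eq. (2), the Gram matrix of one layer can be computed from its flattened feature maps. The function name `gram_matrix` and the list-of-lists data layout are assumptions made for illustration; the paper does not give an implementation, and a practical version would operate on framework tensors instead.

```python
def gram_matrix(features):
    """Return G with G[i][j] = sum_k F[i][k] * F[j][k] (Eq. 2).

    `features` is a list of C feature maps for one layer, each already
    flattened to a list of K = height * width activations. This layout
    is an assumption for illustration, not the paper's implementation.
    """
    channels = len(features)
    return [
        [sum(a * b for a, b in zip(features[i], features[j]))
         for j in range(channels)]
        for i in range(channels)
    ]


# Toy layer with two feature maps of two activations each.
toy_features = [[1.0, 2.0], [3.0, 4.0]]
toy_gram = gram_matrix(toy_features)
print(toy_gram)
```

The result is symmetric, with the diagonal holding each feature map's squared norm and the off-diagonal entries holding pairwise correlations.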

The Gram matrix provides the similarities between features in a layer, and it also provides some insight into the texture and color information present in a picture. Loss Function: To obtain the style features and content from the images and measure the accuracy of the process, a loss function is defined. This loss must be minimized to achieve the desired quality of the output. The loss between generated style and original style, or between generated content and original content, is calculated as a per-pixel loss. Content Loss: Compared to the style loss, the content loss is easier to calculate, as the generated content depends on a single convolutional layer. The content loss is denoted by L_content. It measures the difference between the features of the generated image and those of the input image using the squared loss function.

$$L_{content}(F(x), F(x_c)) = \sum_{i=1}^{N_l} \lVert F(x) - F(x_c) \rVert^{2} \qquad (3)$$

Here, F(x) denotes the original image, and F(x_c) denotes the generated image. Style Loss: Style is a mixture of edges, curves, strokes, and textures, which can be extracted at different layers of a CNN. Hence, to get the style, we need to find the correlation between the feature maps of each layer used to calculate the style information. A Gram matrix is a correlation matrix used to capture style and compute the style loss. Reducing the style loss ensures that the Gram matrices of the generated image and the style image differ minimally at the chosen convolution layers. Each Gram matrix entry is the product of the ith and jth feature maps, summed over the height and width of the feature map. To calculate the style loss, the difference between the Gram matrices is passed to the squared loss function. Style loss does not depend on the content of the image.


Fig. 2 Example of the photomontage

$$L^{l}_{style} = \frac{1}{M^{l}} \sum_{ij} \left( G^{l}_{ij}(s) - G^{l}_{ij}(g) \right)^{2} \qquad (4)$$

where $G^{l}_{ij}(I) = \sum_{k} A^{l}_{ik}(I) A^{l}_{jk}(I)$. Total Loss: After calculating the content and style losses, the total loss is computed as the weighted sum of the content loss and the style loss. The balance between the two can be tuned by changing the values of alpha and beta.

$$L = \alpha L_{content} + \beta L_{style} \qquad (5)$$
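The three loss terms can be sketched in a few lines of plain Python. This is an illustrative reading of Eqs. (3) to (5), not the authors' code: the function names are hypothetical, the feature maps are assumed to be flattened lists, and `m` stands in for the normalization constant $M^{l}$, which the text does not define further.

```python
def content_loss(gen_features, content_features):
    """Squared-error content loss between two flattened feature lists (Eq. 3)."""
    return sum((g - c) ** 2 for g, c in zip(gen_features, content_features))


def style_loss(gram_style, gram_generated, m):
    """Per-layer style loss (Eq. 4): squared Gram difference scaled by 1/m.

    `m` is a stand-in for the normalization constant M^l (assumption).
    """
    total = 0.0
    for row_s, row_g in zip(gram_style, gram_generated):
        for s, g in zip(row_s, row_g):
            total += (s - g) ** 2
    return total / m


def total_loss(l_content, l_style, alpha, beta):
    """Weighted sum of content and style losses (Eq. 5)."""
    return alpha * l_content + beta * l_style


# Toy values only; alpha < beta follows the setting described in Sect. 4.
lc = content_loss([1.0, 2.0], [1.0, 4.0])
ls = style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]], m=2)
loss = total_loss(lc, ls, alpha=1.0, beta=10.0)
print(lc, ls, loss)
```

Raising beta relative to alpha pushes the optimizer toward matching the style Gram matrices at the expense of content fidelity.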

4 Implementation Results To obtain the output of neural style transfer, the model for NST is trained by extracting the content and style features [11] from the synthesized image and then optimizing the loss function to obtain a better result. Figure 2 demonstrates the combined output of style and content. By changing the values of alpha and beta from the total loss, it can be observed that there will be a change in the generated image. Parameters alpha and beta act as the hyper-parameters for content loss and style loss values. Alpha suggests the amount of content, and beta suggests amount of style. To obtain the best results, the values of alpha and beta are altered such that alpha < beta. Hence, it can be concluded that the style is applied to the content.

5 Analysis and Comparative Study In order to get optimized output image based on loss and computation time [12], different variations on the given algorithm were tried, and their statistics are projected in the following subsections.


Fig. 3 Analysis swapping style and content

5.1 Based on Style and Content Swap Initially, the content image is taken as the generated image and then merged with the style. Figure 3 shows that if style image is taken initially as the generated image, the total loss would reduce, but the end product image would not meet the desired objective. Figure 3 implies that even if the style loss decreases per iteration, the content loss keeps increasing, and hence, in the output, we don’t get text from the written content image.

5.2 Based on Different Number of Iterations Figure 4 shows the loss graph and output for 500 to 5000 iterations. As the number of iterations increases, the curve of total loss versus iteration becomes smooth, which eventually results in an optimized output image. It is also observed that after a certain number of iterations, both the content loss and the style loss start decreasing simultaneously.

5.3 Based on Different Extracted Layer Figure 5 shows loss graph and output of [3, 8, 15, 22], [6, 11, 18, 25], [11, 15, 22, 29], [15, 22, 25, 29] layers extracted for style and each array’s last element for content. It can be observed that the color of the style image is extracted in the earlier layer


Fig. 4 Analysis different iterations

of the model. Hence, at higher layers, we get a mostly white image. Apart from that, the least total loss is obtained at [13, 15, 22, 29], but the resulting image is not the same as the desired objective image. The output nearest to our objective image is obtained at [3, 8, 15, 22]. After adding another style layer for extraction, the best output is obtained at [3, 8, 15, 22, 27].

6 Challenges and Conclusion The output nearest to the objective image is obtained with layers [3, 8, 15, 22, 27]. All these extracted layers are ReLU (rectified linear unit) layers. The reason is that tensors computed at ReLU layers are smoothed by the activation, resulting in a better image with less computation [13] (Fig. 6).


Fig. 5 Analysis of different extracted layers

Fig. 6 5 Style layers best output

The current approach lags behind a Photoshopped output image in computation time and accuracy, which can be seen from the SSIM scores. More feature layers from the style image can be used to reduce the style loss and optimize the output image. There are various other ways to improve NST, such as improving the loss function. These challenges can be addressed by using autoencoders [14] and cycle-consistent adversarial networks [15] for extracting style and content. The computation time can be reduced by using real-time style transfer [16] and various boosting schemes such as AdaBoost and CatBoost [17].


References 1. Luan, F., Paris, S., Shechtman, E., Bala, K.: Deep photo style transfer. CVPR 2017 2. Kim, K.S., Kim, D., Kim, J.: Hardness on style transfer deep learning for rococo painting masterpieces. In: 2019 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 452-454 (2019). https://doi.org/10.1109/ICAIIC.2019.8668965 3. Gatys, L., Ecker, A.: Matthias Bethge; A Neural Algorithm of Artistic Style. J. Vis. 16(12), 326 (2016). https://doi.org/10.1167/16.12.326 4. Liu, L., et al.: Advanced deep learning techniques for image style transfer: a survey. Signal Process. Image Commun. 78, 465–470 (2019) 5. Ye, S., Ohtera, R.: Japanese animation style transfer using deep neural networks. In: 2017 International Conference on Information, Communication and Engineering (ICICE), pp. 492– 495, (2017). https://doi.org/10.1109/ICICE.2017.8479213 6. Jeong, T., Mandal, A.: Flexible selecting of style to content ratio in neural style transfer. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 264–269 (2018). https://doi.org/10.1109/ICMLA.2018.00046. 7. Zhao, H.-H., Rosin, P.L., Lai, Y.-K., Lin, M.-G., Liu, Q.-Y.: Image neural style transfer with global and local optimization fusion. IEEE Access 7, 85573–85580 (2019). https://doi.org/10. 1109/ACCESS.2019.2922554 8. Sheng, J., Song, C., Wang, J., Han, Y.: Convolutional Neural Network Style Transfer Towards Chinese Paintings. IEEE Access 7, 163719–163728 (2019). https://doi.org/10.1109/ACCESS. 2019.2952616 9. Yuan, Y., Ito, Y., Nakano, K.: Art font image generation with conditional generative adversarial networks. In: 2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW), pp. 151–156 (2020). https://doi.org/10.1109/CANDARW51189.2020.00039 10. Atarsaikhan, G., Iwana, B.K., Uchida, S.: Constrained neural style transfer for decorated logo generation. 
In: 13th IAPR International Workshop on Document Analysis Systems (DAS) 11. Li, Y., Fang, C., Yang, J., Wang, Z., Lu, X., Yang, M.H.: Universal style transfer via feature transforms. NeurIPS (2017) 12. Ulyanov, D., Vedaldi, A., Lempitsky, V.: Instance normalization: the missing ingredient for fast stylization 27 Jul 2016 13. Huang, X., Belongie, S.: arbitrary style transfer in real-time with adaptive instance normalization. ICCV (2017) 14. Choi, H.C.: Unbiased image style transfer, date of publication: October 28, 2020, date of current version November 10, 2020. https://doi.org/10.1109/ACCESS.2020.3034306 15. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycleconsistent adversarial networks. ICCV (2017) 16. Johnson, J., Alahi, A., Fei-Fe, L.: Perceptual losses for real-time style transfer and superresolution 27 Mar 2016 17. Venkata Praneel, A.S., Srinivasa Rao, T., Ramakrishna Murty, M.: A survey on accelerating the classifier training using various boosting schemes within casecades of bossted ensembles. In: International Conference with Springer SIST Series, vol. 169, pp, 809–825 (2019)

Machine Learning Model Development Using Computational Neurology Soumen Kanrar

1 Introduction The motivation behind this paper is to explore the effective implementation of the reinforcement learning method based on neural networks. It requires a model much closer to the networking characteristics of biological neurons. Spiking neural networks can better represent spatiotemporal data, making them far more computationally efficient than the existing neural networks. Implementing a working simulation of such networks can be helpful in the following ways. In the field of medical sciences, studying the neuronal dynamics in simulated spiking neural networks is an alternative to invasive procedures [1], and it provides a unique way of studying pathologies. As an educational tool, it helps with computational neuroscience, a formidable field to wrap one's head around. As a predictive tool, it helps to better understand the computational models of neurons, not to mention their unsupervised machine learning algorithms. It also addresses challenges currently faced in artificial general intelligence: when intelligent systems are applied to generalized human tasks, older generations of artificial neural networks are incapable of performing these tasks as well as SNN simulators. T. N. Wiesel and D. H. Hubel were the first to recognize that their high-resolution, single-neuron analyses of visual response properties in the cat cortex [2] provided a powerful approach to deciphering how much sensory feature extraction was fixed at birth and how much depended on environment-introduced visual activity. This study helped in explaining the working of a neuron in computational neurology. A neural network model for a mechanism of visual pattern recognition is proposed in this paper. The network is self-organized by ‘learning
S. Kanrar (B) Department of Computer Science and Engineering, Amity University Jharkhand, Ranchi 801001, India e-mail: [email protected] Department of Computer Science, Vidyasagar University, Midnapore, West Bengal, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_14


without a teacher’ and acquires an ability to recognize stimulus patterns based on the geometrical similarity (Gestalt) of their shapes without being affected by their positions. This network is given the nickname ‘neocognitron’. After completion of self-organization, the network has a structure similar to the hierarchy model of the visual nervous system proposed by Hubel and Wiesel [2]. Rate coding and temporal order coding have both been proposed for the retinal ganglion cells that feed the visual cortex [3]. It is often supposed that the messages sent to the visual cortex by the retinal ganglion cells are encoded by the mean firing rates observed on spike trains generated with a Poisson process. One useful tool is the spiking neural network (SNN) simulator [4]. The SNN simulator consists of a customized hardware architecture that computes the neural-synaptic parameters and a software environment that displays them on the screen of a general-purpose computer (host). The data to be visualized is transferred to the host through a high-speed Ethernet link. The proposed solution tailors the use of the Ethernet link bandwidth to an optimum such that the customized neuromorphic architecture and the host computer together can be used to visualize the dynamics of an SNN of any size in real time.

1.1 Biological Neuron We know how our input is to be encoded and transmitted via the input layer. It is crucial for us to describe the structure of a neuron and the function that governs when it fires and when it does not. Biologically speaking, a neuron has three distinct parts, namely, the dendrites, the cyton (cell body), and the axon. Figure 1 shows a typical neuron. Let us consider the dendrites as inputs, the cell body as the processing center, and the axon as the output. The axon terminals of one neuron are connected to the dendrites of other neurons. Such a junction is known as a synapse. The strength of this junction is chemically controlled and dictates the ease with which spikes are transmitted from the presynaptic neuron to the postsynaptic neuron. This strength is subject to change depending on how often the two connected neurons participate in firing.

2 Neuronal Dynamics: SRM-0 Computational neuroscience describes plenty of models that describe the electrodynamics of a neuron. In this work, we consider the formal spike response model–0. It is a generalization of the leaky integrate and fire model. It models a biological neuron with high accuracy and does not suffer from the high dimensionality of ordinary differential equations involved in models, for example, the Hodgkin–Huxley model. Figure 2 shows how the membrane potential builds within a postsynaptic neuron with successive spikes for the spike response model–0. Once this crosses the threshold, the postsynaptic neuron generates an output spike [5].


Fig. 1 Biological neuron

Fig. 2 Membrane potential builds

The process is analytically modeled using Eq. (1).

$$\mu(t) = \mu_{rest} + \sum_{i} \sum_{j} \omega_{ij} \, \varepsilon\left(t - t_j^{(f)}\right) + \eta\left(t - t_i^{(f)}\right) \qquad (1)$$

$\mu(t)$: the membrane potential at time t.
$\mu_{rest}$: the resting membrane potential.
$\varepsilon(t)$: describes the spike and the gain in potential.
$\eta(t)$: models the refractory phase (in case the potential falls drastically).
$\omega_{ij}$: the synaptic strength between the pre- and postsynaptic neurons.
If $\mu(t) = \mu_{threshold}$ and $d\mu_i(t)/dt > 0$, then $t = t_i^{(f)}$.
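To make the structure of Eq. (1) concrete, the following is a minimal sketch of the membrane potential of a single postsynaptic neuron. The paper does not specify the kernels, so the exponential forms of `epsilon` and `eta`, their time constants, and the amplitude are purely hypothetical choices, as are all function and parameter names.

```python
import math


def epsilon(s, tau=10.0):
    """Hypothetical postsynaptic kernel: exponential decay after a spike."""
    return math.exp(-s / tau) if s >= 0 else 0.0


def eta(s, amplitude=-5.0, tau=5.0):
    """Hypothetical refractory kernel: negative dip after the neuron fires."""
    return amplitude * math.exp(-s / tau) if s >= 0 else 0.0


def membrane_potential(t, weights, pre_spike_times, post_spike_times,
                       mu_rest=0.0):
    """Evaluate the SRM-0 style sum of Eq. (1) for one postsynaptic neuron.

    weights[j] is the synaptic strength of presynaptic neuron j and
    pre_spike_times[j] lists its firing times; post_spike_times lists the
    neuron's own past firing times (assumed data layout).
    """
    mu = mu_rest
    for w, spikes in zip(weights, pre_spike_times):
        mu += sum(w * epsilon(t - t_f) for t_f in spikes)
    mu += sum(eta(t - t_f) for t_f in post_spike_times)
    return mu


# One presynaptic spike at t = 0 observed at t = 0, with no refractory term.
mu_now = membrane_potential(0.0, [1.0], [[0.0]], [])
print(mu_now)
```

A firing rule in the spirit of the threshold condition would compare this value with a threshold while the potential is rising; that step is left out because the threshold value is not given in the text.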


2.1 Programming Language for the Computational Neurology We have chosen the programming language Julia to implement our work. Julia is a high-level, general-purpose dynamic programming language that addresses the needs of high-performance numerical analysis for computational science. It does not require a separate compilation step. We have preferred Julia over Python for some specific reasons. One of the reasons is the multiple dispatch available in Julia, which provides the ability to define function behavior across many combinations of argument types. This is effective language design: it provides flexibility in designing general mathematical functions that can be dispatched to different numerical objects. For example, the same function can be used for scaling and addition of vectors and matrices. Julia has an optional, dynamic type system. We can have strong typing wherever type checks are essential for error reduction, and code can also be dynamically typed if necessary. This explicit control helps to reduce errors and makes for better code readability. Julia was designed with high-performance numerical computation in mind, which makes the language effective for computational neurology. JIT (just-in-time) compilation along with LLVM offers speeds approaching those of statically typed languages like C, which can make it orders of magnitude faster than Python. Julia does not require any extra libraries or APIs to call C functions; they can be used as they are. It even goes a step further with its PyCall package, which enables us to run native Python code from Julia, thus leaving the power of the entire Python ecosystem at our disposal. Otherwise, the neuron only fires when it gains potential. If it occurs during depolarizing, i.e., not on losing potential, after a spike, then it is called hyperpolarization.

2.2 Dataset Description: MNIST The MNIST training set is comprised of 60,000 handwritten digit images that can be used to train and validate machine learning models for the pattern recognition task of recognizing digits. Supervised learning algorithms can classify these digits as per the provided labels. Unsupervised learning algorithms can make the distinction between these digits by learning them as different features (Fig. 3).

3 Proposed Methodology Here, we present the background architecture and describe the specifics of the algorithms and how the models operate in the context of pattern recognition tasks. First, we shall look at individual steps and then at the bigger picture. The photo-sensitive nerve cells of the retina are the first stage of the human visual pathway. They receive light rays and encode this visual information in the form of action potentials or


Fig. 3 MNIST sample data

spike trains, transmitted through the optic nerve to the dLGN, in the thalamus, where they are decoded. These retinal cells are arranged in a very specific manner. Certain regions become more susceptible to excitation on being subjected to bright light, and others have the exact opposite behavior. These regions are arranged in an almost concentric circular manner and form on-centers and off-centers [3]. They are known as the receptive fields (RFs). The first step of our algorithm is to model these receptive fields to encode our grayscale images into spike trains. Figure 4 shows two trivial models of on-centered (A) and off-centered (B) receptive fields. These receptive fields are the spatial regions that hold the capability of activating a neuron that lies connected at the center. Outside this region, this neuron does not receive any excitation. In the context of computational modeling, we can essentially think of RFs as a matrix that is used to weigh the pixel intensity values of an image. RFs are characterized by enhanced contractility and matrix remodeling.

Fig. 4 Models of on-centered (A) and off-centered (B) receptive fields

To construct a shape similar to the off-centered receptive field, the distance of a cell from the center of the matrix is treated as the distance between two clusters. The minimum distance between elements of each cluster, $\min \{ d(x, y) : x \in A, y \in B \}$, is also called single-linkage clustering. The formula for this distance between points $x = (x_1, x_2, \cdots)$, $y = (y_1, y_2, \cdots)$ is


$$\frac{1}{\operatorname{card}(A)\,\operatorname{card}(B)} \sum_{x \in A} \sum_{y \in B} d(x, y)$$

The off-centered approach is used here because our MNIST images have a white background and black ink. We need to give more weight to the black and thus a negative center [2]. The image on the right depicts one such receptive field.

Algorithm 1: Generate an n × n receptive field matrix.
Procedure:
  Generate n // odd integer, dimension
  Find origin coordinates: oxy = ceil(n / 2)
  Initialize: W = -1 * (n × n) // a weighted singular matrix
  k = -1 * 0.375
  // Generate weights as per the Manhattan distance, origin as center.
  Begin
    for x = 1 to n do
      for y = 1 to n do
        W[x, y] = W[x, y] - (|oxy - x| + |oxy - y|) * k
      done
    done
    // Return the weighted matrix, its index limits, and origin coordinate.
    Return W, [1 - oxy…, oxy - 1], oxy
  End
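Algorithm 1 can be sketched in Python as follows. The paper's implementation is in Julia and is not shown, so this is only an illustrative translation: the function name `receptive_field` is hypothetical, the initial matrix is assumed to hold -1 in every cell, and 1-based coordinates are kept inside the loops to mirror the pseudocode.

```python
import math


def receptive_field(n, k=-0.375):
    """Build an n x n off-centered receptive field as in Algorithm 1.

    Every cell starts at -1 (assumed reading of "W = -1 * (n x n)") and is
    shifted by the Manhattan distance to the center times -k, so the
    center stays negative and the weights grow toward the border.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that the matrix has a center")
    origin = math.ceil(n / 2)  # 1-based coordinate of the center cell
    weights = [
        [-1.0 - (abs(origin - x) + abs(origin - y)) * k
         for y in range(1, n + 1)]
        for x in range(1, n + 1)
    ]
    return weights, origin


rf, origin = receptive_field(7)
print(origin, rf[3][3], rf[3][4], rf[0][0])
```

For n = 7 the center weight is -1, its direct neighbors are -0.625, and the corners, at Manhattan distance 6, reach 1.25.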

The output is presented in Fig. 5. Our input data is a 28 × 28 pixel image. Convolution is a formal mathematical operation defined as

$$(f * g)(t) = \int_{-\infty}^{+\infty} f(\tau)\, g(t - \tau)\, d\tau \qquad (2)$$

Fig. 5 Dry run output of Algorithm 1 based on Julia


Fig. 6 Sliding window over an image

If we look closely, we can see that this is a weighted integral. Since our input space is discrete, involving pixels and weights, we can write it as

$$\sum_{i=0}^{n} \sum_{j=0}^{n} \omega[i, j] * Img[i_{rel}, j_{rel}] \qquad (3)$$

Which can be thought of as weighing each pixel over a sliding window over an image, as shown in Fig. 6. Figure 7 shown above is a convoluted matrix for n = 7 and dx = dy = 4. The convolution is a 7 × 7 matrix, an input for 49 neurons in the first layer. Algorithm 2 generates a vector containing convoluted values. This can be reshaped to form the convoluted matrix
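The sliding-window weighting of Eq. (3) can be sketched as below. How the window is anchored and how borders are treated are assumptions chosen so that a 28 × 28 image with n = 7 and a stride of 4 yields a 7 × 7 result, as stated in the text; the function name `convolve_strided` is hypothetical.

```python
def convolve_strided(image, weights, stride):
    """Weigh pixels under a sliding n x n window (discrete form, Eq. 3).

    The window is centered on every `stride`-th pixel and pixels that
    fall outside the image are skipped. Both choices are assumptions
    made for this sketch.
    """
    n = len(weights)
    half = n // 2
    height, width = len(image), len(image[0])
    out = []
    for ci in range(0, height, stride):
        row = []
        for cj in range(0, width, stride):
            acc = 0.0
            for m in range(-half, half + 1):
                for k in range(-half, half + 1):
                    i, j = ci + m, cj + k
                    if 0 <= i < height and 0 <= j < width:
                        acc += weights[m + half][k + half] * image[i][j]
            row.append(acc)
        out.append(row)
    return out


# Uniform toy inputs make the window coverage easy to check by hand.
image = [[1.0] * 28 for _ in range(28)]
weights = [[1.0] * 7 for _ in range(7)]
result = convolve_strided(image, weights, stride=4)
print(len(result), len(result[0]), result[0][0], result[1][1])
```

Flattening `result` row by row gives the 49 values that feed the first-layer neurons.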


Fig. 7 Convoluted matrix for n = 7

Algorithm 2: Generate convolution Image Int: n //Dimension Real: (idx, jdy) // Strides Initialize Real: Empty conVector [] Get w, idx, jdy, oxy // Use algorithm 1 Generate: RF (n) // //RF-Window, for i=1 to Ylimit //Xlimit, Ylimit are pixel limits of the image for j=1 to Xlimit sum = 0 // Sliding window indices for m in idx for n in idx if (1 x + 1 (7) y−1 + y−x−1 p=i+1 (1 − νx p )(1 − ν py )

y−x−1

And, for y = x + 1, let $\bar{M}_{xy} = M_{xy}$, and for y < x, let $\bar{M}_{xy} = (\bar{\nu}_{yx}, \bar{\mu}_{yx})$. Using the above equations, the multiplicative consistent intuitionistic preference relation is obtained, including the lower triangular components of the matrix.


Step 5: Consistency Check The distance between the intuitionistic relations is calculated with the formula below [16].

$$d(\bar{M}, M) = \frac{1}{2(n-1)(n-2)} \sum_{x=1}^{n} \sum_{y=1}^{n} \left( |\bar{\mu}_{xy} - \mu_{xy}| + |\bar{\nu}_{xy} - \nu_{xy}| + |\bar{\pi}_{xy} - \pi_{xy}| \right) \qquad (8)$$

$$d(\bar{M}, M) < \tau \qquad (9)$$
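The consistency check of Eqs. (8) and (9) can be sketched as follows. Each matrix entry is assumed to be a `(mu, nu)` pair, and the hesitancy degree is computed with the standard intuitionistic fuzzy definition π = 1 − μ − ν. The function names and the default threshold `tau=0.1` are hypothetical; the text does not state a value for τ.

```python
def if_distance(m_bar, m):
    """Distance between two intuitionistic preference relations (Eq. 8).

    Both arguments are n x n lists of (mu, nu) pairs with n >= 3.
    """
    n = len(m)
    total = 0.0
    for x in range(n):
        for y in range(n):
            mu_b, nu_b = m_bar[x][y]
            mu, nu = m[x][y]
            pi_b = 1.0 - mu_b - nu_b  # hesitancy of the derived relation
            pi = 1.0 - mu - nu        # hesitancy of the original relation
            total += abs(mu_b - mu) + abs(nu_b - nu) + abs(pi_b - pi)
    return total / (2 * (n - 1) * (n - 2))


def is_consistent(m_bar, m, tau=0.1):
    """Eq. (9): accept the relation when the distance is below tau.

    tau = 0.1 is an assumed threshold for illustration only.
    """
    return if_distance(m_bar, m) < tau


# Toy 3 x 3 relation and a copy that differs in a single entry.
original = [[(0.5, 0.5), (0.6, 0.3), (0.5, 0.3)],
            [(0.3, 0.6), (0.5, 0.5), (0.7, 0.2)],
            [(0.3, 0.5), (0.2, 0.7), (0.5, 0.5)]]
derived = [row[:] for row in original]
derived[0][2] = (0.6, 0.2)
distance = if_distance(derived, original)
print(distance, is_consistent(derived, original))
```

When the check fails, the procedure described in the text goes back to the first step and the judgments are revised.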

If Eq. (9) holds, the relation matrix is consistent in the intuitionistic sense. Here, τ is the consistency threshold. If Eq. (9) is not satisfied, the intuitionistic preference relation is not consistent, and the procedure returns to Step 1. Step 6: Weight Calculation The following equations are used to compute the weights of the intuitionistic preference relations [15].

$$w_x = \frac{\sum_{y=1}^{n} \bar{M}_{xy}}{\sum_{x=1}^{n} \sum_{y=1}^{n} \bar{M}_{xy}} = \frac{\sum_{y=1}^{n} [\mu_{xy}, 1 - \nu_{xy}]}{\sum_{x=1}^{n} \sum_{y=1}^{n} [\mu_{xy}, 1 - \nu_{xy}]} \qquad (10)$$

$$w_x = \left[ \frac{\sum_{y=1}^{n} \mu_{xy}}{\sum_{x=1}^{n} \sum_{y=1}^{n} (1 - \nu_{xy})},\; 1 - \frac{\sum_{y=1}^{n} (1 - \nu_{xy})}{\sum_{x=1}^{n} \sum_{y=1}^{n} \mu_{xy}} \right] \qquad (11)$$

According to Szmidt and Kacprzyk, a function in mathematical category is as follows:

$$\rho(\alpha) = 0.5\,(1 + \pi_\alpha)(1 + \mu_\alpha) \qquad (12)$$

Step 7: Preference Ranking After deriving the weights with Eq. (11), ρ(α) is calculated with the help of Eq. (12). Using the ρ(α) values, the order of preference of the attributes (rank) is found.
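Steps 6 and 7 can be sketched together. The code follows Eq. (11) and Eq. (12) as they are printed here; the `(mu, nu)` pair layout, the function names, and the use of π = 1 − μ − ν for the hesitancy of a weight are assumptions. The direction of the final ranking is not stated in the text, so the sketch stops at computing the scores.

```python
def ifahp_weights(relation):
    """Intuitionistic weights per Eq. (11) for an n x n list of (mu, nu)."""
    total_mu = sum(mu for row in relation for mu, _ in row)
    total_one_minus_nu = sum(1.0 - nu for row in relation for _, nu in row)
    weights = []
    for row in relation:
        row_mu = sum(mu for mu, _ in row)
        row_one_minus_nu = sum(1.0 - nu for _, nu in row)
        weights.append((row_mu / total_one_minus_nu,
                        1.0 - row_one_minus_nu / total_mu))
    return weights


def rho(weight):
    """Score of an intuitionistic weight, written as printed in Eq. (12)."""
    mu, nu = weight
    pi = 1.0 - mu - nu  # hesitancy degree (assumed standard definition)
    return 0.5 * (1.0 + pi) * (1.0 + mu)


# Toy 2 x 2 relation, used only to exercise the formulas.
relation = [[(0.5, 0.5), (0.6, 0.3)],
            [(0.3, 0.6), (0.5, 0.5)]]
weights = ifahp_weights(relation)
scores = [rho(w) for w in weights]
print(weights, scores)
```

Each weight is itself an intuitionistic value, which is why a scalar score such as ρ is needed before the attributes can be ordered.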

4 Implementation Determining the attribute priority among the maintenance and support phases in software development process using intuitionistic fuzzy analytical hierarchy process is important for the module of the software development process. Each level of the software development process will ensure the quality of the software. After deployment of the software at the client place, maintenance and supporting will


[Figure: flowchart; recoverable labels: Questionnaire Framing, Author, Intelligent Agent, Questionnaires, Sending Questionnaire to Expert, IT Industry Experts, System Input]

Input N = 79%

Table 1 (continued)
Authors | Methodology | Future scope/enhancement | Accuracy (%)
Yao et al. in [6] | Using fastText approach | Improving text categorization models by applying various weights to features | Precision: 0.92, Recall: 0.93, F-value: 0.92
Zhang et al. in [7] | Using Mahalanobis distance-based KNN classifier | MDKNN can be utilised since it is more accurate than KNN | 70–90%
Wenqian et al. in [8] | Using novel feature weight method (Gini index) | Improving the performance of the TF-Gini algorithm for categorization | The TF-Gini method, when used to weight features in classifiers, produces better classification results than other feature weight algorithms; 60–70%
Shaohui et al. in [9] | Using vector space model | The VSM representation of a document loses a lot of information about term association, which would be a future topic | High precision and recall
Alan et al. in [10] | Using word embeddings | Apply the algorithms to PubMed data | LCAF1: 0.89

3 Conclusion As the amount of data grows exponentially in the modern day, the need for text data classification and categorization grows. Machine learning techniques may be useful in resolving this problem. Email filtering, chat message filtering, news feed filtering, and other applications can all benefit from text categorization. The need has also been observed in places such as libraries, bookstores, and eBook sites where books are not classified by genre. The main aim here is therefore to use machine learning techniques and text categorization tools to classify books by genre based on the title and abstract. This paper presents an extensive survey of studies by various authors related to the subject. Acknowledgements The authors are thankful to the Department of Information Technology of GMR Institute of Technology and to all the authors of the references.

References 1. Shiroya, P., Vaghasiya, D., Soni, M., Patel, V., Panchal, B.Y.: Book genre categorization using machine learning algorithms (K-nearest neighbor, support vector machine and logistic regression) using customized dataset. Int. J. Comput. Sci. Mob. Comput. 10(3), 14–25 (2021) 2. Gupta, S., Agarwal, M., Jain, S.: Automated genre classification of books using machine learning and natural language processing. In: 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 269–272. IEEE (2019) 3. Agarwal, D., Vijay, D.: Genre classification using character networks. In: 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 216–222. IEEE (2021) 4. Ozsarfati, E., Sahin, E., Saul, C.J., Yilmaz, A.: Book genre classification based on titles with comparative machine learning algorithms. In: 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS), pp. 14–20. IEEE (2019)

240

A. Sethy et al.

5. Li, Z., Shang, W., Yan, M.: News text classification model based on topic model. In: 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), pp. 1– 5. IEEE (2016) 6. Yao, T., Zhai, Z., Gao, B.: Text classification model based on fasttext. In: 2020 IEEE International Conference on Artificial Intelligence and Information Systems (ICAIIS), pp. 154–157. IEEE (2020) 7. Zhang, S., Pan, X.: A novel text classification based on Mahalanobis distance. In: 2011 3rd International Conference on Computer Research and Development, vol. 3, pp. 156–158. IEEE (2011) 8. Shang, W., Dong, H., Zhu, H., Wang, Y.: A novel feature weight algorithm for text categorization. In: 2008 International Conference on Natural Language Processing and Knowledge Engineering, pp. 1–7. IEEE (2008) 9. Liu, S., Dong, M., Zhang, H., Li, R., Shi, Z.: An approach of multi-hierarchy text classification. In: 2001 International Conferences on Info-Tech and Info-Net. Proceedings (Cat. No. 01EX479), vol. 3, pp. 95–100. IEEE (2001) 10. Stein, R.A., Jaques, P.A., Valiati, J.F.: An analysis of hierarchical text classification using word embeddings. Inf. Sci. 471, 216–232 (2019) 11. Chiang, H., Ge, Y., Wu, C.: Classification of book genres by cover and title. In: IEEE International Conference on Intelligent Systems and Green Technology (ICISGT) (2019) 12. Patel, S.H., Aggarwal, D.: BGCNet: A Novel Deep Visual Textual Model for Book Genre Classification 13. Xu, Z., Liu, L., Song, W., Du, C.: Text genre classification research. In: 2017 International Conference on Computer, Information and Telecommunication Systems (CITS), pp. 175–178. IEEE (2017) 14. Feldman, S., Marin, M.A., Ostendorf, M., Gupta, M.R.: Part-of-speech histograms for genre classification of text. In: 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4781–4784). IEEE (2009) 15. 
Sarangi, P.K., Sahoo, A.K., Nayak, S.R., Agarwal, A., Sethy, A.: Recognition of isolated handwritten Gurumukhi numerals using Hopfield neural network. In: Computational Intelligence in Pattern Recognition, pp. 597–605. Springer, Singapore (2022) 16. Liu, R., Jiang, M., Tie, Z.: Automatic genre classification by using co-training. In: 2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery, Vol. 1, pp. 129–132. IEEE (2009) 17. Saini, T., Tripathi, S.: Predicting tags for stack overflow questions using different classifiers. In 2018 4th International Conference on Recent Advances in Information Technology (RAIT), pp. 1–5. IEEE (2018) 18. De Dillmont, T.: Encyclopedia of Needlework. Editions Th. de Dillmont (1987) 19. Sethy, A., Raut, A.K., Nayak, S.R.: Face recognition based automated recognition system. In: 2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 543–547 (2022). https://doi.org/10.1109/Confluence52989.2022.9734135 20. Parwez, M.A., Abulaish, M.: Multi-label classification of microblogging texts using convolution neural network. IEEE Access 7, 68678–68691 (2019) 21. Yang, Z., Liu, G.: Hierarchical sequence-to-sequence model for multi-label text classification. IEEE Access 7, 153012–153020 (2019) 22. Sethy, A., Patra, P.K., Nayak, S., Jena, P.M.: Symmetric axis based off-line Odia handwritten character and numeral recognition. In: 2017 3rd International Conference on Computational Intelligence and Networks (CINE), pp. 83–87. IEEE (2017) 23. Helaskar, M.N., Sonawane, S.S.: Text classification using word embeddings. In: 2019 5th International Conference On Computing, Communication, Control And Automation (ICCUBEA), pp. 1–4. IEEE (2019) 24. Holanda, A.J., Matias, M., Ferreira, S.M., Benevides, G.M., Kinouchi, O.: Character networks and book genre classification. Int. J. Mod. Phys. C 30(08), 1950058 (2019) 25. 
Mason, J.E., Shepherd, M., Duffy, J.: An n-gram based approach to automatically identifying web page genre. In: 2009 42nd Hawaii International Conference on System Sciences, pp. 1–10. IEEE (2009)

Book Genre Classification System Using Machine Learning Approach: …

241

26. Kundu, C.S.: Book Genre Classification by its Cover Using a Multi-view Learning Approach (2020) 27. Kundu, C., Zheng, L.: Deep Multi-modal Networks for Book Genre Classification Based on its Cover (2020). arXiv preprint arXiv:2011.07658 28. Worsham, J., Kalita, J.: Genre identification and the compositional effect of genre in literature. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 1963– 1973 (2018) 29. Sethy, A., Patra, P.K.: Discrete cosine transformation based approach for offline handwritten character and numeral recognition. J. Phys. Conf. Ser. 1770(1):012004. IOP Publishing (2021) 30. Jordan, E.: Automated Genre Classification in Literature. Doctoral dissertation, Kansas State University (2014) 31. Sobkowicz, A., Kozłowski, M., Buczkowski, P.: Reading book by the cover—book genre detection using short descriptions. In: International Conference on Man–Machine Interactions (pp. 439–448). Springer, Cham (2017) 32. Zheng, W., Jin, M.: Comparing multiple categories of feature selection methods for text classification. Digit. Scholarsh. Human. 35(1), 208–224 (2020) 33. Zheng, Y.: An exploration on text classification with classical machine learning algorithm. In: 2019 International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), pp. 81–85. IEEE (2019) 34. Sethy, A., Patra, P.K., Nayak, S.R.: A hybrid system for handwritten character recognition with high robustness. Traitement du Signal 39(2), 567–576 (2022). https://doi.org/10.18280/ ts.390218

A New Method for Imbalanced Data Reduction Using Data Based Under Sampling
B. Manjula and Shaheen Layaq

B. Manjula · S. Layaq (✉), Department of Computer Science, Kakatiya University, Warangal, Telangana, India. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_22

1 Introduction
To understand and interpret a large volume of data, we have to classify it properly so that appropriate decisions can be taken. Classification of large datasets often runs into the problem of class imbalance, which lowers the accuracy of the results. Imbalance occurs when one class has many more samples than the other, and it is common in applications where the event of interest is rare. One such rare, sensitive, and imbalanced application, which we consider in our work, is software defect prediction (SDP). After extracting the SDP datasets from the PROMISE repository, we observed that they carry two labels, defect and non-defect, and that the non-defect samples outnumber the defect samples. This difference in counts creates the imbalance problem: the non-defect samples form the majority class and the defect samples form the minority class. Classifying such an imbalanced SDP dataset directly leads to inaccurate results, so the dataset has to be balanced at an early stage, before prediction. Among the many balancing approaches, data sampling shows good performance. Data sampling methods overcome imbalance by adding samples, deleting samples, or sometimes both. Many data sampling methods have been presented in earlier work, each with its own merits and demerits. They can be broadly classified as under sampling, oversampling, and hybrid sampling. Oversampling methods balance the data by adding new samples to the minority class, with the risk that unrelated samples may be added. Under sampling methods balance the data by removing extra samples from the majority class, with the risk that important data may be lost. Our proposed method, IDRUS, reduces this risk by removing the samples that are least important. Hybrid sampling combines oversampling and under sampling, but it is harder to implement and has higher time complexity.

2 Related Work
Many data-based methods have been proposed that rely on oversampling, under sampling, or hybrid sampling. Each has its own advantages and problems; some of them are discussed here.

The synthetic minority oversampling technique (SMOTE) [1] was proposed by Chawla et al. It is an oversampling method that gives priority to the minority class. To add extra samples, it randomly selects a sample from the minority class, identifies a sample near it, and generates new random samples between the two. This increases the number of minority samples until the majority and minority classes are balanced. However, because the initial sample is selected randomly, the method can generate unnecessary, outlier, noisy, and boundary samples. Borderline-SMOTE [2] was proposed by Han et al.; it generates new samples near the class border, but most of them were found to be unrelated. The majority weighted minority oversampling technique [3] was described by Barua et al.; it assigns weights to the minority samples, but if the weights are too large or too small, unrelated samples are added. Class Imbalance Reduction (CIR) was proposed by Kiran et al. [4]. It tries to overcome the problem of SMOTE by calculating the mean of the minority dataset, identifying the first nearest neighbor of this pivot (center) sample, and randomly generating new samples. However, the generated samples were found to be biased, dense, and overlapped.

Random under sampling (RUS) was applied by Prusa et al. [5]; samples of the majority class are deleted randomly to reach balance, but the chance of losing important data is very high. A density-based method for deleting majority data [6] was proposed by Małgorzata et al.; however, samples in dense regions tend to be closely related, so important data may be lost. Diversified sensitivity based under sampling [7] was proposed by Ng et al. They applied clustering, distribution, and scaling to the majority class, but scaling may hide some features. Cluster-Based Under sampling (CBUS) [8] was proposed by Yen and Lee. Clusters are formed from the imbalanced data, majority and minority samples are identified in each cluster, and majority samples are deleted randomly. However, forming multiple clusters further reduces the size of the data, and random deletion may lose important data.

The hybrid approach SMOTE with Edited Nearest Neighbor (SMOTEENN) [9] was proposed by Gustavo et al. Oversampling is done by SMOTE and deletion is done using the ENN concept. However, SMOTE oversampling has the drawbacks noted above, and ENN was able to delete only a few samples. A new approach combining oversampling and under sampling was proposed by Anantaporn [10], in which SMOTE is used for oversampling and overlapping samples are deleted from the majority class. However, SMOTE has its own drawbacks, and the approach does not work if the majority dataset contains no overlapping samples.

Every data sampling method has its own drawbacks. We concentrate on the problems of under sampling methods and try to overcome them with our proposed algorithm, IDRUS.
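The two baseline ideas discussed above, SMOTE-style interpolation and random under sampling, can be sketched in a few lines. This is a simplified illustration rather than the published algorithms: it uses only the single nearest neighbour (SMOTE itself picks among k neighbours), and the function names `smote_like_sample` and `random_under_sample` are hypothetical.

```python
import math
import random


def smote_like_sample(minority, rng):
    """Create one synthetic point between a random minority sample
    and its nearest minority neighbour (simplified, k = 1)."""
    base = rng.choice(minority)
    # Nearest other minority sample by Euclidean distance.
    neighbour = min(
        (s for s in minority if s is not base),
        key=lambda s: math.dist(base, s),
    )
    gap = rng.random()  # position along the segment base -> neighbour
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


def random_under_sample(majority, target_size, rng):
    """RUS: keep a random subset of the majority class."""
    return rng.sample(majority, target_size)


rng = random.Random(0)  # fixed seed only to make the demo repeatable
minority = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
synthetic = smote_like_sample(minority, rng)
kept = random_under_sample(list(range(10)), 4, rng)
```

Because both functions choose samples at random, they illustrate the weakness noted in this section: nothing prevents an outlier from being used as the base sample, or an informative majority record from being discarded.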

3 Proposed Work
To overcome the class imbalance problem observed in the SDP dataset, we propose a new data-based under sampling method, IDRUS. It addresses two problems of existing under sampling methods: deleting data randomly from the majority class, by which important data may be lost, and forming multiple clusters, by which the size of the data is further reduced. The detailed framework of IDRUS is shown in Fig. 1. The imbalanced SDP dataset is partitioned into two datasets, MinDS (minority dataset) and MajDS (majority dataset), depending on the label bug1. The label bug1 takes the value ‘0’ (no defect) or ‘1’ (defect). All records with label ‘0’ are moved to ZDS (Zero Dataset), all records with label ‘1’ are moved to ODS (One Dataset), and the records in each are counted. Since the count of defect records is less than that of no-defect records, all defect records are moved to the minority dataset (MinDS) and all no-defect records to the majority dataset (MajDS). Our method operates on the majority dataset (MajDS), which is passed to the IDRUS algorithm. The mean of the records of MajDS is calculated, which gives the center sample; we regard this center sample as the most appropriate reference because most of the detailed information is concentrated around the center. The distance from the center sample to every majority sample is calculated, and the samples are arranged in descending order of distance. Then the first ‘N’ records are dropped, where N is the number of records that must be removed to balance the classes. Because these first ‘N’ records have the largest distances, they are considered to be the farthest and least important, so dropping them balances the dataset.

Fig. 1 Proposed framework. Blocks: Imbalanced Dataset, Minority Dataset, Majority Dataset, IDRUS, Balanced Dataset, Naive Bayes Classification, Performance Evaluation

As an example, assume that the imbalanced SDP dataset consists of 1000 records, of which 700 belong to the non-defect (majority) class and 300 to the defect (minority) class. Classification at this point gives inaccurate results, so balancing has to be done first. It is more appropriate to balance using the majority class than the minority class, because it contains more samples, among which the unrelated or least important ones can be identified and dropped. From the 700 majority records, the 400 farthest records are dropped (majority count minus minority count, 700 − 300 = 400). The majority dataset now consists of 300 records, equal to the minority count, so the balanced dataset contains equal numbers of majority and minority records. Algorithm 3.1 shows the detailed steps of IDRUS.

Algorithm 3.1: Proposed Algorithm IDRUS
Algorithm: Imbalanced Data Reduction using Under Sampling (IDRUS)
[This algorithm converts an imbalanced dataset (IDS) to a balanced dataset (BDS) by performing under sampling using IDRUS]
Input: IDS (SDP), consisting of attributes x1, x2, …, xm, a label bug1 that holds the value ‘0’ (no defect) or ‘1’ (defect), and records y1, y2, …, yn.
Output: BDS (minority dataset plus under sampled majority dataset), which is balanced.
Step 1: Move all ‘0’ records to the Zero dataset (ZDS) and all ‘1’ records to the One dataset (ODS).
    ZDS ← IDS[IDS.bug1 == 0]
    ODS ← IDS[IDS.bug1 == 1]
Step 2: Count the records of the zero dataset and the one dataset and store the counts in zero count (ZC) and one count (OC).
    ZC ← ZDS[bug1].Count()
    OC ← ODS[bug1].Count()
Step 3: Identify the majority dataset (MajDS) and the minority dataset (MinDS).
    if ZC > OC: MajDS ← ZDS, MinDS ← ODS
    else: MajDS ← ODS, MinDS ← ZDS
Step 4: Calculate the mean of the majority dataset and store it in CM (Center Mean).
    CM ← Mean(MajDS(y1, y2, …, yn))
Step 5: Calculate the Euclidean distance from the center to each majority record yi and store it in the D (Distance) array.
    for each record yi in MajDS: D[i] ← Distance(yi, CM)
Step 6: Sort the majority records in descending order of distance and store them in the DS (Distance sort) array.
    DS ← D.Sort(reverse = True)
Step 7: Calculate the number of records to remove and store it in N.
    N ← |ZC − OC|
Step 8: Drop the first ‘N’ entries of DS (the farthest records) and store the remaining entries in the Distance List (DL).
    DL ← DS.iloc[N:]
Step 9: Append the records in DL to MinDS (minority dataset) and store the result in BDS, which is balanced.
    BDS ← MinDS.append(DL)
Step 10: End
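The steps of the algorithm can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: records are lists of numeric attribute values, labels are 0 or 1, both classes are non-empty, and the function name `idrus` and the toy data are hypothetical.

```python
import math


def idrus(records, labels):
    """Balance a two-class dataset by dropping the majority records
    that lie farthest from the mean of the majority class."""
    # Step 1: split by label.
    zeros = [r for r, y in zip(records, labels) if y == 0]
    ones = [r for r, y in zip(records, labels) if y == 1]

    # Steps 2-3: the larger group is the majority class.
    if len(zeros) >= len(ones):
        majority, maj_label, minority, min_label = zeros, 0, ones, 1
    else:
        majority, maj_label, minority, min_label = ones, 1, zeros, 0

    # Step 4: attribute-wise mean of the majority class (the center).
    n_attr = len(majority[0])
    center = [sum(r[j] for r in majority) / len(majority) for j in range(n_attr)]

    # Steps 5-6: order majority records by distance to the center, farthest first.
    ordered = sorted(majority, key=lambda r: math.dist(r, center), reverse=True)

    # Steps 7-8: drop the N farthest records, N = majority count - minority count.
    n_drop = len(majority) - len(minority)
    kept = ordered[n_drop:]

    # Step 9: balanced dataset = minority records + retained majority records.
    return [(r, min_label) for r in minority] + [(r, maj_label) for r in kept]


# Scaled-down version of the 700/300 example: 7 majority and 3 minority records.
records = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [100.0],
           [10.0], [11.0], [12.0]]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
balanced = idrus(records, labels)
```

In this toy run the record at 100.0 pulls the mean to about 16.4, so the four dropped records are 100.0, 0.0, 1.0, and 2.0. The sketch keeps whole records ordered by distance, rather than the distance values themselves, so that the retained records can be appended to the minority set in the last step.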

Table 1 Confusion matrix

              Predicted Pos   Predicted Neg
Actual Pos    p               s
Actual Neg    r               q

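3.1 Performance Evaluation
Naive Bayes classification is applied to the balanced dataset generated by IDRUS, and six performance measures are computed: Accuracy (Acc), Precision (Pre), Sensitivity (or Recall), F-Measure (FM), Specificity (Spec), and Geometric Mean (GM). They are defined in Eqs. (1)–(6), and Table 1 shows the confusion matrix on which they are based.

\[ \mathrm{Acc} = \frac{p + q}{p + q + r + s} \tag{1} \]

\[ \mathrm{Pre} = \frac{p}{p + r} \tag{2} \]

\[ \mathrm{Recall} = \frac{p}{p + s} \tag{3} \]

\[ \mathrm{FM} = \frac{2 \cdot (\mathrm{Pre} \cdot \mathrm{Recall})}{\mathrm{Pre} + \mathrm{Recall}} \tag{4} \]

\[ \mathrm{Spec} = \frac{q}{q + r} \tag{5} \]

\[ \mathrm{GM} = \sqrt{\mathrm{Recall} \cdot \mathrm{Spec}} \tag{6} \]

where p is the total number of true positives, q the total number of true negatives, r the total number of false positives, and s the total number of false negatives.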
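The six measures can be computed directly from the four confusion-matrix counts. The sketch below is illustrative only: the function name `evaluate` and the example counts are hypothetical, precision uses the standard definition TP / (TP + FP), that is p / (p + r), and no guard against zero denominators is included.

```python
import math


def evaluate(p, q, r, s):
    """Compute the six measures from confusion-matrix counts:
    p = true positives, q = true negatives,
    r = false positives, s = false negatives."""
    acc = (p + q) / (p + q + r + s)
    pre = p / (p + r)
    recall = p / (p + s)
    fm = 2 * (pre * recall) / (pre + recall)
    spec = q / (q + r)
    gm = math.sqrt(recall * spec)
    return {"Acc": acc, "Pre": pre, "Recall": recall,
            "FM": fm, "Spec": spec, "GM": gm}


# Hypothetical counts for a balanced test set of 100 records.
measures = evaluate(p=40, q=45, r=5, s=10)
```

For these counts, accuracy is 85/100 and recall is 40/50; the other values follow from the same four counts.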

4 Experimental Results
Software defect prediction (SDP) datasets are used to evaluate the performance: forty datasets from the PROMISE repository [11], each containing twenty software metrics and a label bug1. IDRUS is compared with two under sampling methods (RUS and CBUS), and six performance measures (Geometric Mean, Precision, Accuracy, Recall, Specificity, and F-Measure) are calculated after Naive Bayes classification. The “Mean ± SD” of each measure is reported in Table 3 and charted in Fig. 2. According to Table 3, the mean scores of IDRUS across the six measures lie between 0.82 and 0.89, whereas those of RUS lie between 0.50 and 0.57 and those of CBUS between 0.60 and 0.82. We conclude that IDRUS performs better than the other under sampling methods on every measure.

Table 3 Contrasting IDRUS performance with various data-based under sampling methods

                  Under sampling methods
Measures          RUS            CBUS           IDRUS
Accuracy          0.52 ± 0.09    0.60 ± 0.09    0.85 ± 0.10
Precision         0.51 ± 0.09    0.80 ± 0.12    0.89 ± 0.11
Recall            0.50 ± 0.10    0.70 ± 0.11    0.83 ± 0.09
F-measure         0.55 ± 0.12    0.74 ± 0.12    0.82 ± 0.10
Specificity       0.56 ± 0.11    0.82 ± 0.11    0.85 ± 0.10
Geometric mean    0.57 ± 0.10    0.72 ± 0.13    0.82 ± 0.09

Note: in the original table, bold marks the improved values.

Fig. 2 Performance evaluation of various under sampling methods
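The ranges quoted in this section can be re-derived from the mean values in Table 3. The snippet below is only a small consistency check; the dictionary name `table3_means` is hypothetical, and the values are copied from the table in row order (Accuracy, Precision, Recall, F-measure, Specificity, Geometric mean).

```python
# Mean values per method, copied from Table 3 (standard deviations omitted).
table3_means = {
    "RUS":   [0.52, 0.51, 0.50, 0.55, 0.56, 0.57],
    "CBUS":  [0.60, 0.80, 0.70, 0.74, 0.82, 0.72],
    "IDRUS": [0.85, 0.89, 0.83, 0.82, 0.85, 0.82],
}

# Lowest and highest mean score of each method across the six measures.
ranges = {name: (min(vals), max(vals)) for name, vals in table3_means.items()}

# True when IDRUS has the highest mean on every individual measure.
idrus_best_everywhere = all(
    i > max(r, c)
    for r, c, i in zip(table3_means["RUS"], table3_means["CBUS"],
                       table3_means["IDRUS"])
)
```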

5 Conclusion
To overcome the imbalance present in SDP datasets, we proposed a new data-based under sampling method, IDRUS, which addresses problems of existing under sampling methods. In earlier methods, accuracy drops when majority samples are picked and deleted randomly, and forming multiple clusters further reduces the data. To overcome this, we identify the center of the majority data using the mean, on the premise that the center carries the most relevant and detailed information. Data near the center is likewise treated as important, while data far from it is treated as less important and can be dropped. The farthest data is identified by calculating the distances to the center and sorting them. The required number of farthest records is deleted from the majority dataset, which balances the majority and minority datasets. The performance of IDRUS was evaluated on software defect prediction datasets from the PRedictOr Models In Software Engineering (PROMISE) repository. Naive Bayes classification was performed, and the performance measures (Geometric Mean, Accuracy, Precision, Recall, Specificity, and F-Measure) were calculated. The comparison results are reported in tabular form as “Mean ± Standard Deviation”. The results show that IDRUS performs better than the other under sampling methods considered.

References
1. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority oversampling technique. J. Artif. Intel. Res. 16, 321–357 (2002)
2. Han, H., Wang, W.Y., Mao, B.H.: Borderline-smote: a new oversampling method in imbalanced data sets learning. In: International Conference on Advanced Intelligent Computing, pp. 878–887 (2005)
3. Barua, S., Islam, M.M., Yao, X., Murase, K.: MWMOTE-majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 26(2), 405–425 (2014)
4. Kiran, K.B., Gyani, J., Narsimha, G.: Class imbalance reduction (CIR): a novel approach to software defect prediction in the presence of class imbalance. Symmetry 12, 407 (2020)
5. Prusa, J., Khoshgoftaar, T.M., Dittman, D.J., Napolitano, A.: Using random undersampling to alleviate class imbalance on tweet sentiment data. In: IEEE International Conference on Information Reuse and Integration, pp. 197–202 (2015)
6. Małgorzata, B., Aleksandra, W., Mateusz, P.: The proposal of undersampling method for learning from imbalanced datasets. In: 23rd International Conference on Knowledge-Based and Intelligent Information Engineering Systems, vol. 159, pp. 125–134 (2019)
7. Ng, W.W., Hu, J., Yeung, D.S., Yin, S., Roli, F.: Diversified sensitivity based undersampling for imbalance classification problems. IEEE Trans. Cybern. 45(11), 2402–2412 (2015)
8. Yen, S.J., Lee, Y.S.: Cluster-based under-sampling approaches for imbalanced data distributions. Exp. Syst. Appl. 36(3), 5718–5727 (2009)
9. Gustavo, E., Ronaldo, C.P., Maria, C.M.: A study of the behaviour of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 6(1), 20–29 (2004)
10. Anantaporn, H.: A new hybrid sampling approach for classification of imbalanced datasets. In: 3rd International Conference on Computer and Communication Systems. IEEE (2018)
11. Ferenc, R., Toth, Z., Ladányi, G., Siket, I., Gyimóthy, T.: A public unified bug dataset for Java. In: Proceedings of the 14th International Conference on Predictive Models and Data Analytics in Software Engineering, Oulu, Finland, pp. 12–21 (2018)

Resource-Based Prediction in Cloud Computing Using LSTM with Autoencoders
Adithya Babu and R. R. Sathiya

A. Babu · R. R. Sathiya (✉), Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Coimbatore, India. e-mail: [email protected]; A. Babu e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_23

1 Introduction
Data and its storage are among the fastest-growing demands in any technology company. As companies rely more and more on the cloud to store, access, and process data, resource allocation in cloud computing needs to be more proactive and optimized to meet the requirements of clients and users. A growing number of companies and individuals depend on cloud computing for efficient processing of their work. Allocating the proper amount of resources and storage helps cloud service providers manage their servers efficiently and meet each client's Service Level Agreement. Providers invest in finding an optimal way to obtain an accurate deep learning model, with minimal training time, for workload prediction over parameters such as cluster usage, resource allocation, and disk storage. Workload prediction is therefore an important element of cloud infrastructure. Cloud providers are increasingly interested in researching and developing algorithms for predicting cloud workloads in order to scale cloud resources easily, reduce performance failures caused by resource shortages, and save energy, mainly in datacenters. For the past 20 years, extensive research has been conducted on prediction in time series data, for example forecasting stock prices, energy consumption and generation, and cloud workloads. We focus on highly dynamic and volatile time series datasets in cloud computing and predict resource usage. Many forecasting methods have evolved over these years, such as the moving average (MA) and autoregression (AR) models for stationary time series. Later research showed that Machine Learning (ML) and Deep Learning (DL) algorithms yield more accurate models for time series prediction. Lately, instead of single forecasting models for univariate and multivariate time series, ensemble methods have also been used, which improve the model by extracting the best features for flexible prediction. Examples include CNN-LSTM and Generative Adversarial Networks (GAN). These hybrid models extract the most important features for accurate prediction, both in time series and in other supervised datasets.

In this paper, we give importance to high latency-sensitive tasks and their preprocessing. High latency-sensitive tasks are more likely to have failed or killed jobs during operation than jobs that have merely been scheduled and submitted; in short, they have a low completion rate. Since these jobs involve revenue-generating tasks, resources must be allocated more accurately to lower the failure rate. Hence a more accurate model needs to be devised for the jobs belonging to the high latency class. The next step in this paper is to extract the important features from these highly volatile and dynamic datasets. A custom-built autoencoder acts as a feature extractor that feeds its latent features to the LSTM model for better and faster prediction. Millions of records and billions of jobs are processed within the timeframe of the dataset, so representing this big data by its important, latent features is necessary to make prediction by the LSTM tractable.

2 Related Work This section discusses some of the recent and important work that has been done related to prediction models that have been used in cloud computing for predicting the resources [1–3]. Some of the traditional methods used for prediction of the workload involves machine learning and other conventional methods [4–7]. Many research and analysis works have been done in the cloud workload resources [8–11] including latest research methods [12–16]. Some researchers have used in other sectors as in energy estimation [17] and other domains [11, 18, 19]. Cloud computing providers face several challenges in precisely forecasting large-scale workload and resource time series. In [20], other than the usual ARIMA model or any supervised neural network, they have proposed a stack of LSTM layers to predict or forecast the workload prediction of resource usage along with 3 filters for data preprocessing. This work applies a logarithmic operation to reduce the standard deviation before smoothing workload and resource sequences. Then, noise interference and extreme points are removed via a powerful Savitzky-Golay (SG) filter. A Min–Max scaler is adopted to standardize the data. In [21] a two-stage deep learning technology is used for clustering the time series data. In this paper, they have done unsupervised clustering to create labels and apply deep

Resource-Based Prediction in Cloud Computing Using LSTM …

253

learning using an encoder-decoder model. Their model achieves 87.5% accuracy in predicting the correct labels which were done using one of the conventional clustering algorithms K-means. The encoder-decoder model which represents the latent representations was able to identify certain hidden representations which could not have been predicted by the K-means. Their model was tested on selected financial and stock time series data of over 70 stock indices. In [22], a host load prediction is built as it remains a major challenge in the dynamic cloud computing world. The model achieves good accuracy compared to the state-of-the-art architectures at the time. It uses the Echo State Network (ESN) for the prediction and autoencoder for getting the important features. These features are then given to the ESN for prediction. They have used the Mean Square Error (MSE) as a metric for comparison. In [23], they presented a two-stage workload parallelization strategy-based DL training workloads to improve the performance of workloads as well as resource usage in the GPU cluster. To begin, they introduced two interference-aware prediction models, InterferenceAware Similarity Prediction (IASP) and Interference-Aware Performance Prediction (IAPP), to aid in the parallelization of DL workloads on a cluster and node level. After that, a workload parallelization approach at the cluster level would be implemented. Using the suggested IASP model and a node-level workload method, cluster-level workload parallelization (CLWP) is presented to allocate DL jobs to appropriate worker nodes. The suggested IAPP model and the communication costs across tasks are used to assign DL workloads to appropriate GPUs using nodelevel workload parallelization (NLWP). The workload prediction was even tried in a different domain like reinforced learning (RL) in [24]. 
They proposed the RLScope, a cross-stack profiler that scopes low-level CPU/GPU resource utilization to high-level algorithmic operations while accounting for profiling overhead to deliver accurate insights. They use RL-Scope to look at RL workloads in all of its primary dimensions, including the ML backend, the RL algorithm, and the simulator. In [25] an optimized algorithm for cloud workload prediction can be seen wherein the parameters of the proposed model are greatly simplified by transforming the weight matrices to the standard polyadic format. The parameters are also trained using an efficient supervised learning algorithm. Finally, the developed effective deep learning model is used to forecast virtual machine workloads in the cloud. Experiments are carried out on the PlanetLab datasets to validate the proposed model’s performance by comparing it to other machine learning-based approaches for virtual machine workload prediction. A more free approach can be observed with deep learning techniques with better data preprocessing algorithms like in [26] where the model uses Top Sparse Autoencoder to extract the essential features using, deep learning-based prediction algorithm for cloud workloads (LPAW) algorithm, formed by integrating the TSA with Gated Recurrent Unit (GRU) and Recurrent Neural Network for cloud cluster usage prediction. The role of GAN in prediction is not that popular but many forecasts, prediction models augmenting synthetic datasets [27] can be built using the GAN technique, yielding more accuracy. In [1], we can see Multivariate Time Series Synthesis Using Generative Adversarial Networks. In this research, resource consumption measurements, such as Content Delivery Network (CDN) cache utilization rates, are generated, and a comparative evaluation pipeline based on descriptive

254

A. Babu and R. R. Sathiya

statistics and time series analysis is developed to examine the statistical coherence of generated and measured workloads. GAN can be effective also irrespective of different frameworks of cloud computing like Spark, Hadoop, etc. An Auto-Tuning Configuration system is built using GAN in these frameworks [2]. Advanced versions of GAN are also employed for managing workload properly as in [3]. The strategy for developing performance models that give data augmentation to manage largescale data repositories that are not publically available is discussed in the report. Deep learning algorithms for performance adjustment of large-scale data repositories were examined in this research. They proposed the Optimized GAN-based Deep Learning (OGDL) model as a performance prediction model. Conditional Generative Adversarial Networks (CGAN) are used to enhance data. From an autonomic perspective, they incorporated the MAPE-K model to manage the workload automatically. In [28] the empirical mode decomposition approach is used to breakdown the cloud workload time series into its constituent components in distinct frequency bands, which decreases the complexity and nonlinearity of the prediction model in each frequency band. In addition, based on the degree of complexity and volatility of each sub-band workload time series. Hence, here a new state-of-the-art ensemble GAN/LSTM deep learning architecture is presented to predict each sub-band workload time series independently. To summarize Ensemble techniques like CNN-LSTM or E2LG [28] give a more accurate model. Only one paper has used autoencoder with other DL models. In [23], autoencoders which are usually used for compressed image reconstruction can also be used to find the latent features from a high dimensional time series dataset. In cases of highly dimensional large-scale datasets, sequence prediction is best when using LSTM. 
With the help of autoencoders, the latent features are well represented, which increases the chance of obtaining an accurate prediction from a compressed model. Managing the workload in cloud computing is important because it affects the datacenters run by the service providers. If the main resources in cloud computing, such as CPU usage, RAM usage, and disk I/O space, are not allocated properly (either too high or too low), the server can be disrupted and may crash. In [19], LSTM is used in sequence-based machine learning approaches, indicating that it is a suitable algorithm for sequence-based datasets and yields accurate predictions. LSTM can also be used in other important domains, as in [29], where a multi-input multi-output architecture is deployed for prediction, enabling real-time action control in microgrids. The objective of this work, in short, is to obtain an accurate model in minimal time. Since cloud data is highly volatile, it is very hard to preprocess, denoise, and feed to the model for prediction [30]. Our contribution is therefore, first, to build a custom autoencoder for denoising and preprocessing [31, 32] the dataset. Second, we extract the latent feature vectors from the bottleneck layer of the autoencoder. Finally, only the most important latent representations are sent to the LSTM for prediction of our feature, thereby reducing training time and increasing accuracy.

3 Methodology
This section describes the overall architecture of the model at the macro level within the domain. Four stages explain the flow of data through the system and our proposed architecture. In this domain, the central management module controls and allocates resources to the different virtual machines according to the jobs and services selected for each machine, so dynamic allocation is needed at times. The feature values predicted from our dataset are fed back to the central management module for more accurate allocation, so that there is no overuse or wastage of resources for the service providers. The stages are:
• The clients'/users' requests are loaded from the utility queue and sent to the proposed model.
• The workload is first sent to the autoencoder, and only the essential data is given as input to the LSTM, which predicts the workload from the most important features. This reduces the time needed for both training and data preprocessing.
• The predicted feature values are then captured based on the resource module, and the values are inverted from their scaled form back to their original values.
• The output of the proposed model is then sent to the cloud management module, which assigns only the required utilities, resources, and number of machines.
The stages and flow of the system are shown in Fig. 1.
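The third stage above inverts the predicted values from their scaled form back to the original units. The following is a minimal sketch of that round trip, assuming min-max scaling of a single feature; the function names and the sample values are hypothetical and are not taken from the authors' implementation.

```python
from typing import List, Tuple


def fit_min_max(values: List[float]) -> Tuple[float, float]:
    """Return the (min, max) pair used to scale one feature."""
    return min(values), max(values)


def scale(values: List[float], lo: float, hi: float) -> List[float]:
    """Map values into [0, 1]; a constant feature maps to 0.0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def inverse_scale(scaled: List[float], lo: float, hi: float) -> List[float]:
    """Map scaled predictions back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical CPU usage samples (illustrative only, not from the trace).
cpu_usage = [0.12, 0.30, 0.21, 0.48]
lo, hi = fit_min_max(cpu_usage)
scaled = scale(cpu_usage, lo, hi)
restored = inverse_scale(scaled, lo, hi)
```

The minimum and maximum must be kept from the fitting step, because the same pair is needed later to invert the model's predictions.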

3.1 Dataset Description
The dataset used for this architecture is the Google cluster data 2011, a trace of a Google compute cell. The dataset consists of several tables indexed by timestamp, together with other parameters that govern the resources used by different jobs or tasks with different types of events. It contains data that spans over 125,000

Fig. 1 Architecture diagram

machine traces covering 29 days of the month of March. The dataset is divided into 3 verticals, each with its own features. The 3 main tables that we use from this dataset are the Machine table, the Job table, and the Task table. The Machine table records all the machine capacities, reflecting the normalized physical capacity of each machine along each dimension, such as CPU cores and RAM size. The cluster management module clusters the machines and then allocates resources according to availability. Each machine ID comprises many jobs, and these jobs contain many tasks. The Job table comprises the important job features, which classify each job by scheduling class and event type. The latter two tables are joined by the job ID, which makes it easier for us to track and merge the files for our research purposes.

Training, Testing, and Validation
• Training data is used to help our machine learning model make predictions. It is the largest part of our dataset, forming at least 70–80% of the total data we use to build our model. In terms of hours, we use the first 600 h as our training data.
• Validation data is primarily used to determine whether our model can correctly handle new data or whether it is overfitting to our original dataset. We use the 20 h of data after the initial 600 h as our validation data.
• Testing data is used after both training and validation. It tests the accuracy of our final model against our targets. We use the last set of hours, i.e., 76 h, as our testing data.

Data Visualization
Here we have plotted the data on graphs so that the dataset is easier to analyze and understand. The important features that need to be allocated accurately are CPU usage, RAM usage, disk space, etc. Figure 2 shows the CPU usage of the machines over timesteps. Figure 3 shows the requested memory (above) and used memory (below) from the Google Cluster Trace Dataset 2011.
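The 600 h / 20 h / 76 h partition described above adds up to 696 h, which matches 29 days of 24 h. A minimal sketch of such a chronological split follows, assuming the trace has already been aggregated to one value per hour; that aggregation step, the function name, and the placeholder series are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

TRAIN_HOURS = 600
VAL_HOURS = 20
TEST_HOURS = 76


def chronological_split(
    hourly: Sequence[float],
) -> Tuple[List[float], List[float], List[float]]:
    """Split an hourly series into train/validation/test without shuffling."""
    total = TRAIN_HOURS + VAL_HOURS + TEST_HOURS
    if len(hourly) != total:
        raise ValueError(f"expected {total} hourly values, got {len(hourly)}")
    train = list(hourly[:TRAIN_HOURS])
    val = list(hourly[TRAIN_HOURS:TRAIN_HOURS + VAL_HOURS])
    test = list(hourly[TRAIN_HOURS + VAL_HOURS:])
    return train, val, test


# Placeholder series: the hour index stands in for a real measurement.
hourly_cpu = [float(h) for h in range(29 * 24)]
train, val, test = chronological_split(hourly_cpu)
```

Keeping the order intact matters for time series: the validation and test hours always come after the training hours, so the model is never evaluated on data that precedes what it was trained on.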

Fig. 2 Sample CPU usage

Fig. 3 Memory requested (above) and memory used (below)

Figure 4 shows the total number of jobs and the scheduling class to which each belongs. Figure 5 shows the event types, i.e., the count of jobs that have been submitted, finished, killed, etc., together with the class to which they belong. We observe from Fig. 4 that the number of jobs belonging to scheduling class 3 is small. In Fig. 5, each event represents the action into which the job is categorized. Event 0 represents that the job has been submitted. Event 1 indicates that the job has been scheduled. Event 2 describes a job that has been evicted due to a shortage of resources in the workload. Event 3 means the job has failed, also due to the minimum computational

Fig. 4 Count of jobs based on scheduling class

Fig. 5 Count of jobs based on event type

requirements. Event 4 indicates that the job has finished. Event 5 shows the number of jobs that have been killed, either by the user or by the programming API under which they run, because too few resources were available to run the job. Events 6, 7, and 8 represent, respectively, a lost (missing) event record, an update to a pending job, and an update to a running job.
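The event codes described above can be tallied with a small lookup table, which is roughly what a plot such as Fig. 5 summarizes. The sketch below is illustrative only: the label strings reflect our reading of the descriptions above and of the trace's published schema, and the sample codes are invented.

```python
from collections import Counter
from typing import Dict, Iterable

# Labels are assumptions based on the event descriptions in the text.
EVENT_TYPES = {
    0: "SUBMIT",
    1: "SCHEDULE",
    2: "EVICT",
    3: "FAIL",
    4: "FINISH",
    5: "KILL",
    6: "LOST",
    7: "UPDATE_PENDING",
    8: "UPDATE_RUNNING",
}


def count_events(event_codes: Iterable[int]) -> Dict[str, int]:
    """Count how often each event type occurs in a column of event codes."""
    return dict(Counter(EVENT_TYPES[code] for code in event_codes))


# Invented sample of event codes, not taken from the dataset.
sample_codes = [0, 1, 4, 0, 1, 5, 0, 1, 2]
event_counts = count_events(sample_codes)
```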

3.2 System Overview
The proposed method combines autoencoders with the prediction algorithm best suited to time series or sequence data, i.e., the LSTM. The raw data from the cloud workload is passed to the autoencoder for preprocessing. Since an autoencoder is a neural network trained to produce an output that matches its input, we make use of the bottleneck layer in this network and take from it the most essential features or parameters that actually drive the workload data. Accessing the parameters in the bottleneck layer thus helps to remove the unwanted noise and the redundant parameters that do not govern the dataset. These parameters are then fed into the LSTM model, which predicts the resource output of the workload using only the features with the least variance from the target variables. The output is predicted according to the variable we input; that is, we keep the corresponding timestamp in order to obtain the desired output at a particular time granularity, such as hours or days.
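Feeding a sequence model such as an LSTM usually means cutting the time-ordered series into fixed-length input windows paired with a future target. The sketch below shows one common way to do this; the window length, the horizon, and the helper name `make_windows` are illustrative assumptions, since the text does not state how the sequences were built.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Build (inputs, targets) pairs from a time-ordered series.

    Each input holds `window` consecutive steps; its target is the value
    `horizon` steps after the end of that window.
    """
    inputs: List[List[float]] = []
    targets: List[float] = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        end = start + window
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Toy series; with hourly data, horizon=1 would mean "the next hour".
series = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
X, y = make_windows(series, window=3, horizon=1)
```

With hourly data, a larger `horizon` (for example 24) would correspond to predicting a day ahead instead of the next hour.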

3.3 Network Design

Autoencoder network
The workload processor in our proposed network is the autoencoder. In our system, we have 2 encoder-decoder layers and 1 bottleneck layer. We construct the model to reduce the dimensionality and represent the highly dynamic data in a lower-dimensional space that retains the essential parameters. It can be represented mathematically as:

$$Y_n = f(W x_n + b) \approx x_n \tag{1}$$

such that the output $Y_n$ is close to the input $x_n$. Overfitting is possible, and hence we add a regularization term during back-propagation. The features are reduced to about half of our input. The model is fitted and run for 10 epochs. Batch normalization is used to speed up the training process, and the activation function is leaky ReLU so that the gradients do not become 0 during training. The 19 input features are reduced to 10 important features, from which the model tries to rebuild the input data.

LSTM
The compressed workload vector from the workload processor is given as the input to the LSTM for future prediction. Since the data is a time sequence, LSTM is able to predict it while overcoming the gradient problems of plain RNNs and their difficulty in learning long-term dependencies. Due to computational constraints, the LSTM model could only be run for 1 epoch. The optimizer used is 'Adam' and the loss function is 'MSE'. The data consists of 12 million records, and the LSTM was trained on it to predict the CPU usage in the next hour and day. The loss was analyzed, and since the program could only be run for 1 epoch, the model is evaluated on the test set with performance metrics such as MSE, MAE, and R2 score.
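To make the shape of Eq. (1) concrete, the following sketch runs one forward pass through a toy autoencoder that compresses 19 features to 10 and expands them back, using leaky ReLU. It is a deliberate simplification: it uses a single encoder layer and a single decoder layer with random, untrained weights, no batch normalization, and no regularization, so it only illustrates the data flow and the reconstruction error, not the authors' trained network. All names are hypothetical.

```python
import random
from typing import List, Tuple

N_FEATURES = 19  # input features, as stated in the text
N_LATENT = 10    # bottleneck size, as stated in the text


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU keeps a small slope for negative inputs."""
    return x if x > 0 else alpha * x


def dense(inputs: List[float], weights: List[List[float]],
          biases: List[float]) -> List[float]:
    """Compute f(W x + b) with one weight row per output unit."""
    return [
        leaky_relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in: int, n_out: int,
               rng: random.Random) -> Tuple[List[List[float]], List[float]]:
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
enc_w, enc_b = init_layer(N_FEATURES, N_LATENT, rng)
dec_w, dec_b = init_layer(N_LATENT, N_FEATURES, rng)

x = [rng.random() for _ in range(N_FEATURES)]   # one scaled input record
latent = dense(x, enc_w, enc_b)                 # bottleneck representation
reconstruction = dense(latent, dec_w, dec_b)    # Y_n, meant to approximate x_n

# Reconstruction error that training would minimize.
reconstruction_mse = sum(
    (a - b) ** 2 for a, b in zip(x, reconstruction)
) / N_FEATURES
```

In the pipeline described above, it is the `latent` vector, not the reconstruction, that would be passed on to the LSTM.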

4 Results and Discussion
The output is predicted in hours and also in days. The proposed network model is compared with other models on the basis of the performance metrics. All the models were built and fitted using the same Google Cluster Trace Dataset 2011. The results of the comparison are displayed in Table 1. From Table 1, we observe that even though the error metrics are higher for our proposed model, it achieves a better R2 score. Facebook's Prophet model is one of the comparison models used for time series forecasting. It fits a nonlinear model to the seasonality patterns in the data, namely weekly, monthly, and daily. It uses a Bayesian model algorithm to predict its output. Another important and widely used time series algorithm is ARIMA. This model combines autoregression (AR), differencing of the series (the "integrated" part, I), and a moving average (MA) model. However, it is most useful on datasets with a clear trend and

Table 1 Model performance comparison

Performance metrics   Fb-prophet model   ARIMA model   Proposed model
MSE                   0.00018            0.0019        0.0020
MAE                   0.00201            0.0059        0.0364
RMSE                  0.0103             −0.0047       0.0448
R2 Score              0.5692             0.5892        0.7511

seasonality. For highly seasonal datasets there is an improved model called SARIMAX, which gives better predictions. Since these models are among the widely used state-of-the-art architectures, we have chosen them for comparison with our LSTM model. The prediction results are given, and our proposed model achieves an R2 score of 0.7511, which we report as an accuracy of about 75%.
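For reference, the four metrics reported in Table 1 can be computed as in the sketch below. The function names and sample values are illustrative and are not taken from the authors' code. By construction, MSE, MAE, and RMSE cannot be negative, whereas R2 becomes negative when a model fits worse than simply predicting the mean of the targets.

```python
import math
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Invented values for illustration.
actual = [1.0, 2.0, 3.0, 4.0]
good_forecast = [1.5, 2.0, 3.0, 3.5]
bad_forecast = [4.0, 3.0, 2.0, 1.0]

good_r2 = r2_score(actual, good_forecast)
bad_r2 = r2_score(actual, bad_forecast)
```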

4.1 Discussion
The input data for our proposed model was scaled with the Min–Max Scaler so that all the features lie within the same range. The input vectors are then given directly to the autoencoder model, and the low-dimensional output is fed to the LSTM, so that the input to the LSTM is denoised and free of unwanted features. The LSTM predicted the output, and the R2 score was found to be 0.751 (75.1%), with an MSE of around 0.002 and an MAE of about 0.03. The data was then tested with the FB-Prophet method. The sample CPU usage feature was taken directly, and the model was fitted by supplying the timestamp. The model then produced a forecast for 10 days with a window of 2 days, i.e., 48 h. The forecast was cross-validated with a value of 25 h for the horizon parameter. The model metrics were then calculated, giving very low errors. The R2 score calculated from the forecasted and actual sample CPU usage, denoted by 'y' in the data frame, was found to be 0.58. The data was also given to the ARIMA model for prediction, and the metrics were found to be negative in spite of taking the proper p, q, and d coefficients from the partial autocorrelation and autocorrelation graphs. Auto-ARIMA was also tried, but the AIC scores obtained were negative. The MAE score of 0.00134 obtained from the prediction was low compared with the proposed model, but since its MSE and RMSE values were negative, we conclude that the model does not fit the dataset directly. Deeper feature engineering and other methods might have helped in obtaining a better ARIMA model. From Table 1 it is evident that the proposed model works better without much initial data preprocessing or any other deep feature representation. In Fig. 6 the output is predicted for the coming hours. The accuracy of the proposed system can be improved further by deploying ensemble methods or by using Generative Adversarial Networks (GAN).

Fig. 6 Output of the proposed model

5 Conclusion
Cloud computing has become an indispensable domain in technology and business. Predicting and allocating only the required resources without trading off the quality of service has therefore become a major challenge in the industry. Even though many have invested money and resources in building an efficient model, there has always been a trade-off with the time constraint. In this paper, we proposed a two-stage workload predictor to predict one of the most important workload resources, i.e., CPU usage. In this system, the low-dimensional workload processor is used to denoise and compress the original high-dimensional data, so that the workload predictor needs only the relevant features to make its prediction. This model achieved an accuracy score of 75% and showed better results when compared with the FB-Prophet and ARIMA models. The accuracy of the model can be improved using ensemble techniques such as CNN-LSTM or architectures based on Generative Adversarial Networks (GAN). Prior research has found that such architectures give more accurate results. Further study includes a high latency-based analysis, clustering of the dataset, and identifying the jobs and tasks with high priority in order to improve their prediction.

References
1. Leznik, M., Michalsky, P., Willis, P., Schanzel, B., Östberg, P.-O., Domaschka, J.: Multivariate time series synthesis using generative adversarial networks. In: Proceedings of the ACM/SPEC International Conference on Performance Engineering (ICPE ’21). Association for Computing Machinery, New York, NY, USA, pp. 43–50 (2021)
2. Li, M., Liu, Z., Shi, X., Jin, H.: ATCS: auto-tuning configurations of big data frameworks based on generative adversarial nets. IEEE Access 8, 50485–50496 (2020)
3. Shaheen, N., Raza, B., Shahid, A.R., Malik, A.K.: Autonomic workload performance modeling for large-scale databases and data warehouses through deep belief network with data augmentation using conditional generative adversarial networks. IEEE Access 9, 97603–97620 (2021). https://doi.org/10.1109/ACCESS.2021.3096039
4. Gao, J., Wang, H., Shen, H.: Machine learning based workload prediction in cloud computing. In: 2020 29th International Conference on Computer Communications and Networks (ICCCN), pp. 1–9 (2020)
5. Saxena, D., Singh, A.: Workload Forecasting and Resource Management Models Based on Machine Learning for Cloud Computing Environments (2021)
6. Kumar, J., Singh, A.K., Buyya, R.: Self directed learning based workload forecasting model for cloud resource management. Inform. Sci. 543, 345–366 (2021). ISSN 0020-0255
7. Ajila, S.A., Bankole, A.A.: Cloud client prediction models using machine learning techniques. Proc. Int. Comput. Softw. Appl. Conf. (2013). https://doi.org/10.1109/COMPSAC.2013.21
8. Peng, Y.B., Chen, Y., Wu, Y., Chuan Guo, C.: Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters, pp. 1–14
9. Li, K., Tang, Y., Chen, J., Yuan, Z., Xu, C., Xu, J.: Cost-effective data feeds to blockchains via workload-adaptive data replication. In: Proceedings of the 21st International Middleware Conference (Middleware ’20). Association for Computing Machinery, New York, NY, USA, pp. 371–385 (2020)
10. Kumar, J., Singh, A.K.: Performance evaluation of metaheuristics algorithms for workload prediction in cloud environment. Appl. Soft Comput. 113, 107895 (2021). https://doi.org/10.1016/j.asoc.2021.107895
11. Chen, L., Zhang, W., Ye, H.: Accurate workload prediction for edge data centers: Savitzky-Golay filter, CNN and BiLSTM with attention mechanism. Appl. Intel. 1–16 (2022). https://doi.org/10.1007/s10489-021-03110-x
12. Hardy, C., Le Merrer, E., Sericola, B.: MD-GAN: Multi-Discriminator Generative Adversarial Networks for Distributed Datasets. arXiv e-prints (2018)
13. Koltuk, F., Yazar, A., Schmidt, E.G.: CLOUDGEN: Workload Generation for the Evaluation of Cloud Computing Systems. In: 2019 27th Signal Processing and Communications Applications Conference (SIU), pp. 1–4
14. Zhou, X.P., Hu, Z., Tang, G., Siqi Zhao, C.: Stock market prediction on high-frequency data using generative adversarial nets. Math. Probl. Eng. 1–11 (2018)
15. Singh, A.K., Saxena, D., Kumar, J., Gupta, V.: A quantum approach towards the adaptive prediction of cloud workloads. IEEE Trans. Parallel Distrib. Syst. 32(12), 2893–2905 (2021)
16. Xu, M.S., Wu, C., Gill, H., Ye, S.S., Ye, K., Xu, C.-Z.: EsDNN: Deep Neural Network Based Multivariate Workload Prediction Approach in Cloud Environment (2022)
17. Khan, T.T., Ilager, W., Buyya, S., Rajkumar: Workload forecasting and energy state estimation in cloud data centers: ML-centric approach. Futur. Gener. Comput. Syst. (2021). https://doi.org/10.1016/j.future.2021.10.019
18. Sridhar, P., Sathiya, R.R.: Crypto-watermarking for secure and robust transmission of multispectral images. In: 2017 International Conference on Computation of Power, Energy Information and Communication (ICCPEIC), pp. 153–163 (2017)
19. Viswanathan, S., Anand Kumar, M., Soman, K.P.: A sequence-based machine comprehension modeling using LSTM and GRU. In: Emerging Research in Electronics, Computer Science and Technology, pp. 47–55. Springer, Singapore (2019)
20. Bi, J., Li, S., Yuan, H., Zhou, M.C.: Integrated deep learning method for workload and resource prediction in cloud systems. Neurocomputing 424, 35–48 (2021). ISSN 0925-2312
21. Tavakoli, N., Siami-Namini, S., Adl Khanghah, M., et al.: An autoencoder-based deep learning approach for clustering time series data. SN Appl. Sci. 2, 937 (2020)
22. Yang, Q., Zhou, Y., Yu, Y., et al.: Multi-step-ahead host load prediction using autoencoder and echo state networks in cloud computing. J. Supercomput. 71, 3037–3053 (2015)
23. Geng, X., Zhang, H., Zhao, Z., et al.: Interference-aware parallelization for deep learning workload in GPU cluster. Cluster Comput. 23, 2689–2702 (2020)
24. Gleeson, J., Krishnan, S., Gabel, M., Janapa Reddi, V., de Lara, E., Pekhimenko, G.: RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads. arXiv e-prints (2021)
25. Zhang, Q., Yang, L.T., Yan, Z., Chen, Z., Li, P.: An efficient deep learning model to predict cloud workload for industry informatics. IEEE Trans. Ind. Inf. 14(7), 3170–3178 (2018)
26. Chen, Z., Hu, J., Min, G., Zomaya, A.Y., El-Ghazawi, T.: Towards accurate prediction for high-dimensional and highly-variable cloud workloads with deep learning. IEEE Trans. Parallel Distrib. Syst. 31(4), 923–934 (2020)
27. Lin, Z., Jain, A., Wang, C., Fanti, G., Sekar, V.: Using GANs for sharing networked time series data: challenges, initial promise, and open questions. In: Proceedings of the ACM Internet Measurement Conference (IMC ’20). Association for Computing Machinery, New York, NY, USA, pp. 464–483 (2020)
28. Yazdanian, P., Sharifian, S.: E2LG: a multiscale ensemble of LSTM/GAN deep learning architecture for multistep-ahead cloud workload prediction. J. Supercomput. 77, 11052–11082 (2021)
29. Kumar, A.G., Sindhu, M.R., Kumar, S.S.: Deep neural network based hierarchical control of residential microgrid using LSTM. In: TENCON 2019-2019 IEEE Region 10 Conference (TENCON). IEEE (2019)
30. Sridhar, P., Sathiya, R.R.: Noise standard deviation estimation for additive white Gaussian noise corrupted images using SVD domain. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 8(11) (2018)
31. Sathiya, R.R., Swathi, S., Nevedha, S., Shanmuga Sruthi, U.: Building a knowledge vault with effective data processing and storage. In: Proceedings of the International Conference on Soft Computing Systems, vol. 398 (2016)
32. Sathiya, R.R.: Content ranking using semantic word comparison and structural string matching. Int. J. Appl. Eng. Res. 10, 28555–28560 (2015)

Arduino UNO-Based COVID Smart Parking System
Pashamoni Lavanya, Mokila Anusha, Tammali Sushanth Babu, and Sowjanya Ramisetty

1 Introduction
This work concerns how IoT devices communicate with one another and, in particular, how a car park can be managed with an Arduino UNO and the IoT devices connected to it. Before a vehicle is allotted a slot, we check whether the user has a Covid report and whether that report is positive or negative. Only after the Covid report has been checked is the person allowed to book a slot in the specified area. The main goal of the project is to help reduce Covid cases in society without causing disruption. Before parking, users must therefore enter their details so that their Covid status can be checked. To this end we first created a website on which users enter their details and Covid report; the Arduino UNO then automatically lets the person park the vehicle in the specified area. Slots can be booked as long as free slots remain. The parking area has only a fixed number of slots, so once all of them are occupied the IoT devices prevent further vehicles from entering; anyone who arrives while a slot is free can park. The website is built with HTML, XAMPP, and a database. After the reports have been checked, vehicles are admitted by the Arduino UNO and a servo motor: the servo motor opens the gate only if a slot is free, and if all slots are filled it does not open. The system thus addresses both the spread of Covid and the growing parking problem using IoT devices and the Arduino UNO.

P. Lavanya · M. Anusha · T. Sushanth Babu · S. Ramisetty (B) KG Reddy College of Engineering and Technology, Hyderabad, Telangana, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_24

P. Lavanya et al.

2 Literature Survey
The world's population is increasing day by day, and automation is growing with it. As a result, the entire world faces the problem of vehicle parking as well as Covid cases in society. The major issues arising from the rapidly increasing number of vehicles and the shrinking number of parking slots are:
1. Searching for a slot is time consuming; time is wasted, drivers face difficulty, and the parking area does not accommodate the maximum number of vehicles.
2. Vehicles consume more diesel or petrol while idling or moving around the slots, and the resulting emissions pollute the surrounding environment [2].
3. Using a website built with HTML, XAMPP, and a database together with IoT devices, the slots can be managed and vehicles directed to their slots [4].
4. The driver can communicate from the vehicle to reserve a parking slot or space through the website and the Arduino UNO IoT device [5].
5. The authors propose a smart parking system that uses radio frequency identification (RFID) for authentication at the gate management service (GMS) to assign a definite slot. The system provides an additional feature for monitoring the parking area over the Internet [6].
6. If no parking slots are available, the number of accidents increases [8].
7. Because the slots are filled through the system and the website, the system behaves intelligently [9].
However, these approaches have shortcomings; for example, tree cover is reduced through erosion and logging, which has a large effect on the environment. To overcome this, we use an Arduino system programmed in C. We design a vehicle model and a website on which customers enter their details, including whether their Covid status is positive or negative.

3 Proposed Model
The Covid smart parking system uses an Arduino UNO together with a website and IoT devices. We design the system so that customers enter their details and book a slot if one is free; it is a smart system that takes the Covid report into account. After registration with an e-mail address and password, the website appears as shown in Fig. 1. Before entering, the person can check on the LCD display whether slots are available in the specified parking area, and the gate is opened automatically by the motor if free slots are available. Overall, the project explains how a smart parking system can help reduce the spread of Covid in the country. With the Arduino UNO-based Covid smart parking system we show how to reduce

Fig. 1 Registration completion

the spread of Covid. Using the website application we can check the Covid report, from which we know whether the person is vaccinated and whether the person has Covid. The project thus helps to reduce the spread of Covid at the smart parking system.

3.1 Proposed Architecture
In many areas, parking slot allocation is provided to avoid accidents and to guide clients to the empty spaces. The block diagram of the parking system is shown in Fig. 2.
The functionalities of the Covid smart parking system are:
• Fast and efficient service to the user.
• Searching is an easy process.
• Selecting a slot is also an easy process.
• It saves a lot of time while searching for a free slot in the building and may even help avoid minor accidents.

Fig. 2 Block diagram of proposed architecture

3.2 Principle and Working
We use 4 sensors, corresponding to the available parking slots, and an LCD display that shows whether the slots are full or empty, so that the available slots can be seen before entering. The connections are made with male-to-female wires. The model consists of a servo motor, an LCD, sensors, and the Arduino UNO with its software. Before entering, users have to log in to the website so that their Covid status can be checked; if they have a negative Covid report they are allowed into the parking area. Checking this report helps reduce the spread of Covid in society. Parking is thus handled by the sensors, the Arduino, and the LCD. The gate opens automatically when the sensors report a free slot.

IR Sensor. IR sensors work with a reflective surface. An IR sensor consists of an IR receiver and an IR emitter. The sensors allow vehicles in when slots are free. In this project we used 4 sensors, which let the person book a slot and later leave the parking area.

Liquid Crystal Display (LCD). We use a 16 × 2 LCD, i.e., 2 lines of 16 characters each.

1. A website is created using HTML, XAMPP, and a database. Customers have to log in to the page from the website. Once the login form is completed, the website asks for the customer's details, as shown in Fig. 3.
2. From the details entered, we know whether the person has completed vaccination and whether the person has a Covid report.
3. After the website part is completed, we move on to the IoT devices, i.e., the Arduino UNO, which controls how the slots are booked.
4. The sensors detect which slots are occupied. If all slots are booked and no empty slots are available, the vehicle has to turn back and go to another parking area.

Fig. 3 Flowchart

5. Each IR sensor emits infrared light from its emitter and receives the reflection at its receiver; in this way the system detects whether a vehicle occupies a particular slot.
6. The more infrared light reaches the receiver, the lower the resistance of the IR receiver.
System testing here is a form of IoT testing, which exercises the IoT devices that the system manages. IoT testing is important for the parking system, and so is the testing framework.
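The working principle in the steps above (four slot sensors, a 16 × 2 LCD, and a servo-driven gate that opens only for a Covid-negative user when a slot is free) can be sketched as a small simulation. This is a Python model of the decision logic only, not the Arduino sketch used in the project; the function names, the message texts, and the convention that a sensor reads True when a slot is occupied are assumptions made for illustration.

```python
from typing import List, Tuple

LCD_COLUMNS = 16  # the text describes a 16 x 2 LCD


def free_slots(sensor_blocked: List[bool]) -> int:
    """Count slots whose IR sensor does not detect a vehicle."""
    return sum(1 for blocked in sensor_blocked if not blocked)


def lcd_lines(sensor_blocked: List[bool]) -> Tuple[str, str]:
    """Format the two LCD lines, padded or cut to 16 characters each."""
    free = free_slots(sensor_blocked)
    line1 = "PARKING FULL" if free == 0 else f"FREE SLOTS: {free}"
    # Per-slot status: X = occupied, - = free (e.g. "1:X 2:- 3:- 4:X").
    line2 = " ".join(
        f"{i + 1}:{'X' if blocked else '-'}"
        for i, blocked in enumerate(sensor_blocked)
    )
    return (line1[:LCD_COLUMNS].ljust(LCD_COLUMNS),
            line2[:LCD_COLUMNS].ljust(LCD_COLUMNS))


def gate_should_open(covid_negative: bool, sensor_blocked: List[bool]) -> bool:
    """Open the gate only for a Covid-negative user when a slot is free."""
    return covid_negative and free_slots(sensor_blocked) > 0


# Hypothetical sensor readings: slots 1 and 4 are occupied.
sensors = [True, False, False, True]
display = lcd_lines(sensors)
open_gate = gate_should_open(True, sensors)
```

On the actual hardware, the same decision would drive the servo position and the LCD output inside the Arduino main loop.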

4 Results
Step 1. Arduino microcontroller boards come in different types. The most common is the Arduino UNO, but there are specialized variants.
Step 2. To start, download and install the Arduino programmer (IDE).
Step 3. Once the Arduino software is installed on the laptop, connect the board to a USB port.
Step 4. Set the board type and the serial port in the Arduino programmer.
Step 5. To upload new code to the Arduino, you either need access to code that you can paste into the programmer, or you have to write it yourself, using the Arduino programming language to create your own sketch. The sketch is what the Arduino executes.
Step 6. Once you have uploaded the new sketch to your Arduino, disconnect it from your computer and integrate it into your project as directed.
Step 7. Using HTML and XAMPP, create a web application to check the Covid test result of customers who want to park their cars. They need to have taken a Covid test and report whether the result is positive or negative.
Figure 4 shows that the vaccination status of a person is checked first, and only then is a slot provided for booking in the parking area. The working principle with the Arduino is shown in Fig. 5, and the customer details displayed in the web application are shown in Fig. 6. Figure 4 summarizes the project as a whole: the website checks, with the help of the Covid report, whether the person is vaccinated; if so, the person can enter the parking area, and the LCD display shows the number of slots available. Users are thus admitted easily and without disturbance, and if slots are free the gate is opened automatically by an electric motor so that they can enter the parking area. This is the automatic slot booking parking area with Covid vaccination check based on the Covid report (Fig. 4). The system detects empty parking slots and gives access to them, so that customers can find available space to park their vehicle in a particular area. Only customers who have entered their details are allowed to take a slot in the parking system.

Fig. 4 Automatic slot booking parking area with COVID vaccination

Fig. 5 Working with Arduino

Challenges and Advantages
• As the user data is stored in many sections, implementing the database was somewhat inconvenient.
• Two datasets that interact with each other had to be implemented for each user.
• Some devices, such as the LCD and the sensors, did not work while running the project.
• The system helps stop the spread of Covid and saves time for customers.
• At one point a message sent to the LCD appeared as gibberish, but this was solved after finding that a data pin connection had not been made properly.
• The slot sensors were not assigned correctly, which led to false parking reports.


P. Lavanya et al.

Fig. 6 Displaying the customer details

• The process is easy and quick, and the Covid status of each person entering the parking area is known.
• Drivers are less likely to get frustrated, and parking discipline improves.
• Implementation is easy and low cost.

5 Conclusion and Future Scope
Many areas provide parking slots, but none offers a website that checks whether a person has Covid. In this project, we discussed how the Covid status can be checked, how customers can book their slots, and how they enter the parking area. Smart parking systems are generally known to reduce traffic congestion, cut the pollution caused by the miles traveled in search of a parking space, reduce the time wasted in finding a space, and increase revenue. In addition, the proposed system checks the Covid status of customers to help reduce the number of cases in society. It works on the basis of a web application and IoT devices. The main future scope is to limit the spread of Covid in society: to that end, the smart parking system introduces a website that checks whether a person has been vaccinated and whether he or she has Covid. We therefore expect the system to reduce the chance of Covid spreading at parking facilities without placing additional restrictions on the existing parking process. We hope this feature can be added to parking areas in the future to reduce Covid cases on the basis of vaccination status.



Low-Cost Smart Plant Irrigation Control System Using Temperature and Distance Sensors Jyotsna Malla, J. Jayashree, and J. Vijayashree

1 Introduction
Agriculture has become a major cause of water scarcity around the world. Agricultural irrigation accounts for nearly 70% of the water withdrawn in developed countries and about 90% in developing countries like India. It is essential to use smart irrigation systems to prevent the depletion of water resources [2, 3, 8]. Traditional irrigation systems are largely incapable of using water resources in an efficient and sustainable manner. Groundwater levels are largely depleted because of the overconsumption of water resources for irrigation. Smart irrigation helps to distribute water according to the requirements of the crops. The temperature and the level of the underground reservoir are constantly monitored using smart devices, which can help reduce the wastage of water resources by around 80% [6, 11, 13, 14]. In this paper, we aim to develop a low-cost smart plant irrigation control system. The main objective is to understand the current advancements in agricultural automation by reviewing recent research and trends. As the title of the paper signifies, the goal of this research is to arrive at a low-cost solution to the problem of agricultural irrigation management. Hence, the desired end product is a working, modular,

J. Malla (B) · J. Jayashree · J. Vijayashree School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India e-mail: [email protected] J. Jayashree e-mail: [email protected] J. Vijayashree e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_25



end-to-end solution that tries to address the issues faced by the agricultural irrigation market today. The significance of this research lies in addressing some of the most dominant issues faced in irrigation today. Primarily, the task at hand is to develop an end-to-end solution that takes recurring activities out of a user's working day, so that they can save time and effort for more pressing or valuable activities. Viewed as a pure optimization problem, the aim is to allow users to obtain the best possible benefit from their deployed resources. Once installed, the smart low-cost plant irrigation system becomes part of the small agricultural ecosystem that the user governs. The end goal is to optimize the yield the user obtains. For example, the system can first collect environmental data, such as temperature, from the context of installation and use it to alter the system's response. This means that the system's response is unique and curated for each environment separately, helping the user optimize crop yield by taking into account the type of soil in use, the types of crops grown, the surface temperature, etc., and using this combination of data to provide the best irrigation solution. The proposed system also takes into account that some environments may face a severe water shortage. To combat this, the system applies additional constraints over and above the regular process flow. The water supply is constantly measured to quantify the amount of residual water. During regular operation, if the residual water in the supply drops below a configured threshold, the system automatically shuts down and sounds a warning to alert the user.
Together, these capabilities give the system its significance: it provides users with a complete solution to irrigation issues, ranging from automation to water shortage. Hence, the advantages involved are quite tangible. Primarily, the automation frees up user work hours. This directly implies more time and effort saved for other essential activities that could lead to increased end value for the user. Moreover, the system can help users achieve process optimization through machine precision, which prevents user errors, and through environmental data collection, which enables the system to decide the best flow to help the user obtain the best yield. This translates to a lower cost of operation and hence more value for the user in the long run. Finally, the system also addresses the growing problem of water shortage by setting an operational threshold. A drop in the level of residual water in the supply below the configured threshold automatically shuts down the system and alerts the user through a warning. All these advantages help provide value to the user in terms of an irrigation solution. The rest of the paper is organized as follows. The Literature Survey is presented in Sect. 2. Section 3 contains the Proposed Model. Experiments and Results are presented in Sect. 4. This is followed by Sect. 5, which discusses the Conclusion and Future Directions.


2 Literature Survey
A lot of research has been conducted on using smart devices to automate the processes involved in farming. A rule-based low-cost smart irrigation system has been discussed in [6]. The system is built around an ESP8266 NodeMCU microcontroller. The microcontroller gets the data from the sensors, and the processed data are sent to their destination using the MQTT protocol. The moisture of the soil is monitored using a soil moisture sensor, and the plant is irrigated to ensure that its minimum water requirement is met. The water level of the reservoir is also constantly monitored using ultrasonic sensors and a relay. The water level data are sent to the microcontroller periodically. The temperature and the water vapor levels surrounding the crops are also checked periodically. The authors in [1] discussed an intelligent irrigation system using IoT devices. This rule-based system helps farmers get updated information about their crops. Neural network-based intelligence enables the device to take the data from the sensors and present farmers with suggestions regarding irrigation schedules. Remote monitoring of the crops is enabled using the MQTT and HTTP protocols. An intelligent water motor controller has been presented in [9]. It consists of a microcontroller and a soil moisture sensor. The sensor data are relayed using the MQTT protocol. The Transport Layer Security (TLS) and Secure Sockets Layer (SSL) cryptographic protocols are used to secure the system. The soil moisture data and the water motor data are constantly updated and displayed to the farmer through a mobile application. An automated irrigation system using various sensors has been proposed in [10]. The system consists of a microcontroller and different sensors that gather data on soil characteristics such as moisture content, temperature, and pH. An artificial neural network (ANN) has been used to make decisions regarding the automation of the water pump, ensuring that the crops get the required amount of water while preventing overconsumption. The authors in [16] proposed a programmed and automated irrigation framework. This framework aims to use water resources efficiently, preventing wastage while ensuring that the optimum water requirements of the crops are met. The system contains a microcontroller, a WiFi module, and a soil moisture sensor. The data gathered are sent to the cloud for analysis, and the farmer is provided with suitable suggestions regarding irrigation patterns. The proposed framework reduces manual labor and optimizes water usage, thereby increasing crop productivity. An intelligent crop irrigation model has been presented by the authors in [4]. The system consists of a fuzzy controller that gathers soil moisture and ambient temperature data using sensors and uses them to decide the correct time to irrigate the crops and the optimum quantity of water required for the specific crops.


A smart watering system for irrigating crops was proposed in [5]. The objective is to ensure that the crops receive the required soil moisture content. The system consists of soil moisture sensors and a microcontroller. The data sourced from the sensors are analyzed and sent to the farmers through a web application, which keeps the farmer updated about the status of the water pump. Automatic watering starts when the soil moisture content falls below a threshold set by the farmer and continues until the threshold soil moisture content is met. The authors in [12] presented a smart irrigation system that aims at reducing the overutilization of water resources. The sensors detect the soil moisture content, the temperatures of the air and soil, the humidity of the field, and the UV radiation. The sensor data are collected in the cloud, and a machine learning model is trained on them. The analyzed data, together with predictions and recommendations regarding the crops and irrigation patterns, are displayed to the farmer in real time on a web page. The proposed machine learning algorithm is reported to achieve high prediction accuracy. An IoT-based zoning irrigation framework is proposed in [7]. Its primary goal is to lower the overconsumption of water and energy resources. Different sensors, such as a soil moisture sensor and a temperature sensor, are used to source the data, which are sent to a server. A fuzzy logic controller processes the sensor data and automates irrigation depending on the water requirement of the plant. The model showed high accuracy with optimum water consumption when tested on tomato crops in a greenhouse. The authors in [15] proposed a smart irrigation framework for crops that require different quantities of water at various stages of their life cycle. In the system, a mobile application captures soil images, calculates the moisture content, and passes the data to a microcontroller. Based on the data received, the water quantity required for that period is calculated and sent to the farmer. In tests, the system saved almost 42% of the water consumed by traditional irrigation methods.
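Several of the surveyed systems share one simple rule: the pump runs while the soil moisture is below a farmer-set threshold. The sketch below illustrates that rule only; it is not the implementation of any cited work, and the function names and percentage units are assumptions.

```python
from typing import Iterable, List


def pump_should_run(moisture_pct: float, threshold_pct: float) -> bool:
    """Pump runs while the soil moisture is below the farmer-set threshold."""
    return moisture_pct < threshold_pct


def pump_schedule(readings: Iterable[float], threshold_pct: float) -> List[bool]:
    """Apply the rule to a sequence of soil moisture readings."""
    return [pump_should_run(m, threshold_pct) for m in readings]


if __name__ == "__main__":
    # Hypothetical readings (percent volumetric moisture) and threshold.
    print(pump_schedule([45.0, 28.0, 31.0, 40.0], threshold_pct=30.0))
```

A single threshold can make the pump switch rapidly when readings hover around it, which is one reason the fuzzy and learned controllers surveyed above are attractive.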

3 Proposed Model
The proposed model is described in this section using block, architecture, and circuit diagrams. The high-level architecture diagram of the proposed model is shown in Fig. 1. The central idea of the proposed model is as follows: an array of sensors captures contextual and environmental data to provide custom irrigation solutions based on the conditions of the environment in which the system is deployed. This array includes a temperature sensor to measure the ambient temperature of the environment of operation and an ultrasonic distance sensor to measure the amount of residual water left in the main water supply. The distance sensor provides the added functionality of a warning mechanism in the event of water scarcity in the irrigation environment.


Fig. 1 Architecture diagram of the proposed model

The schematic diagram of the proposed smart irrigation device is shown in Fig. 2. The main operation of the system, irrigation control, is driven by a DC motor. The Arduino microcontroller and the power supply are support components that help accomplish this. The actuator module takes the form of a piezoelectric component, which implements the water shortage warning mechanism by sounding a warning to alert users in case of a critical event. Finally, in the interest of providing a smooth user experience, the system comes bundled with a liquid crystal display (LCD) and a pair of LEDs. The LCD displays performance data in a user-friendly format, including the operational state of the system and the data collected by the array of contextual sensors. The pair of LEDs signifies the operational state of the DC motor. As can be seen from the high-level block diagram (Fig. 1), the array of sensors, the actuator, and the motor are all connected to the central Arduino board. The low-cost smart irrigation control system is built on top of an Arduino Uno R3 and uses a DC motor, connected to digital pin 13, to supply water to irrigate fields. The automation behind the motor depends on the interplay between two sensors. The analog temperature sensor (TMP36) measures the ambient temperature of the area and regulates the functioning of the motor. As visible in the diagram, it is connected to analog pin A0 of the board. Additionally, the ultrasonic distance sensor keeps track of the amount of water in the supply tank. It is connected to digital pin 10 and is used to cut off the motor if the water level drops below a certain threshold. This also sets off the piezo, connected to digital pin 9, as a mechanism to


Fig. 2 Schematic diagram of the proposed model

inform the user of the water shortage. LEDs of different colors indicate the state of the motor, and the connected LCD keeps the user aware of the state of the system at all times. The display is initialized across digital pins 2 through 7. Next, variables are used to assign the temperature and distance sensors, the LEDs, the motor, and the piezo to the various pins on the Arduino board. The setup function (void setup()) initializes the various components based on application requirements. It first initializes the LCD and points its cursor at the appropriate spot to begin printing data. Next, within the setup function, the pinMode method sets each component as either an input or an output device. Here, the distance sensor is initialized as an input device, while the motor, LEDs, and piezo are initialized as output devices. The loop function (void loop()) governs the functioning of the application, as it repeats the written code throughout the simulation. Variables are assigned for the following: a baseline of 36 inches for a standard water tank, an activation ambient temperature of 25 degrees Celsius, and a minimum water supply requirement of 20 inches. Appropriate built-in methods are then used to collect sensor data and transform them into usable formats. Finally, conditional statements implement the main governing logic of the application. If the ambient temperature exceeds the set basis of 25 degrees, the motor is started to irrigate the field. However, this action


Fig. 3 Flowchart for the conditional logic of the application

is nullified and the buzzer is activated if the water level in the main supply drops beneath the set standard. The flowchart for the conditional logic of the application is shown in Fig. 3.
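The conditional logic described in this section can be sketched as a hardware-free simulation. This is not the authors' Arduino sketch. The 36 in tank, 25 °C activation temperature, and 20 in minimum level come from the text; the TMP36 conversion (500 mV offset, 10 mV per °C) and the 10-bit, 5 V ADC of the Arduino Uno are standard figures. The assumptions are that the ultrasonic sensor sits at the top of the tank looking down, so the water level is the tank height minus the measured distance, and all function names are hypothetical.

```python
TANK_HEIGHT_IN = 36.0     # standard tank height from the text
TEMP_THRESHOLD_C = 25.0   # activation ambient temperature from the text
MIN_WATER_IN = 20.0       # minimum water supply requirement from the text


def tmp36_celsius(adc_reading: int, vref: float = 5.0) -> float:
    """Convert a 10-bit ADC reading of a TMP36 to degrees Celsius."""
    volts = adc_reading * vref / 1024.0
    return (volts - 0.5) * 100.0  # 500 mV offset, 10 mV per degree


def echo_to_inches(echo_us: float) -> float:
    """Convert an ultrasonic round-trip echo time in microseconds to inches."""
    # Sound needs roughly 74 microseconds per inch; halve for the round trip.
    return echo_us / 74.0 / 2.0


def water_level_in(distance_in: float) -> float:
    """Water level, assuming the sensor looks down from the top of the tank."""
    return max(0.0, TANK_HEIGHT_IN - distance_in)


def decide(temp_c: float, level_in: float) -> dict:
    """Return the motor and buzzer states for one pass of the loop."""
    low_water = level_in < MIN_WATER_IN
    return {
        "motor": temp_c > TEMP_THRESHOLD_C and not low_water,
        "buzzer": low_water,
    }


if __name__ == "__main__":
    temp = tmp36_celsius(160)
    level = water_level_in(echo_to_inches(1480.0))
    print(f"{temp:.1f} C, {level:.1f} in ->", decide(temp, level))
```

Separating the unit conversions from the decision step mirrors the structure described for the loop function, where sensor data are first transformed into usable formats and then passed to the conditional statements.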

4 Experiments and Results
The circuit of the proposed model is shown in Fig. 4. The temperature and distance measured by the sensors in the system have been plotted against time. Figure 5 displays the temperature sensor data plotted against time over a time frame. Figure 6 displays the graph of


the distance sensor data plotted against time over a time frame. The data were visualized using a combination of Tinkercad and ThingSpeak. ThingSpeak provides IoT analytics tools, connects easily with Tinkercad circuits, and helps provide efficient performance analytics. In the two plots, temperature and distance are shown as functions of time. These parameters were varied randomly over time to demonstrate the functioning of the system. According to the configuration of the system, the motor shuts down whenever the temperature drops below the 25-degree threshold or the level of residual water in the primary water supply drops below the 50% threshold. The sensor data are sent to the cloud server, and the information regarding the motor pump and the sensor values is displayed to the user via a web application named IrrigatEasy. The web application allows the user to set the threshold values for the ambient temperature and the minimum water level of the reservoir. The ambient temperature and the reservoir level are constantly updated and displayed to the user in real time, together with the motor status. The motor status ON indicates that the pump is irrigating the crops because both the water level in the reservoir and the temperature are above their set threshold values. The motor status OFF indicates that the water level in the reservoir has fallen below the set threshold value, in which case an alert is sent to the user via a message to the registered mobile number. Figure 7 shows a snapshot of the web application when the motor is ON and irrigation has started. Figure 8 shows a snapshot of the web application when the motor is OFF and irrigation has stopped.
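As an illustration of how readings and the motor status might reach ThingSpeak, the sketch below builds a request for ThingSpeak's channel update endpoint without sending it. The paper does not say how the data were uploaded, so this is an assumption; the field assignments, the placeholder API key, and the function names are hypothetical.

```python
from urllib.parse import urlencode

THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"


def motor_status(temp_c: float, level_pct: float,
                 temp_threshold_c: float = 25.0,
                 level_threshold_pct: float = 50.0) -> str:
    """ON only while neither value has dropped below its threshold."""
    ok = temp_c >= temp_threshold_c and level_pct >= level_threshold_pct
    return "ON" if ok else "OFF"


def build_update_url(api_key: str, temp_c: float, level_pct: float) -> str:
    """Build (but do not send) a channel update request.

    Hypothetical field layout: field1 = temperature, field2 = water
    level in percent, field3 = motor status (1 for ON, 0 for OFF).
    """
    status = motor_status(temp_c, level_pct)
    params = {
        "api_key": api_key,
        "field1": f"{temp_c:.1f}",
        "field2": f"{level_pct:.1f}",
        "field3": 1 if status == "ON" else 0,
    }
    return f"{THINGSPEAK_UPDATE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_update_url("HYPOTHETICAL_KEY", 26.5, 62.0))
```

The thresholds are parameters rather than constants because the text states that the user can set them from the web application.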

Fig. 4 Circuit diagram of the proposed model


Fig. 5 Temperature sensor data versus time

Fig. 6 Distance sensor data versus time

5 Conclusion and Future Directions
Our IoT-based low-cost smart irrigation control system has been found to be a cost-efficient way of conserving water resources and optimizing their use for agricultural productivity. The system aids the farmer by operating in an automated and intelligent manner. Water can be delivered solely to the required area of land by burying several sensors in the soil. Because this technique requires


Fig. 7 Snapshot of ON motor status in web application

Fig. 8 Snapshot of OFF motor status in web application

minimal upkeep, it is accessible to all farmers. The system aids in the conservation of water, and crop output can increase substantially when this approach is used. The smart irrigation system is a suitable and cost-effective way to manage water resources for agricultural output. Our system would have a feedback mechanism that effectively monitors and controls all plant development and irrigation activities. We can also try to add a rain gun sensor to avoid flooding in case of heavy rains. Rainwater can be stored easily, and the water collected can be used to irrigate crops. We can also add many other kinds of water sensors that can be beneficial for the crops. Our observations suggest that this system can be a suitable solution to the problems faced by the agriculture sector due to water shortages. Implementing such a system in our country can significantly boost agricultural production while also assisting in the appropriate management of water resources and decreasing waste. In the future, our system could be improved to forecast user activities, plant nutrient levels, harvest time, and many other things that can be of


some benefit. Further breakthroughs can be made in the future by deploying machine learning and AI algorithms, which would benefit farmers greatly and substantially cut water use in the agriculture sector. We can improve our system by employing a sensor to record the pH level of the soil, allowing farmers to reduce the amount of fertilizer used and save the money spent on it. A water meter can be added to measure the quantity of water used for irrigating a particular crop in a particular season and thereby provide a cost estimate for irrigation. This would reduce the farmers' investment in a crop and thus increase profits.

References
1. An IoT based smart irrigation management system using Machine learning and open source technologies 155, 41–49 (2018). https://www.sciencedirect.com/science/article/pii/S0168169918306987
2. Aggarwal, S., Kumar, A.: A smart irrigation system to automate irrigation process using IOT and artificial neural network. In: 2019 2nd International Conference on Signal Processing and Communication (ICSPC), pp. 310–314 (2019)
3. Alomar, B., Alazzam, A.: A smart irrigation system using IoT and fuzzy logic controller. In: 2018 Fifth HCT Information Technology Trends (ITT), pp. 175–179 (2018). https://doi.org/10.1109/CTIT.2018.8649531
4. Atzori, L., Iera, A., Morabito, G.: The Internet of Things: a survey. Comput. Netw. 54(15), 2787–2805 (2010). https://doi.org/10.1016/j.comnet.2010.05.010, https://www.sciencedirect.com/science/article/pii/S1389128610001568
5. Barkunan, S.R., Bhanumathi, V., Sethuram, J.: Smart sensor for automatic drip irrigation system for paddy cultivation. Comput. Electr. Eng. 73, 180–193 (2019). https://doi.org/10.1016/j.compeleceng.2018.11.013, https://www.sciencedirect.com/science/article/pii/S0045790617302288
6. Benyezza, H., Bouhedda, M., Rebouh, S.: Zoning irrigation smart system based on fuzzy control technology and IoT for water and energy saving. J. Cleaner Prod. 302, 127001 (2021). https://doi.org/10.1016/j.jclepro.2021.127001, https://www.sciencedirect.com/science/article/pii/S0959652621012208
7. Cagri Serdaroglu, K., Onel, C., Baydere, S.: IoT based smart plant irrigation system with enhanced learning. In: 2020 IEEE Computing, Communications and IoT Applications (ComComAp), pp. 1–6 (2020). https://doi.org/10.1109/ComComAp51192.2020.9398892
8. Goldstein, A., Fink, L., Meitin, A., Bohadana, S., Lutenberg, O., Ravid, G.: Applying machine learning on sensor data for irrigation recommendations: revealing the agronomist's tacit knowledge. Precision Agric. 19, 421–444 (2017)
9. Kodali, R.K., Sarjerao, B.S.: A low cost smart irrigation system using MQTT protocol. In: 2017 IEEE Region 10 Symposium (TENSYMP), pp. 1–5 (2017). https://doi.org/10.1109/TENCONSpring.2017.8070095
10. Kwok, J., Sun, Y.: A smart IoT-based irrigation system with automated plant recognition using deep learning. In: Proceedings of the 10th International Conference on Computer Modeling and Simulation (ICCMS 2018), pp. 87–91. Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3177457.3177506
11. Meher, C.P., Sahoo, A., Sharma, S.: IoT based irrigation and water logging monitoring system using Arduino and cloud computing. In: 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN), pp. 1–5 (2019). https://doi.org/10.1109/ViTECoN.2019.8899396
12. Millán, S., Campillo, C., Casadesús, J., Moñino, M.J., Vivas, A., Prieto, M.H.: Automated irrigation scheduling for drip-irrigated plum trees. Precision Agric. 19 (2019)
13. Mishra, D., Khan, A., Tiwari, R., Upadhay, S.: Automated irrigation system-IoT based approach. In: 2018 3rd International Conference On Internet of Things: Smart Innovation and Usages (IoT-SIU), pp. 1–4 (2018). https://doi.org/10.1109/IoT-SIU.2018.8519886
14. Nawandar, N.K., Satpute, V.R.: IoT based low cost and intelligent module for smart irrigation system. Comput. Electron. Agric. 162, 979–990 (2019)
15. Pernapati, K.: IoT based low cost smart irrigation system. In: 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 1312–1315 (2018). https://doi.org/10.1109/ICICCT.2018.8473292
16. Rawal, S.: IOT based smart irrigation system. Int. J. Comput. Appl. 159, 7–11 (2017). https://doi.org/10.5120/ijca2017913001

A Novel Approach to Universal Egg Incubator Using Proteus Design Tool and Application of IoT J. Suneetha, M. Vazralu, L. Niranjan, T. Pushpa, and Husna Tabassum

J. Suneetha (B) · L. Niranjan, Department of CSE, CMR Institute of Technology, Bangalore, India, e-mail: [email protected]
M. Vazralu, Department of IT, MR College of Engineering and Technology, Hyderabad, India
T. Pushpa · H. Tabassum, Department of CSE, HKBK College of Engineering, Bangalore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_26

1 Introduction
To encourage proper cell proliferation in the chicken egg, healthy eggs should be kept warm during the incubation phase [1]. There are two types of incubation: natural incubation and artificial incubation. During natural incubation, a hen sits on the eggs to keep them warm. In an artificial incubation system, a controller monitors and controls the whole environment, including temperature, humidity, and ventilation, which are critical to the incubation process. Artificial incubation has the benefit of hatching larger numbers of eggs throughout the year. The growth and hatching period of egg embryos are affected by changes in temperature, moisture, aeration, and egg rotation [2]. As a result, developing a low-cost, high-efficiency incubator is difficult. The majority of individuals in the country use a small incubation system, but the main issue is a lack of suitable financial and technical assistance [3]. Because of this lack of technology and expertise, the correct temperature, moisture, and aeration of the incubation system cannot be maintained, resulting in 50% of the eggs dying over the 17-day incubation period. During the winter season, the temperature drops below 20 °C [4], which has a significant impact on the incubation structure. The temperature and humidity sensors in the hatchery are the LM35 and the DHT22, which are operated by an Arduino Nano. The findings show that semiconductor-type sensors have the best direct response but the worst accuracy, with a change range of 10–50 and



reaction duration of 5–60 s. Without the network, the user will be unable to handle the device remotely, and data such as lights on/off and fans on/off are managed on the web server [5].

2 Related Work
Poultry in South Asia comes in a variety of kinds, and farmers raise them for eggs and meat. Vitamins, lipids, protein, and minerals are just a few of the nutrients contained in eggs [6]. The goal of egg incubation is to grow the embryo inside the egg until it hatches. Most research and development in the last decade has focused on microcontrollers in order to create a smart egg incubator that can manage humidity, temperature, egg turning, and ventilation [7]. Figure 1 shows the ecological features that need to be considered for better results, along with the factors that affect the system [8]. The first is the air quality in the incubator: during the incubation phase, the concentration of oxygen in the incubator should be kept at a sufficient level. The second is egg turning, one of the most important aspects of the incubation process; its implementation is carried out using the Proteus tool [9]. Because the embryos have no circulatory system during the first week, egg rotation is critical. The third is the humidity and condensation in the incubator: the rate at which water evaporates is determined by the humidity in the incubation unit and its operating life [10]. The fourth is temperature, the most important aspect of incubation [11]. The temperature must be kept at the appropriate level for good embryo growth and hatchability [12]. Last but not least is disinfection: the eggs are disinfected before being placed in the incubator.

3 Structural Plan
The construction is made up of mechanical components, including plastic egg trays, perforated sheets, plastic angles, and cardboard sheets. These materials are both inexpensive and readily available. The fundamental elements include temperature sensors, moisture sensors, light sensors, egg inclination sensors, water level sensors, embedded systems, egg-rotating DC motors, diffusers, pumping systems, a NodeMCU, solar panels, light bulbs, LCD displays, an SMPS, and a UPS [13]. The design of the incubator, including the positioning of egg boxes within the unit, is shown in Fig. 2. The electrical connections of the incubator are shown in Fig. 3. Operating the fan and the exhaust with the controller allows for forced-air and then still-air incubation [14]. To keep the eggs warm and maintain a consistent temperature throughout the incubator, the enclosure is built from recycled panels and aluminum foil [15].

A Novel Approach to Universal Egg Incubator Using Proteus Design …

Fig. 1 Ecological features to be measured during egg incubation

Fig. 2 Inside view of incubator


J. Suneetha et al.

Fig. 3 Air circulation in the incubator

4 Proposed System Design As shown in Fig. 4, the proposed design is divided into three parts. The first part comprises the interconnection of the incubator's sensors. The second part comprises the controller's connections to the sensors, display, power unit, and driver unit. The third part is the driver unit, which links the microcontroller to the signal generator, which in turn controls ventilation, humidity, egg tilt, and temperature. The third part also includes an internet connection to the smartphone through the NodeMCU device [16]. Humidity and temperature are sensed by the DHT11 and DS18B20, respectively. The output data are fed into the analog inputs of an 8-bit PIC16F887 microcontroller [17]. The water level sensors are used to manage the water pressure in the humidifier, which is meant to keep the moisture in the incubator at the desired level. An incandescent light keeps the temperature of the unit constant [18]. To avoid problems caused by bulb failure, the light sensor detects whether the bulb is ON or OFF during incubation [19, 20]. Two leaf switches on each side keep the egg tray at the appropriate angle; each controls the motor's rotation angle in the incubation unit.
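The sensing and actuation described above can be illustrated with a minimal on/off (hysteresis) control sketch. It is written in Python for readability and is not the authors' PIC firmware. The 37.5 °C setpoint comes from the results section; the humidity setpoint, both dead bands, and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actuators:
    """Hypothetical actuator state: heating bulb and humidifier pump."""
    heater_on: bool = False
    humidifier_on: bool = False


def control_step(temp_c, humidity_pct, state,
                 temp_setpoint=37.5, temp_band=0.3,
                 hum_setpoint=60.0, hum_band=5.0):
    """Return the next actuator state using simple hysteresis.

    The humidity setpoint and both bands are assumed values,
    not figures reported in the paper.
    """
    heater = state.heater_on
    if temp_c < temp_setpoint - temp_band:
        heater = True            # too cold: switch the bulb on
    elif temp_c > temp_setpoint + temp_band:
        heater = False           # too warm: switch the bulb off

    humidifier = state.humidifier_on
    if humidity_pct < hum_setpoint - hum_band:
        humidifier = True        # too dry: run the humidifier
    elif humidity_pct > hum_setpoint + hum_band:
        humidifier = False       # too humid: stop the humidifier

    return Actuators(heater, humidifier)


def bulb_fault(heater_on, light_detected):
    """Flag a failed bulb: heater commanded on but no light sensed."""
    return heater_on and not light_detected


if __name__ == "__main__":
    state = Actuators()
    for temp, hum in [(36.0, 50.0), (37.5, 60.0), (38.0, 70.0)]:
        state = control_step(temp, hum, state)
        print(temp, hum, state)
```

Inside the dead band the previous state is kept, which avoids rapid switching of the bulb and pump around the setpoint.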


Fig. 4 Proposed system design block

5 Circuit Design and Flow Diagram As shown in Fig. 5, the circuit was created with the Proteus design tool. The same tool is used for the whole simulation, in which the signals are varied to examine the controller's behavior. The code is written in C using MPLAB.

6 Results and Discussions The temperature, humidity, and tilt of the eggs were observed over a period of 22 days, and the variations in these parameters were noted (Table 1). Figure 6 compares the temperature measured in real time with the estimated temperature over the 22 days. The variation in humidity at different temperatures was measured for 22 days, as shown in Fig. 7. A total of 300 viable eggs were used for incubation. Three temperatures were tested: T1 = 36.5 °C, T2 = 37.5 °C, and T3 = 38 °C. The optimal temperature for incubation is 37.5 °C. Figure 8 shows the variation of humidity with respect to temperature over a period of 22 days; each panel illustrates the variation of humidity at a fixed temperature. Figure 9 shows the variation of egg tilt for an angle of inclination over a period of 22 days, for both 37.5 and 38 °C.


Fig. 5 Proposed design using Proteus tool

Table 1 Over the period of 22 days

Temperature | Temp1 = 36.5 °C (chicks in incubator = 150) | Temp2 = 37.5 °C (chicks in incubator = 150) | Temp3 = 38 °C (chicks in incubator = 150)
Day 22 | 82% hatching rate | 93% hatching rate | 85% hatching rate

Fig. 6 a Measurement of temperature and its estimation. b Measurement of humidity and its estimation (axes as extracted: Temperature versus Number of Days)


Fig. 7 a Variation of temperature (36.5 °C) versus humidity over 22 days. b Variation of temperature (37.5 °C) versus humidity over 22 days

Fig. 8 a Variation of temperature (38 °C) versus humidity over 22 days. b Variation of temperature (36.5 °C) versus humidity over 22 days

Fig. 9 a Variation of temperature (37.5 °C) for an angle of inclination over 22 days. b Variation of temperature (38 °C) for an angle of inclination over 22 days

Figure 10a shows the control and monitoring of the incubator using the Blynk application on a smartphone. Figure 10b depicts the code that was created on the NodeMCU for various tests before being implemented on the PIC controller. Figure 11a shows the real-time implementation of the incubator. The unit consists of both incubation and hatchery units: the upper part is used for incubating eggs, and the lower part is used for hatching them.


Fig. 10 a Control and monitoring of incubator using Blynk application. b LCD, sensor, and cloud value through PIC

Fig. 11 a Implementation part of the entire incubator unit. b Result of hatched chicks

7 Conclusion An affordable forced-air incubator based on the PIC controller has been built. The materials used in the incubator are inexpensive and readily available locally. The incubation test was carried out at three different temperatures: Temp1, Temp2, and Temp3. The test revealed that Temp2 is the ideal temperature for maximizing productivity. The hatcher and the brooder are housed in a single compartment. The incubator takes approximately 12 min to reach the desired temperature, Temp2 (from 21 to 37.5 °C). During this warm-up, the power usage was roughly 62 W, while the power usage after reaching the desired temperature was around 22 W. With the Blynk application on a smartphone, the complete machine can be operated from anywhere. The incubator is intended for small-scale chicken production in communities and may also be suitable for egg embryo laboratory and research work.

Acknowledgements This research was supported by SiliconShelf Electronic Systems, Bangalore, for the setup, and the entire model design and experimental process was simulated at CMRIT, Bangalore. We thank our colleague Prof. Niranjan L, ECE, CMRIT, who provided insight and expertise that significantly aided the research. We thank Mr. Vinay from SiliconShelf Electronics Systems for his support with methodology. We also thank Dr. K Venkateswaran, Professor and Director (Research & Innovation) CMRIT, Bangalore, for his valuable comments that greatly improved the manuscript.

References
1. Boleli, C., Morita, V.S., Matos, J.B., Thimotheo, M.I., Almeida, V.R.: Poultry egg incubation: integrating and optimizing production efficiency. Braz. J. Poult. Sci. 18(2), 1–16 (2016)
2. Tona, K., Onagbesan, O., Ketelaere, B.D., Decuypere, E., Bruggeman, V.: Effects of turning duration during incubation on corticosterone and thyroid hormone levels, gas pressures in air cell, chick quality and juvenile growth. Poult. Sci. 82, 1974–1979 (2003). https://doi.org/10.1093/ps/82.12.1974
3. Niranjan, L., Priyatham, M.M.: An energy efficient and lifetime ratio improvement methods based on energy balancing. Int. J. Eng. Adv. Technol. 9(1S6), 52–61 (2019). https://doi.org/10.35940/ijeat.a1012.1291s619
4. Kutsira, G.V., Nwulu, N.I., Dogo, E.M.: Development of a small scaled microcontroller-based poultry egg incubation system. In: International Artificial Intelligence and Data Processing Symposium (IDAP) (2019). https://doi.org/10.1109/IDAP.2019.8875897
5. Shwetha, N., Niranjan, L., Gangadhar, N., Jahagirdar, S., Suhas, A.R., Sangeetha, N.: Efficient usage of water for smart irrigation system using Arduino and proteus design tool. In: 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC), pp. 54–61 (2021). https://doi.org/10.1109/icosec51865.2021.9591709
6. Deeming, D.C.: Characteristics of unturned eggs: critical period, retarded embryonic growth and poor albumen utilization. Poult. Sci. 30, 239–249 (1989). https://doi.org/10.1080/00071668908417144
7. Suhas, A.R., Niranjan, L., Sreekanth, B.: Advanced system for driving assistance in multi-zones using RF and GSM technology. Int. J. Eng. Res. Technol. (IJERT) 3(6), 1884–1889 (2014)
8. Islam, N., Uddin, M.N., Arfi, A.M., Alam, S.U., Md. Uddin, M.: Design and implementation of IoT based perspicacious egg incubator system. In: 9th Annual Information Technology, Electromechanical Engineering and Microelectronics Conference (IEMECON) (2019). https://doi.org/10.1109/IEMECONX.2019.8877043
9. Shwetha, N., Niranjan, L., Chidanandan, V., Sangeetha, N.: Advance system for driving assistance using Arduino and proteus design tool. In: 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV) (2021). https://doi.org/10.1109/icicv50876.2021.9388620
10. Tolentino, L.K.S., Enrico, E.J.G., Listanco, R.L.M., Ramirez, M.A.M., Renon, T.L.U., Samson, M.R.B.: Development of fertile egg detection and incubation system using image processing and automatic candling. In: TENCON 2018 - 2018 IEEE Region 10 Conference (2018). https://doi.org/10.1109/tencon.2018.8650320
11. Niranjan, L., Priyatham, M.M.: Lifetime ratio improvement technique using special fixed sensing points in wireless sensor network. Int. J. Pervasive Comput. Commun. (2021) (ahead-of-print). https://doi.org/10.1108/ijpcc-10-2020-0165
12. Huang, T., Sun, L.: Design and implementation of the infant incubator intelligent control system based on internet of things. Open Autom. Control Syst. J. 7(1), 2223–2229 (2015). https://doi.org/10.2174/1874444301507012223
13. Shwetha, N., Niranjan, L., Chidanandan, V., Sangeetha, N.: Smart driving assistance using Arduino and proteus design tool. Expert Clouds Appl. 647–663 (2021). https://doi.org/10.1007/978-981-16-2126-0_51
14. Kabir, M.A., Abedin, M.A.: Design and implementation of a microcontroller based forced air egg incubator. In: 2018 International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE) (2018). https://doi.org/10.1109/icaeee.2018.8642976
15. Aldair, A.A., Rashid, A.T., Mokayef, M.: Design and implementation of intelligent control system for egg incubator based on IoT technology. In: 2018 4th International Conference on Electrical, Electronics and System Engineering (ICEESE) (2018). https://doi.org/10.1109/iceese.2018.8703539
16. Niranjan, L., Suhas, A.R., Sreekanth, B.: Design and implementation of robotic arm using proteus design tool and arduino-uno. Indian J. Sci. Res. 17(2), 126–131 (2018)
17. Gutierrez, S., Contreras, G., Ponce, H., Cardona, M., Amadi, H., Enriquez-Zarate, J.: Development of hen eggs smart incubator for hatching system based on internet of things. In: 2019 IEEE 39th Central America and Panama Convention (CONCAPAN XXXIX) (2019). https://doi.org/10.1109/concapanxxxix47272.2019.8976987
18. Niranjan, L., Suhas, A.R., Chandrakumar, H.S.: Design and implementation of self-balanced robot using proteus design tool and arduino-uno. Indian J. Sci. Res. 17(2), 556–563 (2018)
19. Gutierrez, S., Contreras, G., Ponce, H., Cardona, M., Amadi, H., Enriquez-Zarate, J.: Development of hen eggs smart incubator for hatching system based on internet of things. In: 2019 IEEE 39th Central America and Panama Convention (CONCAPAN XXXIX) (2019). https://doi.org/10.1109/concapanxxxix47272.2019.8976987
20. Dhipti, G., Swathi, B., Reddy, E.V., Kumar, G.S.: IoT-based energy saving recommendations by classification of energy consumption using machine learning techniques. In: International Conference on Soft Computing and Signal Processing, pp. 795–807. Springer, Singapore (2021, June)

Dynamic Resource Allocation Framework in Cloud Computing Gagandeep Kaur and Sonal Chawla

1 Introduction Cloud Computing is one of today's emerging technologies. Resource Management in Cloud Computing is an important issue and faces problems related to the allocation, provisioning, and scheduling of resources. Problems such as decreased throughput and system failures may occur. The existing Resource Management techniques, frameworks, and mechanisms are not sufficient to handle dynamic environments, applications, and resource behaviors. To provide efficient performance, the proposed research work addresses the different characteristics of the Cloud Computing environment. This research presents a Resource Management Framework in Cloud Computing and tests it using different metrics. Cloud Computing is a pay-per-use consumer-provider service model, i.e., a payment model in which the consumer pays for using the product rather than having to buy it [1–4]. It is an emerging concept through which users can access their applications from anywhere, at any time, through the network and connected devices.

2 Literature Review Current research efforts on resource provisioning in Cloud Computing have devised static and dynamic provisioning schemes. Static provisioning [5, 6] is usually performed offline, e.g., on a monthly basis, whereas dynamic provisioning [7, 8] regulates the workloads dynamically. In both the static and dynamic case, virtual machine (VM) sizing is identified as the most important step, where VM sizing refers to estimating the amount of resources to be allocated to VMs [9]. The authors of [10] propose Combinatorial Double Auction Resource Allocation (CDARA), a market-driven model for Resource Management in Cloud Computing. The authors of [11] propose a power- and load-aware resource allocation policy for hybrid Cloud that aims to minimize power consumption and maximize resource utilization; the algorithms were tested with a DVFS-based scheduling technique. The authors of [12] propose a resource monitoring model for virtual machines in Cloud Computing, which monitors static and dynamic information of live working nodes for future resource discovery and resource allocation models. The authors of [13] propose a monitoring architecture for Cloud Computing, achieved by integrating a resource monitoring tool with its resource discovery protocol. The work in [14] focuses on dynamic resource pricing in Cloud Computing. The solutions generated by heuristic methods often get stuck in local minima, and meta-heuristic algorithms have proved the most effective way to avoid this situation [15, 16]. The application of meta-heuristic algorithms to scheduling in Cloud and grid environments is systematically reviewed in [17–20]. This paper is organized as follows: Sect. 3 presents the general architecture of the proposed framework. Section 4 presents the proposed algorithm for Dynamic Resource Allocation and provisioning based on that architecture. Section 5 presents experiments evaluating the proposed framework on different metrics. Section 6 concludes the paper.

G. Kaur (B) · S. Chawla Department of Computer Science and Applications, Panjab University, Chandigarh, India e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_27

3 General Architecture of the Proposed Framework The proposed framework uses the infrastructure as a service model because it delivers processing, storage, networks, and other fundamental computing resources to users. When a resource is scarce, users themselves become competitors who influence future prices directly or indirectly. From the provider's perspective, a large number of virtual machines must be allocated to distributed users dynamically, fairly, and profitably. This research work is based on a Resource Management System and classifies the requirements of end users and Cloud providers. The Cloud Service Providers (CSPs) and users interact in a competitive market environment for resource trading and service access. The layers of the proposed Resource Management System work as follows:

Resource Provisioning Layer (RPL): The Resource Provisioning Layer is the top layer of the proposed framework. It interacts with users who request resources and task execution; users submit requests and bid for resources in this layer. It provides service systems, namely the Resource Request System, the Resource Database, and the Cloud Deployment Model, to interact with users and to initiate services and resources for the execution of tasks. These service systems are listed below:


Resource Request System: Users submit their requests to the Resource Management System through the Resource Request System. This system combines task distribution and coordinated interactions to provide flexible solutions, and it handles the operational execution of tasks. Tasks are distributed across different virtual machines, and the Resource Request System provides interfaces and services to monitor their communication. Its main components are as follows:

Resource Identification: This component identifies the resources required by the tasks that the user submits. Users submit tasks of different types, e.g., small, medium, and large, according to their needs. These tasks are further divided into meta-tasks, and resources are identified for both tasks and meta-tasks.

Resource Gathering: This component identifies available resources and prepares custom resources. The user provides exact service specifications through the user interfaces. After receiving them, Resource Gathering verifies the currently available resources and the Cloud's policies, such as the pricing policy.

Resource Brokering: Resources are negotiated through this component to make sure that they are available as required. Because users submit different types of tasks and demand different types of resources, the resources can be negotiated. Resource Brokering also matches service requests from multiple users to multiple providers.

Resource Database: The Resource Database stores, at the back end, the data that users submit along with their tasks. It is a collection of informational content, either structured or unstructured, that resides on the infrastructure as a service platform. The Resource Database updates recent information before the next data reports are submitted.

Cloud Deployment Model: The Cloud Deployment Model describes the services, the parties involved, penalty policies, and Quality of Service (QoS) parameters such as makespan, response time, and cost. It contains the service level agreement, which acts as an agreement between the Cloud service provider and the Cloud user. The user relies on the service level agreement as a legally binding statement of what the provider has promised to provide; the Cloud service provider uses it as a record of the service that is to be delivered. It also determines who provides the service (Fig. 1). Every user has individual requirements, according to which service providers provide services and location. The components of the Cloud Deployment layer are as follows:

Resource Discovery: Resource Discovery searches for and discovers resources and is responsible for the logical grouping of resources according to the requirements of Cloud users. The user request is received from the Resource Database, and the requested resource is searched for. If appropriate resources are discovered, they are reported to the user.


Fig. 1 Architecture of the proposed Resource Management Framework

Resource Selection: Resource Selection chooses the best resources among those available for the requirements provided by Cloud users. Because different users demand different resources, selecting the best resource is its major responsibility. Small, medium, and large tasks are matched to small, medium, and large resources, respectively.

Resource Mapping: Resource Mapping maps virtual resources to the physical resources provided by the Cloud providers. Virtual resources need to be allocated and reallocated to physical resources according to demand or the current status of resources in the data centers.


Resource Allocation: Resource Allocation allocates and distributes resources to Cloud users. The main goal of this component is to satisfy Cloud user needs. Fluctuating demand from Cloud users may increase or decrease the allocation and reallocation of resources, and Cloud users can scale resources up and down based on their needs. Despite the constantly increasing demand for services, resources can be distributed according to the requirements and the Quality of Service parameters given in the service level agreement.

Virtual Layer (VL): In the proposed architecture, the Virtual Layer provides virtual instances, such as virtual machines and their resources, to the Cloud users. This research work uses virtual machines and resources for the execution of tasks. The main purpose of using virtual machines and resources is to gain scalability and to manage the allocation of Cloud Computing resources. The resources are allocated on the basis of the payoff function of the users. The virtual machine interface creates and runs virtual machines (VMs). It allows one host computer to support multiple guest virtual machines by virtually sharing its resources, such as memory and processing. Depending on the services provided, the Resource Management System has to keep track of the performance and status of resources at different levels. It keeps track of the availability of virtual machines and their resource entitlements. The main components of the virtual machine interface are as follows:

Resource Monitoring: Resource Monitoring is the main component of resource optimization. The virtualized Cloud resources are monitored to analyze resource utilization and to track the availability of free resources for future use. The major issue with Cloud resource monitoring is identifying and defining metrics and parameters.
Resource Modeling: Resource Modeling is responsible for predicting the virtualized resources required by Cloud users. Resources are non-uniform, so it is difficult to predict resource requirements for peak and non-peak periods. During periods of high demand, pricing and services are predicted through Resource Modeling.

Resource Brokering: Resource Brokering is the negotiation of virtualized resources with Cloud users to make sure that they are available as required. Requests differ in type and granularity, which extends the responsibilities of Cloud service brokering. Resource Brokering is responsible for supporting the relationships between Cloud providers and Cloud users. The major factor is pricing, but other Quality of Service parameters such as response time, cost, and resource utilization are also important.

Resource Adaptation: Resource Adaptation means assessing the allocation of resources to a workload. The proposed framework handles both static workloads, where tasks are known prior to execution, and dynamic workloads, where scheduling is performed when the tasks arrive. System policies are used to control adaptation decisions.

Resource Pricing: The proposed research uses dynamic pricing to support scalability and application-specific adaptation. In the proposed framework, users bid for the resources. As the number of users increases, the price of the resource increases: if demand is high, the resource price rises sharply, and as demand decreases, the resource price also decreases.

Physical Layer (PL): The proposed architecture uses the Physical Layer to offer hardware resources such as CPU, memory, bandwidth, and databases for the execution of tasks. Databases store data such as the number of tasks, the type of tasks, and the number of virtual machines required. In the Physical Layer, data centers are used for the execution of a large number of tasks and are responsible for providing both physical and virtual resources.
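The demand-driven behavior described under Resource Pricing can be sketched as a simple update rule. The paper does not give the exact pricing function, so the multiplicative step, the price floor, and the name `update_price` are assumptions chosen only to illustrate "price rises when demand exceeds provision, falls otherwise".

```python
def update_price(price, demand, provision, step=0.05, floor=0.01):
    """Return the next resource price.

    demand    -- number of resource units currently requested (bids)
    provision -- number of resource units the provider offers
    step      -- assumed relative price change per update (hypothetical)
    floor     -- assumed minimum price (hypothetical)
    """
    if provision <= 0:
        raise ValueError("provision must be positive")
    if demand > provision:
        # Demand exceeds supply: the price goes up.
        return price * (1 + step)
    # Demand is met: the price goes down, but never below the floor.
    return max(floor, price * (1 - step))


if __name__ == "__main__":
    price = 1.0
    for demand in [12, 15, 9, 4]:
        price = update_price(price, demand, provision=10)
        print(f"demand={demand:2d} price={price:.4f}")
```

A provider could call such a rule once per bidding round; a rule proportional to the demand/provision ratio would be an equally plausible choice.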

4 Proposed Algorithm The proposed Dynamic Resource Allocation algorithm initializes the user bid and predicts resource prices based on competitors' bids. It allocates resources dynamically according to user demand. The tasks requested by the user can be divided into meta-tasks and then submitted for resource allocation. Depending on the type of task, i.e., small, medium, or large, a small, medium, or large resource is allocated according to the user requirement.

ALGORITHM: Dynamic Resource Allocation
INPUT: User task
OUTPUT: Allocated virtual machines and resources
Begin
  Let x be the number of tasks submitted by the user and K be the number of resources virtualized by cloud provider
  Let task s occupy resource capacity C and divide s = s1, s2, …, sn
  Verify available resources, pricing policy and set user requirements
  User starts bidding for resources and bid of user = n
  Sum of bids for all users = m and bid of competitors = m − n
  Execution Speed q = s / MIPS and Execution Time = q + q (Bid of other competitors / Bid)
  Calculate P = ET * bid // P is cost
  Decide to negotiate and negotiate multiple services with different providers
  Search for resources and choose best resource. Continue if users are unsatisfied
  for all satisfied users { // Optimize the bids
    Calculate the price that cloud provider charges
    If demand > provision Then resource price increases
    Else resource price decreases
    Reduce proportion until provision → 1
  }
  Get information regarding the under-utilization of Cloud resources and negotiate cheaper agreements
  If QoS parameters do not meet the user's requirements saturate the resources
  Average resource utility = ((successful users + 0.0) / num user) * 100
  If utilization of cloud resources is unbalanced check the correctness of negotiated parameters
  Arrange all tasks in ascending order in MT
  while there are tasks in MT { // MT is meta tasks
    for all t_i in MT and for all m_j // (t_i tasks) (m_j machines)
      (Finish Time)_ij = (Completion Time)_ij + (Ready Time)_j
    for all tasks t_i in MT
      Calculate the (Least Finish Time)_ij and resource m_j
      If (resource > 1) Choose resource with least usage
    Find Average Finish Time (AFT) and Standard Deviation (SD) of all tasks in MT
    If (AFT > SD) Allocate t_f to resource m_y that acquires (Finish Time)_fy // t front
    Else Allocate t_r to resource m_y that acquires (Finish Time)_ry // t rear
    Delete assigned task from MT
  }
  Profit = [1 − (Total Execution Time − Actual Time)] * 100
End

The above algorithm indicates that Resource Allocation can be performed successfully by investigating task requirements in order to fulfil user expectations. Allocating resources suitably for the application workload is difficult, and the growing complexity of data centers and application requirements makes it more challenging. The allocation must satisfy user demands with suitable resources; an inappropriate allocation leads to performance degradation and violation of the service level agreement in the Cloud environment. The proposed algorithm is a feasible solution to this problem.
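Two steps of the algorithm above lend themselves to a short sketch: the bid-dependent execution time and cost, and the meta-task loop that picks the front or rear task by comparing the Average Finish Time with the Standard Deviation. The code below is one possible reading, not the authors' CloudSim implementation. It assumes that completion time is task length divided by machine MIPS, that ties are broken by the machine with the fewest assigned tasks, and that the population standard deviation is meant; all function names are hypothetical.

```python
from statistics import mean, pstdev


def execution_time(task_length, mips, bid, total_bids):
    """Execution Time = q + q * (competitors' bids / own bid), q = s / MIPS."""
    q = task_length / mips
    competitors = total_bids - bid
    return q + q * (competitors / bid)


def cost(task_length, mips, bid, total_bids):
    """P = ET * bid, as in the pseudocode."""
    return execution_time(task_length, mips, bid, total_bids) * bid


def schedule_meta_tasks(task_lengths, machine_mips):
    """Return a list of (task_length, machine_index, finish_time)."""
    pending = sorted(task_lengths)           # MT in ascending order
    ready = [0.0] * len(machine_mips)        # ready time per machine
    load = [0] * len(machine_mips)           # tasks assigned per machine
    plan = []

    def best_machine(length):
        # Finish time = completion time + ready time on each machine.
        finish = [length / m + ready[j] for j, m in enumerate(machine_mips)]
        # Least finish time; ties go to the least-used machine.
        j = min(range(len(finish)), key=lambda k: (finish[k], load[k]))
        return j, finish[j]

    while pending:
        least = [best_machine(t)[1] for t in pending]
        aft, sd = mean(least), pstdev(least)
        # Front (shortest) task if AFT > SD, otherwise rear (longest) task.
        index = 0 if aft > sd else len(pending) - 1
        task = pending.pop(index)
        j, finish = best_machine(task)
        ready[j] = finish
        load[j] += 1
        plan.append((task, j, finish))
    return plan


if __name__ == "__main__":
    print(execution_time(1000, 500, bid=2, total_bids=10))
    print(schedule_meta_tasks([400, 100, 200], [100, 200]))
```

Note that a higher own bid relative to competitors shortens the execution time in this formula, which matches the auction-style competition described in Sect. 3.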

5 Experiments To test the effectiveness of the framework, it has been tested with different numbers of users and different numbers of tasks. Varying the number of users exercises the different metrics used in the implementation of the framework. The user submits a request to the provider with Quality of Service constraints such as deadline, execution time, response time, and budget. If the request is accepted, its response time is calculated. The deadline is the longest time the user is willing to wait for the end result, and the budget is the price the user is ready to pay for the requested service. Optimization of tasks in the data center considers the utilization of resources. In a heterogeneous system, the execution time of a task varies across machines, so the task size can simply be measured as the average of the execution time over all machines. The CloudSim simulator is used to create the Cloud environment and conduct the experiments for the optimization of tasks in the data center.
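The task-size measure and the deadline and budget constraints mentioned above can be sketched as follows. The sketch assumes that the execution time on a machine is the task length (in million instructions) divided by that machine's MIPS rating; the function names and the simple acceptance rule are hypothetical and are not taken from CloudSim.

```python
def task_size(length_mi, machine_mips):
    """Average execution time of one task over all machines."""
    times = [length_mi / mips for mips in machine_mips]
    return sum(times) / len(times)


def meets_qos(execution_time, price, deadline, budget):
    """Accept a request only if it fits both the deadline and the budget."""
    return execution_time <= deadline and price <= budget


if __name__ == "__main__":
    size = task_size(1000, [250, 500])   # two heterogeneous machines
    print(size)
    print(meets_qos(size, price=4.0, deadline=5.0, budget=6.0))
```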


5.1 Performance Evaluation To evaluate performance, the metrics makespan, response time, and performance cost have been proposed and evaluated. Makespan is defined as the total time taken to process a set of jobs to completion, i.e., until all the jobs have finished processing. In the proposed research, makespan increases as the number of tasks increases. The experimental methodology makes three assumptions:

AI: Fewer short tasks and more long tasks.
AII: Fewer long tasks and more short tasks.
AIII: Random tasks.

Different types of tasks, i.e., short, random, and long, have been used for performance evaluation. Figure 2 shows the makespan for the different types of tasks. The algorithm provides better results for random tasks than for long and short tasks. Response time is the amount of time taken by the machine to respond to a task. In the proposed framework, it is the time taken to perform an individual transaction or query, and it reflects the application's speed and performance from the user's perspective. Figure 3 shows the response time (in ms) for different numbers of tasks: for 100 tasks the response time is 7 ms, and for 500 tasks it is 146 ms. Thus, the response time increases as the number of tasks increases.

Fig. 2 Graphical representation of makespan versus types of tasks

[Fig. 2 plot: Makespan (axis 0 to 300) for task types Short, Random, and Long]

Fig. 3 Graphical representation of response time

[Fig. 3 plot: Response Time versus Tasks]

Fig. 4 Graphical representation of performance cost versus number of tasks

[Fig. 4 plot: Performance Cost (axis 0 to 0.15) versus Tasks (1000 to 4000)]

In the proposed research, the performance cost metric aims to quantify the improvement achieved in average response time relative to its cost. The smaller the performance cost, the better the algorithm performs. Performance cost is calculated as Amount Spent / Average Response Time. Figure 4 shows the performance cost for different numbers of tasks. As shown in the graph, the performance cost varies with the number of tasks: it is higher for 3000 tasks and lower for 4000 tasks.
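The three metrics used in this section can be written down compactly. The sketch below follows the definitions given in the text (makespan as the time at which the last job finishes, response time averaged over tasks, and performance cost as Amount Spent / Average Response Time). The input values in the usage example are made up for illustration and are not the paper's measurements.

```python
def makespan(finish_times):
    """Time at which the last job of the set finishes."""
    return max(finish_times)


def average_response_time(response_times_ms):
    """Mean response time over all tasks, in milliseconds."""
    return sum(response_times_ms) / len(response_times_ms)


def performance_cost(amount_spent, avg_response_time):
    """Performance cost = Amount Spent / Average Response Time."""
    if avg_response_time <= 0:
        raise ValueError("average response time must be positive")
    return amount_spent / avg_response_time


if __name__ == "__main__":
    finish = [3.0, 7.5, 5.0]             # hypothetical finish times
    responses = [80.0, 100.0, 120.0]     # hypothetical response times (ms)
    avg = average_response_time(responses)
    print(makespan(finish), avg, performance_cost(10.0, avg))
```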

6 Conclusion This paper discusses the architecture of the proposed Resource Management Framework in Cloud Computing. On the basis of the tools and techniques identified, a general architecture for Resource Management and Resource Allocation in Cloud Computing has been proposed. The problem is solved with an economic method, namely an auction, together with heuristics for the allocation of resources and the scheduling of tasks. The architecture comprises three layers: the Resource Provisioning Layer, the Virtual Layer, and the Physical Layer. A Dynamic Resource Allocation Algorithm for both the Cloud user and the Cloud service provider has been proposed for the bidding of resources. The proposed framework has been evaluated using the metrics makespan, response time, and performance cost. The results indicate that the proposed framework performs well on these metrics in the simulated environment.


Region-Wise COVID-19 Vaccination Distribution Modelling in Tamil Nadu Using Machine Learning M. Pradeep Gowtham and N. Harini

M. P. Gowtham · N. Harini (B) Department of Computer Science, Amrita Vishwa Vidyapeetham, Amrita School of Engineering, Ettimadai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_28

1 Introduction The COVID-19 outbreak started in Wuhan, China, in December 2019 and has spread to all countries [1]. Coronavirus has affected nearly 4.3 crore people in India as of April 2022 [2]. The trend of the positive case count in India is shown in Fig. 1. In Tamil Nadu, 34.5 lakh people have been affected by the coronavirus as of April 2022, and approximately 10 crore people have been vaccinated [3]. New variants of COVID-19 continue to spread, and WHO advises getting at least the first dose to stay safe from the spread of new variants [4]. Vaccination is the only way to mitigate the spread of coronavirus. Several companies, such as Bharat Biotech International and Serum Institute of India, are manufacturing vaccines in India. The scheme presented in this paper provides a solution for distributing vaccines within a short span of time. The advantage is twofold: the scheme ensures that vaccines are used before their expiry date, and the spread of the virus is controlled because the predicted distribution is based on important statistics such as the number of positive cases, the number of senior people, the number of people vaccinated, and the total population. Machine learning is being used in every aspect of health care. It can help in getting insights from huge amounts of medical data, can increase the prediction accuracy for a disease, and can be used to diagnose a disease from medical images [5]. For example, machine-learning-based clinical decision support (CDS) tools are used to process large volumes of data and suggest the next steps for treatment [6]. In this research work, time series models are used to predict the number of positive cases and the vaccination rate in a district. Based on the prediction of the future number of cases, necessary steps can be taken in advance to avoid adverse effects, and the vaccination rate prediction can be used to cover the entire population of a district in a short span of time without any wastage or delay. The rest of the paper is organized as follows: the relevant literature, which forms the motivation for the work, is surveyed in Sect. 2. The proposed model is presented in Sect. 3. The analysis and results of the scheme for the Tamil Nadu districts are discussed in Sect. 4. The conclusion from the developed scheme is presented in Sect. 5.

Fig. 1 India COVID-19 positive cases count [2]

2 Literature Survey The research proposed in [7] aimed to forecast the COVID-19 case count. The researchers used regression models to anticipate the susceptibility and recovery rate of infected people. They implemented the models on data collected from 7 March 2020 to 21 September 2020 in Tamil Nadu. The scope of the project can be extended continually by building a more suitable and precise prediction model as future data becomes available. The research proposed in [8] implements ARIMA to forecast the COVID-19 positive count. The ARIMA model is applied to data gathered from January 2020 to February 2020. ARIMA (1,0,4) was chosen as the best model for determining the prevalence, while ARIMA (1,0,3) was chosen as the best model for determining the incidence of COVID-2019. The model can be improved further by collecting more data to obtain a more detailed forecast. The research proposed in [9] optimizes vaccine distribution across the USA using 'deterministic and stochastic recurrent neural networks' [9]. A deterministic Long Short-Term Memory (LSTM) network and a stochastic Mixture Density Network are used to predict the future time steps. The performance is compared with a baseline linear regression model. The model used the count of confirmed COVID-19 cases in all states of the US from 22 January 2020 to 26 November 2020. The review can be extended to explore the effects of versatility on the presentation of the grouping learning models. The research proposed in [10] implemented 'Multiple Linear Regression' to predict the positive cases, recoveries, and deaths due to coronavirus. The data used is from 22 March 2020 to 4 July 2020. The model is used to predict the cases in Odisha and India. The model predicted a maximum of 10,134 and a minimum of 8582 cases in Odisha in the month of August and, in India, a maximum of 55,868 and a minimum of 48,711. The model can be improved by including the contact tracing cases. The paper suggests that the positive cases can be reduced if the contact tracing cases are reduced. The research proposed in [11] implemented the ARIMA model to predict the cases in India, Russia, Brazil, Spain, and the US. The model was trained to forecast the succeeding 77 days. The prediction found that India and Brazil would touch the 1.38 million and 2.47 million marks, respectively, and that the US could hit 4.29 million by the end of June 2020.

3 Proposed Model The scheme uses data collected from an open government portal to forecast the positive cases and vaccination rates. Data pre-processing steps are carried out to convert the data into the format required by the models. The models described in Sect. 4 are implemented, and their results can be used to forecast the positivity rate and optimize the distribution of the vaccine (Fig. 2).

4 Analysis and Results The proposed model was analysed by applying the algorithms to the data set. For positive case prediction, the first five months of the data (April 2020 to August 2020) are taken as test data, and the rest of the data (September 2020 to November 2021) is taken as training data. For vaccination rate prediction, the first three months of data (January 2021 to March 2021) are taken as test data, and the rest of the data (April 2021 to November 2021) is taken as training data. The results of applying the different algorithms are presented as follows.


Fig. 2 Architecture diagram of the research

4.1 Data set The proposed model is designed to work in two phases: the first phase concentrates on the prediction of positive cases, and the second phase concentrates on the vaccination rate. For the first phase, the data is collected from COVID-19 India's open government API portal (https://data.covid19india.org/). The data set contains the district-wise number of positive, recovered, and deceased cases in Tamil Nadu from 26/04/20 to 2/09/21. For the second phase, district-wise vaccine data is collected from an open-source data website (https://www.kaggle.com/arnabbiswas1/covid-vaccination-india-district-wise-data?select=cowin_vaccine_data_districtwise.csv). This data set contains the number of males, females, and transgender people who have been vaccinated, with total vaccination counts from 16/01/21 to 24/08/21.

4.2 ARIMA ARIMA is a widely used time series forecasting model. Its main parameters are defined as follows: p is the lag order, d is the degree of differencing, and q is the order of the moving average [12]. The model can be evaluated by the Akaike Information Criterion (AIC), where a lower AIC value implies a better model, and by the Mean Absolute Percentage Error (MAPE), whose percentage error indicates how well the model predicts the series. The ARIMA model is the most predominant model in time series forecasting. The parameters of ARIMA can be used to pre-process the data, which makes the data suitable for forecasting, and the model can predict the values with high accuracy. The Augmented Dickey-Fuller (ADF) test is used to test for stationarity. The degree of differencing (d) is found to be 2 using the p-value from the ADF test. The p and q values are found from the PACF and ACF plots; the value is found to be 1, as lag 1 is significantly above the significance level. Using the p, d, and q parameter settings, the ARIMA model is constructed and used to forecast the positive cases for the next 300 days and the vaccination rate for the next 100 days. With ARIMA, the number of positive cases in Coimbatore is predicted to reach a range of 2.5 lakhs to 3 lakhs in the next 100 days. The model accuracy for the prediction is 98%. In Tiruppur, the number of positive cases is predicted to be 1.5 lakhs in the next 100 days, as shown in Fig. 3. The model accuracy for the prediction is 99.7% (Figs. 3 and 4).

Fig. 3 Tiruppur (district in Tamil Nadu) positive case prediction. The X-axis represents the number of days and the Y-axis represents the positive case count

Fig. 4 Tiruppur (district in Tamil Nadu) Covaxin rate. The X-axis represents the number of days and the Y-axis represents the vaccination count
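Two of the quantities named in the ARIMA discussion, the degree of differencing d and the MAPE, can be sketched in plain Python. This is only an illustration of the definitions, not the authors' pipeline; the function names and the toy numbers are assumptions, and a real workflow would rely on a time series library for the ADF test and model fitting.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (d = 2 in the text)."""
    values = list(series)
    for _ in range(d):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; actual values must be non-zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    cumulative_cases = [1, 4, 9, 16, 25]  # toy values, not real data
    print(difference(cumulative_cases, d=2))
    print(mape([100, 200], [110, 180]))
```

Each round of differencing shortens the series by one value, so a series differenced twice has two fewer points than the original.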

4.3 FbProphet FbProphet is a time series model. The Prophet library was released by Facebook as an API for carrying out time series forecasting. The advantage of this library is that it can handle stationarity within the data as well as seasonality-related components, so there is no need to pre-process the data manually. The output of the model gives the trend of the positive case count and the vaccination rates. The future prediction is given as a range, yhat_upper and yhat_lower, within which the predicted value can lie. The model can be evaluated by the Mean Absolute Percentage Error (MAPE). The model has been used to predict the Covaxin and Covishield vaccination rates. It predicted the vaccination rate of Covaxin and Covishield in Coimbatore to be 3.74 lakhs and nearly 5.76 lakhs doses, respectively, in the next 118 days, as shown in Figs. 5 and 6, respectively. In Tiruppur, the vaccination rate of Covaxin and Covishield is predicted to be 1.5 lakhs and nearly 10 lakhs doses, respectively, in the next 118 days.

Fig. 5 Coimbatore (district in Tamil Nadu) Covaxin rate

Fig. 6 Coimbatore (district in Tamil Nadu) Covishield rate

4.4 LSTM Long Short-Term Memory (LSTM) is a deep learning model that can be used for time series analysis. The model supports both univariate and multivariate time series modelling. LSTM was introduced to overcome the vanishing gradient problem faced by earlier RNN architectures. The LSTM architecture has a forget gate, an input gate, and an output gate. The forget gate decides whether to retain the information from the previous state. The input gate decides the values to be updated and added in the current state, and the output gate controls how much information from the previous and current states should be combined. The main advantage of LSTM over other networks is that the model can predict values using a fixed number of previous time steps and is not completely dependent on the previous time steps. The model is evaluated by Mean Absolute Error (MAE). In this research, two architectures are used to forecast the count of COVID-19 cases and the vaccination rate.

Fig. 7 LSTM first architecture loss

Fig. 8 LSTM second architecture loss


The bivariate LSTM model is used to forecast the count of positive cases and the vaccination count for the next 2 days. The model is trained to forecast a day's count and rate using the previous four days as time steps. Two architectures are used for the prediction. The first architecture has four input neurons and two output neurons, with two hidden LSTM layers of 100 neurons each. In the second architecture, the input and output neurons remain the same, and the number of hidden layers is increased to four, each with 100 neurons. When the MAE of the two architectures is compared, the first architecture forecasts the positive cases and the vaccination rate with a lower error than the second architecture. The comparison of model loss is shown in Figs. 7 and 8. Comparing the results of the implemented models, the ARIMA model gives a better result than the other models, with a better MAPE value. As ARIMA performance depends on the parameter settings, the performance is good because the parameter values have been chosen properly.
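The framing described above, in which the previous four days are used to forecast the next day's pair of values, can be sketched as a sliding-window transformation together with the MAE used for evaluation. This is a hedged illustration in plain Python; the function names and the (positive count, vaccination count) tuple layout are assumptions, and the paper does not show its own pre-processing or network code.

```python
def make_windows(series, n_steps=4):
    """Turn a daily bivariate series into (input window, next-day target) pairs.

    series: list of (positive_count, vaccination_count) tuples, one per day.
    """
    inputs, targets = [], []
    for i in range(len(series) - n_steps):
        inputs.append(series[i:i + n_steps])   # the previous n_steps days
        targets.append(series[i + n_steps])    # the day to be forecast
    return inputs, targets


def mae(actual, predicted):
    """Mean Absolute Error between two equally long sequences of numbers."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    toy_series = [(day, 10 * day) for day in range(6)]  # toy values, not real data
    X, y = make_windows(toy_series)
    print(len(X), X[0], y[0])
    print(mae([1, 2, 3], [2, 2, 5]))
```

A series of T days yields T minus 4 training pairs under this framing, which is one reason short district-level histories limit how deep a network can usefully be.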

5 Conclusion The work presented here was tested with the data set described in Sect. 4.1, and it was inferred that in Tiruppur, the number of people who took dose 1 is 11 lakhs and dose 2 is 18 lakhs. Out of the total population in the district, only 1,800,000 people have been completely vaccinated, and approximately 10 lakh people are yet to get vaccinated. Based on the models used to predict the vaccination rate, Covishield has been preferred by many people in this region. Based on the prediction, nearly 200,000 people can be affected by coronavirus in the near future, and new variants are emerging, so it is essential to cover the population with at least one dose. In Coimbatore, the number of people who took dose 1 is 28 lakhs and dose 2 is 18 lakhs. Out of the total population in the district, only 1,800,000 people have been completely vaccinated, and approximately 20 lakh people are yet to get vaccinated. Based on the models used to predict the vaccination rate, Covishield has been preferred by many people in this region as well. Based on the prediction, nearly 200,000–300,000 people can be affected by coronavirus in the near future, and new variants are emerging, so it is essential to cover the population with at least one dose. The proposed scheme can be used to control the spread of COVID-19 in a given district of any state. The model can be incorporated with AI to function as a chatbot. The chatbot can be used to get hospital information regarding COVID-19, as well as COVID-19 information about a district based on the predictions made and a risk assessment of the location.


References
1. Mehta, Y., Chaudhry, D., Abraham, O.C., Chacko, J., Divatia, J., Jagiasi, B., Samavedam, S., Kar, A., Khilnani, G.C., Krishna, B., Kumar, P., Mani, R.K.: Critical Care for COVID-19 affected patients: position statement of the Indian Society of Critical Care Medicine. Indian J. Crit. Care Med. 24(4), 222–241 (2020)
2. World Health Organization. WHO Coronavirus Disease (COVID-19) Dashboard With Vaccination Data. https://covid19.who.int/region/searo/country/in/
3. Ministry of Health and Family Welfare. MoHFW | Home. https://www.mohfw.gov.in/ (2022)
4. Centers of Disease Control and Prevention. Why to Get a COVID-19 Vaccine (2022). https://www.cdc.gov/coronavirus/2019-ncov/vaccines/vaccine-benefits.html
5. Gupta, L., Misra, D.P., Agarwal, V., Balan, S., Agarwal, V.: Management of rheumatic diseases in the time of covid-19 pandemic: perspectives of rheumatology practitioners from India. Ann. Rheum. Dis. 80(1), 2020–217509 (2021)
6. Babu, K., Rajan, S., Paul, J., Kumar, L.: Anesthetic management of a COVID 19 suspected patient for mastectomy. Saudi J. Anaesth. 14(3), 411–412 (2020)
7. Ananthi, P., Begum, S.J., Jothi, V.L., Kayalvili, S., Gokulraj, S.: Survey on forecasting the vulnerability of Covid 19 in Tamil Nadu. J. Phys.: Conf. Ser. 1767(1), 012006 (2021)
8. Benvenuto, D., Giovanetti, M., Vassallo, L., Angeletti, S., Ciccozzi, M.: Application of the ARIMA model on the COVID-2019 epidemic dataset. Data Brief 29, 105340 (2020)
9. Davahli, M.R., Karwowski, W., Fiok, K.: Optimizing COVID-19 vaccine distribution across the United States using deterministic and stochastic recurrent neural networks. PLoS ONE 16(7), e0253925 (2021)
10. Rath, S., Tripathy, A., Tripathy, A.R.: Prediction of new active cases of coronavirus disease (COVID-19) pandemic using multiple linear regression model. Diabetes Metab. Syndr. 14(5), 1467–1474 (2020)
11. Sahai, A.K., Rath, N., Sood, V., Singh, M.P.: ARIMA modelling & forecasting of COVID-19 in top five affected countries. Diabetes Metab. Syndr. 14(5), 1419–1427 (2020)
12. Pathak, P.: How to create an ARIMA model for time series forecasting in Python? Analytics Vidhya (2020, October 29). https://www.analyticsvidhya.com/blog/2020/10/how-to-create-an-arima-model-for-time-series-forecasting-in-python/

Detection of Phishing Websites Using Machine Learning Rahul Kumar, Ravi Kumar, Raja Kumar Sahu, Rajkumar Patra, and Anupam Ghosh

1 Introduction The Internet has become an essential part of our lives for gathering and posting information, mainly via social media. The Internet is a network of computer systems containing valuable and sensitive data, so many security and safety mechanisms are in place to protect that information; however, there is a vulnerable link: humans [1, 2]. Social engineering is one of the most common types of cyber-attack [3, 4]. It is used to steal users' private and important data (see Fig. 1). Computer security threats have increased rapidly in recent years (Fig. 2), owing to the rapid advancement of technology, while the exploitation of human vulnerability is increasing at the same time [5–8]. Users need to understand how phishers operate and should be aware of techniques that help protect them from becoming victims of a cyber-attack [9, 10]. Although every type of phishing attack has a few unique characteristics, the majority of these attacks share patterns and similarities [11–13]. Because machine learning is a capable tool for finding similarities and patterns in data, it can be used to find similarities among different phishing attacks and to recognize phishing websites [14–17]. In our work, we have used various machine learning methods to detect whether a website is phishing or not. The methods we studied and used are logistic regression, k-nearest neighbor, support vector machine, decision tree, random forest, and neural networks. We have also used ensemble learning techniques to increase the overall detection accuracy.

R. Kumar · R. Kumar · R. K. Sahu · R. Patra · A. Ghosh (B) Department of CSE, Netaji Subhash Engineering College, Kolkata, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_29


Fig. 1 Phishing growth by 2005–2015

Fig. 2 Financial losses across the globe due to phishing attacks

2 Related Work In this field, some researchers have examined the patterns in suspicious URLs and have worked on them to solve this problem. We have reviewed the previous work on detecting phishing websites using URL features, and this work has motivated our own approach. Rishikesh and Irfan [5] described an implementation and its results for detecting phishing websites. They stated that scikit-learn is the best way to import machine learning algorithms for detecting phishing websites. They split the dataset into 90:10, 70:30, and 50:50 ratios of training and testing sets, respectively. Each model they created was trained on the training set, and its performance was then evaluated. Jain and Gupta [9] stated that in the previous decade, various fraudulent websites were created to steal sensitive and personal information from victims, which resulted in the loss of the users' financial assets.

3 Methodology An extensive review was done of related topics and existing documented materials, such as journals, ebooks, and websites. The information gathered was examined and reviewed to retrieve the essential data needed to better understand the system and how to improve it.

3.1 Data Collection The dataset used for the classification was drawn from multiple sources. We collected the phishing dataset from an open-source platform called PhishTank. This dataset consists of around 8000 random phishing URLs, of which we used 5000 random URLs to train the models. We collected the legitimate dataset from the website of the University of New Brunswick. That dataset consists of spam, benign, phishing, and malware URLs. Of these, we considered the benign URL dataset for this project, which consists of 35,000 random legitimate URLs, of which we used 5000 random URLs to train the models. Combined, we have around 10,000 URLs for training the ML models.

3.2 Feature Extraction We extracted the following features from each URL using a Python program.

(1) URL having IP address: Phishers can use an IP address in place of the domain name in the URL, such as "http://115.96.2.132/fake.com". Rule: If an IP address is used instead of a domain name, then the URL is phishing, otherwise legitimate.
(2) URL having "@" symbol: Phishers can use the "@" symbol in the URL because the browser ignores everything preceding the "@" symbol. This feature checks for the presence of the "@" symbol in the URL. Rule: If the "@" symbol is present in the URL, then it is phishing, otherwise legitimate.
(3) Length of URL: Phishers can use a long URL to hide the suspicious part of the URL. Based on our review of the documentation and journals related to this topic, we found that a legitimate URL has a length of less than 54 characters. Rule: If the length of the URL is less than 54, then it is legitimate, otherwise phishing.
(4) Depth of URL: This feature computes the depth of the URL, i.e., how many sub-pages there are in the URL based on "/". This feature is numerical.
(5) Redirecting "//" in URL: Phishers can use "//" in the URL. If this is done, the user will be redirected to an unknown website without being aware of it. After examining, we found that the position of "//" is sixth if the URL starts with "http" and seventh if the URL starts with "https". Rule: If the last "//" in the URL appears at a position greater than 7, then it is phishing, otherwise legitimate.
(6) "https" token present in the domain name: Phishers can add the "https" token to the domain part of the URL to deceive users. For example, "http://www.bigbazaar.winner.prize.com/". Rule: If the "https" token is present in the domain part, then the URL is phishing, otherwise legitimate.
(7) Use of URL shortening services ("Tiny-URL"): Phishers can use a URL shortening service to make the URL shorter. Rule: If a URL shortening service is used, then it is phishing, otherwise legitimate.
(8) Existence of "-" in domain part: Legitimate URLs very rarely contain the dash symbol. Phishers can add prefixes or suffixes to the domain name in the URL. Rule: If the URL contains the "-" symbol in the domain part, then it is phishing, otherwise legitimate.
(9) Website traffic: Legitimate sites are visited by a large number of users. Trusted sites are more popular, so they are easily identified by the Alexa database (Alexa the Web Information Company). Rule: If the rank of the website is under 100,000, then it is legitimate, otherwise phishing.
(10) Age of domain: The life of a phishing website is very short. After examining, we found that the age of phishing websites is less than 6 months. Rule: If the age of the domain is less than 6 months, then it is phishing, otherwise legitimate.
(11) End period of domain: We extracted this feature from the WHOIS database. This feature finds the remaining time of the URL, calculated as the difference between the expiration time of the URL and the current time. After examining, we found that phishing websites last for less than 6 months. Rule: If the end period of the domain is less than 6 months, then it is phishing, otherwise legitimate.
(12) Iframe redirection: Using the "Iframe" tag, other unknown webpages can be displayed within the currently opened webpage. Rule: If the iframe tag is used in the source code, then the URL is phishing, otherwise legitimate.
(13) Customizing status bar: Phishers may trick users by showing them a fake URL in the status bar, which is done with JavaScript in the source code. The "onMouseOver" event is used to do this. Rule: If the onMouseOver event is used in the source code, then the URL is phishing, otherwise legitimate.
(14) Disabling right click: To prevent users from viewing and saving the source code of the webpage, phishers can disable the right-click function using JavaScript. We therefore search the source code of the webpage for "event.button == 2", which is used to disable right click. Rule: If the right-click function is disabled, then the URL is phishing, otherwise legitimate.
(15) Website forwarding: Phishing websites are redirected to other webpages more often than legitimate websites. After examining, we found that legitimate websites are redirected no more than 2 times. Rule: If the response of the URL is empty and the website is redirected more than 2 times, then it is phishing, otherwise legitimate.
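Several of the address-bar rules listed above can be sketched with the Python standard library alone. The snippet below is a minimal illustration, not the authors' extraction program: the function names and the 1 = phishing, 0 = legitimate encoding are assumptions, and features that need WHOIS, traffic rank, or page source are left out.

```python
import ipaddress
from urllib.parse import urlparse


def has_ip_address(url):
    """Feature 1: 1 if the host part is an IP address instead of a domain name."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return 1
    except ValueError:
        return 0


def has_at_symbol(url):
    """Feature 2: 1 if the URL contains the '@' symbol."""
    return 1 if "@" in url else 0


def is_long_url(url):
    """Feature 3: 0 (legitimate) if shorter than 54 characters, else 1."""
    return 0 if len(url) < 54 else 1


def url_depth(url):
    """Feature 4: number of non-empty path segments separated by '/'."""
    return sum(1 for part in urlparse(url).path.split("/") if part)


def has_redirection(url):
    """Feature 5: 1 if the last '//' sits beyond position 7 (1-based)."""
    return 1 if url.rfind("//") + 1 > 7 else 0


def has_hyphen_in_domain(url):
    """Feature 8: 1 if the domain part contains '-'."""
    return 1 if "-" in urlparse(url).netloc else 0


def extract_features(url):
    """Collect the sketched features into one dictionary."""
    return {
        "ip_address": has_ip_address(url),
        "at_symbol": has_at_symbol(url),
        "long_url": is_long_url(url),
        "depth": url_depth(url),
        "redirection": has_redirection(url),
        "hyphen_in_domain": has_hyphen_in_domain(url),
    }


if __name__ == "__main__":
    print(extract_features("http://115.96.2.132/fake.com"))
```

Keeping each rule in its own small function makes it straightforward to test the thresholds (54 characters, position 7) separately before the features are fed to a classifier.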

3.3 Data Preprocessing Data preprocessing is the process of transforming raw data into a useful, understandable, and efficient format. The steps involved in data preprocessing include data cleaning, normalization, transformation, feature extraction, and data reduction. After performing data preprocessing, we obtain the final training dataset. Data preprocessing has a great impact on how the model ultimately performs.

3.4 Exploratory Data Analysis Exploratory data analysis (EDA) is a technique for examining and inspecting datasets to learn their characteristics, discover patterns, and spot anomalies. It is often done using diagrammatic representations [17–20]. After performing EDA, we can discover patterns in the data, unveil the hidden structure, spot anomalies, and detect outliers in the dataset. Figure 3 shows the heatmap of the correlation of the dataset features, which we produced in our implementation.

Fig. 3 Error rate using different k values

3.5 Train–Test Split After the previous steps, we split the dataset into a training set and a testing set. The training dataset is used to train the model, and the testing dataset is used to evaluate the performance of the model. We split our dataset in an 80:20 ratio, i.e., 80% training data and 20% testing data.
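An 80:20 split of this kind can be sketched in a few lines of plain Python. The shuffling step, the fixed seed, and the function name are assumptions for illustration; the text does not say whether the authors shuffled or stratified the data, and a library routine would normally be used in practice.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and return (training rows, testing rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumption)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    urls = [f"url_{i}" for i in range(10)]  # placeholder records, not real data
    train, test = train_test_split(urls)
    print(len(train), len(test))
```

With roughly 10,000 URLs, this ratio would leave about 8000 records for training and about 2000 for testing.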

4 Algorithms and Evaluation We first implemented basic machine learning algorithms as baseline measures of performance. Then we applied more complex models to increase the accuracy of the model. The machine learning algorithms which we implemented include logistic regression, SVM, KNN, decision tree, random forest, and neural networks. We have also tried to use ensemble learning techniques to increase the overall accuracy. They include bagging classifier and voting classifier.


4.1 Logistic Regression Logistic regression is a supervised machine learning algorithm that is based on the concept of probability and is used to solve classification problems. It predicts the probability of a binary event occurring. Instead of using a linear function directly as its hypothesis, logistic regression passes the linear combination of the inputs through the “sigmoid function”, also known as the “logistic function”, which limits the hypothesis to the range between 0 and 1:

$$0 \le h_\theta(x) \le 1 \tag{1}$$

We use the “sigmoid function” to map predicted values to probabilities. Below is the hypothesis of logistic regression:

$$h_\theta(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \tag{2}$$

The cost function of logistic regression is described below:

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases} \tag{3}$$

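We can compress the above function into a single function as below:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] \tag{4}$$

where J(θ) is the cost function and m is the number of training examples.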
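As a rough illustration of Eqs. (2) and (4), the sketch below evaluates the sigmoid hypothesis and the averaged cost for a single-feature model. The function names and the toy inputs are assumptions for this example; the paper itself relies on scikit-learn rather than hand-written code.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(x, beta0, beta1):
    """Eq. (2): predicted probability that y = 1 for input x."""
    return sigmoid(beta0 + beta1 * x)


def cost(xs, ys, beta0, beta1):
    """Eq. (4): average cross-entropy over the m training examples."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = hypothesis(x, beta0, beta1)
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / m


# With all coefficients at zero every prediction is 0.5,
# so the cost equals log(2) regardless of the labels.
print(cost([1.0, 2.0], [0, 1], 0.0, 0.0))
```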

4.2 K-Nearest Neighbor K-nearest neighbors (KNN) is a supervised machine learning algorithm used to solve both classification and regression problems. It classifies based on feature similarity: the value of a new data point is assigned according to how closely it matches the points in the training set. The similarity between the new point and the points in the training set can be measured in many different ways. Once the neighbors of the new data point are found, the most common outcome among them is returned as the prediction (for regression, the prediction can instead be made by taking the average). We calculated the error rate for different values of k. As Fig. 4 shows, the error rate was lowest at k = 4; that is, setting k = 4 gave the best accuracy.


Fig. 4 Working of random forest algorithm

Let us denote our data point as x, and suppose the four closest neighbors to x are x(1), x(2), x(3), and x(4). KNN uses a distance metric to calculate the distance between points. The most popular distance metric is the Euclidean distance (see Eq. 5), which is the default in the scikit-learn KNN classifier in Python.

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (y_i - x_i)^2} \tag{5}$$
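The procedure above (Euclidean distance, then a majority vote among the k nearest neighbors) can be sketched as follows. This is an illustrative stand-in for the scikit-learn classifier used in the paper; the function names and the toy points are assumptions, and k = 4 simply mirrors the value reported in the text.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Eq. (5): Euclidean distance between two equal-length points."""
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train_points, train_labels, query, k=4):
    """Return the most common label among the k training points nearest to query."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    # On a tie, Counter keeps insertion order, so the label seen first
    # (the one belonging to the nearer neighbor) wins.
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors: label 0 = legitimate, 1 = phishing.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = [0, 0, 0, 1, 1, 1]
print(knn_predict(points, labels, (0.5, 0.5)))
```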

4.3 Decision Tree Classifier A decision tree is a supervised machine learning algorithm used for both classification and regression problems. It is structured as a tree. For the split at each non-terminal node, one of the attributes is chosen: the attribute that gives the highest information gain. C4.5 is one of the most popular algorithms for decision trees. It is often called a statistical classifier, and it can handle noisy data. In this algorithm, entropy is used to calculate the information gain, which is defined as the difference between the entropy before the split and the entropy after the split. The equations used to calculate the information gain are given below.

$$E(T) = -\sum_{j} P_j \log_2 P_j \tag{6}$$

$$E_S(T) = \sum_{i} P_i \, E(T_i) \tag{7}$$

$$\mathrm{Gain}(S) = E(T) - E_S(T) \tag{8}$$

Here E(T) represents the entropy before the split, E_S(T) represents the entropy after the split (the weighted sum of the entropies of the subsets T_i produced by the split, where P_i is the fraction of samples that fall into T_i), and P_j represents the probability of class j.
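Entropy and information gain can be checked on small label lists. The sketch below assumes that the entropy after a split is the size-weighted sum of the subset entropies; the function names and the example labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, subsets):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(parent)
    after = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - after


parent = [0, 0, 1, 1]
print(information_gain(parent, [[0, 0], [1, 1]]))  # split separates the classes
print(information_gain(parent, [[0, 1], [0, 1]]))  # split leaves them mixed
```

A split that separates the two classes completely recovers the full parent entropy as gain, while a split that leaves each subset as mixed as the parent gains nothing; this is why the attribute with the highest gain is preferred.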

4.4 Random Forest Classifier Random forest is a supervised machine learning algorithm that can be used to solve both classification and regression problems. It is based on the concept of ensemble learning, which combines multiple models to solve a particular problem. As its name suggests, a random forest contains a number of decision trees that work as a group; their predictions are combined (by majority vote for classification, or by averaging for regression) to improve accuracy. A random forest is less prone to overfitting than a single decision tree: individual trees may be weak learners that overfit or underfit in different ways, and combining many of them balances these errors out. Adding more trees generally stabilises accuracy and does not by itself cause overfitting. For this to work, the correlation among the predictions of the individual trees must be very low. Training is also comparatively fast, since the trees can be built independently of one another.

4.5 Support Vector Machine Support vector machine (SVM) is based on supervised learning and is used to solve both classification and regression problems. The main objective of the SVM algorithm is to find the decision boundary that best partitions the n-dimensional space into classes, so that a new data point placed in that space in the future falls into the correct class. SVM chooses the extreme points that help define this boundary; these extreme vectors are called support vectors. The original training data is transformed into a higher dimension, and the algorithm finds hyperplanes that partition the data samples in the higher-dimensional feature space. The decision boundary hyperplane is defined as follows:

$$w \cdot x + b = 0 \tag{9}$$

Here, w represents a weight vector and b represents a constant. To maximize the distance between the hyperplanes separating the different classes, the SVM algorithm finds the appropriate weight vector (depicted in Fig. 5).
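To make Eq. (9) concrete, the sketch below classifies a point by the sign of w · x + b and computes the margin width. It assumes the usual canonical scaling in which the two margin hyperplanes are w · x + b = ±1, so the margin is 2/‖w‖; the weights are hypothetical values, not ones learned from the paper's data.

```python
import math


def decision_value(w, x, b):
    """Left-hand side of Eq. (9): w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, x, b):
    """Class +1 on or above the hyperplane, -1 below it."""
    return 1 if decision_value(w, x, b) >= 0 else -1


def margin_width(w):
    """Distance between the hyperplanes w . x + b = +1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


w, b = (3.0, 4.0), -12.0   # hypothetical, already-trained parameters
print(predict(w, (4.0, 4.0), b), predict(w, (0.0, 0.0), b), margin_width(w))
```

Because the margin is inversely proportional to ‖w‖, maximizing the separation between classes amounts to finding the smallest weight vector that still classifies the training points correctly.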


Fig. 5 Working of SVM algorithm

4.6 Ensemble Learning Ensemble learning is a technique in which multiple models are combined to solve a particular problem. It is a powerful tool for improving the accuracy of a model [11]. Random forest is also based on the ensemble learning method. In this project, we used bagging and voting classifiers.

Bagging: In bagging, the results of multiple models are combined to get a better result. If all models were trained on the same set of data, they would receive the same input and tend to give the same result; bagging therefore trains each model on a different bootstrap sample, drawn with replacement from the training data. In this project, we used the “BaggingClassifier” module in sklearn.

Voting: This ensemble technique is generally used to solve classification problems. In voting, or more specifically max voting, multiple models are combined to predict the outcome. The prediction of each model is considered a “vote”, and the prediction that receives the most votes from the different models is taken as the final prediction. In this project, we used the “VotingClassifier” module in sklearn.
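The two building blocks of these ensembles, bootstrap sampling for bagging and max voting for combining predictions, can be sketched without any library. The helper names and the toy inputs below are assumptions for illustration; the paper itself uses scikit-learn's `BaggingClassifier` and `VotingClassifier`.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement (one bagging sample)."""
    return [rng.choice(rows) for _ in rows]


def max_vote(predictions):
    """Return the prediction that received the most votes."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)          # seeded so the example is repeatable
rows = list(range(10))          # stand-in for ten training examples
sample = bootstrap_sample(rows, rng)

# Three hypothetical base models vote on one URL.
final = max_vote(["phishing", "legitimate", "phishing"])
print(len(sample), final)
```

Each base model in a bagging ensemble would be trained on its own `bootstrap_sample`, so the models see slightly different data and their errors are less correlated when their votes are combined.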

4.7 XGBoost Classifier XGBoost is a customized and refined implementation of gradient boosted decision trees, designed for better speed and overall performance. In boosting, which is an ensemble technique, new models are added to correct the errors made by existing models. XGBoost is currently among the most popular machine learning algorithms for structured data. It is reported to be almost 10 times faster than many other algorithms. Several algorithmic optimizations give XGBoost good scalability across many scenarios, and it supports distributed and parallel computing for faster learning.

Fig. 6 Neural network

4.8 Neural Networks A neural network mimics the structure of the human brain: the functionality of biological neural networks is replicated in a mathematical model. Detecting phishing websites is a classification problem, so we chose a multilayer feedforward neural network, in which the connections between neurons do not form a directed cycle. Since our prediction is binary, the output layer has a single computational unit. We experimented with different numbers of units in the hidden layer (Fig. 6). The hidden layer uses the rectified linear unit (ReLU) activation function, and the output layer uses the sigmoid function. We trained the network for 100 epochs.
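The forward pass of such a network, with one ReLU hidden layer and a single sigmoid output unit, can be sketched as below. The weights are hypothetical hand-picked values, and training (the 100 epochs mentioned above) is deliberately left out of this example.

```python
import math


def relu(z):
    """Rectified linear unit used in the hidden layer."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid used in the single output unit."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Return the predicted probability for one input vector x."""
    hidden = [
        relu(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Two inputs, two hidden units, one output unit (hypothetical weights).
prob = forward(
    x=[1.0, 2.0],
    hidden_weights=[[1.0, 0.0], [0.0, -1.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[2.0, 5.0],
    output_bias=-2.0,
)
print(prob)
```

The second hidden unit receives a negative pre-activation and is switched off by ReLU, so only the first unit contributes to the output in this example.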

5 Performance Evaluation Our project, the detection of phishing URLs, is a binary classification problem. When we apply a classifier, each of the URLs we collected falls into one of four categories: true positive, TP (phishing URL is correctly identified); true negative, TN (legitimate URL is correctly identified); false positive, FP (legitimate URL is wrongly identified as phishing); and false negative, FN (phishing URL is wrongly identified as legitimate). We also calculated the false positive rate (FPR), false negative rate (FNR), precision, and recall using the following equations:

1. $\mathrm{FPR} = \dfrac{|\mathrm{FP}|}{\#\ \text{legitimate URLs}}$
2. $\mathrm{FNR} = \dfrac{|\mathrm{FN}|}{\#\ \text{phishing URLs}}$
3. $\mathrm{Precision} = \dfrac{|\mathrm{TP}|}{|\mathrm{TP}| + |\mathrm{FP}|}$
4. $\mathrm{Recall} = \dfrac{|\mathrm{TP}|}{|\mathrm{TP}| + |\mathrm{FN}|}$

Table 1 Performance of different models

S. No  Model                   Accuracy %  FPR    FNR    Precision  Recall
1      Ensemble bagging        90.844      0.052  0.123  0.956      0.877
2      Random forest           90.689      0.055  0.123  0.953      0.877
3      XGBoost                 90.689      0.055  0.123  0.953      0.877
4      Ensemble voting         90.638      0.043  0.132  0.964      0.868
5      Multilayer perceptrons  90.226      0.019  0.151  0.985      0.849
6      KNN                     89.558      0.092  0.149  0.966      0.851
7      ANN                     88.940      0.022  0.168  0.983      0.832
8      Decision tree           87.963      0.007  0.187  0.995      0.840
9      SVM                     86.883      0.018  0.197  0.987      0.803
10     Logistic regression     86.626      0.017  0.201  0.988      0.799
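The four rates defined in this section follow directly from the confusion-matrix counts. In the sketch below, the number of legitimate URLs is taken as FP + TN and the number of phishing URLs as FN + TP; the function name and the counts are hypothetical and are not taken from Table 1.

```python
def evaluation_rates(tp, tn, fp, fn):
    """Compute FPR, FNR, precision, recall and accuracy from raw counts."""
    return {
        "FPR": fp / (fp + tn),          # share of legitimate URLs flagged as phishing
        "FNR": fn / (fn + tp),          # share of phishing URLs missed
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical test set: 100 phishing URLs and 100 legitimate URLs.
rates = evaluation_rates(tp=80, tn=90, fp=10, fn=20)
print(rates)
```

Recall and FNR always sum to 1 because both are measured over the phishing URLs, which makes a quick consistency check for any row of results.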

6 Results and Discussion We implemented our project using Python libraries and imported the machine learning algorithms from the scikit-learn library. Before training the models, we split our dataset into an 80% training set and a 20% testing set. Each ML algorithm was trained on the training set, and its performance was evaluated on the testing set. We evaluated the models by preparing a confusion matrix and a classification report and by calculating the false positive rate and false negative rate. Table 1 lists the performance of the different models. After applying the various algorithms, we found that ensemble learning works well with this dataset (Fig. 7). Among all algorithms, the bagging classifier gave the best detection accuracy, 90.84%.

7 Conclusions In this research, we collected around 10,000 phishing and legitimate URLs. We then extracted 15 important features from the URLs, applied various machine learning algorithms, and evaluated their performance. We used ensemble learning techniques to improve the accuracy. After evaluating the classifiers, we found that ensemble classifiers such as the bagging and voting classifiers, random forest, and XGBoost performed very well. We achieved 90.84% detection accuracy using the bagging classifier, with low false positive and false negative rates. These results have motivated us to take this study further. Adding more features to our dataset should help improve the performance of the model, and we will also try other ensemble learning techniques. The project can be extended by creating a browser extension that can be installed on any web browser to detect phishing URLs.

Fig. 7 Plot of accuracy of different models (horizontal axis: Accuracy %)

References
1. Phishing Activity Trends Report, APWG 2019 (n.d.), 4th quarter. https://docs.apwg.org/reports/apwg_trends_report_q4_2019.pdf. Accessed 15 Dec 2021
2. Zalavadia, F., Nevrekar, A., Pachpande, P., Pandey, S., Govilkar, S.: Detecting phishing attacks using natural language processing and deep learning models. J. Appl. Sci. Comput. (JASC) 378–382 (2019)
3. Deshpande, A., Chaudhary, N., Pendamkar, O., Borde, S.: Detection of phishing websites using machine learning. Int. J. Eng. Res. Technol. (IJERT) 10, 430–433 (2021)
4. Shaikh, A., Shabut, A., Hossain, A.: A literature review on phishing crime, prevention review and investigation of gaps. In: 2016 10th International Conference on Software, Knowledge, Information Management & Applications (SKIMA), pp. 9–14 (2016)
5. Mahajan, R., Siddavatam, I.: Phishing website detection using machine learning algorithms. Int. J. Comput. Appl. 23, 45–47 (2018)
6. Abdelhamid, N., Thabtah, F., Abdel-Jaber, H.: Phishing detection: a recent intelligent machine learning comparison based on models content and features. In: IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 72–77 (2017). https://doi.org/10.1109/ISI.2017.8004877
7. Basnet, R., Mukkumala, S., Sung, A.H.: Detection of phishing attacks: a machine learning approach. Stud. Fuzziness Soft Comput. 226, 373–383 (2008)
8. Kiruthiga, R., Akila, D.: Phishing websites detection using machine learning. Int. J. Recent Technol. Eng. (IJRTE) 8, 111–113 (2019)
9. Jain, A.K., Gupta, B.B.: Phishing detection: analysis of visual similarity based approaches 2017 (2016). https://doi.org/10.1155/2017/5421046
10. Abdul, O., Alaran, M., Kareem, S.O., Adebayo, A.: A lightweight anti-phishing technique for mobile phone. Acta Informatica Pragensia 6(2), 114–123 (2017)
11. Singh, A.: A comprehensive guide to ensemble learning (2018). https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/. Accessed 15 Dec 2021
12. Tewari, A., Jain, A.K., Gupta, B.B.: Recent survey of various defence mechanisms against phishing attacks. J. Inf. Privacy Secur. 12(1), 3–13 (2016). https://doi.org/10.1080/15536548.2016.1139423
13. Khonji, M., Iraqi, Y.: Phishing detection: a literature survey. IEEE Communications Surveys & Tutorials 15(4), 2091–2121 (2013)
14. Swaroop, K.P., Chowdary, K.R., Kavishree, S.: Phishing websites detection using machine learning techniques. Int. Res. J. Eng. Technol. (IRJET) 1471–1473 (2021)
15. Imperva Blog: Phishing attacks (2021). https://www.imperva.com/learn/application-security/phishing-attack-scam/. Accessed 15 Dec 2021
16. Jagatic, T.N., Johnson, N.A., Jakobsson, M., Menczer, F.: Social phishing. Communications of the ACM 50(10), 94–100 (2007)
17. Dhamija, R., Tygar, J.D.: The battle against phishing: dynamic security. In: SOUPS 2005: Proceedings of the 2005 ACM Symposium on Usable Security and Privacy, ACM International Conference Proceedings Series, ACM Press, pp. 77–88 (2005)
18. Almomani, A., Atawneh, S., Meulenberg, A., Gupta, B.B.: A survey of phishing email filtering techniques. IEEE Communications Surveys and Tutorials 15(4), 2070–2090 (2013)
19. Jain, A., Kulal, C., Kini, M., Deekshitha, S.: A review of detection of phishing websites using machine learning. Int. J. Eng. Res. Technol. (IJERT) 7, 2–3 (2019)
20. Arachchilage, N.A., Psannis, K.E.: Defending against phishing attacks: taxonomy of methods, current issues and future directions, pp. 4–14 (2018). https://doi.org/10.48550/arXiv.1705.09819

Computational Learning Model for Prediction of Parkinson’s Disease Using Machine Learning

Ch. Swathi and Ramesh Cheripelli

Ch. Swathi (B), Department of CSE, G. Narayanamma Institute of Technology and Science, Hyderabad, Telangana, India
R. Cheripelli, Department of IT, G. Narayanamma Institute of Technology and Science, Hyderabad, Telangana, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_30

1 Introduction Parkinson’s disease is a degenerative central nervous system ailment that impairs mobility and causes tremors and rigidity. It is a five-stage disease that affects about 1 million people in India each year. It is a chronic condition for which there is currently no cure. Dopamine-producing neurons in the brain are affected by this neurodegenerative disease (Fig. 1). Its main symptoms include the following:
– Tremor: Shaking or tremors usually begin in the hands or fingers and then progress to the rest of the body. A pill-rolling tremor occurs when your thumb and fingers move back and forth against each other. Your hand may also shake while it is at rest.
– Slowed movement (bradykinesia): As Parkinson’s disease progresses, everyday tasks may become increasingly difficult and time consuming. When you walk, you may notice that your strides are becoming shorter. Getting out of a chair may be difficult. As you try to move, you may notice that your feet are dragging behind you.
– Rigid muscles: Any part of the body may be affected by stiff muscles, which can restrict your range of motion. Posture and balance are also affected: your posture may change and you may lose your balance.
– Loss of automatic movements: The ability to perform unconscious movements such as blinking, smiling, or swinging your arms while walking may be reduced.
– Speech changes: You may speak softly or quickly, slur your words, or pause before speaking. Your speech may be more monotone than usual, without inflections.
– Writing changes: Writing may become sloppy and difficult to read.

Fig. 1 Different stages of Parkinson’s disease

Among nervous system disorders, PD is anticipated to affect 770,000 persons in the USA by 2040, a 56% increase [1]. Modifiable risk factors for Parkinson’s disease may be found by studying patterns of incidence within and across communities, which can be done through epidemiological research. Several studies have shown that the risk of PD is rising with time [2–4], while others have shown that it is decreasing [5]. These disparities may be related to inconsistencies in the underlying data or to differences in the approach used to reach the diagnosis. Diagnoses, medications, and treatments are stored in databases, which are increasingly used as a source of information in clinical research. Access to these databases can save both time and money, although they are often built for billing purposes rather than for research. The validity of diagnosis codes for accurately describing a medical condition should therefore be evaluated [6–8]. Manual extraction of data from medical records using validated criteria is the most accurate way to define disease categories retrospectively, but it is also the most time consuming. Our purpose was to determine the validity of identifying incident PD patients and calculating PD incidence by using diagnostic codes. We expected that relying only on diagnostic codes would result in an overestimation of incident PD.

2 Related Work Throughout history, illnesses have emerged and vanished, and their prevalence has grown or diminished. These epidemiological patterns have been most pronounced in infectious illnesses (e.g. smallpox eradication), but they also occur in chronic conditions (e.g. the reduction of vitamin deficiencies and the decreasing incidence of cardiovascular disease) (Fig. 2). Changing patterns in the epidemiology of neurological illnesses have been observed during the last three or four decades. In several high-income countries, the risk of dementia and stroke has dropped, whereas three conditions are increasing in prevalence: amyotrophic lateral sclerosis (ALS), Parkinson’s disease (PD), and late onset multiple sclerosis (LOMS) [9, 10]. Other parts of the world, whose physical and social environments differ from those of North America and Western Europe, have witnessed contrasting trends. Understanding these epidemiological patterns is critical for medical and public health decision-making [11, 12].

Fig. 2 Parkinson’s disease stage and motor and non-motor symptoms at each stage

E. Ray Dorsey, Alexis Elbaz, and other members of the GBD 2016 Parkinson’s Disease Collaborators [3, 13] have published a detailed study of indirect estimates of changes in worldwide Parkinson’s disease prevalence, disability, and mortality rates between 1990 and 2016. The authors used the GBD Study’s data and methods to carry out a series of analyses, forecasts, and extrapolations [14–16]. Despite the scarcity of high-quality data sources spanning large geographic areas and the sophisticated assumptions behind GBD, the findings have significant public health implications and should be widely publicised. The results of this study provide early estimates of PD prevalence and incidence that may be used to inform future health planning and forecasting. According to the authors, the number of PD cases worldwide rose from 2.5 million in 1990 to 6.1 million in 2016 (95% confidence range 5.0–7.3). They attribute this surge to population ageing and longer disease duration and, in part, to changes in environmental and socio-economic risk factors, together with longer life expectancy. According to research based on registries or medical records, Parkinson’s disease diagnosis and coding may also have increased in routine medical care.

Because prevalence reflects both the incidence and the duration of illness, we must ask whether the risk, or incidence, of Parkinson’s disease has itself risen over the previous three decades. Several studies indicate a rise in incidence in a number of high-income nations [2, 5, 17]. Several other investigations, however, failed to corroborate the rise or observed a reduction. Because physical and social environments vary, different countries are likely to have followed different patterns. Smoking, agricultural pesticide usage, clean water availability, and head trauma, for example, may have decreased in certain nations but increased or remained stable in others. There is an urgent need to ascertain the prevalence and incidence of Parkinson’s disease, as well as its risk and protective factors, in various nations or geographical regions.

According to research from Dorsey, Elbaz, and colleagues, at the age of 50 men were more likely to develop Parkinson’s disease than women. Researchers believe that this dimorphic pattern may hold the key to understanding what causes Parkinson’s disease in people living in modern society. The effects of certain risk factors are likely to differ between men and women because of interactions with sex-related factors (e.g. particular DNA segments on the X or Y chromosome, sex hormones, and pregnancy) as well as with gender-related variables [7, 8]. The illness is both heterogeneous and complex because of the many risk and protective factors that may be associated with Parkinson’s disease in a particular patient, several of which are affected by sex and gender.

Twenty years ago, the Centres for Disease Control and Prevention (CDC) announced that the number of people living with Parkinson’s disease had more than doubled. These findings have subsequently been validated by additional research. With a conservative estimate of the patient population doubling in the next 30 years, this scenario predicts that there would be more than 12 million patients globally by 2050. If population ageing continues, medical treatment improves survival rates, and environmental or social risk factors remain constant or deteriorate, the patient population will almost certainly increase even more. According to Dorsey, Elbaz, and colleagues, new prevention strategies and treatments for Parkinson’s disease are urgently required. Preventive therapies may need to be tailored for men and women, as well as for different countries or regions of the world [18, 19].

3 Methodology We study drawings made by Parkinson’s disease patients using three different classical model types:
– Random Forest based on image processing variables such as intersection count and line thickness.
– Logistic Regression using features extracted with the ResNet50 architecture.
– A convolutional neural network model created from scratch. The architecture is consistent with a micro VGG, in which the convolutional layers are immediately followed by dense layers (Fig. 3).


Fig. 3 Parkinson’s disease patients’ data organization

3.1 Data Source The National Health Insurance Administration (NHIA) of the Ministry of Health and Welfare collects and maintains statistics on national health insurance (NHI). This database was the starting point for our inquiry. To ensure that the claims data are accurate, the NHIA performs quarterly expert reviews of a random sample of claims (a minimum of 50 and a maximum of 100) from patients at various clinics and health care centres [20, 21]. The NHI data sets are among the world’s largest and most comprehensive, and they have been used in a significant number of published epidemiologic studies of Parkinson’s disease [5, 7, 8]. Diagnoses of Parkinson’s disease in NHI benefit claims are accepted as valid by neurologists. This study used NHI data on ambulatory care claims and ambulatory care orders, as well as all inpatient claims data from the National Health Research Institutes (NHRI). To link the data sets, each individual is given an encrypted personal identification number (PID).

3.2 Study Subjects A patient was considered to have Parkinson’s disease (PD) if, between 2002 and 2009, they had received the diagnosis (ICD-9-CM code 332.0) at least three times after their initial diagnosis and had been treated with Parkinson’s drugs such as levodopa (L-DOPA) or dopamine (DA) agonists. To avoid including individuals who had been mistakenly diagnosed with Parkinson’s disease, we required at least 90 days between the initial and the latest outpatient or inpatient visit, together with records of anti-Parkinson medication.


To include only newly diagnosed cases of Parkinson’s disease (also known as incident cases), we excluded patients who had already been diagnosed with the condition between 1999 and 2001. Patients diagnosed with Parkinson’s disease between 2002 and 2009 were assigned an index date within that period. Additional criteria were checked to ensure that the PD diagnosis was valid:
1. Being younger than 40 years of age on the index date.
2. Being diagnosed with secondary Parkinsonism during the observation period.
3. Using any kind of neuroleptic medication in the first 180 days.
4. Having three or more medical claims prior to the index date that are consistent with a diagnosis of secondary Parkinsonism.

Patients whose PD was detected before 2002, as well as those who developed the condition later and survived, were regarded as prevalent cases. Based on the inclusion and exclusion criteria listed above, this research identified 26,996 persons who received a first-time Parkinson’s disease diagnosis between 2002 and 2009 and 181,277 people who had prevalent Parkinson’s disease.

3.3 Demographics Demographic and socio-economic data on the USA, including age, gender, and income-based premiums, together with the degree of urbanisation, were collected from the National Health Insurance Research Database (NHIRD). For the purposes of this study, patients were split into five age groups according to their age at the time of their first PD diagnosis: 40 to 49; 50 to 59; 60 to 69; 70 to 79; and 80 and above. Each year, dependents were categorised according to whether the premium paid was less than, equal to, or more than the average premium. In accordance with Liu et al. [19, 22], the USA is classified into three degrees of urbanisation based on population density, the proportion of residents who completed at least secondary school, the proportion of senior citizens (over the age of 65), and the number of physicians available per 100,000 people.

3.4 Validation We investigated the validity of Parkinson’s disease diagnoses in National Health Insurance claims by reviewing the medical records of patients treated at various hospitals and health care units, drawing on more than 4000 files chosen at random from all Parkinson’s disease patients. Nine hundred and sixty-three Parkinson’s disease patients were randomly selected, of whom around 864 satisfied all qualifying criteria. According to our results, our approach identifies Parkinson’s disease patients with 91.5% accuracy, 94.8% specificity, a 96.4% positive predictive value, and an 85.7% negative predictive value. Certain medical organisations may be unable to make use of the information acquired during the validation procedure due to contractual restrictions.

3.5 Statistical Analysis The number of new Parkinson’s disease cases between 2012 and 2020 was divided by the total number of NHI participants to generate biennial crude PD incidence and prevalence rates, which were then used to assess the burden of the illness. PD incidence and prevalence rates over time were computed using the WHO 2000 reference population as the standard. We used Poisson regression to investigate whether there was a linear secular trend in the incidence and prevalence of PD. The average of the yearly incidence and prevalence rates for each gender and age group was computed to obtain gender- and age-specific rates for Parkinson’s disease. These rates were then used to establish age- and sex-specific Alzheimer’s disease and other neurodegenerative disease incidence and prevalence rates. Multiple Poisson regression was used to analyse the incidence and prevalence of PD in the UK. The analysis takes into consideration the independent effects of gender and age, as well as the possible effects of urbanisation and socio-economic status (SES) on the illness. Generalised estimating equations (GEE) were employed to account for the clustering of data gathered from Parkinson’s disease patients living at the same level of urbanisation and measured repeatedly, in order to estimate the prevalence rate [21, 23, 24]. SAS 9.4 (SAS Institute, Cary, NC, USA) was used for the statistical analysis, and a statistical significance level of 0.05 was adopted.
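The difference between a crude rate and a rate standardised to a reference population can be illustrated with a short sketch of direct age standardisation. The age bands, case counts, and weights below are hypothetical and are not the WHO 2000 values; the function names are likewise assumptions made for the example.

```python
def crude_rate(cases, population, per=100_000):
    """Cases divided by the population at risk, expressed per `per` people."""
    return cases / population * per


def age_standardised_rate(cases_by_age, population_by_age, standard_weights, per=100_000):
    """Weight each age-specific rate by the reference population's age share."""
    total_weight = sum(standard_weights.values())
    weighted = sum(
        cases_by_age[age] / population_by_age[age] * standard_weights[age]
        for age in standard_weights
    )
    return weighted / total_weight * per


# Hypothetical two-band example.
cases = {"40-59": 10, "60+": 40}
population = {"40-59": 10_000, "60+": 10_000}
weights = {"40-59": 3, "60+": 1}   # reference population is younger than the study one

print(crude_rate(sum(cases.values()), sum(population.values())))
print(age_standardised_rate(cases, population, weights))
```

In this toy case the standardised rate is lower than the crude rate because the reference population gives less weight to the older band, where the disease is more common. This is the reason rates from populations with different age structures are standardised before being compared.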

4 Results and Discussion We store our data in a Pandas DataFrame and categorise it based on the directory path name. The files are organised as {spiral-wave train-test healthy-Parkinson}, so we can organise them by the type of drawing, by training or testing split for later classification, and by the class label for the prediction. We create a mosaic, or montage, of plots to show the images. We group the images by activity, i.e. spiral or wave, and by whether or not the drawer suffered from Parkinson’s disease, and we take the first nine images per montage (Figs. 4, 5 and 6). To show all drawings on the same plot, we plot the skeleton pixels as points and rescale them, so that all of the images can be overlaid on top of each other for better visualization. The drawings of healthy participants are noticeably more consistent than those of Parkinson’s patients (Table 1).
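Reading the drawing type, the split, and the class label from the directory names might look like the sketch below. The layout activity/split/label/file and the file names are assumptions based on the description above, and plain dictionaries are used instead of Pandas so that the example has no dependencies; a list of such dictionaries can be passed to `pandas.DataFrame` to obtain the table the text describes.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def describe(path):
    """Read activity, split and label from the three directories above the file."""
    parts = PurePosixPath(path).parts
    return {"path": path, "activity": parts[-4], "split": parts[-3], "label": parts[-2]}


def first_n_per_group(records, n=9):
    """Group paths by (activity, label) and keep the first n of each for a montage."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["activity"], rec["label"])].append(rec["path"])
    return {key: paths[:n] for key, paths in groups.items()}


# Hypothetical file paths following the assumed layout.
paths = [
    "drawings/spiral/training/healthy/img01.png",
    "drawings/spiral/testing/parkinson/img02.png",
    "drawings/wave/training/parkinson/img03.png",
]
records = [describe(p) for p in paths]
montage_groups = first_n_per_group(records)
print(sorted(montage_groups))
```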

Fig. 4 Spiral: (a) Spiral Healthy, (b) Spiral Parkinson

Fig. 5 Wave: (a) Wave Healthy, (b) Wave Parkinson

Fig. 6 All drawings on same plot: (a) Spiral Healthy, (b) Spiral Parkinson, (c) Wave Healthy, (d) Wave Parkinson

Table 1 Mean and std of stroke thickness by activity and disease

Activity  Disease    Mean thickness  Std thickness
Spiral    Healthy    1.894230        0.428052
Spiral    Parkinson  1.879122        0.524317
Wave      Healthy    2.559639        0.518404
Wave      Parkinson  2.819350        0.709551

Interestingly, on average the thickness is greater for Parkinson’s patients, and the variation in thickness even more so, indicating the inconsistent hand movements so often associated with the disease. Both results are, however, affected by some drawings in which the pen overlaps the spiral or wave lines and thereby increases the thickness. Jitter also increases the apparent thickness of the stroke, and the examples above make this difference clear. What is being captured is therefore not just pen pressure but a combination of pen pressure and inconsistency in the drawing movement (Fig. 7).
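A summary like the one discussed here, the mean and standard deviation of stroke thickness per activity and disease group, can be computed as sketched below. The thickness values are made-up numbers, not the paper's measurements, and `stdev` is the sample standard deviation, which matches the default of Pandas' `std`.

```python
from collections import defaultdict
from statistics import mean, stdev


def thickness_summary(rows):
    """Return {(activity, disease): (mean, std)} for (activity, disease, thickness) rows."""
    groups = defaultdict(list)
    for activity, disease, thickness in rows:
        groups[(activity, disease)].append(thickness)
    return {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}


# Hypothetical per-drawing thickness measurements.
rows = [
    ("spiral", "healthy", 1.0), ("spiral", "healthy", 2.0), ("spiral", "healthy", 3.0),
    ("spiral", "parkinson", 1.0), ("spiral", "parkinson", 3.0), ("spiral", "parkinson", 5.0),
]
summary = thickness_summary(rows)
print(summary)
```

In this toy data the Parkinson group has the larger standard deviation, the kind of spread that the text associates with inconsistent drawing movement.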

Fig. 7 Thickness of stroke


5 Conclusions

This article presented a model for predicting Parkinson's disease from spiral and wave drawings. We analysed the raw images collected in this study to determine whether a classifier for Parkinson's disease can be constructed from them, and presented the results. The model was tested on a publicly accessible PD data set obtained from the University of California, Irvine, and was found to be effective. The suggested DNN model captures nonlinearity in the feature space and outperforms previously known techniques, and its performance increases as more data points are added to the data sets. The model surpassed the 85% benchmark with very little fine-tuning. In future work, features such as speed and pen pressure (CISP) could be added, or at least estimated from the thickness and intensity of the lines in the drawings.


Eliminating Environmental Context for Fall Detection Based on Movement Traces

J. Balamanikandan, Senthil Kumar Thangavel, and Maiga Chang

1 Introduction

In today's society, young people travel throughout the country in search of better employment opportunities. According to a report [1], nearly 14.7 million (28%) of elderly Americans live alone, up from 6% a century ago; this includes 21% of older men and 34% of older women. Every year, between 28 and 35% of the elderly fall, and these figures worsen with age. Elderly people living alone are more likely to be destitute and prone to falls. A decline in eyesight, a lack of attention, and myasthenia gravis can all contribute to falls. Until recently, most elderly people were supervised by human caregivers, which led to management issues and personnel costs. An intelligent monitoring system that automatically identifies falls and alerts caregivers is therefore critical for post-fall survival. The medical consequences of a fall have been shown to depend strongly on response time, so fall detection systems [2, 3] can help medical workers react faster and reduce those consequences. Elders who fall are often unable to recover on their own, which can result in unconsciousness and an increased risk of mortality. Human fall detection thus has important theoretical and social research relevance.

Identifying human behavior remains a challenging and important task [4]. Falling is usually regarded as an abnormal human activity distinct from other activities of daily living (ADL), and a typical fall detection system should distinguish between falls and ADLs. There are several fall patterns, each with its own motion and posture, which makes it difficult to imitate a given type of fall. As a result, relatively few real-life falls have been recorded, and with so little data it is challenging to create a model that detects falls in all scenarios. To work on any device and in any environment, a new architecture is needed that is both lighter and more precise. This paper proposes an approach that quickly distinguishes between different activities using fewer computing resources.

J. Balamanikandan · S. K. Thangavel (B)
Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Coimbatore, India
e-mail: [email protected]
J. Balamanikandan
e-mail: [email protected]
M. Chang
Athabasca University, Alberta T5J-3S8, Canada
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_31

2 Literature Survey

Research on fall detection methods has proliferated over the past several years. Several studies [5, 6] examined these methods in detail and categorized them, by the data collection equipment used, into wearable device-based systems and context-aware systems. Wearable device-based solutions require the individual to wear sensors such as a tilt switch, an accelerometer, or a gyroscope [6], whereas context-aware systems often make use of piezoelectric sensors, acoustic sensors, infrared sensors, or RGBD cameras. Among context-aware systems, video-based analysis has gained significant appeal as a result of the rapid development of computer vision architectures [5].

The research in [7] uses an accelerometer to detect falls by computing the subject's SVMA and comparing it to a specified threshold. If the value is above the threshold, the heart rate and trunk angle are determined; the system achieved an accuracy of 97.5%. In [8], researchers designed a wearable airbag device for avoiding fall injuries, which triggers the airbag by thresholding both acceleration and angular velocity inputs. The primary benefit of prediction through such devices is that it is inexpensive and does not jeopardize the user's privacy, but these approaches generate a high number of false positives. Wearable sensors also require rather precise alignment and are inconvenient for users, particularly seniors, who often forget to wear them [9].

An image classification pipeline is most often used to analyze video data for identifying falls [10], since video is cost-effective and cameras are already widely used to monitor buildings. Most research [11–14] uses cameras to assist people inside smart buildings. In [9], the researchers proposed detecting falls by evaluating human shape deformation over a video sequence: a shape matching algorithm tracks the person's silhouette throughout the sequence, and a Gaussian mixture model is then used for classification. In [15], the authors proposed a Siamese network with one-shot classification, which learns to distinguish video sequences by means of a similarity score. Two architectures classify RGB and optical flow features: one with 2D convolutional filters and another with depth filters. Their results were comparable to the state of the art. In [16], the authors provided mathematical methods for detecting human falls using pretrained YOLO FCNN and Faster R-CNN architectures; their approach achieves 99.22% accuracy on the Fall Detection Dataset (FDD). The authors of [17] developed a three-stream convolutional neural network (CNN) architecture that classifies preprocessed surveillance footage. The first two streams receive silhouettes and motion history images from the video, while the third receives dynamic images. It obtained 100% sensitivity and 99.9% specificity on the multiple cameras fall dataset. The work in [18] is closely related to ours in that it integrates traditional algorithms with deep learning: skeletal information for the joints most relevant to fall activity is extracted and passed to a 1D CNN that classifies the action, achieving an accuracy of 99.2% on the NTU-RGBD dataset.

Most prior research in this field has relied on hand-crafted features, which bring their own issues, notably the difficulty of generating fair and consistent feature sets for tasks that demand a comprehensive understanding of the domain. The initial challenge in developing vision-based systems is to identify the motion component of the persons in a scene. Numerous authors created their own frameworks for motion detection, while others relied on predefined motion extraction approaches. Most deep learning algorithms are computationally expensive and need a powerful GPU. In this study, detection is accomplished by combining classical algorithms with a neural network architecture, which increases processing speed on devices with limited resources.

3 Algorithm

This paper employs an algorithm-based approach to identify falls in the environment. This section discusses the algorithms used and the methods that allow activity to be identified accurately in real time.

3.1 Structural Similarity

Natural image signals are highly structured, and successive frames of a video are strongly interdependent. These dependencies carry crucial information about the structural features in the scene. By comparing two successive frames, the structural variations between them can be identified, and these pixel-by-pixel variations reveal the motion information. The Structural Similarity Index (SSIM) [19] measures the similarities and differences between two images. Suppose x and y are two successive frames; the purpose of the system is to quantify the differences between them. To do so, SSIM combines three components of the image: luminance, contrast, and a structure (correlation) term. Given an image, the luminance is estimated by averaging the intensities of all N pixels using,

$$\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{1}$$

Then, the luminance comparison function is written as,

$$l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \tag{2}$$

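where $\mu_x$ and $\mu_y$ correspond to the luminance of the x and y signals [19]. The mean intensity is then removed from the signal; the resulting signal $x - \mu_x$ lies on the hyperplane defined by

$$\sum_{i=1}^{N} x_i = 0 \tag{3}$$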

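The contrast [19] can be compared using the formula,

$$c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \tag{4}$$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y, which serve as the contrast estimates.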

After that, the image signal is normalized by its own standard deviation so that it has unit standard deviation. The structural comparison is conducted on the normalized signals and can be represented as [19],

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \tag{5}$$

where $\sigma_{xy}$ is the covariance of x and y, and $C_1$, $C_2$, $C_3$ are constants that ensure numerical stability. These quantities are computed locally, around each pixel location, rather than globally over the whole image. This research exploits that property: comparing local pixel regions between successive frames yields the local changes between them. The changes are represented in a binary image of the same size as the original frame, in which altered pixels have a value of 1 and all other pixels a value of 0. These pixel-level changes are then localized in the frame with bounding boxes using the contour approximation approach, which is detailed in the next section.
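The three comparison functions in Eqs. (2), (4) and (5) can be evaluated for a single pair of patches as sketched below. This is not the authors' modified SSIM: it is a plain, global computation over two flat lists of intensities, and it assumes the constants commonly used with SSIM [19], namely $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$ and $C_3 = C_2 / 2$ with $K_1 = 0.01$, $K_2 = 0.03$ and dynamic range $L = 255$. The product of the three terms is taken as the similarity score.

```python
import math


def ssim_components(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Return (luminance, contrast, structure) for two equal-length patches."""
    n = len(x)
    mu_x = sum(x) / n                                   # Eq. (1)
    mu_y = sum(y) / n
    # Unbiased (n - 1) estimates of variance and covariance.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    sigma_x, sigma_y = math.sqrt(var_x), math.sqrt(var_y)

    # Stabilizing constants (assumed defaults, see lead-in).
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    c3 = c2 / 2

    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)      # Eq. (2)
    contrast = (2 * sigma_x * sigma_y + c2) / (var_x + var_y + c2)         # Eq. (4)
    structure = (cov_xy + c3) / (sigma_x * sigma_y + c3)                   # Eq. (5)
    return luminance, contrast, structure


def ssim(x, y):
    """Similarity score as the product of the three comparison terms."""
    luminance, contrast, structure = ssim_components(x, y)
    return luminance * contrast * structure


patch = [10, 20, 30, 40]
print(ssim(patch, patch))          # identical patches give a score of about 1
print(ssim(patch, patch[::-1]))    # a reversed patch has negative structure
```

In a frame-differencing setting, a score would be computed in a small window around every pixel, and pixels whose local score falls below a chosen threshold would be marked as changed.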

3.2 Contour Tracing Method

The contour tracing approach [20] is a segmentation technique used to determine the border pixels of a digital binary image. It defines the hierarchical relationships between borders in a relatively straightforward way and efficiently distinguishes outer borders from hole borders. The method is applied to the binary image generated by frame differencing. It begins by scanning the image row by row from the top left in search of a border starting point. After determining the contour start point, it searches the pixel neighborhood in 8-connectivity and traces the boundary, following certain criteria to avoid retracing the same boundary. The method in [20] assigns a unique number, called NBD, to the pixels of each border in order to distinguish borders from one another. NBD starts from 2 because the frame of the image is already labeled 1. The NBD value is assigned to the border pixels in clockwise order. When the current boundary pixel $P_n$ equals the second boundary pixel $P_1$ and the previous boundary pixel $P_{n-1}$ equals the first boundary pixel $P_0$, the procedure terminates and the contour boundary is determined. This tracing approach eliminates single-pixel contours (noise) and locates each contour's outer edge.

After the contour area is segmented, a bounding box is used to localize the contour in the intermediate frame. Because this frame captures the motion that occurs between successive frames rather than the complete person, the internal motion between parts of the human body can be extracted directly. To eliminate noise in the representation of the motion component, we apply morphological operations such as dilation and erosion. The bounding box is fitted to the contour representing the motion component, and its center point is determined. These steps are performed on successive frames, and the movement trace is constructed from the center points over the complete set of frames. The trajectory image is drawn on a binary digital image of 224 × 224 pixels that is initialized to zero.
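The construction of the trajectory image from bounding-box centers can be sketched as follows. Two points are assumptions rather than details given in the text: the centers are rescaled from a 640 × 480 frame to the 224 × 224 canvas, and consecutive centers are joined by straight segments (drawn here with Bresenham's line algorithm). The function names are hypothetical.

```python
def draw_line(image, p0, p1):
    """Set the pixels on the segment p0 -> p1 to 1 (Bresenham's algorithm)."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        image[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        doubled = 2 * err
        if doubled >= dy:
            err += dy
            x0 += step_x
        if doubled <= dx:
            err += dx
            y0 += step_y


def trajectory_image(centers, frame_size=(640, 480), out_size=224):
    """Draw the movement trace of (x, y) centers on a zero-initialized binary image."""
    width, height = frame_size
    image = [[0] * out_size for _ in range(out_size)]
    # Rescale frame coordinates to the output canvas and clamp to its bounds.
    scaled = [
        (min(out_size - 1, int(cx * out_size / width)),
         min(out_size - 1, int(cy * out_size / height)))
        for cx, cy in centers
    ]
    if len(scaled) == 1:
        x, y = scaled[0]
        image[y][x] = 1
    for start, end in zip(scaled, scaled[1:]):
        draw_line(image, start, end)
    return image


# Hypothetical centers of a subject moving from the top left to the bottom right.
trace = trajectory_image([(0, 0), (320, 240), (639, 479)])
print(sum(map(sum, trace)), "pixels set")
```

The resulting nested list has the 224 × 224 shape expected by the classifier input described in the next section.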

3.3 Neural Network

The architecture of the vanilla CNN designed for the prediction is shown in Fig. 1. In the proposed architecture, the vanilla CNN consists of a stack of three convolutional, pooling, and activation blocks, with fully connected layers attached at the end for prediction. A convolutional layer condenses the pixels in the receptive field of a filter into a single value: in a conv2D layer, a filter slides across the 2D input, performs elementwise multiplication with the values it covers, and sums the result into a single output pixel. The kernel repeats this procedure at every location it traverses, transforming the 2D input into a 2D feature map. The max pooling layer keeps the most prominent feature from the region of the feature map covered by the filter and discards the others. Finally, dense layers categorize the input based on the extracted features.

Fig. 1 Network architecture for classification

The earlier stages of preprocessing and motion trace extraction produce grayscale images of size 224 × 224 × 1. The first convolution layer consists of 32 filters of size 3 with padding; it extracts information and stacks it along the depth dimension. The resulting feature maps are passed to a pooling layer, which lowers their spatial dimension, and the activation layer then introduces nonlinearity. The input is processed successively by three such sets of layers, with the number of filters doubling at each stage to 64 and 128 in the second and third convolution layers. This process captures subtle information from the input and aids the dense layers in categorization. There are two dense layers with 128 and 64 neurons, respectively, followed by a final layer with two neurons for binary categorization.
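To make the layer sizes concrete, the following sketch walks the tensor shape through the network and counts trainable parameters. Several details are assumptions rather than facts stated in the text: padding that preserves the spatial size, 2 × 2 max pooling with stride 2, a flatten step before the dense layers, and dense widths of 128 and 64 units before the two-unit output.

```python
def conv_same(shape, filters, kernel=3):
    """3x3 convolution with size-preserving padding: new shape and parameter count."""
    height, width, channels = shape
    params = kernel * kernel * channels * filters + filters   # weights + biases
    return (height, width, filters), params


def max_pool(shape, size=2):
    """Non-overlapping max pooling halves each spatial dimension."""
    height, width, channels = shape
    return (height // size, width // size, channels)


def dense(units_in, units_out):
    """Fully connected layer: output width and parameter count."""
    return units_out, units_in * units_out + units_out


def summarize(input_shape=(224, 224, 1),
              conv_filters=(32, 64, 128),
              dense_units=(128, 64, 2)):
    """Return (rows, total) where each row is (layer name, output shape, parameters)."""
    shape, total, rows = input_shape, 0, []
    for filters in conv_filters:
        shape, params = conv_same(shape, filters)
        total += params
        rows.append((f"conv3x3-{filters}", shape, params))
        shape = max_pool(shape)
        rows.append(("maxpool2", shape, 0))
    units = shape[0] * shape[1] * shape[2]
    rows.append(("flatten", (units,), 0))
    for width in dense_units:
        units, params = dense(units, width)
        total += params
        rows.append((f"dense-{width}", (units,), params))
    return rows, total


rows, total = summarize()
for name, shape, params in rows:
    print(f"{name:12s} {str(shape):18s} {params}")
print("total parameters:", total)
```

Under these assumptions, the first dense layer accounts for most of the parameters, because it connects the flattened 28 × 28 × 128 feature map to 128 units.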

4 Methodology

In sports science, image-based human motion analysis is commonly used to analyze a subject's posture and activity. Each action has its own pattern, irrespective of the person performing it. For instance, walking has a pattern distinct from that of sitting, whereas the walking patterns of different individuals do not differ. These behavioral patterns are extracted and categorized, and the results demonstrate that this method outperforms other algorithms that look at the whole image. This section describes, in turn, how the environmental context is removed by extracting motion components, how movement trace images are constructed, and how the activity is classified. Figure 2 depicts the procedure of the suggested framework.

Fig. 2 The pipeline proposed in this research


Fig. 3 Extraction and selection of random frames during preprocessing

4.1 Preprocessing Stage

Each URFD video is split into frames, which are saved in a folder named after the video. A CSV file contains information on each video in the URFD dataset, including frame-specific labels that mark frames as falling, intermediate, or other positions. To simplify preprocessing, the frames are renamed to match their folders. The preprocessing stage also cleans the data for classification. Once frames are retrieved, they can be treated as sequentially stacked images and preprocessed as such. The frames are then normalized and standardized to improve the model's convergence. A fall occurs in a matter of seconds, and the frame rate of most security cameras is around 25 frames per second (FPS). Hence, 50 frames are chosen at random from a different video each time, as depicted in Fig. 3. For fall videos the selection is deliberate rather than purely random, because some of their frames do not show the individual falling: using the intermediate and falling frame labels from the CSV file, the sampling ensures that each sample contains at least one fall frame. The sampled frames are then placed in the dataset's subset folder along with the videos. Figure 4 shows the dataset folder structure after preprocessing.
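One way to realize this constrained sampling is sketched below. The strategy shown (pick one labeled fall frame first, fill the rest at random, then restore temporal order) is an assumption; the text only states that each sample must contain at least one fall frame. The function name and arguments are hypothetical.

```python
import random


def sample_frames(num_frames, fall_frames, sample_size=50, rng=None):
    """Pick `sample_size` distinct frame indices, sorted in time order.

    If `fall_frames` is non-empty, at least one of its indices is included.
    """
    rng = rng or random.Random()
    if sample_size > num_frames:
        raise ValueError("video has fewer frames than the requested sample size")
    fall_frames = sorted(set(fall_frames))
    if not fall_frames:
        # Non-fall (ADL) video: plain random sampling.
        return sorted(rng.sample(range(num_frames), sample_size))
    anchor = rng.choice(fall_frames)                      # guaranteed fall frame
    others = [i for i in range(num_frames) if i != anchor]
    picked = rng.sample(others, sample_size - 1)
    return sorted(picked + [anchor])


# Hypothetical video of 120 frames in which frames 60..79 are labeled as falling.
indices = sample_frames(120, range(60, 80), rng=random.Random(0))
print(len(indices), "frames sampled")
```

Sorting the sampled indices keeps the frames in temporal order, which matters because the movement trace is built from successive frames.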

4.2 Frame Differencing

Optical flow is one of the most widely used methods for extracting motion components from video. Optical flow features retain an internal representation of motion in the video by computing displacement vectors between two successive frames. In [21], a CNN is used to extract and categorize optical flow data, with the architecture adjusted to account for the temporal aspects of optical flow in consecutive stacks of images. In that approach, the entire environment, including the motion field, is evaluated even if the motion component in the frame is very small or non-existent.


Fig. 4 Structure of the dataset organized in folders

The primary requirement is to detect movement between the regions of the human body, since these carry the information needed to recognize the activity. The trajectory of a person can be determined by analyzing the motion between frames. To identify motion components, a modified version of the structural similarity index technique is used. It creates a local component by extracting the internal motion that occurs between body parts, and it can localize the motion component in 0.05 s. As Fig. 5 shows, subtracting two consecutive frames yields an image of the pixel changes; this image is inverted and passed through the contour approximation method, which localizes the pixel changes with a bounding box. The center points of these bounding boxes are used to create a trajectory image for classification, as depicted in Fig. 6.
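The localization step can be illustrated with a deliberately simplified sketch. It substitutes a thresholded absolute difference for the modified structural similarity measure and a single box around all changed pixels for contour tracing, so it is a stand-in for the idea rather than the method used in this work. The threshold value and function names are hypothetical.

```python
def change_mask(prev_frame, curr_frame, threshold=25):
    """Binary mask: 1 where the grayscale intensity changed by more than `threshold`."""
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_prev, row_curr)]
        for row_prev, row_curr in zip(prev_frame, curr_frame)
    ]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) around all changed pixels, or None."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def box_center(box):
    """Center point of a bounding box, in integer pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) // 2, (y_min + y_max) // 2


# Two tiny hypothetical 6 x 6 frames: a bright block appears in the second one.
previous = [[0] * 6 for _ in range(6)]
current = [row[:] for row in previous]
for y in (2, 3):
    for x in range(1, 5):
        current[y][x] = 200

box = bounding_box(change_mask(previous, current))
print(box, box_center(box))
```

Repeating this for every pair of consecutive frames yields the sequence of center points from which the trajectory image is drawn.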


Fig. 5 Processing of frames using frame differencing method

Fig. 6 Trajectory image prepared and passed for classification

4.3 Classification

The motion trace image is classified into an action category using the vanilla CNN architecture discussed previously. Through its convolutional filters, the neural network extracts the information needed to differentiate the image from other images for prediction. This information includes, but is not limited to, corners, texture, patterns, blobs, and color. The outputs of the convolutional filters are stacked along the depth dimension. The pooling layers consolidate the features learned in the convolutional feature maps and decrease the spatial dimension. During training, the backpropagation technique [22] adjusts the parameters of the network so that it maps each input image to its target label. The network constructs a multidimensional space in which the inputs and their target labels are represented. During inference, an input is projected into the same space based on its extracted features, and the label of the closest existing mappings is inferred. The architecture outputs two class probabilities, from which the input's most probable class is chosen.
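The final decision step can be sketched as follows, assuming that the two output neurons produce raw scores that are turned into probabilities with a softmax. The text does not name the output activation, so the softmax, the label order and the function names are assumptions.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def predict(logits, labels=("no fall", "fall")):
    """Return the most probable label and the class probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical raw scores from the two output neurons.
label, probs = predict([0.3, 2.1])
print(label, [round(p, 3) for p in probs])
```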


5 Evaluation

The proposed methods are implemented in Python using the TensorFlow framework, and their performance is assessed on the URFD dataset. The tests are conducted on a CPU with an Intel Core i5 processor and 16 GB of RAM.

5.1 Dataset

All tests are performed using the UR fall detection dataset [23], which was proposed in 2014 by researchers from the University of Rzeszow's Center for Computational Modeling. The dataset includes videos captured with two Kinect cameras at a frame rate of 31 frames per second and a resolution of 640 × 480. One camera is mounted overhead, while the other is mounted at the front to capture the subject. The videos are available in both RGB and depth formats, from both horizontal and vertical viewing angles. The dataset contains 30 falls and 40 sequences of everyday activities such as bending, kneeling, and walking. Each video features an actor performing or simulating a single activity from one of the categories. To protect the actors, mattresses were placed on the floor where the subject falls, and any material that could injure the subject was removed. Data on real falls is scarce and difficult to procure, so actors imitate the falls. These simulated falls may not exactly reflect a person falling in real life, but they are the closest representations available at present. The dataset comprises only static objects and events that occurred in confined environments during daylight, so no motion in the scene is greater than that of the subject.

5.2 Evaluation Metrics

The task is a binary classification problem: identify whether a fall occurred within a sequence of frames. Recall, precision, and accuracy are widely used to assess such problems. Recall and precision in particular remain informative when datasets are skewed, which suits fall recognition, since fall samples are often fewer than samples of other activities. The evaluation is based on four counts. True Positives (TP) is the number of correctly classified falls, and True Negatives (TN) is the number of correctly classified non-falls. False Positives (FP) is the number of non-falls classified as falls, while False Negatives (FN) is the number of falls incorrectly classified as non-falls.

Recall. Recall is the fraction of actual falls that are correctly classified as falls. This can be formulated as,

$$\text{Recall} = \frac{TP}{TP + FN} \tag{6}$$

Precision. Precision is measured as the fraction of correctly classified falls among all samples classified as falls. This can be formulated as,

$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$

Accuracy. Accuracy is calculated as the ratio of correctly classified samples (falls and non-falls) to all samples. This can be formulated as,

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \tag{8}$$
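Equations (6) to (8) translate directly into code. The sketch below computes the three measures from the four counts; the counts in the usage example are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Recall, precision and accuracy from confusion-matrix counts (Eqs. 6-8)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # Eq. (6)
    precision = tp / (tp + fp) if (tp + fp) else 0.0     # Eq. (7)
    accuracy = (tp + tn) / (tp + fn + tn + fp)           # Eq. (8)
    return {"recall": recall, "precision": precision, "accuracy": accuracy}


# Hypothetical counts: 8 detected falls, 1 missed fall, 2 false alarms, 9 correct non-falls.
metrics = classification_metrics(tp=8, fp=2, tn=9, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The guards on recall and precision only avoid division by zero when a class is absent from the evaluated samples.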

5.3 Results

The performance of the approach is validated with different hyperparameter values. The frame count hyperparameter of the pipeline is set to 25 and 50 to assess the model's performance. Several methods prepare and extract information for recognition before the neural network is applied: the pipeline takes a video volume (a stack of frames) and returns an image with the trajectory of the subject. The neural network was trained for 15 epochs in all experiments. Data sizes of 120, 180, and 240 samples were investigated. Training is incremental, i.e., the model trained with 120 data points is retrained with a larger number to determine how the model performs on larger, unseen data. This technique helps identify the characteristics that are crucial for successful prediction and for designing a leaner classification architecture (Table 1).

Following the data pipeline, experiments are conducted with several network architectures. Increasing the number of network layers and hidden nodes is found to cause the network to overfit. As this is a binary classification problem, the model is trained using binary cross-entropy loss and stochastic gradient descent (SGD). Figure 7 depicts the results of several performance measures on the training and validation sets for the 50-frame data processing approach. Performance on the training dataset is shown in Fig. 7a, c, e, and performance on the validation dataset in Fig. 7b, d, f. Each graph shows the epoch on the X-axis and the value of the corresponding measure on the Y-axis. As previously indicated, the experiments are carried out with varying numbers of data samples; the colored lines in Fig. 7 represent the data sample settings of 120, 180, and 240. Figure 7a, b shows that the approach converges after ten training epochs and that accuracy reaches 100% for all data settings. Similar patterns can be noticed for


Table 1 Summary of results obtained in different settings

Setting                Split  Loss   Accuracy (%)  Precision (%)  Recall (%)
Size: 120, Frames: 50  Train  0.028  100           97.8           95.8
                       Val    0.023  100           100            100
                       Test   0.112  96.6          96.6           96.6
Size: 180, Frames: 50  Train  0.042  99.3          98.6           98.6
                       Val    0.08   100           100            96
                       Test   0.073  96.1          96.1           96.1
Size: 240, Frames: 50  Train  0.009  100           100            100
                       Val    0.25   100           100            100
                       Test   0.041  99.5          99.5           99.1
Size: 120, Frames: 25  Train  0.01   100           98.9           100
                       Val    0.051  100           94.7           100
                       Test   0.208  91.6          93.9           90.8
Size: 180, Frames: 25  Train  0.002  100           100            99.3
                       Val    0.002  100           100            100
                       Test   0.135  97.2          97.2           97.2
Size: 240, Frames: 25  Train  0.008  100           99.4           100
                       Val    0.017  100           100            97.2
                       Test   0.055  98.7          98.7           98.7

Fig. 7 Graphical results of evaluation of the approach with random selection of 50 frames


Fig. 8 Time consumed for preprocessing frame blocks in different settings

precision (Fig. 7c, d) and recall (Fig. 7e, f). More precisely, the approach achieves 98.97% accuracy and 100% recall at the 13th epoch when 120 data samples are selected from each video. In Fig. 7, the blue line, which indicates the model trained with 180 data samples, performs better: the validation results (Fig. 7b, d, f) show 99.31% accuracy and precision at the 13th epoch, and a recall of 98.61%. Although the performance seems comparable to that of the model trained with 120 data samples, the loss on the test set fell from 0.1120 to 0.0735, demonstrating improved generalization with more data. To confirm these findings, the data sample size was increased further to 240. Figure 7b, d, f shows that the technique then achieves 100% accuracy, precision, and recall, and works well on unseen data samples, with a loss of 0.041. This demonstrates that the number of data samples is critical to performance, although further increasing the data size does not yield additional gains. As seen in Fig. 8a, the average time needed to process a data sample of 50 frames is around 2 s.

The technique is also validated using 25 frames selected for each data sample, as seen in Fig. 9. As in the previous graphs, epochs are shown on the X-axis and the associated values on the Y-axis. Figure 9a, c, e depicts the results on the training dataset, which are comparable to those obtained before, with convergence after the 10th epoch. However, Fig. 9b, d, f shows a significant amount of noise in the validation results, implying an inadequate trace in each data sample. Figure 8b shows that the average time required to process each data sample is around one second for 25 frames. The processing time for these data samples is thus greatly reduced, while performance increases significantly as the number of data samples grows.
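The loss values reported in this section are binary cross-entropy values. As a reminder of what that quantity measures, the sketch below computes it for a batch of predicted fall probabilities and shows a single plain SGD update. It treats the prediction as one probability per sample, which is an assumption about how the loss was applied; the learning rate and the example numbers are hypothetical.

```python
import math


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    total = 0.0
    for target, prob in zip(targets, probs):
        prob = min(max(prob, eps), 1 - eps)     # clip to avoid log(0)
        total += -(target * math.log(prob) + (1 - target) * math.log(1 - prob))
    return total / len(targets)


def sgd_step(weights, grads, learning_rate=0.01):
    """One stochastic gradient descent update: w <- w - lr * grad."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]


# Hypothetical labels (1 = fall) and predicted fall probabilities.
labels = [1, 0, 1, 0]
confident = [0.95, 0.05, 0.90, 0.10]
uncertain = [0.60, 0.40, 0.55, 0.45]
print(round(binary_cross_entropy(labels, confident), 4))
print(round(binary_cross_entropy(labels, uncertain), 4))
```

Confident, correct predictions give a loss close to zero, which is why a lower test loss at similar accuracy is read above as a sign of better generalization.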

Fig. 9 Graphical results of evaluation of the approach with random selection of 25 frames

6 Conclusion

The primary insight obtained from the studies with various parameter settings was the effect of data size on action prediction. With fewer parameters, the suggested technique obtained more than 99% accuracy on the URFD dataset. Inference takes two seconds in total, including preparation time. This enables the proposed approach to operate in near real time and to generalize more effectively to unseen data. The architecture can be retrained overnight in less than five minutes to include fresh data and improve recognition over time. In the future, an end-to-end neural network pipeline with fewer parameters will be designed to detect falls accurately in real time with higher generalization capability.

Acknowledgements The authors would like to acknowledge the computing facilities provided by the Vision and Network Analytics (VIShNA) Lab at Amrita School of Computing, Amrita Vishwa Vidyapeetham, Coimbatore, for carrying out the experiments.


Performance Analysis of Regression Models in Stock Price Prediction

Manas Ranjan Panda, Anil Kumar Mishra, Samarjeet Borah, and Aishwarya Kashyap

M. R. Panda
GIET University Gunupur, Gunupur, India

A. K. Mishra
Department of CSE, GEC, Bhubaneswar, India

S. Borah (B) · A. Kashyap
SMIT, Sikkim Manipal University, Sikkim, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_32

1 Introduction

The stock market, also termed the equity market or share market, generally refers to the collection of buyers and sellers of company shares. This market lets sellers and buyers negotiate stock prices and make trades. Someone who purchases a public or private company's stock owns a small portion of that company. Since it is difficult to track the stock of every company, indexes have been introduced. The sale and purchase of company shares are managed by stock exchanges [1]. There are two major stock exchanges in India: the Bombay Stock Exchange (BSE) and the National Stock Exchange (NSE). In addition, the market has several indexes organized by sector and other parameters, such as NIFTY, SENSEX, the bank index, and the IT index [2, 3]. The buying and selling of shares is known as trading and is open to anyone with valid credentials. Trading is done mostly online using the apps of various service providers. The indexes play a major role in determining the performance of the stock market. For example, if a particular market index is said to have moved up or down, the stocks within that index have gained or lost value as a whole. Stock market investors try to book a profit from this movement of stock prices. First, companies list their shares through an initial public offering (IPO) to raise funds. Investors can then buy and sell these shares. Later, the price of the shares changes based on market demand. The market and the share price mainly depend on demand and supply, although a few other parameters also influence share prices. Trading in stocks is primarily of two types, intraday and long term, and investors can be broadly divided into these two categories. An intraday trader purchases and sells a certain quantity of shares within the day's trading session, possibly after booking sessions. Long-term investors, in contrast, buy and hold shares for a longer period. In both cases, investors need to know the price of the shares in advance to make a purchase or sell decision. Hence, a proper analysis of the company's shares or stock is required [4].

1.1 Stock Analysis

There are primarily two ways to analyze stocks: technical analysis and fundamental analysis [5]. These are briefly discussed below:
• Fundamental Analysis: In fundamental analysis, it is assumed that the price of a stock does not reflect the business process. Valuation metrics, along with some other information, are used to determine the best possible buying or selling price of a stock.
• Technical Analysis: This is the opposite of fundamental analysis. In technical analysis, it is assumed that the price of a stock reflects the valuable information from the business process of a company.

1.2 Analysis of Stock Data

Trend analysis in the share market is gaining considerable popularity nowadays and involves huge amounts of high-dimensional data. Some of the parameters vary from sector to sector, so selecting a particular sector for analysis is an important step. The selection may be by either type of investment or industry. Additional internal and external forces also influence stock prices, such as government regulations, major changes in related industries, major political changes, and natural calamities. Stock market data, or financial data, are time dependent and are treated as time series data: the data points are ordered in time, and time plays a major role in the analysis. In such analysis, time is frequently taken as the independent variable, and the aim is to make a prediction for the future. Machine learning techniques are found useful in predicting the future price of a stock [6–9]. Various other classification methods are also often used in stock price prediction [10]. Several researchers have used neural networks for stock price analysis and prediction [11–13]. The literature shows that many basic as well as advanced techniques have been applied to the prediction of stock prices. In this paper, three comparatively basic learning models, namely linear regression, logistic regression, and autoregression, are considered for the same task. The motivation behind the selection of these techniques is their computationally inexpensive nature. The regression models are also simple and scalable. The main aim is to analyze the performance of the regression models in stock price prediction. The paper is organized into five sections. The first section introduces the concept, and the second describes the methodology of the experiment. Implementation details are discussed in the third section, followed by results and discussion in Sect. 4. Finally, the work is summarized in the conclusion section.

2 Methodology

Regression analysis is an important technique in stock price prediction [14]. As stated above, three regression models regarded as basic machine learning techniques are used in the experiment. The experiment is conducted in Python, and the details are discussed below (Fig. 1).

Fig. 1 Schematic diagram of the experiment (TCS stock data; linear regression, logistic regression, and autoregression; model evaluation with MAE and RMSE; results)

2.1 Regression Analysis

Regression analysis seeks to establish a statistical relationship between a set of independent variables and a dependent variable. It helps identify the most and least influential factors in an analysis. The dependent variable is the single factor to be predicted. There may be more than one independent variable, each presumed to influence the dependent variable. Regression analysis is part of machine learning and falls under supervised learning techniques [15]. Several regression analysis techniques are available in the literature [16]. Three of them, namely linear regression, logistic regression, and autoregression, are considered in this investigation and are briefly discussed below.

Linear regression is a commonly used technique in predictive analysis. Its aim is simple: to determine whether the regressors (the independent or predictor variables) are sufficiently good at predicting the dependent variable, or outcome. It also tries to identify the best possible predictor variables.

Logistic regression can be considered a supervised classification method. It predicts binary outcomes from a set of independent variables and is therefore an effective model for binary data. The independent variables may be continuous, discrete nominal, or ordinal. There may be further requirements that can be regarded as assumptions [17].

In an autoregressive model, or autoregression, a current or future value is predicted from past data in a time series. All the data considered should come from the same time series. The basic working of autoregression is similar to that of linear regression: it regresses the current value of a variable in the series on one or more past values from the same series. Autoregression assumes that there is a relationship between the current and past values, and the correlation may be positive, negative, or zero [18].
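As a minimal sketch of the autoregressive idea described above, the following fits a first-order model x[t] = c + phi * x[t-1] by ordinary least squares and produces a one-step forecast. The function names, the toy series, and the choice of lag order 1 are illustrative assumptions; the paper does not state the lag order or the library class it used.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] (hypothetical helper)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    # Slope of the regression of the current value on the lagged value.
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_ar1(c, phi, last_value, steps=1):
    """Iterate the fitted recurrence forward from the last observed value."""
    predictions = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        predictions.append(value)
    return predictions


# Toy series generated exactly by x[t] = 2 + 0.5 * x[t-1] (not stock data).
toy = [10.0]
for _ in range(9):
    toy.append(2.0 + 0.5 * toy[-1])

c, phi = fit_ar1(toy)
next_value = forecast_ar1(c, phi, toy[-1])[0]
```

Because the lagged series is just another regressor, the fit is the same closed-form least-squares estimate used in simple linear regression, which is the similarity the text points out.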

3 Implementation

The experiment has been conducted on the Windows® 10 platform using Python 3.8. The system used has an Intel® Core™ i5-7200U CPU @ 2.50 GHz with 4 GB of RAM. Regression analysis classes available in Python have been used to conduct the experiment. As detailed in Sect. 3.2, the independent and dependent variables are set to the open price and the exponential moving average (EMA), respectively. The EMA is more sensitive to recent price changes because the averages are exponentially weighted.

3.1 Dataset Description

The dataset considered in the experiment is the TCS stock price dataset, which contains share prices of Tata Consultancy Services from January 2018 to December 2021. TCS is the largest information technology services company in the world by market capitalization (February 2021) [19]. The dataset has 4463 records and 8 attributes. The important attributes are the date of trading, the opening and closing prices, the low and high values, and the volume of shares traded. Two additional columns contain information on dividends and stock splits.

3.2 Sample Generation and Training

The dataset is split into training and testing sets in an 80:20 ratio. No preprocessing has been performed on this dataset. As required in regression models, a set of independent variables and a dependent variable need to be selected for the experiment. The open price feature from the historical stock dataset has been chosen as the independent variable, and the exponential moving average (EMA) as the dependent variable. The EMA is chosen because it is more sensitive to recent price changes, since the averages are exponentially weighted. The dependent variable column is generated with the PANDAS_TA module using 10 historical data steps.
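The sample generation step can be sketched roughly as follows, without the PANDAS_TA dependency. The smoothing factor 2 / (length + 1) is the standard EMA definition; seeding the average with the first value, computing it on the same toy price series, and splitting chronologically (without shuffling) are assumptions made for illustration, since the paper specifies none of them and PANDAS_TA's own seeding may differ.

```python
def ema(values, length=10):
    """Exponential moving average with alpha = 2 / (length + 1).

    Seeded with the first observation (an assumption of this sketch).
    """
    alpha = 2.0 / (length + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


def chronological_split(rows, train_fraction=0.8):
    """Keep the earliest rows for training and the latest for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical open prices; the real experiment uses the TCS dataset.
open_prices = [100.0, 102.0, 101.0, 105.0, 107.0,
               106.0, 110.0, 111.0, 109.0, 112.0]
ema_10 = ema(open_prices, length=10)

# Each sample pairs the independent variable (open) with the target (EMA).
samples = list(zip(open_prices, ema_10))
train_set, test_set = chronological_split(samples)
```

A chronological split keeps future prices out of the training set, which matters for time series; whether the authors split this way or at random is not stated.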

4 Results and Discussion

This section presents some of the notable results obtained during the experiment. As stated above, the experiment has been conducted separately on linear regression, logistic regression, and autoregression, using the same dataset and without any preprocessing.

4.1 Linear Regression

Figure 2 compares the price predicted by linear regression with the actual price. It can be seen that the prices are fairly comparable. The linear regression model provides better prediction accuracy than the other two models on the dataset under consideration.

Fig. 2 Actual price versus predicted price using linear regression


Fig. 3 Actual price versus predicted price using logistic regression

4.2 Logistic Regression

This model computes a weighted sum of the input variables, much like the linear regression model. A nonlinear function called the sigmoid (logistic) function then maps that sum to a probability. The output of a logistic regression model is binary: the classifier predicts 1 if the probability is greater than 0.5 and −1 otherwise. When predicting the movement of the stock price, the stock can be purchased if tomorrow's closing price is expected to be higher than today's. For example, an output of 0.8 indicates an 80% chance of a higher closing price than today's and is classified as 1. A classification value of −1 indicates a lower closing price than today's. In the experiment, logistic regression showed an accuracy of approximately 56%, making it the lowest performer among the three. The result is shown in Fig. 3.
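The decision rule described above can be illustrated with a small sketch: labels are derived from consecutive closing prices, and a sigmoid turns a weighted sum into a probability that is thresholded at 0.5. The weights, bias, feature value, and prices below are hypothetical and hand-picked; no training is performed, and this is not the authors' fitted model.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def direction_labels(closes):
    """+1 when the next close is higher than the current one, else -1."""
    return [1 if nxt > cur else -1 for cur, nxt in zip(closes, closes[1:])]


def classify(weights, bias, features, threshold=0.5):
    """Return (label, probability) for one sample."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    p = sigmoid(z)
    return (1 if p > threshold else -1), p


# Hypothetical closing prices, used only to show how labels are built.
closes = [100.0, 101.5, 101.0, 103.0]
labels = direction_labels(closes)

# A weighted sum of ln(4) gives a probability of 0.8, i.e. class 1.
label, prob = classify([1.0], 0.0, [math.log(4.0)])
```

An output of 0.8 here corresponds to the 80% example in the text; any probability at or below 0.5 maps to −1.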

4.3 Autoregression

Results obtained from autoregression are shown in Fig. 4, where the blue line represents the predicted opening prices and the orange line the actual ones. As can be seen, the two are comparable, though the agreement is not as close as with linear regression.


Fig. 4 Actual price versus predicted price using autoregression

4.4 Comparative Analysis of Linear Regression and Autoregression

A final comparison is presented in Fig. 5, where both linear regression and autoregression are considered. As the prediction mechanism of logistic regression is somewhat different, it is not included in the plot. In addition, the accuracy of that model was not found to be impressive. As can be seen, linear regression has outperformed the other two models.

4.5 Evaluation Criteria

The models are evaluated using mean absolute error (MAE) and root mean square error (RMSE) [19]. The scores are reported in Table 1 and Fig. 6. MAE measures the dissimilarity between the predicted and observed values. It is calculated as:

\[ \mathrm{MAE} = \frac{\sum_{i=1}^{n} |y_i - x_i|}{n} \tag{1} \]

where \(\sum_{i=1}^{n} |y_i - x_i|\) is the sum of the absolute values of the residuals and n is the number of observations. RMSE, also called root mean square deviation, measures prediction quality. It uses the Euclidean distance between the predicted and the measured true values. RMSE is calculated as:


Fig. 5 Actual price versus predicted price using linear regression and autoregression

Table 1 Comparative RMSE and MAE scores

Model                 RMSE    MAE
Linear regression     0.536   0.413
Logistic regression   1.485   1.102
Autoregression        4.462   3.876

Fig. 6 Performance evaluation of linear regression, logistic regression, and autoregression (chart of the RMSE and MAE values listed in Table 1)

\[ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(x_i - \hat{x}_i\right)^2}{N}} \tag{2} \]
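Both measures are straightforward to compute directly from their definitions. The sketch below uses plain Python; the function names and the toy values are illustrative and are not taken from the experiment.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Toy values only; residuals are -1, 0, 1, -2.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 18.0]

mae_score = mae(actual, predicted)
rmse_score = rmse(actual, predicted)
```

Because RMSE squares the residuals before averaging, the single error of 2 weighs more heavily in it than in MAE, which is why RMSE is never smaller than MAE on the same data.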


5 Conclusion

In this research work, three regression analysis models are used to predict stock prices. The experiment has been conducted on TCS stock data covering January 2018 to December 2021. The dataset has been divided in an 80:20 ratio for training and testing. No preprocessing has been applied in the experiment. In the experiment, linear regression has the highest prediction accuracy, and logistic regression has the lowest. This investigation is limited to three regression models and a single dataset from the IT sector. There is scope for considering other regression models from [16] with datasets from different business sectors.

References

1. Morck, R., Shleifer, A., Vishny, R.W.: The stock market and investment: is the market a sideshow? Brook. Pap. Econ. Act. 1990, 157–215 (1990)
2. Bandivadekar, S., Ghosh, S.: Derivatives and volatility on Indian stock markets. Reserve Bank of India Occasional Papers, vol. 24, no. 3 (2003)
3. Pushpalatha, M., Srinivasan, J., Shanmugapriya, G.: A research on volatility in the Indian stock market, with special reference to Nifty and selected companies of financial service sector of NSE. Int. J. Technol. Exploring Engineering (IJITEE) 8(12S), 541–544 (2019). ISSN 2278-3075
4. Diao, H., Liu, G., Zhu, Z.: Research on a stock-matching trading strategy based on bi-objective optimization. Front. Bus. Res. China 14, 8 (2020). https://doi.org/10.1186/s11782-020-00076-4
5. Nti, I.K., Adekoya, A.F., Weyori, B.A.: A systematic review of fundamental and technical analysis of stock market predictions. Artif. Intell. Rev. 53, 3007–3057 (2020)
6. Prasad, V.V., Gumparthi, S., Venkataramana, L.Y., Srinethe, S., SruthiSree, R.M., Nishanthi, K.: Prediction of stock prices using statistical and machine learning models: a comparative analysis. Comput. J. 8 (2021)
7. Chong, E., Han, C., Park, F.C.: Deep learning networks for stock market analysis and prediction: methodology, data representations, and case studies. Expert Syst. Appl. 83, 187–205 (2017)
8. Ampomah, E.K., Qin, Z., Nyam, G.: Evaluation of tree-based ensemble machine learning models in predicting stock price direction of movement. Information 11, 332 (2020)
9. Basa, S., Kar, S., Saha, S., Khaidem, L., De, S.: Predicting the direction of stock market prices using tree-based classifiers. N. Am. J. Econ. 47, 552–567 (2019)
10. Choudhury, S.S., Sen, M.: Trading in Indian stock market using ANN: a decision review. Adv. Model. Anal. A 54, 252–262 (2017)
11. Boonpeng, S., Jeatrakul, P.: Decision support system for investing in stock market by using OAA-neural network. In: Proceedings of the Eighth International Conference on Advanced Computational Intelligence (ICACI), Chiang Mai, Thailand, 14–16 February 2016, pp. 1–6
12. Naik, N., Mohan, B.R.: Intraday stock prediction based on deep neural network. Proc. Natl. Acad. Sci. USA 43, 241–246 (2020)
13. Karim, R., Alam, M.K., Hossain, M.R.: Stock market analysis using linear regression and decision tree regression. In: Proceedings of 1st International Conference on Emerging Smart Technologies and Applications (eSmarTA), IEEE (2021). https://doi.org/10.1109/eSmarTA52612.2021.9515762
14. Regression Analysis Essentials for Machine Learning. http://www.sthda.com/english/wiki/regression-analysis-essentials-for-machine-learning. Accessed on 06 Apr 2022
15. Different Types of Regression Models. https://www.analyticsvidhya.com/blog/2022/01/different-types-of-regression-models/. Accessed on 06 Apr 2022
16. Logistic Regression in Machine Learning. https://www.javatpoint.com/logistic-regression-in-machine-learning. Accessed on 10 Apr 2022
17. Autoregression Models. https://online.stat.psu.edu/stat501/lesson/14/14.1. Accessed on 10 Apr 2022
18. TCS Stock Data - Live and Latest. https://www.kaggle.com/datasets/kalilurrahman/tcs-stock-data-live-and-latest. Accessed on 05 Apr 2022
19. Performance measures: RMSE and MAE. https://thedatascientist.com/performance-measures-rmse-mae/. Accessed on 14 Apr 2022

Review on the Image Encryption with Hyper-Chaotic Systems

Arghya Pathak, Subhashish Pal, Jayashree Karmakar, Hrishikesh Mondal, and Mrinal Kanti Mandal

A. Pathak (B) · S. Pal · H. Mondal · M. K. Mandal (B)
Department of Physics, National Institute of Technology, Durgapur 713209, India
e-mail: [email protected]

M. K. Mandal
e-mail: [email protected]

S. Pal
Department of Physics, Dr. B. C. Roy Engineering College, Durgapur 713206, India

J. Karmakar
MUSE Lab, Indian Institute of Technology, Gandhinagar 382355, India

H. Mondal
Department of Physics, Durgapur Government College, Durgapur 713214, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_33

1 Introduction

Secure transmission of data through any open access channel is a demanding issue nowadays. Among the different types of data, images are widely used in fields such as medical science, surveillance, education, and social media. An image is secured by encrypting the original signal. According to Shannon, for perfect security the key size should be equal to that of the possible plain texts, but this is impractical to implement [1]. Thus, computational security depends on the amount of input data, memory space, and computational time [2]. Resistance of the encryption method to even the easiest attack is what assures security. Fulfilment of the following conditions ensures higher efficiency of an encryption algorithm: (i) sensitivity to initial conditions, (ii) ergodicity, (iii) pseudo-randomness, (iv) higher complexity during decryption, (v) large key-space, (vi) low computational load, (vii) aperiodicity, etc. A chaotic system successfully fulfils these requirements; this is the main reason for the extensive use of chaotic cryptography in image encryption to date [2, 3]. However, cryptanalysis of several encryption mechanisms has successfully broken their security and led researchers to look for new and better chaotic systems for increased security. Over the past few decades, hyper-chaotic image encryption has been preferred for its advantages over chaotic encryption. A hyper-chaotic system is an advanced version of a chaotic non-linear system that has more state variables and at least two positive Lyapunov exponents. The advantages of a hyper-chaotic system are as follows: (i) more state variables, i.e. higher dimensionality, give greater complexity, yet not so much that practical execution becomes infeasible, (ii) two or more positive Lyapunov exponents, (iii) sensitive dependence on initial conditions due to more dimensions than a chaotic system, (iv) larger key-space [4].

In this paper, we present a literature survey of hyper-chaotic image encryption techniques, with their merits and demerits, in chronological order. Section 2 explains the concept of Lyapunov exponents, the Kaplan–Yorke dimension, and the conditions for a system to be hyper-chaotic. Section 3 discusses different models of hyper-chaotic systems and their applications in image encryption. A comparison of the quality metrics of the different models is presented in Table 1 of that section. Section 4 concludes the paper.

2 Lyapunov Exponents

The Lyapunov exponent of a system quantifies the rate at which infinitesimally close trajectories separate. Two trajectories with an initial separation δx0 diverge in phase space, and this divergence can be expressed in linearized form as

\[ |\delta x(t)| \approx e^{\lambda t}\,|\delta x_0| \tag{1} \]

where λ is the Lyapunov exponent. For a system to be hyper-chaotic, it should satisfy the following conditions:
• The system must have at least two positive Lyapunov exponents.
• The sum of all the Lyapunov exponents should be less than zero for the system to be stable.
• The Lyapunov dimension of the system should be fractional.

The Lyapunov dimension of a dynamical system is evaluated from the Kaplan–Yorke conjecture [5]:

\[ D_{KY} = k + \frac{\sum_{j=1}^{k} \lambda_j}{|\lambda_{k+1}|} \tag{2} \]

Here, k is the highest integer such that \(\sum_{j=1}^{k} \lambda_j \ge 0\), with the exponents ordered so that \(\lambda_j > \lambda_{j+1}\).
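As a small worked sketch of these conditions, the following evaluates the Kaplan–Yorke dimension and checks the first two hyper-chaos conditions for a list of exponents. The function names are hypothetical. The sample exponents (1.4164, 0.5318, 0, −39.1015) are the values this survey reports for one of the four-dimensional systems in Sect. 3.

```python
def kaplan_yorke_dimension(exponents):
    """Kaplan-Yorke (Lyapunov) dimension from a list of Lyapunov exponents."""
    lam = sorted(exponents, reverse=True)  # enforce lambda_j >= lambda_{j+1}
    partial = 0.0
    k = 0
    # Find the largest k whose partial sum is still non-negative.
    for value in lam:
        if partial + value < 0:
            break
        partial += value
        k += 1
    if k == len(lam):
        # The sum never turns negative; the formula has no lambda_{k+1}.
        return float(k)
    if k == 0:
        return 0.0
    return k + partial / abs(lam[k])


def looks_hyperchaotic(exponents):
    """At least two positive exponents and a negative overall sum."""
    positives = sum(1 for value in exponents if value > 0)
    return positives >= 2 and sum(exponents) < 0


sample = [1.4164, 0.5318, 0.0, -39.1015]
dimension = kaplan_yorke_dimension(sample)
is_hyper = looks_hyperchaotic(sample)
```

For these exponents the first three sum to 1.9482, so k = 3 and the dimension is 3 + 1.9482/39.1015, a fractional value slightly above 3, consistent with the third condition.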


3 Hyper-Chaotic Systems and Its Applications in Cryptography

Security and complexity are closely related in the field of secure communication and cryptography. The general technique for applying a chaotic system to image encryption can be understood from the general algorithm of Fig. 1. Researchers have implemented both chaotic and hyper-chaotic systems to encrypt plain images. Parameter analysis has shown that hyper-chaotic systems are more efficient than ordinary chaotic systems, as they are more complex than chaotic maps and basic chaotic systems. The different types of hyper-chaotic system are discussed below. In 1979, Rössler proposed the first hyper-chaotic system [6], with four state variables, as

\[
\left.
\begin{aligned}
\dot{x} &= -(y + z) \\
\dot{y} &= x + ay + w \\
\dot{z} &= xz + b \\
\dot{w} &= -cz + dw
\end{aligned}
\right\} \tag{3}
\]

In 2007, Gao et al. [7] modified the Lorenz system by incorporating a non-linear feedback mechanism to introduce a hyper-chaotic system, given in Eq. (4). Later, the system was further modified by Zhen et al. [8] into a fractional-order hyper-chaotic system for use in cryptography. They calculated the Lyapunov exponents of the system (λ1 = 0.3392, λ2 = 0.1577, λ3 = 0, λ4 = −15.2624) to demonstrate its hyper-chaotic nature. According to their cryptanalysis, the estimated entropy of the encrypted grayscale image is 7.9963.
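The entropy figure quoted above is the Shannon entropy of the pixel histogram, whose maximum for an 8-bit grayscale image is 8 bits. A minimal sketch of that computation follows; the function name and the two synthetic pixel lists are illustrative assumptions, not data from the cited work.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Entropy in bits of the empirical distribution of pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Every grey level equally likely: the ideal case for a cipher image.
uniform_pixels = list(range(256)) * 4
# A single grey level everywhere: no uncertainty at all.
flat_pixels = [128] * 1024

uniform_entropy = shannon_entropy(uniform_pixels)
flat_entropy = shannon_entropy(flat_pixels)
```

A value such as 7.9963 therefore indicates a cipher image whose grey levels are very nearly uniformly distributed.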

Fig. 1 Flowchart for generalized chaos-based image encryption techniques

\[
\left.
\begin{aligned}
\dot{x} &= a(y - x) \\
\dot{y} &= bx + y - xz - w \\
\dot{z} &= xy - cz \\
\dot{w} &= dyz
\end{aligned}
\right\} \tag{4}
\]

In 2009, Gangadhar et al. proposed the hyper-chaotic key-based algorithm (HCKBA) for images to improve on the chaotic key-based algorithm (CKBA), as reported in [9]. Zhu [10], on the other hand, proposed a plain image encryption technique using a private key-dependent hyper-chaotic system with four state variables, given in Eq. (5).

\[
\left.
\begin{aligned}
\dot{x} &= a(y - x) + yz \\
\dot{y} &= bx - y - xz + w \\
\dot{z} &= xy - cz \\
\dot{w} &= dw - xz
\end{aligned}
\right\} \tag{5}
\]
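To make the role of such a system in encryption concrete, the sketch below integrates Eq. (5) with a fixed-step fourth-order Runge–Kutta scheme, quantizes the state into bytes, and XORs them with the data. It uses the parameter values the text reports for this system (a = 35, b = 55, c = 8/3, d = 1.3). The step size, burn-in length, initial state, and the quantization rule (scale by 10^6, reduce modulo 256) are assumptions chosen for illustration; this is not the scheme of [10] and offers no security guarantee.

```python
def zhu_rhs(state, a=35.0, b=55.0, c=8.0 / 3.0, d=1.3):
    """Right-hand side of Eq. (5) for state (x, y, z, w)."""
    x, y, z, w = state
    return (
        a * (y - x) + y * z,
        b * x - y - x * z + w,
        x * y - c * z,
        d * w - x * z,
    )


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (p + 2.0 * q + 2.0 * r + t)
        for s, p, q, r, t in zip(state, k1, k2, k3, k4)
    )


def keystream(n_bytes, state, dt=0.001, burn_in=100):
    """Derive key bytes from the trajectory (hypothetical quantization)."""
    for _ in range(burn_in):  # discard the initial transient
        state = rk4_step(zhu_rhs, state, dt)
    out = []
    while len(out) < n_bytes:
        state = rk4_step(zhu_rhs, state, dt)
        for value in state:
            out.append(int(abs(value) * 1e6) % 256)
    return bytes(out[:n_bytes])


def xor_bytes(data, key):
    """XOR each data byte with the matching key byte."""
    return bytes(d ^ k for d, k in zip(data, key))


# The initial state acts as the secret key in this sketch.
plain = b"8 pixels"
key = keystream(len(plain), (1.0, 1.0, 1.0, 1.0), burn_in=10)
cipher = xor_bytes(plain, key)
recovered = xor_bytes(cipher, key)
```

Decryption repeats the same integration from the same initial state, so sender and receiver must agree on the state, the step size, and the burn-in exactly.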

The constant parameters of the system are a = 35, b = 55, c = 8/3, d = 1.3, and the Lyapunov exponents are λ1 = 1.4164, λ2 = 0.5318, λ3 = 0, λ4 = −39.1015. The scheme resists chosen plain text and chosen cypher text attacks, but it has the difficulty that image information as well as the key must be sent with every cypher image. In 2012, Wei et al. [11] introduced a novel colour image encryption technique based on DNA sequences and a hyper-chaotic system. They combined chaotic maps with DNA sequences to obtain a secure encryption scheme. The hyper-chaotic system they used is

\[
\left.
\begin{aligned}
\dot{x} &= a(y - x) \\
\dot{y} &= bx + cy - xz - w \\
\dot{z} &= xy - dz \\
\dot{w} &= x + k
\end{aligned}
\right\} \tag{6}
\]

The parameters of the system are a = 36, b = 16, c = 28, d = 3 and −0.7 ≤ k ≤ 0.7. The Lyapunov exponents of the system are λ1 = 1.552, λ2 = 0.023, λ3 = 0, λ4 = −12.573. The authors claimed that the system can endure exhaustive, statistical, and differential attacks. To enhance encryption security, Hong et al. [12] proposed the following hyper-chaotic system

\[
\left.
\begin{aligned}
\dot{x} &= a(y - x) + w \\
\dot{y} &= bx + cy - xz \\
\dot{z} &= xy - dz \\
\dot{w} &= yz + ew
\end{aligned}
\right\} \tag{7}
\]

In their work, they used two types of chaotic key: one generated from the logistic map and the other based on the hyper-chaotic system. These two types of keys were then used for encryption and gave compatible results. Tong et al. [13] used a new hyper-chaotic system based on the Rabinovich system. The system is given below

\[
\left.
\begin{aligned}
\dot{x} &= ay - bx + yz \\
\dot{y} &= ax - cy - xz \\
\dot{z} &= xy - dz + w^{2} \\
\dot{w} &= xy + ew
\end{aligned}
\right\} \tag{8}
\]

The constant parameters are given as a = 8.1, b = 4, c = −0.5, d = 1 and e = −2.2. The Lyapunov exponents λ1 = 1.090046, λ2 = 0.012243, λ3 = −3.105106, λ4 = −4.697183 meet the conditions for a hyper-chaotic system, but the positive Lyapunov exponents are not very large. Kumar et al. [14] proposed a 4D hyper-chaotic map based on the Lorenz system. To improve security, a user-defined 256-bit secret key and the plain image are used as the initial condition of the hyper-chaotic system for generating the key matrix, which is then used for encryption. This hyper-chaotic system is described as

\[
\left.
\begin{aligned}
\dot{x} &= a(y - x) \\
\dot{y} &= bx - y - xz + cw \\
\dot{z} &= x^{4} + y^{4} - dz \\
\dot{w} &= -ey
\end{aligned}
\right\} \tag{9}
\]

where a = 10, b = 46, c = 12, d = 8/3, e = 2. The Lyapunov exponents are λ1 = 0.60613, λ2 = 0.28066, λ3 = 0, λ4 = −11.489. The two positive Lyapunov exponents make the system hyper-chaotic. Zhang et al. [15] introduced the following hyper-chaotic system

\[
\left.
\begin{aligned}
\dot{x} &= a(y - x) + w \\
\dot{y} &= bx + cy - xz \\
\dot{z} &= x^{2} - dz \\
\dot{w} &= ex + fy
\end{aligned}
\right\} \tag{10}
\]

The parameter values for this system are a = 20, b = 14, c = 10.6, d = 2.8, e ∈ [1, 24], f = 0.1. Their target was to use the key-stream generated by the hyper-chaotic system to permute the pixel positions of the original image. For more secure encryption, Zhan et al. [16] used the following hyper-chaotic system together with a DNA encoding technique for image encryption. Their goal was to use the hyper-chaotic key throughout all the steps of the encryption algorithm. The system proposed in this work is given below

\[
\begin{aligned}
\dot{x} &= a(y - x) + bw \\
\dot{y} &= cx + dw - xz \\
\dot{z} &= xy - ez + fw \\
\dot{w} &= -gx
\end{aligned}
\tag{11}
\]


A. Pathak et al.

This system has parameter values a = 35, b = 1, c = 35, d = 0.2, e = 3, f = 0.3, g = 5. Singh et al. [17] proposed a new 5D hyper-chaotic system having two positive Lyapunov exponents, as given below.

\[
\begin{aligned}
\dot{x} &= a(y - x) + byz + v + w \\
\dot{y} &= -xz + cy + w \\
\dot{z} &= xy - dz - 4 \\
\dot{v} &= -xy \\
\dot{w} &= -ex
\end{aligned}
\tag{12}
\]

The constant parameter values of the system are a = 35, b = 30, c = 17, d = 0.78 and e = 12. The Lyapunov exponents are λ1 = 0.6892, λ2 = 0.1431, λ3 = 0, λ4 = −0.4238 and λ5 = −18.9440. This system is stable and hyper-chaotic in nature, which makes it suitable for image encryption. Generally, 5D or 6D chaotic systems are more complex compared to 4D systems. In the case of 6D systems, the number of positive Lyapunov exponents is greater than two, which makes the system more complex but also increases the execution time. To increase security by using a more efficient hyper-chaotic system, Kar et al. [18] proposed a new image encryption technique with 6D hyper-chaos. The system has four positive Lyapunov exponents, and it fulfils the condition of hyper-chaotic nature. This system is given below.

\[
\begin{aligned}
\dot{x} &= a(y - x) + u \\
\dot{y} &= bx - xz - y + v \\
\dot{z} &= xy - cz \\
\dot{u} &= du - xz \\
\dot{v} &= -ey \\
\dot{w} &= fy + gw
\end{aligned}
\tag{13}
\]

The constant parameters of this system are a = 10, b = 28, c = 8/3, d = 2, e = 8.4, f = 1 and g = 1. The six Lyapunov exponents of the system are λ1 = −2.667, λ2 = −22.6858, λ3 = 0.3260, λ4 = 1, λ5 = 2 and λ6 = 11.3599. Though this system is more hyper-chaotic than a 4D hyper-chaotic system, it imposes a higher computational load due to its higher dimension and greater complexity. To make the encryption process more powerful and more unpredictable, Zhang et al. [19] proposed the following six-dimensional hyper-chaotic system

\[
\begin{aligned}
\dot{x} &= -ax + b\left(\alpha + \beta w^{2}\right) \\
\dot{y} &= cx + dy - xz + v \\
\dot{z} &= -ez + x^{2} \\
\dot{u} &= fy + gu \\
\dot{v} &= -hx \\
\dot{w} &= y
\end{aligned}
\tag{14}
\]


where a = 0.3, b = 1, c = 8.5, d = −2.1, e = 1.5, f = −0.1, g = 0.9, h = 1, α = 1, β = 0.2 are the controlling parameters and x_i (i = 1, 2, ..., 6) are the state variables. The Lyapunov exponents for this system are λ1 = 7.340, λ2 = 0.087, λ3 = 0.006, λ4 = −0.368, λ5 = −1.349, λ6 = −67.426. To increase the encryption security and to decrease the encrypted data size, a sparse-representation-based scheme with a 4D hyper-chaotic system is proposed by Karmakar et al. [20]. In this encryption algorithm, the nonzero coefficients of the sparse matrix are encrypted using a key-stream generated by the hyper-chaotic system. The proposed hyper-chaotic system is given as

\[
\begin{aligned}
\dot{x} &= a(y - x) - bz \\
\dot{y} &= xz - cy \\
\dot{z} &= d - x - ez \\
\dot{w} &= fy - gw
\end{aligned}
\tag{15}
\]

where a, b, c, d, e, f, g are constant parameters having values 5, 0.01, 0.1, 20, 1, 20.8, 0.1, respectively. The Lyapunov exponents of the system are λ1 = 0.56, λ2 = 0.14, λ3 = 0, λ4 = −36.48, which confirms the hyper-chaotic nature. The sparse-based encryption mechanism provides better compression along with increased security. The security strength of an encryption mechanism is measured by different quality parameters like information entropy, number of pixels change rate (NPCR), unified average changing intensity (UACI), etc. These parameters are defined below.
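Before turning to those definitions, the key-stream generation mentioned for system (15) can be illustrated with a short sketch. It integrates the four equations with a classical fourth-order Runge-Kutta step and turns each state value into one byte. The step size, the initial state used in any call, and the quantization rule `int(abs(value) * 1e6) % 256` are assumptions made for illustration only; they are not taken from [20].

```python
# Parameters of system (15) as listed in the text.
A, B, C, D, E, F, G = 5.0, 0.01, 0.1, 20.0, 1.0, 20.8, 0.1


def system15(state):
    """Right-hand side of system (15) for a state (x, y, z, w)."""
    x, y, z, w = state
    return (
        A * (y - x) - B * z,
        x * z - C * y,
        D - x - E * z,
        F * y - G * w,
    )


def rk4_step(f, state, dt):
    """Advance `state` by one classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def keystream(f, state, n_bytes, dt=0.001):
    """Integrate the system and quantize every state value to one byte.

    The quantization rule is an illustrative assumption, not the rule of [20].
    """
    out = []
    while len(out) < n_bytes:
        state = rk4_step(f, state, dt)
        for value in state:
            out.append(int(abs(value) * 1e6) % 256)
    return out[:n_bytes]
```

A call such as `keystream(system15, (1.0, 1.0, 1.0, 1.0), 64)` returns 64 integers in the range 0 to 255. In a real scheme the initial state would be derived from the secret key, and an initial transient would normally be discarded.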

3.1 Entropy

The randomness of a definite system is measured by the entropy parameter. The randomness amongst the pixel values of the cypher image with respect to an original image provides the measure of entropy. For an 8-bit representation of the pixel values of an image, the maximum entropy is 8. An entropy nearer to this value ensures better randomness. If e represents the entropy of the cypher image c, then it can be found using (16).

\[
e(c) = -\sum_{i=0}^{255} p(c_i)\,\log_2 p(c_i)
\tag{16}
\]

where p(c_i) represents the probability that c_i appears.
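As a minimal sketch of Eq. (16), the function below estimates p(c_i) from the histogram of the pixel values and sums over the values that actually occur. It assumes the image is supplied as a flat sequence of 8-bit integers; the function name is illustrative.

```python
from collections import Counter
from math import log2


def image_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of 8-bit pixel values."""
    total = len(pixels)
    counts = Counter(pixels)
    # Values that never occur contribute nothing to the sum in Eq. (16).
    return -sum((n / total) * log2(n / total) for n in counts.values())
```

An image in which all 256 grey levels are equally frequent reaches the maximum of 8 bits, while a constant image has zero entropy.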

3.2 NPCR and UACI

NPCR measures the rate of variation between two encrypted images: one is generated without tampering, and the other is generated after changing or incrementing at least one pixel value of the same image. UACI measures the average difference between the corresponding pixel intensities of the encrypted images before and after the shifting. NPCR and UACI are calculated using Eqs. (17) and (18), respectively. Let I^{bs} and I^{as}, each of size m × n, be the cypher images before and after the pixel shifting operation. Then

\[
\mathrm{NPCR} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} \delta_{ij} \times 100\%
\tag{17}
\]

where (i, j) gives the position of the pixel and δ_{ij} indicates the variation:

\[
\delta_{ij} =
\begin{cases}
0, & I^{bs}_{ij} = I^{as}_{ij} \\
1, & I^{bs}_{ij} \neq I^{as}_{ij}
\end{cases}
\]

\[
\mathrm{UACI} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} \frac{\left| I^{bs}_{ij} - I^{as}_{ij} \right|}{255} \times 100\%
\tag{18}
\]

Table 1 Comparisons of information entropy, NPCR and UACI for hyper-chaos-based encryption

Encryption algorithm               Entropy   NPCR (%)   UACI (%)
Zhu [10], 4D system                7.9977    99.6143    33.51
Zhang et al. [15], 4D system       7.9993    99.7350    33.47
Karmakar et al. [20], 4D system    7.9980    99.64      33.56
Kar et al. [18], 6D system         7.9975    99.64      33.56
Zhang et al. [19], 6D system       7.9972    99.6450    33.6216

According to [20], the maximum theoretical value of NPCR is 99.6%, and that of UACI is 33%. For a superior encryption mechanism, the values of NPCR and UACI should be approximately equal to these theoretical values. The values of these parameters for the encrypted image, obtained with several methods for the input image Lena, are presented in Table 1. The hyper-chaotic system used in each case is also indicated in the same table. These cryptanalysis results are within the predicted values according to [21].
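Equations (17) and (18) can be sketched in a few lines. The function below assumes both cypher images are supplied as equally sized nested lists of 8-bit integers; the function and variable names are illustrative.

```python
def npcr_uaci(before, after):
    """Return (NPCR, UACI) in percent for two equally sized 8-bit images."""
    m, n = len(before), len(before[0])
    changed = 0
    intensity_diff = 0
    for row_b, row_a in zip(before, after):
        for p_b, p_a in zip(row_b, row_a):
            changed += p_b != p_a             # delta_ij of Eq. (17)
            intensity_diff += abs(p_b - p_a)  # numerator of Eq. (18)
    npcr = 100.0 * changed / (m * n)
    uaci = 100.0 * intensity_diff / (255 * m * n)
    return npcr, uaci
```

For two 2 × 2 images that differ only in one pixel, by the full range of 255, both measures evaluate to 25%.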

4 Discussion and Conclusion

Different hyper-chaotic systems are discussed here along with their constant parameters and Lyapunov exponent values. On the basis of the largest Lyapunov exponent, some systems are superior to others. However, for faster and better encryption, a larger Lyapunov exponent together with lower complexity is preferred. New 5D and 6D hyper-chaotic systems are also discussed here, which can be utilized efficiently in image encryption in the near future for better security.

References

1. Shannon, C.E.: Communication theory of secrecy systems. Bell Syst. Tech. J. 28, 656–715 (1949)
2. Dachselt, F., Schwarz, F.: Chaos and cryptography. IEEE Trans. Circ. Syst. I Fundam. Theor. Appl. 48(12), 1498–1509 (2001)
3. Mandal, M.K., Banik, G.D., Chattopadhyay, D., Nandi, D.: An image encryption process based on chaotic logistic map. IETE Tech. Rev. 29(5), 395–404 (2012)
4. Zhan, K., Wei, D., Shi, J., Yu, J.: Cross-utilizing hyperchaotic and DNA sequences for image encryption. J. Electron. Imaging 26(1), 013021 (2017)
5. Tong, X., Liu, Y., Zhang, M., Xu, H., Wang, Z.: An image encryption scheme based on hyperchaotic Rabinovich and exponential chaos maps. Entropy 17(1), 181–196 (2014)
6. Rössler, O.E.: An equation for continuous chaos. Phys. Lett. A 57(5), 397–398 (1976)
7. Gao, T., Chen, Z.: A new image encryption algorithm based on hyper-chaos. Phys. Lett. A 372(4), 394–400 (2008)
8. Wang, Z., Huang, X., Li, Y.X., Song, X.N.: A new image encryption algorithm based on the fractional-order hyperchaotic Lorenz system. Chin. Phys. B 22(1), 010504 (2013)
9. Gangahar, C.H., Rao, K.D.: Hyperchaos based image encryption. Int. J. Bifurcation Chaos 19(11), 3833–3839 (2009)
10. Zhu, C.: A novel image encryption scheme based on improved hyperchaotic sequences. Opt. Commun. 285(1), 29–37 (2012)
11. Wei, X., Guo, L., Zhang, Q., Zhang, J., Lian, S.: A novel color image encryption algorithm based on DNA sequence operation and hyper-chaotic system. J. Syst. Softw. 85(2), 290–299 (2012)
12. Li-Hong, L., Feng-Ming, B., Xue-Hui, H.: New image encryption algorithm based on logistic map and hyper-chaos. In: 2013 International Conference on Computational and Information Sciences, IEEE, pp. 713–716 (2013)
13. Tong, X., Liu, Y., Zhang, M., Xu, H., Wang, Z.: An image encryption scheme based on hyperchaotic Rabinovich and exponential chaos maps. Entropy 17(1), 181–196 (2014)
14. Kumar, A., Kar, M., Mandal, M.K., Nandi, D.: Image encryption using four-dimensional hyper chaotic Lorenz system. Elixir Elec. Eng. 87, 41904–41909 (2016)
15. Zhang, X., Nie, W., Ma, Y., Tian, Q.: Cryptanalysis and improvement of an image encryption algorithm based on hyper-chaotic system and dynamic S-box. Multimedia Tools Appl. 76(14), 15641–15659 (2017)
16. Zhan, K., Wei, D., Shi, J., Yu, J.: Cross-utilizing hyperchaotic and DNA sequences for image encryption. J. Electron. Imaging 26(1), 013021 (2017)
17. Singh, J.P., Rajagopal, K., Roy, B.K.: A new 5D hyperchaotic system with stable equilibrium point, transient chaotic behaviour and its fractional-order form. Pramana 91(3), 1–10 (2018)
18. Kar, M., Kumar, A., Nandi, D., Mandal, M.K.: Image encryption using DNA coding and hyperchaotic system. IETE Tech. Rev. 37(1), 12–23 (2020)
19. Zhang, D., Chen, L., Li, T.: Hyper-chaotic color image encryption based on transformed zigzag diffusion and RNA operation. Entropy 23(3), 361 (2021)
20. Karmakar, J., Nandi, D., Mandal, M.K.: A novel hyper-chaotic image encryption with sparse-representation based compression. Multimedia Tools Appl. 79(37), 28277–28300 (2020)
21. Fu, C., Chen, J., Zou, H., Meng, W., Zhan, Y., Yu, Y.: A chaos-based digital image encryption scheme with an improved diffusion strategy. Opt. Express 20, 2363–2378 (2012)

Object Detection Using Mask R-CNN on a Custom Dataset of Tumbling Satellite

P. C. Anjali, Senthil Kumar Thangavel, and Ravi Kumar Lagisetty

P. C. Anjali · S. K. Thangavel (B)
Department of Computer Science and Engineering, Amrita University, Coimbatore, India

R. K. Lagisetty
Mission Simulation Group, U R Rao Satellite Centre, ISRO Bangalore Platform and Specifition, Bengaluru, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_34

1 Introduction

Advancements in computer vision have led to developments in image processing and analysis across various applications [1–4]. Operations like noise filtering, image segmentation, feature detection, feature identification, and feature matching are performed on images to analyze the hidden information they contain. Applications [5] of image processing include building image search, image captioning, and finding similar words in text images using image datasets. Object detection is widely used in applications like pose estimation, surveillance, and vehicle detection.

The region-based convolutional neural network (R-CNN) [6, 7] is an object detection model that was introduced to handle the problems of working with a large number of selected regions in the classification task. This model uses the selective search algorithm to extract 2000 region proposals for the object that needs to be detected. As a result, only those 2000 proposed regions need to be classified. However, the R-CNN model takes a long time to train because 2000 region proposals must be classified per image. Also, no learning happens in a fixed algorithm like selective search, which might lead to the generation of bad candidate region proposals.

Later came the Fast R-CNN [6, 7] model, which is much faster than R-CNN because it does not feed the initial 2000 region proposals to the convolutional neural network each time. Instead, the convolution operation is done only once per image, and a feature map is generated from it. However, the bottleneck here was the region proposals, which affected the performance of the model. To solve this issue, Faster R-CNN was introduced, in which the authors replaced selective search with a network that learns the region proposals. Earlier, selective search was applied to the feature map output from the convolutional neural network to get region proposals. Now, a separate network is built to learn these regions, which are reshaped by region of interest (RoI) pooling and then used to classify the image within the proposed region and to predict the corresponding bounding box offset values.

Then came the Mask R-CNN [8] model, which is essentially a Faster R-CNN with a fully convolutional network (FCN) applied to the regions of interest (RoIs). In this model, a mask is applied to the detected object in the image along with the classification and bounding box operations. The main motivation for this model was to improve RoI pooling, which was specific to the object detection task and broke pixel-to-pixel alignment when extended to image segmentation. Here, image segmentation [9] is a method in which a given digital image is divided into subgroups called image segments in order to reduce the complexity of analyzing the information present in it and to make further processing simpler. Our paper focuses on performing object detection with the Mask R-CNN model and exploiting the segmentation technique to understand the custom satellite components.
Image segmentation is basically of two types: instance segmentation [10] and semantic segmentation. Semantic segmentation is the task of grouping together parts of an image that belong to the same object class, whether or not they are near each other. It is considered a form of pixel-level prediction because each pixel in an image is classified according to a category. In instance segmentation, by contrast, each distinct object of interest appearing in a given image is detected and delineated. Object detection [11] is important in remote and uncooperative scenarios because it helps in avoiding collisions and other hindrances to smooth operation. Similarly, it helps in detecting non-cooperative satellite components from images.

In our paper, one input to the Mask R-CNN object detection model is the set of images of a tumbling satellite, obtained by capturing the rotation of the satellite using a 3-axis motion simulator. We chose a tumbling satellite because we are interested in extending this object prediction model to a pose estimator that can predict the rotation of the tumbling satellite when the model is provided with a satellite rotation video. With this in mind, we are making our dataset flexible enough for both object detection and pose estimation.

Object tracking is closely related to object detection, but it is used mainly with videos in real-time applications. Many applications make use of object tracking in tasks like pedestrian detection, traffic signals, etc. Object tracking [12] is a deep learning application in which the program takes an initial set of object detections, creates a unique identifier for each of those initial detections, and then tracks the detected objects as they appear in the video frames. These algorithms also provide a tracking box around each detected object to locate it. The input to a tracking algorithm can be an image, a video file, a real-time video, or a pre-recorded video, and the choice affects the performance of the algorithm. As a result, the tracking algorithm must be chosen with respect to the input it is going to process.

Our chosen model, Mask R-CNN, is designed for object detection-related tasks. It is also used in object localization, object tracking, instance segmentation, etc. Along with object detection, the model also fetches the spatial information present inside the detected object in a given image by assigning a binary mask to it. We then perform corner detection on the masked image to obtain meaningful object points that can be tracked in the remaining frames with the help of the SORT tracker. This mask assignment is the main difference between Faster R-CNN and Mask R-CNN. We discuss the Mask R-CNN model in detail in the methodology section. In the next section, we go through papers that discuss pose estimation implemented with object detection and object tracking as key steps in determining the pose of a non-cooperative satellite. We also go through papers that discuss implementations using different object tracking algorithms.

2 Literature Review

In paper [13], a vision-based pose estimation model is built using a deep convolutional neural network. The model, a modified pre-trained GoogLeNet that was repurposed for this task, works well on a synthetic dataset. The dataset was created from Unreal Engine 4 renderings of the Soyuz spacecraft. The convolutional neural network learns the correlations between the spacecraft's six-degrees-of-freedom parameters from the data samples.

In paper [14], the authors proposed a novel design and verification method that uses a monocular approach to determine position in space applications. A key contribution of this work is a position determination system based on convolutional neural networks (CNNs) that provides initial real-time estimates on board. Their approach splits the pose space and trains the CNN model with images aligned with the pose labels. The parameters of the CNN are estimated before the mission with synthetic imagery, because obtaining a satisfactory result from training the neural network demands a massive image dataset and computational resources.

In paper [15], a method to estimate the pose of a satellite using a residual neural network is suggested, together with a practical way to reduce overfitting of the model. Initially, the author experimented with a CNN and studied the mechanism for pose estimation. Later, the same was implemented efficiently using ResNet with mechanisms for reducing overfitting.

In paper [16], the author proposes a new framework comprising a convolutional neural network (CNN) for feature detection, a Covariant Efficient Procrustes Perspective-n-Points (CEPPnP) solver, and an extended Kalman filter (EKF) to create a robust method for monocular pose estimation in the vicinity of an unmanned spacecraft. The main contribution of this paper was in analyzing statistical information with the help of an image processing task. This was achieved by associating a covariance matrix with the heatmaps returned by the CNN for each of the detected features in the image.

In paper [17], the authors proposed a mechanism to detect various components of a satellite by analyzing the characteristics of satellite images in space. This approach was based on a region-based convolutional neural network (R-CNN) and was able to detect different satellite components accurately with the help of optical images. They compared the proposed model with one implemented using R-CNN. The results showed that the proposed model was more efficient and gave better results in terms of F1-score and accuracy.

In paper [18], the author proposes a method based on long short-term memory (LSTM) with multi-modality and adjacency constraints for image segmentation of brain images, named LSTM-MA. The suggested model employs two ways of generating feature sequences: features with pixel-wise adjacency constraints and features with superpixel-wise adjacency constraints. The segmentation result is obtained by classifying these generated features into semantic labels with the help of the LSTM model.
The experiments performed on the BrainWeb and MRBrainS datasets demonstrated that the proposed model with the pixel-wise adjacency constraint achieved promising segmentation results, while with the superpixel-wise adjacency constraint the model showed computational efficiency along with robustness to noise.

In paper [19], a new approach for introducing the level of unconfirmed video segment is introduced. This method helped to obtain a reduced set of object proposals that covered a diverse set of items in the video. The paper used a convolutional-recurrent network framework to learn the score functions from the video segments. It also applied a determinantal point process (DPP) to the results to sample a well-defined and logical set of objects from the video.

Paper [20] is based on a stacked RNN deep learning model that is used for the automatic prediction and classification of diseases in melons. This paper presents the correlations between watermelon yield and the proposed disease prediction indices in the Indian largest watermelon-exporting form. Image processing techniques were used for distinguishing the illness in melon plants, and a stacked RNN was employed for the classification of the diseases. The model was developed using the TensorFlow and Keras frameworks, with Adam as the optimization algorithm and ReLU and softmax as the activation functions.

In paper [21], the authors proposed a method to handle occlusion based on a recurrent neural network (RNN). This method reconstructs missed detection boxes, which helps to preserve the ID number of targets after occlusion by using the predicted detections in the next frames.

Paper [22] discusses how a generative adversarial network (GAN) was successfully utilized to remove the image-wise color distortion of images captured by diffractive optics. The method shows good image quality for a test sample. However, the GAN produced some reconstruction artifacts for real-scene images. The paper studies the nature of these artifacts and shows how overexposure and cross-like markers affect their occurrence.

In paper [23], a new model was built with a generative adversarial network (GAN) for lung CT image enhanced-quality (EQ). The proposed EQ-GAN model was trained with the help of a high-quality lung CT cohort to recover the visual quality of scans that may be degraded by blur and noise. This method has paved the way to use EQ as one of the preprocessing techniques for lung lobe segmentation. Further research was also conducted to evaluate the impact of EQ in adding robustness to airway and vessel segmentation and to investigate anatomical details that are revealed in EQ scans.

In paper [24], a parallel binocular vision-based method was proposed to estimate the relative pose of a non-cooperative satellite. It made use of the line features of the target for feature extraction. In [25], a study on RGB-D camera pose estimation using deep learning techniques was done. The network architecture proposed in this paper is built with two main components: a convolutional neural network (CNN) that exploits the vision information, and a long short-term memory (LSTM) unit that incorporates the temporal information.

Paper [26] introduces an improved Deep SORT YOLOv3 architecture for object tracking. This paper made two main contributions; the first handles the identity switches made by the YOLO object detection model by utilizing the Dlib tracker.
The second contribution was in improving the operational speed by passing each detected object to Deep SORT as soon as it is detected in the image. Paper [27] gives an improved version of the SORT object tracker that integrates the appearance information of the tracked object. This lets the algorithm track objects through longer periods of occlusion, thus reducing the number of identity switches. Paper [28] proposes FairMOT, a method for efficient detection and tracking of objects that builds on the anchor-free object detection architecture CenterNet. This new model was mainly designed to handle the inefficient re-ID assignment performed after the initial detections of the objects in multiple object tracking (MOT) tasks.

3 Methodology

Since we are using Mask R-CNN as the object detector in the SORT tracker, the dataset is prepared in accordance with this model. Once the tracker detects the satellite object in the first frame, the object is cropped and given to the Harris corner detector to obtain the object points. We then try to track these points in the subsequent frames so that they can be used for tasks like pose estimation of the object with Perspective-n-Point methods.
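The way a tracker keeps an identifier attached to a detection from frame to frame can be illustrated with a small sketch. SORT itself combines Kalman-filter motion prediction with Hungarian assignment on the intersection over union (IoU) of boxes; the code below replaces both with greedy IoU matching against the last known box, so it is a simplification under stated assumptions and not the tracker configuration used in this work. The `(x1, y1, x2, y2)` box format and the threshold of 0.3 are also assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, threshold=0.3):
    """Greedily match track IDs to detections by highest IoU.

    `tracks` maps track ID -> last known box.
    Returns a dict mapping track ID -> index into `detections`.
    """
    pairs = sorted(
        (
            (iou(box, det), tid, j)
            for tid, box in tracks.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches, used = {}, set()
    for score, tid, j in pairs:
        if score < threshold:
            break  # remaining pairs overlap too little to be the same object
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    return matches
```

Detections left unmatched would start new tracks with fresh identifiers, and tracks left unmatched for several frames would be dropped.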


The entire workflow of the project is shown in Fig. 1 with the help of block diagrams for each specific task. The tasks are categorized as:

• Video frames generation.
• Creation of annotation file using the VGG Image Annotator (VIA) 2.0 tool.
• Training the Mask R-CNN model.
• Testing the trained model.
• Object tracking using SORT.
• Cropping the satellite object.
• Corner detection using the Harris corner detector.
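The cropping task in the list above can be sketched as follows. The code assumes the image and the binary mask are equally sized nested lists, and it zeroes the pixels outside the mask before cropping; both choices are illustrative assumptions, and the helper names are hypothetical.

```python
def mask_bounding_box(mask):
    """Return (top, left, bottom, right) of the nonzero region, or None if empty."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    if not rows:
        return None
    # bottom and right are exclusive bounds, as in Python slicing
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop_object(image, mask):
    """Crop `image` to the bounding box of `mask`, zeroing pixels outside the mask."""
    box = mask_bounding_box(mask)
    if box is None:
        return []
    top, left, bottom, right = box
    return [
        [image[i][j] if mask[i][j] else 0 for j in range(left, right)]
        for i in range(top, bottom)
    ]
```

The cropped patch is what would then be handed to the corner detector.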

Fig. 1 Procedure flowchart


Fig. 2 Video frames generation

Fig. 3 Annotation file generation for both training and validation images using VGG annotation tool of version 2

3.1 Video Frames Generation

The video frames are created using a simple Python script that uses the OpenCV package. We keep every fifth frame of the video generated in a second. Figure 2 depicts the steps followed to create the video frames for the dataset.
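A minimal sketch of such a frame extraction script is given below. It is not the authors' code: the function names, the output file naming, and the JPEG format are assumptions. The OpenCV import is placed inside the function so that the index helper can be used without OpenCV installed.

```python
import os


def selected_frame_indices(total_frames, step=5):
    """Indices of the frames kept when every `step`-th frame is saved."""
    return list(range(0, total_frames, step))


def extract_frames(video_path, out_dir, step=5):
    """Save every `step`-th frame of a video as a JPEG; return the number written."""
    import cv2  # requires the opencv-python package

    os.makedirs(out_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    index = written = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of the video or read failure
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:05d}.jpg"), frame)
            written += 1
        index += 1
    capture.release()
    return written
```

With `step=5`, a 12-frame clip yields frames 0, 5 and 10.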

3.2 Creating Annotation File

We used version 2 of the VGG Image Annotator (VIA), the annotation tool of the Visual Geometry Group (VGG), to annotate the required objects in the frames as satellite and non-satellite bodies. The steps followed are depicted in Fig. 3. The satellite object that we are going to detect is shown in Fig. 4, with annotations made using the VIA tool.
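To show how such an annotation file might be consumed, the sketch below parses polygon regions from a JSON export. The layout it expects (one entry per image with `filename` and a `regions` list, each region holding `shape_attributes` with `all_points_x` and `all_points_y`) is assumed from VIA 2.x JSON exports, and the attribute key `name` that holds the class label is a hypothetical choice, since that key is defined by the annotator.

```python
import json


def load_via_polygons(json_text, label_key="name"):
    """Parse a VIA-style JSON export into {filename: [(label, [(x, y), ...]), ...]}."""
    annotations = json.loads(json_text)
    result = {}
    for entry in annotations.values():
        polygons = []
        for region in entry.get("regions", []):
            shape = region["shape_attributes"]
            if shape.get("name") != "polygon":
                continue  # this sketch handles polygon regions only
            points = list(zip(shape["all_points_x"], shape["all_points_y"]))
            label = region.get("region_attributes", {}).get(label_key)
            polygons.append((label, points))
        result[entry["filename"]] = polygons
    return result
```

Each polygon can then be rasterized into the binary mask that a Mask R-CNN training pipeline expects for the corresponding class.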

3.3 Training Mask R-CNN

Any task involving neural networks goes through the unavoidable step of training the chosen model. In our paper, to perform object detection on a custom dataset of tumbling satellite images, we employed the most advanced R-CNN model, Mask R-CNN. Before moving to the Mask R-CNN model, we give a short introduction to the basic R-CNN model to make the concept of Mask R-CNN easier to understand.

Fig. 4 Objects that are detected using Mask R-CNN model

Region-Based Convolutional Neural Network (R-CNN)

R-CNN is a family of machine learning models designed for computer vision and image processing tasks. These models are designed specifically for object detection, where the goal is to detect the objects for which they are trained in any given image and to define boundaries around them. The input image given to the R-CNN model, depicted in Fig. 5, undergoes a process called selective search to extract regions of interest. Each region of interest is highlighted by a rectangular boundary. In the first R-CNN model, there were more than 2000 regions of interest for detecting objects in the image. Each region of interest passes through the CNN to produce the underlying features. These features are then passed to a support vector machine (SVM) stage to classify the objects present in the region of interest.

Mask R-CNN

Mask R-CNN is an extended version of Faster R-CNN with image segmentation as an added feature. The steps involved in Mask R-CNN are illustrated in Fig. 6.


Fig. 5 R-CNN architecture

Fig. 6 Mask R-CNN

The Mask R-CNN model mainly consists of:

• A layer to perform the region of interest alignment operation.
• Classification layer.
• Regression layer.
• Segmentation layer.
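The region of interest alignment layer in the list above avoids the coordinate rounding of RoI pooling by sampling the feature map at fractional positions with bilinear interpolation. The sketch below shows only that sampling step on a single-channel feature map stored as nested lists; the function name is illustrative, and the coordinates are assumed to lie inside the map.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map at a fractional location (y, x)."""
    h, w = len(feature), len(feature[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    # Clamp the upper neighbours so that points on the last row/column stay valid.
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0][x0] + dx * feature[y0][x1]
    bottom = (1 - dx) * feature[y1][x0] + dx * feature[y1][x1]
    return (1 - dy) * top + dy * bottom
```

For the 2 × 2 map `[[0, 10], [20, 30]]`, sampling at the centre `(0.5, 0.5)` gives the mean of the four cells, whereas rounding the coordinates first would return a single cell value and shift the feature away from its true position.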

The layers that perform the RoIAlign operation and the bounding box assignment form the fully connected network of the Mask R-CNN model. The model takes the dataset images and an annotated JSON file for training, as shown in Fig. 7. The output is the trained weights for each epoch; here, we train the Mask R-CNN model for a total of 10 epochs.

Fig. 7 Block diagram for training the Mask R-CNN model

Hyperparameters

Since we are using a CPU for training the Mask R-CNN model, the images-per-GPU parameter is set to 1. The training output of Mask R-CNN is shown in Fig. 8. Some of the important parameters used for training the model are listed in Table 1.

Fig. 8 Training output of Mask R-CNN model with dataset

Table 1 Hyperparameters
Parameters: Weight decay, Learning rate, Batch size, Activation function, Learning momentum, Optimizer, Training set
Values: 240, 240, 480, 0.9, SGD

Backbone

The Mask R-CNN model we employ uses ResNet101 as the backbone. Other ResNet versions, like ResNet50 and ResNet18, are also available. ResNet101 is a convolutional neural network with a depth of 101 layers. A version of the model pre-trained on the ImageNet database is also available [29]. This pre-trained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a large collection of images. The network is designed for input images of size 224-by-224.

The name ResNet [30] stands for residual network. This network was designed and implemented in 2015 by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, who published their findings in the 2015 computer vision research paper titled ‘Deep Residual Learning for Image Recognition’. The ResNet classification model took first place, with a top-5 error rate of only 3.57%, in the classification competition of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2015), which evaluates algorithms for object localization, object detection, and image or scene classification at a large scale. The model also came first in detection and localization on the ImageNet dataset, as well as in detection and segmentation on the COCO dataset, in the ILSVRC and COCO competitions held in 2015.

The ResNet model has many variants that follow the same concept and differ in their number of layers. ResNet50 denotes the variant with 50 neural network layers, while ResNet101 is the variant that is 101 layers deep. ResNet models are employed as deep networks in various computer vision tasks. The main reason for stacking more neural network layers is that it helps to solve more difficult and complex tasks by using each group of layers to solve a specific sub-task. In this way, it helps to obtain a more efficient and satisfying result for the underlying problem.
But, the problem with deep stacked layers is the issue of degradation. That means, as the number of layers in the neural network increases, their accuracy in the operations performed will get saturated, and it will gradually degrade after a point. This will automatically affect the performance of the model in training as well as testing the data. This is not caused because of overfitting. But, it can be due to the network initialization parameters, function employed for optimization, or, can be due to vanishing or exploding gradients. ResNet model was introduced to handle this specific problem of accuracy degradation. Deep residual nets make use of residual blocks to improve the accuracy of the models. The strength or key part of residual blocks is the skip-connections that make the neural network efficient. There are two main ways these connections work. Firstly, they simplified the vanishing gradient problem by building a different path for the gradient to pass through. Along with that, the model was modified to learn

390

P. C. Anjali et al.

an identity function, which made both the higher and lower layers perform similarly. With this introduction of residual blocks, the learning of identity function became easier. This improved the deep neural network efficiency at the same time reducing the error percentage. With the introduction of skip connections, training in complex deep networks became possible.
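The role of the skip connection can be illustrated with a minimal sketch. It is a simplification that assumes dense layers instead of the convolution and batch-normalization layers of the real ResNet; the function name `residual_block` and the weight matrices are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU in between."""
    f = relu(x @ w1) @ w2   # residual branch F(x)
    return relu(f + x)      # skip connection adds the input back

# With zero weights the residual branch vanishes, so the block
# reduces to the identity mapping (for non-negative inputs).
x = np.array([[1.0, 2.0, 3.0]])
zeros = np.zeros((3, 3))
out = residual_block(x, zeros, zeros)
```

Because the block only has to learn the difference F(x) from the identity, driving the weights toward zero is enough to pass the input through unchanged, which is the sense in which learning the identity function becomes easier.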

3.4 Testing
The trained Mask R-CNN model is tested on a new image that is present in neither the training set nor the test set, on which the object detection task is performed. We keep a separate folder for test sample images. The control flow of the testing steps is shown in Fig. 9. A detailed explanation of the testing and training procedures is given in the experiment section. A sample output obtained with the Mask R-CNN model is shown in Fig. 10.

Fig. 9 Steps for testing the trained Mask R-CNN model

Fig. 10 Mask R-CNN detection result


3.5 Object Tracking with SORT
The input to the object detection model is a video file capturing the rotation of a noncooperative satellite. Hence, we need an algorithm that is specifically designed for tracking objects in videos. SORT [31], which stands for simple online and realtime tracking, is designed for multiple object tracking in videos. Its main focus is to associate objects efficiently in online and real-time applications. Its main advantage is that it can be fed with the detection results of any object detection algorithm, such as Faster R-CNN [32], YOLOv4 [33], or Mask R-CNN, which it then uses to track the object across the video frames. The SORT tracker was initially built on the detection results of the Faster R-CNN model. The paper [31] states that the performance of the algorithm can be improved by up to 18.9% with an efficient object detection model. The tracker is provided with the detections of the previous and current frames to locate the object in the coming frames. The correction information comes from associating objects in two adjacent frames [34]. SORT was designed primarily for pedestrian tracking as part of multiple object tracking. One of the frames with the tracked satellite object is shown in Fig. 11. We provide the detection results of Mask R-CNN to the SORT tracker, which generates a tracking id for each tracked object in the video frame and then displays the tracking box with the associated id. The detection result passed to the SORT tracker has three components: the bounding box, the class id, and the confidence score of the object detected by the Mask R-CNN model. The input to the SORT tracker can also be a detection text file that contains the detection results in MOT [35] format, namely, a CSV text file with one object instance per line. Each instance must contain 10 values: , , , , , , , , , and . The conf_score indicates the detection confidence in the det.txt file and acts as a flag for deciding whether the instance is to be considered. According to the MOT challenge, a zero confidence score means that the instance is ignored in the evaluation, while any other value indicates that it is considered.

Fig. 11 Tracking window with id
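A detection file of this kind can be read with a short parser. The sketch below assumes the column order commonly used by the MOT Challenge (frame, id, box left, box top, box width, box height, confidence, and three world coordinates); this order, the `Detection` type, and the sample values are assumptions and are not taken from the paper.

```python
from typing import List, NamedTuple

class Detection(NamedTuple):
    frame: int
    track_id: int
    left: float
    top: float
    width: float
    height: float
    conf: float

def parse_mot_line(line: str) -> Detection:
    """Parse one CSV line with 10 values (assumed MOT Challenge column order)."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 10:
        raise ValueError(f"expected 10 values, got {len(fields)}")
    frame, track_id = int(float(fields[0])), int(float(fields[1]))
    left, top, width, height, conf = map(float, fields[2:7])
    return Detection(frame, track_id, left, top, width, height, conf)

def load_detections(text: str) -> List[Detection]:
    """Keep only instances whose confidence is non-zero."""
    parsed = [parse_mot_line(l) for l in text.splitlines() if l.strip()]
    return [d for d in parsed if d.conf != 0]

# Hypothetical file contents: the second instance has zero confidence.
sample = "1,-1,100.0,50.0,40.0,80.0,0.98,-1,-1,-1\n1,-1,10,10,5,5,0,-1,-1,-1\n"
detections = load_detections(sample)
```

Dropping zero-confidence lines mirrors the rule stated above that such instances are ignored in the evaluation.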

3.6 Crop the Satellite Object
Once the satellite object is tracked with the SORT tracker, we crop it using the ROI values of the corresponding class obtained from Mask R-CNN. This focuses the processing on the target object and yields proper object points when the crop is fed to the corner detector in the next step. A sample output is shown in Fig. 12.

Fig. 12 Cropped satellite object
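The cropping step amounts to array slicing. The sketch below assumes that the ROI is given as (y1, x1, y2, x2) in pixel coordinates, which is a common convention in Mask R-CNN implementations but is not stated in the paper; the helper name `crop_roi` and the frame size are hypothetical.

```python
import numpy as np

def crop_roi(image, roi):
    """Return the image patch inside roi = (y1, x1, y2, x2), clipped to the image."""
    y1, x1, y2, x2 = (int(v) for v in roi)
    height, width = image.shape[:2]
    y1, y2 = max(0, y1), min(height, y2)
    x1, x2 = max(0, x1), min(width, x2)
    return image[y1:y2, x1:x2].copy()

# Hypothetical 100 x 200 RGB frame and ROI.
frame = np.zeros((100, 200, 3), dtype=np.uint8)
patch = crop_roi(frame, (10, 20, 60, 120))
```

Clipping the ROI to the image bounds avoids empty or wrapped slices when a predicted box extends past the frame border.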


3.7 Corner Detection Using Harris Corner Detector
‘Harris corner detection algorithm [36] was developed to identify the internal corners of an image. The corners of an image are basically identified as the regions in which there are variations in large intensity of the gradient in all possible dimensions and directions. Corners extracted can be a part of the image features, which can be matched with features of other images and can be used to extract accurate information. Harris corner detection is a method to extract the corners from the input image and to extract features from the input image’. Figure 13 shows the corner points detected in the cropped input image by the Harris corner algorithm. As discussed above, the corner points are detected based on the change in intensity of the gradients.
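The gradient-based criterion described above can be sketched with the standard Harris response R = det(M) - k * trace(M)^2, where M is the local structure tensor. This is a plain NumPy illustration under assumed parameters (k = 0.04, a 3 x 3 summation window, a synthetic test image); it is not the implementation used in the paper.

```python
import numpy as np

def box_filter(a, radius=1):
    """Sum of each (2r+1) x (2r+1) neighbourhood, using zero padding."""
    k = 2 * radius + 1
    padded = np.pad(a, radius, mode="constant")
    out = np.zeros_like(a, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(gray, k=0.04, radius=1):
    """Harris corner response for a 2-D grayscale image."""
    gray = gray.astype(float)
    iy, ix = np.gradient(gray)            # gradients along rows and columns
    sxx = box_filter(ix * ix, radius)     # structure tensor entries
    syy = box_filter(iy * iy, radius)
    sxy = box_filter(ix * iy, radius)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# Synthetic image: a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
response = harris_response(img)
peak_y, peak_x = np.unravel_index(np.argmax(response), response.shape)
```

The response is positive where the gradient varies strongly in both directions (corners), negative along straight edges, and zero in flat regions, so thresholding it yields the corner points.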

4 Experiments
The test environment for capturing the demo satellite rotation was set up at the ISRO Bangalore office. The rotation video was captured using a 3-axis motion simulator and a PeopleLink i5 Plus Full HD 1080p webcam (3-megapixel camera model). The rotation videos were then converted into frames. Following the Pareto principle, we divided the dataset into a training set and a validation set in an 80:20 ratio. For each frame in the training and validation sets, the object to be detected, that is, the satellite, was annotated using version 2 of the VGG annotation tool. The annotation file was exported in JSON format and placed inside the respective folders. Once the required classes are added according to the format of Mask R-CNN and the config parameters are set, we can train the model with the available pre-trained weights. For initial training, we use a pre-trained weight file that was trained on a large dataset, MS COCO. We train the model for 10 epochs. Once the training is done, we obtain ten weight files in our logs folder, with which we can proceed to test the Mask R-CNN model on our dataset. Sample images from our dataset are shown in Fig. 14. For testing, we load the model in inference mode with the latest trained weight file. We initially test single images from our validation set and then compare them with an image from the test set. By comparing the images, we can check whether the Mask R-CNN model can perform the object detection task on a custom dataset. Once the result is satisfactory, we test the model on the tumbling satellite video. The dataset details are given in Table 2. Here, Class-1 indicates the spacecraft object in the image, and Class-2 indicates a non-satellite body. Histograms for the Conv2D, Dense, and Conv2DTranspose layers are shown in Figs. 15 and 16.

Fig. 13 Corner detection using Harris corner algorithm

Fig. 14 Sample images of tumbling satellite in the custom dataset used for training the models

Table 2 Dataset distribution (no. of images)

| Category | Class-1 | Class-2 | Total |
|---|---|---|---|
| Training set | 240 | 240 | 480 |
| Validation set | 60 | 60 | 120 |
| Test set | 21 | 21 | 42 |
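The 80:20 division of frames can be sketched as a seeded shuffle followed by a cut. The frame names, the seed, and the helper `split_frames` are hypothetical; the paper does not describe how the frames were assigned to each set.

```python
import random

def split_frames(frames, train_fraction=0.8, seed=0):
    """Shuffle the frames reproducibly and split them into train and validation lists."""
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# 600 hypothetical frame names, matching the 480 + 120 images of Table 2.
frames = [f"frame_{i:04d}.jpg" for i in range(600)]
train, val = split_frames(frames)
```

Fixing the seed keeps the split reproducible between training runs, so the validation frames never leak into training.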

5 Performance
To check the performance of the Mask R-CNN model on the custom dataset, we tested it on a sample of non-contiguous frames. The detections of the Mask R-CNN model were also compared with those of the YOLOv4 model for detecting the satellite object in the given video frames. A performance comparison of the Mask R-CNN and YOLOv4 models is provided in Table 3.


Fig. 15 Distribution of detection classes and the bounding boxes on the trained images

Fig. 16 Histogram representation of the distribution of masks on the objects present in the input image

Table 3 Performance comparison of the models

| Model | Classes | Accuracy (%) | Time (s) |
|---|---|---|---|
| Mask R-CNN | Satellite | 98 | 5 |
| Mask R-CNN | Non-satellite body | 97 | 5 |
| YOLOv4 | Satellite | 98 | 5 |
| YOLOv4 | Non-satellite body | 95 | 5 |

YOLOv4 [34, 37] stands for You Only Look Once, version 4. The model is designed mainly for real-time object recognition systems that must detect multiple objects in a single frame; its architecture is shown in Fig. 17. The YOLO model is claimed to recognize objects more accurately and quickly than other recognition models. It was built with the capability to predict up to 9000 classes, and it also handles objects that belong to unseen classes [37]. It detects and recognizes multiple objects in a given input image and draws a bounding box around each object. The model can also be trained and deployed easily in a production system.


Fig. 17 Normal YOLOv4 architecture [37]

5.1 Performance Metrics
Precision:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

We obtained a precision value of 1.0 for the validation set.

Recall:

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

For the validation set, we obtained a recall score of 0.016. The terms TP, FP, and FN stand for true positive, false positive, and false negative, respectively.

Precision-Recall Curve: This is a graphical representation of precision and recall values that helps in evaluating the performance of an object detection model. A model is considered a good predictive model if the precision remains high as the recall increases. Figure 18 shows the precision-recall curve for the validation set; recall is the same quantity as the true-positive rate.

Confusion matrix: This matrix is a table that describes the performance of a classification model on a set of test data for which the true values are already known. The confusion matrix for the validation set is shown in Fig. 19.

Intersection over Union (IoU): The IoU ratio determines whether a predicted outcome is a true positive or a false positive. It measures how much the bounding box around a predicted object overlaps the bounding box around its ground-truth reference.

$$\text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}} \tag{3}$$

The IoU values obtained for our three classes are given below:


Fig. 18 Precision-recall curve for the validation set

Fig. 19 Confusion matrix for the validation set


[[0.82225156 0.         0.        ]
 [0.         0.8370366  0.        ]
 [0.         0.         0.8601964 ]]

mAP: mAP stands for mean average precision, which is the average of the AP over multiple Intersection over Union (IoU) thresholds. We obtained an mAP of 0.993 for both the test set and the validation set.
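The metrics defined in this section can be sketched in a few lines. The counts and boxes below are hypothetical values chosen only to exercise the formulas, and the box format (x1, y1, x2, y2) is an assumption.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 when there are no predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Recall = TP / (TP + FN); returns 0.0 when there are no ground-truth objects."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical counts and boxes.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=8)
iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
```

A detection is typically counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold, which is how the IoU value feeds into the TP and FP counts used by precision and recall.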

6 Conclusion
This paper is intended to provide a simple understanding of how CNN models, especially Mask R-CNN, are used for object detection tasks and of the various steps involved. The paper also gives an idea of how an object detection model can be integrated with the SORT object tracker to handle detection tasks in real-time applications. Image processing steps and key point estimation require particular care, as they are vital steps that can lead to undesirable results if not given enough attention. As a further step, the project intends to estimate the pose of the uncooperative satellite at close range using the PnP algorithm and to study its scope for improvement.

Acknowledgements The authors would like to acknowledge the computing facilities provided by the Vision and Network Analytics (VIShNA) Lab at Amrita School of Computing, Amrita Vishwa Vidyapeetham, Coimbatore, for carrying out the experiments. Special thanks also go to Assistant Prof. M Uma Rani for immense support and guidance throughout the project's progress.

References
1. Sridhar, P., Thangavel, S.K., Parameswaran, L.: A new approach for fire pixel detection in building environment using vision sensor. Adv. Intell. Syst. Comput. 392–400 (2021)
2. Gautam, K.S., Thangavel, S.K.: Video analytics-based facial emotion recognition system for smart buildings. Int. J. Comput. Appl. 43(9), 858–867 (2021)
3. Subbiah, U., Kumar, D.K., Senthil Kumar, T., Parameswaran, L.: An extensive study and comparison of the various approaches to object detection using deep learning. In: 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT) (2020)
4. Rudra, S., Senthil Kumar, T.: A robust Q-learning and differential evolution based policy framework for key frame extraction. Springer Advances in Intelligent Systems and Computing, vol. 1039, pp. 716–728. Springer International Publishing, Cham (2019)
5. Krishna Sai, B.N., Sasikala, T.: Object detection and count of objects in image using tensor flow object detection API. In: 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT) (2019)
6. CreateBytes: Top 10 Applications of Image processing. CreateBytes [Online] (2021). Available at https://createbytes.com/insights/Top-10-applications-of-Image-processing/


7. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587 (2014)
8. Gandhi, R.: R-CNN, Fast R-CNN, Faster R-CNN, YOLO - Object Detection Algorithms (2018) [Online]. Towards Data Science. Available at https://towardsdatascience.com/r-cnnfast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e
9. He, K., et al.: Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988 (2017)
10. Tyagi, M.: Image Segmentation: Part 1. Towards Data Science (2021) [Online]. Available at https://towardsdatascience.com/image-segmentation-part-1-9f3db1ac1c50
11. Michael: Instance vs. Semantic Segmentation: What Are the Key Differences? Keymakr (2021) [Online]. Available at https://keymakr.com/blog/instance-vs-semantic-segmentation/#:~:text=In%20other%20words%2C%20semantic%20segmentation,a%20dataset%20for%20instance%20segmentation
12. Meel, V.: Object tracking (2021). Viso.ai [Online]. Available at https://viso.ai/deep-learning/object-tracking/
13. Phisannupawong, T., Kamsing, P., Torteeka, P., Channumsin, S., Sawangwit, U., Hematulin, W., Jarawan, T., Somjit, T., Yooyen, S., Delahaye, D., Boonsrimuang, P.: Vision-based spacecraft pose estimation via a deep convolutional neural network for noncooperative docking operations. Aerospace 7(9), 126 (2020)
14. Sharma, S., Beierle, C., D’amico, S.: Pose estimation for non-cooperative spacecraft rendezvous using convolutional neural networks. In: 2018 IEEE Aerospace Conference, pp. 1–12 (2018). https://doi.org/10.1109/aero.2018.8396425
15. Zhao, Z., Zheng, P., Xu, S., Wu, X.: Object detection with deep learning: a review. IEEE Trans. Neural Netw. Learn. Syst. 30(11), 3212–3232 (2019). https://doi.org/10.1109/tnnls.2018.2876865
16. Deori, B., Thounaojam, D.M.: A survey on moving object tracking in video. Int. J. Inf. Theor. 3(3), 31–46 (2014).
Chen, Y., Gao, J., Zhang, K.: R-CNN-based satellite components detection in optical images. Int. J. Aerosp. Eng. 10 (2020). (Article id 8816187). https://doi.org/10.1155/2020/8816187
17. Xie, K., Wen, Y.: LSTM-MA: a LSTM method with multi-modality and adjacency constraint for brain image segmentation. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 240–244 (2019). https://doi.org/10.1109/icip.2019.8802959
18. Theran, C.A., Álvarez, M.A., Arzuaga, E., Sierra, H.: A pixel level scaled fusion model to provide high spatial-spectral resolution for satellite images using lstm networks. In: 2019 10th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (Whispers), pp. 1–5 (2019). https://doi.org/10.1109/whispers.2019.8921269
19. Jayakumar, D., Elakkiya, A., Rajmohan, R., Ramkumar, M.O.: Automatic prediction and classification of diseases in melons using stacked RNN based deep learning model. In: 2020 International Conference on System, Computation, Automation and Networking (ICSCAN), pp. 1–5 (2020). https://doi.org/10.1109/icscan49426.2020.9262414
20. Sunil Kumar, K.H., Indumathi, G.: A novel image compression approach using DTCWT and RNN encoder. In: 2020 IEEE Bangalore Humanitarian Technology Conference (B-HTC), pp. 1–4 (2020). https://doi.org/10.1109/b-htc50970.2020.9297955
21. Gu, J., Shen, Y., Zhou, B.: Image processing using multi-code gan prior. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3009–3018 (2020). https://doi.org/10.1109/cvpr42600.2020.00308
22. Jammes-floreani, M., Laine, A.F., Angelini, E.D.: Enhanced-quality gan (eq-gan) on lung ct scans: toward truth and potential hallucinations. In: 2021 IEEE 18th International Symposium on Biomedical Imaging, pp. 20–23 (2021). https://doi.org/10.1109/isbi48211.2021.9433996
23. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arxiv:1409.1556 [online]. Available http://arxiv.org/abs/1409.1556
24. Wikipedia Contributors (2019). ImageNet [online] Wikipedia. Available at https://en.wikipedia.org/wiki/ImageNet


25. Li, R., Zhou, Y., Chen, F., Chen, Y.: Parallel vision-based pose estimation for non-cooperative spacecraft. Adv. Mech. Eng. (2015). https://doi.org/10.1177/1687814015594312
26. Dang, T.L., Nguyen, G.T., Cao, T.: Object tracking using improved deep_SORT_YOLOv3 architecture. ICIC Expr. Lett. 14, 961–969 (2020). https://doi.org/10.24507/icicel.14.10.961
27. Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 3645–3649 (2017). https://doi.org/10.1109/ICIP.2017.8296962
28. Zhang, Y., et al.: Fairmot: on the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vis. 129(11), 3069–3087 (2021)
29. Shakhadri, S.A.G.: Analytics Vidhya [Online] (2021). Available at https://www.analyticsvidhya.com/blog/2021/06/build-resnet-from-scratch-with-python/
30. ResNet101 (2017). MathWorks [Online]. Available at https://in.mathworks.com/help/deeplearning/ref/resnet101.html
31. Bewley, A., et al.: Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP). IEEE (2016)
32. Ren, S., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28 (2015)
33. Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: Yolov4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
34. Notes on tracking algorithms SORT and DeepSORT. Hao Gao (2019). Medium [Online]. Available at https://medium.com/@smallfishbigsea/notes-on-tracking-algorithms-sort-anddeepsort-d2684ced502f
35. Multiple Object Tracking Benchmark (2021). MOT Challenge [Online]. Available at https://motchallenge.net/
36. Python | Corner detection with Harris Corner Detection method using OpenCV (2022). GeeksforGeeks [Online]. Available at https://www.geeksforgeeks.org/python-corner-detection-withharris-corner-detection-method-using-opencv/
37. What’s new in YOLOv4? Roman Orac (2020). Towards Data Science [Online]. Available at https://towardsdatascience.com/whats-new-in-yolov4-323364bb3ad3

Decentralized Blockchain-Based Infrastructure for Numerous IoT Setup
C. Balarengadurai, C. R. Adithya, K. Paramesha, M. Natesh, and H. Ramakrishna

C. Balarengadurai (B) · C. R. Adithya · K. Paramesha · M. Natesh · H. Ramakrishna
Vidyavardhaka College of Engineering, Mysuru 570002, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_35

1 Introduction
Wireless networking and semiconductor systems, in which physical objects are connected through the IoT to receive, exchange, and respond to data, have advanced considerably in recent years [1]. This idea opens up a whole new range of possibilities, including smart homes, smart cities, and intelligent transportation. Because of the exponential increase in IoT devices and data processing, traditional centralized management solutions are no longer feasible owing to security, scalability, and latency concerns [2]. Bitcoin [3] and Ethereum [4] introduced blockchain technology, which has proved successful in business applications and offers a promising solution to these problems through its decentralized architecture. The studies in [5–7] first introduce blockchain network processes and smart contracts, and then address the potential use cases in which the IoT can take advantage of blockchain technology. The work in [8] demonstrates the deployment of a blockchain on IoT devices such as the Raspberry Pi, with the devices participating directly in blockchain activity and executing smart contracts. However, running blockchain services directly on IoT devices can require resources that are too costly for them [9]. The authors of [10] advocated two-factor authentication for IoT infrastructure. Their approach uses a blockchain to store and verify authentication codes provided over an out-of-band channel like light and child. The author of [11] suggested a mechanism by which a blockchain-linked platform could obtain consumer privacy consent. A component placed between IoT devices and users acts as the mediator, reminding the user of the privacy policy (stored on the blockchain) and preventing the collection of confidential user data on IoT devices.

Many studies have investigated the use of blockchain and smart contracts for IoT access control. The author of [12] proposed a blockchain-based architecture for access control management in IoT applications. A management hub that offloads resource-intensive tasks from the IoT devices is integrated into the proposed architecture, and a single smart contract implements the access management policies. The viability of the scheme was demonstrated with an Ethereum-based implementation. In [13], the authors suggested a blockchain-based approach that gives IoT resources access to the blockchain by using a smart contract to build a temporary key. In [14], the authors suggested an access control framework based on smart contracts for the IoT. The scheme uses several smart contracts for each pair of objects and implements both a static permit/default right and an interval-based dynamic access right. In [15], the authors suggested a blockchain-based architecture to secure access to IoT data, in which the IoT data are stored in an off-chain database used alongside the blockchain network.

The survey above indicates growing research interest in applying blockchain technologies to IoT scenarios. However, only a few studies address practical deployment on specific IoT systems. Furthermore, no research has investigated the scalability of blockchain technologies in a large IoT deployment. In this article, unlike the studies described earlier, we address the problem of IoT device reconfiguration, in which IoT device configurations must be modified in a safe and scalable manner. We propose a blockchain-based modular architecture that enables large-scale IoT reconfiguration. To address the problem of resource use, we first propose to decouple the IoT devices from blockchain network activities: REST APIs provide lightweight access to the blockchain network. Next, smart-contract-driven publish/subscribe processes are introduced to enable on-demand and timely configuration of all devices in a group. Finally, a proof-of-concept test bed is created based on an existing IoT system deployment and used in mass reconfiguration scenarios to evaluate the performance of the proposed architecture. To the best of our knowledge, this is the first work to tackle the issue of modular IoT configuration by leveraging blockchain technologies. The remainder of the paper presents related work, specifies the proposed architecture and the process flow for large-scale IoT reconfiguration, and discusses the results.

2 Related Work
Bitcoin-NG was proposed by Eyal et al. [7] in 2016 to increase Bitcoin’s transaction throughput. The Waves network, a blockchain project, now uses a version of Bitcoin-NG known as Waves-NG. Bitcoin-NG mainly decouples block generation into two levels, leader election and transaction serialization, which correspond respectively to two types of blocks: key blocks and micro-blocks. Except that the actual transactions are carried in the micro-blocks, key blocks are practically identical to Bitcoin’s blocks: they have an average block interval of 10 min and contain the solution to a hash puzzle that serves as the proof of work. Once a key block is mined, its miner creates all subsequent micro-blocks until the next key block is mined. Micro-blocks are produced deterministically and contain no proof of work. The micro-block frequency is set by the key block miner (up to a maximum) to accommodate as many transactions as possible. The longest chain rule still applies for finalizing key blocks and resolving key block forks. As for micro-blocks, Bitcoin-NG relies on a combination of the heaviest chain extension rule and the longest chain extension rule to finalize micro-blocks and resolve their forks. To encourage honest participation and to deter the current key block miner from redundancy and other misbehavior, 60% of the transaction fees collected from micro-blocks by the current key block miner are reallocated to the miner of the next key block.

Regarding performance, key blocks carry no transactions, so the transaction throughput of Bitcoin-NG is determined entirely by the size and frequency of the micro-blocks. The micro-block frequency must be limited, because an excessive number of micro-blocks would exceed the bandwidth of the network and frequently cause key block forks. With a key block interval of 10 min and a micro-block size of 1 MB, Bitcoin-NG can hypothetically attain up to 200 TPS if the micro-block interval is kept at 12 s (the minimum realistic block interval under current Bitcoin network conditions). On the downside, the key block miner may become the victim of denial of service or exploitation because micro-block generation is deterministic. A compromised key block miner can selectively include transactions or commit conflicting transactions, which can cost the network more than one key block period.

In Peer Consensus, the number of identities a player can control is proportional to the player's share of the computing power, which limits Sybil attacks. On top of these identities, the system runs a suitable BFT protocol, such as PBFT or SGMP, for committing transactions. The received processing fees are distributed uniformly among all identities. This effectively decouples participation management from transaction processing and thereby improves efficiency. However, Peer Consensus cannot track the malleability of transactions, because the transaction history is not recorded in the blockchain.

SCP aims to restrict the size of each independently running subcommittee, which uses a BFT consensus protocol to maintain a local blockchain. The combined outputs of all subcommittees are committed to the global blockchain by a dedicated finalization committee. The Merkle tree root of each block produced by each subcommittee is kept in a block on the global chain. To secure consensus, SCP requires that a two-thirds majority of the computing resources in each subcommittee and in the whole network be honest. However, the use of sharding and a dedicated finalization committee implies that the network follows a pre-designed structure, which in a sense goes against the egalitarian ideal of distributed blockchains.
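The Bitcoin-NG throughput figure quoted above can be checked with simple arithmetic: throughput is the micro-block size divided by the average transaction size and the micro-block interval. The sketch below assumes 1 MB = 10^6 bytes; the average transaction size is not given in the text, so it is derived here from the quoted 200 TPS rather than taken from a source.

```python
def transactions_per_second(block_bytes, block_interval_s, tx_bytes):
    """Throughput of a chain producing one block of block_bytes every block_interval_s seconds."""
    return block_bytes / tx_bytes / block_interval_s

def implied_tx_size(block_bytes, block_interval_s, tps):
    """Average transaction size (bytes) implied by a given throughput."""
    return block_bytes / (block_interval_s * tps)

MICRO_BLOCK_BYTES = 1_000_000  # assumption: 1 MB taken as 10**6 bytes
MICRO_BLOCK_INTERVAL_S = 12

# Average transaction size under which 1 MB every 12 s yields 200 TPS.
tx_size = implied_tx_size(MICRO_BLOCK_BYTES, MICRO_BLOCK_INTERVAL_S, 200)
tps = transactions_per_second(MICRO_BLOCK_BYTES, MICRO_BLOCK_INTERVAL_S, tx_size)
```

Under these assumptions the 200 TPS estimate corresponds to an average transaction of roughly 417 bytes; larger transactions or longer micro-block intervals would lower the achievable throughput proportionally.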


In particular, sharding has been extensively explored in programming communities to improve blockchain transaction throughput. For interested readers, the development history and the state of the art of blockchain sharding are documented in the literature. The abovementioned hybrid BFT protocols share one aspect: PoW is used to establish a secure consensus group for each instance of the BFT protocol, because the communication overhead of a BFT protocol does not scale when the consensus group grows too large, although PoW-based participation control is permissionless and open to anybody with computing power.

3 Proposed System
This section introduces the proposed blockchain-based framework for large-scale IoT reconfiguration, shown in Fig. 1, followed by the details of the reconfiguration workflow. The architecture has three main components: the administrators, the IoT devices, and the blockchain network. For ease of control, the IoT devices are grouped together. For example, within a smart city framework, the IoT devices may include the electrical smart meters in each home, and a group could contain all the devices between two junctions. The core principle of the architecture is to use the blockchain as the source of trust for all communication between IoT devices and administrators. An event publish/subscribe framework enables efficient collaboration between the blockchain and the IoT devices while avoiding costly computation on those small IoT devices. In addition, the blockchain network provides access management controls that allow all configuration change operations to be authenticated and monitored. Smart contracts are used to update the configurations stored on the blockchain.

Fig. 1 Proposed blockchain model


To enable configuration storage and upgrade, a configuration data structure is kept on the blockchain, as displayed in Fig. 1. The configuration status is the current status of the latest version, indicating whether a new configuration has been submitted, applied successfully, or applied in error; it takes one of the values “Send,” “Submit,” and “Blunder.” A Device ID is assigned to each device and can be linked to one or more Group IDs. Each IoT module has a Module ID and an Activation Flag. Module configurations control the settings of a module, and configuration statuses report the upgrade status of a module. Modifications to the module configurations are saved in an array for each module, identified by its Module ID. Let $n$ be the total number of devices, $C_j$ the IoT system settings for group $j$, $c_k$ the settings of IoT device $k$ in group $j$, and $U = \{u_1, \ldots, u_N\}$ the administrator's update settings.

The reconfiguration of a device group proceeds as follows. When necessary, the administrator composes the update $U = \{u_1, \ldots, u_N\}$ for the IoT modules of one group and sends a POST request with the configuration update and the corresponding Group ID to the blockchain API. A smart contract, given in Algorithm 1, looks up the configuration data structures of the group by the Group ID field, stores the new configuration on the blockchain, and sets the configuration status to “Send,” meaning that a new configuration has been submitted. A GROUPSUBMIT event is then published to every IoT device in the group. The event also includes a list of the modules to be upgraded. Once the event has been received, each IoT device fetches the latest settings from the blockchain, applies the changes, and sends a POST request to confirm that all the modified modules have been activated.
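The workflow above can be sketched with an in-memory stand-in for the blockchain state. Everything in this sketch is an assumption made for illustration: the class and field names are hypothetical, the ledger is a plain Python object rather than a Hyperledger network, and the status "Submit" is read here as "applied successfully", following the order in which the statuses are listed in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ModuleConfig:
    module_id: str
    active: bool = False                                # Activation Flag
    history: List[dict] = field(default_factory=list)   # saved configuration changes

@dataclass
class DeviceRecord:
    device_id: str
    group_ids: List[str]
    modules: Dict[str, ModuleConfig]
    status: str = "Submit"                              # assumed meaning: last update applied

class Ledger:
    """In-memory stand-in for the on-chain state and its event stream."""

    def __init__(self):
        self.devices: Dict[str, DeviceRecord] = {}
        self.events: List[dict] = []

    def register(self, record: DeviceRecord) -> None:
        self.devices[record.device_id] = record

    def group_submit(self, group_id: str, update: Dict[str, dict]) -> List[str]:
        """Administrator path: store the update for each device in the group and publish an event."""
        targets = [d for d in self.devices.values() if group_id in d.group_ids]
        for dev in targets:
            for module_id, settings in update.items():
                mod = dev.modules.setdefault(module_id, ModuleConfig(module_id))
                mod.history.append(settings)
                mod.active = False
            dev.status = "Send"
        self.events.append({"type": "GROUPSUBMIT", "group": group_id,
                            "modules": sorted(update)})
        return [d.device_id for d in targets]

    def device_commit(self, device_id: str) -> None:
        """Device path: confirm that the latest settings were applied and activated."""
        dev = self.devices[device_id]
        for mod in dev.modules.values():
            mod.active = True
        dev.status = "Submit"
        self.events.append({"type": "COMMIT", "device": device_id})

ledger = Ledger()
ledger.register(DeviceRecord("dev-1", ["group-A"], {}))
ledger.register(DeviceRecord("dev-2", ["group-B"], {}))
notified = ledger.group_submit("group-A", {"temperature": {"interval_s": 30}})
ledger.device_commit("dev-1")
```

In the proposed architecture the two methods would correspond to REST calls handled by smart contracts, and the event list would be replaced by the publish/subscribe channel that devices listen on.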

4 Results and Discussions

This section evaluates the performance of the proposed architecture. It first describes the proof-of-concept setup, then examines resource usage on the IoT devices, and finally assesses the efficiency of large-scale IoT reconfiguration.

To assess the proposed design, a proof-of-concept test bed was built. The blockchain machine runs Ubuntu 16.04 (Xenial) on an Intel Core [email protected] GHz processor with 16 GB of RAM. The blockchain platform is built with Hyperledger Composer version 0.20 [16]. A Raspberry Pi 3 Model B is used to demonstrate connectivity with the IoT system and to measure resource usage on a small IoT device. The REST client is implemented as a Node.js script. To represent an IoT unit in the field, the Raspberry Pi is fitted with a Sense HAT [17], which carries several sensors, including temperature, humidity, and gyroscope sensors. Because it is impractical to purchase many Raspberry Pi boards only to demonstrate the reconfiguration of many IoT devices, the remaining IoT devices are emulated by multiple Node.js scripts running simultaneously on a Windows 10 desktop. For firmware changes, the Raspberry Pi temperature sensor is included in this setup.

The test results show an average processor utilization of approximately 0.69% and an overall CPU usage of approximately 5%. RAM usage remains steady at about 28 MB, which is small relative to the total RAM of the device (1 GB). Average network usage through the REST API is also very low, at around 7 bytes per second.

Fig. 2 Performance of proposed system

Figure 2 shows the performance evaluation of the proposed framework. It demonstrates that the proposed system responds promptly to a configuration submission and updates the configuration immediately. Because of this prompt response, the proposed architecture significantly reduces the average total processing time. These findings indicate that the scheme has only a marginal impact on existing applications and is therefore suitable for small IoT devices.

The remainder of this section focuses on the efficiency of large-scale IoT reconfiguration. To this end, the processing time is measured when reconfiguring a group of IoT devices, for different numbers of devices in the group. As mentioned before, the IoT devices are simulated on a desktop computer by running several Node.js scripts. Each measurement is carried out 20 times, and the results are aggregated. The group size ranges from 20 to 100 IoT devices. The total processing time is measured from the moment the update script is executed on the administration console until the COMMIT event has been received from all the devices in the group. The total processing time increases linearly with the group size. The experimental results suggest that, for large groups of IoT devices, the processing time grows by about 20% for every 20 additional devices (around 1% per additional device).
In addition, the total processing time is divided into two parts. The configuration submission time is measured from running the update script on the administration console until the “SUBMIT” event is received, which corresponds to executing the smart contract that updates all the configurations on the blockchain. The configuration update time is measured from the “SUBMIT” event to the “COMMIT” event. The submission time is lower than the update time. This is because, in the submission phase, the blockchain operations are merged and executed in one batch, whereas in the update phase the blockchain update is performed by each device individually and is thus more expensive.
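The split of the total processing time into a submission part and an update part can be expressed directly in terms of event timestamps. The following is a minimal sketch, assuming timestamps in seconds; the function name and the example values are hypothetical and are not measurements from the experiment.

```python
def split_processing_time(t_start, t_submit, commit_times):
    """Return (submission_time, update_time, total_time) in seconds.

    t_start      -- when the update script is run on the administration console
    t_submit     -- when the SUBMIT event is received
    commit_times -- when the COMMIT event arrives from each device in the group
    """
    if not commit_times:
        raise ValueError("need at least one COMMIT timestamp")
    t_done = max(commit_times)        # the group is done when the last device commits
    submission = t_submit - t_start   # one batched blockchain operation
    update = t_done - t_submit        # per-device updates
    return submission, update, submission + update


# Hypothetical timestamps for a group of three devices
print(split_processing_time(0.0, 1.5, [4.0, 5.5, 5.0]))  # (1.5, 4.0, 5.5)
```

Taking the maximum COMMIT timestamp reflects the definition used in the evaluation, where the measurement ends only once every device in the group has reported.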

5 Conclusion

This article proposes a blockchain-based design for the large-scale configuration of IoT devices. The subscribe feature of the REST API is used to decouple resource-intensive blockchain operations from the IoT devices. An access control policy governs access to the data on the blockchain. Smart contracts are used to update the settings of a large group of IoT devices. A proof-of-concept test bed is used to assess the efficiency of the proposed design. The experimental results show that resource usage on the IoT devices is negligible and that the proposed workflows provide a modular solution for updating the settings of many IoT devices on demand.

References

1. Lin, J., Yu, W., Zhang, N., Yang, X., Zhang, H.: A survey on internet of things: architecture, enabling technologies, security and privacy, and applications. IEEE IoT J. 4(5), 1125–1142 (2017)
2. Mohanty, S.P., Choppali, U., Kougianos, E.: Everything you wanted to know about smart cities: the Internet of Things is the backbone. IEEE Consum. Electron. Mag. 5(3) (2016)
3. Nakamoto, S.: Bitcoin: A Peer-to-Peer Electronic Cash System. https://bitcoin.org/bitcoin.pdf
4. Wood, G.: Ethereum: A Secure Decentralised Generalized Transaction Ledger. https://gavwood.com/paper.pdf
5. Conoscenti, M., Vetro, A., Martin, J.C.D.: Blockchain for the Internet of things: a systematic literature review. In: International Conference of Computer Systems and Applications (2016)
6. Christidis, K., Devetsikiotis, M.: Blockchains and smart contracts for the Internet of Things. IEEE Access 4, 2292–2303 (2016)
7. Singh, M., Singh, A., Kim, S.: Blockchain: a game changer for securing IoT data. In: IEEE World Forum on Internet of Things (2018)
8. Huh, S., Cho, S., Kim, S.: Managing IoT devices using blockchain platform. In: International Conference on Advanced Communication Technology (2017)
9. Dorri, A., Kanhere, S.S., Jurdak, R., Gauravaram, P.: Blockchain for IoT security and privacy: the case study of a smart home. In: IEEE International Conference on Pervasive Computing and Communications Workshops (2017)
10. Wu, L., Du, X., Wang, W., Lin, B.: An out-of-band authentication scheme for Internet of Things using blockchain technology. In: International Conference on Computing, Networking and Communications (2018)
11. Cha, S.-C., Chen, J.-F., Su, C., Yeh, K.-H.: A blockchain connected gateway for BLE-based devices in the Internet of Things. IEEE Access 4, 24639–24649 (2018)
12. Novo, O.: Blockchain meets IoT: an architecture for scalable access management in IoT. IEEE IoT J. 5(2), 1184–1195 (2018)
13. Alphand, O., Amoreti, M., Claeys, T., Asta, S.D., Duda, A., Ferrari, G., Rousseau, F., Tourancheau, B., Veltri, L., Zanichelli, F.: IoT chain: a blockchain security architecture for the IoT. In: IEEE Wireless Communications and Networking Conference (2018)
14. Zhang, Y., Kasahara, S., Shen, Y., Jiang, X., Wan, J.: Smart contract-based access control for the IoT. IEEE IoT J. (Early Access) (2018)
15. Rifi, N., Rachkidi, E., Agoulmine, N., Taher, N.C.: Towards using blockchain technology for IoT data access protection. In: IEEE International Conference on Ubiquitous Wireless Broadband (2017)
16. Hyperledger Composer. https://hyperledger.github.io/composer/latest/
17. Raspberry Pi sense hat. https://www.raspberrypi.org/products/sense-hat/

Sentiment Analysis Toward COVID-19 Vaccination Based on Twitter Posts

Vaibhav E. Narawade and Aditi Dandekar

V. E. Narawade (B)
Ramrao Adik Institute of Technology, D Y Patil Deemed to Be University, Nerul, Navi Mumbai, India
e-mail: [email protected]

A. Dandekar
School of Engineering and Applied Sciences, University of Mumbai, Mumbai, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_36

1 Introduction

The World Health Organization (WHO) defines vaccine hesitancy as behavior influenced by a variety of factors, including lack of confidence, complacency (not perceiving a need for a vaccine, not valuing the vaccine), and convenience [1]. Vaccine-hesitant people form a heterogeneous group with varying levels of doubt about specific vaccines or about vaccines in general. Some may accept all vaccines but remain concerned about them, some may refuse or delay certain vaccines while accepting others, and some may refuse all vaccines. Several nations have initiated immunization campaigns following the successful development of COVID-19 vaccines in recent months [2–4]. The focus should now move to the quick and successful distribution of the vaccine [5]. The public view of COVID-19 vaccination influences whether the vaccine can be given to large groups of people and so produce herd immunity. It is crucial to determine which demographic groups hold more favorable or unfavorable attitudes toward vaccination. Understanding demographic differences in vaccine concerns can help identify geographic regions where vaccination coverage is low and where the risk of vaccine-preventable disease outbreaks is higher. This information could also be used to target specific health messages [6]. The goals of the project are to characterize vaccine hesitancy worldwide, focusing on vaccine safety and efficacy concerns, and to identify differences in vaccine hesitancy by residence status and other socio-demographic factors [7]. Examining whether public opinion shifts over time and space can help public health authorities make timely adjustments to vaccination education efforts based on local community opinions. Additionally, identifying people whose tweets express negative attitudes toward COVID-19 vaccines can help direct education and outreach.

1.1 Background

It is encouraging to see how quickly vaccine development has advanced and how many major vaccine platforms have entered clinical trials. Examples are traditional recombinant proteins, replicating and non-replicating viral vectors, and nucleic acid (DNA and mRNA) approaches. Each of these vaccine platforms has advantages and drawbacks. Manufacturing speed and flexibility, safety and reactogenicity, humoral and cellular immunogenicity profiles, duration of protection, volume and cost of production, vaccine stability, and the need for a cold supply chain are all important considerations. This diversity of designs must be approached carefully. Moderna, BioNTech, Pfizer, and CureVac (mRNA-based) are just a few of the companies working on nucleic acid-based vaccines. Vaccines based on DNA and mRNA can be generated readily from viral sequences, paving the way for rapid progress to the clinic [8, 9].

1.2 Scope of the Project

The primary goal of the “Twitter Sentiment Data Analytics” [10] keywords is to increase the visibility of tweets. Sentiment analysis has several applications. It is especially useful for social media monitoring, since it gives a sense of what an audience is thinking. It is also valuable in market research and other scenarios that require text analysis. Because of its efficiency, sentiment analysis is in great demand: thousands of text documents can be analyzed for sentiment in seconds rather than hours. Using data analysis, machine learning, and artificial intelligence, we addressed four project-related questions: (1) public opposition to COVID-19 vaccinations, (2) Twitter sentiment analysis of the tweets, (3) text analysis and visualization of textual data, and (4) text classification structure.


2 Survey on COVID-19 Vaccination Strategies

2.1 Literature Survey

Several COVID-19 studies have used tweets to investigate issues related to the pandemic. The techniques used and the significant trends are summarized below. Many articles focused only on English tweets. In [11], the authors used Python to collect English tweets about the COVID-19 vaccine posted by Indian residents. They applied sentiment analysis to investigate how the impressions of Indians regarding COVID-19 vaccination changed over time during the outbreak. According to their findings, 47% of vaccine-related tweets were neutral, while around 17% were negative. With respect to COVID-19 vaccination, Indian citizens were especially concerned about the risk of illness and vaccine side effects. The authors of [12] used a variety of technologies, including APIs, Tweepy, and PostgreSQL, with a set of keywords (“corona,” “2019-nCov,” and “COVID-19”). They collected public English tweets posted between February 2, 2020 and March 15, 2020 and detected topics using unigrams, bigrams, and Latent Dirichlet Allocation (LDA) topic models. The authors of [13] employed topic detection and sentiment analysis to evaluate tweets from Brazil and the USA, two nations with high COVID-19 case counts and fatalities. Using tweets in English and Portuguese, they compared and discussed the performance of topic detection and sentiment analysis in the two languages. Over a period of four months, they ranked ten topics and analyzed Twitter content to assess how the discourse changed over time. The authors of [14] applied an unsupervised machine learning technique to detect and characterize user-generated discussions related to symptoms associated with COVID-19, testing experiences, and illness recovery claims posted during March 3–20, 2020.
A biterm topic model (BTM) was used for the tweet analysis. It split tweets with similar word-related themes into topic groups, which included discussions about symptoms, testing, and recovery. Tweets from these groups were then manually extracted and annotated for content analysis, as well as for statistical and geographic factors. There was a statistically significant co-occurrence of tweets on these topics among people who reported symptoms without being tested and mentioned recovery [15]. Rufai and Bunce [16] analyzed the most retweeted UK tweets mentioning COVID-19 on Twitter in March 2020.

Several recent studies have attempted to decode the emotions conveyed on social media using multimodal cues such as visual, audio, and textual data. Emotions may be observed in video blogs (vlogs) or spoken comments posted to social networking sites like YouTube, such as a video of someone commenting on a product or movie [17, 18]. Sentiment analysis is primarily concerned with the automatic recognition of opinion polarity, i.e., positive or negative. The intensity of emotion is a less researched aspect of existing representations of emotion. To account for sentiment intensity, [19] suggested a multilevel sentiment analysis. According to [20], predictability, along with arousal, valence, and dominance, is required to represent emotional expression. Similarly, others have advocated extending the polarity-based representation used in sentiment analysis with additional dimensions, distinct emotional representations, or ratings [21, 22]. However, it remains unclear how different feelings, such as disgust, map to a sentiment as an emotional disposition.

2.2 Challenges

Since the bulk of current sentiment classification is data-driven, the capabilities of machine learning models are confined to the domain from which the training data is drawn. Applying a sentiment model trained on product reviews to microblog messages is one unresolved obstacle. Another important challenge in sentiment analysis is dealing with ambiguity and sarcasm. For example, ironic comments that compliment an object are intended to convey a negative attitude. Traditional methods of sentiment analysis, however, usually misinterpret such statements. A variety of strategies for detecting sarcasm in language have been presented [23]. However, because humor is culturally specific, it is extremely difficult for a machine to understand varied (and sometimes sophisticated) cultural allusions. We argue that, by including voice and facial expressions, multimodal sentiment analysis might improve the identification of sarcastic speech. Machine sentiment analysis is limited to the external signs of mood and cannot accurately infer implicit thoughts [24].

2.3 Data Acquisition

We gathered historical tweets on the COVID-19 vaccine. To retrieve tweets, a combination of “vaccine” and COVID-19-related terms (“COVID”, “coronavirus”, and “SARS-CoV-2”) was used as the search input. We used the Twitter user lookup API to obtain user location and interaction information. The geographical information was derived from self-reported profile locations on Twitter, which were available for around 70% of the users included. A text-matching query was then used to determine the country to which each location refers. In total, 5,534,549 posts were retrieved from Twitter. These tweets were sent by 1,844,850 different users and included 1,020,320 hashtags and 2,669,379 mentions.
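The text-matching step that maps a self-reported profile location to a country could look roughly like the sketch below. It is an illustration only: the alias table `COUNTRY_ALIASES` and the function `match_country` are hypothetical, the table is deliberately tiny, and the matching rules actually used in the study are not described in the text.

```python
import re

# Tiny illustrative alias table (an assumption, not the study's lookup data)
COUNTRY_ALIASES = {
    "India": ["india", "mumbai", "delhi"],
    "United States": ["usa", "united states", "new york", "california"],
    "United Kingdom": ["uk", "united kingdom", "london", "england"],
}


def match_country(profile_location):
    """Return the first country whose alias appears as a whole word, else None."""
    if not profile_location:
        return None
    text = profile_location.lower()
    for country, aliases in COUNTRY_ALIASES.items():
        for alias in aliases:
            # Word boundaries keep "uk" from matching inside "ukraine".
            if re.search(r"\b" + re.escape(alias) + r"\b", text):
                return country
    return None


locations = ["Navi Mumbai, India", "London, UK", "somewhere over the rainbow", ""]
matched = [match_country(loc) for loc in locations]
print(matched)
```

Free-text locations that match no alias are left unresolved rather than guessed, which is one plausible reason why only a share of the users ends up with usable geographical information.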


2.4 Problem Statements

• In sentiment analysis, it is difficult to determine the polarity of a text at the document, sentence, or feature level, that is, to classify whether a document, sentence, or feature of an item conveys a favorable, negative, or neutral attitude.
• Despite technological advancements that allow opinion information to be extracted from participants, corporations and other knowledge workers continue to struggle with extracting information about people or services.
• Informal language includes colloquial phrases and slang. Emojis are pictorial depictions of genuine expressions.
• Twitter offers a large number of review entries, with approximately 5000 tweets accessible for evaluation. This amounts to a large volume of text, making it difficult for humans to extract the relevant words [10].
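Handling the informal elements listed above usually starts with normalizing the raw tweet text. The following is a minimal sketch, assuming a regex-based removal of URLs, usernames, hashtags, a few common emoticons, and emojis; the patterns and the function name `clean_tweet` are illustrative assumptions and not the preprocessing code of the study.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# A few common text emoticons such as :) ;-( =D, matched only as standalone tokens
EMOTICON_RE = re.compile(r"(?<!\S)[:;=][\-o\*']?[\)\]\(\[dDpP/\\|](?!\S)")
# Rough emoji coverage: pictographs and miscellaneous symbols
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean_tweet(text):
    """Strip URLs, usernames, hashtags, emoticons, and emojis, then tidy spaces."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, EMOTICON_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


print(clean_tweet("@who Got my jab today :) #vaccinated https://t.co/abc123"))
# prints "Got my jab today"
```

URLs are removed first so that the `://` inside a link is never mistaken for an emoticon. Whether hashtags should be dropped entirely or kept as plain words is a design choice; this sketch drops them.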

2.5 Analysis and Planning

Figure 1 depicts the techniques employed in this study. The English tweets, collected with the Twitter API, were taken from Kaggle. The dataset was downloaded, and the messages were parsed using natural language processing before being analyzed. For text classification, the Universal Language Model Fine-Tuning (ULMFiT) technique was used, which is integrated in the Python Fastai library. Long short-term memory (LSTM) networks and recurrent neural networks (RNN) play important roles in many sequence learning applications, including machine translation, language modeling, and question answering [29]. Accordingly, the AWD-LSTM model was used to train and fine-tune the classifier.

Fig. 1 Workflow

Figure 2 depicts the three steps of ULMFiT (Universal Language Model Fine-Tuning for Text Classification): (a) The language model (LM) is trained on a general-domain corpus to capture the general features of the language in its different layers. (b) The whole LM is fine-tuned on target-task data using discriminative fine-tuning (“Discr”) and slanted triangular learning rates (STLR) to learn task-specific features. (c) The classifier is fine-tuned on the target task using gradual unfreezing, “Discr,” and STLR, which preserves low-level representations while adapting high-level ones (shaded: unfreezing stages; black: frozen). Thus, the stages begin with loading the data, then proceed to fine-tuning the language model, training a sentiment classifier, optimizing the classifier, and lastly evaluating the tweets [26–28].

Fig. 2 Structure and stepwise of ULMFiT

In this study, transfer learning is used to develop a tweet sentiment analysis model. The notion underlying transfer learning is that neural networks, particularly in their earliest layers, learn knowledge that generalizes to new settings. The tweets were gathered by accessing the public streaming application programming interface (API) of Twitter through the Tweepy Python module. The API filter and a list of English words are used to restrict the results to English. To build the “All Tweets about COVID-19 Vaccines” dataset, a search phrase with vaccine-related terms such as Pfizer/BioNTech, Sinopharm, Sinovac, Moderna, Oxford/AstraZeneca, Covaxin, and Sputnik is used [30]. The classifier was trained on the “Complete Tweet Sentiment Extraction Data” dataset, which comprises 78,320 tweets labeled as negative, neutral, or positive. Fastai can handle text preprocessing and tokenization; however, to improve speed, Twitter usernames, URLs, hashtags, and emoticons were removed beforehand. The language model is trained via self-supervised learning. The text is loaded into the model as the independent variable, and Fastai automatically processes it and creates the dependent variable.
This is accomplished through the DataLoaders class, which converts the input data into a DataLoaders object that can be passed to a Fastai learner. Fastai chooses 20% of the training data at random for the validation set. By default, Fastai uses word tokenization, which splits the text on spaces and punctuation marks and breaks words like “can’t” into two separate tokens. Table 1 shows the values and parameters used to tune the language model and track its accuracy, which was 29.2% after tuning. The perplexity reported in Table 1 is the exponential of the validation loss.

Table 1 Values and parameters used to fine-tune the language model

Epoch   Learning rate   Training loss   Validation loss   Accuracy   Perplexity   Time
0       3E−02           3.968465        4.273552          0.256900   71.776161    30 m 19 s
1       1E−04           3.821843        4.041213          0.286713   56.895290    30 m 44 s
2       5E−03           3.613947        4.034208          0.291038   56.498173    31 m 16 s
3       1E−03           3.477068        4.059790          0.290724   57.957420    50 m 20 s

When the learner has finished constructing embeddings from the pre-trained AWD-LSTM model, they are merged with random embeddings created for words that are not in the pre-trained vocabulary. To fine-tune the classifier, discriminative learning rates and gradual unfreezing were used, which have been found to yield superior results for this type of model. First, all layers except the last were frozen. Then, all layers except the last two were frozen. Finally, the whole model was unfrozen and trained further to improve accuracy. Table 2 gives the values and parameters used to fine-tune the classifier and track its accuracy, which was 75.8% after the final fine-tuning. To validate the model, the predict function is called, which returns the predicted sentiment, the index of the prediction, and the predicted probabilities for negative, neutral, and positive sentiment. Finally, the vaccine tweets are loaded into the DataLoaders as a test set, and get_preds is used to predict their sentiment.

Table 2 Values and parameters used to fine-tune the classifier

Epoch   Learning rate   Training loss   Validation loss   Accuracy   Time
0       3E−02           0.646410        0.590527          0.755907   7 m 40 s
0       1E−04           0.593550        0.581930          0.757633   11 m 16 s
1       5E−03           0.545703        0.584210          0.758142   11 m 19 s
2       1E−03           0.519921        0.594769          0.757822   11 m 19 s
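The two learning-rate techniques named in this section can be made concrete with a short sketch. It follows the formulas of the ULMFiT paper [30]; the default values (`cut_frac=0.1`, `ratio=32`, `lr_max=0.01`, and the per-layer factor 2.6) are the ones suggested there and are assumptions here, not the settings reported in Tables 1 and 2. The function names are hypothetical, and the code is independent of Fastai.

```python
import math


def stlr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular learning rate at training step t (0-based)."""
    cut = max(1, math.floor(total_steps * cut_frac))  # step where the rate peaks
    if t < cut:
        p = t / cut                                   # short linear warm-up
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # long linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio


def discriminative_lrs(top_lr, n_groups, factor=2.6):
    """One learning rate per layer group, lowest layer first, top layer last."""
    return [top_lr / factor ** (n_groups - 1 - i) for i in range(n_groups)]


def unfreezing_stages(n_groups):
    """Gradual unfreezing: number of top layer groups trainable at each stage."""
    return list(range(1, n_groups + 1))


print([round(stlr(t, 1000), 5) for t in (0, 50, 100, 550, 1000)])
print(discriminative_lrs(0.01, 3))
print(unfreezing_stages(3))  # [1, 2, 3]
```

The schedule rises quickly to `lr_max` during the first tenth of training and then decays slowly, while lower layers always receive smaller rates than the top layer so that general language features change less than task-specific ones.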

3 Results and Discussions

We analyzed the worldwide distribution of COVID-19 vaccines using geographical representations. We identified which of the available vaccines are being used in which countries, as well as the overall vaccination rates. Figure 3 depicts map representations of these results. Figure 4 depicts COVID-19 vaccination acceptance rates by country, with the most recent estimate used for nations with numerous trials. It also shows that many countries use only one or two different vaccines (Fig. 4).

Fig. 3 Map representation of different acceptance of COVID-19 vaccines (panels: countries using Moderna, Pfizer, Oxford/AstraZeneca, Chinese, Johnson&Johnson, Russian, Indian, Cuban, and Kazakhstani vaccines)


Fig. 4 Number of vaccines by different countries across worldwide

Fig. 5 Number of positive, negative, and neutral tweets

To train the model to classify vaccination tweets, Fastai and the ULMFiT approach are used. Numerous parameters and variables are employed to train and tune the model for optimum accuracy. Table 2 lists the values and parameters that were used to optimize the model to reach 72% accuracy, including the learning rate, number of epochs, training loss, validation loss, accuracy, and duration. Figure 5 covers the period from December 12, 2020 to May 21, 2021, during which the majority of tweets are unfavorable, with more negative than positive tweets.
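The counts of negative, neutral, and positive tweets shown in Fig. 5 can be derived from per-class probabilities by taking the most probable class for each tweet. The sketch below assumes that each prediction is a row of three probabilities in the order negative, neutral, positive; the rows are made-up examples, not outputs of the trained classifier, and `tally_sentiment` is a hypothetical helper.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")  # assumed class order


def tally_sentiment(prob_rows):
    """Count tweets per sentiment, assigning each row to its most probable class."""
    counts = Counter({label: 0 for label in LABELS})
    for probs in prob_rows:
        best = max(range(len(LABELS)), key=lambda i: probs[i])
        counts[LABELS[best]] += 1
    return counts


# Made-up probability rows for four tweets
rows = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
]
counts = tally_sentiment(rows)
shares = {label: counts[label] / len(rows) for label in LABELS}
print(dict(counts))  # {'negative': 1, 'neutral': 2, 'positive': 1}
print(shares)        # {'negative': 0.25, 'neutral': 0.5, 'positive': 0.25}
```

Reporting shares alongside raw counts makes it easier to compare sentiment across periods with different tweet volumes.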

4 Conclusion

Twitter is a social networking platform with a global user base of over 500 million people. It has evolved into a global platform for disseminating information, debating ideas, and discussing current events. Because of the variety of news, viewpoints, and experiences shared by individuals and government agencies, Twitter is a rich source of health-related information. For this study, we employed natural language processing to assess sentiment in 78,320 English tweets. The AWD-LSTM model was used to train and tune our classifier with the ULMFiT approach, which is included in the Python Fastai package, and it reached an accuracy of 76.30%. We found that most tweets are neutral, with more negative sentiments than positive ones.

References

1. Group, T.S.V.H.W.: What influences vaccine acceptance: a model of determinants of vaccine hesitancy (2013)
2. Mahase, E.: Covid-19: UK approves Pfizer and BioNTech vaccine with rollout due to start next week. BMJ 371, m4714 (2020). PMID:33268330
3. Limb, M.: Covid-19: Data on vaccination rollout and its effects are vital to gauge progress, say scientists. BMJ 372, n76 (2021). PMID:33431370
4. Rosen, B., Waitzberg, R., Israeli, A.: Israel’s rapid rollout of vaccinations for COVID-19. Isr. J. Health Policy Res. 10(1), 1–14 (2021). https://doi.org/10.1186/s13584-021-00440-6
5. Liu, J., Liu, S.: The management of coronavirus disease 2019 (COVID-19). J. Med. Virol. 92(9), 1484–1490 (2020). PMID:32369222
6. Chong, K.C., Hu, P., Chan, S.Y., et al.: Were infections in migrants associated with the resurgence of measles epidemic during 2013–2014 in southern China? A retrospective data analysis. Int. J. Infect. Dis. 90, 77–83 (2020). https://doi.org/10.1016/j.ijid.2019.10.014
7. Chou, W.Y.S., Budenz, A.: Considering emotion in COVID-19 vaccine communication: addressing vaccine hesitancy and fostering vaccine confidence. Health Commun. 35(14), 1718–1722 (2020). PMID:33124475
8. Dowd, K.A., et al.: Science 354, 237 (2016)
9. Pardi, N., et al.: Nature 543, 248 (2017)
10. Dandekar, A., Narawade, V.: Twitter sentiment analysis of public opinion on COVID-19 vaccines. In: Bansal, J.C., Engelbrecht, A., Shukla, P.K. (eds.) Computer Vision and Robotics. Algorithms for Intelligent Systems. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-8225-4_10
11. Sv, P., Ittamalla, R., Deepak, G.: Analyzing the attitude of Indian citizens towards COVID-19 vaccine–a text analytics study. Diab. Metab. Syndr. Clin. Res. Rev. (2021)
12. Abd-Alrazaq, A., Alhuwail, D., Househ, M., Hamdi, M., Shah, Z.: Top concerns of tweeters during the COVID-19 pandemic: infoveillance study. J. Med. Internet Res. 22(4), e19016 (2020)
13. Garcia, K., Berton, L.: Topic detection and sentiment analysis in Twitter content related to COVID-19 from Brazil and the USA. Appl. Soft Comput. 101, 107057 (2021)
14. Mackey, T., Purushothaman, V., Li, J., Shah, N., Nali, M., Bardier, C., Liang, B., Cai, M., Cuomo, R.: Machine learning to detect self-reporting of symptoms, testing access, and recovery associated with COVID-19 on Twitter: retrospective big data infoveillance study. JMIR Public Health Surveill. 6(2), e19509 (2020)
15. Rufai, S.R., Bunce, C.: World leaders’ usage of Twitter in response to the COVID-19 pandemic: a content analysis. J. Public Health 42(3), 510–516 (2020)
16. Thelwall, M., Thelwall, S.: Retweeting for COVID-19: consensus building, information sharing, dissent, and lockdown life (2020). arXiv preprint arXiv:2004.02793
17. Morency, L.-P., Mihalcea, R., Doshi, P.: Towards multimodal sentiment analysis. In: ACM International Conference on Multimodal Interfaces (ICMI), p. 169. ACM, New York, USA (2011). https://doi.org/10.1145/2070481.2070509
18. Wöllmer, M., Weninger, F., Knaup, T., Schuller, B., Sun, C., Sagae, K., Morency, L.-P.: YouTube movie reviews: sentiment analysis in an audio-visual context. IEEE Intell. Syst. 28(3), 46–53 (2013). https://doi.org/10.1109/MIS.2013.34
19. Zadeh, A.: Micro-opinion sentiment intensity analysis and summarization in online videos. In: ACM International Conference on Multimodal Interaction (ICMI), pp. 587–591 (2015). https://doi.org/10.1145/2818346.2823317
20. Fontaine, J.R., Scherer, K.R., Roesch, E.B., Ellsworth, P.C.: The world of emotions is not two-dimensional. Psychol. Sci. 18(12), 1050–1057 (2007). https://doi.org/10.1111/j.1467-9280.2007.02024.x
21. Clavel, C., Callejas, Z.: Sentiment analysis: from opinion mining to human-agent interaction. IEEE Trans. Affect. Comput. 74–93 (2015). https://doi.org/10.1109/TAFFC.2015.2444846
22. Munezero, M.D., Montero, C.S., Sutinen, E., Pajunen, J.: Are they different? Affect, feeling, emotion, sentiment, and opinion detection in text. IEEE Trans. Affect. Comput. 5(2), 101–111 (2014). https://doi.org/10.1109/TAFFC.2014.2317187
23. Liu, B., Zhang, L.: A survey of opinion mining and sentiment analysis. In: Mining Text Data, pp. 415–463. Springer, U.S. (2012). https://doi.org/10.1007/978-1-4614-3223-4_13
24. McDuff, D., Kaliouby, R.E., Cohn, J.F., Picard, R.W.: Predicting ad liking and purchase intent: large-scale analysis of facial responses to ads. IEEE Trans. Affect. Comput. 6(3), 223–235 (2015). https://doi.org/10.1109/TAFFC.2014.2384198
25. Howard, J., Gugger, S.: Deep Learning for Coders With Fastai and PyTorch: AI Applications Without a PhD, 1st edn. O’Reilly Media Inc., Sebastopol, CA, USA (2020)
26. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic Differentiation in PyTorch. NIPS Autodiff Workshop (2017). Available online: https://openreview.net/forum?id=BJJsrmfCZ. Accessed on 14 Feb. 2020
27. Oliphant, T.: NumPy: A Guide to NumPy. Trelgol Publishing, Spanish Fork, UT, USA (2006)
28. Clark, A.: Python Imaging Library (Pillow Fork). Available online: https://github.com/python-pillow/Pillow. Accessed on 14 Feb. 2020
29. Merity, S., Keskar, N.S., Socher, R.: Regularizing and optimizing LSTM language models (2017). arXiv preprint arXiv:1708.02182
30. Howard, J., Ruder, S.: Universal language model fine-tuning for text classification (2018). arXiv preprint arXiv:1801.06146

Smart Cities Implementation: Australian Bushfire Disaster Detection System Using IoT Innovation

Mohammad Mohammad and Aleem Mohammed

1 Introduction In November 2019, at the early stage of COVID-19 pandemic, the bushfires burned in New South Wales 5.5 million hectares of vegetation, landscape, and more than the number around 2,000, houses destroyed and 25 lives lost in NSW alone [2]. According to Australian Bureau Statistics [3], “In New South Wales, 2019/20 was the most devastating bushfire season in the state’s history”. The absence of real-time information about bushfires, the state, local government, and other government agencies have adopted a dual approach to emphasize its quality publications. The lack of real-time information series, and the need to “rapid response” in providing updated information in an incremental change in an economic and social-based environment call the attention on researchers and decision makers to build a dedicated national bushfire detection system. The current patchwork of fire records cannot be delivered due to unavailability of information or may still be subject to change. On the other hand, some of the information obtained or sourced from media reporting, the accuracy of which has not been verified cannot deliver a reliable data. For example, [4] mentioned that area burnt in 2019/20 based on satellite estimate data was (30.38 million hectares) and government estimate was 39.8 million hectares, differ by more than 9 million hectares. Instead of creating a communication between bushfire and emergency services offices of state and local offices, the motive is to secure the forest environment by delivering the prior solution, which helps increasing the

M. Mohammad (B)
Senior Lecturer and Discipline Leader Software Engineering, Melbourne Institute of Technology, Sydney Campus, Sydney, Australia
e-mail: [email protected]
A. Mohammed
Computer Science Engineering, Melbourne Institute of Technology, Sydney, Australia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_37


response activities with accurate information. The Australian government should consider a new strategy for implementing a fire detection system (FDS) that can detect and collect reliable data with its own devices, blending wireless sensor data, medium-resolution GPS location, and drone-mounted cameras with field validation. As prevalent technologies such as big data, cloud-based IoT, and 5G cellular networks advance the Internet, it is now feasible to provide an FDS connected to a lower-cost network. According to the Cisco Annual Internet Report (2018–2023), the number of IoT devices is expected to reach 15 billion by 2023, a large increase in machine-to-machine (M2M) connections compared with 6 billion in 2018 [5]. In addition, a report published by International Data Corporation (IDC) in July 2020 predicts that the data collected from IoT devices will reach 73 ZB by 2025, most of it from security camera surveillance or industrial IoT applications [6]. Australian cities need to be prepared to accept smart technology, sensors, and a complete IoT infrastructure to ensure their citizens’ safety and to provide effective solutions. IoT and its applications clearly play an important role in the resilience and effectiveness of a bushfire incident system in all phases (before, during, and after a bushfire). In addition to the numerous benefits of sensors over traditional communication approaches, security-based networking systems against fire outbreaks are considered least-cost products in terms of equipment and installation, and in the commercial market the technology is being implemented more widely and is becoming very popular. Fire outbreaks can happen spontaneously, and for this reason security systems are needed to counter the risk.
Fire-detecting sensors play an important role in detecting fire and monitoring humidity levels, together with an accurate GPS location and a snapshot from the GPS location sensor and camera mounted on a drone. The sensors are connected to a Raspberry Pi board, which is itself configured on a wireless hotspot from a USB stick. The data acquired from the sensors is continuously sent to and stored in the cloud and forwarded to the nearest fire stations and houses, where it must be readily accessible so that users can take appropriate action based on the system alert. The benefit of this proposal is that IoT, an important enabler of smart technology, can be used to detect bushfires and monitor their status at low cost [1]. It also helps bushfire emergency operations to be performed intelligently with less human interaction. IoT can provide real-time information for recognizing, locating, monitoring, and managing bushfires intelligently, and it enables government stakeholders at state and local level to become interconnected and to obtain the smart solutions needed to solve the problems that bushfires pose. In particular, integrating IoT with other technologies (e.g., cloud computing, drones, a standalone single-board computer (Raspberry Pi), wireless communications, and flame sensors) can increase its effectiveness and play a significant role in detecting bushfires. The remainder of this paper is structured as follows. First, it discusses the research process and its limits. Second, it surveys the relevant literature in order to explain the theoretical gap and the foundational purpose of the research. Third, it addresses the information architecture that is enabled by


the Internet of Things. Fourth, it assesses the applicability of the fire detection system (FDS) by means of an illustrative scenario and an integration prototype built on a portable single-board computer (Raspberry Pi) in order to test the system’s potential uses. Finally, it discusses the findings of the research and presents the conclusion.

2 Related Work
Throughout its history, Australia has witnessed multiple catastrophic bushfires that not only caused significant damage to property but also led to loss of life on the days on which they reached their peak. The five deadliest blazes were as follows: Black Saturday in 2009 in Victoria, which claimed the lives of 173 people and destroyed 2000 homes; Ash Wednesday in 1983 in Victoria and South Australia, which claimed the lives of 75 people and destroyed almost 1900 homes; Black Friday in 1939 in Victoria, which claimed the lives of 71 people and destroyed around 650 homes; Black Tuesday in 1967 in Tasmania, which claimed the lives of 62 people and destroyed almost 1300 homes; and the Gippsland fires and Black Sunday of 1926 in Victoria, which claimed the lives of 60 people over the course of two months [9]. According to [9], the bushfires of 2019–2020 are estimated to have caused the deaths of at least 33 people and more than three billion animals. Because of the complexity of multiple data sources, the current patchwork of fire statistics cannot be delivered reliably: relying on multiple sources can lead to errors in data collection, miscommunication, data duplication, and ineffective responses to natural disasters. As noted in [7, 8], there is also growing interest among businesses, government, and other groups in building IoT-enabled intelligent architectures. One recent example is an IoT-enabled architecture for intelligent healthcare systems.
Innovative Internet of Things (IoT) technologies (e.g., drones, GPS, and fire sensors) offer new methods of detecting wildfires and ensuring the delivery of vital emergency information. IoT devices and related systems are projected to grow significantly in the near future. Because IoT enables multiple devices to connect and exchange data, fire detection and response methods for emergency situations can be developed using less expensive devices and adopted to send warning emails or alerts to the nearest fire station and to residents. Nevertheless, designing and developing such dynamic and complex fire detection architectures for the fire department and any linked organizations is a difficult and time-consuming task. Recent studies of smart city environments have combined aspects of wireless communication systems, disaster risk management, cloud computing, intelligent healthcare, and IoT [10, 11]. One of these studies examined the challenges associated with drones in the context


of the authors’ vision of a future in which drones are commonplace and “the concept of drone services” is prevalent. The authors argue that the type of service a drone is intended to perform should be the primary factor in selecting an appropriate drone. The study also highlights that the most common description of drone technology covers four dimensions: type-based, size-based, control-based, and altitude-based classification. Each of these four dimensions is closely related to a drone’s speed, payload, endurance, range, type of provider, and type of site (e.g., indoor or outdoor). Other studies investigate the concept of “helping the disabled”, in which drones provide services to disabled individuals (e.g., blind or deaf people). These drones could be deployed on demand and serve as the person’s “eyes” (through streaming video) and “ears” (through streaming audio), respectively [7, 8, 10, 11]. Drones have the potential to save lives; for instance, when the 2019 bushfires first started, a group of drones could have been dispatched to survey the forest area quickly, send regularly updated reports, and capture visual imagery. This would have allowed the fire stations to rapidly send someone to rescue the victims.

3 Research Methodology
The findings of this study will benefit ongoing efforts to enhance and assess the performance of the fire detection system (FDS). A design research (DR) method was therefore adopted for this study [8]. The purpose of DR is to increase knowledge and experience, to improve design processes and rules substantially, and to evaluate a product and respond to real or perceived difficulties [12]. However, the most important contribution is the information obtained from implementing the proposed design. The DR method [8] is broken down into three fundamental steps:
• Literature Review: Related literature is reviewed to identify the research problem within the domain of IoT and emergency information delivery to the Australian government.
• Design: The FDS is an IoT-enabled, information architecture driven approach developed to deal with the problem of reliable data collection identified in the literature.
• Evaluation: The applicability of the proposed FDS is evaluated with a portable Raspberry Pi device-based implementation prototype and with user testing.
The prototype is made available as a proof of concept (PoC), with the intention of guiding subsequent research and development in this vital area. The scope of the FDS is limited to the transmission and display of bushfire emergency data to the relevant authorities in Australia. Two perspectives illuminate the


significance of this research and the developed FDS. First, the research makes evident the need for new IoT-enabled information architecture design examples in the context of Australian bushfires. Second, Australian records show that bushfires are the most prevalent type of emergency. This has prompted the development of an improved or alternative emergency information notification architecture that includes the FDS in order to enable effective communication with the nearest fire station and affected citizens [9]. This study attempts to address a small gap in the research on the Australian climate environment in the context of bushfires. It is expected that the outcomes of this research will also prove useful for the local climate conditions of other countries.

4 Proposed Approach
This work aims to develop an innovative approach to detecting and reporting bushfires. In the event of a forest fire, the device can send an early warning message. It communicates through a Global System for Mobile Communications (GSM) network and a short message service (SMS). A GPS location sensor and a camera are fitted to a drone for this purpose. The Raspberry Pi unit comprises a controller board connected to humidity and temperature sensors. A GPS receiver is also interfaced with the controller so that the module’s position can be reported along with the sensor data. The cloud server then collects this message for further action or processing. The workflow diagram is shown in Fig. 1. The following sections present the project’s architecture, its algorithms, and its overall performance.
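As a rough illustration of the early warning message described above, the following Python sketch bundles one set of sensor readings with the GPS position into a JSON payload that a cloud server could collect. The class name `FireAlert`, its field names, and the JSON layout are assumptions made for illustration; the paper does not specify a message format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FireAlert:
    """One early-warning message (hypothetical layout, not from the paper)."""
    latitude: float        # decimal degrees from the GPS receiver
    longitude: float       # decimal degrees from the GPS receiver
    temperature_c: float   # temperature sensor reading
    humidity_pct: float    # humidity sensor reading
    flame_value: int       # raw flame sensor reading
    timestamp: str         # time of the reading as ISO 8601 text

    def to_json(self) -> str:
        """Serialize the alert so it can be sent to the cloud server."""
        return json.dumps(asdict(self), sort_keys=True)


def parse_alert(message: str) -> FireAlert:
    """Rebuild an alert on the server side from its JSON text."""
    return FireAlert(**json.loads(message))
```

Keeping the position and all sensor readings in one message means the server never has to match a flame reading to a location after the fact, which is one plausible reason to send them together.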

Fig. 1 Workflow diagram (labels: Fire, Flame, Image, GPS, Raspberry, Early, Cloud)


5 Comparison with Another Technology
We now compare the technology currently used in fire detection, the “optical sensor and digital camera” technique, with fire detection using an equipped drone. First, we explain the “optical sensor and digital camera” technique in more detail. Advances in sensors, digital cameras, image processing, and industrial computers led to the development of a system for the optical, automated early detection and warning of forest fires. Terrestrial systems use a variety of detection sensors, including but not limited to the following:
i. A video camera that is sensitive to the visible spectrum of smoke during the day and to the glow of a fire at night,
ii. Infrared (IR) thermal imaging cameras that detect the heat flux of a fire,
iii. Infrared spectrometers that identify the spectral characteristics of smoke,
iv. Light detection and ranging (LIDAR) systems, which measure laser light reflected back to the sensor,
v. Ultraviolet spectrometers that determine the chemical composition of the smoke.

The various optical systems, each operating according to a proprietary algorithm developed by its maker, all share the same general principle for detecting smoke and fire glow. Put simply, a stationary digital camera takes pictures at regular intervals. Each picture is composed of a number of pixels, and the processing unit tracks motion between pictures and counts how many pixels contain smoke or fire glow. It then passes the result to another algorithm, which decides whether or not to raise an alarm for the operators. Because localization is important, most optical systems have to be integrated with geographical maps. See Fig. 2. The choice of a specific kind of sensor or camera depends not only on the particular conditions of the operation but also on the available financial resources. AlarmEYE is a system that integrates infrared, black-and-white, and color frequency detection for the early detection of forest fires. Thanks to its infrared capability, it can distinguish between images of flames and images of hot gas. This system was manufactured and put into operation in Thailand [14]. The EYEfi SPARC optical sensors produced by EYEfi, Australia, for forest fire detection include the following:
• Camera (color during the day and ultra-low-light grayscale at night),
• Weather station,


Fig. 2 Forest watch system [16]

• Lightning detection sensor,
• Communication unit (0.25 Mbps),
• Power system.
A thermal camera or a webcam with pan, tilt, and zoom capabilities can be attached to the device. EYEfi does not yet offer automated smoke detection, but the company plans to introduce this feature in the near future. Put simply, EYEfi can provide images to fire departments whenever an operator spots smoke, and fire departments can use EYEfi technology with a GIS map to locate the smoke on the ground. For increased accuracy, the device incorporates both a weather station and a lightning detector [17]. The UraFire system, which originates from and is marketed in France, identifies smoke mainly by clustering motions in conjunction with a time input in order to reduce the number of false alarms [18, 19]. The Australian government, in collaboration with the Bushfire Cooperative Research Centre, has produced a very interesting report on the Australian experience with three optical devices for forest fire detection. The report forms part of the Bushfire Cooperative Research Centre’s ongoing research. The project compared the overall performance of the three optical detection systems with that of a human-staffed observation tower in order to determine which gave the better results. Mathews et al. [20] produced a comprehensive report covering the details of the project, the test environment, the test results, and the evaluation [21].
The three systems, EYEfi, FireWatch, and ForestWatch, were all tested in 2010 on three different types of fire in Tumut, New South Wales, and the Otway Ranges, Australia: research fires, private fires, and agency fires. See Fig. 3.
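The pixel-counting principle described earlier in this section (compare consecutive pictures from a fixed camera, count the pixels that look like smoke or fire glow, then let a second rule decide on the alarm) can be sketched in a few lines of Python. This is a simplified illustration and not the algorithm of any of the commercial systems named above: frames are plain lists of grayscale rows, and the function names and threshold values are assumptions.

```python
from typing import List

Frame = List[List[int]]  # grayscale image: rows of 0-255 brightness values


def count_candidate_pixels(previous: Frame, current: Frame,
                           change_threshold: int = 40) -> int:
    """Count pixels whose brightness changed by more than the threshold.

    A large brightness change between two pictures taken from a fixed
    camera is treated here as a crude stand-in for smoke or fire glow.
    """
    count = 0
    for prev_row, curr_row in zip(previous, current):
        for prev_px, curr_px in zip(prev_row, curr_row):
            if abs(curr_px - prev_px) > change_threshold:
                count += 1
    return count


def should_raise_alarm(candidate_pixels: int, total_pixels: int,
                       min_fraction: float = 0.05) -> bool:
    """Second-stage rule: alarm when enough of the picture has changed."""
    return total_pixels > 0 and candidate_pixels / total_pixels >= min_fraction
```

Splitting the work into a counting stage and a decision stage mirrors the two-step description in the text and lets the alarm rule be tuned without touching the image comparison.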


Fig. 3 Tower in Tumut, Australia, with the three systems [20]

Table 1 Detailed comparison to highlight the advantages and the disadvantages for the two techniques
The comparison | Image processing-based forest fire detection | Equipped drone
Technique | Fixed thermal cameras | Thermal camera, temp, humidity, and flame sensors
Response | After the fire breaks out it will be detected | Early detection
Processing | Get the fire indication via image processing | Get the fire indication by crossing the image, humidity, temperature, and flame indications
Accurate | Fair | Excellent
Fire located | Wide zone | Accurate via GPS
Coverage | Limited zone | Very huge area
Base | A high tower should be built to fix the camera | No need for any construction works

Compared with the fixed-camera technology, the equipped drone offers a major advantage: with its GPS locator, the response time is reduced (Table 1). Figure 4 shows the logical sequence and the outcome of the processed data. Another comparison, taken from [22], is given in Table 2.
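Table 2 reports accuracy and F-score. For readers unfamiliar with these metrics, the sketch below computes them from the four counts of a binary confusion matrix using the standard definitions (the F-score here is the F1 variant). The function name and any example counts are illustrative and are not taken from [22].

```python
def accuracy_and_f_score(tp: int, fp: int, fn: int, tn: int):
    """Return (accuracy, F-score) as fractions between 0 and 1.

    tp, fp, fn, tn are the true positives, false positives, false
    negatives, and true negatives of a binary classifier.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    # The F-score (F1) is the harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, f_score
```

Accuracy alone can look high when fires are rare in the data, which is why an F-score is commonly reported next to it.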

6 The Fire Detection System Architecture
Detecting bushfires and transmitting emergency alerts to the nearest fire station and to affected residents are the most important functions of the proposed architecture shown in Fig. 1. This design research (DR) technique


Fig. 4 Logical response sequence and the outcome action

Table 2 Comparison of algorithms in terms of accuracy and F-score
Algorithms: CART | RF | SVM | CART | RF | GBM | EONF | ISSNIP | RF
Temperature | Accuracy (%) | 98.9 | 98.8 | 98.8 | 97.7 | 94.7 | 97.7 | 98.5 | 95
Temperature | F-score (%) | 98.2 | 98.2 | 98.4 | 95 | 89 | 95 | NA | 94
Humidity | Accuracy (%) | 98.2 | 98.4 | 98.7 | 99.3 | 99.3 | 99.3 | 98.5 | 95
Humidity | F-score (%) | 97.8 | 98 | 98.3 | 99 | 99 | 99 | NA | 94


provides the emergency information that is enabled by the Internet of Things. IoT devices require a computational component (such as a Raspberry Pi) in order to connect to the Internet, and they should use the most dependable Internet connection available, such as the National Broadband Network (NBN). Connected devices now play a significant role in the development of the Internet of Things; all of them are Internet enabled and equipped with smart sensors.
First, we use a Raspberry Pi 4 Model B as the processing unit. This fourth-generation Raspberry Pi has a 1.5 GHz 64-bit quad-core ARM Cortex-A72 processor, on-board 802.11ac Wi-Fi, Bluetooth 5, full gigabit Ethernet (throughput not limited), LPDDR4 RAM in 2 GB, 4 GB, and 8 GB variants, two USB 2.0 ports, two USB 3.0 ports, and dual-monitor support via a pair of micro-HDMI (HDMI Type D) ports.
Second, the sensors collect information at the physical layer and transfer it to the Raspberry Pi. Figure 1 shows the Global Positioning System (GPS) module used with the Raspberry Pi to determine its position on Earth accurately. GPS satellites and ground network stations broadcast radio-frequency signals that the receiver uses to establish its position; the receiver itself does not need to transmit any information at this stage.
Third, the cloud is set up with a “Bushfire Info DB” database that stores the available emergency information received from authorized IoT-enabled devices, such as warning messages from sensors and other associated devices.
Fourth, a drone that is compatible with the Raspberry Pi and equipped with a webcam is used. When there is a fire, an image is taken and uploaded to the cloud over Wi-Fi while the buzzer is activated at the same time.
Fifth, the fire sensors employ various principles for flame detection. These principles exploit the physical or chemical properties of flames, such as the light emitted by the fire, the smoke generated by the flame, and the heat generated by the flame. A flame sensor is one that can sense a fire source or other light sources within the wavelength range of 760–1100 nm.
Sixth, a power bank delivering approximately 10 W can be connected to the Raspberry Pi to power it; the Raspberry Pi simply draws the energy it requires from the portable charger. Because the capacity of the power bank is 50,000 mAh, it is sufficient to keep the Raspberry Pi operational for an extended period.
Seventh, the DHT22 is a sensor that measures temperature and humidity at the same time. It provides a digital output: the readings it gathers are sent over the sensor’s data pin directly to the Raspberry Pi, which acts as the controller in this project, and the pin is polled for a value once every three or four seconds. The sensor generates and transmits data as a sequence of high voltages, which are interpreted as 1, and low voltages, which are interpreted as 0, which the controller can read and assemble into a value. In this case, the Raspberry Pi reads a 40-bit value from the sensor (forty high or low voltage pulses), which is equal to five bytes, and stores it in a program variable. The first two bytes represent the


humidity, the next two bytes represent the temperature, and the fifth byte is a checksum used to verify that the read was correct [13].
Eighth, a Wi-Fi dongle can provide the wireless connection between the Raspberry Pi and other devices in the cloud. The Raspberry Pi 4 also has built-in Wi-Fi and Bluetooth as standard, which makes it easier to transfer data between the Raspberry Pi and the cloud software.
Finally, because the cloud and application platform (CAP) must process the data according to the commands transmitted through the Raspberry Pi, the sensors need to be deployed with an information processing module. The Raspberry Pi acts as the central hub that receives the data from the sensors and the commands from the application. It handles configuration, command mapping, and protocol conversion to a specified format, and it uploads the data received from the sensors to the application platform. All of these functions are based on the instructions (Fig. 5).
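The five-byte sensor frame described above (two humidity bytes, two temperature bytes, one checksum byte) can be decoded as in the following Python sketch. It assumes the commonly documented DHT22 (AM2302) data format, in which humidity and temperature are transmitted as tenths of a unit and the top bit of the third byte marks a negative temperature; the DHT11 encodes its readings differently. It also assumes that the 40 bits have already been recovered from the data pin by other code. The function names are hypothetical.

```python
from typing import List, Tuple


def decode_dht22(bits: List[int]) -> Tuple[float, float]:
    """Decode 40 bits (most significant bit first) into (humidity, temperature).

    Bytes 0-1 hold humidity x10, bytes 2-3 hold temperature x10 (top bit of
    byte 2 marks a negative temperature), and byte 4 is the checksum.
    """
    if len(bits) != 40:
        raise ValueError("expected exactly 40 bits")
    # Pack each group of eight bits into one byte.
    data = []
    for start in range(0, 40, 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | (bit & 1)
        data.append(value)
    # The checksum is the low byte of the sum of the first four bytes.
    if (sum(data[:4]) & 0xFF) != data[4]:
        raise ValueError("checksum mismatch, discard this reading")
    humidity = ((data[0] << 8) | data[1]) / 10.0
    temperature = (((data[2] & 0x7F) << 8) | data[3]) / 10.0
    if data[2] & 0x80:
        temperature = -temperature
    return humidity, temperature


def bytes_to_bits(frame: bytes) -> List[int]:
    """Helper for experiments: expand bytes into a list of bits."""
    return [(byte >> shift) & 1 for byte in frame for shift in range(7, -1, -1)]
```

Rejecting a frame on a checksum mismatch, rather than trying to repair it, is the simplest safe choice because the sensor can be read again a few seconds later.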

Fig. 5 Fire detection system architecture


7 Results
Project demonstration: The GPS module, humidity sensor, temperature sensor, and fire detector were attached to the drone with tape, as shown in Fig. 6, and the sensors were programmed and configured on a Raspberry Pi.

7.1 GPS Module Results on the “ThingSpeak” Platform
The latitude and longitude values are essential for determining the location of the fire so that the authorities can take the action needed to contain the fire area. Figure 7 shows the latitude data, which was monitored at around 33.9204; the reading was recorded and plotted every minute. Figure 8 shows the longitude data, which was monitored at just above 151.0658, with the chart updated every minute. Figure 9 shows the fire value data, also updated every minute, and Fig. 10 shows the temperature values sensed by the DHT11 sensor.
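GPS modules of the kind used with a Raspberry Pi typically report position as NMEA text sentences, with latitude in ddmm.mmmm and longitude in dddmm.mmmm form. The Python sketch below shows one way such a sentence could be converted into decimal degrees like those plotted in Figs. 7 and 8. It is an illustration under stated assumptions: the paper does not describe its parsing code, the function names are hypothetical, the checksum is not verified, and the negative sign for southern and western hemispheres is the usual convention rather than something the paper states (it reports the magnitude 33.9204).

```python
from typing import Tuple


def nmea_to_decimal(value: str, hemisphere: str, degree_digits: int) -> float:
    """Convert an NMEA 'ddmm.mmmm' or 'dddmm.mmmm' field to decimal degrees."""
    degrees = int(value[:degree_digits])
    minutes = float(value[degree_digits:])
    decimal = degrees + minutes / 60.0
    # South and west are negative by the usual convention.
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gga_position(sentence: str) -> Tuple[float, float]:
    """Return (latitude, longitude) from a GGA sentence.

    The trailing checksum is not verified in this sketch.
    """
    fields = sentence.split(",")
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")
    latitude = nmea_to_decimal(fields[2], fields[3], 2)
    longitude = nmea_to_decimal(fields[4], fields[5], 3)
    return latitude, longitude
```

For a made-up sentence whose position fields are `3355.224,S,15103.948,E`, the conversion gives approximately -33.9204 and 151.0658, since 55.224 minutes is 0.9204 degrees and 3.948 minutes is 0.0658 degrees.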

7.2 Flame Sensor Results on the “ThingSpeak” Platform
The flame sensor outputs a value that indicates whether there is a fire in the area; the sensor has to detect a flame to produce the values reported in Table 3.
Fig. 6 Raspberry Pi mounted on the drone

Table 3 Fire detection result captured from the IR flame sensor
Time | 14:04 | 14:06 | 14:08 | 14:10 | 14:12
Fire value | 2600 | 2750 | 2950 | 2900 | 2900


Table 4 Temperature and humidity result captured from the DHT11 sensor module
Time | 10:45 | 10:50 | 10:55 | 11:00
Temperature | 19.4 | 19.5 | 19.8 | 19.9
Humidity | 51 | 50 | 50–52.5 | 48

7.3 Temperature and Humidity Sensor Detection Results on the “ThingSpeak” Platform
The temperature and humidity values are needed to make the alert more specific and to limit false alarms: when there is an out-of-control fire, the temperature rises and the humidity drops, and combined with the fire detection value this indicates with high confidence that there is a fire in the area located by the GPS module. The DHT11 is a combined temperature and humidity sensor in one module, with digital output. Table 4 lists the temperature and humidity values recorded over 15 min.
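The cross-checking idea in this subsection (raise the alert only when the flame reading, a temperature rise, and a humidity drop all agree) can be expressed as a small rule. The Python sketch below is a hedged illustration: the function name, the use of a baseline reading, and every threshold value are assumptions, since the paper gives no numeric decision thresholds. It also assumes that a higher flame value means a stronger flame indication, which depends on the sensor module used.

```python
def fire_confirmed(flame_value: int,
                   temperature_c: float,
                   humidity_pct: float,
                   baseline_temperature_c: float,
                   baseline_humidity_pct: float,
                   flame_threshold: int = 2500,
                   min_temperature_rise_c: float = 10.0,
                   min_humidity_drop_pct: float = 10.0) -> bool:
    """Confirm a fire only when all three indications agree.

    All threshold defaults are placeholders chosen for illustration.
    """
    flame_detected = flame_value >= flame_threshold
    temperature_rose = (temperature_c - baseline_temperature_c) >= min_temperature_rise_c
    humidity_dropped = (baseline_humidity_pct - humidity_pct) >= min_humidity_drop_pct
    return flame_detected and temperature_rose and humidity_dropped
```

Requiring agreement between independent sensors trades a slightly later alert for fewer false alarms, which matches the stated purpose of adding the temperature and humidity readings.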

7.4 Field Charts: Real-Time Data Captured from the ThingSpeak Cloud Platform
See Figs. 7, 8, 9, 10 and 11.
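Charts such as those in Figs. 7 to 11 are fed by writing readings to the fields of a ThingSpeak channel. The Python sketch below builds the request URL for one update through ThingSpeak's HTTP update endpoint, which accepts an `api_key` and numbered `field` parameters. The assignment of readings to field numbers and the placeholder key are assumptions; the paper does not list its channel configuration.

```python
from urllib.parse import urlencode

THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"


def build_update_url(write_api_key: str, latitude: float, longitude: float,
                     flame_value: int, temperature_c: float,
                     humidity_pct: float) -> str:
    """Build the URL for one channel update (field numbering is assumed)."""
    params = {
        "api_key": write_api_key,
        "field1": latitude,
        "field2": longitude,
        "field3": flame_value,
        "field4": temperature_c,
        "field5": humidity_pct,
    }
    return THINGSPEAK_UPDATE_URL + "?" + urlencode(params)
```

Sending the update is then a single HTTP GET on the returned URL, for example with `urllib.request.urlopen`, repeated once per minute to match the update interval reported for the charts.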

Fig. 7 Latitude data monitored


Fig. 8 Longitude data is monitored

Fig. 9 Fire value detection by DHT22 sensor

8 Conclusion
There is no better vantage point for monitoring nature than the sky, so we aim to put our eyes above the land that we want to protect by equipping drones with the necessary sensors and detectors; some day they may not only detect a fire and inform us but also fight it immediately. In our view, drones offer a class of location-based services that can be combined with a variety of mobile services. Although it is unlikely that everyone will own a drone in the near future, there may still be quite a number of them, whereas businesses using drone technology to supply particular services (such as a “phone-a-drone” service


Fig. 10 Temperature value detection by DHT 11 sensor

Fig. 11 Humidity value detection by DHT 11 sensor

or drone rental facilities) are a likely prospect in the near future. Regulatory, practical, and technological limits may confine drones to specific niche services, even though many people may have their own personal drones that they can take with them wherever they go. It is also easy to imagine providing drones with infrastructure and information that depend on their location when they arrive at specific places. Because it is still early days, we have not yet discussed metering, pricing, or business models such as drone leasing or Uber-style drone sharing. If the cyber-physical and social challenges mentioned earlier can be overcome, the IoT in the air will connect with the IoT on the ground to form the next phase of new services for end users. The variety of drones continues to increase, leading to the possibility that customized drones designed specifically for particular applications


will be produced. General-purpose drones that are useful across a wide range of applications and specialized drones for particular kinds of applications are two distinct trends, and both are on the rise. There is likely to be a significantly greater selection of new types of drones as they find their way into additional applications. Many emerging applications are still to come, such as in art (e.g., entertainment services that use drones as pixels for three-dimensional images in the sky) and in construction (e.g., ordering a set of drones to build a sculpture or an on-demand bridge) (Fig. 11).

References 1. Vandenbeld, John (1988). “The Making Of The Bush: a portrait of the island continent”. Nature of Australia. Episode 3. 55 minutes in. ABC TV. Retrieved 12 February 2020. 2. Tronson, Mark. “Bushfires – across the nation”. Christian Today. Retrieved 5 June 2021. 3. Australian Bureau Statistics Homepage, https://www.abs.gov.au/articles/droughts-firescycl ones-hailstorms-and-pandemic-march-quarter-2020, last accessed 2021/06/04. 4. Bowman, D., Williamson, G., Yebra, M.: Wildfires: Australia needs a national monitoring agency. Journal 584, 189–191 (2020) 5. Cisco Annual Internet Report, (2018–2023) White Paper accessed on 23/06/2021 https://www. cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/whitepaper-c11-741490.html 6. International Data Corporation (IDC), (2020) website accessed on 23/06/2021. https://www. idc.com/getdoc.jsp?containerId=prAP46737220#:~:text=IDC%20predicts%20that%20by% 202025,from%2018.3%20ZB%20in%202019 7. Accessed on 25/06/2021. https://www.electronicwings.com/raspberry-pi/gps-module. https:// www.electronicwings.com/raspberry-pi/gps-module-interfacing-with-raspberry-piinterfa cing-with-raspberry-pi 8. Gill, A., Phennel, N., Lane, D., Phung, V.: IoT-enabled emergency information supply chain architecture for elderly people: The Australian context, Information Systems, vol. 58, pp. 75-86, (2016) 9. Accessed on 09/07/2021 https://en.wikipedia.org/wiki/Bushfires_in_Australia 10. Qadir, Z., Ullah, F., Munawar, H., Al-Turjman, F.: Addressing disasters in smart cities through UAVs path planning and 5G communications: A systematic review, Computer Communications, vol. 168, pp. 114-135, (2021) 11. Alwateer, M., Loke, S., Zuchowicz, A.: Drone services: issues in drones for location-based services from human-drone interaction to information processing, Journal of Location Based Services, vol. 13, pp. 94-127, (2019) 12. Margolis, M.: Raspberry Pi Cookbook. 2nd edn. O’Reilly Media Inc, (2012). 13. 
Pallavi, A.: An Iot Based Forest Fire Detection using Raspberry Pi. International Journal of Recent Technology and Engineering 8, 9126–9132 (2019) 14. Accessed on 13/07/2021 https://www.seeedstudio.com/blog/2020/01/16/what-is-a-flames ensor-and-how-to-use-with-arduino/https://www.seeedstudio.com/blog/2020/01/16/what-isa-flame-sensor-and-how-to-use-with-arduino/ 15. Bell, C.: Beginning Sensor Networks with Arduino and Raspberry Pi. 1st edn. Berkeley, CA Apress (2013) 16. Hough, K. Vision Systems for Wide Area Surveillance: ForestWatch—a long range outdoor wildfire detection systemWildfire, 2007 17. Mathews, S., Ellis, P., Hurle, J. H.Evaluatuion of Three Systems2010AustraliaBushfire Cooperative Research Centre


18. Hartung, C., Han, R., Seielstad, C., Holbrook, S.FireWxNet: A multi-tiered portable wireless system for monitoring weather conditions in wildland fire environmentsProceedings of the 4th International Conference on Mobile Systems, Applications and Services (MobiSys ‘06) June 2006 Uppsala, SwedenACM28412-s2.0–3374897496 19. Hefeeda, M., Bagheri, M.Wireless sensor networks for early detection of forest firesProceedings of the IEEE Internatonal Conference on Mobile Adhoc and Sensor Systems (MASS ‘07)October 20072-s2.0–5024910848110.1109/MOBHOC.2007.4428702 20. Doolin, D., Sitar, N.Wireless Sensors for Wild Fire Monitoring2005San Diego, Calif, USA SmartStructure and Material 21. Alkhatib AAA. A Review on Forest Fire Detection Techniques. International Journal of Distributed Sensor Networks 2014 doi:https://doi.org/10.1155/2014/597368 22. M. R. Nosouhi, K. Sood, N. Kumar, T. Wevill and C. Thapa, “Bushfire Risk Detection Using Internet of Things: An Application Scenario,” in IEEE Internet of Things Journal, vol. 9, no. 7, pp. 5266-5274, 1 April1, 2022, doi: https://doi.org/10.1109/JIOT.2021.3110256

An Advanced and Ideal Method for Tumor Detection and Classification from MRI Image Using Gamma Distribution and Support Vector Machine S. K. Aruna, Rakoth Kandan Sambandam, S. Thaiyalnayaki, and Divya Vetriveeran

1 Introduction

A brain tumor is a mass of abnormal cells growing in the brain. Any type of tumor can cause problems in the body, depending on its severity. Tumors can be cancerous (malignant) or non-cancerous (benign). Brain tumors are also classified into two categories, primary and secondary: a primary brain tumor originates in the brain itself, whereas a secondary brain tumor originates in another part of the body and spreads to the brain. If not treated properly, the survival time for cancerous tumors is less than two years. A malignant brain tumor begins when normal cells or tissues in the brain acquire mutations in their DNA. Because of these mutations, the cells grow and divide at increased rates and remain alive while healthy cells die; as a result, a mass of abnormal cells forms in the brain, which is the tumor.

S. K. Aruna (B) · R. K. Sambandam · D. Vetriveeran
Department of Computer Science and Engineering, School of Engineering and Technology, CHRIST (Deemed to be University)—Kengeri Campus, Bangalore, Karnataka, India
e-mail: [email protected]
R. K. Sambandam e-mail: [email protected]
D. Vetriveeran e-mail: [email protected]
S. Thaiyalnayaki
Department of Computer Science and Engineering, School of Computing, Bharath Institute of Higher Education and Research (Deemed to be University), Chennai, Tamilnadu, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_38


A secondary brain tumor often occurs in people who have a history of cancer. The symptoms of a brain tumor vary depending on its size, rate of growth, and location. Image segmentation is the process of analyzing and extracting information from a given image with or without compromising its integrity [1].

2 Related Works

Menze et al. report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS), organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients, manually annotated by up to four raters, and to 65 comparable scans generated using tumor image simulation software.

Wang et al. propose a new kind of hybrid diffusion-based level set method to efficiently address the complexity of the image segmentation problem. It differs from traditional methods in that it is performed on the image diffusion space rather than the intensity space. First, nonlinear diffusion based on the total variation flow and the additive operator splitting scheme is applied to the original intensity images to obtain the diffused image. Then, the local diffusion energy term is constructed by applying homomorphic unsharp masking to the diffused image in order to carry out a local piecewise constant search.

Shreyas et al. note that MRI images obtained by RF signal acquisition are affected by bias, which confounds their analysis. The bias is caused by the imperfect radiation pattern of the system. Intensity standardization is used to correct the resulting heterogeneity in the MRI images. Preprocessing of the data consists of two stages, training and transformation, after which values are calculated for each image patch. A CNN model is then used to go deeper into each layer and learn the complex features of the network.

Havaei et al. classify each pixel based on its position relative to the surrounding region within the entire brain. Three cascaded architectures are discussed, and the need for cascaded architectures is explained. The architecture revolves around concatenating CNNs, i.e., the output of one is passed as input to the other. Input, local pathway, and pre-output concatenation techniques are discussed. The paper also discusses the class imbalance in the BraTs dataset, which justifies its use of two-phase training. In the first phase, the training data consist of equally populated patches. The second phase retrains only the output layer.

Wang et al. use a novel patch-driven level set method for the segmentation of neonatal brain MR images, taking advantage of sparse representation techniques. The authors align existing libraries of manually segmented images and use them for sparse representation in a patch-based approach. Spatial consistency in the probability maps derived from the subject-specific atlas is then enforced by considering the similarities between a patch and its neighboring patches.

3 Proposed Work

The main aim of this research work is to automatically segment brain tumors in images extracted from MRI scans. Manual segmentation of brain tumors by experts consumes a lot of time; sometimes it takes more than a day. Automatic segmentation of brain tumors from MRI scans is therefore essential, as it speeds up the diagnosis of the patient. The paper focuses on the process of segmenting brain tumors. To follow the paper, an understanding of the different types of scans is required. After studying the types of data, gathering information on the methods, and checking the availability of datasets, the following methodology is proposed.

3.1 Basic Model

The proposed approach consists of three stages: preprocessing, feature extraction, and classification. The details of these stages are given in Sect. 4.4. In general, the dataset is divided into training and validation sets and then preprocessed. After preprocessing, the preprocessed images are extracted and used to train the model. After training, the model is loaded again for feature extraction and classification on a new test dataset.

4 Experimental and/or Analytical Work Completed in the Paper

4.1 Dataset

The first and most important step in building the machine learning model is gathering the initial dataset. The paper uses the BraTs 2020 dataset. Only the training portion of that dataset was used, and it was divided for training, validation, and testing. From around 350 images, only 100 cases of randomly shuffled high-grade glioma (HGG) and low-grade glioma (LGG) were used for training. Validation was done on 40 cases of randomly shuffled HGG and LGG, and testing on 24 randomly chosen cases. The initial size of the images was 1024*728.


4.2 Gamma Distribution

The proposed approach identifies the tumor region automatically, using an image segmentation method together with segmentation-based edge analysis. Edge coordinates are matched using the gamma distribution, and the enhanced edges are then recognized using a machine learning method. The gamma distribution technique is used in this paper for training and testing, and also for feature extraction.
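As an illustration of how a gamma distribution can be related to image intensities, the following minimal sketch estimates the shape and scale parameters of a gamma distribution from a set of pixel intensities by the method of moments (shape = mean²/variance, scale = variance/mean). The paper does not state how its gamma parameters are obtained, so the estimator, the function name fit_gamma_moments, and the sample intensities are all assumptions made for this example.

```python
from statistics import fmean, pvariance


def fit_gamma_moments(intensities):
    """Method-of-moments estimate of gamma shape and scale (illustrative only)."""
    # The gamma distribution is defined for x > 0, so non-positive pixels are skipped.
    values = [float(v) for v in intensities if v > 0]
    if len(values) < 2:
        raise ValueError("need at least two positive intensities")
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    if var == 0:
        raise ValueError("intensities are constant; variance is zero")
    shape = mean * mean / var   # k = mean^2 / variance
    scale = var / mean          # theta = variance / mean
    return shape, scale


# Hypothetical pixel intensities from one image patch (not taken from the paper).
patch = [52, 55, 61, 59, 70, 64, 58, 66]
shape, scale = fit_gamma_moments(patch)
print(f"shape={shape:.3f}, scale={scale:.3f}")
```

By construction, the product shape × scale equals the sample mean of the patch, which gives a quick sanity check on the estimate.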

4.3 Modeling, Analysis, and Design

Figure 1 shows the process flow of this paper, which can be divided into three main processes. The first is data processing: collecting the data (i.e., obtaining the dataset), preprocessing, and extracting patches. This step makes the data easier for the model to learn from. The next process is segmentation. Lastly, classification is performed by the model on new data. In more detail, the input dataset is first loaded into the model, followed by preprocessing, in which filtering, sharpening, and smoothing are applied. Then, segmentation and feature extraction of the brain tumor are performed using the gamma distribution.

Fig. 1 Flowchart of automatic brain tumor segmentation


4.4 Implementation and Testing

An MRI scanner uses powerful magnets to polarize and excite hydrogen nuclei (i.e., protons) in the water molecules of human tissue, producing a detectable signal that is spatially encoded to form images of the body. MRI images differ from the ordinary images used in computer vision and can consume a lot of memory. The dataset used here is BraTs, with an image size of about 512*512 for training the model.

5 Modules

5.1 Data Acquisition

The first and most important module in constructing an ML model is collecting the initial dataset. The dataset required for the experimental evaluation was obtained from the BraTs dataset. Only the training portion of that dataset was used, and it was divided for training, validation, and testing. From around 350 images, only 100 cases of randomly shuffled HGG and LGG were used for training. The initial size of the images was 1024*728. Validation was done on 40 cases of randomly shuffled HGG and LGG, and testing on 24 randomly chosen cases.
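The split described above can be sketched as follows. This is only an illustration under stated assumptions: the case identifiers, the fixed random seed, and the function name split_cases are hypothetical, and the paper does not describe how the shuffling was actually implemented.

```python
import random


def split_cases(case_ids, n_train=100, n_val=40, n_test=24, seed=0):
    """Shuffle the cases and cut disjoint training, validation and test subsets."""
    needed = n_train + n_val + n_test
    if len(case_ids) < needed:
        raise ValueError(f"need at least {needed} cases, got {len(case_ids)}")
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:needed]
    return train, val, test


# Hypothetical identifiers standing in for the roughly 350 available cases.
cases = [f"case_{i:03d}" for i in range(350)]
train, val, test = split_cases(cases)
print(len(train), len(val), len(test))
```

Cutting all three subsets from a single shuffled list guarantees that no case appears in more than one subset.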

5.2 Preprocessing

Data preprocessing is applied to the given MRI dataset. Data preprocessing is a technique that transforms raw data into an understandable format. Real-world data are often incomplete, inconsistent, or lacking in certain behaviors or trends, and are likely to contain many errors. Data preprocessing is a proven method of resolving such issues.

5.3 Segmentation

Segmentation of an MRI image entails dividing a single image into sub-regions with similar attributes. The main purpose of image segmentation is to extract features of the image that can be combined or split in order to build the object of interest (OOI), on which analysis and interpretation are then performed; segmentation techniques include clustering, thresholding, etc. The proposed method uses a combination of the expectation-maximization (EM) algorithm and the level set method for segmentation of the MRI images.
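To make the EM step more concrete, the sketch below runs EM on a one-dimensional list of pixel intensities with a two-component Gaussian mixture and labels each pixel with its more likely component. It covers only the EM part; the level set stage is not shown. The choice of Gaussian components, the initialization, and the function name em_two_gaussians are assumptions for illustration, since the paper does not specify its mixture model.

```python
import math


def em_two_gaussians(values, iterations=50, min_var=1e-6):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    xs = [float(v) for v in values]
    n = len(xs)
    overall_mean = sum(xs) / n
    start_var = max(sum((x - overall_mean) ** 2 for x in xs) / n, min_var)
    mu = [min(xs), max(xs)]          # start the two means at the extremes
    var = [start_var, start_var]
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each intensity.
        resp = []
        for x in xs:
            dens = [
                weight[k]
                * math.exp(-((x - mu[k]) ** 2) / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in range(2)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(
                sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                min_var,
            )
            weight[k] = nk / n
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return mu, var, weight, labels


# Hypothetical intensities: a dark background group and a bright region.
pixels = [10, 11, 12, 9, 10, 100, 101, 99, 102, 100]
mu, var, weight, labels = em_two_gaussians(pixels)
print(labels)
```

In a full pipeline the resulting labels would form a rough mask that a level set method could then refine; that step is outside the scope of this sketch.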


6 Results and Analysis

The proposed procedure is fully automated in identifying tumor images: it trains on edge-based image segmentation coordinates using the orthogonal gamma distribution together with the machine learning approach. The mean, standard deviation, skewness, and smoothness of the brain tumor are also calculated for each tumor identification. The significant feature of the orthogonal gamma distribution with the ML approach is self-identification of the region of interest with an enhanced image segmentation approach that uses edge coordinate matching, which sets it apart from previously used techniques. For better generalization, the mean, standard deviation, entropy, variance, smoothness, etc., were computed at the end. The accuracy of the gamma distribution method on the model is about 97%. The accuracy of this model is calculated by taking the ratio between the true positive and false positive rates (Figs. 2, 3, 4 and 5).
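The statistical features named above can be computed from the pixel intensities of a segmented region as in the following sketch. The definitions used here are common textbook ones and are assumptions rather than the authors' exact formulas: population variance, moment-based skewness, Shannon entropy of the intensity histogram in bits, and smoothness taken as 1 − 1/(1 + variance). The function name region_features and the sample values are hypothetical.

```python
import math
from collections import Counter


def region_features(pixels):
    """First-order statistical features of a region's intensities (sketch)."""
    xs = [float(p) for p in pixels]
    n = len(xs)
    if n == 0:
        raise ValueError("region is empty")
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    std = math.sqrt(variance)
    # Moment-based skewness; defined as 0 for a constant region.
    skewness = (sum((x - mean) ** 3 for x in xs) / n) / std ** 3 if std > 0 else 0.0
    # Shannon entropy (bits) of the empirical intensity histogram.
    counts = Counter(xs)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Smoothness measure: 0 for a constant region, approaching 1 as variance grows.
    smoothness = 1 - 1 / (1 + variance)
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "skewness": skewness,
        "entropy": entropy,
        "smoothness": smoothness,
    }


# Hypothetical intensities for a tiny region.
features = region_features([1, 1, 3, 3])
print(features)
```

In practice the variance is often normalized to the intensity range before the smoothness is computed, so the raw value here is only indicative.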

Fig. 2 Initial screen of the output

Fig. 3 Loading an image into the model


Fig. 4 Result after being uploaded in the model

Fig. 5 Feature extraction and classification of the tumor

7 Conclusion

A fully automatic and accurate method for segmentation of the whole brain tumor and intratumoral regions using the gamma distribution technique in medical imaging is presented. The proposed method was verified and evaluated quantitatively on the training dataset of BraTs 2020. Machine learning with Notepad++ and a MATLAB instance was used. An accuracy of about 97% was obtained from the model. The model has been tested on datasets from different persons and performs well in detecting the tumor status of those patients, which can also help in planning their treatment.


Acknowledgements The authors are grateful to CHRIST (Deemed to be University), Bengaluru and Bharath Institute of Higher Education and Research (Deemed to be University), Chennai for the facilities offered to carry out this work.

References

1. Peyrl, A., Frischer, J., Hainfellner, J.A., Preusser, M., Dieckmann, K., Marosi, C.: Brain tumors—other treatment modalities. In: Handbook of Clinical Neurology, vol. 145, pp. 547–560. Elsevier B.V (2018)
2. Wang, F., Jiang, R., Zheng, L., Meng, C., Biswal, B.: 3D U-Net based brain tumor segmentation and survival days prediction. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 11992, 131–141 (2020). https://doi.org/10.1007/978-3-030-46640-4_13
3. Neugut, A.I., et al.: Magnetic resonance imaging-based screening for asymptomatic brain tumors: a review. Oncologist 24(3), 375–384 (2019). https://doi.org/10.1634/theoncologist.2018-0177
4. Thaiyalnayaki, S., Sasikala, J., Ponraja, R.: Indexing near-duplicate images in web search using Minhash algorithm on conference ELSEVIER. In: International Conference On Processing Of Materials, Minerals And Energy (July 29th–30th) 2016, Ongole, Andhra Pradesh, India (2016)
5. Lather, M., Singh, P.: Investigating brain tumor segmentation and detection techniques. Procedia Comput. Sci. 167(2019), 121–130 (2020). https://doi.org/10.1016/j.procs.2020.03.189
6. RakothKandan, S., Dr. Sasikala, J.: Segmentation techniques for medical images—an appraisal. Int. J. Comput. Appl. 153(10), 27–31 (2016). ISSN 0975-8887
7. Tang, Z., et al.: Deep learning of imaging phenotype and genotype for predicting overall survival time of glioblastoma patients. IEEE Trans. Med. Imaging 39(6), 2100–2109 (2020). https://doi.org/10.1109/TMI.2020.2964310
8. Rakoth Kandan, S., Dr. Sasikala, J.: Computer-aided diagnosis and classification of brain tumor identification using PNN and SVM. Int. J. Pharm. Technol. 9(2), 29920–29932 (2017) (Elsevier). ISSN 0975-766X
9. Rehman, A., Naz, S., Razzak, M.I., Akram, F., Imran, M.: A deep learning-based framework for automatic brain tumors classification using transfer learning. Circ. Syst. Signal Process. 39(2), 757–775 (2020). https://doi.org/10.1007/s00034-019-01246-3
10. Lachinov, D., Shipunova, E., Turlapov, V.: Knowledge distillation for brain tumor segmentation. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 11993, 324–332 (2020). https://doi.org/10.1007/978-3-030-46643-5_32
11. Ostrom, Q.T., Patil, N., Cioffi, G., Waite, K., Kruchko, C., Barnholtz-Sloan, J.S.: CBTRUS statistical report: primary brain and other central nervous system tumors diagnosed in the United States in 2013–2017. Neuro. Oncol. 22(Supplement_1), IV1–IV96 (2020). https://doi.org/10.1093/neuonc/noaa200
12. Thaiyalnayaki, S., Sasikala, J., Ponraj, R.: Detecting near-duplicate images using segmented Minhash algorithm. J. Int. J. Adv. Intell. Paradigms
13. Litjens, G., et al.: A survey on deep learning in medical image analysis. Med. Image Anal. 42(2012), 60–88 (2017). https://doi.org/10.1016/j.media.2017.07.005
14. Angulakshmi, M., Lakshmi Priya, G.G.: Automated brain tumour segmentation techniques—a review. Int. J. Imaging Syst. Technol. 27(1), 66–77 (2017). https://doi.org/10.1002/ima.22211
15. Abd-Ellah, M.K., Awad, A.I., Khalaf, A.A.M., Hamed, H.F.A.: Design and implementation of a computer-aided diagnosis system for brain tumor classification. In: Proc. Int. Conf. Microelectron. ICM, 73–76 (2016). https://doi.org/10.1109/ICM.2016.7847911
16. Aldape, K., et al.: Challenges to curing primary brain tumours. Nat. Rev. Clin. Oncol. 16(8), 509–520 (2019). https://doi.org/10.1038/s41571-019-0177-5
17. Gonzalez, K., Camp, M., Zhang, M.: 3D brain tumor segmentation: narrow UNet CNN. In: IEEE MIT URTC (Undergraduate Research Technology Conference), pp. 1–4 (2018) (Online). Available: https://par.nsf.gov/biblio/10095110

Forecasting Stock Exchange Trends for Discrete and Non-discrete Inputs Using Machine Learning and Deep Learning Techniques Teja Dhondi, G. Ravi, and M. Vazralu

1 Introduction

Forecasting stock exchange prices is one of the complex problems that can be addressed with machine learning and deep learning techniques. Stock data are nonlinear, do not follow any particular pattern, and are governed by multiple factors such as the nature of the markets, investor ideology, the performance of public companies, and politics, so complex mathematical calculations are needed to understand and predict them. It is essential that stock market information be processed effectively and efficiently before it is fed to machine learning models; with such processing, stock and index values can be predicted with greater accuracy. A stock market prediction system can help traders and stockholders limit the losses they might incur from suddenly falling stock prices. Machine learning models can automatically identify and learn the patterns within stock data and hence are suitable for predicting stock trends. In this paper, we compare the performance of various machine learning models like XGBoost, AdaBoost, linear regression, etc., and deep learning mechanisms like RNN and LSTM in forecasting stock trends. Ten technical indicators covering the past 10 years serve as the input to our system. Two approaches to prediction, non-discrete and discrete, are employed to understand the impact of preprocessing before the data are fed to the machine learning models. The non-discrete approach uses values derived from stock data such as open, close, high, and low values. The discrete approach uses preprocessing to convert the non-discrete data into discrete data, which are then fed to the model for stock prediction. The performance of each model is evaluated on three classification metrics, and the best tuning parameters are identified for each model. We believe that this work can pave the way for building a real-time system that predicts stock market trends in live environments.

T. Dhondi (B) · G. Ravi · M. Vazralu
Computer Science and Engineering, Malla Reddy College of Engineering and Technology, Hyderabad, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_39

1.1 Aim of the Project

In this project, we aim to develop effective and efficient machine learning models that can accurately predict the movement of prices on a stock exchange. Such a system will help stockholders and traders make appropriate decisions during stock trading. Because stock market information contains multiple patterns, machine learning models are well suited to predicting stock trends and prices: they can learn the patterns by themselves. This is of great help, as traders can keep up with fluctuating prices and improve their trading decisions.

1.2 Scope of the Project

This project uses an existing stock market dataset, the Tehran stock data, covering the past 10 years with 10 technical indicators. We aim to predict trends on this data and to build a Web application in which users can enter the technical indicators of their choice and obtain a prediction of the trend. Maintaining user history and working with live environments are outside the scope of this project.

2 Literature Survey

2.1 A Local and Global Event Sentiment-Based Efficient Stock Exchange Forecasting Using Deep Learning

Stock trading and investment in stock markets are important for investors and venture capitalists, who invest in businesses to earn profits. Hence, it is important for them to understand and predict how a stock will perform in the near future [1]. They will invest only if they have confidence in the company. All investments in stock markets are risky, and at the same time they offer high profitability compared to traditional investment options. These markets and exchanges can change abruptly and are affected by economic, political, and other major events occurring at the time. In this article, the authors studied the impact of events that occurred locally and globally on stocks. Pakistan, Turkey, the United States, and Hong Kong were the markets considered for the study. A Twitter dataset was used to understand the sentiment of events that occurred between 2012 and 2016 and their effect on the corresponding stock markets [2, 3]. The sentiment of each event is computed using sentiment analysis, and its relation to the stock trend is observed. Experimental evaluation reveals that using the sentiment of events for stock market prediction greatly improves the accuracy of the results [4, 5].

2.2 Deep Learning-Based Feature Engineering for Stock Price Movement Prediction

Creating machine learning models for stock market prediction has always been a challenging task because of the type of data that stock markets hold [6]. The data are nonlinear in nature and often noisy. With advancements in AI and deep learning techniques, forecasting of stock market trends has become possible. In this article, a novel technique called the multi-filters neural network was developed to extract financial features from the data in order to forecast stock prices [7]. This mechanism uses both convolutional neural networks and recurrent neural networks. Experimental evaluation shows that it outperforms other traditional approaches [8].

3 System Analysis

3.1 Existing System

Stock price forecasting has always been an arduous problem for financial and statistical experts. Prices need to be forecast because an investor wants to purchase stocks that are likely to rise and sell stocks that are likely to fall in the near future, so as to maximize profits and minimize losses. One traditional way of forecasting stock prices is fundamental analysis, which relies on an organization's annual growth rate, market position, etc. [9, 10]. The other is technical analysis, which relies on a stock's previous prices and values and uses historical data and patterns to forecast the price. These methods do not forecast prices accurately, because many uncertain factors directly influence stock prices, such as political situations, a company's public profile, and local and global events. Because the existing techniques cannot reliably predict future trends, investors might incur heavy losses [11, 12].


3.2 Proposed System

In this project, we compare the performance of various machine learning models like XGBoost, AdaBoost, linear regression, etc., and deep learning mechanisms like RNN and LSTM in forecasting stock trends. Ten technical indicators covering the past 10 years serve as the input to our system. Two approaches to prediction, non-discrete and discrete, are employed to understand the impact of preprocessing before the data are fed to the machine learning models. The non-discrete approach uses values derived from stock data such as open, close, high, and low values. The discrete approach uses preprocessing to convert the non-discrete data into discrete data, which are then fed to the model for stock prediction.

3.2.1 Non-discrete Data

In this process, the inputs to the prediction models are calculated from the mathematical formula of each technical indicator. The indicators are normalized so that large values do not overwhelm small ones.
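A minimal sketch of such a normalization step is given below. Min-max scaling to the range [0, 1] is assumed here purely for illustration; the text does not say which normalization the authors applied, and the function name min_max_normalize and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale one indicator series to the range [0, 1] (illustrative sketch)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw values of a single technical indicator.
raw_indicator = [10, 20, 30]
scaled = min_max_normalize(raw_indicator)
print(scaled)
```

Each indicator would be scaled separately, so that an indicator measured in the thousands and one measured in fractions end up on the same scale.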

3.2.2 Discrete Data

In this process, non-discrete data are converted to discrete data using each technical indicator's behavior and attributes. At every step, the discrete data are assigned +1 for an increasing trend and −1 for a decreasing trend.
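The conversion can be illustrated with a single indicator. The sketch below uses a simple moving average and assumes one plausible rule, +1 when the closing price is above the average and −1 otherwise. The rule, the window length, the function names, and the sample prices are assumptions for this example; the text does not give the per-indicator rules that were actually used.

```python
def simple_moving_average(closes, window):
    """Simple moving average for every position that has a full window."""
    return [
        sum(closes[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(closes))
    ]


def discretize_sma(closes, window=3):
    """Map each step to +1 (close above its average) or -1 (otherwise)."""
    averages = simple_moving_average(closes, window)
    aligned_closes = closes[window - 1:]  # closes that have a matching average
    return [1 if c > a else -1 for c, a in zip(aligned_closes, averages)]


# Hypothetical closing prices.
closes = [10, 11, 12, 11, 9]
signals = discretize_sma(closes)
print(signals)
```

With this encoding every indicator contributes a value from the same two-element set, which removes differences of scale between indicators entirely.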

3.2.3 Advantages

The proposed system is able to predict stock market trends accurately, which helps minimize losses and maximize profits.

4 Implementation

Below is the proposed modular implementation of the paper.

Admin module: The admin of the system performs the following tasks:
1. Login to the system
2. Upload the dataset
3. Exploratory data analysis
4. Feature engineering for normalizing the features
5. Splitting the dataset for training and testing
6. Creating the models using various ML & DL algorithms

User module: The user of the system performs the following tasks:
1. Login to the system
2. Predict stock price
3. View forecasted result

5 System Design

System architecture: This section explains the system design with step-by-step figures. Forecasting of stock price movement with non-discrete data is shown in Fig. 1 and with discrete data in Fig. 2. The data flow diagrams for the admin and the user are shown in Figs. 3 and 4, the use case diagrams for the admin and the user in Figs. 5 and 6, and the sequence diagram in Fig. 7.

Fig. 1 Forecasting stock price movement with non-discrete data


Fig. 2 Forecasting stock price movement with discrete data

6 Results

In this section, accuracy is reported in tabular form. Table 1 shows the accuracy in % for non-discrete input, and Table 2 shows the accuracy in % for discrete input.

7 Conclusion

The main intention of this project is to forecast stock price movement using machine learning and deep learning techniques. Ten years of data for 10 indicators and four stock exchange groups (diversified financials, non-metallic minerals, basic metals, and petroleum) were chosen from the Tehran Stock Exchange as the dataset. The data were preprocessed into two forms, non-discrete input data and discrete input data. The input data were fed in these two formats to 9 different machine learning and deep learning algorithms to understand their performance. It has been observed that discrete data give better accuracy than non-discrete data, and that deep learning approaches like RNN and LSTM outperform all the other machine learning algorithms.

Fig. 3 Data flow diagram: admin

Fig. 4 Data flow diagram: user

Fig. 5 Use case diagram: admin

Fig. 6 Use case diagram: user

Fig. 7 Sequence diagrams

Table 1 Accuracy in % for non-discrete input

Model                 Div fin   Metals   Minerals   Petroleum
Decision tree         68        65       65         64
Random forest         72        70       73         73
AdaBoost              72        70       70         72
XGBoost               71        70       70         71
SVC                   71        72       73         70
Naïve Bayes           68        68       67         68
KNN                   71        73       72         71
Logistic regression   72        72       73         71
ANN                   75        73       74         71
RNN                   86        82       87         82
LSTM                  86        82       87         83


Table 2 Accuracy in % for discrete input

Model                 Div fin   Metals   Minerals   Petroleum
Decision tree         84        84       86         83
Random forest         85        85       86         84
AdaBoost              86        85       86         84
XGBoost               85        85       86         84
SVC                   85        86       87         86
Naïve Bayes           84        83       84         84
KNN                   85        86       86         85
Logistic regression   85        86       87         86
ANN                   87        87       87         87
RNN                   90        88       89         89
LSTM                  90        88       88         89

References

1. Murphy, J.J.: Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications. Penguin (1999)
2. Turner, T.: A Beginner’s Guide To Day Trading Online, 2nd edn. Simon and Schuster, New York, USA (2007)
3. Maqsood, H., Mehmood, I., Maqsood, M., Yasir, M., Afzal, S., Aadil, F., Selim, M.M., Muhammad, K.: A local and global event sentiment based efficient stock exchange forecasting using deep learning. Int. J. Inf. Manage. 50, 432–451 (2020)
4. Long, W., Lu, Z., Cui, L.: Deep learning-based feature engineering for stock price movement prediction. Knowl. Based Syst. 164, 163–173 (2019)
5. Duarte Duarte, J.B., Talero Sarmiento, L.H., Sierra Juárez, K.J.: Evaluation of the effect of investor psychology on an artificial stock market through its degree of efficiency. Contaduría y Administración 62(4), 1361–1376 (2017)
6. Lu, N.: A Machine Learning Approach to Automated Trading. Boston College Computer Science Senior, Boston, MA, USA (2016)
7. Hassan, M.R., Nath, B., Kirley, M.: A fusion model of HMM, ANN and GA for stock market forecasting. Expert Syst. Appl. 33(1), 171–180 (2007)
8. Huang, W., Nakamori, Y., Wang, S.-Y.: Forecasting stock market movement direction with support vector machine. Comput. Oper. Res. 32(10), 2513–2522 (2005)
9. Venkateswara Reddy, E., Naveen Kumar, G.S., Swathi, B., Siva Naga Dhipti, G.: Deep learning approach for image-based plant species classification. In: International Conference on Soft Computing and Signal Processing, pp. 405–412. Springer, Singapore (2021)
10. Sun, J., Li, H.: Financial distress prediction using support vector machines: Ensemble vs. Individual. Appl. Soft Comput. 12(8), 2254–2265 (2012)
11. Naveen Kumar, G.S., Reddy, V.S.K.: High performance algorithm for content-based video retrieval using multiple features. In: Intelligent Systems and Sustainable Computing, pp. 637–646. Springer, Singapore (2022)
12. Ou, P., Wang, H.: Prediction of stock market index movement by ten data mining techniques. Modern Appl. Sci. 3(12), 28–42 (2009)

An Efficient Intrusion Detection Framework in Software-Defined Networking for Cyber Security Applications

Meruva Sandhya Vani, Rajupudi Durga Devi, and Deena Babu Mandru

1 Introduction

Software-defined networking (SDN) has recently emerged as one of the promising approaches for reshaping the future of global networks and the Internet [1–3]. SDN is an emerging technology that separates the data plane from the control plane. The control plane is positioned to address security issues across the network [4–7]. Deep learning (DL) is a class of artificial intelligence models that extracts information from large volumes of data with complex structure [8]. It is particularly useful for learning from large amounts of unlabeled data [9, 10]. An intrusion detection system (IDS) can be used to monitor network traffic for unauthorized activity [11–13]. An analysis engine, sensors, and a reporting panel are the main components of an IDS. SDN-based security is a basic model for controlling malicious flows in SDN switches [14, 15]. With the growing number of attacks and severe threats against a variety of computer systems, such as hosts, networks, clouds, and the Internet of Things, intrusion detection systems have received considerable attention from security specialists in recent years [16]. An IDS can identify malicious attacks such as data theft [17, 18].

M. Sandhya Vani · R. Durga Devi · D. B. Mandru (B) Department of Information Technology, Malla Reddy Engineering College, Hyderabad 500100, India e-mail: [email protected] M. Sandhya Vani e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_40


2 Preliminary Knowledge

This section presents a brief discussion of intrusion detection systems, intrusion detection methodologies, deep learning, and SDN-based IDS.

2.1 Intrusion Detection Systems

An intrusion detection system (IDS) is a software or hardware system that analyzes network or host activity for signs of malicious behavior and generates reports for administrators and management systems. The essential capability of intrusion detection and prevention systems (IDS/IPS) is to identify attacks and intrusions. Intrusion detection and prevention has become a fundamental security concern. A variety of techniques [19] can be used for the intrusion detection process.

2.2 Intrusion Detection Methods

Detection models are organized into two types: signature-based models and fact-based (profile-based) models. A signature-based model compares traffic against a collection of known signatures. A profile-based model, on the other hand, maintains profiles of users, hosts, applications, and connections. In addition, a host or network uses two key detection schemes, signature-based and anomaly-based, to analyze activity and identify intrusions. Overall, there are three key intrusion detection strategies: Signature-Based Detection (SD), Anomaly-Based Detection (AD), and Stateful Protocol Analysis (SPA). References [19, 20] provide further details on these three main intrusion detection techniques.
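The difference between the signature-based and anomaly-based schemes can be illustrated with a small sketch. It is not the detection logic of any system cited here: the signature strings, the per-host connection counts, and the z-score threshold are hypothetical values chosen only to show the two ideas side by side.

```python
from statistics import mean, stdev

# Hypothetical signatures; a real IDS ships a large, curated rule set.
KNOWN_SIGNATURES = ("/etc/passwd", "' OR 1=1")


def signature_alert(payload):
    """Signature-based detection: flag traffic that matches a known pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def build_profile(history):
    """Profile of normal behavior: mean and standard deviation of a metric."""
    return mean(history), stdev(history)


def anomaly_alert(value, profile, z_threshold=3.0):
    """Anomaly-based detection: flag values far away from the stored profile."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold


# Hypothetical connections-per-minute history for one host.
profile = build_profile([10, 12, 11, 9, 13])
print(signature_alert("GET /etc/passwd HTTP/1.1"))  # matches a known signature
print(anomaly_alert(40, profile))                   # far outside the profile
```

The signature check can only recognize patterns it already knows, whereas the profile check can flag previously unseen behavior at the cost of possible false alarms.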

2.3 Software-Defined Network

The software-defined network (SDN) framework is versatile, manageable, and dynamic, and it is increasingly important [10]. It consolidates multiple network devices for dynamic and integrated management of the computer network infrastructure, and it allows network administrators to quickly monitor risky requests. The SDN architecture consists of three layers: control, infrastructure, and application. It is significant that the data plane and the control plane each have specific roles with respect to the functions of network devices; [21] provides further details on the SDN data and control planes. Through the programmability of the SDN controller, the SDN architecture strongly supports network inspection and monitoring tools. An SDN-based IDS has been proposed, as shown in Fig. 1. The goal of that proposal is to detect malicious activity in SDN network traffic. Additional details on the three primary intrusion detection methods are given in the cited literature.

Fig. 1 Intrusion detection system based on SDN

3 Methodology and Implementation

In recent years, research on intrusion detection has made use of data mining techniques such as clustering of traffic data and intrusion detection and classification. Portnoy introduced a clustering-based IDS approach, known as anomaly intrusion detection, which is trained on unlabeled data to detect new intrusions. This paper proposes a deep learning (DL) model to build an efficient SDN-based intrusion detection framework. The NSL-KDD dataset is examined in its entirety using several clustering algorithms together with the proposed model. The dataset is divided into a normal class and four major attack classes. The results provide an extensive analysis and a detailed comparative study of the different attack types contained in the NSL-KDD dataset.

4 Proposed Intrusion Data-Based Clustering and Detection Scenarios

This section gives a detailed explanation of the proposed work for clustering traffic data and detecting intrusions in software-defined networks. The complete proposed hybrid system for clustering and classification of SDN-based intrusion data is shown in Fig. 2.

Fig. 2 Proposed hybrid system combined clustering and classification for intrusion detection in SDN


4.1 Clustering Algorithms Based on Traffic Data

Clustering divides data into sets of similar and dissimilar items. The main advantage of clustering is that it can identify anomalies among intrusions without prior knowledge. This paper examines and discusses five clustering algorithms: K-means, farthest first, canopy, expectation-maximization (EM), and density-based clustering. The clustering algorithms used in the comparison are described below.

1. K-means clustering algorithm. K-means is a cluster analysis method in which K disjoint clusters are formed based on the feature values of the items to be grouped.
2. Farthest first clustering algorithm. It follows a strategy similar to that of K-means, choosing centroids and assigning objects to clusters.
3. Canopy clustering algorithm. It is an unsupervised pre-clustering scheme that tends to be used as a preprocessing stage for hierarchical clustering or K-means. The canopy scheme uses two thresholds, T1 > T2, and a fast approximate distance metric.
4. Expectation-maximization (EM) clustering algorithm. EM is regarded as an extension of the K-means scheme. It assigns an object to a cluster according to a weight that expresses the probability of membership. Compared with the K-means scheme, the EM scheme offers higher accuracy.
5. Density-based clustering algorithm. Given a set of points in a space, it groups points that are close together (points with many near neighbors) and marks isolated points in low-density regions (whose nearest neighbors are too far away) as anomalies.
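As an illustration of the canopy scheme in item 3, the following is a minimal sketch of canopy pre-clustering with the two thresholds T1 > T2. It is not the implementation used in the paper; the Manhattan distance stands in for the "fast approximate distance metric", and the sample points and threshold values are hypothetical.

```python
def manhattan(a, b):
    """Cheap distance used here as the fast approximate metric (an assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def canopy(points, t1, t2, distance=manhattan):
    """Group points into possibly overlapping canopies.

    Returns a list of (center_point, member_indices). Requires t1 > t2.
    """
    if not t1 > t2:
        raise ValueError("canopy clustering requires T1 > T2")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]  # take the next unprocessed point as a center
        members, still_remaining = [], []
        for i in remaining:
            d = distance(points[center], points[i])
            if d < t1:
                members.append(i)          # inside the loose threshold: joins this canopy
            if d >= t2:
                still_remaining.append(i)  # outside the tight threshold: stays a candidate
        canopies.append((points[center], members))
        remaining = still_remaining
    return canopies


# Hypothetical, already normalized 2-D points.
pts = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.5), (1.0, 1.0), (0.9, 1.0)]
for center, members in canopy(pts, t1=1.2, t2=0.3):
    print(center, members)
```

Points that fall between T2 and T1 of a center belong to that canopy but remain available to later canopies, which is why canopies may overlap. The resulting centers can then seed a more expensive algorithm such as K-means.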

4.2 SDN-Based IDS Using Deep Learning Model

This section describes the proposed SDN-based IDS based on deep learning. The proposed system helps identify malicious attacks as intrusion activities. Figure 3 shows the proposed SDN-based deep learning model for the IDS process. The NSL-KDD dataset is used to evaluate the proposed SDN-based IDS with the DL scheme. All tests were carried out in Spyder using the Python programming language and the Anaconda Navigator software, on a machine with an Intel Core i5 processor, 12 GB of RAM, and a 500 GB hard drive. The NSL-KDD dataset was created to resolve some inherent problems of the KDD Cup 1999 dataset. It combines three kinds of features: traffic-related, content-related, and basic features. According to their characteristics, the attacks in the dataset are classified as U2R (User to Root) attacks, R2L (Remote to Local) attacks, probing attacks, and DoS (Denial of Service) attacks.
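The grouping of individual attack labels into the four categories, and the reduction to a normal-versus-malicious label, can be sketched as follows. The mapping lists only a small, illustrative subset of the attack names found in NSL-KDD; the helper names `to_category` and `to_binary` are hypothetical and are not taken from the paper.

```python
# Illustrative subset of NSL-KDD attack labels; the full dataset has more.
ATTACK_CATEGORY = {
    "normal": "Normal",
    "neptune": "DoS", "smurf": "DoS", "back": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "portsweep": "Probe", "nmap": "Probe", "satan": "Probe",
    "guess_passwd": "R2L", "warezclient": "R2L", "ftp_write": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R", "loadmodule": "U2R",
}


def to_category(label):
    """Map a raw attack label to Normal, DoS, Probe, R2L, or U2R."""
    return ATTACK_CATEGORY.get(label, "Unknown")


def to_binary(label):
    """Binary target for a normal-versus-malicious classifier: 0 normal, 1 attack."""
    return 0 if label == "normal" else 1


labels = ["normal", "neptune", "nmap", "guess_passwd", "rootkit"]
print([to_category(x) for x in labels])
print([to_binary(x) for x in labels])
```

Labels missing from the mapping are reported as "Unknown" rather than silently assigned to a category, so gaps in the mapping stay visible.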


Fig. 3 SDN-based deep learning model for intrusion detection

5 Results and Discussions

This section presents the comparative results, the discussion, and the simulation environment setup for the proposed intrusion-data clustering and detection scenarios.


5.1 Results of the Environmental Simulation of the Clustering Algorithms Based on Traffic Data

Before applying the clustering algorithms, we normalize all attributes of the dataset to the range 0–1 and set the number of clusters to four. The NSL-KDD dataset is clustered using the K-means, farthest first, canopy, expectation-maximization (EM), and density-based algorithms, with the training dataset containing the major types of attacks. The performance of these algorithms is measured by the number of instances per cluster, the running time, and the number of incorrectly clustered instances. The results of the tested algorithms are shown in Tables 1, 2, and 3 and Figs. 4, 5, 6, 7, and 8. The instances, including normal cases, are distributed across four clusters (Clusters 0–3). The clustered instances belong to the classes Normal, DoS, R2L, U2R, and Probe. The results of the clustered instances using the K-means algorithm are shown in Fig. 4. Table 1 shows the number of clustered instances for each

Fig. 4 Distribution of instances to clusters using the K-means clustering algorithm

Fig. 5 Distribution of instances to clusters using the canopy clustering algorithm


Fig. 6 Distribution of instances to clusters using the farthest first clustering algorithm

Fig. 7 Distribution of instances to clusters using the canopy clustering algorithm

Fig. 8 Distribution of instances to clusters using the EM clustering algorithm


Table 1 The simulation results of the K-means clustering algorithm

Cluster number   No. of instances   Percentage   Normal   dos      probe   r2l    u2r
Cluster 0        37,026             25%          101      36,109   807     9      0
Cluster 1        41,533             28%          35,166   734      4328    1233   72
Cluster 2        49,724             34%          38,820   6216     2182    2459   47
Cluster 3        19,624             13%          2880     9928     6637    179    0

Time taken to build the model (full training data): 4.51 s

Table 2 The simulation results of the farthest first clustering algorithm

Cluster number   No. of instances   Percentage   Normal   dos      probe   r2l    u2r
Cluster 0        51,452             35%          11,092   38,749   1511    90     10
Cluster 1        30,342             21%          9789     12,280   7654    619    0
Cluster 2        59,970             41%          54,145   1671     1092    2955   107
Cluster 3        6143               4%           1941     287      3697    216    2

Time taken to build the model (full training data): 0.39 s. Incorrectly clustered instances: 47,143 with a percentage of 31.87%

Table 3 The simulation results of the canopy clustering algorithm

Cluster number   No. of instances   Percentage   Normal   dos      probe   r2l    u2r
Cluster 0        67,355             46%          57,738   2546     4001    2962   108
Cluster 1        37,114             25%          135      36,162   808     9      0
Cluster 2        23,934             16%          16,357   4344     2494    728    11
Cluster 3        19,504             13%          2737     9935     6651    181    0

Time taken to build the model (full training data): 2.53 s. Incorrectly clustered instances: 46,628 with a percentage of 31.53%

tested cluster. In addition, this table shows the distribution of the instances of each attack. The K-means scheme requires 4.51 s to build the cluster models, and the number of incorrectly clustered instances is 65,108. Figure 5 shows the clustered instance results obtained with the farthest first algorithm. Table 2 presents the number of clustered instances in each tested cluster. The farthest first scheme requires 0.39 s to build the cluster models, and the number of incorrectly clustered instances is 47,143. The canopy algorithm yields the clustered instances shown in Fig. 6. Table 3 shows the number of clustered instances for each tested cluster. In addition, this table shows the distribution of the instances of each attack. The canopy scheme requires 2.53 s to build the cluster models, and the number of incorrectly clustered instances is 46,628. Figure 7 shows the results of clustering the instances with the expectation-maximization (EM) algorithm. Table 6 shows the number of clustered

Table 4 Comparison between the clustering algorithms based on execution time

Algorithm        Time (s)
K-means          4.51
Farthest first   0.39
Canopy           2.53
EM               40.48
Density-based    5.41

instances for each tested cluster. The EM scheme requires 40.48 s to build the cluster models, and the number of incorrectly clustered instances is 36,667. Figure 8 shows the clustered instance results obtained with the density-based clustering algorithm. Table 4 presents the number of clustered instances in each tested cluster. The density-based clustering scheme requires 5.41 s to build the cluster models, and the number of incorrectly clustered instances is 48,184. Table 4 and Fig. 9 compare the five clustering algorithms in terms of the number of instances in each of the four clusters. Their execution times are compared in Table 4 and Fig. 10. A comparison of the five clustering algorithms in terms of the number of incorrectly clustered instances is given in Table 5 and Fig. 11. Overall, the simulation and comparison results show that the distribution of instances varies from one cluster to another. The farthest first clustering algorithm has the shortest execution time of the five clustering algorithms. Moreover, among the five algorithms, the EM clustering algorithm produces the fewest incorrectly clustered instances.

Table 5 Comparison between the clustering algorithms based on the incorrectly clustered instances

Algorithm        Incorrectly clustered instances   Percentage
K-means          65,108                            44.0196%
Farthest first   47,143                            31.8734%
Canopy           46,628                            31.5252%
EM               36,667                            24.7906%
Density-based    48,184                            32.5772%
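The counts of incorrectly clustered instances can be cross-checked against the per-class distributions in Tables 1, 2, and 3. The paper does not state how the counts were obtained; the sketch below assumes a classes-to-clusters style evaluation in which every cluster is mapped to a distinct class so that the number of matching instances is maximal, and everything else counts as incorrectly clustered. Under that assumption the three counts reported for K-means, farthest first, and canopy are reproduced. The function and variable names are illustrative.

```python
from itertools import permutations

CLASSES = ("Normal", "dos", "probe", "r2l", "u2r")

# Rows are Clusters 0-3, columns follow CLASSES (values from Tables 1-3).
TABLES = {
    "K-means": [[101, 36109, 807, 9, 0], [35166, 734, 4328, 1233, 72],
                [38820, 6216, 2182, 2459, 47], [2880, 9928, 6637, 179, 0]],
    "Farthest first": [[11092, 38749, 1511, 90, 10], [9789, 12280, 7654, 619, 0],
                       [54145, 1671, 1092, 2955, 107], [1941, 287, 3697, 216, 2]],
    "Canopy": [[57738, 2546, 4001, 2962, 108], [135, 36162, 808, 9, 0],
               [16357, 4344, 2494, 728, 11], [2737, 9935, 6651, 181, 0]],
}


def incorrectly_clustered(counts):
    """Return (errors, percentage) under the best one-to-one cluster-to-class map."""
    total = sum(sum(row) for row in counts)
    n_classes = len(counts[0])
    # Try every assignment of distinct classes to the clusters (brute force).
    best = max(
        sum(row[c] for row, c in zip(counts, assignment))
        for assignment in permutations(range(n_classes), len(counts))
    )
    errors = total - best
    return errors, 100.0 * errors / total


for name, counts in TABLES.items():
    errors, pct = incorrectly_clustered(counts)
    print(f"{name}: {errors} incorrectly clustered ({pct:.2f}%)")
```

Brute force over all assignments is adequate here because there are only four clusters and five classes; a larger problem would call for an assignment algorithm instead.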

Table 6 Accuracy based comparison of different algorithm

Algorithm                            Accuracy
Logistic regression (LR)             15.049%
Linear discriminant analysis (LDA)   82.333%
Gaussian naive bayes (NB)            51.889%
Proposed deep learning (DL)          94.21%


Fig. 9 Distribution of instances to clusters using the density-based clustering algorithm

Fig. 10 Comparison between the clustering algorithms based on the number of instances

Fig. 11 Comparison between the five clustering algorithms based on the number of incorrectly clustered instances


Fig. 12 Comparison of the proposed DL algorithm with different algorithms

5.2 Simulation Results of the SDN-Based IDS Using Deep Learning Model

The deep learning-based IDS model is used for binary attack classification: it determines whether the input data is normal or malicious. The classification accuracy is evaluated for all 41 features, and the proposed deep learning model outperformed other machine learning techniques such as LR, LDA, and NB in terms of detection of different attack classes, as given in Table 6 and Fig. 12. In the proposed model, the training percentage is 20%, the number of epochs is 200, the batch size is 10, and the number of selected attributes is 16. Figure 13 shows the accuracy and loss percentages for different numbers of epochs: increasing the number of epochs increases the accuracy percentage while decreasing the loss percentage.

6 Conclusion and Future Work

Detecting and identifying unknown threats is a critical task for providing a secure SDN framework, and it calls for an efficient intrusion detection system. Accordingly, this paper provides a clustering-based analysis of the NSL-KDD dataset using the density-based, farthest first, canopy, expectation-maximization (EM), and K-means clustering algorithms. In addition, it presents a deep learning model for building an effective intrusion detection system that can recognize unknown, illegitimate, and harmful activities. The simulation results show that the farthest first scheme produced a more favorable distribution of instances than the other four clustering schemes. Furthermore, in detecting intrusions, the deep learning model surpasses the existing approaches in terms of accuracy. The proposed DL method achieved a high detection accuracy of 94.21%. In future work, we intend to use advanced artificial intelligence tools to recognize new forms of attack on SDN networks. Additionally, a real-world implementation of an IoT-based SDN will be built and applied in cybersecurity applications.

Fig. 13 Proposed deep learning model accuracy

References
1. Yan, Q., Yu, F., Gong, Q., Li, J.: Software-defined networking (SDN) and distributed denial of service (DDoS) attacks in cloud computing environments: a survey, some research issues, and challenges. IEEE Commun. Surv. Tutor. 18(1), 602–622 (2015)
2. Ali, J., Roh, B., Lee, B., Oh, J., Adil, M.: A machine learning framework for prevention of software-defined networking controller from DDoS attacks and dimensionality reduction of big data. In: Proceedings of IEEE International Conference on Information and Communication Technology Convergence (ICTC), Jeju, South Korea, pp. 515–519 (2020)
3. Ali, J., Roh, B.: An effective hierarchical control plane for software-defined networks leveraging TOPSIS for end-to-end QoS class-mapping. IEEE Access 8, 88990–89006 (2020)
4. Bakshi, K.: Considerations for software defined networking (SDN): approaches and use cases. In: Proceedings IEEE Aerospace Conference, Big Sky, MT, USA, pp. 1–9 (2013)
5. Karakus, M., Durresi, A.: A survey: control plane scalability issues and approaches in software-defined networking (SDN). Comput. Netw. 112(3), 279–293 (2017)
6. Karakus, M., Durresi, A.: Quality of service (QoS) in software defined networking (SDN): a survey. J. Netw. Comput. Appl. 80(4), 200–218 (2017)
7. Kwon, H.: Defending deep neural networks against backdoor attack by using de-trigger autoencoder. IEEE Access 9, 2169–3536 (2021)
8. Kwon, H., Baek, J.: Adv-plate attack: adversarially perturbed plate for license plate recognition system. J. Sens. 5, 1–16 (2021)
9. Tang, T., Mhamdi, L., McLernon, D., Zaidi, S., Ghogho, M.: Deep learning approach for network intrusion detection in software defined networking. In: Proceedings of International Conference on Wireless Networks and Mobile Communications (WINCOM), Fez, Morocco, pp. 258–263 (2016)
10. Sultana, N., Chilamkurti, N., Peng, W., Alhadad, R.: Survey on SDN based network intrusion detection system using machine learning approaches. Peer-to-Peer Netw. Appl. 12(2), 493–501 (2018)
11. Hemdan, E., Manjaiah, D.: Digital investigation of cybercrimes based on big data analytics using deep learning. In: Proceedings of Deep Learning and Neural Networks: Concepts, Methodologies, Tools, and Applications, 1st edn., vol. 2, pp. 615–632. IGI Global, USA (2020)
12. Jabez, J., Muthukumar, B.: Intrusion detection system (IDS): anomaly detection using outlier detection approach. Procedia Comput. Sci. 48(7), 338–346 (2015)
13. Duque, S., Omar, M.: Using data mining algorithms for developing a model for intrusion detection system (IDS). Procedia Comput. Sci. 61, 46–51 (2015)
14. Gonzalez, C., Charfadine, S.M., Flauzac, O., Nolot, F.: SDN-based security framework for the IoT in distributed grid. In: Proceedings of International Multidisciplinary Conference on Computer and Energy Science (IMCCES), Split, Croatia, pp. 1–5 (2016)
15. Liu, Y., Kuang, Y., Xiao, Y., Xu, G.: SDN-based data transfer security for internet of things. IEEE Internet of Things J. 5(1), 257–268 (2017)
16. Al-Jarrah, O., Arafat, A.: Network intrusion detection system using attack behavior classification. In: Proceedings of IEEE International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, pp. 1–6 (2014)
18. Su, M.: Prevention of selective black hole attacks on mobile ad hoc networks through intrusion detection systems. Comput. Commun. 34(1), 107–117 (2011)
19. Raghav, I., Chhikara, S., Hasteer, N.: Intrusion detection and prevention in cloud environment: a systematic review. Int. J. Comput. Appl. 68(24), 7–11 (2013)
20. Seshadri Ramana, K., Bala Chowdappa, K., Obulesu, O., et al.: Deep convolution neural networks learned image classification for early cancer detection using lightweight. Soft Comput. 26, 5937–5943 (2022). https://doi.org/10.1007/s00500-022-07166-w
21. Mandru, D.B., ArunaSafali, M., RaghavendraSai, N., SaiChaitanya Kumar, G.: Assessing deep neural network and shallow for network intrusion detection systems in cyber security. In: Smys, S., Bestak, R., Palanisamy, R., Kotuliak, I. (eds.) Computer Networks and Inventive Communication Technologies. Lecture Notes on Data Engineering and Communications Technologies, vol. 75. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-3728-5_52

Inter-Antenna Interference Cancellation in MIMO GFDM Using SVD-MMSE Method

Nivetha Vinayakamoorthi and Sudha Vaiyamalai

1 Introduction

Next generation wireless systems [1] are expected to provide ubiquitous communication and seamless connectivity [2] for different application scenarios, namely massive machine type communication (mMTC), enhanced mobile broadband (eMBB) and tactile internet (TI). Generalized frequency division multiplexing (GFDM) [3] is a flexible modulation technique that fulfils these diverse and challenging requirements [4] through the choice of prototype pulse shaping filter, subcarriers and subsymbols. However, prototype filtering in GFDM introduces inter-symbol interference (ISI) and inter-carrier interference (ICI) [5]. Multiple input multiple output (MIMO) GFDM in spatial multiplexing (SM) mode provides a higher data rate, which in turn introduces inter-antenna interference (IAI) along with ICI and ISI. This paper presents the singular value decomposition-based minimum mean square error (SVD-MMSE) equalization method for the spatially multiplexed MIMO GFDM system to cancel IAI; the proposed method also achieves a significant peak to average power ratio (PAPR) reduction. The contributions of this work are summarized below:

(i) The SVD-MMSE equalization method is proposed to cancel IAI in the MIMO GFDM system.
(ii) The performance of IAI cancellation is analysed in terms of BER for different filter roll-off factors and modulation techniques, and compared with the conventional MIMO GFDM system.
(iii) It is found that the proposed method reduces the PAPR for different filter roll-off factors with respect to the conventional system.

N. Vinayakamoorthi (B) · S. Vaiyamalai Department of Electronics and Communication Engineering, National Institute of Technology, Tiruchirappalli 620015, India e-mail: [email protected] S. Vaiyamalai e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_41


1.1 Related Works

The existing methodology to eliminate IAI in MIMO systems [6] is not feasible for MIMO GFDM because of its nonorthogonal nature [7]. In the conventional system, the interferences are cancelled by using equalization and detection techniques. Linear equalizers such as the MMSE equalizer [8] and the zero forcing (ZF) equalizer [9] are used to reduce interferences; however, residual ISI remains. Iterative detection receivers, namely the Markov Chain Monte-Carlo algorithm [10], expectation propagation [11] and MMSE-based sorted QR decomposition [12], provide significant interference cancellation at high signal to noise ratio (SNR) with several iterations. Noniterative detectors such as MMSE-based parallel interference cancellation [13] cancel the interference with increased algorithm complexity, whereas sparse factorization of the channel matrix [14] reduces the algorithm complexity but degrades the error rate for a large number of subcarriers. Sphere decoding based successive interference cancellation [15] and tensor-based precoding [16] provide better detection performance with increased implementation complexity. From the literature, it is evident that the existing algorithms cancel the interference in an iterative manner and provide better BER performance only at high SNR levels. SVD [17] can fully exploit the spatial multiplexing capabilities in MIMO systems with nonlinear distortion. In [18], it is shown that SVD precoding reduces the PAPR in OFDM-based communication systems. In this paper, the SVD-MMSE equalization method is proposed to cancel IAI in the MIMO GFDM system, and the proposed method also reduces PAPR.

2 System Model

The transceiver model of the SM MIMO GFDM system with the proposed SVD-MMSE equalization method for T transmit and R receive antennas is shown in Fig. 1. GFDM is a block-based multicarrier system, and each block contains K subcarriers and M subsymbols (i.e. N = MK). The binary data bits $\mathbf{b}^{(t)}$ of the $t$-th transmit antenna are mapped to complex data symbols $\mathbf{s}^{(t)}$. The complex data symbols are passed through the pulse shaping filter, which is circularly shifted in both the frequency and time domains.

Fig. 1 The transceiver model of MIMO GFDM with SVD-MMSE method


The GFDM signal [3] transmitted on the $t$-th antenna [4] is given as,

$$d^{(t)}[n] = \sum_{k=0}^{K-1} \sum_{m=0}^{M-1} g[(n - mK) \bmod N]\, e^{j 2\pi \frac{k}{K} n}\, s^{(t)}_{m,k}, \tag{1}$$

Similarly, the GFDM signal in matrix form [5] is written as,

$$\mathbf{d}^{(t)} = \mathbf{A}\mathbf{s}^{(t)}, \tag{2}$$

where $\mathbf{A} = [\mathbf{g}_{0,0} \ldots \mathbf{g}_{0,K-1} \ldots \mathbf{g}_{M-1,K-1}]$ and $\mathbf{g} = [g[0] \; g[1] \ldots g[N-1]]^{T}$ is the pulse shaping filter of order $N \times 1$. The GFDM signals from the $t$-th transmit antenna to the $r$-th receive antenna after cyclic prefix (CP) addition are sent through independent multipath Rayleigh block fading channels with $L$-tap channel impulse response (CIR) $h^{(r,t)}[n]$ based on [7]. Then, the received signal at the $r$-th antenna after removal of CP [4] equals,

$$\mathbf{y}^{(r)} = \sum_{t=0}^{T-1} \mathbf{H}^{(r,t)} \mathbf{x}^{(t)} + \mathbf{w}^{(r)}, \tag{3}$$

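where $\mathbf{H}^{(r,t)}$ is the $N \times N$ circular convolution matrix based on the CIR $h^{(r,t)}[n]$ and $\mathbf{w}^{(r)}$ denotes AWGN with distribution $\mathbf{w}^{(r)} \sim \mathcal{CN}(0, \sigma_w^2 \mathbf{I})$, where $\sigma_w^2$ is the noise variance. In the conventional method, after channel equalization (i.e. $\mathbf{H}_{eq} = \hat{\mathbf{H}}^{-1}$), the MMSE-based GFDM demodulation is performed, and the received signal after GFDM demodulation at the $r$-th receive antenna is expressed as,

$$\mathbf{z}^{(r)} = \mathbf{B}\mathbf{y}^{(r)}_{eq}, \tag{4}$$

where $\mathbf{y}^{(r)}_{eq} = \mathbf{H}_{eq}\mathbf{y}^{(r)}$, $\mathbf{B} = \left(\mathbf{I}_N \frac{\sigma_w^2}{\sigma_s^2} + \mathbf{A}^H \mathbf{A}\right)^{-1} \mathbf{A}^H$ and $\sigma_s^2$ denotes the variance of the data symbols. However, the conventional system exhibits IAI due to spatial multiplexed channels. To tackle this problem, the noniterative SVD-MMSE method is proposed. The MIMO channel matrix is decomposed based on SVD as,

$$\hat{\mathbf{H}} = \mathbf{U}\mathbf{E}\mathbf{V}^H, \tag{5}$$

where

$$\hat{\mathbf{H}} = \begin{bmatrix} \mathbf{H}^{(1,1)} & \ldots & \mathbf{H}^{(1,T)} \\ \vdots & \vdots & \vdots \\ \mathbf{H}^{(R,1)} & \ldots & \mathbf{H}^{(R,T)} \end{bmatrix}$$

is of order $N' \times N'$ with $N' = NR$ (when $R = T$) [4], $\mathbf{U}$ and $\mathbf{V}$ are unitary matrices of order $N' \times N'$, and $\mathbf{E}$ is the non-negative diagonal matrix of order $N' \times N'$ with the singular values as diagonal elements. The GFDM-modulated symbols at the transmitter are multiplied by the precoding matrix $\mathbf{V}$, which is expressed as,

$$\mathbf{x} = \mathbf{V}\mathbf{d}, \tag{6}$$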


where $\mathbf{d} = [\mathbf{d}^{(0)T} \; \mathbf{d}^{(1)T} \ldots \mathbf{d}^{(T)T}]^T$ is of order $N' \times 1$. After adding a CP of length $N_{cp}$, the GFDM-modulated symbols are transmitted through the MIMO channel. At the receiver (with perfect reception), the received signal is multiplied by $\mathbf{U}^H$ after removal of the CP. In the proposed method, by integrating MMSE equalization into the SVD method, the independent data streams are transmitted across parallel subchannels, and IAI is efficiently cancelled due to the MMSE equalization ($\mathbf{H}_{eq}$). It is given by,

$$\mathbf{H}_{eq} = \left(\mathbf{I}_{N'} \frac{\sigma_w^2}{\sigma_s^2} + \mathbf{E}^H \mathbf{E}\right)^{-1} \mathbf{E}^H, \tag{7}$$

The received GFDM signal after MMSE-based channel equalization is written as,

$$\mathbf{y}_{eq} = \mathbf{H}_{eq}\mathbf{U}^H \mathbf{U}\mathbf{E}\mathbf{V}^H \mathbf{V}\mathbf{d} + \hat{\mathbf{w}}, \tag{8}$$

where $\mathbf{U}^H\mathbf{U} = \mathbf{V}^H\mathbf{V} = \mathbf{I}$, $\hat{\mathbf{w}} = \mathbf{H}_{eq}\mathbf{U}^H\mathbf{w}$ and $\mathbf{w} = [\mathbf{w}^{(0)} \; \mathbf{w}^{(1)} \ldots \mathbf{w}^{(R)}]^T$. Then, the received signals are demodulated based on the MMSE method and demapped to bits. In addition to IAI cancellation, the SVD precoding provides a reduction in PAPR. The block-based structure of GFDM exhibits high PAPR due to overlapped subsymbols and subcarriers. The PAPR for GFDM signals is written as,

$$\mathrm{PAPR} = \frac{\max_{n \in [0, N-1]} |x[n]|^2}{P_a}, \tag{9}$$

where $P_a$ denotes the average power of the GFDM signal. The PAPR is evaluated through the complementary cumulative distribution function (CCDF), which is the probability that the PAPR exceeds a PAPR threshold level ($\mathrm{PAPR}_o$), i.e. $\mathrm{CCDF} = \Pr[\mathrm{PAPR} > \mathrm{PAPR}_o]$.
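The statement that Eq. (8) removes IAI can be checked by expanding it entry by entry. The short derivation below is a sketch under the assumptions already made in this section (perfect channel state information and a square channel matrix). The symbols $e_i$ for the $i$-th singular value on the diagonal of $\mathbf{E}$ and $\rho = \sigma_w^2/\sigma_s^2$ are notation introduced here for illustration and are not part of the original text.

```latex
\begin{aligned}
\mathbf{U}^H\mathbf{y}
  &= \mathbf{U}^H\left(\hat{\mathbf{H}}\mathbf{x} + \mathbf{w}\right)
   = \mathbf{U}^H\mathbf{U}\mathbf{E}\mathbf{V}^H\mathbf{V}\mathbf{d} + \mathbf{U}^H\mathbf{w}
   = \mathbf{E}\mathbf{d} + \mathbf{U}^H\mathbf{w}, \\
\mathbf{H}_{eq}
  &= \left(\rho\,\mathbf{I}_{N'} + \mathbf{E}^H\mathbf{E}\right)^{-1}\mathbf{E}^H
   = \operatorname{diag}\!\left(\frac{e_i}{e_i^2 + \rho}\right), \\
[\mathbf{y}_{eq}]_i
  &= \frac{e_i^2}{e_i^2 + \rho}\, d_i
   + \frac{e_i}{e_i^2 + \rho}\, [\mathbf{U}^H\mathbf{w}]_i .
\end{aligned}
```

Because $\mathbf{E}$ is diagonal and real, each equalized entry depends only on its own entry $d_i$ of the stacked vector $\mathbf{d}$ plus a noise term, so no contribution from the other antenna streams remains. The scaling factor $e_i^2/(e_i^2+\rho)$ approaches 1 when the noise variance is small relative to the symbol variance.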

3 Simulation Results

In this section, the performance of 2 × 2 MIMO GFDM is analysed in terms of PAPR and BER. Monte-Carlo simulations are executed to compare the performance of the proposed system with the conventional system. The simulation parameters are listed in Table 1. In this paper, the channel state information is assumed to be available at both transmitter and receiver. In Fig. 2, the BER performance of the proposed system with 2-QAM modulation for different filter roll-off factors (α = 0.1, 0.5, 0.9) is presented. The proposed method provides better BER performance than the conventional method for all α, which indicates that it significantly cancels IAI. Also, both methods show a degradation in BER due to the increased ICI and ISI at higher filter roll-off factors. The proposed method outperforms the conventional method by approximately 4.5 dB SNR for α = 0.9 at a BER of $10^{-2}$.

Table 1 Simulation parameters

Parameters                       Notations   Value
Number of subcarriers            K           64
Mapping                          –           2-QAM, 4-QAM
Prototype pulse shaping filter   g[n]        Root Raised Cosine (RRC)
Number of subsymbols             M           5
Channel                          –           Multipath Rayleigh fading with 15 taps
Filter roll-off factor           α           0.1, 0.5, 0.9
CP length                        Ncp         16

Fig. 2 BER Comparison (2-QAM) with different filter roll-off factor

Fig. 3 BER Comparison for varying QAM modulation techniques


Fig. 4 PAPR Comparison (2-QAM) for different filter roll-off factor

The BER performance comparison for different modulation techniques, namely 2-QAM and 4-QAM with α = 0.1, is depicted in Fig. 3. The proposed method provides less interference cancellation for higher-order modulation because of the smaller minimum Euclidean distance between constellation points. The SVD-MMSE equalization method reduces the required SNR by about 3 dB and 4.5 dB over the conventional method at a BER of $10^{-2}$ for 4-QAM and 2-QAM modulation, respectively. Figure 4 shows the PAPR performance comparison for different filter roll-off factors (α = 0.1, 0.5, 0.9). It can be inferred that the SVD precoding reduces the PAPR of the system by approximately 2.5 dB for α = 0.9 at a CCDF of $10^{-4}$. The proposed method provides significant IAI cancellation in a noniterative manner irrespective of the SNR level and also reduces PAPR due to SVD-MMSE equalization.
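A PAPR curve such as the one in Fig. 4 is typically obtained by generating many random GFDM blocks, computing the PAPR of each block with Eq. (9), and estimating the CCDF empirically. The sketch below follows Eq. (1) for the modulator but is not the authors' simulator: it uses a single antenna, no SVD precoding, small default values of K and M, 4-QAM symbols, and a rectangular placeholder pulse instead of the RRC filter, all of which are assumptions made to keep the example short.

```python
import cmath
import math
import random


def gfdm_modulate(s, g, K, M):
    """Eq. (1): d[n] = sum_k sum_m g[(n - mK) mod N] * exp(j*2*pi*k*n/K) * s[m][k]."""
    N = K * M
    d = []
    for n in range(N):
        acc = 0j
        for k in range(K):
            carrier = cmath.exp(2j * math.pi * k * n / K)
            for m in range(M):
                acc += g[(n - m * K) % N] * carrier * s[m][k]
        d.append(acc)
    return d


def papr(x):
    """Eq. (9): peak instantaneous power over average power (linear scale)."""
    powers = [abs(v) ** 2 for v in x]
    return max(powers) / (sum(powers) / len(powers))


def ccdf(papr_values_db, threshold_db):
    """Empirical Pr[PAPR > PAPR_o] over a list of per-block PAPR values in dB."""
    return sum(p > threshold_db for p in papr_values_db) / len(papr_values_db)


def simulate(K=8, M=3, blocks=200, seed=0):
    """Return the PAPR in dB of `blocks` random 4-QAM GFDM blocks."""
    rng = random.Random(seed)
    N = K * M
    # Placeholder pulse (rectangular over one subsymbol), NOT the RRC filter.
    g = [1 / math.sqrt(K) if n < K else 0.0 for n in range(N)]
    results = []
    for _ in range(blocks):
        s = [[complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
              for _ in range(K)] for _ in range(M)]
        results.append(10 * math.log10(papr(gfdm_modulate(s, g, K, M))))
    return results


values = simulate()
for threshold in (2.0, 4.0, 6.0):
    print(threshold, ccdf(values, threshold))
```

Reaching a CCDF level of $10^{-4}$ as in Fig. 4 requires far more blocks than the default used here, since at least several multiples of $10^{4}$ blocks are needed for a stable estimate at that probability.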

4 Conclusion

The SVD-MMSE equalization method is proposed in this paper to cancel IAI in the spatially multiplexed MIMO GFDM system. The performance of IAI cancellation is validated through BER for different filter roll-off factors and modulation techniques. The proposed method outperforms the conventional MIMO GFDM system by approximately 4.5 dB at a BER of $10^{-2}$ for α = 0.9. This method cancels the interference in a noniterative manner. Also, it provides a 2.5 dB reduction in PAPR for α = 0.9 at a CCDF of $10^{-4}$ over the conventional system.

Acknowledgements This work was supported by the Science and Engineering Research Board (SERB), Department of Science and Technology (DST), India under Grant No. EEQ/2019/000224.

Inter-Antenna Interference Cancellation in MIMO …

481

References 1. Liang, Q., Durrani, T.S., Gu, X., Koh, J., Li, Y., Wang, X.: IEEE access special section editorial: new waveform design and air-interface for future heterogeneous network towards 5G. IEEE Access 8, 160549–160557 (2020). https://doi.org/10.1109/ACCESS.2020.3019946 2. Kumar, C.R., Nanaji, U., Sharma, S.K., Murthy, M.R.: Performance analysis on IARP, IERP, and ZRP in hybrid routing protocols in MANETS using energy efficient and mobility variation in minimum speed. In: Raju, K.S., Senkerik, R., Lanka, S.P., Rajagopal, V. (eds.) Data Engineering and Communication Technology, vol. 1079, pp 811–824, Springer Singapore (2020) 3. Michailow, N., Matthè, M., Gaspar, I.S., Caldevilla, A.N., Mendes, L.L., Festag, A., Fettweis, G.: Generalized frequency division multiplexing for 5th generation cellular networks. IEEE Trans. Commun. 62(9), 3045–3061 (2014). https://doi.org/10.1109/TCOMM.2014.2345566 4. Öztürk, E., Basar, E., Çirpan, H.A.: Generalized frequency division multiplexing with flexible index modulation. IEEE Access 5, 24727–24746 (2017). https://doi.org/10.1109/ACCESS. 2017.2768401 5. Nivetha, V., Sudha, V.: BER Analysis of GFDM System under Different Pulse Shaping Filters. In: 2021 6th International Conference on Wireless Communications, pp. 53–56. Signal Processing and Networking (WiSPNET), IEEE (2021) 6. Melvasalo, M., Janis, P., Koivunen, V.: MMSE equalizer and chip level inter-antenna interference canceler for HSDPA MIMO systems. In: 2006 IEEE 63rd Vehicular Technology Conference. vol. 4, pp. 2008–2012 (2006). https://doi.org/10.1109/VETECS.2006.1683198 7. Matthè, M., Mendes, L.L., Michailow, N., Zhang, D., Fettweis, G.: Widely linear estimation for space-time-coded GFDM in low-latency applications. IEEE Trans. Commun. 63(11), pp. 4501–4509 (2015). https://doi.org/10.1109/TCOMM.2015.2468228 8. 
Zhang, D., Matthè, M., Mendes, L.L., Fettweis, G.: A study on the link level performance of advanced multicarrier waveforms under MIMO wireless communication channels. IEEE Trans. Wirel. Commun. 16(4), pp. 2350–2365 (2017). https://doi.org/10.1109/TWC.2017.2664820 9. Tunali, N.E., Wu, M., Dick, C., Studer, C.: Linear large-scale MIMO data detection for 5G multi-carrier waveform candidates. In: 2015 49th Asilomar Conference on Signals, Systems and Computers. IEEE, pp. 1149–1153 (2015). https://doi.org/10.1109/ACSSC.2015.7421320 10. Zhang, D., Matthè, M., Mendes, L.L., Fettweis, G.: A Markov chain Monte Carlo algorithm for near-optimum detection of MIMO-GFDM signals. In: 2015 IEEE 26th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC). IEEE, pp. 281–286 (2015). https://doi.org/10.1109/PIMRC.2015.7343310 11. Zhang, D., Mendes, L.L., Matthè, M., Gaspar, I.S., Michailow, N., Fettweis, G.P.: Expectation propagation for near-optimum detection of MIMO-GFDM signals. IEEE Trans. Wirel. Commun. 15(2), 1045–1062 (2016). https://doi.org/10.1109/TWC.2015.2482479 12. Matthe, M., Gaspar, I., Zhang, D., Fettweis, G.: Near-ML detection for MIMO-GFDM. In: 2015 IEEE 82nd Vehicular Technology Conference (VTC2015-Fall). IEEE, pp. 1–2 (2015). https://doi.org/10.1109/VTCFall.2015.7391033 13. Matthe, M., Zhang, D., Fettweis, G.: Iterative detection using MMSE-PIC demapping for MIMO-GFDM systems. In: European Wireless 2016; 22th European Wireless Conference. IEEE, pp. 1–7 (2016) 14. Matthè, M., Zhang, D., Fettweis, G.: Low-complexity iterative MMSE-PIC detection for MIMO-GFDM. IEEE Trans. Commun. 66(4), pp. 1467–1480 (2018). https://doi.org/10.1109/ TCOMM.2017.2782339 15. Matthè, M., Zhang, D., Fettweis, G.: Sphere-decoding aided SIC for MIMO-GFDM: coded performance analysis. In: 2016 International Symposium on Wireless Communication Systems (ISWCS). IEEE, pp. 165–169 (2016). https://doi.org/10.1109/ISWCS.2016.7600894 16. 
Pandey, D., Leib, H.: A Tensor based precoder and receiver for MIMO GFDM systems. In: ICC 2021—IEEE International Conference on Communications. IEEE, pp. 1–6 (2021). https://doi. org/10.1109/ICC42927.2021.9500957

482

N. Vinayakamoorthi and S. Vaiyamalai

17. Guerreiro, J., Dinis, R., Montezuma, P.: Optimum performance of nonlinear MIMO-SVD schemes. In: 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring). IEEE, pp. 1–5 (2021). https://doi.org/10.1109/VTC2021-Spring51267.2021.9449046 18. Ma, M., Huang, X., Jiao, B., Guo, Y.J.: Optimal orthogonal precoding for power leakage suppression in DFT-based systems. IEEE Trans. Commun. 59(3), pp. 844–853 (2011). https:// doi.org/10.1109/TCOMM.2011.121410.100071

Categorization of Thyroid Cancer Sonography Images Using an Amalgamation of Deep Learning Techniques

Naga Sujini Ganne and Sivadi Balakrishna

1 Introduction

In addition to hyperthyroidism, thyroid disorders in general are on the rise in India. Thyroid cancer is thought to be highly dependent on changes in human hormone levels. It is a principal concern of endocrinology and pediatric endocrinology [1]. Goiter (Hashimoto’s thyroiditis) and non-thyroid primary edema are two types of autoimmune hypothyroidism. Autoimmune hyperthyroidism affects 2% of women and 2% of males. Thyroid-stimulating antibodies (TSAb) are agonists that target the thyroid-stimulating hormone receptor (TSHR), causing chronic hyper-stimulation and thyrotoxicosis.

Deep learning (DL) creates a highly efficient classification model by repeatedly training, learning from data, and using feedback to arrive at automated abstraction and to screen classification features. From the literature, it is found that DL algorithms can be applied to large data [2]. With image processing, the efficiency of classification models can be increased by segmenting the images. Combining ultrasound (US) and AI provides more objective, stable, and accurate models and better diagnostic results. One study, in which convolutional neural networks outperformed radiologists, proposed combining two pre-trained CNNs on a large dataset of US thyroid nodule images [3]. The study found that deep learning can detect thyroid lesions. Liu et al. used high-level features derived from CNNs together with custom features. Semantic deep features were constructed and merged with conventional features such as the histogram of oriented gradients and the scale-invariant feature transform. Another study used ultrasonic imaging and artificial intelligence to detect cancerous thyroid nodules [4]. To boost classification performance, the outputs of various CNN models were blended using bagging techniques. In a further study, a DCNN model outperformed expert radiologists in diagnosing thyroid cancer patients. Because deep learning with transfer learning reduces training time, it is useful in medical applications. Comparisons of DL techniques and radiologists for diagnosing thyroid lesions in ultrasonography show that DL can detect thyroid lesions efficiently. An ensemble of two different techniques gives better results than either technique achieves independently.

The paper is organized as follows. Section 2 presents the related work. Section 3 introduces the proposed methodology and the ensemble method. The performance of the proposed algorithms is investigated and evaluated in Sect. 4. Finally, conclusions are presented in Sect. 5.

N. Sujini Ganne (B) · S. Balakrishna
Department of Computer Science and Engineering, Vignan’s Foundation for Science, Technology and Research (Deemed to be University), Vadlamudi, Guntur, Andhra Pradesh 522213, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_42

2 Related Works

This section reviews research on thyroid ultrasound imaging using ensembles of deep learning methods. A CNN model pre-trained on ImageNet data can be transferred to the ultrasound image domain to produce semantic deep features in small-sample situations and thus provide a feasible solution. An ensemble technique that merged two DL models, namely a CNN and a transfer learning (TL) model, was established. The first model, named 5-CNN, is an efficient end-to-end trained model with five convolutional layers, whereas the second is a pre-trained VGG-19 architecture that was enhanced, trained, and repurposed [1]. These models were trained and validated on a dataset of ultrasound images containing four kinds of thyroid images: micro-nodular, autoimmune, normal, and nodular.

Mehrotra [2] proposed an AI-based technology to improve the efficiency of a thyroid nodule classification scheme. Image features are extracted from ultrasound thyroid images in two domains: the spatial domain, based on DL, and the frequency domain, based on the FFT. Using the extracted features, a cascade classifier categorizes the input thyroid images as malignant (positive) or benign (negative) cases.

Yoon [3] built an image analysis model with a DL method and evaluated whether the technique can predict thyroid nodules with benign FNA outcomes. Ultrasonographic images of thyroid nodules with histologic or cytologic outcomes were collected. For training, 1358 thyroid nodule images (688 malignant, 670 benign) were input to an Inception-V3 network, which had been pre-trained on the ImageNet database, to classify nodules as benign or malignant.

Ren [4] examined whether DCNNs have the potential to improve diagnostic performance and the level of inter-observer agreement in the classification of thyroid nodules on histopathological slides. In total, 11,715 fragmented images from the original histological images of 806 patients were divided into a training dataset and a test dataset [5]. Inception-ResNet-v2 and VGG-19 were trained on the training dataset and evaluated on the test dataset to determine the diagnostic performance for distinct histologic types of thyroid nodules. The DCNN models, particularly VGG-19, attained satisfactory accuracy in distinguishing thyroid nodules using histopathology [6]. Analysis of the misdiagnosed cases revealed that normal tissue and adenoma were the most difficult histological types for the DCNN to distinguish, whereas all malignant classes attained outstanding diagnostic performance [7]. The results indicate that DCNN models may have the potential to facilitate histopathological diagnosis of thyroid disease [8].

3 Methodology

The proposed 6-CNN model is a simple, lightweight CNN architecture. The model is trained end to end on the considered dataset, DDTI (thyroid ultrasound images). The proposed architecture is shown in Fig. 1. The model has 14 layers: 6 convolution layers, 6 pooling layers, 1 fully connected layer, and 1 output layer with a softmax activation function [2]. The steps involved in the 6-CNN model are as follows.

Algorithm 1: The 6-CNN model
Input: The training dataset with images of size 224 × 224
Output: 6-CNN model weights
Step 1: For every image in the dataset, normalize the intensity values to the range [0, 1]
Step 2: Add the first convolution layer with the Leaky ReLU activation function
Step 3: Add the second convolution layer with the Leaky ReLU activation function
Step 4: Downsample the feature maps of the previous layers with a max pooling layer
Step 5: Repeat steps 2 and 3
Step 6: Add a flatten layer to turn the outputs of the max pooling layer into a single vector
Step 7: Add a fully connected layer with 128 hidden units
Step 8: Apply dropout to randomly deactivate units
Step 9: Optimize the model with the Adam optimizer at a learning rate of 0.0001
Step 10: Train the model for 100 epochs
Step 11: Finally, save the model for future use

Fig. 1 Proposed 6-CNN end-to-end model
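The first steps of Algorithm 1 can be illustrated with a minimal Python sketch. It assumes min-max scaling for the normalization in Step 1 and a negative slope of 0.01 for Leaky ReLU; neither value is stated in the paper, and the helper names `normalize_image` and `leaky_relu` are hypothetical.

```python
def normalize_image(pixels):
    """Min-max scale a 2-D list of intensities into [0, 1] (assumed scheme)."""
    flat = [value for row in pixels for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Constant image: avoid division by zero and map everything to 0.
        return [[0.0 for _ in row] for row in pixels]
    return [[(value - lo) / (hi - lo) for value in row] for row in pixels]


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: identity for x >= 0, a small linear slope otherwise.

    The slope 0.01 is an assumption; the paper does not give it.
    """
    return x if x >= 0 else negative_slope * x


# Toy 2 x 2 "image" of 8-bit intensities, used only for illustration.
image = [[0, 64], [128, 255]]
scaled = normalize_image(image)

# Leaky ReLU applied to a few hypothetical pre-activation values.
activations = [leaky_relu(v) for v in (-2.0, 0.0, 3.0)]
```

In a full implementation these operations would be provided by the deep learning framework; the sketch only makes the arithmetic of the two steps explicit.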

3.1 VGGNet Model

VGGNet is a rudimentary CNN architecture and is sufficient for a network that is not very deep. The new sort of images should be used as input. As a result, selecting the appropriate kernel size for the convolution mask is hard, and in most cases there is no single optimal size. The input image size is 224 × 224. This model uses 3 × 3 convolution filters. The steps involved in the VGGNet-16 model are as follows.

Algorithm 2: The VGGNet-16 model
Input: The training dataset with images of size 224 × 224
Output: VGGNet-16 model weights
Step 1: For every image in the dataset, normalize the intensity values to the range [0, 1]
Step 2: Load the VGGNet-16 model pre-trained on the ImageNet dataset
Step 3: Remove the last layers of VGGNet-16 and make all remaining layers of the model non-trainable
Step 4: Add a flatten layer to turn the outputs of the max pooling layer into a single vector
Step 5: Add a fully connected layer with 128 hidden units
Step 6: Apply dropout to randomly deactivate units
Step 7: Optimize the model with the Adam optimizer
Step 8: Train the model for 100 epochs
Step 9: Finally, save the model for future use

3.2 CNN-VGGNet-16 Ensemble Method

To improve classification accuracy, a CNN-VGGNet model that fuses the outputs of the 6-CNN and VGGNet-16 models was designed, as shown in Fig. 2.

Fig. 2 CNN-VGGNet-16 ensemble model (the 6-CNN outputs C11 and C12 and the VGGNet-16 outputs C21 and C22 feed the CNN-VGGNet-16 ensemble, which produces C1 and C2)


The final diagnosis decision is taken as the average of the individual class probabilities produced by the 6-CNN and VGGNet-16 models:

$$ C_1 = \mathrm{Aver}(C_{11}, C_{21}) \qquad (1) $$

$$ C_2 = \mathrm{Aver}(C_{12}, C_{22}) \qquad (2) $$

By Jensen's inequality [9], the error of the ensemble model is no greater than the average error of the individual models. The ensemble model reduces the variance and is therefore less sensitive to the particular training data.

The CNN-VGGNet-16 model:
Input: The test data with images of size 224 × 224
Output: Predicted probabilities of both classes (malignant, benign)
Step 1: For every image in the dataset, normalize the intensity values to the range [0, 1]
Step 2: Load the saved 6-CNN model
Step 3: Load the saved VGGNet-16 model
Step 4: Predict the class probabilities C11, C12 with the 6-CNN model
Step 5: Predict the class probabilities C21, C22 with the VGGNet-16 model
Step 6: Compute the averages of the two sets of predicted probabilities (C1, C2)
Step 7: Output the prediction probabilities for diagnosing the classes
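The averaging in Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the probability values are hypothetical, the function name `ensemble_average` is not from the paper, and the class order (malignant first, benign second) is an assumption based on the order in which the classes are listed in the algorithm output.

```python
def ensemble_average(cnn_probs, vgg_probs):
    """Average the per-class probabilities of two models (Eqs. 1 and 2)."""
    if len(cnn_probs) != len(vgg_probs):
        raise ValueError("both models must score the same classes")
    return [(a + b) / 2 for a, b in zip(cnn_probs, vgg_probs)]


# Assumed class order: index 0 = malignant, index 1 = benign.
CLASSES = ("malignant", "benign")

# Hypothetical softmax outputs for one test image (not taken from the paper).
c11, c12 = 0.80, 0.20   # 6-CNN
c21, c22 = 0.60, 0.40   # VGGNet-16

c1, c2 = ensemble_average([c11, c12], [c21, c22])

# The class with the larger averaged probability is the final diagnosis.
predicted = CLASSES[0] if c1 >= c2 else CLASSES[1]
```

Because both inputs are probability vectors that sum to one, the averaged vector also sums to one, so the ensemble output can be read directly as class probabilities.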

3.3 Performance Metrics

The performance metrics used in this paper are specificity (Sp), sensitivity or recall (Se), negative predictive value (NPV), positive predictive value (PPV), and test accuracy.

$$ \mathrm{Sp} = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}} \qquad (3) $$

$$ \mathrm{Se} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \qquad (4) $$

$$ \mathrm{NPV} = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Negatives}} \qquad (5) $$

$$ \mathrm{PPV} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \qquad (6) $$

$$ \mathrm{Accuracy} = \frac{\text{Correctly Classified Cases}}{\text{Total Cases}} \qquad (7) $$

The quality of the classifier is measured using metrics such as the receiver operating characteristic (ROC) curve and the precision/recall (P/R) curve [3]. The micro-averaged area under the ROC curve (ROC-AUC) and the average precision are calculated to assess performance over all diagnosis classes. For each diagnosis class, the true positive, true negative, false positive, and false negative counts were computed separately and then averaged. Here, TP is the number of images correctly classified as normal, FP is the number of images from other classes misclassified as normal, TN is the number of images from other classes that are classified correctly, and FN is the number of normal cases that are misclassified [10].
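As a small worked example of Eqs. (3) to (7), the following Python sketch computes the five metrics from the four confusion counts. The counts used here are hypothetical and are not results from the paper; the function name `classification_metrics` is likewise only illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute Se, Sp, PPV, NPV, and accuracy from confusion counts.

    Assumes every denominator is non-zero, i.e. each class and each
    predicted label occurs at least once.
    """
    total = tp + fp + tn + fn
    return {
        "Sp": tn / (tn + fp),            # Eq. (3)
        "Se": tp / (tp + fn),            # Eq. (4)
        "NPV": tn / (tn + fn),           # Eq. (5)
        "PPV": tp / (tp + fp),           # Eq. (6)
        "Accuracy": (tp + tn) / total,   # Eq. (7)
    }


# Hypothetical counts for one diagnosis class, for illustration only.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
```

With these counts, PPV is 45/50 = 0.90, NPV is 40/50 = 0.80, and the accuracy is 85/100 = 0.85.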

4 Results

The digital database of thyroid ultrasound images is an open-access resource that is available to anyone in the scientific community who wants to use it. IDIME (Instituto de Diagnostico Medico), CIM@LAB, and the Universidad Nacional de Colombia (UNC) have all contributed to the success of this initiative. There are 99 cases in the database and 134 images in total.

4.1 Experimental Setup

The dataset has two classes. It is randomly divided into training and testing sets in the ratio 80% to 20% using a train-validation-test pattern. The experiments were carried out in three ways:
1. Train and evaluate the 6-CNN model
2. Train and evaluate the VGGNet model
3. Evaluate the CNN-VGGNet ensemble method

Experiment 1: 6-CNN model. The 6-CNN model is trained for 100 epochs on the training dataset. The experimental results are listed in Table 1. From the table, the 6-CNN model shows good ability in classifying the classes.

Table 1 Classification results of thyroid images with 6-CNN model
Diagnostic class  Accuracy (%)  ROC-AUC  Se (%)  Sp (%)  PPV (%)  NPV (%)
Malignant         93.72         0.92     92.61   93.67   92.32    91.42
Benign            92.82         0.91     91.89   91.85   91.52    90.72
Avg               93.27         0.92     92.25   92.76   91.92    91.07

Experiment 2: VGGNet-16. The VGGNet-16 model is trained for 100 epochs on the training dataset. The results obtained on the test dataset are listed in Table 2 in terms of accuracy (%), ROC-AUC, Se (%), Sp (%), PPV (%), and NPV (%). This model shows good performance in terms of accuracy and specificity in classifying thyroid images.

Table 2 Classification results of thyroid images with VGGNet model
Diagnostic class  Accuracy (%)  ROC-AUC  Se (%)  Sp (%)  PPV (%)  NPV (%)
Malignant         94.82         0.93     91.75   94.21   92.91    92.85
Benign            93.71         0.91     92.91   93.52   91.72    92.63
Avg               94.26         0.92     92.33   93.86   92.31    92.74

Experiment 3: 6-CNN and VGGNet-16 ensemble model. The ensemble model (CNN-VGGNet) performed well compared with the individual models, scoring 1–4% higher on each performance metric. With this model, thyroid images are classified with an accuracy of 97.32% and 96.95%, a sensitivity of 95.61% and 94.85%, and a specificity of 95.85% and 94.95% for the malignant and benign classes, respectively (Table 3). Figure 3 shows the accuracy curve of the proposed model. Figure 4 shows the loss curve of the proposed model classifier. Table 4 shows the image-level confusion matrix. Figure 5 shows the ROC curve of the proposed model.

Table 3 Classification results of thyroid images with CNN-VGGNet model
Diagnostic class  Accuracy (%)  ROC-AUC  Se (%)  Sp (%)  PPV (%)  NPV (%)
Malignant         97.32         0.96     95.61   95.85   95.15    96.91
Benign            96.95         0.95     94.85   94.95   94.17    96.52
Avg               97.13         0.96     95.23   95.4    94.66    96.71
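The random 80%/20% division described in the experimental setup can be sketched as follows. This is a minimal illustration under stated assumptions: integer identifiers stand in for the 134 images mentioned in the Results section, the seed value and the function name `split_dataset` are hypothetical, and no separate validation subset is carved out because the paper does not give its size.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle the items reproducibly and split them into train and test lists."""
    shuffled = list(items)
    # A dedicated Random instance keeps the split repeatable without
    # touching the global random state. The seed is an arbitrary choice.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for the ultrasound images.
image_ids = list(range(134))
train_ids, test_ids = split_dataset(image_ids)
```

Fixing the seed makes the partition reproducible across runs, which matters when the three experiments are meant to be compared on the same test images.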

Fig. 3 Accuracy curve of proposed model

Fig. 4 Loss curve of proposed model classifier

Table 4 Image-level confusion matrix
           Benign  Malignant
Benign     348     156
Malignant  9       495

Fig. 5 ROC curve of proposed model

5 Conclusions

This research focused on finding the best ensemble model to classify thyroid disorders using a lightweight CNN model and the pre-trained VGGNet-16 model. The combined CNN-VGGNet model shows stable performance in terms of overall test accuracy (97.13%), ROC-AUC (0.96), Se (95.23%), Sp (95.4%), PPV (94.66%), and NPV (96.71%). The comparison results show that the ensemble deep learning-based model has greater applicability than the individual models. Our deep learning model could help endocrinologists by providing a second viewpoint during diagnosis.


References

1. Vasile, C.M., Udriștoiu, A.L., Ghenea, A.E., Popescu, M., Gheonea, C., Niculescu, C.E., Ungureanu, A.M., Udriștoiu, Ș., Drocaș, A.I., Gruionu, L.G., Gruionu, G.: Intelligent diagnosis of thyroid ultrasound imaging using an ensemble of deep learning methods. Medicina 57(4), 395 (2021)
2. Mehrotra, P., McQueen, A., Kolla, S., Johnson, S.J., Richardson, D.L.: Does elastography reduce the need for thyroid FNAs? Clin. Endocrinol. 78(6), 942–949 (2013)
3. Choi, Y.J., Jung, I., Min, S.J., Kim, H.J., Kim, J.H., Kim, S., Park, J.S., et al.: Thyroid nodule with benign cytology: is clinical follow-up enough? PLoS One 8(5), e63834 (2013)
4. Ren, X., Yang, L., Li, Y., Cheshari, E.C., Li, X.: The integration of molecular imprinting and surface-enhanced Raman scattering for highly sensitive detection of lysozyme biomarker aided by density functional theory. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 228, 117764 (2020)
5. Schroyens, N., Alfei, J.M., Schnell, A.E., Luyten, L., Beckers, T.: Limited replicability of drug-induced amnesia after contextual fear memory retrieval in rats. Neurobiol. Learn. Mem. 166, 107105 (2019)
6. Raja, N., Rajinikanth, V., Fernandes, S.L., Satapathy, S.C.: Segmentation of breast thermal images using Kapur’s entropy and hidden Markov random field. J. Med. Imag. Health Inform. 7(8), 1825–1829 (2017)
7. Goodwin, S., McPherson, J.D., McCombie, W.R.: Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Gen. 17(6), 333–351 (2016)
8. Handkiewicz-Junak, D., Czarniecka, A., Jarząb, B.: Molecular prognostic markers in papillary and follicular thyroid cancer: current status and future directions. Mol. Cell. Endocrinol. 322(1–2), 8–28 (2010)
9. Himabindu, G., Ramakrishna Murty, M., et al.: Classification of kidney lesions using bee swarm optimization. Int. J. Eng. Technol. 7(2.33), 1046–1052 (2018)
10. Himabindu, G., Ramakrishna Murty, M., et al.: Extraction of texture features and classification of renal masses from kidney images. Int. J. Eng. Technol. 7(2.33), 1057–1063 (2018)

Enhancement of Sensing Performance in Cognitive Vehicular Networks

K. Jyostna and B. N. Bhandari

1 Introduction

With the growing number of vehicles, millions of people are killed in road accidents around the globe every year. Traffic congestion and air pollution also pose a serious threat, leading to poor quality of life. This has brought into focus the need to improve road safety and traffic efficiency. The vehicular ad hoc network (VANET) is foreseen as a vital application with enormous societal impact. It is a network in which vehicles act as nodes. Cooperative communication among the vehicles can help make the driving experience safer and more comfortable. Hence, the VANET architecture enables vehicles to communicate among themselves (V2V) and with the roadside infrastructure (V2I) to support both safety and non-safety applications [1]. The Federal Communications Commission (FCC) has allocated 75 MHz of spectrum in the 5.9 GHz band to support vehicular communications. However, the allocated spectrum becomes insufficient during peak traffic hours. This aggravates the spectrum scarcity problem in the vehicular environment, and cognitive radio (CR) technology has been proposed as the solution [2]. CR technology allows secondary users (SUs) to exploit the unused frequencies of the primary user (PU) environment so as to increase spectral efficiency and capacity [3]. The SUs monitor the availability of free channels in the PU spectrum through spectrum sensing. The main objective of CR technology is to identify, use, and manage the vacant PU channels (spectrum holes) in order to maximize spectrum utilization, thereby improving the network throughput while mitigating interference to the PU [4]. By implementing opportunistic spectrum access (OSA), CR allows the SUs to switch dynamically between unoccupied spectrum gaps, taking advantage of the locally unused PU spectrum. The SUs in a CR environment go through the following phases of the cognition cycle.

Spectrum Sensing: The primary component of the cognition cycle, which enables the SUs to continuously scan the PU channels and identify the spectrum holes.

Spectrum Decision: This module selects, from the available channels, the best channel for an SU based on its QoS requirements.

Spectrum Sharing: Perfect knowledge of the PU traffic in the different channels makes spectrum sharing convenient. This module implements different channel access and allocation techniques for effective transmissions.

Spectrum Mobility: PUs have higher priority than SUs. When PU activity is detected, the SUs are transferred to a new unused channel to prevent interference.

K. Jyostna (B)
VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India
e-mail: [email protected]

B. N. Bhandari
JNTU College of Engineering, Hyderabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_43

2 Related Literature

Spectrum sensing is the fundamental task that identifies the spectrum holes in the local neighborhood of the CR. It is the ability to detect whether a PU channel is idle or busy. The main objective of a spectrum sensing technique is to detect the channel status reliably. Sensing involves the detection of a PU transmitted signal by an SU. The SUs then access the identified PU channels to transmit their information. Spectrum sensing techniques are broadly categorized into two types [5]. The open issues related to spectrum sensing are discussed in [6, 7].

Non-cooperative Sensing: In this approach, an SU relies solely on its own effort to detect the PU transmissions on a particular spectrum band. This category includes matched filter detection, energy detection [8], and cyclostationary feature detection methods [9].

Cooperative Spectrum Sensing (CSS): Cooperative spectrum sensing is more accurate than individual sensing because the sensing ability of a single SU is affected by fading, shadowing, and noise. The cooperative sensing process employs multiple SUs to perform the sensing task, and the final decision on the channel status is made based on the local sensing results. It can be formulated using a binary hypothesis model, where the hypotheses H0 and H1 denote the absence and presence of the PU signal, respectively. CSS techniques are further classified into three types depending on how the SUs share their spectrum sensing data in the network.

Centralized spectrum sensing: In this method, a fusion center (FC) or controller is employed to assign the channels to the SUs for sensing [10]. The SUs report their local sensing outcomes to the FC through the control channel, where they are combined to make a final decision on the PU channel status. The FC then sends the decision back to the cooperating SUs.

Distributed spectrum sensing: This kind of network architecture does not rely on a fusion center for making the decision. The SUs share the sensing results among themselves within the transmission range. Each SU sends its sensing outcome to the other SUs, combines its own sensing result with the received sensing data, and takes a decision on the channel availability status.

Relay-assisted spectrum sensing: This scheme is assisted by relay nodes, because neither the detection channel nor the control channel used for reporting is perfect. It improves the performance of cooperative sensing.

3 Proposed Work

To detect the unused frequencies of the PU network, each SU performs energy detection-based sensing. The signal transmitted by the PU propagates in free space and reaches the SUs in the network through direct and reflected paths. Along the way, the signal strength is attenuated by fading, and noise is introduced into the signal. The received signal strength indicator (RSSI) is compared against a set threshold to detect PU activity. The block diagram of the energy detection technique is shown in Fig. 1. The received signal is passed through a band-pass filter (BPF) to retain only the frequency components in the permitted range. The signal energy is then calculated and compared with the predefined threshold. The threshold (λ) is set using Eq. (1).

$$ \lambda = d^{-1}\left[\sqrt{2\, n_{\mathrm{Sample}}\, n_{\mathrm{Var}}^{4}}\; Q^{-1}(P_f) + n_{\mathrm{Sample}}\, n_{\mathrm{Var}}^{2}\right] \qquad (1) $$

where d is the distance between the PU and the SU, nSample is the number of samples of the PU signal considered, nVar is the noise variance in the received PU signal, $Q^{-1}$ is the inverse Q-function, and $P_f$ is the probability of false alarm. The calculated RSSI is then compared with the threshold and classified into the hypothesis of PU absence (H0) or PU presence (H1).

Fig. 1 Block diagram of energy detection technique


$$ H_0:\ \mathrm{RSSI} < \lambda \qquad (2) $$

$$ H_1:\ \mathrm{RSSI} > \lambda \qquad (3) $$
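A minimal Python sketch of the threshold in Eq. (1) and the decision in Eqs. (2) and (3) is given below. The reading of Eq. (1) with the square root covering 2 · nSample · nVar⁴ is an assumption, the numerical inputs are hypothetical, and the function names are illustrative. The inverse Q-function is obtained from the standard normal distribution as Q⁻¹(p) = Φ⁻¹(1 − p).

```python
from math import sqrt
from statistics import NormalDist


def q_inverse(p):
    """Inverse of the Gaussian tail function Q(x) = 1 - Phi(x), for 0 < p < 1."""
    return NormalDist().inv_cdf(1.0 - p)


def detection_threshold(distance, n_sample, n_var, p_fa):
    """Threshold of Eq. (1), as read here:
    (sqrt(2 * nSample * nVar^4) * Qinv(Pf) + nSample * nVar^2) / d
    """
    return (sqrt(2 * n_sample * n_var ** 4) * q_inverse(p_fa)
            + n_sample * n_var ** 2) / distance


def sense(rssi, threshold):
    """Return 1 (H1, PU present) if RSSI exceeds the threshold, else 0 (H0).

    Equality is not covered by Eqs. (2) and (3); it is treated as H0 here.
    """
    return 1 if rssi > threshold else 0


# Hypothetical values chosen only for illustration.
lam = detection_threshold(distance=1.0, n_sample=100, n_var=1.0, p_fa=0.1)
decision = sense(rssi=120.0, threshold=lam)
```

With these inputs the threshold is roughly 118.1, so an RSSI of 120 is classified as PU present. Because the threshold scales with 1/d, an SU farther from the PU base station compares against a lower value.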

The result from each node is transmitted to the FC using PSK modulation, as it is least affected by amplitude effects in the environment. The FC employs the OR and majority (K-out-of-N) fusion rules to obtain a global decision on the channel status and sends it to the SUs [11]. According to the majority rule, if more than half of the SUs report the same hypothesis, that hypothesis is taken as the final decision. The OR rule states that if at least one of the SUs reports that PU activity is detected, the global decision is in favor of that SU's result. Consequently, a single SU that detects PU activity incorrectly makes the global decision incorrect. The evaluation is carried out using the probability of detection (Pd), the probability of misdetection (Pm), and the probability of false alarm (Pf). These metrics are computed from the results obtained in the scenario for each value of N using the following equations.

Probability of Detection (Pd): The ability of the receiver to detect the PU status accurately.

$$ P_d = \frac{\text{No. of } (H_0/H_0 \text{ or } H_1/H_1)}{\text{Total number of iterations}} \qquad (4) $$

Probability of Misdetection (Pm): The inability of the receiver to detect the PU status accurately.

$$ P_m = \frac{\text{No. of } (H_0/H_1)}{\text{Total number of iterations}} \qquad (5) $$

Probability of False Alarm (Pf): The ambiguous PU status detected by the SU.

$$ P_f = \frac{\text{No. of } (H_1/H_0)}{\text{Total number of iterations}} \qquad (6) $$
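The two fusion rules and the counting behind Eqs. (4) to (6) can be sketched in Python as follows. The local reports are hypothetical, the function names are illustrative, and the notation "Hi/Hj" is read here as "decided Hi when Hj is true", which is an assumption about the paper's convention.

```python
def or_rule(local_bits):
    """Global decision is 1 if at least one SU reports PU activity."""
    return 1 if any(local_bits) else 0


def majority_rule(local_bits):
    """Global decision is 1 if more than half of the SUs report PU activity."""
    return 1 if sum(local_bits) > len(local_bits) / 2 else 0


def sensing_probabilities(true_states, decisions):
    """Return (Pd, Pm, Pf) as counted in Eqs. (4) to (6)."""
    pairs = list(zip(true_states, decisions))
    n = len(pairs)
    correct = sum(1 for truth, dec in pairs if truth == dec)            # H0/H0 or H1/H1
    missed = sum(1 for truth, dec in pairs if truth == 1 and dec == 0)  # H0/H1
    false_alarm = sum(1 for truth, dec in pairs if truth == 0 and dec == 1)  # H1/H0
    return correct / n, missed / n, false_alarm / n


# Hypothetical scenario: 4 iterations, 3 SUs reporting one bit each.
true_states = [1, 0, 1, 0]
reports = [[1, 0, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]

or_result = sensing_probabilities(true_states, [or_rule(r) for r in reports])
maj_result = sensing_probabilities(true_states, [majority_rule(r) for r in reports])
```

In this toy scenario the OR rule yields a false alarm in the second iteration because one SU reports activity incorrectly, whereas the majority rule suppresses that report but misses the PU in the first iteration.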

The effective use of the threshold together with an appropriate fusion technique for evaluating the sensing performance of cognitive vehicular networks is presented in Fig. 2. The global decision taken by the FC, which reports the available channel band number, is transmitted to the SUs using ASK modulation. The bands allotted to safety and non-safety messages differ: safety messages such as beacons require less bandwidth, while non-safety messages, which include information and entertainment messages, require more bandwidth. The safety and non-safety messages are then transmitted in the allotted band, and their performance is evaluated using the BER. The bit error rate of a transmission is defined as the percentage of bits in the transmission that have errors as a result of noise, interference, or other issues. The safety and non-safety messages are transmitted by an SU and received by a nearby SU. The BER is calculated from the total number of bits transmitted and the number of bits received incorrectly.

$$ \mathrm{BER} = \frac{\text{Number of bits received incorrectly}}{\text{Total number of bits received}} \qquad (7) $$

Fig. 2 Proposed methodology to enhance the sensing performance
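Equation (7) amounts to counting the positions in which the received bit stream differs from the transmitted one. A minimal Python sketch, using hypothetical bit sequences and an illustrative function name, is:

```python
def bit_error_rate(sent_bits, received_bits):
    """BER of Eq. (7): wrongly received bits divided by total bits received."""
    if len(sent_bits) != len(received_bits):
        raise ValueError("bit streams must have the same length")
    if not received_bits:
        raise ValueError("at least one bit is required")
    errors = sum(1 for s, r in zip(sent_bits, received_bits) if s != r)
    return errors / len(received_bits)


# Hypothetical message: 8 bits sent, 2 of them flipped by the channel.
sent = [1, 0, 1, 1, 0, 0, 1, 0]
received = [1, 0, 0, 1, 0, 1, 1, 0]
ber = bit_error_rate(sent, received)
```

Here two of eight bits differ, giving a BER of 0.25, or 25% when expressed as a percentage.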

4 Simulation

The network is simulated for 100 iterations. Hence, 100 samples are obtained from each SU over the simulation. The initial number of nodes is 5, and it is incremented in steps of 5 up to 50. The network area is 2000 m × 2000 m and is used for simulating the varying number of SUs. The SUs move within this scenario for a simulation time of 100 s. The SUs move in the designated directions with an average speed of 40 km/h. In each simulation, the probabilities of detection, misdetection, and false alarm are determined as the number of SUs (N) varies while the remaining parameters are kept constant. The simulation involves three phases, which are explained as follows.

4.1 Phase 1: Creation of Cognitive Vehicular Network Scenario

A cognitive vehicular network environment is created in which the PUs and RSUs are fixed and the SUs are mobile; the number of SUs is varied from 5 to 50. At the beginning of each loop, the number of SUs (N) is set, and the SUs are given movement with a maximum speed of up to 40 km/h at the junction. The PU signal is amplitude modulated and alternates between ON (busy for some period of time) and OFF (idle for the next period of time). The SUs keep moving on the road for 100 iterations with a speed that either increases gradually or remains constant, which resembles a real-time road network. Because the SUs move along the road, the signal transmitted from the PU base station is affected by multipath reflections and attenuation. Hence, the RSSI is not constant, and each SU has its own threshold based on its distance from the PU base station. The SU speed is varied throughout the program as N is varied from 5 to 50 in increments of 5.

4.2 Phase 2: Perform the Spectrum Sensing in the Environment

The performance metrics Pd, Pm, and Pf are evaluated using an energy detection-based sensing method, which involves the following procedure. The SNR is found by calculating the distance between the PU and SU positions, and AWGN is added to the modulated PU signal. The energy of the modulated PU signal is computed and compared against the predefined threshold (checked by the N SUs):

If computed energy > threshold, the PU is present (bit value is assumed to be 1).
Else, the PU is absent (bit value is assumed to be 0).

The bit values (1/0) obtained from the N SUs are modulated using the FSK technique, and the modulated signal reaches the FC, which collects the data from all SUs and retransmits the global decision, along with the allotted vacant band number, to the SUs.

The global decision on channel availability is found by applying the OR and majority fusion techniques: if the data at the FC satisfy a particular rule, the detection count is incremented; otherwise, the misdetection count is incremented. The extent of PU presence is quantified by the probabilities of detection and misdetection, along with the probability of false alarm:

Pd = Final detection value / No. of iterations
Pm = 1 − Pd
Pf = Final false alarm count / No. of iterations

N is then incremented as N = N + k, where k is any value. If N reaches the maximum count, Pd, Pm, and Pf are plotted against the number of SUs; otherwise, the value of N is set again and the process is repeated.
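
The sensing and fusion procedure of Phase 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation: the carrier, the per-SU noise levels, the threshold rule `noise_std ** 2 + 0.25`, and all function names are hypothetical; the FSK reporting link is omitted; and Pd and Pf are normalised by the number of PU-ON and PU-OFF iterations, respectively, which is one possible reading of "No. of iterations".

```python
import numpy as np


def local_decisions(pu_signal, noise_std, thresholds, rng):
    """Energy detection at each SU: bit 1 if measured energy exceeds its threshold."""
    bits = []
    for sigma, thr in zip(noise_std, thresholds):
        received = pu_signal + rng.normal(0.0, sigma, size=pu_signal.shape)  # add AWGN
        energy = np.mean(received ** 2)
        bits.append(1 if energy > thr else 0)
    return np.array(bits)


def fuse_or(bits):
    """OR rule: PU declared present if at least one SU reports 1."""
    return int(bits.sum() >= 1)


def fuse_majority(bits):
    """Majority rule, i.e. K-out-of-N with K = floor(N/2) + 1."""
    return int(bits.sum() >= len(bits) // 2 + 1)


def simulate(n_su, n_iter=100, seed=0):
    """Return Pd, Pm, Pf per fusion rule; n_iter must be at least 2."""
    rng = np.random.default_rng(seed)
    carrier = np.cos(2 * np.pi * 0.05 * np.arange(200))  # hypothetical PU carrier
    rules = {"or": fuse_or, "majority": fuse_majority}
    detections = {name: 0 for name in rules}
    false_alarms = {name: 0 for name in rules}
    n_on = 0
    for it in range(n_iter):
        pu_on = it % 2 == 0                               # PU alternates ON / OFF
        n_on += int(pu_on)
        signal = carrier if pu_on else np.zeros_like(carrier)
        noise_std = rng.uniform(0.5, 1.5, size=n_su)      # stands in for distance-dependent SNR
        thresholds = noise_std ** 2 + 0.25                # assumed per-SU threshold
        bits = local_decisions(signal, noise_std, thresholds, rng)
        for name, rule in rules.items():
            decision = rule(bits)
            if pu_on:
                detections[name] += decision
            else:
                false_alarms[name] += decision
    n_off = n_iter - n_on
    results = {}
    for name in rules:
        pd = detections[name] / n_on
        results[name] = {"Pd": pd, "Pm": 1 - pd, "Pf": false_alarms[name] / n_off}
    return results


results = simulate(n_su=10)
```

Calling `simulate` for N = 5, 10, …, 50 and plotting the returned values against N would mirror the loop described above.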

4.3 Phase 3: Access the Sensed Channel to Transmit Different Types of Messages

The sensed channel is used to transmit the safety and non-safety messages of secondary vehicular users. The average BER of the messages transmitted in the channel is calculated and plotted for different signal-to-noise ratio values.

5 Results

The performance of the proposed methodology is evaluated for the network scenario shown in Fig. 3. The blue dots represent the SUs, and the triangle represents the PU base station. The movement of nodes can be interpreted from Fig. 3, and at each instant, the RSSI is calculated at every SU.

Fig. 3 Network scenario

Figure 4 shows how Pd, Pm, and Pf vary as the number of secondary users increases. It is observed from the graph that the detection probability is higher when the majority rule is applied at the FC than when the OR rule is applied. From the second plot, we can see that the misdetection is higher for the K-out-of-N rule than for the OR rule. The higher the detection performance, the lower the misdetection. Lastly, it is observed that the probability of false alarm is higher for the OR rule than for the K-out-of-N rule applied at the FC. From Fig. 5, we can observe that the BER, which varies with SNR, is higher for non-safety messages than for safety messages; this is concluded by averaging the BERs over the varied SNR values.

Fig. 4 Performance metrics

Fig. 5 BER versus SNR for critical and non-critical messages

6 Conclusion

Cooperative sensing is an effective technique for improving detection performance by exploiting unused TV white spaces. In this paper, spectrum scarcity, which is a major challenge for the effective implementation of vehicular networks, is addressed. In dense vehicular regions, the allotted 75 MHz bandwidth is insufficient and is utilized for transmitting high-priority messages. This is addressed by employing the cooperative spectrum sensing concept of the cognitive cycle in vehicular networks. The detection performance of various fusion rules is calculated and compared by varying the number of vehicles participating in spectrum sensing. Finally, it is concluded that the K-out-of-N rule has the maximum detection probability and the minimum false alarm and misdetection probabilities, which is highly desirable in a real-world vehicular environment.

Evaluating Ensemble Through Extracting the Best Information from Environment in Mobile Sensor Network

F. Asma Begum and Vijay Prakash Singh

F. Asma Begum (B) · V. P. Singh, Department of ECE, Sri Satya Sai University of Technology and Medical Sciences, Sehore, India. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_44

1 Introduction

A wireless sensor network is made up of several sensor nodes that interact with one another over wireless links. To detect environmental conditions, a sensor node can be equipped with a number of sensors, including thermal, mechanical, magnetic, and optical sensors. These sensors sense and measure information from the surrounding environment and transfer that information to the microcontroller. In applications such as video surveillance [15], where numerous cameras are mounted to supervise vehicles and individuals in an area, the sampling rate of one camera is increased by reducing the sampling rates of the other cameras. The specific issue is oversampling [2]. Sensor nodes have limited memory and are typically placed in remote locations, where they are tasked with sensing, collecting, processing, and transmitting data to the base station [11]. Sensor nodes are mostly powered by rechargeable batteries. A WSN is a collection of a large number of battery-powered, simple sensing devices, commonly referred to as sensors, that collaborate to execute a distributed sensing task in a domain. Each sensor has sensing hardware corresponding to its type, a central processing unit, an electrical power supply unit, and a radio transceiver. Wireless sensor networks assist us in sensing our surroundings and obtaining information about observable, naturally occurring phenomena. Since sensors send sampled estimates of the environment to the server, the sampling rate influences the quality of service (QoS) of a WSN [3]. Sensor networks, also known as wireless sensor networks (WSNs), are networks of self-organizing devices [6] that essentially support sensors used to monitor physical or atmospheric conditions. If the sampling rate is too low, the sensor estimates may miss a great deal of data. If the sampling rate is set too high, the estimates may be more accurate, but at the cost of increased computational power, which shortens the lifetime of the sensors [4].

2 Literature Survey

Qian et al. [9] note that environmental calamities such as flash floods are growing more common, putting a greater strain on human society. They are frequently unpredictable, develop quickly, and span wide geographic areas. Better monitoring, for example with mobile sensing systems that provide early and accurate information to first responders and the general public, can reduce the repercussions of such disasters. The work proposes a real-time sensor task scheduling method tailored to the characteristics and requirements of city-scale environmental monitoring. The method uses a forward search and relies on the predictions of a distributed parameter system that simulates flash flood propagation. It inherits part of the causal relationship of a search tree that defines all possible sequential decisions. The computationally intensive data assimilation phases of the forward search tree are replaced by functions based on the covariance matrix between observation sets.

Younis et al. [16] describe wireless sensor networks (WSNs) as being made up of a large number of sensor nodes with sensing, wireless communication, and computation capabilities that are densely deployed in a monitoring region. The notion of a mobile agent has recently been employed in WSNs to reduce energy usage and improve data collection. The primary role of a WSN is to gather and transmit data from sensor nodes, and the basic purpose of data aggregation is to capture and combine data efficiently. Finding the best itinerary for the mobile agent is a key stage in data collection.

Ibrahim et al. [4] observe that WSNs are an important component of the networking field. They are becoming increasingly significant for a variety of applications due to their low cost, high efficiency, and small size. With the growing use of applications that rely on WSNs, however, limitations such as memory and computation continue to exist. Since the rise of the Internet of Things, which relies on the effectiveness of WSNs, there has been a need to identify effective solutions. The study surveys the most recent research on this topic for the years 2018–2020.

Fan [8] proposes an optimization methodology for WSN design that determines the appropriate set of protocols and number of nodes. The thesis proposes a genetic algorithm (GA)-based sensor network design tool (SNDT) for WSN performance, taking into account application-specific needs, deployment constraints, and energy characteristics. SNDT uses offline simulation analysis to aid in resolving design decisions. The optimization tool of the suggested system is a GA, and an appropriate fitness function is created to take into account several elements of network performance. The communication protocol and the number of nodes deployed in a specific region are among the configuration attributes optimized by SNDT.

Xue et al. [20–24] suggest a new secure localization algorithm for dynamic sensor networks, called frequency modulation secure localization (FMSL), which uses the FM signal and the Monte Carlo approach to solve the problem of secure localization in dynamic sensor networks. The algorithm locates randomly dispersed nodes equipped with FM signal receiving modules using FM signals with broad coverage, filters out some malicious attack anchors in the network, and employs an updated Monte Carlo algorithm to locate the nodes and increase positioning accuracy. In addition, the energy consumption of the network can be approximated based on the moving path and the localization time, as well as the sleep scheduling technique for the localized node.

3 Research Methodology

The experiment was carried out using the models mentioned above, and an adaptive sampling strategy is introduced.

3.1 Compressive Sensing

The compressive sensing algorithm is illustrated in Algorithm 1. The specified steps are carried out repeatedly.

Algorithm 1
i. Sampling and reconstructing signals
ii. Sparsity analysis
iii. Adaptive movement
iv. Image restoration
v. Repeat steps i, ii, iii, and iv

3.2 Adaptive Sampling Method

This section introduces an adaptive sparse sampling [1] approach that uses mobile sensors. Whereas compressive sensing gathers random measurements without considering the influence between measurements, the proposed algorithm always gathers the most informative measurement given all of the current measurements [7]. The measurements are determined and gathered sequentially by mobile sensors to increase the consistency of the sensing field.

3.3 Sampling and Reconstructions

The sampling and reconstruction methods used have been optimized [23], since it is not possible to cover the whole field using random sampling. The measurement matrix is laid out as follows.

i. Reshape the matrix in order to construct a vector
ii. Compile the total number of observations
iii. Create a measurement matrix based on the measurement prediction

Since all of the samples have been optimized, a greatly improved reconstruction performance is possible.
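
The three layout steps above can be sketched in Python with NumPy. This is only one plausible reading: it assumes that each observation samples a single pixel of the field, so the measurement matrix is a row-selection matrix. The function name `build_measurement_matrix`, the toy field, and the sample positions are hypothetical.

```python
import numpy as np


def build_measurement_matrix(field_shape, sample_positions):
    """Return a row-selection matrix Phi with one row per observation."""
    n_pixels = field_shape[0] * field_shape[1]         # step i: length of the reshaped vector
    rows_cols = tuple(np.asarray(sample_positions).T)  # (row indices, column indices)
    flat_idx = np.ravel_multi_index(rows_cols, field_shape)
    m = flat_idx.size                                  # step ii: total number of observations
    phi = np.zeros((m, n_pixels))                      # step iii: measurement matrix
    phi[np.arange(m), flat_idx] = 1.0                  # each row picks one sampled pixel
    return phi


field = np.arange(16, dtype=float).reshape(4, 4)       # toy 4 x 4 sensing field (assumed)
positions = [(0, 1), (2, 3), (3, 0)]                   # hypothetical sampled (row, col) points
phi = build_measurement_matrix(field.shape, positions)
y = phi @ field.reshape(-1)                            # observations Y = Phi X
```

With this layout, `y` simply holds the field values at the sampled positions; a measurement matrix built from predicted informative locations would replace `positions` with the locations chosen by the adaptive scheme.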

3.4 Error Correction

As the mobile sensor travels, parts of the reconstructed image are created, and motion errors [18] become part of the reconstruction. Two or more partial images are combined by comparing their overlapping sections, with location and alignment parameters indicating where each image should be placed.

4 Working Procedure

The target sensing field to be reconstructed is denoted by X; it is the source from which sensor data are collected. Down-sampling X with mobile sensors according to the sensing requirements yields a more concise set of measurements, denoted by Y. The estimate is built using the relation Y = ΦX, which maps the target signal to a lower dimension; since the signal size has been reduced, each measurement must now contain more information. A mobile sensor network of n_v sensors, denoted by v_i, i = 1, …, n_v, collaborates to accumulate the most informative measurements and recover an unknown sensing field. The sensors form a connected graph, denoted by G. As the networked sensors move, the graph changes dynamically [10, 12, 13, 17]. The sensors are placed in various locations and work in parallel without synchronization; collaboration takes place when a sensor detects another sensor with more information [14]. At time instant t, sensor v_i at location p_i^t is ready to determine and collect a new measurement y_i^{t+1}. Given all of the available information, including its own sensed data and data from other sensors, this measurement should be the most informative one. Measurement determination with network collaboration is illustrated in Fig. 1, in which v_i, i = 1, …, 4 denote networked sensors and sensor v1 is determining a new measurement. Because the sensors detect each other and share information, collaboration affects the effectiveness of the network.

Fig. 1 Illustrative example of informative measurement collection with mobile sensor networks

Each sensor holds information about its nearby sensors [5]; based on the information communicated, the estimated changing trend of the mobile sensing field is updated. The newly determined measurement is the most informative one and should contain more information than the previously determined one.

4.1 Performance of the Proposed Method and Analysis

The suggested adaptive sampling is a feedback-driven approach that uses prior measurements [19] to direct mobile sensors in collecting new data. Each mobile sensor has an image sensor that can collect invariant features and pinpoint the location of other sensors. In addition, the mobile sensor samples and reconstructs the unknown region block by block while simultaneously localizing itself in its surroundings. Each sensor in the network runs the algorithm illustrated in Algorithm 2 (Fig. 2).

• The feedback loop begins with the measurement set Y_i^total, which is created by combining Y_i^{t−1}, shared information, and newly generated measurements from prior loops.
• In block A, sensors form a network and fuse data, resulting in the measurement set Y_i^t, which can be used to determine additional measurements.
• Given Y_i^t, block B selects a collection of measurements Y_{i,1}^{t+1}, …, Y_{i,N_I}^{t+1} as the most informative, where N_I is the number of informative measurements.
• In block C (network motion and measurement collection), the newly determined measurements are re-evaluated, and one of them is collected and represented as Y_i^{t−1}.

[Fig. 2 block diagram. Blocks: Initial measurements; Measurement set Y_i^total; A. Data sharing and sensor fusion; B. Measurement determination; C. Network motion and measurement collection; D. Partial signal reconstruction]

Fig. 2 Adaptive sampling algorithm

4.2 Mobile Sensor Network and Adaptive Sampling Data Set Used in the Experiment

The adaptive sampling algorithm described above is based on compressive sensing theory, which uses incomplete data to reconstruct target signals. The basic compressive sensing equation is

\[ Y = \Phi X = \Phi \psi^{-1} S \tag{1} \]

where
Y = observations
Φ = similarity function
X = reconstructed target signal
S = sparse signal of dimension N × 1 and sparsity K
ψ = sparse domain

Using the above equation, the signal dimension is lowered, so each measurement contains more information. A mobile sensor network of n_v sensors, denoted by v_i, where i = 1, …, n_v, is used to collect the information and recover an unknown sensing field. For measurement determination and signal reconstruction, the compressive sensing method is applied at each sensor v_i in the mobile sensor network. The target signal at time t is denoted by X_i^t rather than X, and the corresponding observations by Y_i^t, producing

\[ Y_i^t = \Phi_i^t X_i^t = \Phi_i^t \psi^{-1} S_i^t \tag{2} \]

where
Y_i^t = observations at time t
Φ_i^t = similarity function at time t
X_i^t = reconstructed target signal at time t
S_i^t = sparse signal of dimension N × 1 and sparsity K, at time t
ψ = sparse domain

In practice, however, measurement noise should be considered, so the resulting equation is

\[ Y_i^t = \Phi_i^t X_i^t + N_o = \Phi_i^t \psi^{-1} S_i^t + N_o \tag{3} \]

where N_o = measurement noise
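
The measurement model of Eqs. (1)–(3) can be made concrete with a small numerical sketch. The choices below are assumptions made for illustration and are not stated in the paper: an orthonormal DCT basis as the sparse domain ψ, a random Gaussian matrix as Φ, orthogonal matching pursuit (OMP) as the recovery method, and toy sizes N = 64, M = 40, K = 4.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix psi, so that S = psi @ X and X = psi.T @ S."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    psi = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    psi[0, :] /= np.sqrt(2.0)
    return psi


def omp(a, y, sparsity):
    """Orthogonal matching pursuit: greedy K-sparse solution of y ~ a @ s."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        correlations = np.abs(a.T @ residual)
        if support:
            correlations[support] = 0.0            # do not pick the same atom twice
        support.append(int(np.argmax(correlations)))
        coef, *_ = np.linalg.lstsq(a[:, support], y, rcond=None)
        residual = y - a[:, support] @ coef
    s = np.zeros(a.shape[1])
    s[support] = coef
    return s


rng = np.random.default_rng(1)
n, m, k = 64, 40, 4                                # signal length, measurements, sparsity
psi = dct_matrix(n)
s_true = np.zeros(n)
s_true[rng.choice(n, size=k, replace=False)] = rng.uniform(1.0, 2.0, size=k)
x_true = psi.T @ s_true                            # X = psi^{-1} S (psi is orthonormal)
phi = rng.normal(size=(m, n)) / np.sqrt(m)         # assumed random measurement matrix
noise = 1e-3 * rng.normal(size=m)                  # N_o
y = phi @ x_true + noise                           # Eq. (3)
s_hat = omp(phi @ psi.T, y, k)                     # recover S from Y = Phi psi^{-1} S + N_o
x_hat = psi.T @ s_hat
rel_error = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

The sketch uses fewer measurements than signal samples (M < N), which is the dimension reduction the text refers to; recovery relies on the signal being K-sparse in ψ.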

5 Implementation

The adaptive sampling strategy introduced here improves the results to a considerable extent. When gathering measurements in a mobile sensor network, sensors operate in parallel without timing synchronization. The informative measurements are illustrated in the figures, where v_i denotes the networked sensors. To collect the most informative measurements and reconstruct an unknown sensing field, a mobile sensor network of ten sensors is deployed in the sensing field in this simulation. Although no single sensor in the mobile sensor network covers the entire sensing field, together the sensors provide complete coverage with some overlaps. Because measurements are concentrated, each sensor can reconstruct its own coverage of the sensing field in real time with high fidelity. Figure 3a shows a scalar field with a resolution of 256 × 256. Initial measurements are taken at random from 2% of the field, which is 256² × 2% ≈ 1311 measurements. Figure 3b shows all of the data collected by sensor v1, which covers only a small part of the sensing field, and Fig. 3c, d shows the measurement distribution of sensors v2 and v3. At time t, A_RB(i)^(t) is 32 × 32, with sensor v_i at the centre. Figure 4a shows the reconstruction error of each sensing field pixel, where the error of each pixel is determined by $\|\hat{x}_k - x_k\|^2 / \|X\|^2 \times 256^2$, k = 1, …, 256². Figure 4b shows the reconstruction performance.
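
The arithmetic in this section can be checked with a short sketch. It assumes the per-pixel error is the squared pixel difference normalised by the squared norm of the field and scaled by the pixel count; the function name `per_pixel_error` and the toy arrays are hypothetical.

```python
import numpy as np

side = 256
n_pixels = side ** 2                           # 65,536 pixels in the 256 x 256 field
n_initial = round(n_pixels * 0.02)             # 2% initial random measurements


def per_pixel_error(x_hat, x):
    """Squared error of each pixel, normalised by ||X||^2 and scaled by the pixel count."""
    x_hat = np.asarray(x_hat, dtype=float).ravel()
    x = np.asarray(x, dtype=float).ravel()
    return (x_hat - x) ** 2 / np.sum(x ** 2) * x.size


# Toy check (assumed data): a uniform field reconstructed with a constant offset of 0.1.
truth = np.ones((4, 4))
estimate = truth + 0.1
errors = per_pixel_error(estimate, truth)
```

Under this definition, the mean of `errors` equals the overall relative squared error ‖x̂ − x‖² / ‖X‖² of the reconstruction.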

a. 2-D scalar field.

b. Measurement of v1.

c. Measurement of v2.

d. Measurement of v3.

Fig. 3 Scalar field and measurement distribution

(a)Sensing field reconstruction with error 0.1029

(b)Errors for each pixel

Fig. 4 Reconstruction performance with 8000 measurements

6 Conclusion

An adaptive sampling mobile sensor network is presented for reconstructing an unknown sensing field. The importance of network collaboration has been stressed, as it results in a more reliable measurement determination. Compared with a single sensor, networked sensors have shown greater efficiency. Down-sampling and compressive sensing reduce the number of measurements taken by mobile sensors, but the trade-off is that the total quantity of information obtained from the collected measurements is reduced as well. It is therefore necessary for each measurement to carry as much information as feasible.

References

1. Wood, A.L., Merrett, G.V., Gunn, S.R., Al-Hashimi, B.M., Shadbolt, N.R., Hall, W.: Adaptive sampling in context-aware systems: a machine learning approach. In: IET Conference on Wireless Sensor Systems, Sep 2012, IEEE Xplore 05 (2012)
2. Chatterjea, S., Havinga, P.J.M.: An adaptive and autonomous sensor sampling frequency control scheme for energy-efficient data acquisition in wireless sensor networks. In: 4th IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS), vol. 5067, pp. 60–78 (2008)
3. Deng, K., Moore, A.W., Nechyba, M.C.: Learning to recognize time series: combining ARMA models with memory-based learning. In: IEEE Symposium on Computational Intelligence in Robotics and Automation, vol. 23, issue 6, pp. 246–250 (2004)
4. Ibrahim, D.S., Mahdi, A.F., Yas, Q.M.: Challenges and issues for wireless sensor networks: a survey. J. Glob. Sci. Res. 6(1), 1079–1097 (2021)
5. Deligiannakis, A., Kotidis, Y.: Exploiting Spatio-Temporal Correlations for Data Processing in Sensor Networks, vol. 4540, pp. 45–65. Springer, Berlin Heidelberg (2008)
6. Drira, W., Ahn, K., Rakha, H., Filali, F.: Development and testing of a 3G/LTE adaptive data collection system in vehicular networks. IEEE Trans. Intell. Trans. Syst. 17(1), 240–249 (2016)
7. Gedik, B., Liu, L., Yu, P.S.: ASAP: an adaptive sampling approach to data collection in sensor networks. IEEE Trans. Parallel Distrib. Syst. 18(12), 1766–1783 (2007)
8. Fan, J.: Using genetic algorithms to optimize Wireless Sensor Network design. Loughborough University (2020)
9. Qian, K., Claudel, C.: Real-time mobile sensor management framework for city-scale environmental monitoring. J. Comput. Sci. 45 (2020)
10. K-means Clustering. http://en.wikipedia.org/wiki/K-means_clustering
11. Lim, C.L.: An approach to understand network challenges of wireless sensor network in real-world environments. Ph.D. thesis, University of Glasgow (2019)
12. Masoum, A., Meratnia, N., Havinga, P.J.: Quality aware decentralized adaptive sampling strategy in wireless sensor networks. In: The 9th IEEE International Conference on Ubiquitous Intelligence and Computing, pp. 298–305 (2012)
13. Madden, S., Franklin, M.J., Hellerstein, J.M., Hong, W.: The design of an acquisitional query processor for sensor networks. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 491–502 (2003)
14. Mainland, G., Parkes, D.C., Welsh, M.: Decentralized, adaptive resource allocation for sensor networks. In: Proceedings of the 2nd USENIX/ACM Symposium on Networked Systems Design and Implementation, pp. 315–28 (2005)
15. Eilertson, M.D.: Simulating a Mobile Wireless Sensor Network Monitoring the Air Force Marathon, p. 34988. Air Force Institute of Technology (2021)
16. Younis, M., Alzarroug, M., Jeberson, W.: Data Aggregation Scheme Using Multiple Mobile Agents in Wireless Sensor Network, p. 93587. Intech Open Book Series (2020)
17. Nkwogu, D.N., Allen, A.R.: Adaptive sampling for WSAN control applications using artificial neural networks. J. Sensor Actuator Netw. 1(3), 299–320 (2012)
18. Padhy, P., Dash, R.K., Martinez, K., Jennings, N.R.: A utility-based adaptive sensing and multi-hop communication protocol for wireless sensor networks. ACM Trans. Sensor Netw. (TOSN) 6(3), 1–39 (2010)
19. Prabha, R., Ramesh, M.V., Rangan, V.P., Ushakumari, P.V., Hemalatha, T.: Energy efficient data acquisition techniques using context aware sensing for landslide monitoring systems. IEEE Sens. J. 17(18), 6006–6018 (2017)
20. Raghunathan, V., Ganeriwal, S., Srivastava, M.: Emerging techniques for long lived wireless sensor networks. IEEE Commun. Mag. 44(4), 108–114 (2006)
21. Tatbul, N., Cetintemel, U., Zdonik, S., Cherniack, M., Stonebraker, M.: Load shedding in data streams. In: 29th International Conference on Very Large Data Bases (VLDB), vol. 29, issue 3, pp. 309–320 (2003)
22. Rault, T., Bouabdallah, A., Challal, Y.: Energy efficiency in wireless sensor networks: a top-down survey. 67, 104–122 (2014)
23. Wood, A.L., Merrett, G.V., Gunn, S.R., Al-Hashimi, B.M., Shadbolt, N.R., Hall, W.: Adaptive sampling in context-aware systems: a machine learning approach. IET Wirel. Sens. Syst. 5 (2012)
24. Xue, W.C., Peng, B., Wang, S.H., Hua, Y.: A type of energy-efficient secure localization algorithm FM based in dynamic sensor networks. EURASIP J. Wirel. Commun. Netw. 39 (2020)

Design and Analysis of Monopulse Comparator for Tracking Applications

Srilakshmi Aouthu and Narra Dhanalakshmi

1 Introduction

The monopulse comparator is a feed network used in modern radio frequency applications such as radars, satellites, and missiles to find direction by calculating the direction of arrival from difference signals [1]. The monopulse comparator network consists of four hybrid couplers, such that the network has four input ports and four output ports. It generates a sum signal and two difference signals, which are used to calculate the direction of arrival.

1.1 Branch-Line Hybrid Couplers

The hybrid branch-line coupler consists of one input port, two output ports, and one isolated port. The outputs at the two output ports are equal in magnitude, with a phase shift of 90° between them. It is a reciprocal device, so the input and output ports can be interchanged [2]. The important parameters of any hybrid coupler are the coupling factor, the isolation between the main branch and the secondary branch, and the directivity. Figure 1 shows a microstrip-type hybrid coupler with four ports. The separation between opposite branch lines is a quarter wavelength. The width and length of the branches are designed using Eqs. (1)–(5).

S. Aouthu (B), Department of ECE, Vasavi College of Engineering, Hyderabad, India. N. Dhanalakshmi, Department of ECE, VNRVJIET, Hyderabad, India. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_45

Fig. 1 Conventional branch-line coupler

Fig. 2 Modified branch-line coupler with “U”-shaped bent line

The monopulse comparator requires two difference channel outputs, with 0° and 180° phase shifts, respectively. The conventional branch-line coupler is modified by adding a bent line, which provides an extra 90° phase shift and is shaped as a “U” to keep the circuit compact, as shown in Fig. 2 [3]. Quarter-wave transmission lines are used in the design of the branch-line coupler. Two sections of the coupler have an impedance equal to the characteristic impedance Z0, whereas the other two sections have an impedance equal to Z0/√2. The ports are fed by transmission lines with an impedance of Z0. The bandwidth of the coupler is calculated as the frequency range over which the phase shift is within ±10° of the required phase shift [4]. Insertion loss is the additional loss due to insertion of the component, whereas the coupling factor is defined as the power coupled to the secondary branch line from the primary branch line in the forward direction. Isolation is defined as the amount of leakage power coupled to the secondary port in the backward direction [5]. The impedance bandwidth required for the couplers is in the range of 13:1, the coupling factor is 3 dB, and the isolation is around 20 dB.

1.2 Analysis of Hybrid Coupler

An even–odd mode approach may be used to analyze the hybrid coupler. This approach is applicable because the coupler is a linear device. Linearity permits superposition to be used to divide an input source into a sum of even (in-phase) and odd (out-of-phase) sources [6]. This split preserves the coupler’s symmetry, which simplifies the analysis. After splitting the source, each mode may be evaluated independently, and the sum of the results yields the coupler’s complete response. To begin the analysis, the coupler is considered as a network of transmission lines terminated with the normalized characteristic impedance Z0 at the output ports, with the circuit schematic shown in Fig. 3. To find the S parameters of the coupler, the circuit is excited at one port while all other ports are terminated with matched loads. Port 1 is chosen arbitrarily as the excitation port. The incident voltage V1 is applied at port 1, and the reflected voltages at the ports are represented by V2, V3, and V4. This excitation can be split into its even and odd modes, as shown in Fig. 4a and b, respectively.

Fig. 3 Hybrid coupler circuit schematic with input and output voltages

Fig. 4 a Even mode representation of a hybrid coupler. b Odd mode representation of a hybrid coupler
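
The even–odd mode analysis can be checked numerically with a short sketch. It follows the standard textbook treatment of an ideal, lossless branch-line coupler with normalized impedances (series arms Z0/√2, shunt arms Z0, all a quarter wavelength long at the design frequency); it does not model the microstrip layout or the "U"-shaped bend of this paper, and the function names are hypothetical.

```python
import numpy as np


def abcd_shunt(y):
    """ABCD matrix of a shunt admittance y (normalized to 1/Z0)."""
    return np.array([[1, 0], [y, 1]], dtype=complex)


def abcd_line(z, theta):
    """ABCD matrix of a lossless line with normalized impedance z and electrical length theta."""
    return np.array([[np.cos(theta), 1j * z * np.sin(theta)],
                     [1j * np.sin(theta) / z, np.cos(theta)]], dtype=complex)


def gamma_and_t(abcd):
    """Reflection and transmission coefficients of a two-port between matched loads."""
    (a, b), (c, d) = abcd
    denom = a + b + c + d
    return (a + b - c - d) / denom, 2 / denom


def branch_line_response(freq_ratio=1.0):
    """Amplitudes emerging from ports 1..4 for unit excitation at port 1; freq_ratio = f / f0."""
    quarter = np.pi / 2 * freq_ratio
    eighth = quarter / 2
    series = abcd_line(1 / np.sqrt(2), quarter)
    y_even = 1j * np.tan(eighth)      # bisected shunt arm, open-circuited (even mode)
    y_odd = -1j / np.tan(eighth)      # bisected shunt arm, short-circuited (odd mode)
    g_e, t_e = gamma_and_t(abcd_shunt(y_even) @ series @ abcd_shunt(y_even))
    g_o, t_o = gamma_and_t(abcd_shunt(y_odd) @ series @ abcd_shunt(y_odd))
    # Superpose the two modes, each driven with half the incident amplitude.
    return np.array([(g_e + g_o) / 2,   # port 1 (input, reflected)
                     (t_e + t_o) / 2,   # port 2 (through)
                     (t_e - t_o) / 2,   # port 3 (coupled)
                     (g_e - g_o) / 2])  # port 4 (isolated)


b = branch_line_response()
```

At the design frequency this gives a matched input, an isolated port 4, and two equal-magnitude outputs in phase quadrature, which is the behaviour described in Sect. 1.1; passing a `freq_ratio` other than 1 shows how the response degrades off the design frequency.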

2 Design of Monopulse Comparator The proposed microstrip-type amplitude comparison monopulse comparator is aimed to design at center frequency of 17 GHz with substrate RT Duroid 5880 with relative permittivity of 2.2, height of 1.575 mm, bandwidth of 10%, coupling factor of 0– 5 dB, isolation of 15 dB, the impedance of ports 50 ohms. The monopulse comparator consists of four hybrid couplers and four 90° phase shifters as “U”-shaped bend. Each hybrid coupler is consisting of two sets of quarter-wave transformers with impedances of 50 and 35.4 ohms. The width and length of microstrip-type quarterwave transformers are calculated at design frequency and are tabulated in Table 1. The analytical studies are carried out using CST Microwave Studio. The phase velocity through the microstrip lines can be found using Eq. 1. 1 = 1.96 × 108 m/s Vph = √ μ∈

(1)

The length of branch-line coupler is given in Eq. 2. Length = 3λ/4 + Z 0

(2)

where Z0 is the width of the microstrip line. The length calculated using Eq. 2 is 9.31 mm for the 50 Ω line, whose width is 0.4 mm. For the 111.72 Ω line, the width is 1.1 mm and Height = L/4 + 2Z0 = 4.56 mm. The ratio w/d [7] is then given by:

$$\frac{w}{d} = \frac{8e^{A}}{e^{2A} - 2} \qquad (3)$$

The length of the hybrid is

$$L = \frac{90 \cdot \pi / 180}{\sqrt{\varepsilon_r}\, k} \qquad (4)$$

where the wave number [8] is

$$k = \frac{2\pi f}{c} \qquad (5)$$

The characteristic impedance of a narrow line is

$$Z_0 = \frac{60}{\sqrt{\varepsilon_{reff}}} \ln\left(\frac{8h}{w} + \frac{w}{4h}\right) \quad \text{for } w \le h$$

Table 1 Calculated widths and length of branch lines of hybrid coupler at 17 GHz

Parameter                          Value
50 Ω microstrip line width         4.68 mm
35.4 Ω microstrip line width       7.60 mm
Quarter wavelength                 3.13 mm

Design and Analysis of Monopulse Comparator for Tracking Applications

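$$\varepsilon_{reff} = \frac{\varepsilon_r + 1}{2} + \frac{\varepsilon_r - 1}{2\sqrt{1 + 12\,h/w}} \qquad (6)$$

$$Z_0 = \frac{120\pi}{\sqrt{\varepsilon_{reff}}\left[\frac{w}{h} + 1.393 + 0.667\ln\left(\frac{w}{h} + 1.444\right)\right]} \quad \text{for } w \ge h \qquad (7)$$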
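The microstrip design equations can be evaluated directly. The sketch below assumes the standard closed-form microstrip expressions (effective permittivity with the 12h/w term, and the two impedance branches for w ≤ h and w ≥ h), uses the substrate values quoted in the text (εr = 2.2, h = 1.575 mm), and evaluates the line length of Eq. 4 with the effective permittivity, which is a choice made here rather than something the text states. Function names are illustrative.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def eps_eff(eps_r, w, h):
    """Effective permittivity of a microstrip line (Eq. 6)."""
    return (eps_r + 1) / 2 + (eps_r - 1) / (2 * math.sqrt(1 + 12 * h / w))

def z0_microstrip(eps_r, w, h):
    """Characteristic impedance in ohms for strip width w and substrate height h."""
    ee = eps_eff(eps_r, w, h)
    if w <= h:
        # Narrow-line branch (w <= h)
        return 60 / math.sqrt(ee) * math.log(8 * h / w + w / (4 * h))
    # Wide-line branch (w >= h), Eq. 7
    bracket = w / h + 1.393 + 0.667 * math.log(w / h + 1.444)
    return 120 * math.pi / (math.sqrt(ee) * bracket)

def quarter_wave_length(freq_hz, eps):
    """Physical length of a 90-degree line (Eqs. 4 and 5)."""
    k = 2 * math.pi * freq_hz / C0            # free-space wave number
    return (90 * math.pi / 180) / (math.sqrt(eps) * k)

if __name__ == "__main__":
    eps_r, h, f = 2.2, 1.575e-3, 17e9
    for w in (4.68e-3, 7.60e-3):              # widths from Table 1
        print(f"w = {w * 1e3:.2f} mm -> Z0 = {z0_microstrip(eps_r, w, h):.1f} ohm")
    ee = eps_eff(eps_r, 4.68e-3, h)
    print(f"quarter wavelength = {quarter_wave_length(f, ee) * 1e3:.2f} mm")
```

With the widths of Table 1, these expressions return impedances close to the 50 Ω and 35.4 Ω targets and a quarter-wave length near the tabulated value, so the sketch can serve as a quick consistency check before full-wave simulation.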

Figure 5 depicts a single branch-line hybrid coupler with an input port, a through port, a coupled port and an isolated port [9]. The output parameters of the single branch-line coupler are listed in Table 2. The simulated structure of the monopulse comparator, which comprises four modified hybrid couplers, is shown in Fig. 6. There are eight ports in total: four input ports and four output ports. Ports 1, 2, 5 and 6 are the input ports, and ports 3, 4, 7 and 8 are the output ports. The output at port 7 is the in-phase sum of the signals from ports 1, 2, 5 and 6. The azimuthal difference channel output at port 4 is equal to port 1 − port 2 + port 5 − port 6, and the elevation difference channel output at port 8 is equal to port 1 − port 2 − port 5 + port 6. Port 3 acts as an isolated port and is terminated [10].
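The port combinations stated above can be written out as a short sketch. It treats the four inputs as complex voltages and ignores the coupler losses and the overall scaling of a real comparator, so the values are illustrative only; the function name is hypothetical.

```python
def monopulse_outputs(p1, p2, p5, p6):
    """Ideal sum and difference channels formed from the four input ports."""
    return {
        "sum": p1 + p2 + p5 + p6,                    # port 7
        "azimuth_difference": p1 - p2 + p5 - p6,     # port 4
        "elevation_difference": p1 - p2 - p5 + p6,   # port 8
    }

if __name__ == "__main__":
    # Target on boresight: all four feeds receive the same signal.
    print(monopulse_outputs(1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j))
    # Target off axis in azimuth: ports 1 and 5 receive more than ports 2 and 6.
    print(monopulse_outputs(1.2, 0.8, 1.2, 0.8))
```

For a target on boresight both difference channels vanish while the sum channel is maximal, which is the property a tracking radar uses to derive its error signals.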

Fig. 5 Simulated structure of single branch-line hybrid coupler

Table 2 Output parameters of single branch-line hybrid coupler

Parameter                      Value (dB)
Coupling factor                −3.5
Isolation                      −25
Output along collinear arm     −3.8


Fig. 6 Simulated structure of monopulse comparator with four input and four output ports

3 Results and Discussions

The reflection coefficient versus frequency characteristics of the single branch-line coupler are analyzed. The reflection coefficient S11 versus frequency for port 1 is depicted in Fig. 7, and S22 versus frequency for port 2 is shown in Fig. 8. Similarly, S33 and S44 versus frequency are shown in Figs. 9 and 10, respectively. Figures 7, 8, 9 and 10 show that the corresponding reflection coefficients of the single branch-line coupler at 17 GHz are −13.8 dB, −17.8 dB, −17.25 dB and −15.5 dB, respectively, with a bandwidth of more than 500 MHz. Figure 14 shows that the isolation of the monopulse comparator at the sixth port is 35 dB at the design frequency of 17 GHz. The transmission coefficients S31, S41, S51, S61, S71 and S81 are used to calculate the sum channel and difference channel outputs. S31 is −6.25 dB from Fig. 11, S41 is −6.4 dB from Fig. 12, S51 is −18.5 dB from Fig. 13, S71 is −5.75 dB

Fig. 7 S 11 versus frequency in GHz


Fig. 8 S 22 versus frequency in GHz

Fig. 9 S 33 versus frequency in GHz

Fig. 10 S 44 versus frequency in GHz

from Fig. 15, and S81 is −6.8 dB from Fig. 16 at the design frequency of 17 GHz. The sum channel signal is the in-phase sum of the signals from port 1, port 2, port 5 and port 6 and is plotted in Cartesian form in Fig. 17. The maximum main-lobe amplitude of the sum channel at 17 GHz along the boresight (θ = 0) direction is 23.1 dB. The 3 dB angular width is 11.2 degrees, with a side lobe level of −10.2 dB.


Fig. 11 S 31 in dB versus frequency in GHz

Fig. 12 S 41 in dB versus frequency in GHz

Fig. 13 S 51 dB versus frequency in GHz

Figure 18 shows the output at port 4 of the monopulse comparator along the azimuthal direction in Cartesian form. This output is formed from the differences of the signals at port 1, port 2, port 5 and port 6: azimuthal difference channel output = port 1 − port 2 + port 5 − port 6. The difference is below 5 dB along the boresight direction at 17 GHz. The sum channel output at port 7 = −(S71 + S72 + S75 + S76) = 6.56 + 6.788 + 6.63 + 6.684 = 26.662 dB.


Fig. 14 S 61 dB versus frequency in GHz

Fig. 15 S 71 in dB versus frequency in GHz

Fig. 16 S 81 in dB versus frequency in GHz

The azimuthal channel output at port 4 is equal to S41 − S42 + S45 − S46, which is about 5 dB. The output at the difference channels is about −6 dB because it is the sum of the coupled-port outputs of two single couplers.


Fig. 17 Monopulse comparator sum channel output in dB at 17 GHz

Fig. 18 Monopulse comparator azimuthal difference channel output in dB at 17 GHz

4 Conclusion

A microstrip-type monopulse comparator using four single branch-line couplers is designed at 17 GHz on an RT Duroid substrate with εr = 2.33. Four “U”-shaped bent 90° quarter-wave transmission lines are used to produce the desired phase shift between the different ports. The analytical studies are carried out using CST Microwave Studio. The reflection and transmission coefficients are analyzed in terms of S parameters. The monopulse sum channel signal and the difference channel signal along the azimuthal direction are analyzed and presented. The design is well suited to tracking radar applications in the Ku band. The concept can be extended to include a difference channel along the elevation direction.

References

1. Kumar, H., Kumar, G.: Mono pulse comparators. IEEE Microwave Mag. 20(3), 13–100, Mar (2019)
2. Dey, S., Sai Kiran, N., Dey, S.: Microstrip and SIW based mono pulse comparators for microwave and millimeter wave applications. In: 2020 International Symposium on Antennas & Propagation (APSYM) (2020)
3. Li, J.-L., Wang, B.-Z.: Novel design of Wilkinson power divider with arbitrary power division ratios. IEEE Trans. Ind. Electron. 58(6), 2541–2546 (2011)
4. Napijalo, V., Kearns, B.: Multilayer 180° coupled line hybrid coupler. IEEE Trans. Microwave Theory Tech. 56(11), Nov (2008)
5. Wu, T.-L.: Microwave filter design, Chp. 4: Transmission Lines and Components. Department of Electrical Engineering, National Taiwan University
6. Pozar, D.M.: Microwave Engineering. Artech House, Norwell, MA (1990)
7. Wang, J., Ni, J., Guo, Y.X., Fang, D.: Miniaturized microstrip Wilkinson power divider with harmonic suppression. IEEE Microw. Wireless Compon. Lett. 19(7), 440–442 (2009)
8. Chun, Y., Hong, J.: Compact wideband branch-line hybrids. IEEE Trans. Microwave Theory Tech. 54(2), 704–709 (2006)
9. Dhipti, G., Swathi, B., Reddy, E.V., Kumar, G.S.: IoT-based energy saving recommendations by classification of energy consumption using machine learning techniques. In: International Conference on Soft Computing and Signal Processing, pp. 795–807. Springer, Singapore (2021)
10. Swathi, B., Murthy, K.: An effective modeling and design of closed-loop high step-up DC–DC boost converter. In: Intelligent System Design, pp. 303–311. Springer, Singapore (2021)

Objective Parameter Analysis with H.265 Using Lagrangian Encoding Algorithm Implementation

Kiran Babu Sangeetha and V. Sivakumar Reddy

1 Introduction

Video has become so popular in recent years that uncompressed video signals generate very large amounts of data. There are further demands for higher quality, higher resolution and greater fidelity, as well as for accessibility of video content. The widespread use of video in applications such as remote home monitoring, video chat and wearable cameras places a significant demand on communication networks and data storage systems [1, 2]. High-Efficiency Video Coding (HEVC) makes a significant contribution in this direction. The data must be compressed before it can be transferred. Earlier work employed a variety of lossless techniques to store the data [3], but these could not reduce the data volume sufficiently. The authors analyze lossy video compression in the medical environment, taking into account the tolerance to lossy compression, which covers both the temporal and spatial masking effects of the human visual system (HVS). There is a trade-off between the compression percentage and the resulting quality. Today, video accounts for roughly one-third of all mobile data usage, and it is one of the most demanding services in terms of reliability and network efficiency. As a result, service providers and mobile network operators around the world find it difficult to offer high-quality video streaming [4, 5].

K. B. Sangeetha (corresponding author), Department of ECE, Malla Reddy College of Engineering and Technology, Hyderabad, Telangana, India
V. S. Reddy, Malla Reddy College of Engineering and Technology, Hyderabad, Telangana, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_46


The primary goal of video encoding is to compress video data while maintaining quality. The Video Coding Experts Group (VCEG) and the Moving Picture Experts Group (MPEG) collaborated to develop the Advanced Video Coding (AVC) standard, also known as H.264. The same groups later proposed High-Efficiency Video Coding (HEVC), also known as H.265. Compared with H.264, H.265 reduces the bit rate by 50% [2]. The main reason for using HEVC is that it offers twice the compression capability without sacrificing quality. The primary goal of modern digital video compression technology, including HEVC, is coding and compression efficiency [6]. In 5G networks, encoding is far more difficult than decoding, and HEVC provides a diverse range of encoders. This work employs an LE-based H.265 architecture for video transmission, which combines an auto encoder and a learning-based model. The auto encoder predicts the frames using deep learning and can rebuild the frames without losing any data. Section 2 describes the existing methodology. Section 3 describes the LE encoder and its algorithm, which is the method used in this study. Section 4 presents and discusses the results, and Section 5 concludes the paper.

2 Existing Methodology

H.264 is currently the most popular video codec and achieves a 40–50% bit rate reduction. To represent the same high-quality video, it takes up half the space of MPEG-2. HD quality can be attained without sacrificing speed or performance. H.264 encodes HD video efficiently and offers high-quality video at low transmission rates. H.264 uses intra prediction to predict a macroblock from previously coded neighbouring pixels in the same frame [7].

3 Proposed Method

H.265/High-Efficiency Video Coding (HEVC) is a codec favored by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) as a replacement for H.264. The primary goal of the new codec is to achieve 50% better compression efficiency than H.264 and to support resolutions up to 8192 × 4320 [6, 8]. The block diagram of the proposed LE-based H.265 encoder is shown in Fig. 1. The main distinction between other codecs and the proposed LE-based H.265 encoder is that they employ block-based hybrid video coding. The video coding block is divided into smaller blocks that are used for prediction and transform coding. The coding efficiency of a specific encoder is determined primarily by the encoding algorithm used.


Fig. 1 Proposed LE-based H.265 encoder (block diagram; labelled blocks: input video frame, partition into code tree blocks, encoder control, transform coding, quantizer, Lagrangian entropy encoder, intra-picture prediction, motion-compensated prediction, motion, filter, decoder, output)

(A) Lagrangian Encoder

The encoder must select the coding modes, prediction parameters, quantization parameters and transform coefficients. The simple and effective Lagrangian bit allocation algorithm for this minimization is given by:

$$x^{*} = \arg\min_{x \in X} \; D + \lambda \cdot R \qquad (1)$$

Here λ is the Lagrangian parameter, a constant that determines the trade-off between the distortion D and the number of bits R, and the minimization is carried out over the coding options x ∈ X. The coding efficiency is also affected by components such as interpolation filters, in-loop filters and entropy encoders. The main difference between codec generations is expanded support for arranging the blocks of samples: greater fidelity of motion vectors, more flexibility in the coding order of pictures, longer reference pictures, a variety of intra-prediction modes, a large number of vector predictors and a large number of block sizes for motion-compensated prediction.

(B) Lagrangian Encoder Algorithm

In this paper, the design of a Lagrangian encoder for the H.265/HEVC encoder is proposed. This encoder allows a choice between compression efficiency and throughput. The encoder decides whether to use intra- or inter-coded predicted blocks. When a coder is chosen, the frames are received and formed into Code Tree Blocks. The partitioning of a Code Tree Block into coding units is shown in Fig. 2. A coding unit (CU) consists of a block of luma samples and the two corresponding blocks of chroma samples. The CU can be 8 × 8, 16 × 16 or 32 × 32 in size. A CU can be further subdivided into transform units based on the quad-tree structure.
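The Lagrangian selection of Eq. 1, which also underlies the intra/inter decision described above, can be sketched as follows. The candidate modes and their distortion and bit counts are made-up numbers for illustration, not measurements from the proposed encoder, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodingOption:
    name: str
    distortion: float  # D, e.g. sum of squared errors for the block
    bits: float        # R, number of bits needed to code the block

def rd_cost(option, lam):
    """Lagrangian cost J = D + lambda * R of one coding option."""
    return option.distortion + lam * option.bits

def best_option(options, lam):
    """Return the option x* that minimizes D + lambda * R over the candidates."""
    return min(options, key=lambda o: rd_cost(o, lam))

if __name__ == "__main__":
    # Hypothetical candidates for a single block.
    candidates = [
        CodingOption("intra", distortion=120.0, bits=40.0),
        CodingOption("inter", distortion=200.0, bits=10.0),
        CodingOption("skip", distortion=900.0, bits=1.0),
    ]
    for lam in (0.5, 4, 100):
        print(lam, best_option(candidates, lam).name)
```

A small λ favors the low-distortion, bit-hungry option, while a large λ pushes the decision toward cheap modes, which is the trade-off that λ controls in Eq. 1.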


Fig. 2 Partitioning of 64 × 64 CTU into CUs of 8 × 8 and 32 × 32

The main advantage of this block partitioning is the reduction in size, which makes it possible to access small units of the structure as well. The proposed Lagrangian-based encoding algorithm manages the macroblock mode decision and evaluates the motion-compensated signal, which depends on the inter mode [9]. To minimize buffer delay, the quantization parameter is increased within the acceptable Lagrangian range. This procedure is repeated until the calculated delay is minimized. In Eq. 1, the Lagrangian parameter is varied over the values 4, 25, 100, 250, 400, 730 and 1000 for different values of Q. In the proposed encoder, Q ∈ {1, 2, 3, …, 32}. The choice of Q is determined by the target bit rate. With the proposed encoder, the delay is minimized through a series of computations using the Lagrangian parameter and Q, and the shortest buffer delay is obtained.

There are two methods for evaluating video quality: subjective and objective. Because subjective quality assessment requires human observers and takes time, we turn to objective quality metrics. The evaluation must be highly perceptive and well correlated with human judgement, as required for medical use. Several metrics can be used to evaluate video quality. The first is the peak signal-to-noise ratio (PSNR), which is the ratio between the maximum possible signal power and the power of the error between the original and degraded images. A high PSNR indicates a low mean square error, implying better image restoration. The second is the structural similarity index (SSIM), which is based on structural information extracted from the stimuli. SSIM is a method for predicting the perceived quality of digital television, digital images and video. It analyzes the similarity between two images. The pixels of an image have strong spatial dependencies, and this structural information is required for predicting the objects in the visual scene [10].

Video Multimethod Assessment Fusion (VMAF) is a full-reference video quality metric developed by Netflix in collaboration with the University of Southern California. It predicts video quality from the original and restored video sequences.
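The PSNR definition given above can be made concrete with a short sketch. It assumes 8-bit samples (peak value 255) and treats a frame as a flat sequence of pixel values; the function names are illustrative, and a production pipeline would normally rely on an existing tool rather than this code.

```python
import math

def mean_squared_error(reference, degraded):
    """Mean squared error between two equally sized sequences of pixel values."""
    if len(reference) != len(degraded) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    return sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)

def psnr(reference, degraded, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    mse = mean_squared_error(reference, degraded)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)

if __name__ == "__main__":
    original = [10, 20, 30, 40]
    restored = [11, 19, 31, 39]   # every pixel is off by one level
    print(round(psnr(original, restored), 2), "dB")
```

Because the mean squared error sits in the denominator, a smaller error gives a larger PSNR, which is why a high PSNR is read as better restoration.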


4 Results and Discussions

The results are obtained using the Horse race test sequence (a CIF file). Using this video, the authors created the dataset and analyzed the protocol and network performance. This video file was chosen because it has a high resolution and is in the A category. The objective quality parameter PSNR is measured for the existing and proposed methods. PSNR is calculated across bit rates using two techniques: one pass (Fig. 3) and two pass (Fig. 4). In the one-pass method, encoding is done only once. The file size is not controlled in one pass, but the quality is superior to that of two pass. In two pass, the video is processed twice, so encoding takes more time. The bit rate in two pass is calculated as file size/duration. Two pass therefore targets the file size without regard to quality, whereas one pass primarily targets video quality. The PSNR across bit rates in one and two pass is calculated from the command line using the FFmpeg tool. Figure 3 plots PSNR versus bit rate. The PSNR of the proposed method increases steadily from 24 dB at 500 kbps to 38 dB at 32 Mbps, whereas the existing approach starts at 25 dB at 500 kbps and reaches only 36 dB at 32 Mbps. Using the LE-based H.265 algorithm results in a 5.52% increase in performance.

Fig. 3 PSNR versus bit rate (one pass)

Fig. 4 PSNR versus two pass bit rate
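One-pass and two-pass encoding and the metric measurements discussed in this section can be driven from FFmpeg. The sketch below only builds the argument lists and does not run them; it is not the authors' command line. It assumes the stock libx265 encoder (two pass via `-x265-params pass=N`) and the `psnr`, `ssim` and `libvmaf` filters, the last of which requires an FFmpeg build with libvmaf enabled. File names and function names are hypothetical.

```python
def one_pass_command(src, dst, bitrate):
    """Single-pass H.265 encode at a target average bit rate."""
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx265", "-b:v", bitrate, dst]

def two_pass_commands(src, dst, bitrate):
    """First pass analyses the video and discards output; second pass writes dst."""
    common = ["ffmpeg", "-y", "-i", src, "-c:v", "libx265", "-b:v", bitrate]
    first = common + ["-x265-params", "pass=1", "-an", "-f", "null", "/dev/null"]
    second = common + ["-x265-params", "pass=2", dst]
    return first, second

def metric_command(encoded, reference, metric="psnr"):
    """Compare an encoded file (first input) against its reference (second input)."""
    if metric not in {"psnr", "ssim", "libvmaf"}:
        raise ValueError(f"unsupported metric: {metric}")
    return ["ffmpeg", "-i", encoded, "-i", reference,
            "-lavfi", metric, "-f", "null", "-"]

if __name__ == "__main__":
    # Hypothetical file names; pass each list to subprocess.run to execute it.
    for cmd in two_pass_commands("horse_race_cif.y4m", "out_2pass.mp4", "2M"):
        print(" ".join(cmd))
    print(" ".join(metric_command("out_2pass.mp4", "horse_race_cif.y4m", "psnr")))
```

Keeping the commands as lists makes it straightforward to sweep the bit rate and collect one metric value per encode.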


Fig. 5 VMAF versus preset

The compression ratio improves with a higher PSNR. Figure 4 shows PSNR values of 34 dB and approximately 36 dB at a bit rate of 32 Mbps, and the greater PSNR value is produced with single pass (i.e., PSNR 38 compared to PSNR 32 for the same bit rate of 32 Mbps). The VMAF objective quality metric is shown against the ‘preset’ parameter in Fig. 5. PSNR is calculated from the mean squared error, whereas VMAF is a machine learning-based model trained on actual mean opinion scores (MOS). MOS is a scale that ranges from 1 to 5, with 1 being the worst and 5 being the best. The encoder offers a preset option that selects a set of parameters balancing encoding speed and compression efficiency. Medium is the best value for preset. If the preset is set to ‘faster,’ the encoder increases speed at the expense of quality and compression efficiency. When the preset is set to ‘slow,’ the coder achieves the highest quality while using more CPU computations. The command vmaf is used to implement the algorithm, and the C library libvmaf is used to incorporate VMAF into the code. The VMAF score runs from 0 (worst) to 100 (best). Figure 5 shows that the value of VMAF starts at 97 and decreases to 85 as the CPU spends more time on computations, reaching 96 for the ‘slow’ preset. Figure 5 also shows that the existing method’s VMAF score is 94 for the preset ‘medium,’ whereas the proposed LE-based H.265 algorithm scores 97.

The SSIM metric is depicted in Fig. 6. SSIM ranges from 0 to 1, where 1 indicates that the original and reconstructed frames of the video are identical. Figure 6 shows that SSIM is maintained at 0.95 at 32 Mbps for LE-based H.265, compared to 0.92 for the existing method at the same bit rate, a 3.3% increase. The values 0.97, 0.98 and 0.99 are considered adequate for reconstructing the original video. Constant rate factor (CRF) versus stream size and CRF versus SSIM are shown in Figs. 7 and 8. The graphs show that video quality decreases as CRF increases. There are minor differences in the PSNR, SSIM and VMAF scores for the same CRF. For CRF values of approximately 20 and above, PSNR, SSIM and VMAF decline with only a small slope.

Fig. 6 SSIM versus bit rate

Fig. 7 CRF 15–30 stream size

Fig. 8 Bit rate versus CRF SSIM


The lower the CRF number, the better the quality. The compression in Fig. 7 is proportional to the CRF. The VMAF score versus bit rate is plotted in Fig. 9. The VMAF score for H.265 typically starts at 25 and increases to 96 as the bit rate increases from 2000 kbps. In Fig. 9, the maximum value, at a bit rate of 64 Mb/s, is 98.8, indicating that H.265 outperforms H.264. Figure 10 depicts PSNR for constant rate factor (CRF) values ranging from 15 to 30. The higher the PSNR, the better the quality of the compressed video. When CRF is low, PSNR is high. In Fig. 10, PSNR is greater than 33 dB and then decreases as the CRF increases. In Fig. 11, the preset is varied against bit rate and plotted for both codecs while CRF is held constant. When switching from ‘medium’ to ‘slow,’ the time required increases by around 40%. Switching to ‘slower’ instead requires approximately 100% more time (i.e., it takes twice as long).

Fig. 9 VMAF versus bit rate

Fig. 10 CRF 15–30 PSNR


Fig. 11 Preset versus bit rate

Compared with ‘medium,’ the ‘veryslow’ preset takes about 280% of the original encoding time, with only very minor quality gains over ‘slower.’ The ‘fast’ preset saves about 10% of the encoding time, and ‘faster’ saves about 25%. ‘Ultrafast’ saves 55% but at the cost of significantly lower quality.

5 Conclusion

H.264 and H.265 are probably the two most popular codecs for video compression. The authors analyzed video compression and measured video quality metrics using the FFmpeg tool. The tool was used to compress video with the proposed Lagrangian-based H.265 coder and to evaluate and compare performance metrics such as PSNR, VMAF and SSIM against the preceding H.264 coder. The graphs show that the proposed coder performs better than H.264 on all metrics. The metrics are output as a log file in JSON or text format. In terms of bit rate, simulations are done with one pass and two pass.

References

1. Chaabouni, A., Gaudeau, Y., Lambert, J., Moureaux, J.-M., Gallet, P.: H.264 medical video compression for telemedicine: a performance analysis. IRBM 37(1), 40–48. ISSN 1959-0318. https://doi.org/10.1016/j.irbm.2015.09.007
2. Vranjes, M., Rimac-Drlje, S., Grgic, K.: Review of objective video quality metrics and performance comparison using different databases. Image Commun. 28(1), 1–19 (2013). https://doi.org/10.1016/j.image.2012.10.003
3. Anitha, P., Reddy, P.S., Prasad, M.N.G.: Content split block search algorithm based HEVC. J. Sci. Ind. Res. (JSIR), pp. 690–693. ISSN: 0975-1084 (online); 0022-4456 (print)
4. Panayides, A., Antoniou, Z., Pattichis, M.S.: The use of H.264/AVC and the emerging high-efficiency video coding (HEVC) standard for developing wireless ultrasound video telemedicine systems. In: 2012 Conference Record of the Forty-Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pp. 337–341 (2012). https://doi.org/10.1109/ACSSC.2012.6489019
5. Sivam, B.S., Sumithra, M.G., Sreelatha, P.: Survey on video compression techniques for efficient transmission. J. Phys.: Conf. Ser. 1916(1), 012211. IOP Publishing (2021)
6. Thyagarajan, K.S.: Still Image and Video Compression with Matlab. Wiley (2010). ISBN: 9780470484166. https://doi.org/10.1002/9780470886922
7. Richardson, I.E.: The H.264 Advanced Video Compression Standard, 2nd edn. Wiley. ISBN: 9780470989418. https://doi.org/10.1002/9780470989418
8. Hasan, M.K., Chuah, T.C., El-Saleh, A.A., Shafiq, M., Shaikh, S.A., Islam, S., Krichen, M.: Constriction factor particle swarm optimization based load balancing and cell association for 5G heterogeneous networks. Comput. Commun. (2021)
9. Tech, G., George, V., Pfaff, J., Wieckowski, A., Bross, B., Schwarz, H., Marpe, D., Wiegand, T.: Learning-based encoder algorithms for VVC in the context of the optimized VVenC implementation. In: Applications of Digital Image Processing XLIV, vol. 11842, p. 1184207. International Society for Optics and Photonics (2021)
10. Anitha, P., Reddy, P.S., Prasad, M.N.G.: High efficiency video coding with content split block search algorithm and hybrid wavelet transform. In: International Journal of Scientific Technology and Research, pp. 956–962 (2020). ISSN 277-8616

Effective Strategies for Detection of Abnormal Activities in Surveillance Video Streams

Anikhet Mulky, Payal Nagaonkar, Akhil Nair, Gaurav Pandey, and Swati Rane

A. Mulky · P. Nagaonkar · A. Nair (corresponding author) · G. Pandey · S. Rane
Department of Electronics and Telecommunication Engineering, South Indian Education Society’s Graduate School of Technology, Nerul, Navi Mumbai 400706, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_47

1 Introduction

Human abnormal activity detection is a challenging task: it involves not only recognizing patterns but also understanding the behaviour of motion in different scenarios and computing environments. The automatic detection of suspicious activities is a significant application of video surveillance and is difficult in long video streams. Computerized visual behaviour analysis provides several key components for an intelligent vision system [1]. The ability to perceive people and their behaviours through vision is vital for a machine to interact robustly and easily with a human–computer interacting world. An important surveillance application is the detection and prediction of anomalous events [2, 3]. Video surveillance is the process of monitoring video clips for behaviour, activities or other suspicious information for the purpose of managing or protecting society or people. Surveillance can be classified into three types: manual, semi-autonomous and fully autonomous. The manual and semi-autonomous models involve a human operator who monitors the activities. If the system is automated,


Fig. 1 Proposed architecture

the requirement of human intervention is reduced, and it automatically identifies low- and high-level features, such as recognizing behaviour as one of two well-known event classes, normal and abnormal activities [4]. Automated systems perform event detection, access authentication, event classification and behaviour analysis, and then send message alerts to the concerned personnel. The surveillance system is shown in Fig. 1. Existing video surveillance technologies share the same primary components, namely sensing devices, a recording mechanism and a human interface device, beyond which the variables are seemingly limitless [5]. The technology often drives variables such as camera function, resolution, frame rate, storage capacity and the actual camera selection. There are seven video surveillance technologies on the market: analogue closed-circuit television (CCTV) cameras, advanced analogue CCTV, IP-based networks, high-definition serial digital interface, analogue high definition, high-definition composite video interface and high-definition transport video interface, each of which involves many challenging issues [6, 7].

2 Abnormal Event Detection

An event is defined as any identifiable incident that has significance to the user. In video context analysis, events consist of sequential actions involving multiple objects that interact or co-exist in a common space monitored by one or more cameras. Events can be classified into two groups, global events and local events, depending on the number of camera views. If the surveillance area is covered by a single camera, then the user’s choice of activity is called a local activity. Similarly, if there is more than one camera covering the surveillance area, then the user’s choice


Fig. 2 ConvLSTM layers

of activity is termed a global event [8]. Consequently, event detection refers to the detection of the particular actions that the user intends to find. Typical examples are pedestrian movement, loitering, fighting in crowded areas, vehicles turning illegally, moving in the wrong direction, robbery and stopping at traffic intersections. The anomalous events in the video scenes should be known to the end users working on the video analytic systems. The term anomaly refers to aberrant events that occur occasionally in comparison with normal events. Anomalies are also described as events that are rare and dissimilar from normal ones, or as events with characteristics different from those of normal events. Other names for an anomaly are rare, unusual, suspicious, abnormal or irregular event, and outlier. In the context of public safety video surveillance, typical examples of anomalies are fire, crowd fighting and abandoned objects. In a congested area, it is impossible to monitor each person’s behaviour. Each person’s behaviour is distinct: some people behave appropriately, while others behave strangely. As a result, keeping track of suspicious behaviour takes time. Some of the suspicious behaviour depicted in Fig. 2 includes shouting, fighting, rioting, jaywalking and travelling the wrong way, amongst other things. Motion anomaly detection opens up new applications, such as automatic detection of riots in a congested area, automatic detection of traffic violations, group behaviour recognition and performance evaluation. Certain behaviour in a congested area, such as the sudden convergence and divergence of people, is vitally important for forensic purposes and for the prevention of hazardous activities [9].

2.1 Need for the Abnormal Detection

The most common applications of surveillance systems are intruder detection, unattended object detection, loitering detection, theft detection, crowd anomaly detection and car parking anomaly detection. These applications generate large amounts of data and require a lot of attention from a human operator performing the analysis. As


a result, computer vision techniques must be used to automate the entire procedure. The majority of existing autonomous video analysis technologies rely on video motion detection (VMD), hard-wired rules and object-centred reasoning, and they function in hostile environments. However, in real-world applications, constraints such as image resolution, noise, and fluctuations in position and appearance may cause considerable occlusion [10]. Another important reason for the frequent breakdown of traditional automatic analysis systems is their need for frequent configuration and parameter setting and their non-scalable deployment. Addressing these problems demands more sophisticated computer vision algorithms. Recently, modelling techniques such as dynamic Bayesian networks, Bayesian topic models, artificial neural networks, decision trees and fuzzy reasoning have been used for this type of system. One of the most important functions of such automated systems is to detect unusual events that could pose a threat to public systems, security and safety.

3 Related Work

Existing techniques for anomaly detection use classification and clustering approaches, which may fail to cover hidden interesting patterns and thus reduce detection accuracy [11, 12]. Anastasiia et al. discussed advice for choosing the bandwidth utilization, learning technique and abnormality detection characteristics of the NN. The model can learn the program’s typical behaviour and does not require anomalies to be present in order to determine a statistically significant difference. The approach has been shown to be capable of detecting abnormalities in vibrating electric signals that might be used to anticipate the occurrence of a disaster. Additionally, the technology can be combined with telephone information mining to discover roadway issues such as pits or irregularities, enabling treatment as well as restoration. Using the Signal Strength Indicators accessible at AIS base stations (BS), the challenge of determining how a scarcity of automatic identification system (AIS) communications indicates an alarming scenario was developed throughout a sequence of video. The research also includes an object detection technique for detecting deliberate AIS start and stop toggling. The approach was then demonstrated and evaluated using a data mining tool.

Chih-Yang Lin et al. established computational compositional data analysis (CoDa) to find multidimensional deviations or abnormalities. The researchers demonstrated that quite different strategies might achieve similar accuracy values. The CoDa technique, on the other hand, requires fewer variables and identifies which components are responsible for the irregularities [13–16].

Chunyu Chen et al. described a technique for identifying abnormalities in geographical data. Due to the limited quantity of information accessible in climate databases, entire Northern photos were separated into smaller parts, allowing for a larger test dataset. Furthermore, the split may be used to compensate for the heterogeneity

Effective Strategies for Detection of Abnormal Activities …

539

that glacier pictures need. The technique has the benefit of being able to discover abnormalities totally automatically even without a specialist or human labelling. Jie Shao et al., introduced the use of image processing and machine learning approaches to identify irregularities in motorized vehicles. Stopping and going in the opposite way are examples of these abnormalities. Surveillance systems are used to record images from the vehicle’s front and back sides. As a result of this capacity, the findings are resistant to changes in design and operating circumstances. Jun Wei Hsieh et al., utilized an innovative strategy to dynamically categories large amounts of information it into a supervised training set. When the significance of main KPIs is completely understood, this would be doable. Abnormal Filtration employs a dual dataset username prognosis as well as a rational arrangement with permissible parameter intervals (Minimum /Maximum). The technique appears to resolve a number of important issues in the fields of cell membrane, permanent, data systems, networking, and remote controller. Other conceivable example application is robotics in broad, which includes medical/critical devices and components. Na Lu et al. created a new method dubbed AMF-LSTM for detecting aberrant traffic using a deep learning model. To obtain temporal correlation between flows, the statistical features of multi-flows were used as the input rather than a single flow or the features extracted from log, and an attention mechanism was added to the original LSTM to help the model learn which traffic flow has more contributions to the final results. Experiments revealed that the AMF-LSTM method had a good accuracy and recall in identifying anomaly types.

4 System Architecture and Design Any video input is essentially a collection of sequential images known as frames. Each frame can be treated as an image, and different kinds of image processing methods can be applied over the frames. A surveillance camera records the events across the surveillance space round the clock. The size of a video is not fixed, and at the same time, it can be segmented for easy processing.

4.1 Preprocessing The purpose of this step is to transform the unaligned raw data into a form that can be used as model input. Frames of 227 × 227 pixels are extracted from the clips and scaled to the range between zero and one. Normalization uses the global mean: the mean frame value is estimated from the training dataset by summing pixel values. Dimensionality is reduced by converting the frames to grayscale. As the final pre-processing step, the frames obtained from the video are adjusted to have zero mean and unit variance. The model takes video volumes as input, with each volume comprising ten sequential frames sampled with various skip strides.
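The preprocessing steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are taken to be already resized to 227 × 227, the grayscale conversion uses the common luma weights (the chapter does not say which conversion was used), and the skip strides (1, 2, 3) as well as the function names `to_gray`, `standardize`, and `volume_indices` are hypothetical.

```python
from statistics import mean, pstdev


def to_gray(frame_rgb):
    """Convert an RGB frame (rows of (r, g, b) tuples, 0-255) to grayscale in [0, 1]."""
    # Assumption: standard luma weights; the chapter does not specify the conversion.
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
            for row in frame_rgb]


def standardize(frames):
    """Shift and scale all pixels to zero mean and unit variance (global statistics)."""
    pixels = [p for frame in frames for row in frame for p in row]
    mu = mean(pixels)
    sigma = pstdev(pixels) or 1.0  # avoid division by zero for constant input
    return [[[(p - mu) / sigma for p in row] for row in frame] for frame in frames]


def volume_indices(n_frames, length=10, strides=(1, 2, 3)):
    """Return the frame indices of every volume of `length` frames for each skip stride."""
    volumes = []
    for s in strides:
        span = (length - 1) * s + 1  # number of source frames covered by one volume
        for start in range(0, n_frames - span + 1):
            volumes.append([start + k * s for k in range(length)])
    return volumes
```

For example, `volume_indices(12, length=10, strides=(1,))` yields three overlapping volumes starting at frames 0, 1, and 2. Larger strides cover a longer time span with the same ten frames, which is one way to augment a small training set.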

4.2 Feature Learning A convolutional spatio-temporal autoencoder is proposed and trained on the training clips. The design is divided into two sections:
• A spatial autoencoder learns the spatial properties of each video frame.
• A temporal encoder-decoder learns the temporal patterns of the encoded spatial characteristics.
The spatial encoder and decoder, as shown in Figs. 1 and 2, include two convolutional and two deconvolutional layers, respectively. The proposed scheme has demonstrated its effectiveness in applications like recognition of handwriting and translation of speech.

4.3 Spatial Convolution Figure 1 shows how the three-layered convolutional autoencoder works by learning temporal regularity. First, the video sequence is passed through the spatial encoder, which applies convolutions with different strides and filters. The convolved frames are passed to the temporal encoder and are then deconvolved and reconstructed. The convolution output of every image is passed to the ConvLSTM filters shown in Fig. 2, where the convolutional LSTM layers use a 3 × 3 kernel with 64 filters. The feature mapping is performed on slices taken from the image to produce the convolved feature shown in Fig. 3.

Fig. 3 Feature map of convolution layer

Fig. 4 Padding layers

Figure 3 shows the feature mapping process across the convolution layer: image slices are convolved to produce the feature map. The window that moves over the image slice is the kernel, a 3 × 3 matrix. The major purpose of the first convolutional layers is to capture low-level aspects of the image slices such as edges, corners, and colour, while deeper layers retrieve high-level elements such as texture by combining the low-level ones. A set of convolution kernels connects each unit in a convolutional layer to a neighbourhood patch in the feature maps of the preceding layer. These weighted sums are passed through a non-linearity such as a rectified linear unit (ReLU). The same convolution kernel is shared by all units in a feature map, while different feature maps within a convolution layer may use different kernels. After the convolutional layer, a padding layer adjusts the size of the image by adding extra zeros to its rows and columns, as shown in Fig. 4. Figure 5 depicts the process of padding in the deep learning process, and the size of padding is 32 × 32 × 3. Padding adds layers of zeros around the input frames to prevent the issues noted above. After padding with p layers of zeros, an n × n frame becomes an (n + 2p) × (n + 2p) image. The process of feature mapping is illustrated in Fig. 5. Figure 6 shows the filtering of the feature vector and the mapping of the relevant features using the ReLU map. Filters, or feature detectors, are applied to the input image or to the feature map output of the previous layer to create feature maps.
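The padding rule and the 3 × 3 kernel window described above can be made concrete with a small sketch. It is an illustration only: the helper names `zero_pad` and `conv2d_relu` are hypothetical, images are plain nested lists, and the sliding window is implemented as the cross-correlation that deep learning frameworks usually call convolution.

```python
def zero_pad(image, p):
    """Surround an n x n image with p layers of zeros, giving (n + 2p) x (n + 2p)."""
    width = len(image[0]) + 2 * p
    padded = [[0.0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0.0] * p + list(row) + [0.0] * p)  # left and right borders
    padded.extend([0.0] * width for _ in range(p))      # bottom border
    return padded


def conv2d_relu(image, kernel):
    """Slide a k x k kernel over the image (stride 1, no padding) and apply ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(image) - k + 1):
        row = []
        for j in range(len(image[0]) - k + 1):
            acc = sum(kernel[a][b] * image[i + a][j + b]
                      for a in range(k) for b in range(k))
            row.append(max(0.0, acc))  # rectified linear unit
        out.append(row)
    return out


# A 4 x 4 image padded with p = 1 becomes 6 x 6; a 3 x 3 kernel then
# produces a 4 x 4 feature map, so the spatial size is preserved.
image = [[1.0] * 4 for _ in range(4)]
feature_map = conv2d_relu(zero_pad(image, 1), [[1.0] * 3 for _ in range(3)])
```

Without padding, each 3 × 3 convolution would shrink the frame by two pixels in each dimension, which is why a padding layer is commonly added after or before convolution.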

Fig. 5 Feature map

Fig. 6 Abnormal event detected—White papers scattered over road (1)

5 Results and Discussion The research uses newly created benchmarking datasets to train the proposed model. In these datasets, all videos are captured from a fixed point. There are no unusual activities in any of the training videos. Both normal and aberrant occurrences can be used in testing the system. A threshold value, set by trial and error, establishes the baseline for 'normalcy'. A test video whose error crosses the threshold is termed an "abnormal event". Normal events are those that fall under the threshold value. Setting a low threshold makes the system sensitive to the happenings in the scene, so more alarms are triggered. The true positive and false positive rates are obtained at different error thresholds to calculate the area under the receiver operating characteristic (ROC) curve (AUC). A threshold value of 0.000488 was found to be the optimal point at which the system can distinguish between normal and abnormal events for the current dataset. Further improving upon previous work, the proposed system is not resistant to change in design and operating circumstances and can be easily trained with new datasets with high accuracy. It also improves upon existing systems by having a quicker reaction time due to faster processing. For single instances, the system can detect fast-moving vehicles (cars, mopeds, and cycles) with an accuracy of 92%. A fast-moving person can be detected with an accuracy of 85%. These results were obtained using the confusion matrix technique. Abnormal events that occur in groups, such as a group of people running, cycling, or fighting, are detected with a low accuracy of 64% because the autoencoder struggles to trace a whole group as a single entity. Massive amounts of collected data create noisy datasets and pose storage and analytics problems. Neural network systems have training issues since they rely on data that is particularly difficult to collect during unpredictable situations. As a result, bias may be present while evaluating a single individual or a group as a whole. This can be improved further by training our model with more testing videos. Figure 6 shows a detected abnormal event, white papers scattered over the road (1); Fig. 7 shows a detected abnormal event, a running individual in yellow (2); and Fig. 8 shows a detected abnormal event, scooters in motion (3).

Fig. 7 Abnormal event detected—Running individual in yellow (2)

Fig. 8 Abnormal event detected—Scooters in motion (3)
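The thresholding and ROC analysis described in this section can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes that each test clip is summarized by a single error value and a ground-truth label, and the function names `classify`, `roc_points`, and `auc` are hypothetical. Only the threshold value 0.000488 comes from the text.

```python
def classify(errors, threshold=0.000488):
    """Flag a clip as abnormal when its error reaches the threshold."""
    return [e >= threshold for e in errors]


def roc_points(errors, labels):
    """Return (false positive rate, true positive rate) pairs for every threshold.

    `labels` holds True for abnormal clips and False for normal ones.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both normal and abnormal examples are required")
    points = [(0.0, 0.0)]
    for t in sorted(set(errors), reverse=True):  # sweep from strict to lenient
        tp = sum(1 for e, y in zip(errors, labels) if e >= t and y)
        fp = sum(1 for e, y in zip(errors, labels) if e >= t and not y)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up error values for four clips; the last two are abnormal.
errors = [0.0001, 0.0002, 0.0006, 0.0009]
labels = [False, False, True, True]
flags = classify(errors)
area = auc(roc_points(errors, labels))
```

Lowering the threshold moves the operating point up and to the right on the ROC curve: more abnormal events are caught, at the cost of more false alarms, which matches the sensitivity behaviour noted above.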

6 Conclusion Video surveillance is widely used in areas such as shopping malls, traffic monitoring, and parking lots. Because it helps ensure the safety and security of people, video surveillance has been an extensive area of research in the field of computer vision. Video surveillance data contains a huge volume of information, which is difficult for a human operator to pay attention to over long periods of time. Several intelligent systems that analyse abnormal behaviour have previously been integrated with video monitoring systems, but these systems have issues such as low accuracy in detecting persons and their actions and high latency in reporting an abnormal event. We recommend that future abnormal event detection systems find a way to track a large number of entities with higher accuracy. We also recommend that such systems find a way of automatically setting the threshold value based on the environment in which the system is used. Further work on the deep learning model with different feature extraction methods may improve the accuracy of anomaly detection. This will enhance the usage of computer vision techniques for effective anomaly detection in multiple domains.

References
1. Sokolova, A.D., Kharchevnikova, A.S., Savchenko, A.V.: Organizing multimedia data in video surveillance systems based on face verification with convolutional neural networks. In: International Conferences on Analysis of Images, pp. 223–230. Social Networks and Texts, Springer, Berlin (2017)
2. Lin, C.Y., Muchtar, K., Yeh, C.H.: Robust techniques for abandoned and removed object detection based on markov random field. Elsevier J. Vis. Comm. Image Represent. 39, 181–195 (2016)
3. Chen, C., Shao, Y., Bi, X.: Detection of anomalous crowd behavior based on the acceleration feature. IEEE Sens. J. 15(12), 7252–7261 (2015)
4. Chen, C.Y., Shao, Y.: Crowd escape behavior detection and localization based on divergent centers. IEEE Sens. J. 15(4), 2431–2439 (2015)
5. Fan, Y., Levine, M.D., Wen, G., Qiu, S.: A deep neural network for real-time detection of falling humans in naturally occurring scenes. Neuro-Computing 260, 43–58 (2017)
6. Shao, J., Sun, J., He, C.: Abnormal event detection for video surveillance using deep-one class learning. Multimedia Tools Appl. 78(3), 1–15 (2017)
7. Musale J.: Suspicious movement detection and tracking of human behavior and object with fire detection using a Closed-Circuit TV cameras. Int. J. Res. Appl. Sci. Eng. Techn. 5(8), 2013–2018
8. Hsieh, J.W., Chuang, C.-H., Alghyaline, S., Chiang, H.-F, Chiang, C.H.: Abnormal scene change detection from a moving camera using bags of patches and spider-web map. IEEE Sens. J. 15(5), 2866–2881 (2015)
9. Kandylakis, Z., Karantzalos, K., Doulamis, A., Doulamis, N.: Multiple objects tracking with background estimation in hyper spectral video sequences. IEEE Conf. Hyper Spectral Image Signal Proces. 1–4 (2015)
10. Zheng, K., Yan, W.Q., Nand, P.: Video dynamics detection using deep neural networks. In: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2(3), pp. 224–234
11. Zheng, L., Bie, Z., Sun, Y., Wang, J., Su, C., Wang, S., Tian, Q.: MARS: A Video Benchmark for Large-Scale Person Re-Identification. In: Published on Computer Vision, vol. 9910, pp. 868–884. LNCS (2016)
12. Manaswi, C., Rohit, D., Debi, P., Harish, B., Lyudmila, M.: Motion anomaly detection and trajectory analysis in visual surveillance. Multimedia Tools Appl. 77(13), 16223–16248 (2018)
13. Muhammad, K., Jamil, A., Lv, Z.: Efficient deep CNN-Based fire detection and localization in video surveillance applications. IEEE Trans. Sys. Man Cybern. Soc. 49(7), 1419–1434 (2019)
14. Lu, N., Wu, Y., Feng, L.: Deep learning for fall detection: 3D-CNN combined with LSTM on video kinematic data. IEEE J. Biomed. Health Inf. 23(1), 314–323 (2018)
15. Rajenderana, S.V., Thang, K.F.: Real-time detection of suspicious human movement. In: International Conference on Electrical, Electronics, Computer Engineering and Their Applications, pp. 56–69 (2014)
16. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)

Bending Conditions, Conductive Materials, and Fabrication of Wearable Textile Antennas: Review
Rajesh Katragadda and P. A. Nageswara Rao

1 Introduction Wireless body area networks (WBANs) have emerged from recent advances in wireless technology and serve a wide variety of medical and non-medical applications. As the central element of any WBAN sensing system, wearable antennas have attracted a great deal of interest [1]. Wearable antennas must be lightweight, low-profile, small, and flexible to be unobtrusive. Because the human body lies in the reactive near-field region of the antenna, it influences the antenna's performance from an electromagnetic perspective. Power coupling into the lossy human body reduces antenna efficiency and realized gain. In the case of VHF antennas, the body impact is significant since the wearer's whole body is inside the antenna's reactive near-field region. Since the antenna operates at a relatively long wavelength, clothes placed under or covering it will not impact its function. The human body has a high dielectric permittivity and functions as a lossy antenna platform, and when antenna radiation efficiency decreases, antenna gain is reduced. Body-worn antennas must meet specific absorption rate (SAR) limits to keep the user's RF exposure at a safe level, according to the International Commission on Non-Ionizing Radiation Protection [1]. This article is organized as follows: Sect. 1 introduces wearable antennas. Section 2 [3–11] surveys the literature on methods of substrate material fabrication for textile wearable antennas. Section 3 [10, 12–20] discusses the bending scenarios of antennas and the influence of bending on efficiency. Section 4 discusses the high-conductivity materials used for different wearable antennas [21–30]. Finally, the conclusion is presented in Sect. 5.

R. Katragadda (B) Department of Electronics and Communication Engineering, A.U.TDR HUB, Andhra University, Visakhapatnam, Andhra Pradesh 530003, India e-mail: [email protected]
P. A. N. Rao Department of Electronics and Communication Engineering, Gayatri Vidya Parishad College for Degree and P.G Courses (A), Andhra University, Visakhapatnam, Andhra Pradesh 530045, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_48

2 Literature Survey According to Mohamadzade et al. [2], wearable antennas can be classified by the type of substrate fabrication process into:
• Embroidered antennas on fabrics
• Antennas with polymer embedding
• Injection alloy microfluidic antennas
• Antennas made with inkjet printing, photolithography, and screen printing
• Antennas made of 3D printed materials

A sewing machine controlled by computer-aided design (CAD) software is often used to produce fabric-based embroidered antennas. Coplanar antennas or ground-plane-backed antennas can be designed using this method. Antenna characteristics are affected by the conductivity of the embroidered layer after washing. Because the conductive threads tend to fray during the embroidery process, a highly exact manufacturing procedure is required to ensure that the needle is positioned correctly, that the speed is appropriate, and that the tension is applied correctly [2]. Polymers are well suited to high-frequency applications as low-permittivity materials with minimal loss. "Due to their minimal water absorption and great flexibility and stretchability, they are also resistant to various conditions. For this reason, antenna substrates are made of flexible materials, such as polydimethylsiloxane (PDMS) or liquid crystal polymer (LCP)" [2]. PDMS is a polymer with great flexibility and stretchability and a tolerable loss factor at microwave frequencies. The poor adherence of metal to polymers has been identified as a critical obstacle to the use of polymers in antenna construction [2]. "Injecting metal alloys is one of the intriguing ways for fabricating flexible antennas. Liquid metal alloys and liquid metals such as Galinstan and eutectic gallium 75% indium 25% (EGaIn) can be utilized as flexible conductive components. Due to its flexibility, this technique offers an alternative to using conductive sheets on a flexible substrate, offering both high conductivity and resistance to mechanical stress" [2]. Because their conductivity is higher than that of commercial conductive fabrics, injected alloys improve the antenna's efficiency. Inkjet printing on a flexible substrate has emerged as a preferred method for quick, low-cost antenna prototyping. Conductive inks based on silver, carbon, copper, and gold nanoparticles have been widely used as conductive materials. "The usage of flexible substrates such as PDMS, paper, Kapton polyimide, polyethylene terephthalate (PET), polyester cotton, and polyethylene naphthalate (PEN) has also been effectively demonstrated. Due to the thinness of the conducting layers and the difficulty of the textile to survive the ink curing temperature and bending, inkjet printing on rough, uneven, or porous fabrics is still a challenge. To minimize surface roughness, an interface-coated layer is applied to a polyester cotton fabric, and a screen-printed interface layer is utilized" [2]. Photolithography is another printing method that uses photoresists to produce metallic patterns. "Line patterning is a photolithography method that includes designing a negative picture of the desired pattern using a CAD tool. This is followed by the application of a conductive polymer to the substrate and the removal of the printed mask by using ultrasonic energy on the substrate" [2]. "With the advent of accessibility of materials and printer devices, multi-material fabrication of complicated three-dimensional structures and rapid fabrication the wearable antenna 3D printing methods laser sintering, stereolithography (SL), and fused deposition modeling (FDM) have become popular" [2]. In [3], a rectangular patch antenna was tested on two different substrates: low-cost flame-resistant FR4 and denim were used to build compact UWB antennas. Textile antenna design and analysis for WBAN systems were presented in [4], where copper tape on a textile denim substrate was used for the radiator patch and ground plane. A flexible dual-band woven antenna with a compact construction was shown in [5]; denim was used as the antenna substrate and copper tape as the radiating element for the 2.45 and 5.8 GHz ISM bands. An innovative textile-woven antenna operating in the ISM band was presented in [6]. This wearable antenna was developed to have a minimal influence on the human body and to be produced on textiles for convenience. Wearable textile antennas were studied in [7] with respect to their size, substrates, and frequency ranges. A novel wearable semi-circular slot antenna for UWB was developed in [8]. It uses textile materials such as flannel, cotton, and jeans. Using phantom-based fabrics, it was found that the return loss was greater than 10 dB, which met the design criteria. FR-4, on the other hand, has low antenna impedance and bandwidth. Flannel is more efficient than the other fabrics because its pattern is more uniform than that of cotton and jeans [8]. A unique low-profile ultra-wideband full-textile antenna design is suggested for portable medical imaging devices operating in the microwave band [9]. Small and adaptable, the antenna conforms to the curvature of the human body with a lightweight polyester fabric and copper taffeta structure that is easy to build. A UWB operating bandwidth of 109%, improved omnidirectional radiation, and a suitable gain of 2.9 dB were reported.

3 Bending Scenarios of Wearable Antennas Bala et al. [12] examined the benefits of using graphene in constructing flexible WBAN antennas with a curved patch-based antenna. The graphene curved patch antenna had overall dimensions of 35 × 35 mm² and had reflection coefficients of −25.05 and −25.17 dB, maximum gains of 10.5 and 2.8 dB, VSWRs of 1.118 and 1.12, and radiation efficiencies of 79.09% and 74.86% at resonant frequencies of 2.4 and 3.94 GHz, respectively. A flexible Yagi-Uda antenna with polymer substrates for WBAN applications was also investigated by Jianying et al. [13]. As described in [14], bending influences the efficiency of a rectangular microstrip textile patch antenna for WBAN operating at 2.4 GHz. The antenna's conducting component was a mixture of copper and nickel combined with polyester fiber, with denim as the substrate. The researchers found that radiation cost and cumulative distribution are affected by bending curvature. Notably, the antenna structure was evaluated at three general locations: the chest, the forearm, and the wrist. Antennas with wrist-equivalent curvature have a lower average gain than flat antennas (2–4 dB). A dual-band textile circular patch antenna was examined in [15] under various bending situations. Three bending cases, each with a different bending radius, were investigated. Small human arms and legs and certain comparatively larger parts of the human body were considered when developing the circular microstrip patch, which measures 70 × 70 mm². A significant influence on the antenna's performance was found when it was bent in the E-plane. According to the case studies, bent antennas exhibit variations in resonance. Mounted or twisted antennas are used in [10, 16] to create an arm or chest pattern. In [17], a small monopole antenna for radiography is used to identify skin cancer in a person with xeroderma pigmentosum disorders. The suggested antenna met all of the requirements and displayed UWB behavior. With dimensions of 36 × 48 × 6.12 mm³, it has an impedance bandwidth of 8.2–13 GHz and a maximum gain of 7.04 dB. Since wearable antennas are designed to be positioned on the body, they are subject to curvature constraints. Based on this, an antenna made of Kapton polyimide film with star patches, operating at 2.45 GHz, was developed [18]. With the antenna size specified as 75 × 50 mm², several bending scenarios were examined, and it was determined that its performance was not impacted. The efficiency of conformal UWB portable antennas at different bending radii was also researched and assessed [19]. To test its performance, the suggested antenna was built on an RT Duroid flexible substrate with total dimensions of 35 × 31 mm² and bent to radii corresponding to a healthy 5-year-old child and a 35-year-old adult. For all bending radii of the UWB antenna, the reflection coefficient remained below −10 dB. Table 1 gives an overview of bending scenarios in wearable antennas. Adam et al. [31] describe antenna bending by the bending radius R and cover two forms of bending: X-axis bending and Y-axis bending. The proposed antenna is built from a rectangular patch of 56.6 × 47 mm². The radiating section of the antenna is made of ShieldIt Super, which has a thickness of 0.17 mm and a dielectric constant of 1.44. The antenna's gain is reduced as the circular bending radius is reduced. The X-axis and Y-axis gain reductions are 16% and 12%, respectively, from flat to 25-mm radius bending. Table 2 shows the gain for various circular bending radii along the X- and Y-axes. Table 3 compares the gain along the X-axis and Y-axis for various bending degrees. In most cases, the gain diminishes as the bend angle increases. Despite this, Y-axis bending at 45° increases the gain by about 5% [31].

Table 1 Bending scenarios in wearable antennas overview [30]

| Substrate | Application | Radius (mm) | BW | SAR (W/kg) | Rad. Eff (%) |
|---|---|---|---|---|---|
| Silicon | Medical | 30 | 3.88 GHz | 0.000148 | 79.09, 74.86 |
| Polyimide | WBAN | 20, 30, 40, 50, 60, 70, 180, 360 | 360 MHz | | |
| | UWB and (BCWC) | 8, 15, 30, 80 | 11.5 GHz | | 87, 92, 96 |
| | ISM | 50, 100, 150 | 2.45 | | |
| Denim | ISM | 28.5, 42.5 | 140 MHz | 0.15 | |
| Jeans | WLAN | 33.5, 47.5, 58.5 | 3.84 GHz | | |
| Teslin-paper | UWB and (BCWC) | 8, 15, 30, 80 | 12 GHz | | 82, 86, 89 |
| Felt | Medical imaging | 10, 15, 20 | 4 GHz | | |
| | Wearable devices | 30, 45 | 8.2 GHz | | > 60 |
| RT duroid | Off-body wearable | 25, 50 | 7.1 GHz | | 60 |
| Polyimide film | Wearable devices | Diameter = 70, 90, 120 | 24.9% ≈ 400 MHz | 0.12 | 64 |

Table 2 Gain for various circular bending radii along X and Y-axes (bend gain in dBi)

| Frequency (GHz) | Flat (dBi) | X-axis 2.5 cm | X-axis 4.5 cm | X-axis 5.5 cm | X-axis 6.5 cm | Y-axis 2.5 cm | Y-axis 4.5 cm | Y-axis 5.5 cm | Y-axis 6.5 cm |
|---|---|---|---|---|---|---|---|---|---|
| 2.4 | 4.7 | 3.93 | 4.38 | 4.51 | 4.72 | 4.13 | 4.72 | 4.95 | 5.05 |
| 2.45 | 4.72 | 3.81 | 4.23 | 4.38 | 4.62 | 4.07 | 4.6 | 4.83 | 4.94 |
| 2.5 | 4.73 | 3.65 | 4.05 | 4.2 | 4.47 | 4.01 | 4.41 | 4.66 | 4.78 |

Table 3 Comparison of Gain along X-axis and Y-axis for various bending degrees (bend gain in dBi)

| Frequency (GHz) | Flat (dBi) | X-axis 25° | X-axis 45° | X-axis 75° | X-axis 90° | Y-axis 25° | Y-axis 45° | Y-axis 75° | Y-axis 90° |
|---|---|---|---|---|---|---|---|---|---|
| 2.4 | 4.70 | 4.75 | 4.86 | 4.87 | 4.64 | 4.82 | 4.89 | 4.59 | 4.37 |
| 2.45 | 4.72 | 4.60 | 4.70 | 4.76 | 4.41 | 4.87 | 4.95 | 4.64 | 4.42 |
| 2.5 | 4.73 | 4.30 | 4.38 | 4.33 | 4.02 | 4.91 | 5.01 | 4.70 | 4.47 |
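Two of the figures quoted in this section can be cross-checked with short calculations. The sketch below is offered as an illustration under stated assumptions: the VSWR is derived from the reflection coefficient with the standard relation VSWR = (1 + |Γ|)/(1 − |Γ|), where |Γ| = 10^(S11/20), and the gain changes are computed as simple ratios of the tabulated dBi values, which appears to be how the quoted 16%, 12%, and 5% figures arise. The function names `vswr_from_s11_db` and `percent_change` are hypothetical.

```python
def vswr_from_s11_db(s11_db):
    """Voltage standing wave ratio from a reflection coefficient given in dB."""
    gamma = 10 ** (s11_db / 20.0)  # magnitude of the reflection coefficient
    return (1 + gamma) / (1 - gamma)


def percent_change(flat_gain, bent_gain):
    """Relative change of the bent gain with respect to the flat gain, in percent."""
    return 100.0 * (bent_gain - flat_gain) / flat_gain


# Reflection coefficients reported for the graphene patch antenna [12].
vswr_low_band = vswr_from_s11_db(-25.05)
vswr_high_band = vswr_from_s11_db(-25.17)

# Gains at 2.4 GHz for a 2.5 cm bending radius (Table 2).
x_axis_change = percent_change(4.7, 3.93)
y_axis_change = percent_change(4.7, 4.13)

# Gain at 2.45 GHz for a 45 degree Y-axis bend (Table 3).
y_45_change = percent_change(4.72, 4.95)
```

The computed VSWR values agree with the reported 1.118 and 1.12, and the gain changes come out near −16%, −12%, and +5%. Because the percentages are taken on dBi values rather than on linear gain, they should be read as a convenient comparison and not as a change in radiated power.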

4 Conductive Materials An L-slot AMC configuration for a wearable antenna used in the ISM band was proposed in [20]. The introduced AMC layer effectively reduces antenna back lobe radiation and enhances antenna efficiency. A rectangular textile wearable antenna (RTA) made of electro-conductive textile fabric (ECGT) was described in [21]. To create the conductive material, conductive cotton threads and copper filament threads were twisted into cloth. "A nylon ripstop fabric coated with nickel, copper, silver, and a water-resistant layer to create the conducting side of an antenna design built on a PDMS substrate" [22]. Simorangkir et al. [23] presented a new lightweight UWB antenna for portable applications, which is sensitive to mounting and vulnerable to physical deformation in the vicinity of the human body. Furthermore, [24] presented a current approach to fabricating robust, lightweight wireless antennas. In [25], silver ink was used to attach a coplanar, dual-band flexible antenna to a paper substrate. "The operating keys of the difficult modest PDMS metal utilized in the construction of flexible antenna utilized in wearable applications for future BPMS-based conductive fabric antenna" [26]. PDMS has been used to produce a transparent flexible antenna constructed of translucent conductive fabric [27]. Based on the fused deposition modeling (FDM) platform, a portable circularly polarized (CP) antenna was developed in [28]. "Depending on the type of conductive material, there are transparent and non-transparent versatile antennas with different parameters such as gain, return loss, and radiation pattern. Transparent antennas, as opposed to non-transparent antennas, are unobtrusive and can be utilized in a variety of technologies" [29]. Table 4 shows the conductive materials overview.

Table 4 Conductive materials overview [30]

| S. No | Conductive substance | Conductivity σ | Frequency (GHz) | Dimension (mm²) | εr |
|---|---|---|---|---|---|
| 1 | Embedded NinjaFlex | | 2.45 | 60 × 60 | 3 |
| 2 | Silver nano ink | | 2, 2.45, and 3 | 48.2 × 48.2 | 3.6 |
| 3 | ECGT fabric | 2.09 × 10^7 S/m | 2.021 and 2.052 | 48 × 46 | 2.2 |
| 4 | Copper | | 2.93 and 5.71 | 45.13 × 53.21 | 1.7 |
| 5 | Nylon ripstop | 7.7 × 10^5 S/m | | | |
| 6 | Polyester taffeta | 2.5 × 10^5 S/m | | | |
| 7 | Ripstop | 4.2 × 10^5 S/m | | | |
| 8 | Silver coated ripstop | 8 × 10^4 S/m | 2.45 | | 2.7 |

5 Conclusion This study addresses the flexible materials and futuristic technologies needed to produce various antenna designs. The selection of materials depends on parameters such as the operating radio frequencies, the ability to withstand rough environments, a smooth fit into the wearer's garments, and the impact of bending. Many substrate materials are suitable because they are lightweight, flexible, and compatible. The performance of antennas bent at various radii has been addressed. The challenges in designing wearable antennas include impedance matching, additional slots to increase bandwidth, increasing the gain, compactness, desired radiation characteristics, multi-band operation, stable performance under varying conditions (moisture, wetness, and weather), and antenna detuning due to human body loading, bending, and crumpling.

References
1. Nepa, P., Rogier, H.: Wearable antennas for off-body radio links at VHF and UHF bands: challenges, the state of the art, and future trends below 1 GHz. IEEE Antennas Propag. Mag. 57, 30–52 (2015). https://doi.org/10.1109/MAP.2015.2472374
2. Mohamadzade, B., Hashmi, R.M., Simorangkir, R.B.V.B., Gharaei, R., Ur Rehman, S., Abbasi, Q.H.: Recent advances in fabrication methods for flexible antennas in wearable devices: state of the art. Sensors 19, 2312 (2019). https://doi.org/10.3390/s19102312
3. Mohan, D., Suriyakala, C.D.: Ergonomics of textile antenna for body centric wireless networks for UWB application. In: 2017 International Conference on Circuit, Power and Computing Technologies (ICCPCT), pp. 1–8 (2017). https://doi.org/10.1109/ICCPCT.2017.8074206
4. Turkmen, M., Yalduz, H.: Design and performance analysis of a flexible UWB wearable textile antenna on jeans substrate. IJIEE 8, 15–18 (2018). https://doi.org/10.18178/IJIEE.2018.8.2.687
5. Wang, K., Li, J.: Jeans textile antenna for smart wearable antenna. In: 2018 12th International Symposium on Antennas, Propagation and EM Theory (ISAPE), pp. 1–3 (2018). https://doi.org/10.1109/ISAPE.2018.8634337
6. Li, S.-H., Li, J.: Smart patch wearable antenna on jeans textile for body wireless communication. In: 2018 12th International Symposium on Antennas, Propagation and EM Theory (ISAPE), pp. 1–4 (2018). https://doi.org/10.1109/ISAPE.2018.8634084

7. Jayabharathy, K., Shanmuganantham, T.: Design of a compact textile wideband antenna for smart clothing. In: 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), pp. 477–481 (2019). https://doi.org/10.1109/ICICICT46008.2019.8993388
8. Amit, S., Talasila, V., Shastry, P.: A semi-circular slot textile antenna for ultrawideband applications. In: 2019 IEEE International Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting, pp. 249–250 (2019). https://doi.org/10.1109/APUSNCURSINRSM.2019.8889148
9. Lin, X., Chen, Y., Gong, Z., Seet, B.-C., Huang, L., Lu, Y.: Ultrawideband textile antenna for wearable microwave medical imaging applications. IEEE Trans. Antennas Propag. 68, 4238–4249 (2020). https://doi.org/10.1109/TAP.2020.2970072
10. El Gharbi, M., Martinez-Estrada, M., Fernández-García, R., Ahyoud, S., Gil, I.: A novel ultra-wide band wearable antenna under different bending conditions for electronic-textile applications. J. Text. Ins. 112, 437–443 (2021). https://doi.org/10.1080/00405000.2020.1762326
11. Giampaolo, A.D.N.E.D.: A reconfigurable all-textile wearable UWB antenna. Prog. Electromagnet. Res. C. 103, 31–43 (2020). https://doi.org/10.2528/PIERC20031202
12. Bala, R., Singh, R., Marwaha, A., Marwaha, S.: Wearable graphene based curved patch antenna for medical telemetry applications. Appl. Comput. Electromagnet. Soc. J. (ACES) 543–550 (2016)
13. Jianying, L., Fang, D., Yichen, Z., Xin, Y., Lulu, C., Panpan, Z., Mengjun, W.: Bending effects on a flexible Yagi-Uda antenna for wireless body area network. In: 2016 Asia-Pacific International Symposium on Electromagnetic Compatibility (APEMC), pp. 1001–1003 (2016). https://doi.org/10.1109/APEMC.2016.7522928
14. Ferreira, D., Pires, P., Rodrigues, R., Caldeirinha, R.F.S.: Wearable textile antennas: examining the effect of bending on their performance. IEEE Antennas Propag. Mag. 59, 54–59 (2017). https://doi.org/10.1109/MAP.2017.2686093
15. Isa, M.S.M., Azmi, A.N.L., Isa, A.A.M., Zin, M.S.I.M., Saat, S., Zakaria, Z., Ibrahim, I.M., Abu, M., Ahmad, A.: Textile dual band circular ring patch antenna under bending condition. J. Telecommun. Electronic Comput. Eng. (JTEC) 9, 37–43 (2017)
16. Mohandoss, S., Palaniswamy, S.K., Thipparaju, R.R., Kanagasabai, M., Bobbili Naga, B.R., Kumar, S.: On the bending and time domain analysis of compact wideband flexible monopole antennas. AEU-Int. J. Electron. C. 101, 168–181 (2019). https://doi.org/10.1016/j.aeue.2019.01.015
17. Mersani, A., Osman, L., Ribero, J.-M.: Flexible UWB AMC antenna for early stage skin cancer identification. Prog. Electromagnet. Res. M. 80, 71–81 (2019). https://doi.org/10.2528/PIERM18121404
18. Seman, F.C., Ramadhan, F., Ishak, N.S., Yuwono, R., Abidin, Z.Z., Dahlan, S.H., Shah, S.M., Ashyap, A.Y.I.: Performance evaluation of a star-shaped patch antenna on polyimide film under various bending conditions for wearable applications. Prog. Electromagnet. Res. Lett. 85, 125–130 (2019). https://doi.org/10.2528/PIERL19022102
19. Gupta, N.P., Kumar, M., Maheshwari, R.: Development and performance analysis of conformal UWB wearable antenna under various bending radii. IOP Conf. Ser.: Mater. Sci. Eng. 594, 012025 (2019). https://doi.org/10.1088/1757-899X/594/1/012025
20. Yin, B., Gu, J., Feng, X., Wang, B., Yu, Y., Ruan, W.: A low SAR value wearable antenna for wireless body area network based on AMC structure. Prog. Electromagnet. Res. C. 95, 119–129 (2019). https://doi.org/10.2528/PIERC19040103
21. Gangopadhyay, S., Rathod, S.M., Gadlinge, N., Waghmare, S., Sawant, S., Sachdev, T., Tambe, N.: Design and development of electro-conductive rectangular textile antenna using polypropylene fabric. In: 2017 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (UPCON), pp. 518–520 (2017). https://doi.org/10.1109/UPCON.2017.8251103
22. Simorangkir, R.B.V.B., Yang, Y., Esselle, K.P.: Robust implementation of flexible wearable antennas with PDMS-embedded conductive fabric. In: 12th European Conference on Antennas and Propagation (EuCAP 2018), pp. 1–5 (2018). https://doi.org/10.1049/cp.2018.0846

Bending Conditions, Conductive Materials, and Fabrication …

555

23. Simorangkir, R.B.V.B., Kiourti, A., Esselle, K.P.: UWB wearable antenna with a full ground plane based on PDMS-embedded conductive fabric. IEEE Antennas Wirel. Propag. Lett. 17, 493–496 (2018). https://doi.org/10.1109/LAWP.2018.2797251 24. Mohamadzade, B., Simorangkir, R.B.V.B., Maric, S., Lalbakhsh, A., Esselle, K.P., Hashmi, R.M.: Recent developments and state of the art in flexible and conformal reconfigurable antennas. Electronics 9, 1375 (2020). https://doi.org/10.3390/electronics9091375 25. Baytöre, C., Zoral, E.Y., Göçen, C., Palandöken, M., Kaya, A.: Coplanar flexible antenna design using conductive silver nano ink on paper substrate for wearable antenna applications. In: 2018 28th International Conference Radioelektronika (RADIOELEKTRONIKA), pp. 1–6 (2018). https://doi.org/10.1109/RADIOELEK.2018.8376382 26. Simorangkir, R.B.V.B., Yang, Y., Hashmi, R.M., Björninen, T., Esselle, K.P., Ukkonen, L.: Polydimethylsiloxane-embedded conductive fabric: characterization and application for realization of robust passive and active flexible wearable antennas. IEEE Access. 6, 48102–48112 (2018). https://doi.org/10.1109/ACCESS.2018.2867696 27. Simorangkir, R.B.V.B., Yang, Y., Esselle, K.P., Zeb, B.A.: A method to realize robust flexible electronically tunable antennas using polymer-embedded conductive fabric. IEEE Trans. Antennas Propag. 66, 50–58 (2018). https://doi.org/10.1109/TAP.2017.2772036 28. Li, J., Jiang, Y., Zhao, X.: Circularly polarized wearable antenna based on NinjaFlex-embedded conductive fabric. Int. J. Antennas Propag. 2019, e3059480 (2019). https://doi.org/10.1155/ 2019/3059480 29. Kantharia, M., Desai, A., Mankodi, P., Upadhyaya, T., Patel, R.: Performance evaluation of transparent and non-transparent flexible antennas. In: Janyani, V., Singh, G., Tiwari, M., and d’Alessandro, A. (eds.) Optical and Wireless Technologies, pp. 1–8. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-6159-3_1 30. 
Mahmood, S.N., Ishak, A.J., Saeidi, T., Alsariera, H., Alani, S., Ismail, A., Soh, A.C.: Recent advances in wearable antenna technologies: a review. Prog. Electromagnet. Res. B. 89, 1–27 (2020). https://doi.org/10.2528/PIERB20071803 31. Adam, I., Kamarudin, M.R., Rambe, A.H., Haris, N., Rahim, H.A., Wan Muhamad, W.Z.A., Ismail, A.M., Jusoh, M., Yasin, M.N.M.: Investigation on wearable antenna under different bending conditions for wireless body area network (WBAN) applications. Int. J. Antennas Propag. 1–9 (2021). https://doi.org/10.1155/2021/5563528

A 3D Designed Portable Programmable Device Using Gas Sensors for Air Quality Checking and Predicting the Concentration of Oxygen in Coal Mining Areas

M. Aslamiya, T. S. Saleena, A. K. M. Bahalul Haque, and P. Muhamed Ilyas

M. Aslamiya (B), Department of Computer Science, Al Jamia Arts and Science College, Perinthalmanna, India, e-mail: [email protected]
T. S. Saleena, Department of Computer Science, Sullamussalam Science College, Areekode, India, e-mail: [email protected]
A. K. M. B. Haque, Software Engineering, LENS, LUT University, Lappeenranta, Finland, e-mail: [email protected]
P. M. Ilyas, Sullamussalam Science College, Areekode 673639, India, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_49

1 Introduction

Air pollution has become one of the main threats to all living things. Today, the air is polluted by various hazardous gases. The main substances that cause air pollution are CO2, NO2, CO, ozone, lead, and SO2 [1]. Inhaling such polluted air leads to many health problems, including respiratory disease, and is associated with low birth weight [2, 3]. Cities such as Delhi in India have seen rapid growth in cases of cancer, asthma, other respiratory diseases, headache, and eye and skin irritation. Many studies have shown that polluted air harms human health [4–6]. Coal is the main energy source of many countries, including India [7]. Coal mining plays a key role in the emission of methane into the air [8], and the increased demand for coal eventually causes methane to be produced in large amounts. Coal mines are a major source of anthropogenic methane emissions: coal mining accounts for approximately 11% of global methane emissions from human activities [9]. The emission of methane during the coalification process


in the mining area badly affects air quality and critically affects the health of workers in underground mines. When methane reacts with other gases in the air, it produces various harmful gases. Our study presents a system that detects air quality and the levels of different pollutant gases in the atmosphere, and predicts the level of oxygen from the methane level in coal mining areas. It can also predict the air quality in the area of coalification. The device uses MQ135, MQ7, and MQ4 gas sensors to detect the dangerous pollutants identified by the WHO [10]. The MQ135 sensor is sensitive to NH3, sulfur (S), NOx, alcohol, benzene, CO2, ozone, and similar gases. The MQ7 is highly sensitive to carbon monoxide (CO), and the MQ4 to methane, so our device can detect all of these pollutants in the air. A 3D-modeled case for the device was designed using AutoCAD, computer-aided design software for creating precise 2D and 3D drawings. Based on the measurements obtained from our device, miners can take the necessary precautions against breathing in such a harmful environment. The device displays the measured values in percentage and PPM on the OLED display and predicts the air quality based on several pollutant gases. It uses LEDs for clearer output and sounds a buzzer when an alert condition occurs. The firmware is written in embedded C and uploaded to the Arduino board.

2 Related Works

Because air pollution is a major environmental and sustainability issue, considerable research is under way in this area. A group of researchers proposed a system for monitoring air quality with MQ135 gas sensors [11]. They modeled an air quality monitor (AQM) to measure the concentrations of CO, CO2, NH4, ethanol, acetone, and toluene. After the measurement, the data are sent to the administrator through a SIM 900 module. An LED and a buzzer alarm indicate polluted indoor air. They use the getPPM() function in the Arduino IDE to convert the analog value read by the sensor into PPM. Kodali and Borra proposed a system that detects gas leakage and sends a message to the user [12]. It senses harmful gases such as LPG, methane, and benzene. Methane and LPG are highly inflammable, whereas benzene in high concentration badly affects the health of workers. They used MQ4, MQ135, and MQ6 gas sensors to sense these gases and an ESP-32 as the Wi-Fi module. The system sends alert messages that the user can check at any time, and a buzzer module sounds when a gas leak is detected to inform the appropriate authority. Chan et al. proposed a technique that uses low-cost gas sensors with real-time data accessibility [13]. Sai et al. introduced a calculation method for expressing air quality in PPM [14]. They also sought to raise public awareness of air pollution using the ThingSpeak or Cayenne platform.


3 The Hardware and Software Required

In this work, we assembled the following components to build the proposed device: an Arduino Nano board with an ATmega328 microcontroller, MQ135, MQ4, and MQ7 gas sensors, a 128 × 64 OLED display, a serial port cable (COM3 port), female-to-female wires, a buzzer, four LEDs (two green and two red), and a switch with a battery module.

3.1 Arduino

Arduino is a low-cost, openly available input/output electronic board with its own processor and memory, capable of communicating with and controlling devices through embedded programming. It is widely used as a tool in studies and research [15, 16] and plays a significant role in emerging technologies such as robotics [17]. Our device is implemented with an Arduino Nano board, a tiny, complete, and user-friendly board based on the ATmega328 microcontroller, with 32 KB of memory. It has 14 digital pins for input/output operations and operates on 5 V power [18]. The Arduino IDE is free software for writing code; the code is then uploaded to the processor, and the Arduino carries out its instructions [19]. The ATmega328 serves as the device's primary controller [20]. It comes with a bootloader that allows code to be uploaded without external hardware. COM3 was chosen as the serial communication link between the computer and the Arduino board.

3.2 128 × 64 OLED Display

A 128 × 64 OLED (organic light-emitting diode) module is used as the display. OLED technology competes with the display technologies used in monitors, cell phones, and LED and LCD panels. It is LED-based and cost-effective, and it is positioned as an alternative to LCD displays [21]. OLED displays are widely used as display modules in many projects and research activities. In the global market, the scale of OLED displays increased from 36.2 million to 366 billion by 2010 [22]. The main reasons for choosing an OLED display over other displays are its lower power consumption, low cost, simple design, and better durability and image quality.


3.3 Sensors

The device uses three gas sensors: MQ135, MQ7, and MQ4. The MQ135 is a low-cost sensor widely used in air quality monitoring devices; it detects a diverse array of pollutant gases such as NH3, NOx, alcohol, benzene, and CO2 and is adequate for monitoring air quality [23, 24]. The MQ4 offers fast response, stability, and a long life [25], and it is a popular sensor for detecting gas leaks in industry. Along with the MQ135 and MQ4, the device uses an MQ7 sensor to detect the level of carbon monoxide; the MQ7 is highly sensitive to carbon monoxide and has a long life. Figure 1 shows the relationship between resistance and ppm value for the MQ135, MQ4, and MQ7. All three sensors are cheap and widely available, which makes the system economically feasible. Low-cost sensors are adequate for a variety of environmental research and can be used to develop real-time, affordable air quality monitors [13]. Female-to-female wires connect the different parts of the system. The red LED signals a dangerous condition, and the green LED indicates a safe condition in the air. The buzzer sounds to alert the users of the system.

Fig. 1 Datasheet curves showing the change in resistance versus ppm for a MQ4, b MQ135, and c MQ7
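Datasheet curves of this kind are usually close to straight lines on log-log axes, so one common way to turn a reading into a concentration is a power-law fit, ppm = a · (Rs/R0)^b. The following is a minimal sketch of that idea, written in Python for readability rather than in the embedded C used on the device. It assumes a simple voltage-divider read-out with a load resistor; `R_LOAD_KOHM`, `R0_KOHM`, `A_FIT`, and `B_FIT` are hypothetical placeholders, not values taken from this paper or from any datasheet.

```python
# Illustrative sketch only: every constant below is a hypothetical placeholder,
# not calibration data from the paper or from a sensor datasheet.
V_SUPPLY = 5.0       # supply voltage of the sensor module in volts (the Nano runs on 5 V)
R_LOAD_KOHM = 10.0   # assumed load resistor in kOhm
R0_KOHM = 20.0       # assumed sensor resistance in clean air, in kOhm
A_FIT = 100.0        # assumed power-law coefficient
B_FIT = -2.0         # assumed power-law exponent


def sensor_resistance_kohm(v_out: float) -> float:
    """Sensor resistance from the divider output: Rs = RL * (Vs - Vout) / Vout."""
    if not 0.0 < v_out < V_SUPPLY:
        raise ValueError("v_out must lie strictly between 0 and V_SUPPLY")
    return R_LOAD_KOHM * (V_SUPPLY - v_out) / v_out


def ppm_from_ratio(rs_over_r0: float) -> float:
    """Power-law fit of a log-log curve: ppm = A_FIT * (Rs/R0) ** B_FIT."""
    if rs_over_r0 <= 0.0:
        raise ValueError("Rs/R0 must be positive")
    return A_FIT * rs_over_r0 ** B_FIT


def ppm_from_voltage(v_out: float) -> float:
    """Chain the two steps: output voltage -> Rs/R0 -> ppm."""
    return ppm_from_ratio(sensor_resistance_kohm(v_out) / R0_KOHM)
```

In practice the two fit constants would be read off the relevant curve in Fig. 1 for each gas, and R0 would be measured during calibration in clean air.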


3.4 Code Language

The code is written in embedded C, one of the most popular and widely used programming languages in the field of embedded systems. Selecting an appropriate programming language is a difficult task in the development of an embedded system. There are mainly three languages for programming embedded projects: (1) embedded C, (2) C++, and (3) Ada. Today, C is widely used for embedded projects [26]. C++ adds features such as data abstraction and object-oriented programming, but some of them reduce program efficiency [27]. Ada is a real-time programming language, but it has not gained much popularity because its compiler is not widely available [26]. C, by contrast, is simple and flexible to use, and a C compiler is available for almost every processor [27]. The code includes various libraries such as Adafruit_SSD1306, Adafruit_GFX, SPI, and Wire, and it defines the OLED display, sensors, and other output modules. The code reads the analog value of each sensor and converts it: the MQ135 value is converted to parts per million (PPM), and the MQ4 and MQ7 values are converted to percentages. These values are shown on the OLED module, since they are more user-friendly than raw analog values. Based on these values, the system checks different conditions to predict the air quality and the methane level.
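The text does not state how the raw analog value is turned into a percentage. As a hedged illustration, the sketch below assumes the simplest possible mapping, a linear fraction of the ADC full scale. It is written in Python rather than the embedded C used on the device. The 10-bit range (0 to 1023) is that of the ATmega328 converter; the function names and the linear mapping itself are assumptions made for this example.

```python
ADC_MAX = 1023  # full-scale count of the 10-bit ADC on the ATmega328


def adc_to_voltage(count: int, v_ref: float = 5.0) -> float:
    """Convert a raw ADC count into volts, assuming a 5 V reference by default."""
    if not 0 <= count <= ADC_MAX:
        raise ValueError("count must be between 0 and ADC_MAX")
    return v_ref * count / ADC_MAX


def adc_to_percent(count: int) -> float:
    """Express a raw count as a percentage of full scale.

    Assumed mapping for illustration; the paper does not give its formula.
    """
    if not 0 <= count <= ADC_MAX:
        raise ValueError("count must be between 0 and ADC_MAX")
    return 100.0 * count / ADC_MAX
```

A percentage of full scale is not the same thing as a gas concentration in percent by volume, so a real device would still need a calibration step between the two.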

4 Methodology

The system consists of both hardware and software. The hardware includes the sensors, Arduino board, OLED display, LEDs, buzzer, female-to-female wires, and battery module. The software is embedded C code written in the Arduino IDE. The three sensors supply input data to the processor. Based on the values read from the sensors, the processor performs various operations and shows the output on the display module, driving the LED and buzzer modules as appropriate. Figure 2 shows the components and circuit diagram. The three sensors are connected to the processor through analog pins A0, A1, and A2. The display is connected through analog pins A4 and A5: A4 carries serial data (SDA), and A5 carries the clock signal (SCL) to peripheral devices. The LEDs and buzzer are connected through digital pins D7, D6, and D4, respectively. The battery is connected to the battery module, which powers all parts of the system. The two common pins, GND (ground) and VCC (voltage at the common collector), are connected to the corresponding GND and VCC pins of the Arduino board. The connection diagram was created in circuito.io [28]. A printed circuit board (PCB) was manufactured based on this circuit; a PCB is a board that carries the electrical circuitry connecting the electronic components. The PCB is housed in the 3D case. AutoCAD was used to design the 3D model of the case for this device. It is used for drawings, diagrams, models, etc. AutoCAD is computer-aided design software


Fig. 2 Architecture of the proposed system. The orange cylinders are the gas sensors, the blue board on the left is the Arduino Nano board with the ATmega328 microcontroller, the blue board on the right is the battery module, and the board in the center is the display module. The black wire is the ground pin, the red wire is the VCC pin, the purple wires are the analog pins, and the green wires are the SCL and SDA pins

developed by Autodesk. It lets us draw and edit digital 2D and 3D designs more quickly and easily than by hand. Autodesk files can also be saved in cloud storage, so they can be accessed anywhere at any time. It is an effective drafting tool for 2D and 3D models [29, 30]. For this device, the 3D case has six faces, 16 vertices, and 24 edges. The six faces are the top, bottom, front, back, left, and right. Pairs of edges in the model have the same length; the dimensions, in centimeters, are 4, 5, and 7 cm. The 3D model of the device is shown in Fig. 3. Karar et al. described air quality index ranges in their study [31]. The value obtained from the MQ135 gas sensor can be converted into a PPM value, and ranges of values are associated with corresponding air quality levels. If the range of values

Fig. 3 All the hardware components of the device are enclosed in a case designed as a 3D model using AutoCAD: a top view, b bottom view


from the sensor is 0–50, 51–100, 101–150, 151–300 PPM, or more than 300 PPM, then the air quality is good, moderate, unhealthy for sensitive groups, unhealthy, or hazardous, respectively. For CO and methane, a value below 1% is considered good quality, and a value above 1% is predicted to be unhealthy [32]. Based on the air quality index, if the PPM value of the MQ135 is in the range 0–100 ppm and the concentrations of CO and methane are below 1%, the system predicts good air quality. If the MQ135 value is above 100 ppm and/or the level of CO or methane is above 1%, the system predicts unsafe (bad) air quality (Tables 1, 2 and 3). Our system estimates the level of oxygen in the coal mining area by assessing the level of methane resulting from the coalification process. When methane reacts with oxygen, it produces CO2 (carbon dioxide), another major air pollutant that is harmful to breathe. The chemical reaction of methane and oxygen is

$$\mathrm{CH_4} + 2\,\mathrm{O_2} \rightarrow \mathrm{CO_2} + 2\,\mathrm{H_2O} \tag{1}$$
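The five PPM ranges quoted earlier in this section map directly onto a small lookup. The sketch below shows that mapping in Python (the device itself is programmed in embedded C); the function name is hypothetical, and treating non-integer readings between two ranges (for example 50.5) as belonging to the higher range is an assumption, since the text only lists integer boundaries.

```python
def air_quality_category(ppm: float) -> str:
    """Map an MQ135 reading in PPM to the air quality category from the text."""
    if ppm < 0:
        raise ValueError("ppm cannot be negative")
    if ppm <= 50:
        return "good"
    if ppm <= 100:
        return "moderate"
    if ppm <= 150:
        return "unhealthy for sensitive groups"
    if ppm <= 300:
        return "unhealthy"
    return "hazardous"
```

The device's own decision collapses these five categories into two (good up to 100 PPM, bad above), as Table 1 shows.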

Table 1 The value read by the MQ135 gas sensor is converted into a PPM value; different ranges of values indicate different air quality conditions

| Range (PPM) | Quality |
|-------------|---------|
| 0–100       | Good    |
| 101–500     | Bad     |

Source Author

Table 2 The values read by the MQ7 and MQ4 are converted into percentages; based on these values, the system indicates various air quality conditions

| Range           | CO               | Methane          |
|-----------------|------------------|------------------|
| Less than 1%    | Good quality air | Good quality air |
| Greater than 1% | Bad              | Bad quality air  |

Source Author

Table 3 The value read by the MQ4 gas sensor is converted into a percentage; the decrement in oxygen level is based on the methane value

| Level of methane (%) | Level of oxygen   |
|----------------------|-------------------|
| 5                    | Decremented by 1% |
| 10                   | Decremented by 2% |

Source Author

The coalification process plays a significant role in the emission of methane and the reduction of oxygen [33]. When the air contains methane at a concentration of 10%, the concentration of oxygen is reduced by 2% [34]. Based on this


Fig. 4 a End product of this study. b The screen on the device displays the result as a text message, and red and green LEDs indicate the result: the red LED glows when the air quality is bad, and the green LED glows when the air is fresh. The red stripes on the right side of the screen are the buzzer, which sounds to indicate the result. c All the hardware components shown in Fig. 2 enclosed in the outer case

condition, the system estimates the level of oxygen in the presence of methane, especially in coal mining areas. It gives users appropriate signals through the display and the buzzer.
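The two data points in Table 3 (5% methane with a 1% decrement, 10% methane with a 2% decrement) lie on a straight line through the origin, so the oxygen estimate can be sketched as a simple proportional rule. The Python sketch below makes three assumptions that go beyond the text: the relationship stays linear between and beyond the tabulated points, the decrement is read as percentage points of oxygen, and the baseline is the roughly 20.9% oxygen content of normal air. The names are hypothetical.

```python
# Slope taken from Table 3: 10 % methane corresponds to a 2 % oxygen decrement.
O2_DROP_PER_CH4_PERCENT = 2.0 / 10.0
# Assumed baseline: approximate oxygen content of normal air, in percent by volume.
NORMAL_O2_PERCENT = 20.9


def oxygen_drop_percent(methane_percent: float) -> float:
    """Estimated oxygen decrement for a given methane level (assumed linear)."""
    if not 0.0 <= methane_percent <= 100.0:
        raise ValueError("methane_percent must be between 0 and 100")
    return methane_percent * O2_DROP_PER_CH4_PERCENT


def estimated_oxygen_percent(methane_percent: float) -> float:
    """Baseline oxygen minus the estimated decrement, in percentage points."""
    return NORMAL_O2_PERCENT - oxygen_drop_percent(methane_percent)
```

For example, `oxygen_drop_percent(5.0)` reproduces the first row of Table 3 and `oxygen_drop_percent(10.0)` the second.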

5 Results

We have built an air quality checking device using the MQ135, MQ4, and MQ7 gas sensors. Figure 4 shows the device. When the power is on, the sensors start to monitor the environment. The device reads the analog values of the MQ135, MQ4, and MQ7; the MQ135 value is converted into PPM (parts per million), and the MQ4 and MQ7 values are converted into percentages. If the PPM value of the MQ135 is in the range 0–100 and the percentages of methane and CO are below 1%, the device predicts that the air quality is good; the result is displayed on the screen, and the green LED glows. For testing, we deliberately polluted the air around the sensors with different pollutant gases. If the PPM value of the air quality sensor is greater than 100 ppm and/or CO or methane exceeds the normal level, the device predicts that the air quality is bad. Figure 4c shows the hardware setup enclosed in the 3D case model.
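The decision rule described in this section can be summarized as one function that takes the three converted readings and returns what the display, LEDs, and buzzer should do. The sketch below is a Python illustration, not the device firmware. The `Alert` type, the message strings, and the choice to sound the buzzer whenever the air is bad are assumptions, as is treating a reading of exactly 1% as bad, since the text leaves that boundary unspecified.

```python
from dataclasses import dataclass

PPM_LIMIT = 100.0          # MQ135 threshold from the text (good up to 100 PPM)
GAS_PERCENT_LIMIT = 1.0    # CO and methane threshold from the text (good below 1 %)


@dataclass(frozen=True)
class Alert:
    message: str     # text for the OLED display (hypothetical wording)
    green_led: bool  # on when the air quality is good
    red_led: bool    # on when the air quality is bad
    buzzer: bool     # assumed to sound whenever the air quality is bad


def evaluate(mq135_ppm: float, co_percent: float, methane_percent: float) -> Alert:
    """Apply the good/bad rule to the three converted sensor readings."""
    good = (
        mq135_ppm <= PPM_LIMIT
        and co_percent < GAS_PERCENT_LIMIT
        and methane_percent < GAS_PERCENT_LIMIT
    )
    if good:
        return Alert("Air quality: good", green_led=True, red_led=False, buzzer=False)
    return Alert("Air quality: bad", green_led=False, red_led=True, buzzer=True)
```

Keeping the thresholds as named constants makes it easy to try stricter limits without touching the rule itself.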

6 Conclusion and Future Works

In this study, we developed a handheld portable device with a 3D-modeled case. The system uses three gas sensors, MQ4, MQ7, and MQ135, to monitor pollutant gases in the atmosphere, especially in coal mining areas. These sensors were chosen because together they detect many of the dangerous air pollutants listed by the WHO, including CO, methane, nitrogen dioxide, and sulfur dioxide. The volume of methane present in the air can


predict the oxygen level in the air, particularly in coal mining areas. Because the MQ135 cannot detect carbon monoxide, a major pollutant, an MQ7 sensor is added to the device. The MQ4 is sufficient for detecting the level of methane, and thus for predicting the level of oxygen. The 3D model of the case, developed in AutoCAD, houses the PCB, processor, and battery module and makes the device more user-friendly. The device converts the readings for the different pollutant gases into PPM and percentage values. As future work, the values detected by the device, along with other information, can be transferred through a communication module to cloud storage. The cloud can store the information from each device, and an algorithm can classify areas as polluted or non-polluted. The cloud data can be shared with government authorities to support appropriate decisions.

References

1. Hamanaka, R.B., Mutlu, G.M.: Particulate matter air pollution: effects on the cardiovascular system. Front. Endocrinol. 9, 680 (2018)
2. Kampa, M., Castanas, E.: Human health effects of air pollution. Environ. Pollut. 151(2), 362–367 (2008)
3. Ritz, B., Yu, F.: The effect of ambient carbon monoxide on low birth weight among children born in southern California between 1989 and 1993. Environ. Health Perspect. 107(1), 17–25 (1999). https://doi.org/10.1289/ehp.9910717
4. Block, M.L., Elder, A., Auten, R.L., Bilbo, S.D., Chen, H., Chen, J.C., Cory-Slechta, D.A., Costa, D., Diaz-Sanchez, D., Dorman, D.C., Gold, D.R., Gray, K., Jeng, H.A., Kaufman, J.D., Kleinman, M.T., Kirshner, A., Lawler, C., Miller, D.S., Nadadur, S.S., Ritz, B., Wright, R.J. et al.: The outdoor air pollution and brain health workshop. Neurotoxicology 33(5), 972–984 (2012). https://doi.org/10.1016/j.neuro.2012.08.014
5. Becerra, T.A., Wilhelm, M., Olsen, J., Cockburn, M., Ritz, B.: Ambient air pollution and autism in Los Angeles county, California. Environ. Health Perspect. 121(3), 380–386 (2013). https://doi.org/10.1289/ehp.1205827
6. Balyan, P., Ghosh, C., Sharma, A.K., Banerjee, B.D.: Health effects of air pollution among residents of Delhi: a systematic review. Int. J. Health Sci. Res. 8(1), 273–282 (2018)
7. Singh, A.K., Kumar, J.: Fugitive methane emissions from Indian coal mining and handling activities: estimates, mitigation and opportunities for its utilisation to generate clean energy. Energy Procedia 90, 336–348 (2016)
8. Zhengfu, B.I.A.N., Inyang, H.I., Daniels, J.L., Frank, O.T.T.O., Struthers, S.: Environmental issues from coal mining and their solutions. Mining Sci. Technol. (China) 20(2), 215–223 (2010)
9. Kholod, N., Evans, M., Pilcher, R.C., Roshchanka, V., Ruiz, F., Coté, M., Collings, R.: Global methane emissions from coal mining continue growing even with declining coal production. J. Clean. Prod. 256, 120489 (2020)
10. https://www.who.int/publications/i/item/9789240034228
11. Abbas, F.N., Abdalrdha, Z.K., Saadon, M.M.: Capable of gas sensor MQ-135 to monitor the air quality with Arduino Uno. Int. J. Eng. Res. Technol. 13(10), 2955–2959 (2020)
12. Kodali, R.K., Greeshma, R.N.V., Nimmanapalli, K.P., Borra, Y.K.Y.: IOT Based industrial plant safety gas leakage detection system. In: 2018 4th International Conference on Computing Communication and Automation (ICCCA), pp. 1–5. IEEE (2018)
13. Chan, K., Schillereff, D.N., Baas, A.C., Chadwick, M.A., Main, B., Mulligan, M., O’Shea, F.T., Pearce, R., Smith, T.E., Van Soesbergen, A., Thompson, J.: Low-cost electronic sensors for environmental research: pitfalls and opportunities. Prog. Phys. Geogr. Earth Environ. 45(3), 305–338 (2021)
14. Sai, K.B.K., Subbareddy, S.R., Luhach, A.K.: IOT based air quality monitoring system using MQ135 and MQ7 with machine learning analysis. Scalable Comput. Pract. Experience 20(4), 599–606 (2019)
15. Louis, L.: Working principle of Arduino and using it. Int. J. Control, Autom. Commun. Syst. (IJCACS) 1(2), 21–29 (2016)
16. Badamasi, Y.A.: The working principle of an Arduino. In: 2014 11th International Conference on Electronics, Computer and Computation (ICECCO), pp. 1–4. IEEE (2014)
17. Luciano, A.G., Fusinato, P.A., Gomes, L.C., Luciano, A., Takai, H.: The educational robotics and Arduino platform: constructionist learning strategies to the teaching of physics. In: Journal of Physics: Conference Series, vol. 1286, No. 1, p. 012044. IOP Publishing (2019)
18. Nano, A.: Arduino nano (2018)
19. Banzi, M., Shiloh, M.: Getting started with Arduino. Maker Media, Inc. (2022)
20. Barrett, S.F.: Arduino microcontroller processing for everyone! Synth. Lect. Digital Circ. Syst. 8(4), 1–513 (2013)
21. Gay, W.: OLED. In: Beginning STM32, pp. 223–240. Apress, Berkeley, CA. 23 (2018)
22. Borchardt, J.K.: Developments in organic displays. Mater. Today 7(9), 42–46 (2004)
23. Abbas, F.N., Abdalrdha, Z.K., Saadon, M.M.: Capable of gas sensor MQ-135 to monitor the air quality with Arduino Uno. Int. J. Eng. Res. Technol. 13(10), 2955–2959 (2020)
24. Chojer, H., Branco, P.T.B.S., Martins, F.G., Alvim-Ferraz, M.C.M., Sousa, S.I.V.: Development of low-cost indoor air quality monitoring devices: recent advancements. Sci. Total Environ. 727, 138385 (2020)
25. Vinaya, C.H., Thanikanti, V.K., Ramasamy, S.: Environment quality monitoring using ARM processor. In: IOP Conference Series: Materials Science and Engineering, vol. 263, No. 5, p. 052020. IOP Publishing (2017)
26. Nahas, M., Maaita, A.: Choosing appropriate programming language to implement software for real-time resource-constrained embedded systems. Embed. Syst.-Theory Des. Methodol. (2012)
27. Barr, M.: Programming Embedded Systems in C and C++. O’Reilly Media, Inc. (1999)
28. https://www.circuito.io/
29. Shakkarwal, P., Kumar, R., Sindhwani, R.: Progressive die design and development using AutoCAD. In: Advances in Engineering Design, pp. 531–539. Springer, Singapore (2021)
30. Kukiev, B., O‘g‘li, A.N.N., Shaydulloyevich, B.Q.: Technology for creating images in autocad. Eur. J. Res. Reflection Educational Sci. 7 (2019)
31. Karar, M.E., Al-Masaad, A.M., Reyad, O.: GASDUINO-wireless air quality monitoring system using internet of things (2020). arXiv preprint arXiv:2005.04126
32. https://www.mcair.com/resources/carbon-monoxide-the-silent-killer
33. Norrish, R.G.W., Foord, S.G.: The kinetics of the combustion of methane. In: Proceedings of the Royal Society of London. Series A-Mathematical and Physical Sciences, vol. 157(892), pp. 503–525 (1936)
34. Yusuf, M., Ibrahim, E., Saleh, E., Ridho, M.R., Isk, I.: The relationship between the decline of oxygen and the increase of methane gas (CH4) emissions on the environment health of the plant. Int. J. Collaborative Res. Intern. Med. Public Health 8(7), 0–0 (2016)

Gain Enhanced Single-Stage Split Folded Cascode Operational Transconductance Amplifier

M. N. Saranya, Sriadibhatla Sridevi, and Rajasekhar Nagulapalli

1 Introduction

Recent advances in the semiconductor industry toward deep sub-micron, fine-line CMOS process technologies push analog designers to come up with new high-speed, low-power circuit designs. The linearity and dynamic range of analog circuits decrease with supply voltage and feature size scaling. Furthermore, the short-channel CMOS process regime provides only limited gain [1]. Thus, realizing a high-performance (i.e., high gain and speed) operational transconductance amplifier (OTA) at a reduced supply voltage in sub-micron CMOS technologies (such as 65 nm and 45 nm) has become a most challenging task. The OTA is the most power-consuming and most extensive module in analog circuits [2]. Like the traditional op-amp, the OTA is an integral building block of various analog integrated circuits. Such applications include active filters and control amplifiers, multiplexers, switched-capacitor circuits [3], and sample-and-hold circuits [4]. The OTA is also a salient block in biomedical applications, specifically in neural recording systems [5–9]. Such precision applications require high gain, leading to multi-stage OTA designs that employ long-channel devices operated at low bias current. Besides, multi-stage design leads to a substantial phase shift

M. N. Saranya, Department of Electronics and Communication Engineering, National Institute of Technology, Surathkal, Karnataka, India, e-mail: [email protected]
S. Sridevi (B), School of Electronics Engineering, Vellore Institute of Technology, Vellore, India, e-mail: [email protected]
R. Nagulapalli, Electronics and Instrumentation Group, Oxford Brookes University, Oxford, UK
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_50


and the need for severe compensation limits the high-frequency performance. High-speed applications prefer single-stage OTA designs that use short-channel devices biased at larger currents to achieve a high unity-gain frequency [4]. Among the many operational amplifiers available, the most reused and redesigned configuration is the standard folded cascode operational transconductance amplifier (SFCOTA). It is an excellent choice either as a single-stage amplifier or as a pre-amplifier in multi-stage amplifiers [2]. The paper is divided into five sections. Section 2 discusses the SFCOTA topology and previous research associated with it. Section 3 elaborates the proposed approach and design specifications to enhance the DC gain of the SFCOTA. Section 4 summarizes the Spectre simulation results of the split folded cascode OTA topology. Finally, Sect. 5 concludes the work.

2 Standard Folded Cascode OTA Topology

2.1 Background and Related Work

The SFCOTA is a general-purpose single-stage amplifier designed to overcome the drawbacks of the telescopic cascode configuration [10]. Figure 1 shows the SFCOTA configuration. Several circuit approaches have been reported in the literature to refine the performance of an SFCOTA. Wattanapanitch et al. proposed a current scaling technique to design an OTA with minimum power consumption and reduced noise for the primary gain stage of a neural recording amplifier [5]. However, this configuration sacrifices voltage headroom, making it less attractive for low-voltage implementations, and uses several resistors, which occupy considerable silicon area. The

Fig. 1 Standard folded cascode OTA configuration


design in [6] applies a current splitting strategy to the intricate design in [5] to improve the power-noise trade-off. Another widely researched technique is current reuse and recycling, especially for the folded cascode [2, 11–13]. As a general enhancement to the SFCOTA, the existing devices and currents are recycled to increase the effective $g_m$. As a result, this method boosts the amplifier gain without additional power or area requirements [2]. However, this technique creates an additional pole-zero pair, which deteriorates the phase margin and causes long-tail settling in the step response. Nevertheless, the recycling folded cascode (RFC) OTA has recently gained a lot of popularity [14, 15]. Several improved versions of the recycling technique have also been presented. A few such upgrades are an improved recycling structure [12], cascode self-biasing [16], additional current sources [11], and positive feedback [13]. However, these enhancements are counterbalanced by significant phase margin degradation, which makes the system susceptible to oscillation. Another popular and widely accepted technique is to use a gain-boosting feedback amplifier in the OTA structure [17]. In this method, the cascode transistor $g_m$ is enhanced by a feedback amplifier, ensuring a significant improvement in the gain of the op-amp. However, the gain-boosted OTA requires additional power because of the feedback amplifier and suffers from the potential drawback of a pole-zero doublet and instability [4]. Other novel techniques to boost the performance of the SFCOTA include gain boosting with a negative degeneration resistor [9], hybrid-mode bulk-driven operation [18], and programmable OTAs. Digitally programmed op-amp/OTA design is a growing field in analog design that optimizes the output characteristics and system response for the specific application [3, 7]. These techniques achieve high DC gain, high bandwidth, or both. However, this comes at the cost of other performance parameters such as power, area, noise, and stability. Thus, there is still a need for newer and improved techniques to realize high-performance amplifiers in deep sub-micron technologies operating at low voltage with low power consumption and better noise performance. We present a design method that enhances the DC gain of a single-stage SFCOTA without increasing power consumption or creating headroom issues, while still achieving improved noise performance.

3 Proposed Folded Cascode OTA Topology

3.1 Architecture

A simple gain enhancement scheme based on the split differential pair technique is proposed to achieve a higher gain in the SFCOTA. Figure 2 shows the proposed split folded cascode OTA. This architecture aims to enhance the DC gain in a single stage without introducing any headroom issues. The method relies on the concept that the gain of an OTA improves when $g_m$ increases and $g_{ds}$ decreases [8]. Using the split differential pair technique, the tail current source (Mb2) and the


Fig. 2 Proposed split folded cascode OTA configuration

differential input pair (M1 and M2) are split in two such that the modified OTA has two tail current sources (Mb2 and Mb3) and two differential pairs (M1-M2 and M1a-M2a). The currents flowing through the differential input pairs M1-M2 and M1a-M2a are in the ratio (1-m):m [8]. Decreasing the current through the cascode load devices improves the output impedance. Thus, the input differential pair that carries (1-m) times the current is connected to the cascode load to reduce $g_{ds}$ significantly. The adjunct input differential pair, which carries m times the current, is connected to a resistive load to form a simple differential circuit. The amplified output signal of this adjunct differential circuit drives the transistors M3 and M4. These transistors provide the folding point for the small-signal current from the differential pair. Therefore, injecting an amplified signal into these two transistors significantly boosts the gain. Further, this method does not introduce low-frequency pole-zero doublets or increase the required voltage headroom. Also, the number of biasing lines required is decreased by one, which reduces the complexity of the biasing scheme considerably. The voltage gain expression of the split folded cascode OTA is given as

$$A_V = \frac{g_{m1}}{\dfrac{(g_{ds1} + g_{ds3})\, g_{ds5}}{g_{m5}} + \dfrac{g_{ds7}\, g_{ds9}}{g_{m7}}} \times \left(-g_{m1a} R_D\right) \tag{1}$$

where $R_D$ is the load resistance of the adjunct differential circuit. Compared to the standard architecture, the proposed one has one extra node whose time constant is $R_D C_P$, with $C_P$ the capacitance at that node. This node adds extra phase lag to the loop gain of the op-amp, and hence the phase margin is compromised. In this design, the pole frequency corresponding to this node is kept at ~745 MHz, about three times the unity-gain bandwidth (UGBW). The maximum value of $R_D$ is therefore limited by the acceptable phase margin.
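The effect of the extra node can be estimated with a single-pole model. The following is a minimal sketch, not taken from the paper: it assumes the node behaves as one real pole at $f_p = 1/(2\pi R_D C_P)$, and the value of `C_P_FARAD` is purely hypothetical because the paper does not report the node capacitance.

```python
import math

def pole_phase_lag_deg(f_eval_hz: float, f_pole_hz: float) -> float:
    """Phase lag in degrees that one real pole contributes at f_eval_hz."""
    return math.degrees(math.atan(f_eval_hz / f_pole_hz))

def max_load_resistance_ohm(f_pole_hz: float, c_p_farad: float) -> float:
    """Largest R_D that keeps the pole 1 / (2*pi*R_D*C_P) at or above f_pole_hz."""
    return 1.0 / (2.0 * math.pi * f_pole_hz * c_p_farad)

UGBW_HZ = 261.5e6   # unity-gain bandwidth reported in the paper
POLE_HZ = 745e6     # pole frequency of the extra node reported in the paper
C_P_FARAD = 50e-15  # hypothetical node capacitance, chosen only for illustration

lag_deg = pole_phase_lag_deg(UGBW_HZ, POLE_HZ)
rd_max = max_load_resistance_ohm(POLE_HZ, C_P_FARAD)

print(f"Extra phase lag at UGBW: {lag_deg:.1f} degrees")
print(f"Maximum R_D for the assumed C_P: {rd_max:.0f} ohm")
```

Under this model, a pole placed roughly three times above the UGBW costs on the order of 20 degrees of phase margin, which illustrates why $R_D$ cannot be increased freely.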


3.2 Design Specifications

The input stage determines the gain of an amplifier. Thus, a PMOS or NMOS input is chosen based on the trade-off between gain, input common-mode range, and noise [19]. A PMOS-input folded cascode OTA has been chosen to ensure better performance. A high gain $A_V$ can be achieved by opting for a larger channel length L and a small overdrive voltage $V_{gs} - V_t$. We have chosen a channel length L of 180 nm and restricted $V_{gs} - V_t$ to 75 mV. Both OTAs, the proposed split folded cascode OTA and the SFCOTA, are developed in a typical TSMC 65 nm CMOS process to operate at a minimum power supply of 1 V with a current budget of 90 µA. To balance DC gain and UGBW while retaining good stability, the current splitting ratio m is selected as 0.4. Further, the currents flowing through the input and cascode devices are set equal, a good choice for symmetry that avoids artifacts in slew rate, swing, etc. The amplifier’s outputs are set at $V_{dd}/2$ (0.5 V in this work) to maximize swing. The biasing circuitry for generating the bias current and cascode bias voltage is designed using standard circuits [19]. Many OTA circuits can share the generated bias voltage, so the total power measured does not include the power consumption of the biasing circuitry.

4 Simulation Results

The simulation results of the proposed split folded cascode OTA are detailed in this section and compared with those of existing OTAs. Figure 3 shows the open-loop frequency response of the proposed design and the SFCOTA while driving a load capacitance of 3 pF. Our design has a UGBW of 261.5 MHz with a 68.7° phase margin. As demonstrated in Fig. 3, the proposed OTA achieves a DC gain of 66.2 dB, 17 dB higher than that of the SFCOTA for the same current budget of 90 µA. Thus, the proposed OTA has an accuracy of 10 bits. The power measured is 144.8 µW. The performance comparison of the proposed design with existing folded cascode OTA designs is presented in Table 1. From Table 1, it is evident that the proposed split folded cascode OTA has a better DC gain and UGBW than some of the existing amplifier designs [11–13]. In particular, compared to the OTA in [11], which is designed in the same 65 nm CMOS process, the proposed design technique achieves a higher UGBW and DC gain at a much lower bias current. Due to the improved GBW, the phase margin degrades. However, the designed circuit exhibits a better Figure of Merit (FOM). Process corner analysis and temperature-dependent simulation illustrate the effects of process and temperature variations on the proposed OTA. Figure 4a presents the open-loop AC response of the split folded cascode OTA at various process corners (TT, SS, FF, SF, and FS), and Fig. 4b shows the response over a 100 °C temperature range, from −10 to +90 °C. The designed OTA circuit demonstrates an acceptable response range across the process corners. However, there is no significant change in the results of


Fig. 3 Frequency response of both the SFCOTA and the proposed split folded cascode OTA

Table 1 Performance comparison of the proposed split folded cascode OTA design with the published works

| Parameter | SFCOTA | This work | [11] | [13] | [12] |
|---|---|---|---|---|---|
| Supply voltage (V) | 1 | 1 | 1 | 1.2 | 1.2 |
| Bias current (µA) | 90 | 90 | 800 | 300 | 260 |
| Technology (nm) | 65 | 65 | 65 | 180 | 130 |
| DC gain (dB) | 49.1 | 66.2 | 54.5 | 65.7 | 70.2 |
| UGBW (MHz) | 60.5 | 261.5 | 203.2 | 148.9 | 83 |
| Phase margin (deg) | 89.1 | 68.7 | 66.2 | 80.3 | 70 |
| Capacitive load (pF) | 3 | 3 | 10 | 2×5 | 7 |
| CMRR (dB) | 70.9 | 68.7 | – | 42.9 | – |
| FOM (MHz pF/mA) | 2016 | 8716 | 2540 | 4963 | 2235 |

the SF and FS corners relative to the TT corner. A summary of the plotted results is given in Table 2. The noise performance of the proposed OTA was also analyzed. Figure 5 shows the spectral density of the output noise. The output noise of the proposed OTA and the SFCOTA is 247.929 and 289.5 pV²/Hz, respectively. Hence, the modified OTA exhibits improved noise performance compared to the standard OTA.
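The figures of merit in Table 1 and the 10-bit accuracy claim can be cross-checked numerically. The sketch below rests on two assumptions that the paper does not state explicitly: the FOM is taken as UGBW × load capacitance / bias current (inferred from its unit, MHz pF/mA), and the resolution is estimated by requiring the static gain error 1/A0 to stay below half an LSB. The 2×5 pF load of [13] is treated as 10 pF.

```python
import math

# (UGBW in MHz, load capacitance in pF, bias current in uA), copied from Table 1
DESIGNS = {
    "SFCOTA": (60.5, 3.0, 90.0),
    "This work": (261.5, 3.0, 90.0),
    "[11]": (203.2, 10.0, 800.0),
    "[13]": (148.9, 10.0, 300.0),  # 2 x 5 pF treated as 10 pF (assumption)
    "[12]": (83.0, 7.0, 260.0),
}

def figure_of_merit(ugbw_mhz: float, load_pf: float, bias_ua: float) -> float:
    """Assumed definition: FOM = UGBW * C_L / I_bias, in MHz pF/mA."""
    return ugbw_mhz * load_pf / (bias_ua / 1000.0)

def resolution_bits(dc_gain_db: float) -> float:
    """Bits N such that the gain error 1/A0 equals half an LSB, 2**-(N + 1)."""
    a0 = 10.0 ** (dc_gain_db / 20.0)
    return math.log2(a0) - 1.0

foms = {name: figure_of_merit(*row) for name, row in DESIGNS.items()}
bits = resolution_bits(66.2)

for name, value in foms.items():
    print(f"{name:10s} FOM = {value:7.1f} MHz pF/mA")
print(f"Resolution supported by 66.2 dB of DC gain: {bits:.2f} bits")
```

With these assumptions the computed values agree with the FOM row of Table 1 to within rounding, and 66.2 dB of DC gain corresponds to roughly 10 bits.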


Fig. 4 a AC response of the proposed split folded cascode OTA at process corners b AC response of the proposed OTA with temperature variations

Table 2 Summary of the split OTA design at process corners and with temperature variations

| Parameter | Temperature: −10 °C | Temperature: 90 °C | Corner: TT | Corner: SS | Corner: FF | Corner: SF | Corner: FS |
|---|---|---|---|---|---|---|---|
| DC gain (dB) | 72.2 | 50.1 | 66.2 | 66.1 | 51.0 | 67.7 | 63.4 |
| UGBW (MHz) | 224.1 | 225.0 | 261.5 | 34.0 | 242.6 | 240.1 | 277.3 |
| Phase margin (deg) | 68.6 | 72.4 | 68.7 | 71.4 | 71.3 | 69.9 | 67.6 |

Fig. 5 Output noise of the proposed OTA

5 Conclusion

The proposed split differential input pair technique-based modification to an SFCOTA can boost the gain and GBW. This improvement is achieved at a smaller current budget without increasing power consumption or voltage headroom, and thus shows a significant performance enhancement over the existing amplifier designs. A transistor-level circuit has been realized and simulated in TSMC 65 nm CMOS technology. The preliminary simulation results show an enhancement of 17 dB in DC gain and 201 MHz in GBW compared with the SFCOTA. Also, the proposed OTA demonstrates better noise performance and robustness across process, voltage, and temperature (PVT) variations. As future work, the stability performance and Monte Carlo simulations will be analyzed in detail.

References

1. Sansen, W.M.: Analog Design Essentials, vol. 859. Springer Science & Business Media (2007)
2. Assaad, R.S., Silva-Martinez, J.: The recycling folded cascode: a general enhancement of the folded cascode amplifier. IEEE J. Solid-State Circuits 44(9), 2535–2542 (2009)
3. Mal, A.K., Todani, R.: A digitally programmable folded cascode OTA with variable load applications. In: 2011 IEEE Symposium on Industrial Electronics and Applications, pp. 295–299. IEEE (2011)
4. Li, B.: A high DC gain Op-amp for sample and hold circuits, vol. 9. In: Proceeding of the 2nd International Conference Computer Science Electronic Engineering (ICCSEE 2013), pp. 1781–1784 (2013)
5. Wattanapanitch, W., Fee, M., Sarpeshkar, R.: An energy-efficient micropower neural recording amplifier. IEEE Trans. Biomed. Circuits Syst. 1(2), 136–147 (2007)
6. Qian, C., Parramon, J., Sanchez-Sinencio, E.: A micropower low-noise neural recording front-end circuit for epileptic seizure detection. IEEE J. Solid-State Circuits 46(6), 1392–1405 (2011)
7. Wattanapanitch, W., Sarpeshkar, R.: A low-power 32-channel digitally programmable neural recording integrated circuit. IEEE Trans. Biomed. Circuits Syst. 5(6), 592–602 (2011)
8. Nagulapalli, R., Hayatleh, K., Barker, S., Zourob, S., Yassine, N.: An OTA gain enhancement technique for low power biomedical applications. Analog Integr. Circuits Signal Process. 95(3), 387–394 (2018)
9. Nagulapalli, R., Hayatleh, K., Barker, S., Zourob, S., Yassine, N., Sridevi, S.: A bio-medical compatible self bias opamp in 45 nm CMOS technology. In: 2017 International Conference on Microelectronic Devices, Circuits and Systems (ICMDCS), pp. 1–4. IEEE (2017)
10. Razavi, B.: Design of Analog CMOS Integrated Circuits (2000)
11. Yan, Z., Mak, P.I., Martins, R.P.: Double recycling technique for folded-cascode OTA. Analog Integr. Circuits Signal Process. 71(1), 137–141 (2012)
12. Li, Y.L., Han, K.F., Tan, X., Yan, N., Min, H.: Transconductance enhancement method for operational transconductance amplifiers. Electron. Lett. 46(19), 1321–1323 (2010)
13. Akbari, M., Biabanifard, S., Asadi, S., Yagoub, M.C.: High performance folded cascode OTA using positive feedback and recycling structure. Analog Integr. Circuits Signal Process. 82(1), 217–227 (2015)
14. Yosefi, G.: A special technique for recycling folded cascode OTA to improve DC gain, bandwidth, CMRR and PSRR in 90 nm CMOS process. Ain Shams Eng. J. 11(2), 329–342 (2020)
15. Sabry, M.N., Nashaat, I., Omran, H.: Automated design and optimization flow for fully-differential switched capacitor amplifiers using recycling folded cascode OTA. Microelectron. J. 101, 104814 (2020)
16. Lv, X., Zhao, X., Wang, Y., Wen, B.: An improved non-linear current recycling folded cascode OTA with cascode self-biasing. AEU-Int. J. Electron. Commun. 101, 182–191 (2019)
17. Bult, K., Geelen, G.J.: A fast-settling CMOS op amp for SC circuits with 90-dB DC gain. IEEE J. Solid-State Circuits 25(6), 1379–1384 (1990)
18. Zhao, X., Zhang, Q., Dong, L., Wang, Y.: A hybrid-mode bulk-driven folded cascode OTA with enhanced unity-gain bandwidth and slew rate. AEU-Int. J. Electron. Commun. 94, 226–233 (2018)
19. Johns, D.A., Martin, K.: Analog Integrated Circuit Design. Wiley (2008)

Secure IBS Scheme for Vehicular Ad Hoc Networks J. Jenefa, S. Sajini, and E. A. Mary Anita

J. Jenefa (B) · E. A. M. Anita
CHRIST (Deemed to be University), Bangalore, India
e-mail: [email protected]
E. A. M. Anita
e-mail: [email protected]
S. Sajini
SRM Institute of Science and Technology, Ramapuram, Tamil Nadu, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_51

1 Introduction

Many researchers focus on Vehicular Ad hoc Networks (VANETs) because of the rapid growth in Intelligent Transportation Systems (ITS) and wireless technologies like GSM, GPRS, 5G, etc. In VANETs, vehicles communicate through wireless communications. Vehicles are also equipped with sensors and processors, which help them to establish communication and to perform computations [1]. The information exchanged among vehicles and Road Side Units (RSUs) plays a vital role in improving traffic safety, since it assists drivers with a safer driving experience. The private information of a vehicle, such as its location and identity, may be disclosed when it attempts to communicate with other vehicles. Hence, researchers give major consideration to security issues in vehicular networks. In vehicular networks, two modes of communication are feasible: inter-vehicle communication [2] (Vehicle-to-Vehicle communication, V2V) and communication between vehicles and RSUs (Vehicle-to-RSU communication [V2R] and RSU-to-Vehicle communication [R2V]). These are established with the help of On-Board Units (OBUs) installed in the vehicles by manufacturers. To establish secure communications, OBUs and RSUs make use of back-end servers such as trusted authorities (TAs). Hence, the vehicular network is composed of three components: OBUs, RSUs and TAs. An OBU records information during its journey and exchanges it with other vehicles


and RSUs. RSUs are fixed units along the roadsides; vehicles acquire services with the help of RSUs. RSUs, on the other hand, gather information about local conditions from vehicles in their range and exchange it with other vehicles and RSUs. Because of their unique characteristics, vehicular networks are vulnerable to many security attacks. Security features must be ensured to resist malicious attackers in the network. Among them, the primary feature is authentication, which distinguishes attackers from legitimate vehicles. In addition, conditional privacy-preservation should be ensured to avoid disclosing private details of vehicles. Information about the identity, location and movements of a vehicle should not be revealed when it communicates with other vehicles in the network. The identities of attackers, on the other hand, should be extractable when they try to send fake information. Hence, many researchers propose different authentication schemes with privacy-preservation to ensure security in vehicular networks.

2 Related Works

Existing schemes ensure authentication and privacy-preservation in vehicular networks [3–18]. Some of them are discussed in this section. Raya and Hubaux [3] proposed an authentication scheme that preloads a set of private and public key pairs along with certificates. Since a large number of key pairs and certificates are stored, it has high computation overheads. To offset the disadvantages of the Raya and Hubaux [3] scheme, Lu et al. [4] proposed an anonymous certificate-based scheme. In this scheme, RSUs broadcast anonymous certificates, valid for a short duration, to the vehicles in their range. Due to the frequent updating of certificates, it incurs high computation and communication overhead. To address the shortcomings of Lu et al.’s [4] scheme, Freudiger et al. [5] proposed a novel authentication scheme in which anonymous certificates are used with mix-zones, but vehicles must still store a large number of certificates. Lin et al. [6] proposed an alternative secure approach in which membership managers are used instead of RSUs to issue certificates to the vehicles. The certificates are generated by using group signatures, and the approach still has efficiency issues. Zhang et al. [7] proposed an HMAC-based authentication protocol which ensures privacy-preservation with the help of RSUs. In this scheme, vehicles communicate with the nearby RSUs to preserve their privacy, and hence it has low performance. Zhang et al. [8, 9] proposed an ID-based scheme for vehicular networks. It overcomes storage issues by using identities instead of massive numbers of certificates. However, it is not secure against replay attacks. In 2014, Chuang and Lee [10] proposed a scheme using a transitive trust relationship (TEAM). It uses only XOR operations and hash functions for authentication and hence is a lightweight scheme. However, it is insecure against insider and impersonation attacks, and it does not ensure privacy-preservation. Zhou et al. [12] pointed out these weaknesses, and to overcome them a new authentication scheme based on Elliptic Curve Cryptography (ECC) was proposed, but it is still insecure against


identity guessing and impersonation attacks. Hence, we propose an efficient authentication scheme in this paper.

3 Proposed Scheme

The proposed identity-based authentication scheme is explained in this section. It has seven stages: registration, login, password change, pseudo ID generation, V2R and R2V authentication, V2V authentication and key update. The vehicular network is subdivided into many regions. Each region is under the control of a trusted authority (TA), which is responsible for all the RSUs within its region. Each region is further subdivided into many sub-regions, and each sub-region is under the control of an RSU. RSUs are responsible for the vehicles in their range. They provide requested services to vehicles and broadcast traffic information periodically. In the registration stage, vehicles and RSUs communicate with the TA through a secure channel. The user then logs in to the OBU of the vehicle and changes the password if needed. Vehicles then communicate with nearby RSUs using their signatures and acquire the requested services from the RSUs. The flow of the proposed scheme is depicted in Fig. 1.

3.1 Registration

In this stage, TA initializes vehicles and RSUs in its region. During registration, TA checks the legitimacy of vehicles and RSUs and loads the shared key, which is computed by using TA’s secret key ‘x’. The registration process for vehicles and RSUs is explained below.

Fig. 1 Work flow diagram of the proposed scheme


RSU Registration: TA initializes all the RSUs in its region by assigning IDs and loading a set of shared keys, $SK_i$, into their memory. The shared key $SK_i$ is used to establish authentication between vehicles and RSUs. TA computes it from its secret key ‘x’ and the expiration time of the shared key, TSK, as follows:

$$\underbrace{h(x \,\|\, TSK)}_{SK_n} \longrightarrow \underbrace{h\big(h(x \,\|\, TSK)\big)}_{SK_{n-1}} \longrightarrow \cdots \longrightarrow \underbrace{h^{\,n-1}(x \,\|\, TSK)}_{SK_2} \longrightarrow \underbrace{h^{\,n}(x \,\|\, TSK)}_{SK_1} \tag{1}$$

After the expiration time period TSK, TA updates the set of shared keys and loads it into all RSUs in its range. It also broadcasts the updated shared keys to all the vehicles in the region through the RSUs. TA sends {$ID_r$, TSK, h(), $SK_i$} to the RSU during initialization, where $ID_r$ is the ID of the RSU, TSK is the expiration time period of the shared key, h() is the one-way hash function and $SK_i$ is the shared key.

Vehicle Registration: TA initializes vehicles within its range by assigning IDs and other parameters {$ID_i$, h(), D, $PIK_i$, $pwd_i$}, where $ID_i$ is the ID of the vehicle, h() is the one-way hash function, $D = h(SK_i)$, $PIK_i$ is the pseudo ID generation key and $pwd_i$ is the password assigned to the OBU. The password $pwd_i$ protects the vehicle from information disclosure: even if the OBU is compromised by attackers, the information stored in its memory is not disclosed.
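The shared-key hash chain of Eq. (1) can be sketched in a few lines. This is an illustrative sketch only: the paper does not name the hash function, so SHA-256 is an assumption, as is the byte-level encoding of x || TSK as a plain concatenation. The function name `shared_key_chain` and the sample inputs are hypothetical.

```python
import hashlib
from typing import List

def h(data: bytes) -> bytes:
    """One-way hash; SHA-256 is an assumption, the paper does not fix h()."""
    return hashlib.sha256(data).digest()

def shared_key_chain(x: bytes, tsk: bytes, n: int) -> List[bytes]:
    """Return [SK_1, ..., SK_n] with SK_n = h(x || TSK) and SK_1 = h^n(x || TSK)."""
    keys = []
    value = x + tsk  # plain concatenation stands in for x || TSK (assumption)
    for _ in range(n):
        value = h(value)
        keys.append(value)  # first appended element is h(x || TSK) = SK_n
    keys.reverse()          # reorder so that keys[0] is SK_1
    return keys

# Hypothetical TA secret and expiration time, for illustration only
chain = shared_key_chain(b"ta-secret-x", b"2023-01-01T00:00Z", 5)
for index, key in enumerate(chain, start=1):
    print(f"SK_{index} = {key.hex()[:16]}...")
```

A useful property of this construction is that each key is the hash of its successor, SK_i = h(SK_{i+1}), so a party holding SK_i can check a later key but cannot derive it in advance.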

3.2 Login

OBUs compute the values of $A_i$, $B_i$ and $C_i$ by using the parameters of TA, which are loaded into their memory during the registration process. The following equations are used to calculate $A_i$, $B_i$ and $C_i$:

$$\begin{aligned} A_i &= h(ID_i \,\|\, pwd_i) \\ B_i &= D \oplus A_i \\ C_i &= h(A_i \,\|\, ID_i \,\|\, pwd_i) \end{aligned} \tag{2}$$

The values of $ID_i$ and $pwd_i$ are known to the driver of the vehicle from the registration process, and the login process in the OBU is carried out with them. The user enters $ID_i$ and $pwd_i$ into the OBU. With these values, the OBU computes $C_i = h(A_i \,\|\, ID_i \,\|\, pwd_i)$ and checks whether the generated $C_i$ value is the same as the stored value. If it is, the OBU proceeds with the pseudo ID generation, signing and verification processes.


3.3 Password Change

The user can change the password $pwd_i$ of the OBU whenever necessary. In that case, the user enters $ID_i$ and $pwd_i$ into the OBU, which computes the $C_i$ value and verifies it against the stored value. If the values are the same, the OBU requests the user to input the new password $pwd_i^*$. It then computes the values of $A_i$, $B_i$ and $C_i$ by using the new password as follows, and stores them in its memory.

$$\begin{aligned} A_i &= h(ID_i \,\|\, pwd_i^*) \\ B_i &= D \oplus A_i \\ C_i &= h(A_i \,\|\, ID_i \,\|\, pwd_i^*) \end{aligned} \tag{3}$$
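The login check of Sect. 3.2 and the password change of Sect. 3.3 can be sketched together. This is a minimal sketch under several assumptions that are not in the paper: SHA-256 stands in for h(), concatenated fields are length-prefixed before hashing, and D is recovered as $B_i \oplus A_i$ during a password change rather than being stored separately. The class name `Obu` and all sample values are hypothetical.

```python
import hashlib
import hmac

def h(*parts: bytes) -> bytes:
    """Hash of part_1 || part_2 || ...; SHA-256 and length prefixes are assumptions."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()

def xor(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("XOR operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

class Obu:
    """Holds B_i and C_i as in Eq. (2); illustrative only."""

    def __init__(self, vehicle_id: bytes, password: bytes, d: bytes) -> None:
        a_i = h(vehicle_id, password)             # A_i = h(ID_i || pwd_i)
        self.b_i = xor(d, a_i)                    # B_i = D xor A_i
        self.c_i = h(a_i, vehicle_id, password)   # C_i = h(A_i || ID_i || pwd_i)

    def login(self, vehicle_id: bytes, password: bytes) -> bool:
        a_i = h(vehicle_id, password)
        return hmac.compare_digest(self.c_i, h(a_i, vehicle_id, password))

    def change_password(self, vehicle_id: bytes, old: bytes, new: bytes) -> bool:
        if not self.login(vehicle_id, old):
            return False
        d = xor(self.b_i, h(vehicle_id, old))     # recover D (assumption)
        a_new = h(vehicle_id, new)
        self.b_i = xor(d, a_new)                  # Eq. (3) with the new password
        self.c_i = h(a_new, vehicle_id, new)
        return True

d_value = h(b"example shared key")                # D = h(SK_i), sample value
obu = Obu(b"vehicle-42", b"old-password", d_value)
login_ok = obu.login(b"vehicle-42", b"old-password")
wrong_ok = obu.login(b"vehicle-42", b"guess")
changed = obu.change_password(b"vehicle-42", b"old-password", b"new-password")
new_ok = obu.login(b"vehicle-42", b"new-password")
print(login_ok, wrong_ok, changed, new_ok)
```

Comparing digests with `hmac.compare_digest` avoids leaking information through timing, a design choice of this sketch rather than a requirement stated in the paper.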

3.4 Pseudo ID Generation

Multiple pseudo IDs are generated and used rather than the original IDs of vehicles. Each pseudo ID is used only once. The pseudo IDs are generated by vehicles using the stored pseudo ID generation key, $PIK_i$. The pseudo ID is generated as follows:

$$PID_i = ID_i \oplus h(PIK_i \,\|\, t) \tag{4}$$

where $ID_i$ is the original ID of the vehicle, $PIK_i$ is the pseudo ID generation key and t is the current timestamp.
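Equation (4) masks the real identity with a hash of the generation key and the timestamp, so a party that knows $PIK_i$ and t can undo the mask. The sketch below assumes SHA-256 for h(), length-prefixed concatenation, and zero-padding of the ID to the digest length; none of these details are given in the paper, and the function names are hypothetical.

```python
import hashlib

HASH_LEN = 32  # SHA-256 digest size in bytes (the hash choice is an assumption)

def h(*parts: bytes) -> bytes:
    """Hash of part_1 || part_2 || ..., with length prefixes (assumption)."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()

def xor(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("XOR operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def pseudo_id(vehicle_id: bytes, pik: bytes, t: bytes) -> bytes:
    """PID_i = ID_i xor h(PIK_i || t); the ID is zero-padded to the digest length."""
    return xor(vehicle_id.ljust(HASH_LEN, b"\0"), h(pik, t))

def recover_id(pid: bytes, pik: bytes, t: bytes) -> bytes:
    """Invert Eq. (4) for a party that knows PIK_i and t."""
    return xor(pid, h(pik, t)).rstrip(b"\0")

# Hypothetical sample values
pid_1 = pseudo_id(b"vehicle-42", b"pik-secret", b"1700000000")
pid_2 = pseudo_id(b"vehicle-42", b"pik-secret", b"1700000001")
recovered = recover_id(pid_1, b"pik-secret", b"1700000000")
print(pid_1.hex()[:16], pid_2.hex()[:16], recovered)
```

Because the timestamp changes for every message, successive pseudo IDs of the same vehicle look unrelated to an observer who does not hold the generation key.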

3.5 V2R and R2V Authentication

V2R Authentication: Vehicles generate an Identity-Based Signature (IBS) by using Eq. (5) as follows.

$$\begin{aligned} E_i &= h(A_i) \oplus r \\ S_i &= h(PID_i \,\|\, A_i \,\|\, r \,\|\, M \,\|\, t) \end{aligned} \tag{5}$$

where r is a random number and $S_i$ is the signature generated with the pseudo ID $PID_i$, the value of $A_i$, the random number r, the message to be transmitted M and the current timestamp t. The vehicle sends {$PID_i$, $E_i$, $S_i$, $B_i$, M, t} to the nearby RSUs. The RSU verifies the received message as follows.

$$\begin{aligned} A_i &= B_i \oplus h(SK_i) \\ r &= h(A_i) \oplus E_i \\ S_i &= h(PID_i \,\|\, A_i \,\|\, r \,\|\, M \,\|\, t) \end{aligned} \tag{6}$$

If the computed signature $S_i$ is the same as the received value, the message is accepted by the RSU and then broadcast. V2V authentication also follows the same steps.

R2V Authentication: RSUs periodically broadcast traffic-related messages to the vehicles within their range. R2V authentication is carried out as follows.

$$\begin{aligned} U_r &= h(ID_r \,\|\, y \,\|\, t) \\ F_r &= h(SK_i) \oplus U_r \\ S_r &= h(ID_r \,\|\, U_r \,\|\, F_r \,\|\, M \,\|\, t) \end{aligned} \tag{7}$$

where y is a random number and $S_r$ is the signature generated with the ID $ID_r$, the values of $U_r$ and $F_r$, the message to be transmitted M and the current timestamp t. The RSU sends {$ID_r$, $F_r$, $S_r$, M, t} to the vehicles in its range. Vehicles verify the received messages from the RSU as follows.

$$\begin{aligned} U_r &= F_r \oplus D \\ S_r &= h(ID_r \,\|\, U_r \,\|\, F_r \,\|\, M \,\|\, t) \end{aligned} \tag{8}$$
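The signing and verification steps of Eqs. (5) to (8) can be traced end to end with a short sketch. It is illustrative only and assumes SHA-256 for h(), length-prefixed concatenation, and 32-byte random numbers so that the XOR operands match; the paper specifies none of these. All function names and sample values are hypothetical, and timestamp freshness checks are omitted.

```python
import hashlib
import hmac
import secrets

HASH_LEN = 32  # SHA-256 digest size; the hash choice is an assumption

def h(*parts: bytes) -> bytes:
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()

def xor(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("XOR operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def vehicle_sign(pid, a_i, b_i, message, t):
    """Eq. (5): E_i = h(A_i) xor r, S_i = h(PID_i || A_i || r || M || t)."""
    r = secrets.token_bytes(HASH_LEN)
    return (pid, xor(h(a_i), r), h(pid, a_i, r, message, t), b_i, message, t)

def rsu_verify(packet, sk) -> bool:
    """Eq. (6): recover A_i and r, then recompute and compare S_i."""
    pid, e_i, s_i, b_i, message, t = packet
    a_i = xor(b_i, h(sk))
    r = xor(h(a_i), e_i)
    return hmac.compare_digest(s_i, h(pid, a_i, r, message, t))

def rsu_sign(id_r, sk, message, t):
    """Eq. (7): U_r = h(ID_r || y || t), F_r = h(SK_i) xor U_r."""
    u_r = h(id_r, secrets.token_bytes(HASH_LEN), t)
    f_r = xor(h(sk), u_r)
    return (id_r, f_r, h(id_r, u_r, f_r, message, t), message, t)

def vehicle_verify(packet, d) -> bool:
    """Eq. (8): U_r = F_r xor D, then recompute and compare S_r."""
    id_r, f_r, s_r, message, t = packet
    u_r = xor(f_r, d)
    return hmac.compare_digest(s_r, h(id_r, u_r, f_r, message, t))

# Hypothetical sample values
sk = b"example shared key"
d = h(sk)                                   # D = h(SK_i), held by vehicles
a_i = h(b"vehicle-42", b"password")
b_i = xor(d, a_i)
packet = vehicle_sign(b"pseudo-id", a_i, b_i, b"road closed", b"1700000000")
v2r_ok = rsu_verify(packet, sk)
r2v_ok = vehicle_verify(rsu_sign(b"rsu-7", sk, b"traffic jam", b"1700000001"), d)
tampered = packet[:4] + (b"road open",) + packet[5:]
tamper_ok = rsu_verify(tampered, sk)
print(v2r_ok, r2v_ok, tamper_ok)
```

The tampered packet fails because the message M enters the signature hash, so any change to M alters the recomputed $S_i$.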

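3.6 Key Update

After the expiration time TSK, TA updates the shared key $SK_i$ and sends it to all the RSUs through the secure channel. The hash of the new shared key is then broadcast to all the vehicles in the region through the RSUs as follows.

$$\begin{aligned} G_r &= h(SK_i) \oplus D^* \\ S_r &= h(ID_r \,\|\, D^* \,\|\, G_r \,\|\, t) \end{aligned} \tag{9}$$

where $D^*$ is the hash value of the new shared key, while $h(SK_i)$ in Eq. (9) is the hash of the current shared key, i.e., the value D already stored by the vehicles. The RSU sends {$ID_r$, $G_r$, $S_r$, t} to the vehicles within its range. Vehicles recover $D^*$ and verify the signature as follows.

$$\begin{aligned} D^* &= h(SK_i) \oplus G_r \\ S_r &= h(ID_r \,\|\, D^* \,\|\, G_r \,\|\, t) \end{aligned} \tag{10}$$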
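The key update of Eqs. (9) and (10) can be sketched as a mask-and-check exchange. The sketch interprets $h(SK_i)$ as the hash D of the current key that vehicles already hold, which is an interpretation rather than something the paper states outright; SHA-256, the length-prefixed concatenation and all names and sample values are likewise assumptions.

```python
import hashlib
import hmac
from typing import Optional, Tuple

def h(*parts: bytes) -> bytes:
    """Hash of part_1 || part_2 || ...; SHA-256 and length prefixes are assumptions."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()

def xor(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("XOR operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def rsu_key_update(id_r: bytes, old_sk: bytes, new_sk: bytes, t: bytes) -> Tuple[bytes, ...]:
    """Eq. (9): G_r = h(SK_i) xor D*, S_r = h(ID_r || D* || G_r || t)."""
    d_new = h(new_sk)
    g_r = xor(h(old_sk), d_new)
    return (id_r, g_r, h(id_r, d_new, g_r, t), t)

def vehicle_key_update(packet: Tuple[bytes, ...], d_old: bytes) -> Optional[bytes]:
    """Eq. (10): D* = D xor G_r; return D* only if the signature matches."""
    id_r, g_r, s_r, t = packet
    d_new = xor(d_old, g_r)
    if hmac.compare_digest(s_r, h(id_r, d_new, g_r, t)):
        return d_new
    return None

# Hypothetical sample keys
old_key, new_key = b"shared key, epoch 1", b"shared key, epoch 2"
update = rsu_key_update(b"rsu-7", old_key, new_key, b"1700003600")
accepted = vehicle_key_update(update, h(old_key))
rejected = vehicle_key_update(update, h(b"stale or wrong key"))
print(accepted == h(new_key), rejected)
```

Only a vehicle that holds the current D can unmask $D^*$ and pass the signature check, so vehicles with an outdated or incorrect value reject the update.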

Table 1 Comparison of computation cost

| Schemes | Computation cost (ms) |
|---|---|
| Chuang and Lee [10] | 0.782 |
| Zhou et al. [12] | 3.54 |
| Wu et al. [13] | 6.116 |
| Proposed scheme | 0.411 |

Fig. 2 Comparison of computation cost

4 Performance Analysis

4.1 Computation Cost

The computation cost is the time taken to sign and verify a signature. In the proposed scheme, the computation cost for V2R, R2V and V2V communication is 0.411 ms, 0.274 ms and 0.411 ms, respectively. The computation cost of the V2V communication of the proposed scheme is compared to that of the related schemes in Table 1. As shown, the proposed scheme has 47.44%, 88.39% and 93.28% less computation cost than the Chuang and Lee [10], Zhou et al. [12] and Wu et al. [13] schemes, respectively. This is because the proposed scheme uses only XOR and one-way hash functions for signing and verification. To ensure unlinkability, multiple one-time pseudo IDs are used, which are generated by using the pseudo ID generation key $PIK_i$ that is updated periodically by TA. The comparison of the computation cost against the number of OBUs is depicted in Fig. 2. As shown, the proposed scheme has the lowest computation cost.

4.2 Communication Cost

The communication cost to establish V2R, R2V and V2V communication is 84 (20 + 20 + 20 + 20 + 4) bytes, 64 (20 + 20 + 20 + 4) bytes and 84 (20 + 20 + 20 + 20 + 4) bytes, respectively. The communication cost of the V2V authentication of the proposed scheme is compared to that of the related schemes in Table 2. As shown, the proposed scheme has 76, 96 and 64 bytes less communication cost than the Chuang and Lee [10], Zhou et al. [12] and Wu et al. [13] schemes, respectively. It is depicted against the number of messages in Fig. 3. As shown, the communication cost of the proposed scheme is 47.5%, 53.33% and 43.24% less than that of Chuang and Lee [10], Zhou et al. [12] and Wu et al. [13], respectively. As a result, compared to other current techniques, the proposed scheme has lower computation and communication overhead (Figs. 2 and 3).

Table 2 Comparison of communication cost

| Schemes | Communication cost (bytes) |
|---|---|
| Chuang and Lee [10] | 160 |
| Zhou et al. [12] | 180 |
| Wu et al. [13] | 148 |
| Proposed scheme | 84 |

Fig. 3 Comparison of communication cost
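The savings quoted in Sects. 4.1 and 4.2 follow directly from the values in Tables 1 and 2. The short sketch below recomputes them; the only inputs are the numbers reported in the paper, and the helper name `reduction_percent` is hypothetical.

```python
# Values copied from Table 1 (ms) and Table 2 (bytes)
COMPUTATION_MS = {
    "Chuang and Lee [10]": 0.782,
    "Zhou et al. [12]": 3.54,
    "Wu et al. [13]": 6.116,
}
COMMUNICATION_BYTES = {
    "Chuang and Lee [10]": 160,
    "Zhou et al. [12]": 180,
    "Wu et al. [13]": 148,
}
PROPOSED_MS = 0.411
V2V_FIELD_BYTES = [20, 20, 20, 20, 4]  # field sizes stated in Sect. 4.2
PROPOSED_BYTES = sum(V2V_FIELD_BYTES)

def reduction_percent(baseline: float, proposed: float) -> float:
    """Relative saving of the proposed scheme with respect to a baseline."""
    return 100.0 * (baseline - proposed) / baseline

comp_saving = {k: reduction_percent(v, PROPOSED_MS) for k, v in COMPUTATION_MS.items()}
comm_saving = {k: reduction_percent(v, PROPOSED_BYTES) for k, v in COMMUNICATION_BYTES.items()}
byte_saving = {k: v - PROPOSED_BYTES for k, v in COMMUNICATION_BYTES.items()}

for name in COMPUTATION_MS:
    print(f"{name:20s} computation -{comp_saving[name]:.2f}%  "
          f"communication -{comm_saving[name]:.2f}% ({byte_saving[name]} bytes)")
```

Recomputing the percentages this way is a quick consistency check between the tables and the text.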

5 Conclusion

In this paper, an identity-based conditional privacy-preserving authentication scheme is proposed. It has less overhead since it uses one-way hash functions and XOR operations for the signing and verification process. In order to overcome the vulnerabilities of the XOR operation, multiple one-time pseudo IDs and random numbers are used. It provides secure V2R, R2V and V2V communications. These communications are possible with the help of shared secret keys, which are updated periodically. The performance of the proposed scheme is compared to that of the related schemes and it is shown that the proposed approach requires less overhead in terms of computation and communication.


References

1. Liu, Y., Wang, Y., Chang, G.: Efficient privacy-preserving dual authentication and key agreement scheme for secure V2V communications in an IoV paradigm. IEEE Trans. Intell. Transp. Syst. 18(10), 2740–2749 (2017)
2. Lee, U., Zhou, B., Gerla, M., Magistretti, E., Bellavista, P., Corradi, A.: Mobeyes: smart mobs for urban monitoring with a vehicular sensor network. IEEE Wirel. Commun. 13, 52–57 (2006)
3. Raya, M., Hubaux, J.P.: Securing vehicular ad hoc networks. J. Comput. Secur. 15, 39–68 (2007)
4. Lu, R., Lin, X., Zhu, H., Ho, P.-H., Shen, X.: ECPP: efficient conditional privacy preservation protocol. In: Proceedings of the IEEE International Conference on Computer Communications, Phoenix, AZ, USA, pp. 1229–1237, 13–18 (2008)
5. Freudiger, J., Raya, M., Félegyházi, M., Papadimitratos, P., Hubaux, J.-P.: Mix-zones for location privacy in vehicular networks. In: Proceeding of the ACM Workshop Wireless Network Intelligence Transport Systems (WiN-ITS), pp. 1–7 (2007)
6. Lin, X., Sun, X., Ho, P.-H., Shen, X.: GSIS: a secure and privacy-preserving protocol for vehicular communications. IEEE Trans. Veh. Technol. 56, 3442–3456 (2007)
7. Zhang, C., Lin, X., Lu, R., Ho, P.-H.: RAISE: an efficient RSU-aided message authentication scheme in vehicular communication networks. In: Proceeding IEEE International Conference Communications, pp. 1451–1457 (2008)
8. Zhang, C., Lu, R., Lin, X., Ho, P.-H., Shen, X.: An efficient identity-based batch verification scheme for vehicular sensor networks. In: Proceeding of the 27th Conference Computing and Communication, pp. 246–250 (2008)
9. Zhang, C., Ho, P.-H., Tapolcai, J.: On batch verification with group testing for vehicular communications. Wirel. Netw. 17(8), 1851–1865 (2011)
10. Chuang, M.C., Lee, J.F.: TEAM: trust-extended authentication mechanism for vehicular ad hoc networks. IEEE Syst. J. 8, 749–758 (2014)
11. Zhou, Y., Zhao, X., Jiang, Y., Shang, F., Deng, S., Wang, X.: An enhanced privacy-preserving authentication scheme for vehicle sensor networks. Sensors 17(12), 2854 (2017)
12. Wu, L., Sun, Q., Wang, X., Wang, J., Yu, S., Zou, Y., Liu, B., Zhu, Z.: An enhanced privacy-preserving mutual authentication scheme for secure V2V communication in vehicular ad hoc network. IEEE Access 7, 55050–55063 (2019)
13. Jenefa, J., Mary Anita, E.A.: Secure authentication schemes for vehicular adhoc networks: a survey. Wirel. Pers. Commun. 123, 31–68 (2022)
14. Jenefa, J., Anita, E.A.M.: Identity-based message authentication scheme using proxy vehicles for vehicular ad hoc networks. Wirel. Netw. 27, 3093–3108 (2021)
15. Anita, E.A.M., Jenefa, J.: A survey on authentication schemes of VANETs. In: 2016 International Conference on Information Communication and Embedded Systems (ICICES), pp. 1–7 (2016)
16. Jenefa, J., Mary Anita, E.A.: An enhanced secure authentication scheme for vehicular ad hoc networks without pairings. Wirel. Pers. Commun. 106, 535–554 (2019)
17. Jenefa, J., Mary Anita, E.A.: Secure vehicular communication using ID based signature scheme. Wirel. Pers. Commun. 98, 1383–1411 (2018)
18. Mary Anita, E.A., Jenefa, J., Lakshmi, S.: A self-cooperative trust scheme against black hole attacks in vehicular ad hoc networks. Int. J. Wireless Mobile Comput. 21(1), 59–65, 10042733 (2021). https://doi.org/10.1504/IJWMC.2021.10042733

Secure Localization Techniques in Wireless Sensor Networks Against Routing Attacks Using Machine Learning Models

Gebrekiros Gebreyesus Gebremariam, J. Panda, and S. Indu

1 Introduction

Wireless sensor networks (WSNs) consist of many sensor nodes that form a self-organizing network with multihop transmission and standard functions for discovering the location of data [1]. The sensor nodes are randomly distributed, collect data from the environment for scientific and military applications, and have limited battery power and computational capacity [2]. Localization and positioning in WSNs are essential for identifying malicious attacks that degrade network lifetime and performance. The main objective of the proposed technique is to detect and localize routing attacks in WSNs, including wormhole, Sybil, blackhole, and sinkhole attacks. The Sybil attack is the most harmful routing attack: it is easily deployed in the network, degrades localization accuracy [3], and produces multiple identities that act as the cluster head, as shown in Fig. 1a. The Sybil nodes S act as beacon nodes and inject a malicious node M so that the normal nodes N receive broadcast messages from the malicious node. A wormhole attack disrupts the network by creating a tunnel between two malicious nodes and modifying the packets [4]; Fig. 1b shows the two malicious end nodes A and B.

G. G. Gebremariam (B) · J. Panda · S. Indu
Department of Electronics and Communication Engineering, Delhi Technological University, Shahbad Daulatpur, Main Bawana Road, Delhi 110042, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_52

Fig. 1 Illustration of a Sybil attack generating multiple identities and routing paths (a) and a wormhole attack tunnel in WSNs (b). Labels in the figure: Sybil node (S), malicious node (M1), anchor nodes, normal nodes (N1–N10), Sybil routing, normal routing, sensor node, and wormhole tunnel with end nodes A and B.

Table 1 Summary of related works for secure localization in WSNs

References   Technique                      Research findings
[5, 6]       Position estimation and ANN    Effective for detection of malicious node position
[7]          DV-hop method                  Detection and localization of wormhole
[8, 9]       Blockchain and vectorization   Improved localization of malicious nodes
[10, 11]     Bisection and hybrid ANFIS     Secure localization and better network lifetime

2 Literature Review

Robinson et al. [1] proposed a machine learning and 3-dimensional manifold localization technique for detecting and localizing unknown nodes to improve localization accuracy and fault detection. Chen et al. [2] presented a distance-weighted DV-Hop technique combined with chicken swarm optimization to maximize the localization accuracy of unknown nodes in WSNs. Giri et al. [3] proposed an information-theoretic technique for detecting and localizing Sybil attacks in WSNs using the entropy correlation coefficient. The rest of the related works are summarized in Table 1.

3 Network Model

The network model consists of N sensor nodes randomly deployed in a two-dimensional environment, assuming that sensor motion is random. The sensor nodes are organized into m beacon nodes and n ordinary sensor nodes. The sink node broadcasts information to all the nodes and selects the anchor nodes. Sink nodes and beacon nodes are aware of their own positions, as shown in Fig. 2a, and have higher energy, which enables them to locate and estimate the positions of the randomly deployed sensor nodes.

Fig. 2 Secure network model for clustering and localization of sensor nodes (a) and illustration of routing Sybil attack in wireless sensor networks (b) [12]. Labels in the figure: base station (BS), beacon nodes, sensor nodes, malicious nodes, routing path, Sybil nodes, anchor nodes, normal nodes, normal route, and Sybil route.

The attack model, depicted in Fig. 2b, shows how malicious nodes exhibit fake behavior by creating multiple identities that spoof the position and location of a legitimate node over various routing paths. The malicious attack compromises the beacon node by capturing and manipulating packets and by altering routing and energy information. Detecting the malicious sensor nodes provides the information needed to calculate the location and detection accuracy in WSNs based on beacon nodes. The beacon node is selected from the sensor nodes using the following criteria:

• The strength of the received signal and the list of sensor nodes close to it.
• The minimum distance of the nodes from the base station and remaining energy.

The distance D between any two sensor nodes is computed using the distance vector technique as shown below:

$$D = \sqrt{(u_i - u_j)^2 + (v_i - v_j)^2} \qquad (1)$$

where (u_i, v_i) and (u_j, v_j) are the coordinates of nodes i and j, respectively. When the distance from the base station to each node is computed, a node at a small distance is a likely candidate for cluster head. The distance vector localization procedure is essential for computing the position of the ordinary nodes. The distance vector approach, which calculates the minimum distance, was first identified by [12]; it is a range-free strategy [13, 14] with a series of steps including routing initialization, distance computation, and position estimation with the help of the beacon node.
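The distance computation in Eq. (1) and the idea of preferring a node close to the base station can be sketched in a few lines of Python. This is only an illustrative sketch: the node names, coordinates, and the helper `nearest_to_base_station` are assumptions made for the example, and the selection uses distance alone, whereas the chapter also considers signal strength and remaining energy.

```python
import math

def distance(a, b):
    """Euclidean distance between two nodes given as (u, v) tuples, Eq. (1)."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

def nearest_to_base_station(nodes, base_station):
    """Return the id of the node closest to the base station.

    `nodes` maps a node id to its (u, v) coordinates. Using distance alone
    is a simplification: signal strength and remaining energy, which the
    chapter also lists as criteria, are not modelled here.
    """
    return min(nodes, key=lambda node_id: distance(nodes[node_id], base_station))

# Hypothetical layout in metres inside a 1000 x 1000 m simulation area.
nodes = {"N1": (100.0, 200.0), "N2": (480.0, 510.0), "N3": (900.0, 50.0)}
base_station = (500.0, 500.0)
candidate = nearest_to_base_station(nodes, base_station)
```

In this toy layout the node `N2` is returned, since it lies only a few tens of metres from the base station.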

4 Materials and Methods

The proposed system, shown in Fig. 3a, has several phases and procedures: sensor deployment and data collection, position and location computation, data transmission and aggregation, data processing using feature engineering and sampling techniques, attack detection analysis, and classification, which together implement secure localization and attack detection in WSNs using machine learning models [1]. Hyperparameter tuning and Bayesian optimization techniques are used to enhance the performance of the hybrid machine learning classification models. K-means clustering is also used for binary classification and further improves the attack detection performance of the proposed scheme. The cluster head processes, verifies, and forwards the data to the base station.

Fig. 3 Proposed localization technique in WSNs using hybrid machine learning (ML) models (a) and frequency distribution of attacks in the benchmark datasets for evaluating the models (b). Labels in (a): sensor deployment, network traffic data collection, clusters 1 to n, selection and search of the best routing path, checking and updating traffic to the end user, hybrid optimization of cluster heads, localization process (position, location, and distance), verification process, data processing, training set, testing set, ML models, attack classification, attack detected, cluster labeling and k-means, and normal traffic data. Legend: Cs, number of clusters; S, source nodes; D, destination node; CHs, cluster heads; BS, base station.

4.1 Datasets and Machine Learning Models

The CICIDS2017 and UNSW_NB15 datasets are used as benchmarks for evaluating the effectiveness of the proposed system, with machine learning models classifying and detecting the classes of attacks. These datasets were captured with tools built for intrusion detection and network attack scenarios and contain seven categories of attacks; Fig. 3b shows the frequency distribution of the attacks in the samples. The UNSW_NB15 raw traffic packets were generated using the IXIA PerfectStorm tool in the cyber range laboratory of the Australian Centre for Cyber Security (ACCS) [15]. Data pre-processing starts with data cleaning and normalization, which improve the data quality for training and testing the predictive machine learning models. Normalization uses min-max scaling as in Eq. (2):

$$X_{norm} = (p - q)\,\frac{x_n - \min(x_n)}{\max(x_n) - \min(x_n)} \qquad (2)$$

where x_n is a given feature in the feature space X. Normalization is a scaling technique that squeezes the values of each attribute into a given range; p and q are the binary values for the maximum and minimum of that range, respectively. K-means cluster sampling is used to improve the classification accuracy of the machine learning models by generating a small number K of clusters from the original dataset, which reduces the training complexity [16]. Several machine learning (ML) techniques are used: extra trees (ET), random forest (RF), extreme gradient boosting (XGBoost), decision tree (DT), and ensemble stacking. They are applied to improve and evaluate the efficiency of the proposed system on the benchmark datasets with various evaluation metrics. Hyperparameter tuning with Bayesian optimization based on the tree-structured Parzen estimator (BO-TPE) is used to enhance the classification accuracy of the machine learning models by training and testing on the benchmark datasets.
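The min-max scaling of Eq. (2) can be illustrated with a small Python sketch. It follows the equation as printed, which coincides with the usual min-max scaling when q = 0; the function name, the sample feature values, and the handling of a constant column are assumptions made for the example and are not taken from the chapter.

```python
def min_max_normalize(values, p=1.0, q=0.0):
    """Scale one feature column following Eq. (2) with bounds p (max) and q (min).

    Assumption: a constant column (max == min) is mapped to zeros so the
    division is always defined; the chapter does not cover this case.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(p - q) * (x - lo) / (hi - lo) for x in values]

# Hypothetical packet-length feature from a traffic dataset.
packet_lengths = [40, 60, 1500, 770]
scaled = min_max_normalize(packet_lengths)
```

With the default bounds, the smallest value maps to 0, the largest to 1, and the midpoint 770 to 0.5.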

5 Experimental Setup

This section discusses the simulation configuration and the evaluation metrics. The sensor nodes and sink nodes are assumed to be static after deployment. Table 2 shows the simulation parameter settings. The network planning and simulation are conducted in MATLAB R2021a on a Windows machine with an Intel(R) Xeon(R) Silver 4214 CPU @ 2.20 GHz (2 processors), 128 GB of RAM, an x64-based processor, and a 64-bit operating system. Python Anaconda toolboxes are used for data processing and analysis to evaluate the performance of the proposed method using the benchmark datasets [5].

5.1 Performance Metrics

The proposed technique is assessed with several metrics: the localization error (LE), the average localization error (ALE), and the average localization accuracy (ALA) of the unknown sensor nodes, and, for the classification of attacks on the benchmark datasets, the confusion-matrix metrics accuracy, detection rate (recall), precision, and F1-score.

Table 2 Simulation parameters for secure localization in WSNs

Parameters        Values            Parameters            Values
Software          MATLAB            Transmission radius   250 m
Deployment        Random            Number of beacons     60
Number of nodes   300               Unknown nodes         240
Protocol          DV-Routing        Model                 Regular
Simulation area   1000 × 1000 m²    Mobility              Random

Table 3 Mathematical expressions of the various performance metrics

Localization metrics

$$LE = \sqrt{(u_i - \hat{u}_i)^2 + (v_i - \hat{v}_i)^2} \qquad (3)$$

$$ALE = \frac{\sum_{i=1}^{n} \sqrt{(u_i - \hat{u}_i)^2 + (v_i - \hat{v}_i)^2}}{nR} \qquad (4)$$

$$ALA = \left(1 - \frac{\sum_{i=1}^{n} \sqrt{(u_i - \hat{u}_i)^2 + (v_i - \hat{v}_i)^2}}{nR}\right) \times 100\% \qquad (5)$$

Classification metrics

$$Accuracy = \frac{T_P + T_N}{T_P + T_N + F_P + F_N} \qquad (6)$$

$$Detection\ rate = \frac{T_P}{T_P + F_N} \qquad (7)$$

$$F1\text{-}score = \frac{2 \times T_P}{2 \times T_P + F_P + F_N} \qquad (8)$$

$$Precision = \frac{T_P}{T_P + F_P} \qquad (9)$$

The localization metrics LE, ALE [12], and ALA are computed as in Eqs. (3)–(5) of Table 3. The LE is the distance between the estimated and actual positions of an unknown node. The ALE is the sum of the LE of all unknown nodes divided by the number of unknown nodes and normalized by the communication radius. Here (u_i, v_i) are the actual coordinates of unknown node i, (\hat{u}_i, \hat{v}_i) are the computed coordinates, n denotes the number of unknown nodes, and R is the radius of communication in the network. The classification metrics are expressed in Table 3 using Eqs. (6)–(9), where true positive (T_P) is the number of attacks correctly classified as attacks, true negative (T_N) is the number of legitimate nodes classified as legitimate, false positive (F_P) is the number of legitimate nodes incorrectly classified as malicious, and false negative (F_N) is the number of attacks incorrectly classified as legitimate in the network traffic.
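The metrics of Table 3 translate directly into code. The following Python sketch is one possible reading of Eqs. (3)–(9); the function names, the sample coordinates, and the confusion-matrix counts are hypothetical and serve only to show how the quantities relate.

```python
import math

def localization_error(actual, estimated):
    """LE of one node, Eq. (3): distance between actual and estimated (u, v)."""
    return math.hypot(actual[0] - estimated[0], actual[1] - estimated[1])

def average_localization_error(actual, estimated, radius):
    """ALE, Eq. (4): summed LE normalised by node count n and radio range R."""
    total = sum(localization_error(a, e) for a, e in zip(actual, estimated))
    return total / (len(actual) * radius)

def average_localization_accuracy(actual, estimated, radius):
    """ALA in percent, Eq. (5)."""
    return (1.0 - average_localization_error(actual, estimated, radius)) * 100.0

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, detection rate, F1-score and precision, Eqs. (6)-(9)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "detection_rate": tp / (tp + fn),
        "f1_score": 2 * tp / (2 * tp + fp + fn),
        "precision": tp / (tp + fp),
    }

# Hypothetical positions (metres) and confusion-matrix counts.
actual = [(0.0, 0.0), (100.0, 100.0)]
estimated = [(30.0, 40.0), (100.0, 100.0)]
ale = average_localization_error(actual, estimated, radius=250.0)
ala = average_localization_accuracy(actual, estimated, radius=250.0)
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```

Because the ALE is normalized by the communication radius R, it is a dimensionless fraction, which is why the ALA can be expressed as a percentage.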

6 Result and Discussion

The simulation results show that anchor nodes have more neighbors and higher connectivity than ordinary sensors, as depicted in Fig. 4a. Using the regular model, the average connectivity of the sensor network is 61, and the average number of neighbor nodes per anchor node is 3, as shown in Fig. 4b. Malicious nodes affect the node distribution and the localization accuracy by creating wrong positions and locations for the unknown sensor nodes in wireless sensor networks, as shown in Fig. 4c. They also mislead the routing paths and information of the sensor nodes, degrading the network service and performance. The root mean square error (RMSE) with 60 beacons and 240 unknown sensor nodes is 0.1908, as shown in Fig. 4d, which suggests that the scheme is resilient to these attacks.

Fig. 4 Simulated results of sensor deployment (a), neighbor relationship diagram (b), malicious node localization (c), and an error value of each unknown node using distance vector scheme (d)

6.1 Attack Detection Analysis

The performance of the proposed technique is evaluated with machine learning models on benchmark datasets containing classes of attacks in WSNs, as shown in Table 4. The average detection accuracy of the proposed scheme is further improved by applying the hybrid cluster labeling (CL) k-means binary classification technique, which achieves a detection accuracy of 100% on the benchmark datasets. The proposed technique compares favorably with the attack detection technique of Kasongo [15], which uses a random forest based on a genetic algorithm (RF-GA) and achieves an average detection accuracy of 87.61%, as shown in Fig. 5a. Yang et al. [16] developed a multi-tiered hybrid intrusion detection system (MTH-IDS) with an average detection accuracy of 99.88% using binary classification. Suleiman and Issac [17] evaluated an RF-IDS scheme that produced better detection accuracy on UNSW_NB15. These comparisons suggest that the proposed technique is effective for detecting and localizing attacks, as shown in Fig. 5b.

Table 4 Performance comparison of the proposed scheme based on machine learning models using the benchmark datasets (hybrid machine learning method results, values in %)

            CICIDS2017                                 UNSW_NB15
Classifier  Accuracy  Precision  Recall  F1-score      Accuracy  Precision  Recall  F1-score
XGB         99.82     99.83      99.75   99.86         99.82     99.78      99.74   99.78
RF          99.82     99.82      99.82   99.80         99.75     99.67      99.75   99.70
DT          99.82     99.91      99.82   99.85         99.68     99.63      99.68   99.66
ET          99.82     99.80      99.82   99.80         99.72     99.66      99.72   99.68
Ensemble    99.82     99.91      99.82   99.85         99.78     99.74      99.78   99.75
K-Means     100.00    100.00     100.00  100.00        100.00    100.00     100.00  100.00

Fig. 5 Performance comparison of the proposed scheme using the UNSW_NB15 (a), CICIDS2017 (b), and WSN-DS (c) benchmark datasets

Sun et al. [18] developed a hybrid deep learning-based intrusion detection system (DL-IDS) with an average detection accuracy of 98.67%, as shown in Fig. 5b. Upadhyay et al. [19] proposed gradient boosting feature selection-based intrusion detection systems (GBFS-IDS) for smart grids, as shown in Fig. 5c. Jiang et al. [20] proposed an intrusion detection system based on a secure light gradient boosting machine (IDSSLGBM) for detecting routing attacks, as shown in Fig. 5c. These comparisons suggest that the proposed scheme is effective against DoS attacks in WSNs across various applications on a benchmark dataset.

7 Conclusion

Secure network planning and data routing that enhance localization and the lifetime of WSNs are challenging in unattended environments. The proposed secure localization approach analyzes routing in WSNs and yields an average localization error of 0.1908 for the unknown nodes, which leads to effective detection of malicious nodes. Hybrid machine learning methods are applied to analyze and evaluate the performance of the proposed scheme. The hybrid cluster labeling k-means binary classification technique achieves an average detection accuracy of 100% on benchmark datasets with various classes of attacks.

References

1. Robinson, Y.H., Golden, S.V.E., Lakshmi, J.K.: 3-dimensional manifold and machine learning based localization algorithm for wireless sensor networks. Wirel. Pers. Commun. (2021). https://doi.org/10.1007/s11277-021-08291-9
2. Chen, J., Zhang, W., Liu, Z., Wang, R., Zhang, S.: CWDV-hop: a hybrid localization algorithm with distance-weight DV-Hop and CSO for wireless sensor networks. IEEE Access 9, 380–399 (2021). https://doi.org/10.1109/ACCESS.2020.3045555
3. Giri, A., Dutta, S., Neogy, S.: Information-theoretic approach for secure localization against sybil attack in wireless sensor network. J. Ambient Intell. Humaniz. Comput. (2020). https://doi.org/10.1007/s12652-020-02690-9
4. Singh, M.M., Dutta, N., Singh, T.R., Nandi, U.: A technique to detect wormhole attack in wireless sensor network using artificial neural network, vol. 53. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-5258-8_29
5. Chen, H., Lou, W., Wang, Z., Wu, J., Wang, Z., Xi, A.: Securing DV-Hop localization against wormhole attacks in wireless sensor networks. Pervasive Mob. Comput. 16(PA), 22–35 (2015). https://doi.org/10.1016/j.pmcj.2014.01.007
6. Hasan, B., Alani, S., Saad, M.A.: Secured node detection technique based on artificial neural network for wireless sensor network. Int. J. Electr. Comput. Eng. 11(1), 536–544 (2021). https://doi.org/10.11591/ijece.v11i1.pp536-544
7. Farjamnia, G., Gasimov, Y., Kazimov, C.: An improved DV-hop for detecting wormhole attacks in wireless sensor networks 9(1), 1–24 (2020)
8. Goyat, R., Kumar, G., Rai, M.K., Saha, R., Thomas, R., Kim, T.H.: Blockchain powered secure range-free localization in wireless sensor networks. Arab. J. Sci. Eng. 45(8), 6139–6155 (2020). https://doi.org/10.1007/s13369-020-04493-8
9. Li, X., Yan, L., Pan, W., Luo, B.: Secure and robust DV-hop localization based on the vector refinement feedback method for wireless sensor networks. Comput. J. 60(6), 810–821 (2017). https://doi.org/10.1093/comjnl/bxx002
10. Beko, M., Tomic, S.: Toward secure localization in randomly deployed wireless networks. IEEE Internet Things J. 8(24), 17436–17448 (2021). https://doi.org/10.1109/JIOT.2021.3078216
11. Kavitha, V.P., Katiravan, J.: Localization approach of FLC and ANFIS technique for critical applications in wireless sensor networks. J. Ambient Intell. Humaniz. Comput. 12(5), 4785–4795 (2021). https://doi.org/10.1007/s12652-020-01888-1
12. Dong, S., Zhang, X.G., Zhou, W.G.: A security localization algorithm based on DV-hop against sybil attack in wireless sensor networks. J. Electr. Eng. Technol. 15(2), 919–926 (2020). https://doi.org/10.1007/s42835-020-00361-5
13. Messous, S., Liouane, H.: Online sequential DV-hop localization algorithm for wireless sensor networks. Mob. Inf. Syst. 2020 (2020). https://doi.org/10.1155/2020/8195309
14. Hadir, A., Zine-Dine, K., Bakhouya, M., El Kafi, J.: An improved DV-Hop localization algorithm for wireless sensor networks. In: International Conference on Next Generation Networks and Systems NGNS, pp. 330–334 (2014). https://doi.org/10.1109/NGNS.2014.6990273
15. Kasongo, S.M.: An advanced intrusion detection system for IIoT based on GA and tree based algorithms. IEEE Access 9, 113199–113212 (2021). https://doi.org/10.1109/ACCESS.2021.3104113
16. Yang, L., Moubayed, A., Shami, A.: MTH-IDS: a multitiered hybrid intrusion detection system for internet of vehicles. IEEE Internet Things J. 9(1), 616–632 (2022). https://doi.org/10.1109/JIOT.2021.3084796
17. Suleiman, M.F., Issac, B.: Performance comparison of intrusion detection machine learning classifiers on benchmark and new datasets. In: 28th International Conference Computer Theory Applications ICCTA 2018, Proceeding, pp. 19–23 (2018). https://doi.org/10.1109/ICCTA45985.2018.9499140
18. Sun, P. et al.: DL-IDS: extracting features using CNN-LSTM hybrid network for intrusion detection system. Secur. Commun. Netw. 2020 (2020). https://doi.org/10.1155/2020/8890306
19. Upadhyay, D., Manero, J., Zaman, M., Sampalli, S.: Learning classifiers for intrusion detection on power grids. IEEE Trans. Netw. Serv. Manag. 18(1), 1104–1116 (2021)
20. Jiang, S., Zhao, J., Xu, X.: SLGBM: an intrusion detection mechanism for wireless sensor networks in smart environments. IEEE Access 8, 169548–169558 (2020). https://doi.org/10.1109/ACCESS.2020.3024219

Pothole Detection Approach Based on Deep Learning Algorithms

Y. Aneesh Chowdary, V. Sai Teja, V. Vamsi Krishna, N. Venkaiah Naidu, and R. Karthika

1 Introduction

Potholes are noticeable structural failures in the surface of a street, caused by traffic and bad weather. They have become one of the major defects of Indian roads. Potholes not only make streets look unattractive but also put the safety of road users at risk. Potholes are a bane to vehicles and can cause significant injuries to anyone caught in them. They are hazardous for drivers as well as for pedestrians on footpaths, bicyclists, and road workers. According to recent estimates from a few state administrations, potholes cause roughly 30 deaths per day on the streets. Compared with 2016, the death toll in 2017 climbed by more than 50%, to 3597 instances in the year [1]. It is critical to detect and fix potholes correctly in order to reduce the number of accidents and other losses. However, manually detecting potholes is not recommended because it is expensive and time-consuming. As a result, extensive research has been conducted to develop systems that detect potholes, a significant step toward increasing the efficiency of surveys and the quality of pavements through inspection, observation, and quick reaction. Human error is one cause of road accidents, since drivers are unable to spot potholes and make rash judgments. To combat this, the advanced driver assistance system (ADAS) assists drivers in spotting possible hazards and dangers ahead of time. ADAS is also in charge of managing the vehicle's control, balance, and movement in dangerous situations. Through a careful interface, ADAS has increased automobile and road safety [2]. To improve safety and reduce road deaths, technologies have been developed that warn the driver of possible dangers, provide protection, or even take control of the car in an emergency. Potholes create a variety of obstacles for road users. In such instances, it appears that the only option is to contact or complain to the appropriate authorities. These authorities are not always readily available or quick enough, which may worsen the problem. As a result, an automated solution is required to assist the responsible authorities in managing pothole issues effectively. A real-time application is provided to meet this requirement by automating the monitoring process using the latest deep learning algorithms and image processing techniques [3]. Using deep neural networks, it has been feasible to identify potholes reliably [4–8].

Y. A. Chowdary · V. S. Teja · V. V. Krishna · N. V. Naidu · R. Karthika (B)
Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_53

2 Literature Survey

The work in [9] describes a two-stage process in which the model first localizes the regions that are most likely to contain potholes, then increases the resolution of these regions and concentrates on the segments that differentiate areas with potholes from areas without. Aparna et al. [10] describe a CNN-based 2D vision method for identifying potholes with two primary networks: a localization convolutional neural network (LCNN) and a part classification network (PCNN). The LCNN, which has a high recall, locates the region most likely to contain potholes, and the PCNN uses classification to predict potholes in that region. The LCNN and PCNN are trained separately for 100 epochs on images scaled to 352 × 224. The major objective of the paper [11] is to construct a deep neural network that identifies images with potholes, to create an end-to-end framework, and to develop an Android application in which the model is deployed. Images of streets were gathered with mobile cameras in daylight along major streets and narrow lanes, and a dataset was created from them. For pothole localization, a CNN model is presented and tested with the images. The Google Maps Application Programming Interface (API) is used to generate real-time pothole-marked maps [5, 12–14]. The pothole application handles pothole detection and also helps find the location of the image on the map using the framework. Another paper describes how to develop a real-time automated application that aids authorities in effectively addressing the problems of waste, open manholes, and potholes [3]. To automate the monitoring procedure, powerful artificial intelligence and image processing techniques are used. The dataset was constructed by obtaining 700 photographs from Google Images for the model's classes, which are manholes, potholes, and garbage. Users may contribute images of potholes and garbage and post them with their exact location through the Android application. For picture validation, a classifier is created with F-RCNN. This information is then recorded in a database.


The model used is an F-RCNN with an Inception-v2 network [4]. That paper compares the performance, accuracy, and detection time of F-RCNN, SSD, and the YOLO algorithm and discusses their differences. Another suggested approach uses You Only Look Once version 2 (YOLO v2) and a convolutional neural network to detect potholes [13]. The features of the training and testing images are extracted using the pretrained CNN ResNet50. In the study [15], a mask region-based CNN is presented as a deep learning approach for detecting and segmenting potholes in order to compute their area. The dataset comprises 291 images that were personally gathered on streets in Mumbai and on neighboring roads. It is manually labeled with the publicly accessible VGG annotation tool. Potholes are recognized as regions of interest (ROI) using the Mask Region-Based CNN (Mask RCNN). The area of a pothole is then determined from the generated region of interest, and the calculated areas are compared with the actual ones. The research in [16] proposes a deep learning-based approach for detecting potholes faster from photos, lowering risk and trouble. Deep learning and Faster-RCNN with Inception-v2 are the main components of this model.

3 Methodology

3.1 R-CNN

The faster region-based convolutional neural network (F-RCNN) comprises three networks: the feature network, the region proposal network (RPN), and the detection network. The feature network extracts features from the images, usually with a feature extractor such as ResNet50 or Inception v3. Once the features are generated, execution passes to the region proposal network, which generates object proposals by estimating the areas of the image where objects are most likely to exist. The RPN is a three-layer convolutional network in which one layer is used for classification and the remaining layers perform regression, which provides the bounding box coordinates. Finally, the detection network produces the class of the object detected within each bounding box [1, 10, 17–19]. The R-CNN model is trained using the ResNet101 feature extractor on the augmented dataset, which is split for training, testing, and validation. In each epoch, the precision and recall values are printed and stored while the model is trained. Figure 1 shows the block diagram of pothole detection for R-CNN.


Fig. 1 Block diagram of pothole detection for R-CNN

3.2 YOLO

YOLO is a real-time object detection system that detects a variety of objects in a fraction of the time taken by earlier detectors. It also recognizes objects faster and more precisely than previous recognition systems. YOLO was designed to improve on time-consuming two-stage object detectors such as F-RCNN, which remain slow even when running on a GPU. Single-stage detectors such as YOLO are fast and can attain hyper-real-time performance on a GPU. Since YOLO is trained to perform classification and regression at the same time, it takes less time for prediction [17]. YOLOv5 consists of two parts: the model backbone and the model head. The model backbone extracts the informative features of the input image and reduces the number of network parameters to prevent overfitting. The model head generates the class probabilities, the localization vectors, and the objectness scores of the detected target objects. An activation function in the final detection layer then activates neurons based on their weights. To automatically find the optimum anchor bounding boxes for the dataset and apply them during training, YOLOv5 employs a k-means clustering technique with various k values.

YOLOv5 is the successor of YOLOv4 and is comparable to it in performance, implementation, and design. However, the YOLOv5 models are substantially smaller, faster to train, and more practical to use in real-world applications. YOLOv5 provides four model scales, S, M, L, and X, which stand for small, medium, large, and extra-large, respectively. Each scale multiplies the depth and width of the model by a different factor, so the general structure of the model remains the same while its size changes. In this paper, we implemented the YOLOv5s and YOLOv5m models and compared their performance. The YOLOv5 implementation reports metrics compatible with the Microsoft Common Objects in Context (COCO) API at three object sizes (bounding box areas) and at several Intersection over Union (IoU) thresholds, which are the most important quantities for this study. Calculating the values at specified scales gives a good indication of the model's performance, although it may be somewhat inaccurate in extreme circumstances, which is not an issue in most cases.
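Since Intersection over Union (IoU) underlies the COCO-style metrics mentioned above, a minimal Python sketch of the computation may help. The box format (x_min, y_min, x_max, y_max), the sample boxes, and the 0.5 matching threshold are assumptions for illustration, not values taken from this study.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)

# Hypothetical ground-truth and predicted pothole boxes in pixels.
ground_truth = (50, 50, 150, 150)
prediction = (100, 100, 200, 200)
overlap = iou(ground_truth, prediction)
is_match = overlap >= 0.5  # 0.5 is a common, but not universal, threshold
```

A prediction is typically counted as a true positive only when its IoU with a ground-truth box reaches the chosen threshold, which is how precision and recall for detectors are tied to localization quality.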

3.3 Dataset

Potholes on Indian roads differ from those found in other countries. As a result, a new dataset is needed that accurately depicts contemporary Indian road conditions. Our team acquired 665 photographs from a variety of perspectives, distances, and lighting conditions. The photographs were collected in different weather conditions, such as rainy, cloudy, and sunny. The annotation was done using LabelImg because it can label pictures in YOLO format. Sample images of the dataset are shown in Fig. 2. Data augmentation is a strategy for increasing the quantity of available data by generating new samples from the current dataset. It improves the dataset by creating a variety of images with different filters applied. The following augmentations were performed:

• Rotation: the image is rotated clockwise by 15° and 90°.
• Horizontal flip: used for photographs that look almost the same when flipped horizontally.
• Gaussian noise: noise is injected into the picture.
• Blurring: nearby pixels are averaged, which blurs the image and lowers the amount of detail.

With augmentation the dataset grows larger and the model's training improves. The 665-image dataset was turned into a 1995-image dataset using these augmentation approaches, and object detection algorithms were employed to find the potholes. The dataset is divided at random into train, validation, and test sets: 70% for training, 20% for validation, and 10% for testing.
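The random 70/20/10 split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the file names are placeholders, the seed is arbitrary, and the chapter does not say which tool or procedure was actually used for splitting.

```python
import random

def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle `items` and split them into train, validation and test lists.

    The test set receives whatever remains after the train and validation
    cuts, so rounding never drops an item.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))
    n_val = int(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hypothetical file names standing in for the 1995 augmented images.
image_names = [f"pothole_{i:04d}.jpg" for i in range(1995)]
train, val, test = split_dataset(image_names)
```

Fixing the seed keeps the split reproducible, so the same images end up in the test set each time the experiment is rerun.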


Y. A. Chowdary et al.

Fig. 2 Sample images of the dataset

Fig. 3 Detection of potholes by using R-CNN model

4 Results

4.1 R-CNN

The results are evaluated using mean Average Precision. Faster R-CNN with the ResNet101 feature extractor achieved a precision of 63.9% and a recall of 75%. Figure 3 shows the detection of potholes by the R-CNN model.

4.2 YOLO

The YOLOv5s algorithm achieved a precision of 82% and a recall of 71.5%, whereas Faster R-CNN achieved an average precision of 63.9% and an average recall of 75%. Figure 4 shows the detection of potholes by the YOLO model.

Pothole Detection Approach Based on Deep Learning Algorithms


Fig. 4 Detection of potholes by using YOLO model

Figure 5 shows the loss graph and the mAP graph obtained after testing the trained YOLOv5 model; the average loss and the mean average precision are obtained after validation on the dataset. Table 1 shows the metrics for the F-RCNN and YOLOv5 algorithms. These metrics are calculated as follows. Precision and Recall are obtained by testing the trained model. Average Precision and Average Recall are the averages of the precision and recall values, respectively. The F1 score is calculated from the average precision P and the average recall R by the formula

$$F1 = \frac{2 \cdot P \cdot R}{P + R} \qquad (1)$$
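As a quick check of Eq. (1), the sketch below evaluates the formula on the average precision and recall values that Table 1 reports for the two YOLOv5 models. The function name `f1_score` is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, as in Eq. (1)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Average precision and recall reported in Table 1.
print(round(f1_score(0.82, 0.72), 2))  # YOLOv5s -> 0.77
print(round(f1_score(0.84, 0.74), 2))  # YOLOv5m -> 0.79
```

Both results agree with the F1 scores listed in Table 1 for YOLOv5s and YOLOv5m.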

Mean average precision is calculated by using precision values and given by the formula,

Fig. 5 Precision versus recall graph for YOLOv5 algorithm


Table 1 Metrics table for F-RCNN and YOLOv5 algorithm

| Metrics | F-RCNN with ResNet101 | YOLOv5s | YOLOv5m |
|---|---|---|---|
| Average precision | 0.78 | 0.82 | 0.84 |
| Mean average precision (mAP) | 0.51 | 0.74 | 0.75 |
| Average recall | 0.70 | 0.72 | 0.74 |
| F1 score | 0.69 | 0.77 | 0.79 |
| Loss | 0.04 | 0.03 | 0.03 |
| Time | 1.98 s | 0.01 s | 0.02 s |

$$mAP = \frac{1}{N} \sum_{i=1}^{N} P_i \qquad (2)$$

where $P_i$ is the $i$-th precision value and $i$ ranges from 1 to $N$.
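Equation (2) is a plain arithmetic mean, which the following sketch makes explicit. The input values are hypothetical and serve only to illustrate the calculation; they are not results from the paper.

```python
def mean_average_precision(precisions):
    """Arithmetic mean of N precision values, as in Eq. (2)."""
    if not precisions:
        raise ValueError("at least one precision value is required")
    return sum(precisions) / len(precisions)


# Hypothetical precision values, used only to illustrate the calculation.
print(mean_average_precision([0.5, 0.75, 1.0]))  # 0.75
```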

Loss measures how far the model's predictions are from the ground truth; in other words, every faulty prediction contributes to the loss. A loss of 0 indicates a perfect model, whereas larger values indicate poorer predictions. Time is the average time taken by the trained model to run detection on an image.

5 Conclusion

Fast and accurate pothole detection by object detection techniques plays a significant role in preventing accidents and other problems on roads. For this reason, a model that employs convolutional neural networks to detect potholes was developed. This paper proposes four distinct algorithms, which are then compared. The algorithms Faster R-CNN and YOLOv5 are implemented. The size of the dataset was increased from 665 to 1995 images using data augmentation techniques. ResNet101 is used as the feature extractor for Faster R-CNN, and its performance is compared. The pre-trained weights for the convolutional layers in YOLOv5 come from the reduced configuration of Darknet-53. YOLOv5s and YOLOv5m are the YOLO variants implemented and compared. Although each algorithm has its own strengths and weaknesses, YOLOv5m outperforms the other designs in terms of accuracy. Using the live front-view camera employed by Advanced Driver-Assistance Systems, this pothole detection model can alert the driver through real-time detection. Potholes may, however, go unnoticed and lead to false negatives due to factors such as insufficient lighting, water-covered potholes, and high-speed vehicle movement. The model can also produce false positives as a result of different kinds of shadows and the various shapes that potholes take. To overcome these challenges, additional cameras can be installed, and features more specific to potholes can be included in the recommended model. The data obtained by the system can also be stored in a database. This database may be linked with an application that pins pothole locations or helps Google Maps generate directions and routes with fewer potholes.


Design of 7:2 Approximate Compressor Using Reversible Majority Logic Gate

Vidya Sagar Potharaju and V. Saminadan

1 Introduction

Power dissipation is being researched more and more in the context of integrated circuit design, since it is considered one of the primary barriers to achieving high performance. Tolerating computational errors in error-resilient applications such as multimedia signal processing, pattern recognition, and machine learning is a promising way to reduce power consumption and improve system performance [2]. VLSI systems can benefit from approximate computing because it reduces power consumption and improves performance [3]. Extensive research has been done on approximate computer arithmetic circuits based on complementary metal oxide semiconductor (CMOS) technology. Specifically, an HDL is required to describe the function and timing specifications of the gate. Multi-input logic operations can be performed using the majority gate shown in Fig. 1; the logic expression for a three-input majority gate may be written as [4]:

$$F = M(A, B, C) = AB + BC + AC \qquad (1)$$
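The behavior of Eq. (1) can be checked by enumerating its truth table. The sketch below is only an illustration in Python; the function name `majority` is an assumed name, and bits are modeled as the integers 0 and 1.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority function M(A, B, C) = AB + BC + AC on bits 0/1."""
    return (a & b) | (b & c) | (a & c)


# Print the full truth table: the output is 1 whenever at least two inputs are 1.
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", majority(a, b, c))
```

The output equals 1 exactly when two or more inputs are 1, which is why the gate is also called a 3-input voter.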

Fig. 1 Majority gate (3-input voter)

This majority logic gate is useful for improving power consumption and performance in error-tolerant applications. Reversible logic, on the other hand, is employed to reduce the power dissipation of arithmetic operations built from digital logic gates in VLSI system design. These methods are used in applications such as quantum computing, nanotechnology, and low-power CMOS operations, among others. For reversible quantum procedures, this approach has the highest priority. Previous works [5–9] present different approximate compressor designs, including 4:2, 5:2, and 7:2 compressor adders, and report their differences in terms of power, area, and delay. By combining both techniques, i.e., by designing a majority logic gate with reversible logic, better results can be obtained in arithmetic operations. The proposed work builds and synthesizes all of the reversible logic gates and then picks the reversible logic gate with the best performance for use in the majority gate. The structure of this paper is as follows: Sect. 2 discusses the proposed reversible majority logic gate, Sect. 3 discusses the proposed reversible majority logic gate using the 4:2 compressor, and Sect. 4 discusses the suggested technique of the reversible logic gate using the 7:2 compressor. Section 5 provides the results for the majority logic gates and approximate adders, as well as their implementation. Section 6 concludes this study.

V. S. Potharaju (B) · V. Saminadan, Department of ECE, Puducherry Technological University, Puducherry 14, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_54

2 Proposed Reversible Majority Logic Gate

In addition to being relevant to several new nanotechnologies, majority logic (ML) and its fundamental building block (the 3-input majority logic gate, or majority voter circuit) have been widely employed in digital circuit design. In this study, we propose different designs of approximate adders using ML [9]. Reversible logic is mostly utilized to minimize noise in logic gates, and for this reason, we propose that the majority logic gate be constructed using reversible logic gates. For this module of work, nine distinct reversible logic gates were designed for performance assessment: the FEYNMAN gate, the FREDKIN gate, the HNG gate, the MKG gate, the PERES gate, the SGG gate, the TOFFOLI gate, the TR gate, and the TSG gate [10]. All of these reversible logic gates were implemented on a Xilinx Virtex-5 FPGA and compared in terms of slice LUTs, occupied slice registers, IOBs, delay, and power consumption. Finally, the best-performing and most appropriate reversible logic gates are used for the majority logic gate. Table 1 shows the results for the nine reversible logic gates that were evaluated. As seen in Fig. 2, the suggested majority gate uses reversible logic, which results in a low LUT count, a low slice register count, and a low IOB count after synthesis. The TOFFOLI gate and the FEYNMAN gate are the logic gates of choice for the proposed majority gate because these two logic gates are using much less


Table 1 Comparisons of reversible logic gate (synthesized Xilinx)

| Gate | Slice LUT | Occupied slice | IOB bonded | Delay (ns) | Power (mW) |
|---|---|---|---|---|---|
| FEYNMAN gate | 1 | 1 | 4 | 3.696 | 3.294 |
| FREDKIN gate | 2 | 1 | 6 | 3.819 | 3.294 |
| HNG gate | 2 | 2 | 8 | 3.871 | 3.294 |
| MKG gate | 2 | 2 | 8 | 3.875 | 3.294 |
| PERES gate | 2 | 1 | 6 | 3.813 | 3.294 |
| SGG gate | 2 | 2 | 8 | 3.813 | 3.294 |
| TOFFOLI gate | 1 | 1 | 6 | 3.809 | 3.294 |
| TR gate | 2 | 1 | 6 | 3.813 | 3.294 |
| TSG gate | 3 | 3 | 8 | 3.881 | 3.294 |

Fig. 2 Proposed majority gate using reversible logic

slice LUT and occupied slice resources than all other reversible logic gates. The suggested majority gate architecture is shown in Fig. 2, and it is developed in accordance with the majority gate logic of Eq. (1). The architecture of Fig. 2 includes three TOFFOLI gates and two Feynman gates. The TOFFOLI gates provide the AND functionality of Eq. (1), forming the products of A and B, B and C, and A and C, whereas the Feynman gates provide the OR functionality of Eq. (1), combining these three product terms into the majority output.
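One way to see how three TOFFOLI gates and two Feynman gates can realize Eq. (1) is sketched below. It assumes the textbook gate definitions (Toffoli: C becomes C XOR AB; Feynman: B becomes A XOR B) and a hypothetical wiring with ancilla inputs tied to 0; the actual connections of Fig. 2 are not reproduced here. Because the three product terms AB, BC, and AC are never exactly two at a time equal to 1, their XOR coincides with their OR.

```python
from itertools import product


def toffoli(a, b, c):
    """Textbook Toffoli gate: (A, B, C) -> (A, B, C XOR (A AND B))."""
    return a, b, c ^ (a & b)


def feynman(a, b):
    """Textbook Feynman (CNOT) gate: (A, B) -> (A, A XOR B)."""
    return a, a ^ b


def reversible_majority(a, b, c):
    # Three Toffoli gates with the target tied to 0 yield the product terms.
    _, _, ab = toffoli(a, b, 0)
    _, _, bc = toffoli(b, c, 0)
    _, _, ac = toffoli(a, c, 0)
    # Two Feynman gates combine the product terms (XOR equals OR for these terms).
    _, partial = feynman(ab, bc)
    _, out = feynman(partial, ac)
    return out


# Compare against the reference expression AB + BC + AC for all 8 inputs.
for a, b, c in product((0, 1), repeat=3):
    assert reversible_majority(a, b, c) == (a & b) | (b & c) | (a & c)
print("matches Eq. (1) for all inputs")
```

The unused gate outputs in this sketch correspond to the garbage signals that reversible designs try to keep small.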


Fig. 3 Schematic diagrams of proposed approximate 4:2 compressors: (a) RMLAC11, (b) RMLAC22-1, (c) RMLAC22-2, (d) RMLAC12-1, (e) RMLAC12-2, and (f) RMLAC21

3 Reversible Majority Logic Gate Using 4:2 Compressors

In general, a 4:2 compressor consists of two full adders, referred to as modules M1 and M2 from the top to the bottom of the compressor. In place of the exact full adders, 1-bit RMLAFAs (Reversible Majority Logic-based Approximate Full Adders) are used. A total of six candidate designs are studied by combining RMLAFA1 and RMLAFA2 in different ways (Fig. 3). Module 1 of RMLAC21 (Reversible Majority Logic-based Approximate Compressor) uses RMLAFA2, while module 2 uses RMLAFA1. RMLAC11 has also been suggested. When RMLAFA2 is used as module 2, two schemes may be used: RMLAC22-1 and RMLAC12-1 take the carry input of the compressor and use its negation to compute the Sum, whereas RMLAC22-2 and RMLAC12-2 use the output of module 1 as the carry. As a result, various approximate 4:2 compressors based on 1-bit RMLAFAs (with input values of 5 and 4, 3 and 2, and output values of sum, cout, and carry) are constructed. In a similar vein, the suggested reversible majority logic gates, rather than traditional logic gates, were used to build the proposed 4:2 compressors. As a result of these modifications, the number of LUTs and garbage signals in the 4:2 compressor approximate adder decreased. Table 2 presents a comparative analysis of the 4:2 compressor using conventional and reversible majority gates, and Fig. 4 displays the corresponding comparison chart.

4 Proposed Reversible Majority Logic Gate Using 7:2 Compressors

Using the suggested reversible majority gate design, the following reversible majority logic approximate compressor (RMLAC) designs were used to construct


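Table 2 Comparisons of 4:2 compressor using conventional and reversible majority gate (synthesized on Xilinx Virtex-5 XC5VLX330-2FF1760)

| Majority gate | Design | Slice LUT | Occupied slice | IOB bonded | Delay (ns) | Power (mW) |
|---|---|---|---|---|---|---|
| Conventional majority logic gate | MLAC11 | 2 | 1 | 8 | 5.385 | 3.294 |
| Conventional majority logic gate | MLAC22-1 | 1 | 1 | 8 | 4.771 | 3.294 |
| Conventional majority logic gate | MLAC22-2 | 2 | 1 | 8 | 5.273 | 3.294 |
| Conventional majority logic gate | MLAC12-1 | 2 | 1 | 8 | 4.884 | 3.294 |
| Conventional majority logic gate | MLAC12-2 | 2 | 1 | 8 | 4.391 | 3.294 |
| Conventional majority logic gate | MLAC21 | 1 | 1 | 8 | 5.273 | 3.294 |
| Reversible majority logic gate | RMLAC11 | 3 | 3 | 8 | 4.032 | 3.294 |
| Reversible majority logic gate | RMLAC22-1 | 1 | 1 | 8 | 4.028 | 3.294 |
| Reversible majority logic gate | RMLAC22-2 | 2 | 2 | 8 | 4.028 | 3.294 |
| Reversible majority logic gate | RMLAC12-1 | 2 | 2 | 8 | 4.028 | 3.294 |
| Reversible majority logic gate | RMLAC12-2 | 3 | 3 | 8 | 4.028 | 3.294 |
| Reversible majority logic gate | RMLAC21 | 2 | 2 | 8 | 4.032 | 3.294 |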

Fig. 4 Comparisons analysis chart of 4:2 compressor using conventional and reversible majority logic gates


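Table 3 Comparisons of 7:2 compressor using conventional and reversible majority logic gates

| Majority gate | Design | Slice LUT | Occupied slice | IOB bonded | Delay (ns) | Power (mW) |
|---|---|---|---|---|---|---|
| Conventional majority logic gate | MLAC11 | 2 | 1 | 10 | 5.883 | 3.294 |
| Conventional majority logic gate | MLAC22-1 | 2 | 1 | 10 | 5.193 | 3.294 |
| Conventional majority logic gate | MLAC22-2 | 2 | 1 | 10 | 5.770 | 3.294 |
| Conventional majority logic gate | MLAC12-1 | 2 | 1 | 10 | 5.306 | 3.294 |
| Conventional majority logic gate | MLAC12-2 | 2 | 1 | 10 | 4.889 | 3.294 |
| Conventional majority logic gate | MLAC21 | 2 | 1 | 10 | 5.845 | 3.294 |
| Reversible majority logic gate | RMLAC711 | 3 | 2 | 10 | 4.566 | 3.294 |
| Reversible majority logic gate | RMLAC722-1 | 2 | 1 | 10 | 4.328 | 3.294 |
| Reversible majority logic gate | RMLAC722-2 | 2 | 1 | 10 | 4.605 | 3.294 |
| Reversible majority logic gate | RMLAC712-1 | 2 | 1 | 10 | 4.561 | 3.294 |
| Reversible majority logic gate | RMLAC712-2 | 3 | 2 | 10 | 4.565 | 3.294 |
| Reversible majority logic gate | RMLAC721 | 3 | 2 | 10 | 4.561 | 3.294 |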

the 7:2 compressor: RMLAC711, RMLAC722-1, RMLAC722-2, RMLAC712-1, RMLAC712-2, and RMLAC721. Table 3 compares the 7:2 compressors built with the conventional majority gate against those built with the reversible majority gate. Compared with the conventional design, the reversible majority gate design has a much lower delay, and certain RMLAC variants require fewer LUTs and occupied slice registers than the traditional ones. The RMLAC design has six separate variants, designated RMLAC711, RMLAC722-1, RMLAC722-2, RMLAC712-1, RMLAC712-2, and RMLAC721. The 7:2 compressor adder was built and tested in all of these RMLAC variants to demonstrate its performance. Each of the six designs is built from three reversible majority gates and has its own sum and carry generation procedure, given by the following equations. For RMLAC711, the sum, carry, and cout are generated as:

$$\mathrm{Sum} = \sim M(P_7, P_6, M(P_4, P_5, \sim M(P_1, P_2, P_3)))$$
$$\mathrm{Carry} = M(P_7, P_6, M(P_4, P_5, \sim M(P_1, P_2, P_3)))$$
$$\mathrm{Cout} = M(P_1, P_2, P_3)$$
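The RMLAC711 equations can be evaluated directly in software. The following is a behavioral sketch only, with bits modeled as 0/1 integers and hypothetical function names; it says nothing about the reversible gate-level implementation or its synthesis results.

```python
from itertools import product


def M(a, b, c):
    """Three-input majority on bits 0/1."""
    return (a & b) | (b & c) | (a & c)


def inv(x):
    """Bitwise negation of a single bit."""
    return 1 - x


def rmlac711(p1, p2, p3, p4, p5, p6, p7):
    """Behavioral model of the RMLAC711 equations: returns (sum, carry, cout)."""
    cout = M(p1, p2, p3)
    inner = M(p4, p5, inv(cout))
    carry = M(p7, p6, inner)
    return inv(carry), carry, cout


# In this variant, Sum is by construction the complement of Carry.
for bits in product((0, 1), repeat=7):
    s, carry, cout = rmlac711(*bits)
    assert s == 1 - carry

print(rmlac711(0, 0, 0, 0, 0, 0, 0))  # (1, 0, 0)
print(rmlac711(1, 1, 1, 1, 1, 1, 1))  # (0, 1, 1)
```

The three majority evaluations in `rmlac711` correspond to the three majority gates that each design is stated to use.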


The following equations give the design of the 7:2 compressor adder RMLAC722-1, whose architecture diagram is shown in Fig. 5b.

$$\mathrm{Sum} = M(\sim P_7, P_6, M(P_4, P_5, M(P_1, P_2, \sim P_3)))$$
$$\mathrm{Carry} = \sim P_7$$
$$\mathrm{Cout} = \sim P_3$$

The following equations give the design of the 7:2 compressor adder RMLAC722-2, whose architecture diagram is shown in Fig. 5c.

$$\mathrm{Sum} = M(P_7, P_6, \sim M(P_4, P_5, M(P_1, P_2, \sim P_3)))$$
$$\mathrm{Carry} = M(P_4, P_5, \sim M(P_1, P_2, P_3))$$
$$\mathrm{Cout} = P_3$$

The following equations give the design of the 7:2 compressor adder RMLAC712-1, whose architecture diagram is shown in Fig. 5d.

$$\mathrm{Sum} = M(\sim P_7, P_6, M(P_4, P_5, \sim M(P_1, P_2, P_3)))$$
$$\mathrm{Carry} = \sim P_7$$
$$\mathrm{Cout} = M(P_1, P_2, P_3)$$

The following equations give the design of the 7:2 compressor adder RMLAC712-2, whose architecture diagram is shown in Fig. 5e.

$$\mathrm{Sum} = M(P_7, P_6, M(P_4, P_5, M(P_1, P_2, P_3)))$$
$$\mathrm{Carry} = \sim M(P_1, P_2, P_3)$$
$$\mathrm{Cout} = M(P_1, P_2, P_3)$$

Fig. 5 Design of proposed approximate 7:2 compressors: (a) RMLAC711, (b) RMLAC722-1, (c) RMLAC722-2, (d) RMLAC712-1, (e) RMLAC712-2, and (f) RMLAC721

Figure 5f depicts the architecture diagram of the 7:2 compressor adder RMLAC721, which is described by the following equations.

$$\mathrm{Sum} = \sim M(P_7, P_6, M(P_4, P_5, M(P_1, P_2, \sim P_3)))$$
$$\mathrm{Carry} = M(P_7, P_6, M(P_4, P_5, M(P_1, P_2, \sim P_3)))$$
$$\mathrm{Cout} = \sim P_3$$

The proposed designs significantly reduce the number of gates and the delay, at the cost of reduced accuracy. Table 3 and Fig. 6 compare all of these designs for the conventional majority gate and the reversible majority gate. Compared with its exact counterpart, the reversible logic design reduces the required number of memory logic elements and garbage signals while suffering only a small loss in accuracy. The experiments indicate that the suggested approximate adder based on reversible majority logic gates outperforms previous approximate adders in terms of area, delay, and power consumption.
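The delay comparison can be summarized numerically from the values in Table 3. The sketch below is illustrative only: the dictionary names are assumptions, and averaging over the six designs is one possible summary, not a figure reported by the authors.

```python
# Delay values (ns) copied from Table 3.
conventional_delay = {
    "MLAC11": 5.883, "MLAC22-1": 5.193, "MLAC22-2": 5.770,
    "MLAC12-1": 5.306, "MLAC12-2": 4.889, "MLAC21": 5.845,
}
reversible_delay = {
    "RMLAC711": 4.566, "RMLAC722-1": 4.328, "RMLAC722-2": 4.605,
    "RMLAC712-1": 4.561, "RMLAC712-2": 4.565, "RMLAC721": 4.561,
}

mean_conv = sum(conventional_delay.values()) / len(conventional_delay)
mean_rev = sum(reversible_delay.values()) / len(reversible_delay)
reduction = 100 * (mean_conv - mean_rev) / mean_conv

# Design with the smallest delay among the reversible variants.
fastest = min(reversible_delay, key=reversible_delay.get)

print(f"mean conventional delay: {mean_conv:.3f} ns")
print(f"mean reversible delay:   {mean_rev:.3f} ns")
print(f"average delay reduction: {reduction:.1f} %")
print(f"fastest reversible design: {fastest}")
```

Under this summary the mean delay drops from about 5.481 ns to about 4.531 ns, a reduction of roughly 17%, while the power column of Table 3 is identical for all designs.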

Fig. 6 Comparisons analysis chart of 7:2 compressor using conventional and reversible majority logic gates


Fig. 7 Simulation output of proposed 7:2 compressor RMLAC712-1

5 Results and Implementations of Reversible Majority Logic Gate Approximate Adders

From Table 3, we conclude that the 7:2 RMLAC712-1 compressor adder gives the best results in terms of efficiency and accuracy among the six alternative approaches; therefore, the synthesis, simulation, and implementation results, together with the power report, are analyzed for this module. The design was synthesized for a Virtex-5 FPGA (XC5VLX330-2FF1760) using Xilinx ISE 14.7 and simulated with ModelSim 6.5b. RMLAC712-1 has seven operands, P1, P2, P3, P4, P5, P6, and P7, so a total of 128 input combinations are possible. The simulation results for all 128 combinations are shown in Fig. 7. The synthesis results and design summary of RMLAC722-1 are shown in Fig. 8. The RTL schematic of the RMLAC722-1 design is shown in Fig. 9, its delay report in Fig. 10, and its power report in Fig. 11.
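The exhaustive simulation over 128 input combinations can be mirrored in software. The sketch below evaluates the RMLAC712-1 equations from Sect. 4 for every combination of the seven operands; it is a behavioral model with assumed function names and does not reproduce the ModelSim waveform of Fig. 7.

```python
from itertools import product


def M(a, b, c):
    """Three-input majority on bits 0/1."""
    return (a & b) | (b & c) | (a & c)


def inv(x):
    """Bitwise negation of a single bit."""
    return 1 - x


def rmlac712_1(p1, p2, p3, p4, p5, p6, p7):
    """Behavioral model of the RMLAC712-1 equations: returns (sum, carry, cout)."""
    cout = M(p1, p2, p3)
    s = M(inv(p7), p6, M(p4, p5, inv(cout)))
    carry = inv(p7)
    return s, carry, cout


# Seven binary operands give 2**7 = 128 input combinations.
truth_table = [(bits, rmlac712_1(*bits)) for bits in product((0, 1), repeat=7)]
print(len(truth_table))  # 128

# Show the first few rows as P1..P7 -> (sum, carry, cout).
for bits, outputs in truth_table[:4]:
    print(bits, "->", outputs)
```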

6 Conclusion

It has been demonstrated in this research how to develop, analyze, and evaluate approximate adders that are based on reversible majority logic. These designs have lower circuit complexity and delay when compared to their exact counterparts.


Fig. 8 Synthesis results of RMLAC722-1

Fig. 9 RTL schematic of RMLAC722-1

Reversible logic decreased the number of memory logic elements and garbage signals compared to the exact counterpart. The proposed reversible majority logic gate-based approximate adder has been shown to outperform other approximate adders in terms of delay, area, and power. This work was developed in 4-bit and 7-bit sizes, simulated using ModelSim, and synthesized for a Xilinx Virtex-5 FPGA (XC5VLX330-2FF1760), with comparisons of area and power utilization. Further, this work can be extended to applications such as digital signal processing and image processing.


Fig. 10 Delay report of RMLAC722-1

Fig. 11 Power report of RMLAC722-1

References

1. Liu, W., Zhang, T., McLarnon, E., Oneill, M., Montuschi, P., Lombardi, F.: Design and analysis of majority logic based approximate adders and multipliers. IEEE Trans. Emerg. Top. Comput. 1–1 (2019)
2. Han, J., Orshansky, M.: Approximate computing: an emerging paradigm for energy-efficient design. In: Proceedings of European Test Symposium, pp. 1–6 (2013)
3. Sharada Guptha, M.N., Eshwarappa, M.N.: FPGA implementation of high performance reversible logic method of array multiplier. JAC J. Compos. Theory XIII(VI) (2020). ISSN: 0731-6755
4. Labrado, C., Thapliyal, H., Lombardi, F.: Design of majority logic based approximate arithmetic circuits. In: Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 2122–2125 (2017)
5. Fathi, A., Mashoufi, B., Azizian, S.: Very fast, high-performance 5:2 and 7:2 compressors in CMOS process for rapid parallel accumulations. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 28, 1403–1412 (2020)
6. Rashid, A., Mir, A.G.: Achieving performance speed-up in FPGA based 4:2 compressor using fast carry-chains. In: 4th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 5–9 (2017)
7. Thamizharasan, V., Kasthuri, N.: High-speed hybrid multiplier design using a hybrid adder with FPGA implementation. IETE J. Res. (2021)
8. Agarwal, S., Harish, G., Balamurugan, S., Marimuthu, R.: Design of high speed 5:2 and 7:2 compressor using nanomagnetic logic. Communications in Computer and Information Science, vol. 892. Springer, Singapore (2019)
9. Momeni, A., Han, J., Montuschi, P., Lombardi, F.: Design and analysis of approximate compressors for multiplication. IEEE Trans. Comput. 64(4), 984–994 (2014)
10. Raveendran, S., Edavoor, P., Kumar, Y., Vasantha, M.: Inexact signed Wallace tree multiplier design using reversible logic. IEEE Access 1–1 (2021)

Segmentation of Cattle Using Color-Based Skin Detection Approach

Diwakar Agarwal

1 Introduction

Cattle identification is used in various applications ranging from registration and tracking of animals to disease control and behavior analysis [1]. Segmentation of the cattle region is the crucial first step in many cattle identification algorithms [2]. For many years, numerous algorithms [3–9] have applied skin detection to human images. Applications include identifying human hands and limbs [3], gesture analysis [4–6], and face detection [7–9]. Two approaches to human skin detection are discussed in the literature: one is pixel classification based on skin color [10–18], and the other is based on the analysis of skin texture [19–22]. More algorithms have been developed for color-based skin detection than for texture-based detection. In the color-based approach, all pixels in an image are first transformed into a suitable color space, and then a classifier defines a decision boundary between the skin color and non-skin color pixel classes. Motivated by the earlier research on human skin detection, this paper reports the utility of color-based skin detection for the segmentation of cows in cow images captured in an uncontrolled environment. Illumination and background color resemblance are the two main challenges in an uncontrolled environment. The skin appears different under different illuminations, i.e., in daylight or under a shed, which causes different transformations in the same color space. The other challenge is the skin color tone possessed by the background region or by objects such as pillars, cow dung, and iron gates, which leads to misclassification errors. Figure 1 shows some exemplar cow images from a manually created database that incorporates these two challenges. The proposed method utilizes a color modeling approach and the Bayes classifier [23] to deal with such complex backgrounds. In this paper, the skin and non-skin color models are formed as 2D color histograms in the HSV color space. The HSV color space includes two color components, hue (H) and saturation (S), and one component that records the brightness value (V). Bayes classification is a probabilistic approach for deciding whether a given pixel belongs to skin or non-skin. Other than skin detection, the Bayes classifier has been successfully used in various applications such as predicting clinical diseases [24, 25], text classification [26, 27], and natural language processing [28]. The main objective is to determine an optimal threshold value against which a pixel is classified as skin or non-skin.

Fig. 1 Images of cows captured from a mobile camera in an uncontrolled environment

D. Agarwal (B), Department of Electronics and Communication Engineering, GLA University, Mathura, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_55

2 Related Work

This section presents the work related to color-based skin detection. Jones and Rehg [10] introduced statistical skin and non-skin color models for human skin detection built from a large online dataset. The method achieved a skin detection rate of 80% and a false positive rate of 8.5%. Kakumanu et al. [11] enriched the literature with an extensive survey on skin detection using color information, with an emphasis on describing various color spaces and adaptive color modeling approaches. Sun [12] proposed dynamic skin color modeling to detect skin color efficiently. Tuning the trained skin color model with a local skin model yields a color model that adapts to unseen image datasets. Kawulok et al. [13] also provided an extensive survey on color modeling schemes. Their main objective was to point out limitations and approaches to improve color-based image pixel classification.


Brancati et al. [14] utilized the YCbCr color space, in which rule-based classification was applied to the chrominance subspaces YCb and YCr to identify skin pixels. Zhang et al. [15] introduced color-based skin detection in real-time videos. The color distribution was modeled using facial reference points selected on faces detected in video frames. Samson and Lu [16] proposed an improved human skin detection method that is efficient in terms of accuracy and speed. An optimal threshold value was computed to distinguish dark skin color pixels. Nazaria et al. [17] proposed skin detection using a fully connected neural network in a new color space. The classification of skin pixels using an adaptive network-based fuzzy inference system (ANFIS) achieved a low misclassification error. Saxe and Foulds [18] proposed color-based skin detection in video images. The method is robust to different skin colors, clothing colors, and backgrounds across a large population. The robustness of a segmentation algorithm depends strongly on the way it deals with cluttered backgrounds. Talasila et al. [29] introduced PLRSNet, a deep model that performs semantic segmentation of plant leaves under complex backgrounds. The above study shows that various color-based methods have been developed for human skin detection against clear and controlled backgrounds, but no method has been proposed for skin detection of cattle in cattle images. The main contributions of the proposed method are as follows:

1. Segmentation of cow regions is performed in cow images containing complex backgrounds.
2. The best size of the color model required by the classifier for reasonably good segmentation is selected.

3 Proposed Method

The objective of the proposed method is to determine an optimal threshold value based on which the Bayes classifier distinguishes between skin and non-skin pixels in an image. Based on the fact that skin pixels accumulate in a particular region of a color measurement space [23], a 2D color histogram is used as the skin modeling approach in the proposed method. An input RGB image is first converted into the HSV color space. Then, the 2D skin and non-skin histograms are formed as 2D matrices holding the number of pixels counted manually at each HS color value pair in the skin and non-skin regions, respectively, over all images of the database. Further, in Bayes classification, the 2D histograms provide the likelihood that a given pixel with a particular color value pair belongs to skin or non-skin.
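The histogram-plus-threshold idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: H and S are assumed to be scaled to [0, 1], the bin count of 32 and the threshold of 1.0 are hypothetical, and the decision rule shown is a simple likelihood-ratio test.

```python
def build_hs_histogram(pixels, bins=32):
    """Count (h, s) pairs, each in [0, 1], into a bins x bins grid."""
    hist = [[0] * bins for _ in range(bins)]
    for h, s in pixels:
        i = min(int(h * bins), bins - 1)  # clamp h == 1.0 into the last bin
        j = min(int(s * bins), bins - 1)
        hist[i][j] += 1
    return hist


def likelihood(hist, h, s):
    """Estimate P(color | class) as the normalized count of the color's bin."""
    bins = len(hist)
    total = sum(sum(row) for row in hist)
    i = min(int(h * bins), bins - 1)
    j = min(int(s * bins), bins - 1)
    return hist[i][j] / total if total else 0.0


def is_skin(h, s, skin_hist, nonskin_hist, threshold=1.0):
    """Label a pixel as skin when the likelihood ratio exceeds the threshold."""
    p_skin = likelihood(skin_hist, h, s)
    p_nonskin = likelihood(nonskin_hist, h, s)
    if p_nonskin == 0.0:
        return p_skin > 0.0  # color never seen in non-skin regions
    return p_skin / p_nonskin > threshold


# Toy training pixels (hypothetical values, not from the paper's database).
skin_pixels = [(0.05, 0.5)] * 8 + [(0.6, 0.2)] * 2
nonskin_pixels = [(0.6, 0.2)] * 9 + [(0.05, 0.5)] * 1
skin_hist = build_hs_histogram(skin_pixels)
nonskin_hist = build_hs_histogram(nonskin_pixels)

print(is_skin(0.05, 0.5, skin_hist, nonskin_hist))  # True
print(is_skin(0.6, 0.2, skin_hist, nonskin_hist))   # False
```

Varying `threshold` trades detection rate against false positives, which is why an optimal value has to be searched for rather than fixed in advance.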


Fig. 2. Three-dimensional illustration of color models a HSV b HSL

3.1 Color Spaces The selection of a color space is an important step in color-based image pixel classification. Color spaces are categorized into basic and perceptual color spaces. RGB, normalized RGB, and CIE-XYZ are basic color spaces. RGB, which corresponds to red, green, and blue, is considered the base format because the other color spaces are derived from it through linear and nonlinear transformations [11]. To make the RGB color space independent of varying lighting conditions, it is normalized such that the sum of all three components equals 1; the result is called the normalized RGB color space. In the CIE-XYZ (Commission Internationale de l’Eclairage) color space, the luminance is defined by Y, with two additional components X and Z. HSV, HSL, HSI, and TSL are perceptual color spaces. The two chroma components H and S in HSV, HSL, and HSI denote hue and saturation, respectively. These color spaces differ in how the third component is computed, i.e., brightness value (V), lightness (L), or intensity (I). In HSL, the maximum lightness is pure white. In contrast, the maximum brightness in HSV corresponds to shining bright white light on a colored object. The appearance of colors in the HSV and HSL color models is illustrated in Fig. 2. Since perceptual features such as hue, saturation, and intensity are absent in the RGB color space, HSV is the prominent choice for many skin detection algorithms [11, 23].
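The effect of normalization and of the HSV transformation can be illustrated with a short sketch. It uses Python's standard colorsys module; the function names and the two sample pixels are illustrative assumptions and do not come from the paper.

```python
import colorsys

def normalized_rgb(r, g, b):
    """Scale an RGB triple so that its three components sum to 1."""
    total = r + g + b
    if total == 0:
        # Assumption: pure black is mapped to the neutral point.
        return (1 / 3, 1 / 3, 1 / 3)
    return (r / total, g / total, b / total)

def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB to HSV with every component in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)

# Hypothetical pixels: the same surface color under bright and dim light.
bright = (200, 150, 100)
dim = (100, 75, 50)

print(normalized_rgb(*bright), normalized_rgb(*dim))
print(rgb8_to_hsv(*bright), rgb8_to_hsv(*dim))
```

For these two pixels the normalized RGB triples coincide, and in HSV only the V component changes, which is the property that motivates working with H and S alone.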

3.2 2D Color Histogram The 2D color histogram is a color model that represents the distribution of skin and non-skin tones in a color space. In the proposed method, the color components H and S of HSV color space are used to form 2D skin and non-skin color histograms. The V component is dropped to make skin detection insensitive to illumination variation and to local variations caused by shadows [23]. The 2D histogram is formed as a matrix of m × m cells by quantizing the full range of H and S into m bins each. First, all images of the database are manually partitioned into skin and non-skin regions. Then, the total count of skin and non-skin labeled pixels at each color value pair (HS) is recorded in the corresponding cell of the skin and non-skin histogram, respectively. In this paper, the skin histogram is used as the skin color model S and the non-skin histogram is used as the non-skin color model NS.
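The histogram construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: pixels arrive as 8-bit RGB tuples with a Boolean skin label, the bin index is obtained by uniform quantization of H and S in [0, 1], and the helper names hs_bin and build_histograms are hypothetical.

```python
import colorsys

def hs_bin(rgb, m):
    """Map an 8-bit RGB pixel to its (H, S) cell in an m x m histogram."""
    r, g, b = (v / 255.0 for v in rgb)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)  # V is discarded
    # H and S lie in [0, 1]; clamp so that a value of exactly 1.0 lands in the last bin.
    return min(int(h * m), m - 1), min(int(s * m), m - 1)

def build_histograms(pixels, labels, m=32):
    """Accumulate skin and non-skin counts from manually labeled pixels.

    pixels: iterable of (r, g, b) tuples; labels: iterable of bool (True = skin).
    """
    skin = [[0] * m for _ in range(m)]
    non_skin = [[0] * m for _ in range(m)]
    for rgb, is_skin in zip(pixels, labels):
        hi, si = hs_bin(rgb, m)
        target = skin if is_skin else non_skin
        target[hi][si] += 1
    return skin, non_skin

# Hypothetical labeled pixels: two skin samples and one background sample.
pixels = [(210, 150, 90), (105, 75, 45), (40, 120, 50)]
labels = [True, True, False]
skin_hist, non_skin_hist = build_histograms(pixels, labels, m=32)
```

The two skin samples differ only in brightness, so they fall into the same cell, which shows why dropping V makes the model less sensitive to illumination.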

3.3 Classifier Training In each image, according to the Bayes theorem, the probability that the given pixel i belongs to the skin region is given by

$$P(S \mid i) = \frac{P(i \mid S)\,P(S)}{P(i \mid S)\,P(S) + P(i \mid NS)\,P(NS)} \tag{1}$$

where $P(i \mid S)$ and $P(i \mid NS)$ are conditional probabilities which correspond to the likelihood of the pixel i being skin and non-skin, respectively. $P(i \mid S)$ and $P(i \mid NS)$ are estimated by using the 2D color models S and NS as

$$P(i \mid S) = \frac{\mathrm{count}(c)}{N_s} \tag{2}$$

$$P(i \mid NS) = \frac{\mathrm{count}(c)}{N_{ns}} \tag{3}$$

where, in the respective 2D histogram, count(c) is the number of pixels contained in the cell c that corresponds to the HS color value pair associated with the ith pixel. $N_s$ and $N_{ns}$ are the sums of the numbers of pixels stored in all cells of the skin and non-skin histograms, respectively. P(S) and P(NS) are prior probabilities which can be estimated from the number of skin and non-skin pixels in all images of the database. According to the skin detection rule, the given pixel i belongs to the skin region if

$$P(S \mid i) \ge \theta \tag{4}$$

where $\theta$ is the threshold value that satisfies $0 \le \theta \le 1$. Rather than computing the exact value of $P(S \mid i)$ from (1), the easier approach is to compare the ratio of $P(S \mid i)$ to $P(NS \mid i)$ with $\theta$. Therefore, similar to (1), $P(NS \mid i)$ is given by

$$P(NS \mid i) = \frac{P(i \mid NS)\,P(NS)}{P(i \mid S)\,P(S) + P(i \mid NS)\,P(NS)} \tag{5}$$

Then, the ratio of $P(S \mid i)$ to $P(NS \mid i)$ is given by

$$\frac{P(S \mid i)}{P(NS \mid i)} = \frac{P(i \mid S)\,P(S)}{P(i \mid NS)\,P(NS)} \tag{6}$$

Thus, the given pixel i belongs to the skin region if

$$\frac{P(S \mid i)}{P(NS \mid i)} \ge \theta \tag{7}$$

Using the maximum likelihood (ML) approach, the probability of a pixel being skin, P(S), is considered the same as the probability of a pixel being non-skin, P(NS). So, Eq. (7) is modified to

$$\frac{P(i \mid S)}{P(i \mid NS)} \ge \theta \tag{8}$$
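The decision rule of Eq. (8), with the likelihoods of Eqs. (2) and (3), can be sketched as below. The toy 2 × 2 histograms are invented for illustration, and the handling of colors that were never seen in training (treated here as non-skin) is an assumption that the paper does not specify.

```python
def likelihood(hist, cell):
    """P(i | class): count in the pixel's cell divided by the histogram total."""
    total = sum(sum(row) for row in hist)
    h, s = cell
    return hist[h][s] / total if total else 0.0

def is_skin(cell, skin_hist, non_skin_hist, threshold):
    """Apply Eq. (8): P(i|S) / P(i|NS) >= threshold."""
    p_s = likelihood(skin_hist, cell)
    p_ns = likelihood(non_skin_hist, cell)
    if p_s == 0.0 and p_ns == 0.0:
        return False  # assumption: unseen colors count as non-skin
    # Cross-multiplied form avoids dividing by zero when the non-skin cell is empty.
    return p_s >= threshold * p_ns

# Hypothetical 2 x 2 color models (N_s = 10, N_ns = 20).
toy_skin = [[8, 2], [0, 0]]
toy_non_skin = [[1, 9], [10, 0]]

print(is_skin((0, 0), toy_skin, toy_non_skin, 0.4))  # ratio 0.8 / 0.05
print(is_skin((0, 1), toy_skin, toy_non_skin, 0.6))  # ratio 0.2 / 0.45
```

In this toy model the cell (0, 1) is accepted at a threshold of 0.4 but rejected at 0.6, which shows how the threshold shifts the decision boundary.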

For each training image, each pixel is tested for membership of the skin or non-skin region by applying (8) with multiple threshold values. At each threshold, the false positive rate (FPR) and false negative rate (FNR) are determined. The FPR is defined as the fraction of manually segmented non-skin pixels mistakenly classified as skin, and the FNR is defined as the fraction of manually segmented skin pixels mistakenly classified as non-skin. The FPR and FNR are given by

$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{9}$$

$$\mathrm{FNR} = \frac{FN}{FN + TP} \tag{10}$$

where TP (true positive) is the number of correctly classified skin pixels, FP (false positive) is the number of non-skin pixels mistakenly classified as skin pixels, TN (true negative) is the number of correctly classified non-skin pixels, and FN (false negative) is the number of skin pixels mistakenly classified as non-skin pixels. A high threshold value misclassifies many skin pixels as non-skin and increases the FNR; similarly, a low threshold value misclassifies many non-skin pixels as skin and thus increases the FPR. Therefore, the threshold value at which the difference between FPR and FNR is minimum is selected as the suitable one. In the proposed method, the FPR and FNR at the suitable threshold value are recorded for each image of the training dataset. The main objective of the classifier training is to plot the receiver operating characteristic (ROC) curve between FPR and true positive rate (TPR). The TPR is defined as the fraction of manually segmented skin pixels correctly classified as skin. It is computed as

$$\mathrm{TPR} = 1 - \mathrm{FNR} \tag{11}$$

The ROC curve measures the performance of the classifier at multiple classification threshold values. In the ROC curve, a low threshold value increases both FPR and TPR, because the ratio of conditional probabilities given by (8) exceeds a low threshold for most pixels, so most pixels are classified as skin. A high threshold value, in contrast, decreases the TPR and increases the FNR. The ROC curve is therefore used to choose the threshold value that offers the best trade-off between a low FPR and a high TPR.
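The threshold sweep described in this section can be sketched as follows. The likelihood ratios and ground-truth labels are made-up values for six pixels, and the function names are hypothetical; the selection criterion (smallest difference between FPR and FNR) is the one stated in the text.

```python
def rates(predicted, truth):
    """Return (FPR, FNR, TPR) from parallel lists of booleans (True = skin)."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr, 1.0 - fnr

def sweep(ratios, truth, thresholds):
    """Evaluate the rule 'ratio >= threshold' once per threshold (one ROC point each)."""
    points = []
    for th in thresholds:
        fpr, fnr, tpr = rates([r >= th for r in ratios], truth)
        points.append((th, fpr, fnr, tpr))
    return points

def balanced_threshold(points):
    """Pick the threshold with the smallest |FPR - FNR|."""
    return min(points, key=lambda p: abs(p[1] - p[2]))[0]

# Hypothetical likelihood ratios and ground-truth labels for six pixels.
ratios = [0.1, 0.3, 0.5, 0.7, 0.9, 2.0]
truth = [False, False, True, False, True, True]
roc_points = sweep(ratios, truth, [0.2, 0.4, 0.6, 0.8])
best = balanced_threshold(roc_points)
```

On this toy data both FPR and TPR fall as the threshold rises, and the balanced choice is the threshold at which FPR and FNR coincide.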

4 Experimental Results and Discussion The proposed method is applied on two databases, namely, a manually created database (database 1) and a publicly available online database (database 2) [30]. Database 1 consists of 692 cow images, each acquired in a natural uncontrolled environment using a smartphone camera. The images are captured with the 13-megapixel camera of a Vivo 1902 smartphone with a 2.3 GHz octa-core processor. Database 2 is an online database of 271 cow images collected from different online sources. Before training and testing of the classifier, the images of each database are divided in the proportion 70:30. In database 1, the 692 cow images are divided into a training set of 482 images and a testing set of 210 images, whereas in database 2, the 271 cow images are divided into a training set of 190 images and a testing set of 81 images. The efficacy of the proposed method is evaluated on the testing datasets on the basis of FPR, FNR, and the following performance metrics.

1. Precision: The precision is defined as the percentage of correctly classified skin pixels out of all pixels classified as skin. It is determined by

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{12}$$

2. Recall: It is the same as TPR, which is defined as the percentage of skin pixels correctly classified as skin. It is also referred to as sensitivity.

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{13}$$
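Equations (12) and (13) translate directly into code. The sketch below uses invented pixel counts purely to show the arithmetic; the function name is hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision (Eq. 12) and recall (Eq. 13) from pixel counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts for one test image.
precision, recall = precision_recall(tp=90, fp=10, fn=30)
print(precision, recall)  # 0.9 0.75
```

Because recall equals TPR, the FNR of the same image follows as 1 - recall (0.25 for these counts).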

4.1 Training Results In classifier training, the Bayes classifier is trained in a supervised manner with manually segmented training images. The images in the training datasets are partitioned into skin and non-skin regions to form the ground truth labeling of each pixel. The conditional probabilities required in the ratio given in (8) are computed by using (2) and (3). To find the appropriate size of the 2D histogram, ROC curves for 256, 32, and 16 bins per channel are computed for database 1 and database 2, as shown in Fig. 3a and b, respectively. It is observed that 32 bins per channel perform better than 16 and 256 bins per channel in terms of low FPR and high TPR. Dividing the full range of H and S into 16 bins widens the color range of each bin and thus coarsens the color distribution. As a result, many different colors are indexed into the same bin, which makes it difficult for the classifier to separate skin colors from non-skin colors. The trained classifier therefore misclassifies a number of skin pixels as non-skin pixels, which leads to low TPR. It also misclassifies pixels of many non-skin regions that appear similar to skin regions, which leads to high FPR. Conversely, dividing the full range of H and S into 256 bins narrows the color range of each bin significantly and spreads the color distribution over many more cells. Even a barely discernible color difference is then indexed in a separate bin, which results in slight misclassification of non-skin pixels as skin pixels and thus in a higher FPR. As shown in Fig. 3a, for database 1, the Bayes classifier with 32 bins per channel achieves 90.26% TPR at 25.15% FPR. At the same FPR, a TPR of 85.05% is achieved for 16 bins per channel and a TPR of 80.30% for 256 bins per channel. Similarly, for database 2, as shown in Fig. 3b, with 32 bins per channel, a TPR of 91.06% is achieved at an FPR of 28.31%. At the same FPR, 90.65% TPR is achieved for 16 bins and 83.85% TPR for 256 bins per channel.

Fig. 3 ROC curves for 32, 16, and 256 bins per channel a database 1 b database 2

4.2 Testing Results The effectiveness of the proposed method is tested on 210 unseen testing images of database 1 and 81 testing images of database 2. The performance is evaluated on the basis of average precision, average recall, average FPR, and average FNR. For this purpose, as with the training images, all testing images are manually segmented into skin and non-skin regions to prepare the ground truth. From the training results, the optimal threshold value is obtained as 0.4 for database 1 and 0.6 for database 2. Skin and non-skin binary maps equal in size to the input image are formed by testing each pixel according to (8). At each pixel location, if the ratio of conditional probabilities is greater than the optimal threshold, the pixel is considered a skin pixel and a binary ‘1’ is assigned to the skin binary map at that location. Otherwise, the pixel is considered a non-skin pixel and a binary ‘1’ is assigned to the non-skin binary map at that location. The performance metrics are then computed by comparing the pixel locations marked in the ground truth with the locations obtained in the binary maps. The average value of each performance metric is computed over all testing images. Precision indicates how well the proposed method avoids misclassifying non-skin pixels as skin pixels. In contrast, recall measures the proportion of ground-truth skin pixels that are correctly classified as skin. Table 1 shows the average precision, average recall, average FPR, and average FNR obtained for 16, 32, and 256 bins per channel for database 1 and database 2. It is observed that the performance metrics for 32 bins per channel are better than those for 16 and 256 bins per channel.
Another observation is that the classifier performs better on database 2 than on database 1 because the images in database 2 have less complex backgrounds than the images in database 1. A comparison between 32, 16, and 256 bins per channel is also shown in Figs. 4 and 5 on the basis of the visual comparison between output segmented images. It can be seen that many non-skin regions which appear similar to skin regions are well segmented as non-skin (black region) with 32 bins per channel as compared to 16 and 256 bins per channel.
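The construction of the two binary maps can be sketched as below. The 2 × 2 image of likelihood ratios is invented for illustration; the threshold of 0.4 is the value the text reports for database 1. The comparison follows Eq. (8) and therefore uses "greater than or equal", which is an assumption about how ties are handled.

```python
def binary_maps(ratio_image, threshold):
    """Return (skin_map, non_skin_map) for a 2D list of likelihood ratios."""
    skin_map = [[1 if r >= threshold else 0 for r in row] for row in ratio_image]
    # The non-skin map is the complement of the skin map.
    non_skin_map = [[1 - bit for bit in row] for row in skin_map]
    return skin_map, non_skin_map

# Hypothetical ratios P(i|S) / P(i|NS) for a 2 x 2 image.
ratio_image = [[0.1, 0.5],
               [0.4, 2.0]]
skin_map, non_skin_map = binary_maps(ratio_image, threshold=0.4)
print(skin_map)      # [[0, 1], [1, 1]]
print(non_skin_map)  # [[1, 0], [0, 0]]
```

Comparing skin_map with a ground-truth mask of the same size, pixel by pixel, yields the TP, FP, TN, and FN counts from which the metrics in Table 1 are computed.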

Table 1 Performance metrics computed for testing images

Performance metrics | 16 bins per channel     | 32 bins per channel     | 256 bins per channel
                    | Database 1 | Database 2 | Database 1 | Database 2 | Database 1 | Database 2
Average precision   | 0.924      | 0.932      | 0.937      | 0.941      | 0.821      | 0.828
Average recall      | 0.919      | 0.921      | 0.921      | 0.926      | 0.839      | 0.852
Average FPR         | 0.478      | 0.466      | 0.47       | 0.461      | 0.489      | 0.472
Average FNR         | 0.081      | 0.071      | 0.079      | 0.064      | 0.161      | 0.143

Fig. 4 Testing results on database 1 (a), (f) input RGB cow images (b), (g) corresponding HSV images (c), (h) segmented images considering 32 bins per channel (d), (i) 16 bins per channel (e), (j) 256 bins per channel

Fig. 5 Testing results on database 2 (a), (f) input RGB cow images (b), (g) corresponding HSV images (c), (h) segmented images considering 32 bins per channel (d), (i) 16 bins per channel (e), (j) 256 bins per channel

5 Conclusion In this paper, the cattle segmentation based on skin color modeling approach is presented. The color information obtained from HSV color space is utilized to form skin and non-skin 2D color histograms. Using ML approach, the Bayes classifier distinguishes skin and non-skin pixels by comparing the ratio of likelihood probabilities at each pixel in an input image with the appropriate threshold value. At 32 bins per channel, the proposed method achieves low FPR and high TPR even in the complex background where most of the regions have resemblance with the cow’s skin color. The cluttered background and shadows pose a great problem for color-based skin detection approaches. As future aspects, researchers could apply illumination adaptation methods to increase the performance of skin detection classifiers.

Funding This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declarations of Interest None

References 1. Awad, A.I.: From classical methods to animal biometrics: a review on cattle identification and tracking. Comput. Electron. Agric. 123, 423–435 (2016) 2. Kumar, S., Singh, S.K., Singh, A.K.: Muzzle point pattern based techniques for individual cattle identification. IET Image Proc. 11(10), 805–814 (2017) 3. Roy, K., Mohanty, A., Sahay, R.R.: Deep learning based hand detection in cluttered environment using skin segmentation. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 640–649 (2017)


4. McBride, T.J., Vandayar, N., Nixon, K.J.: A comparison of skin detection algorithms for hand gesture recognition. In: IEEE Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA), pp. 211–216 (2019) 5. Tsagaris, A., Manitsaris, S.: Colour space comparison for skin detection in finger gesture recognition. Int. J. Adv. Eng. Technol. 6(4), 1431 (2013) 6. Elshehry, O.S.: Real-time hand area segmentation for hand gesture recognition in desktop environment. J. ACS Adv. Comput. Sci. 10(1), 49–66 (2020) 7. Sharif, M., Mohsin, S., Javed, M.Y.: Real time face detection using skin detection (block approach). J. Appl. Comput. Sci. Math. 10(5), 75–81 (2011) 8. Qiang-rong, J., Hua-lan, L.: Robust human face detection in complicated color images. In: 2010 2nd IEEE International Conference on Information Management and Engineering, pp. 218–221 (2010) 9. Liu, Q., Peng, G.Z.: A robust skin color based face detection algorithm. In: 2nd IEEE International Asia Conference on Informatics in Control, Automation and Robotics (CAR 2010), vol. 2, pp. 525–528 (2010) 10. Jones, M.J., Rehg, J.M.: Statistical color models with application to skin detection. Int. J. Comput. Vision 46(1), 81–96 (2002) 11. Kakumanu, P., Makrogiannis, S., Bourbakis, N.: A survey of skin-color modeling and detection methods. Pattern Recogn. 40(3), 1106–1122 (2007) 12. Sun, H.M.: Skin detection for single images using dynamic skin color modeling. Pattern Recogn. 43(4), 1413–1420 (2010) 13. Kawulok, M., Nalepa, J., Kawulok, J.: Skin detection and segmentation in color images. In: Celebi, M., Smolka, B. (eds.) Advances in Low-Level Color Image Processing. Lecture Notes in Computational Vision and Biomechanics, Vol. 11. Springer, Dordrecht (2014) 14. Brancati, N., De Pietro, G., Frucci, M., Gallo, L.: Human skin detection through correlation rules between the YCb and YCr subspaces based on dynamic color clustering. 
Comput. Vis. Image Underst. 155, 33–42 (2017) 15. Zhang, K., Wang, Y., Li, W., Li, C., Lei, Z.: Real-time adaptive skin detection using skin color model updating unit in videos. J. Real-Time Image Process. 1–13 (2021) 16. Samson, G.L., Lu, J.: PKT: fast color-based spatial model for human skin detection. Multimedia Tools Appl. 80, 32807–32839 (2021) 17. Nazaria, K., Mazaheri, S., Bigham, B.S.: Creating a new color space utilizing PSO and FCM to perform skin detection by using neural network and ANFIS. arXiv preprint arXiv:2106.11563 (2021) 18. Saxe, D., Foulds, R.: Toward robust skin identification in video images. In: Proceedings of the Second IEEE International Conference on Automatic Face and Gesture Recognition, pp. 379– 384 (1996) 19. Fotouhi, M., Rohban, M.H., Kasaei, S.: Skin detection using Contourlet-based texture analysis. In: Proceedings of 2009 Fourth International Conference on Digital Telecommunications, pp. 59–64 (2009) 20. Salah, K.B., Othmani, M., Kherallah, M.A.: Novel approach for human skin detection using convolutional neural network. Vis. Comput. 38(5), 1833–1843 (2021) 21. Boulkenafet, Z., Komulainen, J., Hadid, A.: Face spoofing detection using colour texture analysis. IEEE Trans. Inf. Forensics Secur. 11(8), 1818–1830 (2016) 22. Sheha, M.A., Mabrouk, M.S., Sharawy, A.: Automatic detection of melanoma skin cancer using texture analysis. Int. J. Comput. Appl. 42(20), 22–26 (2012) 23. Zarit, B.D., Super, B.J., Quek, F.K.: Comparison of five color models in skin pixel classification. In: Proceedings of IEEE International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, In Conjunction with ICCV’99 (Cat. No. PR00378), pp. 58–63 (1999) 24. Jackins, V., Vimal, S., Kaliappan, M., Lee, M.Y.: AI-based smart prediction of clinical disease using random forest classifier and Naive Bayes. J. Supercomput. 77(5), 5198–5219 (2021)


25. Xiong, Y., Ye, M., Wu, C.: Cancer classification with a cost-sensitive Naive Bayes stacking ensemble. Comput. Math. Methods Med. (2021). https://doi.org/10.1155/2021/5556992 26. Luo, X.: Efficient English text classification using selected machine learning techniques. Alex. Eng. J. 60(3), 3401–3409 (2021) 27. Ritonga, M., Al Ihsan, M.A., Anjar, A., Rambe, F.H.: Sentiment analysis of COVID-19 vaccine in Indonesia using Naïve Bayes algorithm. IOP Conf. Ser. Mater. Sci. Eng. 1088(1), 012045 (2021) 28. Moon, N.N., Salehin, I., Parvin, M., Hasan, M., Talha, I.M., Debnath, S.C., Saifuzzaman, M.: Natural language processing based advanced method of unnecessary video detection. Int. J. Electr. Comput. Eng. 11(6), 2088–8708 (2021) 29. Talasila, S., Rawal, K., Sethi, G.: PLRSNet: a semantic segmentation network for segmenting plant leaf region under complex background. Int. J. Intell. Unmanned Syst. (2021). https://doi.org/10.1108/IJIUS-08-2021-0100 30. www.kaggle.com/datasets/afnanamin/cow-images. Accessed on 27 May 2022

Sinusoidal Oscillator Using CCCCTA

Shailendra Bisariya and Neelofer Afzal

S. Bisariya (B) ECE Department, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
N. Afzal Jamia Millia Islamia, New Delhi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_57

1 Introduction In electronics engineering, sinusoidal oscillators are used widely in application areas such as regenerative repeaters, satellite communication, radar, signal generation, and other signal processing tasks. RC sinusoidal oscillators can be easily realized with the help of active elements [1–9]. Among them, the sinusoidal oscillators which provide independent control of the condition of oscillation (CO) and the frequency of oscillation (FO) are the most useful. The current conveyor trans-conductance amplifier (CCTA) is an active element that can work in both voltage and current modes and hence gives flexibility in designing circuits. The CCTA also provides high slew rate, high speed, and wide bandwidth [10]. However, the CCTA by itself cannot control the parasitic resistance at its input port. Current control can be added for this purpose, which turns the device into a current controlled current conveyor trans-conductance amplifier (CCCCTA) [11, 12]. This has the additional advantage of electronic adjustability over the basic CCTA design. This paper presents a new sinusoidal oscillator design using the CCCCTA as an active element. All of the oscillators realized in [1–9] use more than one active element or more than three passive components, or do not provide electronic control of either the CO or the FO. A CCII-based single resistance controlled oscillator was proposed in [1]; it includes a total of five passive components, and tunability is achieved with the help of one passive resistor only. An OTRA-based single resistance controlled oscillator was proposed in [3]; it again includes a total of five passive components, and no electronic tunability is possible. A sinusoidal oscillator design utilizing two CCIIs and four passive components is presented in [5], while [6] presents a design employing a total of three CDTA elements along with three passive components. The OTRA-based design proposed in [8] utilizes four passive components, while in [9] a total of five passive components are used. The proposed circuit employs only a single CCCCTA element and only three passive components, and it also gives independent control of the CO and the FO. Further, in the proposed circuit, both the resistor and the capacitors are grounded rather than floating, and hence, it is suitable for monolithic integration.

2 Current Controlled Current Conveyor Trans-conductance Amplifier (CCCCTA) The proposed sinusoidal oscillator is based on the CCCCTA, which is a modified form of the CCTA with current controlling ability. The CCTA is basically the combination of a current conveyor (CCII) and a trans-conductance amplifier. So, the CCCCTA is formed by using a CCII at its input stage followed by a trans-conductance amplifier, with current biasing to control the current. The basic model of the CCCCTA is shown in Fig. 1a with its equivalent circuit in Fig. 1b. The CCCCTA device has two input and two output terminals as shown in Fig. 1. The input terminal ‘x’ has some parasitic resistance (Rx) which varies according to the externally supplied current. The input terminal ‘y’, the intermediate terminal ‘z’, and the output terminal ‘o’ are high impedance terminals. The following matrix describes the property of the CCCCTA:

$$\begin{bmatrix} i_y \\ v_x \\ i_z \\ i_o \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ R_x & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & g_m & 0 \end{bmatrix} \begin{bmatrix} i_x \\ v_y \\ v_z \\ v_o \end{bmatrix}$$

where $g_m$ is the trans-conductance of the CCCCTA. Figure 2 shows the CMOS realization of the CCCCTA. The CCII is formed mainly by transistors M4, M5, M12, and M13. The trans-conductance amplifier action is performed by transistors M14–M17 and M18–M25. The current mirroring action is provided by the current mirror circuits (M8–M9), (M10–M11), and (M6–M7), and the remaining transistors are used for biasing.

Fig. 1 CCCCTA a circuit representation b equivalent symbol [11]

Fig. 2 Schematic CMOS realization of CCCCTA
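Written out row by row, the port matrix gives four simple relations, which the following sketch evaluates. It models only the ideal behavior stated by the matrix; the function name and the numerical values are illustrative assumptions, not device data from the paper.

```python
def cccta_ports(i_x, v_y, v_z, r_x, g_m):
    """Ideal CCCCTA port relations; returns (i_y, v_x, i_z, i_o)."""
    i_y = 0.0              # row 1: no current flows into the high-impedance y terminal
    v_x = r_x * i_x + v_y  # row 2: x follows y through the parasitic resistance Rx
    i_z = i_x              # row 3: the current at x is conveyed to z
    i_o = g_m * v_z        # row 4: transconductance stage driven by the z voltage
    return i_y, v_x, i_z, i_o

# Hypothetical operating point: 1 mA into x, Rx = 1 kOhm, gm = 1 mS.
i_y, v_x, i_z, i_o = cccta_ports(i_x=1e-3, v_y=0.5, v_z=0.2, r_x=1e3, g_m=1e-3)
print(i_y, v_x, i_z, i_o)
```

The output voltage v_o does not appear in any relation because the fourth column of the matrix is zero.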

3 Proposed Sinusoidal Oscillator Figure 3 shows the proposed circuit symbol, and Fig. 4 shows its MOS implementation. Only one CCCCTA active element is used. The resistor R1 is connected between input port y and ground. Two capacitors are used: one is connected between input port y and ground, with port z connected directly to input y, and the other is connected between the second input terminal x and ground. Routine circuit analysis gives the characteristic equation

$$(R_1 C_1 R_x C_2)\,s^2 + \left(R_1 C_1 + (R_x - R_1)\,C_2\right)s + 1 = 0 \tag{1}$$

From Eq. (1), the circuit acts as an oscillator when the coefficient of s is not positive, i.e., when the following condition is satisfied:

$$R_1 C_2 \ge R_1 C_1 + R_x C_2 \tag{2}$$

The frequency of oscillation obtained is

$$\omega_0 = \frac{1}{\sqrt{R_1 C_1 R_x C_2}} \tag{3}$$

Since $R_x$ can be changed with the help of the biasing current $I_A$, the CO and FO can be easily changed by suitably changing the value of the current $I_A$ only. Transistor sizes used for implementing the oscillator are mentioned in Table 1.

Fig. 3 Proposed sinusoidal oscillator symbolic representation

Fig. 4 Proposed sinusoidal oscillator MOS implementation
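The relation between Eqs. (1) to (3) can be checked numerically. The sketch below solves the characteristic equation and confirms that, at the boundary of the oscillation condition, the roots are purely imaginary with magnitude $\omega_0$. R1, C1, and C2 are the simulation values quoted in the paper; Rx is not reported there, so it is set to the boundary value of Eq. (2) purely as an assumption for illustration. All function names are hypothetical.

```python
import cmath
import math

def characteristic_roots(r1, c1, rx, c2):
    """Roots of Eq. (1): a*s^2 + b*s + 1 = 0."""
    a = r1 * c1 * rx * c2
    b = r1 * c1 + (rx - r1) * c2
    disc = cmath.sqrt(b * b - 4.0 * a)
    return (-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)

def satisfies_co(r1, c1, rx, c2):
    """Condition of oscillation, Eq. (2)."""
    return r1 * c2 >= r1 * c1 + rx * c2

def omega_0(r1, c1, rx, c2):
    """Frequency of oscillation in rad/s, Eq. (3)."""
    return 1.0 / math.sqrt(r1 * c1 * rx * c2)

R1, C1, C2 = 1.88e3, 20e-12, 102e-12
RX_BOUNDARY = R1 * (C2 - C1) / C2  # assumed Rx that makes Eq. (2) an equality

s1, s2 = characteristic_roots(R1, C1, RX_BOUNDARY, C2)
w0 = omega_0(R1, C1, RX_BOUNDARY, C2)
print(s1, s2, w0)
```

For Rx below the boundary value the coefficient of s is negative and the roots move into the right half plane, which is the start-up condition expressed by the inequality in Eq. (2).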

4 Simulation Results ORCAD PSPICE version 17.2 was used to simulate this CCCCTA-based sinusoidal oscillator circuit with 0.18 µm CMOS technology parameters. The value of capacitor C1 is taken as 20 pF and that of capacitor C2 as 102 pF, with biasing currents $I_A$ = 60 µA and $I_B$ = 120 µA. The value of R1 is taken as 1.88 kΩ, satisfying the condition of oscillation. Figure 5 shows the output waveform; the frequency obtained at the output terminal is 2.8 MHz, which verifies the theoretical results. Further, the frequency can be changed easily by varying the biasing current, which demonstrates the feasibility as well as the electronic controllability of the designed circuit. Table 2 shows a comparison of available sinusoidal oscillators with the proposed one.

Table 1 Transistor sizes used for implementation

MOSFETs               | W/L (µm)
M1–M3, M6–M7, M18     | 12/1
M4–M5                 | 20/1
M8–M11, M19           | 20/1
M12–M13               | 32/1
M14–M15, M24–M27      | 8/1
M16–M17               | 5/1
M20–M23               | 8/1

Fig. 5 Output waveform obtained for sinusoidal oscillator

Table 2 Comparison of available sinusoidal oscillators

Ref.          | Analog building block (ABB) | Number of ABB | Number of passive components | Electronic tunability of output current
[1]           | CCII                        | 1             | 5                            | No
[3]           | OTRA                        | 1             | 5                            | No
[5]           | CCII                        | 2             | 4                            | No
[6]           | CDTA                        | 3             | 3                            | No
[8]           | OTRA                        | 1             | 4                            | Yes
[9]           | OTRA                        | 1             | 6                            | No
Proposed work | CCCCTA                      | 1             | 3                            | Yes
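As a rough consistency check, Eq. (3) of Sect. 3 can be inverted to find the parasitic resistance that would produce the reported 2.8 MHz with the quoted component values. The paper does not state Rx, so the result below is only a back-calculation under the assumption that the ideal expression holds; the function name is hypothetical.

```python
import math

def implied_rx(f0_hz, r1, c1, c2):
    """Solve w0 = 1 / sqrt(R1*C1*Rx*C2) for Rx at a given frequency in Hz."""
    w0 = 2.0 * math.pi * f0_hz
    return 1.0 / (w0 ** 2 * r1 * c1 * c2)

# Component values quoted in the simulation section.
R1, C1, C2 = 1.88e3, 20e-12, 102e-12
rx = implied_rx(2.8e6, R1, C1, C2)
rx_limit = R1 * (C2 - C1) / C2  # largest Rx allowed by the condition of oscillation

print(round(rx), round(rx_limit))  # roughly 842 and 1511 ohms
```

The implied value lies below the limit set by the condition of oscillation, so the reported frequency and component values are mutually consistent under this idealized model.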


5 Conclusion This paper presents a sinusoidal oscillator which utilizes only a single CCCCTA with three passive elements. All the passive elements are grounded rather than floating, which makes this design suitable for integration. The circuit can provide both current and voltage outputs, and its CO and FO can be tuned electronically. The simulated output waveform of the implemented sinusoidal oscillator verifies the theoretical calculations.

Acknowledgements The authors thank the management, director, dean, and Head of the ECE department for providing the facilities and environment to pursue this work.

References 1. Bhaskar, D.R., Senani, R.: New current conveyor based single resistance controlled voltage controlled oscillator employing grounded capacitors. Electron. Lett. 29, 612–614 (1993) 2. Cam, U., Toker, A., Cicekoglu, O., Kuntman, H.: Current mode high output impedance sinusoidal oscillator configuration employing single FTFN. Analog Integr. Circ. Signal Process. 24, 231–238 (2000) 3. Cam, U.: A novel single resistance controlled sinusoidal oscillator employing single operational trans-resistance amplifier. Analog Integr. Circ. Signal Process. 32, 183–186 (2002) 4. Khan, A.A., Bimal, S., Dey, K.K., Roy, S.S.: Novel RC sinusoidal oscillator using second generation current. IEEE Trans. Instrum. Meas. 54(6), 2402–2406 (2005) 5. Fongsamut, C., Surakampontorn, W.: Current conveyor based single element controlled sinusoidal oscillator. Int. J. Electron. 93(7), 467–478 (2006) 6. Jin, J., Wang, C.: Current mode four phase quadrature oscillator using current differencing transconductance amplifier based first order all pass filter. Rev. Roum. Sci. Technol. Électrotechn. et Énerg. 57(3), 291–300 (2012) 7. Jantakun, A., Ngiamwibool, W.: Current mode sinusoidal oscillator using current controlled current conveyor trans-conductance amplifier. Rev. Roum. Sci. Techn. Électrotechn. et Énerg. 58(4), 415–423 (2013) 8. Pittala, C.S., Avireni, S.: Realization of sinusoidal oscillators using operational transresistance amplifier (OTRA). Int. J. Circ. Electron. 2, 103–113 (2017) 9. Komanapalli, G., Neeta Pandey, N., Pandey, R.: New realization of third order sinusoidal oscillator using single OTRA. Int. J. Electron. Commun. (AEÜ) 93, 182–190 (2018) 10. Prokop, R., Musil, V.: New modern circuit block CCTA and some its applications. ET 5, 93–98 (2005) 11. Siripruchyanun, M., Jaikla, W.: Current controlled current conveyor trans-conductance amplifier (CCCCTA): a building block for analog signal processing. Electr. Eng. 90(6), 443–453 (2008) 12. 
Sotner, R., Jerabek, J., Prokop, R., Vrba, K.: Current gain controlled CCTA and its application in quadrature oscillator and direct frequency modulator. Radioengineering 20(1), 317–326 (2011)

Automation of Railway Gate Using Raspberry Pi

K. Sabarish, Y. Prasanth, K. Chaitanya, N. Padmavathi, and K. Umamaheswari

1 Introduction India’s railway network is one of the largest in the world, with a length of 67,956 km as of 31 March 2020. India has more than 31,846 level crossing zones, and about one third of them have no gateman. About 40% of railway accidents in India occur at level crossing zones. According to the NCBR committee, accidents at level crossing zones are increasing year by year. Automation of railway gates is the best solution to reduce these accidents and improve safety at level crossing zones [1]. Converting all 31,846 level crossing zones to automated gates would be very costly for the Indian Government, but at least the unmanned gates can be automated, which would decrease the number of accidents at level crossing zones.

2 Problem Statement Although technology has advanced considerably, Indian Railways still uses traditional methods of operation at level crossing zones. There is now a clear need for automation to upgrade the system. Automation at unmanned level crossing zones will reduce the number of accidents and will also save time where no gateman is present. The opening and closing of gates are currently done by a second loco pilot, so automation would be one more step forward in the technological development of Indian Railways.

K. Sabarish (B) · Y. Prasanth · K. Chaitanya · N. Padmavathi · K. Umamaheswari Department of EIE, VR Siddhartha Engineering College, Vijayawada, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_58


3 Literature Survey

Pandey et al. proposed a system to eliminate the potential causes of accidents at a level crossing. Although automation is a cost-effective solution, it can itself lead to accidents in some circumstances, and their system was created to address this issue. Obstacle detection was accomplished using machine learning, specifically the SSD obstacle detection algorithm trained on a data set [2]. Khan et al. proposed a solution that automates rail gate control with real-time tracking and brakes the train automatically when an obstruction is present on the track, which is a further step in developing the automatic railway gate [3]. Bhaskar et al. proposed a system in which the train acts as the master controller and the level crossing zone acts as the slave controller, implemented using the Raspberry Pi. The construction is simple, and the gates close automatically when a train reaches the zone. The main problem is that a failure of the train's master controller affects the operation of all gates [4]. Ramesh Babu et al. proposed a system that uses GPS and GSM with an ARM controller to determine the acceleration and movement of the train, applying a Kalman filter in MATLAB to estimate the train's location. It is difficult to operate the gates in mountains, hills and areas without network coverage [5]. Antonić et al. described the working of axle counters, which count the number of axles of a train. Automation can be built on this idea, but counting errors can occur at turnouts and when electromagnetic brakes are used [6]. Naveen et al. proposed a structure for railroad protection based on visual surveillance that uses a variance-based algorithm. Many parameters are investigated by comparing it with the standard mean shift algorithm. It serves only as a reference for daylight conditions [7]. Verma et al. proposed a solution for preventing collisions between road vehicles and trains at manual railway level crossing zones. Even though it is one of the best methods, it relies on road users following the rules and on trains running to schedule, neither of which can be assumed in India [8]. Saxena proposed a system based on PLCs, in which accidents are avoided with a GPS-based automatic level crossing (LC). It can be used as an add-on to an existing signalling-based system. The PLC operates the railway gate when the train detection signal is received. The PLC has high computation power, but implementing a railway gate with a PLC is costly [9].


4 Methodology

The proposed system uses Internet of Things (IoT) technology, with a Raspberry Pi 3 B+ as the main processing unit. The Raspberry Pi receives data from the sensors and controls the associated actuators depending on the input data. Figure 1 gives a brief idea of the Raspberry Pi with the sensors and the actuators.

Fig. 1 Block diagram Raspberry Pi

In the proposed system, infrared (IR) sensors are installed adjacent to the railway track at a pre-calculated distance from the level crossing. The distance between two sensors is known in advance, so when the train passes both of them, the time it takes to move from one to the other is measured and sent to the Raspberry Pi installed at the level crossing zone [10, 11]. The proposed model is depicted in block diagram form in Fig. 2.

Fig. 2 Block diagram of Raspberry Pi at level crossing zone

The train's speed is calculated by the microprocessor. The measured speed and the distance of the farthest sensor from the crossing are used to calculate the expected time for the train to arrive at the crossing. The microprocessor then issues a warning: the LED lights placed at the level crossing act as a warning signal to drivers approaching the level crossing zone, and the colour of the signal light tells road users when it is safe and when it is unsafe to enter the crossing zone. The gates begin to close as soon as the red light is turned on. This part of the processing, handled by the microprocessor, is critical in avoiding dangers. After a train has passed over the level crossing, the Raspberry Pi sends a command to open the gates and restore normal operation. The Raspberry Pi closes and opens the gates using servo motors that rotate in both clockwise and anticlockwise directions. A piezo buzzer alerts blind people to the arrival of the train. The LEDs used as signal lights make it easy for people crossing the track to understand the status of the train and whether it is the correct time to cross the gate [12–14]. Figure 3 gives an idea of the arrangement of the servo motors at the level crossing zone.

Fig. 3 Arrangement of proposed gate control
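The speed and arrival-time calculation described in the methodology can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the 100 m sensor gap, the 4 s transit time and the 2000 m distance to the crossing are hypothetical values, not figures from the paper.

```python
def train_speed(sensor_gap_m: float, t_first_s: float, t_second_s: float) -> float:
    """Speed in m/s from the time taken to travel between two IR sensors."""
    elapsed = t_second_s - t_first_s
    if elapsed <= 0:
        raise ValueError("the second sensor must trigger after the first")
    return sensor_gap_m / elapsed


def time_to_crossing(distance_to_crossing_m: float, speed_mps: float) -> float:
    """Expected time in seconds for the train to reach the crossing."""
    if speed_mps <= 0:
        raise ValueError("speed must be positive")
    return distance_to_crossing_m / speed_mps


# Hypothetical example: sensors 100 m apart, triggered 4 s apart,
# with 2000 m still to travel to the crossing.
speed = train_speed(sensor_gap_m=100.0, t_first_s=0.0, t_second_s=4.0)
eta = time_to_crossing(distance_to_crossing_m=2000.0, speed_mps=speed)
print(f"speed = {speed:.1f} m/s, expected arrival in {eta:.0f} s")
```

On the real hardware the two timestamps would come from the IR sensor interrupts; here they are passed in directly so that the arithmetic can be checked in isolation.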

5 Proposed Method

A. Overview

We built the proposed system around a Raspberry Pi 3 B+ controller. It has a fast processor and a number of GPIO pins for attaching various peripherals, and it is one of the most widely used boards in IoT applications. Infrared sensors detect the train's arrival. In the prototype, the distance between the track sensors is measured in centimetres; in a real-world scenario, the distance would be measured in kilometres and the speed in kilometres per hour. The speed of the prototype train was reduced accordingly. LEDs serve as signal indicators, together with alerting devices, to inform road users [15–18].


Fig. 4 Proposed model’s prototype

Whenever the train arrives at the first IR sensor, a red LED lights up and a buzzer sounds to indicate that the train has arrived. When it reaches the second IR sensor, the gates are closed. The gates open again when the train reaches the third IR sensor. The mechanism is the same for both directions. Figure 4 illustrates the proposed model's prototype.

B. Components Used

(1) Raspberry Pi 3 B+

The Raspberry Pi 3 B+ is shown in Fig. 5. The first-generation Raspberry Pi uses the Broadcom BCM2835 SoC, which is equivalent to a chip found in an older smartphone and is powered by an ARM1176JZF-S processor. Its default clock speed is 700 MHz, which gives a real-world performance of around 0.041 GFLOPS. In processing power this is comparable to a 300 MHz Pentium II of 1997–1999, while the GPU delivers 1 Gpixel/s, 1.5 Gtexel/s or 24 GFLOPS of general-purpose compute, so the graphics capabilities are roughly equivalent to those of a 2001-era console. At the default 700 MHz, the processor does not become hot enough to need a heat sink or extra cooling. The Raspberry Pi is a compact device about the size of a debit card that runs Raspberry Pi software. The Raspberry Pi 3 Model B+ is a more powerful variant of the earlier Raspberry Pi 3 Model B. Its BCM2837B0 system-on-chip (SoC) features a 1.4 GHz quad-core ARMv8 64-bit processor and a VideoCore IV GPU. The dual-band wireless LAN has modular compliance certification, which allows the board to be designed into end products with much simplified wireless LAN validation, lowering both cost and time to market [2].

Fig. 5 Raspberry Pi 3 B+

(2) IR Sensor LM393

Infrared LEDs emit light in the infrared range. Humans cannot see infrared light because its wavelength (700 nm–1 µm) is longer than that of visible light. IR LEDs have an emission angle of about 20–60°, and their range is from a few centimetres to many feet, depending on the type of IR transmitter and the manufacturer. Some transmitters have a range of kilometres. Because IR LEDs are white or transparent, they can emit the highest quantity of light. The IR sensor is shown in Fig. 6.

Fig. 6 IR sensor

The photodiode acts as the receiver of infrared light: it conducts whenever the beam strikes it. The photodiode is a semiconductor device with a P–N junction that operates under reverse bias, meaning that it conducts current in the reverse direction when light strikes it, and the current is proportional to the amount of light. This property makes it useful for IR detection. The photodiode looks like an LED with a dark coating on its outer surface; the black colour absorbs more light. Table 1 lists the features of infrared sensors [3].

Table 1 Specifications of IR sensors

| Parameter | Value |
|---|---|
| Model or type | LM393 |
| Distance measuring range | 2–30 cm |
| Operating voltage | 3.6–5 VDC |
| Detecting angle | 35° |

(3) Servo Motor SG90

The SG90 micro servo motor is the most commonly used servo motor in RC applications. Servo motors are employed in precise control applications such as positioning robot arms and tools in machining equipment.


Fig. 7 Servo motor

Servo motors typically operate over a 180° range. The servo motor is depicted in Fig. 7. Pulse width modulation is used to control the motor's angular position: changing the duty cycle changes the angle of the shaft. With a load hanging 1 cm from the shaft, the SG90 servo motor can lift up to 1.6 kg. It is also suitable for robotic arms, CNC machines, RC car steering systems and other robotic or automation applications. The servo motor's specifications are given in Table 2 [4, 19].

Table 2 Specifications of servo motor

| Parameter | Value |
|---|---|
| Model/type | SG90 |
| Stall torque at 4.8 V | 1.2 kg-cm |
| Stall torque at 6.6 V | 1.6 kg-cm |
| Weight of motor | 9 g |
| Operational voltage | 3–7.2 V |

(4) Piezo buzzer

The main function of the buzzer is to convert an electrical audio signal into sound. It is generally powered by a DC voltage and used in timers, alarm devices, printers, computers, etc. Depending on its design, it can generate different sounds such as an alarm, music, a bell or a siren. The piezo buzzer is shown in Fig. 8. It has two pins, one positive and one negative. The positive terminal is indicated by a '+' sign or by the longer lead. The negative terminal is indicated by a '−' sign or by the shorter lead and must be connected to a GND terminal; the buzzer is powered by 5 V. Table 3 gives the specifications of the buzzer [4, 19, 20].

Fig. 8 Piezo buzzer

Table 3 Specifications of buzzer

| Parameter | Value |
|---|---|
| Supply current | Below 15 mA |
| Frequency range | 3300 Hz |
| Sound pressure level | 85 dBA at 10 cm |
| Operating voltage range | 3–24 V DC |
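The relation between servo angle and PWM duty cycle mentioned for the SG90 can be illustrated as follows. This is a minimal sketch under assumptions that are not stated in the paper: a 50 Hz control signal (20 ms period) and pulse widths of 1 ms at 0° and 2 ms at 180°, which are nominal hobby-servo values; a real SG90 may need different limits. The second helper restates the stall-torque figure (1.6 kg-cm) as the load that can be held at a given arm length.

```python
def angle_to_duty_cycle(angle_deg: float,
                        period_ms: float = 20.0,      # assumed 50 Hz signal
                        min_pulse_ms: float = 1.0,    # assumed pulse at 0 degrees
                        max_pulse_ms: float = 2.0) -> float:  # assumed pulse at 180 degrees
    """Return the PWM duty cycle in percent for a shaft angle in [0, 180]."""
    if not 0.0 <= angle_deg <= 180.0:
        raise ValueError("angle must lie between 0 and 180 degrees")
    pulse_ms = min_pulse_ms + (max_pulse_ms - min_pulse_ms) * angle_deg / 180.0
    return 100.0 * pulse_ms / period_ms


def max_load_kg(stall_torque_kg_cm: float, arm_length_cm: float) -> float:
    """Largest load the servo can hold at the given distance from the shaft."""
    if arm_length_cm <= 0:
        raise ValueError("arm length must be positive")
    return stall_torque_kg_cm / arm_length_cm


for angle in (0.0, 90.0, 180.0):
    print(f"{angle:5.1f} deg -> {angle_to_duty_cycle(angle):.1f} % duty cycle")
print(f"load at 2 cm: {max_load_kg(1.6, 2.0):.1f} kg")
```

Under these assumptions, an open gate at 0° and a closed gate at 90° would correspond to duty cycles of 5% and 7.5%; the actual gate angles used in the prototype are not given in the text.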

6 Results and Discussion

Figure 4 shows the hardware circuit, with servo motors adjacent to the road on both sides of the level crossing zone and IR sensors arranged in pairs. The average length of an Indian train is 650 m. We assume that the first set of IR sensors that detects the train is 10 km from the level crossing zone and that the second set is 2 km from it. Table 4 gives the parameters of the real-time and prototype models.

Table 4 Real-time vs prototype parameters

| Parameter | Real-time | Prototype model |
|---|---|---|
| Train length | 650 m | 5.2 cm |
| Distance of first IR sensor set from the level crossing | 10 km | 80 cm |
| Distance of second IR sensor set from the level crossing | 2 km | 16 cm |
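The real-time and prototype values in Table 4 are related by a single scale factor, which the following short check makes explicit. The dictionary keys are hypothetical labels introduced only for this illustration.

```python
# Real-world values in metres and prototype values in centimetres (Table 4).
REAL_M = {"train_length": 650.0, "first_sensor": 10_000.0, "second_sensor": 2_000.0}
PROTOTYPE_CM = {"train_length": 5.2, "first_sensor": 80.0, "second_sensor": 16.0}

# Metres of real track represented by one centimetre of the prototype.
scales = {name: REAL_M[name] / PROTOTYPE_CM[name] for name in REAL_M}

for name, scale in scales.items():
    print(f"{name}: 1 cm on the prototype represents {scale:.0f} m")
```

All three rows give the same ratio, 1 cm to 125 m, i.e. a length scale of 1:12,500.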


Fig. 9 Train reaching first IR sensor

Fig. 10 Train reaching second IR sensor

As shown in Fig. 9, when the train breaks the IR beam of the first sensor, the buzzer sounds and the red light glows. The buzzer and the red light stay on until the train has crossed the level crossing zone. When the train reaches the second IR sensor, the gates are closed, as shown in Fig. 10. At that moment, the output in Fig. 11 is displayed on the monitor, giving the status and the speed of the train. Since this is a prototype, the speed is displayed in centimetres per second (cmps). The gates remain closed until the train reaches the first IR sensor after the level crossing zone, as indicated in Fig. 12. As shown in Fig. 13, when the train passes the IR sensor just beyond the level crossing zone, the gates return to their normal (open) position and the buzzer and red light turn off. At the same moment, "train out" is displayed on the monitor, as shown in Fig. 14. The operation is identical when the train travels in the opposite direction.
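The sequence of events described above behaves like a small state machine. The sketch below models only that logic; it performs no GPIO access, and the class name, state names and event labels ("first", "second", "exit") are hypothetical choices made for illustration rather than the authors' implementation.

```python
from enum import Enum, auto


class GateState(Enum):
    IDLE = auto()     # gates open, red light and buzzer off
    WARNING = auto()  # train detected: red light and buzzer on, gates still open
    CLOSED = auto()   # gates closed, red light and buzzer still on


class CrossingController:
    """Pure-logic model of the level crossing; no hardware is touched."""

    def __init__(self) -> None:
        self.state = GateState.IDLE
        self.red_light = False
        self.buzzer = False
        self.gates_closed = False

    def on_sensor(self, sensor: str) -> GateState:
        """Advance the state for a sensor event: 'first', 'second' or 'exit'."""
        if sensor == "first" and self.state is GateState.IDLE:
            self.red_light = self.buzzer = True
            self.state = GateState.WARNING
        elif sensor == "second" and self.state is GateState.WARNING:
            self.gates_closed = True
            self.state = GateState.CLOSED
        elif sensor == "exit" and self.state is GateState.CLOSED:
            self.red_light = self.buzzer = self.gates_closed = False
            self.state = GateState.IDLE
        # Any other event/state combination is ignored.
        return self.state


controller = CrossingController()
trace = [controller.on_sensor(event).name for event in ("first", "second", "exit")]
print(trace)
```

Ignoring out-of-order events is a design choice of this sketch; a deployed controller would more likely treat them as faults and fall back to a safe state.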


Fig. 11 Output displayed on the monitor with train’s speed and status

Fig. 12 At the level crossing zone, there’s a train

Fig. 13 Train reaching IR sensor which is next to level crossing zone

Fig. 14 Final output displayed on the monitor with train's speed and status

7 Conclusion

In terms of the paper's goals, the proposed system presents an adequate solution for preventing accidents at level crossing zones. The automatic railway level crossing system was created with the goal of minimising level crossing accidents around the globe, and it is simple for the railway department to implement.

8 Limitations

The solution addresses accident avoidance only. It has no automated backup option in the event of a mechanical malfunction, which makes it insufficient in that situation.

9 Future Scope

Other types of sensors and actuators that may perform better than the conventional ones can be employed. The design is also flexible, because new modules can be introduced without affecting the existing ones.

Acknowledgements The authors would like to express their gratitude to the EIE department of Velagapudi Ramakrishna Siddhartha Engineering College and to all reviewers and experts for providing important suggestions and support.

References

1. Level crossing accidents up 20% in 2019: NCRB. Available: https://indianexpress.com/article/india/leviel-crossing-accidents-up-20-in-2019-ncrb-6579574/
2. Pandey, P., Shettar, C., Kulkarni, N., Chowdhary, A., Kulkarni, K.S., Doddamani, P.K.: Automated gate control with backup systems at level crossing. San Francisco State University, June 17 (2021)
3. Khan, M.R., Chowdhury, K.B.Q., Abdur Razzak, Md.: Automation of rail gate control with obstacle detection and real-time tracking in the development of Bangladesh railway. Carleton University, May 31 (2021)
4. Bhaskar, D., Agrawal, V.K., Anirudha, H.M., Pradeep Varna, B.P.: Solution to unmanned railway crossing using satellite and GPS based approach. In: 4th International Conference for Convergence in Technology (I2CT), SDMIT Ujire, Mangalore, India, October 27–28 (2018)
5. Ramesh Babu, R., Srinivasan, S., Chellaswamy, C.: A humanitarian intelligent level crossing controller utilizing GPS and GSM. In: IEEE Global Humanitarian Technology Conference - South Asia Satellite (GHTC-SAS), September 26–27 (2014)
6. Antonić, N.M., Nikolić, M.V., Kokić, I.Z., Milanović, M.D., Stojković, Z.M., Kosić, B.D.: Railway axle counter prototype. In: 22nd Telecommunications Forum TELFOR, Serbia, Belgrade, November 25–27 (2014)
7. Naveen, Ch., Gajbhiye, P., Satpute, V.R.: VIRTUe: video surveillance for rail-road traffic safety at unmanned level crossings. In: IEEE Region 10 Symposium (TENSYMP) (2017)
8. Verma, S., Singhal, V., Singh, A., Kavita, Anand, D., Rodrigues, J.J.P.C., Zaman, N., Ghosh, U., Iwendi, C.: Artificial intelligence enabled road vehicle-train collision risk assessment framework for unmanned railway level crossings. IEEE Access (2017). https://doi.org/10.1109/ACCESS.2020.3002416
9. Saxena, K.: Automated level crossings - a futuristic solution enabling smart city infrastructure. University of Exeter (2020)
10. Yuan, L., Du, Q., Qu, Y., Liu, Y., Li, K.: Modeling and simulation of expanded train control system based on Simulink/Stateflow. IEEE 978-1-5386-7528-1/18 (2018)
11. Srinivas, Ch., Govardhan Rao, K.V.: Automatic railway gate control system using microcontroller. Int. J. Innov. Res. Sci. Eng. Technol. 7(5) (2018)
12. Chowdhury, I., Ahmed, T.: Into the binary world of zero death toll by implementing a sustainable powered automatic railway gate control system. In: IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT) (2020)
13. Nidhi, Yadav, S., Krishna: Automatic railway gate control using microcontroller. Orient. J. Comput. Sci. Technol. 6(4) (2013)
14. Hoque, S., Bhuiyan, R.H., Khan, T.N., Hasan, R., Biswas, S.: Pressure sensed fast response anti-collision system for automated railway gate control. Am. J. Eng. Res. (AJER) 2(11) (2013)
15. Saqquaf, S.S.M., Rashmi, V., Akhil, V., Shruthi, P.C., Chaithra, N.: An IoT based railway security system for automated manning at level crossings. In: IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT-2018) (2018)
16. Mohaimin Billah, Md., Mahmud, S., Emon, I.R.: Automated railway gate controlling system. Int. J. Comput. Trends Technol. (IJCTT) 27(1) (2015)
17. Zafar, N.A., Latif, S., Rehman, A.: Automata based railway gate control system at level crossing. In: International Conference on Communication Technologies (2019)
18. Balasubramanian, P., Thamilarasi, N., Banuchandar, J., Deepa, S., Kaliraj, V.: Automated unmanned railway level crossing system. Int. J. Mod. Eng. Res. (IJMER) 2(1) (2012)
19. Baby, E., Bobby, M., Krishnamurthi, K., Vidya, V.: Sensor-based automatic control of railway gates. Int. J. Adv. Res. Comput. Eng. Technol. (IJARCET) 4(2) (2015)
20. Tun, H.M., Tun, Z.M., Pwint, H.N.Y.: Automatic railway gate control system using microcontroller. Int. J. Sci. Eng. Technol. Res. (IJSETR) 3(5) (2014)

Comparative Analysis of Image Segmentation Methods for Mushroom Diseases Detection Y. Rakesh Kumar and V. Chandrasekhar

1 Introduction

Agaricus bisporus is a grassland mushroom native to North America and Europe. It comes in white and brown varieties, with the white button mushroom being the most common in India. A. bisporus is grown seasonally and in climate-controlled cropping rooms. Even in a controlled environment, lack of knowledge and skill in crop management and human error can lead to diseases such as wet bubble, dry bubble, cobweb, green mold, brown plaster, olive green, and bacterial blotch [1–3]. An expert's opinion is therefore needed to analyze and control the disease, but experts may not be easily accessible and consulting them is time-consuming. A well-designed computer-aided technique that identifies and segments diseases with high accuracy is an alternative.

2 Literature Survey

Because of its widespread use and application, segmentation remains an important research topic in image processing and computer vision.

Y. Rakesh Kumar (B)
ECE, G. Narayanamma Institute of Technology and Science for Women and Research Scholar, Osmania University Hyderabad, Hyderabad 500104, India
e-mail: [email protected]
V. Chandrasekhar
Department of Electronics and Communication Engineering, MVSR Engineering College, Hyderabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_59


A data mining approach to diagnosing mushroom diseases is proposed in [4]. Sequential minimal optimization (SMO) [5], Naive Bayes, and ripple-down rules (RIDOR) are used; the accuracy is 100% for Naive Bayes and SMO and 89.09% for RIDOR. The error rate for Naive Bayes is lower than for SMO and RIDOR. The authors considered 16 diseases; however, each disease had only a small number of photos, and the input image data had to be converted to a different file format, so manual intervention is required. Automated web-based software for crop management is proposed in [6, 7]. The software was created to inspect white button mushroom crops and detect diseases and pests using image processing. It reduces the amount of data and the time that human users must process. It helps to spread disease information through samples uploaded to the website's server as text and photographs. A pixel-by-pixel comparison is then performed, and if the matching ratio reaches a specified percentage, the software returns the disease information. An expert system for diagnosing oyster mushroom diseases caused by bacteria, molds, and viruses, as well as the threat posed by pests and insects to mushroom growers, has been established [8]. It is a rule-based system that draws inferences from a knowledge base using forward chaining [9]. Farmers are asked text-based questions about the symptoms and status of their crops, and the final decision depends on the farmer's responses. A machine learning method [10] presents a CNN-based approach for studying and analyzing images of mushroom pests and diseases. The system includes a user interface for entering data and browsing analysis results, as well as a manager interface for training analytical models. A CNN used as a classifier shows good performance and achieves a high classification rate, and its application in different research fields is discussed in [11]. An overview of achievements, problems encountered, and open issues in image segmentation, and of the application of segmentation approaches in various fields, is given in [12]. The strategies used in image segmentation are threshold based, edge based, and region based. A segmentation approach for disease-spotted leaves under real-world conditions that employs an advanced comprehensive color feature (ACCF) and region growing is proposed in [13]. It uses ACCF and region growing for disease spot segmentation to deal with cluttered backgrounds and uneven illumination. Different segmentation approaches for banana leaf disease detection are evaluated and compared in [14], including "adaptive thresholding, Canny, fuzzy C-means, geodesic, color segmentation, global thresholding, log, K-means, multi-thresholding, Prewitt, region expanding, zero crossing and Robert, Sobel". A disease detection model that uses K-means and the Otsu technique to detect and identify infected arecanuts is presented in [15]. Preprocessing and disease identification are the two stages of this approach. In preprocessing, the arecanut image is separated from the background using color K-means clustering to reduce shadow effects. Otsu thresholding is used for disease detection, and the damaged arecanut regions are then marked using a connected components technique.

A technique based on digital image processing [16] was used to detect and classify various diseases of plant leaves. The model converts RGB to HSI, and different segmentation approaches such as K-means clustering, thresholding, and Otsu's method were investigated and employed for disease management strategies that are useful to the agricultural industry. The K-means algorithm is used in [17] to separate fruits from similar-looking leaves and branches. The automation uses the known center-of-mass position to pluck the fruit, and color-based segmentation supports the identification of ripe fruits. Global thresholding is employed in [18] to segment the infected area of the fruit in the input photos, and a supervised learning technique, multi-class SVM, classifies the images based on features extracted from the RGB color images. Three common apple diseases are considered. A hybrid algorithm based on a split-and-merge strategy is proposed in [19] for image segmentation and applied to fruit defect identification. To obtain an over-segmentation result, the algorithm first uses K-means to split the image into regions based on Euclidean color distance in the L*a*b* color model. Then, guided by a graph representation, a minimal spanning tree merge process repeatedly combines similar regions into new homogeneous ones. Bhargava and Bansal [20] provide a complete description of several methods, such as "preprocessing, segmentation, feature extraction, and classification", used to assess the quality of vegetables and fruits based on color, texture, shape, size, and defects. A new approach to plant leaf segmentation based on active contours and a saliency map is presented in [21]. To limit the influence of the initial contour location, visual saliency detection is used to extract a priori shape information about the target object from an input leaf image, which is then used to adaptively define the initial curve. Automatic identification of plant leaf curl is a vital step in the development of a computer-assisted agricultural system for damage analysis [22]. In [23], RNA analysis of biological material frozen in liquid nitrogen and microarray expression analysis are carried out. The experimental process requires manual involvement, i.e., the use of chemical and biological methods. Because only a few samples are generally checked during the experimental process, it is time-consuming and its accuracy may be reduced. Isolation of dsRNA and electron microscopy were employed in [24]. Virus diseases are detected by chemical and biological procedures, which involve manual intervention.

3 Methodology and Method

3.1 K-means Clustering-Based Mushroom Disease Segmentation

The unsupervised K-means clustering method is used to segment the targeted region from the background. It is applied to unlabeled data and groups the data into K clusters based on similarity to K centroids.


The standard (naive) K-means clustering algorithm is used; its steps are as follows [16]:

1. Consider an image of N pixels $(P_1, P_2, P_3, \ldots, P_N)$ to be partitioned into K clusters. K initial points are taken as the cluster centroids $(C_1, C_2, C_3, \ldots, C_K)$.
2. Group each pixel into the cluster whose centroid is closest to it, i.e., assign $P_i$ to cluster $s_q$ if

$$\lVert P_i - C_q \rVert < \lVert P_i - C_j \rVert, \quad j \neq q, \quad q = 1, 2, \ldots, K \tag{1}$$

3. After all the pixels are allocated, recalculate the positions of the centroids. The new centroid is

$$Y_i^{*} = \frac{1}{N_i} \sum_{P_j \in s_i} P_j, \quad i = 1, 2, \ldots, K \tag{2}$$

where $N_i$ is the number of elements in cluster $s_i$.
4. Stop if the cluster centroids are not changing; otherwise, repeat from step 2 until the centroids no longer change.
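The steps above can be sketched in plain Python. This is an illustrative implementation of naive K-means on a short list of RGB pixels, not the authors' MATLAB code; the function names, the random choice of initial centroids, the iteration cap and the six example pixels are all assumptions made for the sketch.

```python
import random


def squared_distance(p, c):
    """Squared Euclidean distance between a pixel and a centroid."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))


def kmeans(pixels, k, max_iter=100, seed=0):
    """Naive K-means: returns (labels, centroids) for a list of pixel tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(pixels, k)  # step 1: K initial points as centroids
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every pixel to its nearest centroid.
        clusters = [[] for _ in range(k)]
        labels = []
        for p in pixels:
            q = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[q].append(p)
            labels.append(q)
        # Step 3: recompute each centroid as the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(channel) / len(members) for channel in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical pixels: three light (healthy-looking) and three dark brown ones.
pixels = [(250, 250, 245), (245, 248, 240), (240, 250, 250),
          (90, 60, 30), (100, 70, 40), (95, 65, 35)]
labels, centroids = kmeans(pixels, k=2)
print(labels)
```

As the paper notes, the user still has to choose K and decide which of the resulting clusters corresponds to the diseased region; the sketch only groups the pixels.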

3.2 Region of Interest-Based Mushroom Disease Segmentation

The regions can be geographical in character, such as polygons that contain adjacent pixels, or they can be specified by a set of intensities. In the latter case, the pixels are not always contiguous. The algorithmic steps of the ROI method are as follows [25, 26]:

Step 1: Prior information is used to divide the objects into groups and to calculate the lowest detection resolution for each group.
Step 2: The image is downsampled to the required resolution.
Step 3: For each group, image pixels with strong optical properties are detected, and morphological processing is used to increase the likelihood that each region, referred to as a region-of-candidates (ROC), corresponds to just one target.
Step 4: The ROCs are calculated using a voting technique that maximizes the number of pixels in these regions for all groups. Finally, using prior information, the surviving ROCs are integrated so that target pixels with weak optical characteristics are included within the relevant ROC. The extended ROCs are regarded as ROIs.


3.3 Color Threshold-Based Mushroom Disease Segmentation [27]

The global thresholding principle is used to identify the diseased part. The acquired images are transformed to the L*a*b* color space to suppress the background using Eqs. (3)–(5).

$$L = 116\, f(v/v_n) - 16 \tag{3}$$

$$a = 500\,\bigl[f(u/u_n) - f(v/v_n)\bigr] \tag{4}$$

$$b = 700\,\bigl[f(v/v_n) - f(w/w_n)\bigr] \tag{5}$$

where u, v, w are intermediate variables computed from the values R, G, B of the RGB channels:

$$u = 0.412R + 0.357G + 0.18B$$

$$v = 0.2126R + 0.715G + 0.072B$$

$$w = 0.019R + 0.119G + 0.95B$$

and f is the function used in (3)–(5), given by

$$f(t) = \begin{cases} \sqrt[3]{t} & \text{if } t > 0.008895 \\ 7.787\,t + \dfrac{16}{116} & \text{otherwise} \end{cases}$$
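The L*a*b* conversion can be illustrated with a short sketch. It uses the u, v, w coefficients printed in the text, but the remaining choices are assumptions: RGB values are taken to be normalized to [0, 1], the reference white (u_n, v_n, w_n) is taken to be the response to R = G = B = 1 because the text does not specify it, and the constants 200 (scale of b) and 0.008856 (threshold in f) follow the standard CIE L*a*b* definition, whereas the equations as printed in the text show 700 and 0.008895.

```python
def f(t: float, threshold: float = 0.008856) -> float:
    """Piecewise cube-root function of the L*a*b* equations (standard CIE threshold)."""
    return t ** (1.0 / 3.0) if t > threshold else 7.787 * t + 16.0 / 116.0


def rgb_to_uvw(red: float, green: float, blue: float):
    """Intermediate variables u, v, w with the coefficients given in the text."""
    u = 0.412 * red + 0.357 * green + 0.18 * blue
    v = 0.2126 * red + 0.715 * green + 0.072 * blue
    w = 0.019 * red + 0.119 * green + 0.95 * blue
    return u, v, w


# Assumed reference white: the (u, v, w) response to R = G = B = 1.
U_N, V_N, W_N = rgb_to_uvw(1.0, 1.0, 1.0)


def rgb_to_lab(red: float, green: float, blue: float):
    """Convert normalized RGB in [0, 1] to (L, a, b)."""
    u, v, w = rgb_to_uvw(red, green, blue)
    fu, fv, fw = f(u / U_N), f(v / V_N), f(w / W_N)
    lightness = 116.0 * fv - 16.0
    a = 500.0 * (fu - fv)
    b = 200.0 * (fv - fw)  # standard CIE scale factor (assumption, see lead-in)
    return lightness, a, b


print(rgb_to_lab(1.0, 1.0, 1.0))  # white: L close to 100, a and b close to 0
print(rgb_to_lab(0.6, 0.4, 0.2))  # a brownish pixel has positive a and b
```

Separating lightness from the two color axes is what makes per-channel thresholds practical: a brown lesion on a white cap shifts a and b even where the brightness changes little.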



The image is then transformed to the YCbCr color model using Eqs. (6)–(8) and to the Lin color model, which applies gamma correction to linearize the image, using Eq. (9). Morphological operations are then carried out, and a final multi-thresholding step segments the diseased area; after conversion to a binary image, the diseased region can be seen clearly. Threshold values for each channel of the L*a*b* color model are determined using histogram analysis. Finally, morphological operations are used to remove residue. Figure 1 shows the block schematic of the color threshold method.

$$Y = 12 + (R \cdot 65.738/256) + (G \cdot 129.057/256) + (B \cdot 25.064/256) \tag{6}$$

$$Cb = 128 - (R \cdot 37.95/256) - (G \cdot 74.49/256) + (B \cdot 112.439/256) \tag{7}$$

$$Cr = 128 + (R \cdot 112.439/256) - (G \cdot 94.15/256) + (B \cdot 18.285/256) \tag{8}$$

$$L_{out} = A\,L_{in}^{\gamma} \tag{9}$$

where $L_{in}$ is the input, $\gamma$ is the power, $A = 1$, and $L_{out}$ is the output.
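The YCbCr transform and the gamma correction of Eq. (9) can be sketched as below. One assumption should be stated loudly: the sketch uses the standard ITU-R BT.601 8-bit constants, in which the luma offset is 16 and the blue term of Cr is negative, whereas Eqs. (6) and (8) as printed show an offset of 12 and a positive blue term. The gamma value in the example is also a hypothetical choice.

```python
def rgb_to_ycbcr(r: float, g: float, b: float):
    """8-bit RGB to YCbCr using the standard BT.601 constants (assumption)."""
    y = 16.0 + (65.738 * r + 129.057 * g + 25.064 * b) / 256.0
    cb = 128.0 + (-37.945 * r - 74.494 * g + 112.439 * b) / 256.0
    cr = 128.0 + (112.439 * r - 94.154 * g - 18.285 * b) / 256.0
    return y, cb, cr


def gamma_correct(l_in: float, gamma: float, a: float = 1.0) -> float:
    """Eq. (9): L_out = A * L_in ** gamma, for L_in normalized to [0, 1]."""
    if l_in < 0:
        raise ValueError("input intensity must be non-negative")
    return a * l_in ** gamma


y, cb, cr = rgb_to_ycbcr(255, 255, 255)
print(f"white -> Y = {y:.1f}, Cb = {cb:.1f}, Cr = {cr:.1f}")
print(gamma_correct(0.25, gamma=0.5))  # hypothetical gamma of 0.5
```

With these constants any gray pixel (R = G = B) maps to Cb = Cr = 128, which is a quick sanity check for whichever coefficient set is actually used.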


Fig. 1 Block schematic of mushroom disease segmentation using the L*a*b* channel threshold method

4 Results and Discussions on Disease Segmentation

Simulations were carried out in the MATLAB environment for the diseases cobweb, dry bubble, wet bubble, mites, and bacterial blotch; the original test images (CO1, DO2, WO3, MO4, BO5) are shown in the first column of Table 1. Subjective analysis is carried out for the K-means, ROI, and color threshold methods. Only the final-stage outputs of the K-means, ROI, and color threshold methods are shown, in the second column (K1, K2, K3, K4, K5), third column (R1, R2, R3, R4, R5), and fourth column (C1, C2, C3, C4, C5), respectively, of Table 1. The color threshold method performed well in segmenting the diseased part from the mushroom image compared with the other traditional methods for all the diseases considered, but for some images of diseases such as wet bubble and mites, non-diseased parts are extracted along with the diseased part in the final output images. The K-means method is not automatic and could not properly detect the diseased part for cobweb and mites. As ROI is not automatic, it could not cover the zig-zag structures exactly, and some excess regions may remain in the output image. The color threshold method is compared with the standard techniques in terms of accuracy, defined by Eq. (10).

$$\text{Accuracy} = \frac{\text{Total number of images segmented w.r.t. diseased part from input image}}{\text{Total number of images considered for test}} \tag{10}$$

To check the performance, 250 diseased and non-diseased images [20, 28–32] are considered. Table 2 compares the accuracy of the different segmentation methods. The accuracy calculated using Eq. (10) is 95.6% for the color threshold method, 92% for K-means, and 89.2% for region of interest.

Table 1 Results for different diseases segmented from mushroom (image panels are not reproduced here; only the panel labels are listed)

| Mushroom disease | Original image | K-means-based disease segmented | ROI-based disease segmented | Color threshold-based disease segmented |
|---|---|---|---|---|
| Cobweb | CO1 | K1 | R1 | C1 |
| Dry bubble | DO2 | K2 | R2 | C2 |
| Wet bubble | WO3 | K3 | R3 | C3 |
| Mites | MO4 | K4 | R4 | C4 |
| Bacterial blotch | BO5 | K5 | R5 | C5 |


Table 2 Accuracy of different segmentation techniques

| S. No. | Segmentation technique | No. of segmented images out of 250 images | Accuracy (%) |
|---|---|---|---|
| 1 | Color threshold method | 239 | 95.6 |
| 2 | K-means method | 230 | 92 |
| 3 | Region of interest | 223 | 89.2 |

Both the subjective and the objective experimental analyses indicate that the proposed color threshold method performs better than the K-means and ROI methods.
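As a quick arithmetic check, the accuracy definition in Eq. (10) can be applied to the segmented-image counts reported in Table 2. The sketch below is only an illustration; the function and variable names are hypothetical and are not part of the original MATLAB experiments.

```python
# Segmented-image counts taken from Table 2; 250 test images in total.
TOTAL_IMAGES = 250
SEGMENTED = {
    "Color threshold method": 239,
    "K-means method": 230,
    "Region of interest": 223,
}


def accuracy_percent(segmented: int, total: int) -> float:
    """Eq. (10) expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * segmented / total


# Accuracy of each method, rounded to one decimal place as in Table 2.
ACCURACY = {
    name: round(accuracy_percent(count, TOTAL_IMAGES), 1)
    for name, count in SEGMENTED.items()
}

for name, value in ACCURACY.items():
    print(f"{name}: {value}%")
```

The computed values agree with the accuracy column of Table 2.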

5 Conclusion

Mushroom disease detection is critical for mushroom farming and crop yield. Manual disease detection is time consuming and inaccurate, because only a few samples are evaluated during the experimental phase and expertise varies from one person to another. Computer-assisted disease segmentation methods such as K-means and ROI are not fully automatic. With K-means, the user must define the value of K and choose the resulting cluster in which the required object is segmented, and the approach does not guarantee that the desired object falls within a single cluster. The ROI method captures the desired object, but if the object has a zig-zag outline or the target areas are not close together, it may remove target pixels or include non-target pixels. The color threshold method is fully automatic and performs well compared with the other methods, with an accuracy of 95.6%.

Acknowledgements I thank Dr. Anil Kumar Rao, Scientist, ICAR-Directorate of Mushroom Research, Solan (HP), India, and the organization for providing part of the data set (diseased mushroom images) used in my experiments.

References

1. Sharma, S.R., Kumar, S., Sharma, V.P.: Diseases and Competitor Moulds of Mushrooms and their Management. National Research Centre for Mushroom (ICAR), Technical Bulletin (2007)
2. Sharma, V.P., Annepu, S.K., Gautam, Y., Singh, M., Kamal, S.: Status of mushroom production in India. Mushroom Res. 26(2), 111–120 (2017)
3. Fletcher, J.T., Gaze, R.H.: Mushroom Pest and Disease Handbook. Mansion Publishing, London, United Kingdom (2008)
4. Chowdhury, D.R., Ojha, S.: An empirical study on mushroom disease diagnosis: a data mining approach. Int. Res. J. Eng. Technol. 1(4), 529–534 (2017)
5. Platt, J.: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Microsoft Research Technical Report (1998)
6. Jensen, A.L., Boll, P.S., Thysen, I., Pathak, B.K.: A web based system for personalized decision support in crop management. Comput. Electron. Agric. 25, 271–293 (2000)


7. Goyal, P., Din, S., Kapoor, S.: A software for diagnosis and management of diseases and pests in white button mushrooms. Int. J. Adv. Res. Comput. Commun. Eng. 8(2), 3136–3139 (2013). ISSN 2319-5940
8. Munirah, M.Y., Rozlini, M., Siti Mariam, Y.: An expert system development: its application on diagnosing oyster mushroom diseases. In: 13th International Conference on Control, Automation and Systems in Kimdaejung Convention Center, Korea, pp. 20–23 (2013)
9. Munirah, Y., Rozlini, M., Mariam, Y.S.: Design and rules development of expert system for diagnosing oyster mushroom diseases. In: Proceedings of the Computer & Information Science (ICCIS), pp. 286–289 (2012)
10. Kim, K.J., Jung, S.-H., So, W.-H., Sim, C.-B.: A study on mushroom pest and diseases analysis system implementation based on convolution neural network for smart farm. Int. J. Control Autom. 11(10), 61–72 (2017)
11. Lim, S.C., Kim, S.-H., Kim, Y.-H., Kim, D.-Y.: Training network design based on convolution neural network for object classification in few class problem. Int. J. Korea Inst. Inf. Commun. Eng. 21(1), 144–150 (2017)
12. Zuva, T., Olugbara, O.O., Ojo, S.O., Ngwira, S.M.: Image segmentation, available techniques, developments and open issues. Can. J. Image Process. Comput. Vis. 2(3), 20–29 (2011)
13. Jothiaruna, N., Joseph Abraham Sundar, K., Karthikeyan, B.: A segmentation method for disease spot images incorporating chrominance in comprehensive color feature and region growing. Comput. Electron. Agric. 165, 104934 (2019). ISSN 0168-1699. https://doi.org/10.1016/j.compag.2019.104934
14. Deenan, S.P., Satheesh Kumar, J., Nagachandrabose, S.: Image segmentation algorithms for banana leaf disease diagnosis. J. Inst. Eng. (India): Ser. C 101 (2020). https://doi.org/10.1007/s40032-020-00592-5
15. Siddesha, S., Niranjan, S.K.: Detection of affected regions of disease Arecanut using K-means and Otsu method. Int. J. Sci. Technol. Res. 9, 2 (2020). ISSN 2277-8616
16. Tete, T.N., Kamlu, S.: Plant disease detection using different algorithms. In: Proceedings of the Second International Conference on Research in Intelligent and Computing in Engineering, Vol. 10, pp. 103–106 (2017). ISSN 2300-5963. https://doi.org/10.15439/2017R24ACSIS
17. Sreekanth, G.R., Thangaraj, P., Kirubakaran, S.: Fruit detection using improved K-means algorithm. J. Crit. Rev. 7(12), 5–6 (2020). ISSN 2394-5125
18. Shafi, A.S.M., Bayazid Rahman, Md., Motiur Rahman, Md.: Fruit disease recognition and automatic classification using MSVM with multiple features. Int. J. Comput. Appl. 181(10) (2018). ISSN 0975-8887
19. Pham, V.H., Lee, B.R.: An image segmentation approach for fruit defect detection using k-means clustering and graph-based algorithm. Vietnam J. Comput. Sci. 2, 25–33 (2015). https://doi.org/10.1007/s40595-014-0028-3
20. Bhargava, A., Bansal, A.: Fruits and vegetables quality evaluation using computer vision: a review. J. King Saud Univ. Comput. Inf. Sci. 33(3), 243–257 (2021). ISSN 1319-1578. https://doi.org/10.1016/j.jksuci.2018.06.002
21. Qiangqiang, Z., Zhicheng, W., Weidong, Z., Yufei, C.: Contour-based plant leaf image segmentation using visual saliency. In: Image and Graphics, ICIG 2015. Lecture Notes in Computer Science, vol. 9218, pp. 48–59. Springer, Cham (2015). Print ISBN 978-3-319-21962-2. https://doi.org/10.1007/978-3-319-21963-9_5
22. Bala Naga Bhushanamu, M., Purnachandra Rao, M., Samatha, K.: Plant curl disease detection and classification using active contour and Fourier descriptor. Eur. J. Mol. Clin. Med. 7(5), 1088–1105 (2020)
23. Eastwood, D., Green, J., Grogan, H., Burton, K.: Viral agents causing brown cap mushroom disease of Agaricus bisporus. Appl. Environ. Microbiol. J. 8(20), 7125–7134 (2015)
24. Elibuyuk, I.O., Bostan, H.: Detection of a virus disease on white button mushroom in Ankara, Turkey. Int. J. Agric. Biol. 12(4), 597–600 (2010). ISSN Print 1560-8530
25. https://www.spie.org/news/0414-detecting-regions-of-interest-in-images?SSO=1
26. Sun, S., Zhang, R.: Region of interest extraction of medical image based on improved region growing algorithm. Adv. Eng. Res. 125, 360–364 (2017)


27. Rakesh Kumar, Y., Chandrasekhar, V., Rao, A.K.: An automatic multi-threshold image processing technique mushroom disease segmentation. Int. J. Current Eng. Res. 7, 110–115 (2020)
28. Mushroom world. https://www.mushroom.world/
29. Alexander tsarev mushroom industry. https://en.agaricus.ru/cultivation/diseases/verticillium
30. Agriculture and horticulture development board. https://ahdb.org.uk/knowledge-library/
31. Fungal-diseases-in-mushrooms Mark den Ouden. https://www.mushroomoffice.com/bacterialblotch
32. The Tamil Nadu agricultural university. https://agritech.tnau.ac.in/farm_enterprises/farm%20enterprises_%20mushroom_disease.html

An Effective Automatic Facial Expression Recognition System Using Deep Neural Networks

G. S. Naveen Kumar, E. Venkateswara Reddy, G. Siva Naga Dhipti, and Baggam Swathi

G. S. Naveen Kumar (corresponding author) · E. Venkateswara Reddy · G. Siva Naga Dhipti · B. Swathi
Department of CSE, Malla Reddy University, Hyderabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7_60

1 Introduction

In human communication, 33% is conveyed orally and 67% through non-verbal components, according to various studies [1, 2]. Facial expressions are the most common and important mode of interpersonal communication. Mehrabian [3] classified the representation of emotions as 56% visual, 36% vocal, and the remaining 8% verbal. Variations in facial expression are the first and most important sign that transmits emotion during a conversation, which is why researchers have been attracted to this modality.

Emotion detection technologies fall into four main categories: text-based affect detection, posture-based affect detection, speech-based affect detection, and vision-based affect detection. Speech, gestures, text, facial expressions, blood volume pulse, and other features can be used in emotion recognition. The proposed work is centred on vision-based technology. Facial expressions may be positive or negative. The basic universal expressions proposed by Ekman et al. (1976) are “Happiness”, “Anger”, “Sadness”, “Surprise”, “Disgust”, and “Fear”; non-basic expressions include “Irritation”, “Despair”, “Boredom”, “Panic”, “Shame”, and “Excitement”.

Facial expression recognition (FER) attempts to recognize a facial expression automatically by analysing changes in facial features. Most facial recognition systems identify facial features by extracting landmarks from the subject’s facial image. The system outputs information about the recognized facial expression so that it can be used further to identify the person’s mood. The approaches used to recognize facial expressions fall into two major categories, namely image based and model based. The image-based approach extracts features from images without any prior knowledge about the object of interest.

Classification is the process of predicting the class of a data item from a given set of data. Learners used in classification are of two types: lazy and eager. Lazy learners simply store the training data and wait until testing data appears; they usually take less training time but more prediction time, e.g. the k-nearest neighbour classifier. Eager learners construct a model from the training data before testing data appears; they usually take a long training time but less prediction time. Many methods are available to verify the applicability of a classifier, the most common being holdout and cross-validation. In the holdout method, the given data set is divided in the ratio 80:20 (training:testing). The training samples are used to build a model, whereas the testing samples are used to test the predictive power of the model.

Interest in expression recognition is snowballing in several areas, such as online games, education (where facial expressions give teachers an important clue to the learning status of students), human–computer interfaces [4], animation [5], medicine [6, 7], security [8, 9], and diagnosis of ASD in children [10]. Facial expressions [4, 11, 12], language [8, 13], and electroencephalogram [14] are among the characteristics utilized for emotion recognition. This paper is primarily concerned with FER techniques comprising three major steps: pre-processing, feature extraction, and classification. Only image-based FER techniques are used in this paper. FER systems typically have to deal with illumination, pose, and skin tone variations. This paper also provides an important research idea for future FER research. Deep neural networks [15] provide the most effective feature extraction for adequate recognition of facial expressions [16, 17].
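The lazy-learner and holdout ideas from the introduction can be illustrated with a minimal sketch. It assumes a toy two-dimensional data set and a hand-written k-nearest neighbour classifier; the class name, the helper `holdout_split`, and the data are hypothetical and are not taken from the paper's experiments.

```python
import math
import random
from collections import Counter


def holdout_split(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and testing parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


class KNearestNeighbours:
    """Lazy learner: fit() only stores the data; the work happens in predict()."""

    def __init__(self, k=3):
        self.k = k
        self._train = []

    def fit(self, labelled_points):
        self._train = list(labelled_points)  # no model is built here
        return self

    def predict(self, point):
        # Distances to every stored sample are computed at prediction time.
        nearest = sorted(self._train,
                         key=lambda item: math.dist(item[0], point))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Hypothetical 2-D toy data: two well-separated clusters of 10 points each.
data = [((x * 0.1, x * 0.1), "neutral") for x in range(10)]
data += [((5 + x * 0.1, 5 + x * 0.1), "happy") for x in range(10)]

train, test = holdout_split(data)  # 80:20 split
model = KNearestNeighbours(k=3).fit(train)
correct = sum(model.predict(point) == label for point, label in test)
holdout_accuracy = correct / len(test)
print(f"train={len(train)} test={len(test)} accuracy={holdout_accuracy:.2f}")
```

Training is nearly free here because `fit` only copies the data, while every call to `predict` scans the whole training set, which is the trade-off the text attributes to lazy learners.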

2 Literature Survey

Mollahosseini et al. [18] proposed a deep CNN for FER evaluated across several available databases. In their model, data augmentation was applied to the images after extracting the facial landmarks. Convolution layers are applied locally to improve local performance and to reduce overfitting. Lopes et al. [19] discuss the impact of data pre-processing before training the model to obtain a better expression recognizer. Before the convolutional neural network is applied, the input data undergo augmentation, rotation correction, and down-sampling; the network ends with two fully connected layers with 256 and seven neurons. The authors show, on the CK+, JAFFE, and BU-3DFE databases, that combining these pre-processing steps is more effective. Reference [20] discusses the pre-processing techniques implemented by Mohammadpour et al., together with a CNN model with two convolutional layers followed by max pooling, whose output indicates the number of activated action units. Reference [21] discusses a convolutional neural network architecture in which two convolution layers are used successively at the beginning, followed by sparse batch normalization, to fit the model without overfitting.

Li et al. [22] address the facial occlusion problem with a modified CNN method, in which a CNN trained with an automatic CNN procedure is applied to the VGG-Net network. The real-world affective faces database, FED-RO, and Affect-Net are the databases used for training and testing the architecture. Yolcu et al. [23] proposed a method for detecting the crucial parts of the face. To detect the mouth, eyebrows, and eyes, the authors used three CNNs, each locating one part of the face. Images pass through cropping and detection stages before the CNNs are applied. The authors concluded that feeding the salient features obtained in this way into a subsequent CNN for emotion recognition gives better performance. Agrawal and Mittal [24] study the influence of convolutional neural network parameters on emotion recognition using the FER-2013 database. Their CNN contains two successive convolution layers followed by a max pooling layer and achieves an average accuracy of 65.77%. Jain et al. [25] proposed a deep convolutional neural network with two residual blocks, each with four layers. For a spatio-temporal model, Kim et al. [26] proposed a combination of convolutional neural networks and long short-term memory (LSTM). In addition, reference [27] describes a spatio-temporal convolutional architecture with nested LSTM for multilateral features that is built on three deep learning networks.

3 FER Database

Numerous facial expression recognition datasets are now available to researchers, varying in the number and dimensions of images, illumination, number of subjects, and expression poses. A summary of some FER databases is given in Table 1.

FER2013 (Kaggle): Reference [14] discusses the facial expression recognition dataset (FER-2013) presented at ICML 2013, with 35,887 images at a resolution of 48 × 48. Of these, 28,709 images form the training data and 3589 images the test data. The faces in the database were gathered automatically using the Google image search API. The faces are labelled with one of the six cardinal expressions or neutral. The FER dataset includes facial occlusion, partial faces, low-contrast images, and faces with eyeglasses. Sample images from the FER database for angry, disgust, and happy are shown in Figs. 1, 2, and 3, respectively.

Table 1 Summary of some FER databases

| Database | Description | Emotions |
|---|---|---|
| MultiPie [28] | More than 750,000 photos taken from fifteen distinct points of view under nineteen lighting setups | Angry, disgusted, neutral, joyful, squinty, shrieking, and surprised |
| MMI [11] | 2900 videos in which the neutral, onset, apex, and offset phases are indicated | Six basic emotions and one neutral emotion |
| GEMEP FERA [12] | 290 image sequences | Anger, fear, sadness, relief, and happiness |
| SFEW [13] | Seven hundred images with varying ages, occlusion, lighting, and head poses | Six basic emotions and one neutral emotion |
| CK+ [14] | 590 videos of posed and unposed expressions | Six basic emotions plus contempt and neutral |
| FER2013 [3] | 35,887 grayscale images gathered through Google image search | Six basic emotions and one neutral emotion |
| JAFFE [15] | 215 grayscale images posed by ten Japanese females | Six basic emotions and one neutral emotion |
| BU-3DFE [16] | 2500 3D facial images captured from two views (−45° and +45°) | Six basic emotions and one neutral emotion |
| CASME II [17] | Sequences of 247 micro-expressions | Disgust, surprise, repression, and other emotions |
| Oulu-CASIA [29] | 2881 videos captured in three different lighting conditions | Six basic emotions and one neutral emotion |
| Affect-Net [18] | Over 440,001 images gathered from the internet | Six basic emotions and one neutral emotion |
| RAFD-DB [19] | Thirty thousand real-world images | Six basic emotions and one neutral emotion |

Fig. 1 Sample images from the FER dataset for angry

Fig. 2 Sample images from the FER dataset for disgust

Fig. 3 Sample images from the FER dataset for happy

4 Deep Neural Networks for Face Expression Recognition System

Owing to the excellent capacity of deep learning for automatic recognition, researchers have recently turned their attention to it in preference to conventional facial recognition methods. In this regard, we propose a deep learning-based face expression recognizer that follows a fixed pipeline with three crucial steps: data pre-processing, feature extraction, and emotion classification.

Pre-processing involves steps such as cropping, scaling, normalization, and face alignment, and is applied before feature extraction to improve the performance of the recognizer. When the face image is cropped and scaled, the nose is taken as the centre point. Down-sampling reduces the image size while preserving the features of the original image. The image is smoothed with a Gaussian filter. Normalization is applied to the facial images to reduce illumination variations, which makes the input images clearer for feature extraction. Face localization uses the Viola–Jones algorithm, and face alignment is carried out with the scale-invariant feature transform (SIFT) flow algorithm. ROI segmentation makes facial expression recognizers more effective, since it correctly detects the facial organs necessary for expression recognition.

Feature extraction is performed on the pre-processed images with the help of a helper function. Once the features are extracted, we build a DNN model to classify the expressions. In the proposed model, we first train the model and then test it on a set of images to recognize the emotion, and thereby the sentiment or feeling, of a person.
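Three of the pre-processing steps named above (normalization, Gaussian smoothing, and down-sampling) can be sketched in plain Python. This is only an illustration under stated assumptions: the paper does not give its kernel size, normalization formula, or sampling factor, so the 3 × 3 binomial kernel, min-max normalization, factor of 2, and all function names below are hypothetical choices.

```python
def normalize(image):
    """Min-max normalize pixel intensities to the range [0, 1]."""
    flat = [pixel for row in image for pixel in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(pixel - lo) / (hi - lo) for pixel in row] for row in image]


# 3 x 3 binomial approximation of a Gaussian kernel; the weights sum to 1.
GAUSSIAN_3X3 = [[1 / 16, 2 / 16, 1 / 16],
                [2 / 16, 4 / 16, 2 / 16],
                [1 / 16, 2 / 16, 1 / 16]]


def gaussian_smooth(image):
    """Smooth with the 3 x 3 kernel, replicating border pixels."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), height - 1)  # clamp to the border
                    cc = min(max(c + dc, 0), width - 1)
                    acc += GAUSSIAN_3X3[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = acc
    return out


def downsample(image, factor=2):
    """Keep every `factor`-th pixel in both directions."""
    return [row[::factor] for row in image[::factor]]


# Hypothetical 4 x 4 grayscale patch with one bright pixel.
patch = [[0, 0, 0, 0],
         [0, 255, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]

smoothed = gaussian_smooth(normalize(patch))
small = downsample(smoothed)
print(smoothed[1][1], len(small), len(small[0]))
```

In practice these steps would normally be delegated to an image-processing library; the explicit loops are kept here only to make each operation visible.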

Fig. 4 Face expression recognition system

The proposed method, which uses a deep neural network model to improve emotion detection, is shown in Fig. 4. We use the Sequential model in Keras to create the emotion detection model, combining Dense, Dropout, Flatten, Conv2D, and MaxPooling2D layers to build a basic model that can be trained to classify various emotions. We apply the following classifiers for emotion classification:

(i) Logistic regression
(ii) Support vector machine
(iii) Random forest
(iv) Voting classifier.
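A Keras Sequential model built from the layer types named above might look like the following sketch. The input shape and number of classes follow the FER-2013 description (48 × 48 grayscale images, six expressions plus neutral); the number of layers, filter counts, dropout rate, optimizer, and loss are assumptions made for illustration, since the paper does not report them, and the function name `build_fer_cnn` is hypothetical.

```python
INPUT_SHAPE = (48, 48, 1)  # FER-2013 images are 48 x 48 grayscale
NUM_CLASSES = 7            # six basic expressions plus neutral


def build_fer_cnn():
    """Build a small Sequential CNN; all layer sizes here are illustrative."""
    # Imported lazily so the constants above can be used without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=INPUT_SHAPE),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),  # assumed rate, to limit overfitting
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    # Integer class labels are assumed, hence the sparse loss.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With TensorFlow installed, `build_fer_cnn().summary()` prints the layer structure, and the output of the layer before the final Dense layer could serve as the feature vector handed to the classical classifiers listed above.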

Support vector machines: SVMs are a maximal-margin hyperplane classification technique that relies on results from statistical learning theory to ensure strong generalization performance. SVMs are particularly well suited to a dynamic, interactive approach to expression recognition, since they show strong classification accuracy even with a limited amount of training data. We choose SVMs as a classifier because of the often-subtle distinctions between expressions such as “anger” and “disgust” in our displacement-based data, as well as the wide range of variation in a given expression when performed by different participants.

Combining the random forest classifier with features taken from the CNN model is an effective technique. Because the features of the raw data lose some information after passing through each layer of the convolutional neural network, a practical strategy is to attach the random forest to the final pooling layer. This is illustrated in Fig. 5. Reference [21] discusses composing multiple decision trees to form a random forest (RF) classifier. The final result is determined by voting over randomly selected decision trees. A probability-based selection method that adjusts the obtained decision trees to meet the requirements of both quality and diversity is proposed in this paper. The random forest classifier improves on the conventional decision tree-based classifier by overcoming some of its limitations. Most importantly, the random forest method addresses the problem of overfitting by maintaining good accuracy on both training and testing data, and it handles missing data better.

Fig. 5 Structure of the new model. The first part of the model acquires convolutional neural network (CNN) features, while the second part connects the CNN features to the improved random forest for facial expression classification
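The voting step that a random forest applies to its decision trees, and that a voting classifier applies to its member models, can be sketched as a simple majority vote. The example assumes hard voting and a first-seen tie-breaking rule; the paper does not state which voting scheme it uses, and the labels below are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often; ties go to the label that
    appears first in `predictions`."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:  # first label that reaches the top count
        if counts[label] == best:
            return label


# Hypothetical per-tree outputs for one face image.
tree_outputs = ["happy", "surprise", "happy", "neutral", "happy"]
forest_label = majority_vote(tree_outputs)

# The same rule combines heterogeneous classifiers (hard voting).
classifier_outputs = {"logistic_regression": "sad",
                      "svm": "angry",
                      "random_forest": "angry"}
ensemble_label = majority_vote(list(classifier_outputs.values()))
print(forest_label, ensemble_label)
```

Because each member has one equal vote, a strong member can be outvoted by two weaker ones, which is one way an ensemble can score below its best individual classifier.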

5 Experimental Results

The proposed method combines multiple learning models through a voting mechanism for facial expression recognition, so that the decisions of the individual models are fused. Experimental results are shown in Fig. 6.

Fig. 6 Experimental results in FER2013

Each classifier is trained on every region of interest five times, each time using twenty per cent of the data as the test set and the remaining eighty per cent as the training set. Consequently, every split serves as the test set once and as part of the training set the remaining four times. For the proposed model, the FER-2013 dataset is considered, which comprises 35,887 distinct images, of which 28,709 samples form the training set. The remaining data are divided into public and private test sets: the 3589 public test examples are used to select the optimal CNN model, whereas the private test examples are used to verify the accuracy rate. The decision tree models generated in our work are shown in Fig. 7. The performance of the classifiers used is compared in Table 2.

With the logistic regression model, we obtained a very low train accuracy of only 31% and a test accuracy of about 20% (0.23 in Table 2), which shows how difficult it is to train on such data. For the support vector machine, the test accuracy is also very low, at 25%, so logistic regression and SVM are at almost the same level of test accuracy. The random forest reaches an accuracy of 47% on the test dataset, the best result among the individual classifiers. Next, we combined all the models using the voting classifier, which obtains a test accuracy of 40%, less than that of the random forest. So, in the proposed work, we found the random forest classifier to be the most effective for facial expression recognition.
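The five-way splitting scheme described above, in which every split is the test set once and part of the training set four times, corresponds to five-fold cross-validation. A minimal index-based sketch follows; it assumes contiguous, unshuffled folds and a hypothetical ten-sample data set, neither of which is specified in the paper.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, k=5))  # hypothetical 10-sample data set
for number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {number}: train={train_idx} test={test_idx}")
```

Averaging a classifier's score over the five folds gives a less split-dependent estimate than a single 80:20 holdout.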

Fig. 7 Generated decision tree models

Table 2 Classifier performance

| Classifier | Accuracy (train) | Accuracy (test) | F1-score (train) | F1-score (test) |
|---|---|---|---|---|
| Logistic regressor | 0.31 | 0.23 | 0.23 | 0.18 |
| Support vector machine | 0.42 | 0.25 | 0.32 | 0.17 |
| Random forest | 0.99 | 0.47 | 0.99 | 0.46 |
| Voting classifier | 0.84 | 0.40 | 0.87 | 0.70 |
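The two metrics reported in Table 2 can be computed from true and predicted labels as sketched below. The paper does not say how the F1-score is averaged over the emotion classes, so macro averaging is an assumption here, and the labels and function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced example: the classifier always answers "happy".
y_true = ["happy", "happy", "happy", "sad"]
y_pred = ["happy", "happy", "happy", "happy"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

In this toy case the macro F1-score falls well below the accuracy because the minority class is never predicted, which is why both metrics are worth reporting on an imbalanced dataset.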


6 Conclusion

In this paper, we present a deep neural network model for facial emotion recognition and test it on public datasets to assess its performance. The work summarizes the performance evaluations of the various classifiers. There is considerable scope for improvement through hyperparameter tuning. Deep learning for facial emotion recognition could be an effective starting point for many expression-based applications, such as online games, customer feedback, monitoring the learning status of students in online classes, and many more.

References

1. Mehrabian, A.: Communication without words. Psychol. Today 2, 53–56 (1968)
2. Kaulard, K., Cunningham, D.W., Bülthoff, H.H., Wallraven, C.: The MPI facial expression database—a validated database of emotional and conversational facial expressions. PLoS ONE 7, e32321 (2012)
3. Marechal, C., et al.: Survey on AI-based multimodal methods for emotion detection. In: Kołodziej, J., González-Vélez, H. (eds.) High-Performance Modelling and Simulation for Big Data Applications: Selected Results of the COST Action IC1406 cHiPSet, pp. 307–324. Springer International Publishing, Cham (2019)
4. Roddy, C., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.G.: Emotion recognition in human computer interaction. IEEE Signal Process. Mag. 18, 32–80 (2001)
5. Kumar, G.N., Reddy, V.S.K.: Key frame extraction using rough set theory for video retrieval. In: Soft Computing and Signal Processing, pp. 751–757. Springer, Singapore (2019)
6. Jane, E., Jackson, H.J., Pattison, P.E.: Emotion recognition via facial expression and affective prosody in schizophrenia: a methodological review. Clin. Psychol. Rev. 22, 789–832 (2002)
7. Naveen Kumar, G.S., Reddy, V.S.K.: Detection of shot boundaries and extraction of key frames for video retrieval. Int. J. Knowl. Based Intell. Eng. Syst. 24(1), 11–17 (2020)
8. Chloé, C., Vasilescu, I., Devillers, L., Richard, G., Ehrette, T.: Fear-type emotion recognition for future audio-based surveillance systems. Speech Commun. 50, 487–503 (2008)
9. Saste, T.S., Jagdale, S.M.: Emotion recognition from speech using MFCC and DWT for security system. In: Proceedings of the IEEE 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA), pp. 701–704. Coimbatore, India (2017)
10. Marco, L., Carcagnì, P., Distante, C., Spagnolo, P., Mazzeo, P.L., Rosato, A.C., Petrocchi, S.: Computational assessment of facial expression production in ASD children. Sensors 18, 3993 (2018)
11. Ali, M., Chan, D., Mahoor, M.H.: Going deeper in facial expression recognition using deep neural networks. In: Proceedings of the IEEE 2016 IEEE Winter Conference on Applications of Computer Vision (WACV). Lake Placid, NY, USA (2016)
12. Liu, P., Han, S., Meng, Z., Tong, Y.: Facial expression recognition via a boosted deep belief network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1805–1812. Columbus, OH, USA (2014)
13. Kun, H., Yu, D., Tashev, I.: Speech emotion recognition using deep neural network and extreme learning machine. In: Proceedings of the Fifteenth Annual Conference of the International Speech Communication Association. Singapore (2014)
14. Wu, C.-H., Chuang, Z.-J., Lin, Y.-C.: Emotion recognition from text using semantic labels and separable mixture models. ACM Trans. Asian Lang. Inf. Process. TALIP 5, 165–183 (2006)


15. LeCun, Y.: Generalization and network design strategies. Connect. Perspect. 119, 143–155 (1989)
16. Pooya, K., Paine, T., Huang, T.: Do deep neural networks learn facial action units when doing expression recognition. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. Santiago, Chile (2015)
17. Panagiotis, T., Trigeorgis, G., Nicolaou, M.A., Schuller, B.W., Zafeiriou, S.: End-to-end multimodal emotion recognition using deep neural networks. IEEE J. Sel. Top. Signal Process. 11, 1301–1309 (2017)
18. Mollahosseini, A., Chan, D., Mahoor, M.H.: Going deeper in facial expression recognition using deep neural networks. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–10 (2016). https://doi.org/10.1109/WACV.2016.7477450
19. Lopes, A.T., de Aguiar, E., De Souza, A.F., Oliveira-Santos, T.: Facial expression recognition with convolutional neural networks: coping with few data and the training sample order. Pattern Recognit. 61, 610–628 (2017). https://doi.org/10.1016/j.patcog.2016.07.026
20. Mohammadpour, M., Khaliliardali, H., Hashemi, S.M.R., AlyanNezhadi, M.M.: Facial emotion recognition using deep convolutional networks. In: 2017 IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI), pp. 0017–0021 (2017). https://doi.org/10.1109/KBEI.2017.8324974
21. Cai, J., Chang, O., Tang, X., Xue, C., Wei, C.: Facial expression recognition method based on sparse batch normalization CNN. In: 2018 37th Chinese Control Conference (CCC), pp. 9608–9613 (2018). https://doi.org/10.23919/ChiCC.2018.8483567
22. Naveen Kumar, G.S., Reddy, V.S.K.: Video shot boundary detection and key frame extraction for video retrieval. In: Proceedings of the Second International Conference on Computational Intelligence and Informatics, pp. 557–567. Springer, Singapore (2018)
23. Yolcu, G., et al.: Facial expression recognition for monitoring neurological disorders based on convolutional neural network. Multimed. Tools Appl. 78(22), 31581–31603 (2019). https://doi.org/10.1007/s11042-019-07959-6
24. Agrawal, A., Mittal, N.: Using CNN for facial expression recognition: a study of the effects of kernel size and number of filters on accuracy. Vis. Comput. (2019). https://doi.org/10.1007/s00371-019-01630-9
25. Jain, D.K., Shamsolmoali, P., Sehdev, P.: Extended deep neural network for facial emotion recognition. Pattern Recognit. Lett. 120, 69–74 (2019). https://doi.org/10.1016/j.patrec.2019.01.008
26. Kim, D.H., Baddar, W.J., Jang, J., Ro, Y.M.: Multi-objective based spatio-temporal feature representation learning robust to expression intensity variations for facial expression recognition. IEEE Trans. Affect. Comput. 10(2), 223–236 (2019). https://doi.org/10.1109/TAFFC.2017.2695999
27. Naveen Kumar, G.S., Reddy, V.S.K.: High-performance video retrieval based on spatiotemporal features. In: Microelectronics, Electromagnetics and Telecommunications, pp. 433–441. Springer, Singapore (2018)
28. Meng, Q., Hu, X., Kang, J., Wu, Y.: On the effectiveness of facial expression recognition for evaluation of urban sound perception. Sci. Total Environ. 710, 135484 (2020)
29. Courville, P.L.C., Goodfellow, A., Mirza, I.J.M., Bengio, Y.: FER-2013 Face Database. Université de Montréal, Montréal, QC, Canada (2013)

Author Index

A Abhisek Sethy, 231 Adithya Babu, 251 Adithya, C. R., 401 Aditi Dandekar, 409 Aishwarya Kashyap, 359 Aishwarya, S. R., 55 Ajit Kumar Rout, 231 Akhil Nair, 535 Albin Davis, 191 Ameer Ali, 191 Aneesh Chowdary, Y., 597 Anikhet Mulky, 535 Anil Kumar Mishra, 359 Anjali, P. C., 379 Anupam Ghosh, 317 Arghya Pathak, 369 Aruna, S. K., 439 Ashik, V., 191 Ashish Patel, 9 Aslamiya, M., 557 Asma Begum, F., 503 Athira Krishnan, 207 Athul Mathew Konoor, 171 Avinash Golande, 83 Ayush Singh, 107

B Baggam Swathi, 665 Bahalul Haque, A. K. M., 557 Balamanikandan, J., 343 Balarengadurai, C., 401 Bhandari, B. N., 493 Bhaskar, T., 221

C Chaitanya, K., 641 Chandra Gunda, 231 Chandrasekhar, V., 653 Chang, Maiga, 343 Chinthala Lavanya, 31 Ch. Swathi, 331

D Deena Babu Mandru, 461 Divya Vetriveeran, 439 Diwakar Agarwal, 621

G Gagandeep Kaur, 297 Gaurang Raval, 139 Gaurav Pandey, 535 Gaurav Srivastava, 107 Gayathri, V., 55 Gebrekiros Gebreyesus Gebremariam, 587 Geetha Garbhapu, 231

H Harini, N., 307 Hrishikesh Mondal, 369 Husna Tabassum, 287

I Indu, S., 587

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. S. Reddy et al. (eds.), Soft Computing and Signal Processing, Smart Innovation, Systems and Technologies 313, https://doi.org/10.1007/978-981-19-8669-7


J Janani, R., 55 Jayashree Karmakar, 369 Jayashree, J., 31, 275 Jay Parmar, 1 Jenefa, J., 577 Jyostna, K., 493 Jyotsna Malla, 31, 275

K Kannan Pooja, 55 Karthika, R., 597 Karunamoorthi, R., 119 Kiran Babu Sangeetha, 525 Kumar, P. N., 207

L Lakshmi Priya, H., 191 Lathika, D., 159

M Manas Ranjan Panda, 359 Manisha Duevedi, 69 Manjula, B., 243 Mary Anita, E. A., 577 Mathi Senthilkumar, 55 Meruva Sandhya Vani, 461 Mithilesh Kumar Pandey, 21 Mohammad Wazid, 181 Mohammad, Mohammad, 41, 421 Mohammed, Aleem, 41, 421 Mohit Soni, 139 Mokila Anusha, 265 Mrinal Kanti Mandal, 369 Muhamed Ilyas, P., 557 Munindra Kumar Singh, 21 Muthuselvan, S., 191

N Naga Sujini Ganne, 483 Nageswara Rao, P. A., 547 Narra Dhanalakshmi, 513 Natesh, M., 401 Naveen Kumar, G. S., 665 Neelofer Afzal, 635 Niranjan, L., 287 Nitesh Pradhan, 107 Nivetha Vinayakamoorthi, 475

P Padmavathi, N., 641 Padmavathi, S., 171 Panda, J., 587 Paramesha, K., 401 Pashamoni Lavanya, 265 Payal Nagaonkar, 535 Pokkuluri Kiran Sree, 119 Pooja Shah, 1, 139 Pradeep Gowtham, M., 307 Prasanth, Y., 641 Prashanth Umapathy, 9 Pushpa, T., 287

R Rahul Kumar, 317 Rahul Rangnani, 9 Raja Kumar Sahu, 317 Raja, K., 9 Rajaprakash, S., 191 Rajasekhar Nagulapalli, 567 Rajesh Katragadda, 547 Rajkumar Patra, 317 Rajupudi Durga Devi, 461 Rakesh Kumar, Y., 653 Rakoth Kandan Sambandam, 439 Ramakrishna, H., 401 Ramesh Cheripelli, 331 Ravi, G., 449 Ravi Kiran Varma, P., 127 Ravi Kumar, 317 Ravi Kumar Lagisetty, 379 Reddy, V. S. K., 525 Revathy, G., 119 Rishika Shinde, 83

S Sasikala Devi, S., 119 Sabarish, B. A., 95, 159 Sabarish, K., 641 Sai Kiran Routhu, 231 Sai Teja, V., 597 Sajini, S., 577 Sakshi Bidwai, 83 Sakshi Sanghavi, 1 Sakshi Sukale, 83 Saleena, T. S., 557 Samarjeet Borah, 359 Saminadan, V., 607 Sankara Narayanan, G., 95 Saranya, M. N., 567 Sathiya, R. R., 251 Saurabh Pal, 21 Senthil Kumar Thangavel, 343, 379 Senthil Vadivu, S., 119 Shaheen Layaq, 243 Shailendra Bisariya, 635 Sharada Valiveti, 139 Shiny Duela, J., 9 Shruti Warang, 83 Siddhant Thapliyal, 181 Singh, D. P., 181 Sivadi Balakrishna, 483 Siva Naga Dhipti, G., 665 Sneha Latha, S., 159 Sonal Chawla, 297 Soumen Kanrar, 149 Sowjanya Lakshmi, B. S. S., 127 Sowjanya Ramisetty, 265 Srehari, T., 159 Sriadibhatla Sridevi, 567 Srilakshmi Aouthu, 513 Srinu Kallepalli, 231 Subhashish Pal, 369 Sudha Vaiyamalai, 475 Suneetha, J., 287 Sunil Kumar Muttoo, 69 Sushila Madan, 69 Swati Rane, 535

T Tammali Sushanth Babu, 265 Teja Dhondi, 449 Thaiyalnayaki, S., 439 Tiwari, B. B., 21 Tribhuvan, R. R., 221

U Umamaheswari, K., 641

V Vaibhav E. Narawade, 409 Vamsi Krishna, V., 597 Vazralu, M., 287, 449 Venkaiah Naidu, N., 597 Venkateswara Reddy, E., 665 Vidya Sagar Potharaju, 607 Vijay Prakash Singh, 503 Vijayashree, J., 31, 275 Vivek Prasad, 1

Y Yaswanthram, P., 159