Proceedings of International Conference on Computational Intelligence: ICCI 2021 (Algorithms for Intelligent Systems) 9811921253, 9789811921254

The book presents high quality research papers presented at International Conference on Computational Intelligence (ICCI


English Pages 490 [475] Year 2022

Table of contents:
Preface
Contents
About the Editors
1 Visually Guided UGV for Autonomous Mobile Manipulation in Dynamic and Unstructured GPS-Denied Environments
1 Introduction
2 Related Work
3 System Description
4 Multi-task Unified Visual Perception Block
5 The Complete Workflow
6 Experiments
7 Conclusion
References
2 Neural Network-Based Motion Control Algorithm for Perching Nano-Quadrotor on Outdoor Vertical Surface
1 Introduction
2 Quadrotor Dynamic Modelling and Problem Formulation
2.1 Problem Statement
3 Design of ChNN-Based Control Algorithm
4 Experimental Results
5 Conclusion
References
3 An Intelligent Game Theory Approach for Collision Avoidance of Multi-UAVs
1 Introduction
1.1 Motivation
1.2 Contribution
2 Problem Formulation
3 Quadrotor Model
3.1 Quadrotor Dynamics
4 Game Theory-Based Nash Equilibrium
4.1 Nash Equilibrium Points for x Position
5 Stability Analysis
6 Simulation Results
6.1 Case 1
6.2 Case 2
7 Conclusion and Future Scope
References
4 Dynamics and Control of Quadrupedal Robot
1 Introduction
2 Robot Design and Kinematics
2.1 Body Design of the Robot
2.2 Leg Design of the Robot
2.3 Stability Analysis of the Robot
2.4 Weight Ratio on the Legs of the Robot
3 Electronics System Working
4 Control System of the Robot
4.1 Gait Strategy Used for Motion
4.2 Motion Planning
4.3 PID Algorithm for the Robot
4.4 Obstacle Detection and Overcoming Algorithm
5 Comparison
6 Result
References
5 Disturbance Observer-Based Sliding Mode Controller with Mismatched Disturbance for Trajectory Tracking of a Quadrotor
1 Introduction
2 Quadrotor Model and Problem Formulation
3 Proposed Disturbance Observer-Based Control Design
4 Stability Analysis
5 Results
6 Conclusion
References
6 Multi-robot Formation Control Using Integral Third-Order Super-Twisting Controller in Cyber-Physical Framework
1 Introduction
2 Problem Formulation and Preliminaries
2.1 Physical Model
2.2 Cyber Modeling
2.3 Problem Formulation
3 Path Planning and Controller Design
3.1 Path Planning Control
3.2 Sliding Manifold Design
3.3 Control Law Design
3.4 Stability Analysis
4 Time-Varying Formation Control Under CPS Framework
5 Simulation Results
6 Conclusions
References
7 RFE and Mutual-INFO-Based Hybrid Method Using Deep Neural Network for Gene Selection and Cancer Classification
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 RNGS Phase
3.2 CC Phase
3.3 Experimental Setup
3.4 About Dataset
4 Results
5 Conclusions and Future Work
References
8 Biased Online Media Analysis Using Machine Learning
1 Introduction
2 Related Works
2.1 Ideological Bias Detection
2.2 Language Bias Detection
2.3 Political Text Bias Detection
2.4 News Articles Bias Detection
3 Proposed Methods and Discipline
3.1 Data Collection
3.2 Data Labelling
3.3 Data Pre-processing
3.4 Model Selection
4 Experimental Setup
4.1 Data Representation Vector
4.2 Model Setup
4.3 Result and Analysis
5 Conclusion
References
9 Coverless Information Hiding: A Review
1 Introduction
2 Background Information and Similar Work
2.1 Traditional Image Steganography
2.2 Steganalysis
2.3 Coverless Information Hiding
2.4 Progress in the Past Five years
3 Fundamental Framework
3.1 Coverless Information Hiding Based on Robust Image Hashing [12]
3.2 A Novel Coverless Information Hiding Method Based on the Most Significant Bit of the Cover Image [18]
3.3 Coverless Image Steganography Based on Image Segmentation [1]
3.4 Coverless Information Hiding Based on the Generation of Anime Characters [19]
3.5 Coverless Image Steganography Based on Jigsaw Puzzle Image Generation [8]
3.6 Coverless Real-Time Image Information Hiding Based on Image Block Matching and Dense Convolutional Network [16]
3.7 Coverless Image Steganography Based on Discrete Cosine Transform and Latent Dirichlet Allocation (LDA) Topic Classification [11]
3.8 Coverless Image Information Hiding Based on Generative Model [20]
3.9 Coverless Information Hiding Based on DCGAN [10]
4 Performance Evaluation
5 Current Key Challenges
6 Conclusion and Future Work
References
10 A Review on Transliterated Text Retrieval for Indian Languages
1 Introduction
2 Transliteration Concepts
3 Classification of Transliteration
4 Transliteration for the Indian Languages
4.1 Evaluation Metrics
4.2 Generation-Based Transliteration for the Indian Languages
4.2.1 Basic Rule-Based System
4.2.2 Basic Rule-Based System with Enlarged Alphabet
4.2.3 Character Sequence Modeling (CSM)
4.2.4 CRT-Based Transliteration
4.2.5 SMT-Based Transliteration
4.2.6 Serial Compositional Transliteration Systems
4.2.7 Parallel Compositional Transliteration Systems
4.3 Mining-Based Transliteration for the Indian Languages
4.3.1 Mining Comparable Data (Mint)
5 Result and Discussion
6 Conclusion
References
11 Learning-Based Smart Parking System
1 Introduction
2 Analysis and Design
3 Proposed Work and Implementation
3.1 Parking Slot Detection
3.2 Automatic Parking
4 Results and Discussion
5 Conclusion
References
12 Automated Identification of Tachyarrhythmia from Different Datasets of Heart Rate Variability Using a Hybrid Deep Learning Model
1 Introduction
2 Material and Methods
2.1 ECG Data Records
2.2 Convolution Neural Network
2.3 Long Short-Term Memory
2.4 Hybrid Deep Model Architecture
3 Result
4 Discussion
5 Conclusion
References
13 Automatic Pathological Myopia Detection Using Ensemble Model
1 Introduction
2 Literature Survey
3 Materials and Methods
3.1 Methodology
3.2 Datasets
3.3 Preprocessing
3.4 Proposed Design
4 Experiments and Results
4.1 First Model
4.2 Second Model
4.3 Third Model
4.4 Ensembled Model
5 Comparative Study
6 Conclusion and Future Scope
References
14 Revolutionary Solutions for Comprehensive Assessment of COVID-19 Pandemic
1 Introduction
2 Harsh Effects of SARS-CoV 2 on Respiratory System
3 Sophisticated Methods of Implementation for COVID-19 Treatment
3.1 Nucleic Acid Amplification Tests (NAAT)
3.2 Use of Hydroxychloroquine Drugs (HCQ)
3.3 Clinical Trials of Convalescent Plasma Therapy
3.4 Reverse Transcriptase-Polymerase Chain Reaction (RT-PCR) Tests
3.5 Use of Plaquenil Drugs
3.6 Solidarity Trial of Antiviral Drug 'Remdesivir'
3.7 Thapsigargin (TG): A Novel Antiviral Drug
4 Proposed SWIFT Technique for Procuring Lungs Imaging of ARDS Patients and Further Detection of COVID-19
5 Conclusion and Future Extensions
References
15 Application of Complex Network Principles to Identify Key Stations in Indian Railway Network
1 Introduction
2 Paper Summary
3 Graph Construction
3.1 Simple Graph Model
3.2 Hypergraph Model
4 Simple Graph Analysis
4.1 Global Network Information
4.2 Visualization
4.3 Important Stations
5 Hypergraph Analysis
6 Conclusion
7 Future Work
8 Appendix: Metrics
References
16 Text Classification Using Deep Learning: A Survey
1 Introduction
2 Review Methodology
3 Review of the Previous Methods
4 Data set Related to Text Classification
5 Metrics for Text Classification
5.1 Accuracy
5.2 Precision
5.3 Recall
5.4 Specificity
5.5 F-Measure
5.6 Micro-averaging and Macro-averaging
5.7 Matthews Correlation Coefficient (MCC)
5.8 ROC
6 Conclusion and Future Scope
References
17 Significance of Artificial Intelligence in COVID-19 Detection and Control
1 Introduction
2 Procedure Implicated in AI-enabled Technique for COVID-19
3 Methods
3.1 Criteria of Acceptance
3.2 Data Sources and Search Policy
3.3 Method of Study Selection
3.4 Analyses of Data (Qualitative and Quantitative)
3.5 Descriptive Analysis of Search Result
4 Discussion
4.1 Model 1: Computational Epidemiology (CE)
4.2 Model 2: EDD Model
4.3 Model 3: DP Models
5 Conclusion
References
18 Anomalous Human Activity Detection Using Stick Figure and Deep Learning Model
1 Introduction
2 Literature Survey
2.1 Pose Model Selection
3 Implementation
3.1 Data Collection
3.2 Model Training
3.3 Prediction
3.4 Data Visualization and Preprocessing
3.5 Results
3.6 Output
4 Conclusion
References
19 Noise Removal in ECG Signals Utilizing Fully Convolutional DAE
1 Introduction
2 Literature Survey
3 Proposed Methodology
4 Experimental Results
5 Conclusions
References
20 Performance Investigation of RoF Link in 16 Channel WDM System Using DPSK Modulation Technique
1 Introduction
2 Simulation Model
3 Simulation Results and Discussion
4 Conclusion
References
21 A Comprehensive Study on Automatic Emotion Detection System Using EEG Signals and Deep Learning Algorithms
1 Introduction
2 Literature Review of Various Emotion Detection Techniques
2.1 Deep Learning-Based Emotion Detection Systems
2.2 Machine Learning-Based Emotion Detection Systems
3 Conclusion
References
22 Sensor-Based Secure Framework for IoT-Based Smart Homes
1 Introduction
2 Literature Survey
3 Proposed System
3.1 Motion Sensor with Mailing System
3.2 RFID Security System
4 Conclusion
References
23 Manipulating Muscle Activity Data from Electromyography for Various Applications Using Artificial Intelligence
1 Introduction
2 Collecting Data from Non-invasive Electrodes
2.1 sEMG Signal Analysis
3 Applications
4 Conclusion
References
24 A Framework for Automatic Wireless Alert System for Explosive Detection
1 Introduction
2 Methodology
2.1 Geo-Fence Creation
2.2 Authentication
2.3 Alarm Generator
3 Testing and Results
4 Conclusion
References
25 IoT and Deep Learning-Based Weather Monitoring and Disaster Warning System
1 Introduction and Literature Review
1.1 Technology and Software Used
1.2 Literature Review
2 Methodology and Proposed System
2.1 Proposed System
3 Components Required
3.1 Hardware
3.2 Software
4 Implementation
4.1 Setting Up Netcat Server in Raspberry Pi
4.2 Developing the Models
4.3 Deep Learning Models Used
5 Result and Conclusion
6 Limitations
7 Scope for Future Work
References
26 Sketch-Based Face Recognition
1 Introduction
2 Methodology
2.1 CNN for Image/Sketch Detection
2.2 Image to Sketch Conversion
2.3 Mask RCNN
3 Results and Discussion
3.1 Dataset
3.2 Mask RCNN for Sketch-Based Face Recognition
3.3 Performance in Terms of Loss
3.4 Recognition Rate
4 Conclusion
References
27 Aerial Object Detection Using Different Models of YOLO Architecture: A Comparative Study
1 Introduction
2 Literature Survey
3 Materials and Methods
3.1 Data set
3.2 Data Pre-processing and Splitting
3.3 Models
4 Experiments and Results
4.1 Hardware and Software Setup
4.2 Training and Testing Data
4.3 Evaluation Criteria
4.4 Training Single Convolution Mode
4.5 Results and Discussion
4.6 Comparative Study
5 Conclusion and Future Scope
References
28 Video Anomaly Classification Using DenseNet Feature Extractor
1 Introduction
2 Literature Survey
3 Materials and Methods
3.1 Dataset
3.2 Image Processing
3.3 Proposed Model
4 Experiments and Results
4.1 Hardware Set-up
4.2 Performance Metric
4.3 Calculated Results
4.4 Comparative Study
5 Conclusion and Future Scope
References
29 Empirical Analysis of Novel Differential Evolution for Molecular Potential Energy Problem
1 Introduction
2 Problem Statement
2.1 Basic Differential Evolution Algorithm
2.2 Molecular Potential Energy Problem (MPEP)
3 Novel Differential Evolution (NDE)
3.1 NDE Algorithm
4 Result Analysis and Discussion
4.1 Standard Benchmark Function
4.2 Parametric and Experimental Settings
4.3 Analysis over Standard Benchmark Functions
4.4 Analysis over Molecular Potential Energy Problem
5 Conclusion
References
30 Sign Language versus Spoken English Language—A Study with Supervised Learning System Using RNN for Sign Language Interpretation
1 Introduction
2 Current Scenario with Major Challenges
3 Prominent Variance Between English and Sign Language
4 Sign Language Interpretation System
4.1 Steps Involved in Model of Interpretation of Sign Language
4.2 Subjective Analysis of a Given Solution
5 System Implementation
6 Results
7 Conclusion
References
31 Hidden Markov Modelling for Biological Sequence
1 Introduction
2 Review on Related Research
3 Description of the Data
4 Methodology
4.1 Markov Chain
4.2 Embedded Markov Chain
4.3 Hidden Markov Model
4.4 Artificial Neural Network
5 Results and Discussions
6 Conclusion
References
32 Proposed Crowd Counting System and Social Distance Analyzer for Pandemic Situation
1 Introduction
2 Background and Related Work
3 Proposed System and Methodology
3.1 Approach
4 System Implementation
4.1 Library
4.2 Input Collection
4.3 Video Capturing
4.4 Human Detection
4.5 Counting ID
5 Result and Discussion
6 Conclusion
7 Future Work
References
33 A Novel Ensemble Model to Summarize Kannada Texts
1 Introduction
2 Literature Survey
3 Methodology
3.1 Data Acquisition
3.2 Data Cleaning
3.3 Stemming
3.4 Stop Words
3.5 TF–IDF Computation
3.6 TF–GSS Computation
3.7 Positional Ranking of Sentences
3.8 Proposed Ensemble Hybrid Model and Efficiency Computation
3.9 Model Flexibility and Performance Considerations
4 Results and Discussions
5 Conclusion
References
34 Parallel Computation of Probabilistic Rough Set Approximations
1 Introduction
2 Preliminaries
2.1 Rough Set Theory
2.2 The Basic Probabilistic Rough Set Model
2.3 MapReduce Programming Model
3 Proposed Algorithm
4 Experimental Results
4.1 Experimental Environment
4.2 Performance Analysis
4.3 Scalability of PACPRSA
5 Conclusion
References
35 Simplified TOPSIS for MLN-MODM Problems
1 Background of Problem
2 Problem Introduction
3 Simplified TOPSIS for ML-MODM Problems
4 Simplified TOPSIS Algorithm for ML-MODM Problems
5 Illustrative Numerical Example
6 Concluding Remarks
References
36 A Comprehensive Review Analysis on PSO and GA Techniques for Mathematical Programming Problems
1 Introduction
1.1 Related Studies and Contributed Works
1.2 Target Research Groups
2 Background—PSO and GA Methodologies
2.1 Comparison of PSO and GA
2.2 Hybrid of PSO and GA
3 Review of Literature
4 Comprehensive Analysis on Review
5 Research Benefits and Prospective Research Directions
5.1 Research Benefits
5.2 Prospective Research Directions
6 Concluding Remarks
References
Author Index


Editors: Ritu Tiwari, Mario F. Pavone, Ranjith Ravindranathan Nair


Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar

Ritu Tiwari Mario F. Pavone Ranjith Ravindranathan Nair Editors

Proceedings of International Conference on Computational Intelligence ICCI 2021

Algorithms for Intelligent Systems

Series Editors
Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India
Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK

This book series publishes research on the analysis and development of algorithms for intelligent systems and their applications to various real-world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence, and other algorithms for intelligent systems. The series includes recent advancements, modifications, and applications of artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy systems, autonomous and multi-agent systems, machine learning, and other areas related to intelligent systems. The material will benefit graduate and post-graduate students as well as researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to researchers from other fields who are unfamiliar with the power of intelligent systems, e.g. researchers in bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians, and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks, and selected proceedings. Indexed by zbMATH. All books published in the series are submitted for consideration in Web of Science.


Editors
Ritu Tiwari, Indian Institute of Information Technology, Pune, India
Mario F. Pavone, University of Catania, Catania, Italy
Ranjith Ravindranathan Nair, Indian Institute of Information Technology, Pune, India

ISSN 2524-7565
ISSN 2524-7573 (electronic)
Algorithms for Intelligent Systems
ISBN 978-981-19-2125-4
ISBN 978-981-19-2126-1 (eBook)
https://doi.org/10.1007/978-981-19-2126-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

These proceedings contain the papers presented at the 2nd International Conference on Computational Intelligence (ICCI 2021), organized by the Indian Institute of Information Technology, Pune, India, and technically supported by the Soft Computing Research Society, on 27–28 December 2021. ICCI 2021 invited ideas, developments, applications, experiences, and evaluations in the broad area of computational intelligence from academicians, research scholars, and scientists, and served as a platform for researchers to exchange research evidence, personal scientific views, and innovative ideas on issues related to computational intelligence. The topics covered include artificial intelligence, neural networks, deep learning techniques, fuzzy theory and systems, rough sets, self-organizing maps, machine learning, chaotic systems, multi-agent systems, computational optimization, ensemble classifiers, reinforcement learning, decision trees, support vector machines, hybrid learning, statistical learning, metaheuristic algorithms (evolutionary and swarm-based algorithms such as genetic algorithms, genetic programming, differential evolution, particle swarm optimization, the firefly algorithm, and memetic algorithms), machine vision, Internet of Things, robotics and control, vehicular systems, medical imaging, digital solutions to combat pandemics such as COVID-19, image processing, image segmentation, data clustering, sentiment analysis, big data, blockchain technology, computer networks, signal processing, supply chain management, web and text mining, distributed systems, bioinformatics, embedded systems, expert systems, forecasting, pattern recognition, planning and scheduling, system modelling, time series analysis, human–computer interaction, web mining, natural language processing, multimedia systems, and quantum computing.
The conference received an excellent response from the scientific and research community, with a large number of papers submitted by authors from different countries across diverse application fields of computational intelligence. To maintain the highest technical quality, a rigorous peer-review process was followed, resulting in an acceptance rate of around 30%. The accepted papers were grouped into four technical tracks: robotics and control, machine learning, signal/image processing and IoT, and modelling and simulation.


ICCI 2021 is a flagship event of the Soft Computing Research Society, India. The two-day conference began with an inaugural function. Mr. Vikas Chandra Rastogi, IAS, Principal Secretary, Department of Higher and Technical Education, Government of Maharashtra, was the Chief Guest of the inaugural session, and Prof. Rajendra Sahu, Director, ABV-IIITM Gwalior, was the Guest of Honour. Other eminent dignitaries present at the inaugural ceremony included Dr. Anupam Shukla, Director, IIIT Pune, and Honorary Chair of ICCI 2021; Prof. J. C. Bansal, General Secretary, SCRS; Prof. S. N. Sapali, Registrar, IIIT Pune; Prof. Ritu Tiwari, General Chair, ICCI 2021; Dr. Mario F. Pavone, General Chair, ICCI 2021; and Dr. Ranjith Ravindranathan Nair, General Chair, ICCI 2021. The conference featured keynote addresses from eminent speakers, namely Prof. Valentina Emilia Balas, Aurel Vlaicu University of Arad, Romania; Prof. Satyandra K. Gupta, University of Southern California, USA; Prof. Tomohiro Shibata, Kyushu Institute of Technology, Japan; Dr. Xin-She Yang, Middlesex University, London; Prof. Amit Konar, Jadavpur University; and Dr. Krishnanand Kaipa, Old Dominion University, Virginia, USA.

Dr. Ritu Tiwari, Pune, India
Dr. Mario F. Pavone, Catania, Italy
Dr. Ranjith Ravindranathan Nair, Pune, India

Contents

1

2

3

Visually Guided UGV for Autonomous Mobile Manipulation in Dynamic and Unstructured GPS-Denied Environments . . . . . . . . . Mohit Vohra and Laxmidhar Behera

1

Neural Network-Based Motion Control Algorithm for Perching Nano-Quadrotor on Outdoor Vertical Surface . . . . . . . Sandeep Gupta and Laxmidhar Behera

15

An Intelligent Game Theory Approach for Collision Avoidance of Multi-UAVs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Heera Lal Maurya, Padmini Singh, Subhash Yogi, Laxmidhar Behera, and Nishchal K. Verma

4

Dynamics and Control of Quadrupedal Robot . . . . . . . . . . . . . . . . . . . Shashank Kumar, Shubham Shukla, Ishank Agarwal, Arjit Jaitely, Ketan Singh, Vishwaratna Srivastava, and Vibhav Kumar Sachan

5

Disturbance Observer-Based Sliding Mode Controller with Mismatched Disturbance for Trajectory Tracking of a Quadrotor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vibhu Kumar Tripathi, Anuj Nandanwar, and Laxmidhar Behera

6

7

8

27

41

57

Multi-robot Formation Control Using Integral Third-Order Super-Twisting Controller in Cyber-Physical Framework . . . . . . . . . Anuj Nandanwar, Vibhu Kumar Tripathi, and Laxmidhar Behera

71

RFE and Mutual-INFO-Based Hybrid Method Using Deep Neural Network for Gene Selection and Cancer Classification . . . . . Samkit Jain, Rashmi Maheshwari, and Vinod Kumar Jain

85

Biased Online Media Analysis Using Machine Learning . . . . . . . . . . . Arpit Gupta, Anisha Kumari, Ritik Raj, Akanksha Gupta, Raj Nath Shah, Tanmay Jaiswal, Rupesh Kumar Dewang, and Arvind Mewada

99

vii

viii

9

Contents

Coverless Information Hiding: A Review . . . . . . . . . . . . . . . . . . . . . . . . 109 Nitin Kanzariya, Dhaval Jadhav, Gaurang Lakhani, Uttam Chauchan, and Lokesh Gagani

10 A Review on Transliterated Text Retrieval for Indian Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 Sujeet Kumar, Siddharth Kumar, and Jayadeep Pati 11 Learning-Based Smart Parking System . . . . . . . . . . . . . . . . . . . . . . . . . . 147 S. Sajna and Ranjith Ravindranathan Nair 12 Automated Identification of Tachyarrhythmia from Different Datasets of Heart Rate Variability Using a Hybrid Deep Learning Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 Manoj Kumar Ojha, Sulochana Wadhwani, Arun Kumar Wadhwani, and Anupam Shukla 13 Automatic Pathological Myopia Detection Using Ensemble Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 Rajeshwar Patil, Yogeshwar Patil, Yatharth Kale, Ashish Shetty, and Sanjeev Sharma 14 Revolutionary Solutions for Comprehensive Assessment of COVID-19 Pandemic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183 Shradha Suman Panda, Dev Sourav Panda, and Rahul Dixit 15 Application of Complex Network Principles to Identify Key Stations in Indian Railway Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 Ishu Garg, Ujjawal Soni, Sanchit Agrawal, and Anupam Shukla 16 Text Classification Using Deep Learning: A Survey . . . . . . . . . . . . . . . 205 Samarth Bhawsar, Sarthak Dubey, Shashwat Kushwaha, and Sanjeev Sharma 17 Significance of Artificial Intelligence in COVID-19 Detection and Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 Abhishek Shrivastava and Vijay Kumar Dalla 18 Anomalous Human Activity Detection Using Stick Figure and Deep Learning Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231 P. D. Rathika, G. Subashini, S. Nithish Kumar, and S. 
Ram Prakash 19 Noise Removal in ECG Signals Utilizing Fully Convolutional DAE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243 Arun Sai Narla, Shalini Kapuganti, and Hathiram Nenavath 20 Performance Investigation of RoF Link in 16 Channel WDM System Using DPSK Modulation Technique . . . . . . . . . . . . . . . . . . . . . 257 Balram Tamrakar, Krishna Singh, and Parvin Kumar

Contents

ix

21 A Comprehensive Study on Automatic Emotion Detection System Using EEG Signals and Deep Learning Algorithms . . . . . . . . 267 T. Abimala, T. V. Narmadha, and Lilly Raamesh 22 Sensor-Based Secure Framework for IoT-Based Smart Homes . . . . . 283 Nidhi Dandotiya, Pallavi Khatri, Manjit Kumar, and Sujendra Kumar Kachhap 23 Manipulating Muscle Activity Data from Electromyography for Various Applications Using Artificial Intelligence . . . . . . . . . . . . . 291 Piyush Agrawal, Apurva Joshi, and Shailesh Bendale 24 A Framework for Automatic Wireless Alert System for Explosive Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303 Ankita Chandavale, C. Anjali, Sunita Jahirbadkar, and Niketa Gandhi 25 IoT and Deep Learning-Based Weather Monitoring and Disaster Warning System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309 Chandra Kant Dwivedi 26 Sketch-Based Face Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 M. Maheesha, S. Samiksha, M. Sweety, B. Sathyabama, R. Nagarathna, and S. Mohamed Mansoor Roomi 27 Aerial Object Detection Using Different Models of YOLO Architecture: A Comparative Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 Vinat Goyal, Rishu Singh, Aveekal Kumar, Mrudul Dhawley, and Sanjeev Sharma 28 Video Anomaly Classification Using DenseNet Feature Extractor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 Sanskar Hasija, Akash Peddaputha, Maganti Bhargav Hemanth, and Sanjeev Sharma 29 Empirical Analysis of Novel Differential Evolution for Molecular Potential Energy Problem . . . . . . . . . . . . . . . . . . . . . . . . 359 Pawan Mishra, Pooja, and Shubham Shukla 30 Sign Language versus Spoken English Language—A Study with Supervised Learning System Using RNN for Sign Language Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
371 Sampada S. Wazalwar and Urmila Shrawankar 31 Hidden Markov Modelling for Biological Sequence . . . . . . . . . . . . . . . 383 K. Senthamarai Kannan and S. D. Jeniffer


32 Proposed Crowd Counting System and Social Distance Analyzer for Pandemic Situation
Mrunal Girhepunje, Simran Jain, Triveni Ramteke, Nikhil P. Wyawahare, Prashant Khobragade, and Sampada Wazalwar
33 A Novel Ensemble Model to Summarize Kannada Texts
S. Parimala and R. Jayashree
34 Parallel Computation of Probabilistic Rough Set Approximations
V. K. Hanuman Turaga and Srilatha Chebrolu
35 Simplified TOPSIS for MLN-MODM Problems
Kailash Lachhwani
36 A Comprehensive Review Analysis on PSO and GA Techniques for Mathematical Programming Problems
Kailash Lachhwani
Author Index

About the Editors

Prof. Ritu Tiwari is currently working as a Professor in the Department of Computer Science and Engineering at the Indian Institute of Information Technology (IIIT) Pune. Before joining IIIT Pune, she was an Associate Professor in the Department of Information and Communication Technology at ABV-Indian Institute of Information Technology and Management (IIITM) Gwalior. She has 12 years of teaching and research experience. Her research interests include Robotics, Artificial Intelligence, Soft Computing and Applications. She has published five books and more than 80 research papers in national and international journals and conferences, and she is a reviewer for many international journals and conferences. Dr. Tiwari received the Young Scientist Award from the Chhattisgarh Council of Science and Technology in 2006. She also received a Gold Medal in her postgraduate studies at NIT Raipur.

Prof. Mario F. Pavone is currently working as an Associate Professor in Computer Science at the Department of Mathematics and Computer Science, University of Catania, Italy. His work focuses on the design and development of metaheuristics applied in several research areas, such as Combinatorial Optimization, Computational Biology, Network Sciences, and Social Networks. Prof. Pavone was a visiting professor with a fellowship at the Faculty of Sciences, University of Angers, France, in 2016. Since August 2017, he has been a member of the IEEE Task Force on the Ethical and Social Implications of Computational Intelligence of the IEEE Computational Intelligence Society (IEEE CIS). Since February 2015, he has been the Vice-Chair of the Task Force on Interdisciplinary Emergent Technologies of the IEEE CIS (Emergent Technologies Technical Committee, ETTC), whose main aim is to promote the interdisciplinary study of emergent computation in bio-informatics, bio-physics, and the interdisciplinary domains of economy, medicine, and industry. Prof. Pavone also served as the Chair of the Task Force on Artificial Immune Systems of the IEEE CIS. He is a member of several editorial boards of international journals and of many program committees of international conferences and workshops. Prof. Pavone also has extensive experience organizing successful workshops, symposia, conferences, and summer schools.


Prof. Pavone has also been an invited speaker at several international conferences and the editor of many special issues in Artificial Life, Engineering Applications of Artificial Intelligence (EAAI), Applied Soft Computing (ASOC), BMC Immunology, Natural Computing, and Memetic Computing, among others. He is the co-founder of the Tao Science Research Center and the Scientific Director of ANTs Lab (Advanced New Technologies research laboratory). Prof. Pavone was a visiting professor at the School of Computer Science, University of Nottingham, UK, and a visiting researcher at the IBM-KAIST Bio-Computing Research Center, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), in 2009 and 2006, respectively.

Dr. Ranjith Ravindranathan Nair is working as an Assistant Professor in the Department of Electronics and Communication Engineering, Indian Institute of Information Technology (IIIT) Pune. Dr. Nair received his Ph.D. in control and automation from the Department of Electrical Engineering, Indian Institute of Technology Kanpur, in 2017. After completing his Ph.D., he worked as a postdoctoral fellow in the Intelligent Systems and Control Lab at IIT Kanpur, on a project funded by the Abu Dhabi National Oil Corporation, Gas Research Centre. As part of his Ph.D. programme, Dr. Nair was involved in research projects sponsored by ISRO and the Department of Electronics and Information Technology, Government of India. Before joining the Ph.D. programme, he worked with Larsen and Toubro Infotech Ltd., Chennai. Dr. Nair has many publications in IEEE journals and Tier-I IEEE international conferences in the control sciences. He has published a book titled Intelligent Control of Robotic Systems (Taylor and Francis, CRC Press), which received the award for Best Book in the reference category for Engineering at the Book and Digital Product Awards 2020 organized by the Taylor and Francis Group. Dr. Nair also received the Best Paper Award at ACODS 2012, an international conference held at IISc, Bangalore, in February 2012. He served as an Associate Editor for IEEE RO-MAN from 2019 to 2021. His primary research interests include multi-robotic systems, cyber-physical systems, nonlinear control systems, multi-agent systems, and intelligent control.

Chapter 1

Visually Guided UGV for Autonomous Mobile Manipulation in Dynamic and Unstructured GPS-Denied Environments
Mohit Vohra and Laxmidhar Behera

1 Introduction

Robotics applications are designed to have a transformational impact on day-to-day life in areas including disaster management, healthcare, household work, transportation, construction, and manufacturing [1, 2]. International robotics challenges such as the Amazon Robotics Challenge (ARC) and the Mohamed Bin Zayed International Robotics Challenge (MBZIRC) are setting new benchmarks that advance the state of the art in autonomous solutions for these applications. Besides the development of new algorithms, system integration is an essential step in completing such tasks. In this work, we therefore describe our system architecture for assembling large 3D structures in GPS-denied environments. The proposed architecture for UGV-based object manipulation is, however, not limited to this setting and can be deployed for other real-world applications. To assemble a structure, a UGV needs to search for, track, grasp, transport, and assemble the bricks according to a user-specified pattern. Moreover, in a GPS-denied environment, the system must rely on visual information for localization and must be equipped with a suitable grasping mechanism. While performing the task, the mobile manipulator may need to move across several positions in a large workspace, which motivates running all algorithms on onboard computational devices. Finally, to facilitate electromagnetic grasping, the top of each brick carries a ferromagnetic region (0.15 m × 0.25 m), shaded yellow and called the grasping region, as shown in Fig. 1.

M. Vohra (B) · L. Behera
Indian Institute of Technology Kanpur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_1


Fig. 1 Top: UGV endowed with arm and gripper. Bottom: Bricks used for the task

To perform the task, the perception system must locate the grasping region on each brick. To equip the UGV with brick searching, tracking, grasping, and placing capabilities under limited computational resources, we present a deep learning-based multi-task visual perception system. State-of-the-art object detectors [3, 4] predict a bounding box for each object. Since a bounding box may contain significant non-object regions, an additional computation step is required to extract the object information; hence, we prefer detection (and tracking) by segmentation. Because segmentation information is already available for each object, segmentation-based tracking also saves computation. Wang et al. [5] track an object by predicting its mask at each instant, but their method relies on bounding box initialization. The key advantages of tracking by segmentation are that (i) the output of the segmentation-based brick detection network can be used directly to initialize the tracker, (ii) the output of the tracking network can be used directly for tracking in subsequent frames, and (iii) the detection and tracking networks can be integrated into a single network, as both perform a segmentation task. Apart from the vision system, we have also designed a self-adjusting electromagnetic gripper which, when pressed against the brick surface, adapts itself without loading the end-effector motors. This aligns the magnetic gripper with the grasping region and ensures a firm grasp. The main contributions of this work are summarized below:


1. A multi-task deep neural network-based visual perception system for brick detection, part detection, instance segmentation, and tracking.
2. Development of a self-adjusting electromagnetic gripper.
3. System integration and a state machine to realize a fully autonomous UGV-based robotic manipulation system.

2 Related Work

Object detection: RCNN is one of the first successful deep object detectors. Its successors, Fast-RCNN and Faster-RCNN, have lower inference time than RCNN. RetinaNet [6] and FCOS [7] likewise perform detection by predicting axis-aligned bounding boxes. In general, a predicted box may contain significant background, and various solutions have been proposed to reduce its impact. For example, in [8], the authors predict a rotated bounding box to maximize the object region inside the prediction. Similarly, Mask-RCNN [9] predicts the mask of the target object inside the predicted bounding box.

Object Tracking: Discriminative correlation filter (DCF)-based trackers [10] localize the object by computing the correlation between the filter and the search window. Siamese neural network-based trackers [11] localize the object by computing the similarity between the target image and the search window. Some deep trackers directly regress the bounding box parameters of the object in the next frame by comparing the target object patch with the search window [12]. Another class of trackers uses a classifier to localize the object in the next frame: target candidates are generated at different scales and random locations around the previous object location, the classifier assigns a score to each candidate [13], and the candidate with the maximum score is taken as the new object location.

Object Part Detection: Recent approaches [14] have demonstrated part detection in point cloud data for 3D object detection. In contrast, [15] detects object parts in the image plane for detection, tracking, and segmentation. In our task, we must detect the yellow (ferromagnetic) regions on the brick surfaces and determine whether each belongs to a green, blue, or red brick. It is therefore natural to detect the bricks in the image plane and then detect the yellow regions within each brick region, which leads us to integrate object detection and part detection in a single network.

3 System Description

UGV Platform: The UGV hardware used for the task is shown in Fig. 1. It consists of a UR5 robotic arm mounted on a ROBOTNIK Summit-XL mobile base. The UR5 is a 6-DoF industrial arm with an end-effector position accuracy of ±0.1 mm and a payload capacity of 5 kg. The mobile base has four high-power motorized mecanum wheels and a maximum payload of 100 kg. The mecanum wheels allow the base to move in any direction while its front keeps facing a constant direction. The UGV is equipped with two RGB-D sensors: one is mounted on wrist 1 of the UR5 arm, and the other on the front face of the base, as shown in Fig. 1. Raw images from the first sensor are used to extract the crucial information about the bricks, while data from the second sensor is used for UGV localization.

Fig. 2 Proprietary electromagnetic gripper

Gripper Design: Figure 2 shows our electromagnetic gripper for grasping the desired objects. It consists of a double-plate assembly with a polymer foam between the plates. Seven electromagnets are attached to one plate, distributed uniformly about the center of the gripper. All electromagnets are connected to the battery through a switch, which is controlled by publishing a high or low signal on a ROS topic. The double-plate assembly is connected to the UR5 end-effector through a steel rod. The foam allows the gripper to adapt to a tilted brick surface when pressed against it; hence, we call it a self-adjusting gripper. To avoid damaging the gripper, we continuously measure the force exerted on it. If the force exceeds a threshold, the grasping operation is complete; otherwise, we keep pushing the gripper against the brick surface.
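The force-threshold rule for the gripper can be sketched as a simple loop. This is a minimal illustration under assumptions, not the authors' controller: the callables `read_force`, `step_down`, and `set_magnets`, the threshold value, the `max_steps` guard, and the linear-spring model of the foam are all hypothetical stand-ins for the real force sensor, UR5 motion command, and ROS-controlled switch.

```python
from dataclasses import dataclass


def press_and_grasp(read_force, step_down, set_magnets, f_th=20.0, max_steps=200):
    """Push the gripper against the brick until the measured force reaches f_th.

    read_force, step_down and set_magnets are hypothetical callables standing in
    for the force sensor, a small downward end-effector move, and the
    electromagnet switch. Returns True if the grasp completed.
    """
    for _ in range(max_steps):
        if read_force() >= f_th:
            set_magnets(True)  # threshold reached: energize the electromagnets
            return True
        step_down()  # not enough contact force yet: keep pushing
    return False  # assumed safety guard: give up instead of pushing forever


@dataclass
class SimulatedContact:
    """Toy model: the foam acts as a linear spring once the plate touches the brick."""
    surface_mm: int = 50    # distance to the brick surface (hypothetical)
    step_mm: int = 5        # downward increment per step (hypothetical)
    stiffness: float = 1.0  # newtons per mm of foam compression (hypothetical)
    travel_mm: int = 0
    magnets_on: bool = False

    def read_force(self):
        return self.stiffness * max(self.travel_mm - self.surface_mm, 0)

    def step_down(self):
        self.travel_mm += self.step_mm

    def set_magnets(self, on):
        self.magnets_on = on


sim = SimulatedContact()
ok = press_and_grasp(sim.read_force, sim.step_down, sim.set_magnets, f_th=20.0)
print(ok, sim.travel_mm, sim.magnets_on)
```

With these toy values the plate travels 70 mm (50 mm to contact plus 20 mm of foam compression) before the magnets are switched on. Integer millimetres are used only to keep the simulated arithmetic exact.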

4 Multi-task Unified Visual Perception Block

The main tasks of the visual perception block are brick detection, part segmentation, instance segmentation, and brick tracking. Limited computational resources rule out a separate CNN for each of these tasks. Hence, we unify the four tasks in a single deep neural network-based visual perception framework. Figure 3 shows the proposed perception block, and the following paragraphs briefly discuss each of its components.


Fig. 3 Visual perception pipeline

Instance Detection and Segmentation: Since there are multiple brick instances, instance-level detection and segmentation are necessary. The most common choice for instance-level detection and segmentation is Mask-RCNN [9]. Because Mask-RCNN is very memory intensive, we modified its architecture for our computationally constrained system. First, we design a CNN backbone similar to AlexNet, denoted C1. After three to four iterations of weight pruning, we obtain a light model that can infer in real time. In the baseline Mask-RCNN, object detection uses ResNet features from stage-2 onward, whereas in our setup the object detection features begin at stage-3. The rest of the architecture for instance detection and segmentation remains unchanged.

Part Detection: Besides the object detection and segmentation provided by the lightweight Mask-RCNN, we also need to localize object parts, i.e., the ferromagnetic regions. We therefore extend the same CNN for this task. Features from the ROI-Align layer are passed through convolution layers, as shown in Fig. 3, and the resulting features are multiplied by the output of the instance segmentation branch. Finally, a binary part mask is learned using the binary cross-entropy loss. For a more detailed explanation, the reader is referred to the original Mask-RCNN work [9].

Conditional Tracking: After all target brick instances are detected, we localize the bricks in subsequent frames by predicting the binary mask corresponding to each target brick. At this instant, a binary mask M_{t−1} corresponding to the target brick in image I_{t−1} is generated and fed to the network C2 of the unified model. In the next step, the current image I_t is fed to the network, and a new mask M_t is predicted based on the previous image I_{t−1} and the previous mask M_{t−1}. Because the new prediction is guided by the previous mask, we call this mask the support mask. A separate AlexNet-style CNN (C2), similar to the object detector backbone, extracts a spatial feature embedding from the support mask, as shown in Fig. 3. To meet the memory constraints, C2 has far fewer parameters than C1. The tracker and the detector share the CNN backbone C1 for extracting high-dimensional embeddings from RGB images.
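The part-detection branch (convolution features multiplied by the instance-segmentation output, then trained with binary cross-entropy) can be illustrated with NumPy. This is a minimal sketch under stated assumptions: the feature shapes, the single 1×1 projection standing in for the final part-mask convolution, and the function names are hypothetical, and no training loop is shown.

```python
import numpy as np


def gate_by_instance(conv_feats, instance_prob):
    """Multiply (C, H, W) ROI features by the (H, W) instance mask, broadcast over channels."""
    return conv_feats * instance_prob[np.newaxis, :, :]


def part_logits(gated_feats, w, b):
    """Hypothetical 1x1 convolution: one part-mask logit per pixel from C channels."""
    return np.einsum("c,chw->hw", w, gated_feats) + b


def bce_with_logits(logits, target):
    """Numerically stable mean binary cross-entropy computed on raw logits."""
    loss = (np.maximum(logits, 0.0) - logits * target
            + np.log1p(np.exp(-np.abs(logits))))
    return float(np.mean(loss))


rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 14, 14))  # ROI-Align + conv features (hypothetical size)
inst = np.zeros((14, 14))
inst[3:11, 2:12] = 1.0                # instance mask from the segmentation branch
gated = gate_by_instance(feats, inst)
logits = part_logits(gated, rng.normal(size=8), 0.0)
part_gt = np.zeros((14, 14))
part_gt[5:9, 4:10] = 1.0              # ferromagnetic (yellow) region inside the brick
print(bce_with_logits(logits, part_gt))
```

The multiplication zeroes every feature outside the brick instance, so the part head can only respond inside the segmented brick, which is the property the multi-task design relies on to tie each yellow region to its brick.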


The features from stage-3, stage-4, and stage-5 of both C1 (RGB images) and C2 (support mask) are then merged by concatenation. Concatenation lets the tracker combine the support-mask features with the RGB image features when predicting the mask in the next frame. The merged features are passed through an FPN block for the final mask predictions.
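The stage-wise merge can be sketched as a channel-wise concatenation. The channel counts below are read from Table 1 under the assumption that the first kernel dimension is the number of output channels (C1: 16, 32, 32 and C2: 4, 8, 16 at stages 3 to 5); the 256 × 256 input and the halving of resolution at every stage are hypothetical, and zero arrays stand in for real activations.

```python
import numpy as np

C1_CHANNELS = {3: 16, 4: 32, 5: 32}  # RGB backbone C1, from Table 1 (assumed output channels)
C2_CHANNELS = {3: 4, 4: 8, 5: 16}    # support-mask network C2, from Table 1


def merge_stage_features(rgb_feats, mask_feats, stages=(3, 4, 5)):
    """Concatenate C1 and C2 feature maps along the channel axis for each stage."""
    merged = {}
    for s in stages:
        a, b = rgb_feats[s], mask_feats[s]
        if a.shape[1:] != b.shape[1:]:
            raise ValueError(f"stage {s}: spatial sizes differ {a.shape[1:]} vs {b.shape[1:]}")
        merged[s] = np.concatenate([a, b], axis=0)  # shape (C1 + C2, H, W)
    return merged


def fake_features(channels, input_size=256):
    """Zero-filled stand-ins; stage s is assumed to have stride 2**s."""
    return {s: np.zeros((c, input_size >> s, input_size >> s)) for s, c in channels.items()}


merged = merge_stage_features(fake_features(C1_CHANNELS), fake_features(C2_CHANNELS))
for s, f in merged.items():
    print(s, f.shape)  # these maps would then feed the FPN block
```

Concatenation (rather than addition) keeps the two embeddings separate, so the layers that follow can learn how much weight to give the support mask relative to the image.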

5 The Complete Workflow

At the start of the task, the UGV is given a target pattern for assembling the bricks in the assembly area. By parsing the target pattern file, the UGV extracts the target brick ID and performs a set of operations to bring the target brick from the piles to the assembly area. The complete robotic motion can be divided into modules according to the state of the UGV, as explained below.

Searching Module: In this state, the current image is fed to the perception system, which performs brick detection and segmentation (Fig. 4, Col. 1). If the network detects the target brick, the search module is complete; otherwise, the UGV explores the space by rotating its base 45° clockwise and executes the search module again.

Tracking Module: In the tracking module, the perception system predicts the binary mask M_t corresponding to the target brick ID in the current image I_t. To keep the UGV moving in the right direction, its lateral velocity is controlled by the horizontal component of the displacement vector between the target brick centroid and the image centroid, as shown in Fig. 4, Col. 2. The forward velocity of the UGV is proportional to the distance of the target brick from the UGV and gradually decreases to zero when this distance falls below a threshold distance (l_th).

Fig. 4 UGV state at different modules. Row 1: view from external camera, Row 2: UR5 arm sensor image, Row 3: network output

Alignment Module: In this module, the UGV aligns its position with respect to the brick, because the UR5 arm has a span of only 1.0 m. First, the current image is fed to the perception system for ferromagnetic region segmentation (Fig. 4, Col. 3). Applying principal component analysis (PCA) to the point cloud of the ferromagnetic region yields its major and minor axes (green and blue axes in Fig. 5, Col. 1). The PCA axes together with the 3D centroid of the point cloud give the 6D pose of the brick. Based on the current pose of the UGV and the 6D brick pose, a piecewise motion planner is executed in three steps: (i) rotate the base by a2, (ii) move forward by distance d, and (iii) rotate the base by a1, where a2, d, and a1 are estimated online. The final position of the UGV after alignment is shown in Fig. 5, Col. 2.

Fig. 5 Alignment and brick placing operation

Grasping Module: First, the 6D pose of the brick is estimated (Fig. 4, Col. 4). Because of sensor noise, the estimated pose is not accurate; to compensate, the gripper is pressed against the brick surface until the force exerted on it reaches a threshold value (f_th). The foam in the gripper lets it adapt to a tilted brick surface. Thus, force feedback-based grasping combined with the foam compensates for the sensor noise.

Placing Module: After grasping, the UGV navigates to the wall assembly area. For navigation, we use a visual SLAM module built on the state-of-the-art real-time ORB-SLAM2 [16] running on the UGV. The final place pose for the grasped brick depends on the previously placed brick, so the 6D pose of the previous brick is computed. The place pose is then calculated from the previous brick pose, the target pattern, and the current pattern (or current brick ID).
After the brick is placed, the current pattern is updated. Figure 4, Col. 5 shows the UGV state in the placing module: the system has aligned itself with the previously placed (red) brick and is again computing that brick's 6D pose to estimate the placement pose for the green brick. The simplified flow of the overall system is shown in Fig. 6.
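Two computations in the tracking and alignment modules lend themselves to a short sketch: the proportional velocity law used while tracking, and the PCA-based pose of the ferromagnetic region. The NumPy code below is a minimal illustration under assumptions, not the authors' implementation: the gains `k_lat` and `k_fwd`, the value of `l_th`, driving the forward speed to zero exactly at `l_th`, and the sign convention of the lateral command are hypothetical.

```python
import numpy as np


def tracking_velocity(mask, distance, k_lat=0.002, k_fwd=0.5, l_th=0.8):
    """Velocity command from the tracked brick mask (gains and l_th are hypothetical).

    Lateral velocity is proportional to the horizontal offset between the brick
    centroid and the image centre (positive = move towards the image's right side);
    forward velocity is proportional to the distance beyond l_th, reaching zero at l_th.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return 0.0, 0.0  # target lost: stop
    dx = xs.mean() - (mask.shape[1] - 1) / 2.0
    v_lat = k_lat * dx
    v_fwd = k_fwd * max(distance - l_th, 0.0)
    return v_fwd, v_lat


def brick_pose_pca(points):
    """Centroid and principal axes of the ferromagnetic-region point cloud (N x 3).

    Columns of `axes` are the major axis, minor axis and surface normal; together
    with the centroid they describe a 6D pose.
    """
    centroid = points.mean(axis=0)
    cov = np.cov((points - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    axes = eigvecs[:, np.argsort(eigvals)[::-1]]
    if np.linalg.det(axes) < 0:
        axes[:, 2] = -axes[:, 2]  # keep a right-handed frame
    return centroid, axes


mask = np.zeros((100, 100))
mask[40:60, 70:90] = 1  # brick to the right of the image centre
print(tracking_velocity(mask, distance=2.8))

gx, gy = np.meshgrid(np.linspace(-0.125, 0.125, 26), np.linspace(-0.075, 0.075, 16))
patch = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])
centroid, axes = brick_pose_pca(patch)
print(np.round(centroid, 6), np.round(np.abs(axes), 3))
```

The usage builds a synthetic, flat 0.25 m × 0.15 m patch; the recovered major axis lies along x, the long side of the grasping region, and the third axis is the surface normal the gripper should approach along.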


Fig. 6 Simplified state machine of the overall system
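The module sequence can be sketched as a small state machine. This is a hypothetical sketch: `run_module` stands in for the real perception and motion modules, and the 45° clockwise rotation after a failed search follows the text. Returning to the search state after any other failure, the `max_steps` guard, and treating clockwise as decreasing yaw are assumptions, since the recovery behaviour is not described here and Fig. 6 may contain transitions this sketch omits.

```python
from enum import Enum, auto


class State(Enum):
    SEARCH = auto()
    TRACK = auto()
    ALIGN = auto()
    GRASP = auto()
    PLACE = auto()


# Successful completion of a module moves the UGV to the next one.
NEXT = {State.SEARCH: State.TRACK, State.TRACK: State.ALIGN,
        State.ALIGN: State.GRASP, State.GRASP: State.PLACE}


def assemble(pattern, run_module, max_steps=100):
    """Drive the UGV through the modules for every brick ID in the target pattern.

    run_module(state, brick_id) -> bool is a hypothetical interface to the modules.
    Returns the (brick_id, state name, success) log and the final base yaw in degrees.
    """
    log, yaw_deg = [], 0
    for brick_id in pattern:
        state = State.SEARCH
        for _ in range(max_steps):
            ok = run_module(state, brick_id)
            log.append((brick_id, state.name, ok))
            if ok and state is State.PLACE:
                break  # brick placed: update the current pattern, go to the next brick
            if ok:
                state = NEXT[state]
            elif state is State.SEARCH:
                yaw_deg = (yaw_deg - 45) % 360  # rotate base 45 deg clockwise, search again
            else:
                state = State.SEARCH  # assumed recovery policy
        else:
            raise RuntimeError(f"brick {brick_id} not placed within {max_steps} steps")
    return log, yaw_deg


attempts = {}


def fake_module(state, brick_id):
    """Stub: the first search for each brick fails, every other module succeeds."""
    key = (brick_id, state)
    attempts[key] = attempts.get(key, 0) + 1
    return not (state is State.SEARCH and attempts[key] == 1)


log, yaw = assemble(["red", "green"], fake_module)
print(len(log), yaw)
```

Each brick takes six steps in this run (one failed search, one successful search, then tracking, alignment, grasping, and placing), and the two failed searches leave the base rotated 90° clockwise from its start.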

6 Experiments

To demonstrate the validity and usefulness of this work, we conducted several sets of experiments to evaluate the performance of the proposed visual perception network. We also evaluate the overall system through a series of experiments that provide a statistical analysis of its performance.

Dataset: This section explains how the datasets for the two tasks, detection and tracking, were generated.

• Detection: The data collection procedure follows [17]. A small dataset of about 100 images is collected, where each image contains multiple brick instances. The dataset is split into training and test sets in the ratio 7:3. Each brick instance requires both box regression and segmentation annotations, whereas the ferromagnetic regions require only segmentation annotations. During data generation, the masks of the brick instances and of the ferromagnetic regions are marked manually. These masks serve as ground truth for instance segmentation and brick-part segmentation. The ground truth for box regression is generated by fitting a rectangle to the manually marked instance mask. The dataset is augmented heavily using the synthetic data augmentation technique of [17].

• Tracking: For tracking, 10 indoor and 10 outdoor video sequences are collected at about 30 FPS. The apparent size of the brick instances changes within each sequence as the camera moves closer to the bricks. The sequences are down-sampled to about 2 FPS, and each frame is annotated for instance detection, classification, segmentation, and part segmentation. The complete tracking dataset contains 200 images, of which 150 are used for training and 50 for testing. This dataset is also augmented using the synthetic augmentation technique [17].

Training Policy: We take the baseline network architecture as Arch1 and consider two variants, Arch2 and Arch3, which contain 25% and 50% more parameters than Arch1, respectively. The kernel sizes of each layer are listed in Table 1. Since object detection and tracking are different tasks, they would normally require separate training procedures. To avoid this, the training images are annotated with brick instance masks and brick-part masks in addition to the object masks, so that the detector can also be trained on the tracking dataset. Training images are drawn with equal probability from the detection dataset and the tracking dataset. In this way, the network learns from both temporal data (video sequences) and single-image object detection.

Table 1 Kernel sizes. ‘∗’ corresponds to input channels

| Layer | C1 | C2 |
|---|---|---|
| Stage-1 | 4 × 3 × 3 × 3 | 2 × 1 × 3 × 3 |
| Stage-2 | 8 × 4 × 3 × 3 | 4 × 2 × 3 × 3 |
| Stage-3 | 16 × 8 × 3 × 3 | 4 × 4 × 3 × 3 |
| Stage-4 | 32 × 16 × 3 × 3 | 8 × 4 × 3 × 3 |
| Stage-5 | 32 × 32 × 3 × 3 | 16 × 8 × 3 × 3 |
| Others | 12 × ∗ × 3 × 3 | 4 × ∗ × 3 × 3 |
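A rough weight count makes the size difference between C1 and C2 in Table 1 concrete. The sketch assumes each Table 1 entry is a single convolution layer with kernel shape (output channels, input channels, height, width), ignores biases, and omits the "Others" row because its input-channel count (∗) is not given; these are assumptions, not figures reported in the paper.

```python
# Kernel shapes (out, in, kh, kw) for stages 1-5 from Table 1; the "Others" row is
# omitted because its input-channel count is not given.
C1_KERNELS = [(4, 3, 3, 3), (8, 4, 3, 3), (16, 8, 3, 3), (32, 16, 3, 3), (32, 32, 3, 3)]
C2_KERNELS = [(2, 1, 3, 3), (4, 2, 3, 3), (4, 4, 3, 3), (8, 4, 3, 3), (16, 8, 3, 3)]


def conv_weights(kernels):
    """Number of convolution weights (biases ignored), assuming one layer per stage."""
    total = 0
    for out_c, in_c, kh, kw in kernels:
        total += out_c * in_c * kh * kw
    return total


c1, c2 = conv_weights(C1_KERNELS), conv_weights(C2_KERNELS)
print(c1, c2, round(c1 / c2, 1))
```

Under these assumptions, the stage weights of C1 outnumber those of C2 by roughly nine to one (15,372 versus 1,674), consistent with the statement that C2 has far fewer parameters than C1.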
The following hyperparameters are used for training: a base learning rate of 0.001 with a step learning-rate policy, and the Adam optimizer with β1 = 0.9 and β2 = 0.99.

Ablation Study: Table 2 shows the performance of the network on the various tasks. Accuracy and timing for half-precision (FP16) and single-precision (FP32) computation on a single GeForce GTX 1050 (4 GB) GPU are reported for all three architectures, Arch1, Arch2, and Arch3. From Table 2, we observe that Arch2 shows a minor improvement over Arch1, and Arch3 shows only slight improvements over Arch1 despite having 50% more parameters. One reason is that all variants share the same basic architecture and differ only in the number of parameters. Another is that the brick classes used in the experiments differ markedly from one another, while intra-class variance is extremely low.

Table 2 Performance analysis

| Network architecture | Time (ms) | Box mIoU | Seg mIoU | Part Seg mIoU | Tracker mIoU |
|---|---|---|---|---|---|
| Arch1 (FP16) | 49 | 80.3 | 79.2 | 73.1 | 83.4 |
| Arch1 (FP32) | 123 | 82.2 | 80.0 | 72.9 | 84.8 |
| Arch2 (FP16) | 80 | 81.2 | 79.7 | 73.6 | 84.1 |
| Arch2 (FP32) | 161 | 83.5 | 81.1 | 74.0 | 85.6 |
| Arch3 (FP16) | 101 | 85.3 | 80.6 | 73.5 | 84.4 |
| Arch3 (FP32) | 187 | 86.7 | 81.9 | 74.1 | 86.4 |

Table 3 Effect of synthetic data augmentation for Arch1, FP16 (mIoU scores)

| Color | Scale | Mirror | Blur | Rotate | Synthetic scene | Detection | Segmentation | Part Segmentation | Tracker |
|---|---|---|---|---|---|---|---|---|---|
| ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | 23.7 | 31.8 | 18.8 | 15.3 |
| ✓ | | | | | | 25.3 | 32.1 | 21.3 | 17.7 |
| ✓ | ✓ | | | | | 30.1 | 38.6 | 29.1 | 23.1 |
| ✓ | ✓ | ✓ | | | | 32.3 | 40.2 | 31.9 | 25.5 |
| ✓ | ✓ | ✓ | ✓ | | | 32.5 | 41.4 | 33.8 | 24.1 |
| ✓ | ✓ | ✓ | ✓ | ✓ | | 37.2 | 49.8 | 37.7 | 28.4 |
| ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 80.3 | 79.2 | 73.1 | 83.4 |

Synthetic Data Augmentation: Table 3 shows the impact of synthetic scenes and the other augmentations described in [17]. Incorporating the synthetic augmentation technique yields a significant improvement in performance. Tracker performance, however, degrades when blurring is included in the augmentation.

Unified versus Non-unified Perception System: The main contribution of this work is the unification of various tasks. It is therefore natural to compare the multi-task network against individual networks for the same tasks, namely detection, part segmentation, and tracking. The individual networks have the same configuration as the multi-task network. Table 4 shows that the unified pipeline is much faster than the individual networks (49 ms versus 106 ms in total). Moreover, the unified network consumes 40% of the total memory consumed by the three individual networks. A video of one of the experiments is available at https://youtu.be/nPiMzrBFJ-A.
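The timing figures translate into frame rates with simple arithmetic. This sketch assumes that the separate networks run one after another on the same GPU, so their latencies add, and it ignores pre- and post-processing; under that assumption the unified Arch1 (FP16) latency of 49 ms corresponds to the roughly 20 FPS quoted in the conclusion.

```python
UNIFIED_MS = 49  # Arch1, FP16, all tasks in one network (Tables 2 and 4)
SEPARATE_MS = {"detection": 26, "part_segmentation": 35, "tracking": 45}  # Table 4


def fps(latency_ms):
    """Frames per second for a given per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


separate_total = sum(SEPARATE_MS.values())
print(f"separate: {separate_total} ms -> {fps(separate_total):.1f} FPS")
print(f"unified:  {UNIFIED_MS} ms -> {fps(UNIFIED_MS):.1f} FPS")
print(f"speed-up: {separate_total / UNIFIED_MS:.2f}x")
```

Running the three networks separately would give about 9.4 FPS, while the unified network reaches about 20.4 FPS, a speed-up of roughly 2.2 times.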

Table 4 Unified versus non-unified perception system, Arch1, FP16 (time in ms)

| Network architecture | Detection | Part Seg | Tracking | Total |
|---|---|---|---|---|
| Arch1 (unified) | – | – | – | 49 |
| Detection | 26 | – | – | |
| Part segmentation | – | 35 | – | |
| Tracking | – | – | 45 | |
| Separate networks combined | | | | 106 |

Table 5 Overall system evaluation

| Module | Success trials |
|---|---|
| Searching | 346/346 (100.0%) |
| Tracking | 346/346 (100.0%) |
| Alignment | 292/346 (84.4%) |
| Grasping | 292/292 (100.0%), 292/346 (84.4%) |
| Placing | 218/292 (74.6%), 218/346 (63.0%) |
| Overall | 218/346 (63.0%) |

Overall System Evaluation: The complete robotic system was tested for 50 rounds. In each round, the piles start at different locations and in different configurations, and the robotic system has to assemble the 3D structure in the assembly area according to the target pattern. In each round, the target pattern consists of 6–8 bricks. Overall, the UGV was therefore tested on 346 bricks, and for each brick we count how often each module performed satisfactorily. A module's performance is deemed unsatisfactory if the final pose of the UGV after executing that module is random; if the module performs satisfactorily, the UGV reaches an appropriate pose and can readily switch to the next module. Table 5 summarizes the performance of the UGV in our experiments. From Table 5, we observe that the searching and tracking modules achieve 100% accuracy. This is because the dataset is small and the network was trained with extensive data augmentation. The alignment module succeeds in 292 of 346 trials, an 84% success rate. The failures occur when the front sensor is exposed to a textureless view, so that visual SLAM fails to update the pose of the UGV. For the 292 successful trials, the grasping module was tested and showed a 100% success rate, giving an overall system score of 292/346 (84%) at this stage. The placing module succeeds in 218 of 292 trials, again because visual SLAM fails to update the pose of the UGV during brick placement. The overall system score is 218/346 (63%). From these experiments, we conclude that one of the main ways to improve the performance of the UGV is to estimate its state accurately during the alignment and brick placing modules. To this end, in future work we aim to localize the UGV using data from multiple sensors mounted on different parts of the UGV.

7 Conclusion

We have presented a robotic solution that enables unmanned ground vehicles (UGVs) to perform the highly complex task of assembling elementary blocks (bricks) in a GPS-denied environment. The proposed system includes a deep learning-based unified multi-task visual perception system for instance detection, segmentation, part segmentation, and object tracking. The perception system runs at 20 FPS on a single GeForce GTX 1050 4 GB GPU. The visual perception module has been tested extensively on various test cases, including indoor and outdoor environments, and its performance on each task is reported in this paper. The proposed perception module is integrated into the UGV system for the brick assembly task. To facilitate grasping, we designed an electromagnetic self-adjusting gripper and deployed a force feedback-based grasping strategy to compensate for sensor noise. The complete robotic system was tested for 50 rounds of the brick assembly task and achieved an overall success rate of 63%. The primary reason for this relatively low rate is that visual SLAM fails to update the state of the UGV when the depth sensor is exposed to a textureless view. Incorporating multiple sensors for localization should therefore increase the success rate, which we leave for future work.

References 1. Vohra M, Prakash R, Behera L (2019) Real-time grasp pose estimation for novel objects in densely cluttered environment. In: 2019 28th IEEE international conference on Robot and Human Interactive Communication (RO-MAN). IEEE, pp 1–6 2. Pharswan SV, Vohra M, Kumar A, Behera L (2019) Domain-independent unsupervised detection of grasp regions to grasp novel objects. In: 2019 IEEE/RSJ international conference on Intelligent Robots and Systems (IROS). IEEE, pp 640–645 3. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788 4. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, Berlin, pp 1–37 5. Wang Q, Zhang L, Bertinetto L, Hu W, Torr PH (2019) Fast online object tracking and segmentation: a unifying approach. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1328–1338 6. Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988 7. Tian Z, Shen C, Chen H, He T (2019) Fcos: fully convolutional one-stage object detection. In: Proceedings of the IEEE international conference on computer vision, pp 9627–9636


8. Li S, Zhang Z, Li B, Li C (2018) Multiscale rotated bounding box-based deep learning method for detecting ship targets in remote sensing images. Sensors 18(8):2702
9. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: CVPR
10. Henriques JF, Caseiro R, Martins P, Batista J (2014) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596
11. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional Siamese networks for object tracking. In: European conference on computer vision. Springer, Berlin, pp 850–865
12. Held D, Thrun S, Savarese S (2016) Learning to track at 100 fps with deep regression networks. In: European conference on computer vision. Springer, Berlin, pp 749–765
13. Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4293–4302
14. Shi S, Wang Z, Shi J, Wang X, Li H (2020) From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network. IEEE Trans Pattern Anal Mach Intell
15. Lorenz D, Bereska L, Milbich T, Ommer B (2019) Unsupervised part-based disentangling of object shape and appearance. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 10955–10964
16. Mur-Artal R, Tardós JD (2017) ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans Robot 33(5):1255–1262
17. Kumar A, Behera L (2019) Semi supervised deep quick instance detection and segmentation. In: 2019 International Conference on Robotics and Automation (ICRA). IEEE, pp 8325–8331

Chapter 2

Neural Network-Based Motion Control Algorithm for Perching Nano-Quadrotor on Outdoor Vertical Surface
Sandeep Gupta and Laxmidhar Behera

1 Introduction Nano-quadrotors are lightweight and compact and are used in a wide variety of applications, which has attracted the interest of both researchers and industry. They support multiple modes of operation such as hovering, vertical take-off, aggressive manoeuvres, and landing on different surfaces, including inclined and rough ones [1]. Moreover, nano-quadrotors offer satisfactory performance in spite of their complex, coupled dynamics. Owing to significant advances in nonlinear control theory and micro feedback sensors, there are many opportunities for new applications such as nano-quadrotors perching on walls [2, 3] and on cable lines for surveillance [4]. Common applications of quadrotors include remote inspection, video shooting, and defence services [5]. Effective attitude and position control algorithms are required for autonomous flight. Quadrotor attitude and position control is relatively complicated because of unmodelled dynamics and the multivariable input–output structure: there are four inputs, which control the roll, pitch, and yaw angles and the throttle, whereas the dynamics have six main outputs, the 3-D position (x, y, z) and the Euler angles (φ, θ, ψ). Many motion control algorithms in the literature were developed for fully actuated systems and cannot be applied directly to underactuated nonlinear systems. Models of nano-UAVs are usually subject to unmodelled dynamics [6]; hence, the high nonlinearity and the aerodynamic disturbances must be addressed properly when designing the control law. In the past few years, different versions of PID and LQR control from linear control theory have been developed and implemented for quadrotors. These techniques rely on a linearized model and are not effective for systems with coupled dynamics [7, 8]; they are suitable mainly for attitude control and for the hovering state of the UAV. Nonlinear techniques such as backstepping control, sliding mode control, feedback linearization, and adaptive control have been developed to control the interconnected attitude and position of the quadrotor [9, 10]. A hybrid nonlinear control technique is used for improved trajectory tracking in [11]. Another category of nonlinear control algorithms comprises intelligent control techniques such as neural network (NN) and fuzzy logic-based methods [12, 13]. Many researchers have used neural networks both for system identification and for designing controllers for nonlinear systems with unknown dynamics. Control schemes that have been combined with neural networks include model reference adaptive control (MRAC), model predictive control (MPC), and feedback linearization (FL). Neural network controllers are generally designed in two steps: (a) system identification and (b) controller design. In the first step, a neural network model of the system is designed and trained offline in batch mode. In the second step, the controller is designed and its parameters are optimized using the neural network for the desired performance. Model predictive control has the largest computational time, whereas feedback linearization is comparatively cheap because its control law is simply a rearrangement of the system model. Recently, many researchers have proposed neural network-based control techniques for autonomous quadrotor flight [14]. The main interest is to develop adaptive neural control laws [15, 16] that cope with parameter variations, external disturbances, and unmodelled dynamics of the UAV.

S. Gupta (B) · L. Behera
Electrical Engineering Department, Indian Institute of Technology, Kanpur, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_2
In this work, a direct adaptive neural control algorithm is presented for unknown dynamics in the system matrix and control matrix of the system model. Stability is analysed via Lyapunov theory, from which the weight update laws for the Chebyshev neural networks (ChNNs) are derived. ChNNs are chosen because their structure is less complex than that of multilayer feedforward neural networks [17], so they need fewer computational resources, which matters on nano-quadrotors; they are also simpler to implement on embedded systems and converge quickly. The proposed neural network-based control algorithm is designed to preserve stability during the perching operation while keeping the trajectory-tracking error of the nano-quadrotor small. The paper is organized as follows. Section 2 presents the quadrotor dynamic model and the problem formulation. The design of the ChNN-based control algorithm is introduced in Sect. 3. Section 4 provides experimental results for the quadrotor perching application for surveillance. The conclusion and future scope of this work are given in Sect. 5.


2 Quadrotor Dynamic Modelling and Problem Formulation A nano-quadrotor is simply a scaled-down, lightweight version of a normal quadrotor. Owing to advantages such as silent operation, agility, and safety, nano-quadrotors are a rapidly developing area of small UAVs. Their limited payload capacity, short flight time due to a small battery, and compact size create major challenges for flight control development. The quadrotor is an underactuated, highly nonlinear system in which twelve states are controlled by four control inputs. The governing dynamics of the quadrotor in three-dimensional space are taken from [18]:

$$
\begin{aligned}
\ddot{\phi} &= \dot{\theta}\dot{\psi}\,\frac{J_y - J_z}{J_x} + \frac{k_\phi}{J_x}\dot{\phi} + \frac{d}{J_x}u_2\\
\ddot{\theta} &= \dot{\phi}\dot{\psi}\,\frac{J_z - J_x}{J_y} + \frac{k_\theta}{J_y}\dot{\theta} + \frac{d}{J_y}u_3\\
\ddot{\psi} &= \dot{\phi}\dot{\theta}\,\frac{J_x - J_y}{J_z} + \frac{k_\psi}{J_z}\dot{\psi} + \frac{1}{J_z}u_4\\
\ddot{z} &= -g - \frac{k_z}{m}\dot{z} + \frac{u_1}{m}\cos\phi\cos\theta\\
\ddot{x} &= -\frac{k_x}{m}\dot{x} + \frac{u_1}{m}\left(\sin\theta\cos\psi\cos\phi + \sin\psi\sin\phi\right)\\
\ddot{y} &= -\frac{k_y}{m}\dot{y} + \frac{u_1}{m}\left(\sin\theta\cos\phi\sin\psi - \cos\psi\sin\phi\right)
\end{aligned}
\tag{1}
$$
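Eq. (1) can be written as a small function that returns the six accelerations for a given state and input, which is convenient for simulation. The sketch below is illustrative only: it uses the Crazyflie mass and inertias quoted in Sect. 4, assumes the moment arm d equals the arm length l = 0.010 m listed there, and uses hypothetical drag coefficients because the chapter does not report them. The damping terms keep the signs printed in Eq. (1). The names `accelerations` and `PARAMS` are hypothetical.

```python
import math

# Mass and inertias: Crazyflie values quoted in Sect. 4.  The moment arm d is
# assumed equal to the listed arm length, and every drag coefficient below is
# a hypothetical placeholder (the chapter does not report them).
PARAMS = {
    "m": 0.040, "g": 9.81, "d": 0.010,
    "Jx": 1.112951e-5, "Jy": 1.114361e-5, "Jz": 2.162056e-5,
    "kx": 0.01, "ky": 0.01, "kz": 0.01,
    "kphi": 1e-6, "ktheta": 1e-6, "kpsi": 1e-6,
}


def accelerations(state, u, p=PARAMS):
    """Return (x'', y'', z'', phi'', theta'', psi'') according to Eq. (1).

    state = (x, y, z, phi, theta, psi, dx, dy, dz, dphi, dtheta, dpsi), where the
    last six entries are time derivatives of the first six; u = (u1, u2, u3, u4).
    """
    _, _, _, phi, th, psi, vx, vy, vz, dphi, dth, dpsi = state
    u1, u2, u3, u4 = u
    m, Jx, Jy, Jz, d = p["m"], p["Jx"], p["Jy"], p["Jz"], p["d"]
    sphi, cphi = math.sin(phi), math.cos(phi)
    sth, cth = math.sin(th), math.cos(th)
    spsi, cpsi = math.sin(psi), math.cos(psi)
    # Translational part: drag plus the thrust projected through the attitude.
    ax = -p["kx"] / m * vx + u1 / m * (sth * cpsi * cphi + spsi * sphi)
    ay = -p["ky"] / m * vy + u1 / m * (sth * cphi * spsi - cpsi * sphi)
    az = -p["g"] - p["kz"] / m * vz + u1 / m * cphi * cth
    # Rotational part: gyroscopic coupling, damping (sign as printed), torques.
    aphi = dth * dpsi * (Jy - Jz) / Jx + p["kphi"] / Jx * dphi + d / Jx * u2
    ath = dphi * dpsi * (Jz - Jx) / Jy + p["ktheta"] / Jy * dth + d / Jy * u3
    apsi = dphi * dth * (Jx - Jy) / Jz + p["kpsi"] / Jz * dpsi + u4 / Jz
    return ax, ay, az, aphi, ath, apsi
```

For example, with every state at zero and u = (mg, 0, 0, 0) all six returned values are zero up to rounding, because the thrust exactly balances gravity at hover.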

where roll, pitch, and yaw are represented by (φ, θ, ψ), respectively, and (x, y, z) are the position coordinates. m is the mass, and Jx, Jy, Jz are the moments of inertia about the respective axes. The aerodynamic damping/drag coefficients are kx, ky, kz, kφ, kθ, kψ, and d is the moment arm. u1 is the thrust force, and u2, u3, u4 are torques. These four control inputs [17] are given as u1 = F1 + F2 + F3 + F4, u2 = −F1 − F2 + F3 + F4, u3 = −F1 + F2 + F3 − F4, and u4 = −F1 + F2 − F3 + F4, where F1, F2, F3, and F4 are the forces applied by the rotors. When the quadrotor is aligned in perching mode, the governing dynamics become [2]:

$$
\begin{aligned}
\ddot{\theta}_p &= \frac{d}{J_y}u_{\theta p}\\
\ddot{z}_p &= -g - \frac{k_z}{m}\dot{z} + \frac{u_{zp}}{m}\cos\theta_p\\
\ddot{x}_p &= -\frac{k_x}{m}\dot{x} + \frac{u_{xp}}{m}\sin\theta_p
\end{aligned}
\tag{2}
$$

For the nonlinear models (1) and (2), the objective is to track the desired path; xd, yd, zd, θd, φd, and ψd are the desired values of the states. The desired roll and pitch angles are given in Eqs. (3) and (4):

$$
\phi_d = \sin^{-1}\left(u_x\sin\psi_d - u_y\cos\psi_d\right) \tag{3}
$$

$$
\theta_d = \sin^{-1}\left(\frac{u_x\cos\psi_d + u_y\sin\psi_d}{\sqrt{1 - \left(u_x\sin\psi_d - u_y\cos\psi_d\right)^2}}\right) \tag{4}
$$

The two virtual control inputs ux and uy are given by ux = sin θ cos ψ cos φ + sin ψ sin φ and uy = sin θ cos φ sin ψ − cos ψ sin φ. In the perching case (2), the desired states are θpd, xpd, and zpd, and the corresponding control inputs are uxp, uzp, and uθp. The quadrotor dynamics can be divided into two subsystems: the translational subsystem $\ddot{T}_r = f_t(T_r, \dot{T}_r) + g_t(T_r, \dot{T}_r)u_t$ and the rotational subsystem $\ddot{R}_t = f_r(R_t, \dot{R}_t) + g_r(R_t, \dot{R}_t)u_r$, where $T_r = [x, y, z]$ and $R_t = [\phi, \theta, \psi]$. The system matrices ft and fr represent the unknown dynamics, and the control matrices gt and gr are known functions of the quadrotor mathematical model. For perching mode, the 2-D system dynamics are represented as $\ddot{T}_{rp} = f_{tp}(T_{rp}, \dot{T}_{rp}) + g_{tp}(T_{rp}, \dot{T}_{rp})u_{tp}$ and $\ddot{R}_{tp} = f_{rp}(R_{tp}, \dot{R}_{tp}) + g_{rp}(R_{tp}, \dot{R}_{tp})u_{rp}$.
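With ux and uy defined as above, a short trigonometric calculation gives ux sin ψd − uy cos ψd = sin φ and ux cos ψd + uy sin ψd = sin θ cos φ when ψd equals the actual yaw; this is how the desired roll and pitch are recovered from the virtual controls. The sketch below checks that inversion numerically. The function names are hypothetical, and the angles are assumed to lie in (−π/2, π/2) so that the inverse sine is single-valued.

```python
import math


def virtual_inputs(phi, theta, psi):
    """ux and uy exactly as defined in this section (no thrust scaling)."""
    ux = math.sin(theta) * math.cos(psi) * math.cos(phi) + math.sin(psi) * math.sin(phi)
    uy = math.sin(theta) * math.cos(phi) * math.sin(psi) - math.cos(psi) * math.sin(phi)
    return ux, uy


def desired_roll_pitch(ux, uy, psi_d):
    """Recover (phi_d, theta_d) from the virtual inputs and the desired yaw."""
    s = ux * math.sin(psi_d) - uy * math.cos(psi_d)   # equals sin(phi)
    c = ux * math.cos(psi_d) + uy * math.sin(psi_d)   # equals sin(theta) * cos(phi)
    phi_d = math.asin(s)
    theta_d = math.asin(c / math.sqrt(1.0 - s * s))   # sqrt(1 - s^2) = cos(phi)
    return phi_d, theta_d


ux, uy = virtual_inputs(0.2, -0.3, 0.7)
print(desired_roll_pitch(ux, uy, 0.7))  # approximately (0.2, -0.3)
```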

2.1 Problem Statement The control inputs u1, u2, u3, u4, ux, uy, uxp, uzp, uθp and the adaptive control laws have to be designed for subsystems of relative degree two, assuming that ft, fr, ftp, and frp are unknown dynamics of the system. Here, a feedback linearization-based controller is designed, and the unknown dynamics of the quadrotor model are estimated using an adaptive Chebyshev neural network (ChNN). The error dynamics for which the controller is designed are discussed below.

3 Design of ChNN-Based Control Algorithm A general MIMO system is given as follows:

$$
\dot{x} = f(x) + g(x)u \tag{5}
$$

$$
y = cx \tag{6}
$$

where $x \in \mathbb{R}^n$, $f(x) \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, $g(x) \in \mathbb{R}^{n \times m}$, $y \in \mathbb{R}^p$, and $c \in \mathbb{R}^{p \times n}$. Each subsystem of the quadrotor dynamics (1) is a single-input single-output nonlinear system. Consider the altitude dynamics and take $z = z_1$ and $\dot{z} = z_2$; the altitude dynamics then become

$$
\dot{z}_1 = z_2 \tag{7}
$$

$$
\dot{z}_2 = f(z) + g(z)u_1 \tag{8}
$$

$$
y = z_1 \tag{9}
$$

The control objective is to design a control input u1 such that the quadrotor follows the desired trajectory zd(t) while the states stay bounded [16–19]. Using the feedback linearization technique, the control law is

$$
u_1 = \left(g(z)\right)^{-1}\left[-f(z) + P_{dd}E_z + \rho\dot{e}_z + \dot{z}_{2d}\right] \tag{10}
$$

where $e_z = z_{1d} - z_1$ is the tracking error (the sign convention under which Eqs. (14)–(17) below hold), $z = (z_1, z_2)$, $E_z = \rho e_z + \dot{e}_z$ is a new variable, and the design parameters $P_{dd}$ and $\rho$ are chosen as positive constants. To approximate the unknown function f(z), a Chebyshev neural network is adopted. The ChNN consists of an input layer, a Chebyshev polynomial-based functional expansion block, tunable network weights, a single output node, and an adaptive law that tunes the weights to minimize the estimation error. The model therefore has two parts. The first is a numerical transformation: the input pattern is expanded by a functional expansion (FE), and the enhanced pattern is multiplied by the weight vector to obtain the network output. The second is the learning algorithm: an adaptive law updates the weights to minimize the estimation error. The first two Chebyshev polynomials are $P_0(e) = 1$ and $P_1(e) = e$, and the higher-order polynomials $P_2(e), P_3(e), \ldots, P_N(e)$ are generated by the recursion [20, 21] $P_{N+1}(e) = 2eP_N(e) - P_{N-1}(e)$, where $P_N(e)$ is the polynomial of order N. The argument satisfies $-1 < e < 1$, and the input pattern is n-dimensional, $e = (e_1, e_2, \ldots, e_n)^T \in \mathbb{R}^n$. The enhanced pattern is then

$$
\beta(e) = \left[P_0(e_1), P_1(e_1), \ldots, P_m(e_1), \ldots, P_0(e_n), P_1(e_n), \ldots, P_m(e_n)\right]^T \tag{11}
$$

Hence, the ChNN output is $O_{ch} = W_{mn}^T\beta(e)$, where $\beta(e)$ is the Chebyshev polynomial basis function, m is the order of the Chebyshev polynomials, n is the number of inputs in the input layer, and $W_{mn}$ is the weight vector of the network. Figure 1 shows the structure of the ChNN model.
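The functional expansion of Eq. (11) and the single-layer output $O_{ch} = W^T\beta(e)$ can be sketched in a few lines of Python. This is an illustration, not the authors' implementation: inputs are assumed to be already scaled into (−1, 1), and the helper names are hypothetical. With two inputs and polynomial order 3 it produces eight features, matching the eight weights W11, ..., W42 used in this section.

```python
def chebyshev_expand(e, order):
    """Functional expansion beta(e) of Eq. (11) for inputs with |e_i| < 1.

    Each component gets P_0..P_order from the recursion
    P_{k+1}(x) = 2 x P_k(x) - P_{k-1}(x), with P_0 = 1 and P_1 = x;
    the per-input blocks are concatenated in input order.
    """
    beta = []
    for ei in e:
        p = [1.0, ei]
        for _ in range(order - 1):
            p.append(2.0 * ei * p[-1] - p[-2])
        beta.extend(p[: order + 1])  # slicing also handles order 0
    return beta


def chnn_output(weights, e, order):
    """Single-layer ChNN output O = W^T beta(e)."""
    beta = chebyshev_expand(e, order)
    if len(weights) != len(beta):
        raise ValueError("need exactly one weight per expanded feature")
    return sum(w * b for w, b in zip(weights, beta))


print(chebyshev_expand([0.5], 3))  # [1.0, 0.5, -0.5, -1.0]
```

Because the expansion is fixed and only the output weights are tuned, the network is linear in its parameters, which is what makes the simple Lyapunov-based update law derived below possible.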


Fig. 1 Model of single layer two-input Chebyshev neural networks

Based on the ChNN model, the unknown function f(z) can be approximated as $f(z) = W_{mn}^T\beta(e_{in}) + \varepsilon$, where $e_{in} = [e, \dot{e}]^T$ is the input vector, $W_{mn} = [W_{11}, W_{21}, W_{31}, W_{41}, W_{12}, W_{22}, W_{32}, W_{42}]^T$ is the weight vector, and $\varepsilon$ is the approximation error. The estimate of f(z) is $\hat{f}(z) = \hat{W}_{mn}^T\beta(e_{in})$, where $\hat{W}_{mn}$ is the estimate of $W_{mn}$. Replacing f(z) with $\hat{f}(z)$ in Eq. (10), the control law becomes

$$
u_1 = \left(g(z)\right)^{-1}\left[-\hat{f}(z) + P_{dd}E_z + \rho\dot{e}_z + \dot{z}_{2d}\right] \tag{12}
$$

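Assuming ε = 0 and substituting the control law u1 into Eq. (8) gives

$$
\begin{aligned}
\dot{z}_2 &= f(z) + g(z)\left(g(z)\right)^{-1}\left[-\hat{f}(z) + P_{dd}E_z + \rho\dot{e}_z + \dot{z}_{2d}\right]\\
&= W_{mn}^T\beta - \hat{W}_{mn}^T\beta + P_{dd}E_z + \rho\dot{e}_z + \dot{z}_{2d}
\end{aligned}
\tag{13}
$$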

Defining $\tilde{W}^T = W_{mn}^T - \hat{W}_{mn}^T$, the equation becomes

$$
\dot{z}_{2d} - \dot{z}_2 = -\tilde{W}^T\beta - P_{dd}E_z - \rho\dot{e}_z \tag{14}
$$

$$
\ddot{e}_z = -\tilde{W}^T\beta - P_{dd}E_z - \rho\dot{e}_z \tag{15}
$$

Differentiating $E_z = \rho e_z + \dot{e}_z$ gives $\dot{E}_z = \rho\dot{e}_z + \ddot{e}_z$. Substituting $\ddot{e}_z = \dot{E}_z - \rho\dot{e}_z$ into Eq. (15) yields

$$
\dot{E}_z - \rho\dot{e}_z = -\tilde{W}^T\beta - P_{dd}E_z - \rho\dot{e}_z \tag{16}
$$

$$
\dot{E}_z = -\tilde{W}^T\beta - P_{dd}E_z \tag{17}
$$


These are the linear closed-loop error dynamics obtained by substituting the control law u1 into Eq. (8). The weight update law is obtained from Lyapunov theory. Choose the Lyapunov function candidate $V_z$, in which D is a positive definite matrix:

$$
V_z = \frac{1}{2}E_z^2 + \frac{1}{2}\tilde{W}^T D^{-1}\tilde{W} \tag{18}
$$

$$
\dot{V}_z = E_z\dot{E}_z + \tilde{W}^T D^{-1}\dot{\tilde{W}} \tag{19}
$$

Substituting $\dot{E}_z$ from Eq. (17) into (19),

$$
\dot{V}_z = E_z\left(-\tilde{W}^T\beta - P_{dd}E_z\right) + \tilde{W}^T D^{-1}\dot{\tilde{W}} \tag{20}
$$

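Since the ideal weights are constant, $\dot{\tilde{W}} = -\dot{\hat{W}}_{mn}$, so

$$
\dot{V}_z = -P_{dd}E_z^2 - \tilde{W}^T\left(\beta E_z + D^{-1}\dot{\hat{W}}_{mn}\right) \tag{21}
$$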

If the second term in Eq. (21) is set to zero, then $\dot{V}_z = -P_{dd}E_z^2$, which satisfies the condition for stability in the sense of Lyapunov ($V_z > 0$ and $\dot{V}_z \le 0$). The update law then follows from Eq. (21):

$$
\beta E_z + D^{-1}\dot{\hat{W}}_{mn} = 0 \tag{22}
$$

$$
\dot{\hat{W}}_{mn} = -D\beta E_z \tag{23}
$$

Finally, the altitude control law becomes

$$
u_1 = \frac{m}{\cos\phi\cos\theta}\left[-\hat{W}_{mn}^T\beta(e_{in}) + P_{dd}E_z + \rho\dot{e}_z + \ddot{z}_d\right], \qquad \dot{\hat{W}}_{mn} = -D\beta E_z \tag{24}
$$

where $\ddot{z}_d = \dot{z}_{2d}$. The same procedure gives the control laws for the remaining control inputs (roll, pitch, and yaw), including the two virtual controls for the x and y positions:

$$
u_x = \frac{m}{u_1}\left[-\hat{W}_{mnx}^T\beta(e_{inx}) + P_{ddx}E_x + \rho_x\dot{e}_x + \ddot{x}_d\right], \qquad \dot{\hat{W}}_{mnx} = -D_x\beta E_x \tag{25}
$$

$$
u_y = \frac{m}{u_1}\left[-\hat{W}_{mny}^T\beta(e_{iny}) + P_{ddy}E_y + \rho_y\dot{e}_y + \ddot{y}_d\right], \qquad \dot{\hat{W}}_{mny} = -D_y\beta E_y \tag{26}
$$


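$$
u_2 = \frac{J_x}{d}\left[-\hat{W}_{mn\phi}^T\beta(e_{in\phi}) + P_{dd\phi}E_\phi + \rho_\phi\dot{e}_\phi + \ddot{\phi}_d\right], \qquad \dot{\hat{W}}_{mn\phi} = -D_\phi\beta E_\phi \tag{27}
$$

$$
u_3 = \frac{J_y}{d}\left[-\hat{W}_{mn\theta}^T\beta(e_{in\theta}) + P_{dd\theta}E_\theta + \rho_\theta\dot{e}_\theta + \ddot{\theta}_d\right], \qquad \dot{\hat{W}}_{mn\theta} = -D_\theta\beta E_\theta \tag{28}
$$

$$
u_4 = \frac{J_z}{d}\left[-\hat{W}_{mn\psi}^T\beta(e_{in\psi}) + P_{dd\psi}E_\psi + \rho_\psi\dot{e}_\psi + \ddot{\psi}_d\right], \qquad \dot{\hat{W}}_{mn\psi} = -D_\psi\beta E_\psi \tag{29}
$$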

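For the perching states, the control laws are

$$
u_{\theta p} = \frac{J_y}{d}\left[-\hat{W}_{mn\theta p}^T\beta(e_{in\theta p}) + P_{dd\theta p}E_{\theta p} + \rho_{\theta p}\dot{e}_{\theta p} + \ddot{\theta}_{pd}\right], \qquad \dot{\hat{W}}_{mn\theta p} = -D_{\theta p}\beta E_{\theta p} \tag{30}
$$

$$
u_{zp} = \frac{m}{\cos\theta_p}\left[-\hat{W}_{mnzp}^T\beta(e_{inzp}) + P_{ddzp}E_{zp} + \rho_{zp}\dot{e}_{zp} + \ddot{z}_{pd}\right], \qquad \dot{\hat{W}}_{mnzp} = -D_{zp}\beta E_{zp} \tag{31}
$$

$$
u_{xp} = \frac{m}{\sin\theta_p}\left[-\hat{W}_{mnxp}^T\beta(e_{inxp}) + P_{ddxp}E_{xp} + \rho_{xp}\dot{e}_{xp} + \ddot{x}_{pd}\right], \qquad \dot{\hat{W}}_{mnxp} = -D_{xp}\beta E_{xp} \tag{32}
$$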

Hence, the proposed control laws stabilize the quadrotor dynamics in the sense of Lyapunov, with the network weights adapted by the derived update laws.

4 Experimental Results The parameters of the Crazyflie nano-quadrotor are: arm length l = 0.010 m, mass m = 0.040 kg, and moments of inertia Jx = 1.112951 × 10⁻⁵, Jy = 1.114361 × 10⁻⁵, and Jz = 2.162056 × 10⁻⁵ kg·m². The quadrotor is equipped with an optical flow sensor for measuring motion relative to the ground and a time-of-flight (ToF) laser sensor for measuring the distance from the wall. The perching experiments are conducted using ROS running on Linux. The nano-quadrotor starts at [x = 0, y = 0, z = 0] in an outdoor environment, where x, y, z are in metres, with initial Euler angles [φ = 0, θ = 0, ψ = 0].
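As a quick worked example, combining the mass m = 0.040 kg quoted above with the rotor-force mapping u1 = F1 + F2 + F3 + F4, u2 = −F1 − F2 + F3 + F4, u3 = −F1 + F2 + F3 − F4, u4 = −F1 + F2 − F3 + F4 from Sect. 2 gives the force each rotor must produce at hover. The rows of that mapping are mutually orthogonal with squared norm 4, so its inverse is simply the transpose divided by 4. The sketch below assumes g = 9.81 m/s² and ignores drag; the function names are hypothetical.

```python
# Coefficients of (F1, F2, F3, F4) in (u1, u2, u3, u4), as defined in Sect. 2.
MIXER = [
    [1.0, 1.0, 1.0, 1.0],
    [-1.0, -1.0, 1.0, 1.0],
    [-1.0, 1.0, 1.0, -1.0],
    [-1.0, 1.0, -1.0, 1.0],
]


def inputs_from_forces(forces):
    """(F1..F4) -> (u1..u4) using the linear mapping above."""
    return [sum(c * f for c, f in zip(row, forces)) for row in MIXER]


def forces_from_inputs(inputs):
    """(u1..u4) -> (F1..F4); MIXER times its transpose is 4 I, so the inverse is MIXER^T / 4."""
    return [sum(MIXER[i][j] * inputs[i] for i in range(4)) / 4.0 for j in range(4)]


m, g0 = 0.040, 9.81  # Crazyflie mass from Sect. 4; g0 assumed
hover_forces = forces_from_inputs([m * g0, 0.0, 0.0, 0.0])
print(hover_forces)  # each entry is m*g0/4, about 0.0981 N
```

At hover (u1 = mg, u2 = u3 = u4 = 0) each rotor therefore carries mg/4 ≈ 0.098 N, roughly the weight of 10 g.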


Fig. 2 Experimental working environment

In Fig. 2, the experimental set-up shows the nano-quadrotor, a host computer for communicating with the system, and the target vertical wall. The host computer receives the flight data needed for orientation estimation and sends a few emergency commands. A communication radio module (nRF24LU1) is on board the quadrotor. Two test cases are performed on the experimental set-up. Case 1: Perching in a fixed position. The target position is [x = 1, y = 0.5, z = 1.5] and the desired pitch angle is θd = 1 rad. During perching, the states of the quadrotor change from 3-D space to 2-D space. Perching in a fixed position can be used for surveillance on high-rise transparent walls. Figure 3 shows snapshots of fixed-position perching, and Fig. 4 shows the evolution of the states using an event-trigger strategy. Case 2: Perching and pose changing in a fixed position. In this test, the desired position is [x = 1, y = 0.25, z = 1] and the desired pitch angle is θd = 1 rad. Perching and pose changing in a fixed position can be used for capturing images from a high-rise building. Figure 5 shows snapshots, and Fig. 6 shows the evolution of the states.

Fig. 3 Snapshots of perching in fixed position


Fig. 4 Evolution of system states, a x position, b y position, c z position, d pitch, e position force, f pitch torque

Fig. 5 Snapshots of perching and pose changing in fixed position

5 Conclusion This article presents a perching application of a nano-quadrotor using a direct adaptive Chebyshev neural network-based control algorithm in the presence of unmodelled system dynamics. The control laws are designed using the feedback linearization technique, and the update laws for the network weights are derived using Lyapunov theory. Experiments are performed for two different scenarios to validate the proposed neural controller. Future work will address the challenges of climbing and take-off of the quadrotor for surveillance and image-capturing applications.


Fig. 6 Evolution of system states, a x position, b y position, c z position, d pitch, e position force, f pitch torque

References
1. Thomas J, Loianno G, Pope M, Hawkes EW, Estrada MA, Jiang H, Cutkosky MR, Kumar V (2015) Planning and control of aggressive maneuvers for perching on inclined and vertical surfaces. In: Proceedings of the international design engineering technical conferences and computers and information in engineering conference. Volume 5C, 39th Mechanisms and g. ASME
2. Singh P, Gupta S, Behera L, Verma NK, Nahavandi S (2020) Perching of nano-quadrotor using self-trigger finite-time second-order continuous control. IEEE Syst J 1–11
3. Pope MT, Kimes CW, Jiang H, Hawkes EW, Estrada MA, Kerst CF, Roderick WRT, Han AK, Christensen DL, Cutkosky MR (2017) A multimodal robot for perching and climbing on vertical outdoor surfaces. IEEE Trans Robot 33(1):38–48
4. Mohta K, Kumar V, Daniilidis K (2014) Vision-based control of a quadrotor for perching on lines. In: IEEE international conference on robotics and automation (ICRA), pp 3130–3136
5. Tayebi A, McGilvray S (2006) Attitude stabilization of a VTOL quadrotor aircraft. IEEE Trans Control Syst Technol 14(3):562–571
6. Erginer B, Altug E (2007) Modeling and PD control of a quadrotor VTOL vehicle. Proc IEEE Intell Vehicles Symp 894–899
7. Bouabdallah S, Siegwart R (2007) Full control of a quadrotor. Proc IEEE/RSJ Int Intell Robots Syst Conf 153–158
8. Madani T, Benallegue A (2006) Control of a quadrotor mini-helicopter via full state backstepping technique. In: Proceedings of the 45th IEEE conference on decision and control, pp 1515–1520
9. Martins L, Cardeira C, Oliveira P (2021) Feedback linearization with zero dynamics stabilization for quadrotor control. J Intell Robot Syst 101
10. Singh P, Gupta S, Behera L, Verma NK (2021) Sum of square based event-triggered control of nano-quadrotor in presence of packet dropouts. In: International conference on unmanned aircraft systems, pp 767–776
11. Tripathi VK, Behera L, Verma N (2016) Disturbance observer based backstepping controller for a quadcopter. In: Proceedings of 42nd annual conference of the IEEE industrial electronics society, IECON, pp 108–113
12. Dierks T, Jagannathan S (2010) Output feedback control of a quadrotor UAV using neural networks. IEEE Trans Neural Netw 21(1):50–66
13. Abdollahi T, Salehfard S, Xiong C, Ying J (2018) Simplified fuzzy-Padé controller for attitude control of quadrotor helicopters. IET Control Theory Appl 12(2):310–317
14. Yogi SC, Tripathi VK, Behera L (2021) Adaptive integral sliding mode control using fully connected recurrent neural network for position and attitude control of quadrotor. IEEE Trans Neural Netw Learn Syst
15. Kar I, Behera L (2009) Direct adaptive neural control for affine nonlinear systems. Appl Soft Comput 756–764
16. Zhao B, Xian B, Zhang Y, Zhang X (2015) Nonlinear robust adaptive tracking control of a quadrotor UAV via immersion and invariance methodology. IEEE Trans Ind Electron 62(5):2891–2902
17. Sornam M, Vanitha V (2018) Application of Chebyshev neural network for function approximation. Int J Comput Sci Eng 4:201–204
18. Chen F, Jiang R, Zhang K, Jiang B, Tao G (2016) Robust backstepping sliding-mode control and observer-based fault estimation for a quadrotor UAV. IEEE Trans Ind Electron 63(8):5044–5056
19. Pedro JO, Crouse AJ (2015) Direct adaptive neural control of a quadrotor unmanned aerial vehicle. In: 10th Asian control conference (ASCC), pp 1–6
20. Bhat RB, Chakraverty S (2004) Numerical analysis in engineering. Alpha Science Int Ltd., Oxford
21. Mall S, Chakraverty S (2017) Single layer Chebyshev neural network model for solving elliptic partial differential equations. Neural Process Lett 45(3):825–840

Chapter 3

An Intelligent Game Theory Approach for Collision Avoidance of Multi-UAVs
Heera Lal Maurya, Padmini Singh, Subhash Yogi, Laxmidhar Behera, and Nishchal K. Verma

1 Introduction UAV control has been a topic of great interest for the last two decades. UAVs are autonomous vehicles used in a wide range of applications [1, 2], and the quadrotor is one type of UAV. In quadrotor control, four control inputs regulate the four rotors [3]. Three of them control the Euler angles (roll, pitch, and yaw), and the fourth controls the quadrotor's position x, y and altitude z. Quadrotor control uses two loops: the inner loop controls roll, pitch, and yaw, and the outer loop controls the three positions [3]. Trajectory tracking is relatively easy for a single quadrotor, whereas handling multiple UAVs is a considerably harder task [4]. A quadrotor has twelve states: three positions, three velocities, three angles, and three angular velocities. These can be written as six subsystems with second-order dynamics, so the entire state can be controlled by designing a second-order controller for each of the six subsystems. When the dynamics of the model are not entirely known, they can be estimated using a radial basis function neural network (RBFNN). Multi-UAV control is one of the most exciting topics in UAV control, and the number of applications grows with the number of UAVs [5–7]. Group tasks require many UAVs to be controlled simultaneously, and collision avoidance is a very challenging problem for quadrotors. In particular, if two quadrotors are moving towards each other, their speeds must be set such that they reach different set points.

H. L. Maurya (B) · P. Singh · S. Yogi · L. Behera · N. K. Verma
Electrical Engineering, IIT Kanpur, Kanpur, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_3


Game theory is now used extensively in multi-agent systems to reach a common consensus. Several applications require a common consensus point at which all agents perform optimally, for example, collective distributed temperature control of multiple zones [8], traffic management by vehicles, warehouse automation, and transportation [9, 10], and exploration of large areas by quadrotors [11]. The motivation of this work is as follows.

1.1 Motivation
1. Reaching a target value with a quadrotor has several applications, such as monitoring an area from a particular point or destroying an enemy's target point.
2. Different vehicles may be used for one particular task, and hence different control strategies are used.
3. A common consensus among different vehicles can be achieved under suitable optimality conditions.
4. Optimality conditions can be achieved with different equilibrium points so that the task is executed safely without any hindrance.

In this work, two different quadrotor models are considered, and their states evolve differently on the way to the set point. The evolution of the states must be controlled so that different equilibrium points are reached without collision. A pay-off function is designed to achieve the optimality conditions using optimal controllers from game theory. The optimal controller for each model drives its quadrotor to a particular position to avoid collision. The main contributions of the proposed work are as follows.

1.2 Contribution
1. Two different quadrotor models are considered so that the system states evolve separately.
2. The quadrotor dynamics are unknown and are therefore estimated using an RBFNN.
3. A novel pay-off function is designed to achieve the optimality condition for both models.
4. From the pay-off function, a novel optimal controller is designed such that both quadrotors reach a common consensus value and a Nash equilibrium is achieved.
5. Using the Nash equilibrium, different set points are obtained for the two quadrotors to avoid collision.
6. Simulations are performed to reach the Nash equilibrium and validate the proposed theory.


2 Problem Formulation Figure 1 depicts the overall problem formulation of the proposed work. Two different UAV models are considered. From the dynamics of each model, a set of controllers is designed to satisfy the pay-off function, which is based on the output of each model. A pseudo-gradient function is then designed so that a Nash equilibrium point can be derived for both UAVs such that they do not collide with each other. This technique is feasible for real-time problems. After reaching their set points, the two UAVs can perform several tasks, such as hitting a target or patrolling a particular range using a vision-based sensor. The proposed work is a proof of concept.

3 Quadrotor Model Figure 2 shows a 3-D pictorial view of the quadrotor in the body frame. It has four rotors mounted on four arms, with rotor speeds ω1, ω2, ω3, and ω4. Rotors 1 and 3 rotate anticlockwise, and rotors 2 and 4 rotate clockwise. Each rotor generates an upward force on its arm; in Fig. 2, these forces are F1, F2, F3, and F4. Also in Fig. 2, x, y, and z are the three position states, φ is the roll angle about the x-axis, θ is the pitch angle about the y-axis, and ψ is the yaw angle about the z-axis.

Fig. 1 Overall problem formulation


Fig. 2 3-D pictorial view of quadrotor

3.1 Quadrotor Dynamics For the multi-UAV collision avoidance problem, two different quadrotor models are considered.

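3.1.1 First Quadrotor Model

The first model is considered as [12]:

$$
\begin{aligned}
m\ddot{x} &= f(x) + (\cos\phi\sin\theta\cos\psi + \sin\phi\sin\psi)u_1\\
m\ddot{y} &= f(y) + (\cos\phi\sin\theta\sin\psi - \sin\phi\cos\psi)u_1\\
m\ddot{z} &= f(z) - mg + (\cos\phi\cos\theta)u_1\\
J_1\ddot{\phi} &= f(\phi) + l\tau_1\\
J_2\ddot{\theta} &= f(\theta) + l\tau_2\\
J_3\ddot{\psi} &= f(\psi) + l\tau_3
\end{aligned}
\tag{1}
$$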

where m is the mass of the quadrotor, l is the arm length, and J1, J2, J3 are the moments of inertia. Here, u1 is the force applied for position control, and τ1, τ2, τ3 are the torques applied for roll, pitch, and yaw. These inputs are related to the rotor forces F1, F2, F3, and F4 by u1 = F1 + F2 + F3 + F4, u2 = −F1 − F2 + F3 + F4, u3 = −F1 + F2 + F3 − F4, and u4 = −F1 + F2 − F3 + F4, with u2, u3, u4 playing the role of τ1, τ2, τ3. Since the quadrotor is an underactuated system, virtual control inputs in the x, y, and z directions can be defined as

$$
u_x = \frac{u_1}{m}(\cos\phi\sin\theta\cos\psi + \sin\phi\sin\psi), \quad
u_y = \frac{u_1}{m}(\cos\phi\sin\theta\sin\psi - \sin\phi\cos\psi), \quad
u_z = \frac{u_1}{m}\cos\phi\cos\theta - g \tag{2}
$$

Hence, because the three trigonometric factors in (2) form a unit vector,

$$
u_1 = m\sqrt{(u_x)^2 + (u_y)^2 + (u_z + g)^2} \tag{3}
$$

These virtual control inputs reduce the position dynamics to subsystems of relative degree two with three control inputs ux, uy, and uz. The desired roll φd and pitch θd are generated from the virtual control inputs ux, uy, and uz and the desired yaw angle ψd as

$$
\phi_d = \sin^{-1}\left(\frac{m}{u_1}\left(u_x\sin\psi_d - u_y\cos\psi_d\right)\right) \tag{4}
$$

$$
\theta_d = \tan^{-1}\left(\frac{u_x\cos\psi_d + u_y\sin\psi_d}{u_z + g}\right) \tag{5}
$$

3.1.2 Subsystems for First Quadrotor Model

Let x = x1 and ẋ = x2; the quadrotor dynamics in the x-direction are then

$$
\dot{x}_1 = x_2, \qquad \dot{x}_2 = f(x) + u_x \tag{6}
$$

Equation (6) is a second-order dynamical system. Likewise, with y = y1, ẏ = y2, z = z1, and ż = z2, the dynamics for the y and z positions are

$$
\dot{y}_1 = y_2, \qquad \dot{y}_2 = f(y) + u_y \tag{7}
$$

$$
\dot{z}_1 = z_2, \qquad \dot{z}_2 = f(z) + u_z \tag{8}
$$

Second-order dynamics for the Euler angles can be written in the same way.
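Equations (2)–(5) for the first model can be checked numerically: mapping a thrust and an attitude to (ux, uy, uz) with Eq. (2) and then applying Eqs. (3)–(5) should return the same thrust and the same roll and pitch when ψd equals the actual yaw. The sketch below is a hypothetical helper, not part of the original work; it assumes |φ|, |θ| < π/2 so that uz + g > 0 and the inverse trigonometric functions are single-valued, and the mass used in the check is arbitrary.

```python
import math


def virtual_from_attitude(u1, phi, theta, psi, m, g0=9.81):
    """Eq. (2): total thrust and attitude -> virtual inputs (ux, uy, uz)."""
    k = u1 / m
    ux = k * (math.cos(phi) * math.sin(theta) * math.cos(psi) + math.sin(phi) * math.sin(psi))
    uy = k * (math.cos(phi) * math.sin(theta) * math.sin(psi) - math.sin(phi) * math.cos(psi))
    uz = k * math.cos(phi) * math.cos(theta) - g0
    return ux, uy, uz


def attitude_from_virtual(ux, uy, uz, psi_d, m, g0=9.81):
    """Eqs. (3)-(5): virtual inputs and desired yaw -> (u1, phi_d, theta_d).

    Assumes uz + g0 > 0, i.e. the thrust has a positive vertical component.
    """
    u1 = m * math.sqrt(ux ** 2 + uy ** 2 + (uz + g0) ** 2)
    phi_d = math.asin(m / u1 * (ux * math.sin(psi_d) - uy * math.cos(psi_d)))
    theta_d = math.atan((ux * math.cos(psi_d) + uy * math.sin(psi_d)) / (uz + g0))
    return u1, phi_d, theta_d


m = 0.5  # arbitrary mass for the round-trip check
ux, uy, uz = virtual_from_attitude(6.0, 0.1, -0.2, 0.3, m)
print(attitude_from_virtual(ux, uy, uz, 0.3, m))  # approximately (6.0, 0.1, -0.2)
```

The same check applies unchanged to Eqs. (10)–(13) of the second model, since its virtual inputs are defined identically.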

3.1.3 Second Quadrotor Model

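The second model is considered as [13]:

$$
\begin{aligned}
\ddot{x} &= \frac{u_{1a}}{m}(\cos\phi\sin\theta\cos\psi + \sin\phi\sin\psi)\\
\ddot{y} &= \frac{u_{1a}}{m}(\cos\phi\sin\theta\sin\psi - \sin\phi\cos\psi)\\
\ddot{z} &= \frac{u_{1a}}{m}\cos\phi\cos\theta - g\\
\ddot{\phi} &= \dot{\theta}\dot{\psi}\,\frac{J_2 - J_3}{J_1} + \frac{\tau_{1a}}{J_1}\\
\ddot{\theta} &= \dot{\phi}\dot{\psi}\,\frac{J_3 - J_1}{J_2} + \frac{\tau_{2a}}{J_2}\\
\ddot{\psi} &= \dot{\phi}\dot{\theta}\,\frac{J_1 - J_2}{J_3} + \frac{\tau_{3a}}{J_3}
\end{aligned}
\tag{9}
$$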

Since the quadrotor is an underactuated system, virtual control inputs in the x-, y-, and z-directions can again be defined as

$$
u_{xa} = \frac{u_{1a}}{m}(\cos\phi\sin\theta\cos\psi + \sin\phi\sin\psi), \quad
u_{ya} = \frac{u_{1a}}{m}(\cos\phi\sin\theta\sin\psi - \sin\phi\cos\psi), \quad
u_{za} = \frac{u_{1a}}{m}\cos\phi\cos\theta - g \tag{10}
$$

Hence,

$$
u_{1a} = m\sqrt{(u_{xa})^2 + (u_{ya})^2 + (u_{za} + g)^2} \tag{11}
$$

These virtual control inputs reduce the position dynamics to subsystems of relative degree two with three control inputs $u_{xa}$, $u_{ya}$ and $u_{za}$. Again, the desired roll and pitch are designed as

$$ \phi_d = \sin^{-1}\left(\frac{m}{u_{1a}}\left(u_{xa}\sin\psi_d - u_{ya}\cos\psi_d\right)\right) \tag{12} $$

$$ \theta_d = \tan^{-1}\left(\frac{u_{xa}\cos\psi_d + u_{ya}\sin\psi_d}{u_{za} + g}\right) \tag{13} $$

3.1.4 Subsystems for Second Quadrotor Model

Let $x = x_{1a}$ and $\dot x = x_{2a}$; then the quadrotor dynamics in the x-direction are

$$ \dot x_{1a} = x_{2a}, \qquad \dot x_{2a} = u_{xa} \tag{14} $$

Likewise, with $y = y_{1a}$, $\dot y = y_{2a}$, $z = z_{1a}$ and $\dot z = z_{2a}$, the quadrotor dynamics for y and z can be obtained. For the Euler angles, with $\phi = \phi_{1a}$, $\dot\phi = \phi_{2a}$, $\theta = \theta_{1a}$, $\dot\theta = \theta_{2a}$, $\psi = \psi_{1a}$ and $\dot\psi = \psi_{2a}$, the dynamics can be written as:

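$$ \dot\phi_{1a} = \phi_{2a}, \qquad \dot\phi_{2a} = \theta_{2a}\psi_{2a}\,\frac{J_2 - J_3}{J_1} + \frac{\tau_{1a}}{J_1} \tag{15} $$

$$ \dot\theta_{1a} = \theta_{2a}, \qquad \dot\theta_{2a} = \phi_{2a}\psi_{2a}\,\frac{J_3 - J_1}{J_2} + \frac{\tau_{2a}}{J_2} \tag{16} $$

$$ \dot\psi_{1a} = \psi_{2a}, \qquad \dot\psi_{2a} = \phi_{2a}\theta_{2a}\,\frac{J_1 - J_2}{J_3} + \frac{\tau_{3a}}{J_3} \tag{17} $$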

For each of the two quadrotor models, six control inputs therefore need to be designed so that the Nash equilibrium can be reached.

4 Game Theory-Based Nash Equilibrium

4.1 Nash Equilibrium Points for x Position

Let us first take the x position dynamics of both agents.

1. First quadrotor x position subsystem:

$$ \dot x_1 = x_2, \qquad \dot x_2 = f(x) + u_x \tag{18} $$

2. Second quadrotor x position subsystem:

$$ \dot x_{1a} = x_{2a}, \qquad \dot x_{2a} = u_{xa} \tag{19} $$

3. Let the output of the first quadrotor be $x_1$ and the output of the second quadrotor be $x_{1a}$.

4. The optimal controller is designed based on the pay-off functions and is of the form

$$ u_x = -a_1 x_2 - b_1 G(x_1) - \hat f(x), \qquad u_{xa} = -a_{1a} x_{2a} - b_{1a} G(x_{1a}) \tag{20} $$

where $a_{(\cdot)}, b_{(\cdot)} > 0$ and $G(\cdot)$ is a cocoercive, Lipschitz-continuous function. Here, $\hat f(x)$ is estimated using a radial basis function neural network (RBFNN), and it is assumed that under ideal conditions $f(x) = w\Phi(x)$, where $w$ is the ideal weight and $\Phi(x)$ is the radial basis function. Therefore, the estimated function is $\hat f(x) = \hat w\Phi(x)$, and the difference between the ideal and estimated weights is denoted by $\tilde w = w - \hat w$.

Remark The procedure for designing the optimal controller is given in [14].

Steps for Nash Equilibrium

Step 1: First, design a pay-off function for each subsystem such that it is continuously differentiable and concave.

1. Pay-off function for the first quadrotor:

$$ P_x = -\begin{bmatrix} x_1 & x_{1a} \end{bmatrix}\begin{bmatrix} k_a & k_b \\ k_c & k_d \end{bmatrix}\begin{bmatrix} x_1 \\ x_{1a} \end{bmatrix} + \begin{bmatrix} x_1 & x_{1a} \end{bmatrix}\begin{bmatrix} k_e \\ k_f \end{bmatrix} \tag{21} $$

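2. Pay-off function for the second quadrotor:

$$ P_{xa} = -\begin{bmatrix} x_1 & x_{1a} \end{bmatrix}\begin{bmatrix} k_g & k_h \\ k_i & k_j \end{bmatrix}\begin{bmatrix} x_1 \\ x_{1a} \end{bmatrix} + \begin{bmatrix} x_1 & x_{1a} \end{bmatrix}\begin{bmatrix} k_k \\ k_l \end{bmatrix} \tag{22} $$

where the $k_{(\cdot)}$ are real numbers.

Step 2: A pseudo-gradient function $F(x)$ is then constructed from the pay-off functions:

$$ F(x) = \begin{bmatrix} \dfrac{\partial P_x}{\partial x_1} \\[2mm] \dfrac{\partial P_{xa}}{\partial x_{1a}} \end{bmatrix} \tag{23} $$

Step 3: Solving $F(x) = 0$ together with Eqs. (18) and (20) gives the Nash equilibrium points. The same procedure is followed to find the Nash equilibrium points for the y and z positions and for the roll, pitch and yaw Euler angles.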
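For the quadratic pay-offs (21)–(22), Steps 2 and 3 reduce to linear algebra: $\nabla_z P_i = -(K_i + K_i^\top)z + k_i$, and the pseudo-gradient keeps row $i$ of player $i$'s gradient. The sketch below is a minimal illustration under that assumption (two scalar players, $z = [z_1, z_2]$); the function names are hypothetical and `fractions.Fraction` is used for exact arithmetic. The usage example takes the y-position pay-offs of Case 1, Eqs. (36)–(37).

```python
from fractions import Fraction as Fr

def pseudo_gradient(K1, k1, K2, k2):
    """Affine pseudo-gradient F(z) = c - A z for the quadratic pay-offs
    P_i(z) = -z^T K_i z + z^T k_i, z = [z_1, z_2], where player i controls z_i.
    Row i of F is dP_i/dz_i = k_i[i] - sum_j (K_i[i][j] + K_i[j][i]) z_j."""
    A, c = [], []
    for i, (K, k) in enumerate(((K1, k1), (K2, k2))):
        A.append([K[i][j] + K[j][i] for j in range(2)])
        c.append(k[i])
    return A, c

def nash_equilibrium(A, c):
    """Solve F(z) = 0, i.e. A z = c, with Cramer's rule (2 x 2 case)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("singular pseudo-gradient: no unique equilibrium")
    z1 = (c[0] * A[1][1] - A[0][1] * c[1]) / det
    z2 = (A[0][0] * c[1] - A[1][0] * c[0]) / det
    return z1, z2

half = Fr(1, 2)
# y-position pay-offs of Case 1, Eqs. (36)-(37)
K_y, k_y = [[1, -half], [-half, Fr(3, 2)]], [4, 3]
K_ya, k_ya = [[4, -half], [-half, 1]], [2, -5]

A, c = pseudo_gradient(K_y, k_y, K_ya, k_ya)   # F(y) = c - A y, matching Eq. (38)
print(nash_equilibrium(A, c))                   # (Fraction(1, 1), Fraction(-2, 1))
```

The result $y_1 = 1$, $y_{1a} = -2$ agrees with the equilibrium reported for the y position. A nonzero determinant is what guarantees a unique equilibrium; the check raises an error otherwise rather than returning a meaningless point.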

5 Stability Analysis

Stability analysis is given for the designed intelligent controller with unknown dynamics of the first quadrotor.

Theorem 1 The control laws (20) make the quadrotor systems stable if the weight update law for estimating $\hat f(x)$ is chosen as $\dot{\hat w} = \beta x_2\Phi(x)$ and the gains satisfy

$$ b_1 = \frac{\alpha_1 x_1}{G(x_1)} \tag{24} $$

$$ b_{1a} = \frac{\alpha_{1a} x_{1a}}{G(x_{1a})} \tag{25} $$

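Proof Let us take the Lyapunov function

$$ V = \frac{\alpha_1}{2}x_1^2 + \frac{1}{2}x_2^2 + \frac{1}{2\beta}\tilde w^2 + \frac{\alpha_{1a}}{2}x_{1a}^2 + \frac{1}{2}x_{2a}^2 \tag{26} $$

Taking the derivative of (26), and noting that $\dot{\tilde w} = -\dot{\hat w}$ because the ideal weight $w$ is constant,

$$ \dot V = \alpha_1 x_1\dot x_1 + x_2\dot x_2 - \frac{\tilde w\dot{\hat w}}{\beta} + \alpha_{1a}x_{1a}\dot x_{1a} + x_{2a}\dot x_{2a} \tag{27} $$

Substituting $\dot x_1$, $\dot x_2$, $\dot x_{1a}$ and $\dot x_{2a}$ from (18) and (19) into (27),

$$ \dot V = \alpha_1 x_1 x_2 + x_2\left(f(x) + u_x\right) - \frac{\tilde w\dot{\hat w}}{\beta} + \alpha_{1a}x_{1a}x_{2a} + x_{2a}u_{xa} \tag{28} $$

Substituting $u_x$ and $u_{xa}$ from (20) into (28),

$$ \dot V = \alpha_1 x_1 x_2 + x_2\left(f(x) - a_1 x_2 - b_1 G(x_1) - \hat f(x)\right) - \frac{\tilde w\dot{\hat w}}{\beta} + \alpha_{1a}x_{1a}x_{2a} + x_{2a}\left(-a_{1a}x_{2a} - b_{1a}G(x_{1a})\right) \tag{29} $$

Substituting $b_1$ and $b_{1a}$ from (24) and (25) into (29), so that $b_1 G(x_1) = \alpha_1 x_1$ and $b_{1a}G(x_{1a}) = \alpha_{1a}x_{1a}$ cancel the cross terms, and putting $f(x) = w\Phi(x)$ and $\hat f(x) = \hat w\Phi(x)$,

$$ \dot V = x_2\left(w\Phi(x) - a_1 x_2 - \hat w\Phi(x)\right) - \frac{\tilde w\dot{\hat w}}{\beta} + x_{2a}\left(-a_{1a}x_{2a}\right) \tag{30} $$

Further simplifying with $\tilde w = w - \hat w$,

$$ \dot V = -a_1 x_2^2 + x_2\tilde w\Phi(x) - \frac{\tilde w\dot{\hat w}}{\beta} - a_{1a}x_{2a}^2 \tag{31} $$

Now, if the weight update law is $\dot{\hat w} = \beta x_2\Phi(x)$, then (31) becomes

$$ \dot V = -a_1 x_2^2 - a_{1a}x_{2a}^2 \tag{32} $$

which is negative semidefinite. Hence, the system remains stable.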
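The cancellations in (29)–(32) can be checked numerically. The sketch below is a hypothetical check, not the authors' code: it assumes a single RBF neuron so that $\Phi(x)$ is a scalar, picks $G(\cdot) = \tanh$ (1-Lipschitz and cocoercive, as the theorem requires), chooses $b_1$, $b_{1a}$ from (24)–(25) and the update law $\dot{\hat w} = \beta x_2\Phi(x)$, evaluates $\dot V$ from (27) at random states and compares it with (32). All numeric parameter values are illustrative.

```python
import math
import random

G = math.tanh  # hypothetical cocoercive, Lipschitz choice for G(.)

def v_dot(state, p):
    """V-dot of Eq. (27) along (18)-(20) with b1, b1a from (24)-(25),
    f(x) = w*Phi, f_hat(x) = w_hat*Phi and the update law w_hat_dot = beta*x2*Phi.
    Phi is a scalar here (single RBF neuron, an assumption)."""
    x1, x2, x1a, x2a, w_hat, phi = state
    b1 = p["alpha1"] * x1 / G(x1)                       # Eq. (24)
    b1a = p["alpha1a"] * x1a / G(x1a)                   # Eq. (25)
    u_x = -p["a1"] * x2 - b1 * G(x1) - w_hat * phi      # Eq. (20)
    u_xa = -p["a1a"] * x2a - b1a * G(x1a)               # Eq. (20)
    w_tilde = p["w"] - w_hat
    w_hat_dot = p["beta"] * x2 * phi                    # weight update law
    x2_dot = p["w"] * phi + u_x                         # Eq. (18)
    x2a_dot = u_xa                                      # Eq. (19)
    return (p["alpha1"] * x1 * x2 + x2 * x2_dot - w_tilde * w_hat_dot / p["beta"]
            + p["alpha1a"] * x1a * x2a + x2a * x2a_dot)

params = {"a1": 1.5, "a1a": 0.8, "alpha1": 2.0, "alpha1a": 1.2, "beta": 0.5, "w": 0.7}
rng = random.Random(0)
worst = 0.0
for _ in range(1000):
    # x1 and x1a are kept away from 0 so that (24)-(25) are defined
    s = (rng.uniform(0.1, 2.0), rng.uniform(-2, 2), rng.uniform(0.1, 2.0),
         rng.uniform(-2, 2), rng.uniform(-2, 2), rng.uniform(0.0, 1.0))
    target = -params["a1"] * s[1] ** 2 - params["a1a"] * s[3] ** 2   # Eq. (32)
    worst = max(worst, abs(v_dot(s, params) - target))
print(f"largest deviation from Eq. (32): {worst:.1e}")
```

The deviation stays at floating-point rounding level, confirming that only $-a_1x_2^2 - a_{1a}x_{2a}^2$ survives once (24)–(25) and the update law are used.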

6 Simulation Results

The simulation parameters are: arm length $l = 0.05$ m, mass $m = 0.07$ kg and damping constants $B_i = 0.07$ Ns/m for $i = 1, \ldots, 6$. The moments of inertia about the x, y and z axes are $J_x = 0.05$ kg·m², $J_y = 0.05$ kg·m² and $J_z = 0.01$ kg·m², respectively. The simulation results demonstrate the validity of the proposed approach. In the simulations, the equilibrium points of the two quadrotors are obtained as follows.

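6.1 Case 1

Case 1 shows how the equilibrium point is obtained from the pay-off functions.

6.1.1 For x Position

1. Pay-off function for the first quadrotor:

$$ P_x = -\begin{bmatrix} x_1 & x_{1a} \end{bmatrix}\begin{bmatrix} 2 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{3}{2} \end{bmatrix}\begin{bmatrix} x_1 \\ x_{1a} \end{bmatrix} + \begin{bmatrix} x_1 & x_{1a} \end{bmatrix}\begin{bmatrix} 3 \\ -2 \end{bmatrix} \tag{33} $$

2. Pay-off function for the second quadrotor:

$$ P_{xa} = -\begin{bmatrix} x_1 & x_{1a} \end{bmatrix}\begin{bmatrix} 2 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_{1a} \end{bmatrix} + \begin{bmatrix} x_1 & x_{1a} \end{bmatrix}\begin{bmatrix} 4 \\ -5 \end{bmatrix} \tag{34} $$

3. The pseudo-gradient function $F(x)$ for the x position is

$$ F(x) = \begin{bmatrix} -2x_1 + x_{1a} + 4 \\ x_1 - 4x_{1a} - 5 \end{bmatrix} \tag{35} $$

The Nash equilibrium points are $x_1 = \tfrac{11}{7}$, $x_2 = 0$, $x_{1a} = -\tfrac{6}{7}$ and $x_{2a} = 0$. Likewise, the Nash equilibrium points for the remaining states can be obtained with different combinations of pay-off functions.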
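Setting both rows of (35) to zero and substituting the first into the second gives these values:

$$
\begin{aligned}
-2x_1 + x_{1a} + 4 = 0 &\;\Rightarrow\; x_{1a} = 2x_1 - 4,\\
x_1 - 4(2x_1 - 4) - 5 = -7x_1 + 11 = 0 &\;\Rightarrow\; x_1 = \tfrac{11}{7}, \quad x_{1a} = 2\cdot\tfrac{11}{7} - 4 = -\tfrac{6}{7}.
\end{aligned}
$$

The velocity states vanish because $\dot x_1 = x_2$ and $\dot x_{1a} = x_{2a}$ must be zero at an equilibrium, so $x_2 = x_{2a} = 0$.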

6.1.2 For y Position

1. Pay-off function for the first quadrotor:

$$ P_y = -\begin{bmatrix} y_1 & y_{1a} \end{bmatrix}\begin{bmatrix} 1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{3}{2} \end{bmatrix}\begin{bmatrix} y_1 \\ y_{1a} \end{bmatrix} + \begin{bmatrix} y_1 & y_{1a} \end{bmatrix}\begin{bmatrix} 4 \\ 3 \end{bmatrix} \tag{36} $$

2. Pay-off function for the second quadrotor:

$$ P_{ya} = -\begin{bmatrix} y_1 & y_{1a} \end{bmatrix}\begin{bmatrix} 4 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ y_{1a} \end{bmatrix} + \begin{bmatrix} y_1 & y_{1a} \end{bmatrix}\begin{bmatrix} 2 \\ -5 \end{bmatrix} \tag{37} $$

3. The pseudo-gradient function $F(y)$ for the y position is

$$ F(y) = \begin{bmatrix} -2y_1 + y_{1a} + 4 \\ y_1 - 2y_{1a} - 5 \end{bmatrix} \tag{38} $$

[Figure 3 panels: (a) x, y and z position of Quadrotor1 and Quadrotor2 in 3-D; (b), (c) force u1 and torque τ3 versus Time (sec)]

Fig. 3 a 3-D view; b Control input of Quadrotor 1; c Control input of Quadrotor 2

The Nash equilibrium points are $y_1 = 1$, $y_2 = 0$, $y_{1a} = -2$ and $y_{2a} = 0$. With a slightly different pay-off function for the z position, the equilibrium points are $z_1 = 1$, $z_2 = 0$, $z_{1a} = 2.5$ and $z_{2a} = 0$. The same procedure gives the equilibrium points for roll, pitch and yaw. Figure 3a shows the evolution of the states of both quadrotors; the response shows that the vehicles reach the Nash equilibrium without collision, so the set point is reached without any damage to the vehicles. Figure 3b shows the control inputs applied to the first quadrotor, and Fig. 3c shows those applied to the second quadrotor. Since the quadrotors settle at different equilibrium points, collision between them is avoided, and the objective of the proposed work is achieved.

6.2 Case 2

Case 2 shows that, from different initial points and with different sets of pay-off functions, other equilibrium points can be reached without collision. Figure 4 shows that the quadrotors have different equilibrium points and move in 3-D space without colliding with each other.

[Figure 4 panels (a)–(c): 3-D x, y, z positions of Quadrotor1 and Quadrotor2]

Fig. 4 3-D view of quadrotors for different sets of equilibrium points

7 Conclusion and Future Scope

This work proposes the design of Nash equilibrium points for multi-UAV systems. Two different quadrotor models are considered, and Nash equilibrium points are achieved from different sets of pay-off functions. The approach can be extended to more agents with the same or different mathematical models. With this approach, different sets of target points can be reached without using formation control theory. The approach also allows different UAVs to be combined to perform various tasks collectively. Future work will implement the theory in real-time applications.

References

1. Loquercio A, Maqueda AI, Del-Blanco CR, Scaramuzza D (2018) DroNet: learning to fly by driving. IEEE Robot Autom Lett 3(2):1088–1095
2. Adams SM, Levitan ML, Friedland CJ (2014) High resolution imagery collection for post-disaster studies utilizing unmanned aircraft systems (UAS). Photogram Eng Rem Sens 80(12):1161–1168
3. Beard RW (2008) Quadrotor dynamics and control. Brigham Young Univ 19(3):46–56
4. Dong X, Yu B, Shi Z, Zhong Y (2014) Time-varying formation control for unmanned aerial vehicles: theories and applications. IEEE Trans Cont Syst Technol 23(1):340–348
5. Du H, Zhu W, Wen G, Duan Z, Lü J (2017) Distributed formation control of multiple quadrotor aircraft based on nonsmooth consensus algorithms. IEEE Trans Cybern 49(1):342–353
6. Kuriki Y, Namerikawa T (2014) Consensus-based cooperative formation control with collision avoidance for a multi-UAV system. In: 2014 American Control Conference. IEEE, pp 2077–2082
7. Liu X, Liu Y, Chen Y, Hanzo L (2019) Trajectory design and power control for multi-UAV assisted wireless networks: a machine learning approach. IEEE Trans Veh Technol 68(8):7957–7969
8. Dhar NK, Verma NK, Behera L (2020) An online event-triggered near-optimal controller for Nash solution in interconnected system. IEEE Trans Neural Netw Learn Syst 31(12):5534–5548
9. Ren L, Castillo-Effen M, Yu H, Yoon Y, Nakamura T, Johnson EN, Ippolito CA (2017) Small unmanned aircraft system (sUAS) trajectory modeling in support of UAS traffic management (UTM). In: 17th AIAA aviation technology, integration, and operations conference, p 4268
10. Shen C, Chang T-H, Gong J, Zeng Y, Zhang R (2020) Multi-UAV interference coordination via joint trajectory and power control. IEEE Trans Sig Proc 68:843–858
11. Liu H, Tian Y, Lewis FL, Wan Y, Valavanis KP (2019) Robust formation tracking control for multiple quadrotors under aggressive maneuvers. Automatica 105:179–185
12. Zhao B, Xian B, Zhang Y, Zhang X (2014) Nonlinear robust adaptive tracking control of a quadrotor UAV via immersion and invariance methodology. IEEE Trans Ind Electron 62(5):2891–2902
13. Singh P, Gupta S, Behera L, Verma NK, Nahavandi S (2020) Perching of nano-quadrotor using self-trigger finite-time second-order continuous control. IEEE Syst J
14. Ibrahim AR, Hayakawa T (2018) Nash equilibrium seeking with second-order dynamic agents. In: 2018 IEEE conference on decision and control (CDC). IEEE, pp 2514–2518

Chapter 4

Dynamics and Control of Quadrupedal Robot

Shashank Kumar, Shubham Shukla, Ishank Agarwal, Arjit Jaitely, Ketan Singh, Vishwaratna Srivastava, and Vibhav Kumar Sachan

S. Kumar · S. Shukla (B) · I. Agarwal · A. Jaitely · K. Singh · V. Srivastava · V. K. Sachan
KIET Group of Institutions, Delhi-NCR, Ghaziabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_4

1 Introduction

Much research and progress has been made on mobile robots equipped with wheels, because wheeled robots are easy to control and steer, and wheels provide a stable base on which the robot can stand. The major drawback of wheeled robots is that they require a relatively flat surface to operate on; their operation on extremely rough, uneven terrain has been impossible or unreliable (Fig. 1). In many cases, a robot is needed that can move through difficult terrain where wheeled robots cannot travel, for example in search and rescue tasks or when carrying payloads for the army over rough or uneven ground [1]. For such cases, instead of wheeled robots, walking robots are developed that imitate the body structure and locomotion of animals such as horses or dogs. By the 1990s, many research programs in legged locomotion were under way. Besides designing and building the robots, researchers studied path planning and the motion analysis of legs so that robots can move in the desired ways [2]. Today, a number of lab-scale quadruped robots are used in research, including LittleDog from Boston Dynamics, TITAN-IX, and AiDIN-III from Sungkyunkwan University. These legged robots have complicated structures, and their hardware and software are not open source. In this paper, we present an innovative, inexpensive design of a four-legged walking robot that uses open-source hardware and software, so that further improvement and research can build on this work. The robot follows a plantigrade mechanism, and its driving structure is mainly based on a gait strategy. The legs of the robot are built on a lead-screw system: each leg has its own encoder motor attached via a lead screw, and these motors provide the vertical motion of the legs. A single step of the robot comprises two motions [3]: a vertical motion of a diagonal pair of limbs, followed by a horizontal sweep of the lead-screw mechanism. Legged mechanisms are complex and require precision so that the structure is stable both at rest and in motion. In this paper, we present the implementation of a PID algorithm for precise leg movement. Various sensors on the robot help with obstacle detection and overcoming. The robot is designed to be flexible enough to climb inclines of up to 30°.

Fig. 1 This four-legged robot was created during this research. It is equipped with encoders, a camera, an LCD and other accessories

2 Robot Design and Kinematics

2.1 Body Design of the Robot

The body of this robot consists of three parts: the central frame and two swing arms. The legs of the robot are located at the ends of these swing arms. Each arm is driven by its own geared encoder motor and lead screw to ensure independent swing motion. In the front section, two micro-servo motors stabilize the camera module during motion. Electronic equipment, including batteries and circuitry, is mounted below the mainframe to maximize the payload area. The upper platform of the base is designed so that extra equipment can easily be mounted on it [5, 6] (Fig. 2).

Fig. 2 Bot structure created on SOLIDWORKS

Walking calculation:
Starting angle = 15° to −15°
Sweep angle = 30°
Length covered by one leg = 238.11 mm
Length covered by the body due to the front leg = 119.06 mm
Motion due to both legs: robot movement = 119.06 mm, leg movement = 238.11 mm
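The walking figures are consistent with a simple chord calculation. This sketch rests on an assumption the text does not state: that the foot sweeps on an arc whose radius equals the 460 mm PQ dimension used in Sect. 2.3. Under that assumption, a ±15° sweep reproduces the 238.11 mm leg movement, and halving it gives the 119.06 mm body advance reported above.

```python
import math

SWEEP_RADIUS_MM = 460.0  # assumption: arc radius taken as the PQ dimension (460 mm)
SWEEP_DEG = 30.0         # leg sweeps from +15 deg to -15 deg

# Chord of the swept arc: 2 * r * sin(sweep / 2)
leg_movement = 2 * SWEEP_RADIUS_MM * math.sin(math.radians(SWEEP_DEG / 2))
# The text reports the body advancing half of the leg stride per sweep
robot_movement = leg_movement / 2

print(f"leg movement = {leg_movement:.2f} mm, robot movement = {robot_movement:.2f} mm")
# leg movement = 238.11 mm, robot movement = 119.06 mm
```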

2.2 Leg Design of the Robot

Proper design and analysis of the leg play a vital role in a walking robot, as the leg serves two main objectives:
• Support the structure
• Provide forward movement


Fig. 3 To give good support while moving, we have used an elliptical foot elongated in the direction of the toppling forces generated by the varying location of the center of mass

We have specially designed the feet of the robot so that it can maintain stability in motion. As shown in Fig. 3, the motor is located at the foot of the leg. A lead screw ensures proper vertical motion of the leg; to give the leg its up-down motion, we use a four-start lead screw (for fast movement). To prevent the lead screw from bending, a guide-rail and support system is designed, as shown in Fig. 3. This setup consists of double lead nuts, without angle deviation, attached to a 3-D-printed block. The channel support system consists of two side channels, each paired with four bearings.

2.3 Stability Analysis of the Robot

Stability analysis is performed to prevent toppling under extreme conditions. It is done by calculating the center of mass of the robot and the weight distribution on the legs throughout the motion (Fig. 4).

CASE-1: The bot is in its initial position (θ = 0). A toppling torque is generated when there is a nonzero distance between the line joining the legs on the ground and the COM (center of mass) at point O. Here, line AC passes through point O, so the distance between the COM and line AC is 0.


Fig. 4 Top view cross section of bot

CASE-2: The bot has completed its Step 1 (θ = 20°). In the given diagram, AB = CD = 450 mm and PQ = 460 mm. To find OE:

$$ PO = \frac{PQ}{2} = \frac{460}{2} = 230\ \text{mm}, \qquad AP = \frac{AB}{2} = \frac{450}{2} = 225\ \text{mm}, \qquad \angle PAX = 20^\circ $$

In △APX,

$$ \sin 20^\circ = \frac{PX}{AP} = \frac{PX}{225}, \qquad \cos 20^\circ = \frac{AX}{AP} $$

so PX = 76.95 mm and AX = 211.43 mm. Points A and C are stationary, and line PQ has shifted by 76.95 mm (Fig. 5). So OF = 76.95 mm and

$$ XF = PO + OF - PX = 230\ \text{mm} $$

Fig. 5 Time analysis of each step

In △AXF, AX = 211.43 mm and XF = 230 mm, and

$$ \angle AXF = \tan^{-1}\left(\frac{AX}{XF}\right) = 44.37^\circ $$

In △EOF,

$$ \sin 44.37^\circ = \frac{OE}{OF} $$

so OE = 53.81 mm (the maximum distance between the line and the COM).
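The first part of the CASE-2 calculation follows directly from the stated dimensions. This short Python sketch covers PO, AP, PX, AX and XF, using only the values given in the text (AB = 450 mm, PQ = 460 mm, ∠PAX = 20°).

```python
import math

AB = 450.0                      # mm, leg spacing AB = CD
PQ = 460.0                      # mm
theta = math.radians(20.0)      # angle PAX after Step 1

PO = PQ / 2                     # 230 mm
AP = AB / 2                     # 225 mm
PX = AP * math.sin(theta)       # from sin 20 deg = PX / AP
AX = AP * math.cos(theta)       # from cos 20 deg = AX / AP
OF = PX                         # PQ shifts by PX, so OF = PX
XF = PO + OF - PX               # the shift cancels, leaving XF = PO

print(f"PX = {PX:.2f} mm, AX = {AX:.2f} mm, XF = {XF:.2f} mm")
# PX = 76.95 mm, AX = 211.43 mm, XF = 230.00 mm
```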

2.4 Weight Ratio on the Legs of the Robot

From the diagram, cos 44.37° = EF/OF, which gives EF = 55 mm; also, AC = 643.5 mm. Balancing moments about E, with W_A and W_C the weights on legs A and C,

$$ W_A \times AE = W_C \times EC \quad\Rightarrow\quad \frac{W_A}{W_C} = \frac{EC}{AE} = \frac{EF + FC}{AF - EF} = 1.41 $$

Also, Maximum weight on any leg = Weight of 2 legs × 1.41 = Net weight of bot − 2
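The moment balance can be checked numerically. This minimal sketch makes two assumptions that the text does not state explicitly: that the four feet form a 450 mm × 460 mm rectangle, so AC is its diagonal, and that F is the midpoint of AC. Under those assumptions, the sketch reproduces AC = 643.5 mm and the 1.41 ratio.

```python
import math

AB, PQ = 450.0, 460.0   # mm, body dimensions from Sect. 2.3
EF = 55.0               # mm, from cos 44.37 deg = EF / OF

AC = math.hypot(AB, PQ)  # diagonal between supporting legs A and C (assumes a rectangle)
AF = FC = AC / 2         # assumption: F is the midpoint of AC
EC = EF + FC
AE = AF - EF
ratio = EC / AE          # W_A / W_C from the moment balance W_A * AE = W_C * EC

print(f"AC = {AC:.1f} mm, W_A / W_C = {ratio:.2f}")  # AC = 643.5 mm, W_A / W_C = 1.41
```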


3 Electronics System Working

Sensors play a vital role in controlling the robot. Our main tasks are object detection, path tracing and motor positioning. To perform them, we chose a Pixy 2.0 camera, a Sharp GP2Y0A21YK0F IR distance sensor, an infrared proximity sensor, an optical rotary encoder and limit switches.
• The Pixy camera is used for object detection and path tracing; in this research, only its object detection and line tracking modes are used.
• The Sharp infrared distance sensor measures the distance from the robot to an upcoming object so that the robot can respond in time.
• The optical rotary encoder detects the motor position.
• The infrared proximity sensor is used to initialize the start of the robot.
• Limit switches provide feedback on the maximum movement of the robot's legs.
All of the above sensors operate at 3.3 V. A non-geared 24 V, 6000 RPM DC motor provides movement, and a micro DC servo motor keeps the Pixy camera properly aligned. A single-channel DC motor driver with a 30 A current capacity controls the motor's speed and direction (Fig. 6).

Fig. 6 This is the basic flowchart of the working of all the sensors and electronics component in our project


We use an Arduino Due (ARM Cortex-M3-based microcontroller) because of its high clock speed. For the power supply, we use one 24 V Li-Po battery with a capacity of 8000 mAh, which is sufficient to drive all the sensors, the microcontroller and the motors on the robot.

4 Control System of the Robot

4.1 Gait Strategy Used for Motion

A gait is a periodic sequence of lift and release events for each leg. Theoretically, the number of different gaits N can be calculated as

$$ N = (2K - 1)! \tag{1} $$

where N is the number of possible gaits and K is the number of legs of the robot. For a four-legged walking robot, K = 4 and N = 7! = 5040. These possible gaits include the normal upward and downward motions of the legs; despite so many possible ways of motion planning, only a few can be used in practice. A gait strategy can be either statically stable or dynamically stable. Statically stable walking requires three legs of the robot to be on the ground, so only one leg can move at a time and walking becomes slow. Here, we use a dynamically stable walking strategy. With dynamic stability, the number of ground contact points can vary from zero, when the robot is jumping, to the total number of legs, when the robot is stationary.
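Equation (1) grows very quickly with the number of legs. A one-function Python check reproduces the four-legged value; the six-legged value is included only as an illustration for comparison and is not from the text.

```python
from math import factorial

def gait_count(k: int) -> int:
    """Number of possible gaits N = (2K - 1)! for a K-legged walker, Eq. (1)."""
    return factorial(2 * k - 1)

print(gait_count(4))  # 5040
print(gait_count(6))  # 39916800 (11!)
```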

4.2 Motion Planning

The locomotion of such robots is much more complicated, because the system must be stable both in motion and when stationary. Here, we use a crawl gait technique that stabilizes the robot both when stationary and in transit. Encoder motors and a lead-screw mechanism, together with the PID algorithm, achieve the proper motion of the robot. Each of the four legs has an encoder motor connected to a lead screw, and these motors lift the legs. Each encoder motor gives P PPR (pulses per rotation), and the extension limit of each leg corresponds to R rotations of its lead screw. The pitch of the lead screw is 2 mm, and four-start lead screws are used (for fast movement). Hence, in one rotation the lead nut travels 2 × 4 = 8 mm. All four lead screws are 27 cm long, so the number of rotations required is 27 cm / 0.8 cm = 33.75. The leg encoder motor has P = 200 PPR.


So the total number of counts needed to lift the leg completely is

$$ C = R \times P = 33.75 \times 200 = 6750 \text{ counts} $$

so each leg can be lifted from 0 to 6750 counts (Figs. 7 and 8).
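The encoder arithmetic above can be collected in a few lines. This sketch uses the stated values (2 mm pitch, four starts, 27 cm screws, P = 200 PPR). The `counts_for_lift` helper for partial lifts is hypothetical and only illustrates how an intermediate counter target could be derived from the same lead.

```python
PITCH_MM = 2           # lead-screw pitch
STARTS = 4             # four-start screw
SCREW_LENGTH_MM = 270  # 27 cm lead screws
PPR = 200              # encoder pulses per rotation (P)

lead_mm = PITCH_MM * STARTS              # nut travel per rotation: 8 mm
rotations = SCREW_LENGTH_MM / lead_mm    # R = 33.75
full_lift_counts = rotations * PPR       # C = R * P

def counts_for_lift(lift_mm: float) -> int:
    """Hypothetical helper: encoder target for a partial lift of `lift_mm`."""
    return round(lift_mm / lead_mm * PPR)

print(lead_mm, rotations, full_lift_counts)  # 8 33.75 6750.0
print(counts_for_lift(100))                  # 2500
```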

Fig. 7 Robot in motion: we use a dynamically stable walking strategy. Diagonal legs 1 and 4 are in motion while the robot is perfectly balanced on the other legs, 2 and 3

Fig. 8 Robot in motion: SOLIDWORKS simulation performed successfully

4.2.1 Straight Motion

For forward movement, two diagonal motors under the base are attached to lead screws.
• Step 1: Lift diagonal legs (say 1 and 4) up to the defined counter.
• Step 2: Move these legs forward by rotating both diagonal motors in the direction that pushes legs 1 and 4 forward.
• Step 3: Return diagonal legs 1 and 4 to the default counter so that all four legs touch the ground.
• Step 4: Lift diagonal legs 2 and 3 to the defined counter.
• Step 5: Move these legs forward by rotating both diagonal motors in the reverse direction so that legs 2 and 3 push forward.
• Step 6: Return diagonal legs 2 and 3 to the default counter so that all four legs touch the ground.
Together, these steps move the robot x meters forward (two robot steps: legs (1, 4) and legs (2, 3)). Repeating them moves the robot any desired distance (x × the number of loops) (Fig. 9).
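The six steps form a fixed cycle that a controller can replay. The following minimal Python sketch models the sequence as a generator. The command strings ('lift', 'sweep-forward', and so on) are hypothetical placeholders for whatever motor routines the firmware uses; the real robot runs on an Arduino Due, not Python.

```python
from typing import Iterator, Tuple

# Direction of the two diagonal sweep motors for each leg pair (Steps 2 and 5)
SWEEP_DIRECTION = {(1, 4): "forward", (2, 3): "reverse"}

def straight_walk(cycles: int) -> Iterator[Tuple[str, Tuple[int, int]]]:
    """Yield (command, leg_pair) for the six-step straight motion.
    Each cycle moves the robot x meters; `cycles` loops move it cycles * x."""
    for _ in range(cycles):
        for pair in ((1, 4), (2, 3)):
            yield ("lift", pair)                            # Steps 1 / 4
            yield ("sweep-" + SWEEP_DIRECTION[pair], pair)  # Steps 2 / 5
            yield ("lower", pair)                           # Steps 3 / 6

for command in straight_walk(1):
    print(command)
```

Modelling the gait as a sequence separates the step order from the low-level motor control, so the same cycle can be replayed for any number of loops.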

Fig. 9 Stepwise motion of bot

4.2.2 Curve Path

The robot follows a curved path by stopping the movement of one of the diagonal motors, or by restricting its counters, as decided by the curve vector (the angle between the primary and dynamic vectors). These vectors are received from the Pixy camera module. With the same stepping sequence as described above, the robot can then move along a curved path.

4.2.3 Inclined Path

Following the normal forward motion, to climb an inclined plane, the counter target C of the leg encoder motors in Steps 1 and 4 is chosen so that the flexibility of the leg structure allows a maximum inclination of 30° while keeping the structure at one degree of freedom for all motions with respect to the ground.

4.3 PID Algorithm for the Robot

PID is a feedback control loop. In our case, the feedback is the counter value from the encoder motors, the error is the difference between the actual and desired counters, and the output is the PWM for the motors. The PID control function is

$$ \text{PWM} = K_p \times \text{Error} + K_d \times \frac{\text{Error} - \text{LastError}}{\text{Time}} + K_i \times (\text{Error} + \text{LastError}) \times \text{Time} $$

where Error = Actual Counters − Desired Counters and Time is the time taken to execute a single loop (elapsed time). Depending on the situation, the PID algorithm provides an appropriate PWM to each motor by analyzing the counter value of its encoder, so that the required motion is performed.

4.4 Obstacle Detection and Overcoming Algorithm

4.4.1 Object Detection

This quadruped robot can dodge and move past small obstacles up to 10–12 cm high. The obstacle can be of any form, for example a wooden block over which the robot can move. For this, we use a Sharp GP2Y0A21YK0F IR sensor. When an obstacle comes within the detectable range of the sensor, the sensor outputs an analog voltage corresponding to the distance of the object. This voltage is passed through an analog-to-digital converter circuit and then fed to the controller, which computes the final result: the distance (Fig. 10).

Fig. 10 Object detection via distance sensor

4.4.2 Object Overcoming

Whenever the Sharp IR sensors at the feet of the robot's legs detect an obstacle within the distance specified in the program, the robot enters the obstacle-overcoming state. In this state, it lifts and moves forward one leg at a time (static gait walking strategy). This contrasts with the normal walking process (dynamically stable walking), in which a diagonal pair of legs moves at a time. This process keeps the robot stable and ensures that it does not lose its balance as it moves over obstacles (Fig. 11).
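The switch between the two walking modes can be sketched as a simple selector. The sketch relies on several assumptions. The 20 cm trigger distance is hypothetical, since the text leaves it to the program. The one-leg-at-a-time order is illustrative. The distance is taken as already converted from the sensor voltage.

```python
from typing import List, Optional, Tuple

OBSTACLE_TRIGGER_CM = 20.0  # hypothetical; the text leaves the distance to the program

# Normal walking: diagonal pairs move together (dynamically stable)
DYNAMIC_GAIT: List[Tuple[int, ...]] = [(1, 4), (2, 3)]
# Obstacle overcoming: one leg at a time (statically stable); this order is an assumption
STATIC_GAIT: List[Tuple[int, ...]] = [(1,), (4,), (2,), (3,)]

def select_gait(distance_cm: Optional[float]) -> Tuple[str, List[Tuple[int, ...]]]:
    """Return the walking state and its leg-lift sequence for one gait cycle.
    `distance_cm` is the Sharp IR reading, or None when nothing is in range."""
    if distance_cm is not None and distance_cm <= OBSTACLE_TRIGGER_CM:
        return "obstacle-overcoming", STATIC_GAIT
    return "normal", DYNAMIC_GAIT

print(select_gait(None))  # ('normal', [(1, 4), (2, 3)])
print(select_gait(12.5))  # ('obstacle-overcoming', [(1,), (4,), (2,), (3,)])
```

Both sequences lift every leg exactly once per cycle. They differ only in how many legs leave the ground at the same time, which is the stability trade-off described above.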

4.4.3 Object Following Technique Used

The Pixy camera module detects an object or a specific line and creates a vector to follow it. The line-vector feedback (primary vector and angle-tracking vector) received from the Pixy guides the robot along a certain motion (Fig. 12). The angle feedback is fed to the PID controller as input. The controller then outputs the counters and rpm to be applied to the motors to nullify the error.


Fig. 11 Object overcoming: the system changes from a dynamic to a static gait. Three of the robot's legs are in contact with the ground and one leg is in motion

5 Comparison

During this research, we created a highly efficient robot that can walk on tilted paths. With 6000 rpm motors, the robot can walk at a speed of 1.6 m per second. Earlier walking robots did not have this speed and accuracy. The Pixy camera gives the robot vision, so it can detect objects in front of it.

Comparison parameters (past designed robots vs. our proposed robot):

Stability
  Past designed robots: Earlier models were statically stable and needed 3 feet to touch the ground at a time
  Our proposed robot: By doing proper gait analysis, we achieved a dynamically stable model that balances the robot on two diagonal feet

Feet stability
  Past designed robots: Pointed feet are used in walking robots, and if some error occurs in calibration, it becomes hard to balance the robot in motion
  Our proposed robot: We specially designed the feet so that the robot can maintain stability in motion. For better support while moving, we used an elliptical foot elongated in the direction of the toppling forces generated by the varying location of the center of mass

Speed of the robot
  Past designed robots: The previous TITAN-XIII has a speed of 1.38 m per second
  Our proposed robot: In an experiment with 6500 rpm diagonal motors, we achieved a speed of 1.6 m per second


Fig. 12 Object overcoming: we used the Pixy camera for object detection and calibrated it with the PixyMon software

6 Result

In this paper, we designed a four-legged walking robot controlled using PID. We analyzed the dynamics and control of the quadrupedal robot and studied concepts related to object detection, object overcoming and object following. The Arduino Due makes it easy to test various gait strategies and leg motions. A simulation of the robot in software helped us understand many of its mechanical aspects. In practical testing, the robot detected wooden logs, ropes and an inclined path, and passed over them by automatically changing its step size according to the algorithm. In future work, we plan to connect the robot to the IoT and use LIDAR for path mapping.

Acknowledgements This work was presented at DD-National Robocon 2019, held at IIT Delhi on June 17, 2019.


References

1. Filho AB (2010) A four-legged walking robot with obstacle overcoming capabilities. Federal University of Espírito Santo
2. Geva Y (2014) A novel design of a quadruped robot for research purposes. Ben-Gurion University of the Negev, Israel
3. Zhong Y (2019) Analysis and research of quadruped robot's legs: a comprehensive review. Northwestern Polytechnical University, Xi'an, China
4. Ridderström C (2003) Legged locomotion: balance, control and tools – from equation to action. Royal Institute of Technology, Stockholm, Sweden
5. Omer AMM, Ogura Y, Kondo H, Morishima A, Carbone G, Ceccarelli M, Lim H-O, Takanishi A (2005) Development of a humanoid robot having 2-DOF waist and 2-DOF trunk. In: Proceedings of the 5th IEEE-RAS international conference on humanoid robots. Tsukuba, Japan, pp 333–338
6. Gopi Krishna Rao PV (2014) Study on PID controller design and performance based on tuning techniques. Kanyakumari, India
7. Kouhia E-P (2016) Development of an Arduino-based embedded system. Centrica University of Applied Science, Information Technology
8. Tedeschi F, Carbone G (2017) Design of a novel leg-wheel hexapod walking robot. Robotics 6:40
9. The Robot Report Homepage. Robotic Business Review. Available online: https://www.therobotreport.com/. Accessed 21 Nov 2020
10. Orozco-Magdaleno EC, Gomez-Bravo F, Castillo E, Carbone G (2020) Evaluation of locomotion performances for a mecanum-wheeled hybrid hexapod robot. IEEE/ASME Trans Mechatron

Chapter 5

Disturbance Observer-Based Sliding Mode Controller with Mismatched Disturbance for Trajectory Tracking of a Quadrotor

Vibhu Kumar Tripathi, Anuj Nandanwar, and Laxmidhar Behera

V. K. Tripathi (B)
Department of Electrical and Electronics Engineering, Pranveer Singh Institute of Technology, Kanpur, India

A. Nandanwar · L. Behera
Department of Electrical Engineering, IIT Kanpur, Kanpur, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_5

1 Introduction

Owing to the active use of unmanned aerial vehicles (UAVs) in many applications, the quadrotor has emerged as one of the most attractive UAVs for critical tasks such as mapping, aerial photography, transportation and surveillance. Autonomous operation of a quadrotor involves three main systems: guidance, navigation and control. Designing a good flight control system has attracted much attention from control researchers and scientists. However, designing the flight controller is a challenging and tedious task because of the nonlinear coupled dynamics, the underactuation and the modelling uncertainty. Moreover, the quadrotor is subject to external disturbances from wind gusts and unknown environmental flying conditions, which further increase the design challenge. Flight controllers based on sliding mode control (SMC) theory have been designed successfully in recent years. SMC has gained popularity because of its robustness against bounded disturbances and uncertainties that satisfy the matching condition [1–3]. This means that the system dynamics are insensitive to disturbances entering directly through the control input channel. SMC overcomes the effect of matched disturbances by selecting a reaching gain greater than the disturbance upper bound. However, for large disturbances the reaching gain must increase to nullify their effect, which causes chattering in the control inputs. To reduce the chattering amplitude while maintaining the robustness of conventional SMC, robust controllers based on disturbance observers have been developed over the past decades. These schemes combine SMC feedback with feedforward compensation based on the disturbance estimate. A robust flight controller for a quadrotor is developed using SMC together with a sliding mode observer in [4]. This approach shows better robustness against unknown external disturbances with a low control gain and less computational power. An observer-based second-order SMC for quadrotors is proposed in [5, 6], and its results are compared with an adaptive-gain controller and a standard observer-based super-twisting controller to show the effectiveness of the proposed methodology. In [7], an SMC based on a nonlinear disturbance observer (NDO) is developed for attitude tracking of the quadrotor, and its performance is tested through extensive simulations. In [8], an integral back-stepping SMC is proposed for the quadrotor trajectory tracking problem. A continuous SMC strategy using a finite-time sliding mode observer is proposed for quadrotor trajectory tracking in [9]. Its performance is tested through experiments on a real quadrotor under unknown external disturbances. A finite-time disturbance observer-based SMC using a nonlinear surface is developed for quadrotor tracking control in [10]. In [11], a finite-time attitude controller is developed for the quadrotor using an improved second-order sliding mode approach. This controller completely removes the problem of control gain overestimation while maintaining the robustness of conventional SMC.
In [12], a finite-time disturbance observer-based integral SMC is developed for the quadrotor trajectory tracking problem, and its performance is checked through simulations. The aforementioned disturbance observer-based SMC strategies are robust against matched uncertainties and external disturbances. However, disturbances that enter through channels other than the control input channel, known as mismatched disturbances, can degrade the tracking performance drastically. This is a practical concern because most robotic systems, including quadrotors, are subject to both matched and mismatched disturbances, and mismatched perturbations act on state equations other than the one in which the control appears. To tackle mismatched disturbances, a disturbance observer-based SMC using a nonsingular terminal sliding variable is proposed for quadrotor attitude and altitude tracking in [13]. However, that method is restricted to the attitude and altitude control problem. An NDO-based SMC that counteracts matched [14] and mismatched disturbances is proposed in [15], and the resulting composite controller is tested through simulations in windy conditions. In [16], an NDO-based SMC is developed for quadrotor position and attitude tracking. This scheme is robust against mismatched disturbances; however, matched disturbances, which must be accounted for in practical real-time implementation, are not included in its quadrotor model. Motivated by the above discussion, this paper proposes a novel NDO-based SMC for quadrotor trajectory tracking. The proposed observer–controller pair is shown to be asymptotically stable in the sense of Lyapunov. The scheme has two attractive features. First, the switching gain depends on the bound of the disturbance estimation error, which is much smaller than the disturbance bound that a conventional SMC gain must exceed; this mitigates chattering. Second, in the absence of perturbations, the tracking performance of the proposed controller is close to that of conventional SMC. The performance of the proposed scheme is tested in the presence of matched and unmatched disturbances through extensive numerical simulations. The remainder of the paper is organized as follows. Section 2 presents the mathematical model of the quadrotor and the problem formulation. Section 3 presents the proposed methodology. Section 4 analyses the stability of the overall closed-loop system using Lyapunov stability theory. The simulation study and the conclusion are presented in Sects. 5 and 6, respectively.

2 Quadrotor Model and Problem Formulation

The nonlinear dynamical model of the quadrotor is given below [17, 18]:

$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x},\mathbf{u}) + \mathbf{d} \tag{1}$$

where

$$\mathbf{f}(\mathbf{x},\mathbf{u}) = \left[\dot\phi,\; a_1,\; \dot\theta,\; a_2,\; \dot\psi,\; a_3,\; \dot z,\; a_4,\; \dot x,\; \frac{u_x}{m}u_1,\; \dot y,\; \frac{u_y}{m}u_1\right]^T, \qquad \mathbf{d} = [d_1, d_2, \ldots, d_{12}]^T$$

with

$$a_1 = \frac{J_y - J_z}{J_x}\dot\theta\dot\psi + \frac{u_2}{J_x}, \quad a_2 = \frac{J_z - J_x}{J_y}\dot\phi\dot\psi + \frac{u_3}{J_y}, \quad a_3 = \frac{J_x - J_y}{J_z}\dot\phi\dot\theta + \frac{u_4}{J_z}, \quad a_4 = -g + \frac{\cos\phi\cos\theta}{m}u_1$$

Here $d_1, d_3, d_5, d_7, d_9, d_{11}$ represent the bounded unmatched disturbances and $d_2, d_4, d_6, d_8, d_{10}, d_{12}$ represent the bounded matched disturbances. The state vector is defined as

$$\mathbf{x} = [\phi,\; \dot\phi,\; \theta,\; \dot\theta,\; \psi,\; \dot\psi,\; z,\; \dot z,\; x,\; \dot x,\; y,\; \dot y]^T$$

and $\mathbf{u} = [u_1, u_2, u_3, u_4]^T$ is the control input vector; $u_x$ and $u_y$ are the virtual control inputs defined in [18]. $\phi$, $\theta$ and $\psi$ are the Euler angles representing the roll, pitch and yaw angles, respectively; $x$, $y$ and $z$ are the inertial position of the quadrotor; $J_x$, $J_y$ and $J_z$ are the moments of inertia about the $x$, $y$ and $z$ axes, respectively; $m$ is the mass of the quadrotor and $g$ is the gravitational acceleration. The quadrotor model (1) can be viewed as a complex system made of six subsystems whose dynamics are affected by matched and unmatched disturbances. The main idea is therefore to design a nonlinear disturbance observer for these subsystems that estimates the unmatched and matched perturbations, so that this information can be supplied directly to the controller to nullify the effect of the disturbances. The subsystems of (1) have the form

$$S_j:\; \begin{cases} \dot x_i = x_{i+1} + d_i \\ \dot x_{i+1} = f_{i+1}(\mathbf{x}) + g_{i+1}(\mathbf{x})U_j + d_{i+1} \end{cases} \qquad j = 1, \ldots, 6,\; i = 2j - 1 \tag{2}$$

where $x_i$, $x_{i+1}$ are the subsystem states, $U_j$ is the control input, and $d_i$, $d_{i+1}$ are the unmatched and matched perturbations, respectively. The controls are defined as $U_1 = u_2$, $U_2 = u_3$, $U_3 = u_4$, $U_4 = u_1$, $U_5 = u_x u_1$, $U_6 = u_y u_1$.

Assumption 1 For each subsystem $S_j$, the states are assumed to be available to both the observer and the controller.

The control problem is stated as follows. For the nonlinear model (2), design control laws $U_j$ that force the states $x, y, z, \psi$ to track the desired trajectories $x_d, y_d, z_d, \psi_d$ in the presence of bounded matched and unmatched disturbances. Mathematically,

$$\lim_{t\to\infty}\,(x - x_d,\; y - y_d,\; z - z_d,\; \psi - \psi_d) = (0, 0, 0, 0)$$
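To make the state ordering and the terms $a_1$–$a_4$ of model (1) concrete, the following minimal Python sketch evaluates the right-hand side $\mathbf{f}(\mathbf{x},\mathbf{u}) + \mathbf{d}$. It is an illustrative sketch, not the authors' simulation code: the names `QuadParams` and `quadrotor_rhs` are hypothetical, the mass and inertia values are those listed in Sect. 5, and $g = 9.81$ m/s² is an assumed value. The virtual inputs $u_x$, $u_y$ are passed in directly because their definition is deferred to [18].

```python
import math
from dataclasses import dataclass


@dataclass
class QuadParams:
    m: float = 3.2      # mass [kg], value from Sect. 5
    g: float = 9.81     # gravitational acceleration [m/s^2] (assumed value)
    Jx: float = 7.5e-3  # moments of inertia [kg m^2], values from Sect. 5
    Jy: float = 7.5e-3
    Jz: float = 1.3e-2


def quadrotor_rhs(x, u, ux, uy, d, p=QuadParams()):
    """Return x_dot = f(x, u) + d for model (1).

    x      : [phi, phi_dot, theta, theta_dot, psi, psi_dot,
              z, z_dot, x, x_dot, y, y_dot]
    u      : (u1, u2, u3, u4), total thrust and three body torques
    ux, uy : virtual position inputs (defined in [18])
    d      : [d1, ..., d12]; odd entries unmatched, even entries matched
    """
    phi, dphi, theta, dtheta, psi, dpsi, z, dz, px, dpx, py, dpy = x
    u1, u2, u3, u4 = u
    a1 = (p.Jy - p.Jz) / p.Jx * dtheta * dpsi + u2 / p.Jx
    a2 = (p.Jz - p.Jx) / p.Jy * dphi * dpsi + u3 / p.Jy
    a3 = (p.Jx - p.Jy) / p.Jz * dphi * dtheta + u4 / p.Jz
    a4 = -p.g + math.cos(phi) * math.cos(theta) * u1 / p.m
    f = [dphi, a1, dtheta, a2, dpsi, a3, dz, a4,
         dpx, ux * u1 / p.m, dpy, uy * u1 / p.m]
    return [fi + di for fi, di in zip(f, d)]


# Usage: hover with u1 = m g, zero torques and no disturbance
p = QuadParams()
xdot_hover = quadrotor_rhs([0.0] * 12, (p.m * p.g, 0.0, 0.0, 0.0), 0.0, 0.0, [0.0] * 12)
```

At hover (all states zero, $u_1 = mg$, zero torques, zero disturbances) every component of the derivative vanishes, which is a quick sanity check of the sign conventions in $a_1$–$a_4$.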

3 Proposed Disturbance Observer-Based Control Design

This section presents the design of the control inputs $u_1$, $u_2$, $u_3$ and $u_4$ applied to the quadrotor using the proposed NDO-SMC approach. The basic schematic of the proposed NDO-SMC scheme is shown in Fig. 1. The control input $U_1$ is derived here for the roll subsystem; the same approach applies to the other five subsystems. The roll subsystem is obtained by setting $j = 1$ in (2):

$$\dot x_1 = x_2 + d_1, \qquad \dot x_2 = f_2(\mathbf{x}) + g_2(\mathbf{x})U_1 + d_2 \tag{3}$$

where $f_2(\mathbf{x}) = \frac{J_y - J_z}{J_x}x_4 x_6$ and $g_2(\mathbf{x}) = \frac{1}{J_x}$. Here $d_1$ and $d_2$ are the unknown bounded unmatched and matched disturbances in the roll dynamics, with upper bounds $d_s$ and $d_s^*$, respectively. The proposed disturbance observer for estimating the unmatched and matched perturbations of the quadrotor subsystems (2) is

$$\begin{cases} \dot\xi_j = -P_{\xi j}\left(g_{i+1}(\mathbf{x})U_j + \hat d_i + x_{i+1} + f_{i+1}(\mathbf{x}) + \hat d_{i+1}\right) \\ \hat d_i = \xi_j + P_{\xi j}(x_i + x_{i+1}) \\ \dot\eta_j = -P_{\eta j}\left(g_{i+1}(\mathbf{x})U_j + f_{i+1}(\mathbf{x}) + \hat d_{i+1}\right) \\ \hat d_{i+1} = \eta_j + P_{\eta j}x_{i+1} \end{cases} \tag{4}$$

Fig. 1 Basic schematic of the proposed methodology (blocks: reference system, position controller, desired Euler angle calculation, attitude controller, sliding mode controller, nonlinear disturbance observer, and external disturbance)

where $\hat d_i$ and $\hat d_{i+1}$ are the estimates of the unmatched and matched disturbances, and $(\xi_j, \eta_j)$ and $(P_{\xi j}, P_{\eta j})$ are the internal states and observer gains, respectively. Substituting $j = 1$, $i = 1$ in (4) gives the observer dynamics for (3):

$$\begin{cases} \dot\xi_1 = -P_{\xi 1}\left(g_2(\mathbf{x})U_1 + \hat d_1 + x_2 + f_2(\mathbf{x}) + \hat d_2\right) \\ \hat d_1 = \xi_1 + P_{\xi 1}(x_1 + x_2) \\ \dot\eta_1 = -P_{\eta 1}\left(g_2(\mathbf{x})U_1 + f_2(\mathbf{x}) + \hat d_2\right) \\ \hat d_2 = \eta_1 + P_{\eta 1}x_2 \end{cases} \tag{5}$$

where $\hat d_1$ and $\hat d_2$ are the estimates of the unmatched and matched disturbances in the roll dynamics, and $(\xi_1, \eta_1)$ and $(P_{\xi 1}, P_{\eta 1})$ are the corresponding internal states and observer gains. The tracking errors of the roll subsystem are defined as

$$e_1 = x_1 - \phi_d, \qquad e_2 = x_2 - \dot\phi_d \tag{6}$$
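The observer (4) can be sketched for a single subsystem as follows, using forward-Euler integration of the internal states $\xi_j$ and $\eta_j$. This is a hedged illustration rather than the authors' implementation: the class name `NDO`, the explicit Euler step and the zero initial internal states are our own choices, not details given in the text.

```python
from dataclasses import dataclass


@dataclass
class NDO:
    """Disturbance observer (4) for one subsystem S_j (hypothetical helper)."""
    P_xi: float         # observer gain P_xi_j
    P_eta: float        # observer gain P_eta_j (choose P_eta > P_xi > 0)
    xi: float = 0.0     # internal state xi_j
    eta: float = 0.0    # internal state eta_j

    def estimates(self, x_i, x_i1):
        """Return (d_hat_i, d_hat_{i+1}) for the current plant states."""
        d_hat_i = self.xi + self.P_xi * (x_i + x_i1)
        d_hat_i1 = self.eta + self.P_eta * x_i1
        return d_hat_i, d_hat_i1

    def step(self, x_i, x_i1, f, g, U, dt):
        """Advance xi_j and eta_j by one Euler step; return the estimates used."""
        d_hat_i, d_hat_i1 = self.estimates(x_i, x_i1)
        xi_dot = -self.P_xi * (g * U + d_hat_i + x_i1 + f + d_hat_i1)
        eta_dot = -self.P_eta * (g * U + f + d_hat_i1)
        self.xi += dt * xi_dot
        self.eta += dt * eta_dot
        return d_hat_i, d_hat_i1


# Usage: constant disturbances on a double integrator (f = 0, g = 1, U = 0.5)
obs = NDO(P_xi=1.0, P_eta=6.0)
x_i, x_i1, d_i, d_i1, dt = 0.0, 0.0, 0.3, -0.2, 1e-3
for _ in range(20000):  # 20 s of simulated time
    obs.step(x_i, x_i1, f=0.0, g=1.0, U=0.5, dt=dt)
    x_i, x_i1 = x_i + dt * (x_i1 + d_i), x_i1 + dt * (0.5 + d_i1)
print(obs.estimates(x_i, x_i1))  # close to (0.3, -0.2)
```

Because $f_{i+1}$, $g_{i+1}U_j$ and $x_{i+1}$ enter the internal-state updates in a way that cancels against the plant dynamics, the estimates converge independently of the control input, which is what Theorem 1 formalizes.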

The expression for $\phi_d$ can be found in [18]. The linear sliding variable based on the disturbance estimate is defined as

$$s_1 = e_2 + \beta_1 e_1 + \hat d_1 \tag{7}$$

where $\beta_1 > 0$. Taking the time derivative of $s_1$ gives

$$\dot s_1 = (\dot x_2 - \ddot\phi_d) + \beta_1(x_2 + d_1 - \dot\phi_d) + \dot{\hat d}_1 = f_2(\mathbf{x}) + \frac{U_1}{J_x} + d_2 - \ddot\phi_d + \dot{\hat d}_1 + \beta_1 x_2 + \beta_1 d_1 - \beta_1\dot\phi_d \tag{8}$$

The proposed control law $U_1$ is selected as:


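$$U_1 = J_x\left[\ddot\phi_d - f_2(\mathbf{x}) - \hat d_2 - \beta_1 x_2 - \beta_1\hat d_1 + \beta_1\dot\phi_d - \Gamma_1 s_1 - K_1|s_1|^{\gamma}\,\mathrm{sign}(s_1)\right]$$

where $\Gamma_1 > 0$ and $K_1 > 0$ are the proportional and switching gains, respectively, and $0 < \gamma < 1$. Since $U_1 = u_2$, the control input for stabilizing the roll motion is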

$$u_2 = J_x\left[\ddot\phi_d - \frac{J_y - J_z}{J_x}x_4 x_6 - \hat d_2 - \beta_1 x_2 - \beta_1\hat d_1 + \beta_1\dot\phi_d - \Gamma_1 s_1 - K_1|s_1|^{\gamma}\,\mathrm{sign}(s_1)\right] \tag{9}$$

Similarly, the control input $u_3$ for stabilizing the pitch motion is

$$u_3 = J_y\left[\ddot\theta_d - \frac{J_z - J_x}{J_y}x_2 x_6 - \hat d_4 - \beta_2 x_4 - \beta_2\hat d_3 + \beta_2\dot\theta_d - \Gamma_2 s_2 - K_2|s_2|^{\gamma}\,\mathrm{sign}(s_2)\right] \tag{10}$$

The expression for $\theta_d$ can be found in [18]. The control input $u_4$ for stabilizing the yaw motion is

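$$u_4 = J_z\left[\ddot\psi_d - \frac{J_x - J_y}{J_z}x_2 x_4 - \hat d_6 - \beta_3 x_6 - \beta_3\hat d_5 + \beta_3\dot\psi_d - \Gamma_3 s_3 - K_3|s_3|^{\gamma}\,\mathrm{sign}(s_3)\right] \tag{11}$$

The control input $u_1$ for altitude control is

$$u_1 = \frac{m\left[\ddot z_d + g - \hat d_8 - \beta_4 x_8 - \beta_4\hat d_7 + \beta_4\dot z_d - \Gamma_4 s_4 - K_4|s_4|^{\gamma}\,\mathrm{sign}(s_4)\right]}{\cos x_1\cos x_3} \tag{12}$$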

Finally, the virtual control inputs $u_x$ and $u_y$, from which the desired orientation angles are computed (see [18]), are

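$$u_x = \frac{m\left[\ddot x_d - \hat d_{10} - \beta_5 x_{10} - \beta_5\hat d_9 + \beta_5\dot x_d - \Gamma_5 s_5 - K_5|s_5|^{\gamma}\,\mathrm{sign}(s_5)\right]}{u_1} \tag{13}$$

$$u_y = \frac{m\left[\ddot y_d - \hat d_{12} - \beta_6 x_{12} - \beta_6\hat d_{11} + \beta_6\dot y_d - \Gamma_6 s_6 - K_6|s_6|^{\gamma}\,\mathrm{sign}(s_6)\right]}{u_1} \tag{14}$$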

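where

$$\begin{cases} s_2 = e_4 + \beta_2 e_3 + \hat d_3, & e_3 = x_3 - \theta_d,\; e_4 = x_4 - \dot\theta_d \\ s_3 = e_6 + \beta_3 e_5 + \hat d_5, & e_5 = x_5 - \psi_d,\; e_6 = x_6 - \dot\psi_d \\ s_4 = e_8 + \beta_4 e_7 + \hat d_7, & e_7 = x_7 - z_d,\; e_8 = x_8 - \dot z_d \\ s_5 = e_{10} + \beta_5 e_9 + \hat d_9, & e_9 = x_9 - x_d,\; e_{10} = x_{10} - \dot x_d \\ s_6 = e_{12} + \beta_6 e_{11} + \hat d_{11}, & e_{11} = x_{11} - y_d,\; e_{12} = x_{12} - \dot y_d \end{cases} \tag{15}$$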
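Because every subsystem shares the same structure, the sliding variable and control law (written in general form as (25) and (27) in Sect. 4) can be coded once and reused for all six channels. The sketch below is illustrative only; the function names are hypothetical, and the caller must supply $f_{i+1}$, $g_{i+1}$ and the reference derivatives of the chosen subsystem (for example, $f_8 = -g$ and $g_8 = \cos x_1\cos x_3/m$ for altitude).

```python
def sign(v):
    """sign(.) with sign(0) = 0."""
    return (v > 0) - (v < 0)


def sliding_variable(x_i, x_i1, xd, xd_dot, d_hat_i, beta):
    """s_j = e_{i+1} + beta_j e_i + d_hat_i, cf. (7), (15) and (25)."""
    e_i = x_i - xd          # position-level tracking error
    e_i1 = x_i1 - xd_dot    # velocity-level tracking error
    return e_i1 + beta * e_i + d_hat_i


def ndo_smc_control(x_i1, f, g, xd_dot, xd_ddot, d_hat_i, d_hat_i1, s,
                    beta, Gamma, K, gamma):
    """General NDO-SMC law U_j of (27); f and g are f_{i+1}(x) and g_{i+1}(x)."""
    return (-f - d_hat_i1 + xd_ddot - beta * x_i1 - beta * d_hat_i
            + beta * xd_dot - Gamma * s - K * abs(s) ** gamma * sign(s)) / g
```

With exact estimates and $\dot{\hat d}_i = 0$, substituting this $U_j$ into the sliding-variable derivative leaves only $-\Gamma_j s_j - K_j|s_j|^{\gamma}\mathrm{sign}(s_j)$; with imperfect estimates the residual is driven by the estimation errors, as analysed in Theorem 2.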


4 Stability Analysis

The asymptotic stability of the proposed disturbance observer and control laws is examined using Lyapunov stability theory. Theorem 1 establishes the convergence of the disturbance estimates to the actual disturbances, and Theorem 2 establishes the convergence of the sliding variables to the sliding manifold. Once the sliding manifold is reached, convergence of the tracking errors is guaranteed. The following assumptions are needed to show the asymptotic stability of the closed-loop system.

Assumption 2 For each subsystem $S_j$, the unmatched and matched perturbations are bounded:

$$|d_i| \le d_s, \qquad |d_{i+1}| \le d_s^* \tag{16}$$

where $d_s$ and $d_s^*$ are positive constants.

Assumption 3 For each subsystem $S_j$, the unmatched and matched disturbances are slowly varying, such that

$$\dot d_i = 0, \qquad \dot d_{i+1} = 0 \tag{17}$$

Theorem 1 Suppose that Assumptions 2 and 3 are satisfied for system (2). For each $j = 1, \ldots, 6$ and $i = 2j - 1$, the estimates $\hat d_i$, $\hat d_{i+1}$ of disturbance observer (4) track the disturbances $d_i$, $d_{i+1}$ of system (2) asymptotically if the observer gains satisfy $P_{\eta j} > P_{\xi j} > 0$. In that case, the estimation error dynamics

$$\dot{\tilde d}_i + P_{\xi j}\tilde d_i = 0, \qquad \dot{\tilde d}_{i+1} + P_{\eta j}\tilde d_{i+1} = 0 \tag{18}$$

are globally asymptotically stable, where $\tilde d_i = d_i - \hat d_i$ and $\tilde d_{i+1} = d_{i+1} - \hat d_{i+1}$ are the disturbance estimation errors, bounded as $|\tilde d_i| \le \tilde d_i^*$ and $|\tilde d_{i+1}| \le \tilde d_{i+1}^*$.

Proof Consider the matched estimation error

$$\tilde d_{i+1} = d_{i+1} - \hat d_{i+1} \tag{19}$$

Using (2) and (4) together with Assumption 3, the derivative of $\tilde d_{i+1}$ is

$$\dot{\tilde d}_{i+1} = -\dot\eta_j - P_{\eta j}\dot x_{i+1} = P_{\eta j}\hat d_{i+1} - P_{\eta j}d_{i+1} = -P_{\eta j}(d_{i+1} - \hat d_{i+1}) = -P_{\eta j}\tilde d_{i+1} \tag{20}$$

By choosing the observer gain $P_{\eta j} > 0$, the matched estimation error $\tilde d_{i+1}$ converges to zero asymptotically, i.e., $\hat d_{i+1}$ approaches $d_{i+1}$ asymptotically. Now consider the unmatched estimation error


$$\tilde d_i = d_i - \hat d_i \tag{21}$$

Taking the time derivative of (21) and using Assumption 3 gives

$$\dot{\tilde d}_i = -\dot{\hat d}_i \tag{22}$$

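From (2) and (4), one can write

$$\dot{\tilde d}_i = -\dot\xi_j - P_{\xi j}\dot x_i - P_{\xi j}\dot x_{i+1} = P_{\xi j}\hat d_i + P_{\xi j}\hat d_{i+1} - P_{\xi j}d_i - P_{\xi j}d_{i+1} = -P_{\xi j}\tilde d_i - P_{\xi j}\tilde d_{i+1} \tag{23}$$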

By choosing $P_{\eta j} > P_{\xi j}$, $\tilde d_{i+1}$ approaches zero first, after which (23) becomes

$$\dot{\tilde d}_i = -P_{\xi j}\tilde d_i \tag{24}$$

By choosing the observer gain $P_{\xi j} > 0$, the unmatched estimation error $\tilde d_i$ converges to zero asymptotically, i.e., $\hat d_i$ approaches $d_i$ asymptotically.

Theorem 2 Suppose that Assumptions 2 and 3 are satisfied for system (2). Under the control laws (9)–(14), the closed-loop system is asymptotically stable if the switching gains are chosen such that $K_j > (\beta_j + P_{\xi j})\tilde d_i^*$ and the observer gains satisfy $P_{\eta j} > P_{\xi j} > 0$.

Proof From (7) and (15), the sliding variable can be written in the general form

$$s_j = e_{i+1} + \beta_j e_i + \hat d_i \tag{25}$$

where $j = 1, \ldots, 6$ and $i = 2j - 1$. Taking the time derivative of (25) gives

$$\dot s_j = f_{i+1}(\mathbf{x}) + g_{i+1}(\mathbf{x})U_j + d_{i+1} - \ddot x_{id} + \dot{\hat d}_i + \beta_j(x_{i+1} + d_i - \dot x_{id}) \tag{26}$$

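From (9)–(14), the control law $U_j$ has the general form

$$U_j = \frac{1}{g_{i+1}(\mathbf{x})}\left[-f_{i+1}(\mathbf{x}) - \hat d_{i+1} + \ddot x_{id} - \beta_j x_{i+1} - \beta_j\hat d_i + \beta_j\dot x_{id} - \Gamma_j s_j - K_j|s_j|^{\gamma}\,\mathrm{sign}(s_j)\right] \tag{27}$$

Substituting (27) into (26), using $\dot{\hat d}_i = P_{\xi j}(\tilde d_i + \tilde d_{i+1})$ from (22) and (23), and taking $\tilde d_{i+1} = 0$ as established in the proof of Theorem 1, yields

$$\dot s_j = -\Gamma_j s_j - K_j|s_j|^{\gamma}\,\mathrm{sign}(s_j) + (\beta_j + P_{\xi j})\tilde d_i$$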


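Now consider the candidate Lyapunov function

$$V_j = \frac{1}{2}s_j^2 \tag{28}$$

Taking the time derivative of (28) gives

$$\begin{aligned} \dot V_j = s_j\dot s_j &= -K_j|s_j|^{\gamma+1} + (\beta_j + P_{\xi j})\tilde d_i s_j - \Gamma_j s_j^2 \\ &\le -\left[K_j - \frac{(\beta_j + P_{\xi j})\tilde d_i^*}{|s_j|^{\gamma}}\right]|s_j|^{\gamma+1} - \Gamma_j|s_j|^2 \\ &= -2^{\frac{\gamma+1}{2}}\left[K_j - \frac{(\beta_j + P_{\xi j})\tilde d_i^*}{|s_j|^{\gamma}}\right]V_j^{\frac{\gamma+1}{2}} - 2\Gamma_j V_j \end{aligned} \tag{29}$$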

If $K_j > \frac{\beta_j + P_{\xi j}}{|s_j|^{\gamma}}\tilde d_i^*$ and $\Gamma_j > 0$, it follows from (29) that the system trajectories reach the sliding surface $s_j = 0$ defined by (25) in finite time. Once the sliding surface is reached, the sliding mode dynamics from (25) become

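$$e_{i+1} = -\beta_j e_i - \hat d_i$$

From (2), $\dot e_i = x_{i+1} + d_i - \dot x_{id} = e_{i+1} + d_i$, so one can also write

$$\dot e_i = -\beta_j e_i - \hat d_i + d_i = -\beta_j e_i + \tilde d_i \tag{30}$$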

Since Theorem 1 shows that $\hat d_i$ converges asymptotically to the actual disturbance $d_i$, $\tilde d_i \to 0$ and (30) reduces to

$$\dot e_i = -\beta_j e_i \tag{31}$$

This implies that the tracking error $e_i$ converges to zero asymptotically under the proposed control law.
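As a numerical illustration of Theorems 1 and 2, the sketch below closes the loop on the roll subsystem (3) with observer (5), sliding variable (7) and control law $U_1$, using forward-Euler integration. Several simplifications are our assumptions, not the paper's setup: the pitch and yaw rates are held at zero so that $f_2 = 0$, the disturbances are constant ($d_1 = d_2 = 0.1$) so that Assumption 3 holds exactly, the reference is $\phi_d(t) = 0.2\sin t$, $\beta_1 = 2$ (the text does not report $\beta_j$), and the observer internal states are initialized so that both estimates start at zero. The gains $\Gamma_1 = 3$, $K_1 = 2$, $\gamma = 0.5$, $P_{\xi 1} = 1$, $P_{\eta 1} = 6$, the inertia $J_x$ and the initial roll rate $\pi/3$ follow Sect. 5.

```python
import math


def simulate_roll(T=15.0, dt=1e-3, beta=2.0, Gamma=3.0, K=2.0, gamma=0.5,
                  P_xi=1.0, P_eta=6.0, d1=0.1, d2=0.1, Jx=7.5e-3):
    """Roll subsystem (3) with observer (5) and control U1, forward Euler.

    Returns (e1, d1_hat, d2_hat) at t = T.
    """
    def sign(v):
        return (v > 0) - (v < 0)

    x1, x2 = 0.0, math.pi / 3        # initial roll angle and rate
    xi = -P_xi * (x1 + x2)           # chosen so that d1_hat(0) = 0
    eta = -P_eta * x2                # chosen so that d2_hat(0) = 0
    f2, g2 = 0.0, 1.0 / Jx           # f2 = 0 assumes zero pitch/yaw rates
    t = 0.0
    for _ in range(int(round(T / dt))):
        phid, dphid, ddphid = 0.2 * math.sin(t), 0.2 * math.cos(t), -0.2 * math.sin(t)
        d1_hat = xi + P_xi * (x1 + x2)           # observer outputs, (5)
        d2_hat = eta + P_eta * x2
        e1, e2 = x1 - phid, x2 - dphid           # tracking errors, (6)
        s1 = e2 + beta * e1 + d1_hat             # sliding variable, (7)
        U1 = Jx * (ddphid - f2 - d2_hat - beta * x2 - beta * d1_hat
                   + beta * dphid - Gamma * s1 - K * abs(s1) ** gamma * sign(s1))
        xi_dot = -P_xi * (g2 * U1 + d1_hat + x2 + f2 + d2_hat)
        eta_dot = -P_eta * (g2 * U1 + f2 + d2_hat)
        x1_dot = x2 + d1                         # plant, (3)
        x2_dot = f2 + g2 * U1 + d2
        xi, eta = xi + dt * xi_dot, eta + dt * eta_dot
        x1, x2 = x1 + dt * x1_dot, x2 + dt * x2_dot
        t += dt
    e1 = x1 - 0.2 * math.sin(t)
    return e1, xi + P_xi * (x1 + x2), eta + P_eta * x2


e1, d1_hat, d2_hat = simulate_roll()
print(f"e1 = {e1:.2e}, d1_hat = {d1_hat:.4f}, d2_hat = {d2_hat:.4f}")
```

In this setting the roll tracking error and both estimation errors become small within the 15 s horizon, consistent with (24) and (31). Per Theorem 2, raising $P_{\xi 1}$ speeds up the decay of $\tilde d_1$ but also raises the switching-gain requirement $K_j > (\beta_j + P_{\xi j})\tilde d_i^*$.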

5 Results

This section presents simulation results for the proposed NDO-SMC scheme in quadrotor position and attitude tracking in the presence of matched and mismatched disturbances. The initial state is taken as $\mathbf{x}(0) = (0, \frac{\pi}{3}, 0, \frac{\pi}{8}, 0, \frac{\pi}{2}, 0, 0, 1, 0, -0.5, 0)^T$. The reference values are $x_d = 0$, $y_d = 0$, $z_d = t$ for $0 < t < 2$, and $x_d = 2\cos(t)\sin(t)/(\sin^2(t) + 1)$, $y_d = 2\cos(t)/(\sin^2(t) + 1)$, $z_d = 2$ otherwise. The desired heading (yaw) angle is $\psi_d = \frac{\pi}{5}$. Since the disturbances are assumed to be bounded and slowly varying (Assumptions 2 and 3), they are chosen as $d_i = 0.1\cos(0.5t)$ and $d_{i+1} = 0.1\sin(0.5t)$, with an arbitrarily chosen finite amplitude that meets these specifications. The quadrotor parameters are $m = 3.2$ kg, $l = 0.65$ m, $J_x = 7.5\times10^{-3}$ kg·m², $J_y = 7.5\times10^{-3}$ kg·m², $J_z = 1.3\times10^{-2}$ kg·m², $b = 3.13\times10^{-5}$ kg·m and $d = 7.5\times10^{-7}$ kg·m². The controller and observer parameters are $\Gamma_j = 3$, $K_j = 2$, $\gamma = 0.5$, $P_{\xi j} = 1$ and $P_{\eta j} = 6$. To demonstrate the significance of augmenting the robust controller with the NDO, two cases are considered:

• Case-I: the quadrotor model is subject to matched and mismatched external disturbances, and the NDO is not augmented with the controller.
• Case-II: the quadrotor model is subject to matched and mismatched external disturbances, and the NDO is augmented with the controller.

In addition, the performance of the proposed NDO-SMC is compared with that of conventional SMC. Since conventional SMC alone is well known to reject matched disturbances effectively, it is of interest to examine its performance when both matched and mismatched disturbances are present. To test the robustness of SMC in Case-I, the low-frequency sinusoidal matched and mismatched disturbances described above are applied to the quadrotor. Because SMC is insensitive to matched disturbances but sensitive to mismatched ones, its tracking performance deteriorates. Fig. 2a shows that the actual trajectory does not converge asymptotically to the desired trajectory, so the tracking errors do not converge to zero, as the error response in Fig. 2b confirms. Thus, SMC with the modified sliding variable rejects matched disturbances effectively but fails to achieve nominal performance under mismatched disturbances when no disturbance observer is used.

Fig. 2 Trajectory tracking and error response without observer under external disturbance: (a) trajectory tracking response in x-pos, y-pos and z-pos (desired vs. SMC, with the initial state marked); (b) tracking error response ($e_1, e_3, e_5, e_7, e_9, e_{11}$ versus time in sec)

To test the robustness of the observer-based SMC in Case-II, the tracking response and the error response of the proposed NDO-SMC and of conventional SMC are shown in Fig. 3a and b, respectively. The proposed scheme tracks the reference trajectory with tracking errors converging to zero despite the external matched and mismatched disturbances, whereas the quadrotor under conventional SMC does not track the reference accurately. This indicates the robustness of the proposed methodology against bounded matched and mismatched disturbances. In addition, the proposed scheme attenuates chattering and provides smoother control effort, which supports real-time implementation. The control inputs for the proposed NDO-SMC and for SMC are depicted in Fig. 4; the chattering amplitude is considerably larger with SMC. The performance of the NDO depends on whether it estimates the disturbances accurately within a finite time interval. The mismatched and matched disturbance estimates are shown in Fig. 5a and b, respectively. From the above discussion, one can conclude that the proposed composite control scheme is robust against both types of external disturbances and reduces chattering, because the switching gain only needs to exceed $(\beta_j + P_{\xi j})$ times the disturbance estimation error bound (Theorem 2) rather than the disturbance bound itself.

Fig. 3 Trajectory tracking and error response with the proposed approach: (a) trajectory tracking response (desired, SMC and NDO-SMC); (b) tracking error response ($e_1, e_3, e_5, e_7, e_9, e_{11}$ versus time in sec)

Fig. 4 Control inputs of SMC and NDO-SMC versus time (in sec): (a) $u_1$, (b) $u_2$, (c) $u_3$, (d) $u_4$

Fig. 5 Disturbance estimation versus time (in sec): (a) unmatched disturbances $d_5, d_7, d_9, d_{11}$ and their estimates; (b) matched disturbances $d_6, d_8, d_{10}, d_{12}$ and their estimates

Remark 1 In [18], the position and attitude controllers are developed using SMC theory in the presence of external disturbances that enter directly through the actuator channel. External disturbances that enter through channels other than the actuator (control input) channel are not considered in the design of the robust flight controller in [18]. In this paper, both types of external disturbances (matched and mismatched) are considered to demonstrate the robustness of the SMC approach.

Remark 2 Compared with the sliding mode techniques discussed in [7, 8], the proposed scheme provides better tracking despite a larger class of disturbances, because the disturbance estimate is used in the sliding variable design.
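The reference trajectory and disturbance profile described at the start of this section can be generated as below. This is an illustrative sketch with hypothetical function names; grouping $t = 0$ with the take-off phase ($z_d = t$) is our choice, since the text specifies $0 < t < 2$, and applying the same disturbance pair to all six subsystems is our reading of the text. For $t \ge 2$ the horizontal reference traces a lemniscate of Bernoulli satisfying $(x_d^2 + y_d^2)^2 = 4(y_d^2 - x_d^2)$.

```python
import math


def reference(t):
    """Desired (x_d, y_d, z_d, psi_d) from Sect. 5 (hypothetical helper)."""
    if 0 <= t < 2:                       # take-off: climb along z only
        xd, yd, zd = 0.0, 0.0, t
    else:                                # lemniscate at constant altitude
        den = math.sin(t) ** 2 + 1
        xd = 2 * math.cos(t) * math.sin(t) / den
        yd = 2 * math.cos(t) / den
        zd = 2.0
    return xd, yd, zd, math.pi / 5       # constant yaw reference psi_d


def disturbances(t):
    """[d1, ..., d12]: unmatched 0.1 cos(0.5 t), matched 0.1 sin(0.5 t)."""
    d = []
    for _ in range(6):                   # same pair for each subsystem S_j
        d += [0.1 * math.cos(0.5 * t), 0.1 * math.sin(0.5 * t)]
    return d


for t in (0.0, 1.0, 2.0, 5.0):
    print(t, reference(t))
```

Note that the take-off segment ends at the origin while the lemniscate starts away from it at $t = 2$, so the reference has a jump at the switching instant.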

6 Conclusion

This paper discussed a disturbance observer-based robust controller design for quadrotor motion control. A nonlinear disturbance observer is proposed to estimate the unknown bounded matched and mismatched disturbances acting on the quadrotor model, and the observer is augmented with a sliding mode controller based on a proportional reaching law. The proposed approach also mitigates chattering to a certain extent because of the lower controller gain value.


The asymptotic stability of the overall system is analysed using Lyapunov stability theory. MATLAB simulations demonstrate the validity of the proposed strategy. Future research will focus on the experimental validation of the developed approach.

References

1. Tripathi VK, Behera L, Verma L (2015) Design of sliding mode and backstepping controllers for a quadcopter. In: 39th national systems conference (NSC), pp 1–6
2. Nandanwar A, Dhar NK, Malyshev D, Rybak L, Behera L (2021) Finite-time robust admissible consensus control of multirobot system under dynamic events. IEEE Syst J 15(1):780–790
3. Nandanwar A, Tripathi VK, Behera L (2021) Fault-tolerant control for multi-robotics system using variable gain super twisting sliding mode control in cyber-physical framework. In: IEEE/ASME international conference on advanced intelligent mechatronics (AIM), pp 1147–1152
4. Besnard L, Shtessel YB, Landrum B (2012) Quadrotor vehicle control via sliding mode controller driven by sliding mode disturbance observer. J Franklin Inst 349(2):658–684
5. Hamadi H, Lussier B, Fantoni I, Francis C, Shraim H (2019) Observer-based super twisting controller robust to wind perturbation for multirotor UAV. In: International conference on unmanned aircraft systems (ICUAS), pp 397–405
6. Nguyen NP, Kim W, Moon J (2018) Observer-based super-twisting sliding mode control with fuzzy variable gains and its application to overactuated quadrotors. In: IEEE conference on decision and control (CDC), pp 5993–5998
7. Wang H, Chen M (2014) Sliding mode attitude control for a quadrotor micro unmanned aircraft vehicle using disturbance observer. In: Proceedings of IEEE Chinese guidance, navigation and control conference, vol 2014, pp 568–573
8. Jia Z, Yu J, Mei Y, Chen Y, Shen Y, Ai X (2017) Integral backstepping sliding mode control for quadrotor helicopter under external uncertain disturbances. Aerosp Sci Technol 68:299–307
9. Ríos H, Falcón R, González OA, Dzul A (2018) Continuous sliding-mode control strategies for quadrotor robust tracking: real-time application. IEEE Trans Ind Electron 66(2):1264–1272
10. Cheng X, Liu Z-W (2019) Robust tracking control of a quadcopter via terminal sliding mode control based on finite-time disturbance observer. In: 14th IEEE conference on industrial electronics and applications (ICIEA), pp 1217–1222
11. Tian B, Cui J, Lu H, Zuo Z, Zong Q (2019) Adaptive finite-time attitude tracking of quadrotors with experiments and comparisons. IEEE Trans Ind Electron 66(12):9428–9438
12. Wang N, Deng Q (2018) Finite-time disturbance observer based integral sliding mode control of a quadrotor. In: 33rd youth academic annual conference of Chinese association of automation (YAC), pp 956–960
13. Ahmed N, Chen M (2018) Sliding mode control for quadrotor with disturbance observer. Adv Mech Eng 10(7):1687814018782330
14. Nandanwar A, Nair RR, Behera L (2020) Fuzzy inferencing-based path planning with a cyber-physical framework and adaptive second-order SMC for routing and mobility control in a robotic network. IET Cyber-Syst Rob 2(3):149–160
15. Aboudonia A, Rashad R, El-Badawy A (2018) Composite hierarchical anti-disturbance control of a quadrotor UAV in the presence of matched and mismatched disturbances. J Intell Rob Syst 90(1–2):201–216
16. Fethalla N, Saad M, Michalska H, Ghommam J (2017) Robust tracking control for a quadrotor UAV. In: 25th Mediterranean conference on control and automation (MED), pp 1269–1274
17. Tripathi VK, Behera L, Verma N (2016) Disturbance observer based backstepping controller for a quadcopter. In: 42nd IEEE conference of the industrial electronics society, pp 108–113
18. Tripathi VK, Kamath AK, Verma NK, Behera L (2019) Fast terminal sliding mode super twisting controller for position and altitude tracking of the quadrotor. In: International conference on robotics and automation (ICRA), pp 6468–6474

Chapter 6

Multi-robot Formation Control Using Integral Third-Order Super-Twisting Controller in Cyber-Physical Framework

Anuj Nandanwar, Vibhu Kumar Tripathi, and Laxmidhar Behera

1 Introduction

With advances in science and technology, several complex and tedious industrial applications have evolved. To maintain the nominal operation of various critical processes, numerous reliable large-scale control systems have been developed. A popular real-world example of a complex large-scale control system is a cyber-physical system (CPS) [1]. The coupling of the physical and cyber worlds, together with their coordination ability, allows a CPS to configure autonomous control structures with a high degree of automation. This idea applies primarily to multi-robot systems (MRSs) [2]. Over the last two decades, interest in distributed cooperative and consensus control of MRSs [3] has grown significantly because of their widespread use in applications such as mapping and exploration, surveillance, and search and rescue missions. Accomplishing an assigned task requires a good path planning algorithm. Motion planning enables robots to find an optimal path from a source to a goal despite various obstacles. Path planning generally needs complete information about the environment to generate an obstacle-free path. One model-free solution for path planning is fuzzy logic, a systematic method for handling imprecise data [4] that is applicable in most situations where precise information is lacking. Another variant of path planning is communication-aware motion planning, whose primary purpose is to exploit full awareness of connectivity so that motion is planned properly under communication constraints. In [5], a communication-aware motion planning framework is developed to increase the probability that a robot maintains connectivity with a fixed station while tracking a target in a realistic communication environment. Formation tracking control aims to follow reference trajectories while maintaining a specified shape. Various approaches have been developed to handle fixed formations [6–9]; this is the first step toward handling a predefined formation shape. Time-varying formations around moving targets help overcome some adverse environmental factors and enable higher-level tasks, such as tracking the trajectories of real or virtual guides, for example in rice fields. Based on time-varying formations, a circular control law is proposed in [10] for a single target. Similarly, [11] investigated a cooperative controller that drives multiple nonholonomic dynamic vehicles into a circular formation around a target. The time-varying formation tracking problem is investigated in [12] for first-order multi-agent systems (MASs) with a leader having an unknown control input. It should be pointed out that most of the reported literature focuses on the asymptotic convergence of MRSs and does not handle uncertainties. Time-varying finite-time and fixed-time formation tracking problems using distributed control protocols were investigated in [13]. Higher-order sliding mode control (HOSMC) is one of the most advanced robust sliding mode control (SMC) techniques; it improves tracking performance in the presence of system uncertainties and ensures finite-time convergence. HOSMC eliminates chattering and keeps the sliding variable and its higher derivatives at zero, which improves robustness.

A. Nandanwar (corresponding author) · L. Behera
Department of Electrical Engineering, IIT Kanpur, Kanpur, India

V. K. Tripathi
Department of Electrical and Electronics Engineering, Pranveer Singh Institute of Technology Kanpur, Kanpur, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_6
The super-twisting sliding mode control is another prominent variant of HOSMC which is mainly applicable in real-time scenario [14]. Super-twisting algorithm (STA) [15, 16] improved the robustness while preserving nominal tracking performance. The nonsingular fast terminal sliding mode control (NFTSMC) is one of the better solution to achieve fast and finite-time convergence response. Unlike terminal sliding mode controller (TSMC), NFTSMC does not show the singularity problem. However, conventional NFTSMC does not impact the steady-state response. In order to improve the system steady-state response, it has been reported in the literature that an addition of an integral term in the surface is the first choice of the control engineer and researchers. To attain both faster tracking response and finite-time convergence, integral TSMC is introduced that contain the advantages of ISMC and TSMC. The cooperative control of MRSs using ITSMC has been studied in [17]. In summary, fast transient response, finite-time convergence, and chattering elimination are the key factors which has to be considered during the development of the robust controller [18]. Most of the real-time robotic systems structure can be represented as first- and second-order integrator model. Using the concept of SOSMC, several nonlinear controllers have been developed for such systems. However, SOSMC shows some chattering due to the discontinuity involved in its structure. In order to provide a finite-time convergence using continuous control law, Lavant [19, 20] proposed a third-order super-twisting sliding mode controller. To deploy these type of controller,

6 Multi-robot Formation Control Using Integral Third-Order …


first and second derivatives of the sliding variable are required. Xu [21] developed a motion control technique for a piezoelectric-driven nanopositioning system using continuous integral terminal third-order SMC. A diverse body of work on time-varying formation in MRSs has been reported without modeling the cyber term. Environmental factors are generally not considered when developing a time-varying formation controller, although in many cases the distance between agents is affected by environmental conditions. It is therefore necessary to take environmental effects into account. Communication among robots naturally turns the system into a cyber-physical system (CPS). Optimal wireless network design for mobility control of MRSs was developed in [22]. Similarly, continuous motion planning and a discontinuous optimization problem were developed in [23]. In [24], the authors discussed the CPS aspect of MAS with fault-tolerance capability using variable gain super-twisting sliding mode control. Motivated by the above discussion, this work studies time-varying formation control of an MRS in a cyber-physical framework using a third-order super-twisting controller with an integral fast terminal sliding surface. Most existing works address time-varying formation control without considering either the SMC methodology or the CPS framework. The proposed approach also demonstrates that the agents can adapt their response to the communication environment. To the best of our knowledge, this work is the first to solve the time-varying formation control problem with HOSMC. The key contributions are as follows:
• Development of a robust controller using a nonlinear integral sliding manifold and a third-order super-twisting switching law that provides finite-time convergence with chattering mitigation.
• A nonlinear sliding manifold that guarantees fast, finite-time convergence of the tracking error.
• A proof of the convergence time of this approach.
• An analysis of the finite-time stability of the overall closed-loop system using Lyapunov stability theory, together with an investigation of the convergence time.
• A comparative study based on extensive numerical simulations that demonstrates the effectiveness and superiority of the proposed approach.

2 Problem Formulation and Preliminaries
The basic schematic of the multi-robot formation control structure under the CPS framework is shown in Fig. 1. The control structure consists of three layers: the physical layer, the cyber layer, and the control layer. The physical layer consists of the actual robot model, which produces the position and velocity state variables to be controlled. The control layer contains the main control system, which takes the necessary state information from


A. Nandanwar et al.

Fig. 1 Multi-robot formation control in cyber-physical framework (diagram: cyber layer with the robot interaction model; physical layer; multi-robot system control comprising the third-order STSMC, the integral fast terminal sliding surface, and time-varying formation control)

the physical layer in order to control the process autonomously. The cyber layer arises from the communication network, which enables communication between the controller and the plant, i.e., the physical robots. Note that the control layer contains the time-varying term, the velocity consensus unit, and the distance formation unit.

2.1 Physical Model
Consider a nonlinear second-order uncertain MRS:
$$\dot{x}_i = v_i, \qquad \dot{v}_i = f(x_i, v_i, t) + d_i(x_i, v_i, t) + u_i \tag{1}$$

where $x_i$ and $v_i$ are the position and velocity of the $i$th robot, $f(x_i, v_i, t)$ is the nonlinearity associated with the $i$th robot, $d_i(x_i, v_i, t)$ is the lumped external disturbance, and $u_i$ is the control input acting on the $i$th robot. The time-varying formation is denoted by $h = [h_1^T, h_2^T, \ldots, h_N^T]^T$, where each $h_i \in \mathbb{R}^n$ is piecewise continuously differentiable.

2.2 Cyber Modeling
As mentioned earlier, the cyber term establishes the communication link between the composite controller and the mobile robots. Graph theory is mostly employed to model the communication topology among the agents. Generally, the


graph theory does not take communication parameters into consideration. Therefore, in this work, environmental communication parameters and a switching topology are incorporated into the mobile robot network. The cyber model for the robots is defined as
$$\psi(r_{ij}) = \exp\!\left(-a\gamma_0 \left(\frac{r_{ij}}{r_0}\right)^{\upsilon}\right) \exp\!\left(-a h \gamma_0 \frac{r_0}{r_{ij}}\right) \tag{2}$$
where $r_{ij}$ is the distance from robot $i$ to robot $j$, $r_0$ is the reference distance, and $\upsilon$ is the path loss exponent, which varies with the environment. A detailed description can be found in [25].
Assumption 1 The function given by (2) is bounded and has a strict maximum $\psi^* = \psi(r_{ij}^*)$, where $r_{ij}^* = r_0 (h/\upsilon)^{1/(\upsilon+1)}$.
The neighborhood of agent $i$ is computed as
$$N_i = \{\, j \in \mathcal{M} \mid b_{ij} \ge P_T \,\} \tag{3}$$

Here, $\mathcal{M}$ denotes the group of robots, $b_{ij}$ is the channel capacity between robots $i$ and $j$, and $P_T$ is the information threshold above which information received from or sent to a neighboring robot is recognized. Because the neighbor sets defined by (3) can change over time, the communication topology is a switching topology.
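The communication model, its optimum, and the neighborhood rule can be checked numerically with a short pure-Python sketch. It assumes that (2) has the product form $\exp(-a\gamma_0 (r_{ij}/r_0)^{\upsilon})\exp(-ah\gamma_0 r_0/r_{ij})$ and takes the parameter values listed in Section 5 ($a = 10^{-6}$, $\gamma_0 = 5$, $r_0 = 1.5$, $h = 2\times 10^5$, $\upsilon = 3$). The function names and the channel-capacity matrix in the example are hypothetical. Under these assumptions the optimum lies near $r^* \approx 24.1$ with $\psi^* \approx 0.92$.

```python
import math

# Parameter values quoted in the simulation section (assumed here).
A, GAMMA0, R0, H, UPSILON = 1e-6, 5.0, 1.5, 2e5, 3.0


def psi(r: float) -> float:
    """Communication performance of a link of length r > 0, assuming the
    product form exp(-a*g0*(r/r0)**v) * exp(-a*h*g0*r0/r) for Eq. (2)."""
    decay = math.exp(-A * GAMMA0 * (r / R0) ** UPSILON)  # falls with distance
    close = math.exp(-A * H * GAMMA0 * R0 / r)           # penalizes very short links
    return decay * close


def r_star() -> float:
    """Maximizer of psi: zeroing the derivative of the exponent gives
    (r/r0)**(v+1) = h/v."""
    return R0 * (H / UPSILON) ** (1.0 / (UPSILON + 1.0))


def neighbors(b, p_t):
    """Neighbor sets N_i = {j != i : b[i][j] >= p_t}, following Eq. (3)."""
    n = len(b)
    return [[j for j in range(n) if j != i and b[i][j] >= p_t] for i in range(n)]


if __name__ == "__main__":
    rs = r_star()
    print(f"r* = {rs:.2f}, psi* = {psi(rs):.3f}")
    # Hypothetical channel-capacity matrix for three robots, threshold P_T = 0.96.
    b = [[1.0, 0.97, 0.50],
         [0.97, 1.0, 0.99],
         [0.50, 0.99, 1.0]]
    print(neighbors(b, 0.96))
```

Excluding $j = i$ from $N_i$ is an assumption of the sketch; (3) itself does not say whether a robot counts as its own neighbor.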

2.3 Problem Formulation
This work aims to achieve a desired time-varying formation under the CPS framework. It focuses on the following two problems for the second-order MRS (1): (1) how to design a distributed formation tracking control $u_i(t)$ using a third-order super-twisting sliding mode controller; and (2) what role the cyber term plays in the controller design. Mathematically, the control problem is formulated as follows: if
$$\lim_{t \to t_f} \big(x_i(t) - x_j(t) - h_i(t)\big) = 0, \qquad i = 1, 2, \ldots, N \tag{4}$$
then the robots achieve finite-time time-varying formation tracking under the CPS framework.

3 Path Planning and Controller Design
This section develops a third-order STSMC with an integral fast terminal sliding manifold for formation tracking control.


3.1 Path Planning Control
To solve the path planning problem during time-varying formation, both the robot positions and the time-varying formation terms are needed. The path planning approach generates a feasible trajectory, and the SMC-based controller tracks this path. To investigate time-varying formation control, the position tracking error with respect to the path planning reference is defined as
$$e_i(t) = x_i(t) - \int_0^t \varphi_i \, dt \tag{5}$$
Based on local information, the path planning controller is defined by
$$\dot{\varphi}_i = q_i(t) = \mu_1 x_i(t) + \mu_2 \big(x_i(t) - h_i(t)\big) + v_i(t) + \mu_3 \sum_{j \in N_i} a_{ij}\Big(\big(x_j(t) - h_j(t)\big) - \big(x_i(t) - h_i(t)\big)\Big) \tag{6}$$
where $\mu_1$, $\mu_2$, and $\mu_3$ are positive constants. The parameter $\mu_1$ is used to expand the set of feasible time-varying formations $h(t)$, while $\mu_2$ and $\mu_3$ ensure that the states of all robots achieve the desired formation. In the absence of disturbances, protocol (6) achieves consensus tracking in finite time. This protocol is responsible for collision-free path planning and resolves the connectivity issue among robots. Due to space constraints, the stability analysis of the path planning protocol is not given in this paper.
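A minimal sketch of how protocol (6) can be evaluated for planar robots follows. The weights $a_{ij}$ appear in (6) without being specified in this section, so they are passed in as a hypothetical adjacency matrix with $a_{ij} = 0$ whenever $j \notin N_i$; the function name and the list-of-pairs state layout are likewise assumptions.

```python
def path_planning_input(x, v, h, a, mu1, mu2, mu3):
    """Reference acceleration q_i(t) of Eq. (6) for every robot.

    x, v, h : lists of 2-D points [x, y] (positions, velocities, formation offsets)
    a       : adjacency weights, a[i][j] > 0 only if j is a neighbor of i
    """
    n = len(x)
    # Formation-shifted positions z_i = x_i - h_i.
    z = [[x[i][k] - h[i][k] for k in range(2)] for i in range(n)]
    q = []
    for i in range(n):
        qi = []
        for k in range(2):
            # Consensus term: sum_j a_ij * (z_j - z_i) over the neighbors of i.
            consensus = sum(a[i][j] * (z[j][k] - z[i][k]) for j in range(n) if j != i)
            qi.append(mu1 * x[i][k] + mu2 * z[i][k] + v[i][k] + mu3 * consensus)
        q.append(qi)
    return q


if __name__ == "__main__":
    # Two robots that disagree about the formation-shifted position.
    print(path_planning_input([[0.0, 0.0], [2.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]],
                              [[0.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]],
                              0.0, 0.0, 1.0))
```

Because the consensus term only involves $z_j - z_i$ for neighbors, each robot needs only its own state and its neighbors' formation-shifted positions, which is what makes (6) distributed.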

3.2 Sliding Manifold Design
To avoid the singularity problem in the control input and to speed up convergence, the sliding manifold is defined in terms of an integral fast terminal sliding variable:
$$S_i = e_i + \int_0^t \Big(\alpha_1 e_i + \beta_1 |e_i|^{q_1/p_1} \operatorname{sign}(e_i)\Big)\, dt \tag{7}$$
where $\alpha_1 > 0$, $\beta_1 > 0$, and $p_1$, $q_1$ are positive odd integers satisfying $p_1 > q_1$ to avoid the singularity issue in the control input. The time derivative of $S_i$ is
$$\dot{S}_i = \dot{e}_i + \alpha_1 e_i + \beta_1 |e_i|^{q_1/p_1} \operatorname{sign}(e_i) \tag{8}$$

Remark In SMC, the main objective is to drive the system states from any initial state toward the sliding manifold and to maintain the sliding motion on the manifold until the equilibrium point is reached. The phase in which the states move toward the sliding manifold is known as the reaching phase, which generally ensures finite-time convergence of the sliding variable. The phase in which sliding motion takes place is known as the sliding phase, and its convergence generally depends on the choice of sliding manifold. Therefore, the manifold must be designed to improve the speed of convergence during the sliding phase.

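3.3 Control Law Design
Based on (5), (7), and the TOSTSMC reaching law, the proposed control input is
$$u_i = q_i(t) - \alpha_1 \dot{e}_i - \beta_1 \frac{q_1}{p_1} |e_i|^{\frac{q_1}{p_1} - 1} \dot{e}_i - K_1 |\delta_i|^{1/2} \operatorname{sign}(\delta_i) - \int_0^t K_3 \operatorname{sign}(\delta_i)\, d\tau \tag{9}$$
where $\delta_i = \dot{S}_i + K_2 |S_i|^{2/3} \operatorname{sign}(S_i)$.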
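One way to implement control law (9) in discrete time, for a single robot and a single coordinate axis, is sketched below. Forward-Euler integration of the two integrals in (7) and (9), and dropping the $|e_i|^{q_1/p_1-1}\dot{e}_i$ term when $e_i = 0$ (where it is singular for $q_1 < p_1$), are assumptions of this sketch rather than details given in the chapter; the class and method names are hypothetical. Like (9), it does not compensate the nominal nonlinearity $f$.

```python
import math


def sig(x: float, p: float) -> float:
    """Signed power |x|**p * sign(x), written ⌈x⌋^p in the analysis."""
    return math.copysign(abs(x) ** p, x) if x != 0.0 else 0.0


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


class TOSTSMC:
    """Discrete-time sketch of control law (9) for one robot and one axis."""

    def __init__(self, alpha1, beta1, p1, q1, k1, k2, k3, dt):
        self.alpha1, self.beta1 = alpha1, beta1
        self.r = q1 / p1                 # exponent q1/p1 < 1
        self.k1, self.k2, self.k3, self.dt = k1, k2, k3, dt
        self.s_int = 0.0                 # ∫ (α1 e + β1 ⌈e⌋^{q1/p1}) dt in (7)
        self.eta_int = 0.0               # ∫ K3 sign(δ) dτ in (9)

    def step(self, e, e_dot, q):
        """Return (u, S, delta) for tracking error e, its rate e_dot, and q_i(t)."""
        S = e + self.s_int                                              # Eq. (7)
        S_dot = e_dot + self.alpha1 * e + self.beta1 * sig(e, self.r)   # Eq. (8)
        delta = S_dot + self.k2 * sig(S, 2.0 / 3.0)
        # |e|^(q1/p1 - 1) is singular at e = 0; skipping the term there is an assumption.
        grad = self.beta1 * self.r * abs(e) ** (self.r - 1.0) * e_dot if e != 0.0 else 0.0
        u = (q - self.alpha1 * e_dot - grad
             - self.k1 * sig(delta, 0.5) - self.eta_int)                # Eq. (9)
        # Advance both integrators by one forward-Euler step.
        self.s_int += (self.alpha1 * e + self.beta1 * sig(e, self.r)) * self.dt
        self.eta_int += self.k3 * sign(delta) * self.dt
        return u, S, delta
```

At each step, $e_i$ and $\dot{e}_i$ would come from (5) and $q_i$ from (6). For the disturbance used in Section 5 (a sine or cosine of amplitude 0.5 and frequency 0.5 per axis), $|\dot{d}_i| \le 0.25$, so in (13) the gain $K_3$ would at least need to exceed that bound for $\sigma_3$ to be driven to zero.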

3.4 Stability Analysis
This section analyzes the finite-time closed-loop stability of the MRS using Lyapunov stability theory. The objective is to show that the sliding variable reaches zero in finite time. Once the sliding variable is zero, the error variable slides along the manifold and converges to zero in finite time despite bounded disturbances.
Theorem 1 For the system (1), if the sliding variables are chosen as (7) and the control laws are designed as (9), then the system trajectories are driven to the sliding manifolds in finite time $T_r$, i.e., $S_i = 0$, and are maintained on the sliding manifolds thereafter despite the bounded external disturbance $d_i$. Once $S_i = 0$, the tracking errors converge in finite time $T_f$.
Proof For the system (1), the second time derivative of the sliding variable (7) is
$$\ddot{S}_i = \ddot{e}_i + \alpha_1 \dot{e}_i + \beta_1 \frac{q_1}{p_1} |e_i|^{\frac{q_1}{p_1} - 1} \dot{e}_i \tag{10}$$
Substituting $\ddot{e}_i$ and the control input (9) gives
$$\ddot{S}_i = -K_1 |\delta_i|^{1/2} \operatorname{sign}(\delta_i) + d_i + \eta_i, \qquad \dot{\eta}_i = -K_3 \operatorname{sign}(\delta_i) \tag{11}$$


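Let $\nu_i = d_i + \eta_i$; then (11) becomes
$$\ddot{S}_i = -K_1 |\delta_i|^{1/2} \operatorname{sign}(\delta_i) + \nu_i, \qquad \dot{\nu}_i = -K_3 \operatorname{sign}(\delta_i) + \dot{d}_i \tag{12}$$
Assume that $\dot{d}_i$ is bounded and satisfies $|\dot{d}_i| \le \bar{d}_s$. Letting $\sigma_1 = S_i$, $\sigma_2 = \dot{S}_i$, and $\sigma_3 = \nu_i$, (12) can be rewritten in third-order super-twisting form as
$$\begin{aligned} \dot{\sigma}_1 &= \sigma_2 \\ \dot{\sigma}_2 &= -K_1 |\delta_i|^{1/2} \operatorname{sign}(\delta_i) + \sigma_3 \\ \dot{\sigma}_3 &= -K_3 \operatorname{sign}(\delta_i) + \dot{d}_i \end{aligned} \tag{13}$$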

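To show the convergence of $\sigma_1$, $\sigma_2$, and $\sigma_3$, consider the positive definite candidate Lyapunov function
$$V_1 = \gamma_1 |\sigma_1|^{4/3} - \gamma_{12} \lceil \sigma_1 \rfloor^{2/3} \big(\sigma_2 + K_2 \lceil \sigma_1 \rfloor^{2/3}\big) + \gamma_2 \big|\sigma_2 + K_2 \lceil \sigma_1 \rfloor^{2/3}\big|^2 + \gamma_{13} \lceil \sigma_1 \rfloor^{2/3} \lceil \sigma_3 \rfloor^{2} - \gamma_{23} \big(\sigma_2 + K_2 \lceil \sigma_1 \rfloor^{2/3}\big) \lceil \sigma_3 \rfloor^{2} + \gamma_3 |\sigma_3|^4 \tag{14}$$
where $\lceil \sigma_1 \rfloor^{2/3} = |\sigma_1|^{2/3} \operatorname{sign}(\sigma_1)$ and $\lceil \sigma_3 \rfloor^{2} = |\sigma_3|^{2} \operatorname{sign}(\sigma_3)$. The candidate Lyapunov function can also be written in quadratic form as
$$V_1 = \zeta^T \Pi \zeta \tag{15}$$
where $\zeta = \big[\lceil \sigma_1 \rfloor^{2/3}, \ \delta_i, \ \lceil \sigma_3 \rfloor^{2}\big]^T$ with $\delta_i = \sigma_2 + K_2 \lceil \sigma_1 \rfloor^{2/3}$, and
$$\Pi = \begin{bmatrix} \gamma_1 & -\tfrac{1}{2}\gamma_{12} & \tfrac{1}{2}\gamma_{13} \\ -\tfrac{1}{2}\gamma_{12} & \gamma_2 & -\tfrac{1}{2}\gamma_{23} \\ \tfrac{1}{2}\gamma_{13} & -\tfrac{1}{2}\gamma_{23} & \gamma_3 \end{bmatrix} \tag{16}$$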

To ensure finite-time convergence, $V_1$ must be positive definite and the time derivative of (14) must be negative definite. For $V_1 > 0$, the matrix $\Pi$ must be positive definite, i.e., all of its leading principal minors must be positive (Sylvester's criterion). This holds if the coefficients $(\gamma_1, \gamma_{12}, \gamma_{13}, \gamma_2, \gamma_{23}, \gamma_3)$ satisfy
$$\gamma_1 > 0, \qquad \gamma_1 \gamma_2 > \tfrac{1}{4}\gamma_{12}^2, \qquad \gamma_1 \gamma_2 \gamma_3 - \tfrac{1}{4}\gamma_1 \gamma_{23}^2 + \tfrac{1}{4}\gamma_{12}\gamma_{23}\gamma_{13} > \tfrac{1}{4}\gamma_{12}^2 \gamma_3 + \tfrac{1}{4}\gamma_2 \gamma_{13}^2$$
In this case, for some positive $\kappa$ that can always be computed for the set of gains $K_1$, $K_2$, and $K_3$, $\dot{V}_1$ satisfies the differential inequality
$$\dot{V}_1 \le -\kappa V_1^{3/4} \tag{17}$$
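The positivity conditions can be checked mechanically. The sketch builds $\Pi$ from (16), computes its leading principal minors by cofactor expansion, and applies Sylvester's criterion; the $\gamma$ values in the example are arbitrary illustrative choices, not gains from the paper. Passing this check only certifies $V_1 > 0$; it says nothing about the decrease condition (17), which also depends on $K_1$, $K_2$, and $K_3$.

```python
def leading_minors(g1, g12, g13, g2, g23, g3):
    """Leading principal minors of the 3x3 matrix Pi in Eq. (16)."""
    pi = [[g1, -g12 / 2, g13 / 2],
          [-g12 / 2, g2, -g23 / 2],
          [g13 / 2, -g23 / 2, g3]]
    m1 = pi[0][0]
    m2 = pi[0][0] * pi[1][1] - pi[0][1] * pi[1][0]
    # Full determinant by cofactor expansion along the first row.
    m3 = (pi[0][0] * (pi[1][1] * pi[2][2] - pi[1][2] * pi[2][1])
          - pi[0][1] * (pi[1][0] * pi[2][2] - pi[1][2] * pi[2][0])
          + pi[0][2] * (pi[1][0] * pi[2][1] - pi[1][1] * pi[2][0]))
    return m1, m2, m3


def is_positive_definite(*gammas):
    """Sylvester's criterion: a symmetric matrix is positive definite
    iff all of its leading principal minors are positive."""
    return all(m > 0 for m in leading_minors(*gammas))


if __name__ == "__main__":
    print(leading_minors(1.0, 0.5, 0.2, 1.0, 0.3, 1.0))
    print(is_positive_definite(1.0, 0.5, 0.2, 1.0, 0.3, 1.0))  # weak coupling
    print(is_positive_definite(1.0, 3.0, 0.2, 1.0, 0.3, 1.0))  # gamma12 too large
```

Expanding the third minor symbolically reproduces the third inequality above term by term, which is a quick way to confirm the condition on $\gamma_1\gamma_{23}^2$.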


From inequality (17), $\sigma_1$, $\sigma_2$, and $\sigma_3$ converge to zero in finite time $T_r$ for every perturbation derivative satisfying $|\dot{d}_i| \le \bar{d}_s$. The reaching time $T_r$ from any initial condition $\sigma_0$ can be calculated from (17) as
$$T_r = \frac{4}{\kappa} V_1^{1/4}(\sigma_0) \tag{18}$$
To show finite-time convergence of the tracking error during sliding mode, set $S_i = 0$ in (7), so that $\dot{S}_i = 0$ in (8). The closed-loop error dynamics then become
$$\dot{e}_i = -\alpha_1 e_i - \beta_1 |e_i|^{q_1/p_1} \operatorname{sign}(e_i) \tag{19}$$

Select the candidate Lyapunov function
$$V_2 = 0.5\, e_i^2 \tag{20}$$
Its time derivative is
$$\dot{V}_2 = e_i \dot{e}_i = e_i\big(-\alpha_1 e_i - \beta_1 |e_i|^{q_1/p_1} \operatorname{sign}(e_i)\big) = -\alpha_1 |e_i|^2 - \beta_1 |e_i|^{\frac{p_1+q_1}{p_1}} = -2\alpha_1 V_2 - \beta_1 2^{\frac{p_1+q_1}{2p_1}} V_2^{\frac{p_1+q_1}{2p_1}} \tag{21}$$

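Therefore, $e_i$ reaches zero in finite time $T_s$ for a proper selection of $\alpha_1, \beta_1 > 0$. Let $V_2(T_r)$ be the initial value of $V_2$ in the sliding phase, and let $\mu = \frac{p_1 + q_1}{2p_1}$ and $\rho = \beta_1 2^{\frac{p_1+q_1}{2p_1}}$. Since $0 < q_1 < p_1$, it follows that $1/2 < \mu < 1$. Then $\dot{V}_2$ can be rewritten as
$$\dot{V}_2 = -2\alpha_1 V_2 - \rho V_2^{\mu} \tag{22}$$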

From this expression, $T_s$ can be computed as
$$T_s = \frac{1}{2\alpha_1(1-\mu)} \ln\!\left(\frac{\rho + 2\alpha_1 V_2(T_r)^{1-\mu}}{\rho}\right) \tag{23}$$
The settling time $T_f$ is the total convergence time of the tracking error. It has two parts, the reaching time $T_r$ and the sliding time $T_s$, and is given by
$$T_f = \frac{4}{\kappa} V_1^{1/4}(\sigma_0) + \frac{1}{2\alpha_1(1-\mu)} \ln\!\left(\frac{\rho + 2\alpha_1 V_2(T_r)^{1-\mu}}{\rho}\right)$$
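The two timing formulas can be evaluated directly. The sketch computes $T_r$ from (18) and $T_s$ from (23), and cross-checks $T_s$ by integrating the sliding-phase error dynamics (19) with a small forward-Euler step. Since $\kappa$ and $V_1(\sigma_0)$ are not given numerically in the text, the values used for them in the example are placeholders. For $\alpha_1 = \beta_1 = 1$, $p_1 = 5$, $q_1 = 3$, and $e_i(T_r) = 1$, one gets $\rho = 2^{0.8} = 2\alpha_1 V_2(T_r)^{0.2}$, so (23) reduces to $2.5\ln 2 \approx 1.733$ s.

```python
import math


def reaching_time(kappa, v1_0):
    """T_r = (4 / kappa) * V1(sigma0)**(1/4), Eq. (18)."""
    return 4.0 / kappa * v1_0 ** 0.25


def sliding_time(alpha1, beta1, p1, q1, v2_tr):
    """T_s from Eq. (23), with mu = (p1 + q1) / (2 p1) and rho = beta1 * 2**mu."""
    mu = (p1 + q1) / (2.0 * p1)
    rho = beta1 * 2.0 ** mu
    return (math.log((rho + 2.0 * alpha1 * v2_tr ** (1.0 - mu)) / rho)
            / (2.0 * alpha1 * (1.0 - mu)))


def simulate_sliding(alpha1, beta1, p1, q1, e0, dt=1e-5, tol=1e-9):
    """Integrate Eq. (19) with forward Euler until |e| <= tol; return the elapsed time."""
    e, t = e0, 0.0
    while abs(e) > tol and t < 100.0:
        e -= dt * (alpha1 * e + beta1 * math.copysign(abs(e) ** (q1 / p1), e))
        t += dt
    return t


if __name__ == "__main__":
    ts = sliding_time(1.0, 1.0, 5, 3, 0.5)  # V2(T_r) = 0.5 * e0**2 with e0 = 1
    print(ts, simulate_sliding(1.0, 1.0, 5, 3, 1.0))
    print("T_f =", reaching_time(2.0, 16.0) + ts)  # placeholder kappa and V1(sigma0)
```

The simulated time stops slightly short of $T_s$ because integration halts at a small tolerance rather than at exactly zero, but the two agree to within a few milliseconds.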


4 Time-Varying Formation Control Under CPS Framework
In the previous section, the cyber term, i.e., the environmental element, was not taken into account in the controller design, so that controller cannot adapt to changes in the communication environment. Here, the environmental term defined in (2) is introduced into the control law, which enables the controller to cope with changes in the communication environment. The modified control law, based on (9), is
$$u_i = \sum_{j \in N_i} x_i \psi(r_{ij}) + q_i(t) - \alpha_1 \dot{e}_i - \beta_1 \frac{q_1}{p_1} |e_i|^{\frac{q_1}{p_1} - 1} \dot{e}_i - K_1 |\delta_i|^{1/2} \operatorname{sign}(\delta_i) - \int_0^t K_3 \operatorname{sign}(\delta_i)\, d\tau \tag{24}$$
To examine the closed-loop stability and the efficacy of the developed control input, a potential function is constructed from the adopted communication model:
$$\Phi(r_{ij}) = \begin{cases} \psi^* - \psi(r_{ij}), & 0 \le r_{ij} \le R \\ \psi(R), & r_{ij} > R \end{cases} \tag{25}$$
The potential function remains constant for $r_{ij} > R$ and therefore does not affect agents that are out of range. The stability of the interaction model together with the switching topology can be studied using the non-smooth version of LaSalle's invariance principle [26].

5 Simulation Results
This section demonstrates the effectiveness of the proposed approach through numerical simulations. A nonlinear second-order MRS with six robots interacting over a switching communication topology is considered. The initial positions are $x_1 = [1.68, 4.5]^T$, $x_2 = [3.9, 1.95]^T$, $x_3 = [0.48, 0.65]^T$, $x_4 = [2.87, 0.3]^T$, $x_5 = [4.1, 0.1]^T$, and $x_6 = [3.25, 3.65]^T$, and the initial velocities are $v_1 = [-0.52, -1.55]^T$, $v_2 = [-1.03, -0.38]^T$, $v_3 = [1.76, 1.82]^T$, $v_4 = [-1.06, -0.58]^T$, $v_5 = [-1.82, -1.31]^T$, and $v_6 = [0.59, -0.19]^T$. The predefined time-varying formation is
$$h_i = \begin{bmatrix} \sin\!\left(\tfrac{1}{2}t + \tfrac{(i-1)\pi}{3}\right) \\ \cos\!\left(\tfrac{1}{2}t + \tfrac{(i-1)\pi}{3}\right) \end{bmatrix}$$
where $i = 1, 2, \ldots, 6$.
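The hexagonal shape implied by this formation vector can be checked with a short sketch that evaluates $h_i(t)$ for the six robots, with unit amplitude as printed, and measures the spacing between consecutive offsets. Consecutive offsets are $\pi/3$ apart on a circle, so each chord equals the radius at every $t$. The `radius` argument and the function names are hypothetical additions, not part of the chapter.

```python
import math


def formation_offsets(t, n=6, radius=1.0):
    """h_i(t) = radius * [sin(t/2 + (i-1)pi/3), cos(t/2 + (i-1)pi/3)] for i = 1..n.

    Python index k = i - 1; radius = 1 reproduces the printed formula.
    """
    return [[radius * math.sin(0.5 * t + k * math.pi / 3),
             radius * math.cos(0.5 * t + k * math.pi / 3)] for k in range(n)]


def consecutive_spacing(points):
    """Distances between consecutive vertices, closing the loop."""
    n = len(points)
    return [math.dist(points[k], points[(k + 1) % n]) for k in range(n)]


if __name__ == "__main__":
    for t in (0.0, 20.0, 40.0, 60.0):
        print(t, [round(d, 6) for d in consecutive_spacing(formation_offsets(t))])
```

Because all offsets share the phase $t/2$, the hexagon keeps its shape and rotates rigidly at 0.5 rad/s.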


The simulation time and step size are 60 s and 0.01 s, respectively. The disturbance acting on each robot is $d_i = [0.5\sin(0.5t), 0.5\cos(0.5t)]^T$. The numerical simulations were performed in MATLAB. The communication parameters are $a = 1 \times 10^{-6}$, $\gamma_0 = 5$, $r_0 = 1.5$, $h = 2 \times 10^5$, and $P_T = 96\%$. The parameters $\gamma_0$ and $P_T$ are chosen according to the application requirements, the values of $r_0$ and $h$ depend on the antenna characteristics, and $a$ is influenced by the communication environment. The path loss exponent of the environment is chosen as $\upsilon = 3$. In the first case, we analyze the robustness of the path planning controller (6) in the presence of disturbances. The initial and final positions of the robots, at t = 0 s and t = 60 s, are shown in Fig. 2a. The robots achieve the desired formation and maintain it throughout the process; however, because of the disturbance, their trajectories are shaky. The position error converges to zero at t = 12 s. For stable formation control, the relative velocity between robots is assumed to be finite. In the second case, we test the proposed control law (24) in the presence of external disturbances. As shown in Fig. 2b, the robots maintain a well-formed shape even though the system is affected by the disturbance. The proposed controller maintains the minimum and maximum inter-robot distances shown in Fig. 3a; both converge to 6 m, confirming that the robots form a hexagon. This indicates that connectivity is preserved and collisions are avoided throughout the formation. The velocity disagreement $(v_i - v_j)$ for all $(i, j)$ in the dynamic environment is shown in Fig. 3b.
The velocity disagreement potential converges to zero, which shows that the robots reach consensus in velocity and then maintain it over time. The norms of the control inputs are plotted for comparison in Fig. 3c. It is clear that the proposed controller works well with

Fig. 2 a Formation trajectory using the path planning controller in the presence of uncertainties. b Formation trajectory using the proposed controller in the presence of uncertainties under the CPS framework. 'R' denotes a robot, with index i = 1, 2, ..., 6 (axes X (m) and Y (m); snapshots at t = 0, 20, 40, and 60 s)

Fig. 3 Performance profile of robots: a inter-robot distance (maximum and minimum distance between two agents, m), b velocity disagreement ((m/s)^2), c control input norm ||u|| for SMC, STSMC, and TOSTSMC, and d the average communication performance J (existing vs. proposed); all panels are plotted against time (s) from 0 to 60

very high accuracy compared with SMC and STSMC. To compare the consistency of SMC, STSMC, and the proposed controller, i.e., third-order STSMC (TOSTSMC), the average control input norm is used as a performance criterion. The average control input norms for SMC, STSMC, and the proposed approach are 19.79, 3.63, and 2.6557, respectively, which shows that the proposed method achieves good control precision with the smallest control effort of the three. For a fair comparison between the controllers with and without CPS, a second performance criterion, the average communication performance $J$, is calculated as
$$J = \frac{1}{2n} \sum_{i=1}^{n} \frac{1}{|N_i|} \sum_{j \in N_i} \psi(r_{ij}) \tag{26}$$
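Metric (26) is straightforward to compute from robot positions and neighbor sets. The sketch again assumes the product form $\exp(-a\gamma_0 (r_{ij}/r_0)^{\upsilon})\exp(-ah\gamma_0 r_0/r_{ij})$ for (2) with the parameter values listed above; the function names, and the convention that robots without neighbors contribute zero, are assumptions. With the $1/(2n)$ factor as printed, two robots at the optimal spacing give $J = \psi^*/2$, which is what the check verifies.

```python
import math

# Communication parameters from the simulation section; product form of Eq. (2) assumed.
A, GAMMA0, R0, H, UPSILON = 1e-6, 5.0, 1.5, 2e5, 3.0


def psi(r):
    return math.exp(-A * GAMMA0 * (r / R0) ** UPSILON) * math.exp(-A * H * GAMMA0 * R0 / r)


def average_comm_performance(positions, neighbor_sets):
    """J of Eq. (26): (1 / 2n) * sum_i (1 / |N_i|) * sum_{j in N_i} psi(r_ij).

    Robots with an empty neighbor set contribute zero (an assumption; the
    chapter does not say how |N_i| = 0 is handled).
    """
    n = len(positions)
    total = 0.0
    for i, nbrs in enumerate(neighbor_sets):
        if nbrs:
            total += sum(psi(math.dist(positions[i], positions[j])) for j in nbrs) / len(nbrs)
    return total / (2 * n)


if __name__ == "__main__":
    r_opt = R0 * (H / UPSILON) ** (1 / (UPSILON + 1))
    print(average_comm_performance([[0.0, 0.0], [r_opt, 0.0]], [[1], [0]]))
```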

Figure 3d shows the average communication performance. The controller without the cyber-physical framework converges to $\psi^* = 48\%$, whereas the controller with the cyber-physical framework converges to $\psi^* = 91\%$. This implies that the proposed controller increases the average communication performance compared with existing approaches. Based on this analysis, a CPS-based controller is needed when the environment coefficient changes with time. The drawback of the existing controller is that it cannot deal with time-varying changes in the environment, because it lacks a communication module that models the communication structure. In summary, SMC produces chattering of large amplitude. To alleviate chattering, a third-order super-twisting controller is a suitable option: TOSTSMC behaves like a disturbance observer and reduces the chattering magnitude compared with STSMC. Based on the simulation results, we draw


the following conclusions: (i) TOSTSMC is more robust to uncertainty than STSMC; and (ii) with the CPS framework, the controller can easily handle changes in the environment coefficient.

6 Conclusions
This work developed a control approach for time-varying formation of robots under the CPS framework, taking system uncertainty into account. A controller without the cyber-physical framework cannot account for environmental factors, which can drastically affect system performance. To overcome this limitation, a third-order super-twisting sliding mode controller with an integral fast terminal sliding surface that incorporates the environmental communication parameter was developed. The reference trajectory is generated by the path planning approach, and the proposed controller tracks the lateral position and velocity trajectories. The proposed scheme ensures fast transient and steady-state responses, chattering alleviation, and finite-time convergence, and is well suited to tasks in which the controller must adapt to the environmental coefficient. The finite-time stability of the closed loop was examined using Lyapunov stability theory, and the efficacy of the proposed approach was validated through numerical simulations with and without CPS-based control. Future work will extend the proposed approach to resource optimization, packet delay, packet loss, and fault detection and isolation.

References
1. Lee EA (2006) Cyber physical systems: are computing foundations adequate. In: Position paper for NSF workshop on cyber-physical systems: research motivation, techniques and roadmap, vol 2. Citeseer, pp 1–9
2. Nandanwar A, Behera L, Shukla A, Karki H (2016) Delay constrained utility maximization in cyber physical system with mobile robotic networks. In: IECON 2016-42nd Annual conference of the IEEE industrial electronics society. IEEE, pp 4884–4889
3. Ren W, Beard RW (2008) Distributed consensus in multi-vehicle cooperative control, vol 27, no 2. Springer, Berlin
4. Von Altrock C (1995) Fuzzy logic and neurofuzzy applications explained. Prentice-Hall, Inc.
5. Ghaffarkhah A, Mostofi Y (2011) Communication-aware motion planning in mobile networks. IEEE Trans Autom Control 56(10):2478–2485
6. Guo J, Yan G, Lin Z (2010) Local control strategy for moving-target-enclosing under dynamically changing network topology. Syst Control Lett 59(10):654–661
7. Liu T, Jiang Z-P (2013) Distributed formation control of nonholonomic mobile robots without global position measurements. Automatica 49(2):592–600
8. Dalla VK, Pathak PM (2015) Trajectory tracking control of a group of cooperative planar space robot systems. Proc Inst Mech Eng Part I J Syst Control Eng 229(10):885–901
9. Dalla VK, Pathak PM (2017) Curve-constrained collision-free trajectory control of hyper-redundant planar space robot. Proc Inst Mech Eng Part I J Syst Control Eng 231(4):282–298


10. Brinon-Arranz L, Seuret A, Canudas-de-Wit C (2014) Cooperative control design for time-varying formations of multi-agent systems. IEEE Trans Autom Control 59(8):2283–2288
11. Yu X, Liu L (2016) Cooperative control for moving-target circular formation of nonholonomic vehicles. IEEE Trans Autom Control 62(7):3448–3454
12. Xu Y, Wang Z, Chen J (2019) Formation tracking control for multi-agent systems on directed graphs. In: Chinese control conference (CCC). IEEE, pp 47–52
13. Zhang W, Zhao Y, He W, Wen G (2020) Time-varying formation tracking for multiple dynamic targets: finite- and fixed-time convergence. IEEE Trans Circ Syst II Express Briefs
14. Levant A (1993) Sliding order and sliding accuracy in sliding mode control. Int J Control 58(6):1247–1263
15. Tripathi VK, Kamath AK, Verma NK, Behera L (2019) Fast terminal sliding mode super twisting controller for position and altitude tracking of the quadrotor. In: 2019 International conference on robotics and automation (ICRA). IEEE, pp 6468–6474
16. Nandanwar A, Dhar NK, Malyshev D, Rybak L, Behera L, Stochastic event-based super-twisting formation control for multi-agent system under network uncertainties. In: IEEE Transactions on control of network systems. https://doi.org/10.1109/TCNS.2021.3089142
17. Khoo S, Xie L, Man Z (2009) Integral terminal sliding mode cooperative control of multi-robot networks. In: 2009 IEEE/ASME International conference on advanced intelligent mechatronics. IEEE, pp 969–973
18. Van M, Mavrovouniotis M, Ge SS (2018) An adaptive backstepping nonsingular fast terminal sliding mode control for robust fault tolerant control of robot manipulators. IEEE Trans Syst Man Cybern Syst 49(7):1448–1458
19. Kamal S, Chalanga A, Moreno JA, Fridman L, Bandyopadhyay B (2014) Higher order super-twisting algorithm. In: 13th International workshop on variable structure systems (VSS), pp 1–5
20. Kamal S, Moreno JA, Chalanga A, Bandyopadhyay B, Fridman LM (2016) Continuous terminal sliding-mode controller. Automatica 69:308–314
21. Xu Q (2017) Continuous integral terminal third-order sliding mode motion control for piezoelectric nanopositioning system. IEEE/ASME Trans Mechatron 22(4):1828–1838
22. Zavlanos MM, Ribeiro A, Pappas GJ (2010) Mobility & routing control in networks of robots. In: 49th IEEE Conference on decision and control (CDC). IEEE, pp 7545–7550
23. Nandanwar A, Nair RR, Behera L (2020) Fuzzy inferencing-based path planning with a cyber-physical framework and adaptive second-order SMC for routing and mobility control in a robotic network. IET Cyber-Syst Rob 2(3):149–160
24. Nandanwar A, Tripathi VK, Behera L (2021) Fault-tolerant control for multi-robotics system using variable gain super twisting sliding mode control in cyber-physical framework. In: IEEE/ASME International conference on advanced intelligent mechatronics (AIM), pp 1147–1152
25. Goldsmith A (2005) Wireless communications. Cambridge University Press
26. Shevitz D, Paden B (1994) Lyapunov stability theory of nonsmooth systems. IEEE Trans Autom Control 39(9):1910–1914

Chapter 7

RFE and Mutual-INFO-Based Hybrid Method Using Deep Neural Network for Gene Selection and Cancer Classification
Samkit Jain, Rashmi Maheshwari, and Vinod Kumar Jain

S. Jain · R. Maheshwari · V. K. Jain
Department of Computer Science and Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing Jabalpur, Jabalpur, India

1 Introduction
In the present era, machine learning has evolved into an extremely powerful tool for data analysis. Applications of machine learning in the biological field over the past decade include chemical formulation, drug discovery, and tumor classification. Prediction models are particularly interesting applications of machine learning and are used in many biological applications [1]. Machine learning-based statistical models can identify cancerous tissues, distinguish them from normal tissues, and classify them. Because these models use gene expression profiles, the resulting problems are challenging [2]. The branch that uses feature selection to choose a subset of relevant and significant genes for cancer classification and prediction is called gene selection. Microarray datasets raise several issues [3]: (i) selecting an informative subset of genes from a high-dimensional dataset is an NP-hard problem, so evolutionary and bio-inspired algorithms are widely used; (ii) the data are sparse, because the number of samples is small compared with the very large number of features (genes); and (iii) gene expression data are highly complex because of strong correlations and interactions between genes. As a

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_7


S. Jain et al.

result, choosing highly relevant genes becomes a difficult challenge. If the number of candidate genes is reduced, identifying the genes associated with the disease becomes easier [4]. Many gene selection methods have been proposed to address these challenges, and they fall into three types: filter models, wrapper models, and hybrid models. The filter model relies on the statistical properties of the training data rather than on a learning algorithm. Wrapper methods use a well-defined learning algorithm to find an optimized subset of genes (features); these models are generally bio-inspired or evolutionary and depend on a population of candidate solutions. Their computational cost is high for high-dimensional datasets [5]. Wrapper approaches are considered to perform better than filter methods because of the interaction between the candidate solution and the predictor. The main benefit of the wrapper method is that it combines the search over gene subsets and the classification in a single method: the search explores the possible subsets of a given set of genes [6]. SVM-RFE is a wrapper method that is widely used for feature (gene) selection in the biomedical field. The application of SVM to biomedical data is limited because it is not designed to evaluate prediction models and predictor variables [7]. SVM-RFE is applicable to both binary and multiclass classification; it is used to reduce the dimensionality of the dataset and to find the genes most relevant to cancer. Mutual information has also been used in the literature to select the genes most relevant to tumors [8]. It measures the dependence between each gene and the class labels, and the method selects genes that are highly relevant to the class while being weakly correlated with the other genes.
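The mutual-information criterion can be illustrated with a plug-in estimate on discretized expression values. The equal-width binning, the bin count, and the function names are assumptions of this sketch; [8] and the present chapter may estimate mutual information differently. The sketch scores only gene-class relevance and does not implement the gene-gene redundancy criterion mentioned in the text.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) = sum p(x,y) * log(p(x,y) / (p(x) p(y))), in nats,
    for two equal-length sequences of discrete values."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())


def discretize(values, bins=3):
    """Equal-width binning of one gene's expression profile (hypothetical choice)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0          # constant genes fall into bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def rank_genes_by_mi(expr, labels, bins=3):
    """Gene indices sorted by MI with the class labels, most informative first.
    expr[s][g] is the expression of gene g in sample s."""
    n_genes = len(expr[0])
    scores = [mutual_information(discretize([row[g] for row in expr], bins), labels)
              for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


if __name__ == "__main__":
    expr = [[0.1, 5.0], [0.2, 1.0], [0.9, 5.0], [1.0, 1.0]]
    print(rank_genes_by_mi(expr, [0, 0, 1, 1], bins=2))
```

In the toy data, gene 0 separates the two classes perfectly while gene 1 is independent of them, so gene 0 is ranked first.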
Deep learning models have also been employed frequently in bioinformatics, particularly in biomedical imaging [9]. However, deep learning applications in tumor classification remain rare [10]. Among deep learning models, the convolutional neural network (CNN) is widely used for images. Briefly, a CNN extracts features directly from the raw input. As in conventional fully connected layers (multilayer perceptrons), CNN layers have learnable weights and biases, but a CNN is a sparsely connected network: each neuron sees only a local region of the input, and the weights are shared across positions. CNNs can process large, high-dimensional inputs and are relatively robust to small changes in the input. Although CNNs are mainly applied to images, they are now widely used in other domains and produce good results for high-dimensional inputs. Two types are commonly used: 1DCNN and 2DCNN. Images are two-dimensional inputs, for which 2DCNNs are well suited, whereas gene expression vectors are one-dimensional, so a conventional 2DCNN cannot be applied directly. For this reason, 1DCNNs, which have been used in other areas for one-dimensional vectors, are adopted here. In our work, we use both a multilayer perceptron and a 1DCNN for classification. The main contributions of our work are as follows. First, we apply the RFE-INFO method to reduce the dimensionality of the data. Second,

7 RFE and Mutual-INFO-Based Hybrid Method Using Deep Neural …


we apply the deep learning models (a multilayer perceptron and a 1DCNN) directly to the reduced dataset to obtain good classification results. Both methods are computationally feasible and produce better results on the benchmark datasets.
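The core 1DCNN operation on a gene expression vector can be sketched in a few lines: a small kernel slides along the vector, the same weights are reused at every position, and a nonlinearity followed by pooling turns the response into a single feature. This is a generic illustration of one filter with hypothetical kernel values, not the architecture used in this chapter.

```python
def conv1d(x, kernel, bias=0.0):
    """'Valid' 1-D convolution with stride 1, as used in CNN layers
    (strictly a cross-correlation: the kernel is not flipped)."""
    k = len(kernel)
    return [sum(x[i + m] * kernel[m] for m in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(v):
    return [max(0.0, a) for a in v]


def conv_feature(x, kernel, bias=0.0):
    """One filter of a 1DCNN: convolution -> ReLU -> global max pooling."""
    return max(relu(conv1d(x, kernel, bias)))


if __name__ == "__main__":
    # A peak-detecting kernel responds strongly to the local maximum in the profile.
    print(conv_feature([0, 1, 3, 1, 0], [-1, 2, -1]))
```

Because one small kernel is shared across all positions, the number of weights is independent of the number of genes, which is one reason 1DCNNs remain tractable on long expression vectors.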

2 Related Work
DNA microarray technology is a promising clinical tool for the diagnosis and classification of cancer. A major challenge in microbiology is that microarrays measure thousands of genes for each biological sample. To address the high dimensionality and obtain better experimental results, numerous statistical and machine learning techniques have been used over the last few decades to identify the genes most relevant to cancer classification. Microarrays are high-dimensional datasets with a small number of samples and a large number of features (genes) [6]. Numerous gene selection methods have been proposed to improve classification, analysis, and cost-effectiveness; they help select relevant genes by removing redundant and noisy ones. Because they evaluate the classification accuracy with an induction algorithm, wrapper approaches outperform filter methods in feature selection [11]. However, they have a higher computational cost and are prone to overfitting. SVM-RFE is a wrapper method that various authors have used for gene selection. It is a backward elimination algorithm, and the main purpose of RFE is to exploit nonlinear kernels and produce good classification results [7]. The method proceeds in multiple steps from the full set of available genes to a smaller combination, eliminating the least important gene at each step. The resulting ranking ensures that the most relevant genes are not eliminated and ranks genes within subsets, which yields a highly optimized gene subset with better performance. Wang et al. [8] used mutual information to extract a subset of highly relevant genes from the full gene set, then selected a smaller subset from it using an improved Lasso method, and finally developed classification models to assess the performance of the algorithm. Guo et al. [12] proposed a novel method using mutual information for multiclass breast cancer (breast tumor) classification. Biomedical data have accumulated through big data storage, and numerous machine learning techniques have been applied in bioinformatics [9]. Although a large number of gene selection methods have evolved in machine learning, very little work has been done on cancer classification using deep learning models because few training samples are available [10]. Fakoor et al. [13] used a variety of deep learning models for gene selection and classification, including sparse autoencoders, stacked autoencoders, stacked autoencoders with fine-tuning, and principal component analysis with softmax and SVM (with a Gaussian kernel). Shah and Iqbal [14] proposed a hybrid deep learning model based on convolutional neural networks with Laplacian scores (LS-CNN) and used it to classify the datasets considered in their study.


S. Jain et al.

Zeebaree et al. [15] proposed a modified SVM-RFE-based cancer classification method combined with an improved random forest, named mSVM-RFE-iRF. Because it applies a higher-level max pooling layer, the method is prone to drastic shifts in high-level and low-level features. Liu et al. [10] applied a sample expansion method to increase the number of samples and then used two methods, stacked autoencoders and a 1DCNN, to classify tumorous gene expressions. The drawback of this method is the high computational cost of expanding the samples. The method also needs more time for classification as the number of samples grows. Moreover, the genes may be noisy, so sample expansion may produce more corrupted genes.

3 Proposed Methodology The overall structure of the SVM-RFE-1DCNN approach is depicted in Fig. 1. In DNA microarray experiments, hundreds to thousands of gene expressions associated with tissue samples can be measured. Assume that the DNA microarray dataset contains $N$ samples of living tissue. Each row of the dataset can be represented as $(S_j^M, C_j)$, where $S_j^M \in E^M$ is the set of $M$ gene expression values of the $j$th sample ($j = 1, \ldots, N$) and $C_j \in \{1, 2, 3, \ldots, L\}$ is the target class label characterizing that sample's gene expression. For the chosen datasets, the number of features $M$ is much greater than the number of samples $N$ ($M \gg N$). Each dataset thus consists of $N$ samples with $M$-dimensional gene expression profiles and a total of $L$ target classes. Gene expression is correlated with the target classes for which the experiments are conducted. The dimensionality of these datasets is high, and they contain irrelevant, redundant, and noisy genes. High dimensionality not only makes computation complex but also degrades the performance of many learning algorithms. Feature selection algorithms are therefore needed to remove irrelevant and redundant gene expressions from the data. The proposed methodology works in two phases:

• Relevant and Non-redundant Gene Selection (RNGS) phase: SVM-RFE and mutual information are used to remove redundant and irrelevant genes from the high-dimensional dataset, yielding a low-dimensional gene subset that is more predictive of the class and more efficient to use.
• Cancer Classification (CC) phase: the dataset is rebuilt from the non-redundant gene subset found by SVM-RFE and mutual information in the RNGS phase. Various machine learning classifiers, as well as a neural network and a 1DCNN, are then used for cancer classification.

7 RFE and Mutual-INFO-Based Hybrid Method Using Deep Neural …


Fig. 1 View of proposed RFE-INFO method

Fig. 2 1-DCNN architecture

3.1 RNGS Phase Microarray datasets have few samples and high-dimensional features. The primary objective is to choose a low-dimensional subset of genes that are highly relevant to the class and can predict it efficiently. Cancer classification depends on identifying, within the high-dimensional gene set, the subset most relevant to detecting the disease. The RNGS phase consists of four steps: data preprocessing, SVM-RFE application, mutual information computation, and sample expansion. These steps are described as follows:

• Step 1: Data Preprocessing. Undefined or missing gene expression values in each dataset are replaced with mean values. The entire dataset is then normalized to zero mean and unit standard deviation using the following equation [6]:

$$g_{new} = \frac{g - \mu}{\sigma}$$

Here, $g_{new}$ denotes the transformed value of the gene expression $g$, and $\mu$ and $\sigma$ denote the mean and standard deviation of the given vector.
• Step 2: Apply SVM-RFE. SVM-RFE is a wrapper algorithm and a scalable, efficient method for feature selection that produces very accurate results on multiclass classification problems. Backward feature elimination is performed by computing a ranking weight for every feature, and the features are then sorted according to the weight


vectors. Features are removed one at a time. At each step, the coefficients of a linear SVM's weight vector $\mathbf{w}$ are used to compute the feature ranking score [16], and the feature with the lowest ranking score $c_i = w_i^2$ is removed, where $w_i$ is the $i$th component of $\mathbf{w}$ [17]. Feature subset selection with SVM-RFE proceeds as follows: (i) calculate the weight of each feature and rank the features; (ii) eliminate the feature with the lowest weight score. The process is repeated until the top 20% of genes have been selected, and these genes are stored separately.
• Step 3: Find Mutual Information. We compute the mutual information of each gene with the class label. Mutual information measures the dependence between two random variables: it quantifies the reduction in uncertainty about one variable when the value of the other is known, that is, the amount of information one random variable provides about another. The mutual information between two random variables $X$ and $Y$ can be expressed as [8]:

$$I(X; Y) = H(X) - H(X \mid Y)$$

where $I(X; Y)$ is the mutual information between $X$ and $Y$, $H(X)$ is the entropy of $X$, and $H(X \mid Y)$ is the conditional entropy of $X$ given $Y$. We then store the top 20% of genes with the highest mutual information with the class. Finally, the intersection of the top-20% gene sets from Step 2 and Step 3 gives the most relevant gene subset; let $I$ denote the number of genes in this intersection.
• Step 4: Sample Expansion. A deep learning model needs a large amount of training data to achieve good classification accuracy, but the datasets contain very few samples, so sample expansion is performed. The top relevant genes from the intersection of Steps 2 and 3 are the input to this process. For each sample, the process is repeated for each of the $I$ genes: in each iteration, one gene is replaced by the mean of all the genes in that sample, and the resulting sample is added to the existing dataset to increase the number of training samples. The total number of samples after augmentation is therefore

$$\text{Total Samples} = N \cdot I + N \qquad (1)$$

where $N$ denotes the number of original samples.
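The four RNGS steps can be sketched in Python with NumPy and scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: missing values are imputed with each gene's mean, SVM-RFE is scikit-learn's `RFE` wrapped around a linear-kernel `SVC` (its ranking criterion squares the weight components, matching $c_i = w_i^2$, and for multiclass data it sums the squares over the pairwise SVMs), mutual information is estimated with `mutual_info_classif`, "top 20%" is rounded to the nearest whole gene, and the helper names (`preprocess`, `svm_rfe_genes`, `mutual_info_genes`, `expand_samples`, `rngs`) are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.svm import SVC


def preprocess(X):
    """Step 1: replace missing values with the gene's mean, then z-score each gene."""
    X = np.array(X, dtype=float)
    gene_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = gene_means[cols]
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # constant genes: avoid division by zero
    return (X - X.mean(axis=0)) / sigma


def top_k(n_genes, fraction=0.2):
    """Number of genes kept by each ranking (at least one)."""
    return max(1, int(round(fraction * n_genes)))


def svm_rfe_genes(X, y, fraction=0.2):
    """Step 2: linear-SVM RFE removing one gene per iteration (score = squared weight)."""
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=top_k(X.shape[1], fraction), step=1)
    rfe.fit(X, y)
    return set(np.flatnonzero(rfe.support_).tolist())


def mutual_info_genes(X, y, fraction=0.2, random_state=0):
    """Step 3: genes with the highest estimated mutual information with the class."""
    mi = mutual_info_classif(X, y, random_state=random_state)
    return set(np.argsort(mi)[::-1][: top_k(X.shape[1], fraction)].tolist())


def expand_samples(X, y):
    """Step 4: for each of the I genes, add a copy of every sample with that gene
    replaced by the sample's mean expression, giving N * I + N samples in total."""
    sample_means = X.mean(axis=1)
    blocks_X, blocks_y = [X], [y]
    for j in range(X.shape[1]):
        X_j = X.copy()
        X_j[:, j] = sample_means
        blocks_X.append(X_j)
        blocks_y.append(y)
    return np.vstack(blocks_X), np.concatenate(blocks_y)


def rngs(X, y, fraction=0.2):
    """Run Steps 1 to 4; returns the selected gene indices and the expanded data."""
    X = preprocess(X)
    selected = sorted(svm_rfe_genes(X, y, fraction) & mutual_info_genes(X, y, fraction))
    X_exp, y_exp = expand_samples(X[:, selected], np.asarray(y))
    return selected, X_exp, y_exp
```

Removing one gene per iteration (`step=1`) mirrors the description above but is slow for datasets with thousands of genes; a larger `step` trades ranking fidelity for speed.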


3.2 CC Phase After obtaining the most relevant and non-redundant gene subset from the RNGS phase, we used different classifiers for cancer classification: Naive Bayes (NB), support vector machine (SVM), decision tree (DT), and two deep learning-based classifiers, a neural network (NN) and a one-dimensional convolutional neural network (1DCNN). The convolutional neural network (CNN) is a well-known deep learning model that makes extensive use of parameter sharing. It extracts local features from the data and, by sharing weights across the feature vector, reduces the total number of weights. The three primary components of convolutional models are the input layer, the hidden layers, and the output layer. The hidden layers are classified as convolutional, pooling, or fully connected layers, described briefly as follows:

• Convolution Layer: Convolution layers act as the hidden layers of CNNs. A convolutional layer has a set of filters (small pattern detectors) that perform the convolution operation. These filters are learned during training in the same way as the weights of fully connected networks.
• Pooling Layers: Pooling can be done in three ways: min pooling, max pooling, and average pooling. Pooling layers are used to handle large data sizes; they compress the input to the next layer while retaining its characteristic values. We used max pooling.
• Fully Connected Layers: Also known as dense layers, they connect every neuron to all neurons of the previous layer so that the feature values are retained. The transformation they apply helps preserve the integrity of the information.

The forward propagation and back propagation steps of CNN training are as follows [10]. Let $x^l$ denote the output of the $l$th layer, which is also the input of the next layer:

$$x^l = f(u^l) \qquad (2)$$

$$u^l = W^l x^{l-1} + b^l \qquad (3)$$

where $u^l$ denotes the weighted input of the $l$th layer, $W^l$ the weight matrix and $b^l$ the bias vector of the $l$th layer, and $f(\cdot)$ the activation function of the $l$th layer. We used the rectified linear unit (ReLU) as the activation function. Back propagation passes derivatives from the higher layers down to the lower layers; the error terms used to back-propagate the weights and biases are computed with Eq. (4):

$$\delta^i = \left(W^{i+1}\right)^{\top} \delta^{i+1} \circ f'(u^i) \qquad (4)$$

where $\delta^i$ is the error term of layer $i$ and $\circ$ denotes the element-wise product.

The cost function for the $N$ samples is given by Eq. (5), where $L(\cdot)$ is the loss function and $y_i$ is the label of the $i$th sample:

$$J(W, b) = \frac{1}{N} \sum_{i=1}^{N} L\left(W x_i + b,\ y_i\right) \qquad (5)$$


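The partial derivatives of the cost with respect to the weights and biases, computed in Eqs. (6) and (7), are used to back-propagate the errors, where $\alpha$ is the learning rate:

$$\Delta W^i = -\alpha \frac{\partial J}{\partial W^i} \qquad (6)$$

$$\Delta b^i = -\alpha \frac{\partial J}{\partial b^i} \qquad (7)$$

The new weights and biases are then computed as in Eqs. (8) and (9):

$$W^i \leftarrow W^i - \alpha \frac{\partial J}{\partial W^i} \qquad (8)$$

$$b^i \leftarrow b^i - \alpha \frac{\partial J}{\partial b^i} \qquad (9)$$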
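To make Eqs. (2) to (9) concrete, the sketch below implements them for a small fully connected ReLU network in NumPy. It is illustrative only: it assumes a linear output layer and a squared-error loss for $L(\cdot)$, stores one sample per column, and the function names are hypothetical. The convolution and pooling layers of the 1DCNN follow the same forward/backward pattern but are omitted here.

```python
import numpy as np


def relu(u):
    return np.maximum(u, 0.0)


def relu_grad(u):
    return (u > 0).astype(float)


def forward(Ws, bs, X):
    """Eqs. (2)-(3); X holds one sample per column. Hidden layers use ReLU, the output is linear."""
    xs, us = [X], []
    for l, (W, b) in enumerate(zip(Ws, bs)):
        u = W @ xs[-1] + b                              # Eq. (3)
        us.append(u)
        xs.append(u if l == len(Ws) - 1 else relu(u))   # Eq. (2)
    return xs, us


def cost(Ws, bs, X, Y):
    """Eq. (5) with an assumed squared-error loss L = 0.5 * ||output - y||^2."""
    xs, _ = forward(Ws, bs, X)
    return 0.5 * np.sum((xs[-1] - Y) ** 2) / X.shape[1]


def gradients(Ws, bs, X, Y):
    xs, us = forward(Ws, bs, X)
    delta = (xs[-1] - Y) / X.shape[1]            # error term of the (linear) output layer
    gW, gb = [None] * len(Ws), [None] * len(Ws)
    for l in reversed(range(len(Ws))):
        gW[l] = delta @ xs[l].T                  # dJ/dW^l
        gb[l] = delta.sum(axis=1, keepdims=True) # dJ/db^l
        if l > 0:
            delta = (Ws[l].T @ delta) * relu_grad(us[l - 1])  # Eq. (4)
    return gW, gb


def train_step(Ws, bs, X, Y, alpha):
    gW, gb = gradients(Ws, bs, X, Y)
    Ws = [W - alpha * g for W, g in zip(Ws, gW)]  # Eqs. (6) and (8)
    bs = [b - alpha * g for b, g in zip(bs, gb)]  # Eqs. (7) and (9)
    return Ws, bs
```

A finite-difference check of one weight against `gradients` is a quick way to confirm that the back-propagated error terms match Eq. (4).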

A CNN performs the convolution operation on image data, taking a two-dimensional image as input. Because each sample of gene expression data is a one-dimensional vector, this procedure cannot be used directly. We therefore used a 1DCNN, which takes a one-dimensional vector as input, instead of a 2DCNN. Figure 2 shows the proposed 1DCNN architecture, which consists of seven layers: one input layer, two convolutional layers $C_1$ and $C_2$, one max-pooling layer $M_1$, two fully connected layers $F_1$ and $F_2$, and one output layer. The input layer holds a sample with $m_1$ genes, represented as a vector of size $m_1 \times 1$. The convolutional kernels $W_1$ and $W_2$ of layers $C_1$ and $C_2$ are vectors of size $m_2 \times 1$ and $m_3 \times 1$, respectively. The pooling kernel $p$ of layer $M_1$ is a vector of size $m_4 \times 1$. The feature maps of layers $F_1$ and $F_2$ are vectors of size $m_5 \times 1$ and $m_6 \times 1$, respectively, and the output layer is a vector of size $m_7 \times 1$.
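The layer sequence above can be traced with a NumPy forward pass. This sketch assumes a single filter per convolutional layer, ReLU activations, non-overlapping max pooling, and a softmax output; the paper does not state the number of filters or the values of $m_1, \ldots, m_7$, so the sizes used in the usage note are arbitrary placeholders, and the helper names are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()


def conv1d_valid(x, w, b):
    """Single-filter 'valid' 1-D convolution (cross-correlation, as in most CNN libraries)."""
    return np.correlate(x, w, mode="valid") + b


def max_pool1d(x, p):
    """Non-overlapping max pooling with window p; a trailing remainder is dropped."""
    n = len(x) // p
    return x[: n * p].reshape(n, p).max(axis=1)


def init_params(m1, m2, m3, m4, m5, m6, m7, seed=0):
    rng = np.random.default_rng(seed)
    pooled = (m1 - m2 + 1 - m3 + 1) // m4  # length after C1, C2 and M1
    return {
        "W1": rng.normal(0, 0.1, m2), "b1": 0.0,
        "W2": rng.normal(0, 0.1, m3), "b2": 0.0,
        "F1": rng.normal(0, 0.1, (m5, pooled)), "c1": np.zeros(m5),
        "F2": rng.normal(0, 0.1, (m6, m5)), "c2": np.zeros(m6),
        "Out": rng.normal(0, 0.1, (m7, m6)), "c3": np.zeros(m7),
    }


def forward(x, params, m4):
    """Forward pass for one sample x of m1 genes; returns m7 class probabilities."""
    h = relu(conv1d_valid(x, params["W1"], params["b1"]))  # C1
    h = relu(conv1d_valid(h, params["W2"], params["b2"]))  # C2
    h = max_pool1d(h, m4)                                   # M1
    h = relu(params["F1"] @ h + params["c1"])               # F1
    h = relu(params["F2"] @ h + params["c2"])               # F2
    return softmax(params["Out"] @ h + params["c3"])       # output layer
```

For example, with $m_1 = 50$ genes, kernels of size 5 and 3, and a pooling window of 2, the feature map shrinks from 50 to 46 after $C_1$, to 44 after $C_2$, and to 22 after $M_1$, which fixes the input width of $F_1$.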

3.3 Experimental Setup The algorithms are implemented in Python 3.7 and MATLAB (2019). We used Google Colaboratory and Jupyter Notebook for our experiments. The machine configuration is as follows: Processor: Intel Core i5, 3.30 GHz × 4; RAM: 8 GB; OS: Ubuntu 18.04 LTS, 64-bit; Graphics: Radeon; Disk: 1000 GB.


Table 1 Descriptive summary of each dataset

| Dataset | Number of genes | Number of samples | Number of classes |
|---|---|---|---|
| SRBCT | 2308 | 83 | 4 |
| ALL-AML | 7130 | 72 | 2 |
| ALL-AML-3c | 7130 | 72 | 3 |
| ALL-AML-4c | 7130 | 72 | 4 |
| Lymphoma | 4026 | 66 | 3 |
| MLL | 12,601 | 203 | 2 |
| CNS | 7129 | 60 | 2 |

3.4 About Dataset We used seven benchmark cancer microarray gene expression datasets obtained from http://csse.szu.edu.cn/staff/zhuzx/Datasets.html [18]: SRBCT, Central Nervous System (CNS), Leukemia-2c (ALL-AML), Leukemia-3c (ALL-AML-3), Leukemia-4c (ALL-AML-4), Lymphoma, and mixed-lineage leukemia (MLL). Table 1 gives a brief summary of the datasets: the total number of samples, the number of genes per sample, and the number of classes.

4 Results We evaluated the proposed method on seven benchmark cancer datasets; the results are summarized in Table 2. The proposed approach was evaluated with the classifiers NB, SVM, and DT and with two deep learning models, NN and 1DCNN, and the average of their accuracies was computed for each dataset. The proposed method achieved the following average classification accuracies: SRBCT 99.13%, ALL-AML 99.99%, ALL-AML-3c 99.50%, ALL-AML-4c 99.16%, Lymphoma 99.97%, MLL 98.12%, and CNS 87.19%. The classification accuracy of the proposed RFE-INFO method on the seven datasets was then compared with six well-known classification methods: SVM, random forest, FCBF, BPSO, PSO-DT, and MBEGA. The proposed technique outperformed them on Leukemia-3c, Leukemia-4c, Lymphoma, and CNS; the results are listed in Table 3. Figure 3 compares the accuracy of the proposed technique with that of the other classification methods.
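As a worked check of this averaging, the SRBCT average combines the five per-classifier accuracies reported for SRBCT in Table 2:

$$\frac{100 + 100 + 95.64 + 100 + 100}{5} = \frac{495.64}{5} = 99.128 \approx 99.13\%$$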


Fig. 3 Comparison of proposed and other methods

Table 2 Classification accuracy (%) of the proposed method with different classifiers

| Dataset | Naive Bayes | SVM | DT | NN | 1DCNN | Average |
|---|---|---|---|---|---|---|
| SRBCT | 100 | 100 | 95.64 | 100 | 100 | 99.13 |
| Leukemia | 100 | 100 | 99.98 | 100 | 100 | 99.99 |
| Leukemia-3c | 100 | 100 | 99.93 | 100 | 97.41 | 99.50 |
| Leukemia-4c | 100 | 100 | 95.73 | 100 | 100.00 | 99.16 |
| Lymphoma | 100 | 100 | 99.87 | 100 | 100 | 99.97 |
| MLL | 100 | 97.33 | 99.90 | 93.33 | 100 | 98.12 |
| CNS | 91.66 | 83.33 | 94.33 | 75 | 91.66 | 87.19 |

Table 3 Classification accuracy (%) of RFE-INFO and other ML methods

| Dataset | SVM [19] | RF [20] | FCBF [21] | BPSO [22] | PSO-DT [23] | MBEGA [18] | Proposed |
|---|---|---|---|---|---|---|---|
| SRBCT | 64.94 | 100.00 | 98.94 | 99.00 | 92.49 | 99.23 | 99.13 |
| ALL-AML | 65.28 | 94.44 | 100.00 | 100.00 | 95.83 | 95.89 | 99.99 |
| ALL-AML-3 | 52.89 | 83.33 | 95.83 | 97.22 | 95.83 | 96.64 | 99.50 |
| ALL-AML-4 | 52.78 | 76.39 | 95.00 | 91.66 | 94.44 | 91.93 | 98.16 |
| Lymphoma | 69.70 | 96.97 | 98.48 | 95.45 | 98.50 | 97.68 | 99.97 |
| MLL | 65.28 | 94.44 | 98.61 | 95.8 | 94.04 | 94.33 | 98.12 |
| CNS | 65.40 | 58.33 | 76.67 | 53.33 | 58.33 | 72.21 | 87.19 |


5 Conclusions and Future Work In this work, a two-phase hybrid model, RFE-INFO-1DCNN, is proposed and evaluated on seven benchmark cancer datasets. Because feature selection is more credible than feature extraction, a superior feature selection method, RFE, is combined with mutual information to reduce the dimensionality of gene expression data. It eliminates irrelevant and redundant genes and thereby reduces the dimensionality of the dataset. We then expand the number of samples to better train the classification models. Classification is performed with NB, SVM, DT, and two deep learning models, NN and 1DCNN. The proposed method can serve as a good preprocessing tool to optimize the feature selection process. It provides better-selected genes and higher classification accuracy while trying to keep the required computational resources to a minimum. As future work, the proposed technique can be applied to other tissue-related and tumorous diseases, and the algorithm's computational cost and running time can be reduced.

References
1. He J, Sun M-A, Wang Z, Wang Q, Li Q, Xie H (2015) Characterization and machine learning prediction of allele-specific DNA methylation. Genomics 106(6):331–339
2. Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C et al (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7(6):673–679
3. Ai-Jun Y, Xin-Yuan S (2010) Bayesian variable selection for disease classification using gene expression data. Bioinformatics 26(2):215–222
4. Piras V, Selvarajoo K (2015) The reduction of gene expression variability from single cells to populations follows simple statistical laws. Genomics 105(3):137–144
5. Alshamlan HM, Badr GH, Alohali YA (2015) Genetic bee colony (GBC) algorithm: a new gene selection method for microarray cancer classification. Comput Biol Chem 56:49–60
6. Jain I, Jain VK, Jain R (2018) Correlation feature selection based improved-binary particle swarm optimization for gene selection and cancer classification. Appl Soft Comput 62:203–215
7. Sanz H, Valim C, Vegas E, Oller JM, Reverter F (2018) SVM-RFE: selection and visualization of the most relevant features through non-linear kernels. BMC Bioinform 19(1):1–18
8. Zhao G, Wu Y (2016) Feature subset selection for cancer classification using weight local modularity. Sci Rep 6:34759
9. Min S, Lee B, Yoon S (2017) Deep learning in bioinformatics. Briefings Bioinform 18(5):851–869
10. Liu J, Wang X, Cheng Y, Zhang L (2017) Tumor gene expression data classification via sample expansion-based deep learning. Oncotarget 8(65):109646
11. Ibrahim N, Hamid H, Rahman S, Fong S (2018) Feature selection methods: case of filter and wrapper approaches for maximising classification accuracy. Pertanika J Sci Technol 26:329–340
12. Guo S-B, Lyu MR, Lok T-M (2006) Gene selection based on mutual information for the classification of multi-class cancer. In: International conference on intelligent computing. Springer, Berlin, pp 454–463
13. Fakoor R, Ladhak F, Nazi A, Huber M (2013) Using deep learning to enhance cancer diagnosis and classification. In: Proceedings of the international conference on machine learning, vol 28. ACM, New York, USA
14. Shah SH, Iqbal MJ (2020) Optimized gene selection and classification of cancer from microarray gene expression data using deep learning. In: Special issue on new trends in bio-inspired computing for deep learning applications. Springer, Berlin
15. Zeebaree DQ, Haron H, Abdulazeez AM (2018) Gene selection and classification of microarray data using convolutional neural network. In: 2018 International conference on advanced science and engineering (ICOASE). IEEE, pp 145–150
16. Duan K-B, Rajapakse JC, Wang H, Azuaje F (2005) Multiple SVM-RFE for gene selection in cancer classification with expression data. IEEE Trans Nanobiosci 4(3):228–234
17. Huang M-L, Hung Y-H, Lee W, Li R-K, Jiang B-R (2014) SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier. Sci World J
18. Zhu Z, Ong Y-S, Dash M (2007) Markov blanket-embedded genetic algorithm for gene selection. Pattern Recogn 40(11):3236–3248
19. Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10):906–914
20. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
21. Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Proceedings of the 20th international conference on machine learning (ICML-03), pp 856–863
22. Chuang L-Y, Yang C-S, Wu K-C, Yang C-H (2011) Gene selection and classification using Taguchi chaotic binary particle swarm optimization. Expert Syst Appl 38(10):13367–13377
23. Chen K-H, Wang K-J, Wang K-M, Angelia M-A (2014) Applying particle swarm optimization-based decision tree classifier for cancer classification on gene expression data. Appl Soft Comput 24:773–780

Chapter 8

Biased Online Media Analysis Using Machine Learning Arpit Gupta, Anisha Kumari, Ritik Raj, Akanksha Gupta, Raj Nath Shah, Tanmay Jaiswal, Rupesh Kumar Dewang, and Arvind Mewada

1 Introduction The media has always influenced politics, and it contains information that can be used to predict future trends. People tend to prefer one news channel or source over another. This trust may sometimes mislead the masses, persuading them to believe news that is irrelevant, fake, or biased in one way or another. Motivated by this, we searched for and collected thousands of news blogs from various Indian media sources published online, including their full 'descriptions' along with their 'title', 'media source', 'time of news', and 'author'. We first filtered and refined the data, removing redundant and irrelevant records, and then developed a predictive machine learning model that classifies the articles as biased or unbiased [1, 2]. To make this research more relevant, we targeted news reports related to politics and elections, specifically the 'North Eastern Indian States Elections of February 2018'. The collected dataset was classified using different machine learning models in Python, as discussed in the proposed work section. We review related work in Sect. 2 and present the proposed work in Sect. 3. Section 4 describes the experimental setup and methodology and presents the results, and Sect. 5 concludes the paper.

A. Gupta · A. Kumari · R. Raj · A. Gupta · R. N. Shah · T. Jaiswal · R. K. Dewang · A. Mewada (B) Computer Science and Engineering Department, Motilal Nehru National Institute of Technology Allahabad, Prayagraj 211004, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_8


A. Gupta et al.

2 Related Works Bias analysis is a broad term that covers several kinds of bias, such as viewpoint bias, sex bias, ideological bias, and political bias.

2.1 Ideological Bias Detection Many researchers have taken up this problem, but most of the work concerns ideological bias in US Congress sessions. One such study is 'Political Bias Analysis' by Misra and Basak [3], who, as we did, manually labelled their data: US congressional floor debate transcripts comprising 2025 liberal and 1701 conservative biased sentences. Rory How [4] used an RNN to train on the data because of the limited availability of data [5].

2.2 Language Bias Detection Another line of bias analysis examines human edits of Wikipedia articles, known as epistemological bias detection. Tae Yano et al. [6] used supervised machine learning models for this detection. Moreover, Zubin Jelveh [7] used supervised models to predict political bias in research papers, and supervised and unsupervised approaches have been compared [8].

2.3 Political Text Bias Detection A related research topic is bias detection in political blogs. Jiang et al. [9] identify subjective sentences in blog posts and the opinions they express in order to classify the text politically. A good source of data is the Bitter Lemons blog, combined with parliamentary speeches and politicians' public statements [10].

2.4 News Articles Bias Detection The QUOTUS framework [11] extracts politicians' quotes from news articles and blog posts to create an outlet-to-quote matrix. The authors used this matrix to predict whether an outlet would report a quote and discovered latent dimensions of political bias. Chen et al. [12] evaluate whether users react favourably or adversely to news opinion pieces in order to label them liberal or conservative. Similarly, Dallmann [13] and Krestel [11] address the bias detection problem in Germany. Cruz et al. [14] examined selection bias in newspapers, i.e. how much space each newspaper dedicates to a particular political party. They used a dataset of online articles from the Guardian and the Telegraph collected from 2000 to 2015, and used tools and APIs such as the IBM Alchemy API and the Stanford CoreNLP tool for classification.

3 Proposed Methods and Discipline Building on existing work on biased media detection, we incorporated some new ideas and concepts from natural language processing, as described in the following sections:

3.1 Data Collection Since this research targeted a specific area, it was not clear in advance what data could be fetched. The area chosen for the proposed work was, as stated earlier, the 'North Eastern Indian States Elections of February 2018'. Most of the data came from https://www.newsapi.org, http://www.ndtv.com, https://www.indiatoday.in and https://news.google.co.in. After collection, the dataset included 24,365 articles from different news sources, manually labelled by the research group members, who classified each article individually into one of two categories: 'biased' (12,182 articles) and 'unbiased' (12,183 articles). The dataset was then split 80:20 into training and testing sets, giving 19,492 articles for training and 4873 articles for testing.
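The 80:20 split can be reproduced with scikit-learn's `train_test_split`. In this sketch the labels are placeholders built from the reported class counts, and stratification and the random seed are assumptions; the paper does not say how the split was drawn.

```python
from sklearn.model_selection import train_test_split

# Placeholder labels standing in for the 24,365 manually labelled articles
# (1 = biased, 0 = unbiased); the article texts themselves are not reproduced here.
labels = [1] * 12182 + [0] * 12183
article_ids = list(range(len(labels)))

# Stratifying keeps the biased/unbiased balance in both parts (an assumption).
train_ids, test_ids, y_train, y_test = train_test_split(
    article_ids, labels, test_size=0.2, stratify=labels, random_state=42
)
print(len(train_ids), len(test_ids))  # 19492 4873
```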

3.2 Data Labelling Data labelling was one of the essential parts of the research. The required dataset was not available online because it consisted of news articles about recent elections from Indian news media sources. We therefore fixed specific parameters to label the articles correctly; these parameters mostly followed the intuitive approach of flagging over-criticism or over-favouring of any political party. Since the data was collected from a particular domain, we could label the same news from the perspectives of different news media.


3.3 Data Pre-processing The dataset must be clean and well organized for the machine learning models to run efficiently and obtain decent accuracy. The collected data was therefore formatted into a CSV file and pre-processed. Many redundant and irrelevant rows had crept into the dataset and were removed, and of the various attributes obtained from the sources, the most significant ones were extracted: description, title, author and source. We then applied tokenization, stemming and stop-word removal to obtain good accuracy:

1. Tokenization: Each word of the article was extracted and stored as a separate entity. Here, the article means the appended values of description, title, author and source.
2. Stemming: Stemming is the process of reducing inflected words to their word stem. Without it, the classifier does not recognize that the verb forms investing and invested are related, and treats them as different words with distinct frequencies. Stemming both to invest groups the frequencies of the different inflections under one term.
3. Stop Words Removal: Stop words are typically high-frequency words that add no information for labelling and can confuse the classifier; examples are the, is, at and which.
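A minimal sketch of the three pre-processing steps follows. It assumes NLTK's `PorterStemmer` (the paper names NLTK but not the stemmer) and uses scikit-learn's built-in English stop-word list in place of whichever list the authors used; the regular-expression tokenizer and the `preprocess_article` helper are hypothetical. Stop words are removed before stemming here so that they are matched in their surface form.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

stemmer = PorterStemmer()


def preprocess_article(description, title, author, source):
    """Tokenize, drop stop words and stem one article (hypothetical helper)."""
    # The "article" is the appended description, title, author and source.
    text = " ".join([description, title, author, source]).lower()
    tokens = re.findall(r"[a-z0-9]+", text)                    # 1. tokenization
    tokens = [t for t in tokens if t not in ENGLISH_STOP_WORDS]  # 3. stop-word removal
    return [stemmer.stem(t) for t in tokens]                    # 2. stemming
```

For example, `preprocess_article("Investing and invested in the elections", "", "", "")` maps both verb forms to the stem invest and drops and, in and the.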

3.4 Model Selection After data pre-processing, an appropriate model must be selected. The features of these models are the words or terms in the training dataset. We worked through different models in turn to find one with better accuracy than the rest. The following steps were common to all of them:

Counting Frequencies: At this point the dataset is free of stop words and contains only word stems (base forms). A frequency distribution of these words is kept for each training row (article).
TF-IDF Conversion: Some insignificant words may still remain in the dataset after stop-word removal and stemming, and their high frequency might affect the model's prediction accuracy. For example, the term also is removed neither by stemming nor by stop-word removal, and may have a considerable frequency because it is a common English word. Applying TF-IDF normalizes such overrated frequencies.
N-gram: An n-gram is a contiguous sequence of n items from a given text or speech sample. Depending on the application, the items can be phonemes, syllables, letters, words or base pairs. Sometimes, as in this case, combining two or more words into a single feature gives more accurate results than using single words as features [15].

The following models were trained on the training dataset to compare their accuracies: a naive Bayes classifier, a logistic regression model, an SVM and an XGBoost classifier. They are discussed later in the paper.
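Counting frequencies over unigrams and bigrams can be sketched with scikit-learn's `CountVectorizer`. The two toy articles are hypothetical, already-stemmed strings, and `ngram_range=(1, 2)` is an assumed setting; the paper does not state which n-gram sizes were used.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical, already pre-processed (stemmed, stop-word-free) articles.
docs = [
    "elect result announc parti win",
    "parti win elect state",
]
vectorizer = CountVectorizer(ngram_range=(1, 2))  # unigrams and bigrams as features
counts = vectorizer.fit_transform(docs)            # sparse article-by-term count matrix
vocab = vectorizer.vocabulary_                     # term -> column index
print(counts[1, vocab["parti win"]])  # 1
```

Bigrams such as "parti win" keep word order information that single-word features lose.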

4 Experimental Setup The back end was developed in Python, with Sublime Text as the editor. Python libraries and packages were used as follows: NLTK and scikit-learn for most of the natural language processing, csv and pandas for handling CSV files, XGBoost for boosted decision trees, and Django for the web framework; the user interface was also developed in Django. The following steps were taken to obtain a working experimental setup:

4.1 Data Representation Vector Count Vectorizer: A count vectorizer tokenizes a collection of text documents and builds a vocabulary of known words. It produces a sparse article-by-vocabulary matrix containing the frequency of each word in each article. TF-IDF Calculation [11]: TF-IDF normalizes word frequencies using the following equations:

$$\mathrm{TFIDF}(t, d) = \mathrm{TF}(t, d) \cdot \mathrm{IDF}(t) \qquad (1)$$

$$\mathrm{IDF}(t) = \log \frac{n}{\mathrm{DF}(t)} + 1 \qquad (2)$$

where $t$ is a term, $d$ is a document, TF is the term frequency, IDF is the inverse document frequency, $\mathrm{DF}(t)$ is the number of documents containing $t$ (document frequency), and $n$ is the total number of documents.
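Eqs. (1) and (2) can be computed directly and checked against scikit-learn. This sketch assumes raw counts for TF and the natural logarithm; under those assumptions `TfidfTransformer(smooth_idf=False, norm=None)` applies the same formula. The documents are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

docs = ["parti win elect", "parti lose elect", "state elect result"]  # hypothetical
counts = CountVectorizer().fit_transform(docs)  # sparse TF(t, d): raw counts
tf = counts.toarray().astype(float)

n = tf.shape[0]                   # total number of documents
df = (tf > 0).sum(axis=0)         # DF(t): number of documents containing t
idf = np.log(n / df) + 1          # Eq. (2), natural log
tfidf_manual = tf * idf           # Eq. (1)

# scikit-learn uses the same IDF when smoothing and normalization are switched off.
tfidf_sklearn = TfidfTransformer(smooth_idf=False, norm=None).fit_transform(counts).toarray()
print(np.allclose(tfidf_manual, tfidf_sklearn))  # True
```

A term such as elect, which appears in every document, gets $\mathrm{IDF} = \log 1 + 1 = 1$, so its weight is not boosted, whereas rarer terms are weighted up.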

4.2 Model Setup Naive Bayes Classifier: It is based on Bayes' theorem, with each feature considered independent of the others. Bayes' theorem is expressed by the following equation:

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} \qquad (3)$$


where $A$ and $B$ are events, $P(A \mid B)$ is the conditional probability of $A$ occurring given that $B$ is true, $P(B \mid A)$ is the conditional probability of $B$ occurring given that $A$ is true, and $P(A)$ and $P(B)$ are the probabilities of $A$ and $B$. The naive Bayes classifier [16] was implemented in Python using the MultinomialNB class of the scikit-learn library; its accuracy was decent but unsatisfying, so we tried the other models discussed below. Logistic Regression: The cost function of logistic regression is

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta\left(x^{(i)}\right)\right) \right] \qquad (4)$$

where $x^{(i)}$ is the feature vector of the $i$th training example, $y^{(i)} \in \{0, 1\}$ is its output, and $m$ is the number of training examples. $h_\theta(x)$ is the hypothesis function, given by

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}} \qquad (5)$$
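Eqs. (4) and (5), together with a plain gradient-descent loop for minimizing $J(\theta)$, can be sketched in NumPy as follows. The learning rate, iteration count, and function names are illustrative assumptions; scikit-learn's `LogisticRegression` uses its own solvers and regularization rather than this loop.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost(theta, X, y):
    """Eq. (4): rows of X are feature vectors x^(i); y holds 0/1 labels."""
    h = sigmoid(X @ theta)  # Eq. (5): h_theta(x) for every example
    m = len(y)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m


def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Minimize Eq. (4) with batch gradient descent (illustrative settings)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)  # dJ/dtheta
        theta -= alpha * grad
    return theta
```

At $\theta = 0$ every prediction is 0.5, so $J(0) = \log 2$ regardless of the labels, which is a convenient sanity check.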

Applying logistic regression improved the accuracy compared with the naive Bayes classifier. Minimizing the above cost function to improve accuracy requires gradient descent, which does not perform well when the dataset is not large enough, so there was still room for improvement and we proceeded with the models below. Support Vector Classifier: In an SVM [17], every training example with n features is represented as a point in an n-dimensional space, and hyper-planes are chosen to separate the different classes. Fig. 1 depicts data in three-dimensional space: the green plane is the separating hyper-plane, the red dots are the support vectors of class 1, and the blue dots are those of class 2. An SVM performs better than other machine learning classification models when the dataset is not large enough, and in this case it improved the accuracy to a sufficiently good extent compared with the models discussed above. XGB Classifier: XGB, the extreme gradient boosting classifier [18], optimizes the XGBoost objective function using the boosting technique. It is a gradient boosting framework with a significant difference: it is comparatively faster and more accurate. An ensemble technique trains several models on the dataset and combines them to achieve higher accuracy; its two main forms are bagging and boosting. In bagging [19], the models are trained independently of one another, whereas in boosting the models are trained sequentially, each new learner being built after the previous ones; Fig. 2 illustrates the difference. XGB differs from other gradient boosting algorithms in the following features: it is a regularized gradient boosting algorithm; it uses multiple CPU cores during training for parallel tree construction; it makes the best use of hardware through cache optimization; it supports out-of-core computing for vast datasets that do not fit into memory; and it provides continued training, so that an already fitted model can be further boosted on new data. Applying this model gave the best accuracy of all the models discussed above.

Fig. 1 SVM data points in 3D [17]

Fig. 2 Difference between bagging and boosting [18]

4.3 Result and Analysis

The dataset described in Sect. 3.1 was used for training and testing. We divided it into an 80% training set and a 20% testing set. The classifiers described above were trained and evaluated on this split. The confusion matrix of the XGBoost classifier is shown in Table 1, and the accuracies of all models are shown in Table 2. These results show that the XGBoost classifier performed better than the other models. SVM and logistic regression performed decently, and naive Bayes gave the least accurate results (Fig. 3).

A. Gupta et al.

Table 1 Confusion matrix for biased media (rows: actual class; columns: predicted class)

Actual \ Predicted    Biased    Unbiased
Biased                1590      847
Unbiased              825       1611

Table 2 Accuracy obtained from different machine learning models

Models                    Accuracy (%)
Naïve Bayes classifier    61.0
Logistic regression       62.4
SVM                       64.6
XGBoost classifier        65.6

Fig. 3 Graph comparing accuracy by different models
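Table 1 can be cross-checked against Table 2 with a few lines of Python. This minimal sketch rests on two assumptions. First, rows are the actual class and columns the predicted class, which is our reading of the flattened table. Second, "biased" is treated as the positive class. The function name is hypothetical. Accuracy does not depend on either assumption: it works out to (1590 + 1611)/4873 ≈ 65.7%, within 0.1 percentage point of the 65.6% listed for XGBoost in Table 2. Precision and recall would swap if the orientation were the other way round.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class of a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Table 1 read as rows = actual class and columns = predicted class,
# with "biased" taken as the positive class (both are assumptions).
metrics = classification_metrics(tp=1590, fn=847, fp=825, tn=1611)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```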

5 Conclusion

The main idea of this research is to detect bias in the news media by counting how many times a media outlet has spoken about a particular party or person, either negatively or positively. A newspaper's bias can be reflected in its views. For example, if it talks more about certain political parties or individuals, this indicates a bias towards them. Beyond classifying text as biased or unbiased through sentiment analysis or opinion mining, media bias analysis encompasses a wide range of other aspects. In addition, we intend to create a graphical user interface (GUI) platform in which details such as ‘title’, ‘description’ and ‘author’ can be entered and the degree of bias is displayed in numerical form. Once the model has been trained, the GUI platform will not need to retrain it. The overarching goal of the proposed model is to examine bias in online media and to see how facts and figures spread and change among people. Journalists and readers play a vital role in information diffusion as influencers, and the media chooses to promote certain information or people to shape public opinion. Detecting selection bias in political news is the primary focus of this analysis. The proposed method may also help identify those who have been paid to write biased content about individuals or political parties.

References

1. Aggarwal S, Sinha T, Kukreti Y, Shikhar S (2020) Media bias detection and bias short term impact assessment. Array 6:100025
2. Nikhil DV, Dewang RK, Sundar B, Agrawal A, Shrestha AN, Tiwari A (2021) An android application for automatic content summarization of news articles using multilayer perceptron. In: Proceedings of international conference on big data, machine learning and their applications. Springer, Singapore, pp 379–394
3. Misra A, Basak S (2016) Political bias analysis 8
4. How R (2020) Measuring political bias in British media: using recurrent neural networks for long form textual analysis
5. Iyyer M, Enns P, Boyd-Graber J, Resnik P (2014) Political ideology detection using recursive neural networks. In: Proceedings of the 52nd annual meeting of the association for computational linguistics, vol 1: Long Papers, pp 1113–1122
6. Yano T, Resnik P, Smith N (2010) Shedding (a thousand points of) light on biased language. In: Proceedings of the NAACL HLT 2010 workshop on creating speech and language data with Amazon’s Mechanical Turk
7. Jelveh Z, Kogut B, Naidu S (2014) Detecting latent ideology in expert text: evidence from academic papers in economics. In: Proceedings of EMNLP
8. Hube C, Fetahu B (2018) Detecting biased statements in Wikipedia. In: Companion proceedings of the Wikipedia web conference 2018, pp 1779–1786
9. Jiang M, Argamon S (2008) Exploiting subjectivity analysis in blogs to improve political leaning categorization. In: Proceedings of SIGIR
10. Gangula RRR, Duggenpudi SR, Mamidi R (2019) Detecting political bias in news articles using headline attention. In: Proceedings of the 2019 ACL workshop BlackboxNLP: analyzing and interpreting neural networks for NLP, pp 77–84
11. Lazaridou K, Krestel R (2016) Identifying political bias in news articles. Bull IEEE TCDL 12
12. Chen W-F, Al-Khatib K, Stein B, Wachsmuth H (2020) Detecting media bias in news articles using gaussian bias distributions. arXiv preprint arXiv:2010.10649
13. Dallmann A, Lemmerich F, Zoller D, Hotho A (2015) Media bias in German online newspapers. In: Proceedings of the 26th ACM conference on hypertext and social media, pp 133–137
14. Cruz AF, Rocha G, Cardoso HL (2020) On document representations for detection of biased news articles. In: Proceedings of the 35th annual ACM symposium on applied computing, pp 892–899
15. Dewang RK, Singh AK (2018) State-of-art approaches for review spammer detection: a survey. J Intell Inf Syst 50(2):231–264
16. Rish I (2001) An empirical study of the naive Bayes classifier. In: IJCAI 2001 workshop on empirical methods in artificial intelligence, vol 3, no 22, pp 41–46
17. Mewada A, Dewang RK (2021) Research on false review detection methods: a state-of-the-art review. J King Saud Univ-Comput Inf Sci
18. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794
19. Quinlan JR (1996) Bagging, boosting, and C4.5. In: AAAI, vol 1, pp 725–730

Chapter 9

Coverless Information Hiding: A Review

Nitin Kanzariya, Dhaval Jadhav, Gaurang Lakhani, Uttam Chauchan, and Lokesh Gagani

N. Kanzariya (corresponding author) · G. Lakhani
Government Polytechnic Himatnagar, Himatnagar, Gujarat, India

D. Jadhav
Vidyabharti Trust College of Master in Computer Application, Bardoli, Surat, Gujarat, India

U. Chauchan
Vishwakarma Government Engineering College, Ahmedabad, Gujarat, India

L. Gagani
Indus University, Ahmedabad, Gujarat, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_9

N. Kanzariya et al.

1 Introduction

Due to heavy use of the Internet, information security has become a major problem in many fields, such as the economy, banking, the military and medicine. Large amounts of sensitive data are now transmitted over the Internet, including private information, secret passwords, trade codes and secret military information [1]. The rise in cyber attacks and crime has prompted increased focus on protecting data transmitted over public networks such as the Internet. Traditionally, information security has made extensive use of encryption technologies. Their basic principle is to encrypt key data into “unreadable” ciphertext that can only be decoded by a receiver holding the correct key [2]. However, the ciphertext obtained after encryption usually takes the form of a “cryptic code” that readily attracts the attention of attackers seeking to decrypt it [3]. As a result, some applications, such as covert communication, favour information hiding over encryption. Information hiding conceals the very existence of the communication without arousing the suspicion of attackers or observers. Traditional image information hiding changes the pixels of a cover image to hide secret information, which means that the cover image is altered [4]. The modifications made by embedding remain in the cover image, so automated techniques can effectively detect the concealed secret information; this detection process is known as steganalysis. To solve this problem, the coverless information hiding strategy was described by Sun in 2015. Coverless technology establishes a direct mapping between the secret information and the carrier based on the carrier’s own properties. The carrier does not need to be modified, so even an attacker who obtains the carrier cannot recover the secret information from it. Coverless information hiding (CIH) therefore offers unrivalled resistance to steganalysis. An image is made up of a variety of feature data, including high-level semantics, pixel brightness values, colour, texture, edges and contours. Zhou et al. investigated algorithms based on this idea and proposed coverless information hiding using images and text in 2015. With a proper feature description, precise relationships can be established between image features and the hidden data [5]. The sender can then transmit an unmodified image whose features represent the secret information, from which the receiver recovers the same information [6].
This method efficiently defends against steganalysis attacks and substantially improves the security of confidential information. The rest of the paper is organized as follows. Section 2 discusses related work, such as conventional image steganography, steganalysis and coverless image steganography, as well as recent developments and significant difficulties. Section 3 presents some basic methods of coverless image steganography, and Sect. 4 presents the performance evaluation. Section 5 discusses the current key challenges. Finally, Sect. 6 presents the conclusion and future work.

2 Background Information and Similar Work

2.1 Traditional Image Steganography

Steganography is a method of hidden communication that is not visible to the human eye. Carrier, message and password are the three components of steganography. The carrier, also known as the cover-object, is a container into which a message is inserted and which conceals the message’s presence. Steganography can be classified by the type of digital cover media [2, 7]. The four primary kinds are (i) text steganography, (ii) image steganography, (iii) audio steganography and (iv) video steganography. Images are the most popular cover objects in steganography. Only a receiver holding the correct decoding key, known as the stego-key, can extract the message from the cover-object. The stego-image is the cover image with the covertly inserted message. There are two types of image steganography methods:
• Spatial domain steganography.
• Transform domain steganography.
The LSB technique [7] is the most popular and straightforward data concealment strategy. HUGO [8], WOW [9], UNIWARD and other spatial-domain algorithms build on the basic principle of LSB embedding. Like LSB, they modify pixel values, but they choose adaptively where to embed in order to reduce detectable distortion. Numerous transform domain methods have also been proposed, which hide data in DWT, DFT, DCT or IWT coefficients. Nonetheless, traditional image steganography inevitably changes the image’s content to some extent, which makes it difficult to avoid detection by image steganalysis tools.

9 Coverless Information Hiding: A Review
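The LSB idea, and why it leaves traces, can be shown with a minimal Python sketch. It assumes the cover is a flattened list of 8-bit grayscale pixel values and embeds one bit per pixel in raster order. The function names are hypothetical, and no real image library is involved.

```python
from typing import List


def lsb_embed(pixels: List[int], bits: str) -> List[int]:
    """Replace the least significant bit of the first len(bits) pixels."""
    if len(bits) > len(pixels):
        raise ValueError("message longer than cover capacity")
    stego = list(pixels)
    for i, b in enumerate(bits):
        stego[i] = (stego[i] & ~1) | int(b)  # clear the LSB, then set it to the message bit
    return stego


def lsb_extract(pixels: List[int], n_bits: int) -> str:
    """Read back the LSBs of the first n_bits pixels."""
    return "".join(str(p & 1) for p in pixels[:n_bits])


cover = [200, 13, 54, 77, 128, 255, 0, 91]
stego = lsb_embed(cover, "10110010")
print(stego)                   # [201, 12, 55, 77, 128, 254, 1, 90]
print(lsb_extract(stego, 8))   # 10110010
```

Every pixel changes by at most 1. This is invisible to the eye, but it is exactly the kind of statistical trace that the steganalysis tools discussed next look for.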

2.2 Steganalysis

When embedding involves changing pixel values of the cover image, some traces of alteration are inevitably left behind. Existing steganalysis tools can detect these traces and thereby expose the secret data [8]. Statistical anomalies are used to judge whether information has been embedded in the carrier data [10]. The most important approach for steganalysis is to use feature extractors [11] such as SPAM [12] and SRM [13]. Common classifiers such as SVMs, decision trees and ensembles are also significant for steganalysis. Nowadays, neural networks are widely used for steganalysis, and experimental findings suggest that CNNs can achieve higher classification accuracy than traditional classifiers.

2.3 Coverless Information Hiding

Coverless information hiding is a way of transmitting secrets without altering the cover image. The term “coverless” does not mean that no cover is needed [1]. Rather, unlike traditional image steganography, coverless information hiding does not modify the cover image. Instead, it uses the image’s inherent characteristics to represent the concealed information directly. A proper feature description can be used to establish links between feature information and hidden information. As shown in Fig. 1, coverless steganography can be divided into two categories: mapping coverless steganography and fully structured coverless steganography.


Fig. 1 Types of coverless information steganography

In structured steganography, the stego-object is generated directly from the secret data according to particular rules [11]. Researchers have proposed a number of coverless structural steganography approaches on this foundation. Fully structured algorithms offer low capacity. In addition, countering attacks on samples and the extraction of secret information, both of which require sophisticated computational support, is difficult [14]. In contrast to structured steganography, mapping methods express the secret information through a meaningful mapping relationship established between the carrier and the secret information [15]. Because of the limits of the mapping process, the number of image samples required rises exponentially with capacity [16]: to represent every k-bit sequence, at least 2^k images with distinct features are needed. The basic idea is to extract image features and map them to the secret information using a set of rules based on the image’s attributes. In this way, the carrier itself conveys the necessary information, which significantly enhances security by avoiding steganalysis detection. The main contributions of coverless information hiding are:
1. No modifications are made to the stego-image to carry out covert communication.
2. Because the stego-image is not modified, currently available steganalysis tools cannot detect the secret information.

2.4 Progress in the Past Five Years

Zhou et al. [17] proposed building an image database by downloading a collection of images from the Internet. In the next phase, a robust hashing technique generates a hash value for each image. The images are then indexed in an inverted index structure using these hashes, and the hashing scheme is shared between sender and receiver. The secret message is then sent as follows. First, the sender converts it into a binary bit stream. Second, the stream is divided into equal-length segments. Third, images whose hash sequences match the message segments are found. Finally, the selected images are sent to the receiver as stego-images.

Zheng et al. presented a robust hashing method that creates an 18-bit binary hash sequence for each image using the scale-invariant feature transform (SIFT) [18]. In the first stage, a local image database is generated that contains the images and the hash of each image. However, the authors noted a difficulty: to represent all 18-bit possibilities, the database must contain at least 2^18 images with distinct hashes. To achieve the desired database capacity, the authors represented every image by a unique 18-bit hash code. Finally, the secret message is divided into 18-bit segments matching the image hash size. For each segment, an image whose hash equals the segment is selected as the carrier and sent to the receiver.

Wu et al. [7] proposed a technique based on the grayscale gradient co-occurrence matrix. The message is first represented as a binary stream, which is divided into equal 8-bit segments. A turbo encoder then converts each 8-bit segment to 16 bits. Finally, an image that matches each 16-bit segment, as specified by the mapping function, is sought.

Zhili Zhou presented a Faster-RCNN-based strategy for coverless information hiding in the cloud environment [9]. In this approach, original images whose features can convey the secret information are used as stego-images. Existing coverless information hiding methods rely on precise image features to hide information, which makes it difficult to achieve the desired robustness against image attacks, and their hiding capacity is also limited. To overcome these issues, the authors developed a robust image coverless information hiding technique based on Faster-RCNN. Faster-RCNN recognizes and locates objects in images, and their labels are then used to represent the hidden information. Because the original, unaltered images are used as stego-images, the approach can withstand steganalysis and does not arouse attackers’ suspicion. In terms of robustness and capacity, the system improves on traditional coverless information hiding techniques.

3 Fundamental Framework

The coverless strategy is considered the foundation for securing secret data, so it is crucial to design an efficient fundamental framework. Figure 2 presents the basic structure of coverless image steganography and the steadily improving solutions built on it. This article focuses mainly on feature extraction methods and mapping methods.


Fig. 2 Current methods of coverless image steganography

3.1 Coverless Information Hiding Based on Robust Image Hashing [12]

This technique creates a stable image hash from the orientation information of SIFT features and extracts an 18-bit binary sequence from each image (see the flowchart in Fig. 3). A comparison with an existing coverless steganography method indicated that the new method is more reliable, robust and secure. The four elements of this approach are as follows:

Fig. 3 Suggested steganographic algorithm’s flowchart [12]


Fig. 4 Robust image hash method [12]

3.1.1 Robust Image Hashing Method

We first obtain a 512 × 512 image through preprocessing and then partition it into 3 × 3 blocks. The window size is calculated from the number of stable feature points retrieved from each block, which are selected within a circular area that excludes the block edges. The gradient directions of all sample points inside the window are accumulated into an orientation histogram. If the peak orientation lies between 0° and 90°, the code of the image block is set to 00. If it lies between 90° and 180°, the code is set to 01, and so on for the remaining quadrants (10 and 11). The final 18-bit image hash is created by concatenating the 2-bit codes of the nine blocks in forward order. To illustrate the hashing process, Fig. 4 draws the SIFT feature points (green arrows) computed by this approach.
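The block-coding rule above can be sketched in Python. This is a simplified stand-in rather than the authors' method, and it rests on several assumptions. The magnitude-weighted orientation histogram is computed over every pixel of each block instead of in a window around stable SIFT keypoints. There is no resizing to 512 × 512. Gradients use central differences with the y axis pointing up, and the histogram has 36 bins. All function names are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image as rows of intensities


def dominant_orientation(img: Image, r0: int, r1: int, c0: int, c1: int,
                         bins: int = 36) -> float:
    """Centre (degrees) of the peak bin of a magnitude-weighted orientation histogram."""
    hist = [0.0] * bins
    for r in range(max(r0, 1), min(r1, len(img) - 1)):
        for c in range(max(c0, 1), min(c1, len(img[0]) - 1)):
            gx = img[r][c + 1] - img[r][c - 1]
            gy = img[r - 1][c] - img[r + 1][c]  # rows grow downward; flip for a y-up angle
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(angle // (360.0 / bins)) % bins] += mag
    peak = max(range(bins), key=lambda k: hist[k])
    return (peak + 0.5) * 360.0 / bins


def block_code(angle_deg: float) -> str:
    """Quadrant coding: 0-90 -> 00, 90-180 -> 01, 180-270 -> 10, 270-360 -> 11."""
    return format(int(angle_deg // 90) % 4, "02b")


def robust_hash(img: Image, grid: int = 3) -> str:
    """Concatenate the 2-bit codes of the grid x grid blocks in raster order (18 bits for 3 x 3)."""
    h, w = len(img), len(img[0])
    codes = []
    for i in range(grid):
        for j in range(grid):
            r0, r1 = i * h // grid, (i + 1) * h // grid
            c0, c1 = j * w // grid, (j + 1) * w // grid
            codes.append(block_code(dominant_orientation(img, r0, r1, c0, c1)))
    return "".join(codes)


ramp = [[float(c) for c in range(30)] for _ in range(30)]  # brightness increases to the right
print(robust_hash(ramp))  # 000000000000000000
```

Because only the quadrant of the dominant orientation is kept, small perturbations that do not rotate the peak across a 90° boundary leave the hash unchanged. This coarse quantization is what makes such a hash "robust".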

3.1.2 Inverted Index of Quad Tree Structure

To increase retrieval and matching speed, this hash technique uses an inverted quadtree index structure (see Fig. 5). To keep the secret information from being discovered, each leaf node holds at least two images. The quadtree’s maximum height is 10, and each internal node has four child nodes with values 00, 01, 10 and 11.
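A sketch of the inverted quadtree index follows. It assumes 18-bit hashes stored as bit strings and a tree built from nested dictionaries. Each of the 9 levels below the root consumes two bits, which gives the stated maximum height of 10. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class QuadTreeIndex:
    """Inverted index over 18-bit hashes: each level branches on two bits (00, 01, 10, 11)."""

    def __init__(self, hash_bits: int = 18):
        self.levels = hash_bits // 2  # 9 levels below the root -> tree height 10
        self.root: Dict = {}

    def insert(self, image_hash: str, image_id: str) -> None:
        node = self.root
        for k in range(self.levels):
            node = node.setdefault(image_hash[2 * k:2 * k + 2], {})
        node.setdefault("images", []).append(image_id)  # leaf stores the image list

    def lookup(self, segment: str) -> Optional[List[str]]:
        """Return the images whose hash equals the 18-bit segment, or None."""
        node = self.root
        for k in range(self.levels):
            node = node.get(segment[2 * k:2 * k + 2])
            if node is None:
                return None
        return node.get("images")


idx = QuadTreeIndex()
idx.insert("00" * 9, "a.jpg")
idx.insert("00" * 9, "b.jpg")
print(idx.lookup("00" * 9))  # ['a.jpg', 'b.jpg']
```

Each lookup touches at most 9 nodes regardless of database size. The "at least two images per leaf" rule would be enforced when the database is built, and this sketch does not check it.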

3.1.3 The Process of Information Hiding (Refer Fig. 3)

Step 1: The secret data must be divided into n binary segments. Assuming that the length of the secret data is L bits, n is related to L as follows:

$$
n = \begin{cases} \dfrac{L}{18}, & \text{if } L \bmod 18 = 0 \\[4pt] \left\lfloor \dfrac{L}{18} \right\rfloor + 1, & \text{otherwise} \end{cases} \tag{1}
$$


Fig. 5 Inverted index of quadtree structure [12]

The last segment may contain fewer than 18 bits of secret data. To make it the same length as the preceding segments, it is padded with zeros.

Step 2: Each segment produced in the previous step is matched against the inverted quadtree index, two bits per tree level. Each segment corresponds to a leaf node, and every image stored in that leaf node can carry the segment.

Step 3: A random image is selected from the image database, and LSB replacement steganography is used to store the following information in it:
1. The block orders of the stego-images matched in Step 2.
2. L, the length of the secret data.

3.1.4 Retrieval of Secret Information (Refer Fig. 3)

The receiver obtains all of the images in order, together with the block orders of the images and the length of the secret data from the additional image. To recover the secret information correctly, the number of data bits $l_{\text{last}}$ contained in the last image is computed as follows:

$$
l_{\text{last}} = \begin{cases} 18, & \text{if } L \bmod 18 = 0 \\ L \bmod 18, & \text{otherwise} \end{cases} \tag{2}
$$
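Equations (1) and (2) can be checked with a short Python sketch of the sender-side segmentation and the receiver-side reassembly. It assumes that the padding zeros are appended to the end of the last segment, since the description of where the zeros go is ambiguous. It also assumes the receiver already knows L from the additional LSB-stego image. The function names are hypothetical.

```python
from typing import List, Tuple

SEG = 18  # hash length in bits


def num_segments(L: int, seg: int = SEG) -> int:
    """Eq. (1): number of segments n for a secret of L bits."""
    return L // seg if L % seg == 0 else L // seg + 1


def last_segment_bits(L: int, seg: int = SEG) -> int:
    """Eq. (2): number of real (non-padding) data bits in the last segment."""
    return seg if L % seg == 0 else L % seg


def split_secret(bits: str, seg: int = SEG) -> Tuple[List[str], int]:
    """Sender: cut the bit string into n segments and pad the last one with zeros."""
    L = len(bits)
    segments = [bits[k * seg:(k + 1) * seg] for k in range(num_segments(L, seg))]
    if segments:
        # Assumption: the padding zeros go at the end of the last segment.
        segments[-1] = segments[-1].ljust(seg, "0")
    return segments, L


def join_secret(segments: List[str], L: int, seg: int = SEG) -> str:
    """Receiver: concatenate segments, keeping only l_last bits of the last one."""
    if not segments:
        return ""
    return "".join(segments[:-1]) + segments[-1][:last_segment_bits(L, seg)]


segs, L = split_secret("1" * 40)
print(len(segs), last_segment_bits(L))  # 3 4
```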


3.2 A Novel Coverless Information Hiding Method Based on the Most Significant Bit of the Cover Image [18]

The structure of the CIHMSB technique is depicted in Fig. 6. Before communication begins, the sender and the recipient must agree on Km, a sequence of random integers. Each value in Km represents the serial number of an image fragment, ranging from 1 to the serial number of the last fragment. The secret data are converted to binary, and the cover image is divided into image fragments that carry the secret information. Next, following the order of Km, each image fragment is mapped to binary digits. This mapping produces Kf, which implements the data hiding approach. After receiving Kf and the stego-image from the sender, the receiver segments the stego-image using the same technique as the sender. Using Km and Kf, the binary bits are then retrieved from the fragments. Finally, the receiver combines the binary bits to recover the hidden data. According to the experimental data, the technique hides 2601 bits of secret information per carrier with a PSNR of 1 dB. The three elements of this approach are as follows:

3.2.1 Information Hiding Process

As shown in Fig. 6, the cover image is first divided into many image fragments, and the secret information is preprocessed and converted into binary form. The mapping sequence Km and the mapping flag Kf then define the mapping between the MSBs of the image fragments and the binary secret information. For each secret bit Bi and the fragment Vj selected by Km, Kf is set to 1 if Bi equals the MSB of Vj, and to 0 otherwise. The mapping flag Kf and the stego-image are returned, which completes the data hiding process.

Fig. 6 Proposed method [18]

3.2.2 Secret Information Retrieval

Information recovery is the inverse of information hiding. Two processes are used to extract the secret data: stego-image preprocessing and mapping. After receiving the stego-image, the receiver preprocesses it in the same way as the sender, yielding binary bits. The mapping is then established using the mapping flag Kf and the mapping sequence Km, which produces the binary strings B1, B2, …, Bn. If Kf = 1, the MSB value of the stego-fragment is assigned to Bi. If Kf = 0 and the MSB of the stego-fragment is 0, then Bi = 1. If Kf = 0 and the MSB is 1, then Bi = 0. Finally, the bits B1, B2, …, Bn are concatenated to recover the confidential information sent by the sender, which completes the extraction procedure.
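The hiding and retrieval rules for Km and Kf can be sketched together in Python. The exact fragment feature is not specified above, so the sketch assumes that a fragment's MSB is the top bit of its mean 8-bit intensity. It also uses 0-based fragment indices in Km rather than serial numbers starting at 1. All names are hypothetical.

```python
from typing import List, Sequence


def fragment_msb(fragment: Sequence[int]) -> int:
    """Assumed feature: MSB (bit 7) of the fragment's mean 8-bit intensity."""
    mean = sum(fragment) // len(fragment)
    return (mean >> 7) & 1


def hide(bits: str, fragments: List[Sequence[int]], km: List[int]) -> List[int]:
    """Return the mapping flags Kf; the cover fragments themselves are never modified."""
    if len(bits) > len(km):
        raise ValueError("not enough fragments in Km for the secret bits")
    return [1 if int(b) == fragment_msb(fragments[j]) else 0 for b, j in zip(bits, km)]


def extract(fragments: List[Sequence[int]], km: List[int], kf: List[int]) -> str:
    """Kf = 1: Bi is the MSB; Kf = 0: Bi is the complement of the MSB."""
    out = []
    for j, flag in zip(km, kf):
        msb = fragment_msb(fragments[j])
        out.append(str(msb if flag == 1 else 1 - msb))
    return "".join(out)


fragments = [[10, 20], [200, 210], [130, 140], [90, 60]]  # mean-intensity MSBs: 0, 1, 1, 0
km = [2, 0, 3, 1]                                         # shared random fragment order
kf = hide("1010", fragments, km)
print(kf, extract(fragments, km, kf))  # [1, 1, 0, 0] 1010
```

In this sketch the cover stays untouched and all the information needed to invert the mapping lives in Kf. This is why Kf must travel alongside the stego-image.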

3.3 Coverless Image Steganography Based on Image Segmentation [1]

This work proposes a coverless image steganography approach based on image segmentation. ResNet is used to extract semantic features, and Mask RCNN is used to segment the object regions that conceal the information. The selected object regions lie near the visual centre of the image and retain their structural integrity, so they reduce information loss under intentional attacks. Experimental results demonstrated that the technique is more resistant to geometric attacks than existing coverless image steganography methods.

3.3.1 The Proposed Steganography Scheme

In this framework, Mask RCNN is used to segment a large number of object regions, from which the required object regions are then chosen. Robust hashing is used to construct object-region hash sequences and to build an index for feature matching. The matched stego-images are then delivered to the recipient. The method consists mainly of three components: image segmentation, construction of an inverted index and the steganography process.

Image Segmentation
Semantic segmentation classifies every pixel in an image without regard to bounding boxes. Object detection addresses two issues: determining which objects of specific categories are present in an image, and locating them. Instance segmentation couples semantic segmentation with object detection. It segments the recognized objects and predicts their positions and categories within the image. Panoptic segmentation segments the background and also identifies and segments all objects in an image. Mask RCNN is the recommended network for this type of task. Fig. 7 shows how Mask RCNN may be used to perform object detection, instance segmentation and panoptic segmentation.

Fig. 7 Visual examples of object detection and image segmentation [1]

Fig. 8 Object areas with size [1]

Selection of Object Areas
Mask RCNN is used to segment all of the object regions in many images and to count the object areas of different sizes. The resulting proportions are depicted in Fig. 8. Small regions of 0–5 KB are easily lost and cannot carry enough data, while large regions of more than 50 KB are scarce. Areas of 5–25 KB are therefore best suited for selection.


Fig. 9 Inverted index [1]

Fig. 10 Flowchart of the proposed method [1]

Construction of Inverted Index
To speed up matching between the secret information and the object areas, an index must be constructed. As shown in Fig. 9, the inverted index structure includes an entry for each of the 2^8 = 256 possible 8-bit hash sequences. Each entry stores a collection of object regions, together with the related stego-images and the feature points used to locate those object regions. Every hash-sequence entry should contain at least one image, so that every possible secret segment can be found in the index structure.

Steganography Process Based on Image Segmentation
As seen in Fig. 10, the sender divides the secret data into segments and matches each segment to an object region. The sender then retrieves the corresponding stego-image and feature points. To ensure security, these feature points are encoded in reverse order. The images are then sent to the recipient. The receiver uses Mask RCNN and the feature points to extract the object regions from the stego-images. The hash method is then used to produce the object-region hash sequences, which are concatenated in order to recover the secret information.
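The inverted index and the hide/extract flow can be sketched in Python under heavy simplifications. Object regions are given as flat lists of pixel intensities rather than produced by Mask RCNN. The robust hash is replaced by a hypothetical toy 8-bit threshold hash. The reverse-order encoding of feature points is not modelled. All names are illustrative only.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, Tuple[int, int]]  # (stego-image id, feature point locating the region)


def region_hash(region: Sequence[int]) -> str:
    """Hypothetical toy 8-bit hash: bit k is 1 if slice k (of 8) has mean intensity >= 128."""
    size = len(region) // 8
    bits = []
    for k in range(8):
        chunk = region[k * size:(k + 1) * size]
        bits.append("1" if sum(chunk) / len(chunk) >= 128 else "0")
    return "".join(bits)


def build_index(regions: Dict[Record, Sequence[int]]) -> Dict[str, List[Record]]:
    """Inverted index with one entry per possible 8-bit sequence (2^8 = 256 entries)."""
    index: Dict[str, List[Record]] = {"".join(p): [] for p in product("01", repeat=8)}
    for record, region in regions.items():
        index[region_hash(region)].append(record)
    empty = [code for code, recs in index.items() if not recs]
    if empty:
        raise ValueError(f"{len(empty)} entries have no object region")
    return index


def hide(secret_bits: str, index: Dict[str, List[Record]]) -> List[Record]:
    """Map each 8-bit segment to a matching object region; these records are what gets sent."""
    if len(secret_bits) % 8:
        raise ValueError("secret length must be a multiple of 8 bits")
    return [index[secret_bits[i:i + 8]][0] for i in range(0, len(secret_bits), 8)]


def extract(records: List[Record], regions: Dict[Record, Sequence[int]]) -> str:
    """Receiver side: re-hash each located region and concatenate the hashes in order."""
    return "".join(region_hash(regions[r]) for r in records)


# Toy database: one synthetic 32-pixel "region" per 8-bit code (stand-ins for Mask RCNN output).
regions: Dict[Record, List[int]] = {}
for i, p in enumerate(product("01", repeat=8)):
    code = "".join(p)
    regions[(f"img{i}.jpg", (i, i))] = [200 if b == "1" else 0 for b in code for _ in range(4)]

index = build_index(regions)
secret = "".join(format(ord(ch), "08b") for ch in "Hi")
sent = hide(secret, index)
print(extract(sent, regions) == secret)  # True
```

Building the index fails loudly if any of the 256 entries is empty. This mirrors the requirement above that every hash-sequence entry contain at least one image.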

3.4 Coverless Information Hiding Based on the Generation of Anime Characters [19]

This approach performs coverless information hiding by generating anime characters. The secret data are converted into a set of anime character attribute labels, which then drive the generation of anime characters with GANs. The results show that, compared with existing approaches, the method increases hiding capacity by roughly 60 times and also performs well in terms of image quality and robustness. The method consists of three main modules:
• Creating a label collection from the secret information.
• A position index module.
• An image generation module.

3.4.1 Create a Label Collection from Sensitive Information

This study uses anime character attribute labels to represent secret information, as seen in Fig. 11. It considers 5 hairstyles, 13 hair colours, 10 eye colours and a range of other attributes that have a high probability of appearing, such as blush, smile, open mouth and ribbon. The secret information and the label set are converted into each other using a long short-term memory (LSTM) network, as seen in Fig. 12 [19]. For each anime character, the label-set format proposed in this work can be represented as hairstyle (2 bits), hair colour (3 bits), eye colour (3 bits), other attribute 1 (2 bits), other attribute 2 (2 bits), …, other attribute n (2 bits) (Fig. 17).

Fig. 11 Secret information representation [19]

Fig. 12 Long memory network conversion example [19]

Fig. 13 An example of an index [19]
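The label-set format can be illustrated as fixed-width bit fields. This sketch deliberately replaces the LSTM conversion with a direct binary split. It therefore shows only the bit budget per character, not the paper's learned mapping. Field names such as `other_1` are hypothetical. Note that 2 bits can address only 4 of the 5 hairstyles, and 3 bits only 8 of the 13 hair colours or 10 eye colours, so presumably only a subset of attribute values is used.

```python
from typing import Dict, List, Tuple

# Field widths from the label-set format; the number of "other" attributes is a parameter.
BASE_FIELDS: List[Tuple[str, int]] = [("hairstyle", 2), ("hair_colour", 3), ("eye_colour", 3)]


def layout(n_other: int) -> List[Tuple[str, int]]:
    return BASE_FIELDS + [(f"other_{k + 1}", 2) for k in range(n_other)]


def bits_to_labels(bits: str, n_other: int) -> List[Dict[str, int]]:
    """Split a bit string into per-character attribute indices (one dict per anime character)."""
    fields = layout(n_other)
    width = sum(w for _, w in fields)
    if len(bits) % width:
        raise ValueError(f"bit string must be a multiple of {width} bits")
    chars = []
    for start in range(0, len(bits), width):
        pos, labels = start, {}
        for name, w in fields:
            labels[name] = int(bits[pos:pos + w], 2)
            pos += w
        chars.append(labels)
    return chars


def labels_to_bits(chars: List[Dict[str, int]], n_other: int) -> str:
    """Inverse mapping used on the receiver side."""
    return "".join(format(c[name], f"0{w}b") for c in chars for name, w in layout(n_other))


chars = bits_to_labels("011010110001", n_other=2)
print(chars)  # [{'hairstyle': 1, 'hair_colour': 5, 'eye_colour': 3, 'other_1': 0, 'other_2': 1}]
```

With two extra attributes, each generated character carries 12 bits. Adding attributes raises capacity by 2 bits each.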

3.4.2 Position Index

Figure 13 shows how the position index determines which anime-character positions carry hidden information. In the position matrix, 0 indicates that the anime character at that position does not convey secret data, and 1 indicates that it does.


Fig. 14 Framework of the proposed method [19]

3.4.3 Generation of Stego-Images

The anime character generation network is applied within the CIH scheme. At each indexed position, it generates an anime image that conveys secret information, and the generated characters are constrained mainly by the secret-information labels.

Information Hiding (See Fig. 14)
Step 1: Convert the secret data series S = s1, s2, s3, …, st into a binary string and segment it according to the label structure. Then obtain the primary attribute mapping M and the location index I = l’1, l’2, l’3, …, l’t from the key.
Step 2: Using the LSTM model, transform the binary data stream into the anime character attribute label sequence L = l1, l2, l3, …, lt.
Step 3: Generate anime characters carrying the secret labels at the index positions of the stego-image.
Step 4: Finally, deliver the generated stego-image to the recipient.

Information Extraction
As seen in Fig. 14, extraction recovers the attribute labels from the anime characters. The exact steps are as follows:
Step 1: Using the mapping key, obtain the primary attribute mapping M and the location index I = l’1, l’2, l’3, …, l’t.
Step 2: Extract the attribute label combination L = l1, l2, l3, …, lt.
Step 3: Using the LSTM model, convert the anime character attribute label series back into a binary bit series.
Step 4: Finally, transform the binary bits from Step 3 into the secret data series S = s1, s2, s3, …, st.

3.5 Coverless Image Steganography Based on Jigsaw Puzzle Image Generation [8]

This work presents a novel CIH approach based on generating puzzle images from a hidden message. The original carrier image is divided into rows and columns to produce blocks known as sub-images. Tabs and blanks are added to each block according to the hidden bits and a proposed mapping function. They determine the shape of each puzzle piece and result in a complete puzzle stego-image. The completed puzzle image is then sent to the receiver. Compared with existing coverless steganography algorithms, the experimental findings and analyses indicate good performance in terms of hiding capacity, security and robustness.

3.5.1

Proposed Method

The method has two primary components. Embedding creates the puzzle image that carries the payload, and extraction recovers the secret message. The approach is based entirely on generating puzzle images and mapping the secret message onto them.

Jigsaw Puzzle

A jigsaw puzzle consists of small pieces that fit together to form an image (see Fig. 15a). Most puzzle pieces are rectangular or square, with tabs and blanks on their sides, as shown in Fig. 15b. Each piece has a random arrangement of tabs and blanks. Because a piece has four sides, it can have at most four tabs and blanks in total, as seen in Fig. 15b.

Information Hiding Process

The method takes two inputs: the given image and the secret message. The secret message is first converted into a binary string. The image is then divided into several rows by adding horizontal lines, as shown in the “divided into rows” phase. Next, the image is divided into equal-size blocks by splitting each row into

9 Coverless Information Hiding: A Review


Fig. 15 a Puzzle image. b Structure of jigsaw puzzle piece [8]

Fig. 16 Information hiding framework [8]

columns of comparable width by adding vertical lines, as shown in Fig. 16. These blocks are jigsaw pieces that do not yet have the tabs and blanks described above. Each secret bit is either 0 or 1. If the bit is 1, a tab is attached to the top, bottom, left, or right side of the current block, depending on the block location and the scanning order. If the bit is 0, a blank is inserted on that side instead. This procedure is repeated until all of the hidden bits are represented by tabs and blanks (see the “Puzzle Generation” step in Fig. 16). The lines between the blocks are then erased, leaving a puzzle pattern that serves as the stego-image. Finally, this stego-image is sent to the receiver.

Retrieval of Secret Information

The receiver feeds the received puzzle image into the extraction mechanism. First, the system examines the left and right sides of each block in the puzzle image, row by row, and reads a 1 for each tab and a 0 for each blank. The same procedure is then applied column by column to the top and bottom sides of each block. All recovered bits are concatenated to form the secret bit stream. Finally, the bit stream is split into 8-bit chunks, which are translated into characters and assembled to form the secret message.
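The hiding and retrieval steps above can be sketched as follows. This is a simplified model, not the implementation of [8], and it rests on several assumptions:

- Each block is a dictionary of side shapes.
- One bit is carried by each shared edge between neighbouring blocks, so a tab on one side implies a blank on the other.
- Left/right edges are read row by row, then top/bottom edges column by column.
- Unused edges are padded with 0 bits, which the receiver strips as NUL characters.

Rendering the actual puzzle image is omitted.

```python
def text_to_bits(message: str) -> str:
    """8 bits per character (characters assumed to be in the Latin-1 range)."""
    return "".join(format(ord(ch), "08b") for ch in message)


def bits_to_text(bits: str) -> str:
    """Split into 8-bit chunks, translate each into a character, drop NUL padding."""
    chunks = [bits[i:i + 8] for i in range(0, len(bits) - len(bits) % 8, 8)]
    return "".join(chr(int(chunk, 2)) for chunk in chunks).rstrip("\x00")


def capacity(rows: int, cols: int) -> int:
    """Number of shared edges, i.e. bits, in a rows x cols puzzle."""
    return rows * (cols - 1) + cols * (rows - 1)


def make_puzzle(bits: str, rows: int, cols: int) -> list[list[dict[str, str]]]:
    if len(bits) > capacity(rows, cols):
        raise ValueError("message too long for this grid")
    stream = iter(bits.ljust(capacity(rows, cols), "0"))  # unused edges carry 0
    grid = [[{"top": "flat", "bottom": "flat", "left": "flat", "right": "flat"}
             for _ in range(cols)] for _ in range(rows)]
    for r in range(rows):              # left/right edges, row by row
        for c in range(cols - 1):
            one = next(stream) == "1"
            grid[r][c]["right"] = "tab" if one else "blank"
            grid[r][c + 1]["left"] = "blank" if one else "tab"
    for c in range(cols):              # top/bottom edges, column by column
        for r in range(rows - 1):
            one = next(stream) == "1"
            grid[r][c]["bottom"] = "tab" if one else "blank"
            grid[r + 1][c]["top"] = "blank" if one else "tab"
    return grid


def read_puzzle(grid: list[list[dict[str, str]]]) -> str:
    """Tab -> 1, blank -> 0, in the same scanning order used for embedding."""
    rows, cols = len(grid), len(grid[0])
    bits = ["1" if grid[r][c]["right"] == "tab" else "0"
            for r in range(rows) for c in range(cols - 1)]
    bits += ["1" if grid[r][c]["bottom"] == "tab" else "0"
             for c in range(cols) for r in range(rows - 1)]
    return "".join(bits)
```

Under this one-bit-per-edge assumption, a 4 × 4 puzzle carries 24 bits, which is enough for three 8-bit characters.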


3.6 Coverless Real-Time Image Information Hiding Based on Image Block Matching and Dense Convolutional Network [16]

In this method, high-level semantic features of the cover images are extracted by supervised deep learning. At the same time, hash sequences of the cover images are generated from their DC coefficients. The secret image is conveyed by matching and splicing image blocks. The technique has three primary steps, as shown in Fig. 17.

3.6.1

Preprocessing

A large number of images are searched for and downloaded from the Internet. The images are divided into many non-overlapping blocks, and DenseNet is used to extract robust semantic features from every block for feature matching. An inverted index structure is then built from robust hash codes. These codes are generated with the DCT from the DC coefficients of consecutive image blocks.
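The DC-coefficient hash used to build the index can be sketched as below. The sketch assumes square grayscale blocks, takes "consecutive" blocks in raster order, and sets a bit to 1 whenever a block's DC coefficient is not smaller than the next block's. [16] may define the comparison differently. `dct2_coefficient` evaluates one term of the orthonormal 2-D DCT-II directly, so its DC term equals the block sum divided by the block side N.

```python
import math


def dct2_coefficient(block: list[list[float]], u: int, v: int) -> float:
    """(u, v) coefficient of the orthonormal 2-D DCT-II of a square N x N block."""
    n = len(block)

    def alpha(k: int) -> float:
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    total = 0.0
    for x in range(n):
        for y in range(n):
            total += (block[x][y]
                      * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                      * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
    return alpha(u) * alpha(v) * total


def dc_hash(blocks: list[list[list[float]]]) -> str:
    """Assumed rule: bit i is 1 if DC(block i) >= DC(block i + 1), blocks in raster order."""
    dcs = [dct2_coefficient(b, 0, 0) for b in blocks]
    return "".join("1" if a >= b else "0" for a, b in zip(dcs, dcs[1:]))
```

The resulting bit strings would serve as the keys of the inverted index. A constant 8 × 8 block with value 10 has a DC coefficient of 8 × 10 = 80.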

Fig. 17 Flowchart of proposed method [16]


3.6.2


Sending Processing

The secret image is split into blocks, and the index structure is searched for similar image blocks. For transmission, the most suitable image block is selected based on Euclidean distance.

3.6.3

Receiving Processing

The receiver obtains the corresponding blocks from the stego-images using the location information. Because the size of the secret image is known, a blank canvas of the same size is created. Finally, the received image blocks are placed into this blank area to reconstruct a copy of the secret image.
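Block selection on the sending side and reassembly on the receiving side can be sketched as follows. The feature vectors are assumed to be precomputed (in [16] they come from DenseNet). The index layout, the `(image_id, block_index)` location tuples and the function names are hypothetical.

```python
import math


def nearest_block(query, candidates):
    """Sender: return the location of the candidate whose feature vector is closest
    to the query in Euclidean distance. candidates holds (feature, location) pairs."""
    _, location = min(candidates, key=lambda item: math.dist(query, item[0]))
    return location


def reassemble(blocks, positions, rows, cols, block_size, fill=0):
    """Receiver: place each received block at its (row, col) grid position
    on a blank canvas the size of the secret image."""
    canvas = [[fill] * (cols * block_size) for _ in range(rows * block_size)]
    for block, (r, c) in zip(blocks, positions):
        for i in range(block_size):
            for j in range(block_size):
                canvas[r * block_size + i][c * block_size + j] = block[i][j]
    return canvas
```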

3.7 Coverless Image Steganography Based on Discrete Cosine Transform and Latent Dirichlet Allocation (LDA) Topic Classification [11]

First, the image database is classified using the LDA topic model. Second, images from a single topic are selected, and an 8 × 8 block DCT is applied to them. A robust feature sequence is then generated from the relationship between coefficients in neighbouring blocks. Finally, the feature sequence, DC coefficient, coordinates and image path are added to an inverted index. The secret information is converted to binary and segmented, and for each segment the image whose feature sequence matches it is chosen from the index as a cover image. At the receiver, the feature sequences are recomputed using the DC coefficients and concatenated to recover the secret information. The framework, shown in Fig. 19, consists of four parts, described below.

3.7.1

Image Database Classification

The bag-of-features (BOF) model is used to extract features from all images in the database and to compute visual word frequencies. The topic distribution of each image is then calculated with the LDA topic model. Each image is assigned to the topic with the highest probability, and this is repeated until all images have been classified.


3.7.2


Retrieval of Feature Sequence

All images are scaled to the same size, N × N, and divided into (N/l) × (N/l) blocks of size l × l. Each block is converted to the YUV colour space, and the Y channel of each block is subdivided into 16 sub-blocks. An 8 × 8 DCT is applied to every sub-block. The relationships between the coefficients of neighbouring sub-blocks give the bits of each block's feature sequence. These steps are repeated to obtain the feature sequences of all blocks of every image.
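The feature-sequence step can be sketched under explicit assumptions:

- Each block is 32 × 32 RGB, so its 16 sub-blocks are 8 × 8.
- Y is computed with the BT.601 luma weights.
- Each bit compares the DC coefficients of consecutive sub-blocks in raster order, which yields 15 bits per block.

The coefficient relationship actually used in [11] may differ. The DC term of an orthonormal 8 × 8 DCT is the sub-block sum divided by 8, so the sketch computes it without a full DCT.

```python
def to_luma(rgb_block):
    """RGB block (rows of (R, G, B) tuples) to the Y channel, using BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_block]


def sub_block_dcs(y, grid=4):
    """Split a square Y block into grid x grid sub-blocks; return their DC terms in raster order."""
    size = len(y) // grid  # 8 when the block is 32 x 32 (assumption)
    dcs = []
    for br in range(grid):
        for bc in range(grid):
            total = sum(y[br * size + i][bc * size + j]
                        for i in range(size) for j in range(size))
            dcs.append(total / size)  # DC of the orthonormal size x size DCT-II = sum / size
    return dcs


def feature_sequence(rgb_block):
    """Assumed rule: compare DC terms of consecutive sub-blocks (15 bits for 16 sub-blocks)."""
    dcs = sub_block_dcs(to_luma(rgb_block))
    return "".join("1" if a >= b else "0" for a, b in zip(dcs, dcs[1:]))
```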

3.7.3

The Creation of an Inverted Index

In the inverted index, each feature sequence corresponds to a list with three columns, as shown in Fig. 18. The first column stores the DC coefficient, which is used to determine the order of the received images. The middle column stores the block coordinates, and the last column stores the path of the corresponding image. The feature sequence of every block of every image is computed, and the entries of all blocks with the same feature sequence form one list. For covert communication, the recipient receives side information along with the images. The receiver computes the feature sequence of each sub-block and repeats this for every received image to recover the secret information.
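A minimal sketch of this inverted index follows. It assumes that the middle column holds the block coordinates, since the fields listed for the index are the DC coefficient, coordinates and image path. The record layout and all names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class IndexEntry(NamedTuple):
    dc: float                 # first column: DC value, used to order the received images
    coords: tuple[int, int]   # middle column (assumed): block coordinates in the image
    path: str                 # last column: path of the image


def build_inverted_index(records):
    """records: iterable of (feature_sequence, dc, coords, path), one per image block."""
    index = defaultdict(list)
    for feature, dc, coords, path in records:
        index[feature].append(IndexEntry(dc, coords, path))
    return dict(index)


def find_cover(index, segment):
    """Return the first indexed block whose feature sequence equals the segment, or None."""
    entries = index.get(segment)
    return entries[0] if entries else None
```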

Fig. 18 Inverted index [11]


Fig. 19 Flow chart of the proposed framework [11]

3.7.4

Hiding and Retrieval of Secret Information

At the sender side, the secret data is first converted to binary and split into segments of M bits. Once the feature sequences of every sub-block in the image database have been computed, the inverted index is built. For each secret segment, an image with the same feature sequence is located in the index, and this is repeated until images have been found for all segments. If the last segment is shorter than M bits, it is padded with zeros. The number of padded zeros is counted and converted into an M-bit binary sequence, and an image matching that sequence is located in the same way and added to the set of cover images. When full-size feature sequences are used, no location information needs to be stored. Otherwise, the position coordinates are stored in a matrix that is encrypted with AES. At the receiver, the images are first corrected using geometric calibration, and the order of the images carrying secret information is recovered from the DC values. The matrix is decrypted to obtain the sub-block coordinates in each image, and an 8 × 8 DCT is applied to each sub-block.
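The segmentation and zero-padding rule described above can be sketched as follows. The sketch follows the interpretation that an extra M-bit segment records the number of padded zeros. The toy `index` dictionary from feature sequence to image paths is hypothetical, and the AES-encrypted coordinate matrix is omitted.

```python
def split_secret(bits: str, m: int) -> list[str]:
    """Split bits into M-bit segments, zero-padding the last one, and append one more
    M-bit segment that records how many zeros were added."""
    pad = -len(bits) % m
    padded = bits + "0" * pad
    segments = [padded[i:i + m] for i in range(0, len(padded), m)]
    segments.append(format(pad, f"0{m}b"))
    return segments


def join_secret(segments: list[str]) -> str:
    """Receiver: concatenate the data segments and drop the padded zeros."""
    *data, pad_segment = segments
    pad = int(pad_segment, 2)
    bits = "".join(data)
    return bits[:len(bits) - pad] if pad else bits


def select_covers(segments: list[str], index: dict[str, list[str]]) -> list[str]:
    """Pick, for each segment, an image whose feature sequence equals that segment."""
    covers = []
    for seg in segments:
        if seg not in index:
            raise LookupError(f"no image in the database has feature sequence {seg}")
        covers.append(index[seg][0])
    return covers
```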

3.8 Coverless Image Information Hiding Based on Generative Model [20]

This method was the first to use a generative model to hide image information without a cover. The generative model database is used to transform the secret image (see Fig. 20) into an ordinary, independent image whose meaning differs from that of the secret image. The receiver takes the received image and feeds it into


Fig. 20 Suggested framework’s flow chart [20]

the generative model database to generate an image similar to the secret image. Both the sender and the receiver hold the same dataset and, as a result, the same parameters.

3.8.1

Training Process

WGAN (Wasserstein Generative Adversarial Network) [16] is an improved version of GAN that offers more stable training. The generative model database is trained with WGAN until it can create a meaningful image that is unrelated to the secret image. Given the secret image as input, a generative model produces a meaningful image that has no relation to the hidden image. For example, if “Lena” is chosen as the secret image, as illustrated in Fig. 21, a model is trained to produce an image that looks visually similar to “Penguins”, the image that is actually sent. Conversely, WGAN is used to train a model that takes “Penguins” as input and produces an image that looks the same as “Lena”. These two models are denoted G1 and G2, respectively. For further experiments, more images are used as secret images, the corresponding generative models are saved, and the same procedure is applied to each. The generative models G1, G2, and so on, which reproduce images such as “Lena” and “Penguins”, together form the generative model database.

3.8.2

Information Hiding Process and Extraction Process

The sender and the receiver share the same generative model database, and therefore the same dataset and parameters. As illustrated in Fig. 21, the sender only needs to transmit a meaningful, normal-looking image that is unrelated to the secret image. From this image, the receiver generates an image that looks like the secret image.


Fig. 21 A training process [20]

3.9 Coverless Information Hiding Based on DCGAN [10]

In this method, the DCGAN generator produces the stego-image directly from preprocessed secret data, as depicted in Fig. 22, so no cover image is modified. The DCGAN is fully trained in stages beforehand, and the secret information is mapped into noise vectors that drive the generator. At the receiver, an extractor trained on such mapped noise vectors recovers the noise vectors from the stego-images. The components of this DCGAN-based coverless steganography approach are as follows:

3.9.1

Development of Stego-Image

The secret data is split into small segments of two or three bits, and each segment is mapped to a component of the noise vector. The DCGAN is trained on an image dataset, and the generator G is obtained once the DCGAN has converged. G is trained on the 000 images of the training set and consists of a fully connected layer and four convolutional layers. It generates the synthetic images that serve as stego-images. The dimension of the noise vector is related to the level of fine detail in the generated image.


Fig. 22 Flow chart of the proposed framework based on DCGANs [10]

3.9.2

The Extractor’s Training

The extractor E is a CNN. It is trained on a large set of random noise vectors to minimize the loss between each noise vector and the vector it recovers from the corresponding stego-image. E has four convolutional layers, each with batch normalization and a leaky ReLU activation but no pooling or dropout, followed by a fully connected layer. The generator creates 64 × 64 × 3 stego-images from these noise vectors. These images are the input to the extractor, whose output is the recovered high-dimensional noise vector.

3.9.3

Invisible Communication

After the DCGAN has been trained, the sender and the receiver share the parameters of G and E. The sender splits the secret data into small segments according to the capacity of a stego-image, then uses G to generate stego-images from the corresponding noise vectors z. Finally, the receiver recovers the noise vectors from the received stego-images with E and retrieves the secret data using reverse mapping.
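The mapping between secret bits and noise vectors, and the reverse mapping at the receiver, can be sketched as follows. The interval-based mapping is an assumption for illustration, and [10] may use a different rule. Each k-bit segment selects one of 2^k equal sub-intervals of [-1, 1]. A noise value is drawn away from the interval edges, so that small errors made by the extractor E still decode correctly. Training G and E is omitted.

```python
import random


def bits_to_noise(bits: str, k: int = 2, margin: float = 0.1, rng=random) -> list[float]:
    """Map every k-bit segment to one noise value inside its own sub-interval of [-1, 1]."""
    if len(bits) % k:
        raise ValueError("bit string length must be a multiple of k")
    width = 2.0 / 2 ** k
    noise = []
    for i in range(0, len(bits), k):
        low = -1.0 + int(bits[i:i + k], 2) * width
        # Sample away from the interval edges so small extraction errors stay inside.
        noise.append(rng.uniform(low + margin * width, low + (1 - margin) * width))
    return noise


def noise_to_bits(noise: list[float], k: int = 2) -> str:
    """Reverse mapping: find the sub-interval each recovered value falls into."""
    n_intervals = 2 ** k
    width = 2.0 / n_intervals
    bits = []
    for z in noise:
        idx = min(max(int((z + 1.0) // width), 0), n_intervals - 1)  # clamp drift outside [-1, 1]
        bits.append(format(idx, f"0{k}b"))
    return "".join(bits)
```

With k = 2 and a margin of 10%, every value lies at least 0.05 inside its interval, so an extraction error smaller than that still decodes to the same bits.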

Table 1 Embedding capacity of the proposed methods

Method                    Capacity                 Robustness
Proposed method 1 [12]    18 bits per carrier      Higher
Proposed method 2 [18]    2600 bits per carrier    Low
Proposed method 3 [1]     8 bits per carrier       Low
Proposed method 4 [19]    14 bits per carrier      Low
Proposed method 5 [8]     760 bits per carrier     Higher
Proposed method 6 [16]    800 bits per carrier     Higher
Proposed method 7 [11]    15 bits per carrier      Higher

4 Performance Evaluation

In this section, we first compare coverless image steganography with traditional image steganography and then evaluate the performance of current coverless image steganography methods. The greatest advantage of coverless image steganography is that it can resist existing steganalysis tools to a great extent. Coverless image steganography is also more robust against geometric attacks than traditional image steganography. Its greatest limitation is capacity, which is restricted by the length of the image hash. Longer hash sequences can be constructed to conceal more secret information, but the image database must then also be enlarged, because each of the 2^L possible L-bit hash values must be matched by at least one image. Second, we analyse the performance of existing coverless image hiding methods on two attributes: embedding capacity, shown in Table 1, and robustness, shown in Table 2.

5 Current Key Challenges

Finally, an analysis of existing coverless image steganography methods and their characteristics shows that current approaches have the following drawbacks:

1. Most proposed methods are very sensitive to geometric attacks because they capture features from the spatial domain [1, 8, 16, 19].
2. Existing techniques are insecure and insufficiently robust [1, 18, 19].
3. Most proposed methods also have a limited embedding capacity.
4. Several images are needed to send the secret information [1, 11, 12].
5. A large image database is required [15, 17].


Table 2 Robustness tests against geometric attack (BER)

Attack            Parameter   Method 1 [12]   Method 2 [18]   Method 7 [11]
Rotation          10°         0.7894          0.0937          0.164
Rotation          30°         0.8307          0.1190          0.1116
Rotation          50°         0.792           0.1676          0.1288
Scaling           0.5         0.7364          0.5892          0
Scaling           0.75        0.5943          0.2354          0
Scaling           1.5         0.3928          0.0582          0
Gaussian noise    σ(0.001)    0.8088          0.1548          0.0301
Gaussian noise    σ(0.005)    0.8165          0.4680          0.0172
Gaussian noise    σ(0.1)      0.8662          0.9638          0.0086
Median filter     3 × 3       0.6822          0.1813          0
Median filter     5 × 5       0.6951          0.3486          0.0086
Median filter     7 × 7       0.8088          0.4699          0.0172
Gaussian filter   3 × 3       0.686           0.1743          0
Gaussian filter   5 × 5       0.7558          0.3051          0
Gaussian filter   7 × 7       0.9031          0.4984          0

6. They search the database for images that carry the hidden message bits, much like image retrieval.
7. Current techniques generate a hash code for every image or image block in the database at both the transmitter and the receiver, which is wasteful in terms of time and resources [1–3, 8, 19].

6 Conclusion and Future Work

This article presents a comprehensive analysis of coverless image steganography. It emphasizes recent advancements, gives an overview of the framework of these techniques, and examines the performance of the most representative methods. Despite significant advancements in coverless image steganography over the past several years, there is still considerable room for further development. In future work, we will focus on enhancing hiding capacity, extending the distinctive identifiable details of images, expanding the image database and enhancing robustness. To increase hiding capacity, the image may be split into several sub-images, or a longer hash sequence may be created, depending on real-world demands. Future progress will also require expanding the image database, both to improve the quality of the recovered image and to communicate hidden information effectively through natural photographs. Finally, we will focus on improving the system's robustness against rotation and content-loss attacks.


References

1. Luo Y, Qin J (2020) Coverless image steganography based on image segmentation. Comput Mater Continua CMC 64(2):1281–1295
2. Qiu A, Chen X, Sun X (2019) Coverless image steganography method based on feature selection. Tech Science Press JIHPP 1(2):49–60
3. Liu Q, Xiang X, Qin J (2019) Coverless steganography based on image retrieval of DenseNet features and DWT sequence mapping. Elsevier
4. Liu M-M, Zhang M-Q, Liu J, Zhang Y, Ke Y (2018) Coverless information hiding based on generative adversarial networks. J Appl Sci 36:371–382
5. Zhou Z, Sun H, Harit R, Chen X, Sun X (2015) Coverless image steganography without embedding. In: Huang Z, Sun X, Luo J, Wang J (eds) ICCCS 2015, vol 9483. LNCS. Springer, Cham, pp 123–132
6. Hartigan JA, Wong MA (2013) A K-means clustering algorithm. Appl Statist 100–108
7. Zhou Z-L, Cao Y, Sun X-M (2016) Coverless information hiding based on bag-of-words model of image. J Appl Sci Electron Inf Eng 34(5):527–536
8. Al Hussien Seddik Saad, Mohamed MS, Hafez EH (2020) Coverless image steganography based on Jigsaw puzzle image generation. Tech Science Press
9. Zhou Z, Cao Y (2019) Faster-RCNN based robust coverless information hiding system in cloud environment. IEEE
10. Liu M, Zhang M, Liu J, Zhang Y (2017) Coverless information hiding based on DCGAN
11. Zhang X, Peng F, Long M (2018) Robust coverless image steganography based on DCT and LDA topic classification. IEEE Trans Multimedia 20(12):3223–3238
12. Zheng S, Wang L, Ling B, Hu D (2017) Coverless information hiding based on robust image hashing. Springer
13. Valandar MY, Barani MJ, Ayubi P, Aghazadeh M (2019) An integer wavelet transform image steganography method based on 3D sine chaotic map. Multimedia Tools Appl 78(8):9971–9989
14. Liang W, Long J, Cui A, Peng L (2015) A new robust dual intellectual property watermarking algorithm based on field programmable gate array. J Comput Theor Nanosci 12(10):3959–3962
15. Zhou Z, Mu Y, Wu QJ (2018) Coverless image steganography using partial-duplicate image retrieval. Soft Comput 1–12
16. Luo Y, Qin J, Xiang X (2019) Coverless real-time image information hiding based on image block matching and dense convolution network. Springer
17. Holub V, Fridrich J, Denemark T (2014) Universal distortion function for steganography in an arbitrary domain. EURASIP J Inf Secur 2014(1):1–13
18. Yang L, Deng H, Dang X (2020) A novel coverless information hiding method based on the most significant bit of the cover image. IEEE, vol 8
19. Cao Y, Zhou Z, Jonathan Wu QM, Yuan C, Sun X (2020) Coverless information hiding based on the generation of anime characters. EURASIP J Image Video Proc
20. Duan X, Song H, Qin C, Khan MK (2018) Coverless image information hiding based on generative model. Tech Science Press 018

Chapter 10

A Review on Transliterated Text Retrieval for Indian Languages

Sujeet Kumar, Siddharth Kumar, and Jayadeep Pati

1 Introduction

Natural language processing (NLP) applies computational techniques to analyze and synthesize natural language, and it covers speech, text mining, and information retrieval. Online data is growing rapidly and is primarily unstructured and global, so a main challenge for NLP is multilingual text processing. The Internet and technology now reach even remote areas, so it is a challenge to provide natural language support to all users in their native languages. The task becomes even more complex when a user uses more than one language. One way to provide NLP support for local languages is transliteration. Computer, laptop, and mobile keypads are mostly in English, and fonts for local languages are often unavailable or unsupported. As a result, people mainly write native-language words in the English script, i.e., the Roman script. Keyboard support for major languages other than English has improved somewhat with new types of keyboards, especially for languages with large user bases such as Chinese and Hindi. Still, keyboard and font support for all local languages is far from reality. In online technologies, mainly social media, a large part of the population uses local languages but writes them in the English (Roman) script. Transliteration differs from translation. Translation uses the equivalent word(s), phrases, and grammar of another language, whereas in transliteration the word(s) and phrases are used in the local language

S. Kumar (✉) · S. Kumar · J. Pati
Indian Institute of Information Technology, Ranchi 834010, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_10


S. Kumar et al.

but only the script is changed to match the target script, and usually no grammar is required. So, in transliteration, words or phrases are transformed phonetically into a non-native script. Transliteration is easier than translation, as users need not know the other language's equivalent words or grammar at all. For example, the Hindi word “आम” is translated into English as “mango” and transliterated as “aam”. Similarly, the Hindi phrase “चला कैसे करते हैं” is translated into English as “how to walk” and transliterated as “chala kaise karte hain”. The major problem is that text written in a local language using a non-native script may not follow standard grammar and spelling. It depends entirely on regional pronunciation and on the user's judgment or phonetic understanding when converting a phrase from the native to the non-native script. For example, a Hindi user may write the Hindi word फूल (flower) in the Roman script as phool, phul, or even fool. This requires a robust algorithm that can find the best possible match for the user's keywords among the available options. Here, “fool” is also an English word with its own meaning. The system therefore needs to identify and handle words that exist in both the native and the foreign language. At the same time, a user may mix languages, as in “Ram school jaa raha hai” (“Ram is going to school”). Here, “school” is an English word used alongside Hindi words written in the Roman script. For the words “Ram”, “jaa”, “raha”, and “hai”, the language is Hindi but the script is Roman, whereas for “school” both the language and the script are English. To handle such situations efficiently, the data must be preprocessed. All input for transliteration needs to be cleaned and normalized before it is used for machine transliteration or information retrieval.
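The spelling-variant and mixed-language problems above can be illustrated with a small normalisation and tagging sketch. The rewrite rules (`RULES`) and the two lexicons are hypothetical toy examples, not a published scheme. Real systems learn such mappings from data.

```python
import re

# Hypothetical normalisation rules for romanised Hindi; applied in order.
RULES = [("ph", "f"), ("oo", "u"), ("ee", "i"), ("aa", "a")]


def normalise(word: str) -> str:
    """Map spelling variants such as phool / phul / fool onto one key."""
    w = word.lower()
    for src, dst in RULES:
        w = w.replace(src, dst)
    return re.sub(r"(.)\1+", r"\1", w)  # collapse remaining repeated letters


# Tiny hypothetical lexicons, for illustration only.
ENGLISH = {"school", "fool", "flower"}
HINDI_ROMAN = {normalise(w) for w in ["phool", "ram", "jaa", "raha", "hai"]}


def tag(sentence: str) -> list[tuple[str, str]]:
    """Label each token as Hindi (Roman script), English, ambiguous, or unknown."""
    tags = []
    for token in sentence.split():
        in_en = token.lower() in ENGLISH
        in_hi = normalise(token) in HINDI_ROMAN
        label = "ambiguous" if in_en and in_hi else "en" if in_en else "hi" if in_hi else "unknown"
        tags.append((token, label))
    return tags
```

Here “fool” is tagged as ambiguous, because it is both an English word and a variant spelling of the Hindi “phool”.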
Transliteration is very helpful for NLP and cross-lingual information retrieval (CLIR). It lets users keep using their native language while still being able to communicate, search, and use the information available on the Internet. In this survey paper, we discuss the core concepts and terminology of transliteration and current research on transliteration-based search and text retrieval for Indian languages. We also discuss the efficiency and accuracy of available transliteration approaches and their future scope.

2 Transliteration Concepts

It is essential to understand the basic terminology of transliteration. The following terms are widely used in the transliteration literature [1].

Phoneme: In the study of human speech, a phoneme is the smallest unit of sound that distinguishes one word from another. It belongs mainly to spoken language and is usually written between slashes (/ /) or square brackets ([ ]). Phonemes mainly describe vowels and consonants, but to some extent they can also cover the pitch and stress of a word. For example, /r/ is the phoneme that distinguishes “rat” from “mat” and “cat”.

10

A Review on Transliterated Text Retrieval for Indian Languages


Syllable: A syllable can be defined as a word, or part of a word, that contains a single vowel sound, so it is a single unit of pronunciation. A syllable has three parts. The syllable nucleus, or peak, is usually a vowel. It is the most important part and falls in the middle. The initial part is optional. The last part is a consonant, usually called the final margin. The syllable is the smallest unit of pronunciation. A word may have one or more syllables: “Hi” has one syllable, whereas “Hijack” has two. A word can be monosyllabic (one syllable), disyllabic (two syllables), or trisyllabic (three syllables). A word with many syllables is called polysyllabic. Syllabic writing was invented even before alphabetic letters and is several thousand years old.

Grapheme: This is the smallest functional contrastive unit in a writing system. Graphemes are described under two concepts: referential and analogical. In the referential concept, graphemes are the minimal units of writing that correspond to phonemes or sounds. For example, in the word “uniform”, “un” is a grapheme. In the analogical concept, graphemes are defined analogously to phonemes. For example, in “fat” and “cat”, “f” and “c” are graphemes. Multiple graphemes may represent a single phoneme. When two letters together represent one phoneme, as “sh” does in “ship”, they form a digraph.

Writing system: A writing system is essential for identifying the meaning of a sentence and for representing expressible phrases and statements in a given language. Each writing system has its own specification, which differentiates it from others. Writing systems can be functionally classified in five ways:

Featural: In a featural writing system, symbols represent individual features rather than whole phonemes. A featural script is therefore thought to represent finer detail than an alphabet: its symbols are not whole phonemes but the features that make up the phonemes. Such systems are rare, and Korean Hangul is the main example.

Logographic: This is one of the earliest writing systems. It was used by the first historical civilizations of the Near East, Africa, China, and Central America. Chinese, Japanese, and Korean make use of this system.

Syllabic: In this system, characters represent syllables, and syllables are combined to form morphemes. Vowel (V) and consonant–vowel (CV) units are widely used. Devanagari and Japanese kana (both hiragana and katakana) are examples of syllabic writing systems.

Ambiguous: Ambiguous writing systems use symbols and phonemes from different writing systems. English is the best example: it mixes letters with logograms or special characters such as $, %, and #, and with numerals such as 0–9.


Alphabetic or segmental: An alphabet is a small set of letters or symbols, each of which represents a phoneme. Latin-based writing systems are an example of this type.

3 Classification of Transliteration

Transliteration can be achieved in three ways: (a) generation, (b) mining, and (c) fusion. The transliteration generation approach is the traditional and straightforward way of automatically producing the equivalent counterpart, in the destination language script, of any given word in the source language script. It works on predefined mapping rules between the source and destination languages, so it relies mainly on a single resource and is less accurate. Generation approaches can be further classified by the method used. The direction-based approach can be forward or backward transliteration. In forward transliteration, users write native terms in a foreign script. For example, the Hindi word “आम” can be written as “aam” in the Roman script. Back-transliteration converts a word from a foreign script back to its original or native script. For example, “aam” written in the Roman script can be back-transliterated to “आम”. Another type of generation approach is script-specific, based on the kind of script used for the source or destination language, such as Arabic, Indic, Latin, or Roman. Generation approaches can also be categorized by the resource used, as syllable-based, phoneme-based, or grapheme-based. In the transliteration mining approach, transliteration pairs are extracted (mined) from resources such as parallel, comparable, or web corpora, often using machine learning, to find the best match between the source and destination languages. Transliteration mining methods include, but are not limited to, phonetic similarity, machine learning, and word co-occurrence.

The fusion approach, also called the hybrid approach, combines transliteration generation and mining methods.

4 Transliteration for the Indian Languages

India has many diverse regional languages, so transliteration for Indian languages is a broad task. Indian language transliteration falls into three main categories: (a) non-Indian to Indian language transliteration, for example, English to Hindi transliteration. (b) Indian to non-Indian


language transliteration, for example, Hindi to English transliteration. (c) One Indian language to another, for example, Marathi to Hindi or Hindi to Marathi. Most technological development, especially for computers, mobile devices, and the Internet, has been done in English. The most required form of Indian language transliteration is therefore from English to Hindi. Several studies have addressed this area with generation, mining, and fusion approaches, and we discuss them below.

4.1

Evaluation Metrics

Every input word in the source language generates a list of possible matching candidates in the destination language. Methods are evaluated using the mean reciprocal rank (MRR), defined as

$$\mathrm{MRR}(k) = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{\mathrm{Rank}(n)} \qquad (1)$$

where k is the number of top candidates considered, N is the total number of input words to be transliterated, and Rank(n) is the rank of the first correct candidate for the nth word. If no correct candidate is found among the top k, Rank(n) = ∞, so that word contributes 0 to the sum.
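Equation (1) translates directly into code. In this sketch, each input word has a ranked candidate list and a set of acceptable answers, both of which are assumed inputs. A word whose correct answer is not in the top k contributes 0, which is the same as setting Rank(n) = ∞.

```python
import math


def mrr(candidate_lists: list[list[str]], gold: list[set[str]], k: int) -> float:
    """Mean reciprocal rank over N input words, looking only at the top-k candidates.

    candidate_lists[n] is the ranked candidate list for the n-th input word and
    gold[n] is its set of acceptable transliterations.
    """
    total = 0.0
    for candidates, answers in zip(candidate_lists, gold):
        rank = next((i for i, c in enumerate(candidates[:k], start=1) if c in answers), math.inf)
        total += 1.0 / rank  # 1 / inf == 0.0 when no correct candidate is found
    return total / len(candidate_lists)
```

For example, if the correct answers appear at ranks 1 and 2 for two words and not at all for a third, MRR(2) = (1 + 1/2 + 0) / 3 = 0.5.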

4.2

Generation-Based Transliteration for the Indian Languages

Chinnakotla et al. [2] suggest several generation-based transliteration approaches, described below.

4.2.1

Basic Rule-Based System

This rule-based system is implemented using either a set of character sequence mapping rules or a character-level mapping between the source and destination languages. It is one of the oldest methods and has considerably lower efficiency and accuracy.
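A basic rule-based generator can be sketched as a per-character mapping table whose options are combined exhaustively. The table below is a hypothetical fragment that covers only the examples used in this chapter. It ignores the inherent vowel and other Devanagari rules.

```python
from itertools import product

# Hypothetical fragment of a mapping table: each Devanagari code point maps to one
# or more Roman spellings. The inherent vowel "a" and other rules are ignored.
MAPPING = {
    "\u092b": ["ph", "f"],   # DEVANAGARI LETTER PHA
    "\u0942": ["oo", "u"],   # DEVANAGARI VOWEL SIGN UU
    "\u0932": ["l"],         # DEVANAGARI LETTER LA
    "\u0906": ["aa", "a"],   # DEVANAGARI LETTER AA (independent vowel)
    "\u092e": ["m"],         # DEVANAGARI LETTER MA
}


def generate_candidates(word: str) -> list[str]:
    """Combine every mapping option of every character; unknown characters pass through."""
    options = [MAPPING.get(ch, [ch]) for ch in word]
    return ["".join(parts) for parts in product(*options)]
```

For फूल this sketch generates phool, phul, fool and ful. The candidates differ in length, which is the weakness that the enlarged-alphabet variant addresses.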


S. Kumar et al.

4.2.2 Basic Rule-Based System with Enlarged Alphabet

The accuracy of the basic rule-based system is very low because a single Hindi character is usually mapped to multiple English characters, so the generated candidates vary in length. This can be addressed by enlarging the alphabet of the n-gram model so that frequent multi-character English sequences are treated as single units. This model is more effective than the basic rule-based system.
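To illustrate the enlarged-alphabet idea, the sketch below adds multi-letter English units such as "sh", "kh" and "aa" to the alphabet, segments an English word by greedy longest match, and maps each unit to Devanagari. The unit table, the greedy segmentation and the treatment of a lone "a" as the inherent vowel are simplifying assumptions for illustration, not the exact procedure of [2].

```python
# Hypothetical enlarged alphabet: multi-letter units sit alongside single letters.
UNIT_TO_DEVANAGARI = {
    "kh": "ख", "sh": "श", "aa": "ा",
    "k": "क", "s": "स", "m": "म", "l": "ल",
    "a": "",  # inherent vowel after a consonant: no vowel sign is written
}
MAX_UNIT = max(len(u) for u in UNIT_TO_DEVANAGARI)

def segment(word, units=UNIT_TO_DEVANAGARI):
    """Split word into alphabet units, always preferring the longest unit that matches."""
    i, out = 0, []
    while i < len(word):
        for size in range(min(MAX_UNIT, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if piece in units:
                out.append(piece)
                i += size
                break
        else:  # no unit of any size matched at position i
            raise ValueError(f"no rule for {word[i]!r}")
    return out

def transliterate(word):
    return "".join(UNIT_TO_DEVANAGARI[u] for u in segment(word))

print(segment("shaam"), transliterate("shaam"))  # ['sh', 'aa', 'm'] शाम
print(transliterate("kal"), transliterate("khaas"))  # कल खास
```

If the "sh" unit were removed from the table, segmenting "shaam" would fail at the lone "h", since no single-letter rule exists for it; that is the length mismatch the enlarged alphabet is meant to absorb.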

4.2.3 Character Sequence Modeling (CSM)

A primary way this transliteration works is by creating a set of character sequence mapping rules between the source and destination languages. The transliteration research landscape is dominated by systems based on parallel corpora, because the accuracy of rule-based transliteration depends not only on the rules of the algorithm but also on human conventions, which are highly inconsistent. For high-resource languages, statistical techniques based on sizeable parallel transliteration corpora work well. For low-resource languages, parallel transliteration corpora are not available, so rule-based transliteration is the practical choice [2]. Chinnakotla et al. [2] demonstrated that a satisfactory level of transliteration performance can be achieved using a few manually or statistically created rules together with monolingual resources, which shows the effectiveness of simple generation methods based on character sequence modeling. Candidates are generated on the source side by manually or statistically created character-mapping rules, and a CSM on the destination side is used to rank the generated candidates. This generation method can be used for both English–Hindi and Hindi–English. A character sequence model (CSM) is a probability distribution over the sequence of characters in a string or word. Under an Nth-order assumption, the probability of a word W in a given corpus is [3]

$$P(W) = \prod_{i=1}^{m} P(c_i \mid c_{i-1}, \ldots, c_{i-N}) \tag{2}$$

where W = ⟨c_1, c_2, …, c_m⟩ is the string or word and c_i is its ith character.
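A minimal sketch of Eq. (2) with N = 1 (a character bigram model). It is trained on a small, hypothetical list of destination-side words; the add-one smoothing and the boundary markers `^` and `$` are assumptions, not details taken from [2] or [3]. The model ranks generated candidates by log-probability, which is the role the destination-side CSM plays in the text.

```python
import math
from collections import Counter

def train_bigram(words):
    """Count character bigrams, with ^ and $ marking word start and end."""
    pair_counts, context_counts, alphabet = Counter(), Counter(), {"$"}
    for w in words:
        chars = ["^"] + list(w) + ["$"]
        alphabet.update(w)
        for prev, cur in zip(chars, chars[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts, len(alphabet)

def log_prob(word, model):
    """log P(W) under Eq. (2) with N = 1 and add-one smoothing."""
    pair_counts, context_counts, vocab_size = model
    chars = ["^"] + list(word) + ["$"]
    return sum(
        math.log((pair_counts[(p, c)] + 1) / (context_counts[p] + vocab_size))
        for p, c in zip(chars, chars[1:])
    )

def rank(candidates, model):
    """Sort generated candidates from most to least probable."""
    return sorted(candidates, key=lambda w: log_prob(w, model), reverse=True)

model = train_bigram(["ram", "ram", "rama", "sam"])  # hypothetical target-side corpus
print(rank(["raam", "ram"], model))  # ['ram', 'raam']
```

Working in log space avoids numeric underflow when the product in Eq. (2) runs over long words.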

4.2.4 CRF-Based Transliteration

In this approach, transliteration is treated as a sequence learning problem. The CRF-based approach performs better than other sequence learning approaches such as the HMM-based approach. GIZA++ is used to learn the character mappings in this method.
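Once a linear-chain model such as a CRF has been trained, a word is transliterated by Viterbi decoding: choosing the label sequence (one target-script unit per source character) with the highest total of per-position and transition scores. The sketch below assumes those scores are already available; the numbers are hypothetical stand-ins for learned CRF feature weights, and CRF training itself is not shown.

```python
NEG_INF = float("-inf")

def viterbi(emissions, transitions, labels):
    """Best label sequence for a linear-chain model.

    emissions: list with one dict per input position, mapping label -> score.
    transitions: dict mapping (previous_label, label) -> score; missing pairs score 0.
    """
    best = {y: emissions[0].get(y, NEG_INF) for y in labels}
    backpointers = []
    for t in range(1, len(emissions)):
        scores, pointer = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + transitions.get((p, y), 0.0))
            scores[y] = best[prev] + transitions.get((prev, y), 0.0) + emissions[t].get(y, NEG_INF)
            pointer[y] = prev
        best = scores
        backpointers.append(pointer)
    path = [max(labels, key=lambda y: best[y])]  # best final label
    for pointer in reversed(backpointers):      # follow back-pointers to the start
        path.append(pointer[path[-1]])
    return path[::-1]


labels = ["क", "ल", "", "ा"]  # "" = inherent vowel, no vowel sign written
emissions = [{"क": 2.0}, {"": 1.0, "ा": 0.8}, {"ल": 2.0}]  # input characters "k", "a", "l"
print("".join(viterbi(emissions, {}, labels)))                  # कल
print("".join(viterbi(emissions, {("क", "ा"): 0.5}, labels)))  # काल
```

The second call shows why sequence models help: a transition score can overturn the locally best label at one position.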

4.2.5 SMT-Based Transliteration

In this approach, transliteration is treated as a phrase-based machine translation task in which characters take the place of words. The GIZA++ aligner, the SRILM toolkit, and the Moses decoder are mainly used to implement this method (Fig. 1).
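The sketch below is a heavily simplified illustration of decoding with a character-level phrase table: it finds the best monotone (no reordering) segmentation of the source string by dynamic programming, scoring only phrase-table probabilities. The phrase table is hypothetical, there is no language model, and this is not how Moses, GIZA++ or SRILM are actually used.

```python
import math
from functools import lru_cache

# Hypothetical character-level phrase table: source chunk -> {target chunk: P(target | source)}
PHRASES = {
    "sh": {"श": 0.9},
    "s": {"स": 0.9},
    "h": {"ह": 0.9},
    "aa": {"ा": 0.8},
    "a": {"": 0.6, "ा": 0.3},  # "" = inherent vowel, no sign written
    "m": {"म": 0.9},
    "am": {"ाम": 0.4},
}

def decode(src, phrases=PHRASES, max_len=2):
    """Monotone decoding: best segmentation of src into known source phrases."""
    @lru_cache(maxsize=None)
    def best(i):
        # best (log-probability, output) for the suffix src[i:]
        if i == len(src):
            return 0.0, ""
        options = []
        for j in range(i + 1, min(len(src), i + max_len) + 1):
            for tgt, p in phrases.get(src[i:j], {}).items():
                tail_score, tail = best(j)
                options.append((math.log(p) + tail_score, tgt + tail))
        return max(options) if options else (float("-inf"), "")
    return best(0)[1]

print(decode("shaam"))  # शाम
```

Here "sh | aa | m" (0.9 × 0.8 × 0.9 = 0.648) beats "s | h | aa | m" (0.583); a real SMT system would also weigh a target-side language model.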

4.2.6 Serial Compositional Transliteration Systems

This methodology builds a transliteration system between a source and a destination language when no direct data exists between them. An X → Z system can be obtained by composing two systems through an intermediate language Y, that is, X → Y followed by Y → Z, and the accuracy of X → Z can be significantly improved by this additional intermediate layer [4].
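One common way to compose two systems, assumed here for illustration, is to feed the X → Y candidates into Y → Z and sum the products of their scores over the intermediate candidates, P(z | x) ≈ Σ_y P(y | x) P(z | y). The candidate strings and probabilities below are hypothetical, and [4] may combine scores differently.

```python
from collections import defaultdict

def compose_serial(x_to_y, y_to_z):
    """Score Z candidates for one source word by pivoting through Y.

    x_to_y: {y_candidate: P(y | x)} for the source word x.
    y_to_z: {y_candidate: {z_candidate: P(z | y)}}.
    """
    scores = defaultdict(float)
    for y, p_y in x_to_y.items():
        for z, p_z in y_to_z.get(y, {}).items():
            scores[z] += p_y * p_z  # sum over intermediate candidates y
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# English "shaam" -> Hindi candidates -> Kannada candidates (hypothetical scores)
x_to_y = {"शाम": 0.7, "साम": 0.3}
y_to_z = {"शाम": {"ಶಾಮ್": 0.8, "ಸಾಮ್": 0.2}, "साम": {"ಸಾಮ್": 0.9}}
print(compose_serial(x_to_y, y_to_z))  # approximately [('ಶಾಮ್', 0.56), ('ಸಾಮ್', 0.41)]
```

Errors compound along the chain, which is consistent with serial systems scoring below direct CRF and SMT systems in Table 1.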

4.2.7 Parallel Compositional Transliteration Systems

This is another type of compositional transliteration. If transliteration systems are available between X and several other languages, the accuracy of X → Z can be significantly improved by combining the transliteration evidence from those languages.
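For the parallel case, one simple option (an assumption, not necessarily the combination used in [4]) is a weighted linear interpolation of the candidate distributions obtained through different paths; when no weights are given, each path counts equally. The path names, candidates and scores are hypothetical.

```python
from collections import defaultdict

def combine_parallel(evidence, weights=None):
    """Interpolate candidate scores for Z obtained through several paths.

    evidence: dict mapping path name -> {z_candidate: score}.
    weights: dict mapping path name -> weight (defaults to equal weights).
    """
    if weights is None:
        weights = {name: 1.0 / len(evidence) for name in evidence}
    combined = defaultdict(float)
    for name, dist in evidence.items():
        for z, p in dist.items():
            combined[z] += weights[name] * p
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)

evidence = {
    "direct": {"ಶಾಮ್": 0.5, "ಸಾಮ್": 0.5},
    "via Hindi": {"ಶಾಮ್": 0.8, "ಸಾಮ್": 0.2},
    "via Marathi": {"ಶಾಮ್": 0.6, "ಶಾಂ": 0.4},
}
for z, score in combine_parallel(evidence):
    print(z, round(score, 3))  # ಶಾಮ್ 0.633, then ಸಾಮ್ 0.233, then ಶಾಂ 0.133
```

A tie in the direct system is resolved by agreement across the pivot paths, which is the kind of extra evidence the text describes.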

4.3 Mining-Based Transliteration for the Indian Languages

4.3.1 Mining Comparable Data (MINT)

MINT is based mainly on named entities (NEs), which are especially important in information retrieval, and in cross-language information retrieval (CLIR) in particular. It mines named entity transliteration equivalents (NETEs) by taking advantage of the fact that the same news appears in multiple languages, so news corpora in several of the world's languages can be mined simultaneously. MINT has several benefits: it requires no language-specific knowledge and does not depend on frequency statistics (Table 1).
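The sketch below is a heavily simplified stand-in for the mining step: given named entities from an English news story and the tokens of a comparable Hindi story, it pairs each entity with the most similar Hindi token after a crude romanization and keeps pairs above a threshold. MINT itself uses a learned transliteration similarity model; the romanization table, the use of `difflib.SequenceMatcher` and the 0.75 threshold are all assumptions for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical, very small romanization table for the Hindi side.
ROMAN = {"र": "r", "ा": "a", "म": "m", "स": "s", "ी": "i", "त": "t"}

def romanize(word):
    return "".join(ROMAN.get(ch, "?") for ch in word)  # "?" marks unknown characters

def similarity(source_ne, target_token):
    return SequenceMatcher(None, source_ne.lower(), romanize(target_token)).ratio()

def mine_pairs(source_entities, target_tokens, threshold=0.75):
    """Pair each source named entity with its most similar target token, if similar enough."""
    pairs = []
    for ne in source_entities:
        best = max(target_tokens, key=lambda tok: similarity(ne, tok))
        score = similarity(ne, best)
        if score >= threshold:
            pairs.append((ne, best, round(score, 2)))
    return pairs

# Entities from an English story and tokens from a comparable Hindi story (hypothetical).
english_entities = ["Ram", "Sita", "Mumbai"]
hindi_tokens = ["राम", "ने", "सीता", "से", "कहा"]
print(mine_pairs(english_entities, hindi_tokens))
# [('Ram', 'राम', 1.0), ('Sita', 'सीता', 1.0)]
```

"Mumbai" has no equivalent in the Hindi story, so no pair is produced for it; the threshold is what keeps such entities from being forced onto an unrelated token.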

Fig. 1 Architecture for transliteration model


Table 1 Review comparison of different transliteration methods

S No | Citation | Method used | Type of transliteration | Language pair | Data set used | MRR
1 | [2] | Basic system | Generation | Hindi–English | 30 K dataset | 0.187
2 | [2] | Basic system with enlarged alphabet | Generation | Hindi–English | 30 K dataset | 0.222
3 | [2] | CRF transliteration | Generation | Hindi–English | 30 K dataset | 0.525
4 | [2] | SMT transliteration | Generation | Hindi–English | 30 K dataset | 0.548
5 | [2] | Basic system with enlarged alphabet | Generation | English–Hindi | 30 K dataset | 0.222
6 | [2] | Enlarged alphabet | Generation | English–Hindi | 30 K dataset | 0.378
7 | [2] | CRF transliteration | Generation | English–Hindi | 30 K dataset | 0.463
8 | [2] | SMT transliteration | Generation | English–Hindi | 30 K dataset | 0.546
9 | [4] | Serial compositional | Generation | English–Marathi–Kannada | News 2009 | 0.460
10 | [4] | Serial compositional | Generation | English–Hindi–Marathi | News 2009 | 0.519
11 | [4] | Serial compositional | Generation | Kannada–Hindi–English | News 2009 | 0.453
12 | [4] | Serial compositional | Generation | Kannada–English–Hindi | News 2009 | 0.366
13 | [4] | Parallel compositional | Generation | English–Marathi–Kannada | News 2009 | 0.464
14 | [4] | Parallel compositional | Generation | English–Hindi–Marathi | News 2009 | 0.557
15 | [4] | Parallel compositional | Generation | Kannada–Hindi–English | News 2009 | 0.491
16 | [4] | Parallel compositional | Generation | Kannada–English–Hindi | News 2009 | 0.509
17 | [5] | Naive | Generation | Hindi–English | FIRE 2013 | 0.68
18 | [5] | Editex | Generation | Hindi–English | FIRE 2013 | 0.77
19 | [6] | MINT | Mining | English–Hindi | Web Crawl | 0.82
20 | [6] | MINT | Mining | English–Kannada | Web Crawl | 0.92


5 Result and Discussion

After reviewing multiple transliteration research works, we can categorize transliteration into three main categories: generation, mining, and fusion. Generation-based transliteration is an older approach with comparatively lower accuracy and MRR. Experiments have been done for the various feasible mapping directions, from Roman script to Indian languages and vice versa. The investigation by Chinnakotla et al. [2] on the 30 K dataset shows that the basic system has significantly lower accuracy and MRR for Hindi–English transliteration and is not reliable. The basic design can be improved marginally by adding the enlarged alphabet feature. The basic system and the basic system with the enlarged alphabet feature work better for English–Hindi transliteration than for Hindi–English transliteration. CRF- and SMT-based CSM transliteration achieve better MRR than the basic system and perform better for Hindi–English than for English–Hindi transliteration. Serial and parallel composition work for transliteration across multiple Indian languages, but they are less effective than CRF and SMT transliteration. Among the generation methods, the best MRR is achieved by Editex (0.77), followed by Naive (0.68); even so, the maximum MRR reaches only 0.77, and overall the MRR of generation methods is lower. For mining, limited research is available for Indian languages, but mining performs better than generation methods. The MRR of mining has been calculated for English to Hindi and English to Kannada, at 0.82 and 0.92, respectively, which are the highest values in Table 1. Mining is a complex, costly, and resource-consuming approach, but it gives the best MRR overall. A few studies have also addressed phonetic transliteration using low-resource techniques based on neural networks [7]. The scope for research in transliteration covers both written and phonetic (voice-based) search.

6 Conclusion

We reviewed multiple available transliteration approaches and evaluated their performance using the MRR measure. Generation-based transliteration needs further methods to achieve better accuracy on Indian languages. Mining methods can be used effectively to transliterate between different Indian languages but require high resources and computation. Indian language transliteration poses many challenges, such as transliteration variants and the use of cross-language or multi-language scripts and texts, which provide opportunities and scope for research in this area [8]. C-DAC and NCST are among the major contributors to machine transliteration of Indian languages [9]. There is also scope for work on the fusion approach to transliteration.


References
1. Prabhakar DK, Pal S (2018) Machine transliteration and transliterated text retrieval: a survey. Sādhanā 43(6):1–25
2. Chinnakotla MK, Damani OP, Satoskar A (2010) Transliteration for resource-scarce languages. ACM Trans Asian Lang Inf Process (TALIP) 9(4):14
3. Chinnakotla MK, Ranadive S, Damani OP, Bhattacharyya P (2008) Hindi to English and Marathi to English cross language information retrieval evaluation. In: Advances in multilingual and multimodal information retrieval. Springer, Budapest, Hungary, pp 111–118
4. Kumaran A, Khapra MM, Bhattacharyya P (2010) Compositional machine transliteration. ACM Trans Asian Lang Inf Process 9(4):1–28
5. Gupta P, Bali K, Banchs RE, Choudhury M, Rosso P (2014) Query expansion for mixed-script information retrieval. In: Proceedings of the 37th international ACM SIGIR conference on research & development in information retrieval. ACM, pp 677–686
6. Udupa R, Saravanan K, Kumaran A, Jagarlamudi J (2008) Mining named entity transliteration equivalents from comparable corpora. In: Proceedings of the 17th ACM conference on information and knowledge management. ACM, pp 1423–1424
7. Le NT, Sadat F, Menard L, Dinh D (2019) Low-resource machine transliteration using recurrent neural networks. ACM Trans Asian Low-Resour Lang Inf Process (TALLIP) 18(2):13
8. Dasgupta T, Sinha M, Basu A (2013) A joint source channel model for the English to Bengali back transliteration. In: Mining intelligence and knowledge exploration. Springer, pp 751–760
9. Dhore ML, Dhore SK, Dixit RM (2012) Optimizing transliteration for Hindi/Marathi to English using only two weights. In: Proceedings of the 24th international conference on computational linguistics. Bombay, India, pp 31–48

Chapter 11

Learning-Based Smart Parking System

S. Sajna and Ranjith Ravindranathan Nair

S. Sajna (✉) · R. R. Nair
Department of Electronics and Communication Engineering, Indian Institute of Information Technology Pune, Pune, Maharashtra 411048, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_11

1 Introduction

As more and more people move to cities for education, jobs and a better lifestyle, urban populations are constantly increasing. The demand for private vehicles is also rising as people opt for better living standards, and cities may not be big enough to meet the growing demand for roads and parking spaces. A fully automated parking system with real-time parking availability data can be very useful, as it reduces time and energy consumption as well as traffic congestion. This paper proposes a twofold approach to the smart parking problem: vision-assisted, automated detection of free parking spaces based on deep learning (CNN), and automatic parking using reinforcement learning. In [1], parked cars and empty parking spaces are detected in images captured by a drone, using a multi-task cost-sensitive CNN (MTCSCNN). Its multi-task partition layer has sub-task selection units, each of which captures a local map of the original aerial image. These local maps are scaled up to reduce the number of objects to detect, and cars and empty spaces are then detected in each local map. In [2], street parking availability is considered. A CNN is trained on images taken from a roadside camera, and the trained CNN is then used to detect free parking spaces for users. A user can request parking slot detection through a mobile user interface; on receiving the request, the system captures street images, analyses them for free slots and notifies the user of available spaces. In [3], two learning models with a CNN-based framework were used for detecting and classifying free parking slots: a 4-layer CNN and an FMR-CNN. The proposed FMR-CNN model achieved an average car detection accuracy of 92%, outperforming the conventional 4-layer CNN model. In [4], Mask R-CNN is used for the detection and classification of parking slots; an FCN layer is employed for bounding box segmentation and object mask identification, and the mask gives the vehicle's area of coverage. In [5], a parking occupancy monitoring system uses ultrasonic sensors and an Arduino Uno to detect free parking spaces. The sensors must be placed throughout the parking area, which makes the system expensive and inefficient. A mobile user interface is also developed, from which the user can get details of free parking slots in the parking area; the system is mainly useful for indoor parking. The following are some of the existing sensor technologies used in smart parking systems [6]: passive infrared sensors, active infrared sensors, ultrasonic sensors [7] and radio-frequency identification (RFID) systems [8]. The main disadvantage of IR and ultrasonic sensors is their sensitivity to the environment. RFID systems cannot be used in open parking spaces, and they cannot provide the occupancy status of individual parking slots. In [8], an RFID-based wireless sensor network is proposed for an efficient, large-scale car park monitoring system. Suhr and Jung [9] propose a sensor fusion technique employing an AVM system and an ultrasonic sensor for the identification of vacant parking spaces, but it is highly sensitive to the environment. Zhang et al. [10] present a multi-objective optimization-based path planning technique for automated parking, in which an environment model approximating the actual environment is established. A similar approach, in which multiple objectives are taken into consideration, is described in [11] for automated parking. An eye-level camera is employed to take pictures of the parking slots.
These pictures are used to detect vehicle orientation with an AdaBoost vehicle detector, and red tail lights are identified using a colour model algorithm. An automatic parking path tracking controller is designed to make the vehicle track the planned path. In [12], a new model named Eff-UNet, which combines an EfficientNet encoder with a UNet decoder, is proposed. It fuses high-level feature information with low-level spatial information for precise segmentation and is used for complete scene understanding in intelligent transport systems. In [13], optical characteristics of camera images of the parking area are used to detect free parking spaces, and an alert system notifies the user about free spaces. In [14], the problem of tactical decision making in autonomous driving is addressed; a Monte Carlo tree search algorithm is used for planning and deep reinforcement learning is employed for learning. In [15], sequences of video frames, rather than still images, are used to detect free parking spaces. The video is captured by roadside cameras, and deep convolutional neural networks together with a vehicle tracking system are used to improve accuracy; a car tracking filter is used instead of segmentation for detecting parking slots.


In [16], high-resolution aerial images captured by air vehicles and satellites are used for the detection and localization of cars with a sliding window approach. Since aerial images are heavily cluttered with many objects that look similar to a car from the top view, the false positive rate can be very high; [16] resolves this problem by using a Gaussian mixture model. Liu et al. [17] try to mimic human behaviour in automatic parking. A trajectory is first designed from the initial and final positions of the vehicle, and a controller is then designed to guarantee that the vehicle tracks this trajectory. A dynamic model of the vehicle is used to obtain the mapping relation. In [18], the short-term parking demand in a parking area is predicted. The data were collected from 17 June 2019 to 23 June 2019 on the Nanling Campus of Jilin University, and the vehicle transit pattern of the parking area was monitored under different conditions, viz. rush hours and ordinary hours. A Markov birth–death process is used to predict the parking demand, and knowledge of the short-term demand can be used to improve parking guidance systems. Khan et al. [19] use Faster R-CNN and ResNet50 to detect free parking spaces and vehicles in images of the parking area, based on manual labelling. A counting system using deep convolutional features counts the vehicles entering and exiting the parking area; the system does not provide any solution for occluded vehicles. In [20], the parking space occupancy of a gas station is monitored using images from surveillance video to detect free and occupied parking spaces. Haar-AdaBoosting and a CNN are used for this purpose: Haar-AdaBoosting finds the set of sub-windows where vehicles may be located.
The sub-windows are passed through a trained CNN, which identifies free and occupied parking spaces. Yamamoto et al. [21] use environment recognition for automated parking. A monocular camera mounted on the car is used to recognize the environment, and YOLO is applied to its images for object detection. This helps in judging whether parking is possible in a parking space; the car parks only if at least one vacant space is detected. Further, a CNN using depth images is constructed to identify whether space is available for turning the wheels. In [22], an IoT-based parking system is designed with a GUI through which parking slots can be pre-booked. It also allows automatic collection of parking charges from the user and can detect improper parking in the parking space. In [23], assisted parking is performed; the system is not fully automated and still requires the driver's assistance. First, a path from the present position of the car to the parking space is generated, and the vehicle then traces this path. A trained CNN is used to mimic the APS system. In [24], parking for disabled drivers is improved. The system identifies disabled parking spaces, determines whether each is occupied, and checks whether the person parked there is authorized. Its hardware consists of an RGB LED, an RFID reader, a Raspberry Pi and a solar panel for battery charging, and its software contains a database of authorized users. The system reads the RFID tag of the parked car and compares it with this database to check whether an authorized person is using the parking space.

2 Analysis and Design

The following conclusions are drawn from the literature survey. The techniques most widely used for automatic detection of parking spaces rely on sensors, which are not practical in open parking areas. In automatic parking, the general methods are dynamic and kinematic modelling of the car, along with reinforcement learning. A parking system in which images are captured by drones or by cameras placed at an elevated position is therefore required. IoT can be used to integrate details from many parking areas into a single parking system for a city.

The flowchart of the proposed system is given in Fig. 1. A camera captures real-time images or video of the parking area; it can be mounted on top of a building or on a pole so that the entire parking space is visible. The captured video or images are given to the prediction system. Alexnet is trained using the CNRPark data set, which consists of labelled images of free and occupied parking spaces. After training, Alexnet is ready to accept real images of the parking area. Each image obtained by the camera is cropped, using the coordinates of each parking space, into parts that each contain one parking space, so that all parking spaces are separated for classification. These cropped images are given to the trained Alexnet, which determines the free parking spaces. For the automatic parking system module, data on the free parking spaces and the current position of the car are given as inputs. The module uses Q learning, a type of reinforcement learning, to determine the optimum path to the parking space.

Fig. 1 Flowchart

3 Proposed Work and Implementation

The proposed smart parking system is divided into two parts, as shown in Fig. 2. First, an image of the parking area is captured using a smart camera, and this image is used to detect the free parking spaces. Once the free parking spaces are known, automatic parking is carried out, in which the car is parked in the selected parking space.

Fig. 2 Two parts of the proposed system

3.1 Parking Slot Detection

The block diagram for parking slot detection is shown in Fig. 3.

Fig. 3 Parking slot detection

Smart camera: Smart cameras are cameras with computational capabilities; a Raspberry Pi-based smart camera can be used for this purpose. The camera should be mounted in an outdoor camera box installed on an elevated platform, such as the roof of a building or a pole in front of the parking lot. Existing surveillance cameras can also be used, if available. The camera should be adjusted so that the entire parking area is clearly visible. It captures video or images of the entire parking area at regular intervals, which are given as input to the deep learning-based prediction system.

Deep learning-based prediction system: Deep learning models are very effective in image classification. Here, we use Alexnet, a convolutional neural network developed by Alex Krizhevsky. Further details and the architecture of Alexnet are described later in this section. Images captured by the camera are segmented based on the coordinates of the parking spaces, so that each individual parking space is given to the trained Alexnet, which classifies it as occupied or free. The output of the parking slot detection module is given as input to the automatic parking module.

Data set: The data set used for training Alexnet is the CNRPark data set, a data set for parking space occupancy detection developed by Amato et al. [25]. It contains images from two cameras placed at different locations, giving different viewing perspectives of the parking lot in the research area of the National Research Council (CNR) in Pisa. The images were taken on different days and at different times under diverse lighting conditions, and they include occlusion and shadow situations in which occupancy detection is challenging.
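The cropping step described above can be sketched with NumPy array slicing. The `(x, y, width, height)` box format, the space ids and the brightness-based `toy_classifier` are assumptions for illustration; the toy classifier only stands in for the trained Alexnet, and in practice each crop would also be resized to the network's input size before classification.

```python
import numpy as np

def crop_spaces(frame, spaces):
    """Cut one sub-image per parking space out of a camera frame.

    frame: H x W x 3 image array.
    spaces: dict mapping space id -> (x, y, width, height) in pixels.
    """
    return {sid: frame[y:y + h, x:x + w] for sid, (x, y, w, h) in spaces.items()}

def free_spaces(frame, spaces, classify):
    """Return the ids of spaces whose crop the classifier labels as 'free'."""
    return [sid for sid, crop in crop_spaces(frame, spaces).items() if classify(crop) == "free"]

def toy_classifier(crop):
    """Placeholder for the trained Alexnet: a bright crop is treated as occupied."""
    return "occupied" if crop.mean() > 128 else "free"

frame = np.zeros((100, 200, 3), dtype=np.uint8)
spaces = {"A1": (10, 20, 50, 30), "A2": (70, 20, 50, 30)}
frame[20:50, 70:120] = 255  # pretend a car fills space A2
print(free_spaces(frame, spaces, toy_classifier))  # ['A1']
```

Because the space coordinates are fixed for a mounted camera, they only need to be annotated once per camera view.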

3.2 Automatic Parking

Figure 4 shows the block diagram of the automatic parking system. The details of the free parking slots identified by the Alexnet-based detection system are given to the automatic parking system, which also needs the current location of the vehicle. A Q learning-based reinforcement learning scheme is used for automatic parking.

Fig. 4 Automatic parking system


For training with Q learning, a Q table is first created and initialized either with zeros or with random values. The agent then interacts with the environment in two ways: exploration and exploitation. In exploration, the agent explores the environment by random moves. In exploitation, the agent consults the Q table, which lists the possible actions in the present state and their values, and selects the action with the maximum Q value. Exploration dominates the initial phase of training, and the ratio of exploration to exploitation decays as training progresses. A small amount of exploration is retained even in the final epochs, because it helps the agent discover new states that pure exploitation might never select. The balance between exploration and exploitation is controlled by the variable epsilon (ε), which determines how often the agent explores rather than exploits. The value of ε decays as the number of epochs increases, so that the agent eventually follows the optimum path according to the Q table. After each interaction step, whether exploratory or exploitative, the Q table is updated, and after a large number of iterations an optimum Q table is obtained. The update rule for the Q table is:

$$Q_{t+1}(\text{state}, \text{action}) = Q_t(\text{state}, \text{action}) + \alpha \left[ \text{reward} + \gamma \max_{a'} Q_t(\text{state}', a') - Q_t(\text{state}, \text{action}) \right]$$

where Q_{t+1}(state, action) is the new Q value, Q_t(state, action) is the present Q value, α is the learning rate, γ is the discount factor, and state′ is the state reached after taking the action.
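A minimal sketch of tabular Q learning with an epsilon-greedy policy and decaying ε, using the update rule above. The parking lot is abstracted as a hypothetical 4 × 4 grid in which the car starts in one corner and the free slot found by Alexnet is the opposite corner; the rewards (−1 per move, +10 on reaching the slot) and all hyperparameters are assumptions, and the authors' Pygame/Gym simulation is not reproduced.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action, rows, cols, goal):
    """Move within the grid (walls clamp the move); +10 at the goal, -1 otherwise."""
    r = min(max(state[0] + action[0], 0), rows - 1)
    c = min(max(state[1] + action[1], 0), cols - 1)
    nxt = (r, c)
    return nxt, (10.0 if nxt == goal else -1.0)

def greedy(q_row):
    return max(range(len(q_row)), key=q_row.__getitem__)

def train(rows, cols, start, goal, episodes=2000, alpha=0.5, gamma=0.9,
          eps=1.0, eps_min=0.05, eps_decay=0.995, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(rows) for c in range(cols)}  # Q table = 0
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            # epsilon-greedy: explore with probability eps, otherwise exploit the Q table
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else greedy(q[state])
            nxt, reward = step(state, ACTIONS[a], rows, cols, goal)
            # Q-learning update; q[goal] is never updated, so it stays 0 (terminal state)
            q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
            state = nxt
            if state == goal:
                break
        eps = max(eps_min, eps * eps_decay)  # decay exploration after every episode
    return q

def greedy_path(q, start, goal, rows, cols, limit=50):
    path = [start]
    while path[-1] != goal and len(path) <= limit:
        nxt, _ = step(path[-1], ACTIONS[greedy(q[path[-1]])], rows, cols, goal)
        path.append(nxt)
    return path

q = train(4, 4, start=(0, 0), goal=(3, 3))
print(greedy_path(q, (0, 0), (3, 3), 4, 4))
```

The fixed seed makes runs repeatable; the −1 step penalty is what pushes the learned greedy path toward the shortest route, here 6 moves.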

4 Results and Discussion

The images captured by the camera are given to the trained Alexnet, and the free parking spaces are obtained; Q learning is used in the automated parking scheme. Alexnet achieves an accuracy of 99.96% on the validation data set. The training and validation loss and accuracy plots for Alexnet are shown in Fig. 5, and the training and validation accuracy for Lenet is shown in Fig. 6. The image of a parking lot captured by the camera and used as input is given in Fig. 7; it was captured in the daytime under good lighting conditions. The individual parking spaces are classified and the free parking slots are marked in the output image. The input and output images of the second test case, in which roadside parking is considered and the images are captured in daylight, are shown in Fig. 8. The input and output images of the third test case, in which the images are captured at night, are shown in Fig. 9. These results indicate that the proposed strategy works in both day and night conditions.

Fig. 5 Results using Alexnet (panels: training and validation loss; training and validation accuracy)
Fig. 6 Training and validation accuracy obtained when Lenet is used
Fig. 7 Test case 1: parking area, daylight (input and output images)
Fig. 8 Test case 2: roadside, daylight (input and output images)
Fig. 9 Test case 3: parking area, night (input and output images)

The simulations for automatic parking are done in Python, mainly using the Pygame and Gym libraries. The simulations are carried out for both parallel and perpendicular parking. The positions of the car during perpendicular parking are shown in Fig. 10, and those during parallel parking are shown in Fig. 11. These results confirm the adaptability of the automated parking scheme.

Fig. 10 a–f Perpendicular parking positions 1–6
Fig. 11 a–f Parallel parking positions 1–6

5 Conclusion

Each parking slot is classified as free or occupied using Alexnet, a CNN model, and automatic parking is performed using Q learning, a model-free reinforcement learning method. Because the system relies on cameras, it is well suited to open parking spaces. Since Alexnet is trained with a data set containing images from different viewpoints, lighting and environmental conditions, the system is expected to remain accurate even in low-lighting conditions. The cameras installed in the parking area can also be used for video surveillance and to detect suspicious activities. With the automatic parking system, inexperienced drivers and elderly people can park their cars easily. Time and fuel are saved, as the proposed system gives the driver details about free parking spaces. The system also helps reduce air pollution and traffic congestion, and pedestrian pavements are not required inside the parking area. As the system is fully automated, the assistance of parking attendants and workers is not required.


References 1. Xi X, Yu Z, Zhan Z, Yin Y, Tian C (2019) Multi-task cost-sensitive-convolutional neural network for car detection. IEEE Access 7:98061–98068 2. Gören S, Óncevarlk DF, Yldz KD, Hakyemez TZ (2019) On-street parking spot detection for smart cities. In: 2019 IEEE International smart cities conference (ISC2), pp 292–295 3. Mettupally SNR, Menon V (2019) A smart eco-system for parking detection using deep learning and big data analytics. In: 2019 SoutheastCon, pp 1–4 4. Sairam B, Agrawal A, Krishna G, Sahu SP (2020) Automated vehicle parking slot detection system using deep learning. In: 2020 Fourth international conference on computing methodologies and communication (ICCMC), pp 750–755 5. Grodi R, Rawat DB, Rios-Gutierrez F (2016) Smart parking: parking occupancy monitoring and visualization system for smart cities. In: SoutheastCon, pp 1–5 6. Paidi V, Fleyeh H, Håkansson J, Nyberg RG (2018) Smart parking sensors, technologies and applications for open parking lots: a review. IET Intell Transp Syst 12(8):735–741 7. Park W-J, Kim B-S, Seo D-E, Kim D-S, Lee K-H (2008) Parking space detection using ultrasonic sensor in parking assistance system. In: 2008 IEEE Intelligent vehicles symposium, pp 1039–1044 8. Pham TN, Tsai M, Nguyen DB, Dow C, Deng D (2015) A cloud-based smart-parking system based on internet-of-things technologies. IEEE Access 3:1581–1591 9. Suhr JK, Jung HG (2014) Sensor fusion-based vacant parking slot detection and tracking. IEEE Trans Intell Transp Syst 15(1):21–36 10. Zhang J, Chen H, Song S, Hu F (2020) Reinforcement learning-based motion planning for automatic parking system. IEEE Access 8:154485–154501 11. Ma S, Jiang H, Han M, Xie J, Li C (2017) Research on automatic parking systems based on parking scene recognition. IEEE Access 5:21901–21917 12. Baheti B, Innani S, Gajre S, Talbar S (2020) Eff-UNet: a novel architecture for semantic segmentation in unstructured environment. 
In: 2020 IEEE/CVF Conference on computer vision and pattern recognition workshops (CVPRW), pp 1473–1481 13. Athira A, Lekshmi S, Vijayan P, Kurian B (2019) Smart parking system based on optical character recognition. In: 2019 3rd International conference on trends in electronics and informatics (ICOEI), pp 1184–1188 14. Hoel C-J, Driggs-Campbell K, Wolff K, Laine L, Kochenderfer MJ (2020) Combining planning and deep reinforcement learning in tactical decision making for autonomous driving. IEEE Trans Intell Veh 5(2):294–305 15. Cai BY, Alvarez R, Sit M, Duarte F, Ratti C (2019) Deep learning-based video system for accurate and real-time parking measurement. IEEE Internet Things J 6(5):7693–7701 16. El Mikaty M, Stathaki T (2018) Car detection in aerial images of dense urban areas. IEEE Trans Aerosp Electron Syst 54(1):51–63 17. Liu W, Li Z, Li L, Wang F-Y (2017) Parking like a human: a direct trajectory planning solution. IEEE Trans Intell Transp Syst 18(12):3388–3397 18. Zheng L, Xiao X, Sun B, Mei D, Peng B (2020) Short-term parking demand prediction method based on variable prediction interval. IEEE Access 8:58594–58602 19. Khan G, Farooq MA, Tariq Z, Usman M, Khan G (2019) Deep-learning based vehicle count and free parking slot detection system. In: 2019 22nd International multitopic conference (INMIC), pp 1–7 20. Xiang X, Lv N, Zhai M, El Saddik A (2017) Real-time parking occupancy detection for gas stations based on Haar-AdaBoosting and CNN. IEEE Sens J 17(19):6360–6367 21. Yamamoto K, Watanabe K, Nagai I (2019) Proposal of an environmental recognition method for automatic parking by an image-based CNN. In: 2019 IEEE International conference on mechatronics and automation (ICMA), pp 833–838 22. Sadhukhan P (2017) An IoT-based e-parking system for smart cities. In: 2017 International conference on advances in computing, communications and informatics (ICACCI), pp 1062– 1066


S. Sajna and R. R. Nair

23. Gamal O, Imran M, Roth H, Wahrburg J (2020) Assistive parking systems knowledge transfer to end-to-end deep learning for autonomous parking. In: 2020 6th International conference on mechatronics and robotics engineering (ICMRE), pp 216–221
24. Al Taweel Z, Challagundla L, Pagan A, Abuzneid A (2020) Smart parking for disabled parking improvement using RFID and database authentication. In: 2020 IEEE 6th World forum on internet of things (WF-IoT), pp 1–5
25. Amato G, Carrara F, Falchi F, Gennaro C, Vairo C (2016) Car parking occupancy detection using smart camera networks and deep learning. In: 2016 IEEE Symposium on computers and communication (ISCC), pp 1212–1217

Chapter 12

Automated Identification of Tachyarrhythmia from Different Datasets of Heart Rate Variability Using a Hybrid Deep Learning Model

Manoj Kumar Ojha, Sulochana Wadhwani, Arun Kumar Wadhwani, and Anupam Shukla

1 Introduction
The heartbeat is the most visible representation of the function of the heart and is the main physiological phenomenon of the human body. The ECG is a noninvasive recording of the heart's electrical activity. It shows in real time how the heart is working, and it can be used to detect and diagnose tachyarrhythmia [1, 2]. The sinus node initiates the entire heartbeat process, which comprises atrial depolarization, ventricular depolarization, and ventricular repolarization. Atrial depolarization generates the P wave, ventricular depolarization generates the QRS complex, and ventricular repolarization generates the T wave [3]. Tachyarrhythmia disturbs this cardiac activity and accelerates the heart rate. Heart rate variability (HRV) is the variation over time of the interval between successive heartbeats. In the literature, automatic tachyarrhythmia prediction systems follow a standard workflow. ECG signals are first preprocessed and segmented. Features are then extracted from the segmented signals and selected, and the selected features are used for classification [4].

Xia et al. [5] built a 2D matrix input suitable for a deep convolutional neural network (CNN) by analyzing ECG segments with the stationary wavelet transform and the short-term Fourier transform. They then created two distinct deep CNN models, one for the stationary wavelet transform output and one for the short-term Fourier transform output. Their method did not require detection of the R wave or P wave, nor any hand-designed features. Acharya et al. [6] detected atrial fibrillation (AFIB), atrial flutter (AFL), and ventricular fibrillation (VFIB) beats. They analyzed the ECG beats with different entropy methods and examined the extracted features using ANOVA. Decision tree and K-nearest neighbor classifiers then performed the automated classification. Desai et al. [7] used recurrence quantification analysis features to identify four types of heartbeats, namely normal sinus rhythm (NSR), AFIB, AFL, and VFIB, using ensemble classifiers. To select the best classifier, they ranked clinically significant features and fed them independently into three ensemble methods: decision trees, random forests, and rotation forests. A tenfold cross-validation strategy was applied for training and testing on the feature set. Acharya et al. [8] presented a CNN that automatically classifies ECG segments into the same classes used in this paper. The algorithm was an 11-layer CNN with four neurons in the output layer, one for each of the AFIB, VFIB, AFL, and NSR classes. That study used ECG segments of two and five seconds and did not need QRS detection. Instead of the typical approach of hand-crafted feature extraction and selection, Yıldırım et al. [9] designed an end-to-end structure based on a 1D CNN model. Their method, based on ten-second ECG segments, was applied to 1000 ECG segments from the MIT-BIH arrhythmia database. Sanjana et al. [10] trained a model on one tachycardia disorder, AFIB, and then tested it on VFIB and sinus tachycardia, using RNN, LSTM, GRU, CNN, and RSCNN models. Their analysis showed that, because tachycardia disorders share a fast rhythm, the method could not distinguish between the different types of tachycardia.

In this paper, we do not follow the traditional computer-aided pipeline. Unlike previous studies, this work uses no feature extraction or feature selection methods. We use a twelve-layer hybrid deep learning (HDL) model to automatically classify the heart rhythm in the ECG signal into four classes: AFL, VFIB, AFIB, and NSR. Consequently, there is no need to study different feature extraction methods or to identify which classifiers perform best with the extracted features.

M. K. Ojha (B) · S. Wadhwani · A. K. Wadhwani
Madhav Institute of Technology and Science, Gwalior, Madhya Pradesh, India
A. Shukla
Indian Institute of Information Technology, Pune, Maharashtra, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_12

2 Material and Methods
The ECG signals were divided into HRV windows based on annotations from publicly available databases. Before the windows are fed to the HDL model for validation and testing, each window is Z-score normalized to remove its amplitude scaling and offset [11]. These HRV windows are used to classify four rhythm classes, NSR and three tachyarrhythmias, with an HDL model that does not detect any ECG waves. Figure 1 shows the proposed architecture of the HDL model for tachyarrhythmia classification.
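The chapter states that each window is Z-score normalized but gives no code. Below is a minimal sketch of that step, written with the standard library only. Using the population standard deviation is an assumption, as is mapping a perfectly flat window to zeros. The usage lines show why the step helps: a scaled and offset copy of a window normalizes to the same values.

```python
import statistics


def zscore_window(window):
    """Z-score normalize one window: subtract its mean, divide by its std.

    Subtracting the mean removes the window's offset; dividing by the
    standard deviation removes its amplitude scaling. The population
    standard deviation is used here, which is an assumption.
    """
    mu = statistics.fmean(window)
    sigma = statistics.pstdev(window)
    if sigma == 0:
        # A perfectly flat window has no shape to preserve; return zeros.
        return [0.0] * len(window)
    return [(x - mu) / sigma for x in window]


# A scaled and offset copy of a window normalizes to the same values.
w = [0.1, 0.4, -0.2, 0.9, 0.0]
w_scaled = [3.0 * x + 5.0 for x in w]
print(zscore_window(w))
print(zscore_window(w_scaled))
```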

12 Automated Identification of Tachyarrhythmia …


Fig. 1 Proposed hybrid deep model for identification of tachyarrhythmia (diagram: ECG records of NSR, AFIB, AFL, and VFIB tachyarrhythmia beats → hybrid CNN + LSTM model → statistical parameter evaluation of the model)

2.1 ECG Data Records
The ECG records were obtained from three publicly available databases: the MIT-BIH atrial fibrillation database (afdb), the Creighton University ventricular tachyarrhythmia database (cudb), and the MIT-BIH arrhythmia database (midb) [12]. The midb signals are sampled at 360 Hz, whereas the cudb and afdb signals are sampled at 250 Hz. The midb signals were therefore downsampled from 360 to 250 Hz. The ECG signals were then split into windows according to annotations from the databases. In this work, each ECG window covers five seconds of the lead II recording. In total, the databases yield 8683 HRV windows: 361 NSR, 7521 AFIB, 736 AFL, and 65 VFIB windows.
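The chapter does not say how the 360 Hz midb signals were resampled. The minimal sketch below uses plain linear interpolation and then cuts fixed 5-s windows, which at 250 Hz gives 250 × 5 = 1250 samples per window. Several things here are assumptions:

- Linear interpolation and uniform, non-overlapping windows are used only for illustration. The authors split windows according to the database annotations.
- `resample_linear` and `split_windows` are hypothetical helper names.
- A real pipeline should low-pass filter before downsampling. One option is a polyphase resampler such as SciPy's `resample_poly(x, 25, 36)`, since 360 × 25 / 36 = 250.

```python
def resample_linear(signal, fs_in, fs_out):
    """Resample a 1-D signal from fs_in Hz to fs_out Hz by linear interpolation
    (no anti-aliasing filter; illustrative only)."""
    n_out = int(len(signal) * fs_out / fs_in)
    out = []
    for k in range(n_out):
        pos = k * fs_in / fs_out        # position of output sample k on the input grid
        i = int(pos)
        frac = pos - i
        if i + 1 < len(signal):
            out.append((1.0 - frac) * signal[i] + frac * signal[i + 1])
        else:
            out.append(signal[-1])      # hold the last sample at the boundary
    return out


def split_windows(signal, fs, seconds=5):
    """Cut a signal into consecutive, non-overlapping windows of `seconds`
    length; an incomplete tail is dropped."""
    size = int(fs * seconds)
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, size)]


# 12 s of dummy samples standing in for a midb record at 360 Hz.
record_360 = [float(n % 360) for n in range(360 * 12)]
record_250 = resample_linear(record_360, 360, 250)
windows = split_windows(record_250, 250)
print(len(record_250), len(windows), len(windows[0]))
```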

2.2 Convolution Neural Network
An important feature of the CNN architecture is that it uses convolution to automatically select and extract the most significant features from the input data. It places strong emphasis on local features and their positions. In addition, neurons in the same feature map share the same weights, which allows the network to learn in parallel and significantly reduces training time. Its primary objective is to learn representations of the input features it receives. Convolution layers consist of multiple kernels, each of which computes a different feature map. Each kernel is first convolved with the input data. An element-wise nonlinear activation function is then applied to the result to obtain a new feature map [13]. Mathematically, the ith feature map of the lth layer is computed as follows:

$$Y^{l}_{m,n,i} = {\mathbf{w}^{l}_{i}}^{\top}\mathbf{x}^{l}_{m,n} + b^{l}_{i} \tag{1}$$

where $\mathbf{x}^{l}_{m,n}$ is the input patch centered at location $(m, n)$, $\mathbf{w}^{l}_{i}$ is the weight vector, and $b^{l}_{i}$ is the bias.
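Equation (1) computes each feature-map value as a dot product between the kernel weights and an input patch, plus a bias. An element-wise activation is then applied, as the text describes. A minimal 1-D sketch of this follows. It assumes the following:

- ReLU is the activation.
- Convolution uses "valid" positions, with no padding and a stride of 1.
- Patches are indexed by their first sample rather than their centre.
- The input window has 1250 samples (5 s at 250 Hz). With a 25-tap kernel, this gives 1250 − 25 + 1 = 1226 outputs, the length listed for layer i in Table 1.

```python
def relu(z):
    return max(0.0, z)


def feature_map(x, w, b, activation=relu):
    """One 1-D feature map in the spirit of Eq. (1).

    Each output is the dot product of kernel weights `w` with the input patch
    under the kernel, plus the bias `b`, followed by an element-wise
    nonlinearity. Valid (unpadded) positions with stride 1 are used, so the
    output has len(x) - len(w) + 1 values.
    """
    k = len(w)
    out = []
    for m in range(len(x) - k + 1):
        patch = x[m:m + k]
        z = sum(wi * xi for wi, xi in zip(w, patch)) + b   # w^T x_m + b
        out.append(activation(z))
    return out


print(feature_map([1.0, 2.0, -1.0, 0.5, 3.0], w=[0.5, -1.0, 0.25], b=0.1))
# Assuming a 1250-sample window and a 25-tap kernel (layer i of Table 1):
print(len(feature_map([0.0] * 1250, [0.0] * 25, 0.0)))
```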

2.3 Long Short-Term Memory
LSTM networks can learn long-term dependencies and are widely used for many kinds of tasks [14]. LSTM was designed to overcome the difficulty of learning long-term dependencies, and it also mitigates the vanishing-gradient problem. The LSTM cell is a specially designed logic block that reduces vanishing gradients. This makes it well suited to tasks that require long-term memory, such as predicting sequential data [13].
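The chapter describes the LSTM cell only qualitatively. The minimal sketch below shows one time step of a scalar LSTM cell and how its gated, additive cell update works. It uses the common formulation with a forget gate, which was a later addition to the original design in [14]. The parameter names (`Wf`, `Uf`, `bf`, and so on) and their values are hypothetical. In the HDL model the weights are learned matrices, with 16 and 32 units in the two LSTM layers (Table 1).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (standard formulation with a forget gate).

    `p` holds hypothetical scalar weights: an input weight W, a recurrent
    weight U and a bias b for each gate (f, i, o) and for the candidate g.
    """
    f = sigmoid(p["Wf"] * x + p["Uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["Wi"] * x + p["Ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["Wo"] * x + p["Uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["Wg"] * x + p["Ug"] * h_prev + p["bg"])  # candidate cell value
    c = f * c_prev + i * g   # additive update: the path that eases vanishing gradients
    h = o * math.tanh(c)     # hidden state passed to the next step and layer
    return h, c


names = ["Wf", "Uf", "bf", "Wi", "Ui", "bi", "Wo", "Uo", "bo", "Wg", "Ug", "bg"]
params = {k: 0.5 for k in names}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.3, 0.8]:   # a short input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```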

2.4 Hybrid Deep Model Architecture
Table 1 summarizes the proposed multi-layer architecture of the HDL model for tachyarrhythmia classification. In this architecture, the three convolution layers (the first, third, and fifth layers) use kernel sizes of 25, 10, and 4, respectively, with a stride of 1. The convolution these layers compute is given in Eq. (2).

Table 1 Architecture of the proposed hybrid deep learning (HDL) model

| Layer no. | Layer type | Neurons | Kernel size | Stride |
|---|---|---|---|---|
| i | Convolution | 1226 × 3 | 25 | 1 |
| ii | Max pooling | 613 × 3 | 2 | 2 |
| iii | Convolution | 603 × 5 | 10 | 1 |
| iv | Max pooling | 301 × 5 | 2 | 2 |
| v | Convolution | 297 × 5 | 4 | 1 |
| vi | Max pooling | 148 × 5 | 2 | 2 |
| vii | LSTM | 148 × 16 | – | – |
| viii | LSTM | 148 × 32 | – | – |
| ix | Flatten | 4736 | – | – |
| x | Dense (ReLU) | 500 | – | – |
| xi | Dense (ReLU) | 100 | – | – |
| xii | Dense (softmax) | 4 | – | – |

$$Y_n = \sum_{i=0}^{N-1} x_i \, f_{n-i} \tag{2}$$

In Eq. (2), Y is the output vector, x the signal values, f the filter, and N the number of elements. Max pooling is applied after each convolution layer to reduce the size of the feature maps. Categorical cross-entropy is used as the loss function, and the model is evaluated on the database with fivefold cross-validation. The HDL model is trained and tested iteratively, with a test performed after each iteration. Dense layers follow the LSTM layers: two with ReLU activation and a final dense layer with softmax activation. The output of the final dense layer determines the tachyarrhythmia class.

Table 2 Confusion matrix summarizing the experimental results on the test dataset (rows: truth data; columns: classifier results)

| Truth data / classifier results | NSR | AFIB | AFL | VFIB |
|---|---|---|---|---|
| NSR | 67 | 4 | 2 | 0 |
| AFIB | 6 | 1488 | 8 | 2 |
| AFL | 2 | 3 | 142 | 0 |
| VFIB | 0 | 2 | 0 | 11 |
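The minimal sketch below illustrates Eq. (2) and the pooling step of Sect. 2.4. It computes a discrete convolution, keeping only the positions where the filter fully overlaps the signal, followed by non-overlapping max pooling with size 2 and stride 2. The input is assumed to be a 1250-sample window (5 s at 250 Hz). On that input, a 25-tap filter gives 1226 outputs and pooling gives 613, matching layers i and ii of Table 1.

Note that Eq. (2) flips the filter (`f[n - i]`), while deep-learning libraries usually implement cross-correlation without the flip. Because the kernels are learned, this does not change what a layer can represent.

```python
def conv_valid(x, f):
    """Discrete convolution as in Eq. (2), kept only where the flipped
    filter fully overlaps the signal ('valid' positions)."""
    n_f = len(f)
    out = []
    for n in range(n_f - 1, len(x)):
        # Y_n = sum_i x_i * f_{n-i}, restricted to indices where f is defined
        out.append(sum(x[i] * f[n - i] for i in range(n - n_f + 1, n + 1)))
    return out


def max_pool(x, size=2, stride=2):
    """Max pooling over windows of `size`; an incomplete tail window is dropped."""
    return [max(x[s:s + size]) for s in range(0, len(x) - size + 1, stride)]


signal = [float(v % 7) for v in range(1250)]   # dummy 5-s window at 250 Hz (assumed length)
y = conv_valid(signal, [1.0 / 25] * 25)        # kernel size 25, as in layer i
pooled = max_pool(y)                           # pool size 2, stride 2, as in layer ii
print(len(y), len(pooled))
```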

Table 3 Performance of classification on the test dataset (measurement in %)

| Metric | NSR | AFIB | AFL | VFIB | Average |
|---|---|---|---|---|---|
| Accuracy (AC) | 99.25 | 98.62 | 99.14 | 99.77 | 99.195 |
| Precision (PR) | 89.35 | 99.12 | 93.1 | 85.43 | 91.75 |
| Recall (RC) | 93.11 | 99 | 97.41 | 85 | 93.63 |
| F1 Score (F1) | 91.51 | 99.1 | 95.05 | 85.2 | 92.715 |

Table 4 Performance comparison with other methods

| Authors | Method | Accuracy (%) |
|---|---|---|
| Martis et al. 2014 [15] | KNN | 99.45 |
| Desai et al. 2016 [7] | Rotation Forest | 98.37 |
| Acharya et al. 2017 [8] | CNN | 92.5 |
| Amrani et al. 2018 [16] | Deep CNN | 97.37 |
| Proposed | HDL | 98.4 |

3 Result
The proposed HDL model was validated and tested on a desktop PC with an i5 processor and 8 GB of RAM. It was implemented in Python to identify VFIB, AFIB, and AFL tachyarrhythmias as well as NSR. Its accuracy was assessed by comparing the results with those of other methods in the literature. The HDL model was trained on 80% of the dataset using a fivefold cross-validation strategy. The data are split into five equal parts, four of which are used for training and the remaining one for testing. This process is repeated five times, each time holding out a different part for testing, and performance is evaluated on each fold. The overall performance of the system is the average over all five folds. The remaining 20% of the dataset was used for testing. Statistical performance is calculated with the following parameters: Re, Pe, Fs, and Ac.

Accuracy, the percentage of correct predictions:

$$\mathrm{Ac} = \frac{TP + TN}{TP + FN + FP + TN} \tag{3}$$

Precision, the proportion of positive predictions that are correct:

$$\mathrm{Pe} = \frac{TP}{TP + FP} \tag{4}$$

Recall, the proportion of actual positives that are correctly predicted:

$$\mathrm{Re} = \frac{TP}{TP + FN} \tag{5}$$

F1 score, the harmonic mean of Pe and Re, which balances the two in a single number:

$$\mathrm{Fs} = 2 \cdot \frac{\mathrm{Pe} \cdot \mathrm{Re}}{\mathrm{Pe} + \mathrm{Re}} \tag{6}$$

Here, TP = true positive, TN = true negative, FN = false negative, and FP = false positive.

ECG classification was evaluated using the confusion matrix shown in Table 2. In this experiment, the HDL model, trained and tested on the respective datasets, classifies the heart rhythms into four types. The diagonal elements are the rhythms correctly assigned to their respective classes. According to the confusion matrix for the test dataset, six NSR, sixteen AFIB, five AFL, and two VFIB windows were misclassified into other classes.
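For a multi-class problem, Eqs. (3)–(6) are applied per class in a one-vs-rest manner. For class k, TP is the diagonal entry. FN is the rest of row k (truth k, predicted otherwise), and FP is the rest of column k (predicted k, truth otherwise). TN is everything else. The minimal sketch below applies this to the counts in Table 2, with rows as truth as implied by the misclassification counts in the text. Metrics computed from this single matrix need not coincide exactly with Table 3.

```python
# Counts from Table 2: rows = truth, columns = classifier output.
CLASSES = ["NSR", "AFIB", "AFL", "VFIB"]
CONFUSION = [
    [67, 4, 2, 0],
    [6, 1488, 8, 2],
    [2, 3, 142, 0],
    [0, 2, 0, 11],
]


def one_vs_rest_metrics(cm):
    """Per-class Ac, Pe, Re, Fs (Eqs. 3-6) plus the number of misclassified windows."""
    total = sum(sum(row) for row in cm)
    results = {}
    for k, name in enumerate(CLASSES):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                      # truth k, predicted something else
        fp = sum(row[k] for row in cm) - tp       # predicted k, truth something else
        tn = total - tp - fn - fp
        ac = (tp + tn) / total                                # Eq. (3)
        pe = tp / (tp + fp) if tp + fp else 0.0               # Eq. (4)
        re = tp / (tp + fn) if tp + fn else 0.0               # Eq. (5)
        fs = 2 * pe * re / (pe + re) if pe + re else 0.0      # Eq. (6)
        results[name] = {"Ac": ac, "Pe": pe, "Re": re, "Fs": fs, "errors": fn}
    return results


metrics = one_vs_rest_metrics(CONFUSION)
for name, m in metrics.items():
    print(f"{name:5s} Ac={m['Ac']:.4f} Pe={m['Pe']:.4f} "
          f"Re={m['Re']:.4f} Fs={m['Fs']:.4f} misclassified={m['errors']}")
```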

4 Discussion
According to the results, the HDL model classifies the tachyarrhythmia classes with Ac 99.19%, Pe 91.75%, Re 93.63%, and Fs 92.71%. Table 3 summarizes the statistical measurements of the HDL model. These results show that the proposed method can provide excellent classification accuracy. It therefore has the potential to become a useful tool for cardiologists diagnosing tachyarrhythmia from the ECG. Several published studies classify ECG data for tachyarrhythmia using publicly available databases such as cudb, afdb, and mitdb. The state-of-the-art techniques they use are described below.

Acharya et al. [8] proposed a CNN method that automatically classifies four classes, NSR, AFIB, AFL, and VFIB, from five-second ECG segments without QRS detection. The technique achieved 94.90% accuracy on five-second ECG segments. Martis et al. [15] presented a technique for the automatic detection of normal, AFL, and AFIB ECG rhythms. They applied four different types of methods and tested three classification methods on the MIT-BIH arrhythmia and AF databases. DCT coupled with ICA and KNN gave an average sensitivity (Se) of 99.61%, an average specificity (Sp) of 100%, and a classification Ac of 99.45% using tenfold cross-validation. Compared with that system, ours additionally identifies VFIB, a harmful rhythm that is more complicated than the other three. Amrani et al. [16] proposed a very deep CNN that uses small filters throughout the network to decrease noise and increase performance. Their considerably more precise system can discriminate between NSR heartbeats and the three arrhythmias AFIB, AFL, and VFIB without any noise-reduction or preprocessing approaches. Their experimental results indicate an accuracy of 96.62% with fusion and 93.15% without fusion. Desai et al. [7] classify four types of ECG rhythms, namely NSR, AFIB, AFL, and VFIB, using three ensemble methods (decision tree, random forest, and rotation forest) based on ranked features. They train on the feature set with a tenfold cross-validation strategy and obtain 98.37% accuracy.

The performance of the HDL model is, however, affected by the number of subjects (data) available in each class. Most of the papers presented in Table 4 relied on QRS wave detection. Our result is comparable to the previous work presented in Table 4, which indicates that QRS wave detection is not necessary for the classification of tachyarrhythmia.

5 Conclusion
In this paper, we propose a hybrid model that identifies four types of beats, AFIB, VFIB, AFL, and NSR, using an HRV database. The HDL model achieved an overall accuracy of 98.4% with fivefold cross-validation in the automated detection of different types of tachyarrhythmia from the HRV database. It requires neither explicit feature extraction nor traditional classifiers. Our model can help clinicians diagnose tachyarrhythmia accurately. In future work, we will investigate whether this approach can be used to diagnose other heart disorders, including myocardial infarction and coronary artery disease.

References
1. Roth GA, Forouzanfar MH, Moran AE, Barber R, Nguyen G, Feigin VL, Naghavi M, Mensah GA, Murray CJL (2015) Demographic and epidemiologic drivers of global cardiovascular mortality. New England J Med 372(14):1333–1341
2. Benjamin EJ, Blaha MJ, Chiuve SE, Cushman M, Das SR, Deo R (2017) Statistics Committee and Stroke Statistics Subcommittee of the American Heart Association. Heart disease and stroke statistics: 2017 update: a report from the American Heart Association. Circulation 135:e146–603
3. Kumar OM, Wadhwani S, Wadhwani AK (2020) Efficient R peak detection algorithm from ECG using combination stationary wavelet transform and hilbert transform. Solid State Technol 63(5):8685–8697
4. Pandey SK, Janghel RR (2020) Automatic arrhythmia recognition from electrocardiogram signals using different feature methods with long short-term memory network model. Sig Image Video Proc 14(6):1255–1263
5. Xia Y, Wulan N, Wang K, Zhang H (2018) Detecting atrial fibrillation by deep convolutional neural networks. Comput Biol Med 93:84–92
6. Acharya UR, Fujita H, Adam M, Lih OS, Hong TJ, Sudarshan VK, Koh JEW (2016) Automated characterization of arrhythmias using nonlinear features from tachycardia ECG beats. In: 2016 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, pp 000533–000538
7. Desai U, Martis RJ, Rajendra Acharya U, Gurudas Nayak C, Seshikala G, RANJAN, Shetty K (2016) Diagnosis of multiclass tachycardia beats using recurrence quantification analysis and ensemble classifiers. J Mech Med Biol 16(1):1640005
8. Acharya UR, Fujita H, Lih OS, Hagiwara Y, Tan JH, Adam M (2017) Automated detection of arrhythmias using different intervals of tachycardia ECG segments with convolutional neural network. Inf Sci 405:81–90
9. Yıldırım Ö, Pławiak P, Tan R-S, Rajendra Acharya U (2018) Arrhythmia detection using deep convolutional neural network with long duration ECG signals. Comput Biol Med 102:411–420
10. Sanjana K et al (2020) Explainable artificial intelligence for heart rate variability in ECG signal. Healthcare Technol Lett 7(6):146
11. Xu SS, Mak M-W, Cheung C-C (2018) Towards end-to-end ECG classification with raw signal extraction and deep neural networks. IEEE J Biomed Health Inf 23(4):1574–1584
12. Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng C-K, Eugene Stanley H (2000) PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23):e215–e220
13. Murat F, Yildirim O, Talo M, Baloglu UB, Demir Y, Rajendra Acharya U (2020) Application of deep learning techniques for heartbeats detection using ECG signals-analysis and review. Comput Biol Med 120:103726
14. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
15. Martis RJ, Rajendra Acharya U, Adeli H, Prasad H, Tan JH, Chua KC, Too CL, Yeo SWJ, Tong L (2014) Computer aided diagnosis of atrial arrhythmia using dimensionality reduction methods on transform domain representation. Biomed Sig Proc Control 13:295–305
16. Amrani M, Hammad M, Jiang F, Wang K, Amrani A (2018) Very deep feature extraction and fusion for arrhythmias detection. Neural Comput Appl 30(7):2047–2057

Chapter 13

Automatic Pathological Myopia Detection Using Ensemble Model

Rajeshwar Patil, Yogeshwar Patil, Yatharth Kale, Ashish Shetty, and Sanjeev Sharma

1 Introduction
Pathological myopia is an extreme form of short-sightedness that alters the globe, or shape, of the eye and may lead to significant loss of vision [1]. The condition is progressive and irreversible. In its early stages, while the retina and choroid are slowly being stretched, individuals may not experience any symptoms. If left untreated, the condition can cause several complications and eventually result in loss of vision [1, 2]. Regular eye check-ups can prevent this, but they increase the workload of medical professionals, and results take longer to obtain. An automated computer-based approach can help detect the disease in its early stages so that patients receive appropriate treatment. Pathological myopia can be detected with methods such as fundoscopy and spectral-domain OCT (SD-OCT). However, these methods either take a long time to return a result or require highly complex equipment. There is therefore a need for a computer-aided diagnosis system that automates the process. Artificial intelligence, machine learning and deep learning are cutting-edge technologies used to solve major challenges across the world, including in medicine.

Pathological myopia is of great clinical importance because it is progressive and irreversible and affects individuals during the most productive years of their lives. It is thought to affect approximately 3% of the population [3], so it is important to develop a system that is inexpensive and widely available. Once trained on a variety of ocular images, deep learning provides fast, automatic disease detection. It thus reduces the load on ophthalmologists and helps prevent vision damage. Artificial intelligence, machine learning and deep learning are emerging fields in today's data-driven world, and deep learning has made huge progress in the medical field. Various models have produced accurate results on medical data, which shows that they can learn patterns from such data and produce promising results.

We propose an ensemble model that classifies pathological myopia. Transfer learning is applied to three pretrained models (Xception, InceptionV3 and DenseNet201), which are then ensembled to achieve better accuracy. The rest of the paper is organized as follows. Section 2 reviews the literature, Sect. 3 describes the materials and methods, and Sect. 4 presents the experiments and results. Section 5 concludes the paper and discusses future trends.

R. Patil (B) · Y. Patil · Y. Kale · A. Shetty · S. Sharma
Indian Institute of Information Technology, Pune, India
URL: https://deeplearningforresearch.com/
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_13

2 Literature Survey
In [4], pathological myopia images are classified and different lesions are segmented using deep learning techniques, namely a CNN and U-Net. The model achieves an accuracy of 97.8% in experimental testing. The authors of [5] proposed a method for detecting pathological myopia with a CNN consisting of two convolutional layers, one dense layer and a sigmoid output layer. Trained on the PALM 2019 challenge dataset, the model achieved a best validation loss of 0.1457, an AUC score of 0.9845 and a best accuracy of 95%. The paper [6] proposes a method for detecting pathological myopia from peripapillary atrophy features based on variational level sets. Evaluated on a dataset of 40 images, it achieved an accuracy of 95%, a specificity of 1 and a sensitivity of 0.9. The authors of [7] proposed a deep learning system for myopia detection using a dataset of 2350 ocular appearance images. The model is a deep convolutional neural network, VGG-Face with transfer learning, with 15 layers for feature extraction and disease detection. It achieved an AUC of 0.9270 and a sensitivity of 81.13%. In [8], the authors created a CNN-based approach that evaluates the likelihood of a patient being diagnosed with pathological myopia (PM) from his or her retinal pictures. A novel residual U-Net architecture segments the optic disc from the retinal images. On validation data, PM recognition and optic disc segmentation achieved scores of 0.9973 and 0.901, respectively. In [9], pretrained convolutional neural networks are used to detect pathological myopia, with accuracies of 95.34% using ResNet-50 and 98.08% using DenseNet121. In [10], the authors tested three systems for multi-class classification using a bagging ensemble. They found that ensembles built from deep neural networks outperformed traditional algorithms used on their own.

3 Materials and Methods
3.1 Methodology
Figure 1 shows the process used to detect and classify pathological myopia images with deep learning. It begins with a review of existing models and the ways they are implemented. The next steps are:
- collecting the data;
- applying the preprocessing needed to improve and enhance the images;
- designing a predictive deep learning model;
- training the model on the collected images.

The trained model is then evaluated on the testing data (images). If it performs better than the existing models, training stops. Otherwise, the process is repeated from the design step.

3.2 Datasets
The data were collected from two ocular disease datasets, whose RGB images were taken under different imaging conditions. Ocular disease intelligent recognition (ODIR) is a dataset of 5000 ophthalmic images of patients [11]. It provides colour fundus images of each patient's left and right eyes along with the doctors' diagnostic keywords. The images are annotated by trained human readers under quality-control management. The patient images are classified into eight labels: normal, diabetes, glaucoma, cataract, age-related macular degeneration, hypertension, pathological myopia, and other diseases/abnormalities. The second dataset [12] contains 1000 fundus images of 39 different classes. Only pathological myopia and normal fundus images are collected from these two datasets (Fig. 2).

Fig. 1 Flow graph

Fig. 2 Sample images from the datasets

3.3 Preprocessing
Labelled pathological myopia and normal fundus images are collected from two Kaggle datasets [11, 12], which are then merged. The data imbalance problem was handled by restricting the number of normal images. The images are resized to 224 × 224 to obtain a new preprocessed dataset. They are then rescaled so that every pixel value is mapped from the range [0, 255] to [0, 1], which places images from different sources and resolutions on a common intensity scale. Data augmentation (horizontal flip, vertical flip, shear and rotation) is applied to enlarge the dataset and to make the trained model more robust to real-world data.
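The chapter does not show its preprocessing code, and in practice a library image pipeline would normally be used. The minimal standard-library sketch below only illustrates three of the steps described above:

- mapping pixel values from [0, 255] to [0, 1];
- horizontal and vertical flips;
- capping the number of normal images to reduce class imbalance.

Images are assumed to be H × W × C nested lists. Shear and rotation are omitted. The helper `cap_class`, its `limit` and `seed` parameters, and the file names are hypothetical.

```python
import random


def rescale(image):
    """Map 8-bit pixel values from [0, 255] to [0, 1]; image is H x W x C nested lists."""
    return [[[c / 255.0 for c in px] for px in row] for row in image]


def hflip(image):
    """Horizontal flip: mirror each row left-right."""
    return [row[::-1] for row in image]


def vflip(image):
    """Vertical flip: reverse the order of the rows."""
    return image[::-1]


def cap_class(items, limit, seed=0):
    """Randomly keep at most `limit` items (e.g. normal images) to reduce imbalance."""
    if len(items) <= limit:
        return list(items)
    return random.Random(seed).sample(items, limit)


img = [[[0, 128, 255], [255, 255, 255]],
       [[10, 20, 30], [0, 0, 0]]]          # a 2 x 2 RGB image
print(rescale(img)[0][0])                  # first pixel after rescaling
print(hflip(img)[0][0], vflip(img)[0][0])  # pixels that moved into the top-left corner
normal = [f"normal_{i}.jpg" for i in range(10)]
print(len(cap_class(normal, 4)))
```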

3.4 Proposed Design
Xception
The model is based on depthwise separable convolution layers. On the ImageNet dataset, Xception achieved a top-one accuracy of 79% and a top-five accuracy of 94.5%. The ImageNet dataset comprises over 15 million labelled high-resolution images in approximately 22,000 categories.
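Xception's building block, the depthwise separable convolution, factorizes a standard convolution into two steps. A depthwise step applies one k × k spatial filter per input channel, and a pointwise step applies a 1 × 1 convolution that mixes channels. Counting weights shows why this is cheaper. The layer sizes below are illustrative only, not taken from Xception, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise
    convolution mixing the channels (no bias)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


k, c_in, c_out = 3, 64, 128                   # illustrative layer sizes
std = standard_conv_params(k, c_in, c_out)    # 3*3*64*128 = 73728
sep = separable_conv_params(k, c_in, c_out)   # 576 + 8192 = 8768
print(std, sep, round(std / sep, 1))
```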


Fig. 3 Xception model architecture [14]

Fig. 4 DenseNet model architecture [16]

The Xception architecture has 36 convolutional layers, which form the feature extraction base of the network [13]. Figure 3 shows the architecture of the Xception model.

DenseNet201
DenseNet201 is a convolutional neural network that is 201 layers deep. On the ImageNet dataset, DenseNet201 achieved a top-one accuracy of 77.3% and a top-five accuracy of 93.6%. The model has 20,242,984 parameters [15]. Figure 4 shows the architecture of the DenseNet201 model.

InceptionV3
InceptionV3 is a convolutional neural network designed to reduce computing cost without reducing accuracy. It also makes the architecture easy to extend or adapt without sacrificing performance or efficiency. On the ImageNet dataset, InceptionV3 achieved a top-one accuracy of 77.9% and a top-five accuracy of 93.7%. InceptionV3 has 23,851,784 parameters and a 159-layer-deep architecture [17]. Figure 5 shows the architecture of the InceptionV3 model.

Fig. 5 Inception model architecture [18]

Ensemble model
Ensembling is a meta-algorithm for combining several machine learning models. Ensembles can be used to decrease variance (bagging) or bias (boosting) [19], or to improve predictions (stacking) [20]. Stacking combines several predictive models by constructing a new model that learns from the information in all of them. It exploits the areas where each model performs best while discounting the areas where it performs poorly. For this reason, stacking is used here to improve the models' predictions.

The ensemble model is built by stacking three convolutional models trained on the pathological myopia dataset. These three models are obtained by applying transfer learning to the Xception, InceptionV3 and DenseNet201 models, respectively, and adding three custom layers to each. Images from the dataset are passed to the three models, each of which has previously been trained on the pathological myopia dataset. Their outputs are fed to a hidden layer connected to a softmax layer with two nodes, one for each label (pathological myopia and no pathological myopia). This stacked ensemble model is then trained on the pathological myopia dataset. After training, the label with the highest probability is output as the result. The ensembled model architecture is shown in Fig. 6.

4 Experiments and Results
First, individual models were built for binary detection of pathological myopia using pretrained models. Transfer learning was applied to several pretrained models, including VGG16, VGG19, InceptionV3, DenseNet201, Xception, ResNet, MobileNet and EfficientNet. The three best-performing models, Xception, InceptionV3 and DenseNet201, were selected for ensembling. These models are then combined using stacked ensembling, with custom layers added to each model to adapt it to this task. The final ensembled model predicts the class with the maximum probability.

Fig. 6 Ensemble model architecture

Fig. 7 Evaluation metrics for first model

Table 1 Classification report of first model

| | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| No pathological myopia | 0.76 | 0.99 | 0.86 | 68 |
| Pathological myopia | 0.97 | 0.64 | 0.77 | 58 |
| Accuracy | | | 0.83 | 126 |
| Macro average | 0.87 | 0.81 | 0.81 | 126 |
| Weighted average | 0.86 | 0.83 | 0.82 | 126 |

4.1 First Model
The model is trained using transfer learning on the Xception model pretrained on the ImageNet dataset. The weights of the model are frozen (i.e. set to non-trainable), and the last layer of the pretrained model is removed. Custom layers are then attached: three dense layers and a softmax layer with two nodes. The Adam optimizer is used with a learning rate of 0.0001 in the training phase. The model is trained for ten epochs, after which no significant improvement was observed. It achieves an accuracy of 83% on the validation data. The confusion matrix and AUC-ROC curve of the model on the validation data are shown in Fig. 7, and its classification report is given in Table 1.
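The "Macro average" and "Weighted average" rows of Tables 1–3 can be recomputed from the per-class rows. The macro average is the plain mean over the two classes, and the weighted average weights each class by its support (68 and 58 validation images). This layout resembles scikit-learn's `classification_report`, although the chapter does not name the tool it used. The minimal sketch below uses the Table 1 values. Because those inputs are already rounded, results can differ from the table in the last digit.

```python
# Per-class values from Table 1 (first model, Xception-based); already rounded.
report = {
    "No pathological myopia": {"precision": 0.76, "recall": 0.99, "f1": 0.86, "support": 68},
    "Pathological myopia": {"precision": 0.97, "recall": 0.64, "f1": 0.77, "support": 58},
}


def averages(report, metric):
    """Return (macro, weighted) averages of one metric over the classes."""
    values = [r[metric] for r in report.values()]
    supports = [r["support"] for r in report.values()]
    macro = sum(values) / len(values)                                        # unweighted mean
    weighted = sum(v * s for v, s in zip(values, supports)) / sum(supports)  # support-weighted
    return macro, weighted


for metric in ("precision", "recall", "f1"):
    macro, weighted = averages(report, metric)
    print(f"{metric:9s} macro={macro:.3f} weighted={weighted:.3f}")
```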


Fig. 8 Evaluation metrics for second model

Table 2 Classification report of second model
                        Precision  Recall  F1-score  Support
No pathological myopia  0.80       0.99    0.88      68
Pathological myopia     0.98       0.71    0.82      58
Accuracy                                   0.86      126
Macro average           0.89       0.85    0.85      126
Weighted average        0.88       0.86    0.85      126

4.2 Second Model The second model adopts the same approach as the first model, but transfer learning is applied to the InceptionV3 model instead of the Xception model. An accuracy of 86% is achieved on the validation data. The confusion matrix, AUC-ROC curve and classification report of the model on the validation data are shown in Fig. 8 and Table 2, respectively.

4.3 Third Model The third model adopts the same approach as the first model, but transfer learning is applied to the DenseNet201 model instead of the Xception model. An accuracy of 94% is achieved on the validation data. The confusion matrix, AUC-ROC curve and classification report of the model on the validation data are shown in Fig. 9 and Table 3, respectively.


Fig. 9 Evaluation metrics for third model

Table 3 Classification report of third model
                        Precision  Recall  F1-score  Support
No pathological myopia  0.92       0.99    0.95      68
Pathological myopia     0.98       0.90    0.94      58
Accuracy                                   0.94      126
Macro average           0.95       0.94    0.94      126
Weighted average        0.95       0.94    0.94      126

4.4 Ensembled Model The three trained models are loaded, their last softmax layers are removed, and their weights are frozen (i.e. set to non-trainable). These three models are stacked to form a stacked ensemble model: their output layers are connected to a hidden layer, which is in turn connected to a softmax layer with two nodes corresponding to the two labels (non-pathological myopia and pathological myopia). The stacked ensemble's training and test data take the form ([Xtrain, Xtrain, Xtrain], Ytrain) and ([Xtest, Xtest, Xtest], Ytest), i.e. the same image array is supplied once for each of the three model inputs, together with a single label array against which the model's performance is checked. A learning rate of 0.0001 is used with the Adam optimizer in the training phase, and no dropout is added during training. The ensemble model is trained for 10 epochs, after which it converges. An accuracy of 95.23% is achieved on the validation data. The training accuracy and training loss of the ensemble model are 95.19% and 0.1646, respectively, and the validation accuracy and validation loss are 96.03% and 0.1341, respectively. The training and testing accuracy and loss curves for the final ensemble model are shown in Fig. 10. The confusion matrix, AUC-ROC curve and classification report of the stacked ensemble model on the validation data are shown in Fig. 11 and Table 4, respectively. Because each individual model may learn different features, the ensemble benefits from all the models contributing together: information missed by one model can be picked up by the others, and vice versa, which can increase the accuracy of the complete ensemble model.
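The wiring described above (the same input array supplied once to each of three frozen branches, their outputs concatenated and passed through a hidden layer and a two-node softmax, and the highest-probability label taken as the prediction) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the three "branches" are hypothetical stand-in functions that map a flattened image to a two-element feature vector, and the head weights are arbitrary illustrative numbers rather than trained values. In the paper, the branches are the truncated Xception, InceptionV3 and DenseNet201 models and the head is trained with Adam.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Branch = Callable[[Sequence[float]], Vector]
LABELS = ("no pathological myopia", "pathological myopia")


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float], relu: bool = False) -> Vector:
    """Fully connected layer; weights[j] is the weight row of output unit j."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out] if relu else out


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stacked_forward(inputs: Sequence[Sequence[float]], branches: Sequence[Branch],
                    hidden_w: Sequence[Sequence[float]], hidden_b: Sequence[float],
                    out_w: Sequence[Sequence[float]], out_b: Sequence[float]) -> Vector:
    """Feed one input per frozen branch, concatenate the branch outputs, then
    apply a ReLU hidden layer and a two-node softmax output layer."""
    if len(inputs) != len(branches):
        raise ValueError("exactly one input array is needed per branch")
    features: Vector = []
    for x, branch in zip(inputs, branches):
        features.extend(branch(x))  # frozen: nothing in the branch is updated here
    hidden = dense(features, hidden_w, hidden_b, relu=True)
    return softmax(dense(hidden, out_w, out_b))


def predict_label(probs: Sequence[float]) -> str:
    """Return the label with the highest probability."""
    return LABELS[max(range(len(probs)), key=lambda k: probs[k])]


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Hypothetical stand-ins for the three truncated CNNs.
branches: List[Branch] = [
    lambda img: [mean(img), max(img)],
    lambda img: [min(img), mean(img)],
    lambda img: [max(img) - min(img), mean(img)],
]
# Illustrative, untrained head weights: hidden unit 0 responds to bright inputs,
# hidden unit 1 to dark ones; the output layer maps them to the two labels.
HIDDEN_W = [[1, 0, 0, 1, 0, 1], [-1, 0, 0, -1, 0, -1]]
HIDDEN_B = [-1.5, 1.5]
OUT_W = [[0, 1], [1, 0]]
OUT_B = [0.0, 0.0]

image = [0.9, 0.8, 0.7, 0.6]
x = [image, image, image]  # mirrors the ([Xtrain, Xtrain, Xtrain], Ytrain) layout
probs = stacked_forward(x, branches, HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
print([round(p, 3) for p in probs], predict_label(probs))
```

Replicating the array is needed only because the stacked model exposes one input per branch; a design that shares a single input tensor across the branches would achieve the same effect without the copies.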


Fig. 10 Ensemble model architecture

Fig. 11 Evaluation metrics for ensemble model

Table 4 Classification report of the ensemble model
                        Precision  Recall  F1-score  Support
No pathological myopia  0.93       0.99    0.96      68
Pathological myopia     0.98       0.91    0.95      58
Accuracy                                   0.95      126
Macro average           0.96       0.95    0.95      126
Weighted average        0.95       0.95    0.95      126

5 Comparative Study Prior studies have used a single dataset to detect pathological myopia. Such a dataset contains fewer images (all from the same source) for some of the classes, which can lead to overfitting and result in a less robust model in the real world.


We propose a model trained on two datasets from different sources, which makes the model more robust. Normal and pathological images are collected from both datasets. Since image quality can differ under different real-world imaging conditions, using multiple datasets helps the model address this issue. Our ensemble model achieved an accuracy of 95.23% for detecting pathological myopia in ocular images.

6 Conclusion and Future Scope The paper presents a deep learning approach for the automatic detection of pathological myopia (an eye disease). Three models were built by applying transfer learning to the Xception, InceptionV3 and DenseNet201 models; these models were trained individually and then combined using stacked ensembling. Additionally, two dense layers are added to improve the combined accuracy. The ensemble model is then trained on the combined pathological myopia dataset, and it achieves an accuracy of 95.23%. The model can be extended to classify more diseases, and classification accuracy can be improved by more extensive training. More ensemble learning techniques [21] can be used to improve the accuracy and robustness of the model, and more images of different classes can be added to help the model generalize better.

References
1. Myopia and pathological myopia, Feb 2021
2. The low vision centers of Indiana. http://www.eyeassociates.com/pathological-myopia
3. Pathological myopia
4. Devda J, Eswari R (2019) Pathological myopia image analysis using deep learning. Procedia Comput Sci 165:239–244
5. Rauf N, Gilani SO, Waris A (2021) Automatic detection of pathological myopia using machine learning. Sci Rep 11(1):1–9
6. Tan NM, Liu J, Wong DWK, Lim JH, Zhang Z, Lu S, Li H, Saw SM, Tong L, Wong TY (2009) Automatic detection of pathological myopia using variational level set. In: 2009 Annual international conference of the IEEE engineering in medicine and biology society. IEEE, pp 3609–3612
7. Yang Y, Li R, Lin D, Zhang X, Li W, Wang J, Guo C, Li J, Chen C, Zhu Y et al (2020) Automatic identification of myopia based on ocular appearance images using deep learning. Ann Transl Med 8(11)
8. Baid U, Baheti B, Dutande P, Talbar S (2019) Detection of pathological myopia and optic disc segmentation with deep convolutional neural networks. In: TENCON 2019—2019 IEEE region 10 conference (TENCON). IEEE, pp 1345–1350
9. Kalyanasundaram A, Prabhakaran S, Briskilal J, Senthil Kumar D (2020) Detection of pathological myopia using convolutional neural network. Int J Psychosoc Rehabil 24(05)


10. Smaida M, Yaroshchak S (2020) Bagging of convolutional neural networks for diagnostic of eye diseases. In: COLINS, pp 715–729
11. Larxel. Ocular disease recognition, Sep 2020
12. Linchundan. 1000 fundus images with 39 categories, June 2019
13. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258
14. Srinivasan K, Garg L, Datta D, Alaboudi AA, Jhanjhi NZ, Agarwal R, Thomas AG (2021) Performance comparison of deep CNN models for detecting driver’s distraction. CMC-Comput Mater Continua 68(3):4109–4124
15. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
16. Khanna M (2020) Paper review: DenseNet—densely connected convolutional networks, Sep 2020
17. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
18. Mahdianpari M, Salehi B, Rezaee M, Mohammadimanesh F, Zhang Y (2018) Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery. Remote Sens 10(7):1119
19. Bühlmann P (2012) Bagging, boosting and ensemble methods. In: Handbook of computational statistics. Springer, Berlin, pp 985–1022
20. Güneş F, Wolfinger R, Tan P-Y (2017) Stacked ensemble models for improved prediction accuracy. In: Proceedings of the static analysis symposium, pp 1–19
21. Wolpert DH (1995) Stacked generalization. Neural Networks 5(2):241–259

Chapter 14

Revolutionary Solutions for Comprehensive Assessment of COVID-19 Pandemic

Shradha Suman Panda, Dev Sourav Panda, and Rahul Dixit

1 Introduction The spread of the novel coronavirus has created an alarming situation worldwide. Coronaviruses belong to the family Coronaviridae and are zoonotic in nature. Coronavirus particles are relatively large for viruses (SARS-CoV 2 virions measure roughly 60–140 nm in diameter); hence, wearing masks can help prevent the virus from entering the airways. These viruses can change their genetic code frequently [8] as they evolve quickly, which can make vaccines less effective. Coronaviruses were first isolated in 1937, but the name "coronavirus" became prominent in 1965 owing to the crown-like appearance of the virus. Known prototype coronaviruses that infect humans include alpha coronavirus 229E, beta coronavirus OC43, SARS-CoV (2003, China), MERS-CoV (2012, Saudi Arabia) [19] and the novel SARS-CoV 2 (2019). The novel coronavirus outbreak became apparent on 31 December 2019 in Wuhan city, Hubei Province, China [3, 4]. This new strain is responsible for the novel disease named COrona VIrus Disease 2019 (COVID-19), as officially announced by WHO [1] on 11 February 2020. On 11 March 2020, WHO officially declared the novel coronavirus outbreak a pandemic. It has been observed that when an infected person sneezes or coughs, tiny liquid droplets sprayed from their nose or mouth can land in the nasal passage, mouth or eyes of a nearby person, thereby infecting that susceptible person [2]. These viruses also have a large impact on wildlife and can affect humans through animal-to-human transmission. Although these viruses can infect people of all ages, individuals with a weak immune response or pre-existing medical conditions such as asthma, diabetes or cardiovascular disease are particularly vulnerable, and the disease can be life threatening. According to statistical data [4, 5], the number of infected patients has been increasing exponentially, mostly due to community transmission following a single introduction of an infected individual into the human population. The pandemic has wreaked havoc across the globe and compelled nations to take the necessary steps to fight this deadly disease. The rest of this paper is structured as follows: Sect. 2 describes the harsh effects of SARS-CoV 2 on the human respiratory system. Section 3 discusses various methods implemented for COVID-19 treatment. Section 4 proposes the SWIFT technique for procuring lung images of ARDS patients and further detecting COVID-19. Finally, Sect. 5 concludes the paper and outlines directions for future work.

S. S. Panda (B) Indian Institute of Technology Jodhpur, All India Institute of Medical Science Jodhpur, Joint Program, Jodhpur, Rajasthan 342037, India e-mail: [email protected]
D. S. Panda · R. Dixit Indian Institute of Information Technology Pune, Pune, Maharashtra 411048, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_14

2 Harsh Effects of SARS-CoV 2 on Respiratory System The human respiratory system, also referred to as the respiratory apparatus or ventilatory system and illustrated in Fig. 1, consists of a series of organs and structures responsible for gas exchange (of oxygen and carbon dioxide). Gas exchange is mainly performed by the primary organ of the respiratory system, i.e. the lungs. The delicate tissues of the lungs are protected by the thoracic cage. The upper respiratory tract contains organs such as the nose and nasal passages, the pharynx and the portion of the larynx lying above the vocal cords. The organs of the lower airways include the trachea, the portion of the larynx below the vocal cords, large tube-like structures (bronchi) and small tube-like structures (bronchioles). Each lung is divided into separate sections referred to as "lobes". When breathing starts, atmospheric air is drawn in through the trachea (also called the "windpipe"), passes through the bronchi and then the bronchioles, and finally reaches tiny, balloon-shaped air sacs, arranged in clusters, called alveoli [23]. The alveoli are the main site of gas exchange between the lungs and the bloodstream: they inflate like balloons when air is inhaled and deflate when an individual exhales. The alveoli are surrounded by small blood vessels called "capillaries". Oxygen from the air passes into the capillaries, and carbon dioxide from the body passes out of the capillaries into the alveoli. Figure 1 gives a clear view of the human respiratory system and its function. Among the lung injuries seen in COVID-19 patients, pneumonia can be considered one of the most critical. The coronavirus invades two types of cells in the lungs, namely ciliated cells and mucus (or goblet) cells, and ciliated cells are the preferred host cells for SARS-CoV 2. In a healthy body, the hair-like cilia lining the airways constantly push germs and mucus out of the airways.

Fig. 1 Human respiratory system

Fig. 2 a Broncho-pneumonia and b lobar pneumonia
Cells of the immune system then attack these germs and viruses. As the immune system attacks the multiplying viruses, the bronchioles and alveoli become inflamed. This inflammation can cause the alveoli to fill with fluid, making it difficult for the body to obtain the oxygen it requires and damaging the lungs. When one lobe of the lungs is affected, lobar pneumonia [16] can develop, and when several regions of the lungs are affected, bronchopneumonia can develop. Bronchopneumonia often leads to lobar pneumonia. Lobar pneumonia is characterised by inflammatory exudate that can affect a large area of a lung lobe. Figure 2 shows the two types of pneumonia, bronchopneumonia and lobar pneumonia (both serious forms of lung injury), and their severe impact on the lungs of COVID-19 patients as seen in chest imaging. Pneumonia can cause chest pain, cough, fever, shortness of breath, loss of appetite and fatigue. When breathing becomes difficult, respiratory failure can occur, reflecting lung injuries such as ARDS, and a machine called a "ventilator" may be needed to help the patient breathe. ARDS is often associated with pneumonia, one of the critical symptoms seen in COVID-19 patients, and can damage the lungs severely [24]. Figure 3 shows chest images of a 37-year-old man with confirmed coronavirus disease; the CT images depict bilateral multi-focal ground-glass opacities and consolidated lesions.

Fig. 3 Severe damage in lungs of COVID-19 patient

Developing a vaccine against this deadly coronavirus is a challenging task. A vaccine would expose the body to a form of the virus that is too weak to cause infection but strong enough to stimulate an immune response. Within a few weeks, cells of the immune system would produce markers called antibodies, which are specific to the coronavirus or its spike proteins. The antibodies then bind to the virus and prevent it from attaching to cells, and the immune system responds to signals from the antibodies by consuming and destroying the clumps of viruses. If the individual later catches the real virus, the body would recognise and destroy it with the help of the vaccine. Thus, the development of vaccines amid a pandemic has become a global concern.

Coronavirus can cause serious forms of lung disease such as:
1. Acute respiratory distress syndrome (ARDS) [24, 25].
2. Diffuse alveolar damage (DAD) [24, 25].

Compared with the chest X-ray of a normal person (Fig. 4a), the chest X-ray of a person with ARDS (Fig. 4b) shows clear abnormalities. The ARDS patient's imaging demonstrates bilateral lung opacities, i.e. both lungs are abnormal and appear opaque on the chest X-ray. Acute respiratory distress syndrome (ARDS) [24, 25] is a serious lung disease caused by severe inflammation of the lungs. The survival rate of patients with ARDS is very low. Ageing, critical illness, sepsis and pneumonia are among the factors to be considered. Some patients who survive ARDS recover completely, while others suffer severe lung damage. Very severe hypoxia, or shortness of breath, is one of its major symptoms. ARDS occurs when fluid leaks from the blood vessels and fills the tiny air sacs (alveoli) in the lungs. This fluid keeps the lungs from filling with enough air, so less oxygen reaches the bloodstream, depriving the organs of the oxygen they need to function. In a healthy body, a protective membrane keeps fluids within the vessels; severe alveolar epithelial injury damages this membrane, and the resulting fluid leakage leads to ARDS. COVID-19 patients have been found to be at greater risk of ARDS. In patients with ARDS, CT imaging plays a vital role in obtaining lung imaging patterns for assessing lung morphology. From Fig. 4, it can be observed that in the normal chest X-ray both lung fields appear black, whereas in the ARDS patient they appear white. The characteristic chest X-ray finding in ARDS is a "whiteout" of the lungs [26]: the whiter a region, the more opaque it is and the more abnormal the lung. At the bottom of the lungs in the ARDS patient's X-ray, there is a complete whiteout. In a normal chest X-ray, the ribs can be clearly identified along with the black lung fields, the visible white portions are blood vessels merging into the lungs, and the heart's structure can be seen in the middle.

Fig. 4 Chest X-ray of a Normal person, and b ARDS patient


Fig. 5 a Blood vessel lining in ARDS patient, and b blood vessel lining in normal person

From this comparison, it can be seen that COVID-19 patients with ARDS have severely affected lungs. ARDS is characterised by diffuse alveolar damage (DAD), a potentially severe form of lung damage that diffusely damages the capillary walls along with the cells lining the alveoli, referred to as pneumocytes. According to studies, the SARS-CoV 2 virus can enter the alveoli and damage them, resulting in the development of ARDS. Debris that accumulates from this damage lines the capillary walls. In Fig. 5, the pink linear structures are hyaline membranes, the hallmark of diffuse alveolar damage. Parts of the hyaline membranes are formed by plasma proteins that leak out of these capillaries, a leakage caused by damage to the epithelial and endothelial cells. The alveolar walls become thicker in an ARDS patient (Fig. 5a), whereas in a normal person (Fig. 5b) they are extremely thin, which facilitates gas exchange, i.e. the uptake of oxygen. As the walls thicken, the patient starts developing symptoms such as shortness of breath with severe respiratory illness, which may lead to death. Patients who are more vulnerable to infections are at greater risk. When conventional treatment of ARDS fails, extracorporeal membrane oxygenation (ECMO) can be used. ECMO uses heart–lung bypass technology to perform gas exchange outside the body: blood is pumped out of the body, the machine adds oxygen to it, and the blood is then returned to the body. It is similar to a portable version of the heart–lung machine used in cardiac surgery. There are two types of ECMO: veno-arterial (VA) ECMO, which supports both the heart and the lungs, and veno-venous (VV) ECMO, which provides oxygenation to support the lungs. When a person's heart or lungs are not functioning properly, a mechanical device is needed to perform those functions. During this pandemic emergency, the FDA has approved the use of ECMO support machines for COVID-19 patients with severe lung damage.

3 Sophisticated Methods of Implementation for COVID-19 Treatment As of now, no specific antiviral treatment has been officially approved for controlling COVID-19, but many countries are following some of the diagnostic and treatment approaches described below on a trial basis:

3.1 Nucleic Acid Amplification Tests (NAAT) NAATs [20] are the most commonly used coronavirus tests, as recommended by the WHO. They detect the coronavirus responsible for COVID-19. Samples from the upper respiratory tract, i.e. from the throat and behind the nose (a mixture of saliva and mucus), are collected using the nasopharyngeal swab technique. The samples are then tested with a reverse transcriptase-polymerase chain reaction (RT-PCR) assay, which identifies genetic material from specific pathogens. The reports can be obtained within 8–10 days.

3.2 Use of Hydroxychloroquine Drugs (HCQ) As the coronavirus continues to rampage across the world, taking millions of lives and severely affecting mankind, trials evaluating the efficacy of the drug hydroxychloroquine for COVID-19 treatment have started. HCQ is an anti-malarial drug that is also considered an anti-inflammatory drug, since it is used to treat rheumatoid arthritis and lupus; India is the leading manufacturer of this drug. It has gathered global interest ever since the most powerful countries started demanding it in massive quantities. The Indian Council of Medical Research (ICMR) has recommended that healthcare workers fighting the coronavirus consume these drugs only in limited quantities; otherwise, the pandemic could lead to a shortage of the drug, which could be life threatening for patients with rheumatoid arthritis and lupus. The drug may also cause side effects, including irregular heartbeats. So far, there is no clarity on the effectiveness of this drug for treating COVID-19 patients.


3.3 Clinical Trials of Convalescent Plasma Therapy Convalescent plasma is one of three immune-based options for lessening the impact of COVID-19. The therapy uses antibodies from the blood of fully recovered patients to treat patients who are critically affected by the virus. Several countries, including India, have already commenced clinical trials of this therapy as a potential treatment for COVID-19. In critically ill COVID-19 patients with ARDS, the therapy has shown signs of improvement. However, owing to side effects and complications seen in recovered patients, it is still at an experimental stage.

3.4 Reverse Transcriptase-Polymerase Chain Reaction (RT-PCR) Tests Real-time reverse transcriptase-polymerase chain reaction assays can be considered one of the most accurate laboratory methods for detecting specific genetic material from pathogens, including the coronavirus. The Chinese Center for Disease Control and Prevention (China CDC) recommended RT-PCR test kits for diagnosing COVID-19 in infected patients. After swab samples are collected from the back of the nose or from the lungs and processed in the laboratory, results can be obtained within 6–8 h. However, this test is also used on a trial basis, since RT-PCR has still shown lower-than-desired accuracy in confirming positive cases.

3.5 Use of Plaquenil Drugs According to the French company Sanofi, chloroquine drugs [17], such as the anti-malarial drug marketed under the name Plaquenil, have strong potential as a prophylactic measure against COVID-19. Guidelines for taking Plaquenil as a preventive measure against COVID-19 had already been outlined in countries such as China and South Korea. However, COVID-19-positive patients who are exposed to this drug are at a higher risk of developing secondary complications such as retinopathy, which restricts the use of these drugs in COVID-19 treatment.


3.6 Solidarity Trial of Antiviral Drug ‘Remdesivir’ According to Raman R. Gangakhedkar, Head of Epidemiology and Communicable Diseases at ICMR, the antiviral drug Remdesivir [18] can be highly effective in stopping the replication of SARS-CoV 2 and thereby reducing the spread of COVID-19. This drug was used against the Ebola virus outbreak in the year 2009. According to a study in the New England Journal of Medicine, patients who were administered this drug showed signs of improvement and no longer needed ventilator support. However, access to this drug is limited owing to complications seen in infected patients.

3.7 Thapsigargin (TG): A Novel Antiviral Drug A new plant-based antiviral drug, recently discovered by researchers, has proved effective against all the emergent variants of SARS-CoV 2, including the dominant and highly infectious Delta variant, thereby limiting the harsh impact of the newer variants. During the infection stage, TG is able to inhibit each variant of this deadly virus. Researchers believe that TG has antiviral potential as an active therapeutic agent that can treat the infection and block the growth of these emerging variants.

4 Proposed SWIFT Technique for Procuring Lungs Imaging of ARDS Patients and Further Detection of COVID-19 Magnetic resonance imaging (MRI) is a non-invasive imaging modality, useful for examining the anatomical and physiological processes of the body and for the early diagnosis of various cardiopulmonary disorders. MRI scanning [30] is a painless procedure that can last up to 90 min. It uses a strong magnetic field (about 2–3 T) to align the protons of hydrogen atoms, and high-frequency radio waves to generate images of the internal organs and soft tissues of the body. Hydrogen nuclei (protons) are mostly preferred because they are abundant in the body's water and fat and have a strong magnetic property arising from their "nuclear spin". The protons absorb energy from the radio-frequency pulses, which flips their spins. Although MRI is more expensive than CT imaging, it can provide images of higher quality. As a cross-sectional imaging technique, MRI can also help reduce the number of chest CT scans and the risk of exposure to radiation.
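The "high-frequency radio waves" mentioned above must be tuned to the frequency at which the aligned protons precess, the Larmor frequency, which is proportional to the field strength: f0 = (γ/2π)·B0, with γ/2π ≈ 42.58 MHz/T for hydrogen nuclei. The short worked example below evaluates this for the 2–3 T range quoted in the text; the constant is the standard proton value, while the function and variable names are illustrative.

```python
# Gyromagnetic ratio of the hydrogen nucleus divided by 2*pi, in MHz per tesla.
PROTON_GAMMA_BAR_MHZ_PER_T = 42.577


def larmor_frequency_mhz(b0_tesla: float) -> float:
    """Proton precession (Larmor) frequency f0 = (gamma / 2*pi) * B0, in MHz."""
    if b0_tesla <= 0:
        raise ValueError("B0 must be a positive field strength in tesla")
    return PROTON_GAMMA_BAR_MHZ_PER_T * b0_tesla


for b0 in (2.0, 3.0):  # the field range quoted in the text
    print(f"B0 = {b0:.1f} T -> f0 = {larmor_frequency_mhz(b0):.1f} MHz")
```

This gives roughly 85 MHz at 2 T and 128 MHz at 3 T, which is why a scanner's radio-frequency hardware is specific to its field strength.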


Fig. 6 Tesla proton MRI for diagnosis of pneumonia in the lungs of patient

According to a report, immunocompromised individuals are more likely to show pulmonary abnormalities such as pneumonia, which can be considered a critical factor for COVID-19 patients. Figure 6 shows pulmonary MRI [27] at 3 T for the diagnosis of pneumonia in COVID-19 patients. This technique is usually preferred for detecting viral infections in specific parts of the lungs because of its higher accuracy, and pulmonary MRI at 3 T is quite useful in immunocompromised patients with suspected viral infections. With progressive technological advances in medical science, MR imaging is considered a promising alternative to CT imaging for identifying pneumonia [28] and other lung disorders. This MRI technique can provide useful information on morphological changes in the lungs and enables better tissue characterisation. Compared with conventional MR imaging, 3D cine MRI using the SWeep Imaging with Fourier Transformation (SWIFT) technique can be highly sensitive for obtaining lung images of ARDS patients. The term "cine" refers to motion pictures, i.e. capturing images of the lungs while they are in motion. So far, this technique has been applied only to ARDS in rat lungs, using gradient-modulated SWIFT with retrospective respiratory gating. Figure 7 shows the 3D cine MRI of rat lungs with ARDS obtained using the gradient modulation technique [31]. The effect on normal and ARDS lungs has been demonstrated at different phases of the respiratory cycle, such as inhalation and exhalation, using SWIFT and gradient echo (GRE) imaging. The same procedure could also be applied to human lungs, but because MR images of ARDS patients are unavailable, its accuracy has yet to be determined.
We believe that the proposed technique can show higher accuracy than the existing techniques, since the respiratory motion of infected individuals is monitored accurately across the different respiration phases.
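Retrospective respiratory gating means that data are acquired continuously while the subject breathes freely, and each acquired readout is assigned afterwards to a respiratory state using a recorded respiratory signal; one image frame of the cine series is then reconstructed from each group of readouts. The sketch below shows one simple amplitude-and-direction binning scheme. It assumes a hypothetical respiratory trace with one sample per readout that rises during inhalation; the trace, bin count and function names are illustrative, and practical implementations may instead derive the respiratory signal from the MR data itself or bin by respiratory phase.

```python
import math
from typing import Dict, List, Sequence, Tuple

Bin = Tuple[str, int]  # (breathing direction, normalised amplitude level)


def gate_readouts(resp_signal: Sequence[float], n_levels: int) -> Dict[Bin, List[int]]:
    """Retrospectively assign each readout (one resp_signal sample per readout)
    to a bin defined by direction (inhalation/exhalation) and amplitude level."""
    if n_levels < 1:
        raise ValueError("n_levels must be positive")
    lo, hi = min(resp_signal), max(resp_signal)
    if hi <= lo:
        raise ValueError("the respiratory signal must vary to be used for gating")
    bins: Dict[Bin, List[int]] = {}
    last = len(resp_signal) - 1
    for i, s in enumerate(resp_signal):
        # compare neighbouring samples to decide whether the lungs are filling or emptying
        before = resp_signal[max(i - 1, 0)]
        after = resp_signal[min(i + 1, last)]
        direction = "inhalation" if after >= before else "exhalation"
        level = min(int((s - lo) / (hi - lo) * n_levels), n_levels - 1)
        bins.setdefault((direction, level), []).append(i)
    return bins


# Hypothetical free-breathing trace: one readout every 10 ms for 4 s,
# breathing period 0.8 s, signal rising during inhalation.
DT, PERIOD, N_READOUTS = 0.01, 0.8, 400
trace = [0.5 * (1 - math.cos(2 * math.pi * i * DT / PERIOD)) for i in range(N_READOUTS)]
bins = gate_readouts(trace, n_levels=4)
for key in sorted(bins):
    print(key, len(bins[key]), "readouts")
```

Each readout lands in exactly one bin, so the bins partition the acquisition; in a real 3D radial acquisition, the readouts in each bin would be reconstructed into one respiratory frame.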


Fig. 7 3D cine MRI of rat lung ARDS

ARDS is a serious, potentially life-threatening form of lung injury. Severe pneumonia is one of the major symptoms seen in patients suffering from it. ARDS prevents the passage of oxygen through the lungs into the bloodstream, resulting in serious complications such as reduced oxygen levels in the body. Sepsis, a life-threatening illness caused by the body's response to an infection, is a major link between pneumonia and ARDS [29]. The SWIFT technique can be used to obtain lung images of ARDS patients with MRI. SWIFT imaging [31] is an emerging and fast-growing modality. One advantage of applying this technique to human subjects is its ability to acquire images of tissues in a very short time with minimal motion artefacts. The SWIFT method enables three-dimensional assessment of lung tissue while acquiring high-quality images of the lungs. It can also be used for in-vivo studies and can be operated in a 3D radial acquisition mode. The method uses a frequency-swept excitation pulse, and the response of the spin system to this excitation is acquired. To extract the signals of interest, the spin system's response must be correlated with the excitation pulse function. The proton density of the lungs can be obtained with this highly sensitive SWIFT technique.
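The correlation step can be illustrated with a toy one-dimensional model. Assuming the spin system behaves as a linear time-invariant system, the recorded response r is the convolution of the excitation pulse x with the system's impulse response h (the free induction decay), so that R(f) = X(f)·H(f). Correlating r with x multiplies this by the conjugate X*(f), giving |X(f)|²·H(f); if the pulse has a flat power spectrum, dividing by the pulse energy recovers h, and a Fourier transform of h gives the spectrum. The sketch below is not a SWIFT reconstruction: it assumes a periodic acquisition window, uses a quadratic-phase chirp exp(iπk²/N) as a stand-in for the frequency-swept pulse (for even N its circular autocorrelation is zero at every non-zero lag), and simulates a made-up single-resonance FID. All names are illustrative.

```python
import cmath
import math
from typing import List, Sequence

N = 64  # samples per (periodic) acquisition window; even, so the chirp below is N-periodic


def periodic_chirp(n: int) -> List[complex]:
    """Quadratic-phase (linear frequency sweep) pulse x[k] = exp(i*pi*k^2/n).
    For even n its circular autocorrelation is n at lag 0 and 0 elsewhere."""
    return [cmath.exp(1j * math.pi * k * k / n) for k in range(n)]


def simulated_fid(n: int, freq_bin: int = 5, decay: float = 8.0) -> List[complex]:
    """Made-up impulse response (free induction decay) of a single resonance."""
    return [math.exp(-t / decay) * cmath.exp(2j * math.pi * freq_bin * t / n) for t in range(n)]


def circular_convolve(h: Sequence[complex], x: Sequence[complex]) -> List[complex]:
    """Response of the assumed linear time-invariant spin system: r = h (*) x."""
    n = len(x)
    return [sum(h[j] * x[(k - j) % n] for j in range(n)) for k in range(n)]


def correlate_with_pulse(r: Sequence[complex], x: Sequence[complex]) -> List[complex]:
    """Cross-correlate the response with the pulse and normalise by the pulse
    energy; with a flat-spectrum pulse this returns the impulse response."""
    n = len(x)
    energy = sum(abs(v) ** 2 for v in x)
    return [sum(r[k] * x[(k - m) % n].conjugate() for k in range(n)) / energy
            for m in range(n)]


def dft(signal: Sequence[complex]) -> List[complex]:
    """Naive discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


pulse = periodic_chirp(N)
fid = simulated_fid(N)
response = circular_convolve(fid, pulse)           # what the receiver would record
recovered = correlate_with_pulse(response, pulse)  # correlation step
spectrum = dft(recovered)                          # Fourier transform -> spectrum
peak = max(range(N), key=lambda k: abs(spectrum[k]))
error = max(abs(a - b) for a, b in zip(fid, recovered))
print(f"peak bin = {peak}, max reconstruction error = {error:.1e}")
```

The recovered FID matches the simulated one to rounding error, and its spectrum peaks at the simulated resonance. With a pulse whose spectrum is not flat, the correlation result would instead have to be divided by |X(f)|² in the frequency domain.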


S. S. Panda et al.

5 Conclusion and Future Extensions This chapter began with a brief introduction to the coronavirus pandemic outbreak and the clinical and biological features of COVID-19. To lessen the severe impact of this pandemic, several antiviral strategies, such as hydroxychloroquine drugs and convalescent plasma therapy, have been adopted. However, these strategies are still under clinical trials, and proper vaccines are yet to be developed. As observed, most COVID-19 patients suffer from respiratory illness due to a weak immune system. ARDS is one such respiratory illness and is often associated with pneumonia. Out of all the available diagnostic imaging modalities, we have considered the MR imaging technique for analysing lung tissue, thereby helping frontline healthcare workers minimise the effect of COVID-19. This work can be extended by acquiring more samples in the form of lung images of COVID-19 patients worldwide, which would improve the accuracy level. As there are currently no medications available for the proper treatment of COVID-19, further real-time research and clinical trials are needed to develop a potential vaccine against this disease. Our main responsibility is to reduce the impact of this disease by flattening the curve, which can be achieved by developing appropriate vaccines amid the pandemic. In addition, by following proper norms and protocols, we can mitigate the impact of COVID-19 to a great extent.

References
1. Jebril N (2020) World Health Organization declared a pandemic public health menace: a systematic review of the coronavirus disease 2019 “COVID-19”, up to 26th March 2020. Available at SSRN 3566298. 1 Apr 2020
2. Singhal T (2020) A review of coronavirus disease-2019 (COVID-19). The Indian J Pediatrics 1–6
3. Zhang L, Liu Y (2020) Potential interventions for novel coronavirus in China: a systemic review. J Med Virol
4. Wu Z, McGoogan JM (2020) Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention. JAMA 323(13):1239–42
5. Holshue ML, DeBolt C, Lindquist S, Lofy KH, Wiesman J, Bruce H, Spitters C, Ericson K, Wilkerson S, Tural A, Diaz G (2020) First case of 2019 novel coronavirus in the United States. New England J Med
6. Nabi G, Siddique R, Ali A, Khan S (2020) Preventing bat-born viral outbreaks in future using ecological interventions. Environ Res
7. Hoffmann M, Kleine-Weber H, Krüger N, Mueller MA, Drosten C, Pöhlmann S (2020) The novel coronavirus 2019 (2019-nCoV) uses the SARS-coronavirus receptor ACE2 and the cellular protease TMPRSS2 for entry into target cells. BioRxiv
8. Chan JF, Kok KH, Zhu Z, Chu H, To KK, Yuan S, Yuen KY (2020) Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerging Microbes Infections 9(1):221–36
9. Guo YR, Cao QD, Hong ZS, Tan YY, Chen SD, Jin HJ, Tan KS, Wang DY, Yan Y (2020) The origin, transmission and clinical therapies on coronavirus disease 2019 (COVID-19) outbreak: an update on the status. Military Med Res 7(1):1
10. Cui J, Li F, Shi ZL (2019) Origin and evolution of pathogenic coronaviruses. Nat Rev Microbiol 17(3):181–92
11. Tung HY (2020) COVID-19 and SARS-COV-2 infection and virulence: hypothesis I. ScienceOpen Preprints
12. Zhang L, Shen FM, Chen F, Lin Z (2020) Origin and evolution of the 2019 novel coronavirus. Clin Infectious Diseases
13. Lei J, Kusov Y, Hilgenfeld R (2018) Nsp3 of coronaviruses: structures and functions of a large multi-domain protein. Antiviral Res 1(149):58–74
14. Letko M, Marzi A, Munster V (2020) Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses. Nat Microbiol 5(4):562–9
15. Wang Y, Hu J, Li Y, Wang H, Li Z, Tang J, Hu L, Zhou X, He R, Ye L, Yin Z (2019) The transcription factor TCF1 preserves the effector function of exhausted CD8 T cells during chronic viral infection. Front Immunol 10:169
16. Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, Zhang L, Fan G, Xu J, Gu X, Cheng Z (2020) Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet 395(10223):497–506
17. Devaux CA, Rolain JM, Colson P, Raoult D (2020) New insights on the antiviral effects of chloroquine against coronavirus: what to expect for COVID-19? Int J Antimicrobial Agents 12:105938
18. de Wit E, Feldmann F, Cronin J, Jordan R, Okumura A, Thomas T, Scott D, Cihlar T, Feldmann H (2020) Prophylactic and therapeutic remdesivir (GS-5734) treatment in the rhesus macaque model of MERS-CoV infection. Proc Natl Acad Sci 117(12):6771–6776
19. Coleman CM, Sisk JM, Mingo RM, Nelson EA, White JM, Frieman MB (2016) Abelson kinase inhibitors are potent inhibitors of severe acute respiratory syndrome coronavirus and middle east respiratory syndrome coronavirus fusion. J Virol 90(19):8924–8933
20. Guan CS, Lv ZB, Yan S, Du YN, Chen H, Wei LG, Xie RM, Chen BD (2020) Imaging features of coronavirus disease 2019 (COVID-19): evaluation on thin-section CT. Acad Radiol
21. de Wit E, van Doremalen N, Falzarano D, Munster VJ (2016) SARS and MERS: recent insights into emerging coronaviruses. Nat Rev Microbiol 14(8):523
22. Kampf G, Todt D, Pfaender S, Steinmann E (2020) Persistence of coronaviruses on inanimate surfaces and its inactivation with biocidal agents. J Hospital Infection
23. Chen ZL, Chen YZ, Hu ZY (2014) A micromechanical model for estimating alveolar wall strain in mechanically ventilated edematous lungs. J Appl Physiol 117(6):586–592
24. Parvathaneni K, Belani S, Leung D, Newth CJ, Khemani RG (2017) Evaluating the performance of the pediatric acute lung injury consensus conference definition of acute respiratory distress syndrome. Pediatric Critical Care Med 18(1):17–25
25. Nanchal RS, Truwit JD (2018) Recent advances in understanding and treating acute respiratory distress syndrome. F1000 Research 7
26. Standiford TJ, Ward PA (2016) Therapeutic targeting of acute lung injury and acute respiratory distress syndrome. Translational Res 167(1):183–191
27. Liszewski MC, Görkem S, Sodhi KS, Lee EY (2017) Lung magnetic resonance imaging for pneumonia in children. Pediatric Radiol 47(11):1420–1430
28. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, Zhao X, Huang B, Shi W, Lu R, Niu P (2020) A novel coronavirus from patients with pneumonia in China, 2019. New England J Med
29. Yin Y, Wunderink RG (2018) MERS, SARS and other coronaviruses as causes of pneumonia. Respirology 23(2):130–137
30. Bellani G, Rouby JJ, Constantin JM, Pesenti A (2017) Looking closer at acute respiratory distress syndrome: the role of advanced imaging techniques. Curr Opinion Critical Care 23(1):30–37
31. Thet LA, Parra SC, Shelburne JD (1986) Sequential changes in lung morphology during the repair of acute oxygen-induced lung injury in adult rats. Experimental Lung Res 11(3):209–228

Chapter 15

Application of Complex Network Principles to Identify Key Stations in Indian Railway Network
Ishu Garg, Ujjawal Soni, Sanchit Agrawal, and Anupam Shukla

1 Introduction For most people in India, rail is one of the most important modes of transport. An estimated 22.15 million passengers travel on the Indian Railways network on a daily basis. In terms of network size, Indian Railways is the fourth-largest railway network in the world (after the USA, Russia and China), covering 7325 stations and a total route length of 67,368 km. Efficient transportation networks are important building blocks for the economic growth of a country. Indian Railways is a pivotal mode of transport for the upper-middle and lower-income segments of travellers. It is the most economical way of transporting goods and people over land and hence significantly affects the ease and efficiency of trade and other aspects of the national economy. It is therefore very important to identify the key points within the network, both to prevent network failure and to make the network more robust and efficient. Successive administrations of the Government of India have worked on improving the railways. We apply complex network analysis to data provided by the Government of India to analyse the Indian Railway network, and we identify important railway stations based on different metrics. We hope that our study can support better planning policies and new development projects, judicious allocation of resources and effective railway budget planning, backed by robust complex network-based models. Complex network analysis has been used to study transportation networks, including road networks [2, 3], airways [4, 5] and railways [6–10]. Models based on complex networks were used to identify the impact of failures in railway networks in. In most earlier studies, models are simplified by representing the network as an undirected graph, in which two stations (nodes) are linked iff they have a direct connection (undirected edge) between them. Many network properties depend heavily on how the network is modelled. A weighted network flow model characterizing train flow and passenger flow in the Beijing subway network was given in [11]. Complex weighted network analysis has also been applied to rail and bus networks in Singapore [12].

I. Garg (B) · U. Soni · S. Agrawal Indian Institute of Technology, Madras, India
A. Shukla Indian Institute of Information Technology, Pune, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_15

I. Garg et al.

2 Paper Summary In this paper, we first describe how we formulate the Indian Railway network as a graph. We then present the results of our analysis of this network, which characterize the Indian Railway network and the importance of its different railway stations according to several metrics defined over complex networks.

3 Graph Construction Data has been taken from [1]. We follow two distinct approaches for the network analysis. We first model the available data as a simple graph and then as a hypergraph and glean useful insights using both representations.

3.1 Simple Graph Model In this model, each node represents a railway station, and there is an edge from station A to station B iff B can be reached from A directly via some train without any intermediate stops. This leads to a directed graph. We observe that since most trains also run on the reverse route, there is a reverse edge in most cases. However, we choose not to collapse the graph to an undirected one, since this observation is only approximate, and need not hold exactly for all cases. Two stations can be connected by multiple edges; we choose to represent this information using edge weights to denote multiplicity of edges.
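A minimal Python sketch of the simple graph model, assuming the timetable has already been parsed into a mapping from a train identifier to its ordered list of stops (this input format and the name `build_station_graph` are hypothetical, not taken from the dataset [1]): each pair of consecutive stops adds one to the weight of the directed edge between them.

```python
from collections import defaultdict


def build_station_graph(train_routes):
    """Weighted directed station graph from ordered train routes.

    train_routes maps a (hypothetical) train identifier to its ordered list
    of stops. An edge A -> B exists iff some train stops at B immediately
    after A; its weight counts how many trains provide that direct hop.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for stops in train_routes.values():
        for a, b in zip(stops, stops[1:]):
            graph[a][b] += 1
    # Convert to plain dicts so later lookups do not create empty entries.
    return {a: dict(nbrs) for a, nbrs in graph.items()}


if __name__ == "__main__":
    routes = {
        "T1": ["A", "B", "C"],
        "T2": ["C", "B", "A"],
        "T3": ["A", "B"],
    }
    print(build_station_graph(routes))
```

Keeping the graph directed means a reverse edge appears only when some train actually runs in that direction, matching the modelling choice above. A station at which trains only terminate appears as a target but not as a key.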

15 Application of Complex Network Principles …

3.2 Hypergraph Model In this model, each node represents a railway station, and each hyperedge, connecting a subset of nodes, represents a train. This hypergraph can also be represented as a simple bipartite graph, where trains form one vertex partition and stations form the other, with an edge between a train and a station iff that train visits the station.
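In the bipartite representation, a station's hypergraph degree is simply the number of distinct trains that visit it. A small sketch, assuming the same hypothetical input format of train identifiers mapped to ordered stop lists (the function names are illustrative):

```python
from collections import defaultdict


def station_train_incidence(train_routes):
    """Bipartite view of the hypergraph: station -> set of trains visiting it."""
    visits = defaultdict(set)
    for train, stops in train_routes.items():
        for station in stops:
            visits[station].add(train)  # a set ignores repeated visits
    return dict(visits)


def average_hypergraph_degree(train_routes):
    """Mean number of distinct trains per station."""
    visits = station_train_incidence(train_routes)
    return sum(len(trains) for trains in visits.values()) / len(visits)


if __name__ == "__main__":
    routes = {"T1": ["A", "B", "C"], "T2": ["C", "D"]}
    print(station_train_incidence(routes))
    print(average_hypergraph_degree(routes))
```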

4 Simple Graph Analysis 4.1 Global Network Information From Table 1, we observe that although there are 25 strongly connected components, 99.26% of stations belong to a single strongly connected component. This indicates the wide reach of the Indian Railway network, which is the fourth-largest in the world. Moreover, the network diameter of 18 shows that any station can be reached from any other station connected to it using at most 17 intermediate stops, while the average path length of 4.853 means that the average number of intermediate stops between two stations is only 3.853.
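The relationship between path length and intermediate stops used above (a shortest path of L hops passes through L - 1 intermediate stations) can be checked with a breadth-first search. The sketch below ignores edge weights and, as an assumption, skips ordered pairs with no connecting path; the paper does not state how its tool handled such pairs. The graph layout `{station: {next_station: weight}}` and the function names are hypothetical.

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first hop counts from source in a graph {u: {v: weight}}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, {}):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_length_summary(graph):
    """Diameter and average shortest-path length over reachable ordered pairs.

    Each hop counts as 1 (weights ignored), so a shortest path of length L
    passes through L - 1 intermediate stations.
    """
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    lengths = []
    for s in nodes:
        lengths.extend(d for t, d in hop_distances(graph, s).items() if t != s)
    if not lengths:
        raise ValueError("graph has no connected pairs")
    diameter = max(lengths)
    average = sum(lengths) / len(lengths)
    return {
        "diameter": diameter,
        "average_path_length": average,
        "max_intermediate_stops": diameter - 1,
        "average_intermediate_stops": average - 1,
    }


if __name__ == "__main__":
    print(path_length_summary({"A": {"B": 1}, "B": {"C": 1}}))
```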

4.2 Visualization We visualize the simple graph in Fig. 1. For clarity, we omit the edges. Various features of the graph can be interpreted as follows:
– Node size: nodes are sized according to their importance as determined by PageRank; larger nodes are more important.
– Node colour: nodes belonging to the same community have the same colour. Community detection was performed automatically using [16].
– Node labels: node labels are the station names.
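The Louvain method [16] repeatedly moves nodes between communities to increase modularity. The sketch below does not implement Louvain; it only evaluates the modularity of a given partition, treating the station graph as undirected and weighted (an assumption, since the paper does not say how edge direction was handled). The function name and inputs are illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight); community: node -> community label.
    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where m is the
    total edge weight, L_c the weight inside c and d_c the total degree of c.
    """
    edges = list(edges)
    m = sum(w for _, _, w in edges)
    internal = defaultdict(float)
    degree = defaultdict(float)
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    # Two triangles joined by nothing: the natural split has Q = 0.5.
    tri = [("A", "B", 1), ("B", "C", 1), ("C", "A", 1),
           ("D", "E", 1), ("E", "F", 1), ("F", "D", 1)]
    split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
    print(modularity(tri, split))
```

Louvain-style methods search for a partition with high Q; geographically coherent communities like those in Table 2 are what one would expect if trains mostly connect nearby stations.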

Table 1 Global network metrics
Metric                                   Score
Total no. of stations                    4337
Total no. of trains                      2810
Total no. of edges                       26436
Average degree                           6.095
Average weighted degree                  15.208
No. of weakly connected components       2
No. of strongly connected components     25
Network density                          0.002
Network diameter                         18
Average path length                      4.853
Average clustering coefficient           0.231


Fig. 1 Global network community graph

By inspection of Fig. 1, we observe that the communities detected naturally correspond to geographical divisions of the country. The mapping between communities and geographical areas is shown below in Table 2.

Table 2 Communities detected
Community detected    Major stations
North                 Delhi, Kanpur, Lucknow, Allahabad
North-West            Vadodara, Jaipur, Kota, Surat
Central               Nagpur, Bhopal, Itarsi, Raipur
South-East            Vijayawada, Nellore, Warangal, Kharagpur
South                 Katpadi, Thrisur, Arakkonam, Tiruchirappalli
West                  Kalyan, Pune, Nasik, Daund
East                  Darjeeling, Kurseong, Tung, Mahanadi
North Maharashtra     Dhulghat, Ratnapur, Wan road, Akot

4.3 Important Stations Stations with a high weighted in-degree have many trains arriving at them, while those with a high weighted out-degree have many trains departing from them; these stations carry the most incoming and outgoing “train traffic”, respectively. As expected, the list of the busiest Indian Railway stations [18] includes all the stations shown in Table 3 except Surat. The unweighted in-degree and out-degree of a station are the numbers of distinct stations from which trains arrive directly and to which trains depart directly, respectively. We therefore expect the stations in Table 4 to be those where many routes converge or diverge; indeed, according to [19], most of the stations in Table 4 are junctions or terminuses. Nodes with high betweenness centrality lie on many shortest paths. From Table 5, we observe that should a station such as Nagpur or Pune be blocked, many trains would have to be rerouted along longer paths. On the other hand, stations with high eigenvector centrality or PageRank are important because they are linked to other important stations; if these stations are blocked, many important routes are adversely affected. In June 2015, the signalling system at Itarsi (which ranks highly on both measures) was disrupted by a fire, and the incident made the news because more than 100 important trains were affected [20].

Table 3 Stations with maximum weighted in- and out-degrees
Station       d_in    Station       d_out
VIJAYAWADA    302     VIJAYAWADA    293
VADODARA      296     VADODARA      281
KANPUR        271     KANPUR        261
ITARSI        259     SURAT         252
KALYAN        254     AHMEDABAD     243

Table 4 Stations with maximum (unweighted) in- and out-degrees
Station       d_in    Station          d_out
KANPUR        86      HOWRAH           106
VIJAYAWADA    73      NEW DELHI        98
LUCKNOW       72      VARANASI         78
ITARSI        69      LOKMANYATILAK    77
KALYAN        68      H NIZAMUDDIN     76
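The degree and betweenness measures discussed in Section 4.3 can be sketched as follows for a graph stored as `{station: {next_station: n_trains}}` (a hypothetical layout). Betweenness uses Brandes' algorithm on hop counts, ignoring weights; dividing by (n - 1)(n - 2) is an assumed normalisation that may not match the tool used to produce Table 5.

```python
from collections import defaultdict, deque


def degree_measures(graph):
    """Weighted and unweighted in/out degrees of a graph {u: {v: weight}}."""
    stats = defaultdict(lambda: {"w_in": 0, "w_out": 0, "in": 0, "out": 0})
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            stats[u]["w_out"] += w  # trains departing u towards v
            stats[u]["out"] += 1    # distinct next stations
            stats[v]["w_in"] += w   # trains arriving at v from u
            stats[v]["in"] += 1     # distinct previous stations
    return dict(stats)


def betweenness(graph, normalized=True):
    """Brandes' algorithm on the unweighted directed graph (hops only)."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        order = []                       # nodes in non-decreasing distance from s
        preds = {v: [] for v in nodes}   # predecessors on shortest s-v paths
        sigma = dict.fromkeys(nodes, 0)  # number of shortest s-v paths
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, {}):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):        # accumulate pair dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    if normalized and n > 2:
        scale = 1.0 / ((n - 1) * (n - 2))
        bc = {v: c * scale for v, c in bc.items()}
    return bc


if __name__ == "__main__":
    g = {"A": {"B": 2}, "B": {"C": 1, "A": 1}, "C": {"B": 1}}
    print(degree_measures(g))
    print(betweenness(g))
```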


Table 5 Stations with maximum importance metrics
Betweenness centrality     PageRank                Eigenvector centrality
Station      Score         Station      Score      Station       Score
NAGPUR       0.035         LUCKNOW      0.0021     KANPUR        1.0
PUNE         0.033         KANPUR       0.0021     NAGPUR        0.886
KANPUR       0.032         KALYAN       0.0019     LUCKNOW       0.862
LUCKNOW      0.029         GHAZIABAD    0.0019     ITARSI        0.840
JABALPUR     0.028         ITARSI       0.0018     VIJAYAWADA    0.793

5 Hypergraph Analysis From Table 6, we see that, on average, a station is visited by 15.911 distinct trains. Moreover, the network diameter of 9 means that one can travel between any two connected stations using at most nine trains, i.e. at most eight train changes, while the average path length of 2.593 means that a passenger needs to change trains 1.593 times on average to travel between two randomly chosen stations. The network radius is 6, which indicates that there is at least one station from which every other station can be reached using at most six trains (five changes). In fact, 2459 stations, more than half of all nodes, attain this minimum eccentricity. This also partly explains the low average path length, since the eccentricity itself is low for most nodes.
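Counting trains rather than stations turns the hypergraph distance into a breadth-first search over stations and trains. The sketch below assumes the same hypothetical train-to-stops mapping, treats each train as an undirected hyperedge (boarding at any stop reaches every other stop, ignoring direction of travel), and reads a journey on k trains as k - 1 changes, which is how the figures above are interpreted. It also assumes the stations form one connected component; otherwise the eccentricities only cover reachable stations.

```python
from collections import defaultdict, deque


def trains_needed(train_routes, source):
    """Minimum number of trains to ride from source to each reachable station."""
    trains_at = defaultdict(set)
    for train, stops in train_routes.items():
        for station in stops:
            trains_at[station].add(train)
    dist = {source: 0}
    used = set()
    queue = deque([source])
    while queue:
        station = queue.popleft()
        for train in trains_at[station]:
            if train in used:
                continue
            # BFS visits stations in non-decreasing distance, so the first
            # station from which a train is boarded is a closest one.
            used.add(train)
            for nxt in train_routes[train]:
                if nxt not in dist:
                    dist[nxt] = dist[station] + 1
                    queue.append(nxt)
    return dist


def hypergraph_summary(train_routes):
    """Radius, diameter and minimum-eccentricity stations, counted in trains."""
    stations = {s for stops in train_routes.values() for s in stops}
    ecc = {s: max(trains_needed(train_routes, s).values()) for s in stations}
    radius, diameter = min(ecc.values()), max(ecc.values())
    return {
        "radius": radius,
        "diameter": diameter,
        "max_changes": diameter - 1,  # a journey on k trains needs k - 1 changes
        "central_stations": sorted(s for s, e in ecc.items() if e == radius),
    }


if __name__ == "__main__":
    routes = {"T1": ["A", "B", "C"], "T2": ["C", "D"], "T3": ["D", "E"]}
    print(trains_needed(routes, "A"))
    print(hypergraph_summary(routes))
```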

6 Conclusion We observe that the network has two weakly connected and 25 strongly connected components. This indicates that some stations in the network cannot reach certain other stations. Although the average number of intermediate stops between two stations is only 3.853, the network diameter of 18 indicates that there exist two stations whose shortest connecting path includes 17 intermediate stops. These statistics can be used to further improve connectivity within the network.
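The component counts discussed above can be reproduced on a small example. The sketch below counts weakly connected components with union-find and strongly connected components with Kosaraju's algorithm (written iteratively to avoid recursion limits on large graphs); the graph layout is the same hypothetical `{station: {next_station: weight}}` dictionary, and `largest_scc_share` gives the kind of figure quoted as 99.26% in Section 4.1.

```python
from collections import defaultdict


def all_nodes(graph):
    """Every station that appears as a source or as a target."""
    return set(graph) | {v for nbrs in graph.values() for v in nbrs}


def weakly_connected_count(graph):
    """Number of components when edge direction is ignored (union-find)."""
    parent = {v: v for v in all_nodes(graph)}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for u, nbrs in graph.items():
        for v in nbrs:
            parent[find(u)] = find(v)
    return len({find(v) for v in parent})


def strongly_connected_components(graph):
    """Kosaraju's algorithm with iterative DFS; returns a list of node sets."""
    nodes = all_nodes(graph)
    reverse = defaultdict(list)
    for u, nbrs in graph.items():
        for v in nbrs:
            reverse[v].append(u)

    # Pass 1: DFS on the original graph, recording nodes by finishing time.
    finished, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph.get(root, {})))]
        while stack:
            node, neighbours = stack[-1]
            nxt = next((v for v in neighbours if v not in seen), None)
            if nxt is None:
                stack.pop()
                finished.append(node)
            else:
                seen.add(nxt)
                stack.append((nxt, iter(graph.get(nxt, {}))))

    # Pass 2: DFS on the reversed graph in decreasing finishing time.
    components, assigned = [], set()
    for root in reversed(finished):
        if root in assigned:
            continue
        component, stack = set(), [root]
        assigned.add(root)
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


def largest_scc_share(graph):
    """Fraction of stations in the largest strongly connected component."""
    sccs = strongly_connected_components(graph)
    return max(len(c) for c in sccs) / len(all_nodes(graph))
```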

Table 6 Global hypergraph information
Metric                                   Score
Total no. of stations                    4337
Total no. of trains                      2810
No. of weakly connected components       2
Average degree                           15.911
Network diameter                         9
Network radius                           6
Average path length                      2.593

Table 7 Definition of metrics
Metric                                                  Formula
Shortest path length between u, v                       $\delta(u, v)$
No. of shortest paths between u, v going through s      $\sigma_{uv}(s)$
Network radius                                          $\min_u \max_v \delta(u, v)$
Network diameter                                        $\max_u \max_v \delta(u, v)$
Betweenness centrality of s                             $\frac{\sum_{u,v} \sigma_{uv}(s)}{\sum_{u,v} \sum_{t} \sigma_{uv}(t)}$
Eigenvector centrality                                  $A x = \lambda_1 x$
PageRank                                                $x = D (D - \alpha A)^{-1} \mathbf{1}$

We find that our network analysis explains some of the phenomena observed and reported in the press: disruptions to the normal functioning of these stations make headlines in the news, which further underlines the importance of the key stations identified in this paper.

7 Future Work Our analysis identifies key stations with respect to the robustness of the train network but does not include passenger traffic information, which could be added as edge weights. Our analysis focuses on the connectivity of the network; with the addition of passenger traffic, one could identify important stations with respect to their impact on the number of people travelling within the network.

8 Appendix: Metrics In Table 7, $A$ is the adjacency matrix, $D$ is a diagonal matrix with $D_{ii} = \max(d_{out}(i), 1)$, $\lambda_1$ is the leading eigenvalue of $A$, and $\alpha$ is a parameter.
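Because $D - \alpha A = (I - \alpha A D^{-1}) D$, the PageRank expression in Table 7 is equivalent to $x = (I - \alpha A D^{-1})^{-1} \mathbf{1}$, i.e. the fixed point of $x = \mathbf{1} + \alpha A D^{-1} x$. The sketch below iterates that fixed point. It assumes $A_{ij} = 1$ when there is an edge from station j to station i, so that column j of $A$ is divided by the out-degree of j as in the appendix; $\alpha = 0.85$ is a conventional choice rather than a value stated in the paper, and normalising the result to unit sum is also an assumption. The name `pagerank_table7` is hypothetical.

```python
def pagerank_table7(edges, alpha=0.85, tol=1e-12, max_iter=10_000):
    """Fixed point of x = 1 + alpha * A D^{-1} x, normalised to sum to 1.

    edges: (u, v) pairs for directed edges u -> v; repeated pairs count as
    parallel edges. D_jj = max(out-degree of j, 1), as in the appendix.
    Converges for 0 <= alpha < 1.
    """
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    out_deg = dict.fromkeys(nodes, 0)
    incoming = {n: [] for n in nodes}  # incoming[i] lists j for each edge j -> i
    for u, v in edges:
        out_deg[u] += 1
        incoming[v].append(u)
    x = dict.fromkeys(nodes, 1.0)
    for _ in range(max_iter):
        new_x = {
            i: 1.0 + alpha * sum(x[j] / max(out_deg[j], 1) for j in incoming[i])
            for i in nodes
        }
        converged = max(abs(new_x[n] - x[n]) for n in nodes) < tol
        x = new_x
        if converged:
            break
    total = sum(x.values())
    return {n: value / total for n, value in x.items()}


if __name__ == "__main__":
    # Two feeder stations pointing at a hub: the hub gets the largest score.
    print(pagerank_table7([("B", "A"), ("C", "A")]))
```

Since every column of $A D^{-1}$ sums to at most 1, each iteration shrinks the error by a factor of at most $\alpha$, which is why the loop converges for $\alpha < 1$.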

References
1. Dataset: Indian Railways Time Table for trains available for reservation as on 03.08.2015. https://data.gov.in/catalog/indian-railways-train-time-table-0
2. Sienkiewicz J, Holyst JA (2005) Statistical analysis of 22 public transport networks in Poland. Phys Rev E (APS)
3. Lu H, Shi Y (2007) Complexity of public transport networks. Tsinghua Science and Technology (TUP)
4. Bagler G. Analysis of the airport network of India as a complex weighted network. Phys A: Stat Mech Appl
5. Chi L-P, Wang R, Su H, Xu X-P, Zhao J-S, Li W, Cai X. Structural properties of US flight network. Chin Phys Lett (IOP Publishing)
6. Wang Y-L, Zhou T, Shi J-J, Wang J, He D-R. Empirical analysis of dependence between stations in Chinese railway network. Phys A: Stat Mech Appl
7. Li W, Cai X. Empirical analysis of a scale-free railway network in China. Phys A: Stat Mech Appl
8. Sen P, Dasgupta S, Chatterjee A, Sreeram PA, Mukherjee G, Manna SS. Small-world properties of the Indian railway network. Phys Rev E
9. Seaton KA, Hackett LM. Stations, trains and small-world networks. Phys A: Stat Mech Appl
10. Liu C, Li J. Small-world and the growing properties of the Chinese railway network. Front Phys China
11. Feng J, Li X, Mao B, Xu Q, Bai Y. Weighted complex network analysis of the Beijing subway system: train and passenger flows. Phys A: Stat Mech Appl
12. Soh H, Lim S, Zhang T, Fu X, Lee GKK, Hung TGG, Di P, Prakasam S, Wong L. Weighted complex network analysis of travel routes on the Singapore public transportation system. Phys A: Stat Mech Appl
13. Satchidanand SN et al (2014) Studying Indian Railways Network using hypergraphs. In: 2014 sixth international conference on Communication Systems and Networks (COMSNETS), IEEE
14. Estrada E, Rodriguez-Velazquez JA (2005) Complex networks as hypergraphs. arXiv preprint physics/0505137
15. Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. In: International AAAI conference on weblogs and social media
16. Blondel VD et al (2008) Fast unfolding of communities in large networks. J Stat Mech: Theor Experiment 10:P10008
17. Latapy M (2008) Coefficient clustering. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor Comput Sci (TCS) 407(1–3):458–473
18. Top 12 most busiest Railway Stations of India. Walk through India. http://www.walkthroughindia.com/walkthroughs/trains/top-12-busiest-railway-stations-india/
19. List of Railway Junction Stations in India. https://en.wikipedia.org/wiki/List_of_railway_junction_stations_in_India
20. More than 100 trains affected after fire at Madhya Pradesh’s Itarsi station. http://timesofindia.indiatimes.com/india/More-than-100-trains-affected-after-fire-at-Madhya-Pradeshs-Itarsistation/articleshow/47709612.cms

Chapter 16

Text Classification Using Deep Learning: A Survey
Samarth Bhawsar, Sarthak Dubey, Shashwat Kushwaha, and Sanjeev Sharma

1 Introduction Text classification is the task of assigning unstructured text data to predefined categories [1], that is, grouping unstructured text objects into classes. Using natural language processing (NLP), text classification can analyse text automatically and assign a set of predefined tags based on its content. Typical applications include sentiment analysis, language detection, and topic detection [4]. The goal of sentiment analysis is to identify the kind of opinion a text expresses, either as a binary opinion or on a scale such as a movie rating from 1 to 5 [6]; for example, one can analyse Twitter posts about the movie Spider-Man to find out whether people liked it [6]. Topic classification can flag spam emails, which are then moved to the spam folder [16], while language detection identifies the language in which a text is written [21]. Since ancient times, written texts have been a means to communicate, express, and document things of significance. Even in the modern age [12], it has repeatedly been argued that an individual's writing style can reveal aspects of their psyche. Since the emergence of social media, micro-blogging has become a new way of writing, expressing opinions, and documenting events. This has also produced large amounts of unstructured data, and with it a need to understand that data [14]. This is where text classification can be put to our advantage. Deep learning is a family of machine learning methods based on artificial neural networks that extract progressively higher-level features from data through multiple layers of processing [18]. Today, text classification into predefined classes is one of the most widely used NLP methods for organizing text documents. Text classification has become more crucial with the emergence of large document collections, as it helps companies classify data systematically, improving their workflows and profits [4]. NLP research on text classification has progressed to the point where assessing the performance of state-of-the-art (SOTA) deep learning models has become essential [16]. This survey reviews the existing literature on text classification with deep learning architectures. Its main contributions can be summarized as follows:
1. Discuss previous methods that have been successfully applied to text classification.
2. Identify popular data sets for text classification.
3. Identify popular evaluation metrics for text classification.
The rest of the paper is organized as follows: Sect. 2 discusses the review methodology, Sect. 3 reviews related work, Sect. 4 covers the data sets available for text classification, and Sect. 5 concludes the work.

S. Bhawsar (B) · S. Dubey · S. Kushwaha · S. Sharma Indian Institute of Information Technology, Pune, India
S. Sharma URL: https://deeplearningforresearch.com/
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_16

S. Bhawsar et al.

2 Review Methodology The following databases were explored for the survey:
1. Springer Link
2. IEEE Xplore Digital Library
3. Elsevier
4. Google Scholar

Over 80 papers were studied for the survey. High-quality papers and sources were used for carrying out the survey. Out of the 80 papers, 32 papers were shortlisted for this study.

16 Text Classification Using Deep Learning: A Survey

3 Review of the Previous Methods In [16], the authors give an excellent introduction to text classification. The paper is very beginner-friendly and explains almost every text classification technique, from basic linear classification models to advanced deep learning models and from supervised to unsupervised learning models, along with the pros and cons of each. The authors compare the accuracy of all these techniques in a detailed table and elaborate on the steps involved in text classification, from data preprocessing and word representations to classification models. A further table summarizes almost every popular text classification data set, and the paper covers a wide range of evaluation metrics with their mathematical interpretations. In short, for anyone getting started with text classification and natural language processing, this paper is an ideal starting point.

In [8], the authors introduce random multimodel deep learning (RMDL), an ensemble of deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs) trained in parallel on the data set. The ensemble combines the predictions of these networks to produce a more accurate result. The authors evaluated RMDL on the Web of Science (WOS), Reuters, IMDB, and 20 Newsgroups data sets. On IMDB, individual machine learning and deep learning models achieve approximately 83–88.5% accuracy, whereas RMDL achieves approximately 89–91%.

FastText is a library developed by Facebook AI Research for learning word embeddings and building text classification models from vector representations of text. In [9], the authors improve the memory efficiency of FastText at the cost of a small loss in accuracy. They achieve this by compressing the word embeddings with product quantization, which approximates each word vector by the nearest centroids learned in a set of subspaces. This reduces the memory consumption of the trained model by up to a factor of 1000 with negligible loss in accuracy.
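Product quantization, as summarized above for [9], splits each d-dimensional vector into sub-vectors and stores, for each sub-vector, only the index of its nearest centroid in a small per-subspace codebook. The sketch below shows encoding and decoding with given codebooks; learning the codebooks (usually with k-means) is omitted, and the toy codebooks and function names are hypothetical. It assumes the vector length is divisible by the number of subspaces.

```python
def split(vec, n_sub):
    """Split a vector into n_sub equal-length sub-vectors."""
    size = len(vec) // n_sub
    return [vec[i * size:(i + 1) * size] for i in range(n_sub)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vec, codebooks):
    """Index of the nearest centroid in each subspace's codebook."""
    subs = split(vec, len(codebooks))
    return [
        min(range(len(book)), key=lambda k: sq_dist(sub, book[k]))
        for sub, book in zip(subs, codebooks)
    ]


def pq_decode(codes, codebooks):
    """Approximate reconstruction: concatenate the selected centroids."""
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out


if __name__ == "__main__":
    books = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
    codes = pq_encode([0.9, 1.1, 0.1, 0.8], books)
    print(codes, pq_decode(codes, books))
```

With 256 centroids per subspace each code fits in one byte, so a vector of d 32-bit floats split into m subspaces shrinks from 4d bytes to m bytes, plus the shared codebooks.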
In [23], the authors introduce an effective word embedding technique called the label-embedding attentive model (LEAM), in which labels and words are embedded in the same vector space. They build an attention mechanism that measures the compatibility between text sequences and labels; trained on labelled data, it gives words that are relevant to a label more weight than irrelevant words. In addition to retaining the semantics of the input text, LEAM provides extra information through the compatibility between text and labels. This technique outperformed several of the best existing text classifiers in terms of both accuracy and speed.

In [2], the authors introduce a data augmentation technique called language-model-based data augmentation (LAMBADA), which is most effective when labelled data are scarce. A language model is fine-tuned on the original labelled data to generate new, artificially labelled text; the generated sequences are then filtered with a classifier trained on the original data, and the retained samples are used to train supervised text classifiers. LAMBADA beats most other data augmentation techniques by a large margin: the authors applied it to several data sets and observed a significant increase in the performance of text classifiers compared with other data augmentation techniques.

The paper [17] explores training text classification models on unlabelled data using only the label names. To achieve this, a pre-trained language model serves both as a source of knowledge for distinguishing classes and as a representation learning model for document classification. The approach associates semantically similar words with each label name, and the authors further refine the model through self-training. They achieved up to 90% accuracy on some popular data sets without using any labelled documents, training on unlabelled data with three words per class as the label names.

In [20], the authors address the problem that language models may not capture enough information about the labels of the text data, for example when the labels are semantically similar. They therefore propose an approach that generates distinct label representations carrying more specific information. The approach is mainly applicable to few-shot text classification, i.e. settings with few text documents per label.

In [11], the authors introduce a semi-supervised learning model called the self-pretraining model. Its self-pretraining algorithm is an iterative process involving two classifiers: the first labels an unlabelled set of texts, and its output is used to initialize the second classifier, which is then trained further on labelled data. This method outperforms several of the best semi-supervised text classification techniques.

In [14], the authors discuss various machine learning and deep learning models for text classification, focusing mainly on machine learning models.
They found that machine learning models produced better results in natural language processing, and they attribute this to a better grasp of complex data and of the nonlinear relationships within it. Choosing the appropriate model, however, remains very challenging for researchers. The authors explain text classification in phases: first the features are extracted from the documents, then their dimensionality is reduced, after which various classifiers are applied and the models are evaluated. They also discuss applications of text classification such as information retrieval, sentiment analysis, and recommendation systems, and conclude that current classification algorithms can improve their accuracy through good feature extraction and proper evaluation. In [28], the authors compare encoding techniques for text classification at several levels: UTF-8 bytes, characters, words, romanized characters, and romanized words. They evaluate these levels with linear models, FastText, and convolutional networks; for the convolutional networks, the encoding mechanisms are character glyph, one-hot encoding, and embedding. The data sets used, including Rakuten, Amazon, China-news, and NYTimes, were obtained from online shopping reviews and news sources. They found that byte-level one-hot encoding works best for convolutional networks, and character-level n-gram encoding works best for FastText.

16 Text Classification Using Deep Learning: A Survey

In [13], the authors observe that the growth in the number of documents in recent years has made text classification harder, because the number of categories has grown as well. They therefore treat the task as hierarchical multi-class classification and use multiple deep learning methods to learn the hierarchy of documents at each level. Evaluated on the Web of Science (WOS) data set, the deep learning approaches gave better results than traditional methods such as naive Bayes and SVM. Short texts are much harder to classify than paragraphs or documents because they contain so little text [5]. The authors therefore use an external knowledge source to gather conceptual information about the short texts and combine this knowledge with deep neural networks. Two attention mechanisms are used, concept towards short text (C-ST) and concept towards concept set (C-CS), and the gathered conceptual information is then used to classify the short texts. On the four data sets used in the article, this approach outperformed the compared methods. Deep neural networks have performed better than traditional classification methods [5], but because most models rely mainly on text representations, many researchers have ignored the matching signals between words and classes. To solve this problem, the authors designed a framework called the explicit interaction model (EXAM), which uses an interaction mechanism to compute the matching signals between words and classes.
The framework was applied to multi-class and multi-label text classification and evaluated on several public data sets, where it achieved strong results for a word-based model. Entities provide a great deal of information about the semantics of a text [26]. To exploit this, the authors first extract all the entities from a document and then use a neural attention mechanism to focus on a small subset of relevant and unambiguous entities. Evaluated on two standard data sets, 20 Newsgroups (20NG) and R8, their model outperformed the compared methods on both. In multi-label classification, each text document can be assigned one or more labels, which makes it more challenging than single-label classification because of the higher dimensionality and the larger number of labels [22]. To avoid over-fitting in this high-dimensional space and to predict accurately, the authors regularize the model complexity during training using the elastic net. They also apply support inference and GFM predictions to increase the F-measure. Tested on nine benchmark data sets, the model produced good results. In [27], the authors use graphs for text classification. They first construct a graph and then convert it into its encoding tree. The nodes represent words or documents; the weight of an edge between two words is their mutual information, and the weight of an edge between a word and a document is the word's TF-IDF value. The encoding tree captures the information in the text that is used to classify it.

S. Bhawsar et al.

In [7], the authors introduce a new approach to classifying text documents: word embedding techniques are used to convert the text into an RGB image, so that a CNN can be applied directly without any modification, while the semantics of the original document are preserved in the image. In [3], the author introduces iNLTK, an open-source NLP library that provides pre-trained language models and supports sentence embedding, tokenization, text generation, data augmentation, and textual similarity in 13 Indic languages. Using less than 10% of the training data, the author achieves more than 95% of the previous best performance. The paper [10] introduces FastText, a fast text classifier that can be trained on more than a billion words in less than ten minutes on a standard multicore CPU and can classify half a million sentences among 312K classes in less than a minute. Its accuracy is comparable to that of deep learning classifiers, while it is many orders of magnitude faster for training and evaluation. In [24], the authors note that the rise of unstructured data has increased the need for text classification but that many text classification techniques are slow; they therefore introduce easy data augmentation (EDA) techniques to boost performance on text classification tasks. EDA consists of four operations: synonym replacement, random insertion, random swap, and random deletion. It improves performance for both convolutional and recurrent neural networks, with particularly strong results on smaller data sets: using only 50% of the available training data, EDA matches the accuracy of normal training with all the available data. The paper [25] describes StarSpace, a neural embedding model that can solve a variety of tasks: ranking tasks such as web search or information retrieval, embedding of multi-relational graphs, text classification, collaborative-filtering and content-based recommendation, and embedding at the sentence or document level. The model works by embedding entities composed of discrete features and comparing them against each other. The paper [19] introduces BERTweet, a pre-trained language model for English tweets. The experiments reported in the paper show that BERTweet outperforms the strong baselines RoBERTa and XLM-R, as well as previous state-of-the-art models, on three tweet NLP tasks: named-entity recognition, part-of-speech tagging, and text classification. In [15], the authors address two weaknesses of bag-of-words features: they lose the order of words and ignore word semantics. They propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text such as sentences, paragraphs, and documents. Each document is represented by a dense vector that is trained to predict words in the document. In this way, the algorithm outperforms bag-of-words models and other techniques for text representation (Table 1).
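The four EDA operations described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation from [24]: the tiny `SYNONYMS` table is a hypothetical stand-in for a lexical resource such as WordNet, and the parameter names `n` (number of edits) and `p` (deletion probability) are our own.

```python
import random

# Hypothetical toy synonym table; EDA as published draws synonyms from WordNet.
SYNONYMS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "bad": ["poor", "weak"],
}


def synonym_replacement(words, n, rng):
    """Replace up to n words that have a known synonym."""
    new = list(words)
    positions = [i for i, w in enumerate(new) if w in SYNONYMS]
    rng.shuffle(positions)
    for i in positions[:n]:
        new[i] = rng.choice(SYNONYMS[new[i]])
    return new


def random_insertion(words, n, rng):
    """n times, insert a synonym of a random word at a random position."""
    new = list(words)
    for _ in range(n):
        candidates = [w for w in new if w in SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        new.insert(rng.randint(0, len(new)), synonym)
    return new


def random_swap(words, n, rng):
    """n times, swap the positions of two randomly chosen words."""
    new = list(words)
    if len(new) < 2:
        return new
    for _ in range(n):
        i, j = rng.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]
    return new


def random_deletion(words, p, rng):
    """Delete each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def eda(sentence, n=1, p=0.1, seed=0):
    """Return one augmented variant of the sentence per EDA operation."""
    rng = random.Random(seed)
    words = sentence.split()
    return {
        "SR": " ".join(synonym_replacement(words, n, rng)),
        "RI": " ".join(random_insertion(words, n, rng)),
        "RS": " ".join(random_swap(words, n, rng)),
        "RD": " ".join(random_deletion(words, p, rng)),
    }


print(eda("a good movie with a bad ending"))
```

Each call returns one variant per operation; in practice several variants per training sentence would be generated and added to the training set. The EDA paper scales the number of edits with sentence length, which this sketch leaves to the caller through `n`.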


Table 1 Accuracy (%) of various machine learning and deep learning models trained and validated on the IMDB, 20 Newsgroups, and Web of Science (WOS) data sets

| Research paper | Model | IMDB | 20 Newsgroups | WOS |
|---|---|---|---|---|
| Survey on text classification: from shallow to deep learning [16] | DCNN | 89.4 | – | – |
| | DAN | 89.4 | – | – |
| | BERT-base | 95.63 | – | – |
| An improvement of data classification using random multimodel deep learning (RMDL) [8] | DNN | 88.55 | 86.50 | – |
| | CNN | 87.44 | 82.91 | – |
| | RNN | 88.59 | 83.75 | – |
| | SVM (TF-IDF) | 88.45 | 86.00 | – |
| | 3 RMDLs | 89.91 | 86.73 | – |
| | 9 RMDLs | 90.13 | 87.62 | – |
| | 15 RMDLs | 90.79 | 87.91 | – |
| HDLTex: hierarchical deep learning for text classification [13] | RNN+DNN | – | – | 86.07 |
| | Stacking SVM | – | – | 79.45 |
| | RNN+RNN | – | – | 76.58 |
| | Stacking SVM | – | – | 71.81 |
| | CNN+CNN | – | – | 90.93 |
| | Stacking SVM | – | – | 85.68 |
| | NBC | – | – | 78.14 |
| Deep learning-based text classification: a comprehensive review [18] | XLNet-Large (ensemble) | 96.21 | – | – |
| | BCN+Char+CoVe | 91.80 | – | – |
| | ULMFiT | 95.40 | – | – |
| | BERT-Base | 95.63 | – | – |
| | LDA | 67.40 | – | – |
| | Deep Pyramid CNN | 97.63 | – | – |
| | ULMFiT | 97.84 | – | – |
| | Block-sparse LSTM | 97.73 | – | – |
| Bag of tricks for efficient text classification [10] | SVM+TF | 40.5 | – | – |
| | CNN | 37.5 | – | – |
| | Conv-GRNN | 42.5 | – | – |
| | fastText | 45.2 | – | – |


Table 2 Summary of the Yelp data set

| Data set | Target classes | Train size | Test size |
|---|---|---|---|
| Yelp-2 | 2 | 560,000 | 38,000 |
| Yelp-5 | 5 | 650,000 | 50,000 |

Table 3 Summary of the Amazon data set

| Data set | Target classes | Train size | Test size |
|---|---|---|---|
| Amazon-2 | 2 | 3,600,000 | 400,000 |
| Amazon-5 | 5 | 3,000,000 | 650,000 |

Table 4 Summary of the AG News data set

| Train size | Test size |
|---|---|
| 120,000 | 7,600 |

Table 5 Summary of the Reuters news data set

| Train set size | Test set size |
|---|---|
| 7,769 | 3,019 |

4 Data Sets Related to Text Classification

IMDB reviews: The IMDB reviews data set is used for binary sentiment classification of movie reviews. It has two categories with the same number of reviews in each, 50,000 reviews in total, and is split into training and test data.

Yelp: The Yelp data set is collected from the Yelp Dataset Challenge. It comes in two versions, Yelp-2 and Yelp-5, with two and five target classes respectively. Table 2 shows a summary of the Yelp data set.

Amazon Reviews: The Amazon Reviews (AM) data set is a popular data set formed by collecting product reviews from the Amazon website. It also comes in two versions, Amazon-2 and Amazon-5. Table 3 shows a summary of the Amazon data set.

20 Newsgroups: The 20 Newsgroups (20NG) data set is a collection of newsgroup texts. It has 20 categories with approximately the same number of texts in each, 18,846 texts in total.

AG News: The AG News data set is built from news articles collected by an academic news search engine, keeping the four largest classes and using the title and description fields of each article. Table 4 shows a summary of the AG News data set.

Reuters news: The Reuters news data set, drawn from the Reuters financial news service, is widely used for text classification. There are also several Reuters subsets, such as R8, R52, RCV1, and RCV1-v2. Table 5 shows a summary of the Reuters data set.


5 Metrics for Text Classification

To find out how well a classifier works and to compare new methods with existing ones, the classifier must be evaluated. Researchers therefore evaluate their methods so that they can find and correct any inconsistencies or errors. This is not an easy task, however, because standard data collection methods are lacking. The performance metrics used to evaluate such algorithms include accuracy, precision, recall, and F-measure [14].

5.1 Accuracy

Accuracy is the ratio of the sum of true positives (TP, items correctly predicted as belonging to the positive class) and true negatives (TN, items correctly predicted as belonging to the negative class) to the total number of predictions, which also includes false positives (FP, negative items predicted as positive) and false negatives (FN, positive items predicted as negative).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

5.2 Precision

Precision is the ratio of correctly predicted positive items to all items predicted as positive.

$$\text{Precision} = \frac{TP}{TP + FP}$$

5.3 Recall

Recall is the ratio of correctly predicted positive items to all actual positive items.

$$\text{Recall} = \frac{TP}{TP + FN}$$

5.4 Specificity

Specificity is the ratio of correctly predicted negative items to all actual negative items.

$$\text{Specificity} = \frac{TN}{TN + FP}$$
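The four formulas above can be checked with a short Python sketch that tallies a binary confusion matrix from label lists and derives each metric from it. The helper names are illustrative, and returning 0.0 when a denominator is zero is an assumed convention, not part of the definitions.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary confusion matrix."""
    tp: int
    tn: int
    fp: int
    fn: int


def confusion_from_labels(y_true, y_pred, positive=1):
    """Tally TP, TN, FP and FN for one designated positive class."""
    c = Confusion(0, 0, 0, 0)
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                c.tp += 1
            else:
                c.fp += 1
        elif t == positive:
            c.fn += 1
        else:
            c.tn += 1
    return c


def _ratio(num, den):
    # Assumed convention: an undefined ratio (0/0) is reported as 0.0.
    return num / den if den else 0.0


def accuracy(c):
    return _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)


def precision(c):
    return _ratio(c.tp, c.tp + c.fp)


def recall(c):
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c):
    return _ratio(c.tn, c.tn + c.fp)


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
cm = confusion_from_labels(y_true, y_pred)
print(cm, accuracy(cm), precision(cm), recall(cm), specificity(cm))
```

Here three of four positives are found (recall 0.75), three of four positive predictions are correct (precision 0.75), and five of six negatives are correctly rejected (specificity 5/6).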


5.5 F-Measure

The F-measure is the harmonic mean of precision and recall.

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
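As a sketch, the F-measure can be computed directly from precision and recall. The optional `beta` parameter goes beyond the text: it gives the general F-beta score, where values above 1 weight recall more heavily, and `beta=1` reduces to the harmonic mean defined above.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives F1, the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0  # both zero: the harmonic mean is taken as 0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(f_measure(0.5, 1.0))          # F1
print(f_measure(0.5, 1.0, beta=2))  # F2, favours recall
```

Because it is a harmonic mean, F1 stays close to the smaller of the two values: precision 0.5 and recall 1.0 give about 0.67, not the arithmetic mean of 0.75.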

5.6 Micro-averaging and Macro-averaging

Micro-averaging and macro-averaging are used when there are multiple classes. In micro-averaging, the counts (TP, FP, FN) are summed over all classes and the metric is computed once from these totals, whereas in macro-averaging, the metric is computed separately for each class and the per-class values are then averaged.
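The difference between the two averaging schemes is easiest to see on a small example. The Python sketch below (helper names are hypothetical) computes one-vs-rest counts for each class, then derives macro-F1 by averaging per-class scores and micro-F1 from the pooled counts.

```python
from collections import Counter


def per_class_counts(y_true, y_pred, labels):
    """One-vs-rest TP/FP/FN counts for every class label."""
    counts = {label: Counter() for label in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            counts[t]["tp"] += 1
        else:
            counts[p]["fp"] += 1  # class p was predicted wrongly
            counts[t]["fn"] += 1  # true class t was missed
    return counts


def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of P and R."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Compute F1 per class, then average the per-class scores."""
    counts = per_class_counts(y_true, y_pred, labels)
    scores = [f1_from_counts(c["tp"], c["fp"], c["fn"]) for c in counts.values()]
    return sum(scores) / len(scores)


def micro_f1(y_true, y_pred, labels):
    """Pool TP/FP/FN over all classes, then compute a single F1."""
    counts = per_class_counts(y_true, y_pred, labels)
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return f1_from_counts(tp, fp, fn)


labels = ["a", "b", "c"]
y_true = ["a", "a", "a", "a", "b", "c"]
y_pred = ["a", "a", "a", "a", "c", "b"]
print(macro_f1(y_true, y_pred, labels), micro_f1(y_true, y_pred, labels))
```

Class a is always right while b and c are always confused, so macro-F1 (1/3) exposes the failure on the minority classes, whereas micro-F1 (2/3) is dominated by the frequent class. For single-label multi-class data, micro-F1 equals accuracy.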

5.7 Matthews Correlation Coefficient (MCC)

The Matthews correlation coefficient (MCC) is useful when the classes are of unequal size. It uses all four values of the confusion matrix to evaluate the classifier and is equivalent to the correlation coefficient between two variables, here the true class and the predicted class. It ranges from −1 to +1.

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
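A direct Python transcription of the MCC formula follows. The zero-denominator case (for example, a classifier that never predicts the positive class) is mapped to 0.0, which is an assumed convention rather than part of the formula.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: MCC is undefined here, so report 0.0 (no correlation).
        return 0.0
    return (tp * tn - fp * fn) / denominator


print(mcc(5, 95, 0, 0))   # perfect classifier
print(mcc(0, 95, 0, 5))   # always predicts "negative" on a 95/5 split
```

For a classifier that labels all 100 items of a 95/5 data set as negative, accuracy is 0.95 but `mcc(0, 95, 0, 5)` returns 0.0, which illustrates why MCC is preferred when class sizes are unequal.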

5.8 ROC

The receiver operating characteristic (ROC) curve is a graphical way to evaluate a classifier: the false positive rate is plotted on the x-axis and the true positive rate on the y-axis as the decision threshold varies. The true positive rate is the same as recall, while the false positive rate is the ratio of negative items incorrectly predicted as positive to all actual negative items (i.e. 1 − specificity). The area under the ROC curve (AUC) is, as the name suggests, the whole area under this curve. It indicates how well the classifier distinguishes between the positive and negative classes: the larger the AUC, the better the separation.
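To make the ROC and AUC definitions concrete, the following sketch sweeps a decision threshold over classifier scores to produce (FPR, TPR) points and integrates them with the trapezoidal rule. It assumes binary labels encoded as 1 (positive) and 0 (negative) and that both classes are present; items with tied scores are processed together so they form a single point.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold past each distinct score."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative item")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
area = auc(points)
print(points, area)
```

A perfectly ranked list would give an AUC of 1.0 and random scores about 0.5. Here one negative (0.7) outranks one positive (0.6), so 8 of the 9 positive-negative pairs are ordered correctly and the AUC is 8/9.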

6 Conclusion and Future Scope

Text classification is becoming one of the most important tools in business, health care, and marketing. It helps detect whether an e-mail is spam and analyses audience sentiment from social media and other online sources. It also helps categorize documents into predefined topics and recommend items to users based on their interests. In this paper, we have compared various machine learning and deep learning models, and we conclude that the deep learning models performed better than the machine learning models. A major challenge researchers face is selecting the appropriate classifier. We have also compared different models across data sets based on their accuracy. As future work, we will implement text classification using well-known models such as LSTM, BERT, RNN, and hybrid models, and we will try to improve their performance through changes to the data preprocessing or to the algorithm. We will also evaluate our models on metrics such as recall, precision, and accuracy so that we can compare them with state-of-the-art techniques. For training, we will use well-known data sets such as the IMDB, Yelp, and Amazon review data sets.

References
1. Aggarwal CC, Zhai CX (2012) A survey of text classification algorithms, pp 163–222
2. Anaby-Tavor A, Carmeli B, Goldbraich E, Kantor A, Kour G, Shlomov S, Tepper N, Zwerdling N (2020) Do not have enough data? Deep learning to the rescue! 34(05):7383–7390
3. Arora G (2020) iNLTK: natural language toolkit for Indic languages. arXiv preprint arXiv:2009.12534
4. Cai J, Li J, Li W, Wang J (2018) Deep learning model used in text classification, pp 123–126
5. Chen J, Hu Y, Liu J, Xiao Y, Jiang H (2019) Deep short text classification with knowledge powered attention 33(01):6252–6259
6. Feldman R (2013) Techniques and applications for sentiment analysis. Commun ACM 56(4):82–89
7. Gallo I, Nawaz S, Landro N, La Grassa R (2020) Visual word embedding for text classification, pp 339–352
8. Heidarysafa M, Kowsari K, Brown DE, Meimandi KJ, Barnes LE (2018) An improvement of data classification using random multimodel deep learning (RMDL). arXiv preprint arXiv:1808.08121
9. Joulin A, Grave E, Bojanowski P, Douze M, Jégou H, Mikolov T (2016) FastText.zip: compressing text classification models. arXiv preprint arXiv:1612.03651
10. Joulin A, Grave E, Bojanowski P, Mikolov T (2016) Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759
11. Karisani P, Karisani N (2021) Semi-supervised text classification via self-pretraining. In: Proceedings of the 14th ACM international conference on web search and data mining, pp 40–48
12. Korde V, Namrata Mahender C (2012) Text classification and classifiers: a survey. Int J Artif Intell Appl 3(2):85
13. Kowsari K, Brown DE, Heidarysafa M, Meimandi KJ, Gerber MS, Barnes LE (2017) HDLTex: hierarchical deep learning for text classification, pp 364–371
14. Kowsari K, Meimandi KJ, Heidarysafa M, Mendu S, Barnes L, Brown D (2019) Text classification algorithms: a survey. Information 10(4):150
15. Le Q, Mikolov T (2014) Distributed representations of sentences and documents, pp 1188–1196
16. Li Q, Peng H, Li J, Xia C, Yang R, Sun L, Yu PS, He L (2020) A survey on text classification: from shallow to deep learning. arXiv preprint arXiv:2008.00364
17. Meng Y, Zhang Y, Huang J, Xiong C, Ji H, Zhang C, Han J (2020) Text classification using label names only: a language model self-training approach. arXiv preprint arXiv:2010.07245
18. Minaee S, Kalchbrenner N, Cambria E, Nikzad N, Chenaghlu M, Gao J (2021) Deep learning-based text classification: a comprehensive review. ACM Comput Surv (CSUR) 54(3):1–40
19. Nguyen DQ, Vu T, Nguyen AT (2020) BERTweet: a pre-trained language model for English tweets. arXiv preprint arXiv:2005.10200
20. Ohashi S, Takayama J, Kajiwara T, Arase Y (2021) Distinct label representations for few-shot text classification. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (vol 2: Short Papers), pp 831–836
21. Pavlinek M, Podgorelec V (2017) Text classification method based on self-training and LDA topic models. Expert Syst Appl 80:83–93
22. Wang B, Li C, Pavlu V, Aslam J (2017) Regularizing model complexity and label structure for multi-label text classification. arXiv preprint arXiv:1705.00740
23. Wang G, Li C, Wang W, Zhang Y, Shen D, Zhang X, Henao R, Carin L (2018) Joint embedding of words and labels for text classification. arXiv preprint arXiv:1805.04174
24. Wei J, Zou K (2019) EDA: easy data augmentation techniques for boosting performance on text classification tasks. arXiv preprint arXiv:1901.11196
25. Wu LY, Fisch A, Chopra S, Adams K, Bordes A, Weston J (2018) StarSpace: embed all the things!
26. Yamada I, Shindo H (2019) Neural attentive bag-of-entities model for text classification. arXiv preprint arXiv:1909.01259
27. Zhang C, Wu J, Zhu H, Xu K (2021) TENT: text classification based on encoding tree learning. arXiv preprint arXiv:2110.02047
28. Zhang X, LeCun Y (2017) Which encoding is the best for text classification in Chinese, English, Japanese and Korean? arXiv preprint arXiv:1708.02657

Chapter 17

Significance of Artificial Intelligence in COVID-19 Detection and Control

Abhishek Shrivastava and Vijay Kumar Dalla

1 Introduction

In the healthcare system, AI is concerned with applying intricate algorithms and digital software to investigate, clarify, and understand complex medical diagnostic data [1]. AI technology differs from conventional techniques in healthcare in that it can collect information, regulate it, and provide a precise output to the prospective user. AI applications draw on deep learning, the Internet of Things (IoT), and machine learning algorithms [2–5]. These algorithms are intelligent but user-defined, so to minimize errors, an AI algorithm must be tested repeatedly. In the current situation, most countries, including India, are battling the COVID-19 pandemic and looking for a cost-effective, intelligent way to combat it [6]. This brief work aims to provide an understanding of AI-based technology and its critical role in effectively combating the pandemic. The primary importance of AI-based technology in healthcare is to recognize the relationship between the treatment and the outcome of COVID-19 infected patients [7, 8]. AI algorithms have the potential to be used in COVID-19 diagnosis, contact tracing of infected patients, patient monitoring, and drug and medicine development [9]. Furthermore, healthcare organizations anticipate using AI-based algorithms to implement optimized initiatives that reduce both costs and the workload of healthcare workers. Google and IBM, two large technology corporations, have created AI-based algorithms for healthcare systems to tackle COVID-19 [10]. In the current pandemic, the majority of problems originate from ineffective contact tracing, delays in patient diagnosis, and vaccine development.

A. Shrivastava (B) · V. K. Dalla
National Institute of Technology Jamshedpur, Jamshedpur 831014, India
e-mail: [email protected]
V. K. Dalla
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_17

A. Shrivastava and V. K. Dalla

AI-based techniques speed up contact tracing and allow healthcare workers to treat infected patients efficiently [11]. AI is a pioneering technology that supports the proper treatment of infected patients in quarantine and helps maintain accurate patient monitoring throughout the quarantine period. AI-based technology also makes it easy to track high-risk patients, and it can be used to identify groups of people who are not wearing face masks. Smart helmets have been used that can detect an elevated temperature in people within a 5 m radius. A citizen monitoring system known as Health Code was created to identify a person's risk level by investigating their travel history and their presence in hotspot areas. Based on big data analysis, the health ministry can divide a state into red, yellow, and green zones: red represents a high-risk area, whereas green allows daily activity to resume with safety measures in place. Because the number of severely infected patients is rising rapidly during the COVID-19 outbreak, well-organized technology such as AI is needed. Furthermore, AI has previously been used to deal with pandemic situations [12], and with its guidance the number of resolved cases can be increased.

2 Procedure Implicated in AI-enabled Technique for COVID-19

AI can also be used to diagnose patients quickly and determine whether or not they are infected with COVID-19. The longer a diagnosis takes, the more likely it is that the virus will multiply and infect more people, so it is critical to diagnose the coronavirus as accurately and quickly as possible. From a review of 49 research papers, we found that AI can be used to diagnose coronavirus symptoms quickly. First, a patient with coronavirus symptoms visits a doctor for virus screening and identification. The doctor then collects a sample and uses AI to make a quick diagnosis. AI is also used for patient monitoring and contact tracing. During rehabilitation, the doctor can improve or restore the patient's functional ability through physical exercise. After 14 days of quarantine, a COVID-19 retest is performed: (a) if the result is negative, the patient is considered cured; (b) if the result is positive, the patient is advised to remain in isolation. According to medical officials, AI is accurate and capable of making a quick diagnosis.

3 Methods

The Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) framework is used to prepare and report this systematic literature review.

17 Significance of Artificial Intelligence in COVID-19 Detection …


3.1 Criteria of Acceptance

This study focused on peer-reviewed articles that used AI to study and address the COVID-19 crisis, covering diagnosis, prediction, pathogen spread, rapid diagnosis, the establishment of population surveillance systems, and vaccine development.

3.2 Data Sources and Search Policy

The PubMed, Scopus, ACM, IEEE, ResearchGate, Web of Science, CINAHL, ScienceDirect, and Google Scholar databases were searched for peer-reviewed journal articles and other scientific papers available from January 1, 2021 to November 2, 2021. The search syntax combined keywords such as COVID-19, SARS-CoV-2, artificial intelligence (AI), digital healthcare system, and automation. The main purpose of this research is to undertake a thorough literature review of the role of AI in the healthcare system as a tool for addressing the COVID-19 issue, and to evaluate its use in demographic, medical, and teleoperation developments.

3.3 Method of Study Selection

The systematic search retrieved 280 publications. After 100 duplicates were removed, 180 potentially relevant papers remained for title and abstract screening. During this screening, 100 records were excluded because they were not relevant to COVID-19 (n = 30), did not implement AI (n = 40), or were magazine articles with improvement (n = 30), leaving 80 records. After these were evaluated for eligibility, 30 articles were included in the study, representing the three themes summarized in Table 1. For each theme, the number and percentage of included articles are reported.
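The screening counts reported above can be checked with a few lines of Python arithmetic. The sketch only models the steps stated in the text and the theme counts in Table 1; the text does not break down how the 80 records left after screening were narrowed to the 30 included articles, so that step is taken as given.

```python
retrieved = 280
duplicates_removed = 100
screened = retrieved - duplicates_removed  # pool for title/abstract screening

# Exclusion reasons and counts as reported in Section 3.3.
excluded = {
    "not relevant to COVID-19": 30,
    "AI not implemented": 40,
    "magazine articles with improvement": 30,
}
remaining = screened - sum(excluded.values())

included = 30  # final set after eligibility assessment (step not itemized in the text)
themes = {"CE": 15, "EDD": 5, "DP": 10}  # article counts per theme from Table 1

shares = {name: round(100 * count / included, 2) for name, count in themes.items()}

print(f"screened={screened}, remaining={remaining}, included={included}")
print(shares)
```

The arithmetic confirms that the theme counts sum to the 30 included articles and that the EDD share rounds to 16.67%.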

3.4 Analyses of Data (Qualitative and Quantitative)

This work carried out a descriptive and analytical analysis of the qualifying studies (n = 30) that used AI to battle the pandemic. Studies in the computational epidemiology (CE) theme were subjected to qualitative analysis, while studies in the early detection and diagnosis (EDD) and disease progression (DP) themes were subjected to quantitative descriptive analysis.


3.5 Descriptive Analysis of Search Result

The search strategy yielded 280 academic articles, all published and made available between January 1, 2021 and November 2, 2021. The 30 selected publications were grouped into three themes (CE, EDD, and DP) according to the AI applications used to fight the COVID-19 crisis, namely AI tools that forecast, categorize, analyze, monitor, and regulate viral propagation. After data collection and classification, the findings were summarized in Table 1 in line with the study's aim.

4 Discussion

AI techniques are expected to be used in the future to track, detect, and contain the COVID-19 pandemic. Our systematic review focused on 30 studies that used AI methods and identified three broad themes: models that address issues central to epidemiological studies, models that aid in COVID-19 patient diagnosis, and models that aid in COVID-19 patient prognosis.

Table 1 Summary of the 30 publications studied in the research, emphasizing the three major themes and their descriptions

| S/N | Theme | Explanation | Article count |
|---|---|---|---|
| 1 | Computational epidemiology (CE) | The publications covered the development and application of AI to epidemiological challenges, such as disease patterns and anticipating possible outbreaks, contact tracing and diagnosis, hospital disinfection, and vaccine and medication development | 15 (50.00%) |
| 2 | Early detection and diagnosis (EDD) | The papers discussed the use of AI algorithms to detect and distinguish COVID-19 positive patients from the overall population, contact tracing, and COVID-19 pandemic monitoring | 5 (16.67%) |
| 3 | Disease progression (DP) | Several studies in the confirmed COVID-19 community employed AI models to forecast illness onset, complexity, and outcomes | 10 (33.33%) |


4.1 Model 1: Computational Epidemiology (CE)

In this model, we examined the use of AI in epidemiology-related issues such as disease trends and forecasting potential outbreaks, contact tracing and diagnosis, hospital sanitization, and vaccine development.

4.1.1 AI Technique for Establishing Contact Tracing Systems

If an individual has COVID-19, the next step is to trace their contacts to prevent the disease from spreading. According to the WHO [13], the virus spreads from person to person through contact transmission, most commonly via saliva, droplets, or nasal discharge. Contact tracing is an important public health tool for breaking the viral transmission chain and reducing the transmission of SARS-CoV-2. To minimize further COVID-19 transmission, contact tracing identifies and treats people who have recently been in contact with an infected individual [14]. The approach generally follows contacts for 14 days after exposure to identify those who become infected. If used extensively, this technique could break the transmission cycle of the novel coronavirus and curb the pandemic by improving the chances of COVID-19 control. Several affected nations use Bluetooth, navigation systems, Facebook, network-based APIs, cellular tracking data, credit card transaction records, and the system's geographical region to build a digital contact tracing mechanism into a smartphone app. Compared with the non-digital technique, digital contact tracing can be essentially real time and significantly faster. These technologies capture personal information from individuals, which AI software then examines to monitor anyone who has been exposed to the virus through their recent contact network [15–18]. The medical company Infervision is using AI to diagnose patients faster than ever before, as hospital imaging departments are overwhelmed by the diagnosis of over 10,000 coronavirus cases in a single day. The Chinese company Alibaba has also developed an AI system that can diagnose the coronavirus with 99% accuracy [19]. The system uses chest CT scans to determine whether or not a patient is infected in about 50 s.
By contrast, a human can take about 15 min to diagnose the illness by examining the CT scans. Every minute saved in the diagnosis process is critical because it reduces the possibility of cross-contamination among hospitalized patients. To prevent the spread of this pandemic and to treat infected individuals as soon as feasible, it is vital to detect positive cases as early as possible. Coronavirus cases are tracked using population surveillance tools. The Aarogya Setu app categorizes a person's risk level into three color-coded categories (green, red, and yellow). Furthermore, machine learning algorithms estimate the position of containment zones in real time using travel history and digitally stored data [20]. Many countries monitor COVID-19 cases through population surveillance. China uses mobile phone software to assign each individual a threat level (color code red, yellow, or green) reflecting their infection risk, while machine learning algorithms use travel, financial, and contact data to anticipate the location of the next outbreak and warn border officials.

4.2 Model 2: EDD Model

4.2.1 Identification, Tracking, and Forecasting Coronavirus Outbreaks

To stop this virus effectively, it is necessary to identify its occurrence in humans and monitor its growth. BlueDot is using AI-enabled techniques to forecast and anticipate the global impact of COVID-19 [21]. AI is also used to identify people who may be infected with the coronavirus. SenseTime, an artificial intelligence SaaS company based in Hong Kong, combines facial recognition technology with thermal imaging to identify people who may be infected [22]. In addition, image recognition systems support medical diagnosis in the following ways.

Rapid Diagnosis

By collecting data reports from suspected individuals in the containment zone, AI can quickly reveal how the disease is spreading. This AI technology is precise and saves time in patient diagnosis [23]. In recent years, the convolutional neural network (CNN) has proven to be an effective learning algorithm for image recognition [24]. A CNN is an image processing algorithm that extracts features from images, such as horizontal and vertical edges and color. It is composed of three types of layers: convolutional layers, pooling layers, and fully connected layers. Real-time reverse transcription polymerase chain reaction (RT-PCR) is the most widely used test for COVID-19 diagnosis [25, 26]. Chest radiology imaging, such as computed tomography (CT) and X-ray, is critical in the early detection and treatment of the disease [27]. Because the sensitivity of RT-PCR is only 60–70%, the disease can still be detected by evaluating patients' radiographic images even when RT-PCR results are negative. CT scans are considered efficient for diagnosing COVID-19 pneumonia and could be used alongside RT-PCR as a screening technique. CT findings develop over time after the onset of illness, and patients often have a normal CT during the first two days. Figure 1 shows chest X-ray scans of a 60-year-old COVID-19 patient from day one to day seven. Table 2 shows the role of AI in screening patients during the COVID-19 pandemic. Early detection of any disease, infectious or non-infectious, is critical for early care and for saving lives, and rapid diagnosis and screening aid in the prevention and diagnosis of pandemic diseases such as SARS-CoV-2.
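The three CNN layer types named above can be illustrated with a toy forward pass in plain Python. This is only a sketch: the 6×6 image, the hand-set vertical-edge kernel, and the fully connected weights are all hypothetical, whereas a real CNN for chest CT or X-ray images would learn its kernels and weights from labelled data and use many more layers.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation (no padding, stride 1), as in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + r][j + c] * kernel[r][c]
                 for r in range(kh) for c in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def relu(fmap):
    """Element-wise ReLU activation."""
    return [[max(v, 0.0) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete edge windows are dropped."""
    oh, ow = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[i * size + r][j * size + c]
                 for r in range(size) for c in range(size))
             for j in range(ow)]
            for i in range(oh)]


def dense(vector, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


# Toy 6x6 image: dark left half, bright right half (a vertical edge).
image = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
# Hand-set vertical-edge kernel; a trained CNN would learn such weights.
vertical_edge = [[-1.0, 0.0, 1.0] for _ in range(3)]

fmap = relu(conv2d(image, vertical_edge))     # convolutional layer + ReLU
pooled = max_pool(fmap)                       # pooling layer
flat = [v for row in pooled for v in row]     # flatten for the dense layer
# Hypothetical, untrained weights for two output classes.
scores = dense(flat, [[0.25] * 4, [-0.25] * 4], [0.0, 0.0])
print(fmap[0], pooled, scores)
```

The convolution responds only where the window straddles the edge, pooling keeps the strongest response in each 2×2 region, and the fully connected layer turns the pooled features into class scores.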

17 Significance of Artificial Intelligence in COVID-19 Detection …


Fig. 1 a X-ray scans of the lungs of a 60-year-old COVID-19 patient who had pneumonia for a week. b Patchy ill-defined bilateral alveolar. c Infected upper left lobe of the lungs. d Findings of acute respiratory distress syndrome (ARDS) in the lungs

4.2.2 AI-enabled Diagnostic Models Based on Lung Ultrasound (LUS) and Non-imaging Clinical Data

During the 2009 influenza (H1N1) pandemic, LUS was found to have higher sensitivity than chest X-ray in detecting avian influenza (H7N9) and to be valuable in distinguishing bacterial pneumonia [30]. Although health professionals recommend LUS image analysis in the emergency ward for COVID-19 patients, the exact role of this imaging remains unclear. In our review, we identified a study [31] that used an aggregated LUS COVID-19 dataset and an ML model to predict mortality risk. COVID-19 infection could be predicted with a C-statistic of 82% and a sensitivity of 95%.
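The two metrics quoted above can be computed directly. The sketch below assumes binary labels (1 = positive) and uses made-up scores; it treats the C-statistic as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties count one half), which equals the area under the ROC curve.

```python
def sensitivity(y_true, y_pred):
    """TP / (TP + FN): the share of truly positive cases that the model flags."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

def c_statistic(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly, ties counting 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: four positive and four negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.5, 0.2, 0.1]
print(c_statistic(y_true, scores))                                 # 0.9375
print(sensitivity(y_true, [1 if s > 0.5 else 0 for s in scores]))  # 0.75
```

Sensitivity depends on the chosen decision threshold (0.5 here), whereas the C-statistic summarizes ranking quality over all thresholds, which is why studies often report both.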

4.3 Model 3: DP Models

4.3.1 Risk Stratification

The ability to identify COVID-19 cases that are likely to have a poor outcome will help organize healthcare resources for patients who require immediate attention. Yan et al. [32] tested various machine learning models to predict whether COVID-19 patients would die or survive. Variables in this prediction model included demographic characteristics, chronic health conditions, and clinical findings. Other research has found a strong link between chronic health conditions and DP. The study also found that DP is associated with a reduction in polymorphonuclear leukocytes and an increase in alkaline phosphatase levels. The overall AUC reported was 0.859. Li et al. [33] used X-rays and CNN architectures to compute a bronchial disease severity score. The score was calculated by comparing the patient's image with a group of object classes, using a Siamese network model as a clustering algorithm. For patients who were not intubated at the time of admission, the score (AUROC 0.90) significantly predicted resuscitation or death within three days. Because COVID-19 pneumonia has a high mortality rate, distinguishing it from other respiratory infections is critical. The presence of myalgias, laboratory levels of aspartate aminotransferase (AST), and increased red blood cell (RBC) counts were found to be the most predictive parameters, and ARDS was predicted with an overall accuracy of 90%. Using AST alone, the approach achieved an accuracy of 80%. Estiri et al. [34] developed a DL-based predictive and therapeutic analytics technique for identifying COVID-19, as well as risk variables for early detection and illness surveillance.

Table 2 Role of AI in screening of patients in the COVID-19 pandemic
S/N | AI method | Type of data | Patient count | Size of sample data | Validation technique | Performance
1 | Deep convolutional neural network ResNet-101 [28] | Clinical, mammographic | 106,096 | 1060 CT images of 108 laboratory-confirmed COVID-19 patients; 96 CT images of patients with bacterial and atypical pneumonia | Holdout | Accuracy: 99.21%; specificity: 99.07%
2 | Random forest algorithm [29] | Clinical, demographics | 453, 269, 89, 54 | Overall, 453 samples from 269 suspected COVID-19 patients were obtained from different sources; about 89 patients received a clinical blood test from a private clinic center; and 54 samples were from COVID-19-infected patients | Cross-validation | Accuracy: 95.21%; specificity: 96.07%

A. Shrivastava and V. K. Dalla
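The severity score attributed above to Li et al. [33] compares a patient's image with reference images in the embedding space of a Siamese network. A minimal sketch, assuming the embeddings have already been produced by a trained Siamese encoder (not shown) and assuming the median Euclidean distance as the aggregation rule; `severity_score` and the toy 2-D embeddings are hypothetical.

```python
import math
import statistics

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def severity_score(patient_embedding, reference_embeddings):
    """Median distance from the patient's embedding to a pool of reference embeddings
    (hypothetical aggregation rule); larger values mean the image looks less like the pool."""
    return statistics.median(euclidean(patient_embedding, r) for r in reference_embeddings)

# Toy 2-D embeddings standing in for Siamese-network outputs (hypothetical values).
references = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
print(severity_score((0.0, 0.0), references))  # 1.0
print(severity_score((3.0, 4.0), references))  # median of 5.0, ~4.24 and ~4.47, i.e. ~4.47
```

The median keeps a single atypical reference image from dominating the score; a mean would work too but is more sensitive to such outliers.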

4.3.2 AI for Deterioration Prediction of COVID-19 Patients

An artificial intelligence system can forecast deterioration in COVID-19 patients who present to the emergency department, where deterioration is defined as death, intubation, or ICU admission [35]. The system's purpose is to give doctors a quantitative estimate of the risk of deterioration, and of how that risk is expected to change over time, so that patients at high risk can be properly triaged and prioritized. The strategy could be especially effective in pandemic hotspots, where triage on arrival is critical for allocating limited resources such as hospital beds. Recent research has shown that chest X-ray imaging can aid in the diagnosis of COVID-19 [36]. Furthermore, chest X-ray images and routinely collected clinical variables contain complementary information that can be used to predict clinical deterioration. This research builds on previous prognostic work, which has focused primarily on risk prediction models based on non-imaging variables extracted from electronic health records.
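One common way to express how deterioration risk changes over time is a discrete-time hazard model. The sketch below is not the method of [35]; it assumes a model has already produced hypothetical per-interval probabilities h_k (the probability of deteriorating during interval k, given no deterioration earlier) and converts them into the cumulative risk 1 − ∏(1 − h_k) that a clinician could read off at, say, 24, 48, and 72 hours.

```python
def cumulative_risk(hazards):
    """Convert per-interval hazards h_k into cumulative risk 1 - prod_{k<=t}(1 - h_k)."""
    survival, curve = 1.0, []
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must be a probability in [0, 1]")
        survival *= 1.0 - h           # probability of no deterioration so far
        curve.append(1.0 - survival)  # probability of deterioration by the end of this interval
    return curve

# Hypothetical hazards for the intervals 0-24 h, 24-48 h and 48-72 h.
for hours, risk in zip((24, 48, 72), cumulative_risk([0.10, 0.10, 0.20])):
    print(f"risk of deterioration by {hours} h: {risk:.3f}")
```

Because each factor (1 − h_k) is at most 1, the resulting curve never decreases, which matches the intuition that the chance of having deteriorated can only grow with time.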

4.3.3 Disinfection of Hospitals and Nursing Duties by Social Robots

Because social robots cannot be infected by the coronavirus, they are a natural choice for disinfecting hospitals and other critical places, which makes people less likely to be infected. For example, Blue Ocean Robotics, an award-winning global robotics company, has used its UVD robots to autonomously kill bacteria and viruses in hospitals with ultraviolet (UV) light [14–16, 37]. These robots instruct people to vacate the room and close the door, and then begin disinfecting the room. Table 1 lists major applications of artificial intelligence (AI) in the COVID-19 pandemic.


4.3.4 Role of Robots in the Early Detection and Diagnosis of COVID-19

The first coronavirus infections in China were identified by Bluedot, an artificial intelligence (AI)-based system for identifying deadly infectious diseases. Bluedot had previously predicted the Zika (2016) and Ebola (2014) outbreaks [38]. AI [39] is also used to identify SARS-CoV-2 infection: Infervision software automatically detects signs of coronavirus infection in CT scan images. This technology saves time and reduces human error in testing. In Wuhan, technology leaders such as Huawei and Alibaba are working with the government to develop customized AI-based algorithms to fight COVID-19 [40]. Alibaba is contributing in the following ways:
• Using CT image analytics solutions for mass testing
• Using AI-based mathematical modeling to identify the cause of different types of pneumonia
• Using an epidemic prediction solution to forecast the outcome of the epidemic
• Using a virus genome sequencing application to diagnose coronavirus disease within 3 h

Baidu, a Chinese company, uses computer vision algorithms that help computers label and understand images. This technology also uses infrared sensors to build temperature profiles of people in public areas within a few seconds [41]. AI is also helping to shorten drug and vaccine development against COVID-19. Because the number of COVID-19 cases is rising sharply, AI is helping doctors decide how to identify infected patients [42]. Prof. Andrew Hopkins has argued that AI can be an effective tool against the spread of coronavirus disease. According to him, AI can contribute in the following ways:
• AI can develop antibodies and vaccines for the coronavirus very quickly
• AI can scan existing drugs to see whether they can be repurposed
• AI can design drugs to fight future outbreaks of epidemic disease
DeepMind, a Google-owned company, is using its AlphaFold system to predict the structures of proteins associated with the virus.

4.3.5 Role of Automation in the Fight Against the COVID-19 Pandemic

Automation is playing a critical role in safeguarding people during the global coronavirus crisis. The Association for Advancing Automation (A3) supports its members in robotics, artificial intelligence (AI), and vision and imaging technology [43]. CloudMinds, which provides cloud-based systems for robots, has donated 12 sets of robots to Wuhan, China. Using a 5G network, these robots can monitor heart rate and fever symptoms, deliver medicines, and spray disinfectants in hospitals. PIA Automation, a multinational company, offers automation solutions to increase the production rate of surgical masks; a team of 24 engineers is working to manufacture 200,000 masks per day in Zhejiang Province, China. To fight the coronavirus, thermal cameras are used along with robots to scan body temperature. Airports are using FLIR thermal cameras to screen passengers and crew members [44]. In the USA, ABB robots, approved by Roche Molecular Solutions, are producing FDA tests to obtain faster results for coronavirus-positive cases. Once produced, these tests are expected to raise the testing rate to around 400,000 cases per day [45].

4.3.6 Role of Robots in Teleoperation

Teleoperation is an advanced technology that can be used for telemedicine and telecommunication. Robots can be used in universities, schools, and industry to support online courses and interaction among individuals through 5G and 6K–8K video technology. In the near future, virtual robots are expected to conduct online conferences and international exhibitions worldwide [46]. These preventive steps should reduce both the infection rate and carbon footprints. COVID-19 could act as a catalyst for the development of robotic systems monitored by experts and essential service providers. Quarantine may isolate infected patients from social interaction, which can harm their mental health. To address this, social robots can be deployed in hospitals to interact with patients in a friendly manner. However, this task is difficult, because social interaction requires dealing with diverse groups of people who have different cultures, beliefs, knowledge, and emotions, as well as with the environmental situation of the interaction.

5 Conclusion
The aim of this work is to perform a thorough review of the literature on the role of AI in COVID-19 outbreak prevention. According to the literature review, the major issue in implementing AI in the healthcare system is the security and privacy of data stored during the treatment of COVID-19 patients. This information is unique, and leaked data could be misused, for example in fraudulent insurance claims. As a result, there is an urgent need for a policy to monitor connected healthcare devices via their internet protocol (IP) addresses. AI-enabled algorithms should be used to prevent any leakage of healthcare data. Based on the findings of this study, we believe that future research should focus on the proper management of stored patient data. Furthermore, cost-effective treatment facilities are urgently needed in the healthcare system. The healthcare system is expected to adopt digital technology, including smart surveillance technology, in the future. Artificial intelligence (AI) is a promising and beneficial tool for the healthcare system in the battle against COVID-19. AI healthcare devices can be interconnected via the internet and can accurately communicate messages to healthcare workers in an emergency. The teleservice network will make it simpler to handle COVID-19 cases in rural areas. AI can be used to intelligently screen, track, and monitor patients. Furthermore, quality control can be carried out in hospitals using real-time data. AI may also help predict future pandemics. Finally, if AI is successfully applied, doctors and researchers will be able to propose better solutions to combat the COVID-19 pandemic.

References
1. Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, Wang Y, Dong Q, Shen H, Wang Y (2017) Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol 2(4):230–243
2. Diro AA, Chilamkurti N (2018) Distributed attack detection scheme using deep learning approach for Internet of Things. Futur Gener Comput Syst 1(82):761–768
3. Ambika P (2020) Machine learning and deep learning algorithms on the Industrial Internet of Things (IIoT). Adv Comput 117(1):321–338
4. Oniani S, Marques G, Barnovi S, Pires IM, Bhoi AK (2021) Artificial intelligence for internet of things and enhanced medical systems. In: Bio-inspired neurocomputing. Springer, Singapore, pp 43–59
5. Zikria YB, Afzal MK, Kim SW, Marin A, Guizani M Deep learning for intelligent IoT: opportunities, challenges and solutions
6. Pham QV, Nguyen DC, Hwang WJ, Pathirana PN Artificial intelligence (AI) and big data for coronavirus (COVID-19) pandemic: a survey on the state-of-the-arts
7. Vaishya R, Javaid M, Khan IH, Haleem A (2020) Artificial Intelligence (AI) applications for COVID-19 pandemic. Diabetes Metab Syndr 14(4):337–339
8. Vafea MT, Atalla E, Georgakas J, Shehadeh F, Mylona EK, Kalligeros M, Mylonakis E (2020) Emerging technologies for use in the study, diagnosis, and treatment of patients with COVID-19. Cell Mol Bioeng 13(4):249–257
9. Agbehadji IE, Awuzie BO, Ngowi AB, Millham RC (2020) Review of big data analytics, artificial intelligence and nature-inspired computing models towards accurate detection of COVID-19 pandemic cases and contact tracing. Int J Environ Res Public Health 17(15):5330
10. Lee SM, Lee D (2021) Opportunities and challenges for contactless healthcare services in the post-COVID-19 Era. Technol Forecast Soc Chang 1(167):120712
11. Udgata SK, Suryadevara NK (2021) COVID-19, sensors, and Internet of Medical Things (IoMT). In: Internet of Things and sensor network for COVID-19. Springer, Singapore, pp 39–53
12. Naudé W Artificial intelligence against COVID-19: an early review
13. Peng X, Xu X, Li Y, Cheng L, Zhou X, Ren B (2020) Transmission routes of 2019-nCoV and controls in dental practice. Int J Oral Sci 12(1):1–6
14. Lalmuanawma S, Hussain J, Chhakchhuak L (2020) Applications of machine learning and artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: a review. Chaos Solitons Fractals 25:110059
15. Vargo D, Zhu L, Benwell B, Yan Z (2021) Digital technology use during COVID-19 pandemic: a rapid review. Human Behav Emerg Technol 3(1):13–24
16. He W, Zhang ZJ, Li W (2021) Information technology solutions, challenges, and suggestions for tackling the COVID-19 pandemic. Int J Inf Manage 1(57):102287
17. Euchi J (2020) Do drones have a realistic place in a pandemic fight for delivering medical supplies in healthcare systems problems. Chin J Aeronaut
18. Fong SJ, Dey N, Chaki J (2020) Artificial intelligence for coronavirus outbreak. Springer, Singapore
19. Jin C, Chen W, Cao Y, Xu Z, Tan Z, Zhang X, Deng L, Zheng C, Zhou J, Shi H, Feng J (2020) Development and evaluation of an artificial intelligence system for COVID-19 diagnosis. Nat Commun 11(1):1–4
20. Saleh S, Shayor F (2020) High-level design and rapid implementation of a clinical and non-clinical Blockchain-based data sharing platform for COVID-19 containment. Frontiers Blockchain 27(3):51
21. Zhang K, Liu X, Shen J, Li Z, Sang Y, Wu X, Zha Y, Liang W, Wang C, Wang K, Ye L (2020) Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell 181(6):1423–1433
22. Rahman MA, Zaman N, Asyhari AT, Al-Turjman F, Bhuiyan MZ, Zolkipli MF (2020) Data-driven dynamic clustering framework for mitigating the adverse economic impact of Covid-19 lockdown practices. Sustain Cities Soc 1(62):102372
23. Siddiqui MF (2021) IoMT potential impact in COVID-19: combating a pandemic with innovation. In: Computational intelligence methods in COVID-19: surveillance, prevention, prediction and diagnosis. Springer, Singapore, pp 349–361
24. Cai S, Zhou S, Xu C, Gao Q (2019) Dense motion estimation of particle images via a convolutional neural network. Exp Fluids 60(4):1–6
25. He JL, Luo L, Luo ZD, Lyu JX, Ng MY, Shen XP, Wen Z (2020) Diagnostic performance between CT and initial real-time RT-PCR for clinically suspected 2019 coronavirus disease (COVID-19) patients outside Wuhan, China. Respir Med 1(168):105980
26. Jawerth N (2020) How is the COVID-19 virus detected using real time RT-PCR. International Atomic Energy Agency. Vienna International Centre, PO Box. June 2020, p 100
27. Borghesi A, Maroldi R (2020) COVID-19 outbreak in Italy: experimental chest X-ray scoring system for quantifying and monitoring disease progression. Radiol Med (Torino) 125(5):509–513
28. Sultana F, Sufian A, Dutta P (2020) Evolution of image segmentation using deep convolutional neural network: a survey. Knowl-Based Syst 9(201):106062
29. Iwendi C, Bashir AK, Peshkar A, Sujatha R, Chatterjee JM, Pasupuleti S, Mishra R, Pillai S, Jo O (2020) COVID-19 patient health prediction using boosted random forest algorithm. Front Public Health 3(8):357
30. Singh S, Dalla VK, Shrivastava A (2021) Combating COVID-19: study of robotic solutions for COVID-19. In: AIP conference proceedings, 13 May 2021, vol 2341(1). AIP Publishing LLC, p 020042
31. Abhishek K, Dalla VK, Shrivastava A (2021) Humanoid robot applications in COVID-19: a comprehensive study. In: AIP conference proceedings, 13 May 2021, vol 2341(1). AIP Publishing LLC, p 020040
32. Yan L, Zhang HT, Xiao Y, Wang M, Guo Y, Sun C, Tang X, Jing L, Li S, Zhang M, Xiao Y (2020) Prediction of criticality in patients with severe Covid-19 infection using three clinical features: a machine learning-based prognostic model with clinical data in Wuhan. MedRxiv, 1 January 2020
33. Li Y, Wei D, Chen J, Cao S, Zhou H, Zhu Y, Wu J, Lan L, Sun W, Qian T, Ma K (2020) Efficient and effective training of COVID-19 classification networks with self-supervised dual-track learning to rank. IEEE J Biomed Health Inform 24(10):2787–2797
34. Estiri H, Strasser ZH, Klann JG, Naseri P, Wagholikar KB, Murphy SN (2021) Predicting COVID-19 mortality with electronic medical records. NPJ Dig Med 4(1):1
35. Fang C, Bai S, Chen Q, Zhou Y, Xia L, Qin L, Gong S, Xie X, Zhou C, Tu D, Zhang C (2021) Deep learning for predicting COVID-19 malignant progression. Med Image Anal 1(72):102096
36. Feng Z, Yu Q, Yao S, Luo L, Zhou W, Mao X, Li J, Duan J, Yan Z, Yang M, Tan H (2020) Early prediction of disease progression in COVID-19 pneumonia patients with chest CT and clinical characteristics. Nat Commun 11(1):1–9
37. Shrivastava A, Dalla VK (2021) Failure control and energy optimization of multi-axes space manipulator through genetic algorithm approach. J Braz Soc Mech Sci Eng 43(10):1–7
38. Allam Z, Dey G, Jones DS (2020) Artificial Intelligence (AI) provided early detection of the coronavirus (COVID-19) in China and will influence future urban health policy internationally. AI 1(2):156–165
39. Nguyen D, Ding M, Pathirana PN, Seneviratne A (2020) Blockchain and AI-based solutions to combat coronavirus (COVID-19)-like epidemics: a survey
40. Arenal A, Armuña C, Feijoo C, Ramos S, Xu Z, Moreno A (2020) Innovation ecosystems theory revisited: the case of artificial intelligence in China. Telecommun Policy 101960
41. Yang Y, Yang D, Xu Y, Wang L, Huang Y, Li X, Liu X (2019) AI and retinal image analysis at Baidu. In: Computational retinal image analysis. Academic Press, pp 405–427
42. Keesara S, Jonas A, Schulman K (2020) Covid-19 and health care’s digital revolution. New England J Med
43. Huang MH, Rust RT (2018) Artificial intelligence in service. J Serv Res 21(2):155–172
44. Gade R, Moeslund TB (2014) Thermal cameras and applications: a survey. Mach Vis Appl 25(1):245–262
45. Doudna JA (2020) Blueprint for a pop-up SARS-CoV-2 testing lab. medRxiv
46. El Kalam AA, Ferreira A, Kratz F (2015) Bilateral teleoperation system using QoS and secure communication networks for telemedicine applications. IEEE Syst J 10(2):709–720

Chapter 18

Anomalous Human Activity Detection Using Stick Figure and Deep Learning Model

P. D. Rathika, G. Subashini, S. Nithish Kumar, and S. Ram Prakash

1 Introduction
Advanced surveillance systems can replace the human surveillance currently used to monitor shopping malls, banks, museums, etc. This paper focuses on a surveillance system that is intelligent enough to detect a person's pose and automatically alert the authorities when an anomaly is detected. The key points are obtained from a pre-trained model using the camera image as input. A deep neural network architecture is created and trained with the collected key points. In general, the system takes an image from the camera as input and classifies the pose as normal or anomalous; this output can then be sent to a microprocessor or microcontroller to turn on an alarm or send a private message.

P. D. Rathika (B) · G. Subashini
Department of Robotics and Automation Engineering, PSG College of Technology, Coimbatore, Tamil Nadu, India

S. Nithish Kumar · S. Ram Prakash
Final Year UG Student, B.E. Robotics and Automation, PSG College of Technology, Coimbatore, Tamil Nadu, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_18


P. D. Rathika et al.

2 Literature Survey

2.1 Pose Model Selection
Conventional human pose estimation models involve two stages: the first finds the bounding box of the human, and the second finds the positions of the joints. UniPose [1] is a human pose estimation model that unifies both tasks into a single stage with high accuracy. The UniPose architecture consists of a ResNet-101 backbone, a Waterfall Atrous Spatial Pooling (WASP) module, and a decoder. ResNet-101 extracts low-level features (256 features) from the image. The WASP module combines Atrous Spatial Pyramid Pooling (ASPP), Res2Net, and cascade modules. The ASPP module consists of four atrous convolutions in parallel branches, each with a different rate and therefore a different field of view, providing a wide range of feature maps (256 features). The features from both ResNet-101 and WASP are passed to the decoder, which produces heatmaps for both joints and boundaries. For pose estimation in videos, an LSTM layer is added; this variant is known as UniPose-LSTM, and the rest of its architecture is the same as UniPose. UniPose-LSTM uses the heatmaps from the previous and current frames to avoid errors due to occlusion and to improve accuracy.

Convolution-based pose estimation models are also becoming popular. Convolutional Pose Machines (CPM) [2] use deep convolutional layers for pose estimation. Each stage outputs belief maps, and the convolutional layers learn image-dependent spatial models of the relationships between parts. The initial stage has a small receptive field that examines the image locally, and subsequent stages have increasingly large receptive fields to learn implicit relationships between different body parts. The receptive field can be enlarged by stacking more convolutional layers. However, increasing the number of convolutional layers can cause vanishing gradients; CPM overcomes this problem by enforcing supervision at intermediate stages. The work also shows that intermediate supervision improves accuracy compared to supervision at the final stage only. CPM achieves better accuracy than conventional graphical models.

EfficientPose [4] is a highly efficient single-person pose estimation model. When an input image of 368 × 368 pixels is given to OpenPose [3], it uses an ImageNet-pretrained network to extract basic features. In EfficientPose IV, however, the OpenPose architecture is converted into ConvNets. This increases detection precision and provides efficient single-person pose estimation from two-dimensional images. The EfficientPose architecture differs from the OpenPose architecture in the following ways: (1) it can work on both high- and low-resolution input images, (2) it has scalable EfficientNet backbones, (3) it uses cross-resolution features, (4) it has scalable mobile DenseNet detection blocks in fewer detection passes, and (5) it uses bilinear upscaling. A high-resolution input image is downsampled to half its pixel height and width through an initial average pooling layer.
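The receptive-field argument used for CPM, and the multi-rate branches of ASPP/WASP, can be checked with a short calculation. This sketch assumes square kernels and uses the standard recurrence for stacked convolutions; the dilation rates 6, 12, 18, and 24 are an assumption (typical ASPP settings), not values stated above.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit after a stack of
    convolutions, each given as (kernel_size, stride, dilation)."""
    rf, jump = 1, 1  # jump = spacing, in input pixels, between adjacent units of the current layer
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf

# Stacking plain 3x3 convolutions grows the field linearly (the CPM observation) ...
print([receptive_field([(3, 1, 1)] * n) for n in (1, 2, 3, 4)])  # [3, 5, 7, 9]
# ... while one dilated 3x3 branch covers 2r + 1 pixels (ASPP-style rates, assumed).
print([receptive_field([(3, 1, r)]) for r in (6, 12, 18, 24)])  # [13, 25, 37, 49]
```

This is why ASPP can obtain a wide field of view from a single layer per branch, whereas CPM needs many stacked layers (and hence intermediate supervision) to reach a similar context size.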

18 Anomalous Human Activity Detection Using Stick Figure …


EfficientPose consists of a feature extractor built from the initial blocks of EfficientNets pre-trained on ImageNet. The initial three blocks of a high-scale EfficientNet are used to extract high-level semantic information from high-resolution images, and low-level local information is extracted from the low-resolution image by the first two blocks of a lower-scale EfficientNet backbone. The cross-resolution feature is obtained by combining the high-level and low-level features, which lets the EfficientPose architecture focus on important local factors in the image. Because it computes with small, efficient ConvNets, EfficientPose is advantageous when memory and processing power are limited. For single-person pose estimation, OpenPose also takes more epochs to converge than EfficientPose: according to the experimental results in the EfficientPose paper, EfficientPose needed only 200 epochs to converge, whereas OpenPose needed more (up to 400 epochs).

MediaPipe is a framework, available as a Python package, that provides many pre-trained models. BlazePose [8] is one such model in MediaPipe for human pose estimation, designed specifically for applications such as exercise and yoga. Unlike other models, which detect the region of interest (ROI) in every frame, BlazePose detects the ROI only in the first frame and then tracks subsequent frames using features from the previous frame until the tracking confidence falls below a threshold. This significantly reduces the computational power required. Because the face is easier to detect than other body parts, BlazeFace is used to detect the face first, from which the torso and ROI can be found easily. Conventional human pose estimation models compute only heatmaps from the image, whereas BlazePose computes both heatmaps and offset maps. BlazePose outputs 33 key points along with two virtual key points: one at the center of the human body and another on the circumference of a circle circumscribing the whole body. The two virtual key points help maintain consistent tracking even during complicated human poses. The drawback of this model is that the human face must always be visible for tracking, since the face is used as a proxy for the person detector.

After analyzing different existing human pose estimation models, we found BlazePose to be well suited to our application (anomaly detection), since it outputs more key points and requires much less computational power than the other models. The MediaPipe BlazePose model outputs 33 key points; Figure 1 shows the key points corresponding to the body parts. Paper [6] defines a pictorial structure model in which each part of the human body (l_i) is represented as a rectangular box with location coordinates (x, y), scaling factor (s), and orientation (θ). The input image is first fed to the part-based model. Next, the face is detected in the image, followed by search space reduction. An appearance model is then created and passed to image parsing to produce the final pictorial pose. In paper [5], a model is developed that uses the amplitude, angle, and temporal continuity of the joints to detect anomalies. These parameters are modeled using kernel density estimation (KDE), which allows an unlimited amount of anomaly data to be sampled from the real anomaly data. In this paper, simple yet efficient techniques are used for estimating the human pose.
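Sampling from a Gaussian kernel density estimate, as described for [5], amounts to picking an observed sample uniformly at random and perturbing it with kernel noise. A minimal sketch, assuming a diagonal Gaussian kernel with hand-picked bandwidths; the feature triples (amplitude, angle, temporal continuity) and their values are hypothetical, not data from [5].

```python
import random

def sample_from_kde(observations, bandwidths, n, seed=None):
    """Draw n synthetic samples from a Gaussian KDE with a diagonal bandwidth:
    choose an observed vector at random, then add independent Gaussian noise with
    standard deviation bandwidths[d] to each dimension d."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        base = rng.choice(observations)
        samples.append(tuple(x + rng.gauss(0.0, h) for x, h in zip(base, bandwidths)))
    return samples

# Hypothetical "real" anomaly features: (amplitude, angle in degrees, temporal continuity).
observations = [(0.50, 30.0, 0.10), (0.70, 45.0, 0.20)]
synthetic = sample_from_kde(observations, bandwidths=(0.02, 2.0, 0.02), n=5, seed=42)
print(len(synthetic))  # 5
```

The bandwidths control how far synthetic samples may stray from the real ones; in practice they would be estimated from the data (for example with Scott's or Silverman's rule) rather than hand-picked.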


Fig. 1 Pose landmarks of the BlazePose model (Source: https://google.github.io/mediapipe/solutions/pose.html)

3 Implementation
Figure 2 shows the block diagram of the overall working of the project. The input image, of size 480 × 640 pixels, is captured from the laptop's integrated webcam. The image is given to the BlazePose model to obtain the key points of the human body present in the image. The key points are then fed into the trained ANN model, which outputs the pose (one of four classes: normal, climb, crawl, or squat). The output pose is sent to a microcontroller programmed to turn on LEDs and a buzzer when the pose is anomalous. Figure 3 shows the sequence flow diagram of the project. In Step 1, the input is obtained from the laptop's integrated webcam and passed to the BlazePose model, which checks the detection confidence. If the detection confidence is above the threshold, the BlazePose model returns landmarks (key points). The key points are sent to the deep learning model to identify the pose. The identified pose is displayed on the monitor, and a signal is sent to the Arduino microcontroller. When anomalous activity is detected, the buzzer connected to the controller turns on.
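The key-point stage of this pipeline can be sketched as follows. It assumes the MediaPipe legacy Solutions API (`mp.solutions.pose.Pose`) and OpenCV for capture; the flattening order [x0, y0, x1, y1, ...] and the `classify` callable are assumptions, since the authors' exact column layout and model wrapper are not given. Flattening 33 landmarks into their x and y coordinates yields the 66 input values consumed by the ANN.

```python
NUM_LANDMARKS = 33  # BlazePose landmark count stated in the text

def landmarks_to_features(landmarks):
    """Flatten 33 landmarks with .x/.y attributes into [x0, y0, x1, y1, ...] (66 values)."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    features = []
    for lm in landmarks:
        features.extend((lm.x, lm.y))  # normalized image coordinates in [0, 1]
    return features

def run_webcam(classify, min_detection_confidence=0.5):
    """Capture frames, run BlazePose, and pass each 66-value vector to `classify`
    (a hypothetical callable wrapping the trained ANN). Stop with Ctrl+C."""
    import cv2
    import mediapipe as mp

    cap = cv2.VideoCapture(0)  # integrated webcam
    with mp.solutions.pose.Pose(min_detection_confidence=min_detection_confidence,
                                min_tracking_confidence=0.5) as pose:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks is None:  # no person detected above the threshold
                continue
            print(classify(landmarks_to_features(results.pose_landmarks.landmark)))
    cap.release()
```

OpenCV delivers frames in BGR order while MediaPipe expects RGB, hence the conversion before `pose.process`; the heavy imports sit inside `run_webcam` so the feature-flattening helper can be used and tested on its own.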

3.1 Data Collection
First, a CSV file is opened. The user then sets the duration, in seconds, for which data must be collected. The program waits for the user to press the ‘z’ key on the keyboard; once the key is pressed, the user takes the pose. Input images are captured from the camera while the user is posing.


Fig. 2 Block diagram—Overall Working

Each image is sent to the BlazePose model, and the resulting key points are recorded in the CSV file. These actions continue until the set time is reached. The label ‘n’ is used for normal pose data points, and ‘c’, ‘q’, and ‘r’ are used for climb, squat, and crawl, respectively.
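The collection loop of Sect. 3.1 can be sketched as below. The label letters come from the text; the column names (x0 to y32, following BlazePose landmark indices), the `get_features` callable, and the injectable clock are hypothetical conveniences, and the ‘z’ key wait is done here with `input()` rather than the authors' unstated mechanism.

```python
import time

LABELS = {"normal": "n", "climb": "c", "squat": "q", "crawl": "r"}  # from the text

def header_row(num_landmarks=33):
    """Column names: the label followed by x and y for each BlazePose landmark index."""
    columns = ["label"]
    for i in range(num_landmarks):
        columns += [f"x{i}", f"y{i}"]
    return columns

def wait_for_start(key="z"):
    """Block until the user types the start key (console-based stand-in for a key listener)."""
    while input(f"Press '{key}' then Enter to start recording: ").strip().lower() != key:
        pass

def collect(write_row, label, duration_s, get_features, clock=time.monotonic):
    """Write one row per frame with detected key points until duration_s seconds elapse.

    write_row    -- e.g. csv.writer(f).writerow
    get_features -- hypothetical callable returning 66 key-point values, or None
                    when BlazePose finds no person in the frame
    """
    start, rows = clock(), 0
    while clock() - start < duration_s:
        features = get_features()
        if features is not None:
            write_row([label, *features])
            rows += 1
    return rows

# Usage sketch (not executed here):
# import csv
# with open("poses.csv", "a", newline="") as f:
#     writer = csv.writer(f)
#     wait_for_start()
#     collect(writer.writerow, LABELS["squat"], 30, get_features)
```

Passing the clock as a parameter keeps the time limit testable without waiting in real time; in normal use the default monotonic clock is immune to system clock changes.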

Fig. 3 Sequence Flow Diagram

3.2 Model Training
The contents of the CSV file are read, and the dataset is split into training and test sets. The output labels (i.e., the poses) are then one-hot encoded. Next, a deep learning architecture is created using the Python frameworks TensorFlow and Keras, and the model is trained. The model is evaluated on the test set and refined until it achieves the best result in the classification report. The trained model is then saved in the desired format (.h5).
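The split and encoding steps can be sketched in plain Python. This is illustrative only: in practice scikit-learn's `train_test_split` and Keras's `to_categorical` do the same job. The class order follows the label mapping given in Sect. 3.4 (‘c’, ‘r’, ‘n’, ‘q’), the 30% test fraction follows Sect. 3.5, and the seed is arbitrary.

```python
import random

CLASSES = ["c", "r", "n", "q"]  # index order from Sect. 3.4: c-0, r-1, n-2, q-3

def train_test_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle sample indices once and hold out test_fraction of them for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(ids, seq):
        return [seq[i] for i in ids]

    return pick(train_idx, rows), pick(test_idx, rows), pick(train_idx, labels), pick(test_idx, labels)

def one_hot(labels, classes=CLASSES):
    """Map each label to a 0/1 vector with a single 1 at the label's class index."""
    index = {c: i for i, c in enumerate(classes)}
    return [[1 if index[y] == j else 0 for j in range(len(classes))] for y in labels]

print(one_hot(["n", "c", "q"]))  # [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
```

Shuffling indices rather than rows keeps features and labels aligned, and a fixed seed makes the split reproducible between runs.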

3.3 Prediction
The trained model is then deployed to predict the pose from the key points. It takes the key points as input and outputs a class index from 0 to 3, each corresponding to a particular pose. The index is then decoded back to the pose label and sent to the microcontroller or microprocessor for further action.
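Decoding and alarm signalling can be sketched as below. The class order follows the mapping in Sect. 3.4; the one-byte serial protocol (`b"1"` for alarm on, `b"0"` for off) and the serial port name are hypothetical, since the Arduino protocol is not described.

```python
CLASS_NAMES = ["climb", "crawl", "normal", "squat"]  # indices 0-3, from Sect. 3.4

def decode(probabilities):
    """Return the pose name of the highest-probability class (argmax of the softmax output)."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLASS_NAMES[best]

def alarm_byte(pose):
    """Hypothetical one-byte command for the microcontroller: b"1" = LEDs/buzzer on, b"0" = off."""
    return b"0" if pose == "normal" else b"1"

pose = decode([0.05, 0.10, 0.80, 0.05])
print(pose, alarm_byte(pose))  # normal b'0'

# With pyserial, the byte could be sent as (port name is hypothetical):
# import serial
# serial.Serial("/dev/ttyUSB0", 9600).write(alarm_byte(pose))
```

Any pose other than "normal" (climb, crawl, or squat) is treated as anomalous, matching the text's rule that the buzzer turns on for anomalous activity.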

3.4 Data Visualization and Preprocessing Once the data has been collected, it is visualized to check whether preprocessing is required, using the graphs described below. Since the labels are letters, they are first converted into numbers (‘c’–0, ‘r’–1, ‘n’–2, ‘q’–3). Figure 4 shows the correlation plot between the pose (label) and the (x, y) coordinates of the key points. The plot shows that only a few key points are highly correlated with the pose, so these data points must be given high priority during preprocessing. Duplicate data points in these coordinates (y15, y14, y23, y21, y17, y22, y19, y20, y16, y18) must be removed so that the model does not overfit during training. The duplicate data points corresponding to these 10 coordinates are removed using Microsoft Excel, since selecting and deleting the required data points is easier in Excel than in Python code. Figures 5 and 6 show boxplots of the coordinates x25 and y25 (left_knee) for the different poses. The x25 coordinate has no outliers, whereas the y25 coordinate has many. Boxplots of the remaining key points show a similar pattern. Therefore, data points with extreme outliers in the y coordinates are removed to ensure proper training. Figure 7 shows the scatterplot of the x11 versus y11 key points, in which the key points for the different poses are clearly separated. The plots for the remaining key points of each pose are similarly well separated, so the model should not confuse the poses during training.

18 Anomalous Human Activity Detection Using Stick Figure …

Fig. 4 Correlation Plot between Key points and Pose

Fig. 5 Coordinates x25 (left_knee)

Fig. 6 Coordinates y25 (left_knee)

Fig. 7 Scatterplot of x11 versus y11 key points
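The authors removed duplicates in Excel. For reproducibility, the same two cleaning steps can be sketched in NumPy. This illustration rests on two assumptions: "duplicate" is taken to mean rows that repeat on the selected coordinate columns (only the first such row is kept), and "extreme outlier" is taken to mean a value beyond Tukey's outer fence, 3 × IQR from the quartiles. The function names are hypothetical.

```python
import numpy as np


def drop_duplicate_rows(data, columns):
    """Keep only the first row of each group of rows identical on `columns`."""
    _, first_idx = np.unique(data[:, columns], axis=0, return_index=True)
    return data[np.sort(first_idx)]


def drop_extreme_outliers(data, columns, k=3.0):
    """Drop rows lying outside Q1 - k*IQR .. Q3 + k*IQR on any of `columns`.

    k = 3.0 corresponds to Tukey's outer fences ("extreme" outliers on a boxplot);
    the 1.5 * IQR whiskers drawn by most boxplot tools would be k = 1.5.
    """
    subset = data[:, columns]
    q1, q3 = np.percentile(subset, [25, 75], axis=0)
    iqr = q3 - q1
    inside = (subset >= q1 - k * iqr) & (subset <= q3 + k * iqr)
    return data[inside.all(axis=1)]


# Usage: column 1 plays the role of a y coordinate with one extreme value.
y_col = np.array([0.50, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 5.0])
data = np.column_stack([np.arange(9.0), y_col])
cleaned = drop_extreme_outliers(data, columns=[1])
```

In practice `columns` would hold the indices of the ten y coordinates listed above for deduplication, and of all y coordinates for outlier removal.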

3.5 Results With data visualization complete and the unnecessary data points eliminated, the dataset is ready to be split for training and testing. About 70% of the data is allotted for training and 30% for testing. The labels are then converted to categorical form by one-hot encoding, and the data is fitted and transformed. The dataset is then ready for training. After many iterations, a three-layer deep learning model was found to perform well. The first dense layer has 66 nodes, followed by hidden layers with 33 and 16 neurons and an output layer with 4 neurons. The rectified linear unit (ReLU) activation function is used for the first layer and the two hidden layers, the softmax activation function is used for the output layer, and the Adam optimizer is chosen. Training was run for 160 epochs. The training accuracy is 94.63%, and the test accuracy is 94.51%. Figure 8 shows the accuracy of the model for each epoch. The training and test accuracies are almost identical, since the training and test data points do not differ much. Figure 9 shows the training and test data loss.

Fig. 8 Model Accuracy
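The layer stack described above can be checked with a small NumPy forward pass, using random, untrained weights, that also counts the trainable parameters. In Keras the same stack would be `Dense(66, activation="relu")`, `Dense(33, activation="relu")`, `Dense(16, activation="relu")`, and `Dense(4, activation="softmax")`, compiled with the Adam optimizer. The loss function is an assumption, because the text does not state it; categorical cross-entropy is the usual choice for one-hot targets.

```python
import numpy as np

# Feature count followed by the layer widths from Sect. 3.5: 66 -> 66 -> 33 -> 16 -> 4.
# Assumption: the 66 input features are 33 landmarks x (x, y).
LAYER_SIZES = [66, 66, 33, 16, 4]


def count_dense_parameters(sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def forward(x, params):
    """ReLU on every layer except the last, softmax on the output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    x = x - x.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
params = [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]
probs = forward(rng.random((5, 66)), params)  # 5 dummy samples
```

With these widths, the network has 7,245 trainable parameters, which is small enough for the 160-epoch training described above to be inexpensive.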

3.6 Output The model has been tested with real-time inputs. Figure 10 shows the output of the model for different poses. The output of the model is sent to an Arduino microcontroller, which turns on three LEDs sequentially at 1 s intervals once anomalous activity is detected. If anomalous activity is still detected after 3 s, a fourth LED glows (shown in Fig. 10) to confirm the anomalous activity. A buzzer could be used instead of the fourth LED.
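The LED escalation logic can be sketched as a host-side simulation in Python, because the Arduino firmware itself is not given in the text. The sketch makes three assumptions: the model produces one classification per second, the first LED lights on the first anomalous reading, and any normal reading resets the sequence. The names `led_states` and `alarm_confirmed` are hypothetical.

```python
def led_states(anomaly_flags):
    """Return how many LEDs are lit after each 1 s classification.

    Assumed behaviour (a simulation, not the authors' Arduino sketch):
    LEDs 1-3 light one per second while the anomaly persists, LED 4 (or a
    buzzer) lights if the anomaly is still present after 3 s, and a normal
    reading resets the sequence.
    """
    lit, states = 0, []
    for anomalous in anomaly_flags:
        lit = min(lit + 1, 4) if anomalous else 0
        states.append(lit)
    return states


def alarm_confirmed(anomaly_flags):
    """True once the fourth (confirmation) LED has been reached."""
    return 4 in led_states(anomaly_flags)


print(led_states([True, True, True, True, True, False]))
```

Requiring the anomaly to persist for several consecutive readings before confirming it filters out single misclassified frames.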


Fig. 9 Training and Test Data Loss

4 Conclusion A prebuilt human pose estimation model was chosen based on the requirements, and a deep learning model was trained to classify poses from the outputs of this pose estimation model. The system has been successfully deployed using an Arduino microcontroller.


Fig. 10 Output of the ANN Model and Implementation using Arduino

References
1. Artacho B, Savakis A (2020) UniPose: unified human pose estimation in single images and videos
2. Wei S-E, Ramakrishna V, Kanade T, Sheikh Y (2016) Convolutional pose machines. The Robotics Institute, Carnegie Mellon University
3. Cao Z, Hidalgo G, Simon T, Wei S-E, Sheikh Y (2018) OpenPose: realtime multi-person 2D pose estimation using part affinity fields
4. Groos D, Ramampiaro H, Ihlen EAF (2020) EfficientPose: scalable single-person pose estimation
5. Zhang H (2019) A study on human pose data anomaly detection
6. Penmetsa S, Minhuj F, Singh A, Omkar SN (2014) Autonomous UAV for suspicious action detection using pictorial human pose estimation and classification
7. Ichihara K, Takeuchi M, Katto J (2020) Accuracy evaluations of video anomaly detection using human pose estimation. In: IEEE international conference on consumer electronics (ICCE)
8. https://google.github.io/mediapipe/solutions/pose.html

Chapter 19

Noise Removal in ECG Signals Utilizing Fully Convolutional DAE

Arun Sai Narla, Shalini Kapuganti, and Hathiram Nenavath

1 Introduction According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are the leading cause of death worldwide. Electrocardiography (ECG) is a diagnostic procedure that continuously monitors the functioning of the heart: electrodes are attached to the patient's body, and an electrocardiograph records the heart's electrical activity, from which the heart rate is calculated. However, ECG data can be corrupted by various forms of noise, including baseline wander, electrode motion (EM), and muscle artifact, so removing noise from ECG recordings is increasingly important. Several strategies have been reported for eliminating noise from ECG signals, including empirical mode decomposition, adaptive filtering, and wavelet techniques.

2 Literature Survey Dwivedi et al. [1] presented noise reduction in ECG signals using a combination of ensemble empirical mode decomposition (EEMD) and the stationary wavelet transform (SWT). Powerline interference introduces artifacts that hinder the interpretation of the input ECG signal. To remove this noise, EEMD is combined with SWT: the ECG signal is first decomposed into a number of intrinsic mode functions (IMFs), and SWT is then applied for further noise reduction.

A. S. Narla (B) · S. Kapuganti · H. Nenavath Electronics and Communication Engineering, Vardhaman College of Engineering, Shamshabad, Hyderabad, India e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_19


Rahman et al. [2] proposed efficient, simple adaptive noise cancelers for ECG sensor-based remote health monitoring. Medical expert systems are critical for the flexible and intelligent health-monitoring devices used in daily life. Arrhythmic beats are mainly used to detect electrocardiogram (ECG) abnormalities in order to diagnose cardiac problems. To separate normal from abnormal beats, this work uses ECG pre-processing and support vector machine (SVM)-based arrhythmic beat classification. A delayed error normalized LMS adaptive filter is used in ECG pre-processing to achieve higher speed and lower latency with fewer computing components. The signal processing approach, designed for remote health care systems, focuses mainly on white noise removal. For HRV feature extraction, a discrete wavelet transform is applied to the pre-processed data, and machine learning methods are used for arrhythmic beat classification. SVM classifiers and other mainstream classifiers were used to classify beats from features derived from the noise-removed signals. According to the results, the SVM classifier performs better than the other AI-based classifiers.

Moradi et al. [3] proposed ECG signal enhancement using an adaptive Kalman filter and signal averaging. The electrocardiogram is widely used to diagnose heart problems, and doctors rely on high-quality ECG data to interpret and identify physiological and pathological phenomena. A number of algorithms have been presented for extracting ECG components from background noise, allowing small features of the ECG signal to be assessed. Adaptive filtering is one of the standard approaches and has been used to cancel electromyogram (EMG) noise, baseline wander, and motion artifacts in ECGs.
To extract a noise-free signal from the noisy signal, many approaches, such as independent component analysis and neural networks, have been used. Wavelet transform (WT)-based approaches for denoising multi-resolution signals such as the ECG have received much attention in recent years. Simple frequency-selective filtering can help suppress noise and artifacts; however, because the noise and signal spectra overlap, frequency-selective filtering can be of limited utility. In this work, the Kalman filter is derived in a Bayesian framework, and the dynamic variations in the ECG are modeled by a covariance matrix that is recalculated each time new data arrives.

Reddy et al. [4] proposed enhanced thresholding based on wavelet transforms for ECG denoising. An electrocardiographic (ECG) signal is required to detect and examine cardiac disorders. However, ECG signals can be corrupted by a variety of disturbances, which may reduce the signal's usefulness. The work presents an ECG denoising procedure based on wavelet energy and a sub-band smoothing filter. Unlike the standard wavelet denoising method, which applies threshold processing to the wavelet coefficients, the coefficients that require threshold denoising are selected by wavelet energy, while the remaining coefficients are left unaltered. In addition, a smoothing filter is used to remove noise from the ECG signal and further improve its quality.

Liu et al. [5] developed a novel wavelet transform-based thresholding approach for removing noise from electrocardiograms. The electrocardiogram (ECG) signal is the most straightforward way of determining a person's health status. However, the ECG is corrupted by a wide range of noise and interference during acquisition, and the results of standard wavelet denoising techniques are not very satisfactory. The paper introduced a novel thresholding technique based on the discrete wavelet transform (DWT). Compared with traditional hard and soft thresholding, it can effectively suppress a wide range of high- and low-frequency noise while, to a certain extent, preserving the characteristics and amplitude of the ECG signal. Finally, the MIT-BIH arrhythmia database was used to verify the technique; the experimental results showed that the algorithm performs better than traditional wavelet thresholding algorithms, preserves the geometric characteristics of the ECG signal much better, and achieves a higher signal-to-noise ratio (SNR).

El Charri et al. [6] evaluated ECG denoising performance based on threshold tuning of the dual-tree wavelet transform. Because the electrocardiogram signal has low frequency and amplitude, it is highly susceptible to mixed noise, which can reduce accuracy and hinder the doctor's ability to make the best decision for the patient. The dual-tree wavelet transform is one of the most recent improvements of the discrete wavelet transform, and threshold tuning of this approach for ECG noise removal had not yet been done. The work examines the impact of the choice of thresholding algorithm, the threshold value, and the wavelet decomposition level on ECG denoising performance.

Weng et al. [7] proposed ECG denoising based on empirical mode decomposition. The electrocardiogram (ECG) has long been used to diagnose cardiac conditions, and doctors use high-quality ECGs to analyze and identify evidence of physiological and pathological phenomena.
ECG recordings, however, are usually contaminated by artifacts. One visible artifact is high-frequency noise caused by electromyographic activity and mechanical forces on the electrodes. Noise severely limits the usefulness of the ECG and should therefore be removed for reliable clinical evaluation. Several methods have been developed for ECG denoising; this work presents an ECG denoising approach based on the recently developed empirical mode decomposition (EMD) algorithm. The proposed EMD-based technique removes high-frequency noise with minimal signal distortion. Experiments on the MIT-BIH database validated the technique, and both quantitative and qualitative results show that the approach produces good denoising.

Castillo et al. [8] proposed a discrete wavelet transform (DWT)-based method for removing baseline wander and noise from electrocardiographic (ECG) data. The article proposes a single-step solution that improves the entire denoising process by providing threshold limits and thresholding rules for optimal wavelet denoising. The system is tested with synthetic ECG signals to verify that the processing is correct, and the technique is further validated on real abdominal ECG signals recorded from pregnant women.

Kabir and Shahnaz [9] proposed ECG denoising using empirical mode decomposition and the discrete wavelet transform. They present a new ECG noise elimination technique based on noise-reduction algorithms in the EMD and DWT domains. Unlike existing EMD-based ECG noise removal methods, which discard a number of initial intrinsic mode functions (IMFs) containing the QRS complex as well as noise, they introduce windowing in the EMD domain. The QRS complex is preserved while the noise in the initial IMFs is reduced, rather than deleting these IMFs entirely, resulting in a noise-free ECG signal. The signal is then transformed to the DWT domain, where a noise reduction technique based on adaptive soft thresholding is applied. This step relies on the DWT's superiority over EMD in preserving signal energy.

Marouf and Saranovac [10] proposed noise reduction in ECG data using noise level approximation. The study demonstrated translation-invariant noise level approximation for removing electromyogram (EMG) noise from electrocardiogram (ECG) data in order to achieve adequate adaptiveness. Because the noise reduction framework is built on low-pass filters, adaptive noise reduction is achieved by choosing the appropriate filter. Both real EMG and artificial noise are used to evaluate the guiding signal, and the objective is to achieve the best trade-off between signal readability and filtering distortion.
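Several of the surveyed methods ([4], [5], [6], [8], [9]) build on wavelet thresholding. The classic hard and soft rules they compare against can be sketched with a hand-written one-level Haar transform, so the example needs only NumPy. This illustrates the baseline technique and is not a reimplementation of any cited method. The Haar wavelet, the single decomposition level, and the Donoho-Johnstone universal threshold with a median-absolute-deviation noise estimate are assumptions chosen for brevity.

```python
import numpy as np


def hard_threshold(d, lam):
    """Keep coefficients whose magnitude exceeds lam; zero the rest."""
    return np.where(np.abs(d) > lam, d, 0.0)


def soft_threshold(d, lam):
    """Zero small coefficients and shrink the rest toward zero by lam."""
    return np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)


def universal_threshold(detail, n):
    """Donoho-Johnstone threshold sigma * sqrt(2 ln n), sigma from the MAD of the details."""
    sigma = np.median(np.abs(detail)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(n))


def haar_denoise(x, mode="soft"):
    """One-level Haar DWT, threshold the detail band, inverse DWT (even-length input)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    lam = universal_threshold(detail, len(x))
    shrink = soft_threshold if mode == "soft" else hard_threshold
    detail = shrink(detail, lam)
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out


# Usage: a slow sine wave plus white noise (synthetic, not ECG data).
rng = np.random.default_rng(0)
t = np.arange(1024)
clean = np.sin(2 * np.pi * t / 256)
noisy = clean + rng.normal(0.0, 0.2, t.size)
denoised = haar_denoise(noisy)
```

Hard thresholding keeps surviving coefficients unchanged but can leave small discontinuities. Soft thresholding gives smoother output at the cost of slightly shrinking large peaks, such as QRS complexes, which is the trade-off that the improved rules in [5] and [6] aim to address.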

3 Proposed Methodology With improved denoising and compression performance, the proposed method is expected to be useful in clinical practice. We present a new FCN-based denoising method for ECG data. The method aims to be fast by exploiting the FCN's ability to process ECG signals, and, because of the DAE architecture, the FCN model can also compress ECG signals. DAEs have been shown to learn good low-dimensional representations and can be used to reconstruct noisy data. In this paper, we therefore present an FCN-based DAE for removing noise from noisy ECG data. An autoencoder (AE) is a model that tries to reconstruct the original data as accurately as possible. It has two main parts: an encoder, which uses a nonlinear transformation to map an input x to a hidden representation z, and a decoder, which reconstructs the input from z. An FCN is a special type of CNN. Convolutional layers, max-pooling layers, an activation function, and a fully connected layer are common components of a CNN. Convolutional layers consist of a group of filters that extract feature maps representing the properties of the input. Different feature maps in a layer use different filter parameters, whereas all positions within one feature map share the same filter parameters. Compared with a fully connected layer, in which each neuron is connected to all outputs of the preceding layer, convolutional layers reduce the number of parameters by sharing them across the filters. Max-pooling is a common pooling operation used to provide translation and rotation invariance; it down-samples by taking the maximum value within each region, producing a smaller output. The fully connected layer is expected to perform regression or classification. The difference between an FCN and a CNN is that the fully connected layer is removed in an FCN but retained in a CNN. Eliminating the fully connected layers reduces the number of parameters, which makes hardware implementation simpler. In addition, an FCN allows each output sample to retain the local spatial information of neighboring input regions, whereas fully connected layers do not preserve this information from previous layers. Our FCN model does not include a pooling layer. In this paper, we present a signal denoising procedure based on a DAE together with a method for compressing ECG waveforms. The proposed FCN-based DAE is shown in Fig. 2 and has a 13-layer encoder and decoder. The encoder reduces the size of the ECG signal and encodes it into lower-dimensional features, and the decoder reconstructs the output from these low-dimensional features. Exponential linear unit (ELU) activation functions are used in the hidden layers, and the output layer of the FCN model has no activation function. The encoder takes input signals of size 1024 × 1, and its first convolutional layer has 40 filters of size 16 × 1 with a stride of 2. The next three convolutional layers each have 20 filters of size 16 × 1 with a stride of 2, the following layer has 40 filters of size 16 × 1 with a stride of 2, and the final layer has a single 16 × 1 filter with a stride of 1. Down-sampling is performed by the stride-2 layers, each of which halves the signal length, so the encoding procedure produces a 32 × 1 feature map.
This feature map represents the compressed data and is 32 times smaller than the original signal. The encoder and decoder are inversely symmetric. The deconvolutional layers up-sample the feature maps and recreate the structural information. In contrast to convolutional layers, which connect multiple input activations to a single activation, deconvolutional layers project a single input activation onto multiple outputs (Fig. 1). First, the noisy signal is decomposed into different values by computing the weights of the signal. The method then consists of the two processes mentioned above: convolution and deconvolution. In the convolution process, each convolution operation is followed by two more layers, a dropout layer and a max-pooling layer, which are used to obtain the compressed feature map (about 32 times smaller than the input to the network). In the deconvolution process, the reverse operation is performed: the signal is reconstructed to its original size. Deconvolutional layers and ReLU layers are applied to the compressed signal, and the resulting values are recombined to obtain the reconstructed, noise-free signal.

Fig. 1 Schematic diagram of DAE

Table 1 Values of the input signals
S. No.   Max value   Min value   RMS value
1        0.3696      −0.3152     0.302
2        0.4962      −0.4188     0.0397
3        0.9379      −0.8550     0.0706
4        0.4962      −0.4188     0.0397
5        0.5246      −0.4537     0.0464

Table 2 SNRimp values
Input SNR (dB)   DNN     CNN     FCN
−1               12      14.32   16.7
3                8.3     12.6    14.7
7                6.54    9.37    11.2

Table 3 RMSE values
Input SNR (dB)   DNN     CNN     FCN
−1               0.094   0.076   0.062
3                0.080   0.060   0.052
7                0.077   0.054   0.047

Table 4 PRD values
Input SNR (dB)   DNN   CNN   FCN
−1               32    23    18
3                28    20    16
7                26    17    14
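The encoder dimensions given in Sect. 3 can be checked by tracing the output length and parameter count layer by layer. This sketch assumes 'same' padding, so that each stride-2 layer exactly halves the length, and one bias per filter. The layer list mirrors the text; the function names are hypothetical.

```python
import math

# Encoder layers as described in the text: (filters, kernel_size, stride).
ENCODER = [(40, 16, 2), (20, 16, 2), (20, 16, 2), (20, 16, 2), (40, 16, 2), (1, 16, 1)]


def conv1d_same_length(length, stride):
    """Output length of a 'same'-padded 1-D convolution."""
    return math.ceil(length / stride)


def trace_encoder(input_length=1024, input_channels=1, layers=ENCODER):
    """Return (output_length, channels, parameter_count) after each layer."""
    length, channels, trace = input_length, input_channels, []
    for filters, kernel, stride in layers:
        length = conv1d_same_length(length, stride)
        params = kernel * channels * filters + filters  # shared kernel weights + biases
        channels = filters
        trace.append((length, channels, params))
    return trace


for length, channels, params in trace_encoder():
    print(f"{length:5d} x {channels:<3d} {params:6d} parameters")
```

The trace ends at a 32 × 1 feature map, consistent with the 32-fold compression stated above; the encoder needs about 40,000 weights in total.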

4 Experimental Results The proposed method was compared with previous methods based on DNN and CNN. A less noisy output corresponds to a larger signal-to-noise ratio improvement (SNRimp) and to a root mean square error (RMSE) and percentage root mean square difference (PRD) that are as small as possible. For a clear comparison, each test signal was processed with the DNN, CNN, and FCN models, and the required metrics were measured (Tables 1, 2, 3 and 4; Figs. 4, 5, 6, 7 and 8). These values are calculated using the formulas given in Fig. 3.

Fig. 2 Proposed FCN-based DAE architecture

Fig. 3 Formulas
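Fig. 3 (the formulas) is not reproduced in the text, so the sketch below uses the definitions most commonly applied in ECG denoising. These definitions are assumptions and may differ in detail from the authors' Fig. 3; some works, for example, use a mean-removed PRD. With clean signal x and estimate x̂: SNR = 10 log10(Σx² / Σ(x − x̂)²), SNRimp = SNR of the denoised signal minus SNR of the noisy input, RMSE = sqrt(mean((x − x̂)²)), and PRD = 100 · sqrt(Σ(x − x̂)² / Σx²).

```python
import numpy as np


def snr_db(clean, signal):
    """SNR of `signal` relative to the clean reference, in dB."""
    clean, signal = np.asarray(clean, float), np.asarray(signal, float)
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum((signal - clean) ** 2)))


def snr_improvement(clean, noisy, denoised):
    """SNRimp: output SNR minus input SNR, in dB."""
    return snr_db(clean, denoised) - snr_db(clean, noisy)


def rmse(clean, denoised):
    clean, denoised = np.asarray(clean, float), np.asarray(denoised, float)
    return float(np.sqrt(np.mean((clean - denoised) ** 2)))


def prd(clean, denoised):
    """Percentage root-mean-square difference (without mean removal)."""
    clean, denoised = np.asarray(clean, float), np.asarray(denoised, float)
    return float(100.0 * np.sqrt(np.sum((clean - denoised) ** 2) / np.sum(clean ** 2)))


# Usage on a toy 4-sample signal (not data from the tables).
x = [1.0, -1.0, 1.0, -1.0]
noisy = [1.2, -1.2, 1.2, -1.2]
denoised = [0.9, -0.9, 0.9, -0.9]
print(snr_improvement(x, noisy, denoised), rmse(x, denoised), prd(x, denoised))
```

Under these definitions a good denoiser raises SNRimp while lowering RMSE and PRD, which is the pattern reported for the FCN in Tables 2, 3 and 4.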


Fig. 4 Results for an input signal 1


Fig. 5 Results for an input signal 2


Fig. 6 Results for an input signal 3


Fig. 7 Results for an input signal 4


Fig. 8 Results for an input signal 5


5 Conclusions In this paper, a fully convolutional DAE has been used to efficiently remove noise from ECG signals. The method's performance was compared with conventional DNN and CNN models, and the FCN was shown to offer significant accuracy benefits over these approaches. The SNR improvement of the FCN is greater than that of the DNN and CNN, and its RMSE and PRD are lower as well. These comparisons show that the FCN-based DAE outperforms the other two models in noise reduction. The approach is also well suited to diagnostic applications.

References
1. Dwivedi AK, Ranjan H, Menon A (2020) Noise reduction in ECG signals using combined ensemble empirical mode decomposition method with stationary wavelet transform. Circ Syst Sig Proc 40
2. Rahman MZ, Shaik RA, Reddy DVR (2017) Impulsive noise cancellation from cardiac signal using modified WLMS algorithm. Res Int Comput Intell 10
3. Moradi M, Rad MA, Khezerlo RB (2014) ECG signal enhancement using adaptive Kalman filter and signal averaging. SN Appl Sci
4. Reddy GU, Muralidhaar M, Varadaraajan S (2009) ECG denoising using improved thresholding based on wavelet transforms. Bioinf Biomed 9
5. Liu S, Li, Hu X, Liu L, Hao DJ (2019) An ECG signal-denoising approach based on wavelet energy and sub band smoothening filter. Appl Sci
6. El Charri O, Rachid L, Elmansuri K, Abenau A, Jenkaal W (2017) ECG signal performance denoising assessment based on threshold tuning of dual-tree wavelet transform
7. Weng B, Blanco-Velasco M, Barner KE (2006) ECG denoising based on empirical mode decomposition
8. Castillo E, Morales DP, Parrilla L (2013) Noise suppression in ECG signals through efficient one-step wavelet processing techniques
9. Kabir MA, Shahnaz C (2012) Comparison of ECG signal denoising algorithms in EMD and wavelet domains
10. Marouf M, Saranovac L (2017) Adaptive EMG noise reduction in ECG signals using noise level approximation. Rob Miss Vis
11. Chiang H, Hsieh Y, Hung K (2019) Noise removal in ECG using FCN based DAE

Chapter 20

Performance Investigation of RoF Link in 16 Channel WDM System Using DPSK Modulation Technique

Balram Tamrakar, Krishna Singh, and Parvin Kumar

1 Introduction Today's wireless networks demand higher data rates, higher bandwidth, better connectivity, and low distortion. These requirements can be met by radio over fiber (RoF) systems, which support applications such as e-health, IoT-based services, broadband access, and auto-adjustable mobiles/vehicles. Millimeter waves (mm-waves), with frequencies from 30 to 300 GHz, are the most suitable spectrum for RoF communication systems. An RoF system consists of three parts: (1) the central station (CS), (2) the optical distribution network (ODN), and (3) the base station (BS). The downlink runs from the central station to the base station. A simplified RoF link model is presented in Fig. 1. The central station consists of an optical transmitter with a laser source and a receiver section with a photodetector. It is connected to the trunk networks, which provide switching and Internet access capabilities to the RoF link. In the downlink, the CS up-converts the RF signals to the optical domain and uses the ODN to communicate with the BSs. Remote nodes (RNs) amplify the weak signals and forward them to the corresponding BS. In the uplink, the BS receives the high-frequency signal from the mobile user end [1].

B. Tamrakar (B) · P. Kumar Department of Electronics and Communication Engineering, KIET Group of Institution, Delhi-NCR, Ghaziabad, India e-mail: [email protected] B. Tamrakar · K. Singh University School of Information Communication and Technology, Guru Gobind Singh Indraprastha University, Dwarka, New Delhi, India Department of Electronics and Communication Engineering, G. B. Pant Government Engineering College, New Delhi, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_20

Fig. 1 Simplified RoF link model

WDM is a key element in long-haul optical networking for achieving satisfactory performance and optimizing the RoF link in a high-capacity medium [2]. With WDM, multiple channels, each on a unique wavelength, can transmit information simultaneously in an optical medium. If Z is the maximum number of channels, each operating at a bit rate of P Gbps, the equivalent capacity of the transmission medium increases to (Z × P) Gbps. WDM channel-based DPSK modulation offers an effective way to enhance the channel capacity of the transmission medium, and enabling WDM is the simplest way to reduce the channel spacing. Nonlinear effects and dispersion are degradation factors that reduce the feasibility and channel capacity of the WDM-based DPSK technique [2, 3]. Current demands for higher bandwidth and capacity require WDM to carry multiple channels in the transmission medium, which is achieved by reducing the channel spacing. However, reducing the channel spacing increases the cross-talk between channels. When WDM channels are used, a nonlinear effect called four-wave mixing (FWM) arises. FWM is a third-order nonlinear phenomenon that occurs when two or more wavelengths propagate simultaneously in an optical medium. Energy is transferred from the input channel wavelengths to new signals at new wavelengths, which degrades signal quality. DPSK is one solution for suppressing FWM [3]. DPSK and QPSK schemes with direct detection are useful for optimizing the RoF link.
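The capacity relation (Z × P) and the growth of FWM products with the number of channels can be illustrated numerically, using the 16-channel, 40 Gbps configuration of Sect. 3 as an example. The sketch assumes an equally spaced channel grid. It counts the mixing products f_i + f_j − f_k (with k different from i and j), whose total for N channels is N²(N − 1)/2, and checks how many of them fall exactly on a channel frequency. The function names are hypothetical.

```python
from itertools import combinations_with_replacement


def wdm_capacity_gbps(channels, bit_rate_gbps):
    """Aggregate capacity Z x P of a WDM link (Z channels at P Gbps each)."""
    return channels * bit_rate_gbps


def fwm_triples(num_channels):
    """Index triples (i, j, k) whose product f_i + f_j - f_k arises from FWM.

    (i, j) is an unordered pair (i == j allowed, the degenerate case) and
    k must differ from both i and j.
    """
    return [
        (i, j, k)
        for i, j in combinations_with_replacement(range(num_channels), 2)
        for k in range(num_channels)
        if k != i and k != j
    ]


def in_band_products(num_channels):
    """Products that land exactly on a channel of an equally spaced grid."""
    return sum(1 for i, j, k in fwm_triples(num_channels) if 0 <= i + j - k < num_channels)


N = 16  # channels in the Sect. 3 configuration
capacity = wdm_capacity_gbps(N, 40)  # 40 Gbps per channel, as in Sect. 3
products = len(fwm_triples(N))
print(capacity, products, in_band_products(N))
```

With equal spacing, many of the products coincide with channel frequencies and cannot be filtered out. This is why the cross-talk described above grows as channels are packed more densely.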
Achieving a BER ≤ 10⁻⁹ at a bit rate of 1 Gbps is an attractive approach for optical fiber networks with fiber impairments over distances of up to 25 km [4, 5]. Analyzing RoF system performance in terms of third-order intermodulation (IM3) degradation is an important issue for enhancing the RoF link [6]. Next-generation radio over fiber link design is an important consideration for 5G network implementation [7]. Singh et al. describe a phase-modulated RoF link that applies a polarization method to enhance linearity in a noisy environment; the phase-modulated structure linearizes the RoF link [8]. Thomas et al. describe RoF link optimization and link budget techniques for RoF systems in the context of green radio communication systems [9].


Iqbal et al. investigate RoF systems with improved digital modulation techniques [10]. These references provide the basis for implementing the WDM-DPSK-based RoF model. The simulation model is described in Sect. 2, Sect. 3 presents the simulation results and discussion, and Sect. 4 gives the conclusion, followed by the references.

2 Simulation Model The proposed simulation model for the 16-channel WDM-based DPSK modulation technique is shown in Fig. 2. The model consists of a WDM DPSK transmitter, a WDM DPSK receiver, an optical splitter, an optical booster, and an optical fiber grating with multiplexing tools. The WDM DPSK transmitter is based on CW lasers and an optical phase modulator, while in the WDM DPSK receiver each channel includes a raised-cosine filter. The optical splitter splits the optical wave. The upper portion of the model does not include the optical fiber grating, whereas the lower portion does. The RoF link is implemented in optical simulation software to confirm and validate the results. The DPSK transmitter encodes the data in the phase difference between consecutive bits, and the DPSK demodulator compares the phase of each received symbol with that of the previous symbol to recover the actual phase differences. As a result, the original information is recovered with low distortion, and the received spectrum exhibits good linearity.

Figure 3 shows the RoF link model using the ASK modulation technique. In this technique, a digital data string followed by NRZ encoding is used. The bit rate is 1 Gbps, and the number of samples per bit is 625. A laser source provides the optical carrier, which is modulated by a Mach-Zehnder modulator (MZM). The laser frequency is about 193.41 THz, corresponding to a wavelength of 1550 nm. The MZM has 3.0 dB excess loss, and the offset voltage corresponding to the phase is set to about 2.0 V. Under test conditions, the receiver sensitivity is about −25 dBm, the overall receiver responsivity is 1 A/W, and the quantum efficiency is taken as 0.75. The technique is evaluated for fiber lengths of 5 km and 10 km. At the receiving end, a low-pass filter extracts the original information, and a BER analyzer and an electrical spectrum analyzer are used to analyze the received signal.

Fig. 2 Proposed model for 16 channel WDM-based DPSK modulation technique

Fig. 3 Proposed model of ROF link using ASK modulation technique
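The DPSK principle described in Sect. 2 can be sketched at baseband with complex symbols: data is encoded in the phase difference between consecutive bits and recovered by comparing each received symbol with the previous one. This is a simplified illustration. The convention that a 1 bit shifts the phase by π is an assumption. An optical DPSK receiver would typically perform the comparison with a delay-line interferometer rather than in software.

```python
import numpy as np


def dpsk_encode(bits):
    """Differential encoding: a 1 flips the carrier phase by pi, a 0 keeps it."""
    phases = [0.0]  # reference symbol
    for b in bits:
        phases.append(phases[-1] + (np.pi if b else 0.0))
    return np.exp(1j * np.array(phases))


def dpsk_decode(symbols):
    """Compare each symbol's phase with the previous one (delay-and-multiply)."""
    products = symbols[1:] * np.conj(symbols[:-1])
    return [1 if p.real < 0 else 0 for p in products]


bits = [1, 0, 1, 1, 0, 0, 1]
received = dpsk_encode(bits) * np.exp(1j * 0.7)  # unknown constant carrier phase offset
print(dpsk_decode(received))
```

Because only phase differences between neighboring symbols matter, the constant phase offset applied to `received` does not change the decoded bits. This insensitivity to the absolute carrier phase is what allows direct-detection DPSK receivers to work without a phase-locked local oscillator.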


3 Simulation Results and Discussion This article investigates the performance of an RoF link in a 16-channel WDM system using the DPSK modulation technique. The model uses a bit rate of 40 Gbps with 0.2 THz channel spacing. The upper portion of the model combines the WDM DPSK transmitter and receiver sections with a long-haul optical medium. At the receiver section, the electrical spectrum analyzer is enabled only on channel-1, channel-8, and channel-16 to measure performance in terms of SNR and BER. Modulation is performed at 40 GHz, and the carrier frequency is centered at 193.85 THz. The PRBS value is 7. At the WDM DPSK receiver section, the roll-off factor is 0.3 and the filter bandwidth is 50 GHz. Figure 4a–j shows the electrical spectra, eye diagrams, and histograms of the received signals for channel-1, channel-8, and channel-16 without the optical fiber grating, and Fig. 4k–p shows the electrical spectra and eye diagrams of the same channels with the optical fiber grating; the individual panels are listed in the caption of Fig. 4. The parameters of the WDM DPSK simulation model and their values are given in Tables 1 and 2. The WDM-based DPSK modulation system was compared with the RoF link based on ASK modulation for fiber impairments at 5 km and 10 km, and the received electrical signal of the WDM-based DPSK system shows a much better eye opening than that of the ASK-based RoF link. This indicates that the proposed technique achieves a better SNR and better linearity than the existing technique. Figure 3 shows the proposed model of the RoF link using ASK modulation, Fig. 5a shows the electrical spectrum of the receiver at Scope 3 using ASK modulation, and Figs. 5b and 5c show the eye diagrams at Scope 4 of the received electrical signal for fiber lengths of 5 km and 10 km, respectively. Figure 4p shows degraded performance with the optical fiber grating for channel-16, as its eye is not clearly visible. Table 2 lists the parameters of the eye diagram analyzer, describing the Q-factor, bit error rate, average eye opening, jitter, and sampling time of the obtained


B. Tamrakar et al.

Fig. 4 a Electrical spectrum of receiver channel-1 with non-fiber grating, b eye diagram of receiver channel-1 with non-fiber grating, c histograms on electrical signals for channel-1 with non-fiber grating, d electrical spectrum of receiver channel-8 with non-fiber grating, e eye diagram of receiver channel-8 with non-fiber grating, f histograms on electrical signals for channel-8 with non-fiber grating, g histograms on eye diagram for channel-8 with non-fiber grating, h electrical spectrum of receiver channel-16 with non-fiber grating, i eye diagram of receiver channel-16 with non-fiber grating, j histograms on eye diagram for channel-16 with non-fiber grating, k electrical spectrum of receiver channel-1 with optical fiber grating, l eye diagram of receiver channel-1 with optical fiber grating, m electrical spectrum of receiver channel-8 with optical fiber grating, n eye diagram of receiver channel-8 with optical fiber grating, o electrical spectrum of receiver channel-16 with optical fiber grating, p eye diagram of receiver channel-16 with optical fiber grating

20 Performance Investigation of RoF Link in 16 Channel …


Table 1 Simulation work parameters
Center emission frequency: 193.85 THz
Bit rate: 40 Gbps
Transmitter bandwidth: 40 GHz
Roll-off factor: 0.3
Channel spacing: 0.2 THz
Filter bandwidth: 50 GHz
Channel power: −10 dBm
PRBS degree: 7

Table 2 Parameters of eye diagram analyzer
Error rate: 6039672e-038
Q-factor: 22.6424 dB
Average opening of eye: 0.000189255 (a.u.)
Jitter: 0.181826 ns
Sampling time: 0.00408163 ns
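Table 2 reports both a Q-factor and an error rate. Under the common Gaussian-noise approximation for binary detection, the two are linked by BER = 0.5·erfc(Q/√2), with the Q-factor in dB taken as 20·log10(Q). The minimal Python sketch below illustrates that relation under those assumptions; it is not the estimator used by the simulation tool, which may compute BER differently, so the tabulated values need not match this formula exactly.

```python
import math

def q_db_to_linear(q_db: float) -> float:
    """Convert a Q-factor in dB to linear units, assuming Q_dB = 20*log10(Q)."""
    return 10 ** (q_db / 20.0)

def ber_from_q(q_linear: float) -> float:
    """Gaussian-noise approximation for binary detection: BER = 0.5*erfc(Q/sqrt(2))."""
    return 0.5 * math.erfc(q_linear / math.sqrt(2.0))

if __name__ == "__main__":
    for q in (6.0, 7.0):
        q_db = 20 * math.log10(q)
        print(f"Q = {q:.1f} ({q_db:.2f} dB) -> BER ~ {ber_from_q(q):.3e}")
```

With Q = 6 (about 15.6 dB) the formula gives a BER just under 1e-9, a common reference point for optical receivers.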


Fig. 5 a Electrical spectrum of receiver at Scope 3 using ASK modulation technique, b eye diagram at Scope 4 of the received electrical signal at 5 km fiber length, c eye diagram at Scope 4 of the received electrical signal at 10 km fiber length

electrical spectrum. The performance of the WDM-based RoF link with DPSK modulation has been investigated based on Fig. 3, including the electrical signals, eye diagrams and their related histograms.
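The results above rely on DPSK, in which each bit is carried by the phase change between consecutive symbols rather than by the absolute carrier phase. As a baseband illustration only (no optics, noise or fiber effects are modelled, and the function names are hypothetical), the Python sketch below differentially encodes bits as 0 or π phase steps and recovers them by comparing each received symbol with the previous one, so a constant carrier phase offset does not affect detection.

```python
import cmath
import math

def dpsk_modulate(bits, initial_phase=0.0):
    """Map bits to unit-amplitude symbols: a 1 advances the phase by pi, a 0 keeps it.

    A reference symbol is prepended so that the first bit also has a predecessor.
    """
    phase = initial_phase
    symbols = [cmath.exp(1j * phase)]  # reference symbol
    for b in bits:
        if b:
            phase += math.pi
        symbols.append(cmath.exp(1j * phase))
    return symbols

def dpsk_demodulate(symbols):
    """Differential detection: a negative Re(s_k * conj(s_{k-1})) means bit 1."""
    bits = []
    for prev, cur in zip(symbols, symbols[1:]):
        bits.append(1 if (cur * prev.conjugate()).real < 0 else 0)
    return bits

if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1]
    tx = dpsk_modulate(data)
    # Apply an unknown but constant carrier phase rotation at the receiver.
    rotation = cmath.exp(1j * 0.7)
    rx = [s * rotation for s in tx]
    print(dpsk_demodulate(rx) == data)
```

Because detection uses only the product of neighbouring symbols, the rotation cancels out; this is the property that lets a DPSK receiver work without an absolute phase reference.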

4 Conclusion
In view of green radio communication, a 16-channel WDM-based DPSK modulation technique is analyzed and compared with the RoF link-based ASK modulation technique for different fiber impairments at 5 km and 10 km. We found that the received electrical signal of the WDM-based DPSK modulation system has an enhanced eye opening compared with the RoF link-based ASK modulation technique. This indicates that the proposed technique has a good SNR and better linearity than other existing techniques. The simulation results show that channel-1, channel-8 and channel-16 of the WDM-based DPSK system have enhanced SNR without


using optical fiber grating systems, compared with other existing techniques. The electrical spectra and eye diagrams of the used channels show an enhanced eye opening, and the corresponding histograms indicate the desired BER performance. The proposed model is optimized with respect to its SNR and BER parameters, with the fiber length varied up to 20 km. The outcome of this research could be applied to the optimization of next-generation networks based on RoF links.
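The conclusion varies the fiber length from 5 km up to 20 km. A first-order way to see why longer links lower the received SNR is a simple power budget, in which received power falls linearly in dB with length. The Python sketch below uses the −10 dBm channel power from Table 1 and an assumed attenuation of 0.2 dB/km, a typical value for standard single-mode fiber near 1550 nm that is not stated in this chapter; it ignores dispersion, nonlinearity and receiver noise, and the `extra_loss_db` term is a hypothetical placeholder for connector or grating losses.

```python
def received_power_dbm(launch_dbm: float, length_km: float,
                       alpha_db_per_km: float = 0.2,
                       extra_loss_db: float = 0.0) -> float:
    """Received optical power after fiber attenuation (dB-domain power budget).

    alpha_db_per_km = 0.2 is an assumed typical value, not taken from the chapter.
    extra_loss_db is a hypothetical lump for connector, splice or grating losses.
    """
    return launch_dbm - alpha_db_per_km * length_km - extra_loss_db

def dbm_to_mw(p_dbm: float) -> float:
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (p_dbm / 10.0)

if __name__ == "__main__":
    for km in (5, 10, 20):
        p = received_power_dbm(-10.0, km)
        print(f"{km:>2} km: {p:.1f} dBm ({dbm_to_mw(p) * 1000:.1f} uW)")
```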

References
1. Beas J, Castanon G, Aldaya I, Aragon-Zavala A, Campuzano G (2013) Millimeter-wave frequency radio over fiber systems: a survey. IEEE Commun Surv Tutorials 15(4), fourth quarter
2. Namita K, Garg AK (2019) Analysis of radio over fiber system for mitigating four-wave mixing effect. In: Digital communications and networks
3. Haque E (2010) Effect of DPSK modulation on four wave mixing in a WDM system. In: International Conference on Electrical and Computer Engineering ICECE
4. Ahmed N, Rashid MA (2018) Performance of hybrid OCDMA/WDM scheme under DPSK and QPSK modulation using spectral direct detection technique for optical communication networks. J Opt Commun
5. Llorente R, Beltran M (2010) Radio-over-fiber techniques and performance. In: Frontiers in guided wave optics and optoelectronics, InTech
6. Wake D, Nkansah A, Gomes NJ (2010) Radio over fiber link design for next generation wireless systems. J Lightwave Technol 28(16):2456–2464
7. Singh S, Arya SK, Singla S (2018) RoF system based on phase modulator-employing polarization for linearization. J Opt Commun 47:460–466
8. Kumar P, Sharma SK, Singla S, Sharma A (2020) Dynamic range measurement of radio over fiber link by employing 120° phase shift method. In: Telecommunications and radio engineering (Scopus indexed), United States, vol 79
9. Thomas VA, El-Hajjar M, Hanzo L (2005) Performance improvement and cost reduction techniques for radio over fiber communications. IEEE Commun Surv Tutorials 17:627–670
10. Iqbal, Ji S, Kim K (2000) Performance of millimeter wave transmission systems with digital subcarrier modulations for radio over fiber links. In: Proceedings of Microwave Photonics (MWP), Oxford, UK, pp 43–47

Chapter 21

A Comprehensive Study on Automatic Emotion Detection System Using EEG Signals and Deep Learning Algorithms
T. Abimala, T. V. Narmadha, and Lilly Raamesh

1 Introduction
In our daily routines, emotion plays a significant part in human–human interactions. Emotional intelligence, like logical intelligence, is regarded as an important component of human intelligence [1]. In human–computer interactions (HCIs), efforts have been made to develop emotional artificial intelligence [2]. Affective computing is defined as the strategy of introducing affective factors into HCIs; it attempts to develop artificial intelligence strategies that recognize and regulate emotions. Emotion recognition is an important phase in the affective cycle. The majority of mental disorders, such as depression, autism, hypertension, and game addiction, are linked to emotions [3, 4]. An HCI system essentially incorporates five stages: signal acquisition, preprocessing, feature extraction, feature classification, and interface device control. Owing to limited knowledge of neural mechanisms, an effective strategy to detect emotions and provide convenient feedback with suitable evaluation for diagnosing diseases is still lacking. Emotions are complex psycho-physiological processes that are associated with numerous external and internal activities. Different modalities depict different aspects of emotions and contain complementary information. Combining this information with fusion strategies is appealing for building powerful emotion recognition models [5, 6]. Numerous studies have been done on integrating audiovisual modalities for multimodal emotion recognition [7]. Fusion of brain signals such

T. Abimala (B) Assistant Professor, ICE Department, St. Joseph’s College of Engineering, Chennai 600119, India e-mail: [email protected] T. V. Narmadha Professor, EEE Department, St. Joseph’s College of Engineering, Chennai 600119, India L. Raamesh Professor, IT Department, St. Joseph’s College of Engineering, Chennai 600119, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_21


T. Abimala et al.

as EEG and eye movements has been suggested as a promising strategy. Emotion-specific brain markers are currently being used to research the nature of emotions. This makes sense because every decision or action a person makes is based on cerebral activity. Consequently, artificial intelligence (AI) capable of capturing the neural activity patterns associated with human emotions could respond dynamically to specific emotions as well as help people make decisions in their day-to-day routines. For a long time, there has been great interest in inferring information about ongoing brain activity and its real significance [8]. Apart from behavioral measures such as eye tracking, researchers have used physical parameters that relate directly to physiological processes inside the brain to achieve this task. A large part of neuroscientific research relies on the electroencephalogram (EEG). Many researchers choose EEG because of its high temporal resolution, ease of use and low cost. This paper provides new insight into the distinct algorithms used for detecting emotions and classifying features. It gives a thorough examination of present emotion recognition methods based on deep learning and machine learning technologies. The main goal of this study is to identify new research areas in the field of emotion detection.
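The five HCI stages named in the introduction (signal acquisition, preprocessing, feature extraction, feature classification and interface device control) can be pictured as a chain of functions. The Python sketch below is purely illustrative: every stage is a hypothetical placeholder (a synthetic signal, DC-offset removal, a log-power feature and a fixed threshold) and none of it is a method from the studies reviewed here.

```python
import math
import random

def acquire(n_samples=256, fs=128.0, amplitude=20.0, seed=0):
    """Stage 1 (hypothetical): synthetic single-channel signal, 10 Hz sine plus noise."""
    rng = random.Random(seed)
    return [amplitude * math.sin(2 * math.pi * 10.0 * t / fs) + rng.gauss(0.0, 5.0)
            for t in range(n_samples)]

def preprocess(x):
    """Stage 2: remove the DC offset (a stand-in for real filtering and artifact removal)."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def extract_features(x):
    """Stage 3: log of the mean signal power as a single feature."""
    power = sum(v * v for v in x) / len(x)
    return [math.log(power)]

def classify(features, threshold=5.0):
    """Stage 4: a fixed threshold in place of a trained classifier (assumption)."""
    return "high" if features[0] > threshold else "low"

def control_interface(label):
    """Stage 5: map the predicted class to an interface command."""
    return {"high": "ALERT", "low": "IDLE"}[label]

if __name__ == "__main__":
    signal = acquire()
    command = control_interface(classify(extract_features(preprocess(signal))))
    print(command)
```

In a real system each placeholder would be replaced, for example by band-pass filtering in stage 2 and by a trained classifier such as those surveyed below in stage 4.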

2 Literature Review of Various Emotion Detection Techniques
2.1 Deep Learning-Based Emotion Detection Systems
In recent years, deep learning techniques for identifying emotions have grown in popularity. Zhang et al. proposed a new detection technique based on a hierarchical fusion convolutional neural network (HFCNN) [9]. The main idea of the proposed work is to generate a feature map from multiscale features by fusing, at the feature level, weighted global features with manually derived statistical features. The input EEG signals were taken from the DEAP and MAHNOB-HCI databases. The EEG and PPS signals are then combined to generate a multimodal feature map. Finally, a random forest classifier with tenfold cross-validation is employed to classify the data. According to the experimental results, the proposed HFCNN-based feature extraction strategy achieved accuracies of 84.71% and 89.00%, which is better than the CNN method. Because the suggested method is a subject-independent classification system, it can be applied in day-to-day applications. Further study could identify an appropriate deep learning technique for extracting multiple multimodal features in order to build a generic cross-subject emotion classification system. In addition, neurological systems with a brain–computer interface are being developed. Rahul Sharma et al. put forward a novel EEG-based emotion recognition system based on deep learning techniques [10]. The proposed work utilized EEG data in

21 A Comprehensive Study on Automatic Emotion …


the benchmark database DEAP. Particle swarm optimization (PSO) was used in the feature extraction stage. The input signals were decomposed by the discrete wavelet transform (DWT) into sub-bands known as the rhythms of EEG signals. The nonlinear dynamics of each decomposed signal in a higher-dimensional space were captured by third-order cumulants (ToC). Because of the various symmetries in the ToC, the information in the higher-dimensional space contains repeated and redundant material, which was reduced using the PSO method. To classify emotions based on EEG signals, an LSTM deep learning classifier is used. The major objective of this work is to develop an automated technique for feature extraction from nonstationary EEG data in order to reduce feature dependency. According to the experimental results, the new strategy outperformed previous techniques, with a reported accuracy of 82.01%. Yu Liu et al. proposed a multi-level features guided capsule network (MLF-CapsNet) for emotion identification [11]. MLF-CapsNet is an end-to-end structure that can automatically extract appropriate features and identify emotional states from raw EEG signals. The raw EEG data were obtained from the DEAP and DREAMER datasets. The proposed MLF-CapsNet consists of ConvReLU, multi-level features guided primary capsules (MLF-PrimaryCaps) and emotion capsules (EmotionCaps). The MLF-PrimaryCaps layer combines multi-level features from multiple layers into the primary capsules, allowing for improved feature extraction. Furthermore, the network employs a bottleneck layer to reduce the number of parameters and speed up computation. On the DEAP data, the proposed technique achieved average accuracies of 97.97%, 98.31%, and 98.32% for valence (VA), arousal (AR), and dominance (DO), respectively, and on the DREAMER dataset it achieved mean accuracies of 94.59%, 95.26%, and 95.13% for VA, AR, and DO.
The experimental findings show that the classification accuracy of the proposed method is higher than that of Cont-CNN, CNN-RNN, and DGCNN. Peixiang Zhong et al. presented a regularized graph neural network (RGNN) for EEG-based emotion identification [12]. The input EEG data are derived from the SEED and SEED-IV databases. To find relationships between distinct EEG channels, RGNN considers the biological structure of various brain areas. An adjacency matrix in the graph neural network is used to model the connections between the EEG channels, and it is designed according to neuroscience theories of human brain organization. Additional regularizers, namely node-wise domain adversarial training (NodeDAT) and emotion-aware distribution learning (EmotionDL), were used in the suggested technique to handle inter-subject EEG variations and noise-corrupted signal patterns. The experimental outcomes show that the suggested model outperforms DGCNN, another GNN-based model that employs the typical EEG channel topology. The adjacency matrix and the two regularizers also helped the classification model perform better. The proposed study can be extended with a more complex classifier, such as one that employs cutting-edge techniques to handle domain-invariant EEG signals, and the same method can be tried on a reduced number of EEG channels. Furthermore, the work described above can be used to test data preparation methods, such as spatial filtering, that act on the spatial resolution of EEG signals.
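RGNN, described above, models the relationships between EEG channels with an adjacency matrix. To show roughly how such a matrix is used, the Python sketch below applies a generic symmetric-normalized graph convolution, H' = ReLU(D^(-1/2)(A + I)D^(-1/2) X W), to a small hypothetical electrode graph. This is an assumption-laden illustration of the general idea, not RGNN's exact layer, its learned adjacency, or its NodeDAT and EmotionDL regularizers.

```python
def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric adjacency matrix given as lists."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain-list matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def graph_conv(adj, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W).

    adj: channels x channels, features: channels x in_dim, weights: in_dim x out_dim.
    """
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]

if __name__ == "__main__":
    # Hypothetical 3-electrode chain with one band-power feature per electrode.
    adj = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
    x = [[1.0], [2.0], [3.0]]
    w = [[1.0]]
    print(graph_conv(adj, x, w))
```

Each output row mixes an electrode's own feature with those of its neighbours, weighted by node degree; in a trained network `weights` would be learned rather than fixed.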


Hector A. Gonzalez et al. proposed a hardware implementation of an EEG-based emotion classification system [13]. A CNN is employed to create the hardware architecture. The input EEG data come from datasets such as DEAP, DREAMER and FEXD, as well as from five individuals who were given standard visual stimuli. BioCNN consists of a 14-channel hardware device for emotion recognition from EEG signals and a compressed FPGA model based on a CNN inference engine. The results suggest that BioCNN can be implemented on a resource-limited edge node, such as the Xilinx Altys FPGA, with higher accuracy than software-based emotion detectors. Using the DEAP dataset, the proposed technique attained recognition accuracies of 77.57% for the valence and 71.25% for the arousal dimension. Using a deep autoencoder, Hongli Zhang proposed an emotion recognition system based on facial expressions and EEG signals [14]. Facial expressions and EEG readings from 13 healthy people are used in this study. The experiment is carried out with visual stimuli from a video library of 90 video clips. The features of the EEG signal are retrieved using the decision tree approach. The facial features are retrieved using sparse representation, and test samples are represented through analysis of the solution vector coefficients. The facial and EEG features are then merged using a bimodal deep autoencoder (BDAE), which yields the final 14-dimensional feature map. The LIBSVM classifier is employed to classify the data. The study demonstrates the implementation of BDAE, a deep network model. Experimental outcomes reveal that combining EEG signals with facial expression features improved classification accuracy, reaching 85.71%. This work can be extended by conducting the above-mentioned trials on a larger number of people, resulting in a public video database from which a new emotion classification system can be developed. Using EEG signals and blood volume pulse signals, Bahareh Nakisa et al.
put forward a novel deep learning-based emotion classification system [15]. Data were collected from 20 participants with wearable physiological sensors: the Empatica E4 recorded BVP signals and the Emotiv Insight recorded EEG signals. The participants watched video clips from the MAHNOB dataset to elicit a range of emotions. The EEG and BVP signals are combined by a CNN-LSTM model. The classification model uses a subject-independent methodology with leave-one-subject-out (LOSO) cross-validation to classify four dimensional emotion classes: low arousal positive (LA-P), high arousal positive (HA-P), low arousal negative (LA-N), and high arousal negative (HA-N). The quantitative results demonstrate that the temporal multimodal deep learning models based on early and late fusion achieved classification accuracies of 71.61% and 70.17%, respectively. Based on the experimental outcomes, early fusion-based multimodal learning models are more accurate than late fusion methods for both temporal and non-temporal approaches. This work could be extended to a larger group of people. Heng Cui et al. proposed an end-to-end regional-asymmetric CNN (RACNN) model that includes temporal, regional, and asymmetric features [16]. One-dimensional convolution layers are used to extract temporal features, and a 2D convolution layer is used to extract regional information. The asymmetric differential layer (ADL) is


employed as an asymmetric feature extractor in this study to capture the discriminative properties of the brain's left and right hemispheres. The DEAP and DREAMER datasets provide the input EEG data. The RACNN model performs feature extraction, capturing temporal, regional, and asymmetric features, and it also includes a classifier. The suggested work is compared with existing methods such as shallow models, multi-layer perceptron, 2D-CNN, CNN with Pearson correlation coefficient, BDAE, bimodal-LSTM, correlated attention network, and deep canonical correlation analysis in order to demonstrate its efficacy [16]. The suggested technique achieved classification accuracies of 96.65% and 97.11% for VA and AR on the DEAP database and 95.55% and 97.01% on the DREAMER database. These findings suggest that deep learning-based systems are capable of automatic feature extraction. The fundamental weakness of the above work is that it is subject-specific; the approach could be extended to subject-independent emotion classification. Aya Hassouneh et al. proposed an emotion identification system based on machine learning and deep learning using EEG and facial signals [17]. The work primarily categorizes the emotional states of physically impaired adults (deaf, mute, and bedridden) and children with autism spectrum disorder. Features are extracted by a CNN, while LSTM classifiers are used to classify them. The input dataset consists of 55 students whose facial expressions were recorded and processed with the optical flow algorithm and 19 students whose EEG signals were recorded with the EPOC EEG reader [18]. The classification accuracy is around 99.81% for facial features and 87.25% for EEG readings. The proposed research serves as the foundation for a customized e-learning system, as it enables a person's physical and emotional state to be understood.
The proposed work could be extended by gathering input data from a larger number of people, and the accuracy of emotional classification could also be increased by gathering input data in real-time circumstances. Using the deep learning approach TSception, Yi Ding et al. proposed an emotion detection model based on EEG signals. The VR-BCI system was used to collect data from 18 healthy adults for the study, and the MUSE EEG headband was used to record the input EEG data. The collected EEG signals are preprocessed with a band-pass filter from 0.3 to 45 Hz to eliminate low- and high-frequency noise. TSception (time–spatial inception) is a neural network that learns the temporal and spatial features of EEG data; it is made up of a temporal learner, a spatial learner and a classifier. Two categories of external stimuli, LA and HA, are induced during the data acquisition process. The entire experiment is run in subject-dependent mode, with cross-validation using the LOSO method. To demonstrate the efficacy of the suggested work, the new model is compared with existing approaches such as EEGNet, LSTM, SVM (RP), and SVM (DE), as well as Tception and Sception, which use only the temporal and only the spatial features, respectively. The experimental findings show that, in the SVM classification process, differential entropy (DE) features achieved a higher accuracy of about 82.23% compared with RP features at 80.73%. Among the deep learning models, TSception attained the highest accuracy of 86.03%, followed by Tception (83.9%) and LSTM (80.81%). EEGNet also performed better than Sception, with classification


accuracy of 79.96% and 77.39%, respectively. The findings show that, for arousal states, dynamic temporal features yield better classification results. Habib Ullah et al. proposed a sparse discriminative ensemble learning (SDEL) system based on EEG signals for emotion identification [19]. The input EEG signals were obtained from the DEAP database. Kernel-based representations computed from the EEG records are used to characterize each EEG channel. The ensemble learning approach is given a feature graph with kernel representations and a linear discriminant objective function. In a supervised framework, the SDEL technique learns the most discriminative subset of channels, and only the most relevant EEG channels are retained by the sparse learning method. The proposed method reduces the amount of data required for classification while also increasing computational efficiency and accuracy. The suggested method outperforms the Bayesian classifier, the ontology-based method, segment-level decision fusion (SLDF), sparsity-constrained differential evolution-based channel selection, and EMD in terms of classification accuracy. J. X. Chen et al. proposed a CNN method for automatically learning emotional characteristics from the spatial and temporal features of EEG [20]. The DEAP dataset provides the input EEG signals. Traditional classification algorithms such as BT, SVM, LDA, and BLDA are compared with the suggested classification model, which uses deep CNN networks such as CVCNN, GSCNN, and GSLT-CNN. According to the results, the deep CNN network had the best classification performance on the combined EEG data in both the VA and AR dimensions, 3.58% higher than the best classical technique. The work described above needs to be extended to find an appropriate strategy for extracting EEG features for subject-independent emotion classification. Hao Chao et al.
proposed an emotion recognition system that combines a multiband feature matrix (MFM) with a capsule network (CapsNet) [21]. The DEAP dataset provided the input EEG signals. Power spectral density (PSD) features, regarded as frequency-domain characteristics, are derived from this input data. For each EEG channel, the PSD features are decomposed into four bands: theta, alpha, beta, and gamma. The PSD values are then normalized, and finally an MFM is constructed for all the EEG channels. The total size of the MFM is 18 × 18. Submatrices within the MFM represent the EEG signals according to the positions of the scalp electrodes. CapsNet emotion classification is carried out in four steps: convolution with a rectified linear unit (ReLU), primary capsules (PrimaryCaps), emotion capsules (EmotionCaps), and a final section that reconstructs the input matrix from the output capsules. The main characteristic of the proposed method is a new feature extraction procedure that maps frequency-domain information from EEG data onto an MFM. The experimental results showed that the proposed technique improves the recognition of valence and arousal emotional states. The proposed work can be extended by running it on more datasets. Kai-Yen Wang et al. proposed an emotion detection method based on multichannel fused EEG feature processing [22]. A CNN was used to extract features from the input, and the design was implemented in VLSI hardware. The DEAP dataset provided the input EEG signals. Only 12 EEG channels are chosen from the input


32 EEG channels. The time–amplitude domain is then translated to the time–frequency–amplitude domain using the STFT. Finally, the updated EEG pairs are fused to form a single fused EEG map, which is then fed into the CNN model. For the VLSI hardware design, the traditional CNN architecture is modified and implemented. Compared with existing approaches that operate on the DEAP dataset, the proposed technique improves classification accuracy by roughly 20%. According to the experimental results, the emotion classification accuracy of the CNN-based hardware system was 83.88%. The suggested technique is also effective for subject-dependent analysis and, according to the authors, might be extended into a subject-independent emotion identification system. Elham et al. proposed a 3D-CNN model for emotion recognition using multichannel EEG data [23]. The DEAP dataset provides the input EEG data. The input multichannel EEG signals are converted into a 3D representation, which the 3D-CNN model receives as input. To capture the temporal relationships in EEG signals, the 3D-CNN-based emotion detection model extracts spatiotemporal features. The suggested method achieved classification accuracies of 87.44% and 88.49% for the VA and AR emotional classes, respectively, exceeding traditional machine learning algorithms. Using a deep learning architecture, Chung-Yen Liao et al. proposed an emotional stress detection system from EEG signals [24]. EEG data were collected from seven people. The test subjects listened to music while their brain waves were recorded for 10 min. With the help of the NeuroSky MindWave Mobile, the raw data are transformed into EEG brainwave frequencies, using the FFT to convert the digitized EEG waves into the frequency domain. The person's mental state is rated as 0 or 1 based on the acquired EEG brain waves, with 0 indicating meditation and 1 indicating attention.
A fully connected deep learning network is employed to classify the data. There are seven hidden layers in the network design, with hidden nodes in the following order: 2, 32, 16, 16, 16, 16, 8, 8. The final output has one node, and the ReLU activation function is employed. The experimental results confirm that the suggested technique can correctly predict roughly 80% of the test individuals' mental states. The task could be extended by incorporating other emotional states in addition to attention and meditation. Overfitting is a key problem of machine learning-based classification algorithms. To address this challenge, Jinpeng Li et al. proposed a novel emotion recognition system based on HCNN [25]. The SEED dataset provided the raw EEG data. The EEG data are preprocessed using a band-pass filter with a frequency range of 0.3 to 50 Hz. The EEG signals are decomposed by STFT with a non-overlapping Hanning window. The differential entropy (DE) features of all frequency bands were arranged into a 2D feature map, which is fed into the HCNN for further processing. The proposed strategy is compared with existing methods such as SAE, SVM, and KNN to demonstrate its efficacy. According to the experimental results, deep learning models outperform machine learning techniques in emotion classification, and classification is reported to be better in the beta and gamma bands. In the proposed study, a separate HCNN is used for each frequency band. All five HCNN models could be fused in the future to


assess classification accuracy. The proposed approach must also be tested on evoked EEG signals in real time. Moon et al. [26] proposed an emotion identification system based on deep learning models such as CNN. Another significant aspect of this investigation is a technique for capturing asymmetric brain activity, thereby boosting the accuracy of emotion identification. The DEAP dataset provided the input EEG data. The raw EEG data are divided into five groups, and a fivefold leave-one-cluster-out cross-validation scheme is used for classification. Because the EEG data in the DEAP dataset are insufficient for deep CNN training, the method uses Welch's method to extract 320 PSD features. A 32 × 32 × 10 matrix was constructed for each frequency band of the EEG waves, and 115 EEG segments were generated for each video. In addition to PSD features, the study uses connectivity features: the Pearson correlation coefficient (PCC), phase locking value (PLV), and phase lag index (PLI) are connectivity metrics computed from the EEG data between pairs of electrodes. The classification process employs three different CNN architectures: CNN-2, CNN-5, and CNN-10. In the experiments, CNN-5 achieved a classification accuracy of 80.86%, outperforming an SVM classifier with a radial basis function kernel. Connectivity features also yield higher classification accuracy than PSD features; CNN-5 classification with PLV matrices reached 99.72% accuracy, so PLV outperforms the other connectivity features. Additional connectivity features for emotion classification can be added to the work in the future. Using a hybrid deep neural network, Youjun Li et al. proposed an automatic emotion recognition method from EEG signals [27]. The DEAP dataset provided the input EEG signals. Emotion detection is done in two stages in this study.
In the first stage, a multidimensional feature map is constructed by merging the spatial, frequency, and temporal structures of the EEG data. The collected EEG features are then translated into EEG MFI images using an interpolation approach. In the second stage, a network comprising CLRNN is used to classify emotional states. Among the emotional dimensions VA, AR, DO, LI, and FA, the proposed model focuses primarily on VA and AR. To validate the proposed model's classification accuracy, the experimental findings are compared with baseline approaches such as KNN, random decision forest, and SVM, as well as with the CNN + RNN approach. The results reveal that the emotion classification accuracy of CLRNN is 75.21%, whereas CNN + RNN, SVM, random decision forest, and KNN have average accuracies of 69.58%, 67.45%, 45.47%, and 62.84%, respectively. It can therefore be concluded that CLRNN is more accurate than the other approaches. The effect of the time window of the MFI sequences on classification accuracy is also assessed: MFI sequences with time windows of 2 or 3 s achieved higher classification accuracy than sequences with time windows of more than 4 s. Liang et al. proposed an EEG-based emotion recognition application using a hybrid CNN-based EEGFuseNet [28]. The input EEG data were obtained from three benchmark datasets: DEAP, MAHNOB-HCI, and SEED. The proposed method

21 A Comprehensive Study on Automatic Emotion …


mainly aimed at creating a hybrid EEG feature model that captures the spatial and temporal characteristics of EEG signals. The hybrid model combines CNN, RNN, and GAN: CNN and RNN perform EEG feature extraction and representation, respectively, and the GAN is used for training. The unsupervised method was evaluated on all three datasets using leave-one-subject-out cross-validation (LOOCV). It was also evaluated in a subject-independent analysis, where it performed better than existing feature representation methods, with higher Pacc and Pf values on all three datasets. However, the unsupervised method performs slightly worse than existing supervised methods, so the work can be extended by improving the EEGFuseNet hybrid model with other unsupervised methods. Wang et al. proposed a novel EEG-based emotion recognition system using a frame-level distilling neural network (FLDNet) [29]. The input EEG signals are obtained from the DEAP and DREAMER datasets. FLDNet is built using three neural network layers: features are extracted from the input EEG signals by LSTM and frame gate layers, the extracted features are decoded by another LSTM layer, and emotion recognition is performed by a multi-dense layer. The experimental results show that FLDNet achieves higher average emotion recognition accuracy than the compared methods on both datasets when classifying valence, arousal, and dominance. In future, the proposed work could be combined with a CNN model to improve classification performance.
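The connectivity features of Moon et al. [26] are pairwise measures between electrodes, stacked into channel-by-channel matrices (32 × 32 for DEAP's 32 EEG channels). A minimal sketch of the three measures in plain Python follows. It assumes the instantaneous phases have already been extracted, for example from the analytic signal of band-filtered EEG; the function names are illustrative and are not taken from [26].

```python
import cmath
import math
from typing import Callable, List, Sequence


def pcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def plv(phase_a: Sequence[float], phase_b: Sequence[float]) -> float:
    """Phase locking value: length of the mean unit phasor of the phase difference."""
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)


def _sign(v: float) -> int:
    """Return +1, 0, or -1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def pli(phase_a: Sequence[float], phase_b: Sequence[float]) -> float:
    """Phase lag index: absolute mean sign of sin(phase difference)."""
    total = sum(_sign(math.sin(a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)


def connectivity_matrix(
    channels: List[Sequence[float]],
    metric: Callable[[Sequence[float], Sequence[float]], float],
) -> List[List[float]]:
    """Apply a pairwise metric to every pair of channels (rows and columns = electrodes)."""
    return [[metric(a, b) for b in channels] for a in channels]
```

PCC is applied to the raw signals, while PLV and PLI are applied to phases. The diagonal of a PLV matrix is 1 and that of a PLI matrix is 0, because the PLI discards zero-lag phase differences (sin 0 = 0).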

2.2 Machine Learning-Based Emotion Detection Systems
Because of its robustness, emotion classification based on EEG data has grown in popularity. Qiang Gao et al. proposed a hybrid feature extraction approach for emotion recognition [30]. A wireless Emotiv EPOC headset [18] was used to acquire the input EEG data from ten subjects. The technique combines power spectrum and wavelet energy entropy features: the power spectrum features are extracted with the STFT, and the wavelet energy entropy is extracted from the frequency band data with the DWT. PCA is used for feature selection, and SVM and RVM classifiers classify the emotional states neutral, happiness, and sadness. In the experiments, the SVM classifier achieved classification accuracies of 85.67%, 87.11%, and 89.17%. The authors then used the RVM classifier to improve accuracy, and it produced better results than the SVM classifier. They add that the study could be advanced by combining other physiological data with the EEG information in a deep learning algorithm to form a multidimensional feature map, and that ensemble learning could be used to build a feature extraction method in deep learning and increase classification accuracy.
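Gao et al. [30] extract a wavelet energy entropy feature with the DWT. A minimal sketch, assuming a Haar wavelet and four decomposition levels (the wavelet family and depth used in [30] are not stated here), splits a signal into sub-bands, takes each band's share of the total energy, and returns the Shannon entropy of those shares. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        signal = signal[:-1]  # drop the last sample so that samples pair up
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_energy_entropy(signal: Sequence[float], levels: int = 4) -> float:
    """Shannon entropy of the relative energies of the DWT sub-bands."""
    bands: List[List[float]] = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)  # one detail band per level
    bands.append(approx)  # the final approximation band
    energies = [sum(c * c for c in band) for band in bands]
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log(p) for p in probs)
```

A value near 0 means the energy is concentrated in a single sub-band; the maximum, ln(levels + 1), means it is spread evenly across all of them.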


T. Abimala et al.

Dahua Li et al. proposed a facial expression recognition method that combines EEG signals with facial landmark localisation [31]. Raw EEG data were obtained from six people using an Emotiv EPOC headset [18], and the DWT was used to preprocess the signals. Facial landmarks are identified from photographs of the individuals' expressions, and facial features are extracted from the eyebrow contour points, eye contour points, pupils, nose tip, nostrils, mouth contour points, and mouth centre. The EEG signal energy and the facial features are combined into a fusion vector, and an SVM classifies six facial expression states: natural, gnashing, smile left, smile right, raise eyebrows, and frowning. The experimental analysis compares the classification accuracy of the EEG-only approach with that of the fusion method. The average classification accuracy of the fusion technique was 86.94%, higher than the roughly 82.78% of the EEG-based method. Both fivefold and tenfold cross-validation were used during SVM classification; for both the EEG and the fusion methods, tenfold cross-validation gave 0.83% higher accuracy than fivefold cross-validation. A drawback of the technique is that it uses only the energy of the EEG signals, so the research can be extended by finding a suitable feature extraction model for EEG signals. The ability of EEG signals to reveal emotions depends strongly on the feature selection approach. Bahareh Nakisa et al. proposed a novel EEG feature selection model based on evolutionary computation (EC) algorithms such as ant colony optimization (ACO), simulated annealing (SA), genetic algorithm (GA), particle swarm optimization (PSO), and differential evolution (DE) [32]. The input EEG data come from the MAHNOB and DEAP datasets.
Real-time data were also acquired from 13 subjects wearing a mobile EEG sensor, the Emotiv Insight wireless headset. During preprocessing, the EEG data are denoised with Butterworth and notch filters, and independent component analysis (ICA) is then applied to obtain clean EEG signals. Features are extracted from the time, frequency, and time–frequency domains of the preprocessed signals, and the EC algorithms reduce the feature dimensionality for effective emotion classification: 30 features are selected from a total of 1440 extracted features. The final classification is into four quadrants: high arousal–positive (HA-P), low arousal–positive (LA-P), high arousal–negative (HA-N), and low arousal–negative (LA-N). The reduced features are classified with a probabilistic neural network (PNN) classifier. The experimental results show higher classification accuracy on MAHNOB than on DEAP; on MAHNOB, DE, one of the five EC algorithms, achieved a classification accuracy of 96%. The experiments also showed that the average classification accuracy with mobile sensors is around 65%, which indicates that real-time data acquired by mobile sensors can be used to classify emotions in non-critical applications. The paper identifies the premature convergence of EC algorithms as a limitation for emotion classification. Rami Alazrai et al. proposed a novel emotion detection system based on time–frequency features, in which the time–frequency representation of EEG signals is built with a quadratic time–frequency distribution (QTFD) [33]. The


DEAP dataset provides the input EEG data, and the study focuses on classifying the VA and AR emotional dimensions. The QTFD method maps a time-domain signal into a joint time–frequency domain and is compared with the STFT and WT techniques. An SVM classifier with a Gaussian kernel function is used together with tenfold cross-validation. Classification performance is evaluated with three analyses: channel-based, feature-based, and neutral-class exclusion. The QTFD approach is also compared with EMD and higher-order crossings (HOC). The results show that the method achieved average classification accuracies of 89.8% and 88.9% with a shorter computation time. According to the authors, the work could be enhanced in several ways: deep learning-based algorithms could detect emotions from multiple sources, including EEG, voice, and facial signals; parallel computing could further reduce the computation time; and, instead of only classifying emotions, the method could be used to assess individual emotions from the time–frequency characteristics of EEG signals, which would make the detection system suitable for biological applications. The study could also be extended to subject-independent analyses. O. Bazgir et al. proposed an automatic emotion recognition system for EEG signals based on machine learning [34]. The input EEG data are taken from the DEAP database. DWT is used to decompose the EEG data, and the average mean reference (AMR) approach is used to remove noise [33]. PCA is used to extract spectral properties such as entropy and energy from the EEG data, and SVM, KNN, and ANN are used for classification. With an eightfold cross-validation approach, the model classifies the EEG data along the VA and AR emotional dimensions.
Based on the experimental results, the SVM method attained an accuracy of 90.8% for AR and 90.6% for VA, higher than the KNN and ANN techniques. The authors suggest trying other feature extraction methods, such as ICA or LDA, and classifiers such as random forests, deep neural networks, and recurrent neural networks to improve classification accuracy. Yi-Ming Jin et al. proposed an emotion detection system based on a domain adaptation network (DAN) [35], with input EEG data from the SEED dataset. The EEG signals were preprocessed with a 1–75 Hz band-pass filter, and differential entropy (DE) was used to extract EEG features; in the experiments, DE outperformed other feature extraction techniques such as PSD. The DAN was compared with other classification methods, namely SVM, NN, TCA, KPCA, and TPT. According to the test results, DAN outperforms the other approaches with an average classification accuracy of 79.19%, while SVM has the lowest accuracy, 58.18%. The main characteristic of the method is that it combines a gradient reversal layer with a domain classifier and uses backpropagation to extract domain-invariant features. The work can serve both as a transfer learning platform and as an emotion detection system. As future work, the authors propose to focus mainly on classifying new types of emotions.
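Differential entropy, the feature Jin et al. [35] use on SEED, is commonly computed per frequency band under the assumption that the band-filtered EEG segment is approximately Gaussian, in which case it reduces to the closed form h = ½ ln(2πeσ²). A minimal sketch under that assumption follows; band filtering is assumed to be done beforehand, and the band names are hypothetical.

```python
import math
from typing import Dict, Sequence


def differential_entropy(segment: Sequence[float]) -> float:
    """DE of a segment, assuming Gaussian samples: 0.5 * ln(2 * pi * e * variance)."""
    n = len(segment)
    mean = sum(segment) / n
    variance = sum((x - mean) ** 2 for x in segment) / n
    return 0.5 * math.log(2 * math.pi * math.e * variance)


def de_features(band_segments: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """One DE value per (already band-filtered) frequency band."""
    return {band: differential_entropy(seg) for band, seg in band_segments.items()}
```

Because only the variance enters the formula, doubling a segment's amplitude raises its DE by exactly ln 2.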


G. S. Thirunavukkarasu et al. proposed a smart human–machine interface (SHMI) for vehicle safety based on EEG signals [36]. The method uses Russell's circumplex model for emotion categorization, with the Higuchi fractal dimension and PSD as features. Wearable electrodes provide the input EEG data, which are categorized into happy, sad, relaxed, and furious emotional states. The authors chose an SVM classifier because of its performance in emotion classification. The main goal of the work is to display the results of emotion classification in an HMI application, where they are presented to drivers as a source of information. The article concludes that the same framework could be applied to preprocessed datasets such as DEAP, and that a real-time data acquisition system could be designed in the future to make the classification subject-independent. Similar research can also be applied to other BCI applications. R. Khosrowabadi et al. proposed a novel EEG-based emotion detection system that uses a self-organizing map (SOM) for boundary detection [37]. The input EEG data were acquired from 26 healthy people using 8 Ag/AgCl scalp EEG electrodes and preprocessed with an elliptic band-pass filter. Features are extracted with the magnitude squared coherence estimate (MSCE). Boundaries are found from the extracted features using both crisp boundary detection and the SOM, and emotions are classified with a KNN classifier and fivefold cross-validation. In these experiments, classification with the SOM boundaries outperformed crisp boundary detection. The work also categorized emotional states along the VA and AR dimensions.
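Several of the studies above reduce emotions to quadrants of Russell's valence–arousal plane. A minimal sketch of such a mapping follows, assuming self-assessment ratings on a 1–9 scale (as in DEAP) split at a midpoint of 5; the threshold and the assignment of the four labels from [36] to quadrants are illustrative assumptions, not taken from the cited papers.

```python
from typing import Dict, Tuple

# Hypothetical quadrant-to-label table: key = (high valence?, high arousal?)
QUADRANT_LABELS: Dict[Tuple[bool, bool], str] = {
    (True, True): "happy",     # high valence, high arousal
    (False, True): "furious",  # low valence, high arousal
    (False, False): "sad",     # low valence, low arousal
    (True, False): "relaxed",  # high valence, low arousal
}


def circumplex_label(valence: float, arousal: float, midpoint: float = 5.0) -> str:
    """Map a (valence, arousal) rating pair to a circumplex quadrant label.

    Ratings equal to the midpoint are treated as "low".
    """
    return QUADRANT_LABELS[(valence > midpoint, arousal > midpoint)]
```

The same two thresholds give the HA-P, LA-P, HA-N, and LA-N quadrants used by Nakisa et al. [32], with positive valence taken as a rating above the midpoint.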

3 Conclusion
According to the literature review, deep learning-based emotion classification systems outperform classical machine learning techniques; this is summarized in Table 2. Machine learning approaches have several drawbacks, including longer computation time and lower classification accuracy. They are also poorly suited to multimodal emotion classification, such as the fusion of EEG signals, facial expressions, and voice signals, and they suffer from premature convergence and overfitting. In the deep learning-based studies, features are extracted using STFT, DE, interpolation, and the RACNN model, and emotions are classified with classifiers such as CNN, LSTM, MLF-CapsNet, RGNN, and RACNN. As the overview in Table 1 shows, CNN is frequently employed in emotion classification systems. Furthermore, nearly all of the studies acquire their input data from benchmark datasets such as DEAP, SEED, and MAHNOB, and only a few have used real-time data from a low-cost EEG headset. Feature extraction is the most difficult challenge in emotion classification, and it is vital to identify a feature extraction method that supports accurate classification of emotional states. Despite


Table 1 Overview of emotion detection systems

Method: Deep learning
EEG datasets: DEAP, MAHNOB-HCI, DREAMER, SEED, SEED-IV, FEXD, and real-time data collected from persons using EEG electrodes

Author | Classifier
Zhang [9], Chen [20], Wang [22], Moon [26] | CNN
Sharma [10] | LSTM
Liu [11] | MLF-CapsNet
Cui [16] | RACNN
Zhong [12] | RGNN
Ullah [19] | SDEL
Chao [21] | CapsNet
Salama [23] | 3D-CNN
Li [25] | HCNN
Li [27] | CLRNN
Liao [24] | Fully connected deep learning model
Gonzalez [13] | BioCNN
Zhang [14] | LIBSVM
Nakisa [15] | CNN and LSTM
Hassouneh [17] | LSTM
Ding [40] | TSception

Method: Machine learning
EEG datasets: real-time data collected using EEG electrodes; MAHNOB, DEAP, SEED

Author | Classifier
Gao [30] | SVM, RVM
Li [31], Thirunavukkarasu [36] | SVM
Khosrowabadi [37] | KNN
Nakisa [32] | PNN
Alazrai [33] | SVM
Bazgir [34] | SVM, KNN, ANN
Jin [35] | DAN

the fact that deep learning approaches improve classification outcomes, most studies use small amounts of data and perform subject-dependent classification. A subject-independent emotion classification model therefore needs to be developed in the future by combining deep learning-based BCI methods with optimal learning methods [38] and reinforcement learning algorithms [39]. This integration paves the way for more BCI applications in the real world (Table 2).
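Subject-independent evaluation, as called for above and as done with leave-one-subject-out cross-validation in [28], requires that no data from the test subject are used for training. A minimal sketch of such a split follows, assuming each EEG segment is tagged with a subject identifier; the function name and the IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out subject, train indices, test indices), one fold per subject."""
    by_subject: Dict[str, List[int]] = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for held_out in sorted(by_subject):
        test = by_subject[held_out]
        # Every segment of the held-out subject is excluded from training.
        train = [i for i, sid in enumerate(subject_ids) if sid != held_out]
        yield held_out, train, test
```

Averaging accuracy over all folds gives a subject-independent estimate. Shuffling the segments of all subjects into random folds would instead let the same person appear in both the training and test sets.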


Table 2 Advantages of deep learning models

Classifier | Advantages
HFCNN | Provides better classification results than CNN; can also be used in subject-independent emotion detection systems
CNN | Improved classification accuracy compared with SVM; can use diverse EEG features such as temporal and frequency features
LSTM | Improved classification accuracy for non-stationary EEG signals; also suitable for classifying real-time data
MLF-CapsNet | Experimental results show improved classification accuracy compared with Cont-CNN, CNN-RNN, and DGCNN
RACNN | Best suited for classifying EEG data with temporal, regional, and asymmetric features
RGNN | Performs better than DGCNN and other GNN-based classification systems, as the experimental findings show
SDEL | Outperforms the Bayesian classifier, segment-level decision fusion (SLDF), and sparsity-restricted differential evolution-based channel selection and empirical mode decomposition
CapsNet | Classifies the multiband feature matrix of EEG signals
3D-CNN | Improved classification accuracy compared with CNN
HCNN | Better classification results than stacked autoencoder (SAE), SVM, and KNN methods
CLRNN | Better classification results than KNN, SVM, and random decision forest
BioCNN | Suitable for hardware-based emotion recognition systems
LIBSVM | Best suited for real-time data and for the fusion of facial and EEG signals
TSception | Better classification results than EEGNet, LSTM, and SVM methods

References
1. Alarcão SM, Fonseca MJ (2019) Emotions recognition using EEG signals: a survey. IEEE Trans Affect Comput 10(3):374–393
2. Morena M, Leitl KD, Vecchiarelli HA, Gray JM, Campolongo P, Hill MN (2016) Emotional arousal state influences the ability of amygdalar endocannabinoid signaling to modulate anxiety. Neuropharmacology 111:59–69
3. Lin XB, Lee T-S, Cheung YB, Ling J, Poon SH, Lim L, Zhang HH, Chin ZY, Wang CC, Krishnan R, Guan C (2019) Exposure therapy with personalized real-time arousal detection and feedback to alleviate social anxiety symptoms in an analogue adult sample: pilot proof-of-concept randomized controlled trial. JMIR Ment Health 6(6):e13869
4. Tseng A, Wang Z, Huo Y, Goh S, Russell JA, Peterson BS (2016) Differences in neural activity when processing emotional arousal and valence in autism spectrum disorders. Hum Brain Mapp 37(2):443–461
5. Koelstra S, Muehl C, Soleymani M, Lee J-S, Yazdani A, Ebrahimi T, Pun T, Nijholt A, Patras I (2012) DEAP: a database for emotion analysis using physiological signals. IEEE Trans Affect Comput 3(1):18–31


6. Posner J, Russell JA, Peterson BS (2005) The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology. In: Development and psychopathology, pp 715–734
7. Bann EY, Bryson JJ (2013) The conceptualisation of emotion qualia: semantic clustering of emotional tweets. In: Proceedings of the 13th neural computation and psychology workshop 3
8. Soleymani M, Lichtenauer J, Pun T, Pantic M (2012) A multimodal database for affect recognition and implicit tagging. IEEE Trans Affect Comput 3(1):42–55. https://doi.org/10.1109/T-AFFC.2011.25
9. Zhang Y, Cheng C, Zhang Y (2021) Multimodal emotion recognition using a hierarchical fusion convolutional neural network. IEEE Access 9:7943–7951. https://doi.org/10.1109/ACCESS.2021.3049516
10. Sharma R, Pachori RB, Sircar P (2020) Automated emotion recognition based on higher order statistics and deep learning algorithm. In: Biomedical signal processing and control, vol 58, 101867. ISSN 1746-8094
11. Liu Y, Ding Y, Li C, Cheng J, Song R, Wan F, Chen X (2020) Multi-channel EEG-based emotion recognition via a multi-level features guided capsule network. Comput Biol Med 123:103927. https://doi.org/10.1016/j.compbiomed.2020.103927. Epub 2020 Jul 22. PMID: 32768036
12. Zhong P, Wang D, Miao C (2020) EEG-based emotion recognition using regularized graph neural networks. IEEE Trans Affect Comput. https://doi.org/10.1109/TAFFC.2020.2994159
Gonzalez HA, Muzaffar S, Yoo J, Elfadel IM (2020) BioCNN: a hardware inference engine for EEG-based emotion detection. IEEE Access 8:140896–140914. https://doi.org/10.1109/ACCESS.2020.3012900
13. Gonzalez HA et al (2021) Hardware acceleration of EEG-based emotion classification systems: a comprehensive survey. IEEE Trans Biomed Circuits Syst 15(3):412–442. https://doi.org/10.1109/TBCAS.2021.3089132
14. Zhang H (2020) Expression-EEG based collaborative multimodal emotion recognition using deep AutoEncoder. IEEE Access 8:164130–164143. https://doi.org/10.1109/ACCESS.2020.3021994
15. Nakisa B, Rastgoo MN, Rakotonirainy A, Maire F, Chandran V (2020) Automatic emotion recognition using temporal multimodal deep learning. IEEE Access 8:225463–225474. https://doi.org/10.1109/ACCESS.2020.3027026
16. Cui H, Liu A, Zhang X, Chen X, Wang K, Chen X (2020) EEG-based emotion recognition using an end-to-end regional-asymmetric convolutional neural network. Knowl-Based Syst 205:106243. ISSN 0950-7051
17. Hassouneh A, Mutawa AM, Murugappan M (2020) Development of a real-time emotion recognition system using facial expressions and EEG based on machine learning and deep neural network methods. In: Informatics in medicine unlocked, vol 20, p 100372. ISSN 2352-9148. https://doi.org/10.1016/j.imu.2020.100372
18. EMOTIV (2017) EEG device EPOC (Online). Available: https://www.emotiv.com/epoc/
19. Ullah H, Uzair M, Mahmood A, Ullah M, Khan SD, Cheikh FA (2019) Internal emotion classification using EEG signal with sparse discriminative ensemble. IEEE Access 7:40144–40153. https://doi.org/10.1109/ACCESS.2019.2904400
20. Chen JX, Zhang PW, Mao ZJ, Huang YF, Jiang DM, Zhang YN (2019) Accurate EEG-based emotion recognition on combined features using deep convolutional neural networks. IEEE Access 7:44317–44328. https://doi.org/10.1109/ACCESS.2019.2908285
21. Chao H, Dong L, Liu Y, Lu B (2019) Emotion recognition from multiband EEG signals using CapsNet. Sensors 19:2212
22. Wang K, Ho Y, Huang Y, Fang W (2019) Design of intelligent EEG system for human emotion recognition with convolutional neural network. IEEE Int Conf Artif Intelli Circuits Syst (AICAS) 2019:142–145. https://doi.org/10.1109/AICAS.2019.8771581
23. Salama ES, El-Khoribi RA, Shoman ME, Wahby Shalaby MA (2018) EEG-based emotion recognition using 3D convolutional neural networks. Int J Adv Comput Sci Appl (IJACSA) 9(8)


24. Liao C, Chen R, Tai S (2018) Emotion stress detection using EEG signal and deep learning technologies. IEEE Int Conf Appl Syst Invention (ICASI) 2018:90–93. https://doi.org/10.1109/ICASI.2018.8394414
25. Li J, Zhang Z, He H (2018) Hierarchical convolutional neural networks for EEG-based emotion recognition. Cogn Comput 10:368–380
26. Moon S, Jang S, Lee J (2018) Convolutional neural network approach for EEG-based emotion recognition using brain connectivity and its spatial information. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 2556–2560. https://doi.org/10.1109/ICASSP.2018.8461315
27. Li Y, Huang J, Zhou H, Zhong N (2017) Human emotion recognition with electroencephalographic multidimensional features by hybrid deep neural networks. Appl Sci 7:1060
28. Liang Z, Zhou R, Li L, Huang G, Zhang Z, Ishii S (2021) EEGFuseNet: hybrid unsupervised deep feature characterization and fusion for high-dimensional EEG with an application to emotion recognition
29. Wang Z, Gu T, Zhu Y, Li D, Yang H, Du W (2021) FLDNet: frame-level distilling neural network for EEG emotion recognition. IEEE J Biomed Health Inform 25(7):2533–2544. https://doi.org/10.1109/JBHI.2021.3049119. Epub 2021 Jul 27. PMID: 33400657
30. Gao Q, Wang C, Wang Z et al (2020) EEG based emotion recognition using fusion feature extraction method. Multimed Tools Appl 79:27057–27074
31. Li D, Wang Z, Gao Q, Song Y, Yu X, Wang C (2019) Facial expression recognition based on electroencephalogram and facial landmark localization. Technol Health Care 27(4):373–387. https://doi.org/10.3233/THC-181538. PMID: 30664515
32. Nakisa B, Rastgoo MN, Tjondronegoro D, Chandran V (2018) Evolutionary computation algorithms for feature selection of EEG-based emotion recognition using mobile sensors. In: Expert systems with applications, vol 93, pp 143–155. ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2017.09.062
33. Alazrai R, Homoud R, Alwanni H, Daoud MI (2018) EEG-based emotion recognition using quadratic time-frequency distribution. Sensors 18:2739
34. Bazgir O, Mohammadi Z, Habibi SAH (2018) Emotion recognition with machine learning using EEG signals. In: 2018 25th National and 3rd International Iranian Conference on Biomedical Engineering (ICBME), pp 1–5. https://doi.org/10.1109/ICBME.2018.8703559
35. Jin Y, Luo Y, Zheng W, Lu B (2017) EEG-based emotion recognition using domain adaptation network. Int Conf Orange Technol (ICOT) 2017:222–225. https://doi.org/10.1109/ICOT.2017.8336126
36. Thirunavukkarasu GS, Abdi H, Mohajer N (2016) A smart HMI for driving safety using emotion prediction of EEG signals. In: 2016 IEEE international conference on systems, man, and cybernetics (SMC), pp 004148–004153. https://doi.org/10.1109/SMC.2016.7844882
37. Khosrowabadi R, Quek HC, Wahab A, Ang KK (2010) EEG-based emotion recognition using self-organizing map for boundary detection. In: 2010 20th international conference on pattern recognition, pp 4242–4245. https://doi.org/10.1109/ICPR.2010.1031
38. Reddy TK, Arora V, Behera L (2020) HJB-equation-based optimal learning scheme for neural networks with applications in brain-computer interface. IEEE Trans Emerging Topics Comput Intelli 4(2):159–170. https://doi.org/10.1109/TETCI.2018.2858761
39. Ko W, Jeon E, Suk H-I (2022) A novel RL-assisted deep learning framework for task-informative signals selection and classification for spontaneous BCIs. IEEE Trans Industr Inf 18(3):1873–1882. https://doi.org/10.1109/TII.2020.3044310
40. Ding Y, Robinson N, Zeng Q, Chen D, Wai P, Aung A, Lee T-S, Guan C (2020) TSception: a deep learning framework for emotion detection using EEG, pp 1–7. https://doi.org/10.1109/IJCNN48605.2020.9206750

Chapter 22

Sensor-Based Secure Framework for IoT-Based Smart Homes
Nidhi Dandotiya, Pallavi Khatri, Manjit Kumar, and Sujendra Kumar Kachhap

1 Introduction
IoT is used to make homes and industries smart: it allows devices to be controlled digitally, automates the home or industry, and lets devices interact with each other. A device can be accessed from anywhere as long as it is connected to the Internet [1]. To make a home smart, sensors are used that can be accessed from any smart device, such as a smartphone or laptop, from anywhere. CISCO Packet Tracer can be used to build IoT-based smart homes secured with firewalls, and the simulator makes it possible to set up the connections and operation of all the smart devices of a smart home virtually. Without security, IoT-based smart homes and industries may be accessed by intruders or hackers [2], who can then leak personal or private data related to the home or industry. After gaining unauthorized access, an attacker may play vulgar audio or video on the system and take control of all the systems, which can be used to torment the occupants in various ways [3]. To reduce the risk of unauthorized access in IoT-based smart homes, this work proposes a security system that protects the home against such intruders. The work uses CISCO Packet Tracer to demonstrate security in IoT-based smart homes. In this paper, Sect. 1 describes IoT-based home security devices controlled through CISCO Packet Tracer, Sect. 2 presents related work, Sect. 3 introduces the proposed work, and the last section concludes the research.

N. Dandotiya (corresponding author) · P. Khatri · M. Kumar · S. K. Kachhap
CSA Department, ITM University, Gwalior, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_22


N. Dandotiya et al.

2 Literature Survey
IoT smart devices help industry increase productivity and improve efficiency. However, because they work over the Internet, there is a risk that intruders or hackers take control of them. In [1], a solution is proposed that blocks an IP address so that a system cannot be pinged from an outside PC. To achieve this, separate LANs are created and IP traffic is blocked for one LAN, so that hosts on the other LAN cannot ping it and unauthorized access is prevented. The authors of [4] used a Raspberry Pi 3 to deploy a security system in IoT-enabled smart homes. The system uses a camera that captures a snapshot of every person who enters the house and emails each snapshot to the admin as an alert. However, all the IoT devices could be accessed from any location, so the security of this system was found to be weak. Reference [5] describes a hacking incident in which a hacker gained access to the smart devices in a Wisconsin couple's home and made them malfunction: the hacker altered the temperature settings of appliances, talked to the couple through their webcam, and even played vulgar music. The researchers in [6] proposed a more secure system in which a port is automatically shut down when unauthorized access or a security violation is detected. CISCO Packet Tracer is a common virtual tool for simulating a smart home environment with IoT devices. Virtual IoT-based smart home networks use networking devices such as switches, routers, servers, and an ASA firewall to implement a robust system [7–9].
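The automatic port shutdown described in [6] resembles the "shutdown" violation mode of Cisco switch port security, in which a port that sees more source MAC addresses than allowed is placed in an error-disabled state. The toy model below illustrates that behaviour only; the class name, the one-address limit, the `reset` method, and the MAC strings are hypothetical and are not taken from [6].

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SecurePort:
    """Toy model of a switch port that shuts itself down on a security violation."""

    max_macs: int = 1
    learned: Set[str] = field(default_factory=set)
    err_disabled: bool = False

    def receive(self, src_mac: str) -> bool:
        """Return True if a frame from src_mac is forwarded, False if it is dropped."""
        if self.err_disabled:
            return False  # a shut-down port forwards nothing
        if src_mac in self.learned:
            return True
        if len(self.learned) < self.max_macs:
            self.learned.add(src_mac)  # learn a new address while below the limit
            return True
        self.err_disabled = True  # violation: unknown address after the limit is reached
        return False

    def reset(self) -> None:
        """Simulate an administrator re-enabling the port and clearing learned addresses."""
        self.err_disabled = False
        self.learned.clear()
```

Once a violation occurs, even the previously allowed device is cut off until the port is re-enabled, which is what stops an intruder who plugs into the network.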

3 Proposed System
As discussed in the literature survey in Sect. 2, the existing systems have the following limitations:
• The hardware implementation of IoT home security via email can be accessed without authentication because there is no network security.
• The IoT security system is weak because the devices can be accessed from any location.
• The online configuration of the router can also be accessed from other devices at different locations.
This work tries to overcome these weaknesses by proposing the following solutions (as shown in Fig. 1): a motion sensor with a mailing security system (Sect. 3.1) and an RFID-based security system (Sect. 3.2).

22 Sensor-Based Secure Framework for IoT-Based Smart Homes


Fig. 1 Sensor-based secure framework for IoT-based smart homes

3.1 Motion Sensor with Mailing System
In this work, a microcontroller, a motion sensor, and a webcam are used to build an email alert system. The microcontroller is programmed to send an email to the admin whenever the motion sensor detects a human body. As a backup, a registration server is used to make the home security smart; this server manages the different credentials that are allowed to access the IoT devices. A network sniffer analyzes every packet arriving from the remote network or the cellular network side, so that every activity on the network can be traced from the local network side. As shown in Fig. 2, the zone-based firewall denies the remote network access to the local network and to the FTP service of the server, which keeps the local network safe from intruders. The local network, in contrast, can browse the server and ping its IP address; in this research, the Google server on the remote network was browsed both by name and by its IP address. The remote network's inability to reach the local network (Fig. 2) is due to the zone-based firewall together with the site-to-site IPsec VPN tunnel. The local side's ability to access FTP and ping the server follows from the security policy configured on the ASA 5505 firewall for the local network, which blocks unauthorized access from the remote network while still allowing the local network to browse the Google server. Next, the IoT devices are registered so that they can be controlled from outside the home with a smartphone connected through a cell tower that provides Internet access. After registration, the interface shown in Fig. 3 lets users control their devices and make their home smart.
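The policy described above, where the local network may open FTP, ping, and web sessions toward the remote side while the remote side may not open connections into the local network, can be sketched as a small stateful filter. This is a simplified model, not the ASA 5505 or zone-based firewall configuration used in the work; the subnet 192.168.1.0/24, the service names, and the documentation addresses 203.0.113.x are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    service: str  # e.g. "icmp", "ftp", "http"


class ZoneFirewall:
    """Toy stateful policy: inside hosts may open allowed sessions, outside hosts may only reply."""

    def __init__(self, inside_net: str, allowed_outbound: Set[str]) -> None:
        self.inside = ipaddress.ip_network(inside_net)
        self.allowed_outbound = allowed_outbound
        self.sessions: Set[Tuple[str, str, str]] = set()

    def zone(self, addr: str) -> str:
        return "inside" if ipaddress.ip_address(addr) in self.inside else "outside"

    def permit(self, pkt: Packet) -> bool:
        src_zone, dst_zone = self.zone(pkt.src), self.zone(pkt.dst)
        if src_zone == dst_zone:
            return True  # traffic within one zone is not filtered in this model
        if src_zone == "inside":
            if pkt.service in self.allowed_outbound:
                self.sessions.add((pkt.src, pkt.dst, pkt.service))  # remember the session
                return True
            return False
        # outside -> inside: only replies to a session opened from the inside pass
        return (pkt.dst, pkt.src, pkt.service) in self.sessions
```

Keying sessions on (inside host, outside host, service) is what lets replies back in while unsolicited connections from the remote side are dropped.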


Fig. 2 Local network is permitted to use FTP and can ping 172.16.2.54

Fig. 3 IoT devices on local network

Once the devices are registered, the email alert system (Fig. 4) is activated whenever the sensor detects a human body, and a snapshot of that person is taken. A smartphone connected to a network with Internet access can then monitor and control all the smart devices from anywhere outside the home.

22 Sensor-Based Secure Framework for IoT-Based Smart Homes


Fig. 4 Email alert system

The following algorithm sends the email to the admin: when the motion sensor detects a human body, the microcontroller switches on the webcam, which takes a snapshot. Algorithm 1: Microcontroller algorithm for the email system

EmailSystem(HIGH, LOW)
1. Repeat while (True)
2.   If digitalRead(0) = HIGH
       Set CustomWrite(1, HIGH)
       Sleep(1)
       EmailClient.Send("ID", "subject", "message")
       EmailClient.Receive()
       Sleep(1)
     Else
       Print("Nothing")
       CustomWrite(1, LOW)
       Sleep(5)
     [End of if statement]
   [End of loop]
End
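Algorithm 1 can be sketched in Python as follows. The hardware and mail operations are passed in as functions (`read_motion`, `set_camera`, `capture`, `send`, `sleep`); these are hypothetical stand-ins for the microcontroller's `digitalRead`/`CustomWrite` and the `EmailClient` of the pseudocode. The message is built with the standard-library `email.message.EmailMessage`, with the webcam snapshot attached as a JPEG. The loop is bounded by `iterations` so it can be tested, whereas the device itself would loop forever.

```python
from email.message import EmailMessage
from typing import Callable

def build_alert(admin: str, snapshot: bytes) -> EmailMessage:
    """Compose the alert email with the webcam snapshot attached."""
    msg = EmailMessage()
    msg["To"] = admin
    msg["Subject"] = "Motion detected"
    msg.set_content("The motion sensor detected a person. Snapshot attached.")
    msg.add_attachment(snapshot, maintype="image", subtype="jpeg", filename="snap.jpg")
    return msg

def alert_loop(read_motion: Callable[[], bool],
               set_camera: Callable[[bool], None],
               capture: Callable[[], bytes],
               send: Callable[[EmailMessage], None],
               sleep: Callable[[float], None],
               admin: str,
               iterations: int) -> int:
    """Poll the motion sensor; on motion, switch the webcam on, snap and mail.

    Returns the number of alerts sent during `iterations` polls."""
    sent = 0
    for _ in range(iterations):
        if read_motion():                 # digitalRead(0) = HIGH
            set_camera(True)              # CustomWrite(1, HIGH)
            sleep(1)
            send(build_alert(admin, capture()))
            sent += 1
            sleep(1)
        else:
            set_camera(False)             # CustomWrite(1, LOW)
            sleep(5)
    return sent
```

On real hardware, `send` could wrap `smtplib.SMTP(...).send_message(msg)` and `sleep` could be `time.sleep`; injecting them keeps the control flow of Algorithm 1 testable without a sensor or mail server.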


With this algorithm, an email containing the message and the snapshots is sent to the designated admin. The system helps keep the house safe, records a snapshot of each person it detects entering the home, and makes it easier to guard the home against unexpected incidents. A network security system is also necessary for a smart IoT home.

3.2 RFID Security System The RFID system unlocks the door when a card accepted by the RFID card reader is presented (Fig. 5). This protects the house from thieves, who cannot enter without a valid card. If the card is valid, the RFID card reader glows green, which means that the user has authenticated access to the house (Fig. 5). If the card is not valid, the reader glows red, which means unauthenticated access, and the door does not open (Fig. 6). These features let the door be secured smartly.

Fig. 5 RFID security system

Fig. 6 RFID system working

Algorithm 2: Algorithm for the RFID card reader

Setup()
1. pinMode(Dcard, OUTPUT)
2. pinMode(Rcard, INPUT)
Loop()
  If analogRead(Rcard) == 0
    CustomWrite(Dcard, 1)
  Else
    CustomWrite(Dcard, 0)
    digitalWrite(1, HIGH)
    Sleep(1)
    digitalWrite(1, LOW)
    Sleep(1)
  [End of if statement]
[End of loop]
End


Algorithm 2 lets a valid RFID card open the smart door, providing a smart unlocking mechanism that also protects the home from thieves. The algorithm validates each card: if the card is invalid, access is denied and the door stays locked, so thieves and unauthenticated users cannot get in. It drives the door-control output according to the reading from the RFID card reader.
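The validation step can be sketched as an allow-list lookup. This is an assumption: the pseudocode above only tests an analog reading, whereas this sketch compares a card UID against a set of registered UIDs, which is a common way to decide "valid card". The UID normalisation (stripping spaces and colons, upper-casing) and the example UIDs are hypothetical; the LED colours and unlock decision follow Figs. 5 and 6.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessResult:
    led: str       # reader LED colour: "green" (valid) or "red" (invalid)
    unlock: bool   # True drives the door output to unlock

def normalise_uid(uid: str) -> str:
    """Hypothetical normalisation: readers may report UIDs with spaces, colons or mixed case."""
    return uid.replace(" ", "").replace(":", "").upper()

def check_card(uid: str, authorized_uids: frozenset) -> AccessResult:
    """Grant access only to registered cards; everything else stays locked.

    authorized_uids must already be stored in normalised form."""
    if normalise_uid(uid) in authorized_uids:
        return AccessResult(led="green", unlock=True)
    return AccessResult(led="red", unlock=False)
```

Keeping the decision in a pure function separates the access policy from the pin-level output, so the same check can drive `CustomWrite(Dcard, ...)` and the LED on the device.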

4 Conclusion In this work, the security of IoT systems is enhanced with an email alert system integrated with a webcam and a motion sensor that detects the infrared radiation emitted by the human body, so movement inside the smart house can be detected. This work also shows how an RFID card reader can be used to unlock the house in a smart way: users must have a valid card to enter, and those without one cannot get into the smart house. The system helps keep the house safe from thieves and from unauthenticated persons who want to enter the home with bad intentions. To secure all these IoT devices, this work implements multilayered security: every packet transferred to or requested from the local network side is analyzed, and all activities performed over the network are monitored and reported, which helps secure the smart house from cyber-attackers.

References
1. Smith A (2011) Development of a simulated internet for education. Research in learning technology. Archived from the original on 2017-08-16 (31 August 2011)
2. Expósito J, Trujillo V, Games E (2010) Using visual educational tools for the teaching and learning of EIGRP. In: Proceedings of the world congress on engineering and computer science, vol I. Retrieved 27 Aug 2018
3. Frezzo DC, Behrens JT, Mislevy RJ (2018) Simulation-based environment for a community of instructors: design patterns for learning and assessment. Archived from the original on 2017-08-16. Retrieved 26 August 2018
4. Zhang Y, Liang R, Ma H (2012) Teaching innovation in computer network course for undergraduate students with packet tracer. In: IERI Procedia
5. Ogheneovo EE, Kio IS (2014) Modeling network router, switches and security using Cisco and OPNET simulation software
6. Ravi Chandra ML, Varun Kuma B, Suresh Babu B (2017) International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS-2017)
7. Javid SR (2018) Role of packet tracer in learning computer networks. Int J Adv Res Comput Commun Eng
8. Trabelsi Z, Saleous H (2019) Exploring the opportunities of Cisco packet tracer for hands-on security courses on firewalls. In: IEEE Global Engineering Education Conference (EDUCON)
9. Wisconsin couple describes the chilling moment that a hacker cranked up their heat and started talking to them by using Google Nest camera. www.businessinsider.com. 25 Sept 2019

Chapter 23

Manipulating Muscle Activity Data from Electromyography for Various Applications Using Artificial Intelligence Piyush Agrawal, Apurva Joshi, and Shailesh Bendale

1 Introduction Electromyography (EMG) is a technique for studying the electrical signals produced when the nervous system activates muscles during movement tasks such as flexing, contracting, or relaxing the muscles of the body [1]. The nervous system has two divisions: the central nervous system, which analyzes information, and the peripheral nervous system, which carries signals to the central nervous system. Neurons are the building blocks of the nervous system [2]. Electrical signals are generated when the central nervous system signals the motor neurons to move the muscles [3]. Different movements generate different electrical signals, which can be collected from the surface of the skin and analyzed to isolate the signal pertaining to each movement. Deep learning has achieved state-of-the-art performance in many time series classification tasks [4]. Because time series data is sequential, sequential models such as RNNs, GRUs, and LSTMs can give good results [5]; CNNs, however, are also promising candidates for neuronal signal data [6]. When neuronal data such as EMG or EEG is processed with signal processing software such as EMGLab, the continuous signal is displayed as an amplitude graph that can be used to detect distinct features in the incoming signal. Below, we show how the EMG signals of a patient with ALS differ from those of a healthy person. CNNs are generally used in deep learning for image tasks, but a 1-D CNN is well suited to neuronal data: instead of converting the continuous signal into an image and then applying a CNN, we pass the signal directly as input to a 1-D CNN. Since the signal's form can already be visualized as an amplitude graph, the numerical data is already distinctive, and the CNN filters can learn these distinctions during training; this saves the steps of generating images before passing the data to the model.

P. Agrawal (B) · A. Joshi, University of Sussex, Brighton, UK. S. Bendale, Savitribai Phule Pune University, Pune, India. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_23

2 Collecting Data from Non-invasive Electrodes Non-invasive (surface) electrodes are conductors that measure electrical activity on a surface or region of the skin without penetrating it. They have many uses, one of which is recording neuromuscular electrical activity from the surface of the skin [7]. Surface electrodes interface with the skin through an electrolyte between the skin and the electrode. Several considerations apply when placing the electrodes. A major concern when recording EMG signals is noise, i.e., electrical activity from nearby sources picked up within the area of the electrodes. Several EMG acquisition methods address this, of which bipolar and multipolar electrode configurations give the best results. A bipolar configuration uses two electrodes placed a couple of centimeters apart [8] (Fig. 1). A differential amplifier cancels the noise common to both electrodes and outputs their difference multiplied by the gain:

$$V_{\text{out}} = (V_2 - V_1) \times \text{Gain}$$

Measuring the potential difference between a pair of electrodes can also give the conduction velocity of the signal traveling over the distance between them [9].

Fig. 1 Bipolar electrode
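A small numerical illustration of why the bipolar configuration suppresses common noise: both electrodes pick up the same interference (here an assumed 50 Hz mains component), which cancels in V2 − V1, while the muscle signal, which differs between the two sites, survives and is amplified. The signal shapes, amplitudes and the gain of 1000 are arbitrary assumptions, and the amplifier is ideal; a real amplifier cancels common-mode noise only as far as its common-mode rejection ratio allows.

```python
import numpy as np

def differential_output(v1: np.ndarray, v2: np.ndarray, gain: float) -> np.ndarray:
    """Ideal differential amplifier: Vout = (V2 - V1) * Gain."""
    return (v2 - v1) * gain

fs = 1000                                    # sampling rate in Hz (assumed)
t = np.arange(0, 1, 1 / fs)
mains = 0.5 * np.sin(2 * np.pi * 50 * t)     # interference seen equally by both electrodes
muscle = 1e-3 * np.sin(2 * np.pi * 80 * t)   # toy muscle signal, stronger at electrode 2
v1 = mains + 0.2 * muscle
v2 = mains + 1.0 * muscle

vout = differential_output(v1, v2, gain=1000.0)
# The mains term cancels; what remains is 0.8 * muscle * 1000 = 800 * muscle.
```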

23 Manipulating Muscle Activity Data from Electromyography for Various …


2.1 sEMG Signal Analysis One of the main challenges in working with sEMG signals is accurately differentiating between distinct movements. sEMG signals have a variable time period and frequency, so a single Fourier transform over the whole recording cannot show how the frequency content changes over time. The short-time Fourier transform (STFT) is used instead: the signal is divided into smaller segments, and the Fourier transform is computed for each segment. The spectrogram of the resulting transforms is obtained as

$$\text{spectrogram}(x(t), w(t)) = \left|\mathrm{STFT}_x(t, f)\right|^2, \qquad \mathrm{STFT}_x(t, f) = \int_{-\infty}^{\infty} x(u)\, w(u - t)\, e^{-j2\pi f u}\, du$$

where x(t) is the signal, w(t) is the window function, t is time, and f is frequency [10]. Although the STFT narrows the variability of the signal within each segment, it still cannot adapt to frequency changes inside a segment. The wavelet transform offers high time and frequency resolution, which makes it better suited to discontinuous signals than the Fourier transform, whose resolution is only in the frequency domain. The wavelet transform expresses the processed signal in terms of scaled and translated copies of a mother wavelet ϕ(t):

$$X(a, b) = \int_{-\infty}^{\infty} x(t)\, \phi\!\left(\frac{t - b}{a}\right) dt$$

Here, x(t) is the real signal, a is the scale, b is the translation, and ϕ(t) is the chosen mother wavelet. Time-domain statistics are another route to sEMG signal analysis: a set of features is computed for each segment and used for signal classification [11]. Because specific muscles control distinct movements, precisely placed electrodes can either record the electrical activity of a movement or stimulate it. Table 1 lists electrode placements on various parts of the body that capture specific muscle movements [12, 13]. The electrical data can be passed as input to a deep learning model, and the classified action it outputs can be mapped to a function or task to be performed.
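The two analysis routes above, a short-time spectrogram and per-segment time-domain statistics, can be sketched with NumPy. Assumptions: a Hann window of 256 samples with a hop of 128, and the feature set MAV, RMS, waveform length and zero crossings, which is a common choice for sEMG but not necessarily the set used in [11]. Each frame's FFT measures time from the frame start rather than with the absolute-time kernel e^{−j2πfu}; this changes only the phase of the STFT, so |STFT|² is unaffected.

```python
import numpy as np

def spectrogram(x, fs, win_len=256, hop=128):
    """Discrete |STFT|^2 with a Hann window (window length and hop are assumptions)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(win_len)
    starts = np.arange(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # shape (n_frames, win_len // 2 + 1)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)         # bin centres in Hz
    times = (starts + win_len / 2) / fs                  # frame centres in seconds
    return times, freqs, spec

def time_domain_features(segment):
    """A few widely used per-segment sEMG statistics."""
    seg = np.asarray(segment, dtype=float)
    return {
        "MAV": float(np.mean(np.abs(seg))),          # mean absolute value
        "RMS": float(np.sqrt(np.mean(seg ** 2))),    # root mean square
        "WL": float(np.sum(np.abs(np.diff(seg)))),   # waveform length
        "ZC": int(np.sum(seg[:-1] * seg[1:] < 0)),   # zero crossings (sign changes)
    }
```

For a 2 s recording sampled at 1000 Hz, these settings give 14 frames of 129 frequency bins spaced about 3.9 Hz apart; a shorter window improves time resolution at the cost of frequency resolution, which is the trade-off the wavelet transform avoids.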

3 Applications One of the most prominent uses of electromyography is gaining insight into motor neuron, neuromuscular, and neurodegenerative disorders such as amyotrophic lateral sclerosis (ALS), myopathy, Parkinson's disease, and Huntington's chorea. Machine learning and deep learning could support the diagnosis of these disorders through analysis of EMG signals.

Table 1 Electrode placements corresponding to distinct movements

| Category | Region | Electrode position | Movement |
|---|---|---|---|
| Hand | Bicep/shoulder | Anterior deltoid and middle deltoid | Flexion and abduction of whole hand |
| Hand | Tricep/elbow | Middle tricep and elbow joint | Elbow extension |
| Hand | Forearm | Distal to lateral condyle and tinnitus area of forearm | Wrist extension |
| Hand | Forearm | Medial condyle and flexor surface of tendons | Wrist flexion |
| Hand | Forearm | Middle forearm below wrist and tinnitus portion | Finger flexion |
| Leg | Hip | Gluteus maximus | Hip extension |
| Leg | Thigh | Over patella at distal medial and proximal lateral (vastus lateralis) | Knee extension |
| Leg | Thigh | Popliteal fossa at distal medial and proximal about midline (biceps femoris) | Knee flexion |
| Leg | Calf | Over muscle belly of anterior tip close to tibia, running down the shank | Ankle dorsiflexion |
| Leg | Calf | Medial and lateral gastrocnemius and soleus | Ankle plantar flexion |

Predicting ALS using deep learning: We show how deep learning can analyze EMG signals to differentiate between a healthy person and a patient with ALS. ALS, also known as Lou Gehrig's disease, is a neurodegenerative disease in which the neurons responsible for voluntary movement deteriorate and eventually die. As the nerve cells break down, the patient loses motor function. Symptoms include muscle stiffness, muscular weakness, cramping, muscle loss, fatigue, and vocal cord spasm [14–17]. For this task, we use convolutional neural networks (CNNs). CNNs decode complex patterns in an image by mathematically analyzing the spatial arrangement of its pixels through convolutions, which makes them well suited to image processing and computer vision. The key layers of a CNN are convolutional and pooling layers arranged sequentially: each pooling layer follows a convolutional layer, and the stack ends in the output layer (Fig. 2). The dataset used here is provided by EMGLab and consists of EMG signals recorded from ALS patients and healthy people. The control group comprised four females and six males aged between 21 and 37 years who had no history of neuromuscular disorders and were in good physical condition, while the ALS group comprised four females and four males aged between 35 and 67 years [18]. The electrical signals from the biceps brachii were measured for each class of subjects (patients with ALS and healthy people) during isometric contraction of the muscle. We compiled a set of 2860 images: a training set of 2522 images (1261 each for ALS and normal signals) and a test set of 338 images (169 each for ALS and normal). The amplitude scale is −500 to 500 µV, and the time step is 0.5 s for each image. Figure 3 shows sample images from the dataset. The images were preprocessed using VGG16 and then resized to 185 × 870 pixels (Table 2). The model has three 2-D convolutional layers with 32, 64, and 128 filters of size 3 × 3, 5 × 5, and 7 × 7, respectively, each followed by a 2-D max pooling layer of size 2 × 2 with a stride of 2. Table 3 shows the results obtained with our CNN model, and Fig. 4 shows the training and validation accuracy for the first fold. For comparison, Sengar et al. [19] used an ANN classifier and achieved an accuracy of 92.50%. Artificial intelligence has considerable potential in medical applications. EMG analysis is used to diagnose neuromuscular disorders, and we believe that pairing the two could provide affordable, automated, and quicker assessment. We have shown how deep learning can analyze the electrical data generated by muscle movement to detect whether a person has ALS; other motor neuron diseases (MNDs) and neurodegenerative disorders could be predicted in a similar way.

Fig. 2 Convolution neural network architecture

Predicting hand gestures from EMG data: Because different gestures generate different electrical signals, gestures can be used to convey commands to a machine. We implement a time series projection model using a one-dimensional convolutional neural network.
Time series projection, or forecasting, analyzes temporal data, i.e., data points captured at equal time intervals.


Fig. 3 a Sample of EMG signals of an ALS patient, b sample of EMG signals without any motor neuron disease

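Table 2 CNN model summary

| No | Layer (type) | Filters/units | Size | Strides |
|---|---|---|---|---|
| 1 | conv2d (Conv2D) | 32 | 3 × 3 | 1 |
| 2 | Max_pooling2d | – | 2 × 2 | 2 |
| 3 | conv2d (Conv2D) | 64 | 5 × 5 | 1 |
| 4 | Max_pooling2d | – | 2 × 2 | 2 |
| 5 | conv2d (Conv2D) | 128 | 7 × 7 | 1 |
| 6 | Max_pooling2d | – | 2 × 2 | 2 |
| 7 | Flatten | – | – | – |
| 8 | Dense (output) | 2 | – | – |

Table 3 Results after tenfold cross-validation

| Accuracy | Precision | Recall |
|---|---|---|
| 98.38% | 97.16% | 97.24% |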
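To make the Table 2 architecture concrete, the sketch below traces the feature-map shape through the layer stack and counts the weights of each trainable layer. Assumptions, none of which are stated explicitly in the text: "valid" (no) padding, which is the Keras default for these layers; 3 input channels, as expected by VGG16-style preprocessing; a 185 × 870 input taken as height × width; and 2 output units for the dense layer.

```python
def conv_out(n: int, k: int, stride: int = 1) -> int:
    """Output length along one axis for a 'valid' (unpadded) convolution or pooling."""
    return (n - k) // stride + 1

def trace_table2(height=185, width=870, channels=3, n_classes=2):
    """Return final (h, w, c), flattened size, and per-layer parameter counts."""
    layers = [("conv", 32, 3), ("pool", None, 2),
              ("conv", 64, 5), ("pool", None, 2),
              ("conv", 128, 7), ("pool", None, 2)]
    h, w, c = height, width, channels
    params = []
    for kind, filters, k in layers:
        if kind == "conv":
            params.append(k * k * c * filters + filters)   # kernel weights + biases
            h, w, c = conv_out(h, k), conv_out(w, k), filters
        else:                                              # 2 x 2 max pooling, stride 2
            h, w = conv_out(h, k, stride=k), conv_out(w, k, stride=k)
    flat = h * w * c
    params.append(flat * n_classes + n_classes)            # dense output layer
    return (h, w, c), flat, params
```

Under these assumptions the final feature map is 18 × 104 × 128, so the flatten layer produces 239,616 values; the dense output layer (479,234 parameters) then holds roughly half of the model's weights, more than the three convolutional layers combined.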

A 1-D CNN takes a one-dimensional sequence as input, in this case the temporal sEMG signal data. Its filters are learned during training and identify features in the data; by convention, each filter is shorter than the signal.
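The filter-sliding operation can be sketched directly in NumPy. The sketch assumes a centered filter of odd length 2p + 1, indexed from −p to p, and computes y_m = Σ_{k=−p}^{p} x_{m−k} w_k, treating samples outside the signal as zero, so the output has the same length as the input. Under those assumptions it matches `np.convolve(x, w, mode="same")`. Note that Conv1D layers in deep learning frameworks compute cross-correlation (the filter is not flipped); since the filter weights are learned, this makes no practical difference.

```python
import numpy as np

def conv1d_centered(x, w):
    """y[m] = sum_{k=-p}^{p} x[m - k] * w[k], with zeros outside the signal.

    w is stored as an array of odd length 2p + 1, where w[k + p] holds tap k."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if len(w) % 2 == 0:
        raise ValueError("filter length must be odd (2p + 1)")
    p = len(w) // 2
    y = np.zeros_like(x)
    for m in range(len(x)):
        for k in range(-p, p + 1):
            if 0 <= m - k < len(x):        # zero padding at both ends
                y[m] += x[m - k] * w[k + p]
    return y
```

For example, with x = [1, 2, 3, 4, 5] and w = [1, 0, −1] (so w_{−1} = 1, w_1 = −1), each output is x_{m+1} − x_{m−1}, giving [2, 2, 2, 2, −4]; the last value reflects the zero padding beyond the end of the signal.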


Fig. 4 Training versus validation accuracy graph for the first fold

The 1-D convolution is

$$y_m = \sum_{k=-p}^{p} x_{m-k}\, w_k$$

where k indexes the position along the filter, w is the filter, a real-valued sequence of length 2p + 1, x is the input signal, and y is the result; x and y both have length N, with indices 0 to N − 1. To make sliding the filter easier, the filter is taken to have an odd number of elements, so that each value of the signal lines up with a value of the result; hence, w extends from −p to p. The convolution is followed by a pooling layer, which extracts the dominant features of the signal, and a flatten layer, which converts the result into a 1-D vector. Finally, the model contains dense layers and the output layer (Fig. 5). The dataset is CapgMyo, which contains the electrical activity of the forearm muscles during a set of hand gestures, recorded with an array of 8 × 16 electrodes placed around the right forearm of 23 subjects aged between 23 and 26; the signals are sampled at 1000 Hz and band-pass filtered at 20–380 Hz. We use the subset labeled "DBa", recorded from 18 subjects performing 8 isometric and isotonic hand gestures, each held for 3–6 s and repeated 10 times [20]. The data for each trial of a gesture for one subject is a 1000 × 128 matrix, giving 10,000 × 128 for each gesture, 80,000 × 128 for all the gestures of one subject, and 1,440,000 × 128 for all 18 subjects. As in the original paper [21], we use the odd trials of each subject as the training set and the even trials as the test set.

Fig. 5 One-dimensional convolution neural network model architecture

Table 4 1-D CNN model summary

| L. no | Layer (type) | Filters/units | Size | Strides |
|---|---|---|---|---|
| 1 | conv1d (Conv1D) | 64 | 3 | 1 |
| 2 | Max_pooling1d | – | 2 | 2 |
| 3 | conv1d (Conv1D) | 32 | 5 | 1 |
| 4 | Max_pooling1d | – | 2 | 2 |
| 5 | Flatten | – | – | – |
| 6 | Dense | 928 | – | – |
| 7 | Dense | 512 | – | – |
| 8 | Dense (output) | 8 | – | – |

We propose a time series projection model for this task; Table 4 summarizes the model, and Fig. 6 shows the eight gestures. In the original paper [21], the authors tested on DBa using instantaneous sEMG images, i.e., grayscale images of the two-dimensional arrangement of the instantaneous sEMG signals at each sample, and difference images, i.e., the temporal difference between two consecutive sEMG images. Without majority voting over the recognition results, they achieved accuracies of 89.3% and 84.6%, respectively (Table 5). We trained our time series projection model for about 300 epochs with the Adam optimizer at a learning rate of 0.001; training took about 70 min on a Tesla P100 GPU, and the model achieved an accuracy of 99.16% without majority voting over the results. For comparison with traditional machine learning algorithms, we trained a random forest classifier on the same dataset, which achieved an accuracy of 54%, higher than SVM and KNN. These results indicate that our model outperforms the compared approaches, and that the extra steps of arranging the EMG values into a 2-D grid, converting them into images, and majority voting are not required.
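The trial-based split can be sketched as follows. Assumptions: the recordings are held in a dictionary keyed by (subject, gesture, trial), each value is a (frames × 128) array as described for DBa, trials are numbered from 1, and every frame inherits its gesture label. These data-structure choices are hypothetical; only the odd/even rule comes from the text.

```python
import numpy as np

def split_by_trial(recordings):
    """Odd-numbered trials -> training set, even-numbered trials -> test set.

    recordings maps (subject, gesture, trial) to an array of shape (frames, 128);
    trials are assumed to be numbered from 1. Every frame is labelled with its gesture."""
    train_x, train_y, test_x, test_y = [], [], [], []
    for (_subject, gesture, trial), frames in sorted(recordings.items()):
        xs, ys = (train_x, train_y) if trial % 2 == 1 else (test_x, test_y)
        xs.append(frames)
        ys.append(np.full(len(frames), gesture))
    return (np.concatenate(train_x), np.concatenate(train_y),
            np.concatenate(test_x), np.concatenate(test_y))
```

With the full DBa dimensions (1000 frames per trial, 10 trials, 8 gestures, 18 subjects), each half of this split would hold 720,000 × 128 frames. Splitting by trial rather than by random frames keeps highly correlated neighbouring frames of one repetition from appearing in both sets.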


Fig. 6 Hand gestures: a thumb up, b abduction of all fingers, c extension of index and middle, flexion of the others, d extension of index and middle, flexion of the others, e extension of index and middle, flexion of the others, f pointing index, g thumb opposing base of little finger, h abduction of extended fingers [20]

Table 5 Result comparison (no majority voting)

| Model | Description | Accuracy (%) |
|---|---|---|
| Proposed model | 1-D time series CNN | 99.16 |
| Geng et al. | Instantaneous sEMG image | 89.3 |
| Geng et al. | Difference sEMG image | 84.6 |
| GRU | Sequence model | 93.58 |
| Random forest | Vanilla ML model | 54 |

The recognized gestures can be easily used as triggers for executing certain functions such as opening a program or using system controls.
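A minimal sketch of such a trigger, assuming the classifier emits one gesture label per signal window. To avoid firing on isolated misclassifications, the sketch (an added assumption, not part of the method above) triggers an action only after the same gesture has been predicted for `hold` consecutive windows, and does not repeat it until a different gesture has been recognised. The gesture numbers and the actions in the test are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Optional

class GestureTrigger:
    """Fire the action mapped to a gesture once it is predicted for `hold`
    consecutive windows; fire again only after a different gesture has fired."""

    def __init__(self, actions: Dict[int, Callable[[], None]], hold: int = 3):
        self.actions = actions
        self.hold = hold
        self.recent = deque(maxlen=hold)
        self.last_fired: Optional[int] = None

    def update(self, gesture: int) -> Optional[int]:
        """Feed one prediction; return the gesture if it fired, else None."""
        self.recent.append(gesture)
        stable = len(self.recent) == self.hold and len(set(self.recent)) == 1
        if not stable or gesture == self.last_fired:
            return None
        self.last_fired = gesture
        action = self.actions.get(gesture)
        if action is not None:
            action()
        return gesture
```

A larger `hold` suppresses more spurious triggers but adds latency of `hold` windows before each command executes.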

4 Conclusion In this paper, we reviewed sEMG signal acquisition and analysis. We implemented a CNN model to classify motor neuron disease and a time series projection model using 1-D convolution to classify 8 different gestures and map them to functions. In our experiments, we achieved an accuracy of 99.16% on gesture classification and 98.38% on motor neuron disease classification. Other applications can be developed using electrical data from the other muscles listed in Table 1. For instance, an AI model trained on electrical data collected during knee flexion and extension could control the movement of characters in a game or in VR worlds. Similarly, applications for people with disabilities can be developed using electrooculography (EOG) data collected from eye movements. Horizontal eye movements can be recorded with electrodes placed on the right and left outer canthus, and vertical movements with electrodes positioned below the left and right eye on the infraorbital ridge [22]; such recordings can be used to train ML/DL models that map blinks and eye movements to functions such as typing and speech synthesis. A database of more gestures, such as a sign language database, could be used to build a real-time gesture-to-speech armband. Our experiments show promising results, which suggest scope for such commercial applications. We hope to explore further applications in future work.

References
1. Qi J, Jiang G, Li G, Sun Y, Tao B (2019) Surface EMG hand gesture recognition system based on PCA and GRNN. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04142-8
2. National Institutes of Health (US); Biological Sciences Curriculum Study. NIH Curriculum Supplement Series [Internet]. Bethesda (MD): National Institutes of Health (US); 2007. Information about the Brain. Available from: https://www.ncbi.nlm.nih.gov/books/NBK20367/
3. Sheng R, Zhang Z (2019) A hand gesture recognition using single-channel electrodes based on artificial neural networks. In: 2019 IEEE 8th joint international Information Technology and Artificial Intelligence Conference (ITAIC). https://doi.org/10.1109/itaic.2019.8785724
4. Ismail Fawaz H, Forestier G, Weber J et al (2019) Deep learning for time series classification: a review. Data Min Knowl Disc 33:917–963. https://doi.org/10.1007/s10618-019-00619-1
5. Yu W, Kim IY, Mechefske C (2021) Analysis of different RNN autoencoder variants for time series classification and machine prognostics. Mech Syst Signal Process 149:107322. https://doi.org/10.1016/j.ymssp.2020.107322
6. Cheng Y, Li G, Yu M et al (2021) Gesture recognition based on surface electromyography feature image. Concurrency Computat Pract Exper 33:e6051. https://doi.org/10.1002/cpe.6051
7. Srinivasa MG, Pandian PS (2017) Dry electrodes for biopotential measurement in wearable systems. In: 2017 2nd IEEE international conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT). https://doi.org/10.1109/rteict.2017.8256600
8. Jamal MZ (2012) Signal acquisition using surface EMG and circuit design considerations for robotic prosthesis. In: Naik GR (ed) Computational intelligence in electromyography analysis: a perspective on current applications and future challenges. IntechOpen. https://doi.org/10.5772/52556
9. Lamkin-Kennard KA, Popovic MB (2019) Sensors: natural and synthetic sensors. Biomechatronics 81–107. https://doi.org/10.1016/b978-0-12-812939-5.00004-5
10. Chen F, Wu L, Zheng (2020) Hand gesture recognition using compact CNN via surface electromyography signals. Sensors 20(3):672. https://doi.org/10.3390/s20030672
11. Nayana BR, Geethanjali P (2017) Analysis of statistical time-domain features effectiveness in identification of bearing faults from vibration signal. IEEE Sens J 17(17):5618–5625. https://doi.org/10.1109/jsen.2017.2727638
12. Axelgaard Education Chapter 1: electrode placement and functional movement. Available from: https://www.axelgaard.com/Education/Electrode-Placement-and-Functional-Movement
13. EMG Practicum 1: Electrode location and placement. Available from: https://www.dnbm.univr.it/documenti/OccorrenzaIns/matdid/matdid174356.pdf
14. Motor Neuron Diseases Fact Sheet. NINDS, Publication date: August 2019. NIH Publication No. 19-NS-5371. Available from: https://www.ninds.nih.gov/Disorders/Patient-Caregiver-Education/Fact-Sheets/Motor-Neuron-Diseases-Fact-Sheet#3144_5
15. Huntington's Disease: Hope Through Research. NINDS, Publication date August 2020. NIH Publication No. 20-NS-19. Available from: https://www.ninds.nih.gov/Disorders/Patient-Caregiver-Education/Hope-Through-Research/Huntingtons-Disease-Hope-Through
16. Parkinson's Disease: Challenges, Progress, and Promise. NINDS. September 30, 2015. NIH Publication No. 15-5595. Available from: https://www.ninds.nih.gov/Disorders/All-Disorders/Parkinsons-Disease-Challenges-Progress-and-Promise
17. Myopathy Information Page | National Institute of Neurological Disorders and Stroke. Available from: https://www.ninds.nih.gov/disorders/all-disorders/myopathy-information-page
18. Nikolic M (2001) Detailed analysis of clinical electromyography signals: EMG decomposition, findings and firing pattern analysis in controls and patients with myopathy and amyotrophic lateral sclerosis. PhD Thesis, Faculty of Health Science, University of Copenhagen (the data are available as dataset N2001 at http://www.emglab.net)
19. Sengar N, Dutta MK, Travieso CM (2017) Identification of amyotrophic lateral sclerosis using EMG signals. In: 2017 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (UPCON), pp 468–471. https://doi.org/10.1109/UPCON.2017.8251093
20. Du Y, Jin W, Wei W, Hu Y, Geng W (2017) Surface EMG-based inter-session gesture recognition enhanced by deep domain adaptation. Sensors 17(3):458
21. Geng W, Du Y, Jin W, Wei W, Hu Y, Li J (2016) Gesture recognition by instantaneous surface EMG images. Sci Rep 6(1). https://doi.org/10.1038/srep36571
22. Sullivan LR (1993) Technical tips: eye movement monitoring. Am J EEG Technol 33(2):135–147. https://doi.org/10.1080/00029238.1993.11080442

Chapter 24

A Framework for Automatic Wireless Alert System for Explosive Detection Ankita Chandavale, C. Anjali, Sunita Jahirbadkar, and Niketa Gandhi

1 Introduction Attackers can conceal threats to national security, such as narcotics, explosives, and special nuclear material (SNM), in hidden compartments such as small bags and vehicles, especially in crowded places [1]. Such threats cost lives and harm peace and economic conditions. The security of our nation is of prime importance, and it is essential to secure high-risk locations such as government buildings, banking institutions, defense services, and power stations. To address this problem, we have proposed and developed a mobile-device-based system that senses such explosives remotely and alerts all users within the Geo-Fence as well as nearby emergency security services, in order to prevent loss of life. The literature survey shows a pressing need to protect human life from terrorist threats and to maintain peace, and hence to improve national economic conditions. Vulnerabilities in existing systems create many threats and risks. Many researchers have developed explosive detection technologies and systems for different applications, but these have been reported to have problems and limitations [2–5]. Janet Reno has developed explosives detection systems for airports nationwide. A method based on firing newly designed bullets at objects was devised in Florida, USA [6]. A technique based on emitting red light was developed by Joshua Sundram [5]. Knowing whether suspicious objects contain high explosives is essential and beneficial for defense systems. A featherweight instrument that detects very small quantities of explosives from a short distance was developed and tested in Europe; its sensitivity, precision, and robustness must be improved before it can be used. To detect antipersonnel landmines and improvised explosive devices with sufficient speed, contrast, and spatial resolution, Defense R&D Canada is developing a coded-aperture X-ray backscatter imaging detector [6]. A detection portal system based on ion-trap mass spectrometry has been developed for detecting dangerous chemicals with high vapor pressure (such as TATP and flammable liquids) in crowded places; the portal is designed to screen 1200 persons per hour [4]. There is thus a need for a sensor-based system that finds explosive materials in crowded places such as shopping malls, railway stations, and airports; to avoid loss of life, such systems must detect explosives hidden in bags, sealed containers, and so on. A system based on a flame sensor with wireless communication has been designed [7], but it is restricted to fire detection. Non-intrusive camera-based sensors for explosive detection need to be placed above the object, at an angle perpendicular to it [8, 9]. India is ranked 135th among 163 countries in the Global Peace Index 2021 report and ranks among the 25 most violent countries in the world [10]. Given these constraints, a system is needed that overcomes them and addresses critical requirements for explosive detection alerts: remote operation, speed, efficiency, and low cost.

A. Chandavale (B) · S. Jahirbadkar, Department of Computer Engineering, MKSSS's Cummins College of Engineering for Women (CCOEW), Pune, India. C. Anjali, SMIEEE, Pune, India. N. Gandhi, SMIEEE, Toronto, Canada. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_24

2 Methodology The framework proposed in this paper is a step toward such a system within our own country. Past incidents show that terrorists and attackers are quite difficult to trace, and the current time-consuming manual process lets them escape for lack of fast connectivity. The wireless alert system for explosive detection detects explosive materials and alerts the nearby security controller, police stations, and concerned higher authorities through a mobile device. It identifies the attacker, tracks their location, and informs the nearby security controller within a few seconds. The system thus provides socially relevant security, connected with the responsible organizations, and so supports prosperous economic conditions. In summary, the mobile device system senses such explosives and sends early alerts and notifications to all users within the vicinity and to the concerned emergency services outside the boundaries of the Geo-Fence, in order to prevent loss of life. The framework comprises both software and hardware. An algorithm was designed and developed to create the Geo-Fence from four access points installed in college labs, and further algorithms handle authentication and the sending of alert messages and notifications. The following sections describe the methodology steps in detail.

2.1 Geo-Fence Creation

The Geo-Fence is created from the overlapping coverage of omni-directional access points, which is a cost-effective solution [11]. All access points used for the Geo-Fence have the same specifications. They are installed in the college lab with unique SSIDs, which are registered in the software. As soon as a user enters the Geo-Fence, i.e., the college lab, the device connects automatically to one of these access points and the user is validated by the authentication procedure.

Algorithm for Geo-Fence creation:

1. Enable Wi-Fi on the mobile (user) device.
2. Set the flag APfound = False and APCounter = N, where N is the number of access points used for the Geo-Fence.
3. If a recognized SSID matches any of the SSIDs stored in the central database, set APfound = True and switch the mobile device camera ON; else, go to step 2.
4. Call the authentication procedure.
5. Stop.
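The Geo-Fence steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `scan` callback (returning the SSIDs currently visible to the phone), the `authenticate` callback and the `max_scans` bound are hypothetical, and the bound replaces the unbounded "go to step 2" loop so that the sketch always terminates.

```python
from typing import Callable, Collection, Iterable, Optional


def enter_geofence(
    scan: Callable[[], Iterable[str]],
    registered_ssids: Collection[str],
    authenticate: Callable[[str], bool],
    max_scans: int = 10,
) -> bool:
    """Hypothetical sketch of the Geo-Fence algorithm (steps 1-5).

    `scan` stands in for the phone's Wi-Fi scan (step 1 assumes Wi-Fi is on);
    `registered_ssids` are the N access-point SSIDs from the central database.
    """
    ap_counter = len(registered_ssids)  # APCounter = N
    if ap_counter == 0:
        return False
    for _ in range(max_scans):
        ap_found = False  # step 2: reset the flag before each scan
        matched: Optional[str] = None
        for ssid in scan():
            if ssid in registered_ssids:  # step 3: SSID registered in the database
                ap_found, matched = True, ssid
                break
        if ap_found:
            # Step 3 would also switch the camera on; here we go straight to step 4.
            return authenticate(matched)
    return False  # step 5: stop, no registered access point was seen
```

For example, a phone that first sees only a home network and then `LAB-AP-2` is passed to authentication on the second scan; a phone that never sees a registered SSID is rejected after `max_scans` attempts.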

2.2 Authentication

The system is aimed at particular physical boundaries so as to identify unauthenticated users within them. For this reason, a Geo-Fence, i.e., a virtual perimeter around a real geographic area, is created. Existing authentication techniques, such as integrated Windows authentication, basic authentication and multifactor authentication, secure a device or an area using knowledge, possession and inherence factors. Because of their compact size and secure wireless communication, mobile devices have become an integral part of human life, but using them to validate users is a complex task. We have developed an algorithm based on multifactor authentication. The two-step authentication algorithm is based on recognition of the user's facial image and on the IMSI number of the user's mobile device. The image received from the user's mobile device is compared with the images stored in the central database for face recognition. The images in the database have the IMSI number as the primary key, which is used to select the stored image for comparison. For facial recognition, the local binary pattern histogram (LBPH) algorithm is used for feature extraction [12]. Storing images in the database takes more memory than storing vectors, so the LBPH algorithm generates a local binary pattern vector for each image, which is stored in the database with the IMSI number as the primary key. The facial image received from the mobile device is a 32-bit color .png image. It is converted into a 200 × 200 grayscale image, which is divided into 25 zones, each with 8 × 8 dimensions. The circular neighborhood algorithm is used to calculate the LBP code, or vector, for each zone. For each pixel, the algorithm first reads the intensity of the central pixel. Each neighbor is assigned the value 1 if its intensity is greater than or equal to that of the central pixel, and 0 otherwise. A binary value is thus assigned to each neighbor in all zones, which generates the LBP vector or code, and the byte code of the image is formed from the set of LBP codes of the 25 zones. The algorithm implemented is as follows:

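1. Divide the image into 25 zones, each of 8 × 8.
2. Calculate the LBP code of the central pixel using Eq. 1:

$$\mathrm{LBP}(x_c, y_c) = \sum_{p=0}^{P-1} 2^{p}\, s(I_p - I_c), \qquad s(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases} \tag{1}$$

where $(x_c, y_c)$ are the coordinates of the central pixel, $I_c$ is the intensity of the central pixel, $I_p$ is the intensity of the $p$-th neighbor pixel and $P$ is the number of neighbors.
3. Align the neighboring pixels on a circle, as shown in Fig. 1.
4. Calculate the position of each neighbor pixel using Eq. 2:

$$x_p = x_c + R\cos\left(\frac{2\pi p}{P}\right), \qquad y_p = y_c - R\sin\left(\frac{2\pi p}{P}\right) \tag{2}$$

where $R$ is the radius of the circle, $P$ is the number of sample points and $p = 0, \dots, P-1$.
5. If a sample point does not fall on integer image coordinates, interpolate its intensity by bilinear interpolation using Eq. 3.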

Fig. 1 Alignment of pixel

24 A Framework for Automatic Wireless …

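$$f(x, y) = \begin{bmatrix} 1 - x & x \end{bmatrix} \begin{bmatrix} f(0,0) & f(0,1) \\ f(1,0) & f(1,1) \end{bmatrix} \begin{bmatrix} 1 - y \\ y \end{bmatrix} \tag{3}$$

where $(x, y)$ is the fractional position of the sample point inside the unit square formed by its four nearest pixels.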
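Steps 1 to 5, together with the IMSI-keyed two-step check described in Sect. 2.2, can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' code. `grid=5` gives the 25 zones mentioned in the text; the radius R = 1 and P = 8 sample points are assumptions, because the paper does not state them. The chi-square histogram distance and the `threshold` are also assumptions, since the paper does not say how descriptors are compared. The `enrolled` dict is a hypothetical stand-in for the central database keyed by IMSI. Bilinear interpolation (Eq. 3) is written in the equivalent "a + t(b - a)" form, so equal neighbors interpolate exactly.

```python
import numpy as np


def bilinear(img: np.ndarray, x: float, y: float) -> float:
    """Eq. 3: bilinear interpolation at column x, row y (lerp form)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    dx, dy = x - x0, y - y0
    top = img[y0, x0] + dx * (img[y0, x1] - img[y0, x0])
    bottom = img[y1, x0] + dx * (img[y1, x1] - img[y1, x0])
    return float(top + dy * (bottom - top))


def lbp_image(gray: np.ndarray, radius: int = 1, points: int = 8) -> np.ndarray:
    """Eqs. 1-2: circular LBP code of every pixel at least `radius` from the border."""
    g = gray.astype(float)
    h, w = g.shape
    codes = np.zeros((h - 2 * radius, w - 2 * radius), dtype=np.int64)
    for yc in range(radius, h - radius):
        for xc in range(radius, w - radius):
            code = 0
            for p in range(points):
                angle = 2 * np.pi * p / points
                # Eq. 2; rounding removes float noise such as cos(pi/2) = 6e-17.
                xp = round(xc + radius * np.cos(angle), 9)
                yp = round(yc - radius * np.sin(angle), 9)
                if bilinear(g, xp, yp) >= g[yc, xc]:  # s(I_p - I_c) = 1
                    code |= 1 << p
            codes[yc - radius, xc - radius] = code
    return codes


def lbp_descriptor(gray: np.ndarray, grid: int = 5, radius: int = 1, points: int = 8) -> np.ndarray:
    """Concatenated, normalised LBP histograms of grid x grid zones."""
    codes = lbp_image(gray, radius, points)
    hists = []
    for band in np.array_split(codes, grid, axis=0):
        for zone in np.array_split(band, grid, axis=1):
            hist, _ = np.histogram(zone, bins=2 ** points, range=(0, 2 ** points))
            hists.append(hist / max(zone.size, 1))
    return np.concatenate(hists)


def chi_square(a: np.ndarray, b: np.ndarray, eps: float = 1e-10) -> float:
    """Chi-square distance between two histograms (assumed matching metric)."""
    return float(np.sum((a - b) ** 2 / (a + b + eps)))


def authenticate(imsi: str, probe_gray: np.ndarray, enrolled: dict, threshold: float) -> bool:
    """Two-step check: IMSI lookup (primary key), then LBP face match."""
    stored = enrolled.get(imsi)
    if stored is None:  # unknown device: first factor fails
        return False
    return chi_square(lbp_descriptor(probe_gray), stored) <= threshold
```

The pure-Python loops keep the correspondence with Eqs. 1 to 3 visible. For full 200 × 200 images, a vectorised implementation (or a library LBPH recognizer) would be considerably faster.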

2.3 Alarm Generator

Sensor data from the real-time detector is processed by hardware interface circuitry within the Geo-Fence. The circuitry maintains a database that tracks the location of every user in the Geo-Fence, and it transmits the data, together with the exact location of the explosive device, to all users within the vicinity. It also sends notifications and alarms to the nearby police station and security controller. The system is thus intended to help avert threats and loss of life and to support the police department in tracking suspected attackers.

3 Testing and Results

The LBP algorithm for face recognition is tested on images obtained from the Kaggle website (https://www.kaggle.com) [13]. The dataset contains 500 images with 1100 tagged faces and their bounding boxes. The recognition rate improves with the quality of the captured image. As shown in Table 1, increasing the number of zones increases the face recognition accuracy, but it also increases the storage space. As a trade-off between accuracy and space, an 8 × 8 zone matrix is used, in which the image is divided into 64 parts. The accuracy is calculated using Eq. 4:

$$\text{Accuracy} = \frac{\text{Number of images correctly recognised}}{\text{Total images}} \times 100 \tag{4}$$

Table 1 Results

Zone matrix   Accuracy (%)   Space (bytes)
5 × 5         62             198
10 × 10       73             396
15 × 15       91             587
20 × 20       93             699
25 × 25       97             824
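Eq. 4 and the accuracy/space trade-off in Table 1 can be expressed in a few lines of Python. This is a sketch for illustration: the rows are copied from Table 1, while the 90% accuracy target and the "93 of 100 images" figure in the usage example are hypothetical values, not results from the paper.

```python
from typing import List, Optional, Tuple


def recognition_accuracy(correct: int, total: int) -> float:
    """Eq. 4: percentage of test images recognised correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct * 100 / total


# (zone matrix, accuracy in %, space in bytes), copied from Table 1.
TABLE_1: List[Tuple[str, int, int]] = [
    ("5 × 5", 62, 198),
    ("10 × 10", 73, 396),
    ("15 × 15", 91, 587),
    ("20 × 20", 93, 699),
    ("25 × 25", 97, 824),
]


def smallest_zone_matrix(rows: List[Tuple[str, int, int]], min_accuracy: float) -> Optional[str]:
    """Cheapest zone matrix (fewest bytes) whose accuracy meets the target."""
    feasible = [row for row in rows if row[1] >= min_accuracy]
    if not feasible:
        return None
    return min(feasible, key=lambda row: row[2])[0]
```

For instance, 93 correctly recognised images out of 100 gives `recognition_accuracy(93, 100) == 93.0`, and a hypothetical 90% target selects the 15 × 15 matrix (587 bytes) from Table 1.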


A. Chandavale et al.

4 Conclusion

The framework for an automatic wireless alert system for explosive detection has been designed and developed. It is based on a mobile device with user authentication. A Geo-Fence has been created, and the authentication algorithm achieves an accuracy of 93%. Future work will focus on integrating authentication, Wi-Fi positioning and the detection of explosive materials with notifications and alerts.

References

1. Cheng C-T, Tse CK, Lau FCM (2011) A delay-aware data collection network structure for wireless sensor networks. IEEE Sensors J 11(3)
2. Yoshida T, Osada H, Chiba S, Kihchi T, Tayama N, Seki K, Matsuki H (1999) Construction of magnetic infrared sensor utilizing ferrimagnetic film. IEEE Trans Mag 35(5)
3. Zhang X, Schemm N, Balkır S, Hoffman MW (2014) A low-power compact NQR based explosive detection system. IEEE Sensors J 14(2)
4. Sifuentes E, Casas O, Pallas-Areny R (2011) Wireless magnetic sensor node for vehicle detection with optical wake-up. IEEE Sensors J 11(8)
5. Sundram J, Sim PP (2007) Wireless sensor networks in improvised explosive device detection. Naval Postgraduate School, Monterey, California
6. Reno J, Fisher RC, Robinson L. Guide for the selection of commercial explosives detection systems for law enforcement applications. U.S. Department of Justice, Office of Justice Programs
7. Ghosh P, Dhar PK (2019) GSM based low-cost gas leakage, explosion and fire alert system with advanced security. In: 2019 international conference on Electrical, Computer and Communication Engineering (ECCE), 7–9 Feb 2019
8. Haider S, Saeed U (2020) Explosive material detection and security alert system (e-DASS)
9. Chidella K, Saduzzaman A (2017) Prior detection of explosives to defeat tragic attacks using knowledge based sensor networks. In: 2017 ninth annual IEEE Green Technologies conference (GreenTech)
10. https://worldpopoulationreview.com/country-rankings/most-violent-countries. Retrieved on 21 Oct 2021
11. Sheth A, Seshan S, Wetherall D (2009) Geo-fencing: confining Wi-Fi coverage to physical boundaries. In: 7th international conference on pervasive computing, vol 5538. Springer, Heidelberg, pp 274–279
12. Oravec M, Pavlovicova J, Mazanec J, Omelina L, Feder M, Ban J (2011) Efficiency of recognition methods for single sample per person based face recognition. In: Corcoran P (ed) Reviews, refinements and new ideas in face recognition. InTech. ISBN: 978-953-307-368-2. https://doi.org/10.5772/18432
13. https://www.kaggle.com. Retrieved on 9 Aug 2021

Chapter 25

IoT and Deep Learning-Based Weather Monitoring and Disaster Warning System

Chandra Kant Dwivedi

1 Introduction and Literature Review

Climatic conditions play a very important role in our day-to-day life: they affect our course of action and our daily activities, so weather monitoring is important. Temperature, atmospheric pressure, humidity, moisture and the air quality index are some basic weather parameters. With the evolution of IoT and the wide range of available sensors, many weather parameters can be monitored and captured easily using existing IoT technologies [1–4]. In the past few years, weather monitoring systems have evolved considerably, helping to control various natural calamities and supporting advance preparation that reduces possible losses [5]. Weather parameters play a major role in predicting the onset of calamities, and a proper study of these parameters can reveal the causes of weather conditions and help forecast them and the onset of disasters. IoT provides cheap, robust and reliable means to monitor and collect weather parameters using existing technologies (Wi-Fi, Bluetooth, serial connections or SSH connections). In this paper, we propose a system that monitors various weather parameters and applies various deep learning algorithms to the data collected from its sensors. Using API keys and the TCP/IP protocol, the system simultaneously sends the parameters to the ThingSpeak [5] server. The data values can be stored in usable formats for further calculations and studies. The objective is to design a cost-efficient IoT-based system that monitors the weather and disasters and issues warnings in case of any abnormal situation in the region in which it is deployed. In this project, we have utilised the hardware interface capabilities of the Arduino Uno board and the computational ability of the Raspberry Pi board to apply various deep learning algorithms to the collected dataset [6]. The sensors record the climatic parameters, and a warning is sent to the concerned authorities in case of abnormal parameters or a disaster. The data is collected and stored on the Raspberry Pi SD card in CSV format. This file is used to train models that predict the weather conditions, and an alarm is triggered when needed. The aim is to combine the network integration and data collection capacity of IoT technology with the computational and prediction power of deep learning.

C. K. Dwivedi (B) Indian Institute of Information Technology, Pune, Maharashtra 411002, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_25

1.1 Technology and Software Used

IoT, WSN, Wi-Fi, TCP/IP [3], deep learning, Raspbian, Arduino IDE, PuTTY, WinSCP.

1.2 Literature Review

The Internet of Things (IoT) has changed people's lives, allowing tedious work to be done smartly. IoT is a set of interrelated machines and devices that can transmit data over an interconnected network. The devices can be of any type, such as computing devices, mechanical equipment and digital devices, and each device is provided with a unique identifier (UID) [3]. An IoT system can transfer data over a network without requiring human-to-computer or human-to-human interaction. An IoT environment comprises Web-enabled devices that use communication hardware, processors and sensors to observe, collect, send and act on data from their environment [4]. The data collected by IoT devices is sent through an IoT gateway or edge device to the cloud for further analysis. The devices can also communicate with one another and act on the information they collect from each other. IoT has transformed disaster mitigation strategies and weather monitoring systems: with sensors, microcontrollers and protocols [7], we can observe the vital climate parameters that change before or during a disaster. Over the last few years, with the evolution of heterogeneous systems and wireless sensor networks, IoT has grown at lightning speed, and its low cost and ease of communication with various devices have made it very popular. IoT devices are used in many fields, such as defence, health care, weather monitoring, disaster control and prediction, agriculture [8], automobiles, security, home automation and surveillance.


2 Methodology and Proposed System

In this project, the DHT11, MQ7 and BMP180 sensors, a rainfall sensor and a piezoelectric buzzer are used. The sensors are connected to an Arduino Uno board, which transfers the data to the Raspberry Pi through an ESP8266-1 module over TCP/IP.

2.1 Proposed System

The proposed system (Fig. 1) is a simple approach for monitoring weather conditions, namely temperature, humidity, pressure, light intensity, rainfall, fall detection and air quality levels, at a particular place and storing the parameters on the Raspberry Pi in CSV format [2]. The underlying technology is the Internet of Things (IoT), a reliable and efficient way to connect things to the Web or the Internet. A Raspberry Pi 3 and a deep learning model are used to network the nodes, store the data remotely and process it. The things are the Arduino Uno, the ESP8266 [4] module, the sensors and other computer applications. As shown in Fig. 2, the system measures weather parameters such as temperature, pressure, altitude, humidity, rainfall, light intensity and air quality with sensors, sends the information to the Raspberry Pi, processes the data to predict the situation and issues warnings through an alarm system and a cloud server. As Fig. 3 depicts, the data can be accessed from any remote location through online cloud services such as ThingSpeak [9], which acts as an intermediary; the data is also stored in a CSV file on the Raspberry Pi's SD card. The sensors collect readings at a constant interval and upload them to the cloud and to the Raspberry Pi, where a regression algorithm is run on the CSV file to forecast the weather. The application updates periodically, providing the end user with live weather parameters while storing them in a CSV file for data processing. The proposed system is cheap, robust, easy to operate and maintain, and can be deployed in a wide range of fields. It is fairly accurate and precise in predicting the weather conditions.

Fig. 1 System block diagram (A)

Fig. 2 Process flow diagram

Fig. 3 System block diagram (B)


3 Components Required

3.1 Hardware

1. Arduino Uno
2. Raspberry Pi 3 board
3. ESP8266-based Wi-Fi module
4. Temperature and humidity sensor (DHT11)
5. Barometric pressure sensor (BMP180)
6. LDR light intensity sensor
7. Raindrop module
8. CO (carbon monoxide) sensor
9. Piezoelectric buzzer
10. Printer cable

3.2 Software

1. Arduino IDE
2. Accessible Wi-Fi
3. PuTTY
4. Advanced IP address scanner
5. Raspbian OS for Raspberry Pi [7]
6. WinSCP

4 Implementation

1. The sensors are connected to the Arduino Uno.
2. The sensors send their measured values to the Arduino, which uploads all values to the ThingSpeak server, where they are analysed and monitored in graphical form.
3. The ESP8266 Wi-Fi module transmits the data to the cloud over TCP/IP using API keys. The data on the ThingSpeak server can be accessed remotely from anywhere using the API keys and the server account.
4. This node (Arduino + sensors) is connected to the Raspberry Pi 3.
5. The node sends the data not only to the cloud but also to the Raspberry Pi 3 through a netcat server over TCP/IP.
6. On the Raspberry Pi 3, the data is received over TCP/IP.
7. The netcat server stores this data in CSV format. The Raspberry Pi files can be accessed over the same network with the SSH protocol, using the IP address and the password.
8. A regression model is trained on this CSV file with a Python library to predict the output.
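Steps 2 and 3 rely on ThingSpeak's HTTP "write data" endpoint, which takes the channel's write API key and up to eight values named `field1` to `field8`. The Python sketch below only builds such a request URL, to show its shape; in the paper the request is sent from the ESP8266, and the mapping of sensors to field numbers used here is a hypothetical example.

```python
from typing import Dict, Sequence, Union
from urllib.parse import urlencode

THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"


def thingspeak_update_url(write_api_key: str, readings: Sequence[Union[int, float]]) -> str:
    """Build a ThingSpeak channel-update URL; readings map to field1, field2, ..."""
    if not 1 <= len(readings) <= 8:
        raise ValueError("a ThingSpeak channel has between 1 and 8 fields")
    params: Dict[str, Union[str, int, float]] = {"api_key": write_api_key}
    for index, value in enumerate(readings, start=1):
        params[f"field{index}"] = value  # e.g. field1 = temperature (assumed)
    return f"{THINGSPEAK_UPDATE_URL}?{urlencode(params)}"
```

Passing the resulting URL to `urllib.request.urlopen` would send the readings; this requires network access and a real write API key, so it is not shown executing here.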

4.1 Setting Up Netcat Server in Raspberry Pi

Netcat (nc) is a computer networking utility for reading from and writing to network connections using TCP/IP [10]. It can create almost any kind of connection a user could need and has a number of built-in capabilities:

1. Ability to use any local source port.
2. Ability to use any locally configured network source address.
3. Built-in port-scanning capabilities with randomisation.
4. Built-in loose source-routing capability.
5. Can read command-line arguments from standard input.
6. Slow-send mode, one line every N seconds.

Setting up netcat:

1. Open the Raspberry Pi Linux terminal over SSH using PuTTY.
2. Type nc with the port (and optionally the IP address) to listen on.
3. For example: nc -l -k 5000 (-l listens for incoming connections; -k keeps listening after a client disconnects).
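For readers who prefer to log the readings from Python rather than redirecting netcat output (for example, `nc -l -k 5000 >> weather.csv`), the standard-library `socketserver` module can play the same role. This is a sketch under assumptions: the sensor node is assumed to send one comma-separated reading per line, and the file name `weather.csv` is hypothetical.

```python
import socketserver


class CsvAppendHandler(socketserver.StreamRequestHandler):
    """Append every non-empty line received from a sensor node to the CSV file."""

    def handle(self) -> None:
        with open(self.server.csv_path, "a", encoding="utf-8") as csv_file:
            for raw in self.rfile:  # one reading per line, e.g. b"27.5,61,1008.2\n"
                line = raw.decode("utf-8").strip()
                if line:
                    csv_file.write(line + "\n")
                    csv_file.flush()  # make each reading visible immediately


class CsvServer(socketserver.ThreadingTCPServer):
    """Keeps accepting connections, like `nc -l -k`, and logs them to one file."""

    allow_reuse_address = True

    def __init__(self, address, csv_path: str) -> None:
        super().__init__(address, CsvAppendHandler)
        self.csv_path = csv_path
```

On the Raspberry Pi, `CsvServer(("0.0.0.0", 5000), "weather.csv").serve_forever()` would then listen on port 5000, as in the netcat example above. Unlike plain netcat, it also accepts several nodes at once because each connection is handled in its own thread.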

4.2 Developing the Models

1. Install Raspbian OS on the SD card using balenaEtcher.
2. Use the SSH protocol if the Lite version of Raspbian is installed.
3. Open the Linux terminal and install Python and the necessary libraries, such as NumPy, pandas, XGBoost, Matplotlib, scikit-learn and seaborn [7].
4. Once the connections are made, the data is sent to the Raspberry Pi board over TCP/IP using the netcat server [6].
5. The data is stored in CSV format. Pre-process it for the models by scaling it and removing the NaN values, then split the dataset into training and test sets [11].
6. Import the models from the libraries and feed them the data to train them.
7. Train the models one by one and analyse their accuracy and precision.
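Step 5 (remove NaN rows, split, scale) can be sketched with pandas and scikit-learn as follows. The column names in the usage example and the 80/20 split are assumptions for illustration; the scaler is fitted on the training rows only, so no statistics from the test set leak into training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def prepare_dataset(csv_source, label_column: str, test_size: float = 0.2, seed: int = 42):
    """Drop rows with NaN values, split into train/test sets and scale the features."""
    frame = pd.read_csv(csv_source).dropna()  # remove rows with missing readings
    features = frame.drop(columns=[label_column]).to_numpy(dtype=float)
    labels = frame[label_column].to_numpy()
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=test_size, random_state=seed
    )
    scaler = StandardScaler().fit(X_train)  # fit on training rows only
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test
```

With a CSV whose columns are, say, `temperature,humidity,pressure,rain`, the call `prepare_dataset("weather.csv", "rain")` returns scaled feature matrices and the matching rain labels.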

4.3 Deep Learning Models Used

1. Bagging classifier
2. KNeighbors classifier
3. Random forest classifier
4. GridSearchCV (hyperparameter search)
5. Decision tree
6. Extra Trees classifier
7. AdaBoost classifier
8. Gradient boosting classifier
9. XGBoost [4]

Fig. 4 Basic working of deep learning models

The algorithms listed above are trained and tested on the dataset after it is split into training and test sets using Python tools. Figure 4 shows the basic working of a deep learning (neural network) model. The input layer is the set of weather parameters acquired by the sensors. The hidden layers are the processing layers that perform the computation based on activation functions; depending on the architecture, the output of each neuron is fed forward to the next layer or fed back. The depth of the network determines its computational cost and speed, and the activation functions play a decisive role in the final output. The output layer gives the classification result. Each model has its own interconnected neurons and activation functions, so the accuracy and precision vary with the activation functions and hidden-layer connections. Note, however, that most of the classifiers listed above are classical machine learning models rather than neural networks: instead of hidden layers, they classify by nearest-neighbour voting, use bagging or boosting, or build trees and forests. These techniques are efficient and fast and give good results.
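A side-by-side comparison of the listed classifiers, as described above, could look like the following scikit-learn sketch. It is illustrative, not the author's training script. Default hyperparameters are used throughout. GridSearchCV, being a tuner rather than a model, is shown wrapping a KNN classifier with a hypothetical grid of k values. XGBoost is omitted because it is a separate package (`xgboost.XGBClassifier` could be added to the dictionary if installed).

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X_train, X_test, y_train, y_test, seed: int = 0) -> dict:
    """Fit each classifier and return its accuracy on the held-out test set."""
    models = {
        "Bagging": BaggingClassifier(random_state=seed),
        "KNeighbors": KNeighborsClassifier(),
        "Random forest": RandomForestClassifier(random_state=seed),
        # GridSearchCV tunes another estimator; here it searches k for KNN.
        "KNeighbors + GridSearchCV": GridSearchCV(
            KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 9]}, cv=3
        ),
        "Decision tree": DecisionTreeClassifier(random_state=seed),
        "Extra trees": ExtraTreesClassifier(random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "Gradient boosting": GradientBoostingClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
    return scores
```

Training every model on the same split makes the scores directly comparable. Running the comparison once per target condition (rain, fog, thunder) mirrors the per-condition tables in Sect. 5.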


5 Result and Conclusion

The system (Fig. 5) monitors the weather parameters and stores them. It generates alarms in case of abnormal conditions such as rain, fire and unfavourable weather. The algorithms are applied to the parameters collected by the sensors, and different algorithms give different results in predicting conditions. Tables 1, 2 and 3 show the performance of the models in predicting rain, fog and thunder, respectively (number of data samples: 100990). From Table 1, XGBoost, the decision tree classifier and the random forest classifier give the best results for predicting rain. Tables 1, 2 and 3 also show that the same algorithm does not achieve the same accuracy for different conditions, and that different algorithms give different results for the same condition; the algorithm must therefore be chosen carefully. The results depend strongly on the number of weather parameters used: more sensors provide more parameters, and hence higher accuracy can be gained. The training and test sets must be chosen carefully, with ample training and test cases, to avoid overfitting and underfitting. The models implemented in the system (Tables 1, 2 and 3) can predict specific weather conditions such as rain and fog.

Fig. 5 System developed

Table 1 Algorithms implemented and their scores for rain

Table 2 Algorithms implemented and their scores for fog


Table 3 Algorithms implemented and their scores for thunder

6 Limitations

1. The data could be acquired more accurately and precisely using more accurate and precise sensors.
2. There is room to improve the accuracy and precision of the prediction models.
3. A greater variety of parameters could be used to improve prediction accuracy.
4. Faster and more accurate algorithms could be used.

7 Scope for Future Work

The future scope of this IoT-based system lies in combining it with large-scale, accurate data processing using machine learning and deep learning algorithms, in order to predict weather conditions and disasters more accurately and earlier. Weather conditions are hard to predict and anticipate, and they can change drastically within a short period, so highly accurate and reliable systems are required that predict within safe margins. A GPS module can be added to the design so that the location is also mailed or messaged to the user along with the surrounding parameters, such as temperature, humidity, pressure and light intensity. More precise ML and deep learning algorithms can be implemented on the collected data, and much work remains to be done on predicting the weather and the onset of disasters. The correlation, and possibly causation, between the weather parameters and disasters could be studied with regression algorithms. Making use of cloud computing would further improve the application side of this project. Further work is also possible on the interaction between nodes, the number of nodes that can be connected to a single database or Raspberry Pi board, and a separate database interface for each node.

References

1. Franchi F, Marotta A, Rinaldi A, Graziosi F, D’Errico L (2019) IoT-based disaster management system on 5G uRLLC network. In: 2019 international conference on information and communication technologies for disaster management (ICT-DM), Paris, France, pp 1–4. https://doi.org/10.1109/ICT-DM47966.2019.9032897
2. Xie M, He Z, Liang L, Wu L, Sun G (2019) Slope disaster monitoring and early warning system based on 3D-MEMS and NB-IoT. In: 2019 IEEE 4th advanced information technology, electronic and automation control conference (IAEAC), Chengdu, China, pp 90–94. https://doi.org/10.1109/IAEAC47372.2019.8997685
3. Khivsara BA, Gawande P, Dhanwate M, Sonawane K, Chaudhari T (2018) IOT based railway disaster management system. In: 2018 second international conference on computing methodologies and communication (ICCMC), Erode, pp 680–685. https://doi.org/10.1109/ICCMC.2018.8487802
4. https://www.kaggle.com/radmirzosimov/eda-and-building-models-for-predicting-outflow
5. Nabil AM, Mesbah S, Sharawi A (2019) Synergy of GIS and IoT for weather disasters monitoring and management. In: 2019 ninth international conference on intelligent computing and information systems (ICICIS), Cairo, Egypt, pp 265–269. https://doi.org/10.1109/ICICIS46948.2019.9014709
6. Lehmann G, Rieger A, Blumendorf M, DAGI SA (2010) A 3-layer architecture for smart environment models/A model-based approach/labor technische University Berlin, Germany 978-1-4244-5328-3/10 © IEEE
7. Siva SR, Gupta ANPS (2016) IoT based data logger system for weather monitoring using wireless sensor networks. Int J Eng Trends Technol 32(2):71–75
8. Alif Y, Utama K, Widianto Y, Hari Y, Habiburrahman M (2019) Design of weather monitoring sensors and soil humidity in agriculture using internet of things (IoT). Trans Mach Intell Artif Intell 7(1):10–20
9. https://www.rhydolabz.com/wiki/?p=10872
10. Strigaro D, Cannata M (2019) Boosting a weather monitoring system in low income economies using open and non-conventional systems: data quality analysis. Sensors 19(5):1–22
11. Buzachis A, Fazio M, Galletta M, Celesti M, Villari M (2019) Infrastructureless IoT-as-a-service for public safety and disaster response. In: 2019 7th international conference on future internet of things and cloud (FiCloud), Istanbul, Turkey, pp 133–140. https://doi.org/10.1109/FiCloud.2019.00026

Chapter 26

Sketch-Based Face Recognition

M. Maheesha, S. Samiksha, M. Sweety, B. Sathyabama, R. Nagarathna, and S. Mohamed Mansoor Roomi

1 Introduction

Facial recognition identifies or verifies a person from their face, and it plays an important role in this fast-paced world. It is used for purposes such as unlocking phones, law enforcement, airport and border control and attendance monitoring. In law enforcement applications where no other information about the crime is available, eyewitness accounts are of great importance. In such scenarios, a forensic artist works with the eyewitness to sketch the culprit as described. These forensic sketches are circulated among law enforcement offices and broadcasting outlets to locate the alleged culprit. This practice is time-consuming and could be hastened by automatically retrieving suspects from the mug shot database maintained by the law enforcement office, making the task of finding suspects easier and saving time.

M. Maheesha (B) · S. Samiksha · M. Sweety · B. Sathyabama · R. Nagarathna · S. M. M. Roomi Thiagarajar College of Engineering, Madurai, India e-mail: [email protected] S. Samiksha e-mail: [email protected] M. Sweety e-mail: [email protected] B. Sathyabama e-mail: [email protected] R. Nagarathna e-mail: [email protected] S. M. M. Roomi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_26

Automatic sketch-based retrieval of the offender is carried out by matching the suspect's sketch against the mug shot database. At a crime scene, an eyewitness may describe the culprit to a forensic artist who draws the sketch; alternatively, the eyewitness could provide a photograph of the culprit. In both cases, the query need not be close in time to the photograph in the database: the photograph kept at a police station or in a central repository may not have been taken recently, so there may be a time gap between the sketch or photograph and the database image. A common representation that is insensitive to this time difference, i.e., to the aging process, is therefore needed. Since a sketch is insensitive to aging, illumination, pose, scale and translation, we take the sketch as the backbone of the matching algorithm. Limited work has been reported on sketch-based image retrieval. Xiaoou Tang et al. [8] proposed face recognition from sketches by eigen-sketch transformation. Galea et al. [9] proposed matching software-generated sketches with a very deep CNN. Anjan Dutta et al. [1] proposed semantically tied paired cycle consistency for zero-shot sketch-based image retrieval; using adversarial training, they translate sketch and image features into a shared semantic space, and they use the Sketchy and TU-Berlin datasets. Kaiyue Pang et al. [3] proposed generalized fine-grained sketch-based image retrieval, which focuses on retrieving the identical photograph instance using a free-hand sketch as the query modality; it uses two major datasets, QMUL-Shoe-V2 and Sketchy. Yuming Shen et al. [6] proposed zero-shot sketch-image hashing.
This work eases the sketch-image heterogeneity and enhances the semantic relations among data by using a Kronecker fusion layer and graph convolution. Li Liu et al. [4] proposed deep sketch hashing, in which a novel binary coding scheme is adopted for fast retrieval. Sounak Dey et al. [5] proposed practical zero-shot sketch-based image retrieval. Since sketches and photographs belong to different modalities, matching them is hard. Sketches also differ from photographs because of the unknown psychological mechanism of sketch generation, and the shadow texture that forensic artists add to make sketches realistic, together with the exaggerated drawing of distinctive facial shape features, further compounds the problem. Possible solutions are:

1. Direct feature extraction from the sketch and matching: apart from taking more time, this has a low match success rate, since the sketch may be imperfect as it relies on the eyewitness's recollection.
2. Conversion of photographs into sketches or vice versa: the face photographs in the database are converted into sketches, and the query sketch is matched against the synthesized sketches. Once the database of sketches is available, matching becomes easier.

Two important challenges neglected in earlier work are the significant domain gap between amateur sketches and photographs and the need to move toward large-scale retrieval. Most earlier sketch-based retrieval methods address object sketches; very little work has been reported on face sketches, and few face sketch databases are available. The face sketch databases that have been reported (CUHK Face Sketch FERET Database, IIIT-D Sketch Database) do not consider geometric, photometric and behavioral variations. The main contributions of this paper are:

(1) A face sketch dataset including pencil-drawn and converted sketches for face recognition.
(2) A sketch-based face recognition framework using deep learning that handles illumination, age, pose and expression variations.
(3) A modified Mask RCNN framework for sketch-based face recognition.

The rest of the paper is structured as follows. Section 2 explains the methodology of sketch-based face recognition (SBFR). Section 3 presents the experimental results and the performance evaluation of our model. Finally, Sect. 4 concludes the paper.

2 Methodology

Figure 1a depicts a high-level overview of the proposed SBFR paradigm. The input image is given to a convolutional neural network (CNN), which acts as an image classifier and labels the input as either a sketch or an original image. If the input is a sketch, it is passed directly to Mask RCNN. If it is an original image, it is first passed to the image-to-sketch converter, which converts it into a sketch using the algorithm described in Sect. 2.2. Mask RCNN recognizes the person in the image and outputs a bounding box, a mask, and a class label. Since Mask RCNN localizes the object spatially, it has been selected for

Fig. 1 a Architecture of SBFR. b Image to sketch converter


M. Maheesha et al.

the proposed architecture. The proposed SBFR handles variations in illumination, age, pose, and expression.
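The routing in Fig. 1a can be summarised in a few lines of Python. This is a minimal sketch, not the authors' code: `sbfr_pipeline` and its three callables are hypothetical stand-ins for the CNN image/sketch classifier, the image-to-sketch converter, and Mask RCNN.

```python
from typing import Any, Callable, Dict


def sbfr_pipeline(image: Any,
                  is_sketch: Callable[[Any], bool],
                  to_sketch: Callable[[Any], Any],
                  recognize: Callable[[Any], Dict[str, Any]]) -> Dict[str, Any]:
    """Route one input through the SBFR stages (hypothetical interface).

    is_sketch -- stands in for the CNN image/sketch classifier
    to_sketch -- stands in for the image-to-sketch converter
    recognize -- stands in for Mask RCNN; returns box, mask and label
    """
    # Photographs are converted first; sketches go straight to recognition.
    sketch = image if is_sketch(image) else to_sketch(image)
    return recognize(sketch)


# Toy usage with string placeholders instead of real images and models.
result = sbfr_pipeline(
    "photo.jpg",
    is_sketch=lambda x: x.endswith("_sketch.png"),
    to_sketch=lambda x: x.replace(".jpg", "_sketch.png"),
    recognize=lambda s: {"label": "person_A", "input": s},
)
print(result)  # {'label': 'person_A', 'input': 'photo_sketch.png'}
```

Passing the stages as callables keeps the routing logic independent of any particular deep learning framework.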

2.1 CNN for Image/Sketch Detection

For image/sketch detection, a pre-trained CNN is employed. It consists of five convolutional blocks, three dense blocks, and an output layer. Each convolutional block contains a convolution layer, an activation layer, a max-pooling layer, and a batch normalization layer. A flatten layer is placed between the convolutional blocks and the dense blocks. Each dense block contains a dense layer, an activation layer, a dropout layer, and a batch normalization layer. The CNN decides whether the input is an image or a sketch; if it is an image, it is passed to the image-to-sketch converter. Trained with the cross-entropy loss function, this classifier achieves a testing accuracy of 99%.
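To make the layer stack above concrete, the following pure-Python sketch traces the output shape of each block. The input size (128 × 128 × 3), the filter counts, the dense widths, 3 × 3 "same" convolutions and 2 × 2 max-pooling with stride 2 are all assumptions; the paper gives only the number and order of the layers.

```python
def trace_shapes(height=128, width=128, channels=3,
                 conv_filters=(32, 64, 128, 256, 512),
                 dense_units=(512, 256, 128), num_classes=2):
    """Return (block name, output shape) pairs for the image/sketch classifier.

    Assumes 'same'-padded convolutions (spatial size unchanged) and
    2x2 max-pooling with stride 2 (spatial size halved, floor division).
    """
    shapes = []
    h, w = height, width
    for i, filters in enumerate(conv_filters, start=1):
        # conv + activation: channel count becomes the number of filters;
        # max-pooling halves height and width; batch norm keeps the shape.
        h, w = h // 2, w // 2
        shapes.append((f"conv block {i}", (h, w, filters)))
    flat = h * w * conv_filters[-1]
    shapes.append(("flatten", (flat,)))
    for i, units in enumerate(dense_units, start=1):
        # dense + activation + dropout + batch norm: output has `units` values.
        shapes.append((f"dense block {i}", (units,)))
    # Two classes: original image vs. sketch.
    shapes.append(("output (softmax)", (num_classes,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

Under these assumptions the five pooling steps reduce 128 × 128 to 4 × 4, so the flatten layer feeds 4 × 4 × 512 = 8192 values into the first dense block.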

2.2 Image to Sketch Conversion

To generate a sketch image from an RGB image, four steps are performed:

1. Convert the RGB color image to a grayscale image.
2. Invert the grayscale image to get a negative.
3. Apply a Gaussian blur to the negative obtained in step 2.
4. Blend the grayscale image from step 1 with the blurred negative from step 3 using a color dodge (Fig. 2).

Fig. 2 Original images and converted sketch images
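The four conversion steps can be sketched with NumPy alone. The grayscale weights (ITU-R BT.601), the Gaussian sigma and the helper names are assumptions, since the paper does not state its kernel size or library; in practice an image library such as OpenCV would usually provide the blur.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel covering about +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding; output has img's shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def image_to_sketch(rgb, sigma=5.0):
    """Pencil-sketch effect: grayscale, invert, blur, colour dodge."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Step 1: grayscale (ITU-R BT.601 luma weights).
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Step 2: negative of the grayscale image.
    inverted = 255.0 - gray
    # Step 3: Gaussian blur of the negative.
    blurred = gaussian_blur(inverted, sigma)
    # Step 4: colour dodge = gray * 255 / (255 - blurred), saturating at 255.
    denom = 255.0 - blurred
    with np.errstate(divide="ignore", invalid="ignore"):
        dodge = np.where(denom > 0, gray * 255.0 / denom, 255.0)
    return np.clip(np.rint(dodge), 0, 255).astype(np.uint8)


# Usage: a flat grey image has no edges, so the sketch is blank white paper.
flat = np.full((16, 16, 3), 100, dtype=np.uint8)
print(image_to_sketch(flat).min())   # 255
```

Where the blurred negative equals the negative (flat regions), the dodge returns white; only near intensity changes does the blur differ from the negative, which leaves dark pencil-like strokes along edges.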


2.3 Mask RCNN

Mask RCNN (He et al., ICCV 2017) extends Faster RCNN by adding a mask prediction branch in parallel with the class label and bounding box prediction branches. It adds little overhead to Faster RCNN, so it can still run at about 5 fps on a GPU. Faster RCNN consists of two stages. The first stage, called the region proposal network (RPN), proposes candidate object bounding boxes. The second stage, which is essentially Fast RCNN [10], extracts features from each candidate box using RoI Pool and performs classification and bounding box regression. Mask RCNN uses the same two-stage procedure with an identical first stage (the RPN). In the second stage, in addition to predicting the class and the box offset, Mask RCNN outputs a binary mask for each RoI. Classification and bounding box regression are performed in parallel, which greatly simplifies the multi-stage pipeline of the original RCNN [10].

Mask Representation: A mask encodes an input object's spatial layout. Whereas class labels and box offsets are collapsed into short output vectors by fully connected (fc) layers, the spatial structure of a mask can be extracted naturally through the pixel-to-pixel correspondence provided by convolutions. This pixel-to-pixel behavior requires the RoI features, which are themselves small feature maps, to be well aligned so that the per-pixel spatial correspondence is faithfully preserved. This motivates the RoI Align layer, which plays a key role in mask prediction.

RoI Align: RoI Pool [9] is a standard operation for extracting a small feature map from each RoI. RoI Pool first quantizes a floating-point RoI to the discrete granularity of the feature map, then subdivides it into spatial bins that are themselves quantized, and finally aggregates the feature values covered by each bin (usually by max-pooling).
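The difference between the quantized RoI Pool described here and the bilinear sampling used by RoI Align can be illustrated on a single-channel feature map. This is a simplified sketch, assuming feature values sit at integer grid points, a 2 × 2 output and 2 × 2 sampling points per bin that are averaged; the function names are hypothetical and this is not the torchvision or Detectron implementation.

```python
import numpy as np


def bilinear(fm, y, x):
    """Bilinearly interpolate feature map fm (H x W) at real-valued (y, x)."""
    h, w = fm.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fm[y0, x0] * (1 - dx) + fm[y0, x1] * dx
    bottom = fm[y1, x0] * (1 - dx) + fm[y1, x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(fm, box, out_size=2, samples=2):
    """RoI Align: box = (y1, x1, y2, x2) is kept in floating point."""
    y1, x1, y2, x2 = box
    bin_h, bin_w = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            vals = [bilinear(fm,
                             y1 + (i + (sy + 0.5) / samples) * bin_h,
                             x1 + (j + (sx + 0.5) / samples) * bin_w)
                    for sy in range(samples) for sx in range(samples)]
            out[i, j] = np.mean(vals)      # average the regularly spaced samples
    return out


def roi_pool(fm, box, out_size=2):
    """Simplified RoI Pool: quantize box and bin edges, then max-pool."""
    y1, x1, y2, x2 = (int(round(v)) for v in box)   # snap box to the grid
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        ys = y1 + (i * (y2 - y1)) // out_size
        ye = max(y1 + ((i + 1) * (y2 - y1)) // out_size, ys + 1)
        for j in range(out_size):
            xs = x1 + (j * (x2 - x1)) // out_size
            xe = max(x1 + ((j + 1) * (x2 - x1)) // out_size, xs + 1)
            out[i, j] = fm[ys:ye, xs:xe].max()
    return out


fm = np.arange(16, dtype=float).reshape(4, 4)
box = (0.4, 0.4, 2.4, 2.4)
print(roi_align(fm, box))   # follows the sub-pixel position of the box
print(roi_pool(fm, box))    # box snapped to (0, 0, 2, 2) before pooling
```

For the box above, RoI Pool silently shifts the region by 0.4 cells, whereas RoI Align samples exactly inside the original box; this misalignment is what matters for pixel-accurate masks.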
In Mask RCNN, RoI Align avoids this quantization: it keeps the floating-point RoI and bin boundaries and computes the feature value at each sampling point by bilinear interpolation on the feature map (Fig. 3).

The proposed Mask RCNN network architecture uses an SE-ResNet backbone. The basic module of SE-ResNet combines the Squeeze-and-Excitation block (SE block) [7] with the ResNet residual block. The SE block uses global information to emphasize informative features and suppress less useful ones; in other words, it learns a weight for each feature map (channel) in the layer. The squeeze step performs channel-wise global max-pooling, which aggregates the channel information into a low-dimensional descriptor. The excitation step then determines which feature maps are genuinely important. It is learned by two FC layers with a ReLU layer in between, followed by a sigmoid layer, whose output serves as a channel weight specific to the input descriptor x. The SE block thus introduces dynamics conditioned on the input, which helps improve feature discriminability. In the SE-ResNet module (Fig. 3), H and W denote the height and width of the feature map and C the number of channels. The hyperparameter r (the reduction ratio) condenses the information into a smaller representation to reduce computational cost.


Fig. 3 SE-ResNet module
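A single SE block can be written as a NumPy forward pass. This sketch follows the original SE block [7], which squeezes with global average pooling; passing `squeeze="max"` gives the global max-pooling variant described in the text. The weight shapes, the random initialisation and the function name are assumptions for illustration. In an SE-ResNet module, the reweighted residual branch is then added to the identity shortcut.

```python
import numpy as np


def se_block(x, w1, b1, w2, b2, squeeze="avg"):
    """Squeeze-and-Excitation for one feature map x of shape (H, W, C).

    w1: (C, C // r), b1: (C // r,)  -- first FC layer (reduction ratio r)
    w2: (C // r, C), b2: (C,)       -- second FC layer
    """
    # Squeeze: one descriptor value per channel via global pooling.
    z = x.mean(axis=(0, 1)) if squeeze == "avg" else x.max(axis=(0, 1))
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1) per channel.
    hidden = np.maximum(0.0, z @ w1 + b1)
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))
    # Scale: reweight every channel of the input feature map.
    return x * weights


rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4
x = rng.standard_normal((H, W, C))
w1 = 0.1 * rng.standard_normal((C, C // r))
w2 = 0.1 * rng.standard_normal((C // r, C))
y = se_block(x, w1, np.zeros(C // r), w2, np.zeros(C))
print(y.shape)   # (8, 8, 16)
```

Because the bottleneck has only C/r units, the two FC layers add 2C²/r weights instead of 2C², which is how r trades expressiveness for computation.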

3 Results and Discussion

3.1 Dataset

In this study, we introduce a sketch database called POPSKETCH with two types of face sketches. It can support face recognition research in general and is suited to the Indian setting (appearance, expressions, etc.) in particular. Type 1 ("hand sketch") includes 1093 images of 36 individuals collected and compiled from the web. Type 2 ("IM to SKETCH") contains 1500 images of 24 individuals. The combined database will be made publicly available to promote research and development on accurate face recognition from a minimal set of samples. Following the Pareto principle, 80% (2074) of the 2593 images are used for training, 10% (259) for validation, and the remaining 10% (260) for testing. Figures 4 and 5 show sample images of Dataset 1 (hand-drawn sketches) and Dataset 2 (converted sketches), respectively.
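The split sizes follow directly from the 2593 images (1093 + 1500). The sketch below reproduces the counts; the random shuffle, the seed and the function name are assumptions, since the paper does not say how images (or identities) were assigned to each subset.

```python
import random


def split_indices(n, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle n image indices and cut them into train/validation/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_frac)          # floor of 80%
    n_val = int(n * val_frac)              # floor of 10%
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


total = 1093 + 1500                        # hand sketches + converted sketches
train, val, test = split_indices(total)
print(total, len(train), len(val), len(test))   # 2593 2074 259 260
```

Flooring the first two subsets and giving the remainder to the test set is why the test set (260) is one image larger than the validation set (259).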


Fig. 4 Dataset 1—sample pictures

Fig. 5 Dataset 2—sample pictures

3.2 Mask RCNN for Sketch-Based Face Recognition

Because Mask RCNN localizes the face region spatially, it avoids considering unwanted areas of the sketch, as shown in Fig. 6, and this increases the recognition rate.
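As an illustration of why spatial localization helps, the sketch below uses a predicted mask and box to blank out sketch pixels outside the face and crop the face region. The function and its arguments are hypothetical; the paper does not describe such a post-processing step.

```python
import numpy as np


def masked_face_crop(sketch, mask, box, background=255):
    """Blank everything outside the predicted mask, then crop to the box.

    sketch: (H, W) uint8 grayscale sketch
    mask:   (H, W) boolean face mask
    box:    (y1, x1, y2, x2) integer box, end-exclusive
    """
    # Pixels outside the mask become white paper, so stray strokes vanish.
    cleaned = np.where(mask, sketch, background).astype(sketch.dtype)
    y1, x1, y2, x2 = box
    return cleaned[y1:y2, x1:x2]


sketch = np.full((6, 6), 50, dtype=np.uint8)     # toy "sketch"
mask = np.zeros((6, 6), dtype=bool)
mask[1:4, 2:5] = True                             # toy predicted face mask
face = masked_face_crop(sketch, mask, (1, 1, 5, 5))
print(face)
```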

3.2.1 With Respect to Illumination Changes

Illumination refers to the overall brightness of an image. The edges in a sketch correspond to the response of a gradient operator, and this operator is unaffected by illumination (at least by a uniform change in brightness). Sketches are therefore largely invariant to illumination, as shown in Fig. 7: in SBFR, the person is still recognized accurately when the illumination changes, because the sketch of the face remains the same. Occasionally, illumination variation can make the results uncertain when the training data are insufficient. To avoid this, more training images with both high and low brightness were provided for each person.
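The invariance argument can be checked numerically. This sketch assumes simple forward differences as the gradient operator and a random array as a stand-in face; it shows that a uniform brightness offset leaves the gradients unchanged, while a contrast scaling only rescales them, so edge locations are preserved.

```python
import numpy as np


def gradients(img):
    """Forward-difference gradients along rows (gy) and columns (gx)."""
    img = np.asarray(img, dtype=np.float64)
    return np.diff(img, axis=0), np.diff(img, axis=1)


rng = np.random.default_rng(0)
face = rng.uniform(0, 200, size=(8, 8))   # stand-in for a grayscale face
gy, gx = gradients(face)

# Uniform brightness offset: gradients are unchanged.
gy_b, gx_b = gradients(face + 40.0)
print(np.allclose(gy, gy_b) and np.allclose(gx, gx_b))   # True

# Contrast scaling: gradients are rescaled, edge positions stay the same.
gy_s, gx_s = gradients(face * 1.5)
print(np.allclose(gy_s, 1.5 * gy) and np.allclose(gx_s, 1.5 * gx))  # True
```

Non-uniform lighting (for example, a shadow across half the face) is not cancelled this way, which is consistent with the need for extra bright and dark training images noted above.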


Fig. 6 Face mask and bounding box produced by mask RCNN

Fig. 7 SBFR with illumination change

3.2.2 With Respect to the Age of Persons

Sketches are largely invariant to age. As a person ages, mainly the skin texture and the appearance of the image change, while the edges in the sketch remain sharp, which helps identify the person correctly. Features such as the shape of the eyes and nose and the edges of the face remain the same over an age difference of about ten years (e.g., between 30 and 40 years), so after training the model can extract these features and recognize the person accurately. In Fig. 8, each row contains sketch images of one person at different ages; even though the age varies, the result obtained for each person is correct.

Fig. 8 SBFR with age variation

3.2.3 With Respect to Expression and Pose Variations

Facial expressions are produced by the muscles of the face, whereas a sketch depends mainly on the bone structure. Sketches are therefore largely invariant to facial expressions: expressions and emotions such as crying, smiling, and laughing all involve the movement of muscles. Only yawning changes the sketch, because it involves the movement of bones (the jaw). In Fig. 9, each row shows sketches of one person with different expressions, emotions, and poses; because the sketch is largely invariant to these factors, the person is recognized correctly.

The sketch is also invariant to the pose of a person. When a person poses for an image, only the face and neck move slightly, while the edges and other features remain the same. Even if a person is photographed in side view, the shape of the nose and other features such as edges allow the person to be identified easily. The results depend on the training data. In all cases (illumination, age, pose, expression, and emotion), the input is either a color image or a hand-drawn sketch. A color image is first converted into a sketch, and training and testing are then performed. The training determines the accuracy of the result: the more training data are used, the more accurate the result.

Fig. 9 SBFR with pose variation

3.3 Performance in Terms of Loss

Two loss functions were used to train the model: cross-entropy for classification and mean squared error for bounding box prediction. The model has two outputs: the class label, and the bounding box with its mask. Figure 10 shows that the classifier's training loss decreases steadily, whereas its validation loss oscillates somewhat. Figure 11 shows that the bounding box training loss takes longer to converge, since sketch images contain only shape features; the bounding box validation loss, in contrast, converges quickly.
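The two losses can be written out for a single sample. This is a minimal sketch; how the paper weights the two terms is not stated, so `box_weight` is a hypothetical parameter. (The original Mask RCNN uses a smooth L1 box loss; this sketch follows the mean squared error described here.)

```python
import numpy as np


def cross_entropy(probs, label):
    """Categorical cross-entropy of one prediction (probs sum to 1)."""
    p = np.clip(np.asarray(probs, dtype=float)[label], 1e-12, 1.0)
    return float(-np.log(p))


def box_mse(pred_box, true_box):
    """Mean squared error between two boxes given as (x1, y1, x2, y2)."""
    pred = np.asarray(pred_box, dtype=float)
    true = np.asarray(true_box, dtype=float)
    return float(np.mean((pred - true) ** 2))


def total_loss(probs, label, pred_box, true_box, box_weight=1.0):
    # box_weight is a hypothetical balancing factor; the paper does not state one.
    return cross_entropy(probs, label) + box_weight * box_mse(pred_box, true_box)


print(cross_entropy([0.25, 0.25, 0.5], 2))     # 0.6931... (= ln 2)
print(box_mse([0, 0, 10, 10], [1, 1, 9, 9]))   # 1.0
```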

Fig. 10 Performance loss of classifier


Fig. 11 Performance loss of bounding box

Table 1 Recognition rate

| Algorithm | Training accuracy (%) | Validation accuracy (%) | Testing accuracy (%) |
|---|---|---|---|
| Samples | 2074 | 259 | 260 |
| CNN with AlexNet | 65.90 | 62.14 | 60.7 |
| YoloV5 | 85.14 | 80.68 | 81.39 |
| Mask RCNN | 90.47 | 87.13 | 88.23 |
| Modified mask RCNN (Proposed) | 96.80 | 90.14 | 90.67 |

3.4 Recognition Rate

A comparative study was carried out with CNN with AlexNet, YOLOv5, Mask RCNN, and the modified Mask RCNN. As Table 1 shows, the proposed algorithm achieves a testing accuracy of 90.67%, a gain of 2.44 percentage points over Mask RCNN (88.23%).
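The reported gain is a difference in percentage points. The short computation below, using the testing accuracies from Table 1, also gives the corresponding relative improvement; the helper name is hypothetical.

```python
def gains_over_baselines(test_acc, proposed):
    """Absolute (percentage-point) and relative (%) gains of `proposed`."""
    ref = test_acc[proposed]
    out = {}
    for name, acc in test_acc.items():
        if name == proposed:
            continue
        diff = ref - acc                                  # percentage points
        out[name] = (round(diff, 2), round(100 * diff / acc, 2))
    return out


# Testing accuracies (%) from Table 1.
table1 = {"CNN with AlexNet": 60.7, "YoloV5": 81.39,
          "Mask RCNN": 88.23, "Modified mask RCNN (Proposed)": 90.67}
print(gains_over_baselines(table1, "Modified mask RCNN (Proposed)")["Mask RCNN"])
# (2.44, 2.77)
```

A 2.44-point gain over 88.23% corresponds to a relative improvement of about 2.77%.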

4 Conclusion

We have proposed a sketch-based face recognition method using a modified Mask RCNN deep learning framework. A sketch database, POPSKETCH, has been compiled with pencil sketches and image-converted sketches of popular faces. The proposed framework has been validated for variations in illumination, age, pose, and expression of human faces, and it outperforms CNN (AlexNet) and YOLOv5, achieving a recognition rate of 90.67%. In future work, with larger training sets, we plan to build a deep network with a minimal representation that employs alternative SE-based models.


References

1. Dutta A, Akata Z (2019) Semantically tied paired cycle consistency for zero-shot sketch-based image retrieval. CoRR. https://arxiv.org/abs/190.03372
2. Zhang J, Shen F, Liu L, Zhu F, Yu M, Shao L, Shen HT, Van Gool L (2018) Generative domain-migration hashing for sketch-to-image retrieval. In: Proceedings of the European conference on computer vision (ECCV). https://openaccess.thecvf.com/content_ECCV_2018/html/Jingyi_Zhang_Generative_Domain-Migration_Hashing_ECCV_2018_paper.html
3. Pang K, Li K, Yang Y, Zhang H, Hospedales TM, Xiang T, Song Y-Z (2019) Generalising fine-grained sketch-based image retrieval. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://openaccess.thecvf.com/content_CVPR_2019/html/Pang_Generalising_Fine-Grained_Sketch-Based_Image_Retrieval_CVPR_2019_paper.html
4. Liu L, Shen F, Shen Y, Liu X, Shao L (2017) Deep sketch hashing: fast free-hand sketch-based image retrieval. CoRR. https://arxiv.org/abs/1703.05605
5. Dey S, Riba P, Dutta A, Lladós J, Song Y-Z (2019) Doodle to search: practical zero-shot sketch-based image retrieval. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2019.00228
6. Shen Y, Liu L, Shen F, Shao L (2018) Zero-shot sketch-image hashing. CVPR. https://arxiv.org/abs/1803.02284
7. Hu J, Shen L, Sun G (2017) Squeeze-and-excitation networks. CoRR. https://arxiv.org/abs/1709.01507
8. Tang X, Wang X (2002) Face photo recognition using sketch. In: Proceedings of the international conference on image processing, pp I-I. https://doi.org/10.1109/ICIP.2002.1038008
9. Galea C, Farrugia R (2017) Matching software-generated sketches to face photos with a very deep CNN, morphed faces, and transfer learning. IEEE Trans Inform Forensics Sec 1–1. https://doi.org/10.1109/TIFS.2017.2788002
10. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2980–2988. https://doi.org/10.1109/ICCV.2017.322

Chapter 27

Aerial Object Detection Using Different Models of YOLO Architecture: A Comparative Study

Vinat Goyal, Rishu Singh, Aveekal Kumar, Mrudul Dhawley, and Sanjeev Sharma

1 Introduction

Object detection is a subdomain of computer vision. It refers to the capability of computer and software systems to locate objects in an image, identify them, and put a bounding box around each one [1]. Object detection has been widely used for face detection, vehicle detection, pedestrian counting, web image analysis, security systems, and autonomous cars. It is a stepping stone toward robots that perceive their environment the way humans do. Object detection also plays a primary role in vision-and-language research, which lets machines learn at the intersection of the two modalities. To interact with objects, a machine first needs to learn to classify and localise them; the aim of object detection is to enable such machines to recognise and localise all known objects in a scene. A more specific area of research is aerial object detection, which detects objects in images taken from an altitude, generally by a drone or a satellite. Aerial images differ from general images in that object sizes vary significantly. Their quality is also often poor because they are captured while the drone or satellite is moving. Aerial object detection is useful in many applications

V. Goyal (B) · R. Singh · A. Kumar · M. Dhawley · S. Sharma
Indian Institute of Information Technology Pune, Pune, India
URL: https://deeplearningforresearch.com
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_27


V. Goyal et al.

like visible and thermal infrared remote sensing for the detection of white-tailed deer using an unmanned aerial system [2]. It can improve the precision and accuracy of animal population estimates [3], and it supports robust detection of small and dense objects in images from autonomous aerial vehicles. In geography, it enables accurate landslide detection using UAV-based aerial remote sensing. One critical use of aerial object detection is detecting unmanned aerial vehicles (UAVs) such as drones or the Bell Eagle Eye. Other uses include detecting vehicles in satellite images. This paper reviews past work in the field and then studies the effectiveness of transfer learning for aerial object detection. Transfer learning refers to fine-tuning a large model, pre-trained on other data, on one's own data set so that it adapts to the new task. This paper presents a comparative study of three YOLOv5 models on a standard data set. YOLO is a state-of-the-art model, and here it is applied with the transfer learning approach. We use the VEDAI data set for this task. The study helps determine whether general object detection models adapt well to aerial images. The rest of the paper is organised as follows. Section 2 surveys related work. Section 3 describes the materials and methods. Section 4 presents the experiments and results. Finally, Sect. 5 concludes the work.

2 Literature Survey

Object detection has attracted a great deal of research because of the value it brings. Today, it is widely used in applications such as image retrieval, security, surveillance, automated vehicle systems, and machine inspection. Generally, two kinds of methods are used for this task: (1) machine learning based and (2) deep learning based. Earlier, traditional methods based on hand-crafted features were used. Traditional object detection algorithms include Haar features with AdaBoost [4], histogram of oriented gradients (HOG) with a support vector machine (SVM) [5], and the deformable parts model (DPM) [6]. These algorithms are difficult to adapt to the diversity of objects, and their robustness is limited [7]. As computer vision develops, these traditional algorithms are used less and less, because deep learning models have outperformed them and have become the go-to method for many computer vision tasks. Deep learning models have existed for decades but did not become popular earlier because of a lack of data and computational power. Today much more data is available, the computational power of computers has increased drastically, and better optimisation algorithms have been developed; stochastic gradient descent with momentum (SGDM) and RMSProp have emerged as favourite optimisers. All these factors have contributed to the success of deep learning models today.

27 Aerial Object Detection Using Different Models of YOLO …


Generic object detection involves two important steps: classifying the objects present in an image and labelling them with rectangular bounding boxes. CNNs have been the go-to architecture for image classification, but object detection also requires object localisation, which is achieved by various models built on the foundational idea of a CNN. The task still poses difficulties such as foreground-background class imbalance [8]. There are generally two frameworks for generic object detection. The first, the two-stage detector, uses separate pipelines for classification and localisation; algorithms in this category include R-CNN [9], SPP-net [10], R-FCN [11], FPN [12] and Mask R-CNN [13]. The second, the one-stage detector, uses a single pipeline, regards object detection as a regression or classification problem, and produces the final results in one go; examples include MultiBox, AttentionNet, YOLO [14], G-CNN, SSD [15], DSS and DSOD. R-CNN selects bounding box regions that are likely to contain object instances via selective search. By sharing features, Fast R-CNN [16] unified the several models employed in R-CNN. Faster R-CNN [17] combines a region proposal network (RPN) and Fast R-CNN with shared convolutional layers in a single unified model, and Mask R-CNN extends Faster R-CNN to pixel-level image segmentation. In contrast to two-stage detectors, YOLO introduced a single-stage detector, which infers faster. SSD was the first to use feature pyramids for multi-scale object detection. RetinaNet [18] is based on a feature pyramid network and employs the focal loss, which down-weights easy examples so that training focuses on hard ones. The ability of deep learning models to adapt well to new, unseen but similar data has given rise to transfer learning.
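The focal loss used by RetinaNet [18] multiplies the cross-entropy by a modulating factor (1 - p_t)^gamma. A minimal binary sketch follows; gamma = 2 and alpha = 0.25 are the defaults reported for RetinaNet, and the function name is illustrative.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class; y: true label (1 or 0).
    """
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)           # avoid log(0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# With gamma = 2, an easy positive (p = 0.9) keeps 1% of its cross-entropy,
# while a hard positive (p = 0.1) keeps 81%.
for p in (0.9, 0.1):
    print(p, focal_loss(p, 1, alpha=1.0) / -math.log(p))
```

Setting gamma = 0 and alpha = 1 recovers the ordinary cross-entropy for positives.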
Transfer learning is an approach where a large model which has been trained on extensive data is used instead of developing an architecture from scratch. This approach has gained a lot of popularity since these large models have been able to adapt well to the new data. Transfer learning has attained outstanding results in tasks like image classification [19], retina image segmentation [20], generic object detection [21] and other such tasks. In this paper, we study the effectiveness of using a pre-trained object detection model on aerial images. For this study, we use three YOLOv5 models on the VEDAI data set. The models are fully fine-tuned on the images.

3 Materials and Methods

The first step was to define the problem statement. The next step was to collect a data set related to it. The collected data was then pre-processed into the desired format and size. Pre-processing included annotating the data set in accordance with the YOLO architecture, because the VEDAI annotations are not in a format that the YOLOv5 model can read. YOLOv5 takes labels in the form i X Y w h, where i is the object (class) id, X and Y are the coordinates of the centre of the object measured from the top-left corner of the image, and w and h are the width and height of the annotated object; all four values are normalised by the image width and height. The annotations provided by VEDAI give the sample class, the ground-truth (GT) centre point coordinates, the direction, and the coordinates of the GT's four corners. To convert the available format into the desired one, we used a tool called Roboflow to label the images in the data set manually. Roboflow is a website used to annotate, train and deploy computer vision models. It exports data in various formats such as JSON, CSV, TXT and XML, which can be read by models and services such as YOLOv4, YOLOv5, Amazon Rekognition and Microsoft Custom Vision. The model automatically resizes the images to 640 × 640 px. After pre-processing, the data was fed to the three YOLOv5 models in the study: YOLOv5m6, YOLOv5x6 and YOLOv5l6.
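The authors relabelled the images manually in Roboflow. As an alternative illustration only, the sketch below converts VEDAI's four corner points into a YOLOv5 label line by taking the enclosing axis-aligned box and normalising by the 512 × 512 image size. The function name and argument layout are hypothetical, and parsing VEDAI's annotation files is not shown.

```python
def corners_to_yolo(class_id, xs, ys, img_w=512, img_h=512):
    """Convert a (possibly rotated) 4-corner box to a YOLO label line.

    xs, ys: x and y pixel coordinates of the four corners.
    Returns "class x_center y_center width height", normalised to [0, 1].
    """
    # Enclosing axis-aligned box, clipped to the image.
    x_min, x_max = max(0.0, min(xs)), min(float(img_w), max(xs))
    y_min, y_max = max(0.0, min(ys)), min(float(img_h), max(ys))
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# A rotated vehicle: corners (100, 50), (120, 60), (110, 80), (90, 70).
print(corners_to_yolo(0, [100, 120, 110, 90], [50, 60, 80, 70]))
```

The enclosing box discards the vehicle's orientation, so tightly packed, rotated vehicles receive boxes that overlap more than the objects themselves.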

3.1 Data set

We use the VEDAI data set, a standard benchmark in aerial object detection. The data set has 1250 images, but only 420 of them were used in this study. The data set is summarised below.

VEDAI data set: the Vehicle Detection in Aerial Imagery (VEDAI) [22] data set has 1250 images annotated with 9 classes and a total of 3700 instances. Its images are 512 × 512 pixels with a 25 cm resolution. It is one of the most common data sets for vehicle detection. Samples of this data set are displayed in Fig. 1.

Fig. 1 Samples of the VEDAI data set


3.2 Data Pre-processing and Splitting

Data pre-processing is one of the most important steps: it makes the raw data compatible with the deep learning model. The annotations available with the original data are not compatible with the YOLO architecture. Of the 1250 available images, only 420 (about a third, 33.6%) are used in this study, to examine how well the models adapt when data is limited. The images are annotated using Roboflow to make them compatible with the YOLO architecture. The data is split in an 80:20 ratio: 80% is used for training and the remaining 20% for validation.

3.3 Models

"You Only Look Once", or YOLO, is a family of deep learning models designed for fast object detection. YOLOv5 is the latest, improved version of YOLO, released by the company Ultralytics in 2020. YOLO models are one-stage detectors: rather than examining different parts of the image separately, a YOLO model looks at the entire image only once, passes it through the network once, and detects the objects. Hence, YOLO models are fast and require less computational power than other object detection techniques. Figure 2 shows the general YOLO architecture from the original YOLO paper [23]. As a single-stage object detector, YOLOv5 has three important parts, like any other single-stage detector:

1. YOLOv5 Backbone: it uses CSPDarknet, built from cross-stage partial (CSP) networks, as the backbone to extract rich, informative features from the input image.

Fig. 2 YOLO architecture [23]


2. YOLOv5 Neck: it uses PANet to generate a feature pyramid network that aggregates the features and passes them to the head for prediction. This helps identify the same object at different sizes and scales.
3. YOLOv5 Head: the head performs the final detection. It applies anchor boxes to the features and generates the final output vectors with class probabilities, objectness scores and bounding boxes.
4. Activation and Optimization: YOLOv5 uses leaky ReLU and sigmoid activations, with SGD and Adam as optimizer options.
5. Loss Function: it uses binary cross-entropy with logits loss.

The variants of the model have the same layers but differ in the size of the network. YOLOv5 comes in four sizes:

• YOLOv5s: smallest version.
• YOLOv5m: medium version.
• YOLOv5l: large version.
• YOLOv5x: largest version.
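The head's output size follows from the input size, the stride of each detection scale and the number of anchors. The sketch below assumes YOLOv5's usual three anchors per grid cell, strides 8, 16 and 32 (the "6" variants used here add a fourth output at stride 64), 640 × 640 inputs as stated above, and the 9 VEDAI classes; the function names are hypothetical.

```python
def yolo_head_shapes(img_size=640, strides=(8, 16, 32), anchors_per_cell=3,
                     num_classes=9):
    """Return grid sizes, values per anchor and total raw predictions."""
    per_anchor = 4 + 1 + num_classes        # box (x, y, w, h) + objectness + classes
    grids = [img_size // s for s in strides]
    total = sum(g * g * anchors_per_cell for g in grids)
    return grids, per_anchor, total


def responsible_cell(x_center, y_center, stride):
    """Grid cell (row, col) that contains an object's centre at a given stride."""
    return int(y_center // stride), int(x_center // stride)


print(yolo_head_shapes())                          # ([80, 40, 20], 14, 25200)
print(yolo_head_shapes(strides=(8, 16, 32, 64)))   # ([80, 40, 20, 10], 14, 25500)
print(responsible_cell(100.0, 200.0, stride=32))   # (6, 3)
```

All of these candidate boxes come from a single forward pass, which is what "looking at the image only once" means in practice; small vehicles in aerial images are mostly handled by the finest (stride 8) grid.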

This paper studies and compares our proposed model and the three YOLOv5 architectures. The three models used are YOLOv5m6, YOLOv5x6 and YOLOv5l6. They were chosen because they differ significantly in size, which helps determine whether a larger or a smaller network adapts better to limited data.

4 Experiments and Results

4.1 Hardware and Software Setup

A Tesla K80 GPU and 13 GB of RAM were used for training, together with the TensorFlow, Keras and scikit-learn libraries in Google Colab; the code was written in Python 3.7.10.

4.2 Training and Testing Data

The annotations of the original VEDAI data set were not compatible with the YOLOv5 models, so annotations in a YOLOv5-compatible format were created manually using Roboflow. We use about a third of the original VEDAI data set (420 of the 1250 images). The annotated data set is split into training data (80%) and validation data (20%).


4.3 Evaluation Criteria

In the prediction phase, seven quantitative performance measures were computed on the validation data to assess the reliability of the trained models, including precision, recall, F1-score, accuracy, macro-average and weighted-average scores. These metrics are computed from the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Performance metrics are a vital part of any research: proposed models need to be evaluated on standard metrics so that their performance can be compared with that of previous models. This section discusses some widely used metrics for assessing aerial object detection models. The precision-recall curve (PRC), mean average precision (mAP) and F-measure (F1) are three well-known and widely applied standard measures for comparison. mAP overcomes the single-point limitation of P, R and F1 and yields an indicator that reflects global performance. It is defined as follows:

$$\mathrm{mAP} = \int_{0}^{1} P(R)\,dR \qquad (1)$$

where P(R) is the precision as a function of the recall R. Strictly, the integral gives the average precision (AP) of one class, and mAP is the mean of the AP values over all classes.
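Equation (1) and the point metrics can be computed as follows. This sketch uses all-point interpolation of the precision envelope; evaluation toolkits differ in how they interpolate (COCO-style evaluation, for example, samples 101 recall points), so numbers may differ slightly. mAP_0.5:0.95 averages this mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05. The class names in the usage lines are hypothetical.

```python
import numpy as np


def precision_recall_f1(tp, fp, fn):
    """Point metrics at a single confidence threshold."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_precision(recall, precision):
    """Area under the precision-recall curve, Eq. (1), for one class.

    recall must be sorted in ascending order (as obtained by sweeping the
    confidence threshold from high to low).
    """
    r = np.concatenate(([0.0], np.asarray(recall, float), [1.0]))
    p = np.concatenate(([0.0], np.asarray(precision, float), [0.0]))
    # Precision envelope: make precision non-increasing in recall.
    p = np.flip(np.maximum.accumulate(np.flip(p)))
    # Sum rectangle areas wherever recall changes (all-point interpolation).
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


ap_car = average_precision([0.5, 1.0], [1.0, 0.5])   # 0.75
ap_truck = average_precision([0.5], [1.0])           # 0.5
print(np.mean([ap_car, ap_truck]))                   # mAP = 0.625
```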

4.4 Training Single Convolution Mode

The annotations available with the VEDAI data set are not compatible with the YOLOv5 model, so the selected data first had to be annotated using Roboflow in a format compatible with the YOLO architecture. All the images were rescaled, and the data were then passed through the models.

4.5 Results and Discussion

The three models are trained on the training data set for 15 epochs. Tables 1, 2 and 3 give the reports of YOLOv5m6, YOLOv5x6 and YOLOv5l6, respectively, on the validation data. Figures 3, 5 and 7 show the (a) recall versus confidence score, (b) F1 versus confidence, (c) precision versus confidence score and (d) precision versus recall graphs. Figures 4, 6 and 8 show further training and validation curves of the models.

Table 1 Report of YOLOv5m6 on the validation data set

| Precision | Recall | mAP_0.5 | mAP_0.5:0.95 | box_loss | obj_loss | cls_loss |
|---|---|---|---|---|---|---|
| 0.18198 | 0.44639 | 0.31153 | 0.13619 | 0.07884 | 0.0081127 | 0.044935 |

Table 2 Report of YOLOv5x6 on the validation data set

| Precision | Recall | mAP_0.5 | mAP_0.5:0.95 | box_loss | obj_loss | cls_loss |
|---|---|---|---|---|---|---|
| 0.095025 | 0.16743 | 0.085099 | 0.03115 | 0.059112 | 0.013321 | 0.059137 |

Table 3 Report of YOLOv5l6 on the validation data set

| Precision | Recall | mAP_0.5 | mAP_0.5:0.95 | box_loss | obj_loss | cls_loss |
|---|---|---|---|---|---|---|
| 0.8099 | 0.16625 | 0.1836 | 0.076123 | 0.067825 | 0.0094907 | 0.03298 |

Fig. 3 Graphs a recall versus confidence score b F1 versus confidence c precision versus confidence score d precision versus recall for YOLOv5m6

4.6 Comparative Study

The results of the three models are compared with each other. Table 4 gives the comparison between the three YOLOv5 models.


Fig. 4 Graphs of a train versus box_loss b train versus object_loss c train versus class_loss d metrics versus precision e metrics versus recall f validation versus box_loss g validation versus object_loss h validation versus class_loss i metrics versus mAP_0.5 j metrics versus mAP_0.5:0.95 for YOLOv5m6

Fig. 5 Graphs a recall versus confidence score b F1 versus confidence c precision versus confidence score d precision versus recall for YOLOv5x6


Fig. 6 Graphs of a train versus box_loss b train versus object_loss c train versus class_loss d metrics versus precision e metrics versus recall f validation versus box_loss g validation versus object_loss h validation versus class_loss i metrics versus mAP_0.5 j metrics versus mAP_0.5:0.95 for YOLOv5x6

Fig. 7 Graphs a recall versus confidence score b F1 versus confidence c precision versus confidence score d precision versus recall for YOLOv5l6


Fig. 8 Graphs of a train versus box_loss b train versus object_loss c train versus class_loss d metrics versus precision e metrics versus recall f validation versus box_loss g validation versus object_loss h validation versus class_loss i metrics versus mAP_0.5 j metrics versus mAP_0.5:0.95 for YOLOv5l6

Table 4 Performance comparison of the models for the VEDAI data set

| Model | mAP_0.5 | mAP_0.5:0.95 |
|---|---|---|
| YOLOv5m6 | 0.31153 | 0.13619 |
| YOLOv5x6 | 0.085099 | 0.03115 |
| YOLOv5l6 | 0.1836 | 0.076123 |

Table 4 shows that YOLOv5m6, the smallest architecture in our study, outperforms the two larger models. However, none of the models gives competitive results, which indicates that generic object detectors are not able to overcome the issues posed by aerial images.

5 Conclusion and Future Scope

The study shows that, even though transfer learning has been very effective for generic object detection tasks, the same models fail to adapt to aerial images. The models were not able to adapt well to the limited data set. These state-of-the-art object detectors fail to overcome the problems of angle shift of the anchor boxes, significant variation in the sizes of different objects, and the low quality of small objects.


In future, we plan to propose a hybrid YOLOv5 model that can overcome the above-mentioned problems. We would also like to extend our work to other state-of-the-art detectors such as R-CNN and to other data sets such as DOTA.


Chapter 28

Video Anomaly Classification Using DenseNet Feature Extractor

Sanskar Hasija, Akash Peddaputha, Maganti Bhargav Hemanth, and Sanjeev Sharma

1 Introduction

Videos have become an intrinsic part of people's daily lives. There are around 149 billion videos on YouTube alone [1], with many more on other platforms. Sorting these videos and organising them into categories is challenging, and the task becomes harder as new videos are continuously added to multiple platforms; delegating it to machines can make it tractable. The idea of using computers to classify videos with deep learning has attracted many researchers around the world. Video classification can be used for anomaly detection, surveillance, video summarisation, action detection, facial sentiment analysis, scenery detection, etc. Rapid development and urbanisation have increased the demand for intelligent real-time video surveillance, and the main objective of many surveillance applications is the precise and timely detection of anomalies.

https://deeplearningforresearch.com/
S. Hasija (✉) · A. Peddaputha · M. B. Hemanth · S. Sharma
Indian Institute of Information Technology, Pune, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_28


S. Hasija et al.

Video anomalies are inconsistencies in videos, such as atypical activities and irregular patterns that do not conform to normal behaviour. Anomaly detection has been a research domain for nearly two decades, and with growing data volumes and increasingly diverse data types, in both supervised and unsupervised settings, the problem requires advanced techniques to achieve better results. Advances in convolutional neural networks over the last decade have produced far more accurate and insightful results than traditional artificial neural networks. Furthermore, open-source deep convolutional models pre-trained on massive datasets, together with better hardware availability, have made such learning more accessible. The convolutional layers of these pre-trained models already encode useful features in their hidden layers, which makes training easier and speeds up the minimisation of the loss. Deep learning is a sub-field of machine learning, itself a branch of artificial intelligence, that builds artificial neural networks inspired by the structure and functioning of the human brain. As the volume and variety of unstructured data have grown, traditional machine learning algorithms have been unable to keep improving their performance. Deep learning, which introduced the use of many hidden layers of different types for different data types, tends to yield better results. This, in turn, requires large computational resources and long training times, but advances in GPUs and TPUs have greatly reduced this burden. In surveillance and crime-detection video data, very little data on anomalous or atypical behaviour is available. We propose a deep CNN model that addresses this issue by using the pre-trained DenseNet121 [2] architecture as a feature extractor and training a custom-built classifier on the extracted features.
The model is trained on the training set of the train–test split provided with the UCF crime dataset [3], and the videos are classified into 14 categories. At each training step, the model is validated on the test data, and metrics such as precision, recall, F1-score, AUC (%) and the ROC curve are evaluated to assess its performance. The remainder of the paper is organised as follows: Sect. 2 discusses previous research on the problem. Section 3 describes the dataset preparation, the pre-processing methods, and the design, configuration and tuning of the model. Section 4 discusses the experiments carried out and the results achieved. Finally, Sect. 5 concludes the paper and presents the future scope of the work.

2 Literature Survey

Many deep learning approaches [4–21] for detecting anomalies in videos have been proposed over the past few years, and a few of them are outlined in this section. Anala et al. [4] presented a technique for categorising anomalies into different types regardless of the environment. The model was developed using a combination of VGG-16 and

28 Video Anomaly Classification Using DenseNet Feature …


LSTM, and it achieved an accuracy of roughly 85%. Nguyen et al. [5] built an anomaly detection network using a combination of a convolutional auto-encoder (Conv-AE), a U-Net with skip connections and an inception module. Unlike other models, this one has no fully connected layers, which allows it to work with images of any resolution. Zhou et al. [6] proposed AnomalyNet, an optimised network built using the SLSTM approach for sparse coding, and used it to detect anomalies. This work bridges the gap between sparse optimisation and long short-term memory (LSTM). Zahid et al. [7] extracted features using the InceptionV3 network and applied a bagging ensemble method to the combination of C3D and a three-layer fully connected network to predict the results. Öztürk et al. [8] proposed ADNet, a neural network that extracts the temporal features of a video using temporal convolutions. The authors also used the F1@k metric to better evaluate the predictions and extended the UCF crime dataset. Wan et al. [9] proposed a large-scale dataset as well as a multitask learning model. The authors used an inflated 3D convolutional network (I3D) to retrieve relevant local features, which were then fed into a recurrent neural network (RNN). These features aid in learning global features, and the predicted classes are produced by fully connected layers built on the I3D and RNN outputs. Ullah et al. [10] proposed an efficient CNN-based model combined with a multilayer bidirectional LSTM (BD-LSTM). Spatio-temporal features were extracted using pre-trained CNN architectures such as ResNet-50, VGG-19 and InceptionV3 and then integrated with the BD-LSTM to enhance its learning ability. Feng et al. [11] designed a multi-stage self-training framework to detect anomalies in videos.
To enhance task-specific feature encoding for videos, the model includes a pseudo-label generator and a self-guided boosted feature encoder. Tian et al. [12] developed a weakly supervised model with a top-k multiple instance learning (MIL) strategy to differentiate between anomalous and non-anomalous snippets of an abnormal video. This model is considered the benchmark on the UCF crime and ShanghaiTech weakly supervised datasets, with AUC-ROC scores of 84.03% and 97.21%, respectively [13]. Sabokrou et al. [14] treated a video as a set of patches, which are divided into local and global patches based on structural similarity. Gaussian classifiers are then applied to the patches to determine the decision boundary used to classify the videos. Tran et al. [15] devised the channel-separated convolutional network, which factorises 3D convolutions to separate channel interactions from spatio-temporal interactions and thereby improves accuracy. It uses the ResNet3D architecture, with several modifications, as its base network, and the model was trained on I3D and Kinetics to improve performance. Diba et al. [16] modified the DenseNet architecture by introducing additional 3D convolutions and a temporal transition layer (TTL), which models variable temporal 3D convolution kernel depths over different time ranges. Xie et al. [17] improved the speed and accuracy of video classification by replacing the 3D convolutions in the bottom layers with 2D convolutions and combining them with separable spatio-temporal convolutions and features. The two-stream I3D model, the benchmark model on the UCF-101 and HMDB-51 datasets, was developed by Carreira

350

S. Hasija et al.

et al. [18]. Tran et al. [19] proposed the convolutional 3D (C3D) network for learning spatio-temporal features in videos, and the authors argued that 3D ConvNets are better feature extractors than 2D ConvNets. Mao et al. [20] used a deep convolutional graph network to represent video frame sequences by leveraging their hierarchical structure. They connected graph edges to group frames, shots or events in the videos that exhibit similar actions. They trained the model on the YouTube-8M Segments dataset [22], evaluated it with the Hit@1 [23] metric and achieved a score of 87.7%. Dubey et al. [21] presented a method that combines a 3D ResNet with multiple instance learning (MIL) [24] and a ranking loss function to classify videos. Spatio-temporal features were extracted with the 3D ResNet network from videos initially categorised as positive or negative. The extracted features were then fed into a deep neural network trained with MIL and the ranking loss function, which yielded an AUC score of 76.67%.

3 Materials and Methods

3.1 Dataset

The UCF crime dataset consists of 14 classes, covering both normal and anomalous videos. It contains around 810 long, untrimmed surveillance videos covering 13 real-world anomalies (Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting and Vandalism) and 950 normal videos. Since the proposed model operates on individual images rather than on video clips, we extracted frames from the videos to use as training images. Using custom-defined functions built with the OpenCV library, we kept every tenth frame of each video. Every video in the dataset has a frame rate of 30 frames per second, so keeping every tenth frame yields exactly three frames per second of video. This kept the dataset to a manageable size and reduced training time without discarding essential information (Figs. 1 and 2). The frames were saved in folders corresponding to their classes. The final image counts for the training and testing data were 1,266,345 and 111,308, respectively. The custom dataset used in this work is available online.
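The frame-sampling step can be sketched as follows. The authors' custom extraction functions are not published, so this is a minimal sketch that assumes OpenCV's `cv2.VideoCapture` and `cv2.imwrite` interface and a hypothetical file-naming scheme (`<video>_<frame>.jpg`); it keeps every tenth frame, i.e. three frames per second for a 30 fps video.

```python
import os


def sampled_frame_indices(total_frames, step=10):
    """Indices of the frames that are kept when saving every `step`-th frame."""
    return list(range(0, total_frames, step))


def extract_every_nth_frame(video_path, out_dir, step=10):
    """Save every `step`-th frame of a video as a JPEG image in `out_dir`.

    Returns the number of frames written. The naming scheme
    "<video name>_<frame index>.jpg" is a hypothetical choice.
    """
    import cv2  # OpenCV; imported here so the index helper above works without it

    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(video_path))[0]
    capture = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = capture.read()  # ok is False once the video is exhausted
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"{stem}_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    capture.release()
    return saved


# One second of 30 fps video keeps frames 0, 10 and 20.
print(sampled_frame_indices(30))  # [0, 10, 20]
```

In practice `extract_every_nth_frame` would be called once per video, with `out_dir` pointing to the folder of that video's class.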

3.2 Image Processing

We defined two separate generators, one for the training data and one for the testing data, using the ImageDataGenerator class of the TensorFlow library. Specific parameters were assigned


Fig. 1 Training data distribution of 13 different anomalies

Fig. 2 Test data distribution of 13 different anomalies

to the training data generator, which performs image augmentation before the images are passed to the model for training. Horizontal flipping was enabled, and random height and width shifts of up to 5% and 10%, respectively, were applied; the horizontal flip randomly mirrors images before they are passed on. Both the training and test data generators rescaled the images by 1/255, which maps each pixel value to the range [0, 1]. The batch size was set to 64, which allows for


Fig. 3 Workflow of the proposed model

the processing of 64 images at once as a single batch. Finally, a pre-processing function was used to convert all of the images to the format required by our feature extractor (Fig. 3).
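To make the generator settings concrete, the following NumPy sketch reproduces their intended effect on one batch: a random shift of up to 5% of the height and 10% of the width (edge pixels are repeated, similar to the default `fill_mode="nearest"` of Keras), a random horizontal flip, and rescaling by 1/255. It is an illustration rather than the Keras implementation, whose exact sampling and interpolation may differ; in Keras these settings correspond to the `height_shift_range`, `width_shift_range`, `horizontal_flip` and `rescale` arguments of `ImageDataGenerator`. The function names are hypothetical.

```python
import numpy as np


def augment(image, rng, height_shift=0.05, width_shift=0.10):
    """Randomly shift and horizontally flip one (H, W, C) uint8 image."""
    h, w = image.shape[:2]
    dy = int(round(rng.uniform(-height_shift, height_shift) * h))
    dx = int(round(rng.uniform(-width_shift, width_shift) * w))
    # Shift by re-indexing; clipping repeats the edge pixels ("nearest" fill).
    rows = np.clip(np.arange(h) - dy, 0, h - 1)
    cols = np.clip(np.arange(w) - dx, 0, w - 1)
    out = image[rows][:, cols]
    if rng.random() < 0.5:  # horizontal flip with probability 1/2
        out = out[:, ::-1]
    return out


def make_batch(images, rng, augment_images=True):
    """Stack images into a float32 batch scaled from [0, 255] to [0, 1]."""
    if augment_images:  # only the training generator augments
        images = [augment(img, rng) for img in images]
    return np.stack(images).astype(np.float32) / 255.0


rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(64, 64, 64, 3), dtype=np.uint8)  # dummy 64x64 frames
batch = make_batch(list(frames), rng)
print(batch.shape)  # (64, 64, 64, 3)
```

The test generator corresponds to `make_batch(..., augment_images=False)`, which only rescales.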

3.3 Proposed Model

The proposed approach has three phases: feature extraction, classification and integration. During the feature extraction phase, the DenseNet121 model serves as a feature extractor for the image data. DenseNet is less prone to overfitting because it uses its parameters more efficiently than other architectures. Within each dense block, every layer is connected to all subsequent layers rather than only to the next one, which improves the flow of feature information between layers and alleviates the vanishing gradient problem [2]. The feature extractor expects input data of shape (64, 64, 3), which is ensured by the pre-processing described earlier. The DenseNet121 model from the Keras package was used for transfer learning, initialised with ImageNet weights. Because these weights were trained to classify the 1000 classes of the ImageNet dataset [25], they provide a suitable initialisation. This phase produces an output of shape (2, 2, 1024). The next phase (the classification phase) consists of several layers that classify the output of the feature extraction phase into 14 classes. Its first layer is a 2D global average pooling layer, which turns the output of the feature extractor into a one-dimensional vector for the subsequent layers: it averages each of the 1024 channels over the (2, 2) spatial block and produces an output of shape (1024,). This output is then passed through three fully connected dense layers with 256, 1024 and 512 units, respectively. ReLU activation was used on


these three fully connected dense layers [26]. Each dense layer is followed by a dropout layer with its own dropout probability, which reduces overfitting [27] to the training data by randomly switching off nodes of the preceding dense layer during training. After the three dense layers, a final dense output layer with 14 nodes is used; because the task is multi-class classification over 14 classes, this layer uses softmax activation. The final integration phase combines the two previous phases into a single model. This model was built with the TensorFlow framework and has an input layer that feeds the feature extractor; the output of the feature extractor is passed to the classification phase, whose output is the model's final output. The model was then compiled with the SGD optimiser [28] and a learning rate of 0.01, with categorical cross-entropy [29] as the loss function and AUC as the evaluation metric. The model has a total of 809,054 trainable parameters; it was trained on the training data and then evaluated on the test data.
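The shapes quoted above can be checked with a little arithmetic, and the classification head can be illustrated with a plain NumPy forward pass. This is a sketch, not the authors' TensorFlow code: the shape function assumes the standard Keras DenseNet121 configuration (a 7×7 stride-2 stem convolution and 3×3 stride-2 max pooling, dense blocks of 6, 12, 24 and 16 layers with growth rate 32, and transition layers that halve both the channels and the spatial size), and the head uses randomly initialised weights. Dropout is omitted because it is inactive at inference time and the paper does not list its probabilities.

```python
import numpy as np


def densenet121_feature_shape(size):
    """Output shape of a DenseNet121 backbone (no top) for a size x size RGB input."""
    s = (size + 2 * 3 - 7) // 2 + 1  # zero-pad 3, 7x7 conv, stride 2
    s = (s + 2 * 1 - 3) // 2 + 1     # zero-pad 1, 3x3 max pool, stride 2
    channels = 64                     # filters of the stem convolution
    for block, n_layers in enumerate((6, 12, 24, 16)):
        channels += 32 * n_layers     # each layer adds growth-rate (32) channels
        if block < 3:                 # transition layer after the first three blocks
            channels //= 2            # compression factor 0.5
            s //= 2                   # 2x2 average pool, stride 2
    return (s, s, channels)


def init_head(rng, sizes=(1024, 256, 1024, 512, 14)):
    """Randomly (He-)initialised (W, b) pairs for the dense layers of the head."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def head_forward(features, params):
    """GAP -> 3 x (Dense + ReLU) -> Dense + softmax for features of shape (N, 2, 2, 1024)."""
    x = features.mean(axis=(1, 2))        # global average pooling -> (N, 1024)
    for w, b in params[:-1]:
        x = np.maximum(0.0, x @ w + b)    # dense layer with ReLU
    w, b = params[-1]
    logits = x @ w + b
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def categorical_cross_entropy(probs, labels, eps=1e-12):
    """Mean categorical cross-entropy for integer class labels."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))


rng = np.random.default_rng(0)
print(densenet121_feature_shape(64))  # (2, 2, 1024)
features = rng.random((8, 2, 2, 1024))
probs = head_forward(features, init_head(rng))
print(probs.shape)  # (8, 14)
```

The same arithmetic gives the familiar (7, 7, 1024) output for a 224 × 224 input, which is a quick sanity check of the assumed layer configuration.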

4 Experiments and Results

4.1 Hardware Set-up

The model was run on Windows 10 with a TensorFlow back end and Python 3.6, on an Intel Core i5-6600 machine with 16 GB of RAM and a 6 GB NVIDIA GeForce GTX 1060 graphics processing unit (GPU).

4.2 Performance Metric

In this section, we discuss the experiments and tests carried out on the proposed model. The test data provided with the UCF crime dataset was used to evaluate the model. Following other work on anomaly detection [3, 10, 30], we use the area under the curve (AUC, %) to assess the model's performance. AUC refers to the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate (TPR) against the false positive rate (FPR) as the decision threshold varies. The AUC summarises the classifier's ability to distinguish between classes: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.
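For a single class treated one-vs-rest, the ROC curve and its area can be computed by sweeping the decision threshold over the predicted scores. The NumPy sketch below illustrates that computation; the function names are hypothetical, libraries such as scikit-learn provide equivalent and more complete routines (`roc_curve`, `roc_auc_score`), and the sketch assumes binary 0/1 labels with at least one positive and one negative example.

```python
import numpy as np


def roc_points(labels, scores):
    """FPR and TPR at every distinct threshold, from the highest score down."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    y, s = labels[order], scores[order]
    # Last position of each run of tied scores: one ROC point per distinct threshold.
    cut = np.r_[np.nonzero(np.diff(s))[0], y.size - 1]
    tps = np.cumsum(y)[cut]   # true positives scored at or above each threshold
    fps = (cut + 1) - tps     # false positives scored at or above each threshold
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


fpr, tpr = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(trapezoid_auc(fpr, tpr))  # 0.75
```

In this toy example, three of the four positive-negative pairs are ranked correctly, which matches the AUC of 0.75 and the probabilistic reading of the metric.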

Table 1 AUC scores comparison

Model                  AUC score (%)
Sultani et al. [3]     75.41
Feng et al. [11]       82.30
Dubey et al. [21]      75.90
Kamoona et al. [31]    80.10
Proposed model         82.91

4.3 Calculated Results

The model was tuned and trained with several hyperparameter combinations, and the combination giving the best performance was used. With this configuration, the model achieved an AUC score of 82.91%, demonstrating its ability to distinguish between the classes.

4.4 Comparative Study

The AUC achieved by the proposed model is higher than that reported in several previous studies. We report the AUC score because most other researchers evaluated their models with the AUC (%). Table 1 shows that the proposed model's AUC score of 82.91% is 0.61, 2.81, 6.24 and 7.5 percentage points higher than the scores reported in earlier studies [11, 31, 21, 3], respectively, for anomaly detection and classification. In other words, when a score for a (sample, true class) pair is compared with a score for a (sample, wrong class) pair, both chosen at random, the model ranks the true-class score higher with a probability of about 0.83. Fig. 4 presents the micro-average ROC scores of the different classes, and Fig. 5 shows the micro-average ROC curve from which the AUC score was calculated. Micro-averaging pools the true positive and false positive counts of all anomaly categories into a single ROC curve, treating every (sample, class) decision as one binary prediction, so each class contributes in proportion to its frequency [32].
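The micro-averaged AUC and its probabilistic reading can be illustrated together. The sketch below (a hypothetical helper, not the authors' code) one-hot encodes the labels, pools every (sample, class) score into a single one-vs-rest problem, and returns the fraction of positive-negative pairs in which the positive entry scores higher, counting ties as one half. This pairwise form equals the area under the pooled ROC curve, but its memory cost is quadratic in the number of entries, so it suits small examples; for a test set of 111,308 frames a threshold-sweep or rank-based computation would be needed.

```python
import numpy as np


def micro_average_auc(labels, probs):
    """Micro-averaged one-vs-rest AUC as a pairwise ranking probability.

    labels: integer class labels, shape (N,)
    probs:  predicted class probabilities, shape (N, C)
    """
    probs = np.asarray(probs, dtype=float)
    onehot = np.eye(probs.shape[1])[np.asarray(labels)]  # (N, C) one-vs-rest targets
    pooled_true = onehot.ravel()
    pooled_score = probs.ravel()
    pos = pooled_score[pooled_true == 1]
    neg = pooled_score[pooled_true == 0]
    diff = pos[:, None] - neg[None, :]   # every positive/negative pair
    wins = np.count_nonzero(diff > 0) + 0.5 * np.count_nonzero(diff == 0)
    return float(wins / diff.size)


probs = np.array([[0.7, 0.2, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.5, 0.3]])
print(micro_average_auc([0, 1, 2], probs))  # 15 of 18 pairs ranked correctly
```

Here the third sample's true-class score (0.3) ties with two wrong-class scores and loses to one, which is why the result falls below 1 even though two of the three samples are classified correctly.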

5 Conclusion and Future Scope

The proposed transfer learning model can classify 13 types of anomalies in video. Timely detection of anomalies through surveillance cameras can help prevent criminal acts and reduce the number of victims. However, this is a difficult task because it requires the constant attention of the person in charge. Applying


Fig. 4 Micro-average ROC scores of different classes of proposed model

Fig. 5 Micro-average ROC score of proposed model

deep learning algorithms to automated anomaly detection makes the job easier and allows quicker reactions. This work once again illustrates the power of convolutional neural networks in the wide-ranging domain of video classification. We conclude that the proposed transfer learning model achieves an AUC score higher than several currently available methods. In future work, we intend to explore other deep learning algorithms such as graph convolutional neural networks, generative adversarial networks (GANs), recurrent


neural networks (RNNs) and LSTMs, and to use reinforcement learning to classify videos more accurately. We will also test the model on other datasets and tune it according to the results obtained.

References
1. Estimate the total number of videos on YouTube (Asked at Google in the past year). https://www.productmanagementexercises.com/1637/estimate-the-total-numberof-videos-on-youtube. Accessed 06.09.2021
2. Huang G, Liu Z, van der Maaten L, Weinberger K (2018) Densely connected convolutional networks
3. Sultani W, Chen C, Shah M (2018) Real-world anomaly detection in surveillance videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6479–6488
4. Anala MR, Makker M, Ashok A (2019) Anomaly detection in surveillance videos. In: 2019 26th international conference on high performance computing, data and analytics workshop (HiPCW), pp 93–98
5. Nguyen T, Meunier J (2019) Anomaly detection in video sequence with appearance-motion correspondence
6. Zhou J, Du J, Zhu H, Peng X, Liu Y, Goh R (2019) AnomalyNet: an anomaly detection network for video surveillance. IEEE Trans Inf Forensics Secur 14:2537–2550
7. Zahid Y, Tahir M, Durrani N, Bouridane A (2020) IBaggedFCNet: an ensemble framework for anomaly detection in surveillance videos. IEEE Access 8:220620–220630
8. Öztürk H, Can A (2021) ADNet: temporal anomaly detection in surveillance videos
9. Wan B, Jiang W, Fang Y, Luo Z, Ding G (2021) Anomaly detection in video sequences: a benchmark and computational model
10. Ullah W, Ullah A, Haq I, Muhammad K, Sajjad M, Baik S (2021) CNN features with bidirectional LSTM for real-time anomaly detection in surveillance networks. Multimedia Tools Appl 80:16979–16995
11. Feng J, Hong F, Zheng W (2021) MIST: multiple instance self-training framework for video anomaly detection
12. Tian Y, Pang G, Chen Y, Singh R, Verjans J, Carneiro G (2021) Weakly-supervised video anomaly detection with robust temporal feature magnitude learning
13. Weakly-supervised video anomaly detection with robust temporal feature magnitude learning. https://paperswithcode.com/paper/weakly-supervised-video-anomaly-detectioncode. Accessed 07.09.2021
14. Sabokrou M, Fathy M, Hoseini M, Klette R (2015) Real-time anomaly detection and localization in crowded scenes. In: 2015 IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 56–62
15. Tran D, Wang H, Torresani L, Feiszli M (2019) Video classification with channel-separated convolutional networks
16. Diba A, Fayyaz M, Sharma V, Karami A, Arzani M, Yousefzadeh R, Van Gool L (2017) Temporal 3D ConvNets: new architecture and transfer learning for video classification
17. Xie S, Sun C, Huang J, Tu Z, Murphy K (2018) Rethinking spatiotemporal feature learning: speed-accuracy trade-offs in video classification
18. Carreira J, Zisserman A (2018) Quo vadis, action recognition? A new model and the kinetics dataset
19. Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks
20. Mao F, Wu X, Xue H, Zhang R (2019) Hierarchical video frame sequence representation with deep convolutional graph network. In: Computer vision—ECCV 2018 workshops, pp 262–270. https://doi.org/10.1007/978-3-030-11018-5_24
21. Dubey S, Boragule A, Jeon M (2020) 3D ResNet with ranking loss function for abnormal activity detection in videos
22. Abu-El-Haija S, Kothari N, Lee J, Natsev P, Toderici G, Varadarajan B, Vijayanarasimhan S (2016) YouTube-8M: a large-scale video classification benchmark. arXiv preprint arXiv:1609.08675
23. YouTube-8M starter code. https://github.com/google/youtube-8m. Accessed 27.11.2021
24. Babenko B (2008) Multiple instance learning: algorithms and applications, pp 1–19
25. Deng J, Dong W, Socher R, Li L, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255
26. Basha S, Dubey S, Pulabaigari V, Mukherjee S (2020) Impact of fully connected layers on performance of convolutional neural networks for image classification. Neurocomputing 378:112–119. https://www.sciencedirect.com/science/article/pii/S0925231219313803
27. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958. http://jmlr.org/papers/v15/srivastava14a.html
28. Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747
29. Zhang Z, Sabuncu M (2018) Generalized cross entropy loss for training deep neural networks with noisy labels
30. Park H, Noh J, Ham B (2020) Learning memory-guided normality for anomaly detection
31. Kamoona A, Gosta A, Bab-Hadiashar A, Hoseinnezhad R (2021) Multiple instance-based video anomaly detection using deep temporal encoding-decoding
32. Gunčar G, Kukar M, Notar M, Brvar M, Černelč P, Notar M, Notar M (2018) An application of machine learning to haematological diagnosis

Chapter 29

Empirical Analysis of Novel Differential Evolution for Molecular Potential Energy Problem

Pawan Mishra, Pooja, and Shubham Shukla

1 Introduction

Differential evolution (DE) is an emerging and frequently used optimization algorithm based on Darwin's theory of the biological evolution of species ("survival of the fittest") [1]. A species must continuously improve itself to survive in a changing environment, mutating in response to environmental changes, so that in the end the most adaptive species remain. DE was first introduced by Storn and Price [2]. It is one of the most popular algorithms of its kind because it efficiently handles real-world problems, computationally hard problems and complex optimization challenges [3]. Following evolutionary theory, DE evolves a population over successive generations until the most suitable solution remains. DE is currently applied to a wide range of engineering [4], biological [5], mathematical [6] and even biochemistry [7] problems. Although DE is applied across a vast area of research, it always follows the same nature-inspired procedure to reach its goal. To obtain the best possible solution and better convergence, researchers have proposed many variants of DE [8–14] and continue to improve its performance.

P. Mishra (✉) · Pooja
Department of Electronics and Communication, University of Allahabad, Prayagraj, UP, India
S. Shukla
Department of Electronics and Communication, Kiet Group of Institutions, Ghaziabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_29


P. Mishra et al.

In this article, we apply the NDE variant of DE to the molecular potential energy problem [15, 16] in an attempt to obtain better optimized results. The literature contains many methods for solving molecular potential energy problems [17–19], such as test-bed functions developed to locate the global minimum. Several nature-inspired approaches have already been developed to solve the molecular potential energy problem efficiently, such as the cultivated differential evolution approach (CuDE) [20], a hybrid of social spider optimization and a genetic algorithm (HSSOGA) [17] and the improved social spider algorithm (ISSA) [19]. This article is organized into five sections. Section 1 contains the introduction, the research gap and a review of related work. Section 2 explains standard DE and states the molecular potential energy problem, and Sect. 3 describes the adopted mechanism, NDE, together with its algorithm. Section 4 analyses the obtained results in tabular and graphical form with a detailed discussion, and Sect. 5 presents the conclusion and future enhancements.

2 Problem Statement

This article uses NDE [14], a successful variant of DE that has been applied in different areas and already tested on various constrained optimization problems and several multi-modal benchmark functions.

2.1 Basic Differential Evolution Algorithm

Differential evolution is a frequently used optimization strategy that follows a fixed sequence of steps to reach the best available outcome.

Step 1: Population initialization. DE generates a random population of size Nρ × D, where Nρ is the number of candidate solutions in the population and D is the dimension of each candidate solution:

$$\vec{X}_i^{\,g}, \qquad i = 1, 2, \ldots, N_\rho \tag{1}$$

where the index i denotes the i-th candidate solution of the population and g denotes the generation of that population.

Step 2: Mutation. Mutation chooses three random vectors, takes the difference of two of them (the difference vector), scales it by F and adds it to the third (the base vector). This creates a mutated population of the same size and dimension:

29 Empirical Analysis of Novel Differential Evolution …

$$\vec{V}_i^{g} = \vec{X}_{a_1}^{g} + F \cdot \left(\vec{X}_{a_2}^{g} - \vec{X}_{a_3}^{g}\right), \tag{2}$$

where i = 1, …, Nρ; a1, a2, a3 ∈ {1, …, Nρ} are chosen at random such that a1 ≠ a2 ≠ a3 ≠ i; and F ∈ [0, 1] is the scaling (mutation) control parameter.

Step 3: Cross-over. Cross-over combines the current population $\vec{X}_i$ with the mutant population $\vec{V}_i$ to form the trial population $\vec{U}_i$:

$$U_{j,i}^{g} = \begin{cases} V_{j,i}^{g} & \text{if } \mathrm{rand}_j \le C_r \text{ or } j = j_{\mathrm{rand}}, \\ X_{j,i}^{g} & \text{otherwise}, \end{cases} \tag{3}$$

where i = 1, …, Nρ and j = 1, …, D; 1 ≤ j_rand ≤ D is a randomly chosen parameter index, drawn once for each i, which guarantees that the trial vector inherits at least one component from the mutant. The cross-over rate Cr ∈ [0, 1] is chosen by the user, and rand_j is a uniformly distributed random number in (0, 1) generated for each dimension j.

Step 4: Selection. Several techniques exist for the selection operator; selection ensures that the candidate solutions improve over the generations. Here, a one-to-one tournament is held between each candidate of the target (base) population and the corresponding candidate of the trial population, and the better of the two survives to the next generation:

$$\vec{X}_i^{g+1} = \begin{cases} \vec{U}_i^{g} & \text{if } f(\vec{U}_i^{g}) \le f(\vec{X}_i^{g}), \\ \vec{X}_i^{g} & \text{otherwise}, \end{cases} \tag{4}$$

where $f(\vec{U}_i^{g})$ and $f(\vec{X}_i^{g})$ are the fitness values of the trial vector $\vec{U}_i^{g}$ and the target vector $\vec{X}_i^{g}$, respectively.
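To make Eqs. (1)–(4) concrete, the following is a minimal Python/NumPy sketch of one generation of classic DE (DE/rand/1/bin) with the two-population scheme of basic DE. It is an illustration under our own assumptions, not the authors' implementation (their experiments ran in Dev-C++); the function name `de_generation` and the defaults F = 0.5, Cr = 0.9 are hypothetical choices.

```python
import numpy as np


def de_generation(pop, fitness, f, F=0.5, Cr=0.9, rng=None):
    """One generation of classic DE/rand/1/bin, following Eqs. (2)-(4).

    pop     : (Np, D) array of target vectors X_i^g
    fitness : (Np,) array with f(X_i^g)
    Returns the population and fitness of generation g + 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    Np, D = pop.shape
    # Basic DE keeps two populations: donors always come from the current
    # generation `pop`, while survivors are written to `new_pop`.
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(Np):
        # Eq. (2): three mutually distinct indices, all different from i
        others = [k for k in range(Np) if k != i]
        a1, a2, a3 = rng.choice(others, size=3, replace=False)
        V = pop[a1] + F * (pop[a2] - pop[a3])
        # Eq. (3): binomial cross-over; component j_rand always comes from V
        take_V = rng.random(D) <= Cr
        take_V[rng.integers(D)] = True
        U = np.where(take_V, V, pop[i])
        # Eq. (4): one-to-one selection, ties go to the trial vector
        fU = f(U)
        if fU <= fitness[i]:
            new_pop[i], new_fit[i] = U, fU
    return new_pop, new_fit


if __name__ == "__main__":
    sphere = lambda x: float(np.sum(x ** 2))
    rng = np.random.default_rng(0)
    pop = rng.uniform(-100, 100, size=(20, 5))
    fit = np.array([sphere(x) for x in pop])
    for _ in range(300):
        pop, fit = de_generation(pop, fit, sphere, rng=rng)
    print("best sphere value:", fit.min())
```

Because selection is greedy, no candidate's fitness can get worse from one generation to the next; this is the property the single population structure of NDE (Sect. 3) builds on.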

2.2 Molecular Potential Energy Problem (MPEP)
The molecular potential energy problem is one of the most challenging problems in biochemistry, and it also belongs to the field of physical chemistry [16]. The problem is to find the lowest-energy state (structure) of a molecule, because this state determines several properties of the molecule. Finding the global optimum of a function can therefore be related to minimizing the molecular potential energy function, whose minimizer gives the lowest-energy structure of the molecule [21]. Since the problem fits the standard global optimization setting, it is used here as a benchmark function.


P. Mishra et al.

Although many researchers have tried to solve this problem in past years, it remains a challenge because of its complex nature. The main difficulty is that the number of local minimizers grows exponentially with the size of the molecule [22]. Consider a simplified molecular model consisting of a chain of n beads (variables) x1, x2, …, xn arranged linearly in 3-D space. Every two consecutive beads xi and xi+1 are separated by the bond length l_{i,i+1}, which is the Euclidean distance between them. For every three consecutive beads (xi, xi+1, xi+2), α_{i,i+2} is the bond angle, which gives the position of the third bead relative to the line through the previous two. For every four consecutive beads (xi, xi+1, xi+2, xi+3), ω_{i,i+3} is the torsion angle between the planes determined by (xi, xi+1, xi+2) and (xi+1, xi+2, xi+3). The potential energy is the sum of four terms: E1 for the bond lengths, E2 for the bond angles, E3 for the torsion angles, and E4 for the interaction between beads separated by three covalent bonds:

$$E_1 = \sum_{(i,j)\in M_1} c^{1}_{ij}\left(l_{ij} - l^{0}_{ij}\right)^2, \tag{5}$$

$$E_2 = \sum_{(i,j)\in M_2} c^{2}_{ij}\left(\alpha_{ij} - \alpha^{0}_{ij}\right)^2, \tag{6}$$

$$E_3 = \sum_{(i,j)\in M_3} c^{3}_{ij}\left(1 + \cos\left(3\omega_{ij} - \omega^{0}_{ij}\right)\right), \tag{7}$$

$$E_4 = \sum_{(i,j)\in M_3} \frac{(-1)^{i}}{l_{ij}}, \tag{8}$$

where c^1_{ij} is the bond-stretching force constant, c^2_{ij} the angle-bending force constant and c^3_{ij} the torsion force constant; l^0_{ij} and α^0_{ij} are the "preferred" bond length and bond angle, respectively; ω^0_{ij} is the phase angle that defines the position of the minima; M_k, k = 1, 2, 3, is the set of pairs of beads separated by k covalent bonds; and l_{ij} is the Euclidean distance between the beads x_i and x_j. The objective is to minimize the total molecular potential energy E = E1 + E2 + E3 + E4, whose minimizer gives the optimal spatial positions of the beads. Using the parameters defined in [16], the potential energy function takes the following form:



$$E = \sum_{i=1}^{n-3}\left(1 + \cos\left(3\omega_{i,i+3}\right)\right) + \sum_{i=1}^{n-3} \frac{(-1)^{i}}{\sqrt{10.60099896 - 4.141720682\cos\left(\omega_{i,i+3}\right)}}, \tag{9}$$

where n is the number of beads in the system under consideration. The problem thus reduces to finding the torsion angles ω_{i,i+3}, i = 1, 2, …, n − 3. Equation (9) shows that E is a non-convex function with numerous local minimizers even for small n: there are 2^N of them, where N = n − 3 is the number of variables (torsion angles) of the molecule [16]. These local minimizers correspond to states that are not truly stationary but almost stationary, called metastable states of the molecule. Restricting the angles to 0 < ω_{i,j} < 5 guarantees the existence of only one global minimum.
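Equation (9) is cheap to evaluate and separable in the torsion angles, so it can be transcribed in a few lines. The sketch below is our own Python/NumPy transcription (the name `mpep_energy` is hypothetical). It uses the 1-based index i of Eq. (9), so odd-indexed angles receive the negative sign. Because the terms are separable, each angle can be minimized on its own; in our own grid-search check this gives a global minimum of roughly −0.041 per angle.

```python
import numpy as np


def mpep_energy(omega):
    """Molecular potential energy of Eq. (9).

    omega : sequence of the N = n - 3 torsion angles omega_{i,i+3},
            i = 1..N, each expected to satisfy 0 < omega < 5.
    """
    omega = np.asarray(omega, dtype=float)
    i = np.arange(1, omega.size + 1)             # 1-based index i of Eq. (9)
    torsion = 1.0 + np.cos(3.0 * omega)           # first sum of Eq. (9)
    # Second sum: alternating-sign term; the radicand is always >= 6.45.
    nonbonded = (-1.0) ** i / np.sqrt(10.60099896 - 4.141720682 * np.cos(omega))
    return float(np.sum(torsion + nonbonded))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    omega = rng.uniform(0.0, 5.0, size=15)        # a random conformation, N = 15
    print("energy of a random conformation:", mpep_energy(omega))
```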

3 Novel Differential Evolution (NDE)
NDE is one of the successful variants of DE. It is based on two concepts: a self-adaptive mechanism for the control parameters and a single population structure. Together, these increase the chance of reaching the optimum candidate solution efficiently.

a. Generating self-adaptive control parameter settings. Parameter tuning is a common way to improve the quality of candidate solutions, so choosing the parameter values is an important part of designing a DE algorithm; the control parameters often determine both the diversity of the results and the convergence quality. DE has two control parameters: the mutation factor (F) and the cross-over rate (Cr). The mutation factor mainly influences the convergence rate, while the cross-over rate determines how much of the mutant vector enters the trial vector. In basic DE, F and Cr are constants; NDE instead computes them at run time in every iteration, using the following model:

$$F = \begin{cases} F_0 + \sqrt{r\,(r - r_1)(r - r_2)(r - r_3)} & \text{if } P_F < r_4, \\ F_1 & \text{otherwise}, \end{cases} \tag{10}$$

$$C_r = \begin{cases} C_{r0} + r_5 & \text{if } P_{Cr} < r_5, \\ C_{r1} & \text{otherwise}, \end{cases} \tag{11}$$

where r_k ∈ (0, 1), k = 1, …, 5, are uniformly generated random numbers and r = (r1 + r2 + r3)/2.



P_F is the scaling probability and P_Cr the cross-over probability; both are set to 0.5. The values of F and Cr are kept within the ranges (F0, F1) = [0.1, 0.5] and (Cr0, Cr1) = [0.5, 0.9]. If a value falls outside its range, it is reflected back into the range as follows:

$$F = \begin{cases} 2F_0 - F & \text{if } F < F_0, \\ 2F_1 - F & \text{if } F > F_1, \end{cases} \tag{12}$$

$$C_r = \begin{cases} 2C_{r0} - C_r & \text{if } C_r < C_{r0}, \\ 2C_{r1} - C_r & \text{if } C_r > C_{r1}. \end{cases} \tag{13}$$

Here, F and Cr are recomputed in every iteration, which changes the randomness of the solutions generated by the mutation and cross-over operations. With this mechanism, the control parameters do not need to be fixed or adjusted manually in each iteration, and the candidate solutions remain diverse.

b. Single population structure. With a single population structure, newly generated elite candidate solutions take part in the current generation. Basic DE maintains two populations, whereas MDE [8] proposes a single population structure in which improved candidate solutions (trial vectors) are written back into the current population itself. This reduces memory use and execution time. It also increases the probability of reaching the best candidate solution, because the newly updated, better candidate solutions take part in the mutation and cross-over processes again. For further details of the NDE algorithm, readers may refer to Pooja et al. [14].
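The self-adaptive rule of Eqs. (10)–(13) can be sketched in Python as follows. The product under the square root in Eq. (10) has the form of Heron's formula, with r as the semi-perimeter of a triangle with sides r1, r2, r3. When the three random numbers cannot form a triangle, the product is negative. The source does not say how this case is handled, so this sketch takes the absolute value (an assumption). Likewise, a single reflection by Eqs. (12)–(13) can still leave Cr outside [Cr0, Cr1] (for example, Cr = 1.4 reflects to 0.4), so the sketch clips as a final safeguard (also an assumption). The names `reflect` and `sample_F_Cr` are hypothetical.

```python
import numpy as np

F0, F1 = 0.1, 0.5       # feasible range of F  (Sect. 3, Table 2)
CR0, CR1 = 0.5, 0.9     # feasible range of Cr (Sect. 3, Table 2)
PF = PCR = 0.5          # scaling and cross-over probabilities


def reflect(value, lo, hi):
    """Reflect an out-of-range value back into [lo, hi] (Eqs. (12)-(13))."""
    if value < lo:
        value = 2 * lo - value
    elif value > hi:
        value = 2 * hi - value
    # Assumption: clip in case one reflection is not enough.
    return min(max(value, lo), hi)


def sample_F_Cr(rng):
    """Draw self-adaptive F and Cr for one iteration (Eqs. (10)-(11))."""
    r1, r2, r3, r4, r5 = rng.random(5)
    r = (r1 + r2 + r3) / 2
    if PF < r4:
        # abs() is an assumption for the non-triangle case (negative product)
        F = F0 + np.sqrt(abs(r * (r - r1) * (r - r2) * (r - r3)))
    else:
        F = F1
    Cr = CR0 + r5 if PCR < r5 else CR1
    return reflect(float(F), F0, F1), reflect(float(Cr), CR0, CR1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for _ in range(3):
        print(sample_F_Cr(rng))
```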

3.1 NDE Algorithm
The steps of the NDE algorithm are as follows.

Input: population size Np, dimension D, bounds LowBound and HighBound; PCr ← 0.5, PF ← 0.5
for run ← 0 to max_run do
    init_pop ← scale(random(Np, D), LowBound, HighBound)    (population initialization)
    fitness ← [benchmark_fn(x) for x in init_pop]
    for iter ← 0 to max_iter do
        set F and Cr by Eqs. (10) and (11); keep them within (F0, F1) = [0.1, 0.5] and (Cr0, Cr1) = [0.5, 0.9] using Eqs. (12) and (13)
        mutate_pop ← list(); trial_pop ← list()
        for i ← 0 to Np − 1 do
            choose distinct random indices a1, a2, a3, all different from i
            diff ← init_pop[a2] − init_pop[a3]
            mutate_pop.append(init_pop[a1] + F ∗ diff)
        end for
        for j ← 0 to Np − 1 do
            cross_points ← random(D) < Cr
            if not any(cross_points) then
                cross_points[random(0, D)] ← True
            end if
            trial_pop.append(where(cross_points, mutate_pop[j], init_pop[j]))
        end for
        for k ← 0 to Np − 1 do
            score_trial ← benchmark_fn(trial_pop[k])
            if score_trial ≤ fitness[k] then
                init_pop[k] ← trial_pop[k]    (replaced in the single population)
                fitness[k] ← score_trial
            end if
        end for
    end for
end for
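The algorithm above can be sketched end to end in Python/NumPy and applied to Eq. (9). This is a hedged sketch, not the authors' code. Like the pseudocode, it samples F and Cr once per iteration. Unlike the pseudocode, it performs mutation, cross-over and selection inside a single loop and overwrites improved vectors in place, so an improved vector can act as a donor later in the same iteration; this is our reading of the single population structure of Sect. 3(b). The bound handling (clipping trial vectors to [low, high]), the abs() and clipping in the parameter sampler, and all function names are our assumptions.

```python
import numpy as np


def mpep_energy(omega):
    """Eq. (9); omega holds the N = n - 3 torsion angles."""
    i = np.arange(1, len(omega) + 1)
    return float(np.sum(1.0 + np.cos(3.0 * omega)
                        + (-1.0) ** i / np.sqrt(10.60099896 - 4.141720682 * np.cos(omega))))


def sample_F_Cr(rng, F0=0.1, F1=0.5, Cr0=0.5, Cr1=0.9, PF=0.5, PCr=0.5):
    """Eqs. (10)-(13); abs() and the final clip are assumptions."""
    r1, r2, r3, r4, r5 = rng.random(5)
    r = (r1 + r2 + r3) / 2
    F = F0 + np.sqrt(abs(r * (r - r1) * (r - r2) * (r - r3))) if PF < r4 else F1
    Cr = Cr0 + r5 if PCr < r5 else Cr1
    F = 2 * F0 - F if F < F0 else (2 * F1 - F if F > F1 else F)              # Eq. (12)
    Cr = 2 * Cr0 - Cr if Cr < Cr0 else (2 * Cr1 - Cr if Cr > Cr1 else Cr)  # Eq. (13)
    return float(np.clip(F, F0, F1)), float(np.clip(Cr, Cr0, Cr1))


def nde(f, D, low, high, Np=100, max_iter=500, seed=None):
    """Minimize f over [low, high]^D; returns (best vector, best value)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(low, high, size=(Np, D))       # population initialization
    fit = np.array([f(x) for x in pop])
    for _ in range(max_iter):
        F, Cr = sample_F_Cr(rng)                      # once per iteration
        for i in range(Np):
            others = [k for k in range(Np) if k != i]
            a1, a2, a3 = rng.choice(others, size=3, replace=False)
            mutant = pop[a1] + F * (pop[a2] - pop[a3])
            cross = rng.random(D) < Cr
            if not cross.any():
                cross[rng.integers(D)] = True
            # Assumption: clip the trial vector to the search bounds.
            trial = np.clip(np.where(cross, mutant, pop[i]), low, high)
            f_trial = f(trial)
            if f_trial <= fit[i]:                     # single population:
                pop[i], fit[i] = trial, f_trial       # overwrite in place
    best = int(fit.argmin())
    return pop[best].copy(), float(fit[best])


if __name__ == "__main__":
    x, e = nde(mpep_energy, D=15, low=0.0, high=5.0, Np=50, max_iter=300, seed=1)
    print("best energy found:", e)
```

The chapter's experiments use Np = 100 and up to 500,000 function evaluations (Table 2); the smaller settings in the usage example only keep the run short.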

4 Result Analysis and Discussion
4.1 Standard Benchmark Function
To validate the efficiency of the NDE algorithm, standard benchmark functions were taken from CEC 2008 [15]. They are listed in Table 1 with their dimensions and corresponding bounds.



Table 1 Standard benchmark functions with their bounds, dimensions and global optimum values (g*)
Sr. No. | Function | Bounds (lower/upper) | Dimension | Global optimum (g*)
--- | --- | --- | --- | ---
F1 | Sphere function | −100/100 | 30/50 | 0.00
F2 | Ackley function | −32/32 | 30/50 | 0.00
F3 | Exponential function | −1/1 | 30/50 | 0.00
F4 | Penalized-I function | −50/50 | 30/50 | 0.00
F5 | Rastrigin function | −5.2/5.2 | 30/50 | 0.00
F6 | Noise function | −1.28/1.28 | 30/50 | 0.00
F7 | Rosenbrock function | −30/30 | 30/50 | 0.00
F8 | Schwefel 2.21 function | −100/100 | 30/50 | 0.00
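For readers who want to reproduce Table 1, textbook definitions of five of the listed functions are sketched below. The chapter does not give their formulas, so these standard, unshifted forms are our assumption (the CEC 2008 suite [15] may use shifted variants). The Exponential, Penalized-I and Noise functions are omitted. Each function reaches the g* = 0 listed in Table 1 at x = 0, except Rosenbrock, which reaches it at x = 1.

```python
import numpy as np


def sphere(x):          # F1: sum of squares
    return float(np.sum(x ** 2))


def ackley(x):          # F2: standard Ackley with a = 20, b = 0.2, c = 2*pi
    d = x.size
    return float(-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / d))
                 - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / d) + 20.0 + np.e)


def rastrigin(x):       # F5: highly multimodal, regular grid of local minima
    return float(np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0))


def rosenbrock(x):      # F7: narrow curved valley, minimum at x = (1, ..., 1)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2))


def schwefel_2_21(x):   # F8: largest absolute coordinate
    return float(np.max(np.abs(x)))


if __name__ == "__main__":
    x = np.zeros(30)
    print(sphere(x), ackley(x), rastrigin(x), rosenbrock(np.ones(30)), schwefel_2_21(x))
```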

Table 2 Parameter settings for the NDE algorithm
Parameter | Value
--- | ---
Population size (Nρ) | 100
Dimension (D) | 15/30/50 (molecular potential energy problem), 30/50 (benchmark functions)
PF / PCr | 0.5, with (F0, F1) = [0.1, 0.5] and (Cr0, Cr1) = [0.5, 0.9]
NFE | 500,000
Max runs | 50

4.2 Parametric and Experimental Settings
The parameter settings are listed in Table 2. All functions were executed on an i5 processor with 5 GB RAM in the Dev-C++ environment.

4.3 Analysis over Standard Benchmark Functions
Table 3 compares DE and NDE on the benchmark functions listed in Table 1, each tested in 30 and 50 dimensions. The reported fitness value is the average error over 50 runs, together with its standard deviation (S.D.). NDE obtains a lower average error than standard DE on F1–F6 and F8 in both dimensions. The exception is F7 (Rosenbrock), on which DE performs better.



Table 3 Experimental outcomes for the benchmark functions (Table 1) in terms of fitness value and S.D.
Function | Dim | DE fitness value | DE S.D. | NDE fitness value | NDE S.D.
--- | --- | --- | --- | --- | ---
F1 | 30 | 8.53215e-20 | 7.58546e-21 | 3.29176e-44 | 2.2321e-43
F1 | 50 | 8.43419e-12 | 8.54165e-12 | 9.15839e-28 | 8.12146e-28
F2 | 30 | 4.30728e-11 | 5.22121e-12 | 3.9968e-15 | 4.64151e-16
F2 | 50 | 7.45257e-07 | 8.55454e-06 | 1.25233e-14 | 2.2131e-15
F3 | 30 | 7.22971e-20 | 8.56454e-21 | 2.54541e-25 | 4.74524e-25
F3 | 50 | 7.54952e-16 | 5.84547e-15 | 1.11022e-16 | 2.62653e-15
F4 | 30 | 1.39311e-19 | 2.21514e-19 | 1.3536e-20 | 3.2626e-21
F4 | 50 | 5.15673e-13 | 4.21545e-14 | 8.12158e-20 | 5.36217e-22
F5 | 30 | 149.228 | 152.245 | 6.96471 | 4.32549
F5 | 50 | 337.487 | 402.547 | 20.1801 | 19.84520
F6 | 30 | 0.00759495 | 0.00842454 | 0.00032561 | 0.00086562
F6 | 50 | 0.013385 | 0.012465 | 0.0102791 | 0.0321565
F7 | 30 | 12.0827 | 13.2646 | 33.7078 | 32.5454
F7 | 50 | 40.3687 | 42.25641 | 62.1753 | 65.2541
F8 | 30 | 0.245413 | 0.016262 | 0.021145 | 0.86565
F8 | 50 | 7.69047 | 8.56415 | 0.154651 | 0.96847

4.4 Analysis over Molecular Potential Energy Problem
DE and NDE are analyzed on the molecular potential energy problem for dimensions D = {15, 30, 50}. The mean number of function evaluations (NFE) and the corresponding CPU execution time are reported in Table 4, and the mean fitness value and S.D. in Table 5. The performance of NDE is thus measured in terms of fitness value (mean, S.D.), NFE and CPU time. In all three cases, NDE required fewer function evaluations and less CPU time than DE and reached a lower mean energy (Tables 4 and 5). Together with Table 3, this indicates that NDE performed better than DE not only on most of the standard benchmark functions but also on the molecular potential energy function. Convergence graphs for the 30D and 50D cases of the molecular potential energy problem are shown in Fig. 1.
Statistical analysis: Table 6 reports a statistical analysis of whether the NDE results for the molecular potential energy problem are significant. A p-test was applied for 30-D and 50-D. In both cases the p-value is below 0.001, which is lower than the standard significance level α = 0.05.



Table 4 Experimental outcomes for the molecular potential energy problem in terms of NFE and CPU time
Dim | DE NFE | DE CPU time (s) | NDE NFE | NDE CPU time (s)
--- | --- | --- | --- | ---
15 | 100,675 | 1.50 | 86,565 | 0.15
30 | 205,454 | 1.86 | 146,622 | 0.22
50 | 350,124 | 2.02 | 253,134 | 0.50

Table 5 Experimental outcomes for the molecular potential energy problem in terms of fitness value and S.D.
Dim | DE fitness value | DE S.D. | NDE fitness value | NDE S.D.
--- | --- | --- | --- | ---
15 | −0.79628 | −0.821125 | −0.85326 | −0.8.65641
30 | 5.85361 | 5.68924 | −0.92208 | −0.946451
50 | 19.18512 | 20.05121 | −0.94049 | −0.95898

Fig. 1 Convergence graphs of DE and NDE for the molecular potential energy problem: (a) DE vs NDE (30D), (b) DE vs NDE (50D)

Table 6 p-test result of NDE for the molecular potential energy problem (N: total number of observations; df: degrees of freedom)
Problem | Dim | N | df | p-test performance (α = 0.05)
--- | --- | --- | --- | ---
Molecular potential energy problem | 30 | 20 | 18 | p < 0.001
Molecular potential energy problem | 50 | 20 | 18 | p < 0.001



5 Conclusion
The DE variant used here, NDE, is one of the advanced DE algorithms, and it was applied to reach globally optimal solutions for the presented functions. It had previously been used to handle constrained benchmark problems, where it showed good exploration of the search space for the global optimum. In this work, it has been applied to unconstrained benchmark functions and to a real-world problem, the molecular potential energy problem. The numerical results and convergence graphs show that NDE outperformed DE in terms of fitness value on all functions except F7, and on the molecular potential energy problem also in terms of NFE and CPU execution time. Overall, NDE improved the quality of the solutions and proved to be an effective tool for the molecular potential energy problem.

References
1. Darwin C (1859) On the origin of species, or the preservation of favoured races in the struggle for life, vol 532. John Murray, London. https://doi.org/10.4324/9780203509104
2. Storn R, Price K (1997) Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. J Global Optimization 11:341–359. https://doi.org/10.1023/A:1008202821328
3. Deb K, Anand A, Joshi D (2002) A computationally efficient evolutionary algorithm for real-parameter optimization. Evol Comput 10:371–395. https://doi.org/10.1162/106365602760972767
4. Plagianakos V, Tasoulis D, Vrahatis M (2008) A review of major application areas of differential evolution. In: Advances in differential evolution, vol 143. Springer, Berlin, pp 19–238. https://doi.org/10.1007/978-3-540-68830-3_8
5. Eiben AE, Smith JE (2008) Introduction to evolutionary computing. Natural computing series. Springer. https://doi.org/10.1007/978-3-662-44874-8_1
6. Qin AK, Huang VL, Suganthan PN (2009) Differential evolution algorithm with strategy adaptation for global numerical optimization. IEEE Trans Evol Comput 13(2):398–417. https://doi.org/10.1109/TEVC.2008.927706
7. Lavor C, Maculan N (2004) A function to test methods applied to global minimization of potential energy of molecules. Numer Algorithms 35:287–300. https://doi.org/10.1023/B:NUMA.0000021763.84725
8. Babu BV, Angira R (2006) Modified differential evolution (MDE) for optimization of non-linear chemical processes. Comput Chem Eng 30(6–7):989–1002. https://doi.org/10.1016/j.compchemeng.2005.12.020
9. Tessema B, Yen GG (2006) A self adaptive penalty function based algorithm for constrained optimization. IEEE Int Conf Evol Comput 2006:246–253. https://doi.org/10.1109/CEC.2006.1688315
10. Pooja CP, Kumar P (2015) Control parameters and mutation based variants of differential evolution algorithm. J Comput Methods Sci Eng 15(4):783–800. https://doi.org/10.3233/JCM150593
11. Mezura-Montes E, Palomeque-Ortiz AG (2009) Parameter control in differential evolution for constrained optimization. In: IEEE Congress on Evolutionary Computation (CEC'2009), IEEE, Trondheim
12. Pant M, Ali M, Abraham A (2009) Mixed mutation strategy embedded differential evolution. In: IEEE Congress on Evolutionary Computation, pp 1240–1246. https://doi.org/10.1109/CEC.2009.4983104



13. Zaheer H, Pant M (2014) A differential evolution approach for solving integer programming problems. Adv Intell Syst Comput. https://doi.org/10.1007/978-81-322-2220-0_33
14. Pooja CP, Kumar P, Tomar A (2018) A novel differential evolution approach for constraint optimisation. Int J Bio-Inspired Comput 12(4):254–265. https://doi.org/10.1504/IJBIC.2018.096459
15. Tang K, Yao X, Suganthan PN, MacNish C, Chen YP, Chen CM, Yang Z (2008) Benchmark functions for the CEC'2008 special session and competition on large scale global optimization. Technical Report CEC-08, 1–18. https://doi.org/10.1.1.515.821
16. Maranas CD, Floudas CA (1994) A deterministic global optimization approach for molecular structure determination. J Chem Phys (AIP) 100(2):1247–1261. https://doi.org/10.1063/1.467236
17. Tawhid MA, Ali AF (2017) A hybrid social spider optimization and genetic algorithm for minimizing molecular potential energy function. Soft Comput :6499–6514. https://doi.org/10.1007/s00500-016-2208-9
18. Marques Jorge MC, Emilio M-N, Hase WL (2020) Application of optimization algorithms in chemistry. Front Chem 8:198. ISSN 2296-2646. https://doi.org/10.3389/fchem.2020.00198
19. Baş E, Ülker E (2020) Improved social spider algorithm for minimizing molecular potential energy function. Konya Mühendislik Bilimleri Dergisi 8(3):618–642. https://doi.org/10.36306/konjes.788082
20. Pooja CP, Kumar P (2015) A cultivated differential evolution variant for molecular potential energy problem. In: 3rd International Conference on Recent Trends in Computing 2015 (ICRTC-2015). https://doi.org/10.1016/j.procs.2015.07.429
21. Leach AR (2001) Molecular modelling: principles and applications, 2nd edn. Prentice-Hall, Harlow, England, pp 253–273. ISBN 0582382106. https://www.worldcat.org/title/molecular-modelling-principles-and-applications/oclc/45008511
22. Maranas CD, Floudas CA (1994) Global minimum potential energy conformations of small molecules. J Glob Optim 4:135–170. https://doi.org/10.1007/BF01096720

Chapter 30

Sign Language versus Spoken English Language—A Study with Supervised Learning System Using RNN for Sign Language Interpretation
Sampada S. Wazalwar and Urmila Shrawankar

1 Introduction
Society includes many different kinds of people, some of whom have physical or mental disabilities. The physically disabled community includes deaf and dumb people, who are hearing and speech impaired. This community faces many challenges when communicating with hearing people, and the basic problem is language. Sign language is a means of communication between deaf and dumb individuals and hearing individuals, but most hearing people do not know it, and as a result the deaf and dumb community is often ignored. To overcome this interpretation problem, the community needs human interpreters: trained professionals who can interpret sign language and make it understandable to hearing people in their preferred language. However, an interpreter is usually not available for day-to-day communication and is costly, so the community remains ignored. There is therefore a need for an interpreter system that automatically converts sign language communication into meaningful sentences. Education is affected as well: very few teachers are available to teach deaf and dumb students, because teachers first have to learn Indian Sign Language (ISL). Easier ways of learning, understanding and interpreting sign language are therefore needed.

S. S. Wazalwar (B) · U. Shrawankar
Computer Science and Engineering, G H Raisoni College of Engineering, Nagpur, MS, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_30



S. S. Wazalwar and U. Shrawankar

2 Current Scenario with Major Challenges
Many systems have already been proposed or are in use for sign recognition [1]. Most of them require hardware to capture the signer's signs, and it is not practical for a person to carry this hardware at all times so that others can understand them. Schools, too, often cannot afford such hardware for teaching and learning sign language, and many deaf and dumb students are not aware that such systems exist. A small, portable system for sign interpretation is therefore needed. A further problem is that a sign recognition system outputs merely a sequence of words, one for each recognized sign. This output is not a complete, meaningful sentence, so it may be misinterpreted; an interpretation step is needed to convert it into a meaningful sentence. An easy-to-use interpretation system would make it easier for teachers and students to learn, understand and interpret the language. The following sections therefore highlight the differences between spoken language and sign language, including their grammar, with the aim of making sign language easier to teach and learn.

Interpretation: Spoken Language versus Sign Language. Interpreting spoken language differs greatly from interpreting sign language. Table 1 compares the two with respect to several parameters.

Table 1 Difference between spoken language and Indian Sign Language
Parameter | Spoken language | Indian Sign Language
--- | --- | ---
Type of language | Spoken | Visual
Input to interpreter | Complete sentence available | Only signs for keywords; missing words must be filled in
No. of words | 171,476 words in current use and 47,156 obsolete words (Oxford English Dictionary) | 3000 words (ISL Dictionary released on 23 March 2018 by ISLR&TC): everyday terms (2351 words), legal terms (237 words), academic terms (212 words), medical terms (200 words)
Correctness | Grammatical correctness required | Partially correct output can still serve the purpose
Ambiguity | Word/sentence ambiguity | Sign ambiguity
Grammar | Standard grammar | Partial grammar
Accuracy | Depends on sentence correctness | Depends on the standard signs used

These parameters indicate that a dedicated interpretation system is needed for sign language, one that can map the properties of sign language onto spoken language.

30 Sign Language versus Spoken English Language …


Table 2 Difference between English and ISL grammar
English grammar | ISL grammar
--- | ---
Well structured; follows SVO order | Follows SOV order
Uses various forms of verbs and adjectives | Uses only the root form of a word (no gerunds, suffixes or other forms)
Larger dictionary | Limited dictionary
The wh-word in an interrogative sentence comes at the start of the sentence | The question word is always placed at the end of the sentence
Articles, conjunctions and helping verbs are used | No articles, conjunctions or linking verbs are used
Uses "I" and "me" | Does not use "I", only "me"
Uses gerunds | Does not use gerunds
Subject-predicate structure | Topic-comment structure

3 Prominent Variance Between English and Sign Language
English and sign language have different grammars, which creates many challenges for interpretation [3]. Table 2 lists some differences between English and ISL grammar.
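To make some of the rules in Table 2 concrete, the toy Python sketch below rewrites a short English sentence into an ISL-style word sequence (gloss). It drops articles and helping/linking verbs, replaces "I" with "me", reduces a few inflected forms to their root, and moves a leading wh-word to the end. It runs in the opposite direction from the interpreter described in this chapter and ignores SOV reordering and the topic-comment structure. The word lists and the name `to_isl_gloss` are hypothetical illustrations, not an ISL resource.

```python
import re

ARTICLES = {"a", "an", "the"}
HELPING_VERBS = {"am", "is", "are", "was", "were", "be", "been"}  # hypothetical, incomplete
WH_WORDS = {"what", "where", "when", "who", "why", "how", "which"}
ROOTS = {"going": "go", "cooking": "cook", "studying": "study"}    # tiny illustrative lexicon


def to_isl_gloss(sentence):
    """Apply a few Table 2 rules to an English sentence (toy example)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    gloss = []
    for w in words:
        if w in ARTICLES or w in HELPING_VERBS:
            continue                        # ISL uses no articles or linking verbs
        if w == "i":
            w = "me"                        # ISL uses only "me"
        gloss.append(ROOTS.get(w, w))       # root form only, no gerunds
    # In ISL the question word comes at the end of the sentence.
    if gloss and gloss[0] in WH_WORDS:
        gloss = gloss[1:] + gloss[:1]
    return gloss


if __name__ == "__main__":
    print(to_isl_gloss("What is your name?"))
    print(to_isl_gloss("I am going to school"))
```

An interpreter has to undo exactly these transformations, which is why the missing words must be inferred, as discussed in Sect. 4.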

4 Sign Language Interpretation System
Some systems are available for real-time interpretation of sign language, for example MotionSavvy and SignAll. MotionSavvy [4] is a tablet app that recognizes sign language using a Leap Motion controller and translates ASL into English and vice versa. Its drawback is that the signer always has to carry the tablet, and it works only with the Leap Motion device and does not cover the complete set of signs. Another tool, SignAll [5], also translates ASL into English in real time. It is generally used by businesses and organizations to improve accessibility for their deaf and dumb customers and employees. No comparable system is available for Indian Sign Language.

4.1 Steps Involved in Model of Interpretation of Sign Language
A sign recognition system outputs a specific word for each sign. In the absence of a human interpreter, interpreting this sequence of words is also a challenging task, so a system is required that produces short, meaningful English sentences for the given signs. The general system model is shown in Fig. 1. The first step after sign recognition is to collect the sign keywords and arrange them in sequence to obtain a correct, meaningful English sentence with respect to the context [6]. The individual steps are shown in Fig. 1 and explained below.

Fig. 1 Steps for model of interpretation for sign language

Step I: Sign Keywords. The sign recognition system outputs the words assigned to particular signs. These sign words are the words of the Sign Language Dictionary of 3000 Indian Sign Language words introduced by the Indian Sign Language Research and Training Centre under the Department of Empowerment of Persons with Disabilities [7].

Sample sentences: Deaf and dumb people communicate in sign language, in which it is difficult to form long sentences, so users communicate through short sentences with few signs. We therefore also concentrate on short sentences of four to five words, such as:
"I am a boy"
"I am very happy today"
"Every day I go to school"
"Today is holiday"
"I am going to enjoy today"
"My mother is cooking food inside kitchen"
"I have to study for good result"
"It was a wonderful day"

Step II: POS Tagging. To generate a grammatically correct sentence, the first step is part-of-speech (POS) tagging, which identifies the part of speech of each word [8]. The POS tagging results for the sample sentences are as follows.



The Stanford POS tagger is applied, and the results obtained are shown below.
I am a boy, my name is ___.
I_PRP am_VBP a_DT boy_NN ,_, my_PRP$ name_NN is_VBZ ____CD ._.
I am very happy today.
I_PRP am_VBP very_RB happy_JJ today_NN ._.
Today is holiday.
Today_NN is_VBZ holiday_NN
I am going to enjoy today.
I_PRP am_VBP going_VBG to_TO enjoy_VB today_NN
In all of the above examples, the sentence is complete. Sign language input, by contrast, is incomplete: after POS tagging, which gives the part of speech of each word, the main task is to identify the missing words needed to generate a sentence.

Step III: Text Generation Model. A text generation model is needed for sentence completion. Because signing yields incomplete sentences, consider the simple example sentences in Figs. 2 and 3, in which only some words have a standard sign.
Examples: Sentence 1: I am a boy. Sentence 2: It is a wonderful day.

Fig. 2 Example 1 for missing word identification (Sentence 1, "I am a boy"; legend: signed words / words to be inserted by the system to complete the sentence)

Fig. 3 Example 2 for missing word identification (Sentence 2, "It is a wonderful day")

In the second example, signs are available for "wonderful" and "day". "Wonderful" may also be interpreted as "beautiful" or "good", so there are multiple choices. Similarly, "is" could be replaced by "was", which also produces a meaningful sentence; which one to select depends on the tense of the adjacent statements. To overcome these problems, we need either robust generation rules that define which words to insert in the gaps to form a grammatically correct sentence, or a system trained to identify the sequence in which words should be arranged, giving a weight to each word in the sentence. Methods for such problems include assigning probabilities, frequency distributions, Markov chain text generation, and supervised learning or expert systems trained on a large database. Building rules for text generation is difficult for sign language because it has no standard grammar, and a rule-based approach would have to cover a very large number of cases. We therefore chose a supervised training approach to build the interpreter, which places the work within machine learning. Neural networks can address many machine learning problems, but a plain feed-forward network does not model sequences: it cannot capture how the current state is affected by previous ones. Recurrent neural networks (RNNs) address this by learning from past information, and long short-term memory (LSTM) networks, a form of RNN, further improve on this.
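The probability and Markov-chain options listed above can be illustrated with a small bigram model. This is not the supervised RNN-LSTM approach the authors adopt. The sketch below (all names hypothetical) counts word bigrams in the eight sample sentences from Step I and fills each gap between consecutive signed keywords with the most probable sequence of at most two inserted words. With this tiny corpus, the keywords "it, wonderful, day" become "it was a wonderful day", because the corpus contains "was" but not "is" after "it". This shows how such a model settles the is/was choice purely from its training data.

```python
from collections import Counter, defaultdict

# Sample sentences from Step I, used as a (tiny) training corpus.
CORPUS = [
    "I am a boy", "I am very happy today", "Every day I go to school",
    "Today is holiday", "I am going to enjoy today",
    "My mother is cooking food inside kitchen",
    "I have to study for good result", "It was a wonderful day",
]


def bigram_counts(sentences):
    counts = defaultdict(Counter)
    for s in sentences:
        words = s.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0 for unseen words."""
    total = sum(counts[prev].values()) if prev in counts else 0
    return counts[prev][nxt] / total if total else 0.0


def best_bridge(counts, a, b, max_fill=2):
    """Most probable words (at most max_fill) to insert between keywords a and b."""
    best_words, best_p = [], 0.0
    frontier = [([], 1.0, a)]           # (inserted words, probability, last word)
    for _ in range(max_fill + 1):
        expanded = []
        for words, p, last in frontier:
            p_end = p * prob(counts, last, b)
            if p_end > best_p:
                best_words, best_p = words, p_end
            if last in counts:
                for w in counts[last]:
                    expanded.append((words + [w], p * prob(counts, last, w), w))
        frontier = expanded
    return best_words


def fill_gaps(counts, keywords, max_fill=2):
    sentence = [keywords[0]]
    for a, b in zip(keywords, keywords[1:]):
        sentence += best_bridge(counts, a, b, max_fill) + [b]
    return " ".join(sentence)


if __name__ == "__main__":
    model = bigram_counts(CORPUS)
    print(fill_gaps(model, ["i", "boy"]))
    print(fill_gaps(model, ["it", "wonderful", "day"]))
```

Such a model only reproduces word sequences it has seen, which is one reason a learned sequence model trained on a much larger corpus is attractive.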

4.2 Subjective Analysis of a Given Solution
Many people are involved in the general teaching, learning and interpretation of sign language: deaf/dumb signers, teachers/students, human interpreters and the common public. To learn about the problems these people face, we visited three deaf and dumb schools in Nagpur:
1. Deaf and Dumb School, Shankar Nagar, Nagpur.
2. Kalyan Muk Badhir Vidyalay, Reshimbagh, Nagpur.
3. Deaf and Dumb Industrial Institute, Ram Nagar, Nagpur.

The points identified in our discussions with teachers, students and human interpreters are summarized in Table 3.

5 System Implementation

Some systems are available for real-time interpretation; we use an RNN LSTM network to train the interpretation system on sentences. LSTM uses gates that let gradients flow back in time, which reduces the vanishing gradient problem and makes LSTM capable of learning long-term dependencies (Fig. 4). The algorithm for an RNN with LSTM includes the following steps.

30 Sign Language versus Spoken English Language …


Table 3 Subjective analysis

Discussion person: Teacher in deaf and dumb school
General reviews: Difficult to teach sign language; standard dictionary not easily accessible; no fixed grammar; one word, many signs; interpretation with a hardware set-up is difficult and costly; identifying tense is a major challenge
Solution needed: Standard dictionary needed; online websites or apps for teaching; common standards needed; a simple and handy system
Feedback on proposed solution: Will help both teacher and student to learn and practise words and interpretation; digital/online teaching system; students will use technology

Discussion person: Deaf and dumb student
General reviews: Standard signs are not always needed; lip reading, air writing and finger spelling may serve the purpose; easy to communicate with deaf and dumb people but difficult in common society; cannot express oneself without knowing the sign
Solution needed: Self-controlled interpretation system needed; human interpreters are hard to find and costly; either stand-alone systems or a handy application needed at public places
Feedback on proposed solution: System will help to communicate with others; reduced hesitation; can come out of the wall; need to carry a phone or a digital assistant at all times; cheaper and more comfortable than a human interpreter

Discussion person: Human interpreter
General reviews: Varies from person to person; depends on the person's speed, background, use of standard signs, facial expression and proper orientation; signs may sometimes be misinterpreted
Solution needed: Automatic interpretation systems may serve the purpose
Feedback on proposed solution: Can reduce misinterpretation; rigorous machine learning may help accuracy; ambiguity can be handled mechanically by the system; the person's speed will not affect the output; background noise can be removed

Discussion person: Common man
General reviews: Lack of sign language knowledge; difficult to understand and communicate with deaf and dumb people
Solution needed: Technical help or an automatic interpreter may serve the purpose
Feedback on proposed solution: Such a system will reduce the communication gap between deaf–dumb and hearing people

Step 1: During training, the network decides what information to throw away from the cell state. This is done by the forget gate, whose values are determined by the sigmoid function:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$



Fig. 4 RNN LSTM unit

Step 2: The network decides what new information to store in the current cell state. This is done by the input gate,

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i),$$

together with a candidate cell input

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C).$$

Here $x_t$ is the input vector of the LSTM unit, $i_t$ is the input gate activation vector, $\tilde{C}_t$ is the cell input activation vector, $h_t$ is the hidden state (output) vector of the unit, and W and b are the weight matrices and bias vectors learned during training.

Step 3: The cell state is updated by combining the retained old state with the new candidate values, where * denotes element-wise multiplication:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

Step 4: The output gate decides what to output, and tanh is applied to the cell state to bring its values between −1 and 1:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t * \tanh(C_t),$$

where $o_t$ is the output gate activation vector.
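Steps 1 to 4 can be sketched as a single forward time step of one LSTM unit. This is a minimal pure-Python illustration, not the authors' training code: the layer sizes and input vectors are hypothetical, and the small random weights stand in for values that would be learned during training.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W·v + b, with W a list of rows and b a list of biases."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step (Steps 1-4); every gate sees the concatenation [h_{t-1}, x_t]."""
    hx = h_prev + x_t                                                # [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], hx)]       # Step 1: forget gate
    i_t = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], hx)]       # Step 2: input gate
    c_hat = [math.tanh(z) for z in affine(p["W_C"], p["b_C"], hx)]   # Step 2: candidate C~_t
    c_t = [f * c + i * ch for f, c, i, ch in zip(f_t, c_prev, i_t, c_hat)]  # Step 3
    o_t = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], hx)]       # Step 4: output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]               # Step 4: h_t in (-1, 1)
    return h_t, c_t

def init_params(n_hidden, n_input, seed=0):
    """Small random weights and zero biases; real values would come from training."""
    rng = random.Random(seed)
    p = {}
    for gate in ("f", "i", "C", "o"):
        p["W_" + gate] = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + n_input)]
                          for _ in range(n_hidden)]
        p["b_" + gate] = [0.0] * n_hidden
    return p

# Hypothetical sizes: 3-dimensional input vectors (e.g. sign embeddings), 4 hidden units.
params = init_params(n_hidden=4, n_input=3)
h, c = [0.0] * 4, [0.0] * 4
for x in [[0.1, 0.2, 0.3], [0.0, 0.5, -0.2], [0.3, -0.1, 0.4]]:
    h, c = lstm_step(x, h, c, params)
print(h)
```

With all weights and biases at zero, every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved at each step; this is an easy sanity check of the update in Step 3.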



6 Results

We built a database of 150 basic short sentences in ISL, each no more than eight words long, together with their English interpretations; sample sentences are shown in Fig. 5. The RNN LSTM algorithm described above was applied for 20 epochs at a time, with details given in Figs. 6, 7 and 8.

Fig. 5 Sample training sentences

Fig. 6 Training details



Fig. 7 Details of training in each epoch which results in vector matrix generation

7 Conclusion

Accurate sign recognition and sign interpretation systems are needed for the upliftment of the deaf–dumb community in society. Many researchers are working on this problem, but our subjective analysis shows that the technology is not known to the deaf–dumb community, and those who do know of it do not use it because of its cost or its hardware requirements. People want portable, handy systems that can easily be carried from place to place, or stand-alone systems that can be mounted at various public places. Our system model is entirely software based and can be used as an assistive system for deaf and mute people. The system is a work in progress: we will extend it to solve ambiguity problems, grow the training database, and compare its performance with that of human interpreters. In its current state, the system achieves at least 88% accuracy, which compares well with probabilistic and rule-based approaches to text generation.



Fig. 8 Sample output received for some sentences


Chapter 31

Hidden Markov Modelling for Biological Sequence

K. Senthamarai Kannan and S. D. Jeniffer

1 Introduction

According to the Human Genome Project report, the human genome may contain between 20,000 and 25,000 genes, with two copies of each gene. TP53, a tumour suppressor gene on chromosome 17 that is linked to cancer, provides the instructions for making the protein p53. The tumour protein p53 regulates cell division by preventing cells from growing and dividing (proliferating) in an uncontrolled way. It binds directly to the DNA in the cell's nucleus. Harmful agents such as toxic chemicals and ultraviolet (UV) rays from the sun can damage this hereditary material, and if the gene itself is damaged, cells may become malignant. Missense substitutions are considered the most common type of TP53 alteration, accounting for about 75% of them. Nonsense mutations and frameshift insertions and deletions account for 9% and 7%, respectively, and the remaining alterations are silent mutations and a few rarer changes. Cancer is the second leading cause of death worldwide, accounting for 9.6 million deaths in 2018, or around one in every six deaths. Nearly 70% of cancer deaths occur in low- and middle-income countries. Lung, breast, colorectal and skin cancers are the most frequent cancers in humans (WHO report) [1, 14, 19].

K. Senthamarai Kannan · S. D. Jeniffer (B) Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_31



K. Senthamarai Kannan and S. D. Jeniffer

2 Review on Related Research

The analysis of biological sequences with hidden Markov models builds on the work of Rabiner [21]. Simmossis et al. give an overview of multiple sequence alignment. Hidden Markov models were proposed for speech recognition and applied to real-world data [27]. Various aspects of multiple sequence alignment have also been studied [2, 6, 7]. De Fonzo presented the use of HMMs for biological analysis, distinguishing between Markov, hidden Markov and profile hidden Markov models [4]; in multiple sequence alignment, profile–profile alignment can then be performed. Gene HMM is a gene prediction model that models the pre-mRNA in order to predict its gene; a few years later, additional features of the HMM were introduced as Gene HMM [26]. Following these researchers, numerous improvements and implementations were carried out in the following years. Bioinformatics has recently become a broad and effective tool for analysing biological sequences and predicting disease early. Karuppusamy et al. derived and trained an HMM for random sequences, using a path probability analysis to determine the best path [11]; the article mainly focussed on the mathematical background of the hidden states intron and exon. A systematic review of hidden Markov models was conducted by Mor et al. [17]. Robson et al. introduced MathFeature, a software package for feature extraction from biological sequences that is based mainly on mathematical descriptors [3]. Stuti et al. aligned the gene sequences of COVID-19 patients. Denesh Kumar et al. also worked on sequence alignment: six diabetic gene sequences were aligned with the Clustal web tool and compared with the gene sequences of six common diseases [5]. Jeniffer et al. aligned TP53 gene sequences and constructed the transition probability matrix [10]. The aligned sequences are used to generate and train a profile HMM; in that study, the EM algorithm is used to align the sequences and a new tool was proposed. Jalview was introduced by Procter et al. to visualize aligned sequences [20]. Bimal Kumar Sankar conducted an entropy-based study of biological sequences and examined the selection of SARS-CoV regions [25]. Roth et al. used the BaMM server to discover motifs and analyse the regulatory sequences of biological sequences [24]. By modifying the Baum–Welch algorithm, Jiefu Li et al. developed a method to improve the training of HMMs [15]. This work builds on the advances of the research above and takes on the task of predicting disease.

31 Hidden Markov Modelling for Biological Sequence


3 Description of the Data

Mutated TP53 gene sequences from nine cancer patients were used in this work. The sequences were taken from the Catalogue of Somatic Mutations in Cancer (COSMIC) database. The beginnings of the sequences are shown below:

ATGGATGATTTGATGCTGTCCCCGGACGATATTGATCACTGAAGACCCAGGT…
ATGGATGATTTGATGCTGTCCCCGGACGATATTGATCACTGAAGACCCAGGT…
ATGGATGATTTGATGCTGTCCCCGGACGATATTGATCACTGAAGACCCAGGT…
ATGGATGATTTGATGCTGTCCCCGGACGATATTGATCACTGAAGACCCAGGT…
ATGGAGGAGCCGCAGTCAGATCCTAGCGTCGAGCCGTCAGGAAACATTTTCA…
ATGGAGGAGCCGCAGTCAGATCCTAGCGTCGAGCCGTCAGGAAACATTTTCA…
ATGGAGGAGCCGCAGTCAGATCCTAGCGTCGAGCCGTCAGGAAACATTTTCA…
ATGGAGGAGCCGCAGTCAGATCCTAGCGTCGAGCCGTCAGGAAACATTTTCA…
ATGGATGATTTGATGCTGTCCCCGGACGATATTGATCACTGAAGACCCAGGT…

In these sequences, ‘A’ represents the Adenine nucleotide, and ‘C’, ‘G’ and ‘T’ represent the Cytosine, Guanine and Thymine nucleotides, respectively.

4 Methodology

Before the Markov chain is constructed, the data must be preprocessed. Preprocessing consists of the disparity index test, agglomerative clustering with multiple sequence alignment, and generation of the consensus sequence.

Lemma 1 (Disparity Index) The disparity index is a widely used analytical method for testing the homogeneity of sequences. Let X be the first gene sequence of a pair and Y the second. Let $x_i$ be the number of occurrences of nucleotide i in the first sequence and $y_i$ the number of occurrences of the same nucleotide in the second sequence, where i is one of Adenine, Cytosine, Guanine and Thymine. The composition difference between the two sequences is defined as

$$D_C = \frac{1}{2}\sum_{i}(x_i - y_i)^2, \quad i = A, C, G, T. \tag{1}$$

The expected value of $D_C$ is obtained from

$$E(D_C) = E\left[\frac{1}{2}\sum_{i}\left(\sum_{k=1}^{L}\delta_i^k\right)^2\right], \tag{2}$$

where L is the number of aligned sites, $a_k$ and $b_k$ are the nucleotides at site k of X and Y, and

$$\delta_i^k = \begin{cases} +1 & \text{if } a_k = i \text{ and } b_k \neq i,\\ -1 & \text{if } a_k \neq i \text{ and } b_k = i,\\ 0 & \text{otherwise.} \end{cases}$$

Applying the homogeneity condition to (1) when calculating the expected value gives the disparity index

$$I_D = \frac{1}{2}\sum_{i}(x_i - y_i)^2 - N_d, \tag{3}$$

where $N_d$ is used as an estimator of $E(D_C)$ under homogeneity.
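A minimal sketch of Eqs. (1) and (2), assuming two aligned, gap-free sequences of equal length; the example sequences are hypothetical fragments, not the COSMIC data. It computes $D_C$ from the nucleotide counts, checks the identity behind Eq. (2), $x_i - y_i = \sum_k \delta_i^k$, and counts the sites at which the sequences differ, on which the estimator $N_d$ in Eq. (3) is based.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"

def composition_distance(x_seq, y_seq):
    """Eq. (1): D_C = 1/2 * sum over i of (x_i - y_i)^2, i in {A, C, G, T}."""
    x, y = Counter(x_seq), Counter(y_seq)
    return 0.5 * sum((x[i] - y[i]) ** 2 for i in NUCLEOTIDES)

def site_deltas(x_seq, y_seq, i):
    """delta_i^k for every aligned site k, as defined under Eq. (2)."""
    deltas = []
    for a_k, b_k in zip(x_seq, y_seq):      # assumes equal-length, gap-free sequences
        if a_k == i and b_k != i:
            deltas.append(1)
        elif a_k != i and b_k == i:
            deltas.append(-1)
        else:
            deltas.append(0)
    return deltas

def differing_sites(x_seq, y_seq):
    """Number of aligned sites at which the two sequences differ."""
    return sum(a != b for a, b in zip(x_seq, y_seq))

# Hypothetical aligned pair of equal length.
X = "ATGGAGGAGCCG"
Y = "ATGGATGATTTG"
d_c = composition_distance(X, Y)
# For each nucleotide, the summed deltas reproduce the count difference x_i - y_i.
identity_holds = all(sum(site_deltas(X, Y, i)) == Counter(X)[i] - Counter(Y)[i]
                     for i in NUCLEOTIDES)
print(d_c, differing_sites(X, Y), identity_holds)
```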

When the homogeneity assumption is satisfied, $E(I_D) = 0$ [13].

Lemma 2 (Agglomerative Clustering for Multiple Sequence Alignment) All of the sequences must be aligned before they can be analysed. Multiple sequence alignment determines the degree of homology among the members of a set of globally related sequences: homologous residues from the sequences are placed together in the same column. Homologous is meant in both a structural and an evolutionary sense. Several heuristic methods of multiple alignment exist. Progressive alignment methods are based on the natural idea of computing all the pairwise alignments and using them to build a multiple alignment. First, all pairwise alignment scores are calculated; these scores measure how closely the sequences are related. The sequences are then clustered using these scores. Agglomerative hierarchical clustering successively merges units, so in sequence analysis it groups the sequences according to their similarity. Let $d(x_i, x_j)$ be the dissimilarity between two sequences $x_i$ and $x_j$, and let $\delta(X, Y)$ be the dissimilarity between two clusters of sequences X and Y. The cluster dissimilarity (complete linkage) is defined as

$$\delta(X, Y) = \max_{x_i \in X,\, x_j \in Y} d(x_i, x_j) \tag{4}$$

If there are p sequences to merge, the two closest are merged first, leaving p − 1 clusters. The distances are then recomputed and the two closest clusters are merged again; the process continues until only one cluster is left [23]. Depending on whether the merged units are single sequences or profiles of already aligned sequences, each step is a sequence–sequence, sequence–profile or profile–profile alignment, and the order of the merges can be visualized as a tree known as a dendrogram.

Lemma 3 (Consensus Sequence) A consensus sequence is a single sequence that is derived from the alignment of numerous constituent sequences and reflects the ‘best match’ for all of them.
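The merging procedure with the complete-linkage dissimilarity of Eq. (4) can be sketched as follows. This is a minimal illustration, not the tool used in this chapter: the pairwise dissimilarities are hypothetical (they could be, for instance, one minus the fraction of identical sites in each pairwise alignment), and the sketch only records the merge order that a dendrogram would show; it does not perform the profile alignments at each merge.

```python
def complete_linkage(d, labels):
    """Agglomerative clustering with delta(X, Y) = max d(x_i, x_j) (Eq. 4).

    d is a symmetric dict-of-dicts of pairwise dissimilarities; the result is
    the merge order as a list of (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = max(d[x][y] for x in clusters[a] for y in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((clusters[a], clusters[b], dist))     # merge the two closest clusters
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical dissimilarities between four sequences.
labels = ["s1", "s2", "s3", "s4"]
pairwise = {("s1", "s2"): 0.1, ("s1", "s3"): 0.6, ("s1", "s4"): 0.7,
            ("s2", "s3"): 0.5, ("s2", "s4"): 0.8, ("s3", "s4"): 0.2}
d = {x: {} for x in labels}
for (x, y), value in pairwise.items():
    d[x][y] = d[y][x] = value

for left, right, dist in complete_linkage(d, labels):
    print(left, "+", right, "at", dist)
```

Here s1 and s2 are merged first (a sequence–sequence step), then s3 and s4, and finally the two resulting groups (a profile–profile step).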



If not all of the constituent sequences contain the same residue (here, nucleotide) at a given position, a voting or other selection procedure is used to determine which residue is used [22]. The homogeneity of the gene sequences is checked with the disparity index. The homogeneous sequences are then merged by agglomerative clustering and aligned by progressive alignment, and the aligned sequences are reduced to a consensus sequence from which the Markov chain is constructed.
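A simple majority vote is one possible selection procedure. The sketch below assumes equal-length aligned sequences with ‘-’ for gaps; ignoring gaps in the vote, breaking ties alphabetically and dropping all-gap columns are choices of this sketch, not rules stated in the chapter.

```python
from collections import Counter

def consensus(aligned, gap="-"):
    """Majority-vote consensus of equal-length aligned sequences.

    Gaps do not vote, ties go to the alphabetically first nucleotide, and a
    column containing only gaps is dropped (all three are assumptions).
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*aligned):
        votes = Counter(ch for ch in column if ch != gap)
        if votes:
            # max() keeps the first of equally frequent symbols, so sort first for ties
            out.append(max(sorted(votes), key=lambda ch: votes[ch]))
    return "".join(out)

print(consensus(["ATG-A", "ATGCA", "TTGCA"]))
```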

4.1 Markov Chain

A stochastic process $\{X_n\}$ with a finite or denumerable state space S has the Markov property if, for any integers $0 \le j < n$ and any choice of states $i_j, \ldots, i_n$ in S,

$$\Pr\{X_n = i_n \mid X_{n-1} = i_{n-1}, \ldots, X_{j+1} = i_{j+1}, X_j = i_j\} = \Pr\{X_n = i_n \mid X_{n-1} = i_{n-1}\}$$

whenever the conditional probabilities involved are defined, i.e. when the conditioning events have positive probabilities. We call $\{X_n\}$ a Markov chain with state space S if it has the Markov property [8]. For a finite state space, the transition probabilities of $\{X_n\}$ at each step n can be arranged in the matrix

$$P = \begin{bmatrix} p_{11} & p_{12} & p_{13} & \cdots \\ p_{21} & p_{22} & p_{23} & \cdots \\ p_{31} & p_{32} & p_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$

The stochastic matrix P(n) is known as the transition probability matrix of the process $\{X_n\}$ from the nth to the (n + 1)th step; it connects the distributions of the random variables $X_n$ and $X_{n+1}$ for every n ≥ 0 [16].
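The chapter does not spell out how P is estimated from the consensus sequence. A common choice, assumed here, is the maximum-likelihood estimate $p_{ij} = n_{ij} / \sum_j n_{ij}$, where $n_{ij}$ counts how often nucleotide j immediately follows nucleotide i. The short input sequence below is hypothetical.

```python
NUCLEOTIDES = "ACGT"

def transition_matrix(seq, states=NUCLEOTIDES):
    """Maximum-likelihood estimate p_ij = n_ij / sum_j n_ij from adjacent pairs in seq.

    Assumes seq contains only symbols from `states`; a state that is never
    followed by another symbol gets an all-zero row (nothing to estimate from).
    """
    index = {s: k for k, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for a, b in zip(seq, seq[1:]):
        counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([n / total if total else 0.0 for n in row])
    return matrix

P_hat = transition_matrix("ACGTACGA")
for s, row in zip(NUCLEOTIDES, P_hat):
    print(s, row)
```

Each non-zero row sums to 1, which is the stochastic-matrix property required of P.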

4.2 Embedded Markov Chain

Given that the system is currently in state i, $p_{ij}$ is the probability that it moves to state j at the next jump. The one-step transition probability matrix of a discrete-time Markov chain, the matrix P whose (i, j)th entry is $p_{ij}$, is a stochastic matrix. The embedded Markov chain is a discrete-time Markov chain represented by a matrix $R = [r_{ij}]$, known as the embedded matrix corresponding to the Markov matrix $P = [p_{ij}]$.



The elements $r_{ij}$ for $j \neq i$ describe the probabilities of transitions between distinct nucleotides in the biological sequence described by P, in the sense that every transition of the embedded chain is from one nucleotide to a different one [28].
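For a discrete-time chain, the standard way to obtain the embedded (jump) chain from P is to drop the self-transitions and renormalise each row: $r_{ij} = p_{ij}/(1 - p_{ii})$ for $j \neq i$ and $r_{ii} = 0$. The sketch below applies this to the matrix P reported in Section 5 (order A, C, G, T); for example, its first and last rows closely match the first and last rows of the embedded matrix R reported there.

```python
def embedded_matrix(P):
    """Embedded (jump) chain: r_ij = p_ij / (1 - p_ii) for j != i, and r_ii = 0."""
    R = []
    for i, row in enumerate(P):
        leave = 1.0 - row[i]                 # probability of leaving state i in one step
        if leave <= 0.0:
            raise ValueError(f"state {i} is absorbing; its embedded row is undefined")
        R.append([0.0 if j == i else p / leave for j, p in enumerate(row)])
    return R

# Transition probability matrix P reported in Section 5 (rows/columns A, C, G, T).
P = [[0.2364, 0.2618, 0.3273, 0.1745],
     [0.2705, 0.3743, 0.1148, 0.2404],
     [0.2843, 0.2810, 0.2614, 0.1732],
     [0.1026, 0.3034, 0.4017, 0.1923]]
R = embedded_matrix(P)
for row in R:
    print([round(r, 4) for r in row])
```

When P is itself estimated from counts on a single sequence, this formula gives exactly the same result as counting transitions directly in the sequence with runs of identical nucleotides collapsed, because both reduce to $n_{ij}/(n_i - n_{ii})$.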

4.3 Hidden Markov Model

In a Markov model, all states of a linear sequence are directly observable. In some cases, however, unobserved factors affect the state transitions, and incorporating them requires more advanced models such as HMMs. An HMM consists of two or more Markov chains: one chain consists entirely of observed states, and the other chains consist entirely of unobserved (hidden) states that influence the outcome of the observed states.

Transition probabilities: as each sequence is considered, each transition probability is computed from the frequency of the corresponding transitions, so the model parameters are calculated from the state transition sequences.

Emission probabilities: once the state transition sequence has been defined, the emission probability of each symbol $\alpha \in \Sigma$, where $\Sigma = \{A, C, G, T\}$ is the alphabet, is estimated for each match and insert state k. Each state then has emission probabilities that specify the likelihood of emitting each symbol of Σ in state k:

$$e_k(\alpha) = \frac{\mathrm{Freq}_k(\alpha)}{\sum_{v \in \Sigma} \mathrm{Freq}_k(v)} \tag{5}$$
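Eq. (5) amounts to counting, for each hidden state, how often each nucleotide is emitted and normalising. A minimal sketch, assuming the state of each aligned position is already known; the (state, symbol) pairs below are hypothetical, and the optional pseudocount is an addition of this sketch rather than part of Eq. (5).

```python
from collections import Counter

def emission_probabilities(state_symbol_pairs, alphabet="ACGT", pseudocount=0.0):
    """Eq. (5): e_k(a) = Freq_k(a) / sum over v of Freq_k(v), for every hidden state k.

    state_symbol_pairs is an iterable of (state, symbol) pairs with symbols from
    `alphabet`; a positive pseudocount avoids zero probabilities for unseen symbols.
    """
    freq = {}
    for state, symbol in state_symbol_pairs:
        freq.setdefault(state, Counter())[symbol] += 1
    emissions = {}
    for state, counts in freq.items():
        total = sum(counts[a] + pseudocount for a in alphabet)
        emissions[state] = {a: (counts[a] + pseudocount) / total for a in alphabet}
    return emissions

pairs = [("M", "A"), ("M", "C"), ("M", "A"), ("I", "G")]
print(emission_probabilities(pairs))
```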

Step Form of Baum–Welch Algorithm
• Initialisation: pick arbitrary model parameters.
• Recurrence: set all of the A and E variables to their pseudocount values r (or to zero). For each sequence j = 1, …, n, calculate $f_k(i)$ for sequence j using the forward algorithm, calculate $b_k(i)$ for sequence j using the backward algorithm, and add the contribution of sequence j to A and E. Then calculate the new model parameters and the updated log likelihood of the model.
• Termination: stop if the change in log likelihood falls below a set threshold or the maximum number of iterations is reached.

Viterbi Algorithm
• Initialisation (i = 0): $A_0(0) = 1$, $A_k(0) = 0$ for k > 0.
• Recursion (i = 1, …, L):

$$A_l(i) = e_l(x_i)\max_k\big(A_k(i-1)\,a_{kl}\big), \qquad \mathrm{ptr}_i(l) = \arg\max_k\big(A_k(i-1)\,a_{kl}\big).$$

• Termination: $P(x, \pi^*) = \max_k\big(A_k(L)\,a_{k0}\big)$ and $\pi^*_L = \arg\max_k\big(A_k(L)\,a_{k0}\big)$.
• Traceback (i = L, …, 1): $\pi^*_{i-1} = \mathrm{ptr}_i(\pi^*_i)$.

Here $\pi^* = \arg\max_\pi P(x, \pi)$ is the most probable state path. Because all sequences must begin in state 0 (the start state), the initial condition is $A_0(0) = 1$. Keeping these backward pointers allows the actual state sequence to be recovered by backtracking [6].
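The Viterbi steps above can be sketched in log space, which avoids numerical underflow on long sequences. Under the assumptions of this sketch, the silent start state 0 is represented by initial probabilities start[k] (playing the role of $a_{0k}$) and no explicit end state is modelled, i.e. $a_{k0} = 1$ in the termination step. The two-state model in the usage example is hypothetical.

```python
import math

def viterbi(obs, states, start, trans, emit):
    """Most probable hidden-state path for obs, with A_l(i) kept in log space.

    start[k] stands for a_0k, trans[k][l] for a_kl and emit[l][x] for e_l(x);
    no end state is modelled (a_k0 = 1 is assumed at termination).
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # First column: A_l(1) = a_0l * e_l(x_1), in logs.
    v = [{l: log(start[l]) + log(emit[l][obs[0]]) for l in states}]
    ptr = [{}]
    for i in range(1, len(obs)):
        col, back = {}, {}
        for l in states:
            # ptr_i(l) = argmax_k A_k(i-1) * a_kl
            k_best = max(states, key=lambda k: v[-1][k] + log(trans[k][l]))
            col[l] = log(emit[l][obs[i]]) + v[-1][k_best] + log(trans[k_best][l])
            back[l] = k_best
        v.append(col)
        ptr.append(back)
    # Termination, then traceback pi*_{i-1} = ptr_i(pi*_i).
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    path.reverse()
    return path, v[-1][last]

# Hypothetical model: state X mostly emits A, state Y mostly emits T.
states = ("X", "Y")
start = {"X": 0.6, "Y": 0.4}
trans = {"X": {"X": 0.9, "Y": 0.1}, "Y": {"X": 0.1, "Y": 0.9}}
emit = {"X": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        "Y": {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7}}
path, log_p = viterbi("AAATTT", states, start, trans, emit)
print(path, log_p)
```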

4.4 Artificial Neural Network

An artificial neural network is made up of a collection of very simple, densely interconnected processors known as neurons, which are analogous to the biological neurons in the brain. The neurons are connected by weighted links that pass signals from one neuron to the next. Each neuron receives a number of input signals through its connections, but it never produces more than one output signal. The output signal is transmitted through the neuron's outgoing link, which splits into several branches that all carry the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.

Training Feed Forward Neural Network
• Initialisation: set the initial weights $w_1, w_2, \ldots, w_n$ and the threshold θ to random numbers in the range [−0.5, 0.5].
• Activation: apply the inputs $x_1(p), x_2(p), \ldots, x_n(p)$ and the desired output $Y_d(p)$ to the perceptron. At iteration p = 1, calculate the actual output

$$Y(p) = \mathrm{sigmoid}\left[\sum_{i=1}^{n} x_i(p)\,w_i(p) - \theta\right], \tag{6}$$

where n is the number of perceptron inputs and sigmoid is the activation function.
• Weight training: update the perceptron weights as $w_i(p + 1) = w_i(p) + \Delta w_i(p)$, where $\Delta w_i(p)$ is the weight correction at iteration p, calculated by the delta rule $\Delta w_i(p) = \alpha \times x_i(p) \times e(p)$ with learning rate α and error $e(p) = Y_d(p) - Y(p)$.
• Iteration: increase p by one, return to the activation step and repeat until convergence is achieved [18].
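The training steps above can be sketched for a single neuron. Two choices here are assumptions of the sketch rather than part of the listed steps: the threshold θ is also updated by the delta rule, treated as a weight on a constant input of −1, and convergence means that every sample is classified correctly when the sigmoid output is thresholded at 0.5. With a sigmoid output, this delta rule coincides with stochastic gradient descent on the cross-entropy loss (logistic regression). The AND data set is only a toy example.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def output(x, w, theta):
    """Eq. (6): Y = sigmoid(sum_i x_i w_i - theta)."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) - theta)

def train_neuron(samples, alpha=0.5, max_epochs=10_000, seed=1):
    """Delta-rule training; samples are (inputs, desired_output) with outputs 0 or 1."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]   # initial weights in [-0.5, 0.5]
    theta = rng.uniform(-0.5, 0.5)                         # initial threshold in [-0.5, 0.5]
    for epoch in range(1, max_epochs + 1):
        for x, y_desired in samples:
            e = y_desired - output(x, w, theta)                      # error e(p)
            w = [wi + alpha * xi * e for xi, wi in zip(x, w)]        # delta rule
            theta -= alpha * e                # same rule for theta, whose input is -1
        if all((output(x, w, theta) > 0.5) == bool(d) for x, d in samples):
            return w, theta, epoch            # every sample classified correctly
    return w, theta, max_epochs

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, theta, epochs = train_neuron(AND)
print(w, theta, epochs)
```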



5 Results and Discussions

To examine whether all the sequences come from the same organism, the disparity index test of homogeneity is applied. For seq1 (X) and seq2 (Y): for i = A, $x_A = 210$ and $y_A = 246$; for i = C, $x_C = 286$ and $y_C = 329$; for i = G, $x_G = 210$ and $y_G = 246$; and for i = T, $x_T = 185$ and $y_T = 209$. Here $D_C = 3362$, and the expected value of $D_C$ can be obtained from the sequences

ATGCTGAGTC…TGAA
ATGGAGCTAC…CTAT

By definition, $E(D_C) = 181$. $N_d$ is simply half of the number of sites at which the two sequences differ. Then $E(I_D) = 0$, so the sequences are homogeneous. In the same way, all pairs among the nine sequences are tested and found to be homogeneous, so it is concluded that all the sequences come from the same organism, Homo sapiens (human) (Lemma 1). The mutated gene sequences are merged using the agglomerative clustering algorithm; the resulting dendrogram is shown in Fig. 1, with the weight of each sequence given beside it. The closest pairs (seq6 and seq7, seq4 and seq5, seq2 and seq3) are merged by sequence–sequence alignment. The aligned pairs are then merged by sequence–profile alignment (profile(seq2, seq3) and seq3) and profile–profile alignment (profile(seq6, seq7) and profile(seq4, seq5)) according to their relatedness (Lemma 2). Following the dendrogram in Fig. 1, the nine sequences are aligned; part of the result is shown below. In the alignment, ‘–’ represents a gap (a missing nucleotide) and ‘*’ marks a position at which the same nucleotide occurs in all the sequences.

Fig. 1 Dendrogram of the gene sequences



Fig. 2 Consensus sequence logo of aligned gene sequences

seq6 --------CATAGCGAAATCCCATCTCTACTAA--------------------------
seq7 GACCGGCGCACAGAGGAAGAGAATCTCCGCAAGAAAGGGGAGCCTCACCACGAGCTGCCC
seq2 GACCGGCGCACAGAGGAAGAGAATCTCCGCAAGAAAGGGGAGCCTCACCACGAGCTGCCC
seq3 GACCGGCGCACAGAGGAAGAGAATCTCCGCAAGAAAGGGGAGCCTCACCACGAGCTGCCC
seq8 GACCGGCGCACAGAGGAAGAGAATCTCCGCAAGAAAGGGGAGCCTCACCACGAGCTGCCC
seq9 GACCGGCGCACAGAGGAAGAGAATCTCCGCAAGAAAGGGGAGCCTCACCACGAGCTGCCC
seq1 GACCGGCGCACAGAGGAAGAGAATCTCCGCAAGAAAGGGGAGCCTCACCACGAGCTGCCC
seq4 GACCGGCGCACAGAGGAAGAGAATCTCCGCAAGAAAGGGGAGCCTCACCACGAGCTGCCC
seq5 GACCGGCGCACAGAGGAAGAGAATCTCCGCAAGAAAGGGGAGCCTCACCACGAGCTGCCC

The alignment is visualized as a consensus logo in Fig. 2. The consensus sequence obtained from the alignment is:

ATGGAGGAGCCGCAGTCAGATCCTAGCGTCGAGCCCCCTCTGAGTCAGGAAACAT
TTTCAGACCTATGGAAACTACTTCCTGAAAACAACGTTCTGTCCCCCTTGCCGTC
CCAAGCAATGGATGATTTGATGCTGTCCCCGGACGATATTGAACAATGGTTCACT
GAAGACCCAGGTCCAGATGAAGCTCCCAGAATGCCAGAGGCTGCTCCCCCCGTGG
CCCCTGCACCAGCAGCTCCTACACCGGCGGCCCCTGCACCAGCCCCCTCCTGGCC
CCTGTCATCTTCTGTCCCTTCCCAGAAAA…

The consensus sequence can be used to construct the Markov chain (Lemma 3). By definition, the transition probability matrix for the nucleotide transitions is

$$P = \begin{array}{c|cccc} & A & C & G & T \\ \hline A & 0.2364 & 0.2618 & 0.3273 & 0.1745 \\ C & 0.2705 & 0.3743 & 0.1148 & 0.2404 \\ G & 0.2843 & 0.2810 & 0.2614 & 0.1732 \\ T & 0.1026 & 0.3034 & 0.4017 & 0.1923 \end{array}$$

The invariant solution for the matrix is [0.1848 0.2717 0.2411 0.2921]. The transition probabilities in the above matrix are visualized in Fig. 3. In the transition probability matrix, $P_{11} = 0.2364$ is the probability that the system moves from an Adenine nucleotide to an Adenine nucleotide, and $P_{12} = 0.2618$, $P_{13} = 0.3273$ and $P_{14} = 0.1745$ are the probabilities that the consensus sequence moves from Adenine to Cytosine, Guanine and Thymine nucleotides, respectively. Similarly, the remaining elements of the TP matrix



Fig. 3 Transition digraph of a transition probability matrix

express the probability of a transition from one nucleotide to the same or another nucleotide. Wherever the dinucleotide CG appears in the human genome, the C is usually chemically modified by methylation. The chance of this methyl-C mutating into a T is relatively high, so CpG dinucleotides are less common in the genome than would be expected from the independent probabilities of C and G. To assess whether a CpG island Markov chain occurs in the sequences, a fragment of the consensus sequence is taken and $a^+_{st}$ and $a^-_{st}$ are calculated as

$$a^+_{st} = \frac{c^+_{st}}{\sum_{t'} c^+_{st'}}, \qquad a^-_{st} = \frac{c^-_{st}}{\sum_{t'} c^-_{st'}}, \tag{7}$$

where $c^+_{st}$ and $c^-_{st}$ are the numbers of times nucleotide t follows nucleotide s under the CpG-island (+) and non-island (−) models, respectively.

The resulting transition probabilities for the fragment path CGCG are calculated. To use the models for discrimination, the log-odds ratio is calculated as

$$S(x) = \log\frac{P(x \mid \text{model}+)}{P(x \mid \text{model}-)} = \sum_{i=1}^{L}\log\frac{a^+_{x_{i-1}x_i}}{a^-_{x_{i-1}x_i}} = \sum_{i=1}^{L}\beta_{x_{i-1}x_i}$$



Here x is the sequence and $\beta_{x_{i-1}x_i} = \log\big(a^+_{x_{i-1}x_i}/a^-_{x_{i-1}x_i}\big)$ is the log-likelihood ratio of the corresponding transition probabilities. Since S(x) < 0, the CpG island cannot occur in this data, and the matrix P given above is the preferable one for further calculations [6]. The transition probability matrix P satisfies the conditions that each element is positive (each $P_{ij} > 0$) and each row sums to 1:

$$\sum_{j=1}^{4} P_{1j} = \sum_{j=1}^{4} P_{2j} = \sum_{j=1}^{4} P_{3j} = \sum_{j=1}^{4} P_{4j} = 1.$$
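A minimal sketch of Eq. (7) and the log-odds score S(x). The count tables are hypothetical (the chapter does not list its counts), the natural logarithm is used, and only the transitions between adjacent nucleotides are scored, without a begin-state term. A positive score favours the + (CpG island) model and a negative score the − model.

```python
import math

NUCLEOTIDES = "ACGT"

def normalise(counts):
    """Eq. (7): a_st = c_st / sum over t' of c_st', row by row."""
    return {s: {t: row[t] / sum(row.values()) for t in NUCLEOTIDES}
            for s, row in counts.items()}

def log_odds(x, a_plus, a_minus):
    """S(x) = sum of beta_{x_{i-1} x_i} = log(a+ / a-) over adjacent pairs of x."""
    return sum(math.log(a_plus[s][t] / a_minus[s][t]) for s, t in zip(x, x[1:]))

uniform = {t: 1 for t in NUCLEOTIDES}
# Hypothetical counts: the + model favours C->G and G->C steps, the - model does not.
c_plus = {"A": uniform, "C": {"A": 1, "C": 2, "G": 5, "T": 2},
          "G": {"A": 2, "C": 4, "G": 2, "T": 2}, "T": uniform}
c_minus = {"A": uniform, "C": {"A": 3, "C": 3, "G": 1, "T": 3},
           "G": {"A": 3, "C": 2, "G": 2, "T": 3}, "T": uniform}
a_plus, a_minus = normalise(c_plus), normalise(c_minus)
print(log_odds("CGCG", a_plus, a_minus))   # log(5) + log(2) + log(5) = log(50) > 0
```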

Also, [9] states that the run lengths of a Markov chain follow a geometric distribution (a standard result; see Feller). To test this condition, let $T_i$ be the discrete random variable giving the number of consecutive transitions for which the system stays in state i, so that $T_i = k$ means that the system has remained in state i for k consecutive transitions. With $T_{ii} = 1 - p_{ii}$ denoting the probability of leaving state i, the probability mass function of this geometric distribution is

$$P(T_i = k) = T_{ii}(1 - T_{ii})^{k-1}, \quad k = 1, 2, \ldots,\; i = A, C, G, T, \tag{8}$$

with $E(T_i) = \dfrac{1}{T_{ii}}$ and $\mathrm{Var}(T_i) = \dfrac{1 - T_{ii}}{T_{ii}^2}$.
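The geometric parameters can be read directly off the diagonal of P. This sketch assumes, as the reported numbers indicate, that the parameter $T_{ii}$ is the probability $1 - p_{ii}$ of leaving state i; with the matrix P from Section 5 it reproduces the parameters, means and variances reported in this section for $T_A$, $T_C$, $T_G$ and $T_T$ to within about 0.002.

```python
def run_length_geometric(P, states="ACGT"):
    """Return {state: (T_ii, mean, variance)} with T_ii = 1 - p_ii, as in Eq. (8)."""
    params = {}
    for i, s in enumerate(states):
        t = 1.0 - P[i][i]                                 # probability of leaving state i
        params[s] = (t, 1.0 / t, (1.0 - t) / t ** 2)      # Geo(t): mean 1/t, var (1-t)/t^2
    return params

def geometric_pmf(k, t):
    """P(T_i = k) = t (1 - t)^(k - 1) for k = 1, 2, ..."""
    return t * (1.0 - t) ** (k - 1) if k >= 1 else 0.0

# Transition probability matrix P from Section 5 (rows/columns A, C, G, T).
P = [[0.2364, 0.2618, 0.3273, 0.1745],
     [0.2705, 0.3743, 0.1148, 0.2404],
     [0.2843, 0.2810, 0.2614, 0.1732],
     [0.1026, 0.3034, 0.4017, 0.1923]]
for s, (t, mean, var) in run_length_geometric(P).items():
    print(f"T_{s} ~ Geo({t:.4f}): mean {mean:.4f}, variance {var:.4f}")
```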

It can be concluded that each nucleotide in the consensus sequence follows a geometric distribution with the following parameters:
• $T_A \sim \mathrm{Geo}(0.7637)$, with mean 1.3094 and variance 0.404
• $T_C \sim \mathrm{Geo}(0.6257)$, with mean 1.5982 and variance 0.9561
• $T_G \sim \mathrm{Geo}(0.7386)$, with mean 1.3539 and variance 0.4792
• $T_T \sim \mathrm{Geo}(0.8077)$, with mean 1.2380 and variance 0.2948

The probability mass function (p.m.f.) of a multinomial distribution is

$$P(X_1 = x_1, X_2 = x_2, X_3 = x_3, X_4 = x_4) = \frac{n!}{x_1!\,x_2!\,x_3!\,x_4!}\,P_1^{x_1}P_2^{x_2}P_3^{x_3}P_4^{x_4} \tag{9}$$
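For sequence lengths in the hundreds or thousands, the factorials in Eq. (9) overflow ordinary floating point, so a practical way to evaluate the p.m.f. is through log-gamma ($\ln x! = \ln\Gamma(x+1)$). A minimal sketch, checked here only on small hand-computable cases:

```python
import math

def multinomial_pmf(counts, probs):
    """Eq. (9) evaluated in log space: log n! - sum(log x_i!) + sum(x_i log P_i)."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)
    for x, p in zip(counts, probs):
        if x == 0:
            continue                  # contributes 0 * log(p) - log(0!) = 0, even if p == 0
        log_p += x * math.log(p) - math.lgamma(x + 1)
    return math.exp(log_p)

print(multinomial_pmf([1, 1, 1], [1 / 3, 1 / 3, 1 / 3]))   # 3! / 27 = 6/27
```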

In the sequence data, $n_1 = 276$, $n_2 = 366$, $n_3 = 306$, $n_4 = 234$ and n = 1182, giving $P_1 = 0.2335$, $P_2 = 0.3096$, $P_3 = 0.2589$ and $P_4 = 0.1980$. Then $P(X_1 = x_1, X_2 = x_2, X_3 = x_3, X_4 = x_4) = 0.0003$, and the goodness of fit is tested with a Chi-square test. The test statistic $\chi^2 = 31.2792$ with 3 df is less than the table value, so the sequence data can be concluded to be multinomially distributed [9]. Since the data are multinomially distributed and the Markov chain is geometrically distributed, the standard Markov chain is replaced by an embedded Markov chain [12]. To construct the embedded Markov chain, each run of repeated identical nucleotides is replaced by a single nucleotide symbol. The revised consensus sequence fragment is



ATGAGAGCGCAGTCAGATCTAGCGTCGAGCTCTGAGTCAGACATCAGACTATGAC
TACTCTGACACGTCTGTCTGCGTCAGCATGATGATGATGCTGTCGACGATATGAC
ATGTCACTGAGACAGTCAGATGAGCTCAGATGCAGAGCTGCTCGTGCTGCACAGC
AGCTCTACACGCGCTGCACAGCTCTGCTGTCATCTCTGTCTCAGACTACAGCAGC
TACGTCGTCTGCTCTGCATCTGACAGCAGTCTGTGACTGCACGTACTCTGCTCAC
AGATGTGCACTGCAGACTGCTGTGCAG…

The embedded Markov chain transition probability matrix is

$$R = \begin{bmatrix} 0.0000 & 0.3429 & 0.4286 & 0.2286 \\ 0.4381 & 0.0000 & 0.1770 & 0.3850 \\ 0.3883 & 0.3705 & 0.0000 & 0.2411 \\ 0.1270 & 0.3757 & 0.4974 & 0.0000 \end{bmatrix}$$
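Collapsing runs of identical nucleotides is a one-liner with `itertools.groupby`; counting transitions in the collapsed sequence then gives an embedded matrix whose diagonal is zero by construction. Applied to the first line of the consensus sequence from Section 5, the sketch below reproduces the beginning of the revised consensus fragment shown above. The helper names are illustrative.

```python
from itertools import groupby

def collapse_runs(seq):
    """Replace each run of identical nucleotides by a single symbol, e.g. 'CCCCC' -> 'C'."""
    return "".join(symbol for symbol, _run in groupby(seq))

def embedded_counts_matrix(seq, states="ACGT"):
    """Row-normalised transition counts of the collapsed sequence (zero diagonal)."""
    s = collapse_runs(seq)
    counts = {a: {b: 0 for b in states} for a in states}
    for a, b in zip(s, s[1:]):
        counts[a][b] += 1
    R = []
    for a in states:
        total = sum(counts[a].values())
        R.append([counts[a][b] / total if total else 0.0 for b in states])
    return R

start = "ATGGAGGAGCCGCAGTCAGATCCTAGCGTCGAGCCCCCTCTGAGTCAGGAAACAT"
print(collapse_runs(start))
```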

The transition digraph of the above matrix is shown in Fig. 4, and its invariant solution is π = [0.2473330 0.2661944 0.2638460 0.2226266]. The system can be regarded as a simple random walk, because it may move to the right or to the left at the next transition. The system can therefore be considered a semi-Markov process, for which the embedded Markov chain has been constructed. This chain expresses the structural behaviour of the biological sequence data better than the usual Markov chain. Behind each visible nucleotide there is a hidden state, and a hidden Markov model is applied to find the hidden states and their probabilities.

Fig. 4 Transition digraph of an embedded Markov matrix
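The invariant solution π = πR can be checked numerically. The sketch below uses power iteration, which assumes an irreducible, aperiodic chain (true here: every off-diagonal entry of R is positive, so cycles of length 2 and 3 both exist). Rows are renormalised first because the published entries are rounded. Applied to R above, it gives values within 0.001 of the invariant solution reported in this section.

```python
def stationary_distribution(P, iterations=2000):
    """Invariant distribution pi with pi = pi P, found by power iteration.

    Assumes an irreducible, aperiodic chain; each row is renormalised first
    because rounded published matrices may not sum exactly to 1.
    """
    P = [[p / sum(row) for p in row] for row in P]
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

R = [[0.0000, 0.3429, 0.4286, 0.2286],
     [0.4381, 0.0000, 0.1770, 0.3850],
     [0.3883, 0.3705, 0.0000, 0.2411],
     [0.1270, 0.3757, 0.4974, 0.0000]]
print([round(p, 4) for p in stationary_distribution(R)])
```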



After the sequences are aligned, the state of each position is noted. If all the sequences have a nucleotide at a particular position, the state is called a match state and represented as ‘M’. If the nucleotide is missing, shown as ‘–’, the state is called a deleted state (D). A position at which only one sequence carries a nucleotide is called an insert state (I). Using this specification, the hidden-state sequence can be derived. The transition probability matrix of the hidden states is

$$T = \begin{array}{c|ccc} & M & I & D \\ \hline M & 0.9712 & 0.0273 & 0.0143 \\ I & 0.3396 & 0.6415 & 0.0189 \\ D & 0.0046 & 0.0000 & 0.9954 \end{array}$$

The entries are interpreted according to their position, as described earlier. The emission probability matrix, which links the hidden states and the visible states, is

$$E = \begin{array}{c|cccc} & A & C & G & T \\ \hline M & 0.2101 & 0.3252 & 0.2518 & 0.2130 \\ I & 0.2075 & 0.1698 & 0.3585 & 0.2642 \\ D & 0.2742 & 0.3018 & 0.2603 & 0.1636 \end{array}$$

The visual representation of the HMM is shown in Fig. 5. In the match state, the probability of emitting an Adenine nucleotide is 21%. The probabilities of emitting Cytosine, Guanine and Thymine nucleotides in the same state

Fig. 5 Transition plot of the HMM



are 0.3252, 0.2518 and 0.2130, respectively, and the remaining entries are interpreted similarly. The Baum–Welch algorithm is used to train the hidden Markov model, and the estimated parameter values are

$$E = \begin{array}{c|cccc} & A & C & G & T \\ \hline M & 0.2353 & 0.3173 & 0.2528 & 0.1946 \\ I & 0.2112 & 0.1448 & 0.3157 & 0.3283 \\ D & 0.2270 & 0.3224 & 0.2582 & 0.1924 \end{array}$$

With the forward and backward algorithms used for initialization, the iteration terminated at the 12th index. The estimated posterior probabilities are:

Index   M            I            D
1       0.3807674    0.3038192    0.3154133
2       0.4999666    0.1789608    0.3210726
3       0.57669734   0.09957912   0.32372354
4       0.63150046   0.04469832   0.32380122
5       0.64626275   0.02986865   0.32386860
6       0.64047053   0.03528054   0.32424893
7       0.60665551   0.06818113   0.32516336
8       0.58562722   0.08833985   0.32603293
9       0.57601236   0.09712879   0.32685884
10      0.57720213   0.09516726   0.32763061
11      0.57917281   0.09022913   0.33059805
12      0.58223606   0.08174307   0.33602088

The estimated Viterbi path for the visible and hidden states is as follows:

The predicted Viterbi path of the hidden states is
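A minimal Viterbi decoder in Python illustrates how such a path is obtained. It works in log space to avoid underflow, maps the zero D→I transition to −∞, and uses the T and E values quoted in the text; the uniform start distribution and the input sequence are assumptions, so this is a sketch rather than a reproduction of the reported path.

```python
import math

STATES = "MID"
SYMBOLS = "ACGT"

T = [[0.9712, 0.0273, 0.0143],
     [0.3396, 0.6415, 0.0189],
     [0.0046, 0.0000, 0.9954]]

E = [[0.2101, 0.3252, 0.2518, 0.2130],
     [0.2075, 0.1698, 0.3585, 0.2642],
     [0.2742, 0.3018, 0.2603, 0.1636]]


def log(p):
    """Natural log that maps zero probabilities (e.g. D -> I) to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, T, E, start):
    """Return the most probable hidden-state path for obs as a string."""
    n = len(start)
    k = [SYMBOLS.index(c) for c in obs]
    score = [log(start[i]) + log(E[i][k[0]]) for i in range(n)]
    back = []                                  # back[t-1][j]: best predecessor of j at t
    for kt in k[1:]:
        ptr, new = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: score[i] + log(T[i][j]))
            ptr.append(best)
            new.append(score[best] + log(T[best][j]) + log(E[j][kt]))
        back.append(ptr)
        score = new
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(back):                 # follow the pointers backwards
        state = ptr[state]
        path.append(state)
    return "".join(STATES[i] for i in reversed(path))


path = viterbi("ACGTTGCAGC", T, E, [1 / 3] * 3)  # hypothetical sequence
```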

31 Hidden Markov Modelling for Biological Sequence

Table 1 Classification of training and testing sets

Sample      N       Percent (%)
Training    827     70.0
Testing     355     30.0
Valid       1182    100.0
Excluded    0
Total       1182

Artificial Neural Network

To train the artificial neural network, the feed-forward algorithm is used. Of the data, 80% is used for training and the remaining 20% for testing (Table 1). The structure of the neural network is shown in Fig. 6, and its activation function is the sigmoid function. The trained network has the following weights: w11 = −1.2512, w12 = −1.9579, w13 = 1.8189, w14 = −2.4504, w21 = 1.4262, w22 = 1.7639, w23 = −0.6802, w24 = −1.0687, w31 = −2.0005, w32 = −0.9455, w33 = −0.3996, w34 = −1.7565, w41 = −0.4058, w42 = 1.3035, w43 = 0.6633, w44 = −1.8172, w51 = 0.3440, w52 = −0.3184, w53 = −0.8270, w54 = 1.6429, w61 = −0.4469, w62 = 0.44319, w63 = −1.5731, w64 = 1.7113, w71 = −1.8902, w72 = −2.1062, w73 = −1.5482, w74 = −0.4028, w81 = −1.5666, w82 = 2.0021, w83 = 1.4896, w84 = −1.2506, w91 = −0.9286, w92 = −1.1099, w93 = −0.2008, w94 = 1.6718, with threshold values θ1 = −0.2289, θ2 = −0.1074, θ3 = −1.0029, θ4 = −0.9799, θ5 = −0.0316, θ6 = 0.01412, θ7 = −0.0493, θ8 = −0.0753, θ9 = −0.0183. On the test set, 236 instances are correctly classified. The mean absolute error (MAE) between the actual and predicted values is 0.0018, and the root mean squared error (RMSE) is 0.0024. The kappa statistic is 1, indicating perfect agreement between the predicted and actual classes. The sensitivity, specificity, areas under the ROC and PRC curves and F-measure likewise indicate that the model fits the data perfectly. The results are tabulated in Table 2 and visualized in Figs. 7, 8 and 9. The confusion matrix for the test data is

$$
C = \begin{bmatrix}
49 & 0 & 0 & 0 \\
0 & 49 & 0 & 0 \\
0 & 0 & 67 & 0 \\
0 & 0 & 0 & 71
\end{bmatrix}
$$
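The reported kappa and per-class rates follow directly from the confusion matrix C. The Python sketch below computes them, assuming rows are actual classes and columns are predicted classes in the order A, C, G, T; the helper names are hypothetical. With a diagonal matrix every TP rate and F-measure is 1, every FP rate is 0 and kappa is 1, consistent with Table 2.

```python
C = [[49, 0, 0, 0],
     [0, 49, 0, 0],
     [0, 0, 67, 0],
     [0, 0, 0, 71]]
CLASSES = "ACGT"


def class_rates(cm, k):
    """TP rate (sensitivity), FP rate and F-measure for class index k."""
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                      # actually k, predicted otherwise
    fp = sum(row[k] for row in cm) - tp       # predicted k, actually otherwise
    tn = total - tp - fn - fp
    tp_rate = tp / (tp + fn)
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * tp_rate / (precision + tp_rate)
                 if precision + tp_rate else 0.0)
    return tp_rate, fp_rate, f_measure


def cohen_kappa(cm):
    """Agreement beyond chance: (p_observed - p_expected) / (1 - p_expected)."""
    total = sum(map(sum, cm))
    n = len(cm)
    p_obs = sum(cm[i][i] for i in range(n)) / total
    p_exp = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)) / total ** 2
    return (p_obs - p_exp) / (1 - p_exp)
```

The diagonal of C sums to 49 + 49 + 67 + 71 = 236, the number of correctly classified test instances.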

Fig. 6 Structure of trained ANN

Table 2 Summary of the fit of ANN model

Class   TP rate   FP rate   ROC area   PRC area   F-measure
A       1.0000    0.0000    1.0000     1.0000     1.0000
C       1.0000    0.0000    1.0000     1.0000     1.0000
G       1.0000    0.0000    1.0000     1.0000     1.0000
T       1.0000    0.0000    1.0000     1.0000     1.0000

Fig. 7 ROC curve

The network is trained on the input sequences with the feed-forward algorithm and tested on the given data sequences. The predicted consensus sequence was compared with the actual consensus sequence and found to match it exactly, so the model fits the data perfectly. Table 3 summarizes the classification of the training and testing data against the predicted sequence. The methodology follows [18].
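The feed-forward computation with sigmoid activations and thresholds can be sketched as follows. The sketch uses a common convention (found, for example, in [18]) in which a neuron's output is the sigmoid of its weighted input sum minus its threshold θ. The one-hot encoding of the input nucleotide, the 4-2-4 layer sizes and all weight values below are hypothetical; they are not the trained weights listed above, whose network layout is only shown in Fig. 6, and training by back-propagation is not shown.

```python
import math

BASES = "ACGT"


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def one_hot(base):
    """Encode a nucleotide as a 4-element 0/1 vector (hypothetical encoding)."""
    return [1.0 if base == b else 0.0 for b in BASES]


def layer(inputs, weights, thresholds):
    """weights[j][i] links input i to neuron j; y_j = sigmoid(sum_i x_i*w_ji - theta_j)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) - theta)
            for row, theta in zip(weights, thresholds)]


# Hypothetical 4-2-4 network: weights and thresholds are illustrative only.
HIDDEN_W = [[0.5, -0.3, 0.8, 0.1],
            [-0.6, 0.9, 0.2, -0.4]]
HIDDEN_THETA = [0.1, -0.2]
OUTPUT_W = [[1.2, -0.7],
            [-0.5, 1.1],
            [0.3, 0.4],
            [-0.9, -0.2]]
OUTPUT_THETA = [0.0, 0.1, -0.1, 0.2]


def predict(base):
    """Forward pass: return the output activations and the predicted nucleotide."""
    hidden = layer(one_hot(base), HIDDEN_W, HIDDEN_THETA)
    output = layer(hidden, OUTPUT_W, OUTPUT_THETA)
    return output, BASES[output.index(max(output))]
```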

Fig. 8 PRC curve

Fig. 9 Curve of sensitivity versus specificity

Table 3 Classification between training and testing data

Sample     Observed              Predicted
                                 A        C        G        T        Percent correct (%)
Training   A                     227      0        0        0        100.0
           C                     0        317      0        0        100.0
           G                     0        0        239      0        100.0
           T                     0        0        0        163      100.0
           Overall percent (%)   23.99    33.51    25.26    17.23    100.0
Testing    A                     49       0        0        0        100.0
           C                     0        49       0        0        100.0
           G                     0        0        67       0        100.0
           T                     0        0        0        71       100.0
           Overall percent (%)   25.9     27.9     26.2     20.0     100.0

6 Conclusion

In this study, nine mutated TP53 gene sequences were aligned using hierarchical clustering. From the aligned sequences, consensus sequences were generated, and a Markov chain, a hidden Markov model and an artificial neural network were constructed. Predictions were made and compared with the actual values using the error measures available for each approach. The mutated DNA sequences of the TP53 gene were analysed to identify and detect malignant disease. Among the prediction approaches considered, the hidden Markov model and the artificial neural network were found to be efficient for predicting the disease. The actual and predicted sequences were found to be very similar, as supported by the results in the hidden Markov model section and Table 3. Whereas the Markov chain predicts state by state from the observed states alone, the hidden Markov model also takes the hidden states into account and was found to perform better. In addition, based on the invariant solutions of the matrices P and R, the proposed embedded Markov chain explains the structural behaviour of the data better than the conventional Markov chain. The trained artificial neural network also predicted the sequence exactly as the actual consensus sequence. The proposed methods are therefore efficient approaches for disease prediction. This research will aid in the analysis of the genome, as well as the identification and detection of malignant diseases caused by gene mutations. Detecting gene mutations early can serve as a low-cost primary prevention method for cancer and drug resistance. While whole-genome sequencing is costly, time-consuming and complicated, the stochastic model developed in this study can make gene sequence testing faster, cheaper and simpler. Across the population that requires such testing, the time and cost savings would add up to significant savings for policymakers and society.
Financial Support and Funding

This work was financially supported by DST-INSPIRE, Government of India, under grant DST/INSPIRE Fellowship/[IF190881].


References

1. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, Biankin AV et al (2013) Signatures of mutational processes in human cancer. Nature 500(7463):415–421
2. Boer J (2016) Multiple alignment using hidden Markov models. Proteins 4:14
3. Bonidia RP, Domingues DS, Sanches DS, de Carvalho AC (2021) MathFeature: feature extraction package for DNA, RNA and protein sequences based on mathematical descriptors. Briefings Bioinform
4. De Fonzo V, Aluffi-Pentini F, Parisi V (2007) Hidden Markov models in bioinformatics. Curr Bioinform 2(1):49–61
5. Deneshkumar V, Manoprabha M, Senthamarai Kannan K (2020) Multiple sequence alignment with hidden Markov model for diabetic genome
6. Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press
7. Eddy SR (1995, July) Multiple alignment using hidden Markov models. In: ISMB, vol 3, pp 114–120
8. Gagniuc PA (2017) Markov chains: from theory to implementation and experimentation. Wiley
9. Gentleman JF, Mullin RC (1989) The distribution of the frequency of occurrence of nucleotide subsequences, based on their overlap capability. Biometrics 35–52
10. Jeniffer S, Kannan KS (2021) Stochastic modelling for identifying malignant diseases
11. Karuppusamy T (2021) Biological gene sequence structure analysis using hidden Markov model. Turk J Comput Math Educ (TURCOMAT) 12(4):1652–1666
12. Krumbein WC, Dacey MF (1969) Markov chains and embedded Markov chains in geology. J Int Assoc Math Geol 1(1):79–96
13. Kumar S, Gadagkar SR (2001) Disparity index: a simple statistic to measure and test the homogeneity of substitution patterns between molecular sequences. Genetics 158(3):1321–1327
14. Lees A, Sessler T, McDade S (2021) Dying to survive—the p53 paradox. Cancers 2021(13):3257
15. Li J, Lee JY, Liao L (2021) A new algorithm to train hidden Markov models for biological sequences with partial labels. BMC Bioinform 22(1):1–21
16. Medhi J (1994) Stochastic processes. New Age International
17. Mor B, Garhwal S, Kumar A (2021) A systematic review of hidden Markov models and their applications. Arch Comput Methods Eng 28(3)
18. Negnevitsky M (2005) Artificial intelligence: a guide to intelligent systems. Pearson Education
19. Petitjean A, Achatz MIW, Borresen-Dale AL, Hainaut P, Olivier M (2007) TP53 mutations in human cancers: functional selection and impact on cancer prognosis and outcomes. Oncogene 26(15):2157–2165
20. Procter JB, Carstairs GM, Soares B, Mourão K, Ofoegbu TC, Barton D et al (2021) Alignment of biological sequences with Jalview. In: Multiple sequence alignment. Humana, New York, pp 203–224
21. Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
22. Rastogi SC, Mendiratta N, Rastogi P (2008) Bioinformatics methods and applications: genomics. In: Proteomics and drug discovery. PHI Learning Private Limited, New Delhi
23. Reilly C (2009) Statistics in human genetics and molecular biology. CRC Press
24. Roth C (2021) Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts. Doctoral dissertation, Georg-August University
25. Sarkar BK (2021) Entropy based biological sequence study. In: Entropy and exergy in renewable energy. IntechOpen
26. Schuster-Bockler B, Bateman A (2007) An introduction to hidden Markov models. Curr Protoc Bioinform 18(1):A–3A
27. Simossis V, Kleinjung J, Heringa J (2003) An overview of multiple sequence alignment. Curr Protoc Bioinform 3(1):3–7
28. The methodology of embedded Markov chain is retrieved from the continuous-time Markov chains lecture. https://mast.queensu.ca/~stat455/lecturenotes/set5.pdf

Chapter 32

Proposed Crowd Counting System and Social Distance Analyzer for Pandemic Situation

Mrunal Girhepunje, Simran Jain, Triveni Ramteke, Nikhil P. Wyawahare, Prashant Khobragade, and Sampada Wazalwar

1 Introduction

In this pandemic, the main causes of the spread of contagious disease were overcrowded regions and a failure to maintain social distance. Crowd counting determines the number of people present in a specific area. Social distancing also plays a crucial role in reducing the impact of contagious disease: it means reducing close contact between people, i.e. people should physically distance themselves from each other. Applications of crowd counting and social distance analysis include travel security, public safety and traffic control. The proposed system uses OpenCV, an image-processing library, from Python, together with a Raspberry Pi, to count the people present in a particular region. The main objective of this project is to reduce the spread of contagious diseases and to contribute to society by helping to keep people safe.

M. Girhepunje · S. Jain · T. Ramteke
G H Raisoni College of Engineering, Nagpur, MS, India

N. P. Wyawahare (B)
Department of Electronics Engineering, G H Raisoni College of Engineering, Nagpur, MS, India

P. Khobragade
Department of Computer Science and Engineering, G H Raisoni College of Engineering, Nagpur, MS, India

S. Wazalwar
Department of Information Technology, G H Raisoni College of Engineering, Nagpur, MS, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_32


M. Girhepunje et al.

2 Background and Related Work

Crowd counting has many applications that apply computer vision and IoT technologies to captured video and images, mostly for safety [2]. Throughout history, many different methods have been used to count the people present in a crowded scene. The outbreak of COVID-19 at the end of December 2019 drew researchers into this deadly situation. Numerous projects analyzed social distancing to improve people's safety, since social distancing was recommended as a preventive measure. Because of COVID-19, governments also restricted the number of people in seminar halls and gatherings to prevent the spread of contagious disease, and it was declared that a social distance of 6 ft was to be maintained between people. Many research studies sought effective solutions for social distancing, and many countries are now looking for technology-based solutions to this problem. Human detection also played an important role in managing unusual human activities during this period. In India, the ArogyaSetu application uses GPS and Bluetooth to let people detect COVID-19 cases in their vicinity, which helps them keep a safe distance from infected people. A symptom tracker, Corona C9, was introduced in the UK, and an application called Corona 100 m is available in South Korea. South Korea and Singapore used CCTV footage to trace people who had recently visited locations where COVID-19 patients had been. China used artificial intelligence-driven thermal cameras to identify people in crowds with a high body temperature. Such technologies could help flatten the curve in this drastic situation, but at the same time they could raise concerns about the handling of personal information.
The authors of a recent research paper on social distancing and face mask detection use object detection and object tracking to calculate social distance and identify masks on people’s faces. Motivated by low-light conditions, Addina Rahim [11] developed a novel approach that simplifies social distance monitoring to help save lives. Zhao et al. [2] used a multitask cascaded CNN (MTCNN) to count people. Within the MTCNN framework, three tasks are performed jointly: face classification, bounding box position prediction and facial landmark prediction. The trained model was tested on the Wider Face dataset; detections on the test set were counted as successes, and the detection rate was computed from them. With the original design, which feeds scaled-down images into the inference cascade, the model reached a detection rate of 0.8397, leaving room for improvement. Iterating over large images slowed the process significantly, since the P-Net stage was not as efficient as a sliding-window approach. Another issue was a poor recall rate, especially in images with many small faces.

32 Proposed Crowd Counting System and Social Distance Analyzer …

Using the pedestrian counting network (PCNet), Abhijit Baul [18] introduced the pedestrian flow inference model (PFIM), which quantifies the volume and density of pedestrian flows. Kadir Sabanci [19] used background subtraction to recognize people moving through the camera’s field of view; the results were stored online and retrieved with MATLAB and ThingSpeak. Researchers have conducted various studies to build a useful social distance meter or crowd counting tool, but none focused on both simultaneously. Many recent solutions use deep learning techniques, such as the YOLO algorithm, AIML models and CNNs, for automatic object detection. This paper, by contrast, focuses on detecting people and analyzing their distance from each other.

3 Proposed System and Methodology

This project uses a Raspberry Pi board to track and count the people who enter through the front door. The Raspberry Pi is a small yet powerful computer, to which an IP camera and a PIR sensor are connected. The IP camera photographs people as they pass the door. To view the results, the Raspberry Pi is connected to a monitor over HDMI, which displays frames showing the faces detected by the IP camera, and a counter shows how many faces were detected. ThingSpeak provides access to graphical data, and OpenCV provides access to camera data; the interface between the two is implemented with OpenCV. OpenCV converts image data into arrays, which it can resize and use to create feature vectors. TensorFlow, a library for building and training deep neural networks, is used to train the model, on the assumption that not all pixels are required to identify particular features in an image. Methods in this category analyze an input image (aerial or perspective) and design the network accordingly to improve its accuracy. Their applications include medical imaging, counting people with drones and monitoring targeted areas. The proposed system combines OpenCV, Python and a Raspberry Pi, and counts the number of people present in a particular area. When a person enters the hall, the system assigns a new ID and adds one to the count (Fig. 1); when someone exits, it subtracts one from the count (Fig. 2). IDs are assigned according to the configured limit. Meanwhile, a social distance analyzer detects people in the video stream and computes the pairwise distances between them.
A minimum distance between people is fixed in advance, and detection is performed against it. For example, if the limit for a conference hall is 20 people, the system detects people and raises an alarm when more than 20 people try to enter the hall. Meanwhile, the social distance analyzer determines whether the fixed distance between people is maintained.
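The counting and distance rules described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the names `OccupancyCounter` and `distance_violations` are hypothetical, people are represented by bounding-box centroids in pixels, and the pixel threshold standing in for the 6 ft rule would have to be calibrated for the fixed camera position.

```python
import math
from itertools import combinations


class OccupancyCounter:
    """Track people entering/leaving an area and flag when a limit is exceeded."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self.next_id = 0

    def enter(self):
        """Register an entry and return the new person's ID."""
        self.next_id += 1
        self.count += 1
        return self.next_id

    def leave(self):
        """Register an exit; the count never drops below zero."""
        self.count = max(0, self.count - 1)

    @property
    def alarm(self):
        """True when more people are inside than the limit allows."""
        return self.count > self.limit


def distance_violations(centroids, min_distance):
    """Return index pairs (i, j) whose centroids are closer than min_distance pixels."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(centroids), 2)
            if math.dist(a, b) < min_distance]
```

With a limit of 20, the 21st call to enter() without a matching leave() sets alarm to True, and distance_violations([(100, 200), (130, 200), (400, 220)], 50) returns [(0, 1)], because only the first two people are 30 pixels apart.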


Fig. 1 Persons in counting zone

Fig. 2 Situation if persons exit from counting zone

3.1 Approach

The working of the crowd counting and social distance analyzer is depicted in the flowchart in Fig. 3.


Fig. 3 Flowchart of proposed system

4 System Implementation

This section discusses the implementation and working of the crowd counting and social distance analyzer model.

4.1 Library

The following libraries are used in the proposed system:
• OpenCV: OpenCV’s extensive open-source library supports applications such as video analysis, CCTV footage analysis and image analysis.
• Raspistill: raspistill is a command-line tool that captures still images from the Pi camera directly from the terminal.
• Raspivid: Video data is read directly from the Pi camera with the ‘raspivid’ command and then forwarded over the network with the ‘nc’ command.
• Dlib: Dlib performs image processing on captured videos and images with machine learning algorithms for classification, regression, clustering, data transformation and structured prediction.
• Pillow: Pillow, a fork of the Python Imaging Library (PIL), can open, manipulate and save images in many different formats. To install PIL, enter the following command.
• Face recognition: The face recognition library for Python can recognize and manipulate faces in images. We use this library to train on and recognize faces.
• Facial recognition and face detection using Python libraries: Face detection only locates faces, whereas face recognition also identifies them by comparing them with saved face data.
• Adafruit: Adafruit provides a collection of code that is useful for experimenting with electronics on the Pi, including CircuitPython libraries for modules such as displays, sensors and actuators.
• Imutils: Imutils is a set of convenience functions for basic image processing, such as translation, rotation, resizing, skeletonizing and displaying Matplotlib images, using OpenCV with Python 2.7 and Python 3.
• Base 64: Base64 encoding is typically used when binary data must be transferred over ASCII-compatible media, so that the data remain intact during transport.
• Argparse: The argparse module makes it easy to create user-friendly command-line interfaces. The program defines the arguments it needs, and argparse parses them out of sys.argv.
• Numpy: The NumPy library provides multidimensional array objects and routines for processing them, including mathematical and logical operations on arrays.
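As an example of the argparse usage listed above, the sketch below defines a command-line interface and returns the parsed arguments as a dictionary via vars(). The flag names and defaults are hypothetical, since the chapter does not list the script's arguments.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options and return them as a dict (hypothetical flags)."""
    parser = argparse.ArgumentParser(
        description="Crowd counting and social distance analyzer (sketch)")
    parser.add_argument("-l", "--limit", type=int, default=20,
                        help="maximum number of people allowed in the area")
    parser.add_argument("-d", "--min-distance", type=float, default=50.0,
                        help="minimum allowed distance between people, in pixels")
    parser.add_argument("-i", "--input", default=None,
                        help="optional video file; the camera is used if omitted")
    return vars(parser.parse_args(argv))   # Namespace -> plain dict
```

Passing argv explicitly keeps the function testable; called with no argument, argparse reads sys.argv. For instance, parse_args(["--limit", "30"]) returns {'limit': 30, 'min_distance': 50.0, 'input': None}.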

4.2 Input Collection

Images and video recorded with the Pi Camera are given as input, as shown in Fig. 4. The camera is set up at a fixed angle so that it can properly estimate the distance between people.

4.3 Video Capturing

To capture video, enable the camera module in the Raspberry Pi configuration: run sudo raspi-config, open Interface Options, select Camera and enable it. The camera can then be accessed with the raspivid command from the command prompt. The video, a sequence of images, is used to detect human activity in the monitored area as it happens. To capture the most accurate image of a person, place the camera two meters below the top with a slope of 90° vertically. A single static camera is used to capture the images, which keeps the system cost-effective.

Fig. 4 Hardware connection for Input collection

OpenCV provides a wide range of image and video operations through its library, including capturing video from cameras:
• Create a capture object with cv2.VideoCapture().
• In an infinite while loop, read the frames with the read() method of the object created above.
• Show the video frames with the cv2.imshow() method.
• Break the loop when a key is pressed.
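The four capture steps above can be sketched in Python. To keep the loop testable without a camera, the sketch passes the capture object, the display call and the stop test in as parameters; `run_preview` and `main` are hypothetical names, and stopping on the q key is one common choice for "a key is pressed". `main()` uses the standard OpenCV calls cv2.VideoCapture, read, cv2.imshow, cv2.waitKey and cv2.destroyAllWindows.

```python
def run_preview(capture, show, should_stop):
    """Show frames until the source ends or should_stop() returns True.

    capture     -- object with read() -> (ok, frame) and release(),
                   e.g. a cv2.VideoCapture
    show        -- callable that displays one frame
    should_stop -- callable polled after each frame
    Returns the number of frames shown.
    """
    shown = 0
    try:
        while True:
            ok, frame = capture.read()   # ok is False when no frame is available
            if not ok:
                break
            show(frame)
            shown += 1
            if should_stop():
                break
    finally:
        capture.release()                # always free the camera
    return shown


def main():
    import cv2  # imported here so run_preview can be exercised without OpenCV
    cap = cv2.VideoCapture(0)            # device 0: the attached camera
    try:
        return run_preview(cap,
                           lambda frame: cv2.imshow("Pi camera", frame),
                           lambda: cv2.waitKey(1) & 0xFF == ord("q"))
    finally:
        cv2.destroyAllWindows()
```

Calling main() on the Raspberry Pi opens a preview window that closes when q is pressed; cv2.waitKey(1) also gives OpenCV the chance to redraw the window on each iteration.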

4.4 Human Detection

Using the bounding box of each object in an image frame, the activities of moving objects can be detected and tracked as they move through subsequent frames. These bounding boxes can be used to detect humans and human activities, such as crowds, and to estimate groups of people. A bounding box is a rectangle determined by the (x, y) coordinates of its upper-left and lower-right corners; another common representation uses the (x, y) coordinates of the box center together with its width and height.
• Use the Detect() method: to detect a person in a frame, draw a green box around the person, then show the frame with the person enclosed by the green box (Fig. 5).

OpenCV implements a very fast algorithm for detecting humans based on histograms of oriented gradients (HOG). In this method, the detector is trained to recognize pedestrians who are mostly standing upright and fully visible. Detection windows of different sizes are scanned across the image to find pedestrians, and at each position and size the detection window is divided into cells. In practice, the cells are relatively small: they usually contain only a tiny portion of the person, perhaps the side of an arm or the top of the head. A gradient is computed at each pixel of a cell, and the gradients fill the cell’s histogram: the angle of each gradient selects the histogram bin, and its magnitude is the weight. The histograms of all cells are combined and fed to a machine learning classifier that decides whether the current detection window contains a person. In OpenCV’s default people detector this classifier is a linear support vector machine (SVM); SVMs are a group of supervised learning methods used for statistical classification, regression and outlier detection.
• Use the human detector method: cv2.VideoCapture(0) opens camera device 0, here the Pi camera, for recording. video.read() reads one frame at a time and returns a flag that is True if the frame was read successfully, otherwise False.
• Use the argparse() method: argparse() parses the arguments passed to the script from the terminal and returns them as a dictionary (Fig. 6).

Fig. 5 Code for bound box

Fig. 6 Code for human detection
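The cell-histogram step of HOG described above can be illustrated with NumPy. This simplified sketch uses unsigned gradient angles (0 to 180°) in 9 bins, hard bin assignment and non-overlapping 8 × 8 cells; full HOG as implemented in OpenCV additionally interpolates votes between bins and normalizes histograms over overlapping blocks before the SVM stage. The function name is hypothetical.

```python
import numpy as np


def hog_features(window, cell_size=8, n_bins=9):
    """Concatenated per-cell orientation histograms for one detection window.

    Each pixel votes into the bin given by its gradient angle (unsigned,
    0-180 degrees), weighted by its gradient magnitude. Simplified: no
    bin interpolation and no block normalization.
    """
    window = np.asarray(window, dtype=float)
    gy, gx = np.gradient(window)                 # d/drow (y) and d/dcol (x)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)
    h, w = window.shape
    feats = []
    for r in range(0, h - cell_size + 1, cell_size):
        for c in range(0, w - cell_size + 1, cell_size):
            hist = np.zeros(n_bins)
            np.add.at(hist,
                      bins[r:r + cell_size, c:c + cell_size].ravel(),
                      magnitude[r:r + cell_size, c:c + cell_size].ravel())
            feats.append(hist)
    return np.concatenate(feats)
```

For a 16 × 16 window containing a vertical edge, all gradient energy falls into bin 0 of each of the four cells, giving a 36-element feature vector. In practice the complete detector is available in OpenCV: create a cv2.HOGDescriptor(), load the pre-trained linear SVM with hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector()) and call hog.detectMultiScale(frame), which returns the detected rectangles and their weights.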

4.5 Counting ID

To track people in a frame, the centroid of each bounding box is stored; each new object’s bounding box is counted and given an ID, from which the number of people is easily calculated.
• Count the number of faces (Fig. 7).
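The centroid-based ID assignment described above can be sketched as a greedy nearest-centroid tracker. This is a simplified, hypothetical helper (`CentroidTracker` is not a library class): each new centroid takes the ID of the closest unmatched centroid from the previous frame if it lies within `max_distance` pixels, otherwise it receives a new ID, and objects not seen in the current frame are dropped immediately rather than after a grace period.

```python
import math


def centroid(box):
    """Center of a box given as (x1, y1, x2, y2) corner coordinates."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


class CentroidTracker:
    """Greedy nearest-centroid ID assignment (hypothetical helper)."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance
        self.objects = {}      # id -> centroid in the previous frame
        self.next_id = 0

    def update(self, boxes):
        """Assign IDs to this frame's boxes and return {id: centroid}."""
        assigned = {}
        unused = dict(self.objects)
        for c in (centroid(b) for b in boxes):
            best = min(unused, key=lambda i: math.dist(unused[i], c), default=None)
            if best is not None and math.dist(unused[best], c) <= self.max_distance:
                assigned[best] = c             # same person as before
                del unused[best]
            else:
                assigned[self.next_id] = c     # new person gets a new ID
                self.next_id += 1
        self.objects = assigned
        return assigned

    @property
    def total_seen(self):
        """Number of distinct IDs handed out so far."""
        return self.next_id
```

The number of people currently in view is len(update(boxes)), while total_seen counts everyone who has received an ID.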

5 Result and Discussion

This section describes the experiments conducted in this work. The detection results of the pre-trained architecture are shown in Figs. 8 and 10. In Fig. 8, no person is detected in the captured frame, and Fig. 9 shows the corresponding system result, whereas Fig. 10 shows a detected person, the count and the ID assigned to that person. An ID is assigned when a detected person enters the particular region, and the people counter counts the number of people present in that region. Figure 11 visualizes the people detected in a frame: the graph shows the count after a person is detected and zero if no one is detected. The data are updated whenever people entering the region are detected, on a dedicated channel in ThingSpeak.

Fig. 7 Code for ID

Fig. 8 Negative results for crowd detection
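Pushing the count to a ThingSpeak channel can be done through ThingSpeak's HTTP update endpoint, which takes the channel's write API key and one value per field. The sketch below is an assumption about how such an upload might look, not the authors' code; the API key is a placeholder and field1 is assumed to hold the people count.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"


def build_update_url(api_key, count, field=1):
    """Build a ThingSpeak update request that writes `count` to field<N>."""
    query = urlencode({"api_key": api_key, f"field{field}": count})
    return f"{THINGSPEAK_UPDATE_URL}?{query}"


def push_count(api_key, count, field=1, timeout=10):
    """Send the count; ThingSpeak answers with the new entry ID, or 0 on failure."""
    with urlopen(build_update_url(api_key, count, field), timeout=timeout) as resp:
        return int(resp.read().decode().strip())
```

Separating URL construction from the network call keeps the request format easy to check; push_count("YOUR_WRITE_KEY", n) would then be called after each detection.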


Fig. 9 System result if negative crowd detection

Fig. 10 Detection of person in crowd

Fig. 11 System result if positive crowd detection


6 Conclusion

Contagious diseases such as COVID-19 have hit the world and disrupted millions of lives across the globe. Given the deadly situation the world has faced in the past few months, the proposed system can help reduce the spread of contagious diseases. The proposed architecture counts the number of people present in a particular region. This study was undertaken to explore crowd counting and social distance analysis in response to COVID-19. The proposed system uses human detection to estimate crowd size, identifying each individual in real time with a bounding box and a separate ID.

7 Future Work

In future work, the proposed system will be extended with sensors and controllers that activate an alarm system if the person count rises above the threshold level. Basic sensors can also be used to limit entry by automatically opening and closing doors. The proposed system could be useful in metros, railway stations or railway coaches: if the person count equals or exceeds the threshold, the automatic gates will open only for exit, not for entry. Because the system performs person identification along with crowd detection, criminal activity and suspects could in some cases also be monitored easily in public stations, where tracking people is tedious work. In the future, the system can be extended with IoT technology to track and monitor specific people in crowds and help prevent criminal activity at public gatherings.

References

1. Kuchhold M, Simon M, Eiselein V, Sikora T Scale-adaptive real-time crowd detection and counting for drone images. https://doi.org/10.1109/ICIP.2018.8451289
2. Zhao P, Adnan KA, Lyu X, Wei S, Sinnott RO Estimating the size of crowds through deep learning. https://doi.org/10.1109/CSDE50874.2020.9411377
3. Greenstone M, Nigam V (2020) Does social distancing matter? University of Chicago, Becker Friedman Institute for Economics Working Paper (2020–26)
4. Hao N, Minglei T, Luyuan F et al (2019) Application of multi-task and multi-level CNN in crowd counting. Comput Eng Appl. https://doi.org/10.3778/j.issn.1002-8331.1808-0278
5. Santhini C, Gomathi V Crowd scene analysis using deep learning network. https://doi.org/10.1109/ICCTCT.2018.8550851
6. Rajendran L, Shankaran RS Big-data enabled real-time crowd surveillance using artificial intelligence and deep learning. https://doi.org/10.1109/BigComp51126.2021.00032
7. Wu H (2021) High-accuracy crowd counting method based on mixed labeled dataset. In: 2021 IEEE 2nd international conference on big data, artificial intelligence and internet of things engineering (ICBAIE 2021)
8. Meiyun C, Bisheng W, Guo C et al (2020) Crowd counting method based on pixel-level attention mechanism. Comput Appl 40(1):56–61. https://doi.org/10.11772/j.issn.1001-9081.2019050920
9. Alleviation of COVID by means of social distancing and face mask detection using YOLO V4. In: 2021 international conference on communication information and computing technology (ICCICT). https://doi.org/10.1109/ICCICT50803.2021.9510168
10. Indulkar Y Alleviation of COVID by means of social distancing and face mask detection using YOLO V4. https://www.analyticsvidhya.com/blog/2021/05/alleviation-of-covidby-means-of-social-distancing-face-mask-detection-using-yolo-v4/
11. Rahim A, Maqbool A, Rana T Monitoring social distancing under various low light conditions with deep learning and a single motionless time of flight camera. https://doi.org/10.1371/journal.pone.0247440
12. Kanjo E, Anderez DO, Anwar A, Alshami A, William J CrowdTracing: overcrowding clustering and detection system for social distancing. https://www.techrxiv.org/articles/preprint/CrowdTracing_Overcrowding_Clustering_and_Detection_System_for_Social_Distancing/14709762
13. Punn NS, Sonbhadra SK, Agarwal S, Rai G Monitoring COVID-19 social distancing with person detection and tracking via fine-tuned YOLO v3 and Deepsort techniques. https://arxiv.org/pdf/2005.01385.pdf
14. Pre-training convolution network for crowd counting. https://doi.org/10.1109/CTISC52352.2021.00062, https://ieeexplore.ieee.org/document/9527619
15. W. H. Organization (2020) WHO corona-viruses (COVID-19). https://www.who.int/emergencies/diseases/novel-corona-virus-2019. [Online]. Accessed 02 May 2020
16. Hemangi B, Nikhita K (2016) People counting system using raspberry pi with Opencv. 2(1). ISSN 2494–9150
17. Subashree D, Mhaske SR, Yeshwantrao SR, Kumar A (2021) Real time crowd counting using OpenCV. 10(05). Paper ID: IJERTV10IS050147
18. Baul A, Kuang W, Zhang J, Yu H, Wu L Learning to detect pedestrian flow in traffic intersections from synthetic data. https://doi.org/10.1109/ITSC48978.2021.9564853
19. Sabancı K, Yigit E, Üstün D, Toktaş A, Çelik Y (2018) Thingspeak based monitoring IoT system for counting people in a library. In: 2018 international conference on artificial intelligence and data processing (IDAP). https://doi.org/10.1109/IDAP.2018.8620793
20. Herviana A, Sudiharto DW, Yulianto FA The prototype of in-store visitor and people passing counters using single shot detector performed by OpenCV. https://doi.org/10.1109/ICITAMEE50454.2020.9398507
21. Madhira K, Shukla A Pedestrian flow counter using image processing. https://doi.org/10.1109/ICECDS.2017.8389782
22. B. News (2020) China coronavirus: lockdown measures rise across Hubei province. https://www.bbc.co.uk/news/world-asia-china51217455. [Online]. Accessed 23 January 2020

Chapter 33

A Novel Ensemble Model to Summarize Kannada Texts

S. Parimala and R. Jayashree

1 Introduction

India is the second-most populous country in the world, with around 1.4 billion people. The amount of information on the World Wide Web is so large that it has become imperative to build tools that give Indian users effective access to this information and content. Various natural language understanding (NLU) models have been built to analyze and understand the meaning of text, but they perform worse on Indian languages than on English, which leads to subpar experiences in downstream Web applications. Indian languages are broadly classified into four families, namely Indo-Aryan, Austro-Asiatic, Tibeto-Burman and Dravidian. Kannada belongs to the Dravidian family and is spoken by around 57 million people across the globe. With millions of articles and other content in Kannada on the Internet, it is nearly impossible to read through all of it to get an idea of what it is about. Text summarization plays a pivotal role here by reducing the amount of text that has to be read. Text summarization is a natural language processing task that aims to generate a concise and precise summary of voluminous text, giving importance to the most useful information without losing the overall meaning. It is broadly classified into extractive and abstractive summarization. Extractive summarization builds the summary by selecting sentences from the input text, whereas abstractive summarization generates new sentences that convey the content of the input. In this paper, we present the literature survey, the complete methodology used, and a detailed discussion of the parameter tuning and the efficiency obtained while summarizing documents of various categories.

S. Parimala (B) · R. Jayashree Department of Computer Science, PES University, Bangalore, India e-mail: [email protected] R. Jayashree e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_33


2 Literature Survey

Our literature study shows that most of the algorithms built so far score sentences and rank them to extract the most important ones. The earliest work in text summarization traces back to Luhn's heuristic method [1] in 1958. Here, the preprocessed sentences were ranked by their phrase/word frequency, and the summary was built from the highest-scoring sentences. The drawbacks of this method were low accuracy and redundant sentences in the summary. Baxendale [2] proposed a method that followed the traditional approach of picking the document title and the first and last sentences of each paragraph to build the summary. This method was not reliable in practice because its effectiveness depends on the domain. Devi et al. [3] suggested a graph-based algorithm for Tamil document summarization. This approach uses word frequency and Levenshtein distance to rank the sentences in the documents. Each sentence is denoted as a vertex of the graph and assigned a sentence weight, and the edges of the graph are given Levenshtein similarity weights. The sentences are then ranked by the average of the sentence weight and the vertex weight, and the top few sentences are extracted. Jayashree et al. [4] proposed a method that combines the TF–IDF model with GSS (Galavotti, Sebastiani, Simi) coefficients to extract keywords from Kannada texts. The sentences are ranked by the weights computed using these models, and the top ‘m’ sentences are used to generate the summary, where ‘m’ is the required summary length. Nishikawa et al. [5] proposed an algorithm for opinion summarization based on integer linear programming, designed by combining the maximum coverage problem and the traveling salesman problem. Each sentence is ranked by its content and coherence scores. Evaluated with the ROUGE score, the method performed better than other existing opinion summarizers. Kamal Sarkar [6] proposed a method of summarizing Bengali texts by leveraging the term frequency and inverse document frequency of the terms in the document. Geetha and Deepamala [7] summarized Kannada documents using latent semantic analysis (LSA) and singular value decomposition (SVD); together, SVD and LSA capture the relations between sentences and reduce redundancy in the generated summary. Kumar et al. [8] came up with a graph-based approach for Hindi text summarization that finds the relations between sentences and analyzes their importance. The relevance between sentences was computed using semantic similarity, and the summary was generated from the sentences ranked highest after this semantic analysis. The approach was validated using the F1 score. Kandarp Dave [9] studied feature selection algorithms for text categorization, using GSS coefficients to determine the importance of words for each category in order to classify documents.


Kanitha et al. [10] proposed a graph-theoretic approach for summarizing Malayalam text. The sentences were denoted as nodes and the relations between sentences as edges, and the cardinality of the graph indicated the importance of sentences. Precision, recall and F1-score were used to evaluate the algorithm. Vaishali et al. [11] worked on a graph-based method for summarizing Marathi texts. It takes the processed Marathi text and applies graph scoring methods, implemented using the TextRank algorithm, aggregate similarity and bushy path algorithms, to rank all the sentences of the document. The final summary was evaluated using ROUGE similarity scores. Elvys Pontes et al. [12] proposed a novel method for cross-lingual text summarization, in which documents in a source language are summarized in a target language. It used long short-term memory (LSTM) networks with an attention layer to compress sentences, removing irrelevant words and grouping similar sentences. However, it was tested only on the French-to-English language pair.

3 Methodology

The methodology is broadly divided into eight steps:

3.1 Data Acquisition

The data consists of Kannada articles and documents belonging to three categories, namely sports, mythology and entertainment. The data was scraped from trusted Web sites such as http://kannada.webdunia.com/. Kannada text documents were also collected from the dataset https://www.kaggle.com/disisbig/kannada-wikipedia-articles. A Python program built on the Beautiful Soup library was used to parse the HTML content of the Web site and store it as a string. All the collected documents/articles, together with the name of the category each belongs to, were placed into a single CSV file.
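The acquisition step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scraper: it uses Python's standard-library html.parser in place of Beautiful Soup so that it runs without third-party packages, it assumes that the article body sits in <p> elements (the page structure of the sources is not described in the chapter), and the names ParagraphExtractor, article_text and corpus_csv are hypothetical. Fetching pages over HTTP is omitted.

```python
import csv
import io
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> elements (a stand-in for a Beautiful Soup parser)."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # Text of nested inline tags (e.g. <b>) is kept while inside a paragraph.
        if self.in_paragraph:
            self.paragraphs[-1] += data


def article_text(html):
    """Return the article body as one string built from its <p> paragraphs."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(p.strip() for p in parser.paragraphs if p.strip())


def corpus_csv(rows):
    """Serialize (category, text) pairs into a single CSV document with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["category", "text"])
    writer.writerows(rows)
    return buf.getvalue()
```

For example, corpus_csv([("sports", article_text(page_html))]) produces a header row followed by one data row, where page_html is a page that has already been downloaded.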

3.2 Data Cleaning

The stored articles are read one by one and cleaned to remove unwanted characters and punctuation. The cleaned text is then tokenized into sentences.
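A minimal cleaning and sentence-splitting sketch follows. It assumes that "unwanted characters" means everything outside the Kannada Unicode block (U+0C80 to U+0CFF), whitespace and the terminators '.', '?' and '!', and that a sentence ends with one of those terminators. Both choices, and the name clean_and_split, are illustrative assumptions rather than details given in the chapter.

```python
import re

# Keep Kannada characters (Unicode block U+0C80-U+0CFF), whitespace and the
# sentence terminators . ? !  Everything else is treated as unwanted.
UNWANTED = re.compile(r"[^\u0C80-\u0CFF\s.?!]")
SENTENCE_BREAK = re.compile(r"(?<=[.?!])\s+")


def clean_and_split(text):
    """Remove unwanted characters, then split the cleaned text into sentences."""
    cleaned = UNWANTED.sub(" ", text)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    sentences = (s.strip(" .?!") for s in SENTENCE_BREAK.split(cleaned))
    return [s for s in sentences if s]
```

Replacing unwanted characters with a space, rather than deleting them, keeps words that were separated only by punctuation from being glued together.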


3.3 Stemming

A morphological stemmer was built to determine the root word of any given word, and every sentence in the input text was converted into a sequence of root words by passing it through the stemmer. Kannada is a morphologically rich language: 330 different conjuncts are formed from the 38 basic characters by combining vowels and consonants, and there are more than 10,000 basic root words from which around a million inflected variants can be formed. For example, a single inflected word can be divided into a root followed by multiple suffix parts.

The complete set of suffixes is encoded as rules. There are 74 rule categories, each grouping a number of suffixes that belong to the same category. Fig. 1 shows a sample of the rules. Rules 1–40 cover the suffixes for different tense forms, while rules 41–73 group suffixes that share a meaning, along with some miscellaneous suffixes. Fig. 2 details one of these categories (rule 48): the suffixes it contains have similar meanings and can therefore be grouped together.

Fig. 1 Sample snapshot of the suffixes in few of the rule categories


Fig. 2 Figure depicts the various suffixes present in rule 48 and their meanings elaborately

Fig. 3 Table depicts the form of the word before and after stemming suffixes of type 1

3.3.1 Rule Types

The stemmer handles four types of suffixes (Figs. 3, 4, 5 and 6):

1. Suffixes whose left half is retained and whose right half is discarded.
2. Suffixes whose left half is retained and whose right half is modified.
3. Suffixes that must be removed completely.
4. Single-character suffixes, which are removed as a final step after further analysis.


Fig. 4 Table depicts the form of the word before and after stemming suffixes of type 2

Fig. 5 Table depicts the form of the word before and after stemming suffixes of type 3

Fig. 6 Table depicts the form of the word before and after stemming suffixes of type 4
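The four rule types listed in Sect. 3.3.1 can be encoded as data and applied by one routine. The sketch below is hypothetical: SuffixRule, apply_suffix_rules and stem are invented names, and the 74 rule categories are not reproduced. A rule keeps the first keep characters of the matched suffix and appends replacement, so type 3 is keep = 0 with an empty replacement, type 1 is keep > 0 with an empty replacement, and type 2 is keep > 0 with a non-empty replacement. Type 4 is applied last, and a simple minimum-length check stands in for the unspecified "further analysis". Strings are handled as Unicode code points, so a "single character" here is one code point rather than a full Kannada grapheme cluster. The only real suffix used is the Kannada plural marker ಗಳು.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    """One suffix rule; `keep` and `replacement` encode rule types 1-3."""
    suffix: str
    keep: int = 0          # leading characters of the suffix to retain (types 1 and 2)
    replacement: str = ""  # text that replaces the discarded right part (type 2)


def apply_suffix_rules(word, rules, min_stem=2):
    """Apply the longest matching multi-character rule, if any."""
    for rule in sorted(rules, key=lambda r: len(r.suffix), reverse=True):
        stem_len = len(word) - len(rule.suffix)
        if word.endswith(rule.suffix) and stem_len >= min_stem:
            return word[:stem_len] + rule.suffix[:rule.keep] + rule.replacement
    return word


def stem(word, rules, single_char_suffixes=frozenset(), min_stem=2):
    """Stem a word: rule types 1-3 first, then type 4 (single-character suffixes)."""
    word = apply_suffix_rules(word, rules, min_stem)
    # Type 4, applied last; the length check is an assumed stand-in for "further analysis".
    if len(word) > min_stem + 1 and word[-1] in single_char_suffixes:
        word = word[:-1]
    return word


# Type 3 example: remove the plural suffix ಗಳು completely.
print(stem("ಮನೆಗಳು", [SuffixRule("ಗಳು")]))  # ಮನೆ
```

Trying longer suffixes first means that when two rules could match, the more specific one wins, which mirrors the usual longest-match convention of rule-based stemmers.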

3.4 Stop Words

A file containing around 65 stop words was created, and all these stop words were removed from every article in the dataset. A sample of the stop words is shown in Fig. 7.
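Stop-word removal reduces to a set-membership filter. In the sketch below, the three entries ಮತ್ತು ("and"), ಈ ("this") and ಆ ("that") are a hypothetical stand-in for the chapter's list of about 65 words, tokens are assumed to be separated by whitespace, and remove_stop_words is an illustrative name.

```python
# Hypothetical three-word subset; the chapter's own list (about 65 words, Fig. 7) is not reproduced.
STOP_WORDS = frozenset({"ಮತ್ತು", "ಈ", "ಆ"})


def remove_stop_words(sentence, stop_words=STOP_WORDS):
    """Split a cleaned sentence on whitespace and drop stop-word tokens."""
    return [token for token in sentence.split() if token not in stop_words]
```

Using a frozenset keeps each membership test constant-time on average, so the filter stays linear in the number of tokens.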

Fig. 7 Stop word list created for Kannada

3.5 TF–IDF Computation

After the stop words are removed and the root words retained, the list of sentences is used as the input for further calculations. As the first step, we compute the frequency matrix of the document, indexed by term, where the frequency of a term is the number of times it occurs in the document. Next, the TF value of each term is computed as follows, for each term ‘t’ and the given document ‘d’:

$$\mathrm{TF}(t, d) = \frac{\text{frequency of } t \text{ in } d}{\text{total number of terms in } d} \qquad (1)$$

The inverse document frequency (IDF) matrix is then computed for all the terms in the document. For each term ‘t’ and the entire corpus ‘D’:

$$\mathrm{IDF}(t, D) = \log_{10} \frac{N}{n_{\text{term}}} \qquad (2)$$

where $N$ is the total number of documents in the corpus and $n_{\text{term}}$ is the number of documents containing the term. The TF–IDF value of each term is then calculated as

$$\mathrm{TFIDF}(t, d, D) = \mathrm{TF}(t, d) \cdot \mathrm{IDF}(t, D) \qquad (3)$$

where t is the term, d is the current document and D is the complete corpus. TF–IDF thus measures the importance of a term in a document independently of the document's category.


Interpretation: A term obtains a high TF–IDF value when it occurs frequently in the given document but appears in few documents of the corpus. The computed weights therefore favour terms that are characteristic of a document and down-weight terms that are common across the corpus.
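Equations (1)–(3) translate directly into code. The following is a minimal sketch that assumes each document is already a list of stemmed terms with stop words removed, and that the document being scored belongs to the corpus D (otherwise n_term could be zero). The function names are illustrative.

```python
import math
from collections import Counter


def term_frequencies(doc_terms):
    """Eq. (1): occurrences of each term divided by the number of terms in d."""
    counts = Counter(doc_terms)
    total = len(doc_terms)
    return {t: k / total for t, k in counts.items()}


def inverse_document_frequencies(corpus):
    """Eq. (2): log10(N / n_term), where n_term counts the documents containing the term."""
    n_docs = len(corpus)
    doc_freq = Counter(t for doc in corpus for t in set(doc))
    return {t: math.log10(n_docs / k) for t, k in doc_freq.items()}


def tf_idf(doc_terms, corpus):
    """Eq. (3): TF(t, d) * IDF(t, D); d is assumed to be one of the documents in D."""
    tf = term_frequencies(doc_terms)
    idf = inverse_document_frequencies(corpus)
    return {t: tf[t] * idf[t] for t in tf}
```

With the corpus [["a", "b", "a"], ["b", "c"], ["c", "d"]], term "a" of the first document gets (2/3) · log10(3) and "b" gets (1/3) · log10(1.5), so the term unique to that document scores higher, as the interpretation above describes.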

3.6 TF–GSS Computation

TF–IDF alone cannot generate summaries efficiently, because it is unable to pick words that are characteristic of a particular category/topic: a few keywords that occur rarely yet are important are missed. Hence, we incorporate Galavotti–Sebastiani–Simi (GSS) coefficients into the model. The GSS coefficient characterizes the relevance of a word to a particular category and is often used to extract informative words for text categorization. For a given term ‘t’ and category ‘c’, the GSS coefficient is computed from probabilities as follows:

$$\mathrm{GSS}(t, c) = P(t, c) \cdot P(\bar{t}, \bar{c}) - P(\bar{t}, c) \cdot P(t, \bar{c}) \qquad (4)$$

where
$P(t, c)$ is the probability of a document containing term ‘t’ and belonging to category ‘c’;
$P(\bar{t}, c)$ is the probability of a document not containing term ‘t’ and belonging to category ‘c’;
$P(t, \bar{c})$ is the probability of a document containing term ‘t’ and not belonging to category ‘c’;
$P(\bar{t}, \bar{c})$ is the probability of a document not containing term ‘t’ and not belonging to category ‘c’.

Here, the category is that of the document containing the word. We then combine the term frequencies with the GSS matrix by computing

$$\mathrm{TFGSS}(t, d, c) = \mathrm{TF}(t, d) \cdot \mathrm{GSS}(t, c) \qquad (5)$$
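A sketch of Eqs. (4) and (5) follows. The chapter does not state how the probabilities are estimated, so the code assumes the usual choice of document fractions over a labeled corpus; docs, gss and tf_gss are hypothetical names. A positive GSS value means the term is associated with category c, and a negative value means it is associated with the other categories.

```python
from collections import Counter


def gss(term, category, docs):
    """GSS(t, c) of Eq. (4), estimating each probability as a fraction of documents.

    docs is a list of (set_of_terms, category_label) pairs.
    """
    n = len(docs)
    n_t_c = sum(1 for terms, cat in docs if term in terms and cat == category)
    n_t_notc = sum(1 for terms, cat in docs if term in terms and cat != category)
    n_nott_c = sum(1 for terms, cat in docs if term not in terms and cat == category)
    n_nott_notc = n - n_t_c - n_t_notc - n_nott_c
    # P(t,c)P(~t,~c) - P(~t,c)P(t,~c); both products share the factor 1/n^2.
    return (n_t_c * n_nott_notc - n_nott_c * n_t_notc) / (n * n)


def tf_gss(doc_terms, category, docs):
    """TFGSS(t, d, c) = TF(t, d) * GSS(t, c) of Eq. (5) for every term of document d."""
    counts = Counter(doc_terms)
    total = len(doc_terms)
    return {t: (k / total) * gss(t, category, docs) for t, k in counts.items()}
```

For instance, with two sports documents that both contain "goal" and two documents from other categories that do not, gss("goal", "sports", docs) evaluates to (2 · 2 − 0 · 0) / 16 = 0.25.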

3.7 Positional Ranking of Sentences

The traditional method of summarizing text includes the first and last few sentences, because they are considered to contain important information, while the middle portion is often assumed to contain facts related to the theory of the topic. Hence, the position of a sentence can be considered a factor that influences the quality of the summary. To capture this, we define a mathematical function that describes the relevance of a sentence based on its position in the document.


The cosine function was used to compute the relative position of a sentence within a document. It associates relatively higher values with sentences at (1) the beginning of the document, (2) the middle of the document and (3) the end of the document.

$$\text{Positional Rank} = \cos\left(\frac{\text{relative index of sentence}}{\text{total number of sentences}}\right) \qquad (6)$$
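A literal sketch of Eq. (6) follows, assuming that the relative index is the 0-based position of the sentence and that the cosine takes its argument in radians; positional_ranks is a hypothetical name. Taken this way, the ranks decrease monotonically from 1 for the first sentence towards cos(1) ≈ 0.54 for the last. Any additional scaling of the argument that would also emphasize the middle and the end of the document is not recoverable from the text, so the sketch implements the formula as printed.

```python
import math


def positional_ranks(n_sentences):
    """Eq. (6) taken literally: cos(i / n) for the 0-based sentence index i, in radians."""
    return [math.cos(i / n_sentences) for i in range(n_sentences)]
```

Because the rank depends only on the position and the document length, it can be computed once per document and reused for every hyperparameter setting.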

3.8 Proposed Ensemble Hybrid Model and Efficiency Computation

A study was conducted on Kannada texts from three categories, viz. sports, entertainment and mythology. The documents were analyzed by native Kannada-speaking subject experts, who wrote human summaries that formed the target outcome for the model. The study compared the efficiency of three models: (1) TF–IDF alone, (2) a combination of TF–IDF and TF–GSS, and (3) the proposed ensemble model, which combines TF–IDF, TF–GSS and positional ranking. The ensemble model was evaluated and tuned for different values of its hyperparameters (α, β, γ) to obtain optimal efficiency, and it delivered the best results. The weight of each sentence is computed as

$$\text{Sentence Weight} = \alpha \cdot \mathrm{TFIDF}(t, d, D) + \beta \cdot \mathrm{TFGSS}(t, d, c) + \gamma \cdot \text{Positional Rank} \qquad (7)$$

The summary is then generated by picking, in their original order, the sentences whose weights exceed the chosen threshold.
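Putting Eq. (7) and the threshold rule together gives the sketch below. Two points are assumptions rather than details from the chapter: the per-term TF–IDF and TF–GSS scores are summed over the terms of a sentence to obtain sentence-level values, and the function names sentence_weights and summarize are hypothetical. The defaults α = 0.75, β = 0.85 and γ = 0.15 are the best-performing values reported in Sect. 4.

```python
def sentence_weights(sentences, tfidf, tfgss, pos_ranks, alpha=0.75, beta=0.85, gamma=0.15):
    """Eq. (7) for every sentence of one document.

    sentences: list of term lists; tfidf / tfgss: per-term scores for this document;
    pos_ranks: one positional rank per sentence. Summing the term scores over a
    sentence is an assumption; the chapter does not state the aggregation.
    """
    weights = []
    for terms, pos in zip(sentences, pos_ranks):
        idf_part = sum(tfidf.get(t, 0.0) for t in terms)
        gss_part = sum(tfgss.get(t, 0.0) for t in terms)
        weights.append(alpha * idf_part + beta * gss_part + gamma * pos)
    return weights


def summarize(sentence_texts, weights, threshold):
    """Keep the sentences whose weight exceeds the threshold, in document order."""
    return [s for s, w in zip(sentence_texts, weights) if w > threshold]
```

Filtering in a single pass keeps the selected sentences in their original order, which is what makes the extracted summary read coherently.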

Fig. 8 Graph depicting the maximum efficiency comparison across three different models

3.9 Model Flexibility and Performance Considerations

• The threshold can be set according to the required length of the summary. This improves computational efficiency, because the sentences do not need to be fully ranked and then re-ordered to produce a coherent output.
• The models were built to support output of any specified number of summary sentences.
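Choosing the threshold from a required summary length only needs the k-th largest sentence weight, which heapq.nlargest finds in O(n log k) time without sorting every sentence; the selected sentences then stay in document order. This sketch assumes that all sentences tied at the threshold are kept, and the function names threshold_for_length and top_k_summary are hypothetical.

```python
import heapq


def threshold_for_length(weights, k):
    """Weight of the k-th best sentence, found without sorting all weights."""
    if k <= 0:
        return float("inf")
    if k >= len(weights):
        return float("-inf")
    return heapq.nlargest(k, weights)[-1]


def top_k_summary(sentence_texts, weights, k):
    """Sentences with weight >= the k-th largest weight, kept in document order.

    Ties at the threshold can make the summary slightly longer than k sentences.
    """
    t = threshold_for_length(weights, k)
    return [s for s, w in zip(sentence_texts, weights) if w >= t]
```

For example, with weights [0.9, 1.3, 1.975, 0.2] and k = 2, the threshold is 1.3 and the second and third sentences are returned in their original order.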

4 Results and Discussions

Benchmark human summaries were written for all the documents under consideration with the help of expert native speakers, and each machine-generated summary was compared with the corresponding human summary. The efficiency of a model was determined by the similarity score between the two summaries. This was repeated for the different models and their hyperparameter settings. Fig. 8 shows the maximum efficiency observed for the three models across the different text categories.

• The TF–IDF + TF–GSS model performs substantially better than the traditional TF–IDF model.
• Including positional ranking in the proposed ensemble model delivers higher efficiency than both the TF–IDF and the TF–IDF + TF–GSS models.
• The same holds for each of the three categories, viz. sports, entertainment and mythology.

The proposed ensemble model was evaluated for different combinations of the hyperparameters α, β and γ (Fig. 9).
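The chapter does not name the similarity measure used to compare machine and human summaries. As one common choice, and explicitly as an assumption rather than the authors' metric, the sketch below computes a ROUGE-1-style F1 score from the clipped unigram overlap between the two token lists; unigram_f1 is a hypothetical name.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """ROUGE-1-style F1 between two token lists, using clipped unigram overlap."""
    if not candidate or not reference:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared token.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

Clipping the counts prevents a machine summary from gaining credit by repeating a word more often than the human summary uses it.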


Fig. 9 Ensemble model performance for various hyperparameter values

Option      α (alpha)   β (beta)   γ (gamma)
Option 1    1           1          1
Option 2    0.5         0.5        0.5
Option 3    0.75        0.75       0.1
Option 4    1           1          0.25
Option 5    0.75        0.85       0.15

• When α, β and γ had equal weights, the efficiency was sub-optimal.
• Higher values of α and β relative to γ gave better efficiency.
• A marginally higher weight for β than for α, together with a suitably tuned low value of γ, gave superior results.
• The combination α = 0.75, β = 0.85, γ = 0.15 gave the maximum efficiency across the categories.

The optimal ensemble model was tested across the three categories, viz. sports, entertainment and mythology (Fig. 10).

• The proposed ensemble model delivers high efficiency for all three categories of Kannada texts analyzed.

Fig. 10 Optimal ensemble model efficiencies across text categories

5 Conclusion

We observe that the stemmer designed for this task performs better than various previously built stemmers, including the iNLTK stemmer, because it is based on a thorough analysis of the morphological structure and construction of the language. The library of stop words created in this exercise can be further extended for improved performance. The ensemble method performs better than the combination of the TF–IDF and TF–GSS models, and it can be extended to various categories of documents from a multitude of sources. The model can be integrated with a user interface that accepts documents from users in real time and summarizes them. It can also be used by libraries to summarize the rich Kannada literature for people who intend to do quick reading.

References

1. Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165
2. Baxendale PB (1958) Machine-made index for technical literature-an experiment. IBM J Res Dev 2(4):354–361
3. Text extraction for an agglutinative language. In: Kumar S, Ram VS, Devi SL (eds) Proceedings of journal: language in India
4. Jayashree RR (2011) Document summarization in Kannada using keyword extraction 1:121–127
5. Nishikawa H, Hasegawa T, Matsuo Y, Kikui G (2010) Opinion summarization with integer linear programming formulation for sentence extraction and ordering. In: Coling 2010: poster volume, Beijing, pp 910–918
6. Sarkar K (2012) Bengali text summarization by sentence extraction
7. Geetha JK, Deepamala N (2015) Kannada text summarization using latent semantic analysis. In: 2015 International conference on advances in computing, communications and informatics (ICACCI), pp 1508–1512. https://doi.org/10.1109/ICACCI.2015.7275826
8. Kumar VK, Yadav D, Kumar A (2015) Graph based technique for Hindi text summarization, vol 339
9. Dave K (2011) Study of feature selection algorithms for text-categorization. UNLV Theses, Dissertations, Professional Papers, and Capstones. 1380. http://dx.doi.org/10.34917/3274698
10. Kanitha DK, Mubarak DN, Shanavas S (2018) Malayalam text summarization using graph based method. Int J Comput Sci Inf Technol 9(2)
11. Sarwadnya VV, Sonawane SS (2018) Marathi extractive text summarizer using graph based model. In: Fourth international conference on computing communication control and automation (ICCUBEA), pp 1–6
12. Linhares Pontes E, Huet S, Torres-Moreno J-M, Linhares AC (2020) Compressive approaches for cross-language multi-document summarization. Data and knowledge engineering, vol 125, p 101763. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0169023X19300217

Chapter 34

Parallel Computation of Probabilistic Rough Set Approximations

V. K. Hanuman Turaga and Srilatha Chebrolu

V. K. H. Turaga (B) · S. Chebrolu Department of Computer Science and Engineering, National Institute of Technology Andhra Pradesh, Tadepalligudem, Andhra Pradesh 534101, India e-mail: [email protected] S. Chebrolu e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_34

1 Introduction

Rough set theory (RST) [1], since its inception in 1982 by Z. Pawlak, has matured into a successful mathematical tool for handling inconsistent decision systems. It is widely applied to attribute reduction [2–4], data analysis [5] and rule induction [6] in emerging areas such as machine learning and data analytics. The basic building blocks of RST are information granulation and the approximation space. Granulation decomposes the universal set of objects into subsets called equivalence classes. A granule, or equivalence class, is a group of objects clumped together by equivalence, similarity, indistinguishability, closeness or compactness. However, a universe of objects contains subsets that cannot be represented precisely as unions of equivalence classes; such subsets must instead be represented approximately by a pair of sets called the lower and upper approximations. The fundamental notion of RST is to approximate a granule or a finite set of objects either through a pair of definable sets, one from above and one from below, or through three pairwise disjoint sets: the positive, negative and boundary regions. Objects in the lower approximation of a set certainly belong to that set, whereas objects in the complement of the upper approximation certainly do not. Pawlak's classical RST is characterized by a crisp approximation space that does not consider the degree to which an object belongs to a set. This motivated the scientific community to explore probabilistic generalizations of classical rough sets [7–11].

Probabilistic approaches to RST were proposed by many researchers, such as the decision-theoretic rough set model by Yao et al. [7], the variable precision rough set model by Ziarko [8], the Bayesian rough set model by Slezak et al. [9, 12, 13], the parameterized rough set model by Greco et al. [10, 14], naive Bayesian rough sets by Yao and Zhou [11], and the information-theoretic analysis of probabilistic rough sets by Deng and Yao [15]. To reduce the crispness of the classical rough approximation space, Z. Pawlak, S. K. M. Wong and W. Ziarko analyzed a probabilistic approach to rough sets based on the threshold value 0.5 [16]; Y. Y. Yao and S. K. M. Wong proposed a model for approximating concepts based on two thresholds (an upper and a lower threshold) [17]; Yiyu Yao proposed probabilistic rough set approximations based on conditional probability and a pair of thresholds; and several other types of rough set approximations have also been studied [18, 19]. Algorithms based on RST are mostly serial in nature, so their use is increasingly restricted by computational inefficiency as datasets grow in size. The MapReduce programming model proposed by Dean and Ghemawat [20] for processing large datasets in parallel on a cluster of computing nodes encouraged researchers to develop RST-based algorithms on the MapReduce framework [21–23]. J. Zhang, T. Li et al. proposed and implemented a MapReduce-based parallel algorithm on Apache Hadoop for computing rough set approximations [24]. Junbo Zhang, Jian-Syuan Wong et al. addressed the problems of processing incomplete data using matrix-based parallel methods on MapReduce over Twister [25].
In this article, an efficient MapReduce-based parallel algorithm, called the parallel algorithm for computing probabilistic rough set approximations (PACPRSA), is proposed for computing the regions and probabilistic rough set approximations. Apache Spark [26] is used as the parallel processing framework, and Scala is used to implement the proposed algorithm. Apache Spark has been shown to perform better than Hadoop because it exploits in-memory computing for big data processing. Because probabilistic rough sets (PRS) are used extensively in real-time fault-diagnosis applications in domains such as medicine and power generation and supply, as well as for investigating gene-related information in bioinformatics, the proposed PACPRSA algorithm can help accelerate analysis over massive data for faster decision making. Extensive experiments are performed on standard machine learning datasets to verify the scalability of the proposed algorithm. PACPRSA shows good gains in running time over serial execution as the dataset size increases. The proposed algorithm also performs well on the scalability metrics speedup, sizeup and scaleup, which makes it suitable for massive datasets. The rest of this paper is organized as follows. Section 2 introduces RST, the computation of regions and approximations using the basic PRS model, and the MapReduce programming paradigm. Section 3 presents a serial procedure for computing PRS approximations and the proposed parallel algorithm. The experimental setup, comparative analysis and scalability results are discussed in Sect. 4. Section 5 concludes the paper.

2 Preliminaries

2.1 Rough Set Theory

Rough set theory was introduced by the Polish mathematician Z. Pawlak in 1982. It is a formal mathematical tool that extends classical set theory with approximations in order to deal with vague and inconsistent data during decision making. In RST, the input dataset is called an information table or a decision system. Each row of the table corresponds to an object, and each column provides information about a specific feature of the objects. An information table $It$ is a tuple of four components, as shown below:

$$It = (O, At, D = \{D_c \mid c \in At\}, f = \{f_c \mid c \in At\}) \qquad (1)$$

where $O = \{o_1, o_2, \ldots, o_m\}$ is a nonempty finite set of objects; $At = C \cup \{d\}$ is the union of the conditional attribute set $C = \{c_1, c_2, \ldots, c_n\}$, a nonempty finite set of $n$ conditional attributes, and a decision attribute $d$ whose values are the decision classes; $D_c$ is the domain of possible values of attribute $c$; and $f_c : O \to D_c$ is a function that maps an object $o_i \in O$ to exactly one value in $D_c$. For any subset of conditional attributes $S \subseteq C$, the equivalence (indiscernibility) relation, denoted by $E_S$ or $\mathrm{IND}(S)$, is defined as follows:

$$E_S = \{(o_i, o_j) \in O \times O \mid \forall c \in S,\ f_c(o_i) = f_c(o_j),\ i \neq j\} \qquad (2)$$

i.e., for any conditional attribute subset $S \subseteq C$, two different objects $o_i$ and $o_j$ satisfy the equivalence relation $E_S$ if and only if both objects have the same values on all attributes in $S$. The equivalence relation $E_S$ partitions $O$ into a family of equivalence classes, denoted by $O/E_S$; the class containing $o_i$ is written $[o_i]_S$. Let $T \subseteq O$ be a target set of objects. Then, the lower and upper approximations of $T$ w.r.t. the set of attributes $S$ are defined, respectively, as follows:

$$\underline{apr}(T) = \{o_i \in O \mid [o_i]_S \subseteq T\}, \qquad \overline{apr}(T) = \{o_i \in O \mid [o_i]_S \cap T \neq \emptyset\} \qquad (3)$$

In RST, a target set T can also be described using the following positive, negative, and boundary regions:


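$$\begin{aligned} \text{Positive region: } \mathrm{POS}(T) &= \underline{apr}(T) \\ \text{Negative region: } \mathrm{NEG}(T) &= O - \overline{apr}(T) \\ \text{Boundary region: } \mathrm{BND}(T) &= \overline{apr}(T) - \underline{apr}(T) \end{aligned} \qquad (4)$$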
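Equations (2)–(4) can be checked on a toy table with the sketch below. It is illustrative only: objects maps an object id to a dict of attribute values, the names partition and pawlak_regions are hypothetical, and the equivalence classes are obtained by grouping objects on their value tuples, which yields the partition $O/E_S$ directly.

```python
from collections import defaultdict


def partition(objects, attrs):
    """O/E_S: group object ids that share the same values on every attribute in attrs."""
    classes = defaultdict(set)
    for oid, values in objects.items():
        classes[tuple(values[a] for a in attrs)].add(oid)
    return list(classes.values())


def pawlak_regions(objects, attrs, target):
    """Lower/upper approximations (Eq. 3) and the three regions (Eq. 4) of target."""
    lower, upper = set(), set()
    for eq_class in partition(objects, attrs):
        if eq_class <= target:   # [o_i]_S is contained in T
            lower |= eq_class
        if eq_class & target:    # [o_i]_S intersects T
            upper |= eq_class
    universe = set(objects)
    return {"lower": lower, "upper": upper,
            "POS": lower, "NEG": universe - upper, "BND": upper - lower}
```

With objects 1 and 2 indiscernible, object 3 alone and object 4 alone, and the target set {1, 3}, the sketch returns POS = {3}, NEG = {4} and BND = {1, 2}: the class {1, 2} only partially overlaps the target, so it falls in the boundary.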

2.2 The Basic Probabilistic Rough Set Model

The three regions of Pawlak's rough sets can be expressed in terms of probability with the help of conditional probability and the two extreme values of probability, viz., 0 and 1, as shown below:

$$\begin{aligned} \mathrm{POS}(T) &= \{o_i \in O \mid \Pr(T \mid [o_i]) = 1\} \\ \mathrm{NEG}(T) &= \{o_i \in O \mid \Pr(T \mid [o_i]) = 0\} \\ \mathrm{BND}(T) &= \{o_i \in O \mid 0 < \Pr(T \mid [o_i]) < 1\} \end{aligned} \qquad (5)$$

where $\Pr(T \mid [o_i]) = \frac{|[o_i] \cap T|}{|[o_i]|}$ denotes the conditional probability that an object $o_i$ is in the target set $T$ given that it belongs to $[o_i]$, i.e., the equivalence class to which $o_i$ belongs. The extreme values 0 and 1 can be generalized to a pair of thresholds $\beta$ and $\gamma$ such that $0 \le \gamma < \beta \le 1$. Pawlak's rough set model is the special case of this generalization in which $\gamma = 0$ and $\beta = 1$. The 0.5-probabilistic model, defined by $\beta = \gamma = 0.5$ and introduced by Pawlak et al. [16], is another special case, in which the positive region contains the objects whose probability is > 0.5, the negative region those whose probability is < 0.5, and the boundary region those whose probability is exactly 0.5. Therefore, the regions of the basic probabilistic rough set model are defined using the pre-determined thresholds $\beta$ and $\gamma$ as follows:

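$$\begin{aligned} \mathrm{POS}_{(\beta,\gamma)}(T) &= \{o_i \in O \mid \Pr(T \mid [o_i]) \ge \beta\} \\ \mathrm{NEG}_{(\beta,\gamma)}(T) &= \{o_i \in O \mid \Pr(T \mid [o_i]) \le \gamma\} \\ \mathrm{BND}_{(\beta,\gamma)}(T) &= \{o_i \in O \mid \gamma < \Pr(T \mid [o_i]) < \beta\} \end{aligned} \qquad (6)$$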

The approximations can also be determined from the regions as follows:

$$\underline{apr}(T) = \mathrm{POS}_{(\beta,\gamma)}(T), \qquad \overline{apr}(T) = \mathrm{POS}_{(\beta,\gamma)}(T) \cup \mathrm{BND}_{(\beta,\gamma)}(T) \qquad (7)$$
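The basic probabilistic model differs from the classical one only in how each equivalence class is routed to a region. In the sketch below, $\Pr(T \mid [o_i])$ is computed once per equivalence class, since every object of a class shares the same value, and the class is assigned by the thresholds of Eq. (6); Eq. (7) then gives the approximations. The name probabilistic_regions is hypothetical, and the 0.5-probabilistic model ($\beta = \gamma = 0.5$ with strict inequalities) is deliberately rejected by the threshold check, because it does not fit the form of Eq. (6).

```python
from collections import defaultdict


def probabilistic_regions(objects, attrs, target, beta, gamma):
    """Regions of Eq. (6) and approximations of Eq. (7) for 0 <= gamma < beta <= 1."""
    if not 0 <= gamma < beta <= 1:
        raise ValueError("thresholds must satisfy 0 <= gamma < beta <= 1")
    classes = defaultdict(set)
    for oid, values in objects.items():
        classes[tuple(values[a] for a in attrs)].add(oid)
    pos, neg, bnd = set(), set(), set()
    for eq_class in classes.values():
        # Pr(T | [o_i]) = |[o_i] & T| / |[o_i]| is identical for all objects of the class.
        pr = len(eq_class & target) / len(eq_class)
        if pr >= beta:
            pos |= eq_class
        elif pr <= gamma:
            neg |= eq_class
        else:
            bnd |= eq_class
    return {"POS": pos, "NEG": neg, "BND": bnd, "lower": pos, "upper": pos | bnd}
```

Setting beta = 1 and gamma = 0 reproduces Pawlak's regions of Eq. (5), which is a convenient sanity check for any implementation.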


2.3 MapReduce Programming Model

MapReduce [20], a parallel and distributed programming paradigm, was introduced by Google in 2004. It is used to implement parallel and distributed algorithms that deal with big data. The term map refers to the map phase and the term reduce to the reduce phase of the paradigm. The map phase takes a collection of $\langle key, value \rangle$ pairs as input and outputs a different collection of intermediate pairs $\langle key_1, value_1 \rangle$. The reduce phase collects the intermediate pairs generated for each key and combines them into a smaller set of final output pairs $\langle key_1, \{value_1, \ldots\} \rangle$. A pictorial representation of the MapReduce programming paradigm is shown in Fig. 1.
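The three phases can be imitated in a single process to make the data flow concrete. This sketch is not Hadoop or Spark code: map_reduce is a hypothetical helper in which the grouping step plays the role of the shuffle between the map and reduce phases, and the example counts objects per decision label.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(records, mapper, reducer):
    """Single-process simulation of the map, shuffle and reduce phases.

    mapper(record) yields (key, value) pairs; reducer(v1, v2) merges two values.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase: emit intermediate <key, value> pairs
            groups[key].append(value)      # shuffle: gather all values of the same key
    # reduce phase: fold the values of each key into one output value
    return {key: reduce(reducer, values) for key, values in groups.items()}


# Example: count objects per decision class.
objects = [("o1", "yes"), ("o2", "no"), ("o3", "yes")]
print(map_reduce(objects, lambda obj: [(obj[1], 1)], lambda a, b: a + b))  # {'yes': 2, 'no': 1}
```

In a real cluster the groups for different keys would live on different nodes, which is why the reducer must be associative: the framework may combine partial results in any order.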

3 Proposed Algorithm

This section presents a naive serial procedure for computing probabilistic rough set approximations, the SCPRSA algorithm, and then discusses its parallel implementation, the proposed PACPRSA algorithm.

Fig. 1 MapReduce programming paradigm


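Algorithm 1: The SCPRSA algorithm
Input: Information table $It = (O, At = C \cup \{d\})$; upper threshold $\beta$; lower threshold $\gamma$
Output: Approximations $\underline{apr}(D_i)$ and $\overline{apr}(D_i)$ of each decision class $D_i$

Begin
  Compute $O/C = \{E_1, E_2, \ldots, E_k\}$
  Compute $O/d = \{D_1, D_2, \ldots, D_l\}$
  for each decision class $D_i \in O/d$
    for each object $o_i \in O$
      Compute $\Pr(D_i \mid [o_i])$
      if $\Pr(D_i \mid [o_i]) \ge \beta$ then
        Assign $o_i$ to $\mathrm{POS}(D_i)$
      else if $\Pr(D_i \mid [o_i]) \le \gamma$ then
        Assign $o_i$ to $\mathrm{NEG}(D_i)$
      else
        Assign $o_i$ to $\mathrm{BND}(D_i)$
      end if
    end for
    Compute $\underline{apr}(D_i) = \mathrm{POS}(D_i)$ and $\overline{apr}(D_i) = \mathrm{POS}(D_i) \cup \mathrm{BND}(D_i)$
  end for
End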
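A direct Python transcription of Algorithm 1 for small tables is sketched below. The input layout (a list of object id, conditional-value tuple and decision label triples) and the function name scprsa are assumptions. As in the pseudocode, every object is tested against every decision class, which is the work that grows with the dataset and motivates the parallel version.

```python
from collections import defaultdict


def scprsa(table, beta, gamma):
    """Serial sketch of Algorithm 1 (SCPRSA).

    table: list of (object_id, conditional_values, decision_label), where
    conditional_values is a hashable tuple of the object's values on C.
    Returns {decision_label: (lower_approximation, upper_approximation)}.
    """
    eq_classes = defaultdict(set)    # O/C, keyed by conditional values
    dec_classes = defaultdict(set)   # O/d, keyed by decision label
    for oid, values, label in table:
        eq_classes[values].add(oid)
        dec_classes[label].add(oid)
    class_of = {oid: eq_classes[values] for oid, values, _ in table}  # [o_i]

    approximations = {}
    for label, target in dec_classes.items():            # each D_i is a target set
        pos, bnd = set(), set()
        for oid, eq_class in class_of.items():           # each o_i in O
            pr = len(eq_class & target) / len(eq_class)  # Pr(D_i | [o_i])
            if pr >= beta:
                pos.add(oid)
            elif pr > gamma:    # gamma < Pr < beta: boundary region
                bnd.add(oid)
            # otherwise Pr <= gamma: negative region (not needed for the output)
        approximations[label] = (pos, pos | bnd)
    return approximations
```

Only the positive and boundary regions are stored, because Eq. (7) needs nothing else to produce both approximations.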

For large datasets, each task in Algorithm 1 such as the computation of equivalence classes, the computation of decision classes, obtaining regions, and approximations is computationally intensive. Hence, the parallelization methods for each of these tasks using Apache Spark are discussed below. The proposed parallel algorithm, PACPRSA, is presented below through Algorithms 2, 3, and 4. Algorithm 2 is the main (or driver) algorithm. The information table It is initially converted to a resilient distributed dataset It RDD . A resilient distributed dataset (RDD) is the fundamental data structure provided by Apache Spark. An RDD facilitates us to work on the dataset in parallel across the nodes of a computing cluster. In Algorithm 2, we start by computing the family of equivalence classes w.r.t. the given conditional attribute set C using a parallel approach based on MapReduce. This process of finding the equivalence classes in parallel is detailed in Algorithm 3 below. Each object oi in the It RDD is originally in the form (id id i of oi , list of feature values oi C of oi , class d i of oi ). In Algorithm 3, a map() transformation is applied to each object oi to transform them from their original format into an intermediate pair oi C of oi , id i of oi , where the key is the list of conditional feature values or the feature vector and the value is the corresponding id of the object. The algorithm then merges the values

34 Parallel Computation of Probabilistic Rough Set Approximations


of each distinct key by using the reduceByKey() transformation. The result of this reduce transformation, which is spread across the nodes of the cluster, is collected at the driver node as an array of key-value pairs by the collect() action. The algorithm finally returns the values, which are the equivalence classes with respect to C. The next step of Algorithm 2 is the computation of decision classes. Their parallel computation is similar to Algorithm 3 and is detailed in Algorithm 4. In Algorithm 4, a map() transformation is applied over ItRDD to convert each object oi into the intermediate key-value pair ⟨di, idi⟩, whose key is the class of oi and whose value is its id. The reduceByKey() transformation then aggregates the objects belonging to each unique decision class dk. The result is collected at the driver node using collect(), and the values are returned as an array in which each value is the list of ids of the objects sharing the same class label. After collecting the equivalence classes and decision classes at the driver node, Algorithm 2 broadcasts them, together with the user-defined probability thresholds β and γ, to all the nodes of the cluster for the subsequent computation of regions and approximations. In each iteration, the algorithm chooses a distinct decision class as the target set and finds its approximations. Within an iteration, each object of the universe is assigned to the positive, negative, or boundary region based on its computed probability value, and is again mapped to a ⟨key, value⟩ pair whose key is the region code and whose value is the object's id. An object is assigned to the positive region, indicated by the code "PR", if its probability value is greater than or equal to the upper threshold β. If the probability value is less than or equal to the lower threshold γ, the object is mapped to the negative region, "NR"; otherwise, it is mapped to the boundary region, "BR".
A final reduceByKey() aggregates the objects according to the regions to which they are assigned. This result is collected at the driver node to construct the approximations as given in Eq. (7): the positive region forms the lower approximation, whereas the positive and boundary regions together form the upper approximation.
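The region-assignment step described above can be emulated in a single process. The sketch below is a hypothetical stand-in, not the authors' Spark code: each partition is assumed to be a plain list of object ids, the broadcast equivalence classes are a list of frozensets, and the reduceByKey()/collect() pair is replaced by merging the ids of each region code on the "driver".

```python
def assign_regions(partitions, eq_classes, target, beta, gamma):
    """Single-process stand-in for the region step of Algorithm 2.

    partitions: list of lists of object ids (stand-ins for RDD partitions)
    eq_classes: broadcast equivalence classes, as a list of frozensets of ids
    target:     the current target set (one decision class), as a set of ids
    """
    # Broadcast lookup: object id -> its equivalence class [o_k]_C.
    block_of = {oid: blk for blk in eq_classes for oid in blk}

    def region_code(oid):
        blk = block_of[oid]
        prob = len(blk & target) / len(blk)  # Pr(targetSet | [o_k]_C)
        if prob >= beta:
            return "PR"
        if prob <= gamma:
            return "NR"
        return "BR"

    # Map phase: every partition emits (region code, object id) pairs.
    mapped = [[(region_code(oid), oid) for oid in part] for part in partitions]

    # reduceByKey + collect: gather the ids of each region at the "driver".
    regions = {"PR": set(), "NR": set(), "BR": set()}
    for part in mapped:
        for code, oid in part:
            regions[code].add(oid)

    lower = set(regions["PR"])
    upper = regions["PR"] | regions["BR"]
    return regions, lower, upper


eq_classes = [frozenset({1, 2, 3}), frozenset({4}), frozenset({5})]
regions, lower, upper = assign_regions(
    partitions=[[1, 2], [3, 4, 5]], eq_classes=eq_classes,
    target={1, 2, 5}, beta=0.7, gamma=0.3)
```

On a real cluster, the map phase would run on the worker nodes and only the (region code, id) pairs would travel to the driver, which is why broadcasting the comparatively small equivalence classes pays off.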


Algorithm 2: The PACPRSA algorithm
Input: information table ItRDD = (ORDD, At = C ∪ {d}); upper threshold β; lower threshold γ
Output: approximations $\underline{apr}$(targetSet) and $\overline{apr}$(targetSet) of every decision class
Begin
  Compute ec ← equivalenceClasses(C)
  Compute dc ← decisionClasses(d)
  Broadcast ec, dc, β and γ
  for each data partition Op ⊆ ORDD do in parallel
    for each decision class dcj ∈ dc do
      targetSet ← dcj
      for each object ok ∈ Op do
        eck ← obtain [ok]C from ec
        prob ← Compute Pr(targetSet | eck)
        if (prob ≥ β) then
          regionsRDD ← map ok to ⟨"PR", idk of ok⟩
        else if (prob ≤ γ) then
          regionsRDD ← map ok to ⟨"NR", idk of ok⟩
        else
          regionsRDD ← map ok to ⟨"BR", idk of ok⟩
        end if
      end for
      Compute outputRDD ← regionsRDD.reduceByKey()
      Compute regions ← outputRDD.collect().toMap()
      Compute $\underline{apr}$(targetSet) ← regions("PR")
      Compute $\overline{apr}$(targetSet) ← regions("PR") ∪ regions("BR")
    end for
  end for
End


Algorithm 3: Compute equivalenceClasses(C)
Input: ItRDD = (ORDD, At = C ∪ {d})
Output: an array of equivalence classes: [{idi of oi}]
Begin
  for each data partition Op ⊆ ORDD do in parallel
    for each object oi ∈ Op do
      Convert oi from ⟨idi, oi^C, di⟩ to ⟨oi^C, idi⟩
    end for
  end for
  Apply reduceByKey() to obtain ⟨fvk, {idi of oi}k⟩, which we call ⟨key1, value1⟩
  Apply collect() to obtain the array [⟨key1, value1⟩]
  Return [value1]
End

Algorithm 4: Compute decisionClasses(d)
Input: ItRDD = (ORDD, At = C ∪ {d})
Output: an array of decision classes: [{idi of oi}]
Begin
  for each data partition Op ⊆ ORDD do in parallel
    for each object oi ∈ Op do
      Convert oi from ⟨idi, oi^C, di⟩ to ⟨di, idi⟩
    end for
  end for
  Apply reduceByKey() to obtain ⟨dk, {idi of oi}k⟩, which we call ⟨key2, value2⟩
  Apply collect() to obtain the array [⟨key2, value2⟩]
  Return [value2]
End
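The map → reduceByKey() → collect() pattern shared by Algorithms 3 and 4 can be emulated without Spark, which makes the data flow easy to inspect. The sketch below is an assumption-laden illustration: partitions are plain lists of (id, feature tuple, label) records, and the helper `reduce_by_key` is hypothetical, imitating the per-partition combine and the post-shuffle merge that reduceByKey() performs. In PySpark the same chain would look roughly like `rdd.map(lambda r: (r[1], [r[0]])).reduceByKey(lambda a, b: a + b).values().collect()`.

```python
def reduce_by_key(partitions, combine):
    """Merge values per key: first inside each partition, then across them."""
    merged = {}
    for part in partitions:
        local = {}
        for key, value in part:            # map-side combine
            local[key] = combine(local[key], value) if key in local else value
        for key, value in local.items():   # merge after the "shuffle"
            merged[key] = combine(merged[key], value) if key in merged else value
    return merged


def equivalence_classes(partitions):
    # Algorithm 3: <id_i, o_i^C, d_i>  ->  <o_i^C, {id_i}>
    pairs = [[(feat, frozenset([oid])) for oid, feat, _ in part] for part in partitions]
    return list(reduce_by_key(pairs, frozenset.union).values())


def decision_classes(partitions):
    # Algorithm 4: <id_i, o_i^C, d_i>  ->  <d_i, {id_i}>
    pairs = [[(label, frozenset([oid])) for oid, _, label in part] for part in partitions]
    return list(reduce_by_key(pairs, frozenset.union).values())


# Hypothetical records split over two partitions.
partitions = [
    [(1, (0, 1), "yes"), (2, (0, 1), "yes")],
    [(3, (0, 1), "no"), (4, (1, 0), "no"), (5, (1, 1), "yes")],
]
ec = equivalence_classes(partitions)
dc = decision_classes(partitions)
```

Here the objects 1 and 2 (first partition) and 3 (second partition) share the feature vector (0, 1), so they only meet in the cross-partition merge, which is exactly the role of the shuffle in reduceByKey().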


Table 1 Datasets description

S. No. | Dataset | # of objects | # of attributes | # of classes
1 | Statlog (Landsat satellite) | 4435 | 36 | 6
2 | Letter recognition | 16,000 | 16 | 26
3 | Connect-4 | 67,557 | 42 | 3
4 | MNIST | 70,000 | 784 | 10
5 | Poker-hand | 1,000,000 | 10 | 10

4 Experimental Results

This section presents the experimental environment used to validate the proposed PACPRSA algorithm and the results of a comparative performance analysis against its serial counterpart, SCPRSA.

4.1 Experimental Environment

Apache Spark [26] is chosen as the parallel processing framework for implementing and evaluating the proposed work because it supports in-memory data caching and is fault-tolerant. A cluster of five nodes is formed, with one node acting as the driver node and the other four as worker nodes. Each node is equipped with an eight-core Intel® Core™ i7-9700 processor and 8 GB of primary memory and runs Ubuntu 20.04.1 LTS. All the nodes are configured with the required software, namely Java 1.8.0_281, Scala 2.12.12, and Apache Spark 3.0.1. The proposed algorithm is run on the benchmark datasets Statlog, Letter recognition, Connect-4, MNIST, and Poker-hand, which are available for download from the UCI machine learning repository [27]. Table 1 lists these datasets and their details.

4.2 Performance Analysis

The efficiency of PACPRSA is analyzed by comparing its results with those of its serial counterpart, SCPRSA. SCPRSA is written in Scala and runs on a system with an eight-core Intel® Core™ i7-9700 processor and 8 GB of primary memory, running the Ubuntu 20.04.1 LTS operating system. The computational times of the proposed PACPRSA and of SCPRSA on the different datasets in the above experimental setup are presented in Table 2.

Table 2 Computational times of SCPRSA and PACPRSA (time in minutes)

S. No. | Dataset | SCPRSA | PACPRSA
1 | Statlog | 0.02 | 1.5
2 | Letter recognition | 1.3 | 1.67
3 | Connect-4 | 5.67 | 1.77
4 | MNIST | 20.11 | 4.28
5 | Poker-hand | 6584.3 | 127.63

From Table 2, it can be observed that SCPRSA outperformed the proposed PACPRSA on the datasets with fewer objects, viz., Statlog and Letter recognition. Because PACPRSA runs on a cluster of computing nodes, it incurs additional time for the initial start-up, network communication, collection of results, etc. However, PACPRSA starts to perform better than SCPRSA as the size of the datasets increases. For the Connect-4 and MNIST datasets, the reductions in computational time are 68.82% and 78.71%, respectively. The MNIST dataset has a number of objects close to that of Connect-4 but a much larger attribute space and more classes; Connect-4 therefore serves as the reference for observing how performance changes when the attribute space and the number of classes grow while the object space stays roughly the same. With the Poker-hand dataset, where the object space grows to 1 million records, PACPRSA achieves a 98.06% reduction in computational time, strongly suggesting that the proposed algorithm performs well as datasets scale up in size. As further evidence, Table 3 compares SCPRSA and PACPRSA more precisely. For Table 3, one of the datasets (MNIST) is enlarged to 1.5, 2, 2.5, and 3 times its original size to check how PACPRSA behaves as the dataset grows. The proposed algorithm reduces the computational time by 83.01%, 82.05%, 85.17%, and 86%, respectively, compared to SCPRSA. These results demonstrate the relevance of the proposed algorithm for very large datasets.

Table 3 Comparison of SCPRSA and PACPRSA by varying the object space of the MNIST dataset (time in minutes)

Dataset | Objects | Attributes | SCPRSA | PACPRSA
MNIST | 70,000 | 784 | 20.12 | 4.28
MNIST 1.5 times | 105,000 | 784 | 35.92 | 6.10
MNIST 2 times | 140,000 | 784 | 39.73 | 7.13
MNIST 2.5 times | 175,000 | 784 | 68.22 | 10.12
MNIST 3 times | 210,000 | 784 | 88.70 | 12.42
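The percentages quoted above are relative reductions in running time, (T_SCPRSA − T_PACPRSA) / T_SCPRSA × 100. The short sketch below recomputes a few of them from the times copied out of Tables 2 and 3; for the rows listed it reproduces the reported 98.06%, 82.05%, 85.17%, and 86%. Values recomputed from the rounded table entries for other rows may differ from the text in the second decimal place.

```python
def reduction_percent(t_serial, t_parallel):
    """Relative reduction in running time achieved by the parallel run, in percent."""
    return (t_serial - t_parallel) / t_serial * 100.0


# Running times in minutes, copied from Tables 2 and 3: (SCPRSA, PACPRSA).
runs = {
    "Poker-hand": (6584.3, 127.63),
    "MNIST 2 times": (39.73, 7.13),
    "MNIST 2.5 times": (68.22, 10.12),
    "MNIST 3 times": (88.70, 12.42),
}
reductions = {name: reduction_percent(*times) for name, times in runs.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}%")
```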


Fig. 2 Speedup of PACPRSA

4.3 Scalability of PACPRSA

The scalability of a parallel algorithm is evaluated using standard metrics such as speedup, sizeup, and scaleup.

Speedup. The speedup of a parallel algorithm is measured by keeping the size of the dataset constant while gradually increasing the number of nodes in the cluster. The speedup of an algorithm running on a cluster with n nodes is defined as:

$$\mathrm{speedup}(n) = \frac{\text{running time on one node}}{\text{running time on } n \text{ nodes}}$$

An ideal parallel or distributed algorithm should exhibit linear speedup. In practice, however, initialization, communication, and other overhead costs limit it to sub-linear speedup. Figure 2 displays the speedup of PACPRSA on the MNIST and Connect-4 datasets. To check the speedup of PACPRSA on large object spaces, the Connect-4 and MNIST datasets are enlarged to 4 times their original size (i.e., to 270,228 and 280,000 objects, respectively). Figure 2 shows that MNIST has a slightly better speedup curve than Connect-4, which suggests that the speedup improves as the size of the dataset increases. PACPRSA therefore handles large datasets efficiently.

Sizeup. The sizeup is measured by keeping the cluster size unchanged while the size of the dataset is increased by a factor of n, and recording the corresponding increase in running time. If s is the size of a dataset, then sizeup is defined as:

$$\mathrm{sizeup}(n) = \frac{\text{runtime for a dataset of size } n \times s}{\text{runtime for a dataset of size } s}$$


Fig. 3 Sizeup of PACPRSA

To find the sizeup of PACPRSA, the number of nodes is kept fixed at four. The Connect-4 and MNIST datasets are taken at their original size and gradually enlarged to 1.5, 2, 2.5, and 3 times along their object space. Figure 3 shows the resulting sizeup performance. Both datasets show good sizeup behavior, and the MNIST dataset achieves a near-ideal sizeup curve, indicating that PACPRSA has good sizeup performance.

Scaleup. Scaleup evaluates whether the same performance is maintained when both the dataset size and the number of nodes are increased proportionally. If s is the size of a dataset and n is the number of nodes, then scaleup is defined as:

$$\mathrm{scaleup}(n) = \frac{\text{runtime for a dataset of size } s \text{ on one node}}{\text{runtime for a dataset of size } n \times s \text{ on } n \text{ nodes}}$$

To find the scaleup of PACPRSA, the four nodes are added one at a time while the sizes of the two datasets are increased proportionally by 25%, 50%, 75%, and 100%, respectively. Figure 4 displays the scaleup performance on these datasets; the scaleup curve of the MNIST dataset is better than that of the Connect-4 dataset.

Fig. 4 Scaleup of PACPRSA
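The three scalability metrics defined above are simple runtime ratios. The sketch below encodes them directly; the runtimes in the usage lines are hypothetical numbers chosen only to show the calls, not measurements behind Figs. 2, 3, or 4. Linear speedup corresponds to speedup(n) = n, and a scaleup of 1 means that n nodes process n times the data in the same time one node needs for the original data.

```python
def speedup(t_one_node, t_n_nodes):
    """speedup(n): same data, running time on 1 node / running time on n nodes."""
    return t_one_node / t_n_nodes


def sizeup(t_size_s, t_size_ns):
    """sizeup(n): same cluster, runtime for size n*s / runtime for size s."""
    return t_size_ns / t_size_s


def scaleup(t_s_one_node, t_ns_n_nodes):
    """scaleup(n): runtime for size s on 1 node / runtime for size n*s on n nodes."""
    return t_s_one_node / t_ns_n_nodes


# Hypothetical runtimes in minutes, for illustration only (not measured values).
runtime_by_nodes = {1: 40.0, 2: 21.0, 3: 15.0, 4: 12.5}
speedups = {n: speedup(runtime_by_nodes[1], t) for n, t in runtime_by_nodes.items()}
```

With these made-up numbers, four nodes give a speedup of 3.2 rather than the ideal 4, which is the kind of sub-linear behavior the overhead argument above predicts.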

5 Conclusion

Since the introduction of PRS, many researchers have proposed PRS-based algorithms to address challenges in data processing for knowledge discovery. However, applying these algorithms to big datasets has become challenging. Computing approximations is the fundamental step of algorithms based on RST and its variants, so an efficient algorithm for computing them is essential. In this paper, we proposed a parallel and distributed method for computing regions and approximations using probabilistic rough sets. The algorithm is based on MapReduce and is implemented on Apache Spark, where its speedup, scaleup, and sizeup performance were evaluated. The experimental results demonstrate that the proposed parallel algorithm handles large datasets effectively. As future work, we plan to apply the proposed parallel method to develop attribute reduction algorithms for big data using probabilistic generalizations of rough sets.

Acknowledgements The authors acknowledge the sponsorship received from the Science and Engineering Research Board (SERB), Department of Science and Technology (DST), Government of India, under the scheme of Empowerment and Equity Opportunities for Excellence in Science (Sanction Order No. EEQ/2019/000470).

References
1. Pawlak Z (1982) Rough sets. Int J Comput Inf Sci 11(5):341–356. https://doi.org/10.1007/BF01001956
2. Thangavel K, Pethalakshmi A (2009) Dimensionality reduction based on rough set theory: a review. Appl Soft Comput J. https://doi.org/10.1016/j.asoc.2008.05.006
3. Chebrolu S, Sanjeevi SG (2015) Attribute reduction on continuous data in rough set theory using ant colony optimization metaheuristic. https://doi.org/10.1145/2791405.2791438
4. Jia X, Shang L, Zhou B, Yao Y (2016) Generalized attribute reduct in rough set theory. Knowl Based Syst. https://doi.org/10.1016/j.knosys.2015.05.017
5. Greco S, Matarazzo B, Słowiński R (2001) Rough sets theory for multicriteria decision analysis. Eur J Oper Res 129(1):1–47. https://doi.org/10.1016/S0377-2217(00)00167-3
6. Grzymala-Busse JW (1992) LERS—a system for learning from examples based on rough sets. Intell Decis Support 3–18. https://doi.org/10.1007/978-94-015-7975-9_1
7. Yao YY, Wong SKM, Lingras P (1990) A decision-theoretic rough set model. Methodol Intell Syst 5:17–27
8. Ziarko W (1993) Variable precision rough set model. J Comput Syst Sci 46(1):39–59. https://doi.org/10.1016/0022-0000(93)90048-2
9. Ślęzak D, Ziarko W (2002) Bayesian rough set model. In: Proceedings of the international workshop on foundation of data mining (FDM'2002), 9 Dec 2002, Maebashi, Japan, pp 131–135
10. Greco S, Matarazzo B, Słowiński R (2005) Rough membership and Bayesian confirmation measures for parameterized rough sets. In: RSFDGrC 2005: Rough sets, fuzzy sets, data mining, and granular computing. LNCS, vol 3641, pp 314–324. https://doi.org/10.1007/11548669_33
11. Yao Y, Zhou B (2010) Naive Bayesian rough sets. In: RSKT 2010: Rough set and knowledge technology, Oct 2010. LNCS, vol 6401, pp 719–726. https://doi.org/10.1007/978-3-642-16248-0_97
12. Ślęzak D, Ziarko W (2005) The investigation of the Bayesian rough set model. Int J Approximate Reasoning 40(1–2):81–91. https://doi.org/10.1016/j.ijar.2004.11.004
13. Zhang H, Zhou J, Miao D, Gao C (2012) Bayesian rough set model: a further investigation. Int J Approximate Reasoning 53(4):541–557. https://doi.org/10.1016/j.ijar.2011.12.006
14. Greco S, Matarazzo B, Słowiński R (2008) Parameterized rough set model using rough membership and Bayesian confirmation measures. Int J Approximate Reasoning 49(2):285–300. https://doi.org/10.1016/j.ijar.2007.05.018
15. Deng X, Yao Y (2012) An information-theoretic interpretation of thresholds in probabilistic rough sets. In: RSKT 2012: Rough sets and knowledge technology. LNCS, vol 7414, pp 369–378. https://doi.org/10.1007/978-3-642-31900-6_46
16. Pawlak Z, Wong SKM, Ziarko W (1988) Rough sets: probabilistic versus deterministic approach. Int J Man Mach Stud 29(1):81–95. https://doi.org/10.1016/S0020-7373(88)80032-4
17. Yao YY, Wong SKM (1992) A decision theoretic framework for approximating concepts. Int J Man Mach Stud 37(6):793–809. https://doi.org/10.1016/0020-7373(92)90069-W
18. Grzymala-Busse JW, Clark PG, Kuehnhausen M (2014) Generalized probabilistic approximations of incomplete data. Int J Approximate Reasoning 55(1), Part 2, 180–196. https://doi.org/10.1016/j.ijar.2013.04.007
19. Ma J, Zou C, Pan X (2017) Structured probabilistic rough set approximations. Int J Approximate Reasoning 90:319–332. https://doi.org/10.1016/J.IJAR.2017.08.004
20. Dean J, Ghemawat S (2010) MapReduce: a flexible data processing tool. Commun ACM. https://doi.org/10.1145/1629175.1629198
21. Yang Y, Chen Z, Liang Z, Wang G (2010) Attribute reduction for massive data based on rough set theory and MapReduce. In: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and Lecture notes in bioinformatics). LNAI, Oct 2010, vol 6401, pp 672–678. https://doi.org/10.1007/978-3-642-16248-0_91
22. Qian J, Miao D, Zhang Z, Yue X (2014) Parallel attribute reduction algorithms using MapReduce. Inf Sci (NY). https://doi.org/10.1016/j.ins.2014.04.019
23. Zhang J, Li T, Pan Y (2014) PLAR: parallel large-scale attribute reduction on cloud systems. https://doi.org/10.1109/PDCAT.2013.36
24. White T (2012) Hadoop: the definitive guide, 4th edn. Online. citeulike-article-id:4882841
25. Zhang J, Wong JS, Pan Y, Li T (2015) A parallel matrix-based method for computing approximations in incomplete information systems. IEEE Trans Knowl Data Eng 27(2):326–339. https://doi.org/10.1109/TKDE.2014.2330821
26. Zaharia M et al (2016) Apache spark: a unified engine for big data processing. Commun ACM 59(11):56–65. https://doi.org/10.1145/2934664
27. Asuncion A, Newman DJ (2007) UCI machine learning repository: data sets. University of California Irvine School of Information. https://archive.ics.uci.edu/ml/index.php

Chapter 35

Simplified TOPSIS for MLN-MODM Problems

Kailash Lachhwani

1 Background of Problem

Mathematical programming problems (MPPs) are decision-making problems with a specific format, namely one or more objective functions and a set of constraints. MPPs cover a wide variety of problems, ranging from single-objective linear programming problems (LPPs) and multiobjective programming problems to complex hierarchical single- and multiple-objective problems such as bi-level linear/nonlinear programming problems (BLPPs), multilevel linear/nonlinear programming problems (MLLPPs), and their extensions. Typical complex MPPs nowadays include multilevel programming problems (MLPPs), multilevel multiobjective programming (ML-MOP) problems, and bi-level multiobjective programming (BL-MOP) problems, which arise in government policy, competitive economic situations, agriculture, biofuel production, etc. These problems share a hierarchical decision-making structure in which decisions flow sequentially from the top level to the bottom level of the hierarchy. MLPPs frequently appear in complex decentralized management situations. To illustrate a real business problem as an MLPP, consider a decentralized business firm with a three-level structure: (i) first level, the manufacturing unit, with objective functions such as maximizing sales and revenue and decision variables such as investment, running cost, and manpower; (ii) second level, the dealers, with the objective of maximizing profit and decision variables such as new sales schemes and transportation costs; and (iii) third level, the stockists, with the objective of maximizing profit and decision variables such as running costs and manpower. Such problems (see Fig. 1) can be formulated as

K. Lachhwani, Department of Applied Science, National Institute of Technical Teacher's Training and Research, Chandigarh 160019, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_35


K. Lachhwani

Fig. 1 Hierarchical structure in business problem as MLPP

multilevel decision-making (MLDM) problems. In the same business problem, if the firm has only a two-level structure, the problem is classified as a bi-level decision-making (BLDM) problem. When dealing with such hierarchical programming problems (BLPPs, MLPPs, and their extensions), the structure of the model makes it almost impossible to obtain an ideal optimal solution that optimizes every objective function at each level, satisfies the constraints, and leaves no decision-maker in the hierarchy dissatisfied. Researchers therefore introduced the concept of a satisfactory (or compromise optimal) solution for these problems, and developing solution techniques for such solutions has become a major area of current research. A significant number of publications have developed techniques for MLPPs, BLPPs, and related extensions, e.g. bi-level multiobjective decision-making problems (BL-MODMPs) and multilevel multiobjective decision-making problems (ML-MODMPs), based on traditional approaches such as the fuzzy programming (FP) approach. In particular, various versions of fuzzy programming (FP), the TOPSIS approach, the balance space approach, etc., have been reported in [1–8] for solving different types of BLPPs and multilevel multiobjective programming problems. However, the choice of an efficient method is still a subject of active research. An early bibliography of MLPPs and BLPPs with different solution methodologies is available in [9]. Recently, Lachhwani and Dwivedi [10] presented a taxonomy and a detailed literature review of issues related to MLPPs and BLPPs.

35 Simplified TOPSIS for MLN-MODM Problems


In this article, we briefly discuss multilevel nonlinear multiobjective decision-making (MLN-MODM) problems, which are complex hierarchical problems with a multilevel structure and several conflicting nonlinear objective functions at each level. For MLN-MODM problems, we focus primarily on TOPSIS, the technique for order preference by similarity to ideal solution. TOPSIS is based on the concept that the most preferred solution should lie close to the positive ideal solution (PIS) and as far as possible from the negative ideal solution (NIS), where, for a maximization problem, the PIS is the best solution point (the one maximizing the objective function) and the NIS is the worst solution point (the one minimizing the objective function). The TOPSIS approach was first suggested for multiobjective decision-making problems by Lai et al. [11] and has since been used as a solution methodology for various multiobjective programming problems. Zavadskas et al. [12] reviewed the development of the TOPSIS method for solving complicated decision-making problems from 2000 to 2015. Baky [13] extended the TOPSIS approach to maximization-type MLN-MODM problems and proposed two interactive TOPSIS algorithms for them. Recently, Lachhwani [14] proposed a solution methodology for an important class of ML-MOPPs in which the coefficients are fully neutrosophic numbers (NNs). The two interactive algorithms of Baky [13] simplify MLN-MODM problems by converting them into separate multiobjective programming problems (MOPPs) at the upper and lower levels.
However, the algorithms of Baky [13] for MLN-MODM problems have some drawbacks, which can cause repeated rejection of the solution by the upper-level decision-makers (ULDMs) and, consequently, frequent decision deadlocks during the algorithmic process. This motivates us to propose a new simplified TOPSIS approach for MLN-MODM problems with major modifications to the interactive TOPSIS approach of Baky [13], namely (i) a modified formulation of the membership function for the upper-level decision vectors and (ii) simplified membership functions for the distance functions from the PIS and NIS points. The usual goal programming is then applied to obtain a satisfactory solution of the complex ML-MODM problem. This simplified TOPSIS methodology avoids decision deadlocks and the possibility of rejection of the solution by the ULDMs during decision-making. It is also simpler, more efficient, and computationally less demanding than the interactive TOPSIS approach of Baky [13]. The rest of the paper is organized as follows. The formulation of MLN-MODM problems is briefly discussed in the next section. The proposed simplified TOPSIS approach for solving MLN-MODM problems is discussed in detail in Sect. 3, and the stepwise algorithm of the proposed approach is given in Sect. 4. In Sect. 5, the proposed simplified TOPSIS and the technique of Baky [13] are compared in detail on the same numerical example. Concluding remarks and future extensions of simplified TOPSIS are presented in Sect. 6.


2 Problem Introduction

MLPPs are a special class of programming problems with interacting decision-making (a sequential flow of decisions from the top levels to the bottom levels) within their structure. When more than one conflicting objective function appears at every level of an MLPP, the problem becomes a more complex multilevel multiobjective programming problem (ML-MOPP). Multilevel nonlinear multiobjective decision-making (MLN-MODM) problems are an extended class of ML-MOPPs with a set of nonlinear objectives at each level. Here we consider a maximization-type MLN-MODM problem with m objective functions at each of T levels. Mathematically, the problem can be stated as:

$$
\begin{aligned}
&\max_{X_1}\ \{Z_{11}, Z_{12}, \ldots, Z_{1m}\}\\
&\max_{X_2}\ \{Z_{21}, Z_{22}, \ldots, Z_{2m}\}\\
&\qquad\vdots\\
&\max_{X_T}\ \{Z_{T1}, Z_{T2}, \ldots, Z_{Tm}\}\\
&\text{subject to } A_{l1}X_1 + A_{l2}X_2 + \cdots + A_{lT}X_T\ (\le, =, \ge)\ b_l, \quad \forall l = 1, 2, \ldots, p,\\
&X_1 \ge 0,\ X_2 \ge 0,\ \ldots,\ X_T \ge 0
\end{aligned}
\tag{1}
$$

where $Z_{tr}(X)$, $t = 1, 2, \ldots, T$, $r = 1, 2, \ldots, m$, is the rth objective function of the tth-level DM; note that each $Z_{tr}(X)$ is a nonlinear function. $X_1 = (X_{11}, X_{12}, \ldots, X_{1N_1})'$ is the decision vector of the first-level DM, ..., and $X_T = (X_{T1}, X_{T2}, \ldots, X_{TN_T})'$ is the decision vector of the Tth-level DM, where $'$ denotes the matrix transpose. $A_{lt}$, $l = 1, 2, \ldots, p$, $t = 1, 2, \ldots, T$, are row vectors of dimension $1 \times N_t$, so that stacking $A_{lt}X_t$ over $l = 1, \ldots, p$ gives a column vector of dimension $p \times 1$. Here $N = N_1 + N_2 + \cdots + N_T$, and X denotes the union of the variables $X_1, X_2, \ldots, X_T$. The decision vector $X_t$, $t = 1, 2, \ldots, T$, is the tth-level decision vector with $N_t$ decision variables.


3 Simplified TOPSIS for ML-MODM Problems

TOPSIS is based on the concept that the satisfactory solution of the problem lies at a point with minimum distance from the PIS and maximum distance from the NIS. Therefore, the distances from the PIS and the NIS are first calculated individually with respect to the individual maximum and minimum objective function values. In this context, the simplified TOPSIS model for problem (1) can be formulated as

$$\min_{X_t}\ d_p^{PIS_t}(X_t),\quad t = 1, 2, \ldots, T, \qquad \text{and} \qquad \max_{X_t}\ d_p^{NIS_t}(X_t),\quad t = 1, 2, \ldots, T \tag{2}$$

with the same set of constraints and non-negativity conditions as in (1), where

$$d_p^{PIS_t}(X_t) = \left\{ \sum_{j=1}^{m} \lambda_{tj}^{p} \left( \frac{Z_{tj}^{*} - Z_{tj}(X_t)}{Z_{tj}^{*} - Z_{tj}^{-}} \right)^{p} \right\}^{1/p} \tag{3}$$

$$d_p^{NIS_t}(X_t) = \left\{ \sum_{j=1}^{m} \lambda_{tj}^{p} \left( \frac{Z_{tj}(X_t) - Z_{tj}^{-}}{Z_{tj}^{*} - Z_{tj}^{-}} \right)^{p} \right\}^{1/p} \tag{4}$$

Here $Z_{tj}^{*}$, $Z_{tj}^{-}$ and $\lambda_{tj}$, $t = 1, 2, \ldots, T$, $j = 1, 2, \ldots, m$, are the individual PISs (individual maximization optimal solutions for the tth level), the individual NISs (individual minimization optimal solutions for the tth level), and the relative weights (importance of the objective functions), respectively. As suggested by Baky [13], the parameter p is the "balancing factor" between the objective function value and the maximal individual regret. We now formulate the bounds of the membership functions in simplified form as

$$\left(d_p^{PIS_t}\right)^{*} = \min d_p^{PIS_t}(X_t), \quad t = 1, 2, \ldots, T \tag{5}$$

$$\left(d_p^{NIS_t}\right)^{*} = \max d_p^{NIS_t}(X_t), \quad t = 1, 2, \ldots, T \tag{6}$$

Note that Baky [13] defined $\left(d_p^{PIS_t}\right)^{-}$ and $\left(d_p^{NIS_t}\right)^{-}$ differently. For the simplicity of the proposed technique, we instead define

$$\left(d_p^{PIS_t}\right)^{-} = \max d_p^{PIS_t}(X_t), \quad t = 1, 2, \ldots, T \tag{7}$$

$$\left(d_p^{NIS_t}\right)^{-} = \min d_p^{NIS_t}(X_t), \quad t = 1, 2, \ldots, T \tag{8}$$
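Equations (3) and (4) are weighted $L_p$ distances of the normalized objective values from the individual ideal and anti-ideal values. The minimal Python sketch below evaluates both for one level at a single candidate point; the function name, argument layout, and numbers are hypothetical, and it assumes $Z_{tj}^{*} > Z_{tj}^{-}$ for every objective so that the normalization is defined. It uses the identity $\lambda^p r^p = (\lambda r)^p$.

```python
def topsis_distances(z, z_best, z_worst, weights, p=2.0):
    """Weighted L_p distances of Eqs. (3) and (4) for one level t.

    z       : objective values Z_tj(X_t), j = 1..m, at a candidate point
    z_best  : individual PIS values Z_tj^* (individual maxima)
    z_worst : individual NIS values Z_tj^- (individual minima)
    weights : relative weights lambda_tj
    p       : balancing factor (p >= 1)
    """
    d_pis = 0.0
    d_nis = 0.0
    for zj, zb, zw, w in zip(z, z_best, z_worst, weights):
        span = zb - zw  # assumes Z_tj^* > Z_tj^- for every objective
        # lambda^p * (ratio)^p is written as (lambda * ratio)^p
        d_pis += (w * (zb - zj) / span) ** p
        d_nis += (w * (zj - zw) / span) ** p
    return d_pis ** (1.0 / p), d_nis ** (1.0 / p)


# Hypothetical level with m = 2 objectives and equal weights.
d_pis, d_nis = topsis_distances(z=[8.0, 3.0], z_best=[10.0, 5.0],
                                z_worst=[2.0, 1.0], weights=[0.5, 0.5])
```

In the model itself these distances are functions of $X_t$ inside an optimization problem; the sketch only shows how a given point is scored.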

Fig. 2 Membership function for μt1 and μt2 (vertical axis: μt1, μt2, from 0 to 1; horizontal axis: $d_p^t(X)$)

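The use of $\left(d_p^{PIS_t}\right)^{-}$ and $\left(d_p^{NIS_t}\right)^{-}$ as proposed in (7) and (8) makes the algorithm computationally simple. We can now define the membership functions $\mu_{t1} \equiv \mu_{d_p^{PIS_t}}(X_t)$ and $\mu_{t2} \equiv \mu_{d_p^{NIS_t}}(X_t)$ as (Fig. 2):

$$
\mu_{t1} =
\begin{cases}
1, & \text{if } d_p^{PIS_t}(X_t) < \left(d_p^{PIS_t}\right)^{*},\\[4pt]
1 - \dfrac{d_p^{PIS_t}(X_t) - \left(d_p^{PIS_t}\right)^{*}}{\left(d_p^{PIS_t}\right)^{-} - \left(d_p^{PIS_t}\right)^{*}}, & \text{if } \left(d_p^{PIS_t}\right)^{*} \le d_p^{PIS_t}(X_t) \le \left(d_p^{PIS_t}\right)^{-},\\[4pt]
0, & \text{if } \left(d_p^{PIS_t}\right)^{-} < d_p^{PIS_t}(X_t),
\end{cases}
\qquad \forall t = 1, 2, \ldots, T \tag{9}
$$

$$
\mu_{t2} =
\begin{cases}
1, & \text{if } d_p^{NIS_t}(X_t) > \left(d_p^{NIS_t}\right)^{*},\\[4pt]
1 - \dfrac{\left(d_p^{NIS_t}\right)^{*} - d_p^{NIS_t}(X_t)}{\left(d_p^{NIS_t}\right)^{*} - \left(d_p^{NIS_t}\right)^{-}}, & \text{if } \left(d_p^{NIS_t}\right)^{-} \le d_p^{NIS_t}(X_t) \le \left(d_p^{NIS_t}\right)^{*},\\[4pt]
0, & \text{if } d_p^{NIS_t}(X_t) < \left(d_p^{NIS_t}\right)^{-},
\end{cases}
\qquad \forall t = 1, 2, \ldots, T \tag{10}
$$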
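The piecewise-linear membership functions (9) and (10) translate directly into code. The sketch below is illustrative only: the function names and the sample bounds are hypothetical, and it assumes strict orderings $\left(d_p^{PIS_t}\right)^{*} < \left(d_p^{PIS_t}\right)^{-}$ and $\left(d_p^{NIS_t}\right)^{-} < \left(d_p^{NIS_t}\right)^{*}$ so that no denominator vanishes.

```python
def mu_pis(d, d_star, d_minus):
    """Eq. (9): membership of a PIS distance d, assuming d_star < d_minus."""
    if d < d_star:
        return 1.0
    if d > d_minus:
        return 0.0
    # Decreases linearly from 1 at d_star to 0 at d_minus.
    return 1.0 - (d - d_star) / (d_minus - d_star)


def mu_nis(d, d_star, d_minus):
    """Eq. (10): membership of a NIS distance d, assuming d_minus < d_star."""
    if d > d_star:
        return 1.0
    if d < d_minus:
        return 0.0
    # Increases linearly from 0 at d_minus to 1 at d_star.
    return 1.0 - (d_star - d) / (d_star - d_minus)
```

For example, with hypothetical bounds $\left(d_p^{PIS_t}\right)^{*} = 0.2$ and $\left(d_p^{PIS_t}\right)^{-} = 0.6$, a PIS distance of 0.3 has membership 0.75: smaller distances from the PIS are rewarded, while for the NIS it is larger distances that push the membership towards 1.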

Baky [13] adopted the max–min decision criterion for solving this ML-MODM problem. Further, Baky [13] used $\alpha = \min(\mu_{t1}(x), \mu_{t2}(x))$ for the membership functions and proposed an equivalent solution model. All of these steps are computationally demanding. Therefore, to reduce the computational effort at this stage and keep the proposed technique simple, we use the traditional fuzzy goal programming (FGP) technique to find a satisfactory solution of the problem. Using FGP, the flexible membership goals for the membership functions (9) and (10) can be written as

$$\mu_{t1} + d_{t1}^{-} - d_{t1}^{+} = 1, \quad \forall t = 1, 2, \ldots, T \tag{11}$$

$$\mu_{t2} + d_{t2}^{-} - d_{t2}^{+} = 1, \quad \forall t = 1, 2, \ldots, T \tag{12}$$

where $d_{t1}^{-}, d_{t1}^{+}, d_{t2}^{-}, d_{t2}^{+} \ge 0$ ($t = 1, 2, \ldots, T$) are the usual negative and positive deviational variables from the desired aspiration levels. Using goal programming theory, problem (1) can be transformed into the following problem:

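$$
\begin{aligned}
\min\ & \lambda = \sum_{t=1}^{T} \left(d_{t1}^{-} + d_{t2}^{-}\right)\\
\text{subject to}\ & \mu_{t1} + d_{t1}^{-} - d_{t1}^{+} = 1, \quad \forall t = 1, 2, \ldots, T,\\
& \mu_{t2} + d_{t2}^{-} - d_{t2}^{+} = 1, \quad \forall t = 1, 2, \ldots, T,\\
& A_{l1}X_1 + A_{l2}X_2 + \cdots + A_{lT}X_T\ (\le, =, \ge)\ b_l, \quad \forall l = 1, 2, \ldots, p,\\
& X_1 \ge 0,\ X_2 \ge 0,\ \ldots,\ X_T \ge 0.
\end{aligned}
\tag{13}
$$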
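For a fixed candidate point, the goal constraints of (13) determine the deviational variables: since every membership value lies in $[0, 1]$, the cheapest way to satisfy $\mu + d^{-} - d^{+} = 1$ is $d^{+} = 0$ and $d^{-} = 1 - \mu$, so $\lambda$ is the total shortfall of all memberships from 1. The sketch below makes this explicit; the function names and membership values are hypothetical, and it evaluates the objective at a given point rather than solving the optimization problem.

```python
def goal_deviations(mu):
    """Deviations (d_minus, d_plus) >= 0 with mu + d_minus - d_plus = 1 and the
    smallest total deviation. Because 0 <= mu <= 1, d_plus is always 0."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("membership values must lie in [0, 1]")
    return 1.0 - mu, 0.0


def goal_objective(memberships):
    """lambda of (13) at a fixed point: the sum of the negative deviations."""
    return sum(goal_deviations(mu)[0] for mu in memberships)


# Hypothetical memberships (mu_11, mu_12, mu_21, mu_22) for T = 2 levels.
lam = goal_objective([0.75, 0.25, 1.0, 0.5])
```

Minimizing $\lambda$ over the feasible region therefore pushes all memberships towards their aspiration level of 1 simultaneously.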

It is well known that if we can find any optimal solution of (13), i.e. say the vector (λ, X t ; t = 1, 2, 3, . . . , T ), then X t ; t = 1, 2, 3, . . . , T may be one of the satisfactory solution of model (1). But this does not happen in such complex hierarchical decision-making problem due to interacting decision-making variables of the problem in hierarchy. To handle this situation, Baky [13] suggested the formulation of membership function for the upper level decision vector X t ; t = 1, 2, 3, . . . , T −1 in which there is always possibility of rejection the solution by respective lower level decision-makers as satisfactory solution depends upon the selection of positive and negative tolerance values t kj R , t kj L ; k = 1, 2, 3, . . . , T − 1; j = 1, 2, . . . , m [12] and consequent occurrence of decision deadlock during decision-making process. Therefore, in order to avoid possibilities of repeated rejection of solution and subsequent decision deadlock situation, here we propose to formulate the linear format for the membership function of the decision vector X T (t = 1, 2, 3, . . . , T − 1) (as displayed in Fig. 3) in this simplified form (without using tolerance values) as: µX ( Xt )

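Fig. 3 Membership function $\mu_{X_t}(X_t)$, $\forall t = 1, 2, \ldots, T-1$ (plot of $\mu_{X_t}(X_t)$, from 0 up to 1, against $X_t$, with breakpoints $\underline{X}_t$ and $\overline{X}_t$)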


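$$
\mu_{X_t}(X_t) =
\begin{cases}
1, & X_t \ge \overline{X}_t, \\[4pt]
\dfrac{X_t - \underline{X}_t}{\overline{X}_t - \underline{X}_t}, & \underline{X}_t \le X_t \le \overline{X}_t, \\[4pt]
0, & X_t \le \underline{X}_t,
\end{cases}
\tag{14}
$$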

where $\overline{X}_t$ and $\underline{X}_t$ ($t = 1, 2, \ldots, T-1$) are the values of the decision vector at the $t$th level corresponding to the minimum value of $d_p^{PIS_t}$ and the maximum value of $d_p^{NIS_t}$, respectively. This simplified linear membership function for the decision vectors $X_t\ (t = 1, 2, \ldots, T-1)$ avoids repeated rejection of the solution by the decision-makers and the resulting decision deadlock. Now, including $\mu_{X_t}(X_t)\ (t = 1, 2, \ldots, T-1)$ and applying the usual goal programming approach, the MLN-MODM problem (1) can be formulated as:

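$$
\begin{aligned}
\min\ \lambda &= \sum_{t=1}^{T} \left( d_{t1}^{-} + d_{t2}^{-} \right) + \sum_{t=1}^{T-1} d_{X_t}^{-} \\
\text{subject to}\quad & \mu_{t1} + d_{t1}^{-} - d_{t1}^{+} = 1, \quad \forall t = 1, 2, \ldots, T, \\
& \mu_{t2} + d_{t2}^{-} - d_{t2}^{+} = 1, \quad \forall t = 1, 2, \ldots, T, \\
& \mu_{X_t}(X_t) + d_{X_t}^{-} - d_{X_t}^{+} = 1, \quad \forall t = 1, 2, \ldots, T-1, \\
& A_{l1} X_1 + A_{l2} X_2 + \cdots + A_{lT} X_T \;(\le, =, \ge)\; b_l, \quad \forall l = 1, 2, \ldots, p, \\
& X_1 \ge 0,\ X_2 \ge 0,\ \ldots,\ X_T \ge 0.
\end{aligned}
\tag{15}
$$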

Now, if we adopt the simplest version of goal programming, the proposed simplified model of the problem becomes:

$$
\begin{aligned}
\min\ \lambda &= \sum_{t=1}^{T} \left( d_{t1}^{-} + d_{t2}^{-} \right) + \sum_{t=1}^{T-1} d_{X_t}^{-} \\
\text{subject to}\quad & \mu_{t1} + d_{t1}^{-} \ge 1, \quad \forall t = 1, 2, \ldots, T, \\
& \mu_{t2} + d_{t2}^{-} \ge 1, \quad \forall t = 1, 2, \ldots, T, \\
& \mu_{X_t}(X_t) + d_{X_t}^{-} \ge 1, \quad \forall t = 1, 2, \ldots, T-1, \\
& A_{l1} X_1 + A_{l2} X_2 + \cdots + A_{lT} X_T \;(\le, =, \ge)\; b_l, \quad \forall l = 1, 2, \ldots, p, \\
& X_1 \ge 0,\ X_2 \ge 0,\ \ldots,\ X_T \ge 0.
\end{aligned}
\tag{16}
$$

The proposed model can also be reduced to:

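$$
\min\ \lambda = \sum_{t=1}^{T} \left( d_{t1}^{-} + d_{t2}^{-} \right) + \sum_{t=1}^{T-1} d_{X_t}^{-} \tag{17}
$$

subject to

$$
\begin{aligned}
& d_p^{PIS_t *} - d_p^{PIS_t}(X) + \left( d_p^{PIS_t -} - d_p^{PIS_t *} \right) d_{t1}^{-} \ge 0, \quad \forall t = 1, 2, \ldots, T, \\
& d_p^{NIS_t}(X) - d_p^{NIS_t *} + \left( d_p^{NIS_t *} - d_p^{NIS_t -} \right) d_{t2}^{-} \ge 0, \quad \forall t = 1, 2, \ldots, T, \\
& X_t - \overline{X}_t + \left( \overline{X}_t - \underline{X}_t \right) d_{X_t}^{-} \ge 0, \quad \forall t = 1, 2, \ldots, T-1, \\
& A_{l1} X_1 + A_{l2} X_2 + \cdots + A_{lT} X_T \;(\le, =, \ge)\; b_l, \quad \forall l = 1, 2, \ldots, p, \\
& X_1 \ge 0,\ X_2 \ge 0,\ \ldots,\ X_T \ge 0.
\end{aligned}
\tag{18}
$$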
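The membership function (14) and the third constraint of (18) can be checked against each other with a short sketch. It works component-wise on one decision variable and uses the level-1 values of the numerical example in Sect. 5 ($\overline{x}_1 = 2$ at min $d_p^{PIS_1}$, $\underline{x}_1 = 0$ at max $d_p^{NIS_1}$). The function names are hypothetical. Multiplying $\mu_{X_t}(X_t) + d_{X_t}^{-} \ge 1$ by $\overline{X}_t - \underline{X}_t > 0$ gives the linear form in (18), so the two agree for $X_t \ge \underline{X}_t$.

```python
def mu_x(x, x_low, x_high):
    """Linear membership (14) for one component of X_t.

    x_low: value at max d^NIS (membership 0); x_high: value at min d^PIS (membership 1).
    """
    if x >= x_high:
        return 1.0
    if x <= x_low:
        return 0.0
    return (x - x_low) / (x_high - x_low)


def goal_met(x, d_minus, x_low, x_high):
    """Linearized goal of (18): x - x_high + (x_high - x_low) * d_minus >= 0."""
    return x - x_high + (x_high - x_low) * d_minus >= 0


# Level-1 data of the numerical example: x1 = 2 at min d^PIS, x1 = 0 at max d^NIS,
# so the goal reads x1 + 2 d_x1^- >= 2.
for x1 in (0.0, 1.0, 2.0):
    d = 1.0 - mu_x(x1, 0.0, 2.0)  # smallest d_x1^- with mu + d >= 1
    print(x1, mu_x(x1, 0.0, 2.0), d, goal_met(x1, d, 0.0, 2.0))
```

When the two bounds coincide, as for level 2 in Table 2, `mu_x` reduces to a step and no division by zero occurs, while the linear constraint simply forces $X_t \ge \overline{X}_t$.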

4 Simplified TOPSIS Algorithm for ML-MODM Problems

The proposed simplified TOPSIS model (17)–(18) provides an alternative methodology for obtaining a satisfactory solution of complex ML-MODM problems with T levels. Based on the above discussion, the stepwise algorithm of the simplified TOPSIS for solving ML-MODM problems is as follows:

Step 1. Calculate the minimum (NIS) and maximum (PIS) values of each objective function of the tth level (t = 1, 2, …, T; r = 1, 2, …, m) under the given constraints, irrespective of the hierarchical structure, together with the corresponding optimal solutions.
Step 2. Construct the PIS and NIS pay-off tables for the T levels using (3) and (4), respectively.
Step 3. Calculate the values of $d_p^{PIS_t *}$, $d_p^{PIS_t -}$, $d_p^{NIS_t *}$ and $d_p^{NIS_t -}$ for each level.
Step 4. For each level t = 1, 2, …, T − 1, select the decision variables $\overline{X}_t$ and $\underline{X}_t$ corresponding to the minimum value of $d_p^{PIS_t}$ and the maximum value of $d_p^{NIS_t}$, respectively.
Step 5. Select p ∈ {1, 2, …, ∞} according to the decision-makers' criteria.
Step 6. Elicit the membership functions $\mu_{t1} \equiv \mu_{d_p^{PIS_t}}(X)$ and $\mu_{t2} \equiv \mu_{d_p^{NIS_t}}(X)$, ∀t = 1, 2, …, T.
Step 7. Construct the membership functions $\mu_{X_t}(X_t)$, ∀t = 1, 2, …, T − 1, as described in Eq. (14), using the decision-variable values from Step 4.
Step 8. Formulate model (18).
Step 9. Solve this nonlinear model to obtain a satisfactory solution of the ML-MODM problem.


5 Illustrative Numerical Example

In this section, to show the algorithmic comparison and the efficiency of the proposed simplified TOPSIS approach relative to the interactive TOPSIS approach of Baky [13], we solve the same numerical example as in [12]:

$$
\begin{aligned}
&\text{Level 1:}\quad \max_{x_1}\ Z_{11} = x_1 + x_2 + x_3,\quad Z_{12} = x_1^2 + x_2^2 + x_3^2, \\
&\text{Level 2:}\quad \max_{x_2, x_3}\ Z_{21} = x_1^2 + x_2 + x_3,\quad Z_{22} = (x_1 - 1)^2 + x_2^2 + x_3^2, \\
&\text{Level 3:}\quad \max_{x_3}\ Z_{31} = x_1^2 + x_2^2 + x_3,\quad Z_{32} = (x_1 - 1)^2 + (x_2 + 1)^2 + (x_3 - 1)^2,
\end{aligned}
$$

subject to $x_1 + 2x_2 + x_3 \le 8$, $2x_1 + x_2 + x_3 \le 7$, $x_1 + x_2 + x_3 \le 5$ and $x_1, x_2, x_3 \ge 0$; the feasible set is denoted by $G$.

Tables 1 and 2 summarize, respectively, the individual optimal values obtained by maximizing and minimizing each objective function over $G$, and the PIS and NIS payoffs. Using these values, the simplified TOPSIS model (18) for the three-level MODM problem is formulated below.

Table 1 Maximum and minimum optimal values of each objective function

| | $Z_{11}$ | $Z_{12}$ | $Z_{21}$ | $Z_{22}$ | $Z_{31}$ | $Z_{32}$ |
|---|---|---|---|---|---|---|
| $\min_{X \in G} Z_{ij}$ | 0 | 0 | 0 | 0 | 0 | 1 |
| $\max_{X \in G} Z_{ij}$ | 5 | 25 | 12.25 | 26 | 16 | 27 |
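As an independent sanity check on Table 1 (not part of the proposed method), the sketch below evaluates every objective on a uniform grid over the feasible set $G$. Each objective is convex, so its maximum over the polytope $G$ is attained at a vertex; all vertices of $G$, namely (0, 0, 0), (3.5, 0, 0), (0, 4, 0), (0, 0, 5), (2, 3, 0), (0, 3, 2) and (2, 0, 3), lie on the 0.25-spaced grid, and for this problem the minimizers do too. The grid step and the helper names are assumptions of the sketch.

```python
import itertools


def objectives(x1, x2, x3):
    """The six objectives of the three-level example."""
    return {
        "Z11": x1 + x2 + x3,
        "Z12": x1**2 + x2**2 + x3**2,
        "Z21": x1**2 + x2 + x3,
        "Z22": (x1 - 1)**2 + x2**2 + x3**2,
        "Z31": x1**2 + x2**2 + x3,
        "Z32": (x1 - 1)**2 + (x2 + 1)**2 + (x3 - 1)**2,
    }


def feasible(x1, x2, x3):
    """Membership in G (non-negativity is guaranteed by the grid)."""
    return (x1 + 2 * x2 + x3 <= 8
            and 2 * x1 + x2 + x3 <= 7
            and x1 + x2 + x3 <= 5)


def table1(step=0.25):
    """Min and max of each objective over the feasible grid points."""
    grid = [i * step for i in range(int(5 / step) + 1)]  # 0, 0.25, ..., 5
    lo, hi = {}, {}
    for x in itertools.product(grid, repeat=3):
        if not feasible(*x):
            continue
        for name, value in objectives(*x).items():
            lo[name] = min(lo.get(name, value), value)
            hi[name] = max(hi.get(name, value), value)
    return lo, hi


lo, hi = table1()
for name in sorted(lo):
    print(name, lo[name], hi[name])
```

The printed minima and maxima match the two rows of Table 1.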

Table 2 PIS and NIS payoff for each objective function

| Level | max $d_p^{PIS_t}(X)$ at $(x_1, x_2, x_3)$ | min $d_p^{PIS_t}(X)$ at $(x_1, x_2, x_3)$ | max $d_p^{NIS_t}(X)$ at $(x_1, x_2, x_3)$ | min $d_p^{NIS_t}(X)$ at $(x_1, x_2, x_3)$ | $\overline{X}_t$ at min $d_p^{PIS_t}$ (t = 1, 2) | $\underline{X}_t$ at max $d_p^{NIS_t}$ (t = 1, 2) |
|---|---|---|---|---|---|---|
| t = 1 | 0.7071 at (0, 0, 0) | 0.3302 at (2, 3, 0) | 0.5099 at (0, 0, 5) | 0 at (0, 0, 0) | $x_1 = 2$ | $x_1 = 0$ |
| t = 2 | 0.7118 at (0.3358, 0, 0) | 0.2959 at (0, 0, 5) | 0.5400 at (0, 0, 5) | 0.0094 at (0.3771, 0, 0) | $x_2 = 0$, $x_3 = 5$ | $x_2 = 0$, $x_3 = 5$ |
| t = 3 | 0.3755 at (0.3755, 0, 0.1686) | 0 at (0, 4, 0) | 0.7071 at (0, 4, 0) | 0.0217 at (0.4126, 0, 0.2884) | – | – |

$$
\begin{aligned}
\min\ \lambda &= \sum_{t=1}^{3} \left( d_{t1}^{-} + d_{t2}^{-} \right) + d_{x_1}^{-} \\
\text{subject to}\quad
& -d_p^{PIS_1}(X) + 0.7118\, d_{11}^{-} \ge 0, \\
& -d_p^{PIS_2}(X) + 0.7118\, d_{21}^{-} \ge 0, \\
& -d_p^{PIS_3}(X) + 0.7118\, d_{31}^{-} \ge 0, \\
& d_p^{NIS_2}(X) + 0.7071\, d_{22}^{-} \ge 0.7071, \\
& d_p^{NIS_3}(X) + 0.7071\, d_{32}^{-} \ge 0.7071, \\
& x_1 + 2\, d_{x_1}^{-} \ge 2, \\
& X = (x_1, x_2, x_3) \in G = \{X \mid x_1 + 2x_2 + x_3 \le 8,\ 2x_1 + x_2 + x_3 \le 7,\ x_1 + x_2 + x_3 \le 5,\ x_1, x_2, x_3 \ge 0\}.
\end{aligned}
$$

Solving this model with LINGO (nonlinear add-on) gives the satisfactory solution λ = 2.1189, $d_{11}^{-} = 0.3371$, $d_{12}^{-} = 0.2029$, $d_{21}^{-} = 0.5267$, $d_{22}^{-} = 0.5129$, $d_{31}^{-} = 0.2765$, $d_{32}^{-} = 0.2625$, $d_{x_1}^{-} = 0$, $x_1 = 2$, $x_2 = 3$, $x_3 = 0$, with objective function values Z11 = 5, Z12 = 13, Z21 = 7, Z22 = 10, Z31 = 13, Z32 = 17. The membership values achieved are μ11(X) = 0.6628, μ21(X) = 0.4733, μ31(X) = 0.7235, μ12(X) = 0.7969, μ22(X) = 0.5129, μ32(X) = 0.2625 and μx1(X) = 1. The simplified TOPSIS model for the numerical example and its satisfactory solution, as implemented in LINGO (with the nonlinear add-on), are shown in Figs. 4 and 5, respectively.
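The reported solution can be checked directly against the problem data. The short sketch below, with illustrative helper names, tests whether X = (2, 3, 0) lies in G and evaluates the six objectives. It does not re-solve the nonlinear model, which needs the distance functions $d_p^{PIS_t}$ and $d_p^{NIS_t}$ defined earlier in the chapter. At this point all three inequality constraints of G are active, and the stated formula for $Z_{32}$ evaluates to $(2-1)^2 + (3+1)^2 + (0-1)^2 = 18$.

```python
def objectives(x1, x2, x3):
    """Values of Z11, ..., Z32 of the numerical example at X = (x1, x2, x3)."""
    return {
        "Z11": x1 + x2 + x3,
        "Z12": x1**2 + x2**2 + x3**2,
        "Z21": x1**2 + x2 + x3,
        "Z22": (x1 - 1)**2 + x2**2 + x3**2,
        "Z31": x1**2 + x2**2 + x3,
        "Z32": (x1 - 1)**2 + (x2 + 1)**2 + (x3 - 1)**2,
    }


def in_G(x1, x2, x3):
    """Membership of X in the feasible set G of the example."""
    return (min(x1, x2, x3) >= 0
            and x1 + 2 * x2 + x3 <= 8
            and 2 * x1 + x2 + x3 <= 7
            and x1 + x2 + x3 <= 5)


X = (2, 3, 0)
print(in_G(*X))               # True; all three inequality constraints are active here
print(objectives(*X))
d_x1 = 0                      # reported value of d_x1^-
print(X[0] + 2 * d_x1 >= 2)   # level-1 decision goal x1 + 2 d_x1^- >= 2 is met
```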

Fig. 4 Simplified TOPSIS model for numerical example in LINGO


Fig. 5 Satisfactory solution of numerical example by proposed TOPSIS model in LINGO

Note that the solution of this MLN-MODM problem obtained with TOPSIS Algorithm I of Baky [13] is X = (x1, x2, x3) = (0, 3.08, 1.85) (with tolerance values t1R = 1, t2R = 0.5, t1L = 0.5 and δ = 0.85), with objective values Z11 = 4.93, Z12 = 12.88, Z21 = 4.93, Z22 = 13.88, Z31 = 11.3, Z32 = 18.34. The satisfactory solution obtained with Algorithm II of Baky [13] is X = (0, 0.0026, 4.997) (with tolerance values t1R = 1, t2R = 0.5, t1L = 0.5 and δ = 0.91), with objective values Z11 = 4.996, Z12 = 24.97, Z21 = 4.9996, Z22 = 25.97, Z31 = 4.997, Z32 = 17.98. The comparison in Table 3 between the solution of the proposed simplified TOPSIS approach and those of TOPSIS Algorithms I and II of Baky [13] shows that the solutions are comparable. However, in terms of computational complexity, the proposed methodology requires less computational work and is simpler than the approach of Baky [13] because (i) it uses simple calculations for $d_p^{PIS_t -}$ and $d_p^{NIS_t -}$, (ii) it uses a simple FGP approach in place of the more complex max–min approach for the different membership functions, and (iii) it uses simplified membership functions for the distance functions and the decision vectors. Further, in Algorithms I and II of Baky [13], satisfactory solutions are computed repeatedly with different tolerance values, and the solution depends on the tolerance values chosen for the decision variables. The choice of these tolerance values has no theoretical basis and is made by trial and error, which leads to considerable repetitive computation. With the proposed simplified TOPSIS, by contrast, a satisfactory solution of such complex problems is obtained directly in a single step, and this solution is comparable with those given by Baky [13]. Overall, a considerable amount of repetitive computational work is eliminated in the proposed technique compared with the techniques of Baky [13].

Table 3 Comparison of satisfactory solutions and membership function values for numerical example 1

| Approach | Objective values | Membership values | Solution X = (x1, x2, x3) |
|---|---|---|---|
| Simplified TOPSIS approach | Z11 = 5, Z12 = 13, Z21 = 7, Z22 = 10, Z31 = 13, Z32 = 17 | μ11(X) = 0.6628, μ21(X) = 0.4733, μ31(X) = 0.7235, μ12(X) = 0.7969, μ22(X) = 0.5129, μ32(X) = 0.2625, μx1(X) = 1 | (2, 3, 0), without tolerance values and with less computational effort |
| TOPSIS by Baky [13], Algorithm I | Z11 = 4.93, Z12 = 12.88, Z21 = 4.93, Z22 = 13.88, Z31 = 11.3, Z32 = 18.34 | μZ11(Z11) = 0.958, μZ12(Z12) = 0.521, μZ21(Z21) = 0.4, μZ22(Z22) = 0.534, μZ31(Z31) = 0.71, μZ32(Z32) = 0.71 | (0, 3.08, 1.85), with tolerance values t1R = 1, t2R = 0.5, t1L = 0.5, δ = 0.85 |
| TOPSIS by Baky [13], Algorithm II | Z11 = 4.996, Z12 = 24.97, Z21 = 4.9996, Z22 = 25.97, Z31 = 4.997, Z32 = 17.98 | μZ11(Z11) = 0.9999, μZ12(Z12) = 0.999, μZ21(Z21) = 0.408, μZ22(Z22) = 0.999, μZ31(Z31) = 0.31, μZ32(Z32) = 0.69 | (0, 0.0026, 4.997), with tolerance values t1R = 1, t2R = 0.5, t1L = 0.5, δ = 0.91 |

6 Concluding Remarks

An effort has been made to simplify the earlier TOPSIS approach for solving MLN-MODMPs by modifying the interactive TOPSIS approach. The advantages of the proposed simplified TOPSIS approach are that (i) it reduces computational effort by using simplified membership functions for the distance functions and decision vectors, (ii) simple FGP is used to obtain a satisfactory solution of MLN-MODMPs in the last step of the algorithm, and (iii) it avoids repeated rejection of the solution by the ULDMs (upper-level decision-makers) and repeated computation of the problem with changing tolerance values on the decision variables. Hence, the proposed simplified approach is simpler and computationally less demanding than the interactive TOPSIS algorithms of Baky [13]. Future research directions include extending this technique to multi-level decentralized multiobjective programming problems (ML-DMOPPs), multi-level multiobjective quadratic fractional programming problems (ML-MOQFPPs), multi-level multiobjective integer programming (ML-MOIP) problems, bi-level nonlinear multiobjective programming (BL-MOP) problems, etc.


Conflict of Interest As an author of this manuscript, I declare that I have no financial/non-financial interest in the content of this manuscript.

References

1. Baky IA (2010) Solving multi-level multi-objective linear programming problems through fuzzy goal programming approach. Appl Math Model 34:2377–2387. https://doi.org/10.1016/j.apm.2009.11.004
2. Lachhwani K (2013) On solving multi-level multi objective linear programming problems through fuzzy goal programming approach. Opsearch 51:624–637. https://doi.org/10.1007/s12597-013-0157-y
3. Lachhwani K (2015) Modified FGP approach for multi-level multi objective linear fractional programming problems. Appl Math Comput 266:1038–1049. https://doi.org/10.1016/j.amc.2015.06.027
4. Abo-Sinna MA, Amer AH (2005) Extensions of TOPSIS for multi-objective large-scale nonlinear programming problems. Appl Math Comput 162:243–256. https://doi.org/10.1016/j.amc.2003.12.087
5. Baky IA (2009) Fuzzy goal programming algorithm for solving decentralized bi-level multiobjective programming problems. Fuzzy Sets Syst 160:2701–2713. https://doi.org/10.1016/j.fss.2009.02.022
6. Abo-Sinna MA, Baky IA (2007) Interactive balance space approach for solving multi-level multi-objective programming problems. Inf Sci 177:3397–3410. https://doi.org/10.1016/j.ins.2007.02.005
7. Abo-Sinna MA, Amer AH, Ibrahim AS (2008) Extensions of TOPSIS for large scale multiobjective non-linear programming problems with block angular structure. Appl Math Model 32:292–302. https://doi.org/10.1016/j.apm.2006.12.001
8. Baky IA, Abo-Sinna MA (2013) TOPSIS for bi-level MODM problems. Appl Math Model 37:1004–1015. https://doi.org/10.1016/j.apm.2012.03.002
9. Vicente LN, Calamai PH (1994) Bilevel and multilevel programming: a bibliography review. J Global Optim 5:291–306. https://doi.org/10.1007/bf01096458
10. Lachhwani K, Dwivedi A (2017) Bi-level and multi-level programming problems: taxonomy of literature review and research issues. Arch Comput Meth Eng 25:847–877
11. Lai YJ, Liu TJ, Hwang CL (1994) TOPSIS for MODM. Eur J Oper Res 76:486–500
12. Zavadskas ED, Mardani A, Turskis Z, Jusoh A, Nor KMD (2016) Development of TOPSIS method to solve complicated decision-making problems—an overview on developments from 2000 to 2015. Int J Inf Technol Decis Mak 15(3):645–682
13. Baky IA (2014) Interactive TOPSIS algorithms for solving multilevel non-linear multiobjective decision making problems. Appl Math Model 38:1417–1433
14. Lachhwani K (2021) Solving the general fully neutrosophic multi-level multiobjective linear programming problems. OPSEARCH. Online published: https://doi.org/10.1007/s12597-02100522-8

Chapter 36

A Comprehensive Review Analysis on PSO and GA Techniques for Mathematical Programming Problems

Kailash Lachhwani

K. Lachhwani, Department of Applied Science, National Institute of Technical Teacher’s Training and Research, Chandigarh 160019, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1_36

1 Introduction

Mathematical programming problems (MPPs) are specially structured optimization problems whose mathematical formulation generally has three parts: (i) one or more objective functions, (ii) constraints, and (iii) non-negativity restrictions. Types of MPP differ according to the form (linear, nonlinear, multiple, multi-level, etc.) of the objective function(s) and/or constraints, and specific MPPs are defined accordingly. The important mathematical programming problems are briefly described as follows:

(i) LPP/NLPP (linear/nonlinear programming problem): a programming problem in which the objective function and all constraints (parts (i) and (ii)) are linear / in which the objective function or at least one constraint is nonlinear, respectively.
(ii) MOLPP/MO-NLPP (multiobjective linear/nonlinear programming problem): a programming problem with several conflicting objective functions in which all objectives and constraints are linear / in which at least one objective or constraint is nonlinear, respectively.
(iii) BL-LPP/BL-NLPP (bi-level linear/nonlinear programming problem): a two-level hierarchical programming problem in which the objective function at each level and every constraint are linear / in which at least one objective function at some level or at least one constraint is nonlinear, respectively.
(iv) ML-LPP/ML-NLPP (multi-level linear/nonlinear programming problem): a hierarchical programming problem with more than two levels in which every objective function at each level and every constraint are linear / in which at least one objective function at some level or at least one constraint is nonlinear, respectively.
(v) ML-MOLPP/ML-MO-NLPP (multi-level multiobjective linear/nonlinear programming problem): a hierarchical programming problem with more than two levels and several objective functions at each level, in which every objective function and every constraint are linear / in which at least one objective function at some level or at least one constraint is nonlinear, respectively.

Besides these problems, there are extensions such as MO-LFPPs, BL-LFPPs, ML-LFPPs, BL-MOPPs, BL-MO-LFPPs, and ML-MO-LFPPs, which also appear in the MPP literature.

1.1 Related Studies and Contributed Works

The existing literature contains useful reviews in core areas of MPPs, such as the taxonomy and classification of BLPPs and MLPPs [1], reviews of MOFPPs [2], and the use of NNs for solving MPPs [3]. Furthermore, particle swarm optimization (PSO) and the genetic algorithm (GA) are important search techniques frequently used to find global or local optimal solutions of various constrained and unconstrained optimization problems. For MPPs in particular, however, there is a lack of comprehensive analysis covering PSO, GA, and their hybrid techniques. The main purpose of this study is to present a comprehensive review analysis of the use of PSO, GA, and their hybrid techniques for solving MPPs.

1.2 Target Research Groups

The present review is intended for scientists and researchers who wish to apply PSO, GA, and hybrid techniques to mathematical programming problems and to engineering problems formulated as MPPs. The remainder of the paper is organized as follows. Section 2 presents the background of PSO, GA, and their hybrid methodologies, including the basic concepts of PSO and GA. Section 3 reviews, in chronological order, the literature on these techniques for solving MPPs. Section 4 presents a comprehensive analysis of the reviewed literature. Section 5 discusses research implications and prospective research directions. Concluding remarks are presented in the last section.


2 Background—PSO and GA Methodologies

In 1995, Kennedy and Eberhart [4] introduced a population-based evolutionary computation technique known as particle swarm optimization (PSO), based on simulating the social behavior of species such as bird flocking and fish schooling. The algorithm starts with a random population of particles, each assigned a random velocity. The particles have memory: each keeps track of its own previous best position, $P_{best}$. Among the $P_{best}$ positions of all particles, the one with the greatest fitness value (value of the objective function) is taken as the global best $P_g$ of the swarm (group of particles). The velocity and position of the $i$th particle in its $d$th dimension are updated by Eqs. (1) and (2):

$$
v_{id}(t+1) = \underbrace{w\, v_{id}(t)}_{\text{previous velocity}} + \underbrace{c_1 r_1 \left( p_{id} - x_{id}(t) \right)}_{\text{cognitive component}} + \underbrace{c_2 r_2 \left( p_{gd} - x_{id}(t) \right)}_{\text{social component}} \tag{1}
$$

$$
x_{id}(t+1) = x_{id}(t) + v_{id}(t+1) \tag{2}
$$

where $w$ (inertia weight), $c_1$ (cognitive parameter) and $c_2$ (social parameter) are the main parameters of PSO, and $r_1$ and $r_2$ are random numbers in the range (0.0, 1.0). On the other hand, in 1975 Prof. John Holland [5] introduced another population-based search technique, the genetic algorithm (GA), based on the mechanism of natural genetics together with Darwin’s principle of natural selection. A GA starts with a population of randomly generated initial solutions with their individual fitness values. This population is iteratively modified by operators such as reproduction, crossover, and mutation; one pass of these operators over the population completes one generation. The run is terminated by criteria such as a desired level of accuracy or a maximum number of generations. GA has several advantages over traditional methods: (i) being an iterative random search method, it overcomes the main limitation of traditional optimization methods, namely their computational complexity, and (ii) it does not require gradient information of the objective functions. Further, slight modifications of the GA strings allow it to solve more complex optimization problems. On the other hand, GA is computationally expensive, converges slowly, and lacks a mathematical convergence proof.
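The update rules (1) and (2) can be sketched as a minimal global-best PSO. This sketch minimizes f (negate f to maximize, matching the "greatest fitness" convention above). The parameter values w = 0.7 and c1 = c2 = 1.5, the swarm size, the clamping of positions to the search box, drawing r1 and r2 per dimension, updating the global best as soon as a particle improves on it, and the name `pso` are all assumptions of the sketch, not values from the chapter.

```python
import random


def pso(f, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box using the velocity/position updates of Eqs. (1)-(2)."""
    rng = random.Random(seed)
    dim = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]                  # personal best positions p_i
    pbest_val = [f(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best p_g

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]                           # previous velocity
                           + c1 * r1 * (pbest[i][d] - x[i][d])   # cognitive component
                           + c2 * r2 * (gbest[d] - x[i][d]))     # social component
                lo, hi = bounds[d]
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)    # Eq. (2), clamped to the box
            val = f(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val


sphere = lambda p: sum(t * t for t in p)
best, value = pso(sphere, [(-5.0, 5.0)] * 3)
print(best, value)
```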


2.1 Comparison of PSO and GA

Comparing PSO and GA computationally, with reference to their mechanism charts (Figs. 1 and 2), both techniques start with a randomly generated population of solutions, and at every iteration the solutions are compared in terms of their fitness values. In PSO, the previous velocities of the particles, the particles' own experience (cognitive parameter $c_1$), and the social component (parameter $c_2$) drive the search, whereas in GA the operators of reproduction, crossover, and mutation are the key elements. In practice, GA is a powerful tool for global optimization, whereas PSO carries out global and local search simultaneously.

Fig. 1 Mechanism chart of PSO


Fig. 2 Mechanism of GA
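The GA cycle described in Sect. 2 (see also Fig. 2) can be sketched as a small binary-coded GA that maximizes a fitness function of one real variable. Binary tournament selection stands in for the reproduction operator, with single-point crossover and bit-flip mutation, and a fixed number of generations serves as the termination criterion. The encoding, the operator choices, all parameter values, and the name `ga` are assumptions of this sketch.

```python
import random


def ga(f, n_bits=16, lo=-5.0, hi=5.0, pop_size=40, generations=100,
       p_cross=0.9, p_mut=0.01, seed=1):
    """Maximize f(x) on [lo, hi] with a simple binary-coded GA."""
    rng = random.Random(seed)

    def decode(bits):
        # map a bit string to a real number in [lo, hi]
        n = int("".join(map(str, bits)), 2)
        return lo + (hi - lo) * n / (2 ** n_bits - 1)

    def fitness(bits):
        return f(decode(bits))

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # reproduction: binary tournament selection
        parents = [max(rng.sample(pop, 2), key=fitness) for _ in range(pop_size)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            if rng.random() < p_cross:            # single-point crossover
                cut = rng.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [a[:], b[:]]              # copies, so parents stay unchanged
        for child in children:                    # bit-flip mutation
            for k in range(n_bits):
                if rng.random() < p_mut:
                    child[k] ^= 1
        pop = children
        best = max(pop + [best], key=fitness)     # best solution seen so far
    return decode(best), fitness(best)


x, fx = ga(lambda x: -(x - 1.3) ** 2)
print(x, fx)
```

For a constrained MPP, infeasible strings would typically be handled by adding a penalty term to f, as in several of the GA papers reviewed in Sect. 3.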

2.2 Hybrid of PSO and GA

Hybrid techniques have been used to improve the quality of solutions of complex problems. A hybrid of PSO and GA can provide better solutions to complex problems than either technique alone. In a hybrid technique, one of the two methods is applied to part of the problem, and the remaining part is solved by the other method, starting from the solution set produced by the first. Typical combinations are the hybrid PSO–GA and the hybrid GA–PSO. The share of PSO and GA in a hybrid depends on the complexity of the problem and on the initial information and parameters available. Hybrid techniques aim to combine the advantages of both methods while reducing their individual disadvantages.
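The chapter does not fix a particular hybridization scheme. As one hedged illustration, the sketch below alternates PSO updates (Eqs. (1) and (2)) with a periodic GA step that replaces the worse half of the swarm by offspring of the better half, using arithmetic crossover of personal bests followed by Gaussian mutation. The scheme, its parameters, and the name `hybrid_pso_ga` are assumptions, not a method taken from the reviewed papers.

```python
import random


def hybrid_pso_ga(f, bounds, n=30, iters=150, w=0.7, c1=1.5, c2=1.5,
                  ga_every=10, p_mut=0.1, seed=2):
    """Minimize f over a box: PSO updates plus a periodic GA step."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clamp(value, d):
        lo, hi = bounds[d]
        return min(max(value, lo), hi)

    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    v = [[0.0] * dim for _ in range(n)]
    pb = [p[:] for p in x]       # personal best positions
    pbv = [f(p) for p in x]      # personal best values

    def evaluate(i):
        val = f(x[i])
        if val < pbv[i]:
            pb[i], pbv[i] = x[i][:], val

    for it in range(1, iters + 1):
        g = min(range(n), key=lambda i: pbv[i])       # global best index
        for i in range(n):                             # PSO phase, Eqs. (1)-(2)
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pb[i][d] - x[i][d])
                           + c2 * r2 * (pb[g][d] - x[i][d]))
                x[i][d] = clamp(x[i][d] + v[i][d], d)
            evaluate(i)
        if it % ga_every == 0:                         # GA phase
            order = sorted(range(n), key=lambda i: pbv[i])
            elite, worse = order[: n // 2], order[n // 2:]
            for i in worse:
                a, b = rng.sample(elite, 2)
                alpha = rng.random()                   # arithmetic crossover
                child = [alpha * pb[a][d] + (1 - alpha) * pb[b][d]
                         for d in range(dim)]
                for d in range(dim):                   # Gaussian mutation
                    if rng.random() < p_mut:
                        sigma = 0.1 * (bounds[d][1] - bounds[d][0])
                        child[d] = clamp(child[d] + rng.gauss(0.0, sigma), d)
                x[i], v[i] = child, [0.0] * dim        # restart the particle from the child
                evaluate(i)
    g = min(range(n), key=lambda i: pbv[i])
    return pb[g], pbv[g]


shifted = lambda p: sum((t - 1.0) ** 2 for t in p)
best, value = hybrid_pso_ga(shifted, [(-5.0, 5.0)] * 3)
print(best, value)
```

Here the GA step re-seeds poor particles near recombinations of good personal bests to add diversity, while PSO performs the finer search. A GA–PSO ordering would instead run GA generations first and seed the swarm with the final population.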


3 Review of Literature

PSO, GA, and their hybrid techniques are useful optimization methods that are widely used for complex optimization problems. For complex mathematical programming problems (MPPs), however, applying GA, PSO, and their hybrids becomes comparatively more complicated and computationally demanding. In the recent past, many researchers have used these techniques to handle complex MPPs and have shown that, to some extent, they are also useful for solving such hard problems. The detailed research review of these techniques for solving MPPs and the review analysis are given in the following sections. For this purpose, research contributions from reputed databases such as Thomson Reuters, Web of Science, Scopus, and IEEE were thoroughly reviewed and included in this article. The contributions of researchers using these techniques on MPPs are described in the following parts:

(i) PSO for Mathematical Programming Problems

The application of PSO to MPPs began at the start of the twenty-first century with Laskari et al. [6] and continues today. It has since become an active area of research, although comparatively less active than GA for MPPs. Laskari et al. [6] investigated the performance of the PSO method on integer programming problems. Thereafter, Tang et al. [7] presented a new PSO algorithm for the integer programming problem based on the set of integer space. Dong et al. [8] developed “a PSO with constraint fitness priority-based ranking method” to solve nonlinear programming problems; this PSO with varying inertia weights can find the global optimum of nonlinear programming problems. Kitayama and Yasuda [9, 10] suggested PSO for mixed integer programming problems. Li et al. [11] used a hierarchical PSO for solving BLPPs. Liu et al. [12] studied the performance of quantum-behaved PSO (QPSO) on integer programming and concluded that QPSO handles integer programming problems faster than the PSO algorithm. Pei et al. [13] proposed a hybrid of the PSO algorithm and the simplex algorithm for nonlinear bi-level programming problems. Zhao and Gu [14] proposed a modified PSO, an improved version of standard PSO, and used it to build a universally effective algorithm for solving the BLP model. In 2007, Liu et al. [15] further discussed the applicability of QPSO to integer programming problems and showed the superiority of QPSO over PSO on these problems. Sun et al. [16] used QPSO to solve nonlinear programming problems (NLPPs). Yiqing et al. [17] suggested “an improved PSO algorithm for solving nonconvex NLP/MINLP problem with equality and/or inequality constraints.” Zhao et al. [18] further used the same modified PSO for a bi-level programming model based on hierarchical iteration. Matsui et al. [19] proposed “an approximate solution method based on PSO for nonlinear integer programming problems.” Further, Matsui et al. [20] proposed “an approximate solution method based on PSO for nonlinear 0–1 programming problems.” Kuo and Huang [21] developed an efficient PSO-based method for BLPPs. Congying et al. [22] suggested a PSO algorithm to solve the quadratic assignment problem. Mohammad Nezhad and Mahlooji [23] presented a novel algorithm based on a revision of the PSO algorithm for solving constrained nonlinear programming problems. Mamaghani and Meybodi [24] also suggested a particle swarm optimization algorithm for the quadratic assignment problem. Zhang et al. [25] suggested an interactive “improved PSO algorithm for solving bi-level multiobjective programming problem (BLMPP).” Hezam and Raouf [26] used PSO to solve complex variable fractional programming problems (CVFPP). Jiang et al. [27] proposed “a novel approach based on PSO to solve nonlinear bi-level programming problem (NBLP).” ElHefnawy [28] suggested a corrective PSO algorithm for solving BLPPs, with adaptive inertia weights that control the domain of the particles in the region. Ma et al. [29] proposed a hierarchical hybrid PSO for the bi-level programming problem (BLPP), with modifications that eliminate shortcomings of basic PSO. Raouf and Hezam [30] proposed PSO techniques to solve fractional programming problems. Han et al. [31] developed “a novel bi-level PSO algorithm to solve bi-level programs (BLPPs) with nonlinear and large-scale constraints” and then extended it to solve tri-level programming problems.

(ii) GA for Mathematical Programming Problems

GA and related population-based random search techniques are frequently used to obtain optimal or near-optimal solutions of various complex mathematical programming problems. Starting in 1994, Mathieu et al.
[32] reported the use of a genetic algorithm-based technique, a genetic algorithm with only mutation operations (GABBA), to solve bi-level linear programming (BLLP) problems. Joines et al. [33] designed a GA-based integer programming model with an application to cellular manufacturing systems. Sakawa and Shibano [34] proposed interactive fuzzy programming methods and genetic algorithms with double strings for solving multiobjective 0–1 programming problems. Yokota et al. [35] used GA with a penalty function for solving nonlinear mixed integer programming (NMIP) problems. Yokota et al. [36] formulated the optimal design of system reliability as a nonlinear integer programming problem and solved it using GA. Gen et al. [37] applied genetic algorithms to fuzzy nonlinear goal programming problems. Sakawa et al. [38] suggested GAs with double strings for general multiobjective 0–1 programming problems involving positive and negative coefficients. Tang and Wang [39] used GA with a mutation operator along the weighted gradient direction to solve fuzzy quadratic programming problems (FQP) with a fuzzy objective and fuzzy resource constraints. Tang et al. [40] proposed a special hybrid genetic algorithm (HGA) with a penalty function for nonlinear programming problems. Yin [41] suggested an efficient genetic algorithm-based (GAB) approach to solve bi-level programming problems and showed that the GAB approach is more efficient and much simpler than previous heuristic algorithms. Deb [42] suggested “GAs for the goal programming problem as a multiobjective optimization problem of minimizing deviations from individual goals.” Fung et al. [43] extended the hybrid genetic algorithm (HGA) to nonlinear programming (NLP) problems with equality and inequality constraints. Hejazi et al. [44] developed an efficient GA-based approach for linear bi-level programming problems. Oduguwa and Roy [45] proposed a bi-level genetic algorithm (BiGA) to solve different classes of BLP problems. Sakawa and Kato [46] suggested “a solution method for nonlinear integer programming problems by extending GAs with double strings” and further extended it to multiobjective nonlinear integer programming problems. Guang-Min et al. [47] provided a genetic algorithm method for solving linear bi-level programming problems. Sakawa et al. [48] proposed a GA for nonlinear integer programming problems as an extension of earlier genetic algorithms with double strings and further extended it to linear integer programming problems. Gupta and Bhunia [49] discussed “a real-coded genetic algorithm (RCGA)” for an integer linear programming problem arising in a production–transportation problem. Jana and Biswal [50] suggested solution techniques for multiobjective chance-constrained programming problems. Wang et al. [51] proposed a genetic algorithm method for solving linear bi-level programming problems. Pal and Gupta [52] suggested “a genetic algorithm (GA) to the goal programming (GP) formulation of interval-valued multiobjective fractional programming problems (MOFPPs).” Wang et al.
[53] discussed “a GA based on the simplex method to solve the linear-quadratic bi-level programming problem (LQBP).” Deep et al. [54] suggested “a real-coded genetic algorithm (named MI-LXPM) with inclusion of a special truncation procedure to handle integer restrictions and a parameter-free penalty approach for constraints for solving integer and mixed integer programming problems” and showed that MI-LXPM performs better than other algorithms. Pal and Gupta [55] presented a GA-based solution method for multiobjective programming problems with fractional criteria. Pal and Gupta [56] presented a GA formulation for a chance-constrained multiobjective decision-making (MODM) problem. Jana and Sharma [57] presented a GA-based procedure for solving a multiobjective chance-constrained programming problem with discrete random variables. Osman et al. [58] designed a GA for rough bi-level programming problems (RBLPPs). Pal et al. [59] presented a GA-based procedure for modeling and solving multi-level programming (MLP) problems in a hierarchical organization. Tang et al. [60] suggested an improved genetic algorithm (IGA) to solve nonlinear programming problems. Pal and Gupta [61] extended a GA to modeling and solving bi-level programming problems with fractional objectives. Hosseini and Kamalabadi [62] suggested an efficient GA method to solve BLPPs, using the Karush–Kuhn–Tucker (KKT) conditions to transform the BLPP into a single-level problem. Li [63] presented a genetic algorithm with global convergence to solve a class of nonlinear bi-level programming problems. Li et al. [64] proposed a modified NSGA-II algorithm for multiobjective bi-level linear programming problems (MOBLPPs), in which the original problem is converted into a multiobjective single-level problem using adaptive weights.

(iii) Hybrid of PSO and GA for Mathematical Programming Problems

Alongside PSO and GA, their hybrid techniques have been used since the late 1990s as tools for obtaining better solutions of MPPs. In 1998, Eberhart and Shi [65] compared PSO and GA for MPPs and discussed each computational paradigm and its effect on search behavior in detail. Li and Wang [66] proposed a new hybrid genetic algorithm for solving a class of bi-level programming problems. Kuo and Han [67] developed a hybrid of GA and PSO to solve bi-level linear programming problems efficiently, with applications to a supply chain distribution problem. Sahoo et al. [68] developed an efficient hybrid GA–PSO approach for solving mixed integer nonlinear programming problems. Kuo et al. [69] presented a hybrid GA–PSO to solve the bi-level linear programming problem (BLPP).

(iv) Applications of BLPPs with PSO and GA

Recently, Cai et al. [70] used a hybrid of PSO and GA to solve a bi-level mixed integer programming problem for maritime search and rescue (SAR) systems. Khan et al. [71] proposed a modified PSO and a hybridized GA to solve the traveling salesman problem (TSP) in different environments, and demonstrated the efficiency of their approach by comparing their results with the approach of Kuo and Han [67].
Feng [72] proposed a BLPP model for the routing problem of hazardous waste materials in a fuzzy environment using PSO and GA techniques. Fathollahi-Fard et al. [73] suggested a bi-level model between nurses and patients as a home health care supply chain model and proposed a solution procedure based on PSO, GA, and other heuristic methods. Lotfi et al. [74] suggested a robust bi-level programming technique for locating power plants and other renewable energy sites. Luo et al. [75] suggested a bi-level model for the supply chain problem of an enterprise and applied a PSO-based algorithm to solve it.

4 Comprehensive Analysis on Review

In this section, the research contributions using PSO, GA, and hybrid techniques are classified in tabular form (Table 1) according to three main problem types: (i) linear/nonlinear programming problems, (ii) multiobjective programming problems, and (iii) bi-level programming problems. The number of contributions in each category is shown in the bar chart (Fig. 3).

K. Lachhwani

Table 1 Classification of research contributions on the use of PSO, GA, and their hybrid on MPPs

Linear, integer, and nonlinear programming problems
PSO techniques (total references: 18): Laskari et al. [6], Tang et al. [7], Dong et al. [8], Kitayama and Yasuda [9, 10], Liu et al. [12, 15], Sun et al. [16], Yiqing et al. [17], Matsui et al. [19, 20], Congying et al. [22], Mohammad Nezhad and Mahlooji [23], Mamaghani and Meybodi [24], Hezam and Raouf [26], Jiang et al. [27], Raouf and Hezam [30], Han et al. [31]
GA techniques (total references: 12): Joines et al. [33], Yokota et al. [35, 36], Gen et al. [37], Fung et al. [43], Gupta and Bhunia [49], Wang et al. [53], Deep et al. [54], Pal and Gupta [56], Jana and Sharma [57], Tang et al. [60], Sakawa et al. [48]
Hybrid techniques (total references: 02): Eberhart and Shi [65], Sahoo et al. [68]

Multiobjective programming problems
PSO techniques (total references: 01): Zhang et al. [25]
GA techniques (total references: 09): Sakawa and Shibano [34], Sakawa et al. [38], Tang and Wang [39], Tang et al. [40], Deb [42], Sakawa and Kato [46], Jana and Biswal [50], Pal and Gupta [52, 55]
Hybrid techniques (total references: NIL)

Bi-level programming problems
PSO techniques (total references: 08): Li et al. [11], Pei et al. [13], Zhao and Gu [14], Zhao et al. [18], Kuo and Huang [21], Zhang et al. [25], El-Hefnawy [28], Ma et al. [29]
GA techniques (total references: 11): Mathieu et al. [32], Yin [41], Hejazi et al. [44], Oduguwa and Roy [45], Guang-Min et al. [47], Wang et al. [51, 53], Osman et al. [58], Pal et al. [59], Hosseini and Kamalabadi [62], Li [63]
Hybrid techniques (total references: 03): Li and Wang [66], Kuo and Han [67], Kuo et al. [69]

Multi-level programming problems (ML-LPPs/ML-NLPPs/ML-MOLPPs/ML-MO-NLPPs)
PSO techniques (total references: NIL)
GA techniques (total references: 01): Pal et al. [59] (for MLPPs)
Hybrid techniques (total references: NIL)

Besides these, research contributions on applications of BLPPs with PSO and GA solution techniques are given in Table 2. In view of the number of research contributions (Tables 1 and 2 and Fig. 3), it is evident that more research work has been carried out with the GA technique than with PSO and hybrid techniques, and that very few contributions use hybrid techniques: summed over all problem classes in Table 1, there are 33 GA entries, 27 PSO entries, and only 5 hybrid entries. GA has been used more frequently than PSO for multiobjective, bi-level, and multi-level programming problems, whereas PSO has been used more often for linear, integer, and nonlinear programming problems (18 entries against 12).

Fig. 3 Number of research articles in solving important MPPs [bar chart "Use of PSO, GA and Hybrid techniques in solving important MPPs"; y-axis: number of research articles; bars for PSO, GA, and hybrid techniques in four categories: linear, integer, and nonlinear programming problems; multiobjective programming problems; bi-level programming problems; multi-level programming problems (MLPPs/ML-NLPPs/ML-MOLPPs/ML-MONLPPs)]

Table 2 Applications of BLPPs with PSO, GA, and their hybrid
Applications of BLPPs with PSO: Luo et al. [75]
Applications of BLPPs with GA: Lotfi et al. [74]
Applications of BLPPs with a hybrid of PSO and GA: Cai et al. [70], Khan et al. [71], Fathollahi-Fard et al. [73]

Hybrid techniques have not yet been applied to multiobjective programming problems in the surveyed literature (Table 1 lists no such contribution). Further, more complex programming problems such as MOPPs, MLPPs, MO-MLPPs, MO-LFPPs, and MO-NLPPs have scarcely been addressed with PSO, GA, and hybrid techniques, as few or no relevant research contributions are available in reputed databases such as Thomson Reuters, IEEE, and Scopus. This suggests that PSO, GA, and hybrid techniques can be used, extended, or modified to solve such problems, which offers future research scope in this area. In a nutshell, this study is presented as a comprehensive review analysis of two important nature-inspired techniques, PSO and GA, together with their hybrid techniques. A categorical list of around sixty-eight articles using these techniques for MPPs is also presented. Every effort has been made to include the important relevant manuscripts abstracted in reputed databases such as Web of Science, Scopus, and IEEE. In addition, manuscripts based purely on the application of PSO, GA, or hybrid techniques to MPPs have been considered in this work.
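The comparison drawn in this section reduces to row and column sums over Table 1. The minimal Python sketch below makes that arithmetic explicit; the dictionary `TABLE_1` and the two helper functions are illustrative names, and the counts are transcribed by hand from the "total references" entries of Table 1. Because each cell is counted separately, a paper listed in two cells (for example, Zhang et al. [25], under both multiobjective and bi-level PSO) contributes to both.

```python
# Hypothetical transcription of the per-cell counts in Table 1
# (rows: problem class, columns: solution technique; NIL entries are 0).
TABLE_1 = {
    "Linear, integer and nonlinear": {"PSO": 18, "GA": 12, "Hybrid": 2},
    "Multiobjective": {"PSO": 1, "GA": 9, "Hybrid": 0},
    "Bi-level": {"PSO": 8, "GA": 11, "Hybrid": 3},
    "Multi-level": {"PSO": 0, "GA": 1, "Hybrid": 0},
}


def totals_by_technique(table):
    """Column totals: sum each technique's counts over all problem classes."""
    totals = {}
    for counts in table.values():
        for technique, n in counts.items():
            totals[technique] = totals.get(technique, 0) + n
    return totals


def leader_by_category(table):
    """Row maxima: the technique with the most entries in each problem class."""
    return {category: max(counts, key=counts.get) for category, counts in table.items()}


if __name__ == "__main__":
    print(totals_by_technique(TABLE_1))  # {'PSO': 27, 'GA': 33, 'Hybrid': 5}
    print(leader_by_category(TABLE_1))
```

Running it gives column totals of 27 (PSO), 33 (GA), and 5 (hybrid), and shows that PSO has the most entries only in the linear, integer, and nonlinear row, while GA leads the other three rows.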

5 Research Benefits and Prospective Research Directions

The research benefits and prospective research directions of the proposed analysis are presented in the following subsections:

5.1 Research Benefits

This study classifies MPPs according to the solution technique used: PSO, GA, or hybrid techniques. Here, the 68 most relevant manuscripts on PSO, GA, and hybrid techniques have been discussed with respect to solving mathematical programming problems. This study should be beneficial to researchers working in this domain, who will find summarized information about the use of PSO, GA, and hybrid techniques for solving mathematical programming problems.

5.2 Prospective Research Directions

To the best of the author's knowledge, no comprehensive study or analysis has been carried out on the use of two important nature-inspired optimization techniques, namely PSO and GA, for solving complex MPPs. The present study aims to fill this research gap for prospective researchers in this area. PSO, GA, and hybrid techniques have helped to find accurate solutions of some important complex MPPs. However, no suitable work using a hybrid of PSO and GA has been found for more complex MPPs, e.g., MOPPs, MO-LFPPs, MO-NLPPs, ML-LPPs, and ML-MOPPs. Therefore, as a future research direction, hybrids of PSO and GA can be used to solve these complex problems.

6 Concluding Remarks

This study is presented as a comprehensive review analysis of two important nature-inspired algorithms, PSO and GA, and of hybrids of PSO and GA, in the context of their role in solving mathematical programming problems. This comprehensive analysis provides (i) an up-to-date literature survey, (ii) the present research gaps in the area of MPPs with PSO and GA techniques, and (iii) the future scope of these techniques in solving mathematical programming problems. This study finds that, among the three main problem types considered, multiobjective programming problems are the least explored research area with PSO, GA, and hybrid solution techniques, whereas linear/nonlinear programming problems are the most explored. Future research directions in the area of MOPPs, MLPPs, ML-MOPPs, etc., with PSO, GA, and hybrid techniques are identified. This study should be beneficial to researchers working in this domain, who will find summarized information about the use of PSO, GA, and hybrid techniques for solving mathematical programming problems.

Compliance with Ethical Standards

Conflict of Interest The author declares that there is no conflict of interest regarding the publication of this article.

References

1. Lachhwani K, Dwivedi A (2018) Bi-level and multi-level programming problems: taxonomy of literature review and research issues. Arch Comput Meth Eng 25:847–877
2. Bhati D, Singh P, Arya R (2016) A taxonomy and review of the multi-objective fractional programming (MOFP) problems. Int J Appl Comput Math 3:2695–2717
3. Lachhwani K (2020) Application of neural network models for mathematical programming problems: a state of art review. Arch Comput Meth Eng 27:171–182
4. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of the 1995 IEEE international conference on neural networks, Perth, Australia. IEEE Service Center, Piscataway, NJ, pp 1942–1948
5. Holland J (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
6. Laskari EC, Parsopoulos KE, Vrahatis MN (2002) Particle swarm optimization for integer programming. In: Proceedings of the IEEE congress on evolutionary computation CEC’02, pp 1582–1587
7. Tang Y, Gao H, Jian-Chao Z, Tan Y, Gao HM, Zeng JC (2004) Particle swarm optimization for integer programming. Syst Eng Theory Pract 24:126–129
8. Dong Y, Tang J, Xu B, Wang D (2005) An application of swarm optimization to nonlinear programming. Comput Math Appl 49:1655–1668
9. Kitayama S, Yasuda K (2005) A method for mixed integer programming problems by particle swarm optimization. IEEJ Trans Electron Inf Syst 125:813–820
10. Kitayama S, Yasuda K (2006) A method for mixed integer programming problems by particle swarm optimization. Electr Eng Jpn 157:40–49
11. Li X, Tian P, Min X (2006) A hierarchical particle swarm optimization for solving bilevel programming problems. In: International conference on artificial intelligence and soft computing—ICAISC 2006. Springer, Berlin, pp 1169–1178
12. Liu J, Sun J, Xu W (2006) Quantum-behaved particle swarm optimization for integer programming. In: King I, Wang J, Chan L, Wang D (eds) Neural information processing, ICONIP 2006. Lecture notes in computer science. Springer, Berlin, pp 1042–1050. https://doi.org/10.1007/11893257_114
13. Pei Z, Tian S, Huang H (2006) A novel method for solving nonlinear bilevel programming based on hybrid particle swarm optimization. https://ieeexplore.ieee.org/document/4129231. https://doi.org/10.1109/ICOSP.2006.345738


14. Zhao Z, Gu X (2006) Particle swarm optimization based algorithm for bilevel programming problems. In: Proceeding of sixth IEEE international conference on intelligent systems design and applications, pp 951–956
15. Liu J, Xu W-B, Sun J (2007) Quantum-behaved particle swarm optimization for integer programming. Appl Res Comput 24:79–81
16. Sun J, Liu J, Xu W (2007) Using quantum-behaved particle swarm optimization algorithm to solve non-linear programming problems. Int J Comput Math 84:261–272. https://doi.org/10.1080/00207160601170254
17. Yiqing L, Xigang Y, Yongjian L (2007) An improved PSO algorithm for solving non-convex NLP/MINLP problems with equality constraints. Comput Chem Eng 31:153–162
18. Zhao Z, Gu X, Li T (2007) Particle swarm optimization for bi-level programming problem. Syst Eng Theory Pract 8:92–98
19. Matsui T, Kato K, Sakawa M (2008) Particle swarm optimization for nonlinear integer programming problems. In: Proceeding of international multiconference of engineers and computer scientists, pp 1874–1877
20. Matsui T, Sakawa M, Kato K, Uno T (2008) Particle swarm optimization for nonlinear 0-1 programming problems. In: IEEE International conference on systems, man and cybernetics, pp 168–173. https://doi.org/10.1109/icsmc.2008.4811269
21. Kuo RJ, Huang CC (2009) Application of particle swarm optimization algorithm for solving bi-level linear programming problem. Comput Math Appl 58:678–685
22. Congying L, Huanping Z, Xinfeng Y (2011) Particle swarm optimization algorithm for quadratic assignment problem. In: Proceedings of IEEE international conference on computer science and network technology, pp 1728–1731
23. Mohammad Nezhad A, Mahlooji H (2011) A revised particle swarm optimization based discrete Lagrange multipliers method for nonlinear programming problems. Comput Oper Res 38:1164–1174
24. Mamaghani AS, Meybodi MR (2012) Solving the quadratic assignment problem with the modified hybrid PSO algorithm. In: 6th IEEE International conference on application of information and communication technologies (AICT), pp 1–6
25. Zhang T, Hu T, Zheng Y, Guo X (2012) An improved particle swarm optimization for solving bilevel multiobjective programming problem. J Appl Math 1–13
26. Hezam IM, Raouf OA (2013) Particle swarm optimization approach for solving complex variable fractional programming problems. Int J Eng Res Technol 2:2672–2677
27. Jiang Y, Li X, Huang C, Wu X (2013) Application of particle swarm optimization based on CHKS smoothing function for solving nonlinear bilevel programming problem. Appl Math Comput 219:4332–4339
28. El-Hefnawy N (2014) Solving bi-level problems using modified particle swarm optimization algorithm. Int J Artif Intell 12:88–101
29. Ma W, Wang M, Zhu X (2013) Improved particle swarm optimization based approach for bilevel programming problem-an application on supply chain model. Int J Mach Learn Cybern 5:281–292
30. Raouf OA, Hezam IM (2014) Solving fractional programming problems based on swarm intelligence. J Ind Eng Int 10:1–10
31. Han J, Zhang G, Hu Y, Lu J (2016) A solution to bi/tri-level programming problems using particle swarm optimization. Inf Sci 370–371:519–537
32. Mathieu R, Pittard L, Anandalingam G (1994) Genetic algorithm based approach to bi-level linear programming. RAIRO Oper Res 28:1–21
33. Joines JA, Culbreth CT, King RE (1996) Manufacturing cell design: an integer programming model employing genetic algorithms. IIE Trans 28:69–85
34. Sakawa M, Shibano T (1996) Interactive fuzzy programming for multiobjective 0–1 programming problems through genetic algorithms with double strings. In: Fuzzy logic foundations and industrial applications. International series in intelligent technologies, vol 8. Springer, Berlin, pp 111–128


35. Yokota T, Gen M, Li Y-X (1996) Genetic algorithm for non-linear mixed integer programming problems and its applications. Comput Ind Eng 30:905–917
36. Yokota T, Gen M, Li Y, Kim CE (1996) A genetic algorithm for interval nonlinear integer programming problem. Comput Ind Eng 31:913–917
37. Gen M, Ida K, Lee J, Kim J (1997) Fuzzy nonlinear goal programming using genetic algorithm. Comput Ind Eng 33:39–42
38. Sakawa M, Kato K, Sunada H, Shibano T (1997) Fuzzy programming for multiobjective 0–1 programming problems through revised genetic algorithms. Eur J Oper Res 97:149–158
39. Tang J, Wang D (1997) An interactive approach based on a genetic algorithm for a type of quadratic programming problems with fuzzy objective and resources. Comput Oper Res 24:413–422
40. Tang J, Wang D, Ip A, Fung RYK (1998) A hybrid genetic algorithm for a type of nonlinear programming problem. Comput Math Appl 36:11–21
41. Yin Y (2000) Genetic-algorithms-based approach for bilevel programming models. J Transp Eng 126:115–120
42. Deb K (2001) Nonlinear goal programming using multi-objective genetic algorithms. J Oper Res Soc 52:291–302
43. Fung RYK, Tang J, Wang D (2002) Extension of a hybrid genetic algorithm for nonlinear programming problems with equality and inequality constraints. Comput Oper Res 29:261–274
44. Hejazi SR, Memariani A, Jahanshahloo G, Sepehri MM (2002) Linear bilevel programming solution by genetic algorithm. Comput Oper Res 29:1913–1925. https://doi.org/10.1016/S0305-0548(01)00066-1
45. Oduguwa V, Roy R (2002) Bi-level optimisation using genetic algorithm. In: Proceedings of the IEEE international conference on artificial intelligence systems (ICAIS 2002), pp 322–327
46. Sakawa M, Kato K (2003) An interactive fuzzy satisfying method for multiobjective nonlinear integer programming problems through genetic algorithms. In: Fuzzy sets and systems, IFSA 2003. Lecture notes in computer science (Lecture notes in artificial intelligence). Springer, Berlin, pp 710–717
47. Guang-Min W, Zhong-Ping W, Xian-Jia W, Ya-Lin C (2005) Genetic algorithms for solving linear bilevel programming. In: Sixth international conference on parallel and distributed computing applications and technologies (PDCAT’05), pp 920–924
48. Sakawa M, Kato K, Kalam Azad MdA, Watanabe R (2005) A genetic algorithm with double string for nonlinear integer programming problems. In: IEEE International conference on systems, man and cybernetics, pp 3281–3286. https://doi.org/10.1109/icsmc.2005.1571652
49. Gupta R, Bhunia A (2006) An application of real-coded genetic algorithm (RCGA) for integer linear programming in production-transportation problems with flexible transportation cost. AMO-Adv Model Optim 8:73–98
50. Jana RK, Biswal MP (2006) Genetic based fuzzy goal programming for multiobjective chance constrained programming problems with continuous random variables. Int J Comput Math 83:171–179
51. Wang G, Wang X, Wan Z, Jia S (2007) An adaptive genetic algorithm for solving bilevel linear programming problem. Appl Math Mech 28:1605–1612
52. Pal BB, Gupta S (2008) A goal programming approach for solving interval valued multiobjective fractional programming problems using genetic algorithm. In: 2008 IEEE Region 10 and the third international conference on industrial and information systems, pp 1–6. https://doi.org/10.1109/iciinfs.2008.4798454
53. Wang G, Wan Z, Wang X, Lv Y (2008) Genetic algorithm based on simplex method for solving linear-quadratic bilevel programming problem. Comput Math Appl 56:2550–2555
54. Deep K, Singh KP, Kansal ML, Mohan C (2009) A real coded genetic algorithm for solving integer and mixed integer optimization problems. Appl Math Comput 212:505–518
55. Pal BB, Gupta S (2009) A genetic algorithm approach to fuzzy goal programming formulation of fractional multiobjective decision making problems. In: IEEE First international conference on advanced computing, pp 55–60. https://doi.org/10.1109/icadvc.2009.5378218


56. Pal BB, Gupta S (2009) A genetic algorithm approach for fuzzy goal programming formulation of chance constrained problems using stochastic simulation. In: IEEE International conference on industrial and information systems (ICIIS), pp 187–192. https://doi.org/10.1109/iciinfs.2009.5429868
57. Jana RK, Sharma DK (2010) Genetic algorithm-based fuzzy goal programming for class of chance-constrained programming problems. Int J Comput Math 87:733–742
58. Osman M, El-Wahed WA, El-Shafei M, El-Wahab HA (2011) A proposed approach for solving rough bi-level programming problems by genetic algorithm. Int J Contemp Math 87:1453–1465
59. Pal BB, Chakraborti D, Biswas P (2011) Using genetic algorithm for solving linear multilevel programming problems via fuzzy goal programming. In: Balasubramaniam P (ed) Control, computation and information system, ICLICC 2011. Communications in computer and information science. Springer, Berlin, pp 79–88
60. Tang K, Yang J, Chen H, Gao S (2011) Improved genetic algorithm for nonlinear programming problems. J Syst Eng Electron 22:540–546
61. Pal BB, Gupta S (2012) A genetic algorithm-based fuzzy goal programming approach for solving fractional bilevel programming problems. Int J Oper Res 14:453–471
62. Hosseini E, Kamalabadi I (2013) A genetic approach for solving bi-level programming problems. Adv Model Optim 15:719–733
63. Li H (2015) A genetic algorithm using a finite search space for solving nonlinear/linear fractional bilevel programming problems. Ann Oper Res 235:543–558
64. Li H, Zhang L, Li H (2019) Modified NSGA-II based interactive algorithm for linear multiobjective bilevel programs. In: 2019 15th International conference on computational intelligence and security (CIS). https://doi.org/10.1109/cis.2019.00095
65. Eberhart RC, Shi Y (1998) Comparison between genetic algorithms and particle swarm optimization. In: Porto VW, Saravanan N, Waagen D, Eiben AE (eds) Evolutionary programming VII, EP 1998. Lecture notes in computer science, vol 1447, pp 611–616. https://doi.org/10.1007/bfb0040812
66. Li H, Wang Y (2006) A hybrid genetic algorithm for solving a class of nonlinear bilevel programming problems. In: Wang TD et al (eds) Simulated evolution and learning, SEAL 2006. Lecture notes in computer science, vol 4247. Springer, Berlin
67. Kuo RJ, Han YS (2011) A hybrid of genetic algorithm and particle swarm optimization for solving bi-level linear programming problem—a case study on supply chain model. Appl Math Model 35:3905–3917
68. Sahoo L, Banerjee A, Bhunia AK, Chattopadhyay S (2014) An efficient GA–PSO approach for solving mixed-integer nonlinear programming problem in reliability optimization. Swarm Evol Comput 19:43–51
69. Kuo RJ, Lee YH, Zulvia FE, Tien FC (2015) Solving bi-level linear programming problem through hybrid of immune genetic algorithm and particle swarm optimization algorithm. Appl Math Comput 266:1013–1026
70. Cai L, Wu Y, Zhu S, Tan Z, Yi W (2020) Bi-level programming enabled design of an intelligent maritime search and rescue system. Adv Eng Inform 46, 101194. https://doi.org/10.1016/j.aei.2020.101194
71. Khan I, Pal S, Maiti MK (2019) A hybrid PSO-GA algorithm for traveling salesman problems in different environments. Int J Uncertain Fuzziness Knowl Based Syst 27:693–717. https://doi.org/10.1142/s0218488519500314
72. Feng J (2021) Application of a bilevel programming model in disposal site selection for hazardous waste. Environ Eng Sci 38. https://doi.org/10.1089/ees.2020.0375
73. Fathollahi-Fard AM, Hajiaghaei-Keshteli M, Tavakkoli-Moghaddam R, Smith NR (2021) Bilevel programming for home health care supply chain considering outsourcing. J Ind Inform Integr 100246. https://doi.org/10.1016/j.jii.2021.100246
74. Lotfi R, Mardani N, Weber G (2021) Robust bi-level programming for renewable energy location. Int J Energy Res 45:7521–7534. https://doi.org/10.1002/er.6332
75. Luo H, Liu L, Yang X (2019) Bi-level programming problem in the supply chain and its solution algorithm. Soft Comput 24:2703–2714. https://doi.org/10.1007/s00500-019-03930-7

Author Index

A
Abhishek Shrivastava, 217; Abimala, T., 267; Akanksha Gupta, 99; Akash Peddaputha, 347; Anisha Kumari, 99; Anjali, C., 303; Ankita Chandavale, 303; Anuj Nandanwar, 57, 71; Anupam Shukla, 159, 197; Apurva Joshi, 291; Arjit Jaitely, 41; Arpit Gupta, 99; Arun Kumar Wadhwani, 159; Arun Sai Narla, 243; Arvind Mewada, 99; Ashish Shetty, 169; Aveekal Kumar, 333

B
Balram Tamrakar, 257

C
Chandra Kant Dwivedi, 309

D
Dev Sourav Panda, 183; Dhaval Jadhav, 109

G
Gaurang Lakhani, 109

H
Hanuman Turaga, V. K., 431; Hathiram Nenavath, 243; Heera Lal Maurya, 27

I
Ishank Agarwal, 41; Ishu Garg, 197

J
Jayashree, R., 417; Jeniffer, S. D., 383

K
Kailash Lachhwani, 447, 461; Ketan Singh, 41; Krishna Singh, 257

L
Laxmidhar Behera, 1, 15, 27, 57, 71; Lilly Raamesh, 267; Lokesh Gagani, 109

M
Maganti Bhargav Hemanth, 347; Maheesha, M., 321; Manjit Kumar, 283; Manoj Kumar Ojha, 159; Mohamed Mansoor Roomi, S., 321; Mohit Vohra, 1; Mrudul Dhawley, 333; Mrunal Girhepunje, 405

N
Nagarathna, R., 321; Narmadha, T. V., 267; Nidhi Dandotiya, 283; Niketa Gandhi, 303; Nikhil P. Wyawahare, 405; Nishchal K. Verma, 27; Nithish Kumar, S., 231; Nitin Kanzariya, 109

P
Padmini Singh, 27; Pallavi Khatri, 283; Parimala, S., 417; Parvin Kumar, 257; Pawan Mishra, 359; Piyush Agrawal, 291; Pooja, 359; Prashant Khobragade, 405

R
Rahul Dixit, 183; Rajeshwar Patil, 169; Raj Nath Shah, 99; Ram Prakash, S., 231; Ranjith Ravindranathan Nair, 147; Rashmi Maheshwari, 85; Rathika, P. D., 231; Rishu Singh, 333; Ritik Raj, 99; Rupesh Kumar Dewang, 99

S
Sajna, S., 147; Samarth Bhawsar, 205; Samiksha, S., 321; Samkit Jain, 85; Sampada S. Wazalwar, 371; Sampada Wazalwar, 405; Sanchit Agrawal, 197; Sandeep Gupta, 15; Sanjeev Sharma, 169, 205, 333, 347; Sanskar Hasija, 347; Sarthak Dubey, 205; Sathyabama, B., 321; Senthamarai Kannan, K., 383; Shailesh Bendale, 291; Shalini Kapuganti, 243; Shashank Kumar, 41; Shashwat Kushwaha, 205; Shradha Suman Panda, 183; Shubham Shukla, 41, 359; Simran Jain, 405; Srilatha Chebrolu, 431; Subashini, G., 231; Subhash Yogi, 27; Sujendra Kumar Kachhap, 283; Sulochana Wadhwani, 159; Sunita Jahirbadkar, 303; Sweety, M., 321

T
Tanmay Jaiswal, 99; Triveni Ramteke, 405

U
Ujjawal Soni, 197; Urmila Shrawankar, 371; Uttam Chauchan, 109

V
Vibhav Kumar Sachan, 41; Vibhu Kumar Tripathi, 57, 71; Vijay Kumar Dalla, 217; Vinat Goyal, 333; Vinod Kumar Jain, 85; Vishwaratna Srivastava, 41

Y
Yatharth Kale, 169; Yogeshwar Patil, 169

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-19-2126-1