Data Management, Analytics and Innovation: Proceedings of ICDMAI 2023 (ISBN 9819914132, 9789819914135)

This book presents the latest findings in the areas of data management and smart computing, big data management, artificial intelligence, and data analytics.


English, 1067 pages, 2023


Table of contents:
Preface
Contents
Editors and Contributors
Machine Learning
Troomate—Finding a Perfect Roommate a Literature Survey
1 Introduction
2 Literature Survey
3 Existing Platforms
3.1 Olx [8]
3.2 Indianroommates [9]
3.3 Roomster [10]
3.4 Flatmatch [11]
4 Comparative Analysis
5 Need of Finding Roommate
6 Proposed Work
7 Conclusion
References
A Smart System to Classify Walking and Sitting Activities Based on EEG Signal
1 Introduction
2 Literature Review
3 Methodology
3.1 Experimental Setup
3.2 Signal Dataset Generation Method Description
3.3 Feature Extraction
3.4 Classification
4 Results
5 Conclusion
References
Forest Fire Detection and Classification Using Deep Learning Concepts
1 Introduction
2 Related Works
3 Proposed Approach
4 Evaluation Metrics
5 Results
6 Conclusion
References
Study of Cold-Start Product Recommendations and Its Solutions
1 Introduction
1.1 Non-personalized Recommender Systems
1.2 Personalized Recommender Systems
1.3 Types of Personalized Recommender Systems
2 Literature Survey
2.1 Inferences
2.2 Objectives
3 Proposed System
4 Results and Analysis
4.1 Inferences
4.2 User Ratings Prediction Based on Textual Reviews
4.3 User Ratings Prediction Based on Textual Reviews
4.4 Final Recommendations
5 Conclusion
References
Monitoring Urban Flooding Using SAR—A Mumbai Case Study
1 Introduction
2 Literature Review
3 Material and Methods
3.1 Study Area
3.2 Data
3.3 Algorithms
4 Results and Discussions
5 Conclusions
References
Do Women Shy Away from Cryptocurrency Investment? Cross-Country Evidence from Survey Data
1 Introduction
2 Data
3 Results
4 Discussion and Conclusions
References
Detection of Internet Cheating in Online Assessments Using Cluster Analysis
1 Introduction
2 Literature Review
3 Methodology
3.1 Data Collection
3.2 Considered Features
3.3 Clustering Method and Metrics
4 Results
4.1 Clustering with K-Means
4.2 Cluster Analysis
5 Discussion
6 Conclusion
References
Suspicious Event Detection of Cargo Vessels Based on AIS Data
1 Introduction
2 Statistics-Based Automatic Detection of Anomaly
3 Results
4 Conclusion
References
Identifying Trends Using Improved Affinity Propagation (IMAP) Clustering Algorithm on Evolving Data Stream
1 Introduction
2 Literature Survey
3 Gap Analysis
4 Affinity Propagation
4.1 Affinity Propagation Method
4.2 Weighted Affinity Propagation Method
4.3 STRAP Method
4.4 Proposed Improved Affinity Propagation Algorithm (IMAP)
5 Results and Discussion
5.1 Significance of Result
6 Conclusion
References
Comparative Analysis of Recommendation System Using Similarity Techniques
1 Introduction
2 Related Work
3 Material and Methods
3.1 Dataset
4 Proposed Methods
5 Implementation
6 Comparative Analysis
7 Conclusion and Future Work
References
Graphology-Based Behavior Prediction: Case Study Analysis
1 Introduction
2 Literature Survey
2.1 Skew
2.2 Slant
2.3 Pressure
2.4 Vowelinfoa/Vowelinfoi
2.5 Correlation
2.6 Length
2.7 Union of Letters
3 Case Study
3.1 Criminology
3.2 Depression
4 Proposed Model
5 Discussion
6 Conclusion
References
Statistics-Driven Suspicious Event Detection of Fishing Vessels Based on AIS Data
1 Introduction
2 Statistics-Driven Approach for Suspicious Event Detection
3 Description of AIS Data Used for the Analysis
4 Results
5 Conclusion
References
Landslide Susceptibility Mapping Using J48 Decision Tree and Its Ensemble Methods for Rishikesh to Gangotri Axis
1 Introduction
2 Study Area and Dataset
3 Methodology
3.1 J48 Decision Trees (DT)
3.2 Random Forest (RF)
3.3 Rotation Forest
3.4 Extra Tree/Extremely Randomized Trees
3.5 AdaBoost
3.6 XGBoost
3.7 Testing/Validation
4 Results and Discussion
5 Conclusions
References
Distributed Reduced Alphabet Representation for Predicting Proinflammatory Peptides
1 Introduction
2 Materials and Methods
2.1 Dataset
2.2 Protein Sequence Representation
3 Results
3.1 Model Performance Based on Embeddings Vectors
3.2 Model Performance Based on a Combination of Motifs and Embedding Vectors
4 Discussion
5 Conclusions
References
Predicting Injury Severity in Construction Using Logistic Regression
1 Introduction
2 Variables and Data
3 Preliminary Analysis
4 Logistic Regression Model
5 Block Diagram
6 Analysis and Results
7 Conclusions
References
Prakruti Nishchitikaran of Human Body Using Supervised Machine Learning Approach
1 Introduction
1.1 Tridosha and Panchmahabhuta
1.2 Prakriti Nishchitikaran
1.3 Machine Learning
2 Previous Work
2.1 Prakriti Nishchitikaran in Ayurveda
2.2 Predictive Learning
3 Methodology
3.1 Survey Methodology
3.2 Design and Creation
4 Research Findings
5 Conclusion
References
Time Series AutoML; Hierarchical Factor Based Forecasting
1 Introduction
2 Background
3 Methodology
3.1 Dataset
3.2 Steps Involved in the Process
4 Results and Discussion
5 Future Work
References
Economic Growth Prediction and Performance Analysis of Developed and Developing Countries Using ARIMA, PCA, and k-Means Clustering
1 Introduction
2 Literature Review
3 Materials and Methods
4 Experiments and Results
4.1 Principal Component Analysis
4.2 k-Means Clustering
4.3 ARIMA
5 Conclusion
References
Optimized Feature Representation for Odia Document Clustering
1 Introduction
2 Related Work
3 Methodology
3.1 Feature Extraction
3.2 Feature Normalization
3.3 Feature Optimization Using PCA
3.4 Model Training and Evaluation
4 Experimental Results
5 Conclusion and Future Work
References
Paris Olympic (2024) Medal Tally Prediction
1 Introduction
2 Literature Review
3 Proposed System
4 Data Compilation
5 Dataset Description
6 Data Pre-processing
6.1 Data Cleansing
6.2 Data Integration
6.3 Data Transformation
6.4 Data reduction
7 Exploratory Data Analysis [7]
7.1 Scatter Plots
7.2 Correlation Matrix Using Heat Map
8 Feature Selection [8]
8.1 Sequential Backward Elimination
8.2 Sequential Forward Selection
8.3 Extra Trees Classifier [9]
9 Data Standardization
10 Model Building
10.1 Regression [1]
10.2 Linear Regression
11 Model Performance
11.1 R2 Method
11.2 Adjusted R2 Method [11]
11.3 Mean Squared Error [12]
11.4 Akaike’s Information Criterion [13]
12 Model Equation
13 Out-of-Sample Prediction
14 Comparative Analysis
15 Conclusion
References
Crop Recommendation Using Hybrid Ensembles Model by Extracting Statistical Measures
1 Introduction
2 Related Work
3 Dataset Creation
3.1 Dataset Collection
3.2 Preprocessing of the Dataset
4 Proposed Work
4.1 Naïve Bayes Classification
4.2 Decision Trees Classification
4.3 K-Nearest Neighbor Classification (K-NN)
4.4 Hybrid Model using Majority Voting Ensemble Technique
5 Empirical Result and Discussion
6 Conclusion
References
Aspect-Based Product Recommendation System by Sentiment Analysis of User Reviews
1 Introduction
2 Background
2.1 Motivation
2.2 Related Work
3 Proposed Work
3.1 Dataset and Preprocessing
3.2 POS Tagging
3.3 Extracting Aspects
3.4 Sentiment Analysis
3.5 Classification and Accuracy
3.6 Product Recommendation
4 Results and Future Scope
5 Conclusion
References
Deep Learning and AI
X-ABI: Toward Parameter-Efficient Multilingual Adapter-Based Inference for Cross-Lingual Transfer
1 Introduction
1.1 Natural Language Inference (NLI)
1.2 Inference Logic
1.3 Cross-Lingual-Transfer
1.4 NLI Techniques
1.5 Adapters
2 Related Works
2.1 Cross-lingual Transfer
2.2 Natural Language Inference
2.3 Adapters
3 Proposed Approach
3.1 Data Preparation
3.2 Adapter Infused Model
3.3 Judgment Prediction
4 Evaluation
4.1 Dataset
4.2 Baseline Methods
4.3 Experiment Setup
4.4 Evaluation Metrics
4.5 Results
4.6 Discussions
5 Conclusion
References
Comparative Study of Depth Estimation for 2D Scene Using Deep Learning Model
1 Introduction
2 Methodology
2.1 Dataset
2.2 Architecture: Dense Depth Model
2.3 Architecture: U-net Model
2.4 Loss Function
2.5 Evaluation Matrix
3 Result and Discussion
4 Conclusion
References
Product Recommendation System Using Deep Learning Techniques: CNN and NLP
1 Introduction
2 Related Work
3 Implementation Methodology
3.1 Data Collection and Pre-processing
3.2 Training the Model
3.3 Object Detection
3.4 Collecting Product Details
3.5 Product Recommendation
4 Proposed Models
4.1 Convolution Neural Networks (CNNs)
4.2 Natural Language Processing (NLP)
5 Evaluation and Comparison Techniques
6 Conclusion and Future Work
References
Modified Long Short-Term Memory Algorithm for Faulty Node Detection Using node’s Raw Data Pattern
1 Introduction
2 Literature Review
3 Methodology
3.1 Hardware Details
3.2 Data Collection
4 Modified LSTM Algorithm
5 Result and Discussion
6 Conclusion
References
DCNN-Based Transfer Learning Approaches for Gender Recognition
1 Introduction
2 Datasets
3 Transfer Learning
4 Evaluation
5 Experimentation and Results
6 Conclusion
References
Fetal Brain Component Segmentation Using 2-Way Ensemble U-Net
1 Introduction
2 Related Works
3 Materials and Methods
3.1 Data Availability
3.2 Data Preprocessing
3.3 Network Architecture
4 Evaluation Metrics
5 Results
6 Discussion
7 Conclusion
8 Code Availability
References
Morbidity Detection from Clinical Text Data Using Artificial Intelligence Technique
1 Introduction
2 Related Works
3 Implementation Methodology
4 Performance Metrics and Evaluation
5 Conclusion and Future Work
References
Investigating the Application of Multi-lingual Transformer in Graph-Based Extractive Text Summarization for Hindi Text
1 Introduction
2 Related Work
2.1 Graph-Based Approaches
2.2 Transformers and Embeddings
3 Proposed Study
4 Evaluation and Results
4.1 Evaluation Parameters
4.2 Results
5 Threats to Validity
6 Conclusion
References
Analysis of Machine Learning Algorithms for COVID Detection Using Deep Learning
1 Introduction
2 Dataset Description
3 Related Work
4 Methodologies
4.1 Data Preprocessing
4.2 Data Augmentation
5 Model Building
5.1 VGG19
5.2 ResNet
5.3 Xception
5.4 Proposed Custom Model
6 Proposed Neural Network Architecture
7 Results and Discussion
8 Conclusion
References
Real-Time Learning Toward Asset Allocation
1 Introduction
1.1 Related Works
2 Methodology of Using Reinforcement Learning
2.1 The Standard Agent–Environment Interface
2.2 Reinforcement Learning Formulation for Portfolio Optimization
3 Evaluation
3.1 Experimental Setup
4 Experimental Results
5 Discussion
5.1 Future Work
References
CoffeeGAN: An Effective Data Augmentation Model for Coffee Plant Diseases
1 Introduction
2 Related Work
3 Methodology
3.1 Dataset and Data Pre-processing
3.2 Generative Adversarial Network
3.3 Convolution Neural Network
3.4 CycleGAN
3.5 CoffeeGAN
3.6 Performance Metrics
4 Empirical Results and Discussion
5 Conclusion
References
CycleGAN Implementation on Cross-Modality Transfer Between Magnetic Resonance Image (MRI) and Computed Tomography (CT) Images
1 Introduction
1.1 Developed Neural Network Architecture
2 Model’s Evaluation
3 Results and Discussion
4 Conclusion
References
A Dual Fine Grained Rotated Neural Network for Aerial Solar Panel Health Monitoring and Classification
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset Preparations and Synthetic Data Generation
3.2 Proposed Methodology
4 Implementation of Dirty Solar Panel Detection Using Dual Fine Grained Rotated Neural Network from Tiny Aerial Image
5 Result and Discussion
6 Conclusion
References
Explainable Automated Coding of Medical Narratives Using Identification of Hierarchical Labels
1 Introduction
2 Literature Review
3 Problem Statement
4 Proposed End-to-End Pipeline
5 Results and Conclusion
References
Automatic Text Summarization Using Word Embeddings
1 Introduction
2 Related Works
2.1 Summarization Utilizing Learning Methods
2.2 Summarization for Multi-documents Using Multilayer Networks
2.3 Summarization Using Neural Network Model
2.4 Summarization Using Machine Learning Techniques
2.5 Summarization Using Graph-Based Network
2.6 Hybrid Summarization
3 Proposed Methodology
3.1 PHASE-1: Extractive Summary
3.2 PHASE-2: Abstractive Summary
4 Experimental Setup and Results
4.1 Evaluation Datasets
4.2 Baselines
4.3 Evaluation Metrics
5 Result Analysis
6 Conclusion and Future Works
References
Video Object Detection with an Improved Classification Approach
1 Introduction
2 Research Methodology
3 Parameters for Classification Improvement
3.1 Related Work
3.2 The Following Hyper-tuning Parameters Are Considered to Check the Performance Improvement in Classification Methods
4 Experimentation
5 Results
5.1 Result on Live Video
6 Conclusion
References
An Unsupervised Image Processing Approach for Weld Quality Inspection
1 Introduction
2 Literature Survey
3 Methodology/Proposed Approach
3.1 Input Files
3.2 Solution Architecture
4 Results and Conclusion
References
YOLO Algorithms for Real-Time Fire Detection
1 Introduction
2 Method for an Automated Annotation Generation for Fire Images
3 YOLO Architectures for Fire Detection
4 Experimental Results
5 Conclusion and Future Scope
References
Improving Autoencoder-Based Recommendation Systems
1 Introduction
2 Survey of Literature
3 Proposed Technique
4 Experiment and Experiment Results
4.1 Dataset
4.2 Training Autoencoder-Based Recommendation System
4.3 Sentiment-Based Recommendation System
4.4 Combination of Autoencoder and Sentiment-Based Recommendation System
5 Conclusion
References
ARTSAM: Augmented Reality App for Tool Selection in Aircraft Maintenance
1 Introduction
2 Tool Selection in Aircraft Maintenance
3 ARTSAM: Design and Implementation
3.1 Tool Detection and Annotation
3.2 Task Guidance
3.3 Implementation
4 Results and Discussion
5 Summary and Conclusion
References
Named Entity Recognition over Dialog Dataset Using Pre-trained Transformers
1 Introduction
2 Related Work
3 Methodology
4 Experimental Setup and Dataset Details
4.1 Experimental Setup
4.2 Dataset Details
5 Result
6 Conclusion
References
Depression Detection Using Hybrid Transformer Networks
1 Introduction
2 Related Work
3 Data
4 Methodology
4.1 Preprocessing
4.2 Embeddings
4.3 Pooling Strategy
4.4 Feature Extraction
4.5 Classification
5 Results
6 Conclusion
References
A Comparative Study of Distance-Based Clustering Algorithms in Fuzzy Failure Modes and Effects Analysis
1 Introduction
2 Literature Review
3 Preliminaries
3.1 Fuzzy Set Theory
3.2 Fuzzy-Full Consistency Method (F-FUCOM)
3.3 Quantification of Failure Modes
3.4 Clustering
3.5 Measures for Comparison Between Clustering Algorithms
4 Case Study
4.1 Description of the System, Identification of Failure Modes, and Collection of Data
4.2 Calculation of Weights of the Risk Criteria
4.3 Calculation of Weights of the Risk Criteria
5 Discussion
6 Conclusion
References
Data Storage, Management and Innovation
Moderating Role of Project Flexibility on Senior Management Commitment and Project Risks in Achieving Project Success in Financial Services
1 Introduction
2 Review of Literature
2.1 Background
2.2 Senior Management Commitment
2.3 Project Risk Management
2.4 Risk Mitigation Strategies
2.5 Success in Projects
3 Conceptual Framework and Hypotheses Development
3.1 Senior Management Commitment and Project Success
3.2 Project Risk Management and Project Success
3.3 Project Flexibility and Project Success
3.4 Moderation Effects of Project Flexibility
3.5 Conceptual Framework
4 Research Design
4.1 Sampling and Data Collection
4.2 Characteristics of the Sample
4.3 Measures
5 Research Findings
5.1 Descriptive Statistics
5.2 Moderation Analysis with the Ordinal Regression Analysis
6 Discussion
7 Conclusion and Managerial Implications
8 Limitations and Recommendations for Future
9 Declaration of Competing Interest
References
Growth Profile of Using AI Techniques in Antenna Research Over Three Decades
1 Introduction
2 Methodology Followed in the Present Study
3 Results
3.1 Growth Rate of AI Research on Antennas
3.2 Research Focus Ratio for AI Research on Antennas
3.3 Most Impactful Authors
3.4 Research Publication Titles
3.5 h-index Study
3.6 Keyword Analysis
4 Discussion
4.1 Growth Rate of Research
4.2 Research Synchronization
4.3 Impactful Authors
4.4 Publishing Titles
4.5 Citation Study
4.6 Research Landscape
5 Conclusions
References
Economical Solution to Automatic Evaluation of an OMR Sheet Using Image Processing
1 Introduction
2 Literature Review
3 Proposed System
4 Methodology
4.1 Convert RGB Image to Gray Scale
4.2 Image Smoothing
4.3 Detecting Edges
4.4 Finding Rectangular Contours
4.5 Warp Perspective of the Bubbled Area
4.6 Thresholding
4.7 Finding the Marked Responses
4.8 Evaluating the Responses
4.9 Displaying Results
4.10 Final Output
5 Experimentation and Results
6 Future Scope
7 Conclusion
References
Defense and Evaluation Against Covert Channel-Based Attacks in Android Smartphones
1 Introduction
2 Types of Covert Channels
2.1 Detectability or Side-Effects
2.2 Tainted or Untainted
3 Covert Channels Applications in Technological Advancements
3.1 Covert Channels in IoT
3.2 Covert Channels in IPv6
3.3 Covert Channels in VoLTE
4 Security in Covert Channels
5 Covert Channels Detection
5.1 Domain Isolation
5.2 Privilege Escalation
5.3 Driver Permission Detection
5.4 Artificial Intelligence Detection
6 Conclusion
References
Study of the Need for Effective Cyber Security Trainings in India
1 Introduction
2 Methodology
3 Why Basic Cyber Training Is Required
3.1 Organizational-level Training and Awareness Initiatives
3.2 Review of Challenges Associated With Modern Training in Social Engineering and Awareness Programs
3.3 Traditional Training Programs Against Social Engineering Include Very Simple and Cohesive Steps Taken by the Organisation to Keep the Staff Informed About the Attacks and Threats
4 Cyber-crimes Statistics
4.1 WhatsApp Case Study
4.2 Initiatives Taken by the Indian Government on Cyber Security
5 Some Rare Cases of Cyber Attack in India
5.1 Case Study on Pegasus Spyware
5.2 Case Study on ShowPad
6 Safety and Security
6.1 Security Measures
7 Cyber Kill Chain
7.1 Phases of Cyber Kill Chain
7.2 Role of Cyber Kill Chain
8 Privacy versus Security
9 Threats and Solution
10 Malware Threat and Solutions
11 Cyber Security and Ethics
11.1 Menlo Report
12 Indian Cyber Laws
12.1 IT Act 2000 [13]
12.2 Sections of IT Law of 2000
13 International Forums
14 Conclusion
References
From Bricks to Clicks: The Potential of Big Data Analytics for Revolutionizing the Information Landscape in Higher Education Sector
1 Introduction
2 In-Depth Analysis of Data from the Field of Higher Education
3 The Ensuing Infrastructures
4 Consistency and Uniformity in Data
5 Data Harmonization in Higher Education and Quality Standards Applied Worldwide
6 Data Visualization and Analytics Dashboard Management
7 Data Presentation
8 Digital Dashboards for Monitoring Political Indicators
9 Discussion
10 Conclusion
References
Enabling Technologies and Applications
Learning on the Move: A Pedagogical Framework for State-of-the-Art Mobile Learning
1 Mobile Learning Platform
2 Learning and Socialization
3 AI and Recommender Systems
4 Augmented Reality
5 Game-Based Learning
6 Incorporating Technology into Classroom Learning
7 Ubiquitous Learning
8 Visualization of the Connections
9 Concluding Remarks
References
Quantum Web-Based Health Analytics System
1 Introduction
2 Literature Review
3 Proposed Model
3.1 Identification of Required Datasets
3.2 Processing of Required Datasets
4 Stimulation Results
4.1 Processing Time
4.2 Memory Consumption
5 Conclusion
6 Future Work
References
Comparative Study of Noises Over Quantum Key Distribution Protocol
1 Introduction
2 Theory of Quantum Communication
2.1 Quantum Cryptography—QKD
2.2 Noise Models
2.3 Motivation
3 Methodology
3.1 Proposed Approach
3.2 Simulation in Noisy Environment
4 Result
4.1 Depolarizing Channel
4.2 State Preparation and Measurement (SPAM) Channel
4.3 Thermal Decoherence and Dephasing Channel
5 Discussion
5.1 Execution with Four Qubits
5.2 Execution with 2016 Qubits
6 Conclusion
References
Comparative Assessment of Methane Leak Detection Using Hyperspectral Data
1 Introduction
2 Literature Review
3 Methodology
3.1 Synthetic Data
3.2 AVIRIS Data for Methane Leak in Canyon
3.3 Matched Filter
3.4 RX Algorithm
3.5 Adaptive Cosine/Coherence Estimator (ACE)
4 Experiments and Results
4.1 Experiment 1: Contamination of Indian Pines Dataset Using Signatures from Pavia University and Pavia Center Datasets
4.2 Experiment 2: Cuprite as a Base Image Contaminated with Aster
4.3 Experiment 3: Scaled Cuprite as a Base Image Contaminated with Aster (Converted to Reflectance)
4.4 Experiment 4: AVIRIS Data for Methane Leak in California
5 Conclusion
References
Application of Machine Learning in Customer Services and E-commerce
1 Introduction
2 Machine Learning Techniques
3 Methodology
4 Simulation Setup
5 Result and Discussion
6 Conclusion and Future Work
References
The Effect of Comorbidity on the Survival Rates of COVID-19 Using Quantum Machine Learning
1 Introduction
2 Materials and Method
2.1 Collection of Data
2.2 Analysis of Data
3 Results and Discussion
3.1 Effect of Comorbid Conditions: Predictive Analyzes of the COVID-19 Survival Rates
3.2 Data Analysis Using Classical Machine Learning
3.3 Data Analysis Using Quantum Machine Learning
4 Conclusion
References
Multi-GPU-Enabled Quantum Circuit Simulations on HPC-AI System
1 Introduction
2 Quantum Computing
3 PARAM Siddhi AI System
4 Results and Discussion
5 Scalability on PARAM Siddhi AI System GPUs
6 Comparative/Relative Performance
7 Conclusions
References
Study of NFT Marketplace in the Metaverse
1 Introduction
2 Prelude to Metaverse
2.1 Seven Layers of Metaverse
2.2 Understanding Metaverse Index
3 Stepping into the NFT Marketplace
3.1 Understanding NFT
3.2 NFT as a Digital Asset
3.3 Concept of NFT Marketplace
4 Recipe for NFT Market Place
5 Conclusion
References
Secure Authentication of IoT Devices Using Upgradable Smart Contract and Fog-Blockchain Technology
1 Introduction
2 Blockchain and Fog Computing
3 Literature Review
4 Proposed Fog Blockchain-Based Architecture
4.1 Overall System Architecture
4.2 Proposed Authentication Scheme
5 Implementation
5.1 Authentication Function
5.2 Upgradable Smart Contract
6 Security Analysis
7 Conclusion
References
The NFT Wave and Better Use Cases—Using NFTs for Documents, Identification, and College Degrees
1 Introduction
2 Literature Survey
2.1 Digital Certificates Using Blockchain and Educational NFTs
2.2 Bloom Protocol
2.3 Vaccine Passports and COVID-19
3 Challenges of NFTs
4 Working
5 Proposed Methodology
6 Conclusion
References
Data Science Techniques for Handling Epidemic, Pandemic
Outbreak and Pandemic Management in MOOCs: Current Status and Scope for Public Health Data Science Education
1 Introduction
2 Approach
2.1 Sample Description
2.2 Data Analysis
3 Results
3.1 Sample Characteristics
3.2 Mapping Outbreak and Pandemic Management in MOOCs
3.3 Using Data in Outbreak and Pandemic Management: MOOCs Content and Characteristics
4 Discussion
5 Conclusion
References
Data Science Approaches to Public Health: Case Studies Using Routine Health Data from India
1 Introduction
2 Review of Literature
2.1 What is Routine Health Data?
2.2 Sources of Routine Health Data?
2.3 Potential of Routine Health Data
2.4 Indian Context
2.5 Challenges of Routine Health Data
3 Case Studies
3.1 Case Study 1—The AMCHSS COVID-19 Dashboard
3.2 Case Study 2—Cause of Death
3.3 Case Study 3—Increasing Trends of Caesarean Sections in India
3.4 Case Study 4—Retinal Disease Classification
4 Discussion
5 Conclusion
References
Arboviral Epidemic Disease Forecasting—A Survey on Diagnostics and Outbreak Models
1 Introduction
1.1 Dengue Infection Transmission Cycle
2 Related Work
2.1 Dengue Outbreak
2.2 Dengue Diagnostic
2.3 Dengue Vector/Host
3 Challenges of Dengue Forecasting
4 Discussion
5 Conclusions
References
Special Session Bio Signal Processing Using Deep Learning
Convolution Neural Network for Weed Detection
1 Introduction
2 Literature Survey
3 Materials and Method
3.1 Data Acquisition and Pre-processing
3.2 Principle and Structure of CNN
3.3 Weed Identification Model
4 Results and Parameters
4.1 Result of Pre-processing
4.2 Results of CNN
4.3 Classification Result
4.4 Identification Results
5 Conclusion
References
Detecting Bone Tumor on Applying Edge Computational Deep Learning Approach
1 Introduction
2 Related Works
3 Methodology
3.1 AdaBoost Curvelet Transform Classifier
3.2 Ant Colony Optimization
4 Experimental Analysis
5 Conclusion
References
The Third Eye: An AI Mobile Assistant for Visually Impaired People
1 Introduction
2 Literature Survey
3 Proposed System and Workflow
3.1 Individual Modular Design
4 Result and Discussion
5 Conclusion
References
Machine Learning Model for Brain Stock Prediction
1 Introduction
2 Related Work
3 Proposed Work
3.1 Data Collection
3.2 Data Pre-processing
3.3 Formulating and Analyzing Data
4 Result and Analysis
5 Conclusion
References
The Prediction of Protein Structure Using Neural Network
1 Introduction
2 Experimental Backgrounds
3 Data Sets Evaluation
4 Architecture Schema for Protein Prediction
5 Algorithm and Implementation
6 Results Evaluation
7 Conclusion and Future Work
References
System of Persons Identification Based on Human Characteristics
1 Introduction
2 Literature Survey
3 Mathematical Models of Behavioral Identification
3.1 Psychological-Mathematical Model of Personality Recognition
3.2 Time-of-Flight Systems (ToFS) Model
3.3 Mathematical Model of Predictions
3.4 Cyclic Modeling of Biometric System Machine Learning
4 Organizing the Structure of the System
5 Conclusion
References
Analysis of COVID-19 Genome Using Continuous Wavelet Transform
1 Introduction
2 Related Work
2.1 Literature Review
2.2 Numerical Representation of the DNA Sequence
3 Proposed Work
3.1 Methodology
3.2 Wavelet Analysis
3.3 Coronavirus Genome
4 Experimental Results
4.1 Spike Protein
4.2 N (Nucleocapsid Protein)
4.3 Envelope (E) Protein
4.4 Membrane (M) Protein
4.5 ORF3a or NS3a Protein
4.6 ORF7a (NS7a) and ORF7b (NS7b) Protein
4.7 ORF8 (NS8) Protein
5 Conclusions and Future Work
References
Author Index

Lecture Notes in Networks and Systems 662

Neha Sharma Amol Goje Amlan Chakrabarti Alfred M. Bruckstein   Editors

Data Management, Analytics and Innovation Proceedings of ICDMAI 2023

Lecture Notes in Networks and Systems Volume 662

Series Editor: Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors:
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).

Neha Sharma · Amol Goje · Amlan Chakrabarti · Alfred M. Bruckstein Editors

Data Management, Analytics and Innovation Proceedings of ICDMAI 2023

Editors

Neha Sharma, Tata Consultancy Services Ltd., Pune, India; Society for Data Science, Pune, India
Amol Goje, Society for Data Science, Pune, India
Amlan Chakrabarti, Faculty of Engineering, A. K. Choudhury School of Information Technology, Kolkata, West Bengal, India
Alfred M. Bruckstein, Faculty of Computer Science, Technion—Israel Institute of Technology, Haifa, Israel

ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-99-1413-5 ISBN 978-981-99-1414-2 (eBook) https://doi.org/10.1007/978-981-99-1414-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This volume constitutes the Proceedings of the 7th International Conference on Data Management, Analytics and Innovation (ICDMAI 2023), held from January 20 to 22, 2023, at the Defence Institute of Advanced Technology, Pune, in association with DRDO and DIAT-Pune, along with partners such as IBM, Ekin, IISER-Kolkata, NIELIT Guwahati, the University of Arad, Romania, and Springer. ICDMAI is the signature conference of the Society for Data Science (S4DS), a not-for-profit professional association established to create a collaborative platform that brings together technical experts across industry, academia, government laboratories and professional bodies to promote innovation around data science. ICDMAI is committed to creating a forum which brings data science enthusiasts onto the same page and envisions advancing the field through collaboration, innovative methodologies and connections across the globe. This year was special, as we completed 7 years. They say the first 7 years are the most crucial milestone, as they provide the foundation for all future learning, behavior and health. We have not just managed to survive for 7 years; we have become richer and mightier in terms of data and data science capabilities. We understand that the field of data science is growing with new competencies and has proven to be applicable to every industry that generates data. In the last 7 years, we have brought in 80 doyens of data science as keynote speakers, and another 70 technical experts have contributed toward workshops and tutorials. Besides, we have engaged around 400 experts as reviewers and session chairs. To date, we have received around 2329 papers from 50 countries, out of which 488 papers have been presented and published, just 20.9% of the submissions. The published papers have been accessed by more than 2.5 lakh people across the globe, who have cited them in their own research, taking the H-index of the conference to 20. The papers we receive come from eminent researchers, leading research organizations, corporate houses and defense services. ICDMAI 2023 witnessed delegates from ten countries, 15 industries, and 121 international and Indian universities. The conference received papers from 412 authors that were evaluated by 192 reviewers. In total, 75 papers were selected for oral presentation after a rigorous review process, and Best Paper Awards were given for each track.


We tried our best to present a bouquet of data science through various workshops, tutorials, keynote sessions, plenary talks, panel discussions and paper presentations by the experts at ICDMAI 2023. The chief guest of the inauguration function was Dr. Samir V. Kamat, Secretary, Department of Defence R&D and Chairman, DRDO, and the guest of honor was Ms. Suma Varughese, Outstanding Scientist and Director General—Micro Electronic Devices, Computational Systems and Cyber Systems; the chief guest of the valedictory function was Admiral R. Hari Kumar, Chief of Naval Staff, and the guest of honor was Dr. N. R. Srinivasa Raghavan, CEO, Tarxya Technologies Limited, London. ICDMAI 2023 witnessed a plethora of top-level experts as keynote speakers, such as Dr. Harrick from TCS, Prof. Rajesh Kumble Nayak from IISER-Kolkata, Dr. Satyam Priyadarshy from Halliburton (NYSE: HAL), USA, Dr. Mayukha Pal from ABB Ability Innovation Center, Hyderabad, Dr. Chittaranjan Yajnik, MD, FRCP, Director and Consultant—Diabetes Unit, Prof. Regiane Relva Romano, Head of Research and General Coordinator at Smart Campus FACENS, Brazil, Alfred M. Bruckstein, Technion—Israel Institute of Technology, and Aninda Bose, Executive Editor, Springer Nature. Besides, we also got the opportunity to listen to another set of experts in the pre-conference tutorials and workshops: two workshops, one on “Knime Analytics” by DIAT and another on “HPC in Bioinformatics” by CDAC, and three tutorials, on Quantum by I Hub (a start-up in quantum), GTP by Cognizant and AI by DRDO. Through this conference, we could build a strong data science ecosystem. Our special thanks go to Janusz Kacprzyk, Series Editor, Springer Lecture Notes in Networks and Systems, for the opportunity to organize this guest-edited volume. We are grateful to Mr. Aninda Bose (Executive Editor, Springer Nature) for the excellent collaboration, patience and help during the development of this volume. We are confident that this volume will provide state-of-the-art information to professors, researchers, practitioners and graduate students in the area of data management, analytics and innovation, and that all will find this collection of papers inspiring and useful.

Neha Sharma, Pune, India
Amol Goje, Pune, India
Amlan Chakrabarti, Kolkata, India
Alfred M. Bruckstein, Haifa, Israel
February 2023

Contents

Machine Learning

Troomate—Finding a Perfect Roommate a Literature Survey . . . 3
Aditi Bornare, Arushi Dubey, Rutuja Dherange, Srushti Chiddarwar, and Prajakta Deshpande

A Smart System to Classify Walking and Sitting Activities Based on EEG Signal . . . 19
Shripad Bhatlawande, Swati Shilaskar, Advait Kamathe, Chinmay Kulkarni, and Neelam Chandolikar

Forest Fire Detection and Classification Using Deep Learning Concepts . . . 37
P. Nishanth and G. Varaprasad

Study of Cold-Start Product Recommendations and Its Solutions . . . 45
Deep Pancholi and C. Selvi

Monitoring Urban Flooding Using SAR—A Mumbai Case Study . . . 59
Chaman Banolia, K. Ram Prabhakar, and Shailesh Deshpande

Do Women Shy Away from Cryptocurrency Investment? Cross-Country Evidence from Survey Data . . . 69
Ralf Hoechenberger, Detlev Hummel, and Juergen Seitz

Detection of Internet Cheating in Online Assessments Using Cluster Analysis . . . 77
Manika Garg and Anita Goel

Suspicious Event Detection of Cargo Vessels Based on AIS Data . . . 91
Hari Kumar Radhakrishnan, Shyam Sundar, R. Bharath, and C. P. Ramanarayanan


Identifying Trends Using Improved Affinity Propagation (IMAP) Clustering Algorithm on Evolving Data Stream . . . 101
Umesh Kokate, Arvind Deshpande, and Parikshit Mahalle

Comparative Analysis of Recommendation System Using Similarity Techniques . . . 119
Chour Singh Rajpoot and Santosh Kumar Vishwakarma

Graphology-Based Behavior Prediction: Case Study Analysis . . . 129
Ieesha Deshmukh, Pradnya Agrawal, Aboli Khursale, Neha Lahane, and Harsha Sonune

Statistics-Driven Suspicious Event Detection of Fishing Vessels Based on AIS Data . . . 143
Hari Kumar Radhakrishnan, Saikat Bank, R. Bharath, and C. P. Ramanarayanan

Landslide Susceptibility Mapping Using J48 Decision Tree and Its Ensemble Methods for Rishikesh to Gangotri Axis . . . 153
Vivek Saxena, Upasna Singh, and L. K. Sinha

Distributed Reduced Alphabet Representation for Predicting Proinflammatory Peptides . . . 161
Hrushikesh Bhosale, Aamod Sane, Vigneshwar Ramakrishnan, and Valadi K. Jayaraman

Predicting Injury Severity in Construction Using Logistic Regression . . . 175
Reetun Maiti and Balagopal G. Menon

Prakruti Nishchitikaran of Human Body Using Supervised Machine Learning Approach . . . 187
Krantee M. Jamdaade and Harshali Y. Patil

Time Series AutoML; Hierarchical Factor Based Forecasting . . . 201
Mrityunjoy Panday, V. Manikanta Sanjay, and Venkata Yaswanth Kanduri

Economic Growth Prediction and Performance Analysis of Developed and Developing Countries Using ARIMA, PCA, and k-Means Clustering . . . 221
Stefan Hutter, Noah Winkler, Neha Sharma, and Juergen Seitz

Optimized Feature Representation for Odia Document Clustering . . . 235
Itishree Panda, Jyoti Prakash Singh, and Gayadhar Pradhan

Paris Olympic (2024) Medal Tally Prediction . . . 249
Prince Nagpal, Kartikey Gupta, Yashaswa Verma, and Jyoti Singh Kirar


Crop Recommendation Using Hybrid Ensembles Model by Extracting Statistical Measures . . . 269
Tanuja K. Fegade, B. V. Pawar, and Ram Bhavsar

Aspect-Based Product Recommendation System by Sentiment Analysis of User Reviews . . . 285
K. Anjali Kamath, Divya T. Puranam, and Ashwini M. Joshi

Deep Learning and AI

X-ABI: Toward Parameter-Efficient Multilingual Adapter-Based Inference for Cross-Lingual Transfer . . . 303
Aditya Srinivas Menon and Konjengbam Anand

Comparative Study of Depth Estimation for 2D Scene Using Deep Learning Model . . . 319
Arvind Kumar, Bhargab Das, Raj Kumar, and Virendra Kumar

Product Recommendation System Using Deep Learning Techniques: CNN and NLP . . . 331
Israa Bint Rahman Salman and G. Varaprasad

Modified Long Short-Term Memory Algorithm for Faulty Node Detection Using node’s Raw Data Pattern . . . 345
Ketki Deshmukh and Avinash More

DCNN-Based Transfer Learning Approaches for Gender Recognition . . . 357
Md. Shahzeb, Sunita Dhavale, D. Srikanth, and Suresh Kumar

Fetal Brain Component Segmentation Using 2-Way Ensemble U-Net . . . 367
Shinjini Halder, Tuhinangshu Gangopadhyay, Paramik Dasgupta, Kingshuk Chatterjee, Debayan Ganguly, Surjadeep Sarkar, and Sudipta Roy

Morbidity Detection from Clinical Text Data Using Artificial Intelligence Technique . . . 383
H. L. Bhavyashree and G. Varaprasad

Investigating the Application of Multi-lingual Transformer in Graph-Based Extractive Text Summarization for Hindi Text . . . 393
Sawan Rai, Ramesh Chandra Belwal, and Abhinav Sharma

Analysis of Machine Learning Algorithms for COVID Detection Using Deep Learning . . . 405
Jyoti Kanjalkar, Pramod Kanjalkar, Kaivalya Aole, Abu Ansari, Harshal Abak, and Aarya Tiwari


Real-Time Learning Toward Asset Allocation . . . 421
Manish Sinha

CoffeeGAN: An Effective Data Augmentation Model for Coffee Plant Diseases . . . 431
Savitri Kulkarni, P. Deepa Shenoy, and K. R. Venugopal

CycleGAN Implementation on Cross-Modality Transfer Between Magnetic Resonance Image (MRI) and Computed Tomography (CT) Images . . . 445
Priyesh Kumar Roy, Santhanam, Bhanu Pratap Misra, Abhijit Sen, T. Palanisamy, Sima Gautam, S. V. S. S. N. V. G. Krishna Murthy, and Mahima Arya

A Dual Fine Grained Rotated Neural Network for Aerial Solar Panel Health Monitoring and Classification . . . 457
Indrajit Kar, Sudipta Mukhopadhyay, and Bijon Guha

Explainable Automated Coding of Medical Narratives Using Identification of Hierarchical Labels . . . 479
Sonam Sharma, Sumukh Sirmokadam, and Pavan Chittimalli

Automatic Text Summarization Using Word Embeddings . . . 489
Sophiya Antony and Dhanya S. Pankaj

Video Object Detection with an Improved Classification Approach . . . 511
Sita Yadav and Sandeep M. Chaware

An Unsupervised Image Processing Approach for Weld Quality Inspection . . . 525
Shashwat Shahi, Gargi Kulkarni, Sumukh Sirmokadam, and Shailesh Deshpande

YOLO Algorithms for Real-Time Fire Detection . . . 537
Ashish Ranjan, Sunita Dhavale, and Suresh Kumar

Improving Autoencoder-Based Recommendation Systems . . . 555
Nilanjan Sinhababu, Monalisa Sarma, and Debasis Samanta

ARTSAM: Augmented Reality App for Tool Selection in Aircraft Maintenance . . . 569
Nikhil Satish and C. R. S. Kumar

Named Entity Recognition over Dialog Dataset Using Pre-trained Transformers . . . 583
Archana Patil, Shashikant Ghumbre, and Vahida Attar

Depression Detection Using Hybrid Transformer Networks . . . 593
Deap Daru, Hitansh Surani, Harit Koladia, Kunal Parmar, and Kriti Srivastava

A Comparative Study of Distance-Based Clustering Algorithms in Fuzzy Failure Modes and Effects Analysis . . . 605
Nukala Divakar Sai, Baneswar Sarker, Ashish Garg, and Jhareswar Maiti

Data Storage, Management and Innovation

Moderating Role of Project Flexibility on Senior Management Commitment and Project Risks in Achieving Project Success in Financial Services . . . 627
Pankaj A. Tiwari

Growth Profile of Using AI Techniques in Antenna Research Over Three Decades . . . 647
G. S. Mani and S. D. Pohekar

Economical Solution to Automatic Evaluation of an OMR Sheet Using Image Processing . . . 665
Pramod Kanjalkar, Prasad Chinchole, Archit Chitre, Jyoti Kanjalkar, and Prakash Sharma

Defense and Evaluation Against Covert Channel-Based Attacks in Android Smartphones . . . 685
Ketaki Pattani and Sunil Gautam

Study of the Need for Effective Cyber Security Trainings in India . . . 697
Rakesh Kumar Chawla, J. S. Sodhi, and Triveni Singh

From Bricks to Clicks: The Potential of Big Data Analytics for Revolutionizing the Information Landscape in Higher Education Sector . . . 721
Ashraf Alam and Atasi Mohanty

Enabling Technologies and Applications

Learning on the Move: A Pedagogical Framework for State-of-the-Art Mobile Learning . . . 735
Ashraf Alam and Atasi Mohanty

Quantum Web-Based Health Analytics System . . . 749
K. Pradheep Kumar and K. Dhinakaran

Comparative Study of Noises Over Quantum Key Distribution Protocol . . . 759
Sawan Bhattacharyya, Ajanta Das, Anindita Banerjee, and Amlan Chakrabarti

Comparative Assessment of Methane Leak Detection Using Hyperspectral Data . . . 783
Karan Owalekar, Ujas Italia, Vijeta, and Shailesh Deshpande


Application of Machine Learning in Customer Services and E-commerce . . . 817
G. Aarthi, R. Karthikha, Sharmila Sankar, S. Sharon Priya, D. Najumnissa Jamal, and W. Aisha Banu

The Effect of Comorbidity on the Survival Rates of COVID-19 Using Quantum Machine Learning . . . 833
Arsheyee Shahapure, Anindita Banerjee, and Rehan Deshmukh

Multi-GPU-Enabled Quantum Circuit Simulations on HPC-AI System . . . 845
Manish Modani, Anindita Banerjee, and Abhishek Das

Study of NFT Marketplace in the Metaverse . . . 855
Isha Deshpande, Rutuja Sangitrao, and Leena Panchal

Secure Authentication of IoT Devices Using Upgradable Smart Contract and Fog-Blockchain Technology . . . 863
Shital H. Dinde and S. K. Shirgave

The NFT Wave and Better Use Cases—Using NFTs for Documents, Identification, and College Degrees . . . 879
B. R. Arun kumar and Siddharth Vaddem

Data Science Techniques for Handling Epidemic, Pandemic

Outbreak and Pandemic Management in MOOCs: Current Status and Scope for Public Health Data Science Education . . . 897
Anussha Murali, Arun Mitra, Sundeep Sahay, and Biju Soman

Data Science Approaches to Public Health: Case Studies Using Routine Health Data from India . . . 913
Arun Mitra, Biju Soman, Rakhal Gaitonde, Tarun Bhatnagar, Engelbert Nieuhas, and Sajin Kumar

Arboviral Epidemic Disease Forecasting—A Survey on Diagnostics and Outbreak Models . . . 941
Supreet Kaur and Sandeep Sharma

Special Session Bio Signal Processing Using Deep Learning

Convolution Neural Network for Weed Detection . . . 965
G. N. Balaji, S. V. Suryanarayana, G. Venkateswara Rao, and T. Vigneshwaran

Detecting Bone Tumor on Applying Edge Computational Deep Learning Approach . . . 981
G. Megala, P. Swarnalatha, and R. Venkatesan

The Third Eye: An AI Mobile Assistant for Visually Impaired People . . . 993
R. S. Hariharan, H. Abdul Gaffar, and K. Manikandan

Machine Learning Model for Brain Stock Prediction . . . 1005
S. Amutha, S. Joyal Isac, K. Niha, and M. K. Dharani

The Prediction of Protein Structure Using Neural Network . . . 1021
S. M. Shifana Rayesha, W. Aisha Banu, and Sharon Priya

System of Persons Identification Based on Human Characteristics . . . 1029
A. Akhatov, I. Himmatov, Christo Ananth, and T. Ananth Kumar

Analysis of COVID-19 Genome Using Continuous Wavelet Transform . . . 1047
Shivani Saxena, Abhijeeth M. Nair, and Ahsan Z. Rizvi

Author Index . . . 1079

Editors and Contributors

About the Editors

Dr. Neha Sharma is a data science crusader who advocates its application for achieving sustainable goals and solving societal, governmental and business problems, and who promotes the use of open data. She has more than 22 years of experience, presently works with Tata Consultancy Services and is Founder Secretary of the Society for Data Science. Prior to this, she worked as Director of a premier institute in Pune that runs postgraduate courses such as MCA and MBA. She is an alumna of a premier College of Engineering and Technology, Bhubaneshwar, and completed her Ph.D. at the prestigious Indian Institute of Technology, Dhanbad. She is a Senior IEEE Member, Secretary of the IEEE Pune Section and an ACM Distinguished Speaker. She is an astute academician, has organized several national and international conferences and has published several research papers. She is the recipient of the “Best Ph.D. Thesis Award” and the “Best Paper Presenter at International Conference Award” at the national level. She is a well-known figure in IT circles, well sought after for her sound knowledge and professional skills. Neha Sharma has been instrumental in integrating teaching with the current needs of industry and steering students toward a bright future.


Dr. Amol Goje is the President of the Society for Data Science and served as Director of Vidya Pratishthan’s Institute of Information Technology (VIIT), Baramati, Pune, for the last 19 years. He has over 25 years of experience in the field of Information and Computer Technology (ICT) and has developed many systems for the University. Dr. Amol’s main area of interest is working for underprivileged people in rural India. In his 19 years as Director, he designed and implemented numerous path-breaking, innovative and cost-effective solutions, among them a number of computer labs that are economically sustainable for rural schools and colleges. His main innovation is the Computer Mobile Van. He has done a great deal of research in Information Technology and its application for rural communities. In appreciation of his exemplary work, Dr. Amol received the Ashoka Fellow award in 2002. He is engaged as a technical advisor by many government and non-government organizations and was a member of a sub-committee of the Planning Commission, Government of India, in the Information Technology division. During his tenure as Director at VIIT, he organized 8 international conferences in Baramati under the aegis of Baramati Initiatives. Dr. Goje received the Marathwada Bhushan Award for spreading IT education to the masses. He is a member of the Working Group on Agricultural Extension constituted by the Planning Commission, Government of India, and a key player in setting up the Community Training and Learning Centers (CTLC) in Maharashtra; in this project, VIIT provides computer training to women from Self Help Groups (SHG). Dr. Goje received the Manthan (AIF) award for two successive years, in 2005 and 2006, and the Government of Maharashtra’s “Maharashtra IT Award” (IT HRD) in 2008. He is also the President of the Community Radio Association (CRA) of India. Recently he was bestowed with the honor of ‘Best Director’ of Pune University for 2016–2017. Currently he is Chairman, Board of Studies, Savitribai Phule Pune University, Pune.


Amlan Chakrabarti is presently a Full Professor and Director of the A. K. Choudhury School of Information Technology, University of Calcutta. He is an M.Tech. from University of Calcutta and did his Doctoral research at the Indian Statistical Institute, Kolkata. He was a Post-Doctoral fellow at the School of Engineering, Princeton University, USA during 2011–2012. He is the recipient of DST BOYSCAST fellowship award in the area of Engineering Science in 2011, Indian National Science Academy Visiting Scientist Fellowship in 2014, JSPS Invitation Research Award from Japan in 2016, Erasmus Mundus Leaders Award from European Union in 2017, Hamied Visiting Fellowship University of Cambridge in 2018 and Shiksha Ratna Award from Government of West Bengal, in 2018. He is the Team Leader of European Center for Research in Nuclear Science (CERN, Geneva) ALICE-India project for University of Calcutta and also a key member of the CBM-FAIR project at Darmstadt Germany. He is also the Principal Investigator of the Center of Excellence in Systems Biology and Biomedical Engineering, University of Calcutta funded by MHRD (TEQIP-II). He has published around 150 research papers in referred journals and conferences and has graduated 10 Ph.D. students till date. He has been involved in research projects funded by DRDO, DST, DAE, MietY, UGC, Ministry of Social Empowerment, TCS, Intel India and TEQIP. He is a Sr. Member of IEEE and ACM, Secretary of IEEE CEDA India Chapter and Vice President Society for Data Science. His research interests are: Quantum Computing, VLSI design, Embedded System Design, Computer Vision and Analytics. Prof. Alfred M. Bruckstein, B.Sc., M.Sc. in EE from the Technion IIT, Haifa, Israel, and Ph.D. in EE, from Stanford University, Stanford, California, USA, is a Technion Ollendorff Professor of Science, in the Computer Science Department there, and is a Visiting Professor at NTU, Singapore, in the SPMS. He has done research on Neural Coding Processes, and Stochastic Point Processes, Estimation Theory, and Scattering Theory, Signal and Image Processing Topics, Computer Vision and Graphics, and Robotics. Over the years he held visiting positions at Bell Laboratories, Murray Hill, NJ, USA, (1987–2001) and TsingHua University,


Beijing, China (2002–2023), and made short visits to many universities and research centers worldwide. At the Technion, he was the Dean of the Graduate School, and is currently the Head of the Technion Excellence Program.

Contributors G. Aarthi Computer Science and Engineering, B. S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India Harshal Abak Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India H. Abdul Gaffar Vellore Institute of Technology, Vellore, India Pradnya Agrawal MKSSS’s Cummins College of Engineering for Women, Pune, India W. Aisha Banu Computer Science and Engineering, B. S. Abdur Rahman Crescent Institute of Science and Technology, Vandalur, Chennai, India A. Akhatov Samarkand State University, Samarkand, Uzbekistan Ashraf Alam Rekhi Centre of Excellence for the Science of Happiness, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India S. Amutha School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India Konjengbam Anand Indian Institute of Information Technology, Kottayam, India T. Ananth Kumar IFET College of Engineering, Gangarampalaiyam, Tamilnadu, India Christo Ananth Samarkand State University, Samarkand, Uzbekistan Abu Ansari Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India Sophiya Antony Department of Computer Science and Engineering, College of Engineering Trivandrum, Trivandrum, India Kaivalya Aole Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India B. R. Arun kumar Department of Computer Science and Engineering, BMS Institute of Technology and Management, Bengaluru, India Mahima Arya Unikul Solutions Pvt. Limited, Bangalore, India


Vahida Attar COEP, Savitribai Phule Pune University, Pune, India G. N. Balaji School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India Anindita Banerjee Centre for Development of Advanced Computing, Corporate Research and Development, Panchavati, Pashan, Pune, India Saikat Bank School of Computer Engineering and Mathematical Sciences, Defence Institute of Advanced Technology, Pune, Maharashtra, India Chaman Banolia TCS Research, Tata Consultancy Services Ltd., Pune, Maharashtra, India Ramesh Chandra Belwal Computer Science and Engineering Department, B.T.K.I.T., Dwarahat, Uttarakhand, India R. Bharath School of Computer Engineering and Mathematical Sciences, Defence Institute of Advanced Technology, Pune, Maharashtra, India Shripad Bhatlawande Vishwakarma Institute of Technology, Pune, India Tarun Bhatnagar National Institute of Epidemiology, Chennai, India Sawan Bhattacharyya Department of Computer Science, Ramakrishna Mission Vivekananda Centenary College, Kolkata, West Bengal, India Ram Bhavsar School of Computer Sciences, KBC NMU, Jalgaon, M.S, India H. L. Bhavyashree Department of Computer Science, BMSCE, Bangalore, India Hrushikesh Bhosale Department of Computer Science, FLAME University, Pune, Maharashtra, India Aditi Bornare MKSSS’s Cummins College of Engineering for Women, Pune, India Amlan Chakrabarti A. K. Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India Neelam Chandolikar Vishwakarma Institute of Technology, Pune, India Kingshuk Chatterjee Government College of Engineering and Ceramic Technology, Kolkata, India Sandeep M. Chaware PCCOE, Pune, India; JSPM RSCOE, Pune, India Rakesh Kumar Chawla Amity Business School, Amity University, Uttar Pradesh, Noida, India; National Crime Records Bureau, Ministry of Home Affairs, New Delhi, India Srushti Chiddarwar MKSSS’s Cummins College of Engineering for Women, Pune, India Prasad Chinchole Vishwakarma Institute of Technology, Pune, India


Archit Chitre Vishwakarma Institute of Technology, Pune, India Pavan Chittimalli TRDDC, Pune, India Deap Daru Dwarkadas J. Sanghvi College of Engineering, Mumbai, India Abhishek Das Centre of Development for Advanced Computing, Pune, India Ajanta Das Amity Institute of Information Technology, Amity University Kolkata, Kolkata, India Bhargab Das CSIR-Central Scientific Instruments Organisation, Chandigarh, India Paramik Dasgupta Asian Institute of Technology, Khlong Nueng, Thailand P. Deepa Shenoy Department of CSE UVCE, Bangalore University, Bengaluru, India Ieesha Deshmukh MKSSS’s Cummins College of Engineering for Women, Pune, India Ketki Deshmukh Mukesh Patel School of Technology Management and Engineering, NMIMS University, Mumbai, India Rehan Deshmukh School of Biosciences and Technology, Dr. Vishwanath Karad MIT-World Peace University, Pune, India Arvind Deshpande Computer Engineering, SKNCoE, Vadgaon, SPPU, Pune, India Isha Deshpande Department of Information Technology, Cummins College of Engineering for Women, Pune, Maharashtra, India Prajakta Deshpande MKSSS’s Cummins College of Engineering for Women, Pune, India Shailesh Deshpande Tata Consultancy Services, Mumbai, India; Tata Research Development and Design Centre, Tata Consultancy Services, Hadapsar, Pune, India; TCS Research, Tata Consultancy Services Ltd., Pune, Maharashtra, India M. K. Dharani Kongu Engineering College, Erode, India Sunita Dhavale Defence Institute of Advanced Technology (DIAT), Pune, India Rutuja Dherange MKSSS’s Cummins College of Engineering for Women, Pune, India K. Dhinakaran Department of Artificial Intelligence and Data Science, Dhanalakshmi College of Engineering, Chennai, Tamil Nadu, India Shital H. Dinde Department of Technology, Shivaji University, Kolhapur, Maharashtra, India Arushi Dubey MKSSS’s Cummins College of Engineering for Women, Pune, India


Tanuja K. Fegade KCES’s Institute of Management and Research, Jalgaon, M.S, India Rakhal Gaitonde Sree Chitra Tirunal Institute for Medical Sciences and Technology, Trivandrum, Kerala, India Tuhinangshu Gangopadhyay Government College of Engineering and Leather Technology, Kolkata, India Debayan Ganguly Government College of Engineering and Leather Technology, Kolkata, India Ashish Garg Department of Industrial and Systems Engineering, Indian Institute of Technology, Kharagpur, India; Centre of Excellence in Safety Engineering and Analytics, Indian Institute of Technology, Kharagpur, India Manika Garg Department of Computer Science, University of Delhi, New Delhi, India Sima Gautam Institute of Nuclear Medicine and Allied Sciences, DRDO, New Delhi, India Sunil Gautam Institute of Technology, Nirma University, Ahmedabad, Gujarat, India Shashikant Ghumbre GCOE&R, Awasari, Savitribai Phule Pune University, Pune, India Anita Goel Department of Computer Science, Dyal Singh College, University of Delhi, New Delhi, India Bijon Guha Siemens Technology and Services Private Limited, Mumbai, India Kartikey Gupta Banaras Hindu University, Varanasi, India Shinjini Halder Government College of Engineering and Leather Technology, Kolkata, India R. S. Hariharan Vellore Institute of Technology, Vellore, India I. Himmatov Samarkand State University, Samarkand, Uzbekistan Ralf Hoechenberger Allianz Kunde und Markt GmbH, Munich, Germany Detlev Hummel University of Potsdam, Potsdam, Germany Stefan Hutter Duale Hochschule Baden-Württemberg, Heidenheim, Germany Ujas Italia Tata Consultancy Services Mumbai, SDF 5, SEEPZ, Mumbai, India D. Najumnissa Jamal Electronics and Instrumentation Engineering, B. S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India Krantee M. Jamdaade K. J. Somaiya Institute of Management, Mumbai, India


Valadi K. Jayaraman Department of Computer Science, FLAME University, Pune, Maharashtra, India Ashwini M. Joshi Department of Computer Science and Engineering, PES University, Bangalore, India S. Joyal Isac Saveetha Engineering College, Chennai, India K. Anjali Kamath Department of Computer Science and Engineering, PES University, Bangalore, India Advait Kamathe Vishwakarma Institute of Technology, Pune, India Venkata Yaswanth Kanduri Artificial Intelligence and Analytics, Cognizant Technology Solutions India, Bengaluru, India Jyoti Kanjalkar Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India Pramod Kanjalkar Department of Instrumentation Engineering, Vishwakarma Institute of Technology, Pune, India Indrajit Kar Siemens Technology and Services Private Limited, Mumbai, India R. Karthikha Electrical and Electronics Engineering, B. S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India Supreet Kaur Department of Computer Engineering and Technology, Guru Nanak Dev University, Amritsar, India Aboli Khursale MKSSS’s Cummins College of Engineering for Women, Pune, India Jyoti Singh Kirar Banaras Hindu University, Varanasi, India Umesh Kokate Department of Computer Engineering, SKNCoE, Vadgaon, SPPU, Pune, India Harit Koladia Dwarkadas J. Sanghvi College of Engineering, Mumbai, India S. V. S. S. N. V. G. Krishna Murthy Defence Institute of Advanced Technology, Pune, India Chinmay Kulkarni Vishwakarma Institute of Technology, Pune, India Gargi Kulkarni Tata Consultancy Services, Mumbai, India Savitri Kulkarni Department of CSE UVCE, Bangalore University, Bengaluru, India Arvind Kumar CSIR-Central Scientific Instruments Organisation, Chandigarh, India C. R. S. Kumar Department of Computer Science and Engineering, Defence Institute of Advanced Technology (DIAT) (Deemed to Be University), Pune, India


K. Pradheep Kumar Department of Computer Science and Engineering, BITS Pilani, Pilani, Tamil Nadu, India Raj Kumar CSIR-Central Scientific Instruments Organisation, Chandigarh, India Sajin Kumar University of Kerala, Trivandrum, Kerala, India Suresh Kumar Defence Institute of Psychological Research (DIPR), Delhi, India Virendra Kumar CSIR-Central Scientific Instruments Organisation, Chandigarh, India Neha Lahane MKSSS’s Cummins College of Engineering for Women, Pune, India Parikshit Mahalle Department of Artificial Intelligence and Data Science, B R A C T’s, Vishwakarma Institute of Information Technology, Pune, India Jhareswar Maiti Department of Industrial and Systems Engineering, Indian Institute of Technology, Kharagpur, India; Centre of Excellence in Safety Engineering and Analytics, Indian Institute of Technology, Kharagpur, India Reetun Maiti Centre for Computational and Data Sciences, IIT Kharagpur, Kharagpur, India G. S. Mani Symbiosis International University, Pune, India K. Manikandan Vellore Institute of Technology, Vellore, India V. Manikanta Sanjay Artificial Intelligence and Analytics, Cognizant Technology Solutions India, Bengaluru, India G. Megala Vellore Institute of Technology, Vellore, India Aditya Srinivas Menon Indian Institute of Information Technology, Kottayam, India Balagopal G. Menon Department of Industrial and Systems Engineering, IIT Kharagpur, Kharagpur, India; Centre of Excellence in Safety Engineering and Analytics, IIT Kharagpur, Kharagpur, India Bhanu Pratap Misra Unikul Solutions Pvt. Limited, Bangalore, India Arun Mitra Sree Chitra Tirunal Institute for Medical Sciences and Technology, Trivandrum, Kerala, India Manish Modani NVIDIA, Pune, India Atasi Mohanty Rekhi Centre of Excellence for the Science of Happiness, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India Avinash More Mukesh Patel School of Technology Management and Engineering, NMIMS University, Mumbai, India


Sudipta Mukhopadhyay Siemens Technology and Services Private Limited, Mumbai, India Anussha Murali Jawaharlal Nehru University, New Delhi, India Prince Nagpal Banaras Hindu University, Varanasi, India Abhijeeth M. Nair Department of Biotechnology and Bioengineering, Institute of Advanced Research, Gandhinagar, Gujarat, India Engelbert Nieuhas University of Koblenz and Landau, Landau, Germany K. Niha School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India P. Nishanth Department of Computer Science, BMSCE, Bangalore, India Karan Owalekar Tata Consultancy Services Mumbai, SDF 5, SEEPZ, Mumbai, India T. Palanisamy Amrita University, Coimbatore, Tamil Nadu, India Leena Panchal Department of Information Technology, Cummins College of Engineering for Women, Pune, Maharashtra, India Deep Pancholi Amrita Vishwa Vidyapeetham, Coimbatore, India Itishree Panda National Institute of Technology Patna, Patna, India Mrityunjoy Panday Artificial Intelligence and Analytics, Cognizant Technology Solutions India, Bengaluru, India Dhanya S. Pankaj Department of Computer Science and Engineering, College of Engineering Trivandrum, Trivandrum, India Kunal Parmar Dwarkadas J. Sanghvi College of Engineering, Mumbai, India Archana Patil COEP, Savitribai Phule Pune University, Pune, India Harshali Y. Patil Mumbai Educational Trust Institute of Computer Science, Mumbai, India Ketaki Pattani Department of Engineering and Physical Sciences, Institute of Advanced Research, Gujarat, India B. V. Pawar School of Computer Sciences, KBC NMU, Jalgaon, M.S, India K. Ram Prabhakar TCS Research, Tata Consultancy Services Ltd., Bangalore, Karnataka, India Gayadhar Pradhan National Institute of Technology Patna, Patna, India S. Sharon Priya Computer Science and Engineering, B. S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India


Sharon Priya B. S. Abdur Rahman Crescent Institute of Science and Technology, Vandalur, India Divya T. Puranam Department of Computer Science and Engineering, PES University, Bangalore, India Hari Kumar Radhakrishnan Indian Navy, New Delhi, India Sawan Rai School of Computer Science Engineering and Technology, Bennett University, Greater Noida, U.P., India Chour Singh Rajpoot Manipal University Jaipur, Jaipur, India Vigneshwar Ramakrishnan School of Chemical and Biotechnology, SASTRA Deemed-to-be University, Thanjavur, Tamil Nadu, India C. P. Ramanarayanan Defence Institute of Advanced Technology, Pune, Maharashtra, India Ashish Ranjan Defence Institute of Advanced Technology (DIAT), Pune, India G. Venkateswara Rao GITAM University, Visakhapatnam, India Ahsan Z. Rizvi Department of Computer Sciences and Engineering, Institute of Advanced Research, Gandhinagar, Gujarat, India Priyesh Kumar Roy Defence Institute of Advanced Technology, Pune, India Sudipta Roy Artificial Intelligence and Data Science, Jio Institute, Navi Mumbai, India Sundeep Sahay Department of Informatics, University of Oslo, Oslo, Norway Nukala Divakar Sai Department of Industrial and Systems Engineering, Indian Institute of Technology, Kharagpur, India Israa Bint Rahman Salman Department of Computer Science and Engineering, BMSCE, Bangalore, India Debasis Samanta Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India Aamod Sane Department of Computer Science, FLAME University, Pune, Maharashtra, India Rutuja Sangitrao Department of Information Technology, Cummins College of Engineering for Women, Pune, Maharashtra, India Sharmila Sankar Computer Science and Engineering, B. S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India Santhanam Unikul Solutions Pvt. Limited, Bangalore, India Surjadeep Sarkar Government College of Engineering and Leather Technology, Kolkata, India


Baneswar Sarker Department of Industrial and Systems Engineering, Indian Institute of Technology, Kharagpur, India Monalisa Sarma Subir Chowdhury School of Quality and Reliability, Indian Institute of Technology, Kharagpur, India Nikhil Satish National Institute of Technology Karnataka (NITK), Surathkal, India Shivani Saxena Department of Computer Sciences and Engineering, Institute of Advanced Research, Gandhinagar, Gujarat, India Vivek Saxena DGRE, DRDO Chandigarh, Chandigarh, India Juergen Seitz Baden-Wuerttemberg Cooperative State University Heidenheim, Heidenheim, Germany C. Selvi Indian Institute of Information Technology, Kottayam, Kerala, India Abhijit Sen Unikul Solutions Pvt. Limited, Bangalore, India Arsheyee Shahapure Computer Science and Engineering, Dr. Vishwanath Karad MIT-World Peace University, Pune, India Shashwat Shahi Tata Consultancy Services, Mumbai, India Md. Shahzeb Defence Institute of Advanced Technology (DIAT), Pune, India Abhinav Sharma Computer Science and Engineering Department, Institute of Technical Education and Research, Siksha O Anusandhan, Bhubaneswar, Odisha, India Neha Sharma Tata Consultancy Services, Pune, India Prakash Sharma PCOMBINATOR, Pune, India Sandeep Sharma Department of Computer Engineering and Technology, Guru Nanak Dev University, Amritsar, India Sonam Sharma Tata Consultancy Services, New Delhi, India S. M. Shifana Rayesha B. S. Abdur Rahman Crescent Institute of Science and Technology, Vandalur, India Swati Shilaskar Vishwakarma Institute of Technology, Pune, India S. K. Shirgave Department of Computer Science and Engineering, DKTE’s Textile and Engineering Institute, Ichalkaranji, Maharashtra, India Jyoti Prakash Singh National Institute of Technology Patna, Patna, India Triveni Singh Cyber-Crime at Uttar Pradesh Police, Uttar Pradesh, Noida, India Upasna Singh School of Computer Engineering and Mathematical Science, DIAT, Pune, India L. K. Sinha DGRE, DRDO Chandigarh, Chandigarh, India


Manish Sinha Research & Innovation, Analytics and Insight, Tata Consultancy Services, Mumbai, India Nilanjan Sinhababu Center for Computational and Data Sciences, Indian Institute of Technology, Kharagpur, India Sumukh Sirmokadam Tata Consultancy Services, Mumbai, India J. S. Sodhi Group CIO & Senior Vice President-Amity Education Group, Uttar Pradesh, Noida, India Biju Soman Sree Chitra Tirunal Institute for Medical Sciences and Technology, Trivandrum, Kerala, India Harsha Sonune MKSSS’s Cummins College of Engineering for Women, Pune, India D. Srikanth Defence Institute of Advanced Technology (DIAT), Pune, India Kriti Srivastava Dwarkadas J. Sanghvi College of Engineering, Mumbai, India Shyam Sundar School of Computer Engineering and Mathematical Sciences, Defence Institute of Advanced Technology, Pune, Maharashtra, India Hitansh Surani Dwarkadas J. Sanghvi College of Engineering, Mumbai, India S. V. Suryanarayana Department of IT, CVR College of Engineering, Hyderabad, India P. Swarnalatha Vellore Institute of Technology, Vellore, India Aarya Tiwari Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India Pankaj A. Tiwari School of Management, CMR University, Bengaluru, Karnataka, India Siddharth Vaddem Department of Computer Science and Engineering, BMS Institute of Technology and Management, Bengaluru, India G. Varaprasad Department of Computer Science and Engineering, BMSCE, Bangalore, India R. Venkatesan SRM TRP Engineering College, Tiruchirappalli, India K. R. Venugopal Department of CSE UVCE, Bangalore University, Bengaluru, India Yashaswa Verma Banaras Hindu University, Varanasi, India T. Vigneshwaran SRM TRP Engineering College, Mannachanallur, India Vijeta Tata Consultancy Services Ltd., TCSL-SEZ Unit, (IGGGL-SEZ), Chennai, India


Santosh Kumar Vishwakarma Manipal University Jaipur, Jaipur, India Noah Winkler Duale Hochschule Baden-Württemberg, Heidenheim, Germany Sita Yadav PCCOE, Pune, India; Army Institute of Technology, Pune, India

Machine Learning

Troomate—Finding a Perfect Roommate a Literature Survey

Aditi Bornare, Arushi Dubey, Rutuja Dherange, Srushti Chiddarwar, and Prajakta Deshpande

Abstract When someone arrives at a new place, the first thing that they do is look for a place to dwell, and when the company they are with is not appropriate, it leads to disputes and sometimes shifting to a new place which can be very tedious. Finding the right roommate is very important as it affects the physical and mental health of a being. The present solutions in the market for this problem include websites like roomster.com, olx.in, indianroommates.in, etc., and applications like FlatMatch, Roomster, etc. A detailed analysis of potential competition was done in order to figure out our standing among them. The paper analyzes them on the basis of their features, ratings from users, etc. Roommate-finding platforms exist, but they just display a list of users without considering their preferences. This is where Troomate has an advantage. A detailed literature survey was done about how the pairing of people could be done based on their idea of the perfect roommate. This paper includes various algorithms like Gale–Shapley, Elo rating score, and techniques like clustering in order to effectively match on the basis of powerful filters like social traits, diet habits, sleeping schedules, etc. With an interactive, well-designed UI, dependable backend, and reliable algorithms, Troomate aims at solving the problem effectively. Keywords Roommate finder · Recommendation system · Algorithm · Match finder · Pairing algorithm

A. Bornare · A. Dubey · R. Dherange · S. Chiddarwar (B) · P. Deshpande MKSSS’s Cummins College of Engineering for Women, Pune, India e-mail: [email protected] A. Bornare e-mail: [email protected] A. Dubey e-mail: [email protected] R. Dherange e-mail: [email protected] P. Deshpande e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_1


1 Introduction

Due to increasing urbanization, many people shift to urban cities in search of employment and better education and livelihood opportunities. As soon as they arrive at a new place, they normally do not have a place to live. Usually, people rent someplace and live there, and the most economical option is to share a place with some other person. It is very important to get the right person to live with [1].

The survey conducted for this paper asked people questions such as whether they got their roommate by choice, the various problems they faced with them, how these problems affected their life, what actions they took to resolve these issues, how happy they were with their current roommate, their description of an ideal roommate, and finally, the age group they prefer being roommates with. The survey indicated that most people did not find a roommate of their choice. Moreover, common problems included bathroom-sharing and space-sharing issues, noisy/messy people, and mismatched sleeping schedules. This caused people to be late for work/college, forced them to clean up the other person’s mess, and created issues in studying and sleeping. In trying to verbally resolve these issues with their roommate, many people found that things worsened: the roommate didn’t change, the person had to leave, or heated arguments took place.

The traditional way to find a roommate is to book a room, and the next person that books the same room is assigned as a roommate. Or, if there is no empty room available, people have to choose one of the already occupied rooms, without even knowing the person they are sharing it with. Briefly, they get paired off with a stranger. If people with opposite habits or interests get paired as roommates whenever they decide to move into a new place, it is going to cause disputes. It might affect the mental, physical, and emotional well-being of a person. For instance, if a person who always likes to keep things tidy gets a messy person as a roommate, it is going to be a headache for both of them. Everyone has different lifestyles and habits. Sometimes it is easy to adjust, but at times it becomes very difficult to cope with a person of the opposite nature. And if a person is not getting along with their roommate, then eventually they have to look for a better place to live in. The vicious circle of shifting continues if they don’t find the right roommate the second time too.

When the user registers on Troomate, the user would be asked a few simple questions about their preferences, like location, gender, hobbies, food habits, etc. Based on this user input, a list of potential roommates who have similar preferences as the user would be generated. This list would then be shown to the user in a best-match-first pattern. The user would also be able to contact their matches and even set up a meeting with them before moving in. Along with this, all the users of Troomate would be verified and also reviewed to maintain the integrity of the community. Every decision in life is taken so carefully that a person should also be able to find a roommate of their choice. By using this platform, one can choose a roommate based on filters and preferences.


2 Literature Survey ’ An exploratory investigation of the relationship between roommates’ first impressions and subsequent communication patterns [2].” This paper focuses on some roommate relationships that last longer while others don’t. It shows us that more positive initial interaction with roommates is the key to long-lasting relationships and will continue to live together in the upcoming time as well. Also, the roommates having more positive initial interactions use efficient conflict management strategies and avoid asocial or non-productive strategies. Predicted Outcome Value (POV) theory is on the impact of initial interaction on people which can help to determine the effects of future outcomes based on interpersonal relationships. This paper supports hypothesis: “H1. Roommates who are (a) still living together or (b) would live together in the future will report more positive initial impressions. H2. There will be a significant positive relationship between roommates’ initial impressions and satisfaction with subsequent conversations. H3a: Roommates who report more positive initial impressions will also report using more productive conflict management strategies with their roommates. H3b: Roommates who report more positive initial impressions will be less likely to use unproductive conflict management strategies.” From this paper, the conclusion is that for a better roommate relationship, initial interaction has to be positive. Therefore, in Troomate, a meeting feature is included which will cover the initial interaction among the people which will play an important part in ensuring the positive or negative future outcome of the roommate relationship. [3] “College Admissions and the Stability of Marriage.” The paper starts out by pointing out issues in college admissions for both parties: students and colleges. It discusses various uncertainties in the whole process, such as not knowing whether an applicant has applied elsewhere, what the ranking of the universities he has applied to is, and which of these other colleges will offer him admission. On the other hand, an applicant faces the risk of not getting his desired university by accepting another college’s offer and not waiting for the offer of his desired university. All these lead to a situation where it becomes difficult to get the desired quota of students of desired quality for colleges. The authors suggest a procedure for assigning applicants to colleges in such a way that is satisfactory to both parties: first, all applicants apply to the college of their first choice, this college then sends offers to the top candidates (within its quota) and rejects the rest, after which these rejected candidates apply to the college of their second choice, and so on. The paper goes on to extend the problem to another case: marrying off all people of a community in a satisfactory manner. However, the difference in this situation is that the number of women and men are equal, unlike the initial example where applicants would outnumber colleges. The authors proved that if there were two groups with an


equal number of elements, stable pairs for all can always be generated. The idea is that while someone from group X is unpaired, they go to the person who they have ranked highest from the other group Y. If this person is unpaired too, then these two people are paired, i.e., married. However, if the person from Y is already married, but prefers the new person from X, they will leave their current partner. Otherwise, if that is not the case, then this person from X will move on to the next person from Y in their rank list. The overall problem thus refers to situations where the solution to our problem is finding the most “stable” pairs possible. A pair (A, B) is said to be stable when A does not have a better choice available than B, and vice versa. A student choosing a university, men, and women looking to pair up (such as in the case of Tinder), are some practical examples of where we require to find stable pairs. In conclusion, the solution proposed by Gale and Shapley may be of use in matching problems and can be the basis for solving many real-life issues that involve finding stable pairs. [4] “House Allocation With Existing Tenants: An Equivalence.” In this paper, they showed that there is an important relationship between two intuitive house allocation mechanisms which are designed to avoid inefficiencies in those situations where there are existing tenants and newcomers. Since the core (or equivalently the competitive mechanism) is the undisputed mechanism in the context of housing markets, it is tempting to extend this mechanism via constructing an initial allocation by assigning existing tenants their current houses and randomly assigning vacant houses to newcomers. However, this extended mechanism grants initial property rights of vacant houses to newcomers, and therefore its equivalence: to the “newcomer favoring” top trading cycles algorithm is quite intuitive. Their result provides additional support for the top trading cycles mechanism by showing that its main competitor is a very biased special case. The first mechanism chooses the unique core allocation of a “sister exchange economy” which is constructed by assigning each existing tenant her current house and randomly assigning each newcomer a vacant house. The second mechanism -the top trading cycles mechanism first chooses an ordering from a given distribution and next determines the final outcome as follows: assign the first agent her top choice, the next agent her top choice among remaining houses, and so on, until someone demands house of an existing tenant who is still in the line. At that point modify the queue by inserting her at the top and proceed. Similarly, insert any existing tenant who is not already served at the top of the queue once her house is demanded. Whenever a loop of existing tenants forms, assign each of them the house she demands and proceed. Their main result is that the core-based mechanism is equivalent to an extreme case of the top trading cycles mechanism which orders newcomers before the existing tenants.”


[5] “Roommate Finder Application.” This paper highlights the fact that accommodation in today’s world has been soaring at high rates and finding a place that fits budget, preference, and proximity can be quite a challenging task. Even more so, if the person who is making the hunt is a student, and is looking for not just a suitable shelter but also someone to share it with. As per a study conducted, students who prefer living off-campus have increased by 13%. The reason this has become an attractive option is the fact that they can find accommodation as well as a roommate as per their preferences and taste. It explores an Android application that takes advantage of a variety of features to find a potential match for the user’s preference. It discusses multiple features for the user, a summary of which is provided below: • Type of user is determined at the time of login—a user could be searching for a roommate or an accommodation. • User can fill out their preferences and other information, as well as view the profiles of other users. • User can update their profiles, i.e., their preferences and information are editable. • A user looking for a roommate can look at potential matches, the extent to which they meet the user’s criteria can be displayed as percentages. The user can shortlist and then contact the potential matches too. The app talked about in the paper is focused on the university level. The registration process has six steps: providing a purpose of registration; basic details like name, contact information, etc.; university-specific information like year of graduation, major, etc.; adding apartment preference or roommate preferences; adding interests and activities; and finally, general preferences and additional notes. This discussed application also makes use of a search page that will allow the user to search for profiles on the basis of the university. These profiles can be added to a “shortlist” and can also be viewed in detail by clicking on them. The potential matches can be contacted via an in-app messaging system as well. A separate page is created for viewing potential matches along with the percentage reflecting the degree to which the profile matches. Adding to the shortlist as well as sending a message will be supported on this page as well. In conclusion, the app conceived in this paper can be used by a wide range of people as it satisfies the needs of two types of users, along with providing different types of communication to connect to two users like app-to-app messages, text messages to other user’s phone, and direct email to other users. [6] “Roommate similarity: Are roommates who are similar in their communication traits more satisfied?” “This study investigated whether roommates who were similar in their communication traits would express more satisfaction with an affinity for their roommates. This study looked at roommate similarity based on three communication traits: willingness to communicate, interpersonal communication competence, and verbal aggressiveness.


There were 403 participants in this study, of which, 203 were college students enrolled in introductory communication courses at a large midwestern university. Students took one questionnaire with them for completion by a roommate (N = 200). Participants were 219 females and 184 males. There were 98 female–female pairs, 20 female–male pairs, and 82 male–male pairs. Ages ranged from 15 to 57 (Mean = 20.20, Standard Deviation = 3.75). Participants (college students and their roommates) completed measures of their own communication traits and their feelings about their roommates. Results showed that roommates who were prosocially similar (when both roommates were high in willingness to communicate when both roommates were high in interpersonal communication competence, and when both roommates were low in verbal aggressiveness) reported the highest roommate satisfaction and liking. Based on this research paper, the social trait filter (extrovert, introvert, ambivert) was defined. This was defined in the results of this paper that people with similar communication styles tend to match well together. Hence, an extrovert person would match well with an extrovert roommate, an introvert person would get along with an introvert roommate, and so on. Consequently, this would lead to a harmonious relationship between roommates.” [7] “The impact of study groups and roommates on academic performance.” This paper uses random assignment of students to investigate the impact of study groups and roommates on academic achievement. One of the results of this study is that informal social interaction with roommates has a significant positive impact on academic achievement while study group peers have no observable impact. This suggests that social interaction is more effective in boosting academic outcomes than study groups that are designed for learning. To put it simply, this study is proof that good social interaction with roommates plays an important role in academic success as well. This makes getting the right roommates even more important. Having similar interests between roommates is one of the factors that lead to a good connection between and conversation between them. As a result, basic interests were also included in the filters that we’ve defined. Pairing people with similar interests would not only ensure a good relationship between roommates but would also boost their academic performance which would improve their relationship further.

3 Existing Platforms 3.1 Olx [8] Accepts location from users. An optional filter is provided to select a budget range, i.e., minimum to maximum range. After entering the location, the user is provided with a list of accommodations available with photos of the rooms/apartments/flats and their details.


3.2 Indianroommates [9] After entering the required city name, users are directed to a list of rooms available in the city, where each room has information such as room type, location, gender, and furnishing type, posted by.

3.3 Roomster [10] This website is quite similar to Indianroommates as well. It asks for location and gives you a listing of people who also require a roommate or room, with their details of the budget, move-in date, and listing type. It ensures that the users are genuine by verifying their email and mobile number.

3.4 Flatmatch [11] It is an app that takes user preferences but makes it mandatory to enter property information. It is not always the case that someone looking for a roommate also owns a property or room, so users cannot use the platform unless they have a property. After entering all the information, the user will get a list of recommendations with match percentages. After considering all the existing platforms, there is a need to create a dedicated platform for users to find roommates based on their preferences. This will help users establish a good roommate relationship with fewer conflicts.

4 Comparative Analysis There are various existing platforms which provide the facility to find a roommate, but very few provide the feature of preferences (Table 1).

5 Need of Finding Roommate

Most people do not get a choice in choosing their roommate and end up with a complete stranger as a roommate. If the roommate is not good enough, the person may feel trapped and claustrophobic; money can make the living situation very stressful; the person may pick up bad habits; and, most importantly, it can affect mental health.


Table 1 Comparison of existing applications

Flatmatch (App)
Features: Takes user preferences from the user as well as makes it mandatory to enter the room/flat details owned by a person, and displays recommendations with match percentage
Drawbacks (if any): Not every person searching for a roommate necessarily has a room/flat/property

Roomster (Website)
Features: Takes location as input and displays a list of people who also require a roommate or room, with details of their budget, move-in date, and listing type. Roommate preferences are optional
Drawbacks (if any): Displays a list and does not provide recommendations as per the user’s convenience by using enhanced filters

Indianroommates (Website)
Features: Takes location and sharing type as input and displays a list of rooms with their details
Drawbacks (if any): Does not use filters to display roommates; rather, it focuses on displaying details of the room

Olx (Website)
Features: Takes location and budget (optional) as input and gives a list of available flats and rooms and their details
Drawbacks (if any): Lack of precise filters, and the user has to search manually

Common roommate problems, to name a few:
• Different sleeping habits
• Borrowing (more than required)
• Living messy
• Arguments over bill splitting

As per the survey conducted for the paper, responses were collected from people aged 18–25, and 76% of people do not get a choice to choose their roommate (Fig. 1).

Fig. 1 Response of survey form


Fig. 2 How young people lived in 1968 versus 2012 [12]

According to Pew Research, there is a significant increase in the number of people from America, who prefer to live with roommates from the year 1968 to 2012 (Fig. 2). As many students are moving from rural areas to urban areas for education or jobs and by referring to the charts above, it is concluded that there is an increasing need for people to have roommates of their choice.

6 Proposed Work

Use of Gale–Shapley Algorithm [3]

The problem of finding the right roommate for a person can be seen as a pairing problem. Consider that it is required to create pairs among two groups. Let each person in each of these groups rank each person in the opposite group. For simplicity, each of these groups has four members. Let one group be represented by capital letters (group Y) and the other by lowercase letters (group X). This gives us two matrices of 4 × 4 (Table 2).

Table 2 Group X ranks group Y

      A   B   C   D
  a   1   2   3   4
  b   1   4   3   2
  c   2   1   3   4
  d   4   2   3   1


Table 3 Group Y ranks group X

      A   B   C   D
  a   3   3   2   3
  b   4   1   3   2
  c   2   4   4   1
  d   1   2   1   4
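To make the mechanics concrete, below is a minimal Python sketch (not code from [3] or from the Troomate system) of the Gale–Shapley proposal-and-rejection loop, using the preference lists encoded in Tables 2 and 3; the stable matching it prints is computed by the sketch itself, purely for illustration.

```python
# Gale-Shapley deferred acceptance: members of group X propose; members of
# group Y hold on to the best proposal seen so far and reject the rest.
# Preference lists are read off Table 2 (row-wise) and Table 3 (column-wise),
# most preferred first.
x_prefs = {"a": ["A", "B", "C", "D"], "b": ["A", "D", "C", "B"],
           "c": ["B", "A", "C", "D"], "d": ["D", "B", "C", "A"]}
y_prefs = {"A": ["d", "c", "a", "b"], "B": ["b", "d", "a", "c"],
           "C": ["d", "a", "b", "c"], "D": ["c", "b", "a", "d"]}

def gale_shapley(proposer_prefs, receiver_prefs):
    free = list(proposer_prefs)                # proposers not yet matched
    next_idx = {p: 0 for p in proposer_prefs}  # next receiver each proposer will try
    engaged = {}                               # receiver -> proposer

    while free:
        p = free.pop(0)
        r = proposer_prefs[p][next_idx[p]]     # highest-ranked receiver not yet tried
        next_idx[p] += 1
        if r not in engaged:
            engaged[r] = p                     # receiver was free: accept
        elif receiver_prefs[r].index(p) < receiver_prefs[r].index(engaged[r]):
            free.append(engaged[r])            # receiver trades up; old partner is freed
            engaged[r] = p
        else:
            free.append(p)                     # rejected: will try the next choice
    return dict(sorted((p, r) for r, p in engaged.items()))

print(gale_shapley(x_prefs, y_prefs))
# -> {'a': 'C', 'b': 'D', 'c': 'A', 'd': 'B'} (a stable pairing for these lists)
```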

This matrix represents the rankings of group X. Person a rates people from group Y as 1-A, 2-B, 3-C, 4-D, which means person a would like person A as a roommate the most, while person D the least. This matrix is read row-wise (Table 3). Similarly, the second matrix represents the ranking of group Y: person A ranks people of group X as 1-d, 2-c, 3-a, 4-b. This matrix is read column-wise.

Suppose (a, A) are paired up initially; then it can be observed that A has its third choice while a has its first choice. If A can break this pair and go for someone that is higher in A’s list, say c, and if c can break its current pair to go for A, then the new pair would be (A, c) and the previous ones would break (they are “unstable”). So, the goal is to find stable pairs for everyone, such that equilibrium is achieved.

The Elo Rating System [13]

Developed by Arpad Elo, the Elo Rating System is used in competitive chess to rate how good a player is. Starting off with an initial rating of 1000, a player’s rating goes up and down depending on wins and losses. It is used in tournaments to match players with similar abilities together. Up until 2019, it was also used by Tinder to decide matches; in Tinder, a win would mean a right swipe and a loss would mean a left swipe. A similar concept can be applied to match roommates.

Each player’s ability can be shown in the form of a bell curve. Each player has the potential to play at a range of different abilities. For instance, on some days a player plays really well, which means they pick high numbers, but on days when a player is sick, they play poorly, meaning they pick low numbers. Mostly, though, they play average, that is, they pick middle numbers. So if the frequencies of the numbers are looked at, they form a bell-shaped curve (Fig. 3). The center of the curve is that player’s average, which is that player’s rating. The Elo Rating System uses two formulae [15].

Formula derivation

Below is the derivation of the first formula. For this, the frequency of the differences between the two players’ numbers is looked at; this is the logistic curve. If a player has a rating that is 400 points more than another player, then they are 10 times more likely to win. So on the curve, the area to the right of zero will be 10 times the area to the left of zero (Fig. 4). If this statement is turned into a formula, this is the probability that Player A wins, where R_A and R_B are the ratings of Player A and Player B (Fig. 5).


Fig. 3 Playing strengths of two players [14, 15]

Fig. 4 Difference in playing strength showing area under the curve [14, 15]

P(A wins) = 10^((R_A − R_B)/400) × P(B wins)

Using this, the winner of the game can be predicted. What would happen if the player performs better or worse than the predicted outcome?
• If a player performs better than expected, then their rating would increase. The more surprising their win is, the more points they will get, up to a maximum of 32 points.
• Similarly, if a player performs worse than expected, their rating would decrease by up to 32 points.
Hence, after the game, the player rating is updated using the Update Formula, which is based on the difference between the expected score and the actual score.


Fig. 5 Simplification of formula [14]

New rating = rating + 32 × (score − expected score)

For example, suppose there is a player A whose rating is 1000. Assuming that A is the weaker player, A is supposed to lose the game, so the expected win is 0.35. In a turn of events, A actually wins the game and scores 1 point. Putting this in the formula, it is observed that Player A’s rating has increased:

New rating of A = 1000 + 32 × (1 − 0.35) = 1000 + 20.8 ≈ 1000 + 21 = 1021

This is the same amount that is decreased from the loser’s rating.
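The two formulas can be sketched in a few lines of Python; K = 32 follows the text, while the opponent rating of 1100 used below is only an assumption to illustrate the expected-score formula (the paper states the 0.35 expected score directly).

```python
def expected_score(r_a, r_b):
    # Probability that A beats B: a 400-point rating gap makes the stronger
    # player 10 times more likely to win (the logistic curve of Fig. 4).
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def updated_rating(rating, actual, expected, k=32):
    # New rating = old rating + K * (actual score - expected score)
    return rating + k * (actual - expected)

# Worked example from the text: expected score 0.35, actual win (score 1).
print(round(updated_rating(1000, actual=1, expected=0.35)))      # -> 1021
# The loser gives up the same ~21 points (opponent rating 1100 assumed here).
print(round(updated_rating(1100, actual=0, expected=1 - 0.35)))  # -> 1079
# Expected score for an assumed 100-point gap, close to the 0.35 in the text.
print(round(expected_score(1000, 1100), 2))                      # -> 0.36
```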

Our Solution

The input will be taken from users, which will be preferences like gender, location, social traits, and basic interests. The scores for each pair of users will be calculated. The aim is to build a solution that will recommend a suitable roommate by taking all the preferences into consideration. For simplicity, the easiest and shortest example with a few preferences is taken as input. Let the threshold be 2 (Table 4).

Table 4 Sample data of four people

           Weight = 1   Weight = 1   Weight = 0.45   Weight = 0.45
  Person   Location     Gender       Food            Var
  A        Pune         F            Veg             x
  B        Surat        M            Non-veg         y
  C        Pune         M            Vegan           x
  D        Pune         F            Veg             z


Here, the weight of each compulsory filter is considered as 1 and the weight of each additional (non-compulsory) filter as 0.45. The weights of the additional filters are taken as 0.45 because they should add up to less than the threshold. The threshold is the summation of the compulsory filter weights; in this case, it is 2. A user cannot be paired with himself/herself, so instead of evaluating the weight for oneself as 2.9, it is explicitly programmed as 0, i.e., mat[A][A] = 0. For mat[A][B], the data of A is compared with B (Table 5); therefore, mat[A][B] = 0 + 0 + 0 + 0 = 0. Comparing A with C (Table 6), mat[A][C] = 1 + 0 + 0 + 0.45 = 1.45. Comparing A with D (Table 7), mat[A][D] = 1 + 1 + 0.45 + 0 = 2.45. Similarly, the entire matrix is evaluated (Table 8). The results are thus: A [0, 0, 1.45, 2.45], B [0, 0, 1, 0], C [1.45, 1, 0, 1], D [2.45, 0, 1, 0].

Table 5 Score calculation of A and B

  A       Pune    F   Veg       x
  B       Surat   M   Non-veg   y
  Score   0       0   0         0

Table 6 Score calculation of A and C

  A       Pune   F   Veg     x
  C       Pune   M   Vegan   x
  Score   1      0   0       0.45

Table 7 Score calculation of A and D

  A       Pune   F   Veg   x
  D       Pune   F   Veg   z
  Score   1      1   0.45  0

Table 8 Final matrix with scores

      A      B   C      D
  A   0      0   1.45   2.45
  B   0      0   1      0
  C   1.45   1   0      1
  D   2.45   0   1      0

Table 9 Matched pair matrix

  A[D]   A
  B      B
  C      C
  D[A]   D

The data of each row is sorted in descending order, and a ranking (preference list) is filled only if the score is more than the threshold, which is 2 in this case:
• For A: [2.45, 1.45, 0, 0]. Thus, ranking: D.
• For B: [1, 0, 0, 0]. The ranking is not filled as the data is less than the threshold.
• For C: [1.45, 1, 1, 0]. The ranking is not filled as the data is less than the threshold.
• For D: [2.45, 1, 0, 0]. Thus, ranking: A.
Feeding these rankings to the Gale–Shapley step gives the matched pair matrix (Table 9), so A will be matched with D.
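The scoring-and-matching step above can be sketched as follows. The weights, threshold, and sample data come from Table 4 and the surrounding text; the function names are made up for the sketch, and the final mutual-top-choice pairing is a simplification of the full Gale–Shapley run that suffices for this small example.

```python
# Weighted preference scoring and pairing for the Table 4 sample data.
people = {
    "A": ("Pune",  "F", "Veg",     "x"),
    "B": ("Surat", "M", "Non-veg", "y"),
    "C": ("Pune",  "M", "Vegan",   "x"),
    "D": ("Pune",  "F", "Veg",     "z"),
}
weights = [1.0, 1.0, 0.45, 0.45]   # location, gender (compulsory); food, var (additional)
THRESHOLD = 2.0                    # sum of the compulsory-filter weights

def score(p, q):
    if p == q:
        return 0.0                 # a user cannot be paired with himself/herself
    return sum(w for w, a, b in zip(weights, people[p], people[q]) if a == b)

# Score matrix of Table 8, e.g. score("A", "D") == 2.45 and score("A", "C") == 1.45.
scores = {p: {q: score(p, q) for q in people} for p in people}

# Ranking per person: only candidates above the threshold, best match first.
ranking = {p: sorted((q for q in people if scores[p][q] > THRESHOLD),
                     key=lambda q: -scores[p][q]) for p in people}

# Pair users whose top choices are mutual (here only A and D qualify).
matches = {p: r[0] for p, r in ranking.items()
           if r and ranking[r[0]] and ranking[r[0]][0] == p}
print(matches)   # -> {'A': 'D', 'D': 'A'}
```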

7 Conclusion

There are many platforms available to search for roommates, but they just display a list of users searching for roommates along with details of the room. They generally focus more on finding rooms than roommates. There are very few dedicated platforms for finding roommates which use user preferences to recommend roommates based on their habits, cuisine, language, and much more. To ensure genuine users, OTP verification is mandatory. The details of a user are not shared with another user, thus ensuring security. After selecting the appropriate roommate, a virtual meeting will help them finalize their decision and not just blindly trust their Troomate. Therefore, Troomate has the advantage of recommending roommates to users based on their preferences. For finding roommates, the swap-based model was not preferred; hence, the Elo rating system was combined with the Gale–Shapley algorithm to develop an algorithm that would be suitable for the application.

References

1. Badjate M, Ponkshe S, Rohida S, Patel T, Deshpande P (2022) Family of friends—a hostel utility system. In: Tuba M, Akashe S, Joshi A (eds) ICT systems and sustainability. Lecture notes in networks and systems, vol 321. Springer, Singapore. https://doi.org/10.1007/978-981-16-5987-4_10
2. Marek CI, Wanzer MB, Knapp JL (2004) An exploratory investigation of the relationship between roommates’ first impressions and subsequent communication patterns. Commun Res Rep 21(2):210–220. https://www.researchgate.net/publication/233319197_An_exploratory_investigation_of_the_relationship_between_roommates'_first_impressions_and_subsequent_communication_patterns
3. Gale D, Shapley LS (1962) College admissions and the stability of marriage. Am Math Monthly 69(1):9–15. http://www.eecs.harvard.edu/cs286r/courses/fall09/papers/galeshapley.pdf
4. Sönmez T, Ünver MU (2001) House allocation with existing tenants: an equivalence. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=297175
5. Mehetre S, Biradar J, Malghe N, Patil S (2020) Roommate finder application. Int J Adv Res Comput Commun Eng 9(1). https://ijarcce.com/wp-content/uploads/2020/02/IJARCCE.2020.9135.pdf
6. Martin MM, Anderson CM (1995) Roommate similarity: are roommates who are similar in their communication traits more satisfied? Commun Res Rep 12(1):46–52. https://www.researchgate.net/profile/Matthew-Martin-29/publication/254264332_Roommate_Similarity_Are_Roommates_Who_Are_Similar_in_Their_Communication_Traits_More_Satisfied/links/5730fe6308aed286ca0dc521/Roommate-Similarity-Are-Roommates-Who-Are-Similar-in-Their-Communication-Traits-More-Satisfied.pdf
7. Jain T, Kapoor M (2013) The impact of study groups and roommates on academic performance. Indian School of Business, pp 18–20. https://eprints.exchange.isb.edu/id/eprint/264/1/1.2.pdf
8. Olx. https://www.olx.in/items/q-looking-for-roommate. Last accessed 14 Sept 2022
9. Indianroommates. https://www.indianroommates.in/. Last accessed 14 Sept 2022
10. Roomster. https://www.roomster.com/. Last accessed 14 Sept 2022
11. Flatmatch. https://flatmatch392335379.wordpress.com/. Last accessed 14 Sept 2022
12. Thompson D (2013) How roommates replaced spouses in the 20th century. The Atlantic. https://www.theatlantic.com/business/archive/2013/09/how-roommates-replaced-spouses-in-the-20th-century/279210/. Last accessed 14 Sept 2022
13. Elo AE (1967) The proposed USCF rating system, its development, theory, and applications. Chess Life XXII(8):242–247. http://uscf1-nyc1.aodhosting.com/CL-AND-CR-ALL/CL-ALL/1967/1967_08.pdf#page=26
14. Glickman ME (1995) A comprehensive guide to chess ratings. Department of Mathematics, Boston University, pp 1–49. http://www.glicko.net/research/acjpaper.pdf
15. singingbanana, Grime J (2019) The Elo rating system for chess and beyond [Online Video]. https://www.youtube.com/watch?v=AsYfbmp0To0. Last accessed 14 Sept 2022

A Smart System to Classify Walking and Sitting Activities Based on EEG Signal

Shripad Bhatlawande, Swati Shilaskar, Advait Kamathe, Chinmay Kulkarni, and Neelam Chandolikar

Abstract This paper presents a machine learning-based system to classify electroencephalogram (EEG) signals for walking and sitting actions. The system is proposed to generate a control signal for an artificial limb, which is aimed at helping limb amputees to carry out movements. EEG data is collected from eight healthy male subjects with a mean age of 21 years while they performed walking and sitting actions. The montage activity was captured using seven electrodes covering the frontal, central, and parietal regions. The variations of the EEG signal contain information about the physical activity being performed. The statistical features, namely mean, minimum, maximum, kurtosis, standard deviation, and skewness, are extracted from the EEG signal. An array of five classifiers, namely linear regression, support vector machine (SVM), K-nearest neighbors (KNN), decision tree, and random forest, was used for activity recognition. Random forest provided 100% accuracy of classification.

Keywords Brain activity · Electroencephalography · Walking versus sitting · Lower limb amputation · Machine learning · Random forest classifier

S. Bhatlawande · S. Shilaskar (B) · A. Kamathe · C. Kulkarni · N. Chandolikar Vishwakarma Institute of Technology, Pune 411037, India e-mail: [email protected] S. Bhatlawande e-mail: [email protected] A. Kamathe e-mail: [email protected] C. Kulkarni e-mail: [email protected] N. Chandolikar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_2


1 Introduction

Amputation is the process of surgically removing a limb or a part of a limb. Annually, around 150,000 people undergo amputation surgery [1]. An Indian study reported the number of amputees to be 10,00,000 [2]. People aged more than 60 commonly go through major lower limb amputation (LLA) [3]. The reason behind this is that 60% of LLA is caused by diabetes mellitus (DM) [4]. Other than DM, brain stroke is also a contributing factor, with 795,000 people affected annually [5]. Age groups other than the elderly also go through amputation, but the reasons might be different. The common causes of traumatic amputations, such as car accidents, machine accidents, work-related accidents, or military combat mishaps, were covered by Ketz et al. [6]. Amputation comes at the cost of dependency. Mobility gives independence and decides the quality of life [7]. Several mobility aids such as wheelchairs and prosthetics are described in [8]. The brain–computer interface (BCI) helps to control the lower limbs through an exoskeleton using thoughts [9]. The BCI uses electroencephalography (EEG), a technique used to capture brain activity. This paper aims to classify the sitting and walking positions from EEG signals.

2 Literature Review Electroencephalogram signals have received a lot of interest recently because they represent a person’s intention to do an action. EEG signals have been utilized by researchers to assist disabled people, control devices such as robotic arms, wheelchairs, and exoskeletons. As a result, for a BCI system, effective decoding of these signals is crucial. To decode the rhythms of EEG, Chaisaen et al. [10] first acquired data from eight healthy patients using 11 electrodes at frontopolar, central, and parietal positions. The sampling rate was set to 1200 Hz. Tortora et al. [11] followed a similar approach by recording EEG signals from 11 subjects using actiCap. Biosemi ActiveTwo System with 32 electrodes was used by Wang et al. [12] to collect EEG data. The sampling rate was 256 Hz. Gordleeva et al. [13] recorded EEG signals using NVK 52 with central electrodes. EEG signals of six subjects were captured by Ortiz et al. [14] with the StarstimR32. Electrodes covering the mid-scalp region were used for motion imagery detection analysis. A 32-channel EEG signal acquisition system was used by Roy et al. [15]. A 16-channel montage with frontal, central, and parietal positions was considered. Sampling frequency of 500 Hz was used by [13–15]. Electrode cap produced by NeuroScan Inc. was used by Liu et al. [16]. Nine electrodes covering central and surrounding regions were used. Xu et al. [17] collected EEG signal data from nine subjects with three central electrodes and sampled them with a sampling frequency of 250 Hz. Electroencephalogram signal preprocessing involves removal of noise using a notch filter of 50 Hz [10–15]. EEG signal was band-pass filtered using Butterworth filter [10, 12, 14, 15]. Tortora et al. [11] band-pass filtered raw EEG data between 1 and 98 Hz


using Chebyshev filters. Artifact Subspace Reconstruction was used to remove nonstereotypical movement artifacts [10, 15]. EEG signal was downsampled to 250 Hz after removing artifacts [10, 15]. Feature vectors were produced for classification task using Common Spatial Pattern filter [10–13]. Ortiz et al. [14] analyzed three methods, viz. fast Fourier transform, Stockwell transform, and Hilbert Huang transform, to extract feature characteristics. Cross-correlation approach was used by Roy et al. [15] to extract the different statistical features such as mean, maximum, skewness, standard deviation, minimum, and kurtosis. Xu et al. [17] employed the shorttime Fourier transform (STFT) to convert EEG signals to a set of time–frequency spectrum images. Mohamed et al. [19] compared two feature extraction techniques: empirical mode decomposition and intrinsic time-scale decomposition. Elsayed et al. [18] extracted features like general discontinuity spatial feature, spatial eye differences, spatial difference, temporal kurtosis, and maximum epoch from EEG dataset. SVM was employed for classifying EEG signals [10, 12, 23]. The linear discriminant analysis classifier has demonstrated to be effective for lower limb classification [11, 13]. SVM might be regarded as the preferred classifier algorithm when paired with Stockwell transform, according to Ortiz et al. [14], who studied the effectiveness of LDA and SVM for MI classification. Logistic regression was used along with a threefold cross-validation to validate the classification model by Roy et al. [15]. Deep learning models employed for classification showed slightly better results than traditional classification algorithms [17–22, 24, 25]. In comparison to typical machine learning techniques, deep learning models necessitate a vast amount of data and a high computing cost.
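The preprocessing chain most often reported above (a 50 Hz notch filter, Butterworth band-pass filtering, and downsampling to 250 Hz) can be sketched with SciPy as below; the sampling rate, pass band, and filter order are illustrative assumptions rather than values fixed by any single cited study.

```python
import numpy as np
from scipy import signal

def preprocess_eeg(raw, fs=500, band=(1.0, 40.0), notch_hz=50.0, target_fs=250):
    """Illustrative EEG cleanup: notch out mains hum, band-pass, downsample.
    raw: array of shape (n_channels, n_samples) sampled at fs Hz."""
    # 50 Hz notch filter to suppress power-line interference.
    b_n, a_n = signal.iirnotch(notch_hz, Q=30.0, fs=fs)
    x = signal.filtfilt(b_n, a_n, raw, axis=-1)

    # 4th-order Butterworth band-pass, applied zero-phase with filtfilt.
    b_bp, a_bp = signal.butter(4, band, btype="bandpass", fs=fs)
    x = signal.filtfilt(b_bp, a_bp, x, axis=-1)

    # Downsample to the target rate (integer decimation factor).
    return signal.decimate(x, fs // target_fs, axis=-1, zero_phase=True)

# Random data standing in for a 7-channel, 30-second recording at 500 Hz.
dummy = np.random.randn(7, 500 * 30)
print(preprocess_eeg(dummy).shape)   # -> (7, 7500)
```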

3 Methodology The proposed system acquires EEG signals from the subject, processes the signal, extracts the necessary features, and classifies the activity the user wants to perform. The overall block diagram for classification of EEG signals is shown in Fig. 1.

Fig. 1 Block diagram for classification of electroencephalogram signals (EEG Data → Preprocessing → Feature Extraction → Classification → Output)


3.1 Experimental Setup
The experimental setup used to record the EEG data is shown in Fig. 2. The data is recorded using a Digital Clarity BrainTech 24-channel EEG machine. The setup uses a time frame of 30 s. A single frame consists of three stages: resting, idle, and motor execution. A screen is used for providing video cues. In the first stage, i.e., the resting stage, the subject is supposed to be motionless and the screen is black. The next stage is the idle stage. After the idle stage, audio and video cues of the actions to be performed are shown. The subject performs the action along with the stimulus until the 30 s time frame elapses. The activity is repeated to capture different trials.
Activity Description
The collected dataset consists of EEG signals of eight healthy males of age 20–21 years. During collection, the electrode impedance was maintained under 10 kΩ. During motor execution (ME), the subjects focused only on performing the action. Each subject performed two trials of each action, and 32 trials were recorded in total. The subject performing the sitting and walking actions is shown in Figs. 3 and 4, respectively.
Montage Selection
An EEG montage is the logical placement of the electrodes over the scalp used to record the EEG data. The brain comprises various regions, each responsible for particular functions, and the montage targets the regions involved in the activity of interest. The part of the brain responsible for movement is called the motor cortex, which is located at the middle of the scalp. The motor cortex has a subdivision called the primary motor cortex, which is responsible for lower limb movement; so, in order to monitor lower limb movement, one must target the primary motor cortex. While deciding the montage, central and parietal positions, along with some temporal and frontal positions, were therefore targeted. Electrodes Fz, C3, C4, Cz, P3, and P4 were used. Figure 5 shows the selected montage, covering the central electrodes and parts of the frontal and parietal regions.

Fig. 2 Experimental setup for recording dataset

Fig. 3 Motor execution—sitting activity

Fig. 4 Motor execution—walking activity


Fig. 5 Electrode position montage for signal acquisition

3.2 Signal Dataset Generation Method Description
The dataset comprises eight subjects, of which seven were used for training and one for testing. Each subject was supposed to perform two actions for 30 s, and two trials were conducted for each action, so the total number of trials was 32. A total of 130,000 data samples were collected for each action, giving 260,000 samples in all, of which 30,000 were used for testing and the rest for training.
Preprocessing
EEG signals consist of different frequency bands. The bands, their frequency ranges, and their significance in various activities are described in Table 1. Brain map plots of different frequency bands during the walking activity are shown in Fig. 6, and brain map plots of different frequency bands during the sitting activity are shown in Fig. 7.
Table 1 EEG signal frequency bands

Name  | Frequency range (Hz) | Activity
Delta | 0.5–4                | Deep sleep
Theta | 4–8                  | Deep meditation
Alpha | 8–12                 | Awake but quiet
Beta  | 12–30                | Awake and alert
Gamma | Above 30             | Actively learning


Fig. 6 Brain map of walking activity

Fig. 7 Brain map of sitting activity

Offline preprocessing of the EEG signal involves a notch filter with a cutoff frequency of 50 Hz to eliminate mains interference. The EEG signal was then band-pass filtered between 10 and 500 Hz using a fifth-order Butterworth filter. The original and filtered waveforms for the walking and sitting activities are shown in Figs. 8 and 9, respectively.
Fig. 8 Waveform of filtered signal—walking activity


Fig. 9 Waveform of filtered signal—sitting activity
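To make the preprocessing step concrete, the following is a minimal Python sketch of a 50 Hz notch filter followed by a fifth-order Butterworth band-pass filter, as described above. The sampling rate, channel count, and placeholder data are assumptions for illustration only, not values taken from the recording setup.

import numpy as np
from scipy.signal import iirnotch, butter, filtfilt

fs = 1200.0                                    # assumed sampling rate (Hz)
eeg = np.random.randn(6, int(fs * 30))         # placeholder: 6 channels x 30 s frame

# 50 Hz notch filter to suppress mains interference
b_notch, a_notch = iirnotch(w0=50.0, Q=30.0, fs=fs)
eeg_notched = filtfilt(b_notch, a_notch, eeg, axis=1)

# Fifth-order Butterworth band-pass filter with the 10-500 Hz band reported in the text
b_bp, a_bp = butter(N=5, Wn=[10.0, 500.0], btype='bandpass', fs=fs)
eeg_filtered = filtfilt(b_bp, a_bp, eeg_notched, axis=1)

Any sampling rate above 1 kHz would accommodate this band; with a lower rate, the upper cutoff would need to be reduced below the Nyquist frequency.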

3.3 Feature Extraction The next stage in EEG signal classification is extracting features using different signal processing techniques. EEG signals in the time–frequency domain are obtained using the spectrogram, which allows data to be extracted from both domains at the same time. Spectrograms of the filtered EEG signals during the walking and sitting activities are shown in Figs. 10 and 11, respectively. A windowing technique is used instead of using the entire signal for feature extraction: a window of a particular length and overlap of data points is specified, and this window slides along the entire length of the signal, extracting features for each slice. Spectrograms of windows of length 250, 500, and 1000 data points are shown in Fig. 12. Spectrograms of windows with an overlap of 25%, 50%, and 75% are shown in Fig. 13. In Fig. 12, the 500-length window holds pattern variation better than the 250- and 1000-length windows. In Fig. 13, the 50% overlap retains variations in patterns better than the other two.
Fig. 10 Spectrogram of EEG signal (Fz-Cz)—walking activity


Fig. 11 Spectrogram of EEG signal (Fz-Cz)—sitting activity

Fig. 12 Spectrogram of EEG signal with varying window sizes

Fig. 13 Spectrogram of EEG signal with varying overlap
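A small sketch of the spectrogram computation discussed above follows, using a 500-sample window with 50% overlap. The sampling rate and the synthetic single-channel trace are placeholders, not values from the paper.

import numpy as np
from scipy.signal import spectrogram

fs = 1200.0                                    # assumed sampling rate (Hz)
signal = np.random.randn(int(fs * 30))         # placeholder single-channel EEG trace
window_len = 500                               # window length in samples (as in Fig. 12)
overlap = window_len // 2                      # 50% overlap (as in Fig. 13)
freqs, times, Sxx = spectrogram(signal, fs=fs, nperseg=window_len, noverlap=overlap)
# Sxx[i, j] is the power at frequency freqs[i] in the window centred at times[j]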


Statistical features such as mean, median, minimum, maximum, skewness, kurtosis, and standard deviation are the simplest features that can be extracted from an EEG signal. The statistical features which showed the highest amount of variation were extracted. Box plots were used to compare the features of the two classes. High variation was observed in the box plots of the minimum, maximum, standard deviation, skewness, and kurtosis. Features with high variation result in better training of the model. The comparison of box plots in Fig. 14 shows the variation of the minimum feature. The variation in the maximum feature of both classes is shown in Fig. 15. The box plots in Fig. 16 show the variation of the standard deviation feature. A window of 500 samples with 50% overlap is used to extract features like mean, maximum, minimum, standard deviation, skew, and kurtosis. The extracted features were then used as input for classification. The data obtained after extracting features was normalized to improve the accuracy of the classifiers. The total number of feature vectors generated was 81.
Fig. 14 Comparison of minimum value feature using box plot

Fig. 15 Comparison of maximum value feature using box plot


Fig. 16 Comparison of standard deviation using box plot
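The window-based statistical feature extraction described above can be sketched as follows; the window length, overlap, placeholder signal, and normalization step are illustrative assumptions rather than the authors' exact implementation.

import numpy as np
from scipy.stats import skew, kurtosis

def window_features(signal, window_len=500, overlap=0.5):
    step = int(window_len * (1 - overlap))
    features = []
    for start in range(0, len(signal) - window_len + 1, step):
        w = signal[start:start + window_len]
        features.append([w.mean(), w.min(), w.max(), w.std(), skew(w), kurtosis(w)])
    return np.asarray(features)

signal = np.random.randn(130000)               # placeholder samples for one action
X = window_features(signal)                    # one feature row per window
X = (X - X.mean(axis=0)) / X.std(axis=0)       # normalize features before classification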

3.4 Classification Logistic regression is a linear classifier: its decision boundary is a straight line (a hyperplane in higher dimensions). Logistic regression is primarily used for binary classification where the dependent variable is categorical (e.g., walking or sitting). The logit function is given by:

\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \quad (1)

where P is the probability of an event occurring. SVM determines the best hyperplane by maximizing the distance from each category. SVM learns from the extreme cases, the boundary values, unlike any other machine learning algorithm which learns from standard values. The support vectors are the points nearest to the decision border. SVM works great even with a small amount of training data. They are also not biased by outliers and not sensitive to overfitting. When data is not linearly separable RBF kernel can be used. The kernel function is given by:   ||X 1 − X 2 ||2 K (X 1 , X 2 ) = exp − 2σ 2

(2)

where σ is the variance and ||X 1 − X 2 ||2 is the Euclidean distance between two points X 1 and X 2 . KNN chooses K-nearest data points of the new query data and classifies it according to the most frequent occurring label among these neighbors. K-nearest data points are identified using a distance metric like Euclidean distance given by:


d(P, Q) = \sqrt{\sum_{i=1}^{n} (Q_i - P_i)^2} \quad (3)

Fig. 17 Error rate versus K value

The optimum value of K is chosen by plotting K versus the error rate. The value of K for which the minimum error is observed is chosen as the optimum value; according to Fig. 17, the minimum error was observed for K = 5. A decision tree contains two types of nodes: decision nodes and leaf nodes. Decision nodes are used to make decisions and have several branches; the results of those choices are leaf nodes. Decision trees discover the splits that best separate the classes at each node. The splitting criterion used is the Gini index, given by:

\text{Gini Index} = 1 - \sum_{j} P_j^2 \quad (4)

where P_j represents the likelihood that an element falls into a specific class. The maximum depth is set to 4 after hyperparameter tuning, which increased the accuracy by about 10%. Random forest is a supervised machine learning algorithm which uses multiple decision trees to predict the output. In the scope of this system, the number of trees used is 10. The logic behind choosing the number of trees is explained in Algorithm 1.


Algorithm 1. Random forest hyperparameter tuning
Input: Array of different estimator values
Output: Estimator which gave the highest accuracy
1. Initialize estimators E = {x_0, x_1, ..., x_{n-1}}
2. Initialize max_accuracy = 0
3. Initialize optimal_estimators
4. for i = 0 to n-1 do
5.   model = train(n_estimators = E[i])
6.   accuracy = evaluate(test, model)
7.   if accuracy > max_accuracy then
8.     max_accuracy = accuracy
9.     optimal_estimators = E[i]
10.  end if
11. end for
12. print(max_accuracy)
13. print(optimal_estimators)
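A possible Python rendering of Algorithm 1 using scikit-learn is shown below; the estimator grid, synthetic data, and train/test split are assumptions standing in for the windowed EEG features.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X = np.random.randn(500, 6)                    # placeholder feature matrix (6 statistical features)
y = np.random.randint(0, 2, size=500)          # placeholder labels: 0 = sitting, 1 = walking
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

estimators = [5, 10, 20, 50, 100]              # assumed candidate n_estimators values
max_accuracy, optimal_estimators = 0.0, None
for n in estimators:
    model = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    if accuracy > max_accuracy:                # keep the estimator count with the best accuracy
        max_accuracy, optimal_estimators = accuracy, n
print(max_accuracy, optimal_estimators)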

The flowchart of the classification of EEG signals for the walking and sitting actions is shown in Fig. 18. The system takes the EEG input from the user; the 24-channel EEG machine is used to acquire the input. After EEG acquisition, the signal is filtered using various techniques. A notch filter with a cutoff frequency of 50 Hz is used. After removing the mains artifacts, it is necessary to select the operating frequency band. The system uses the Alpha, Beta, and Gamma bands, as the subject can be in any of the mentioned states. To select a particular frequency band, a band-pass filter is used with a cutoff frequency of 10–500 Hz. The next block is feature extraction: the system extracts statistical features of the EEG signal, and these features were used to train the model. The classifier used was random forest with ten estimators. The classifier then classifies the signal into two classes, i.e., sitting and walking. If the class of action is sitting, then the system performs the sitting action, and vice versa. The proposed physical implementation of the system is shown in Fig. 19. The system can be implemented on a low-power processor-based System on Chip (SoC) like the Jetson Nano. The system will receive the EEG signal as input, which will then be filtered and preprocessed. The statistical features will be extracted and fed to a machine learning model for classification. The random forest classifier will classify the signals into two classes, sitting and walking. The class of the output will decide what kind of action to perform. Stepper motors will be attached to a physical exoskeleton to bring precision to the movements. Depending on the input, the step angle of the motor can be precisely selected, and the torque can also be controlled by selecting the type of step sequence, either half step or full step. As the system will be implemented on a Jetson Nano board, the overall power consumption is low.

Fig. 18 Flowchart of classification of electroencephalogram signals for walking and sitting actions (Start → Capture EEG Signal → Notch and Bandpass Filtering → Statistical Feature Extraction → Classification → if result = Sitting, perform sitting action; otherwise, perform walking action)

4 Results Six different machine learning algorithms were trained on the dataset and tested on new samples, and the performance of these models is compared. The accuracies of the classifiers are shown in Fig. 20. Linear classifiers like logistic regression and SVM with a linear kernel have shown poor results. Nonlinear classifiers have shown higher accuracies, with random forest giving the highest accuracy of 100%. This shows that the data is not linearly separable. The classification report, which consists of the Precision, Recall, F1 Score, and Accuracy of the nonlinear models, is given in Table 2. Parameters for the same models are compared in Fig. 21. The receiver operating characteristic (ROC) curve and area under the curve (AUC) of the random forest classifier are shown in Fig. 22.

Fig. 19 Proposed hardware implementation (EEG Data as Input → Preprocessing → Feature Extraction → Classification on a Processor-Based System → Output to Stepper Motor → Exoskeleton Movement)

Fig. 20 Comparison of accuracy of classifiers

Table 2 Classification report of nonlinear models

Model         | Precision (%) | Recall (%) | F1 Score (%) | Accuracy (%)
SVM-RBF       | 100           | 87         | 93           | 93
KNN           | 100           | 90         | 95           | 95
Decision tree | 98            | 100        | 99           | 99
Random forest | 100           | 100        | 100          | 100


Fig. 21 Comparison of nonlinear models

Fig. 22 Area under curve and receiver operating characteristic curve

5 Conclusion This paper presents a machine learning approach to classify EEG signals for walking and sitting actions. EEG data is collected from eight healthy subjects. The frontal, central, and parietal regions are used to acquire the signals. Minimum, maximum, mean, standard deviation, skewness, and kurtosis are the statistical features extracted from the signal. Logistic regression, SVM, KNN, decision tree, and random forest classifiers were trained, and the results are compared. Random forest provided 100% accuracy. The proposed system consists of a low-computational-cost model with high accuracy. Results obtained from this system are useful to control lower limb exoskeletons, which will help amputees achieve independent mobility.


Acknowledgement This work is supported by AICTE, Government of India, New Delhi, File No. 8-53/FDC/RPS (POLICY-I)/2019-20 and VIT Pune.

References 1. Newhall K, Spangler E, Dzebisashvili N, Goodman DC, Goodney P (2016) Amputation rates for patients with diabetes and peripheral arterial disease: the effects of race and region. Ann Vasc Surg 30:292–298 2. Sahu A, Sagar R, Sarkar S, Sagar S (2016) Psychological effects of amputation: a review of studies from India. Ind Psychiatry J 25(1):4 3. Fortington LV, Rommers GM, Geertzen JH, Postema K, Dijkstra PU (2012) Mobility in elderly people with a lower limb amputation: a systematic review. J Am Med Directors Assoc 13(4):319–325 4. Ephraim PL, Dillingham TR, Sector M, Pezzin LE, MacKenzie EJ (2003) Epidemiology of limb loss and congenital limb deficiency: a review of the literature. Arch Phys Med Rehabil 84(5):747–761 5. Boehme AK, Esenwa C, Elkind MS (2017) Stroke risk factors, genetics, and prevention. Circ Res 120(3):472–495 6. Ketz AK (2008) Pain management in the traumatic amputee. Crit Care Nurs Clin North Am 20(1):51–57 7. Bilodeau S, Hébert R, Desrosiers J (2000) Lower limb prosthesis utilization by elderly amputees. Prosthet Orthot Int 24(2):126–132 8. Asif M, Tiwana MI, Khan US, Qureshi WS, Iqbal J, Rashid N, Naseer N (2021) Advancements, trends and future prospects of lower limb prosthesis. IEEE Access 9:85956–85977 9. Kilicarslan A, Prasad S, Grossman RG, Contreras-Vidal JL (2013) High accuracy decoding of user intentions using EEG to control a lower-body exoskeleton. In: 2013 35th annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 5606–5609 10. Chaisaen R, Autthasan P, Mingchinda N, Leelaarporn P, Kunaseth N, Tammajarung S, Manoonpong P, Mukhopadhyay SC, Wilaiprasitporn T (2020) Decoding EEG rhythms during action observation, motor imagery, and execution for standing and sitting. IEEE Sens J 20(22):13776–13786 11. Tortora S, Artoni F, Tonin L, Chisari C, Menegatti E, Micera S (2020) Discrimination of walking and standing from entropy of EEG signals and common spatial patterns. In: 2020 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, pp 2008–2013 12. Wang C, Wu X, Wang Z, Ma Y (2018) Implementation of a brain-computer interface on a lower-limb exoskeleton. IEEE Access 6:38524–38534 13. Gordleeva SY, Lobov SA, Grigorev NA, Savosenkov AO, Shamshin MO, Lukoyanov MV, Khoruzhko MA, Kazantsev VB (2020) Real-time EEG–EMG human–machine interface-based control system for a lower-limb exoskeleton. IEEE Access 8:84070–84081 14. Ortíz M, Rodriguez-Ugarte M, Iáez E, Azorín JM (2018) Comparison of different EEG signal analysis techniques for an offline lower limb motor imagery brain-computer interface. In: 2018 40th Annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 203–206 15. Roy, Ganesh, Dristanta Nirola, and Subhasis Bhaumik. “An approach towards development of brain controlled lower limb exoskeleton for mobility regeneration.“ In 2019 IEEE region 10 symposium (TENSYMP), pp. 385–390. IEEE, 2019. 16. Liu Y-H, Lin L-F, Chou C-W, Chang Y, Hsiao Y-T, Hsu W-C (2019) Analysis of electroencephalography event-related desynchronisation and synchronisation induced by lower-limb stepping motor imagery. J Med Biol Eng 39(1):54–69


17. Xu G, Shen X, Chen S, Zong Y, Zhang C, Yue H, Liu M, Chen F, Che W (2019) A deep transfer convolutional neural network framework for EEG signal classification. IEEE Access 7:112767–112776 18. Mohamed EA, Yusoff MZ, Malik AS, Bahloul MR, Adam DM, Adam IK (2018) Comparison of EEG signal decomposition methods in classification of motor-imagery BCI. Multimedia Tools Appl 77:21305–21327 19. Elsayed NE, Tolba AS, Rashad MZ, Belal T, Sarhan S (2021) A deep learning approach for brain computer interaction-motor execution EEG signal classification. IEEE Access 9:101513– 101529 20. Dose H, Møller JS, Iversen HK, Puthusserypady S (2018) An end-to-end deep learning approach to MI-EEG signal classification for BCIs. Expert Syst Appl 114:532–542 21. Amin SU, Alsulaiman M, Muhammad G, Mekhtiche MA, Hossain MS (2019) Deep Learning for EEG motor imagery classification based on multi-layer CNNs feature fusion. Future Gener Comput Syst 101:542–554 22. Wang P, Jiang A, Liu X, Shang J, Zhang L (2018) LSTM-based EEG classification in motor imagery tasks. IEEE Trans Neural Syst Rehabil Eng 26(11):2086–2095 23. Zhang Y, Wang Y, Zhou G, Jin J, Wang B, Wang X, Cichocki A (2018) Multi-kernel extreme learning machine for EEG classification in brain-computer interfaces. Expert Syst Appl 96:302– 310 24. Bhatti MH, Khan J, Khan MU, Iqbal R, Aloqaily M, Jararweh Y, Gupta B (2019) Soft computing-based EEG classification by optimal feature selection and neural networks. IEEE Trans Ind Inf 15(10):5747–5754 25. Yang J, Yao S, Wang J (2018) Deep fusion feature learning network for MI-EEG classification. IEEE Access 6:79050–79059

Forest Fire Detection and Classification Using Deep Learning Concepts P. Nishanth and G. Varaprasad

Abstract Wildfires pose a major risk to humans and other species, but thanks to advances in remote sensing techniques, they are now being continuously observed and regulated. The existence of wildfires in the environment is indicated by the deposition of smoke in the atmosphere. Observation of fire is critical in fire alarm systems for reducing losses and other fire hazards with social consequences. To avoid massive fires, effective detectors from visual scenarios are crucial. A convolution neural network (CNN)-based system has been used to improve fire detection accuracy. Segregating inputs into training and testing subspaces is a vital aspect of the Inception-v3 architecture (Vani in Deep learning based forest fire classification and detection in satellite images. IEEE, pp 61–65, 2019 [6]). By default, a larger portion of the data is used for training and a smaller portion for testing, and accuracy vs loss graphs for the training and testing data are plotted for data visualization. Keywords Convolutional neural networks (CNNs) · Deep learning (DL) · Fire detection · Fire categorization · Inception-v3 · Local binary pattern (LBP)

1 Introduction Wildfires pose a serious threat to human life, animals, and flora around the planet. They necessitate a faster reaction and wider detection in areas where traditional fire detection methods are ineffective. In general, the forest provides a habitat for a wide range of living species and resources; it regulates CO2 emissions and has a complex ecosystem. Bushfires can be categorized according to their pace, texture, and size. Wildfires burn around 90% of the world's evergreens each year. Wildfires are a disaster waiting to happen, and they are difficult to control in forests. Lightning, volcanic eruptions, and the spontaneous combustion of dry plants are all natural causes of fires, in addition to those caused by humans.
P. Nishanth (B) · G. Varaprasad
Department of Computer Science, BMSCE, Bangalore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_3


2 Related Works In the paper [1], the authors devised a four-stage approach incorporating video-based data techniques to detect wildfires. A background subtraction technique was used first to discover moving zones. Second, candidate fire zones were identified using the CIE Lab color space. Third, because candidate zones may contain dynamic fire-like entities, wavelet transformation was used to differentiate between actual fire and fire-like entities [1]. Lastly, an SVM was used to label the zone of interest as either a blaze or a non-fire. The findings showed that the recommended wildfire detection method had a high sensing accuracy (93.5%) and a low false-alarm rate (6.8%) [1]. In the paper [2], the authors described a YOLO version-3-based learning model for concurrent drone-based wildfire detection. It is a deep learning fire detection process that incorporated an unmanned aerial vehicle (UAV) to improve detection accuracy and efficiency. As a first step, the YOLO version 3 wide-range network was developed to guarantee sensing accuracy. The technique was applied to the UAV-FFD platform, which enabled the drone to record and send real-time fire feeds to the workstation. According to the testing findings, the detection algorithm's recognition rate is around 90%, and the frame rate can reach up to 30 frames/s [2]. The detection of wildfires in video images was described in the paper [3]. A conflagration is a natural calamity that has a significant impact on the habitat, the ecology, and the lives of people. In a building, sensors can detect a fire and trigger evacuation; outdoor fire notification, on the other hand, was enabled via image-based fire detection. Many studies have extracted fire properties from digital data to provide wildfire notice and rescue aid. However, a false warning occurs when the sensing system detects a small flame that can extinguish itself over time; if such pseudo-alarms occur regularly, the initial possibility of rescue is lost. The goal was to bring down the prevalence of pseudo-alarms by providing a system that incorporated deep learning as well as the hidden Markov model (HMM). The conditional transformation of images and frames was specified, and the status transition relationship was then applied to the notification classes. As a consequence of the tests, all deep learning architectures could detect fire at a rate of more than 96%. The system, in particular, has the potential to save up to 88.54% of emergency personnel effort. In the paper [4], the authors described an IoT-enabled fire indicator and monitoring framework based on NodeMCU for wildfire sensing [4]. In this work, the NodeMCU is linked to a thermistor, a smoke detector, and an alert. The thermistor detects warmth, the smoke detector detects any smoke resulting from combustion, and the Arduino-connected flame siren triggers an alarm. Whenever a fire breaks out, it consumes the nearby surroundings and emits smoke. In addition, an LCD has been connected to the NodeMCU system. Because of IoT innovation, the NodeMCU fire-check mechanical criterion was met. When it detects fire or smoke, it triggers an Ethernet reaction to notify the client. Similarly, the NodeMCU-based interface with the LCD indicator unit is used to display system status, such as whether or not smoke


and overheating have been detected, using the Arduino IDE with an ESP8266 chip. Furthermore, NodeMCU's interface with the Ethernet component has been designed in a way that the client understands the principal condition message better, i.e., the flame identification is reported to the client. In the paper [5], the authors describe the use of CNNs for wildfire sensing [5]. Two well-established techniques for wildfire detection are video monitoring systems with sophisticated cameras that incorporate pattern recognition, and remote sensing, both of which have been deployed. Because these techniques are pricey, they are out of reach for most medium-sized and small consumers. Systems based on distributed sensor networks require less power and are less expensive. In this system, a camera has been paired with moisture and smoke sensors. Due to recent breakthroughs in deep learning with convolutional neural networks (CNNs), analyzing camera images and identifying objects is a suitable technique for determining the presence of wildfire or smoke. They used a widely used wildfire image database to train and evaluate CNN classifiers, as well as analyze their categorization accuracy. Their findings showed that CNNs can accurately recognize the presence of fire and smoke in wildfire images.

3 Proposed Approach Using an optimal mixture of convolution and pooling layers with an affordable computational cost, we propose and analyze a CNN to categorize wildfire images. We intend to use the CNN model, as it is essential to identify wildfires in the proposed method. A prevalent observation in learning models is that the more training data we have, the more accurate the model will be. The images are classified using the Inception-v3 framework. Images of active forest fires are used to generate the training data. Representative polygons with and without fire are also used to generate training pixels. The network can be built to differentiate between blaze and non-fire using the CNN method, and the CNN produces the results and classification accuracy. The Inception-v3 framework is trained on datasets containing fire, and LBP is applied to the matched photograph to determine the fire-affected area and to filter the regions in the input image where fire is present [6]. Figure 1 shows the fire categorization and detection architecture diagram. The formulas for maximum and mean pooling are given in Eqs. (1) and (2), respectively:

P_{ij} = \max_{i=1,\ldots,s;\; j=1,\ldots,s} (F_{ij}) \quad (1)

P_{ij} = \frac{1}{s^2} \sum_{i=1}^{s} \sum_{j=1}^{s} F_{ij} \quad (2)

where F is the feature map, F_{ij} is the feature map's pixel value at coordinates (i, j), s × s is the size of the pooled domain, and P is the actual pooled result.
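To illustrate Eqs. (1) and (2), a minimal NumPy sketch of non-overlapping max and mean pooling follows; it assumes a square feature map whose side is divisible by the pooling size s, and the toy array is invented.

import numpy as np

def pool(feature_map, s=2, mode="max"):
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // s, s, w // s, s)   # split into s x s blocks
    if mode == "max":
        return blocks.max(axis=(1, 3))                   # Eq. (1): maximum pooling
    return blocks.mean(axis=(1, 3))                      # Eq. (2): mean pooling

F = np.arange(16, dtype=float).reshape(4, 4)             # toy 4 x 4 feature map
print(pool(F, s=2, mode="max"))
print(pool(F, s=2, mode="mean"))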


Fig. 1 Fire categorization and detection architecture diagram. Source Paper [6]

The counts of the training and test data are visualized by passing the dataset path to the PowerBI tool and extracting the folder contents into a pie chart; Fig. 2 shows the resulting training and test data dashboard. The main reason for using the PowerBI tool is to present the counts of test and training data in visualized form as a dashboard: in real time, the numbers of test and training samples increase, so instead of providing the counts in a tabular column or as text, they are provided as a dashboard using a pie chart.

Fig. 2 Dashboard of training and test data visualization in power BI


4 Evaluation Metrics Using the datasets, precision, recall, and F1 measures are computed to assess the design's efficiency and correctness. True positives are denoted by TP (the classifier recognizes fire in an image region with flames); false positives are denoted by FP (the classifier detects fire in an image region without blaze); and false negatives (pseudo-negatives) are denoted by FN (the classifier does not detect fire in an image region with flames) [6].

Precision = TP/(TP + FP) (3)

Recall = TP/(TP + FN) (4)

F1-Score = 2 × (Precision × Recall)/(Precision + Recall) (5)
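As a small worked example of Eqs. (3)-(5), the function below computes the three measures from raw TP/FP/FN counts; the counts are made-up numbers, not results from the paper.

def prf1(tp, fp, fn):
    precision = tp / (tp + fp)                            # Eq. (3)
    recall = tp / (tp + fn)                               # Eq. (4)
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (5)
    return precision, recall, f1

print(prf1(tp=90, fp=5, fn=10))                           # e.g. roughly (0.947, 0.900, 0.923)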

5 Results The training image count is 5000, with 2500 images containing fire and the other 2500 containing non-fire, and the test image count is 50, as per the Kaggle repository. The images are labeled as fire/blaze and non-fire, and training was performed with a traditional CNN to detect fire/blaze and non-fire images; the model predicts the class of a given image, and the testing accuracy is 94%. The loss and accuracy of the training data and the loss and accuracy of the testing data are plotted in Figs. 3 and 4, respectively.

Fig. 3 Loss and accuracy of training data are being plotted


Fig. 4 Loss and accuracy of testing data are being plotted

6 Conclusion The research builds on a previously proposed wildfire sensing system to develop a CNN framework that maximizes the wildfire detection rate while minimizing computing costs. This component of the study is critical since it will be applied to a sample wildfire-detecting technique. A CNN-based Inception-v3 for wildfire detection is proposed to enhance the system's performance. The Inception-v3 architecture tends to detect characteristics automatically. This framework achieves a high sensing rate, especially for exploratory and analyzed outcomes. On a dataset of wildfire and fire-free images, CNN models with various mixtures of convolution, fully connected, and pooling layers are evaluated and validated. In the future, the method could be implemented on a Raspberry Pi or integrated as software linked to a 360-degree high-tech surveillance camera, allowing it to quickly identify forest fires and mitigate or narrow down the massive damage or loss. Declaration I, the corresponding author, declare that this manuscript is original, has not been published before, and is not currently being considered for publication elsewhere. I can confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. I further confirm that the order of authors listed in the manuscript has been approved by all of us. Signed by the corresponding author on behalf of all the other authors. NISHANTH P

References 1. Mahmoud MAI, Ren H (2019) Forest fire detection and identification using image processing and SVM. J Inf Process Syst 15(1):159–168 2. Jiao Z, Zhang Y, Mu L, Xin J, Jiao S, Liu H, Liu D (2020) A yolov3-based learning strategy for real-time UAV-based forest fire detection. In: Chinese control and decision conference (CCDC). IEEE, pp 4963–4967 3. Hung KM, Chen LM, Wu JA (2019) Wildfire detection in video images using deep learning and HMM for early fire notification system. In: 2019 8th International congress on advanced applied informatics (IIAI-AAI). IEEE, pp 495–498 4. Basu MT, Karthik R, Mahitha J, Reddy VL (2018) IoT based forest fire detection system. Int J Eng Technol 7(2.7):124–126


5. Khan MS, Patil R, Haider S (2020) Application of convolutional neural networks for wild fire detection. In: 2020 SoutheastCon, vol 2. IEEE, pp 1–5 6. Vani K (2019) Deep learning based forest fire classification and detection in satellite images. In: 2019 11th International conference on advanced computing (ICoAC). IEEE, pp 61–65 7. Wu S, Zhang L (2018) Using popular object detection methods for real-time forest fire detection. In: 2018 11th International symposium on computational intelligence and design (ISCID), vol 1. IEEE, pp 280–284 8. Kumar N, Kumar A (2020) Australian bushfire detection using machine learning and neural networks. In: 2020 7th International conference on smart structures and systems (ICSSS). IEEE, pp 1–7 9. Muhammad K, Ahmad J, Mehmood I, Rho S, Baik SW (2018) Convolutional neural networks based fire detection in surveillance videos. IEEE Access 6:18174–18183 10. Zhang Q, Xu J, Xu L, Guo H (2016) Deep convolutional neural networks for forest fire detection. In: 2016 International forum on management, education and information technology application. Atlantis Press, pp 568–575 11. Kinaneva D, Hristov G, Raychev J, Zahariev P (2019) Early forest fire detection using drones and artificial intelligence. In: 2019 42nd International convention on information and communication technology, electronics and microelectronics (MIPRO). IEEE, pp 1060–1065 12. Shi J, Wang W, Gao Y, Yu N (2020) Optimal placement and intelligent smoke detection algorithm for wildfire-monitoring cameras. IEEE Access 8:72326–72339

Study of Cold-Start Product Recommendations and Its Solutions Deep Pancholi and C. Selvi

Abstract In today's digital era, consumers rely more and more on systems that provide them with a personalized experience. In interacting with the system, these consumers create more and more data of different types (click-through rates, items viewed, time spent, number of purchases, and other metrics). This extensive collection of data from various users is used to improve the personalized experience of the users. These systems that utilize consumers' data to create a more personalized and customized user experience are called Recommender Systems. Recommender Systems play a huge role in helping companies create a more engaging user experience. E-commerce giants like Amazon and Flipkart employ such Recommender Systems. These can learn from the user–system interaction and the likes and dislikes of users, and can promote the visibility of items that interest the user. They are also helpful in prompting the customer to buy items, based on his previous interactions, that he would otherwise have to search for manually in the absence of such a system. Streaming services like YouTube, Amazon Prime Video, and Netflix also use Recommender Systems to suggest movies/shows that the user might like based on the watch history. This study proposes a hybrid model with item–item collaborative filtering using a graph, user–user collaborative filtering based on textual reviews and ratings, and demographic data to generate accurate product recommendations that address the cold-start issue. Keywords Recommender Systems · Collaborative filtering · Cold-start issue · Hybrid model · Item–item graph · User–user collaborative filtering

D. Pancholi (B) Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] C. Selvi Indian Institute of Information Technology, Kottayam, Kerala, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_4


1 Introduction The importance of Recommender Systems in any market segment where customer interaction happens is increasing daily with the increase in data that people generate using intelligent devices and systems linked to the web. Thus, any company or firm must utilize the customer–system interaction to maximize profits as much as possible. Recommendations are generated using many different approaches and algorithms, most of which can be studied as research problems, making this field very intriguing. This paper further explores one such segment in this area: Product Recommender Systems using Graph Data Structures.

1.1 Non-personalized Recommender Systems Non-personalized recommendations refer to suggestions which are not meant for one single user or type of user. They are comparatively easy to generate, and no prior information about the users is required. The recommendations generated are, in turn, more general than specific to users. These include showing the most popular products in each category, the most ordered items in a specific time window in the recent past, and many more methods.

1.2 Personalized Recommender Systems Personalized recommendations are those generated using data about one user or a group of similar users. These require data about each user to be known beforehand and are challenging to generate compared to non-personalized recommendations. The recommendations generated are specific to users, as the data used was also specific. These include showing a user what he would like to purchase next based on his order history or watch history. Personalized Recommender Systems need user-specific data to generate accurate and meaningful recommendations. As mentioned before, this process is complicated compared to non-personalized systems, and hence we may face more problems. Such problems include a lack of data about the user to generate recommendations (data sparsity), the inability to generate recommendations for many users at a time (scalability issue), and other issues.


1.3 Types of Personalized Recommender Systems
• Collaborative filtering: These systems aggregate ratings of objects, identify similarities between the users based on their ratings, and generate new recommendations based on interuser comparisons.
• Content-based: Here, the objects are mainly associated with each other through associated features. Unlike collaborative filtering, the system learns the user's interests based on his interaction with the system and not by relating his behavior to other users.
• Demographic-based: It categorizes users based on demographic classes. It is comparatively easy to implement and does not require history or user ratings. Demographic systems use collaborative correlations between users but use different data.
• Hybrid: These combine any two of the systems mentioned above to suit some specific industry needs. As it can combine two models, it is the most sought-after in the industry (Fig. 1).
This study focuses on the possible solutions to the cold-start issue in content-based Recommender Systems. Collaborative filtering Recommender Systems are the most widely used technologies in the market. They use the data of users of the system in correlation with each other. This interaction of multiple users' data to infer associations and recommendations assumes that users who have agreed on something in the past will agree in the future and hence be interested in a similar type of object. This collaborative filtering among the user data is very interesting; hence, this paper focuses on this aspect, along with some metadata, to address the issue of cold-start recommendations.
Fig. 1 Concept of collaborative filtering


Table 1 Types of collaborative filtering approaches: advantages and disadvantages

Type of CF      | Definition | Advantages | Disadvantages
Memory-based CF | Find similar users based on cosine similarity or Pearson correlation and take the weighted average of ratings | Easy creation and explainability of results | Performance reduces when data is sparse, so non-scalable
Model-based CF  | Use machine learning to find user ratings of unrated items, e.g., PCA, SVD, neural networks, matrix factorization | Dimensionality reduction deals with missing/sparse data | Inference is intractable because of hidden/latent factors

Memory-based approaches to collaborative filtering can be either user–item filtering or item–item filtering. As their names suggest, user–item filtering focuses on a particular user and finds similar users based on item ratings. Item–item filtering, on the other hand, takes an item and, based on the users who liked that item, finds other items those users also liked. Model-based approaches, in contrast, are developed using machine learning algorithms; they can be clustering-based, matrix factorization-based, or deep learning-based models. The primary difference is that memory-based approaches do not use parametric machine learning approaches; instead, they use similarity metrics like cosine similarity or Pearson correlation coefficients, which are based on simple arithmetic operations (Table 1).
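A hedged sketch of memory-based user–user collaborative filtering follows: it builds a small user–item rating matrix, computes cosine similarity between users, and predicts a missing rating as a similarity-weighted average over the most similar users who rated the item. The tiny matrix and the choice of k are invented for illustration.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

R = np.array([[5, 4, 0, 1],                   # rows = users, columns = items, 0 = not rated
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
sim = cosine_similarity(R)                    # user-user similarity matrix

def predict(user, item, k=2):
    rated = np.where(R[:, item] > 0)[0]       # users who rated this item
    rated = rated[rated != user]
    top = rated[np.argsort(sim[user, rated])[::-1][:k]]
    weights = sim[user, top]
    return np.dot(weights, R[top, item]) / weights.sum()

print(predict(user=0, item=2))                # estimated rating of item 2 for user 0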

2 Literature Survey The authors of [1] propose a text review-based collaborative filtering Recommender System which extends a text embedding technique. This method is used to predict ratings and generate recommendations. The authors of [2] propose constructing a hybrid model based on studying two probabilistic aspect models combined with user information such as age, gender, and job. This model posits that people with comparable characteristics (age, gender, and occupation) have similar interests. In the event of a large dataset, these three features are ineffective. Other options for improving performance can also be chosen. The authors in [3] propose a demographic collaborative Recommender System which initially partitions the users based on demographic attributes and then clusters them based on ratings using a k-means clustering algorithm.


The authors in [4] try to base recommendations solely on users' demographic information by conducting k-means clustering experiments; the results did not exhibit any correlation between ratings and demographic features. The authors in [5] demonstrate two Recommender Systems, one that uses projections of the bipartite user–item network to generate recommendations and a straightforward approach that uses a probabilistic model without graph structure, and compare their performance. The authors in [6] propose to eliminate fake reviews by performing sentiment analysis on them in order to avoid ambiguous recommendations. The authors in [7] propose a probabilistic model to offer non-registered users a natural interface based on uncertainty rules. This natural interface allows new users to infer their recommendations: the model automatically calculates the probabilities of non-registered users liking or disliking an item and the probability that the non-registered user either likes or dislikes similar items. The authors of [8] present a solution to the cold-start problem based on a probabilistic machine learning model that takes advantage of data obtained during customer acquisition. The model extracts information that can be used to forecast customer behavior in the future. It is called the 'First Impression Model' (FIM) by the authors, and it is based on the idea that the behaviors and choices of newly acquired customers might reveal underlying features that are predictive of their future behavior. The authors in [9] propose a new recommendation model: a heterogeneous graph neural recommender that learns user-to-item embeddings using a convolutional graph network based on a heterogeneous graph constructed from user–item interactions, social links, semantic links predicted from the social network, and text reviews.

2.1 Inferences Currently, there are minimal applications of these approaches in the case where a new user registers with the system and there is not enough data to generate recommendations. What is intriguing is that despite users constantly generating new data and uploading it to the web, the cold-start issue renders Recommender Systems useless for new users: since the user has never interacted with the system before, no data has been collected regarding user activity, and hence the model has nothing to work on to generate recommendations for that user. This study aims to utilize user–user collaborative filtering based on textual reviews and an item–item graph based on item similarity, along with demographic data, to improve the quality of recommendations and alleviate the cold-start problem. A survey of existing Recommender Systems was also done to understand the working of various product Recommender Systems.


2.2 Objectives
Product Recommender Systems: As mentioned above, product Recommender Systems play an essential role in the current era. Therefore, it is crucial to study and examine the working, architecture, and performance of different possible methodologies for generating user product recommendations.
Objective 1: To study and thoroughly understand the working of different product Recommender Systems.
Item–item recommender: The graph data structure allows us to represent and visualize metadata related to items in a single graph.
Objective 2: Using metadata, generate an item–item graph with all items (including new items).
User–item recommender: Collaborative filtering systems utilize the sparse matrix embeddings of all user–item interactions in the system and use similarity measures to find the most similar users based on their behavior.
Objective 3: To apply a collaborative filtering method on textual reviews and star ratings given by users on the items.
Cold-start issue in product recommendation: The cold-start issue, also called data sparsity, is a scenario where the recommendation engine does not have enough data about the users and items in the dataset to generate accurate recommendations. This problem can be alleviated by including more data about users and items, like demographics, social networks, and other types of information.
Objective 4: To utilize demographic data of users to find users similar to new users in the system.

3 Proposed System See Figs. 2 and 3. User–user Collaborative Filtering: User–user collaborative filtering considers all users' purchase and rating history by constructing a sparse user–item matrix whose values are the ratings given by users to the corresponding items. Each row, as a 1-D vector, depicts the items a particular user has rated. The top N similar users are fetched by calculating the cosine similarity between the vectors, which gives us the users most similar to a given user. In the case of new users, the system groups them based on demographic data: whenever a user does not have a rating history, or has purchased and rated very few items, users similar to him will be found using clustering done on the demographic data of all the users. Then the items in the purchase history of the fetched similar users are given as input to the item–item graph, and more items are fetched to recommend to the given user. In this way, the recommendations given by the model are not limited to the purchase history of other users; the model also recommends items similar to those already purchased by the set of similar users.
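The demographic fallback for cold-start users described above can be sketched as follows: existing users are clustered on age and an encoded gender attribute, and a new user is assigned to the nearest cluster. The feature choice, encoding, number of clusters, and toy data are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

demo = np.array([[21, 0], [24, 1], [35, 1],   # columns: age, gender (0 = female, 1 = male)
                 [40, 0], [23, 0], [38, 1]], dtype=float)
scaler = StandardScaler()
demo_scaled = scaler.fit_transform(demo)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(demo_scaled)

new_user = scaler.transform([[22, 1]])        # demographics of a cold-start user
cluster = kmeans.predict(new_user)[0]
similar_users = np.where(kmeans.labels_ == cluster)[0]
print(similar_users)                          # indices of demographically similar existing users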


Fig. 2 Hybrid architecture of recommender system

Item–Item similarity: An item–item graph is constructed, and the edge weight between any two nodes is calculated based on the similarity between the categories that apply to those two items. Then, after setting a threshold value for neighboring items in the graph, items similar to a given item are fetched from the same graph. Similar items are found using item metadata based on the similarity formula given below:

\text{Similarity} = \frac{\text{No. of words common between the two items' categories}}{\text{Total no. of words in both categories}} \quad (1)

where 0 ≤ similarity ≤ 1, such that 0 is the least similar and 1 the most similar. The following graph-related measures were used:
• DegreeCentrality: this is a measure of centrality. As the graph is undirected, it is defined as the count of the number of neighbors a node has.
• ClusteringCoeff: by definition, this is a measure of the degree to which nodes in a graph tend to cluster together.
Recommendation Methodology
• For Registered users: similar users will be fetched by collaborative filtering using a sparse user–item rating matrix. The similarity of two users is computed using the dot product of their rating vectors.
• For New users: similar users will be fetched using clustering of demographic data. Clustering will be done based on specific demographic features of users like age and gender. After fetching similar users from a demographics perspective, the top-rated products of those users will be recommended to the concerned user.


Fig. 3 Process of recommendation using proposed hybrid method

• Positively reviewed and rated items of the filtered users (from collaborative filtering in the case of old users and clustering in the case of new users) will be fed into the item–item graph to get recommendations.
• For New Items: New items will already be included in the item–item graph considering their metadata.
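To make the item–item graph construction concrete, here is a minimal NetworkX sketch: nodes are items, edge weights follow Eq. (1), and edges below a threshold are dropped before neighbours, degree centrality, and clustering coefficients are read off. The toy catalogue, the 0.5 threshold, and the interpretation of "total number of words in both categories" as the number of distinct words across both sets are assumptions.

import networkx as nx

items = {"book_a": {"fiction", "mystery", "thriller"},
         "book_b": {"fiction", "thriller", "crime"},
         "book_c": {"history", "biography"}}

def category_similarity(c1, c2):
    return len(c1 & c2) / len(c1 | c2)        # common words over all distinct words, Eq. (1)

G = nx.Graph()
G.add_nodes_from(items)
for a in items:
    for b in items:
        if a < b:
            w = category_similarity(items[a], items[b])
            if w >= 0.5:                      # assumed edge-weight threshold
                G.add_edge(a, b, weight=w)

print(list(G.neighbors("book_a")))            # Degree-1 network of book_a
print(nx.degree_centrality(G), nx.clustering(G))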


Evaluation
For recommended items, the common words between the set of categories of the input item and of the recommended item are checked for accuracy. For each set of similar users, ratings are predicted for items they have not yet rated. Root mean squared error (RMSE) and mean absolute error (MAE) are used as metrics:

\text{RMSE} = \sqrt{\frac{\sum \left(y_i - y_p\right)^2}{n}} \quad (2)

\text{MAE} = \frac{\sum \left|y_i - y_p\right|}{n} \quad (3)

where y_i is the actual value, y_p is the predicted value, and n is the number of observations.
Categories similarity calculation:
1. The purchased books list of a customer is shuffled and divided into two lists of equal length: a purchased set and a validation set.
2. For each book in the purchased set, the top-k books are recommended to the user. All these books form the list of recommended books.
3. The categories of all the books in the recommended and validation sets are fetched from the dataset and appended together to form two strings: one containing all the categories of books in the validation set, and the other containing all the categories of books in the recommended set.
4. Using the previously mentioned formula, the similarity is calculated between these two strings.
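As a small illustration of Eqs. (2) and (3), the snippet below computes RMSE and MAE for a handful of invented actual and predicted ratings; none of the numbers come from the paper.

import numpy as np

y_actual = np.array([5, 3, 4, 2, 5], dtype=float)
y_pred = np.array([4.5, 3.2, 3.8, 2.5, 4.9])
rmse = np.sqrt(np.mean((y_actual - y_pred) ** 2))   # Eq. (2)
mae = np.mean(np.abs(y_actual - y_pred))            # Eq. (3)
print(rmse, mae)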

4 Results and Analysis 4.1 Inferences Item–item collaborative filtering is done by constructing a similarity graph which considers all the category labels that apply to each item in the dataset. Based on the input item, the neighboring nodes in the graph that satisfy a predefined threshold similarity value are chosen, called the Degree-1 network of that item (Figs. 4 and 5).


Fig. 4 Degree-1 network of given product

Fig. 5 Trimmed graph with edge weight threshold of 0.5

4.2 User Ratings Prediction Based on Textual Reviews Ratings are predicted based on sentiment analysis of the textual reviews given by the users. A predefined Python package, 'TextBlob', is used to get the polarity of reviews; it utilizes rule-based Natural Language Processing methods to estimate the polarity of the input text in the range [−1, 1], where −1 is the most negative and 1 the most positive. Then, based on the comparison of the predicted sentiment and the actual sentiment derived from the given ratings, accuracy is calculated (Fig. 6).
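A minimal sketch of this sentiment step is shown below, using TextBlob's rule-based polarity score in [-1, 1]; the example review text and the simple polarity-to-label mapping are invented for illustration.

from textblob import TextBlob

review = "The book was engaging and the plot kept me hooked till the end."
polarity = TextBlob(review).sentiment.polarity       # ranges from -1 (most negative) to 1 (most positive)
predicted_sentiment = "positive" if polarity >= 0 else "negative"   # assumed mapping to a label
print(polarity, predicted_sentiment)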


Fig. 6 Accuracy of review sentiment

Fig. 7 RMSE and MAE of predicted ratings

4.3 User Ratings Prediction Based on Textual Reviews Using the user–item rating sparse matrix mentioned above, similar users are fetched, and ratings are predicted for the items the user has not yet rated. Then RMSE and MAE are calculated based on the predicted ratings and actual ratings the user gives (Fig. 7).

4.4 Final Recommendations In the final step of generating recommendations, similar users are fetched based on the input user's ID, and ratings are predicted for non-rated items. Then the items with the highest predicted rating for that particular user are given as input to the item–item similarity graph to fetch recommendations for each item that the user is likely to rate highly (Fig. 8).


Fig. 8 Comparison of proposed (user-based CF based on textual reviews) and existing recommendation method (item-based CF) [15, 20]: Comparing the MAE and RMSE values on Amazon Books dataset using item–item CF and user–user CF. It is seen that there is a slight reduction in both metrics when we try to predict ratings of items using user-based CF based on user–item sparse matrix

Fig. 9 Final generated recommendations and similarity measure in purchased and recommended items


5 Conclusion The growing amount of user and item data is being used to improve the accuracy and efficiency of product Recommender Systems, which has improved the domain's appeal. We have attempted to use the depth of data in terms of features and size, trying to provide reliable recommendations for new users and items. This study examines several approaches, tactics, and recommendation algorithms in depth and suggests a hybrid architecture to address the cold-start problem, which uses an item–item similarity graph and user–user collaborative filtering based on textual reviews (for new users, we consider their demographic data). In turn, the combination of product and user demographic data can be used to address data sparsity, resulting in recommendations being generated even for cold-start users and products. To further improve the quality of recommendations, one can consider more features of users' data to group and find users of similar interests. Furthermore, the same model can be scaled to construct a graph over a larger dataset of items to include items across different categories, and users can also be categorized according to different demographic data like location and age group to improve the accuracy of recommendations across multiple users and multiple categories of products. We sincerely hope that the research and analysis presented in this work will aid other researchers in better understanding and exploring the applications of various models to improve the overall quality of recommender engines.

References 1. Srifi M, Oussous A, Ait Lahcen A, Mouline S (2020) Recommender systems based on collaborative filtering using review texts—A survey. Information 11(6):317 2. Pan R, Ge C, Zhang L, Zhao W, Shao X (2020) A new similarity model based on collaborative filtering for new user cold start recommendation. IEICE Trans Inf Syst 103(6):1388–1394 3. Zhang Z, Zhang Y, Ren Y (2020) Employing neighborhood reduction for alleviating sparsity and cold start problems in user-based collaborative filtering. Inf Retrieval J 23(4):449–472 4. Natarajan S, Vairavasundaram S, Natarajan S, Gandomi AH (2020) Resolving data sparsity and cold start problem in collaborative filtering recommender system using linked open data. Expert Syst Appl 149:113248 5. Liu S, Ounis I, Macdonald C, Meng Z (2020) A heterogeneous graph neural model for coldstart recommendation. In: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval, pp 2029–2032 6. Kapoor N, Vishal S, Krishnaveni KS (2020) Movie recommendation system using nlp tools. In: 2020 5th International conference on communication and electronics systems (ICCES). IEEE, pp 883–888 7. Hernando A, Bobadilla J, Ortega F, Gutiérrez A (2017) A probabilistic model for recommending to new cold-start non-registered users. Inf Sci 376:216–232 8. Leskovec J, Adamic LA, Huberman BA (2007) The dynamics of viral marketing. ACM Trans Web (TWEB), 1(1):5-es 9. McAuley J, Leskovec J (2013) Hidden factors and hidden topics: understanding rating dimensions with review text. In: Proceedings of the 7th ACM conference on Recommender systems, pp 165–172


10. Valdiviezo-Díaz P, Bobadilla J (2018) A hybrid approach of recommendation via extended matrix based on collaborative filtering with demographics information. In: International conference on technology trends. Springer, Cham, pp 384–398 11. Qian T, Liang Y, Li Q (2019) Solving cold start problem in recommendation with attribute graph neural networks. arXiv preprint arXiv:1912.12398 12. Lam XN, Vu T, Le TD, Duong AD (2008) Addressing coldstart problem in recommendation systems. In Proceedings of the 2nd international conference on ubiquitous information management and communication, pp. 208–211) 13. Togashi R, Otani M, Satoh SI (2021) Alleviating cold-start problems in recommendation through pseudo-labelling over knowledge graph. In: Proceedings of the 14th ACM international conference on web search and data mining, pp 931–939 14. Chicaiza J, Valdiviezo-Diaz P (2021) A comprehensive survey of knowledge graph-based recommender systems: technologies, development, and contributions. Information 12(6):232 15. Ricci F, Rokach L, Shapira B, Kantor P (2011) Recommender systems handbook, vol 1. Springer, New York 16. Devika P, Jisha RC, Sajeev GP (2016) A novel approach for book recommendation systems. In 2016 IEEE international conference on computational intelligence and computing research (ICCIC). IEEE, pp 1–6 17. Kavinkumar V, Reddy RR, Balasubramanian R, Sridhar M, Sridharan K, Venkataraman D (2015) A hybrid approach for recommendation system with added feedback component. In: 2015 International conference on advances in computing, communications, and informatics (ICACCI). IEEE, pp 745–752 18. Bindu KR, Visweswaran RL, Sachin PC, Solai KD, Gunasekaran S (2017) Reducing the colduser and cold-item problem in recommender system by reducing the sparsity of the sparse matrix and addressing the diversity-accuracy problem. In: Proceedings of international conference on communication and networks. Springer, Singapore, pp 561–570 19. Tan Y, Zhang M, Liu Y, Ma S (2016) Rating-boosted latent topics: understanding users and items with ratings and reviews. In: IJCAI, vol 16, pp 2640–2646 20. Bobadilla J, Ortega F, Hernando A, Gutiérrez A (2013) Recommender systems survey. KnowlBased Syst 46:109–132 21. Sharma J, Sharma K, Garg K, Sharma AK (2021). Product recommendation system a comprehensive review. In: IOP conference series: materials science and engineering, vol 1022, no 1. IOP Publishing, p 012021

Monitoring Urban Flooding Using SAR—A Mumbai Case Study Chaman Banolia, K. Ram Prabhakar, and Shailesh Deshpande

Abstract Monitoring urban floods caused by heavy rains is critical for planning rescue and recovery responses. Due to climate change, flooding issues in coastal cities are getting worse as economic development activities speed up. Moreover, a flood risk zoning map offers crucial decision support for the management of urban floods, urban development, and urban planning. Urban floods affect economic growth and the safety of people's lives. Synthetic Aperture Radar (SAR) is not affected by cloud cover, unlike optical imagery, and provides the required data under cloudy conditions; SAR therefore offers an efficient method to monitor changes in water bodies across wide areas and to identify floods brought on by heavy rain. In the present work, we discuss the initial exploration results of the Sentinel-1A SAR GRD data for Mumbai on July 2, 2019. Mumbai experienced a severe flood on July 2 because of heavy rains, and many suburbs of Mumbai City were affected. The SAR images show enhanced double bounce because of the floodwater, which correlated well with the flood news reports. Further investigation is required using complete polarimetric data. Keywords Mumbai · Remote sensing · SAR · Urban flooding

1 Introduction Worldwide, both rural and urban areas are at risk from flooding, although the hazards to people and the resulting economic effects are greatest in metropolitan areas.

C. Banolia (B) · S. Deshpande (B) TCS Research, Tata Consultancy Services Ltd., Pune, Maharashtra 411013, India e-mail: [email protected] S. Deshpande e-mail: [email protected] K. R. Prabhakar TCS Research, Tata Consultancy Services Ltd., Bangalore, Karnataka 560048, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_5


In the most recent occurrence of the India and Bangladesh floods, nearly 9 million people in both countries were affected, and almost 300 people died. Northeastern India, especially Assam, is primarily flat floodplains with multiple rivers flowing through them, the Ganga and Brahmaputra being the two most notable. According to Times Now [1], more than 25 lakh people in Assam were impacted, and Silchar, the state's second-largest city, was underwater for a week in May 2022. Urban flooding during the monsoon because of heavy rains is a common problem in Indian metros. Rapid urbanization without mitigating adverse effects on the hydrological cycle (increase in impervious surfaces, loss of soil cover, loss of natural drainage, and so on) and poorly managed artificial drainage are making the situation worse. Remote sensing (RS) helps promptly monitor large cities as RS provides a synoptic view with sufficient temporal resolution. Passive optical remote sensing may provide the data only intermittently because of the high probability of cloud cover during such episodes. However, synthetic aperture radar (SAR) is a viable option as SAR uses active microwave energy, which is not affected by cloud cover. Naturally, SAR imagery usage for monitoring urban floods is gaining increasing attention from researchers. We can also determine the degree of damage brought on by flood disasters; SAR data analysis in the Sentinel Application Platform (SNAP) can deliver both quantitative and qualitative measures.

2 Literature Review Many works have been proposed before for flood inundation detection [2], flood monitoring [3, 4], flood mapping [5], etc. Qiu et al. proposed to use Otsu's method on 66 Sentinel-1 images to investigate floods and explore flood patterns [3]. In [4], a water mask is generated using the histogram stretching technique, which was used to conduct a detailed analysis of the flash flood of Patna City. Mason et al. [6] explored the effectiveness of double bounce for detecting urban floods. They used LiDAR data along with the SAR data, and the method successfully detected urban flooding in a single image using the double bounce. Li et al. [7] used multitemporal intensity and coherence of the SAR images in an unsupervised manner for detecting the urban flood in Houston. Liao and Wen [8] used machine learning models to explore the double-bounce effect and the prediction performance of urban water detection. Mason et al. [5] detected flooding in urban areas by merging real-time SAR flood extents with model-derived flood hazard maps. Baghermanesh et al. [9] proposed a random forest-based method for urban flood mapping using PolSAR and InSAR. Lin [10] showed that open Internet data is an effective tool to detect urban waterlogging. Prior work on urban waterlogging and its risk assessment has primarily focused on hazard vulnerability, simulation models, and GIS-based risk assessment for flooding. To assess or forecast the probability of waterlogging and the geographical distribution of waterlogged areas, the submerged area and depth of different rainfall return periods are computed and distribution properties


are simulated. Photogrammetric/photointerpretation, socioeconomic, hydrological– geomorphological, and photogrammetric data were frequently used in studies [11]. The spatial analytic techniques used in the GIS-based flood risk assessment study are used to build topography, rainfall, drainage, and other models. Recently, Bhatt et al. [2] explored using moderate-resolution C-band SAR images to assess the changes in backscattering coefficient values for catastrophic floods. The authors observed a significant increase in the backscattering coefficient of about 8.0 dB. SAR images are crucial for monitoring urban flooding for two reasons. First, as a valuable tool for operational flood relief management, due to its ability to gain a spatial and temporal overview of the extent of urban and rural inundation during day and night, even when clouds are present. The Pitt Report (2008) [12] listed the lessons that could be drawn from the 2007 floods in the UK. One of its many recommendations was that emergency responders should have access to real-time or almost real-time flood visualization tools so that they can respond and manage rapidly evolving situations and concentrate their limited resources where they will be most effective. The emergency services might be kept informed with the help of a straightforward GIS that could be updated with timings, water levels, and the extent of flooding during a flood event. A prototype near real-time flood detection method uses high-resolution SAR pictures for urban and rural areas [13]. The approach assumes that the user can receive a processed multi-look geo-registered SAR image in close to real-time. Second, SAR data can be utilized to calibrate, validate, and assimilate urban flood inundation models. These models are key tools for predicting the risk of urban floods. Hydraulic models solve shallow water equations at each node of a regular or irregular grid spanning the river channel and floodplain, subject to boundary constraints including the domain’s input flow rate [14]. Urban flood modeling is more difficult than rural flood modeling because the interaction of floods with the built environment must be modeled. Not only topography and vegetation influence surface flows, but also structures and other man-made features (walls, roads, curbs, etc.) [15]. In stormwater drainage systems, both the surface and subsurface water flows must be modeled and linked. Disasters caused by flooding are nearly always inevitable, and there are few ways to make up for the losses fully. However, it is feasible to somewhat lower the risk in the future by assessing and contrasting harmful circumstances, improving risk perception, and creating early warning systems for flooding disasters [16]. Even though no one can stop catastrophes from happening, several major aspects, such as infrastructure vulnerability, and maintenance capacity, may be managed to lessen the severity caused by flooding impacts [17]. As SAR is not affected by the weather or time of day, it can give crucial information for estimating the number of floods. SAR data collected both before and after flooding is used to quantify the extent. However, because wind-wave reflections can be specular and lead to misinterpretation, it can often be challenging to tell the flooded region apart from the surrounding area just from SAR backscattering. For decreasing misinterpretation, it has been suggested that optical sensors or GIS data be used together. 
Large portions of the urban ground surface may not be visible to SAR due to radar shadowing and layover caused by buildings and taller vegetation, presenting a challenge for urban flood detection


using SAR [18]. Only one-third of the total road surface was visible to the SAR in Karlsruhe's airborne SAR data. This makes SAR less effective at detecting urban flooding than it would otherwise be. There has been increasing interest in recent years in exploiting the double bounce for detecting urban floods. Double bounce has been used effectively either for detecting urban floods or for extracting urban water bodies [2, 6, 8]. Our work is along similar lines. Toward that goal, this paper explores the possibility of monitoring the Mumbai floods of July 2, 2019, using the Sentinel-1A GRD product. The objective was to assess the effectiveness of Sentinel SAR imagery for monitoring urban floods. The specific goals are to study the backscatter signatures of urban objects, especially double bounce, and their variation with flood water depth. Unlike earlier studies [5], we use only SAR data. In addition to studying the association between double bounce and flooding, we further investigate whether an adaptive threshold is helpful in identifying enhanced double bounce caused by flooding in a single SAR image. The intuition is that double bounce displays unusually high brightness compared with its surroundings and is therefore anomalous.
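As a rough illustration of this intuition (our own sketch, not the authors' implementation), an adaptive threshold such as Otsu's method can be applied to a calibrated backscatter image converted to dB in order to flag anomalously bright pixels that may correspond to enhanced double bounce; the array name and the dB floor are assumptions.

```python
import numpy as np
from skimage.filters import threshold_otsu

def candidate_double_bounce(sigma0, floor_db=-40.0):
    """Flag anomalously bright pixels in a calibrated SAR backscatter image.

    sigma0: 2-D array of linear backscatter (sigma nought) from a preprocessed
            Sentinel-1 GRD scene (hypothetical input).
    Returns a boolean mask of pixels brighter than an Otsu threshold in dB.
    """
    sigma0 = np.clip(sigma0, 1e-6, None)   # avoid log of zero
    db = 10.0 * np.log10(sigma0)           # convert to decibels
    db = np.clip(db, floor_db, None)       # suppress extreme low values
    thr = threshold_otsu(db)               # adaptive (Otsu) threshold
    return db > thr                        # candidate double-bounce mask
```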

3 Material and Methods 3.1 Study Area Our study area is the flooded regions of Mumbai City in the state of Maharashtra in India, a tropical city with coastal regions. The city lies on the shores of the Arabian Sea at latitude 19.0760° N and longitude 72.8777° E. Mumbai has a tropical, wet, and dry climate with high humidity all year. Mumbai encounters floods very frequently due to heavy rainfall in the monsoon season (June–September). It experienced floods due to heavy rainfall in July 2019, especially in Malad, Andheri, and Kurla, which are regions where floods are commonly observed during the monsoon season.

3.2 Data We downloaded two SAR images to compare backscatter at different water levels. Mumbai experienced heavy rains from June 28, 2019, to July 2, 2019 (daily rainfall in mm, starting from June 28: 186.2, 121.9, 34.8, 215.1, 264.7, 4.1, 23.1). Mumbai experienced a severe urban flood on July 2 and was brought to a standstill. Many areas were affected by flooding, including the suburbs and the areas around the Mithi River [19, 20]. We chose two dates based on the availability of the Sentinel data: July 2, 2019, and July 26, 2019. The data for these dates can be downloaded from USGS Earth Explorer or the Alaska Satellite Facility (ASF) with Earthnet credentials. We have tested both sources for the data download. The intention of choosing these dates was to study the backscatter signatures under different surface wetness conditions: flooded vs. wet. The urban surface is highly likely to be wet before and after the floods. There must be a sufficient signature difference between wet-surface and flooded conditions at a given space and time; otherwise, the damp surface would be confused with the flood.
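For readers who want a scriptable alternative to the interactive portals named above, one possible route (an assumption on our part, not the authors' workflow) is the sentinelsat client for the Copernicus hub; the credentials, hub URL, and AOI file name are placeholders.

```python
from sentinelsat import SentinelAPI, geojson_to_wkt, read_geojson

# Placeholder credentials and area of interest; replace with real values.
api = SentinelAPI("username", "password", "https://apihub.copernicus.eu/apihub")
footprint = geojson_to_wkt(read_geojson("mumbai_aoi.geojson"))

# Query Sentinel-1 GRD scenes covering the two analysis dates.
products = api.query(
    footprint,
    date=("20190701", "20190727"),
    platformname="Sentinel-1",
    producttype="GRD",
)
api.download_all(products)
```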

3.3 Algorithms The Sentinel-1A GRD product provides detected, ground-range-projected backscatter values. However, this image still needs further processing, such as subset creation (for computational efficiency), radiometric calibration, speckle removal, and geometric correction, before one can interpret it for the objective. The flow of steps is given in Fig. 1. All these preprocessing steps were performed using the Sentinel toolbox in the Sentinel Application Platform (SNAP) software. To extract the region of interest and for computational efficiency, we first created a subset from the Sentinel-1A GRD product. SAR radiometric calibration aims to produce images whose pixel values are directly related to the scene's radar backscatter. Although uncalibrated SAR imagery is adequate for qualitative applications, calibrated SAR images are necessary for quantitative applications of SAR data. Radiometric corrections are typically not included in SAR data processing, which results in Level 1 images and leaves a considerable amount of radiometric bias. As a result, radiometric correction must be applied to SAR images for the pixel values to accurately depict the radar backscatter of the reflecting surface. Calibration is also needed for the comparison of SAR images taken with various sensors, taken from the same sensor at various times and in various modes, or processed by various processors. This is done using the gain and offset calibration constants from Lookup Tables (LUTs) [21]. The return signal from a radar that illuminates a surface that is rough on the scale of the radar wavelength is composed of waves reflected from several elementary scatterers inside a resolution cell. Because of the surface roughness, the distances between the

Fig. 1 Steps for flood monitoring in SAR images


elementary scatterers and the receiver fluctuate. As a result, although the received waves are coherent in frequency, they are no longer coherent in phase. If the waves are in phase and add constructively, a strong signal is received; otherwise, a weak signal is obtained. Coherently processing the returns from successive radar pulses creates a SAR image. This phenomenon results in pixel-to-pixel intensity changes, and this variation appears as a granular pattern known as speckle. This unique texturing of SAR images degrades the image quality and makes it more challenging to interpret features. A filtering procedure was applied with the aid of a speckle filter in order to reduce the effect of speckle and smooth the image. Speckle was reduced using the multilook operator. In general, the texture and speckle have strong spatial correlations. The texture variable must be assumed to be the same across all pixels used in the multi-look average. With this perspective, the multi-look covariance matrix domain is formed by extending the multiplicative speckle model [21]:

Y = tX    (1)

where X is the multi-look covariance matrix of the speckle that obeys a complex Wishart distribution [22, 23] and t is the texture variable. A size of four looks was used. The image was geometrically corrected using a DEM. No polarimetric decomposition was performed as it needs a full SLC product. Each image (VV and VH components from both dates, July 2 and July 26) was processed as explained above, and then the two images were visually interpreted based on the background information reported by the news.
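The SNAP preprocessing chain described in this section can also be scripted through SNAP's Python interface (snappy, called esa_snappy in recent SNAP releases). The sketch below is an untested outline of such a chain, assuming SNAP and its Python bindings are installed; the operator parameter values (region polygon, number of looks, DEM name) and the file name are illustrative assumptions, not the exact settings used by the authors.

```python
import snappy
from snappy import GPF, ProductIO

HashMap = snappy.jpy.get_type("java.util.HashMap")

def run(op_name, source, **params):
    """Apply a single SNAP GPF operator to a product."""
    p = HashMap()
    for key, value in params.items():
        p.put(key, value)
    return GPF.createProduct(op_name, p, source)

# Hypothetical GRD scene file name.
product = ProductIO.readProduct("S1A_IW_GRDH_20190702.zip")

# Approximate Mumbai bounding box (assumed, WGS84 lon/lat).
subset = run("Subset", product,
             geoRegion="POLYGON((72.78 18.89, 73.03 18.89, 73.03 19.28, 72.78 19.28, 72.78 18.89))")
calibrated = run("Calibration", subset, outputSigmaBand="true")
multilooked = run("Multilook", calibrated, nRgLooks="4", nAzLooks="4")
corrected = run("Terrain-Correction", multilooked, demName="SRTM 3Sec")

ProductIO.writeProduct(corrected, "mumbai_20190702_sigma0", "GeoTIFF")
```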

4 Results and Discussions In this section, we discuss the results produced by our algorithm and identify the affected areas using the double-bounce effects of SAR, comparing them to the news reports of flooded areas. First, we discuss the challenges posed by SAR's double-bounce characteristics in monitoring urban flooding. Then, we discuss how to overcome these challenges when using SAR to monitor urban flooding. Figure 2 shows the SAR image components for July 2 and July 26, 2019, and in Fig. 3, we have identified and circled the areas with a double-bounce effect. Although SAR can penetrate the clouds and observe the urban land covers during the monsoon season, it was still challenging to identify flooded regions. Because of the look angle of SAR, roads along the track could remain in the shadow region of high-rise urban buildings. The layover effect could also be misleading. Thus, the overall visibility of the urban land covers could be very low, depending upon the spread of the city. The most common signatures available in general SAR scenes were not available in urban regions, as the flooded water might not have had enough extent and enough wind to generate detectable surface roughness.


The flood extent is classified using SAR data collected before and after flooding. However, it can be challenging to distinguish the flooded area solely from SAR backscattering because wind waves cause a bright reflection, and misclassification can happen. To minimize misclassification, the simultaneous use of optical sensors or GIS data is recommended; here we verify the results using ground-level news reports. The presence of floodwater on the roads near the buildings results in a double-bounce effect (because of specular reflection by water). This double-bounce effect is more pronounced than that caused by a dry road and building, as the reflection from the flooded road is much stronger than from the dry road. Earlier research supports this deduction. We looked for signs of double bounce caused by the flood in the processed Mumbai images, marked by areas of unusually high brightness. The July 26 image, taken after the rains when the flood had receded (Fig. 2), showed very bright areas because of the double bounce, indicating buildings/block structures in the city. The same area was observed in the July 2 image, the day of severe flooding. Many of the metro areas were brighter and contiguous compared to the July 26 image. The extra brightness marked by enhanced double bounce correlated well with the regions reported as severely affected by the flood (Fig. 3). This could be investigated further using the full polarimetric product, which would enable us to perform decomposition.
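One simple way to make the July 2 versus July 26 comparison quantitative (our own sketch, not part of the published analysis) is to difference the two dB images and flag pixels whose backscatter rose by several dB, in the spirit of the roughly 8 dB increase reported by Bhatt et al. [2]; the 3 dB cut-off and the assumption of co-registered inputs are ours.

```python
import numpy as np

def enhanced_double_bounce(db_flood, db_reference, rise_db=3.0):
    """Mask of pixels whose backscatter increased markedly during the flood.

    db_flood, db_reference: co-registered 2-D arrays in dB for the flood date
    (July 2) and the reference date (July 26); rise_db is an assumed cut-off.
    """
    diff = db_flood - db_reference
    return diff > rise_db

# Example usage (arrays are hypothetical):
# mask = enhanced_double_bounce(db_jul02, db_jul26)
# print("flagged fraction of the scene:", mask.mean())
```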

5 Conclusions This study of urban waterlogging with the help of the double-bounce effect in SAR images effectively shows the waterlogged areas in Mumbai and its suburbs. Though SAR penetrates the clouds and supposedly would be very useful in monitoring urban floods, it is a non-trivial task. Further systematic studies are required. Perhaps lab-scale studies with lab-grade SAR and physically scaled models of the city would be very useful to characterize the backscatter. We also must keep in mind that the temporal resolution of the Sentinel SAR is 6/12 days, and that of the upcoming NISAR is 25 days. Thus, SAR may be able to provide the areal extent of affected urban areas but not day-to-day updates. Another interesting opportunity is to combine daily high-resolution RGB data and 6/12-day SAR data to provide a daily picture. This can be further extended to combine the data from news reports and create an event model for the flood situation. An additional approach would be processing SAR images without speckle correction for improved resolution.



Fig. 2 a VV component of July 2 SAR image, b VV component of July 26, c VH on July 2, and d VH on July 26


Fig. 3 Mumbai flood situation as reflected by SAR imagery on July 2. The areas showing enhanced double bounce because of flood: (1) Malad, (2) Malad West, (3) Andheri West, (6) Kurla; other landmarks: (4) Powai Lake, (5) Airport


References 1. Times Now Homepage. https://www.timesnownews.com/india/. Last accessed 31 Aug 2022 2. Bhatt CM (2020) Detection of urban flood inundation using risat-1 SAR images: a case study of Srinagar, Jammu and Kashmir (North India) floods of September 2014. Model Earth Syst Environ 6(1):429–438 3. Qiu J (2021) Flood monitoring in rural areas of the pearl river basin (china) using sentinel-1 sar. Remote Sensing 13(7):1384 4. Ahmed T (2021) Monitoring and mapping of flash flood of Patna city using sentinel-1 images: a case of India’s most flood prone state. Acad Lett 2 5. Mason DC (2021) Improving urban flood mapping by merging synthetic aperture radar-derived flood footprints with flood hazard maps. Water 13(11):1577 6. Mason DC (2014) Detection of flooded urban areas in high resolution synthetic aperture radar images using double scattering. Int J Appl Earth Obs Geoinf 28:150–159. https://doi.org/10. 1016/j.Jag.2013.12.002 7. Li Y (2019) Urban flood mapping using SAR intensity and interferometric coherence via Bayesian network fusion. Remote Sens 11(19). https://doi.org/10.3390/rs11192231 8. Liao HY (2020) Extracting urban water bodies from high-resolution radar images: measuring the urban surface morphology to control for radar’s double-bounce effect. Int J Appl Earth Observ Geoinform 85:102003. https://doi.org/10.1016/j.jag.2019.102003 9. Baghermanesh SS (2021) Urban flood detection using sentinel1-a images. In: 2021 IEEE international geoscience and remote sensing symposium IGARSS. IEEE, pp 527–530 10. Lin T (2018) Urban waterlogging risk assessment based on internet open data: a case study in china. Habitat Int 71:88–96 11. Pistrika A (2007) Flood risk assessment: a methodological framework. European Water Resources Association, pp 14–16 12. Pitt M (2008) Learning lessons from the 2007 floods. UK Cabinet Office Report, June 2008. Available online at: http://archive.cabinetoffice.gov.uk/pittreview/thepittreview.html 13. Mason DC (2012) Near real-time flood detection in urban and rural areas using high resolution synthetic aperture radar images. IEEE Trans Geosci Rem Sens, pp 3041–3052 14. Bates PD (2006) Reach scale floodplain inundation dynamics observed using airborne SAR imagery. J Hydrol 328 (1–2):306–318 15. Hunter NM (2008) Benchmarking 2D hydraulic models for urban flood simulations. Water Manage 161(1):13–30 16. Wang H (2018) A new strategy for integrated urban water management in China: Sponge city. Sci China Technol Sci 61:317–329 17. Horita F (2018) Determining flooded areas using crowd sensing data and weather radar precipitation: a case study in Brazil. ISCRAM 18. Soergel U (2003) Visibility analysis of man-made objects in SAR images. In: 2nd GRSS/ISPRS joint workshop on data fusion and remote sensing over urban areas 19. The Hindu. Mumbai flood news, https://www.thehindu.com/news. Last Accessed 31 Aug 2022 20. The Hindu. Mumbai rain live updates, https://www.thehindu.com/news. Last Accessed 31 Aug 2022 21. SAR Technical Guides, Radiometric Calibration of Level-1 Products—Sentinel-1, https://sen tinels.copernicus.eu/web/sentinel/radiometric-calibration-of-level-1-products. Last Accessed 31 Aug 2022 22. Lee JS (1994), Classification of multi-look polarimetric SAR imagery based on complex Wishart distribution. Int J Remote Sens 15:12299–12311 23. Guoqing L (1998) The multilook polarimetric whitening filter (MPWF) for intensity speckle reduction in polarimetric SAR images. IEEE Trans Geosci Remote Sens 36(3):1016–1020. https://doi.org/10.1109/36.673694

Do Women Shy Away from Cryptocurrency Investment? Cross-Country Evidence from Survey Data Ralf Hoechenberger, Detlev Hummel, and Juergen Seitz

Abstract This study utilizes cross-country survey data to analyze differences in attitudes toward cryptocurrency as an alternative to traditional money issued by a central bank. In particular, we investigate women's general attitude toward cryptocurrency systems. Results suggest that women invest less in cryptocurrency, show less interest in future cryptocurrency investment, and see less economic potential in these systems than men do. Further evidence shows that these attitudes are directly connected with lower literacy in cryptocurrency systems. These findings support the theory on gender differences in investment behavior. We contribute to the existing literature by conducting a cross-country survey on cryptocurrency attitudes in Europe and Asia, and hence show that this gender effect is robust across these cultures. Keywords Cryptocurrencies · Bitcoin · Financial literacy · Gender gap · Risk tolerance

1 Introduction Cryptocurrency systems experience broad market acceptance and fast development, and several financial institutions include cryptocurrency-related assets into their portfolios and trading strategies [1]. A cryptocurrency is a specific type of digital currencies that is often based on blockchain technology which in turn is based on cryptographic algorithms to secure its transactions, to control the creation of additional money and to verify the asset transfer [2]. The idea came up after the financial sector R. Hoechenberger Allianz Kunde und Markt GmbH, Koeniginstrasse 28, 80802 Munich, Germany e-mail: [email protected] D. Hummel University of Potsdam, 14469 Potsdam, Germany e-mail: [email protected] J. Seitz (B) Baden-Wuerttemberg Cooperative State University Heidenheim, 89518 Heidenheim, Germany e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_6


lost trust in the 2007 crisis to develop a currency which does not need institutions to care about trust [3]. The basic principle of cryptocurrency is that a predefined amount of money is collectively produced by the entire system, where the rate of production is set by a predefined value which is publicly known [2]. Bitcoin was the first cryptocurrency developed in 2008 [4]. Along with the growing digital technological innovations in the financial sector, the process of buying and selling commodities via cryptocurrencies is transformed, e. g., people are using digital wallets to buy commodities as well as to transfer money from one bank account to the other [5]. The main reason for investing in cryptocurrency is—as with all investment assets—to protect and increase resources, and hence obtain continuous income [6]. Despite the rationale of objectively meaningful investments, behavioral economics of the last fifty years study how personality traits, psychological factors and also demographic factors influence the decision of investing into financial assets—an important starting point of this research stream was done by Kahneman and Tversky [7]. Particularly, several studies analyze the gender factor, which is one of the effective factors in investment decisions based on demographics [8]. Fehr-Duda et al. [9], Anbar and Eker [10] and Wang et al. [11] find that women generally exhibit less risky behavior than men. Charness and Gneezy [12] find that there is a gender gap in financial risk taking, i.e., women are less willing to invest in risky financial assets (see also the literature on risk tolerance from Lemaster and Strough [13] and Bannier and Neubert [14]). This could also apply to cryptocurrencies which certainly represent risky financial assets. Indeed, there are studies finding that women invest significantly less in cryptocurrency [2, 8]. However, the first study focuses on data from Turkey, the latter on data from India. Cultural and/or country differences generally can play a role when analyzing the perception and use of virtual money (see, e. g., [15]). Our paper contributes to the existing literature on gender effects of attitudes toward cryptocurrency by analyzing a survey conducted mainly over five nationalities: Germany, China, India, Russia and Poland. We find that the gender effect is robust when including answers from all involved nationalities. The remainder of the paper is organized as follows: First, we introduce the data and present some summary statistics. In the next chapter, we perform Fisher tests on the difference between the frequency of answers that suggest an attitude in favor of cryptocurrency (usage, interest in usage, crypto salary, parallel currency and trust). Data is also compared with the self-reported literacy on cryptocurrency which then also reflects insights from the literature that women self-report being less literate in the cryptocurrency topic than men. Finally, the connection between both sets of variables is performed in the sense that lower cryptocurrency attitude significantly correlates with lower literacy. Some of the questions used in the questionnaire are strongly related to the ones used in the cross-country study on cryptocurrency by Maciejasz-Swiatkiewicz and Poskart [15]. However, this study does not look at gender differences.


Table 1 Summary statistics (n = 643)
Share: Female 60.8%, Male 39.2%; Chinese 42.6%, Russian 16.6%, German 14.5%, Indian 11.0%, Polish 9.6%, Others 5.7%
Note: Roughly 85% of the students are between 16 and 24 years old

2 Data The data used in this study was collected from business students around the world. There are several good reasons for asking business students: Firstly, they should have at least basic knowledge about economics and the finance industry. Secondly, they often exhibit a higher affinity and adoption rate of new technologies. Taking smartphone banking as an example, in 2019, 63% of the Germans between 16 and 29 were smartphone users, but only 29% of the Germans older than 65 [16]. Data was gathered as a cross-country study with students from mainly five countries: China, Russia, Germany, India and Poland. Summary statistics are given in Table 1.
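The shares reported in Table 1 can be derived from the raw survey table with a few lines of pandas; the file and column names below are assumptions about how such a survey export might be organized, not the authors' data format.

```python
import pandas as pd

# Hypothetical survey export with one row per respondent.
survey = pd.read_csv("crypto_survey.csv")   # assumed file name

# Shares by gender and by nationality, as percentages (cf. Table 1).
gender_share = survey["gender"].value_counts(normalize=True).mul(100).round(1)
country_share = survey["nationality"].value_counts(normalize=True).mul(100).round(1)
print(gender_share)
print(country_share)
```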

3 Results For analyzing the gender effect in cryptocurrency attitude, we used several items of the questionnaire that are partly based on the items of Maciejasz-Swiatkiewicz and Poskart [15]. The questions cover five different dimensions of attitude toward cryptocurrency, as given in Table 2. To perform a meaningful analysis, we construct binary variables that take the value 1 for the answer options that reflect a positive attitude toward cryptocurrency, i.e., in the items usage, interest in usage, crypto salary and parallel currency, the answer yes is coded as 1 (and 0 otherwise); in the trust item, the answer cryptocurrency is coded 1 (and 0 otherwise).

Table 2 Items analyzed for attitudes toward cryptocurrency
Usage: Have you ever used cryptocurrencies for payments? (yes/no)
Interest in usage: Are you interested to make transactions using cryptocurrencies? (yes/no/perhaps/I don't know)
Cryptocurrency salary: Would you accept salary payment using cryptocurrencies? (yes/no/perhaps/I don't know)
Parallel currency: Should cryptocurrencies be established parallel to currencies issued by central banks? (yes/no/perhaps/I don't know)
Trust: Which currency is in your opinion the most trustworthy? (cryptocurrency, central bank currencies, gold standard, others)
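A minimal sketch of the binary coding just described, assuming hypothetical column names for the five items: each coded variable is 1 for the answer option that reflects a positive attitude and 0 otherwise.

```python
import pandas as pd

def code_attitudes(survey: pd.DataFrame) -> pd.DataFrame:
    """Binary-code the five attitude items (column names are assumptions)."""
    coded = pd.DataFrame(index=survey.index)
    # 'yes' counts as positive for the first four items ...
    for item in ["usage", "interest_in_usage", "crypto_salary", "parallel_currency"]:
        coded[item] = (survey[item].str.lower() == "yes").astype(int)
    # ... and 'cryptocurrency' as the most trustworthy currency for the fifth.
    coded["trust"] = (survey["trust"].str.lower() == "cryptocurrency").astype(int)
    return coded
```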


Below, the shares of the variables having a 1 are analyzed over the whole data set, for women and men separately. Results in Fig. 1 and Table 3 show that each item has a higher share of positive answers for men than it has for women. 17% of men already invest in cryptocurrency, whereas only 9% of women do (p < 0.01), and 38% of men are interested in investing in cryptocurrency, whereas investment interest lies only at 27% with women (p < 0.01). 22% of men would accept their salary in cryptocurrency, whereas only 16% of women accept this (p < 0.05). 39% of men can imagine having cryptocurrencies as a parallel currency, but only 25% of women (p < 0.001). 21% of men trust cryptocurrencies most, women only 12% (p < 0.01). Hence, all differences of variables between men and women are significant and show that women shy away from cryptocurrency investments. As the sample includes answers from students of different cultures in Europe and Asia, this result seems to be robust at least for these two regions of the world. Unfortunately, a comparative analysis at the country level is not feasible, as some sub-datasets do not have enough observations to deliver reliable results. These results are well in line with the studies mentioned above dealing with the gender gap in financial decisions, particularly with the studies dealing with differences in cryptocurrency investments. As a second part of the analysis, we study possible reasons for this gender gap in cryptocurrency attitude. A main reason named in the literature is women's lower financial literacy [8, 14, 17, 18]. We test this in our data by utilizing items that

Fig. 1 Share of crypto attitude items (usage, interest in usage, crypto salary, parallel currency, trust), separated by gender
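The gender comparisons summarized in Fig. 1 can be reproduced with a Fisher exact test on a 2 x 2 table of positive versus non-positive answers per gender. The counts below are only rough reconstructions from the reported shares (about 391 women and 252 men; 9% versus 17% usage), not the original data, so the resulting p-value is merely illustrative.

```python
from scipy.stats import fisher_exact

# Approximate counts reconstructed from the reported shares (illustrative only).
women_yes, women_total = 35, 391   # ~9% of women report crypto usage
men_yes, men_total = 43, 252       # ~17% of men report crypto usage

table = [
    [women_yes, women_total - women_yes],
    [men_yes, men_total - men_yes],
]
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
```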

Table 3 Significance of Fisher tests for each crypto attitude variable and gender

Loss({p_{m,n}}, {h_{m,n}}) = (1/Number_pos) Σ_{m,n} Loss_focal(p_{m,n}, c*_{m,n}) + (φ/Number_pos) Σ_{m,n} 1{c*_{m,n} > 0} Loss_regression(h_{m,n}, h*_{m,n})

where Loss_focal is the focal loss [71] and Loss_regression is the IoU loss as in UnitBox [72]. The number of positive samples is denoted by Number_pos, and φ takes the value 1 to balance the weight for Loss_regression. The sum is calculated across all of the locations on the feature map S_i. The indicator function is denoted by 1{c*_{m,n} > 0}. FCOS is used as the backbone network for the second branch in our proposed architecture. FCOS identifies objects of various sizes on different layers of feature maps by using an FPN [73]. In particular, it utilizes the following five feature map levels: P3, P4, P5, P6, and P7, as observed in Fig. 3. The feature maps of the backbone CNN, C3, C4, and C5, are used to generate P3, P4, and P5, followed by a 1 x 1 convolutional layer that makes use of top-down connections, as illustrated in Fig. 3. P5 and P6 are each subjected to a single convolutional layer with a stride of 2, and the results are P6 and P7, respectively. As a direct consequence, the strides for the feature levels P3, P4, P5, P6, and P7 are 8, 16, 32, 64, and 128, respectively.

Fig. 3 P3 through P7 are the feature levels used in the final prediction, and C3, C4, and C5 are the feature maps of the backbone network in the FCOS design. The height and width of a feature map are denoted by H and W, respectively. "/s" represents the downsampling ratio of the level's feature map relative to the input image, where s ranges from 8 up to 128 [70]
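To make the per-location formulation above concrete, the following sketch (our own illustration of the standard FCOS target assignment, not the authors' code) maps a feature-map location back to image coordinates for a given stride and computes the (l, t, r, b) regression target for one ground-truth box.

```python
def fcos_regression_target(i, j, stride, box):
    """Regression target for feature-map location (i, j) at a given stride.

    box: ground-truth (x0, y0, x1, y1) in image coordinates.
    Returns (l, t, r, b), the distances from the mapped location to the four
    sides of the box; the location is a positive sample only if all four
    distances are positive (i.e. the point falls inside the box).
    """
    # Map the feature-map cell centre back to the input image.
    x = stride * (j + 0.5)
    y = stride * (i + 0.5)
    x0, y0, x1, y1 = box
    l, t, r, b = x - x0, y - y0, x1 - x, y1 - y
    positive = min(l, t, r, b) > 0
    return (l, t, r, b), positive

# Example: stride 8 (level P3) and a 100 x 60 box starting at (40, 30).
targets, is_pos = fcos_regression_target(i=6, j=10, stride=8, box=(40, 30, 140, 90))
```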


EfficientDet Despite the significant limitations of current approaches, EfficientDet builds a general detection framework with both high accuracy and high efficiency. It serves as one of the principal backbone networks in our basic architectural design. EfficientDet consists of a weighted bi-directional feature pyramid network (BiFPN) that accelerates multi-scale feature fusion, as well as a compound scaling method that simultaneously scales the width, resolution, and depth of the backbone, feature network, and box/class prediction networks. As a consequence of these improvements, EfficientDet object detectors may outperform the state of the art in terms of efficiency under a broad range of resource constraints. As seen in Fig. 4, the architecture of EfficientDet corresponds to the one-stage detector paradigm [74–77]. EfficientNet, pretrained on ImageNet, is used as the primary network. With the BiFPN employed as the feature network, top-down and bottom-up bi-directional feature fusion is repeatedly performed on the level 3–7 features from the backbone network (P3, P4, P5, P6, P7). These fused features are then passed to a class and box network, where object class and bounding box predictions are produced. The weights of the class and box networks are shared across all feature levels, as reported in [77]. Previous research has often sought to increase accuracy and efficiency by scaling up a baseline detector via bigger backbone networks (for example, ResNeXt [78] or AmoebaNet [79]), larger input images, or stacking more FPN layers [80]. These tactics were often ineffective since they concentrated on just one or two scaling axes at a time. According to recent research [81], increasing the network's width, depth, and input resolution at the same time may result in excellent image classification performance. Because of its compound scaling approach for object recognition, we chose EfficientDet. This method uses a single compound coefficient to increase the size of the

Fig. 4 In EfficientDet, the core network is built using EfficientNet [39], while the features are built using BiFPN and a common class/box prediction network. Due to the numerous resource constraints, many instances of BiFPN layers and class/box net layers are present, according to Ref. [73]


backbone, BiFPN, class/box network, and resolution. Because of this method, we picked EfficientDet. Stage 2: Rotated Region Proposal and Rotated Bounding Box According to the developers of FCOS, it is possible for the system to be implemented in two-stage detectors as a Region Proposal Network (RPN), where it can achieve substantially better performance. We consider the same approach, however with rotation: the method utilized in FCOS is rotated. Rotated region proposals have been applied to the problem of ship detection as a consequence of the development of such systems. Liu et al. [27] created rotation bounding box regression and pooling of a Rotation Region of Interest (RRoI). Yang et al.'s excellent results may be ascribed to their use of a fast R-CNN, RRPN, and a dense feature pyramid network [6]. Zhou et al. [28, 29] propose a semantic segmentation network to determine which elements of a ship should be considered for a rotated zone. So far, no effective end-to-end model has been developed to aid in the design of acceptable rotated area proposals for inshore ships of various locations and sizes. To demonstrate multi-scale feature fusion, Reference [82] implements cutting-edge feature pyramid processing based on backbone networks. Following that, proposals for rotated zones for ships of different sizes and locations are produced by using a rotation region proposal network. By using rotated contextual RRoI pooling, we can describe the target identification process accurately and compensate for the data loss caused by the skewed box. We use the same techniques for all solar panel variations, some of which have a greater packing density than others. Positive comparison studies [84] encouraged the usage of rotated bounding boxes in conjunction with rotated region proposals. Bilinear Pooling Bilinear pooling (Fig. 5) was developed as a solution for fine grained image detection in the computer vision community. It feeds an input image into two separate deep convolutional neural networks A and B, as shown in Fig. 5. The authors generated a feature map from both A and B after performing multiple pooling and non-linear transformations. These two networks were pretrained to do separate jobs. The assumption is that A and B will learn different characteristics from the input image in this manner. A, for example, was taught to recognize basic object forms, whereas B recognizes texture information. The output features from A and B are then merged by the bilinear pooling layer. This essentially means taking their outer product, combining every feature from A with every feature from B. The idea behind the bilinear pooling layer is that feature interactions allow us to identify more exact image details.


Fig. 5 Working procedure of bilinear pooling as mentioned in [83]
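A minimal PyTorch sketch of the pooling step illustrated in Fig. 5, assuming two feature maps of the same spatial size: the outer product of the two branches' channel vectors is averaged over all spatial locations and then passed through the usual signed square root and L2 normalization. The channel counts in the example are arbitrary.

```python
import torch
import torch.nn.functional as F

def bilinear_pool(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Combine feature maps (B, C1, H, W) and (B, C2, H, W) by bilinear pooling."""
    b, c1, h, w = feat_a.shape
    c2 = feat_b.shape[1]
    # Outer product of channel vectors, averaged over all H*W locations.
    pooled = torch.einsum("bchw,bdhw->bcd", feat_a, feat_b) / (h * w)
    pooled = pooled.reshape(b, c1 * c2)
    # Signed square root and L2 normalisation, as is standard for bilinear features.
    pooled = torch.sign(pooled) * torch.sqrt(pooled.abs() + 1e-8)
    return F.normalize(pooled, dim=1)

# Example: fusing hypothetical 256- and 288-channel maps on a 7x7 grid.
fused = bilinear_pool(torch.randn(2, 256, 7, 7), torch.randn(2, 288, 7, 7))  # (2, 73728)
```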

4 Implementation of Dirty Solar Panel Detection Using Dual Fine Grained Rotated Neural Network from Tiny Aerial Image Figure 6 depicts the flow of the aerial image into the FCOS and EfficientDet backbone networks. To be more exact, FCOS distinguishes between objects of varying sizes by using a feature pyramid network as the neck, integrating information coming from below with information coming from above. The authors employ a PAN structure to deliver the most up-to-date data [85]. The FPN strategy employs a top-down approach: spatial resolution decreases as we go up the pyramid, and, as increasingly complex structures are revealed, each consecutive layer carries more semantic meaning. The FCOS feature map is supplied to the Region Proposal Network, which is likewise an FCOS but features a rotated Region of Interest (RoI) [70]. This is done in line with the FCOS developers' advice. EfficientDet, in contrast, uses a slightly different feature-fusion scheme. The classic FPN is distinguished by the accumulation of top-down features at different scales. Traditional FPN suffers from a lack of flexibility as a consequence of its reliance on a unidirectional (top-down) flow of data. To address this problem, PANet employs a second, bottom-up path-aggregation network. Unlike PANet's single top-down and single bottom-up routes, the authors employ BiFPN, which is made up of repeated blocks. BiFPN, a newly constructed feature network, outperforms previous iterations in terms of both prediction accuracy and prediction efficiency. As with EfficientNets, the authors of BiFPN increase the number of channels at an exponential pace while increasing the number of layers at a linear rate, because the number of layers must be rounded to a small integer:

Width_bifpn = 64 * (1.35^ψ), Depth_bifpn = 2 + ψ
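Plugging ψ = 5 into the scaling rule quoted above gives a BiFPN width of roughly 287 channels and a depth of 7 layers; the small helper below evaluates that rule, and the rounding of the width to a multiple of 8 is our own assumption for hardware-friendly channel counts.

```python
def bifpn_scale(psi: int, round_to: int = 8) -> tuple[int, int]:
    """Width and depth of the BiFPN for compound coefficient psi.

    Uses the scaling rule quoted in the text: width = 64 * 1.35**psi,
    depth = 2 + psi. Rounding the width to a multiple of 8 is an assumption
    made here, not part of the quoted rule.
    """
    width = 64 * (1.35 ** psi)
    width = int(round(width / round_to) * round_to)
    depth = 2 + psi
    return width, depth

print(bifpn_scale(5))   # -> (288, 7) for the EfficientNet-B5 backbone setting
```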


Fig. 6 Architecture of fine grained rotated neural network

The depth (number of layers) is increased linearly while the width is maintained at the BiFPN level. We use ψ = 5, i.e., in EfficientDet the backbone network is EfficientNet-B5. The authors use a rotated region proposal and RoI pooling together with the BiFPN [86], which takes the P3–P7 features as input and repeatedly applies cross-scale multi-scale fusion on these features, as shown in Fig. 3, to get a multi-scale feature representation of the image, which is then fed to the class prediction net and box prediction net to get class and bounding box outputs. The feature vectors from the FCOS RRPN and the BiFPN RPN are then fused: the output features from the two branches are merged by the bilinear pooling layer to obtain fixed-size feature maps, which are fed into a fully connected CNN for bounding box regression and class prediction.

5 Result and Discussion The model training was carried out in a distributed manner on 2 NVIDIA 3080 GPUs with 16 GB of memory. We used the PyTorch framework for performing our research work. SGD was chosen as the optimizer. A learning-rate scheduler was used with 0.01 as the starting value and a weight decay of 0.0001, for 2000 epochs. From the model evaluation perspective, box loss, objectness loss, centerness loss, classification loss, precision, and recall are the metrics. The performance metrics are discussed below:


Box Loss The box loss reflects the algorithm's ability to detect the center of an object, as well as the extent to which the predicted bounding box covers the object. Objectness Loss Each box prediction is accompanied by a single "objectness" prediction. It takes over the role of the confidence that a region proposal contains an object in earlier detectors, such as R-CNN, and is multiplied by the class score to produce an absolute class confidence. The objectness prediction thus relates to how well the network predicts that the box contains the object. The coordinate loss term helps the network predict a better box, while the objectness loss term helps it predict the IoU better. However, only the boxes that best match a given spatial cell contribute to the loss caused by errors in coordinates and classification. Centerness Loss In FCOS, every location inside the ground-truth region is a valid (positive) sample. The end effect is poor bounding boxes produced at locations that are quite a distance from the center of an object. To counteract this, the FCOS authors devised an index called centerness, which suppresses such low-quality predictions near the edges of ground-truth boxes. As an additional per-location output on the feature maps, the centerness index describes how far a point is from the geometric center of a ground-truth box. The centerness value lies between 0 and 1 and is trained with the BCELoss formula (Binary Cross Entropy). At prediction time, the centerness score is multiplied with the classification score, so that low-quality bounding boxes are down-weighted. If the final non-maximum suppression succeeds in filtering out these low-quality bounding boxes, the detection performance will likely increase dramatically. Classification Loss Classification loss functions are computationally viable loss functions that indicate the cost of inaccurate prediction in classification situations. The classification loss indicates how successfully the algorithm predicts the proper class of a given object (Fig. 7). Some of the predictions from the proposed network are attached here (Fig. 8 and Table 1).
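The centerness target described above can be written directly from the (l, t, r, b) regression targets. The fragment below is a generic illustration of that definition and of training it with binary cross-entropy, not the authors' implementation; it assumes the regression targets for positive locations are stacked into a tensor of shape (N, 4).

```python
import torch
import torch.nn.functional as F

def centerness_target(ltrb: torch.Tensor) -> torch.Tensor:
    """Centerness in [0, 1] from regression targets of shape (N, 4) = (l, t, r, b)."""
    l, t, r, b = ltrb.unbind(dim=1)
    horiz = torch.minimum(l, r) / torch.maximum(l, r)
    vert = torch.minimum(t, b) / torch.maximum(t, b)
    return torch.sqrt(horiz * vert)

def centerness_loss(pred_logits: torch.Tensor, ltrb: torch.Tensor) -> torch.Tensor:
    """BCE loss between predicted centerness logits and the targets above."""
    target = centerness_target(ltrb)
    return F.binary_cross_entropy_with_logits(pred_logits, target)
```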

6 Conclusion With this work we have shown that drone imagery is a feasible option for carrying out maintenance of solar panels. We examined the feasibility of drones for this purpose


Fig. 7 Plot of loss and evaluation metric of the dual fine grained rotated neural network model

Fig. 8 Predictions from proposed model

Table 1 Result of performance metrics of our proposed model
Method: Dual fine grained rotated NN; box loss 0.009, objectness loss 0.0005, classification loss 0.005, precision 0.970, recall 0.920, centerness loss 0.010


through investigating the influence of changing solar panels, height, and different weather conditions. We merged two different datasets and generated an additional synthetic dataset for robust training of the deep learning model. The suggested dual fine grained rotated neural network with rotated bounding box vectors is a novel architecture. As a result, the technique in this paper, specifically customized for the detection of dusty solar panels from various altitudes as well as complex backgrounds, can prove to be very useful for the maintenance of solar PV systems.

References 1. Lorenzoni A (2010) The support schemes for the growth of renewable energy 2. Bloomberg (2019) Transition in energy, transport—predictions for 2019 ˙ 3. Zywiołek J, Rosak-Szyrocka J, Khan MA, Sharif A (2022) Trust in renewable energy as part of energy-saving knowledge. Energies 15(4):1566 4. Davaadorj U, Yoo KH, Choi SH, Nasridinov A, The soiling classification of solar panel using deep learning 5. Ghosh A (2020) Soiling losses: a barrier for India’s energy security dependency from photovoltaic power. Challenges 11(1):9 6. Shaju A, Chacko R (2018) Soiling of photovoltaic modules-Review. IOP Conf Ser Mater Sci Eng 396(1):012050 7. Shah A, Shelar P, Shrirame B, Shah D, Shaikh MA, Komble S, Automated cleaning tracking and cooling system (ACTCS) for solar panel 8. Solas ÁF, Micheli L, Almonacid F, Fernández EF (2021) Comparative analysis of methods to extract soiling losses: assessment with experimental measurements. In: 2021 IEEE 48th photovoltaic specialists conference (PVSC). IEEE, pp 0160–0164 9. Fan S, Wang Y, Cao S, Sun T, Liu P (2021) A novel method for analyzing the effect of dust accumulation on energy efficiency loss in photovoltaic (PV) system. Energy 234:121112 10. Zeedan A, Barakeh A, Al-Fakhroo K, Touati F, Gonzales AS Jr (2021) Quantification of PV power and economic losses due to soiling in Qatar. Sustainability 13(6):3364 11. Redondo M, Platero CA, Moset A, Rodríguez F, Donate V (2021) Soiling forecasting in large grid-connected PV plants and experience in Spain. In: 2021 IEEE international conference on environment and electrical engineering and 2021 IEEE industrial and commercial power systems Europe (EEEIC/I&CPS Europe). IEEE, pp 1–4 12. Abubakar A, Almeida CFM, Gemignani M (2021) Review of artificial intelligence-based failure detection and diagnosis methods for solar photovoltaic systems. Machines 9(12):328 13. Mellit A, Kalogirou S (2021) Artificial intelligence and internet of things to improve efficacy of diagnosis and remote sensing of solar photovoltaic systems: Challenges, recommendations and future directions. Renew Sustain Energy Rev 143:110889 14. Almalki FA, Albraikan AA, Soufiene BO, Ali O (2022) Utilizing artificial intelligence and lotus effect in an emerging intelligent drone for persevering solar panel efficiency. Wirel Commun Mobile Comput 15. Hammoudi Y, Idrissi I, Boukabous M, Zerguit Y, Bouali H (2022) Review on maintenance of photovoltaic systems based on deep learning and internet of things. Indonesian J Electr Eng Comput Sci 26(2):1060–1072 16. Sun K, Cui H, Xu R, Wang L, Li M, Yang Z et al (2022) Constructing of 3D porous composite materials of NiAl/CNTs for highly efficient solar steam generation. Sol Energy Mater Sol Cells 240:111722 17. Choi CS, Cagle AE, Macknick J, Bloom DE, Caplan JS, Ravi S (2020) Effects of revegetation on soil physical and chemical properties in solar photovoltaic infrastructure. Front Environ Sci 8:140


18. Deitsch S, Christlein V, Berger S, Buerhop-Lutz C, Maier A, Gallwitz F, Riess C (2019) Automatic classification of defective photovoltaic module cells in electroluminescence images. Sol Energy 185:455–468 19. Einhorn A, Micheli L, Miller DC, Simpson LJ, Moutinho HR, To B et al (2018) Evaluation of soiling and potential mitigation approaches on photovoltaic glass. IEEE J Photovoltaics 9(1):233–239 20. Figgis B, Nouviaire A, Wubulikasimu Y, Javed W, Guo B, Ait-Mokhtar A et al (2018) Investigation of factors affecting condensation on soiled PV modules. Sol Energy 159:488–500 21. Ilse K, Figgis B, Khan MZ, Naumann V, Hagendorf C (2018) Dew as a detrimental influencing factor for soiling of PV modules. IEEE J Photovoltaics 9(1):287–294 22. Fernández-Solas Á, Micheli L, Almonacid F, Fernández EF (2022) Indoor validation of a multiwavelength measurement approach to estimate soiling losses in photovoltaic modules. Sol Energy 241:584–591 23. Chanchangi YN, Ghosh A, Sundaram S, Mallick TK (2021) Angular dependencies of soiling loss on photovoltaic performance in Nigeria. Sol Energy 225:108–121 24. Sanz-Saiz C, Polo J, Martín-Chivelet N, del Carmen Alonso-García M (2022) Soiling loss characterization for Photovoltaics in buildings: A systematic analysis for the Madrid region. J Clean Prod 332:130041 25. Figgis B, Scabbia G, Aissa B (2022) Condensation as a predictor of PV soiling. Sol Energy 238:30–38 26. Dahlioui D, Laarabi B, Barhdadi A (2022) Review on dew water effect on soiling of solar panels: towards its enhancement or mitigation. Sustain Energy Technol Assess 49:101774 27. Micheli L, Morse J, Fernandez EF, Almonacid F, Muller M (2018) Design and indoor validation of’DUSST’: a novel low-maintenance soiling station (No. NREL/CP-5K00–71216). National Renewable Energy Lab. (NREL), Golden, CO (United States) 28. Muller M, Morse J, Almonacid F, Fernandez EF, Micheli L (2019) Indoor and outdoor test results for “DUSST”, a low-cost, low-maintenance PV soiling sensor. In: 2019 IEEE 46th photovoltaic specialists conference (PVSC). IEEE, pp 3277–3280 29. Zsiborács H, Baranyai NH, Vincze A, Zentkó L, Birkner Z, Máté K, Pintér G (2019) Intermittent renewable energy sources: the role of energy storage in the European power system of 2040. Electronics 8(7):729 30. El-Rashidy MA (2022) An efficient and portable solar cell defect detection system. Neural Comput Appl, 1–13 31. John JJ, Tatapudi S, Tamizhmani G (2014) Influence of soiling layer on quantum efficiency and spectral reflectance on crystalline silicon PV modules. In: Proceedings of the 2014 IEEE 40th photovoltaic specialist conference (PVSC), Denver, Colorado, USA, 8–13 June 2014, pp 2595–2599 32. Maghami MR, Hizam H, Gomes C, Radzi MA, Rezadad MI, Hajighorbani S (2016) Power loss due to soiling on solar panel: a review. Renew Sustain Energy Rev 59:1307–1316 33. Cai S, Bao G, Ma X, Wu W, Bian GB, Rodrigues JJ, de Albuquerque VHC (2019) Parameters optimization of the dust absorbing structure for photovoltaic panel cleaning robot based on orthogonal experiment method. J Clean Prod 217:724–731 34. Bessa JG, Micheli L, Almonacid F, Fernández EF (2021) Monitoring photovoltaic soiling: assessment, challenges, and perspectives of current and potential strategies. iScience 24:102165 35. Deb D, Brahmbhatt NL (2018) Review of yield increase of solar panels through soiling prevention, and a proposed water-free automated cleaning solution. Renew Sustain Energy Rev 2018(82 Part 3):3306–3313 36. 
Jiang Y, Lu L, Ferro AR, Ahmadi G (2018) Analyzing wind cleaning process on the accumulated dust on solar photovoltaic (PV) modules on flat surfaces. Sol Energy 159:1031–1036 37. Chanchangi YN, Ghosh A, Sundaram S, Mallick TK (2020) An analytical indoor experimental study on the effect of soiling on PV, focusing on dust properties and PV surface material. Sol Energy 203:46–68 38. Kim D, Youn J, Kim C (2016) Automatic photovoltaic panel area extraction from UAV thermal infrared images. J Korean Soc Surv Geod Photogramm Cartogr 34:559–568



Explainable Automated Coding of Medical Narratives Using Identification of Hierarchical Labels Sonam Sharma, Sumukh Sirmokadam, and Pavan Chittimalli

Abstract Medical coding is an integral part of the healthcare process: the translation of medical procedures and diagnoses described in a patient’s medical record into codes used for billing and claims. This process is essential for reimbursements, provides data that can be used for tracking, and promotes consistency throughout the medical field. In this paper, we determine medical codes from the diagnoses derived from medical narrations. We discuss the process of converting the ICD10-CM index and tabular data into a hierarchical graph and deriving the medical codes from it. We propose a knowledge graph-based solution that facilitates interoperability without sacrificing accuracy. Keywords Medical code · Knowledge graphs · Hierarchy traversal · UMLS · SNOMED-US · ICD10-CM · Advanced NLP

1 Introduction The International Classification of Disease (ICD) is a widely used diagnostic ontology for the classification of health disorders and a valuable resource for healthcare analytics. ICD10-CM is an evolving ontology subject to periodic revisions (e.g., ICD9-CM to ICD10-CM), and the resulting differences between versions make a complete cross-walk difficult. It is both labor-intensive and error-prone for clinical experts to create custom mappings across multiple versions. The International Classification of Disease (ICD) is an extensively


used ontology for mapping the diagnoses identified for patients, classifying health disorders, and billing claims for diagnoses and procedures. In the healthcare domain, identification of ICD codes from healthcare data using various techniques is extremely valuable for research. The medical narrative that is used usually describes the history of the patient and how the injuries happened, and from it the medical codes are derived. Unfortunately, the ICD ontology poses several challenges because of its hierarchical data structure and the vast vocabulary of this domain. It is an evolving ontology, which makes it difficult to establish stable representations of diseases or other health conditions across the periodic ICD revisions. Medical narratives are free-text documents generated by healthcare professionals when a patient visits. These narratives often contain key details related to patient health, along with other findings, history, and the identified diagnosis of the case. Using the information provided in the medical narratives, ICD codes are identified. ICD codes are alphanumeric codes in which each digit/level adds detail to the diagnosis identified from the narrative. We make use of the ICD10-CM alphabetic index, where each level adds additional detail to the specified diagnosis. Figure 1 illustrates how we start from the root level, where we identify the main disease or injury, and then move further down the hierarchy, identifying additional details until we reach the desired code. In this paper, we make the following contributions: • We propose a knowledge graph-based approach for deriving the ICD10 code from the medical narrative

Fig. 1 ICD10-CM index


• We additionally propose a method for creating the hierarchical graph from the ICD10-CM alphabetic index and tabular data, which further helps in deriving the codes even when only synonymous information is present in the narrations
• We showcase our results and experiments on samples detailing fracture injuries and various other fracture scenarios

2 Literature Review Several works related to ICD medical coding have been proposed in recent years, most of them relying on machine learning approaches. In [1], the authors use an ensemble of a character-level CNN, a word-level CNN, and a bidirectional LSTM to derive the ICD codes. Another study proposes a tree-of-sequences LSTM architecture, and other works explore various attention mechanisms to derive ICD codes [3, 4]. Medical narratives generally describe the diagnosis of the patient. Sometimes multiple diagnoses are present for a single patient, and the explanation of a diagnosis is scattered across various sections of the narrative. The machine learning approaches described above fail here, because of the presence of multiple diagnoses and because one diagnosis is sometimes dependent on another. We propose a knowledge graph-based approach that mimics the knowledge of domain experts to derive the correct diagnosis and reach the desired ICD10-CM codes. The problem we discuss here is not new in the medical domain, but the approaches tried so far do not provide a complete solution and add overhead for the end-user or domain SME, who finally has to go back to the narration to verify the provided codes, which takes more time than doing the coding manually. We try to overcome this in the approach discussed here. The problem cannot be left to a machine learning solution alone, since a significant amount of work is still required after a machine learning model predicts the code.

3 Problem Statement Automated ICD coding is a demanding task for multiple reasons. First, the label space is high-dimensional, with over 70,000 codes [2]. To identify the complete 7-digit code, a domain expert has to go through various levels of hierarchy in the ICD index and tabular files. The initial level in the ICD index is the identification of the main chapter/class of disease. In the subsequent levels, we need to identify various specifications related to that disease, e.g., the body part, the location within the body part, and other granular details of the diagnosis specific to the body part and the disease.


Additionally, medical narratives contain a summary of all findings diagnosed for the patient and detail whether the visit is the patient's first visit or a subsequent visit for treatment. They also provide relevant findings from previous visits. Because of this, the medical coder's tasks have increased, and each diagnosis now has to be identified together with its code.

4 Proposed End-to-End Pipeline We first parse the narratives and identify the various sections and their related contents. During this stage, all the information is structured, and for each section we additionally apply our in-house NLP module, which simplifies complex sentences and completes incomplete sentences, as explained in Fig. 2. Subsequently, we create a concept graph from the narration. Using the parsed sentences, each sentence is linked with its appropriate section to obtain a connected graph that is easy to traverse. The concept graph is then enriched on the basis of the identified nodes. We use scispacy NER + L for this activity; it identifies the medical entities and links them to the appropriate UMLS entities, which additionally provides synonyms for each term. With these identified entities and their synonyms, we enrich our concept graph. We then identify the main terms in the diagnosis nodes of the concept graph and tag them (Fig. 7). Main terms are the main diagnosis terms provided in the alphabetic index, as shown in Fig. 4. We additionally handle the "See", "NEC", and "Other Specified" scenarios, and directly obtain the 4-character ICD code from the alphabetic index. After identifying the term for each level from the alphabetic index, we link them together and synthesize the search string by traversing the graph. This graph traversal helps us link the appropriate and important terms related to one diagnosis together. The final search string is similar to the code descriptions written in the tabular list hierarchy of ICD10 codes. Additionally, other important entities are identified using our custom NER module, and a few other medical entities are predicted from the narrative based on the class identified for the ICD code. These entities are further used to enhance the search string and to create the complete code description, which is then used to complete the code in the tabular list for the specific series/class and derive the 7-digit code to the highest level of specificity. The detailed steps which we use to process the narration are as follows: Step 1—Parse the narration, identify sections, and create the concept graph using DD. Fig. 2 Example sentence correction


Fig. 3 End-to-end pipeline

Step 2—Enrich the CG, identify the domain entities for the concepts, and add synonyms for the terms using UMLS and SNOMED-US, e.g., acetabular fracture as fracture of acetabulum. Step 3—Identify the main term(s) from the narration (checking for synonymous words as well), e.g., acetabular as acetabulum. Step 4—Add level information and ICD codes (wherever applicable) to each node of the enriched graph using the ICD10-CM index graph (column, wall, anterior, posterior). Step 5—Identify body parts and laterality from the enriched graph nodes (right hip) (see Fig. 7). Step 6—Create the hierarchy graph (see Fig. 8). Step 7—Traverse the hierarchy graph from root to leaf and create search strings (each search string signifies one diagnosis); add the code from the leaf of the hierarchy graph (a small sketch of Steps 6 and 7 follows this list). Step 8—Identify the type of fracture: displaced or non-displaced, and open or closed. Step 9—Complete the code using the tabular data (Figs. 3 and 4).
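As a small illustration of Steps 6 and 7, the following sketch (not the authors' implementation) assumes a networkx directed graph whose nodes are the alphabetic-index levels: every root-to-leaf path is joined into one search string, and the leaf carries a partial code to be completed later from the tabular list. The node names and the partial code "S32.4-" are purely illustrative.

```python
import networkx as nx

# Hypothetical hierarchy for "fracture -> acetabulum -> wall -> posterior".
G = nx.DiGraph()
G.add_edges_from([("fracture", "acetabulum"),
                  ("acetabulum", "wall"),
                  ("wall", "posterior")])
# The leaf carries an illustrative partial code taken from the alphabetic index.
nx.set_node_attributes(G, {"posterior": "S32.4-"}, name="icd_code")

root = "fracture"
leaves = [n for n in G.nodes if G.out_degree(n) == 0]

# Step 7: every root-to-leaf path becomes one search string (one candidate diagnosis).
for leaf in leaves:
    for path in nx.all_simple_paths(G, root, leaf):
        search_string = " ".join(path)                # e.g. "fracture acetabulum wall posterior"
        partial_code = G.nodes[leaf].get("icd_code")  # completed later from the tabular list
        print(search_string, "->", partial_code)
```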

(A) Dataset

We have used 167 fracture-related pathology reports to validate the proposed algorithm. The samples contain information scattered across various sections, named history, findings, exam, differential diagnosis, case diagnosis, discussion, etc. Case diagnosis mainly describes the final diagnosis, while the other sections support it and provide further details (Fig. 5).

5 Results and Conclusion We have tested our approach on the 167 narrations describing fractures and injuries and were able to achieve 80% accuracy. The approach we have discussed is novel since we create a knowledge graph and, using that graph, link various concepts and related words


Fig. 4 Alphabetic index sample

together. Because of this, concepts that are far apart or appear in different sections get linked together. This step distinguishes our approach from the various solutions we analyzed, in which only nearby terms get linked and the complete diagnosis is not considered when providing the desired code. In addition, we greatly reduce the work of the domain SME, since we deduce the diagnoses from the narration, and each diagnosis yields one unique ICD code. One narration can have at most 2–3 diagnoses, and at most our solution identifies 5 search strings and codes for it. The devised approach discussed here helps in achieving the desired result (Tables 1, 2, and 3). We have shown how a graph-based approach can help derive complete ICD10-CM diagnosis codes, how it mimics the process followed by domain experts, and how terms that appear far apart can be linked together if they relate to one specific diagnosis (Figs. 6, 7, and 8).


Fig. 5 Sample narration

Table 1 Narratives representation
Total number of narratives tested: 167
Number of narratives having a rules-driven path: 134
Number of narratives which need tacit/inferred knowledge: 33

Table 2 Main term analysis
MT present | Case diagnosis | Findings | History | Examination
Individually yes | 114 | 118 | 104 | 67
Individually no | 54 | 49 | 64 | 100
Case diagnosis | findings: 114
Case diagnosis | findings | history: 104
Case diagnosis | findings | history | examination: 67
Findings | history: 0
Findings | history | examination: 0
History | examination: 0


Table 3 Main terms section matches (True / False counts)
Case diagnosis | findings: 12 True / 156 False
Case diagnosis | findings | history: 0 True / 167 False
Case diagnosis | findings | history | examination: 0 True / 167 False
Findings | history: 0 True / 167 False
Findings | history | examination: 0 True / 167 False
History | examination: 0 True / 167 False

Fig. 6 Concept graph derived for sample narration

Fig. 7 Enriched graph of the narration


Fig. 8 Hierarchy graph built by our solution for the narration

References
1. Xu K, Lam M, Pang J, Gao X, Band C, Mathur P, Papay F, Khanna AK, Cywinski JB, Maheshwari K et al (2019) Multimodal machine learning for automated ICD coding. In: Machine learning for healthcare conference. PMLR, pp 197–215
2. https://www.cms.gov/medicare/icd-10/2022-icd-10-cm
3. Xie P, Xing E (2018) A neural architecture for automated ICD coding. In: Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long Papers), pp 1066–1076
4. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473

Automatic Text Summarization Using Word Embeddings Sophiya Antony and Dhanya S. Pankaj

Abstract Text summarization generates a summary that highlights key sentences while condensing all the information of the original document into a few sentences. Abstractive and extractive are the two types of summaries seen in the general text summarization process. A hybrid summarization technique which generates an abstractive summary over an extractive summary is proposed in this paper. The initial phase uses a semantic model to generate word embeddings for each sentence, and these embeddings are used to address the lack of semantic integrity in extractive summaries. The second phase uses WordNet, the Lesk algorithm, and POS tagging to generate an abstractive summary from the extractive summary. The paper uses two different datasets, DUC 2004 and CNN/Daily Mail, to evaluate the performance over the ROUGE and BLEU metrics. The results highlight the relevance of developing hybrid approaches to summarization compared to complex abstractive techniques. Keywords Extractive summarization · Abstractive summarization · Semantic model · Word2Vec · WordNet · Machine learning

1 Introduction As the Internet and big data have grown in popularity, people have become overwhelmed by the massive amount of information and documents available online. Many researchers are inspired to develop scientific methods to deal with the vast amount of information available and to decide what to do with it. Automatic



text summarization is a method for automatically summarizing texts. It generates summaries that include the key sentences as well as all other relevant information from the original document. The study of text processing began in the mid-twentieth century, with Luhn (1958) [1] being the first to use a statistical technique based on word frequency. Many other approaches have been developed since. Depending on the number of documents, there are single-document and multi-document summarization settings. The extractive and abstractive types, on the other hand, are distinguished by the form of the summary. Single-document summarization offers a summary of a specific source article, whose content is all about the same subject. Multi-document summarization, in contrast, is based on a number of different sources or publications that cover the same topic. Extractive summarization is a type of summary in which all of the information is extracted and the summary phrases are taken from the source material. The position of a phrase and the number of words found in the text were initially the most common cues in extractive summarization studies. The extraction problem was addressed using Information Extraction (IE) techniques, leading to a focused overview with more accurate data. Abstractive summaries, in contrast to extractive summarization, introduce additional lines, sometimes known as paraphrases, employing terms not found in the original text. Abstractive summaries are significantly more complicated and demanding than extractive summaries because they require extensive natural language processing. The two sorts of tactics employed in abstractive summarization are the linguistic and semantic approaches. Linguistic approaches are classified into instance-based, information-based, and tree-based approaches, with information-based methods [2–4] and tree-based methods [5–7] as examples. Methods that employ semantic techniques include template-based approaches [8–11] and ontology-based approaches [12, 13]. The encoder-decoder frameworks [14–18] have lately spurred research on abstractive summarization. The field of text summarization research had a resurgence in the 2000s. Real-time summarization refers to the ability to summarize real-time occurrences or edit summaries as new information becomes available. Fuzzy-based [15] and machine learning [12] approaches have both been employed in real-time summarization. Since extractive summarization has grown into a large and mature research area, the analysis of existing works is significant. Abstractive summarization and real-time summarization are now the focus of research. Abstractive summaries are harder to generate than extractive summaries because they are more complex. Extractive summaries are more predictable and produce better evaluation scores than abstractive summaries. As a result, extractive summarization is still in high demand, as evidenced by the extractive research published in the last two years. This indicates that there may be additional possibilities or issues that must be addressed. In this paper, we propose a hybrid way of generating an abstractive summary from an extractive summary. The main disadvantage of most automatic text summarization methodologies is the use of only statistical features in extracting the crucial phrases from


the text. In the proposed approach, we use the semantics of each sentence to build the extractive summary. For the semantic feature, we employ Word2Vec [19], a distributional semantic framework that works on the distributional hypothesis. The hypothesis rests on a very simple norm: it claims that phrases have the same meaning when used in the same context. The semantic model is used to improve the reliability of the produced summary, and the use of a semantic feature in extractive summarization is the key idea for generating a better summary that conveys semantic coherence. The quality of the generated extractive summary helps to generate a better abstractive summary in the second phase, which includes a language generator. The language generator has three main components: WordNet, the Lesk algorithm, and a POS tagger. The rest of the document is organized as follows: existing works are in Sect. 2, Sect. 3 explains the methodological approach, Sect. 4 presents the experimental design and findings, Sect. 5 gives the result analysis, while the conclusion and future works are presented in Sect. 6.

2 Related Works 2.1 Summarization Utilizing Learning Methods A supervised and semi-supervised model for text summarization [20] proposes a method based on the qualities of a sentence, where a supervised or semi-supervised classification algorithm assesses its importance. Learning in the classification model can be supervised or semi-supervised. Supervised methods usually outperform unsupervised ones, although they require more labeled training data. To exploit unlabeled data, the method co-trains a probabilistic SVM with a Naïve Bayesian classifier for semi-supervised learning. First, each input sentence is analyzed using pre-defined feature functions. The classification model then predicts the significance of each sentence based on its feature values. The order is then revised using a re-ranking method, and the best sentences are incorporated into the summary until the length limit has been met. Because more important content is expected to be contained in a final summary of fixed length, the re-ranking process is critical. Candidates are important sentences whose score exceeds a certain criterion; among them, shorter sentences and sentences near the start of the document are ranked first.

2.2 Summarization for Multi-documents Using Multilayer Networks A multi-document summarization method based on complex network measurements models a collection of documents as a multilayer network [10]. Each sentence


Fig. 1 The edge’s weight is determined by the cosine similarity between words, and the node number in each node reflects sentences from a piece of text

in the input text is represented by a node, and the cosine similarity of two sentences is used to construct the associated edges. The approach for extracting the most relevant sentences from texts includes document pre-processing, term vectorization, network building, measures extraction, and summarization. In the pre-processing phase, text categorization, term removal, and tokenization are applied, and the TF-IDF score is computed. After computing the weights, the network is created using edge type recognition, type-based edge weighting, and edge elimination for non-weighted data in a TF-IDF network. Figure 1 shows an example network for a piece of text; the edge weights are given by the cosine similarity between sentences, and the node numbers identify the sentences of the text. Different measurements are used: degree, strength, shortest path, PageRank, accessibility, generalized accessibility, symmetry, and absorption time. The summary is prepared in the final step by selecting the higher-ranking sentences using selection algorithms. A toy illustration of this kind of sentence network is sketched below.
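As a toy illustration of the kind of sentence network described above (not the cited authors' implementation), the sketch below assumes TF-IDF sentence vectors, cosine-weighted edges, and PageRank as one of the possible network measures; the input sentences are made up.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "Heavy rain flooded several low-lying areas of the city.",
    "The flooding disrupted road and rail traffic for hours.",
    "Authorities opened relief camps for displaced residents.",
]  # illustrative input

tfidf = TfidfVectorizer().fit_transform(sentences)   # term vectorization
sim = cosine_similarity(tfidf)                       # pairwise sentence similarity

# Build the sentence network: one node per sentence, cosine-weighted edges.
G = nx.Graph()
for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        if sim[i, j] > 0:
            G.add_edge(i, j, weight=sim[i, j])

scores = nx.pagerank(G, weight="weight")             # one of several network measures
top = sorted(scores, key=scores.get, reverse=True)[:2]
print([sentences[i] for i in sorted(top)])           # higher-ranking sentences as summary
```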

2.3 Summarization Using Neural Network Model Attention mechanisms in encoder-decoder models for text translation are highly capable of generating summaries with good performance scores. A neural network-based text processing paradigm proposed in [16] addresses one of the core problems in text summarization, known as the repeating phrases problem. To solve it, an attention mechanism and a new learning target were developed. To reduce exposure bias, the decoder uses a sequential intra-attention model [16] that considers which words have already


been generated by the decoder, together with a new reward function that blends policy gradient optimization with maximum-likelihood cross-entropy loss. An encoder-decoder model is generally built in two phases: a training phase and an inference phase. During training, the encoder-decoder model is fitted to data so that it produces the target output sequence at each timestep, learning the information contained in the input sequence. Once training is completed, in the inference phase a new set of input sentences, unseen by the model, is provided to the encoder-decoder model and the output pattern is produced.

2.4 Summarization Using Machine Learning Techniques The summarization steps in this approach are broadly divided into three primary stages: pre-processing, feature extraction, and post-processing. The pre-processing phase includes segmentation, synonym removal, ambiguity removal, stop-word removal, POS tagging [21], and word stemming. The pre-processed text is passed to the next phase, where sentence and word rankings are performed. For word ranking, the criteria used are TF-IDF ranking [3], noun and verb phrase ranking, and a power factor. For sentence ranking, the criteria are TF-IDF, the presence of noun and verb phrases, the sum of the power factors, and the length of the sentence. The last phase consists of four steps: sentence extraction, dealing with connecting words, removing additional information, and using WordNet [22] to generate a hybrid summarizer.

2.5 Summarization Using Graph-Based Network Pre-processing, processing, and post-processing are the three key phases of the graph-based network summarization suggested by the authors of [5]. The core difference from the other methods is the way the input text is represented during processing. A graph construction algorithm represents the document as a graph G = (V, E), where V is the set of vertices and E the set of edges [5]. Each graph node consists of a single noun term, assigned a noun tag with the help of POS tagging. Every pair of consecutive noun vertices vi and vj is connected by an edge (vi, vj) ∈ E labeled with all the non-noun terms (adverbs, adjectives, numbers, verbs, etc.) of the parsed phrase between these two nouns. Two more nodes, "S#" and "E#", are added to the graph to mark the start and end of every sentence in the graphical representation. After graph construction, the weight of each node is calculated using the word frequency and the weight value associated with each component of the input document. The following step is to locate the candidate edge list, which contains the edge sets selected based on some


Fig. 2 Example of a text graph model; the generated edges represent candidate links resulting from the search algorithm for a sentence

constraints. The selected candidate list of edges is then passed to another algorithm, which selects the highly frequent sentences using a pre-defined threshold value. Figure 2 provides a sample network constructed using the above algorithm and the edges selected from the graph for summary generation.

2.6 Hybrid Summarization This hybrid approach is mainly based on extractive summarization, where the extractive summary is used as the basis for developing an abstract-oriented summary. The suggested method for choosing the sentences of the extracted summary integrates statistical measures, sentiment analysis, and a ranking algorithm. A summary is prepared based on each sentence's significance level, and this summary is used as the input to a second phase which produces the abstract-oriented summary. Most papers discuss WordNet, a lexical database of meaningful relationships between words covering more than 200 languages. WordNet [22] encodes semantic relationships between words, such as synonyms, hyponyms, and meronyms, and groups synonyms into synsets with short definitions and usage examples. These synonyms can be used to substitute certain words in the extractive sentences, and certain sentences can be replaced by words whose definition covers them. Another hybrid approach uses a graphical representation [13] of the extractive summary, where path filtering and information extraction are performed to give a compressed summary; it also generates new sentences for the summary.


3 Proposed Methodology We discuss our suggested method for automatic text summarization in this section. The suggested strategy has two main stages, phase-1 and phase-2. The extractive summary is created in phase-1 from the semantic features of the document, and the abstractive summary is then built from it by the language generator module in phase-2. Figures 3 and 4 show the architecture of our suggested design.

3.1 PHASE-1: Extractive Summary Our proposed method of generating the extractive summary consists of the following steps: (1) removing irrelevant details by pre-processing the text; (2) applying the distributional semantic model over the pre-processed text; (3) clustering the captured semantics into groups of semantically similar sentences using a clustering algorithm; and (4)

Fig. 3 Block diagram of proposed extractive method

Fig. 4 Block diagram of proposed abstractive method


each cluster is subjected to a rating algorithm, and the summary is created using the highly scored sentences of each group. Pre-processing: a text document contains both relevant and irrelevant data, and only the relevant data is needed for generating the extractive summary. To remove inconsistencies in the data, we apply pre-processing [23]. After this step, the data is made suitable for the next step of the suggested technique. Pre-processing has the following steps. URL removal: the URLs present in the input document, in both "www" and "http" formats, are removed. Stop-word removal: stop words are the most problematic inconsistencies for summarization; the available NLP software packages can remove them. Lowercase handling: the whole input document is converted into lowercase. Hyphen removal: hyphens also create irrelevancies in the data, which may cause meaningless words to appear in the generated summary. Tokenization: the whole input document is tokenized into individual sentences for further processing; tokenization is performed using the CoreNLP packages. Distributional semantic model for the semantic feature: a distributional semantic model is used for capturing the semantics because such models do not require any linguistic or lexical features for analysis and are more generic in nature. As the model does not require any external source to capture the semantic information, it is used to measure the semantic coherence between two textual sentences in a document. According to the theory underlying these models, all words that appear in the same context may have the same meaning; with this hypothesis we can infer the meaning of each word appearing in the document. The model constructs embeddings for each word, known as semantic embeddings, using statistical computation of word occurrences in the document. Word vectors are real-valued vectors in a multidimensional space calculated for every term. Represented in a high-dimensional vector space, their geometric properties can be used to find the coherence between words, which in turn helps to find syntactically and semantically similar text in the article. To exploit the similarity of words in the document using a distributional semantic model, the Word2Vec model [19] is employed. Word2Vec is used in many applications; it converts a textual document into an array of embeddings or vectors that represent each phrase in the text based on semantic and syntactic features. Word2Vec is designed as a simple two-layer neural network commonly used to process text documents. The input to this network is a textual collection, and the output is a set of vectors, one per word in the document, termed the feature vectors of the words. The Word2Vec algorithm is trained to produce a vector space representation of words using this two-layer neural network. There are two main architectures: skip-gram and continuous bag of words (CBOW). These architectures differ in

Automatic Text Summarization Using Word Embeddings

497

the way the embedding for each word is derived through the neural network: CBOW derives the embedding of the target word from its context words, while skip-gram derives the embeddings of the context words from the target word. For our proposed method we use the skip-gram algorithm. For the computation of the Word2Vec model, we use the pre-trained Word2Vec model from the Gensim package, trained over the Google News dataset. Once the representation of each word is obtained through the semantic model, we get an array of vector values for each word in the sentence. Instead of keeping this long array of vectors, an averaging function is applied to each sentence: it converts the word vectors of a sentence into a single vector representing the whole sentence, in the same way that a statistical estimate provides a concise and complete representation of a collection of data. The vector representing the whole sentence is termed the big vector, and it is fed to the clustering step to identify similar sentences in the document. Semantically similar sentences can be identified based on the distance between their vectors: the lower the distance, the more similar they are. Clustering: once the big vectors are constructed for each sentence from the Word2Vec model using the distributional semantic hypothesis, we pass them to the clustering part of our proposed method. Clustering algorithms are used to group semantically similar sentences. The algorithm used for clustering in our proposed method is K-means [24, 25], which forms clusters of semantically similar big vectors of the sentences of the input text corpus according to the chosen K value. After this step, a ranking algorithm is applied to each cluster to find the highly ranked sentences, which gives the extractive summary of the proposed method. Ranking Algorithm: in the last step, we apply a rating formula [25] to order the sentences. The sentences are placed in different clusters, and we apply the grading technique to each cluster so that after the ranking step every sentence in a cluster carries a ranking score. The highest-ranked sentence of each cluster is chosen as a sentence of the extractive summary. The feature utilized to determine how similar the sentences in a group are is cosine similarity: it measures how closely two documents are related to one another; the larger the cosine similarity between two sentences, the larger their score. Equation (1) measures the similarity between two vectors A and B.

sim(A, B) = cos θ = (A · B) / (‖A‖ ‖B‖)   (1)

After applying the cosine similarity metric, which generates a rating for every sentence in each group, the highest-scored sentence of each cluster is chosen as the candidate sentence for the extractive summary. As each cluster contains sentences of similar meaning, there is a chance of redundancy. Our proposed method removes this redundancy, as only the highest-ranked sentences are taken into account. We can also avoid redundancy if we choose to take more than one

498

S. Antony and D. S. Pankaj

sentence from each cluster for the generation of the summary. Redundancy or sentence rephrasing can be detected by checking the ranking scores: if the scores are almost the same and the sentences belong to the same cluster, they convey a similar meaning and the duplicates can be removed. This is one of the novel features of our approach, and it is essential for summarizing long textual documents, where the corpus may include sentences repeated in many forms whose grades would all be high; our approach identifies and discards such repetitions. Any statistical measure can be used to decide which of two semantically similar sentences should be chosen: the sentence scored higher by that statistical feature is kept for the summary generation. A sketch of this phase is given below.
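The following is a minimal sketch of phase-1 under stated assumptions, not the authors' code: Gensim's pre-trained Google News Word2Vec vectors, scikit-learn's K-means, and ranking within each cluster by average cosine similarity to the other cluster members (one way to realize the rating step). The sentence list, the value of k, and the helper name big_vector are placeholders.

```python
import numpy as np
import gensim.downloader as api
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

wv = api.load("word2vec-google-news-300")          # pre-trained Google News vectors

def big_vector(sentence):
    # Average the Word2Vec vectors of in-vocabulary tokens ("big vector" of the sentence).
    words = [w for w in sentence.lower().split() if w in wv]
    return np.mean([wv[w] for w in words], axis=0) if words else np.zeros(wv.vector_size)

sentences = [
    "Pre-processed sentence one of the input document.",
    "Pre-processed sentence two of the input document.",
    "Pre-processed sentence three of the input document.",
    "Pre-processed sentence four of the input document.",
]  # placeholder input; in practice these come from the pre-processing step

X = np.vstack([big_vector(s) for s in sentences])

k = 2                                              # number of clusters / summary sentences
labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)

summary = []
for c in range(k):
    idx = np.where(labels == c)[0]
    # Rank each sentence by its average cosine similarity to the rest of its cluster (Eq. 1).
    scores = cosine_similarity(X[idx]).mean(axis=1)
    summary.append(sentences[idx[np.argmax(scores)]])
print(" ".join(summary))
```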

3.2 PHASE-2: Abstractive Summary Once the extractive summary has been generated, phase-2 of our proposed system tries to generate an abstractive version [26] of the summary using three main methods: WordNet, the Lesk algorithm, and POS tagging. These methods are applied together, step by step, to generate an abstractive version of the generated extractive summary. Together they may be termed the language generator module, which tries to replace certain words and tags in the extractive summary with new forms. The three main methods used in phase-2 are explained below. WordNet: the lexical database WordNet contains the meaning relationships between words in more than 200 languages. Synonyms, hyponyms, and other semantic relationships are linked together via WordNet. Synsets serve as WordNet's vertices, and links between pairs of nodes can be conceptual-semantic (e.g., bird and feather) or lexical (e.g., feather and feathery). Figure 5 depicts an example of how synsets in WordNet can be used to replace a word with its synonym. Consider the term 'bike': it has a diverse range of meanings, as it can refer to a motorbike, a bicycle, or the verb to bicycle. Motorcycle.n.01, bicycle.n.01, and bicycle.v.01 are the three synsets with distinctive names that WordNet uses to identify them, and each synset contains a range of lexical names expressing the same notion; motorcycle.n.01, for example, contains the words 'motorcycle' and 'bike'. The extractive summary text from phase-1 is first sentence-tokenized, and each sentence is then word-tokenized. Once each word is separated, we apply the WordNet library to these words to check the different synonyms available for each word; before applying WordNet we remove all the stop-words in the text to avoid unnecessarily checking synonyms of such words. Lesk Algorithm: the foundation of the Lesk algorithm is the idea that terms in a particular 'neighborhood' (portion of text) typically share a subject matter. A condensed version of the Lesk algorithm compares the dictionary definition of a lexical item with the words in its vicinity. A possible implementation would resemble this:


Fig. 5 Example for WordNet

Criterion one: for each sense of the word being disambiguated, count the overlap between the words in the word's neighborhood and the words in the dictionary definition of that sense. Criterion two: the sense with the highest count is the one to be selected. To handle the word sense disambiguation (WSD) problem for a replacement synonym, we use the Lesk algorithm. The array of synonyms from the WordNet step is passed to the Lesk algorithm library, which compares the sense of the replacement word in the sentence with the sense of the actual word in the sentence; if both match, the replacement word is compatible with the actual word. Once the sense has been checked, the generator module proceeds to the last step, checking the tag of the replacement word, explained next. POS Tagger: the task of labeling each term in a sentence with the proper part of speech is known as POS tagging. Nouns, verbs, adverbs, adjectives, pronouns, conjunctions, and their various subclasses are all examples of parts of speech. The word selected for replacement is passed to the POS tagger, which compares its tag with that of the word to be replaced; if the two tags match, the word is replaced, otherwise the process continues until a replacement is found. After running the language generator module, the step-by-step replacement of words [27] in the different sentences turns the extractive summary into an abstract summary. The phase-2 process replaces words in a loop: the language generator keeps searching until a replacement is found for a particular word, or none exists. Once the replacement of words has been executed, the generator module outputs a summary in a somewhat abstractive form. A small sketch of this replacement loop is given below.
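Below is a minimal sketch of the replacement loop, assuming NLTK's WordNet interface, Lesk implementation, and POS tagger rather than any specific tooling named by the authors; the input sentence and the helper replace_word are illustrative, and the usual NLTK resources (wordnet, punkt, averaged_perceptron_tagger, stopwords) must be downloaded first.

```python
from nltk.corpus import wordnet, stopwords
from nltk.wsd import lesk
from nltk import word_tokenize, pos_tag

def replace_word(word, sentence_tokens):
    original_tag = pos_tag([word])[0][1]
    original_sense = lesk(sentence_tokens, word)            # sense of the original word in context
    for syn in wordnet.synsets(word):
        for lemma in syn.lemma_names():
            candidate = lemma.replace("_", " ")
            if candidate.lower() == word.lower():
                continue
            # Accept the synonym only if Lesk assigns it the same sense in this context
            # and its POS tag matches the original word's tag.
            if lesk(sentence_tokens, candidate) == original_sense and \
               pos_tag([candidate])[0][1] == original_tag:
                return candidate
    return word                                             # no acceptable replacement found

sentence = "The committee will examine the report carefully"   # illustrative input
tokens = word_tokenize(sentence)
stops = set(stopwords.words("english"))
abstractive = " ".join(replace_word(t, tokens) if t.lower() not in stops else t
                       for t in tokens)
print(abstractive)
```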


4 Experimental Setup and Results This section describes the two datasets and the experiments used to assess the suggested technique, and reports the recorded outcomes.

4.1 Evaluation Datasets The Document Understanding Conference (DUC) provides datasets for the analysis of text summarization, and DUC [28] is one of the most popular datasets for text summarization methods. It contains the following data. DUC 2004: the data available for every DUC document includes the input documents needed for summarization, the output summaries and results based on those inputs, human-written summaries, summaries automated using baseline models, summaries provided by the systems of the participating groups, charts reflecting the findings of the evaluation, and further supporting information and tools for summarization. DUC2004, for example, contains 51 clusters of related documents, and each cluster has 10 input texts, giving 510 input documents in total. Each cluster deals with a specific topic (e.g., a hurricane, war, textiles) and comes with model summaries created by NIST assessors. Reference summaries of 50, 100, 200, and 400 words are provided for each document. CNN/Daily Mail: the CNN/Daily Mail database is an English-language collection that contains over 300,000 individual news stories published by CNN and Daily Mail reporters. According to their scripts, the corpus comprises 286,817 training pairs, 13,368 validation pairs, and 11,487 test pairs. For our proposed method, a total of 2500 test pairs were used for performance evaluation.

4.2 Baselines Our proposed method is compared with the following state-of-the-art models. Gensim [29] summarization is an implementation of the TextRank algorithm. In TextRank [30], the relevance of a sentence, depicted as a vertex of the text graph, is calculated iteratively from the global structure of the graph. The algorithm works by voting: when one vertex links to another, it casts a vote for it, and the more votes a vertex receives from other vertices, the higher its rank. PKUSUMSUM [27] is a Java summarization platform that incorporates ten different summarization techniques and supports many languages. Additionally, it supports three summarization tasks: single-document summarization, multi-document summarization, and topic-based multi-document summarization.


OPINOSIS [10] is a framework for graph-based summarization that can produce succinct and effective abstractive summaries. The summaries are insightful and useful for locating the document's critical viewpoints. The algorithm represents the input text as a word-graph data structure and iterates over the graph repeatedly to obtain the summary. PyTextRank is a graph-based summarization technique implemented in Python as a variant of the TextRank [30] algorithm that generates text abstracts instead of feature vectors: rather than extracting relevant features with TextRank, it builds the text summaries using graph techniques.

4.3 Evaluation Metrics The two commonly used evaluation metrics are explained below. ROUGE: ROUGE [31] stands for Recall-Oriented Understudy for Gisting Evaluation. It is a set of measures for assessing automatic text or document summarization, as well as machine translation, and is utilized as the assessment metric in most text summarization work. It operates by comparing an automatically generated summary or translation with a set of reference summaries (typically human-produced). Depending on the granularity at which system and reference summaries are compared, variants such as ROUGE-N, ROUGE-S, and ROUGE-L are obtained. ROUGE-N compares the generated and reference summaries for unigram, bigram, trigram, and higher-order n-gram overlap. ROUGE-L employs the LCS (Longest Common Subsequence), i.e., the longest matched word sequence. The LCS approach does not require consecutive matches, only in-sequence matches, which helps to capture sentence-level word order. ROUGE 1, for instance, corresponds to the overlap of unigrams between the reference summary and the system summary. BLEU: BLEU (Bilingual Evaluation Understudy) is a score used to compare a candidate rendering of a sentence to one or more reference summaries or translations. A perfect match receives a value of 1, whereas a complete mismatch receives a value of 0. The score was created to assess the accuracy of automatic machine translation predictions. The BLEU score [32, 33] is implemented in the Python Natural Language Toolkit (NLTK), which can be used to compare generated text to a reference. Although it was created to assess machine translation, it can also be used to assess text for a number of natural language understanding applications (Tables 1, 2, 3, and 4). A small illustrative computation of both metrics is given below.
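As a small illustration (not the paper's evaluation script), the snippet below computes BLEU with NLTK, as mentioned above, and ROUGE with the third-party rouge-score package, which is an assumption on our part; the reference and candidate strings are made up.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer   # third-party "rouge-score" package (our assumption)

reference = "the model generates a concise summary of the document"
candidate = "the model produces a concise summary of the document"

# BLEU via NLTK; smoothing avoids zero scores on short texts.
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)

# ROUGE 1 / ROUGE 2 / ROUGE-L scores via rouge-score.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)

print(round(bleu, 3))
print({k: round(v.fmeasure, 3) for k, v in rouge.items()})
```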


Table 1 ROUGE score for CNN/Daily Mail dataset
Document | Metric | ROUGE type | Phase-1 | Phase-2
1 | Precision | ROUGE 1 | 0.485 | 0.455
1 | Precision | ROUGE 2 | 0.200 | 0.114
1 | Precision | ROUGE-L | 0.073 | 0.175
1 | Recall | ROUGE 1 | 0.145 | 0.150
1 | Recall | ROUGE 2 | 0.042 | 0.027
1 | Recall | ROUGE-L | 0.000 | 0.001
1 | F1-score | ROUGE 1 | 0.224 | 0.226
1 | F1-score | ROUGE 2 | 0.069 | 0.043
1 | F1-score | ROUGE-L | 0.001 | 0.002
2 | Precision | ROUGE 1 | 0.649 | 0.486
2 | Precision | ROUGE 2 | 0.220 | 0.146
2 | Precision | ROUGE-L | 0.110 | 0.145
2 | Recall | ROUGE 1 | 0.224 | 0.182
2 | Recall | ROUGE 2 | 0.060 | 0.043
2 | Recall | ROUGE-L | 0.001 | 0.001
2 | F1-score | ROUGE 1 | 0.333 | 0.265
2 | F1-score | ROUGE 2 | 0.095 | 0.066
2 | F1-score | ROUGE-L | 0.001 | 0.002

Table 2 BLEU score for CNN/Daily Mail dataset
Document & Reference | Phase-1 | Phase-2
1 | 0.122 | 0.117
2 | 0.144 | 0.129
3 | 0.155 | 0.140
4 | 0.139 | 0.127
5 | 0.083 | 0.069
6 | 0.125 | 0.121
7 | 0.065 | 0.057
8 | 0.174 | 0.160
9 | 0.095 | 0.085
10 | 0.090 | 0.080

5 Result Analysis Figures 6 and 7 depict an example summarization result for an input document of DUC2004. The average and maximum ROUGE scores gained by our proposed model over the CNN/Daily Mail and DUC2004 datasets are shown in Tables 5 and 6. The proposed model evaluates the ROUGE score and the BLEU score, which


Table 3 ROUGE score for DUC2004 dataset
Document | Metric | ROUGE type | Phase-1 | Phase-2
1 | Precision | ROUGE 1 | 0.400 | 0.288
1 | Precision | ROUGE 2 | 0.071 | 0.018
1 | Precision | ROUGE-L | 0.095 | 0.168
1 | Recall | ROUGE 1 | 0.327 | 0.264
1 | Recall | ROUGE 2 | 0.063 | 0.018
1 | Recall | ROUGE-L | 0.001 | 0.001
1 | F1-score | ROUGE 1 | 0.360 | 0.275
1 | F1-score | ROUGE 2 | 0.067 | 0.018
1 | F1-score | ROUGE-L | 0.001 | 0.003
2 | Precision | ROUGE 1 | 0.338 | 0.263
2 | Precision | ROUGE 2 | 0.093 | 0.037
2 | Precision | ROUGE-L | 0.094 | 0.145
2 | Recall | ROUGE 1 | 0.276 | 0.241
2 | Recall | ROUGE 2 | 0.079 | 0.036
2 | Recall | ROUGE-L | 0.001 | 0.001
2 | F1-score | ROUGE 1 | 0.303 | 0.251
2 | F1-score | ROUGE 2 | 0.086 | 0.037
2 | F1-score | ROUGE-L | 0.001 | 0.002

Table 4 BLEU score for DUC2004 dataset
Document | Reference | Phase-1 | Phase-2
1 | 1 | 0.041 | 0.036
1 | 2 | 0.042 | 0.038
1 | 3 | 0.042 | 0.039
1 | 4 | 0.046 | 0.041
2 | 1 | 0.045 | 0.039
2 | 2 | 0.047 | 0.041
2 | 3 | 0.045 | 0.041
2 | 4 | 0.049 | 0.041
3 | 1 | 0.056 | 0.042
3 | 2 | 0.059 | 0.045
3 | 3 | 0.051 | 0.044
3 | 4 | 0.059 | 0.047


indicate whether the summaries generated from the input documents of the DUC2004 and CNN/Daily Mail datasets are equivalent to the reference summaries of these datasets. Our model generates two different summaries; hence, two different phase scores are available. The phase-1 ROUGE and BLEU scores denote the scores of the extractive summary, and the phase-2 ROUGE and BLEU scores denote the scores of the abstractive summary. Different state-of-the-art models are taken into consideration for the result analysis: Gensim [29], PKUSUMSUM [27], OPINOSIS [10], and PyTextRank [30]. These models provide different methodologies for generating summaries from an input text document. We compare them with our model to evaluate how competitively our proposed system performs; they are commonly used as baselines in research studies and experimental analyses in the text summarization area. Tables 7 and 8 show the comparative study of these baseline models with our proposed model. The comparison is done with the ROUGE metric over the

Fig. 6 Example of an extractive summary generated in phase-1

Fig. 7 Example of an abstractive summary generated in phase-2

Table 5 Averaged and maximum ROUGE score in the CNN/Daily Mail dataset
Metric | ROUGE type | Average value | Max value
Precision | ROUGE 1 | 0.096 | 0.2
Precision | ROUGE 2 | 0.093 | 0.23
Precision | ROUGE-L | 0.095 | 0.203
Recall | ROUGE 1 | 0.101 | 0.203
Recall | ROUGE 2 | 0.092 | 0.272
Recall | ROUGE-L | 0.097 | 0.205
F1-score | ROUGE 1 | 0.092 | 0.166
F1-score | ROUGE 2 | 0.093 | 0.208
F1-score | ROUGE-L | 0.098 | 0.177

Table 6 Averaged and maximum ROUGE score in the DUC2004 dataset
Metric | ROUGE type | Average value | Max value
Precision | ROUGE 1 | 0.171 | 0.389
Precision | ROUGE 2 | 0.021 | 0.169
Precision | ROUGE-L | 0.113 | 0.292
Recall | ROUGE 1 | 0.22 | 0.466
Recall | ROUGE 2 | 0.028 | 0.222
Recall | ROUGE-L | 0.013 | 0.005
F1-score | ROUGE 1 | 0.189 | 0.402
F1-score | ROUGE 2 | 0.024 | 0.024
F1-score | ROUGE-L | 0.002 | 0.002

phase-2 results, i.e., the comparison is done with the abstractive summary generated by the model. For a broader comparison, we analyzed the system at 50% and 25% summary lengths. It is clear from the comparative findings on the DUC2004 and CNN/Daily Mail datasets that our suggested system works more effectively with respect to Precision and F-score. Tables 9 and 10 also show the comparative study of the baseline models with our proposed model with respect to the CNN/Daily Mail dataset. PyTextRank consistently achieves the highest overall average F-score for both the CNN/Daily Mail and DUC2004 datasets. PKUSUMSUM exceeds our F-score values at the 25% summary length for the CNN/Daily Mail dataset. Also, the Gensim model reaches average recall scores of 0.4 to 0.8 for both unigrams and bigrams of the ROUGE score on both datasets. The findings demonstrate the strong competitive efficiency of our suggested system. The BLEU score is also taken into account, where the proposed system scored averages of 0.051 and 0.044 for the 25% and 50% lengths, and averages of 0.057 and 0.048 for the phase-1 and phase-2 results, on the DUC2004 dataset. Similarly, for the CNN/Daily Mail dataset, the phase-1 and phase-2 BLEU scores at 50% are 0.121 and 0.107 on average, and at 25% they are 0.110 and 0.099. The maximum ROUGE values generated by our model, between 0.2 and 0.3 for CNN/Daily Mail and between 0.2 and 0.5 for DUC2004, are depicted in Tables 5 and 6.

6 Conclusion and Future Works In this paper, the proposed methodology aims to provide a summary with higher semantic meaning. The distributional hypothesis used to bring out the semantic relationship between sentences helps generate summaries of better quality. The paper describes a distributional hypothesis-based extractive summarization technique that preserves the meaning of the text in order to produce better extractive summaries. We have also discussed a way to upgrade the generated extractive summary to an abstractive summary by using a combination of three tools: WordNet, the Lesk algorithm, and a POS tagger.


Table 7 Averaged 25% summarization result over DUC2004 dataset

Metric    | ROUGE type | Propd. approach | Gensim | OPINOSIS | PKUSUMSUM | PyTextRank
Precision | ROUGE 1    | 0.175 | 0.05 | 0.19 | 0.1  | 0.03
Precision | ROUGE 2    | 0.022 | 0.02 | 0.03 | 0.04 | 0.097
Precision | ROUGE-L    | 0.117 | 0.05 | 0.05 | 0.1  | 0.44
Recall    | ROUGE 1    | 0.22  | 0.84 | 0.07 | 0.74 | 0.12
Recall    | ROUGE 2    | 0.031 | 0.44 | 0.01 | 0.28 | 0.701
Recall    | ROUGE-L    | 0.012 | 0.47 | 0.05 | 0.49 | 0.369
F1-score  | ROUGE 1    | 0.185 | 0.09 | 0.08 | 0.17 | 0.046
F1-score  | ROUGE 2    | 0.024 | 0.01 | 0.01 | 0.17 | 0.163
F1-score  | ROUGE-L    | 0.002 | 0.09 | 0.07 | 0.17 | 0.075

Table 8 Averaged 50% summarization result over DUC2004 dataset

Metric    | ROUGE type | Propd. approach | Gensim | OPINOSIS | PKUSUMSUM | PyTextRank
Precision | ROUGE 1    | 0.27  | 0.048 | 0.064 | 0.075 | 0.097
Precision | ROUGE 2    | 0.046 | 0.043 | 0.189 | 0.049 | 0.317
Precision | ROUGE-L    | 0.158 | 0.024 | 0.088 | 0.028 | 0.109
Recall    | ROUGE 1    | 0.182 | 0.521 | 0.053 | 0.649 | 0.106
Recall    | ROUGE 2    | 0.03  | 0.875 | 0.07  | 0.85  | 0.404
Recall    | ROUGE-L    | 0.008 | 0.557 | 0.025 | 0.508 | 0.165
F1-score  | ROUGE 1    | 0.2   | 0.087 | 0.067 | 0.134 | 0.101
F1-score  | ROUGE 2    | 0.032 | 0.08  | 0.08  | 0.092 | 0.355
F1-score  | ROUGE-L    | 0.001 | 0.045 | 0.029 | 0.053 | 0.136

Table 9 Averaged 50% summarization result over CNN/Daily Mail dataset

Metric    | ROUGE type | Propd. approach | Gensim | OPINOSIS | PKUSUMSUM | PyTextRank
Precision | ROUGE 1    | 0.48  | 0.048 | 0.064 | 0.075 | 0.097
Precision | ROUGE 2    | 0.15  | 0.043 | 0.189 | 0.049 | 0.317
Precision | ROUGE-L    | 0.17  | 0.024 | 0.088 | 0.028 | 0.109
Recall    | ROUGE 1    | 0.15  | 0.521 | 0.053 | 0.649 | 0.106
Recall    | ROUGE 2    | 0.03  | 0.875 | 0.07  | 0.85  | 0.404
Recall    | ROUGE-L    | 0.006 | 0.557 | 0.025 | 0.508 | 0.165
F1-score  | ROUGE 1    | 0.22  | 0.087 | 0.067 | 0.134 | 0.101
F1-score  | ROUGE 2    | 0.06  | 0.08  | 0.08  | 0.092 | 0.355
F1-score  | ROUGE-L    | 0.013 | 0.045 | 0.029 | 0.053 | 0.136


Table 10 Averaged 25% summarization result over CNN/Daily Mail dataset

Metric    | ROUGE type | Propd. approach | Gensim | OPINOSIS | PKUSUMSUM | PyTextRank
Precision | ROUGE 1    | 0.3    | 0.05 | 0.19 | 0.1  | 0.03
Precision | ROUGE 2    | 0.06   | 0.02 | 0.03 | 0.04 | 0.097
Precision | ROUGE-L    | 0.13   | 0.05 | 0.05 | 0.1  | 0.44
Recall    | ROUGE 1    | 0.18   | 0.84 | 0.07 | 0.74 | 0.12
Recall    | ROUGE 2    | 0.03   | 0.44 | 0.01 | 0.28 | 0.701
Recall    | ROUGE-L    | 0.0012 | 0.47 | 0.05 | 0.49 | 0.369
F1-score  | ROUGE 1    | 0.21   | 0.09 | 0.08 | 0.17 | 0.046
F1-score  | ROUGE 2    | 0.045  | 0.01 | 0.01 | 0.17 | 0.163
F1-score  | ROUGE-L    | 0.002  | 0.09 | 0.07 | 0.17 | 0.075

For evaluating the performance of the system, two different datasets are used: CNN/Daily Mail and DUC2004. For evaluating the described model, two metrics are considered: ROUGE and BLEU. The proposed method is compared with different state-of-the-art models, and the comparative results have been discussed. The results show that the language generator module in phase-2 tries to generate an abstractive summary, which is a promising way of converting an extractive summary into an abstractive one. Further work in this direction can lead to the generation of summaries in a more human-comprehensible form. The suggested system's shortcoming is that phase-2 generates an abstractive version on top of the extractive summary: the generated extractive summaries have better ROUGE and BLEU scores, but the overall performance of the hybrid summarization is not very promising when considering human comprehension. A good language generation process is required to produce summaries that are more human comprehensible in nature. Our future work will address (1) employing a better word embedding to capture semantics at a very fine-grained level; (2) invoking more semantic features in the ranking algorithm to rank the sentences in a much better manner; and (3) a language generator module which can compress sentences and form new sentences that are more human comprehensible.

References 1. Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165 2. Mishra R, Gayen T (2018) Automatic lossless-summarization of news articles with abstract meaning representation. Procedia Comput Sci 135:178–185; The 3rd international conference on computer science and computational intelligence (ICCSCI 2018): empowering smart technology in digital era for a better life 3. Christian H, Agnus M, Suhartono D (Dec 2016) Single document automatic summarization using term frequency-inverse document frequency (TF-IDF). ComTech: Comput, Math Eng Appl 7:285


4. Khan A, Salim N, Kumar YJ (2015) A framework for multi-document abstractive summarization based on semantic role labelling. Appl Soft Comput 30:02 5. El-Kassas W, Salama C, Rafea A, Mohamed H (2020) Edgesumm: graph-based framework for automatic text summarization. Inf Process Manage 57:06 6. Alzuhair A, Al-Dhelaan M (2019) An approach for combining multiple weighting schemes and ranking methods in graph-based multi-document summarization. IEEE Access 7:120375– 120386 7. Lin Y-C, Ma J (2021) On automatic text extractive summarization based on graph and pretrained language model attention 8. Ma S, Sun X, Li W, Li S, Li W, Ren X (2018) Query and output: generating words by querying distributed word representations for paraphrase generation 9. Conroy JM, O’leary DP (2001) Text summarization via hidden Markov models. SIGIR’01. Association for Computing Machinery, New York, NY, USA, pp 406–407 10. Tohalino JV, Amancio DR (2018) Extractive multi-document summarization using multilayer networks. Phys A 503:526–539 11. Guan Y, Guo S, Li R, Li X, Zhang H (Nov 2021) Integrating semantic scenario and word relations for abstractive sentence summarization. In: Proceedings of the 2021 conference on empirical methods in natural language processing. Online and Punta Cana, Dominican Republic, Association for Computational Linguistics, pp 2522–2529 12. Alguliyev R, Aliguliyev R, Isazade N, Abdi A, Idris N (2019) Cosum: text summarization based on clustering and optimization. Expert Syst 36:02 13. Lloret E, Rom´a-Ferri M, Sanz M (Nov 2013) Compendium: a text summarization system for generating abstracts of research papers. Data Knowl Eng 88:164–175 14. Chen Q, Zhu X, Ling Z, Wei S, Jiang H (2016) Distraction-based neural networks for document summarization 15. Verma S, Nidhi V (2017) Extractive summarization using deep learning. CoRR, abs/1708.04439 16. Paulus R, Xiong C, Socher R (2017) A deep reinforced model for abstractive summarization 17. Chopra S, Auli M, Rush AM (June 2016) Abstractive sentence summarization with attentive recurrent neural networks. In: Proceedings of the 2016 conference of the North American chapter of the association for computational Linguistics: human language technologies. Association for Computer Lingusitics, San Diego, California, pp 93–98 18. An C, Zhong M, Geng Z, Yang J, Qiu X (2021) Retrieval-sum: a retrieval enhanced framework for abstractive summarization 19. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space 20. Wong K-F, Wu M, Li W (2008) Extractive summarization using supervised and semi-supervised learning. In: Proceedings of the 22nd international conference on computational Linguistics— volume 1, COLING’08. Association for Computational Linguistics, USA, pp 985–992 21. Fang C, Dejun M, Deng Z, Zhiang W (2016) Word-sentence co-ranking for automatic extractive text summarization. Expert Syst Appl 72:12 22. Bhat IK, Mohd M, Hashmy R (Jan 2018) SumItUp: a hybrid single-document text summarizer, pp 619–634 23. Mohd M, Jan R, Bashir M (Oct 2019) Text document summarization using word embedding. Expert Syst Appl 143:112958 24. Jin X, Han J. K-means clustering. Springer US, Boston, pp 563–564 25. Widyassari AP, Rustad S, Shidik GF, Noersasongko E, Syukur A, Affandy A, Setiadi DRIM (2020) Review of automatic text summarization techniques and methods. J King Saud Univ— Comput Inf Sci 26. You J, Hu C, Kamigaito H, Takamura H, Okumura M (Sept 2021) Abstractive document summarization with word embedding reconstruc tion. 
In: Proceedings of the international conference on recent advances in natural language processing (RANLP 2021). Held Online, INCOMA Ltd, pp 1586–1596


27. Zhu J, Zhou L, Li H, Zhang J, Zhou Y, Zong C (Jan 2018) Augmenting neural sentence summarization through extractive summarization, pp 16–28 28. Gambhir M, Gupta V (2016) Recent automatic text summarization techniques: a survey. Artif Intell Rev 47:1–66 29. Barrios F, López F, Argerich L, Wachenchauzer R (2016) Variations of the similarity function of textrank for automated summarization. CoRR, abs/1602.03606 30. Mihalcea R, Tarau P (July 2004) TextRank: bringing order into text. In: Proceedings of the 2004 conference on empirical methods in natural language processing. Association for Computational Linguistics, Barcelona, Spain, pp 404–411 31. Ganesan K (2018) Rouge 2.0: updated and improved measures for evaluation of summarization tasks 32. Parida S, Motlicek P (Nov 2019) Abstract text summarization: a low resource challenge. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, pp 5994–5998 33. Indu M, Kavitha KV (2016) Review on text summarization evaluation methods. In: 2016 international conference on research advances in integrated navigation systems (RAINS), pp 1–4

Video Object Detection with an Improved Classification Approach Sita Yadav and Sandeep M. Chaware

Abstract Object detection is a primary task in computer vision. Various CNNs are widely used by researchers to improve the classification and detection of objects present in video frames. Object detection is a prime task in self-driven cars, satellite imagery, robotics, etc. The proposed work is focused on the improvement of object classification and detection in videos for video analytics. The key focus of the work is the identification and tuning of hyper-parameters in deep learning models. Deep learning-based object detection models are broadly classified into two categories, i.e., one-stage detectors and two-stage detectors. We have selected a one-stage detector for experimentation. In this paper, a custom CNN model is given with hyper-parameter tuning, and the results are compared with state-of-the-art models. It is found that hyper-parameter tuning on CNN models helps improve the object classification and detection accuracy of deep learning models. Keywords Hyper-parameter tuning · YOLO · Classification · Object detection · CNN

1 Introduction Detection of objects in videos is a primary task for video analytics. It deals with the identification of objects of a particular class in videos. Object detection is primarily required in autonomous driving surveillance cameras, medical images, robotics, and many more. Object detection gives localization and classification as output, it generally returns the class of object with confidence scores. For video analytics, the hyperparameter tuning can be done by considering learning rate, activation function, and S. Yadav (B) · S. M. Chaware PCCOE, Pune, India e-mail: [email protected] S. Yadav Army Institute of Technology, Pune, India S. M. Chaware JSPM RSCOE, Pune, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_38


weight parameters. Usually, a trial-and-error method is used to identify the best parameters to be hyper-tuned. This paper covers the various parameters which can be used for hyper-parameter tuning in video analytics. Experimentally, it is shown that improved hyper-parameter tuning speeds up object detection for video analytics. The major contribution of the paper is the identification of hyper-tuning parameters which directly impact the result of video object detection. Also, it is shown that normalized input improves the results. Hyper-parameter tuning and optimization have been an area of research for many years, mainly through grid search. The commonly used models were the Gaussian process model and TPE, which give density estimates for hyper-parameters. These methods are competent with a few hyper-parameters but demand high computation. On the other hand, manual hyper-parameter tuning methods are less computationally expensive and make it easy to judge the performance of any CNN-based model. Former object detection models were based on selective search-based region proposals. Such models used a CNN for candidate region proposals and as a feature extractor; the classifier used was an SVM, as it can also handle regression. The fine-tuning of parameters was done with respect to learning rates [1]. The next object detection methodology was based on region proposals and is implemented as Fast R-CNN. The base network involved in the process was VGG-16, and RoI pooling with a softmax layer is used for implementation [2]. The later version of the region proposal method was Faster R-CNN, which is much faster than selective search. In region proposal methods, a multitask loss function is considered [3], and the loss function considered in [1–3] is given below:

L\left(p, u, t^{u}, v\right) = \overbrace{L_{c}(p, u)}^{\text{classification}} + \lambda\,[u \geq 1]\,\overbrace{L_{l}(t^{u}, v)}^{\text{localization}}    (1)

In the above function, the classification term represents the log loss on the object prediction for class u, and the localization term represents the L1 loss over the four coordinates of the rectangle giving the position of the object. When u = 0, no object is present, and the parameter λ balances classification against localization. Here, p represents the predicted probability for an object. In video object detection, speed is required to detect objects in continuous frames; hence, one-stage object detector methods like SSD [4] and YOLO [5] are used. The one-stage methods use a multi-term loss function along with a grid proposal for an object so that detection completes in one stage. This approach achieves speed but compromises accuracy. After the first version of YOLO, many other upgraded versions of one-stage detectors were proposed by researchers. The YOLO (you only look once) versions V1, V2, V3, V4, and V5 are progressive methods for object detection with speed and accuracy. In all the above-discussed research works, the focus was more on neural network building, loss functions, learning rates, classifiers, and the number of stages involved in object classification and detection. The role of hyper-parameter tuning is identified as one of the essential aspects of video object analytics dealing with object detection.


Convolutional neural networks have multiple layers and architectures. The deep learning architectures considered here are divided into two categories: the first is based on depth-wise separable convolutions, and the second works on point-wise separable convolution networks. A depth-wise separable convolution network is proposed in [6]; a random image augmentation technique with batch normalization is used during training, and the authors use a multilayer perceptron, a plain convolutional neural network, and a MobileNet-based approach for object detection [6]. In this paper, we present a detailed review and propose a method for classification improvement in single-stage object detection methods. The method significantly improves the quality of classification of state-of-the-art methods like YOLO and can also be beneficial for real-time object detection. We have considered the ADAM optimizer, loss function, regularization, class imbalance factor, the role of image size during frame extraction, and depth-wise and point-wise convolution as parameters for tuning. In the literature survey, it is identified that hyper-parameter tuning plays a key role in classification, but very few research works address all these parameters. The paper is organized as follows. Section 2 gives the background of CNN models with localization and classification. Section 3 contains details of the parameter identification required for classification improvement and tuning. The experimentation and results are in Sects. 4 and 5, and Sect. 6 describes the conclusion.

2 Research Methodology Figure 1 shows the process of object detection. The starting steps include extraction of input frames from videos, followed by bounding box prediction and calculation of an objectness score, which gives the confidence about the presence of an object of a class in the frame. Object classification and localization are the final results of object detection in these models. Figure 2 shows the layers of a CNN. The initial layer is the input layer, which provides the input frame shape and has no learnable parameters. After the initial layer, convolutional layers are applied, where the CNN learns and produces weight matrices. The learnable parameters are obtained by multiplying the width 'w', height 'h', previous layer filters 'f', and the number of filters

Fig. 1 Steps involved in classification and localization


Fig. 2 Multiple layer connectivity in CNN [7]

in the current layer 'k', as ((w*h*f) + 1)*k; 1 is added for the bias term of each filter. Max-pooling layers are applied after the convolutional layers. Fully connected layers are then applied to flatten the parameters; this layer gives meaningful parameters, and every neuron is connected with the other neurons. The learnable parameters at this layer are calculated using the formula ((current layer neurons 'c' * previous layer neurons) + 1*c).
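As a quick illustration of the two parameter-count formulas above, the following sketch implements them directly; the layer sizes used in the example calls are hypothetical and not taken from the paper's model.

```python
# Illustrative helpers for the parameter-count formulas described above;
# the example layer sizes are hypothetical.
def conv_params(w: int, h: int, f_prev: int, k: int) -> int:
    """Learnable parameters of a conv layer: ((w * h * f_prev) + 1) * k (the +1 is the per-filter bias)."""
    return ((w * h * f_prev) + 1) * k

def dense_params(prev_neurons: int, c: int) -> int:
    """Learnable parameters of a fully connected layer: (prev_neurons * c) + 1 * c biases."""
    return (prev_neurons * c) + (1 * c)

print(conv_params(3, 3, 32, 64))   # e.g. a 3x3 conv, 32 input and 64 output filters -> 18496
print(dense_params(128, 10))       # e.g. a 128-to-10 dense layer -> 1290
```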

3 Parameters for Classification Improvement 3.1 Related Work In [8], the object detection method was proposed with improvement in object localization and frame identification. An attention map was created to generate features. The paper [9] gives a fusion network to improve R-CNN. The improvement in loss function with better segmentation is proposed. The improvement achieved in segmentation was 2.8%, and efficiency for small object detection based on PASCAL VOC datasets was 4.8%. Reference [10] gave the comparison between SSD and Faster R-CNN and shows the detection accuracies are nearly the same in both methods.


An accurate object detection method using the Local Illumination Background Subtraction Method was proposed in [11]. However, the moving object annotation is a challenge for this method. We performed multi-object detection by extracting objects from video frames. The parameters considered for tuning are discussed in later sections of paper. The depth-wise and point-wise convolution are applied on frames, and results were compared.

3.2 The Following Hyper-tuning Parameters Are Considered to Check the Performance Improvement in Classification Methods

3.2.1 Depth-wise Convolution [12]

Depth-wise and point-wise convolution are two methods used for feature extraction, and we have implemented both to compare their performance (Fig. 3). The calculation required for feature extraction is shown in Fig. 4. A video frame of size 12 × 12 × 3 is selected, and a tensor of size 5 × 5 × 3 is applied on top of it. The first pixel of the frame gets multiplied by the first weight of the tensor; similarly, the other pixel values get multiplied by the corresponding weights, and the results are then added together. As these multiplication operations need more computation time, depth-wise and point-wise convolution are used. In a standard convolution, each 5 × 5 × 3 tensor (75 weights) is applied at every position of the frame, i.e., 64 positions, so one tensor requires 75 × 64 operations; producing 256 output blocks uses 256 tensors, so the total number of multiplications required is 75 × 64 × 256.

Fig. 3 Overview of depth-wise and point-wise convolution


Fig. 4 Feature extraction

Fig. 5 Point-wise convolution

3.2.2 Point-wise Convolution

Figure 5 shows the calculation for point-wise convolution. In the depth-wise step, the 5 × 5 kernel is applied to each of the three channels of the image separately, producing an intermediate 8 × 8 × 3 result; this uses 5 × 5 × 3 = 75 weights moved over 64 positions, i.e., a total of 64 × 75 operations, which is the depth-wise convolution. In the point-wise step, a single 1 × 1 × 3 convolution multiplies the three layers and adds them up into a simple 8 × 8 × 1 tensor, which requires only 3 × 64 multiplications per output channel. In the point-wise calculation, therefore, we do not have to repeat the full 75-weight multiplication 256 times.
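The following short sketch reproduces the multiplication counts discussed above for the 12 × 12 × 3 frame, 5 × 5 kernel, and 256 output channels; the function names are illustrative only.

```python
# Multiplication counts for standard vs. depth-wise separable convolution,
# using the 12x12x3 frame / 5x5 kernel / 256-channel example from the text.
def standard_conv_mults(out_hw: int, k: int, in_ch: int, out_ch: int) -> int:
    """Multiplications for a standard convolution producing an out_hw x out_hw map."""
    return (k * k * in_ch) * (out_hw * out_hw) * out_ch

def separable_conv_mults(out_hw: int, k: int, in_ch: int, out_ch: int) -> int:
    """Depth-wise pass (k*k per channel) followed by a 1x1 point-wise pass."""
    depthwise = (k * k) * in_ch * (out_hw * out_hw)
    pointwise = in_ch * (out_hw * out_hw) * out_ch
    return depthwise + pointwise

print(standard_conv_mults(8, 5, 3, 256))   # 75 * 64 * 256 = 1,228,800
print(separable_conv_mults(8, 5, 3, 256))  # 75 * 64 + 3 * 64 * 256 = 53,952
```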

3.2.3 Batch Normalization

Batch normalization enables faster training of the network. When we train a network, the output of each activation layer shifts due to weight updates. Batch normalization reduces the dependence of gradients on the scale of the parameters or their initial values. The internal covariate shift causes loss of features; hence, we have to fix the distribution of the activations. Fixing this distribution reduces the internal covariate shift, and this process is known as batch normalization.
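For illustration, a batch normalization layer is typically placed after a convolution, as in the hedged Keras sketch below; this is not the authors' exact architecture, and the 640 × 640 input shape simply mirrors the fixed image size mentioned later in this section.

```python
# A hedged illustration (not the paper's exact model) of batch normalization
# placed after a convolutional layer in Keras.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), padding="same", input_shape=(640, 640, 3)),
    layers.BatchNormalization(),   # re-centres/re-scales activations, reducing internal covariate shift
    layers.Activation("relu"),
    layers.MaxPooling2D(),
])
model.summary()
```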

3.2.4 Image Size

The difference between the input images used to train the neural network and real-world data is one of the reasons for poor classification [3]. Real-world data comes in different resolutions and sizes. In the current work, images of a fixed size (640 × 640) are used for the dataset as well as for the real-world testing images. The high-resolution images were annotated using the LabelImg software to generate the testing dataset in the required format.

3.2.5 Imbalanced Classes

The imbalanced classes cause poor detection due to variable size of object. To solve this issue, YOLO (regression) is used which is a one-stage detector.

3.2.6 Accuracy Metrics

The accuracy metrics are calculated based on precision, recall, and F1-score. The formulas required for their calculation are given below.

\text{Precision} = \frac{\text{True Positives}}{\text{Predictions}}    (2)

Recall gives the ratio of correctly predicted positive observations to the ground truth.

\text{Recall} = \frac{\text{True Positives}}{\text{Ground Truths}}    (3)

F1-score is the weighted average of precision and recall [13].

\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}    (4)
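A direct implementation of Eqs. (2)–(4) is sketched below; the counts passed in the example call are placeholders, not values reported in the paper.

```python
# Direct implementation of Eqs. (2)-(4); the example counts are placeholders.
def precision(tp: int, predictions: int) -> float:
    return tp / predictions

def recall(tp: int, ground_truths: int) -> float:
    return tp / ground_truths

def f1_score(p: float, r: float) -> float:
    return (2 * p * r) / (p + r)

p, r = precision(90, 100), recall(90, 120)
print(p, r, f1_score(p, r))  # 0.9, 0.75, ~0.818
```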

3.2.7 Loss

A loss function measures how well a network classifies the input data in a dataset [14]. A smaller loss value represents a better classifier for modeling the input data and the output targets. The multi-task loss is

L\left(p, u, t^{u}, v\right) = \overbrace{L_{c}(p, u)}^{\text{classification}} + \lambda\,[u \geq 1]\,\overbrace{L_{l}(t^{u}, v)}^{\text{localization}}    (5)


• Classification = log loss for the true object class u.
• Localization = L1 loss for the coordinates of the rectangle.
• When no object is present (u = 0), the localization penalty has no impact.
• Parameter λ may be adjusted as per the requirement of classification or localization [15].

3.2.8 Regularization

Regularization is implemented during training and not during validation/testing [16]. If the regularization loss were added during validation/testing, it would give similar loss values and curves. Training loss is calculated and continually monitored during each epoch, while the validation metrics are computed over the validation set only once, after the completion of the current training epoch. Regularization reduces the risk of overfitting and also favors simpler models. The regularization methods are L1 (LASSO), L2 (ridge), and dropout [17]. As ridge regression is the more popular method, includes labeling with −1 and +1, and is suitable for multiclass, multi-label regression, we have considered L2 for experimentation; the corresponding formula is given below:

\text{Loss} = \text{Loss} + \text{Weight penalty} = \sum_{i} (Y_i' - Y_i)^2 + \alpha \sum_{i} W_i^2    (6)

where Y_i is the actual value, Y_i' is the predicted value, W_i are the weights, and α is the regularization factor.
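In practice, an L2 (ridge) weight penalty of this form can be attached to a layer as in the minimal Keras sketch below; the penalty factor 0.01 is illustrative and is not a value reported in the paper.

```python
# Minimal sketch of adding an L2 (ridge) weight penalty to a layer in Keras;
# the 0.01 factor is an illustrative assumption.
from tensorflow.keras import layers, regularizers

conv = layers.Conv2D(
    64, (3, 3),
    kernel_regularizer=regularizers.l2(0.01),  # adds alpha * sum(W_i^2) to the training loss only
)
```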

3.2.9 Optimizer

Two optimizers, namely Stochastic Gradient Descent (SGD) and ADAM, are used for experimentation purposes. SGD considers a small, randomly selected subset of the dataset; at a low learning rate, SGD gives performance similar to regular gradient descent [5]. Stochastic objective function optimization is done using ADAM, whose performance is based on a combination of the root mean square and adaptive gradient algorithms [18].
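The two optimizers can be instantiated and swapped in Keras as in the hedged sketch below; the momentum value is an illustrative assumption, while the 0.001 learning rate matches the initial learning rate reported in the experimentation section.

```python
# Hedged example of configuring the two optimizers compared in this work;
# the momentum value is illustrative, and the model itself is assumed to be
# defined elsewhere.
from tensorflow.keras import optimizers

sgd = optimizers.SGD(learning_rate=0.001, momentum=0.9)
adam = optimizers.Adam(learning_rate=0.001)

# model.compile(optimizer=adam, loss="categorical_crossentropy", metrics=["accuracy"])
```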

4 Experimentation In this section, we compare the results obtained after various hyper-parameter tuning steps with earlier proposed state-of-the-art models. The experiments were conducted on the MSCOCO dataset and a live video dataset. A GPU-enabled machine was configured with Python packages such as scikit-image, Keras,


Fig. 6 Result obtained with live video

Cython, scipy, and opencv-python. The initial learning rate used is 0.001, and the training dataset contains 3000 images. The confusion matrix shows correct identification for some classes and incorrect identification for others; in particular, it shows confusion between the two classes books and medicine (Fig. 6). The overall accuracy is calculated by dividing the number of correct predictions by the total number of predictions, and it is 0.9241 (Table 2). Since the proposed work focuses on multiclass classification, individual class accuracies are also required to find out which classes are not classified properly. To achieve this, the one-versus-all method is used for multiclass classification. The individual class accuracy achieved ranges from 0.54 to 0.93, and the average detection accuracy of the model is 0.9241. It has been observed that if the learning rate is higher, the loss grows after some iterations. The argmax() function is used to obtain the class with the highest probability value. Further, the COCO dataset is given as input to the model, and the accuracy obtained is 93%. For COCO, in the same model, the num_classes value in the configuration file is set to 80, and the batch size is 128 with a learning rate of 0.001.

5 Results The results of the model are given below. Screenshots from live video show the detection accuracy for single and multiple objects, as seen in Fig. 6. For a single object, the detection accuracy reaches up to 98%, and the model is also able to detect overlapping objects. Section 1 of Fig. 6 shows person detection accuracy of up to 97%; in Sect. 2, the objects overlap, and the detection accuracy for the object which is behind


Table 1 Result comparison of proposed model with various versions of YOLO

Sr. No. | Model version | Anchor boxes | Activation function | mAP | IoU (%) | IoU threshold (%)
1 | Model on custom dataset | No | Mish | 0.5136 | 41.47 | 50
2 | YOLOV2 | Yes | Leaky relu | 0.4489 | 41.71 | 50
3 | Tiny YOLO | Yes | Leaky relu | 0.5154 | 39.75 | 50
4 | YOLO V3 | Yes | Mish | 0.4666 | 41.86 | 50

Table 2 Training and validation accuracy received with different layers (max pooling, flatten, dense)

Sr. No. | Epochs | Train accuracy | Validation accuracy
1 | 2 | 0.9241 | 1.000 (with 1 max pooling)
2 | 10 | 0.6667 | 0.6632 (with 1 max pooling and 1 flatten)
3 | 20 | 0.8843 | 0.9112 (with 2 max pooling and 1 flatten)
4 | 30 | 0.97 to 1.00 | 0.9 (with 2 max pooling, 1 flatten, 1 dense)

(class medicine) is 73%. Section 3 shows multiple-object detection accuracy in non-overlapping poses, where the detection accuracy reaches up to 100%. In Sect. 4, the single-object detection accuracy for the class plant is 98% (Tables 1 and 2).

5.1 Result on Live Video The results were obtained on challenging videos where the frames are dark and not stable. In Fig. 7, the object detection accuracy in Sect. 1 is 99%, while in dark videos the object detection accuracy ranges from 72 to 90%. After fine-tuning of parameters, the number of trainable parameters was 54,535. A smaller number of trainable parameters leads to faster training of the network and reduces the overhead of learning non-required parameters; hence, the overall system performance improved. GlobalAveragePooling2D is used to generate feature maps.

6 Conclusion We have performed hyper-parameter tuning by considering variations in the learning rate, type of convolution, image size for anchor box selection, normalization, and loss functions. From the results in Table 3, it is observed that the mean average precision achieved by our model is better than that of R-CNN, Fast R-CNN, Faster R-CNN, YOLO2, and YOLO3. The mAP achieved is 72.5, which is less than the Mask R-CNN mAP of 75.


Fig. 7 Result on dark dataset

We have implemented our model with reference to YOLO3 base models; hence, the mAP is less than that of Mask R-CNN. Mask R-CNN works on EfficientNet, whereas YOLO is implemented on Darknet. Advanced versions of YOLO such as YOLO V4 and YOLO V5 may outperform Mask R-CNN, as YOLO is one of the fastest models for live object detection; this is a part of further research. It is confirmed that the training and validation accuracy are highly correlated with the hyper-tuning of parameters. The results are compared with state-of-the-art models, and we achieved real-time object detection accuracy of up to 98%, which is very competitive with Mask R-CNN.

Table 3 mAP comparison of proposed model with state-of-the-art models

Framework | Proposal | Optimizer | Loss function | Platform | mAP | Learning rate
R-CNN [1] | Selective search | SGD | Bounding box regression | Caffe (AlexNet) | 58.5 | 0.001
Fast R-CNN [2] | Selective search | SGD | Bounding box regression | Caffe | 63.2 | 0.001
Faster R-CNN [3] | RPN | SGD | Bounding box regression | Caffe | 69.4 | 0.0001 to 0.001
YOLO [19] | Confidence score (Grid) | SGD | Bounding box regression + sum squared loss + box confidence | Darknet | 70.2 | 0.001
Mask R-CNN | Extension of Faster R-CNN | ADAM | Bounding box regression | EfficientNet | 75 | 0.001
Proposed | Confidence score (Grid) | ADAM | Bounding box regression + sum squared loss + box confidence + ridge loss | Darknet | 72.5 | 0.001

References 1. Girshick R et al (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: The IEEE conference on computer vision and pattern recognition (CVPR), pp 580–587 2. Girshick R (2015) Fast R-CNN, IEEE international conference on computer vision (ICCV), pp 1440–1448 3. Ren et al S (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings NIPS, pp 91–99 4. Jalled F, Voronkov I (Nov 2016) Object detection using image processing. arXiv:1611.077 91v1 5. Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. Adv Neural Inf Proc Syst 379–387 6. Stephen O, Jang YJ, Yun TS, Sain M (2019) Depth-wise based convolutional neural network for street imagery digit number classification. In: 2019 IEEE international conference on computational science and engineering. https://doi.org/10.1109/CSE/EUC.2019.00034 7. He W, Huang Z, Wei Z, Li C, Guo B (2019) TF-YOLO: an improved incremental network for real-time object detection. Appl Sci 9:3225 8. Lou Y, Fu G, Jiang Z, Men A, Zhou Y (2017) Improve object detection via a multi-feature and multi-task CNN model. IEEE 9. Qian H, Xu J, Zhou J (2018) Object detection using deep convolutional neural networks. IEEE 10. Roach M, Mason J (2002) Recent trends in video analysis: a taxonomy of video classification problems. IMSA 11. Cholle F (April 2017) Xception: deep learning with depthwise separable convolutions. arXiv: 1610.02357 12. Cao D, Chen Z, Gao L (2020) An improved object detection algorithm based on multi-scaled and deformable convolutional neural networks. Hum Cent Comput Inf Sci 10:14 13. Antonopoulos N, Anjum A, Abdullah T (2015) Video stream analysis: an object detection and classification framework for high performance video analytics. IEEE transaction 14. Liu W, Anguelov D, Erhan D et al (2016) SSD: single shot multibox detector. In: European conference on computer vision, pp 21–37 15. Qu L, Wang S, Yang N, Chen L, Liu L (2017) Improving object detection accuracy with region and regression based deep CNNs. IEEE 16. Tang C, Feng Y et al (2017) The object detection based on deep learning. In: 4th International conference on information science and control engineering. IEEE 17. Redmon J, Farhadi A (2017) YOLOv3: an incremental improvement. Computer vision and pattern recognition, arXiv:1804.02767 18. Makandar A, Jevoor M (2015) Preprocessing step-review of key frame extraction techniques for object detection in video. Int J Curr Eng Technol INPRESSCO 19. Redmon J, Divvala S, Girshick R et al (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788


20. Stephen O, Jang YJ, Yun TS, Sain M (2019) Depth-wise based convolutional neural network for street imagery digit number classification. In: 2019 IEEE international conference on computational science and engineering (CSE) and IEEE international conference on embedded and ubiquitous computing (EUC), pp 133–137. https://doi.org/10.1109/CSE/EUC.2019.00034

An Unsupervised Image Processing Approach for Weld Quality Inspection Shashwat Shahi, Gargi Kulkarni, Sumukh Sirmokadam, and Shailesh Deshpande

Abstract Quality classification of artifacts such as weld joints depends on various factors which may not be equally important in determining the quality. Previously well received methods for this task are the use of deep neural networks, fuzzy inference systems and a combination of both these methods (Adaptive Neuro Fuzzy Inference System (ANFIS)). Although promising, these methods have presented notable roadblocks which, if addressed, can help make significant progress in this field of automated industrial quality check process. This work endeavors to infuse domain knowledge into the working of a regular ANFIS (DKI-ANFIS) and the modeling of a customized membership function to improve the computational capacity. As an essential contribution, this approach allows flexible modification of a DKI-ANFIS which can be a crucial means of improving automated quality inspection and also the saving of precious industrial resources as well as time as a consequence. Keywords Computer vision · Weld quality inspection · Geometrical features · Domain driven quality inspection technique · Domain knowledge infused-adaptive neuro fuzzy inference system (DKI-ANFIS)

S. Shahi (B) · G. Kulkarni · S. Sirmokadam · S. Deshpande Tata Consultancy Services, Mumbai, India e-mail: [email protected] G. Kulkarni e-mail: [email protected] S. Sirmokadam e-mail: [email protected] S. Deshpande e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_39


1 Introduction The process that allows the joining of materials, like metals by using heat at high temperatures is termed welding. After cooling, the base metal and the filler metal get attached. There are many methods through which the welding process can be performed. These can be broadly classified into four categories as: 1. Manual welding: A human welder is involved in each step of the welding process. Since this type of welding requires all the work to be done manually, it can be physically challenging. 2. Semi-automatic welding: Automated equipment handles most of the work with human intervention required only at selective steps. 3. Machine or mechanized welding: This requires an operator whose manipulation or adjustment is required at certain stages of the welding process. 4. Automated or robotic welding: In this type of welding very less to no manual or human intervention is needed and the entire process is carried out by automated or robotic equipment. Welding process plays a vital role in the manufacturing industry and forms its backbone. As per a Mckinsey report [1] in 2012, the manufacturing industry generates 16% of the global GDP and 14% of the employment. With such diverse applications, it is essential for manufacturers to maintain the quality of the welding joints and proper quality inspection methods are required for the same. The process of inspecting the quality of welding joints, in terms of, but not limited to strength, size etc. is termed as welding quality inspection. This process helps identify potential problems early on before they cause any damage and helps ensure compliance with technical manufacturing standards. Inspection of welding joint quality can be classified into two types namely Destructive and Non-destructive testing. 1. Destructive testing: A few samples from each batch of the welding joints produced are selected and are examined for their quality using destructive testing mechanisms. The main drawback of this method is that the probability of getting defective pieces is higher as the samples for testing are chosen randomly. The cost associated with this type of testing mechanism is significantly higher because if the sample chosen is defective, the whole batch of the welding joints are scrapped. 2. Non-destructive testing: This testing can be further classified into three subcategories. 2.1 Non-destructive manual testing: Welding joints are examined for probable defects, if any. In this type of testing, only the welding joints with visible defects through human eyes are discarded. 2.2 Semi-automated non-destructive testing: Quality of the welding joint is monitored by continuously evaluating the overall structure of the welding bead through cameras and monitors. This method is tedious and prone to human error as significant human intervention is required.


2.3 Complete automated quality inspection through well-trained machine learning or deep learning models. The determination of weld joint quality has been a subject of interest from the perspectives of both quality and longevity of a product. Several methods to automate this process of the weld quality inspection have been proposed anterior to the one proposed in this paper [2–4]. While these techniques have the ability to determine the weld joint quality, they rely upon conventional deep learning methods to achieve the objective. A deep neural network can be trained well to detect the defects, provided there is an availability of adequate data set for the network to learn from. A paucity of balanced training datasets, especially in case of an artifact as diversified as weld joints is a hindrance in the usage of deep learning for this problem statement. Fuzzy logic methods on the other hand are explanatory and use if-else rules to establish a relationship between the input and output variables [5]. From this understanding, a hybrid concept called “Adaptive Neuro Fuzzy Inference System” or “ANFIS” was developed to amalgamate the features of both neural network and fuzzy inference system. ANFIS allows the furnishing of fuzzy logic into a neural network. While ANFIS gives way to the creation of a system that is capable of dealing with scarcity of training data, a significant ramification is the increase in required computation power. In this paper, we propose a technique that introduces domain knowledge–infusedANFIS along with a method to objectively determine the weld joint quality based on the geometrical properties of the welding beads. We address the lack of data for training a deep learning network and put forward a process to optimize the required computation power.

2 Literature Survey Previously proposed work in this area suggests extraction of geometrical features such as area, major axis, minor axis and solidity. Hassan et al. [2] make use of radiographic images to extract these features after preprocessing operations such as high boost filtering to enhance edges present in the images, thereby making it simpler to extract edges. After contrast enhancement, Sauvola thresholding method is used to “deduct whether defect present is of higher value than background (weld seam) or holds lower gray level values in comparison with gray level of weld seam”. This preprocessing is followed by morphological operations to derive aforementioned geometrical features. For defect classification, a feedforward ANN is used and a certain threshold is set for all geometrical feature values to classify the status of input as defective. The combined accuracy, i.e., defect detection and classification of this method gives an accuracy of 86%. However, time consumed is 91 s and there are restrictions on the type of input images used.


In the technique put forward by Luciane et al. [3], a computer visual system attached to the linear welding robot is used to address the non-availability of radiographic images with passive-lighting (using green LED lights). The placing of cameras is perpendicular to the weld seam, which provides a top view of the interest area. A system to detect edges is devised using a combination of the techniques Principal Component Analysis (PCA) to assist data reduction, Gaussian mixture as a probabilistic method and Hidden Markov Model (HMM) as a statistical model. The training stage includes the image capture of different welding processes performed by the Bug-O Matic Weaver robot with acceptable quality and discontinuities. These images are used as ground truth and its edges are manually marked. On the training data, PCA is used to eliminate possible overlaps and reduce data. These PCA space edge profiles are used to train a Gaussian mixture whose parameters are learned by a Maximum Likelihood Algorithm to deal with missing data. In the testing process, for each given image, a region of interest (ROI) is determined considering 250 pixels for each side of the empty groove position. It assists to calculate the probability of a pixel being an edge or not. By means of Viterbi algorithm, edge position sequence estimation is performed. Finally, to get the weld quality, features such as minimum maximum and average distance between the weld bead edge and the empty groove line for both sides of the weld bead are computed. The proposed method successfully identified the weld bead edges with maximum errors of 10.96 pixels and overcame the shortcomings of a deep learning method such as need to train and requirement of specific input type. But in order to give acceptable results, there still remain dependencies such as the requirement of green light—illuminated images and manual annotation of edges on input (training) data. In another such study by Zahran et al. [6], radiographic images are used for weld defect identification from radiographic images. However, this approach is based on the generation of a database of defect features using Mel-Frequency Cepstral Coefficients (MFCCs) and polynomial coefficients extracted from the Power Density Spectra (PDSs) of the weld segmented areas after performing preprocessing and segmentation with Artificial Neural Networks (ANNs) for the feature matching process in order to automatically identify defects in radiographic images. Although the proposed methodology gives good results in terms of accuracy, the use of ANNs adds to the dependencies such as computation power and balanced training datasets. In their study, Sassi et al. [7] introduce an intelligent system able to perform quality control assessment in an industrial production line. Deep learning techniques have been employed and proved successful in a real application for the inspection of welding defects on an assembly line of fuel injectors. Starting from state-of-the-art deep architectures and using the transfer learning technique, it has been possible to train a network with about 7 million parameters using a reduced number of injectors images, obtaining an accuracy of 97.22%. The system has also been configured in order to exploit new data collected during operation, to extend the existing data set and to improve its performance further. 
The developed system showed that deep neural networks can successfully perform quality inspection tasks which are usually demanded of humans although the dependency on humans still remains in order to


correctly annotate the training images given to the application in initial runs should it be configured to be used in further use cases having different types of weld joints. In [8], Hou et al. propose an automatic detection schema including three stages for weld defects in x-ray images. Firstly, the preprocessing procedure for the image is implemented to locate the weld region; then a classification model which is trained and tested by the patches cropped from x-ray images is constructed based on deep neural network. Finally, the sliding-window approach is utilized to detect the whole images based on the trained model. The results demonstrate that the classification model we proposed is effective in the detection of welded joints quality. The proposed model is able to obtain a maximum classification accuracy rate of 91.84% (with 90.27% precision and 92.78% recall) however the dependency on availability of required format of input, i.e., x-ray images pose a difficulty. Sang et al. [9] propose a method to automatically detect welding defects from radiographic images using Faster R-CNN which is a deep learning method. Data augmentation is used to address lack of training data. In order to appropriately extract the features of the radiographic testing image, two internal feature extractors of Faster R-CNN were selected, compared, and performance evaluation was performed. Bacioiu et al. [10] propose a system to monitor welding process in Tungsten Inert Gas (TIG) welding process by using an HDR camera to capture weld pool and surrounding area images and passed these images as input to different neural network structures for comparative analysis. The purpose of this analysis and proposed system was to overcome the difficulties that come with the quality classification of an artifact as diverse as welding. The camera manages to balance the light emitted by the welding process and filters out the useful features of the weld pool images for the classification as good or bad. The images thus captured were passed as inputs to fully connected network (FCN) and convolutional neural network (CNN). After a comprehensive analysis of these two neural networks, it was concluded that while CNN was able to learn the identification of one defect against another with an accuracy of 93.4%, FCN is more resilient to input variations and was able to classify good vs. defective welds with an accuracy of 89.5%. The authors conclude that convolutional neural networks are indeed capable of contributing in automated industrial quality classification which will lead to improved consistency and may be used to make the classification process a sustainable activity in terms of time as well as human efforts required.

3 Methodology/Proposed Approach 3.1 Input Files The expected input format for this solution is any image with a top view of a single welding joint. The input image can be normal RGB image, captured from any commodity camera. It does not require any special lighting condition. Infact, the solution is designed in such a way that it can detect the welding joints even in low


Fig. 1 Sample input images

lighting conditions so long as the weld joint is clearly discernible to the human eye. The main goal is to extract the welding joint from the input image and discard the other parts of the image. The solution supports multiple file formats, viz., jpeg, .png, etc. (Fig. 1).

3.2 Solution Architecture Figure 2a and b represent the architecture and input image preprocessing pipeline of the solution.

3.2.1 Preprocessing

The input image is preprocessed to highlight the region of the image containing the weld joint and ignore other regions of the input image. First, the RGB input image is converted to grayscale. Next, denoising is performed on the input image using Gaussian blurring to remove noise from the input image. After denoising, adaptive thresholding is performed on the denoised image for edge differentiation. Following


Fig. 2 a Proposed solution workflow. b Input image preprocessing pipeline

this, controlled masking of the input image is done along with separating region of interest to remove probable false edges. Lastly, controlled morphological operations are performed on the processed input image to form a closed outline of the welding joint on the masked image.
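A minimal OpenCV sketch of this preprocessing pipeline (grayscale, Gaussian denoising, adaptive thresholding, and a morphological closing to obtain a closed weld outline) is given below; the kernel sizes and threshold parameters are assumptions for illustration and are not the tuned values used by the authors.

```python
# Sketch of the described preprocessing pipeline; parameter values are illustrative.
import cv2
import numpy as np

def preprocess(path: str) -> np.ndarray:
    img = cv2.imread(path)                                  # load the input image (BGR order in OpenCV)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)            # convert to grayscale
    denoised = cv2.GaussianBlur(gray, (5, 5), 0)            # Gaussian blurring to remove noise
    thresh = cv2.adaptiveThreshold(
        denoised, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY_INV, 11, 2,
    )                                                       # adaptive thresholding for edge differentiation
    kernel = np.ones((5, 5), np.uint8)
    closed = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)  # close the weld-joint outline
    return closed
```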

3.2.2 Segmentation of the Welding Joint

Following the preprocessing of the input, a weld joint segmentation algorithm is implemented. This algorithm was designed to segment the weld joint from the input images. Canny edge detection algorithm is used to detect edges from preprocessed images following which all the contours detected are extracted from the image. Of all the extracted contours, the largest contour is extracted in the form of fractals. Then a closed polygon is formed from the extracted fractals.
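The segmentation step described above can be sketched in OpenCV as follows; the Canny thresholds and polygon-approximation tolerance are assumptions for illustration, and the function assumes the OpenCV 4.x findContours return signature.

```python
# Sketch of the segmentation step: Canny edges, contour extraction, selection of
# the largest contour, and a closed polygon approximation. Thresholds are illustrative.
import cv2

def segment_weld(preprocessed):
    edges = cv2.Canny(preprocessed, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)              # keep the largest detected contour
    epsilon = 0.01 * cv2.arcLength(largest, True)
    polygon = cv2.approxPolyDP(largest, epsilon, True)        # closed polygon from the contour fractals
    return polygon
```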

3.2.3 Extracting Geometrical Parameters from the Segmented Welding Joint

From the pixel coordinates of the detected fractals of the weld joint, some useful geometrical parameters are extracted as follows:


1. Average Width of the Weld Joint: the width of the detected fractal, calculated as the average of the varying widths of the beads of the detected weld joint.
2. Area of the Weld Joint: the area enclosed by the detected fractal, in pixels squared.
3. Perimeter of the Weld Joint: the perimeter of the detected fractal, in pixels.
4. Average Eccentricity of the Weld Joint: eccentricity here is the deviation of the weld from the central line; the widths of three beads of the detected weld joint are calculated, averaged, and returned as the final value of eccentricity.
5. Length of the Weld Joint: the length of the detected fractal parallel to the y-axis, in pixels.
6. Area:Perimeter Ratio: computed as the ratio of the area and perimeter of the detected fractal.

A quality index is generated based on the above extracted geometrical features for the input images, on which a domain knowledge infused-ANFIS is trained. The class labels for each of the entries in the quality index are encoded into 0 or 1, which indicates whether the detected weld is a bad weld or a good weld, respectively. For each of the geometrical features in the quality index, the minimum and maximum threshold values were computed.
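For illustration, a few of the listed parameters can be computed from the segmented polygon as sketched below; treating the bounding-rectangle width and height as width and length is a simplification of the bead-wise averaging described in the text.

```python
# Illustrative computation of some of the geometrical parameters (pixel units);
# the bounding-rectangle width/height is a simplification of the bead-wise method.
import cv2

def geometric_features(polygon) -> dict:
    area = cv2.contourArea(polygon)                 # enclosed area in pixels^2
    perimeter = cv2.arcLength(polygon, True)        # perimeter in pixels
    x, y, w, h = cv2.boundingRect(polygon)          # w ~ average width, h ~ length along the y-axis
    return {
        "area": area,
        "perimeter": perimeter,
        "width": w,
        "length": h,
        "area_perimeter_ratio": area / perimeter if perimeter else 0.0,
    }
```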

3.2.4 Configuration of Priority of the Geometrical Parameters

By default, equal weightage is given to each of the geometrical parameters. If the end user or domain SME believes that some of the geometrical parameters should be given more weightage than others, the weights can be configured once for each weld joint type. The only condition for configuring the weights is that each weight associated with a geometrical parameter should be between 0 and 1 and that the sum of all the weights should be equal to 1.

3.2.5 Domain Knowledge Infused-Adaptive Neuro Fuzzy Inference System (DKI-ANFIS)

As per Lotfi Zadeh [11], in almost all cases the boundaries of human perception are not sharp. This brings fuzzy systems into the picture: fuzzy logic systems have excellent knowledge representation capabilities and can represent unsharp boundaries. These fuzzy systems, when combined with the learning capability of a neural network, result in a neuro-fuzzy system.


Fig. 3 Domain knowledge infused-ANFIS

In the proposed approach, the layers of an ordinary ANFIS are modified and infused with domain knowledge to give better results than other ANFIS systems even if there is a class imbalance in the data, the data is skewed, or only a small corpus of data is available. Additionally, the membership functions of a standard ANFIS have been redesigned to give better results for the proposed solution. In DKI-ANFIS, along with the standard layers of an ordinary ANFIS, an additional layer is added to infuse domain knowledge. The layers of DKI-ANFIS can be described as follows (Fig. 3 and Table 1).
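To make the layer flow summarized in Table 1 concrete, the following highly simplified NumPy sketch runs one forward pass of such a system, assuming Gaussian membership functions, one rule per feature, and a first-order Sugeno-style consequent; the domain weights and all parameter values are illustrative assumptions and do not represent the authors' trained system.

```python
# Highly simplified sketch of the DKI-ANFIS layer flow; all values are illustrative.
import numpy as np

def dki_anfis_forward(x, centers, sigmas, domain_weights, consequents):
    # Layer 1: fuzzification of each crisp geometrical feature (one rule per feature here)
    mu = np.exp(-((x - centers) ** 2) / (2 * sigmas ** 2))
    # Layer 2: infuse domain knowledge by scaling each rule with its normalized weight
    influenced = mu * (domain_weights / domain_weights.sum())
    # Layer 3: firing strength via a product operation (trivial with one membership per rule)
    firing = influenced
    # Layer 4: normalized firing strengths
    norm_firing = firing / firing.sum()
    # Layer 5: linear rule consequents weighted by the normalized strengths
    rule_outputs = norm_firing * (consequents @ np.append(x, 1.0))
    # Layer 6: defuzzification by summing the outputs of all rules
    return rule_outputs.sum()

x = np.array([0.4, 0.7, 0.2])                    # three example feature values
out = dki_anfis_forward(
    x,
    centers=np.array([0.5, 0.5, 0.5]),
    sigmas=np.array([0.2, 0.2, 0.2]),
    domain_weights=np.array([0.5, 0.3, 0.2]),
    consequents=np.random.rand(3, 4),            # one linear consequent (3 inputs + bias) per rule
)
print(out)
```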

4 Results and Conclusion The solution was capable of handling end-to-end flow for welding joint segmentation and assessing the overall quality of the welding joint and classifying the joint as good or bad. It can also be highly customized based on the domain knowledge and type of welding joints as the measure of quality differs with each type of welding. An explicit well-defined preprocessing pipeline as well as segmentation pipeline was deployed for segmenting the welding joint from the input image. The facets of the solution were designed keeping in mind the variations that may occur in the input image, since welding can be considered as an artifact where the variations may occur because of numerous factors such as the intensity of the torch, the angle of the torch. The solution was tested on 40 different images of different welding joint types. The welding joint segmentation algorithm was able to successfully segment the welding joint from almost all the images with an accuracy of 95%. The segmentation algorithm gave poor performance in the cases where the welding joint was not easily


Table 1 Domain knowledge infused-ANFIS layers description

S. No. | Layer | Description
1 | Layer 1 | The crisp values of every geometrical feature are passed onto the membership functions, which generate the rules for each of the geometrical parameters. This layer contains trainable parameters
2 | Layer 2 | This layer is frozen and normalizes the weightage given to each parameter as per the domain knowledge. Additionally, in this layer, a domain influence is added to the generated rules based on the domain knowledge
3 | Layer 3 | Every node in this layer is a fixed, labeled node. The output is the product of all the incoming signals. In this layer, the firing strength of the domain-influenced rule is computed via a product operation
4 | Layer 4 | In this layer, the normalized firing strength of the rule is computed
5 | Layer 5 | The parameters of the equation producing the score are tuned by the learning algorithm of the neural network. Every node i in this layer is an adaptive node with a node function. Here, for each parameter, Wi is the normalized strength of the rule
6 | Layer 6 | The defuzzification of the consequent parts of the rules is performed by summing the outputs of all the rules, and a final output is generated. A single node in this layer is a fixed node labeled "sum", which computes the overall output as the summation of all incoming signals

distinguishable for the human eye due to deviation from standard lighting condition. Once segmented, the extraction of geometrical parameters module was able to extract the geometrical parameters of the welding joint as per the expectations in all of the sample images. The following are some results of the segmentation algorithm for different weld joint types: Figures 4a and 5a illustrate sample input images for a curved and straight weld, respectively. Figures 4b and 5b show the output of the welding joint segmentation algorithm (in pink and blue highlighted color, respectively). As shown, the algorithm is able to detect the welding joint precisely. As the solution is still under development phase, a comparative study of the proposed solution with the existing solutions cannot be organized. However, as compared to the existing solutions the advantage of the proposed solution would be that it is completely based on open-source tech stack which can be easily configured along the type of the welding, as per the domain knowledge and/or the requirements of the end user.

Fig. 4 a Input weld image (curved weld). b Segmented welding joint (curved weld)

Fig. 5 a Input weld image (straight weld). b Segmented welding joint (straight weld)





YOLO Algorithms for Real-Time Fire Detection Ashish Ranjan, Sunita Dhavale, and Suresh Kumar

Abstract Early detection of violent material such as a Molotov cocktail and firelight carried by individuals in a mob, or the presence of any fire in a crowd, may aid security personnel in taking further necessary actions in managing the group. Many recent automated fire detection techniques have been proposed using convolutional neural network (CNN) techniques. However, most of these methods suffer from a high rate of false alarms, slow detection, extraction of hand-crafted features, poor localization of the fire region, and low accuracy. In this research work, we propose a you only look once (YOLO)-based fire detection technique for fast, real-time classification and positioning of fire objects in crowd images. An extensive set of annotated fire images is required to train data-intensive CNN architectures robustly. Hence, a method for automatically annotating fire images based on color and HSV channel characteristics using morphological image processing operations is proposed, along with data augmentation techniques. A customized fire image dataset, “DIAT-FireDS”, is created using the Web scraping technique and the proposed annotation technique. The generated customized dataset is used for fine-tuning YOLO architectures. Experiments are conducted using a series of YOLO architectures against the standard and presented datasets to achieve real-time detection accuracy of about 0.74 mAP. Keywords Convolutional neural network · Deep learning · Fire detection · Real-time object detection · YOLOv4

1 Introduction
According to DIPR research [1, 2], having or presenting violent material such as Firelight and Molotov cocktail at a site might increase people's hostility and violent behavior in illegal crowd gatherings/protests.
A. Ranjan · S. Dhavale (B) Defence Institute of Advanced Technology (DIAT), Pune, India e-mail: [email protected]
S. Kumar Defence Institute of Psychological Research (DIPR), Delhi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_40
Early discovery of these items, or of




the individual carrying them, or the presence of any sort of fire may aid security personnel in taking further necessary actions to manage such crowd gatherings. Also, fire spreads quickly and is challenging to put out quickly, especially if such protests happen in densely populated locations such as residential areas, crowded areas, and woods, where combustibles are densely packed. It is critical to identify fires quickly and accurately to avert large-scale tragedies. Traditional contact sensor-based systems such as smoke/temperature/air quality sensors are less costly and simpler to deploy for small spaces. However, they are not helpful when a fire breaks out suddenly in a large crowded area, since they need to be directly triggered by flame temperature and smoke. We can utilize vision-based fire detection solutions in such scenarios to provide a real-time response, extensive coverage, and environmental robustness [3]. Although the human visual framework has been effective for monitoring, humans are slow, expensive, and corruptible, putting people on the ground at risk over time. With the development of technology, it has become feasible to physically monitor conditions and detect/locate fire using cameras and CCTV live frames [4]. The sheer volume of data generated, though, has made it challenging for security officers to thoroughly review each frame. Therefore, automated fire detection approaches are required [5].
Traditional methods use color, HSV, motion, and shape features of fire to detect fire images [5, 6]. Varied types of combustibles, different flame colors, changing lighting conditions, and various forms of flame owing to airflow may all affect these characteristics [3]. These methods suffer from many false alarms, slow detection, extraction of hand-crafted features, poor localization of the fire region, and low accuracy [5]. DL networks are becoming popular for solving many complex computer vision applications. The availability of CPUs and GPUs along with huge datasets has helped researchers to train DL models. Many object detection techniques based on both two-stage models, such as regions with CNN features (R-CNN) [7], fast R-CNN [8], faster R-CNN [9], and mask R-CNN [7], and single-stage models, such as the single shot multibox detector (SSD) [10] and you only look once (YOLO) [11–13], have proved themselves capable of providing real-time performance [4].
Muhammad et al. [5] proposed a MobileNet-inspired efficient CNN architecture with smaller convolution kernels that eliminates dense/fully connected layers for fire detection. Li et al. [3] presented a lightweight multiscale spatial domain feature extraction-based CNN model with a channel attention mechanism that can improve the discriminative capacity for fire-like objects by capturing deeper spatial features. However, this approach is also not capable of localizing fire regions in the given image. Daniel et al. [14] presented the best of both worlds fire detection (BoWFire) method, which uses image classification based on color (YCbCr space) and texture (LBP, local binary pattern) features; the pixel-color classification is carried out by a Naive Bayes classifier. Hikmat et al. [15] proposed a lightweight CNN architecture with fewer layers for fire detection. Arpit et al. [16] proposed a lightweight CNN model, named FireNet, that can be deployed on Raspberry Pi-like embedded platforms. The same authors provided a custom-compiled fire dataset. These works classify an image into the “fire” and “non-fire”



categories and do not localize fire regions in given images. Localizing fire regions in live footage is an essential aid for security personnel to make critical decisions to control a violent crowd. Ahmad et al. [4, 17] developed a system for tracking students during virtual examinations that uses YOLOv2 and YOLOv3 to detect cell phones, computers, iPads, and notebooks. Thoudoju et al. [18] explored YOLOv3 for object recognition in aerial and satellite photos. Kumar et al. [19] presented an algorithm to detect various vehicle classes and people using YOLOv3 and YOLOv4. To determine the likelihood of domestic violence, Jose et al. [20] proposed a YOLO-based algorithm for weapon detection in suspicious regions. Substantial challenges arise while creating a customized fire detection algorithm, including the following: (1) a dearth of annotated fire datasets in the open domain; and (2) effectively locating and classifying fire objects in real time. In this work, to tackle the problem of the limited availability of annotated fire images, we used the annotated FireNet dataset [21] to study RGB and HSV domain feature characteristics and develop a preliminary method for automatic annotation of fire images. Further, a customized fire image dataset, “DIAT-FireDS”, is created using the web scraping technique and annotated using the proposed annotation technique. The generated customized dataset is used for fine-tuning YOLO architectures. We selected various versions of YOLO, i.e., one-stage detectors, to obtain real-time performance. We explored YOLO architectures for both indoor and outdoor CCTV surveillance systems for early fire detection. After extensive experimentation, we found that the proposed fire detection system achieves good fire detection accuracy and reduces false alarms. In this work, Sect. 2 describes the method for automated annotation generation for fire images, Sect. 3 describes the proposed fire detection using YOLO algorithms, and Sect. 4 presents the experiments and results.

2 Method for Automated Annotation Generation for Fire Images
We used the “FireNet” dataset and our own customized fire image dataset, “DIAT-fireDS”. The FireNet dataset contains 502 annotated images, and the “DIAT-fireDS” dataset contains 308 annotated images. Here, we have used 648 images for training and 162 images for testing. The sample images are shown in Fig. 1. For developing a method for automated annotation generation for fire images, we carried out a statistical analysis of features in the RGB color and HSV domains on the available annotated standard dataset FireNet. In the annotated regions of the FireNet dataset, it is found that the amount of red (R channel pixel values) is higher than that of green (G channel pixel values), and the amount of green is higher than that of blue (B channel pixel values) [6]. In the RGB model, red, green, and blue are the primary colors, and all other colors can be created from them. For example, mixing red and green will produce yellow or possibly orange. Similarly, mixing green



Fig. 1 FireNet dataset sample images

and blue will generate cyan, while red and blue produce violet. In the case of fire images, red as well as yellow is more prominent. The R, G, and B channels in RGB are all correlated with the color luminance/intensity. To separate image luminance from color information, we can convert images into the HSV (hue, saturation, value) domain and conduct an analysis. Figure 2a shows the distribution of minimum, maximum, and average pixel values for the R channel, i.e., Rmin ∈ (0, 221), Rmax ∈ (135, 255), and Ravg ∈ (43, 250). Similarly, Fig. 2b shows the distribution of minimum, maximum, and average pixel values for the G channel, i.e., Gmin ∈ (0, 70), Gmax ∈ (70, 221), and Gavg ∈ (27, 243). Figure 2c shows the distribution of minimum, maximum, and average pixel values for the B channel, i.e., Bmin ∈ (0, 160), Bmax ∈ (57, 255), and Bavg ∈ (17, 219). Figure 2d shows the distribution of minimum, maximum, and average pixel values for the hue channel, i.e., Hmin ∈ (0, 0.091), Hmax ∈ (0.091, 1.000), and Havg ∈ (0.040, 0.815). Similarly, Fig. 2e shows the distribution of minimum, maximum, and average pixel values for the saturation channel, i.e., Smin ∈ (0.000, 0.351), Smax ∈ (0.125, 1.000), and Savg ∈ (0.044, 0.893). Figure 2f shows the distribution of minimum, maximum, and average pixel values for the value channel, i.e., Vmin ∈ (0.000, 0.867), Vmax ∈ (0.529, 1.000), and Vavg ∈ (0.173, 0.983).



Fig. 2 FireNet annotated dataset: channel distribution a R channel, b G channel, c B channel, d hue channel, e saturation channel, and f value channel

A method is proposed for automated localized annotation generation for fire images. The method includes: (1) Obtaining an image to be annotated, say the original image (I); a sample image along with its color histogram for the RGB channels is shown in Fig. 3a, b, respectively. From Fig. 3b, we can see that the amount of red pixels is higher compared to the green and blue pixels.



Fig. 3 a Sample original RGB image, b color histogram for RGB channels of sample image, c corresponding HSV image, and d histogram for HSV channels of sample image

(2) Choosing parameters that represent the initial set of pixels in the image that are sufficiently red and yellow, based on the defined thresholds (e.g., τrmin and τrmax thresholds for the R channel, τgmin and τgmax thresholds for the G channel, along with τhmin and τhmax thresholds for the hue channel, τsmin and τsmax thresholds for the saturation channel, and τvmin and τvmax thresholds for the value channel) for both the RGB and HSV channels, with respect to the histogram settings and analysis. The thresholds are decided by carrying out a thorough statistical analysis on the available annotated standard dataset FireNet [21], as mentioned in the previous section; (3) Creating a binary mask based on the chosen histogram thresholds for each color channel c ∈ (r, g, b) using Eq. (1): Fig. 4a shows the created binary mask for the original sample image shown in Fig. 3a. We can apply Mask(x, y, c) on I(x, y, c), setting background pixels to zero where Mask(x, y, c) = 0, to get the region of interest shown in Fig. 4b.



Mask(x, y, c) = 1, if τrmax ≥ I(x, y, r) ≥ τrmin and τgmax ≥ I(x, y, g) ≥ τgmin and τhmax ≥ HSV(I(x, y, h)) ≥ τhmin and τsmax ≥ HSV(I(x, y, s)) ≥ τsmin and τvmax ≥ HSV(I(x, y, v)) ≥ τvmin; Mask(x, y, c) = 0 otherwise.    (1)

(4) Apply morphological closing on the binary mask image Mask(x, y, c). The morphological “close” procedure involves dilation followed by erosion, with the same structuring element used for both. Here, a strel object in MATLAB, representing a flat morphological structuring element with a specified neighborhood value of 30, is used. Figure 5 shows the binary mask image after the morphological closing operation. Morphological closing removes background pixels that fit the structuring element and hence tends to close gaps in the image. (5) Next, fill the holes in the binary image mask using imfill (see Fig. 6). (6) Next, measure the properties of the image regions, including the bounding box coordinates that enclose the ROI in the mask image. To get the bounding box coordinates for each 8-connected component, we used the MATLAB “regionprops” command; the output is shown in Table 1. For all images, the bounding box coordinates are predicted accurately, as the thresholds selected for the color channels are generated after accurate statistical analysis of fire images.
Fig. 4 a Binary image mask and b region of interest of sample image

Fig. 5 Binary image mask after morphological closing operation



Fig. 6 Binary image mask after the morphological filling operation

Figure 7 shows the generated bounding box for the fire region in the given image. Hence, the proposed algorithm can be used for automated localized annotation generation for fire images. The histogram of the region within the predicted bounding box in Fig. 8 shows the dominance of red-colored pixels and intensity values in the marked fire locations. Deep learning applications require a huge amount of data. To tackle the problem of the limited availability of annotated fire images for training data-intensive CNN architectures, a customized fire image dataset, “DIAT-FireDS”, is created using the Web scraping technique and annotated using the proposed annotation technique. After labeling the downloaded fire images, we again carried out a manual analysis to remove/correct any wrongly labeled images. To deal with the small dataset, data augmentation is performed on the images by adding random noise of 5%, adjusting color between −25 and +25, and modifying exposure between −25 and +25. The generated customized augmented dataset is used for training and fine-tuning the YOLO architectures [4] with this additional data.
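As an illustration of steps (1)–(6), the sketch below reproduces the same sequence (joint RGB/HSV thresholding, morphological closing, hole filling, and bounding-box extraction) with OpenCV and NumPy rather than MATLAB. The threshold values are rough placeholders, not the statistically derived FireNet thresholds used above, and the flood-fill hole filling assumes the image corner belongs to the background.

```python
# Minimal OpenCV/NumPy sketch of the automated fire-annotation pipeline
# (steps 1-6 above). Threshold values are illustrative placeholders, not the
# statistically derived FireNet thresholds used in the paper.
import cv2
import numpy as np

def annotate_fire_regions(bgr_image,
                          r_range=(135, 255), g_range=(27, 221),
                          h_range=(0.0, 0.10), s_range=(0.12, 1.0),
                          v_range=(0.5, 1.0)):
    """Return bounding boxes (x, y, w, h) around candidate fire regions.
    Expects an 8-bit BGR image."""
    b, g, r = cv2.split(bgr_image.astype(np.float32))
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV).astype(np.float32)
    # OpenCV stores H in [0, 180] and S, V in [0, 255]; rescale to [0, 1].
    h, s, v = hsv[..., 0] / 180.0, hsv[..., 1] / 255.0, hsv[..., 2] / 255.0

    # Step 3: binary mask from joint RGB and HSV thresholds (Eq. 1).
    mask = ((r >= r_range[0]) & (r <= r_range[1]) &
            (g >= g_range[0]) & (g <= g_range[1]) &
            (h >= h_range[0]) & (h <= h_range[1]) &
            (s >= s_range[0]) & (s <= s_range[1]) &
            (v >= v_range[0]) & (v <= v_range[1])).astype(np.uint8) * 255

    # Step 4: morphological closing (dilation followed by erosion).
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (30, 30))
    closed = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Step 5: fill holes by flood-filling the background and inverting
    # (assumes the top-left corner is background).
    flood = closed.copy()
    ff_mask = np.zeros((closed.shape[0] + 2, closed.shape[1] + 2), np.uint8)
    cv2.floodFill(flood, ff_mask, (0, 0), 255)
    filled = closed | cv2.bitwise_not(flood)

    # Step 6: bounding boxes of connected components (cf. MATLAB regionprops).
    n, _, stats, _ = cv2.connectedComponentsWithStats(filled, connectivity=8)
    return [tuple(stats[i, :4]) for i in range(1, n)]  # skip background label 0
```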

3 YOLO Architectures for Fire Detection
We experimented with various YOLO architectures [4]. YOLOv1 [11, 22] is a single-stage object detector that takes a 448 × 448 input to locate and classify objects. It provides position information in the form of a bounding box (i.e., the one having the highest intersection-over-union (IOU)), with the coordinates (x, y) of its center point and its width and height (w, h), along with a prediction confidence for the detected object. Figure 9 shows the YOLOv1 architecture, which contains 24 convolutional layers followed by two fully connected layers [4]. In YOLOv2 [4, 22], the Darknet-19 architecture was used to detect objects faster. Figure 10 depicts the Darknet-19 architecture. In YOLOv3 [4, 12], the independent softmax layers were replaced with independent logistic classifiers for multi-label classification (Fig. 11). Bochkovskiy et al. [13] proposed the YOLOv4 architecture, which was implemented in the Darknet framework.

Table 1 Region properties for the mask image

Property name       Property value        Description
Area (in pixels)    3632                  No. of pixels in selected region
Centroid            132.89 151.93         Center of mass of selected region
Bounding box        85.5, 120.5, 82, 71   Position + size of the smallest box containing selected region: top-left corner, width (w) and height (h) of box
Major axis length   79.456                The number of pixels along the ellipse's major axis
Minor axis length   62.438                The number of pixels along the ellipse's minor axis
Extent              0.62384               Ratio of the region's pixels to the total pixels
Perimeter           250.93                Distance around boundary of selected region



Fig. 7 Generated bounding box for fire region in sample image

Fig. 8 Color histogram for RGB channels for fire region in sample image

Fig. 9 YOLO v1 architecture [21]

The YOLOv4-tiny architecture was derived from the original YOLOv4 architecture with a few adjustments: instead of using 137 convolution layers, it uses only 29 convolution layers to conduct object detection in real time at a high frame rate. After the release of YOLOv4 [13], Ultralytics introduced the PyTorch-based YOLOv5 [23] architecture. Wang et al. [24] introduced the PyTorch-based scaled YOLOv4 architecture. In this design, a network scaling technique with the CSP approach [4] was used; the network's depth, width, resolution, and structure are all affected by this scaling.

Fig. 10 Darknet-19 architecture [11]

Fig. 11 Darknet-53 architecture [12]




Wang et al. [25] proposed the YOLOR architecture, which is based on storing implicit and explicit knowledge together for object detection [4]. A number of YOLO models that have already been trained on the MS-COCO benchmark dataset are fine-tuned on our customized dataset.
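As a concrete example of reusing a COCO-pretrained one-stage detector from this family, the snippet below loads a YOLOv5 model through torch.hub and runs inference on a sample image. The image path and confidence threshold are placeholders; fine-tuning on the fire data would follow the standard Ultralytics training workflow rather than this inference-only sketch.

```python
# Inference-only sketch: load a COCO-pretrained YOLOv5 model via torch.hub and
# run it on a sample image. "fire_sample.jpg" is a placeholder path; a model
# fine-tuned on the fire dataset would be loaded the same way by pointing to
# its custom weights.
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.conf = 0.25                      # confidence threshold (placeholder value)

results = model('fire_sample.jpg')     # accepts paths, URLs, or numpy arrays
results.print()                        # per-class detection summary
detections = results.xyxy[0]           # tensor rows: [x1, y1, x2, y2, conf, class]
for x1, y1, x2, y2, conf, cls in detections.tolist():
    print(f'class {int(cls)}  conf {conf:.2f}  box ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})')
```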

4 Experimental Results
The dataset is split into a train set and a test set in an 80:20 ratio. Every image is resized to 416 × 416. All experiments are performed on an NVIDIA RTX-6000 GPU-powered high-end Tyrone workstation with 2 Intel Xeon processors, 256 GB RAM, and 4 TB HDD. For the software stack, Python 3.7.2, CUDA 10.0, cuDNN 7.6.5, and PyTorch 3.7.2 were used [4, 22]. The IOU metric measures accuracy in object detection models [26]. We used mean average precision (mAP) with precision (P), recall (R), and F1-score for assessing classification accuracy, as given in Table 2 [4]. A higher mAP indicates a more accurate model. Frames per second (FPS) measures the model's real-time performance for real-time classifications [4].
The fire dataset has been used to fine-tune YOLO models that are pretrained on the MS-COCO benchmark dataset. YOLOR performs better than the other models in terms of mAP, YOLOv5 performs better than the other models in terms of precision, and YOLOv4 outperforms the other models in terms of F1-score. Figure 12 shows the mAP value plots per batch for YOLOv4-tiny and YOLOv4. YOLOv4-tiny and YOLOv4 are trained in the Darknet framework. YOLOv3, YOLOv5, scaled YOLOv4 CSP, and YOLOR are trained using PyTorch. Figures 13 and 14 show the mAP value plots for YOLOv3, YOLOv5, scaled YOLOv4 CSP, and YOLOR. Figure 8 shows the results of YOLOR detecting multiple fire objects per image. It is found that YOLOR can localize a region containing fire at a more considerable distance and a region having a small amount of fire. Figure 15 shows the fire detection in sample images from the FireNet dataset, and Fig. 16 shows the detection of fire in sample images from the “DIAT-fireDS” dataset.

Table 2 YOLO model results

Metric      YOLOv3   YOLOv4 tiny   YOLOv4   YOLOv4-csp   YOLOv5   YOLOR
mAP@0.5     0.59     0.65          0.73     0.62         0.65     0.74
Precision   0.73     0.73          0.78     0.76         0.81     0.72
Recall      0.48     0.65          0.73     0.60         0.70     0.75
F1-score    0.58     0.69          0.75     0.63         0.74     0.73
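For reference, the IOU metric used above compares a predicted box with a ground-truth box as the ratio of their intersection area to their union area. The short sketch below computes it for boxes given as (x1, y1, x2, y2) corner coordinates; the example box values are made up.

```python
# IOU between two axis-aligned boxes given as (x1, y1, x2, y2) corners.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter + 1e-9)

# Example with made-up boxes: a prediction partially overlapping a ground-truth box.
print(iou((50, 50, 150, 150), (100, 100, 200, 200)))  # ~0.143
```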


Fig. 12 mAP a YOLOv4 tiny b YOLOv4

Fig. 13 mAP a YOLOv5, b scaled YOLOv4 CSP

Fig. 14 mAP a YOLOv3, b YOLOR




Fig. 15 Detection of fire objects

5 Conclusion and Future Scope
In this paper, we customized the YOLO object detector for real-time fire detection and localization in images, video, and satellite imagery for an automatic surveillance application. During the experimental analysis, YOLOR demonstrated real-time performance on a video test dataset at more than 30 fps. To tackle the problem of the limited availability of annotated fire images for training data-intensive CNN architectures, we proposed a method for automatic annotation of fire images based on color and HSV channel characteristics using morphological image processing operations. Further, a customized fire image dataset, “DIAT-FireDS”, was created using the Web scraping technique and annotated using the proposed annotation technique. The generated customized dataset is used for training and fine-tuning YOLO architectures. In the future, we will consider situations where the fire is not visible and only smoke is present.




Fig. 16 Detection of fire objects in DIAT-fire dataset samples



Acknowledgements The authors thank NVIDIA for the academic GPU research grant for deep learning-related research. This research is supported by the Life Sciences Research Board (LSRB) in association with Defence Institute of Psychological Research (DIPR), DRDO.

References 1. Suresh K (2017) Detection, analysis and management of atypical behaviour of crowd and Mob in LIC environment. ST/14/DIP-732, DIPR/Note/No./714 2. Suresh K (2018) Predicting the probability of stone pelting in crowd of J&K. ST/14/DIP-732, DIPR/Note/No./719 3. Li S, Yan Q, Liu P (2020) An efficient fire detection method based on multiscale feature extraction, implicit deep supervision and channel attention mechanism. IEEE Trans Image Process 29:8467–8475. https://doi.org/10.1109/TIP.2020.3016431 4. Ranjan A, Pathare N, Dhavale S, Kumar S (2022) Performance analysis of YOLO algorithms for real-time crowd counting. In: 2022 2nd Asian conference on innovation in technology (ASIANCON), pp 1–8. https://doi.org/10.1109/ASIANCON55314.2022.9909018 5. Muhammad K, Khan S, Elhoseny M, Hassan Ahmed S, Wook Baik S (May 2019) Efficient fire detection for uncertain surveillance environment. IEEE Trans Ind Inf 15(5):3113–3122. https://doi.org/10.1109/TII.2019.2897594 6. Rafiee A, Dianat R, Jamshidi M, Tavakoli R, Abbaspour S (2011) Fire and smoke detection using wavelet analysis and disorder characteristics. In: 3rd international conference on computer research and development, pp 262–265. https://doi.org/10.1109/ICCRD.2011.576 4295 7. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: IEEE international conference on computer vision (ICCV), pp 2980–2988. https://doi.org/10.1109/ICCV.2017.322 8. Girshick R (2015) Fast R-CNN. In: IEEE international conference on computer vision (ICCV), pp 1440–1448. https://doi.org/10.1109/ICCV.2015.169 9. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(06):1137–1149. https://doi. org/10.1109/TPAMI.2016.2577031 10. Liu W et al (2016) SSD: single shot multibox detector. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision—ECCV. Lecture notes in computer science, vol 9905, ECCV 2016, Springer, Cham. https://doi.org/10.1007/978-3-319-46448-0_2 11. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 6517–6525. https://doi.org/10.1109/ CVPR.2017.690 12. Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804. 02767 13. Bochkovskiy A, Wang C-Y, Liao H-YM (2020) Yolov4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 14. Chino DYT, Avalhais LPS, Rodrigues Jr JF, Traina AJM (10 Jun 2015) BoWFire: detection of fire in still images by integrating pixel color and texture analysis. arXiv:1506.03495v1 [cs.CV] 15. Yar H, Hussain T, Khan ZA, Koundal D, Lee MY, Baik SW (2021) Vision sensor-based realtime fire detection in resource-constrained IoT environments. Comput Intell Neurosci 2021:15, Article ID 5195508. https://doi.org/10.1155/2021/5195508 16. Jadon A, Omama M, Varshney A, Ansari MS, Sharma R (2019) FireNet: a specialized lightweight fire and smoke detection model for real-time IoT applications. ArXiv, abs/1905.11922 17. Ahmad I, Alqurashi F, Abozinadah E, Mehmood R (2021) A novel deep learning-based online proctoring system using face recognition, eye blinking, and object detection techniques. Int J Adv Comput Sci Appl 12. https://doi.org/10.14569/IJACSA.2021.0121094



18. Tahir A, Munawar HS, Akram J, Adil M, Ali S, Kouzani AZ, Parvez Mahmud MA (2022) Automatic target detection from satellite imagery using machine learning. Sensors 22(3):1147. https://doi.org/10.3390/s22031147 19. Kumar CB, Punitha R, Mohana (2020) YOLOv3 and YOLOv4: multiple object detection for surveillance applications. In: 2020 third international conference on smart systems and inventive technology (ICSSIT), pp 1316–1321. https://doi.org/10.1109/ICSSIT48917.2020. 9214094 20. Castorena CM, Abundez IM, Alejo R, Granda-Gutiérrez EE, Rendón E, Villegas O (2021) Deep neural network for gender-based violence detection on twitter messages. Mathematics 9(8):807. https://doi.org/10.3390/math9080807 21. FireNet Dataset. https://github.com/OlafenwaMoses/FireNET. Accessed 31 Aug 2022 22. Gali M, Dhavale S, Kumar S (2022) Real-time image based weapon detection using YOLO algorithms. In: Singh M, Tyagi V, Gupta PK, Flusser J, Ören T (eds) Advances in computing and data sciences. ICACDS 2022. Communications in computer and information science, vol 1614. Springer, Cham. https://doi.org/10.1007/978-3-031-12641-3_15 23. YOLOv5, Ultralytics open-source research into future vision AI methods. https://github.com/ ultralytics/yolov5. Accessed 31 Aug 2022 24. Wang C, Bochkovskiy A, Liao H (2021) Scaled-YOLOv4: scaling cross stage partial network. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), Nashville, TN, USA, pp 13024–13033. https://doi.org/10.1109/CVPR46437.2021.01283 25. Wang C-Y, Yeh I-H, Liao H-YM (10 May 2021) You only learn one representation: unified network for multiple tasks. arXiv:2105.04206v1 [cs.CV]. https://doi.org/10.48550/arXiv.2105. 04206 26. Tan M, Le Q (2019) Efficientnet: rethinking model scaling for convolutional neural networks. In: Proceedings of the 36th international conference on machine learning, vol 97, pp 6105–6114. PMLR 27. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788. https://doi.org/10.1109/CVPR.2016.91 28. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105. 0https://doi.org/10.1145/306 5386 29. Wang C-Y, Liao H-YM, Wu Y-H, Chen P-Y, Hsieh J-W, Yeh I-H (2020) CSPNet: a new backbone that can enhance learning capability of CNN. In: 2020 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 1571–1580. https://doi.org/ 10.1109/CVPRW50498.2020.00203 30. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916. https://doi.org/ 10.1109/TPAMI.2015.2389824 31. Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: IEEE/CVF conference on computer vision and pattern recognition, pp 8759–8768. https://doi. org/10.1109/CVPR.2018.00913

Improving Autoencoder-Based Recommendation Systems Nilanjan Sinhababu, Monalisa Sarma, and Debasis Samanta

Abstract Recommendation systems help in providing suggestions regarding the products/services that users might be interested in by analyzing ratings, reviews, and other feedback obtained from the users on their previous purchases. These types of systems can be seen in domains like e-commerce, entertainment, news, etc. The ratings and reviews in e-commerce are obtained from user feedback and are far fewer in number than the counts of users and items. This leads to data sparsity issues, where recommendation systems are prone to providing inaccurate recommendations. To counter this issue, in our study, we used both an autoencoder and sentiment analysis on the reviews, using the BERT language model, to generate a more accurate version of the rating matrix. The empirical studies done on the Amazon baby product reviews dataset show that the approach significantly increases the accuracy of the predicted ratings. Keywords Autoencoder · BERT · Hybrid recommendation · Sentiment analysis

1 Introduction
One of the most commonly used recommendation methods in the e-commerce domain is collaborative filtering (CF). This method recommends a product to users based on similarity. The techniques are called memory-based and model-based CF.
N. Sinhababu (B) Center for Computational and Data Sciences, Indian Institute of Technology, Kharagpur, India e-mail: [email protected]
M. Sarma Subir Chowdhury School of Quality and Reliability, Indian Institute of Technology, Kharagpur, India e-mail: [email protected]
D. Samanta Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_41
Memory-based CF uses the rating data directly to generate recommendations, whereas model-based CF makes use of various machine learning or data mining methods to




make predictions. The ratings and the user reviews are the most important aspects of user feedback in any e-commerce domain [1]. However, due to the extremely large number of items, the ratings are sparse. Generally, memory-based CF tends to suffer from various issues like rating data sparsity [2] and difficulty in dealing with large and complex volumes of data. One of the methods to alleviate this is to use an autoencoder model to predict the missing ratings. This improves the accuracy of the predictions and helps in learning non-linear user-item relationships efficiently. However, the issue with autoencoders is that they only use the user rating information for predicting missing ratings. This results in prediction inaccuracy, as there is no way to verify the correctness of the predicted ratings. Since autoencoders only use rating information, most of the rating predictions might not be accurate due to the very sparse nature of the data. Hence, integrating information other than the ratings can lead to better solutions. In this paper, we attempt to integrate sentiment analysis of reviews into an autoencoder-based recommendation system, which uses both ratings and reviews in a hybrid approach. The main objectives of this research paper are outlined below:
1. To deal with a sparse rating matrix using autoencoders.
2. To verify whether we can improve the ratings predicted by an autoencoder-based recommender using a sentiment-based recommender.
3. To observe how the prediction accuracy changes with the use of different sentiment classifiers (2-class, 3-class, 5-class).

2 Survey of Literature
One of the most popular techniques in recommendation systems is the matrix factorization technique, which is used for dimensionality reduction. Autoencoders can also be used for dimensionality reduction and have shown large improvements over existing matrix factorization techniques [3]. Most recommendation systems face issues in dealing with large and complex volumes of data [4]. To address this, studies have been done to utilize deep learning techniques to improve recommendation systems. Recent studies have shown that autoencoders are highly efficient in recommendation tasks. Therefore, modelling a recommendation system based on autoencoders improves the accuracy of predictions due to a better understanding of users' requirements and features, based on their capability to learn non-linear user-item relations efficiently and encode complex abstractions into data representations [5]. AutoRec, a novel autoencoder framework for CF proposed by [6], was the first attempt at utilizing autoencoders for recommendation tasks. AutoRec is of two types, I-AutoRec and U-AutoRec, where I-AutoRec gave better results than U-AutoRec and also performed better than LLORMA [7] and RBM-CF [8] on both the MovieLens and Netflix datasets. It was also stated that increasing the number of hidden neurons can increase the model's performance and also that the non-linear



activation function in the hidden layer was critical for the good performance of I-AutoRec. Kuchaiev and Ginsburg [3] proposed a deep autoencoder model with 6 layers for prediction tasks on the Netflix dataset. They empirically showed that deep autoencoder models generalize better than shallow models, that the use of regularization techniques like dropout is necessary to avoid over-fitting, and that non-linear activations with a negative part are necessary for training deep autoencoder models. They also proposed an algorithm called dense re-feeding, based on iterative re-feeding, to overcome the sparsity issues of CF. Also, their model is based on U-AutoRec rather than I-AutoRec, as they consider the latter a less practical approach than the former. Ferreira et al. [1] employed a four-layer (including dropout layer) autoencoder model with a Tanh activation function for the encoder and a linear activation function for the decoder. They converted the ratings in the data from the [1, 5] scale to the [0.2, 1] scale before training the autoencoder model. The results obtained show that the model performed better than singular value decomposition on the same data.
Sentiment analysis (SA) on reviews is a sentiment classification (which can also be considered text classification) problem, where a review is classified into one of two or more classes that are obtained using its associated ratings. Classification of reviews can be done using two approaches: machine learning-based and lexicon-based [9]. The lexicon-based approach classifies reviews primarily based on pre-compiled and known sentiment words, also known as a sentiment lexicon. Sentiment words are those words that are used to express positive or negative opinions [9]. The machine learning approach mainly depends on identifying and extracting a proper set of features from reviews, called vector representations, which are then passed through a machine learning model to predict sentiment scores [9].
In [10], a sentiment analysis integrated recommendation system model was proposed where the reviews were divided into 2 sentiment classes based on the corresponding ratings, such that reviews with ratings 4 and 5 are considered the positive sentiment class, and the remaining reviews are considered the negative class. For the SA model, a hybrid deep learning approach, CNN-LSTM [11], was utilized, for which they used pre-trained fastText word embeddings [12] after preprocessing the text reviews with the NLTK and Beautiful Soup Python libraries. After training the model, the sentiment scores were integrated into a matrix factorization recommendation system, and the results were compared with baseline popular recommendations. The proposed model performed considerably better than the baseline model. In [13], a recommendation approach of collaborative filtering with sentiment analysis was proposed. The CF approaches used were singular value decomposition, NMF, and SVD++. Various hybrid and ensemble deep neural network models like CNN-LSTM [14, 15] and CNN-GRU [16] have been used in many aspects of prediction. Some used the pre-trained BERT model for obtaining word vectors from review text. Two datasets were used, namely Fine Foods and Movie Reviews, provided by Amazon. The sentiment scores obtained from the SA model were used to predict ratings using an algorithm proposed in the paper, which was then combined with the ratings predicted from CF using a linear equation with a single parameter. The results show that the accuracy of the predicted ratings significantly improved.



3 Proposed Technique
We propose a robust technique that uses sentiment analysis in addition to the autoencoder to make it more accurate and reduce errors in the rating predictions. The model architecture is shown in Fig. 1. First, the ratings are predicted using the autoencoder model. Simultaneously, ratings are predicted using sentiment analysis of the reviews. Then, both predicted rating matrices are combined using a linear combination to provide a more robust rating matrix.
For the autoencoder-based recommendation system used in the experiment, we use six layers (including input, output, and dropout layers), which can be said to be a deep autoencoder model. An example of the layer distribution is [N, 256, 512(d), 256, N], where d is the dropout rate in the dropout layer. The inputs passed into the autoencoder model are the partially observed vectors of users, R_u = (R_u1, R_u2, ..., R_uN) ∈ R^N, obtained from the user-item rating matrix R ∈ R^(M×N) having M users and N items. Both the encoder and decoder layers are made up of feed-forward neural networks which use the scaled exponential linear unit (SELU) activation function. SELU is used where the data have a negative part and an unbounded positive part, as in a sparse rating matrix [3]. For calculating the loss between the input and output of the model, the masked mean square error (MMSE) [3] is used, as shown in Eq. (1).

Fig. 1 Overall recommendation system framework


MMSE = Σ_{i=1}^{N} m_i (r_i − y_i)² / Σ_{i=1}^{N} m_i    (1)

where the predicted rating is y_i, the actual rating is r_i, and the mask function is m_i: m_i = 1 when r_i ≠ 0 and m_i = 0 otherwise. The evaluation metric used is the masked root mean square error, M-RMSE = √MMSE. To find the rating of user u for a specific item i, we use a function pred_ae(u, i). We first pass the sparse rating matrix R ∈ R^(M×N) through the trained autoencoder model and obtain a dense rating matrix R̂ ∈ R^(M×N). We then clip the values in R̂ with a lower limit of 1 and an upper limit of 5, which makes pred_ae(u, i) = R̂_u,i. The overall framework of the autoencoder-based recommendation system is shown in Fig. 1, and the autoencoder used is shown in Fig. 2.
For obtaining the rating of user u for a specific item i, pred_sa(u, i), by using the predicted sentiment scores from the trained model (as in Fig. 3), the following steps are followed:
1. From the complete training set data, we obtain a cleaned training dataset, such that the predicted sentiment scores are equal to the actual sentiment scores.
2. Obtain a set of items I from the cleaned training set such that each of those items has already been rated by the target user u.
3. Obtain a set of users U from the cleaned training set such that each of those users has rated the target item i and has rated at least one item from set I.
4. If set U is empty, then the rating predicted by the sentiment-based recommendation system is pred_sa(u, i) = r̄_u, the mean of the ratings given by the target user u. If set U is not empty, then move to the next step.
5. As U is not empty, find the top-k users based on their similarity values with the target user u, calculated using cosine similarity. Let these top-k users be in set S_u.
Fig. 2 Architecture of a deep encoder–decoder model



Fig. 3 Model of autoencoder used

6. Finally, calculate the predicted rating using Eq. (2):

pred_sa(u, i) = r̄_u + Σ_{v∈S_u} sim(u, v) (r_{v,i} − r̄_v) / Σ_{v∈S_u} sim(u, v)    (2)
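A small sketch of the neighbourhood prediction in Eq. (2) is given below, assuming the ratings and per-user means are held in simple dictionaries and that cosine similarities between users are precomputed; the container and function names are illustrative, not the authors' implementation.

```python
# Sketch of the sentiment-based prediction in Eq. (2). `user_mean` maps each
# user to the mean of their ratings, `ratings[(v, i)]` holds user v's rating of
# item i, and `sim(u, v)` returns the precomputed cosine similarity between
# users. All container names here are illustrative assumptions.
def pred_sa(u, i, neighbours, ratings, user_mean, sim):
    """neighbours: the top-k similar users S_u who rated item i."""
    if not neighbours:                       # step 4: fall back to the user's mean rating
        return user_mean[u]
    num = sum(sim(u, v) * (ratings[(v, i)] - user_mean[v]) for v in neighbours)
    den = sum(sim(u, v) for v in neighbours)
    return user_mean[u] + num / den if den else user_mean[u]
```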

Now that both recommenders are prepared, we can find the final predicted ratings using Eq. (3):

pred_f(u, i) = β · pred_sa(u, i) + (1 − β) · pred_ae(u, i)    (3)

where pred_sa and pred_ae are the predicted ratings from the sentiment-based and autoencoder-based recommendation systems, respectively, and β is a hyperparameter, whose value lies between 0 and 1, which adjusts the importance of the sentiment-based recommender's predictions relative to the predicted ratings from the autoencoder-based model. For evaluating the accuracy of the final prediction, RMSE and MAE are used, which are calculated using the formulas shown below:

RMSE = √( Σ_{i=1}^{n} (r_i − y_i)² / n )    (4)

MAE = Σ_{i=1}^{n} |r_i − y_i| / n    (5)

where the final rating is y_i, the actual rating is r_i, and the number of observations is n.
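As a concrete illustration of the masked objective in Eq. (1) and the blending step in Eq. (3), the following is a minimal PyTorch sketch; the tensor names, shapes, and the toy values are assumptions for illustration only.

```python
# PyTorch sketch of the masked MSE loss of Eq. (1) and the final blending step
# of Eq. (3). Tensor names, shapes, and the toy values are assumptions.
import torch

def masked_mse(pred, target):
    """Eq. (1): average squared error over observed (non-zero) ratings only."""
    mask = (target != 0).float()
    return torch.sum(mask * (pred - target) ** 2) / torch.clamp(mask.sum(), min=1.0)

def blend(pred_sa, pred_ae, beta=0.4):
    """Eq. (3): convex combination of sentiment- and autoencoder-based ratings."""
    return beta * pred_sa + (1.0 - beta) * pred_ae

# Usage with toy tensors (0 marks an unobserved rating).
target = torch.tensor([[5.0, 0.0, 3.0], [0.0, 4.0, 0.0]])
pred   = torch.tensor([[4.5, 2.0, 3.5], [1.0, 3.0, 2.0]])
print(masked_mse(pred, target))               # masked training loss
print(torch.sqrt(masked_mse(pred, target)))   # masked RMSE (M-RMSE)
```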



4 Experiments and Results
Here, we present the experiments conducted for training the autoencoder model and the sentiment analysis model, evaluate the performance of our combined approach, verify whether the autoencoder-based model's accuracy can be improved using sentiment analysis, and observe how the number of classes used in sentiment analysis affects the final performance.

4.1 Dataset
For the experiments in this paper, we have used the Amazon baby products 5-core dataset [17]. In this dataset, each review has a user id, product id, rating, plain text review, a review summary, a timestamp, and helpfulness information. There is a total of 160,792 reviews from 19,455 users on 7050 products, spanning more than 10 years. Also, in this dataset, each user has given reviews for a minimum of 5 products, and every product has 5 or more reviews. The dataset statistics are shown in Table 1, and the distribution of the dataset based on ratings is shown in Fig. 4.

Fig. 4 Distribution chart of dataset based on ratings

Table 1 Statistics of dataset

No. of ratings/reviews   160,792
Number of users          19,455
Number of items          7050
Sparsity                 0.998827
Timespan                 May 1996–July 2014
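For orientation, a short sketch of loading the 5-core review file with pandas is given below. The file name and JSON field names (reviewerID, asin, overall, summary, reviewText) follow the publicly distributed Amazon review dumps and are assumptions to be checked against the actual file used.

```python
# Sketch: load the Amazon Baby 5-core reviews into a DataFrame. The file name
# and the field names (reviewerID, asin, overall, summary, reviewText) follow
# the publicly distributed review dumps and are assumptions to verify locally.
import gzip
import json
import pandas as pd

def load_reviews(path='reviews_Baby_5.json.gz'):
    rows = []
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:                       # one JSON object per line
            r = json.loads(line)
            rows.append((r['reviewerID'], r['asin'], r['overall'],
                         r.get('summary', ''), r.get('reviewText', '')))
    return pd.DataFrame(rows, columns=['user', 'item', 'rating', 'summary', 'review'])

df = load_reviews()
sparsity = 1.0 - len(df) / (df['user'].nunique() * df['item'].nunique())
print(len(df), df['user'].nunique(), df['item'].nunique(), round(sparsity, 6))
```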



4.2 Training Autoencoder-Based Recommendation System
The dataset is split into train, validation, and test sets in the ratio 6:2:2. For the autoencoder-based recommendation system, the activation function used is SELU for both the encoder and decoder layers. The Adam optimizer is used for training, as it provides faster training times, with a learning rate of 1e-4. The model is trained over 500 epochs with early stopping by monitoring the validation set loss, with a patience of 10 iterations. If the training stops early, then the model weights are set to the best model weights. The above-mentioned hyperparameters remain the same throughout the experimentation, and we change only the number of neurons in the hidden layers and the dropout values. We start with a higher number of neurons in the hidden layers and large dropout values, because if the model starts to over-fit early with those values, it can be confirmed that the number of neurons in the hidden layers is larger than required; we can then reduce the number of neurons and keep the dropout rate unchanged until the model stops over-fitting.
We first started with the following configuration for the number of neurons in the different layers: [N, 256, 512(d = 0.8), 256, N]. The dropout value was initially set at 8e-1. The model stops training at 58 epochs with the best model at 48 epochs, and on evaluation the validation and test RMSE are 1.1700 and 1.1719, respectively. The train versus validation curves for loss and RMSE during training show that the model over-fits quickly for this configuration; hence, it is not optimal. As the model starts to over-fit early, we reduce the number of neurons in the hidden layers by half, and the configuration becomes [N, 128, 256(d = 0.8), 128, N]. The model stops training at 106 epochs with the best model at 96 epochs, and on evaluation the validation and test RMSE are 1.1491 and 1.1495, respectively. After we reduce the dropout value to 0.5 and keep the configuration the same, the model starts to over-fit after 65 epochs. We reduce the number of hidden neurons again by half to [N, 64, 128(d = 0.8), 64, N], set the dropout back to 0.8, and train the new model; the model stops training at 109 epochs with the best at 99 epochs, and its validation and test RMSE are 1.1370 and 1.1375, respectively. Similarly, the model is trained again with the same configuration and the dropout reduced to 0.5, which yields better results: the model stops training at 123 epochs with the best at 113, and its validation and test RMSE scores are 1.1357 and 1.1374, respectively. As the last configuration trains well without over-fitting early and also has better performance than any of the other configurations, we keep it as the final configuration and use the dense rating matrix obtained from this model to obtain the pred_ae values. The summary of the experiments is shown in Table 2.

Table 2 Summary of autoencoder-based recommendation system experiments

Layers               Dropout   Val RMSE   Test RMSE
256, 512(0.8), 256   0.8       1.17       1.1719
128, 256(0.8), 128   0.8       1.1491     1.1495
64, 128(0.8), 64     0.8       1.137      1.1375
64, 128(0.5), 64     0.5       1.1357     1.1374
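A minimal PyTorch sketch of the final [N, 64, 128(d = 0.5), 64, N] configuration with SELU activations is given below. The layer sizes and dropout follow Table 2 and the bracket notation of Sect. 3, while the remaining details (initialisation, batching, the training loop) are left out or at library defaults; this is an illustration, not the authors' implementation.

```python
# PyTorch sketch of the final autoencoder configuration [N, 64, 128(d=0.5), 64, N]
# with SELU activations, trained with Adam (lr = 1e-4) on the masked MSE loss.
# N is the number of items; remaining details are left at library defaults.
import torch
import torch.nn as nn

class DeepAutoEncoder(nn.Module):
    def __init__(self, n_items, dropout=0.5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_items, 64), nn.SELU(),
            nn.Linear(64, 128), nn.SELU(),
            nn.Dropout(dropout),                 # dropout at the innermost layer
        )
        self.decoder = nn.Sequential(
            nn.Linear(128, 64), nn.SELU(),
            nn.Linear(64, n_items),              # linear output, clipped to [1, 5] later
        )

    def forward(self, ratings):                  # ratings: (batch, n_items), 0 = unobserved
        return self.decoder(self.encoder(ratings))

model = DeepAutoEncoder(n_items=7050)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
dense = model(torch.zeros(2, 7050))              # toy forward pass
pred_ae = torch.clamp(dense, 1.0, 5.0)           # clip predictions to the rating scale
```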

4.3 Sentiment-Based Recommendation System

4.3.1 Training Sentiment Classifier

In this section, most of the experiments relate to training the sentiment classifier model. The main model, which is BERT-base-uncased, is fine-tuned by first loading the weights of the pre-trained BERT-base model from tf-hub, which is an online repository of pre-trained deep learning models for the TensorFlow library. The dataset is split into train and test sets in the ratio of 8:2. Then, we fine-tune the model using the reviews from the train set and evaluate it using the test set. When training, the optimizer used is Adam with a learning rate of 2e-5. Also, early stopping is used such that when the validation loss does not improve for more than 2 epochs, the training stops and the weights of the best model are restored.
When the model is fine-tuned for 2-class classification, it performs extremely well, achieving 92% accuracy with an 88% F1-score. For 3-class classification, 88% accuracy is obtained with a 72% macro-F1-score. But for 5-class classification, we get an accuracy of 0.73 and a macro-F1-score of 0.57. When the main model is compared with the baseline model, TF-IDF + SVC, it performs much better in all three types of classification. The related evaluation results for the models on the three different classifications are shown in Tables 3 and 4, and the related charts are shown in Figs. 5 and 6, respectively.

Table 3 Evaluating the fine-tuned BERT-base-uncased

Fine-tuned model   Accuracy   Macro-F1-score   Weighted F1-score
2-BERT             0.92       0.88             0.92
3-BERT             0.88       0.72             0.88
5-BERT             0.73       0.57             0.71

Table 4 Sentiment analysis using TF-IDF + SVC classifier

Class-model     Accuracy   Macro-F1-score   Weighted F1-score
2-TF-IDF SVC    0.877      0.801            0.872
3-TF-IDF SVC    0.851      0.624            0.83
5-TF-IDF SVC    0.675      0.477            0.65
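For the TF-IDF + SVC baseline of Table 4, a compact scikit-learn sketch is shown below for the 3-class case. The rating-to-class mapping, the toy stand-in data, and the vectorizer settings are illustrative assumptions rather than the exact configuration used to produce Table 4.

```python
# Sketch of the TF-IDF + SVC sentiment baseline (Table 4), 3-class case:
# ratings 1-2 -> negative (0), 3 -> neutral (1), 4-5 -> positive (2).
# The toy data, class mapping, and vectorizer settings are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

def rating_to_class(rating):
    return 0 if rating <= 2 else (1 if rating == 3 else 2)

# Toy stand-in reviews; in practice, use the review texts and ratings from Sect. 4.1.
train_texts = ["baby loves this toy", "broke after one day", "it is okay, nothing special",
               "excellent quality stroller", "terrible, would not buy again", "works as expected"]
train_ratings = [5, 1, 3, 5, 1, 4]
test_texts = ["good value for money", "stopped working quickly"]
test_ratings = [4, 2]

y_train = [rating_to_class(r) for r in train_ratings]
y_test = [rating_to_class(r) for r in test_ratings]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1), LinearSVC())
clf.fit(train_texts, y_train)
print(classification_report(y_test, clf.predict(test_texts), zero_division=0))
```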



Fig. 5 Sentiment analysis performance of fine-tuned BERT model on different classifications

Fig. 6 Sentiment analysis performance of [TF-IDF + SVM] model on different classifications

4.3.2 Predictions of Sentiment-Based Recommendation System

With the sentiment analysis model trained and the predicted sentiment scores calculated, the predicted ratings of the sentiment-based recommendation system are calculated using the steps mentioned in the previous section. When selecting the top-k most similar users to a particular user, the hyperparameter k is set to 10, i.e., at most 10 users will be taken as the neighbourhood of the target user when calculating pred_sa.

4.4 Combination of Autoencoder and Sentiment-Based Recommendation System
With the pred_ae and pred_sa values prepared, we need to find the best value of β and verify whether the autoencoder-based recommender's accuracy can be increased using the sentiment analysis of reviews.

Table 5 Comparing RMSE values of proposed technique without sentiment and with sentiment

                           2 class   3 class   5 class
Without sentiment          1.1552    1.1552    1.1552
With sentiment (β = 0.1)   1.1359    1.1357    1.1359
With sentiment (β = 0.2)   1.1218    1.1212    1.1216
With sentiment (β = 0.3)   1.1131    1.1119    1.1124
With sentiment (β = 0.4)   1.1099    1.1078    1.1084
With sentiment (β = 0.5)   1.1123    1.1092    1.1097
With sentiment (β = 0.6)   1.1203    1.1159    1.1163
With sentiment (β = 0.7)   1.1336    1.1278    1.128

Table 6 Comparing MAE values of proposed technique without sentiment and with sentiment

                           2 class   3 class   5 class
Without sentiment          0.8275    0.8275    0.8275
With sentiment (β = 0.1)   0.8228    0.8217    0.8217
With sentiment (β = 0.2)   0.8197    0.8175    0.8173
With sentiment (β = 0.3)   0.8185    0.815     0.8147
With sentiment (β = 0.4)   0.8194    0.8145    0.814
With sentiment (β = 0.5)   0.8227    0.8164    0.8156
With sentiment (β = 0.6)   0.8281    0.8204    0.8192
With sentiment (β = 0.7)   0.8355    0.8264    0.8248

Tables 5 and 6 show the RMSE and MAE values for the predicted ratings on the dataset for different values of β and for the different types of classification. From the RMSE values, it can be seen that the autoencoder-based recommendation system integrated with sentiment analysis performed better for nearly all values of β. The same is observed with the MAE values. Figures 5 and 6 also illustrate the same. For β = 0.4, the best results of the model were observed for all three classifiers. The improvement in RMSE for the autoencoder-based recommendation system integrated with the 3-class sentiment classifier, compared with the plain autoencoder-based recommendation system, is about 4.1 per cent, which is not high but at the same time cannot be ignored. When comparing the values of RMSE and MAE for the same value of β but different types of classifiers, the 3-class and 5-class classifiers have very close values and perform relatively better than the 2-class classifier, which has relatively higher accuracy when predicting the sentiment scores. From this, it can be observed that classifiers whose number of classes is closer to the number of rating levels (which were taken as the actual sentiment scores) are more beneficial for the model, even though we sacrifice a considerable amount of accuracy in the predicted sentiment scores. Also, if the accuracy of the classifier model is high, then the predicted ratings will be much more accurate.



5 Conclusion
This paper presents a hybrid recommendation system model which uses review sentiment analysis from the e-commerce domain to predict ratings and then combines them with the ratings predicted using an autoencoder-based recommendation system to improve the accuracy of the predicted ratings. We have also conducted experiments to identify into how many classes the reviews should be classified using sentiment classification so that it is beneficial for improving accuracy. The results show that combining sentiment analysis with autoencoder-based recommenders slightly improves the accuracy compared with autoencoder recommenders that do not use sentiment analysis. The best increase is observed when the number of classes in the sentiment classifier is closer to the number of levels of the ratings given. This shows that making use of additional information in the form of user reviews makes the recommendations more reliable and accurate. For future work, we can consider finding the neighbourhood of the target item instead of the target user, which was used in this paper for finding the sentiment-based predicted ratings. For the e-commerce domain, we also consider applying sentiment analysis to a review summary instead of the complete review. This would improve the overall performance, as we could use simpler and faster sentiment analysis models to extract the sentiment scores and use them in the model.

References 1. Ferreira D, Silva S, Abelha A, Machado J (2020) Recommendation system using autoencoders. Appl Sci 10:5510 2. Sarwar B, Karypis G, Konstan JA, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th international conference on World Wide Web, WWW 2001 3. Kuchaiev O, Ginsburg B (2017) Training deep autoencoders for collaborative filtering, arXiv 4. Chaudhuri A, Sinhababu N, Sarma M, Samanta D (2021) Hidden features identification for designing an efficient research article recommendation system. Int J Digit Libr 22(2):233–249 5. Zhang G, Liu Y, Jin X (2020) A survey of autoencoder-based recommender systems. Front Comp Sci 14:430–450 6. Sedhain S, Menon AK, Sanner S, Xie L (2015) AutoRec: autoencoders meet collaborative filtering. In: Proceedings of the 24th international conference on World Wide Web 7. Lee J, Kim S, Lebanon G, Singer Y, Bengio S (2016) LLORMA: Local Low-Rank Matrix Approximation. J Mach Learn Res 17:1–24 8. Salakhutdinov R, Mnih A, Hinton G (2007) Restricted Boltzmann machines for collaborative filtering. In: Proceedings of the 24th international conference on Machine learning 9. Liu B (2012) Sentiment analysis and opinion mining. Synthesis Lectures Human Lang Technol 5:1–167 10. Bui H (2020) Integrating sentiment analysis in recommender systems, 127–137 11. Hung BT (2018) Vietnamese keyword extraction using hybrid deep learning methods. In: 2018 5th NAFOSTED conference on information and computer science (NICS) 12. Bojanowski P, Grave E, Joulin A, Mikolov T (2016) Enriching Word vectors with subword information, arXiv



13. Dang CN, Moreno-García MN, Prieta FDL (2021) An approach to integrating sentiment analysis into recommender systems. Sensors 21:5666 14. Pramanik PKD, Sinhababu N, Kwak KS, Choudhury P (2021) Deep learning based resource availability prediction for local mobile crowd computing. IEEE Access 9:116647–116671. https://doi.org/10.1109/ACCESS.2021.3103903 15. Pramanik PKD, Sinhababu N, Nayyar A, Choudhury P (2021) Predicting device availability in mobile crowd computing using ConvLSTM. In: 7th international conference on optimization and applications (ICOA) 16. Pramanik PKD, Sinhababu N, Nayyar A, Masud M, Prasenjit C (2022) Predicting resource availability in local mobile crowd computing using convolutional GRU. CMC-Comput Mater Continua 70(3):5199–5212 17. He R, McAuley J (2016) Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In: Proceedings of the 25th international conference on World Wide Web, Republic and Canton of Geneva, CHE

ARTSAM: Augmented Reality App for Tool Selection in Aircraft Maintenance Nikhil Satish and C. R. S. Kumar

Abstract Aircraft maintenance is an advanced task requiring highly skilled engineers, and providing the proper tools and equipment is essential to ensuring good maintenance work. Aircraft Maintenance Technicians (AMTs) require precise knowledge and customized tools to perform their duties. They are responsible for an airplane's safety and efficiency, and rely on a few basic pieces of equipment for a wide range of jobs pertaining to airplane maintenance. Specific maintenance tasks require unique tools, and while AMTs could probably improvise and get the job done anyway, specialized tools exist for a reason: they help get the job done correctly, whereas improvising leads to unnecessary labor and a compromised aircraft. For example, an incorrectly sized screwdriver or screw causes wear and tear and makes the job harder. Besides, traditional tool management requires employees to manually check each tool in and out, which is time consuming. A Tool Selector app which recognises and tags tools in real time helps AMTs determine how a tool is used in a particular task. Through this app, the AMTs can also be guided by animations to perform specific tasks, such as the replacement of an oil filter in an aircraft engine. Keywords Aircraft maintenance · Annotation · Augmented reality · Checklists · Computer graphics · Mixed reality · Remote assistance · Tool management · 3D modeling

1 Introduction
Augmented Reality (AR) technology combines virtual information with the real world. It is an interactive experience of a real-world environment where the objects
in the real world are enhanced by computer-generated perceptual information, both visual and auditory. With the help of advanced AR technologies such as computer vision, object recognition, and AR cameras incorporated into smartphone applications, the information about the user's surrounding real world becomes interactive and digitally manipulable: details about the captured objects and the environment are overlaid on the real world.

An AR system has the potential to enable task guidance and training for technicians in a real-world environment. It could eliminate the need to leave the aircraft to retrieve information from maintenance manuals for inspection and repair procedures. With AR, aircraft maintenance can be performed faster and with fewer errors; it reduces the time to repair, improves equipment availability and reduces unplanned downtime. Specific benefits include:
• The problem can be addressed quickly, with less time spent on identifying relevant tools and referring to physical manuals
• Consistent service rendering, regardless of the AMT in charge
• No need for experts to travel to the service location, thereby saving time and travel costs.

Augmented Reality devices such as the HoloLens, tablet PCs, and smartphones have found many applications in aircraft maintenance. These AR applications can facilitate various complex maintenance tasks by superimposing multimedia content such as video, audio, animation and text on the real-time environment. Many useful AR applications still need to be evaluated and validated for maintenance tasks (Fig. 1).

The AR Tool Selector app identifies aircraft maintenance tools in real time and annotates them. The specifications of the tool are displayed, which enables the AMTs to determine whether the selected tool should be used for the given task. Other details related to the tool, such as its previous use and last maintenance details, can also be annotated.

Fig. 1 Application of AR in aircraft maintenance, courtesy of MenaFN. https://menafn.com/110 2892522/Aviation-Augmented-Virtual-Reality-Market-Rise-in-Usage-of-Smart-Technology-inAviation

This paper describes a study of using augmented reality to support AMTs in carrying out their tasks optimally. Since aircraft technicians must always refer to the maintenance manual of the specific aircraft to obtain proper procedures and specifications, regardless of the task, a lot of time is spent searching for instructions, which increases worker stress and decreases job performance. By overlaying virtual information on real-world objects and thereby enhancing our perception of reality, AR makes it possible to improve the visual guidance given to workers. ARTSAM guides users through specific tasks using 3D animations in AR, which are easier for AMTs to refer to while carrying out maintenance work. It enables maintenance engineers facing technical difficulties to get the support they need easily and effectively, without going through the complicated process of studying loads of manuals to identify and resolve a technical issue. The paper is structured as follows: Sect. 2 presents details about tool selection in aircraft along with the applications of AR in aircraft maintenance. Sect. 3 discusses the design and implementation of ARTSAM; it is further structured into three subsections, one for tool detection and annotation of details, another for task guidance through animations in AR, and the last on the use of Unity3D and the Vuforia Engine in our app. Results and discussion are presented in Sect. 4, and the summary and conclusions in Sect. 5.

2 Tool Selection in Aircraft Maintenance
Tool management and maintenance training have been areas of extensive research over the past few decades. These studies mainly analyze the current systems of training and management and their associated problems, some of which are listed as follows. Firstly, tools in a new toolbox are marked with the identification number of the toolbox. This task is done manually and can take up to several days, and since the markings fade with time, the process needs to be repeated once every two years [1]. There is also no accessible record of which tools the AMTs have taken out, and paper-based documentation can cause delays that postpone the delivery of the aircraft and result in costly penalties.

Augmented reality continues to develop and become more pervasive across a wide range of applications [2]. In this paper, the application of AR to tool selection for aircraft maintenance is discussed, along with other AR features such as remote assistance, where maintenance engineers can connect to experts for guidance during aircraft service. The paper [3] discusses an IoT-based approach to a 'smart tool management system' that makes use of an RFID (Radio Frequency Identification) tag system, a wireless system comprising tags and readers. The reader, placed at the checkout counter of the tool inventory, emits radio waves through antennae and receives a signal back from the RFID tag attached to a particular tool. In this way, it is possible to check whether a particular tool is present at the counter. But most tool parts are
constructed from metal, while RFID tags have difficulty operating in close proximity to metals. Besides, it is inconvenient to attach tags to tools. In paper [4], a comparison is made between traditional methods of aircraft maintenance and training and AR-based tool maintenance. As per this research, AR-delivered 3D work instruction was shown to help novice technicians significantly reduce errors and checks back to the instruction compared with their counterparts using paper-based instruction, putting them in the same subset as experts using the paper-based instruction. Although not statistically significant, their time on task and visualization question scores were brought much closer, on average, to those of experts while using the AR-delivered 3D work instruction than with the paper-based solution. Also, while using the AR-delivered 3D work instruction, novices perceived both the instruction and the task to be easier to understand than experts and other novices using a paper-based task card, at a statistically significant level. The novices had statistically fewer checks back to the instruction when using the application, mainly because of the condensed instruction presented in the application. The control group (without AR) averaged 133.3% checks per instruction task while the experimental group (with AR guidance) averaged just 97.2% checks per instruction task. This reduction of 36.1 percentage points implies better understandability, leading to an improved visualization schema and better efficiency overall. In addition, time on task was a key variable examined by this study. Although the data was not statistically conclusive, the difference in mean time on task between the control and experimental groups is worth noting: the experimental group completed the task on average over five minutes faster than their peers in the control group. The exact difference was 5:14.2 min, roughly 27.7% of the average time taken by the novices using the paper-based instruction, which translates to an improvement of 16.63 min per hour. In an industry where time on task holds tremendous weight, utilizing technologically advanced instruction could drastically improve efficiency.

In this paper, we thus propose the concept of the Universal Computing Principle. The Universal Computing Principle (UCP) states that "Computers can perform any activity that can be performed by humans with better efficiency and speed". The foundations of the Universal Computing Principle lie in the Universal Computing Machines proposed by Turing and in the Universal Simulation Principle (USP) [5], which states that "Any interaction mechanism from the real world can be simulated in VR". The Universal Computing Principle formalizes the vision of computers being capable of performing a wide variety of tasks.

Aircraft maintenance also involves a base completeness check wherein, once a week, each mechanic has to perform a check on his toolbox; both the correctness and the completeness of all tools have to be checked. This search and look-up for tools, parts and documents is labor intensive and redundant. It is estimated that maintenance engineers spend 15–20% of their time searching for tools or documentation. Besides, documenting manually is inconvenient and prone to errors, and such incorrect documentation can in turn cause problems during the planning of maintenance events (Fig. 2).

Fig. 2 Tools used in aircraft maintenance, courtesy of Reddit tools. https://www.reddit.com/r/Tools/comments/9tr9bz/aircraft_mechanic_tools_brazil/. Accessed on 8 Jun 2022

Various methods have been proposed to reduce text in technical documentation, aiming at Augmented Reality manuals. Augmented Reality technology reduces operator time and errors in the completion of tasks. Furthermore, there are specific industrial areas where the visualization of information through AR is very effective compared to other technologies, such as remote assistance and the localization of points for inspections. Digitalization of text manuals using AR has several advantages: AR visual manuals are easily updatable, have no storage problems and can be translated easily [6]. These advantages are useful for industrial companies, because their products change very quickly and so the technical documentation must be easy to update; furthermore, products are sold all over the world and local customers need documentation in their native language. To achieve all these advantages, visual manuals have less text and more graphics.

Remote assistance can also be incorporated into AR applications, enabling technicians to connect to experts during maintenance. One example of an AR tool proposed in the literature for remote assistance applications is the laser pointer, a visual tool that allows the expert to point at a specific target while assisting the maintenance engineer. AR content such as 3D shapes, text, and hand drawings like arrows can be placed on frames taken from the input video and sent to the maintenance engineers, or attached onto real-world objects. These systems, however, do not have the provision to record the digital information so created. This issue has been addressed in paper [7], where the meeting contents are designed to be persistent and chronologically navigable even after the call, so that the AMTs can follow the instructions independently, similar to what happens with unassisted AR applications. AR can also aid aircraft maintenance in component assembly training and component maintenance assistance [8, 9]. In component assembly training, with the aid of advanced AR display terminals, a virtual assembly of the aircraft components and tools is realized through text, pictures, animation and voice. Component maintenance
assistance involves identifying tools, step-by-step guidance during maintenance, selection of tools for specific tasks, and prompt maintenance schemes for maintaining the tools themselves. In remote assistance guidance, remote experts can be called in for problems that cannot be solved on site during aircraft service; the expert can see the view of the site and draw arrows and insert text to guide the maintenance engineer. The app proposed in paper [9] captures real-time images in the AR system, collects audio and video streams from clients (AR system, mobile phone, tablet, PC) and renders these contents into a high-precision real-time stream through unified coding, transmission, decoding and distribution to different terminals for playback.

3 ARTSAM: Design and Implementation
3.1 Tool Detection and Annotation
AR annotation involves three steps: tool scanning, recognition and tagging (Fig. 3). The AR scanner matches a preset image to the image of the object captured by the AR device's camera. On camera activation, it places a digital layer over the captured image, which makes it possible to add computer-generated information to that layer. After the image is successfully scanned, AR annotations appear on the screen over the real-world view. The view shown in the figure is the 'coaching view', where the user is guided on how to scan the object, that is, on placing the image anchor so that the object is captured by the camera (Fig. 4).

Fig. 3 Tool detection and annotation overview

Fig. 4 Object scanning

An augmented reality marker is an object that can be recognized by an AR camera or an AR-enabled mobile app and is used to trigger AR features. The computer vision algorithm treats each angle of the object as a different or additional marker, connects the AR content with one of the object angles, and combines the marker object with the desired AR content. Users scan the devices with Unity's object scanner; the ARTSAM app then recognizes the object and pairs it with the previously prepared AR content, displaying it on the device's display in real time. Markers (Object Targets) can use local recognition or cloud recognition. In local recognition, the marker databases are stored on the local device where the recognition happens; in cloud recognition, the databases are stored in the cloud and recognition happens on the server. Device-based recognition happens immediately, while cloud recognition takes longer because the content has to be downloaded from the server. In general, the computer vision algorithm that reads the markers does not track their colors or contents but their feature points: the more feature points an object has, the better and more stable the AR content will be. More feature points guide the computer vision to recognize the marker object and this, in turn, speeds up the process considerably. Once the tool under consideration is detected, the corresponding AR content, which includes a virtual 3D model of the tool along with labels and other details such as previous maintenance information, is superimposed on the tool in real time (Fig. 5).
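Marker recognition itself happens inside Vuforia, but the feature-point idea it relies on can be illustrated with a short OpenCV sketch. This is a stand-in under stated assumptions (the ORB detector, the match threshold, and the file names are illustrative choices), not the Vuforia pipeline.

```python
import cv2

def tool_is_recognized(reference_path, frame_path, min_matches=25):
    """Match feature points of a stored tool image against a camera frame."""
    reference = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=1000)              # feature-point detector
    _, ref_desc = orb.detectAndCompute(reference, None)
    _, frame_desc = orb.detectAndCompute(frame, None)
    if ref_desc is None or frame_desc is None:        # too few feature points
        return False

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(ref_desc, frame_desc)
    good = [m for m in matches if m.distance < 60]    # heuristic distance cut
    # More matched feature points means a more confident and more stable
    # recognition, mirroring the feature-point argument made above.
    return len(good) >= min_matches

# Hypothetical file names for a stored reference image and a live frame.
print(tool_is_recognized("spanner_reference.jpg", "camera_frame.jpg"))
```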

Fig. 5 Annotation and labeling of tools, courtesy of PTC. https://www.ptc.com/ en/blogs/ar/3-ways-ar-canincrease-service-revenue. Accessed on 8 Jun 2022

3.2 Task Guidance
Visual guidance can be used to support AMTs in carrying out maintenance tasks efficiently. By overlaying virtual information on real-world objects and thereby enhancing the human perception of reality, AR makes it possible to visually guide the AMTs. Under task guidance, for any particular task, say the replacement of an oil filter in an aircraft engine, the associated component(s) are augmented onto the real world. In this case, an aircraft engine is overlaid on the ground once the ARTSAM app detects a plane surface. The AMTs are then guided through a series of animations demonstrating the steps required to complete the task (Figs. 6, 7, 8): a button containing the instruction to be performed is displayed, and on clicking it, the corresponding animation is run.

Fig. 6 Step 1: Grab a spanner

Fig. 7 Step 2: Rotate the spanner

Fig. 8 Removal of oil filter

3.3 Implementation
ARTSAM is implemented using the Unity3D, Vuforia Engine and Blender software. Unity3D is a cross-platform 3D engine and a user-friendly development environment, suitable for laying out levels, creating menus, animations and scripts, and organizing projects in 3D environments. The models used for animation were prepared in Blender, a free and open-source 3D computer graphics package, and were then assembled and animated in Unity. Vuforia Engine is a software development kit that is incorporated into the Unity project for implementing AR apps. Vuforia's plane detector detects the presence of a plane surface and augments an aircraft
engine model, after which the AMTs are guided through the animation. For tool detection and annotation, a collection of markers (images) has to be saved in a library, for which Vuforia's Model Target Generator (MTG) was used. The 3D CAD models of the tools were fed into the MTG and trained using deep learning algorithms, and the trained datasets were then incorporated into the Unity project for tool detection. A MarkerCondition then needs to be created and attached to a Proxy; this Proxy contains the content to be placed at the position where the marker is detected in the scene.

Due to advances in software development and data availability, innovative deep learning techniques have emerged for performing such tasks where traditional machine learning algorithms have limitations. In this app, a type of DL algorithm known as a convolutional neural network (CNN) is used for object detection. As stated in Sect. 3.1 about marker-based detection, predefined markers trigger the augmented reality and the user chooses the position at which to place the object virtually on his or her device. Both marker-based and markerless AR are explored in our app. The app recognises a spanner kept in focus, and its information is projected virtually onto the mobile device used for scanning. In addition, an aircraft engine can be placed virtually on any flat surface that gets detected, and the user is then guided with further tutorials. Object recognition is done internally using an R-CNN (Region-based Convolutional Neural Network), which involves two steps: object localisation and detection. It uses a computer vision technique called 'Selective Search' to compute bounding boxes where the object could potentially be present, extracts features from each region, and feeds the output to an SVM for classification.
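As a rough illustration of the R-CNN-style flow just described (Selective Search proposals, CNN feature extraction, SVM classification), the following Python sketch chains the three stages. It is a simplified stand-in built from generic libraries (opencv-contrib, torchvision, scikit-learn), not the detector trained through Vuforia's Model Target Generator, and the training data and image path are assumed.

```python
import cv2
import torch
from torchvision import models, transforms
from sklearn.svm import SVC

# 1. Region proposals via Selective Search (needs opencv-contrib-python).
def propose_regions(image_bgr, max_regions=200):
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image_bgr)
    ss.switchToSelectiveSearchFast()
    return ss.process()[:max_regions]                 # (x, y, w, h) boxes

# 2. CNN feature extraction for each proposed region (ResNet-18 backbone,
#    global-pooled 512-d features; torchvision >= 0.13 weight syntax).
backbone = torch.nn.Sequential(
    *list(models.resnet18(weights="DEFAULT").children())[:-1]).eval()
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def region_features(image_bgr, box):
    x, y, w, h = box
    crop = cv2.cvtColor(image_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        return backbone(preprocess(crop).unsqueeze(0)).flatten().numpy()

# 3. SVM over region features; train_feats/train_labels would come from
#    annotated tool/background crops (assumed to exist, hence commented out).
# svm = SVC(kernel="linear").fit(train_feats, train_labels)
# image = cv2.imread("workbench.jpg")
# for box in propose_regions(image):
#     if svm.predict([region_features(image, box)])[0] == 1:
#         print("candidate tool region:", box)
```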

4 Results and Discussion
The paper [10] discusses an AR object detection app that uses the MobileNet-SSD v2 algorithm for electrical lab equipment detection and annotation. The evaluation metrics comprise average precision and recall. The dataset used in that model had 643 annotated images, of which 193 images were used to test the accuracy of the model; the average precision achieved was 81.4% and the average recall was around 85%. In practice, the DL model was able to recognise all of the pre-selected lab equipment, but due to a lack of camera focus the equipment was not detected under low-light conditions during preliminary tests.

In our app, tool detection gives accurate results under good to moderate lighting conditions. The tool (a spanner, in this case) is recognised and tagged in the app within about 10 to 15 s of placing it in front of a webcam or mobile camera; the time taken for detection is slightly higher under lower lighting conditions. The approximate dimensions of the tool were given as input to the app prior to detection, and the relative dimensions of the tool fed into the MTG advanced target database should match the corresponding dimensions of the tool being detected. It is important to
have a strong overlap for the accuracy of the model. While the Model Target tracker can tolerate some deviation between physical and digital model, it is expected that objects that articulate or flex may fail to be detected or tracked effectively. As for the plane surface detection by Vuforia, both vertical and horizontal plane surfaces are detected under moderate lighting conditions, and the required components can be augmented onto it. Plane surface detection happens almost instantaneously. An aircraft engine is augmented on the ground upon detection of a plane surface. The maintenance guidance required for a particular part or component can be triggered by clicking on the respective part or the corresponding button. The guidance task used in this app is removal of an oil filter from an aircraft engine. The user is directed to the guidance animation upon clicking a button. The stepwise tasks are shown to carry out the respective task. This part did not have any errors or failures since the code was simple and correct (Figs. 9, 10).

Fig. 9 Spanner detection

Fig. 10 Augmenting aircraft engine on ground

5 Summary and Conclusion
In this paper, the design and implementation of a Tool Selector app based on Augmented Reality is discussed along with an analysis of its potential benefits. The app reduces the time spent retrieving information associated with aircraft maintenance tools while eliminating the need to leave the aircraft, and it also reduces the cost of training and retraining technicians. The system has high potential in the area of aircraft maintenance and will continue to draw significant attention from academia and practitioners. The application of AR and other cutting-edge technologies helps to solve many problems in civil aircraft maintenance service, and this paper thus highlights the need to exploit the application value of AR and other advanced technologies in the fields of aircraft maintenance and training.

Augmented Reality is an emerging wave of change in the aviation industry. With comprehensive insight into the components and inner workings of an aircraft, maintenance tasks become considerably easier and more reliable, while adding a layer of safety to the aircraft. Using AR, real-time information in the form of images, text and other media is integrated with real objects. From computers to mobiles, technology has greatly altered the way people communicate and engage, and AR has reached a stage where a modern organization can use it as an efficient tool to improve business processes, workflows, and employee training. Technological innovations impact most industries, and aviation is no exception.

Declaration This is to certify that the report entitled 'ARTSAM: Augmented Reality App For Tool Selection In Aircraft Maintenance', which has been submitted to the 7th International Conference on Data Management, Analytics & Innovation (ICDMAI 2023), is a bonafide report of the work carried out by me. The material contained in this report has not been submitted or accepted for publication anywhere else.

Nikhil Satish, Student of NITK Surathkal
Place: Bengaluru, India
Date: 30th July, 2022

References
1. Gray AE, Seidmann A, Stecke KE (1993) A synthesis of decision models for tool management in automated manufacturing. Manage Sci 39(5):549–567. Retrieved from https://www.researchgate.net/publication/227446509_A_Synthesis_of_Decision_Models_for_Tool_Management_in_Automated_Manufacturing
2. Haritos T, Macchiarella ND (2005) A mobile application of augmented reality for aerospace maintenance training. In: AIAA/IEEE 24th Digital avionics systems conference – proceedings, pp 5.3–5.1. https://doi.org/10.1109/DASC.2005.1563376. Retrieved from https://www.researchgate.net/publication/4206192_A_Mobile_Application_of_Augmented_Reality_for_Aerospace_Maintenance_Training
3. Hao J-X, Fu Y, Jia S, Zhang C (2014) A smart tool management system for aircraft maintenance: an exploratory study in China. In: The 12th international conference on industrial management, pp 353–358. Retrieved from https://www.researchgate.net/publication/317371262_A_SMART_TOOL_MANAGEMENT_SYSTEM_FOR_AIRCRAFT_MAINTENANCE_AN_EXPLORATORY_STUDY_IN_CHINA
4. Pourcho JB (2014) Augmented reality application utility for aviation maintenance work instruction. Doctoral dissertation, Purdue University. Retrieved from https://docs.lib.purdue.edu/open_access_theses/368/
5. LaValle SM (2017) Virtual reality. University of Illinois; Cambridge University Press, p 418
6. Gattullo M, Uva AE, Fiorentino M, Scurati GW, Ferrise F (2017) From paper manual to AR manual: do we still need text? Procedia Manuf 11:1303–1310. Retrieved from https://www.researchgate.net/publication/319894930_From_Paper_Manual_to_AR_Manual_Do_We_Still_Need_Text
7. Calandra D, Cannavò A, Lamberti F (2021) Improving AR-powered remote assistance: a new approach aimed to foster operator's autonomy and optimize the use of skilled resources. Int J Adv Manuf Technol 114(9):3147–3164. https://doi.org/10.1007/s00170-021-06871-4
8. Yuan L, Hongli S, Qingmiao W (2021) Research on AR assisted aircraft maintenance technology. J Phys Conf Ser 1738:012107. https://doi.org/10.1088/1742-6596/1738/1/012107
9. Hongli S, Qingmiao W et al (2021) Application of AR technology in aircraft maintenance manual. J Phys Conf Ser 1738:012133. https://doi.org/10.1088/1742-6596/1738/1/012133
10. Estrada J, Paheding S, Yang X, Niyaz Q (2022) Deep-learning-incorporated augmented reality application for engineering lab training. Appl Sci 12(10):5159. https://doi.org/10.3390/app12105159

Named Entity Recognition over Dialog Dataset Using Pre-trained Transformers
Archana Patil, Shashikant Ghumbre, and Vahida Attar

Abstract The demand for natural language processing (NLP) applications and advancements in deep learning (DL) techniques have increased the need for large amounts of human-readable data, leading to the interesting research area of named entity recognition (NER), a sub-task of natural language processing. NER identifies and tags different real-life entities into pre-defined categories such as person, location, time, organization, and event, depending on the dataset at hand. NER forms the groundwork for different natural language processing applications such as information retrieval (IR), question answering systems (QAS), text summarization (TS), and machine translation (MT). Earlier NER techniques perform well but require human intervention to craft domain-specific features or rules. The performance of NER systems is further improved by applying emerging deep learning models, and the use of NER in dialog systems is a less explored area. So, in this paper, our aim is to outline the different techniques that can be applied to the NER task, fine-tune pre-trained transformers to work on an in-car dialog dataset for NER, and evaluate the performance of the system. Keywords Dialog system · Named entity recognition · Natural language processing · Transformers

1 Introduction
NLP applications are increasing as companies and academicians benefit from gaining insight into vast amounts of structured and unstructured data. Indirectly, the NER task has also gained importance as it is used for extracting information from the
given unstructured data and is a base task in a wide variety of NLP downstream tasks such as understanding text [1, 2], IR [3, 4], automatic TS [5], QAS [6], MT [7], and knowledge base construction [8]. Early NER models were lexicon-based, rule-based, or machine learning-based. In lexicon-based models, the system maintains a dictionary that contains a collection of vocabulary, i.e., entities. The presence of entities in the given input is tested using various string-matching techniques between the input and the dictionary words. To reduce the size of the dictionary, one can apply preprocessing techniques such as stemming and lemmatization before building it, and to increase the entity detection rate, the system can also track synonyms of the input or dictionary words when matching between them. The necessity of updating the dictionary is a shortcoming of this method; newly occurring entities are not detected, and these approaches show better results only for specific domains. Recognition of newly occurring entities is a necessary part of NER systems.

In the absence of labeled training samples, NER using hand-crafted rules is preferred [9]. Rules can also be created automatically: a system that automatically extracts rules can be applied to different domains, but a system using hand-crafted rules does not adapt well to new domains. Machine learning-based systems use probability models and features derived from the input text for the NER task. These methods can be classified into supervised, semi-supervised, or unsupervised learning methods. Supervised learning methods require a lot of annotated training data for model construction, but obtaining such huge data is difficult, time consuming, and costly. A supervised machine learning algorithm uses the labeled data to discover generalized NER rules for entity detection. To mitigate the effect of insufficient training data, researchers have explored the use of semi-supervised and unsupervised learning methods for NER. Semi-supervised methods use a few labeled samples along with techniques like clustering, distribution statistics, and similarity-based functions to tag a large amount of unlabeled data. Not all applications have labeled data, so researchers have explored unsupervised methods for NER using unlabeled data as well.

NER is a particularly important task, as one might be dealing with the medical domain, the financial domain, or legal documents where precise identification of named entities is critical. Recently, deep learning (DL)-based NER models have emerged; they are good at discovering hidden features automatically and have shown state-of-the-art results. So, the aim here is to experimentally analyze the pre-trained transformer BERT model on the in-car dataset by fine-tuning it on the said dataset. Identifying the correct entities in an input message forms the groundwork for further operations in a dialog system, such as generating a proper response to the given input.
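Referring back to the lexicon-based approach described at the start of this section, a minimal sketch of dictionary matching is shown below. The gazetteer entries are invented for illustration, and lemmatization is reduced to a naive lowercase-and-strip step.

```python
# Minimal lexicon-based NER: match normalized tokens (and bigrams) against a
# hand-built gazetteer. Entries here are invented purely for illustration.
GAZETTEER = {
    "pune": "geo",
    "india": "geo",
    "sachin tendulkar": "per",
}

def normalize(token):
    # Stand-in for stemming/lemmatization: lowercase and strip punctuation.
    return token.lower().strip(".,!?")

def lexicon_ner(sentence, max_span=2):
    tokens = sentence.split()
    spans = []
    for i in range(len(tokens)):
        # Try the longest candidate span first, then shorter ones.
        for j in range(min(i + max_span, len(tokens)), i, -1):
            candidate = " ".join(normalize(t) for t in tokens[i:j])
            if candidate in GAZETTEER:
                spans.append((" ".join(tokens[i:j]), GAZETTEER[candidate]))
                break
    return spans

print(lexicon_ner("Sachin Tendulkar was born in India."))
# [('Sachin Tendulkar', 'per'), ('India.', 'geo')]
```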

2 Related Work
An electronic health record maintenance system implemented NER detection using a dictionary-based approach in [10]; the system showed improved recall performance.
However, maintaining the dictionary is extra work. A named entity recognition system using rule-based grammar with statistical (maximum entropy) models was proposed in [11]; it demonstrated that using smaller gazetteers of well-known names is more useful than using larger gazetteers containing low-frequency names. Rule-based systems that use hand-crafted semantic and syntactic rules to recognize entities require an exhaustive lexicon to perform well. Such systems cannot be reused across domains because the dictionary and rules generated are domain specific; performance-wise, they give high precision but low recall. Another rule-based NER system was proposed in [12], which used the Brill parts-of-speech tagger to generate rules automatically.

In a supervised NER system, word-level, document-level, or corpus-level features may be used for the NER task. Case, morphology, and part-of-speech tags are word-level features; document-level features include the context window size and document locality; and corpus-level similarity is also used by researchers for extracting named entities. Supervised learning techniques include hidden Markov models [13], decision trees [14], maximum entropy models [15], support vector machines [16], and conditional random fields [17].

Clustering is the classical method in unsupervised learning and can be used to capture contextual similarity for gathering named entities from clustered groups. The main idea is that lexical information and statistics computed on large un-annotated data can be utilized to extract the entities occurring in sentences. In [18], the authors proposed a conditional hidden Markov model (CHMM), which adds the contextual representation power of pre-trained language models to an HMM; the resulting model could recognize the correct label from multi-source noisy labels using an unsupervised approach. By training sentences to entities and entities to sentences in two cycles, the authors of [19] exploited cycle-consistency training to acquire an efficient mapping between entities and sentences. To mitigate the problem of needing large annotated data, the author of [20] applied paraphrasing to generate new samples and used an unsupervised consistency training technique to keep prediction performance stable even after paraphrasing the input. In [21], the authors proposed a model which labels not only flat NER but also nested NER correctly; in flat NER every token receives a single label, whereas in nested NER a token may receive multiple labels, which is very useful for answering two independent questions in a question answering system.

Coming to transformer-based approaches, authors have explored them for the NER task in various domains. NER is performed on an Arabic dataset in [22] using BERT and a conditional random field. In [23], the authors applied the transformer-based model BERT to Chinese medical text data to jointly extract both entities and relations.

3 Methodology
BERT [24] is a transformer-based model for language understanding, which is important for various tasks like summarization, machine translation, and question answering, and the framework can be used for different NLP tasks such as NER. It uses two training steps: pre-training and fine-tuning (for the required downstream task). Pre-training learns deep bidirectional representations from unlabeled text, using English Wikipedia (2,500M words) and the BooksCorpus (800M words). The BERT model is first initialized with its default pre-trained parameters and then fine-tuned to update its weights, which improves the results of the trained model for the required downstream task and dataset. Thus, no matter which NLP task uses BERT, all start from the same pre-trained parameters, which are then re-tuned to give better performance for the task at hand.

BERT-BASE and BERT-LARGE are the two pre-trained variants of BERT [24]: BERT-BASE has a stack of 12 encoders, a hidden size of 768, and 12 attention heads, with 110M total parameters, while BERT-LARGE has a stack of 24 encoders, a hidden size of 1024, and 16 attention heads, with 340M total parameters. In the experiment here, the pre-trained BERT-BASE uncased model is used for the NER task; the uncased model is used because we want 'India' and 'india' both to be tagged as B-geo. BERT uses WordPiece embeddings, where the initial token of every input is the special token [CLS], followed by all the tokens in the input, after which any remaining positions are filled with the [PAD] token. If the input is a pair of sentences, they are separated using [SEP]. A sequence {[CLS], Token1, Token2, …, TokenN, [PAD]} is given as input to the BERT model. BERT uses 12 attention heads in each of its layers to capture the context of each word, and the output of the BERT model is a set of contextual BERT embeddings, i.e., hidden representations for each word, say {w1, w2, …, wN}. These embeddings are then given as input to a linear feed-forward layer to obtain the entity labels {TL1, TL2, …, TLN} corresponding to each word, as in Fig. 1. This complete network is trained in each epoch, due to which BERT is fine-tuned for the NER task on the given dataset. The point to note is that the final embedding of every token is given as input to the linear layer on top of the BERT model, which acts as a token classifier assigning a label to each word in the given input.
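A minimal sketch of the described setup, BERT embeddings followed by a token-classification head, using the Hugging Face transformers library. The tag set is truncated, the toy sentence is invented, and every sub-word simply inherits its word's tag, so this is an illustration of the architecture rather than the exact training script.

```python
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

# Illustrative tag set; the full model uses 13 IOB2 categories.
tags = ["O", "B-per", "I-per", "B-geo", "B-tim"]
tag2id = {t: i for i, t in enumerate(tags)}

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(tags))

words = ["sachin", "tendulkar", "was", "born", "in", "india"]
word_tags = ["B-per", "I-per", "O", "O", "O", "B-geo"]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt",
                padding="max_length", max_length=16, truncation=True)

# Align word-level tags to sub-word tokens; [CLS]/[SEP]/[PAD] positions get
# -100 so the loss ignores them.
labels = [-100 if wid is None else tag2id[word_tags[wid]]
          for wid in enc.word_ids(batch_index=0)]
labels = torch.tensor([labels])

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
outputs = model(**enc, labels=labels)     # forward pass returns the loss
outputs.loss.backward()                   # fine-tune the whole network
optimizer.step()

# Predicted tag ids per position (arbitrary here, since the classification
# head starts from random weights and only one step was taken).
pred_ids = outputs.logits.argmax(dim=-1)[0]
print([tags[int(i)] for i in pred_ids[: len(words) + 2]])
```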

Fig. 1 BERT architecture for NER task

4 Experimental Setup and Dataset Details
4.1 Experimental Setup
We fine-tuned the pre-trained BERT model on the in-car dataset [25]. Since the dataset contains utterances between a user and the system, the input given to the system is a pair of sentences that starts with [CLS], uses [SEP] to separate the user and system turns, and ends with [PAD]. We manually labeled the dataset with 13 categories of IOB2 entity tags. In IOB2 tagging, whenever a person entity spans more than one word, the first word is tagged 'B-per' and the remaining words are tagged 'I-per'; the same is done for the respective categories of each tag. Figure 2 shows an example of IOB2 tagging for a given sentence: since 'Sachin Tendulkar' is an entity, the first word 'Sachin' is labeled 'B-per' while the second word 'Tendulkar', being the continuation of the entity, is labeled 'I-per'; 'India' is a single-word entity labeled 'B-geo'; and all other words are labeled 'O' as they do not belong to any pre-defined entity category. Our trained model can recognize 13 categories, which are listed as follows:
• B-per/I-per—person entity
• B-geo/I-geo—geographical entity
• B-org/I-org—organization entity
• B-tim/I-tim—time indicator entity
• B-eve/I-eve—event entity
• B-nat/I-nat—natural phenomenon entity
• O—assigned if a word does not belong to any entity.

Fig. 2 NER with IOB2 tagging example

Table 1 Data statistics
Parameter for dataset                 Numbers
Number of utterances in train         5527
Number of utterances in validation    657
Number of utterances in test          709

4.2 Dataset Details
The fine-tuned model was tested on the in-car dataset, a human–human multi-domain task-oriented dialog dataset with samples for three different tasks: calendar scheduling, weather information retrieval, and point-of-interest navigation. Utterances in the dataset are shorter, but a key feature of the dataset is its diverse behavior in user and system responses. Statistics for the dataset are given in Table 1. The pre-trained BERT-base-uncased model was fine-tuned for 5 epochs on the in-car training dataset of 5527 instances.

5 Result

Precision = True Positive / (True Positive + False Positive)    (1)

Recall = True Positive / (True Positive + False Negative)    (2)

F1-score = (2 * Precision * Recall) / (Precision + Recall)    (3)

The results of the built model on the train and validation sets are shown in Table 2. Various evaluation measures were used for testing the performance of the model on the given test dataset and are noted in Table 3. Precision is the fraction of the model's predictions that are correctly identified, while the fraction of all true entities correctly identified by the model is called recall. The F1-score, a traditional measure used for performance evaluation, is the harmonic mean of precision and recall; the details are given in Eqs. (1)–(3).
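For reference, a small Python sketch of Eqs. (1)–(3) computed per tag from counts of true positives, false positives, and false negatives. Whether the reported scores are computed at the token or entity level is not stated here, so this token-level version is only an assumption.

```python
from collections import Counter

def per_tag_metrics(true_tags, pred_tags):
    """Token-level precision, recall and F1 per tag (Eqs. 1-3)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_tags, pred_tags):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted p where the truth was something else
            fn[t] += 1   # missed the true tag t
    report = {}
    for tag in set(true_tags) | set(pred_tags):
        precision = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        recall = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[tag] = (round(precision, 2), round(recall, 2), round(f1, 2))
    return report

true_tags = ["B-per", "I-per", "O", "B-geo", "O", "B-tim"]
pred_tags = ["B-per", "O",     "O", "B-geo", "O", "O"]
print(per_tag_metrics(true_tags, pred_tags))
```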

Table 2 Model performance: loss and accuracy
Dataset       Loss    Accuracy
Train         0.110   0.959
Validation    0.094   0.965

Table 3 Evaluation measures
Measure      Value
Accuracy     0.941
Precision    0.798
Recall       0.755
F1-score     0.776

Table 4 Tag-wise classification report
Tag                      Precision   Recall   F1-score
Person                   0.71        0.69     0.70
Organization             0.64        0.71     0.68
Time                     0.87        0.78     0.82
Geographical location    0.83        0.81     0.82
Event                    0.68        0.67     0.67
Natural phenomenon       0.90        0.69     0.78
Micro-average            0.80        0.76     0.78
Macro-average            0.77        0.73     0.75
Weighted-average         0.81        0.76     0.78

The tag-wise classification report of the model on the test data is given in Table 4, which shows that the model performs better for the time and geographical tags than for tags like organization and event. This is because the training dataset has 2055 and 3779 samples with different occurrences of time and geographical entities, respectively, while it has only 1504 and 1617 samples with different occurrences of organization and event entities. The model also fails to tag a time entity like '10' in the test dataset if it does not occur in the training dataset.

6 Conclusion Correct identification of entities in input is an important task and forms ground work for various NLP applications. We have presented a simple architecture of BERTbased transformer model for NER task on in-car dataset. Model shows better result for entities occurring frequently in training set and comparatively lesser result for entities having low frequency in training set. To conclude we can say that results of the model can be improved by taking a larger dataset where one will get more data for fine-tuning of model, also increasing the number of training epoch of the model

590

A. Patil et al.

for fine-tuning on the dataset can be done so that model does not miss any correct entity labeling. Acknowledgements I thank all my co-authors for their continuous guidance and support, also my college for facilitating my research work.

References
1. Zhang Z, Han X, Liu Z, Jiang X, Sun M, Liu Q (2019) ERNIE: enhanced language representation with informative entities. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 1441–1451
2. Cheng P, Erk K (2020) Attending to entities for better text understanding. Proc AAAI Conf Artif Intell 34(05):7554–7561
3. Guo J, Xu G, Cheng X, Li H (2009) Named entity recognition in query. In: Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval, pp 267–274
4. Petkova D, Croft WB (2007) Proximity-based document representation for named entity retrieval. In: Proceedings of the sixteenth ACM conference on information and knowledge management, pp 731–740
5. Aone C, Okurowski ME, Gorlinsky J (1999) A trainable summarizer with knowledge acquired from robust NLP techniques. Adv Autom Text Summ 71
6. Molla D, van Zaanen M, Smith D (2006) Named entity recognition for question answering. In: Proceedings of the Australasian language technology workshop, pp 51–58
7. Babych B, Hartley A (2003) Improving machine translation quality with automatic named entity recognition. In: Proceedings of the 7th international EAMT workshop on MT and other language technology tools, improving MT through other language technology tools, resource and tools for building MT at EACL, pp 1–8
8. Etzioni O, Cafarella M, Downey D, Popescu A-M, Shaked T, Soderland S, Weld DS, Yates A (2005) Unsupervised named-entity extraction from the Web: an experimental study. Artif Intell 165(1):91–134
9. Nadeau D, Sekine S (2007) A survey of named entity recognition and classification. Lingvist Investig 30(1):3–26
10. Quimbaya AP, Múnera AS, Rivera RAG, Rodríguez JCD, Velandia OMM, Peña AAG, Labbé C (2016) Named entity recognition over electronic health records through a combined dictionary-based approach. Procedia Comput Sci 100:55–61
11. Mikheev A, Moens M, Grover C (1999) Named entity recognition without gazetteers. In: Ninth conference of the European chapter of the association for computational linguistics, pp 1–8
12. Kim JH, Woodland PC (2000) A rule-based named entity recognition system for speech input. In: Proceedings of the sixth international conference on spoken language processing—ICSLP
13. Bikel DM, Miller S, Schwartz R, Weischedel R (1997) Nymble: a high performance learning name-finder. In: Proceedings of the conference on applied natural language processing, pp 109–116
14. Sekine S (1998) NYU: description of the Japanese NE system used for MET-2. In: Proceedings of the message understanding conference
15. Borthwick A, Sterling J, Agichtein E, Grishman R (1998) NYU: description of the MENE named entity system as used in MUC-7. In: Proceedings of the seventh message understanding conference (MUC-7)
16. Asahara M, Matsumoto Y (2003) Japanese named entity extraction with redundant morphological analysis. In: Proceedings of the human language technology conference of the North American chapter of the association for computational linguistics, pp 8–15
17. McCallum A, Li W (2003) Early results for named entity recognition with conditional random fields, feature induction and Web-enhanced lexicons. In: Proceedings of the seventh conference on natural language learning at HLT-NAACL, pp 188–191
18. Li Y, Shetty P, Liu L, Zhang C, Song L (2021) BERTifying the hidden Markov model for multi-source weakly supervised named entity recognition. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing, pp 6178–6190
19. Iovine A, Fang A, Fetahu B, Rokhlenko O, Malmasi S (2022) CycleNER: an unsupervised training approach for named entity recognition. In: Proceedings of the ACM Web conference, pp 2916–2924
20. Wang R, Henao R (2021) Unsupervised paraphrasing consistency training for low resource named entity recognition. In: Proceedings of the conference on empirical methods in natural language processing, pp 5303–5308
21. Li X, Feng J, Meng Y, Han Q, Wu F, Li J (2020) A unified MRC framework for named entity recognition. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp 5849–5859
22. Al-Qurishi MS, Souissi R (2021) Arabic named entity recognition using transformer-based-CRF model. In: Proceedings of the fourth international conference on natural language and speech processing (ICNLSP), pp 262–271
23. Xue K, Zhou Y, Ma Z, Ruan T, Zhang H, He P (2019) Fine-tuning BERT for joint entity and relation extraction in Chinese medical text. In: IEEE international conference on bioinformatics and biomedicine (BIBM), IEEE, pp 892–897
24. Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, pp 4171–4186
25. Eric M, Krishnan L, Charette F, Manning CD (2017) Key-value retrieval networks for task-oriented dialogue. In: Proceedings of the 18th annual SIGdial meeting on discourse and dialogue, pp 37–49

Depression Detection Using Hybrid Transformer Networks
Deap Daru, Hitansh Surani, Harit Koladia, Kunal Parmar, and Kriti Srivastava

Abstract Depression is a genuine medical condition characterized by lethargy, suicidal thoughts, trouble concentrating, and a general state of disarray. It is a "biological brain disorder" as well as a psychological state of mind. The World Health Organization (WHO) estimates that over 280 million people worldwide suffer from depression, regardless of their culture, caste, religion, or whereabouts. Depression affects how a person thinks, speaks, and communicates with the outside world. The key objective of this study was to identify and use those linguistic differences in Reddit posts to determine whether a person may suffer from depressive disorders. This paper proposes novel Natural Language Processing (NLP) techniques and Machine Learning approaches to train and evaluate the models. The proposed textual context-aware depression detection methodology is a hybrid transformer network consisting of Bidirectional Encoder Representations from Transformers (BERT) and a Bidirectional Long Short-Term Memory (Bi-LSTM) network, with a Multi-Layer Perceptron (MLP) attached at the end to classify depression-indicative texts; it achieves strong results in terms of accuracy (0.9548), precision (0.9706), recall (0.9745) and F1 score (0.9725). Keywords BERT · Bi-LSTM · Depression · MLP · Transformers

1 Introduction
The World Health Organization (WHO) defines mental health as "a state of wellbeing in which the individual realizes his or her abilities, can cope with the normal stresses of life, can work productively and fruitfully, and can make a contribution to his or her community" [1]. Depression, also known as major depressive disorder, is
a prevalent and genuine medical illness that affects how a person feels, thinks, and acts. It can also cause melancholy and a loss of passion for activities that were previously enjoyed [2]. In many nations, depression is still underdiagnosed or left without adequate treatment, leading to severe self-doubt and, at its worst, suicide [3]. Symptoms of depression range from overwhelming sadness, grief, and a constant feeling of depression to frequent fits of crying; it is a state of helplessness and emptiness, and symptoms vary from teens to older individuals. Depression can make it difficult for individuals to keep up a daily job schedule or attend social engagements, which may be the effect of symptoms like difficulty concentrating and the inability to make rational judgments. Depression has also been linked to an increased likelihood of addiction to drugs and alcohol, and research suggests that people diagnosed with depressive disorders alongside other illnesses often have elevated symptoms of both [4]. Thus, early detection of depression is the key to reducing its ill effects on health.

With the increasing popularity of the internet and the rise of social media, people are more open to sharing their experiences and stories on message boards and forums such as Reddit. In addition, the palpable stigma around mental disorders prevents many affected people from pursuing appropriate professional assistance. As a result, numerous studies have been conducted in this field, and they further the proposition that textual data can contain depressive indicators helpful in detecting depressive disorders. Many such markers have been experimented with, including but not limited to word frequency, sentiment, punctuation per sentence, singularity index and singularity p index [5].

2 Related Work
Research on depression and other mental illnesses has garnered new insight into the effects of ever-growing social media platforms on individuals. A new platform for cutting-edge research has been made possible by the abundant textual and social metadata on websites like Facebook, Twitter, and Reddit. In this study, the authors evaluate textual data and investigate how social networks affect users' mental health using Natural Language Processing (NLP) technology and several classification methodologies. The information is assessed from several angles, including textual, authorial, and societal.

In Tadasee et al. [6], posts of Reddit users whose posts are related to depression are analyzed. The study used Linguistic Inquiry and Word Count (LIWC), n-grams, and Latent Dirichlet Allocation (LDA) topics to explore the linguistic usage of the users. Text encoders are used for encoding, which is then used by the classifiers. It employs different classification techniques such as logistic regression (to determine the probability that a binary response occurs on the basis of one or more predictors and features), Support Vector Machines (SVM), Random Forest (RF), Adaptive Boosting (used for combining weak classifiers into a strong classifier for binary classes) and Multilayer
Perceptron (MLP) (with two hidden layers of perceptrons to fix the features and maintain consistency for an unbiased comparison). The best result is given by the combination of LIWC, LDA and bi-grams with an MLP, which has high accuracy and F1 score.

Orabi et al. [7] used a Twitter dataset labelled as Control, Depressed, and PTSD, along with age and gender. The research is primarily aimed at identifying users susceptible to depression. It follows a neural network architecture with word embeddings built using random trainable, skip-gram, and Continuous Bag of Words (CBOW) schemes, and it uses Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to evaluate depression detection performance. To improve accuracy, it needs more attention-based models. Resnik et al. [8] built a model to predict how depressed or not depressed an individual is using a Twitter dataset with roughly 3 million tweets. For a better and more accurate representation of more complex textual data, and without ruling out the fact that a single topic might contribute to either side of the regression parameter based on its subcontext, it uses a Supervised Nested Latent Dirichlet Allocation (SNLDA) model. Pirina et al. [9] also used data gathered from social media platforms to identify depression. It experiments with an SVM model over a combination of character and word n-grams; these are weighted using BM25, and cross-fold validation is used for best results. It focuses more on the model's performance, but other linear models are also available for better analysis and investigation of features related to the specified task. Shen et al. [10] used multi-source social media datasets and implemented a cross-domain Deep Neural Network (DNN) with the help of a feature-adaptive transformation and combination tactic; the shared features are processed with respect to divergence and isomerism to improve the model's working.

Paul et al. [11] used data gathered from posts and comments (in XML format) on the social media platform Reddit to detect depression in its early stages. The techniques used involve different classifiers and feature engineering schemes: Bag of Words (BOW) is used for text representation in vector form, and the Unified Medical Language System (UMLS) for an in-depth understanding of medical jargon with the help of a tool called MetaMap. After feature engineering, classifiers such as Logistic Regression (LR), SVM, RF and a pre-trained RNN model using fastText embeddings implemented with the help of GloVe embeddings are applied. Wolohan et al. [12] used posts from relevant subreddits to perform lexical and predictive analyses to detect depression under two conditions, one in which depression-related text is included and the other in which it is withheld. The LIWC software is used to obtain information about the psychological state of the user for better insight into the user's mental state. Classification techniques include the LIWC score, character n-grams, word n-grams and LIWC combined with both character and word n-grams.

A few individual studies have applied machine learning algorithms like SVM, KNN, Decision Tree and Ensemble methods separately, but no detailed research has amalgamated different techniques on the same dataset to investigate the variations in technique-based findings.
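Several of the surveyed systems reduce to the same template: turn each post into n-gram features and train a linear classifier on them. The sketch below shows that template with scikit-learn on a few invented toy posts; it mirrors the surveyed baselines, not the hybrid transformer model proposed in this paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, invented posts; 1 = depression-indicative, 0 = control.
posts = [
    "i feel empty and tired of everything",
    "nothing matters anymore and i cannot focus",
    "had a great hike with my family today",
    "so happy my friends surprised me for my birthday",
]
labels = [1, 1, 0, 0]

# Word uni/bi-gram TF-IDF features feeding a linear SVM, the template used
# by several of the surveyed studies.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LinearSVC())
baseline.fit(posts, labels)

print(baseline.predict(["i am exhausted and feel hopeless"]))
```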


Table 1 Related works

| Sr. No. | Data source | Features | Methods | Evaluation metrics | References |
|---------|-------------|----------|---------|--------------------|------------|
| 1 | Reddit | BOW, UMLS | AdaBoost & SVM | F1 Score: 0.75 & 0.98 | Paul et al. [11] |
| 2 | Reddit | LIWC, N-grams | SVM | Accuracy: 0.82 | Wolohan et al. [12] |
| 3 | Reddit | N-grams, topic modelling | MLP | F1 Score: 0.64 | Maupomé et al. [13] |
| 4 | Tweets | LIWC, sentiment | RF | AUC: 0.87 | Reece et al. [14] |
| 5 | Tweets | N-grams, topic modelling | SVM | Accuracy: 0.69 | Tsugawa et al. [15] |
| 6 | Facebook | LIWC, N-grams, topic modelling | LR | AUC: 0.72 | Guntuku et al. [16] |
| 7 | Live journal | LIWC, topic modelling, mood tags | Regression models | Accuracy: 0.93 | Nguyen et al. [17] |
| 8 | Blog posts | TFIDF, topic modelling, BOW | CNN | Accuracy: 0.78 | Tyshchenko [18] |

Very few studies have adopted a neural network approach and achieved satisfactory performance. However, most studies to date have yet to adequately explain their results in terms of relevant theories in this area, making it difficult to analyze further processes such as diagnosis and prevention in more detail (Table 1).

3 Data

The dataset curated for this study is built from publicly available data on Reddit, posted between 2017 and 2021. Reddit posts provide the extensive and high-quality textual data required to train the model for accurate results. Moreover, Reddit has plenty of users and is extensively used among this study's target audience. It provides complete anonymity to the users and is often used for discussing heavily criticized topics. Specifically, this data does not include any posts declared 'private' by the user or direct messages. Anyone can access this data via Reddit's own "Reddit Developer API" or the public third-party "PushShift API". This study targets people who publicly aver that they have been suffering from different mental illnesses and describe the effects associated with them. People make such statements to seek guidance and support from others on social media platforms, to fight against the taboo of mental illness, and to seek answers and remedies for their behaviour and disease.


Table 2 Sample from dataset

| Id | Text | Label (depression = 1, not depression = 0) |
|----|------|--------------------------------------------|
| 1 | I don’t feel well lately and it is difficult to just talk to people and stuff | 1 |
| 2 | I got accepted into a Graduate program I wanted. But before that this was the journey, that took place over 7 years | 0 |
| 3 | I’m happy crying over my puppies right now | 0 |
| 4 | Well it seems that you can’t die from overdose of some drugs and only suffer minimal consequences lol | 1 |

Posts from Reddit were acquired using regular expressions with the aid of sizable multi-year health-related data. The subreddit r/depression is utilized to collect depression-indicative posts. A sample of the general population is required as an approximate representation of the community controls to form models for analysis and validate the data. For control data, standard posts by general users are collected from subreddits like r/happy, r/family, and r/friends. Data is extracted using the PushShift API, where the relevant subreddit, time interval and the limit of posts required are given as parameters; the data fetched through the API are returned as JSON objects and then stored in a CSV file (Table 2). Pre-processing techniques are applied to the dataset: noisy, missing, duplicate and irrelevant data are removed. The title and text of the posts are then separated, as only the text is required for analysis. It has to be noted that emoticons were kept in Unicode format for the training model, as they may convey special meanings regarding the behaviour of depressed or control groups. Links were removed via regular expression; titles were not used to train the model. A total of 4738 posts are extracted, out of which 3937 posts display a depressive mood while 801 posts display a non-depressive mood. The texts which indicate a depressive mood are labelled as 1, and the texts which indicate a non-depressive mood are labelled as 0. All these texts were reviewed by our institution-designated psychiatrist and verified as genuine depression-indicative posts so that faulty inclusions in the dataset are not made. The dataset is made publicly available on Kaggle for future research and collaboration under the license "Reddit API Terms".
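As a rough illustration of this collection step, the sketch below pulls submissions from the subreddits named above through the public PushShift endpoint and applies the basic cleaning described here. The endpoint URL, field names and date handling are assumptions based on PushShift's commonly documented interface, not code taken from the paper.

```python
import datetime as dt
import pandas as pd
import requests

PUSHSHIFT_URL = "https://api.pushshift.io/reddit/search/submission"  # public PushShift endpoint

def epoch(date_string):
    """Convert an ISO date string to a Unix timestamp, as PushShift expects."""
    return int(dt.datetime.fromisoformat(date_string).timestamp())

def fetch_posts(subreddit, after, before, size=100):
    """Fetch up to `size` submissions of one subreddit inside a time window."""
    params = {"subreddit": subreddit, "after": epoch(after), "before": epoch(before), "size": size}
    payload = requests.get(PUSHSHIFT_URL, params=params, timeout=30).json()
    return pd.DataFrame(payload.get("data", []))

# r/depression provides depression-indicative posts (label 1); r/happy, r/family and
# r/friends provide the control posts (label 0), as described in the text.
frames = []
for subreddit, label in [("depression", 1), ("happy", 0), ("family", 0), ("friends", 0)]:
    df = fetch_posts(subreddit, "2017-01-01", "2021-12-31")
    df = df.loc[:, ["selftext"]].rename(columns={"selftext": "text"})
    df["label"] = label
    frames.append(df)

dataset = pd.concat(frames, ignore_index=True)
# Basic cleaning: drop empty/removed posts and duplicates, strip links with a regular expression
dataset = dataset[~dataset["text"].isin(["", "[removed]", "[deleted]"])].dropna()
dataset = dataset.drop_duplicates(subset="text")
dataset["text"] = dataset["text"].str.replace(r"http\S+", "", regex=True)
dataset.to_csv("reddit_depression_dataset.csv", index=False)
```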

4 Methodology

Traditional methods have been employed previously to extract features from textual data, such as n-grams (unigrams and bigrams), the LIWC dictionary and LDA topics, which are often employed in mental healthcare projects and are classified using conventional machine learning and deep learning techniques. This makes for a sound methodology but overlooks different aspects of the language that can be understood better by the emerging transformer networks.


In order to overcome these shortcomings, this study leverages the ability of DNNs to detect and extract, on their own, the features from the corpus that are predominant in depression-indicative posts. The proposed hybrid architecture consists of BERT and Bi-LSTM with an MLP attached at the end to classify depression-indicative texts. A simplified pipeline with its five main stages is marked in Fig. 1. Steps 1 to 3 form the embeddings of a sentence utilizing the BERT Base model after pooling is applied on the 12 layers. Step 4 extracts features from the embeddings using Bi-LSTM, and step 5 classifies these features as depression-indicative or not through an MLP. Implementation is done using a 'bert-base-uncased' pretrained tokenizer and encoder model from the Hugging Face library. The embeddings generated are tensors, later given as input to the Bi-LSTM and MLP layers created using TensorFlow. The Adam optimizer is used with a learning rate of 1e−3 and a batch size of 15.

Fig. 1 Hybrid transformer network (BERT, Bi-LSTM and MLP pipeline)


4.1 Preprocessing

The first step involves the input sentence being tokenized using the BertTokenizer, which creates tokens based on the WordPiece model [19] that includes words, subwords and individual characters, so no word falls under a catch-all 'OOV' (out-of-vocabulary) token, as happens with the word-level tokenization typically used with LSTMs.
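A minimal sketch of this tokenization step with the Hugging Face tokenizer is shown below; the maximum sequence length of 64 is an assumed value that the paper does not state.

```python
from transformers import BertTokenizer

# WordPiece tokenization with the pretrained 'bert-base-uncased' vocabulary
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentence = "I don't feel well lately and it is difficult to just talk to people"
encoded = tokenizer(sentence, padding="max_length", truncation=True,
                    max_length=64, return_tensors="tf")

print(tokenizer.tokenize(sentence))   # sub-word pieces, e.g. ['i', 'don', "'", 't', ...]
print(encoded["input_ids"].shape)     # (1, 64): token ids including [CLS] and [SEP]
```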

4.2 Embeddings

In this step, the BERT Base model is employed, which generates context-dependent word embeddings for each input sentence, wherein each of the 12 layers of BERT Base gives its own (n, 768) embedding, where n is the number of tokens generated by the BertTokenizer in the previous step. Each token embedding $E_i^j$ consists of 768 hidden features for a single token in the sentence, where j is the layer number from 1 to 12 in BERT Base, and i is the token index from 1 to n, as shown in Fig. 2. BERT utilizes a sequence of 12 transformer encoders, each with a multi-head attention block whose attention is

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{1}$$

where Q stands for the query matrix, K for the key matrix, V for the value matrix, and $d_k$ is the dimension of the matrix K. The above formula is utilized to calculate the multi-head attention as follows:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O} \tag{2}$$

where $\mathrm{head}_i = \mathrm{Attention}\left(QW_i^{Q}, KW_i^{K}, VW_i^{V}\right)$, with the projections as parameter matrices $W_i^{Q} \in \mathbb{R}^{d_{\mathrm{model}} \times d_k}$, $W_i^{K} \in \mathbb{R}^{d_{\mathrm{model}} \times d_k}$, $W_i^{V} \in \mathbb{R}^{d_{\mathrm{model}} \times d_v}$ and $W^{O} \in \mathbb{R}^{hd_v \times d_{\mathrm{model}}}$, and h is the number of parallel heads [20].
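For concreteness, a small NumPy sketch of the scaled dot-product attention of Eq. (1) for a single head follows; the toy dimensions are illustrative and not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Eq. (1): softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (n_q, n_k) similarity of queries to keys
    return softmax(scores, axis=-1) @ V    # attention-weighted sum of values

# Toy example: n = 4 tokens, d_model = 8, one head with d_k = d_v = 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                # token representations
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)                           # (4, 8)
```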

4.3 Pooling Strategy

The (12, n, 768) embedding sequence, as received from the BERT model, contains different information for different layers, which forms the basis of this methodology's pooling strategy. Various pooling strategies highlight specific parts of the text and give further insight into its context. Four strategies are explored in this paper: last layer, second-to-last layer, mean of the last four layers, and mean of all 12 layers.
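The sketch below shows one way to obtain the per-layer hidden states from the Hugging Face encoder and apply the four pooling strategies; it reuses `encoded` from the tokenization sketch above, and the variable names are illustrative.

```python
import tensorflow as tf
from transformers import TFBertModel

# 'bert-base-uncased' encoder returning all hidden layers (embedding layer + 12 encoder layers)
bert = TFBertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
outputs = bert(encoded["input_ids"], attention_mask=encoded["attention_mask"])

# hidden_states is a tuple of 13 tensors of shape (batch, n, 768); index 0 is the embedding layer
layers = tf.stack(outputs.hidden_states[1:], axis=0)      # (12, batch, n, 768)

pooled = {
    "last_layer":     layers[-1],
    "second_to_last": layers[-2],
    "mean_last_4":    tf.reduce_mean(layers[-4:], axis=0),
    "mean_all_12":    tf.reduce_mean(layers, axis=0),      # best-performing strategy in Table 3
}
```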


Fig. 2 Embeddings and pooling strategy


4.4 Feature Extraction

Next, the pooled sentence vector sequence is fed into a Bi-LSTM network with a cell state of 256, which extracts context-rich features from the given sentence encodings. Rather than providing the embeddings generated from BERT directly to an MLP, this step tries to extract further features from the embeddings of the previous step to improve the accuracy.

4.5 Classification

An MLP is exercised on the features to classify whether or not texts are depression-indicative. Two layers are employed, the first having 128 units with the 'relu' activation function and the second having 1 unit with the 'sigmoid' activation function to give a binary output. The output from the model is the probability, between 0 and 1, of a text being depression-indicative (Fig. 3).
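A minimal Keras sketch of steps 4 and 5 is given below. The Bi-LSTM size, MLP layer sizes, activations, optimizer, learning rate and batch size follow the text; the binary cross-entropy loss and the sequence length of 64 are assumptions not stated in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Input: pooled BERT sentence embeddings of shape (sequence_length, 768), e.g. pooled["mean_all_12"]
inputs = layers.Input(shape=(64, 768))
x = layers.Bidirectional(layers.LSTM(256))(inputs)      # step 4: context-rich feature extraction
x = layers.Dense(128, activation="relu")(x)             # step 5: MLP hidden layer
outputs = layers.Dense(1, activation="sigmoid")(x)      # probability of a depression-indicative text

model = Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy",
              metrics=["accuracy",
                       tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall()])

# model.fit(pooled_embeddings, labels, batch_size=15, epochs=50, validation_split=0.2)
```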

Fig. 3 Feature extraction and classification


Table 3 Pooling strategy comparative analysis

| Pooling strategy | Accuracy | Precision | Recall | F1 |
|------------------|----------|-----------|--------|----|
| Last hidden layer | 0.9483 | 0.9640 | 0.9735 | 0.9687 |
| Second-to-last hidden layer | 0.9483 | 0.9676 | 0.9695 | 0.9685 |
| Mean of last 4 layers | 0.9451 | 0.9630 | 0.9705 | 0.9667 |
| Mean of 12 layers | 0.9548 | 0.9706 | 0.9745 | 0.9725 |

5 Results

After performing the various pooling strategies on the BERT word embeddings, Table 3 compares the evaluation metrics for the different layers and strategies employed and highlights the best-performing pooling strategy: taking the mean of all 12 layers of the BERT Base model. A comparative study on the same dataset using the standalone Bi-LSTM model, the standalone BERT model and the proposed network was conducted. The proposed architecture yielded state-of-the-art results for all the evaluation metrics (accuracy, precision, recall and F1 score) compared to Bi-LSTM and BERT used individually. Figure 4 shows the progression of accuracy, precision and recall over 50 epochs of the proposed network, with a final accuracy of 95.48%. This proposed network utilized the best-found pooling strategy, the mean of all twelve layers. It gave a higher recall of 0.9745 over a precision of 0.9706, which is preferable in the medical domain since it means fewer genuinely depressed cases were missed (Table 4).

6 Conclusion

The authors have performed a comparative study of BERT transformer networks, Bi-LSTM, and an amalgamation of BERT and Bi-LSTM models that uses BERT and its multi-head attention blocks to produce word embeddings and feeds these feature-rich embeddings to a Bi-LSTM model that generates context-aware sentence encodings. Further research can be done to apply these methodologies to multi-modal data, including images, audio or video data alongside textual data. This would factor in all the activity conducted by the user on their preferred social media platform and give a holistic analysis of their mental state. With novel DNNs such as Transformers and Vision Transformers [21], this can be achieved and even implemented as a complete product that can help an individual understand whether, and at what point, social media has taken a toll on their mental health.


Fig. 4 Evaluation Metrics achieved by the proposed hybrid architecture, including a precision, b recall and c accuracy for the classification task

Table 4 Different architecture comparative analysis

| Models | Accuracy | Precision | Recall | F1 |
|--------|----------|-----------|--------|----|
| Bi-LSTM | 0.9023 | 0.9342 | 0.9479 | 0.9410 |
| BERT | 0.9483 | 0.9676 | 0.9695 | 0.9685 |
| Proposed architecture | 0.9548 | 0.9706 | 0.9745 | 0.9725 |

References 1. WHO International (2022) Mental health: strengthening our response. Retrieved from https://www.who.int/news-room/fact-sheets/detail/mental-health-strengthening-our-res ponse. Accessed on 20 Jul 2022 2. American Psychiatric Association (2022) What is depression?. Retrieved from https://www. psychiatry.org/patients-families/depression/what-is-depression. Accessed on 20 Jul 2022 3. WHO International (2022) Depression. Retrieved from https://www.who.int/health-topics/dep ression#tab=tab_1. Accessed on 20 Jul 2022 4. National Institute of Mental Health (NIMH) (2022) Depression. Retrieved from https://www. nimh.nih.gov/health/topics/depression. Accessed on 20 Jul 2022


5. Havigerová JM, Haviger J, Kuˇcera D, Hoffmannová P (2019) Text-based detection of the risk of depression. Front Psychol 10:513. https://doi.org/10.3389/fpsyg.2019.00513 6. Tadesse MM, Lin H, Xu B, Yang L (2019) Detection of depression-related posts in Reddit social media forum. IEEE Access 7:44883–44893. https://doi.org/10.1109/ACCESS.2019.2909180 7. Orabi AH, Buddhitha P, Orabi MH, Inkpen D (2018) Deep learning for depression detection of twitter users. In: Proceedings of the fifth workshop on computational linguistics and clinical psychology: from keyboard to clinic, pp 88–97. https://doi.org/10.18653/v1/W18-0609 8. Resnik P, Armstrong W, Claudino L, Nguyen T, Nguyen V-A, Boyd-Graber J (2015) Beyond LDA: exploring supervised topic modeling for depression-related language in Twitter. In: Proceedings of the 2nd workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, pp 99–107. https://doi.org/10.3115/v1/W15-1212 9. Pirina I, Çöltekin Ç (2018) Identifying depression on Reddit: The effect of training data. In: Proceedings of the 2018 EMNLP workshop SMM4H: the 3rd social media mining for health applications workshop & shared task, pp 9–12. https://doi.org/10.18653/v1/W18-5903 10. Shen T, Jia J, Shen G, Feng F, He X, Luan H, Tang J, Tiropanis T, Chua TS, Hall W (2018) Cross-domain depression detection via harvesting social media. In: Proceedings of the 27th international joint conference on artificial intelligence (IJCAI 2018), vol 2018-july. International Joint Conferences on Artificial Intelligence, pp 1611–1617. https://doi.org/10.24963/ ijcai.2018/223 11. Paul S, Jandhyala SK, Basu T (2018) Early detection of signs of anorexia and depression oversocial media using effective machine learning frameworks. In: Proceedings of the CLEF (working notes), pp 10–14 12. Wolohan JT, Hiraga M, Mukherjee A, Sayyed ZA, Millard M (2018) Detecting linguistic traces of depression in topic-restricted text: attending to self-stigmatized depression with NLP. In: Proceedings of the first international workshop on language cognition and computational models, pp 11–21 13. Maupomé D, Meurs M-J (2018) Using topic extraction on social media content for the early detection of depression. In: CLEF (working notes), vol 2125 14. Reece AG, Reagan AJ, Lix KLM, Dodds PS, Danforth CM, Langer EJ (2017) Forecasting the onset and course of mental illness with Twitter data. Sci Rep 7(1):1–11. https://doi.org/10. 1038/s41598-017-12961-9 15. Tsugawa S, Kikuchi Y, Kishino F, Nakajima K, Itoh Y, Ohsaki H (2015) Recognizing depression from twitter activity. In: Proceedings of the 33rd annual ACM conference on human factors in computing systems (CHI’15). Association for Computing Machinery, New York, NY, USA, pp 3187–3196. https://doi.org/10.1145/2702123.2702280 16. Guntuku SC, Yaden DB, Kern ML, Ungar LH, Eichstaedt JC (2017) Detecting depression and mental illness on social media: an integrative review. Curr Opin Behav Sci 18:43–49 ISSN 2352–1546. https://doi.org/10.1016/j.cobeha.2017.07.005. 17. Nguyen T, Phung D, Dao B, Venkatesh S, Berk M (2014) Affective and content analysis of online depression communities. IEEE Trans Affect Comp 5(3):217–226.https://doi.org/10. 1109/TAFFC.2014.2315623 18. Tyshchenko Y (2018) Depression and anxiety detection from blog posts data. Master’s thesis, University of Tartu 19. Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805, arXiv:1810.04805v2. 
https://doi. org/10.48550/arXiv.1810.04805 20. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30, pp 5998–6008. arXiv:1706.03762, arXiv:1706.03762v5. https://doi.org/10.48550/arXiv.1706. 03762 21. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Houlsby N et al (2020) An image is worth 16×16 words: transformers for image recognition at scale. arXiv:2010.11929, arXiv:2010.11929v2. https://doi.org/10.48550/arXiv.2010.11929

A Comparative Study of Distance-Based Clustering Algorithms in Fuzzy Failure Modes and Effects Analysis Nukala Divakar Sai, Baneswar Sarker, Ashish Garg, and Jhareswar Maiti

Abstract Failure mode and effects analysis (FMEA) is an engineering analytical technique that has found application in the identification of failure modes through the investigation of a system and quantification of the risk of occurrence of each failure mode and eliminating them. Failure modes are evaluated in terms of three risk criteria, namely, severity (S), occurrence (O), and non-detectability (D), and prioritized based on their risk priority number (RPN), that is equal to the product of the three risk criteria. However, conventional RPN has a number of drawbacks. This paper proposes a framework that involves a clustering-based approach for prioritizing the failure modes developed and applied to a particular system defined in an integrated steel plant. First, the relative significance of the risk criteria in terms of their weights was computed using the Fuzzy-Full Consistency Method (F-FUCOM). Linguistic terms were considered for comparison among criteria and rating every failure mode in terms of S, O, and D, which were subsequently converted to Triangular Fuzzy Numbers (TFN). Then the failure modes were clustered in the fuzzy environment into four groups using K-means, Agglomerative and Fuzzy C-means (FCM) algorithms, based on weighted Euclidean distance. The results obtained by the clustering algorithms were finally compared. While calculating the weights of the risk criteria, S was considered the most important. Four clusters were formed and named Minimal, Moderate, Major, and Extreme, based on the increasing centroid RPN values respectively. The largest number of failure modes were clustered in the Major-risk cluster followed by the Extreme-risk cluster. Finally, K-means clustering was found to be the superior clustering algorithm of the three. Keywords Agglomerative · Clustering · Failure modes · FMEA · Fuzzy C-means · F-FUCOM · K-means · Risk N. D. Sai · B. Sarker (B) · A. Garg · J. Maiti Department of Industrial and Systems Engineering, Indian Institute of Technology, Kharagpur, India e-mail: [email protected] A. Garg · J. Maiti Centre of Excellence in Safety Engineering and Analytics, Indian Institute of Technology, Kharagpur, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_45


1 Introduction Failure mode and effects analysis (FMEA) is an engineering analytical technique, which can be applied for the estimation of the effects of known and potential failure modes, after the investigation of the systems, subsystems, assemblies, components, or processes [1]. It involves the identification of failure modes that would impact the reliability of the overall system, followed by quantification and prioritization of the failure modes. FMEA can also be diversified to estimate failure modes whose outcome could be the realization of an undesired state of the system or a system hazard. Thus, FMEA also finds its application in hazard analysis. As an analysis technique, FMEA was first established for the U.S. military originally in 1949 [2]. It found its application for the evaluation of reliability and the estimation of the effects of failures in equipment and systems, where the failures were categorized on the basis of the effect they had on the success of the mission and the safety of personnel and equipment. It was utilized in the development of aerospace technology to identify and circumvent errors on small sample sizes. In the 1960s, FMEA was adopted by NASA to aid in the development of products by ensuring their safety and reliability. In the 1970s, Ford Motors was the first in the area of the automotive industry to utilize FMEA for improving the production and design of the products. FMEA begins with the definition of the system to be analysed. In order to define the system, it is required to identify and understand the operation and design of the system. The systems that are required to be analysed are then divided into components, so that analysis can be conducted at that level. A team of expert professionals, with expertise in different areas like design, quality, manufacturing, etc., is also selected to conduct FMEA analysis. All the necessary information regarding the design and process for all components is collected by the team. All the items of the system under analysis are then studied and the failure modes, i.e., the way an item fails and the state the item is in after the failure, are identified. Along with each failure mode, it is also required to find out the frequency and the consequence of the failure and how difficult it is for tracing and preventing these failures from occurring. As a result, three criteria form the risk criteria, based on which FMEA can be conducted, namely, the severity of the consequence of the failure mode, the probability of occurrence of the failure mode, and the detectability of the failure mode before its occurrence. The team of experts takes part in discussions and brainstorming sessions, in order to ascertain the values for each of the parameters of every failure mode. The values are determined through experts’ opinions on a 10-point scale, with 1 indicating the lowest value and 10 denoting the highest value of severity (S), probability of occurrence (O), and non-detectability (D). FMEA makes use of a parameter called the Risk Priority Number (RPN) for the assessment of all failure modes. The RPN value is calculated as RPN = S × O × D

(1)


The risk levels of all the failure modes are then arranged according to the decreasing RPN values. As a result, failure modes with greater RPN values are prioritized and the control measures that can prevent the occurrence of the prioritized failure modes are selected. Recommended actions are taken to prevent the failure of the system items and reduce the risk of occurrence of the failure modes to a level that is as low as reasonably practicable (ALARP). The FMEA process concludes with the documentation of the analysis and the preparation and completion of the report. Therefore, risk assessment through FMEA has been an essential component of any large-scale industry like manufacturing, mining, oil industries, etc.

The failure modes with the largest values of RPN are the most critical. Since RPN values are simple to use and measure, FMEA has remained the most well-known risk prioritization method in the literature. Although straightforward, it was found to be effective in identifying the hidden high-risk hazards in a system and their possible causes, thus finding a wide range of applications. However, owing to its lack of sophistication, conventional FMEA inevitably falls prey to many shortcomings in practical applications. One of the major criticisms of FMEA is the lack of any mathematical backing for the RPN formula. Secondly, the relative significance of the risk criteria is neglected in the analysis process, due to which less severe failure modes of higher RPN value may be reported to be more critical, as attributing equal weights to all the risk criteria is inconsistent with most real situations. In addition, measuring the risk criteria S, O, and D accurately and objectively is very difficult in any elaborate system owing to the complex nature of the analysis and the dearth of data. As mentioned earlier, FMEA is usually conducted by a panel or team of experts with experience in different fields and departments. These experts find themselves more confident and comfortable in providing their ratings in terms of linguistic expressions, owing to the convenience of expressing their opinion in words and not any numbered value on a scale. Moreover, they may falter in expressing their opinions regarding the assessment of the failure modes due to the complex and vague nature of risk assessment and evaluation problems. Apart from that, converting linguistic terms directly to crisp values brings in a significant amount of error.

In this paper, an endeavour has been made to present a clustering-based risk prioritization approach using fuzzy set theory to improve upon the limitations of traditional FMEA. Data clustering has been applied successfully across various fields in the past years, such as pattern recognition, image processing, signal processing, intrusion detection, outlier detection, etc. The objective behind implementing clustering in risk analysis is to group those points with similar risk, particularly those with higher risk. Several classes of methods like distance-based methods, density-based methods, grid-based methods, and probabilistic methods are available in the literature [3]. Owing to the inability to precisely determine the measures of different risk criteria for each of the failure modes, fuzzy numbers could be preferred to crisp values. Many scenarios in which the clusters are not crisply separated can be encountered, and in such cases soft clustering or fuzzy clustering is used, where each of the data points is a member of all the clusters with varying degrees of membership.
In view of all of this, all possible clustering methods can be divided into four types, as shown in Table 1 [3].


Table 1 Types of clustering methods

|              | Hard clustering | Soft clustering |
|--------------|-----------------|-----------------|
| Hard numbers | Type-1 | Type-2 |
| Soft numbers | Type-3 | Type-4 |

Type-1 clustering is the most widely known and the simplest form of clustering, and classical K-means, DBSCAN, and Agglomerative clustering are some of the well-known Type-1 clustering algorithms. Type-2 clustering involves fuzzy clustering of hard numbers such that the sum of all membership values of every data point equals one. Fuzzy C-Means (FCM) is a popular Type-2 distance-based clustering algorithm. Despite the necessity to study uncertain data, Type-3 and Type-4 clustering were utilized only in a handful of studies, and the rest convert the fuzzy numbers into crisp values, i.e., defuzzify the data before clustering. This study aims to provide a comparative study of the behaviour of Type-3 and Type-4 clustering algorithms, which are extended from Type-1 and Type-2 algorithms to avoid the necessity for defuzzification. This paper attempts to construct an FMEA approach that learns from all these studies and presents a more generalized and convenient use of clustering-based methods. The analysis includes the definition of a system from which different failure modes had been identified. The relative weights of each of the three criteria, i.e., S, O, and D, based on experts' opinions, were determined through the Fuzzy-Full Consistency Method (F-FUCOM) technique. For each of the failure modes, ratings in the form of linguistic values, on a 10-point scale, were ascertained for each of the three criteria, i.e., S, O, and D, which were then converted into triangular fuzzy numbers. This was followed by the categorization of the failure modes into four clusters formed through the application of the weighted Euclidean-distance approach in three clustering algorithms, namely, K-means, FCM, and Agglomerative clustering. Finally, the results were compared and discussed. A flowchart, as shown in Fig. 1, demonstrates the steps systematically adopted in this paper.

2 Literature Review

As mentioned earlier in the paper, FMEA had been first introduced by the American army and later utilized by NASA for its high-risk space programmes. Subsequently, it was utilized in several industries, and several alternative approaches extending upon FMEA were developed to address and remedy the drawbacks of the conventional FMEA. Many studies advocated for the use of relative weights amongst different risk criteria. Wang et al. [4] developed a risk-prioritizing FMEA approach using Interval-Valued Intuitionistic Fuzzy Sets (IVIFSs) for cases where the risk criteria values cannot be precisely determined due to incomplete and vague data. Yousefi et al. [5] extended the FMEA analysis by including treatment cost and treatment duration as added risk criteria, for the evaluation of health, safety, and environmental (HSE)


Fig. 1 The framework of the proposed clustering-based FMEA: define a system; break the system down to component level; identify all the failure modes; form a team of experts; rate the relative significance of the risk criteria S, O, and D in linguistic terms, convert the linguistic values into Triangular Fuzzy Numbers, and calculate fuzzy weights using F-FUCOM; quantify each failure mode in linguistic terms for each of the risk criteria S, O, and D and convert the linguistic values into Triangular Fuzzy Numbers; cluster all failure modes based on weighted Euclidean distance (K-means, Agglomerative and Fuzzy C-means clustering); obtain results and compare the clustering algorithms; draw insights and conclusions from the clustering results

risks, thus increasing the ability of the model to distinguish among failure modes with almost similar S, O, D values, while insisting upon those which incur higher treatment costs to organizations. Anes et al. [6] proposed a novel risk prioritization approach by introducing the concepts of Risk Isosurface function and Risk Priority Index function. The approach developed by Wu and Wu [7] considered both fuzzy and random uncertainty by combining fuzzy set theory, Bayesian statistical inference, and beta-binomial distribution through a fuzzy-beta binomial RPN evaluation method. Many studies delve deep into analysis for the prioritization and selection of failure modes through multi-criteria decision-making (MCDM) tools. Garg et al. [8] improved upon the traditional FMEA by using novel concepts, namely, double


upper approximated rough number (DUARN) and granulized Z-number (gZN), followed by a novel risk prioritization technique, Z-VIKOR. Das et al. [9] proposed an extended FMEA by integrating Dual Hesitant Z-numbers (DHZN), the Correlated Distance between two DHZN (CD-DHZN), and VIKOR techniques. Dhalmahapatra et al. [10] proposed a DUARN-based integrated Rough-FUCOM-Rough-TOPSIS FMEA model, which involved the determination of the relative significance of the risk criteria in terms of their weights, followed by the ranking of the failure modes for prioritization purposes. Several clustering-based approaches were also proposed in recent years, which are particularly helpful in complex systems with a higher number of failure modes. Arunajadai et al. [11] demonstrated the functionality and effectiveness of clustering-based approaches in failure mode identification. Tay et al. [12] built an incremental-learning clustering model through the development of Euclidean-distance-based similarity measures. Shahri et al. [13] developed a clustering algorithm capable of taking Pythagorean fuzzy numbers as input and used the Pythagorean fuzzy-VIKOR technique for cluster prioritization. Duan et al. [14] evaluated and developed clusters of failure modes through risk assessment aided by the application of double hierarchy hesitant fuzzy linguistic term sets (DHHFLTSs) and K-means clustering. Valipour et al. [15] developed an integrated approach using FCM clustering based on weighted Euclidean distance, with weights calculated using the Fuzzy Best–Worst Method (FBWM), which uses pairwise comparisons amongst the risk criteria against the prechosen best and worst factors, for prioritizing Health, Safety and Environment risks. They made use of Triangular Fuzzy Numbers and defuzzified them before performing the FCM algorithm. It is evident from the detailed study of available literature on FMEA analysis that multiple attempts have been made, demonstrating the application of various techniques for the prioritization of the failure modes. A survey of recent studies also shows attempts at clustering of failure modes, quantified in terms of fuzzy and other soft numbers. Building on this concept, this paper does a comparative study of the clustering of failure modes, which were quantified with triangular fuzzy numbers.

3 Preliminaries

3.1 Fuzzy Set Theory

The notion of a fuzzy set was first introduced by Zadeh [16] to generalize the classic notion of a set. Fuzzy set theory provides a mathematical framework to study vague conceptual phenomena and situations where fuzzy relations or criteria exist. In this section, the necessary concepts of fuzzy sets are covered.

Definition 1 If X is a collection of objects denoted by x, then a fuzzy set A in X is a set of ordered pairs


$$A = \{(x, \mu_A(x)) \mid x \in X\} \tag{2}$$

where $\mu_A(x) : X \rightarrow [0, 1]$ is called the membership function of x.

Definition 2 A fuzzy set A, defined on the real line R, is called a discrete fuzzy number if A satisfies the following conditions: (i) $\mu_A(x)$ is piecewise continuous, (ii) $\mu_A(x)$ is convex and normalized, (iii) there exists at least one $x_0 \in R$ such that $\mu_A(x_0) = 1$.

Definition 3 A Triangular Fuzzy Number (TFN) is a fuzzy number A = (a, b, c) if its membership function is given by

$$\mu_A(x) = \begin{cases} 0, & x < a \\ \dfrac{x - a}{b - a}, & a \le x < b \\ \dfrac{c - x}{c - b}, & b \le x \le c \\ 0, & x > c \end{cases}$$
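As a small illustration of how a TFN can be represented and evaluated in code, the sketch below implements the membership function of Definition 3 together with a defuzzification step. The (a + 4b + c)/6 graded mean integration used here is only the common form of the representation referred to later in this section, which is not reproduced in this excerpt, so treat it as an assumption.

```python
from dataclasses import dataclass

@dataclass
class TFN:
    """Triangular fuzzy number A = (a, b, c) with a <= b <= c."""
    a: float
    b: float
    c: float

    def membership(self, x: float) -> float:
        """Piecewise-linear membership function of Definition 3."""
        if x < self.a or x > self.c:
            return 0.0
        if x < self.b:
            return (x - self.a) / (self.b - self.a)
        return 1.0 if self.c == self.b else (self.c - x) / (self.c - self.b)

    def graded_mean(self) -> float:
        """Graded mean integration (a + 4b + c) / 6, a common defuzzification of a TFN."""
        return (self.a + 4 * self.b + self.c) / 6

slightly_important = TFN(2, 3, 4)          # 'Slightly important' from Table 2
print(slightly_important.membership(3.5))  # 0.5
print(slightly_important.graded_mean())    # 3.0
```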

3.2 Fuzzy Full Consistency Method (F-FUCOM)

In the first step of F-FUCOM, the evaluation criteria are determined. In the second step, the criteria are ranked in decreasing order of their expected significance:

$$C_{j(1)} > C_{j(2)} > C_{j(3)} > C_{j(4)} > \cdots > C_{j(k)} > \cdots > C_{j(n)} \tag{10}$$

where the ranking of the criterion is represented by k. If two criteria have equal significance, the = sign is placed instead of the > sign. 3. The criteria are compared to the first-ranked criterion using fuzzy linguistic terms from a defined scale. In this study, the nine-point fuzzy TFN scale was adopted, as shown in Table 2. 4. This leads to n − 1 comparisons between the criteria. The fuzzy criterion significance thus obtained is represented as $\omega_{C_{j(k)}}$ and is used to calculate the fuzzy comparative criterion significance as given in the following equation:

$$\varphi_{k/(k+1)} = \frac{\omega_{C_{j(k+1)}}}{\omega_{C_{j(k)}}} = \frac{\left(\omega_{C_{j(k+1)}}^{l},\ \omega_{C_{j(k+1)}}^{m},\ \omega_{C_{j(k+1)}}^{u}\right)}{\left(\omega_{C_{j(k)}}^{l},\ \omega_{C_{j(k)}}^{m},\ \omega_{C_{j(k)}}^{u}\right)} \tag{11}$$


5. Here $\varphi_{k/(k+1)}$ represents the fuzzy comparative significance that criterion $C_{j(k)}$ has when compared to the criterion $C_{j(k+1)}$. A fuzzy vector $\Phi$ of comparative significance is obtained as given below:

$$\Phi = \left(\varphi_{1/2},\ \varphi_{2/3},\ \varphi_{3/4}, \ldots, \varphi_{k/(k+1)}\right) \tag{12}$$

6. Now the optimal weights $w_1, w_2, w_3, \ldots, w_n$ are required to be calculated. If the model is fully consistent, then the equations below will be satisfied:

$$\frac{w_{k+1}}{w_k} = \varphi_{k/(k+1)} \tag{13}$$

$$\frac{w_{k+2}}{w_k} = \varphi_{k/(k+2)} = \varphi_{k/(k+1)} \otimes \varphi_{(k+1)/(k+2)} \tag{14}$$

In order to find the optimal weights, the absolute differences $\left|\frac{w_{k+1}}{w_k} - \varphi_{k/(k+1)}\right|$ and $\left|\frac{w_{k+2}}{w_k} - \varphi_{k/(k+1)} \otimes \varphi_{(k+1)/(k+2)}\right|$ are minimized. The problem can thus be expressed as:

$$\begin{aligned}
&\text{Minimize } \chi \\
&\text{subject to} \\
&\left|\frac{w_{k+1}}{w_k} - \varphi_{k/(k+1)}\right| < \chi, \quad \forall j, k \\
&\left|\frac{w_{k+2}}{w_k} - \varphi_{k/(k+1)} \otimes \varphi_{(k+1)/(k+2)}\right| < \chi, \quad \forall j, k \\
&\sum_{j=1}^{n} w_j = 1, \quad \forall j, \\
&w_j^l \le w_j^m \le w_j^u, \quad \forall j, \\
&w_j^l \ge 0, \quad \forall j, \qquad j = 1, 2, 3, \ldots, n
\end{aligned} \tag{15}$$

where $w_j = \left(w_j^l, w_j^m, w_j^u\right)$ and $\varphi_{k/(k+1)} = \left(\varphi_{k/(k+1)}^{l}, \varphi_{k/(k+1)}^{m}, \varphi_{k/(k+1)}^{u}\right)$. The computation of fuzzy weights can be followed by the estimation of the exact weights of the risk criteria by utilizing the expression for the graded mean integration representation exhibited in Eq. (8). However, in this paper, fuzzy weights were considered for further computation and analysis.

Table 2 TFN scale for the importance of each criterion [19]

| Linguistic expression | Fuzzy scale | Inverse fuzzy scale |
|-----------------------|-------------|---------------------|
| Equally important | (1, 1, 1) | (1, 1, 1) |
| Between slightly and equally important | (1, 2, 3) | (1/3, 1/2, 1) |
| Slightly important | (2, 3, 4) | (1/4, 1/3, 1/2) |
| Between slightly and more important | (3, 4, 5) | (1/5, 1/4, 1/3) |
| More important | (4, 5, 6) | (1/6, 1/5, 1/4) |
| Between more and very important | (5, 6, 7) | (1/7, 1/6, 1/5) |
| Very important | (6, 7, 8) | (1/8, 1/7, 1/6) |
| Between very and extremely important | (7, 8, 9) | (1/9, 1/8, 1/7) |
| Extremely important | (8, 9, 10) | (1/10, 1/9, 1/8) |

3.3 Quantification of Failure Modes

After the optimal weights are obtained, the failure modes are required to be quantified. In this study, three risk criteria, S, O, and D, are considered for quantification and subsequent analysis. Each of the three parameters is usually measured on a crisp 10-point scale, where a rating of 1 denotes the least severe, the least frequently occurring, and the most detectable failure modes, while a rating of 10 indicates the most severe, the most frequently occurring, and the least detectable failure modes. However, as mentioned earlier, rating or measuring the risk criteria on a scale with precise crisp numbered values is a difficult task for experts. On the contrary, the experts find it more convenient and comfortable to express their opinions in terms of linguistic values. As a result, the rating can be provided on a 10-point linguistic scale, which can be quantified by the application of fuzzy TFNs. Table 3 shows the linguistic expressions and their corresponding TFN-scale values. It is also to be noted that the ratings for detectability are given in reversed order, i.e., the values under criterion D represent the non-detectability of the failure mode.

3.4 Clustering After the conversion of the linguistic data into TFNs using the scale in Table 3, distance-based clustering methods have been implemented to categorize the failure modes in terms of the risks of occurrence of each failure mode. The two most widely utilized distance-based clustering methods are partitional and hierarchical clustering methods. Partitional clustering algorithms aim to determine the clusters to which each of the data points belongs, through the optimization of an objective function, while

Table 3 TFN Linguistic scale for the ratings [22]

| Linguistic expression | Fuzzy scale |
|-----------------------|-------------|
| Extremely high | (9, 9, 10) |
| Very high | (8, 9, 10) |
| High | (7, 8, 9) |
| Medium–high | (6, 7, 8) |
| Medium | (5, 6, 7) |
| Medium–low | (4, 5, 6) |
| Low | (3, 4, 5) |
| Very low | (2, 3, 4) |
| Extremely low | (1, 2, 3) |
| None | (1, 1, 2) |

enhancing the ability of the partitions to form clusters more effectively through the iteration of the clustering process. On the contrary, the approach of a hierarchical clustering algorithm involves the construction and growth of the dendrogram, a binary tree-based data structure, followed by the selection of the required number of clusters through the splitting of the dendrogram at different levels, attaining different clustering membership solutions for the same set of data points without any iteration. This study is aimed at investigating the comparative performance of the K-means (partitional), Agglomerative (hierarchical), and FCM (an extension of K-means with fuzzy membership) clustering algorithms. The algorithms are briefly explained in the following sub-sections.

K-means Clustering

K-means clustering begins with K points being selected as the initial centroids, each of them representing a cluster, followed by the determination of the membership of every data point to the cluster whose centroid is closest to it based on a definite proximity measure [3]. The formation of clusters is succeeded by the re-computation and updating of the centroid point for each cluster. The algorithm continues to reiterate the process of cluster formation and centroid re-computation to the point where the centroids stop updating or any other convergence criterion is satisfied. A common rule of thumb is to stop iterating once fewer than 1% of the data points change their cluster memberships [3]. The objective function of K-means clustering is the sum of squared error (SSE). SSE, which is required to be minimized, is the sum of squared distances of all the data points from their respective cluster centroids. For a dataset $D = \{x_1, x_2, x_3, \ldots, x_N\}$ consisting of N points, with the clusters formed after clustering being $C = \{C_1, C_2, C_3, \ldots, C_k, \ldots, C_K\}$, SSE is calculated as follows:

$$\mathrm{SSE} = \sum_{k=1}^{K} \sum_{x_j \in C_k} \left\| x_j - c_k \right\|^2 \tag{16}$$


where $c_k$ is the centroid of the cluster $C_k$. By minimizing SSE, the following is obtained:

$$c_k = \frac{\sum_{x_j \in C_k} x_j}{n_k} \tag{17}$$

Thus, by minimizing the SSE, the centroids of each cluster are updated using Eq. (17) and are equal to the mean of all the points in the given cluster. Then, all the points are assigned to the clusters with the updated centroids.

Agglomerative Clustering

The Agglomerative clustering algorithm begins by considering each of the points as its own cluster. At this point, the visual depiction of all the data points exhibits them at the bottom of the dendrogram. The distances measured between the centroids of all the clusters are stored in a dissimilarity matrix, and then the clusters which are closest to each other are combined at every level of the dendrogram, succeeded by the re-computation and updating of the dissimilarity matrix correspondingly. The agglomerative combining of clusters can be continued until the attainment of the specified number of clusters or the formation of one final cluster, which will consequently contain all the data points. This final cluster would signify the apex of the dendrogram, as well as the end of the clustering process.

Fuzzy C-means Clustering

This algorithm is an extension of the classical K-means clustering. In FCM, each point is not hard-assigned to one cluster; rather, every point has a fuzzy membership to each of the clusters, the sum of which equals one. Thus, FCM is particularly useful for studying complex data sets where overlapping clusters exist. The membership weight $\omega_{x_i k}$ of the point $x_i$ belonging to cluster $C_k$ with fuzziness parameter $\beta$ is calculated by

$$\omega_{x_i k} = \frac{1}{\sum_{j=1}^{K} \left( \dfrac{\|x_i - c_k\|}{\|x_i - c_j\|} \right)^{\frac{2}{\beta - 1}}} \tag{18}$$

These membership weights thus calculated are used to update the centroids in FCM. The centroid $c_k$ of cluster k is calculated by

$$c_k = \frac{\sum_{x_i \in C_k} \left(\omega_{x_i k}\right)^{\beta} x_i}{\sum_{x_i \in C_k} \left(\omega_{x_i k}\right)^{\beta}} \tag{19}$$

Like K-means, the FCM algorithm minimizes the objective function, i.e., SSE, repeatedly while re-calculating and updating $\omega_{x_i k}$ and $c_k$. The algorithm is carried on as long as the centroids have not converged. The FCM algorithm is outlier-sensitive, and the final computed values of the membership weights and the centroids will be indicative of a local minimum of the objective function.
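A compact NumPy sketch of the two FCM update rules, Eqs. (18) and (19), is given below. The tiny two-dimensional data set is purely illustrative; the study applies the same updates to TFN-valued failure modes through the weighted distance defined in the next sub-section.

```python
import numpy as np

def fcm_memberships(X, centroids, beta=2.0, eps=1e-9):
    """Fuzzy membership weights of Eq. (18): each point belongs to every cluster."""
    # dist[i, k] = distance of point i to centroid k
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + eps
    ratio = dist[:, :, None] / dist[:, None, :]           # (n, K, K): ||x_i - c_k|| / ||x_i - c_j||
    return 1.0 / np.sum(ratio ** (2.0 / (beta - 1.0)), axis=2)

def fcm_centroids(X, U, beta=2.0):
    """Centroid update of Eq. (19): membership-weighted mean with weights U**beta."""
    W = U ** beta                                          # (n, K)
    return (W.T @ X) / W.sum(axis=0)[:, None]

# Tiny illustration with crisp 2-D points and two clusters
X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.8, 8.3]])
centroids = np.array([[1.0, 1.0], [8.0, 8.0]])
U = fcm_memberships(X, centroids)
print(U.round(3))                  # memberships of each point in each cluster (rows sum to 1)
print(fcm_centroids(X, U).round(3))
```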


All the above algorithms are designed for crisp or hard numbers. Since this analysis deals with the clustering of failure modes which are quantified by TFNs, these techniques are required to be extended for use in the fuzzy environment. Extending these algorithms to fuzzy numbers requires changing the distance and centroid-updating functions accordingly. This study uses the concept of the weighted Euclidean distance, since it captures the proximity, separation and spread between two Triangular Fuzzy Numbers reasonably well and is also computationally efficient. Let $A = \left(a^l, a^m, a^u\right)$ and $B = \left(b^l, b^m, b^u\right)$ represent two triangular fuzzy numbers. The distance between A and B is denoted by $\mathrm{Dist}_{AB}$:

$$\mathrm{Dist}_{AB} = \sqrt{\frac{1}{3}\left[\left(a^l - b^l\right)^2 + \left(a^m - b^m\right)^2 + \left(a^u - b^u\right)^2\right]} \tag{20}$$

Suppose the data set $\tilde{D}$ has f failure modes and r risk criteria, and x and y describe two failure modes with their TFN representations as given below:

$$x = \left(\left(x_1^l, x_1^m, x_1^u\right), \left(x_2^l, x_2^m, x_2^u\right), \ldots, \left(x_j^l, x_j^m, x_j^u\right), \ldots, \left(x_r^l, x_r^m, x_r^u\right)\right) \tag{21}$$

$$y = \left(\left(y_1^l, y_1^m, y_1^u\right), \left(y_2^l, y_2^m, y_2^u\right), \ldots, \left(y_j^l, y_j^m, y_j^u\right), \ldots, \left(y_r^l, y_r^m, y_r^u\right)\right) \tag{22}$$

where $\left(x_j^l, x_j^m, x_j^u\right)$ and $\left(y_j^l, y_j^m, y_j^u\right)$ are the respective TFN values of the jth risk criterion. Now the squared Euclidean distance $\mathrm{Dist}_{xy}^2$ between x and y can be expressed as [23]:

$$\mathrm{Dist}_{xy}^2 = \sum_{j=1}^{r}\left[\left(x_j^l - y_j^l\right)^2 + \left(x_j^m - y_j^m\right)^2 + \left(x_j^u - y_j^u\right)^2\right] \tag{23}$$

If the fuzzy vector of the relative significance of the r criteria obtained from the F-FUCOM method is $w = \left(w_1, w_2, w_3, \ldots, w_j, \ldots, w_r\right)$, where $w_j = \left(w_j^l, w_j^m, w_j^u\right)$ is the weight of the jth risk criterion in TFN form, then the weighted Euclidean distance $\mathrm{WtdDist}_{xy}$ between x and y becomes

$$\mathrm{WtdDist}_{xy} = \sqrt{\sum_{j=1}^{r}\left[w_j^l\left(x_j^l - y_j^l\right)^2 + w_j^m\left(x_j^m - y_j^m\right)^2 + w_j^u\left(x_j^u - y_j^u\right)^2\right]} \tag{24}$$
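A minimal sketch of Eq. (24), and of the K-means assignment step built on it, follows. The two failure-mode rows are made-up illustrations; only the fuzzy weight rows reuse the S, O and D weights reported later in Table 4.

```python
import numpy as np

# Failure modes as arrays of shape (n_modes, n_criteria, 3): one TFN (l, m, u) per
# criterion (S, O, D). The two rows below are illustrative values only.
X = np.array([
    [[7, 8, 9], [2, 3, 4], [3, 4, 5]],   # failure mode 1
    [[1, 2, 3], [5, 6, 7], [2, 3, 4]],   # failure mode 2
], dtype=float)

# Fuzzy criterion weights (S, O, D) from F-FUCOM, as reported in Table 4
weights = np.array([[0.43, 0.68, 0.85],
                    [0.21, 0.21, 0.21],
                    [0.07, 0.12, 0.21]])

def weighted_tfn_distance(x, y, w):
    """Weighted Euclidean distance of Eq. (24) between two TFN-valued failure modes."""
    return np.sqrt(np.sum(w * (x - y) ** 2))

def assign_to_clusters(X, centroids, w):
    """K-means assignment step: each failure mode joins its nearest centroid."""
    dists = np.array([[weighted_tfn_distance(x, c, w) for c in centroids] for x in X])
    return dists.argmin(axis=1)

centroids = X.copy()                                 # trivial initialisation for the sketch
print(assign_to_clusters(X, centroids, weights))     # [0 1]
```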


3.5 Measures for Comparison Between Clustering Algorithms

The analysis concludes with the comparison between the performances of the clustering algorithms, which can be evaluated with measures of dispersion such as the Silhouette Score and the Calinski-Harabasz Index.

Silhouette Score

The Silhouette Score measures how close a given point is to the points in the neighbouring clusters [24]. It ranges from − 1 to 1, where 1 indicates clearly distinguished clusters, while 0 denotes that the distance between the clusters is insignificant. A value of − 1 indicates incorrect assignment of clusters. For a sample, with δ denoting its mean distance to the other points in its own cluster and λ its mean distance to the points in the nearest neighbouring cluster,

$$\text{Silhouette coefficient of a sample} = \frac{\lambda - \delta}{\max(\delta, \lambda)} \tag{25}$$

Calinski-Harabasz Index

The Calinski-Harabasz Index, or variance ratio criterion, is the ratio between the inter-cluster and intra-cluster dispersions [25]. A higher index value represents dense and well-separated clusters. For a dataset $D = [d_1, d_2, d_3, \ldots, d_N]$ clustered into K clusters, it is defined as

$$CH = \frac{\left[\sum_{k=1}^{K} n_k \left\|c_k - c\right\|^2\right] / (K - 1)}{\left[\sum_{k=1}^{K} \sum_{i=1}^{n_k} \left\|d_i - c_k\right\|^2\right] / (N - K)} \tag{26}$$

where $n_k$ is the number of points and $c_k$ the centroid of the kth cluster, c denotes the global centroid, and N indicates the total number of data points.
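Both measures are available in scikit-learn, so a comparison along the lines of Table 6 can be sketched as below. The synthetic failure-mode matrix stands in for the study's 72 real failure modes, and scaling each TFN component by the square root of its fuzzy weight makes the ordinary Euclidean distance used by the estimators match the weighted distance of Eq. (24).

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import calinski_harabasz_score, silhouette_score

rng = np.random.default_rng(0)
# Synthetic stand-in for the 72 failure modes: one TFN (l, m, u) per criterion (S, O, D)
modes = rng.integers(1, 9, size=(72, 3, 1)) + np.array([0, 1, 2])   # crude (b, b+1, b+2) triples
weights = np.array([[0.43, 0.68, 0.85], [0.21, 0.21, 0.21], [0.07, 0.12, 0.21]])

# Flatten to 9-dimensional crisp vectors scaled so that squared Euclidean distance
# equals the weighted distance of Eq. (24)
features = (modes * np.sqrt(weights)).reshape(len(modes), -1)

for name, model in [("K-means", KMeans(n_clusters=4, n_init=10, random_state=0)),
                    ("Agglomerative", AgglomerativeClustering(n_clusters=4))]:
    labels = model.fit_predict(features)
    print(f"{name}: silhouette = {silhouette_score(features, labels):.3f}, "
          f"CH = {calinski_harabasz_score(features, labels):.3f}")
```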

4 Case Study 4.1 Description of the System, Identification of Failure Modes, and Collection of Data The framework developed in this paper for the identification and prioritization of failure modes had been applied to a well-defined system identified in an integrated steel plant. The system considered for the study is the compact slab production system, also known as thin slab caster and rolling. The whole slab production system could be divided into five sub-systems, namely tunnel furnace, pendulum shear, descaler, finishing mill, and coiler. Every subsystem was broken down into its component levels, followed by the identification of the failure modes for all the components through group discussion and brainstorming with experts, which was supported by historical data of failures also. A total of 72 failure modes were identified through this extensive process.


Table 4 Fuzzy weights of the risk criteria

| Factors | S | O | D |
|---------|---|---|---|
| Linguistic expression with respect to best criterion (S) | Equally important | Slightly important | More important |
| TFN representation | (1, 1, 1) | (2, 3, 4) | (4, 5, 6) |
| Fuzzy weights | (0.43, 0.68, 0.85) | (0.21, 0.21, 0.21) | (0.07, 0.12, 0.21) |

4.2 Calculation of Weights of the Risk Criteria As mentioned earlier, to compute the risk criteria weights under uncertainty, the FFUCOM technique was employed in this study. While implementing the technique, the team rated one of the risk criteria as the best criterion, followed by pairwise comparisons of the other criteria with the best criterion, utilizing the linguistic scale, given in Table 2. In this study, the severity of consequences of occurrence (S) was considered the most important criterion while it was considered slightly important to probability or frequency of occurrence (O) and more important than non-detectability (D). The F-FUCOM minimization problem’s solution, as demonstrated in Sect. 3.2, was attained with the aid of LINGO 17.0 software, which provided the fuzzy criteria weights. Table 4 shows the relative significance and the risk criteria weights.

4.2 Calculation of Weights of the Risk Criteria

As mentioned earlier, to compute the risk criteria weights under uncertainty, the F-FUCOM technique was employed in this study. While implementing the technique, the team rated one of the risk criteria as the best criterion, followed by pairwise comparisons of the other criteria with the best criterion, utilizing the linguistic scale given in Table 2. In this study, the severity of the consequences of occurrence (S) was considered the most important criterion, being rated slightly more important than the probability or frequency of occurrence (O) and more important than the non-detectability (D). The F-FUCOM minimization problem's solution, as demonstrated in Sect. 3.2, was attained with the aid of the LINGO 17.0 software, which provided the fuzzy criteria weights. Table 4 shows the relative significance and the risk criteria weights.

620

N. D. Sai et al.

The concept of ALARP indicates the different predetermined levels of risk of mishap and their corresponding acceptability. The levels of risk according to the ALARP principle, in ascending order of risk, are acceptable region, tolerable-and-slightlycost-ineffective region, tolerable-and-grossly-cost-ineffective region, and unacceptable or intolerable region. Analogous to the risk levels of the ALARP principle, it was predetermined to have four clusters. The clusters were named Minimal, Moderate, Major, and Extreme, based on the increasing RPN values of the centroids. As previously mentioned, the clustering of failure modes was based on weighted Euclidean distance. The membership of the clusters along with their centroids and RPN values are shown in Table 5. The clusters formed from K-means and FCM were identical but were welldistinguishable compared to Agglomerative clustering. This can be further established using the Silhouette score and the Calinski-Harabasz Index as shown in Table 6. Agglomerative clustering had lower scores and both K-means and FCM had the same higher score which implied that K-means and FCM gave more dispersed clusters. On further examination of the centroids of the clusters, the fuzzy severity coordinate of those of K-means and FCM were consistently increasing with increasing values of risk, whereas in Agglomerative clustering, the Moderate-risk cluster contained several failure modes of very high severity, which were supposed to be in Extremerisk cluster. The centroid of the Moderate-risk cluster had a very high fuzzy severity value of (7.11, 8.11, 9.11). These considerations aided in inferring that Agglomerative clustering was an inferior algorithm for clustering failure modes. The Extreme and the Major-risk clusters had a large number of failure modes as members. According to the clustering results obtained from all three algorithms, there were 24 failure modes in the Extreme-risk cluster, which was 33.33% of all the failure modes. The largest number of failure modes were present in the Major-risk cluster (27 or 37.5% of all the failure modes, according to results of K-means and FCM clustering, and 31 or 47.06% of all the failure modes according to the result of Agglomerative clustering). Thus, a significant majority of all the failure modes were found to be of higher risk.

5 Discussion The principle of ALARP necessitates the maintenance of the risk level as low as possible. The ALARP region signifies the level where the risk reduction techniques are ineffective in terms of cost and time. However, the ALARP level of risk can be tolerated, above which it is unacceptable and requires actions to reduce the risk. The region below the ALARP level is broadly acceptable because of the low-risk level. Similar decisions can be taken after the clustering of failure modes into four significant groups. As explained in Sect. 4.3, the clusters formed based on the increasing RPN values of the centroids, were named Minimal, Moderate, Major, and Extreme. The Extreme-risk cluster can be considered analogous to the unacceptable or intolerable region. These failure modes in this cluster would require the attention of

A Comparative Study of Distance-Based Clustering Algorithms …

621

Table 5 Computation of the clusters of failure modes Clustering algorithms

Clusters

K-means

Cluster 1 (MINIMAL)

Agglomerative

Fuzzy C-means

No. of failure modes 5

Centroid

RPN of centroid

[(1, 1.8, 2.8), (4.6, 5.6, 6.6), (2.43, 3.4, 4.4)]

34.91

Cluster 2 (MODERATE)

16

[(2.81, 3.81, 4.81), (2.81, 3.81, 4.81), (3.12, 4.06, 5.06)]

59.20

Cluster 3 (MAJOR)

27

[(4.37, 5.37, 6.37), (2.30, 3.22, 4.22), (2.89, 3.81, 4.81)]

66.48

Cluster 4 (EXTREME)

24

[(6.42, 7.42, 8.42), (1.67, 2.29, 3.29), (3.21, 4.17, 5.17)]

72.87

Cluster 1 (MINIMAL)

8

[(1.35, 2.25, 3.25), (3.75, 4.75, 5.75), (3.25, 4.25, 5.25)]

45.42

Cluster 2 (MODERATE)

9

[(7.11, 8.11, 9.11), (1.33, 1.67, 1.67), (3, 3.89, 4.89)]

52.57

Cluster 3 (MAJOR)

31

[(3.61, 4.61, 5.61), (2.58, 3.55, 4.55), (2.97, 3.90, 4.90)]

63.89

Cluster 4 (EXTREME)

24

[(5.63, 6.63, 7.63), (2, 2.83, 3.83), (3, 3.96, 4.96)]

74.30

Cluster 1 (MINIMAL)

5

[(1.14, 1.99, 2.99), (4.48, 5.48, 6.48), (2.56, 3.56, 4.56)]

39.39

Cluster 2 (MODERATE)

16

[(3.04, 4.04, 5.04) (2.83, 3.83, 4.83) (2.82, 3.77, 4.77)]

58.46

Cluster 3 (MAJOR)

27

[(4.29, 5.29, 6.29) (2.1, 3.03, 4.03) (2.82, 3.76, 4.76)]

60.69

Cluster 4 (EXTREME)

24

[(6.35, 7.35, 8.35) (1.6, 2.24, 3.24) (3.23, 4.19, 5.19)]

70.78

Table 6 Comparison between clustering algorithms

Method

Silhouette score

Calinski-Harabasz index

K-means

0.456

124.800

Agglomerative

0.432

114.518

FCM

0.456

124.800

622

N. D. Sai et al.

the decision-makers, who can direct recommended action for decreasing the risk of occurrence of the failure modes. The decision-makers are also equipped with the information of two more clusters- Moderate and Major, which can be considered analogous to the tolerable-and-slightly-cost-ineffective and tolerable-and-grosslycost-ineffective regions, which constitute the ALARP region. The cost-effectiveness of risk reduction for the failure modes in these two clusters can be estimated and separate steps can be taken accordingly. The Minimal-risk clusters are broadly tolerable and acceptable. Thus, the failure modes in clusters do not require any special attention, although regular monitoring should be required to keep these failure modes in check. As investigated in the previous section, a significant majority of all the failure modes were found to be of higher risk, thus indicating that the system was more prone to accidents. As a result, it had become necessary to take necessary steps to decrease the risk contained in the system, through recommended actions. The centroids and their RPN values obtained by the FCM slightly differed from those obtained by K-means. This can be attributed to the fuzzy membership of all the failure modes in all the clusters. For example, the Extreme-risk cluster also had all the failure modes belonging to lower-risk clusters with some degrees of membership, albeit very small, the centroid slightly shifts closer to the lower-risk clusters and the RPN value hence decreases. Similarly, since the Low-risk cluster also contains every failure mode belonging to higher-risk clusters with some membership, the centroid slightly shifts closer to the higher-risk clusters, thereby increasing the RPN value. Therefore, the clusters were slightly less dispersed, in the case of the FCM method, in presence of larger datasets and require to be further investigated.

6 Conclusion This study involves the prioritization of failure modes through distance-based clustering and the comparison between the clustering algorithms. First, the steps of FMEA were followed in order to define and identify a system, break it down to the component level, and identify and investigate the failure modes. It was followed by the calculation of the weights using F-FUCOM and the clustering of the failure modes by K-means, Agglomerative, and FCM algorithms. A comparative study of the three clustering algorithms using the Silhouette Score and the Calinski-Harabasz Index indicated the inferiority of Agglomerative clustering in comparison to K-means and FCM, while the relative dispersion of the centroids underlined the overall superiority of K-means clustering. The provided approach overcomes several limitations in traditional FMEA. The clustering of failure modes into four groups of varying risk reduces the reliance on the RPN score which lacks any mathematical foundations. Allowing the panel of experts to express in linguistic terms enables a better measurement of the risk criteria, as the panel would be able to provide their ratings through linguistic terms and not exact numbers on a 10-point scale. The conversion of the linguistic terms into TFN captures the uncertainty involved in the analysis. Integrating F-FUCOM


for the estimation of the relative significance of the risk criteria in terms of weights, especially fuzzy weights, improves prior clustering-based prioritization of failure modes. This approach enables a sound decision-making process in industry. It gives an elaborate idea of the system conditions and enables effective planning of necessary actions to check the failure modes and prevent their occurrence. While a simple calculation of RPN may help in prioritization, a clustering algorithm provides a bigger picture of the potential of all the failure modes. Hence, combining clustering with prioritization among the high-risk failure modes through RPN may be a direction in which future research work can advance [15]. The model can also be extended by employing the opinions of multiple experts and aggregating them prior to risk assessment [26], which will be helpful in reducing bias in decision-making. Various ranking multi-criteria decision-making techniques may also be used to rank the higher-risk failure modes precisely, enabling more cost-efficient risk reduction actions.

Acknowledgements We acknowledge the Centre of Excellence in Safety Engineering and Analytics (CoE-SEA) (www.iitkgp.ac.in/department/SE), IIT Kharagpur and the Safety Analytics & Virtual Reality (SAVR) Laboratory (www.savr.iitkgp.ac.in) of the Department of Industrial & Systems Engineering, IIT Kharagpur for experimental/computational and research facilities for this work. We would like to thank the management of the plant for providing relevant data and their support and cooperation during the study.

References 1. Ericson CA (2015) Hazard analysis techniques for system safety. Wiley 2. Military Standard (1977) MIL-STD 1629A: Procedures for performing a failure mode, effects and criticality analysis. Department of Defense, Washington DC 3. Aggarwal CC, Reddy CK (2014) Data clustering: algorithms and applications. Chapman & Hall, London 4. Wang L-E, Liu H-C, Quan M-Y (2016) Evaluating the risk of failure modes with a hybrid MCDM model under interval-valued intuitionistic fuzzy environments. Comput Ind Eng 102:175–185 5. Yousefi S, Arash A, Jamileh H, Majid B (2018) HSE risk prioritization using robust DEAFMEA approach with undesirable outputs: a study of automotive parts industry in Iran. Saf Sci 102:144–158 6. Anes V, Henriques E, Freitas M, Reis L (2018) A new risk prioritization model for failure mode and effects analysis. Qual Reliability Eng Int 34(4):516–528 7. Wu X, Wu J (2021) The risk priority number evaluation of FMEA analysis based on random uncertainty and fuzzy uncertainty. Complexity 2021:1–15 8. Garg A, Das S, Maiti J, Pal SK (2020) Granulized Z-VIKOR model for failure mode and effect analysis. IEEE Trans Fuzzy Syst 30(2):297–309 9. Das S, Garg A, Khorania Y, Maiti J (2022) Dual hesitant Z-Number (DHZN), correlated distance, and risk quantification. Int J Intell Syst 37(1):625–660 10. Dhalmahapatra K, Garg A, Singh K, Xavier NF, Maiti J (2022) An integrated RFUCOM– RTOPSIS approach for failure modes and effects analysis: a case of manufacturing industry. Reliab Eng Syst Saf 221:108333


11. Arunajadai SG, Uder SJ, Stone RB, Tumer IY (2004) Failure mode identification through clustering analysis. Qual Reliab Eng Int 20(5):511–526 12. Tay KM, Jong CH, Lim CP (2015) A clustering-based failure mode and effect analysis model and its application to the edible bird nest industry. Neural Comput Appl 26(3):551–560 13. Shahri MM, Jahromi AE, Houshmand M (2021) Failure mode and effect analysis using an integrated approach of clustering and mcdm under pythagorean fuzzy environment. J Loss Prev Process Ind 72:104591 14. Duan C-Y, Chen X-Q, Shi H, Liu H-C (2022) A new model for failure mode and effects analysis based on K-means clustering within hesitant linguistic environment. IEEE Trans Eng Manage 69(5):1837–1847 15. Valipour M, Yousefi S, Rezaee MJ, Saberi M (2022) A clustering-based approach for prioritizing health, safety and environment risks integrating fuzzy C-means and hybrid decision-making methods. Stoch Environ Res Risk Assess 36(3):919–938 16. Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353 17. Karsak EE (2002) Distance-based fuzzy MCDM approach for evaluating flexible manufacturing system alternatives. Int J Prod Res 40(13):3167–3181 18. Zadeh LA (1975) The concept of a linguistic variable and its application to approximate reasoning—I. Inf Sci 8(3):199–249 19. Huang Z, Zhao W, Shao Z, Gao Y, Zhang Y, Li Z, Li J, Xixi Q (2020) Entropy weight-logarithmic fuzzy multiobjective programming method for evaluating emergency evacuation in crowded places: a case study of a university teaching building. IEEE Access 8:122997–123012 20. Pamuˇcar D, Stevi´c Z, Sremac S (2018) A new model for determining weight coefficients of criteria in MCDM models: full consistency method (FUCOM). Symmetry 10(9):393 21. Pamucar D, Ecer F (2020) Prioritizing the weights of the evaluation criteria under fuzziness: the fuzzy full consistency method–FUCOM-F. Facta Univ Ser Mech Eng 18(3):419–437 22. Suliantoro H, Dewi IN, Handayani NU (2016) Supply chain analysis of disposable medical devices. In: Proceedings of the 2016 international conference of management sciences (ICoMS 2016), Indonesia, pp 193–198 23. de AT de Carvalho, F, Brito P, Bock H-H (2006) Dynamic clustering for interval data based on L 2 distance. Comput Stat 21(2):231–250 24. Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65 25. Solorio-Fernández S, Carrasco-Ochoa JA, Martínez-Trinidad JF (2016) A new hybrid filterwrapper feature selection method for clustering based on ranking. Neurocomputing 214:866– 880 26. Garg A, Maiti J, Kumar A (2022) Granulized Z-OWA aggregation operator and its application in fuzzy risk assessment. Int J Intell Syst 37(2):1479–1508

Data Storage, Management and Innovation

Moderating Role of Project Flexibility on Senior Management Commitment and Project Risks in Achieving Project Success in Financial Services

Pankaj A. Tiwari

Abstract Commitment from senior leadership, successful management of project risks, and flexibility enhance responsiveness in projects, especially during unpredictable events and in large projects. The present research work intends to ascertain how flexibility adopted in projects influences the effects of support from senior leadership and of project risks on successful IT projects. A cross-sectional survey of 166 managers from the financial services industry was conducted to collect empirical data and to test a conceptual framework based on the latest literature in the project management domain. Ordinal regression analysis was used to demonstrate a substantial relationship between support from senior leadership, project risks, and success in projects, along with significantly positive moderation by flexibility in projects. The research outcomes of the study can support managers in realizing project goals and reducing project failures. This research work adds value to current research from the perspective of IT projects and the importance of project flexibility for overall project performance.

Keywords Senior leadership · Risk mitigation · Success · Flexibility · Financial services · Project size · Project risks · Commitments

P. A. Tiwari (B)
School of Management, CMR University, Bengaluru, Karnataka 560043, India
e-mail: [email protected]; [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_46

1 Introduction

With the growing attention toward the management of projects and related disciplines in a dynamic environment, there is ambiguity regarding various project roles and misconceptions about success in projects. The dynamic environment offers several opportunities and difficulties in executing projects successfully and sustaining them in the long term. Overall success generally depends on the capabilities of the managers handling various projects and on the way the project schedule is used within the overall project budget [1]. A project that fulfills the project requirements and


performs as expected is considered effective. Commitment from senior management supports project managers in taking appropriate decisions to achieve expected project targets [2]. Project-based organizations consider success in projects a prime goal and continue to focus on project management. Empirical research, especially in project management, has proliferated with studies on key factors [3] that often consider only the triple constraints of project cost, quality, and time [4]. Senior leaders managing projects are considered responsible for managing project groups' performance and the accomplishment of project-specific goals [5, 6]. As organizations consistently face failures in projects, project outcomes are completely reliant upon the senior leaders [7, 8]. In competitive and dynamic settings, IT projects in the financial services sector serve as enablers for services such as payment processing systems, financial risk assessment, and asset management systems. However, it is difficult for financial services organizations to sustain flexible processes, carry out stakeholder negotiations, be responsive to risk, and emphasize advanced technological projects while increasing the likelihood of overall success. Senior leadership faces several challenges in meeting complex project requirements to achieve overall success [9]. A higher focus on sustenance, improving productivity, and responding promptly to market conditions while controlling cost are some of the complexities senior leaders need to deal with. The influence of senior management on the attainment of organizational success is broadly recognized; however, the role of senior management still needs to be examined as a key factor. Senior management commitment relates to active participation and project selection to achieve strategic goals [10]. Various factors result in project success, and these should be addressed by senior management. Senior management can readily manage budget- and schedule-related issues. The expected benefits from a project and its overall scope are very important parameters to be managed because the benefits closely relate to the project's justification and funding. Senior management facilitates interventions in project team building through constant communication, engagement, rewards, and recognition. More studies on senior leadership are required in the project management literature, as most projects fail due to a lack of senior management commitment and support [11]. Several research works have sought to ascertain its influence on project success by considering behavioral mediators [12]. Also, several research works observed that project risk management should remain the prime consideration while managing projects to improve project performance and achieve organizational efficiency [13–15]. Studies have recommended various risk mitigation strategies to reduce delays in business operations [16, 17]. The effectiveness and the moderating nature of such recommended strategies (especially project flexibility) in mitigating IT project risks have not been empirically examined. This research work intends to analyze the impacts of senior leadership and project risks on successful projects by introducing project flexibility as a precautionary project risk mitigation strategy in dynamic settings [18]. Prior studies


also recommended empirical research work on project flexibility [19]. Hence, the following general research questions are posed:

RQ1. How do the commitment of senior management, flexibility, and related risks in projects influence success in projects?

RQ2. How does flexibility in projects impact the relationship between the commitment of senior management, project risk management, and successful projects?

This study offers crucial insights to project management professionals by answering these research questions. Firstly, by considering organizational theories such as upper echelon theory and contingency theory, this study lays out the different management roles and expertise needed to achieve success in projects. Secondly, it offers insights into the managerial capabilities and project attributes that should be considered in dynamic settings. Finally, by showing how to customize project management approaches to suit different environments and minimize risk, this study adds value to project-based literature and organizations.

2 Review of Literature

2.1 Background

With globalization and digitalization as key enablers, organizations are aiming for technology-based digital transformation [20]. Financial services organizations digitized business processes back in the 1990s to create new financial services and products, resulting in the availability of different offline and web-based channels. However, with increasing competition and the launch of enhanced financial services and products, organizations need to keep improving and innovating to sustain themselves. The financial crisis of the late 2000s made the financial market volatile and more challenging in terms of competition [21]. For decades, financial services organizations have focused on enhancements of their service and product offerings, and only a few have been able to offer innovative solutions to consumers. Established financial services organizations have found it challenging to use technological advancements to realize new business opportunities [17]. Such technology-intensive projects remain contingent on technical competencies to be successful and offer business benefits [22, 23]. As mentioned previously, studies have been conducted to determine the factors that influence organizational performance as well as success in projects [24]. Consistently, senior management commitment is identified in this research as a key factor in achieving success in general [5].


2.2 Senior Management Commitment

Senior management represents the topmost management level (e.g., executive managers) in an organization. The commitment of senior management is observed in different ways, such as the involvement, engagement, leadership, focus, and support of executive management in a project management environment. Support from senior management is imperative because resources within an organization are controlled by top management. Having senior management commitment during project scope changes, budget overruns, delays in project schedules, and similar crises helps to obtain required resources and approvals. Specifically, for IT projects, budget allocation and extra funding for process engineering and additional training can be easily managed with top management commitment [25]. Senior management support encourages managers to set clear project objectives to achieve desired business outcomes and customer satisfaction. As organizations want to increase their chances of accomplishing business success through projects, senior management should be committed to optimizing business operations and exploring more opportunities by taking pioneering risks [4]. Technology-based projects and their successful implementation depend on the intensity and sustained commitment of top management. With no involvement and support from senior management, it becomes difficult for managers, who run projects in silos, to cope when any complex issue arises. Senior management should clarify the rationale behind projects with enthusiasm and apply their experience in project activities to enrich business outcomes. Quite a few studies recognize support from upper management as a significant aspect of executing technology-based projects successfully. However, senior management commitment still needs attention [17].

2.3 Project Risk Management

Project risk refers to uncertain events and outcomes that create a threat to overall success. Studies support the view of considering opportunities as well as threats in the project risk management process (PMI 2013). Managing risks in projects is important for organizations to focus more on opportunities by reducing the impact of uncertain events and concentrating on assessing and identifying factors that influence project success [26]. The information generated during the process of managing project risks helps in making project decisions, whereas the tools and techniques developed during the risk process support project risk activities. Studies state that, to mitigate risks, effective governance structures should be outlined in project-based organizations along with effective project risk management [27]. During the different stages of project execution, it is difficult to predict the risks that may undermine governance and related strategies by impacting project performance. Studies state that risks in projects are directly associated with success in projects and may undermine success, especially when new products are developed [28]. Project


risk management aims to influence project stakeholders' perception, awareness of the project environment, and related behavioral aspects. Thus, to enhance continuous learning and knowledge, more empirical research related to project risk management is needed [14].

2.4 Risk Mitigation Strategies

Studies claim that project risks can be defined as events that impact projects during the initiation and execution stages. Typical project risks or project changes include delays in the start and finish of project activities, changes in project tasks, and resource variations. Such changes occur due to impulsive decision-making and insufficient information across project stakeholders. Studies have emphasized having a systematic risk management process in IT projects. It is argued that strategies for risk mitigation positively influence the on-time delivery of projects through improved estimation of project resources. To manage project risks, project managers should develop risk mitigation strategies that suit the project's needs. Several risk mitigation strategies to reduce project delays have been proposed in the existing project literature, focusing on unexpected variations, such as crashing project activities, vertical integration, project visibility, supplier development, and project flexibility. Project-based organizations should adopt flexibility, along with systemic thinking, to manage challenges that arise due to the complexity, uncertainty, and distinctiveness of projects [29, 30]. Also, project performance outcomes based on scope, budget, and schedule should be given more importance than goals-based outcomes [31, 32]. The effectiveness of such recommended strategies in mitigating risks and lowering the impact of business delays has not been examined. The present research work examines the influence of flexibility in minimizing project delays and achieving success.

2.4.1 Project Flexibility

By incorporating a flexible approach, projects can manage uncertainty, remain flexible in schedules and protect themselves from indefinite consequences. Project flexibility is generally expected in the initial stages to make sure that changes can be adapted by organizations, especially in dynamic settings. Projects that showcase flexibility support the long-term plans of the organization and suggest strategies or evolving methodologies needed to accomplish success and utilize any overlooked opportunities by altering project capacities [18, 33]. Competent organizations remain flexible and sustained in dynamic settings and market competition by continued focus on potential clients or customers. Thus, any organization’s success rate is based on the ability to react to dynamic scenarios [19, 34]. Project flexibility does not violate any project outcomes decided previously. Rather, it reduces complexities and uncertainties through risk mitigation approaches required in dynamic settings. Hence,


when uncertainty is high organizations should maintain flexibility and advance effectiveness in projects.

2.5 Success in Projects

The rate of failure or success in projects depends on how the stakeholders perceive it [13]. It is vital to distinguish the success measures while specifying the preliminary scope of any project during the initial stages of the project lifecycle. Both conceptual and operational perspectives are important in project management. Success in projects is a combination of schedule, customer satisfaction, quality, and cost [35]. These aspects of project success are important because they determine the way the overall business, clients, and employees are influenced by the projects, how efficient the project is, and the level of preparation for future opportunities [36]. Successful projects can also be seen as the derivative of quality, time, budget, several external controls, user satisfaction, health and safety, and, most importantly, the project's commercial value [37]. It is difficult to quantify the performance of any project due to the presence of multiple stakeholders and goals to accomplish; selecting one predominant goal to represent each project stakeholder becomes challenging [38]. Thus, the observed performance of projects is regarded as a proxy for project outcomes. Project performance is determined using different dimensions such as time, quality, and cost [39]. Emphasizing various factors may restrict the expected performance of the projects and impact the required project activities as well as decision-making. Consequently, the additional dimension of scope was introduced to focus on the attitudes and perceptions of customers; it is observed as a key enhancement in determining project outcomes. The study by [40] distinguished between projects that are managed operationally and strategically in an organization. Operationally managed projects emphasize project-related performance considering goals, budget, and time, whereas strategically managed projects focus on business value creation and related outcomes to fulfill long-term avenues [14]. Organizations give attention and spend considerable time improving business success and planning for future endeavors. Hence, business success is included when overall success in projects is measured, to account for the value created through the projects in the market [41].


3 Conceptual Framework and Hypotheses Development

3.1 Senior Management Commitment and Project Success

In a turbulent and rapidly changing environment, organizations should be proactive and remain innovative to gain a competitive advantage. Some aspects, such as organizational culture, can be managed by senior leaders. To achieve innovativeness and an innovative culture, senior management should be committed, initiate the creative process, support creative ideas, and continuously focus on technological advancements by encouraging managers to take risks. Commitment from senior leadership is important for managing project risks, project autonomy, and overall project success. Top management can align individual activities synergistically to attain organizational goals and is a crucial component of project success [42, 43]. Studies describe that whenever environmental strategies are assimilated into organizational processes, they enhance organizational and project performance by committing senior management to improved project management processes [44]. Hence, it is proposed:

H1: Senior management commitment is significantly associated with success in projects.

3.2 Project Risk Management and Project Success

General contingency theory supports the notion that control activities in projects are important to achieve success. Project risks affect the overall project as well as success in business [45]. The organization benefits by realizing effective processes to manage risks, resulting in business value creation and success. Thus, success in projects is achieved through short-term goals (project performance) and long-term goals (business success) in achieving the desired results [46]. Hence, it is proposed:

H2: Project risk management is substantially associated with success in projects.

3.3 Project Flexibility and Project Success

Usually, during the preliminary project phase, project planning and gathering specific information on project-related activities take time and include project scope definition, acquisition of project resources, project sponsorship, environmental factors, and any regulatory project needs [47]. Flexibility in projects provides the ability to make changes with minimal restrictions on budget, effort, or performance. It represents the potential of a project to cope with scope creep through suitable management actions, measures, and defined policies. Project flexibility is a critical


element in making sure that the project remains on plan with respect to time, quality, and cost [48]. Flexible project approaches and practices align with organizational and project-based goals as per strategic plans [29]. This offers a suitable indication of project success at all stages of project execution and implementation. Hence, it is proposed:

H3: Project flexibility has a substantial influence on successful projects.

3.4 Moderation Effects of Project Flexibility

Contingency theory is extensively applied in the project management sphere. The contingency theory perspective foresees scenarios where the influence of project flexibility will be low or high. Based on the uncertainties, various approaches are applied to manage project risks. The study examines how flexibility in projects influences risks in projects as well as in business. Consequently, flexibility in projects is additionally supported by the contingency perspective, as projects with more flexibility are less likely to embrace standard processes. In project-based organizations, risk mitigation strategies are used to overcome project-related uncertain events. Project managers should be adaptable and flexible concerning expected changes [49]. Customary attention in project management handles uncertainties and befits a progressive environment. Such dynamic settings help when the stakeholders of a project gain a significant understanding of the actual project requirements. Therefore, flexibility is the response to uncertainty created by the environment [50]. Flexibility relates to the involvement of project stakeholders, the approach used to manage projects, and the way project information is shared across project stakeholders. Successful and effective projects need more flexibility, especially in financial and technical capabilities and contractual engagements, for project risk mitigation [51]. Hence, it is suggested:

H4: Flexibility in projects moderates the influence of senior management commitment on the success of projects.

H5: Flexibility in projects moderates the influence of risks in projects on the success of projects.

3.5 Conceptual Framework

Figure 1 shows the conceptual framework for the present study, based on project-related literature and theoretical considerations.


Fig. 1 Conceptual framework

4 Research Design

4.1 Sampling and Data Collection

To perform the hypothesis testing, 166 datasets on IT projects are used as the sample for the present study. To assess the impacts of senior management commitment and project flexibility on project success, the IT projects selected for the research were realized between 2015 and 2021. Over 400 managers at various levels from the financial services industry were emailed a web-based survey questionnaire. All questionnaires were carefully verified for data correctness concerning the participants [52]. The survey response rate was 25.9%. No significant differences were observed (alpha 5%) between initial and later responses from the respondents [53]. To minimize the risk of bias centered on common-method variance, a dual-informant design was implemented by including project managers at various levels. The project managers assessed senior management commitment, project risk management, project flexibility, and project performance. Business success was examined by senior managers.

4.2 Characteristics of the Sample

Figure 2 shows the sample characteristics. Most projects have an average project size, with a project budget between ten and fifteen million USD, a project duration between


Fig. 2 Sample characteristics

one and three years, and between fifty and one hundred project members per project.

4.3 Measures

The study variables were based on multiple-item scales drawn from the project management, top management commitment, entrepreneurship, and related literature. A few scales were re-worded and adapted to align with the study context. Five industry-specific professionals from the sample firms were consulted to evaluate all items based on a 7-point Likert scale, and each study variable was constructed by averaging the respective items [52]. A double-blind back-translation approach was used to ensure meaning accuracy [54]. A pilot test was conducted with consultants from the financial services industry for the validation of all measures [55]. The validity of all item scales was verified by applying PCFA (principal components factor analysis) and CFA (confirmatory factor analysis). The PCFA was performed to confirm that all items load on a single factor. The reliability of the scales (Cronbach's α) was found to be more than 0.7. CFA was carried out for measurement model validation [56, 57]. The measurement model is found to be acceptable with CFI and GFI exceeding 0.90, RMSEA below 0.07, and SRMR below 0.08. The model fit was acceptable at CMIN/DF = 2.908, CFI = 0.984, RMSEA = 0.073, and SRMR = 0.024 [58].
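As an illustration of the reliability check described above, the sketch below computes Cronbach's alpha from item-level responses; the data are simulated and the 3-item scale is only a stand-in for the study's actual instruments.

```python
# Hedged sketch of the scale-reliability check: Cronbach's alpha per scale,
# assuming `items` is an (n_respondents, n_items) array of 7-point Likert responses.
import numpy as np

def cronbach_alpha(items):
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()      # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the total score
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical responses to a 3-item scale, simulated so the items correlate
rng = np.random.default_rng(1)
base = rng.integers(3, 8, size=(166, 1))
items = np.clip(base + rng.integers(-1, 2, size=(166, 3)), 1, 7)
print(f"alpha = {cronbach_alpha(items):.3f}  (retain scale if alpha > 0.7, as in the study)")
```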

4.3.1 Dependent Variable

Project success is assessed (α = 0.834) using seven items e.g., projects have a high scope, quality, budget, and schedule adherence, the project generated high profits,


advancement in technological capability, and new market or product created based on project results [40, 59].

4.3.2 Independent Variable

Senior management commitment is measured using a 6-item scale (α = 0.801) developed by [60] and [1], e.g., senior management supported having adequate project resources for successful implementation; senior management instituted and adapted adequate processes, structures, and controlling mechanisms; senior management established frequent communication with project teams; senior management holds pertinent project expertise; senior management used authority to advance capabilities in project management; senior management inspired the project groups to attain project objectives. Project risk management is measured using four items (α = 0.890) based on [61] and [62], e.g., duties are specified to oversee risks, risk procedures are detailed, project risk analysis is carried out, and planning for uncertainty and risk response is performed.

4.3.3 Moderating Variable

Project Flexibility is measured using three items (α = 0.716), conceptually based on work by [19] e.g., it is possible to switch different project resources; the project team can cope with changes in the project; an alternative capacity is available to accommodate the change in project specifications.

4.3.4 Control Variable

Studies showed that the success of projects decreases as the project size increases [63, 64]. Thus, project size is one of the significant bases of project success and is measured with items adapted from [65]. Project size is captured by the natural logarithm of the mean value of project team size, project budget allocated, and project duration in months.
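A minimal sketch of this control variable is given below; the function name and the assumption that the budget is expressed in millions of USD are illustrative, since the paper does not state the exact units used before averaging.

```python
# Hedged sketch of the project-size control variable described above:
# the natural log of the mean of team size, allocated budget and duration in months.
# The budget unit (millions of USD) is an assumption for illustration.
import numpy as np

def project_size(team_size, budget_musd, duration_months):
    return np.log(np.mean([team_size, budget_musd, duration_months]))

print(project_size(team_size=75, budget_musd=12, duration_months=24))  # log(37) ≈ 3.61
```

Taking the natural log compresses the heavy right tail that budgets and team sizes typically show, which is a common reason for log-transforming size controls.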

5 Research Findings

This research significantly contributes to practice and the literature because, to our knowledge, no prior study examines the moderation of risk mitigation strategies on the effects of senior leadership and project risks on successful financial services IT projects.


Table 1 Descriptive statistics

Variables: 0 = Project success, 1 = Project size, 2 = Senior management commitment, 3 = Project risk management, 4 = Project flexibility

                 0        1        2        3        4
Mean             4.843    3.954    4.945    4.874    4.911
Std. Dev.        1.012    2.476    1.674    1.228    1.113
Reliability      0.834    n.a.     0.802    0.88     0.705

Kendall's tau_b
0 Project success                 1
1 Project size                    (−)0.177**   1
2 Senior management commitment    0.318**      (−)0.123    1
3 Project risk management         0.393**      0.124*      0.424**    1
4 Project flexibility             0.405**      0.027       0.365***   0.426**   1

Spearman's rho
0 Project success                 1
1 Project size                    (−)0.228**   1
2 Senior management commitment    0.443**      (−)0.169*   1
3 Project risk management         0.525**      0.165*      0.529**    1
4 Project flexibility             0.517**      0.033       0.475**    0.571**   1

** Correlation is significant at the 0.01 level (2-tailed)
* Correlation is significant at the 0.05 level (2-tailed)

5.1 Descriptive Statistics

Table 1 presents the descriptive statistics, including the correlations between variables.
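The descriptive statistics and the two correlation matrices in Table 1 correspond to standard pandas operations. The sketch below is a hedged illustration with simulated data and placeholder column names, not the survey data itself.

```python
# Hedged sketch of the Table 1 computations, assuming `df` holds the five averaged
# study variables (placeholder column names, one row per respondent).
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.uniform(1, 7, size=(166, 5)),
                  columns=["project_success", "project_size",
                           "senior_mgmt_commitment", "project_risk_mgmt",
                           "project_flexibility"])

print(df.describe().loc[["mean", "std"]].round(3))   # descriptive statistics
print(df.corr(method="kendall").round(3))            # Kendall's tau_b matrix
print(df.corr(method="spearman").round(3))           # Spearman's rho matrix
```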

5.2 Moderation Analysis Using Ordinal Regression

In the present research paper, the relationship between the ordinal variables, i.e., project success, senior leadership commitment, and project flexibility, is established. The study variables were measured on an ordinal, categorical, seven-point Likert scale. Homogeneity of variance and normality could not be assumed for the ordinal categorical outcomes. The ordinal regression method was


preferred because it does not assume constant variance and normality but requires the parallel-lines assumption across all levels of the categorical outcome [66, 67]. Diagnostic tests were performed before the regression analysis to check for any assumption violation. No variables had missing values. The ordinal regression analysis was carried out using the SPSS tool. Table 2 depicts the ordinal regression analysis outcomes. Model 1 consists of only the control variable; project size (B = (−)0.287, p < 0.01) has a significant negative impact on project success. Model 2 introduced the independent variables to measure their direct effects and shows the substantial effects of senior management commitment (B = 0.513; p < 0.01) and project risk management (B = 0.887; p < 0.001). As per the Cox and Snell pseudo-R², Model 2 explained 39.6 percent of the variation in project success. The moderating variable project flexibility is included in Model 3, which shows significant effects of senior management commitment (B = 0.295; p < 0.1), project risk management (B = 0.627; p < 0.001), and project flexibility (B = 1.140; p < 0.001). As per the Cox and Snell pseudo-R², Model 3 explained 48.6 percent of the variation in project success. Model 4 added interaction terms to the previous model to examine the moderating effect of project flexibility on the relationships of senior management commitment and project risk management with project success. Model 4 explained 51.1 percent of the variation in project success as per the Cox and Snell pseudo-R², with a variance inflation factor (VIF) of 2.54.
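For readers outside SPSS, the model sequence can be approximated with statsmodels' OrderedModel, as sketched below. The simulated data, the mean-centered predictors, the logit link, and the way the interaction terms are formed are assumptions for illustration; they are not details reported in the paper.

```python
# Hedged sketch of the moderation analysis with ordered-logit models;
# simulated data stand in for the survey responses.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(3)
n = 166
df = pd.DataFrame({
    "project_size": rng.normal(size=n),
    "smc": rng.normal(size=n),            # senior management commitment (centered)
    "risk_mgmt": rng.normal(size=n),      # project risk management (centered)
    "flexibility": rng.normal(size=n),    # project flexibility (centered)
})
df["smc_x_flex"] = df["smc"] * df["flexibility"]         # interaction terms
df["risk_x_flex"] = df["risk_mgmt"] * df["flexibility"]

# Toy ordinal outcome: a latent index cut into ordered success levels
latent = 0.3 * df["smc"] + 0.6 * df["risk_mgmt"] + 1.0 * df["flexibility"] + rng.normal(size=n)
df["success"] = pd.cut(latent, bins=[-np.inf, -1, 0, 1, np.inf],
                       labels=["low", "medium", "high", "very_high"])

model3 = OrderedModel(df["success"],
                      df[["project_size", "smc", "risk_mgmt", "flexibility"]],
                      distr="logit").fit(method="bfgs", disp=False)
model4 = OrderedModel(df["success"],
                      df[["project_size", "smc", "risk_mgmt", "flexibility",
                          "smc_x_flex", "risk_x_flex"]],
                      distr="logit").fit(method="bfgs", disp=False)
print(model4.summary())                   # interaction coefficients, in the spirit of Table 2
```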

6 Discussion

This study addresses the research gaps and supports the notion that senior management commitment and project flexibility have a significant influence on success. The positive impact of senior management commitment and risks in projects on success is supported by the study outcomes. The strong influence of senior management commitment helps in achieving project success and business benefits and is in line with earlier upper-echelon research that focuses on the effect of top management support, such as advice seeking, behavioral integration, entrepreneurial drive, and risk-taking, on business benefits. The study results support this positive effect at the project level by applying a project-level approach. The outcomes show that commitment from senior leadership teams in any organization is critical and motivates internal and external project stakeholders to accomplish the desired success in projects. Thus, the involvement of senior management offers more opportunities to achieve project-based objectives, increases influence on project sponsors or stakeholders, and leads to long-term success. The success of projects differs based on the project management approaches adopted, project environments, and project types. The study findings show that flexibility is a key factor in achieving success in projects. A few research studies offer similar conclusions, but none have examined the extent of senior management roles and the interactional effect of flexibility of


Table 2 Ordinal regression analysis

Variables (dependent variable: project success)    Model 1 / Model 2 / Model 3 / Model 4

Control variable
Project size    (−) 0.096   (−) 0.104   (−) 0.287**   0.402***   0.353**   0.103   (−)0.367***   0.104   0.175   0.126*   0.195

Independent variables
Top management commitment       Model 2: 0.513** (0.166)    Model 3: 0.295*
Project risk management         Model 2: 0.887*** (0.148)   Model 3: 0.627*** (0.153)   Model 4: 0.865*** (0.179)

Moderating variable
Project flexibility             Model 3: 1.140*** (0.209)   Model 4: 1.188*** (0.22)

Interaction
Top management commitment * Project flexibility    Model 4: (−) 0.334* (0.194)
Project risk management * Project flexibility      Model 4: 0.778** (0.252)

−2 log likelihood               510.208    1145.605    1148.378    1148.378
Likelihood ratio (Chi-square)   8.895**    83.650**    110.517***   73.413**
Pseudo R^2 (Cox and Snell)      0.052 (Model 1)   0.396 (Model 2)   0.486 (Model 3)   0.511 (Model 4)

Notes: N = 166. Unstandardized regression coefficients are shown with standard errors in parentheses. *** Significance at the 0.001 level (2-tailed); ** significance at the 0.01 level (2-tailed); * significance at the 0.1 level (2-tailed)

projects on the success of IT projects in the financial services industry [8, 24]. Project managers can include flexible approaches as one of the project risk mitigation strategies to avoid uncertain situations resulting in project failures. The current study indicates that flexibility in projects has a positive moderation effect on the association between senior management commitment and project success. Therefore, the impact of senior leadership on the success of projects is high when flexibility in projects is high.


The study findings suggest that, in the case of higher technological uncertainty, having a defined process to manage project risk is always beneficial. Consequently, organizations should tailor the process followed while managing project risks to the dynamic setting [68]. A process to manage project risks is imperative, irrespective of environmental changes (technological, organizational, and market), to achieve greater success in projects. Strategies used to mitigate risks minimize the unfavorable effects of uncertain events through primary project risk analysis along with contingent factors. During risk planning for innovative projects, it becomes difficult to collate project-related information, as innovation exposes projects to uncertain situations. Thus, project resources should be assigned appropriately to generate the information needed for risk planning. Furthermore, the negative influence of flexibility on the success of projects should increase with more flexible approaches and practices. Hence, the study findings reinforce the substantial moderation effect of flexibility in projects. Since managers may have different preferences, perspectives, and capabilities, these factors can inhibit project success [69].

7 Conclusion and Managerial Implications

Each project is unique and needs a different contingent approach to achieve project goals. Different variables are involved in a project change based on the project context. This study highlights and illustrates how senior management commitment, risks in projects, and mitigation strategies (i.e., project flexibility) result in the success of IT projects in financial services. The study findings indicate that greater returns from senior leadership commitment in projects with higher complexity, innovativeness, and uncertainty lead to overall business as well as project success. The examination of success in IT projects is imperative in measuring the overall performance of project groups in the long term. To improve project management approaches, lessons learned from individual projects should be recognized as significant. However, it can be challenging to separate the root causes of individual project failures or successes. The study outcomes offer useful insights specific to project management and can be applied to project portfolios or programs and to discrete projects. Overall, this study addresses the weaknesses of the contingency perspective, as only a few empirical studies have been presented to date. Considering the practical aspects of project management and effectively using flexibility in projects, project managers should showcase a "can do" attitude, remain focused on maintaining a supportive culture across project groups, demonstrate the ability to accomplish project goals, collaborate with multiple stakeholders, and enhance knowledge sharing with continuous learning. Likewise, senior managers should continue to manage risks effectively with timely decision-making and maintain constant interaction with project sponsors. Senior managers should prioritize key challenges with careful allocation of project resources during the implementation of radical and disruptive innovation. Thus, project managers should remain focused in the current market


settings by not just achieving schedule and budget goals but also managing different organizational aspects. The study findings also illustrate that project flexibility has a substantial influence on senior management commitment and success in general. Hence, senior leadership should be committed to managing the required level of flexibility in project execution to remain successful. Senior managers should encourage innovation-driven leadership capabilities at the project or portfolio levels, have more visibility of project controls through periodic reviews, and build an entrepreneurially oriented culture to foster business outcomes. Organizations should train project managers on strategies that offer flexibility and should develop risk mitigation strategies to deal with late reconciliation of requirements (especially for stage-gate models), resource allocation in dynamic settings, contingency planning, etc. Hence, managers should remain focused on the strategic goals of the organization, the resource development process, and the allocation of tangible and intangible assets, and also realize transformational benefits such as new processes, skills, and business tactics.

8 Limitations and Recommendations for Future Research

The limitations of this study can be addressed by further research. First, cross-sectional data representing IT projects from financial services organizations are used; therefore, future studies can consider project types from different industries and sectors. Second, although this study examines the moderating role of a risk mitigation strategy (i.e., project flexibility) between senior management commitment and the success of projects, managers' decision-making involves many contingency factors. Future studies can examine other potential moderating variables related to risk mitigation strategies, such as project visibility, and how the risk management process influences the entire process. Third, the study depends on the managers' perspective and a quantitative method to analyze the relationship between senior management commitment and project flexibility in achieving project success. Future studies can use a qualitative method, for example, case studies, in-depth interviews, or focus groups, and analyze senior leadership commitment at different project stages.

9 Declaration of Competing Interest

None


References 1. Raziq MM, Borini FM, Malik OF, Ahmad M, Shabaz M (2018) Leadership styles, goal clarity, and project success: evidence from project-based organizations in Pakistan. Leadersh Org Dev J 39(2):309–323 2. Shao J (2018) The moderating effect of program context on the relationship between program managers’ leadership competences and program success. Int J Project Manage 36(1):108–120 ˇ 3. Vrchota J, Rehoˇ r P, Maˇríková M, Pech M (2021) Critical success factors of the project management in relation to industry 4.0 for sustainability of projects. Sustainability 13(1):281 4. Nunes M, Abreu A (2020) Applying social network analysis to identify project critical success factors. Sustainability 12(4):1503 5. Garousi V, Tarhan A, Pfahl D, Co¸skunçay A, Demirörs O (2019) Correlation of critical success factors with success of software projects: an empirical investigation. Software Qual J 27(1):429–493 6. Nguyen LD, Ogunlana SO, Lan DTX (2004) A study on project success factors in large construction projects in Vietnam. Eng Constr Architectural Manage 11(6):404–413 7. Müller R, Turner R (2007) The influence of project managers on project success criteria and project success by type of project. Eur Manag J 25(4):298–309 8. Zaman U, Nawaz S, Tariq S, Humayoun AA (2019) Linking transformational leadership and “multi-dimensions” of project success; moderating effects of project flexibility and project visibility using PLS-SEM. Int J Managing Projects Bus 13(1):103–127 9. Montoya M. (2016) Agile adoption by the financial services industry. In: cprime. Retrieved from https://www.cprime.com/resources/blog/agile-adoption-financial-services-industry/ 10. Kaupa F, Naude MJ (2021) Critical success factors in the supply chain management of essential medicines in the public health-care system in Malawi. J Glob Oper Strateg Sourcing 14(3):454– 476 11. Malagueño R, Gomez-Conde J, de Harlez Y, Hoffmann O (2021) Controller involvement in a project management setting: effects on project functions and performance. J Appl Account Res 22(2):334–364 12. Naeem S, Khanzada B (2017) Impact of transformational leadership in attainment of project success: the mediating role of job satisfaction. Int J Bus Soc Sci 8(9):168–177 13. Müller R, Jugdev K (2012) Critical success factors in projects: pinto, slevin, and Prescott-the elucidation of project success. Int J Manag Proj Bus 5(4):757–775 14. Teller J, Kock A, Gemünden HG (2014) Risk management in project portfolios is more than managing project risks: a contingency perspective on risk management. Proj Manag J 45(4):67– 80 15. Yang L-R, Wu K-S, Wang F-K, Chin P-C (2012) Relationships among project manager’s leadership style, team interaction and project performance in the Taiwanese server industry. Qual Quant 46(1):207–219 16. Gunduz M, Nielsen Y, Ozdemir M (2013) Quantification of delay factors using the relative importance index method for construction projects in Turkey. J Manag Eng 29(2):133–139 17. Nguyen TS, Mohamed S (2021) Mediation effect of stakeholder management between stakeholder characteristics and project performance. J Eng Proj Prod Manage 11(2):102–117 18. Olsson NOE (2008) External and internal flexibility–aligning projects with the business strategy and executing projects efficiently. Int J Project Organ Manage 1(1):47–64 19. Zailani S, Aziz H, Ariffin M, Iranmanesh M, Moeinzadeh S, Iranmanesh M (2016) The moderating effect of project risk mitigation strategies on the relationship between delay factors and construction project performance. 
J Sci Technol Policy Manage 7(3):346–368 20. Parida V, Sjödin DR, Lenka S, Wincent J (2015) Developing global service innovation capabilities: how global manufacturers address the challenges of market heterogeneity. Res Technol Manage 58(5):35–44


21. Berry LL, Bolton RN, Bridges CH, Meyer J, Parasuraman A, Seiders K (2010) Opportunities for innovation in the delivery of interactive retail services. J Interact Mark 24(2):155–167 22. Frefer AA, Mahmoud M, Haleema H, Almamlook R (2018) Overview success criteria and critical success factors in project management. Ind Eng Manage 7(1):1–6 23. Tiwari P, Suresha B (2020) Mediating role of project innovativeness between top management commitment and business benefits. Kala Sarovar 23(4):359–377 24. Khattak MS, Shah SZA (2020) Top management capabilities and firm efficiency: relationship via resources acquisition. Bus Econ Rev 12(1):87–118 25. Oh M, Choi S (2020) The competence of project team members and success factors with open innovation. J Open Innov Technol Mark Complex 6(3):51 26. Petit Y (2012) Project portfolios in dynamic environments: organizing for uncertainty. Int J Project Manage 30(5):539–553 27. Zwikael O (2016) Editorial – international journal of project management special issue on project benefit management. Int J Project Manage 34(4):734–735 28. Salomo S, Weise J, Gemünden HG (2007) NPD planning activities and innovation performance: the mediating role of process management and the moderating effect of product innovativeness. J Prod Innov Manag 24(4):285–302 29. Saeed MA, Jiao Y, Zahid MM, Tabassum H (2017) Relationship of organisational flexibility and project portfolio performance: assessing the mediating role of innovation. Int J Proj Organ Manage 9(4):277–302 30. Frank M, Kordova S (2013) Developing systems thinking through engaging in multidisciplinary high-tech projects. Int J Proj Organ Manage 5(3):222–238 31. Davis P (2007) The effectiveness of relational contracting in a temporary public organization: intensive collaboration between an English local authority and private contractors. Public Adm 85(2):383–404 32. Olsson NOE (2006) Management of flexibility in projects. Int J Project Manage 24(1):66–74 33. Floricel S, Piperca S, Banik M (2011) Increasing project flexibility: the response capacity of complex projects. In: Project management institute 34. Skorstad EJ, Ramsdal H (2016) Flexible organizations and the new working life: a European perspective. Routledge Taylor & Francis Group, New York 35. Pinto MB, Pinto JK (1991) Determinants of cross-functional cooperation in the project implementation process. Proj Manage J 22(2):13–20 36. Carvalho MM, Rabechini R Jr (2017) Can project sustainability management impact project success? An empirical study applying a contingent approach. Int J Project Manage 35(6):1120– 1132 37. Wu G, Liu C, Zhao X, Zuo J (2017) Investigating the relationship between communication conflict interaction and project success among construction project teams. Int J Project Manage 35(8):1466–1482 38. Klijn EH, Koppenjan J (2016) The impact of contract characteristics on the performance of public-private partnerships (PPPs). Public Money Manage 36(6):455–462 39. Chipulu M, Ojiako U, Gardiner P, Williams T, de Mota CMM, Maguire S, Shou Y, Stamati T, Marshall A (2014) Exploring the impact of cultural values on project performance—the effects of cultural values, age and gender on the perceived importance of project success/failure factors. Int J Oper Prod Manag 34(3):364–389 40. Shenhar AJ, Dvir D, Levy O, Maltz AC (2001) Project success: a multidimensional strategic concept. Long Range Plan 34(6):699–725 41. Zhao J, Du B, Sun L, Lv W, Liu Y, Xiong H (2021) Deep multi-task learning with relational attention for business success prediction. 
Pattern Recogn 110:107469 42. Gemünden HG, Salomo S, Krieger A (2005) The influence of project autonomy on project success. Int J Project Manage 23(5):366–373 43. Tzempelikos N (2015) Top management commitment and involvement and their link to key account management effectiveness. J Bus Ind Mark 30(1):32–44


44. Unger BN, Kock A, Gemünden HG, Jonas D (2012) Enforcing strategic fit of project portfolios by project termination: an empirical study on senior management involvement. Int J Project Manage 30(6):675–685 45. Zwikael O, Meredith JR (2018) Who’s who in the project zoo? The ten core project roles. Int J Oper Prod Manag 38(2):474–492 46. Diab M, Mehany M (2021) Contingency use and project delivery influence on infrastructure project risk assessment. In: Collaboration and integration in construction, engineering, management and technology, pp 589–592 47. Pollack J, Helm J, Adler D (2018) What is the iron triangle, and how has it changed? Int J Managing Projects Bus 11(2):527–547 48. Shahu R, Pundir AK, Ganapathy L (2013) An empirical study on flexibility: a critical success factor of construction projects. Glob J Flex Syst Manag 13(3):123–128 49. Casady CB, Eriksson K, Levitt RE, Scott WR (2018) Examining the state of public-private partnership (PPP) institutionalization in the United States. Eng Proj Organ J 8(1):177–198 50. Nandakumar MK, Jharkharia S, Nair A (2013) Environmental uncertainty and flexibility. Glob J Flex Syst Manag 13(2):121–122 51. El-Sayegh SM (2014) Project risk management practices in the UAE construction industry. Int J Project Organ Manage 6(1–2):121–137 52. Hair JJF, Black WC, Babin BJ, Anderson RE, Tatham R (2010) Multivariate data analysis. Pearson Prentice Hall, Upper Saddle River 53. Podsakoff PM, MacKenzie SB, Lee J-Y, Podsakoff NP (2003) Common method biases in behavioral research: a critical review of the literature and recommended remedies. J Appl Psychol 88(5):879–903 54. Sinaiko HW, Brislin RW (1973) Evaluating language translations: experiments on three assessment methods. J Appl Psychol 57(3):328–334 55. Nunnally JC (1994) Psychometric theory 3E. Tata McGraw-Hill Education, New York 56. Jr Guide VDR, Ketokivi M (2015) Notes from the editors: redefining some methodological criteria for the journal. J Oper Manage 37(1):v–viii 57. Ketokivi M (2006) Elaborating the contingency theory of organizations: The case of manufacturing flexibility strategies. Prod Oper Manag 15(2):215–228 58. Hu LT, Bentler PM (1999) Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Modeling Multi J 6(1):1–55 59. Jonas D, Kock A, Gemünden HG (2013) Predicting project portfolio success by measuring management quality: a longitudinal study. IEEE Trans Eng Manage 60(2):215–226 60. Boonstra A (2013) How do top managers support strategic information system projects and why do they sometimes withhold this support? Int J Project Manage 31(4):498–512 61. Ilincuta A (1997) Risk, project management and project success for software industry. Unpublished master’s thesis, University of Calgary, Calgary, AB 62. Willauer B (2003) Consensus as key success factor in strategy-making, 1st edn. Dt. Univ., Verlag 63. Yaw K-AG, Afful E, Matey HA (2019) IT project success: practical frameworks based on key project control variables. Int J Softw Eng Appl 10(5):55–69 64. Shenhar AJ, Dvir D, Lechler T, Poli M (2002) One size does not fit all — true for projects, true for frameworks. In: Proceedings of PMI research conference. Frontiers of Project Management research and applications. Project Management Institute, Seattle, WA, U.S.A, pp 99–106 65. Barki H, Rivard S, Talbot J (2001) An integrative contingency model of software project risk management. J Manag Inf Syst 17(4):37–69 66. 
Norusis M (2008) SPSS 16.0 advanced statistical procedures companion. Prentice Hall Press, New Jersey, USA 67. Tiwari P, Suresha B (2021) Moderating role of project innovativeness on project flexibility, project risk, project performance, and business success in financial services. Glob J Flex Syst Manage 23(3):179–196 68. Shenhar AJ (2001) One size does not fit all projects: exploring classical contingency domains. Manage Sci 47(3):394–414


69. Patanakul P (2015) Key attributes of effectiveness in managing project portfolio. Int J Project Manage 33(5):1084–1097 70. Raz T, Shenhar AJ, Dvir D (2002) Risk management, project success, and technological uncertainty. R&D Manage 32(2):101–109

Growth Profile of Using AI Techniques in Antenna Research Over Three Decades
G. S. Mani and S. D. Pohekar

Abstract Artificial Intelligence (AI) has emerged as one of the most intensively researched areas in recent times. The antenna research community has been exploring AI techniques for the design and development of various types of antennas for communication, radar, aerospace, and other applications for several years. This research has gained large significance in the last few decades due to the emergence of many wireless devices used in communication and computing systems, health monitoring systems, remotely operated robotic systems, and strategic systems, including those used for radar and electronic warfare. The recent growth of interest in 5G, MIMO, IoT, and RFID devices has also stimulated the adoption of AI techniques for novel antenna designs. The present study reviews about 6000 documents over the period 1991–2020 to examine the growth profile of antenna research in relation to various AI techniques. The paper provides a landscape view of the use of AI techniques in antenna research and identifies research themes and trends, knowledge gaps, and linkages between the different techniques. The study will be useful for new researchers in planning their research activity, for experienced antenna engineers in tracing the growth profile of antenna research related to different AI techniques, and for other researchers in tracing AI applications in different technologies. Keywords Antenna research · Artificial intelligence · Bibliometric analysis · Knowledge gaps · Research growth · Research themes · Research trends

1 Introduction Devices incorporating wireless technology are the norm in many developments of recent decades, from handheld mobile phones to sophisticated satellite communications. The antenna research community has been exploring Artificial


intelligence (AI) techniques for the design and development of various types of antennas used in wireless communication systems and technologies [1–3]. Several review articles on the application of Artificial Intelligence (AI) and related techniques to antennas and propagation have been published in recent years. Erricolo et al. have reviewed applications of machine learning in electromagnetics and its perspectives for future research [4]. Similar reviews have been done on applications of deep learning to antennas and propagation [5, 6]. Detailed reviews of Artificial Intelligence Techniques (AIT) for channel modelling and for adaptive and reconfigurable antennas have also been reported [7, 8]. Several guest editorials have also been published on how AI has been influencing antenna research [9–12]. Takano et al. used a Research Classification Schema (RCS) for identifying trends and typology of emerging antenna and propagation technologies [13]. Bibliometrics is a highly focused technique based on statistical and quantitative analyses of research publications, and it helps in assessing the research trends of a subject [14–16]. The technique is a useful tool for studying research patterns based on different parameters [17–20]. In the field of antennas, bibliometric studies have been carried out on 5G, metamaterials, microstrip antennas, and MIMO antennas [21–26]. The present study is a bibliometric analysis of the published research literature on the use of AI techniques for antennas. For the present paper, AI is considered as 'computer systems that interact with the world through capabilities that we usually think of as human' [27]. The AITs taken up for study are Artificial Neural Networks (ANN), Genetic Algorithms (GAL), Machine Learning (MLN), Deep Learning (DLN), Swarm Intelligence (SIG), Automation (AUT), and Robotics (ROB). The study is based on a bibliometric review of antenna research related to the above AITs, relating them to the corresponding overall research growth in the respective AITs. The study also identifies highly cited papers and the most impactful authors in these areas. The paper provides an overview of the AI research landscape in the field of antennas, including research themes and trends, knowledge gaps, and AIT linkages, useful for a new researcher in planning their research activity. The paper is organized in the following manner. Section 2 describes the methodology used for the study, with a brief description of each step. The next section describes the results of the study for different AITs, which is followed by discussions in Sect. 4. The paper ends with conclusions and acknowledgements.

2 Methodology Followed in the Present Study The methodology followed in the present work is shown in Fig. 1. (a) Selection of Bibliographic literature source: Two sources of literature considered for the present study are Scopus database, and Clarivate Analytics Web of Science (WOS). Both have well quantifiable, reproducible, and objective bibliographic data of published scientific literature including journal articles,


Fig. 1 Methodology followed in present study

conference proceedings, books, review articles and related documents. Based on the considerations discussed in the literature elsewhere, Clarivate Analytics Web of Science (WOS) was chosen for the present study [28–31]. (b) Querying for relevant information: Literature databases were created by querying the WOS literature source for the time period 1991–2020 and for different AITs. These AIT terms were also combined logically with antennas to obtain relevant databases. This resulted in about 1.2 million documents. (c) Filtering: The next step involved filtering so that documents related to books, book chapters, book reviews, review articles, abstracts of meetings, editorial materials, letters, news items, corrections, bibliographies etc. could be eliminated, retaining only articles forming part of technical journals and conferences. This resulted in a total of 6304 documents classified into 15 Document Databases (DDB). (d) Extraction and Analysis: Each document in a DDB could have a maximum of 76 field tags, each carrying some relevant information about the document [32]. Based on the type of analysis, information is extracted from the corresponding tags of the documents, which are then collated, analyzed, and presented.
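The filtering step (c) can be illustrated with a short sketch. This is not the authors' code: the sample rows are placeholders, and the sketch simply assumes WOS exports carrying the standard DT (document type), PY (publication year) and TC (times cited) field tags.

```python
import pandas as pd

# Illustrative placeholder records standing in for an exported WOS Document Database
records = pd.DataFrame({
    "DT": ["Article", "Editorial Material", "Proceedings Paper", "Book Review"],
    "PY": [2004, 2010, 2018, 2015],
    "TC": [570, 12, 46, 3],
})

keep = {"Article", "Proceedings Paper"}          # retain journal and conference papers only
ddb = records[records["DT"].isin(keep) & records["PY"].between(1991, 2020)]
print(len(ddb), "documents retained for the Document Database")
```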

3 Results This section describes the results of the bibliometric study on using different AITs for Antenna Research.


Fig. 2 Growth profiles of AIT research on antennas

3.1 Growth Rate of AI Research on Antennas The use of different AITs for antenna research has been on the increase over the years. These growth profiles are shown in Fig. 2. The growth shown for each 5-year block is a percentage with respect to the total publications over the period 1991–2020.

3.2 Research Focus Ratio for AI Research on Antennas The volume of peer-reviewed AI papers covering all subjects has grown by more than 300% between 1998 and 2018, accounting for 3% of peer-reviewed journal publications and 9% of published conference papers [33]. This growth has been noticed across many subject areas and across authors from all geographical regions. To study how well antenna research under each AIT is synchronized with this overall academic research growth, a Research Focus Ratio (RFR) is formulated:

Research Focus Ratio (RFR) = a/b    (1)

where a = the number of documents published on the AIT in antenna research, and b = the total number of documents on the corresponding AIT. This study has been carried out for each AIT, and the details are given in Table 1. Column 2 of the table shows the number of documents published on the AIT in Antenna


Table 1 Regression model for RFR study

AIT         Docs on using AIT in antenna research (a)   Docs on AIT research (b)   Research focus ratio = (a/b)*1000   Regression model             No of observations used for model   R2 of the model
ANN         629                                         137,926                    4.56                                E(a) = 0.0047b − 0.8361      30                                  0.9308
GA          2711                                        170,819                    15.87                               E(a) = 0.0152b + 4.6114      27                                  0.8716
ML          622                                         232,809                    2.67                                E(a) = 0.0046b − 19.9569     18                                  0.9536
DL          368                                         115,574                    3.18                                E(a) = 0.0046b − 15.802      9                                   0.8953
SI          146                                         13,266                     11.00                               E(a) = 0.0115b − 0.2805      18                                  0.8188
Automation  1163                                        266,409                    4.36                                E(a) = 0.0062b − 16.2252     30                                  0.9023
Robotics    664                                         290,300                    2.29                                E(a) = 0.0038b − 4.6184      27                                  0.9541

Research, and column 3 shows the total number of documents on the AIT, as obtained from the corresponding DDBs. The RFR for each AIT is given in column 4. A simple regression analysis was carried out to examine the relationship between the two sets of publications over the years. The regression model and the number of observations used for constructing the model for each AIT are shown in columns 5 and 6, respectively, where E(a) is the estimated value based on the model. The last column shows the R2 value of the model.
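As a hedged illustration of how the RFR and the regression fit of Table 1 could be reproduced, the sketch below uses placeholder yearly counts (the real series would come from the DDBs); numpy.polyfit provides the simple linear model E(a) = slope*b + intercept.

```python
import numpy as np

# Placeholder yearly counts for 1991-2020: a_t = documents using a given AIT in
# antenna research, b_t = all documents on that AIT. Real values come from the DDBs.
rng = np.random.default_rng(0)
a_t = rng.poisson(20, size=30).astype(float)
b_t = rng.poisson(4000, size=30).astype(float)

rfr = 1000 * a_t.sum() / b_t.sum()                  # Eq. (1), scaled by 1000 as in Table 1

slope, intercept = np.polyfit(b_t, a_t, deg=1)      # simple regression E(a) = slope*b + intercept
pred = slope * b_t + intercept
r2 = 1 - np.sum((a_t - pred) ** 2) / np.sum((a_t - a_t.mean()) ** 2)

print(f"RFR = {rfr:.2f}, E(a) = {slope:.4f}b {intercept:+.4f}, R2 = {r2:.4f}")
```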

3.3 Most Impactful Authors Over 15,000 authors contributed to more than 6300 papers in the DDBs related to the AITs in the study. In order to evaluate the technical impact of prominent authors, a term called Technical Impact Factor (TIF) is computed for authors who have more than a threshold number (Pth) of technical publications to their credit.


TIF is defined as

TIF_aut = Cit_aut / Pub_aut, for Pub_aut > Pth    (2)

where TIF_aut = the Technical Impact Factor of an author aut, Cit_aut = the total number of citations of the author aut, Pub_aut = the total number of articles published by the author aut, and Pth = the threshold number of publications considered for the study. In the study, both self-citations and citations by others are considered at par. Also, all co-authors of a publication are given the same credit as the main author for the citations. The five most impactful authors, who have more than 12 publications, their TIF values, and two of the most-cited publications of each author are shown in Table 2.
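A minimal sketch of Eq. (2) is given below, assuming the citation and publication counts have already been collated per author from the DDB field tags; the sample figures are illustrative only, not taken from Table 2.

```python
def technical_impact_factor(citations, publications, p_th=12):
    """TIF_aut = Cit_aut / Pub_aut, computed only when Pub_aut exceeds the threshold Pth."""
    if publications <= p_th:
        return None                      # author below the publication threshold
    return citations / publications

# Illustrative call with placeholder counts
print(technical_impact_factor(citations=1200, publications=40))   # -> 30.0
```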

3.4 Research Publication Titles For the period of study, a total of 394 publication titles inclusive of Journal papers and Conference Proceedings were involved in publishing the 6304 documents related to AITs in antennas. A tree map chart of 10 major Publication Titles is shown in Fig. 3.

3.5 h-index Study The h-index is an important statistic showing scholarly significance in bibliometric performance analysis studies. An author (or any other unit of analysis) has an h-index of h if h of their papers have each been cited at least h times [34]. This can be applied to authors, countries, journals, and institutions. The h-index has been computed for the different AITs as applied to antenna research and is shown in Table 3. The table also shows the percentage of publications that can be considered as highly cited, based on the h-index computation.
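A short sketch of the h-index computation is given below; the citation list is a placeholder, not data from the study.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))   # -> 4
```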

3.6 Keyword Analysis Keyword analysis can provide a broad framework for understanding the underlying trends, gaps in knowledge, and associated fields of research, which could be of interest to researchers. For the present study, the text mining function of the VOS viewer [35] was used to process the DDBs and obtain the keywords. Some of the standard procedures followed in Natural Language Processing, such as removal of unimportant text data


Table 2 Five impactful authors, their TIF, and their most-cited articles

Werner DH (TIF 39.26)
1. An overview of fractal antenna engineering research. Werner DH and Ganguly S, Feb 2003, IEEE Antennas and Propagation Magazine 45(1), pp 38–57 (597 citations)
2. Particle swarm optimization versus genetic algorithms for phased array synthesis. Boeringer DW and Werner DH, Mar 2004, IEEE Transactions on Antennas and Propagation 52(3), pp 771–779 (570 citations)

Massa A (TIF 27.96)
1. Differential evolution as applied to electromagnetics. Rocca P, Oliveri G and Massa A, Feb 2011, IEEE Antennas and Propagation Magazine 53(1), pp 38–49 (384 citations)
2. Evolutionary optimization as applied to inverse scattering problems. Rocca P, Benedetti M, Donelli M, Franceschini D and Massa A, Nov 2009, Inverse Problems 25(12) (332 citations)

Guney K (TIF 22.08)
1. Amplitude-only pattern nulling of linear antenna arrays with the use of bees algorithm. Guney K and Onay M, 2007, Progress in Electromagnetics Research-PIER 70, pp 21–36 (92 citations)
2. Shaped-beam pattern synthesis of equally and unequally spaced linear antenna arrays using a modified Tabu search algorithm. Akdagli A and Guney K, Jan 2003, Microwave and Optical Technology Letters 36(1), pp 16–20 (71 citations)

Elbir AM (TIF 20.72)
1. Deep channel learning for large intelligent surfaces aided mm-wave massive MIMO systems. Elbir AM, Papazafeiropoulos A, (…), Chatzinotas S, Sept 2020, IEEE Wireless Communications Letters 9(9), pp 1447–1451 (52 citations)
2. Joint antenna selection and hybrid beamformer design using unquantized and quantized deep learning networks. Elbir AM and Mishra KV, Mar 2020, IEEE Transactions on Wireless Communications 19(3), pp 1677–1688 (46 citations)

Koziel S (TIF 17.6)
1. Multi-objective design of antennas using variable-fidelity simulations and surrogate models. Koziel S and Ogurtsov S, Dec 2013, IEEE Transactions on Antennas and Propagation 61(12), pp 5931–5939 (113 citations)
2. Efficient multi-objective simulation-driven antenna design using co-kriging. Koziel S, Bekasiewicz A, (…), Dhaene T, Nov 2014, IEEE Transactions on Antennas and Propagation 62(11), pp 5900–5905 (103 citations)


Fig. 3 Ten major publication titles

Table 3 h-index study (*outlier excluded)

AIT in antenna research   Total no. of publications   Average citation per item   h-index   % publications that can be considered as highly cited, based on h-index
ANN                       630                         7.12                        32        5.1
GAL                       2711                        11.8                        72        2.7
MLN                       622                         9.93                        39        6.7
DLN                       368                         13.51                       37        10.1
SIG                       146                         6.66*                       17        11.7
AUT                       1163                        4.47                        35        3.0
ROB                       664                         16.58                       39        5.9

(such as copyright etc.), sentence detection, part-of-speech tagging, searching for noun phrases, eliminating duplicates etc. were applied to the datasets to obtain the key words. A thesaurus was created for each dataset and used for data cleaning by merging of terms wherever appropriate, correcting spelling differences, ignoring non-significant terms etc. The resulting keywords were then searched for frequency of occurrence. Table 4 provides the information about keywords and their frequency of occurrences corresponding to each AIT.
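A simplified stand-in for this keyword extraction step is sketched below using a plain frequency count; the sample keyword strings are placeholders, and the thesaurus-based cleaning performed in the VOS viewer is reduced here to basic normalization.

```python
from collections import Counter

# Placeholder author-keyword strings, e.g. as exported in the keyword field tags of the DDB records
keyword_fields = [
    "Genetic algorithm; antenna array; pattern synthesis",
    "Antenna array; mutual coupling; genetic algorithm",
]

counts = Counter()
for record in keyword_fields:
    for kw in record.split(";"):
        counts[kw.strip().lower()] += 1          # crude normalization in place of a thesaurus

threshold = 2                                    # Table 4 uses thresholds of 5, 10 and 25
frequent = {kw: n for kw, n in counts.items() if n >= threshold}
print(frequent)
```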


Table 4 Frequencies of keywords

AIT   Starting set of keywords   No. of keywords with frequency ≥5   ≥10   ≥25
ANN   2053                       103                                 39    13
GAL   5692                       384                                 170   52
MLN   2498                       109                                 35    11
DLN   1405                       71                                  30    5
SIG   617                        17                                  8     3
AUT   3653                       111                                 34    5
ROB   2268                       51                                  11    3

4 Discussion Results obtained from the bibliometric study are discussed in this section.

4.1 Growth Rate of Research It is observed that the work on using AITs for antenna research was mostly initiated after 2000, with only about 3% of research publications appearing between 1991 and 2000. About 40–55% of the bibliometric data for all AITs except MLN and DLN belongs to the period 2006–2015. In the case of MLN and DLN, 85–95% of the activity happened during 2016–2020. Work on GAL-based antenna research has plateaued over the last 15 years compared to all other AIT-based antenna activities, presumably because most researchers tried to explore other techniques. Techniques such as DLN and MLN could exploit the higher speed of general-purpose GPU computing, which can be around two orders of magnitude faster than conventional CPU-based approaches.

4.2 Research Synchronization The regression model and the RFR study indicate that the growth curve on using AIT for antenna research fits well with the general growth curve of AIT research in most cases. The r2 value is consistently high, being >0.87 for all AITs except in the case of SIG, where it is 0.81. Thus, we can conclude that the trend followed by antenna researchers in using AIT is synchronizing well with other researchers working in similar areas.


4.3 Impactful Authors The names of the five most impactful authors based on the Technical Impact Factor are given in Table 2. Werner, DH of Pennsylvania State University must be considered the most impactful author, since he has 525 antenna-related publications to his credit with a total of 7030 citations. His first publication was in 1989 [36] and his most productive year was 2016, when he published 44 articles. His contribution towards AIT is mostly related to Genetic Algorithms, where he has 50 publications, which have earned 2082 citations. The other most impactful author is Dr. Massa, A of the University of Trento in Italy, who has published 290 articles on antennas with a total citation count of 6426. His first publications were in 2003 and ever since he has been contributing to antenna research continuously. His most productive year was 2014, when he published 43 publications, which have a total citation count of 320. His contribution towards AIT is mostly related to applying GAL to antenna issues, where he has 43 publications with a total of 1193 citations.

4.4 Publishing Titles The Institute of Electrical and Electronics Engineers (IEEE), through its journals and conferences, contributes more than 11% of the publications related to AIT research on antennas. IEEE Transactions on Antennas and Propagation has been the major publishing title, whereas Microwave and Optical Technology Letters, published by Wiley-Blackwell, has been the second biggest publishing title. The other major publishing title is the Journal of Electromagnetic Waves and Applications of Taylor and Francis. The major conferences on the topic have been the IEEE AP Society International Symposium and the IEEE Global Communications Conference. The IEEE International Conference on Robotics and Automation (ICRA) has been the leading conference where publications related to automation and robotics as applied to antennas are discussed.

4.5 Citation Study Table 5 shows the most cited papers directly related to applying AITs to antennas. Though all these publications should be considered milestone papers, the paper on particle swarm optimization by Robinson has 1438 citations, far above the others. In the case of swarm intelligence, there are 146 related articles, which together have a total of 2403 citations. Excluding the paper by Robinson, the remaining papers together have a total of only 965 citations, making this paper an outlier.


Table 5 Citation study

S No.   No. of citations   Details of the publication
1       1438               Particle swarm optimization in electromagnetics. Robinson J and Rahmat-Samii Y, Feb 2004, IEEE Transactions on Antennas and Propagation 52(2), pp 397–407
2       570                Particle swarm optimization versus genetic algorithms for phased array synthesis. Boeringer DW and Werner DH, Mar 2004, IEEE Transactions on Antennas and Propagation 52(3), pp 771–779
3       494                An introduction to genetic algorithms for electromagnetics. Haupt RL, April 1995, IEEE Antennas and Propagation Magazine 37(2), pp 7–15
4       493                Genetic algorithms in engineering electromagnetics. Johnson JM and Rahmat-Samii Y, August 1997, IEEE Antennas and Propagation Magazine 39(4), pp 7–25
5       408                Linear array geometry synthesis with minimum sidelobe level and null control using particle swarm optimization. Khodier MM and Christodoulou CG, Aug 2005, IEEE Transactions on Antennas and Propagation 53(8), pp 2674–2679
6       396                RFID research: an academic literature review (1995–2005) and future research directions. Ngai EWT, Moon KKL, (…), Yi CT, April 2008, International Journal of Production Economics 112(2), pp 510–520

By excluding this outlier, we get an average of 6.66 citations per publication, as shown earlier in Table 3.

4.6 Research Landscape An overview of the AI research landscape in the field of antennas, including research themes and trends, knowledge gaps, and AIT linkages, can be obtained from the keyword analyses presented in Table 4. This is done through density and network maps generated using the VOS viewer for the different AITs, shown in Fig. 4a–n. In the density maps, the emphasis is on the frequency of occurrence of a term: the higher the number of occurrences of a term, the denser the cloud surrounding it. In the network maps, the emphasis is on the co-occurrence of keywords. The number of co-occurrences of a pair of keywords is indicated by the width of the link between them; a pair of keywords with a larger number of co-occurrences is shown by a wider link than a pair with fewer co-occurrences, and a pair without any co-occurrence is not connected in the network map. In the present study, keywords are taken as representative of the research themes. A larger frequency of a keyword is presumed to indicate a larger intensity of the theme


Fig. 4 a ANN density map, b ANN network map, c GAL density map, d GAL network map, e MLN density map, f MLN network map, g DLN density map, h DLN network map, i SIG density map, j SIG network map, k AUT density map, l AUT network map, m ROB density map, n ROB network map

studied by researchers. Thus, a denser cloud surrounding a research theme in the density maps indicates a more intensely studied theme. Similarly, a wider link between two keywords is presumed to indicate stronger research activity involving the corresponding pair of research themes. Based on the above approach, the core research themes and trends related to each AIT are derived and shown in Table 6. Research themes and trends: The research themes for each AIT are divided into (a) the types of antennas and (b) other application areas. Whereas ANN and GAL have


Table 6 Core research areas

ANN   Antenna types: Patch, Microstrip, Electrically thin, Array antennas; Other applications: Frequency tuning, Bandwidth; Linkages with other AITs: DLN, GAL, MLN, PSO
GAL   Antenna types: Patch, Microstrip, Phased arrays, Array antennas; Other applications: Multi-objective optimization, Mutual coupling, Null control, Pattern synthesis; Linkages with other AITs: PSO, Differential Evolution
MLN   Antenna types: Array antennas; Other applications: Beam forming, Channel estimation, Classification, Cognitive radio, IoT, MIMO, Localization, Wireless communication; Linkages with other AITs: DLN, ANN
DLN   Antenna types: Array antennas; Other applications: Antenna selection, Signal processing, Beam forming, Channel estimation, MIMO, Localization, Wireless communication, Feature extraction; Linkages with other AITs: MLN, ANN
SIG   Antenna types: Array antennas; Other applications: Wireless communication; Linkages with other AITs: PSO, GAL, Differential Evolution
AUT   Antenna types: Patch, Microstrip slot, Monopole, Diversity, Array antennas; Other applications: Design automation, MIMO, RFID, OFDM, Radar, Wireless LAN; Linkages with other AITs: PSO
ROB   Other applications: Dynamics, Fabrication, Navigation, Localization, RFID

been used for the design of different antenna types, MLN, DLN and SIG have found applications prominently in the design and optimization of array antennas. ROB has found use mostly in the other application areas. ANN has been mostly used for fine tuning of individual antennas as well as for improving the overall operating bandwidth of an antenna during its design phase. GAL has found applications in multi-objective optimization and in various array control techniques, including null control and the study of mutual coupling. MLN and DLN are recent techniques, which are found useful in channel estimation, classification, and beam forming in the design of MIMO, wireless communication, and cognitive radio systems. Research in the fields of Automation and Robotics is mostly in specialized applications such


as in Radar, OFDM, Wireless LAN, RFID etc. Robotics has been mostly studied in the context of fabrication and dynamics associated with antenna issues. Knowledge gaps: In the network maps, weaker links between research themes can be interpreted as areas where research activities are not yet fully mature and possibilities exist for further exploration. Based on the network maps in Fig. 4, it could be inferred that some of the knowledge gaps which could be exploited by researchers are the application of DLN/MLN to designing 5G communications, the application of automation to radar and other system designs, the application of ANN to improving the bandwidth of antennas, the application of robotics to antenna fabrication, and the application of swarm intelligence/PSO to multiband systems. AIT linkages: Table 6 also shows how each AIT links with other AITs based on the bibliometric study. ANN, being one of the older techniques, has been studied in conjunction with almost all other techniques. GAL, SIG, PSO, and Differential Evolution are clearly interconnected since they are evolutionary techniques operating simultaneously on multiple solutions. Similarly, MLN and DLN have evolved from ANN and thus are closely related, which is also clearly visible from the bibliometric data.

5 Conclusions The bibliometric study on applying Artificial Intelligence techniques to antennas during the period 1991–2020 covered over 6300 research documents and 7 AI techniques. The study of the growth profile shows that work on GAL-based antenna research has plateaued over the last 15 years compared to all other AIT-based antenna activities. In general, the trend followed by antenna researchers in using AITs synchronizes well with that of other researchers working on these techniques for other applications. The most impactful authors and the major publishing titles were also identified. The paper on particle swarm optimization by Robinson was observed to be the most cited paper compared to all others. Density and network maps generated using the VOS viewer are helpful in obtaining an overall landscape view of the various research themes, which can help new researchers. Acknowledgements Symbiosis International University provided the necessary motivation for the work reported in this paper. Library services at SIU are gratefully acknowledged for their timely help. Discussions held with Dr. Urvashi Rathod are gratefully acknowledged.


References 1. Wang C-X, Renzo MD, Stanczak S, Wang S, Larsson EG (2020) Artificial intelligence enabled wireless networking for 5G and beyond: recent advances and future challenges. IEEE Wireless Commun 27(1):16–23 2. Bi S, Zhang R, Ding Z, Cui S (2015) Wireless communications in the era of big data. IEEE Commun Mag 53(10):190–199 3. Fu Y, Wang S, Wang C-X, Hong X, McLaughlin S (2018) Artificial intelligence to manage network traffic of 5G wireless networks. IEEE Netw 32(6):58–64 4. Erricolo D et al (2019) Machine learning in electromagnetics: a review and some perspectives for future research. In: Proceedings on international conference of electromagnetics in advanced applications (ICEAA), 1377–1380. https://doi.org/10.1109/ICEAA.2019.8879110 5. Massa A, Marcantonio D, Chen X, Li M, Salucci M (2019) DNNs as applied to electromagnetics, antennas, and propagation—a review. IEEE Antennas Wirel Propag Lett 18(11):2225–2229. https://doi.org/10.1109/LAWP.2019.2916369 6. Campbell SD, Jenkins RP, O’Connor PJ, Werner D (2021) The explosion of artificial intelligence in antennas and propagation: how deep learning is advancing our state of the art. IEEE Antennas Propag Mag 63(3):16–27. https://doi.org/10.1109/MAP.2020.3021433 7. Huang C et al (2022) Artificial intelligence enabled radio propagation for communications— part II: scenario identification and channel modeling. IEEE Trans Antennas Propag 70(6):3955– 3969 8. Zardi F, Nayeri P, Rocca P, Haupt R (2021) Artificial intelligence for adaptive and reconfigurable antenna arrays: a review. IEEE Antennas Propag Mag 63(3):28–38. https://doi.org/10.1109/ MAP.2020.3036097 9. Bayraktar, Anagnostou DE, Goudos SK, Campbell SD, Werner DH, Massa A (2019) Guest editorial: special cluster on machine learning applications in electromagnetics, antennas, and propagation. IEEE Antennas Wireless Propag Lett 18(11):2220–2224. https://doi.org/10.1109/ LAWP.2019.2945426 10. Haupt R, Rocca P (2021) Artificial intelligence in electromagnetics [guest editorial]. IEEE Antennas Propag Mag 63(3):14. https://doi.org/10.1109/MAP.2021.3069181 11. Goudos SK, Anagnostou DE, Bayraktar Z, Campbell SD, Rocca P, Werner DH (2021) Guest editorial: special section on computational intelligence in antennas and propagation: emerging trends and applications. IEEE Open J. Antennas Propag. 2:224–229. https://doi.org/10.1109/ OJAP.2021.3057997 12. Andriulli F, Chen PY, Erricolo D, Jin JM (2022) Guest editorial—machine learning in antenna design, modeling, and measurements. IEEE Trans Antennas Propag 70(7):4948–4952 13. Takano, Yasutomo, Kajikawa, Yuya and Ando, Makoto, Trends and Typology of Emerging Antenna Propagation Technologies Identified by Citation Network Analysis, 2014 Portland int. conf. on Management of Engineering and Technology, (PICMET), July 27–31, 2014 14. Hicks D, Wouters P, Waltman L et al (2015) Bibliometrics: the Leiden Manifesto for research metrics. Nature 520:429–431. https://doi.org/10.1038/520429a 15. Donthu N, Kumar S, Mukherjee D, Pandey N, Lim WM (2021) How to conduct a bibliometric analysis: an overview and guidelines. J Bus Res 133:285–296, ISSN 0148–2963 16. Narin F (1987) Bibliometric techniques in the evaluation of research programs. Sci Public Policy 14(2):99–106. https://doi.org/10.1093/spp/14.2.99 17. Ellegaard O, Wallin JA (2015) The bibliometric analysis of scholarly production: how great is the impact? Scientometrics 105:1809–1831. https://doi.org/10.1007/s11192-015-1645-z 18. 
Donthu N, Kumar S, Pandey N, Gupta P (2021) Forty years of the international journal of information management: a bibliometric analysis. Int J Inf Manage 57, Article 102307 19. Donthu N, Kumar S, Pandey N, Lim WM (2021) Research constituents, intellectual structure, and collaboration patterns in Journal of International Marketing: an analytical retrospective. J Int Market 20. Verma S, Gustafsson A (2020) Investigating the emerging COVID-19 research trends in the field of business and management: a bibliometric analysis approach. J Bus Res 118:253–261


21. Sujatha D, Padmini K (2015) IEEE transactions on antennas and propagation: a bibliometric study. DESIDOC J Library Inf Technol 35(6):443–449 22. Dixit AS, Shevada LK, Raut HD, Malekar RR, Kumar S (2020) Fifth generation antennas: a bibliometric survey and future research directions. Library Philosophy and Practice (e-journal). 4575. https://digitalcommons.unl.edu/libphilprac/4575 23. Raut H, Shevada L, Malekar R, Dixit A, Kumar S (2021) Metamaterials in 5G antenna designs: a bibliometric survey. Library Philos Practice 24. Saproo V, Choudhary S, Priya S, Payasi S, Kumar S, Tupe-Waghmare P (2021) A brief bibliometric survey on microstrip antennas for machine-to-machine (M2M) communication in smart cities 25. Malekar RR, Shevada LK, Raut HD, Dixit AS, Kumar S (2020) MIMO antenna for fifth generation mm-wave applications: a bibliometric survey. Library Philosophy and Practice (ejournal). 4854. https://digitalcommons.unl.edu/libphilprac/4854 26. Shevada LK, Raut HD, Malekar RR, Dixit AS, Kumar S (2020) A bibliometric survey on ultrawideband multiple input multiple output antenna with improved isolation. Library Philosophy and Practice (ejournal). 4841. https://digitalcommons.unl.edu/libphilprac/4841 27. Luckin R, Holmes W, Griffiths M, Forcier LB (2016) Intelligence unleashed: an argument for AI in education. Pearson, London 28. Adriaanse LS, Rensleigh C (2013) Web of science, Scopus and Google scholar: a content comprehensiveness comparison. Electron Libr 31:727–744. https://doi.org/10.1108/EL-122011-0174 29. de Winter JCF, Zadpoor AA, Dodou D (2014) The expansion of Google Scholar versus Web of science: a longitudinal study. Scientometrics 98:1547–1565. https://doi.org/10.1007/s11192013-1089-2 30. Falagas ME, Pitsouni EI, Malietzis GA, Pappas G (2008) Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. FASEB J. 22:338–342. http:// dx.doi.org/https://doi.org/10.1096/fj.07-9492LSF 31. Martín-Martín A, Orduna-Malea E, Thelwall M, López-Cózar ED (2018) Google scholar, web of science, and scopus: a systematic comparison of citations in 252 subject categories. J Inf 12(4):1160–1177 32. https://www.bibliometrix.org/documents/Field_Tags_bibliometrix.pdf 33. Perrault R, Shoham Y, Brynjolfsson E, Clark J, Etchemendy J, Grosz B, Lyons T, Manyika J, Mishra S, Niebles JC (2019) The AI Index 2019 annual report. AI Index Steering Committee, Human-Centered AI Institute, Stanford University, Stanford, CA. https://creativecommons. org/licenses/by-nd/4.0/legalcode 34. Hirsch JE (2005) An index to quantify an individual’s scientific research output. Proc Natl Acad Sci USA 102:16569–16572 35. Van Eck NJ, Waltman L (2010) Software survey: VOS viewer, a computer program for bibliometric mapping. Scientometrics 84:523–538 36. Werner DH, Ferraro AJ, Cosine pattern synthesis for single and multiple main beam uniformly spaced linear arrays. Trans IEEE AP 37(11):1480–1482

Economical Solution to Automatic Evaluation of an OMR Sheet Using Image Processing
Pramod Kanjalkar, Prasad Chinchole, Archit Chitre, Jyoti Kanjalkar, and Prakash Sharma

Abstract An Optical Mark Recognition (OMR) sheet, also called a bubble sheet, is a special type of form used to answer graded multiple choice question examinations, where students have to mark or darken the bubbles to answer the questions. OMR sheets are used in various school, college, university and competitive examinations. Larger institutions like universities have specialized machines to optically detect the marked answers and grade the student. However, these scanners are expensive and cannot be afforded by small tuition classes or individual teachers. With the help of mobile phones, it is easily possible to scan multiple sheets within seconds. The user is expected to upload a single file of all responses and view the results in a tabular format. Using various image processing techniques, the algorithm evaluates the responses of the student and displays the grade and percentage on the input image, along with the correctly marked, wrongly marked and actual answers in green, red and yellow respectively. The user also has access to an interactive dashboard which presents various analytics regarding the students' performance. A unique feature of this implementation is that users can choose the number of questions, number of choices, marks per question etc., after which they can download a customized OMR response sheet. We experimented on 300 sample OMR sheets with 12,800 questions and obtained an average accuracy of 99.12%. This paper aims to present a cost-effective solution to accurately scan OMR sheets without the need for scanners.


Keywords Contours · Cost-effective · Edge detection · Flask · Image processing · OMR · OpenCV · Perspective transform · Thresholding

1 Introduction The automatic technique of capturing data in the form of marks such as bubbles or ticks is known as Optical Mark Recognition. When a huge amount of data has to be collected manually, OMR-based assessment is preferred. OMRs are frequently used in graded multiple-choice exams where students must mark their responses by darkening the circles on a pre-printed paper. Next, the sheet is assessed using an image scanning device. These devices project a light beam onto the paper while accounting for variations in the amount of light that is reflected back, as the marked region reflects less light than the unmarked regions. However, they cannot detect the shape of the mark. Moreover, these machines cost anywhere between Rs. 25,000 and 30,000 and can only be afforded by bigger institutions like schools or universities. Popular examinations conducted in India, like the JEE, have moved onto online solutions and conduct examinations on their own online portals. However, for small classes, tuition groups or individual teachers with limited economic capacity, buying these scanners or setting up their own online portals would generally not be an option. Even in various developing countries, buying and maintaining these devices becomes increasingly expensive. Therefore, this paper presents a full-fledged and cost-effective solution to grade students quickly and efficiently using image processing techniques. This paper presents an image processing algorithm to scan an answer sheet and detect the responses given by a student. The work presented in this paper is implemented in Python for the image processing using the OpenCV library. The entire project is deployed as a web application for the end user. HTML, CSS, and JavaScript are used to create the frontend, the Flask framework is used to create the backend, and SQLite is used as the database. Flask is a web framework that provides libraries to build lightweight web applications in Python. The user is expected to log in to the application. After logging in, the user has access to the dashboard where he has to perform 3 actions:
• Create a test
• Upload a PDF of all answer sheets
• View results and export them as a PDF or Excel file.
The user is expected to scan all the answer sheets, upload a single file to the application using their laptop/desktop and automatically see the results of all scanned sheets by comparison with the answer key stored in the database. This method is very efficient in the case of a large number of students, where evaluating manually would be a tedious task. The main aim of the project presented in this paper is to use various


image processing techniques to automate the entire process of evaluating an OMR sheet quickly and efficiently. The rest of the paper is organized as follows. In Sect. 2, a comprehensive literature review is done and some features are mentioned which distinguish this work from previous works. In Sect. 3, the implementation and working of the web application are described. In Sect. 4, the entire image processing algorithm running at the backend is explained in a step-wise manner. In Sect. 5, the experimentation process carried out is described. Section 6 talks about future implementations that can be added to the project. Finally, Sect. 7 concludes the paper, mentioning some advantages and the future scope of the project. At last, the references are listed in Sect. 8.

2 Literature Review The strategy proposed by Ismael Amara Belag, Yasemin Gultepe, and Tark Melude Elmalti is based on the creation of a template answer sheet and a key-points detection algorithm. The suggested method employs thresholding, vertical and horizontal projections, and algorithms to automatically determine the number of right responses. The proposed method is able to detect more than one or no selected choice. In this study they tested more than 100 exam papers from 5 different formats [1]. Nalan Karunanayake has presented a low-cost web camera-based OMR sheet evaluation system that can effectively assess any format of MCQ paper. To extract the responded portion of the student answer script, the manually marked region with all right answers is first extracted from the printed paper and utilized as a template image in the matching procedure. The responses marked as right or incorrect are then identified by comparing the cropped portion of the answer sheet with the template picture. The author tested using three distinct MCQ paper layouts, each of which had a different number of answers in a column. The accuracy of the findings was 97.6 percent. Additionally, the author pointed out that pencil-marked bubbles are detected less accurately than those made with a pen [2]. Apart from the above research, Nikita Kakade from Pune used image processing to evaluate OMR sheets, and also created a supporting hardware setup for this study that included a conveyor belt and a microcontroller. OMR response sheets are put on a conveyor belt that moves when the microcontroller gives the command. A snapshot is taken as soon as the sheet passes in front of the webcam, and this snapshot is used as the input. After that, image processing techniques are applied, and results are displayed once the responses are compared with the database [3]. The training and recognition phases of the OMR system developed in that study are based on the Modify Multi-Connect Architecture (MMCA) associative memory. The suggested approach can also detect more than one or no selected choice. The system displays an accuracy of 99.96 percent in the recognition of marked answers among 800 test samples using 8 different types of grid response sheets and a total of 58,000 questions [4].


In this paper, Vaibhav Chidrewar, Junwei Yang, Donguk Moon have proposed a mobile phone application for auto grading of answer sheets. In their implementation, they use OpenCV library to grade the answer sheet using key-point detection algorithm. They have also done a comparative analysis based on different angles and tilt and compare the accuracy [5]. The technique of collecting data from multiple-choice forms is described in this paper as the identification of optical markings. Python software and the OpenCV image processing package were used to create the recognition programme. Also, Imutils and ZBar libraries are used. Additionally, a programme has been created that prints the data contained in a QR Code on the optical form in a file along with the student’s indicated responses on the optical form [6]. The features which stand out in this implementation as compared to the abovementioned papers are—the algorithm is independent of the number of questions and number of choices; the question number and choices can be set by the user accordingly. For further personalization, users can add their own institution logo and name on the response sheet. Users just have to scan all images and upload one single PDF file for evaluation after which all results will be displayed at once in a tabular format which can be exported as a CSV/PDF file. The algorithm also automatically detects the roll number and question set and stores accordingly in the database. Using perspective transformation technique, the algorithm identifies the box to display the results. Subsequently, marks and percentage are displayed on the input image. Last but not the least, all the analytics corresponding to a particular test is calculated from the results. The final accuracy obtained in this implementation comes out to be 99.12%

3 Proposed System This web application is developed from the perspective of the user who will be grading the students. The application is designed such that the user can create a new test and upload its answer key, upload the answer sheets, or view student results. The frontend is developed using HTML, CSS and JavaScript. Bootstrap, an open-source CSS framework, was also used for many components to create a responsive and mobile-friendly web application. Flask, a compact and lightweight Python-based framework, is utilized for the backend. Flask offers a wide range of practical tools and features to accelerate online applications. As for the database, SQLite3 is used. SQLite belongs to the family of embedded databases; it is a library that one can embed in their apps. The main advantage of SQLite is that it does not require a server to fetch the data, and the entire database is stored locally on the machine. Various endpoints have been defined and each endpoint, when requested, sends an appropriate response back to the user. All scanned images are sent to the "/upload-answer-sheets" route where each image is fed into the algorithm and the data (marks) is returned. The user has access to a dashboard wherein he can either create tests, upload answer sheets or view results (Fig. 1).
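A minimal sketch of such an endpoint is shown below. It is not the authors' code: the helper functions pdf_to_images() and evaluate_sheet() are hypothetical stand-ins for the PDF splitting and the OpenCV grading pipeline of Sect. 4, and the request field names are illustrative.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def pdf_to_images(pdf_file):
    # Placeholder: each page of the uploaded PDF would be rendered to an image
    # (e.g. with a PDF rasterizing library); here it simply returns an empty list.
    return []

def evaluate_sheet(page_image, test_id):
    # Placeholder for the image processing pipeline of Sect. 4 (grading one sheet).
    return {"test_id": test_id, "marks": None}

@app.route("/upload-answer-sheets", methods=["POST"])
def upload_answer_sheets():
    pdf_file = request.files["answer_sheets"]   # single PDF containing all scanned sheets
    test_id = request.form["test_id"]           # selects the stored answer key
    results = [evaluate_sheet(img, test_id) for img in pdf_to_images(pdf_file)]
    return jsonify(results)
```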


Fig. 1 Dashboard of the application. User can create new test, upload answer sheets or view results

A web application is developed for the end user to use this system. The user has to login with valid credentials to access the dashboard. After logging in, the user has to perform 3 steps: 1. Creating the Customized OMR Sheet User has to fill out a form with all the configurations of the test after which a customized OMR sheet can be downloaded. The user can then distribute the copies of these sheets to his/her students. The configurations required are:
• Number of questions
• Number of choices
• Marks per question
• Negative marking per question.

For personalization, teachers can also upload their logo and Institute name to be displayed on the OMR sheet (Figs. 2, 3). After downloading the answer sheet, the user can create the test and set its answer key. After submitting, the test will be saved in the database with its configurations and answer key (Figs. 4, 5). 2. Uploading Answer Sheets User will have to scan all responses filled by the students into a single PDF. The user will then choose the test (by test name) and then upload the PDF file for evaluation. Each answer sheet of the student in the PDF will be converted into an image at the backend which will then be fed to the algorithm. After submitting, the user will be redirected to the score page where the student’s roll number, score, percentage, grade, the image of the evaluated sheet and final result is displayed in a tabular format (Fig. 6). All results will be stored in the database. These results can be exported and downloaded as a CSV or PDF file for users to share easily.


Fig. 2 Form for creating customized OMR sheet. The form has 7 steps to customize all the parameters of the test Fig. 3 Sample OMR sheet downloaded for 40 questions and 4 choices. Similarly, user can download sample OMRs for 10 to 100 questions


Fig. 4 Form to set test name and select the number of sets

Fig. 5 Form to select the number of sets

Fig. 6 Results page after submitting all the images. Detected roll number, set number, calculated marks, percentage, grade and result are displayed in a tabular format


3. Results All the tests created by the user will be listed according to date created. The user can click on each test to view the scores and performance of the students again.

4 Methodology The steps in the methodology of the research proposal are depicted in the flow chart (Fig. 7).

4.1 Convert RGB Image to Gray Scale The input images are images of students' response sheets. Since OMR answer sheets come in a variety of colors, it is necessary to transform them to a uniform color space for the rest of the process to run properly. Therefore, the RGB input image is converted to a gray scale image.

4.2 Image Smoothing By convolving the picture with a low-pass filter kernel, the image gets blurred, which can be used to reduce noise. The Gaussian filter is used to lower the high-frequency components. When it comes to removing noise from an image, Gaussian blurring is an extremely effective technique.

Fig. 7 Flowchart of the algorithm. The algorithm can be broadly divided into 9 steps


4.3 Detecting Edges Edge detection is a type of image processing that identifies the edges of objects or regions inside an image. Edge detection necessitates computing numerical derivatives of pixel intensities, which generally results in 'noisy' edges. In other words, the brightness of nearby pixels can fluctuate considerably in a picture, leading to edges that do not correspond to the dominant edge structure we are looking for. It is easier to identify the dominant edge structure of the image when the intensity fluctuations around the edges are smoothed out by blurring. For detecting the edges in the image, we use Canny edge detection, one of the most often used edge detection techniques due to its resilience and adaptability (Figs. 8, 9, 10). Fig. 8 Gray scale image
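The three pre-processing steps above can be sketched as follows; the array standing in for the scanned sheet, the kernel size and the Canny thresholds are illustrative choices, not the authors' exact settings.

```python
import cv2
import numpy as np

# Placeholder for the scanned response sheet; in practice this would come from
# cv2.imread() on the image extracted from the uploaded PDF.
image = np.full((900, 700, 3), 255, dtype=np.uint8)

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)       # 4.1: colour image -> gray scale
blurred = cv2.GaussianBlur(gray, (5, 5), 1)          # 4.2: Gaussian smoothing to suppress noise
edges = cv2.Canny(blurred, 50, 150)                  # 4.3: Canny edge map
```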


Fig. 9 Blur image

4.4 Finding Rectangular Contours Using contour detection, we can easily identify an object's outlines in a photograph. The findContours() function is used to discover the contours in a picture, and the drawContours() function places the contours on the input RGB image once they have been found. The contourArea() function is used to sort the distinct rectangles by their area. Then, from the available rectangles, we pick the biggest ones. Because we require the external corner points of the rectangles, we use cv2.RETR_EXTERNAL. Then we reorder the corner points (Fig. 11).
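A sketch of this step is given below, using a synthetic edge map in place of the real one; the four-corner check and the area threshold are illustrative choices.

```python
import cv2
import numpy as np

# Synthetic edge map with one rectangular outline standing in for the Canny output
edges = np.zeros((900, 700), dtype=np.uint8)
cv2.rectangle(edges, (50, 100), (650, 850), 255, 2)

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
rect_contours = []
for c in contours:
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)     # polygon approximation of the contour
    if len(approx) == 4 and cv2.contourArea(c) > 5000:  # keep only large quadrilaterals
        rect_contours.append(approx)

rect_contours.sort(key=cv2.contourArea, reverse=True)   # biggest rectangle first
corners = rect_contours[0].reshape(4, 2) if rect_contours else None
```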

4.5 Warp Perspective of the Bubbled Area The transformation or perspective correction of an image from an angled view to a bird's eye view is known as perspective warping. We used the warpPerspective() function to extract the required rectangular portion of the input image as a bird's eye view (Figs. 12, 13).
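A sketch of the warp, with placeholder corner coordinates and an illustrative output size, is shown below.

```python
import cv2
import numpy as np

gray = np.full((900, 700), 255, dtype=np.uint8)          # stand-in for the gray-scale scan
# Placeholder corner points of the biggest rectangle, reordered as
# top-left, top-right, bottom-left, bottom-right
corners = np.float32([[55, 120], [640, 118], [52, 880], [642, 884]])

width, height = 600, 760                                  # illustrative output size
target = np.float32([[0, 0], [width, 0], [0, height], [width, height]])

matrix = cv2.getPerspectiveTransform(corners, target)
bubble_area = cv2.warpPerspective(gray, matrix, (width, height))   # bird's-eye view crop
```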


Fig. 10 Canny edge detection

Fig. 11 Biggest contours are identified on the input image, and corner points of the biggest contours which are of our interest are marked


Fig. 12 Warp perspective of marked area
Fig. 13 Warp perspective of roll number


Fig. 14 Threshold image of marked area

4.6 Thresholding First, we obtain the threshold image from the warp perspective of the marked area. Thresholding, the simplest type of image segmentation, is used to produce binary pictures. The pictures are in black and white: the black part is the "undetected" part and the white part is the "detected" part. We frequently employ thresholding to isolate the portions of a picture that are of interest to us while disregarding the others. It enhances the marked area in the binary image. During this process, the size of the image is reduced and unnecessary gray level information is removed (Fig. 14).
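A sketch of the thresholding step is given below; the fixed threshold value is illustrative (adaptive or Otsu thresholding could equally be used), and THRESH_BINARY_INV is chosen so that the dark, marked bubbles become white.

```python
import cv2
import numpy as np

bubble_area = np.full((760, 600), 255, dtype=np.uint8)   # stand-in for the warped gray crop
_, thresh = cv2.threshold(bubble_area, 170, 255, cv2.THRESH_BINARY_INV)
```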

4.7 Finding the Marked Responses To detect the marked bubble for each question, we find the non-pixel value (i.e., the count of non-zero pixels in the thresholded image) of each bubble. We obtain the non-pixel values of the bubbles for each question, out of which we choose the highest value and consider it as the marked bubble for that question. In Fig. 15, we obtain the non-pixel values of the bubbles for one row. The index of the highest value is considered as the response given; in this case, the index of the marked answer would be 1. This process is repeated for all subsequent questions. The process described above is also used for finding the roll number and the set number filled in by the student (Fig. 16).
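A sketch of this step is shown below, assuming the thresholded bubble area forms a uniform grid of question rows and choice columns; real sheets may need per-block splitting, and the grid dimensions here are illustrative.

```python
import cv2
import numpy as np

questions, choices = 10, 4
thresh = np.zeros((500, 400), dtype=np.uint8)        # stand-in for the thresholded bubble area

marked = []
for row in np.vsplit(thresh, questions):             # one horizontal strip per question
    cells = np.hsplit(row, choices)                  # one cell per choice bubble
    pixel_counts = [cv2.countNonZero(cell) for cell in cells]
    marked.append(int(np.argmax(pixel_counts)))      # densest cell taken as the marked bubble
```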


Fig. 15 Non-pixel values for a single row

Fig. 16 Non-pixel values for set number

4.8 Evaluating the Responses Once we have the indices marked by the student, they are compared to the actual answer key. Correctly marked answers are shown in green, wrongly marked answers in red and the actual answers in yellow (Fig. 17).

4.9 Displaying Results After getting the indices of marked bubbles, we can subsequently calculate marks obtained, percentage and grade. This data is also displayed on the input image. It is displayed on a blank image after which we take the inverse perspective transform of that image and superimpose it onto the original image (Fig. 18).


Fig. 17 Actual responses are highlighted in yellow, incorrect answers in red, and correct answers in green
Fig. 18 Displaying the results on the image


Fig. 19 The final output image showing the calculated marks, percentage and grade

4.10 Final Output The image obtained in Fig. 18 is combined with the original image using the cv2.addWeighted() function. The marks obtained, percentage and grade are accurately displayed in the corresponding boxes on the input image. Correctly marked answers are shown in green, wrongly marked answers in red and the actual answers in yellow (Fig. 19).
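The blending step of Sects. 4.9 and 4.10 can be sketched as below; the identity matrix stands in for the forward perspective transform computed earlier, and the marks text, position and image sizes are illustrative.

```python
import cv2
import numpy as np

original = np.full((900, 700, 3), 255, dtype=np.uint8)   # stand-in for the input scan
h, w = original.shape[:2]

canvas = np.zeros_like(original)                          # blank image to draw results on
cv2.putText(canvas, "Marks: 36/40  90%  Grade A", (40, 60),
            cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)

matrix = np.eye(3, dtype=np.float32)                      # placeholder for the forward transform
canvas_back = cv2.warpPerspective(canvas, np.linalg.inv(matrix), (w, h))  # inverse warp

final = cv2.addWeighted(original, 1.0, canvas_back, 1.0, 0)               # superimpose results
```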

5 Experimentation and Results The results of testing the suggested method with 7 distinct OMR formats are shown in Table 1. Four sets of a simple quiz were created and the answer sheets and questionnaire were distributed among our classmates. After that, on the website, a test was created for each of the 7 formats separately with all the configurations set: number of questions, marks per question (set as 2), number of choices (set as 4), negative marking (set as 0) and number of sets (set as

Table 1 Experimentation results

Type of OMR   Number of questions   Number of OMRs   Accuracy (%)
Type 1        10                    50               100
Type 2        20                    50               100
Type 3        30                    50               100
Type 4        40                    50               99.37
Type 5        60                    40               98.33
Type 6        80                    30               98.70
Type 7        100                   30               97.50

4). The test was named "Simple Quiz" and the answer key for each set was stored in the database. After the responses were collected, a PDF file of all scanned images was uploaded to the website. The results obtained after uploading are described in Table 1. Figure 20 represents how accuracy varies as the number of questions increases. Following are the observations derived from the graph:
• It is observed that accuracy slightly decreases as the number of questions increases.
• We obtain 100 percent accuracy for 10, 20 and 30 questions. On increasing further, accuracy slightly decreases.
• Accuracy of the algorithm depends on various factors:
– Quality of input image
– Brightness of input image

Fig. 20 Accuracy versus number of questions graph. It is observed that as number of questions increases, accuracy slightly decreases


– Method of filling bubbles
– Multiple fills.
The average accuracy resulting from this experimentation is 99.12%. But if the quality and brightness of the image are ensured and the bubbles are filled properly without errors, we can expect an accuracy of 100%.

6 Future Scope
1. Currently, the algorithm presented in the paper does not take into account the case where the user does not mark an answer at all; the bubble having the highest pixel count is always considered as the marked response. This problem can be handled by treating a question as unanswered if the non-pixel value falls outside a specified range, in which case the algorithm can be modified to award 0 marks to that question.
2. The current OMR answer sheet does not include the student name, subject ID and date. The same approach described above can be used to detect these as well.
3. The algorithm can be modified to also implement Optical Character Recognition (OCR) to detect the name, subject, institute name etc.
4. A QR code can be added to the sample OMR sheet. This QR code can contain all the information regarding the OMR sheet in a pictorial format. On scanning the QR code, one can get the number of questions, number of choices, number of sets etc.
5. An option can be given to the concerned teacher to edit the test answer key or test name.
6. After evaluation of all answer sheets, the data obtained can be visualized in tabular as well as graphical formats, for example, subject-wise test reports, individual test reports, average class score, average subject score, highest score, comparative analysis with the highest performing students etc.
7. This algorithm can even be modified to automatically evaluate multiple choice-based surveys, feedback forms, questionnaires etc.
8. The proposed system can also be deployed as an Android/iOS application to be used on mobile phones.

7 Conclusion A technique for assessing optical mark recognition sheets is presented in this study. This project is expected to be used by small-scale institutions, which can grade a large number of students efficiently and quickly. The system can also be used for other applications, including questionnaires, surveys and forms. The only limitation of the proposed system is that the algorithm is highly dependent on the layout of the OMR sheet. Slight


changes in the sheet may cause the algorithm to give false results. Organizations can generate the sample OMR sheets as per their requirements. One of the advantages of using this application is that it is independent of the number of questions and the number of choices. Users are free to design their tests as they wish. The user interface is designed with the aim of making this web application easy to use. The application can also be scaled up so that students can view their results and the entire process can be taken online. In doing so, institutions will gain access to a large amount of data regarding students' performance. Hence, fast and reliable systems like this will enable teachers to save time, money and energy. Performance analysis can be done easily as users have all the required data at their disposal.


Defense and Evaluation Against Covert Channel-Based Attacks in Android Smartphones Ketaki Pattani and Sunil Gautam

Abstract The Android operating system (OS) currently occupies the majority of the global smartphone market. Even IoT-specific applications commonly have Android as the prevailing OS on their end devices or intermediary communication channels. These Android smartphones may store sensitive data such as texts, banking information, personal identification numbers (PIN), contact-based information, GPS/location-specific information, images, movies, IoT device operations, and so on. Furthermore, Android devices are popular among users due to their extensive capabilities and multiple connectivity options, making them a perfect target for attackers. To get their task done, attackers are shifting to methods that neatly evade existing state-of-the-art tools and defenses. One such strategy is evasion, which is used to deceive security systems or conceal information flow in order to avoid detection. On the other hand, covert channels disguise the existence of the exchange itself, making it unidentifiable to both users and cutting-edge technology. These covert channels, by employing evasive methods, become extremely difficult to detect and bypass the security architecture, enabling the covert maintenance or transmission of the user's confidential information. This research evaluates and analyses existing state-of-the-art technologies, and identifies potential defense mechanisms for mitigating and detecting such threats. Keywords Android security · Mobile security · Evasion · Covert channels · Covert communication · Defense · Mitigation · Smartphones

K. Pattani (B) Department of Engineering and Physical Sciences, Institute of Advanced Research, Gandhinagar, Gujarat, India e-mail: [email protected] S. Gautam Institute of Technology, Nirma University, Ahmedabad, Gujarat, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_49


1 Introduction According to International Data Corporation (IDC) business statistics, by 2021 Android smartphones had attained worldwide market dominance with a share of roughly more than 83%, and this is expected to increase further by 2025 [1]. The usage of Android devices as edge devices in IoT, embedded systems and monitoring systems is also increasing. Despite the fact that mobile operating system reliability has received substantial research attention, and the safety of smartphone OSs has been consistently strengthened, the battle against threats will continue, and prevailing as well as forthcoming edge-specific OSs may not incorporate the recent safety breakthroughs that smartphone OSs have been embracing. The majority of mobile phone and edge operating systems allow users to run several apps at the same time. App communication should take place via controlled pathways. Inter-app communications on Android, for example, should be done via Android APIs so that the Android OS can monitor data exchanges and keep the user aware if required. Covert channel interactions between user apps that the system is not aware of pose a huge risk. To communicate data across separate processes, mobile devices use information-hiding mechanisms. A local covert channel, particularly in the case of smartphones, can be used to establish a transmission connection between two colluding services in order to collect confidential information [2, 3]. Smartphones are especially vulnerable to hidden communication attempts owing to their diverse underlying hardware, which includes cameras, cellular networks, GPS, Bluetooth, Wireless LAN, and several sensors [4]. Furthermore, malware authors have shifted a substantial amount of their focus to mobile devices, resulting in a sharp growth in smartphone malware over the past few years [5]. Covert transactions among apps that the operating system is unaware of pose a significant risk to the user's confidentiality and safety. This may lead to risks of backdoors, whereby the connection between two apps x and y can gain access to data specified by the union of x's and y's permissions, as well as the danger of de-anonymization, whereby a web session running in a mobile Tor browser for secrecy can establish a covert communication with another on-device app x [6]. A covert channel, in contrast to standard secret message transfer systems, hides not only the transmitted information but also the flow path [7]. Covert channels, in particular, retain two features to ensure information hiding: security of the communication data/information and of the connections/paths. Covert channels significantly increase the security of both elements [8]. There are several covert channels specialized to Android systems, such as the screen settings channel, vibrator settings channel, free space or shared space, system settings channel, event intents, broadcast intents, volume settings channel, thread enumeration, files channel, UNIX sockets, and processor statistics and frequency [9]. These may be accessible by numerous functions at the same time and are typically used to relay system state or execute malicious behaviors. They do not require any harmful permission, and the defensive system ignores them since they are presumed to be providing control information. While intent-specific


or intent-utilizing covert channels may grab the victim's attention and processor-specific covert communications may consume the device's battery power, others are considered invisible. They provide a significant level of throughput and are more than adequate in most circumstances where short but highly valuable information has to be delivered. More than one covert channel can also be utilized in tandem to achieve a higher level of obfuscation. Extensive research initiatives aimed at improving developer understanding of the vulnerabilities posed by covert communication channels are essential in order to identify common root causes and avoid duplicated or conflicting remedies [9, 10]. Such knowledge helps avoid, at the early stages of development of standards, apps, and so on, the flaws that can be leveraged by such exploits. However, several research publications have demonstrated constructive uses of hidden channels [11–14], which is expected, as covert channels may also be used constructively. This part provides basics on the development of covert channel intrusions and explains how such an issue can quickly escalate into legitimate difficulties that must be addressed. Section 2 provides a brief review of covert channel types, and Section 3 describes the widespread usage of covert channel approaches in new technologies such as the IPv6 protocol, Internet of Things (IoT), and VoLTE-based technologies [10]. It demonstrates how vulnerable these are to covert attacks [10], as well as how they provide suitable ground for developing numerous covert communication channel strategies, raising a number of security problems. Section 4 addresses security in covert channels. Section 5 discusses defensive measures against hidden channels, bringing the work to a conclusion.

2 Types of Covert Channels Covert communication in Android devices is different from network covert channels or hardware-specific covert channels. Here, the communication takes place between applications, and there is no broad classification of the types of covert channels. However, one can classify them based on their characteristics as follows:

2.1 Detectability or Side-Effects Covert channels are usually meant to perform internal communications without the defense mechanisms being notified. However, some covert channels are bound to functionality that needs to be handled with care, otherwise they are prone to being detected. For example, the screen settings channel needs to turn the screen on and off in order to leak information. Such a communication may be suspected by the user if it is not performed while the user is away from the phone. Also, processor statistics and processor frequency, when utilized to leak information, cause the processor to have an increased workload and


hence drain the battery faster, which again may lead the user to suspect some malicious activity.

2.2 Tainted or Untainted Channels such as shared settings, system logs, files, broadcast intents, event intents, and UNIX sockets get tainted when they transmit information, i.e., their control-flow and data-flow graphs can reveal the transmission of information within the system. However, thread enumeration, free space, processor statistics and processor frequency do not allow tainting of variables or data, thereby making them free of taint-based detection. When it pertains to network-specific covert channels, however, there is a different classification. Such channels are classified into two types, i.e., storage-based and timing-based [10]. In timing-based channels, the private information is transferred by modulating the timing behavior on the transmitting end so that it can be recovered at the receiving end [15], whereas in storage-based covert channels, a transmitter either directly or indirectly inserts covert signal information into memory objects to be accessed by the receiver [16]. Timing-based channels are further characterized as active or passive [17]. Thirdly, a combination of storage-based and timing-based channels is known as a hybrid covert channel [18]. These kinds of channels can present specific issues, making detection tough [10, 19]. There are several overt and covert routes in Android. Overt channels are those that are meant to communicate data between systems. Covert channels, on the other hand, do not require any additional authorization and bypass security systems such as TaintDroid. Covert channels must have the following characteristics: (1) Detectability: the covert channels should only be discovered by the intended recipients. (2) Indistinguishability: they should not be distinguished or identified in general. (3) Size: they should not be used to substitute whole channels of legitimate data. Thus, covert channels provide a stealthy method of communication between users. Covert channels have grown in popularity in recent years because they allow botnets to bypass Internet limits or censorship while also allowing them to execute stealthy, command-driven communication. Many hidden routes have been researched and neutralized; however, the ratio in Android is rather low. Covert channels are sometimes misinterpreted as steganography, i.e., the concealment of illegal objects within legitimate information objects. Covert channels, however, are considerably more than that and do not rely on legitimate data transmission protocols.
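To make the storage/timing distinction concrete, the following is a minimal, hedged sketch, written in Python purely as a conceptual simulation rather than Android code, of a timing-based channel: the sender encodes each bit as a short or long delay between otherwise innocuous events, and the receiver recovers the bits by thresholding the observed inter-event delays. The delay values are illustrative assumptions.

```python
# Conceptual simulation of a timing-based covert channel (not Android code):
# bits are modulated into inter-event delays and recovered by thresholding.
import time

SHORT, LONG = 0.05, 0.20        # assumed delays (seconds) for bit 0 / bit 1
THRESHOLD = (SHORT + LONG) / 2

def send(bits, emit):
    """Sender: emit an innocuous event after a delay that encodes each bit."""
    for b in bits:
        time.sleep(LONG if b else SHORT)
        emit()

def decode(timestamps):
    """Receiver: recover bits from the gaps between observed event times."""
    gaps = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]
    return [1 if g > THRESHOLD else 0 for g in gaps]

if __name__ == "__main__":
    observed = []
    start = time.monotonic()
    send([1, 0, 1, 1, 0], emit=lambda: observed.append(time.monotonic()))
    # The first gap is measured from the start of the transmission.
    print(decode([start] + observed))   # expected: [1, 0, 1, 1, 0]
```

A storage-based channel would instead write the bits into some shared object (for example a settings value or free-space counter) that the receiver polls.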


3 Covert Channels Applications in Technological Advancements The usage of covert communication channels is expanding not only in smartphones but also in end-user devices in cutting-edge technologies. This section focuses on the extensive usage of covert channel methods in modern technologies like VoLTE, IoT, and IPv6 [10]. We also examine how these developments serve as fertile ground for the development of several covert channel approaches that raise genuine issues.

3.1 Covert Channels in IoT The proliferation of hidden channels has been aided by IoT applications and the accompanying new technologies. Many hidden communication techniques based on IoT processes have been devised. It has been noted that covert channels pose threats to privacy and security in the IoT and have recently attracted the attention of security experts; however, this particular area has not been thoroughly researched [20]. Most IoT equipment/devices feature network interfaces that allow general users to access them. Such devices are typical in the sense that they have scarce resources such as battery, memory, and computational power, as well as a lack of effective security measures. As a result, they are vulnerable to many sorts of attacks [10]. The researchers of [21] demonstrated the feasibility of information hiding in a cyber-physical infrastructure, such as an intelligent data center, by modifying its parts (e.g., controllers, sensors, etc.) or putting secret information in unused registers. The bulk of published works on secret paths/channels in IoT employed data-obfuscating strategies in various IoT protocols, according to Cabaj et al. [20]. Furthermore, Smith [22] remarked that, although CoAP is heavily used, it remains largely unexplored in covert-channel-specific research. In addition, [23] attempted to depict the issues and the vulnerability of IoT-related environments to covert communication and synchronization channels over mobile networks [10]. They explored several covert timing channel creation methodologies in order to assess their potential to establish covert timing channels for IoT. The author of [24] proposes a novel IoT-specific covert timing channel (CTC) which encodes information inside defined interface/network information, such as interfaces or addresses. This method of data encoding does not employ inter-packet delays. Seven encoding methodologies in all are utilized in two IoT protocols, namely TCP/IP and ZigBee. As a result, the proliferation of diverse covert approaches among IoT-specific protocols reflects the effort necessary to prevent the created vulnerabilities, mainly by taking them into account throughout the design phases.


3.2 Covert Channels in IPv6 In spite of the fact that many IPv6 security defects have been resolved and improvement has been observed, several concerns persist and need better investigation. Such worries stem from IPv6's inherent design problems and its inadequate implementation in all operating systems. Furthermore, effective IPsec standard implementations inside this protocol provide no assurance of the needed protection from attacks [25]. Lucena et al. [26] suggested and tested 22 different IPv6 covert channel techniques [10]. IPv6CC [27] is a collection of network hidden channels designed for the IPv6 protocol. Its major goal is to aid penetration-test campaigns in evaluating a system's protection against information-hiding-capable attacks or stegomalware. The mechanisms used to embed information into IPv6 packets are discussed in that article, as well as the standard use case and the suite's system software. It also contains a performance measurement of IPv6CC's numerous covert communication channels, as well as a study of its capability to defeat certain state-of-the-art security methods.

3.3 Covert Channels in VoLTE In covert timing channels, a hidden signal is mixed with the inter-packet delays (IPDs) of ordinary packets; however, this is not applicable to VoLTE, since the inter-packet latencies of VoLTE-based transmission are constant and therefore cannot be modulated. This led the developers of [28] to construct a secret channel in VoLTE connections by varying silence periods, in which a concealed transmission may be adjusted by prolonging or delaying silence periods [10]. The authors used Gray coding to encode the secret message in order to minimize the effect of packet/data loss, and statistical tests were used to establish the undetectability, through obfuscation, of their suggested channel. According to [28], the covert transmission channel surpasses existing IPD-based mechanisms in terms of resilience. Furthermore, [29] created a video-specific packet-reordering covert channel over VoLTE backed by machine learning (ML) algorithms to prove that well-grounded covert communication can be developed even under difficult network restrictions. Hidden channels in VoLTE applications that employ inter-packet transmission delays and the ordering of packets are governed by certain protocols; because subtle changes in covert communication can thus be detected, available covert solutions cannot be easily transferred to VoLTE. The work in [30] uses cascading hash encryption and an intelligent system to demonstrate an appropriate packet-distribution-ratio-based covert timing channel. To provide robustness and undetectability, hash-based inter-codeword verification, codeword self-verification using a cyclic redundancy check (CRC), and a dynamic mapping matrix are used [30].
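Since the channel in [28] relies on Gray coding so that adjacent symbol values differ in only one bit, limiting the damage of a mis-measured silence period, a small, hedged illustration of the idea is given below. It is a standalone Python sketch of Gray coding itself, not code from the cited work.

```python
# Hedged illustration of Gray coding (not code from [28]): adjacent integer
# values map to codewords that differ in exactly one bit, so a small timing
# measurement error corrupts at most one bit of the recovered symbol.

def to_gray(n: int) -> int:
    """Convert a binary integer to its Gray-coded value."""
    return n ^ (n >> 1)

def from_gray(g: int) -> int:
    """Recover the original integer from its Gray code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

if __name__ == "__main__":
    for value in range(8):
        g = to_gray(value)
        assert from_gray(g) == value
        print(f"{value}: binary={value:03b} gray={g:03b}")
    # Consecutive Gray codes (e.g. 011 -> 010) differ in a single bit, which is
    # why a slightly mis-read silence period yields a near miss rather than a
    # completely wrong symbol.
```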


The increased development of covert channel methods is undoubtedly obvious, necessitating a greater focus on the process of creating effective countermeasures. The introduction of innovative network covert channel design concepts, such as reversible network-based covert communication channels, has driven the development of novel analysis methods. Additionally, preventative techniques should be incorporated into the preliminary design of standards and procedures [10].

4 Security in Covert Channels Dangers are classified according to the type of potential incursion. This is significant because it shows the attackers' capabilities in terms of the knowledge and resources required to conduct assaults. The categories of intruders are as follows: when a single person is the only perpetrator of an assault, this type of infiltrator has the fewest skills. An organized group is a group of people (typically more capable than individuals) who have gathered for the purpose of assaulting. This sort of intruder is well-equipped and well-funded to carry out large-scale attacks such as DDoS infiltration via an end device. Each user application platform requires protection standards as a result of security standards and related research efforts. To secure sensitive data during the collection, access, and use phases, certain privacy protection mechanisms are in place. Specifically, the data owner/user should remain informed about the acquisition of a person's privacy-based information, the users/organizations that may have access to it, and the intended use of this material. The ISO/IEC standards 27018 [31] and 29100 [32], as well as the General Data Protection Regulation (GDPR), European Union Regulation (EU) 2016/679 [33], specify the entire privacy framework and its characteristics. As a result of the aforementioned efforts, there are additional key privacy qualities and particular protection mechanisms [34]. To help in the secure design of a system, several methodologies and standards have been established. Two well-known and commonly used security specification methodologies are the Open Source Security Testing Methodology Manual [35, 36] and the Common Criteria Evaluation Methodology (CEM) [37]. Confidentiality, integrity, and availability (CIA) are the three fundamental cyber-defense baselines for any sort of security management. Confidentiality is the quality of not disclosing information to consumers, processes, or devices unless expressly permitted to do so. Integrity is the property that information is not modified or deleted illicitly. Availability is the property that the same data remain accessible in a robust manner. Each of these three principles therefore needs to be critically protected in covert communications as well.


5 Covert Channels Detection Flow analysis approaches are widely classified into two groups. The first is static analysis, which essentially means analyzing code without running it. Only the code available at analysis time is evaluated in this case; therefore, it cannot cope with code fragments loaded or generated at runtime. Many attackers may utilize this to complete their tasks undetected by available static flow analysis techniques. The methodology examines every conceivable pathway within the accessible code and identifies susceptible sources, sinks (destinations), and possible paths. The second type of analysis is dynamic analysis, in which the program is executed within a runtime environment and its actual behavior is observed and analyzed. Dynamic analysis addresses the weakness of static analysis caused by runtime-inserted blocks of code that are not analyzed, and it offers advanced application and execution monitoring and analysis. Taint-based analysis methodologies for information flow checking are the topic of current research. It should be noted that, in order to ascertain data flow, they deploy techniques that are incapable of dealing with evasion and of identifying a sensitive sink reached via a covert route. Evasion is the concealment of information flow from the tool in order to avoid security checks and thereby leak sensitive information. Figure 1 provides an overview of all the approaches assessed, as well as which techniques fit into the categories of misuse and anomaly identification. The sign "✓" indicates which criteria are met, whereas "x" indicates which criteria are not met [38]. In order to detect covert channels in Android devices, specific techniques are utilized to detect the target characteristics.
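As a hedged, toy-scale illustration of the taint-based idea discussed above (not the mechanism of any specific tool such as TaintDroid or FlowDroid), the sketch below marks values originating from a sensitive source as tainted, propagates the mark through computations, and raises an alert when a tainted value reaches a sink. A covert channel evades exactly this kind of tracking by re-encoding the data outside the monitored data flow.

```python
# Toy taint-propagation sketch (illustrative only, not TaintDroid/FlowDroid).
from dataclasses import dataclass

@dataclass
class Value:
    data: str
    tainted: bool = False

def source_read_imei() -> Value:
    # Data from a sensitive source starts out tainted.
    return Value(data="356938035643809", tainted=True)

def concat(a: Value, b: Value) -> Value:
    # Taint propagates through ordinary data-flow operations.
    return Value(data=a.data + b.data, tainted=a.tainted or b.tainted)

def sink_send_network(v: Value) -> None:
    if v.tainted:
        print("ALERT: tainted data reaching network sink")
    else:
        print("sent:", v.data)

if __name__ == "__main__":
    imei = source_read_imei()
    msg = concat(Value("id="), imei)
    sink_send_network(msg)          # detected: direct (overt) flow
    # A covert/evasive flow re-encodes the data outside tracked operations,
    # e.g. leaking it bit-by-bit via timing, so no tainted Value reaches the sink.
```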

5.1 Domain Isolation In 2011, reference [39] introduced TrustDroid, a viable and inexpensive domain isolation concept aimed at Android. Before installing any app, TrustDroid colors it to grant it a trust level. During installation, programs must satisfy their own manufacturer's certificate check. TrustDroid also modifies firewall rules in order to prevent connections from non-trusted networks, thereby separating transmission through a network-based proxy. While the Android OS is operational, TrustDroid monitors all application communication and access to file systems, public or shared databases and networks, and hence blocks cross-domain information sharing and application interaction. In comparison to previous solutions, TrustDroid isolates the Android software stack at the middleware, kernel, and network layers.


Fig. 1 Analysis of detection techniques [38]

5.2 Privilege Escalation The Android OS avoids data leakage by restricting application access and isolating applications from one another. Apps may, however, rely on other apps to obtain permissions to execute illicit operations, i.e., to escalate their own privileges. Figure 2 depicts the basic concept of an app privilege escalation attack: Application 1 does not have a permission, but Application 2 does; Application 1 can then carry out the permission-protected operation with the help of Application 2. In 2011, a framework called XManDroid was introduced [40] to identify privilege-escalating applications in the Android OS. At the system's core, XManDroid analyzes the interaction between all elements. Relationships are defined for the several apps executing in the system from the very first time they run.
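As a hedged, toy-scale sketch of the policy idea behind XManDroid (a simplified model, not its actual implementation), the snippet below represents inter-app communication as a graph and flags transitive paths through which an unprivileged app can reach a permission-protected capability. The app names, permission sets and IPC links are illustrative assumptions.

```python
# Toy sketch of transitive privilege-escalation detection (XManDroid-style idea,
# not its actual implementation).
from collections import deque

# Assumed example data: granted permissions and observed IPC links (caller -> callee).
permissions = {"App1": set(), "App2": {"SEND_SMS"}, "App3": {"INTERNET"}}
ipc_links = {"App1": ["App2"], "App2": ["App3"], "App3": []}

def escalation_paths(start: str, wanted: str):
    """Return apps reachable from `start` via IPC that hold the `wanted` permission."""
    seen, queue, hits = {start}, deque([start]), []
    while queue:
        app = queue.popleft()
        if wanted in permissions.get(app, set()) and app != start:
            hits.append(app)
        for nxt in ipc_links.get(app, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return hits

# App1 holds no permission but can transitively reach SEND_SMS via App2.
print(escalation_paths("App1", "SEND_SMS"))   # -> ['App2']
```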

5.3 Driver Permission Detection Because the Android system is open source, Android handset makers have taken on bespoke development to fulfill their own demands. However, reference [41] discovered that these firms’ bespoke development was responsible for more than 60%


of software vulnerabilities. A comparable detection technique, ADDICTED, was proposed in [42]. ADDICTED associates the operations of security-relevant devices with their Linux device files, and compares them to the analogous files from the open-source Android system. ADDICTED can assess whether a security problem exists based on a driver file's weakened protection, e.g., from 'read only' to 'read and write'. This technique was proven by the real-world issues ADDICTED discovered on the 'Galaxy SII', 'Galaxy ACE', and 'Galaxy GRAND' [42].

5.4 Artificial Intelligence Detection In 2017, reference [43] suggested a machine-intelligence detection approach that uses the power consumed by a mobile phone to detect a hidden channel. This study proposes two detection methods: regression-based detection (RBD) and classification-based detection (CBD). Despite the discovery of certain successful detection techniques, the detection targets are generally narrow. The techniques developed in XManDroid and by Caviglione et al. could only identify hidden channels in colluding applications, and ADDICTED could only identify known driver file permissions. As a result, there is still a scarcity of systematic, all-encompassing detection procedures.
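As a hedged sketch of the classification-based idea in [43], rather than its actual implementation, the snippet below trains a simple classifier on labelled per-interval power-consumption features. The synthetic traces, the feature construction and the use of scikit-learn's RandomForestClassifier are all illustrative assumptions.

```python
# Hedged sketch of classification-based covert channel detection (CBD-style):
# a classifier is trained on labelled power-consumption traces.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def features(trace):
    # Simple per-trace statistics of the sampled power draw (mW).
    return [trace.mean(), trace.std(), trace.max(), np.diff(trace).std()]

# Synthetic traces: covert activity adds small periodic bursts to the baseline.
benign = [rng.normal(200, 5, 300) for _ in range(200)]
covert = [rng.normal(200, 5, 300) + 8 * (np.arange(300) % 20 < 2)
          for _ in range(200)]

X = np.array([features(t) for t in benign + covert])
y = np.array([0] * len(benign) + [1] * len(covert))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```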

6 Conclusion The unique strategies and approaches discussed in this paper are not usually prioritized by current security mechanisms and analytical tools. A covert channel is usually utilized by intruders for leaking information that is prohibited by security regulations. This paper addresses various techniques for identifying covert channels. The research analyses several detection algorithms for detecting different sorts of threats. The study demonstrates the necessity for stronger security methods while also offering a chance for advancement in the realm of covert channels. Tools like FlowDroid and DroidSafe have some detectability constraints. As a result of this effort, a new area of research in the field that demands stronger security against mobile-specific covert communication leakage has emerged.

References 1. Ceci L (2021) Statistica. Number of available applications in the Google Play Store from December 2009–July 2021 2. Lalande JF, Wendzel S (2013) Hiding privacy leaks in Android applications using low-attention raising covert channels. Int Conf Availability Reliabil Secur, 701–710


3. Mazurczyk W, Caviglione L (2014) Steganography in modern smartphones and mitigation techniques. IEEE Commun Surv Tutorials 17(1):334–357 4. Mazurczyk W, Caviglione L (2015) Information hiding as a challenge for malware detection. Secur Privacy 13(2):89–93 5. Sharma S, Kumar R, Rama Krishna C (2021) A survey on analysis and detection of Android ransomware. Concurr Comput: Practice Experience 33(16):e6272 6. Li H, Liu Y, Tan R (2020) Covert device association among colluding apps via edge processor workload. IEEE Internet Things J 7(11):10763–10772 7. Zhang L, Huang T, Rasheed W, Hu X, Zhao C (2019) An enlarging-the-capacity packet sorting covert channel. IEEE Access 7:145634–145640 8. Tian J, Xiong G, Li Z, Gou G (2020) A survey of key technologies for constructing network covert channel. Secur Commun Netw 2020:1–20 9. Lalande JF, Wendzel S (2013) Hiding privacy leaks in android applications using low-attention raising covert channels. In: 2013 international conference on availability, reliability and security, 701–710, IEEE 10. Elsadig MA, Gafar A (2022) Covert channel detection: machine learning approaches. IEEE Access 10:38391–38405 11. deGraaf R, Aycock J, Jacobson MJ (2005) Improved port knocking with strong authentication. In: Proc. 21st Annu. Comput. Secur. Appl. Conf. (ACSAC), 10 12. Qu H, Cheng Q, Yaprak E (2005) Using covert channel to resist DoS attacks in WLAN. Proc. ICWN, pp 38–44 13. Mazurczyk W, Kotulski Z (2006) New security and control protocol for VoIP based on steganography and digital watermarking. Proceedings 5th international conference computer science research applications (IBIZA) 14. Vanderhallen S, Van Bulck J, Piessens F, Mühlberg JT (2021) Robust authentication for automotive control networks through covert channels. Comput Netw 193 15. Zhang X, Zhu L, Wang X, Zhang C, Zhu H, Tan Y-A (2019) A packet-reordering covert channel over VoLTE voice and video traffics. J Netw Comput Appl 126:29–38 16. Zhang X, Guo L, Xue Y, Zhang Q (2019) A two-way VoLTE covert channel with feedback adaptive to mobile network environment. IEEE Access 7:122214–122223 17. Wu S, Chen Y, Tian H, Sun C (2021) Detection of covert timing channel based on time series symbolization. IEEE Open J Commun Soc 2:2372–2382 18. Elsadig MA, Fadlalla YA (2017) Network protocol covert channels: countermeasures techniques. In: Proceedings 9th IEEE-GCC conference exhibition (GCCCE), pp 1–9 19. Goher SZ, Javed B, Saqib NA (2012) Covert channel detection: a survey based analysis. High capacity opt. network emerging/enabling technology, pp 057–065 ˙ 20. Cabaj K, Zórawski P, Nowakowski P, Purski M, Mazurczyk W (2020) Efficient distributed network covert channels for internet of things environments. J Cybersecurity 6(1) 21. Wendzel S, Mazurczyk W, Haas G (2017) Don’t you touch my nuts: information hiding in cyber physical systems. In: Proceedings IEEE security privacy workshops (SPW), pp 29–34 22. Smith S (2020) Hiding in the noise: creation and detection analysis of modern covert channels 23. Tan Y-A, Zhang X, Sharif K, Liang C, Zhang Q, Li Y (2018) Covert timing channels for IoT over mobile networks. IEEE Wireless Commun 25(6):38–44 24. Harris K, Henry W, Dill R (2022) A network-based IoT covert channel. In: 2022 4th international conference on computer communication and the internet (ICCCI), pp 91–99 25. Salih A, Ma X, Peytchev E (2017) Implementation of hybrid artificial intelligence technique to detect covert channels attack in new generation internet protocol IPv6. 
In: Leadership innovation and entrepreneurship as driving forces of the global economy, Cham, Switzerland, Springer, pp 173–190 26. Lucena NB, Lewandowski G, Chapin SJ (2005) Covert channels in IPv6. In: Proceedings international workshop privacy enhancing technology, 147–166 27. Caviglione L, Schaffhauser A, Zuppelli M, Mazurczyk W (2022) IPv6CC: IPv6 covert channels for testing networks against stegomalware and data exfiltration. SoftwareX 17:100975


28. Zhang X, Tan Y-A, Liang C, Li Y, Li J (2018) A covert channel over VoLTE via adjusting silence periods. IEEE Access 6:9292–9302 29. Zhang X, Pang L, Guo L, Li Y (2020) Building undetectable covert channels over mobile networks with machine learning. In: Proceedings international conference mechanism learning cyber security, pp 331–339 30. Yuanzhang L, Junli L, Xinting X, Xiaosong Z, Li Z, Quanxin Z (2022) A robust packet-dropping covert channel for mobile intelligent terminals. Int J Intell Syst 31. De Hert P, Papakonstantinou V, Kamara I (2016) The cloud computing standard ISO/IEC 27018 through the lens of the EU legislation on data protection. Comput Law Secur Rev 32(1):16–30 32. Drozd O (2015) Privacy pattern catalogue: a tool for integrating privacy principles of ISO/IEC 29100 into the software development process. In: IFIP international summer school on privacy and identity management. Springer, Cham, pp 129–140 33. Regulation P (2016) Regulation (EU) 2016/679 of the European Parliament and of the council. Regulation (eu) 679:2016 34. Hatzivasilis G, Papaefstathiou I, Manifavas C (2016) Software security, privacy, and dependability: metrics and measurement. IEEE Softw 33(4):46–54 35. Greenwich Academic Literature Archive. [Online]. Available: https://gala.gre.ac.uk/. [Accessed: 19-Mar-2022] 36. ISECOM 1988–2018. Open source security testing methodology manual, ISECOM 37. ISO/IEC 15408 (1996–2018) Common criteria for information technology security evaluation, ISO/IEC 38. Goher SZ, Javed B, Saqib NA (2012) Covert channel detection: a survey based analysis. High Capacity Opt Netw Emerg/Enabling Technol, 057–065 39. Bugiel S, Davi L, Dmitrienko A, Heuser S, Sadeghi AR, Shastry B (2011) Practical and lightweight domain isolation on Android. In: ACM workshop on security and privacy in smartphones and mobile devices, pp 51–62 40. Bugiel S, Davi L, Dmitrienko A, Fischer T, Sadeghi AR (2011) XManDroid: a new android evolution to mitigate privilege escalation attacks 41. Wu L, Grace M, Zhou Y, Wu C, Jiang X (2013) The impact of vendor customizations on android security. In: ACM Sigsac conference on computer and communications security, pp 623–634 42. Zhou X, Lee Y, Zhang N, Naveed M, Wang XF (2014) The peril of fragmentation: security hazards in android device driver customizations. In: Security and privacy, 409–423 43. Caviglione L, Gaggero M, Lalande JF, Mazurczyk W, Urba´nski M (2017) Seeing the unseen: revealing mobile malware hidden communications via energy consumption and artificial intelligence. IEEE Trans Inf Forensics Secur 11(4):799–810

Study of the Need for Effective Cyber Security Trainings in India Rakesh Kumar Chawla, J. S. Sodhi, and Triveni Singh

Abstract With the availability of Android phones and the internet in every hand, the rate of cyber-crimes has increased tremendously. This increase in cyber-crimes has led to the need for effective cyber training in all fields. Awareness and training are strongly needed at this point of time. Cyber training is a much debated but least worked-on topic: everybody talks about it, but very few take steps in this regard. The reason behind this is the lack of expertise and knowledge in the field. The issue of cyber training against social engineering attacks remains at the forefront for organisations, but when we enquire into the larger context of its implementation, it is revealed that no substantial measure is being taken due to the low human resource in the field. Even after the many trainings that are provided, there is no appreciable decrease in the number of cases of hacking. This paper aims to bring out the ongoing developments in this field and also the vulnerabilities in the organizational structure. Keywords Cyber-crime · Capacity building · Training · Social engineering · Cyber security

1 Introduction Social Engineering is a technique of manipulation by which human errors are exploited to reveal personal information like bank account numbers, passwords, PIN codes, etc. Although social engineering techniques have developed over a long period of time, R. K. Chawla (B) Amity Business School, Amity University, Uttar Pradesh, Noida, India e-mail: [email protected] National Crime Records Bureau, Ministry of Home Affairs, New Delhi, India J. S. Sodhi Group CIO & Senior Vice President-Amity Education Group, Uttar Pradesh, Noida, India e-mail: [email protected] T. Singh Cyber-Crime at Uttar Pradesh Police, Uttar Pradesh, Noida, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_50


their successful application depends on the use of contemporary tools, an effective security system, and knowledgeable and skilled employees. With the growing popularity of interactive education, companies and firms have taken up the task of training employees using awareness campaigns and specialised trainings to prevent and avoid any social engineering threat. The steps and ways of implementation include providing training materials, policy and regulation frameworks, and training for preventing any upcoming threat. Beyond the regular training, organisations also have the advantage of providing employees with monthly or weekly security campaigns to convey the importance of constant vigilance towards all sensitive information. Enterprises choose to deploy information awareness programmes to protect their data because employees play a crucial role in defending the interests of organisations against public-facing attacks. Financial issues are just one of the constraints placed on the system of using traditional training methods. In training and awareness programmes, there can be difficulties in communication between various groups, and it is important to maintain team trust during the awareness programme phase. Additionally, it is necessary to take into account how inventive thieves might be when coming up with new threats to coerce individuals into divulging crucial information. A lot of highly significant web services have been compromised over the past ten years, and the use of social engineering in these attacks has led to the release of millions of passwords. Yahoo, Dropbox, Last.fm, LinkedIn, Weebly, and MySpace are a few recent victims. Additionally, after being taken from those web services, those compromised passwords were transferred (or sold) online for months or even years, which increased the risk of further social engineering attacks. The creation of mitigation mechanisms should be a continuous process given the recurrence and constant change of social engineering threats. Since organisations face social engineering dangers, and there is no "perfect" defence system against them, it is crucial to train people to fend off such assaults. As a result, businesses of all sizes are increasingly turning to education and awareness campaigns, in addition to the creation of technology tools, to limit the potential harm brought on by cyberattacks. It is critical to be aware of the difficulties that organisations can encounter when putting into place training and awareness campaigns intended to raise computer users' awareness levels. This study emphasises the difficulties the organisation encounters when implementing conventional training and contemporary awareness initiatives as defensive measures.

2 Methodology This study used a cutting-edge research methodology to assess the difficulties businesses have when developing training programmes and raising awareness of social engineering. Natural exploration occurs as part of good research and is then used to learn more about the underlying causes of the research problem. The experimental research


methodology aims to define problem boundaries while allowing for flexibility in investigation. This strategy aids in overcoming the difficulties in using defensive information training and awareness programmes to counter social engineering's deceptive techniques. This study's qualitative research is founded on a secondary literature review. This kind of analysis gives the topic more authority. Additionally, the literature quality analysis enables a thorough examination of the results of earlier investigations, as this study analyses both conventional and contemporary advancements in the area of creating training and awareness methods against social engineering. This strategy enables comprehensive examination of viewing logs, forum notes, and discussion texts to improve field comprehension.

3 Why Basic Cyber Training Is Required This is the new ultra-modern age, which will define the next hundred years, raising the possibility of human existence on different planets. A recent example is Elon Musk's (founder and CEO of SpaceX) red planet mission. If life is possible on other planets, mankind will expand enormously and demand will increase accordingly. The basic needs include not only food, clothing and lodging but also social awareness, and this leads to the world of connectivity, which eventually gives birth to the age of the internet, a minuscule part of the cyber world. Everyone is playing their own part in the cyber world, even teenagers, and after the corona epidemic children above the age of ten years are also puppets in the hands of the internet. Hence, every human is exposed to cyber threats. These threats range from cyber stalking to cyber bullying and have no rules and no boundaries, resulting in unexpected outcomes. Basic cyber training is necessary not only for organisations but also for individuals, to oppose the threats of the lawless world of the dark web and to prevent themselves from falling into its profound pit. Along with this, cyber awareness is also necessary.

3.1 Organizational-level Training and Awareness Initiatives The integration of cutting-edge technical measures and management initiatives to improve employee awareness is necessary for organisational information systems to be effective in dealing with risks from social engineering. The provision of adequate training to personnel to enhance their capacity to recognise, alert to, avoid, and neutralise aggressive attacks is one of the most crucial and fundamental stages. However, there are many obstacles on the way to offering training and awareness programmes. These can be grouped into the following subcategories.

3.1.1 The Business and Its Environment in Which the Employees Prosper

The environmental characteristics of a business include the employee's interactive work areas within the firm and the outdoor environments. Aspects like current technology, organisational culture, staff education, policy, and changes to physical safety regulations are among the environmental changes that have an impact on the training and awareness programmes provided to employees. Employees can be targeted through email that can be accessed remotely over an external network, and damage can occur to the organization's digital machines and networks that are remotely accessible. The effects of remote access have been growing with the integration of the internet into every part of the business. Wilcox, Bhattacharya, and Islam highlight the growing link between social engineering and the use of social networking websites such as Twitter, Snapchat, Facebook, etc. [1]. Today, business activities such as financial information systems and supply chain information systems are combined into an integrated information system, which increases organizational risk. Moreover, social engineering attacks change constantly and naturally, and with the growing integration of and reliance on information technology, the complete segregation of staff from attack is a major challenge. The study suggests that the best solution to combat social engineering threats today is to improve staff knowledge about common methods of advanced social engineering attack.

3.1.2 The Society, Social Communication and Its Impact

There has been a growing array of attacks on employees, but these are curbed by training programs and awareness campaigns. However, an issue arises when employees engage in interaction and social exchanges: during these social exchanges they tend to forget their responsibility towards the company, and leaks of information take place from here. This becomes a very dangerous issue for the organization. The informal exchanges and social interaction among employees may result in violations of the safety measures developed for the staff through training and awareness programs. Social factors that lead to barriers in training and awareness programs against social engineering include cultural impact. In some cultures that embrace social feeling, social interaction and communication are an inevitable part of the worker's life. This factor is borne and lived by the workers in the workplace, which puts them at risk of being robbed via social engineering processes. To the best of our knowledge, there are no comparative studies in the literature showing the impact of cultural or social factors on limiting the impact of training and awareness programs. However, the findings of Alseadoon's comparative research highlight that only 7% of email scams involving sensitive identity theft are reported in Saudi Arabia (Alseadoon 2012). The response rate was discovered to be substantially greater than that of many studies conducted in western nations, which typically reported 3–11%. Even though there isn't a direct correlation between awareness and social


variables and training programme success, a social component can be significant in identifying human influence while developing a uniform training strategy for all employees inside the company.

3.1.3 Impact of Government

The impact of government on training and awareness programs is covered only to a very limited extent in the literature. Political discussions are frequently held to strengthen defences against socially driven attacks. In addition, deception based on a government's agenda, in the form of dissemination of inaccurate information, is a common practice used by social engineers. Attackers exploit affirmative discrimination and disagreements in order to target like-minded groups and certain demographics, which employee awareness and training initiatives must take into account. In addition, Costa and Figueira suggest that governments should enact more safety laws and take action against organizations to force them to improve their training and awareness programs against social engineering threats [2]. It is very important to ensure that all employees comply with the various legal policies to ensure that no safety law is violated. Circumstances may lead to trainers redesigning teaching materials to ensure that training methods are constitutionally authorized. Criminals and/or social engineering attackers use a number of the latest methods to obtain personal information and data, including passwords. The five most common current attacks include phishing for sensitive information, baiting, quid pro quo, pretexting, and piggybacking. One of these methods is online guessing, which can be initiated against a publicly accessible server by anyone using a browser at any time. This online guessing method can be extremely damaging, and it has raised serious security concerns for many governments and organizations, as unique personally identifiable information (PII) and leaked passwords can be easily accessed by anyone. Furthermore, online guessing attacks that are targeted towards specific groups of people extract not only the weaker passwords but also the stronger ones that are reused everywhere and contain personal information.

3.1.4 Organization and Its Internal Structure

Industry-specific barriers within the organization's internal environment limit the extent to which training and awareness programmes are useful in handling socially engineered attacks within the company. The constraint results from the absence of a wide range of awareness programmes intended to target particular employee groups with varying levels of awareness. Security training should change depending on the company's needs, the demands of the market, the modernization of the business, its requirements, and the resources at its disposal. The adaptable techniques that fraudsters employ to acquire information also need to be examined. Unfortunately, once hackers learn about the security precautions taken by a target firm, they create and utilise brand-new tactics that staff members are unaware of.

3.1.5 Economics

Training and awareness programs for staff can be greatly enhanced with interactive content, and it has been demonstrated that the provision of awareness materials plays an important role in the overall impact of such training. However, providing affordable training for each evolving attack poses a challenge to organizations. In addition, in order to keep up with progressive social engineering threats, an organization must assess the competence of its employees. Employee appropriateness, resilience, and their capacity to adhere to emergency safety procedures are all assessed during this process. Even if the tests are carried out internally by a security or IT department, money will still need to be diverted from other parts of the company for training and awareness campaigns. In other words, business continuity might be seriously threatened if the essential financial resources for staff training cannot be allocated on a regular basis.

3.1.6 Personal

Personality factors are inherent human characteristics that may compromise the efficacy of training and awareness campaigns against socially engineered attacks. Based on the characteristics of the targeted employee, social engineers decide on their attack strategy. Cybercriminals use a variety of psychological methods to identify the behavioural weaknesses of the victims. A study by Luo et al. suggests that criminals can resort to such psychological means to ensure that people act in accordance with their intent [3]. In addition, differences in employee personality may even affect the employee's response to an attempted social engineering attack. References highlight neuroticism, a combination of easily triggered emotions such as anxiety, anger, fear, or depression, as being strongly associated with the tendency to respond to e-mails designed to steal sensitive information. In addition, extroverts and entertainment lovers are also assertive and can give away information easily. Finally, openness as a trait means that employees are willing to try new things, leaving personal information and digital footprints that criminals can use against their organizations. Trust can become a major cause of information breaches in big and small organisations. The level of trust in the minds of employees has led them not to think of cyber security as an issue, concern or problem. They trust the system and the infrastructure in place blindly, without realising that this can be fatal for their organisation. This issue can be avoided through some innovative, efficient and comprehensive training and campaign programs by the company. The organisation should include training materials that can help in educating the staff. However, most companies do just the bare minimum for training purposes, because they think that a minimum amount of training and awareness is sufficient, but this is not the case. In the long term, this limited and restricted knowledge becomes a barrier for the employee and makes him feel handicapped. This lack of awareness and expertise becomes a major downfall in assuring safety against social engineering threats. And thus the


gap keeps widening between trust and security, which can only be narrowed by bringing about additional awareness campaigns. Another challenge to training and awareness programs against social engineering is the general lack of interest among the public. People often lack the personal motivation for regular training to protect themselves and their organizations. The concept of security awareness often falls on deaf ears despite the common occurrence of people being attacked for information and data related to their personal role and occupation. This lack of interest has also been reflected in literature such as Hadnagy, which shows that user education related to safety awareness is often futile, as people do not show sufficient interest [4]. Hadnagy also emphasizes that this lack of interest can be seen in the fact that end users do not think safety is their concern. Instead, employees think that security personnel should be the ones in charge and that they have the responsibility to take care of any security threats their organization may face. As a result, the lack of interest of individuals associated with the threat of social engineering becomes a major factor challenging the effective training and awareness of staff. Differences in human ability to understand the types of malicious social engineering attacks pose another major challenge to awareness programs. In the past, the strategies of social engineers were more precise than they are today and were usually well covered, which gave people an idea of the nature of such attacks. However, the increasing technology and complexity of social engineering methods make it difficult for victims to recognise them. In short, the development of deceptive tactics may leave people unaware of new social engineering attack methods even when they attend awareness programs. Social engineers often choose to target low-level employees, which is not what large organizations expect. Social engineers begin collecting data about workloads from the lower level of staff in order to collect as much sensitive data as possible before planning an attack on the entire firm. Ignorance of the seriousness of such attacks, as well as the fact that employees may not see themselves as important in large companies, puts workers at greater risk. This lack of self-esteem poses a challenge to awareness programs, and addressing it can be incorporated into awareness programs. Employees need to feel a sense of belonging in their organizations and feel invested in the safety of the organization. The final personal challenge is that sometimes people face a lot of work pressure in their professional careers and as a result have a limited work-life balance. Several organizations have developed strict safety policies to force all employees to complete the safety training required to increase their knowledge of social engineering threats. When training and awareness programs are conducted, employees try to fit the necessary training into their working hours. However, job stress can cause weaknesses in this regard, as some employees face critical periods that challenge the notion of a balanced work life. This work pressure leads to a lack of attention from trainees during training and leaves them with a sense of indifference during awareness programs. This careless attitude makes it challenging for those training sessions to develop knowledge effectively.


3.2 Review of Challenges Associated With Modern Training in Social Engineering and Awareness Programs Training and awareness programs have continued to evolve as social engineering threats increase. Security training and information security programs now include simulation techniques, serious games, virtual labs, and awareness-themed videos and modules. However, these modern training methods have their own limitations in improving employee readiness. Several studies recommend introducing serious card games as an appropriate tool to raise a person's preparedness against attacks by social engineers. The card game method, however, presents a challenge for communication between the training teams. A participatory approach is followed in these types of games during the learning stages, which helps to develop employee decision-making skills for when they are targeted by real attacks. However, personality differences between groups in perceiving and absorbing learning materials can be challenging. The modern training methods for employees include comprehensive games and interactive labs which give real-time training in solving social engineering problems, but this causes a communication gap. Since the game has different steps involving identification and mitigation, it becomes really difficult to bring everyone together. The game becomes even more confusing when deciding who should participate and who should perform what function in the security effort. There is also a limit to collaboration: it is tough to control an employee's behaviour in front of a hacker and to restrict the spread of any potential damage to the organisation. The newest techniques for raising awareness, such as themed films and creative modules, are creative techniques. Although all security measures may be integrated via awareness modules, there cannot be a single solution, because the creators of training and awareness programmes are limited by the unpredictable nature of a socially engineered attack. Additionally, social engineers are experts at generating a unique sense of urgency in workers, using deception, exploiting a fear of losing something, and using reverse psychology that awareness programmes might not be able to stop. The new training methods, which involve real-life simulation situations, highlight how social engineers and hackers attack; these demonstrations are used by such training techniques. This enables employees to think rationally about whether any incoming message is an attack or just a simple message. However, a problem occurs due to the fact that these simulations are designed for all employees without any differentiation. It needs to be understood that all employees have different levels of honesty, different feelings, different opinions and different standards of awareness. Thus, it can happen that one person might have understood a lot from the awareness program while others might have a totally different reaction. This difference needs to be taken into consideration while educating employees. Since most of such training programs and awareness campaigns are held in a general manner, it can be assumed that not everybody gains from them, and therefore the approach is not fully successful and fruitful. Another very disheartening and common issue with modern training programs is their time-consuming nature. Simulations, interactive videos, fun games,


Simulations, interactive videos, engaging games and similar activities require staff to set aside their regular daily tasks to complete the training. This training time limits employees' working capacity and use of their hours, hindering their productivity; on the other hand, if the workshops are held outside working hours, they can distort work-life balance. The time-consuming nature of such training therefore causes problems for both employees and the organisation. Training workshops and programmes are designed very creatively by IT experts, but in the process the designers can forget that not every employee is equally qualified; taking participants' needs and qualifications into account is essential, because a lack of technical knowledge can create a gap between employees. Simplicity thus becomes a factor in determining how successful any given training or workshop will be, and the end result will be fruitful only if everyone understands the purpose and goal behind the training programme.

3.3 Traditional Training Programs Against Social Engineering Include Very Simple and Cohesive Steps Taken by the Organisation to Keep the Staff Informed About the Attacks and Threats

Such training involves local-level awareness campaigns, posters, wallpapers, simple giveaways and some basic courses. Ghafir et al. point out that the low budget allotted for training purposes is a major setback to the goal [5], and it becomes even harder during periods of cost cutting and inflation in organisations and firms. The COVID-19 downturn of March 2020 to 2021 is a good example of such difficult times, when cyber criminals and hackers found it easy to break into company systems: with less training and more cost cutting, employees became lax about security. It is also important to note that attacks are not only online; they can take place physically too, so training employees in cyber security and educating them properly is both important and essential. Such training needs to be accessible to everyone, because everybody in an organisation needs the basic knowledge, and the programme therefore has to be designed for individuals and matched to their ability to learn. Another issue with traditional training methods is that they tend to be dull and hard to sit through; they become monotonous and produce no fruitful results for employees or the company, and employees retain little of what is taught because it bores them. Any training needs to be short, crisp, interactive and fun so that the audience can engage, participate, and enjoy learning at the same time. A further problem with traditional training is the lack of real-life scenario testing. This is a major weakness: if employees do not know how to tackle real-life situations, such training methods serve little purpose, since practical application and exposure are entirely absent from them.


As a result, it becomes very difficult for employees to understand the concepts of cyber security and cyber threats completely. Traditional training methods also lack the basic elements of avoiding or preventing social engineering attacks: they point out how to identify an attack, but how to deal with one is nowhere taught in the process. They also convey only very basic, general information; they give no specific idea of how social engineering attacks are tailored and crafted, nor of how to handle them. Social engineering attacks are devised in new, innovative and creative ways, which makes traditional training and awareness methods useless in some respects and their reliability difficult to trust. As technology advances, attackers and hackers keep updating themselves and their intrusion methods, making it difficult for organisations to keep up with the changes. There is also a tendency among employees, even after they have been given all the knowledge, to open every mail and unknown link out of curiosity; a modern training method that includes some simulation would help protect against such attacks. Attention to traditional sessions and to the pamphlets handed out for reading is also low on the employees' side. Overall, it can be said that in many ways these training programmes fail to cater to present-day needs.

4 Cyber-crimes Statistics

According to a report by the National Crime Records Bureau, around 164 cases of cyber-crime against children were reported in 2019; the most striking feature of the report is that these children fall in the 14-15 year age group, and the figures for 2018 were more or less the same. Even though the number of cyber-crime cases committed against children in 2020 remained small owing to the pandemic, the rate of increase since 2019 has been much sharper, which is alarming. The availability of sexually explicit content is another category of cyber bullying and cyber stalking in which women are the primary victims; approximately 900 such cases are reported every year, and the number rises each following year. Social media has become an active playground for hackers, who target naive people and trick them easily, and the gap between the cases reported and the number of people actually falling victim is striking. Delhi Police has set up several helplines and centres for victims and has created fraud detection units. End-to-end encryption gives criminals anonymity and encourages them to act more brazenly and persistently (Figs. 1, 2, 3, 4).


Fig. 1 Cyber crime in India (Source: National Crime Records Bureau)

Fig. 2 Cyber crime in India (Source: National Crime Records Bureau)

Fig. 3 Cyber crime in India (Source: National Crime Records Bureau)

4.1 WhatsApp Case Study

The Indian government recently asked social media platforms that use end-to-end encryption to make changes to their policies in order to reduce anonymity in the national interest. Breaking end-to-end encryption to identify the true originator of a particular message would damage the platforms' policies and public image.


Fig. 4 Cyber crime in India (Source: National Crime Records Bureau)

According to the Indian government, however, the step is necessary to protect the rights of citizens and of the country; this requirement is called "traceability". According to experts in the field, weakening end-to-end encryption is technically feasible, but it would make other people and their information vulnerable, creating more holes and issues. On the other hand, several reports suggest that end-to-end encryption is not necessarily safe, since there have been leaks, and it is not always clear which "ends" are meant; by definition, they are the sender's end and the receiver's end.

4.2 Initiatives Taken by the Indian Government on Cyber Security

1. The Indian Computer Emergency Response Team (CERT-In): CERT-In handles attacks on government networks and is responsible for monitoring threats against national cyber security.
2. Cyber Surakshit Bharat: Following the Government's "Digital India" vision, this initiative works to strengthen India's grip on cyber security.
3. Personal Data Protection Bill: This bill is another breakthrough for the protection, storage and processing of citizens' data, and it also places some constraints on the various social media platforms.
4. National Cyber Security Policy, 2013: This policy aims to give citizens a private and protected space, to provide a better cyberspace, and to make proper legislative intervention available in matters of cyber-crime.


Despite the availability of several government initiatives, there is still a long way to go owing to the lack of knowledge and awareness. The cyber domain is still developing and discovering new depths. In this new era of technology and isolation, it is impossible to imagine life without a personal bubble of cyberspace; the only question is how to keep that space clean and threat-free. Not only organisations but also individuals' personal and private lives are at stake.

5 Some Rare Cases of Cyber Attack in India

5.1 Case Study on Pegasus Spyware

Pegasus is spyware developed and licensed for surveillance purposes by the Israel-based company NSO Group. It uses a "zero-click attack" to infiltrate devices: it does not require any interaction or action by the host in order to infect the target. Once it enters the host's phone, Pegasus is devastating: the host's location can be traced, messages can be read, the camera can be used as a hidden recorder, calls can be recorded, and more. The spyware can reach a phone through calls and messages even if the host has not answered the call or clicked on the message, and nowadays it is very difficult to detect. Recently Pegasus was all over the news amid reports that parliamentarians, journalists and other prominent citizens of India were being tracked through this spyware.

5.2 Case Study on ShadowPad

A China-based group named "Red Echo" attacked India's power supply, either to deliver a major blow or to gain a foothold in one of India's critical sectors, at a time when border tension between India and China near Ladakh was at its peak; no data loss, however, was reported by the government spokesperson. A report submitted by the Indian Computer Emergency Response Team (CERT-In) stated that the malware used was ShadowPad, found at an electricity despatch centre in Mumbai, India. ShadowPad is delivered through a supply-chain attack that plants a backdoor into the host, making it difficult to detect and to take precautionary action in time.


6 Safety and Security

Bruce Schneier stated that "complexity is the enemy of security". Every individual has created a private space for themselves and preserving that space is a top priority, yet even with so many advancements the contest in cyber security remains asymmetrical. Thieves keep pace with every modern development and keep producing new ideas for invading personal digital bubbles in order to extract the maximum benefit. Attackers have a large pool of methods to choose from; to counter them there are two blocks to build on: system security and information security. Cyberspace has two distinct dimensions, safety and security, and both can suffer a fatal blow from human-made or non-human sources. Safety is about protecting cyberspace from unavoidable circumstances caused by natural phenomena, ensuring that the space keeps working and maintaining its overall health, whereas security is about protecting cyberspace from malicious attacks by humans or organisations. Random attacks are a recurring phenomenon, so taking the necessary initial steps is important and compulsory. Keeping a check on one's devices is also very important, because once a device is attacked the data is in the attacker's hands, and if the information is valuable the situation can become terrible.

6.1 Security Measures

There are two types of security measures: proactive and reactive. The two are often combined to increase their effectiveness.

6.1.1 Proactive Security

1. Preventive control: This provides protection before an attack occurs; as the name suggests, prevention is the main focus, and the aim is to nullify attacks by controlling exposure.
2. Deterrence: This makes the security environment more complex and convoluted, so that it is difficult for an attacker to find the target and launch an attack. The best example is two-factor authentication, a simple but effective method now used by every major social media platform; a minimal sketch of this control follows the list.
3. Deflection: This control plays hide and seek with the attacker by setting up traps; in other words, it deflects the attacker's efforts by laying several honey traps that look like equally attractive targets but are in reality only sugar-coated decoys.
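To make the deterrence idea concrete, the following is a minimal sketch of time-based one-time-password (TOTP) verification, the mechanism behind most two-factor authentication prompts. It assumes the third-party pyotp library is installed; the secret handling and the verification function shown here are purely illustrative and are not taken from any platform discussed above.

# Minimal TOTP-based two-factor check (illustrative only).
# Assumes the third-party 'pyotp' package: pip install pyotp
import pyotp

# In a real deployment the secret would be generated once per user,
# shown as a QR code, and stored server-side; here it is hypothetical.
user_secret = pyotp.random_base32()
totp = pyotp.TOTP(user_secret)

print("Provisioning secret (register this in an authenticator app):", user_secret)
print("Current one-time code:", totp.now())

def second_factor_ok(secret: str, submitted_code: str) -> bool:
    """Return True only if the submitted code matches the current TOTP
    window; this is the deterrence step applied after password login."""
    return pyotp.TOTP(secret).verify(submitted_code)

# Example: verify the code the user would type from their phone.
print("Verified:", second_factor_ok(user_secret, totp.now()))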


6.1.2 Reactive Security

1. Detection controls: As the name suggests, these depend on real-time notification and help detect what is happening.
2. Mitigation controls: These reduce the effectiveness of an attack, blocking the attacker from stealing important documents.
3. Recovery controls: These provide backup, recovery and restoration during a crisis and help avoid any further mishap.

Proactive and reactive measures are intertwined to obtain the maximum benefit and secure the organisation. Together they bring out the best of cyberspace, but they are still not at their best, because the field of cyber security is still evolving, attackers are stubborn and persistent, and defenders tend to react only in an emergency. Fraud in cyberspace is now commonplace, and social media is an easy and popular target, yet few such cases are reported, few solutions are offered, and few culprits are identified; even where a solution exists, it is not effective at the lower, basic level. Cyber security training at the bottom level of cyberspace is therefore essential to keep the cyber world a hygienic place.

7 Cyber Kill Chain

The kill chain is a military concept for identifying the source and pattern of an enemy attack; it locates the intensity of the threat and then blocks what is coming. The cyber kill chain was developed in 2011 by Lockheed Martin and is widely used to defend systems against complex attacks, although some critics regard the model as ineffective and fundamentally flawed [6, 7]. It is based on the principle Find, Fix, Track, Target, Engage and Assess: find the issue, fix your attention on it, track every possible movement of the target, keep the target under the radar, select the best ammunition, pull the trigger and engage the target, and finally assess the losses and the gains.


7.1 Phases of Cyber Kill Chain

1. Reconnaissance: The attacker identifies the target and then seeks out all vulnerable information, such as login credentials and other personal details. The success of this phase depends on the amount of useful data collected about the target.
2. Weaponization: The attacker prepares a weapon, which may be malware, ransomware, a virus or a worm, and also creates a secondary path into the network so that access can be retained or regained even if the main entry path is blocked.
3. Delivery: The attacker launches the payload, often using social engineering skills to increase its effectiveness.
4. Exploitation: The malicious content begins to act inside the system, corrupting files; in short, it exploits the system.
5. Installation: Right after the exploitation phase, the malicious payload installs itself and starts to take over and control the system.
6. Command and Control: The attacker establishes proper control over the system and creates several entry points for future use.
7. Actions on Objective: The attacker starts taking steps to complete the main goal, such as data theft and credential robbery.

An illustrative mapping of these phases to example defensive controls is sketched below.
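As a rough illustration of how the model can be operationalised, the sketch below pairs each phase with one example defensive control. The phase names follow the list above; the paired controls are generic examples chosen for illustration, not Lockheed Martin's prescribed countermeasures.

# Illustrative mapping of cyber kill chain phases to example defences.
# The controls listed are generic examples, not an authoritative catalogue.
KILL_CHAIN_DEFENCES = {
    "Reconnaissance":       "Limit public exposure of staff and network details",
    "Weaponization":        "Threat intelligence on known exploit kits",
    "Delivery":             "Email filtering and user awareness training",
    "Exploitation":         "Timely patching and endpoint protection",
    "Installation":         "Application allow-listing and host monitoring",
    "Command and Control":  "Egress filtering and DNS/traffic monitoring",
    "Actions on Objective": "Data-loss prevention and rapid incident response",
}

def defence_for(phase: str) -> str:
    """Look up an example control for a detected kill-chain phase."""
    return KILL_CHAIN_DEFENCES.get(phase, "Unknown phase - escalate to an analyst")

if __name__ == "__main__":
    for phase, control in KILL_CHAIN_DEFENCES.items():
        print(f"{phase:>21}: {control}")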

7.2 Role of Cyber Kill Chain

The cyber kill chain helps individuals and organisations create strategies to keep their networks clean and healthy and to prioritise cyber security. It keeps unauthorised users at bay, detects malicious content and attackers at every stage of the cycle, protects important data from being shared or stolen, hinders the movement of attackers, and supports real-time response to attacks.


8 Privacy versus Security

Privacy and security are two important concepts in the cyber world and in cyber security, and the two are often portrayed as conflicting, each achieved at the cost of the other. Privacy sets a bar on who can access data, or places limits on the data itself, which in turn helps in attaining security; conversely, security sometimes creates the conditions under which data remains a private entity, preventing leaks and maintaining confidentiality. Complete anonymity, however, is also dangerous and can lead to data exploitation by individuals. In practice, privacy and security go hand in hand, together keeping the system healthy and protected. This shows that the two are not so much conflicting as mutually supporting: security is required to uphold the importance of privacy. Some authors argue that security wins over privacy [8], while others claim that privacy takes precedence over security.

9 Threats and Solution

1. Unauthorized Disclosure of Data: Data transmitted over a network can be intercepted along the way; attackers interested in the data can enter the path and easily steal it, a phenomenon called eavesdropping. Eavesdropping is possible in distributed systems, over wireless connections, or through control of an intermediary system such as a router or Wi-Fi access point. The best way to prevent it is data encryption, which requires establishing a cryptographic key between sender and receiver. The remaining issue is that only the message is encrypted; the identities of the sender and receiver remain clearly visible, and the adversary knows the path the traffic follows. To gain more security against eavesdropping, one should use different paths for sending information and apply multiple layers of encryption.
2. Unauthorized Modification and Fabrication: Attackers sometimes modify messages or send false ones, which can cause a huge loss to an individual or an organisation if not handled accurately. Mere encryption is therefore not sufficient to protect information. Additional protection comes from a message authentication code: alongside the cryptographic key, an authentication code is provided, and the receiver combines the message, the key and the code in a verification function to check that the message is genuine and the sender is authorised.
3. Asymmetric Cryptography: So far we have discussed generating a unique key for each sender and receiver, but there is no guarantee that this key cannot be stolen.


If the key is copied from either system, data can easily be stolen or manipulated. There is also the possibility of an adversary faking an identity, or reusing the sender's identity, making it hard for the receiver to recognise the false identity. Asymmetric cryptography overcomes these limitations of symmetric cryptography. Every individual creates a key pair consisting of a public key and a private key: the public key is given to everyone, while the private key is always kept secret. To communicate confidentially, the sender encrypts the message with the receiver's public key, and only the receiver's private key can decrypt it. A minimal sketch of these ideas is given below.
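The following sketch illustrates the three points above using Python's third-party cryptography package: symmetric authenticated encryption, which combines confidentiality with an integrity check (points 1 and 2), and an RSA key pair in which the receiver's public key encrypts and only the private key decrypts (point 3). The key sizes and messages are illustrative defaults, not a hardened configuration.

# Sketch of symmetric authenticated encryption and asymmetric encryption.
# Assumes the 'cryptography' package: pip install cryptography
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# --- Points 1 and 2: shared-key encryption with a built-in integrity check ---
shared_key = Fernet.generate_key()          # agreed between sender and receiver
channel = Fernet(shared_key)
token = channel.encrypt(b"quarterly audit report")  # eavesdroppers see only ciphertext
print(channel.decrypt(token))               # raises an error if the token was modified

# --- Point 3: asymmetric (public-key) encryption ---
# The receiver generates the key pair and publishes only the public key.
receiver_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
receiver_public = receiver_private.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

ciphertext = receiver_public.encrypt(b"meet at 10:00", oaep)  # sender uses the PUBLIC key
plaintext = receiver_private.decrypt(ciphertext, oaep)        # only the PRIVATE key decrypts
print(plaintext)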

10 Malware Threat and Solutions

Malware is usually spread over the Internet. A common approach is to attach the malware to an email and wait for the recipient to click on it, at which point the malware shows its true colours. These are simple tricks, but a person truly devoted to attack and theft uses more sophisticated techniques such as spear phishing, drive-by downloads and watering-hole attacks. A well-known example of malware is SQL Slammer, which infected more than 75,000 hosts over the Internet in 2003 [9]; it was so easy to hit such a mass of targets because the servers were not protected by firewalls, and firewalls have been a significant defensive feature ever since. Malware can also be shared through external portable devices; for example, a USB drive can easily carry malware from one host to another. Experienced attackers use many different techniques. Spear phishing targets a single person. In a drive-by download, the target is lured to a website controlled by the attacker and falls prey in one of several ways, one of which is the malvertising attack [10], in which attackers place malicious content inside the advertisements shown on a website. In a watering-hole attack, the attacker compromises the websites the target usually visits.

11 Cyber Security and Ethics

Cyber security does not only mean a combination of technology, software and legal rules and regulations for protecting the environment; it has a deeper and stronger meaning. A few major components define cyber security: technology that keeps data confidential, technology that acts as a shield against outside invasion, and technology that can deal with compromised components and create a solid barrier against everyday cyber-crime (Hildebrandt 2013). For all of this to happen we need to raise our game and keep every possible complexity in mind, by learning new things and constantly examining the issue from the attacker's point of view.


Cyber security and ethics are related to each other in a very interesting way. Ethically, cyber security carries the responsibility of protecting individuals from cyber warfare and providing a healthy cyber world. Sometimes, however, security can be coercive: every organisation has its own, sometimes twisted, way of securing its space, with rules and ethics framed according to its context, forcing individuals to follow whatever regime applies at the moment. In that sense cyber security can be unethical even while it is ethical. The Menlo Report, published on 3 August 2012, is a good example of protecting humans and research ethically.

11.1 Menlo Report

The Menlo Report was published in 2012 by the US Department of Homeland Security, Science and Technology Directorate, Cyber Security Division, to facilitate research involving Information and Communication Technology (ICT). ICT has become an integral part of daily human life and is now essential for carrying out day-to-day activities; it is, in essence, software, hardware and networks sandwiched together to give a suitable arrangement when required. ICT research develops constantly, producing many interactions between humans, subjects and researchers, in the course of which data can be leaked or exposed to unavoidable harm. It is therefore necessary to protect everyone involved through proper legal methods. Since the 1980s, legal restrictions and regulations have been imposed in order to improve safety and decrease the vulnerability of the humans involved. People are exposed to risk even in biomedical or psychological research, and ICT research cannot be ignored either, although the damage can be reduced. There should be a proper framework to protect and guide research in a smooth and swift manner so as to encourage more and more of it, and the framework should be ethically grounded to promote and inspire more institutions to dig out the new and undiscovered secrets of the cyber world through consistent hard work. The Menlo Report is based on the Belmont Report, which has three fundamental ethical rules: respect for human subjects, weighing and balancing losses and gains, and fair sharing of the benefits with society and the human subjects. The report is not self-imposing; rather, it sets out basic rules and regulations for research and development in the cyber world in order to protect the humans involved in such research [11].


12 Indian Cyber Laws

Cyber-crime is not limited to an individual or an organisation; it is also a nationwide issue, and across international borders it gives rise to cyber warfare. Every year around one per cent of GDP is lost to cyber-crime [12].

12.1 IT Act 2000 [13]

The IT Act 2000 [13] covers many different types of crime.
1. Identity theft: theft of personal information or, in simpler words, impersonation of an individual in order to obtain a benefit.
2. Cyberterrorism: causing large-scale havoc through cyberspace in order to create terror in people's minds.
3. Cyberbullying: defamation or online harassment of a person, causing mental distress.
4. Hacking: entering a system unethically in order to steal information.
5. Defamation: everybody has freedom of speech, including on online platforms, but it must not cross the line into derogatory comment.
6. Trade secrets: an organisation's trade secrets are very important to a company's image; cyber laws protect companies against their theft or forgery.
7. Freedom of speech: under Indian law, freedom of speech is a fundamental right of the individual, offline or online, but cyber laws restrain individuals from crossing the prescribed limits.
8. Harassment and stalking: cyber harassment and cyber stalking are very common; cyber laws protect victims and punish offenders accordingly.

12.2 Sections of IT Law of 2000

• Section 65–tampering with computer source documents
• Section 66–using the password of another person
• Section 66D–cheating using computer resources
• Section 66E–publishing private images of others
• Section 66F–acts of cyber terrorism
• Section 67–publishing child pornography or predating children online
• Section 69–government's power to block websites
• Section 43A–data protection at the corporate level


The Indian government has launched a portal for reporting cases and grievances related to cyber-crime, enabling victims to act as soon as possible and obtain the maximum benefit of the cyber laws available in the country. Under the Indian constitution, police and public order are State subjects, so the states have the power to protect people from, and prevent, any cyber harm. The National Critical Information Infrastructure Protection Centre has been established to protect critical data in the country. Organisations providing any kind of digital service are required to report theft or crime to CERT-In. The Cyber Swachhta Kendra has been established to detect malware, viruses, Trojans and the like, and CERT-In publishes lists of cyber threats and their immediate countermeasures. Several active NGOs provide basic training to the general public as well as to police personnel. Websites are audited consistently for malicious content, and Chief Information Security Officers are appointed to look out for issues regarding cyber laws and to safeguard the national interest.

13 International Forums

1. International Telecommunication Union: regulated by the United Nations, it manages and improves cooperation among nations on cyber laws for cyber security.
2. Budapest Convention on Cybercrime: an international treaty that addresses several international cyber issues; it came into force on 1 July 2004, and India is not a signatory to the convention.

14 Conclusion

Technology is now woven into day-to-day life and has become an enduring part of it, but that does not mean we should overlook its threats and discomforts. To build a stronger foundation and use cyberspace to the fullest, one should understand the basics of cyber law and take lessons in the fundamentals of cyber security; nobody can protect individuals who are not themselves aware, and when in doubt they should seek an expert in time. For today's generation, training is really important because it will boost the country's future, and capacity-building efforts should be in tune with future needs. The study may be further extended to understand current requirements, since technology has changed manifold: the older methods of cyber security may no longer be effective in this era, and cyber criminals are using very advanced techniques to remain undetected, against which older techniques may not be useful.


References 1. Wilcox H, Bhattacharya M (2015) Countering social engineering through social media: an enterprise security perspective 2. Costa LPDS, Figueira ACR (2017) Political risk and internationalization of enterprises: A literature review 3. Jansson B (2011) Becoming an effective policy advocate: from policy practice to social justice. Belmont, CL: Brooks/Cole 4. Hadnagy C, Aharoni M, O’Gorman J (2010) Social engineering capture the flag results. In: Proceedings of the Defcon 18, Las Vegas, NV, USA 5. Ghafir I, Prenosil V, Alhejailan A, Hammoudeh M (2016) Social engineering attack strategies and defence approaches. In: Proceedings of the 2016 IEEE 4th International Conference on Future Internet of Things and Cloud (FiCloud), Vienna, Austria, 22–24 August 2016, pp 145– 149 6. Engel G (2014) Deconstructing the cyber kill chain. https://www.darkreading.com/attacks-bre aches/deconstructing-the-cyber-kill-chain/a/d-id/1317542 7. Sheridan K (2018) The cyber kill chain gets a makeover. https://www.darkreading.com/threatintelligence/the-cyber-kill-chain-gets-a-makeover/d/d-id/1332892 8. Himma KE (2016) Why security trumps privacy. In: Moore AD (ed) Privacy, security, and accountability: ethics, law and, policy. Rowman & Littlefield International, London/New York, pp 145–170 9. Ferguson AG (2017) The rise of big data policing: surveillance, race, and the future of law enforcement. New York University Press, New York 10. Nichols S (2015) You’ve been Drudged! Malware-squirting ads appear on websites with 100+ million visitors. https://www.theregister.co.uk/2015/08/14/malvertising_expands_drudge/ 11. The Menlo Report (2012) Ethical principles guiding information and communication technology research 12. CSIS Report (2018) https://www.canada.ca/content/dam/csis-scrs/documents/publications/ 2018-PUBLIC_REPORT_ENGLISH_Digital.pdf 13. IT ACT 2000. Retrieved from https://www.meity.gov.in/content/cyber-laws 14. Domingo-Ferrer J, Blanco-Justicia A, Arnau JP et al (2017) Canvas white paper 4 – technological challenges in cybersecurity. In: SSRN. Retrieved from https://papers.ssrn.com/sol3/pap ers.cfm?abstract_id=3091942 15. Himma KE, Tavani HT (eds) (2008). Wiley, Hoboken 16. Lucas G (2017) Ethics and cyber warfare: the quest for responsible security in the age of digital warfare. Oxford University Press, New York, p 187 17. Manjikian M (2017) Cybersecurity ethics: an introduction. Routledge, London/New York 18. Amir W (2017) CCleaner backdoor attack: a state-sponsored espionage campaign. Retrieved from https://www.hackread.com/ccleaner-backdoor-attack-a-state-sponsored-espionage-cam paign/ 19. Yaghmaei E, van de Poel I, Christen M et al (2017) Canvas white paper 1 – cybersecurity and ethics. In: SSRN. Retrieved from https://papers.ssrn.com/sol3/papers.cfm?abstract_id= 3091909 20. Axelsson S (1999) The base-rate fallacy and its implications for the difficulty of intrusion detection. In: Proceedings of the 6th ACM conference on computer and communications security, CCS’99. ACM, New York, pp 1–7 21. Chan CS (2012) Complexity the worst enemy of security. Retrieved from https://www.schneier. com/news/archives/2012/12/complexity_the_worst.html 22. Erickson J (2008) Hacking: the art of exploitation, 2nd edn. No Starch Press, San Francisco 23. Fielding RT, Reschke J (2014) Hypertext transfer protocol (HTTP/1.1): message syntax and routing. Request for comments, RFC 7230. Retrieved from https://datatracker.ietf.org/doc/rfc 7230/


24. Francillon A, Danev B, Capkun S (2011) Relay attacks on passive keyless entry and start systems in modern cars. In: Network distributed system security. The Internet Society, NDSS, Reston 25. Gollmann D (2011) Computer security, 3rd edn. Wiley, Chichester 26. Householder AD, Wassermann G, Manion A, King C (2017) The CERT® guide to coordinated vulnerability disclosure. Special report CMU/SEI-2017-SR-022. Carnegie Mellon University, CERT Division 27. Hutchins EM, Cloppert MJ, Amin RM (2011) Intelligence-driven computer network defence informed by analysis of adversary campaigns and intrusion kill chains. Lead Issue Inf Warf Secur Res 1(1):1–14 28. Pfleeger CP, Pfleeger SL, Margulies J (2015) Security in computing, 5th edn. Prentice Hall Press, Upper Saddle River 29. Schmidle N (2018) The digital vigilantes who hack back. Retrieved from https://www.newyor ker.com/magazine/2018/05/07/the-digital-vigilantes-who-hack-back 30. Smith RE (2012) A contemporary look at Saltzer and Schroeder’s 1975 design principles. IEEE Secur Priv Mag 10(6):20–25 31. Spitzner L (2002) Honeypots: tracking hackers. Addison-Wesley Longman Publishing Co., Inc., Boston 32. Spring T (2017) ExPetr called a Wiper Attack, not Ransomware. Retrieved from https://threat post.com/expetr-called-a-wiper-attack-not-ransomware/126614/ 33. Allen AL (2016) The duty to protect your own privacy. In: Moore AD (ed) Privacy, security, and accountability: ethics, law and, policy. Rowman & Littlefield International, London/New York, pp 18–38 34. Anderson E (1993) Value in ethics and economics. Harvard University Press, Cambridge, MA 35. Christen M, Gordijn B, Weber K et al (2017) A review of value-conflicts in cybersecurity: an assessment based on quantitative and qualitative literature analysis. Orbit J 1(1):1–19. https:// doi.org/10.29297/orbit.v1i1.28 36. Dancy J (1993) Moral reason. Philosophy 69(267):114–116 37. Friedman B, Nissenbaum H (1996) Bias in computer systems. ACM Trans Inf Syst 14(3):330– 347 38. Gregoratti C (2013) Human security: political science. In: The editors of Encyclopaedia Britannica. Retrieved from https://www.britannica.com/topic/human-security 39. Dewey J (1922) Human nature and conduct: an introduction to social psychology. Henry Holt, New York 40. Katell M, Moore AD (2016) Introduction: the value of privacy, security and accountability. In: Moore AD (ed) Privacy, security, and accountability: ethics, law and, policy. Rowman & Littlefield International, London/New York, pp 1–17 41. Mokrosinska D (2016) Privacy, freedom of speech and the sexual lives of office holders. In: Moore AD (ed) Privacy, security, and accountability: ethics, law and, policy. Rowman & Littlefield International, London/New York, pp 89–104 42. Moore AD (2003) Privacy: it’s meaning and value. Am Philos Q 40:215–227 43. Schwartz SH, Bilsky W (1987) Toward a universal psychological structure of human values. J Pers Soc Pyschol 53(3):550–562 44. Van den Hoven J, Lokhorst G-J, Van de Poel I (2012) Engineering and the problem of moral overload. Sci Eng Ethics 18:143–155


45. Williams RM Jr (1968) The concept of values. In: Sills D (ed) International encyclopedia of the social sciences. Macmillan Free Press, New York, pp 283–287 46. Dittrich D, Bailey M, Dietrich S (2011) Building an active computer security ethics community. IEEE Secur Priv Mag 9(4):32–40 47. Dworkin R (1977) Taking rights seriously. Harvard University Press, Cambridge, MA 48. Cybersecurity 101 Cyber kill chain (2022) What is cyber kill chain? Process and model. In: Crowdstrike. Retrieved from https://www.crowdstrike.com/cybersecurity-101/cyber-killchain/ 49. Lockheed Martin, Cyber kill chain. Retrieved from https://www.lockheedmartin.com/en-us/ capabilities/cyber/cyber-kill-chain.html 50. Kaspersky lab, ShowPad: how attackers hide backdoor in software used by hundreds of large companies around the world. Retrieved from https://www.kaspersky.com/about/pressreleases/2017_shadowpad-how-attackers-hide-backdoor-in-software-used-by-hundreds-oflarge-companies-around-the-world 51. Aggarwal A (2021) IoT security threats and challenges: how to secure IoT devices at the workplace. In: ETCIO. Retrieved from https://cio.economictimes.indiatimes.com/news/int ernet-of-things/iot-security-threats-and-challenges-how-to-secure-iot-devices-at-the-workpl ace/82191454 52. Rajgopala K (2022) 29 phones tested for Pegasus spyware: Supreme Court. Court gives more time to its panel to submit probe report. In: The Hindu. Retrieved from https://www.thehindu. com/news/national/pegasus-case-sc-grants-more-time-to-probe-panel-29-mobiles-being-exa mined-for-spyware/article65438682.ece 53. Over 400% rise in cyber crime cases committed against children in 2020. In: NCRB data. Retrieved from https://economictimes.indiatimes.com/news/india/over-400-rise-in-cybercrime-cases-committed-against-children-in-2020-ncrb-data/articleshow/87696995.cms?utm_ source=contentofinterest&utm_medium=text&utm_campaign=cppst 54. Cyber And Information Security (C&IS) DIVISION, India. Retrieved from https://www.mha. gov.in/en/divisionofmha/cyber-and-information-security-cis-division 55. Steps Taken to Deal with Cyber-crime and Cyber Security, Press Information Bureau, Government of India, Ministryof Home Affairs. Retrieved from https://pib.gov.in/Pressreleaseshare. aspx?PRID=1579226 56. Christen M, Gordijn B, Loi M (2020) The ethics of cybersecurity. In: The international library of ethics, law and technology, vol 21, p 384. Springer Open. Retrieved from https://library. oapen.org/bitstream/id/cf499c97-63dd-4227-ac4c-def79590a9b3/1007696.pdf

From Bricks to Clicks: The Potential of Big Data Analytics for Revolutionizing the Information Landscape in Higher Education Sector

Ashraf Alam and Atasi Mohanty

Abstract There has been a recent shift toward using big data in the administration of educational institutions. New complex data infrastructures with human and non-human agents enable data collection, processing, and dissemination in higher education (HE), where all these aspects are characterized by political, economic, and social settings. Instead of being seen just as technical endeavors, HE data infrastructures should be regarded as practical conduits of political ambitions to revamp the educational system. Projects on data infrastructure in higher education are the focus of this paper. Businesses, industry standards, dashboards, software, and visual analytics make up the infrastructure, and their relationship to legislative demands for market change along with their role in big data infrastructure is analyzed in this paper. The findings shed light on how the marketization agenda of politics is reconfiguring higher education and how the utopian ideal is remaking a ‘smarter university’. Keywords Pedagogy · Curriculum · Teaching · Learning · Educational technology · Smart campus · Visualization · Standards · Marketization · Infrastructure · Data · Dashboards

A. Alam (B) · A. Mohanty
Rekhi Centre of Excellence for the Science of Happiness, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_51

1 Introduction

For quite some time, higher educational institutions (HEIs) have been collecting massive volumes of information on their students, courses, and other resources [1]. This article seeks to explore how complex data infrastructure projects in the higher education sector are laying the groundwork for a new architecture of technologies, experts, standards, values, and practices that will enable the integration of new big data technologies into existing institutional frameworks and procedures, with the potential to radically alter the field. The primary argument is that government aspirations to alter the sector are being communicated via HEIs data infrastructures in


a way that is both practical and technologically feasible [2]. The idealistic aim of establishing smarter digital universities and the political ambition of market reform is both being realized at the same time, thanks to data infrastructures, which are revolutionizing the higher education (HE) sector [3]. In this way, the data infrastructure serves as the hidden architecture of the commercialization of universities [4]. In an effort to make higher education more market and consumer-oriented, governments have positioned digital data as a key component of a radical political transition [5]. The government has begun to consider how big data and learning analytics may improve college recruitment, retention, and individualization of the educational experience [6]. Fluid data, if collected, linked, and analyzed, might offer a real-time, precise picture of a student’s functioning. To better understand and enhance learning and its settings, learning analytics involves the measurement, collection, analysis, and reporting of data on learners [7]. Data has the potential to greatly improve the university experience for students by allowing the institution to give each student individualized and targeted support and assistance. Learning analytics frameworks for policy formation in higher education facilitate formative evaluation and personalized instruction. Developing a framework is also necessary for the expansion of learning analytics programs and tools throughout the nation’s higher educational institutions. Funded by the Bill & Melinda Gates Foundation, the Postsecondary Data Collaborative (PostsecData) was established by the Institute of Higher Education Policy (IHEP) in the United States to encourage the collection of ‘high quality, robust, and meaningful’ data at the postsecondary level [8]. It is proposed that a centralized statistics agency will host this network in order to standardize data collection and use the lessons learned from existing data-sharing initiatives [9]. New companies are racing to perfect smart learning tools and algorithms as a result of the dazzling potential of adaptive learning and big data, which works with educational institutions, business owners, and the government to develop new models for postsecondary education in the United States [10]. Predictive modeling is used by these businesses to determine which pupils need interventions and which kinds of interventions will have the most impact. Some studies that look far enough into the future have even speculated on the potential roles that AI and natural user interfaces (those that can be controlled by the user’s voice, gesture, or facial expressions) could play in higher education, arguing that learning ecosystems will need to be flexible enough to incorporate these new approaches [11]. Data warehouses, data files, spreadsheets, information and records systems, visualization software, algorithm-led analytics packages, institutional dashboards, data managers, data stewards, business managers, financial officers, and deans are all part of the complex new data infrastructure systems that make it possible to collect, analyze, and present data in higher education. The development of HEI’s data infrastructure necessitates the participation of a wide range of technical experts. These experts often participate in public–private partnerships and work for multinational corporations. 
Consulting businesses, public-private partnerships, and think tanks are all contributing to the development of new data infrastructures by providing both theoretical and empirical support [12]. The HEIs data infrastructure is the


outcome of new constellations of international, intergovernmental, and nongovernmental groups working together, just as the transportation, communication, and energy infrastructures were. We focus on three key points throughout this article: (1) Higher education (HE) data infrastructures are the product of policy networks comprised of governmental actors, independent organizations, consulting companies, think tanks, and corporate partners, whose reformative intentions for the sector are mirrored in the technology design. (2) Data standards enable the interoperability of the infrastructure by dictating the unstated rules for what types of HE data must be recorded and published and in what formats. And, (3) the production of data analytics dashboards and visualizations.

2 In-Depth Analysis of Data from the Field of Higher Education Changes in higher education have begun in many countries over the last 20 years. Numerous parts of academia are now being disrupted by the proliferation of digital data. Therefore, universities must strengthen their research capacities and acquire knowledge in the area of data collection, analysis, interpretation, and management. Multiple factors have influenced academic work recently. Another way that data and its analysis are thought to support responsive and effective learning is through the implementation of learning analytics and adaptive learning platforms. These tools can organize and categorize student data in order to provide feedback on pedagogical processes, shape how course materials are structured, and tailor each student’s learning experience. Mobilization of big data-driven learning analytics in higher education is on the rise because of the popularization of the idea of digital universities, which has the backing of both governments and businesses. Some researchers have hypothesized that ‘datafying’ the classroom might completely change education for students. The exponential expansion of the commercial education technology sector, the global proliferation of massive open online courses (MOOCs), and the advent of learning analytics, educational data mining, intelligent tutoring systems, and even artificial intelligence are all evidence that higher education has already become data-driven. Even for market-driven HE policies, big data analytics are becoming more crucial. Proponents of big data-driven changes in higher education are attempting to scale data analytics so that educational institutions at the national, state, and district levels can reap the benefits of the massive volumes of data presently being created in a manner that no one campus can [13]. As a result, educational institutions will be able to create more accurate prediction models that may be utilized to improve student outcomes and make more well-informed strategic choices. Digital data systems are used to keep tabs on teachers and students, and dataveillance techniques are used in order to zero in on particular individuals for the purposes of intervention via the establishment of data-driven administrative identities [14]. National and state-wide


efforts to harness the potential of massive troves of institutional and student data need a robust information technology backbone.
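As a concrete, if simplified, illustration of the prediction models referred to above, the sketch below trains a logistic regression on synthetic student-engagement features to flag students who may need an intervention. The feature names, thresholds, and data are invented for illustration and do not come from any institution, dataset, or product named in this chapter; it assumes numpy and scikit-learn are available.

# Illustrative 'at-risk student' prediction sketch on synthetic data.
# Assumes numpy and scikit-learn: pip install numpy scikit-learn
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Hypothetical engagement features: attendance rate, weekly VLE logins,
# and average assignment score (all synthetic).
attendance = rng.uniform(0.3, 1.0, n)
vle_logins = rng.poisson(8, n)
avg_score = rng.uniform(30, 95, n)
X = np.column_stack([attendance, vle_logins, avg_score])

# Synthetic label: lower engagement makes non-completion more likely.
risk = 3.0 * (1 - attendance) + 0.1 * (10 - vle_logins) + 0.03 * (60 - avg_score)
y = (risk + rng.normal(0, 0.5, n) > 1.5).astype(int)   # 1 = at risk

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of being at risk; students above a chosen threshold
# would be surfaced to an adviser dashboard for follow-up.
probs = model.predict_proba(X_test)[:, 1]
flagged = (probs > 0.7).sum()
print(f"Test accuracy: {model.score(X_test, y_test):.2f}; flagged for outreach: {flagged}")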

3 The Ensuing Infrastructures What we call ‘infrastructures’ are the physical networks that make it possible for things like goods, ideas, waste, power, people, and money to be transported from one place to another. However, as physical constructions, infrastructures in the current context need considerable effort to design, build, operate, and repair. In this line of inquiry, infrastructures are considered more than just the physical foundations or manmade structures that serve as a blank canvas for an infinite variety of uses. Infrastructure is a complex system that necessitates a wide range of people, organizations, technologies, policies, legal considerations, and financial arrangements to build, maintain, and rebuild due to its wide variety of technological objects, standards, values, administrative practices, and organizational work. Consequently, the infrastructures of daily life are constructed not only by a complex network of cables, connectors, and other infrastructure-related parts but also by regulatory agencies that approve interventions and committees that resolve competing demands when establishing standards [15]. Because of their modular design and ability to connect to other systems through plug-in modules and switches, infrastructures may grow and adapt to meet the needs of a changing business environment. In the network stage, participation grows exponentially, and so do social commitments, which manifest themselves in things like institutionalized practices, user routines, and overt social norms. Therefore, infrastructural assemblages shift and evolve as new data and theories are uncovered, new technologies are created, new companies are founded, existing organizations are restructured, the political economy shifts, new laws, and regulations are enacted, and old ones are repealed, new knowledge is gained, new perspectives are discussed, and new markets are opened or closed [16]. It is also worth noting that research into infrastructure has shown the utopianism of several projects. Because it provides the material basis for a certain kind of utopian future, infrastructure is seen as having political power. Data collection, storage, and sharing through interconnected digital technologies are all made feasible because of existing infrastructures. Commonly referred to as ‘data infrastructure’, this refers to the institutional, physical, and digital processes that enable the storage, exchange, and consumption of data across interconnected technologies. Therefore, it is difficult to create clear differences between science, technology, morality, society, and political power when it comes to infrastructure since these factors all interact and occur simultaneously. Examining the texts, photos, speeches, Websites, and pamphlets used to spread their ideas and sway audiences is equally important [17]. All infrastructures are built from interconnected systems of institutions, tools, and rules. Through their collaboration, a technology framework is being developed to facilitate strategic political shifts. By establishing the HE data


infrastructure, the government is putting into practice its reformatory aims to build a marketized HE sector.

4 Consistency and Uniformity in Data Standardization is a political act because it dictates the types and formats of information that institutions of higher education must gather and make public [18]. This in turn affects how universities are identified, perceived, and perhaps targeted for change. The success of the reformed higher education market will depend on whether or not these conditions are met. The term ‘standard’ refers to any set of generally agreed-upon principles for producing documents or physical artifacts that may be used to link together various parts of a system [19]. When implemented as part of a larger system, certain standards for production and practice become fully functional. As standards become embedded in the infrastructures that define and aid in the coordination and organization of companies and even entire societies, they emerge as an integral part of all facets of society, from the economic to the cultural to the political. Standards are not only a set of technical requirements; they also define the plugs, jacks, and interoperable connections that make it possible to build and run a system. Internal social and psychological surroundings are also affected by data standards, not simply the outward physical environment. To construct an orderly world, we need standards to follow. In defining, embodying, or imposing ethics and values, standards may have far-reaching consequences for individuals [20]. They have connections that interweave with one another across a wide range of institutions, geographies, and technical infrastructures. In order to organize, analyze, and store the vast volumes of data that these systems and their users generate, new sorts of standards have been established during the last several decades [21]. The purpose of data standards is to ensure that data can be shared across programs and utilized in a consistent manner for tasks such as storage, management, and retrieval. They also serve as data quality standards [22]. As a result of these considerations, data standards are fundamental to the development of any comprehensive data infrastructure. Information is collected using standardized forms and metrics that have been agreed upon, developed, and planned by personnel to achieve a goal. Data standards, in other words, define the types of data that may be included in a given dataset as well as the rules for how that data can be combined, analyzed, and organized. Specifying standards is an important part of any HE data infrastructure project because it limits the types of data-generating devices that can be connected, the types of data that can be created and shared, the types of analyzes that can be conducted, and the types of actions and phenomena that can be tracked and reported. To work, the technical infrastructure needs more than simply bits and bytes normalized. As an added bonus, people’s professional and discursive practices are regulated to ensure consistency. Existing infrastructures standardize personnel in addition to physical assets [23]. The establishment and maintenance of standards


are a complex political and philosophical issue since they underpin our capacity to function in the world on a political and scientific level.
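To make the idea of a data standard more tangible, the sketch below defines a hypothetical standardized student-enrolment record as a Python dataclass together with a small validation step. The field names and code lists are invented for illustration and do not reproduce any actual higher education data specification.

# Hypothetical standardized enrolment record with basic validation.
from dataclasses import dataclass, asdict

# An agreed code list: standardized coding for machines,
# standardized language for human readers.
MODE_OF_STUDY = {"01": "Full-time", "02": "Part-time", "03": "Distance"}

@dataclass
class EnrolmentRecord:
    student_id: str      # institution-scoped identifier
    provider_code: str   # code identifying the higher education provider
    course_code: str     # standardized course identifier
    mode_of_study: str   # key into MODE_OF_STUDY
    entry_year: int

def validate(record: EnrolmentRecord) -> list[str]:
    """Return a list of violations of the (illustrative) standard."""
    errors = []
    if record.mode_of_study not in MODE_OF_STUDY:
        errors.append(f"unknown mode_of_study code: {record.mode_of_study}")
    if not (2000 <= record.entry_year <= 2100):
        errors.append(f"entry_year out of range: {record.entry_year}")
    if not record.student_id:
        errors.append("student_id must not be empty")
    return errors

rec = EnrolmentRecord("S0042", "P123", "CS-BSC", "01", 2023)
print(asdict(rec), "->", validate(rec) or "conforms to the standard")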

5 Data Harmonization in Higher Education and Quality Standards Applied Worldwide Improved data flows and analytics are possible as a result of the standardization provided by the common data language, which includes data on student applications and financial assistance, information from funders and regulators, information from national leadership programs, and business intelligence. Therefore, the data standards provide consistent data definitions that may be used universally to improve the efficiency of reporting and the comparability of reported data [24]. The standard dataset is structured in a way that makes it simple to incorporate historical data. The standard dataset requires both standardized coding for machine processing and standardized language for human consumption. The standards we create must be in line with those established by the International Organization for Standardization (ISO). New standards and data language will make reference to international frameworks like ISO 9001 (for quality management systems) and ISO 27001 (for information security). ISO is a worldwide metaorganization that connects practically every kind of organization in the world. Because of this, a worldwide consultant business dealing with standards compliance has developed. When a company aims to meet ISO 9000 standards, it must evaluate its performance against predetermined criteria, such as the satisfaction of its customers. Standards always have a hierarchy within themselves. Quality and security standards that are universally accepted around the globe must be included in the rules that govern data collection. The concept of making students the focal point of the educational process is framed by a worldwide norm that places a premium on data gathering, measurements, and the production of proof about internal procedures. Compliance with ISO 9001 sets a further norm for performance assessment in addition to its aims of standardizing the ‘collecting and exchange’ of student data. Standards play a vital role in infrastructure research because they facilitate the interconnection of distinct sociotechnical systems into networks that have the potential to initiate social transformation [25]. International benchmarks for quality management may be found within these specifications. The work of developing standards within the HE data infrastructure may be seen as an effort to regulate the regular tasks, activities, and experiences of universities, as well as a set of encoding rules which may regulate decisions and choices for a very long time.


6 Data Visualization and Analytics Dashboard Management Data standards are not always obvious while working inside an infrastructure, but data visualization makes the data molded by those standards very clear. When it comes to higher education, data visualization is an approach that takes advantage of the standardized data flowing through the infrastructure as easily accessible displays to influence public perception, policymaker choices, and HE managers’ own evaluations of their institutions’ performances. Data visualization is widely used as a means of comprehending information and communicating findings, interpretations, or insights resulting from patterns in massive datasets [26]. Furthermore, modern control rooms often make use of dashboards displaying graphical representations of dynamic data. These dashboards summarize dynamic systems graphically for human operators, displaying data in the form of time-series graphs, charts, and maps of developing situations. Nowadays, a new kind of ‘governance by the dashboard’ is emerging as data dashboards make their way from the private to the public spheres [27]. To the governance process, dashboards provide new requirements, opportunities, dynamics, skills, and issues. They signal a vast epistemological and organizational realignment because they provide new ways of knowing, new criteria for what constitutes reliable knowledge, and new ways of acting in relation to its many forms of knowledge. Dashboards are a tool for governance that encourages a heightened focus on metrics, indicators, and measures; increased monitoring and analysis; a shift in the empirical basis for decision-making and the standards by which a good decision is judged; and the introduction of a new performance environment in which employees or the general public are more aware of how something measured is faring. Dashboards enable leaders and managers to control higher education more effectively by visualizing standards and allowing for a granular examination of phenomena and events of interest [28]. However, the new perspectives, conventions, and behaviors that emerge from the usage of dashboards are internal to the organizations that adopt them.
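As a rough illustration of the kind of display such dashboards assemble, the snippet below plots a few institutional time series side by side. The figures are synthetic and the metric names are placeholders, not data from any real provider.

```python
import matplotlib.pyplot as plt

# Synthetic, illustrative figures only (not real institutional data).
years = [2018, 2019, 2020, 2021, 2022]
metrics = {
    "Applications (thousands)": [42, 45, 44, 50, 53],
    "Continuation rate (%)": [91, 90, 88, 89, 92],
    "Graduate employment (%)": [74, 76, 71, 73, 77],
}

# One small time-series panel per metric, in the style of a monitoring dashboard.
fig, axes = plt.subplots(1, len(metrics), figsize=(12, 3))
for ax, (name, values) in zip(axes, metrics.items()):
    ax.plot(years, values, marker="o")
    ax.set_title(name)
    ax.set_xlabel("Year")
fig.suptitle("Illustrative institutional performance dashboard")
fig.tight_layout()
plt.show()
```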

7 Data Presentation The control room is being introduced into university administration offices as part of the expanding HE data infrastructure in ways that have the potential to alter cultural norms, behavioral patterns, and philosophical outlooks, or at the very least to draw more attention to particular metrics. Users may create interactive dashboards and visualizations to reveal the relevant data, and the software’s drag-and-drop interface makes it simple to organize data into comparable groups for benchmarking purposes and choose the appropriate metrics and filters to suit their needs. It features intuitive visual analytics tools and gives users access to both historical and real-time HE data for use in analyzing trends over time. If the HE sector has access to a

real-time data processing environment that prioritizes state-of-the-art data manipulation and analysis and allows for the collection, combination, and visualization of datasets, then it may be better able to reap the benefits of business intelligence. Public users can access the visualizations and dashboards [29]. Team efforts across institutions use well-known data sources linked to additional demographic, socioeconomic, geographic, and pedagogical databases to identify and address issues. Executives and administrators at universities often use their own custom-made drag-and-drop visuals as a decision-making aid. Administrators in educational institutions have access to personally identifiable information about students and faculty. This allows schools to easily compare and analyze data from competitors at the provider and subject levels, track shifts in rankings over time, and identify the greatest winners and losers in the market. In addition to displaying data, dashboards and visualizations participate in visual analytics—a kind of visual reasoning.
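A minimal sketch of the benchmarking step described above, assuming a flat extract of provider-by-subject metrics, might look as follows; the providers, subject, and satisfaction scores are entirely hypothetical.

```python
import pandas as pd

# Hypothetical benchmarking extract: one row per provider, subject, and year.
df = pd.DataFrame({
    "provider": ["Uni A", "Uni A", "Uni B", "Uni B", "Uni C", "Uni C"],
    "subject":  ["CS",    "CS",    "CS",    "CS",    "CS",    "CS"],
    "year":     [2021,    2022,    2021,    2022,    2021,    2022],
    "satisfaction": [81, 84, 78, 83, 88, 85],
})

# Rank providers within each subject and year, then track how ranks shift over time.
df["rank"] = df.groupby(["subject", "year"])["satisfaction"].rank(ascending=False).astype(int)
pivot = df.pivot_table(index="provider", columns="year", values="rank")
pivot["change"] = pivot[2021] - pivot[2022]   # positive values mean the provider moved up
print(pivot.sort_values("change", ascending=False))
```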

8 Digital Dashboards for Monitoring Political Indicators Researchers who study infrastructure tend to zero in on its less obvious aspects, highlighting the ways in which it is embedded in and dependent upon other physical and social structures as well as technological developments. However, data visualization and dashboard technologies are developed to support the new data architecture and provide clear and comprehensible user interfaces. Data and its display as political indicator systems are open to experimentation by institutions. They also make it possible for regulators, the public, and policymakers to see data from institutions in a way that can influence their decisions and the courses of action they take. It has been recognized as an issue that dashboards and visualizations oversimplify complex phenomena for easy interpretation. An information dashboard’s data is rarely as straightforward as it may seem at first glance. To facilitate rapid consumption, dashboards often compress data, making it difficult for users to assess the data’s veracity and accuracy. Visualizations like this make it possible to quickly grasp and comprehend otherwise difficult concepts, facts, and data by presenting them in a more visually appealing and organized format. Imagery is a potent tool for effective thinking [30]. They may be taken somewhere else and used there; they can be duplicated, manipulated, merged, overlaid, and recreated. Visualization, whether in the form of a graphic, image, or diagram, has the potential to anchor ideas, problems, concepts, explanations, and arguments in one place and influence the way that others think about them. The user’s priorities may be defined via the use of subtle color cues, the arrangement of graphs and charts, and other visual signals. Furthermore, dashboards add further levels of authority and knowledge to the decision-making procedure. Expert dashboards require the combined efforts of visual designers, data analysts, and algorithm experts. Since they are entrusted with illuminating and explaining a phenomenon, their methods get embedded in that phenomenon. An even more drastic shift in governance relating to big data is the adoption of dashboard-based administration in higher education. Trust in conventional forms of

knowledge, authority, and judgment has been on the decline as experts and elites have been criticized for failing to make objective assessments of the facts of the situation. Instead, there has been a rise in public regard for large-scale data processing computing systems and the people who oversee them and report on their findings. Big data, therefore, rewards those who can translate between numbers and the narrative that may be drawn from them. Thus, new forms of expertise are needed to manage political indicator systems, which enable higher education institutions to be evaluated and held accountable for their market performance. New real-time feedback technologies consist of, among other things, business intelligence applications and visualization tools. In the new measurement and performance-driven environment, the rising HE data infrastructure may alter the way universities are governed and, by extension, the ways in which institutions and individuals behave.

9 Discussion A sizable global industry has developed big data technologies for tertiary education, covering everything from business and organizational intelligence to learning analytics and being ready to be plugged into the infrastructure of smarter universities. This article laid the groundwork for completing the market reform of HE by exploring an adaptable architecture for forthcoming big data-driven technology upgrades. There is still a pressing need for international comparisons of the ways in which new forms of dynamic data processing infrastructure are being incorporated. In the United States, the postsecondary data collaborative (also known as PostsecData) is managed by the Institute for Higher Education Policy (IHEP). The Institute for Higher Education Policy (IHEP) is a philanthropic organization based in the nation’s capital that seeks to broaden participation in and completion of postsecondary education. The PostsecData collaborative was established in 2015 with funding from the Bill and Melinda Gates Foundation with the goal of creating robust and significant postsecondary education data policies and providing guidance to policymakers as they make critical decisions about what data to collect, how to collect it, who should have access to it, how to define metrics, and how to present data to the public [31]. The current infrastructure cannot keep up with the growing needs of the public. Assemblages like these are characterized by the pressure and interpenetration of a complex web of individuals, groups, organizations, and practices. In the past, universities and colleges relied on public funding from the government. However, as the market for higher education has grown and new commercial providers have entered the market, universities and colleges have joined the global consulting and technology industries and adopted the private sector’s managerial discourse on marketization. Particularly, noticeable is the infrastructure technology’s extra-statecraft character when it comes to data collection. Corporations, agencies, authorities, and organizations are producing a wide variety of data, which is challenging the state’s monopoly on data production, collection, and even data interception, which has historically been the purview of state agencies and authorities. Data has become an object of

interest to those in positions of power because no state, monarchy, kingdom, empire, government, or corporation in history have had such granular, immediate, varied, and detailed data about topics and objects that concern them. Given the enormous power it grants to infrastructure owners to comprehend, quantify, and categorize the topics and activities included in the infrastructure, control over data infrastructure can be seen as a form of data politics. The participation of these non-state actors improves the state’s capacity to monitor, audit, and execute political laws inside its own institutions. In particular, software solutions provide the state with a more accurate means of displaying HE institutions and the people who inhabit there in statistical, observable, contrasted, and evaluative terms, all benchmarked to certain needs. These programs are essential for today’s higher education infrastructure. They are the central node in a network of computers that collects and stores information from many different sources, process, and disseminates that information to other locations in ways that might lead to conviction about what the information is saying and ultimately influences activities like decision-making, political debate, and policy-making.

10 Conclusion The current political struggles for control of the higher education sector are motivating efforts to build a new HE data infrastructure. It is hoped that by making more institutional data available to applicants who are willing to pay a fee, a more competitive market of service providers may be established. The goal is to speed up the process of gathering information on the quality of education provided by institutions so that rankings may be created more efficiently. These components of the HE infrastructure are facilitating collaborative human-algorithm decision-making and monitoring. It emphasizes the importance of higher education (HE) institutions in creating new knowledge and developing the next generation’s digital skills. As such, it exemplifies the worldwide movement toward developing data-driven, digitally-enhanced universities that can compete successfully in the modern economy. The incorporation of new sources and practices of big data into organizational and pedagogical processes as well as rehabilitative initiatives is a priority for many HE systems throughout the globe, which is why they are beginning to move toward flexible data infrastructure setups. Utopian goals of improving the nation’s infrastructure inspire both groups. Smart learning tools are on the verge of being plugged into the university’s architecture in ways that will impose new modes of quantification and standardization and bring new actors and priorities from across the public and private sectors into consideration.


References 1. Williamson B (2019) Policy networks, performance metrics and platform markets: charting the expanding data infrastructure of higher education. Br J Edu Technol 50(6):2794–2809 2. Alam A (2020) Challenges and possibilities in teaching and learning of calculus: a case study of India. J Educ Gift Young Sci 8(1):407–433 3. Aljohani NR, Aslam MA, Khadidos AO, Hassan SU (2022) A methodological framework to predict future market needs for sustainable skills management using AI and big data technologies. Appl Sci 12(14):6898 4. Alam A (2020) Pedagogy of calculus in India: an empirical investigation. Periódico Tchê Química 17(34):164–180 5. Alturki U, Aldraiweesh A (2022) Students’ perceptions of the actual use of mobile learning during COVID-19 pandemic in higher education. Sustainability 14(3):1125 6. Alam A (2020) Possibilities and challenges of compounding artificial intelligence in India’s educational landscape. Int J Adv Sci Technol 29(5):5077–5094 7. Sellar S (2020) Machinic affects: education data infrastructure and the pedagogy of objects. In: Mapping the affective turn in education. Routledge, pp 164–178 8. Alam A (2020) Test of knowledge of elementary vectors concepts (TKEVC) among first-semester bachelor of engineering and technology students. Periódico Tchê Química 17(35):477–494 9. Williamson B, Bayne S, Shay S (2020) The datafication of teaching in higher education: critical issues and perspectives. Teach High Educ 25(4):351–365 10. Alam A (2021) Should robots replace teachers? Mobilisation of AI and learning analytics in education. In: 2021 international conference on advances in computing, communication, and control (ICAC3). IEEE, pp 1–12 11. Ruipérez-Valiente JA, Gomez MJ, Martínez PA, Kim YJ (2021) Ideating and developing a visualization dashboard to support teachers using educational games in the classroom. IEEE Access 9:83467–83481 12. Alam A (2022) Impact of university’s human resources practices on professors’ occupational performance: empirical evidence from India’s higher education sector. In: Rajagopal BR (Eds) Inclusive businesses in developing economies. Palgrave Studies in Democracy, Innovation, and Entrepreneurship for Growth. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-03112217-0_6 13. Alam A (2021) Possibilities and apprehensions in the landscape of artificial intelligence in education. In: 2021 international conference on computational intelligence and computing applications (ICCICA). IEEE, pp 1–8 14. Kerssens N, van Dijck JOSÉ (2022) Governed by edtech? Valuing pedagogical autonomy in a platform society. Harv Educ Rev 92(2):284–303 15. Alam A (2022) Educational robotics and computer programming in early childhood education: a conceptual framework for assessing elementary school students’ computational thinking for designing powerful educational scenarios. In: 2022 international conference on smart technologies and systems for next generation computing (ICSTSN). IEEE, pp 1–7 16. Williamson B (2021) Making markets through digital platforms: pearson, edu-business, and the (e)valuation of higher education. Crit Stud Educ 62(1):50–66 17. Alam A (2022) A digital game based learning approach for effective curriculum transaction for teaching-learning of artificial intelligence and machine learning. In: 2022 international conference on sustainable computing and data communication systems (ICSCDS). IEEE, pp 69–74 18. Fathi M, Haghi Kashani M, Jameii SM, Mahdipour E (2021) Big data analytics in weather forecasting: a systematic review. 
Arch Comput Meth Eng 1–29 19. Alam A (2022) Investigating sustainable education and positive psychology interventions in schools towards achievement of sustainable happiness and wellbeing for 21st century pedagogy and curriculum. ECS Trans 107(1):19481


20. Webber KL, Zheng H (2020) Data analytics and the imperatives for data-informed decisionmaking in higher education. Big data on campus: data analytics and decision making in higher education (Part 1, 1) 21. Alam A (2022) Social robots in education for long-term human-robot interaction: socially supportive behaviour of robotic tutor for creating robo-tangible learning environment in a guided discovery learning interaction. ECS Trans 107(1):12389 22. Zuo C, Ding L, Liu X, Zhang H, Meng L (2022) Map-based dashboard design with open government data for learning and analysis of industrial innovation environment. Int J Cartogr 1–17 23. Alam A (2022) Positive psychology goes to school: conceptualizing students’ happiness in 21st century schools while ‘minding the mind!’ are we there yet? evidence-backed. Sch Based Pos Psychol Intervent ECS Trans 107(1):11199 24. Pangrazio L, Stornaiuolo A, Nichols TP, Garcia A, Philip TM (2022) Datafication meets platformization: materializing data processes in teaching and learning. Harv Educ Rev 92(2):257–283 25. Alam A (2022) Cloud-based e-learning: development of conceptual model for adaptive elearning ecosystem based on cloud computing infrastructure. In: Kumar A, Fister Jr I, Gupta PK, Debayle J, Zhang ZJ, Usman M (Eds) Artificial intelligence and data science. ICAIDS 2021. Communications in computer and information science, vol 1673. Springer, Cham 26. Zhang JZ, Srivastava PR, Sharma D, Eachempati P (2021) Big data analytics and machine learning: a retrospective overview and bibliometric analysis. Expert Syst Appl 184:115561 27. Alam A (2022) Mapping a sustainable future through conceptualization of transformative learning framework, education for sustainable development, critical reflection, and responsible citizenship: an exploration of pedagogies for twenty-first century learning. ECS Trans 107(1):9827 28. Shi Y (2022) Advances in big data analytics: theory, algorithms and practices. Springer 29. Alam A (2022) Employing adaptive learning and intelligent tutoring robots for virtual classrooms and smart campuses: reforming education in the age of artificial intelligence. In: Shaw RN, Das S, Piuri V, Bianchini M (Eds) Advanced computing and intelligent technologies. Lecture notes in electrical engineering, vol 914. Springer, Singapore 30. Damyanov I, Tsankov N (2019) On the possibilities of applying dashboards in the educational system. TEM J 8(2):424 31. Alam A (2022) Cloud-based e-learning: scaffolding the environment for adaptive e-learning ecosystem based on cloud computing infrastructure. In: Computer communication, networking and IoT: proceedings of 5th ICICC 2021, Vol 2. Springer, Singapore, pp 1–9

Enabling Technologies and Applications

Learning on the Move: A Pedagogical Framework for State-of-the-Art Mobile Learning

Ashraf Alam and Atasi Mohanty

Abstract Efforts in both technology and pedagogy concentrate on four central axes that together define the educational landscape of the future. Mobility, interactivity, artificial intelligence, technological learning tools like games, and augmented reality all fall under this category. Combining them requires creating a mobile-interactive model that takes into consideration the learner’s availability and their convenient times. Technology is already being utilized in education, albeit various forms of it are used in different settings. Because of this, it is critical to integrate and combine them into pedagogical models that place a premium on the students’ education. This article analyzes many kinds of technology and proposes a unified model that might serve as a basis for classroom education. In the end, it is emphasized how important it is to have intelligent tutoring systems in order to make tutoring broadly available, as well as how important it is to conduct technological experiments and apply the findings to ‘teaching–learning models’ that make use of multiple interaction patterns. Keywords Social networks · AI · Mobile learning · Pedagogy · Curriculum · Teaching · Learning · Educational technology · Smart campuses

1 Mobile Learning Platform

Today, information and knowledge can be accessed in record time, technological advances are being made at a faster rate than pedagogical ones, and the future of many academic disciplines is determined by technological advances [1]. The Horizon Report forecasted that mobile computing will increase in popularity over the next several years based on evidence of device sales [2]. Two elements that have helped the rise of mobile learning are the expansion of mobile access plans and the introduction of mobile learning that is accessible to all students [3]. The widespread use of Internet access via mobile devices is working to level the playing field for people of different races and socioeconomic backgrounds [4].

A. Alam (B) · A. Mohanty, Rekhi Centre of Excellence for the Science of Happiness, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_52


Mobile devices’ versatility and the ease with which they may be combined with other technologies have had a profound effect on the education sector. They make it possible to have resources available at any time and from any place, creating a convergence of opportunities that, if harnessed properly, might improve educational performance [5]. The concept of mobility, when applied to the field of education, has the potential to make processes universal and to combine formal classroom learning with informal learning in social networks, thereby shattering structures and concepts and paving the way for several innovations whose effects can only be fully appreciated through personal experience [6]. The Web 2.0 phenomenon has helped extend the mobile wave internationally, in addition to its use in the classroom [7]. People spend a lot of time on the Internet for fun and games, so it makes sense that they also spend a lot of time uploading content to share with others, chatting with each other, and working together to produce knowledge as part of a larger group [8]. The responsibilities of a student may be divided and simplified into four categories using mobile learning:

1. Producing and recording one’s own music is a skill that may be taught to students.
2. There are resources available to help students learn.
3. Inputs to the learning process are digitally processed by the students.
4. Via their interactions with one another and their instructors, students form relationships that are conducive to their academic growth.

The fourth category focuses on the dialog that develops between professors and their students within a network of education [9]. Through interpersonal interactions and social learning, a social constructivist environment is produced in which the social context serves as the basis for the creation of knowledge, and constructive community behaviors are encouraged [10]. Mobile devices broaden our access to information and facilitate interaction, learning, and group work. It improves the dynamism of education by increasing the available means of communication [11]. Nonetheless, it is critical to develop experimental models that include cutting-edge innovation, since they may lead to beneficial educational results [12]. Practitioners and theorists in this area have advocated for more research to be conducted on the efficacy of using mobile devices in the classroom, with positive results widely documented [13]. While research and anecdotal evidence from many mobile learning domains are available, this information must be synthesized and included in reference models that accurately depict the desired results in terms of usefulness and effectiveness [14]. There is a distinction between egocentric and object-centric social networks, but otherwise, conversations in these structures are mostly focused on the topics and information being shared [9].


2 Learning and Socialization Experience-based learning, or ‘informal education’, refers to instruction that is based on actual life circumstances [15]. Every day, via trial and error, chance meetings, and unexpected circumstances, we expand our prospects and horizons of possibilities [16]. Often it is believed to be unusual for learning to take place in an informal setting. Informal learning originated in the social learning theory, which states that individuals are more likely to adopt desirable behaviors from others around them [17] (Table 1). With randomly occurring content-generating visuals and moving pictures, users can think critically and creatively while completing complicated tasks when they are presented with information in several formats (text, audio, music, images, pictures, animation, and video). Talking about problems, it stems from people wanting to hear other people’s perspectives or educate themselves on how others in their community feel about various topics [18, 19]. The explosive popularity of mobile phones has sparked the creation of related technologies with an emphasis on improving society [20, 21]. Students may engage in a group-based, inclusive spiral by producing, sharing, commenting on, and enhancing the content using Web 2.0 technologies and social networks [22]. In addition, courtesy mobile technology, these pursuits may be engaged anytime, anywhere [23]. One of the main advantages of mobile technology is that it makes it easier for people to communicate with one another. These days, users like to have asynchronous conversations rather than ones that occur at set intervals [24]. With this, learning becomes a way of life rather than a routine activity. Formal subjects may be learned in an informal setting via the usage of social networks [25]. Students produce material, communicate with one another, and acquire knowledge on curriculum-related topics [26]. It is evident that today’s youth have made informal learning a normal part of their lives. Academia must adapt to these new conditions, to which students are accustomed and has become comfortable to, in order to take advantage of the spaces that social networks provide and, more crucially, the strategies employed to connect and learn jointly [27].

3 AI and Recommender Systems AI has recently expanded into the field of recommender systems (RSs), which provide personalized recommendations for content, actions, and products [28]. An RS may provide a product suggestion or a product prediction depending on the user’s preferences [29]. RSs may be seen as an information filtering system that offers customized product suggestions to the user [30]. These provide a convenient way to suggest relevant people, places, and things to others. The future of artificial intelligence will be in RSs. They act as tutors and provide assistance to teachers in the classroom to

Table 1 Trends in mobile learning

1. Flash to HTML5: HTML5 has become the smarter and faster way to render engaging content such as animations and videos to a whole range of mobile devices
2. Mobile app analytics: In 2025, we can expect to see analytics playing a bigger role in understanding learners’ interaction and behavior with mobile-based courses, such as by monitoring app traffic and visualizing page-flows
3. Responsive Web development for multi-device consistency: There is a greater shift toward keeping content responsive, i.e., adjustable to different screen sizes to provide consistency in the quality of learning experiences because of the multi-screen usage trend
4. Mobiles as devices for competency-based learning: There is an escalating trend of mobile videos and simulators being used for professional competency building, such as in the healthcare and manufacturing industries
5. Geo-location sensitive learning: Learning design based on social, mobile, real-time learning and geolocation can create authentic, personalized, and context-aware learning models wherein learners can have real-time dashboards to monitor their progress and access to the right content or experts based on geolocation
6. Social mobile learning: Brainstorming discussions, events, and groups on Facebook or a specially created social mobile platform are rapidly changing how employees interact with each other to share knowledge and drive innovation
7. Device agnostic design approach: The rise of the BYOD trend calls for a design approach that can cater to a variety of devices at the workplace, and device agnostic content is the answer to it
8. Multi-screen usage: A GfK study confirmed in 2014 that shuffling between multiple screens has become a norm. The same is true for mobile learning, as both smartphones and tablets have become an inseparable part of our lives
9. Byte-sized learning for performance support: Mobiles are emerging as the ultimate medium for just-in-time support, which could come in easily digestible and immediately useful information ‘nuggets’
10. Gamified learning and assistance: Gamification on the mobile is fun, engaging, and convenient and provides an easy method to teach abstract concepts such as team building and decision making
11. Augmented reality for mobiles: Mobile apps that superimpose digital information on a mobile or tablet screen that captures an object using the camera have already been developed
12. Wearable devices: Be it for monitoring health, finding directions, or connecting with people, there is a lot of activity in the wearable technology segment. As wearables become more of a need than a luxury, we can expect to see them being used actively for learning

propose educational materials and activities [31]. Because of their immense potential, they are expected to play a pivotal role in the evolution of massive open online courses (MOOCs) [32]. RSs are an integral part of the academic process in a variety of studies. Intelligent technologies can help users find the digital learning resources that best suit their individual profiles [33]. Interface agents, semantic refining agents, user profile agents, search engine agents, mediator agents, and recommender agents are all parts of the multi-agent architecture used in the creation of intelligent systems. In addition, we have the continuous improvement of e-learning courses framework (CIECoF), a collaborative RS for education whose main aim is to help professors improve their online lessons [34]. This is accomplished by applying distributed data mining to a client–server architecture with N clients all using the same association rule mining technique locally on their own and taking as input the prior behavior of students in an online course. By analyzing student participation in OpenACS/dotLRN course forums, unsupervised learning approaches in VLEs are facilitating the creation of recommendations that improve students’ educational experiences. These studies are only a small sample of the substantial literature on the topic of incorporating AI into instructional materials and practices.
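The following sketch illustrates the general collaborative-filtering idea that underlies many educational recommender systems, scoring unseen resources for a learner from the ratings of similar learners. It is not the CIECoF association-rule approach cited above, and the ratings matrix is invented for illustration.

```python
import numpy as np

# Rows are learners, columns are learning resources; 0 means "not yet used or rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return u @ v / denom if denom else 0.0

def recommend(user: int, top_n: int = 2):
    """Score unseen resources for `user` by similarity-weighted ratings of other learners."""
    sims = np.array([cosine(ratings[user], ratings[other])
                     for other in range(len(ratings)) if other != user])
    others = np.array([row for i, row in enumerate(ratings) if i != user])
    scores = sims @ others / (sims.sum() + 1e-9)    # weighted average rating per resource
    unseen = np.where(ratings[user] == 0)[0]        # only recommend items not yet used
    return sorted(unseen, key=lambda j: scores[j], reverse=True)[:top_n]

print(recommend(user=0))   # resources most likely to suit learner 0's profile
```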

4 Augmented Reality The term ‘augmented reality’ (AR) is used to describe a world that goes beyond the capabilities of the human senses. Augmented reality is a technology that superimposes digital information over a user’s live view of the world. Data created by computers, or data of any kind, might be stored here. Augmented reality is a technology that uses graphics, vision, and multimedia to superimpose digital information onto a user’s real-world surroundings. Static, dynamic, interactive, or autonomous

Table 2 Design tips for mobile learning

1. Minimize GUI functionalities: Use minimum functional elements, which should be highly visible and large enough to operate easily
2. Divide the course into multiple modules: Short modules capture learners’ attention and make them stay focused till the end
3. Offer small nuggets: Learners get just-in-time knowledge during their downtime and retain only essential information
4. Replace long text with audio: Based on the target audience and their working environment, replace large sections of text with audio
5. Avoid complicated graphics and background images: Complex graphics and backgrounds distract the learners’ attention
6. Minimize scrolls: Try to avoid scrolls; if not, use vertical scrolls, as the height of a mobile screen is more than its width
7. Design for single-hand use: Users use their mobile with a single hand and use their thumb to navigate through the screen (the average width of the adult thumb is 25 mm)
8. Use simple interactivities: Use simple interactions such as clicking on tabs, images, icons, rollovers, and hotspots for effective learning

features may all be used in augmented reality applications. These components may be seen on a conventional monitor, a device with enhanced vision, or holographic projections. Allowing students to engage with the real world, where more knowledge is stored, is a major benefit. Adding this layer allows for the modification of reality via the use of digital components, which may enhance learning and the ability to perceive the world around them (Table 2). In the classroom, augmented reality technology might have a major influence. It is best employed for investigating sensitive or inaccessible fields of information. Augmented reality technology is not a spectator sport. Students may use it to learn new material by engaging with virtual objects that illustrate abstract concepts. Dynamic processes, enormous datasets, and unusually shaped or scaled items may all be brought into the learner’s personal space at a level and in a manner that is simple to grasp and deal with. The ability of students to see, hear, and change content gives them an edge over more conventional learning approaches, making interactive and autonomous elements extremely important in the field of education. Also, it is feasible to repeat specific steps of a process as much as needed without squandering resources or putting oneself in harm. The multi-modal visualization of complex theoretical concepts, the practical exploration of the theory through concrete examples, the natural interaction with multimedia representations of teaching material, and the successful collaboration and discussion among the participants are four possible benefits of augmented reality in the field of education.
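A full AR pipeline involves tracking and pose estimation, which are beyond a short example, but the core step of superimposing computer-generated annotations on a camera view can be sketched as below. The frame, bounding box, and label are all stand-ins; a real application would read live frames and detect objects itself.

```python
import cv2
import numpy as np

# Stand-in for a camera frame; a real AR app would capture frames from the device camera.
frame = np.full((360, 640, 3), 200, dtype=np.uint8)

# Assume an object of interest was recognized at this bounding box (detection is out of scope here).
x, y, w, h = 220, 120, 200, 140
cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 180, 0), 2)
cv2.putText(frame, "Heart model: 4 chambers", (x, y - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 100, 0), 2)

cv2.imwrite("annotated_frame.png", frame)   # or cv2.imshow(...) in an interactive session
```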


5 Game-Based Learning Game-based learning refers to the use and development of game mechanics in ‘nongame’ contexts. Motive, cognition, and sociocultural lenses may all be used in the study of game-based learning. Students with ambition and access to material that is somewhat inspiring have a chance of succeeding in school. Several studies have shown a correlation between study motivation and retention. There are six factors that combine to create a state of intrinsic motivation: challenge, control, fantasy, competence, cooperation, and acknowledgment. These features promote user engagement, and they align with the shift from a teacher-centered to a learner-centered teaching approach. From a theoretical point of view, there are two avenues for incorporating games into higher education. At the outset, games are used for a wider variety of reasons, and their presence within a set of educational activities is given more importance. That is why they are so useful; they make it possible to expand one’s knowledge and skill set. Second, games are used when they add something useful to the content being covered.

6 Incorporating Technology into Classroom Learning Several studies and standards have been developed because of the incorporation of technology in classrooms. These have facilitated the organization of schoolwork and provided guidelines for the most effective use of technological advances in education. By going through that procedure, it became clear that both teaching techniques and teaching–learning materials affect student learning. This exemplifies the need of taking a holistic view of teaching and learning models, as well as the strong relationship between concepts and approaches. In certain cases, it is important to guide formative behaviors by providing context, which may be aided by a conceptual framework that positions mobile learning from multiple viewpoints. In an early attempt to establish a platform for studying mobile learning, a four-type classification of mobile learning systems was devised [1]. Figure 1 divides the x-axis into two halves, with individualistic pursuits (−x) and collaborative pursuits (+x). High transactional distance (−y) occurs when an endeavor needs a highly structured academic curriculum, whereas low transactional distance (+y) occurs when no such structure is seen to be necessary [1].
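The two axes of this classification can be encoded in a few lines, as in the sketch below; the quadrant labels paraphrase the four types rather than quoting the original taxonomy verbatim.

```python
def classify_mlearning_activity(collaborative: bool, high_transactional_distance: bool) -> str:
    """Place an m-learning activity in one of the four quadrants described above."""
    if high_transactional_distance:
        return ("high transactional distance, collaborative" if collaborative
                else "high transactional distance, individualized")
    return ("low transactional distance, collaborative" if collaborative
            else "low transactional distance, individualized")

# Example: a self-paced, tightly structured mobile course taken alone.
print(classify_mlearning_activity(collaborative=False, high_transactional_distance=True))
```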

7 Ubiquitous Learning The most game-changing inventions are also the ones most likely to be forgotten. The term ‘ubiquitous computing’ characterizes situations in which computers are present but not intrusive. As this concept is applied to the realm of education, we get


Fig. 1 Mobile learning framework [1]

the term ubiquitous learning (u-learning), which refers to the practice of learning in an environment where all students have constant access to a variety of digital tools and resources, such as mobile computing devices and Internet-connected computers. Figure 2 illustrates the conceptual shifts from e-learning to m-learning to u-learning [1]. Using ubiquitous computing in education, we may visualize a classroom in which teachers maintain concentration on their area of expertise while simultaneously using technology to boost student learning [1, 9]. Mobile computers are a vital component of ubiquitous learning, among the many other technological tools that may be used [1]. Involvement, presence, and adaptability are the three parts of yet another paradigm. These concepts serve as tools for evaluating the efficacy of new approaches to mobile education. Alternatively, the following three elements are also often included: individualization, collaboration, and genuineness. With a focus on education, this model takes a spatial and temporal approach to mobility. Beyond conceptual elements, usability should be considered when designing models. One way to

Fig. 2 Flow of e-, m-, and u-learning [1]


achieve this is via the use of a multi-level assessment framework. This may take the form of a model with micro-, meso-, and macro-levels to examine individual behaviors, the learning experience, and the impact on the institution. Two more thorough models exist, however, and they take into consideration mobile devices’ core technical components. It involves four parts: incorporating tools, implementing instructional techniques, evaluating evaluation approaches, and training teachers. One is founded on a holistic perspective that allows for optimal context-specific placement of learning. In this paradigm, mobile technology improves the interaction between learners, educators, and teaching–learning materials. Education approaches such as constructivism, active learning, collaborative learning, and blended learning are all applicable here. Essential assessment strategies include, but are not limited to, computer-based assessment, tutor assessment, self-assessment, and peer review (Table 3). In this article, we provide a framework for defining the elements that, when applied to a digital platform, enable the development of dynamic, intelligent mobile learning scenarios that provide students with an educational experience that is uniquely suited to their needs and preferences. Mobility is the initial pillar of the concept, and its greatest strength is that it allows participants in a teaching–learning process to stay in constant contact with one another regardless of where they are in the world or what time it is. The second aspect of this framework is socialization, the building blocks for an interconnected world. These connections represent the variety and potential of education. The need to combine formal and informal learning is especially important to consider here since informal learning often takes place in social networks via leisure or entertainment content. There are, however, issues with adapting social network technology and methods to the more traditional curriculum used in schools.

8 Visualization of the Connections Artificial intelligence (AI) plays a supporting role in our method through RSs, which are entrusted with modeling students’ learning patterns and customizing tools and resources accordingly, setting our strategy unique from others currently in use. It may act as an online instructor for students. The purpose is to steer pupils in the right direction. Educational resources are often crucial to the learning activities that make up a course’s framework. As a result, they boost efficiency and maximize technical potential. Technology like video games and augmented reality may aid students in their learning process. They are interesting because of the multimedia features and easy-to-understand writing. There are two types of interactions between the components, both of which are based on the principle of ubiquity. The first set of connections suggests that learning and practice are key to fostering participation. This is articulated in the course’s instructional design, which proposes that formal education evolves into a lifestyle that makes use of informal learning methods and challenges established assumptions.


Table 3 Future of learning

A glimpse into the future of learning: these changes point the way toward a diverse learning ecosystem in which learning adapts to each child instead of each child trying to adapt to school.

Learning will no longer be defined by time and place unless a learner wants to learn at a particular time and in a particular place.
Whatever the path, radical personalization will become the norm, with learning approaches and supports tailored to each learner.
Some of those tools will use rich data to provide insight into learning and suggest strategies for success.
Diverse forms of credentials, certificates, and reputation markers will reflect the many ways in which people learn and demonstrate mastery.
At the same time, geographic and virtual communities will take ownership of learning in new ways, blending it with other kinds of activity.
As more people take it upon themselves to find solutions, a new wave of social innovation will help address resource constraints and other challenges.
Educators’ jobs will diversify as many new learning agent roles emerge to support learning.
A wide variety of digital networks, platforms, and content resources will help learners and learning agents connect and learn.
‘School’ will take many forms. Sometimes, it will be self-organized.
Work will evolve so rapidly that continuous career readiness will become the norm.
Learners and their families will create individualized learning playlists reflecting their particular interests, goals, and values.
Those learning playlists might include public schools but could also include a wide variety of digitally-mediated or place-based learning experiences.

The student’s interaction with electronic media defines the second group of ties. When a user’s actions inside a virtual learning environment (VLE) are tracked, a profile of the user’s preferred methods of study may be constructed. There are three elements in any system that influence this connection. The first is input, which originates from the activities of students inside the classroom. In other words, everything from their preferences and actions to the study materials they consult, the activities they partake in, the amount of time they dedicate to them, etc., is tracked (Table 4).


Table 4 Benefits of m-learning over traditional e-learning

Performance support: M-learning is ideal for performance support intervention, as learners have easy access to information while at work. This leads to increased usage and retrieval.
Learning path: Mobile devices can be used to update learners on their ‘learning path’, thereby facilitating ‘learning as a continuum’.
Higher engagement: The training experience is more immersive, and completion rates are higher as compared to traditional e-learning.
Better completion rates and higher retention: The bite-sized or microlearning approach makes it easier for learners to initiate, complete, and retain learning better.
Flexibility to learners: With m-learning, learners have the flexibility of learning anytime, anywhere on the device of their choice and in varied formats.
Collaborative learning: It is a great way to engage with peers to share learning experiences and be part of communities of specific practices.
Multi-device support: The same course is available on varied devices ranging from PCs and laptops to tablets and smartphones.

Processing the incoming data is the second stage. Algorithms used in artificial intelligence provide paths to knowledge based on preferences and habits. The materials and activities along these routes are tailored to the learner based on their unique learning profile. By constantly analyzing incoming data, the system is able to grow and learn as it continually refines its predetermined learning paths, patterns, and preferences. The third is the development of online classrooms driven by students’ needs and inclinations. This modification is based on the findings from analyzing the students’ data.
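One way to picture the input, processing, and adaptation stages described in this section is the toy loop below, which aggregates a learner's activity log into a preference profile and selects the next resources accordingly. The event log, resource catalogue, and selection rule are deliberately simplistic placeholders rather than the system proposed here.

```python
from collections import Counter

# Hypothetical VLE event log: (learner, resource_type, minutes_spent)
events = [
    ("asha", "video", 14), ("asha", "video", 9), ("asha", "quiz", 3),
    ("ravi", "reading", 20), ("ravi", "quiz", 12), ("ravi", "reading", 8),
]

catalogue = {
    "video":   ["Intro lecture clip", "Worked example video"],
    "reading": ["Chapter summary", "Case study text"],
    "quiz":    ["Self-check quiz", "Practice test"],
}

def build_profile(learner: str) -> Counter:
    """Input stage: aggregate time spent per resource type into a simple preference profile."""
    profile = Counter()
    for who, rtype, minutes in events:
        if who == learner:
            profile[rtype] += minutes
    return profile

def next_resources(learner: str, n: int = 2) -> list[str]:
    """Processing and adaptation: pick content matching the learner's dominant preference."""
    profile = build_profile(learner)
    if not profile:
        return catalogue["video"][:n]          # cold start: fall back to a default path
    preferred = profile.most_common(1)[0][0]
    return catalogue[preferred][:n]

print(next_resources("asha"))   # the virtual classroom would surface these items next
```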

9 Concluding Remarks In the not-so-distant future, it is possible that technological aspects may dominate two distinct fields. In the first place, the development of mobile technology has contributed to the widespread acceptance of the concept of ubiquity. For this reason, technological advancements are needed that fully exploit the advantages of mobile learning, adapt methods to new forms of interaction and learning, and develop and experiment with teaching–learning models as alternatives that can raise the students’ levels of content assimilation and technological innovation proposals. The second

is that student support and tutoring are increasingly being incorporated into classrooms via automated means. The widespread availability of massive open online courses (MOOCs) is a key factor in propelling the research and development of automated tutoring systems. Artificial intelligence has a lot of potential applications in the classroom, and some of the earliest advances in this direction have already been accomplished. In future, VLEs will have smart features like personalized learning and coaching. In order to successfully deliver adequate learning levels, it is essential to experiment with various technologies and to have a cross-disciplinary methodological axis.

References 1. Park Y (2011) A pedagogical framework for mobile learning: categorizing educational applications of mobile technologies into four types. Int Rev Res Open Dist Learn 12(2):78–102 2. Adewale OS, Agbonifo OC, Ibam EO, Makinde AI, Boyinbode OK, Ojokoh BA, Olatunji SO et al (2022) Design of a personalised adaptive ubiquitous learning system. Interact Learn Environ 1–21 3. Akturk AO (2022) Thirty-five years of the journal of computer assisted learning: a bibliometric overview. J Comp Assist Learn 4. Alam A (2020) Challenges and possibilities in teaching and learning of calculus: a case study of India. J Educ Gift Young Sci 8(1):407–433 5. Tseng SS, Chen SN, Yang TY (2022) Building an AR-based smart campus platform. Multimedia Tools Appl 81(4):5695–5716 6. Alam A (2020) Pedagogy of Calculus in India: an empirical investigation. Periódico Tchê Química 17(34):164–180 7. Zhang X (2022) The influence of mobile learning on the optimization of teaching mode in higher education. Wire Commun Mob Comput 8. Alam A (2020) Possibilities and challenges of compounding artificial intelligence in India’s educational landscape. Int J Adv Sci Technol 29(5):5077–5094 9. Zhang M, Chen Y, Zhang S, Zhang W, Li Y, Yang S (2022) Understanding mobile learning continuance from an online-cum- offline learning perspective: a SEM-neural network method. Int J Mobile Commun 20(1):105–127 10. Alam A (2020) Test of knowledge of elementary vectors concepts (TKEVC) among first-semester bachelor of engineering and technology students. Periódico Tchê Química 17(35):477–494 11. Zahtila M, Burghardt D (2022) Location-based mobile learning on relief mapping methods. J Locat Based Serv 1–28 12. Alam A (2022) Impact of university’s human resources practices on professors’ occupational performance: empirical evidence from India’s higher education sector. In: Rajagopal BR (Eds) Inclusive businesses in developing economies. Palgrave Studies in Democracy, Innovation, and Entrepreneurship for Growth. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-03112217-0_6 13. Yu J, Denham AR, Searight E (2022) A systematic review of augmented reality game-based Learning in STEM education. Educ Technol Res Develop 1–26 14. Alam A (2021) Possibilities and apprehensions in the landscape of artificial intelligence in education. In: 2021 International conference on computational intelligence and computing applications (ICCICA). IEEE, pp 1–8 15. Yu D, Yan Z, He X (2022) Capturing knowledge trajectories of mobile learning research: a main path analysis. Educ Inform Technol 1–24


16. Alam A (2021) Should robots replace teachers? Mobilisation of AI and learning analytics in education. In: 2021 International conference on advances in computing, communication, and control (ICAC3). IEEE, pp 1–12 17. Wang LH, Chen B, Hwang GJ, Guan JQ, Wang YQ (2022) Effects of digital game-based STEM education on students’ learning achievement: a meta-analysis. Int J STEM Educ 9(1):1–13 18. Alam A (2022) A digital game based learning approach for effective curriculum transaction for teaching-learning of artificial intelligence and machine learning. In 2022 International conference on sustainable computing and data communication systems (ICSCDS). IEEE, pp 69–74 19. Todino MD, Desimone G, Kidiamboko S (2022) Mobile learning and artificial intelligence to improve the teaching-learning process in ICT global market age. Studi sulla Formazione/Open J Educ 25(1):233–249 20. Alam A (2022) Educational robotics and computer programming in early childhood education: a conceptual framework for assessing elementary school students’ computational thinking for designing powerful educational scenarios. In 2022 International conference on smart technologies and systems for next generation computing (ICSTSN). IEEE, pp 1–7 21. Tlili A, Padilla-Zea N, Garzón J, Wang Y, Kinshuk K, Burgos D (2022) The changing landscape of mobile learning pedagogy: a systematic literature review. Interact Learn Environ 1–18 22. Alam A (2022) Employing adaptive learning and intelligent tutoring robots for virtual classrooms and smart campuses: reforming education in the age of artificial intelligence. In: Shaw RN, Das S, Piuri V, Bianchini M (Eds) Advanced computing and intelligent technologies. Lecture Notes in Electrical Engineering, vol 914. Springer, Singapore 23. Singh Y, Suri PK (2022) An empirical analysis of mobile learning app usage experience. Technol Soc 68:101929 24. Alam A (2022) Cloud-based e-learning: development of conceptual model for adaptive elearning ecosystem based on cloud computing infrastructure. In: Kumar A, Fister Jr I, Gupta PK, Debayle J, Zhang ZJ, Usman M (Eds) Artificial intelligence and data science. ICAIDS 2021. Communications in Computer and Information Science, vol 1673. Springer, Cham 25. Petrovi´c L, Stojanovi´c D, Mitrovi´c S, Bara´c D, Bogdanovi´c Z (2022) Designing an extended smart classroom: an approach to game-based learning for IoT. Comput Appl Eng Educ 30(1):117–132 26. Alam A (2022) Investigating sustainable education and positive psychology interventions in schools towards achievement of sustainable happiness and wellbeing for 21st century pedagogy and curriculum. ECS Trans 107(1):19481 27. Sáez-López JM (2022) Application of the ubiquitous game with augmented reality in primary education. Sáez-López JM, Sevillano-García ML, Pascual-Sevillano MA (2019) Application of the ubiquitous game with augmented reality in primary education. Comunicar 61:71–82 28. Alam A (2022) Mapping a sustainable future through conceptualization of transformative learning framework, education for sustainable development, critical reflection, and responsible citizenship: an exploration of pedagogies for twenty-first century learning. ECS Trans 107(1):9827 29. Pishtari G, Rodríguez-Triana MJ (2022) An analysis of Mobile learning tools in terms of pedagogical affordances and support to the learning activity life cycle. In Hybrid learning spaces. Springer, Cham, pp 167–183 30. 
Alam A (2022) Positive psychology goes to school: conceptualizing students’ happiness in 21st century schools while ‘minding the mind!’ are we there yet? evidence-backed. Sch Based Pos Psychol Intervent ECS Trans 107(1):11199 31. Peramunugamage A, Ratnayake UW, Karunanayaka SP (2022) Systematic review on mobile collaborative learning for engineering education. J Comp Educ 1–24 32. Alam A (2022) Social robots in education for long-term human-robot interaction: socially supportive behaviour of robotic tutor for creating robo-tangible learning environment in a guided discovery learning interaction. ECS Trans 107(1):12389


33. Almaiah MA, Ayouni S, Hajjej F, Lutfi A, Almomani O, Awad AB (2022) Smart mobile learning success model for higher educational institutions in the context of the COVID-19 pandemic. Electronics 11(8):1278 34. Alam A (2023) Cloud-based e-learning: scaffolding the environment for adaptive e-learning ecosystem based on cloud computing infrastructure. In: Satapathy SC, Lin JCW, Wee LK, Bhateja V, Rajesh TM (Eds) Computer communication, networking and IoT. Lecture Notes in Networks and Systems, vol 459. Springer, Singapore. https://doi.org/10.1007/978-981-19-1976-3_1

Quantum Web-Based Health Analytics System

K. Pradheep Kumar and K. Dhinakaran

K. P. Kumar (B), Department of Computer Science and Engineering, BITS Pilani, Pilani, Tamil Nadu 100190, India. e-mail: [email protected]
K. Dhinakaran, Department of Artificial Intelligence and Data Science, Dhanalakshmi College of Engineering, Chennai, Tamil Nadu 10587, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_53

Abstract In this work, a quantum web-based health analytics system is proposed. The system uses a neuro-fuzzy algorithm with a selective rule-based strategy to process datasets. When a query is raised, the required datasets are identified and processed to formulate inferences for a knowledge repository. The knowledge repository is decentralized and ensures authentic access for a particular query. The quantum block chain algorithm reduces processing time and memory consumed by 24 and 28%, respectively, compared to the conventional block chain approach.

Keywords Quantum block chain · Neuro fuzzy · Rule strength · Hyperledger · Qubits · Web 3 · Sky computing

1 Introduction

In today’s world, several diseases occur as pandemics which result in disasters. To overcome this, it is essential for everybody to have awareness in the health domain. The information available should be authentic and reliable. Hence, a security system is required to ensure that no unauthorized access is attempted on the data. This is possible only if we have a decentralized distributed database system where users are restricted to accessing only a particular subset of datasets. Such a decentralized web, known as Web 3, is designed to permit execution on datasets to provide inferences. In this model, datasets are stored in several clouds hosted by multiple service providers. Each service provider has a separate service level agreement. When data is being accessed from a particular cloud environment, several challenges such as
authorized access and latency are associated. To overcome the same, it becomes essential to have a sky computing model where access is provided to multiple cloud providers without any breach to the service level agreement. Also optimizing the network latency by an appropriate virtualization strategy to facilitate the storage is mandatory. Each cloud computing source is identified by a tag to identify its source. A query might require several datasets to provide an inference. Further in several cases a query may not require a complete dataset, it might require a partial dataset which is a subset of a dataset. It may also require a simultaneous data mining on partial datasets. This may require extensive processing of the datasets. Such extensive processing may consume large amount of memory resource. This is because inferences are required in a timely manner to avoid any disaster. A block chain is constructed with inferences which are essential for a query set to provide a required diagnosis for a patient. A quantum computing algorithm is executed on all the nodes of the block chain to provide a diagnosis. This is accomplished by formulating a rule base for the chosen datasets for each query. A query set comprising of relevant queries is created. A one-to-one mapping exists between the query set and the datasets. Based on the inferences obtained, a diagnosis for a patient is obtained. The quantum block chain algorithm reduces processing time and memory consumed by 24% and 28%, compared to the conventional block chain approach. The paper is organized as follows: Sect. 2 reports the literature analysed in this domain. Section 3 explains the proposed model. It also highlights the salient features of the architecture and the algorithm. Section 4 discusses the simulation results in detail. Section 5 concludes the work and highlights the future directions of extension of work.

2 Literature Review A quantum block chain is one which has multiple nodes which are in consensus. The consensus problem with Byzantine Agreement has been discussed by Chuntang et al. in [1]. Different data frameworks and health records for various health records have been discussed by Rehab et al. in [2]. The mapping of cloud computing data as block chains has also been illustrated with different data frameworks. Different digital signature algorithms and use of qTESLA for extracting public key using basic operations of lattices have been explained by Zhang et al. in [3]. Digital signature algorithms and hash checking have also been explained in [3]. Quantum resistant cryptographic algorithm particularly using lattices which has additional features compared to RSA, ECDSA, etc., has been discussed by Mohammed and Askar in [4]. Several block chain schemes based on code and lattice for extraction of public key have been explained by Teodora et al. in [5]. The hash-based digital schemes have also been emphasized with reference to the public key cryptography that have

Quantum Web-Based Health Analytics System

751

been illustrated in [5]. A quantum network protocol with quantum repeaters with emphasis on the network layer architecture has been discussed by Kozlowski et al. in [6]. A quantum key distribution protocol for decoherence-free subspace by formulating special quantum error correcting codes in Hilbert space has been explained by Qianqian et al. in [7]. The vulnerabilities in a block chain network have been illustrated, and Post Quantum Certificates by LACChain nodes have been explained by Allende et al. in [8]. The vulnerabilities have been rated based on their severity level as discussed by Kearney et al. in [9]. The various security primitives, hash functions and quantum cryptosystems have also been classified as multivariate cryptosystems, and lattice cryptosystems have been explained by Fernandez and Paula in [10].

3 Proposed Model For providing diagnosis for any disease, a set of inferences need to be analysed from a knowledge repository. These inferences require extensive processing of datasets, partial datasets and combined processing of multiple partial datasets. The proposed model has two stages: 1. Identification of required datasets 2. Processing of the required datasets.

3.1 Identification of Required Datasets The datasets reside on multiple cloud servers in multiple clusters. The first step is to identify the required datasets. To accomplish the same when a query is raised, the same is decomposed into multiple query components. These query components are modelled as query points. Using the metadata information from the cloud, the required datasets are collected by making an assessment from the sources. Each source has a tag, and a lookup table is created with a tag ID and the required dataset. Datasets from multiple clouds are linked to a dataset identifier for extraction and then mapped with the query points as shown in Fig. 1.

3.2 Processing of Required Datasets For each query, a rule base is formulated and a threshold for the same is computed based on the rule strength. The rule strength is computed using the following expression:

752

K. P. Kumar and K. Dhinakaran

Fig. 1 Block diagram showing mapping operation of query points with datasets

RS(i) =

i=n 

(Wt(i) ∗ Lingval(i)

(1)

i=1

where Wt(i) is the attribute significance for the query component and Lingval(i) is the mid-value of the linguistic fuzzy variable. The threshold of the rule base is computed using the following expression: i=n T =

(Rid ∗ RS(i)) i=n i=1 Rid

i=1

(2)

where Rid is the rule number and RS(i) is the rule strength. Rules whose rule strength is less than the threshold are chosen, and only those rules are packed as qubits. Each qubit would handle 2n rules. The qubits then process the datasets identified.

3.2.1

Creation of Quantum Block Chain

The output of each qubit which is an inference is stored as a block in the quantum block chain. The block chain used here is a Hyperledger type without a wallet. Finally, a Hadamard gate operation is applied to the quantum block chain by aggregating the outputs of each qubit. The Hadamard gate operation analyses all combinations of qubits and optimizes the same. It then provides the diagnosis. The quantum block chain with qubits processing the datasets has been illustrated in Fig. 2. The Hadamard gate processing on the quantum block chain to provide the diagnosis is shown in Fig. 3.

Quantum Web-Based Health Analytics System

Inference 1 from Qubit 1

Inference 2 from Qubit 2

753

Inference n from Qubit

Fig. 2 Quantum block chain

Block 1 Diagnosis for Disease Block 2

Block n

Fig. 3 Block diagram processing of quantum block chain

Algorithm • • • • • • • • • • • • • •

Fetch a Query Set For each query in set Decompose the query into query components For each Query Component identify the required datasets For each query point create a Rule base Compute the Rule Strength and Threshold for the Rule base Choose Rules whose Rule Strength is less than the Threshold Decide on the number of qubits Each qubit would store 2n rules where n is the number of rules Map each qubit to the required datasets to process Aggregate Results of all query points of a particular query. Store results of each query as a node in the quantum block chain Repeat the above step for all queries in the set. Apply Hadamard gate processing for all blocks in the block chain to obtain the diagnosis for a patient.

754

K. P. Kumar and K. Dhinakaran

4 Stimulation Results The algorithm was implemented using CIRQ to convert JSON objects to qubits. It was tested for 1,00,000 datasets across three cloud provider platforms, namely Google, Amazon and Azure. The performance of the algorithm was assessed based on the following metrics: • Processing time • Memory consumption.

4.1 Processing Time The processing time is the total time duration to decompose a query into query components, fetch the required datasets and process the same. It is the time involved for the entire query set after applying the Hadamard gate operation on quantum block chain. It could be observed that the average reduction of processing time which is 24% could be obtained for 50,000 datasets compared to the conventional approach and maximum reduction of processing time which is 29% is obtained for 30,000 datasets as given in Table 1 and illustrated in Figs. 4 and 5. Table 1 Query processing times (conventional block chain vs. quantum block chain approach) No. of. datasets

Processing time

Reduction (%)

Conventional block chain approach (s)

Quantum block chain approach (s)

10,000

15.78

12.41

21.36

20,000

23.45

18.65

20.47

30,000

34.56

24.51

29.08

40,000

47.51

35.65

24.96

50,000

57.87

43.56

24.73

60,000

65.67

52.34

20.30

70,000

75.67

56.54

25.28

80,000

86.57

66.54

23.14

90,000

94.56

72.34

23.50

1,00,000

98.65

75.47

23.50

Average

60.63

45.80

23.63

150 100 50 0 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000

Processing Time (Secs)

Quantum Web-Based Health Analytics System

755

Conventional Block Chain Approach Quantum Block Chain Approach

No.of Data Sets

Fig. 4 Plot comparing processing times (conventional block chain approach vs. quantum block chain approach)

35.00

Reduction (%)

30.00 25.00 20.00 15.00 10.00 5.00 0.00

No.of Data Sets Fig. 5 Plot showing reduction of processing time using quantum block chain approach

4.2 Memory Consumption The memory resource consumed stores the entire quantum block chain which is the aggregate of all the nodes in the quantum block chain. The memory resource consumed by the quantum block chain and conventional block chain approach is given in Table 2. It could be observed that the average reduction of memory resource which is 28% could be obtained for 60,000 datasets compared to the conventional approach and maximum reduction of memory resource which is 36% is obtained for 40,000 datasets as given in Table 2 and illustrated in Figs. 6 and 7.

756

K. P. Kumar and K. Dhinakaran

Table 2 Resource consumption (conventional block chain vs. quantum block chain approach) No. of. datasets

Processing time Conventional block chain approach (s)

Reduction (%) Quantum block chain approach (s)

10,000

11.45

8.75

23.58

20,000

18.67

12.67

32.14

30,000

29.45

19.54

33.65

40,000

36.76

23.45

36.21

50,000

47.54

32.54

31.55

60,000

56.76

41.43

27.01

70,000

65.43

49.87

23.78

80,000

74.34

57.65

22.45

90,000

84.42

63.45

24.84

100,000

96.46

72.34

25.01

Average

52.128

38.17

28.02

Resource Consumption

120 100 80 60

Conventional Block Chain Approach

40

Quantum Block Chain Approach

20 0

No. of DataSets Fig. 6 Plot comparing processing times (conventional block chain approach vs. quantum block chain approach)

5 Conclusion In this work, a quantum block chain network has been proposed for a Web 3 architecture. The work attempts to enhance security and achieve high-speed dataset

Quantum Web-Based Health Analytics System

757

40.00 35.00

Reduction (%)

30.00 25.00 20.00 15.00 10.00 5.00 0.00 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000

No. of Data Sets Fig. 7 Plot showing reduction of memory consumed using quantum block chain approach

processing. The quantum block chain algorithm reduces processing time and memory consumed by 24% and 28%, compared to the conventional block chain approach.

6 Future Work The quantum block chain network could be ported in Web 3 platform to create a Quantum Internet. Additional security protocols could be incorporated to add more security and provide reliable diagnosis for patients.

References 1. Li C, YinSong X, Tang J, Liu W (2019) Quantum blockchain: a decentralized, encrypted and distributed database based on quantum mechanics. J Quant Comput 1(2):49–63 2. Rayan RA, Zafar I, Tsagkaris C (2021) Blockchain technology for healthcare cloud-based data privacy and security. Book: Integration of WSN into Internet of Things, pp 336–349 3. Zhang P, Wang L, Wang W, Fu K, Wang J, A blockchain system based on quantum-resistant digital signature 4. Mohammed Khalid Z, Askar S (2021) Resistant blockchain cryptography to quantum computing attacks. Int J Sci Business, pp 116–125 5. Ciulei A-T, Cretu M-C, Simion E (2022) Preparation for post-quantum era: a survey about blockchain schemes from a post-quantum perspective, pp 0–37 6. Kozlowski W, Dahlberg A, Wehner S (2021) Designing a quantum network protocol. ACM 1–16

758

K. P. Kumar and K. Dhinakaran

7. Hu Q, Sun T, Feng B, Jia W (2019) A quantum key distribution scheme based on quantum error-avoiding code in decoherence-free subspace. ACM 1–35 8. Allende M, Lopez D, Ceron S, Leal A, Pareja A, Da Silva M, Pardo A, Jones D, Worrall D, Merriman B, Gilmore J, Kitchener N, Venegas-Andraca SE (2021) Quantum-resistance in blockchain networks. ITE Department, IDB Lab, pp 1–30 9. Kearney JJ, Perez Delgado CA (2021) Vulnerability of blockchain technologies to quantum attacks, pp 1–10 10. Fernandez-Carames TM, Fraga-Lamas P (2021) Towards post-quantum blockchain: a review on blockchain cryptography resistant to quantum computing attacks. IEEE Access 1

Comparative Study of Noises Over Quantum Key Distribution Protocol Sawan Bhattacharyya, Ajanta Das, Anindita Banerjee, and Amlan Chakrabarti

Abstract The security of modern cryptographic systems mostly depends on mathematical hardness to solve a particular problem manually. The most popular and widely used classical cryptosystem, RSA (Rivest, Shamir, Alderman) is based on the mathematical hardness of prime factorization of the product of two large prime by any of the present classical processors. But the advancement in technology particularly in quantum information provides a threat to the present communication protocols. It motivates the researchers to move to new technologies that are fundamentally more secure through the principle of quantum mechanics. Such protocols rely upon quantum bits or qubits for data transmissions securely. The application of qubits instead of bits in cryptography using quantum key distribution protocols makes encryption robust. But the quantum communication protocols are prone to noise due to environmental factors, fragile and traverse’ a long fiber optical path. Hence, the states of qubits are easily tampered with by external factors. Noise alters the information; eventually, the message may not be meaningful to receivers. Except for bit flip and phase flip noises, depolarization, amplitude damping, decoherence, thermal relaxation, and phase damping are obstacles to achieving long-distance communication. The objective of this paper is to present a comparative study of quantum communication protocols in a noisy environment with different classes of noise. S. Bhattacharyya (B) Department of Computer Science, Ramakrishna Mission Vivekananda Centenary College, Kolkata, West Bengal, India e-mail: [email protected] A. Das Amity Institute of Information Technology, Amity University Kolkata, Kolkata, India A. Banerjee Corporate Research and Development, Centre for Development of Advanced Computing, Pune, India e-mail: [email protected] A. Chakrabarti A. K. Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_54

759

760

S. Bhattacharyya et al.

Keywords Quantum communication · Quantum entanglement · Quantum superposition · Decoherence · Dephasing

1 Introduction Communicating using Morse Code on telegraph lines to digital communication in modern-day computers and the Internet, security had always been an indispensable part. In the digital communication binary data, bits that could either be 0 or 1 but not both are transferred from senders to receivers. These bits travel either through free space or through cables. These bits are physically realized through the variation in some physical quantity commonly voltage; quantum communication, on the other hand, is administered by the laws of quantum mechanics particularly superposition, entanglement, and the particular no-cloning theorem. The last one, i.e., the no-cloning theorem is of particular interest because it satisfies the unconditional security of many of our quantum communication protocols [1]. The quantum states cannot be copied and any attempt to do so would let a trace upon the measured qubit [2]. The trace is accounted for by a change in probability distribution which in turn let the measurement be detected, making the quantum communication inherently private. The main two categories of the quantum channel are the free space channel and fiber optics cable. The reason for using these two as our quantum communication channel is that decoherence occurs within the limit of 20–30 km of atmosphere [3, 4]. The main limitation posed by our classical counterpart is its security which is unconditionally served by the quantum counterpart. The problem of key distribution in the classical domain is purposefully served by quantum key distribution (QKD) protocols, a prime candidate of quantum communication protocols and our main area of focus in this paper [5]. Quantum communication in the simplest term can be explained as the art of transferring the quantum state from one place to other. Information is encoded into qubits, the unit of information in quantum information science. Quantum communication protocols can be classified into the following classes based on the applications and techniques used [6]. 1. Quantum key distribution: The goal of the QKD protocol is to generate a shared secret key between the sender and receiver over a public communication channel. Generally, the QKD protocols can be divided into two phases; the first phase is the quantum transmission phase where the two legitimate users encode or decode the quantum information to or from qubits. The second phase is that of the classical postprocessing phase where they generate the secure key from the bit string generated during transmission. The quantum transmission task is usually performed by the polarization in the case of the photonic qubit where a specific degree of polarization orientation represents one specific bit of information (refer to Fig. 1). Example: The most prominent example here is the BB84 developed by Bennet and Brassard in the year 1984, other examples include Ekert E91 Protocol, B92 Protocol, and SARG04 protocol [7].

Comparative Study of Noises Over Quantum …

761

Fig. 1 Schematic diagram of a typical QKD protocol [8]

2. Quantum authentication: The quantum authentication protocol is related to finding the identity of the sender and the integrity of the sender. The QKD technique had provided a new edge in the process of secure communication but there exists a serious loophole in it. The prior technique can easily detect the presence of the attacker but provides no information regarding the identity of the sender, i.e., the receiver can never know whether the message came from a legitimate user or attacker [9]. This is the point where the authentication came into play which guaranteed that the message just came from the legitimate sender and not from the attacker. In a standard quantum authentication scheme, the idea is to encode the quantum state in a quantum error-correcting code. Instead of using only one error-correcting code, there is a need to use a family of codes to reduce the chances of the creation of errors by an attacker in the used correcting code (refer to Fig. 2). Now in this article, quantum key distribution had been focused on and studied which can further be classified into the following form • Prepare and measure protocols: Sender prepares the quantum states which he or she wants to transfer according to the classical information, and on the other hand, receiver measures the received quantum state to obtain the information. Example: BB84, B92, SARG04, DPS-QKD • Entanglement based protocol: Sender and receiver both depend on some central source or some third party to transmit the entangled pair—a part of which belongs to the sender and the other to the receiver. Example: Ekert E91.

762

S. Bhattacharyya et al.

Fig. 2 Schematic diagram of a typical quantum authentication protocol where authentication is carried out in a separate channel apart from the transmission channel [4]

Quantum communication even though provides unconditional security but lags behind classical communication in providing long-distance error-free transmission of information. Qubits are extremely fragile and prone to various noise and errors [10]. Noise that can arise in our communication protocols can be classified into three classes of channel [2]. 1. Depolarizing channel 2. State preparation and measurement (SPAM) channel 3. Thermal decoherence and dephasing channel. The main objective of this paper is to study the effects of these categories of noise on quantum key distribution protocol with various qubits in the above-mentioned three noisy quantum channels and to calculate the secure key generation rate for the QKD protocol. The experimental results stand with the theoretical prediction except for some marginal errors that arise from the simulations. The paper is organized into the following sections. Theory of quantum communication with the fundamentals of quantum computing and background of the QKD mathematical formulation of the secure key generation rate is briefly mentioned in Sect. 2. This section also elaborates on different types of noises in quantum communication. Section 3 presents the methodology and the proposed approach for implementation of the algorithm studied and the tool used for the purpose. It specifies the various packages and modules that had been used in the study. Section 4 presents the results of the specific cases of three types of noise channels using the probability distribution chart. It also specifies the secure key generation rate for the three noisy channels. The achieved experimental results for various noises are discussed and explained in Sect. 5. Finally, the paper concludes with the challenges that had been arises in the research with a future direction.

Comparative Study of Noises Over Quantum …

763

2 Theory of Quantum Communication Quantum communication can be stated in a simple term as the transfer of a quantum state from one place to another. The motivation behind encoding our classical information in a quantum state is the unconditional security provided by this QKD mechanism. Another motivation that motivated physicists and engineers to switch from classical systems to the quantum system is the close correlation between quantum communication and quantum nonlocality as illustrated by the process of quantum teleportation. The quantum states are encoded in a quantum system qubits that live in a two-dimensional hilbert space. Some of the other quantum communication protocols besides QKD are quantum oblivious transfer where the senders send much potential information to the receiver but the sender is himself not aware of the specific content of the transmission. Informally speaking, the sender sends a message to the receiver which the receiver received half of the time. The sender does not know anything about the outcome but the receiver knows whether he has received the message or not. Another yet most important quantum communication protocol is the quantum authentication protocol which certifies the identity of the sender and the integrity of the message sent traditionally authentication and identification of the sender it’s done through the use of the hash function [5, 11, 12]. A different form of quantum communication technique is quantum teleportation which is based on the principle of quantum entanglement. It allows transmitting an arbitrary qubit from location a to location b using a preshared pair of entangled qubits sent over by some third party or central source [9, 13]. For a qubit to be useful, we must be able to perform three fundamental operations on it, viz. prepare it in a well-defined state, apply controlled unitary operations on it, and be able to measure it. In communication, we use photons to carry information; they are known as flying qubits. The requirements for the photonic qubits to be useful in our quantum communication can be listed in the following ways: 1. State preparation: Qubit can be prepared using a single photon by the spatial mode, polarization mode, and time slot implementations. • In the first case, the single photon in mode 1 is split into two spatially separated modes 2 and 3 using a beamsplitter and a phase shifter; hence, the state of the photon becomes |11 → |ψqubit  = cos θ |12 |03 + eiφ sin θ |02 |13

(1)

Thus, binary information can be encoded in the presence of the photon as |0 = |12 |03 |1 = |02 |13

(2)

764

S. Bhattacharyya et al.

• In the polarization technique, the two spatial modes are replaced by the two polarization states of a single spatial mode. Thus, binary information can be encoded in horizontal or vertical polarization as |0 = |H  |1 = |V 

(3)

These techniques are not well suited for long-distance communication because they are very sensitive to polarization drifts and phase instability in long optical fibers. • Another very practical implementation of photons that are well suited for long-distance communication. A single-photon in mode 1, which defines a transform-limited wavepacket in space and time, is sent through an unbalanced interferometer, which has a short and a long arm. The long arm introduces a time delay relative to the short arm, which is greater than the coherence length of the input photon. The output of the interferometer is two pulses separated in time. Assuming this time separation is sufficiently long so that the two-time slots can be treated as orthogonal modes, we can define the modes corresponding to time slots t1 and t2 . Qubit state after passing through the unbalanced interferometer becomes |ψqubit  = cos θ |t1  + eiφ sin θ |t2 

(4)

The information, in this case, is encoded in the relative phase of the two-time slots t1 and t2 . This information remains undisturbed during propagation in optical fiber because the time separation of the two pulses is usually very short, on the order of a nanosecond, while the phase and polarization drifts occur at long time scales, so the pulses undergo the same distortion in the fiber. This fact makes time slot implementation advantageous for long-distance quantum communication. The above technique is used in differential phase shifting QKD [14]. 2. Unitary operation: Manipulation of the quantum information needed to perform controlled unitary evolution, which means that we should be able to transform the qubit from its initial state to any other state on the Bloch sphere. The transformation must conserve probability; hence, it must be described by unitary operators, which can be thought of as rotations or combinations of rotations on the Bloch sphere. Transformation is easy in the case of a photonic qubit. 3. Measurement: The theory of quantum measurement can be described by two postulates: • Postulate 1: The wavefunction of a quantum particle is represented by a vector in a normalized Hilbert space which is spanned by an orthonormal basis |0, |1, . . . , |n − 1, where n is the dimensionality of the Hilbert space. Every measurement is represented by a projection onto a complete orthonormal basis

Comparative Study of Noises Over Quantum …

765

Fig. 3 Schematic diagram of steps in a typical QKD protocol explaining the flow from raw key to secure key [14]

which spans the Hilbert space. The basis are defined as |P0 , |P1 , . . . , |Pn−1 . The probability of measuring the qubit in the state |Pi  is simply given by |Pi |ψ|2 where |ψ is the wavefunction of the qubit. • Postulate 2: Define the wavefunction of a quantum system before measurement as |ψ. The measurement basis are defined as |P0 , . . . , |Pn−1  Given that the the system was measured in the state |Pi , the wavefunction of the system after the measurement is also |Pi .

2.1 Quantum Cryptography—QKD The QKD technique as discussed in the previous sections is not sufficient to use in the real world. The key obtained after decoding by the receiver contains a lot of error that need to be mitigated. Thus to achieve the goal of security and filter the key from errors, two more steps are needed in addition to encoding and decoding the quantum information (refer to Fig. 3). The general steps involved in a QKD Protocol are

766

S. Bhattacharyya et al.

• Quantum transmission: In the quantum transmission step, the sender and receiver share a random string of bits transmitted over a quantum channel. Most QKD protocols belong to one of two categories, single qubit protocols and entangled qubit protocols. Single qubit protocols make use of the measurement uncertainty properties to ensure secrecy [15]. Important examples of single-qubit protocols are the BB84, B92, Koashi01, and six-state protocols. Entangled qubit protocols use nonlocal correlations to achieve security. They rely on the fact that if any local variable exists which can predict the state of an entangled qubit pair, then nonlocal correlations are not observed. Important examples of entangled qubit protocols are the Ekert91 and BBM92 protocols. The outcome of the first step is an ensemble of bits called the raw key. The raw key generation rate Rraw is simply equal to the product of the repetition rate of the transmission and the probability of a photodetection event registered by the detectors in the measurement setup. • Sifting: The sifting phase correspondence to the measurement stage where the sender and receiver discard those bits where their measurement basis does not match. The two legitimate users use a public channel to communicate information regarding the basis they choose for measurement. The process of discarding the bits in the cases where they used different bases is called sifting. The ensemble of bits remains after the basis reconciliation from the shifted key. If there are no errors in the cryptographic system, then a potential eavesdropper cannot yield any information regarding the transmission of a message without being detected. In this case there the sifted key is unconditionally secure. However in any practical communication system error naturally occurs due to a defect in the device (circuit noise) or the line of transmission (channel noise). Thus, in practical systems, the statement that any eavesdropping will unavoidably cause errors and reveal that eavesdropping is there is not sufficient security proof. There is always a baseline system error rate so we must take into account that some information about the transmission had been leaked. A practical QKD system handles system error and eavesdropping by following two additional steps, i.e., error correction and privacy amplification. These additional steps can be performed in a public channel. • Error correction: The error correction step serves the dual purpose of correcting all erroneously received bits and giving an estimate of the error rate. The sender reveals some additional information to the receiver about his or her key that will allow the receiver to find and correct all of the error bits. One possible technique is that the two parties can group their bits in the segment and check the parity and optimize the segment size as the error process continues. Since this process occurs in the public channel, it would naturally leak some additional information to the attacker. Thus, this information leaked needs to be as small as possible. The minimum number of bits in forming the segment needed for the error correction is given by the result from the classical information theory, Shannon’s noiseless coding theorem. The theorem asset that lim

x→∞

κ = −e log2 e − (1 − e) log2 (1 − e) ≡ h(e) n

(5)

Comparative Study of Noises Over Quantum …

767

where n is the length of the shifted key and κ is the size of the disclosing segment. There are two classes of error-correcting schemes, unidirectional and bidirectional. In the prior case information, only flow from the sender to the receiver, and then sender provides the receiver with the additional string needed for error correction. It’s difficult to design an algorithm that is both computationally efficient and works near the Shannon limit. In the bidirectional scheme, information flows from both ends [13]. The receiver sends feedback to the sender and the sender works upon the feedback to detect what additional information is needed for error correction. These two error correction algorithms classes can be further subdivided into algorithms that discard errors and algorithms that correct them. Discarding errors is usually done to prevent additional side information from leaking to the attacker [11]. By correcting the errors, we allow for this additional flow of side information, which can be accounted for during privacy amplification. • Privacy amplification: Privacy amplification corresponds to the compression needed to account for the information leaked during transmission and error correction. The amount of compression depends on the amount of information leaked in the previous phases. For security proof to be useful, it must be bound to the amount of information leaked and relates it to how much compression is needed in privacy amplification. There are three categories of generalized attacks that have been considered: individual, collective, and coherent attacks. The ability to perform collective and coherent attacks is well beyond today’s technological advancement. In this paper, we will focus on individual attacks because they can be realized with the current scenario with some assumptions. The role of the privacy application step is to deduce the shrinking factor τ by which the corrected key needs to be compressed to account for the information leaked during the transmission and error correction steps. This calculation is performed using the methods of the generalized privacy amplification theory, which makes the worst-case assumption that all errors are potentially caused by eavesdropping. The shrinking factor τ is given by τ =−

log2 pc n

(6)

where pc is the average collision probability. The theory sets the length of the final key as r = nτ − κ − t (7) The average collision probability is the measure of the attacker’s mutual information with the two legitimate users. This factor is a function of the error rate and parameter of the specific cryptographic system [5]. The secure key generation rate is a much more important and useful quantity to study the effect of noise over QKD protocols. It can also be defined normalized communication rate. If N is the length of the transmission, then n = N Rshifted = N s Rraw , and secure key generation rate is given by

768

S. Bhattacharyya et al.

Table 1 Benchmark performance of a bidirectional error correction algorithm e f(e) 0.01 0.05 0.1 0.15

1.06 1.16 1.22 1.35

R = lim

N →∞

 r = lim Rshifted τ − x→∞ N

κ n



t n



(8)

in the limit of long string nt = 0 and κn is the fraction of additional information disclosed during error correction. None of the practical algorithm work at Shannon’s limit. Thus, lim

n→∞

κ = − f (e)[e log2 e − (1 − e) log2 (1 − e)] = f (e)h(e) n

(9)

where f (e) is defined as the ratio of the performance of the algorithm to Shannon’s limit. In all the calculations in the paper, it had been assumed that the algorithm is bidirectional. The algorithm work within 35% of Shannon’s limit. Values of f (e) for several different error rates, produced by benchmark tests, are represented in (refer Table 1). The final expression for the secure key generation rate is R = Rshifted {τ + f (e)[e log2 e − (1 − e) log2 (1 − e)]}

(10)

where Rshifted and τ depends on the QKD protocols and system parameter. Here, f (e) ≥ 1, the function f (e) can be determined by benchmark testing the algorithm under a broad range of strings. The quantum key distribution can be of two types depending on the technique applied. Now for the BB84 protocol, the secure key rate generation can easily be calculated. For BB84, the collision probability for each bit pc0 is given by pc0 ≤

1 + 2e − 2e2 2

(11)

Thus the average collision probability for the n bit string is calculated as pc = pcn0 and the shrinking factor is given by, log pc = − log2 pc0 = − log2 τ =− 2 n



1 + 2e − 2e2 2

 (12)

Comparative Study of Noises Over Quantum …

769

The final expression for the secure key generation rate is R = Rshifted {− log2

1 + 2e − 2e2 + f (e)[e log2 e − (1 − e) log2 (1 − e)]}. (13) 2

2.2 Noise Models Noise is the central obstacle in developing long-distance quantum communication. The noise may arise either due to infidelities in the hardware (i.e., gates, measurement device) or due to unwanted interaction with the environment (i.e., thermal, electromagnetic, and gravitational decoherence) quantum communication got highly affected due to the noise in the channel. Three sources of errors concern us: (1) hardware infidelities in the form of depolarizing pauli noise, (2) state preparation and measurement (SPAM) error, and (3) decoherence in the form of thermal relaxation and dephasing. 1. Depolarizing channel: The term symmetric depolarizing channel is often interchangeably used with gate infidelities or depolarizing channels. It essentially stimulates the bit-flips and phase flip error due to gate infidelities within the hardware as a depolarizing channel. The bit flip and phase flip error is often represented through the Pauli X and Pauli Z operation [8, 13]. The combined effect of bit and phase flip is represented through Pauli Y. The depolarizing channel can be represented by the following operator. K D0 =



1 − p1 I,  p1 X, K D1 = 3  p1 K D2 = Z, 3  p1 K D3 = Y 3

(14)

The effect of the depolarizing channel on a quantum system can be expressed as ρ → D(ρ) =

3 

K Di ρ K D† i

(15)

i=0

where ρ is the density matrix of the qubit. 2. State preparation and measurement (SPAM) channel: This channel is essentially a simple Pauli X error, but it affects the measurement aspect of the computation. Thus, one can represent the SPAM quantum channel for the measurement error by following kraus operator

770

S. Bhattacharyya et al.

 1 − p2 I, √ K M 1 = p2 X

K M0 =

(16)

where p2 is the probability of incorrect measurement. The effect of the SPAM channel for measurement error is given by ρ → S(ρ) = K M0 ρ K M0 + K M1 ρ K M1

(17)

In the case of the state preparation, the error channel is of a similar form as that of the measurement case (i.e., ρ → S (ρ)), with the qubit fail to prepare in the desired state, resulting in the inverted state by X with the probability p2 . 3. Thermal decoherence and dephasing channel: There are two aspects of noise within this error group: (i) the thermal decoherence (or relaxation) that occurs over time in the form of excitation/de-excitation and (ii) the dephasing of the qubits over time. Thermal decoherence is defined as the loss of quantum coherence due to a quantum system’s physical interaction with its environment. It can affect the qubits in a variety of ways [2, 10, 16]. One among which is that the qubits were at state |0 ground state and ends at |1 excited state. It’s a form of non-unital (i.e., irreversible) that describes the thermalization of the qubit spins toward equilibrium at the temperature of their environment. The time required to relax (moving toward either of the equilibrium state |0 or |1), coincidentally called the energy relaxation time, is denoted as T1 [17]. The another aspect of noise that came under this third group is dephasing, which explains how coherence behavior decays over time. It is a mechanism that describes the transition of a quantum system toward classical behavior. That’s the phase information spread out widely so the phase information is lost. The time of dephasing is denoted as T2 . These two times are related as T2 ≤ 2T1 . The probability for a qubit to relax (thermal relaxation) ( pT1 ) and probability for a qubit to dephase ( pT1 ) is given by pT1 = e

T

− Tg

pT2 = e

1

,

T − Tg 2

(18)

Theoretically, thermal relaxation causes a shift of the qubit to either the equilibrium state of |0 or |1 when the temperature of the quantum processor is not 0, but practically excitation is an extremely rare event due to the extremely low temperature of the quantum processor and other associated components and high frequency of the qubits of order 109 Hz. Thus, we can effectively assume that the reset the error takes the form of only reset to the ground state [5, 13]. Thus, we can now refer to thermal relaxation simply as relaxation or spontaneous emission. If T2 ≤ 2T1 is held for every qubit in the system, then relaxation and dephasing noise can be expressed as a mixed reset and unital quantum channel. If the temperature of the device is 0, we can have the following form of noise:

Comparative Study of Noises Over Quantum …

771

• Dephasing: A phase flip occur with probability pz = (1 − preset )(1 − pT2 pT1 −1 )/2 where preset = 1 − pT1 . • Identity: Nothing happens to the qubit or the identity I occurs with probability p I = 1 − pz − preset . • Reset to ground state: Thermal decay or jump toward ground state with probability preset = 1 − pT1 . The above cases can be represented with the following operators: √ K I = p I I, √ K Z = pZ Z , (19) √ K reset = preset |00| Thus, the effect of relaxation channel when T2 ≤ 2T1 is given by ρ → N (ρ) =



K k ρ K k†

(20)

k∈I,Z ,reset

2.3 Motivation Quantum communication system suffers from a major drawback due to the presence of noise in the channel. Large-scale reliable quantum communication system can never be possible if the central hurdles-Noise had not been eliminated. The main candidates in the quantum communication system—QKD protocols can’t be applied to the real world to be a part of the communication system if noise and error in the channel are not accounted for. The powerful feature of the QKD protocols can only be realized in a real sense of the noise and error in the channel being taken care of. It motivated us to conduct the research on the implementation of a popular QKD protocol in the popular three noise channels, viz. depolarizing channel, state preparation and measurement (SPAM) channel, and thermal decoherence and dephasing channel to calculate the secure key generation rate in presence of noise.

2.3.1

Comparative Study

This work focuses on the three noise channels, viz. depolarizing channel, state preparation and measurement (SPAM) channel, and thermal decoherence and dephasing channel to study the comparative effectiveness of these noisy channels and calculates the secure key generation rate for BB84 protocol for the 5 and 10% noise.

772

S. Bhattacharyya et al.

3 Methodology The study of the BB84 QKD protocol in a noisy environment had been carried out in two different ways. In the first case, the protocol had been executed with four qubits to study the effect of each type of noisy channel over every single qubit by calculating the probability of getting each bit. The probability distribution of generated key had been depicted through the “plot_histogram” method under the “qiskit.visualization” package. In the second case, the protocol had been carried out with 2016 qubits using our algorithm which had been developed using qiskit standard packages to compute the secret key rate [14]. The protocol had been executed in three noisy channels, viz. depolarizing channel, state preparation and measurement (SPAM) error channel, and finally the decoherence and dephasing channel to examine the effect of each noisy channel over the protocol.

3.1 Proposed Approach The proposed approach is very simple yet effective in computing the secret key rate. Firstly, the noise model had been created using Qiskit standard packages. Then, it is followed by encoding the message in the desired basis which is chosen at random using the python standard random number generating package. It is followed by adding the noise model in the protocol with desired noise percentage. This dissertation it had been carried out with 5 and 10% noise separately.The next step had been carried out from two different perspectives; in the first case, the message had been measured directly representing the scenario without eavesdropping, and in the second case, interception had been presented by adding an extra measurement before the receiver. It is then followed by key shifting and finally appending the generated key to a list. The same process repeats in a loop 63 times, where each loop executes the protocol for 32 qubits, and thus, the protocol executes for a total of 2016 qubits. Each loop appends the shifted key and the process continues 63 times. The proposed approach had been presented in refer (Fig. 4).

3.2 Simulation in Noisy Environment The noise had been deployed with Qiskit Aer Noise Module. The “NoiseModel” class had been used to store a noise model used for noisy simulation. The “QuantumError” class which describes CPTP gate errors had been used to generate the noisy channel. The protocol had been tested with 5 and 10% noise. “depolarizing_error()” function which comes under Qiskit Aer Noise Module’s class “QuantumError” had been used

Comparative Study of Noises Over Quantum …

773

Fig. 4 Schematic diagram of workflow our the proposed idea. The left part indicates the eavesdropping and the right part is the ideal case. The process appends the shifted key after each cycle and the process continues

to create a depolarising channel. To create a noisy SPAM channel, “pauli_error()” function which comes under Qiskit Aer Noise module’s class “QuantumError” had been used. Now finally for the thermal decoherence and dephasing channel, “thermal_relaxation_error()” function which comes under Qiskit Aer Noise Module’s class “QuantumError” had been used. Likewise in a noiseless environment, the experiment had been done in two different ways, the first time the protocol had been run 4 qubits had been built to see the effect of eavesdropping by plotting the data in the histogram, and the second time the protocol had been tested with 32 qubits and repeated 63 times, each time appending the result; getting the length of the shifted key near to 900. To visualize the result from the first part of the experiment with 4 qubits, Qiskit’s “plot_histogram” package had been used. Each of the experiments had been repeated with a shot of 1024. The sample size had been kept at 25% of the shifted key.

774

S. Bhattacharyya et al.

4 Result The result of the execution of the BB84 QKD protocol in both a noiseless and noisy environment had been presented in the following section. In both cases, it had been executed with four qubits and 2016 qubits separately. The prior had been done to examine the effect of noise over each qubit, and the latter had been done to compute the secure key rate in presence of noise or eavesdropping or both. For the purpose of simulation, Z basis had encoded to 1 and X basis to 0.

4.1 Depolarizing Channel 4.1.1

Ideal

Sender’ Bits : [1 0 1 1], Sender’s Basis : [0 1 1 0], Receiver’s Basis : [1 0 1 0] The result is presented in Table 2, and the probability distribution is shown in Fig. 5 with 5% noise.

Fig. 5 Probability distribution for the experimental result of BB84 in depolarizing channel without eavesdropping Table 2 Experimental result of the BB84 in depolarizing environment with 5% noise without the presence of eve q[0] q[1] q[2] q[4] 0(49.8%), 1(50.2%)

0(51.3%), 1(49.1%)

0(1.8%), 1(98.2%)

0(0%), 1(100%)

Comparative Study of Noises Over Quantum …

775

Fig. 6 Probability distribution for the experimental result of BB84 in depolarizing with eavesdropping Table 3 Experimental result of the BB84 in depolarizing environment at 5% noise with the presence of eve q[0] q[1] q[2] q[4] 0(2.3%), 1(97.7%)

4.1.2

0(50.4%), 1(49.6%)

0(48%), 1(52%)

0(51.1%), 1(48.9%)

Eavesdropping

Sender’ Bits : [1 0 1 0], Sender’s Basis : [1 1 1 0], Attacker’s Basis : [1 0 0 1], Receiver’s Basis : [1 1 0 0] The result is presented in Table 3, and the probability distribution is shown in Fig. 6.

4.2 State Preparation and Measurement (SPAM) Channel 4.2.1

Ideal

Sender’ Bits : [0 0 1 1], Sender’s Basis : [0 1 1 1], Receiver’s Basis : [0 1 0 1] The result is presented in Table 4, and the probability distribution is shown in Fig. 7 with 5% noise.

776

S. Bhattacharyya et al.

Fig. 7 Probability distribution for the experimental result of BB84 in SPAM channel without eavesdropping Table 4 Experimental result of the BB84 in SPAM Error environment with 5% noise without the presence of eve q[0] q[1] q[2] q[4] 0(96.6%), 1(3.4%)

0(94.7%), 1(5.3%)

0(48.9%), 1(51.1%)

0(3.3%), 1(96.7%)

Table 5 Experimental result of the BB84 in SPAM Error environment with 5% noise with the presence of eve q[0] q[1] q[2] q[4] 0(50.4%), 1(49.6%)

4.2.2

0(90.2%), 1(9.8%)

0(50.9%), 1(49.1%)

0(49.8%), 1(50.2%)

Eavesdropping

Sender’ Bits : [1 0 1 0], Sender’s Basis : [0 0 0 1], Attacker’s Basis : [1 0 1 1], Receiver’s Basis : [0 1 0 0] The result is presented in Table 5, and the probability distribution is shown in Fig. 8.

Comparative Study of Noises Over Quantum …

777

Fig. 8 Probability distribution for the experimental result of BB84 in SPAM channel with eavesdropping Table 6 Experimental result of the BB84 in thermal decoherence and dephasing without the presence of eve q[0] q[1] q[2] q[4] 0(75.7%), 1(24.3%)

0(76.5%), 1(23.5%)

0(77.3%), 1(22.7%)

0(48.9%), 1(51.1%)

4.3 Thermal Decoherence and Dephasing Channel 4.3.1

Ideal

Sender’ Bits : [0 0 0 1], Sender’s Basis : [0 1 1 1], Receiver’s Basis : [1 0 0 0] The result is presented in Table 6, and the probability distribution is shown in Fig. 9 with T1 = 0.0125 and T2 = 0.0025.

4.3.2

Eavesdropping

Sender’ Bits : [1 0 1 1], Sender’s Basis : [1 1 0 1], Attacker’s Basis : [1 1 1 1], Receiver’s Basis : [1 0 1 0] The result is presented in Table 7, and the probability distribution is shown in Fig. 10. The protocol had been executed with 2016 qubits in the above three mentioned channels with 5% and 10% noise to calculate the secure key rate generation using 13. The result is summarized into the following two tables (refer Tables 8 and 9).

778

S. Bhattacharyya et al.

Fig. 9 Probability distribution for the experimental result of BB84 in thermal decoherence and dephasing without eavesdropping at T1 = 0.0125 and T2 = 0.0025

Fig. 10 Probability distribution for the experimental result of BB84 in thermal decoherence and dephasing with eavesdropping Table 7 Experimental result of the BB84 in depolarizing environment with the presence of eve at T1 = 0.0125 and T2 = 0.0025 q[0] q[1] q[2] q[4] 0(57.1%), 1(42.9%)

0(77.6%), 1(22.4%)

0(50.6%), 1(49.4%)

0(76.7%), 1(23.3%)

Comparative Study of Noises Over Quantum …

779

Table 8 Error rates and secure key generation rate (if error ≤ 11) for 10% noise and T1 = 0.0125 and T2 = 0.0025 Noise Ideal Eavesdropping Depolarizing noise SPAM Decoherence and dephasing

e = 0.064, Secret key rate = 289.225 e = 0.036, Secret key rate = 553.180 e = 0.474; Discarded

e = 0.392, Discarded e = 0.317, Discarded e = 0.380, Discarded

Table 9 Error rates and secure key generation rate (if error ≤ 11) for 5% noise and T1 = 0.0125 and T2 = 0.0025 Noise Ideal Eavesdropping Depolarizing noise SPAM Decoherence and dephasing

e = 0.075, Secret key rate = 179.594 e = 0.033, Secret key rate = 593.349 e = 0.474 Discarded

e = 0.3252, Discarded e = 0.421, Discarded e = 0.380, Discarded

Fig. 11 Graphical representation of trend of sifted key rate over error rate for depolarizing channel

The graphical variation of the secure key rate for depolarizing and state preparation and measurement error channel is shown in Figs. 11 and 12. The error rate for the third thermal decoherence and dephasing channel is beyond the expected limit, and hence, there is no way to create the key.

780

S. Bhattacharyya et al.

Fig. 12 Graphical representation of trend of sifted key rate over error rate for state preparation and measurement error channel

5 Discussion The result for BB84 stands well on the theoretical prediction except for some errors incorporated due to noise.

5.1 Execution with Four Qubits 5.1.1

Depolarizing Channel

In the ideal, i.e., without eve dropping, the basis for measurement of third qubit match but due to the error of 5% the probability of getting 1 had fallen to 98.2%. The reduced probability is negligible; thus, it may be concluded that depolarization had not affected our system. In the eavesdropping case, except for the third qubit, all the qubit had been measured on the same basis but due to the presence of eve probability split into half for the second and fourth qubit; thus, the attacker had been noticed. Noise does not help the attacker in any way.

5.1.2

State Preparation and Measurement (SPAM) Channel

In the ideal, i.e., without eve dropping, the basis for measurement of first, second, and fourth qubit match, but due to the error of 5%, the probability of getting the correct message had fallen to 96.6%,94.7%, and 96.7%, respectively. The reduced probability is negligible; thus, it may be concluded that depolarization had not affected our system.

Comparative Study of Noises Over Quantum …

781

In the eavesdropping case, the basis for measurement of the second and fourth qubits does not match, but for the second qubit, the probability of getting 0 is higher than 1. Thus, eve measured the second qubit and got 0. Since the basis is not the same sender and receiver would drop that position. Noise does not help the attacker in any way.

5.1.3

Thermal Decoherence and Dephasing Channel

In an ideal case basis does not match yet for the first, second, and third qubit receiver got a high probability for the correct message; thus, it may be concluded that decoherence and dephasing increase the probability of getting the correct result despite the wrong basis. On the other hand in the case of eavesdropping, decoherence and dephasing are helping the users to get a trace of the attacker by splitting the probability for the first qubit where every three including the attacker choose the same basis.

5.2 Execution with 2016 Qubits With the increase in the noise for a depolarizing channel, the secure key rate generation got increased. For an increase in noise of 5%, the secure key rate got increased by 109.631 parts while for the SPAM error channel the key rate generation got decreased by 40.169 parts. The eavesdropping got easily detected by an increase in error rate, and decoherence and dephasing had a devastating effect on generating a long key. The error rate for T1 = 0.0125 and T2 = 0.0025 is near about 50%.

6 Conclusion The main challenge that had been faced include the unavailability of reliable articles that describe the mathematics of the noise model. The unavailability of the IBMQ server and a limited number of qubits on the risk remains a big obstacle. The simulators and IBMQ platform does not well mimic the real-life scenario of the quantum communication system, and thus, the accuracy may not tally exactly with the real-life cases. Yet the results may give some insight. In the future, the work may be extended over a much wide range covering a much wider range of protocols and noise models. In the future, we may expect to have access to real-time quantum processor with a large number of qubits to study the effect on a much more practical ground with a higher rate of accuracy.

782

S. Bhattacharyya et al.

References 1. Bhattacharyya S, Chakrabarti A (2022) Post-quantum cryptography. In: Sharma N, Chakrabarti A, Balas VE, Bruckstein AM (eds) Data management, analytics and innovation. Springer Singapore, Singapore, pp 375–405 2. Georgopoulos K, Emary C, Zuliani P (2021) Modeling and simulating the noisy behavior of near-term quantum computers. PRA 104(6):062432 December 3. Crépeau C (1994) Quantum oblivious transfer. J Mod Opt 41(12):2445–2454 4. Lin T-S, Tsai I-M, Wang H-W, Kuo S-Y (2006) Quantum authentication and secure communication protocols. In: 2006 Sixth IEEE conference on nanotechnology, vol 2, pp 863–866 5. Zhou T, Shen J, Li X, Wang C, Shen J (2018) Quantum cryptography for the future internet and the security analysis. Secur Commun Netw 1–7(02):2018 6. Gisin N, Thew R (2007) Quantum communication. Nat Photonics 1(3):165–171 Mar 7. Chen J (2021) Review on quantum communication and quantum computation. J Phys Conf Ser 1865(2):022008 Apr 8. Nadlinger DP, Drmota P, Nichol BC, Araneda G, Main D, Srinivas R, Lucas DM, Ballance CJ, Ivanov K, Tan EY-Z, Sekatski P, Urbanke RL, Renner R, Sangouard N, Bancal J-D (2022) Experimental quantum key distribution certified by bell’s theorem. Nature 607(7920):682–686 Jul 9. Fung CH, Tamaki K, Lo HK (2006) Performance of two quantum-key-distribution protocols. Phys Rev A 73(1):012337 10. Singh A, Dev K, Siljak H, Joshi HD, Magarini M (2021) Quantum internet-applications, functionalities, enabling technologies, challenges, and research directions 11. Shaib A, Naim MH, Fouda ME, Kanj R, Kurdahi F (2021) Efficient noise mitigation technique for quantum computing 12. Curty M, Santos DJ (2001) Quantum authentication of classical messages. Phys Rev A 64:062309 Nov 13. Wolf R (2021) Quantum key distribution, an introduction with exercises 14. Diamanti E (2006) Security and implementation of differential phase shift quantum key distribution systems. Ph.D. thesis, Stanford University, California 15. Hong KW, Foong OM, Low TJ (2016) Challenges in quantum key distribution: a review. In: Proceedings of the 4th international conference on information and network security, ICINS ’16, New York, NY, USA. Association for Computing Machinery, pp 29–33 16. Gupta S, Sau K, Pramanick J, Pyne S, Ahamed R, Biswas R (2017) Quantum computation of perfect time-eavesdropping in position-based quantum cryptography: quantum computing and eavesdropping over perfect key distribution. In: 2017 8th Annual industrial automation and electromechanical engineering conference (IEMECON), pp 162–167 17. Yuen HP (2016) Security of quantum key distribution. IEEE Access 4:724–749

Comparative Assessment of Methane Leak Detection Using Hyperspectral Data

Karan Owalekar, Ujas Italia, Vijeta, and Shailesh Deshpande

Abstract Methane is the major source of ground-level ozone, a dangerous air pollutant and greenhouse gas that could kill one million people each year (McArthur in Methane emissions are driving climate change. Here's how to reduce them. United Nations Environment Programme [1]). It is 80 times more potent than carbon dioxide at warming over a 20-year period. There is a critical demand for timely methane leak detection in remote places throughout the energy sector. Algorithms such as matched filters are commonly used to detect affected areas; these algorithms take the spectral signature of methane and try to match it with the signature of each pixel in a hyperspectral image. Considering the complexity of the problem and the performance variation among these algorithms, it is essential to compare them in order to find a quick, effective, and reliable method for detecting methane leaks. In this paper, we compared three algorithms for identifying methane-contaminated areas: the RX algorithm, the matched filter, and the adaptive cosine/coherence estimator (ACE). We first tested these algorithms on synthetic data, and then we used them to detect methane-contaminated areas of the Santa Susana Mountains near Porter Ranch, Los Angeles, California. This methane leak disaster is known as the Aliso Canyon gas leak (also called the Porter Ranch gas leak) (Wikipedia contributors in Aliso Canyon gas leak [2]). Hyperspectral data for this region was acquired by NASA using the airborne visible/infrared imaging spectrometer (AVIRIS) (NASA-OMI in https://www.nasa.gov/mission_pages/aura/spacecraft/omi.html [3]).

Ujas Italia: He worked on this project while working at TCS. K. Owalekar (B) · U. Italia Tata Consultancy Services Mumbai, SDF 5, SEEPZ, Andheri East, Mumbai 400096, India e-mail: [email protected] Vijeta Tata Consultancy Services Ltd., TCSL-SEZ Unit, (IGGGL-SEZ), Chennai 600096, India e-mail: [email protected] S. Deshpande Tata Research Development and Design Centre, Tata Consultancy Services, Hadapsar, Pune 411013, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_55


Keywords ACE · Hyperspectral imagery · Matched filter · Methane detection · Plume detection · RX algorithm

1 Introduction

Methane (CH4) is the primary component of natural gas and a significant greenhouse gas (GHG). Greenhouse gases act as an insulator in the atmosphere, absorbing energy and slowing the rate at which heat escapes from the planet; methane absorbs this energy exceptionally well. Methane also contributes to poor air quality, animal health problems, premature human deaths, and decreased crop yields. As a result, a methane leak adversely affects the entire environment.

For instance, a significant natural gas leak occurred in the Santa Susana Mountains close to Porter Ranch in Los Angeles, California, in October 2015 [1]. Within the Aliso Canyon underground storage facility, gas was leaking from a well. Methane emissions were calculated at 97,100 tons (95,600 long tons; 107,000 short tons). The leak initially released approximately 44,000 kg of methane per hour, or about 1200 tons per day, which, in terms of greenhouse gas output per month, compares to the annual emissions of 200,000 cars. The leak originated in the breached 7-inch (180 mm) metal casing of the injection well "Standard Sesnon 25" (SS 25), which is 8750 feet (2670 m) deep. The well's safety valves had not been replaced because the well was not deemed "essential", that is, it was not within 300 feet of a house or within 100 feet of a road or park. Multiple safety valves were removed from the Aliso Canyon/Porter Ranch 1950s-era pipes in 1979 and never replaced, according to a special investigation into the leak conducted by Congressman Brad Sherman's office. The leaking gas also contained tert-butyl mercaptan, tetrahydrothiophene, and methyl mercaptan, which gave it a rotten-egg odor. Headaches, nausea, skin rashes, and severe nosebleeds were reported by locals; school nurses treated about 50 children per day for serious nosebleeds, and cases of throat, ear, and eye infections were more frequent than usual. Experts and independent academics have described the gas leak as the largest in US history.

To prevent such catastrophes, leaks must be found before they propagate through the atmosphere. Although there are numerous methods for detecting such anomalies (leaks), matched filters are commonly used: they correlate a known target signature with the signature of each pixel in a hyperspectral image to detect contamination. The RX algorithm, on the other hand, uses spectral (color) differences to detect anomalies, extracting targets that are spectrally distinct from the image's background. ACE, furthermore, measures this disparity in a coordinate space where the statistics of the background class distribution whiten the data.


2 Literature Review

Plume detection has been a long-standing area of study. Researchers use various methods to detect the presence or leakage of specific gases; one common approach is to treat plumes as anomalies. The RX algorithm and its variants are among the most commonly used and studied [2–5]. The RX algorithm tries to find spectral (color) differences between a target and its surroundings. Reference [2] reduced the number of bands for the RX detector using the fast Fourier transform, while [3] used a specific nonlinear mapping function to extend the RX technique to a feature space related to the original input space. Reference [5] examined how contamination affects the performance of the RX algorithm. Other researchers [6–9] have employed matched filters because they are simple and effective in practical use; [7] employed matched filters for detection and tried to address the issue of false positives they produce. Other techniques use adaptive covariance estimation [10–15], and these works discuss how ACE performs better than other algorithms for anomaly detection. In [16], the authors discuss the implementation of plume detection for a CubeSat architecture. They presented two ideas for data reduction and improved onboard computation: reducing the O(Nd²) complexity of the Mahalanobis distance calculation by lowering the O(d²) term, and using the sparse matrix transform (SMT) to speed up the computation. In comparison with the standard approach, which takes five hours, they were able to process a pair of images in half an hour.

3 Methodology

This study compares three well-known anomaly detection algorithms: the matched filter, which matches a known signature against unknown signatures; the RX algorithm, which identifies spectral (color) differences from the background; and ACE, a simple extension of the MF in which the MF value is computed and then normalized by the length of x, so that the statistic is simply the cosine of the angle formed by x and s. Sections 3.1 and 3.2 below describe the datasets and their preparation, while Sects. 3.3, 3.4, and 3.5 describe the implementation methodology for the matched filter, RX algorithm, and ACE, respectively.

3.1 Synthetic Data

We used three publicly available hyperspectral datasets: Indian Pines, Pavia University, and Pavia Center. These datasets are Earth-observation images acquired from airborne or satellite platforms. We contaminated these images with the hyperspectral signature of a foreign material and used the above algorithms to detect the resulting anomalies (Table 1).
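As a rough illustration of this contamination step (a NumPy sketch, not the exact procedure used here; the array and variable names are assumed for illustration), a foreign spectral signature can be injected into randomly chosen pixels of a hyperspectral cube as follows:

```python
import numpy as np

def contaminate(cube, target_signature, n_pixels=8, seed=0):
    """Insert a foreign spectral signature at random pixel locations.

    cube: (rows, cols, bands) hyperspectral image
    target_signature: (bands,) spectrum of the contaminating material
    Returns the contaminated cube and the ground-truth locations.
    """
    rng = np.random.default_rng(seed)
    rows, cols, bands = cube.shape
    assert target_signature.shape == (bands,)
    contaminated = cube.copy()
    # Pick n_pixels distinct (row, col) positions at random.
    flat_idx = rng.choice(rows * cols, size=n_pixels, replace=False)
    locations = [(i // cols, i % cols) for i in flat_idx]
    for r, c in locations:
        contaminated[r, c, :] = target_signature
    return contaminated, locations
```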


Table 1 Dataset properties
Indian Pines: 145 rows, 145 samples, 200 bands, spectral range 400–2500 nm
Pavia University: 610 rows, 340 samples, 103 bands, spectral range 430–860 nm
Pavia Center: 1096 rows, 715 samples, 102 bands, spectral range 430–860 nm
Cuprite: 512 rows, 614 samples, 188 bands, spectral range 400–2500 nm

3.2 AVIRIS Data for Methane Leak in Canyon

The main dataset we used covers a methane-contaminated area in the Santa Susana Mountains near Porter Ranch, Los Angeles, California. This data was collected by NASA using the airborne visible/infrared imaging spectrometer (AVIRIS). It is hyperspectral data consisting of 224 bands ranging from 380 to 2510 nm, with 5366 rows and 1840 columns.

3.3 Matched Filter

To determine the presence of a material in the hyperspectral image, we use the target material's known signature (or wavelet) and compare it to the signatures present in the image, which are embedded in noise. The detector computes the cross-correlation between the two signals:

AMF: $D(x) = \frac{s^{T} R^{-1}(x - \mu)}{\sqrt{s^{T} R^{-1} s}}$    (1)

We choose the signature of the target material (methane in this case) and then compare it to each image signature one by one; this detects the presence of the target signature in each pixel.
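A minimal NumPy sketch of the matched-filter scoring in Eq. (1) is given below; it assumes the image has already been reshaped to a pixel-by-band matrix and is illustrative rather than the exact code used in our experiments.

```python
import numpy as np

def matched_filter_scores(pixels, target):
    """Adaptive matched filter scores following Eq. (1).

    pixels: (N, bands) array of image spectra, one row per pixel
    target: (bands,) known target signature, e.g. methane
    Returns an (N,) array; larger values indicate a closer match.
    """
    mu = pixels.mean(axis=0)
    R_inv = np.linalg.pinv(np.cov(pixels, rowvar=False))  # inverse background covariance
    s = target
    centred = pixels - mu                                  # (N, bands)
    numerator = centred @ R_inv @ s                        # s^T R^{-1} (x - mu) for every pixel
    denominator = np.sqrt(s @ R_inv @ s)
    return numerator / denominator

# Usage: scores = matched_filter_scores(cube.reshape(-1, n_bands), methane_signature)
# A threshold applied to the scores flags the detections.
```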

3.4 RX Algorithm

In RX anomaly detection, the Reed-Xiaoli detector (RXD) algorithm is used to detect spectral or color differences between a region to be tested and its neighboring pixels or the entire dataset. It is regarded as a benchmark anomaly detector for multi/hyperspectral images. The algorithm is based on the generalized likelihood ratio test (GLRT):

$\delta_{\mathrm{rxd}}(r) = (r - \mu)^{T} C_{L \times L}^{-1} (r - \mu)$    (2)

The sample mean vector is represented by $\mu$ in Eq. (2), and $C_{L \times L}$ is the covariance matrix of the sample data. Finally, $\delta_{\mathrm{rxd}}(r)$ is the well-known Mahalanobis distance,


which indicates the amount of abnormality in the pixel under test (PUT). The AD process yields a two-dimensional detection matrix. A threshold should be applied to the detection matrix to determine the precise location of targets (anomalies). Bad pixels or lines appear abnormal, but they have no effect on the detection of other, legitimate anomalies. Excluding bad bands improves the accuracy of results, as it does with any spectral algorithm.
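A minimal NumPy sketch of the global RX detector in Eq. (2); the percentile threshold shown is an illustrative assumption, not the value used in the experiments.

```python
import numpy as np

def rx_scores(pixels):
    """Global RX anomaly detector following Eq. (2).

    pixels: (N, bands) array of image spectra
    Returns an (N,) array of Mahalanobis distances; pixels far from the
    background distribution score high and become anomalies once a
    threshold is applied to the detection matrix.
    """
    mu = pixels.mean(axis=0)
    C_inv = np.linalg.pinv(np.cov(pixels, rowvar=False))
    centred = pixels - mu
    # (x - mu)^T C^{-1} (x - mu) evaluated row-wise
    return np.einsum('ij,jk,ik->i', centred, C_inv, centred)

# Usage:
# delta = rx_scores(cube.reshape(-1, n_bands))
# anomalies = delta > np.percentile(delta, 99.9)   # example threshold only
```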

3.5 Adaptive Cosine/Coherence Estimator (ACE)

ACE calculates a sample's angular or cosine similarity to a known class example, which is equivalent to comparing the spectral shape of the feature vectors. ACE, however, measures this disparity in a coordinate space where the statistics of the background class distribution whiten the data. The adaptive cosine/coherence estimator, which can be expressed as the square root of the GLRT, is a popular and effective method for performing statistical binary classification:

$D_{\mathrm{ACE}}(x, s) = \frac{s^{T} \Sigma_b^{-1} (x - \mu_b)}{\sqrt{s^{T} \Sigma_b^{-1} s}\,\sqrt{(x - \mu_b)^{T} \Sigma_b^{-1} (x - \mu_b)}}$    (3)

In Eq. (3), $x$ represents a sample feature vector and $s$ represents an a priori target class representative. The mean vector $\mu_b$ and inverse covariance $\Sigma_b^{-1}$ are used to parameterize the background class distribution. The ACE classifier's response, $D$, is essentially the dot product of a sample and a known class representative in a whitened coordinate space.
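A minimal NumPy sketch of the ACE score in Eq. (3), again illustrative rather than the exact implementation used here:

```python
import numpy as np

def ace_scores(pixels, target):
    """ACE detector scores following Eq. (3).

    pixels: (N, bands) array of image spectra
    target: (bands,) a priori target class representative s
    Returns an (N,) array; values near 1 indicate that the whitened pixel
    points in (almost) the same direction as the whitened target.
    """
    mu_b = pixels.mean(axis=0)
    S_inv = np.linalg.pinv(np.cov(pixels, rowvar=False))   # background Sigma_b^{-1}
    centred = pixels - mu_b
    num = centred @ S_inv @ target                          # s^T Sigma^{-1} (x - mu_b) per pixel
    s_norm = np.sqrt(target @ S_inv @ target)
    x_norm = np.sqrt(np.einsum('ij,jk,ik->i', centred, S_inv, centred))
    return num / (s_norm * x_norm)

# Usage: ace = ace_scores(cube.reshape(-1, n_bands), methane_signature)
```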

4 Experiments and Results

4.1 Experiment 1: Contamination of Indian Pines Dataset Using Signatures from Pavia University and Pavia Center Datasets

The Indian Pines scene is composed of two-thirds agriculture and one-third forest or other natural perennial vegetation, so when urban material signatures such as asphalt and bricks are added, they act as anomalies in the Indian Pines image. In this experiment, we first attempted to locate the steel towers; the steel tower in the Indian Pines dataset appears and behaves as an anomaly in the image. We then contaminated the image ourselves, dispersing random pixels of asphalt, bricks, and water throughout the Indian Pines image to see how the algorithms performed on the synthetic data we created. Furthermore, rather than a single


Table 2 Anomaly detection in Indian Pines dataset (pixels correctly detected as anomalous / total anomalous pixels)
4.1.1 Steel towers (in Indian Pines image): RX 2/93, matched filter 93/93, ACE 93/93, ACE (windowed) 93/93
4.1.2 Water contamination (8 pixels at random locations): RX 8/8, matched filter 8/8, ACE 8/8, ACE (windowed) 8/8
4.1.3 Contamination by brick (8 pixels at random locations): RX 8/8, matched filter 8/8, ACE 8/8, ACE (windowed) 8/8
4.1.4 Asphalt contamination (8 pixels at random locations): RX 8/8, matched filter 8/8, ACE 8/8, ACE (windowed) 8/8
4.1.5 Water contamination (single group of not more than 5 pixels at 5 random locations): RX 25/25, matched filter 25/25, ACE 25/25, ACE (windowed) 25/25
4.1.6 Contamination by brick (single group of not more than 5 pixels at 5 random locations): RX 25/25, matched filter 25/25, ACE 25/25, ACE (windowed) 25/25
4.1.7 Asphalt contamination (single group of not more than 5 pixels at 5 random locations): RX 25/25, matched filter 25/25, ACE 25/25, ACE (windowed) 25/25

pixel, we contaminated using groups of five pixels at five different locations. The results of this experiment are shown in Table 2. Table 2 displays the number of pixels accurately detected as an anomaly out of the total pixels in the Indian Pines image.

4.1.1 Steel Tower as an Anomaly

The location of the steel tower is depicted in Fig. 1 in the area denoted by the black pixels. Figures 2, 3, and 4 show the results of the RX algorithm, matched filter, and ACE. Pixels marked with white color are anomalies detected by these algorithms.


Fig. 1 Steel tower

Fig. 2 RX


Fig. 3 Matched filter

Fig. 4 ACE

4.1.2 Water (8 Pixels) from Pavia Center as an Anomaly

The Pavia Center dataset’s water signature was introduced at random to the Indian Pines dataset at eight distinct locations. Pixels marked with white are anomalies identified by respective algorithms in Figs. 5, 6, and 7.


Fig. 5 RX algorithm

Fig. 6 Matched filter


Fig. 7 ACE

4.1.3 Brick (8 Pixels) as an Anomaly

We placed self-blocking brick signatures from the Pavia University dataset at 8 random locations in the Indian Pines image. The algorithm’s output is depicted in Figs. 8, 9, and 10.

Fig. 8 RX algorithm


Fig. 9 Matched filter

Fig. 10 ACE

4.1.4 Asphalt (8 Pixels) from Pavia University as an Anomaly

Similarly to the previous two experiments, we randomly placed an asphalt signature into the Indian Pines scene. Figures 11, 12, and 13 show the results of the RX algorithm, matched filter, and ACE.


Fig. 11 RX algorithm

Fig. 12 Matched filter


Fig. 13 ACE

4.1.5 A Group of 5 Water Pixels from Pavia Center Dataset as an Anomaly

Instead of placing a single pixel at a random location, we placed a group of 5 pixels at 5 different locations to see how these algorithms perform in the case of a cluster versus a single pixel. Anomaly pixels detected by these algorithms are shown below in Figs. 14, 15, and 16.

Fig. 14 RX algorithm


Fig. 15 Matched filter

Fig. 16 ACE

4.1.6 Group of 5 Brick Pixels as an Anomaly

We took a cluster of 5 brick pixels, similar to the experiment above, and placed them at 5 different locations throughout the image. Figures 17, 18, and 19 show results from RX, matched filter, and ACE, respectively.


Fig. 17 RX algorithm

Fig. 18 Matched filter


Fig. 19 ACE

4.1.7 Cluster of Asphalt Pixels as Anomaly

We carried out the same procedures as the previous two experiments and dispersed a cluster of five asphalt pixels at random within Indian Pines. Figures 20, 21, and 22 show results from RX, matched filter, and ACE, respectively.

Fig. 20 RX algorithm


Fig. 21 Matched filter

Fig. 22 ACE

4.2 Experiment 2: Cuprite as a Base Image Contaminated with Aster

Cuprite was used as the base image for these experiments. We began by cleaning up the data by removing noise and water absorption bands. The Aster data was then interpolated according to the Cuprite specifications and used to contaminate Cuprite. The rest of the procedure followed the same pattern as the previous experiments. These experiments yielded the following results (Table 3).


Table 3 Anomaly detection in cuprite
4.2.1 Aster man-made materials data: RX 4/4, matched filter 4/4, ACE 4/4
4.2.2 Aster soil data: RX 4/4, matched filter 4/4, ACE 4/4
4.2.3 Aster vegetation data: RX 4/4, matched filter 4/4, ACE 4/4
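The interpolation of the Aster library spectra onto the Cuprite band centres, mentioned at the start of this experiment, can be sketched with simple linear interpolation; the wavelength arrays below are assumed inputs, not the exact resampling routine used here.

```python
import numpy as np

def resample_to_cuprite(aster_wavelengths_nm, aster_reflectance, cuprite_wavelengths_nm):
    """Interpolate an Aster library spectrum onto the Cuprite band centres.

    aster_wavelengths_nm: (M,) wavelengths of the library spectrum (increasing order assumed)
    aster_reflectance: (M,) reflectance values of the library spectrum
    cuprite_wavelengths_nm: (B,) band-centre wavelengths of the Cuprite image
    Returns a (B,) spectrum aligned with the Cuprite bands.
    """
    return np.interp(cuprite_wavelengths_nm, aster_wavelengths_nm, aster_reflectance)
```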

4.2.1 Manmade Materials from Aster as an Anomaly

Terra cotta tiles, construction concrete, galvanized steel metal, and weathered red brick are man-made materials found in Aster. These components were dispersed throughout the Cuprite dataset as an anomaly. Figures 23 and 24 represent the results of the RX algorithm and the ACE, respectively.

Fig. 23 RX algorithm


Fig. 24 ACE

4.2.2 Aster Soil as an Anomaly

In the experiment that follows, we contaminated the Cuprite dataset with soil signatures found in the Aster data as an anomaly. Black loam, reddish brown fine sandy loam, white gypsum dune sand, and gray silty clay are all types of Aster soil. The results of the ACE and RX algorithms are displayed in the following Figs. 25 and 26.

Fig. 25 RX algorithm


Fig. 26 ACE

4.2.3 Aster Vegetation as an Anomaly

We used vegetation found in Aster as an anomaly in this experiment. These include Ficus Craterostoma, Eucalyptus ficifolia, Cedrus atlantica, and Abies-Concolor1. Results from the RX algorithm and ACE algorithm are displayed in Figs. 27 and 28.

Fig. 27 RX algorithm


Fig. 28 ACE

4.3 Experiment 3: Scaled Cuprite as a Base Image Contaminated with Aster (Converted to Reflectance)

Cuprite was also used in these experiments. The Cuprite data was scaled by dividing it by 10,000, and the Aster data was converted from percent reflectance to reflectance by dividing it by 100. Once a good threshold value for each material was found, we observed no false positives. We ran the same tests as in the previous experiments and, in addition, dispersed a cluster of 5 pixels in four different locations. The experiment's results are presented in Table 4.

Table 4 Anomaly detection in scaled cuprite
4.3.1 Aster man-made materials data: single pixel RX 4/4, matched filter 4/4, ACE 4/4; cluster RX 20/20, matched filter 20/20, ACE 20/20
4.3.2 Aster soil data: single pixel RX 4/4, matched filter 4/4, ACE 4/4; cluster RX 20/20, matched filter 20/20, ACE 20/20
4.3.3 Aster vegetation data: single pixel RX 4/4, matched filter 4/4, ACE 4/4; cluster RX 20/20, matched filter 20/20, ACE 20/20

4.3.1 Manmade Materials from Aster as an Anomaly

The outcomes of the RX algorithm and the ACE algorithm, respectively, when we dispersed a single pixel are shown in Figs. 29 and 30. When we dispersed a cluster of pixels, the RX method and ACE algorithm produced the results shown in Figs. 31 and 32.

Fig. 29 RX

Fig. 30 ACE


Fig. 31 RX (cluster)

Fig. 32 ACE (cluster)

4.3.2 Aster Soil as an Anomaly

The outcomes of the RX algorithm and the ACE algorithm, respectively, when we dispersed a single pixel are shown in Figs. 33 and 34. When we dispersed a cluster of pixels, the RX method and ACE algorithm produced the results shown in Figs. 35 and 36.

Fig. 33 RX

Fig. 34 ACE


Fig. 35 RX (cluster)

Fig. 36 ACE (cluster)

4.3.3 Aster Vegetation as an Anomaly

The outcomes of the RX algorithm and the ACE algorithm, respectively, when we dispersed a single pixel are shown in Figs. 37 and 38. When we dispersed a cluster of pixels, the RX method and ACE algorithm produced the results shown in Figs. 39 and 40.


Fig. 37 RX

Fig. 38 ACE


Fig. 39 RX (cluster)

Fig. 40 ACE (cluster)

4.4 Experiment 4: AVIRIS Data for Methane Leak in California

In these experiments, we studied the methane leak that took place in the Santa Susana Mountains. The data used was collected by NASA and is hyperspectral data consisting of 224 bands. Since the amount of data is too large to process at once, we divide it into two separate cropped images and perform the analysis on those. It should be noted that the filters that depend on the background covariance matrix for the


evaluation of the leak will be affected by this splitting. However, considering the small amount of contamination, anomalies still remain within the smaller background sets. As usual, after loading the data we eliminate the water and noisy channels, which are [0, 1, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 220, 221, 222, 223]. We use only a few pixels from the known plume as targets and input them to the algorithm. We note that the RX values of the plume in the image do not stand out as abnormal, yet the plume detection in these trials works flawlessly.
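A minimal sketch of this band-exclusion step, assuming the cropped AVIRIS image is held as a NumPy array (the variable names are illustrative):

```python
import numpy as np

# Water-absorption and noisy AVIRIS channels excluded before detection
# (the indices listed above).
BAD_BANDS = [0, 1] + list(range(103, 113)) + list(range(147, 168)) + list(range(220, 224))

def drop_bad_bands(cube, bad_bands=BAD_BANDS):
    """Remove the listed channels from a (rows, cols, bands) AVIRIS cube."""
    keep = np.setdiff1d(np.arange(cube.shape[2]), bad_bands)
    return cube[:, :, keep]
```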

4.4.1 Experiments on First Cropped Image

In this image, we utilized the RX and ACE algorithms to look for anomalies. A white rectangle in Fig. 41 denotes the area where the methane plumes that were emitted were found. The histogram produced by the RX values for this image is displayed in Fig. 42. Figure 43 is the result of applying the RX algorithm. Anomalies found by the RX algorithm are indicated by the white areas. Figure 44 is the outcome of the ACE algorithm. The presence of methane plumes is indicated by white pixels.

Fig. 41 Location of methane plume

Fig. 42 Histogram for RX algorithm

Fig. 43 Anomaly detected by RX algorithm

Fig. 44 ACE for methane detection in 1st cropped image


4.4.2 Experiment on Second Cropped Image

We followed the same steps as we did for the first cropped image. The plume is illustrated in Fig. 45 by a white rectangle. The histogram and anomaly detected by the RX algorithm are shown in the following two figures, Figs. 46 and 47. In the RX algorithm, areas detected as anomalous are marked with circles. Figure 48 depicts the detected anomaly pixels by ACE algorithm in white.

Fig. 45 Location of methane plume

Fig. 46 Histogram for RX algorithm

Fig. 47 Anomaly detected by RX algorithm


Fig. 48 ACE for methane detection in 2nd cropped image

NASA provided the area’s CH4 mixing ratio in ppm. When we compare the results with places where methane concentrations are over 10,000 ppm, we obtain precision of 99.6351%, recall of 97.9727%, and a f1 score of 98.79690 using the ACE algorithm.

5 Conclusion

We used the RX algorithm, matched filter, and ACE algorithm to detect anomalies in synthetic data and in the AVIRIS data for the canyon methane leak. We discovered that the RX algorithm (Fig. 2) did not perform well for large clusters, such as the steel tower in Fig. 1, whereas the matched filter (Fig. 3) and ACE (Fig. 4), which compare each pixel with the target signature, performed much better. When we used the RX algorithm to discover anomalies in the AVIRIS data, there were a number of false positives. In contrast, the ACE algorithm accurately identified methane-contaminated sites in the region where we knew methane was leaking. High accuracy was achieved in the detection of anomalies using the matched filter and the ACE algorithm, and in many areas ACE performed better than the matched filter. The ACE algorithm outperforms the other two algorithms in terms of precision and recall. We were able to get 100% accuracy for synthetic data, that is, data in which the target material has a distinctive signature compared to the other materials. In the case of the AVIRIS image, where methane's signature was mixed with the background, we were still able to recognize it with an F1 score of 98.7969. Additionally, we found that ACE was faster to compute than the matched filter and the RX algorithm; overall, finding anomalies with ACE took about 47% less time. For locating target material such as leaked plumes in hyperspectral images, the adaptive cosine/coherence estimator (ACE) is therefore a more reliable and accurate method.


Although these algorithms perform well, the results are site specific, and in their current form the contaminated pixels take part in the background covariance calculations. We therefore plan to improve upon this and deploy a guard-window approach to detect anomalies: a guard window and an outer window are defined, and the background statistics are calculated using only the pixels between the two. The performance of this approach against ACE can be evaluated in future research.

References
1. Wikipedia contributors (14 Aug 2022) Aliso canyon gas leak. Wikipedia. Retrieved September 15, 2022, from https://en.wikipedia.org/wiki/Aliso_Canyon_gas_leak
2. Zare-Baghbidi M, Homayouni S, Jamshidi K (2015) Improving the RX anomaly detection algorithm for hyperspectral images using FFT. J Model Simul Electr Electron Eng (MSEEE) 1:89–95
3. Kwon H, Nasrabadi NM (Feb 2005) Kernel RX-algorithm: a nonlinear anomaly detector for hyperspectral imagery. IEEE Trans Geosci Remote Sens 43(2):388–397. https://doi.org/10.1109/TGRS.2004.841487
4. Molero JM, Garzon EM, Garcia I, Plaza A (2013) Analysis and optimizations of global and local versions of the RX algorithm for anomaly detection in hyperspectral data. IEEE J Sel Top Appl Earth Observations Remote Sens 6:801–814
5. Matteoli S, Diani M, Corsini G (2012) Effects of signal contamination in RX detection of local hyperspectral anomalies. In: Processing IEEE international geoscience and remote sensing symposium (IGARSS), pp 4845–4848
6. Robey FC, Fuhrmann DR, Kelly EJ, Nitzberg R (1992) A CFAR adaptive matched filter detector. IEEE Trans Aerosp Electron Syst 28:208–216
7. Boardman JW, Kruse FA (2011) Analysis of imaging spectrometer data using N-dimensional geometry and a mixture-tuned matched filtering approach. IEEE Trans Geosci Remote Sens 49:4138–4152
8. DiPietro RS, Manolakis DG, Lockwood RB, Cooley T, Jacobson J (2012) Hyperspectral matched filter with false-alarm mitigation. Opt Eng 51:016202
9. Theiler J, Foy BR, Fraser AM (2006) Nonlinear signal contamination effect for gaseous plume detection in hyperspectral imagery. Proc SPIE 6233:62331U
10. Manolakis D, Pieper M, Truslow E, Cooley T, Brueggeman M, Lipson S (2013) The remarkable success of adaptive cosine estimator in hyperspectral target detection. Proc SPIE 8743:874302
11. Cao G, Bouman CA (2009) Covariance estimation for high dimensional data vectors using the sparse matrix transform. Adv Neural Inf Proc Syst 21:225–232. MIT Press
12. Theiler J (2012) The incredible shrinking covariance estimator. Proc SPIE 8391:83910P
13. Friedman J, Hastie T, Tibshirani R (2007) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9:432–441
14. Scharf LL, McWhorter LT (1996) Adaptive matched subspace detectors and adaptive coherence estimators. In: Processing Asilomar conference on signals, systems, and computers
15. Kraut S, Scharf LL, Butler RW (2005) The adaptive coherence estimator: a uniformly most powerful invariant adaptive detection statistic. IEEE Trans Sig Proc 53:427–438
16. Theiler J, Foy BR, Safi C, Love SP (8 May 2018) Onboard CubeSat data processing for hyperspectral detection of chemical plumes. In: Proceeding SPIE 10644, algorithms and technologies for multispectral, hyperspectral, and ultraspectral imagery XXIV, 1064405. https://doi.org/10.1117/12.2305278
17. McArthur JA (20 Aug 2021) Methane emissions are driving climate change. Here's how to reduce them. United Nations Environment Programme. https://www.unep.org/news-and-stories/story/methane-emissions-are-driving-climate-change-heres-how-reduce-them
18. NASA-OMI [Online] https://www.nasa.gov/mission_pages/aura/spacecraft/omi.html
19. Folkman MA, Pearlman J, Liao LB, Jarecke PJ (8 Feb 2001) EO-1/Hyperion hyperspectral imager design, development, characterization, and calibration. In: Proceedings SPIE 4151, hyperspectral remote sensing of the land and atmosphere. https://doi.org/10.1117/12.417022
20. Brett CJC, DiPietro RS, Manolakis DG, Ingle VK (2013) Efficient implementations of hyperspectral chemical-detection algorithms. Proc SPIE 8897:88970T
21. Reed IS, Mallett JD, Brennan LE (1974) Rapid convergence rate in adaptive arrays. IEEE Trans Aerosp Electron Syst 10:853–863
22. Mahalanobis PC (1936) On the generalized distance in statistics. Proc National Inst Sci India 2:49–55
23. Reed IS, Yu X (1990) Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution. IEEE Trans Acoust, Speech, Signal Process 38:1760–1770
24. Theiler J (2013) Matched-pair machine learning. Technometrics 55:536–547
25. Theiler J (2014) Transductive and matched-pair machine learning for difficult target detection problems. Proc SPIE 9088:90880E
26. Manolakis DG, D'Amico FM (2005) A taxonomy of algorithms for chemical vapor detection with hyperspectral imaging spectroscopy. Proc SPIE 5795:125–133
27. Theiler J, Foy BR (2006) Effect of signal contamination in matched-filter detection of the signal on a cluttered background. IEEE Geosci Remote Sens Lett 3:98–102
28. Matteoli S, Diani M, Corsini G (2014) Impact of signal contamination on the adaptive detection performance of local hyperspectral anomalies. IEEE Trans Geosc Remote Sens 52:1948–1968
29. Minet J, Taboury J, Goudail F, Pealat M, Roux N, Lonnoy J, Ferrec Y (2011) Influence of band selection and target estimation error on the performance of the matched filter in hyperspectral imaging. Appl Opt 50:4276–4285
30. Schaum A, Stocker A (1997) Spectrally selective target detection. In: Proceedings ISSSR (International symposium on spectral sensing research) 23
31. Bachega L, Theiler J, Bouman CA (2011) Evaluating and improving local hyperspectral anomaly detectors. In: Proceedings 40th IEEE applied imagery and pattern recognition (AIPR) workshop
32. Velasco-Forero S, Chen M, Goh A, Pang SK (2015) Comparative analysis of covariance matrix estimation for anomaly detection in hyperspectral images. IEEE J Sel Top Signal Proc 9:1061–1073

Application of Machine Learning in Customer Services and E-commerce

G. Aarthi, R. Karthikha, Sharmila Sankar, S. Sharon Priya, D. Najumnissa Jamal, and W. Aisha Banu

Abstract Nowadays, machine learning (ML) plays an important role in the E-commerce industry and its customer relations, performing different kinds of tasks such as prediction of purchases, segmentation of customers according to their reviews/sentiments, and recommendation of products to active users. Various ML algorithms are trained on data patterns to perform the above-mentioned tasks. In this paper, customer segmentation and recommendation of women's clothing based on reviews are presented. The comparative study is done using five different ML algorithms, namely regression analysis, Naïve Bayes, decision trees, support vector machines, and clustering analysis. The results show that the Naïve Bayes algorithm performs better than the other algorithms, showing higher accuracy.

Keywords Customer segmentation · E-commerce · Machine learning · Regression analysis · Naïve Bayes · Decision trees · Support vector machines · Clustering analysis · Sentiment analysis

G. Aarthi (B) · S. Sankar · S. S. Priya · W. A. Banu Computer Science and Engineering, B. S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India e-mail: [email protected] S. Sankar e-mail: [email protected] S. S. Priya e-mail: [email protected] W. A. Banu e-mail: [email protected] R. Karthikha Electrical and Electronics Engineering, B. S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India e-mail: [email protected] D. N. Jamal Electronics and Instrumentation Engineering, B. S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_56


1 Introduction

E-commerce has become increasingly significant in the current era of science and information technology, especially with the rapid expansion of the Internet. It has grown swiftly, and people have become increasingly reliant on online purchasing, which not only simplifies their daily lives but also connects them to the rest of the world in a modern way. One of the most crucial aspects of E-commerce is the customer. Figuring out what a large customer base wants and which products they buy is otherwise largely guesswork, yet it is critical to deliver a product that meets the needs of the consumer and to give support promptly. As a result, we believe that machine learning techniques can provide a solution for E-commerce applications. The top priority for all merchants is whether customers make repeat purchases. Channel integration has a stronger and more robust impact on the quality of service in both mobile and online environments, which in turn regulates the trade-off between explicit fulfillment and combined fulfillment; this trade-off strongly affects total fulfillment and, in turn, has a positive impact on repurchase intentions [1]. Customer happiness is more vital than ever for every organization, and it is the foundation for success and a healthy increase in revenue. Customer lifecycle management is the process of recording the stages of the customer's lifetime, assigning metrics to each stage, and determining success using those data; its primary purpose is to follow the customer relationship over time. The lifetime of a business-customer relationship has five stages, as indicated in Fig. 1: reach, acquisition, conversion, retention, and loyalty.

(1) Reach. In the first stage, a customer looks for an item after becoming aware of a problem they need to solve. This stage is termed 'reach' since it is your opportunity to reach the customer while they are still deciding. In this stage, the customer is looking at items across competing brands (including yours), doing research, and reading customer reviews. Social media marketing, online advertising, search engine marketing, and other inbound and outbound techniques should put your brand on this customer's radar. This stage is successful when the customer contacts you for more information, looking either to educate themselves further or to get a complete price [2].

(2) Acquisition. Customers officially enter the acquisition stage when they browse a Website or make an enquiry, depending on the customer's acquisition channel. When customers browse the Website, the content should be helpful and informative so that it can assist them in making a purchasing decision [3]. A content offer, evaluation page, or blog post should provide all of the information a customer requires to make an efficient purchasing decision. A password-protected area of the Webpage should be used to collect consumer information, and administrative staff should be able to respond to urgent questions via live chat or a chatbot. From a variety of perspectives,


Fig. 1 Life cycle of customer relationship

customer service interactions are very important, and even simple access to and usage of the Website is a customer care touchpoint [4].

(3) Conversion. After gathering all necessary information and being satisfied with the customer experience provided by the Website and its reviews, the customer may make a purchase. At this point, the customer should be offered a valuable advantage, and regularly keeping the customer's attention is an excellent opportunity for the company [5].

(4) Retention. Customer retention begins with discovering how the customer feels by conducting thorough customer support studies. These studies are measured by customer satisfaction scores, and a chatbot should be set up to discover what can be improved further in the service. Using data obtained directly from customers to make upgrades to products and services is known as customer support insight [6].

(5) Loyalty. Customer lifecycle software is the best technique to help automate the customer life cycle. Machine learning techniques assist the E-commerce shop in implementing intelligent retargeting campaigns, and with these predictive models real-time discounts are applied to increase sales or improve margins. Customers


have become more interested in the quality of service (QoS) offered by the respective organizations. Machine learning can help make sense of client data to further tailor marketing strategies [11]. Some of the applications of machine learning in E-commerce are customer segmentation, personalization of services, and targeted campaigning. Businesses can use diversification to increase the effectiveness of their marketing efforts and gain an edge over rivals [12]. Examples of demographic information include gender, age, marital and family status, income, level of education, and employment. Consumer spending and consumption patterns, product and service use, feature utilization, session frequency, browsing history, average order value, and anticipated benefits are all examples of behavioral data [14]. The advantages of customer segmentation include determining product pricing, making the best distribution plan, selecting particular product features, prioritizing new product development efforts, and creating tailored marketing campaigns [15].

2 Machine Learning Techniques

Machine learning algorithms are generally used for customer segmentation based on the history of customer purchase details, by implementing any of the following methods: (1) regression analysis; (2) decision tree; (3) support vector machine; (4) Naïve Bayes; (5) ensemble-based learning; (6) clustering analysis. A brief introduction to these algorithms is presented in this section.

(1) Regression Analysis: Regression analysis is a proven method for identifying relationships between dependent and independent variables or attributes. Many regression models relate one target variable to one or more independent variables. When the dependent variable is binary, the chosen regression model is logistic regression (LR), which operates as a predictive analytic model [16].

(2) Decision Tree: This model creates a tree-like structure which represents a set of decisions and returns the likelihood scores of the particular class. The decision tree model consists of (a) inner nodes, where each node refers to a single variable/feature and represents a test point at the feature level; (b) branches, which represent the test outcomes and are drawn as lines that eventually lead to (c) leaf nodes, which represent the class labels. It is an adaptable model that supports both categorical and continuous data [17].

(3) Support Vector Machine: The support vector machine (SVM) is a supervised machine learning technique used for classification and regression problems. SVM finds a set of hyperplanes that best separates instances of different classes, and new instances are classified based on their position relative to the separating gap. SVM is a classification model that uses all attributes and provides a learning method based on non-overlapping segmentation [18].


(4) Naïve Bayes: The Bayes theorem evaluates the probability of an event occurring and is the foundation of the Naive Bayesian (NB) classification method, which assumes that the presence or absence of one feature does not influence the presence or absence of another. NB is a supervised machine learning technique that predicts new events from a review of previous events; the NB model generates a probability score and a class membership [19].

(5) Ensemble-Based Learning: Ensemble-based learning is a prediction technique that combines the outputs of multiple classifiers. It includes bagging methods (such as random forest) and boosting methods (such as AdaBoost and stochastic gradient boosting) [20].

(a) Random Forest: This is an ensemble-based technique which generates predictions by constructing many decision trees. During the training stage, it builds many classification trees, each of which is made up of training samples and explanatory features randomly drawn from the entire training dataset; each classification tree therefore has a distinct set of learned information, and its model and predicted features are distinct [21]. The trees in the forest vote, and the class with the maximum votes across the trees is chosen [22]. One of the significant benefits is protection against overfitting, which makes the model perform well. The data not used in building a given classification tree is used for model validation, which is called out-of-bag (OOB) evaluation. The total number of OOB decisions across the classification trees of the random forest differs for each individual instance, and the classes assigned to an instance are also predicted differently by each tree, which allows us to compute the probability of the prediction for an individual instance [23]. The benefits of this model are, first, that although the accuracy of a single tree may drop because its training subset is small, the final accuracy obtained by combining the trees is better than that of the simple classification tree algorithm. Second, by the law of large numbers, the larger the size of the forest (the number of trees), the more the generalization error, commonly measured by the misclassification rate, converges to a specific limiting value [24]. Third, because individual classification trees are built on randomly resampled data from the whole training set, the model is not strongly affected by outliers or noise.

(b) Boosting: AdaBoost and stochastic gradient boosting are boosting-based methods. These methods transform a group of weak learners into stronger learners; a weak learner performs only somewhat better than random guessing. AdaBoost gives more weight to the instances that the weak learners handle poorly or misclassify, and its major goal is to train new weak learners to deal with those erroneous predictions. In accordance with their alpha


weight (accuracy), weak learners are combined into stronger learners; the higher the alpha weight, the greater the contribution to the final learner [25]. The weak learners are decision trees with a single split, and each instance's score is determined by combining the results of all weak learners, weighted according to their respective accuracy. Gradient boosting, on the other hand, weights misclassified cases by exploiting the pseudo-residuals: the errors are computed during each iteration and a weak learner is fitted to them, so that the contribution of the weak learner to the strong learner at that step minimizes the strong learner's error [26].

(6) Clustering Analysis: Clustering is frequently used when analyzing data to find interesting patterns, such as groupings of customers based on their behavior [27]. There is a wide range of clustering algorithms and different configurations for each algorithm; some of them are as follows.

(a) K-means Clustering: This clustering method divides a dataset into K non-overlapping groups. After the initial values have been set, it assigns each observation depending on those values and minimizes the within-cluster variance of the distances. The major goal is to find a solution to the two objectives of maximizing cluster cohesion and the separation between clusters [28].

(b) DBSCAN: Density-based spatial clustering of applications with noise (DBSCAN) involves discovering high-density regions in space and extending the feature area around them as clusters [29].
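As a rough illustration of how the classifiers reviewed above can be compared on a common dataset, the sketch below assumes scikit-learn and a generic feature matrix X with binary labels y; it is not the exact pipeline used later in this paper.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def compare_classifiers(X, y):
    """Train the reviewed classifiers on the same split and report their accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    models = {
        "Logistic regression": LogisticRegression(max_iter=1000),
        "Decision tree": DecisionTreeClassifier(),
        "Linear SVM": LinearSVC(),
        "Naive Bayes": MultinomialNB(),   # suited to non-negative count/TF-IDF features
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores
```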

3 Methodology

To implement the above-mentioned machine learning models on the dataset, the data must be analyzed and preprocessed to extract the important features. Analyzing data includes data collection and data preprocessing techniques [30]. The flow diagram in Fig. 2 shows the methodology of the work carried out in this paper.

(1) Data collection. The dataset is collected from the Kaggle data repository and consists of E-commerce reviews of women's clothing. This data is used to segment customers according to their reviews (including positive and negative reviews) and to further recommend clothing to other active users of the E-commerce platform based on positive reviews. The collected data size is 23487 * 11 (rows * columns), which includes features such as clothing identification number, customer's age, title and text of the

Fig. 2 Flow diagram of proposed work: E-commerce platform → Data collection → Data pre-processing (a) data cleaning, (b) feature selection → Data classification using machine learning (ML) algorithms → Visualisation results and statistical reports

review, rating, positive feedback count, clothing's division name, department name, and class name. It also includes the target variable, the recommendation indicator '0' or '1' (ref. Table 1).

(2) Data Preprocessing. Data preprocessing is the process of removing artifacts from raw data to make it appropriate for the application. This is the most important step in the development of any machine learning model. Data preparation has two stages: data cleaning and feature selection [31].

(a) Data Cleaning. Data cleaning entails dealing with missing data that has no bearing on the ML model's implementation. There are several missing/null values in the obtained dataset for clothing ID and title. Cleaning also includes removing unnecessary punctuation and stop words from the review content. Data cleaning revealed that there is no correlation between age and positive feedback, nor between age and rating [32, 33] (Fig. 3).

(b) Feature Selection. Feature selection is the process of picking a subset of characteristics that have the most significance for the ML model's result, the target variable. By removing undesired and duplicated features, it increases classifier accuracy in supervised learning algorithms. It is made up of wrappers and filters, with the filters having a


Table 1 Women’s clothing in feature dataset Variable

Datatype

Description

S. no.

Integer

Serial number of the dataset

Clothing ID

Integer

Clothing identification number

Age

Integer

Age of customer

Title

String

Feedback title

Review text

String

Detailed customer review of the product

Rating

Integer

Rating of the product (maximum value-5 (good); lower value-1 (worst))

Recommended IND

Categorical type Yes (1)—recommends to others/no (0)—no recommendation

Positive feedback count Integer

Total count of the positive feedback for the particular product

Division name

String

The division name of the product (like, intimates or general or general petite)

Department name

String

The department name of the product (like, bottoms or dresses or intimate or jackets or tops or trend)

Class name

String

The class name of the product (like blouses, causal bottoms, chemises, dresses, fine gage, intimates, jeans, jackets, knits, layering, lounge, legwear, outerwear, pants, shorts, skirts, sleep, sweaters, swim, trend)

Fig. 3 Unigram representation after the removal of stop words


lower time complexity for reducing redundancy. This stage condenses the full raw dataset into a smaller dataset from which the target variable is easier to predict, saving time in the process. Dimensionality reduction techniques are used to create these smaller datasets [34–36].
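A minimal sketch of this cleaning and feature-extraction stage, assuming the public Kaggle file and column names (which are assumptions and may need adjusting to the local copy of the data):

```python
import re
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

def prepare_reviews(csv_path="Womens Clothing E-Commerce Reviews.csv"):
    """Clean the review text and turn it into TF-IDF features.

    Returns a sparse feature matrix X and the recommendation labels y.
    """
    df = pd.read_csv(csv_path)
    df = df.dropna(subset=["Review Text"])                      # drop rows with missing reviews
    cleaned = (df["Review Text"]
               .str.lower()
               .apply(lambda t: re.sub(r"[^a-z\s]", " ", t)))   # strip punctuation and digits
    vec = TfidfVectorizer(stop_words="english", max_features=5000)
    X = vec.fit_transform(cleaned)                              # TF-IDF features after stop-word removal
    y = df["Recommended IND"]                                   # target: recommend (1) or not (0)
    return X, y
```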

4 Simulation Setup

The aforementioned data preprocessing approaches and the application of the ML models to the dataset are implemented in the Python programming language, which has many relevant libraries for ML algorithms. The performance of the ML classifiers is evaluated using performance metrics, which include accuracy, precision, recall, positive predictive value, and negative predictive value. Accuracy is the most critical of these for performance evaluation, and the following methods are used to calculate the accuracy of the various machine learning approaches.

(a) Regression Analysis. The accuracy value obtained through regression analysis, shown in Fig. 4, is 47%, which has very little significance.

(b) K-means Clustering Analysis. Figure 5 shows the k-means values of the collected women's clothing dataset obtained by k-means clustering analysis. The analysis also divides the dataset into four clusters for easy

Fig. 4 Accuracy score obtained through regression analysis


Fig. 5 K-means value of the collected dataset

segmentation of customers and customer review-based recommendations to new users, as shown in Fig. 6. Figure 7 below depicts the accuracy score achieved by the k-means clustering analysis, a value of 67%.

Fig. 6 Cluster formation of the dataset
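A rough sketch of the k-means segmentation step described above, assuming scikit-learn and the sparse review features produced earlier; the two-component reduction used before clustering is an assumption for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

def segment_customers(X, n_clusters=4, random_state=42):
    """Reduce the sparse review features and group customers into clusters."""
    reduced = TruncatedSVD(n_components=2, random_state=random_state).fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(reduced)          # cluster assignment per customer review
    return labels, km.cluster_centers_
```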


Fig. 7 Accuracy value obtained through clustering analysis

Fig. 8 Accuracy value obtained through the decision tree classifier

(c) Decision Tree. The decision tree classifier obtained an accuracy value of 80%, shown in Fig. 8, an improvement over the previous regression analysis.

(d) Support Vector Machines. The accuracy obtained using support vector machines is 83%, represented in Fig. 9, with improved significance.

(e) Naive Bayes. The Naïve Bayes classifier obtained an accuracy of 87%, shown in Fig. 10, providing much better performance than the other classifiers used previously.

5 Result and Discussion To categorize customers based on reviews and recommend products to new users, many types of machine learning algorithms are evaluated. The box plot between the


Fig. 9 Accuracy value obtained through support vector machines

Fig. 10 Accuracy value obtained through the Naïve Bayes classifier

recommended indicator and the rating is shown in Fig. 11. It shows that review scores of more than three are seen as positive feedback and are recommended to other users. Figure 12 shows the pie chart depicting the classification of review texts as positive, negative, and neutral feedback. The results contain more neutral feedback. Figure 13 shows the correlation matrix of the women’s clothing dataset comparing the relations between each feature of the dataset. Different machine learning (ML) algorithms are compared, including regression analysis, decision trees (DT), support vector machines (SVMs), Naive Bayes (NB), and clustering analysis. Table 2 depicts the contrasts between the above-mentioned


Fig. 11 Recommended indicator versus rating

Fig. 12 Representation of review texts classification

machine learning methods and their accuracy. In comparison with the other ML techniques, Naive Bayes has the highest accuracy at 87%, a 40-percentage-point improvement over the 47% accuracy of regression analysis, as seen in the table. Decision trees and support vector machines also performed well compared with Naïve Bayes, with only a 3–4% difference in performance. Regression analysis and clustering analysis, however, show poor performance for client segmentation based on reviews and product recommendation to new users on this dataset.


Fig. 13 Correlation matrix of useful features in the dataset

Table 2 Comparison of different ML algorithms and their accuracy
1. Regression analysis: 47%
2. Clustering analysis: 67%
3. Decision trees: 80%
4. Support vector machine: 83%
5. Naïve Bayes: 87%

6 Conclusion and Future Work

This work employs a variety of machine learning algorithms to deliver customer segmentation and product recommendations to new users in E-commerce. Among the algorithms used are regression analysis, Naive Bayes (NB), decision trees (DT), support vector machines (SVMs), and clustering analysis. The goal of the comparison study is to determine which of the classifiers performs best; according to the study, the Naive Bayes (NB) algorithm has the highest accuracy at 87%. Further, this study can be broadened using deep learning classification methods, which can give more accurate results with reduced training time and handle overfitting better than the other ML classifiers.

References 1. Archer N, Yuan Y (2000) Managing business-to-business relationships throughout the e-commerce procurement life cycle. Internet Res 2. Ryding D (2010) The impact of new technologies on customer satisfaction and business-tobusiness customer relationships: evidence from the soft drinks industry. J Retail Consum Serv 17(3):224–228



The Effect of Comorbidity on the Survival Rates of COVID-19 Using Quantum Machine Learning Arsheyee Shahapure, Anindita Banerjee, and Rehan Deshmukh

Abstract We report the effect of comorbidity on the survival rates of COVID-19 using quantum machine learning. Recent work on the novel coronavirus aims to identify the target organs of the virus, which could soon lead to significant advances in the diagnosis and treatment of infected patients. We studied the impact of the SARS-CoV-2 virus with respect to several parameters such as age, type of comorbidity, and gender. The data were first analyzed manually by referring to the parent articles and then with machine learning and quantum machine learning algorithms. This helps identify the age groups and genders most at risk of infection and the corresponding survival rates. The data used is classical, and quantum algorithms were run on it. We found that the accuracy improved compared with the classical machine learning support vector machine results, and that pulmonary diseases are the most harmful type of comorbidity when an individual is infected with COVID-19. Keywords Comorbidity · Machine learning · Support vector machine · Quantum machine learning · SARS-CoV-2

A. Shahapure Computer Science and Engineering, Dr. Vishwanath Karad MIT-World Peace University, Paud Road, Pune 411036, India A. Banerjee Centre for Development of Advanced Computing, Corporate Research and Development, C-DAC Innovation Park, Panchavati, Pashan, Pune 411008, India R. Deshmukh (B) School of Biosciences and Technology, Dr. Vishwanath Karad MIT-World Peace University, Paud Road, Pune 411036, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_57


1 Introduction On 31st December 2019, WHO was informed about a peculiar case of pneumonia of unknown cause in Wuhan City, Hubei Province of China [1]. It spread worldwide within a couple of months, affecting millions of people [2]. Since then, the SARS-CoV-2 virus has manifested varied symptoms in infected individuals with or without existing illnesses [3, 4]. This raises two questions: how do existing diseased conditions affect the severity of COVID-19, and what are the survival rates of patients diagnosed with COVID-19 who suffer specifically from comorbidities? We selected patients with pulmonary diseases and hypertension/hypotension for this investigation. Data from patients admitted to several hospitals in Pune, Maharashtra, was included in our study. The government institutions were directly contacted for the raw data. Age, comorbidities, the patient’s condition, the name of the hospital, and the admission and discharge dates were all included in the raw data. Machine learning is a subfield of artificial intelligence [5]. It uses algorithms to study and analyze data to provide predictions, and it uses mathematical models and statistics to build training models of a dataset [6]. These models are then tested and remodeled to give accurate results. The machine tries to group similar data patterns and make predictions based on them [7]. Quantum machine learning, in turn, is the integration of quantum algorithms into machine learning programs [7]; it is the analysis of classical data on a quantum computer. While classical machine learning can process large amounts of data, the machine slows down as the size of the data grows [8]. Here, quantum computers play a major role as they are superior in both speed and space to the classical machine [8]. In the current paper, we have used quantum computing to determine the effect of comorbidities on the survival rates of COVID-19.

2 Materials and Method 2.1 Collection of Data The data of the COVID-19 infected individuals was procured from the local government institution of Pune. The data of the infected patients represented diverse parameters such as age, comorbidity, the status of the person, hospital name, and admission and discharge dates.

2.2 Analysis of Data The data was analyzed using three different methods. First, the manual analysis by making predictions based on the raw data in excel sheets; second, using machine


learning, and third by a quantum machine learning approach [5–7]. Quantum machine learning, which combines machine learning and quantum computing, has received a lot of interest recently [5]. Both machine learning and quantum computing have the potential to change the way computation is done in order to solve issues that were previously intractable [8]. Support vector machines (SVMs), the most popular kernel approach for classification problems, are a common tool for pattern identification in machine learning [9]. A support vector machine is a supervised machine learning algorithm that divides vectors in a feature space into one of two groups. The SVM can be formulated as a quadratic programming problem, which can be solved in time proportional to O(log(1/ε) poly(A, N)), with A the dimension of the feature space, N the number of training vectors, and ε the accuracy. Quantum SVM (QSVM) performs the classical SVM on a quantum system [10]. To implement quantum-based machine learning, the open-source software Qiskit was used; its qiskit_machine_learning modules provide numerous quantum circuit libraries, quantum kernels, and inbuilt datasets. Further, the QSVM algorithm was executed on Qiskit [11].
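For orientation, the sketch below shows one way such a quantum-kernel SVM workflow can be set up with the current qiskit-machine-learning API. It is a minimal illustration under stated assumptions, not the authors' exact pipeline (which relied on the older Aqua-based QSVM); class names and defaults can differ between Qiskit versions, and the feature columns and labels are hypothetical placeholders.

```python
# Minimal quantum-kernel SVM sketch (assumes qiskit and qiskit-machine-learning are installed).
import numpy as np
from sklearn.model_selection import train_test_split
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel
from qiskit_machine_learning.algorithms import QSVC

# Hypothetical classical data: rows are patients, two illustrative features per row.
X = np.random.rand(40, 2)
y = np.random.randint(0, 2, size=40)          # 0 = discharged, 1 = deceased (illustrative labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Encode classical features into a parameterized quantum circuit (the feature map).
feature_map = ZZFeatureMap(feature_dimension=X.shape[1], reps=2, entanglement="linear")

# The quantum kernel evaluates state overlaps between encoded data points.
quantum_kernel = FidelityQuantumKernel(feature_map=feature_map)

# QSVC plugs the quantum kernel into scikit-learn's SVM machinery.
qsvc = QSVC(quantum_kernel=quantum_kernel)
qsvc.fit(X_train, y_train)
print("QSVC test accuracy:", qsvc.score(X_test, y_test))
```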

3 Results and Discussion 3.1 Effect of Comorbid Conditions: Predictive Analyses of the COVID-19 Survival Rates Hypertension and pulmonary diseases were the two diseased conditions considered as comorbid conditions for determining their effect on survival rates when COVID-19 infection occurred. Based on the analysis of the data, two hypotheses were formed [11]: H0: the survival rate of patients with hypertension is higher than that of patients with pulmonary disease. H1: the survival rate of patients with pulmonary disease is higher than that of patients with hypertension. We could infer from the data that patients with lung disorders died at a higher rate than those with hypertension. Patients with pulmonary illnesses had an overall survival rate of 48.68%, while those with hypertension had a survival rate of 65.67%. The analyses by gender revealed that, for both pulmonary illness and hypertension, women had lower survival rates than men. In patients with lung illnesses, females had a 47.83% likelihood of survival, whereas male patients had a chance of survival of about 50%. Males with hypertension had the highest survival rate, at 67.18%. Females with lung illnesses had the lowest survival rate, with a death rate of 52.17%. The age-based analysis clearly demonstrated an inverse correlation: the survival rate decreased as a person's age rose. People over the age of 70 in the city of Pune had a higher risk of dying from COVID-19 infection than of surviving, because the immune system weakens with age. Patients with hypertension had greater survival rates than those with pulmonary illnesses even as their ages increased. The age group 40 ≤ A < 50 with hypertension had the highest survival


Fig. 1 Segregation of the patients according to discharged and death parameters with the comorbidities according to the age groups and gender of the patients and their overall analysis. a overall percentage of survival having the two diseases was analyzed; b segregation of patients by gender having pulmonary and hypertension; c segregation of patients by age groups were sorted according to the diseases and their outcome

rate (62.50%), and patients with pulmonary disorders had the lowest survival rate (only 25% of them survived) (Fig. 1). Accordingly, the findings indicated that a male with hypertension in the age range 40 ≤ A < 50 had the highest survival rate compared to other groups, while a female with pulmonary disorders in the age range 80 ≤ A < 90 had the lowest. Therefore, we accepted the null hypothesis H0: hypertension had a higher survival rate than pulmonary illness. The first analysis was completed manually from an overview of the data. The data was then further analyzed using a dataset of 1000 patients. Figure 2 shows the distribution of the data according to age, gender, number of comorbidities, and comorbidity type, along with the survival analysis of each according to gender and age. Machine learning and quantum machine learning were applied to the data to obtain the accuracy of the models when trained and tested. After analyzing the data, we studied the survival rates by comorbidity. As evident from Fig. 3, the overall survival rate of pulmonary disease patients is low, while the survival rates of patients with other diseases are higher. Clearly, the virus damaged the lungs, and as a result, more than half of the patients with pulmonary disease who contracted the SARS-CoV-2 virus perished. According to the data analysis by gender, women outlived men in both the hypertension and pulmonary illness conditions, and males and females with hypertension had a much lower mortality rate than individuals with pulmonary illnesses. The following conclusions were made generally:


Fig. 2 Distribution of data. a There were four severity levels: 0, 1, 2, and 3, where 0 is the least severe and 3 the most severe. Most admitted patients were at severity level 1 (544 patients), and the fewest were at level 3 (only 64 patients); b most patients in the data had no comorbidities and were kept as controls; c the age groups are unevenly distributed; d the gender distribution and its percentage; e most patients in the data suffered from diabetes and hypertension; f the treatment used versus the outcome of the patients

1. Patients with various disorders had higher survival rates than those with pulmonary diseases.
2. Male patients with lung illnesses had a poorer rate of survival.
3. The age group with the lowest rate of survival for pulmonary illnesses was the elderly.


Fig. 3 Segregation of the patients based on discharged and death parameter with the comorbidities according to the age groups and gender of the patients. The values on Y-axis represent number of patients

Therefore, based on the aforementioned conclusions, we again accepted the null hypothesis, H0 : Hypertension had a higher survival rate than pulmonary illness.
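As a concrete illustration of this descriptive analysis, a survival-rate breakdown of the kind reported above can be computed with a simple pandas aggregation. This is a hedged sketch: the file name and the column names (comorbidity, gender, age, outcome) are hypothetical stand-ins for the fields in the hospital dataset, and the encoding of outcome is assumed.

```python
# Hypothetical sketch of the manual survival-rate analysis, done with pandas.
# 'outcome' is assumed to be 1 for discharged (survived) and 0 for death.
import pandas as pd

df = pd.read_csv("covid_comorbidity.csv")     # hypothetical file name

# Overall survival rate per comorbidity (e.g. pulmonary vs hypertension), in percent.
by_disease = df.groupby("comorbidity")["outcome"].mean() * 100

# Survival rate split by comorbidity and gender.
by_gender = df.groupby(["comorbidity", "gender"])["outcome"].mean() * 100

# Survival rate per ten-year age bucket, mirroring the 40 <= A < 50 style groups used above.
df["age_group"] = (df["age"] // 10) * 10
by_age = df.groupby(["comorbidity", "age_group"])["outcome"].mean() * 100

print(by_disease.round(2))
print(by_gender.round(2))
print(by_age.round(2))
```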

3.2 Data Analysis Using Classical Machine Learning For the classical machine learning analysis, the data was first processed: the columns were renamed and the data was filtered. The null values were replaced with the mean value of the respective column, calculated in Python. A heat map was plotted to show the correlation of the data and the dependency of the columns on each other (Fig. 4). If the value at the intersection with a column on the Y-axis is greater than 0.75, a dependency on the corresponding X-axis column is observed [12]; a negative sign indicates an inversely proportional correlation. The dataset was split into two parts, with 80% used for training and 20% for testing, using the train_test_split function. The outcome column was used as the target. On this basis, the machine learning algorithms were trained and tested. Four machine learning algorithms were used on this data, and their best results are as follows. The algorithm that gave the most accurate results was the random forest algorithm, with 98.99% accuracy compared with the others, as shown in Table 1. The sklearn package was used to determine


Fig. 4 Heat map of data

Table 1 Machine learning algorithms used for data curation

Algorithm used             Accuracy of the model (%)
KNN                        89.33
Random forest              98.99
Logistic regression        91.25
Support vector machines    89.37

the accuracy of the models. The models were tested repeatedly, and the mean and standard deviation of the scores were found: the mean score of 0.896 is good, tending toward 1, which indicates accurate results, and the observed standard deviation of 0.0267 is small, which makes the model more reliable.
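A compact sketch of the classical pipeline described above (mean imputation, 80/20 split, the best-scoring random forest, and repeated scoring) is given below. It assumes a DataFrame with numeric feature columns and an 'outcome' target column; the file name, the use of five-fold cross-validation, and other details are illustrative assumptions rather than the authors' code.

```python
# Hedged reconstruction of the classical machine learning workflow of Sect. 3.2.
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("covid_comorbidity.csv")          # hypothetical file name
df = df.fillna(df.mean(numeric_only=True))         # replace nulls with the column means

X = df.drop(columns=["outcome"])                   # assumes all remaining columns are numeric
y = df["outcome"]                                  # 'outcome' is the target column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Repeated evaluation gives a mean score and standard deviation like those quoted in the text.
scores = cross_val_score(model, X, y, cv=5)
print("Mean score:", scores.mean(), "Standard deviation:", scores.std())
```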

3.3 Data Analysis Using Quantum Machine Learning Machine learning is a popular data processing and data analyzing approach. Quantum machine learning was used to implement classical data on quantum circuits for quick and efficient data processing to obtain the results. Quantum support vector machines (QSVM) divided the hyperplane into linear and non-linear parts known as feature map and feature space. For performing QSVM, the classical dataset was first converted into the quantum form for it to be able to run on a quantum circuit to generate the feature maps. Some examples of non-linear kernels were Gaussian, polynomial, ANOVA, and sigmoid. The kernel trick that QSVM used to divide the hyperplane


into linear and non-linear transformations avoided the unnecessary space complexity that would be found in a classical system [13]. In classical computing, the kernel function was not efficient, whereas feature maps in quantum circuits run on a quantum system allowed minimal usage of computational resources, which is otherwise not possible in classical computation. QSVM followed the same training and testing steps as in classical systems. QSVM also provided multiclass extension algorithms that can be useful for classifying data with multiple groups. In QSVM, the time complexity was also reduced: QSVM had a run-time complexity of O(log AN) in both the training and classification stages [2]. This was an advantage compared to the classical machines, as the time complexity was reduced significantly. The training time for a classical SVM was O(nd²), where d is the dimension and n is the number of data points, and this training time is high when analyzing a large dataset. SVM was not typically used in this scenario, as the run-time of the algorithm was O(xy), where x is the number of support vectors and y is the number of data points. To implement quantum-based machine learning, the open-source software Qiskit was used, and the qiskit_machine_learning modules gave numerous quantum circuit libraries, quantum kernels, and inbuilt datasets. Further, the QSVM algorithm was executed on Qiskit. QSVM was provided through the Aqua libraries that execute the quantum algorithms. The circuit used was the Pauli-Z circuit, and the QASM simulator was employed for execution. The complete circuit image is provided in Fig. 5. Figure 5 is divided into three subparts: the creation of the feature map, the kernel, and the training and testing of the data measurement. The feature map was created using the code available on the Qiskit website. The Z feature map was the first step, used at the start of the circuit. The Z feature map is the most basic circuit, with no entanglement and with the Pauli strings fixed; the order of expansion of the circuit is without entangling gates. The next step involved creating the ZZ feature map. This circuit has linear entanglement. When paired with several other gates that are not diagonal in the computational basis, rotations around the Z axis can undoubtedly change the odds of measuring 0 and 1. The placement of the Hadamard gates at the start and middle of the circuit was important in the scenario studied. The fundamental concept was to create the data that the classifier would classify using the ZZ feature map, which explains why the ZZ feature map functions with a precision of 100%. The classification problem was created such that the quantum circuit combined with the SVM classifier would be powerful enough to learn the accurate underlying distribution of the data. The evolution of the Pauli circuit took place by combining the effect of the Z feature map and the ZZ feature map, resulting in the complete circuit, ready to operate on the data. The next step was the quantum kernel. The quantum kernel used for this circuit was the kernel available on Qiskit. As discussed earlier, the most interesting feature of quantum machine learning is the kernel trick that divides the hyperplane into linear and non-linear parts to avoid the unnecessary space complexity that occurs in classical computing (Fig. 6). These matrices were found after running the circuit and executing the job with 8192 shots. The division of the training and the test matrix is shown.
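The feature-map construction and the training/testing kernel matrices of Fig. 6 can be reproduced along the following lines. This is a sketch with the current Qiskit circuit library (the Z, ZZ, and Pauli feature maps named above) rather than the Aqua-era code the authors used, and the data arrays are random placeholders.

```python
# Sketch: build Z/ZZ/Pauli feature maps and evaluate quantum kernel matrices.
import numpy as np
from qiskit.circuit.library import ZFeatureMap, ZZFeatureMap, PauliFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel
from sklearn.svm import SVC

X_train = np.random.rand(20, 2)                 # placeholder data with two features
X_test = np.random.rand(5, 2)
y_train = np.random.randint(0, 2, size=20)
y_test = np.random.randint(0, 2, size=5)

z_map = ZFeatureMap(feature_dimension=2, reps=2)                             # no entanglement
zz_map = ZZFeatureMap(feature_dimension=2, reps=2, entanglement="linear")    # linear entanglement
pauli_map = PauliFeatureMap(feature_dimension=2, reps=2, paulis=["Z", "ZZ"]) # combined Z and ZZ terms

kernel = FidelityQuantumKernel(feature_map=zz_map)
K_train = kernel.evaluate(x_vec=X_train)                 # training kernel matrix
K_test = kernel.evaluate(x_vec=X_test, y_vec=X_train)    # testing kernel matrix

# Classical SVM on the precomputed quantum kernel (the "kernel trick" described above).
svc = SVC(kernel="precomputed")
svc.fit(K_train, y_train)
print("Precomputed-kernel test score:", svc.score(K_test, y_test))
```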
The accuracy of the QSVM model in quantum environments is shown below:


Fig. 5 Pauli-Z circuit. The job was executed using 8192 shots. The train set was of size 980, and the test set was size 20. The output was: [−0.9824049 0.36101949 −0.5600189 −0.53567675] [−0.97255923 −0.4893078 −0.55381597 −0.70895069]

Fig. 6 Training and testing kernel matrices


linear kernel classification test score: 0.80
poly kernel classification test score: 0.90
rbf kernel classification test score: 0.90
sigmoid kernel classification test score: 0.80

The clustering and classification algorithms were also tried and implemented. Using scikit-learn, the ad hoc dataset classification was implemented for data analysis (Fig. 7) [5]. The kernel was set up again using the ZZ feature map, and the BasicAer simulator was used on Qiskit to execute the circuit with 1024 shots. The results are shown in Fig. 8. The kernel used for this was from the scikit-learn library; it is specifically the kernel used for the classification and clustering algorithms.

Fig. 7 Ad hoc dataset for clustering described in supervised learning with quantum-enhanced feature spaces [4]

Fig. 8 Ad hoc clustering matrix


The data shown in Fig. 8 was found after running the circuit and executing the job with 1024 shots. The clustering score was 87.82. Overall, the algorithms that were run gave an average score of 85.56%, which is on par with the classical SVM machine learning algorithm. The ad hoc dataset gave better results than the Pauli circuit, and clustering yielded more accurate results while the target column was kept the same. The target column, outcome, was the same for all the algorithms: SVM, QSVM, and the ad hoc classifier.
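The clustering experiment can be approximated with scikit-learn's spectral clustering applied to a precomputed (quantum) kernel matrix, as sketched below. The kernel matrix K and the reference labels are assumed to come from a kernel evaluation like the one shown earlier; this is an illustration under those assumptions, not the authors' exact script.

```python
# Sketch: clustering on a precomputed kernel (similarity) matrix with scikit-learn.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import normalized_mutual_info_score

# K is assumed to be a symmetric kernel matrix, e.g. kernel.evaluate(x_vec=X).
K = np.random.rand(20, 20)
K = (K + K.T) / 2                      # placeholder: symmetrize the random matrix
true_labels = np.random.randint(0, 2, size=20)   # placeholder ground-truth labels

clusterer = SpectralClustering(n_clusters=2, affinity="precomputed")
pred_labels = clusterer.fit_predict(K)

# A label-permutation-invariant score, comparable in spirit to the clustering score above.
print("NMI clustering score:", normalized_mutual_info_score(true_labels, pred_labels))
```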

4 Conclusion We have performed a quantitative analysis of the data; however, the data was found to be uneven. People with various other ailments were outnumbered by people with hypertension and diabetes, and men outnumbered women in the data, which minimized the sample size of women and could be a potential source of error. For more accurate results and to train the machine learning models, a larger dataset with a proportional number of men and women would be more beneficial. As the data grows very large, results cannot be obtained with classical machine learning, since it has limits on computation time and power; quantum computers can overcome this problem. Once the dataset reaches a larger number of entries, such as 10,000, quantum computers will give more accurate results. Researchers searching for a solution to COVID-19 can benefit from the data analysis presented in this paper, because it provides information on the age, gender, and overall survival rate of persons with comorbidities, particularly patients with pulmonary disease and those with hypertension. A wider dataset may be explored thanks to the merits offered by machine learning and quantum machine learning. In this report, people with hypertension had the highest rates of survival, while people with pulmonary illnesses had the lowest. This might be because the virus targets the lungs as its target organ. The application of machine learning and quantum machine learning in the current study produced accurate and respectable findings. When comparing the results of SVM and QSVM, QSVM yielded results on par with SVM. Improvements in quantum hardware and a larger dataset can provide more accurate results, closer to 100%, contributing to computational accuracy and speedup. Acknowledgements Authors acknowledge their host institutes for logistical assistance. Conflict of Interest Authors declare that there is no conflict of interest.


References

1. Dima S, et al (2021) Effect of comorbidity on lung cancer diagnosis timing and mortality: a nationwide population-based cohort study in Taiwan. BioMed Res Int. Hindawi, 4 Nov 2018. Web. 5 Feb
2. Janssen-Heijnen et al (2021) Effect of comorbidity on the treatment and prognosis of elderly patients with non-small cell lung cancer. Thorax. BMJ Publishing Group Ltd., 1 July 2004. Web. 5 Feb 2021
3. Novel coronavirus structure reveals targets for vaccines and treatments. National Institutes of Health, U.S. Department of Health and Human Services, 10 Mar 2020. Web. 5 Feb 2021
4. SARS (2021) Centers for Disease Control and Prevention, 6 Dec 2017. Web. 5 Feb
5. Wang L, Shen H, Enfield K, Rheuban K (2021) COVID-19 infection detection using machine learning. IEEE Int Conf on Big Data (Big Data) 2021:4780–4789
6. Turabieh H, Ben W, Karaa A (2021) Predicting the existence of COVID-19 using machine learning based on laboratory findings. In: 2021 International Conference of Women in Data Science at Taif University (WiDSTaif), pp 1–7
7. Amin J, Sharif M, Gul N, Kadry S, Chakraborty C (2022) Quantum machine learning architecture for COVID-19 classification based on synthetic data generation using conditional adversarial neural network. In: Cognitive Computation. Springer US, 10 Aug, Web. 15 Aug
8. Acar E, Yılmaz I (2021) COVID-19 detection on IBM quantum computer with classical-quantum transfer learning. Turk J Electr Eng Comput Sci 29:46–61. https://doi.org/10.3906/elk-2006-94
9. Gudigar A, et al (2021) Role of artificial intelligence in COVID-19 detection. Sensors 21(23):8045
10. Kaheel H, Hussein A, Chehab A (2022) AI-based image processing for COVID-19 detection in chest CT scan images. Front Web. 15 Aug
11. Tammemag M, Neslund-Dudas C, Simoff M, Kvale P (2022) Impact of comorbidity on lung cancer survival. Publication of the international union against cancer. onlinelibrary.wiley.com
12. Havlíček V, Córcoles AD, Temme K, Harrow AW, Kandala A, Chow JM, Gambetta JM (2019) Supervised learning with quantum-enhanced feature spaces. Nature 567(7747):209–212. https://doi.org/10.1038/s41586-019-0980
13. Rebentrost P, Mohseni M, Lloyd S (2014) Quantum support vector machine for big data classification. Phys Rev Lett 113(13). https://doi.org/10.1103/physrevlett.113.130503
14. Rao S, et al (2021) COVID-19 detection using cough sound analysis and deep learning algorithms, pp 655–665

Multi-GPU-Enabled Quantum Circuit Simulations on HPC-AI System Manish Modani, Anindita Banerjee, and Abhishek Das

Abstract The quantum circuit simulator on an HPC system plays a key role in algorithm implementation, algorithm design, quantum education, and associated research areas. In this work, we have implemented cuQuantum for accelerating quantum computing workflows on a 64-bit floating point and TensorFloat-32-based accelerated system. The commonly used quantum algorithms such as Shor's, the quantum Fourier transformation (QFT), and the Sycamore circuit are implemented on the HPC-AI system. These algorithms are further accelerated using cuQuantum. The observed performance for GPU-enabled circuits increases linearly on 2, 4, and 8 A100 GPUs for the given qubit sizes. GPU-enabled performance obtained on the PARAM Siddhi AI system is compared with that observed from the CPU-only run as well as from the previous generation Volta architecture (V100) GPUs. For the Shor, QFT, and Sycamore circuits, the relative speedup (qubits) between the eight A100 GPU-enabled run and the CPU-only run is observed as ~143x (30), ~115x (32), and ~104x (32), respectively. Similarly, the relative speedup (qubits) between the 4 V100 GPU-enabled run and the CPU-only run is observed as ~43x (30), ~29x (32), and ~24x (32), respectively. In view of the better compute capability and memory, the relative performance between four A100 and four V100 GPUs varies from 1.5x to 2.2x for all three algorithms. Keywords Accelerated computing · cuQuantum · HPC-AI · NVIDIA GPU · A100 · V100 · Quantum simulator · Quantum circuits · QFT · Shor · Sycamore

M. Modani (B) NVIDIA, Pune, India e-mail: [email protected] A. Banerjee · A. Das Centre of Development for Advanced Computing, Pune, India e-mail: [email protected] A. Das e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_58


1 Introduction Present-day cryptographic methods for ensuring the security of data in transit are based on mathematical problems that are hard to solve on a classical system. The existing mathematical primitives which are extensively used are based on integer factorization and the discrete logarithm, both of which are solvable on a quantum computer using Shor's algorithm in polynomial time. Thus, the unfolding quantum era brings several promising quantum algorithms that on one hand challenge prevalent methods of data security and on the other hand complement several verticals like finance, drug discovery, data security (fraud detection), etc. Hence, quantum computing is going to play a pivotal role in our society, which makes it important to explore and design more sophisticated algorithms. In 1982, Feynman [1] introduced the idea of simulating quantum mechanical systems using a quantum computer. This emphasized the urgency and criticality of developing a quantum computer to leverage the potential of quantum computing. It was the Oxford physicist David Deutsch [2] who is credited with developing the idea of the quantum computer and is known as the father of quantum computing. This development eventually led to the birth of quantum algorithms, particularly Shor's algorithm [3], Grover's algorithm [4], etc. Today, quantum computing is marching towards becoming a self-sustaining industry [5]. There is a global race towards building a scalable quantum computer, led by governments and tech giants like Google, Microsoft, IBM, etc. The quantum computing landscape is filled with a variety of hardware implementations based on technologies such as photonics, trapped ions, and superconducting circuits; however, these belong to the class of noisy intermediate-scale quantum (NISQ) computers. Alongside the development of quantum processors (quantum computers), there are quantum simulators [6], which permit the study of a quantum system in a programmable fashion. Quantum simulators are executed on classical computers and are commonly used to (i) validate quantum systems, (ii) design upcoming quantum computers, (iii) experiment with noise and error mitigation techniques, and (iv) develop quantum-accelerated applications. The commonly used quantum circuit simulators are IBM Qiskit [7], Google Cirq [8], Xanadu Lightning [9], etc. In addition, QSim [10], with a cloud access facility, is a first-of-its-kind toolkit indigenously developed as a compact outcome by IISc Bangalore, IIT Roorkee, and C-DAC. cuQuantum [11] is an SDK of optimized libraries and tools that has recently been launched by NVIDIA. The purpose of this SDK is to accelerate quantum computing workflows/simulators. It is a multi-GPU-enabled accelerated platform and is agnostic to the commonly used quantum simulators. In this paper, we implemented cuQuantum on the HPC-AI system called PARAM Siddhi AI. Here, we used cuQuantum, the SDK of optimized libraries and tools for accelerating quantum computing workflows, on an FP64 and TF32-based accelerated system for performance analysis. We observed that the performance increases linearly on 2, 4, and 8 A100 GPUs for higher qubit sizes. The GPU-enabled performance obtained on the PARAM Siddhi AI system is compared with that observed from the CPU-only run as well as from the previous generation Volta


architecture (V100) GPUs. The commonly used quantum algorithms (Shor, QFT, and Sycamore) are implemented. In the next section, the quantum algorithms (Shor, QFT, and Sycamore) and their setup are discussed in detail. The PARAM Siddhi AI system configuration is discussed in Sect. 3. The performance analysis of multi-GPU-enabled runs using the Ampere (A100) and Volta (V100) GPU architectures against the CPU-only performance is discussed in the results and discussion section. The conclusions section summarizes all the findings.

2 Quantum Computing Quantum computing harnesses the principles of quantum information science to solve certain classes of problems that are complex for classical computers. The quantum analogue of a classical bit, the quantum bit or qubit, is a state in a two-dimensional Hilbert space, i.e. it can exist in two mutually orthogonal states. Mathematically, a pure qubit state [12, 13] is represented by a linear combination of two states as given below:

|ψ⟩ = α|0⟩ + β|1⟩    (1)

where α and β are complex numbers with |α|² + |β|² = 1, as |ψ⟩ is a unit vector, and |0⟩ and |1⟩ are the computational basis states. Quantum algorithms have applications in cryptography, search, optimization, simulation of quantum systems, etc. In this study, we narrowed down to three quantum algorithms: Shor's algorithm [14, 15], the quantum Fourier transformation, and the Sycamore circuit. The inspiration for Shor's algorithm comes from number theory, which shows that factoring x can be reduced to finding the period of a function, and the period can be found by the quantum Fourier transformation (QFT) [12]. This is interesting because RSA, the most widely used public key cryptography algorithm, exploits the prime factorization problem to provide security. According to experts, cracking an RSA 2048-bit encryption key with a conventional state-of-the-art computational device would take around 300 trillion years, whilst using Shor's algorithm on a quantum computer with 4099 perfectly stable qubits could break the encryption in 10 s, and with 20 million noisy qubits it would take 8 h [16]. In Fig. 1a, b, we have presented the Shor circuit and the QFT. In this work, we have also implemented random quantum circuits of the kind executed on the Sycamore quantum chip (Fig. 1c). In 2019, Google announced quantum supremacy [17] by making the Sycamore quantum computer perform a series of tasks in 200 s that would take a supercomputer 10,000 years. However, this achievement also leaves some space for further discussion.
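To make the number-theoretic reduction concrete, the snippet below factors a small integer classically by brute-force order finding; on a quantum computer, only the order-finding step is replaced by the QFT-based polynomial-time routine. This is a didactic sketch of the reduction, not an implementation of the quantum algorithm itself.

```python
# Classical illustration of the core of Shor's algorithm: factoring N reduces to finding
# the multiplicative order r of a random base a modulo N (found here by brute force).
from math import gcd
from random import randrange

def factor_via_order(N: int) -> int:
    while True:
        a = randrange(2, N)
        d = gcd(a, N)
        if d > 1:                      # lucky guess: a already shares a factor with N
            return d
        # Brute-force order finding (the step a quantum computer speeds up with the QFT).
        r, x = 1, a % N
        while x != 1:
            x = (x * a) % N
            r += 1
        if r % 2 == 0:
            cand = gcd(pow(a, r // 2, N) - 1, N)
            if 1 < cand < N:
                return cand            # non-trivial factor of N

print(factor_via_order(15))            # prints 3 or 5
```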


Fig. 1 a Shor circuit implementation with 8 qubits in cuQuantum. b QFT circuit implementation with 6 qubits in cuQuantum. c Sycamore circuit implementation with 8 qubits in cuQuantum

3 PARAM Siddhi AI System Quantum circuit simulation behaviour is analyzed on a PARAM Siddhi AI system node, which consists of dual AMD EPYC Rome 7742 CPUs (128 CPU cores in total) and 8 NVIDIA A100 GPUs, resulting in 19.5 teraflops of double-precision peak performance per node. The PARAM Siddhi AI system has an HDR InfiniBand network and 10.5 PB of storage. The system enables researchers to simulate accelerated applications on modern high-performance computing architecture; however, these applications require optimizations to achieve optimal performance. The PARAM Siddhi AI system can deliver 210 AI petaflops and 6.3 petaflops of double-precision peak


performances. The system (NVIDIA DGX A100) consists of the latest dual AMD EPYC Rome 7742 with 128 CPU cores in total. The 64 cores in each EPYC are organized into sixteen core complexes (CCX), in which groups of four cores have private L1 and L2 caches and a shared L3 cache. The L3 cache of the AMD EPYC 7742 is relatively large in comparison with other CPU architectures. NVIDIA A100 GPUs, based on the Ampere architecture, are used to analyze the quantum simulator acceleration. The A100 GPU, with third-generation tensor cores, delivers 300 teraflops of deep learning performance. It has next-generation NVLink connectivity, which delivers up to 600 GB/s of GPU-to-GPU bandwidth. The A100 GPU also has 40 GB of high-bandwidth memory (HBM2), which enables a memory bandwidth of up to 1.6 TB/s, almost 1.7x higher than the previous generation (Volta) GPUs. The sparsity feature provides 2x higher performance for sparse models on A100 GPUs.

4 Results and Discussion In this work, we study the wall time of the commonly used algorithms with variation in the number of qubits and GPUs. The first goal of the experiment is to test the efficiency of the GPU-enabled quantum simulators over the CPU-only run using A100 GPUs. The performance of cuQuantum for the Shor, QFT, and Sycamore algorithms is analyzed after implementation on one node of the PARAM Siddhi AI system. A further goal is to compare the efficiency of the latest generation Ampere (A100) GPUs with the previous generation Volta (V100) GPUs in the simulation of quantum circuits. The code for the Shor, QFT, and Sycamore algorithms is written in Python. For the scalability analysis, the Python code is run on 2, 4, and 8 GPUs. The container version of cuQuantum 22.03 (cuquantum-appliance:22.03-cirq) is used. To get an optimal performance measure, the best out of four consecutive runs is considered. The performance (wall time) is measured for the Shor algorithm with 22 (25), 26 (63), and 30 (65) qubits (factored integer in parentheses), and for the QFT and Sycamore algorithms with 28, 30, and 32 qubits, respectively. The sample commands used to run the jobs on 8 GPUs are: python ./Shor.py --backend=mgpu --n_subsvs 8 --devices [0,1,2,3,4,5,6,7] 65 --n_runs 4; python ./QFT.py --backend=mgpu --n_subsvs 8 --devices [0,1,2,3,4,5,6,7] --n_qubits=32 --n_runs 4; and python ./Supremacy.py --backend=mgpu --n_subsvs 8 --devices [0,1,2,3,4,5,6,7] --n_runs 4, for the Shor, QFT, and Sycamore algorithms, respectively.
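For orientation, the following is a minimal sketch of how a circuit can be simulated through the GPU-enabled qsim backend that the cuQuantum appliance exposes via cirq. The option names follow recent qsimcirq releases and may differ from the exact container build used here, and the simple entangling circuit is a stand-in for the Shor/QFT/Sycamore benchmark scripts quoted above.

```python
# Hedged sketch: simulating a circuit with the GPU-enabled qsim simulator via qsimcirq.
import cirq
import qsimcirq

n_qubits = 20                        # the benchmarks above use 22-32 qubits
qubits = cirq.LineQubit.range(n_qubits)

# A simple entangling circuit as a placeholder for the benchmark circuits.
circuit = cirq.Circuit(
    [cirq.H(qubits[0])]
    + [cirq.CNOT(qubits[i], qubits[i + 1]) for i in range(n_qubits - 1)]
    + [cirq.measure(*qubits, key="m")]
)

options = qsimcirq.QSimOptions(use_gpu=True)     # option name per recent qsimcirq versions
simulator = qsimcirq.QSimSimulator(qsim_options=options)
result = simulator.run(circuit, repetitions=1024)
print(result.histogram(key="m"))
```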

5 Scalability on PARAM Siddhi AI System GPUs Figures 2, 3, and 4 show the performance of the GPU-enabled runs using cuQuantum for the Shor, QFT, and Sycamore algorithms. It is clear that for smaller qubit sizes there is no advantage in using more (8) GPUs. However, for higher qubit sizes (≥30), the advantage of 4 and 8 GPUs over 2 GPUs is almost linear. The relative speedup


(qubits), 8 GPUs over 2 GPUs, for the Shor, QFT, and Sycamore algorithms is 3.95x (30), 3.83x (32), and 3.86x (32), respectively. Similarly, the relative speedup (qubits), 8 GPUs over 4 GPUs, for the Shor, QFT, and Sycamore algorithms is 2.0x (30), 2.0x (32), and 1.95x (32), respectively. The 4 and 8 GPU relative speedups show that the quantum simulator performance scales linearly with the increase in GPUs.

Fig. 2 Execution time (ms) for the Shor algorithm, run for 22, 26, and 30 qubits using 2, 4, and 8 A100 GPUs

Fig. 3 Execution time (ms) for the QFT algorithm, run for 28, 30, and 32 qubits using 2, 4, and 8 A100 GPUs

Fig. 4 Execution time (ms) for the Sycamore algorithm, run for 28, 30, and 32 qubits using 2, 4, and 8 A100 GPUs

6 Comparative/Relative Performance In this section, the performance of the Shor, QFT, and Sycamore algorithms obtained from the CPU-only run is compared with the GPU-enabled runs on the PARAM Siddhi AI system with Ampere (A100) GPUs and with those observed on the earlier generation Volta (V100) GPUs. In CPU-only mode, the algorithms run on the physical (128) cores as well as on the logical (256 and 512) cores. The lowest execution time from the three runs (128, 256, and 512 cores) is considered for the analysis. The execution time increases exponentially with the number of qubits, as expected, since the computational load and memory requirements grow with the qubit count. The execution commands for the CPU-only runs are: python ./Shor.py --backend=cpu --n_threads=128 65 --n_runs 4; python ./QFT.py --backend=cpu --n_threads=128 --n_qubits=32 --n_runs 4; and python ./Supremacy.py --backend=cpu --n_threads=128 --n_qubits=32 --n_runs 4. To obtain the performance of the earlier generation Volta (V100) GPUs, a system with 4x V100-SXM2 GPUs of 32 GB, CUDA driver version 470.141.03, and CUDA version 11.5 is used. A number of runs were made to analyze the comparative performance. For illustration, the performance for the higher qubit counts, i.e. 30 (Shor), 32 (QFT), and 32 (Sycamore), is shown in Figs. 5, 6, and 7, respectively. Significant speedup is observed for the GPU-enabled Shor runs on the PARAM Siddhi AI system with A100 GPUs as well as on V100 GPUs, in comparison with the CPU-only run (Fig. 5). The observed speedup is 22x and 43x for the 2 and 4 V100 GPU runs, whereas it is 36x, 72x, and 142x for the 2, 4, and 8 A100 GPU runs, in comparison with the CPU-only run. Performance obtained using 2 and 4 A100 GPUs is approximately 1.6x better than that observed from the 2 and 4 V100 GPU runs. This is expected, as the A100 GPU has better compute and storage capabilities. Similarly, for the QFT algorithm, the observed speedup is 15x and 29x for the 2 and 4 V100 GPU runs, whereas 30x, 62x, and 115x speedup is observed for the 2, 4, and 8 A100 GPU-enabled runs in comparison with the CPU-only run (Fig. 6). Performance obtained using 2 and 4 A100 GPUs is approximately 2x better than that observed from the 2 and 4 V100 GPU runs. In the case of the Sycamore algorithm, the observed speedup is 14x and 24x for the 2 and 4 V100 GPU runs, whereas it is 27x, 53x, and 104x for the 2, 4, and 8 A100 GPU runs in comparison with the CPU-only run (Fig. 7). Performance obtained using 2 and 4 A100 GPUs is approximately 2x and 2.2x better than that observed from the 2 and 4 V100 GPU runs, respectively. We note that the performance obtained for the CPU-only and GPU-enabled runs is specific to the software version and system configuration and may vary with a different implementation. We have observed approximately 20% better performance with the higher memory (80 GB) A100 GPUs.

Fig. 5 Relative performance analysis for the Shor algorithm's 30-qubit run

Fig. 6 Relative performance analysis for the QFT algorithm's 32-qubit run

Fig. 7 Relative performance analysis for the Sycamore algorithm's 32-qubit run


7 Conclusions Quantum circuit simulators are commonly used to validate current quantum hardware, design next-generation quantum computers, and experiment with noise models and error mitigation techniques. NVIDIA cuQuantum, an SDK of optimized libraries and tools, accelerates quantum circuit simulators on NVIDIA GPUs. In the present work, the commonly used simulator algorithms (qubits) Shor (30), QFT (32), and the Sycamore (32) circuit are implemented on the PARAM Siddhi AI system using cuQuantum. The observed performance increases linearly on 2, 4, and 8 A100 GPUs for higher qubit sizes. The GPU-enabled performance obtained on the PARAM Siddhi AI system is compared with that observed from the CPU-only run as well as from the previous generation Volta architecture (V100) GPUs. For the Shor, QFT, and Sycamore circuits, the relative speedup between the CPU-only and 8 A100 GPU-enabled runs is observed as ~143x, ~115x, and ~104x for 30, 32, and 32 qubits, respectively. Similarly, the relative speedup between the CPU-only and 4 V100 GPU-enabled runs is observed as ~43x, ~29x, and ~24x for 30, 32, and 32 qubits, respectively. In view of the better compute capability and memory, the relative performance between 4 A100 and 4 V100 GPUs varies from 1.5x to 2.2x for all three algorithms. Acknowledgements The support and the resources provided by the Centre for Development of Advanced Computing (C-DAC) and the National Supercomputing Mission (NSM), Government of India are gratefully acknowledged. We acknowledge the use of computing resources at PARAM Siddhi AI (C-DAC, Pune).

References

1. Feynman R (1982) Simulating physics with computers. Int J Theo Phys 21:467–488
2. Deutsch D (1985) Quantum theory, the Church–Turing principle and the universal quantum computer. In: Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences, 400, no 1818, pp 97–117
3. Shor PW (1999) Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev 41(2):303–332
4. Grover LK (1997) Quantum mechanics helps in searching for a needle in a haystack. Phys Rev Lett 79(2):325
5. McKinsey (2021) Quantum computing use cases are getting real—what you need to know, December 14, 2021. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/quantum-computing-use-cases-are-getting-real-what-you-need-to-know
6. https://thequantuminsider.com/2022/06/14/top-63-quantum-computer-simulators-for-2022/
7. https://qiskit.org/
8. https://quantumai.google/cirq
9. https://www.xanadu.ai/products/lightning/
10. https://qctoolkit.in/
11. https://developer.nvidia.com/cuquantum-sdk
12. Pathak A (2013) Elements of quantum computation and quantum communication. CRC Press, Boca Raton, USA. ISBN-10:1466517913; ISBN-13:978-1466517912


13. Pathak A, Banerjee A (2016) Optical quantum information and quantum communication. SPIE Spotlight Series. SPIE Press. ISBN: 9781510602212
14. Tankasala, Hesameddin I, Quantum-kit: simulating Shor's factorization of 24-bit number on desktop. arXiv:1908.07187
15. Demirci SM, Yılmaz A (2018) Accelerating Shor's factorization algorithm on GPUs. Canadian J Phys 96:759–761
16. Gidney C, Ekerå M (2021) How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits, 5:433. arXiv:1905.09749v3. https://doi.org/10.22331/q-2021-04-15-433
17. Arute F, Arya K, Babbush R et al (2019) Quantum supremacy using a programmable superconducting processor. Nature 574:505
18. https://developer.nvidia.com/nsight-systems

Study of NFT Marketplace in the Metaverse Isha Deshpande, Rutuja Sangitrao, and Leena Panchal

Abstract The metaverse has opened up the next door in digital evolution. It has the ability to expand the domain of online services by providing a life-like online experience. The digital experience is powered by blockchain technology, tokenomics, and decentralization. The non-fungible token (NFT) is a unique token on the blockchain that is traded, bought, and sold on various NFT marketplaces. NFT marketplaces can also be curated in the metaverse, providing a more interactive experience. Key concepts and principles regarding metaverse, blockchain, decentralized applications, and user experience have been studied, and a recipe to understand and create a metaverse NFT marketplace has been presented in this paper. Keywords Blockchain · Metaverse · Digital twins · Non-fungible tokens (NFT) · NFT marketplace · Decentralized application (DApps) · Metaverse index · Tokenization

1 Introduction A metaverse is a virtual ecosystem focused on social interaction and virtual trading. The metaverse is opening the next door of digital exploration, and it has the potential to transform the nature of digital adoption by expanding the domain of services beyond traditional systems that are currently in play with online access. The metaverse has equipped organizations, creators artists, game developers, and designers to connect, share, and trade their work in an interactive life-like digital environment. Blockchain, tokenomics, decentralization, fifth-generation mobile network (5G), I. Deshpande (B) · R. Sangitrao · L. Panchal Department of Information Technology, Cummins College of Engineering for Women, Pune 411052, Maharashtra, India e-mail: [email protected] R. Sangitrao e-mail: [email protected] L. Panchal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_59




cheaper cost of computation, adaptable and user-friendly game engines, augmented reality, virtual reality, and development of Web 3.0 are some of the factors that have fueled the rapid developments in the metaverse [1]. The non-fungible token (NFT) is a special token that is minted on the blockchain. NFTs can be digital artwork, photos, videos, audio, game assets, and virtual land. NFT provides a safe means to produce, store, purchase, sell, collect, and exchange digital assets, as well as prove intellectual ownership, protecting copyrights. NFTs can be bought, sold, or can be bid over in the decentralized NFT marketplace. The metaverse has developed at an incredible rate due to which the minting and trading of NFTs have skyrocketed. The accelerating growth in creating minted and trading of NFTs has opened a new door or opportunity for trading digital assets. This has shed fresh light on the importance and significance of an NFT marketplace [2]. Witnessing the digital reforms, technologies like augmented reality (AR), virtual reality (VR), and mixed reality (MR) and at the forefront of providing a consumer-centric and interactive experience. As the general populous adapted and explored the metaverse, traditional NFT marketplaces were outrun by metaverse NFT marketplaces [3].

2 Prelude to Metaverse Neal Stephenson coined the phrase "metaverse" in 1992 in his science fiction book Snow Crash [4]. From a broader perspective, the metaverse can be envisioned as a decentralized online multiplayer game: a game in which you have your own digital identity, similar to that in the real world. Unlike the real world, in the metaverse your trades and transactions do not take place using cash, centralized banks, or legal papers indicating transfers of money, property, artwork, etc. In the metaverse, this is accomplished using the blockchain and NFTs. As we adapt to the realm of 5G and Web 3.0, the traditional online experience of the consumer is evolving via the metaverse [5]. E-commerce platforms, online communities, transaction systems, and digital asset holdings are exploring the metaverse, blockchain and digital twin technologies, and consumer-centric user experience (UX) methodologies [6]. The metaverse is a concept spanning various virtual platforms, infrastructures, and ecosystems being built. As there is no single creator or owner of the metaverse, the various decentralized apps (DApps) [7] that act as metaverses need to have an element of interoperability. Interoperability is needed not just in terms of the blockchain a metaverse is using, but also in the adaptability of the fidelity of graphics, the importation of virtual identities, and hardware compatibility between the various metaverses [8].

2.1 Seven Layers of Metaverse See Fig. 1.



Fig. 1 Seven layer of metaverse [3]

1. Experience: The user's interactions: games, social experiences, live music, and shopping. The experience layer is the medium through which users interact with this new technology; thus, it needs to be designed in a customer-centric and user-friendly manner. The design interactions, usability, and adaptability of a similar experience on various devices should be executed [9].
2. Discovery: The user learns about an experience through discovery. Knowledge of new initiatives and tutorials about the functioning of the metaverse should be created and communicated to the users [10].
3. The creator economy: The metaverse can support the creator economy by serving as a new ecosystem for content and experience creation. The metaverse enables creators by providing tools such as design tools, animation systems, graphics tools, monetization technologies, targeted advertising, and communication using NFTs.
4. Spatial computing: The transition from a 2D screen to a 3D ecosystem using technologies like XR, game engines, gesture recognition, spatial mapping, and AI [11].
5. Decentralization: Transitioning from a centralized and dependent ecosystem to a pseudonymous, distributed, and liberalized structure [12].
6. Human interface: The interaction between the user and the metaverse is facilitated by hardware such as mobile devices, VR headsets, haptics, and smart glasses [13].
7. Infrastructure: Graphics processing units, semiconductors, materials science, cloud computing, and telecommunications networks like 5G enable the higher layers to be built [14].



2.2 Understanding Metaverse Index The metaverse index is a collection of 15 tokens generated to capitalize on the trend of entertainment, enjoyment, fashion, creativity, sports, and business moving to the virtual realm. The index is represented by an ERC20 token that can be purchased from tokensets.com. Virtual worlds and digital experiences are not brand new concepts [15]. Due to the terrific growth in digitalization, more people are turning to the virtual world with business ideas, creative entertainment and social engagement models, and enhanced social interaction. At a high level, the metaverse index is designed to turn this trend into a single investable token. A crypto index in the metaverse will allow business owners and creators to get wide exposure to the crypto market in the metaverse, and it also provides a division specific to crypto and NFT tokens, which makes trading easy for participants in the metaverse.

3 Stepping into the NFT Marketplace 3.1 Understanding NFT Non-fungible tokens (NFTs) are one-of-a-kind digital artifacts stored on the blockchain. The digital assets can be art, audio tracks, video game assets, and digital real estate. The immutability of NFTs helps express the ownership and authenticity of assets stored on a blockchain [16]. This ensures that the digital artifact for a given asset cannot be cloned. In the creator economy of the metaverse, NFTs are vital as they can be used as artifacts to create exclusive communities, as tokens of appreciation for the most loyal audience members, or as assets with which you can enter a creator's or organization's metaverse [17].

3.2 NFT as a Digital Asset As the world becomes more digital, physical qualities such as scarcity, uniqueness, and proof of ownership are becoming more important [18]. As the digital world began to coalesce around blockchain technology, traits such as scarcity, individuality, and ownership became relevant in the virtual ecosystem. Malpractices like copying of work, theft of ownership, and destruction of intellectual property and records are real-world problems that began to transcend into the virtual ecosystem. The concept of NFTs was introduced to find solutions to these. NFTs are non-fungible (indivisible), which means that the asset or intellectual property can have only one owner at a time. They are indivisible and traceable as they are minted using a blockchain. NFTs are tokens that make digital assets a part of the blockchain, giving them a unique hash ID that differentiates them and allows the work and its owners to be tracked. When a piece of digital data or an asset is minted as an NFT, the ownership of that data or asset lies with the person who minted it. NFTs can be bought and sold on different marketplaces such as OpenSea [19].



owners. When a piece of digital data or asset is minted as an NFT, the ownership of that data or asset is with the person who minted it. NFTs can be bought and sold on different marketplaces such as OpenSea [19].

3.3 Concept of NFT Marketplace Metaverse is a virtual world developed using blockchain. NFTs are tokens minted using the blockchain [20]. They are unique tokens minted to secure, track, and differentiate data and assets. NFTs are traded on an NFT marketplace, which was traditionally hosted using Websites. An NFT metaverse marketplace is a space in the metaverse that can function as an avenue where NFTs and be displayed and traded. Virtual real estate, digital fashion, certificates, code snippets, audio files, and art are a few examples of NFTs that can be displayed and traded in the metaverse NFT marketplace. Metaverse NFT marketplaces can provide a elaborate [21].

4 Recipe for NFT Market Place

Step 1: Scope of the marketplace The features that are added to the metaverse should be customized depending on your use case, scope, and vision for the virtual space. The basic features included in an NFT marketplace should be similar to those of a traditional e-commerce platform, so that early adopters will be able to find those similarities and flawlessly navigate in the metaverse [22].

Step 2: Customer journey The NFT metaverse marketplace is a new step in the e-commerce ecosystem. As the traditional user is accustomed to the sign-up, add-to-cart, and view-product functionalities, the metaverse needs to offer the same user-friendly and navigable functionality. Uniformity, adaptability, and discoverability are the key principles that should be maintained while building more custom experiences; the metaverse interface needs to be simple, responsive, and easily navigable [23].

Step 3: Blockchain Depending on the scope, requirements, and finances of the project, a blockchain needs to be selected; the blockchain network picked must satisfy the requirements of your project. Currently, Ethereum is a popular blockchain used to host a metaverse. As blockchain technology continues to develop, metaverses are also built on private blockchains, side chains, and third-generation blockchains.



There are also specific protocols that can support the development of a cross-chain NFT marketplace [23].

Step 4: Smart contract As the metaverse uses blockchain for transactions between two users, a smart contract is developed: a self-executing contract that specifies the conditions of the transaction between the two trading parties [24]. Once the smart contract is created and executed and the transaction is completed, it can be tracked, as it will be on a public ledger. Although the transactions are traceable, they are irreversible. The smart contract should be dependable and aligned with the features that the marketplace intends to implement [25] (a hedged interaction sketch in Python is given after this step list).

Step 5: Database configuration The metaverse marketplace that is built will require decentralized storage of user data. As NFTs will be traded on the marketplace, the security and efficiency of the platform need to be maintained. The security, accuracy, and efficiency will be shouldered by the InterPlanetary File System (IPFS) and other secure databases. Sets of unique metadata and user information are configured to store information about the NFT transaction history [26].

Step 6: Integration The metaverse can be thought of as an open-environment multiplayer game, although the integration of blockchain is the key element that secures it and differentiates and elevates it from a game. Various application programming interfaces (APIs) are used to connect the blockchain, the crypto wallets, and the visual elements. The APIs and protocols will differ depending on the blockchain and the development environment [27].

Step 7: Optimization, testing, and deployment The NFT metaverse marketplace should be rigorously tested, as the transactions that will take place use the blockchain and are irreversible by nature. The smart contracts should be checked and revised before implementation. The code developed to build the environment should be neat and easy to modify. The database configuration of the marketplace and the APIs in use should go through integration testing before the marketplace is deployed and opened for transactions [28].
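As the interaction sketch referenced in Step 4, the snippet below reads ownership information from an already-deployed ERC-721 (NFT) smart contract using the web3.py library. The provider URL, contract address, and token ID are hypothetical placeholders, the ABI is reduced to two standard ERC-721 read functions, and method names can vary slightly between web3.py versions.

```python
# Hedged sketch: querying an ERC-721 NFT contract from Python with web3.py.
from web3 import Web3

# Hypothetical JSON-RPC endpoint of an Ethereum node (e.g. a hosted provider).
w3 = Web3(Web3.HTTPProvider("https://example-node.invalid"))

# Minimal ABI fragment covering two standard ERC-721 read functions.
ERC721_ABI = [
    {"name": "ownerOf", "inputs": [{"name": "tokenId", "type": "uint256"}],
     "outputs": [{"name": "", "type": "address"}],
     "stateMutability": "view", "type": "function"},
    {"name": "tokenURI", "inputs": [{"name": "tokenId", "type": "uint256"}],
     "outputs": [{"name": "", "type": "string"}],
     "stateMutability": "view", "type": "function"},
]

NFT_CONTRACT = "0x0000000000000000000000000000000000000000"   # placeholder contract address
contract = w3.eth.contract(address=NFT_CONTRACT, abi=ERC721_ABI)

token_id = 1                                                   # hypothetical token
print("Owner:", contract.functions.ownerOf(token_id).call())   # current on-chain owner
print("Metadata URI:", contract.functions.tokenURI(token_id).call())
```

Because every state-changing call to such a contract is recorded on the public ledger and cannot be rolled back, read-only checks like these are typically combined with thorough testing of the contract logic before the marketplace goes live.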

5 Conclusion
As the Internet transitioned from Web 1.0 to Web 2.0 and now to Web 3.0, the concept of a marketplace went through various iterations. Now, with the metaverse taking shape, the experience economy has begun to develop.


The metaverse NFT marketplace has the ability to provide the user with a traditional shopping experience in a virtual environment. This paper highlights key concepts such as an adaptable, customer-centric user interface, decentralized computing, and the application of blockchain in the creator economy. Creating a metaverse NFT marketplace and virtual space is a trend that is swaying the creator economy in a new direction. A few concepts and steps need to be planned and executed while creating a metaverse NFT marketplace. After researching various metaverse NFT marketplaces, we have curated seven steps of execution required to set up a customized NFT metaverse marketplace: defining the scope of the marketplace, designing the customer journey, selecting the blockchain, deploying smart contracts, configuring the database, integration, and testing.

References
1. Wang Y, et al (2022) A survey on metaverse: fundamentals, security, and privacy. IEEE Comm Surv Tut, pp 1. https://doi.org/10.1109/COMST.2022.3202047
2. Lee L-H, et al (2021) All one needs to know about metaverse: a complete survey on technological singularity, virtual ecosystem, and research agenda, Oct [Online]. Available: http://arxiv.org/abs/2110.05352
3. Metaverse report: Future is here. Global XR industry insight (2022)
4. IEEE Xplore full-text PDF. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8466786 (accessed Sep 15, 2022)
5. Casino F, Dasaklis TK, Patsakis C (2019) A systematic literature review of blockchain-based applications: current status, classification and open issues. Telematics Inform 36:55–81. https://doi.org/10.1016/J.TELE.2018.11.006
6. Jiang XJ, Liu XF (2021) CryptoKitties transaction network analysis: the rise and fall of the first blockchain game mania. Front Phys 9. https://doi.org/10.3389/FPHY.2021.631665
7. Somin S, Gordon G, Altshuler Y (2022) Social signals in the ethereum trading network, May 2018. Accessed: Sep 15, 2022. [Online]. Available: http://arxiv.org/abs/1805.12097
8. Gabrielli S, Rizzi S, Mayora O, More S, Baun JCP, Vandevelde W (2022) Multidimensional study on users' evaluation of the KRAKEN personal data sharing platform. Appl Sci 12(7):3270. https://doi.org/10.3390/APP12073270
9. Santosha, Mallick K (2020) Causal relationship between crypto currencies: an analytical study between bitcoin and binance coin. J Contem Issues Bus Govern 26(2). https://doi.org/10.47750/cibg.2020.26.02.265
10. NFTs and the metaverse revolution: research perspectives and open challenges. SpringerLink. https://doi.org/10.1007/978-3-030-95108-5_6 (accessed Sep 15, 2022)
11. Wang Q, Li R, Wang Q, Chen S (2022) Non-fungible token (NFT): overview, evaluation, opportunities and challenges, May. Accessed: Sep 15, 2022. [Online]. Available: http://arxiv.org/abs/2105.07447
12. Christodoulou K, Katelaris L, Themistocleous M, Christodoulou P, Iosif E (2022) NFTs and the metaverse revolution: research perspectives and open challenges, pp 139–178. https://doi.org/10.1007/978-3-030-95108-5_6
13. Aghaei S (2012) Evolution of the world wide web: from web 1.0 to web 4.0. Int J Web Sem Technol 3(1):1–10. https://doi.org/10.5121/IJWEST.2012.3101
14. Xu M, Chen X, Kou G (2019) A systematic review of blockchain. Fin Inno 5(1):1–14. https://doi.org/10.1186/S40854-019-0147-Z/FIGURES/2
15. Bloomberg Global Equity Indices: Bloomberg Metaverse Index methodology. Bloomberg Global Equity Indices 2


16. de Xavier FA, Lolayekar AP, Mukhopadhyay P (2021) Decentralization and its impact on growth in India. J South Asian Dev 16(1):130–151. https://doi.org/10.1177/09731741211013210/ASSET/IMAGES/LARGE/10.1177_09731741211013210-FIG1.JPEG
17. Dey S (2018) A proof of work: securing majority-attack in blockchain using machine learning and algorithmic game theory. Int J Wireless Micro Technol 8(5):1–9. https://doi.org/10.5815/IJWMT.2018.05.01
18. Haque S, Eberhart Z, Bansal A, McMillan C (2022) Semantic similarity metrics for evaluating source code summarization. In: IEEE International Conference on Program Comprehension, vol 2022, March, pp 36–47. https://doi.org/10.1145/nnnnnnn.nnnnnnn
19. Zhao Y et al (2022) Metaverse: perspectives from graphics, interactions and visualization. Vis Inform 6(1):56–67. https://doi.org/10.1016/J.VISINF.2022.03.002
20. Chen X, Wang M, Wu Q (2017) Research and development of virtual reality game based on unreal engine 4. In: 2017 4th International Conference on Systems and Informatics (ICSAI), vol 2018, January, pp 1388–1393, Jun. https://doi.org/10.1109/ICSAI.2017.8248503
21. Momtaz PP (2022) Some very simple economics of web 3 and the metaverse. FinTech 1(3):225–234. https://doi.org/10.3390/fintech1030018
22. Patil M (2020) Land registry on blockchain. Master's Projects, p 912, May. https://doi.org/10.31979/etd.2cc7-a5nd
23. Gadekallu TR, et al. Blockchain for the metaverse: a review, Mar. Available: http://arxiv.org/abs/2203.09738
24. Chohan UW (2021) Non-fungible tokens: blockchains, scarcity, and value
25. Sunyaev A et al (2021) Token economy. Bus Inf Syst Eng 63(4):457–478. https://doi.org/10.1007/S12599-021-00684-1
26. Lacity MC, Treiblmaier H (eds) (2022) Blockchains and the token economy. https://doi.org/10.1007/978-3-030-95108-5
27. Panda SK, Satapathy SC (2021) An investigation into smart contract deployment on ethereum platform using web3.js and solidity using blockchain, pp 549–561. https://doi.org/10.1007/978-981-16-0171-2_52
28. Lal C, Marijan D (2021) Blockchain testing: challenges, techniques, and research directions, Mar [Online]. Available: http://arxiv.org/abs/2103.10074

Secure Authentication of IoT Devices Using Upgradable Smart Contract and Fog-Blockchain Technology Shital H. Dinde and S. K. Shirgave

Abstract There are many new challenges related to the security of the Internet of Things networks due to the rapid growth of the technology. One of the main challenges is device security and authentication. Due to their limited processing and storage capabilities, the Internet of Things devices are unable to protect themselves from different attacks. Passwords or predefined keys have drawbacks that limit their uses. With high trust, integrity, and transparency, distributed ledger technology has the ability to address these issues in IoT. The solution discussed in this paper is to use fog computing and blockchain technology to authenticate IoT devices using smart tokens generated by an upgradable smart contract. Keywords IoT · Authentication · Smart contract · Blockchain · Distributed shared ledger · Fog computing

1 Introduction
Technology has changed the standard of living in the community. Communication and semiconductor technologies allow devices to be associated with a network and change the way machines and people are connected. This is referred to as the Internet of Things [1]. Low-power and lossy networks (LLNs) are the standard for the Internet of Things, since it relies on smart devices and fast networks; these LLNs can operate with restricted resources [2, 3]. The devices in the Internet of Things may be controlled from a distance, and the connected devices range from wearables to large machines fitted with sensor chips [4].
S. H. Dinde (B) Department of Technology, Shivaji University, Kolhapur, Maharashtra, India e-mail: [email protected]
S. K. Shirgave Department of Computer Science and Engineering, DKTE's Textile and Engineering Institute, Ichalkaranji, Maharashtra, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_60


The computing infrastructure becomes more complex as the number of connections between devices increases. This complexity exposes the network to vulnerabilities and different cyber-attacks. In the Internet of Things, physical devices are often deployed in ways that leave them exposed to hackers, who then get the chance to modify the data transmitted over the network. Device authorization and data provenance are therefore essential issues. In the last couple of years, many features of blockchain have been used to address various issues in Internet of Things networks. In a distributed, decentralized ledger, every transaction is recorded once it satisfies the proof-of-work consensus, which overcomes the single point of failure. The ledger's transactions are immutable and can be verified by any participant in the Internet of Things network, which helps build public trust in the network. Public trust plays a fundamental role in public financial exchanges and opens a different world of the distributed economy in the Internet of Things [4–7]. The main objective of this paper is to propose an upgradable smart contract, so that missing features can be added and bugs can be patched, together with an authentication mechanism that uses fog-based blockchain technology. As it tackles the limits of computation and storage resources, this is beneficial for the Internet of Things [8]. The smart contract feature allows the execution of custom logic inside the blockchain. Fog computing helps to detect and block cyber-attacks. Fog nodes are close to the Internet of Things devices, which allows the devices to offload large processing tasks to them [9]. This reduces the computational burden on the devices. After an initial setup by the user, the proposed framework allows the devices to work independently and safely. It is more secure because all of the fog nodes act as blockchain nodes.

2 Blockchain and Fog Computing
Satoshi Nakamoto [10] promoted the creation of a decentralized digital currency. Initially, blockchain technology was created to prevent double-spending, but nowadays the blockchain has many other uses. The blockchain is a digital ledger that holds an ordered list of blocks. Each block contains multiple transactions and is linked to the previous block. The transaction history cannot be changed without completely changing the contents of the block [11]. This makes it harder for hackers to tamper with the ledger. As described in Fig. 1, each block contains a block header and a block body. The block header contains the block version, Merkle tree root hash, timestamp, difficulty, nonce, and the hash of the previous block. Public-key cryptography and distributed consensus algorithms are implemented for user security. Blockchain has the following main features.


Fig. 1 Blockchain network and block structure

1. Immutability: Immutability means that data cannot be adjusted or modified. This is one of the key blockchain properties that helps guarantee the technology remains a permanent, unchangeable network.
2. Decentralization: A decentralized network has no single controlling authority or individual overseeing the architecture; instead, a group of nodes maintains the network. Consensus algorithms are used to keep data consistent across the blockchain network.
3. Enhanced security: As it eliminates the need for a central authority, nobody can simply change any characteristic of the network for their own advantage. Using encryption adds another layer of safety to the framework; every piece of data on the blockchain is cryptographically hashed.
4. Anonymity: The identity of the user is not revealed on the network, as users are identified by a unique address.
5. Auditability: Every transaction in the network can be easily tracked, as every transaction is associated with the previous one.
Additional processing tasks can be performed at the edge nodes with fog computing. Resource-constrained devices cannot handle large processing tasks, so they can offload them to fog nodes. With the help of fog computing, the Internet of Things can give fast responses. Fog computing avoids cloud computing problems such as delay, privacy concerns, and network congestion that arise when all information processing takes place in the cloud [12]. Cloud computing is not a substitute for fog networking: the cloud performs longer-term analytics, while fog computing provides short-term analytics at the edge. Even though edge devices and sensors are where the information is gathered, they do not have the compute and storage resources to perform advanced tasks.


Fig. 2 Fog computing

Cloud servers have the ability to do this, but they are too far away to process the information and respond as quickly as needed. There are several advantages to building fog blockchain-based IoT applications:
1. Data shared by different IoT applications can be easily traced and tracked through the blockchain, which improves trustworthiness.
2. As blockchain records are immutable, no one can overwrite a record with erroneous data.
3. Blockchain adds an additional layer of security, anchored by some of the strongest encryption rules available, that any malicious actor would have to bypass.
4. Fog computing relieves the computational and storage burden of IoT devices (Fig. 2).

3 Literature Review
The Internet of Things is one of the newest developments; there are expected to be 27 billion connected devices on the planet by the year 2025. Increasing security concerns such as vulnerabilities and cyber-attacks can cause many clients to avoid using the Internet of Things. The security of the Internet of Things is an enormous issue for organizations that work in medical care, finance, logistics, retail, and other businesses. Many different studies have been done in the area of identity and authentication for the Internet of Things. To verify identity, a group of identification features is checked against the information in a dataset or on an authentication server [13].


In [14], an identity management framework for the Internet of Things is proposed by Chen and Liu that addresses three basic issues of identity management: a standard identity data model, a client-driven design, and a multi-channel authentication model. This allows the client to gain access to various Internet of Things services without maintaining multiple identities. An identity approach for the Internet of Things that can benefit from SDN and fog processing is proposed by the authors of [15]. They assume that every device in the Internet of Things has an IPv6 address. Scalability and a single point of failure are issues with a centralized security approach; decentralized identity storage is used to avoid these limitations. In mutual authentication, the two components validate each other: a client verifies itself to a server and vice versa [16, 17]. Certificate-based and password-based schemes are the most popular types of mutual authentication [17]. There is a risk of online fraud in applications such as web-based businesses claiming to perform activities with legitimate entities for genuine purposes. To perform IoT authentication, Li, Peng, and others [18] proposed a lightweight framework that uses asymmetric key cryptography and blockchain, designed so that the framework does not go down even if a few nodes are under attack. The hashed information, device ID, and public key are stored in the blockchain during registration. Every node creates its private and public keys using a cryptographically secure pseudo-random number generator (CSPRNG), and these public keys are stored in the blockchain. The Internet of Things could also be secured with threshold cryptography-based group authentication [19], which includes five primary capabilities, among them key distribution, key update, group credential generation, and message decryption. A centralized identity management scheme for Internet of Things devices is proposed in [20]. Mahalle et al. [19] proposed a group authentication mechanism in which the produced keys are divided between many nodes; this aggregate, key-sharing approach is seriously jeopardized if one of the nodes is compromised. The work in [21] proposed a certificate-based authentication protocol for distributed Internet of Things frameworks; the protocol is exposed to cloning attacks because the credentials are placed at the edge nodes. While these procedures provide authentication for IoT devices, limitations emerge from their centralized design. With a centralized design, the framework becomes restricted in terms of scalability and dependability, and also security, as it becomes a single point of attack. Also, these strategies do not use fog node computing power and do not consider the restricted power and computational capacity that IoT devices have.

4 Proposed Fog Blockchain-Based Architecture The architecture and design details of our proposed fog blockchain-based authentication system are described in this section.


4.1 Overall System Architecture
The framework design is shown in Fig. 3. The main participants in the design have access to the smart contract: the admin, the end users, and the fog nodes. The smart contract does not interact directly with the IoT devices, which hold their own unique public and private keys; interaction happens through a front-end application for the admins and end users.
• Admin: The admin provides the user and device access control policies through smart contracts. The smart contract is owned by the admin, who can add different users and devices.
• End users: They ask the smart contract for access authorization to get permission to access the fog node of a device.
• Fog nodes: These are nearer to the Internet of Things devices and perform local storage and computation to reduce the response time from the cloud to the Internet of Things. The fog nodes are part of the blockchain network. Each fog node manages multiple IoT devices and is able to store, deploy, and run an Ethereum client, so the IoT devices do not bear the pressure of computation, storage, and interaction with Ethereum.
• IoT devices: Each IoT device is assumed to be resource constrained, so each device is mapped to one fog node.

Fig. 3 System architecture


4.2 Proposed Authentication Scheme
The admin builds a smart contract to register the devices and assign them to a fog node. Each one has a unique address. End users can be granted access to specific devices using UserDeviceMapping. The Authentication function is used when an end user needs to get access to a specific device. The proposed algorithms involved in the IoT device authentication process are given below (Algorithms 1, 2, 3, 4, 5, and 6). The smart contract checks the saved list of authorized devices. If the request is authorized by the smart contract, the token TokenCreated = (UID, T, , EAUser, EADevice, EAFog) is broadcast to all users. The unique identification UID is produced from the parameters of the token using keccak256 hashing. Once the user receives the acknowledgment token, he can use it to send a signed message SignedUser(TokenUser) to the fog node. This UserToken contains the smart contract token data, the unique device ID, a timestamp, the Ethereum addresses of the user and device, and the user's public key.
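A minimal Solidity sketch of the acknowledgment token and the keccak256-derived UID described above; since the paper's algorithms are given only as figures, the struct, library, and field names here are assumptions.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.17;

// Hypothetical representation of the token broadcast by the smart contract.
library TokenLib {
    struct AccessToken {
        bytes32 uid;       // unique identification (UID)
        uint256 timestamp; // T
        address user;      // EAUser
        address device;    // EADevice
        address fogNode;   // EAFog
    }

    // The UID is produced from the token parameters using keccak256 hashing.
    function deriveUid(uint256 timestamp, address user, address device, address fogNode)
        internal
        pure
        returns (bytes32)
    {
        return keccak256(abi.encodePacked(timestamp, user, device, fogNode));
    }
}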

Algorithm 1 User registration

Algorithm 2 Device registration


Algorithm 3 Mapping of IoT device and fog node

Algorithm 4 Mapping of IoT device and user

Algorithm 5 Authentication of user


Algorithm 6 Authentication process at IoT device end

5 Implementation
The implementation aspects of the smart contract are given in this section. The Solidity programming language is used to create the smart contract [22]. The smart contract contains variables, structs, modifiers, events, and different functions. The implementation is tested on Ganache, an Ethereum development framework. Figure 4 shows the variables and structs used in the smart contract. The smart contract is owned by the admin. The data connected with a token is saved in the token struct. Two mappings are present: a user can access a group of devices if those devices are linked to his address, and the devices linked to a fog node are represented by fog devices. Adding and deleting are restricted to the administrator only. A DeviceCreated event is triggered when a device is added using the createDevice function. The mapping list of each authorized device for an authorized user is created using the userDeviceMapping function; this function is given in Algorithm 4. The FogDeviceMapping function adds the mapping of a device to a fog node and triggers a FogDeviceMappingAdded event; it works as described in Algorithm 3 (Fig. 5).
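Since Figs. 4 and 5 show the actual declarations, the following is only a hedged sketch of what the structs, mappings, events, and registration functions described above might look like; createDevice, userDeviceMapping, fogDeviceMapping, DeviceCreated, and FogDeviceMappingAdded are named in the text, while the remaining identifiers are assumptions.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.17;

// Hypothetical skeleton of the registration part of the smart contract.
contract DeviceRegistry {
    address public admin;

    struct Device {
        string name;
        bool exists;
    }

    // device address => device record
    mapping(address => Device) public devices;
    // user address => device address => authorized?
    mapping(address => mapping(address => bool)) public userDevices;
    // fog node address => device address => managed?
    mapping(address => mapping(address => bool)) public fogDevices;

    event DeviceCreated(address indexed device, string name);
    event UserDeviceMappingAdded(address indexed user, address indexed device);
    event FogDeviceMappingAdded(address indexed fogNode, address indexed device);

    modifier onlyAdmin() {
        require(msg.sender == admin, "only admin");
        _;
    }

    constructor() {
        admin = msg.sender; // the contract is owned by the admin
    }

    // Algorithm 2: device registration (adding is restricted to the admin).
    function createDevice(address device, string calldata name) external onlyAdmin {
        devices[device] = Device(name, true);
        emit DeviceCreated(device, name);
    }

    // Algorithm 4: mapping of IoT device and user.
    function userDeviceMapping(address user, address device) external onlyAdmin {
        require(devices[device].exists, "unknown device");
        userDevices[user][device] = true;
        emit UserDeviceMappingAdded(user, device);
    }

    // Algorithm 3: mapping of IoT device and fog node.
    function fogDeviceMapping(address fogNode, address device) external onlyAdmin {
        require(devices[device].exists, "unknown device");
        fogDevices[fogNode][device] = true;
        emit FogDeviceMappingAdded(fogNode, device);
    }
}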

5.1 Authentication Function
Figure 6 shows the Authentication function called by the user to get access to the IoT device. This function works as given in Algorithm 5.
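Continuing the hypothetical skeleton above, the Authentication function of Algorithm 5 might look roughly like the following; the exact checks, the event fields, and the function name are assumptions.

// Hypothetical continuation of the DeviceRegistry sketch above (to be placed
// inside that contract). The caller requests access to a device; if authorized,
// a token is broadcast as an event that the user and the fog node can pick up.
event TokenCreated(
    bytes32 indexed uid,
    uint256 timestamp,
    address indexed user,
    address indexed device,
    address fogNode
);

function authenticate(address device, address fogNode) external {
    // Check the saved list of authorized devices for the calling user.
    require(userDevices[msg.sender][device], "user not authorized for this device");
    require(fogDevices[fogNode][device], "device not mapped to this fog node");

    // UID derived from the token parameters using keccak256 hashing.
    bytes32 uid = keccak256(abi.encodePacked(block.timestamp, msg.sender, device, fogNode));
    emit TokenCreated(uid, block.timestamp, msg.sender, device, fogNode);
}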

5.2 Upgradable Smart Contract The smart contracts that are deployed on the Ethereum blockchain network are immutable by default. While this accomplishes decentralization and security, it can lessen the usefulness of a smart contract.


Fig. 4 Declaration of structure and mappings

Taking care of this issue requires using upgradeable smart contract designs during development. Upgradeable smart contracts, built using proxy patterns, let developers change contract functionality after deployment without compromising security, decentralization, or previously stored data [23]. An upgradeable smart contract uses a special element called a "proxy pattern" that gives developers room to change contract functionality post-deployment. The proxy-contract design (Fig. 7) consists of two parts:
1. The proxy contract
2. The execution or logic contract
The first contract (the proxy contract) is deployed at the beginning and holds the contract storage and balance. Meanwhile, the second contract (the execution contract) stores the contract logic used to execute functions. When using the proxy pattern, the address of the execution contract is stored in the proxy contract. When clients send requests, the message goes through the proxy contract, which routes it to the execution contract. Afterwards, the proxy contract receives the result of the processing from the execution contract and returns it to the client. The proxy contract itself is not changed; instead, we can create additional execution contracts with updated contract logic and reroute message calls to the new contract.


Fig. 5 Functions created in smart contracts

With the proxy pattern, the deployed contracts themselves are never edited; a new logic contract is simply deployed and the proxy contract is asked to reference it rather than the old contract. This new logic contract can contain extra functionality or fixes for an old bug.
Steps involved in making an upgradable smart contract:
1. Create the first version of the smart contract.
2. Deploy the first version of the smart contract on the network using the following command:
npx hardhat run --network localhost .\deploycontract1.js
3. Check whether the contract is working using the following command:
npx hardhat run --network localhost .\index.js
4. Create a new version of the smart contract.
5. Deploy the new version of the smart contract on the network using the following command:
npx hardhat run --network localhost .\deploypgradedcontract2.js
6. Upgrade the contract address and restore the previous data:
npx hardhat run --network localhost .\restoredata_contract.js
7. Check whether the contract is working using the following command:
npx hardhat run --network localhost .\index.js

Fig. 6 Authentication function
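As an illustration of the proxy-based upgrade flow above, here is a minimal, hypothetical sketch of a first and second version of a logic contract that could sit behind a proxy; it is not the paper's actual contract, and all contract, function, and variable names are assumptions.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.17;

// Hypothetical first version of the logic (execution) contract placed behind a proxy.
// With proxy-based upgrades, state lives in the proxy, so logic contracts use an
// initializer instead of a constructor and must preserve the storage layout.
contract AuthLogicV1 {
    address public admin;
    mapping(address => bool) public registeredDevices;
    bool private initialized;

    function initialize(address _admin) external {
        require(!initialized, "already initialized");
        initialized = true;
        admin = _admin;
    }

    function registerDevice(address device) external {
        require(msg.sender == admin, "only admin");
        registeredDevices[device] = true;
    }
}

// Hypothetical second version: it keeps the storage layout of V1 and appends new
// functionality. The proxy is simply pointed at this contract, and all previously
// stored data (admin, registered devices) remains intact.
contract AuthLogicV2 {
    address public admin;
    mapping(address => bool) public registeredDevices;
    bool private initialized;

    function initialize(address _admin) external {
        require(!initialized, "already initialized");
        initialized = true;
        admin = _admin;
    }

    function registerDevice(address device) external {
        require(msg.sender == admin, "only admin");
        registeredDevices[device] = true;
    }

    // New function added after V1 was deployed, e.g. to patch a missing feature.
    function revokeDevice(address device) external {
        require(msg.sender == admin, "only admin");
        registeredDevices[device] = false;
    }
}

Only the logic address changes during an upgrade; the proxy address that clients talk to, and the storage it holds, stay the same, which is what allows step 6 above to restore the previous data. In practice, tooling such as OpenZeppelin's hardhat-upgrades plugin (deployProxy/upgradeProxy) is commonly used to automate the deployments that the hardhat scripts above perform.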

6 Security Analysis
Threats and attacks on the system are discussed here. The proposed architecture focuses on making the Internet of Things simpler and safer to use. Ensuring that connected devices on the Internet of Things can be trusted is a strong requirement. When an IoT device tries to connect to a server, it needs a unique identity that can be verified. This unique ID is given to a device by Ethereum as an Ethereum address. The traditional method for device authentication is centralized and password based.


Fig. 7 Upgradeable smart contracts flowchart

Every device has a username and password, which are stored on a central server; with a brute-force attack, attackers can gain access to the device. In the proposed solution, a blockchain-based authentication scheme is used, which achieves confidentiality. A blockchain network is a publicly trusted distributed network, as it is transparent and immutable. Logs are maintained for every activity in the network, so even if an attacker tries to gain access or modify something, it can easily be detected. Due to the periodically upgradable nature of the smart contracts discussed here, one cannot easily recognize the access pattern. One cannot clone a device, because all its information is stored in a smart contract: access is given only if the information matches and is cross-verified with the fog nodes. The computational overload on IoT is overcome by using fog computing; the processing is done by fog nodes. Here, the fog nodes are present in the blockchain network, so they are secure. Information trustworthiness and non-repudiation are important in maintaining the sender's identity. Replay and man-in-the-middle (MITM) attacks are restricted by the use of a nonce and timestamps; an attacker cannot sign the messages with his key. The smart contract signs the events it generates, which makes the system more secure. The proposed scheme also restricts denial-of-service attacks, because records of all mappings for users, smart objects, fog nodes, and admins are kept on the public Ethereum ledger, run in a distributed and decentralized manner, and guarded by


a large number of mining nodes that keep records in a highly consistent, safe, and integrity-preserving manner.

7 Conclusion
This paper proposes the design and implementation of a fog blockchain-based framework for Internet of Things device authentication. It uses the Ethereum framework and a smart contract written in Solidity, with no third-party involvement. The computation overhead that integration with blockchain places on IoT devices is alleviated by the presence of fog nodes in the Ethereum network. The implementation details are discussed, and the complexity of the proposed algorithms is O(n). The security analysis shows that the proposed mechanism can defend against known attacks. Future work includes constructing a quantum-safe, Raspberry Pi-based Internet of Things framework with fog nodes.

References
1. Banerjee M, Lee J, Choo K-KR (2017) A blockchain future to Internet of Things security: a position paper. Digital Commun Netw. https://doi.org/10.1016/j.dcan.2017.10.006
2. Atzori L, Iera A, Morabito G (2010) The Internet of Things: a survey. Comput Netw 54:2787–2805
3. Giusto D, Iera A, Morabito G, Atzori L (2014) The Internet of Things. 20th Tyrrhenian Workshop on Digital Communication. Springer Publishing Company, Incorporated
4. Khan MA, Salah K (2017) IoT security: review, blockchain solutions, and open challenges. Future Gene Comp Syst. https://doi.org/10.1016/j.future.2017.11.022
5. Antonopoulos AM (2014) Mastering bitcoin, 1st edn. O'Reilly Media, USA
6. Christidis K, Devetsikiotis M (2016) Blockchains and smart contracts for the internet of things. IEEE Access 4:2292–2303
7. Swan (2015) Blockchain: blueprint for a new economy, 1st edn. O'Reilly Media, USA
8. Commercial National Security Algorithm Suite and Quantum Computing FAQ. U.S. National Security Agency, January 2016
9. Tordera EM, Masip-Bruin X, Garcia-Alminana J, Jukan A, Ren GJ, Zhu J, Farre J (2016) What is a fog node? A tutorial on current concepts towards a common definition. arXiv preprint arXiv:1611.09193
10. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system. White paper
11. Zheng Z, Xie S, Dai H, Chen X, Wang H (2017) An overview of blockchain technology: architecture, consensus, and future trends. In: 2017 IEEE International Congress on Big Data (BigData Congress)
12. Hong HJ (2017) From cloud computing to fog computing: unleash the power of edge and end devices. In: 2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). https://doi.org/10.1109/cloudcom.2017.53
13. Lee I, Lee K (2015) The Internet of Things (IoT): applications, investments, and challenges for enterprises. Business Horizons, pp 431–440. https://doi.org/10.1016/j.bushor.2015.03.008
14. Chen J, Liu Y, Chai Y (2015) An identity management framework for internet of things. In: IEEE 12th International Conference on e-Business Engineering. IEEE, pp 360–364


15. Salman O, Abdallah S, Elhajj IH, Chehab A, Kayssi A (2016) Identity based authentication scheme for the Internet of Things. In: 2016 IEEE Symposium on Computers and Communication (ISCC). IEEE, pp 1109–1111
16. Saif I, Peasley S, Perinkolam A (2015) Safeguarding the internet of things: being secure, vigilant, and resilient in the connected age. https://dupress.deloitte.com/dupus-en/deloittereview/issue-17/internet-of-things-data-security-andprivacy.html [Retrieved: 2015-07-27]
17. Weber M, Boban M (2016) Security challenges of the internet of things. In: 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), May, pp 638–643
18. Li D, Peng W, Deng W, Gai F (2018) A blockchain-based authentication and security mechanism for IoT. In: 2018 27th International Conference on Computer Communication and Networks (ICCCN), pp 1–6
19. Mahalle PN, Prasad NR, Prasad R (2014) Threshold cryptography-based group authentication (TCGA) scheme for the internet of things (IoT). In: 2014 4th International Conference on Wireless Communications, pp 1–5
20. Trnka M, Cerny T (2016) Identity management of devices in internet of things environment. In: 2016 6th International Conference on IT Convergence and Security (ICITCS), Prague, pp 1–4
21. Porambage P, Schmitt C, Kumar P, Gurtov AV, Ylianttila M (2014) Two-phase authentication protocol for wireless sensor networks in distributed IoT applications. In: 2014 IEEE Wireless Communications and Networking Conference (WCNC), pp 2728–2733
22. Solidity Programming (2022). https://soliditylang.org/
23. Upgrading smart contracts, OpenZeppelin Docs. https://docs.openzeppelin.com/learn/upgrading-smart-contracts
24. Wan J, Li J, Imran M, Li D, E-Amin F (2019) A blockchain-based solution for enhancing security and privacy in smart factory. IEEE Trans Indust Inform
25. Gabriel T, Cornel-Cristian A, Arhip-Calin M, Zamfirescu A (2019) Cloud storage: a comparison between centralized solutions versus decentralized cloud storage solutions using blockchain technology. In: 2019 54th International Universities Power Engineering Conference (UPEC), Bucharest, Romania, pp 1–5
26. Andrea I, Chrysostomou C, Hadjichristofi G (2015) Internet of things: security vulnerabilities and challenges. In: 2015 IEEE Symposium on Computers and Communication (ISCC), July, pp 180–187
27. Khan MA, Salah K (2018) IoT security: review, blockchain solutions, and open challenges. Future Generation Comp Syst 82:395–411. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167739X17315765
28. Agrawal TK, Kalaiarasan R, Wiktorsson M (2020) Blockchain-based secured collaborative model for supply chain resource sharing and visibility. In: Lalic B, Majstorovic V, Marjanovic U, von Cieminski G, Romero D (eds) Advances in production management systems: the path to digital transformation and innovation of production management systems. Springer International Publishing, pp 259–266. https://doi.org/10.1007/978-3-030-57993-7_30
29. Biswas B, Gupta R (2019) Analysis of barriers to implement blockchain in industry and service sectors. Comp Indust Eng 136:225–241. https://doi.org/10.1016/j.cie.2019.07.005
30. Mandolla C, Petruzzelli AM, Percoco G, Urbinati A (2019) Building a digital twin for additive manufacturing through the exploitation of blockchain: a case analysis of the aircraft industry. Comp Indus 109:134–152. https://doi.org/10.1016/j.compind.2019.04.011
31. Ahmed ZE, Hasan MK, Saeed RA, Hassan R, Islam S, Mokhtar RA, Khan S, Akhtaruzzaman M (2020) Optimizing energy consumption for cloud internet of things. Front Phys 8. https://doi.org/10.3389/fphy
32. Hasan MK, Islam S, Sulaiman R, Khan S, Hashim AHA, Habib SH, Islam M, Alyahya S, Ahmed MM, Kamil S, Hassan MA (2021) Lightweight encryption technique to enhance medical image security on internet of medical things applications. IEEE Access. https://doi.org/10.1109/ACCESS.2021.3061710
33. Hassan R, Qamar F, Hasan MK, Aman AHM, Ahmed AS (2020) Internet of things and its applications: a comprehensive survey. Symmetry 12:1674


34. D'Andrea GN (2022) Truffle: simple development framework for ethereum. https://libraries.io/npm/truffle/5.5.23
35. Ganache (2022) Truffle suite. https://trufflesuite.com/docs/ganache/7
36. Node.js (2022). https://nodejs.org/

The NFT Wave and Better Use Cases—Using NFTs for Documents, Identification, and College Degrees B. R. Arun kumar and Siddharth Vaddem

Abstract With the innovation of smart contracts, the tokenization of digital assets reached great heights in the form of NFTs. While NFTs have opened up various use cases and opportunities in the form of digital art and collectibles, the potential of NFTs remains unclear. Office documents, as well as educational and identification documents, play a significant role in our daily lives. However, in this environment, forgery, duplication, and manipulation are all too common. This paper aims to address the gap between present and potential use cases. It proposes utilizing NFTs for better trust, transparency, and security by demonstrating the efficacy of NFTs in the domain of document verification and presenting the usefulness in tokenizing documents such as driving licenses, college certificates and other identity documents to prevent fraud and improve transparency over the authentication domain. This paper discusses NFTs in detail, the working, challenges, current applications, and an innovative approach to moving documents from off-chain to on-chain for validation of data with the help of decentralized oracles following prisoner’s dilemma and ERC-721 standards. Keywords NFT · Blockchain · Documents · Identity · Education · Digital certificates · Verification · Oracles

1 Introduction
NFTs, an acronym for non-fungible tokens, are a form of digital collectibles derived from Ethereum smart contracts, different from their counterparts like Bitcoin [1] or Ether (ETH). The term 'fungible' describes an asset's replaceability in exchange for another asset or item of the same or similar value. Non-fungible is the polar opposite: these items cannot be replaced or replicated and are unique in themselves, like a one-of-a-kind baseball card. With the current exposure of NFTs to the world, there are mixed reactions; some compare the innovation to the breakthrough of the Internet or social networks, while others dismiss it as a passing trend that will gradually decline and become a failed technology, a thing of the past.
B. R. Arun kumar (B) · S. Vaddem Department of Computer Science and Engineering, BMS Institute of Technology and Management, Bengaluru 560064, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_61


Fig. 1 Classification of Assets. Src: [2–4]

Figure 1 shows examples of assets classified as physical and digital goods. Celebrities, artists, content creators, and studios have made headlines with their participation in various NFT projects in the form of JPEG images, music, basketball clips, digital art, GIFs, virtual lands, and other digital collectibles and crypto art [5] that sell for exorbitant prices. Creating scarcity of digital assets was perplexing before NFTs were introduced: despite copyright laws being in place, it was easy to duplicate or pirate digital works. NFTs help solve this problem of proving ownership of something. There is no doubt that the lucrative sales of digital art and assets as NFTs have contributed to the growing interest in NFTs in recent times. CryptoKitties [6] was one of the early popular examples of NFTs; players of this game were able to acquire, breed, collect, and sell cartoonish cats. The NFT market started to grow in mid-2020 [7]. A major moment occurred in March 2021, when an artwork by Beeple was sold for a whopping $69.3 million at Christie's auction house. CryptoPunks, a collection of unique digital characters, were sold for up to $11 million [8]. The first tweet by Jack Dorsey was sold for $2.9 million. This extraordinary profitability of NFTs led big celebrity names to create their own NFTs, and even the NBA started selling Top Shot clips as NFTs. The term 'NFT' was chosen as Collins Dictionary's word of the year for 2021 [9]. In comparison with 2020, the volume of dollars traded in the NFT market increased by 21,350.1% (Fig. 2) [10]. It is evident that NFTs are making a major impact. One justification people offer for these prices is that NFTs fall under the collectibles asset class (Fig. 3).


                            2019          2020                  2021
Volume of dollars traded    $24,532,783   $82,492,916 (+236%)   $17,694,851,721 (+21,350%)
Volume of sales             1,619,516     1,415,638 (-13%)      27,414,477 (+1,836%)

Fig. 2 Comparing volume traded over the years. Src: [10]

Fig. 3 Sales in USD and unique active wallets from 2018 to 2022. Src: [11]

Several sectors, such as gaming and virtual real estate, are already leveraging this huge potential, with companies like Coca-Cola and Louis Vuitton taking part. The current implementation of NFTs is primarily in the digital world in the form of tokens such as badges [12]. The fact that NFTs are most commonly employed in the digital world does not imply that they can only be used for digital assets. Many are quick to assume that NFTs are only meant for digital art, but they can be extended to much more, as discussed further in this paper. The development of the NFT ecosystem is still in its early stage, and the technologies are immature. Since documents and verification processes are widely used and play a significant role in our daily lives, this setting is an ideal use case for NFTs. Forgery, duplication, and manipulation are all too common, and guaranteeing document security and integrity is critical for users and other organizations seeking


authenticity and verification. A clear gap exists between the present and potential use cases. This paper aims to bridge that gap by demonstrating the application of NFTs in the domain of authentication and verification of documents such as college certificates, driving licenses, birth certificates, passports, and many more that the government and other agencies could use [13]. It also answers the question: what are the practical use cases of NFTs in the real world? The question is answered by proposing a 'document as NFT' system that converts authentic documents to NFTs to prove their integrity, authenticity, and identity. The paper also goes into detail about how the documents to be minted are validated in the first place, with the help of oracle services, which helps improve control in the authentication sector.

2 Literature Survey The following literature survey focuses on the applications of NFTs and analyzes the present use cases and proposals of NFTs in the real world.

2.1 Digital Certificates Using Blockchain and Educational NFTs Chen et al. [14] proposed a digital certificate system powered by blockchain to overcome the problems of counterfeit certificates and forging. As the blockchain is immutable, a digital certificate that cannot be counterfeited and can be verified easily could be made. The driving factors for proposing such a system were to achieve instant confirmation, reduce the wastage of hard copy paper, prevent tampering with the certificates, and reduce in-person involvement. The procedure suggested to achieve the desired results was to first generate the electronic version of the paper certificate granted by the schools. The hash value of the electronic file is calculated and stored in the blockchain system. A QR code along with an inquiry number is attached to the certificate. Later, companies or organizations can enquire about the authenticity of these papers with the help of the QR code and inquiry number. This way the blockchain-powered system solidifies the credibility of the different certificates. Gräther et al. [15] urged for a system that offers increased efficiency and enhanced security for certification authorities through digitalization using blockchain. Employers can get trustworthy certificate verification. EduTCX [16] presented a blockchain education platform for credential issuance, validation, and distribution. The platform seeks to provide counterfeit protection as well as secure certificate access. Brannan [17] proposed a system to take our educational adventures and upload them to a public ledger as proof of work which is uniquely identified and


owned by a student, acting as proof of completing the said course. In other words, what if we could publicly display the professional education courses that we enroll in, in an unalterable way? This way, education could become more fun, collectible, and valuable.

2.2 Bloom Protocol
Bloom is a system that uses federated, attestation-based identity verification and proposes a network of peer-to-peer and organizational nodes that vouch for users in order to assess credit risk [18]. It is a decentralized credit scoring system powered by Ethereum and IPFS. People who move to different nations must rebuild their credit ratings from scratch because credit scoring firms cannot operate globally. Identity verification is also centralized; applying for loans necessitates disclosing all of one's personal information, leaving one vulnerable to identity theft. Bloom addresses these limitations in lending by moving this setting to the blockchain. Bloom is a programmable ecosystem that enables on-demand, safe, and global credit services. It provides a novel approach to assessing credit risk for issuing compliant loans on the blockchain, with a seamless experience at every stage of the credit issuance process. With the help of BloomID, users are able to create a federated global identity with independent third parties who publicly vouch for their identity, legal position, and creditworthiness. Identity attestations such as electronic ID, documents, social verification, and sanction screening are supported by BloomID. The vast bulk of global identity data is owned by private companies such as credit bureaus, and by governments. These organizations play a crucial role in assisting people in registering for the BloomID system. BloomIQ is used to keep track of current and past liabilities, which are connected to the BloomID. The Bloom protocol also comprises BloomScore, which is similar to the FICO score.

2.3 Vaccine Passports and COVID-19
Mishra [19] investigates the applications and use cases of blockchain in vaccine passports. Due to the high frequency of intercontinental travel during the pandemic, travelers from different countries require different types of vaccination and travel documents, which makes the management of travel documents very complicated. To solve this problem, there is a need for vaccine passports. Vaccine passports should be implemented in a way that respects human rights and is not restrictive. Blockchain technology offers several advantages, including decentralization, irreversibility, and immutability [20]. The most prominent use of blockchains to alleviate COVID-19 implications is contact tracing and vaccine/immunity passport support [21]. San Marino, a small republic enclaved within Italy, has pioneered the use of blockchain technology to create digitized COVID-19 vaccine passports and issue


them [22]. The record of the civilian’s vaccination is subsequently verified with the help of two quick response codes on VeChain’s blockchain ledger, allowing for immutable, secure, and seamless evidence of vaccination status which increases confidence in the information’s legitimacy, lowering the risk of counterfeiting. Immunify.Life [23], a blockchain-based initiative aimed at transforming the healthcare environment, is undergoing final technical evaluations in Zambia before launching the COVID-19 NFT-based certification.

3 Challenges of NFTs
The non-fungible token ecosystem still has a variety of concerns to work out. Table 1 provides an overview and description of the major NFT challenges faced at the current time.

4 Working
An NFT can be owned by only a single person at any point in time. With the help of some metadata and a unique identifier, ownership can be managed easily. Smart contracts [24] enable us to assign ownership of assets and control the logic required for handling transfers. When NFTs are generated or minted, they run the code from a smart contract that adheres to one of the various standards defined in Ethereum, such as ERC-721 [25] (Ethereum Request for Comment), unlike cryptocurrencies, which follow the ERC-20 [26] standard. At a high level, the minting process includes the following steps: building a new block, validating the data, and recording the validated data into the blockchain. The principle behind NFTs is to use the blockchain to provide uniqueness to each associated item. NFTs have various properties:
• Every token minted has a unique identifier.
• A token cannot be interchanged with other tokens.
• There is just one unique owner, and the ownership information is easily verified.
• The transaction history is publicly verifiable, making it easy to prove ownership history.
• It is nearly impossible to modify the data or steal ownership once a token has been minted.
All NFTs are identified by a unique uint256 ID in the ERC-721 smart contract. This ID remains the same throughout the life of the smart contract. The NFT system usually consists of two main actors, the owner and the buyer. Figure 4 gives an overview of the system.


Table 1 Overview of the challenges faced with NFTs

Copyright and distribution: Although NFTs are non-replaceable, unique, and exchangeable, an NFT is just code defining ownership that points to something. One might be under the impression of having purchased the underlying content linked to the NFT; however, the original creator can still be the copyright owner and can retain exclusive rights to the content. It depends on how the smart contracts are designed.

Artificial scarcity: One can view, copy, download, and distribute the NFT images, art, and music for free. The only thing that distinguishes the real and the fake NFT is the ownership ID, but nothing stops someone from using the same image and minting an NFT of their own that is no different from the existing one. Valuation becomes an issue due to artificial scarcity.

Current use cases: The current implementation of the technology does not provide any great value to society. Right now, NFTs are associated with wealthy collectors purchasing digital art and dumping their crypto gains into something to avoid paying taxes. There is a common misconception that NFTs are a get-rich-quick scheme.

Taxation and legality: There is an absence of laws governing the assets underlying each NFT. NFTs are associated with entertainment in certain nations, property tax schemes in others, and art in a few. It becomes extremely difficult for individuals to disclose income from the selling of NFTs, and even more difficult for businesses, due to restrictions in different locations.

Regulations: It becomes increasingly difficult to safeguard consumers from frauds or illegal copies. Because the blockchain was designed to be decentralized, it is nearly impossible to foresee scams beforehand. For years to come, NFT market regulation will be puzzling and challenging.

Cost and climate: Minting or transferring NFTs is expensive; to do so, one has to pay a 'gas fee', the cost of solving complex cryptographic puzzles, which requires an enormous amount of computing power and electricity and is harmful to the environment.

The steps involved, from minting and selling to buying an NFT, are:
• The owner/creator first checks and uploads the raw data.
• The owner/creator signs a transaction that includes the hash of the NFT and then sends it to a smart contract.
• The smart contract receives the data and the transaction, and the minting process begins. The mechanism behind the process is token standards such as ERC-20, ERC-721, and ERC-1155; ERC-721 is used to implement NFTs. Once the transaction is confirmed, the NFT is forever linked to a unique blockchain address.
• The newly minted NFT can be listed on a marketplace for sale.
• The buyer pays the listed price on the marketplace to the owner's wallet address. The smart contract takes care of the ownership transfer logic after the conditions in the contract are met.
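As a concrete illustration of the minting flow listed above, the following is a minimal sketch assuming OpenZeppelin's ERC721URIStorage extension (Contracts 4.x); the contract name, token symbol, and function names are hypothetical.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.17;

import "@openzeppelin/contracts/token/ERC721/extensions/ERC721URIStorage.sol";

// Hypothetical minimal NFT collection: minting assigns a fresh uint256 tokenId to
// the caller and links it to off-chain metadata (for example, an IPFS URI).
contract SimpleCollection is ERC721URIStorage {
    uint256 private nextId;

    constructor() ERC721("SimpleCollection", "SC") {}

    function mint(string calldata metadataUri) external returns (uint256) {
        uint256 tokenId = nextId++;
        _safeMint(msg.sender, tokenId);     // records msg.sender as the unique owner
        _setTokenURI(tokenId, metadataUri); // metadata pointer fixed for this token
        return tokenId;
    }
}

Listing and purchase would then be handled by a separate marketplace contract (or built into this one), with ownership transfers executed by the ERC-721 transfer logic once the buyer's payment condition is met.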


Fig. 4 NFT buyer–seller system overview

There are various other standards like ERC-998 [27] which extends the functionality of ERC-721 by allowing the NFTs to own other NFTs and ERC-20 tokens, and ERC-1155 which allows the creation of both non-fungible and fungible assets. ERC-1155 [28] was developed with video games in mind where non-fungible tokens equate to in-game items and exchangeable assets and the fungible tokens equate to a transactional currency [29].

5 Proposed Methodology
This section discusses closing the potential gap of NFTs and using them for something with more societal relevance. The problem statement at hand: 'Forgery, duplication, and manipulation are too common in document verification and authentication. How do we promote better trust, transparency, and security to prevent fraud and tampering with documents such as passports, licenses, college degrees, and other IDs?' This paper proposes a multidimensional platform to upload important documents for identity and authentication needs in a more secure and decentralized way than existing solutions [30–33], with the help of NFTs. NFTs different from conventional NFTs shall be used to build this system. These NFTs carry no market value, unlike regular NFTs that have a defined value. Users will also be unable to sell, transfer, or purchase these NFTs, making them non-transferable [34] ERC-721 tokens that are unique. The NFTs will live on the Ethereum blockchain and be hosted on IPFS, with metadata that links to the attested documents, verified to be original.


Fig. 5 Comparison between the regular and modified NFTs

Having the documents publicly accessible to everyone is not ideal, as the personal information of every user would become public knowledge. Therefore, the files linked to the NFT must be encrypted with keys, preventing others from viewing them. An overview of the comparison between regular and modified NFTs can be seen in Fig. 5. To be able to perform data operations, a multi-key concept is employed in which many parties are involved. Two keys, namely a 'private key' and an 'authority key', are used. The private key is the key the document's user/owner uses to decrypt the file and read its contents. The authority key is used by the respective authority to access and read the document's contents; an 'authority' could be a university board or a government office. This allows authorized agencies/organizations to access documents for verification while also allowing users to access their own documents. As an outcome, the minted NFT is immutable and unalterable in the blockchain. To update a document, a combination of the authority key and the private key is required, as the user cannot be permitted to change the content as he likes; the data in the document should be authentic and attested, and the combination of these keys prevents any single party from altering the data. NFTs have fixed metadata when they are minted. With the help of the NFT DID (decentralized identifier), also known as CIP-94 of the Ceramic Network [35], metadata can be modified over time by annotation. This way, an extensive history of linked documents and changes can be realized. Figure 6 gives an overview of the process of implementing the documents to be minted as NFTs. These documents can be used as proof of legality and authenticity in settings such as employment record checks, opening a new bank account, passport verification at airports, college degrees, and even birth certificates; the possibilities are endless. To make the NFTs non-transferable, the ERC-721 transfer function can be overridden and custom logic can be added, such as adding a counter and blocking transfers if the counter reaches its maximum, which in the current scenario is zero, since transfer of ownership of documents is not allowed. A problem remains: how can it be ensured that the data entered into the blockchain is legitimate in the first place? According to Szabo [36], blockchains do not ensure truth or validity; rather, they protect truth and falsehood against future modification. A user could upload fictitious data to the blockchain, and an NFT could be minted that 'proves' it is genuine and valid. To solve this problem, an organization could be entrusted with the responsibility of verifying and authenticating, but that would defeat the purpose of using a blockchain, since decentralization is its key feature.
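A minimal sketch of how such a non-transferable ('soulbound') document token could be written, assuming OpenZeppelin Contracts 4.x (other versions expose a differently named transfer hook); the contract, function, and field names are hypothetical, and the encrypted URI field stands in for the pointer to the encrypted file on IPFS described above.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.17;

import "@openzeppelin/contracts/token/ERC721/ERC721.sol";

// Hypothetical non-transferable document token: only the issuing authority can
// mint, and any transfer after minting is blocked by overriding the internal
// ERC-721 transfer routine.
contract DocumentToken is ERC721 {
    address public authority;
    mapping(uint256 => string) public encryptedDocUri; // pointer to the encrypted file (e.g. on IPFS)

    constructor() ERC721("DocumentToken", "DOC") {
        authority = msg.sender;
    }

    function issue(address holder, uint256 tokenId, string calldata uri) external {
        require(msg.sender == authority, "only the authority can issue");
        _safeMint(holder, tokenId);
        encryptedDocUri[tokenId] = uri;
    }

    // Minting uses _safeMint, which does not pass through _transfer, so issuing
    // still works; every owner-to-owner transfer attempt reverts here.
    function _transfer(address, address, uint256) internal override {
        revert("document NFTs are non-transferable");
    }
}

This matches the intent described above (a transfer counter whose maximum is zero): the simplest way to enforce a zero-transfer limit is to reject every transfer outright.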


A trustable method of validation is necessary, such as a consensus. Consensus is the method in blockchain that allows nodes to agree on a data value, and the nodes must be deterministic to reach a consensus. Some recognizable consensus models are the Nakamoto consensus and Byzantine consensus [37]. The user data must be collected from off the chain; data is required in order to use blockchain for one of its most essential applications, smart contracts. Given this hurdle, how do we close the gap between the off-chain and on-chain realms? To solve this problem, blockchain oracle services can be used. A blockchain oracle is a service or institution that connects a blockchain to off-chain data in a trustless way; oracles act as an interface between real-world data and a trustable chain. Bloom protocol's whitepaper mentions that, to attest identity information, legal status, and creditworthiness, they take the help of third parties such as family or peers who vouch for a user's identity, or other organizations that have an existing reputation or track record of successful identity attestations. This approach can still be manipulated if a relative or a private third-party company gives a false attestation and false data enters the blockchain. To validate and verify documents and papers on-chain, a decentralized solution following a consensus resistant to malevolent parties or individuals should be used, such as a decentralized oracle network. The oracles participating in the consensus are financially rewarded for truthfully attesting and penalized heavily for altering or lying about the validity of the documents to be uploaded. To make the system more trustworthy and efficient, the prisoner's dilemma can be implemented. The prisoner's dilemma [38], framed by Merrill Flood and Melvin Dresher, is a concept in game theory that explains why two rational individuals may not cooperate even though it appears to be in their best interests to do so. 'Two individuals accused of a crime are detained. The two individuals are put in different rooms and cannot speak with each other. Prosecutors do not have enough proof to charge them with the primary accusation, but they do have enough to charge them with a less serious offense. The prosecutors give the individuals the option of cooperating with each other by staying silent or betraying the other by testifying against them about the committed crime.' The following results are possible:

Fig. 6 Outline and example of minting an NFT


Fig. 7 Game of prisoner’s dilemma

• If X and Y both betray each other, each is imprisoned for two years.
• If X betrays Y but Y stays silent, X goes free and Y is imprisoned for three years.
• If X stays silent but Y betrays X, X is imprisoned for three years and Y goes free.
• If both X and Y stay silent, each is imprisoned for only one year (Fig. 7).

Because betraying the other member is more rewarding than cooperating, it is reasonable to assume that purely rational prisoners will betray each other, leaving only one possible outcome: mutual betrayal. Pursuing individual reward should, on paper, lead to a better outcome; in the prisoner's dilemma, however, it leads to a worse individual outcome. With the help of a consensus algorithm modelled on the prisoner's dilemma, oracle participants are persuaded to take part in validation and to earn a financial reward for truthful attestation. The dilemma discourages participants from acting rogue or feeding false information about the authenticity of documents to be uploaded on-chain; such actions are penalized. Verifying the authenticity of documents before they are uploaded is critical because, once on the chain, they cannot be altered. To take part in the consensus algorithm, participants must stake a certain amount of ETH or other cryptocurrency, the 'staking fee.' This is the amount they forfeit, together with a set 'penalty fee,' if their answer does not match the outcome of the consensus algorithm, in order to be allowed to take part in future rounds. The user compensates the oracles for their contribution by paying a 'service fee' for authenticating their document, and after the result is declared the oracles are rewarded by the consensus algorithm. The user may also pay a higher fee than the standard service fee in exchange for faster approval and verification. The result is a fully decentralized oracle service that can be trusted with validating the authenticity of documents, paving the way for onboarding valid documents onto the chain for authentication and verification purposes.
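A minimal sketch of how a single attestation round could be settled under this stake-and-reward scheme is shown below. The majority rule, the settle_round helper, and the oracle names, stake amounts, and fee values are illustrative assumptions rather than part of the proposal's specification.

```python
from collections import Counter

def settle_round(votes, stakes, penalty, service_fee):
    """Settle one attestation round.

    votes:       dict oracle -> 'valid' or 'invalid'
    stakes:      dict oracle -> staked amount (slashed on disagreement)
    penalty:     extra amount charged to dissenting oracles
    service_fee: fee paid by the user, split among agreeing oracles
    Returns the consensus outcome and each oracle's payout (negative = loss).
    """
    outcome, _ = Counter(votes.values()).most_common(1)[0]   # simple majority
    agreeing = [o for o, v in votes.items() if v == outcome]
    payouts = {}
    for oracle, vote in votes.items():
        if vote == outcome:
            payouts[oracle] = service_fee / len(agreeing)     # reward for matching consensus
        else:
            payouts[oracle] = -(stakes[oracle] + penalty)     # slash stake plus penalty fee
    return outcome, payouts

# Toy example with hypothetical amounts (in ETH)
votes = {"oracle_1": "valid", "oracle_2": "valid", "oracle_3": "invalid"}
stakes = {"oracle_1": 1.0, "oracle_2": 1.0, "oracle_3": 1.0}
print(settle_round(votes, stakes, penalty=0.5, service_fee=0.3))
```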


Fig. 8 System architecture consisting of decentralized oracle networks

The proposed methodology in this paper improves upon the implementation of the Bloom protocol. BloomID allows users to create a global, federated identity with the help of independent third parties who publicly vouch for their identity and credibility. These third parties can be friends or relatives, or private organizations such as credit bureaus with a "track record" of successful attestations on the network. The majority of the entities that hold and attest data are credit bureaus and government agencies, which makes the scheme more centralized, because a single party or group of parties that Bloom "trusts" cannot itself be trusted unconditionally. An attacker could also create a large number of fake BloomIDs that vouch for each other, or set up their own organization to attest validity. This could generate BloomIDs that appear genuine and have a good financial history, which the attacker could then use to commit fraud (Fig. 8). To mitigate this, the Bloom network is "bootstrapped" by designating a small number of users and organizations as "trusted" network participants [18]. The notion of "trusted" network participants in the whitepaper does not adhere strongly to the core idea of decentralization. In contrast, the proposed network of decentralized oracles governed by the prisoner's dilemma promotes decentralization and offers a better way to prove the veracity of documents as they are uploaded to the blockchain, by penalizing or rewarding oracles based on the consensus outcome.

6 Conclusion

This paper proposes a system for validating documents for authenticity, ownership, and identity using NFTs and decentralized oracles. Even as the world and its technology advance, there is still a need for a seamless validation process that is not


manual, tedious, or centralized. The application of NFTs and decentralized oracles governed by the prisoner's dilemma is well suited to such a setting. The decentralized ecosystem discussed above stores a history of valid, secure, and encrypted data on a dependable, trustworthy network. Intermediaries with sole control over data are eliminated, and forgery, duplication, and other identity-verification problems are prevented. Access is completely open on a public blockchain network; to safeguard users' privacy, this public network is combined with NFTs linked to encrypted files on IPFS. The problem of updating documents after they have been minted is addressed with the help of Ceramic Network NFT DIDs. The proposal is comparable to some existing solutions, but it takes a distinct and significantly improved approach to decentralization and trust. In this way, a more socially relevant implementation of NFTs is realized and the gap in potential use cases is narrowed. Finally, although the blockchain industry often develops solutions in search of a problem, with the help of the proposed idea a future with better applications and use cases can be anticipated.

References 1. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system. Bitcoin.org. [Online]. Available: https://bitcoin.org/bitcoin.pdf 2. Pixabay, https://pixabay.com/ 3. BAYC, Bored ape yacht club 4. Ethereum.org assets. https://ethereum.org/en/assets/ 5. Franceschet M, Colavizza G, Smith T, Finucane B, Ostachowsk M, Scalet S, Perkins J, Morgan J, Hernandezi S (2021) Crypto art: a decentralized view. Leonardo 54(4) Available: https://doi. org/10.1162/leon_a_02003 6. Singh B, Jain K (2021) Story of Cryptokitties, art and NFTS. The Daily Guardian. [Online]. Available: https://thedailyguardian.com/story-of-cryptokitties-art-and-nfts/ 7. CryptoSlam! NFT data, rankings, prices, sales volume charts, market cap. Cryptoslam.io. [Online]. Available: https://cryptoslam.io/ 8. Nadini M, Alessandretti L, Di Giacinto F, Martino M, Aiello L, Baronchelli A (2021) Mapping the NFT revolution: market trends, trade networks, and visual features. Sci Rep 11(1). Available: https://doi.org/10.1038/s41598-021-00053-8 9. Redman J (2021) ‘NFT’ Chosen as 2021’s Collins English dictionary ‘Word of the year’— Bitcoin news, Bitcoin news. [Online]. Available: https://news.bitcoin.com/nft-chosen-as2021s-collins-english-dictionary-word-of-the-year/ 10. Yearly NFT market report 2021. NonFungible (2022) [Online]. Available: https://nonfungible. com/news/corporate/yearly-nft-market-report-2021 11. Market tracker | NFT sales history and trends. https://nonfungible.com/market-tracker 12. Hripak A, Arjona S, Bohrer J, Haag V, Hata T, Houston C, Leuba M, Miller A, Mufeed O, Otto N, Pitcher J, Reis A (2020) Open badges version 2.1 conformance and certification guide | IMS global learning consortium. Imsglobal.org 13. Corten P (2018) Implementation of blockchain powered smart contracts in governmental services. Repository.tudelft.nl. [Online]. Available: https://repository.tudelft.nl/islandora/obj ect/uuid%3A87709465-b9a1-48da-9ba5-eba98bc263d7 14. Cheng J, Lee N, Chi C, Chen Y (2018) Blockchain and smart contract for digital certificate. In: 2018 IEEE international conference on applied system invention (ICASI). Available: https:// doi.org/10.1109/icasi.2018.8394455


15. Gräther W, Kolvenbach S, Ruland R, Schütte J, Torres C, Wendland F (2018) Blockchain for education: lifelong learning passport. Dl.eusset.eu. [Online]. Available: https://dl.eusset.eu/han dle/20.500.12015/3163 16. Turkanovic M, Holbl M, Kosic K, Hericko M, Kamisalic A (2018) EduCTX: a blockchainbased higher education credit platform, vol 6. IEEE Access, pp 5112–5127. Available: https:// doi.org/10.1109/access.2018.2789929 17. Brannan B (2021) Education NFTs will replace diplomas and resumes. Linkedin.com. [Online]. Available: https://www.linkedin.com/pulse/education-nfts-replace-diplomas-res umes-beau-brannan 18. Leimgruber J, Meier A, Backus J (2018) Bloom protocol decentralized credit scoring powered by Ethereum and IPFS. Bloom: the truth platform. [Online]. Available: https://hellobloom.io/ whitepaper.pdf 19. Mishra V (2022) Applications of blockchain for vaccine passport and challenges. J Global Oper Strateg Sourcing. Available: https://doi.org/10.1108/jgoss-07-2021-0054 20. Vanderslott S, Marks T (2020) Travel restrictions as a disease control measure: lessons from yellow fever. Global Public Health 16(3), 340–353. Available: https://doi.org/10.1080/174 41692.2020.1805786 21. Ricci L, Maesa D, Favenza A, Ferro E (2021) Blockchains for COVID-19 contact tracing and vaccine support: a systematic review, vol 9. IEEE Access, pp 37936–37950. Available: https:// doi.org/10.1109/access.2021.3063152 22. Cooling S (2021) Yahoo is part of the Yahoo family of brands. Finance.yahoo.com. [Online]. Available: https://finance.yahoo.com/news/san-marino-adopts-nft-vaccine-092053414.html 23. Spotlight on COVID-19 in Zambia. Medium (2022). [Online]. Available: https://immunifylife.medium.com/spotlight-on-covid-19-in-zambia-fdc98d344c7f 24. Szabo N (1997) Formalizing and securing relationships on public networks. First Monday 2(9). Available: https://doi.org/10.5210/fm.v2i9.548 25. Entriken W, Shirley D, Evans J, Sachs N (2018) EIP-721: non-fungible token standard. Ethereum improvement proposals. [Online]. Available: https://eips.ethereum.org/EIPS/eip-721 26. Vogelsteller F, Buterin V (2015) EIP-20: token standard. Ethereum improvement proposals. [Online]. Available: https://eips.ethereum.org/EIPS/eip-20 27. Lockyer M, Mudge N, Schalm J (2018) EIP-998: ERC-998 composable non-fungible token standard. Ethereum improvement proposals. [Online]. Available: https://eips.ethereum.org/ EIPS/eip-998 28. Radomski W, Cooke A, Castonguay P, Therien J, Binet E, Sandford R (2018) EIP-1155: multi token standard. Ethereum improvement proposals. [Online]. Available: https://eips.ethereum. org/EIPS/eip-1155 29. Token standards: ERC-721, ERC-998 and ERC-1155 | How are they different?—LCX. LCX (2022) 30. Zhai X, Pang S, Wang M, Qiao S, Lv Z (2022) TVS: a trusted verification scheme for office documents based on blockchain. Complex Intell Syst. Available: https://doi.org/10.1007/s40 747-021-00617-1 31. Sunitha Kumari S, Saveetha D (2018) Blockchain and smart contract for digital document verification. Int J Eng Technol 7(46). Available: https://doi.org/10.14419/ijet.v7i4.6.28449 32. Wang Y, Kogan A (2018) Designing confidentiality-preserving blockchain-based transaction processing systems. Int J Account Inf Syst 30. Available: https://doi.org/10.1016/j.accinf.2018. 06.001 33. Martinod N, Homayounfar K, Lazzarotto D, Upenik E, Ebrahimi T (2021) Towards a secure and trustworthy imaging with non-fungible tokens. Spie.org. Available: https://spie.org/Public ations/Proceedings/Paper/10.1117/12.2598436 34. 
Srinivasan B (2021) Non transferable, but fungible
35. Ceramic network—let your data flow. Ceramic.network. [Online]. Available: https://ceramic.network/


36. Ridley M (2017) Minimizing the need for trusted third parties. Rationaloptimist.com. [Online]. Available: https://www.rationaloptimist.com/blog/block-chains-bitcoins-and-distributed-ledgers/
37. Collins P (2020) What is a blockchain oracle? Medium
38. What is the prisoner's dilemma? Investopedia (2021). [Online]. Available: https://www.investopedia.com/terms/p/prisoners-dilemma.asp

Data Science Techniques for Handling Epidemic, Pandemic

Outbreak and Pandemic Management in MOOCs: Current Status and Scope for Public Health Data Science Education

Anussha Murali, Arun Mitra, Sundeep Sahay, and Biju Soman

Abstract The COVID-19 pandemic has revealed the flaws in our health system and provided new opportunities to build resilience. Data science is one of the six scientific gaps that emerged in the global experience with the COVID-19 pandemic. Massive open online courses (MOOCs) can potentially address the critical need to build public health data science capabilities and competencies in outbreak and pandemic management. The objective of the study was to perform a scoping review with thematic analysis and identify the key themes across MOOCs covering content on outbreak and pandemic management using qualitative evidence synthesis. A total of 458 unique records were found, of which 69 were relevant to the pandemic and outbreak management context. Thematic analysis identified three cross-cutting themes: (i) personal and professional competencies, (ii) institutional and organisational response, and (iii) community participation—the growing role of MOOCs in building competencies that are empowering and enabling to individuals and communities. A limited number of courses covered the use of data science in outbreak and pandemic management suggesting the need for future MOOCs to integrate data science knowledge and methods in public health education and training. Data science competencies are central to strengthening the health system and effective public health response. MOOCs present a promising opportunity to deliver public health data science education, integrating domain expertise, public health ethics, and data science methods.

A. Murali Jawaharlal Nehru University, New Delhi, India A. Mitra · B. Soman (B) Sree Chitra Tirunal Institute for Medical Sciences and Technology, Trivandrum, Kerala, India e-mail: [email protected] A. Mitra e-mail: [email protected] S. Sahay Department of Informatics, University of Oslo, Oslo, Norway e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_62


Keywords Public health data science · Pandemic management · Outbreak management · Public health education · Massive open online courses (MOOCs)

1 Introduction

Massive open online courses (MOOCs) have grown in popularity as a way to facilitate large-scale, low-cost learning while also providing opportunities and empowering individuals through education [1], especially on topics related to the public good. Expectedly, they are a natural ally to public health, with its multitude of functionaries with varied skill sets potentially benefiting from access to free and open high-quality resources, especially those in low- and middle-income countries (LMICs). Additionally, MOOCs introduce often neglected but upcoming areas of practice and research, such as data science in public health. The ongoing COVID-19 pandemic revealed the integral role of data science in outbreak and pandemic management, with data science contributing to the analysis of COVID-19 trends, monitoring of disease outbreaks and forecasting, making risk assessments, and studying the impact of government policies and mitigation strategies, among other examples [2]. To realise the potential of data science in managing outbreaks, a skilled public health professional with competencies in data science and domain expertise in public health is needed [3]. While this urgent need has been recognised with the emergence of post-graduate specialist programmes on public health data science, the courses offered come with high tuition and may not be affordable to individuals from low- and middle-income countries. Online resources such as MOOCs provide a unique opportunity to address these knowledge gaps. MOOCs have been used in public health education since their inception, with the Harvard School of Public Health (Harvard University) [4] and the Johns Hopkins Bloomberg School of Public Health (Johns Hopkins University) [5] leading the way. While previous studies have analysed the role of MOOCs in public health [5, 6] and, more specifically, in outbreaks of emerging infectious diseases [7], our literature search failed to identify any studies systematically examining MOOCs related to outbreak and pandemic management and applications of data in the area. Such a study is required to understand what existing resources are available, the rationale for their development, by whom and where the courses are designed and produced, and what gaps exist in course content. These insights can help us frame how outbreak and pandemic management feature in MOOCs and provide a useful starting point for educators developing courses on the topic, as future courses can be built on a prior understanding of what relevant academic content is already available and how it has been organised within the MOOC format. Given the context outlined, this study employed scoping review and thematic analysis methods to explore courses related to outbreak and pandemic management in MOOCs and identify key themes and their conceptual linkages. Using these findings, it further examined how data science has featured in course syllabi to identify the current status of public health data science education in MOOCs related to


outbreak and pandemic management by answering the following questions: How are outbreak and pandemic management approached in MOOCs? How is public health data science discussed in relation to pandemic and outbreak management in MOOCs? To answer these questions, we examine the course syllabi of MOOCs covering outbreak and pandemic management, including but not limited to courses specifically focused on the subject. This approach allows us to explore courses from a wide range of disciplines and sectors. Conversely, it may restrict a closer analysis of outbreak and pandemic management course content to a particular field such as public health. Given that effective management of outbreaks relies on strong intersectoral and multisectoral action [8], excluding courses from outside the health sector would limit our analysis from moving beyond disciplinary silos, while including them would provide important insights into existing MOOCs, potentially informing future course development in the subject. As a result, the study also offers a comprehensive window into public health educational content on outbreak and pandemic management alongside the status of public health data science in existing courses.

2 Approach

This study on outbreak and pandemic management MOOCs combined a scoping review with thematic analysis of course syllabi, using qualitative data and inductive reasoning to map currently available courses and examine course content. A scoping review method was chosen given the broad research questions, which require a mapping of existing courses and an examination of course syllabi across a wide range of sources. The thematic analysis allows us to then synthesise our findings as key themes across courses, supporting the identification of gaps in course content.

2.1 Sample Description

MOOCs were identified by a search on MOOC platforms, a digital library and a search engine. A preliminary search was conducted to identify the study population. For instance, platforms such as MIT OpenCourseWare, Open Learn (Open University), and Open Learning Initiative (Carnegie Mellon University) were eliminated as the search resulted in zero courses meeting the inclusion criteria. Courses met the inclusion criteria if the course content covered outbreak and pandemic management with English as the medium of instruction. MOOCs included both specialised courses on outbreak and pandemic management and courses covering technical, scientific, and conceptual aspects of the topic. For example, courses on risk communication during pandemics and technical tutorials on outbreak investigation software were included. Courses on disease characteristics were excluded if they did not contain content related to prevention and control of outbreaks,


Table 1 Overview of search numbers across platforms

MOOC sources             No. of search results   Duplicates   No. of courses excluded   No. of courses included
OpenWHO                  104                     24           52                        28
edX                      112                     19           87                        6
Coursera                 220                     23           182                       15
Future Learn             101                     14           68                        19
Google search engine^a   1                       0            0                         1
Total                    538                     80           389                       69

^a Results from Google were identified via handsearching. Only search results meeting the inclusion criteria were indexed

whereas MOOCs on key policy approaches such as One Health were included if they did. MOOCs were also excluded if the medium of instruction was not English, if they were classified as courses while lacking an organised course outline, and if the course was archived. To identify the sample, a search was carried out across the selected platforms using search filters for language (English only) and type of resource (courses only), if available. The search terms used were "pandemic", "epidemic", and "outbreak", where each term was manually searched. Broad search terms were used to be able to identify courses across disciplines and sectors. OpenWHO produced 104 results, edX produced 112, Coursera produced 220, and FutureLearn produced 101. Of these courses, 80 were duplicates, and 389 were excluded. Handsearching on the Google search engine produced 1 additional result. The total included courses were 69 in number, with OpenWHO (n = 28; 40.57%) contributing the largest number of MOOCs. Table 1 provides an overview of the search numbers across the platforms. It also provides a breakdown of how we arrived at the final number of courses included for analysis. For courses covering the use of data in outbreak and pandemic management, MOOCs were included if they contained modules on the application of data in managing outbreaks. This included a variety of courses, from descriptive, introductory lessons on epidemiological investigation for laypersons to courses designed for technical assistance and specialised knowledge transfer for experienced outbreak investigators.
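The screening arithmetic reported in Table 1 can be reproduced with a short script. The platform counts below are taken from the table; treating included courses as search results minus duplicates minus exclusions is an illustration of the screening flow, not part of the published protocol.

```python
# Screening counts from Table 1 (per platform: search results, duplicates, excluded)
counts = {
    "OpenWHO": (104, 24, 52),
    "edX": (112, 19, 87),
    "Coursera": (220, 23, 182),
    "FutureLearn": (101, 14, 68),
    "Google (handsearched)": (1, 0, 0),
}

total_results = total_unique = total_included = 0
for platform, (results, duplicates, excluded) in counts.items():
    included = results - duplicates - excluded
    total_results += results
    total_unique += results - duplicates
    total_included += included
    print(f"{platform:22s} included = {included}")

print(f"Total search results : {total_results}")   # 538
print(f"Unique records       : {total_unique}")    # 458, as reported in the abstract
print(f"Courses included     : {total_included}")  # 69, of which OpenWHO contributes 28 (40.57%)
```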

2.2 Data Analysis

The primary search was conducted by one researcher while another researcher reviewed the search strategy. The MOOCs were indexed and maintained in an online repository for extraction and analysis. Results from a search engine identified through


handsearching were added to the index. The index collected data regarding course title, year of publication, and course description. For data extraction, duplicates were reviewed and removed by one researcher. For checking the relevance of the courses and their inclusion in the study, one researcher accepted or eliminated MOOCs based on the inclusion criteria, while another researcher examined the selection process and adherence to the inclusion criteria. A common data format was created for the extraction of relevant data. This included year of publication, course organisers, details regarding course syllabi such as aims, learning outcomes, course outlines, course length, modalities and structure, intended audience, education level, country/region of course development, and disciplinary area. Researchers examined the selected MOOCs and extracted relevant data. It should be noted that the researchers largely limited data extraction to information available on the platform pages and did not examine the teaching material and supplementary resources. The 69 MOOCs syllabi provided the textual data for a qualitative, inductive analysis. The first stage of data analysis was free-coding based on the course titles, descriptions, and outlines by one researcher. These codes were manually grouped into clusters using a concept mapping strategy at the initial stage. Discussions between the researchers were used to refine the grouping of codes and identify potential themes and sub-themes. These emergent themes and sub-themes were reorganised and renamed until they were mutually exclusive and conceptually distinct. A thematic map was developed to explicate the findings and draw inferences. To explore MOOCs featuring data-based approaches to outbreak and pandemic management, the selected courses were re-examined, and those containing relevant content were identified and reviewed.
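A sketch of the indexing and de-duplication step described above is given below, assuming a tabular index with title, platform, and year columns and a simple title-normalisation rule; in the study itself this screening was performed manually by the researchers, so the pandas-based approach and the example records are purely illustrative.

```python
import pandas as pd

# Hypothetical index of search results; in the study this was an online repository
# recording course title, year of publication, and course description.
index = pd.DataFrame(
    [
        {"title": "Introduction to COVID-19: methods for detection, prevention, response, and control",
         "platform": "OpenWHO", "year": 2020},
        {"title": "Introduction to COVID-19: Methods for Detection, Prevention, Response, and Control",
         "platform": "OpenWHO", "year": 2020},
        {"title": "Epidemics III", "platform": "Coursera", "year": 2021},
    ]
)

# Normalise titles so the same course listed twice is recognised as a duplicate
index["title_norm"] = index["title"].str.lower().str.strip()
deduplicated = index.drop_duplicates(subset=["title_norm", "platform"])

print(f"{len(index) - len(deduplicated)} duplicate(s) removed, {len(deduplicated)} records retained")
```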

3 Results

3.1 Sample Characteristics

The MOOCs included for the final analysis (n = 69) were predominantly developed by international public health organisations (namely, WHO) and universities located in high-income countries (HICs), including the U.S., U.K., Italy, Switzerland, and Hong Kong. While courses tailored to non-Western contexts were present and developed by regional experts, these courses were small in number, and no MOOCs were independently produced by national institutions and universities located in these countries. Nine out of 69 courses (13.04%) were specifically designed for countries outside Europe and North America. Six of the courses were developed for African contexts, one for the Indian context, one for LMICs, and one course for the WHO Eastern Mediterranean Region. The majority of these courses (n = 6, 8.70%) were created by or in collaboration with WHO and its regional offices.


The discipline contributing the highest number of MOOCs by a large margin was public health (n = 52, 75.36%). Most public health MOOCs focused on individual and organisational capacity building through technical and scientific knowledge transfer. Courses from the field of social science (psychology, sociology, and social work), nursing, medical science, general science, and biotechnology were also present, typically as part of multi- and inter-disciplinary courses. The latter courses covered the linkages between human, animal, and environmental health in relation to disease emergence and control, which inherently require contributions and collaborations between and within academic disciplines. The courses were designed for a diverse range of actors and stakeholders, including health workers at all levels, students in higher education, policy- and decision-makers, administrators and professionals from the animal or environmental health sectors, local and national government officials, civil society, individuals from non-profit organisations and humanitarian sectors, and the private sector. MOOCs were available to individuals across various educational levels, with the majority of courses classified as basic or beginner level. MOOCs classified as intermediate or advanced were limited to the area of public health and intended for health workers or students in higher education. MOOCs on outbreak and pandemic management were heterogeneous in terms of instructional and assessment strategies. All MOOCs used short videos, specifically lectures. Videos were usually combined with textual resources, activities, peer and self-assessment, or discussion boards. The course length and structure varied from 45 min in one session to 49 h over eight weeks. The platform pages of MOOCs did not usually provide details regarding the year the course was introduced and when it was last updated. Where the information was present, all courses were introduced on or after 2017. A large percentage of courses carried “COVID-19” (and in one case, “post-COVID”) in their title (n = 25, 36.23), reflecting the recent use of MOOCs as educational responses to the COVID-19 pandemic. The MOOCs reflect the trend towards paid courses and closed licencing in recent years. Around half of the identified courses (n = 38, 55.07%) required payment to avail certificates of completion. Except for courses on the OpenWHO platform, all courses fell into this category.

3.2 Mapping Outbreak and Pandemic Management in MOOCs

Despite the courses included in the review covering a wide range of areas, the availability of information such as course descriptions, learning objectives, and course structures allowed the data analysis to identify and group the textual data. The analysis revealed three linked themes: personal and professional competencies, institutional and organisational response, and community participation (Table 2).


Table 2 Descriptions of the thematic categories with examples

Theme: Personal and professional competencies
Description: Knowledge and skill acquisition and application in managing outbreaks
Examples:
• AFRO IDSR course 3: investigating, preparing and responding to outbreaks and other public health events (WHO regional office for Africa)
• COVID-19 critical care: understanding and application (University of Edinburgh)
• Introduction to COVID-19: methods for detection, prevention, response, and control (WHO)
• Communication essentials for member states (WHO)
• Introduction to participatory approaches in public health (Imperial College London)

Theme: Institutional and organisational response
Description: Strengthening systems across sectors in emergency management
Examples:
• Health emergency and disaster risk management for resilient cities (WHO)
• COVID-19: operational planning guidelines and COVID-19 partners platform to support country preparedness and response (WHO)
• Measuring and maximizing impact of COVID-19 contact tracing (Johns Hopkins University)
• Prison health: managing outbreaks of tuberculosis in prisons (UK Health Security Agency)
• Navigating the tripartite zoonoses guide (TZG): a training for advocates and implementers (WHO)

Theme: Community participation
Description: Individuals and communities adapting to emergencies by engaging with health information, self-support skills and strengthening family and community networks
Examples:
• Fighting COVID-19 with epidemiology: a Johns Hopkins teach-out (Johns Hopkins University)
• COVID-19 and work: staying healthy and safe at work during the COVID-19 pandemic (WHO)
• Monkeypox: introductory course for African outbreak contexts (WHO)
• Anti-vaccination and vaccine hesitancy (University of Queensland)
• COVID-19: helping young people manage low mood and depression (University of Reading)

Seventeen sub-themes were identified (Fig. 1). The theme "personal and professional competencies" comprised seven sub-themes: epidemiological investigation and disease modelling; conceptual frameworks and approaches to disease emergence and control; risk communication and community engagement; leadership and decision-making; psycho-social support in emergencies; health equity and ethical practises; and clinical and laboratory management. The sub-themes reflect areas covered in the courses to develop the knowledge and skill base of individuals involved in emergency response and outbreak management. Epidemiological investigation and disease modelling included the technical know-how required to respond to outbreaks, whereas conceptual frameworks and approaches include contributions from public health, ecology, zoology, and social science in shaping understanding of disease emergence and control. Risk communication and community engagement includes communicating disease risk effectively, supporting communities in responding to outbreaks and pandemics, and incorporating participatory approaches. Leadership and decision-making include leading and managing teams under crises and uncertainty, partnership-building, and maintaining effective decision-making and communication channels. Psycho-social support refers to providing mental health support to individuals and communities, along with building emotional resilience in oneself. Health equity is discussed using concepts such as health disparities and vulnerability, requiring knowledge and ethical commitment to be addressed in outbreak response. Clinical and laboratory management includes prevention and management of pandemics and outbreaks in clinical settings. The seven sub-themes of "institutional and organisational response" were: technical assistance and knowledge transfer; risk management and emergency preparedness; systems approach and resilience; programme planning, implementation and evaluation; intersectoral action and the human-animal-ecosystem interface; global health and security; and vulnerable populations and settings.


Fig. 1 Themes and sub-themes identified

Technical assistance and knowledge transfer included support to manage data, to carry out and evaluate mitigation measures, and to adhere to governance and regulatory norms such as the International Health Regulations (IHR). Risk management and emergency preparedness included matters relating to the set-up and processes of risk management within organisations and systems. Programme planning, implementation, and evaluation covered the development and execution of outbreak management measures and linked policies. The sub-theme of intersectoral action and the human-animal-ecosystem interface refers to the integration of human, animal, and environmental/ecosystem health and to operationalising intersectoral policy approaches such as One Health. Global health and security covers global governance and response to outbreaks, conceptualised as health emergencies and security threats. The final


sub-theme, vulnerable populations, and settings, included communities and settings disproportionately impacted by outbreaks. MOOCs explored outbreak management in humanitarian crises and prisons as well as supporting children and immigrants. Lastly, “community participation” included the sub-themes of health literacy, personal and community resilience, and guidance and advice. This theme covers making sense of science and health information (including addressing vaccine misinformation), public health advice and guidance to prevent and respond to outbreaks, and building emotional and community resilience to adapt to crises or disruptions. While the sub-themes are not uniformly represented across the MOOCs, when seen in conjunction with the previous section (Sect. 3.1), the analysis provides an overview of the current status of outbreak and pandemic management in MOOCs. Most MOOCs covered sub-themes across at least two thematic categories, showcasing linkages between the three identified themes.

3.3 Using Data in Outbreak and Pandemic Management: MOOC Content and Characteristics

Fourteen of the selected courses (20.29%) were found to incorporate content on the use of data in outbreak and pandemic management. Eight out of the 14 courses (57.14%) were introductory or beginner level, geared towards a diverse audience of policymakers, students in higher education, health workers, and the general public. The six remaining courses were intended for healthcare workers involved in outbreak response, health officials, researchers and students across health, medical, and environmental disciplines, and epidemiologists. They covered specialised topics and skills, requiring basic public health knowledge at a minimum, and in some cases, recommending that learners have prior work experience in managing outbreaks. Four courses used the MOOC format to provide technical assistance for pandemic management. "COVID-19 national survey package" (WHO) was designed to support countries in carrying out national data collection, providing assistance in conducting a national survey collecting social and behavioural data to inform vaccine roll-out. "Learning package for Rapid Response Teams in the Context of COVID-19 in India" (WHO) covers data management as part of emergency response. "Measuring and Maximizing Impact of COVID-19 Contact Tracing" (Johns Hopkins University) provides guidance on evaluating the performance of contact tracing programmes using a digital application. Similarly, "Introduction to Go.Data—Field data collection, chains of transmission and contact follow-up" (WHO) covers the use of free, multilingual outbreak investigation software in outbreak scenarios for case and contact data, investigation and follow-up, and data visualisation. Use of epidemiological data as part of outbreak management was covered in seven courses. The courses were largely introductory, with one course—"Fighting COVID-19 with Epidemiology: A Johns Hopkins Teach-Out" (Johns Hopkins University)—specifically designed for the general public to encourage public health literacy.


Other courses carried modules focused on epidemiological tools and analysis during outbreaks, with one course including the impact of prevention measures such as vaccination (“Epidemics III” organised by the University of Hong Kong). Out of the three courses providing introductions to epidemiological modelling, “Infectious Disease Transmission Models for Decision-Makers” (Johns Hopkins University) was explicitly focused on the subject, supporting professionals in making programme, and policy decisions based on improved understanding and interpretation of modelling data. Two courses introduced students to data science applications and the use of genomics in pandemic management. Additionally, one of the courses, “From Swab to Server: Testing, Sequencing, and Sharing During a Pandemic” (COVID-19 Genomics UK consortium and Wellcome Connecting Science), was the only MOOC to discuss ethical, legal, and social issues linked to the use and sharing of data. One course, “SocialNet: Empowering communities before, during, and after an infectious disease outbreak” (organised by WHO), covered the use of data in social sciences as part of emergency response. Specifically, it focused on community engagement and the application of social science in working together with communities to develop interventions.

4 Discussion

The aim of the study was to examine MOOCs on outbreak and pandemic management with a focus on content covering the use of data in emergency response. The scoping review and thematic analysis identified personal and professional competencies, institutional and organisational response, and community participation as three modes in which existing MOOCs function as educational responses to outbreak and pandemic management. In the available resources, knowledge and skills related to the use of data in traditional areas such as emergency response have been predominantly confined to technical education of outbreak management tools and epidemiology, reflecting the relatively recent development of MOOCs as an educational phenomenon [9]. With more than one-fifth of the included courses being developed during the COVID-19 pandemic, the use of MOOCs potentially presents unique benefits in disseminating knowledge and information during health emergencies. Identifying where and how MOOCs have been employed has been important in recognising existing challenges and future opportunities in further developing MOOCs as a learning resource. While MOOCs have provided access to outbreak and pandemic management courses from reputable universities and organisations, no such courses have been independently developed by an institution located in an LMIC. The course content, technology, and pedagogical ideas come from western countries [10], wherein there is a "unidirectional transfer of standardised Western education to a diverse international pool of participants" [11]. Such courses may fail to reflect regional needs,


ignore specific populations and issues, and lack context-specific learning and materials in local languages [7, 12]. Alternative ways of knowing and thinking, informed by local and indigenous knowledge systems also get erased when regions or countries lacking the resources required are unable to develop MOOCs [10, 11]. Further, MOOCs pose the risk of replicating existing inequities within health education or exacerbating them. For example, in a study conducted by Hansen and Reich [13], American students from more-affluent neighbourhoods and with undergraduate-level education were found to be more likely to register and attain certificates in MOOCs offered by Harvard and MIT. In the context of LMICs, while MOOCs have shown potential in supporting healthcare workers (HCWs), the majority of the courses were for post-graduates [14]. Studies have also found an underrepresentation of women in enrolment [15, 16]. Given that the majority of community health workers in LMICs are women from low-income families [17, 18], future outbreak management course development may benefit from addressing barriers to usage. Although MOOCs have previously shown low enrolment rates in developing countries [15], there has been a recent surge in enrolments during the COVID-19 pandemic [19]. Addressing technological and pedagogical barriers to developing MOOCs within institutions in LMICs can leverage the growing interest in online education and strengthen outbreak and pandemic management competencies. The MOOCs included in the study, due to the focus on sharing of technical knowledge, frame outbreak and pandemic management and related policy-making as a technical exercise. Few courses discussed the ethics and value base directing public health or the role of context in shaping decision-making. Consequently, the social, economic, and financial impacts of outbreak measures on communities were rarely covered in MOOCs, despite equitable public health response being identified as a core aspect of outbreak response [20]. Only a small number of included studies used a multi-or inter-disciplinary approaches to understanding outbreaks and outbreak response. However, our study found that MOOCs as educational responses capture key aspects of community engagement and development by facilitating effective scientific communication regarding disease, risk, and public health interventions, along with providing mental health support skills training for self and others. While the field of data science is well represented in MOOCs, there are no courses integrating public health goals and expertise with data science practises in outbreak and pandemic management. Domain expertise in medicine and public health ensures existing gaps in knowledge, appropriate data science methods, and feasible data science solutions are identified in research and implementation. Central to aiding the use of data science for social good is training public health data scientists, where data science education is integrated with public health perspectives and ethics [21]. The MOOC format, being online, open and allowing massive enrolment, is suited to supporting such training needs, ensuring data science contributions informing outbreak management policies and practises are built on sound public health rationale. In the context of LMICs, the capital-intensive approaches of the high-income countries might not be suitable for LMICs lacking their immense labour resources. 
Locally relevant public health data science training can build approaches that empower grassroot workers and establishments to improve work efficiency,


automating the mundane tasks to ensure quality time can be devoted to service delivery. The uptake and completion of public health in data science courses could be further supported by moving away from conventional lecture-centred pedagogical approaches (termed behaviourist MOOCs or xMOOCs) to using more interactive approaches. The potential of the use of MOOCs in outbreak response, training of public health actors, and providing continuing education in LMICs depends on building MOOCs that uphold open education principles. More than half of the courses included in the review were hosted by large, for-profit MOOC providers, employed closed licencing, required certification fees, and were developed by private universities—largely in the U.S. and U.K.—with business interests in the monetisation and scalability of the courses [5]. Such practises prevent MOOCs from being cost-effective resources in LMICs. Further, paid upgrades for certification introduce financial barriers in accessing quality education, as individuals in LMICs are less likely to afford such fees. Lastly, closed licencing prevents the collaborative creation of knowledge and promotes the commodification of education [11]. It discourages MOOC adoption and creation in countries lacking financial and infrastructural resources. On the other hand, under open licencing, open educational resources can be repurposed into MOOC formats, reused across different regions via translation to local languages, and revised to reflect local contexts better. The study had some significant limitations. In the absence of databases with filters and advanced search options, a uniform search strategy could not be used across the MOOC platforms. Furthermore, different platforms presented course details and descriptions in varied ways. A comprehensive search including information missing in the course description would require auditing the courses, which went beyond the scope of our study. However, pilot searches were conducted to ensure relevant courses which were captured using broad search terms. As the study did not examine MOOCs across all existing platforms and conduct systematic searches on search engines, it is likely that some MOOC resources may have been missed. Having included the most popular MOOC platforms providing full-fledged courses (edX, Coursera, and FutureLearn), platforms specific to public health (OpenWHO), and employing handsearching (Google Search Engine), it is likely that the MOOCs we failed to identify are negligible in number. Further, the aim of our scoping review was to comprehensively map existing courses rather than provide an exhaustive list of existing courses. When coupled with the systematic procedures used for study selection and data extraction and analysis, our search strategy and thematic analysis have allowed us to adequately address this research objective. While the study analysed course syllabi on outbreak and pandemic management, it does not appraise the quality of MOOCs or provide insights into the actual impact of the courses on learning given that previous studies indicate MOOCs to have low completion rates [15] and there is currently limited evidence for the educational benefits in public health training [22]. However, recent evidence has shown support for the integration of MOOCs with online social networks in increasing rates of completion, suggesting changes in pedagogical strategies can be used to address the issue [23]. Course syllabi do not fully reflect the course content and learning


experiences of students. The findings from the current study will benefit from being seen in conjunction with results from quantitative and mixed methods approaches. Our findings can inform these future investigations, where, for instance, methods such as content analysis may provide a clearer picture of which themes and sub-themes are more frequently featured in MOOCs. Lastly, the impact of MOOCs needs to be assessed to generate insights into the effectiveness of using the format across diverse contexts and populations. We are yet to understand if the MOOC format ensures adequate knowledge retention and if the benefits of course completion are limited to information acquisition or can extend to attaining competencies [4]. The pedagogical benefits of MOOCs may differentially translate into positive learning outcomes based on the core competency and linked knowledge and skills being taught. Future research can help to support the format’s effective use by determining which areas of outbreak response are more feasible, compatible, and beneficial to be taught in MOOC formats than others.

5 Conclusion

This research study sought to examine MOOCs on outbreak and pandemic management. While there exist diverse opinions on the contributions of MOOCs, the courses examined as part of the study showcased the use of MOOCs in creating resources to strengthen competencies in outbreak management across a multitude of settings, populations, and educational levels. Existing MOOCs provide knowledge and skills training to strengthen outbreak response by building the personal and professional capacities of individuals involved in health emergency management. The courses also worked towards strengthening the wider health system response through institutional capacity building. The study found that MOOCs can be viewed as educational resources for community resilience in how they provide a unique format for communities to engage with health information and public health measures to make informed decisions and support one another. Learning resources on the application of data in outbreak and pandemic management, however, are limited and restricted to epidemiological aspects. The growing role of data science in public health research and practice was not found to be reflected in available outbreak management courses. Public health data science capacities in the health workforce will be needed to strengthen response to outbreaks and pandemics. MOOCs with strong open-access features present a promising opportunity to deliver public health data science education, integrating domain expertise, public health ethics, and data science methods.


References 1. Iniesto F, McAndrew P, Minocha S, Coughlan T (2016) Accessibility of MOOCs: understanding the provider perspective. J Interact Media Educ 2016(1) 2. Zhang Q, Gao J, Wu JT, Cao Z, Zeng DD (2022) Data science approaches to confronting the COVID-19 pandemic: a narrative review. Philos Trans R Soc A 380(2214):20210127 3. Aldridge RW (2019) Research and training recommendations for public health data science. Lancet Public Health 4(8):e373 4. Hunter DJ, Lapp I, Frenk J (2014) Education in public health: expanding the frontiers. Am J Prev Med 47(5):S286–S287 5. Gooding I, Klaas B, Yager JD, Kanchanaraksa S (2013) Massive open online courses in public health. Front Public Health 1:59 6. Bhattacharya S, Singh A, Hossain MM (2020) Health system strengthening through Massive open online courses (MOOCs) during the COVID-19 pandemic: an analysis from the available evidence. J Educ Health Promot 9 7. Bendezu-Quispe G, Torres-Roman JS, Salinas-Ochoa B, Hernández-Vásquez A (2017) Utility of massive open online courses (MOOCs) concerning outbreaks of emerging and reemerging diseases. F1000Research 6 8. World Health Organization (2018) Multisectoral and intersectoral action for improved health and well-being for all: mapping of the WHO European region. Governance for a sustainable future: improving health and well-being for all. No. WHO/EURO: 2018-2667-42423-58849. World Health Organization. Regional Office for Europe 9. Ebben M, Murphy JS (2014) Unpacking MOOC scholarly discourse: a review of nascent MOOC scholarship. Learn Media Technol 39(3):328–345 10. Altbach PG (2014) MOOCs as neocolonialism: who controls knowledge? Int High Educ 75:5–7 11. Adam T (2019) Digital neocolonialism and massive open online courses (MOOCs): colonial pasts and neoliberal futures. Learn Media Technol 44(3):365–380 12. Allotey P, Reidpath D, Certain E, Vahedi M, Maher D, Launois P, Ross B (2021) Lessons learned developing a massive open online course in implementation research in infectious diseases of poverty in low-and middle-income countries. Open Praxis 13:127. https://doi.org/ 10.5944/openpraxis.13.1.1172 13. Hansen JD, Reich J (2015) Democratizing education? Examining access and usage patterns in massive open online courses. Science 350(6265):1245–1248 14. Nieder J, Schwerdtle PN, Sauerborn R, Barteit S (2022) Massive open online courses for health worker education in low-and middle-income countries: a scoping review. Front Public Health 10 15. Christensen G, Steinmetz A, Alcorn B, Bennett A, Woods D, Emanuel E (2013) The MOOC phenomenon: who takes massive open online courses and why?. Available at SSRN 2350964 16. Suter F, Lüthi C (2021) Delivering WASH education at scale: evidence from a global MOOC series. Environ Urban 33(1):99–116 17. Ved R, Scott K, Gupta G, Ummer O, Singh S, Srivastava A et al (2019) How are gender inequalities facing India’s one million ASHAs being addressed? Policy origins and adaptations for the world’s largest all-female community health worker programme. Hum Resour Health 17:3. https://doi.org/10.1186/s12960-018-0338-0 18. Morgan R, Ayiasi RM, Barman D, Buzuzi S, Ssemugabo C, Ezumah N et al (2018) Gendered health systems: evidence from low- and middle-income countries. Heal Res Policy Syst 16:58. https://doi.org/10.1186/s12961-018-0338-5 19. Impey C, Formanek M (2021) MOOCS and 100 Days of COVID: enrollment surges in massive open online astronomy classes during the coronavirus pandemic. Soc Sci Humanit Open 4(1):100177 20. 
Haldane V, Jung A-S, De Foo C, Bonk M, Jamieson M, Wu S, Verma M et al (2021) Covid-19 preparedness and response: implications for future pandemics: strengthening the basics: public health responses to prevent the next pandemic. BMJ 375


21. Goldsmith J, Sun Y, Fried L, Wing J, Miller GW, Berhane K (2021) The emergence and future of public health data science. Public Health Rev 4 22. Baker PRA, Dingle K, Dunne MP (2018) Future of public health training: what are the challenges? What might the solutions look like? Asia Pac J Public Health 30(8):691–698 23. Fidalgo-Blanco Á, Sein-Echaluce ML, García-Peñalvo FJ (2016) From massive access to cooperation: lessons learned and proven results of a hybrid xMOOC/cMOOC pedagogical approach to MOOCs. Int J Educ Technol High Educ 13(1):1–13

Data Science Approaches to Public Health: Case Studies Using Routine Health Data from India

Arun Mitra, Biju Soman, Rakhal Gaitonde, Tarun Bhatnagar, Engelbert Nieuhas, and Sajin Kumar

Abstract The promise of data science for social good has not yet percolated to public health, where the need is greatest but the priority is low. The lack of a data use policy or culture in Indian health information systems could be one of the reasons for this. Learning from global experiences of how routine health data has been used might benefit India as a newcomer to the field of digital health. The current study aims to demonstrate the potential of data science in transforming publicly available routine health data from India into evidence for public health decision-making. Four case studies were conducted using expanded data sources to integrate data and link various sources of information. Implementing these data science projects required developing robust algorithms using reproducible research principles to maximize efficiency. They also led to new and incremental challenges that needed to be addressed in novel ways. The paper demonstrates that data science has immense potential for applications in public health. Additionally, a data science approach to public health can ensure transparency and efficiency while also addressing systemic and social issues such as data quality and health equity.

A. Mitra · B. Soman (B) · R. Gaitonde Sree Chitra Tirunal Institute for Medical Sciences and Technology, Trivandrum, Kerala, India e-mail: [email protected] A. Mitra e-mail: [email protected] R. Gaitonde e-mail: [email protected] T. Bhatnagar National Institute of Epidemiology, Chennai, India e-mail: [email protected] E. Nieuhas University of Koblenz and Landau, Landau, Germany e-mail: [email protected] S. Kumar University of Kerala, Trivandrum, Kerala, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_63


Keywords Health information systems · Data science · Public health · Decision-making · Deep learning · Digital health · ICT4D

1 Introduction

The dawn of the digital era has transformed our society in many ways. Rapid advances in technology and innovation have brought about unprecedented changes in our daily lives. These changes brought about by the digital revolution have touched all sectors and industries. Health care has likewise evolved and is well along its digitalization journey [1–3]. Innovations in drug discovery, vaccine research, diagnostic modalities, and therapeutic interventions have significantly impacted human life expectancy and how humans live their lives. In addition, evolving habits, interactions, and lifestyles brought about by technological innovations have directly and indirectly affected human health. Over the last few decades, the emergence of the information age has accelerated this transformation, opening windows of opportunity to new possibilities. It has become increasingly clear that data collected in the healthcare industry has become more complex, huge in volume, and generated faster than ever before [4, 5]. The types of data generated through health information systems have also increased significantly, moving from paper-based systems to machine-generated data such as medical imaging, wearable technology, IoT devices, remote sensing sensors, and satellite imagery [6, 7]. Data on health is considered a national asset for every country. It helps track the performance of the health system. It provides reliable information for decision-making across all health system building blocks. It is also vital for monitoring the overall objectives of health systems, such as changes in health status and outcomes over time. Therefore, health information is necessary for decision-making at all levels: local, regional, and national. A national health information system typically collects crucial health-related data for various purposes. Health information systems are integral to the health system, providing specific information support to the decision-making process at multiple levels. Just like a health system, a health information system is not static and must adapt to the changes that occur with time. A health information system's primary functions are to monitor trends in health outcomes and services, support quick and efficient public health decisions, identify the best strategies for public health interventions, and ensure the coordination and equity of health services, while also keeping data trustworthy and managing resources for optimal use and benefit. Routine health information systems comprise data that is collected and collated at regular intervals at different levels from health facilities and health programmes, including public, private, and community levels. This data provides a picture of the health status of the population, the health services delivered by the health system, and the health resources available for utilization. The data from the routine health information system is both actively and passively captured by healthcare providers as they go about their work, by


their supervisors for the purposes of monitoring, and through routine health facility surveys for the purposes of health governance and administration. The role of a routine health information system (RHIS) is critical to achieving Universal Health Coverage (UHC), and it is of paramount importance for decision-making in public health. Traditional health data has been limited to clinical health information systems, and data from the vertical national health programmes has been used for surveillance and monitoring purposes. Digitalization of paper-based systems, implementation of electronic medical/health records, and integration of multiple databases, including sociodemographic, climatic, environmental, and economic data, have created the need to grow the traditional RHIS into an expanded RHIS that is more dynamic, responsive, and proactive. However, one of the major challenges faced by the expanded RHIS is the issue of transforming data into evidence for decision-making in public health. Existing data analysis methods and techniques have their limitations, owing not just to the volume and velocity of the generated data but also to the requirement that evidence generation be timely and transparent in order to earn public trust. The true challenge is transforming data into information and knowledge for public health benefit. It has become evident that innovative solutions to generate robust, near real-time, high-quality, actionable evidence for public health decision-making are the need of the hour. With this background, the current study aims to demonstrate the utility and potential of data science as a tool to address the challenge of transforming routinely available data into evidence for informed decision-making in public health.

2 Review of Literature 2.1 What is Routine Health Data? Routine health data often refers to non-targeted data obtained passively from different health information systems. They are collected continuously at various periods (daily, weekly, monthly, annually, etc.) and can be collected individually patient by patient or aggregated at the family, facility, or geographic level. They come from the existing national or regional health information system and its subsystems that are collected as part of an ongoing routine. It can be categorized as (a) demographic data, (b) health (disease) events data, (c) population-based health-relevant information, and (d) data from other disciplines such as climatic data and meteorological data. One other way of classification of routine data could be as (a) administrative data, (b) clinically generated data, (c) patient-generated data, and (d) machine-generated data [8].


2.2 Sources of Routine Health Data? There can be many sources of routine health data. They can be classified as standard sources or expanded sources [6, 9]. Standard. Standard sources are traditional sources of health data and can be divided into three different sub-divisions again—(i) health services, (ii) public health, and (iii) research. The sources of health services data could be clinical records, electronic health records, electronic medical records, prescription data, diagnostic data, laboratory information, and data on health insurance. The sources relating to public health could be data on disease surveillance, immunization records, public health reporting data, vital statistics, disease registries, civil registration systems, etc. Research data sources could include demographic surveillance sites, omics or genomics data, clinical trial registries, bio-banks, medical devices, and the Internet of medical Things (IoMT). Expanded. Expanded data sources are newly identified sources that were traditionally not considered sources of health data. These sources can be further classified into (i) environmental, (ii) lifestyle and socio-economic, and (iii) behavioural and social. Environmental sources include climate, meteorological, transport, pollution, forest cover, and animal health. The data sources on lifestyle and socio-economic could be location-tracking information, financial data, education, and data from various relevant mobile applications. Similarly, the behavioural and social data sources are data on wellness, fitness, Internet use, social media, self-monitoring devices, wearables, and IoT sensors.

2.3 Potential of Routine Health Data

There is a lot of potential for using routine data in informed decision-making in public health. It improves the utilization of the health information already being collected. It saves time by leveraging already available data and adds no or low additional costs to the health system. It helps in generating new hypotheses, testing these hypotheses, and comparing across populations and high-risk groups. It also provides a baseline estimate of the expected levels of health and disease in a specific population and allows for comparisons across geographic regions. As it is passively collected information, it makes it possible to conduct natural experiments that are unbiased and aid in informed decision-making. It is also comprehensive and captures the forward motion of time, allowing for longitudinal analysis of health outcomes and states.


2.4 Indian Context

It is argued that there are three distinct stages of digital health evolution, and India has crossed the first stage of digitalization, where most paper-based systems are being captured in their electronic forms, including X-rays and CT films. As a country, India has now stepped into the second stage of digitalization, where new forms of digital data are being collected, such as mobile health and health monitoring devices like wearables, the capture of health information in the form of electronic health records and electronic medical records, and telemedicine consultations in both public and private health facilities. To achieve complete digitalization, healthcare systems (both public and private) must embrace digital workflows. This stage also gives rise to new opportunities and business models, which have already started in some metropolitan cities but are yet to percolate to the rural and grass-root level where the need is most. The final stage of the digital evolution is digital transformation, where good quality data from the health system (including routine health information systems, clinical health information systems, and both public and private health facilities) is envisioned to be leveraged through government initiatives like the National Health Stack, National Digital Health Mission, Electronic Health Record Standards, Integrated Health Information Portal, Health ID, Universal Payment Interface, etc. [6, 10, 11]. However, the current state of the Indian health information system reveals multiple challenges that must be addressed before achieving the ultimate digital transformation. Currently, routine health data is not utilized to its full potential in India. Only about five per cent of all healthcare data in India is collected, and of this, only a tiny fraction is used for public health decision-making. Some of the underlying reasons are the governmental focus on centralizing health information systems with a limited emphasis on supporting local action, the use of proprietary platforms, and working in silos. Some issues that might arise could be the probability of data breaches, data storage and ownership, data quality, and standardization. In-depth studies on the use of data in the Indian health sector suggest that the decision-making process is central to the health system. To function efficiently, the decision-making process should incorporate an iterative cycle of generating data demand, data collection and analysis, information availability, and the use of information for decision-making. Implementing this will lead to improved health decisions and accountability. Some of the recommendations arising from this research are to train healthcare professionals on the importance of data use and to encourage its use in the decision-making process. Additionally, support all stakeholders, involve civil society groups, conduct comprehensive needs assessments, implement data standards, and conduct regular data audits.


2.5 Challenges of Routine Health Data

Some of the critical challenges and questions to consider when using any data collected by the routine health information system to infer population health are as follows:

Data Quality. Is the data of good quality? To what extent is the data accurate in capturing the natural phenomenon? Are there any inherent data quality issues or biases present in the data?

Precision. Can we comment on the level of uncertainty in the data? Can we provide a confidence interval or any other measure of uncertainty along with the estimate?

Completeness and Coverage. Is the data from the routine health information system complete? Is it representative of the population at hand? Is there any missing data? If yes, how much? Are there any patterns of missingness?

Timeliness. When was the data in question collected? Is it still relevant? Are there any newer, more recent data sources that might be more relevant?

Analysis. What are the different analytical approaches that have been applied to the data? Has the data from the routine health information system been analysed adequately? Has there been any standardization done to this data? Is there adequate documentation mentioning the pre-processing done? Has it been done in a way that is reproducible and can be peer-reviewed?

Accessibility. Who owns the data at hand? Who has access to this data? Who controls access to this data?

Confidentiality. Is the data at hand confidential? Does it contain any sensitive information? Can individuals be identified from this data? Has it been anonymized?

Original purpose of collection. Some of the data may contain personal information and may not be used for any other purpose unless informed consent is obtained.

Motivation. The other major challenge is that all the stakeholders involved in the data generation process of a routine health information system may not be adequately motivated.

Some additional challenges associated with the use of routine health information systems are given below [12].

Technical Determinants. Examples of these are the lack of explicit information such as gender and age; different definitions in different formats (paper vs electronic); lack of data on emerging infections and diseases; and lack of integration with data from other sources at the community level.

Behavioural Determinants. Some of the behavioural determinants are misclassification of conditions by healthcare workers; incomplete data collection; incorrect input due to factors like recall bias by the healthcare worker; delay in submission of data, which may sometimes be due to the delay in salaries; data errors due to computational mistakes; and difficulty in understanding feedback from the central level.


Environmental/Organizational Determinants. Shortage of precious resources, including time, money, and human resources; lack of trained personnel; non-prioritization of local needs in decision-making; inadequate data analysis and interpretation; and lack of inter- and intra-departmental coordination on data sharing, among others, constitute the environmental/organizational determinants. Some of the lessons we can learn from the experiences of other countries are outlined in Table 1.
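Many of the checks listed above (completeness, duplication, timeliness, plausible ranges) lend themselves to automation. Below is a minimal, illustrative R sketch of such checks; the data frame and its column names (record_id, date_of_event, date_reported, age) are hypothetical placeholders rather than fields of any specific Indian reporting format.

```r
# Illustrative data-quality checks for a routine health line-list.
# Column names are hypothetical; adapt them to the actual reporting format.
library(dplyr)

quality_report <- function(df) {
  list(
    # Completeness: share of missing values per variable
    missingness = sapply(df, function(x) mean(is.na(x))),
    # Duplication: records sharing the same identifier
    duplicates = sum(duplicated(df$record_id)),
    # Timeliness: lag (in days) between the event and its report
    reporting_lag = summary(as.numeric(df$date_reported - df$date_of_event)),
    # Simple range check as one example of outlier screening
    implausible_age = sum(df$age < 0 | df$age > 110, na.rm = TRUE)
  )
}

# Example usage with a toy line-list
toy <- data.frame(
  record_id     = c(1, 2, 2, 4),
  date_of_event = as.Date(c("2021-01-01", "2021-01-03", "2021-01-03", NA)),
  date_reported = as.Date(c("2021-01-04", "2021-01-10", "2021-01-10", "2021-01-20")),
  age           = c(34, 210, 210, 56)
)
quality_report(toy)
```

Such a report is only a starting point; the thresholds and variables checked would need to reflect the definitions of the specific routine health information system.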

3 Case Studies

The authors chose to demonstrate the potential of public health data science through four case studies, each using publicly available data from different sources, formats, and levels:

• Case Study 1—using crowd-sourced data on COVID-19 in India at the national level
• Case Study 2—using mortality data from the civil registration system at the panchayat level
• Case Study 3—using periodic survey data to inform policy on maternal health at the district level
• Case Study 4—using medical image data to design innovative solutions with applications in telemedicine and tele-ophthalmology.

3.1 Case Study 1—The AMCHSS COVID-19 Dashboard

For this case study, a translational research design with a data science approach was chosen. The data sources used for this case study include:

• Crowd-sourced database and website maintained at https://covid19bharat.org
• State-Level Daily Medical Bulletins (for the state of Kerala)
• District-level population density based on 2020 projections by the NASA Socioeconomic Data and Applications Center (SEDAC) and the Unique Identification Authority of India, Govt. of India [19, 20]
• Population Mobility Data from Google (https://www.google.com/covid19/mobility/)
• Indian COVID-19 Genome Surveillance Data (https://clingen.igib.res.in/covid19genomes/)

The tools used were the open-source statistical programming language R and the RStudio IDE [21, 22]. The study was implemented adhering to tidy data principles and the tenets of reproducible research recommended by the scientific community [23–27]. The packages used were available from CRAN and from the R Epidemics Consortium (RECon) [28, 29].


Table 1 Lessons learned from use cases of routine health data globally

Australia [13, 14]:
• Importance of continually engaging and incentivizing national and local stakeholders
• Within the Australian government structure, there are defined roles and responsibilities for the different layers of government, advisory committees, and research centres
• The Department of Health and Ageing allows these groups to collaborate better and support each other in decision-making activities

Europe [14–16]:
• Strong political will and support for the collection of credible, population-based data
• Efforts of non-governmental organizations to improve data collection and measurement
• Ensuring the data used for making decisions were valid, reliable, and current
• Data transparency contributed to the success of the UK's hospital waiting lists policy reform; research using hospital waiting list data was regularly reported in the media and leading medical journals, raising public and health professional awareness
• Although the government understands the importance of collecting data and measuring policy effectiveness, policies must re-compete for priority in every political election cycle
• Another barrier to long-term reform may be its sustainability; eventually, the costs involved may become higher than the incremental gains
• Maintaining data collection over time is important for evaluating temporal trends and for determining whether a shift in policy focus will be required

USA [13, 14]:
• Retrospective medical record databases not only provide readily available data for timely decision-making, but the fact that the same data is analysed by policy decision-makers, academic researchers, and the pharmaceutical industry enhances the credibility and transparency of the findings
• Funding for this research falls short of the demand
• Researchers trained in pharmacoepidemiology, drug safety, and risk management are needed in the USA to increase research capacity for this important policy-relevant work

Scandinavian countries [17, 18]:
• Improving user-friendliness and adding even more health indicators to monitor the state of social inequalities in health

Developing countries:
• Reporting requirements must be able to change over time
• Programme reporting requirements must be integrated in order to ensure the development of coherent information
• Need for a hierarchy of information needs
• Additional information can be collected through specific programme surveys


Table 2 Packages used for the implementation of the COVID-19 dashboard

| Clean, Tidy, Link | Epidemiology | Visualization | Dashboard |
|-------------------|--------------|---------------|---------------|
| tidyverse | incidence | ggplot2 | flexdashboard |
| dplyr | R0 | plotly | shiny |
| fuzzyjoin | projections | dygraphs | shinybulma |

Table 3 Intended target audience for the COVID-19 dashboard case study

| Intended users | Intended purpose | Mode |
|----------------|------------------|------|
| District level programme manager | Evidence informed decision-making; feedback on future improvements | Interactive dashboard |
| Decision-makers | Evidence informed decision-making | Dashboard |
| Academia and researchers | Discussion on methods; peer-review; academic discourse | Dashboard + Methodology |
| General public | Citizen involvement, public discourse, media, and information professionals | Website/Blog |

The full list of packages used is given in Table 2, and the intended users of the dashboard are summarized in Table 3. The detailed methodology is described elsewhere [30–32]. The case study results can be viewed as a live dashboard hosted at https://amchss-sctimst.shinyapps.io/covid_dashboard/ for public viewing. Figures 1, 2, 3, 4, 5, 6, 7, 8, and 9 illustrate the salient features of the dashboard; each figure caption describes the corresponding feature. A minimal sketch of the kind of incidence calculation behind these figures follows below.
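To give a flavour of the epidemiological calculations behind Figs. 3 and 5, here is a minimal R sketch using the incidence package listed in Table 2; the input file name and the column name date_of_onset are hypothetical placeholders, and the authors' full pipeline is the one described in [30–32].

```r
# Minimal sketch: daily incidence and doubling time from a line-list of case dates.
# The CSV file and its column name are illustrative, not the dashboard's actual inputs.
library(incidence)

cases <- read.csv("state_daily_cases.csv")   # hypothetical input file
onset <- as.Date(cases$date_of_onset)

i <- incidence(onset)   # daily incidence object
plot(i)                 # quick epi-curve

# A log-linear fit over the growth phase gives the growth rate and doubling time,
# the quantity visualized state-wise in Fig. 3
f <- fit(i)
f$info$r          # estimated daily growth rate
f$info$doubling   # estimated doubling time in days
```

The same incidence object can be passed to the R0 and projections packages from Table 2 to estimate reproduction numbers and project short-term case counts, which is the kind of output shown in Fig. 5.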

3.2 Case Study 2—Cause of Death The Civil Registration System (CRS) may be defined as a unified process of continuous, permanent, compulsory, and universal recording of the vital events and characteristics thereof, per the country’s legal requirements [33]. CRS in India dates back to the mid-nineteenth century. It started with the registration of deaths with a view to introducing sanitary reforms for control of pestilence and diseases. CRS in Kerala came into force on 1 April 1970. As of today, CRS in Kerala is computerized in corporations, municipalities, and rural registration units (gram panchayat). The registry is being supported by the ‘Sevana’ Civil Registration Software developed by Information Kerala Mission (IKM), set-up for computerization of local bodies (https://cr.lsgkerala.gov.in). The CRS records prior to the date of computerization have been digitized and the issue of certificates is also computerized.


Fig. 1 Screenshot of the AMCHSS COVID-19 dashboard (https://amchss-sctimst.shinyapps.io/covid_dashboard/)

Fig. 2 Modelling epidemiological parameters based on different COVID-19 waves in India

For this case study, we chose CRS data from the Trivandrum Corporation. The study area is presented below (see Fig. 10). Though data is collected in real time, the analysis is performed and reported only once a year as part of the Annual Vital Statistics Report [34, 35]. Some of the challenges with the current CRS in Kerala are as follows:

• Pre-occupation of Registrars and other functionaries with other duties.
• Excessive delay in registrations and reporting.
• Inability to use the data for local action.
• Large proportion of infant and child deaths missed (especially in rural areas).
• Data from various LSGs are collected, but it is unclear how the data are used.
• Lack of user-friendly information systems for the decision-makers.
• Data is available to decision-makers only as a PDF file, which cannot be used readily for data analysis.


Fig. 3 Interactive visualization of the state-wise comparison of doubling time of the number of COVID-19 cases in India during the third COVID-19 wave

Fig. 4 3D visualization of the trends in test positivity rate (TPR) across different states between January 2021 and July 2022

We sought to explore the use of data science to address some of these challenges. The data extraction pipeline is presented in Fig. 11, and a minimal sketch of the PDF extraction step is given below. Results of the case study are presented as illustrations in Figs. 12, 13, 14, and 15.
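As an illustration of the scraping step in Fig. 11, the sketch below uses the pdftools and stringr packages to pull record lines out of a report PDF. The file name and the assumed line layout (a serial number, a local body name, and a trailing count) are hypothetical and would need to be adapted to the actual layout of the Sevana/vital statistics reports.

```r
# Hedged sketch of extracting tabular records from a vital statistics PDF.
library(pdftools)
library(stringr)

pages <- pdf_text("annual_vital_statistics_2020.pdf")  # one string per page

# Break pages into lines, collapse whitespace, and keep lines that look like
# data records, i.e. lines starting with a serial number
lines   <- str_squish(unlist(str_split(pages, "\n")))
records <- lines[str_detect(lines, "^\\d+\\s")]

# Assumed record layout: "<sl_no> <local body name> <count>"
m <- str_match(records, "^(\\d+)\\s+(.*?)\\s+(\\d+)$")
deaths <- data.frame(
  sl_no      = as.integer(m[, 2]),
  local_body = m[, 3],
  count      = as.numeric(m[, 4]),
  stringsAsFactors = FALSE
)

head(deaths)
```

Once in a tidy data frame, such records can be geocoded and linked with other sources, which is the linkage step shown in Fig. 11.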


Fig. 5 Forecasting 15-day daily future incidence of COVID-19 cases based on the trajectory of time dependent reproduction number (projections for three different scenarios for the state of Telangana shown for illustration)

Fig. 6 Illustrative example of semi-automated report generation using reproducible algorithms for interactive and timely information for decision-making in public health

Some of the possible benefits of using the data science approach are as follows:

• Encourage a data use policy
• Improve data collection, processing, validation, and the efficiency of the system
• Integrate data from different sources
• Timely and actionable evidence for public health interventions
• Allows for spatial and temporal analysis (use of geocoded information)
• Transformation of text data from PDF into analysis-ready formats
• Automated report generation (see the sketch below).
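The last benefit can be illustrated with a short, hedged sketch of parameterized report generation using rmarkdown; the template file name and its parameters are hypothetical, and the template is assumed to declare matching entries under params: in its YAML header.

```r
# Minimal sketch of automated report generation: one parameterized R Markdown
# template rendered once per local body. Names are illustrative assumptions.
library(rmarkdown)

local_bodies <- c("Trivandrum Corporation", "Nedumangad Municipality")

for (lb in local_bodies) {
  render(
    input       = "cod_report_template.Rmd",            # hypothetical template
    params      = list(local_body = lb, year = 2020),   # passed into the template
    output_file = paste0("cod_report_", gsub(" ", "_", lb), "_2020.html")
  )
}
```

This is the kind of reproducible, scalable report generation illustrated in Figs. 6 and 15.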


Fig. 7 Section detailing an in-depth analysis addressing the issue of gender equity in access to COVID-19 vaccination using data from the COWIN portal

Fig. 8 Example of customizing the code for creating state-level dashboards for monitoring the status of COVID-19 at the district level (Kerala state for illustration)

3.3 Case Study 3—Increasing Trends of Caesarean Sections in India Globally, it is estimated that one in five children are born by caesarean deliveries and this is going to increase to three in five by the year 2030. The continued rise in


Fig. 9 Screenshot of the dashboard for Karnataka state demonstrating the scalability of the algorithms to quickly create decision support tools for local action at the district level

Fig. 10 Study area: Kerala state, Thiruvananthapuram district, Trivandrum corporation (inset)

caesarean sections has been a cause of concern in low- and middle-income countries (LMICs) like India for many reasons, including post-partum bleeding, infection, complications, or even death of the mother and the child. Previous studies show that growing disparities in access to quality maternal health care, as well as the inequitable distribution of these services across geographical regions and sociocultural contexts, contribute to caesarean sections. However, evidence on these complex relationships is still emerging, and the need for an in-depth analysis in the Indian context is critical. Geospatial techniques using data from large-scale national surveys like


Fig. 11 Schema of the process of data extraction from raw data (*.pdf files) into a unified database for case study 2. Note the steps of scraping of data, pre-processing data and data linkage using reproducible methods

Fig. 12 Distribution of the infectious causes of death registered in Trivandrum corporation in the years 2017, 2019, and 2020

the National Family Health Survey can unearth crucial evidence on the patterns of medically unnecessary and potentially harmful caesarean sections in India. This much-needed exploration has the potential to inform public health policy and provide opportunities for course correction in maternal and child health service delivery in India.


Fig. 13 Some summary statistics of the cause of death segregated by age and number of deaths (the colour scheme represents the ranking, blue representing the top cause of death that year)

Fig. 14 Geographical distribution of cause specific mortality rate of non-communicable diseases across Trivandrum district. One can appreciate the spatial relationship with higher rates (in red colour) around the central part of the district which is home to the urban population

The objectives of this case study were to study the patterns of caesarean section at the state and district level in India and investigate spatial clustering of caesarean sections across districts of India.


Fig. 15 Reproducible and scalable algorithms to generate automated reports for improving efficiency and promoting the use of cause of death data from the CRS in Trivandrum corporation in the years 2017, 2019, and 2020

The data extraction methodology reused reproducible algorithms developed for similar situations, with minimal modification and customization (see Figs. 16 and 17).

Fig. 16 Data extraction and preparation schema for case study 3. The reproducible framework has been adapted from case study 2 (see Fig. 11) with an additional step of spatial analysis and visualisation


Fig. 17 Summary of the data analysis and visualization methods and tools used for the study

Results

There has been a steady increase in caesarean sections in India over the last 15 years (see Fig. 18). The proportion of caesarean sections in India rose from 17.2% in NFHS-4 (2015–16) to 21.5% in NFHS-5 (2019–21). In NFHS-5, 21 states and union territories had a proportion of C-section births greater than the national rural average (17.6%), while 17 states and union territories exceeded the national urban average (32.3%). The proportion of caesarean births was higher in private hospitals than in public hospitals in NFHS-5 (see Fig. 19). Further investigation for evidence of clustering of districts with high caesarean section rates revealed highly significant clustering at the global level (Moran's I, p-value < 0.01). Similarly, statistically significant clustering was observed at the local level (Getis-Ord General G, p-value < 0.01) in both public and private hospitals (see Table 4).
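A minimal sketch of how such global and local clustering statistics can be computed with the spdep package in R is given below; the shapefile name and the variable csection_pct are illustrative assumptions, and the authors' actual workflow is the one summarized in Figs. 16 and 17.

```r
# Hedged sketch of spatial clustering tests on district-level C-section rates.
library(sf)
library(spdep)

districts <- st_read("india_districts_nfhs5.shp")   # hypothetical district layer

# Queen-contiguity neighbours and spatial weights
nb   <- poly2nb(districts)
lw_w <- nb2listw(nb, style = "W", zero.policy = TRUE)   # row-standardized
lw_b <- nb2listw(nb, style = "B", zero.policy = TRUE)   # binary, for the G test

# Global clustering: Moran's I and the general G statistic
moran.test(districts$csection_pct, lw_w, zero.policy = TRUE)
globalG.test(districts$csection_pct, lw_b, zero.policy = TRUE)

# Local indicators of spatial association (LISA), as mapped in Fig. 22
lisa <- localmoran(districts$csection_pct, lw_w, zero.policy = TRUE)
head(lisa)
```

The local statistics can then be joined back to the sf object and mapped to produce hotspot and LISA cluster maps of the kind shown in Figs. 20, 21, and 22.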

Fig. 18 Spatial distribution of caesarean births across the three rounds of national family health surveys


Fig. 19 Distribution and increase of the proportion of caesarean births across different types of health facilities in rural and urban India

Table 4 Summary of the clustering analysis of districts with high rates of caesarean sections by type of facility and NFHS round

| Facility type | NFHS round | G statistic | Z score | p-value |
|---------------|------------|-------------|----------|----------|
| Overall | NFHS-4 | 0.01099 | 18.67214 | < 0.0001 |
| Overall | NFHS-5 | 0.01081 | 19.13736 | < 0.0001 |
| Public | NFHS-4 | 0.0081 | 8.28497 | < 0.0001 |
| Public | NFHS-5 | 0.00808 | 9.00839 | < 0.0001 |
| Private | NFHS-4 | 0.01082 | 19.18451 | < 0.0001 |
| Private | NFHS-5 | 0.00749 | 17.28355 | < 0.0001 |

The urban–rural differential in quality maternal care may be widening, with unnecessary and medically unwarranted caesarean sections being performed in rural hospitals. Many of these caesarean sections are being brought about by the recent proliferation of private medical facilities in rural India (see Figs. 20, 21 and 22). Geospatial analysis can reveal crucial insights into where the disproportionate increase in caesarean sections is taking place, which may aid in designing tailored public health interventions and in arriving at the policy decisions necessary for course correction. One limitation is that we did not take into account factors such as total fertility rate, health insurance, and the possibility of repeat caesarean sections. These factors need to be kept in mind in future research. To conclude, the case study demonstrates the utility of spatial data science methods applied to large survey data to reveal inequities in maternal health care, with potential public health policy implications.


Fig. 20 Optimized hotspot analysis of caesarean sections (%) in India


Fig. 21 Hotspot analysis of clusters of districts with high and low caesarean sections (%) by type of health facility (public vs private)

Fig. 22 Local indicators of spatial association (LISA) for the cluster analysis of districts with high caesarean rates


3.4 Case Study 4—Retinal Disease Classification

A number of years have passed since telemedicine and tele-ophthalmology were introduced; however, they have become more important in recent years as a result of the COVID-19 pandemic. A recent survey (n = 1180) by the All India Ophthalmological Society reveals that currently 17.5% of ophthalmologists use tele-ophthalmology in their practice, 98.6% showed increased interest in incorporating tele-ophthalmology into their practice, and 98.8% of practicing ophthalmologists view mobile-based applications as having huge potential for tele-ophthalmology solutions. Technological advances and the ability to directly visualize and image the eye have made ophthalmology an ideal specialty for the use of telemedicine. The application of tele-ophthalmology is not a new method of care or technique; however, the implementation of artificial intelligence (AI) is set to lead to a great transformation in its use. Deep convolutional neural networks have shown promise in predicting and classifying retinal diseases and have immense public health implications, especially in low-resource settings like India.

The objectives of this case study were to classify diseases of the retina (CNV, DME, Drusen, and normal) based on optical coherence tomography (OCT) images using deep neural networks, to build a fully connected network/convolutional neural network and compare the results with existing model architectures, and to deploy the final model to production as a mobile application.

The data set contains 84,495 OCT images in *.JPEG format, taken from various cohorts of adult patients from 5 hospitals between 1 July 2013 and 1 March 2017 [36]. There were four categories of retinal images:

– Choroidal neovascularization (CNV)
– Diabetic macular oedema (DME)
– Drusen
– Normal

The computational resources used were as follows:

– RStudio and Google Colab were used for building and training the models
– CPU: Intel Xeon CPU @ 2.3 GHz
– GPU: Nvidia Tesla P100 (16 GB)
– PyTorch library in Python and the torch package in R.

The methods for the case study include randomly dividing the images into training, test, and validation data sets in a proportion of 70%, 20%, and 10%, respectively. The data were explored for class imbalance, and discrepancies were rectified using an upsampling method. The images were then classified into the four classes of retinal disease using a custom deep convolutional neural network. Subsequently, a transfer learning approach was adopted to train two further models, ResNet-18 and MobileNetV2. The final model was selected using standard metrics such as accuracy, precision, and F1 score [37].
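A compact, illustrative transfer-learning sketch with the torch and torchvision packages for R is shown below (the case study itself used both PyTorch in Python and torch in R). The directory layout (one sub-folder per class under train/), image size, epoch count, and learning rate are assumptions for illustration, not the authors' exact configuration; input normalization is also omitted for brevity.

```r
# Hedged transfer-learning sketch: pretrained ResNet-18 adapted to 4 OCT classes.
library(torch)
library(torchvision)

device <- if (cuda_is_available()) torch_device("cuda") else torch_device("cpu")

# Images are assumed to sit in class-named sub-folders, e.g. oct_images/train/CNV/...
train_ds <- image_folder_dataset(
  "oct_images/train",
  transform = function(img) transform_resize(transform_to_tensor(img), c(224, 224))
)
train_dl <- dataloader(train_ds, batch_size = 32, shuffle = TRUE)

# Pretrained backbone with its final layer replaced for the 4 OCT classes
model <- model_resnet18(pretrained = TRUE)
model$fc <- nn_linear(512, 4)   # ResNet-18's penultimate features are 512-dimensional
model$to(device = device)

criterion <- nn_cross_entropy_loss()
optimizer <- optim_adam(model$parameters, lr = 0.001)

for (epoch in 1:25) {
  model$train()
  coro::loop(for (b in train_dl) {
    optimizer$zero_grad()
    output <- model(b[[1]]$to(device = device))
    loss   <- criterion(output, b[[2]]$to(device = device))
    loss$backward()
    optimizer$step()
  })
}
```

The same pattern, with model_mobilenet_v2() as the backbone and a held-out test loader for evaluation, yields the comparison summarized in Table 5.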


Fig. 23 Incorporating model interpretability

We also implemented model interpretability using Captum and Grad-CAM visualization for explainability of the artificial intelligence algorithm (see Fig. 23). The final model was developed into an Android mobile application using the TorchScript library. Validation of model predictions was done using a new set of OCT images. The MobileNetV2 model performed better than ResNet-18, achieving a training accuracy of about 97% and a test accuracy of 95%. The ResNet-18 model performed better in precision and recall, with an average F1 score of 0.94 versus 0.92. The average training time and inference time for both models were around 1 h 30 min and 0.02 s, respectively (see Table 5). The accuracy of a typical ophthalmologist in classifying an OCT image is ∼85%, and our best model (MobileNetV2) was able to perform with an accuracy of ∼96%. A typical ophthalmologist requires ∼2 s to classify one image, while our model required only ∼0.03 s for the same. Upon increasing the training data, the accuracy of the model increased (∼94% with 8,000 images versus ∼97% with 80,000 images). Through this case study, we demonstrate the application of deep neural networks in tele-ophthalmology using open-source tools and publicly available data. Leveraging an explainable artificial intelligence approach provides an edge over many existing 'black-box' neural networks, as it aids the clinician in decision-making. Mobile-based deep CNN models have huge potential to address many issues of public health importance. With adequate training, peripheral health workers can be utilized for delivering tele-ophthalmology services in hard-to-reach rural areas (see Fig. 24).
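The mobile deployment step can be sketched under the assumption that a TorchScript export of the trained network is what the Android application consumes; here model refers to the fitted network from the previous sketch, and the output file name is illustrative. The authors' actual Android packaging step may differ.

```r
# Hedged sketch: export the trained model as a TorchScript archive from R.
library(torch)

model$eval()
example_input <- torch_randn(1, 3, 224, 224)   # dummy input of the assumed OCT size
traced <- jit_trace(model, example_input)      # trace the forward pass
jit_save(traced, "oct_classifier.pt")          # serialized TorchScript module
```

The saved archive can then be bundled with a mobile application that loads TorchScript models.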

Table 5 Comparison of model parameters, hyper-parameters, model metrics of the three deep learning models

| Details | ConvNet | ResNet-18 | MobileNetV2 |
|---------|---------|-----------|-------------|
| Convolution layers | 3 | 20 | 52 |
| Parameters | 728,184 | 11,178,564 | 3,504,872 |
| Flops (GFLOPS) | 0.05 | 1.83 | 0.31 |
| Optimizer | Adam | Adam | Adam |
| Hyper-parameters: Epochs | 25 | 30 | 25 |
| Hyper-parameters: Learning rate | 0.001 | 0.001 | 0.001 |
| Computation time: Training | ~ 47 min | ~ 1 h 40 min | ~ 1 h 27 min |
| Computation time: Inference | ~ 0.09 s | ~ 0.003 s | ~ 0.02 s |
| Accuracy (%): Training | 98.0 | 96.4 | 97.1 |
| Accuracy (%): Validation | 65.9 | 94.6 | 94.1 |
| Accuracy (%): Test | 43.3 | 94.1 | 95.5 |

Fig. 24 Screenshots of the deployed mobile application

4 Discussion

We would like to reflect on the challenges faced in each of the case studies. The first case study (the COVID-19 dashboard) brought challenges of data pre-processing and linkage, where data from multiple sources had to be integrated with a spatial


attribute. This challenge was overcome by developing robust, reproducible algorithms that could be reused and customized for research needs. For example, algorithms written for comparing the first two waves of COVID-19 were repurposed with minimal effort and time during the onset of the third wave. Another challenge we faced was the lack of data availability, as the covid19india.org tracker website had to suspend operations because it was driven by volunteers with no funding. Fortunately, efforts by activists and data journalists helped revive this ongoing effort of citizen involvement through the new covid19bharat.org website, which filled the shoes of its predecessor. The second case study, on using CRS data to gain public health insights into cause of death and mortality patterns, brought new challenges: as the data was not in analysable formats, it required a lot of data preparation and geocoding of addresses. Another challenge was the organizational barriers to providing access to data to public health stakeholders, including researchers. One way some of these challenges were addressed was through stakeholder engagement and earning trust. The third case study, on caesarean section deliveries, had unique challenges such as broken links to publicly available district-level factsheets. The second challenge was the complexity involved in computationally extracting data in tidy formats from the PDF documents. Also, the cleaning of data required a lot of manual checks to prevent misinterpretation of information. Additionally, one challenge that was particularly difficult to solve was geocoding the data, as district names are not written uniformly across different data sets, including spatial and sociodemographic ones. The fourth case study required considerable expertise and knowledge of the fundamentals of deep learning techniques and of image processing and analysis. The case study also necessitated additional computational resources that may not be readily available to all researchers. Addressing the issues of clinical interpretability, explainability, and visualization of where the weights are being picked up, through Captum and Grad-CAM, was difficult to implement. The most difficult challenge for the author was to develop an app that can be deployed on an Android smartphone, especially because of the lack of prior experience or formal training in app development. Additional challenges faced in all the case studies included the issue of data quality. Considerable time was spent on all the case studies to address this challenge adequately. Some of the data quality issues faced were missing data, outliers, erroneous entries, and data entered in inconsistent formats. Some of these challenges and the approaches taken to address them were discussed in previous work by the authors [38–41]. Apart from this, the lack of adequate computational infrastructure for performing the analysis was persistent across most case studies. Despite these challenges, the authors were successful in demonstrating that public health data science has immense potential for transforming data from routine health information systems into evidence for public health. A data science approach to public health has varied applications, including health equity, pandemic or outbreak management, medical imaging, health policy, and maternal and child health. Empowered with data science tools and the FAIR principles for data management (Findable, Accessible, Interoperable, and Reusable) [42], the way forward for public health can be


more equitable, acceptable, patient-centric, community-oriented, participatory, and trusted.

5 Conclusion

To conclude, this study has many value additions:

• It adds value to data generated from routine large-scale national-level surveys.
• It adds to the quality of evidence generated from routine health information systems by integrating it with data from multiple sources. This data linkage and integration allows for a comprehensive view of the situation, which is not otherwise possible.
• It demonstrates the utility of data triangulation and of geographical information systems (GIS), spatial, and spatiotemporal methods. This adds a new spatial dimension to the data, greatly enhancing the information and its interpretation by decision-makers.
• It leverages the applications of novel methods like data science and spatial techniques by providing robust and reproducible frameworks for evidence generation. This becomes especially important when the data volume is vast and resources are low.
• It provides a framework to engage with programme managers and stakeholders in near real time to gain insights into patterns and dynamics of issues of public health concern.
• This exploration can also enhance data quality and encourage a data use policy in public health.
• It will help the routine public health surveillance system through watchful observation and corrective action.
• It will enable researchers to study epidemiological trends in individual factors and in geospatial, region-specific disease patterns.
• It has the potential to help in better resource allocation, reallocation, and mobilization.
• It can aid in decision support and inform policy decisions by providing robust epidemiological evidence, which is crucial for the optimal use of limited resources and improving overall health system efficiency.
• It can help us understand emerging public health threats and create an enabling environment for precision public health.

References 1. Odone A, Buttigieg S, Ricciardi W, Azzopardi-Muscat N, Staines A (2019) Public health digitalization in Europe. Eur J Public Health 29:28–35


2. Chiolero A, Buckeridge D (2020) Glossary for public health surveillance in the age of data science. J Epidemiol Community Health 74(7):612–616 3. Ford E, Boyd A, Bowles JKF, Havard A, Aldridge RW, Curcin V et al (2019) Our data, our society, our health: a vision for inclusive and transparent health data science in the United Kingdom and beyond. Learn Health Syst 3(3):e10191 4. Benke K, Benke G (2018) Artificial intelligence and big data in public health. Int J Environ Res Public Health 15(12):E2796 5. Belle A, Thiagarajan R, Soroushmehr SMR, Navidi F, Beard DA, Najarian K (2015) Big data analytics in healthcare. Biomed Res Int 2015:1–16 6. Sahay S, Sundararaman T, Braa J (2017) Public health informatics: designing for change-a developing country perspective. Oxford University Press 7. Haneef R, Delnord M, Vernay M, Bauchet E, Gaidelyte R, Van Oyen H et al (2020) Innovative use of data sources: a cross-sectional study of data linkage and artificial intelligence practices across European countries. Arch Public Health 78:55 8. Deeny SR, Steventon A (2015) Making sense of the shadows: priorities for creating a learning healthcare system based on routinely collected data. BMJ Qual Saf 24(8):505–515 9. Vayena E, Dzenowagis J, Brownstein JS, Sheikh A (2018) Policy implications of big data in the health sector. Bull World Health Organ 96(1):66–68 10. Zodpey SP, Negandhi HN (2016) Improving the quality and use of routine health data for decision-making. Indian J Public Health 60(1):1 11. Pandey A, Roy N, Bhawsar R, Mishra RM (2010) Health information system in India: issues of data availability and quality. Demography India 39(1):111–128 12. Hung YW, Hoxha K, Irwin BR, Law MR, Grépin KA (2020) Using routine health information data for research in low- and middle-income countries: a systematic review. BMC Health Serv Res 20(1):790 13. Bomba B, Cooper J, Miller M (1995) Working towards a national health information system in Australia. Medinfo MEDINFO 8:1633–1633 14. Morrato EH, Elias M, Gericke CA (2007) Using population-based routine data for evidencebased health policy decisions: lessons from three examples of setting and evaluating national health policy in Australia, the UK and the USA. J Public Health 29(4):463–471 15. Houston TK, Sands DZ, Jenckes MW, Ford DE (2004) Experiences of patients who were early adopters of electronic communication with their physician: satisfaction, benefits, and concerns. Am J Manag Care 10(9):601–608 16. Tull K (2018) Designing and implementing health management information systems 17. Trewin C, Strand BH, Grøholt EK (2008) Norhealth: norwegian health information system. Scand J Public Health 36(7):685–689 18. Ringard Å, Sagan A, Sperre Saunes I, Lindahl AK, World Health Organization et al (2013) Norway: health system review 19. Center for international earth science information network—CIESIN—Columbia University. Gridded population of the world, version 4 (GPWv4): population density, revision 11 [Internet]. NASA Socioeconomic Data and Applications Center (SEDAC), Palisades, New York (2018). Available from: https://doi.org/10.7927/H49C6VHW 20. Government of India (2021) Unique identification authority of India [Internet]. [cited 2021 May 19]. Available from: https://uidai.gov.in/images/state-wise-aadhaar-saturation.pdf 21. R Core Team. R (2021) A language and environment for statistical computing [Internet]. Vienna, Austria. Available from: https://www.R-project.org/ 22. RStudio Team (2021) RStudio: integrated development environment for r [Internet]. 
Boston, MA. Available from: http://www.rstudio.com/ 23. Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R et al (2019) Welcome to the tidyverse. J Open Source Softw 4(43):1686 24. Wickham H. Advanced R, 2nd edn. Advanced R 604 25. Wickham H (2021) Mastering shiny. O’Reilly Media, Inc., p 395 26. Munafò MR, Nosek BA, Bishop DV, Button KS, Chambers CD, Percie du Sert N et al (2017) A manifesto for reproducible science. Nat Hum Behav 1(1):1–9.


27. Peng RD, Hicks SC (2021) Reproducible research: a retrospective. Annu Rev Public Health 1(42):79–93 28. The comprehensive R archive network [Internet]. [cited 2022 Sep 30]. Available from: https:// cran.r-project.org/ 29. RECon (2021) R epidemics consortium [Internet]. [cited 2021 May 25]. Available from: http:// reconhub.github.io/ 30. Mitra A, Soman B, Gaitonde R, Singh G, Roy A (2022) Data science methods to develop decision support systems for real-time monitoring of COVID-19 outbreak. J Human, Earth, Future 3(2):223–236 31. Mitra A, Pakhare AP, Roy A, Joshi A (2020) Impact of COVID-19 epidemic curtailment strategies in selected Indian states: an analysis by reproduction number and doubling time with incidence modelling. PLoS ONE 15(9):e0239026 32. Mitra A, Soman B, Singh G (2021) An interactive dashboard for real-time analytics and monitoring of COVID-19 outbreak in India: a proof of concept. In: arXiv preprint arXiv: 210809937 [Internet]. International Federation for Information Processing, Norway. Available from: https://arxiv.org/ftp/arxiv/papers/2108/2108.09937.pdf 33. United Nations (2014) Principles and recommendations for a vital statistics system: revision 3 [Internet]. UN [cited 2022 Sep 30]. (Statistical papers (Ser. M)). Available from: https://www. un-ilibrary.org/content/books/9789210561402 34. MCCD division (2018) Report on medical certification of cause of death–2018. Department of Economics and Statistics, Thiruvananthapuram 35. Vital Statistics Division (2019) Annual vital statistics report–2019. Department of Economics and Statistics, Thiruvananthapuram 36. Kermany DS, Goldbaum M, Cai W, Valentim CCS, Liang H, Baxter SL et al (2018) Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172(5):11221131.e9 37. Mitra A (2021) Retinal disease classification using deep learning networks : applications in teleophthalmology. In: Proceedings of the 17th international conference of telemedicine society of India. TSI, India, India 38. Mitra A, Soman B, Gaitonde R, Singh G, Roy A (2021) Tracking and monitoring COVID-19 in Kerala: Development of an interactive dashboard. In: Health system research. Cochin, Kerala 39. Singh G, Soman B, Mitra A. A systematic approach to cleaning routine health surveillance datasets: an illustration using national vector borne disease control programme data of Punjab, India. arXiv:210809963 [cs] [Internet]. 2021 Aug 23 [cited 2021 Oct 2]; Available from: http:// arxiv.org/abs/2108.09963 40. Joshi A, Mitra A, Anjum N, Shrivastava N, Khadanga S, Pakhare A et al (2019) Patterns of glycemic variability during a diabetes self-management educational program. Med Sci 7(3):52 41. Saoji A, Nayse J, Deoke A, Mitra A (2016) Maternal risk factors of caesarean delivery in a tertiary care hospital in Central India: a case control study. People’s J Sci Res 9(2):18–23 42. Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A et al (2016) The FAIR guiding principles for scientific data management and stewardship. Sci Data 3(1):160018

Arboviral Epidemic Disease Forecasting—A Survey on Diagnostics and Outbreak Models Supreet Kaur and Sandeep Sharma

Abstract Globally, the most rampant arboviral disease is Dengue fever, with approximately 96 million cases every year and no globally approved vaccine. Scientists are in continuous search of an epidemiological system that could predict, diagnose, and treat Dengue-infected persons in time, so as to lessen the global yearly mortality associated with the disease. In this paper, a quick review is done of studies published up to December 2020 on forecasting Dengue epidemiology, covering aspects such as diagnostics, prognosis, and surveillance prediction, to outline the overall scheme of current research progress and the obstacles faced. The paper highlights various predictive approaches along with open issues that require further investigation, such as sensor-based reporting of mosquito breeding, human mobility, and prognosis of disease progression to a high-risk state, to enable precise prediction models for Dengue-infected patients and their aftercare. The paper includes a blend of works encompassing various research directions in Dengue epidemiology, both traditional and innovative, which are further sub-grouped according to their proposed approaches and objectives to enhance deeper understanding.

Keywords Arboviral-disease epidemiology · Classification · Dengue epidemic · Expert system · Machine learning · Prediction

1 Introduction

The female Aedes aegypti mosquito is a day-biting, urban-adapted species that oviposits in natural and artificial stagnant water collections and is responsible for various deadly infections including Dengue, Zika, Yellow fever, and Chikungunya [1]. Before 1970, only nine countries were under the Dengue threat; this has now extended to more than 100 countries across WHO regions, with 2016 marked as a huge Dengue outbreak year worldwide [2]. To date, no globally accepted licensed treatment or vaccine for Dengue is available, which makes preventive measures the best strategy in the battle against this disease.

S. Kaur (B) · S. Sharma
Department of Computer Engineering and Technology, Guru Nanak Dev University, Amritsar, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_64


Moreover, scientists firmly believe that the global climatic shift, which has affected rainfall and temperature, the primary determinants of water habitat availability, has also contributed to mosquito-borne disease trends. Earlier research has found a link between the transmission of Dengue infection and climatic conditions, but whether the correlation is positive or negative is still debatable [3]. One study focusing on the Indian Dengue epidemic pattern supports a positive correlation: the epidemic originates in the south and then spreads to the north in a fixed yearly pattern, as this day-biting mosquito breeds in warm, humid environments, and the monsoon brings an abundance of water habitats that increases mosquito survival, escalating the possibility of virus transmission and hence the wide spread of the disease [4]. The popular belief that Dengue is confined to tropical and subtropical areas is no longer valid, as areas with much colder temperatures have also reported Dengue; studies show that the mosquito genes responsible for infection have evolved, making it resilient to colder temperatures [5]. One innovative study focusing on humans suspects human mobility to be the prime cause of the global spread of Dengue, as an infected human is far more likely to travel long distances than the Dengue virus-carrying mosquito itself [6]. Moreover, recent data (Fig. 1; data source 1: Directorate of the National Vector Borne Disease Control Programme, Dte.GHS, Ministry of Health & Family Welfare; source 2: Lok Sabha and Rajya Sabha unstarred questions, Ministry of Health and Family Welfare) show that reported Dengue cases are rising at alarming rates each year (2017 being the deadliest Dengue year since 2009), highlighting the dire need to curb this disease outbreak early.

Fig. 1 Cases and deaths due to Dengue in India


It is speculated that the occurrence of COVID-19 camouflaged Dengue reporting in 2020: both are viral infections with similar symptomatic onset, and the former, being more deadly, overpowered the latter in grabbing the attention of the concerned parties. Predicting Dengue outbreaks to facilitate surveillance using meteorological data and reported Dengue case data has given appreciable positive results [7]. In some predictive systems, an early prediction of up to four months has been achieved [7], which can further aid in preparing against the onset of an epidemic; however, the variations observed differ from one geographical area to another, which inhibits the global adoption of such systems. Some researchers have also speculated that the inclusion of sensory data from wearable gadgets, geo-locations, image sensing of possible mosquito breeding grounds and of the density of Aedes mosquitoes, as well as the addition of travellers' air routes into the prediction system, can yield promising results [8]. As discussed earlier, the absence of a Dengue vaccine limits absolute prevention, but researchers continue to find ways that could aid in combating this disease.

1.1 Dengue Infection Transmission Cycle

When a human is bitten by a mosquito carrying Dengue-causing serotypes, the virus passes from the salivary gland of the mosquito into the human skin and enters human cells. Once inside, the virus uses the host cell's machinery to make multiple copies of itself; the virus thus multiplies, is released into the blood, and quickly infects other cells. In response to the virus, white blood cells (WBCs) are released, but due to the special capability of this virus, the WBCs also become infected, as illustrated in Fig. 2. The infected person therefore gets a high fever that lasts for approximately a week, which is the very first sign of Dengue onset. As illustrated in Fig. 1 earlier, infectious diseases such as Dengue are raising the death toll each year, making India's success in achieving its target of eliminating the mosquito-borne disease uncertain. The problem with this day-biting, Dengue-causing mosquito is that it is very similar in appearance to the common mosquito found near human habitation, which makes identification difficult, as shown in Fig. 3 (source: https://www.differencebetween.com/differencebetween-dengue-mosquito-and-vs-normal-mosquito/). The figure shows a clear, highly zoomed picture of the Dengue mosquito on the left and a similar-quality picture of a normal mosquito, incapable of spreading Dengue, on the right. Only when seen microscopically can one identify the classic tiger pattern on the Dengue mosquito, which is absent on commonly found mosquito breeds. Also, in the early stages of life, before they have fully matured as mosquitoes, it is practically impossible to identify and distinguish the larvae or eggs laid in water. The cycle starts with the Dengue virus manifesting itself in the female Aedes aegypti mosquito; the infected mosquito then bites a human, who in turn


Fig. 2 Dengue infection transmission life cycle

Fig. 3 Dengue mosquito versus normal mosquito

The bitten human gets infected with Dengue and ultimately becomes an active carrier of the Dengue virus, infecting every mosquito that subsequently feeds on that person's blood, and so the cycle continues. The entire Dengue infection cycle therefore stands on three basic pillars, namely the female Aedes aegypti mosquito, the Dengue virus, and the human, as shown in Fig. 4. If the co-occurrence of all three pillars could somehow be prevented, current research could move toward the positive possibility of curbing this global Dengue epidemic. As discussed earlier, recent studies show that the mosquito genus causing Dengue has evolved over time and can now live and breed in colder climates, which negates the common notion of Dengue being a disease of tropical or subtropical countries and puts this rapidly growing epidemic on high-risk alert around the globe. Some researchers from Singapore have claimed to raise and release sterile males of the Dengue-causing mosquito genus, which upon copulation render the female mosquito infertile.


Fig. 4 Key components of Dengue cycle

However, the practicality and sustainability of this newfound measure are still under trial. Meanwhile, the addition of guppy fish (Poecilia reticulata) to any stagnant water body that may serve as a mosquito breeding ground has shown positive results: not only was a reduction in Dengue cases observed, but malaria infections were also brought under control, which serves as a positive hope in curbing this menace. Dengue usually presents with a sudden onset of fever lasting up to a week, accompanied by a severe headache, pain in the knees, ankles, and elbows, loss of appetite along with vomiting and diarrhea, rashes on the arms and legs, fatigue, and severe itchiness [6]. The similar symptomatic onset of most viral infections, however, often increases the complexity of diagnosis, thereby delaying the patient's recovery. This sometimes compromises the survivability of patients because, although viral infections exhibit similar behavior, the course of treatment for each differs. From Fig. 2, it can be seen that the whole Dengue transmission cycle revolves around three main aspects, namely Dengue mosquitoes, humans, and the Dengue infection. Hence, there are three parameters in this Dengue equation, and studies usually revolve around them singly or in combination. This paper focuses primarily on research done in the area of predicting this disease; it was found that studies predominantly used predictive models based on meteorological and Dengue case data, and comparatively few focused on the Dengue patient. Studies using symptomatic datasets are crucial not only for identifying the infected person but also for patient care during and after disease onset, ensuring proper and timely care to reduce health risks. Some interesting studies that did not choose the traditional approach to deal with the disease were also found. In this paper, the authors therefore focus on the different Dengue predictive approaches, categorizing and analyzing


them, to give readers an understanding of the various research underway on this disease. The motivation behind this study can be broadly categorized as follows:
1. To emphasize the need to predict Dengue infection effectively and precisely.
2. To perform a deep analysis of various aspects of Dengue epidemiology and highlight the key areas of study.
3. To study the role of different prediction models in combating Dengue disease.
4. To examine the solutions, main challenges, and open issues in the existing literature.
5. To provide an effective review giving insight into possible research growth areas, as the magnitude of research in this area has been escalating steadily.
The rest of the article is structured as follows: "Related work" summarizes the existing related work. Various challenges faced by researchers are summarized in the "Challenges" section. The experimental results and performance analysis of the reviewed works are discussed in "Discussion". "Conclusion" gives the conclusions derived from the proposed models in the included studies, along with future directions in Dengue disease prediction.

2 Related Work

This section is broadly subdivided into three parts based on the nature of the objective proposed or achieved by researchers in their studies, namely Dengue outbreak prediction/control, Dengue diagnosis/prediction, and lastly the Dengue vector/human. The grouping is done by carefully understanding the similarities, differences, and trends interpreted in each study. Under the Dengue outbreak section, all papers in which researchers worked on Dengue outbreak control and prediction are collectively summarized, whereas the Dengue diagnostic section contains those focused on Dengue patient parameters, aiming to predict or diagnose Dengue patients based on the symptomatic parameters of the disease. Lastly, the human/vector section is included to emphasize the research done on the detection, control, and prevention of Dengue mosquitoes and on the role of human mobility in aiding Dengue transmission, which differs from the traditional approach. These sections are made to avoid confusion and to give readers a better understanding of how research is progressing in different directions to ultimately aid in curbing Dengue disease.

2.1 Dengue Outbreak

Models for Dengue outbreak forecasting predominantly use meteorological data along with monthly or annual reported Dengue case and death counts according


to the geographical location of the intended research [7–11]. A massive share of Dengue-related research belongs to this category; the wide availability of the needed datasets, with minimal or no restrictions, facilitates the overall progress of work. Many studies have claimed to forecast Dengue outbreaks accurately from 2 months and up to 2 years in advance [12, 13]. Some key studies are classified under this section, and their comparative analysis follows in Table 1, giving a quick view of factors such as the dataset, prediction technique, model accuracy, and the country on which the model is focused, along with some key findings of each study [14, 15].
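To make the data requirements of this category concrete, the sketch below fits a seasonal ARIMA model with meteorological regressors to monthly case counts, in the spirit of the SARIMA-with-external-regressors entries in Table 1. It is only an illustration: the data is synthetic and the column names (cases, rainfall_mm, temp_c), lag choices, and model orders are assumptions, not taken from any of the cited studies.

```python
# Minimal sketch: seasonal ARIMA with exogenous weather regressors on monthly Dengue counts.
# All data below is synthetic; replace it with real case and weather series for a study area.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
months = pd.date_range("2015-01-01", periods=72, freq="MS")
rainfall = 120 + 80 * np.sin(2 * np.pi * months.month / 12) + rng.normal(0, 15, 72)
temp = 28 + 3 * np.sin(2 * np.pi * (months.month - 3) / 12) + rng.normal(0, 0.5, 72)
# Synthetic monthly case counts loosely driven by lagged rainfall and temperature.
cases = 50 + 0.4 * np.roll(rainfall, 2) + 5 * (temp - 28) + rng.normal(0, 10, 72)
df = pd.DataFrame({"cases": cases, "rainfall_mm": rainfall, "temp_c": temp}, index=months)

# Seasonal ARIMA with external (meteorological) regressors.
model = SARIMAX(df["cases"], exog=df[["rainfall_mm", "temp_c"]],
                order=(1, 0, 1), seasonal_order=(1, 0, 0, 12))
fitted = model.fit(disp=False)

# Forecast three months ahead, supplying assumed future weather values.
future_index = pd.date_range(months[-1] + pd.offsets.MonthBegin(), periods=3, freq="MS")
future_exog = pd.DataFrame({"rainfall_mm": [150, 170, 160], "temp_c": [29.0, 29.5, 29.2]},
                           index=future_index)
print(fitted.forecast(steps=3, exog=future_exog))
```

In practice the exogenous block would be the actual rainfall, temperature, and humidity series for the study area, and the model orders would be selected by information criteria or cross-validation rather than fixed as here.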

2.2 Dengue Diagnostic

Patient care can reduce risk factors concerning Dengue infection by reducing the burden on the medical sector. Dengue, like other viral diseases, exhibits flu-like symptoms which are very often neglected by the sufferer [16]. If a system could predict Dengue patients in time and also guide them about the severity level of the disease, patient care could be considerably improved along with a smaller burden on medical staff. Employing IoT sensors along with fog/cloud computing technologies has been found effective for grading and responding to the severity of Dengue [17, 18]. Map-reduce programming helps in handling complex, humongous datasets to find the count of infected people in a population [19]. With more involvement of machine learning in disease diagnosis, researchers have claimed accuracies in the range of 79–96% [20–28], emphasizing the better performance of certain machine learning algorithms such as ANN, DT, MLP, and SVM, alone or in combination, with respect not only to accuracy but also to sensitivity and specificity. Such studies are compared in Table 2, focusing on the parameters that follow.
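As a rough illustration of this diagnostic setting, the sketch below trains three of the classifier families named above (DT, MLP, SVM) on a symptomatic dataset and reports accuracy, sensitivity, and specificity. The feature set and the randomly generated records are assumptions made for the example; they are not the clinical datasets used in the studies of Table 2.

```python
# Minimal sketch: compare DT, MLP, and SVM on synthetic symptomatic records
# and report accuracy, sensitivity (TPR), and specificity (TNR).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
n = 500
# Assumed features: body temperature (C), platelet count (x1000/uL), headache flag, body-ache flag.
X = np.column_stack([
    rng.normal(38.0, 1.2, n),
    rng.normal(180, 60, n),
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
])
# Synthetic label: "Dengue positive" when fever is high and platelets are low (illustrative rule only).
y = ((X[:, 0] > 38.5) & (X[:, 1] < 150)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
models = {
    "DT": DecisionTreeClassifier(max_depth=4),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000),
    "SVM": SVC(kernel="rbf"),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te), labels=[0, 1]).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    print(f"{name}: acc={accuracy:.2f} sens={sensitivity:.2f} spec={specificity:.2f}")
```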

2.3 Dengue Vector/Host

Researchers always try to explore various dimensions to find even those parameters which may have a direct or indirect effect on the intended subject. Similarly, in Dengue epidemiology, findings have pointed to the involvement of humans in the global distribution of this infection to areas that were previously worlds apart from Dengue-infected geographical regions. Human mobility [29] is said to be the reason, as only humans are capable of traveling from one part of the world to another carrying the disease within them, the spread of COVID-19 being the best example. Abeyrathna et al. [30] built a model that supports the assumption that only humans aid disease propagation over longer distances. The mosquito population is also targeted for study to find ways to identify and eradicate the species which is


responsible for this infection at different stages of its life cycle. Fuad et al. [31] claimed that their method helps in estimating the Dengue mosquito population in stagnant water habitats; their dataset had five images containing forty-six Aedes aegypti larvae, of which 37 were successfully detected. It is often assumed that hybrid systems yield more accuracy than traditional ones. Babu et al. [32] focused their study on Dengue reporting systems as well as on the public, using mosquito abundance measures (manual entry of mosquito/egg counts), environmental factors, and microclimate-related data as parameter inputs to achieve an effective surveillance system [16]. These three studies are analyzed, and a tabular comparison is provided in Table 3.
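For readers unfamiliar with the deterministic transmission models mentioned in this subsection, the following is a generic host-vector compartmental sketch (SIR humans, SI mosquitoes) integrated with SciPy. The structure and every parameter value are textbook-style assumptions for illustration only; the cited "modified deterministic model" additionally accounts for human mobility between regions, which is not reproduced here.

```python
# Generic host-vector compartmental model (SIR humans, SI vectors), illustration only.
import numpy as np
from scipy.integrate import solve_ivp

def dengue_ode(t, y, beta_hv, beta_vh, gamma, mu_v):
    Sh, Ih, Rh, Sv, Iv = y
    Nh, Nv = Sh + Ih + Rh, Sv + Iv
    new_h = beta_hv * Sh * Iv / Nh      # mosquito -> human infections
    new_v = beta_vh * Sv * Ih / Nh      # human -> mosquito infections
    return [-new_h,                     # susceptible humans
            new_h - gamma * Ih,         # infectious humans
            gamma * Ih,                 # recovered humans
            mu_v * Nv - new_v - mu_v * Sv,   # susceptible vectors (births balance deaths)
            new_v - mu_v * Iv]          # infectious vectors

y0 = [9990, 10, 0, 20000, 50]           # assumed initial humans (S, I, R) and vectors (S, I)
sol = solve_ivp(dengue_ode, (0, 120), y0, args=(0.3, 0.3, 1 / 7, 1 / 14),
                dense_output=True)
t = np.linspace(0, 120, 5)
print(np.round(sol.sol(t)[1], 1))        # infectious humans at a few time points (days)
```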

3 Challenges of Dengue Forecasting

Dengue is a rapidly growing epidemic all around the world, and producing accurate, actionable forecasts of it remains a big challenge for global public healthcare systems. Nevertheless, both short- and long-time-scale predictions can improve disease management by stepping up the response against an outbreak. Predicting the real-time spread of Dengue is a tedious task, as it requires deep knowledge of the disease alongside a way of procuring raw data from the real world and converting it into usable predictions. The major challenge in many parts of the world is the lack of data available on a large scale, and even where some data is available, the lack of a standard format makes the task difficult. In some countries, such as Thailand, the national health inventory has individual case reports available for DF, DHF, and DSS, respectively, which citizens can request for research use. These data contain per-case information including the onset date of symptoms, diagnostic symptoms and tests, home address, and other details such as age, sex, and phone number. However, this is not so easy in other parts of the world, where data has to be procured manually from hospitals and health centers after undergoing all legal formalities, and it becomes more challenging when records are maintained on paper. Analytical challenges, such as appropriate model selection, training, validation, and adjustment for effective reporting, become the second hurdle once a sufficient amount of the desired data has been collected. The biology of the disease also plays a tricky role, as the interaction of Dengue serotypes, present alone or in combination, affects each person differently. The level of risk also increases when a person gets a secondary Dengue infection, and no lifetime immunity is acquired from it. The three types of Dengue infection each have their own set of complications when it comes to detection: DF is the most common and the hardest to detect accurately, as its symptoms resemble common flu-like symptoms, whereas DHF is less likely to be misdiagnosed than DF and is more life threatening. To date, no deployed system alone can detect the three types accurately and in time, as the difficult symptomatic nature of the disease poses a challenging task.


4 Discussion

Dengue is one of the deadliest diseases across the globe, with a major yearly death toll, and it is no longer restricted to the subtropical and tropical regions it was earlier confined to. With changing environmental factors, Dengue mosquitoes have also adapted, and many recent incidences have shown Dengue mosquitoes to be resistant to colder temperatures. As a result, regions with colder climates are now also at risk of Dengue infiltration, making it more of a global burden. More importantly, the absence of a globally accepted Dengue vaccine makes the situation even trickier. Researchers are continuously contributing to the fight against this infectious disease by exploring the various angles from which the problem can be tackled. In this paper, the various approaches taken by researchers have been summarized and, for ease of understanding, categorized broadly into three subsections. The tabular representations give the information at a glance, followed by more detailed context about each paper discussed in Tables 1, 2, and 3.

Table 1 contains all the papers whose main objective was to predict Dengue outbreaks using Dengue case/death counts and meteorological data. All the proposed models in the included studies claim accuracy above 80% in most cases, and some were successful with prediction horizons ranging from a month to a year in advance for outbreak risk estimation. The inclusion of other factors such as fog computing and IoT gives this approach a cutting-edge advantage over its peers, as real-time data makes the model more efficient. However, areas with minimal or no resources become a major limitation and do hamper the full functionality of such systems; moreover, the absence of parameters describing Dengue symptoms and the patient's key health aspects leaves the overall approach incomplete. A newer approach, in which proposed systems use sensors to monitor the mosquito population and warn the dedicated government bodies about possible risk areas (active Dengue mosquito breeding grounds), tends to fail to explain the critical point of how the system distinguishes and pinpoints Dengue mosquitoes from others in the same water body. Since the microscopic larvae of the various mosquito species inhabiting a water body are already difficult for sensors to pick up, how Dengue mosquito larvae were differentiated should have been discussed. Moreover, the usual practice in this method of collecting a sample from the water area and then analyzing it in the laboratory incurs a further limitation: the sample can be biased, as it may not contain the best possible representation of the actual mosquito population. It has been noticed that a major amount of research work follows this predictive approach, where the data is a combination of meteorological data and reported Dengue cases, compared with the other research directions, and it has yielded many beneficial outcomes; countries like Thailand, Malaysia, and Singapore have carried out more of this Dengue research than other parts of the world. The ready availability of datasets on these countries' government websites is one of the primary reasons for the greater adoption of


Table 1 Comparative analysis of Dengue outbreak prediction models

Title: Modeling and prediction of Dengue occurrences in Kolkata, India, based on climate factors
Country: India
Prediction technique: Zero-inflated Poisson regression model
Findings/limitations: An increasing Dengue infection trend is observed with the rise in carbon dioxide emission. A higher incidence of Dengue cases is projected for post-monsoon months. A key impact of meteorological factors is observed on the birth/amplification of Dengue in Kolkata.

Title: A robust and nonparametric model for prediction of Dengue incidence
Country: Singapore
Prediction technique: Gaussian process
Findings/limitations: In future expansion of the work, the role of human as well as vector factors in forecasting Dengue incidence is to be investigated. The proposed model can also consider a weekly spatial scale for more effectiveness. A correlation between climatic variables and Dengue incidence is observed.

Title: Real-time Dengue prediction using machine learning
Country: Not given
Prediction technique: Logistic regression algorithm
Findings/limitations: The logistic regression algorithm is preferred as it obtains better generalization than the gradient boost algorithm. A Raspberry Pi with GPS and GSM modules is integrated for generating alert maps.

Title: Developing a Dengue prediction model based on the climate in Tawau, Malaysia
Country: Malaysia
Prediction technique: Multivariate Poisson regression model, SARIMA, and SARIMA with external regressors for selection
Findings/limitations: The results support previous studies, with temperature and humidity being powerful predictors of the magnitude of Dengue incidence. The study has limitations such as under-reporting, limited data, and unmeasured confounders such as population density, Dengue serotypes and herd immunity, access to piped water, and effective vector control.

Title: Prediction of Dengue outbreak in Selangor using fuzzy logic
Country: Malaysia
Prediction technique: Fuzzy logic
Findings/limitations: For long-term forecasting, fuzzy logic is capable of minimizing errors. Only total rainfall and total rainy days are considered as parameters. Parameters such as humidity, temperature, and wind velocity need to be addressed to examine model effectiveness.

Title: Improved prediction of Dengue outbreak using the delay permutation entropy
Country: Hong Kong
Prediction technique: Delay permutation entropy (DPE) with radial basis function in SVM
Findings/limitations: Analysis showed a strong correlation between Dengue cases and rainfall DPE. The results were consistent with observations that Dengue outbreaks always occurred during the rainy season. Dengue outbreak prediction based on DPE features was up to 11% better than prediction from monthly average data. Incorporation of additional global climate indices into the predictive model should be investigated in the future.

Title: Remote sensing-based modeling of Dengue outbreak with regression and binning classification
Country: Thailand
Prediction technique: CART and C5 algorithm with equal-width four-binning method
Findings/limitations: The study investigates rainfall and other remote sensing variables to correlate their contribution to the 2006–2015 Dengue outbreaks in Nakhon Ratchasima. The main contributing factor that appeared in the model is the Dengue cases in the previous month. Satellite-based indices such as thermal condition, temperature condition, and greenness of vegetation also appear in the model.

Title: Proposed conceptual framework of Dengue active surveillance system (DASS) in Malaysia
Country: Malaysia
Prediction technique: API, SOAP, MongoDB, NoSQL
Findings/limitations: Data from weather, social media, and hospitals can be combined and fed to the system for diverse and better prediction, not only at the patient-symptom level but also at the outbreak level with warning generation. The proposed system needs to be verified by developing the actual system to test efficiency. Although the proposed system may be useful, only after implementation can its unification scope be justified, i.e., whether all the different components will work in harmony or not.

Title: Identification of the prediction model for Dengue in Can Tho city, a Mekong delta area in Vietnam
Country: Vietnam
Prediction technique: SMR, SARIMA, and Poisson distributed lag model
Findings/limitations: Although this study is in line with previous findings of temperature and relative humidity being significant factors, it strongly negates the effect of cumulative rainfall on Dengue incidence. A follow-up study is recommended to validate the model on a larger scale. Emphasis is also given to mapping predictions to a user-friendly operational tool for public use.

Title: Prediction of high incidence of Dengue in the Philippines
Country: Philippines
Prediction technique: Fuzzy association rule mining (FARM), weighted voting classifier
Findings/limitations: This model may complement early detection, but it is intended to predict whether or not a high incidence of disease will occur several weeks in the future. The model showed good performance in predicting multi-week Dengue incidence 4 weeks in advance.


Table 2 Comparative analysis of Dengue diagnostic models

Title: An intelligent and secure healthcare framework for the prediction and prevention of Dengue virus outbreak using fog computing
Expert system technique: Cloud environment, fog computing
Country: India
Findings/limitations: Effective for Dengue detection at an earlier phase. A symptomatic investigation is carried out, followed by an alert sent to the patient's mobile. The severity level is predicted, and it is proposed that the patient will get an alert with safety measures: diet, rest, and care. Moreover, officials will be alerted for locality tracing and scanning using the cloud.

Title: Prevention of infectious disease based on big data analytics and map-reduce
Expert system technique: Map-reduce programming
Country: Not given
Findings/limitations: The population is categorized into three levels of vulnerability: high, mid, and low. The proposed mechanism needs to be implemented to evaluate its correctness.

Title: The diagnosis of Dengue disease: an evaluation of three machine learning approaches
Expert system technique: ANN, DT, NB
Country: India
Findings/limitations: The Dengue dataset is divided into two attribute groups: clinical and non-clinical. Non-clinical data contains entries for age, gender, vomiting, abdominal pain, chills, body ache, headache, weakness, and fever. Clinical data contains values for platelet count, temperature, heart rate, NS1, IgM, IgG, and ELISA. Evaluation using accuracy, sensitivity, specificity, and error indicates that ANN provides better outcomes but has a larger computational time. Two classes are predicted: Dengue positive or negative.

Title: DengueViz: a knowledge-based expert system integrated with parallel coordinates visualization in Dengue diagnosis
Expert system technique: C language integrated production system
Country: Not given
Findings/limitations: Certainty points are allocated to each clinical characteristic in the Dengue data. The knowledge base contains if–then rules derived from research expert sources/WHO publications. Real-world testing is suggested to enhance the accuracy and reliability of the model.

Title: A preliminary Dengue fever prediction model based on vital signs and blood profile
Expert system technique: Linear discriminant model
Country: Malaysia
Findings/limitations: Classifiers in MATLAB are used, and the linear discriminant with top accuracy is selected. The patient data is biased, as more data is from the DFWS category. Missing values lead to low statistical computation, and the unbalanced severity group distribution was responsible for lower sensitivity.

Title: Early diagnosis of Dengue disease using fuzzy inference engine
Expert system technique: Fuzzy inference engine
Country: Taiwan
Findings/limitations: 860 rules are constructed by the system. Signs/symptoms are input, and the result is categorized as confirmed Dengue, no Dengue, or probable Dengue. For correct diagnosis, only both symptoms and clinical test results together can give confirmed results. The system is unable to further investigate which level of Dengue the patient is suffering from.

(continued)

this approach, which, after careful selection of the prediction model, gives promising future trends. Table 2 highlights the different expert systems used to distinguish Dengue patients from non-Dengue patients using patient datasets. It was observed that the authors often preferred the machine learning approach for its advantages in disease

Table 2 (continued)

Title: Combined committee machine for classifying Dengue fever
Expert system technique: MLP, linear kernel in SVM, Dempster-Shafer theory, and Fisher score method
Country: India
Findings/limitations: The Dengue database is composed of positive and negative samples collected over years, which makes it imbalanced and nonlinear. This uneven distribution often results in biased machine classifications. Selecting key features from the high-dimensional dataset reduces computational running time.

Title: Diagnosis and prognosis of the arbovirus-Dengue using intelligent algorithm
Expert system technique: Adaptive histogram threshold algorithm, box-counting algorithm, and an SVM classifier
Country: India
Findings/limitations: Dengue detection is done by the analysis of digital blood cell images. After analysis, the platelet feature vector is formed and counting is done. A classifier based on the feature input and the count classifies Dengue patients.

diagnosis when the objective is to predict Dengue using symptomatic patient data. Among the studies selected for this paper, those that used a single machine learning approach [16, 17] achieved lower accuracy than approaches where more than one technique was deployed [20–22]. On a broader view, it may be misjudged that using only one machine learning technique yields less accuracy, but several underlying factors may have a direct or indirect effect on a proposed system's performance; for better prediction results, the choice of technique matters as much as the quantity and quality of the dataset. One common finding among all studies in this subsection was the difficulty of collecting Dengue data from medical institutes, as symptomatic datasets for the disease are not available online and manual effort is needed in almost all cases. Another common issue faced when dealing with patient datasets is the restriction on the data that can be collected (consent is needed from both participants and the concerned authorities); hospitals tend to store files in traditional form, i.e., registers, which are often inaccessible after a while and, if procured, have to be processed and formatted into a digital version for use. Still, some studies were able to predict the progression of Dengue fever into its more serious levels/types, along with factors such as length of hospital stay and mortality rate, by using machine learning algorithms with accuracy as high as 98%. The inclusion of sensors alongside these approaches enables real-time Dengue forecasting, which may act as a life-saving system in areas with minimal or no specialized


Table 3 Comparative analysis of Dengue vector/host models

Title: Detection of Aedes aegypti larvae using single-shot multibox detector with transfer learning
Proposed system: Single-shot multibox detectors, transfer learning with Inception_V2
Proposed methodology: The proposed system comprises a water storage tank in which single-shot multibox detectors are used along with Inception_V2 to identify Aedes aegypti larvae without raising any false alarm.
Findings/limitations: The training images of Aedes larvae were tiny and therefore too sparse for better training, resulting in a loss of accuracy. Still, no false alarm was observed in the test. Although the results are satisfactory, more work needs to be done on the detection of smaller objects for better larvae detection. This approach claims to estimate the Dengue mosquito population in stagnant water habitats.

Title: Smartphone geospatial apps for Dengue control, prevention, prediction, and education: MOSapp, DISapp, and the mosquito perception index (MPI)
Proposed system: EWARS, two applications (MOSapp and DISapp)
Proposed methodology: MOSapp and DISapp will facilitate the working of EWARS by allowing on-field health workers to upload surveillance and environmental data, as well as relevant data uploaded by the community, providing mosquito abundance parameters for a better disease surveillance system.
Findings/limitations: The inclusion of mobile-based applications can become a key factor in Dengue prevention/monitoring. Geospatial tags along with real-time data from both the community and public health workers can increase the strength of Dengue surveillance. Here a conceptual framework is discussed, which could be validated by practical implementation in the future.

(continued)

medical aid. However, data collection plays a huge role in this aspect, as a dataset containing parameters such as age, sex, hemoglobin, red blood cell count, white blood cell count, mean corpuscular volume, red cell distribution width-corpuscular volume, heart rate, blood pressure, body temperature, vomiting, body ache, headache, and weakness, together with results from Dengue tests, i.e., NS1, IgM, IgG, and ELISA, can

Table 3 (continued)

Title: Dengue propagation prediction using human mobility
Proposed system: Modified deterministic model
Proposed methodology: A modified deterministic model is constructed, taking into account the assumption that only humans are capable of transmitting Dengue over longer distances, to foretell Dengue spread.
Findings/limitations: The inability of Dengue mosquitoes to travel far points toward an important key point: only infected humans are capable of traveling and spreading Dengue across the globe. Past Dengue victim records, population data, and human mobility data (obtained from CDR cell towers) are input parameters to the model. The model aims to predict Dengue-infected humans.

only be gathered manually from hospitals and medical centers, which may sometimes act as a limitation. Even when a dataset is collected, it is usually specific to an area and timeline and lacks diversity, as a key variable found in experiments may be absent in other regions; hence, a one-size-fits-all solution is hard to achieve in this context. Moreover, systems dealing with DF and DHF are comparatively more numerous than those dealing with DSS, although DSS is the phase that results in patient mortality. Similarly, most studies focus on and around DHF prediction, as DHF is less likely to be misdiagnosed than DF, so the proposed models focusing on predicting DHF have attained more robustness and accuracy than the others. Table 3 gives an insight into other approaches that are worth studying but not often followed in the usual research trend, which predominantly focuses on forecasting Dengue with respect to reported cases/deaths or symptoms observed in humans. The work in [29] follows the assumption that human mobility plays a role in disease propagation; even though the model predicted the direction of change fairly accurately regardless of magnitude, it failed to capture the sudden peaks and drops in the actually reported data for certain areas. The external factors behind such findings were claimed to be outside the scope of the study, and the adoption of an ensemble learning method was recommended. The incorporation of domain expertise was strongly suggested for estimating the mosquito population, as no credible source of the human-to-mosquito population ratio is available. The study [30], though it claimed to achieve an accuracy of 80% along with


no false alarm in detecting Dengue mosquitoes, seems tricky to implement practically in a real-time scenario, as the authors failed not only to discuss the possibility of other species' larvae co-habiting the same water body but also to highlight how efficiently the proposed system could identify Dengue larvae apart from the others, since the distinctive tiger-like striped appearance comes only in the very late phase of the Aedes aegypti mosquito's life cycle. Since it takes only a bite of a Dengue-infected human for a mosquito of the Aedes genus to become infectious, and within 3–4 days that mosquito will infect any other human it bites, researchers have proposed a conceptual model which not only works on data uploaded by health workers but also collects environmental parameters to evaluate mosquito breeding sites [31, 32]. They assumed the system would educate the community, which would then participate in the community reporting facility. However, the limited resources of poorer parts of India, and of the world, can hinder its effectiveness. Also, the proposed system is a conceptual model whose performance can only be judged accurately once it is implemented, which may reveal issues that still cannot be resolved within the present resource scope. As discussed earlier, the absence of a Dengue vaccine rules out an absolute preventive measure, but researchers are still finding ways that could aid in combating this disease. One such initiative was taken by the Mexican government by introducing the fish species Poecilia reticulata into unused stagnant water bodies, thereby preventing mosquitoes from breeding and multiplying, which resulted in a visible drop in reported Dengue cases in the concerned state [33]. Similarly, scientists from Singapore have claimed to produce sterile male mosquitoes in their laboratories which, when bred with female Aedes mosquitoes (the species responsible for Dengue transmission), inhibit reproduction, gradually decreasing the Aedes population in the concerned area and successively reducing Dengue incidence as the vector responsible for the infection is sterilized [34]. More such new-path studies should be carried out in Dengue prediction; merged with the traditional approaches, they could yield an effective Dengue surveillance system that not only predicts Dengue outbreaks in advance and suppresses the disease-causing mosquito population but also aids in patient management once the epidemic hits, thereby lessening the burden on public healthcare services.

5 Conclusions

This paper intends to give an insight into the research work done on Dengue outbreak- and diagnostics-related systems, to emphasize the incredible potential of timely disease diagnosis and prediction. Around the globe, there are areas with little or no access to medical experts. People there can benefit from an intelligent medical system that can warn them about their present risk of acquiring the disease and about its progression into a more life-threatening form. Infectious diseases are the trickiest of time bombs which, if not detected early and


managed properly, can result in an epidemic with a rising death toll; COVID-19 is a clear witness to this. The vector-borne disease Dengue has become one such global burden, with a high death toll, and its early stage is almost always confused with viral fever by the public, making its prediction an even more complex task. As discussed in Tables 1, 2, and 3, many experts have studied and proposed systems with high accuracies, yet no study has claimed to develop a stand-alone model that can detect this infection in a timely manner and categorize it by type of Dengue infection, i.e., DF, DHF, and DSS, nor one that can accurately alert when the disease progresses into a life-threatening stage. Extensive analysis of the variants in the symptom data repository also needs to be carried out to check the validity of the inferences made by the predictions, and false positive cases need to be eliminated by some approach applied to variants of symptomatic cases. After careful analysis of the studies, it can be concluded that a collaborative approach is needed, in which not only outbreak prediction is the main focus but also diagnostics, vector handling, human mobility, and IoT/machine learning are incorporated, such that each system works stand-alone as well as in harmony as a unit, to achieve the best possible Dengue surveillance system for the mass population. Systems like MYCIN and WATSON have already shown the world how revolutionary expert systems can be in helping mankind toward better health and freedom from disease, not only achieving more than 96% accuracy but also beating human intelligence with robust, timely responses beyond what human cognition could manage in that time frame. This paper has analyzed not only the most common approaches to Dengue forecasting but also given insight into newly observed trends. Studies on Dengue mosquito control and eradication using IoT and machine learning, along with human mobility, open an interesting direction that demands more future work. Moreover, it is seen that major work is done in Dengue case/outbreak prediction, whereas patient-centric prediction systems should be encouraged more. Symptomatic investigation to predict risk levels, along with disease-level evaluation, could help people seek timely medical aid.

References

1. World Health Organization Thailand Dengue and Severe Dengue. http://www.searo.who.int/thailand/factsheets/fs0008/en/. Accessed 28 April 2018
2. Aziz T, Lukose D, bin Abu Bakar S, Sattar A. A literature review of methods for dengue outbreak prediction
3. Mane TU (Feb 2017) Smart heart disease prediction system using improved K-means and ID3 on big data. In: 2017 international conference on data management, analytics and innovation (ICDMAI). IEEE, pp 239–245
4. Alto BW, Bettinardi D (2013) Temperature and dengue virus infection in mosquitoes: independent effects on the immature and adult stages. Am J Trop Med Hyg 88(3):497–505
5. Gufar M, Qamar U (Nov 2015) A rule based expert system for syncope prediction. In: SAI intelligent systems conference (IntelliSys), 2015. IEEE, pp 559–564


6. Buczak AL, Baugher B, Babin SM, Ramac-Thomas LC, Guven E, Elbert Y, Koshute PT, Velasco JMS, Roque VG Jr, Tayag EA, Yoon IK (2014) Prediction of high incidence of dengue in the Philippines. PLoS Neglected Trop Dis 8(4):e2771
7. Phung D, Huang C, Rutherford S, Chu C, Wang X, Nguyen M, Nguyen NH, Do Manh C (2015) Identification of the prediction model for dengue incidence in Can Tho city, a Mekong delta area in Vietnam. Acta Trop 141:88–96
8. Othman MK, Danuri MSNM (May 2016) Proposed conceptual framework of dengue active surveillance system (DASS) in Malaysia. In: International conference on information and communication technology (ICICTM). IEEE, pp 90–96
9. Bal S, Sodoudi S (2020) Modeling and prediction of dengue occurrences in Kolkata, India, based on climate factors. Int J Biometeorol 64(8):1379–1391
10. Ooi JYL, Thomas JJ (Nov 2017) DengueViz: a knowledge-based expert system integrated with parallel coordinates visualization in the dengue diagnosis. In: International visual informatics conference. Springer, Cham, pp 50–61
11. Idris MFIM, Abdullah A, Fauzi SSM (2018) Prediction of dengue outbreak in Selangor using fuzzy logic. In: Proceedings of the second international conference on the future of ASEAN (ICoFA) 2017, volume 2. Springer, Singapore, pp 593–603
12. Chakraborty A, Chandru V (2020) A robust and non-parametric model for prediction of dengue incidence. J Indian Inst Sci 1–7
13. Pravin A, Jacob TP, Nagarajan G (2019) An intelligent and secure healthcare framework for the prediction and prevention of dengue virus outbreak using fog computing. Health Technol 1–9
14. Mohapatra C, Rautray SS, Pandey M (Feb 2017) Prevention of infectious disease based on big data analytics and map-reduce. In: 2017 second international conference on electrical, computer and communication technologies (ICECCT). IEEE, pp 1–4
15. Zhu G, Hunter J, Jiang Y (Dec 2016) Improved prediction of dengue outbreak using the delay permutation entropy. In: 2016 IEEE international conference on internet of things (iThings) and IEEE green computing and communications (GreenCom) and IEEE cyber, physical and social computing (CPSCom) and IEEE smart data (SmartData). IEEE, pp 828–832
16. Chellappan K (Dec 2016) A preliminary dengue fever prediction model based on vital signs and blood profile. In: 2016 IEEE EMBS conference on biomedical engineering and sciences (IECBES). IEEE, pp 652–656
17. Gambhir S, Malik SK, Kumar Y (2018) The diagnosis of dengue disease: an evaluation of three machine learning approaches. Int J Healthc Inf Syst Inf (IJHISI) 13(3):1–19
18. Saikia D, Dutta JC (Jan 2016) Early diagnosis of dengue disease using fuzzy inference system. In: 2016 international conference on microelectronics, computing and communications (MicroCom). IEEE, pp 1–6
19. Saha S, Saha S (Jan 2016) Combined committee machine for classifying dengue fever. In: 2016 international conference on microelectronics, computing and communications (MicroCom). IEEE, pp 1–6
20. Hashi EK, Zaman MSU, Hasan MR (Feb 2017) An expert clinical decision support system to predict disease using classification techniques. In: International conference on electrical, computer and communication engineering (ECCE). IEEE, pp 396–400
21. Divya A, Lavanya S (2020) Real time dengue prediction using machine learning. Indian J Public Health Res Dev 11(2)
22. Jiji GW, Lakshmi VS, Lakshmi KV, Priya SS (2016) Diagnosis and prognosis of the arbovirus-dengue using intelligent algorithm. J Inst Eng (India): Ser B 97(2):115–120
23. Yang X, Tong Y, Meng X, Zhao S, Xu Z, Li Y, Liu G, Tan S (Aug 2016) Online adaptive method for disease prediction based on big data of clinical laboratory test. In: 2016 7th IEEE international conference on software engineering and service science (ICSESS). IEEE, pp 889–892
24. Gambhir S, Malik SK, Kumar Y (2017) PSO-ANN based diagnostic model for the early detection of dengue disease. New Horiz Transl Med 4(1–4):1–8


25. Zainee NBM, Chellappan K (Feb 2017) A preliminary dengue fever prediction model based on vital signs and blood profile. In: 2016 IEEE-EMBS conference on biomedical engineering and sciences, IECBES 2016. Institute of Electrical and Electronics Engineers Inc
26. Sigera PC, Amarasekara R, Rodrigo C, Rajapakse S, Weeratunga P, De Silva NL, Huang CH, Sahoo MK, Pinsky BA, Pillai DR, Tissera HA, Jayasinghe S, Handunnetti S, Fernando SD (2019) Risk prediction for severe disease and better diagnostic accuracy in early dengue infection; the Colombo dengue study. BMC Infect Dis 19(1):1–8
27. Chovatiya M, Dhameliya A, Deokar J, Gonsalves J, Mathur A (April 2019) Prediction of dengue using recurrent neural network. In: 2019 3rd international conference on trends in electronics and informatics (ICOEI). IEEE, pp 926–929
28. Sangkaew S, Ming D, Boonyasiri A, Honeyford K, Kalayanarooj S, Yacoub S, Dorigatti I, Holmes AH (2020) Enhancing risk prediction of progression to severe disease during the febrile phase of dengue: a systematic review and meta-analysis. Int J Infect Dis 101:237–238
29. Fuad MAM, Ab Ghani MR, Ghazali R, Izzuddin TA, Sulaima MF, Jano Z, Sutikno T (2019) Detection of Aedes aegypti larvae using single shot multibox detector with transfer learning. Bull Electr Eng Inf 8(2):514–518
30. Babu AN, Niehaus E, Shah S, Unnithan C, Ramkumar PS, Shah J, Binoy VV, Soman B, Arunan MC, Jose CP (2019) Smartphone geospatial apps for dengue control, prevention, prediction, and education: MOSapp, DISapp, and the mosquito perception index (MPI). Environ Monit Assess 191(2):1–17
31. Abeyrathna MPAR, Abeygunawrdane DA, Wijesundara RAAV, Mudalige VB, Bandara M, Perera S, Maldeniya D, Madhawa K, Locknathan S (April 2016) Dengue propagation prediction using human mobility. In: Moratuwa engineering research conference (MERCon), 2016. IEEE, pp 156–161
32. Tian H, Sun Z, Faria NR, Yang J, Cazelles B, Huang S, Xu B, Yang Q, Pybus OG, Xu B (2017) Increasing airline travel may facilitate co-circulation of multiple dengue virus serotypes in Asia. PLoS Neglected Trop Dis 11(8):e0005694
33. Shafique M, Lopes S, Doum D, Keo V, Sokha L, Sam B, Vibol C, Alexander N, Bradley J, Liverani M, Hii J, Hustedt J (2019) Implementation of guppy fish (Poecilia reticulata), and a novel larvicide (Pyriproxyfen) product (Sumilarv 2MR) for dengue control in Cambodia: a qualitative study of acceptability, sustainability and community engagement. PLoS Neglected Trop Dis 13(11):e0007907
34. Alphey L, Benedict M, Bellini R, Clark GG, Dame DA, Service MW, Dobson SL (2010) Sterile-insect methods for control of mosquito-borne diseases: an analysis. Vector-Borne Zoonotic Dis 10(3):295–311

Special Session Bio Signal Processing Using Deep Learning

Convolution Neural Network for Weed Detection

G. N. Balaji, S. V. Suryanarayana, G. Venkateswara Rao, and T. Vigneshwaran

Abstract For a productive and healthy harvest, it is essential not only to spot weeds early on but also to manage their growth. Having an effective management strategy early in the season helps prevent a weed infestation from spreading to other areas of the field. In this paper, a deep learning method is applied to estimate the percentage of weed, crop, grass, and soil in an image. Weed, crop, and grass are highly similar in appearance; therefore, it is a strenuous task to detect them in a visual. Initially, data augmentation techniques are applied to generalize the model and enable it to provide accurate results. In the proposed method, one of the most predominant deep learning methods, the convolutional neural network, is used to build a system for detecting weeds. The convolutional neural network is trained to classify and detect soybean crops, broadleaf weed, grass, and soil in an image. The RMSprop optimizer is used to compile the model, as the magnitude of recent gradients is utilized to generalize the model. The loss function is set to categorical cross-entropy as the model is a multi-class classifier, and the model achieved 96% accuracy.

Keywords Convolutional neural network (CNN) · Weed detection · Deep learning · Loss function

G. N. Balaji (B)
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
e-mail: [email protected]

S. V. Suryanarayana
Department of IT, CVR College of Engineering, Hyderabad, India

G. V. Rao
GITAM University, Visakhapatnam, India

T. Vigneshwaran
SRM TRP Engineering College, Mannachanallur, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_65


1 Introduction

Precision agriculture is the process of applying information technology to invent new techniques in agriculture. Since weed control in the field is precision agriculture's primary objective, weed growth in the field is managed and avoided using the techniques that precision agriculture has established. Weed is the primary factor in agriculture that influences crop development in the field, and as a result, farmers' potential financial gains are diminished. Weeds absorb the growth factors needed by crops, such as water, sunlight, space, and mineral nutrients, leading to major losses for farmers: reduced crop production, productivity, and quality, and even a drop in the worth of the land. Farmers use cultural, biological, and mechanical approaches, which are costly for them to afford, to control the spread of weeds in their fields; as a result, they have started using new technologies [1]. In the chemical technique, farmers use pesticides and herbicides to keep insects out of their fields and control the growth of weeds, respectively. To limit weed development, these chemicals are sprayed evenly across the field; however, this has a negative impact on crops by lowering crop quality, growth, and production. Site-specific weed management (SSWM) is a precision agriculture technique that uses technology capable of identifying weeds at their earliest growth stages and spraying herbicides only where the weeds are [2]. The site-specific weed management approach reduces the uniform usage of herbicides in the field, thereby resolving the effects of weed on the crop.

Deep learning [3] works well for creating a weed detection system because it has been used in disciplines like computer vision, object identification, and image recognition in recent years. To begin with, an image dataset is needed to train the model; the images are obtained from Kaggle, an open-source image database. The downloaded photos are in RGB format, which divides each image into three channels (red, green, and blue). The dataset's pictures must be preprocessed in accordance with the model's requirements in order to extract reliable features. The convolutional neural network (CNN) is one of the most popular deep learning methods. Convolutional neural networks, a sort of feed-forward network, have the ability to autonomously extract features and classify the data on their own. The four layers that make up the convolutional neural network's architecture are the convolution layer, the ReLU layer, the pooling layer, and the fully connected layer. The convolution layer is added after the input layer in the model architecture, and it takes the data from the input layer to process and extract features from it. In the convolution layer, a convolution operation is applied between the filter and the input image to extract features, producing a two-dimensional feature map as its outcome. The ReLU layer is added next; it consists of an activation function called the rectified linear unit (ReLU). The ReLU layer takes its input from the convolution layer and applies the nonlinear ReLU activation function. In this layer, values above the threshold are accepted and transferred to the next layer by the activation function, while values below the threshold, or equal to zero, are neglected. The


dimension of the feature map is thus reduced in the ReLU layer by the activation function. The output of the ReLU layer is transferred to the next layer, called the pooling layer, in which either max-pooling or average-pooling operations can be applied. The pooling operation yields a feature map carrying the highest-probability values used to classify the data. A fully connected layer is then added to the network, taking its input from the pooling layer. The fully connected layer can handle only one-dimensional data, but the output of the pooling layer is two-dimensional, so a flatten step is applied to convert the feature map to one dimension. The resulting single stack (a one-dimensional array) is processed by the fully connected layer, which produces a class score for every class in the model; these scores are used to categorize the data. The specific objective of this paper is to present a convolutional neural network that classifies soybean crop, broadleaf weed, soil, and grass. It also shows how a CNN can be used to estimate the percentage of each element present in a given image.
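The following Keras sketch mirrors the pipeline described above: augmentation for generalization, a convolution/ReLU/pooling stack, a flatten step, and a fully connected softmax head over the four classes, compiled with RMSprop and categorical cross-entropy. The input size, filter counts, and augmentation settings are illustrative assumptions; the authors' exact configuration is not reproduced here.

```python
# Minimal sketch of a CNN for four-class weed/crop/grass/soil classification (assumed layer sizes).
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 4  # soybean crop, broadleaf weed, grass, soil

# Simple augmentation to help the model generalize, as discussed above.
augment = models.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
])

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),                  # assumed input size
    augment,
    layers.Conv2D(32, (3, 3), activation="relu"),       # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                        # pooling
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                   # 2-D feature maps -> 1-D vector
    layers.Dense(128, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),    # class scores
])

model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Training would then call model.fit with one-hot encoded labels for the four classes, which matches the categorical cross-entropy loss chosen above.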

2 Literature Survey

The authors in [4] use an artificial neural network (ANN) to categorize crop and weed. Confusion matrix results stated that the ANN performed better overall, with an 88.7% recognition rate. The weed segmentation model described in their paper can detect weeds at the six-leaf growth stage, but it cannot detect weeds when there is occlusion over the crop. Sa [5] implemented a weed classification system to promote smart farming methods that replace labor and manual work with robots in the agricultural field and reduce the cost of crop yielding. They designed a micro-aerial vehicle with a multispectral camera to collect multispectral images of the field. The obtained images are of three types: only crop, only weed, and images containing both crop and weed. The normalized difference vegetation index was used to extract a vegetation index separating soil and plant. To train the model, they used NVIDIA's Titan X GPU module, and for model inference, they used the Tegra TX2 embedded GPU module. The input images were processed by CUDA using Caffe C++, and Python was used to train the model. A SegNet neural network with 26 convolutional layers followed by ReLU layers was used, with five max-pooling layers in the encoder and five up-sampling layers in the decoder. The authors trained six different models with different numbers of input channels and training conditions, and the models were evaluated using F1-score and AUC metrics. They concluded that the model took 30 h to train and that, deployed on a GPU, it was both fast and accurate. Because weeds in the field lower the yield of grains, Om Tiwari [6] developed a weed identification system. The collection includes many varieties of weed, including Dactyloctenium aegyptium, Digera arvensis, Phalaris minor, and Echinochloa colona, photographed using a smartphone. The CNN was trained with 500 training iterations and a learning rate of 0.004. To efficiently


classify weeds and shorten training periods, a transfer learning approach is applied. They employed Python 3 and the TensorFlow framework to put the model into practice. Each weed can be identified with 95%, 65%, 61%, and 54% accuracy, respectively. Bhagyalakshmi [7] surveyed weed and pest detection systems using image processing techniques and outlined the procedure to follow while implementing a weed detection system. To acquire images for the dataset, they suggested using a digital camera to capture images of crop and weed. In the preprocessing step, they recommend converting the captured images to RGB, greyscale, or binary form so that the image can be processed efficiently for feature extraction. To suppress background and noise in the image, they recommended using low-pass and high-pass filters. The authors reviewed classification algorithms from fifteen papers and, for classifying weed and crop, highlighted neural networks and clustering algorithms. They concluded that the accuracy of the above-mentioned classification algorithms lies between 85 and 95%, depending on the limitations of image acquisition. Basavarajeshwari's [8] study on weed detection using image processing aimed to find a weed detection approach that would help farmers reduce the usage of herbicides and improve crop quality. They highlighted a few papers referenced in their research and advised using certain image processing techniques when putting a weed identification system into practice: image acquisition, preprocessing, feature extraction, and classification algorithms. The author discussed the techniques used in the seven studies reviewed and concluded that a model's accuracy depends on the dataset's constraints and the resolution of the algorithm's output image. Johnson [9] presents a weed detection system that addresses the issue of weeds in agriculture. The dataset consists of weed images captured using a Raspberry Pi camera, and the images are processed using the Raspberry Pi processor. The preprocessing step includes image smoothing, erosion, and dilation. The dilated image is in binary form, which enables the processor to identify the crop, shown in white, and treat the remainder as weed. A cutter aligned with the system detects the weed and removes it using the cutting motor. The built model can detect weeds that are a little distance from the crops, and the system improves crop production by letting crops absorb the nutrients present in the soil. Lottes [10] aims to reduce the use of agrochemicals such as herbicides and pesticides in the field, as they affect the environment, biodiversity, and human health. To overcome this issue, they want to build autonomous precision farming robots that can spray herbicide only on the weed. The dataset to train the model was collected using the BOSCH Deepfield Robotics BoniRob platform, and the images, captured using a JAI AD-130 GE camera, consist of four channels (RGB + NIR). The images are preprocessed to improve the generalization of the classification system. A fully convolutional neural network (FCN) with an encoder and decoder is used to classify weed and crop; the FCN uses pixel-wise segmentation with up-sampling in the decoder.
The conclusion states that FCN generalizes the 16 models compared to other approaches and the model classification is more


Di Cicco's [11] goal is to develop an effective autonomous farming robot using image processing; the robot is intended to reduce the use of herbicides and to increase crop production. The dataset consists of sugar beet field images captured with the JAI AD-130 camera. The classification model is built using a SegNet neural network with an encoder and a decoder for pixel-wise classification. The network was implemented on an NVIDIA GTX 1070 GPU, and the model was trained using synthetic data. The results state that its performance is better than that of the model trained with real-world images. Nathalia [12] collected a dataset from the region of Cundinamarca–Boyaca in Colombia. The images were captured using a Nikon camera with an 18–55 mm focal length and are segmented using OpenCV to extract the region of interest. An ANN was trained on the dataset to identify the plant species, and the training performance was evaluated with a gradient algorithm. The behavior of the feedforward neural network was analyzed based on the performance of the neurons in the hidden layer. Vikhram [13] developed an automatic weed detection system that helps farmers detect weeds automatically and sprays herbicide only on the weeds, which enhances crop production and supports their agribusiness. The images were captured using a Raspberry Pi camera and preprocessed with thresholding, erosion, and dilation to decide the region of interest. The images are converted to greyscale using OpenCV, and a green color mask is applied to separate the crop from the background of the image. To spray herbicide on weeds, they programmed three separate routines for the wheel motor, weed detection, and the pump motor. The built system sprays herbicide for only 4 s, and the spraying process was tested on ragi and leaved weeds. The conclusion states that classification over a large variety of crops and weeds was somewhat confusing for the classification model. Lottes [14] aims to implement a weed detection system that reduces the use of agrochemicals in the field. The dataset consists of images of sugar beet fields captured in RGB + NIR format. The images were collected with the BOSCH Deepfield Robotics BoniRob platform in Stuttgart, Germany, using the JAI AD-130 GE camera, and have four channels (RGB + NIR). A small amount of data is used for training, and the images were labeled manually. A semi-supervised random forest classifier is used to implement the weed detection system. The performance of the model was compared on the Bonn, Zurich, and Stuttgart datasets, and performance on the Zurich dataset was better than on the other datasets. The system configuration is an Intel i7 CPU with a GeForce GTX-1080 GPU. The conclusion states that the classification performance is high and that labeling an image takes only about 1 min of effort. Wendel [15] mentions that precision agriculture tools can help spread awareness of environmental issues, reduce the waste caused in agriculture, and, above all, enhance crop quality and production. In this paper, they experimented with detecting weeds in vegetable crops and collected images of the corn crop and weeds at Mulyan farms near Cowra, NSW. The weed varieties found in the field are caltrop, curly dock, and barnyard grass.


The images were captured using a Resonon Pika VNIR hyperspectral camera and contain 244 spectral bands. In the preprocessing step, the images were labeled manually. Linear discriminant analysis (LDA) and SVM classification models were implemented and compared using scikit-learn packages. The conclusion states that the SVM achieved better results than LDA when the model was trained with hyperspectral images. Dhayabarani [16] implemented a weed detection system to reduce labor cost and to increase food resources. The dataset used to train the model was downloaded from the Internet, and the images are cropped in the preprocessing step to indicate the region of the plant in the image. A CNN is used to build the weed detection system, and the network processes RGB images with dimensions of 224 × 224. The features are extracted automatically by the convolution layers of the CNN, which works well for image recognition tasks. The model built in their experiment can classify healthy leaves and weeds efficiently. Ngo [17] aims to detect weeds and to reduce the workload of farmers in the field; the authors built a robot using LEGO Mindstorm EV3 and a computer. The dataset used to train the model was collected manually and has two classes: one is crop and the other is crop weed. Each class contains 123 images stored in JPEG format; 80% of these images were used for training and 20% for validation and testing. In the preprocessing step, they used image augmentation to reduce the effect of noise in the dataset. They then trained a CNN on the dataset, and the final model has 16,390,691 parameters. The author experimented with two convolutional neural networks: one trained on the original dataset and the other on the augmented images. Each model was trained for 20 epochs. The CNN trained on the original images achieves a training accuracy of 90.01% and a testing accuracy of 85.5%, whereas the training and testing accuracy of the CNN with data augmentation is 95.45% and 70.5%, respectively. The author concluded that the efficiency of the robot depends on the quality of the software and hardware. Sarvini [18] compared three algorithms to analyze model performance for weed detection, noting that weed control is an essential step in agriculture to maintain crop production. The dataset consists of three different kinds of weeds that are most commonly found in vegetable crop fields. The images in the dataset were acquired by capturing the field with a 10 MP digital camera and are preprocessed to detect greenness in the image. The three classifiers used in the paper are SVM, ANN, and CNN, and the analysis shows that the CNN performs better than the SVM and ANN. The conclusion states that the accuracy of the model can be enhanced by extracting more robust features. Tejeda [19] researched tools for precision agriculture. The objective is to build a weed detection system using binary classification, with a focus on image processing algorithms to identify weeds. The images in the dataset were acquired manually using a semi-professional camera at a height of 1.20 m from the ground. In the preprocessing step, they applied a median filter and threshold segmentation to reduce noise in the images. The algorithm first eliminates small objects in the image as weed and then, based on the threshold, classifies the remaining objects as either crop or weed.


The results state that the weeds are identified correctly in comparison with the vegetable crop, and the algorithm can detect weeds between the crop lines.

3 Materials and Method

3.1 Data Acquisition and Pre-processing

The dataset was acquired from an open-source repository called Kaggle. It consists of 15,336 images, all in RGB format, already categorized into their respective soybean crop, broadleaf weed, soil, and grass directories. Data augmentation is an image preprocessing technique that is used to process the images according to the model specification and helps to extract robust features [3]. Preprocessing consists of steps such as resizing, setting the color mode, flipping, and setting the class mode. The input images are resized to 150 × 150, and the image color mode is set to RGB [20]. The class mode is set to categorical because the model has to assign the data to their respective classes. Preprocessing all of the images at once would consume too much time, so the images are processed in batches; the batch size for the training data is set to 92, which keeps the processing load on the network manageable.
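The pre-processing described above can be sketched with Keras' ImageDataGenerator as follows. The 150 × 150 target size, the RGB color mode, the categorical class mode, the batch size of 92, and the rotation and horizontal-flip augmentation mentioned in Sect. 4.1 follow the text; the directory layout and the rescaling factor are illustrative assumptions rather than the authors' exact configuration.

# Minimal pre-processing sketch (paths and rescaling are assumptions)
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # scale pixel values to [0, 1]
    rotation_range=40,        # rotate images by up to 40 degrees (Sect. 4.1)
    horizontal_flip=True)     # flip images horizontally (Sect. 4.1)

train_generator = train_datagen.flow_from_directory(
    "dataset/train",          # hypothetical folder with soybean/weed/soil/grass sub-folders
    target_size=(150, 150),   # resize every image to 150 x 150
    color_mode="rgb",
    class_mode="categorical", # one-hot labels for the four classes
    batch_size=92)            # batch size used for the training generator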

3.2 Principle and Structure of CNN

Since the convolutional neural network (CNN) is a deep learning technique that has been widely used in computer vision, object detection, and image recognition applications, it can be used to create a weed detection system. A large image dataset is needed to train a CNN, and all of the images must be preprocessed before the model can be trained. The primary benefit of a convolutional neural network is its ability to extract features automatically by taking pixel correlation into account. In general, a convolutional neural network is made up of convolution layers, ReLU layers, pooling layers, and fully connected layers. The multilayer CNN used to build the weed identification model contains four convolution layers and four max-pooling layers arranged alternately. The extracted feature map is flattened into a one-dimensional stack because the fully connected layers can only handle one-dimensional input. In the dense (fully connected) layers, each neuron is connected to every neuron in the previous layer. A dropout of 0.5 is applied so that only 50% of the neurons in the layer are active, which passes on only the features with high probability values to the next layer. The multilayer CNN enables the model to extract more robust features. Figure 1 represents the layers of the CNN.

Fig. 1 Structure of multilayer CNN


3.3 Weed Identification Model

To develop the weed identification system, a multilayer convolutional neural network with four convolution layers and four max-pooling layers arranged alternately is proposed [21]. In each convolution layer, a ReLU activation function is applied, which helps to enhance the training speed of the model. The network is trained as shown in Fig. 2. Initially, all input images are processed according to the model specification and are then used to train the model. The input images have a size of 150 × 150 and are loaded into the input layer.

The first convolution layer, conv2d_1, takes its input from the input layer and applies the convolution operation. In conv2d_1, the number of filters is set to 32, the kernel size is 3 × 3, and the ReLU activation function is used, producing a feature map of 148 × 148 × 32. The first max-pooling layer, max_pooling2d_1, is used for down-sampling with a window size of 2 × 2; it produces a feature map of 74 × 74 × 32, which is passed as input to the second convolution layer, conv2d_2. In conv2d_2, the number of filters is 64 and the kernel size is 3 × 3, producing a feature map of 72 × 72 × 64. In the second max-pooling layer, max_pooling2d_2, the down-sampling size is 2 × 2 with the default stride, and the resulting feature map has dimensions 36 × 36 × 64, half of the conv2d_2 output. The third and fourth convolution layers, conv2d_3 and conv2d_4, both use a kernel size of 3 × 3 and 128 filters, and the third and fourth max-pooling layers, max_pooling2d_3 and max_pooling2d_4, both use a down-sampling size of 2 × 2. conv2d_3 produces a feature map of 34 × 34 × 128; max_pooling2d_3 takes this output and produces a feature map of 17 × 17 × 128. Finally, conv2d_4 takes its input from max_pooling2d_3 and produces a feature map of 15 × 15 × 128, and max_pooling2d_4 produces a feature map of 7 × 7 × 128.

The feature map is transformed into a one-dimensional stack of size 6272 by the flatten layer. The dropout method activates only a certain percentage of the neurons in the layer so that the features with high probability are passed on, and a dropout of 0.5 is added to avoid overfitting. Finally, two dense layers, dense_1 and dense_2, are added to the model. The first dense layer, dense_1, consists of 512 neurons and processes the input stack of size 6272 with a ReLU activation function, producing an output of size 512. The second dense layer, dense_2, takes this output of size 512 as input; a SoftMax activation function converts it into class probabilities, providing an output over the four classes. In total, the model is trained with 3,454,660 trainable parameters.
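For illustration, the layer configuration described above can be written as the following Keras sketch. The layer sizes follow the description and yield the same 3,454,660 trainable parameters reported in the text; the optimizer and loss function are not stated in the paper and are plausible assumptions.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(150, 150, 3)),  # conv2d_1 -> 148x148x32
    layers.MaxPooling2D((2, 2)),                                              # max_pooling2d_1 -> 74x74x32
    layers.Conv2D(64, (3, 3), activation="relu"),                             # conv2d_2 -> 72x72x64
    layers.MaxPooling2D((2, 2)),                                              # max_pooling2d_2 -> 36x36x64
    layers.Conv2D(128, (3, 3), activation="relu"),                            # conv2d_3 -> 34x34x128
    layers.MaxPooling2D((2, 2)),                                              # max_pooling2d_3 -> 17x17x128
    layers.Conv2D(128, (3, 3), activation="relu"),                            # conv2d_4 -> 15x15x128
    layers.MaxPooling2D((2, 2)),                                              # max_pooling2d_4 -> 7x7x128
    layers.Flatten(),                                                         # 7*7*128 = 6272 values
    layers.Dropout(0.5),                                                      # keep roughly half the activations
    layers.Dense(512, activation="relu"),                                     # dense_1
    layers.Dense(4, activation="softmax")])                                   # dense_2: four classes

# Optimizer and loss are assumptions; the loss matches the categorical class mode.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.summary() reports 3,454,660 trainable parameters, matching the text above.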

Fig. 2 Structure of weed classification model with parameters


4 Results and Parameters

The experiment was performed on an Intel Core i5-6200U processor with a CPU speed of 2.30 GHz, 8 GB of RAM, and 300 GB of disk space to store and process the data. The dataset obtained for this experiment consists of 15,558 images in total, already categorized into the soybean crop, weed, soil, and grass directories. From each directory, 60% of the samples are used for the training set, 20% for the testing set, and the remaining 20% for the validation set. To preprocess the images, the data augmentation technique is applied, which improves model generalization. The input images are resized to 150 × 150, the size specified in the input layer. The number of epochs is set to 15, and in each epoch the number of iterations is determined by the number of training samples and the batch size of the training generator.
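A minimal training call consistent with these settings is sketched below, assuming the generator and model objects from the earlier sketches and a validation generator built in the same way from the 20% validation split.

# Training sketch: 15 epochs, iterations per epoch derived from samples / batch size
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // 92,
    epochs=15,
    validation_data=validation_generator)   # assumed to exist, built like train_generator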

4.1 Result of Pre-processing

The input images are preprocessed using data augmentation, which results in the extraction of robust features. Because the input size of the first convolution layer conv2d_1 is set to 150 × 150, the original images are resized to 150 × 150 during preprocessing. In the preprocessing step, the images are rotated by an angle of 40° and are also flipped horizontally to obtain slightly modified data for training, which enables the model to generalize better [22]. Since the input data is slightly modified, the network learns more robust features.

4.2 Results of CNN

The multilayer convolutional neural network consists of four convolution layers, four max-pooling layers, one flatten layer, and two dense layers. A dropout of 0.5 is added so that only 50% of the neurons in the network are active, which passes on only the features with high probability values and also prevents the model from overfitting. Trained for 15 epochs, the model has 3,454,660 trainable parameters. The model is analyzed based on the validation accuracy and loss, and it achieves an accuracy of 96.07% with a loss of 12%. Figures 3 and 4 show the model accuracy and loss as graphs. Larger fluctuations can be observed in the validation curves because a smaller number of features is considered.
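Curves similar to Figs. 3 and 4 can be produced from the history object returned by model.fit in the earlier sketch, for example:

import matplotlib.pyplot as plt

# Plot training versus validation accuracy (use "loss"/"val_loss" for Fig. 4)
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.legend()
plt.show()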


Fig. 3 Analysis between training and validation accuracy

Fig. 4 Analysis between training and validation loss

4.3 Classification Result

The model can classify images belonging to any of the soybean crop, weed, grass, and soil classes. The confusion matrix of the model is computed to evaluate the precision, recall, F1-score, and support of each class. Table 1 represents the confusion matrix.

Table 1 Representing confusion matrix

           Grass   Soil   Soybean   Weed
Grass      190     147    349       18
Soil       191     138    293       28
Soybean    395     326    692       62
Weed       60      42     123       13


Table 2 Precision, recall, F1-score, and support for each class

           Precision   Recall   F1-score   Support
Grass      0.23        0.21     0.22       704
Soil       0.21        0.21     0.21       650
Soybean    0.48        0.48     0.48       1475
Weed       0.09        0.10     0.09       238

The precision value indicates how many of the images assigned to a class actually belong to that class:

Precision = True positive / (True positive + False positive)

The recall value indicates how many of the actual instances of a class are correctly identified, i.e., it accounts for the false negatives:

Recall = True positive / (True positive + False negative)

The F1-score is the harmonic mean of precision and recall:

F1-score = 2 × (Precision × Recall) / (Precision + Recall)

The support value indicates the total number of occurrences of each class in the given dataset. Table 2 lists the precision, recall, F1-score, and support for each class.
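The per-class values in Tables 1 and 2 are what scikit-learn's confusion_matrix and classification_report compute from the true and predicted labels. A small sketch follows, assuming a test generator created with shuffle=False so that its labels line up with the predictions; the variable names are illustrative.

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Ground-truth class indices and the model's predicted classes for the test set
y_true = test_generator.classes
y_pred = np.argmax(model.predict(test_generator), axis=1)
class_names = list(test_generator.class_indices.keys())

print(confusion_matrix(y_true, y_pred))                                  # counts as in Table 1
print(classification_report(y_true, y_pred, target_names=class_names))  # precision, recall, F1, support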

4.4 Identification Results

The CNN is used to distinguish between agricultural elements such as crop, weed, soil, and grass; its convolution layers extract the features automatically. The multilayer CNN is trained with the preprocessed data, and the model achieves an accuracy of 96.07% with a loss of 12%. To identify the elements in an image, a function is defined that loads the image to be predicted and preprocesses it according to the model specification [23]. The image is then processed by the CNN, which returns a list of values representing the percentage of each element present in the given image.
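A minimal sketch of such a prediction function is given below; the file path and the class order are illustrative assumptions, while the 150 × 150 resizing and the percentage-style output follow the description above.

import numpy as np
from tensorflow.keras.preprocessing import image

def predict_elements(img_path, model, class_names=("grass", "soil", "soybean", "weed")):
    # Load one image, preprocess it like the training data, and return the
    # percentage predicted for each element (class order is an assumption).
    img = image.load_img(img_path, target_size=(150, 150))   # resize to the model's input size
    arr = image.img_to_array(img) / 255.0                    # scale to [0, 1]
    arr = np.expand_dims(arr, axis=0)                        # add the batch dimension
    probs = model.predict(arr)[0]                            # softmax output for the four classes
    return {name: round(100 * float(p), 2) for name, p in zip(class_names, probs)}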


5 Conclusion

In this paper, images of agricultural elements such as soybean crop, weed, grass, and soil are used to implement a weed detection system. Data augmentation is an image preprocessing method that modifies the original training images; it yields a generalized model that can detect the elements even when the images are slightly modified. The classification model is built on the features extracted by the CNN. The main advantage of using a multilayer CNN is that it reduces the time needed to extract and learn the features. This model can classify agricultural elements such as crop, weed, grass, and soil in images, whereas the existing systems are only capable of classifying weed and crop. The experiment outputs a list of values representing the percentage of each element present in the given prediction image. Because the model can categorize all of the agricultural components in an image, it is useful for a weed detection system that must distinguish weeds from the other components, making it possible to spray herbicide only on the weeds and ultimately reducing the amount of herbicide used in the field.

References

1. Bosilj P, Duckett T, Cielniak G (2018) Analysis of morphology-based features for classification of crop and weeds in precision agriculture. IEEE Robot Autom 1–7
2. Gao J, Nuyttens D, Lootens P, He Y (2018) Recognising weeds in a maize crop using a random forest machine-learning algorithm and near-infrared mosaic hyperspectral imagery. Biosys Eng 170:39–50
3. Burgos-Artizzu XP, Ribeiro A, Tellaeche A (2010) Analysis of natural images processing for the extraction of agricultural elements. Image Vis Comput 28:138–149
4. Bakhshipour A, Jafari A (2017) Weed segmentation using texture features extracted from wavelet sub images. Biosyst Eng 157:1–12
5. Sa I (2018) Weed net: dense semantic weed classification using multispectral images and MAV for smart farming. Robot Autom 3:588–595
6. Tiwari O (2019) An experimental set up for utilizing convolutional neural network in automated weed detection. In: 2019 4th international conference on internet of things: smart innovation and usages (IoT-SIU)
7. Bhagyalakshmi (2013) A survey on weed and pest detection system. Int J Adv Res Comput Sci Manage Stud 33–38
8. Basavarajeshwari (2017) A survey on weed detection using image. Int J Eng Res Technol (IJERT) 1–3
9. Johnson R (2020) Weed detection and removal based on image processing. Int J Recent Technol Eng (IJRTE) 8(6):347–352
10. Lottes P (2018) Fully convolutional networks with sequential information for robust crop and weed detection in precision farming. IEEE Robot Autom Lett 3(4):1–8
11. Di Cicco M (2017) Automatic model based dataset generation for fast and accurate. In: 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 5188–5195
12. Nathalia B (2016) A computer vision application to detect unwanted weed in early stage. WSEAS Trans Comput Res 4:41–45
13. Vikhram R (2018) Automatic weed detection and smart herbicide sprayer robot. Int J Eng Technol 115–118


14. Lottes P (2017) Semi-supervised online visual crop and weed classification in precision farming exploiting plant arrangement. In: 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 5155–5161
15. Wendel A (2016) Self-supervised weed detection in vegetable crops using ground based hyperspectral imaging. In: 2016 IEEE international conference on robotics and automation (ICRA), pp 5128–5135
16. Dhayabarani R (2018) Detection of weed using neural network. Int J Eng Res Technol (IJERT) 6(8):1–5
17. Ngo HC (2019) Weed detection in agriculture fields using convolution neural network. Int J Innovative Technol Exploring Eng (IJITEE) 8(11):292–296
18. Sarvini T (2019) Performance comparison of weed detection algorithm. In: International conference on communication and signal processing, pp 843–847
19. Tejeda I (2019) Algorithm of weed detection in crops by computer vision. In: 29th international conference on electronics, communications and computing, pp 124–128
20. Wu J (2018) Weed detection based on convolutional neural. IT Cognition
21. Tang J, Wang D, Zhang Z, He L (2017) Weed identification based on K-means feature learning combined with. Comput Electron Agric 135:63–70
22. Farooq A (2019) Analysis of spectral bands and spatial resolutions for weed classification via deep convolutional neural network. IEEE Geosci Remote Sens 183–187
23. Aravind R, Daman M, Kariyappa BS (2015) Design and development of automatic weed detection and smart herbicide sprayer robot. IEEE Recent Adv Intell Comput Syst 257–261
24. Bakhshipour A, Jafari A (2018) Evaluation of support vector machine and artificial neural networks in weed. Comput Electron Agric 145:153–160

Detecting Bone Tumor on Applying Edge Computational Deep Learning Approach G. Megala , P. Swarnalatha , and R. Venkatesan

Abstract Bone cancer (BC) affects the majority of the elderly in today’s world. It directly affects the neurotransmitters and leads to dementia. MRI images can spot bone irregularities related to mild cognitive damage. It can be useful for predicting bone cancer, though it is a big challenge. In this research, a novel technique is proposed to detect bone cancer using AdaBoost classifier with a hybrid ant colony optimization (ACO) algorithm. Initially, MRI image features are extracted, and the best features are identified by the AdaBoost curvelet transform classifier. The proposed methods yield better classification accuracy of 97% on analyzing MRI images and detecting the bone tumor in it. Three metrics, namely accuracy, specificity, and sensitivity, are used to evaluate the proposed method. Based on the results, the proposed methods yield greater accuracy than the existing systems. Keywords Ant colony optimization · AdaBoost classifier · Bone cancer · Curvelet transform · Convolutional neural network

1 Introduction

One crucial aspect of cancer diagnosis that must not be disregarded is the ability to properly divide images into meaningful regions. Magnetic resonance imaging and computed tomography are the most popular and crucial medical examination techniques for projecting the infected segment of a bone structure into the foreground. The purpose of this process (image segmentation) is to separate the image into several segments, each of which is represented in a less complex manner, and then extract meaningful information from the segmented regions. The goal of this study is to use MRI scans to detect the existence of a bone malignancy.

G. Megala (B) · P. Swarnalatha
Vellore Institute of Technology, Vellore 632014, India
e-mail: [email protected]
R. Venkatesan
SRM TRP Engineering College, Tiruchirappalli 620014, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_66


Image intensity levels have two essential characteristics: discontinuity and similarity. Discontinuity aids the segmentation process by locating the changes in intensity across the image [1]. The image-partitioning process begins with the detection of edges and intensity differences. Support vector machine, K-nearest neighbor, and the adaptive neural fuzzy inference system are some of the most popular and widely used classification algorithms, and each follows its own set of rules. The first classifier employs a differential technique for classification; the second makes use of the feature-similarity principle; the third combines neural networks and fuzzy logic, with the neural network assisting in determining the fuzzy system's requirements [2]. For the image segmentation process, there are pre-set standards, and the image is divided into corresponding regions with similar features using these pre-set standards. Most present procedures necessitate the manipulation of numerous elements in order to achieve highly accurate results [3]. The multisensor scales, on the other hand, can function even if the aforesaid conditions are not met. Sarcoma of the bone is a rare type of bone cancer, and one of the agents known to cause bone sarcoma is excessive exposure to ionizing radiation. The overall architecture is shown in Fig. 1.

The structure of this article can be briefly chronicled as follows: Sect. 2 comprises the extensive literature survey. Section 3 explains the proposed AdaBoost curvelet transform classifier and the ant colony optimization algorithm, respectively. Section 4 encompasses the results and discussion, followed by the conclusion of the paper.

Fig. 1 Overall architecture for bone cancer net model


2 Related Works Globally, one of the most fatal diseases is bone cancer, and hence, its detection and treatment is an area of intense research. Its intricate tissue structure makes the process even more complicated. The study involves the use of the deep learning method on three major bone cancer types. It can be deduced from the research that nuclei count is paramount to defining the state of malignancy. The results from the study showed increased accuracy and performance [4]. The importance of early detection in the treatment of cancer cannot be overstated. Using image processing techniques, the study provides a cost-effective and simple way of detecting lymphatic cancer. The retrieved feature parameters are used to differentiate the normal tissues from the malignant lesions. The integration of image processing with machine learning algorithms such as SVM and random forests results in enhanced accuracy [5, 6]. N-fold cross-validation is performed on the selected features while training. Segmentation and classification are very vital processes to locate fractures in bones. The proposed wavelet transform-based segmentation technique not only keeps the important information of the image, but it yields an appropriate segmented image. Results prove that the proposed method is viable for all types of X-ray images with enhanced accuracy [7]. For metastatic bone lesions in prostate cancer patients, the study proposes a classification system based on deep learning. In order to establish the best possible technique for bone lesion categorization, deep classifiers were employed using a lesion extraction plan. Research proves that the part revealing the maximum information of a lesion image is texture followed by volumetric information [8]. The process of cancer detection can be enhanced through continuous research. The paper proposes a method of convolution neural networks, supervised learning, and morphological operations which were used. Classification of the type of cancer is carried out through CNN which reduces training time to a considerable extent. And from the extracted portion, the percentage area of cancer is evaluated. The integration method is proven to be more practical rather than using individual methods [9]. Using machine learning, the study aims to make a comprehensive analysis for the diagnosis and cure of cancer in different organs/parts of the human body. The effective application of different machine learning techniques such as supervised, unmonitored, and deep learning is demonstrated in the study. Solving the complications related to cancer diagnosis and accurate treatment is a challenging process [10]. Using different techniques of machine learning, the research encompasses a comprehensive assessment of all the processes and developments involved in cancer detection. It also undertakes a comparative analysis of many methods and highlights their advantages as well as disadvantages. Finally, it brings out the challenges and aspects to be incorporated for future work in order to enhance the process of cancer detection and cure [11]. A multistage process using automatic image processing for early detection of melanoma skin cancer is proposed. Adaptive principal curvature helps in the first


stage, color normalization is used for separating the lesions, and the ABCD rule is deployed for feature extraction. Experimental results of the proposed method show higher accuracy and enhanced speed [12]. A structured review of sixteen works involved in the detection of leukemia using machine learning is carried out in the research. It can be inferred that compared to other existing techniques, deep learning proves to yield better results with regard to precision. The study also encourages the use of ML techniques by technicians in laboratory applications [13]. The importance of early discovery and classification of ruptures caused by proximal humerus fractures cannot be overstated. The goal of this research is to see if a deep learning algorithm can help with the above process. After that, a deep convolutional neural network was trained, which resulted in great accuracy. The findings demonstrate the effectiveness of artificial intelligence in detecting and classifying proximal humerus ruptures [14]. Acute lymphoblastic leukemia and multiple myeloma are classified differently as a time-consuming and error-prone manual technique. The study uses the SNAM dataset to reduce mistakes made during the manual procedure and save time. In comparison with traditional machine learning techniques, the suggested model achieved an overall accuracy of 97.2, making it a useful method for identifying the type of malignancy in the bone marrow [15]. The paper proposes a multi-step process for enhancing bone cancer segmentation. The success of the system in accurately detecting the tumor is because of using a fuzzy-possibilistic classification algorithm. The isolated cancer tissue from the experiment after surface measurement yields satisfactory results [16]. Based on spectral intensity ratios, the study proposes a classification technique for three bacteria. The function of autofluorescence microscopy prompted by light was also thoroughly examined in the work. In order to fulfill experimental conditions, the unnecessary lights were removed with the help of narrowband filters. The classification yielded very high sensitivity and specificity [17]. In order to aid both investigative and remedial processes, the paper deploys CNN and OpenCV Python for the identification of bone tumors and prediction of the different phases of cancer. Training data is utilized to distinguish malignant cells from those of non-cancerous ones. The method requires less manual intervention and is proven to be cheaper [18, 19].


3 Methodology

3.1 AdaBoost Curvelet Transform Classifier

To extract features from MRI images, the curvelet transform is used. A wavelet-based retrieval of the image extracts features without directional sensitivity; the curvelet transform, a multiresolution geometric analysis, is applied on top of the discrete wavelet transform to address this missing directional selectivity. For a multi-scale analysis, the image f(x, y) is represented by continuous Ridgelet coefficients as in Eq. (1), and the curvelet transform can be seen as a broadening of the Ridgelet transform:

Rf(a, b, θ) = ∬ ψa,b,θ(x, y) f(x, y) dx dy   (1)

in which a > 0 denotes the scale parameter, b ∈ R denotes the translation parameter, and θ denotes the orientation parameter. All of these coefficients are used to reconstruct the image, and the Ridgelet is defined in Eq. (2):

ψa,b,θ(x, y) = a^(−1/2) ψ((x cos θ + y sin θ − b) / a)   (2)

The wavelets run transverse to the ridges: ψa,b,θ is constant along the ridge lines defined by x cos θ + y sin θ = const. Ridgelets are the basic building pieces for achieving the high anisotropy that captures edges better than a traditional sinusoidal wavelet [6, 9]. Also, the curvelet spectra envelop the image completely in the frequency plane, and hence the transform is a powerful and effective tool for extracting image features. The 2D image is represented by the Cartesian array f[m, n], 0 ≤ m < M, 0 ≤ n < N, which is transformed to the curvelet domain based on wrapping of Fourier samples. A scale j, an orientation l, and two spatial location factors (k1, k2) are used to index the resulting curvelet coefficients. The image is broken down into multiple sub-bands with different scales and orientations. Curvelet texture descriptors are created using statistical procedures, and the discrete curvelet coefficients are determined using Eq. (3):

CD(j, l, k1, k2) = Σ(0 ≤ m < M, 0 ≤ n < N) f[m, n] φD j,l,k1,k2[m, n]   (3)

Here, each φD j,l,k1,k2[m, n] is a digital curvelet waveform; in this frequency range, the sub-bands follow the effective parabolic scaling law. The image is decomposed into the oscillating edge behavior exhibited by the curvelets. These wrapping-based transforms can be considered multi-scale transforms that use pyramid structures with a number of orientations at each scale [20].


Fig. 2 Proposed bone cancer detection

The proposed hybrid approach is shown in Fig. 2. The curvelets are generally realized in the frequency domain to achieve the best efficiency. They are well suited to medical image feature extraction because of their capability of approximating curved singularities in edge-based features. The curvelet window and the image are transformed into the frequency domain and their Fourier product is formed; applying the inverse Fourier transform to this image-curvelet product yields the curvelet coefficients.
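The following sketch only illustrates the frequency-domain step described above (transform, multiply by a window, invert); a simple radial band-pass mask stands in for a real curvelet wedge window, so it is not a full wrapping-based curvelet transform.

import numpy as np

def subband_coefficients(img, r_low, r_high):
    # Multiply the image spectrum by a window and invert the FFT.
    # The radial band-pass mask is a crude stand-in for a curvelet window.
    F = np.fft.fftshift(np.fft.fft2(img))                  # image spectrum, centred
    rows, cols = img.shape
    y, x = np.ogrid[-rows // 2:rows - rows // 2, -cols // 2:cols - cols // 2]
    radius = np.sqrt(x ** 2 + y ** 2)
    window = (radius >= r_low) & (radius < r_high)         # hypothetical frequency window
    product = F * window                                   # Fourier product of window and image
    return np.fft.ifft2(np.fft.ifftshift(product))         # inverse FFT gives sub-band coefficients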

3.2 Ant Colony Optimization

Preprocessing of the MRI images improves the detection of problematic areas. Two phases are involved: preprocessing and augmentation. To begin with, a positioning step removes MRI film artifacts such as labels and X-ray tags. Second, the ant colony optimization (ACO) technique is used to remove high-frequency components efficiently, and the result of DL-EDED is compared with those of


the median, adaptive, and spatial filter systems. Before extensive data interpretation and separation, preprocessing that includes noise removal is frequently necessary; this preprocessing is often characterized as radiological or geometrical improvement. The pheromone concentration of the paths and a heuristic assessment are the only criteria used to choose high-frequency segments. During the construction process, a specific ant k chooses which node to move to next using a probabilistic action selection rule that gives the likelihood that ant k will move from the current node i to the following node j:

p_ij^k = [Tij]^α [nij]^β / Σ(l ∈ Ni^k) [Til]^α [nil]^β,   if j ∈ Ni^k   (4)

Here Tij is the pheromone on the arc (edge) from node i to j, Ni^k denotes the neighboring nodes of node i that are available to a specific ant k, and nij is the heuristic desirability of moving from node i to j. The constants α and β weight the pheromone and the heuristic term, respectively. (A small numerical sketch of this selection rule is given after Algorithm 1.)

In the proposed approach, all of the images were focused on the sensory receptors, and various regions can be found inside an image. The images were taken from the Kaggle dataset repository, which in this case included a variety of images, and there are significant factors for the assessment of pixel intensities in the various sectors. About 50 images of multiple patients were captured, and the approach ensures that the actions and performance of the patient are normal and accurate.

Algorithm 1: Training the visual model on the modified CNN
Input: BC image of size 256 × 256 × 3
Output: The weight parameters in .pkl files for the given number of EPOCHS and BATCH
1. Import tf, cv2, numpy
2. Define the hyperparameters: learning rate = 0.5, batch size = 25, epochs = 2
3. Initialize the placeholders
4. Create a tf session using tf.Session(config=config)
5. Import the model and initialize it with the pre-processed vggface model
6. Initialize the optimizer; the Adam optimizer is used with the learning rate above, via tf.train.AdamOptimizer(learning_rate=LEARNING_RATE)
7. Read the image data and labels with tf.TFRecordReader() and decode them
8. Shuffle the batch with tf.train.shuffle_batch()
9. Run the tf session and store the weight parameters in .pkl files for the given number of EPOCHS and BATCH
10. Save the whole session and dump it to files for use in prediction using tf.train.Saver()
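The selection rule of Eq. (4) can be illustrated with a few made-up numbers as follows; the pheromone values, heuristic values, and neighbour set are purely illustrative.

import numpy as np

alpha, beta = 1.0, 2.0
neighbours = [1, 2, 3]                       # Ni^k: nodes reachable from node i for ant k (illustrative)
T = {1: 0.5, 2: 0.2, 3: 0.8}                 # pheromone on edge (i, j) (illustrative)
n = {1: 1.0, 2: 3.0, 3: 0.5}                 # heuristic desirability of edge (i, j) (illustrative)

weights = np.array([(T[j] ** alpha) * (n[j] ** beta) for j in neighbours])
p = weights / weights.sum()                  # probability of moving to each neighbour, Eq. (4)
next_node = np.random.choice(neighbours, p=p)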


4 Experimental Analysis

This section compares BC-AdaBoost, BC-PCA-AdaBoost, BC-PSO-AdaBoost, and BC-PCA-ACO-AdaBoost. The number of classification trees can be increased up to 300, and the tree depth up to 6. Table 1 summarizes the findings. The categorization accuracy is shown in Fig. 3, and the true positive and true negative rates are shown in Figs. 4 and 5. In Fig. 3, it is found that BC-ANN-ACO-AdaBoost improves classification accuracy by 16.47% over BC-AdaBoost, by 12.72% over BC-PCA-AdaBoost, by 5.21% over AD-PSO-AdaBoost, and by 3.34% over BC-CNN-ANN-ACO-AdaBoost.

Table 1 Results

                       AdaBoost   CNN-ACO-AdaBoost   ANN-ACO-AdaBoost   CNN-ANN-ACO   CNN-ANN-ACO-AdaBoost
Accuracy               82         85                 88                 91            93
Normal true positive   89         91                 92                 95            96
True positive MCI      67         73                 80                 82            85
True positive BC       57         58                 67                 75            78
True negative normal   78         81                 84                 88            90
True negative MCI      92         93                 95                 96            98
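As a rough illustration of the boosted-tree setting described above (up to 300 trees of depth 6), a scikit-learn AdaBoost classifier could be configured as below; the feature matrix X_train and labels y_train (for example, curvelet texture descriptors with their class labels) are assumed to be available and are not defined here.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=6),   # use base_estimator= on scikit-learn < 1.2
    n_estimators=300)                                # up to 300 classification trees
clf.fit(X_train, y_train)                            # X_train, y_train: hypothetical features and labels
print(clf.score(X_test, y_test))                     # accuracy on a held-out test split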

Fig. 3 CNN-ANN-ACO-AdaBoost accuracy

Fig. 4 True positive rate for BC-CNN-ANN-ACO-AdaBoost

Fig. 5 True negative for BC-CNN-ANN-ACO-AdaBoost

As shown in Fig. 4, the average true positive rate improvement over AdaBoost is 9.75% for BC-CNN-PSO-AdaBoost, compared with 6.73% for BC-ANN-AdaBoost, 3.54% for BC-ACO-AdaBoost, and 3.21% for BC-CNN-ANN-ACO-AdaBoost. Figure 5 shows that BC-CNN-ACO-AdaBoost has a higher average true negative rate than AdaBoost, Pd-PCA-AdaBoost, BC-CNN-AdaBoost, BC-ANN-AdaBoost, and BC-CNN-ANN-ACO-AdaBoost. Table 2 presents the performance results of the classification models.


Table 2 Classification results

Classification                F-measure   Sensitivity   Specificity   Accuracy
BC-AdaBoost                   82.8        75.9          84.5          70.02
BC-CNN-AdaBoost               76.2        77.9          89.7          76.06
BC-ANN-AdaBoost               90.77       84.1          91.28         86.03
BC-ACO-AdaBoost               83.15       76.03         84.27         88.07
BC-CNN-ANN-AdaBoost           85.4        78.4          86.5          90.2
BC-CNN-ANN-ACO-AdaBoost       90.05       80.56         88.7          97.02

Fig. 6 Bone cancer detection results

Figure 6 shows the bone cancer detection performance of AdaBoost compared with BC-ANN-AdaBoost, BC-CNN-AdaBoost, BC-ANN-ACO-AdaBoost, and BC-CNN-ANN-ACO-AdaBoost.

5 Conclusion

In this paper, an edge computational deep learning approach is proposed to detect tumors in bone MRI images. Bone cancer is difficult to detect and must be diagnosed at an early stage because of its devastating effects. It has been found through a review of the relevant literature that there are several methods used to


detect bone cancer, each with its own set of advantages and disadvantages. In the proposed procedure, features are initially extracted by preprocessing, edge detection, morphological operation, segmentation, and testing, before being used to train and evaluate the neural network. CNN with AdaBoost curvelet transform classifier methods are used to detect bone malignancy. Finally, the procedure produces the desired result for the system. The experimental analysis shows that the accuracy of 97.02% is achieved corresponding to the ground truth values. In future, other optimization techniques such as elephant herding optimization, shuffle frog leaping optimization, and butterfly optimization are to be implemented for improving the performance.

References

1. Binhssan A (June 2015) Enchondroma tumor detection. Int J Adv Res Comput Commun Eng 4(6)
2. Verma A, Khanna G (2016) A survey on digital image processing techniques for tumor detection. Indian J Sci Technol 9(14)
3. Chaudhary A, Singh SS (2012) Lung cancer detection on CT images by using image processing. Int Conf Comput Sci
4. Patil BG (2014) Cancer cells detection using digital image processing methods. Int J Latest Trends Eng Technol 3(4)
5. Nithila EE, Kumar SS (2016) Automatic detection of solitary pulmonary nodules using swarm intelligence optimized neural networks on CT images. Eng Sci Technol Int J
6. Taher F, Werghi N (2012) Lung cancer detection by using artificial neural network and fuzzy clustering methods. Am J Biomed Eng 2(3):136–142
7. Milligan GW, Soon SC, Sokol LM. The effect of cluster size, dimensionality, and the number of clusters on recovery of true cluster structure. IEEE Trans Pattern Anal Mach Intell 5(1)
8. Deen KJ, Ganesan R (2014) An automated lung cancer detection from CT images based on using artificial neural network and fuzzy clustering methods. Int J Appl Eng Res 9(22):17327–17343
9. Reddy KK, Anisha PR, Raju GVS (2015) A novel approach for detecting the tumor size and bone cancer stage using region growing algorithm. Int Conf Comput Intell Commun Netw
10. Mistry KD, Talati BJ (2016) An approach to detect bone tumor using comparative analysis of segmentation technique. Int J Innovative Res Comput Commun Eng 4(5)
11. Kumaresan T, Saravanakumar S, Balamurugan R (2017) Visual and textual features based email spam classification using S-Cuckoo search and hybrid kernel support vector machine. Cluster Comput. Springer. https://doi.org/10.1007/s10586-017-1615-8
12. Avula M, Lakkakula NP, Raja MP (2014) Bone cancer detection from MRI scan imagery using mean pixel intensity. In: Asia modeling symposium
13. Miah MBA, Yousuf MA (2015) Detection of lung cancer from CT image using image processing and neural network. In: International conference on electrical engineering and information and communication technology
14. Megala G, Prabu S, Liyanapathirana BC (2021) Detecting DDoS attack: a machine-learning-based approach. In: Applications of artificial intelligence for smart technology. IGI Global, pp 55–66
15. Al-Tarawneh MS (2012) Lung cancer detection using image processing techniques. Leonardo Electron J Practices Technol 20:147–158
16. Hadavi N, Nordin MJ, Shojaeipour A (2014) Lung cancer diagnosis using CT-scan images based on cellular learning automata. IEEE


17. Saravanakumar S, Thangaraj P (2018) Hybrid optimized PSO using greedy search for identifying Alzheimer from MRI images. Int J Pure Appl Math 119(15):2391–2403
18. Saravanakumar S, Thangaraj P (2018) A voxel based morphometry approach for identifying Alzheimer from MRI images. Cluster Comput J Netw, Softw Tools Appl 9(3):142–156. Springer
19. Sharma A, Yadav DP, Garg H, Kumar M, Sharma B, Koundal D (2021) Bone cancer detection using feature extraction based machine learning model. Comput Math Methods Med
20. Sinthia P, Sujatha K (July 2016) A novel approach to detect the bone cancer using K-means algorithm and edge detection method. ARPN J Eng Appl Sci 11(13)

The Third Eye: An AI Mobile Assistant for Visually Impaired People R. S. Hariharan, H. Abdul Gaffar, and K. Manikandan

Abstract Today, many hardware products and mobile devices are developed to provide useful features, but their effectiveness and practicability do not fully help visually impaired people. They are dependent on people around them for their daily needs. If no one is available next to them, they become helpless to do many simple tasks independently. Thus, this paper is intended for visually impaired people, who can become more self-reliant with the help of the Third Eye android mobile application. They will be able to identify objects around them and find the object they need with the YOLO object detection algorithm. Learn whatever they wish to be read with the optical character recognition algorithm and operate an android application. They use mobile internal features on their own using text-to-speech converter and voice listener functionalities. This android mobile application eliminates their need for a person around them and makes their life smart and simple. It is developed using multiple domain ideas and efficient algorithms that are well designed and implemented to be an ideal product for visually impaired people. This great development of technology helps them overcome the challenges they face in their daily life. Keywords YOLO object detection · Optical character recognition · Image processing · User-friendly interface · User experience

R. S. Hariharan · H. Abdul Gaffar (B) · K. Manikandan
Vellore Institute of Technology, Vellore, India
e-mail: [email protected]
K. Manikandan
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_67

1 Introduction

Many ideas were proposed to solve the problems faced by visually impaired people in their daily life, and applications were developed in the market. But most of these applications do not turn out to be an ideal product for the blind, one that aligns with their practical usage in the long term. They aim to provide the best functionality to the


user with minimal importance given to the usability part of the application by them. The Third Eye application focuses on improving visually impaired people’s lives by assisting them in a user-friendly way with useful features. The first key feature is detecting objects using object recognition and informing the user through voice commands. The YOLO (you only look once) object detection algorithm is used to achieve the same. The second key feature uses the results of the first feature, gets input from the user, filters the data accordingly to see if any of the input matches the results, and informs the user. The third key feature of the application is reading text in images with optical character recognition (OCR) technology. Users can now choose this feature, show their device over written text, and tap on the screen. The application will start capturing video, process it into image frames, recognize text in the image, and return it as output to the system to read it out for the user. Other features of the application include sending SMS, emails, and text messages and making phone calls over voice commands. Email is sent from the account linked during the first installation of the application with the help of JavaMail API. SMS and text messages are sent by configuring the device. This device is unique from other existing devices by providing the features mentioned above in a more convenient and user-friendly manner, as discussed in the following sections. The user interface and user experience provided by this application are designed purposefully to serve them with ease of using this entire application on their own. The developed application successfully achieves its objectives by providing desired results presented in this paper. The outcome of this product would prove its effectiveness and practicality in providing various features in a single mobile application.

2 Literature Survey Understanding other methods and techniques taken to solve the identified problem, and studying the existing products in the market, will help develop a well-designed, optimized, and more effective product. The drawbacks of the existing models will help us identify mistakes we might make in our product design. Different approaches taken to solve the problem are studied and analyzed in this section, contributing to identifying the key requirements and helping in designing a system more efficiently than the existing ones. In [1], Redmon et al. presented a new approach for detecting objects called YOLO–you only look once. Instead of using classifiers, they used regression to identify spatially separated bounding boxes in an image and their associated class probabilities. The architecture of YOLO works extremely fast by processing 155 frames per second and is highly unlikely to predict false positives in the background. Also, the speed and accuracy of YOLO are better than other existing RCNN and DPM detection methods. The authors of [2], Jayawardhana et al., analyzed the novel text-to-speech algorithm based on an attention-based and fully convolutional deep voice mechanism. They proposed a methodology for achieving better accuracy and


correctness in speech synthesis using a text-to-speech synthesizer that does phoneticto-acoustic mapping with a neural network approach. Brunello et al. [3] proposed a preprocessing technique for improving the performance of tesseract optical character recognition on pictures with colorful backgrounds. In this technique, the images are clustered into two categories: images with text and images without text. Then, OCR is performed on images belonging to the former cluster category to obtain text content from the image using text segmentation. The proposed methodology improves the system’s performance by approximately 20%. Tepelea ¸ et al. [4] understood the potential of smartphones to change the lives of visually impaired people and built an android application to assist them daily. Their android application has MEMS sensors to gather external sensory information, a text-to-speech converter and an external Bluetooth Wi-Fi module to communicate with users, and a user-friendly interface to help them to use the application easily. Galvez et al. [5] used a convolutional neural network (CNN) algorithm to recognize objects around them. They analyzed a single-shot multi-box detector (SSD) and Faster-RCNN in terms of performance speed and accuracy. In this paper [6], Milios Awad et al. presented an android mobile application called “The Intelligent Eye” that assists visually impaired people. This single application provides various useful features like identifying objects, detecting color, detecting light, and recognizing banknotes, thus reducing the cost and complexity of the product. A blind assistive and tracking embedded system was developed by Tanveer et al. [7]. This system helps blind people physically navigate and tracks their location using GPS to help them if they are lost. They are provided with a spectacle that sends real-time feeds to an android application, detects obstacles in their path, and guides them according to the object’s position. Saeed et al. [8] developed an android application that allows visually impaired people to detect and locate objects by comparing captured object features and stored features of real objects. In a paper [9], Dahiya et al. proposed a robust, low-cost system that helps visually impaired people identify the symbols of public facilities like restrooms, pharmacies, etc., using the deep learning method. This method which is trained over 450 images uses faster region-convolutional neural networks with ResNet50 to identify the public amenities with high accuracy of 92.13%. Lin et al. [10] aimed to provide a safe, convenient system for visually impaired people to identify things in their indoor environment, like remote control using deep learning and a device camera. This system keeps its users informed about the location of all objects used to train the deep learning method through mobile applications. In a paper [11], Badave et al. developed an android application that identifies different objects around visually impaired users and navigates them to an object location by informing them about its distance and direction. It utilizes the device camera and audio to get inputs and informs the user about the findings. Vaithiyanathan et al. [12] presented a smart reader which captures images from the device camera, enhances them by removing noise and blur, analyzes them using optical character recognition developed by Google Cloud Vision, and outputs it to the visually impaired people through device audio and text-to-speech conversion. In [13], Jiang et al. 
discuss a system that automatically recognizes signs and text


using the optical character recognition algorithm and helps visually impaired people navigate. This system uses the smartphone’s text-to-speech conversion feature and computer vision to interact with its users. The authors of the paper [14], Maiti et al., designed and developed an intelligent electronic eye that captures data and helps visually impaired people to walk independently on roads. The user has to put on a helmet that collects visual data of the user’s environment using image and obstacle sensors and informs the user using the speaker invoice format. Yadav et al. [15] proposed a low-cost navigation stick made up of an ultrasonic sensor, wet floor sensor, camera, and DSP processor that can help the machine learning model recognize objects, detect obstacles, and navigate user away from them in a real-time environment. The above study on proposed ideas by different authors to solve the problem gives an insight into their strategies and techniques used in building a system to make the life of a visually impaired person better and easier. The section allowed us to understand multiple approaches, their results, drawbacks, and future scopes of their systems, helping us identify the missing gaps and key requirements in developing the best system for them.

3 Proposed System and Workflow

The application has been developed for visually impaired people so that they do not have to depend on anyone for some of their daily needs; it aims to simplify their lives. With it, they can identify objects around them, find the object they need, have whatever they wish read out to them, and operate a phone easily through voice assistance to make phone calls, send SMS, and check the battery percentage without human intervention. The YOLO object detection algorithm and the optical character recognition algorithm are used to implement these features. Today, blind people largely manage things in their own way and lead lives like everyone else; the Third Eye application focuses on solving certain key problems that they are not yet able to overcome. By providing a suitable user interface and user experience, we ensure that this application is practical for real-life daily usage with them as the primary users. This contribution makes a real difference in visually impaired people's lives by helping them become more self-reliant, and many organizations look forward to funding and bringing such products to the market.

The object detection feature of the application uses the YOLO object detection algorithm to identify objects through video capture and processing. The object finding feature has the same functionality, with the additional step of getting input from the user about the object they are searching for, checking whether it matches any of the identified objects, and informing the user accordingly.

An advanced form of image classification is object detection, which uses a neural network to identify objects in an image and form bounding boxes around them; it detects and localizes objects from a predefined set of classes. The YOLO object detection algorithm used in the app uses an end-to-end neural network to make accurate


Fig. 1 YOLO object detection framework

It divides the input image uniformly into an S × S grid of cells and detects and localizes objects in each cell. Each cell predicts bounding boxes, the object label, and the probability that an object is present. Because detection and recognition are handled jointly for all cells in a single forward pass, the computation is much lower than in two-stage detectors, but it produces multiple bounding boxes and predictions for the same object. The YOLO algorithm therefore suppresses boxes with low probability scores using non-maximal suppression: it keeps the bounding box with the highest score and discards boxes that have a large intersection over union with it, repeating the process until the final boxes are obtained. Figure 1 depicts the YOLO object detection framework, in which the input image passes through multiple CNN and max pooling layers to produce the final output. The application also allows visually impaired people to read any text by holding the phone above the written material and tapping the screen; it uses optical character recognition technology for this. Other features, such as emailing, making phone calls, and sending SMS, are enabled by configuring personal accounts and phone settings. Inputs required by the application are captured through the Google Voice listener, and all outputs and status messages are communicated through the device's speaker. Figure 2 shows the architecture of the Third Eye. Blind and visually impaired users interact with the Android application using the available text-to-speech and Google listener services. They can use the application simply by long pressing the mobile screen, tapping it, and following the instructions they hear. On long pressing any part of the screen, the application announces which feature is under the finger, and double tapping activates it. All required permissions and account authorizations are granted by the buyer only once, during installation.
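The non-maximal suppression step described above can be sketched in a few lines. The following is a generic greedy NMS in Python, included as an illustration of the idea rather than the exact routine inside the YOLO implementation used in the app:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much.

    boxes: (N, 4) array; scores: (N,) array. Returns indices of the kept boxes.
    """
    order = np.argsort(scores)[::-1]          # boxes sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps < iou_threshold]  # discard highly overlapping boxes
    return keep
```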

3.1 Individual Modular Design There are eight different modules in this android application which include Object Detection; Object Finding; Reading Text in the object; Sending Emails; Making Calls; Sending SMS; Knowing Battery percentage, and Knowing the Current Time. The app has two intentional activities in it. Intent 1 provides the first four listed features, and Intent 2 provides the last four listed features to the user.


Fig. 2 Architecture diagram of the Third Eye

Fig. 3 Modular design for object detection feature of the Third Eye

3.1.1

Feature 1—Detect Objects Around the User

Figure 3 shows the low-level design of feature 1. Once the user selects this feature, he is asked to turn his camera to landscape mode and turn around. Live video is captured and sent as an image frame to the backend. The YOLO object detection algorithm identifies the objects in each frame and dictates them to the user using the text-to-speech converter.
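As a rough sketch of this capture, detect, and speak loop, the following uses desktop Python stand-ins (OpenCV for frames and an offline text-to-speech engine); `detect_objects` is a hypothetical wrapper around the trained YOLO model, since the Android implementation is not reproduced here:

```python
import cv2
import pyttsx3

def detect_objects(frame):
    """Hypothetical hook: plug in the trained YOLO model here.

    Expected to return a list of (label, confidence) pairs for the frame.
    """
    return []

def announce_objects():
    engine = pyttsx3.init()           # offline text-to-speech engine
    camera = cv2.VideoCapture(0)      # device camera
    try:
        while True:
            ok, frame = camera.read()
            if not ok:
                break
            labels = {label for label, conf in detect_objects(frame) if conf > 0.5}
            if labels:
                engine.say("I can see " + ", ".join(sorted(labels)))
                engine.runAndWait()
    finally:
        camera.release()
```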

3.1.2

Feature 2—Find the Object that the User Needs

Figure 4 depicts the low-level design of feature 2. Once the user selects this feature, the application gets the object’s name as input from them, and they are asked to turn their camera to landscape mode and turn around. Live video is captured and sent as an image frame to the backend, where the YOLO object detection algorithm identifies the objects in each frame. Once any object identified matches the user’s input, the application intimates it to the user using the text-to-speech converter.


Fig. 4 Modular design for object finding feature of the Third Eye

Fig. 5 Modular design for reading text in object feature of the Third Eye

3.1.3

Feature 3—Read Text in an Object

Figure 5 shows the low-level design of feature 3. When the user selects this feature, the Android application configures the device camera and starts video capture. Each video frame is processed using OCR, and the recognized text is shown with a bounding box on the screen. When the user taps a bounding box, the application reads the corresponding text aloud.
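A desktop analogue of this reading feature might look as follows; pytesseract is used here purely as an open-source stand-in for the OCR service used in the app:

```python
import cv2
import pytesseract
import pyttsx3

def read_text_aloud(image_path):
    """Recognize text in an image and speak it (desktop analogue of Feature 3)."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # simple pre-processing step
    text = pytesseract.image_to_string(gray).strip()
    if text:
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()
    return text
```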

3.1.4

Feature 4—Send Email

Figure 6 shows the low-level design of feature 4. When the user selects this feature, the application inputs the recipient’s email address, subject, and message using a Google Text listener and sends mail using JavaMail API. Post this process, the application updates the user about the success status of the same.

3.1.5

Feature 5—Make Phone Calls

Figure 7 shows the low-level design of feature 5. Once the user selects this feature, the android application configures the device call system and gets the phone number


Fig. 6 Modular design for email sending feature of the Third Eye

Fig. 7 Modular design for making call feature of the Third Eye

from the user using a Google Text listener. Then, it automatically dials and establishes the call for the user.

3.1.6

Feature 6—Send SMS

Figure 8 shows the low-level design of feature 6. This works similarly to feature 5. Once the user selects this feature, the android application configures the device messaging system and gets the number from the user using a Google Text listener. Then, it uses a Java SMS manager and sends SMS automatically.

Fig. 8 Modular design for sending SMS feature of the Third Eye


Fig. 9 Modular design for knowing battery percentage feature of the Third Eye

3.1.7

Feature 7—Know Battery Percentage

Figure 9 shows the low-level design of feature 7. When the user selects this feature, the Android application reads the phone settings and fetches the battery percentage. It then uses simple conditional checks to announce the battery percentage along with an alert such as "device charging required", "device fully charged", or "device battery stable".

3.1.8

Feature 8—Know the Current Time

Figure 10 shows the low-level design of feature 8. When the user selects this feature, the android application configures the phone settings, fetches information about the current time, and intimates it to the user.

Fig. 10 Modular design for knowing the current time features of the Third Eye


4 Result and Discussion The Third Eye application has a well-designed user interface intended to give visually impaired people the best possible user experience. Figure 11 shows the user interface with large buttons: pressing a button announces which feature is selected, and double tapping it starts that feature. Throughout the app, spoken instructions inform the user about any change of intent in the application. The Third Eye interacts with the user by taking requirements as input through the Google Voice listener, processing them, and providing output through the text-to-speech converter. The images in this section show the results obtained when implementing the features of the Android application. In Fig. 12, the application detects a TV monitor with a confidence level of 89%, bounded by a box with the object name displayed. The application also detected a laptop with a confidence level of 84%, again bounded by a labelled box; this was captured while running feature 2 to find an object around the user. The object name, laptop, is taken as input from the user, and once the object is detected, the app repeatedly announces that the laptop has been found for as long as it remains in the frame. The text in a book is recognized using OCR and bounded by boxes containing the recognized text; by tapping these boxes, the user can hear the text read aloud. Object detection algorithms such as Fast R-CNN and Faster R-CNN break object detection into two stages: they first propose possible object regions in an image and then classify them. Although these algorithms achieve a high mean average precision (mAP), their detection speed is low because of multiple passes over the same image, so they cannot be used for real-time detection. YOLO, by contrast, runs at speeds as high as 45 frames per second. Unlike two-stage algorithms, YOLO makes its predictions with good accuracy and better bounding-box overlap using a single fully connected layer and a single pass.

Fig. 11 User interface of the Third Eye application


Fig. 12 Detection of a TV monitor and a laptop and reading a book by the Third Eye application

5 Conclusion The objective of this paper, to develop an AI Android application for blind and visually impaired people, has been achieved. Emphasis is placed on the user experience so that blind individuals can use the application on their own. Through this application, blind people can become more self-reliant and overcome their daily challenges.

References 1. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, Las Vegas, NV, USA, pp 779–788 2. Jayawardhana P, Aponso A, Krishnarajah N, Rathnayake A (2019) An intelligent approach of text-to-speech synthesizers for English and Sinhala languages. In: 2019 IEEE 2nd international conference on information and computer technologies (ICICT). IEEE, Kahului, HI, USA, pp 229–234 3. Brisinello M, Grbi´c R, Stefanoviˇc D, Peˇckai-Kovaˇc R (2018) Optical character recognition on images with colorful background. In: 2018 IEEE 8th international conference on consumer electronics-berlin (ICCE-Berlin). IEEE, Berlin, Germany, pp 1–6 4. Tepelea ¸ L, Gavrilu¸t I, Gacsadi A (2017) Smartphone application to assist visually impaired people. In: 2017 14th international conference on engineering of modern electric systems (EMES). IEEE, Oradea, Romania, pp 228–231 5. Galvez RL, Bandala AA, Dadios EP, Vicerra RRP, Maningo JMZ (2018) Object detection using convolutional neural networks. In: TENCON 2018–2018 IEEE region 10 conference. IEEE, Jeju, Korea (South), pp 2023–2027 6. Awad M, El Haddad J, Khneisser E, Mahmoud T, Yaacoub E, Malli M (2018) Intelligent eye: a mobile application for assisting blind people. In: 2018 IEEE middle east and North Africa communications conference (MENACOMM). IEEE, Jounieh, Lebanon, pp 1–6 7. Tanveer MSR, Hashem MMA, Hossain MK (2015) Android as-sistant EyeMate for blind and blind tracker. In: 2015 18th international conference on computer and information technology (ICCIT). IEEE, Dhaka, Bangladesh, pp 266–271 8. Saeed NN, Salem MAM, Khamis A (2013) Android-based object recognition for the visually impaired. In: 2013 IEEE 20th international conference on electronics, circuits, and systems (ICECS). IEEE, Abu Dhabi, United Arab Emirates, pp 645–648


9. Dahiya D, Gupta H, Dutta MK (2020) A deep learning based real time assistive framework for visually impaired. In: 2020 international conference on contemporary computing and applications (IC3A). IEEE, Lucknow, India, pp 106–109 10. Lin W-J, Su M-C, Cheng W-Y, Cheng W-Y (2019) An assist system for visually impaired at indoor residential environment using Faster-RCNN. In: 2019 8th international congress on advanced applied informatics (IIAI-AAI). IEEE, Toyama, Japan, pp 1071–1072 11. Badave A, Jagtap R, Kaovasia R, Rahatwad S, Kulkarni S (2020) Android based object detection system for visually impaired. In: 2020 international conference on industry 4.0 technology (I4Tech). IEEE, Pune, India, pp 34–38 12. Vaithiyanathan D, Muniraj M (2019) Cloud based text extraction using Google cloud vison for visually impaired applications. In: 2019 11th international conference on advanced computing (ICoAC). IEEE, Chennai, India, pp 90–96 13. Jiang H, Gonnot T, Yi W-J, Saniie J (2017) Computer vision and text recognition for assisting visually impaired people using android smartphone. In: 2017 IEEE international conference on electro information technology (EIT). IEEE, Lincoln, NE, USA, pp 350–353 14. Maiti M, Mallick P, Bagchi M, Nayek A, Rana TK, Pramanik S (2017) Intelligent electronic eye for visually impaired people. In: 2017 8th annual industrial automation and electromechanical engineering conference (IEMECON). IEEE, Bangkok, Thailand, pp 39–42 15. Yadav S, Joshi RC, Dutta MK, Kiac M, Sikora P (2020) Fusion of object recognition and obstacle detection approach for assisting visually challenged person. In: 2020 43rd International conference on telecommunications and signal processing (TSP). IEEE, Milan, Italy, pp 537–540

Machine Learning Model for Brain Stock Prediction S. Amutha, S. Joyal Isac, K. Niha, and M. K. Dharani

Abstract The goal of this work is to create a prediction model for brain stroke. People who live at higher altitudes have a lower risk of dying from stroke; the effect is found between 2000 and 3500 m. A stroke can happen when the blood supply to part of the brain is disrupted and brain cells begin to die. By comparison, a heart attack occurs when the blood flow to even a single part of the heart is blocked and the heart muscle dies from lack of oxygenated blood. A stroke happens when a blood vessel in the brain is damaged. Machine learning is being applied in healthcare to predict diseases early, and data is its main requirement. A dataset is used to build a machine learning model whose aim is to predict the chance of stroke using machine learning techniques. Four different algorithms are compared to find the best accuracy, and the aim is to create an application that is easy to use and navigate. Keywords Brain stroke · Decision tree · Logistic regression · Naive Bayes · Prediction model · Random forest

S. Amutha (B) · K. Niha (B) School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India e-mail: [email protected] K. Niha e-mail: [email protected] S. Joyal Isac Saveetha Engineering College, Chennai, India e-mail: [email protected] M. K. Dharani Kongu Engineering College, Erode, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_68


1 Introduction When the flow of blood to any part of the brain is diminished, a stroke occurs and the brain tissue is deprived of oxygen and nutrients; brain cells begin to degenerate within minutes. Similarly, heart muscle starts to deteriorate when it is not supplied with oxygen. A stroke is an attack on the brain that stops the crucial supply of blood and oxygen: a blood vessel that supplies the brain becomes blocked or bursts. Machine learning is now used in healthcare wherever there is a possibility of early disease prediction [1], and data is its primary requirement. A historical dataset is gathered, and a machine learning model is created from it. Necessary pre-processing steps, including univariate and bivariate analysis, are applied. After the data is visualized to help with feature interpretation, a classification model is developed using machine learning, and algorithms are compared on performance criteria such as accuracy, F1-score, and recall. Stroke is a significant health burden for both individuals and national healthcare systems and is the second highest cause of death in the world. Figure 1 illustrates some of the potentially modifiable risk factors [2] for stroke, such as hypertension, heart illness, diabetes, dysregulated glucose metabolism, atrial fibrillation, and lifestyle choices. The objective of the proposed work is therefore to predict strokes from potentially modifiable risk variables by applying machine learning methods to large existing datasets. We then plan to create an application that gives each user a customized warning based on their level of stroke risk, together with a message about lifestyle changes that could reduce that risk. There has been remarkable progress in lowering death rates throughout the world. This study uses a prediction model to estimate the risk of stroke [3] in older adults based on the studied risk factors, and the current work can be extended in the future to report a stroke risk percentage.

2 Related Work The goal of this section is to bring the current literature on the topic to the reader's notice and to form a base for future research in the area through a concise summary of the essential sources. It consists of a series of designs [4] and a combined synthesis, as shown in Fig. 2. A summary is a recap of the essential information, whereas a synthesis reorganizes that information: it gives a new perspective on old material [5], combines new interpretations with previous ones, or depicts the intellectual progression of the field. However, because they are unable to model the complexity of the feature representations [6] of medical problem domains, standard predictive models and techniques are still insufficient for capturing the underlying


Fig. 1 Stages of brain stroke

Fig. 2 Stroke functional connectivity


information. One study presents [7] deep learning based prediction methods for stroke using heart disease datasets. The symptoms of atrial fibrillation in heart patients are a significant risk factor for stroke and share characteristics that can be used to predict it; the findings of that study are more precise than existing medical grading systems for alerting cardiac patients at risk of stroke. Although a stroke can be treated, it often results in impairment [8]. Magnetic resonance imaging text records have been used to forecast stroke, with performance measured at the document level and sentence level to estimate the degree of disability caused by a stroke. Stroke is a significant contributor to impairment in the elderly as well as in adults, with a variety of economic and social repercussions, and it can be fatal if not treated. Patients with stroke are most often found to have abnormal biosignals, and one study introduced a stroke prediction system that uses machine learning on real-time biosignals [9]. The purpose of that study was to analyze the data with machine learning; specifically designed inclusion criteria were used for data cleaning, and two datasets were created, each consisting of 575 facial motor nerve conduction study reports. The datasets were evaluated with four machine learning algorithms: logistic regression [10], support vector machine [11], linear regression [12], and reinforcement learning [13]. The algorithms were compared on accuracy and recall, and the reinforcement learning method performed better on both the congenital heart disease (CHD) [14] and the International Stroke Trial (IST) [15] datasets. Additionally, each algorithm's performance with and without deviation standardization was compared, and the findings show that deviation standardization improves accuracy. Three classification algorithms were used in that experiment, logistic regression, support vector machines (SVM), and reinforcement learning, and reinforcement learning was inferred to be the best algorithm for the diagnosis. It is also worth noting that feature importance ranking has clinical potential for diagnosis and diagnostic evaluation. Stroke is one of the most common diseases, and large, complex datasets are collected and analyzed to understand it [5]. Analyzing this data is a major challenge, so effective techniques for handling huge datasets are needed. Researchers study how different risk variables affect the onset of stroke, and the chance of having a stroke is then predicted from the analyzed data using machine learning methods such as neural networks, decision trees, and linear regression. A new hybrid framework [16] has been proposed to predict stroke disease using two primary processes, clustering and classification. Recent studies that proposed stroke prediction frameworks using data mining methodologies were evaluated; the dataset is subjected to enhanced hierarchical clustering, after which several classifiers are assessed and contrasted. The algorithms employed are XGBoost, random forest, support vector machine, and logistic regression, and all of them produce good results in terms of accuracy and AUC. The random forest classifier achieved the best result (97%).


Machine learning has long been hailed as the next big thing, changing research, as well as influencing sectors like finance and retail. Naturally, the prospect of future healthcare improvement is of great interest [17]. However, there are many challenges involved in applying conventional machine learning techniques in this field, with interpretability being the most significant one. We concentrate on the medical issue of stroke, which is one of the most common but preventable disorders afflicting Americans today, making it a particularly appealing problem to tackle. The topic of forecasting stroke existence, location, acuity, and mortality risk for patient populations at two separate hospital systems is addressed in this thesis by the application of novel interpretable prediction approaches. Here, a tree-based, interpretable technique is about as effective—if not more— than black-box methods. We investigate new interpretable approaches created with medical applications [18] and their problems in mind using the clinical learnings from these investigations. We provide a unique regression approach to forecast results when the population consists of noticeably distinct subpopulations, and we show that this gives equivalent performance with enhanced interpretability. Finally, we investigate fresh approaches to natural language processing for text-based machine learning. We suggest an alternative end-to-end methodology [19] that uses an interpretable linguistics-based technique to model words and goes from raw textual input to predictions. This work shows the potential of novel interpretable, frugal algorithms [20] in the field of stroke and other general healthcare issues.

3 Proposed Work After initial exploratory data analysis, datasets from various sources are combined to create a generalized dataset. Various machine learning algorithms are then applied to extract patterns and produce the outcome with the highest degree of accuracy. In this part of the work, data wrangling has already been completed: the data is loaded, checked for tidiness, and then trimmed and cleaned for analysis, and the cleaning decisions are documented precisely along with their justification.

3.1 Data Collection Two categories of dataset are used: a training set and a test set, in a 7:3 ratio. The data model developed using machine learning methods [21] is fitted on the training set, and predictions on the test set are used to assess its accuracy. The ML prediction model in Fig. 3 is effective for constructing


Fig. 3 Architecture of proposed model

a classification model for predicting stroke, because it offers superior results on classification problems. Data collection is essential for machine learning, and a large amount of raw and historical data is available [22]. The raw data must be pre-processed before use; pre-processing is required before the algorithm of the model in Fig. 3 can be applied. The model has been trained and tested, makes precise predictions with a small number of errors, and is tuned further to improve accuracy.
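A minimal sketch of the 7:3 split described above, using scikit-learn; the file name and the target column name are assumptions based on the Kaggle stroke dataset cited in the acknowledgements:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("stroke_data.csv")      # hypothetical file name
X = df.drop(columns=["stroke"])          # "stroke" assumed to be the target column
y = df["stroke"]

# 7:3 ratio between training and test data, as described in the paper
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
```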

3.2 Data Pre-processing The error rate of the machine learning (ML) model is estimated using validation procedures and should be as close as possible to the true error rate on the dataset. Validation might not be required if the amount of data is large enough to represent the population; in real-world circumstances, however, we must work with samples that may not accurately represent the population of a given dataset [23]. Finding duplicate values, missing values, and information about the data types is therefore necessary. The validation set is the subset of data used to assess how well a model fits the training data while its hyperparameters are tuned. Because knowledge from the validation set is folded back into the model configuration, the evaluation becomes increasingly optimistic; nevertheless, a model is evaluated on the validation set frequently, and machine learning developers use this information to adjust the hyperparameters. Understanding the data and its characteristics during the data identification phase helps in choosing the method used to construct the model. Several distinct data cleaning tasks are carried out with Python's Pandas module, with a focus on missing values, possibly the biggest data cleaning task, and on the ability to clean data swiftly. Less effort should be spent cleaning data


Fig. 4 Data pre-processing

and more time should be spent investigating and modeling. Some missing values are merely careless errors, but sometimes there is a more significant cause, and from a statistical perspective it is critical to understand these different kinds of missing data. The kind of missing data affects how it is handled in terms of filling in the blanks, identifying missing values, basic imputation, and more thorough statistical methodology. Before writing any code, it is crucial to understand where the missing data is coming from. A few typical explanations for missing data are:
• The user forgot to fill in the field
• Data was lost while being transferred manually
• A program error occurred
• Users declined to enter information in a field because of how they felt the outcomes would be used or perceived, as illustrated in Fig. 4.
A step-by-step approach is followed for variable identification with univariate, bivariate, and multivariate analyses. First, the dataset is read and the libraries needed for access and functionality are loaded. Next, the dataset is analyzed for general characteristics and displayed as a data frame with its columns and a description of its structure. Once this is complete, the data types and dataset information are examined for duplication, missing values, unique values, and value counts. Finally, columns are renamed where necessary, the data frame is described, value types are highlighted, and any required extra columns are created.
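The duplicate and missing-value checks, and a simple imputation step, can be sketched with Pandas as follows; column names such as bmi and smoking_status are assumptions based on the Kaggle stroke dataset, not values taken from the paper:

```python
import pandas as pd

df = pd.read_csv("stroke_data.csv")      # hypothetical file name

# Inspect structure, duplicates, and missing values
print(df.info())
print("duplicates:", df.duplicated().sum())
print(df.isnull().sum())

# Basic imputation: numeric column filled with its median,
# categorical column filled with its mode
df["bmi"] = df["bmi"].fillna(df["bmi"].median())
df["smoking_status"] = df["smoking_status"].fillna(df["smoking_status"].mode()[0])
df = df.drop_duplicates()
```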

3.3 Formulating and Analyzing Data The related library packages are imported along with the dataset. Data types are used to identify variables, the shape of the data is examined, and duplicated and missing values are assessed. A validation dataset, as depicted in Fig. 5, is a sample of data withheld from model training and used to assess the model's skill during evaluation. To support the univariate, bivariate, and multivariate analyses, the data cleaning and preparation steps include renaming the dataset, deleting columns, and so on. Depending on the dataset, different procedures and methods will be used to clean


Fig. 5 Data validation

the data. The main objective of data cleaning is to identify and remove mistakes so as to maximize the value of the data for analytics and decision-making. Data visualization [24] is an important skill in machine learning and applied statistics. Statistics is largely concerned with numerical estimates and descriptions of data, and data visualization provides an essential set of tools for gaining a qualitative understanding. It is helpful for discovering trends, corrupted data, outliers, and much more while exploring and getting to know a dataset. With a little subject knowledge, data visualizations can also be used to express and demonstrate essential relationships to stakeholders in plots and charts that are far more intuitive than bare measures of importance. Data visualization and exploratory data analysis are significant topics in their own right and are worth studying further, since data can sometimes only be understood when it is presented visually. Both applied statistics and applied machine learning require the ability to quickly view samples of data and other types of information.
Fig. 6 Comparison of BMI and glucose level


Fig. 7 Summarized data

Visualizing data in Python also shows the various plot types that are available and how to use them for a better understanding of the data at hand. Figure 7 illustrates how box plots and histograms can be used to summarize data distributions. Pre-processing refers to the adjustments made to the data before it is fed to an algorithm; it is the method for transforming unclean data into a clean dataset. The data must be organized to get the best outcome from the machine learning method applied to the model. Some machine learning implementations have specific requirements, for example a random forest implementation that does not support null values, so null values in the original raw dataset must be handled before running it. The dataset must also be prepared so that many machine learning and deep learning algorithms can be run on it, as illustrated in Fig. 8.
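A small Matplotlib/Pandas sketch of the kind of histogram and box-plot summary shown in Figs. 6 and 7 (again with assumed column names from the Kaggle stroke dataset):

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("stroke_data.csv")      # hypothetical file name

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
# Histogram: overall shape of the glucose distribution
df["avg_glucose_level"].plot.hist(bins=30, ax=axes[0], title="Glucose level")
# Box plot: BMI spread and outliers per target class
df.boxplot(column="bmi", by="stroke", ax=axes[1])
plt.tight_layout()
plt.show()
```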

4 Result and Analysis It is crucial to compare the various machine learning algorithms in a consistent way, and with Python and scikit-learn it is possible to develop a test harness for this purpose. Each model will have different performance attributes. We can estimate each model's potential accuracy on unseen


Fig. 8 Work-type pie chart (use of machine learning and deep learning algorithms based on worktype attribute)

data by using resampling techniques such as cross-validation, and these estimates are used to select one or two of the best models from the group we have developed. Just as it is useful to visualize a new dataset from many angles, the choice of models follows the same logic: the estimated accuracy of the machine learning algorithms should be examined in various ways before selecting the one or two used for finalization. One way to do this is to use visualization techniques to display the average accuracy, variance, and other characteristics of the distribution of model accuracies; this is done in Python with scikit-learn in the following section. Making sure that each algorithm is evaluated uniformly on the same data is essential for a fair comparison of machine learning algorithms, and this is achieved by testing every algorithm on a uniform test harness. Four alternative algorithms are compared in the example below (see the sketch that follows):
• Logistic regression
• Random forest
• Naive Bayes
• Decision tree classification
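A minimal scikit-learn test harness for this comparison might look as follows; the ten folds and the specific hyperparameters are assumptions, since the paper does not state them:

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("stroke_data.csv")                 # hypothetical file name
y = df["stroke"]                                    # assumed target column
X = pd.get_dummies(df.drop(columns=["stroke"]))     # one-hot encode categoricals
X = X.fillna(X.median())                            # simple imputation for any gaps

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=42),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "NB": GaussianNB(),
}

# Identical folds for every algorithm so the comparison is fair
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv)
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```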


K-fold cross-validation [25] is used to test each algorithm, guaranteeing that the same splits of the training data are used and that every algorithm is assessed in precisely the same way. The scikit-learn library is installed before the comparison is run; from it we need the linear model module for logistic regression, the model selection module for K-fold cross-validation, the ensemble module for the random forest method, and the tree module for the decision tree classifier. Separating the training set and test set is also important for comparing the accuracy of the forecast outcomes. Logistic regression predicts a value from a linear equation over the independent predictors; the raw predicted value can range from negative infinity to positive infinity, and the output is then mapped to categorical classes. With its parameters tuned for the best accuracy, the logistic regression model predicts outcomes with a good degree of precision. Accuracy is the percentage of all correct predictions, in other words how frequently the model correctly identifies defaulters as well as non-defaulters, and it is the easiest performance metric to understand [26]; one might believe that the most accurate model is the best one. The outcomes are displayed in Fig. 9. The mathematical formulation is accuracy = (TP + TN)/(TP + TN + FP + FN). Precision [27] is the proportion of correctly predicted positive observations to all predicted positive observations, i.e., the percentage of positive predictions that come true; high precision corresponds to a low false positive rate. The mathematical formulation is precision = TP/(TP + FP); our precision was 0.788, which is quite good. The outcomes are displayed in Fig. 10.
Fig. 9 Levels of accuracy (x-axis represents the four algorithms (LR, DT, RF, and NB); y-axis represents the measure of accuracy)


Recall (sensitivity) is the proportion of actual positive observations that are correctly predicted, in other words the percentage of real defaulters that the model forecasts accurately [28]. The outcomes are displayed in Fig. 11. The mathematical formulation is recall = TP/(TP + FN). The F1-score [29] combines a classifier's precision and recall into a single metric by taking their harmonic mean; it is mainly used to compare two classifiers. Suppose classifier A has higher recall and classifier B has higher precision; their F1-scores can then be used to assess which one yields superior results.
Fig. 10 Levels of precision score (x-axis represents the four algorithms (LR, DT, RF, and NB); y-axis represents the measure of precision)

Fig. 11 Levels of recall score (x-axis represents the four algorithms (LR, DT, RF, and NB), y-axis represents the measure of recall)


Fig. 12 Levels of F1-score (x-axis represents the four algorithms (LR, DT, RF, and NB), y-axis represents the measure of F1-score)

Table 1 Performance analysis metrics for different algorithms

Algorithm                         Accuracy (%)   Recall (%)   F1 (%)   Precision (%)
Logistic regression (LR)          78             77.6         77.6     77.5
Decision tree classifier (DT)     66             77.5         77.6     77.5
Random forest classifier (RF)     73             73.5         72.7     72
Naive Bayes classification (NB)   80             83.7         80.4     77.4

The F1-score is a weighted average of precision and recall, so both false positives and false negatives are taken into account when calculating it. F1 is usually more informative than accuracy, especially with an uneven class distribution, although it is not as intuitive to interpret. When false positives and false negatives cost roughly the same, accuracy works best; when their costs differ substantially, both precision and recall need to be considered. The outcomes are displayed in Fig. 12. The performance of each algorithm has been analyzed in terms of accuracy, recall, precision, and F1 and is reported in Table 1.
General formula: F-measure = 2TP/(2TP + FP + FN)
F1-score formula: F1-score = 2 × (Recall × Precision)/(Recall + Precision)
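The metrics in Table 1 follow directly from the confusion-matrix counts; a short sketch, assuming binary labels with y_test the held-out labels and y_pred a fitted model's predictions:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

def report(y_test, y_pred):
    """Print the four metrics from the confusion-matrix counts and via scikit-learn."""
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    print("accuracy :", (tp + tn) / (tp + tn + fp + fn))   # (TP+TN)/(TP+TN+FP+FN)
    print("precision:", tp / (tp + fp))                    # TP/(TP+FP)
    print("recall   :", tp / (tp + fn))                    # TP/(TP+FN)
    print("F1       :", 2 * tp / (2 * tp + fp + fn))       # 2TP/(2TP+FP+FN)
    # the same values computed by scikit-learn
    print(accuracy_score(y_test, y_pred), precision_score(y_test, y_pred),
          recall_score(y_test, y_pred), f1_score(y_test, y_pred))
```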

5 Conclusion According to a recent study, residents of higher altitudes have a lower incidence of stroke and of death from stroke.


A stroke tends to occur when the flow of blood to a portion of the brain is cut off or diminished, depriving the brain tissue of oxygen and nutrients; brain cells begin to degenerate within minutes. A heart attack, by comparison, results when a blood clot blocks the flow of blood to part of the heart, and the heart muscle starts to deteriorate when it is not supplied with oxygen. During a stroke, a blood vessel that supplies the brain becomes blocked or bursts. In this work, the dataset was gathered and a machine learning model was created from it. The major goal of the study is to forecast the likelihood of stroke using machine learning techniques, comparing the accuracy of four distinct algorithms (decision tree, logistic regression, random forest, and Naive Bayes). The analytical process consisted of data preparation and processing, missing value analysis, exploratory analysis, and finally model construction and evaluation, and the highest accuracy score on the test set was identified. With regard to stroke prediction with an AI model, automating the prediction process in online or desktop applications could make such applications genuinely useful. Acknowledgements Funding agency: NA. Conflict of Interest: We declare no conflict of interest. Availability of Dataset: Kaggle dataset.

References 1. Klem GH, Lueders HO, Jasper HH, Elger C (1999) The ten-twenty electrode system of the international federation. Recommendations for the practice of clinical neurophysiology: guidelines of the IFCN, Elsevier 2. Sanei S, Chambers JA (2013) EEG signal processing. Wiley, New York 3. Chaovalitwongse WA, Prokopyev OA, Pardalos PM (2006) Electroencephalogram (EEG) time series classification: applications in epilepsy. Ann Ope Res 148(1):227–250 4. Moselhy HF (2011) Psychosocial and cultural aspects of epilepsy. In: Novel aspects on epilepsy. InTech 5. Logesparan L, Rodriguez-Villegas E, Casson AJ (2015) The impact of signal normalization on seizure detection using line length features. Med Biol Eng Comput 53(10):929–942 6. Amin HU, Malik AS, Ahmad RF, Badruddin N, Kamel N, Hussain M, Chooi W-T (2015) Feature extraction and classification for EEG signals using wavelet transform and machine learning techniques. Austr Phys Eng Sci Med 38(1):139–149 7. Esteller R, Echauz J, Tcheng T, Litt B, Pless B (2022) Line length: an efficient feature for seizure onset detection. In: Engineering in medicine and biology society, proceedings of the 23rd annual international conference of the IEEE, vol 2. IEEE, pp 1707–1710 8. Lojini LAJC, Rodriguez-Villegas E (2012) Optimal features for online seizure detection. Med Biol Eng Comput 50(7):659–669 9. Guerrero-Mosquera C, Trigueros AM, Franco JI, Navia-Vázquez Á (2010) New feature extraction approach for epileptic EEG signal detection using time-frequency distributions. Med Biol Eng Comput 48(4):321–330 10. Basha NK, Wahab AB (2020) Automatic absence seizure detection and early detection system using CRNN-SVM. Int J Reasoning-based Intell Syst 11(4):330–335


11. Esteller R, Echauz J, Tcheng T, Litt B, Pless B (2001) Line length: an efficient feature for seizure onset detection. In: Engineering in medicine and biology society, 2001. Proceedings of the 23rd annual international, conference of the IEEE, vol 2. IEEE, pp 1707–1710 12. Hussein R, Elgendi M, Wang ZJ, Ward RK (2018) Robust detection of epileptic seizures based on l1-penalized robust regression of EEG signals. Exp Syst Appl 104:153–167 13. Chen D, Wan S, Xiang J, Bao FS (2017) A high-performance seizure detection algorithm based on discrete wavelet transform (DWT) and EEG. PLoS ONE 12(3):0173138 14. Guo L, Rivero D, Dorado J, Rabunal JR, Pazos A (2010) Automatic epileptic seizure detection in EEGs based on line length feature and artificial neural networks. J Neurosci Methods 191(1):101–109 15. Koolen N, Jansen K, Vervisch J, Matic V, De Vos M, Naulaers G, Van Huffel S (2014) Line length as a robust method to detect high-activity events: automated burst detection in premature EEG recordings. Clin Neurophysiol 125(10):1985–1994 16. Shimizu M, Iiya M, Fujii H, Kimura S, Suzuki M, Nishizaki M (2019) Left ventricular endsystolic contractile entropy can predict cardiac prognosis in patients with complete left bundle branch block. J Nucl Cardiol 1–10 17. Quintero-Rincón A, D’Giano C, Batatia H (2019) Seizure onset detection in EEG signals based on entropy from generalized Gaussian pdf modeling and ensemble bagging classifier. Digital health approach for predictive, preventive, personalized and participatory medicine. Springer, New York, pp 1–10 18. Basha NK, Wahab AB. Single channel EEG signal for automatic detection of absence seizure using convolutional neural network. Recent Adv Comput Sci Commun (Formerly: Recent Patents Comput Sci) 14(6):1781–1787 19. Salem O, Naseem A, Mehaoua A (2014) Epileptic seizure detection from EEG signal using discrete wavelet transform and ant colony classifier. In: IEEE ICC, selected areas in communications symposium 20. Rashmi A, Vanjerkhede K, Bhyri C (March/April 2016) Study and analysis of EEG signal. Int J Res Eng Technol 1(3) 21. Nanthini BS, Santhi B (2021) Seizure detection using SVM classifier on EEG signal. J Appl Sci 14:16581661 22. Samiee K, Kovacs P, Gabbouj M (Feb 2015) Epileptic seizure classification of EEG time-series using rational discrete short-time Fourier transform. IEEE Trans Biomed Eng 62(2) 23. Birjandtalaba J, Pouyana MB, Cogana D, Nourania M, Harveyb J (2021) Automated seizure detection using limited-channel EEG and non-linear dimension reduction. Comput Biol Med, Elsevier 82:49–58 24. Patidar S, Panigrahi T (2017) Detection of epileptic seizure using Kraskov entropy applied on tunable-Q wavelet transform of EEG signals. Biomed Signal Process Control 34:74–80 25. Chena L-L, Zhanga J, Zoua J-Z, Zhaob C-J, Wangb G-S (2021) A framework on wavelet-based nonlinear features and extreme learning machine for epileptic seizure detection. Biomed Signal Process Control 10:1–10 26. Kevrica J, Subasib A (2017) Comparison of signal decomposition methods in classification of EEG signals for motor-imagery BCI system. Biomed Signal Process Control 31:398–406 27. Mahmud M, Kaiser MS, Hussain A (2020) Deep learning in mining biological data. arXiv: 2003.00108 28. Boonyakitanont P, Lek-uthai A, Chomtho K, Songsiri J (2020) A review of feature extraction and performance evaluation in epileptic seizure detection using EEG. Biomed Signal Process Control 57:101702 29. 
Siddiqui MK, Morales-Menendez R, Gupta PK, Iqbal HM, Hussain F, Khatoon K, Ahmad S (2020) Correlation between temperature and COVID-19 (suspected, confirmed and death) cases based on machine learning analysis. J Pure Appl Microbiol

The Prediction of Protein Structure Using Neural Network S. M. Shifana Rayesha, W. Aisha Banu, and Sharon Priya

Abstract Protein structure, together with the contact maps of its atoms, is predicted with a neural network model. The data samples are collected from the CASP13 database, and the alpha-fold samples are segregated separately. The orientation of each molecule and the contact maps of the inter-residue atoms present in the alpha or beta folds are then predicted with the neural network. The network is trained on the input dataset of alpha-fold protein structures, and the interconnecting residues together with the atomic orientation of the molecules are predicted. We validate the entire training and test set and obtain high accuracy on atomic residues with similar structures. The neural network yields a higher percentage of accuracy in predicting the atomic structure than the other model considered; compared with the previous experimental model, the accuracy of the neural network model is comparatively high. This deep learning approach incorporates knowledge about the alignment of amino acid residues and offers an efficient technique for prognosis on biological genomes. Keywords CASP13 · C=O · Helix · N–H group · Alpha fold · α-Helix · ψ, ϕ dihedral angles · Potts model · HHblits · Ramachandran plot · Activation · Batch normalization · CATH · Cα, Cβ atoms

1 Introduction In this paper, alpha-fold protein data from the 13th Critical Assessment of protein Structure Prediction (CASP13) dataset are used to predict protein structure. Protein structure is classified into primary, secondary, tertiary, and quaternary structure.
S. M. Shifana Rayesha (B) · W. Aisha Banu (B) · S. Priya B. S. Abdur Rahman Crescent Institute of Science and Technology, Vandalur, India e-mail: [email protected] W. Aisha Banu e-mail: [email protected] S. Priya e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Lecture Notes in Networks and Systems 662, https://doi.org/10.1007/978-981-99-1414-2_69


The secondary structure contains alpha-helical elements. In an α-helix, the polypeptide backbone coils into a right-handed helix in which the backbone N–H group of each amino acid forms a hydrogen bond with the backbone C=O group of the amino acid located four residues earlier in the sequence [1]. The helix is also called a 3.613-helix, reflecting the 3.6 residues per helical turn and the 13 atoms in the ring closed by each hydrogen bond. Each residue corresponds to a 100° turn of the helix and a rise of 1.5 Å along the helical axis, so the vertical distance between consecutive turns is 5.4 Å, the product of 3.6 and 1.5. The repeated i + 4 → i hydrogen bonding, where the N–H group of one residue bonds to the C=O group of the residue four positions earlier, is the basic feature of the α-helix [1]. Using these properties, we determine the orientation and the interconnecting residues to predict the positions of the atoms inside the α-helix. The orientation and molecular position of the atoms in the helix are described by the torsion angles ϕ and ψ together with the pitch and the hydrogen bonding, and several computational techniques are used to obtain the structure of the α-helix. The 310 helix, by comparison, has three residues per turn and ten atoms in its hydrogen-bonded ring, making it tighter than the 3.613 α-helix. The dihedral angles (ϕ, ψ) of the α-helix are approximately (−60°, −45°); the ϕ angle of one residue and the ψ angle of the next sum to roughly −105°, so in the Ramachandran plot the α-helical region lies along a line of slope −1 running from (−90°, −15°) to (−35°, −70°) [2]. The Critical Assessment of protein Structure Prediction (CASP) is the platform where experiments on protein structure prediction are conducted and where the research community models and predicts protein structures [3, 4]. Structures resolved by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy are deposited in the Protein Data Bank, and a newly resolved structure is compared with existing structures to identify sequence similarity [5]. Protein templates are found by sequence alignment and protein threading methods. The evaluation in CASP predicts the α-helix positions of the target protein structure, and after comparison the model calculates the GDT-TS (Global Distance Test-Total Score) to find the percentage of well-modeled residues [6]. The distance expectation and the precision forecast are used to characterize a potential in each framework. A generative neural network assembles the fragments and revises fragments of protein structures with a low GDT-TS cutoff, and the distance for an incorrect GDT-TS cutoff is directly optimized with a gradient descent approach. The aim of this work is to find the similarity with the template protein structure using GDT-net and a gradient descent approach to fine-tune the protein structure prediction [3, 4]. Existing systems are meant for protein structure prediction; we deploy this method to increase the accuracy of the predicted structures.


2 Experimental Background The entire protein is modelled as a 3D structure using a multiple sequence alignment and a pairwise representation. Geometric rotations and translations of the Cα–Cα geometry of the protein folds determine the position and orientation of the residues, and these positions and orientations of the molecules are translated into the ψ, ϕ dihedral angles of a Ramachandran plot [5]. To indicate a good correlation between predicted and actual inter-molecular distances, the correlation metric should be greater than 0.75 [7]; this correlation is computed between the predicted probability and the actual probability of the mismatched protein structures. Protein sequences with mismatches or a low cutoff score are obtained from the CASP databases. In general, the protein structure prediction is obtained from a deep learning model: the multiple sequence alignment input data is fed into a deep neural network architecture, and the deep layers predict the GDT-TS together with the positions of the atomic residues. Finally, the GDT-TS cutoff is used to compare the training set with the template protein structure for further refinement and to increase the accuracy of the prediction [8]. Thus, the position and orientation of every residue are sent into the training set of the neural network algorithm, and the network learns the internal structure and atomic residue positions of the particular protein.

3 Data Sets Evaluation PDB structures are evaluated using the trained neural network model. Using CATH sequences, nonredundant sequences and the similarity between sequences are identified. The 24,638 alpha-fold protein domains are split into a test set of 1051 proteins and a training set of 23,587 proteins. Three main steps are used to prepare the features for predicting the protein structures (a toy featurization sketch is given below):
i. The MSA sequence and 1D structural features are extracted with HHblits.
ii. The MSA features are further fitted to a Potts model in the form of 2D coevolutionary features [9].
iii. PSI-BLAST profiles are used as an additional potential for the distance prediction.
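The HHblits/Potts pipeline itself is not reproduced here, but the idea of turning an MSA into 1D (per-column frequency) and 2D (pairwise coupling) features can be illustrated with a toy sketch; the mutual-information coupling below is only a crude stand-in for the Potts-model couplings used in the paper:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"
AA_INDEX = {a: i for i, a in enumerate(AMINO_ACIDS)}

def msa_features(msa):
    """msa: list of equal-length aligned sequences over the 20 amino acids plus gap."""
    encoded = np.array([[AA_INDEX.get(a, len(AMINO_ACIDS) - 1) for a in seq] for seq in msa])
    n_seq, length = encoded.shape
    n_sym = len(AMINO_ACIDS)

    # 1D feature: per-column amino-acid frequency profile (length x 21)
    profile = np.zeros((length, n_sym))
    for j in range(length):
        profile[j] = np.bincount(encoded[:, j], minlength=n_sym) / n_seq

    # 2D feature: mutual information between column pairs
    mi = np.zeros((length, length))
    for i in range(length):
        for j in range(i + 1, length):
            joint = np.zeros((n_sym, n_sym))
            for s in range(n_seq):
                joint[encoded[s, i], encoded[s, j]] += 1
            joint /= n_seq
            nz = joint > 0
            expected = np.outer(profile[i], profile[j])
            mi[i, j] = mi[j, i] = np.sum(joint[nz] * np.log(joint[nz] / (expected[nz] + 1e-12)))
    return profile, mi

# Toy usage with a tiny made-up alignment
profile, couplings = msa_features(["MKT-LV", "MRTALV", "MKTSLV"])
```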

4 Architecture Schema for Protein Prediction Understanding how adding layers increases the complexity and expressiveness of a network is key to creating deeper networks; even more crucial is the ability to build networks in which adding layers makes the network strictly more expressive. If the newly added layer of a deep neural network can be trained to the identity function f(x) = x, the new model will be at least as effective as the previous model [10].


Fig. 1 ResNet50 architectural schema for the Potts model [10]

The idea behind the residual network (ResNet) is that every successive layer should be able to include the identity function as one of its members more readily. ResNet50 is applied here to the images of the 2D coevolution Potts model: the training images of the 2D coevolution features are taken from HHblits for the alpha fold, and the test set images are taken from the CASP13 database. Shortcut connections are used to identify the features present in the images [5]; the layers are stacked so that the output of a block is added to the input of the following block, and by skipping (bypassing) connections we avoid repeating the feature-learning steps (Fig. 1). In ResNet50 the shortcut skips three layers, and a 1 × 1 convolution layer is added in each block. The ResNet model is built with the following parameters. Layer 1 takes the input through an 11 × 11 kernel with stride 3, producing 55-wide feature maps, followed by max pooling with a 2 × 2 stride. This is followed by 1 × 1, 96 and 27 × 27, 96 layers repeated three times, giving nine layers; then kernels of 27 × 27, 256, 13 × 13, 256, and 13 × 13, 384, giving 12 layers; then a 13 × 13, 384 kernel and two more 13 × 13, 256 kernels repeated six times, giving 18 layers; and then 27 × 27, 96, 27 × 27, 256, and 1 × 1, 9216 layers repeated three times, giving nine layers. The network ends with a fully connected layer and a final softmax layer of 10 dense units (Fig. 2).
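A minimal Keras sketch of the identity and convolution block pattern summarized above; the filter counts and kernel sizes are illustrative, not the exact ones in the authors' network:

```python
from tensorflow.keras import layers

def identity_block(x, filters, kernel_size=3):
    """Residual identity block: two conv/BN/ReLU stages plus a skip connection."""
    shortcut = x
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])          # skip connection: add the block input
    return layers.Activation("relu")(y)

def conv_block(x, filters, kernel_size=3, strides=2):
    """Residual block whose shortcut uses a 1x1 convolution to match the new shape."""
    shortcut = layers.Conv2D(filters, 1, strides=strides)(x)
    shortcut = layers.BatchNormalization()(shortcut)
    y = layers.Conv2D(filters, kernel_size, strides=strides, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])
    return layers.Activation("relu")(y)
```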

5 Algorithm and Implementation The training set of N × N MSA-derived features is sent into the neural network, which predicts distances for the structures of all proteins in the set. The GDT-Net features contain the N × N distance matrix, the contact map, and the Cα and Cβ atoms of the structure, described by sine/torsion angles. Training begins with a deep 2D ResNet architecture that takes the N × N MSA features as input to predict the distances between the oriented residues of the protein structure.


Fig. 2 Convolution block and identity block from architectural diagram [10]

The 2D ResNet stack produces comparable distance predictions, while the resolution is progressively reduced by strided convolutions of kernel size 3 × 3 with 64 channels per block [8]. Pooling is applied to each block to convert the feature maps into residue-level vectors, and a softmax over a range of 100 bins is applied at the pooling layer. The GDT-net models are run on a GPU with batch sizes of 1 to 32 per structure, and the learning rate is decayed over 10 M steps.
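A rough Keras sketch of this kind of 2D residual distance head follows: the 3 × 3 kernels, 64 channels, and 100-bin softmax follow the text, while everything else (input featurization, number of blocks, optimizer) is an assumption:

```python
from tensorflow.keras import layers, models

def residual_block(x, filters=64):
    """3x3 residual block operating on the N x N pairwise feature map."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Activation("relu")(layers.Add()([shortcut, y]))

def distance_predictor(n_features, n_blocks=8, n_bins=100):
    """Maps an N x N x F pairwise feature map to per-pair distance-bin probabilities."""
    inputs = layers.Input(shape=(None, None, n_features))       # N x N pairwise features
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    for _ in range(n_blocks):
        x = residual_block(x)
    # 100-bin distance histogram for every residue pair
    outputs = layers.Conv2D(n_bins, 1, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = distance_predictor(n_features=21)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```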

6 Results Evaluation Results are evaluated for the training and test datasets. The training and test samples are the Potts-model representations of the protein structures used to predict the positions of the residues, and performance is calculated for the derived training and test sets. The accuracy obtained is 99.4% for the training set and 87.5% for the test set over 10 epochs. With these metrics and data, the model yields high efficiency in predicting the alpha-fold protein structure and the residues of the proteins. From the comparisons made, the gradient descent approach for the prediction of protein


structures gives the maximum likelihood and accuracy, as we expect accurate structures when the multiple sequence alignment is decoded into coevolution models. For the test and training sets of the coevolution models, the determination of the protein structure is constructed from the template-based models (Fig. 3). The ResNet model is further compared with the AlexNet model, which comprises five convolutional layers and three fully connected layers. The accuracy yielded by AlexNet is 0.8747, while ResNet reaches 0.9819 after 5 epochs (Fig. 4).
Fig. 3 Performance analysis for the parameters in the test and train set

Fig. 4 Model accuracy compared between ResNet and AlexNet


7 Conclusion and Future Work In this work, the protein structure of the alpha residues is determined for a sample dataset extracted from the Potts model in the CASP13 assessment. The deep learning components predict the protein structure, which is built from the template structure. The accuracy obtained for this model is 99.4% with a loss of 0.02%; since the loss is significantly low, the model is a good fit for prediction. The model is also compared with AlexNet, whose accuracy of 87.47% is lower than that of the ResNet model. This research determines the protein structure, and simultaneously the positions of the residues, more effectively than AlexNet. Generally, the positions of the residues are discovered from the Ramachandran plot. The proposed work compares the existing models, and the model to be tested is placed in the training set; the model then evaluates whether the resulting structure belongs to an existing structure. To fine-tune the residue positions in future work, the model will be compared against the Ramachandran plot to evaluate the result. This technique is useful for avoiding the heuristic approaches and rough assumptions made in protein structure prediction. The coevolution technique is the main weakness of this approach, as coevolution models may contain poor structures and uninformative data. In future work, to overcome this, the Potts model, multiple sequence alignment, and the Ramachandran plot will be combined and evaluated in order to improve the accuracy and correctness of the model.

References
1. Fasman GD (2012) Prediction of protein structure and the principles of protein conformation. Springer Science and Business Media
2. Carugo O, Djinović-Carugo K (2013) A proteomic Ramachandran plot (PRplot). Amino Acids 44:781–790
3. Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AW, Bridgland A, Penedones H (2019) Protein structure prediction using multiple deep neural networks in the 13th critical assessment of protein structure prediction (CASP13). Proteins: Struct, Funct, Bioinf 87:1141–1148
4. Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AW, Bridgland A, Penedones H (2020) Improved protein structure prediction using potentials from deep learning. Nature 577:706–710
5. Bouatta N, Sorger P, AlQuraishi M (2021) Protein structure prediction by AlphaFold2: are attention and symmetries all you need? Acta Crystallogr Sect D: Struct Biol 77:982–991
6. Anishchenko I, Baek M, Park H, Hiranuma N, Kim DE, Dauparas J, Mansoor S, Humphreys IR, Baker D (2021) Protein tertiary structure prediction and refinement using deep learning and Rosetta in CASP14. Proteins Struct, Funct, Bioinf 89:1722–1733
7. Torrisi M, Pollastri G, Le Q (2020) Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 18:1301–1310
8. AlGhamdi R, Aziz A, Alshehri M, Pardasani KR, Aziz T (2021) Deep learning model with ensemble techniques to compute the secondary structure of proteins. J Supercomput 77:5104–5119


9. Xu J, Mcpartlon M, Li J (2021) Improved protein structure prediction by deep learning irrespective of co-evolution information. Nat Mach Intell 3:601–609
10. Ji Q, Huang J, He W, Sun Y (2019) Optimized deep convolutional neural networks for identification of macular diseases from optical coherence tomography images. Algorithms 12:51
11. Fukuda H, Tomii K (2020) DeepECA: an end-to-end learning framework for protein contact prediction from a multiple sequence alignment. BMC Bioinform 21:1–15
12. Pakhrin SC, Shrestha B, Adhikari B, Kc DB (2021) Deep learning-based advances in protein structure prediction. Int J Mol Sci 22:5553
13. Wardah W, Khan MG, Sharma A, Rashid MA (2019) Protein secondary structure prediction using neural networks and deep learning: a review. Comput Biol Chem 81:1–8

System of Persons Identification Based on Human Characteristics A. Akhatov, I. Himmatov, Christo Ananth, and T. Ananth Kumar

Abstract Mathematical models have contributed significantly to the experimental analysis of behavioral biometrics, organizing it into a new scientific direction for identifying biometric behavior. Modern mathematical models can make accurate predictions on large datasets. If two models are not mathematically equivalent, their assumptions about the underlying behavioral and psychological processes diverge; consequently, competing models must identify and investigate the situations in which those assumptions matter. In studying transitions in biometric behavior, mathematical models have been developed and applied in real-world settings to predict and control the actions of individuals, and mathematical rigor can give coherence to phenomena that appear unrelated at first glance. For the creators of mathematical models of behavioral biometrics, however, the inter-relationships between the models' main building blocks are not yet established, despite psychologists' quantitative skills and familiarity with mathematical equations. It is very important to find these relationships, to select effective methods and mathematical models for identifying an individual, and to select the most efficient algorithm. Keywords Identification · Behavioral biometrics · Mathematical model · Keystroke dynamics · Gait analysis · Authentication · Access control · Recognition results · Gait · Detection and identification rate · Biometric system components

1 Introduction External influences can falsify the information properties of existing identification technologies. In such cases, an effective technology is to identify a person based on his or her behavior [1]. Behavioral biometrics analyzes human behavior patterns, unlike physical biometrics such as fingerprints and iris patterns. Behavioral biometrics


include signature analysis, cognitive biometrics, gait analysis, voice recognition, and mouse usage. Behavioral biometrics are used for secure authentication in financial institutions, businesses, factories, and schools [2]. Biometric observation and testing refers to any tool that can uniquely identify an individual by evaluating one or more distinguishing biological characteristics [3]. Unique identifiers include fingerprints, hand shape geometry, ear canal geometry, retina and iris lines, voices, DNA, and personal signatures [4].

2 Literature Survey The most traditional method of biometric identification is fingerprinting. Significant advancements have been made in biometric verification owing to the creation of computerized databases and the digitization of analog data [5]; these advancements have made identification possible in a matter of milliseconds. Because passwords are static while behavior is a biometric, security strategies must be reconsidered, since combining the two types of systems is feasible. Because nearly every authentication method can be technically or programmatically compromised, financial institutions should not rely on a single control to allow high-risk transactions; instead, they should use a multi-factor authentication scheme that combines multiple authentication methods to improve security without negatively impacting the user experience [6]. External variables constrain the capabilities of technology based on human behavior, and this raises the already high level of security even further. In modern behavioral or psychological biometric systems, authentication is taken one step further by requiring the user to prove their identity rather than merely present the correct fingerprint to log in. Although authentication is a clear goal of these technologies, the flexible part is equally important [7]: at a time when heavy or complex security measures alienate most users, flexible technologies work with users instead of disrupting their workflow. Whether it is a customer trying to access account information or an employee looking up customer records [8], different behind-the-scenes processes evaluate their authenticity and perform dynamic actions without asking them to take another step to prove their identity. Using methods such as device detection, geo-velocity, geolocation, or IP reputation, flexible authentication contextualizes these elements to identify the user accurately [9]. After all, hackers can steal login information, and a device that is already logged in can be used by someone else; but attackers typically cannot reproduce the way a legitimate user types an ID or moves the mouse. At the same time, behavioral biometrics work with features of the user so subtle that the human eye finds it difficult to observe them: the technology records these nuances and micro-behaviors and compares them with subsequent logins to verify identity.


3 Mathematical Models of Behavioral Identification 3.1 Psychological-Mathematical Model of Personality Recognition The work of researchers who have spent decades investigating fundamental behavioral processes has become increasingly dependent on mathematical models [10]. Mathematical modeling in behavioral science has been on the rise ever since the Society for Quantitative Analysis of Behavior (SQAB) was established, and more and more articles utilizing mathematical models have been published in the Journal of the Experimental Analysis of Behavior (JEAB) [11]. Figure 1 shows the frequency with which JEAB articles include an equation to describe the connection between a dependent and an independent variable. Just as Maxwell's equations unify several branches of physics, a suitable mathematical model in psychology can apply fundamental principles to account for distinct behaviors [12]. An abundance of contextual factors can profoundly impact any behavior displayed by a living organism; nonetheless, mathematical models can carry out a variety of tasks that are helpful when conducting experimental research on behavior [13]. Psychologists often resort to developing mathematical models to provide a more detailed

Fig. 1 Graph describing the relationship between a separate independent/dependent variable in the JEAB every 10 years


description of core behavioral processes than is possible with words alone. Predictions based on verbal theories are not always reliable, but when theories are stated mathematically, their consequences (and the differences between them) become apparent [14]. Critical tests can then be conducted, by comparing the quantitative predictions of two or more models, to determine which assumptions about the underlying behavioral processes hold and which do not. Finding the missing pieces of the puzzle and zeroing in on what really matters in influencing behavior is crucial for testing the quantitative predictions of mathematical models [15]. Both a neurologist trying to understand the biological basis of a behavioral phenomenon and a therapist trying to predict and control behavior in a clinical setting can draw on the work of those who have developed mathematical models of these processes. The reduction of reinforcer value with delay can be written as an exponential decay:

V = A e^(−KD),    (1)

where V is the reinforcer's value after a delay of D seconds, A is its value if delivered immediately, e is the base of the natural logarithm, and K is the parameter that controls the rate of decay; V decreases as the delay lengthens. Furthermore, a hyperbolic function, shown below, also accurately represents the delay gradient of reinforcement:

V = A/(1 + KD),    (2)

Each of these two equations describes a decay curve of similar shape, but they describe different underlying phenomena [16]. The results of an experiment with a single subject are shown in Fig. 2: the subject chose between a single presentation of a VT schedule and a delayed reinforcer, and the indifference points shown in the graph were obtained by varying the delay across tests. As the test delay lengthened, the value of the single reinforcer fell. Both equations account for about 99% of the variance in the data, with the best-fitting predictions obtained when their parameters are left free to vary. Under these conditions, a preference shift is not consistent with the exponential equation unless K is assumed to decrease with increasing reinforcer amount. Participants chose among several hypothetical monetary reinforcers that differed in both amount and delay; the smaller the assumed K, the larger the reinforcer amount. The hyperbolic equation, which predicts that a preference shift occurs regardless of the value of K, was therefore taken to bolster the case.
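To make the comparison between Eqs. (1) and (2) concrete, both functions can be fitted to indifference-point data with A and K as free parameters. The sketch below uses invented data points and ordinary least-squares curve fitting, purely as an illustration of the procedure described above:

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential(D, A, K):      # Eq. (1): V = A * exp(-K * D)
    return A * np.exp(-K * D)

def hyperbolic(D, A, K):       # Eq. (2): V = A / (1 + K * D)
    return A / (1.0 + K * D)

# Hypothetical indifference-point data (delay in s, subjective value); not the chapter's data.
delays = np.array([0, 2, 5, 10, 20, 40], dtype=float)
values = np.array([100, 78, 55, 38, 24, 14], dtype=float)

for name, f in [("exponential", exponential), ("hyperbolic", hyperbolic)]:
    (A_hat, K_hat), _ = curve_fit(f, delays, values, p0=(100.0, 0.1))
    pred = f(delays, A_hat, K_hat)
    r2 = 1 - np.sum((values - pred) ** 2) / np.sum((values - values.mean()) ** 2)
    print(f"{name}: A = {A_hat:.1f}, K = {K_hat:.3f}, variance accounted for = {r2:.3f}")
```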

3.2 Time-of-Flight Systems (ToFS) Model ToFS systems use camera-based signal modulation and phase-shift measurement to determine the distance between two points (Fig. 3). The region being examined is illuminated by near-infrared light (NIL) modulated by a sinusoidal signal in the megahertz range.


Fig. 2 Data from a single subject fitted to the exponential function of Eq. (1) and the hyperbolic function of Eq. (2); for both equations, K was treated as a free parameter

The detector is either a CMOS sensor or a charge-coupled device (CCD); both may be used. Parallel measurements at each pixel determine the phase shift, which is proportional to distance, so that the image can be displayed accurately. The optical input signal Si(t) = {si(t0), si(t1), …, si(tm) | i = 1, …, n} has dimension m + 1, the number of samples taken at each of the n pixel locations, from which a set of amplitude data A = {ai | i = 1, …, n} and a set of intensity (offset) data B = {bi | i = 1, …, n} are obtained. From the illuminated sinusoidal light, four samples si(τ0), si(τ1), si(τ2), and si(τ3) of each period are taken at phases of 0°, 90°, 180°,

Fig. 3 Time of flight operation process and structure


and 270°, where T = 1/fm. Letting φi be the phase shift of pixel i, ai its amplitude, and bi its intensity (background light), these quantities can be calculated as

φi = tan⁻¹[(si(τ0) − si(τ2)) / (si(τ1) − si(τ3))]    (3)

ai = √{[si(τ0) − si(τ2)]² + [si(τ1) − si(τ3)]²} / 2    (4)

bi = [si(τ0) + si(τ1) + si(τ2) + si(τ3)] / 4    (5)

The measurement of the distance between the object and the image array, D = {di | i = 1, …, n}, is defined as

di = (λm / 2) · (φi / 2π)    (6)
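As a numerical check of Eqs. (3)–(6), the sketch below simulates the four phase samples of one pixel and recovers its phase, amplitude, offset, and distance. The modulation frequency, offset, and amplitude values are arbitrary illustrative assumptions, not parameters taken from this chapter:

```python
import numpy as np

C = 3e8          # speed of light (m/s)
F_MOD = 20e6     # illustrative modulation frequency (20 MHz)
WAVELENGTH = C / F_MOD

def tof_pixel(s0, s1, s2, s3):
    """Recover phase, amplitude, offset and distance from the four samples of Eqs. (3)-(6)."""
    phase = np.arctan2(s0 - s2, s1 - s3)                       # Eq. (3); arctan2 resolves the quadrant
    amplitude = np.sqrt((s0 - s2) ** 2 + (s1 - s3) ** 2) / 2   # Eq. (4)
    offset = (s0 + s1 + s2 + s3) / 4                           # Eq. (5)
    distance = (WAVELENGTH / 2) * (phase / (2 * np.pi))        # Eq. (6)
    return phase, amplitude, offset, distance

# Simulate the four samples (0, 90, 180, 270 degrees) for a target 3 m away.
true_d = 3.0
true_phase = 4 * np.pi * F_MOD * true_d / C
k = np.arange(4)
samples = 50 + 20 * np.sin(true_phase + k * np.pi / 2)   # offset 50, amplitude 20 (arbitrary units)

phase, amp, off, dist = tof_pixel(*samples)
print(f"phase={phase:.3f} rad, amplitude={amp:.1f}, offset={off:.1f}, distance={dist:.2f} m")
```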

The ToF cameras are limited in range as a result of the periodicity of the signal modulation: distance is the only quantifiable attribute at this scale, and the camera's modulation frequency, which establishes the wavelength of the output signal, directly determines the unambiguous range. The camera calculates distances by comparing the phase of the reference signal with the received signal; the larger the difference, the greater the distance, and the value of φi is proportional to di. In order to detect human gait, this system is gradually replacing conventional time-of-flight methods by disambiguating gait characteristics from distinct joints and body parts. ToF cameras can also be used for precise dynamic trace pressure analysis. As with the models mentioned above, the thermographic infrared (TI) model can be used to identify poses. Surface-temperature-based visuals are possible because infrared heat intensity can be gauged precisely: human skin has an emissivity of about 0.98 ± 0.01 (and correspondingly high absorption) with a reflection of only about 0.02, largely regardless of pigmentation or permeability, so visual images can be constructed from surface temperature. Based on the probability of correct recognition, this method could identify human gait patterns with an accuracy of 78–91% (Fig. 4).

3.3 Mathematical Model of Predictions Unlike the hyperbolic and exponential decay equations, well-known mathematical models do not always produce incompatible predictions. Often, two or more separate mathematical models of large-scale experiments produce strikingly similar


Fig. 4 TI image-processing steps for distinguishing basic walking features

results, making the models appear roughly equivalent in their ability to predict accurately. In such cases, we need a way to decide among the models if they predict behavior equally well. Comparing three models of concurrent-chains performance at the same time illustrates such a situation. Our analysis assesses the precision of the hyperbolic value-added (HVA) model's predictions and contrasts it with Fantino's delay-reduction theory (DRT) and Grace's contextual choice model (CCM). Each model has the same number of adjustable parameters, which allows us to take into account factors such as the imprecision of responding and the fact that different subjects are more or less sensitive to different reinforcement schedules. To derive quantitative predictions from these three models, 92 data sets from 19 separate published concurrent-chains experiments were collected, and standard curve-fitting methods were used to obtain the predictions. We found that 90.8% of the variance in these data sets could be accounted for by the CCM, 89.6% by the HVA, and 83.0% by the DRT. When the models were given the same number of free parameters, the proportions of variance they accounted for were roughly the same, and the third model did not fall far behind. Minor differences in prediction accuracy are probably attributable to random fluctuations in the data or to subjective choices about how many free parameters to include in each model. All three models appear to have achieved a high level of success if the simultaneous prediction of performance across multiple schedules is of interest. The predictions of the three models are often quite similar, yet they are grounded in different assumptions about the underlying psychological mechanisms that give rise to the observed behavior. To see this, we first describe a standard concurrent-chains schedule and then give the equations for the three models. A concurrent-chains procedure is shown in Fig. 5: two schedules run concurrently in the background over time, representing the so-called initial links established at the start of the procedure, each of which ultimately leads to its own terminal link. The concurrent-chains procedure therefore alternates between a choice period (the initial links) and a consequence period (the terminal links, during which switching to the other alternative is not possible).


Fig. 5 Concurrent chains with two different FI schedules as terminal links and identical VI 60-s schedules as initial links

A subject’s response rate in the introductory references is the gold standard for determining preference. This is expected to be the case, so no shock there. In this case, the left button seems to have a greater effect on the hand’s moving geometry than the right. The left button communicates with a shorter FI table terminal link. However, it is well known that tables in primary links, as opposed to just terminal links, affect preferences. The response ratio will be sharper and the terminal-connection tables’ effects will be felt more strongly if the duration of two identical initial connection tables is reduced. The term “initial binding effect” is used to describe this phenomenon. If, for example, Few tables [14] were used instead of the original reference tables in Fig. 3, more answers would be given by pressing the left button. The most fundamental equations for CCM, DRT, and HVA are presented to explain how these models simultaneously forecast for chain tables. This suggests that the curve-matching empty parameters have been left out of the equations. The key differences between the models are more easily seen, and any unnecessary complexity is eliminated. Differentiating between descriptive and theoretical equations is helpful when thinking about mathematical behavior models. Descriptive equations are helpful in generalizing the mathematical relationship between an independent and a dependent variable. This does not theoretically justify using a specific equation in practice. Instead, the theoretical equation is based on the psychological processes that cause the behaviors under consideration. This is so because the equation’s form directly reflects the assumptions. Experiment participant selection psychological processes are the focus of several theoretical equations, including the CCM, the DRT, and the HVA. While some assumptions are common to all three models, others are model-specific or absent altogether. The CCM can be written as follows if no parameters are given:

B1/B2 = (ri1/ri2) · (rt1/rt2)^(Tt/Ti)    (7)

In this equation, ri1 and ri2 are the reinforcement rates at the two initial links of the chains schedule, rt1 and rt2 are the reinforcement rates at the two terminal links, and B1 and B2 are the response rates in the two initial links. With the CCM, the choice responses in concurrent-chains schedules depend on both the initial and the terminal links of the chain. The relationship between Tt and Ti is one characteristic that differentiates the CCM from other approaches: Ti denotes the time spent in the initial links and Tt the time spent in the terminal links. In the CCM, when the terminal links are long relative to the initial links, the terminal-link schedules have a greater effect on preference, because the Tt/Ti ratio acts as an exponent that amplifies the influence of the terminal-link reinforcement rates; conversely, when the initial links are long, the terminal links have less impact. When the initial-link schedules are reduced from VI 60 s to VI 30 s, Tt/Ti increases, and the preference predicted by Eq. (7) becomes even more extreme, as in the example cited previously. Herrnstein's law, which relates the rate of behavior to the rate of reinforcement, states that the relative rate of behavior is proportional to the relative rate of reinforcement; it was designed to account for choice on concurrent schedules with no terminal links. Grace suggested multiplying the various reinforcer dimensions (such as rate, delay, amount, and quality) to obtain an overall value when alternatives differ on two or more dimensions, and viewed the terminal-link schedules as conditioned reinforcers whose values (rt1 and rt2) depend on the reinforcement they lead to. Grace was also persuaded, by theoretical and empirical arguments, that the context in which a terminal-link value is presented (i.e., its duration relative to the initial links) influences how that value is manifested in behavior; the Tt/Ti exponent expresses the idea that the weight placed on the terminal-link reinforcement rates depends on the duration of the initial links. From these assumptions the CCM follows. If Tt equals zero, Eq. (7) reduces to the simple matching relation, indicating that the absence of terminal links satisfies the corresponding simple law. The Squires and Fantino version of the DRT [4] can be written in similar notation as:

B1/B2 = (R1/R2) · [(Ttotal − Tt1)/(Ttotal − Tt2)]    (8)

where R1 and R2 are the overall reinforcement rates for the two chains, calculated over the total time including both the initial and the terminal links. Ttotal is the average time that elapses between the start of the initial links and primary reinforcement. Tt1


and Tt2 represent the average durations of the two terminal links. In addition to the hypothesis that behavior is directed at the greatest reduction in delay to reinforcement, the DRT differs from the CCM in assuming that the overall reinforcement rates R1 and R2, rather than the initial-link rates, drive choice. In this approach, Eq. (8) implies that primary reinforcement is multiplied by the conditioned-reinforcement values of the terminal links to determine overall choice behavior. Specifically, the initial- and terminal-link effects of the CCM, the DRT, and the HVA are captured with fewer variables than in other accounts; these models reduce to the relevant fundamental law when there are no terminal links, and they reliably predict preference on other concurrent-chains arrangements. Data constrain mathematical models in this situation; this allows a new model to emerge while still accounting for well-established behavioral findings. However, as mentioned, the three models make different assumptions about choice behavior, with the length of the choice period particularly salient for the CCM: preference is less sensitive to changes in the terminal-link schedules when the choice period is long relative to the terminal links. Delay reduction is the most crucial element of the DRT, for which preference depends on how much the onset of a terminal link signals a reduction in the delay to reinforcement. For the HVA, the value of the conditioned reinforcer associated with each schedule is the most crucial element, and preference is set by the increase in value signaled by the onset of the terminal link. The mathematical structure of these models allows precise predictions to be rigorously tested and refuted, enabling empirical tests of various hypotheses about the determinants of preference, as the preceding analysis has shown. When mathematical principles are supported by empirical and computational evidence, they can be used to predict or regulate behavior in everyday contexts; Herrnstein's law, mentioned above, is an example of a mathematical principle that psychologists use widely in practice. Researchers in neurology and psychopharmacology can also use mathematical models of operant behavior: the hyperbolic decay equation for delayed reinforcers has been used to study brain damage and decision-making (Fig. 5). The hyperbolic decay model of Eq. (2) shows how the indifference functions of subjects with OFPC lesions differ from control subjects, as seen in the three panels of Fig. 6, with K treated as a discount-rate parameter. For OFPC-lesioned subjects, a rise in K indicates that the reinforcer's value decays faster with time; the left panel of Fig. 6 shows that as K rises, the indifference function flattens, with the y-intercept changing less than the slope. Secondly, OPFC lesions may alter sensitivity to amount, represented by the parameter A in Eq. (2). The centre panel of Fig. 6 shows the potential outcome if OPFC lesions raise sensitivity to differences in reinforcer amount: the indifference function should have steeper


Fig. 6 Hypothesized indifference functions showing three possible effects of damage to the orbital prefrontal cortex

slopes and more prominent y-intercepts. In the third scenario, lesions to the OPFC change the subject's sensitivity to both delay and amount. The predictions of the hyperbolic decay model for OPFC lesions that amplify both delay and amount sensitivity are shown in the right panel of Fig. 6: in this case the lesions raise the slope but not the y-intercept. In summary, the hyperbolic decay model predicts an increase in K in the left panel, amplified amount sensitivity in the central panel, and both effects combined in the right panel.
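Returning to Eqs. (7) and (8), the following sketch shows how the two models turn schedule values into a predicted choice ratio B1/B2. All numerical values below (terminal-link FI durations, times in the links, and overall reinforcement rates) are hypothetical illustrations, not data from the studies discussed here:

```python
def ccm_ratio(ri1, ri2, rt1, rt2, Ti, Tt):
    """Eq. (7): contextual choice model prediction for B1/B2 (no free parameters)."""
    return (ri1 / ri2) * (rt1 / rt2) ** (Tt / Ti)

def drt_ratio(R1, R2, Ttotal, Tt1, Tt2):
    """Eq. (8): Squires-Fantino delay-reduction prediction for B1/B2."""
    return (R1 / R2) * (Ttotal - Tt1) / (Ttotal - Tt2)

# Illustrative concurrent-chains values: equal VI 60-s initial links,
# FI 10-s vs FI 20-s terminal links (reinforcement rates in reinforcers per second).
ri1 = ri2 = 1 / 60          # rate of entry into each terminal link
rt1, rt2 = 1 / 10, 1 / 20   # terminal-link reinforcement rates
Ti, Tt = 30.0, 15.0         # assumed average times in initial and terminal links
print("CCM  B1/B2 =", ccm_ratio(ri1, ri2, rt1, rt2, Ti, Tt))

R1 = R2 = 1 / 45            # assumed overall reinforcement rates (equal here)
Ttotal, Tt1, Tt2 = 45.0, 10.0, 20.0
print("DRT  B1/B2 =", drt_ratio(R1, R2, Ttotal, Tt1, Tt2))
```

With these made-up values the two equations give nearly the same ratio (about 1.41 versus 1.40), which is exactly the kind of near-equivalence in predictions described above.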

3.4 Cyclic Modeling of Biometric System Machine Learning To use machine learning effectively, we have formed the structure of the algorithm shown in Fig. 7, which consists of a data set, an evaluation module, a feature-selection and learning algorithm, and a machine learning design cycle. The dataset in this block scheme is the input to machine learning; it may require pre-processing to address incompleteness or inconsistency, missing attributes or attribute values, inconsistent coding or naming rules, errors, noisy data, and duplicate cases, and it must match the data used to validate and analyze the processed inputs. By combining these steps during data pre-processing, the verification mode of the biometric system lowers both error rates; the goal is to reduce the biometric system's Equal Error Rate (EER), an indicator displayed on the DET curve. Developing an identification-based biometric system for walking raises further concerns. Such a system can be either closed or open: in the "closed set," the system knows all identifiable entities, whereas in the "open set," unknown entities may also be encountered. If a small group of


Fig. 7 Performance structure of machine learning

specific users is set up for identification, a closed-set approach simplifies the design of a walk-based biometric system (Fig. 7). The identification rate quantifies the percentage of correct identification matches and is a single practical indicator for assessing a design in a closed-set system, which makes evaluation much more efficient. Two criteria are most important for assessing open-set identification: the false acceptance rate, in which a previously unknown user is incorrectly identified or a known user is matched to the wrong identity, and the detection and identification rate, at which enrolled users are correctly detected and identified. Figure 9 shows an example of a receiver operating characteristic (ROC) curve for a system operating in open-set identification mode, which depicts the balance between these two rates. Every capture approach has pros and cons that influence how it can be combined into a unified security architecture. When subjects cannot cooperate, or when it is necessary to collect data remotely, video-based imaging of walking biometrics may be appropriate, although it may raise privacy concerns among the subjects; a sensor-based approach can address privacy concerns but requires extensive subject cooperation. The choice of how to capture a walk may also be limited by available resources. Nevertheless, where possible, the use of multiple forms of walking biometric imaging may be beneficial, as shown in previous studies.
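As a toy illustration of the closed-set identification rate discussed above, the sketch below enrolls a handful of synthetic users, each represented by a noisy gait-feature template, and reports the fraction of test walks attributed to the correct user. The feature dimension, noise level, and classifier choice are arbitrary assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_users, samples_per_user, n_features = 10, 40, 16      # synthetic setup, purely illustrative

# Each enrolled user gets a characteristic gait-feature template plus per-walk noise.
templates = rng.normal(size=(n_users, n_features))
X = np.repeat(templates, samples_per_user, axis=0)
X = X + 0.5 * rng.normal(size=(n_users * samples_per_user, n_features))
y = np.repeat(np.arange(n_users), samples_per_user)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
clf = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
identification_rate = (clf.predict(X_te) == y_te).mean()  # rank-1 closed-set identification rate
print(f"closed-set identification rate: {identification_rate:.2%}")
```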


Fig. 8 Example of a detection error tradeoff curve with an equal error rate of about 10%
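The equal error rate of roughly 10% illustrated in Fig. 8 is the operating point at which the false acceptance and false rejection rates coincide. A minimal sketch of estimating an EER from genuine and impostor score distributions (synthetic scores, not data from this chapter) is:

```python
import numpy as np

rng = np.random.default_rng(1)
genuine = rng.normal(0.7, 0.12, 2000)    # synthetic match scores for true users
impostor = rng.normal(0.4, 0.12, 2000)   # synthetic match scores for impostors

thresholds = np.linspace(0, 1, 1001)
far = np.array([(impostor >= t).mean() for t in thresholds])   # false acceptance rate
frr = np.array([(genuine < t).mean() for t in thresholds])     # false rejection rate

i = np.argmin(np.abs(far - frr))          # threshold where the two error rates cross
print(f"EER ~ {(far[i] + frr[i]) / 2:.2%} at threshold {thresholds[i]:.2f}")
```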

After a decision has been made on the mode of operation, the next consideration is the capture method to be built into a walking-based biometric system. Two methods are available for obtaining reliable measurements of walking biometrics: the first uses cameras to record a person's gait, while the second uses sensors attached to the subject being tracked. The figures illustrate the three most researched gait-capture techniques for biometric identification and verification; a single camera, sensors, or both can be used. Since the capture method affects the format of the generated data samples, and different machine learning algorithms suit different biometric system designs, we developed the algorithm scheme in Fig. 9.

4 Organizing the Structure of the System The recognition system in Fig. 10 provides the "intelligence" of a walking biometric system. This component interacts with users in two phases, registration and testing, and must produce results and decisions in real time at this stage; if real-time data analysis is not needed, more precise results can be obtained. Gait data streams are extensive and variable, making it difficult to identify potentially significant patterns visually. Walking-based research seeks to identify these patterns so that individuals can be distinguished. Numerous studies have concluded that pattern recognition and


Fig. 9 Receiver operating characteristic (ROC) curve used to estimate the errors of a biometric system in open-set identification mode

Fig. 10 Structure of gait biometric capture

prediction-oriented machine learning algorithms are the most effective means to this end. Feature extraction and classification are two areas where these techniques shine, but they can also be used to normalize data via a regression-like process. Designing a walk-based biometric system requires creating an algorithm to examine different types of machine learning and data analysis techniques to see


if they can be usefully integrated into the process. The maximum allowable size of the feature space is determined by considering the size of the training sample, the required quality values, and the reliability of recognition. Each category of an object's primary characteristics has its own criteria for selecting features. The gait-biometric imaging structure exemplifies how the three most-studied biometric techniques can work together to form a comprehensive and reliable biometric system. A subject's sample data is collected and compared with stored samples or templates (walking signatures), and the results are transmitted to a security or surveillance authority, who uses them either to restrict access or, depending on the mode of operation, to determine whether the subject is authorized to be there (Fig. 10). The two stages of user engagement illustrate how the walk is integrated into the overall design of the biometric system. At the registration stage, applicants are required to walk for a predetermined number of cycles until sufficient information about their gait has been accumulated. At the later testing stage, the system uses the previously registered data to identify or verify the user while simultaneously attempting to exclude potential fraudulent users. In most cases, the inputs of a gait monitoring system arrive as a continuous data stream that ignores gait cycles. When recording video, specific movements against a static background may start and stop a sampling cycle; in sensor-based approaches, sampling is instead driven by the forces of each step. Any walking-based biometric system must collect samples at the desired frequency, and some systems may normalize to account for changes in scale, such as the distances at which imaged objects appear. In the final step of data processing, feature extraction, large samples may need to be reduced to a controlled size before being passed to a classifier, the final component of all biometric systems. The order of data-sample processing thus decomposes a walk-based biometric system's "intelligent" components, and each component of the gait biometric system uses a different data analysis or machine learning technique (Figs. 11 and 12). Given the advantages of gait-based recognition, future research should focus on four areas in order to minimize interference with the subject's daily activities when measuring and evaluating human gait and to overcome the limitations of gait measurement systems: new sensors for parameter analysis, power consumption, miniaturization, and signal-processing algorithms. Each area is detailed below. Area 1 represents the need for new sensors that can detect a greater number of gait parameters; this is necessary for new gait systems to have the capacity and accuracy required for their use. More specifically, new sensors are required to measure segment location and orientation, velocity, joint angles, pressure distribution, pitch, and step length more accurately, and work should also be done to determine the sensor locations likely to yield the best results for each research objective. Work in Area 2 should center on developing technologies that increase autonomy for long-term analysis and lengthen the life of energy sources.
Area 3 should emphasize miniaturizing measurement and communication systems, with the goal of creating wholly non-intrusive and invisible systems that can be fully integrated into clothing or the human body.


Fig. 11 Filtering stages in comparison with the real biometric system of gait

Fig. 12 Classifications of gait biometric system

This would be a departure from the systems currently in use and would provide increased ease of use. The algorithm selected on the basis of the above models is appropriate, in the context of Uzbekistan, for an information system that recognizes a person from biometric authentication data. In short, since the value of a mathematical model depends on how it is applied, the developers of mathematical models in psychology are consulted not only by other experts but also by anyone interested in explaining, predicting, and controlling behavior; when this use of behavioral and biometric data is strengthened, recognition, and hence identification, becomes possible. Analyzing the above mathematical models, we come to the following conclusions. The figures above compare the hyperbolic and exponential delay-of-reinforcement functions, two mathematical functions with remarkably similar forms but very different behavioral predictions, each based on presumptions about the mental


operations that underlie the manifested behaviors. While several models (the CCM, DRT, and HVA, for instance) can account for concurrent-chains performance simultaneously, their underlying principles of choice behavior differ (contextual choice, delay reduction, and added value). Models of behavior derived from fundamental studies of human behavior have been put to use in neurology and psychopharmacology to better understand how different brain regions function and how drugs affect behavior. Factors that affect gait-based biometric recognition were discussed, a brief introduction to machine learning was provided, and it was shown how a person's gait can be used to identify them uniquely. Different imaging devices measure gait differently, and no single measure encompasses all of the dynamics that comprise human gait. An overview of the challenges inherent in developing a biometric recognition system that uses gait data was also presented.

5 Conclusion We evaluated gait detection by comparing the WS, FS, and MV methods. Almost all studies measured steps to within 10% accuracy. Moreover, a person's psychological characteristics are reflected in their actions during the identified gait, which enhances accuracy. The results of the MV method may reveal similarities between controlled laboratory environments and MV databases, and MV databases may have been derived from prototypes with greater detail. Both sensor-based methods can be calibrated to detect the foot forces produced by gait, i.e., ground reaction forces (GRF). The algorithm selected on the basis of the above models is suitable in Uzbekistan as a new identity-recognition information system based on biometric authentication data and, in future continuations of this study, as a mechanism for identifying suspects. Acknowledgements Funding agency: NA. Conflict of Interest: We declare no conflict of interest. Availability of Dataset: NA.

References
1. Mei Yuan L (2021) Student's attitude and satisfaction towards transformative learning: a research study on emergency remote learning in tertiary education. Creative Educ 12(1):494–528
2. Tumpa SN (2022) Online user recognition using social behavioral biometric system. Master's thesis, Schulich School of Engineering


3. Wilson R, Bhandarkar A, Lyons P, Woodard DL (2021) SQSE: a measure to assess sample quality of authorial style as a cognitive biometric trait. IEEE Trans Biometrics, Behav, Identity Sci 3(4):583–596
4. Hull CL (1943) Principles of behavior: an introduction to behavior theory
5. Wearden JH (1989) Mysteries of the organism: CL Hull's principles of behavior and some problems in contemporary schedule theory. J Exp Anal Behav 51(2):277
6. Grace RC (1994) A contextual model of concurrent-chains choice. J Exp Anal Behav 61(1):113–129
7. Squires N, Fantino E (1971) A model for choice in simple concurrent and concurrent-chains schedules. J Exp Anal Behav 15(1):27–38
8. Kheramin S, Body S, Mobini S, Ho M-Y, Velázquez-Martinez D, Bradshaw C, Szabadi E, Deakin J, Anderson I (2002) Effects of quinolinic acid-induced lesions of the orbital prefrontal cortex on inter-temporal choice: a quantitative analysis. Psychopharmacology 165(1):9–17
9. Bizo LA, Killeen PR (1997) Models of ratio schedule performance. J Exp Psychol Anim Behav Process 23(3):351
10. Bekmuratov KA, Bekmuratov DK, Akhatov AR (2019) Synthesis of feature spaces ensuring the quality and reliability of recognition. In: 2019 dynamics of systems, mechanisms and machines (Dynamics). IEEE, pp 1–5
11. Correia CJ, Simons J, Carey KB, Borsari BE (1998) Predicting drug use: application of behavioral theories of choice. Addict Behav 23(5):705–709
12. Muro-De-La-Herran A, Garcia-Zapirain B, Mendez-Zorrilla A (2014) Gait analysis methods: an overview of wearable and non-wearable systems, highlighting clinical applications. Sensors 14(2):3362–3394
13. Mazur JE (2006) Mathematical models and the experimental analysis of behavior. J Exp Anal Behav 85(2):275–291
14. de Wee G, Asmah-Andoh K (2022) Using science of conceptual systems to guide policy and theory of social change: a push towards more accurate anticipatory system modelling. Int J General Syst 51(2):95–125
15. Supena I, Darmuki A, Hariyadi A (2021) The influence of 4c (constructive, critical, creativity, collaborative) learning model on students' learning outcomes. Int J Instr 14(3):873–892
16. Shoshani O, Shaw SW (2021) Resonant modal interactions in micro/nano-mechanical structures. Nonlinear Dyn 104(3):1801–1828

Analysis of COVID-19 Genome Using Continuous Wavelet Transform Shivani Saxena, Abhijeeth M. Nair, and Ahsan Z. Rizvi

Abstract The virus has been a debatable topic for decades, as it cannot be placed anywhere in the classification of living things: viruses are neither dead nor alive. They are small parasites that require a host body to replicate. A virus consists of DNA or RNA, either single- or double-stranded and either circular or linear, and has a protective covering made of a protein coat called a capsid. Genomic functions can be analyzed by two different methods, direct experiments or computational methods, and computational techniques are slowly taking over the research. In this paper, the Continuous Wavelet Transform (CWT) is used to detect the hot regions present in the COVID-19 genome using different wavelets at different scales. Results show that the wavelets coif5 and db4 give better results at a scale of 200. Keywords Continuous wavelet transform · COVID-19 genome · DNA representation · Signal processing · Wavelet analysis

1 Introduction Coronavirus disease of 2019 (COVID-19) is a pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus. Currently, there are more than 60 lakh (6 million) deaths due to corona [1]. Over time, the virus is mutating at an


alarming rate. Several factors influence the impact of mutations on test performance, including the genome of the variants, the test design, and the prevalence of the variants in the population. The majority of those infected with SARS-CoV-2 develop mild to moderate respiratory illness and recover without the need for any specific treatment, though some people become very ill and may require immediate medical attention. People over the age of 65, as well as those with pre-existing medical conditions such as cardiovascular disease, diabetes, chronic respiratory disease, or cancer, are at a higher risk of developing serious illness [2], but anyone, regardless of age, can become very ill or die due to COVID-19. Coronaviruses belong to the Coronaviridae family and are positive-sense single-stranded RNA viruses [3]. Research has shown that COVID-19 has undergone several recombinations, resulting in new strains with altered virulence; those strains may even escape the host antiviral defense system, frustrating the manufacture of a perfect vaccine. Analyzing the COVID-19 genome using mathematical tools is therefore helpful in studying the nature and features of the coronavirus. In this paper, we use Digital Signal Processing techniques to study the coronavirus in depth. Several techniques are available, such as the Fourier Transform (FT), Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT), and Short-Time Fourier Transform (STFT), but each has drawbacks: the FT, DFT, and FFT provide frequency localization but not time-frequency localization. This problem is addressed by the STFT, but its fixed window size is a limitation, so we use wavelets, which provide time-frequency localization. The Continuous Wavelet Transform (CWT) is used to study abrupt changes in a signal, as it is helpful in identifying where such changes occur; here, in place of a signal, we use the coronavirus genome to study the patterns in it. The work done so far by different researchers is explained in Sect. 2. In Sect. 3, we discuss the proposed methodology used to extract information from the coronavirus genome. The results are explained and discussed in Sect. 4, and the conclusion and future work are discussed in Sect. 5.

2 Related Work In this section, we discuss the work carried out so far by different researchers on analyzing coronavirus genomes using different methods and techniques, and how the coronavirus genome can be converted into a mathematical form for analysis.


2.1 Literature Review When the severe acute respiratory syndrome outbreak occurred during 2002–2004 [4], Marco A. Marra et al. sequenced the 29,751-base genome of SARS, also known as the Tor2 isolate. They used the SuperScript Choice System for cDNA synthesis and the Applied Biosystems BigDye terminator reagent (version 3), with electrophoresis and data collection on AB 3700 and 3730 XL instruments, to sequence the genetic material of the viral genome; later, with the help of BLAST, they confirmed that the viral genome acquired was novel [5]. In 2009, Chandra et al. performed an entropy, autocorrelation, and Fourier analysis of the HIV-1 genome [6]. In 2014, several studies were carried out: Martin et al. used droplet digital PCR for the absolute determination of single-stranded and self-complementary AAV vector genome titers [7]. Biju George et al. analyzed small repetitive sequences in caulimoviruses, examining the nature and distribution of simple and complex microsatellites in the complete genomes of 44 species of Caulimoviridae; they selected 54 viral genomes and, using statistical analysis, compared the observed number of microsatellites (O) with the expected number (E) in the form of an O/E ratio [8]. Ramakanth et al. analyzed alphacoronavirus RNA structure using RNA folding algorithms to identify RNA structural elements [9]. In 2015, Juan Cristina et al. carried out a genome-wide analysis of codon usage bias in Ebolavirus using the program CodonW, in which the first, second, and third codon positions were calculated [10]. In 2020, Geng et al. performed a complete genome analysis of a Staphylococcus aureus phage (vBSM-A1) [11]. They observed the morphology of the phage using a transmission electron microscope (TEM), sequenced it on the Illumina HiSeq paired-end platform, and evaluated the results using Trimmomatic-0.33 (a flexible trimmer for Illumina data) [12]; they also used several bioinformatics tools, such as BLASTp and the GeneMark server, to identify the open reading frames. Julie Yamaguchi et al. reported the complete genome sequence of CG-0018a-01, which establishes HIV-1 subtype L. To increase genome coverage, they mixed metagenomic next-generation sequencing (mNGS) and HIV-specific target-enriched (HIVxGen) libraries for NGS, and the sequence-based ultrarapid pathogen identification bioinformatics workflow was used to look for other coinfections in the NGS reads [13]. In 2021, Turab Naqvi et al. analyzed the novel COVID-19 genome using the PONDR® [14] (Predictor of Natural Disordered Regions) VLXT, VL3, and VSL2 predictors and the IUPred2A web server [15]; these tools identify disordered protein regions by predicting residues that do not tend to form a structure under native conditions. They also carried out several MSAs to compare the sequences with other sequences. COVID treatment research could benefit from wavelet analysis of the COVID-19 genome: continuous wavelet analysis of the COVID genome can provide insight into major mutation sites, and these mutation sites are crucial in the development of anti-COVID drugs.


2.2 Numerical Representation of the DNA Sequence The fundamental technique for analyzing DNA sequences is sequence comparison. DNA is composed of four nitrogenous bases, which fall into two categories: A and G are purines, whereas C and T are pyrimidines. DNA sequences can be characterized as coding (exon) sequences and non-coding (intron) sequences. To use DNA sequences computationally, the sequence has to be converted into a mathematical expression from which features can be extracted. Broadly, there are two ways of representing a genome: fixed-mapping representation and physicochemical property-based representation. A summary of each representation technique is given in Table 1.
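As a small illustration of two of the mappings summarized in Table 1 — the Voss binary indicator and the real-number mapping used later in this chapter — the following sketch converts a toy fragment (not a real genome) into numerical form:

```python
import numpy as np

def voss_mapping(seq):
    """Voss binary indicator: one 0/1 sequence per base (A, C, G, T)."""
    seq = seq.upper()
    return {base: np.array([1 if s == base else 0 for s in seq]) for base in "ACGT"}

def real_number_mapping(seq):
    """Real-number mapping: A, T -> -1 and C, G -> +1 (unknown symbols such as N -> 0)."""
    table = {"A": -1.0, "T": -1.0, "C": 1.0, "G": 1.0}
    return np.array([table.get(s, 0.0) for s in seq.upper()])

fragment = "ATGGCTTAGCGT"          # toy fragment, purely illustrative
print(voss_mapping(fragment)["A"])
print(real_number_mapping(fragment))
```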

3 Proposed Work In this section, we will discuss the basics behind the wavelet transform, inverse wavelet transform, and the methodology we have used in analyzing the COVID sequence. We will also discuss the COVID sequences we have used in this paper.

3.1 Methodology For analyzing the COVID sequence, we use the Continuous Wavelet Transform (CWT). First, we convert the COVID genome into a numerical signal using the real-number representation. Then, we apply the CWT to the converted sequence using different mother wavelets at scales varied in steps of 1. Finally, we extract information from the scalogram, identify the hot regions in it, and analyze those regions using bioinformatics tools. The flowchart of the proposed method is shown in Fig. 1.
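A hedged sketch of this pipeline is shown below. Note two assumptions: the sequence is a toy string rather than an actual SARS-CoV-2 genome, and because PyWavelets' cwt supports only continuous wavelets, the sketch substitutes a Morlet wavelet for the db4/coif5 wavelets used in the chapter:

```python
import numpy as np
import pywt
import matplotlib.pyplot as plt

def real_number_mapping(seq):
    table = {"A": -1.0, "T": -1.0, "C": 1.0, "G": 1.0}
    return np.array([table.get(s, 0.0) for s in seq.upper()])

genome = "ATGGCT" * 500                          # toy sequence standing in for a FASTA genome
signal = real_number_mapping(genome)

scales = np.arange(1, 201)                       # scales 1..200 in steps of 1, as in the text
coeffs, _ = pywt.cwt(signal, scales, "morl")     # Morlet wavelet (pywt.cwt needs a continuous wavelet)

plt.imshow(np.abs(coeffs), aspect="auto", cmap="jet",
           extent=[0, len(signal), scales[-1], scales[0]])
plt.xlabel("Position along the genome")
plt.ylabel("Scale")
plt.title("CWT scalogram")
plt.savefig("scalogram.png")                     # hot regions show up as high-magnitude patches
```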

3.2 Wavelet Analysis In Fourier analysis, a signal is broken down into several sinusoids of different frequencies. This analysis is useful for stationary signals but not suitable for non-stationary signals, so we use wavelets, which are applicable to both stationary and non-stationary signals. In wavelet analysis, the data is broken down into different frequency components, and each component is studied with a resolution matched to its scale (Fig. 2).

Table 1 Numerical representation of DNA sequences

Fixed mapping – Voss mapping [16]: Richard F. Voss proposed the Voss binary indicator, which converts a DNA sequence into four 0/1 sequences signifying A, C, G, and T, respectively. It is simple and efficient for the spectral analysis of DNA sequences, but it fails to represent the relationship between the four sequences. Applications include clustering DNA sequences [17], similarity comparison in genomics [18], characterization of coding sequences [19], and genomic analysis [20].

Fixed mapping – Z-curve [21]: An advanced version of Voss mapping; a three-dimensional curve representing the DNA sequence as a set of 3D nodes. It allows easy reconstruction of the sequence and conversion of nucleotide sequences into genomic signals [22].

Fixed mapping – Tetrahedron [23]: The nucleotides are assigned to the vertices of a regular tetrahedron in space, reducing the numerical representation from four dimensions to three while remaining symmetric with respect to all four components [24].

Fixed mapping – Complex number [25]: Proposed by Cattani in 2012; a and t (and g and c) are treated as complex-conjugate pairs, so the representation reflects the complementary nature of the A–T and G–C pairs.

Fixed mapping – Integer: Used to map DNA bases in one dimension; the integers 0, 1, 2, 3 are assigned to the nucleotides T, C, A, G.

Fixed mapping – Real number: Represents the complementary property; A and T can be taken as −1 and C and G as +1, or vice versa, depending on the application. The particular assignment has no effect on the DNA structure [26].

Physicochemical property-based – Physicochemical property [27]: Based on the physicochemical (biophysical and biochemical) properties of DNA, which are taken into account while mapping the sequences.


Fig. 1 Methodology

Fig. 2 Proposed workflow for finding the mutations in the Coronavirus Genome using CWT


Wavelet transforms are the integral sum of wavelets. Suppose s(t) is a continuous, square-integrable function; its Continuous Wavelet Transform (CWT) is defined as

S_w(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} s(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt \quad (1)

where \psi is the mother wavelet, which is continuous in both the time and frequency domains, * denotes the complex conjugate, a is the scale parameter, and b is the translation parameter. The original signal s(t) can be recovered using the Inverse Continuous Wavelet Transform (ICWT):

s(t) = \int_{0}^{\infty} \int_{-\infty}^{\infty} \frac{1}{a^{2}}\, S_w(a, b)\, \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) db\, da \quad (2)

The CWT is the convolution of the signal with the set of functions generated by the mother wavelet. The simplest wavelet for analyzing any sequence is the Haar wavelet. The Haar mother wavelet is defined as

\psi(t) = \begin{cases} 1, & 0 \le t < \tfrac{1}{2} \\ -1, & \tfrac{1}{2} \le t < 1 \\ 0, & \text{otherwise} \end{cases} \quad (3)

and the Haar scaling function is defined as

\varphi(t) = \begin{cases} 1, & 0 \le t < 1 \\ 0, & \text{otherwise} \end{cases} \quad (4)
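As an illustration of Eqs. (1) and (3), the sketch below computes a Haar scalogram directly with NumPy: for each integer scale it correlates the mapped genome signal with a discretized, L2-normalized Haar wavelet and stacks the resulting coefficients. This is only a minimal sketch of the computation described in this section, not the exact code used to produce the figures; the variable `region_signal` in the usage comment is a placeholder.

```python
import numpy as np

def haar_cwt(signal, scales):
    """Continuous wavelet transform with the Haar wavelet (Eqs. 1 and 3).

    Returns an array of shape (len(scales), len(signal)); its absolute values
    form the scalogram that is inspected for hot regions.
    (Assumes the signal is longer than twice the largest scale.)
    """
    signal = np.asarray(signal, dtype=float)
    coeffs = np.empty((len(scales), signal.size))
    for i, a in enumerate(scales):
        half = max(int(a) // 2, 1)
        # Discretized Haar mother wavelet stretched over 2*half samples and
        # L2-normalized, playing the role of psi((t - b)/a)/sqrt(a) in Eq. (1).
        psi = np.concatenate([np.ones(half), -np.ones(half)]) / np.sqrt(2 * half)
        # Correlation of the signal with the scaled wavelet; the Haar wavelet
        # is real, so the complex conjugate in Eq. (1) has no effect.
        coeffs[i] = np.convolve(signal, psi[::-1], mode="same")
    return coeffs

# Example: Haar scalogram at scales 1..200 in steps of 1 for a mapped region.
# 'region_signal' is a placeholder for the real number representation of a
# gene region (e.g. the spike region extracted from a genome in Table 2).
# scalogram = np.abs(haar_cwt(region_signal, scales=range(1, 201)))
```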

3.3 Coronavirus Genome

A clade is a term used in biology to describe a group of organisms that all descended from a common ancestor. In virology, a clade represents a grouping of related viruses based on their genetic sequences, and phylogeny can be used to track changes in those viruses. Rapid genome sequencing is a technique for tracking changes in a virus's genetic makeup. SARS-CoV-2 is a clade within the family Coronaviridae and the genus Betacoronavirus. In general, a virus's genetic variants are classified into clades, which can also be referred to as subtypes, genotypes, or groupings.

PANGOLIN. In Rambaut et al. (2020), the Phylogenetic Assignment of Named Global Outbreak Lineages (PANGOLIN) software team recommended a naming scheme for the SARS-CoV-2 clades. This includes the primary A, B, B.1, B.1.1, B.1.177, and B.1.1.7 lineages, which are further subdivided [28]. This is where strain designations like B.1.1.7 (British origin) and B.1.351 (South African origin) come from. The original strain, A, is used as the reference sequence in the PANGOLIN nomenclature.

GISAID. The Global Initiative on Sharing All Influenza Data (GISAID) has hundreds of complete and high-coverage genomes available. GISAID categorizes SARS-CoV-2 clades as S, O, L, V, G, GH, GR, GV, and GRY. The S and L clades were present at the start of the pandemic; S remained dominant at first, while L separated into G and V. G was further subdivided into GR and GH, and later GV; after July 2020, GRY was separated from GR. The letters reflect the mutations that caused each split. The G clade corresponds to the PANGO B.1 lineage, while GR corresponds to the PANGO B.1.1 lineage [29]. For this study, we took ten different genomes from GISAID [30] and NCBI [31], as given in Table 2.

Table 2 Viral genomes [30, 31]

Name | Accession no./EPI ISL | Genome size | Clade
hCoV-19/Spain/GA-SCQ22-205,525/2022 | 12,954,505 | 29,903 | L
hCoV-19/Australia/VIC332/2020 | 2,968,043 | 29,837 | V
hCoV-19/USA/CA-CDPH-3000004343(alpha)/2021 | 12,979,091 | 29,849 | G
hCoV-19/Namibia/N5698/2021(beta) | 12,957,015 | 29,751 | G
hCoV-19/USA/CA-CDPH-3000004343(alpha)/2021 | 12,979,091 | 29,849 | G
hCoV-19/Peru/CUS-INS-12588/2021 (lambda) | 8,882,505 | 29,861 | G
hCoV-19/USA/CO-CDPHE-2103210316/2021 (delta) | 13,021,619 | 29,769 | G
hCoV-19/India/KA-CBR-SEQ-11036/2021 (omicron) | 9,430,038 | 29,893 | G
hCoV-19/Dominican Republic/33892/2021 (mu) | 3,045,391 | 29,816 | G
hCoV-19/Congo/RC-374/2021 (gh r) | 10,023,507.1/2 | 27,818 | G
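GISAID genomes must be downloaded after logging in, but NCBI records are publicly retrievable. As a hedged illustration of the data-gathering step (not the exact procedure used for Table 2), the sketch below fetches a SARS-CoV-2 genome from NCBI with Biopython; the accession NC_045512.2 (the public reference genome) and the e-mail address are example values.

```python
from Bio import Entrez, SeqIO  # Biopython

# NCBI asks for a contact e-mail with every Entrez request; placeholder value.
Entrez.email = "your.name@example.com"

def fetch_genome(accession):
    """Download a single nucleotide record from NCBI in FASTA format."""
    handle = Entrez.efetch(db="nucleotide", id=accession,
                           rettype="fasta", retmode="text")
    record = SeqIO.read(handle, "fasta")
    handle.close()
    return record

# NC_045512.2 is the public SARS-CoV-2 reference genome, used here only as an
# example; the GISAID records in Table 2 have to be downloaded after login.
record = fetch_genome("NC_045512.2")
print(record.id, len(record.seq))  # about 29,903 bases
```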

4 Experimental Results

In this section, we discuss the results obtained after analyzing the coronavirus genome using different wavelets. We used the real number representation method because of its reliability and efficiency, and we applied different mother wavelets such as Haar, Daubechies (db4), and Coiflet (coif5) at scales varied in steps of 1. The hot regions can be observed clearly when the graphs are zoomed in.


Fig. 3 Spike protein first half; db4 scale 200

4.1 Spike Protein

Most of the known mutations in SARS-CoV-2 variants lie in the spike protein region. When we select only the spike region, we obtain the following graphs.1 Among the analyzed strains there are four VOCs (variants of concern), two VOIs (variants of interest), and one VUM (variant under monitoring). The scalograms for finding mutations in the spike protein region of the different strains using different wavelets are shown in Figs. 3, 4, 5 and 6, with the mutation sites listed in Table 3. At scales of 1 to 200 in steps of 1 (1:1:200), the mutation sites can be seen in the scalograms obtained using db4 and coif5.

1 The values reported in the following sections of this paper may contain errors of ±2.
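In this work the hot regions were identified by visual inspection of the scalograms. A simple automated proxy for that step, sketched below under the assumption that a reference strain and a variant strain have already been mapped, transformed with `haar_cwt` above, and aligned to the same length, is to compare the coefficient magnitudes of the two scalograms at one scale and flag the positions where they differ most; the window size, the number of returned regions, and the function name `candidate_hot_regions` are illustrative choices, not values or code from this paper.

```python
import numpy as np

def candidate_hot_regions(ref_coeffs, var_coeffs, scale_index, window=30, top_k=10):
    """Flag positions where two scalograms differ most at one scale.

    ref_coeffs, var_coeffs : arrays of shape (n_scales, n_positions), e.g. from
                             haar_cwt, with the two gene regions aligned to the
                             same length.
    Returns the centres of the top_k non-overlapping windows with the largest
    smoothed absolute coefficient difference.
    """
    diff = np.abs(np.abs(ref_coeffs[scale_index]) - np.abs(var_coeffs[scale_index]))
    # Smooth the pointwise difference over a short window so isolated noise
    # spikes do not dominate the ranking.
    smooth = np.convolve(diff, np.ones(window) / window, mode="same")
    order = np.argsort(smooth)[::-1]
    picked = []
    for pos in order:
        if all(abs(pos - p) >= window for p in picked):
            picked.append(int(pos))
        if len(picked) == top_k:
            break
    return sorted(picked)

# Hypothetical usage with two spike-region scalograms of equal length:
# hot = candidate_hot_regions(cwt_reference, cwt_alpha, scale_index=199)
```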


Fig. 4 Spike protein second half; db4 scale 200

4.2 N (Nucleocapsid Protein)

The coronavirus Nucleocapsid (N) is a structural protein that forms complexes with the genomic RNA and interacts with the viral membrane protein during virion assembly; it is also essential for virus transcription and assembly efficiency. According to many recent studies, the Nucleocapsid is a versatile protein. The scalograms for finding mutations in the Nucleocapsid protein region of the different strains using different wavelets are shown in Figs. 7 and 8, with the mutation sites listed in Table 4. The graphs depict the possible mutations at a scale of 300 using db4 and coif5 as mother wavelets.


Fig. 5 Spike protein first half; coif5 scale 200

4.3 Envelope (E) Protein

The envelope protein is a small, integral membrane protein that participates in several stages of the virus's life cycle, including assembly, budding, envelope formation, and pathogenicity. Recent research has focused on its structure and structural motifs, as well as on its functions as an ion-channeling viroporin and its interactions with other CoV proteins and host cell proteins. The possible mutation sites in the envelope protein are shown in Figs. 9 and 10 at a scale of 50 using db4 and coif5 as mother wavelets, with the nucleotide positions given in Table 5.


Fig. 6 Spike protein second half; coif5 scale 200

4.4 Membrane (M) Protein

This is the most abundant protein in SARS-CoV-2, and it holds dominant cellular immunogenicity. The scalograms depicting the possible mutation sites in the membrane protein are shown in Figs. 11 and 12, and the possible mutation sites in the different strains are given in Table 6. The mutation sites were obtained at a scale of 100.

Table 3 Mutation table of spike protein

Name | Spike protein | Mutation | Nucleotide position
Clade L | 21,564–25,385 | No | –
Clade V | 21,510–25,331 | No | –
Alpha | 21,510–25,331 | A570D, D614G, N501Y, P681H, S982A, D1118H, T716I, V70I | 21720, 23013, 23220, 23352, 23553, 23658, 24456, 24864
Beta | 21,495–25,307 | A243del, A653V, A701V, A1020V, D614G, E484K, K417N, L242del, L244del, N501Y, D80A, D215G, L18F | 2224, 23454, 23598, 24555, 23337, 22947, 22746, 22221, 22227, 22998, 21735, 22140, 21549
Gamma | 21,510–25,331 | L242I, N501Y, P26S, C1235F, D138Y, D614G, E484K, H655Y, K417T, L18F, P681R, R190S, T470N, T1027I, V1176F | 22236, 23013, 21588, 25215, 21924, 23352, 22962, 23475, 22761, 21564, 23913, 22080, 22920, 24591, 25038
Lambda | 21,522–25,343 | D614G, F490S, G75V, L452Q, T76I, T859N | 23364, 22992, 21747, 22878, 21750, 24099
Delta | 21,510–25,325 | D614G, D950N, E156G, F157del, G142D, K187Q, L452R, P681R, R158del, T19R, V1122L | 23352, 24360, 21978, 21981, 21936, 22071, 22866, 23553, 23984, 21567, 24876
Omicron | 21,593–25,408 | D614G, D950N, D1260E, E156G, E554Q, F157del, G142D, H655Y, N679K, P681H, Q954H, R158del, T19R | 23435, 24443, 22061, 22064, 23225, 22019, 23058, 23630, 23636, 22019, 23058, 23630, 23636
mu | 21,524–25,345 | A67S, D138Y, D614G, E484K, E702Q, N501Y, R346K, T95I | 21725, 21938, 23366, 22976, 23630, 23027, 22562, 21809
gh r | 21,510–25,331 | A520S, D614G, D936H, F490R, N394S, N501Y, P9L, P681H, R346S, T859N, Y449N | 23070, 23352, 24318, 22980, 22692, 23013, 21537, 23553, 22548, 24087, 22857


Table 4 Mutation table of Nucleocapsid protein

Sr. no. | Name | Nucleocapsid (N) protein | Mutation | Nucleotide position
1 | Clade L | 28,275–29,534 | No | –
2 | Clade V | 28,252–29,511 | No | –
3 | Alpha | 28,221–29,840 | G204R, S235F | 28,833, 28,926
4 | Beta | 28,197–29,456 | T205I | 28,812
5 | Gamma | 28,221–29,480 | G204R, P80R | 28,833, 28,541
6 | Lambda | 28,233–29,492 | G214, P13L | 28,875, 28,272
7 | Delta | 28,224–29,467 | D63G, D377Y, G215C, R203M | 28,413, 29,355, 28,869, 28,833
8 | Omicron | 28,272–29,550 | D377Y, G215C, R203M | 29,403, 28,917, 28,881
9 | mu | 28,251–29,494 | T205I | 28,866

Fig. 7 Nucleocapsid protein; db4 scale 300


Fig. 8 Nucleocapsid protein; coif5 scale 300

4.5 ORF3a or NS3a Protein

ORF3a encodes a major viral accessory protein, which also plays a major role in replication and pathogenesis. The scalograms depicting the possible mutation sites in the ORF3a protein are shown in Figs. 13 and 14, and the possible mutation sites in the different strains are given in Table 7. The mutation sites were obtained at a scale of 200.

4.6 ORF7a (NS7a) and ORF7b (NS7b) Protein

ORF7a and ORF7b encode viral accessory proteins. The scalograms for finding mutations in the ORF7a and ORF7b protein regions of the different strains were computed using different wavelets, and the corresponding mutation sites are given in Tables 8 and 9, respectively.


Fig. 9 Envelope (E) protein; db4; scale 50

The graphs in Figs. 15 and 16 depict the possible mutations at a scale of 200, using db4 and coif5 as mother wavelets, in the ORF7a and ORF7b regions, respectively.

4.7 ORF8 (NS8) Protein

The ORF8 protein mediates immune evasion by downregulating MHC-I. The mutation sites are given in Table 10, with the scalograms shown in Figs. 17, 18, 19, and 20.


Fig. 10 Envelope (E) protein; coif5; scale 50

In the case of the spike protein there are four different graphs; the first two are actually one scalogram divided into two parts. The spike gene contains approximately 3600 base pairs, so for better readability the region was analyzed in two segments of about 1800 bases each. In the remaining graphs, we can see that the reported envelope protein mutation may be a false positive, since the scalogram did not depict any pattern change and the reported mutation position lies well outside the gene region.


Table 5 Mutation table of envelope protein

Sr. no. | Name | Envelope (E) protein | Mutation | Nucleotide position
1 | Clade L | 26,246–26,455 | No | –
2 | Clade V | 26,223–26,432 | No | –
3 | Alpha | 26,192–26,401 | No | –
4 | Beta | 26,168–26,377 | P71L | 26,381
5 | Gamma | 26,192–26,401 | No | –
6 | Lambda | 26,204–26,413 | No | –
7 | Delta | 26,186–26,395 | No | –
8 | Omicron | 26,269–26,478 | No | –
9 | mu | 26,206–26,415 | No | –
10 | gh r | 26,192–26,401 | No | –

5 Conclusions and Future Work

The main aim of this paper is to find the hot regions in the COVID-19 genome using the wavelet transform. There were three important steps:
• Conversion of the DNA sequences to a mathematical (numerical) form.
• Continuous Wavelet Transform of the converted sequences.
• Pattern recognition based on visual inspection, confirmed using experiments or previous results.
A 1D wavelet transform is used in this paper. Different wavelets such as Haar, Coiflet, and Daubechies were applied at different scales. Based on trial and error and on previous papers, the db4 and coif5 wavelets were used to identify the patterns. The numerical representation is the crucial base on which the framework is built. The results illustrate the possible mutations in each virus strain: a change in the scalogram pattern indicates a change of amino acids in the genome. Since the collected genomes contained a few undetermined regions, those regions also produced some changes in the pattern; such cases can only be resolved with wet-lab analysis. The scales here refer to the scaling of the mother wavelet, so the signal can be interpreted using prior knowledge, and the scale can be changed according to the number of base pairs analyzed. Different mutation sites are visible at a scale of 200, while some proteins show patterns at a scale of 50. In the future, the most important and crucial tasks will be carefully choosing the required numerical representation technique and applying the correct wavelet transform. This makes the approach a promising way to screen a genome for hot points: by taking the base pairs in small segments, the mutation regions can be located easily. The same technique can also be used for detecting mutations in cancer.

Fig. 11 Membrane protein; db4 scale 100


Fig. 12 Membrane protein; coif5 scale 100

Table 6 Mutation table of membrane protein

Sr. no. | Name | Membrane (M) protein | Mutation | Nucleotide position
1 | Clade L | 26,254–27,192 | No | –
2 | Clade V | 26,231–27,169 | No | –
3 | Alpha | 26,200–27,138 | V70L | 26,680
4 | Beta | 26,176–27,114 | P71L | 26,381
5 | Gamma | 26,200–27,138 | No | –
6 | Lambda | 26,212–27,150 | No | –
7 | Delta | 26,194–27,132 | H125Y, M182T | 26,839, 27,010
8 | Omicron | 26,277–27,215 | A63T, Q19E | 26,736, 26,604
9 | mu | 26,214–27,152 | No | –
10 | gh r | 26,200–27,138 | I82T | 26,716

Fig. 13 ORF3a or NS3a; db4 scale 200


Fig. 14 ORF3a or NS3a; coif5 scale 200


Table 7 Mutation table of ORF3a protein

Sr. no. | Name | ORF3a protein | Mutation | Nucleotide position
1 | Clade L | 25,394–26,221 | No | –
2 | Clade V | 25,371–26,198 | G251V | 26,124
3 | Alpha | 25,340–26,167 | No | –
4 | Beta | 25,316–26,143 | S171L | 25,829
5 | Gamma | 25,340–26,167 | S216L, S253P | 25,988, 26,099
6 | Lambda | 25,352–26,179 | No | –
7 | Delta | 25,334–26,161 | E239Q, S26L | 26,051, 25,412
8 | Omicron | 25,417–26,224 | T223I | 26,086
9 | mu | 25,354–26,181 | V259L | 26,131
10 | gh r | 25,340–26,167 | Q57H, T32I | 25,511, 25,436

Table 8 Mutation table of ORF7a or NS7a protein

Sr. no. | Name | ORF7a (NS7a) protein | Mutation | Nucleotide position
1 | Clade L | 27,395–27,744 | No | –
2 | Clade V | 27,372–27,737 | No | –
3 | Alpha | 27,341–27,690 | No | –
4 | Beta | 27,317–27,666 | No | –
5 | Gamma | 27,341–27,690 | No | –
6 | Lambda | 27,353–27,702 | No | –
7 | Delta | 27,335–27,684 | T120I, V82A | 27,695, 27,581
8 | Omicron | 27,418–27,767 | T120I, V82A | 27,778, 27,664
9 | mu | 27,355–27,704 | V104F | 27,667
10 | gh r | 27,341–27,690 | No | –

Table 9 Mutation table of ORF7b or NS7b protein

Sr. no. | Name | ORF7b (NS7b) protein | Mutation | Nucleotide position
1 | Clade L | 27,757–27,888 | No | –
2 | Clade V | 27,734–27,865 | No | –
3 | Alpha | 27,703–27,834 | No | –
4 | Beta | 27,679–27,810 | No | –
5 | Gamma | 27,703–27,849 | No | –
6 | Lambda | 27,715–27,846 | No | –
7 | Delta | 27,697–27,769 | T40I | 27,817
8 | Omicron | 27,780–27,911 | No | –
9 | mu | 27,717–27,848 | No | –

Fig. 15 ORF7a or NS7a; db4 scale 200

Fig. 16 ORF7a or NS7a; coif5 scale 200


Table 10 Mutation table of ORF8 or NS8 protein

Sr. no. | Name | ORF8 (NS8) protein | Mutation | Nucleotide position
1 | Clade L | 27,895–28,260 | No | –
2 | Clade V | 27,872–28,237 | No | –
3 | Alpha | 27,841–28,206 | Q27stop, R52I, Y73C | 27,922, 27,997, 28,060
4 | Beta | 27,817–28,182 | No | –
5 | Gamma | 27,841–28,206 | G8stop | 27,865
6 | Lambda | 27,853–28,207 | No | –
7 | Delta | 27,835–28,194 | D119del, F120del | 28,192, 28,195
8 | Omicron | 27,918–28,272 | E106stop | 28,236
9 | mu | 27,855–28,220 | P38S, S67F, T11K | 27,969, 28,056, 27,888

Fig. 17 ORF7b or NS7b; db4 scale 200

Fig. 18 ORF7b or NS7b; coif5 scale 200


Fig. 19 ORF8 or NS8; db4 scale 200


Fig. 20 ORF8 or NS8; coif5 scale 200

Acknowledgements This project is the outcome of the SERB Project MTR/2019/000432. We are thankful to SERB for providing financial support.

References

1. Simonsen L, Viboud C (2021) A comprehensive look at the COVID-19 pandemic death toll. eLife 10:e71974
2. Tsonis AA, Kumar P, Elsner JB, Tsonis PA (1996) Wavelet analysis of DNA sequences. Phys Rev E 53(2):1828–1834
3. Kirtipal N, Bharadwaj S, Kang SG (2020) From SARS to SARS-CoV-2, insights on structure, pathogenicity and immunity aspects of pandemic human coronaviruses. Infect Genet Evol 85:104502
4. NCBI SARS-CoV-2 data source. https://www.ncbi.nlm.nih.gov/sars-cov-2/
5. Marra MA, Jones SJM, Astel CR, Holt RA, Brooks-Wilson A, Butterfield YS, Khattra J, Asano JK, Barber SA, Chan SY (2003) The genome sequence of the SARS-associated coronavirus. Science Express


6. Chandra S, Rizvi AZ (2010) Entropy, autocorrelation and Fourier analysis of HIV-1 genome. In: Sobh T (ed) Innovations and advances in computer sciences and engineering. Springer Netherlands, Dordrecht, pp 119–121
7. Lock M, Alvira MR, Chen S-J, Wilson JM (2014) Absolute determination of single-stranded and self-complementary adeno-associated viral vector genome titers by droplet digital PCR. Human Gene Ther Methods 25(2):115–125
8. George B, Gnanasekaran P, Jain SK, Chakraborty S (2014) Genome wide survey and analysis of small repetitive sequences in caulimoviruses. Infect Genet Evol 27:15–24
9. Madhugiri R, Fricke M, Marz M, Ziebuhr J (2014) RNA structure analysis of alphacoronavirus terminal genome regions. Virus Res 194:76–89
10. Cristina J, Moreno P, Moratorio G, Musto H (2015) Genome-wide analysis of codon usage bias in ebolavirus. Virus Res 196:87–93
11. Geng H, Zhang M, Li X, Wang L, Cong C, Cui H, Wang L, Xu Y (2020) Complete genome analysis of a Staphylococcus aureus phage (vBSM-A1). Arch Microbiol 202(7):1617–1626
12. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114–2120
13. Yamaguchi J, Vallari A, McArthur C, Sthreshley L, Cloherty GA, Berg MG, Rodgers MA (2020) Brief report: complete genome sequence of CG-0018a-01 establishes HIV-1 subtype L. J Acquir Immune Defic Syndr 83(3):319
14. Bomma R, Venkatesh P, Kumar A, Babu AY, Rao SK (2012) PONDR (predictors of natural disordered regions). Int J Comp Tech Elect Eng 21(4):61–70
15. Mészáros B, Erdős G, Dosztányi Z (2018) IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res 46(W1):W329–W337
16. Voss RF (1992) Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. Phys Rev Lett 68(25):3805
17. Hoang T et al (2015) A new method to cluster DNA sequences using Fourier power spectrum. J Theor Biol 372:135–145
18. Hoang T, Yin C, Yau SS (2016) Numerical encoding of DNA sequences by chaos game representation with application in similarity comparison. Genomics 108(3–4):134–142
19. Yin C, Yau SS-T (2005) A Fourier characteristic of coding sequences: origins and a non-Fourier approximation. J Comput Biol 12(9):1153–1165
20. Abo-Zahhad M, Ahmed SM, Abd-Elrahman SA (2012) Genomic analysis and classification of exon and intron sequences using DNA numerical mapping techniques. Int J Inf Technol Comput Sci 4(8):22–36
21. Zhang R, Zhang CT (1994) Z curves, an intuitive tool for visualizing and analyzing the DNA sequences. J Biomol Struct Dyn 11(4):767–782
22. Zhang C, Zhang R, Ou H (2003) The Z curve database: a graphic representation of genome sequences. Bioinformatics 19:593–599
23. Silverman BD, Linsker R (1986) A measure of DNA periodicity. J Theor Biol 118(3):295–300
24. Cristea PD (2007) Conversion of nucleotides sequences into genomic signals. J Cell Mol Med 6(2):279–303
25. Cattani C (2010) Fractals and hidden symmetries in DNA. Mathematical Problems in Engineering
26. Kwan HK, Arniker SB (2009) Numerical representation of DNA sequences. In: IEEE International Conference on Electro/Information Technology, pp 307–310
27. Arniker SB, Kwan HK (2012) Advanced numerical representation of DNA sequences. In: International Conference on Bioscience, Biochemistry and Bioinformatics, IPCBEE, vol 31, pp 2–5
28. Zhang T, Wu Q, Zhang Z (2020) Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak. Curr Biol 30(7):1346–1351
29. Shu Y, McCauley J (2017) GISAID: Global initiative on sharing all influenza data – from vision to reality. Eurosurveillance 22(13):30494


30. Huang Y et al (2004) The SARS epidemic and its aftermath in China: a political perspective. In: Learning from SARS: preparing for the next disease outbreak, pp 116–136
31. GISAID COVID-19 data bank. https://gisaid.org/

Author Index

A Aamod Sane, 161 Aarthi, G., 817 Aarya Tiwari, 405 Abdul Gaffar, H., 993 Abhijeeth M. Nair, 1047 Abhijit Sen, 445 Abhinav Sharma, 393 Abhishek Das, 845 Aboli Khursale, 129 Abu Ansari, 405 Aditi Bornare, 3 Aditya Srinivas Menon, 303 Advait Kamathe, 19 Ahsan Z. Rizvi, 1047 Aisha Banu, W., 817, 1021 Ajanta Das, 759 Akhatov, A., 1029 Amlan Chakrabarti, 759 Amutha, S., 1005 Ananth Kumar, T., 1029 Ananth, Christo, 1029 Anindita Banerjee, 759, 833, 845 Anita Goel, 77 Anjali Kamath, K., 285 Anussha Murali, 897 Archana Patil, 583 Archit Chitre, 665 Arsheyee Shahapure, 833 Arun kumar, B. R., 879 Arun Mitra, 897, 913 Arushi Dubey, 3 Arvind Deshpande, 101 Arvind Kumar, 319 Ashish Garg, 605 Ashish Ranjan, 537

Ashraf Alam, 721, 735 Ashwini M. Joshi, 285 Atasi Mohanty, 721, 735 Avinash More, 345

B Balagopal G. Menon, 175 Balaji, G. N., 965 Baneswar Sarker, 605 Bhanu Pratap Misra, 445 Bharath, R., 91, 143 Bhargab Das, 319 Bhavyashree, H. L., 383 Bijon Guha, 457 Biju Soman, 897, 913

C Chaman Banolia, 59 Chinmay Kulkarni, 19 Chour Singh Rajpoot, 119

D Dasgupta, Paramik, 367 Deap Daru, 593 Debasis Samanta, 555 Debayan Ganguly, 367 Deepa Shenoy, P., 431 Deep Pancholi, 45 Dhanya S. Pankaj, 489 Dharani, M. K., 1005 Dhinakaran, K., 749 Divya T. Puranam, 285



G Gayadhar Pradhan, 235 Gargi Kulkarni, 525

H Hariharan, R. S., 993 Hari Kumar Radhakrishnan, 91, 143 Harit Koladia, 593 Harshal Abak, 405 Harshali Y. Patil, 187 Harsha Sonune, 129 Himmatov, I., 1029 Hitansh Surani, 593 Hoechenberger, Ralf, 69 Hrushikesh Bhosale, 161 Hummel, Detlev, 69 Hutter, Stefan, 221

I Ieesha Deshmukh, 129 Indrajit Kar, 457 Isha Deshpande, 855 Israa Bint Rahman Salman, 331 Itishree Panda, 235

J Jhareswar Maiti, 605 Joyal Isac, S., 1005 Jyoti Kanjalkar, 405, 665 Jyoti Prakash Singh, 235 Jyoti Singh Kirar, 249

K Kaivalya Aole, 405 Karan Owalekar, 783 Kartikey Gupta, 249 Karthikha, R., 817 Ketaki Pattani, 685 Ketki Deshmukh, 345 Kingshuk Chatterjee, 367 Konjengbam Anand, 303 Krantee M. Jamdaade, 187 Krishna Murthy, S. V. S. S. N. V. G., 445 Kriti Srivastava, 593 Kumar, C. R. S., 569 Kunal Parmar, 593

L Leena Panchal, 855

M Mahima Arya, 445 Manika Garg, 77 Mani, G. S., 647 Manikandan, K., 993 Manikanta Sanjay, V., 201 Manish Modani, 845 Manish Sinha, 421 Md. Shahzeb, 357 Megala, G., 981 Monalisa Sarma, 555 Mrityunjoy Panday, 201

N Najumnissa Jamal, D., 817 Neelam Chandolikar, 19 Neha Lahane, 129 Neha Sharma, 221 Nieuhas, Engelbert, 913 Niha, K., 1005 Nikhil Satish, 569 Nilanjan Sinhababu, 555 Nishanth, P., 37 Nukala Divakar Sai, 605

P Palanisamy, T., 445 Pankaj A. Tiwari, 627 Parikshit Mahalle, 101 Pavan Chittimalli, 479 Pawar, B. V., 269 Pradheep Kumar, K., 749 Pradnya Agrawal, 129 Prajakta Deshpande, 3 Prakash Sharma, 665 Pramod Kanjalkar, 405, 665 Prasad Chinchole, 665 Prince Nagpal, 249 Priyesh Kumar Roy, 445

R Raj Kumar, 319 Rakesh Kumar Chawla, 697 Rakhal Gaitonde, 913 Ramanarayanan, C. P., 91, 143 Ram Bhavsar, 269 Ramesh Chandra Belwal, 393 Ram Prabhakar, K., 59 Reetun Maiti, 175 Rehan Deshmukh, 833 Rutuja Dherange, 3

Rutuja Sangitrao, 855

S Sahay, Sundeep, 897 Saikat Bank, 143 Sajin Kumar, 913 Sandeep M. Chaware, 511 Sandeep Sharma, 941 Santhanam, 445 Santosh Kumar Vishwakarma, 119 Savitri Kulkarni, 431 Sawan Bhattacharyya, 759 Sawan Rai, 393 Seitz, Juergen, 69, 221 Selvi, C., 45 Shailesh Deshpande, 59, 525, 783 Sharmila Sankar, 817 Sharon Priya, S., 817, 1021 Shashikant Ghumbre, 583 Shashwat Shahi, 525 Shifana Rayesha, S. M., 1021 Shinjini Halder, 367 Shirgave, S. K., 863 Shital H. Dinde, 863 Shivani Saxena, 1047 Shripad Bhatlawande, 19 Shyam Sundar, 91 Siddharth Vaddem, 879 Sima Gautam, 445 Sinha, L. K., 153 Sita Yadav, 511 Sodhi, J. S., 697 Sonam Sharma, 479 Sophiya Antony, 489 Srikanth, D., 357 Srushti Chiddarwar, 3 Sudipta Mukhopadhyay, 457 Sudipta Roy, 367 Sumukh Sirmokadam, 479, 525 Sunil Gautam, 685 Sunita Dhavale, 357, 537

Supreet Kaur, 941 Suresh Kumar, 357, 537 Surjadeep Sarkar, 367 Suryanarayana, S. V., 965 Swarnalatha, P., 981 Swati Shilaskar, 19

T Tanuja K. Fegade, 269 Tarun Bhatnagar, 913 Triveni Singh, 697 Tuhinangshu Gangopadhyay, 367

U Ujas Italia, 783 Umesh Kokate, 101 Upasna Singh, 153

V Vahida Attar, 583 Valadi K. Jayaraman, 161 Varaprasad, G., 37, 331, 383 Venkata Yaswanth Kanduri, 201 Venkatesan, R., 981 Venkateswara Rao, G., 965 Venugopal, K. R., 431 Vigneshwaran, T., 965 Vigneshwar Ramakrishnan, 161 Vijeta, 783 Virendra Kumar, 319 Vivek Saxena, 153

W Winkler, Noah, 221

Y Yashaswa Verma, 249