Intelligent Sustainable Systems: Selected Papers of WorldS4 2021, Volume 2 (Lecture Notes in Networks and Systems, 334) 9811663688, 9789811663680

This book provides insights from the World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4 2021), which was held digitally on Zoom during 29–30 July 2021.


English, 862 pages, 2021


Table of contents:
Preface
Contents
Editors and Contributors
Wall-Distance Measurement for Indoor Mobile Robots
1 Introduction
2 Literature Review
3 Methodology
3.1 Image Acquisition and Resizing
3.2 Edge Detection
3.3 BRISK Key-Points Extraction
3.4 Pixel Counting and Distance Calculation
3.5 Characterization
4 Results and Discussion
5 Conclusion
References
Automating Cognitive Modelling Considering Non-Formalisable Semantics
1 Introduction
2 Non-formalisable Cognitive Semantics
3 Uncaused Decisions
4 Quantum and Wave Semantics
5 Automation of Cognitive Modelling
6 Discussion
7 Results and Conclusions
References
Using a Humanoid Robot to Assist Post-stroke Patients with Standardized Neurorehabilitation Therapy
1 Introduction
2 Training Tasks
2.1 Arm Basis Training (ABT)
2.2 Mirror Training
3 Robot-Supported Training Tasks
3.1 Model of a Patient
3.2 Model of a Helper
4 Related Work
5 Summary and Outlook
References
Human Resource Information System in Healthcare Organizations
1 Introduction
2 What Is Human Resources Information System
3 Roles of Human Resources Information System and its Importance in Healthcare
4 Functional Attributes
4.1 Employee Database
4.2 Employee Life Cycle [12]
4.3 Hiring Process
4.4 Payroll System
5 Non-functional Attributes
6 Challenges of Human Resources Information System
7 Literature Review
8 Methodology
9 Data Analysis and Discussion
9.1 Overview of Respondents Analysis
9.2 Factors Measurement
9.3 Correlation Coefficient Analysis
9.4 Descriptive Statistics Analysis
9.5 Discussion
10 Limitations
11 Recommendations
12 Future of Human Resource Information System in Healthcare
13 Conclusion
References
Performance Prediction of Scalable Multi-agent Systems Using Parallel Theatre
1 Introduction
2 Background
2.1 An Overview of Parallel Theatre
2.2 Basic Concepts of Minority Game
2.3 Fundamental Behaviour of MG
2.4 A Genetic Extension of MG
3 A Formal Actor Model of the Minority Game
3.1 Checking Model Correctness
3.2 Performance Checks
4 Transforming the MG Model to Parallel Theatre
5 Performance Prediction of MG Models
6 Experimental Results of Parallel Execution
7 Conclusions
References
Dynamics of Epidemic Computer Subnetwork Models for Scan-Based Worm Propagation: An Internet Protocol Addressing Configuration Perspective
1 Introduction
2 Related Works
3 Methodology
4 Epidemic Models and Numerical Simulation
4.1 The SEI1I2R1R2 Model with Mass Action Incidence
4.2 The XYAZ Model
4.3 The SIR Model with Immunity Divisions and Standard Incidence
5 Conclusion
References
Security Analysis of Integrated Clinical Environment Using Attack Graph
1 Introduction
2 Integrated Clinical Environment (ICE)
2.1 ICE Topology
2.2 Formal Description of ICE
3 Attack Graph Generation
4 Conclusions
References
Smart University: Key Factors for a Cloud Computing Adoption Model
1 Introduction
2 Method
3 Proposed Model of Adoption of Cloud Computing
4 Conclusions
References
Agile Governance Supported by the Frugal Smart City
1 Introduction
2 Research Goal and Methodology
3 Post-crisis Participatory Governance Model
3.1 Prediction of Future Data—Casablanca
4 Discussion
5 Conclusion
References
Effect of Normal, Calcium Chloride Integral and Polyethene Sheet Membrane Curing on the Strength Characteristics of Glass Fiber and Normal Concrete
1 Introduction
1.1 Curing
1.2 Glass Fiber Concrete
2 Materials and Methods
2.1 Cement
2.2 Fine Aggregate
2.3 Coarse Aggregate
2.4 Mix Ratio
2.5 Calcium Chloride
2.6 Glass Fiber
2.7 Compression Strength Test
2.8 Tensile Strength Test
3 Results and Discussions
3.1 Glass Fiber Concrete
3.2 Calcium Chloride Integral Curing
3.3 Polythene Membrane Curing
3.4 Normal Concrete Under Different Curing Conditions
3.5 Glass Fiber Concrete Under Different Curing Conditions
3.6 Strength Characteristics of Normal Concrete and GFRC
4 Conclusions
References
Problems with Health Information Systems in Ecuador, and the Need to Educate University Students in Health Informatics in Times of Pandemic
1 Introduction
2 Method
3 Results
3.1 Participants
3.2 Years of Study
3.3 Health Informatics Research
3.4 Use of Information Technology by University Students
3.5 Knowledge in Health Informatics to be Received
4 Discussion
References
Utilizing Technological Pedagogical Content (TPC) for Designing Public Service Websites
1 Introduction
2 Theoretical Framework of TPCD
2.1 Pedagogical Content (PC)
2.2 Technological Content (TC)
2.3 Technological Pedagogy (TP) and Pedagogical Technology (PT)
3 From Theory to Action
3.1 Defining Objective Goals of Website
3.2 Selecting and Developing Materials
3.3 Inserting Content on Webpages
4 Conclusions
References
A Multidimensional Rendering of Error Types in Sensor Data
1 Introduction
2 Classification of Errors in Sensor Data
3 A Multidimensional Model of Errors in Sensor Data
4 Conclusion
References
Optimization of the Overlap Shortest-Path Routing for TSCH Networks
1 Introduction
2 System Model
2.1 Network Model
2.2 Flow Model
3 Problem Formulation
4 Epsilon Greedy Heuristic Optimization Method
5 Tests Scenarios
6 Results
7 Conclusions and Future Work
References
Application of Emerging Technologies in Aviation MRO Sector to Optimize Cost Utilization: The Indian Case
1 Introduction
1.1 Technology Penetration in the Aviation Sector
1.2 Current Status-Quo of Footprint of Technology in India
1.3 Sustainability
1.4 Data and Supply Chain Maintenance
1.5 Impact of Emerging Technology in MRO
1.6 Challenges
2 Research Design
2.1 Objectives
2.2 Hypotheses Declaration
2.3 Research Methodology
3 Description
4 Research Outcome
4.1 Cost Factor in Technology Implementation
4.2 AI and IoT (Emerging Technologies) in Indian MRO Industry and Global Aviation MRO Industry
4.3 Mental Bias of Age Group Towards AI and IoT in Aviation Sector
5 Conclusion
6 Recommendations
References
User Evaluation of a Virtual Reality Application for Safety Training in Railway Level Crossing
1 Introduction
2 Literature Review
3 Research Methodology
3.1 Participants
3.2 Overview of VR Training Prototype
3.3 Data Collection Process and Questionnaires
4 Results and Discussion
4.1 SUS Questionnaire
4.2 Sense of Presence (SoP)
5 Conclusion
References
GreenMile—Gamification-Supported Mobile and Multimodal Route Planning for a Sustainable Choice of Transport
1 Introduction
2 Related Work
3 GreenMile
4 Evaluation
5 Conclusion and Future Work
References
A Detailed Study for Bankruptcy Prediction by Machine Learning Technique
1 Introduction
2 Background
3 Proposed Model
3.1 Dataset Characteristics
3.2 Feature Engineering
4 Classification
5 Methodology
6 Model Evaluation and Assessment
7 Results
8 Conclusion
References
Design and Development of Tracking System in Communication for Wireless Networking
1 Introduction
2 Literature Review
3 Methodology
4 Proposed System
5 Implementation
6 Result Analysis
6.1 Comparison of Tracking Error of Existing and Proposed System
6.2 Throughput
7 Conclusion
References
Cyberbullying in Online/E-Learning Platforms Based on Social Networks
1 Introduction
1.1 Use of E-learning
2 What Is Cyberbullying
2.1 Cyberbullying Types
3 Cyberbullying Avoidance
3.1 Case Study: Cyberbullying in Social Media Text
3.2 Case Study: Cyberbullying Using Machine Learning
3.3 Case Study: Cyberbullying Using Social and Textual Analysis
4 Conclusion
References
Machine Learning and Remote Sensing Technique for Urbanization Change Detection in Tangail District
1 Introduction
2 Related Works
3 Data Acquisition and Methodology
3.1 Area of Study
3.2 Data Acquisition and Processing
3.3 Methodology
3.4 Training and Classification
4 Result and Analysis
5 Conclusion and Future Works
References
Enabling a Question-Answering System for COVID Using a Hybrid Approach Based on Wikipedia and Q/A Pairs
1 Introduction
2 Question-Answering Systems
2.1 Main Approaches
2.2 Question-Answering Approaches for COVID
3 Design of the QA System for COVID
3.1 Overview of the Architecture
3.2 Implementation of a Proof of Concept
4 Preliminary Evaluation
4.1 Performance of the Query Processor
4.2 Overall Qualitative Evaluation
5 Conclusion and Future Work
References
A Study of Purchase Behavior of Ornamental Gold Consumption
1 Introduction
2 Literature Review
3 Objectives
4 Research Methodology
5 Data Analysis and Findings
6 Results of Hypothesis Testing
7 Conclusion
References
Circularly Polarized Microstrip Patch Antenna for 5G Applications
1 Introduction
2 Antenna Design and Geometry
2.1 Workflow of Proposed Antenna
2.2 Antenna Geometry
3 Results and Analysis
4 Conclusion
References
An Approach Towards Protecting Tribal Lands Through ICT Interventions
1 Introduction
1.1 Role of ICT
2 E-Governance Initiatives Towards Land Protection
2.1 Strengthening of Revenue Administration and Updating of Land Records Scheme
2.2 Computerization of Land Records (CLR) Scheme
2.3 National Land Records Modernization Programme
3 Proposed Tribal Land Information Management System
4 Discussion and Conclusion
References
Urban Sprawl Assessment Using Remote Sensing and GIS Techniques: A Case Study of Ernakulam District
1 Introduction
2 Data Requirements
2.1 Study Location
2.2 Data Sources
2.3 Image Preprocessing
2.4 Normalized Difference Vegetation Index (NDVI)
2.5 Normalized Difference Built-Up Index (NDBI)
2.6 The Land Surface Temperature (LST)
2.7 Land Use/Land Cover (LULC)—Supervised Classification
2.8 Accuracy Assessment
3 Results and Discussion
4 Conclusion
References
Exploring the Means and Benefits of Including Blockchain Smart Contracts to a Smart Manufacturing Environment: Water Bottling Plant Case Study
1 Introduction
2 Background
2.1 Industry 4.0
2.2 Blockchain
2.3 Smart Contracts
3 Methodology
4 Conclusion and Future Work
References
Extreme Gradient Boosting for Predicting Stock Price Direction in Context of Indian Equity Markets
1 Introduction
2 Methods for Data Collection and Pre-processing for Implementation of Xgboost
2.1 Stock Selection
2.2 Dataset
2.3 Feature Extraction
2.4 Feature Pre-processing and Co-relation Mapping
3 Stock Price Direction Prediction Model Implementation Using Xgboost
3.1 High Level Architectural Diagram
3.2 Algorithm
4 Experimental Results
4.1 Metrics
4.2 Results and Comparison
5 Conclusions
References
Performance of Grid-Connected Photovoltaic and Its Impact: A Review
1 Introduction
2 Centralized and Decentralized GCPV System
2.1 Centralized Grid-Connected Photovoltaic System
2.2 Decentralized Grid-Connected System
3 Generations of Photovoltaic Cells
4 Sizing of GCPV System
5 Economical Aspects
6 Modelling and Simulation
6.1 Models of Grid-Connected Photovoltaic Systems
6.2 Real-Time Simulation
7 Performance Analysis
7.1 Faults
7.2 Synchronization
8 Conclusion
References
A Novel Approach of Deduplication on Indian Demographic Variation for Large Structured Data
1 Introduction
2 Literature Survey and Gap Analysis
2.1 Literature Survey
2.2 Gap Analysis
3 Methodology
3.1 Approach for Customization of Deepmatcher
3.2 Blocking
3.3 Entity Matching
3.4 Deduplication
4 Results
5 Conclusion and Future Work
References
Dual-Message Compression with Variable Null Symbol Incorporation on Constrained Optimization-Based Multipath and Multihop Routing in WSN
1 Introduction
2 Related Works
3 Effect of Source Coding on Transmission Energy
4 Network Model and Optimization Criterion
4.1 Network Model Description and Packet Distribution Policy
4.2 Energy Calculation
4.3 Solving Optimization Problem
5 Employ Source Coding Scheme
6 Effects of Device Characteristics
7 Result Analysis
7.1 Theoretical Energy Savings
7.2 Simulation on Real-Life Sensor Data
8 Conclusion
References
Spatio-temporal Variances of COVID-19 Active Cases and Genomic Sequence Data in India
1 Introduction
1.1 Spatio-temporal Data of COVID-19
1.2 Spatio-temporal Analysis of SARS-CoV-2 Genome Sequencing Data
2 Spatio-temporal Clustering for Epidemiological Analysis
2.1 ST Variations in India During the Second Wave of COVID-19
2.2 ST Clustering Helps to Understand COVID-19 Hotspots
2.3 ST Clustering Algorithms
2.4 Event Clustering for Disease Clusters
3 Spatio-temporal Variances of COVID-19 in India
3.1 Active Cases Have Seen Much More Clusters in the Second Wave
3.2 ST Clusters of Genomic Sequences in the Second Wave Are More but Lack Spatial Coverage
4 Comparing ST Clusters of Case Data with Genomic Data
4.1 Metrics for Measuring Similarity of ST Clusters
5 Conclusions
References
Significance of Thermoelectric Cooler Approach of Atmospheric Water Generator for Solving Fresh Water Scarcity
1 Introduction
2 Working Concept of TEC
3 Related Work on TEC Based AWG
4 Proposal for TEC Based AWG
5 Conclusion
References
A Comprehensive Analysis of Testing Efforts Using the Avisar Testing Tool for Object Oriented Softwares
1 Introduction
2 Related Work
3 AVISAR for Estimating the Testing Effort
4 Interfaces and Exploratory Setup of AVISAR
5 Results of Experiments
6 Conclusions
References
Fabrication of Energy Potent Data Centre Using Energy Efficiency Metrics
1 Introduction
2 Metrics Framework
3 Power Usage Effectiveness (PUE)
4 Proposed Work and Discussions
5 Conclusion
References
Performance Anomaly and Change Point Detection for Large-Scale System Management
1 Introduction
2 Statistical Exception and Trend Detection System (SETDS)
3 Anomaly Detection by Neural Network
4 Entropy-Based Method of Anomaly Detection
References
Artificial Intelligence Driven Monitoring, Prediction and Recommendation System (AIM-PRISM)
1 Introduction
2 Historical Background
3 Problem Definition
3.1 Limitations of Existing Solutions
3.2 Scope of AIM-PRISM System
4 Proposed Solution
4.1 Monitoring, Prediction and Recommendation Approach
4.2 Leveraged Advanced AI/ML and Associated Techniques
5 System Architecture
6 Advantages of Using AIM-PRISM
7 Results and Comparison with Benchmarks
8 Conclusion and Future Directions
References
Financial Forecasting of Stock Market Using Sentiment Analysis and Data Analytics
1 Introduction
2 Related Work
3 Proposed Model
3.1 Data Collection
3.2 Data Preparation
3.3 Classical Machine Learning Techniques
3.4 Deep Learning Techniques
4 Results
5 Conclusion
References
A Survey on Learning-Based Gait Recognition for Human Authentication in Smart Cities
1 Introduction
2 Background
3 Literature Review
4 Conclusion and Future Scope
References
Micro-arterial Flow Simulation for Fluid Dynamics: A Review
1 Introduction
2 Background of Small Vessel Disease
3 Evolution of Hemodynamics
4 Problem Statement and Objectives
5 Methodology
5.1 3D Model Reconstruction
5.2 Initial and Boundary Conditions
5.3 Analyzing the Model in CFD
6 Results and Discussions
7 Conclusion
References
Hip-Hop Culture Incites Criminal Behavior: A Deep Learning Study
1 Introduction
2 Methodology
3 Song Lyrics Dataset
4 Models
4.1 Support Vector Machine
4.2 Random Forest Classifier
4.3 Multinomial Naive Bayes
4.4 Long Short-Term Memory (LSTM)
5 Model Evaluation
6 LSTM Application and Outputs
7 Visualizations
8 Crime Data
9 Results
10 Future Scope
11 Conclusion
References
Complex Contourlet Transform Domain Based Image Compression
1 Introduction
1.1 Methods to Perform Lossy Compression
2 Contourlet Family-Based Transform Method
3 Methods Used
3.1 Quantization
3.2 Arithmetic Encoding Technique
4 Conclusion
References
HWYL: An Edutainment Based Mobile Phone Game Designed to Raise Awareness on Environmental Management
1 Introduction
2 Related Studies
3 Research Methodologies
4 Project Development
5 Results and Discussions
6 Conclusion and Recommendation
References
IoT and AI Based Advance LPG System (ALS)
1 Introduction
2 Related Work
3 Components
4 System Operation
4.1 Distributor and Customer/User Interface
4.2 Automatic Gas Booking System
4.3 Gas Leakage Detection System
4.4 Smart Gas Stove
5 Comparison
6 Results
7 Conclusion
References
ICT-Enabled Automatic Vehicle Theft Detection System at Toll Plaza
1 Introduction
1.1 Vehicle Theft Statistics in India
1.2 Current FIR Process for Vehicle Theft Detection
1.3 Current Vehicle Theft Recovery Process
1.4 Limitations of the Available Vehicle Theft Detection Systems
2 ICT-Enabled Vehicle Theft Detection System at Toll Plaza (Proposed Concept)
2.1 Elements of New Toll System
2.2 Working Mechanism
2.3 Flow Chart
3 Pros and Cons
4 Conclusion
References
RBJ20 Cryptography Algorithm for Securing Big Data Communication Using Wireless Networks
1 Introduction
2 Past Work
3 Methodology
3.1 RBJ20 Algorithm
3.2 RBJ20 Algorithm Encryption
3.3 Working of RBJ Decryption Algorithm
4 Comparison
5 Conclusion
References
Decision Tree for Uncertain Numerical Data Using Bagging and Boosting
1 Introduction
2 Research Objectives
3 Boosting and Bagging
4 Uncertain Data
5 Related Work
6 Problem Formulation
7 Research Methodology
8 Expected Outcome
9 Conclusion
References
A Comparative Study on Various Sharing Among Undergraduates in PreCovid-19 and Covid-19 Period Using Network Parameters
1 Introduction
1.1 Objective of the Study
2 Social Network Data
3 Results and Discussion
3.1 Intellectual Sharing, Financial Sharing, Emotional Sharing and Information Sharing in PreCovid-19 Period
3.2 Intellectual Sharing, Financial Sharing, Emotional Sharing and Information Sharing During the Period of Covid-19
3.3 A Comparative Study on Various Sharing Among Undergraduates in PreCovid-19 and Covid-19 Period Using Network Parameters
4 Conclusion
References
Adoption of Smart Agriculture Using IOT: A Solution for Optimal Soil Decision Tree Making
1 Introduction
2 Methodology
3 Related Work
3.1 Soil Parameters
3.2 Soil Report Analysis
4 Implementation
5 Results and Discussions
6 Conclusion
References
ThreatHawk: A Threat Intelligence Platform
1 Introduction
2 Related Work
3 Proposed Approach
4 Technologies Used
5 Experimental Severity
6 Conclusion
References
Network Slicing in Software-Defined Networks for Resource Optimization
1 Introduction
2 Related Work
2.1 Challenges
3 Proposed Model
3.1 Methodology
4 Implementation
4.1 Simulation Environment Setup
5 Result Analysis
6 Conclusion
References
A Tensor-Based Submodule Clustering for 2D Images Using l12-Induced Tensor Nuclear Norm Minimization
1 Introduction
2 Related Works
3 Proposed Method
3.1 Proposed l12-induced TNN Deduced Using Half Thresholding
3.2 Solution
4 Results and Discussions
5 Conclusions
References
A CNN-LSTM Approach for Classification of Major TCP Congestion Control Algorithms
1 Introduction
2 Related Work
3 Proposed Hybrid Classification Approach
3.1 Step 1: Generation of Congestion Window Value Files
3.2 Step 2: Normalization of Data
3.3 Step 3: Classifiers in the Proposed Approach
4 Simulation and Performance Analysis
4.1 Estimation of cwnd from Packet Traces
4.2 Normalization
4.3 Neural Network Classifiers
5 Conclusion
References
A Novel Approach for Detection of Intracranial Tumor Using Image Segmentation Based on Cellular Automata
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Image Acquisition and Image Processing
3.2 Growing Regions with Modified Texture
3.3 CA Based Edge Detection
3.4 Algorithm Used for Image Segmentation
4 Experimental Results
5 Conclusions
References
Latency Evaluation in an IoT-Fog Model
1 Introduction
2 Background
2.1 Cloud Computing
2.2 Fog Computing
2.3 Internet of Things
3 Role of Latency in IoT-Fog-Cloud Model
4 Related Work
5 Discussion
6 Conclusion and Future Work
References
Effective Online Tools for Teaching Java Programming Course on an Online Platform
1 Introduction
2 Related Work
3 Online Tools
3.1 JDoodle
3.2 Codeshare
3.3 Poll Everywhere
4 Result Analysis
5 Conclusion
References
Interactive Learning Application for Indian Sign Language
1 Introduction
2 Literature Survey
3 Proposed Method
3.1 Architecture
3.2 Algorithm
4 Test Result and Performance Evaluation
5 Challenges
6 Future Work
7 Conclusion
References
Impact of IT Leadership on Transformation of the Role of IT in Improving Individual Tax Return Reporting Compliance at the Directorate General of Taxes
1 Introduction
2 Literature Studies
2.1 Taxpayer Compliance
2.2 The Role of IT in an Organization
2.3 IT Leadership Approaches to the Role of IT
3 Research Methodology
3.1 Research Data Collection Procedure and Instrument
3.2 The Research Methodology Stage
4 Result and Discussion
4.1 Changes in the Role of Information Technology
4.2 Information Technology Leadership
5 Conclusion and Suggestions
References
Self-driving Car Using Raspberry Pi 3 B+ and Pi Camera
1 Introduction
2 Methodology
3 Learning
4 Automation
5 Hardware Implementation
5.1 Raspberry Pi
5.2 Pi Camera
5.3 L298N Motor Driver
6 Results
7 Conclusion
8 Future Scope
References
Energy-Efficient Cluster Formation for Wireless Sensor Networks Using Fuzzy Logic
1 Introduction
2 Related Work
3 System Model
3.1 Fuzzy Logic Model
4 Result
4.1 Sample Network 1
5 Conclusions
References
Hierarchical Cluster-Based Model to Evaluate Accuracy Metrics Based on Cluster Efficiency
1 Introduction
2 Clustering Concept
2.1 Agglomerative Hierarchical Clustering Mechanism
3 Quality Analysis
3.1 Elapsed Time
3.2 Cohesion Measurement
3.3 Silhouette Index
4 Experimental Analysis
5 Methodology and Design
6 Conclusion
References
Digital Audio Watermarking: Techniques, Applications, and Challenges
1 Introduction
2 Important Properties of Digital Audio Watermarking
3 Watermarking Procedure [2, 4, 9–11]
3.1 Watermark Embedding
3.2 Watermark Extraction
4 Applications of Audio Watermarking [3, 4, 9, 11, 14]
5 Possible Attacks on Audio Watermarking [3–5, 11, 16]
6 Different Techniques Used for Audio Watermarking [3–5, 9, 17]
6.1 Time Domain
6.2 Transform Domain
6.3 Hybrid Techniques
7 Performance Evaluation Parameters [4, 6, 14, 17]
7.1 Peak Signal-to-Noise Ratio (PSNR) [1, 4]
7.2 Bit Error Rate (BER) [4, 6, 12]
7.3 Mean Squared Error (MSE) [1, 4, 14]
8 Conclusion
References
Transfer Learning for Handwritten Character Recognition
1 Introduction
2 Related Work
3 Preparation of Dataset
4 Architecture on Pre-trained Networks
4.1 AlexNet
4.2 Vgg Net
4.3 GoogLeNet
4.4 Residual Network (ResNet)
4.5 Experimental Results and Analysis
5 Conclusion and Future Work
References
IoT-Based Smart Solar Monitoring System
1 Introduction
2 Literature Review
3 Methodology
3.1 Solar Monitoring System
3.2 Sun Position Tracking
4 Applications
5 Conclusion
References
Conglomerate Elgamal Encryption-Based Data Aggregation and Knapsack-Based Energy Efficient Cryptosystem in Wireless Sensor Network
1 Introduction
2 Review of Literature
3 Proposed Methodology
3.1 Data Aggregation-Based Conglomerate ElGamal Encryption
4 Performance Analysis
5 Conclusion
References
Optimization of Resource and Energy Utilization in Device-to-Device Communication Under Cellular Network
1 Introduction
2 Related Work
3 Proposed Methodology
4 Experimental Analysis
5 Conclusion and Future Work
References
Towards an Integrated Conceptual Model for Open Government Data in Saudi Arabia
1 Introduction
2 Literature Review
2.1 Open Government Data Adoption
2.2 Open Government Data Benefits
2.3 Barriers of Open Government Data
2.4 Theories of Technology Adoption
3 A Review of Proposed Models for OGD Adoption
4 The Proposed Model
4.1 The Category of Suppliers Side
4.2 The category of demand side
5 Methodology
6 Results
6.1 Data Governance
6.2 Data Integration
7 Conclusion
References
IoT-Based Horticulture Monitoring System
1 Introduction
2 Research Method
2.1 Design of the System
2.2 Electronic Components
2.3 Networking
3 Testing, Results and Discussion
3.1 Testing
3.2 Firebase and Android Application
3.3 MATLAB ThingSpeak
4 Conclusions and Future Work
References
A Comprehensive Review on the Role of PMU in Managing Blackouts
1 Introduction
2 State of the Art PMU Methods for Stability Evaluation
2.1 Review of Literature
3 Scope of PMU Applications in Network Situation Awareness
4 Blackouts: Causes and Control Actions
4.1 Causes for Blackouts
4.2 PMUs—Selections of Control Actions to Avoid Blackouts
5 Conclusions
References
Secure COVID-19 Treatment with Blockchain and IoT-Based Framework
1 Introduction
1.1 Background
1.2 Motivations
1.3 Covid Remedy Technique
1.4 Outline
2 Literature Review
3 Blockchain with COVID-19
4 Blockchain with COVID-19
4.1 Pandemic Tracking
4.2 User Confidentiality
4.3 Safe Procedure
4.4 Medical Ledger
4.5 Donation Record
5 IoT and Blockchain with COVID-19
5.1 Proposed Layered Architecture
5.2 Complete Healthcare Framework with IoT and Blockchain
5.3 Securing Patients Buy Insurance with a Proposed Framework
6 Conclusion
References
Design of a CPW-Fed Microstrip Elliptical Patch UWB Range Antenna for 5G Communication Application
1 Introduction
2 Theories of Elliptically Patch Microstrip Antenna
3 Selection of Substrate
4 CPW Theory
5 Proposed Antenna Structure
6 Simulation Results and Discussion
7 Conclusion
References
A Review on Consensus Protocol of Blockchain Technology
1 Introduction
2 Literature Review
3 Consensus Algorithm
3.1 Proof of Work (PoW)
3.2 Proof of Stake (PoS)
3.3 Delegated Proof of Stake (DPoS)
3.4 Proof of Elapsed Time (PoET)
3.5 Practical Byzantine Fault Tolerance (PBFT)
3.6 Proof of Activity (PoA)
4 Comparative Investigation of Consensus Algorithm
4.1 Basic Characteristics
4.2 Transaction Metrics
4.3 Network Aspect
4.4 Performance Metrics
4.5 Security Aspect
4.6 Block Aspect
5 Merits and Demerits
6 Conclusion
References
Retraction Note to: Optimization of the Overlap Shortest-Path Routing for TSCH Networks
Retraction Note to: Chapter “Optimization of the Overlap Shortest-Path Routing for TSCH Networks” in: A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_14
Author Index

Lecture Notes in Networks and Systems 334

Atulya K. Nagar · Dharm Singh Jat · Gabriela Marín-Raventós · Durgesh Kumar Mishra, Editors

Intelligent Sustainable Systems Selected Papers of WorldS4 2021, Volume 2

Lecture Notes in Networks and Systems Volume 334

Series Editor

Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors

Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and others. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

More information about this series at https://link.springer.com/bookseries/15179

Atulya K. Nagar · Dharm Singh Jat · Gabriela Marín-Raventós · Durgesh Kumar Mishra Editors

Intelligent Sustainable Systems Selected Papers of WorldS4 2021, Volume 2

Editors

Atulya K. Nagar, School of Mathematics, Computer Science, and Engineering, Liverpool Hope University, Liverpool, UK
Dharm Singh Jat, Namibia University of Science and Technology, Windhoek, Namibia
Gabriela Marín-Raventós, University of Costa Rica, Curridabat, San Jose, Costa Rica
Durgesh Kumar Mishra, Sri Aurobindo Institute of Technology, Indore, Madhya Pradesh, India

ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-16-6368-0 ISBN 978-981-16-6369-7 (eBook) https://doi.org/10.1007/978-981-16-6369-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022, corrected publication 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

The Fifth Edition of WorldS4, the World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4 2021), was held digitally on Zoom during 29–30 July 2021 and organised by the Global Knowledge Research Foundation, with Springer, InterYIT IFIP, and the IEEE UK and Ireland Section as associated partners. The conference provided a useful and wide platform both for the display of the latest research and for the exchange of research results and thoughts. Its participants came from almost every part of the world, with backgrounds in either academia or industry, allowing a real multinational and multicultural exchange of experiences and ideas. A pool of more than 870 papers was received for this conference from across 28 countries, of which around 144 were accepted for this Springer series and presented on a digital platform over the two days; owing to the overwhelming response, many papers had to be dropped on grounds of quality. A total of 28 technical sessions were organised in parallel across the 2 days, along with a few keynotes and panel discussions. The conference involved deep discussion of issues intended to be solved at the global level: new technologies were proposed, experiences were shared, and future solutions for the enhancement of systems and security were discussed. The final papers are published in two volumes of the proceedings in the Springer LNNS series.

Over the years, this conference has been organised and conceptualised with the collective efforts of a large number of individuals. I would like to thank each of the committee members and the reviewers for their excellent work in reviewing the papers. Grateful acknowledgements are also extended to the team of the Global Knowledge Research Foundation for their valuable efforts and support.


I look forward to welcoming you at the 6th Edition of the WorldS4 Conference in 2022.

Atulya K. Nagar, Liverpool, UK
Dharm Singh Jat, Windhoek, Namibia
Gabriela Marín-Raventós, Curridabat, Costa Rica
Durgesh Kumar Mishra, Indore, India

Contents

Wall-Distance Measurement for Indoor Mobile Robots . . . 1
Nadeem Zain, Nadeem Hamza, Khan Jameel Ahmed, and Ullah Anayat

Automating Cognitive Modelling Considering Non-Formalisable Semantics . . . 11
Alexander Raikov

Using a Humanoid Robot to Assist Post-stroke Patients with Standardized Neurorehabilitation Therapy . . . 19
Peter Forbrig, Alexandru Bundea, Ann Pedersen, and Thomas Platz

Human Resource Information System in Healthcare Organizations . . . 29
Hawraa Aref Al-Mutawa and Paul Manuel

Performance Prediction of Scalable Multi-agent Systems Using Parallel Theatre . . . 45
Franco Cicirelli and Libero Nigro

Dynamics of Epidemic Computer Subnetwork Models for Scan-Based Worm Propagation: An Internet Protocol Addressing Configuration Perspective . . . 65
ChukwuNonso H. Nwokoye, Roseline U. Paul, Ikechukwu I. Umeh, and Vincent S. O. Okeke

Security Analysis of Integrated Clinical Environment Using Attack Graph . . . 75
Mariam Ibrahim, Haneen Okasha, and Ruba Elhafiz

Smart University: Key Factors for a Cloud Computing Adoption Model . . . 85
Dewar Rico-Bautista, César D. Guerrero, César A. Collazos, Gina Maestre-Gongora, María Camila Sánchez-Velásquez, Yurley Medina-Cárdenas, and Jose Swaminathan

Agile Governance Supported by the Frugal Smart City . . . 95
Adnane Founoun, Aawatif Hayar, Kabil Essefar, and Abdelkrim Haqiq

Effect of Normal, Calcium Chloride Integral and Polyethene Sheet Membrane Curing on the Strength Characteristics of Glass Fiber and Normal Concrete . . . 107
Jaison Joy Memadam and T. V. S. Varalaksmi

Problems with Health Information Systems in Ecuador, and the Need to Educate University Students in Health Informatics in Times of Pandemic . . . 119
Gustavo Caiza, Fernando Ibarra-Torres, Marcelo V. Garcia, and Valeria Barona-Pico

Utilizing Technological Pedagogical Content (TPC) for Designing Public Service Websites . . . 129
Zahra Hosseini, Jani Kinnunen, and Kimmo Hytönen

A Multidimensional Rendering of Error Types in Sensor Data . . . 139
Zlatinka Kovacheva, Ina Naydenova, and Kalinka Kaloyanova

RETRACTED CHAPTER: Optimization of the Overlap Shortest-Path Routing for TSCH Networks . . . 151
Javier Caceres and Marcelo V. Garcia

Application of Emerging Technologies in Aviation MRO Sector to Optimize Cost Utilization: The Indian Case . . . 161
Chandravadan Goritiyal, Aditi Bairolu, and Laxmi Goritiyal

User Evaluation of a Virtual Reality Application for Safety Training in Railway Level Crossing . . . 177
Oche A. Egaji, Ikram Asghar, Luke Dando, Mark G. Griffiths, and Emma Dymond

GreenMile—Gamification-Supported Mobile and Multimodal Route Planning for a Sustainable Choice of Transport . . . 191
Robin Horst, Timon Fuß, and Ralf Dörner

A Detailed Study for Bankruptcy Prediction by Machine Learning Technique . . . 201
Suriya Begum

Design and Development of Tracking System in Communication for Wireless Networking . . . 215
Kamal Upreti, Vinod Kumar, Dharmendra Pal, Mohammad Shabbir Alam, and A. K. Sharma

Cyberbullying in Online/E-Learning Platforms Based on Social Networks . . . 227
N. Balaji, B. H. Karthik Pai, Kotari Manjunath, Bhat Venkatesh, N. Bhavatarini, and B. K. Sreenidhi

Machine Learning and Remote Sensing Technique for Urbanization Change Detection in Tangail District . . . 241
Ananna Talukder, Sadia Mahbub Mim, Sabrina Ahmed, Muhammad Syed, and Rashedur M. Rahman

Enabling a Question-Answering System for COVID Using a Hybrid Approach Based on Wikipedia and Q/A Pairs . . . 251
Janneth Chicaiza and Nadjet Bouayad-Agha

A Study of Purchase Behavior of Ornamental Gold Consumption . . . 263
Shalini Kakkar and Pradnya V. Chitrao

Circularly Polarized Microstrip Patch Antenna for 5G Applications . . . 273
Sanjeev Kumar, A. Veekshita Sai Choudhary, Aditya Andotra, Himshweta Chauhan, and Anshika Mathur

An Approach Towards Protecting Tribal Lands Through ICT Interventions . . . 283
K. Rajeshwar and Sonal Mobar Roy

Urban Sprawl Assessment Using Remote Sensing and GIS Techniques: A Case Study of Ernakulam District . . . 293
Sreya Radhakrishnan and P. Geetha

Exploring the Means and Benefits of Including Blockchain Smart Contracts to a Smart Manufacturing Environment: Water Bottling Plant Case Study . . . 309
O. L. Mokalusi, R. B. Kuriakose, and H. J. Vermaak

Extreme Gradient Boosting for Predicting Stock Price Direction in Context of Indian Equity Markets . . . 321
Sachin Jadhav, Vrushal Chaudhari, Pratik Barhate, Kunal Deshmukh, and Tarun Agrawal

Performance of Grid-Connected Photovoltaic and Its Impact: A Review . . . 331
Sanjiba Kumar Bisoyi, Sudhanshu Maurya, Suyash Binod, Aayush Srivastava, and Abhishek Baluni

A Novel Approach of Deduplication on Indian Demographic Variation for Large Structured Data . . . 343
Krishnanjan Bhattacharjee, Chahat Garg, S. Shivakarthik, Swati Mehta, Ajai Kumar, Shonil Bhide, Kshitija Kulkarni, Shivank Ratnaparkhi, Khushboo Agarwal, and Varsha Naik

Dual-Message Compression with Variable Null Symbol Incorporation on Constrained Optimization-Based Multipath and Multihop Routing in WSN . . . 357
Pratham Majumder, Tuhin Majumder, Punyasha Chatterjee, and Sunrose Shrestha

Spatio-temporal Variances of COVID-19 Active Cases and Genomic Sequence Data in India . . . 367
Sumit Sen and Neelam Dabas Sen

Significance of Thermoelectric Cooler Approach of Atmospheric Water Generator for Solving Fresh Water Scarcity . . . 377
B. K. Imtiyaz Ahmed and Abdul Wahid Nasir

A Comprehensive Analysis of Testing Efforts Using the Avisar Testing Tool for Object Oriented Softwares . . . 385
Zunaid Aalam, Satnam Kaur, Prashant Vats, Amandeep Kaur, and Rini Saxena

Fabrication of Energy Potent Data Centre Using Energy Efficiency Metrics . . . 395
Subhodip Mukherjee, Debabrata Sarddar, Rajesh Bose, and Sandip Roy

Performance Anomaly and Change Point Detection for Large-Scale System Management . . . 403
Igor Trubin

Artificial Intelligence Driven Monitoring, Prediction and Recommendation System (AIM-PRISM) . . . 409
Sanjeev Manchanda

Financial Forecasting of Stock Market Using Sentiment Analysis and Data Analytics . . . 423
Dipashree Patil, Shivani Patil, Shreya Patil, and Sandhya Arora

A Survey on Learning-Based Gait Recognition for Human Authentication in Smart Cities . . . 431
Arindam Singh and Rajendra Kumar Dwivedi

Micro-arterial Flow Simulation for Fluid Dynamics: A Review . . . 439
Rithusravya Jakka and Sathwik Rao Alladi

Hip-Hop Culture Incites Criminal Behavior: A Deep Learning Study . . . 453
Niharika Abhange, Rahul Jadhav, Siddhant Deshpande, Swarad Gat, Varsha Naik, and Saishashank Konduri

Complex Contourlet Transform Domain Based Image Compression . . . 467
G. Saranya, G. S. Shrinidhi, and S. Bargavi

HWYL: An Edutainment Based Mobile Phone Game Designed to Raise Awareness on Environmental Management . . . 475
Ace C. Lagman, Ma. Corazon F. Raguro, Maria Vicky S. Solomo, Jay-Ar P. Lalata, Marie Luvett I. Goh, and Heintjie N. Vicente

IoT and AI Based Advance LPG System (ALS) . . . 483
Veral Agarwal, Pratap Mishra, Raghu Sharan, Naveen Sharma, and Rachit Patel

ICT-Enabled Automatic Vehicle Theft Detection System at Toll Plaza . . . 491
Kamlesh Kumawat and Vijay Singh Rathore

RBJ20 Cryptography Algorithm for Securing Big Data Communication Using Wireless Networks . . . 499
S. Rajaprakash, N. Jaishanker, Chan Bagath Basha, S. Muhuselvan, A. B. Aswathi, Athira Jayan, and Ginu Sebastian

Decision Tree for Uncertain Numerical Data Using Bagging and Boosting . . . 509
Santosh S. Lomte and Sanket Gunderao Torambekar

A Comparative Study on Various Sharing Among Undergraduates in PreCovid-19 and Covid-19 Period Using Network Parameters . . . 525
V. G. Deepa, S. Aparna Lakshmanan, and V. N. Sreeja

Adoption of Smart Agriculture Using IOT: A Solution for Optimal Soil Decision Tree Making . . . 537
C. Sathish and K. Srinivasan

ThreatHawk: A Threat Intelligence Platform . . . 547
Adhyayan Panwar, Anjali Nair, Ayush Sonthalia, Kavitha Sooda, and Mohan Yelnadu

Network Slicing in Software-Defined Networks for Resource Optimization . . . 555
Meenaxi M. Raikar and S. M. Meena

A Tensor-Based Submodule Clustering for 2D Images Using l12-Induced Tensor Nuclear Norm Minimization . . . 565
Jobin Francis, Akhil Johnson, Baburaj Madathil, and Sudhish N. George

A CNN-LSTM Approach for Classification of Major TCP Congestion Control Algorithms . . . 579
B. Nithya, V. Venkataraman, D. V. Nithin Balaaji, and Chandra Chud

A Novel Approach for Detection of Intracranial Tumor Using Image Segmentation Based on Cellular Automata . . . 593
Prashant Vats, Sandeep Singh, Seema Barda, Zunaid Aalam, and Manju Mandot

Latency Evaluation in an IoT-Fog Model . . . 605
Eram Fatima Siddiqui, Sandeep Kumar Nayak, and Mohd. Faisal

Effective Online Tools for Teaching Java Programming Course on an Online Platform . . . 615
Sapana S. Barphe, Varsha T. Lokare, Sanjay R. Sutar, and Arvind W. Kiwelekar

Interactive Learning Application for Indian Sign Language . . . 623
Pratiksha Sancheti, Nayan Sabnis, Keshav Kadam, Yash Rode, and Pujashree Vidap

Impact of IT Leadership on Transformation of the Role of IT in Improving Individual Tax Return Reporting Compliance at the Directorate General of Taxes . . . 633
Septi Ariani, R. A. Aliya Rahmawati Wahab, Yudhi Dwi Fajar Maulana, Fitri Wijayanti, Mainar Swari Mahardika, and Muhammad Rifki Shihab

Self-driving Car Using Raspberry Pi 3 B+ and Pi Camera . . . 649
D. Rakesh Reddy, D. Akshith Reddy, and Mahammad Eliyaz

Energy-Efficient Cluster Formation for Wireless Sensor Networks Using Fuzzy Logic . . . 657
Atul Pawar, Mihir Mondhe, Pranav Kharche, Shrutik Manwatkar, and Ganesh Dhore

Hierarchical Cluster-Based Model to Evaluate Accuracy Metrics Based on Cluster Efficiency . . . 667
K. N. Sridevi, Surekha Pinnapati, and S. Prakasha

Digital Audio Watermarking: Techniques, Applications, and Challenges . . . 679
Abhijit Patil and Ramesh Shelke

Transfer Learning for Handwritten Character Recognition . . . 691
Sanasam Inunganbi and Robin Singh Katariya

IoT-Based Smart Solar Monitoring System . . . 701
Vinay Rasal, Vrushabh Raulkar, Sumant Reddy, Jitesh Sanap, Arpit Sarap, and Manisha Mhetre

Conglomerate Elgamal Encryption-Based Data Aggregation and Knapsack-Based Energy Efficient Cryptosystem in Wireless Sensor Network . . . 709
T. G. Babu and V. Jayalakshmi

Optimization of Resource and Energy Utilization in Device-to-Device Communication Under Cellular Network . . . 729
Munna Lal Jatav, Ashutosh Datar, and Leeladhar Malviya

Towards an Integrated Conceptual Model for Open Government Data in Saudi Arabia . . . 741
Abdullah Alhujaylan, Leslie Carr, and Matthew Ryan

IoT-Based Horticulture Monitoring System . . . 765
Monika Rabka, Dion Mariyanayagam, and Pancham Shukla

A Comprehensive Review on the Role of PMU in Managing Blackouts . . . 775
K. S. Harshith Gowda and N. Gowtham

Secure COVID-19 Treatment with Blockchain and IoT-Based Framework . . . 785
Garima Jain, Garima Shukla, Priyanka Saini, Anubha Gaur, Divya Mishra, and Shyam Akashe

Design of a CPW-Fed Microstrip Elliptical Patch UWB Range Antenna for 5G Communication Application . . . 801
Abhishek Kumar, Garima Jain, Suraj, Prakhar Jindal, Vishwas Mishra, and Shyam Akashe

A Review on Consensus Protocol of Blockchain Technology . . . 813
Arpit Jain and Dharm Singh Jat

Retraction Note to: Optimization of the Overlap Shortest-Path Routing for TSCH Networks . . . C1
Javier Caceres and Marcelo V. Garcia

Author Index . . . 831

Editors and Contributors

About the Editors

Atulya K. Nagar holds the Foundation Chair as Professor of Mathematical Sciences and is Pro-Vice-Chancellor for Research and Dean of the Faculty of Science at Liverpool Hope University, UK. He is also the Head of the School of Mathematics, Computer Science, and Engineering, which he established at the University. He is an internationally respected scholar working at the cutting edge of theoretical computer science, applied mathematical analysis, operations research, and systems engineering. He received a prestigious Commonwealth Fellowship to pursue his doctorate (D.Phil.) in Applied Nonlinear Mathematics, which he earned from the University of York (UK) in 1996. He holds a B.Sc. (Hons.), an M.Sc., and an M.Phil. (with distinction) in Mathematical Physics from the MDS University of Ajmer, India. His research expertise spans both applied mathematics and computational methods for nonlinear, complex, and intractable problems arising in science, engineering, and industry.

Dharm Singh Jat is a Professor of Computer Science at Namibia University of Science and Technology (NUST). He has guided about 8 Ph.D. and 24 master's research scholars. He is the author of more than 150 peer-reviewed articles and the author or editor of more than 20 books. His interests span the areas of multimedia communications, wireless technologies, mobile communication systems, edge and roof computing, software-defined networks, network security, and the Internet of Things. He has given several guest lectures and invited talks at various prestigious conferences. He is a Fellow of the Institution of Engineers (India), a Fellow of the Computer Society of India, a Chartered Engineer (India), a Senior Member of IEEE, and an ACM Distinguished Speaker. He has also developed simulation-software-based experiments for network research and project work for undergraduate and postgraduate students. His research includes developing video communication platforms to solve QoS issues in video communication, and he has developed a framework for video transmission over wireless networks.


Gabriela Marín-Raventós received an M.Sc. in Computer Science from Case Western Reserve University in 1985 and a Ph.D. in Business Analysis and Research from Texas A&M University, USA, in 1993. She has been a Computer Science faculty member at Universidad de Costa Rica (UCR) since 1980. She was the Dean of Graduate Studies and the Director of the Research Center for Communication and Information Technologies (CITIC), both at UCR. Currently, she is the Director of the Graduate Program in Computer Science and Informatics. She has organized several international and national conferences, and has been, and currently is, a chair of several program and editorial committees. From 2012 to 2016, she was the President of the Latin American Center for Computer Studies (CLEI), becoming the first woman to occupy such a distinguished position. Since September 2016, she has been Vice-President of the International Federation for Information Processing (IFIP), in charge of the Digital Equity Committee. Her research interests include smart cities, human–computer interaction, decision support systems, gender in IT, and digital equity.

Durgesh Kumar Mishra received an M.Tech. degree in Computer Science from DAVV, Indore, in 1994 and a Ph.D. degree in Computer Engineering in 2008. Presently, he is working as Director and Professor (CSE) at Sri Aurobindo Institute of Technology, Indore, MP, India. He is also a visiting faculty member at IIT Indore, MP, India. He has 24 years of teaching and 10 years of research experience. He completed his Ph.D. in the area of Secure Multi-Party Computation for Preserving Privacy. He has published more than 90 papers in refereed international/national journals and conferences, including IEEE and ACM conferences. His publications are listed in DBLP, CiteSeerX, Elsevier, and Scopus. He is a Senior Member of IEEE. He visited MIT Boston and gave a presentation on security and privacy, and he chaired a panel on “Digital Monozukuri” at the “Norbert Wiener in the 21st Century” conference in Boston. He is a member of the Bureau of Indian Standards (BIS), Government of India, for the Information Security domain.

Contributors Zunaid Aalam SGT University, Gurugram, Haryana, India Niharika Abhange Dr. Vishwanath Karad’s, MIT World Peace University, Pune, India Khushboo Agarwal Dr. Vishwanath Karad MIT World Peace University, Pune, India Veral Agarwal Department of Electronics and Communication Engineering, ABES Institute of Technology, Ghaziabad, Uttar Pradesh, India Tarun Agrawal Department of Information Technology, PCCoE, Pune, Maharashtra, India

Editors and Contributors

xvii

B. K. Imtiyaz Ahmed Department of ECE, CMR Institute of Technology, Bengaluru, India Sabrina Ahmed Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh Shyam Akashe Electronic and Communication, ITM University, Gwalior, Madhya Pradesh, India D. Akshith Reddy Vardhaman College of Engineering, Kacharam, Shamshabad, India Hawraa Aref Al-Mutawa Department of Information Science, College of Life Sciences, Kuwait University, Kuwait City, Kuwait Mohammad Shabbir Alam Department of Computer Science, College of Computer Science and Information Technology, Jazan University, Jizan, Saudi Arabia Abdullah Alhujaylan University of Southampton, Southampton, UK R. A. Aliya Rahmawati Wahab Indonesia University, Jakarta, Indonesia Sathwik Rao Alladi Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India Aditya Andotra Department of Electronics and Telecommunications, Symbiosis International (Deemed University), Pune, India Septi Ariani Indonesia University, Jakarta, Indonesia Sandhya Arora MKSSS Cummins College of Engineering for Women, Pune, India Ikram Asghar The Centre of Excellence in Mobile and Emerging Technologies, University of South Wales, Pontypridd, UK A. B. Aswathi Aarupadai Veedu Institute of Technology, Chennai, India T. G. Babu Vels Institute of Science, Technology and Advanced Studies, Pallavaram, Chennai, India Aditi Bairolu Prin.L.N Welingkar Institute of Management Development & Research (WeSchool), Mumbai, India N. Balaji NMAM Institute of Technology, Udupi, Karnataka, India Abhishek Baluni Department of Electrical Engineering, JSS Academy of Technical Education, Noida, U.P., India Seema Barda Chandigarh University, Chandigarh, India S. Bargavi Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, India

xviii

Editors and Contributors

Pratik Barhate Department of Information Technology, PCCoE, Pune, Maharashtra, India Valeria Barona-Pico Universidad Técnica de Ambato, UTA, Ambato, Ecuador Sapana S. Barphe Department of Information Technology, Dr. Babasaheb Ambedkar Technological University, Lonere-Raigad, Maharashtra, India Chan Bagath Basha Aarupadai Veedu Institute of Technology, Chennai, India Suriya Begum Bharat Institute of Engineering and Technology, Hyderabad, Telangana, India Krishnanjan Bhattacharjee Center for Development of Advanced Computing (CDAC), Pune, India N. Bhavatarini REVA University, Bengaluru, Karnataka, India Shonil Bhide Dr. Vishwanath Karad MIT World Peace University, Pune, India Suyash Binod Department of Electrical Engineering, JSS Academy of Technical Education, Noida, U.P., India Sanjiba Kumar Bisoyi Department of Electrical Engineering, JSS Academy of Technical Education, Noida, U.P., India Rajesh Bose Brainware University, Kolkata, India Nadjet Bouayad-Agha Faculty of Computer Sciences, Multimedia and Telecommunication, Universitat Oberta de Catalunya (UOC), Barcelona, Spain Alexandru Bundea Department of Computer Science, University of Rostock, Rostock, Germany Javier Caceres Universidad Tecnica de Ambato, UTA, Ambato, Ecuador Gustavo Caiza Universidad Politécnica Salesiana, UPS, Quito, Ecuador Leslie Carr University of Southampton, Southampton, UK Punyasha Chatterjee School of Mobile Computing and Communication, Jadavpur University, Jadavpur, West Bengal, India Vrushal Chaudhari Department of Information Technology, PCCoE, Pune, Maharashtra, India Himshweta Chauhan Department of Electronics and Telecommunications, Symbiosis International (Deemed University), Pune, India Janneth Chicaiza Universidad Técnica Particular de Loja, Loja, Ecuador Pradnya V. Chitrao Symbiosis Institute of Management Studies (SIMS), A Constituent of Symbiosis International Deemed University, Pune, India


Chandra Chud Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, India Franco Cicirelli CNR—National Research Council of Italy, Institute for High Performance Computing and Networking (ICAR), Rende (CS), Italy César A. Collazos Universidad del Cauca, Popayán, Colombia Luke Dando The Centre of Excellence in Mobile and Emerging Technologies, University of South Wales, Pontypridd, UK Ashutosh Datar Samrat Ashok Technological Institute, Vidisha, M.P., India V. G. Deepa Sree Krishna College, Guruvayur, Kerala, India Kunal Deshmukh Department of Information Technology, PCCoE, Pune, Maharashtra, India Siddhant Deshpande Dr. Vishwanath Karad’s, MIT World Peace University, Pune, India Ganesh Dhore Pimpri Chinchwad College of Engineering, Pune, Maharashtra, India Rajendra Kumar Dwivedi Department of IT and CA, MMMUT, Gorakhpur, UP, India Emma Dymond Motion Rail Ltd., Ebbw Vale, Wales, UK Ralf Dörner RheinMain University of Applied Sciences, Wiesbaden, Germany Oche A. Egaji The Centre of Excellence in Mobile and Emerging Technologies, University of South Wales, Pontypridd, UK Ruba Elhafiz Department of Mechatronics Engineering, German Jordanian University, Amman, Jordan Mahammad Eliyaz Vardhaman College of Engineering, Kacharam, Shamshabad, India Kabil Essefar School of Computer and Communication Science, Polytechnic University, Benguerir, Morocco Mohd. Faisal Department of Computer Application, Integral University, Lucknow, Uttar Pradesh, India Peter Forbrig Department of Computer Science, University of Rostock, Rostock, Germany Adnane Founoun Faculty of Sciences and Techniques, Computer, Networks, Mobility and Modeling Laboratory: IR2M, Hassan First University of Settat, Settat, Morocco


Jobin Francis Department of Electronics and Communication Engineering, National Institute of Technology Calicut, Calicut, Kerala, India Timon Fuß RheinMain University of Applied Sciences, Wiesbaden, Germany Marcelo V. Garcia Universidad Técnica de Ambato, UTA, Ambato, Ecuador; University of Basque Country, UPV/EHU, Bilbao, Spain Chahat Garg Center for Development of Advanced Computing (C-DAC), Pune, India Swarad Gat Dr. Vishwanath Karad’s, MIT World Peace University, Pune, India Anubha Gaur Computer Science Engineering, Swami Vivekanand Subharti University, Meerut, India P. Geetha Center for Excellence in Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India Sudhish N. George Department of Electronics and Communication Engineering, National Institute of Technology Calicut, Calicut, Kerala, India Marie Luvett I. Goh FEU Institute of Technology, Manila, Philippines Chandravadan Goritiyal Prin.L.N Welingkar Institute of Management Development & Research (WeSchool), Mumbai, India Laxmi Goritiyal Vivekanand Education Society’s Institute of Management Studies and Research, Mumbai, India N. Gowtham Vidyavardhaka College of Engineering, Mysuru, India Mark G. Griffiths The Centre of Excellence in Mobile and Emerging Technologies, University of South Wales, Pontypridd, UK César D. Guerrero Universidad Autónoma de Bucaramanga, Bucaramanga, Colombia Abdelkrim Haqiq Faculty of Sciences and Techniques, Computer, Networks, Mobility and Modeling Laboratory: IR2M, Hassan First University of Settat, Settat, Morocco K. S. Harshith Gowda Vidyavardhaka College of Engineering, Mysuru, India Aawatif Hayar University Hassan 2, ENSEM, Casablanca, Morocco Robin Horst RheinMain University of Applied Sciences, Wiesbaden, Germany Zahra Hosseini Tampere University, Tampere, Finland Kimmo Hytönen Independent MSc. Engineering Researcher, Tampere, Finland Fernando Ibarra-Torres Universidad Técnica de Ambato, UTA, Ambato, Ecuador


Mariam Ibrahim Department of Mechatronics Engineering, German Jordanian University, Amman, Jordan Sanasam Inunganbi Koneru Lakshmaiah Education Foundation, Guntur, India Rahul Jadhav Dr. Vishwanath Karad’s, MIT World Peace University, Pune, India Sachin Jadhav Department of Information Technology, PCCoE, Pune, Maharashtra, India Arpit Jain Namibia University of Science and Technology, Windhoek, Namibia Garima Jain Computer Science and Engineering, Noida Institute of Engineering and Technology, Greater Noida, India N. Jaishanker Misrimal Navajee Munoth Jain Engineering, Chennai, India Rithusravya Jakka Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India Dharm Singh Jat Namibia University of Science and Technology, Windhoek, Namibia Munna Lal Jatav Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, M.P., India V. Jayalakshmi Department of Computer Applications, School of Computing Science, Vels Institute Of Science, Technology And Advanced, Studies, Pallavaram, Chennai, India Athira Jayan Aarupadai Veedu Institute of Technology, Chennai, India Prakhar Jindal Amity University, Gurugram, Haryana, India Akhil Johnson Department of Electronics and Communication Engineering, National Institute of Technology Calicut, Calicut, Kerala, India Keshav Kadam Department of Computer Engineering, Pune Institute of Computer Technology, Pune, India Shalini Kakkar PTVA’s Institute of Management, Mumbai, India; Symbiosis Institute of Management Studies (SIMS), A Constituent of Symbiosis International Deemed University, Pune, India Kalinka Kaloyanova Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria; Sofia University “St. Kliment Ohridski”, Sofia, Bulgaria B. H. Karthik Pai NMAM Institute of Technology, Udupi, Karnataka, India Robin Singh Katariya NTPC Lara, Lara, India Amandeep Kaur Chandigarh Engineering College, Jhanjeri, Mohali, India Satnam Kaur SGT University, Gurugram, Haryana, India


Jameel Ahmed Khan Balochistan University of Information Technology Engineering and Management Sciences, Quetta, Pakistan Pranav Kharche Pimpri Chinchwad College of Engineering, Pune, Maharashtra, India Jani Kinnunen Åbo Akademi University, Turku, Finland Arvind W. Kiwelekar Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere-Raigad, Maharashtra, India Saishashank Konduri Dr. Vishwanath Karad’s, MIT World Peace University, Pune, India Zlatinka Kovacheva Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria; University of Mining and Geology “St. Ivan Rilski”, Sofia, Bulgaria Kshitija Kulkarni Dr. Vishwanath Karad MIT World Peace University, Pune, India Abhishek Kumar Swami Vivekanand Subharti University, Meerut, Uttar Pradesh, India Ajai Kumar Center for Development of Advanced Computing (C-DAC), Pune, India Sanjeev Kumar Department of Electronics and Telecommunications, Symbiosis International (Deemed University), Pune, India Vinod Kumar Department of Computer Science and Engineering, SGT University, Gurugram, Delhi, India Kamlesh Kumawat Department of CS and IT, IISU (Deemed to be University), Jaipur, India R. B. Kuriakose Central University of Technology, Bloemfontein, Free State, South Africa Ace C. Lagman FEU Institute of Technology, Manila, Philippines S. Aparna Lakshmanan Cochin University of Science and Technology, Kochi, Kerala, India Jay-Ar P. Lalata FEU Institute of Technology, Manila, Philippines Varsha T. Lokare Department of Computer Science and Engineering, Rajarambapu Institute of Technology, Sakharale, Maharashtra, India Santosh S. Lomte Radhai College of Computer Science, Aurangabad, Maharashtra, India Baburaj Madathil Department of Applied Electronics and Instrumentation, Government Engineering College Kozhikode, Kozhikode, Kerala, India

Gina Maestre-Gongora Universidad Cooperativa de Colombia, Medellín, Colombia

Mainar Swari Mahardika Indonesia University, Jakarta, Indonesia Pratham Majumder University of Calcutta, Kolkata, West Bengal, India; CMR Institute of Technology, Bengaluru, Karnataka, India Tuhin Majumder Cognizant Technology Solutions, Kolkata, West Bengal, India Leeladhar Malviya Shri Govindram Seksaria Institute of Technology and Science, Indore, M.P., India Sanjeev Manchanda A&I, TCS, Mumbai, India Manju Mandot J. R. N. Rajasthan Vidyapeeth, Udaipur, Rajasthan, India Kotari Manjunath Alva’s Institute of Engineering and Technology, Moodbidri, Karnataka, India Paul Manuel Department of Information Science, College of Life Sciences, Kuwait University, Kuwait City, Kuwait Shrutik Manwatkar Pimpri Chinchwad College of Engineering, Pune, Maharashtra, India Dion Mariyanayagam Department of Communications Technology and Mathematics, London Metropolitan University, London, UK Anshika Mathur Department of Electronics and Telecommunications, Symbiosis International (Deemed University), Pune, India Yudhi Dwi Fajar Maulana Indonesia University, Jakarta, Indonesia Sudhanshu Maurya Department of Electrical Engineering, JSS Academy of Technical Education, Noida, U.P., India Yurley Medina-Cárdenas Universidad Francisco de Paula Santander Ocaña, Ocaña, Colombia S. M. Meena K. L. E. Technological University, Hubballi, India Swati Mehta Center for Development of Advanced Computing (C-DAC), Pune, India Jaison Joy Memadam Acharya Nagarjuna University, Guntur, Andra Pradesh, India Manisha Mhetre Vishwakarma Institute of Technology, Pune, Maharashtra, India Sadia Mahbub Mim Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh Divya Mishra Computer Science Engineering, Swami Vivekanand Subharti University, Meerut, India


Pratap Mishra Department of Electronics and Communication Engineering, ABES Institute of Technology, Ghaziabad, Uttar Pradesh, India Vishwas Mishra Swami Vivekanand Subharti University, Meerut, Uttar Pradesh, India O. L. Mokalusi Central University of Technology, Bloemfontein, Free State, South Africa Mihir Mondhe Pimpri Chinchwad College of Engineering, Pune, Maharashtra, India S. Muhuselvan Aarupadai Veedu Institute of Technology, Chennai, India Subhodip Mukherjee Techno International New town, Chakpachuria, Kolkata, India V. N. Sreeja Sree Krishna College, Guruvayur, Kerala, India Hamza Nadeem Balochistan University of Information Technology Engineering and Management Sciences, Quetta, Pakistan; Control Automotive and Robotics Lab, National Centre of Robotics and Automation, Rawalpindi, Pakistan Zain Nadeem Balochistan University of Information Technology Engineering and Management Sciences, Quetta, Pakistan; Control Automotive and Robotics Lab, National Centre of Robotics and Automation, Rawalpindi, Pakistan Varsha Naik Dr. Vishwanath Karad’s, MIT World Peace University, Pune, India Anjali Nair BMS College of Engineering, Bangalore, India Abdul Wahid Nasir Department of ECE, CMR Institute of Technology, Bengaluru, India Sandeep Kumar Nayak Department of Computer Application, Integral University, Lucknow, Uttar Pradesh, India Ina Naydenova Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria Libero Nigro University of Calabria, DIMES, Rende (CS), Italy D. V. Nithin Balaaji Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, India B. Nithya Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, India ChukwuNonso H. Nwokoye Nnamdi Azikiwe University, Awka, Nigeria Haneen Okasha Department of Biomedical Engineering, German Jordanian University, Amman, Jordan


Vincent S. O. Okeke Gregory University Uturu, Uturu, Nigeria Dharmendra Pal Department of Computer Science and Engineering, SGT University, Gurugram, Delhi, India Adhyayan Panwar BMS College of Engineering, Bangalore, India Rachit Patel Department of Electronics and Communication Engineering, ABES Institute of Technology, Ghaziabad, Uttar Pradesh, India Abhijit Patil Pacific University, Udaipur, India Dipashree Patil MKSSS Cummins College of Engineering for Women, Pune, India Shivani Patil MKSSS Cummins College of Engineering for Women, Pune, India Shreya Patil MKSSS Cummins College of Engineering for Women, Pune, India Roseline U. Paul Nnamdi Azikiwe University, Awka, Nigeria Atul Pawar Pimpri Chinchwad College of Engineering, Pune, Maharashtra, India Ann Pedersen Neurorehabilitation Research Group, Universitätsmedizin Greifswald, Greifswald, Germany Surekha Pinnapati R&D Centre, RNSIT, Bengaluru, India Thomas Platz Institut Für Neurorehabilitation Und Evidenzbasierung, An-Institut der Universität Greifswald, BDH-Klinik Greifswald, Greifswald, Germany S. Prakasha RNSIT, Bengaluru, India Monika Rabka Department of Communications Technology and Mathematics, London Metropolitan University, London, UK Sreya Radhakrishnan Center for Wireless Networks and Applications (WNA), Amrita Vishwa Vidyapeetham, Amritapuri, India Ma. Corazon F. Raguro FEU Institute of Technology, Manila, Philippines Rashedur M. Rahman Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh Meenaxi M. Raikar K. L. E. Technological University, Hubballi, India Alexander Raikov Laboratory of Modular Data Processing and Control Systems, Trapeznikov Institute of Control Sciences Russian Academy of Sciences, Moscow, Russia S. Rajaprakash Aarupadai Veedu Institute of Technology, Chennai, India K. Rajeshwar National Institute of Rural Development & Panchayati Raj, Hyderabad, India D. Rakesh Reddy Vardhaman College of Engineering, Kacharam, Shamshabad, India


Vinay Rasal Vishwakarma Institute of Technology, Pune, Maharashtra, India Vijay Singh Rathore Department of CS and IT, IISU (Deemed to be University), Jaipur, India Shivank Ratnaparkhi Dr. Vishwanath Karad MIT World Peace University, Pune, India Vrushabh Raulkar Vishwakarma Institute of Technology, Pune, Maharashtra, India Sumant Reddy Vishwakarma Institute of Technology, Pune, Maharashtra, India Dewar Rico-Bautista Universidad Francisco de Paula Santander Ocaña, Ocaña, Colombia Yash Rode Department of Computer Engineering, Pune Institute of Computer Technology, Pune, India Sandip Roy Brainware University, Kolkata, India Sonal Mobar Roy National Institute of Rural Development & Panchayati Raj, Hyderabad, India Matthew Ryan University of Southampton, Southampton, UK Nayan Sabnis Department of Computer Engineering, Pune Institute of Computer Technology, Pune, India Priyanka Saini Electronic and Communication, Swami Vivekanand Subharti University, Meerut, India Jitesh Sanap Vishwakarma Institute of Technology, Pune, Maharashtra, India Pratiksha Sancheti Department of Computer Engineering, Pune Institute of Computer Technology, Pune, India G. Saranya Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, India Arpit Sarap Vishwakarma Institute of Technology, Pune, Maharashtra, India Debabrata Sarddar University of Kalyani, Kalyani, India C. Sathish Periyar University, Salem, India Rini Saxena Chandigarh Engineering College, Jhanjeri, Mohali, India Ginu Sebastian Aarupadai Veedu Institute of Technology, Chennai, India Neelam Dabas Sen School of Life Sciences, JNU, New Delhi, India Sumit Sen GISE Lab, CSE-IIT Bombay, Mumbai, India Raghu Sharan Department of Electronics and Communication Engineering, ABES Institute of Technology, Ghaziabad, Uttar Pradesh, India


A. K. Sharma Department of CSI, University of Kota, Kota, India Naveen Sharma Department of Electronics and Communication Engineering, ABES Institute of Technology, Ghaziabad, Uttar Pradesh, India Ramesh Shelke Pacific University, Udaipur, India Muhammad Rifki Shihab Indonesia University, Jakarta, Indonesia S. Shivakarthik Center for Development of Advanced Computing (C-DAC), Pune, India Sunrose Shrestha CMR Institute of Technology, Bengaluru, Karnataka, India G. S. Shrinidhi Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, India Garima Shukla Computer Science and Engineering, Noida Institute of Engineering and Technology, Greater Noida, India Pancham Shukla Department of Communications Technology and Mathematics, London Metropolitan University, London, UK Eram Fatima Siddiqui Department of Computer Application, Integral University, Lucknow, Uttar Pradesh, India Arindam Singh Department of IT and CA, MMMUT, Gorakhpur, UP, India Sandeep Singh S. G. T. University, Gurugram, Haryana, India Maria Vicky S. Solomo FEU Institute of Technology, Manila, Philippines Ayush Sonthalia BMS College of Engineering, Bangalore, India Kavitha Sooda BMS College of Engineering, Bangalore, India B. K. Sreenidhi Cambridge Institute of Technology, Bengaluru, Karnataka, India K. N. Sridevi R&D Centre, RNSIT, Bengaluru, India K. Srinivasan Periyar University Constituent College of Arts and Science, Pennagaram, India Aayush Srivastava Department of Electrical Engineering, JSS Academy of Technical Education, Noida, U.P., India Suraj NIT Patna, Patna, Bihar, India Sanjay R. Sutar Department of Information Technology, Dr. Babasaheb Ambedkar Technological University, Lonere-Raigad, Maharashtra, India Jose Swaminathan Vellore Institute of Technology, Vellore, India Muhammad Syed Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh


María Camila Sánchez-Velásquez Universidad Francisco de Paula Santander Ocaña, Ocaña, Colombia Ananna Talukder Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh Sanket Gunderao Torambekar K.P.C.Yogeshwari Tantraniketan, Ambajogai, Maharashtra, India Igor Trubin Capital One Bank, McLean, VA, USA Anayat Ullah Balochistan University of Information Technology Engineering and Management Sciences, Quetta, Pakistan; Control Automotive and Robotics Lab, National Centre of Robotics and Automation, Rawalpindi, Pakistan Ikechukwu I. Umeh Nnamdi Azikiwe University, Awka, Nigeria Kamal Upreti Department of Information Technology, Dr. Akhilesh Das Gupta Institute of Technology & Management, Delhi, India T. V. S. Varalaksmi Acharya Nagarjuna University, Guntur, Andra Pradesh, India Prashant Vats Fairfield Institute of Management and Technology, GGSIP University, New Delhi, India A. Veekshita Sai Choudhary Department of Electronics and Telecommunications, Symbiosis International (Deemed University), Pune, India V. Venkataraman Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, India Bhat Venkatesh Alva’s Institute of Engineering and Technology, Moodbidri, Karnataka, India H. J. Vermaak Central University of Technology, Bloemfontein, Free State, South Africa Heintjie N. Vicente FEU Institute of Technology, Manila, Philippines Pujashree Vidap Department of Computer Engineering, Pune Institute of Computer Technology, Pune, India Fitri Wijayanti Indonesia University, Jakarta, Indonesia Mohan Yelnadu AppSec Prudential Corporation Asia, Singapore, Singapore

Wall-Distance Measurement for Indoor Mobile Robots Nadeem Zain, Nadeem Hamza, Khan Jameel Ahmed, and Ullah Anayat

Abstract Indoor mobile robots have gone hand-in-hand with the automation of factories, households, and commercial spaces. They come in all shapes and sizes, from humanoids used as waiters and personal assistants to box-like warehouse item-sorters. The working of these robots depends on several key elements such as localization, mapping, and surroundings detection. They also need to accurately detect walls and measure the distance to them to be able to avoid collisions. Currently, this task is accomplished using LiDARs and other optical sensors, which are costly; a vision-based solution can avoid this cost. The main hurdle for vision-based systems is that walls do not have any distinguishable visual features to detect them by. This paper proposes a system for measuring the distance to the wall by detecting the wall-floor edge. BRISK key-points are extracted on the detected edge, and pixel counting is then used to calculate the distance. The accuracy of the calculated distance is 95.58%, which serves as a positive motivation for further development and refinement of the proposed system. This paper implements the system on a single image, which can be expanded to support a live video stream in the future. Keywords Mobile robots · Robot vision · Edge detection · BRISK · Computer vision

Zain N. (B) · Hamza N. · Jameel Ahmed K. · Anayat U.
Balochistan University of Information Technology Engineering and Management Sciences, Quetta, Pakistan

Zain N. · Hamza N. · Anayat U.
Control Automotive and Robotics Lab, National Centre of Robotics and Automation, Rawalpindi, Pakistan

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_1

1 Introduction Indoor mobile robots are not a new phenomenon, and initial attempts to break into the commercial market date back to the early 1980s. The HERO robot series by Heath Company [1] and TOPO by Androbot Inc. [2] are some of the oldest known robots of


such kind. There has been an upward trend in their usage in recent years due to the availability of cheap manufacturing facilities and automation in industries. Indoor robots have traditionally been used in warehouses and factories, which can be deemed highly structured environments. Recent surveys show adoption in low-structured environments such as homes and commercial spaces like museums, malls, banks, etc. These robots are used for a wide variety of tasks ranging from warehouse storage management [3] and automated vacuum cleaners [4] to smart waiters in restaurants [5]. The working of such indoor mobile robots depends on several elements including object and pedestrian detection and path planning [6]. Localization within an indoor area has been a well-researched field over the years, with several state-of-the-art papers published [7]. Another important part of the working of indoor robots is the detection of the distance to a nearby wall to avoid collisions. This is especially difficult to accomplish as walls do not have any specifically distinguishable features. Although this has been solved using LiDARs and other optical or LASER-based methods, they remain a costly alternative. They do provide a higher degree of precision and are the only viable option where precision is of utmost importance. In common robots, the distance to the wall does not need to be measured down to a millimeter, and a vision-based solution can be implemented. This paper proposes a system for measuring the distance of a nearby wall from an indoor mobile robot. As mentioned above, this is integral to many applications. This paper proposes a method where distance is calculated from a single image using the edge information present where the wall meets the floor. The image from a camera source is passed through an edge detection filter which removes all other aspects of the image barring the edge. Key-points are then extracted from this edge, and pixel counting is used to calculate the distance from the key-point—on the line—to the robot. The system needs to be calibrated to determine the distance from the number of pixels. The results are encouraging and can be used in future hardware implementations with ease due to the elaborate explanations and open-source resources used.

2 Literature Review Indoor robots are predominantly of two types: mobile and stationary. Stationary indoor robots are mostly used in factories and manufacturing plants to aid human beings with their work [8]. These are generally purpose-built for automation and are programmed to carry out repetitive tasks. These robots do not have any specific form and can be one of six major types based on their joint mechanism [9]: Cartesian, cylindrical, spherical, SCARA, articulated, and parallel. The second type of indoor robots is the mobile ones, and these are described as robots that can move around an area automatically [10], based on various algorithms [11]. These are present in the form of several applications including automatic vacuum cleaners [4] and floor sweepers [12] in household settings. In commercial


applications, the market is expanding due to the recent boom in AI-inspired robots. Some examples include Cobalt [13], an office assistant; MaidBot [14], which provides cleaning services in hotels; and Aethon [15], which is used in hospital equipment deliveries. All the mobile robots mentioned above use algorithms and elements to maneuver around the given space. Each of these applications and implementations requires sensing of nearby objects. They use input from several sources such as cameras, LiDARs, RADARs, and other sensors [16]. They need to detect objects and people to avoid collisions and accidents. One area with a lack of available literature is measuring the distance to walls using vision. This task has been accomplished using LASERs, which is costly. This paper presents a vision-based system for the measurement of distance to the wall. Indoor localization is also of importance in the working of indoor robots. This task is easier for outdoor robots, with GPS providing a claimed accuracy of 95% up to 7.8 m in ideal environments [17]. However, this margin of error is not acceptable in an indoor environment, and effective localization algorithms [18] have been established to deal with this problem. Edge detection is a field of image processing with roots dating back as far as the 1980s, with substantial work done by John Canny, summarized in [19]. Several different types of detectors have been developed with various applicational advantages. Some of the detectors focus on computational efficiency, while others delve into the accuracy of detected edges. Some of the recent works [20] have built upon the previous advances, and the development of patented detectors has also taken huge steps [21]. Previously, SIFT [22] and SURF [23] algorithms have been widely used for key-point and feature extraction purposes, but this paper uses the BRISK algorithm. Binary robust invariant scalable key-points, or BRISK, is a feature extractor proposed by a team from ETH Zurich [24]. BRISK is faster than previous feature extraction methods, and this aspect was also validated during the experimentation stage of this paper. Another positive aspect of BRISK is the computational efficiency it provides due to the use of a scale-space FAST-based detector [24] and its binary nature [25].

3 Methodology This paper follows the general flow shown in Fig. 1. An image is acquired through the camera source and is resized to 200 × 200 resolution. The resized image is then passed through the edge detection stage, and a feature map is produced as shown in Fig. 2. Thereafter, the BRISK algorithm is used to extract the key-points which have both ‘x-’ and ‘y’-coordinates. The ‘y’-coordinate is the height of the key-point from the lower edge of the image and by default the distance of the wall from the robot. Distance is measured based on the calibrated model, and results are produced.


Fig. 1 Flowchart


Fig. 2 Comparison of edge detection methods for sample image (from left to right) a original image, b Sobel’s method, c Canny’s method, d Prewitt’s method, e Roberts’ method, and f Laplacian of Gaussian method

3.1 Image Acquisition and Resizing The process of wall-distance measurement begins with image acquisition from a single camera source. The image is taken using a camera mounted onto the side of a small indoor mobile robot. The ideal condition for the image is when the robot is perpendicular to the wall. This is to say that the robot should be facing the wall at an angle equal to 90°. This would ensure that the edge between the wall and the floor is present as a straight line and extraction of BRISK features becomes more accurate. The image taken is of relatively high resolution. This high-resolution image carries a lot of details and information, some of which is not of our interest. This can become worrisome as the unwanted details will be perceived as noise and will result in a noisy feature map from the edge detection stage [26]. As all the steps are dependent on the accuracy and precision of the previous steps, the extraction of BRISK features will be hampered. To counter this adverse effect, the acquired image is resized to a specific resolution of 200 × 200. This ensures that the edge details—which are the required high-level features—are retained, yet the low-level details which are not required are decreased to a minimum.

3.2 Edge Detection Edge detection is an age-old technique with several variants available. During experimentation, several of the available edge detectors were applied, and the results were observed as shown in Fig. 2. The edge detection methods used during experimentation were Sobel [27], Canny [19], Prewitt [28], Roberts [29], and Laplacian of Gaussian (LoG) [30]. As is evident from the results, none of the detectors was able


Fig. 3 Comparison of threshold values (t) of Canny edge detector for sample image (from left to right) a original image, b t = 0.15, c t = 0.25, d t = 0.35, e t = 0.45, f t = 0.55, g t = 0.65, h t = 0.75

to detect the edges which can be used for distance measurement. The Canny edge detector was found to be the better performer even under the default settings—which are pre-set to produce optimal results. After selecting Canny's method for detection, it was tweaked to allow for better noise removal. Several different threshold values are used, and their outputs are compared as shown in Fig. 3. We can clearly see that as the threshold value (t) is increased from 0.15 to 0.75, the noise decreases. The Canny edge detector uses non-maxima suppression [19] to remove the edges which do not qualify after a certain degree of thresholding. As observable, we get the best results at t = 0.75, and that is the value we will use for further experiments.
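To make Sects. 3.1 and 3.2 concrete, the following minimal Python/OpenCV sketch reproduces the resize-then-detect step, including the threshold sweep of Fig. 3. This is not the authors' code: the input file name is hypothetical, and because the paper reports MATLAB-style normalized thresholds t in [0, 1] while OpenCV's cv2.Canny expects thresholds on the 0–255 intensity scale, the rescaling and the 0.4 low-to-high threshold ratio are our assumptions.

```python
import cv2

# Load a frame from the robot's camera and shrink it to 200 x 200 so that
# low-level texture is suppressed and the strong wall-floor edge survives
# (Sect. 3.1). "frame.jpg" is a placeholder file name.
img = cv2.imread("frame.jpg")
small = cv2.resize(img, (200, 200))
gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)

# Sweep the Canny threshold as in Fig. 3; higher t removes more noise.
for t in (0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75):
    high = int(t * 255)                        # assumed mapping to OpenCV's scale
    edges = cv2.Canny(gray, int(0.4 * high), high)
    cv2.imwrite(f"edges_t{t:.2f}.png", edges)  # inspect noise level per threshold
```

In this sketch, the t = 0.75 map (written as edges_t0.75.png) would be the feature map carried into the key-point stage.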

3.3 BRISK Key-Points Extraction After we have extracted the edge feature maps as shown in Fig. 3, we are left with just the most relevant information we need—that is, the edge between the wall and the floor. Hereafter, we run the BRISK algorithm on the feature maps, and the five strongest key-points are identified. These points are saved as 'x-' and 'y'-coordinates of the image pixels. These key-points may or may not coincide. The detected features lie on the detected edge as shown in Fig. 4.

Fig. 4 BRISK key-points extracted for sample image
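A correspondingly hedged sketch of the key-point stage, assuming OpenCV's stock BRISK detector is an acceptable stand-in for the extractor the authors used; kp.response is OpenCV's key-point strength measure, and the edge-map file name carries over from the previous sketch.

```python
import cv2

# Feature map produced by the Canny stage (t = 0.75), per the previous sketch.
edges = cv2.imread("edges_t0.75.png", cv2.IMREAD_GRAYSCALE)

# Detect BRISK key-points on the edge map and keep the five strongest,
# mirroring Sect. 3.3; each key-point carries (x, y) pixel coordinates.
brisk = cv2.BRISK_create()
keypoints = brisk.detect(edges, None)
top5 = sorted(keypoints, key=lambda kp: kp.response, reverse=True)[:5]
coords = [kp.pt for kp in top5]
```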


3.4 Pixel Counting and Distance Calculation As mentioned previously, the BRISK features are saved in the form of 'x-' and 'y'-coordinates, and the 'y'-coordinate corresponds to the height of the key-point from the image edge. As the image is taken using a camera mounted onto the robot, it is safe to assume that the height of the key-points is equal to the distance from the wall. We use the 'y'-coordinate for each image and calculate their mean pixel count (pc) to iron out any anomalies present in the detected key-points. We need to convert the counted pixels into a distance by using the Euclidean distance formula [31]. Each pixel corresponds to a certain distance in the real world depending on the total number of pixels in the image. We need to initially calibrate the camera, and to do this, we implement the system on an object with pre-known dimensions. We specify that the dimensions—in pixel count (pc)—correspond to a specific distance in the real world. Then, a baseline is established, and the system can use the height of the key-point to calculate the distance (dc).
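The pixel-to-distance conversion might then look as follows; CM_PER_PIXEL is purely illustrative and would be obtained from the calibration step on an object of known dimensions, while the 200-pixel image height follows Sect. 3.1. Note that OpenCV's y axis grows downward, so the height above the image's lower edge is (image height − y).

```python
# Hypothetical calibration result: centimetres represented by one pixel row.
CM_PER_PIXEL = 0.153

def wall_distance_cm(coords, image_height=200):
    # Height of each key-point above the lower image edge, in pixels.
    pixel_counts = [image_height - y for (_x, y) in coords]
    pc = sum(pixel_counts) / len(pixel_counts)  # mean pixel count irons out anomalies
    return pc * CM_PER_PIXEL                    # calculated distance d_c

d_c = wall_distance_cm(coords)  # coords from the BRISK sketch above
```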

3.5 Characterization The characterization of the system is done by calculation of accuracy. Initially, the difference between the calculated distance (dc) and the actual distance (da) is computed for each image using (1); this is the error (εi) present in the calculation for each image 'i'. Accuracy is then calculated by taking the mean of the errors for all the images and subtracting it from 100, as shown in (2), where k is the total number of images.

$$\varepsilon_i = |d_a - d_c| \tag{1}$$

$$\mathrm{Accuracy} = 100 - \frac{\sum_{i=0}^{k} \varepsilon_i}{k} \tag{2}$$
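Equations (1) and (2) translate directly into code; the sample call below reuses the distances reported for Figs. 5 and 6 in Sect. 4.

```python
def accuracy(calculated, actual):
    # Eq. (1): per-image absolute error; Eq. (2): 100 minus the mean error
    # over all test images. Distances are in cm, passed as parallel lists.
    errors = [abs(d_a - d_c) for d_c, d_a in zip(calculated, actual)]
    return 100 - sum(errors) / len(errors)

print(accuracy([9.1589, 15.7092], [9.0, 15.5]))  # approx. 99.82
```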

4 Results and Discussion This paper aimed to establish a system for measuring the distance of the wall from the robot. To validate the results, the actual distance from the robot to the wall is measured (da). After passing through all the required steps, pixels were counted, and distance was calculated for each sample image. As visible in Figs. 5 and 6, the results are encouraging on most of the images. The detected edges are clean and crisp, while the key-point extraction phase has placed all the points on the detected edge.


Fig. 5 Mean pc = 59.8382, mean dc = 9.1589 cm, da = 9 cm, a input image, b output image with detected key-points

Fig. 6 Mean pc = 102.6334, mean dc = 15.7092 cm, da = 15.5 cm, a input image, b output image with detected key-points

Fig. 7 Mean pc = 176.0736, mean dc = 26.9500 cm, da = 18.0 cm, a input image, b output image

Figure 7, on the other hand, paints an entirely different picture and is one of the failure cases. As is evident from the input image, the wall-floor edge is not easily distinguishable. This causes unclear edge detection, which translates into erroneous key-point detection. As seen in Fig. 7, the key-points are way off the required mark, and the calculated distance dc varies significantly from the actual distance da.

5 Conclusion The successful implementation of this paper can result in an easy-to-use and easy-to-comprehend system for measuring the distance to the wall for an indoor mobile robot. There is room for improvement, as the computation time for each iteration was not discussed, and that is something that can be explored further. Furthermore, this implementation is for a single image, as we have only proposed a new method. This can be expanded


to use with a video stream from the camera source which would, in turn, allow real-world implementation.

Acknowledgements This research is conducted at Control Automotive and Robotics Lab (CARL-BUITEMS), funded by National Center of Robotics and Automation (NCRA), with the collaboration of Higher Education Commission (HEC) of Pakistan.

References 1. Heath Company, Benton Harbor: HERO Robot Manual, p. 96 2. A. Inc.: TOPO Robot [Online]. Available http://www.megadroid.com/Robots/Androbot/Topo. htm. Accessed 27 May 2019 3. Vincent, J.: Welcome to the Automated Warehouse of the Future—The Verge [Online]. Available https://www.theverge.com/2018/5/8/17331250/automated-warehouses-jobs-ocadoandover-amazon. Accessed 27 May 2019 4. IRobots: Roomba Vacuum Cleaner [Online]. Available https://www.irobot.com/for-the-home/ vacuuming/roomba. Accessed 27 May 2019 5. Nepal’s First Robot Waiter is Ready for Orders [Online]. Available https://phys.org/news/201811-nepal-robot-waiter-ready.html. Accessed 27 May 2019 6. González García, C., Meana-Llorián, D., Pelayo G-Bustelo, B.C., Cueva Lovelle, J.M., GarciaFernandez, N.: Midgar: Detection of people through computer vision in the Internet of Things scenarios to improve the security in smart cities, smart towns, and smart homes. Futur. Gener. Comput. Syst. 76, 301–313 (2017) 7. Bailey, T., Durrant-Whyte, H.: Simultaneous localization and mapping (SLAM): part II. IEEE Robot. Autom. Mag. 13(3), 108–117 (2006) 8. Cheng, H., Jia, R., Li, D., Li, H.: The Rise of Robots in China 9. Chong, C.: Stationary Robots [Online]. Available https://cnc-machine-tools.com/stationaryrobots/. Accessed 28 May 2019 10. Tzafestas, S.G.: Mobile robot control and navigation: a global overview. J. Intell. Robot. Syst. 91, 35–58 (2018) 11. Chen, Q., Chen, Y., Tang, P., Chen, R., Jiang, Z., Deng, A.: Indoor simultaneous localization and mapping for Lego Ev3. DEStech Trans. Comput. Sci. Eng. CCNT, 500–504 (2018) 12. Scholten, A., Jeffrey, A., Dillane, M.T., Imhoff, T.J., Rose, S.M., Luedke: Autonomous Floor Cleaner. Patent Application Publication (2019) 13. Cobalt Robotics|Robots as a Service for Security, Facilities, and Operations [Online]. Available https://cobaltrobotics.com/. Accessed 28 May 2019 14. Maidbot [Online]. Available https://maidbot.com/. Accessed 28 May 2019 15. TUG Autonomous Mobile Robots. Manufacturing, Hospitality, Healthcare [Online]. Available https://aethon.com/products/. Accessed 28 May 2019 16. Kinsky, P., Quan, Z., YiMing, R.: Obstacle avoidance robot 5(2), 439–442 (2011) 17. Global Positioning System Standard Positioning Service Performance Standard, 4th edn. (2008) 18. Turgut, Z., Aydin, G.Z.G., Sertbas, A.: Indoor localization techniques for smart building environment. Proc. Comput. Sci. 83, 1176–1181 (2016) 19. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8(6), 679–698 (1986) 20. Dollár, P., Zitnick, C.L.: Fast edge detection using structured forests. IEEE Trans. Pattern Anal. Mach. Intell. 37(8), 1558–1570 (2015) 21. Jeffrey Rzeszutek, R.: Method and apparatus for support surface edge detection (2017) 22. Lowe, D.G.: Object recognition from local scale-invariant features (1999) 23. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-Up Robust Features (SURF)


24. Leutenegger, S., Chli, M., Siegwart, R.Y.: BRISK: Binary Robust Invariant Scalable Keypoints 25. Khan, S., Ullah, S.: Feature-based tracking via SURF detector and BRISK descriptor. In: Lu, H. (ed.) Cognitive Internet of Things: Frameworks, Tools and Applications, pp. 147–157. Springer International Publishing, Cham (2019) 26. Lin, X., Ma, Y.-L., Ma, L.-Z., Zhang, R.-L.: A survey for image resizing. J. Zhejiang Univ. Sci. C (Comput. Electron.) 15(9), 697–716 (2014) 27. Sobel, I.: An Isotropic 3 × 3 Image Gradient Operator (2015) 28. Prewitt, J.M.S.: Object enhancement and extraction. In: Picture Processing and Psychopictorics, p. 535. Elsevier Science (1970) 29. Roberts, L.: Machine Perception in Three-Dimensional Solids. Massachusetts Institute of Technology (1963) 30. Mintz, D.: Robust consensus based edge detection. CVGIP Image Underst. 59(2), 137–153 (1994) 31. Methods for Measuring Distance in Images (Chap. 4)

Automating Cognitive Modelling Considering Non-Formalisable Semantics Alexander Raikov

Abstract The paper addresses the acceleration of cognitive modelling by automating the creation of the cognitive semantics of Artificial Intelligence (AI) models, considering such human abilities as free will, consciousness, unconsciousness, feelings, thoughts and experience. The problem consists of the impossibility of formalising these abilities during modelling, especially considering their uncaused character. Traditional AI tools such as knowledge management, logical ontologies and neural networks cannot fully embrace these abilities to understand and explain events. The paper suggests using the Hybrid AI (HAI) approach, which integrates formalisable AI and non-formalisable human abilities. This paper's main idea consists of using a non-local approach to enrich the cognitive semantics of AI models that considers the subatomic structure of the human mind's biological tissue. In this case, quantum operators map an AI model onto relevant big data, thereby enriching the cognitive semantics. The quantum non-local approach helps to increase the quality of cognitive models and synthesise their drafts automatically. The inverse problem-solving method on topological spaces, which ensures the purposefulness and sustainability (convergence) of decision-making processes, is also used. The approach has been applied in real practice during collective strategic planning and is now being developed for decision-making in emergencies. Keywords Artificial intelligence · Cognitive modelling · Convergent approach · Inverse problem-solving · Non-local semantics

1 Introduction Cognitive modelling is being implemented to accelerate collective decision-making processes in ill-defined and urgent situations [1]. This modelling includes such steps

Present Address: A. Raikov (B) Laboratory of Modular Data Processing and Control Systems, Trapeznikov Institute of Control Sciences Russian Academy of Sciences, Profsoyuznaya st., 65, 117997 Moscow, Russia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_2


as experts identifying factors that characterise the situation, assessing mutual influences between the factors, and checking the correctness of the model by mapping the components of the model onto big data. However, the unreliability of information and subjective influences make cognitive models inadequate to real events. The expert modelling process takes some days or more, but in emergencies, there are only minutes to respond. The issue of automating cognitive modelling therefore arises. The main problem of automating cognitive modelling consists of the impossibility of formalising the cognitive semantics of artificial intelligence (AI) models. Traditional AI tools represent only formalised denotative semantics—by mapping AI models onto logical ontologies, texts, images and big data. But the cognitive semantics of AI models is a non-formalisable and uncaused phenomenon [2, 3]. As an idea, special quantum and wave operators can perhaps transform denotative semantics into cognitive semantics. This could also help cover the non-local nature of cognitive semantics. To make the synthesis of cognitive models more purposeful and sustainable, the Hybrid AI (HAI) approach can also be applied. Hybrid Reality (HR) is a system in which people and AI coexist and affect each other. In HR, human participation can be guided by the instructions of networked strategic conversations [4] and by the idea of social responsibility (SR) [5]. The author's special convergent approach, based on inverse problem-solving on topological spaces, category theory, thermodynamic theory and genetic algorithms, helps create cognitive semantics. This approach ensures the purposefulness and sustainability of the collective decision-making process. Due to the convergent approach, a strategic conversation with 35 brainstorming sessions takes about 4 h [6].

2 Non-formalisable Cognitive Semantics Cognitive semantics represents human free will, consciousness, unconsciousness, experiences, emotions, feelings, thoughts, insights, etc. The interpretation of this semantics is beyond the possibility of traditional formalised AI tools, which use logical schemas, knowledge bases, neuron networks, ontologies, big data, etc. Figure 1 demonstrates the difference between denotative and cognitive semantics. The cognitive semantics can be created indirectly or in a non-local way by revealing the connections between events from outside spaces. Spaces can be logical and transcendental, discrete and continuous, ordered and chaotic, causal and uncaused. The author’s convergent approach [1, 6] helps to take into account the cognitive semantics by ensuring the purposeful and sustainable integration of nonlocal components. In cognitive semantics, the tacit and ill-defined aspects play the main role [7]. Computers cannot handle causality because they cannot be fully immersed in the world [8]. This conclusion shows one of the ways of AI development in the future—it is an extension of cognitive semantics in the quantum, wave and cosmic spheres. The creation of advanced forms of AI is associated with the HAI. It helps to remove the


Fig. 1 Denotative and cognitive semantics

limitation of the discreteness of data presentation in a logical–formalised way. In HR, new capacity reasoning components are constructed [9]. The processes of immersing a human into the HAI system can be represented, for example, by the network strategic conversation and the use of SR principles [4, 6]. The meeting implies collective consensual goal-setting, problem formulation and the elaboration of a plan of action. All remote participants can reach a mutual understanding by exchanging messages. The strategic conversation can use the following methods:

• strategic analysis with proper consideration of 70–120 important factors,
• hierarchy analysis and cognitive modelling,
• direct and inverse problem-solving on the cognitive model, etc.

The usefulness of reproduction mechanisms for adequately elaborating HR was noted, invoking such functions as fertilisation, hybridisation, selection, genetic material manipulation and systemic design [10]. In ISO 26000 [11], devoted to SR, the three basic principles of SR are as follows: the subject's responsibility for its influences on society, interdependence and a holistic approach. These principles are correlated with seven SR concepts: transparency, accountability, respect for stakeholders, etc. Usually, the method of system dynamics is used for analysing dynamic interaction processes. For example, papers [5, 12] propose an approach in which a model uses Causal Loop Diagrams from system dynamics to explain cause-and-effect relations.

3 Uncaused Decisions A human can make appropriate but uncaused decisions [13]. The well-known phenomena of the 'eureka' effect [14] or 'gut feelings' [15] do not have a sufficient causal logical explanation. A lot of phenomena, such as the collective unconscious, the cosmic vacuum, particle-wave duality, the double-slit experiment, cosmic strings, quantum non-locality, etc., have so far eluded persuasive cause-effect explanations. It


may be concluded that every unclear scientific problem can be put into this uncaused trap. Some approaches seek to represent uncaused phenomena by the topology-related mechanisms of projections and mappings. These approaches describe different phenomena in which classical logic no longer holds [16]. For this, different authors turn to the following theorems:

• the Hairy ball theorem [17], which states that there is no non-vanishing continuous tangent vector field on even-dimensional n-spheres,
• the Brouwer fixed point theorem [18], which states that every continuous function from a closed n-ball of every dimension to itself has at least one fixed point,
• the Borsuk–Ulam theorem [19], which states that, provided the function under assessment is continuous, two antipodal points in the higher dimension map to a single point in the lower dimension.

Considering these theorems, mapping objects or events between different spaces, representing different levels of the hierarchy of ordering these objects, changes the traditional understanding of cause-effect relationships cardinally. The fruitfulness of this suggestion was demonstrated clearly on some dynamic systems, in such cases as:

• a topological framework for cellular duplication,
• the divergence of light from a distant galaxy due to gravitational effects,
• particle-wave duality in the double-slit experiment, etc.

The article [20] substantiates that AI systems differ from human intelligence in crucial ways. Truly human-like AI systems have to work beyond current engineering systems. The authors argue that AI machines should build causal models that can:

• explain and understand the world, rather than performing only pattern recognition,
• enrich the knowledge that has been obtained from physics and psychology,
• acquire and generalise knowledge to solve new tasks.

Many AI researchers have criticised the scientific efforts to create AI systems that are identical to—or better than—human intelligence, which is called Strong AI. For example, the author of [21] argued that computers, which have no human body and no cultural history, cannot acquire human intelligence. His main argument is that a computer cannot articulate human knowledge because it is tacit and latent. The construction of cognitive semantics can be influenced by the behaviour of both the brain neurons and the cells of the entire human body. These cells have a quantum nature. The scientific direction of quantum biology confirms the relevance of the quantum approach to studying human behaviour. A broad range of biological processes in which living systems exploit quantum effects is reviewed in [22]. It is shown that quantum effects play a crucial role in maintaining the state of biomolecular systems. Scientists try to find causal explanations of events. But traditional AI and big data analysis methods can give only the level of correlation between different events.


Causality and correlation are not the same. According to [23], a mark of causality is a 100% correlation between cause and effect. But many such correlations are not causal. Traditional AI cannot transform a correlation into causation.

4 Quantum and Wave Semantics It is necessary to turn to the quantum semantics paradigm while constructing AI to consider the behaviour of the subatomic component of the human brain and body in the processes of thinking. Quantum operators can help to enrich the cognitive semantics [2, 24]. There are several phenomena on a quantum level that can be useful for the representation of cognitive semantics. The effects of the collapse of quantum states and the entanglement are of particular interest. The collapse of the quantum state occurs during measurements when a certain observer is included in the process. At this moment, one of the infinite numbers of states of a quantum particle becomes “frozen”. In turn, the quantum entanglement effect demonstrates the phenomenon of non-locality. It reflects the instantaneous interconnection of the states of particles located at large distances [20, 25]. It shows the possibility of an indirect dependence of mental activity on the distant environment. A human’s consciousness looks like a physical field. It can be expressed through biochemical, wave, acoustic and quantum-relativistic means. The wave nature of quantum particles interaction also has to be taken into account. The famous Einstein–Podolsky–Rosen (EPR) thought experiment [26] and Bell test against the local realism [27]—that is, physical locality and relativistic restrictions on causation—have been confirmed experimentally only recently [28]. It has become possible only now due to the modern abilities of informational technologies, which have closed the whole technical freedom-of-choice locality loophole. The results of this experiment strongly contradict local realism; they reject local realism in a wide variety of quantum and relativistic physical systems and demonstrate the possibility of global crowdsourcing techniques in experimental science.

5 Automation of Cognitive Modelling The process of cognitive modelling can be only partially formalisable. For example, the inverse problem-solving method [29], which is used for creating the cognitive semantics, is characterised by incorrectness—the solution may be absent, may not be unique, and there may be no possibility of ensuring a continuous convergence of the solution. These methods work well only in metrical spaces. However, during cognitive modelling, non-metrical spaces are used. Inverse problem-solving on topological spaces can give the necessary condition for convergent decision-making [24]. There are different ways of


Fig. 2 Steps of cognitive modelling automation

inverse problem-solving. Papers [1, 6] suggest the genetic algorithm for inverse problem-solving on a cognitive model. There are three steps in developing the cognitive modelling approach that are suggested for its automation (Fig. 2). In step I, only the skilled experts create the cognitive model. Step II shows the process of automatic verification of the cognitive model and deep learning of the neural network by mapping cognitive models on the relevant big data. Step III is devoted to the automatic creation of the cognitive model. It includes wave and quantum operators.
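The paper gives no implementation details for this step, so the following is only a toy sketch of GA-based inverse problem-solving on a cognitive model, assuming a fuzzy-cognitive-map style formalisation (factor values in [0, 1], an influence-weight matrix, sigmoid state updates); the weights, population sizes and target value are illustrative assumptions rather than anything taken from [1, 6].

```python
import math
import random

# Illustrative influence weights between three factors of a cognitive model.
W = [[0.0, 0.6, 0.0],
     [0.0, 0.0, 0.7],
     [-0.4, 0.0, 0.0]]
N, GOAL, TARGET = 3, 2, 0.8  # goal-factor index and its desired level

def simulate(state, steps=10):
    # Forward problem: iterate sigmoid updates of all factor values.
    for _ in range(steps):
        state = [1.0 / (1.0 + math.exp(-sum(state[i] * W[i][j] for i in range(N))))
                 for j in range(N)]
    return state

def fitness(cand):
    # Inverse problem: how close does this initial state drive the goal
    # factor to the target level after simulation?
    return -abs(simulate(cand)[GOAL] - TARGET)

pop = [[random.random() for _ in range(N)] for _ in range(40)]
for _ in range(60):  # generations
    pop.sort(key=fitness, reverse=True)
    elite = pop[:10]                                        # selection
    children = []
    while len(children) < 30:
        a, b = random.sample(elite, 2)
        child = [random.choice(pair) for pair in zip(a, b)]  # uniform crossover
        i = random.randrange(N)
        child[i] = min(1.0, max(0.0, child[i] + random.gauss(0.0, 0.1)))  # mutation
        children.append(child)
    pop = elite + children

best = max(pop, key=fitness)
print(best, simulate(best)[GOAL])  # candidate initial factor values and outcome
```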

6 Discussion The debatable issue is the creation of AI software and hardware by taking into account the non-formalisable cognitive semantics [24]. The possibility of indirect algorithmisation of various human activities can be realised by including the human himself in the AI system. The use of wave techniques and quantum computations to create cognitive semantics has to be studied in much more detail. The cognitive architectural approach is used for formalising cognitive processes. But there are no cognitive architectures for the representation of the HAI and Strong AI. Then, the deeper issue arises: can the current digital computer fully represent the cognitive semantics? The adequacy of digital solutions for the cognitive semantics representation is not obvious because they cut off wave spectra of images and signals.


7 Results and Conclusions The author of this paper has used convergent and cognitive modelling methods for over 25 years; see, e.g. [6, 24, 30]. They help accelerate getting good solutions to problems such as creating an effective strategic plan for a government or business. However, certain circumstances are forcing a significant speed-up of cognitive modelling, for example, to make it instantaneous in an emergency [1]. Attempts to implement such acceleration have shown the need to turn to deep learning methods and non-classical approaches to construct cognitive semantics based on quantum and wave operators. Initial experimentation has shown that it is possible to improve the quality of automatic cognitive modelling. However, these results need more proof from experiments and scientific research. In this research, an important role might be played by integrating collective intelligence methods, HAI, Strong AI, AGI and the creation of a convergent platform. This platform could be called, for example, a "Cogniscope" because it aims to ensure the sustainability and purposefulness of the processes of gaining a collective insight on problems, including such complex ones as the creation of a unified field theory. It is necessary to create a unified information space for scientific and educational resources, ensuring the integration of information in a convergent way on developments, publications, consulting activities, regulatory information, distance learning, application packages and databases.

Acknowledgements This work is supported by the Russian Science Foundation, grant № 21-18-00184.

References 1. Raikov, A.N.: Accelerating decision-making in transport emergency with artificial intelligence. Adv. Sci. Technol. Eng. Syst. J. (ASTESJ) 5(6), 520–530 (2020). https://doi.org/10.25046/aj0 50662 2. Dalela, A.: Quantum meaning: a semantic interpretation of quantum theory. Shabda Press, Kindle Edition (2012) 3. Kotseruba, I., Tsotsos, J.K.: 40 years of cognitive architectures: core cognitive abilities and practical applications. Artif. Intell. Rev. 53(1), 7–94 (2020). https://doi.org/10.1007/s10462018-9646-y 4. Gubanov, D., Korgin, N., Novikov, D., Raikov, A.: E-expertise: modern collective intelligence. Springer, Series: Studies in Computational Intelligence, vol. 558, XVIII (2014). https://doi. org/10.1007/978-3-319-06770-4 5. Perco, I.: Hybrid reality development—can social responsibility concepts provide guidance? Kybernetes. 50, 676–693 (2020). https://doi.org/10.1108/K-01-2020-0061 6. Raikov, A.: Megapolis tourism development strategic planning with cognitive modelling support. In: Yang, X.S., Sherratt, S., Dey, N., Joshi, A. (eds.) Fourth International Congress on Information and Communication Technology (London), Advances in Intelligent Systems and Computing, vol. 1041. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-06376_12


7. Polanyi, M.: Personal Knowledge. Routledge & Kegan Paul, London (1958) 8. Fjelland, R.: Why general artificial intelligence will not be realized. Humanit. Soc. Sci. Commun. 7(10) (2020). https://doi.org/10.1057/s41599-020-0494-4 9. Samsonovich, A.V.: Socially emotional brain-inspired cognitive architecture framework for artificial intelligence. Cogn. Syst. Res. 60, 57–76 (2020). https://doi.org/10.1016/j.cogsys. 2019.12.002 10. Schwaninger, M.: Managing complexity—the path toward intelligent organizations. Syst. Pract. Action Res. 13(2), 207–241 (2000). https://doi.org/10.1023/a:1009546721353 11. ISO 26000: Social responsibility. ISO (2010) 12. Guzman, A.L.: A strategic and dynamic land-use transport interaction model for Bogota and its region. Transp. B Transp. Dyn. 7(1), 707–725 (2019). https://doi.org/10.1080/21680566. 2018.1477636 13. Chomsky, N.: Language and nature. Mind New Series 104(413), 1–61 (1995). https://doi.org/ 10.1093/mind/104.413.1 14. Perkins, D.: The Eureka Effect. The Art and Logic of Breakthrough Thinking. W.W. Norton & Company, NY, London (2000) 15. Gigerenzer, G.: Gut Feelings. The Intelligence of the Unconscious. Viking, London (2007) 16. Tozzi, A., Papo, D.: Projective mechanisms subtending real world phenomena wipe away cause effect relationships. Prog. Biophys. Mol. Biol. 151, 1–13 (2020). https://doi.org/10.1016/j.pbi omolbio.2019.12.002 17. Eisenberg, M., Guy, R.: A proof of the Hairy Ball theorem. Am. Math. Mon. 86(7), 571–574 (1979). https://doi.org/10.2307/2320587 18. Crabb, M.C., Jawaworski, J.: Aspects of the Borsuk-Ulam theorem. J. Fixed Point Theor. Appl. 13, 459–488 (2013). https://doi.org/10.1007/s11784-013-0130-7 19. Matoušek, J.: Using the Borsuk–Ulam theorem. Lectures on Topological Methods in Combinatorics and Geometry. Springer-Verlag Berlin Heidelberg (2003). https://doi.org/10.1007/9783-540-76649-0 20. Lake, B.M., Ullman, T.D., Tenenbaum, J.B., Gershman, S.J.: Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2017). https://doi.org/10.1017/S0140525X160 01837 21. Dreyfus, H.L.: What computers can’t do: the limits of artificial intelligence. Harper & Row, Harper Colophon Books; CN 613, NY (1979) 22. Kim, Y., Bertagna, F., D’Souza, E.M., et al.: Quantum biology: an update and perspective. Quantum Rep. 3, 1–48 (2021). https://doi.org/10.3390/quantum3010006 23. Mill, J.S.: A system of logic, ratiocinative and inductive: being a connected view of the principles of evidence, and the methods of scientific investigation. Cambridge University Press (2011).https://doi.org/10.1017/CBO9781139149839 24. Raikov, A.: Cognitive semantics of artificial intelligence: a new perspective. Springer Singapore, Topics: Computational Intelligence XVII (2021). https://doi.org/10.1007/978-981-336750-0 25. Zwiebach, B.: Entanglement. https://ocw.mit.edu/courses/physics/8-04-quantum-physics-i-spr ing-2016/video-lectures/part-1/entanglement/. Last accessed 2021/02/25 26. Einstein, A., Rosen, N., Podolsky, B.: Can quantum-mechanical description of physical reality be considered complete? Phys. Rev. 47, 777–780 (1935). https://doi.org/10.1103/PhysRev. 47.777 27. Bell, J.S.: On the Einstein–Podolsky–Rosen paradox. Physics 1(3), 195–200 (1964). https:// doi.org/10.1103/PhysicsPhysiqueFizika.1.195 28. The BIG Bell Test Collaboration, Abellán, C., Acín, A., et al.: Challenging local realism with human choices. Nature 557, 212–216 (2018). https://doi.org/10.1038/s41586-018-0085-3 29. Ivanov, V.K.: Incorrect problems in topological spaces. 
Siberian Math. J. 10, 785–791 (1969) (in Russian) 30. Raikov, A.: Health care trust space based on collective artificial intelligence and blockchain technologies. J. eHealth Technol. Appl. 16(1), 37–42 (2018)

Using a Humanoid Robot to Assist Post-stroke Patients with Standardized Neurorehabilitation Therapy Peter Forbrig, Alexandru Bundea, Ann Pedersen, and Thomas Platz

Abstract Worldwide, the number of people living with stroke-related disability is increasing. Neurorehabilitation, e.g. training therapy provided by occupational therapists and physiotherapists, effectively reduces impairment and activity limitations. Yet the number of physiotherapists and occupational therapists is not sufficient to cope with the increasing demand. The paper proposes the hypothesis that a social humanoid robot might serve as a therapeutic assistant for patients during standardized training sessions, after therapists have evaluated a patient’s needs, decided on an individualized training program, and made the patient acquainted with it. The paper first describes what kinds of training tasks are intended to be supported by a social humanoid robot acting as training assistant. Second, the digitalization of those tasks is presented. Third, the process of building user models for patients and helping persons using the repertory grid approach is discussed. Finally, opportunities for motivating interactions based on these models are mentioned. Keywords Social humanoid robot · Interaction design · Robot assistance · Stroke rehabilitation · Repertory grid · User model · Arm basis training

P. Forbrig (B) · A. Bundea Department of Computer Science, University of Rostock, Albert-Einstein-Str. 22, 18055 Rostock, Germany e-mail: [email protected] A. Bundea e-mail: [email protected] T. Platz Institut Für Neurorehabilitation Und Evidenzbasierung, An-Institut der Universität Greifswald, BDH-Klinik Greifswald, Karl-Liebknecht-Ring 26a, 17491 Greifswald, Germany e-mail: [email protected] A. Pedersen · T. Platz Neurorehabilitation Research Group, Universitätsmedizin Greifswald, Fleischmannstraße 44, 17491 Greifswald, Germany e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_3


1 Introduction Currently, in our society, disability is often caused by stroke [13]. Unfortunately, the related brain lesions can affect various body functions. They can lead to activity limitations and other restrictions of social life. To overcome the disability, intensive specific training activities are necessary to promote the recovery of body functions. Training therapies have been designed for stroke survivors with severe arm paresis, e.g. the arm basis training (ABT) [9, 10] and the mirror therapy (MT) [16]. While therapeutic training is clinically effective if it is both specific and of high enough intensity, many healthcare systems lack the therapists needed to implement intensive training schedules for post-stroke patients. Within our project E-BRAiN (evidence-based robot assistance in neurorehabilitation; www.ebrain-science.de) [3], we want to study whether humanoid robots can support therapists if they are designed to be socially interactive companions for therapies that require daily repetitive training schedules. Therefore, we developed software that allows a humanoid robot to give instructions for carefully selected training exercises and to provide feedback and motivation. Within that design, the robot’s task is not to make therapeutic decisions, but to continue a repetitive training schedule that has been decided on, individually adapted, and introduced to a stroke survivor by a human therapist. We followed a human-centred design (HCD) approach that was adapted from Harte et al. [7]. According to this approach, one has to study the tasks and contexts carefully. Some of the exercises for post-stroke patients are presented at the beginning. Afterwards, we focus on modelling patients and helpers to cope with the challenges of using a social robot for providing instructions and individual feedback. Related work is discussed in the fourth section. The paper closes with a summary and an outlook.

2 Training Tasks The clinical members of our E-BRAiN team selected the arm basis training (ABT) and the mirror therapy (MT) as candidates to be implemented digitally. These training therapies are highly standardized, require daily repetitive training schedules, and have been shown to be clinically effective. We briefly describe these therapies below.


Fig. 1 Examples of the ABT training movements with a therapist (from [12])

2.1 Arm Basis Training (ABT) The arm basis training has been designed for patients with severe arm paresis. It focuses on the capacity for selective movements. Within a systematic training structure, all segments of the arm and hand are repetitively trained. The training starts with single-joint movements without anti-gravity control (phase 1), continues with single-joint movements with anti-gravity control (phase 2), and ends with multi-joint movements (phase 3); training schedules across these phases are individualized and adjusted over the course of recovery [9]. Figure 1 gives an impression of the therapy in phase 1. The therapy starts with repetitive attempts to selectively move the affected limb in a single joint without the need to control the factor gravity, meaning that the therapist holds the patient’s extremity while she or he actively and repetitively attempts to move a single segment of the limb. Figure 1 illustrates examples of such movements that are repetitively trained.

2.2 Mirror Training An alternative treatment option for stroke survivors with severe arm paresis is the mirror therapy, in which a mirror is placed on a table between the arms. In this setting, the mirror image of the moving non-affected limb provides the illusion of “normal” movements of the even severely affected paretic limb [16]. By this setup, different brain network regions for movements of the affected arm are stimulated (by movement observation), which promotes the recovery of movement function in the paretic limb. During mirror therapy, a patient sits at a 90° angle to a mirror placed on a table, with the healthy arm in front of the mirror and the involved arm behind it (see Fig. 2). The patient receives instructions to exercise the non-affected arm, to look into the mirror, and to imagine that the affected arm is moving.


Fig. 2 Situation at the beginning of a mirror therapy. The paretic arm is positioned behind the mirror

This imagination stimulates the brain areas around the stroke-affected regions, promotes the recovery of motor control in these areas, and hence can result in a recovery of the ability to move the affected arm.

3 Robot-Supported Training Tasks Within our project E-BRAiN, we want to develop a therapeutic system in which a humanoid robot can assist a human therapist. In this conceptual framework, the background for any robot-supported training therapy is the assessment, goal-orientation, selection of the appropriate therapy, information and consent about any findings and plans with the patient, and equally the introduction of the training tasks to the patient, all performed by a human therapist. Thereafter, the human therapist also supervises the first cycles of training performance. The human therapist introduces the patient to a specific training, e.g. the ABT or mirror therapy, makes individual adjustments, and gets the patient acquainted with the training. At this stage, there might be a role for a social robot to support additional, largely standardized training sessions. Sensors observe the training tasks. In this way, the robot can be informed electronically about the results and hence about the training tasks performed and any progress made. This information can be permanently stored and is thus available for detailed evaluations. However, there is still the problem of appropriate communication with the patient. What should the robot say if one or several training tasks go wrong? This will of course depend on the characteristics of the patients. However, what are these characteristics? We discuss this aspect in the following subsection.


3.1 Model of a Patient The implemented training therapies are both standardized and in need of individualization (individual adaptation). Both the training strategies and their individual adaptations based on patient characteristics, and hence the design of the expert system, were based on prior knowledge (e.g. [9–11]). Accordingly, the patient model consists of general personal data like name, gender, and age. Further, standardized assessment scores were identified that make it possible to characterize the different clinical presentations that are relevant for therapeutic decisions and therapist–patient communication. They are grouped into three categories. The first two categories are the National Institutes of Health Stroke Scale (NIHSS), providing standardized information about stroke sequelae in terms of neurologically impaired body function (e.g. degree of paresis, presence of visual or speech disorders, etc.), and the Hospital Anxiety and Depression Scale (HADS), indicating the degree of psychological distress someone experiences. The third category is called “assessment” and consists of standardized scores that measure the current level of capacity a stroke survivor has in the domain to be trained, like the box-and-block test used to measure gross manual dexterity, the nine-hole-peg test measuring finger dexterity, or the Fugl-Meyer arm motor score indicating the degree of ability to perform selective movements in the various segments of the affected arm. All these scores are assessed by physicians and therapists before the therapy starts and can be used during therapy by the humanoid robot for individualized patient care and communication. As an example, the robot might explain problems with a specific exercise by referring to a specific score. The humanoid robot is conceptualized as a therapeutic assistant only, i.e. it supervises training sessions only after the therapeutic decision has been taken and after the standardized training schedules have been individually adapted and evaluated in a therapeutic session with the human therapist. All knowledge gained during that process and the specified decisions regarding individualization of therapy are documented and become part of the user model that the humanoid robot will use. We discussed further aspects of a user model for patients in a brainstorming meeting with psychologists and physicians in our project team. As a consequence, a personal goal of the patient was considered to be important. This goal is used to motivate the patient during the exercises. A patient might have the following goal: “I would like to be able to play canasta with my friends again.” Currently, he might not be able to hold and draw cards. During the exercises, the robot might come back to this goal and say “Well done! Your exercises were really well performed. Soon, you will be able to play canasta with your friends again.” All these attributes of patients constitute explicit knowledge and resulted in a first version of a user model with a related database. All personal data are protected within the system by data encryption and organizational measures that restrict access to authorized personnel only. In addition to this user model conceptualization based on prior knowledge, we felt that additional relevant information might be implicitly available to therapists. In requirements engineering, we often use the repertory grid technique that was first discussed in a publication by Kelly [8]. With this technique, one has to arbitrarily select three objects and explain what two of them have in common that distinguishes them from the third. We asked two therapists about their subjective perception regarding patient characteristics of a sample of patients receiving the type of rehabilitation therapy addressed. As a result, we elicited 170 attributes. About 30 of them seem to be promising candidates to further refine the user model. Examples are perceived motivation, need of attention from the therapists, degree of frustration during a training session, signs of forgetfulness, cognitive abilities in a more general sense, and number of breaks needed, to mention only a few of the analysed attributes. The clinical team is analysing whether such perceptions of patient characteristics can reliably be assessed and whether they are relevant for human–patient interactions during therapeutic sessions.
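To illustrate the explicit part of this user model, the following is a hedged Java sketch of a patient record holding the score categories named above. The field names, types, and the helper method are our own illustrative assumptions, not the E-BRAiN project’s actual database schema:

```java
// Hedged sketch of the explicit patient user model described in Sect. 3.1.
// Field names and types are illustrative assumptions, not the E-BRAiN schema.
public class PatientModel {
    // General personal data
    String name;
    String gender;
    int age;

    // Category 1: NIHSS — standardized information about stroke sequelae
    int nihssTotal;

    // Category 2: HADS — degree of psychological distress
    int hadsAnxiety;
    int hadsDepression;

    // Category 3: "assessment" — current capacity in the trained domain
    int boxAndBlockScore;      // gross manual dexterity
    double nineHolePegSeconds; // finger dexterity
    int fuglMeyerArmScore;     // selective arm movements

    // Personal treatment goal used by the robot for motivating feedback
    String personalGoal;       // e.g. "play canasta with my friends again"

    /** Example of how the robot might ground a motivating utterance in the model. */
    String motivationalMessage() {
        return "Well done! Soon you will be able to " + personalGoal + ".";
    }
}
```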

3.2 Model of a Helper The arm basis training (ABT) poses a special challenge for establishing user models (see Fig. 1). When human therapists perform the ABT together with stroke survivors, they need to provide physical help, e.g. to hold the arm and provide anti-gravity weight support, while the patient is asked to move single segments of her/his limb without having to control the limb’s weight or to posturally stabilize the limb. If the ABT were to be supported by a humanoid robot, such physical support would have to be provided by a human helper, while the humanoid robot would provide the therapeutic supervision. Accordingly, it is intended that a friend or a relative plays the role of a helper after being instructed to do so by a human therapist. In this case, the robot has to cope with two user models, i.e. one for the patient and one for the helper. This is a very uncommon situation in computer science. To obtain some information related to this scenario, we asked two therapists to provide observations made during ABT sessions. Forty-six characteristics were identified. They help us to specify a user model for the arm basis training, where a helper is necessary to support a patient. In addition, the characteristics can help us to learn about the behaviour of therapists that could be imitated by the humanoid robot. As an example, it was recognized that therapists never criticize patients using socially negative expressions. They always communicate in a positive, encouraging way. Also, therapists at times provide a set of answer options when they ask a patient a question. In this way, it is sometimes easier for the patient to decide how to answer. The current model of the helper for the ABT is only a starting point, and further analyses will follow to refine the model.


4 Related Work There are several attempts to use robots for rehabilitation. As early as 2013, Choe et al. [2] studied a combination of physical and speech training for stroke patients with a robot. The robot was called humanoid, but it looked like a machine with a TV on top. Nevertheless, positive results were achieved [5]. Feng et al. [4] describe how a Nao robot interacts with children that have autism spectrum disorder (ASD). The robot guides a child to imitate its actions. The motivational aspects are different from our domain. The authors write, e.g.: (1) If the child makes a wrong action, the robot will point out the mistake and ask the child to do it again. If the child fails three times, this action will be abandoned. (2) If the child is distracted, the robot reminds him to focus attention. (3) When the child completes the task correctly, the robot praises him; when the child completes the task incorrectly, the robot encourages him. The architecture CARAI they provide is fully implemented on the robot. This is in contrast to our approach, where we use a kind of “thin robot”: the software for dialogues and other services runs on servers external to Pepper. Figure 3 gives an impression of the architecture we use. The communication between the different software components is executed via an MQTT server. Therapy sessions are performed with a flickboard. Since some patients might have problems pronouncing a “yes” in such a way that Pepper can understand it, patients can use this flickboard to inform the system that they are ready by putting a hand on it. The two tablets are necessary for the arm-ability training, and the touch screen for neglect patients. The interaction server interprets dialogue scripts written in Python and in this way controls the whole application. We have been working on a domain-specific language that allows the generation of the Python scripts (see e.g. [6]). A visualization tool for hierarchical state diagrams allows the identification of errors like missing transitions or wrong guards.

Fig. 3 Software architecture of our E-BRAiN system


User models and the therapy sessions are managed by a Web-based administration tool. This architecture allows further tools to be integrated easily: only MQTT clients have to be implemented, and messages have to be specified. The receivers have to be extended by an interpretation of the new messages. Schrum et al. [15] report a similar application of using a Pepper robot for encouraging exercise with dementia patients. The robot demonstrates an exercise and encourages the patient to copy the move. Additionally, Pepper also verbally encourages a patient by saying sentences like “Do not slow down”. Unfortunately, it is not explained how situations for encouragement are detected and how it is decided what to say. Pulido et al. [13] provide an architecture for a mirror game with the Nao robot that uses the robot only for providing instructions. A Kinect sensor is used for perceiving information. Next actions are selected by a decision support system. The Nao is used, similar to our Pepper, as a kind of thin robot. However, communication seems to be performed directly. Blankenburg et al. [1] provide a state machine diagram of the architecture flow upon issue detection. The idea of using state machines for spoken dialogues can already be found in Raux and Eskenazi [14]. Polak and Levy-Tzedek [12] developed a novel gamified system for post-stroke long-term rehabilitation using the humanoid robot Pepper. In one game, a patient has to order the jars on a shelf according to the picture displayed on the robot’s tablet. Patient and robot are located on different sides of the shelf and can see each other. Game instructions and feedback are provided by Pepper. The system is a non-contact system, meaning that there is no physical interaction between Pepper and the patient. The authors mention the following feedback: “After each trial, the robot either gives the patient feedback on the timing (e.g. ‘try to do it faster next time’) or on the success on the task (e.g. ‘you succeeded!’, ‘you were not right this time, but I’m sure you will make it next time!’)”. The approach shares similarities with our ideas. However, both the therapeutic approach and the software architecture are different: all software is running on the robot. Overall, the cited work used implementations of therapeutic interventions based on a humanoid robot platform that had less complex structures, while the presented E-BRAiN system is set forth to provide neurorehabilitation therapy that closely resembles complex evidence-based training strategies and integrates complex user models for individualized robot-based therapy.
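As noted above, attaching a further service to the MQTT-based architecture only requires implementing an MQTT client and agreeing on the messages. The following is a minimal, hypothetical Java sketch of such a client using the Eclipse Paho library; the broker address, topic names, and payload format are invented for illustration and are not taken from the E-BRAiN project:

```java
import org.eclipse.paho.client.mqttv3.*;

// Hypothetical sketch of an additional E-BRAiN service attached via MQTT.
// Broker URI, topic names, and message payloads are illustrative assumptions.
public class TrainingEventLogger implements MqttCallback {

    public static void main(String[] args) throws MqttException {
        MqttClient client = new MqttClient("tcp://localhost:1883", "training-event-logger");
        client.setCallback(new TrainingEventLogger());
        client.connect();
        // Subscribe to (assumed) topics on which sensors publish task results.
        client.subscribe("ebrain/session/+/taskResult");
        // Publish an (assumed) registration message so other components know us.
        client.publish("ebrain/services/register",
                new MqttMessage("training-event-logger".getBytes()));
    }

    @Override
    public void messageArrived(String topic, MqttMessage message) {
        // A real service would interpret the agreed message format here.
        System.out.println(topic + ": " + new String(message.getPayload()));
    }

    @Override
    public void connectionLost(Throwable cause) { /* reconnect logic would go here */ }

    @Override
    public void deliveryComplete(IMqttDeliveryToken token) { }
}
```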

5 Summary and Outlook In this paper, we discussed ideas for training tasks supported by the humanoid robot Pepper for the arm basis training and the mirror therapy for stroke survivors with moderate to severe arm paresis. The humanoid robot acts as a therapeutic assistant, providing instructions and giving feedback on how patients perform their training tasks. In addition, we discussed a software architecture that enables the communication of different devices and software apps via the MQTT technology. This architecture allows a smooth integration of further external services.


Currently, we are extending our user model. Besides the existing relevant clinical personal data and individual details regarding the prescribed training, the user model also contains personal treatment goals of the patients. Further personal and interaction characterizations are intended to be included in the future. Brainstorming sessions with therapeutic experts were performed to extract candidates for such characteristics to be integrated into the user model. On the one hand, the robot will refer to various information sources, including clinical data, training data, and individualized goals, in order to motivate a patient. On the other hand, further identified characteristics might be used to modify and individualize the humanoid robot’s behaviour. Further, apps on tablets and touch screens for other training tasks will be developed and evaluated. Finally, the E-BRAiN project will evaluate whether patients achieve relevant clinical benefits by training with the system and whether they appreciate the collaboration with a humanoid robot based on our robot apps. Acknowledgements This joint research project “E-BRAiN—Evidence-based Robot Assistance in Neurorehabilitation” is supported by the European Social Fund (ESF), reference: ESF/14-BM-A550001/19-A02, and the Ministry of Education, Science and Culture of Mecklenburg-Vorpommern, Germany. This work was further supported by the BDH Bundesverband Rehabilitation e.V. (charity for neuro-disabilities) by a non-restricted personal grant to TP. The sponsors had no role in the decision to publish or any content of the publication.

References

1. Blankenburg, J., Zagainova, M., Simmons, S.M., Talavera, G., Nicolescu, M., Feil-Seifer, D.: Human-robot collaboration and dialogue for fault recovery on hierarchical tasks. In: Wagner, A.R. et al. (eds.) Social Robotics, pp. 144–156. Springer International Publishing, Cham (2020)
2. Choe, Y.-K., Jung, H.-T., Baird, J., Grupen, R.A.: Multidisciplinary stroke rehabilitation delivered by a humanoid robot: interaction between speech and physical therapies. Aphasiology 27, 252–270 (2013)
3. E-BRAiN: https://wwwswt.informatik.uni-rostock.de/webebrain/. Last visited 30 Sept 2020
4. Feng, Y., Jia, O., Wei, W.: A control architecture of robot-assisted intervention for children with autism spectrum disorders. Hindawi J. Robot. 2018, 1–12 (2018). https://doi.org/10.1155/2018/3246708
5. Forbrig, P., Bundea, A., Platz, T.: Assistance app for a humanoid robot and digitalization of training tasks for post-stroke patients. In: Zimmermann, A., Howlett, R.J., Jain, L.C. (eds.) Proceedings of Human Centred Intelligent Systems, pp. 41–51. Springer, Singapore (2021)
6. Forbrig, P., Bundea, A., Pedersen, A., Platz, T.: Digitalisation of training tasks and specification of the behaviour of a social humanoid robot as coach. In: HCSE Conference, Eindhoven, The Netherlands, December 2020. Springer LNCS, vol. 12481, pp. 45–57 (2020)
7. Harte, R., Glynn, L., Rodríguez-Molinero, A., et al.: A human-centered design methodology to enhance the usability, human factors, and user experience of connected health systems: a three-phase methodology. JMIR Hum. Factors 4(1), e8 (2017)
8. Kelly, G.A.: The Psychology of Personal Constructs. Norton, New York (1955)
9. Platz, T.: Impairment-oriented training (IOT)—scientific concept and evidence-based treatment strategies. Restor. Neurol. Neurosci. 22(3–5), 301–315 (2004)


10. Platz, T., van Kaick, S., Mehrholz, J., Leidner, O., Eickhof, C., Pohl, M.: Best conventional therapy versus modular impairment-oriented training (IOT) for arm paresis after stroke: a single-blind, multi-centre randomized controlled trial. Neurorehabil. Neural Repair 23, 706–716 (2009)
11. Platz, T.: Impairment-Oriented Training—Official Homepage. Retrieved February 24, 2021 from http://www.iotraining.eu/abt.html (2019)
12. Polak, R.F., Levy-Tzedek, S.: Social robot for rehabilitation: expert clinicians and post-stroke patients’ evaluation following a long-term intervention. In: Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction (Cambridge, United Kingdom) (HRI’20), pp. 151–160. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3319502.3374797
13. Pulido, J.C., Suarez-Mejias, C., Gonzalez, J.C., Ruiz, A.D., Ferri, P.F., Sahuquillo, M.E.M., Ruiz De Vargas, C.E., Infante-Cossio, P., Calderon, C.L.P., Fernandez, F.: A socially assistive robotic platform for upper-limb rehabilitation: a longitudinal study with pediatric patients. IEEE Robot. Autom. Mag. 26(2), 24–39 (2019). https://doi.org/10.1109/MRA.2019.2905231
14. Raux, A., Eskenazi, M.: A finite-state turn-taking model for spoken dialog systems. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Boulder, Colorado) (NAACL’09), pp. 629–637. Association for Computational Linguistics, USA (2009)
15. Schrum, M., Park, C.H., Howard, A.: Humanoid therapy robot for encouraging exercise in dementia patients. In: Proceedings of the 14th ACM/IEEE International Conference on Human-Robot Interaction—HRI ’19, Daegu, Republic of Korea, pp. 564–565. IEEE Press (2019)
16. Thieme, H., Morkisch, N., Mehrholz, J., Pohl, M., Behrens, J., Borgetto, B., Dohle, C.: Mirror therapy for improving motor function after stroke. Cochrane Database Syst. Rev. 2018(7) (2018). https://doi.org/10.1002/14651858.CD008449.pub3

Human Resource Information System in Healthcare Organizations Hawraa Aref Al-Mutawa and Paul Manuel

Abstract The human resource information system (HRIS) is a significant module of a healthcare information system (HIS). It has two kinds of attributes: functional and non-functional. The functional attributes include the payroll system, employee database, hiring process, and employee life cycle. The non-functional attributes of HRIS include ethics, privacy, legal, social, and professional issues. This research focuses on the implementation of non-functional HR attributes in HRIS. The research restricts its study to Kuwait hospitals. The result of this research shows that an effective human resource information system (HRIS) has a strong impact on the quality of healthcare services. A review of previous literature and a cross-sectional survey were used to analyze and present the data and information. Keywords Healthcare information system · Human resources information system · Functional and non-functional attributes

1 Introduction Many studies have discussed the importance of a human resource information system (HRIS) in developing high-quality services in healthcare. Moreover, functional and non-functional attributes have a huge impact on improving the performance of the working staff in hospitals. HRIS can make a significant difference among healthcare institutions through better performance [1]. The HRIS is expected to be in compliance with the industrial model of management [2]. However, hospitals and healthcare institutions are not manufacturing plants. They are highly knowledge-intensive and service-oriented and thus require a specific HRIS to support them. Human resources are very important because they provide a competitive advantage for healthcare institutions. With the globalized world, the stabilized national economy, and the increasingly competitive market, healthcare institutions increasingly feel the need to develop an effective HRIS and to solve the current and future challenges that affect healthcare services, in order to differentiate themselves from their competitors.

H. A. Al-Mutawa (B) · P. Manuel Department of Information Science, College of Life Sciences, Kuwait University, Kuwait City, Kuwait e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_4


Through surveys among hospitals in Kuwait that have human resource professionals, it was possible to identify essential non-functional attributes that affect the performance of HRIS. The research starts with the definition of HRIS and its importance, its role in the healthcare industry, its responsibilities, its challenges, and its functional and non-functional attributes. The study then examines the challenges of the HRIS in the healthcare sector in Kuwait. It further assesses the challenges imposed by HRIS in general and the future of HRIS. The research shows that an effective HRIS has a strong impact on the quality of healthcare services. Besides, it helps to improve the performance of the hospital’s staff.

2 What Is a Human Resource Information System HRIS is an information system that is in charge of managing the human resources of an organization. The functional attributes of HRIS include public relations, recruitment, training, support, records, assessment and rewarding of employees, managing organizational leadership, and the human resource management process [3]. It is important to differentiate between a human resource information system (HRIS) and a healthcare information system (HIS). HRIS is a component of HIS. While the HRIS focuses on the hospital staff and employees, the HIS focuses on the patients [4]. Several definitions in the literature describe HRIS, and most of them are illustrated by functions and responsibilities. Some of these responsibilities are planning, directing, and coordinating the executive functions of an organization; they oversee the recruitment, interviewing, and hiring of new staff as well as the training of both current and new staff. On the other hand, a few studies, such as the EU health commission report, give attention to non-functional attributes of the system such as privacy, safety, ethical, professional, social, and legal issues [5, 6]. Heru et al. [7] define HRIS as “a systematic procedure for collecting, storing, maintaining, displaying, and validating data required by the organization on human resources, human resource activities, and organizational unit characteristics.” In general, we can say that HRIS is a system that enables human resource professionals to be more effective and efficient and gives them a high ability to support the strategic decision-making process in any institution [8].


3 Roles of Human Resources Information System and its Importance in Healthcare A human resource information system (HRIS) can contribute to executing most human resource responsibilities effectively and efficiently. The areas for which HRIS is responsible are listed below [9]:

1. Recruiting
2. Hiring
3. Training
4. Organization development
5. Communication
6. Performance management
7. Coaching
8. Policy recommendations
9. Salary and benefits
10. Team building
11. Employee relations
12. Leadership.

Several researchers have shown that organizations that have developed and invested in association with HR are likely to gain a more cohesive vision and higher employee fulfillment and performance, which directly affect the patient experience and the service quality [10]. Moreover, healthcare institutions are suffering from a continuing labor shortage, and that shortage may last for years. Consequently, the HRIS can help healthcare institutions to make employment decisions. In addition, it can help in evaluating valuable human resources well. Hussain et al. [11] stressed the importance of information systems in the human resource department of any organization, even for non-strategic purposes. The availability of such a system may contribute to reducing staffing levels for routine administration and to gaining high organizational efficiency.

4 Functional Attributes HRIS has two kinds of attributes: functional and non-functional. The functional attributes include the employee database, the payroll system, the hiring process, and the employee life cycle. The functional requirements of HRIS are given as follows [12].


4.1 Employee Database An employee database is a centralized database that organizes and manages all the employee data and information.

4.2 Employee Life Cycle [12]

1. Recruit
• Includes the planning and hiring process of a new employee.

2. Onboard
• New employees become part of the company’s workforce.
• The company provides the employee with the information and tools to work and integrate into the company culture.
• The new employees learn about the policies of the company, the procedures, and the job responsibilities.
• An employee who has just been accepted into the job starts training.

3. Develop
• Ensuring continued employee development.
• Employee performance is assessed.
• Feedback is provided to employees on their work through performance reviews and meetings.
• Employees are referred to additional training or counseling.

4. Retain
• Includes continuous evaluation, appreciation, upgrades, and salary modification to ensure that employees stay engaged and are retained.

5. Offboard
• This is the last stage of the employee life cycle.
• The employee is informed by HR about final pay and benefits.
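The life cycle above can be viewed as a simple linear state machine. The following is a minimal, illustrative Java sketch; the stage names are taken from the list, while the strictly linear transition order is our simplifying assumption (in practice, Develop and Retain recur):

```java
// Minimal illustrative sketch of the employee life cycle as a state machine.
// Stage names come from Sect. 4.2; the linear transition order is an assumption.
public enum EmployeeLifeCycleStage {
    RECRUIT, ONBOARD, DEVELOP, RETAIN, OFFBOARD;

    /** Returns the next stage, or null once the employee is offboarded. */
    public EmployeeLifeCycleStage next() {
        EmployeeLifeCycleStage[] stages = values();
        int i = ordinal();
        return (i + 1 < stages.length) ? stages[i + 1] : null;
    }

    public static void main(String[] args) {
        // Walk an employee through the whole cycle.
        for (EmployeeLifeCycleStage s = RECRUIT; s != null; s = s.next()) {
            System.out.println("Current stage: " + s);
        }
    }
}
```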

4.3 Hiring Process The hiring process includes eight stages [12], given as follows:

1. Identifying hiring needs before a position can be filled.
2. Planning, which includes a timeline, a recruitment plan, criteria for initial candidate screening, a selection committee, interview questions, and monitoring of the selection process.
3. The decision on the qualification, which includes the selection process, committee, contract, and compensation.
4. Posting and promoting job openings by advertising internally and externally.
5. Application screening using an applicant tracking system (ATS) to select qualified candidates.
6. Interviews and tests.
7. Background checks and reference checks.
8. Onboarding.

4.4 Payroll System A payroll system involves everything that has to do with the payment of employees and the filing of employment taxes.

5 Non-functional Attributes The non-functional attributes of HRIS include ethics, privacy, legal, social, and professional issues. The EU health commission released a report in 2016 stating that HRIS was lacking non-functional attributes [5]. This research focuses on the implementation of non-functional HR attributes in HRIS and restricts its study to Kuwait hospitals. Non-functional requirements are constraints that are not explicitly conveyed as requirements but can affect the performance and efficiency of HRIS. These are risks that can cause the system to fail. A list of non-functional requirements identified in the survey is shown in Table 1 [5].

6 Challenges of Human Resources Information System The HRIS is a very vital resource in the healthcare industry. When examining healthcare systems, numerous human resource concerns and inquiries arise. Some of the problems are employee turnover, retention and training issues, cost, employee satisfaction, poor performance, and workforce accountability and responsibility. Despite all these challenges, hospitals must find ways to overcome these difficulties. No doubt the area is growing, and the challenges are likely to be solved over time. If the stakeholders come together and take appropriate initiatives, they can help to solve the difficulties.


Table 1 Non-functional attributes

Ethical – Protect the system and its data from any unethical practices and certify the core ethical values and obligations
Legal – Protect the system and its data from any illegal practices, and make sure that all system users follow the legal policies and practices
Safety – Protect and defend the system and its data, and ensure that records, property, activities, documentation, and information are effectively protected against any security threats
Quality – The capability of the system to handle the traffic and load of work needed to meet the hospital’s needs
Privacy – Apply relevant measures to avoid breaches/leaks of personal data and of sensitive data
Personal issues – Employees’ family issues, race, gender, and nationality are some examples of personal issues
Standards – Providing and clarifying the policies, rules, and regulations to all users makes the system effective
Professional issues – The ability to make professional decisions when the guidance is unclear, and sending employees to short courses for training and development

7 Literature Review Many studies have been conducted to discuss and cover the topics of human resources, the role of HRIS, and their impact on service quality in healthcare. In this section, we illustrate different works of literature that discuss HRIS in the health sector. According to Tangos [1], human resource directors have an important role and impact in saving people’s lives in healthcare institutions. Therefore, the information system in the healthcare human resource department has a vital role in ensuring an effective workflow in the organization. Heru et al. [7] discuss important roles of the system. These roles are “planning activities, recruitment, selection, development of human resources, promotion, assessment of work, and remuneration.” The authors found that the information system’s roles are limited due to the limited functions of the system, which need to be developed; a limited function can be, for example, gathering some data or submitting reports by employees. The information system would be able to contribute as a decision-making system in human resource planning if some development were done. On the other hand, Driessen et al. [11] highlight that the information system in healthcare institutions had a prominent role in improving daily tasks. In addition, the information system had an effective role in the processes of recruiting, selecting the best candidates, and training employees in skills that needed to be developed. Also, this system has many benefits that can extend to work rewards in the health sector, which in turn raise the level of performance in the healthcare organization.


Siregar and Dachyar [13] identify important criteria of HRIS that have an impact on the performance of the human resource system, namely “high-quality data presentation, quick and precise, accessible, information need in time, and fulfill needs of HR.” The two authors use the DEMATEL-based ANP (DANP) method to identify these criteria. Additionally, Hussain et al. [14] stressed the importance of information systems in the human resource departments of any organization for non-strategic purposes; the availability of such systems may contribute to reducing staffing levels for routine administration and gaining high organizational efficiency. Lewis [15] states that “information technology has a huge impact in developing and changing the healthcare system. A different survey and case studies show that more investment in information technology will improve the services in healthcare institutions by enhancing team collaboration, improving performance, and boosting engagement and retention efforts.” Aizhan [16] has “concentrated on HRIS technology that can help in collecting, storing, and reporting data which help in solving the challenge of staff shortages in healthcare. The finding of this work is that using this technology to have a balance between the drive for innovation, productivity and efficiency and respect for all potential legal, ethical, and compliance issues, as well as taking account of the importance of HR for health comfortable and contentment.” A research study by Dilu et al. [17] showed the willingness of low-income countries to use and implement the information system. The existence of such a system is related to the availability of the Internet, basic computer skills, and a human resource department able to deal with such a system. Low-income countries face various challenges when implementing information systems due to lack of funding, inefficiency, and lack of commitment of stakeholders.

8 Methodology This study is based on a conceptual approach, and an exploratory quantitative method is used, so as to ensure the reliability and validity of the questionnaire and a better fit to Kuwait in terms of cultural similarities, given time and cost constraints. The questionnaire was designed online using the Microsoft Forms application and distributed using WhatsApp and e-mails among people living in Kuwait. The questionnaire was designed by the researchers to cover the applied aspects of the study and to answer its research questions. The community of the study consists of patients, doctors, hospital staff, and managers in Kuwait. The sample consists of 128 participants. The survey is divided into two parts, completed by all respondents. The first part contains the demographic questions; it consists of four questions related to personal information about the organization and the person filling in the survey. The second section is the main part, which consists of five sections, and each section refers to several observations about HRIS. Most of the questions are statements rated on a five-point Likert scale.


The form is in both English and Arabic to be more convenient for participants. The survey was developed using Microsoft Forms and sent to the public via the popular mobile application WhatsApp. The primary language of most human resource directors in Kuwait is Arabic. Thus, the survey was developed in English and then translated to Arabic. A re-translation was conducted to confirm the Arabic translation. After the questionnaire was completed, it was tested in both language versions. As a result, minor changes were made to some questions.

9 Data Analysis and Discussion Using Microsoft Forms, data were uploaded and then analyzed using Excel. The analysis of demographic data included questions about gender, age, education level, and group. Data about the human resource information system implementation included system usage, the duration of system use, functionality, and the need for such a system in Kuwait. Additional questions addressed important system requirements such as the safety and quality of the system, ethical and legal requirements, and privacy.

9.1 Overview of Respondents Analysis The targeted population for the questionnaire was people living in Kuwait; both citizens and residents were included. The questionnaire was distributed through the social media platform WhatsApp to increase the possibility of reaching participants within the context of the study and because of its feasibility and convenience. The sample size was 128 responses, with no excluded responses. The participants’ characteristics cover both genders, ages from 18 upwards, different education levels from less than high school to graduate degrees, and diverse groups, as represented in the demographic profile summary in Table 2. Table 3 presents a summary of the HRIS usage of all the participants. According to Table 2, in terms of gender, female representation was at 52% while male representation was at 48%. In terms of age, participants aged between 25 and 34 formed the largest group at 51%, followed by ages between 35 and 44, then above 44, and last 18–24, with percentages of 24%, 13%, and 12%, respectively. The education-level percentages were: less than high school 3%, high school graduates 15%, higher education 19%, and the largest group, college graduates, 63%. Regarding groups, the percentages were managers 15%, doctors 10%, hospital staff 14%, and the largest group, patients, 61%.
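As a quick arithmetic check of these shares (sample size n = 128, group frequencies from Table 2), the following small Java sketch reproduces the group percentages; labels and layout are ours:

```java
// Quick check: percentage = frequency / sample size * 100, with n = 128
// and the group frequencies reported in Table 2.
public class PercentCheck {
    public static void main(String[] args) {
        int n = 128;
        int[] groupFreq = {77, 13, 18, 19};  // patients, doctors, staff, managers
        String[] label = {"Patient", "Doctor", "Hospital staff", "Manager"};
        for (int i = 0; i < groupFreq.length; i++) {
            // Prints 60.2%, 10.2%, 14.1%, 14.8% — matching the rounded table values.
            System.out.printf("%s: %.1f%%%n", label[i], 100.0 * groupFreq[i] / n);
        }
    }
}
```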


Table 2 Summary of demographic data

Characteristic     Response                 Frequency   Percent (%)
Gender             Female                   66          51.969
                   Male                     61          48.031
Age                18–24                    15          12
                   25–34                    65          51
                   35–44                    31          24
                   +44                      16          13
Education level    Less than high school    4           3
                   High school graduates    19          15
                   College graduate         80          63
                   Higher education         24          19
Group              Patient                  77          61
                   Doctor                   13          10
                   Hospital staff           18          14
                   Manager                  19          15

Table 3 Summary of the sample HRIS usage

HRIS usage                                    Response            Frequency   Percent (%)
Are you using a human resource information    Yes                 106         83
system (HRIS) in your organization?           No                  21          17
For how long has this company been            Less than 3 years   43          34
using HRIS?                                   3–5 years           30          24
                                              More than 5 years   54          43
HRIS provides functionality to interact       Strongly agree      50          39.1
with other systems?                           Agree               56          43.8
                                              Neutral             21          16.4
                                              Disagree            1           0.8
                                              Strongly disagree   0           0
Do you agree that we need a HRIS              Strongly agree      65          50.8
in Kuwait?                                    Agree               49          38.3
                                              Neutral             13          10.2
                                              Disagree            1           0.8
                                              Strongly disagree   0           0


Fig. 1 Summary of usages

Fig. 2 Summary of sample HRIS usage

Table 3 and Figs. 1 and 2, which summarize the HRIS usage of all the participants, show that 83% of the participants use HRIS in their organizations, and only 17% answered that they do not have HRIS in the organizations they work in. Most of the respondents’ organizations have used HRIS for more than 5 years (around 43%), 24% started using HRIS within the last 3–5 years, and 34% started using HRIS less than 3 years ago.

9.2 Factors Measurement In this research, dependent and independent factors are used to measure the public perception of HRIS in Kuwait healthcare institutions and the use of HRIS. The independent variables in this research are the ethical requirements of the system, the legal requirements of the system, and the safety, quality, and privacy of the HRIS. Responses to the statement sections were given on a five-point Likert scale with the codes 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, and 5 = strongly agree. Thus, to specify the position of the mean in the descriptive statistics in Table 5, a measurement scale was adapted (low agreement = 1.00–2.32; medium agreement = 2.33–3.65; and high agreement = 3.66–5.00) [18]. Data were loaded, manipulated, and analyzed using Excel.
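As a small illustration of this coding, the following hedged Java sketch computes the mean of a set of five-point Likert responses and maps it to the agreement level defined above (the sample responses and names are invented):

```java
// Illustrative sketch: mean of five-point Likert responses and the agreement
// level per the adapted scale (low 1.00–2.32, medium 2.33–3.65, high 3.66–5.00).
// Response codes: 1 = strongly disagree ... 5 = strongly agree.
public class LikertLevel {
    static double mean(int[] responses) {
        double sum = 0;
        for (int r : responses) sum += r;
        return sum / responses.length;
    }

    static String level(double mean) {
        if (mean <= 2.32) return "Low";
        if (mean <= 3.65) return "Medium";
        return "High";
    }

    public static void main(String[] args) {
        int[] safetyResponses = {5, 4, 4, 5, 3};  // hypothetical sample
        double m = mean(safetyResponses);
        System.out.printf("Mean = %.2f, level = %s%n", m, level(m));
    }
}
```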


Table 4 Summary of correlation coefficient analysis

          Safety     Privacy    Legal      Quality      Ethical
Safety    1
Privacy   0.325145   1
Legal     0.486675   0.562303   1
Quality   0.417945   0.689424   0.698255   1
Ethical   0.399723   0.594412   0.568701   0.715485**   1

**Correlation is significant at the 0.01 level (2-tailed)


9.3 Correlation Coefficient Analysis Pearson’s correlation coefficient is used to statistically measure the relationships between the independent variables in this research. The correlation value between ethical and quality shown in Table 4 indicates a strong positive relationship, meaning the two variables are strongly correlated with each other. The values also show a weak correlation between safety and privacy and between safety and ethical.
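For reference, the following hedged Java sketch shows how a Pearson correlation coefficient such as those in Table 4 is computed from two equal-length score vectors; the sample data are invented:

```java
// Sketch of Pearson's correlation coefficient r = cov(x, y) / (sd(x) * sd(y)).
public class Pearson {
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0;
        for (int i = 0; i < n; i++) { sx += x[i]; sy += y[i]; }
        double mx = sx / n, my = sy / n;
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < n; i++) {
            cov += (x[i] - mx) * (y[i] - my);
            vx  += (x[i] - mx) * (x[i] - mx);
            vy  += (y[i] - my) * (y[i] - my);
        }
        return cov / Math.sqrt(vx * vy);
    }

    public static void main(String[] args) {
        // Invented per-respondent factor scores, e.g. ethical vs. quality.
        double[] ethical = {4, 5, 3, 4, 5, 4};
        double[] quality = {4, 5, 3, 5, 4, 4};
        System.out.printf("r = %.4f%n", pearson(ethical, quality));
    }
}
```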

9.4 Descriptive Statistics Analysis The overall response of participants in Table 5 indicates a positive level of agreement with the statements in the questionnaire; indeed, the agreement level is high for all the statements. Figure 3 and Table 5 show that the means of all the responses are between 3.9 and 4.21, indicating a high level of agreement.

Fig. 3 Summary of mean (bar chart of the mean per factor: Safety 4.21, Privacy 4.05, Legal 3.91, Quality 4.04, Ethical 3.9)


Table 5 Overall descriptive statistics

Factor    N     Mean   Std. deviation   Level
Safety    128   4.21   61.895           High
Privacy   128   4.05   61.975           High
Legal     128   3.91   62.045           High
Quality   128   4.04   61.98            High
Ethical   128   3.9    62.05            High

9.5 Discussion As mentioned before, the aim of this paper is to assess the following issues in Kuwait healthcare institutions, in both private and government hospitals:

• The difference between the focus on the ethical issues of patients and the ethical issues of the hospital employees.
• The difference between the focus on the safety issues of patients and the safety issues of the hospital employees.
• The difference between the focus on the privacy and security issues of patients and the privacy and security issues of the hospital employees.
• The difference between the focus on the legal issues of patients and the legal issues of the hospital employees.
• The difference between the focus on the quality issues of patients and the quality issues of the hospital employees.

The finding of the study is that HRIS in Kuwait addresses the ethical issues of patients properly, but the ethical issues of hospital employees are not properly addressed. The privacy issues of the patients are addressed appropriately; on the other hand, the privacy issues of employees are not properly addressed. While the safety issues of the patients are also properly addressed, the safety issues of employees are not. The legal issues are properly addressed for the patients but not for the employees. Finally, the quality issues of the patients are addressed properly, but the employees’ quality issues are not. In general, most healthcare institutions do not concentrate on the non-functional attributes, which affects the HRIS negatively. The results indicate that most of the participants are patients and have used HRIS in their work for more than 5 years. Most of the participants are concerned about how the system will ensure safety and privacy for patients and employees. Also, the results show that there is a need for HRIS in the Kuwait region. This survey further assists in generating functional and non-functional requirements for the proposed application.


10 Limitations The sample selection process for this study included only a few groups of system users: patients, doctors, hospital staff, and managers. The selection process was random, and the sample is only a subset of the entire population of Kuwait. As a result, this limitation influences the generalizability of the study, since the random sample included only a subset of the Kuwait population. Additionally, the sample size was small, while numerous independent variables required control. Such a small sample size limits the estimation properties available for larger samples, including consistency and efficiency.

11 Recommendations The analysis of previous literature and studies covering the area of HRIS in hospitals and healthcare organizations shows essential needs for enhancement in healthcare organizations, which are summarized in the following recommendations:

1. Build a new model for HRIS such that Kuwaiti legal issues are properly addressed in the system.
2. Build a new model for HRIS such that local privacy issues are properly incorporated in the system.
3. Improve the model of HRIS such that security assurance meets international standards.
4. Modify the HRIS to be flexible and support continuous improvements.

12 Future of Human Resource Information System in Healthcare The HRIS in the healthcare industry may face pressing challenges that will have a big impact in the coming years. The three issues that will have the biggest impact are safety, digitization, and privacy. Healthcare institutions must maintain safety not only by preventing diseases but also by preventing the destruction of the HRIS. The digitization challenge means that HR needs to stay up to date with any new technology; HR must assess, develop, and prepare for the digital revolution and keep up with all the changes that new technology brings. Regarding privacy issues, HR needs intensive training to prevent data breaches and to protect digital information and data [19].


13 Conclusion The HRIS has a critical role in enabling the delivery of effective and efficient medical services and in accomplishing patient satisfaction. Moreover, the study clarifies that HRIS influences healthcare quality in a significant way. According to the data collected from the online survey, the HRIS is an important asset for attaining the goals of health organizations and improving their performance. In general, the HRIS in healthcare institutions should have a clear strategic plan, goals, and objectives to improve care quality and staff performance. Acknowledgements We wish to acknowledge the generous financial support from the Kuwait Foundation for the Advancement of Sciences (KFAS) to present this paper at the conference under the Research Capacity Building/Scientific Missions program.

References

1. Tangos, J.: How to Become an HR Manager in Healthcare. Retrieved September 23, 2020 (2020, July 17)
2. Osibanjo, O.A., Adeniji, A.: Human Resource Management: Theory and Practice (2012)
3. Armstrong, M.: Human Resource Management Practice, 10th edn. (2006)
4. Tursunbayeva, A., Bunduchi, R., Franco, M., Pagliari, C.: Human resource information systems in health care: a systematic evidence review. J. Am. Med. Inform. Assoc. 24(3), 633–654 (2017)
5. Ec.europa.eu: [Online]. Available at https://ec.europa.eu/info/sites/info/files/file_import/aarhr-2016_en_0.pdf. Accessed 16 Mar 2021 (2016)
6. Aykan, E.: Gaining a competitive advantage through green human resource management. Corporate Govern. Strat. Decis. Making (2017). https://doi.org/10.5772/intechopen.69703
7. Heru, S., Endang Siti, A.: Analyzing and Modeling the Role of Human Resource Information System on Human Resource Planning at Higher Education Institution in Indonesia [online] (2017)
8. Lewis, N.: HR Technology’s Role in Health Care is Growing. Retrieved August 23, 2020 (2020, Mar 31)
9. Sajeewanie, T., Opatha, H.: Relationships between human resource manager-related factors and practice of strategic human resource management in Sri Lankan listed firms. Sri Lankan J. Hum. Resour. Manage. 1(1), 71 (2013). https://doi.org/10.4038/sljhrm.v1i1.5112
10. Eisenhower, D.D.: Dwight D. Eisenhower Quotes. Retrieved November 01, 2020 (2001)
11. IvyPanda: Role of Human Resources Management in Health Care Industry (2020, Mar 22)
12. Msichoices.org: Human resource management information system [Online]. Available at https://www.msichoices.org/media/2735/annex-2-hrmis-requirements.pdf (2008)
13. Bureau of Labor Statistics, U.S. Department of Labor: Occupational Outlook Handbook, Human Resources Managers, on the Internet
14. Russell, A.: The Role of HR Manager in Health Care. Retrieved October 23, 2020 (2018, June 29)
15. El-Jardali, F., Tchaghchagian, V., Jamal, D.: Assessment of human resources management practices in Lebanese hospitals. Hum. Resour. Health 7, 84 (2009)
16. Hussain, Z., Wallace, J., Cornelius, N.E.: The use and impact of human resource information systems on human resource management professionals. Inf. Manage. 44(1), 74–89 (2007). ISSN 0378-7206. https://doi.org/10.1016/j.im.2006.10.006


17. Nguyen, A.: Healthcare HR Week 2020: What’s Ahead for Healthcare HR in the Coming Decade? [Web log post] (2020, Mar 17)
18. AlAufi, A., AlHarthi, I.: Citizens’ perceptions of government’s participatory use of social media. Transf. Govern. People Process Policy 11(2) (2017). https://doi.org/10.1108/TG-09-2016-0056
19. O’Donnell, R.: Nursing—clinical nursing; researchers from University of Toronto report new studies and findings in the area of clinical nursing (the employee retention triad in health care: exploring relationships amongst organisational justice, affective commitment and turnover ...). Health & Medicine Week (2018, Apr 13; 2019, May 28)
20. Prachi, M.: What is Team Building? Definition, Process, Advantages, Disadvantages (2018, Dec 14)

Performance Prediction of Scalable Multi-agent Systems Using Parallel Theatre Franco Cicirelli and Libero Nigro

Abstract This paper proposes an approach to the modelling and performance prediction of large multi-agent systems, based on the Theatre actor system. The approach rests on Uppaal for formal modelling, graphical reasoning and preliminary property checking, and on Java for enabling large model sizes and execution benefits on a multi-core machine. As a significant case study, the minority game (MG), a binary game often used in economics and the natural and social sciences, is chosen for modelling and analysis. In MG, a population of agents/players compete, without explicit interactions, for the use of a shared and scarce resource. At each step, each player has to decide whether or not to use the resource, knowing that when the majority of agents decides to exploit the resource, an inevitable congestion arises. In classic MG, although each player learns from experience, it is unable to improve its behaviour/performance. A genetic variant of MG is then considered, which, by using crossover and mutation on local strategies, allows a bad-performing player to possibly improve its behaviour. The paper shows an MG formal actor model, which is then transformed into Java for parallel execution. Experimental results confirm a good execution speedup when the size of the model is scaled to large values, as required by practical applications. Keywords Actors · Theatre framework · Minority game · Genetic algorithm · Evolutionary learning · Performance prediction · Uppaal · Multi-core machines · Java

F. Cicirelli (B) CNR—National Research Council of Italy, Institute for High Performance Computing and Networking (ICAR), 87036 Rende (CS), Italy e-mail: [email protected] L. Nigro University of Calabria, DIMES, 87036 Rende (CS), Italy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_5


1 Introduction Model-driven development (MDD) is an important discipline in engineering [1] which centres on the use of (preferably formal) models for designing, analysing and implementing, e.g. a software product. MDD is also advocated in the simulation domain, e.g. [2, 3], where models are uniformly exploited and transformed during all the phases of the system development life cycle. This paper proposes an MDD-based approach suited to the performance prediction of scalable, large multi-agent systems (MAS). The approach depends on the use of isolated components, actors [4, 5], which share no data and interact with one another by asynchronous message passing. In particular, the actor model provided by Parallel Theatre [6] is adopted, which has an efficient implementation in Java on top of the now common multi-core machines. Actor models are first formalized as timed automata in Uppaal SMC [7], for model correctness and preliminary property checking, assisted by the intuitive visual graphical interface and the query temporal language offered by the toolbox. Correct actor models are then transformed into Java for extensive performance prediction. Actors can be partitioned into distinct theatres (computing nodes, possibly allocated to distinct physical cores) to enable performance measures on large actor populations. The approach is practically demonstrated by applying it to the modelling of the minority game (MG) [8]. Both the basic MG behaviour, where players have limited learning capabilities resting on a short memory of previous game outcomes, and an extended genetic version of MG, where bad-performing players can dynamically adapt their behaviour by replacing poor-performing strategies with better strategies obtained by crossover and mutation of well-performing ones, are modelled. The models are thoroughly analysed, and their performances are documented. In addition, the execution benefits provided by parallelism on scalable MG models are shown. The approach contributed by this paper significantly extends the authors' previous work based on Uppaal [9], by providing more general actor models with controlled message scheduling and delivery, and model transformation and execution in Java for high-performance computing. The paper is structured as follows. Section 2 provides a synthesis of the basic concepts of Parallel Theatre, on which the proposed approach is based, and of the minority game (MG). Section 3 starts the application of the approach by showing a formal actor model for MG and its preliminary analysis using Uppaal SMC; the space and time restrictions of using Uppaal are clarified. Section 4 discusses porting the formal actor model to Java, for both mono-core and parallel execution. The performance prediction of MG models is reported in Sect. 5. Section 6 indicates some execution speedups when the MG population size is scaled. Section 7 concludes the paper with some hints about on-going and future work.


2 Background 2.1 An Overview of Parallel Theatre The work described in this paper focuses on the use of actors [4, 5] as the building blocks of applications. Classical actors are isolated software components (they share no data) which interact with each other by asynchronous message passing and non-deterministic message delivery. Autonomy and location-transparent communications depend on a thread-based local behaviour and a mailbox where incoming messages get buffered and which acts as the universal actor name/address. In recent years, actors have demonstrated their power and capabilities as an important alternative to classic multi-threaded concurrent systems based on shared data and locks [10], through the successful realization of large distributed, web-based, untimed software systems. Theatre [11] is a particular version of actors designed to fulfil the timing requirements of cyber-physical systems (CPS) [11, 12]. The core infrastructure of Theatre centres on light-weight (not thread-based) actors and a customizable reflective control layer which manages a time notion (real time or simulated time) and transparently supervises message exchanges and their delivery. In such a case, message delivery can be made deterministic, e.g. [12], thus enabling the development of predictable CPS. Actors encapsulate a data status and have a message interface. Actors are at rest until a message arrives. A message is processed by a reaction method (message server, or msgsrv) which cannot be interrupted nor suspended (macro-step semantics [11]). Theatre is currently hosted by Java, with a standalone (mono-core), a multi-core (Parallel Theatre) [6] and a distributed implementation [13]. The resultant programming style is shown in Fig. 12. A library of control forms was realized, enabling both concurrent (untimed) and timed applications; the use of global time is assumed. In the following, due to its relevance to the paper, the Parallel Theatre organization is briefly considered; more information can be found in [6]. A system is a federation of theatres (computing nodes/threads) which can be mapped on distinct cores of a multi-core machine. Each theatre has a transport layer (for receiving external messages), a control layer and an application layer (the subsystem of local executing actors). A timeserver component, attached to one theatre, is in charge of time alignment in a theatre system. Java object references are directly used as 'actor names' to enable location-transparent communications. Despite the default by-reference semantics of object parameters during message passing, which can be useful for execution efficiency, it is the responsibility of the modeller/programmer to guarantee a 'de facto' by-value semantics as in the standard actor model [5]. A side benefit of Parallel Theatre, though, is the safe sharing of data (e.g. a large data structure) among the actors belonging to the same theatre. This is a consequence of message interleaving, that is, the fact that the local control form processes messages one at a time (cooperative concurrency by macro-step semantics). Parallel message processing, instead, occurs among actors of different theatres. The control forms of the various theatres coordinate with each other and with the timeserver, through hidden


control messages, to ensure consistency of global time or to detect the termination condition when using the concurrent control form. Parallel theatre is totally lock-free and has the potential for high-performance computing [14].
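The excerpt describes the Theatre programming style without reproducing its API. As a purely illustrative aid, the following minimal sketch in plain Java mimics the mechanics described above: light-weight (non-thread-based) actors and a control layer that buffers messages and delivers them one at a time with run-to-completion (macro-step) semantics. All names here (Actor, ControlForm, msgsrv, send) are assumptions of this sketch, not the actual Parallel Theatre API.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Light-weight actor: no thread of its own, reacts to delivered messages.
abstract class Actor {
    protected ControlForm control;                 // local theatre control layer
    void attach(ControlForm c) { control = c; }
    // Reaction method: runs to completion, never interrupted (macro-step).
    abstract void msgsrv(String msg, Object arg);
    // Asynchronous send: the message is only buffered, never processed here.
    protected void send(Actor target, String msg, Object arg) {
        control.schedule(target, msg, arg);
    }
}

final class Pending {
    final Actor target; final String msg; final Object arg;
    Pending(Actor t, String m, Object a) { target = t; msg = m; arg = a; }
}

// One theatre = one computing node. Messages of local actors are processed
// one at a time (cooperative concurrency), so actors of the same theatre
// may safely share local data; parallelism occurs across theatres.
class ControlForm {
    private final Queue<Pending> mailbox = new ArrayDeque<>();
    void schedule(Actor target, String msg, Object arg) {
        mailbox.add(new Pending(target, msg, arg));
    }
    void run() {                                   // untimed concurrent control form
        Pending p;
        while ((p = mailbox.poll()) != null) p.target.msgsrv(p.msg, p.arg);
    }
}

class Echo extends Actor {
    @Override void msgsrv(String msg, Object arg) {
        System.out.println("Echo received " + msg + "(" + arg + ")");
    }
}

public class TheatreSketch {
    public static void main(String[] args) {
        ControlForm theatre = new ControlForm();
        Echo e = new Echo();
        e.attach(theatre);
        theatre.schedule(e, "PING", 42);
        theatre.run();                             // prints: Echo received PING(42)
    }
}
```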

2.2 Basic Concepts of Minority Game The minority game (MG) [8] is a binary game based on inductive reasoning, which has applications in the economic, natural and social sciences. MG was abstracted from the 'El Farol Bar' problem of Brian Arthur [15]. Using the bar metaphor, a certain number N of people have to choose, every weekend, whether to go to the bar or to stay at home, the bar denoting a limited/scarce resource. If the majority of people opts for the bar, the comfort is lost due to congestion, and the best decision belongs to the people who remained at home. Dually, when the majority of people chooses to stay at home, the people who decided to go to the bar find a pleasant, uncongested situation. In any case, the winning side is the minority one. The metaphor has many interpretations, e.g. in the economic domain. Consider a new business opportunity which is prospected to firms. If the majority of firms decides to try the new business, the situation deteriorates because the real benefits arising from the new business vanish, and vice versa. Formally, N (supposed odd) players are repeatedly asked, for a great number T of steps, to choose between A or B (or 1 or 0, or −1 or 1, etc.). When all the players have formulated their decision, the outcome of the step is established: it is A or B depending on the minority side. Players are not allowed to exchange information with one another about how to commit to a decision. The decision process can only be based on the short knowledge (memory) of the M immediately preceding outcomes and on a pool of S local strategies. Each strategy proposes a binary choice on the basis of the contents of the M bits. At the end of a step, players are informed of the current outcome, which then enters the M-bit memory through a shift. Players know the outcome but not the numerosity of the winning/minority side. MG has an immediate binary interpretation. Each strategy is a binary function defined on the 2^M binary space established by the M bits (M is also termed the brain size). Of course, the total space of strategies is made up of 2^(2^M) possible binary functions. The absence of explicit interactions forces players to base their decision solely on the M bits and the S available strategies (inductive reasoning and limited rationality). To give players an objective measure of how to choose a strategy in the local pool of S strategies, at the end of each time step player strategies receive a reward or a blame, depending on whether the strategy, had it been selected to guide the decision process, would have guessed the outcome. Strategies are assigned a virtual score (vs) which is incremented in case of a reward and decremented in case of a blame. The player, in turn, receives a real score (rs), incremented each time it ends up on the minority side. At every step, each player selects for playing the (or one of the) best strategies in its pool, i.e. the one having the highest virtual score. As the game progresses, the local strategies of players tend to differentiate from each other.


It is upon this differentiation that the learning attitude of players is based. Each player randomly initializes the M bits of its brain (or history) and the 2^M bits of each strategy in its local pool S.
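To make the per-step mechanics concrete, the following plain-Java sketch models one player as described above: an M-bit history indexes each strategy's 2^M-entry lookup table, the strategy with the highest virtual score guides the binary choice, and feedback updates the scores and the history. The class is an illustration under stated assumptions (deterministic tie-breaking among equally scored strategies, an arbitrary bit orientation for the history shift), not the paper's model code.

```java
import java.util.Random;

// One MG player: S strategies over an M-bit history, plus virtual/real scores.
class Player {
    final int M, S;
    final int[][] strategy;   // S strategies, each with 2^M binary entries
    final int[] vs;           // virtual score per strategy (reward/blame)
    int history;              // last M outcomes packed into M bits
    int rs;                   // real score: times the player was in the minority
    final Random rnd = new Random();

    Player(int M, int S) {
        this.M = M; this.S = S;
        strategy = new int[S][1 << M];
        vs = new int[S];
        history = rnd.nextInt(1 << M);             // random initial brain
        for (int[] s : strategy)
            for (int i = 0; i < s.length; i++) s[i] = rnd.nextInt(2);
    }

    // Select (one of) the best strategies and play its proposed choice.
    int decide() {
        int best = 0;
        for (int s = 1; s < S; s++) if (vs[s] > vs[best]) best = s;
        return strategy[best][history];
    }

    // End of step: update real score, reward/blame every strategy, and
    // shift the current outcome into the M-bit memory.
    void feedback(int outcome, int myChoice) {
        if (myChoice == outcome) rs++;             // player was on the minority side
        for (int s = 0; s < S; s++)
            vs[s] += (strategy[s][history] == outcome) ? 1 : -1;
        history = ((history << 1) | outcome) & ((1 << M) - 1);
    }
}
```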

2.3 Fundamental Behaviour of MG The behaviour of MG can be captured by the attendance measure A(t), which is the numerosity of a chosen observed side (A or B is indifferent) at each time step t. In particular, fluctuations of the attendance around the point A_M = (N − 1)/2, which is the maximum numerosity of the minority side, mirror the induced coordination degree of players. Smaller fluctuations denote better coordination. It is expected that, for a given S, as the "intelligence" of the player increases by augmenting the brain size M, the fluctuations of the attendance reduce. On the other hand, for a certain M value, increasing the number S of local strategies tends to confound the player behaviour. This aspect can be investigated through the average frequency of success F_S(t). At each time step t, the numerosity of the winning side is detected and averaged over the N population. Such fractions are then accumulated, and their temporal average estimates the trend of variation of the frequency of success. Average confusion in players, due to many available strategies, manifests in a lower value of the frequency of success.
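The two measures can be computed directly from the binary decisions of a step. The short Java sketch below is an illustrative reading of the definitions above (class and method names are hypothetical):

```java
// Per-step measures of the minority game, from the players' binary choices.
class MGMeasures {
    // A(t): numerosity of the chosen observed side at this step.
    static int attendance(int[] decisions, int side) {
        int a = 0;
        for (int d : decisions) if (d == side) a++;
        return a;
    }

    // Fraction of winners at this step; accumulating these fractions over
    // the steps and dividing by t yields the temporal average frequency
    // of success F_S(t).
    static double winningFraction(int[] decisions) {
        int ones = attendance(decisions, 1);
        int minority = Math.min(ones, decisions.length - ones);
        return (double) minority / decisions.length;
    }
}
```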

2.4 A Genetic Extension of MG It is worth noting that in basic MG, players cannot replace strategies. As a consequence, an 'unfortunate' player which is assigned a pool of poor-performing strategies will continue its bad-performing behaviour. In order to improve the evolutionary behaviour of MG, a genetic extension is proposed in [8, 16] with the goal of dynamically reverting a bad-performing player towards a better behaviour. The genetic adaptation is founded on two parameters: τ, which is a rate or number of steps after which each player can check if it is a bad performer, and n, which is a threshold (percentage) such that if the experienced fraction of bad decisions, that is (steps − rs)/steps, is greater than or equal to n, the time has arrived to revise the player's own behaviour. Adaptation is achieved by crossover and mutation (see Fig. 1) of two mother strategies, thus achieving two child strategies which can then replace two strategies in the local pool S. Four schemas (policies) are considered in [16]:

(A) Mother strategies are randomly chosen in the local pool of strategies, and after crossover/mutation, mother strategies are replaced by child strategies.
(B) Mother strategies are randomly chosen, and after crossover/mutation, child strategies replace the two worst-performing strategies (i.e. those having the lowest virtual score).
(C) Mother strategies are chosen among the best performing strategies in the pool, and after crossover/mutation, child strategies replace the mother strategies.
(D) Mother strategies are chosen among the best performing strategies in the pool, and after crossover/mutation, the child strategies replace the two worst-performing strategies.

Fig. 1 Example of crossover and mutation

It can be anticipated that, despite the 'democratic' aspect of policies (A) and (C), they are less effective than policies (B) and (D), which address the replacement of the worst strategies. Policy (D) is expected to be the most effective, because it replaces bad-performing strategies by the genetic modification of the two best strategies. In any case, after a genetic adaptation, the virtual scores of the replaced strategies are reset. Besides the attendance measure, the behaviour of the genetic policies can also be evaluated through the scaled utility function: U_t = ((1 − θ(A_t − A_M)) ∗ A_t + θ(A_t − A_M) ∗ (N − A_t))/A_M, where θ(A_t − A_M) is the Heaviside step function, which is 0 when A_t ≤ A_M and 1 when A_t > A_M. U_t reaches its maximum of 1 when the numerosity of the winning side reaches A_M. The closer the value of U_t is to 1, the better the players' coordination and behaviour.
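For concreteness, the Java sketch below instantiates the genetic operators and the scaled utility function defined above. The single random crossover point and the one-bit mutation are assumptions of this sketch (the excerpt specifies the operators only through the example of Fig. 1); the utility method follows the formula literally.

```java
import java.util.Random;

class GeneticMG {
    static final Random rnd = new Random();

    // One-point crossover of two mother strategies (arrays of 2^M bits).
    static int[][] crossover(int[] m1, int[] m2) {
        int cut = 1 + rnd.nextInt(m1.length - 1);  // cut strictly inside the string
        int[] c1 = new int[m1.length], c2 = new int[m1.length];
        for (int i = 0; i < m1.length; i++) {
            c1[i] = i < cut ? m1[i] : m2[i];
            c2[i] = i < cut ? m2[i] : m1[i];
        }
        return new int[][] { c1, c2 };
    }

    // Mutation: flip one randomly chosen bit of a child strategy.
    static void mutate(int[] child) {
        child[rnd.nextInt(child.length)] ^= 1;
    }

    // Scaled utility U_t = ((1 - theta)*A_t + theta*(N - A_t)) / A_M,
    // where theta is the Heaviside step of (A_t - A_M).
    static double utility(int At, int N) {
        int AM = (N - 1) / 2;
        int theta = (At > AM) ? 1 : 0;
        return ((1 - theta) * At + theta * (N - At)) / (double) AM;
    }
}
```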

3 A Formal Actor Model of the Minority Game MG was modelled as a network of timed automata (TA) in Uppaal SMC [7], composed of three actor types: manager, player and a main for configuration purposes. Players have unique ids in the range 0 … N − 1. The manager instance has the id N. The main has the id N + 1. Corresponding subrange types were introduced, and a single parameter (named self for uniformity) of the relevant type is used for the automatic creation of the actor instances at system configuration time. The manager supervises the time-stepped simulation. At each step, a self-sent STEP message is received by the manager, which then asks players to play by sending them a PLAY message. In response, each player sends a PROPOSE

Fig. 2 Automaton of the manager actor


Table 1 Evaluation scale of the indicators (ranges per vulnerability level)

Areas | Indicators | Ranges
Financial | Unemployed population (%) (I1) | …
Financial | Population living in poverty (%) (I2) | < 6.7 | …
Health | … | …
Economic | Number of national patent applications accepted by 10,000, Morocco (I7) | > 1.0 | …
Economic | Literacy rate (%) (I8) | …
Economic | Number of tourists (million), Morocco (I9) | > 45.77 | 25.83–45.77 | 12.76–25.83 | 4.76–12.75 | < 4.76
Technology | Mobile cellular subscriptions (%) (I10) | > 151.36 | 122.57–151.36 | 97.91–122.56 | 64.50–97.90 | < 64.50
Technology | Popularization rate of Internet (I11) | > 76.07 | 65.77–76.07 | 52.54–65.76 | 22.39–52.53 | < 22.39

Initially, an evaluation of the data used is required in order to determine the nature of each series (stationary or non-stationary); according to the type of series, a corresponding method is used to predict the future data for the axis studied. The non-stationary series are mainly linked to two causes: either the series is expressed as a function of time, which is the case for the indicators I2, I3, I4, I6 and I8 (see Table 2), or, as for I9, the series is non-stationary mainly due to its variance and co-variance.


Table 2 The objective function of the indicators that have a non-stationary series

Indicators | Objective function | Predictive value of 2021 (%)
I2 | y = 0.01625 ∗ x − 32.4435 (1) | 0.38
I3 | y = 0.0 ∗ x + 9.65 (2) | 9.65
I4 | y = −2.7857142857142856 ∗ x + 5665.928571428572 (3) | 38.78
I6 | y = 0.0 ∗ x + 0.93 (4) | 93
I8 | y = 0.005499999999999977 ∗ x − 10.393499999999953 (5) | 71.64

Table 3 Predicted values of 2021 for the indicators forecast by the mean of their historical values

Indicators | Predictive values of 2021
I5 | 6.51%
I7 | 0.56
I9 | 12.39
I11 | 52.96%

For the non-stationary series, we calculate the objective function to predict the value of the year 2021 (Table 2), and for the stationary series we apply the ARIMA model. For some indicators (I5, I7 and I11), the model cannot be trained because of insufficient training data, so the value of the year 2021 is predicted by the mean of the values of each indicator; the result is shown in Table 3. Finally, we train the ARIMA model for the other cases and obtain the result shown in Table 4.

Vulnerability Assessment. Based on the results of the predictions for the year 2021, and on the basis of the evaluation scale defined in Table 1, the vulnerability of the various fields is assessed, with the aim of determining the most vulnerable fields (Table 5).

The role of the gamification enablers is mainly the design and definition of the different game scenarios based on the initial evaluation results; in this part, the design specifies with the various stakeholders the expected objectives and the expected rewards. The integration of the players (citizen, NGO, tourist and company) with the experts in the description of the game scenarios is important in order to reach a consensus around the game scenarios and to ensure that the different scenarios are not rejected.

Table 4 The result of the ARIMA model

Indicators | Predictive values of 2021 (%)
I1 | 9.68
I10 | 137.87
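To make the two prediction routes concrete, the following Java sketch evaluates a fitted linear objective function and the mean-based fallback. The coefficients are those reported in Table 2; treating x as the calendar year is an assumption of this sketch, under which x = 2020 approximately reproduces the reported 0.38 for I2 and 71.64% for I8.

```java
import java.util.function.DoubleUnaryOperator;

public class IndicatorForecast {
    // Non-stationary trend series: evaluate the fitted function y = a*x + b.
    static double trendForecast(double a, double b, double year) {
        return a * year + b;
    }

    // Insufficient training data (I5, I7, I11): fall back to the mean.
    static double meanForecast(double[] history) {
        double sum = 0;
        for (double v : history) sum += v;
        return sum / history.length;
    }

    public static void main(String[] args) {
        // Coefficients of equations (1) and (5) from Table 2, rounded.
        DoubleUnaryOperator i2 = x -> trendForecast(0.01625, -32.4435, x);
        DoubleUnaryOperator i8 = x -> trendForecast(0.0055, -10.3935, x);
        System.out.printf("I2 = %.2f%%%n", i2.applyAsDouble(2020));       // ~0.38
        System.out.printf("I8 = %.2f%%%n", i8.applyAsDouble(2020) * 100); // ~71.65
    }
}
```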


Table 5 Vulnerability level of each indicator by area

Areas | Indicators | Vulnerability level
Financial | Unemployed population (%) (I1) | L4
Financial | Population living in poverty (%) (I2) | L1
Health | Number of medical staff (1000 people) (I3) | L2
Health | Population without compulsory health insurance (%) (I4) | L5
Health | Population aged 65 or above (%) (I5) | L2
Economic | SMEs in the industrial environment (%) (I6) | L3
Economic | Number of national patent applications by 10,000 (I7) | L3
Economic | Literacy rate (%) (I8) | L3
Economic | Number of tourists (million), Morocco (I9) | L5
Technology | Mobile cellular subscriptions (%) (I10) | L2
Technology | Popularization rate of Internet (I11) | L3
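The mapping from a predicted value to a vulnerability level can be read as a simple band lookup against the evaluation scale of Table 1. The sketch below uses the thresholds recovered for I10 and I11 and assumes that L1 denotes the least and L5 the most vulnerable band; under this assumption it reproduces Table 5's L2 for I10 and L3 for I11.

```java
public class VulnerabilityLevel {
    // lowerBounds holds the descending band boundaries; a value above
    // lowerBounds[k] falls into level L(k+1), below all of them into L5.
    static int level(double value, double[] lowerBounds) {
        for (int k = 0; k < lowerBounds.length; k++)
            if (value > lowerBounds[k]) return k + 1;
        return lowerBounds.length + 1;
    }

    public static void main(String[] args) {
        double[] i10 = {151.36, 122.56, 97.90, 64.50};  // mobile subscriptions (%)
        double[] i11 = {76.07, 65.76, 52.53, 22.39};    // Internet popularization
        System.out.println("I10 -> L" + level(137.87, i10)); // L2, as in Table 5
        System.out.println("I11 -> L" + level(52.96, i11));  // L3, as in Table 5
    }
}
```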

This participatory action also seeks to reassure the various stakeholders with regard to the management rules of this operation (e.g., protection of the personal data of players and stakeholders) [10]. This integration can go through several channels and take several forms to promote collective reflection, for example, living labs within a framework of participatory democracy:

1. Recruitment of volunteers (citizen, company, NGO, tourist, etc.).
2. Definition of working groups by area, ensuring a mix between stakeholders.
3. Presentation of the results of the initial area assessment.
4. Discussion around the possible game scenarios and objectives for the city, as well as the definition of the rules for managing data and results.
5. Definition of the general framework of the game scenarios as well as the possible platforms for the game.

After the phase of consultation with the various actors, a global idea of the gamification framework must be drawn up, for visibility on the general progress as well as on the objectives to be followed (Table 6). The gamification support seeks to develop a web platform with several types of profiles (citizen, company, NGO, etc.); this platform considers the player's profile, history, results, rewards and ranking, and supports possible modifications. It covers the registration of stakeholders, the games and their progress, the ranking of stakeholders, the gifts given to stakeholders, and the follow-up of the stakeholders' progress by the government. To do this, we define a sequence of tasks that the actors and the government follow.

Table 6 Gamification goal and scenario

Financial and economic area
• Goal and scenario: The game aims to motivate businesses to create local employment and local investment. The game takes place on a web platform: companies have access to a platform to declare the number of recently hired local employees as well as the investments made at the regional level. A ranking is published at the end of each month to show the community the companies most involved locally.
• Players, goals and awards: Enterprise. Local job creation; encourage local investment. Award: badge.
• Business objectives: Develop awareness of local businesses; improve the company's social commitment.

Health area
• Goal and scenario: The game takes place on a web platform and consists of recruiting the largest number of people over the age of 65. The platform gives associations access to enter the number of recruits; a ranking is published at the end of each month.
• Players, goals and awards: NGO. Closely follow the elderly and prepare the vaccination phase for COVID-19. Award: badge.
• Business objectives: Development of the notoriety of the associations working in this social segment.

Economic area
• Goal and scenario: The game aims to improve the purchase of local products. A list of local products is introduced on the platform; citizens have the objective of buying these products and recording the lot numbers, with a ranking at the end of the month showing the most engaged citizens. The municipality also identifies the most touristic places at the city level; tourists must take a photo with the identified monuments and places, and the people who succeed in the challenge are appointed ambassadors of the city.
• Players, goals and awards: Citizens and tourists. Improve citizen engagement with their territories and local products; develop the awareness and attractiveness of the city; build loyalty among tourists and improve their relationship with the city. Awards: gifts and badge.
• Business objectives: Develop sales of local products and reduce imports of existing local products; develop the sales of traders around monuments and tourist places.

Technology vulnerability
• Goal and scenario: The game is for all profiles and consists of counting the number of guests invited for registration on the platform per week for each category.
• Players, goals and awards: All. Improve awareness of the platform. Award: fidelity badge.
• Business objectives: Improve awareness of the platform.


Fig. 2 The layout of the smart city gamification framework

First, the player registers in the application and chooses his membership from a drop-down list: 1. Company, 2. NGO, 3. Citizen, 4. Tourist. The choice of the nature of the player also makes it possible to allocate the games and/or missions; every registration is verified and approved by the government through an administrator account. Through the mobile application, players can record their progress, and the administrator approves it, with the aim of giving the player a gift according to his progress in his game and/or mission. The reward system in place mainly aims to integrate stakeholders into the deployment of city projects through the missions and games defined at the platform level. The platform also ensures the publication of a live (streaming) ranking. After a given period, the government can compare the predicted indicator values with the actual ones, to know to what extent this system can impact the vulnerability of the city. The development teams also provide reports for decision-makers to get an idea of the participation rates, and this communication is ensured in both directions in order to review the types of games as well as the scenarios, adapting them to the development of the post-COVID-19 situation (Fig. 2).
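As an illustration only, the registration/approval/reward sequence just described could be modelled as below. Every type, field and method name here is hypothetical: the paper specifies the workflow, not an implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class GamificationPlatform {
    enum Membership { COMPANY, NGO, CITIZEN, TOURIST }

    static class Player {
        final String name;
        final Membership membership;          // chosen from the drop-down list
        boolean approved;                     // set by the government administrator
        int progress;
        final List<String> rewards = new ArrayList<>();
        Player(String name, Membership m) { this.name = name; membership = m; }
    }

    // Every registration is verified and approved through an admin account.
    static void approveRegistration(Player p) { p.approved = true; }

    // The administrator approves recorded progress before a reward is granted.
    static void approveProgress(Player p, int delta, String reward) {
        if (!p.approved) return;              // unapproved players cannot play
        p.progress += delta;
        p.rewards.add(reward);                // e.g. a badge matching the mission
    }

    public static void main(String[] args) {
        Player citizen = new Player("Amina", Membership.CITIZEN);
        approveRegistration(citizen);
        approveProgress(citizen, 5, "local-products badge");
        System.out.println(citizen.name + ": progress=" + citizen.progress
                + ", rewards=" + citizen.rewards);
    }
}
```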

4 Discussion The integration of stakeholders in the deployment of post-crisis projects presents an important opportunity for the city to improve the relationship between decision-makers and its stakeholders. This integration also allows the city to improve its economic performance by reducing the possible impact of resistance to the deployed projects. The collection and intelligent use of data is also an important issue for improving the governance of the city through the prediction of its vulnerable axes.


This prediction based on data also allows the prioritization of projects as well as the rationalization of budgets, which directly impacts the cost of executing the post-crisis projects. The game scenarios established through gamification can reduce the vulnerability of the axes studied, as well as improve the participation of citizens in the daily management of the city. The mobile application allows live monitoring of the activity of stakeholders in relation to the projects decided, as well as of the progress against the action plan decided for the post-crisis period. This sequence materializes an approach to intelligent governance centred on the needs of stakeholders while calling on the use of data as well as technology, and it feeds the Human Technology Driven approach for the management of crisis situations. The smart city of Casablanca, as already described at the start, highlights an ambitious project that calls on the Human Technology Driven approach for management, an approach that allows the participation and consideration of the city's stakeholders. The concept of the frugal smart city improves our vision and perception of intelligence by taking into account collective intelligence and technological progress at the same time.

5 Conclusion COVID-19 can present a real opportunity for smart cities to stand out quickly, as this concept of gamification adds a layer of trust between decision-makers and the various other actors of the city by integrating them into a collective, purely participatory decision-making process. This allows the smart city to follow the needs of its citizens and build a framework of participatory democracy based on technology at the service of humans. We also presented a general framework for gamification that integrates the various interested parties, mainly the citizen, through the Human Technology Driven approach for the management of crisis situations, more precisely for the post-crisis situation, through an integral, participatory and above all frugal solution. Our work initiates a serious reflection on the integration of the citizen in public decision making, to further develop the concept of smart cities, to make the citizen an active actor in the process of managing the city and, especially, to empower the other actors in making city governance a common subject. This paper also builds the foundations of a participatory democracy within smart cities by using technology and reducing the distance between the decision maker and the citizen.


References
1. Kunzmann, K.R.: Smart cities after COVID-19: ten narratives. disP—Planning Rev. (2020). https://doi.org/10.1080/02513625.2020.1794120
2. Kummitha, R.K.R.: Smart technologies for fighting pandemics: the techno- and human-driven approaches in controlling the virus transmission. Govern. Inf. Q. 101481 (2020). https://doi.org/10.1016/j.giq.2020.101481
3. Xu, C., Luo, X., Yu, C., Cao, S.J.: The 2019-nCoV epidemic control strategies and future challenges of building healthy smart cities. Indoor Built Environ. 29(5), 639–644 (2020). https://doi.org/10.1177/1420326X20910408
4. Clay, K., Lewis, J., Severnini, E.: What explains cross-city variation in mortality during the 1918 influenza pandemic? Evidence from 438 U.S. cities. Econ. Hum. Biol. 35, 42–50 (2019). https://doi.org/10.1016/j.ehb.2019.03.010
5. Fernandez-Anez, V., Fernández-Güell, J.M., Giffinger, R.: Smart city implementation and discourses: an integrated conceptual model. The case of Vienna. Cities 78, 4–16 (2018). https://doi.org/10.1016/j.cities.2017.12.004
6. Brajković, E.: Evaluation of methods for sentence similarity for use in intelligent tutoring system. 3(5), 1–5 (2018)
7. Moustaka, V., Maitis, A., Vakali, A., Anthopoulos, L.G.: CityDNA dynamics: a model for smart city maturity and performance benchmarking. In: The Web Conference 2020—Companion of the World Wide Web Conference, WWW 2020, pp. 829–833 (2020). https://doi.org/10.1145/3366424.3386584
8. Batabyal, A.A., Nijkamp, P.: Creative capital, information and communication technologies, and economic growth in smart cities. Econ. Innov. New Technol. 28(2), 142–155 (2019). https://doi.org/10.1080/10438599.2018.1433587
9. Hayar, A., Betis, G.: Frugal social sustainable collaborative smart city Casablanca paving the way towards building new concept for "future smart cities by and for all". 1–4 (2018). https://doi.org/10.1109/senset.2017.8305444
10. Rezzouqi, H., Gryech, I., Sbihi, N., Ghogho, M., Benbrahim, H.: Analyzing the accuracy of historical average for urban traffic forecasting using Google Maps. Springer International Publishing (2018). https://doi.org/10.1007/978-3-030-01054-6_79
11. Kazhamiakin, R., Marconi, A., Martinelli, A., Pistore, M., Valetto, G., Bruno, F., Trento, K.: A gamification framework for the long-term engagement of smart citizens (2016)
12. Garrett, T.A.: Pandemic economics: the 1918 influenza and its modern-day implications. 75–94 (2008)
13. Founoun, A., Hayar, A.: Evaluation of the concept of the smart city through local regulation and the importance of local initiative. In: 2018 IEEE International Smart Cities Conference, ISC2 2018, pp. 1–6 (2019). https://doi.org/10.1109/ISC2.2018.8656933
14. Founoun, A., Hayar, A., Haqiq, A.: The textual data analysis approach to assist the diagnosis of smart cities initiatives. In: 5th IEEE International Smart Cities Conference, ISC2 2019, pp. 150–153 (2019). https://doi.org/10.1109/ISC246665.2019.9071663
15. Besley, T.: Law, regulation, and the business climate: the nature and influence of the World Bank Doing Business project. 29(3), 99–120 (2015)

Effect of Normal, Calcium Chloride Integral and Polyethene Sheet Membrane Curing on the Strength Characteristics of Glass Fiber and Normal Concrete
Jaison Joy Memadam and T. V. S. Varalaksmi

Abstract The compressive strength of concrete is high compared to its tensile strength. Fiber inclusion improves both the tensile strength and the compressive strength of concrete. Many researchers have included fibers in concrete and obtained positive results. In this study, alkali-resistant glass fibers are added to the concrete in different proportions (0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5 and 4%). The compressive and tensile strengths of the specimens were then determined, and the optimum glass fiber percentage was obtained by comparing the compressive and tensile strengths of the different specimens. Similarly, the percentage of calcium chloride to be added to the concrete for calcium chloride integral curing was also determined: different percentages (0, 0.5, 1, 1.5, 2, 2.5 and 3%) of calcium chloride were added, and the optimum percentage was determined by identifying the concrete mix with the highest compressive strength. Polythene sheets of various colors (black, blue, white and pink) were used as membranes for curing; the compressive strength after 28 days was then obtained and the most suitable membrane color was identified. Both normal concrete and glass fiber concrete were then cured by normal curing, calcium chloride integral curing and polythene membrane curing, and the strength characteristics of both were identified after curing for 7, 14 and 28 days. Keywords Fiber reinforced concrete · Calcium chloride integral curing · Membrane curing

1 Introduction Concrete construction is an unavoidable necessity in the modern world. Concrete is obtained by mixing the required proportions of cement, water, aggregates and admixtures, in which ordinary Portland cement is the major binder component. When cement is mixed with water, the hydration process occurs and results in strength gain. J. J. Memadam (B) · T. V. S. Varalaksmi Acharya Nagarjuna University, Guntur, Andhra Pradesh 522510, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_10


Fibers are also mixed with concrete in order to enhance its strength and durability characteristics. In this study, glass fibers were added to the concrete. The addition of glass fibers into the concrete reduces cracks and shrinkage at the surface [1]. The elastic modulus of glass fibers is on the higher side, which results in increased tensile strength; this addition results in glass fiber concrete.

1.1 Curing Moisture inside the concrete is maintained by the process of curing. Moisture is extremely essential for the hydration process to occur, which ultimately leads to strength gain in the concrete [2]. Curing can be defined as the process of providing moisture to the concrete for a specified time period by controlling the movement of moisture and the temperature, in order to facilitate the hydration process [3]. External weather conditions may lead to the loss of the moisture inside the concrete. Proper curing results in increased density and reduced porosity; pores, shrinkage, strength defects and chemical defects are the results of improper curing [4]. There are different types of curing: conventional curing, in which the concrete surface is kept moist for a certain duration; membrane curing, in which impermeable membranes are used to prevent moisture loss; and accelerated curing, in which early strength is achieved by adding some admixtures. It has also been shown that concrete gains only 50% of its estimated strength when it is left uncured [5]. So, in this study, we have used three types of curing methods, normal curing, calcium chloride integral curing and polythene sheet membrane curing, for both normal concrete and glass fiber concrete. Calcium chloride is one of the common accelerators used in concrete: it accelerates cement hydration and reduces the concrete setting time [6]. The use of calcium chloride is limited to 2% by the American Concrete Institute. Membrane curing is generally used in places where the availability of water is limited; it acts as a physical barrier that protects the concrete against the escape of water through evaporation [7].

1.2 Glass Fiber Concrete Glass fiber addition to the concrete prevents shrinkage and cracks at its surfaces [8], and the tensile strength can also be enhanced by its addition [9]. Fibers can be treated as an aggregate; instead of a round and smooth appearance, they are linear, and they can easily surround the aggregates by interlocking with them [10]. The addition of fibers to the concrete results in lower workability, as the fibers tend to form a wire mesh around the aggregates; they should therefore be properly dispersed in the concrete during mixing. Fiber addition also helps to increase the ductility, impact strength and fatigue strength [11].

Table 1 Properties of cement

Property | Result | IS 12269-1987 requirements
Specific gravity | 3.12 | –
Fineness: (a) Blaine | 275 m²/kg | Min 225 m²/kg
Fineness: (b) Sieve test | 2.5% | Not more than 10%
Standard consistency | 27% | –
Initial setting time | 45 min | Not less than 30 min
Final setting time | 360 min | Maximum 600 min
Le-Chatelier soundness | 2 mm | Not more than 10 mm


2 Materials and Methods 2.1 Cement Ordinary Portland cement was used for mixing. Testing of physical properties was in accordance with IS: 4031, and the results are tabulated in Table 1.

2.2 Fine Aggregate River sand is used as the fine aggregate. Figure 1 depicts the particle distribution graph. The fineness modulus and the specific gravity of fine aggregate used were 3.83 and 2.68, respectively.

2.3 Coarse Aggregate Aggregates with a nominal size of 20 mm were used. Water absorption and specific gravity were found to be 0.5% and 2.72, respectively. Figure 2 represents the particle size distribution graph for the coarse aggregate.

Fig. 1 Particle size distribution graph for fine aggregate (cumulative percent passing vs. sieve size in mm)

Fig. 2 Particle size distribution graph for coarse aggregate (cumulative percent passing vs. sieve size in mm)

2.4 Mix Ratio The mix ratio of M-30 concrete was calculated using IS: 10262-2009 and SP-23, which resulted in a mix ratio of 1:1.88:3.38 (cement:sand:coarse aggregate) with a water-cement ratio of 0.45.

2.5 Calcium Chloride The properties of the CaCl2 used for the experiment are shown below.
• Type: Powder
• Color: White
• Odor: Odorless
• Solubility: Water soluble

Table 2 Glass fiber properties

Property | Result
Type | Alkali resistant
Specific gravity | 2.68
Elastic modulus | 72 GPa
Tensile strength | 2100 MPa
Length | 12 mm
Density | 2680 kg/m³
Aspect ratio | 857.1

2.6 Glass Fiber Table 2 shows the glass fiber properties.

2.7 Compression Strength Test The compression test of the concrete cubes of size 15 cm was carried out as per IS: 516-1959.

2.8 Tensile Strength Test Cylindrical specimens of 150 mm diameter and 300 mm height were cast and tested as per IS: 5816.

3 Results and Discussions 3.1 Glass Fiber Concrete The compressive strength of concrete with different glass fiber percentages is shown in Fig. 3. The compression test showed the optimum percentage of glass fiber to be added to be 3%. The tensile test results are shown in Fig. 4. There was an increase of 4% in compressive strength and 22% in tensile strength with the addition of 3% glass fiber.

Fig. 3 Compression test results of concrete with different glass fiber percentages (compressive strength in MPa vs. number of curing days 7, 14 and 28, for 0–4% glass fiber)

Fig. 4 Tensile strength test results of concrete with different glass fiber percentages (tensile strength in MPa vs. number of curing days 7 and 28, for 0–4% glass fiber)

3.2 Calcium Chloride Integral Curing Calcium chloride can be used as an accelerated curing agent. The compressive strength with different percentages of calcium chloride is shown in Fig. 5. Calcium chloride addition to the cement enables it to gain strength rapidly, so the compressive strength after 7 days of curing is much higher than that of normal concrete after the same number of curing days; there was an increase of 25% in compressive strength after 7 days of curing. The strength after 14 days did not show much variation, but there was a noticeable decrease in the compressive strength after 28 days of curing, which results from chloride attack. The optimum percentage of calcium chloride was found to be 2%, since it gave the least strength loss.

Fig. 5 Compression test results of concrete with different calcium chloride percentages used for integral curing (compressive strength in MPa vs. number of curing days 7, 14 and 28, for 0–3% calcium chloride)

Fig. 6 Compression test results of concrete cured with different colors of polythene sheet membranes (compressive strength in MPa vs. number of curing days 7, 14 and 28; series: normal curing, black, blue, white and pink sheets)

3.3 Polythene Membrane Curing The compressive strength for the different polythene sheet colors is shown in Fig. 6. Curing with the white polythene sheet showed the least strength loss: a compressive strength loss of 11.4% was observed for white polythene sheet curing when compared to normal curing.

3.4 Normal Concrete Under Different Curing Conditions The compressive strength of normal concrete under the different types of curing is shown in Fig. 7. Normal curing gave the highest and polythene curing the least compressive strength. Density also showed a similar trend, as shown in Fig. 8.

Fig. 7 Compression test results of normal concrete under different types of curing (compressive strength in MPa at 7, 14 and 28 days for normal curing, white polythene sheet curing and calcium chloride integral curing)

Fig. 8 Density of normal concrete after a curing age of 28 days (density in kg/m³ for normal, polythene and calcium chloride integral curing)

3.5 Glass Fiber Concrete Under Different Curing Conditions Figure 9 shows the compressive strength of glass fiber concrete at 7, 14 and 28 days for the different types of curing. The highest and the least compressive strengths were observed for normal and polythene curing, respectively. The same trend was followed for density, as represented in Fig. 10.

Fig. 9 Compression test results of GFRC under different types of curing (compressive strength in MPa at 7, 14 and 28 days for normal, polythene and calcium chloride integral curing)

Fig. 10 Density of glass fiber concrete after a curing age of 28 days (density in kg/m³ for normal, polythene and calcium chloride integral curing)

3.6 Strength Characteristics of Normal Concrete and GFRC The compressive strengths of both normal and glass fiber concrete after 7, 14 and 28 days are shown in Figs. 11, 12 and 13, respectively. Glass fiber concrete showed the maximum compressive strength for all three durations of curing. Calcium chloride integral curing resulted in high early strength gain in both normal concrete and glass fiber concrete.

Fig. 11 Compression test results of normal concrete and GFRC after 7 days (compressive strength in MPa for normal, polythene and calcium chloride integral curing)

Fig. 12 Compression test results of normal concrete and GFRC at a curing age of 14 days (compressive strength in MPa for normal, polythene and calcium chloride integral curing)

Fig. 13 Compression test results of normal concrete and GFRC after 28 days (compressive strength in MPa for normal, polythene and calcium chloride integral curing)

4 Conclusions The experiments resulted in the following conclusions:

(i) The tensile and compressive strengths of glass fiber concrete showed their maximum values when the percentage of glass fiber added was 3%; there was an increase of 22% in tensile strength for the glass fiber concrete.
(ii) The minimum compressive strength loss was found when the amount of CaCl2 added to the concrete was 2%. The addition of calcium chloride for accelerated curing resulted in a 25% increase in compressive strength after 7 days of curing.
(iii) White colored polythene sheets are the most desirable for membrane curing, as they showed only a 10% decrease in compressive strength when compared to normal curing.
(iv) The density of both normal and glass fiber concrete is the least for polythene membrane curing when compared to normal and calcium chloride integral curing.

References
1. Khan, S.A., Ahmed, Z., Afiya, A.: Performance of FRC produced with mineral admixtures and waste plastic fibers under sulfate attack. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 8(6S4), 1101–1107 (2019)
2. Neville, A.M.: Properties of Concrete, 4th edn. John Wiley and Sons, New York (1996)
3. Price, W.H.: Factors influencing concrete strength. J. Am. Concr. Inst. (1991)
4. Wojcik, G.S., Fitzjarrald, D.R.: Energy balances of curing concrete bridge decks. J. Appl. Meteorol. 40(11) (2001)
5. Mamlouk, M.S., Zaniewski, J.P.: Materials for Civil and Construction Engineers, 2nd edn. Pearson Prentice Hall, New Jersey (2006)
6. Rapp, P.: Effect of calcium chloride on Portland cements and concretes. In: Proceedings, Fourteenth Annual Meeting, Highway Research Board (1934)
7. Akinwumi, I., Gbadamosi, Z.: Effects of curing condition and curing period on the compressive strength development of plain concrete. Int. J. Civil Environ. Res. 1, 83–99 (2014)
8. Shah, S.P., Rangan, V.K.: Effect of fiber addition on concrete strength. Indian Concr. J. 5(2–6), 13–21 (1994)
9. Shende, A., Pande, A., Pathan, M.G.: Experimental study on steel fiber reinforced concrete for M-40 grade. Int. Refer. J. Eng. Sci. 1(1), 043–048 (2012)
10. Jagarapu, D.C.K.: Experimental studies on glass fiber concrete. Am. J. Eng. Res. 5, 100–104 (2016)
11. Vairagade, V., Kene, K.: Comparative study of steel fiber reinforced over control concrete. Int. J. Sci. Res. Publ. 2(5), 1–4 (2012)
12. Al-Gahtani, A.S.: Effect of curing methods on the properties of plain and blended cement concretes. J. Construct. Build. Mater. 24, 308–314 (2014)
13. ASTM standards
14. Bakhshi, M., Mobasher, B.: Simulated Shrinkage Cracking in the Presence of Alkali Resistant Glass Fibers, pp. 36–48. ACI Special Publication, American Concrete Institute (2011)
15. El-Reedy, M.A.: The concrete industry. In: Advanced Materials and Techniques for Reinforced Concrete Structures (2009)
16. Hemavathi, S., Kumaran, A., Sindhu, R.: An experimental investigation on properties of concrete by using silica fume and glass fibre as admixture. Mater. Today: Proc. 21 (2019). https://doi.org/10.1016/j.matpr.2019.06.558
17. IS 10262-2009: Recommended guidelines for concrete mix design (2009)
18. IS 383-1970: Specification for coarse and fine aggregates from natural sources for concrete (1970)
19. Klieger, P.H.: Effect of mixing and curing on concrete strength. ACI J. Proc. 4(12), 1063–1081 (1958)
20. Siddiqui, M.S., Nyberg, W., Smith, W., Blackwell, B., Riding, K.A.: Effect of curing water availability and composition on cement hydration. ACI Mater. J. 110(3), 315–322 (2013)
21. SP-23: Handbook on Concrete Mixes
22. IS: 12269-1987: Specification for 53 grade ordinary Portland cement. BIS, New Delhi, India (1987)
23. IS: 4031-1996: Methods of physical tests for hydraulic cement. Bureau of Indian Standards, New Delhi, India (1996)
24. IS: 516-1959: Indian standard code of practice methods of test for strength of concrete. Bureau of Indian Standards, New Delhi, India (1959)
25. IS: 5816-1999: Splitting tensile strength of concrete method of test. Bureau of Indian Standards, New Delhi, India (1999)
26. IS: 10262-2009: Concrete mix proportioning guidelines. Bureau of Indian Standards, New Delhi, India (2009)
27. IS: 456-2000: Plain and reinforced concrete code of practice (fourth revision). Bureau of Indian Standards, New Delhi, India (2000)

Problems with Health Information Systems in Ecuador, and the Need to Educate University Students in Health Informatics in Times of Pandemic Gustavo Caiza, Fernando Ibarra-Torres, Marcelo V. Garcia, and Valeria Barona-Pico

Abstract Background: Health information systems are important in every setting, but they have been thought of only at the macro level, and the universities that host thousands of students have been neglected. At present, universities face a serious problem: the lack of health informatics personnel and the lack of importance given to health information systems. Objectives: The present research uses a survey to determine the use of health informatics in universities and to assess the training needs of university students in public health fields. Methods: The survey was conducted in two periods, July–September 2019 and July–September 2020, with 3967 university students in Zone 3 of Ecuador, with the objective of evaluating the knowledge and use of health information systems. Results: The majority of the students surveyed acknowledge that they do not use a computerized system to manage health services. Less than half have received any type of education in health informatics. Only between 15 and 33% say that their universities provide the necessary support to promote education in health informatics. Ninety-two percent would be interested in learning the use of online hospital services, and 68% in health information search. Conclusions: University students in Ecuador need preparation in health informatics, which requires the revision of educational curricula and the implementation of educational projects for health informatics training. Keywords Health informatics · University students · Health information systems · Health problems in education · Health informatics education

G. Caiza Universidad Politécnica Salesiana, UPS, 170146 Quito, Ecuador e-mail: [email protected] F. Ibarra-Torres (B) · M. V. Garcia · V. Barona-Pico Universidad Técnica de Ambato, UTA, 180103 Ambato, Ecuador e-mail: [email protected] M. V. Garcia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_11


1 Introduction The combination of informatics and health is becoming increasingly important for improving health system services [1]. Health informatics is a multidisciplinary field that uses health information technology to improve health care services; a health information system is based on the design and construction of information systems applied to the health care field [2, 3]. In times of pandemic, computer systems are being used as a tool by healthcare professionals to assist in the daily management of care, avoid complications, prevent medical and medication errors, and conduct clinical research [4]. Health information systems facilitate the management of various aspects of health, while health informatics supports both healthcare professionals and patients in their decision making to achieve the desired outcomes. This support is achieved through the use of information structures, information processes and information technology [5, 6]. Globally, countries regardless of their region or geographical location are trying to improve their health information systems, held back mainly by the lack of logistical and personnel resources [7]. In general, countries face similar challenges in the implementation of hospital information systems [8]. Such is the case in the South American region, where there is a marked difference between one country and another when implementing a health information system, although efforts are made to adopt a single approach to promote the implementation of information technologies in healthcare environments [9]. In different health institutions at the local, regional and global levels, data are still collected manually in booklets and then transcribed to digital databases, but a serious problem arises: the double work of collecting the same information and, even worse, the alteration or deletion of certain sensitive information at the time of transcription [10]. In view of the above, it becomes important to generate health policies and health training for informatics professionals so that they can create adequate systems, with the sole objective of improving health care services [11]. At the local level, several Ecuadorian universities have a health information system but lack well-trained informatics personnel, both data collectors and data analysts [7, 12]. Poor management and excessive bureaucracy when entering information into a health information system cause discomfort in the people in charge of the process [13, 14]. An important factor to be taken into account for the development of health information systems in local universities is cost, with the drawback of the little investment that state governments make in research [13]. However, a critical factor is the shortage of competent and well-trained informatics personnel at the country level. Several efforts have been made to mitigate these shortages, but it is necessary to allocate more financial resources to academia to continue training human talent in information technology and in the creation of health information systems that address precisely the local needs [15, 16]. Several studies have recommended that the academic preparation of future health professionals be reviewed; nevertheless, in


times of pandemic, each person seeks to obtain information to solve health problems without being exposed to potential contagion [13]. The lack of concern on the part of the governments of the day has meant that this preparation has not changed: professionals continue to graduate with the same deficiencies, and university students continue to be lost in the search for true health information [17, 18]. Developed countries, such as the United States, despite their advances in health informatics, continue to face barriers, especially in the entry of health information [19, 20]. Universities in the country and the region train health personnel and professionals, but what about the students who, without pursuing these careers, also find themselves in need of learning basic health informatics? The lack of knowledge in the health field has been reflected in how entire sectors have collapsed in this time of pandemic [11, 21]. Several governments in the region are looking for ways to implement health information systems, encountering various challenges and obstacles, but their use and application are still limited to recording data, without seeking to further analyse that information [22, 23]. Implementing or developing a health information system at the university level is not a simple task, even less so at the country level; it depends on several organizational and technological factors in order to arrive at a concise model [24, 25]. Therefore, commitment from university education is required, and our study aims to evaluate the use of health information systems by university students.

2 Method A longitudinal descriptive design was carried out among students from several universities in Zone 3 of the productive development of Ecuador. A sample of universities was selected in two periods, one up to September 2019 and another up to September 2020. In each period, the survey was administered, as far as possible, to the same students, and the study questionnaire was applied to approximately 10% of the total number of students in each university. For the selection of students, sampling was carried out among the different faculties of the universities in order to obtain different points of view on the subject. The first execution of the survey was designed to answer the question "What would be the reaction of university students to unforeseen health problems?" The second execution was designed to answer the same question, but after several actions had been taken, especially because of the pandemic; the collection of information was facilitated with the help of online surveys. The use of online surveys was decided by the researchers in view of the mobility restrictions that began to apply in the country. We are grateful for the collaboration of several friendly professors from the universities involved, who were great facilitators of the present research. Each questionnaire was divided into two parts: the first part aimed to collect demographic information about the participants; the second covered the use of health informatics systems, current knowledge of health informatics, and the skills needed to understand health informatics.

Table 1 Distribution of university students (n = 3967), according to survey, 2020

Has economic activity    %
Study and work           44.6
Just study               55.4

The students did not have a time limit to answer the questionnaire; a total of 20 questions were asked, using a Likert-type scale. The researchers decided to politely ask fellow teachers to administer the survey during class hours, thus ensuring that the answers would be delivered on time. From these teachers' observations, the questionnaire was answered in an average time of 15 min.

3 Results

3.1 Participants

A total of 3967 students from 3 universities in the central part of the country responded to the questionnaires; 100% of the respondents were Ecuadorian. The average age of the respondents was 22 years (range 19–25). About 45% were in the first levels of university (first–third), 35% in the intermediate levels (fourth–sixth), and 20% in the upper or final levels (seventh–ninth). About 80% of the respondents had never used health informatics tools. Regarding socioeconomic status, 10% of students considered themselves upper class or wealthy, 55% middle class, and 35% economically low class. Approximately 55% of the students surveyed did not work, while 45% worked in different areas (see Table 1).

3.2 Years of Study

The mean number of years of university study was 1 year (range 1–6), while the mean number of years of familiarity with health informatics or health information systems was 1 year (range 0–6). Most of the undergraduate students of the participating universities started using health information systems or researching health informatics with the onset of the pandemic (Tables 2 and 3).

Table 2 Years of study (n = 3967), according to survey, 2020

Years of study    %
0                 0
1–2               45.2
3–4               35.2
5–6               19.5

Table 3 Time spent using health information systems (n = 3967), according to survey, 2020

Years of use    %
0               79.8
1–2             20.2
3–4             0
5–6             0

3.3 Health Informatics Research

Overall, 70% of the students surveyed stated that prior to the pandemic they had not received any information on health informatics (see Fig. 1), 75% stated that they had never reviewed or manipulated a health information system (see Fig. 2), and 60% did not know what health informatics was (see Fig. 3).

Fig. 1 Have received training in health informatics before the pandemic (Yes 30%, No 70%)

Fig. 2 Manipulation of health information systems (Yes 25%, No 75%)

Fig. 3 Knowledge of health informatics (Yes 40%, No 60%)

Fig. 4 Interest in health informatics by gender (Man 40%, Female 60%)

3.4 Use of Information Technology by University Students

One aspect that caught the researchers' attention is that, of the 40% of students who had some knowledge of health informatics, 65% were female and showed more inclination to learn about or investigate the topics surrounding health informatics (see Fig. 4), while 52% of the male students who responded to the survey showed a lack of interest in learning about health informatics.

3.5 Knowledge in Health Informatics to be Received

A large proportion of university students indicated that they needed knowledge in health informatics: 56.0% stated that with the pandemic their interest in learning health informatics grew regardless of their original degree program, and 42% said that the development of a university's own health information system was important (see Fig. 5).

Fig. 5 Statistics and interest in learning and developing health informatics (interest in learning health informatics: 56 vs. 44; development of health information systems: 58 vs. 42)

4 Discussion

The results of this survey, applied at two different times, one at the beginning of the pandemic and the other at its height, give us a different vision of what university students need in terms of health. The shortcomings revealed make it clear that a restructuring of the academic curriculum is needed. The results of this research show clearly that health informatics is increasingly important from an early age, in this case for university students. The majority of those surveyed do not use or know about health informatics; this is worrying and, to a certain extent, helps explain the country-level crisis during this pandemic. The economic and social situation of the students appears to be set aside when it comes to learning about health informatics: the requirement, the need, the lack of knowledge, and the concern about the learning gap are the same across groups. An important aspect to highlight is the greater concern for learning about health informatics among female students; at the country level, this is reflected in the fact that, unfortunately, most of those infected during the pandemic period in Ecuador have been men.

The present research has limitations. It was carried out as part of a broad survey of university students in the country to determine the degree of need to learn about health informatics; the sample included 10% of the students of the 3 universities, so it cannot be described as representative of all university students in the country. Nevertheless, the sample comprised a considerable number of participants, which helped provide a clearer picture than was initially available, and having respondents from the three socio-demographic groups gave the research a clearer vision of the problem. The results also identified that, although the sample includes university students in the health area, they do not have sufficient knowledge, and in several cases no knowledge at all, of health informatics.


References

1. Sánchez-Henarejos, A., Fernández-Alemán, J.L., Toval, A., Hernández-Hernández, I., Sánchez-García, A.B., de Gea, J.M.C.: Guía de buenas prácticas de seguridad informática en el tratamiento de datos de salud para el personal sanitario en atención primaria. Atención Primaria 46(4), 214–222 (2014)
2. Nutley, T., Reynolds, H.: Improving the use of health data for health system strengthening. Glob. Health Action 6(1), 20001 (2013)
3. Liu, H., et al.: An information extraction framework for cohort identification using electronic health records. AMIA Summits Transl. Sci. Proc. 2013, 149 (2013)
4. World Health Organization: Bangladesh health system review. WHO Regional Office for the Western Pacific, Manila (2015)
5. Abdekhoda, M., Ahmadi, M., Dehnad, A., Hosseini, A.F.: Information technology acceptance in health information management. Methods Inf. Med. 53(1), 14–20 (2014)
6. Murauskiene, L., et al.: Lithuania: health system review (2013)
7. Pleasant, A.: Advancing health literacy measurement: a pathway to better health and health system performance. J. Health Commun. 19(12), 1481–1496 (2014)
8. Duckett, S., Willcox, S., et al.: The Australian Health Care System, 5th edn. Oxford University Press (2015)
9. Naylor, C.D.: On the prospects for a (deep) learning health care system. JAMA 320(11), 1099–1100 (2018)
10. Naser, S.S.A., Al-Bayed, M.H.: Detecting health problems related to addiction of video game playing using an expert system (2016)
11. Lee, J., Wu, F., Zhao, W., Ghaffari, M., Liao, L., Siegel, D.: Prognostics and health management design for rotary machinery systems—reviews, methodology and applications. Mech. Syst. Signal Process. 42(1–2), 314–334 (2014)
12. Xu, B., Xu, L., Cai, H., Jiang, L., Luo, Y., Gu, Y.: The design of an m-Health monitoring system based on a cloud computing platform. Enterp. Inf. Syst. 11(1), 17–36 (2017)
13. Hammad, S.A., Jusoh, R., Ghozali, I.: Decentralization, perceived environmental uncertainty, managerial performance and management accounting system information in Egyptian hospitals. Int. J. Account. Inf. Manag. (2013)
14. Riggirozzi, P.: Regionalism through social policy: collective action and health diplomacy in South America. Econ. Soc. 43(3), 432–454 (2014)
15. Rutten, C.J., Velthuis, A.G.J., Steeneveld, W., Hogeveen, H.: Invited review: sensors to support health management on dairy farms. J. Dairy Sci. 96(4), 1928–1952 (2013)
16. Dodson, S., Good, S., Osborne, R.: Health literacy toolkit for low and middle-income countries: a series of information sheets to empower communities and strengthen health systems (2015)
17. Abdelhak, M., Grostick, S., Hanken, M.A.: Health Information e-Book: Management of a Strategic Resource. Elsevier Health Sciences (2014)
18. Herrero, M.B., Tussie, D.: UNASUR Health: a quiet revolution in health diplomacy in South America. Glob. Soc. Policy 15(3), 261–277 (2015)
19. Wager, K.A., Lee, F.W., Glaser, J.P.: Health Care Information Systems: A Practical Approach for Health Care Management. Wiley, New York (2017)
20. Fradelos, E.C., Papathanasiou, I.V., Mitsi, D., Tsaras, K., Kleisiaris, C.F., Kourkouta, L.: Health based geographic information systems (GIS) and their applications. Acta Inform. Medica 22(6), 402 (2014)
21. Cline, G.B., Luiz, J.M.: Information technology systems in public sector health facilities in developing countries: the case of South Africa. BMC Med. Inform. Decis. Mak. 13(1), 1–12 (2013)
22. Darrigran, G., et al.: Non-native mollusks throughout South America: emergent patterns in an understudied continent. Biol. Invasions 22(3), 853–871 (2020)
23. Maggi, M., et al.: Honeybee health in South America. Apidologie 47(6), 835–854 (2016)
24. Avigliano, E., Schenone, N.F.: Human health risk assessment and environmental distribution of trace elements, glyphosate, fecal coliform and total coliform in Atlantic Rainforest mountain rivers (South America). Microchem. J. 122, 149–158 (2015)
25. Montalvo, W., Ibarra-Torres, F., Garcia, M.V., Barona-Pico, V.: Evaluation of WhatsApp to promote collaborative learning in the use of software in university professionals. In: Applied Technologies, pp. 3–12 (2020)

Utilizing Technological Pedagogical Content (TPC) for Designing Public Service Websites

Zahra Hosseini, Jani Kinnunen, and Kimmo Hytönen

Abstract With the indispensable role of the Internet in information transfer, ensuring the validity and accuracy of data is ever more important. Digital public services are expected from e-governments, and such services require user-centered websites as well as fully interoperable platforms. Many studies have shown a lack of user satisfaction with website content. In this regard, we suggest technological pedagogical content design (TPCD), the use of a pedagogical theory for designing a website. TPCD is derived from the technological pedagogical content knowledge (TPACK) framework introduced by Mishra and Koehler (Teach. Coll. Rec. 108:1017–1054, 2006), which emphasizes integrating technology into pedagogy and content. TPCD, with a pedagogical view in selecting and constructing the content, aims to draw attention to learning theories and to the quality of content needed to build a user-centered website. It suggests drawing on learning theories to understand the users' needs, background, and expectations in order to define the goals, organize the content, and select the technological features. This paper opens a new viewpoint for website developers by introducing practical guidelines to transform TPACK, as knowledge of integrating technology, into action.

Keywords TPACK · Website designing · TPCD · User-centered design

Z. Hosseini (B) Tampere University, Kalevantie 4, 33100 Tampere, Finland e-mail: [email protected] J. Kinnunen Åbo Akademi University, Tuomiokirkontori 3, 20500 Turku, Finland e-mail: [email protected] K. Hytönen Independent MSc. Engineering Researcher, Tampere, Finland © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_12


1 Introduction

While organizations invest huge amounts of money and effort to transfer valid and reliable information, many users seek the information they require from various social media and assume they are better sources of information [2, 7, 12, 28]. Given the important role of social media, with their powerful communication facilities and attractions, paid information may be prioritized, and wrong information delivered by peers may lead to biased, incorrect, or outdated information. This is a significant issue that calls for practical and valid solutions.

The literature indicates various attempts to find and introduce models for creating and evaluating websites in different areas [14, 18, 19, 24–26]. Some studies report deficiencies in website content, particularly when information transfer is the purpose (Bernstein et al. 2021). Acosta-Vargas et al. [1], after evaluating government websites in 20 countries, concluded that most websites do not achieve the desired level of compliance. In Europe, the European Commission promotes its e-government action plan to "provide high quality, accessible online services to citizens and businesses in the EU by transforming websites managed by the Commission into a thematic, user-centered web presence" [4]. This is argued to increase transparency, support the participation of both citizens and businesses in policymaking, facilitate interactions, and answer the growing demand for high-quality, interoperable, and re-usable data.

Because of the importance of government websites in providing information for the public, researchers have attempted to define criteria and models for evaluating the quality of government websites. Rasyid and Alfina [23] introduced seven dimensions of website evaluation in their model: web design, reliability, responsiveness, privacy and security, personalization, information, and ease of use. The authors used their model to assess the quality of Indonesian government websites and found four of them suitable in their practice. In another attempt, Puling and Sitokdana [22] evaluated a government website on ten dimensions (accuracy, consistency, punctuality, completeness, reliability, availability, relevancy, credibility, efficiency, and value-added), concluding that the information on the website did not meet the expectations of the users.

While researchers have focused on defining criteria to evaluate the quality of content of government websites, there seems to be a need for practical instructions that help web designers use technology to organize the content of a website. Therefore, this article aims to suggest instructions for integrating pedagogical principles into the organization of website content. In this regard, we suggest technological pedagogical content design (TPCD) as a model for integrating pedagogy into technology for transferring a specific content. TPCD has the potential to develop human-interaction technology models by utilizing user-experience findings, with a stronger focus on content and pedagogical principles. Currently, we have progressed with guidelines, using the TPCD model, to redesign the website of Suomen Yrittäjät (eng. "Finland's Entrepreneurs"), the largest national association of small and medium-sized enterprises in Finland.


The user-centered design is built on the needs of the members of the association and an accompanying content analysis; similarly, the design of e-government websites needs to be built on citizen-centered user analysis together with context-specific content analysis. This paper presents the theoretical foundations of TPCD by discussing TPC and the bidirectional relations of its components in practice.

2 Theoretical Framework of TPCD

TPC is a practical instantiation of the TPACK framework for designing a website. Technological pedagogical content knowledge (TPACK), proposed by Mishra and Koehler [16], is a promising solution for making content easier to learn by integrating technology into pedagogy and content. Most existing practices of TPACK have emphasized developing the abilities of in-service and pre-service teachers in virtual and face-to-face educational environments [3, 8, 9, 27]. We expand this framework to a broader context and transform TPC knowledge into TPC practice outside the educational area. TPC is a core of three fundamental concepts: (a) technology, meaning all the different technologies available to create and design the website; (b) pedagogy, which is mostly related to learning theories and knowledge about learners; and (c) content, which can be linear or nonlinear and defines the construct of each subject. For integrating the components of TPC, we determine and define the stages of integration through three sub-integrations: pedagogical content (PC), technological content (TC), and technological pedagogy (TP).

2.1 Pedagogical Content (PC)

Pedagogical content is the use of pedagogical principles for organizing the content. This process involves the following steps: (a) content: the selected information and materials to be placed on the website; (b) goal: whatever we expect our audiences to know or do as a result of our work, where the goal map shows the selected contents based on the learning goals; (c) materials: different materials are needed for the different users' learning styles; (d) outlines: once we have the goal map, or the topics of the goals, we divide the content according to topics and subtopics, where each topic may include paragraphs of text, pictures, videos, or graphs. The steps of integrating pedagogy into content are shown in Fig. 1. Before starting to integrate pedagogy into content, knowledge about the users' expectations, background, and needs is obtained and defined.
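As a rough illustration, the goal map and outline from steps (b) and (d) could be sketched as a nested structure; the goal, topics, and material formats below are invented placeholders, not examples from the TPACK/TPCD literature:

```python
# Hypothetical goal map: a learning goal mapped to topics, subtopics,
# and the material formats chosen for different learning styles.
goal_map = {
    "goal": "Users can complete service registration online",
    "topics": [
        {
            "topic": "What the service offers",
            "materials": ["short text", "overview video"],
        },
        {
            "topic": "Step-by-step registration",
            "subtopics": ["create an account", "fill in the application form"],
            "materials": ["numbered screenshots", "link to FAQ"],
        },
    ],
}
```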


Fig. 1 Steps of using pedagogical content (PC) for creating a website

2.2 Technological Content (TC)

Integration of technology and content for designing websites refers to the use of techniques for presenting or teaching the content. The steps for content developers are (cf. Fig. 2): (a) selecting the tool or platform: designers mostly select a CMS platform, such as WordPress, Drupal, or Joomla, based on the facilities and complexities offered for user needs, while the selection of the template is based on the goal of the website or content; (b) customizing the template, using add-ons or plugins to manage the content; (c) deciding the content formats, based on the available tools and techniques.

Fig. 2 Steps of technological content (TC) for web designing

Utilizing Technological Pedagogical Content (TPC) for Designing …

133

2.3 Technological Pedagogy (TP) and Pedagogical Technology (PT)

Integration of technology into pedagogy is implemented in two ways: (a) create or select a learning management system (LMS), such as Moodle, to be used as an e-learning environment; (b) develop or use different tools for pedagogical purposes, e.g., YouTube, Google search, PowerPoint, or Zoom. Although some of these tools are not specialized for education, they are used for educational purposes and are listed among the top educational tools. Integrating pedagogy into technology, in turn, means using pedagogical principles to design a digital service that helps users understand content in a non-educational environment [11].

3 From Theory to Action

TPC for designing a website includes: (i) defining the aim (and objective goals) of the website (P analysis); (ii) selecting and developing materials for user groups (C analysis, developing PC); and (iii) delivering materials through the website (TPC creation).

3.1 Defining Objective Goals of Website

A successful user-centered website covers the aims of both the website owners and the users. Therefore, defining the objective goals of the website is rooted in knowing the users. To understand the aims of users, there is a need to investigate their characteristics, needs, background, skills, expectations, interests, etc. Different tools currently provide statistical data or feedback about users; these data help a website owner find the number of users and tracking information, or even their profession and expertise. However, interviews and surveys are important ways to get deeper data on the needs, expectations, and feedback of users when updating or renewing the website. Notably, a website is most successful when the demands and aims of the website owner and the users overlap. Therefore, at this stage, the expectations and needs of the users are compared with the aims of the website owners.

3.2 Selecting and Developing Materials

Each subject includes unlimited knowledge and materials. The content is the selected text and visual materials, chosen based on the aims of the website owner and the users. The most complex and professional decision in creating a website is determining what kind of materials (text, pictures, tables, graphs, interactive information, etc.) are suitable for each content. A user-centered website needs to take into account that: (1) different users have different tastes, habits, ages, digital literacy skills, cultural backgrounds, and learning styles; and (2) the website owner cannot exactly predict the users' characteristics. The next challenge after selecting the materials is constructing them. Each content has a particular construct, which can be linear or nonlinear. While content sequences are built based on the construct of a particular subject, and mind/concept map tools are suitable for drawing them, a more challenging task for a content developer is predicting the different paths users take through the different content objects (Fig. 3). The effectiveness of a website depends on providing easy and fast navigation of its pages.

Fig. 3 Different user groups' structures through the content

3.3 Inserting Content on Webpages

All information on the web is divided into pieces called content objects; in e-learning content development, this is the learning content object. It includes the goal and the materials as the content. Each learning content object includes: (i) a topic (based on the goal) that is very short and clear and covers the text; (ii) a text that is very short, like the subtopics, which are linked to other pages or to collapsing explanations that provide details on demand; (iii) materials based on different users' background knowledge; (iv) diversity in materials, giving different users the chance to get the information more easily; and (v) links from the page to other pages. Selecting visual techniques is based on how users can understand the information more easily, faster, and more effectively. Accordingly, some pedagogical criteria are suggested, such as sensible classification, simplicity, meaningfulness, minimality, clear instruction, and feedback [10]. Reviewing, testing, and updating the website is an iterative and continuous process, and each company and organization should have a strong plan and tools for it. For updating the content, knowing the degree of stability of each part of the content helps save the website owner's time.
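As a rough illustration of how a content object from this description might be represented in a content pipeline, the following sketch uses field names of our own choosing (they are not part of the TPCD model), including a stability tag supporting the update planning mentioned above:

```python
from dataclasses import dataclass, field

@dataclass
class ContentObject:
    """A webpage content object: a short goal-based topic, a very short
    text, materials for different user backgrounds, links to related
    pages, and a stability tag for planning content updates."""
    topic: str
    text: str
    materials: list = field(default_factory=list)   # e.g. video, graph, picture
    links: list = field(default_factory=list)       # related pages / detail pages
    stability: str = "stable"                       # how often this part changes

membership = ContentObject(
    topic="Joining the association",
    text="Membership is open to all small and medium-sized enterprises.",
    materials=["infographic: membership benefits", "short intro video"],
    links=["/fees", "/faq"],
    stability="volatile",   # assumed: fee information changes yearly
)
```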


4 Conclusions

While TPACK is introduced as a framework for the knowledge of integrating technology to teach content, TPCD is the action of integrating pedagogical conceptions into technology and content. TPCD utilizes pedagogical theories and principles, assumes users are learners rather than readers or observers, presents instructions, and prepares the content of websites. Content in TPCD applies the knowledge related to the construct of each subject, and technology applies all the techniques of a website to present the content. Integrating pedagogy into content yields the goals of a website and the outline of content based on the users and a learning process. Integrating pedagogy into technology yields the prediction of different users' paths on a website, based on knowledge about the users' experiences, needs, expectations, etc. Integrating content into technology suggests different formats of materials and helps prepare content objects for the pages of a website. TPCD thus proposes practical instructions for applying learning theories and knowledge about users to organize the content and insert it on webpages in a way that lets users find the information they require faster and more easily (Fig. 4).

TPCD promises practical instructions for creating an effective website. Building on previous attempts such as user-centered design (UCD) [6], user-driven development (UDD) [15], and user-based design [21], it introduces practical instructions for designing a website. It utilizes a pedagogical lens to focus on users in developing content and designing technology, and it can be applied with different technological tools, platforms, and templates for creating a website. However, the model is still at an early stage, and this paper has introduced the general instructions for the TPCD conception. Obviously, more applications and feedback from real practice will be valuable for further development. The authors have initiated the TPC design for the website of a large national business association of small and medium-sized firms and are producing a case study of the redesigned website, which is expected to be published soon. The approach will also be applied to a case website of a selected government organization to enhance the quality of e-government digital services in a citizen-centered manner; this is argued to be a suitable domain for TPCD, taking into account its context-specific requirements.

Fig. 4 TPC as the intersection of technology, pedagogy, and content

References

1. Acosta-Vargas, P., Luján-Mora, S., Salvador-Ullauri, L.: Quality evaluation of government websites. In: Proceedings of the Fourth International Conference on eDemocracy & eGovernment (ICEDEG), pp. 8–14. IEEE (2017). https://doi.org/10.1109/ICEDEG.2017.7962507
2. Bontcheva, K., Gorrell, G., Wessels, B.: Social media and information overload: survey results. arXiv preprint arXiv:1306.0813 (2013). https://arxiv.org/abs/1306.0813. Last accessed 2021/05/13
3. Chai, C.S., Koh, J.H.L.: Changing teachers' TPACK and design beliefs through the Scaffolded TPACK Lesson Design Model (STLDM). Learn.: Res. Pract. 3(2), 114–129 (2017). https://doi.org/10.1080/23735082.2017.1360506
4. European Commission: EU eGovernment Action Plan 2016–2020: Accelerating the digital transformation of government (2016)
5. Grigoroudis, E., Siskos, Y.: Customer satisfaction evaluation. Int. Ser. Oper. Res. Manage. Sci. (2010). https://doi.org/10.1007/978-1-4419-1640-2
6. Graham, A.K., Wildes, J.E., Reddy, M., Munson, S.A., Barr Taylor, C., Mohr, D.C.: User-centered design for technology-enabled services for eating disorders. Int. J. Eat. Disord. 52(10), 1095–1107 (2019). https://doi.org/10.1002/eat.23130
7. Hosseini, Z., Kotilainen, S., Okkonen, J.: The potential of social media to enhance cultural adaptation: a study on Iranian students in the Finnish context. In: Rocha, Á., Ferrás, C., Montenegro Marin, C., Medina García, V. (eds.) Information Technology and Systems. ICITS 2020. Advances in Intelligent Systems and Computing, vol. 1137, pp. 535–549. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-40690-5_52
8. Hosseini, Z., Tee, M.Y.: Development of technological pedagogical content knowledge through project-based learning. In: Proceedings of the 1st International Conference on World-Class Networking for World-Class Education (ICWEd 2011), 5–6 Dec 2011, Kuala Lumpur, Malaysia (2011). http://eprints.um.edu.my/13579/1/0001.pdf. Last accessed 2021/05/06
9. Hosseini, Z.: The comparison between the effect of constructivism and directed instruction on student teachers' technology integration. J. New Educ. Approaches 10(2), 21–40 (2016). http://uijs.ui.ac.ir/nea/article-1-1556-en.html. Last accessed 2021/05/06
10. Hosseini, Z., Okkonen, J.: Web-based learning for cultural adaptation: constructing a digital portal for Persian speaking immigrants in Finland. In: Arai, K. (ed.) Intelligent Computing. Lecture Notes in Networks and Systems, vol. 283, pp. 930–945. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-80119-9_62
11. Hosseini, Z., Kinnunen, J.: Integration of pedagogy into technology: a new paradigm. In: Education and New Developments, pp. 406–410 (2021). https://doi.org/10.36315/2021end086
12. Kim, K.S., Sin, S.C.J., Yoo-Lee, E.Y.: Undergraduates' use of social media as information sources. Coll. Res. Libr. 75(4), 442–457 (2014). https://doi.org/10.5860/crl.75.4.442
13. Kinnunen, J., Georgescu, I.: Disruptive pandemic as a driver towards digital coaching in OECD countries. Revista Romaneasca pentru Educatie Multidimensionala 12(2Sup1), 55–61 (2020). https://doi.org/10.18662/rrem/12.2Sup1/289
14. Li, L., Peng, M., Jiang, N., Law, R.: An empirical study on the influence of economy hotel website quality on online booking intentions. Int. J. Hosp. Manag. 63, 1–10 (2017). https://doi.org/10.1016/j.ijhm.2017.01.001
15. Merilampi, S., Ihanakangas, V., Virkki, J.: User-driven development with scientific & applied research: RFID-controlled physiogame case study. In: 2019 IEEE International Conference on RFID Technology and Applications (RFID-TA), pp. 167–170. IEEE (2019). https://doi.org/10.1109/RFID-TA.2019.8892150
16. Mishra, P., Koehler, M.J.: Technological pedagogical content knowledge: a framework for teacher knowledge. Teach. Coll. Rec. 108(6), 1017–1054 (2006). https://doi.org/10.1111/j.1467-9620.2006.00684.x
17. Niess, M.L.: Scaffolding subject matter content with pedagogy and technologies in problem-based learning with the online TPACK learning trajectory. In: Teacher Training and Professional Development: Concepts, Methodologies, Tools, and Applications, pp. 914–931. IGI Global (2018)
18. Mohseni, S., Jayashree, S., Rezaei, S., Kasim, A., Okumus, F.: Attracting tourists to travel companies' websites: the structural relationship between website brand, personal value, shopping experience, perceived risk and purchase intention. Curr. Issue Tour. 21(6), 616–645 (2018). https://doi.org/10.1080/13683500.2016.1200539
19. Kim, Y., Wang, Q., Roh, T.: Do information and service quality affect perceived privacy protection, satisfaction, and loyalty? Evidence from a Chinese O2O-based mobile shopping application. Telemat. Inform. 56, 101483 (2021)
20. Oliver, R.L.: Satisfaction: A Behavioral Perspective on the Customer, 2nd edn. McGraw-Hill, New York (2010)
21. Perdomo, E.G., Cardozo, M.T., Perdomo, C.C., Serrezuela, R.R.: A review of the user based web design: usability and information architecture. Int. J. Appl. Eng. Res. 12(21), 11685–11690 (2017)
22. Puling, A.S.E., Sitokdana, M.N.N.: Evaluation of information quality Kupang city government website. Tepian 1(3), 97–102 (2020)
23. Rasyid, A., Alfina, I.: E-service quality evaluation on e-government website: case study BPJS Kesehatan Indonesia. J. Phys.: Conf. Ser. 801(1), 012036 (2017). IOP Publishing
24. Sanabre, C., Pedraza-Jiménez, R., Vinyals-Mirabent, S.: Double-entry analysis system (DEAS) for comprehensive quality evaluation of websites: case study in the tourism sector. Profesional de la información 29(4) (2020). https://doi.org/10.3145/epi.2020.jul.32
25. Sharma, D., Srivastava, P.R., Pandey, P., Kaur, I.: Evaluating quality of matrimonial websites: balancing emotions with economics. Am. Bus. Rev. 23(2), 9 (2020). https://doi.org/10.37625/abr.23.2.358-392
26. Semerádová, T., Weinlich, P.: Website Quality and Shopping Behavior: Quantitative and Qualitative Evidence. Springer Nature (2020). https://doi.org/10.1007/978-3-030-44440-2
27. Tanak, A.: Designing TPACK-based course for preparing student teachers to teach science with technological pedagogical content knowledge. Kasetsart J. Soc. Sci. 53–59 (2018). https://doi.org/10.1016/j.kjss.2018.07.012
28. Westerman, D., Spence, P.R., Van Der Heide, B.: Social media as information source: recency of updates and credibility of information. J. Comput.-Mediat. Commun. 19(2), 171–183 (2014). https://doi.org/10.1111/jcc4.12041

A Multidimensional Rendering of Error Types in Sensor Data

Zlatinka Kovacheva, Ina Naydenova, and Kalinka Kaloyanova

Abstract The focus of this article is on data quality in wireless sensor networks. Different types of errors that occur during the operation of wireless sensor networks, and their classifications, are discussed. Based on the reviewed classifications, an approach to a multidimensional organization of WSN error types is proposed, which provides opportunities to monitor the frequency of sensor errors and to analyze the causes of their occurrence based on various criteria.

Keywords Multidimensional model · Sensor data quality · Types of errors

1 Introduction

The very fast development of wireless sensor networks (WSNs) has led to the deployment of a huge number of sensor devices in many fields of modern life, such as industry, smart cities, agriculture, healthcare, and transport. Along with advanced data analytics, IoT-enabled devices and sensors improve the quality of human life by reducing air and water pollution, cutting food waste, optimizing traffic congestion, energy consumption, and trash collection in large cities, improving health care and the quality of life of the elderly, and much more. WSN applications use thousands of sensors that produce huge amounts of data, but the data are useless if they are not correct. Poor sensor data quality may significantly affect the results of decision-making processes [1].

Z. Kovacheva (B) · I. Naydenova · K. Kaloyanova
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria
K. Kaloyanova
e-mail: [email protected]
Z. Kovacheva
University of Mining and Geology "St. Ivan Rilski", Sofia, Bulgaria
K. Kaloyanova
Sofia University "St. Kliment Ohridski", Sofia, Bulgaria
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_13


The reliability of WSN data is often challenged: there are cases when almost half of the collected data are faulty and cannot be used for meaningful interpretation [2, 3]. The quality of readings coming from sensor devices may be affected by noise, missing measurements, and inconsistent or duplicate observations. Sensor networks often use low-cost, low-quality devices that suffer from a severe lack of resources (e.g., limited memory, storage, power, communication, and computational capacities). In fact, the limited resources of WSN devices are among the main factors contributing to the inaccuracy and even unreliability of the data sets collected by sensor networks [4]. Other reasons for poor data quality include network congestion, intricate conditions, aging of the sensors, security breaches, improper sensor adjustment or deployment procedures, and more. In this paper, we explore different types of errors in data collected by WSN applications. We present various perspectives on and classifications of sensor network errors, which can be organized in a hypercube structure allowing easy summarization and investigation of data quality issues by different criteria.

2 Classification of Errors in Sensor Data

The sensor data quality issues most commonly addressed in publications are different types of faults and anomalies in the sensory output, followed by the lack of data [1]. Faults are deviations from normal operation at the sensor output. They can be classified as intermittent, persistent, or transient, depending on their occurrence pattern. Sensor faults can also be categorized according to their divergence from normal operation [1, 5–10]. The International Organization for Standardization defines the measurement error as the difference between the measured value and the true value of the measurand [9]. In mathematical terms, if a sensor is operating normally, its output can be described by the equation [8]:

Sn = f(t) + n    (1)

where Sn is the sensor output at time t, f(t) is the true value of the measured quantity, and n denotes noise. For a fault-free sensor, the signal Sn should be equal to f(t), but in reality there is some noise n associated with it. Usually, the values of n are small enough that they do not significantly affect the results. In such cases, it is difficult to estimate the true value of the measurement. We will call this value the "true value" below, although technically it is the expected value under normal sensor operation rather than the actual value of the measurand.

The data flow of a WSN application can be presented in the following layers:

• Perception layer—this physical layer consists of various types of sensors. On this layer, the sensor devices measure and collect readings.


• Network layer—this layer is responsible for the transportation and processing of the sensor readings collected in the perception layer. Different types of technologies are used to determine the routes for sending sensor data.
• Application layer—this layer delivers application-specific services to the user. The data received from the network layer are stored, processed, and analyzed here [1].

We can outline the following most typical sensor faults, which occur in the perception layer [1, 5–10]:

• Noise error—the reading deviates from the true value arbitrarily in the value range and constantly over time. As mentioned, the variations are usually small but random. Small noises are common and expected, but the emergence of a large amount of electronic noise can indicate a problem with the sensor.
• Constant offset error (bias or zero drift)—sensor readings deviate from the true value by a constant amount; i.e., the value is shifted compared to the normal behavior of the sensor [12]. Possible reasons for small deviations in sensor measurements are manufacturing imperfections and environmental interferences such as differences in temperature and electronic noise [9]. This kind of error can be presented mathematically by the following equation [8]:

Sn = f(t) + n + a,  a = constant    (2)

• Continuously varying or drifting error—sensor readings deviate constantly from the true value according to some linear or nonlinear continuous function. The drifting error usually appears due to the amortization of the sensing material [9]. If we denote by Δ(t) the offset at time t, and it increases over time, Δ(t) = Δ(t − 1) + a, then the drift fault can be presented by the equation [8]:

Sn = f(t) + n + Δ(t)    (3)

• Crash or jammed error (also known as a dead sensor fault)—the sensor stops delivering readings to its interface or crashes into some incorrect value. If the sensor reading is zero for a long period (or, alternatively, some other constant value, e.g., one or a maximum value), this state is called "stuck-at-zero," which is mainly due to the failure of the sensor [9]. In such a case, the sensor output signal undergoes no, or almost no, change. Such an error may be transient as well as persistent (a complete failure). The crash fault can be described by the following equation [8]:

Sn = a,  a = constant    (4)

• Clipping error (trimming error)—sensor readings are accurate if their values fall within a certain range, but are inaccurate outside the range. Hard clipping occurs when the signal is strictly limited to the threshold, which leads to a flat interruption; the clipping is described as soft when the clipped signal continues to follow the original with a reduced gain. Hard clipping at a threshold θ can be described by the following equation:

Sn = min(f(t) + n, θ)    (5)

• Outlier error—the reading deviates from the true value occasionally, at random moments in time. In general, the deviations are significant compared to normal values.
• Spike fault—the reading deviates significantly from the true value periodically; the spike fault is a variation of the outlier error type. If we denote by s the period at which the spikes occur, then the spike fault can be modeled by the equation [8]:

Sn = f(t) + n + a, if t mod s = 0;  Sn = f(t) + n, otherwise    (6)

Spikes are often caused by hardware or connection failures, but they can also be due to environmental conditions and physical phenomena. Some authors suggest that the identification of spikes should be based on the environmental context [5]. For example, light sensor outputs can suddenly show large variations in gradient, but this should not always be considered a sensor fault, as light intensity can vary over a wide range.
• Missing values (data loss fault)—the data loss error is a zero state at the sensor output at random intervals. The transmitted data can be lost because of interruptions of the sensor device due to battery lifetime, unstable wireless connections, environmental interferences, etc. [9]. If the amount of missing data is large, it leads to serious problems.
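To make these fault models concrete, the following minimal Python sketch injects each fault type into a synthetic signal following Eqs. (1)–(6); the signal shape, noise level, threshold, and fault parameters are illustrative assumptions, not values taken from the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)
f = 20 + 2 * np.sin(2 * np.pi * t / 50)   # assumed "true value" f(t)
n = rng.normal(0.0, 0.1, t.size)          # small random sensor noise

normal  = f + n                            # Eq. (1): S_n = f(t) + n
offset  = f + n + 3.0                      # Eq. (2): constant bias a = 3
drift   = f + n + 0.05 * t                 # Eq. (3): offset grows by a = 0.05 per step
crash   = np.zeros_like(f)                 # Eq. (4): "stuck-at-zero" sensor
clipped = np.minimum(f + n, 21.0)          # Eq. (5): hard clipping at threshold 21
spiky   = (f + n).copy()                   # Eq. (6): periodic spikes, period s = 40
spiky[::40] += 15.0
missing = (f + n).copy()                   # data loss fault: readings lost at random
missing[rng.random(t.size) < 0.05] = np.nan
```

Comparing each corrupted series against `normal` shows the characteristic signatures that fault detection schemes look for.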

The network can introduce additional communication faults affecting sensor data. The most typical communication faults in the network layer are the following [10, 11]:

• Omissions—omissions occur when sensor readings are missing because of lost messages. Prime reasons for this type of error are packet losses and sensor damage. Fading signal strength and periodic or continuous disturbances in the environment caused by rain, wind, etc., may lead to high packet loss and asymmetric communication links. To prevent omissions, it is necessary to enforce the reliability of the communication, for example, through message retransmission; unfortunately, such communication protocols require more energy and are therefore not widely used in WSNs. Missing values affect the results of different queries over sensor measurements and can be critical for the data processing and aggregation techniques implemented in the network layer. Omissions can be mitigated by masking absent values using additional information or by forecasting them from past values. Missing-value issues in sensor data have been well studied and resolved to some extent, but not fully.
• Crash faults—a total crash can occur during transmission or communication. This type of fault can cause a significant lack of data and can be avoided by using redundancy mechanisms.
• Delay faults—these faults are relevant for applications that rely on current and up-to-date sensor data to provide quality services. To avoid timing failures, it is recommended to use mechanisms for synchronous timestamping of data and disposal of obsolete data. A possible approach is also to use redundant but current sensor data if the network supports such information.
• Message corruption—a communication fault can lead to message corruption. Usually, communication protocols apply data integrity verification mechanisms to detect corrupted messages and discard them. But there are cases when the received data do not fully correspond to the sent data: an accidental fault may not be covered by the integrity verification mechanisms, or some part of the communication stack in the sending or receiving node may have been intentionally corrupted.

In addition, network failures can occur if networking devices or services fall into an abnormal state (e.g., network congestion, link failure, or looping) [13]. Such network conditions automatically affect the quality of data in WSN applications. Errors in sensor data can spread to the subsequent layers; for this reason, it is important to eliminate sensor data quality issues at the lowest possible level. At the application level, the most common sources of errors are program bugs and limitations in the data processing, but the probability of such errors is relatively small compared to the other fault types.

In general, the errors in WSN data can be divided into two large classes: intentional and unintentional. Intentional data errors can be caused by malicious attacks on the wireless network, while hardware failures, sensor malfunctions, poor configurations, depleted or depreciated resources, and so on cause unintentional data errors [14]. According to the frequency of occurrence of the deviation, the errors can be classified as follows:

• Random errors—these errors are unpredictable; they have no pattern and cannot be repeated. Random errors in sensor networks occur constantly but are stochastic in nature. They can be further classified into subtypes depending on their causes (e.g., instrument noise errors, switch repeatability errors, connector repeatability errors). Since random errors are unpredictable, they cannot be removed by device calibration and configuration activities.
• Systematic errors—these are consistent errors, repeatable over time. Systematic errors reflect constant variations in the same direction, while random errors reflect different variations in random directions. Systematic errors can be further classified into subtypes depending on their source.

According to Jesus et al. [10], systematic errors can be divided into: (a) calibration errors, which are an effect of the calibration and linearization processes; (b) loading errors, which occur due to the energy extracted by a sensor when making a measurement; and (c) environmental errors, which occur when the sensor is exposed to environmental influences and these effects are not taken into account. Unlike calibration and loading errors, which are caused by internal procedures, environmental errors are caused by external factors.

• Spurious readings—these are unsystematic errors of a stochastic but erratic nature. They occur when certain physical events affect the measured value in a way that does not reflect the usual state of reality. For example, measuring the indoor intensity of light may give an incorrect value if the reading is obtained exactly when someone takes a picture and the camera flash is activated [10].

In the next few paragraphs, we pay a little more attention to outliers, as this type of data anomaly is of great interest in the scientific and technical literature. Depending on the nature of the anomaly, outliers can be classified into three main categories [15, 16]:

• Point outliers (also called global or unconditional outliers)—if an individual observation is at an abnormal distance from the other data points, it is referred to as a point outlier (e.g., a fist-size meteorite hitting a house). This is the simplest type of anomaly in the data and is the subject of interest in most outlier detection studies.
• Contextual outliers (also called conditional outliers)—an observation is defined as a contextual or conditional outlier if it is at an abnormal distance from the other data in the same context, but the same observation may not be determined abnormal in a different context (e.g., snow in the summer).
• Collective outliers—if the values of a set of observations deviate outstandingly from the entire data set, but the single observations are not considered abnormal in a global or contextual respect, then the observations in the set are defined as collective outliers (e.g., all residents of a neighborhood moving out on the same day).

Outliers can also be categorized into two general types, global or local, depending on their scope:

• Local outliers—in terms of sensor networks, local outliers are determined based on the data collected by a particular sensor node. Local outlier detection in WSNs follows two main approaches [4]. In the first, a sensor node uses only its own historical data to determine whether a measurement is abnormal (a sketch of this approach follows this list); in the second, sensor nodes use both historical sensor data and data from neighboring nodes. Utilizing the spatial and temporal interconnections between neighboring sensor measurements provides better accuracy and stability of the anomaly detection, but requires more data transmissions and energy consumption.
• Global outliers—this type of data anomaly in WSNs is determined in a more global context. Data analysts are interested in global outliers as they give a more global and comprehensive view of how the sensor network works as a connected unit.
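As an illustration of the first, historical-data-only approach to local outlier detection, the following sketch flags a reading whose deviation from the mean of a sliding window of past readings exceeds a z-score threshold; the window size and threshold are arbitrary assumptions, not values from the cited survey:

```python
import numpy as np

def local_outliers(readings, window=30, z_thresh=3.0):
    """Flag readings deviating more than z_thresh standard deviations
    from the mean of the preceding window of historical readings."""
    readings = np.asarray(readings, dtype=float)
    flags = np.zeros(readings.size, dtype=bool)
    for i in range(window, readings.size):
        hist = readings[i - window:i]
        mu, sigma = hist.mean(), hist.std()
        if sigma > 0 and abs(readings[i] - mu) > z_thresh * sigma:
            flags[i] = True
    return flags

# Example: a noisy but stable signal with one injected outlier at index 50.
rng = np.random.default_rng(1)
values = 20.0 + rng.normal(0, 0.1, 100)
values[50] += 5.0
print(np.flatnonzero(local_outliers(values)))   # typically -> [50]
```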

Depending on the topology of the sensor network, global outlier identification can be performed by different participants (e.g., by the base station in the case of a centralized network organization, or by the cluster head in the case of a cluster-based topology) [4].

Outliers can also be classified according to their source. There are three general sources of data anomalies occurring in WSNs [4]:

• Events—events are phenomena that suddenly change the environment (e.g., fires, floods, chemical leaks, earthquakes). This type of anomaly is rare and usually lasts for a relatively long period of time, changing the historical pattern of the sensor data.
• Malicious attacks—the outliers are a consequence of network security issues.
• Noise and errors—the outliers are a consequence of noise in the measurement or of readings coming from a defective sensor. This type of outlier may occur frequently compared to the other types. Generally, they have a random nature and no real significance, but they can seriously affect data analysis, so they should be corrected or deleted if possible. An exception regarding significance is spikes, due to their periodicity (as far as we consider spikes a variation of outliers).

Anomalies caused by other sources should not be overlooked either, as they can reflect events of great importance to researchers; the presence of spikes in the sensor data may also be a consequence of a natural event. The current classification can be extended from outliers to other types of sensor data errors (both random and systematic) by specifying the sources/reasons of data error occurrence (e.g., hardware malfunction, battery defect, incorrect calibration, communication issues, natural phenomena, environmental conditions, malicious attacks, and more).

All of the above-mentioned sensor errors lead to different data quality issues. Based on the presented classifications, in terms of data quality dimensions [4, 5], we can summarize that most of the problems in the field of sensor networks are related to accuracy, completeness, consistency, and timeliness. It should be noted that we present here various categorizations of sensor errors that generalize the errors most frequently cited in the literature; depending on the specific type of sensor, the error lists may vary or be strictly specific (e.g., in the case of data collection from video sensors). The type of sensor network (e.g., terrestrial, underground, underwater, multimedia, mobile) also has its own specifics, both in terms of the types of data quality issues and in terms of the types of data quality processing procedures.

3 A Multidimensional Model of Errors in Sensor Data

Based on the reviewed classifications of sensor errors along various criteria, we can create a multidimensional cube of sensor errors and observe the frequency of each type of error. Figure 1 illustrates a simple model of a three-dimensional cube focused on the following criteria:

• Most typical sensor faults and outliers;
• Network segments;
• Periods of time.

Fig. 1 Three-dimensional model of error frequency

We can add more dimensions related to the criteria discussed above (an aggregation sketch follows this list):

• Frequency behavior (intermittent, persistent, or transient);
• Types of fault factors (internal or external);
• Layers of error occurrence (perception, network, application);
• Reasons for the fault (intentional or unintentional);
• Outlier and fault sources (noise, events, malicious attacks, hardware failure, etc.);
• Scope of data (local or global);
• Nature of outliers (point, contextual, or collective);
• Others.
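A minimal sketch of such a cube using pandas is shown below: an error log is aggregated into a (fault type × network segment × period) cube of frequencies. The log format and column names are hypothetical, chosen only to mirror the three criteria of Fig. 1; further dimensions are added the same way.

```python
import pandas as pd

# Hypothetical error log: one row per registered error occurrence.
log = pd.DataFrame({
    "fault_type": ["noise", "spike", "drift", "noise", "omission"],
    "segment":    ["north", "north", "south", "south", "north"],
    "period":     ["2021-Q1", "2021-Q1", "2021-Q1", "2021-Q2", "2021-Q2"],
})

# Frequency cube over (fault type x segment x period), as in Fig. 1.
cube = (log.groupby(["fault_type", "segment", "period"])
           .size()
           .unstack(["segment", "period"], fill_value=0))
print(cube)
```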

Depending on the research needs, the cube may contain more generalized and aggregated information or include dimensions with a greater level of detail. At the lowest level of detail, we can even include particular sensor nodes to track their hardware configuration and instrumental features.

One of the challenging aspects of providing quality data in WSN applications is establishing a connection between the occurrence of an error and the reasons for it [17]. The causes of sensor failures and of errors in their readings are usually closely related to the deployment context. For example, the depth of placement of a sensor in the ground can affect the quality of the measurement [18]; an ozone sensor's output depends not only on the ozone concentration but also on the temperature and humidity of the location where the sensor is deployed [19]; and onsite-calibrated temperature sensors behave differently when deployed in another location due to differences in environmental conditions between the two places, such as wind speed, solar radiation, and rainfall [20]. The accumulation of information about the occurrence of errors in the sensor data, organized in a multidimensional manner, will help to better understand how the deployment context affects the measurement quality. For instance, monitoring the concentration of a certain type of error in a given segment of the network will trigger a deeper investigation of the factors specific to that segment that contribute to the occurrence of the corresponding type of error.

To collect statistical information and investigate the causal relationships between the types of errors and the reasons for their occurrence, we can include additional "features" dimensions. These describe characteristics of the sensor network or environment that can cause faults [5]. Examples of environmental features are sensor location, constant environment characteristics (e.g., soil type), physical limitations (e.g., "relative humidity does not exceed 100%"), environmental interferences (e.g., rain patterns or irrigation events), and environmental models (e.g., microclimate models). Examples of sensor network system features are hardware component characteristics (e.g., the type of transducers and analog-to-digital converters), battery health, sensor age, hysteresis, response time, sensor range, modalities, resolution, saturation, and more. Here, we present a generalized model that covers classifications relating to all types of sensors; for a specific sensor network, these classifications may be far more specific and/or detailed, including dimensions describing network properties, implementation context, and so on.

The hypercube model can be implemented as a part of the application layer in any WSN application. It allows additional statistical analysis to be applied to the collected information to reveal trends and patterns in the occurrence of certain types of WSN errors, and to study the relationships between the types of errors, their distributions, etc. A limitation of the proposed approach is that errors that are automatically eliminated in the perception or network layer are out of the scope of the multidimensional analysis, unless their occurrence is logged appropriately so that it can be registered in the hypercube.

It should be noted that, as the number of dimensions increases, the size of the multidimensional cube can grow significantly and large areas of sparse data can appear. Sparsity is a phenomenon observed in multidimensional data models: many of the points in the multidimensional space do not contain values. Sparse data cause a lot of trouble, such as the explosion of data during hypercube aggregations, as well as storage and performance issues. The techniques for overcoming the negative effects of sparsity in multidimensional data models work mainly at a "physical level": data compression, efficient indexing, materialization of the most commonly used calculations, etc. These techniques do not take the nature and type of sparsity into account. In [21], we propose the design of a regular sparsity map, an innovative object that saves information about specific empty domains of the hypercubes; it can be useful for purposes such as data validation, storage considerations, and user interface improvements.
The regular sparsity map is especially effective when the sparsity in the multidimensional model is caused by the data correlations that do not have a random nature


but are known in advance (i.e., when certain combinations of values of individual dimensions do not make sense). The model we propose in this article for analyzing the types of errors occurring in sensor networks has exactly such characteristics (e.g., calibration faults are systematic, while outliers are random; none of the values of the dimension "nature of outliers" make sense for errors other than outliers, etc.).
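To make the design concrete, the following minimal sketch (our own illustration in Python, not the implementation from [21]; the dimension names and example values are assumptions) registers error occurrences as cells of a sparse hypercube and uses a regular-sparsity check to reject a combination that is known in advance to be meaningless:

from collections import defaultdict

DIMENSIONS = ("error_type", "nature_of_outliers", "network_segment", "month")

def is_valid_cell(cell):
    # Regular sparsity map rule (illustrative): the "nature of outliers"
    # dimension only makes sense when the error type is "outlier".
    error_type, nature, _segment, _month = cell
    return error_type == "outlier" or nature is None

cube = defaultdict(int)  # sparse storage: only non-empty cells are kept

def register_error(error_type, nature, segment, month):
    cell = (error_type, nature, segment, month)
    if not is_valid_cell(cell):
        raise ValueError(f"cell {cell} lies in a known-empty (sparse) domain")
    cube[cell] += 1  # fact: number of error occurrences in this cell

# Example facts
register_error("outlier", "random", "segment-A", "2021-03")
register_error("calibration", None, "segment-A", "2021-03")
register_error("outlier", "random", "segment-B", "2021-04")

# Roll-up: aggregate occurrences per error type (a typical OLAP operation)
per_type = defaultdict(int)
for (etype, _n, _s, _m), count in cube.items():
    per_type[etype] += count
print(dict(per_type))  # {'outlier': 2, 'calibration': 1}

Because only non-empty cells are stored and invalid combinations are rejected up front, the known-in-advance sparsity never consumes storage or distorts aggregations.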

4 Conclusion

In this article, we focused on the design of a multidimensional model of errors in sensor data. This model makes it possible to observe the frequency and other characteristics of sensor errors and to analyze the reasons for their appearance based on different criteria. The collection of such statistical information is essential for improving the quality of data from sensor networks. The information can be used to improve the network and its deployment context and to design more effective methods for detecting and eliminating problems in the data collected by sensor networks. In general, errors in a set of real data can never be 100% corrected. This raises the question of the contextual nature of quality and the importance of properly defining the concept of quality data, namely data meeting specific data consumers' requirements. In the current contribution, we have explored the most commonly cited types of errors in the literature and illustrated a possible approach for multidimensional rendering and analysis of sensor fault occurrence. However, when defining a hypercube for real-life applications, appropriate dimensions must be selected to analyze the data quality issues based on the desired goals. We look forward to further expanding the model with more features that provide additional opportunities to investigate the causal relationships between errors and the WSN deployment context, as well as to the use of data mining models to detect hidden dependencies in WSN data.

Acknowledgements The authors acknowledge the financial support of the EXTREME project, funded by the Bulgarian Ministry of Education and Science, D01-76/30.03.2021, through the program "European Scientific Networks."

References

1. Teh, N., Kempa-Liehr, A., Wang, K.: Sensor data quality: a systematic review. J. Big Data 7, Article number 11 (2020)
2. Tolle, G., et al.: A macroscope in the redwoods. In: Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys '05)
3. Szewczyk, R., Mainwaring, A., Polastre, J., Anderson, J., Culler, D.: An analysis of a large scale habitat monitoring application. In: Proceedings of SenSys


4. Zhang, Y., Meratnia, N., Havinga, P.: Outlier detection techniques for wireless sensor networks: a survey. IEEE Commun. Surv. Tutor. 12(2), 159–170 (2010)
5. Ni, K., et al.: Sensor network data fault types. ACM Trans. Sens. Netw. 5(3), 1–29 (2009)
6. Perez-Castillo, R., et al.: Data Quality Best Practices in IoT Environments. https://intelligentenvironments.es/images/files/QUATIC2018_SHORT.pdf. Last accessed 2021/03/27
7. Nguyen, T.A., et al.: Applying time series analysis and neighbourhood voting in a decentralised approach for fault detection and classification in WSNs. In: Thang, H.Q., et al. (eds.) SoICT, pp. 234–241. ACM (2013)
8. Saeed, U., Lee, Y., Jan, S., Koo, I.: CAFD: context-aware fault diagnostic scheme towards sensor faults utilizing machine learning. Sensors 21, 617 (2021)
9. Joseph, A., Sharma, A.: IoT Analytics: Data Quality Challenges. Tech Mahindra. https://cache.techmahindra.com/static/img/pdf/iot-analytics-pov-modified-3Aug2020.pdf. Last accessed 2021/03/27
10. Jesus, G., Casimiro, A., Oliveira, A.: A survey on data quality for dependable monitoring in wireless sensor networks. Sensors 17(9), 2010 (2017)
11. 5 major sensor data analytics challenges: deadly or curable? Data Science Central—A community for Big Data Practitioners, Tech Target. https://www.datasciencecentral.com/profiles/blogs/5-major-sensor-data-analytics-challenges-deadly-or-curable. Last accessed 2021/04/17
12. Rabatel, J., Bringay, S., Poncelet, P.: Anomaly detection in monitoring sensor data for preventive maintenance. Expert Syst. Appl. 38(6), 7003–7015 (2011)
13. Mehmood, A., Alrajeh, N., Mukherjee, M., Abdullah, S., Song, H.: A survey on proactive, active and passive fault diagnosis protocols for WSNs: network operation perspective. Sensors 18, 1787 (2018)
14. Rodriguez, C., Servigne, S.: Managing sensor data uncertainty: a data quality approach. https://hal.archives-ouvertes.fr/hal-01339140/document. Last accessed 2021/04/21
15. Outlier Analysis. https://towardsdatascience.com/outliers-analysis-a-quick-guide-to-the-different-types-of-outliers-e41de37e6bf6. Last accessed 2021/04/19
16. Divya, D., Babu, S.: Methods to detect different types of outliers. In: Proceedings of the 2016 International Conference on Data Mining and Advanced Computing (SAPIENCE). https://www.researchgate.net/publication/311610830_Methods_to_detect_different_types_of_outliers. Last accessed 2021/04/11
17. Naydenova, I., Covacheva, Z., Kaloyanova, K.: Data quality: enterprise initiatives' issues and WSN challenges. Sens. Transducers J. 251(4), 37–46 (2021). ISSN: 2306-8515
18. Todorov, J., Sturbanova, I., Trifonova, M.: Information system for planning, management and reporting of open cast mines production (output). In: First International Conference on Information Systems & Datagrid, Sofia, 17–18 Feb 2005, pp. 147–154. ISBN: 954-649-761-4
19. Barcelo-Ordinas, J., Doudou, M., Garcia-Vidal, J., Badache, N.: Self-calibration methods for uncontrolled environments in sensor networks: a reference survey. Ad Hoc Netw. 88, 142–159 (2019)
20. Yamamoto, K., Togami, T., Yamaguchi, N., Ninomiya, S.: Machine learning-based calibration of low-cost air temperature sensors using environmental data. Sensors 17(6) (2017)
21. Naydenova, I., Covacheva, Z., Kaloyanova, K.: A model of regular sparsity map representation. Analele Științifice ale Universității "Ovidius" Constanța, Seria Matematică 17(3), 197–208 (2009)

RETRACTED CHAPTER: Optimization of the Overlap Shortest-Path Routing for TSCH Networks

Javier Caceres and Marcelo V. Garcia

Abstract A wider usage of wireless technologies throughout the lower levels of the automation pyramid is one of the results of the integration of the concepts of Industry 4.0 and the Industrial Internet of Things (IIoT). Among the most popular communication standards used in the current industrial paradigm we can find IEEE 802.15, ISA100.11a, 6TiSCH, and Wireless-HART. One of the main reasons behind the user preference toward the aforementioned is their support of real-time data traffic in wireless sensor and actuator networks (WSAN). This variety of communication options has been the starting point of many studies about prioritized packet scheduling. However, only a few works have addressed improving real-time performance, compared to the number of works mentioning it. Using this fact as a basis, the following work proposes an epsilon greedy heuristic agent whose main objective is the reduction of overlap in shortest-path routing for WSANs, using the earliest-deadline-first (EDF) policy to schedule packet transmission.


The original version of this chapter was retracted: The retraction note to this chapter is available at https://doi.org/10.1007/978-981-16-6369-7_73


The editors have retracted this paper due to significant overlaps with a previously published conference paper [1] that concern overall structure, proposed solution, equations, notation, test scenarios and results. Marcelo Garcia has agreed to this retraction. Javier Caceres has not responded to any correspondence from the editor or publisher about this retraction. [1] Gutierrez Gaitan, M., Almeida, L., Santos, P. M., & Meumeu Yomsi, P. (2021). EDF scheduling and minimal-overlap shortest-path routing for real-time TSCH networks. In Proceedings of the 2nd Workshop on Next Generation Real-Time Embedded Systems (NG-RES 2021) (Vol. 87, pp. 2:1-2:12). Schloss Dagstuhl–Leibniz-Zentrum für Informatik.

J. Caceres (B) · M. V. Garcia Universidad Tecnica de Ambato, UTA, 180103 Ambato, Ecuador e-mail: [email protected] M. V. Garcia e-mail: [email protected]; [email protected] M. V. Garcia University of Basque Country, UPV/EHU, 48013 Bilbao, Spain © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022, corrected publication 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_14


Keywords Real-time · Wireless networks · Network algorithm · Path optimization

1 Introduction


Given the upgrade toward a production scheme whose main basis is the capability of sharing data between all the devices of the system, wireless sensor and actuator networks (WSAN) have acquired a major role within the current industry. One of the main appealing points of this type of network is the high flexibility and scalability they possess compared to wired architectures [1]. However, they do not surpass their predecessor in all fields: WSANs have not been able to fully replicate the bandwidth and reliability capacities of wired networks, but in the industrial area this does not represent an impediment to their implementation and usage. Many industrial applications based on real-time monitoring and audio streaming that involve sensors and actuators only need bandwidth rates up to 250 Kbps [4]. As a result, standards like Wireless-HART, ISA 100.11a, IEEE 802.15, and 6TiSCH are able to effectively satisfy the reliability needs of these systems [7]. Many standards compatible with WSANs support real-time data traffic. Among the most popular ones we can find time-synchronized channel hopping (TSCH), which stands out thanks to its outstanding features. TSCH has been gaining ground in both industrial [2] and automotive [3] environments due to its characteristics: time-division multiple access (TDMA), frequency diversity, and centralized scheduling. Given these characteristics, this standard has gained wider usage in factory automation and process control applications, thus increasing the acceptance and integration of the concepts of Industry 4.0 and the Industrial Internet of Things (IIoT) [5]. In the following work, a real-time wireless routing method for TSCH networks is presented. The main objective of this type of algorithm is to improve and/or secure the real-time properties of a network on the basis of routing decisions. Thus, a conflict-aware routing method for TSCH WSANs using an earliest-deadline-first (EDF) scheduler is developed. It implements a minimal-overlap shortest-path routing to minimize the path overlaps within the network data flows.

2 System Model

A general representation of a WSAN is shown in Fig. 1. The network is made of a finite number of N ∈ N nodes that include several field devices (actuators and sensors), multiple access points (APs), and a gateway. The interconnection of the field devices with the APs uses a wireless medium. The resulting network presents a mesh topology that operates with half-duplex omnidirectional radio transceivers. The connection between the APs and the gateway uses a direct link, enabling bidirectional communication between the field devices and elements outside the physical network, such as host applications, system controllers, and network managers. Network managers, specifically, are software modules commonly deployed in the gateway, where they continuously run their programmed instructions. They are in charge of collecting the topological information of the network, with which they perform scheduling and routing operations.

Fig. 1 Representation of a data flow in an Industrial WSAN

For the following work, some assumptions are made about the network. (i) The network is based on TSCH and uses a physical layer compatible with the IEEE 802.15.4 standard. (ii) The network operates using a centralized multi-channel TDMA protocol together with global synchronization functions. The multi-channel feature enables concurrent per-slot transmissions. It is based on channel hopping methods that operate using a number m of active radio channels, where 1 ≤ m ≤ 16, m ∈ N. (iii) The duration of the time slots is fixed to a value around 10 ms. This value corresponds to a dedicated time interval used in the allocation of a single packet transmission. With this, a maximum number of w − 1 re-transmissions with their corresponding acknowledgment transactions can take place, where w ∈ N. In the operation of the presented WSAN, the sensor nodes have the capacity to periodically transmit data to elements inside and outside the network. An example of this takes place when a sensor sends data to a remote controller, as shown in Fig. 1. To reach the controller, the sensor needs to send information through the gateway. Packets also follow predefined multi-hop routes when traveling on an uplink (origin point: sensor, end point: gateway) or on a downlink (origin point: gateway, end point: actuators). Both of these data transactions take place under rigorous delivery time limitations. As a method to decrease the complexity of the system, in this work it is considered that the generated routes are configured under source routing. Therefore, the routes for both uplinks and downlinks are simple and predefined. Following the results of the work depicted in [6], the maintenance of the network is considered to be a built-in centralized service which can be implemented in a next stage.


2.1 Network Model


With the characteristics of the network depicted in the general system model, the network can be represented as a graph G = (V, E), where: (i) V is the group of nodes represented as vertices, and (ii) E is the group of links between the nodes represented as edges. G is assumed to be connected, undirected, and incomplete. This means that the number of links between nodes is lower than the total number of existing pairs. The total number of nodes is equal to the total number of vertices N = |V|. From this total, one vertex corresponds to the network gateway, leaving N − 1 vertices for the field devices and APs. In a later stage of the model, the gateway will be considered as the node with the highest betweenness centrality, meaning that removing this node from the analysis would generate the greatest impact on the final network connectivity. In the final network model, a subset of n ∈ N of the field devices is considered to operate as sensors and generate data. Thus, the remaining (N − n) − 1 nodes are considered to be actuators.


2.2 Flow Model


The set of m real-time network flows that occur during the operation of the network is described as F = (f1, f2, ..., fm), where every data transaction is transmitted following an EDF methodology. Each flow fi is periodic and constrained from end to end. Additionally, based on the fact that each flow is able to release an almost infinite number of transmissions, each of them is represented as a cluster of the 4 elements fi = (Ci, Di, Ti, φi), where: (i) Ci is the effective transmission time between the source and the destination, (ii) Di is the relative deadline, (iii) Ti is the period, equivalent to the sampling rate of the sensors, and (iv) φi is the routing path. The ξth instance of the data transmissions is described as fi,ξ, with ξ ∈ N. It is released at time ri,ξ, such that ri,ξ+1 − ri,ξ = Ti. With this, and following the guidelines of the EDF policy, fi,ξ is required to reach its destination before its absolute deadline di,ξ = ri,ξ + Di. The model assumes a constrained-deadline characteristic Di ≤ Ti and allows only a single flow transmission within a time slot. It can be remarked that Ci is an interpretation of the time required by a flow fi to be transmitted when it is not affected by the rest of the flows. Therefore, Ci is calculated as Ci = γi × w, where γi represents the total number of links in the route path φi, and w is the number of transmission slots that correspond to a flow on each link, also taking re-transmissions into account. In this work, a fixed value of w is used so that Ci depends only on the topology and routing dynamics.
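A minimal sketch of this flow model in Python (our illustration, not code from the chapter; the example path, period, and deadline are assumptions) makes the relations Ci = γi × w and di,ξ = ri,ξ + Di concrete:

from dataclasses import dataclass

@dataclass
class Flow:
    phi: list           # routing path phi_i as a list of node ids
    T: int              # period T_i (sensor sampling rate, in slots)
    D: int              # relative deadline D_i, with D_i <= T_i
    w: int = 2          # slots per link, counting re-transmissions

    @property
    def C(self) -> int:
        # C_i = gamma_i * w, where gamma_i is the number of links in phi_i
        gamma = len(self.phi) - 1
        return gamma * self.w

    def release(self, xi: int) -> int:
        # Periodic releases: r_{i,xi+1} - r_{i,xi} = T_i (first release at 0)
        return xi * self.T

    def abs_deadline(self, xi: int) -> int:
        # EDF orders packets by absolute deadline d_{i,xi} = r_{i,xi} + D_i
        return self.release(xi) + self.D

f = Flow(phi=["sensor-3", "ap-1", "gateway"], T=16, D=16)
print(f.C, f.abs_deadline(0), f.abs_deadline(1))  # 4 16 32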


3 Problem Formulation


After the guidelines depicted in the previous section, the problem considered in this work is to find the optimal set of flow paths that reduces to the minimum the overall number of path overlaps between any pair of nodes in the network, given by the expression Φ^opt = (φ1^opt, φ2^opt, ..., φn^opt). Throughout this process, the overall number of overlaps existing between the flows of F will be denoted as Ω. It is defined as the sum of all the individual node overlaps λi,j that correspond to the routes of any pair of flows (fi, fj), where i, j ∈ [1, n] ∧ i ≠ j. During the development, F0 = (f1^0, f2^0, ..., fm^0) will denote the original set of network flows. With this, the original set of flow paths Φ0 = (φ1^0, φ2^0, ..., φm^0) is obtained using a hop-count shortest-path algorithm. Finally, Fk = (f1^k, f2^k, ..., fm^k) describes the kth variation of the flow set F0, which takes place when a sub-optimal group of routes Φk = (φ1^k, φ2^k, ..., φn^k) is considered. As a result, the parameter Ωk is the kth total number of path overlaps generated. An initial solution Φ0 and its corresponding Ω0 can be formulated for the initial graph G in the form of:

Ωk = minimize Σ∀i,j∈[1,n], i≠j δi,j(Φk),  subject to k ∈ [1, kmax]    (1)


In Eq. 1, the term δi,j(Φk) represents the number of node overlaps generated between the routes of the flows fi^k and fj^k. As a result, we have that Ω = Ωk^min. In the end, the set of optimal routes is given by Φ^opt, a term used to describe any of the groups of paths Φk resulting in Ωk^min.
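As a small illustration of Eq. 1 (our own sketch, not the authors' code; treating routes as node lists and counting shared intermediate relay nodes, with endpoints excluded, is our assumption about how δi,j is evaluated):

from itertools import combinations

def delta(phi_i, phi_j):
    # delta_{i,j}: number of relay nodes shared by the two routes
    shared = set(phi_i[1:-1]) & set(phi_j[1:-1])
    return len(shared)

def omega(routes):
    # Omega_k = sum of delta_{i,j} over all pairs i != j (Eq. 1 objective)
    return sum(delta(a, b) for a, b in combinations(routes, 2))

routes = [
    ["s1", "n4", "n7", "gw"],
    ["s2", "n4", "n7", "gw"],   # overlaps with the first route at n4 and n7
    ["s3", "n9", "gw"],
]
print(omega(routes))  # 2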

4 Epsilon Greedy Heuristic Optimization Method


Based on the formalization of the problem depicted in Eq. 1, this work proposes a solution based on an epsilon greedy heuristic method that uses the exploration-exploitation (EE) tradeoff. In this approach, an agent chooses between k different actions and receives a reward based on the chosen action. In order for the agent to select an action, it is assumed that each action has a separate distribution of rewards R = (r1, r2, ..., rm) and that at least one action generates the maximum numerical reward. The probability distributions of the rewards are different and unknown to the agent. With this in mind, the agent is developed with the main objective of identifying the actions related to the maximum reward R after a set of trials. The implementation of the aforementioned agent means the generation of an individual group of Φk for every kth iteration, prior to the calculation of the corresponding Ωk and Rk. After kmax iterations, the smallest Ωk with the highest Rk is designated as Ωk^min. The final algorithm consists of a three-step solution: first an initial solution is calculated, then a greedy search is performed based on it, and finally the best solution is obtained from the results of the last search. For the initial solution calculation the algorithm behaves as follows:

(i) During k = 1, the values of Ω1 and r1 are obtained as functions of the path overlaps resulting from Φ0. Φ1 is calculated as the set of weighted shortest paths of the graph G1 = (V, E1), a modified version of the unitary weighted graph G in which each edge receives a weight based on the node overlapping degree resulting from the set Φ0.
(ii) The cost function in charge of weighting any edge Wi,j(u, v) in G is defined as:

Wi,j(u, v) = 1 + Σe=1..δi,j ψ = 1 + ψ · δi,j    (2)

where ψ ∈ R is a constant user-defined parameter and δi,j is the number of node overlaps obtained from the routes φi^0, φj^0 ∈ Φ0, ∀i, j ∈ [1, n] ∧ i ≠ j.
(iii) With the previous, Φ1 and r1 are obtained from the shortest paths corresponding to G1, and Ω1 is the overall number of overlaps related to them.
(iv) Finally, [(Ω1 < Ω0) → (Ωk^min = Ω1)] ∧ [¬(Ω1 < Ω0) → (Ωk^min = Ω0)].


For the epsilon greedy agent the algorithm behaves as follows: (i) It generalizes the search for Ωk^min to any k ∈ [1, kmax] by choosing between exploration and exploitation randomly. For this decision, the agent uses the value of epsilon as the probability of exploiting over exploring. (ii) Then, Gk = (V, Ek) is defined as a modified version of Gk−1 = (V, Ek−1). (iii) In the end, Ωk is calculated as a function of Φk−1, and [(Ωk < Ωk^min) → (Ωk^min = Ωk)] ∧ [¬(Ωk < Ωk^min) → (Ωk^min remains unchanged)]. Finally, to obtain the best solution, the algorithm ends the calculations when k = kmax and returns Ωk^min. The quality of the final result is proportional to the quality of the generated Φk sets and to the number of iterations used during the calculation.
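The sketch below is one minimal reading of this three-step procedure, not the authors' implementation: the random weight perturbation used in the exploration branch is our assumption, and eps is used here in the common convention of "probability of exploring" rather than the chapter's wording. It shows how the loop can be organized with the networkx library:

import random
import networkx as nx
from itertools import combinations

def overlaps(routes):
    # total node overlaps Omega over all route pairs (endpoints excluded)
    return sum(len(set(a[1:-1]) & set(b[1:-1]))
               for a, b in combinations(routes, 2))

def reweight(G, routes, psi):
    # Eq. 2 style reweighting: edges touched by many routes get heavier
    H = G.copy()
    for u, v in H.edges():
        degree = sum((u in r) + (v in r) for r in routes)
        H[u][v]["weight"] = 1 + psi * degree
    return H

def epsilon_greedy(G, pairs, psi=0.5, eps=0.1, k_max=100, seed=0):
    rng = random.Random(seed)
    routes = [nx.shortest_path(G, s, t) for s, t in pairs]   # Phi_0 (hop count)
    best_routes, best_omega = routes, overlaps(routes)       # Omega_0
    for _ in range(k_max):
        if rng.random() < eps:   # explore: perturb the edge weights randomly
            H = G.copy()
            for u, v in H.edges():
                H[u][v]["weight"] = 1 + rng.random()
        else:                    # exploit: reweight around the best route set
            H = reweight(G, best_routes, psi)
        routes = [nx.shortest_path(H, s, t, weight="weight")
                  for s, t in pairs]
        om = overlaps(routes)
        if om < best_omega:      # keep Omega_k^min and its route set Phi_k
            best_routes, best_omega = routes, om
    return best_routes, best_omega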

5 Test Scenarios

In order to obtain comparison data, random groups of network topologies and flows are generated to test the performance of the resulting epsilon greedy heuristic (EE) against a shortest-path algorithm (SP). With the help of a network graph generator, a set of 100 topologies is prepared for the analysis. Each graph was generated using a seed that defines a random matrix of N × N and a density, which can have a value in the range [0, 1]. The density value is obtained as λ/N, where λ represents the median node degree of the graph and takes values in the range [4, 12]. The number of vertices used in the generation seed is a constant N = 66. The gateway is chosen as the node with the highest betweenness centrality, and a subset n ⊂ N of field devices is configured as sensors programmed to periodically transmit data to the gateway. As part of this experiment, the value of n is limited to the range [2, 22]. With the depicted set-up, the group of shortest-path routes between the n sensors and the gateway is generated. This provides 100 instances of n possible routes. The user-defined parameter ψ adopts the same value as the graph density, and kmax is set to 100 for the experiments. Each of the fi flows generated for the 100 topologies can be defined as a cluster of 4 elements fi = (Ci, Di, Ti, φi), as discussed in Sect. 2.2. Each Ci value is directly obtained as the product of the number of hops in the path φi between the source and the destination and the number of transmission slots assigned to each hop; the value of the latter parameter w was assumed to be 2 in this work. The periods Ti were generated in the form 2^p, where p ∈ N takes values in the range [4, 7]. Finally, the value of Di is assumed to be equal to Ti in order to obtain an implicit-deadline model.
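A minimal sketch of this test set-up follows (our illustration only: the gnm_random_graph generator and the retry-until-connected loop are assumptions, since the section specifies only the seeded random matrix, N = 66, the density λ/N, and the betweenness-based gateway choice):

import random
import networkx as nx

def make_topology(seed, N=66, lam=8, n=10):
    rng = random.Random(seed)
    m_edges = lam * N // 2          # median degree lam -> roughly lam*N/2 edges
    G = nx.gnm_random_graph(N, m_edges, seed=seed)
    while not nx.is_connected(G):   # the system model assumes a connected graph
        seed += 1
        G = nx.gnm_random_graph(N, m_edges, seed=seed)
    # gateway = node with the highest betweenness centrality
    bc = nx.betweenness_centrality(G)
    gateway = max(bc, key=bc.get)
    sensors = rng.sample([v for v in G if v != gateway], n)
    return G, gateway, sensors

G, gw, sensors = make_topology(seed=1)
routes = [nx.shortest_path(G, s, gw) for s in sensors]  # SP baseline routes
print(len(routes), "uplink routes to gateway", gw)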


6 Results


The performance of the EE and SP methods when optimizing the routing in the generated random graphs is described in Fig. 2a–d. As can be appreciated when comparing the proposed EE method with the SP strategy, the former improves on the results of the latter by a high margin. As shown in Fig. 2a, the node overlapping phenomenon, which follows an exponential behavior, is reduced by almost 50% for these values of λ, which suggests that in bigger networks this mitigation will be even better. The positive effects of the overlap reduction are also visible in Fig. 2c: given the direct relationship between overlaps and transmission conflicts, reducing one proportionally reduces the other. When analysing the results of Fig. 2b, it can be appreciated that the effect on the length of the routes is almost negligible and is not likely to grow in bigger networks. Finally, the positive benefits of the EE optimization are appreciated in the difference in the schedulability ratio between the two methods.


Fig. 2 Results using variations of λ = [4, 8, 12]


7 Conclusions and Future Work


As the final result of the present investigation, an effective real-time routing method for WSANs based on TSCH that uses an EDF policy was developed. The usage of a greedy heuristic method allowed the final controller to improve the real-time component of the data exchange in the network. This was done by reducing the total number of overlaps in the paths generated for the traveling information. Additionally, parameters such as the number of transmission conflicts and the schedulability ratio were also improved as a side effect of the operation of the proposed controller. The usage of an epsilon greedy heuristic algorithm for the minimization of overlaps in the transmission routes turned out to be an effective network performance enhancer. After analyzing the test data of both scenarios, it was clearly appreciated that the presented algorithm obtained improvements that almost reached levels of 50%. These positive results demonstrate that the routing process in networks can be optimized and still has room left for growth. Theoretically, autonomous intelligent structures developed using machine learning algorithms are able to reach even higher levels of performance in tasks such as optimization. With this in mind, the future step in this line of investigation is to improve routing in WSANs with the usage of machine learning algorithms.

References


1. Brun-Laguna, K., Minet, P., Tanaka, Y.: Optimized scheduling for time-critical industrial IoT. In: 2019 IEEE Global Communications Conference (GLOBECOM). IEEE (2019). https://doi.org/10.1109/globecom38437.2019.9014218
2. Chinchilla-Rodríguez, Z., Bu, Y., Robinson-García, N., Costas, R., Sugimoto, C.R.: Revealing existing and potential partnerships: affinities and asymmetries in international collaboration and mobility. In: ISSI 2017—16th International Conference on Scientometrics and Informetrics, Conference Proceedings, pp. 869–880. International Society of Scientometrics and Informetrics (2017)
3. Lu, C., Saifullah, A., Li, B., Sha, M., Gonzalez, H., Gunatilaka, D., Wu, C., Nie, L., Chen, Y.: Real-time wireless sensor-actuator networks for industrial cyber-physical systems. Proc. IEEE 104(5), 1013–1024 (2016). https://doi.org/10.1109/jproc.2015.2497161
4. Mangharam, R., Rowe, A., Rajkumar, R., Suzuki, R.: Voice over sensor networks. In: 2006 27th IEEE International Real-Time Systems Symposium (RTSS'06). IEEE (2006). https://doi.org/10.1109/rtss.2006.51
5. Tavakoli, R., Nabi, M., Basten, T., Goossens, K.: Topology management and TSCH scheduling for low-latency convergecast in in-vehicle WSNs. IEEE Trans. Ind. Inform. 15(2), 1082–1093 (2019). https://doi.org/10.1109/tii.2018.2853986
6. Terraneo, F., Polidori, P., Leva, A., Fornaciari, W.: TDMH-MAC: real-time and multi-hop in the same wireless MAC. In: 2018 IEEE Real-Time Systems Symposium (RTSS). IEEE (2018). https://doi.org/10.1109/rtss.2018.00044
7. Vucinic, M., Chang, T., Skrbic, B., Kocan, E., Pejanovic-Djurisic, M., Watteyne, T.: Key performance indicators of the reference 6TiSCH implementation in Internet-of-Things scenarios. IEEE Access 8, 79147–79157 (2020). https://doi.org/10.1109/access.2020.2990278

Application of Emerging Technologies in Aviation MRO Sector to Optimize Cost Utilization: The Indian Case

Chandravadan Goritiyal, Aditi Bairolu, and Laxmi Goritiyal

Abstract This paper discusses various behavioural aspects of aviation personnel and the status of the Indian MRO sector with respect to technological advancements. The Indian aviation sector includes many personnel at the senior management level who belong to the baby boomer generation. It has been observed that the older generations are technology-averse, and any change in the system makes them more reluctant to work and adjust. We further examine the various facets of change occurring in the sector once emerging technologies have been implemented. Adding to this, the paper delves into the upcoming trends in the Indian MRO sector. It also describes, using ANOVA to analyse the survey conducted for the paper, the responses that industrialists and individuals from the sector hold towards introducing emerging technology in the aviation MRO sector.

Keywords Maintenance repair overhaul · Emerging technology · Aviation · Generation gap · Sustainable · Lean principles · Management · Cost optimization · Artificial intelligence · Machine learning

C. Goritiyal (B) · A. Bairolu
Prin. L.N. Welingkar Institute of Management Development & Research (WeSchool), Mumbai, India
L. Goritiyal
Vivekanand Education Society's Institute of Management Studies and Research, Mumbai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_15

1 Introduction

1.1 Technology Penetration in the Aviation Sector

Considering the working age group (18–64) and the various generations within it, it is evident that not everyone is comfortable using technology, and some would prefer operating any machinery manually. There can be many reasons: one is the fact that they are conditioned that way, and the other is the trust element. Much of the 40–64 age group will trust a human over technology, even when using technology would add to their comfort. The use of technology as elementary as IoT can increase productivity in the aviation sector, which is still dominated by people (senior management) from 40 to 64 years of age. It is a mental bias that people hold against technology. Aircraft these days use high-end technology for their operations; the most common technologies are IoT, AI, and ML. These technologies were present in the aviation sector long before any other industry had adopted them. This signifies the importance of such technologies in this field, and yet they are not used to their full potential. The scope for a technological boom in the Indian aviation sector is huge; however, the Indian MRO industry seems hesitant to adopt new technologies fully. The Indian aviation MRO sector is not using advanced technology as per expectations due to the following reasons:
• New technology increases the development cost of capability improvement for an initial period.
• Senior management in the MRO industry comes from airlines of an earlier generation (baby boomers) who are less acquainted with new technology (generation gap).
• There is constant uncertainty about increasing dependence on vendors for escalating support whenever required.
• Regulators (the government agencies which regulate the aviation MRO industry; without their approval, MROs cannot function) trust any technology only after constant trials and tests.
Even after conducting tests on a regular basis, there is a fear of facing a technical snag in flight, putting lives in jeopardy. But if confidence is developed and MRO companies integrate IoT-based systems, they will prove extremely useful. Consider an instance: if you are developing a new capability wherein multiple analogue devices measure inputs like flow, pressure, and temperature, these are given to a programmable logic controller (PLC) or microcontroller as inputs, and the PLC is programmed as per the CMM (component maintenance manual) requirement. After this, the unit under test is tested as per the CMM procedure. If the test results are as per CMM requirements, the unit is declared serviceable. Presently, all MRO services use a conventional methodology for test set-up and data recording, that is, manual entry and uploading of data. However, with an IoT-based data system, the same test results can be directly available in an SQL data server, where the data can be monitored and interpreted for results; a sketch of such a set-up is given below. Today, almost all international airlines have incorporated IoT and are reaping benefits in various aspects. This paper describes how IoT can improve the Indian MRO industry.
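The following minimal sketch illustrates the idea (our own example; the table layout and the limit values are hypothetical, not an actual CMM): readings forwarded by a PLC or microcontroller are written straight into an SQL store, with the serviceability verdict recorded automatically instead of being entered by hand.

import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("mro_tests.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS bench_tests (
        unit_id     TEXT,
        recorded    TEXT,
        flow        REAL,
        pressure    REAL,
        temperature REAL,
        serviceable INTEGER
    )""")

# Hypothetical CMM-style acceptance limits per measured input
CMM_LIMITS = {"flow": (10.0, 40.0), "pressure": (1.0, 6.0),
              "temperature": (-20.0, 80.0)}

def record_reading(unit_id, flow, pressure, temperature):
    readings = {"flow": flow, "pressure": pressure, "temperature": temperature}
    # unit is serviceable only if every reading falls inside its limits
    ok = all(lo <= readings[k] <= hi for k, (lo, hi) in CMM_LIMITS.items())
    conn.execute("INSERT INTO bench_tests VALUES (?,?,?,?,?,?)",
                 (unit_id, datetime.now(timezone.utc).isoformat(),
                  flow, pressure, temperature, int(ok)))
    conn.commit()
    return ok

print(record_reading("UUT-042", flow=25.3, pressure=3.1, temperature=21.5))

Because every result lands in the database with a timestamp, the data are immediately available for monitoring, auditing, and trend analysis.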

1.2 Current Status Quo of the Footprint of Technology in India

In India, the MRO sector is heavily dependent on analogue technology. The test set-ups used for validating and checking parts, including pneumatic, hydraulic, and electrical system components, are analogue in nature. Cost-effective digital testing systems for the Indian MRO sector are yet to be designed and applied. IoT has been


enabled to a certain extent within aircraft, but for on-ground MRO, infusing IoT or digital capability is yet to be tested and introduced. When sophisticated equipment is required, airlines prefer to send the component to the OEM and not to the MRO. The airlines that can afford the equipment are equipped with automatic test rigs. Automatic testing rigs cost millions of dollars, which makes airlines less inclined to buy these advanced products. The airline sector is capital-intensive and depends heavily on macroeconomic factors. The pandemic situation has paused operations entirely or partially, which makes industrialists sceptical about their preferences; cash outflows have reduced, making the sector conservative. Due to limited know-how about intelligent systems and limited resources, Indian commercial aviation MRO is reluctant to invest in them. Indian defence-related aerospace firms have a much larger footprint of intelligent systems, given the need and the necessity. The Indian MRO industry is moving towards digitization. Today, some of the companies in this sector in India have adopted enterprise resource planning (ERP) systems to organize and store data; otherwise, most companies still prefer the conventional methodology for maintaining data. For a mid-sized MRO company without automation, around 1 TB of data will be generated, but if the entire industry becomes automated and smart sensors with IoT are implemented, the data generated will be huge, and better systems will need to be incorporated to store it.

1.3 Sustainability

Civil aviation MRO is audited by the DGCA, which is mandatory in India, whereas some MRO organizations are EASA and FAA certified. These bodies (DGCA, EASA, and FAA) are very much concerned about sustainability. Every component that has completed its shelf life has to be disposed of as per the norms and environmental guidelines set by the state, and MRO organizations must maintain data about the same. These records are maintained as part of audit compliance. The entire industry utilizes a set of chemicals as one of the major resources for efficient operability of parts. These chemicals are hazardous and hence require constant monitoring. For the same, the industry follows guidelines set by OSHA and other standards for handling, disposing of, and using these substances. For the disposal of e-waste, the laws of the land are followed unless the OEM has a separate provision for returning the component after the completion of its usable life. Companies like Air France Industries and Singapore Airlines are trying to implement IoT in their MRO verticals, but it will take a few more years to reach full potential, as IoT may not be needed for all maintained parts or across an entire facility.


1.4 Data and Supply Chain Maintenance

Data play a major role and will act as the determinant for any future implementation of technology in every field. The aviation MRO industry works on data. Every year, with the growing need for data and the growing preference for air travel among the general populace, the aviation industry relies more on IoT to collect and facilitate data. In many MRO facilities, these data are utilized to identify malfunctions and to conduct routine maintenance with predictive analysis. Aviation MRO or parts suppliers can use IoT to track and monitor the usage of a part and run performance checks to verify that proper inventory and stock levels are maintained. This will help avoid hoarding of parts, eventually removing surplus parts. It will further help in determining the lifespan of an item and in building an early-warning system to alert when a part is about to run out of its lifespan. In the supply chain domain, many technological changes are taking place. Radio frequency identification (RFID) sensors are playing a major role in streamlining the process, making it systematic, and eventually applying the 5S/6S principle of operations. This technology, when enabled with IoT, can serve the following areas:
• Accuracy of inventory counts
The use of RFID in the supply chain domain in general has helped in tracing products and product levels efficiently. Similarly, in the field of MRO, RFID with IoT will help in maintaining the supplies of parts and will reduce misplacement of inventory. This will make the entire tracing process sustainable and reduce waste generation along with labour hours. It will further lead to optimum cost utilization, asset monitoring, and predictive maintenance of products and parts with on-time delivery.
• Asset tracking and real-time tracking of parts/products
A combination of RFID, IoT, and an efficient GPS system allows efficient product traceability. The details are then correctly fed into the system, generating on-time processing of data and providing details like locations and delivery or dispatch times. This will further help in analysing and predicting potential delays due to unavoidable circumstances before they occur, eventually providing a forecast and helping in taking preventive measures if needed.
• Monitoring of parts after installation
Sensor technology has provided a ground for efficiency, permitting suppliers to monitor parts. Sensors will help to collect data to check on the lifespan, repairs, or malfunctions of parts if needed. New parts can be delivered efficiently on need-based requirements. This will further help in achieving lower downtime for airlines and a balanced flow of customers for supply chains.


• New product launch and product transition
An IoT set-up with smart technology can help achieve efficient launching of new products and efficient product transition or evolution. The data collected throughout provide product manufacturers with room for improvement; any analysis performed on the data will lead to pointers for improvement, eventually driving advancement in technology.
• Consideration of all stakeholders in a supply chain (visibility)
This will help in monitoring suppliers and other stakeholders, considering all factors related to operations. All factors will expedite performance and resolve quality-related issues. The parts produced will have elements of advanced engineering and functionality. This will help in reducing preventive costs and, overall, in saving product costs.
The Internet of things, or Internet of avionics, bridges all the voids existing in the current Indian MRO sector, where traditional approaches are still the norm. QuEST holds a major share in the global market for airframe and aircraft component manufacturers. The firm provides solutions to everyday challenges in design, production, customer support, and compliance. According to QuEST, IoT or IOAT will expand the horizon by enabling rapid recovery, delivery analysis and disruption identification, on-site validation and verification, reporting and escalation, corrective action identification, and the ARC tool to prevent recurring problems [1].

1.5 Impact of Emerging Technology in MRO

The increasing penetration of technology in the MRO sector will help the industry achieve the lean philosophy by removing redundant or non-valuable activities. This increasing emphasis on technology will increase the need for better storage and database management systems. Today, every organization has its own database management system, and most have adopted cloud-based storage. According to the report published by the MIT Lean Aerospace Initiative (2005) [2], implementation of the lean philosophy leads to the following effects:
1. Set-up time: 17–85% improvement.
2. Lead time: 16–50% improvement.
3. Labour hours: 10–71% improvement.
4. Costs: 11–50% improvement.
5. Productivity: 27–100% improvement.
6. Cycle time: 20–97% improvement.
7. Factory floor space: 25–81% improvement.
8. Travel distances (people or product): 42–95% improvement.
9. Inventory or work in progress: 31–98% improvement.
10. Scrap, rework, defects, or inspection: 20–80% improvement.

Eventually, application of the lean principle will allow the industry to explore technology and automation. This will need IoT, RPA, and cloud technology implementation. Looking at the numbers stated above, overall efficiency increases, and implementation of technology will reduce unscheduled maintenance.

1.6 Challenges

According to a report by PwC, the Indian MRO sector will reach a USD 2.6 billion market [3]. There are many challenges in keeping up with the advancements in this field and serving the ever-growing industry with its growing needs. MRO organizations in India belong to the SME sector; hence, there exists a gap between accepting any change and implementing it. SMEs require funds and investments and are therefore risk-averse when executing technological changes. Also, technological literacy plays a major role, as not everyone is aware of and well-versed in these technologies.

2 Research Design

2.1 Objectives

• To spread awareness of emerging technologies in the Indian aviation MRO industry to make it more competitive in the global arena.
• To know the reasons for the low adoption of emerging technologies in capability development in Indian MRO.
• To test mental bias among age groups towards emerging technologies in aviation MRO.

2.2 Hypotheses Declaration

The hypotheses are based on the significance of age group with respect to acceptance of emerging technology. They are as follows:
• Ho1 (null): There is no significant difference among age groups in their opinion that the cost of emerging technologies is affecting adoption.
• Ha1 (alternate): There is a significant difference among age groups in their opinion that the cost of emerging technologies is affecting adoption.


• Ho2 (null): There is no significant difference among age groups in their opinion that the adoption of technology will bring the Indian MRO industry at par with the Global Aviation MRO industry.
• Ha2 (alternate): There is a significant difference among age groups in their opinion that the adoption of technology will bring the Indian MRO industry at par with the Global Aviation MRO industry.

2.3 Research Methodology

The research concerns the aviation industry and the aviation MRO industry specifically. A survey form was shared to collect responses, and interviews were also conducted to understand the segment well; hence, a focused-group sample was taken. The aviation MRO industry in India has a total size of around 5,000–10,000 people, out of which around 46 respondents were considered. In many cases, the respondent had to be informed about the new technologies, as they were reluctant while filling the form due to low awareness of the latest technologies. Further, interviews were taken directly to capture responses from the MRO industry. The research methodology combines quantitative and qualitative approaches. The paper addresses the following two major concerns:
• Will making the aviation MRO industry completely IoT- and emerging-technology-equipped benefit cost optimization?
• Is the industry ready to accept change, and are the personnel educated enough to work in a hundred per cent automated industry?
Both of the above questions define the characteristics of the related population in the field of aviation. The two questions ask for opinions and highlight the perception of the population regardless of whether they have been exposed to the titled technology. A survey was conducted to get a deeper understanding of the psyche of the test population. It was conducted with their due will and permission.

3 Description

The MRO industry can bring vast changes to existing facilities by adopting new technologies. According to researchers who have been in the MRO domain for more than 15 years, there are many advantages to adopting new technologies, particularly IoT and AI, and this research investigated their adoption in the Indian MRO industry. The following are the advantages of having an infrastructure completely built on IoT:
• The data are directly available to the authority, uploaded in the cloud.


• It increases traceability for regulators with regard to the maintenance and overhaul process of aircraft components.
• IoT will reduce the probability of human errors, as bench tests (if applicable) will be performed by system commands, and the readings will be recorded automatically without human intervention. This increases the credibility of the system. The data are recorded in real time, and hence transparency will always be maintained by MRO companies with regard to component testing.
• It will help authorities have a clear picture of the history of an individual part or component, which can be analysed against repair data and will be helpful for future inventory management.
• IoT and AI will be a major help for auditors (quality auditors, safety auditors, etc.) in tracing any mishandling of maintenance data.
• IoT will make the process paper-free and hassle-free, reducing the carbon footprint and helping India achieve its green goals.
• The data will be handled by an encrypted algorithm, thereby avoiding human interference.
• Maintenance and segregation of the data will not be cumbersome.
• Employees will invest time learning new skills, which will develop new capabilities.
• This will help employees get hands-on experience with technology, removing fear and increasing trust. They will further act as educators.
• All reports will be available at a touch. This will increase efficiency, bring delays to zero, and improve the quality of work.
• After the introduction of IoT and data management systems, regulators will know which employee is accountable for data in case of a default.
• Every individual associated with the MRO company will have their performance records available digitally, to help the human resource department make decisions for improving their skills.
According to this research, cost optimization can be achieved when the industry is completely IoT- and AI-driven. IoT has been a major disruptor across industries, with the Internet revolution upgrading the need for better skills. Though this has created concern about people losing their jobs, it has helped the world move towards a digital future. The following is how IoT helps in cost optimization:
• Digital security increases the scope for data points to be visually scrutinized and saves the expense of employing people for this purpose.
• Today, virtual reality (VR) and IoT have helped leverage the single-station workplace, enabling control and analysis from one unit with better automated systems, and have reduced human errors. This has also created safer places to work, as in-flight nuances can be handled remotely as well.
• IoT has given an edge to the introduction of biometrics. This ensures that the singularity of the person, technician, or engineer entering and leaving a system is maintained. This maintains security and keeps track of individuals and their movements.


4 Research Outcome

The MRO industry is growing in India due to the increasing number of aircraft, which warrants more service capability requirements in India. The respondents are categorized sector-wise: from the sector analysis of respondents (see Fig. 1), 91.3% of the respondents are from the aviation or allied sectors, which matters for the industry.

4.1 Cost Factor in Technology Implementation

The results below describe different aspects related to cost in the implementation of technology. All results are a product of the survey conducted targeting personnel in the aviation sector. From Tables 1 and 2, the majority of the respondents feel that high cost is a cause of lower preference for, and lower adoption of, AI and IoT in the aviation MRO sector.

Fig. 1 Sector analysis of respondents

Table 1 Viewpoints of people relating high cost and lesser adaptation of AI and IoT

Scale                  Number of responses
1. Strongly disagree   4
2. Disagree            4
3. Neutral             13
4. Agree               20
5. Strongly agree      5
Grand total            46

Response to survey question: Is high cost a factor for lesser adoption of AI and IoT (emerging technologies) by Indian aviation MRO organization?


Table 2 Viewpoints of people relating high cost and lesser adaptation of AI and IoT (percentage)

Scale                  Number of responses (%)
1. Strongly disagree   8.70
2. Disagree            8.70
3. Neutral             28.26
4. Agree               43.48
5. Strongly agree      10.87
Grand total            100.00

Response to survey question: Is high cost a factor for lesser adoption of AI and IoT (emerging technologies) by Indian aviation MRO organization?

Table 3 Viewpoint of respondents with respect to age group

                    Age group
Scale               18–30   31–40   41–55   56 and above   Grand total
Strongly disagree   1       1       2       0              4
Disagree            1       1       1       1              4
Neutral             3       1       6       3              13
Agree               7       4       6       3              20
Strongly agree      2       1       0       2              5
Grand total         14      8       15      9              46

Response to survey question: Is high cost a factor for lesser adoption of AI and IoT (emerging technologies) by Indian aviation MRO organization?

43.48% of the respondents agree that high cost is a factor in the lower adoption of AI and IoT in the Indian aviation MRO sector, and 10.87% strongly agree. Around 28% are neutral on the statement (Table 3). From Table 4, the ANOVA test shows that there is no significant difference among age groups in their opinion that the cost of emerging technologies affects adoption, since the p-value (0.456469) is greater than the 0.05 level of significance. Hence, we do not reject the null hypothesis. It is a paradox in the minds of individuals that the introduction of any new technology will lead to increased cost and budget; in reality, only the initial cost is high, and once installation is done, the operating expense reduces to a great extent and increases profit. The data (see Fig. 2) show that the industry still thinks new technology comes at a higher cost, hence the lower adoption. Table 5 shows the relation between employee behaviour (resistance) and acceptance of emerging technologies: 47.83% of the respondents disagree, and 23.91% strongly disagree, that acceptance of new technologies should be avoided due to employee resistance. Hence, organizations should push to install emerging technology in their work premises (Fig. 3).


Table 4 ANOVA single factor for viewpoint of respondents with respect to age group

Summary
Groups     Count   Sum   Average   Variance
Column 1   5       14    2.8       6.2
Column 2   5       8     1.6       1.8
Column 3   4       15    3.75      6.916667
Column 4   4       9     2.25      0.916667

ANOVA
Source of variation   SS       df   MS         F          P-value    F crit
Between groups        10.944   3    3.648148   0.920254   0.456469   3.343889
Within groups         55.5     14   3.964286
Total                 66.444   17
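For reproducibility, the single-factor ANOVA of Table 4 can be recomputed from the per-age-group response counts of Table 3; a minimal sketch follows (the grouping below drops the zero cells, an assumption consistent with the group counts of 5, 5, 4, and 4 in the summary):

from scipy.stats import f_oneway

g_18_30 = [1, 1, 3, 7, 2]   # column "18-30" of Table 3
g_31_40 = [1, 1, 1, 4, 1]   # column "31-40"
g_41_55 = [2, 1, 6, 6]      # column "41-55", zero cell dropped
g_56_up = [1, 3, 3, 2]      # column "56 and above", zero cell dropped

F, p = f_oneway(g_18_30, g_31_40, g_41_55, g_56_up)
print(round(F, 6), round(p, 6))  # ~0.920254 0.456469, matching Table 4

# p > 0.05, so the null hypothesis Ho1 is not rejected: age group and the
# opinion that cost affects adoption do not differ significantly.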

Fig. 2 Response to survey question: Is high cost a factor for lesser adaptation of AI and IoT (emerging technologies) by Indian aviation MRO?

Table 5 Responses to survey question: Should AI and IoT (emerging technologies) be avoided due to employees' resistance to adapt to new technologies?

Scale               Percentage of respondents (%)
Strongly disagree   23.91
Disagree            47.83
Neutral             21.74
Agree               2.17
Strongly agree      4.35
Grand total         100.00


Fig. 3 Relation between employee behaviour (resistance) and acceptance of emerging technologies. (Response to survey question: Should AI and IoT (emerging technologies) be avoided due to employees' resistance to adapt to new technologies?)

Some of the respondents feel that emerging technologies should be implemented as a pilot project before full-fledged adoption. Irrespective of how ready people are to accept the changing dynamics of the workplace, any new change should be welcomed. Once implemented, professionals will surely try and experience the change; initially, most will be averse to it, but they will eventually go for it, either under peer pressure or out of curiosity (human nature). Employees should be educated by conducting workshops that give individuals an insight into how the technology will benefit the organization and how it is part of a cost optimization strategy, rather than thinking that it is used for cost cutting.

4.2 AI and IoT (Emerging Technologies) in the Indian MRO Industry and the Global Aviation MRO Industry

34.78% of the respondents strongly agree that adopting emerging technologies will bring the Indian MRO sector at par with the global sector, and 21.74% agree. The respondents are all industry professionals; their answers attest to their acceptance of AI and IoT as a boon to the Indian industry (Fig. 4 and Table 6).

4.3 Mental Bias of Age Groups Towards AI and IoT in the Aviation Sector

The data analysis shows that there is no significant difference among age groups in their opinion that the adoption of technology will bring the Indian MRO industry at par with the


Fig. 4 Percentage composition of individuals accepting the technology. (Response to survey question: Do you think the adaptation of AI and IoT (emerging technologies) will bring Indian MRO industry at par with Global Aviation MRO industry?)

Table 6 Responses of people thinking that adopting new technology will bring Indian MRO industry at par with Global Aviation MRO sector

Scale               Number of responses
Strongly disagree   6
Disagree            4
Neutral             10
Agree               10
Strongly agree      16
Grand total         46

Response to survey question: Do you think the adoption of AI and IoT (emerging technologies) will bring Indian MRO industry at par with the Global Aviation MRO industry?

Global Aviation MRO industry; since the p-value (0.910356) is greater than the 0.05 level of significance, we do not reject the null hypothesis. All age groups feel that emerging technology should be adopted in the aviation MRO sector in a globally competitive environment. In fact, during the survey for this research, industry leaders were also persuaded towards new technology adoption, and they showed keen interest in adopting technologies that can raise their business potential to par with global peers (Tables 7 and 8).

5 Conclusion

Many seniors in the industry are not familiar with emerging-technology terms such as IoT and AI; nevertheless, they agree with adopting these technologies in the aviation and aviation MRO industry. For example, the research team explained IoT and AI and their applications to senior management professionals (names kept confidential on the survey form).


Table 7 Responses of people thinking that adopting new technology will bring Indian MRO industry at par with Global Aviation MRO sector (percentage)

Scale               Age group 18–30 (%)   31–40 (%)   41–55 (%)   56 and above (%)   Grand total (%)
Strongly disagree   2.17                  2.17        6.52        2.17               13.04
Disagree            2.17                  0.00        6.52        0.00               8.70
Neutral             10.87                 4.35        2.17        4.35               21.74
Agree               8.70                  0.00        8.70        4.35               21.74
Strongly agree      6.52                  10.87       8.70        8.70               34.78
Grand total         30.43                 17.39       32.61       19.57              100.00

Response to survey question: Do you think the adoption of AI and IoT (emerging technologies) will bring Indian MRO industry at par with the Global Aviation MRO industry?

Table 8 ANOVA single factor analysis for responses of people thinking that adopting new technology will bring Indian MRO industry at par with Global Aviation MRO sector (percentage)

Summary

Groups     Count   Sum   Average    Variance
Column 1   5       14    2.8        3.2
Column 2   3       8     2.666667   4.333333
Column 3   5       15    3          1.5
Column 4   4       9     2.25       1.583333

ANOVA

Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        1.312745   3    0.437582   0.176572   0.910356   3.410534
Within Groups         32.21667   13   2.478205
Total                 33.52941   16
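This ANOVA can be reproduced directly. The sketch below reconstructs the four age-group samples from the Table 7 percentages (each percentage times 46/100 gives a whole response count; zero cells are omitted, which matches the group counts 5, 3, 5 and 4 in the summary above). The reconstruction is our inference, not stated in the paper.

```python
# A sketch reproducing the single-factor ANOVA of Table 8. The samples are
# per-scale response counts reconstructed from Table 7 (percent * 46/100);
# this reconstruction is an assumption on our part.
from scipy.stats import f_oneway

age_18_30 = [1, 1, 5, 4, 3]   # count 5, sum 14, mean 2.80, variance 3.20
age_31_40 = [1, 2, 5]         # count 3, sum 8,  mean 2.67, variance 4.33
age_41_55 = [3, 3, 1, 4, 4]   # count 5, sum 15, mean 3.00, variance 1.50
age_56_up = [1, 2, 2, 4]      # count 4, sum 9,  mean 2.25, variance 1.58

f_stat, p_value = f_oneway(age_18_30, age_31_40, age_41_55, age_56_up)
print(f_stat, p_value)  # ~0.176572 and ~0.910356, matching F and P-value above
```

The group means and variances computed from these samples match the summary table exactly, which supports the reconstruction.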

Educating respondents was thus also part of this research, and the researchers tried to fulfil this objective. Based on the questionnaire responses, we can conclude the following. Although people know of IoT and AI, awareness of their actual applications is limited. The responses from industry professionals point out that the current adoption of emerging technologies in Indian MRO is limited by apprehension about the cost and intricacies of adopting new technologies in the MRO industry. However, the general perception is that adoption will increase over time as more people become conversant with these technologies. Nearly all respondents agree that they will increase efficiency and transparency in the organization. Stakeholders agree that adopting these technologies will bring the Indian MRO industry at par with global standards. Further, the adoption of new technologies will increase the need for data security. However, it will also create more job opportunities


such as data analysts and data management staff. Looking at the better side of the coin, these technologies will enhance present quality assurance techniques and help achieve zero defects. Every technology, when introduced, incurs cost and displaces the middle layer of the workforce, but it also raises the skill sets required of employees (up-gradation), which increases profits and reduces operational expenses. Responses from different age groups show that emerging technology should be adopted in the aviation MRO sector to make the Indian aviation MRO sector globally competitive. Commercial opportunities for public aerospace markets and aviation services are available in almost all sub-sectors, including public and private airlines; engines and parts; maintenance, repair, and overhaul (MRO); and the construction of airports, equipment, and facilities, among others [4]. India can set its neighbours as its benchmark and benefit from the Aatmanirbhar Bharat initiative of the Indian government, using the current budget allocation (Rs. 3700 crores in the 2019–2020 fiscal year) efficiently [5, 6]. A stepwise approach has been chosen in this paper to highlight the many factors related to technology in the Indian MRO sector. There may seem to be a huge gap to be filled, but the pandemic situation acts as a stimulus for officials and industrialists to build upon and research various opportunities. Data will play a major role in determining the accountability of systems and in building better, more efficient systems. The need for data scientists and software engineers will increase as organizations in India adapt to these changes. Recent advancements in blockchain have prompted technology enthusiasts to find paths connecting blockchain with all industries to increase robustness, confidentiality and a continuous-improvement approach, and to reduce the risk of default. Overall, there is huge scope in India. Investments and capex will be substantial, but in the long run they will help generate profits and act as a breather for the sector.

6 Recommendations

Considering the aviation industry's rapid fleet expansion in India, implementing AI techniques in aviation MRO could be a future requirement for the growth of the Indian MRO industry. Based on the industry survey responses, deploying pilot projects to judge the adaptability of emerging technology in the MRO sector is a definite recommendation. New technology should always be welcomed. AI and IoT should be part of the enterprise resource planning (ERP) systems used by MROs. If they come as an additional cost, that cost will be passed on to the customer, which may make MROs less cost-efficient initially; in the future, however, they will reduce operating cost through the increased productivity of workmen on essential activities. Initially, AI and IoT can be adopted in limited applications, for example, monitoring and recording test data that can be exploited in the long run; it may take a few years before these technologies can be used for QA/regulatory purposes. Educating MRO industry capability-development professionals in emerging technology is a must, and it is the need of the hour.


References

1. QuEST Global: IoT: Setting the Pace of Progress in Aerospace, https://www.quest-global.com/iot-setting-pace-progress-aerospace/. Last accessed 2021/06/18
2. Sánchez, P.A., Sunmola, F.: Factors influencing effectiveness of lean maintenance repair and overhaul in aviation. In: 2017 International Conference on Industrial Engineering and Operations Management (IEOM) Proceedings, pp. 855–864. IEOM Society International, Bristol, UK (2017)
3. PWC: General Aviation Unfolding Horizons, https://www.pwc.in/assets/pdfs/industries/general-aviation-070312.pdf. Last accessed 2021/06/18
4. Export China—Aviation, https://www.export.gov/apex/article2?id=China-Aviation. Last accessed 2021/06/18
5. Union Budget 2019–2020, https://www.indiabudget.gov.in/budget2019-20/doc/Budget_Speech.pdf. Last accessed 2021/06/18
6. IBEF: Indian Aviation Industry, https://www.ibef.org/industry/indian-aviation.aspx. Last accessed 2021/06/18

User Evaluation of a Virtual Reality Application for Safety Training in Railway Level Crossing

Oche A. Egaji, Ikram Asghar, Luke Dando, Mark G. Griffiths, and Emma Dymond

Abstract The railway is one of the safest modes of transport all over the world. However, there are safety concerns for pedestrians and other road users as some of these railways can pass through smaller towns. As a potential technological solution, this study utilised a virtual reality (VR) application to train pedestrians on essential safety skills required to cross a level crossing safely. It builds on the previous work to evaluate the user acceptance of an educational VR railway level crossing application utilising the system usability scale (SUS) and sense of presence (SoP) questionnaire. Overall, the SUS score decreases with age, while the sense of presence score increases with age. Keywords Virtual reality · Safety training · Railway level crossing · System usability · Sense of presence · User experience

O. A. Egaji (B) · I. Asghar · L. Dando · M. G. Griffiths The Centre of Excellence in Mobile and Emerging Technologies, University of South Wales, Pontypridd, UK e-mail: [email protected] I. Asghar e-mail: [email protected] L. Dando e-mail: [email protected] M. G. Griffiths e-mail: [email protected] E. Dymond Motion Rail Ltd., Ebbw Vale, Wales, UK e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_16


1 Introduction

The railway is one of the safest modes of transport all over the world. However, there are safety concerns for pedestrians and other road users, as some of these railways pass through smaller towns [1]. There are reported incidents that have led to death or disability, especially in children, who are at four times the risk of having an accident on the road or railway than adults [2]. This number is significant, as the mode of everyday travel for children aged 5–10 years is walking (46%), car (46%) and bus (5%); for children aged 11–16 years, it is walking (38%), car (23%) and bus (29%) [3]. Also, 95% of road and railway accidents are attributed to the human factor, where users fail to comply properly with safety guidelines [1]. Hence, pedestrian safety on roads and railways has gained global attention from academia and stakeholders over the past couple of years. There is no single approach to reducing accidents at a crossing; however, about 70.06% of experts in this area agree that a continuous educational campaign is crucial to reducing accidents at railway crossings [1]. As a result, pedestrians must be trained in the appropriate skills to use roads and railway crossings safely. An increasing number of campaigns promote children's education in cognitive skills such as identifying safer railway crossings, judging distance from the crossing, managing their crossing timing and controlling their movements according to the environment [4]. However, stakeholders in this area remain committed to innovation and to technology-based solutions for safer railway crossings. The long-term goal is to focus on technology-assisted solutions that can improve pedestrian safety at railway level crossings [5]. As a potential technological solution, this study utilised a virtual reality (VR) application to train pedestrians in the essential safety skills required to cross a level crossing. VR can be used to study human behaviour. Over the years, research interest in VR technology has been increasing, as it is considered a stepping stone in technology innovation. This increased interest has largely been driven by the current availability of low-cost VR devices, including the Sony PlayStation VR, HTC Vive, Oculus Rift and mixed reality interfaces (MRITF) (e.g. HoloLens) [6]. VR provides a safe and interactive approach to learning; it can increase user engagement, improving knowledge retention over time. Another useful capability of VR is conducting experiments that would otherwise be expensive or risky to conduct in reality, while giving users the ability to control environmental factors. The VR application also offers the opportunity for repeated performance of a task with immediate feedback [7]. Hence, children are capable of learning essential safety skills via VR [8]. This study builds on [9] to evaluate the user acceptance of the VR application utilising the system usability scale (SUS) and sense of presence (SoP) questionnaire. The study participants consisted of pupils from a local comprehensive school and some university students with various abilities in using the VR headset. The participants listened to a presentation about safety guidelines at a railway crossing and were introduced to VR.


The participants then applied what they had learnt in a VR level crossing task before completing the SUS usability and SoP questionnaires.

2 Literature Review

The rise in the popularity of VR technology can be attributed to the availability of low-cost headsets capable of delivering high-quality immersive experiences [10]. Most VR research has focused on software solutions due to the low-cost availability of VR hardware. However, these software solutions have to be adapted for bespoke use, often leading to extended development times [6]. The user's experience in VR is a critical factor in their willingness to adopt this technology; it can be inferred by measuring the presence, reality and realism levels in VR. This paper evaluates the SUS and SoP for the VR application. The SoP in VR can be defined as the complex psychological feeling of "being there" in the virtual space. This comprises the perception and sensation of physical presence and the ability to react to and interact with objects in the VR space as though they were in the real world [11]. The SoP can often influence users' collaboration in virtual space, because the strong emotion associated with presence fosters a better understanding of the virtual environment and its ease of use [12]. The utilisation of VR technology as a tool for training in various academic disciplines has evolved over the years. A systematic review carried out by Checa and Bustillo [13] on the effect of VR serious games (SG) on knowledge acquisition found that 30% of the published research in this area demonstrated enhanced learning and training outcomes in VR compared to the traditional approach, while 10% of the published research found no significant difference. VR can be a valuable tool for identifying and correcting risky pedestrian behaviour in children [7]. Schwebel et al. showed improved self-efficacy of children in crossing the street after VR training, which means children can learn essential safety skills in VR [8]. This finding was reaffirmed by Wu et al. [14], who investigated the behaviour of pedestrians with macular degeneration when crossing the street. Their study found that male participants had a higher tendency to take risks than female participants, and they suggested that VR can be used to investigate similar issues of public concern [14]. VR has also been adopted as a rehabilitation tool in clinical settings to treat specific medical conditions. One such example is the use of VR to rehabilitate patients with an eating disorder [15]. VR has likewise been used to treat social anxiety [16], aid physical rehabilitation [17] and assist stroke patients in regaining a sense of balance [18]. Other applications of VR can be found in the understanding of structural data in drug discovery [19], training children in traffic safety skills [2] and training soldiers to make swift decisions during battle [20]. Another study explored learner drivers' attitudes towards safety at a level crossing after a VR education session. The authors claimed that the attitude of learner drivers towards the risk associated with level crossings is a good predictor of future driving behaviour.


The relationship between the attitude of the learner drivers towards the level crossing and risky driving behaviour was altered after the VR educational session [21]. Hence, continuous exposure to VR training is capable of changing dangerous pedestrian behaviours. The interactive nature and captivating power of VR enhance the feeling of being part of the virtual environment; this enables users to perform actions that are closer to reality. VR is a valuable tool for controlling research experiments, as all participants have the same virtual experience [22], and the VR environment is also a valuable tool for automated data collection [23]. The use of VR has been shown to foster student engagement, which in turn leads to better skill acquisition. It provides a bespoke, convenient, engaging and interactive alternative to the traditional classroom approach and other traditional methods [24]. The successful implementation of VR applications in multiple domains, as discussed above, and its ability to engage users more naturally motivate this paper to investigate its use in railway safety training for children and adults alike. Additionally, VR mitigates the risk of training children at an actual level crossing, which is time- and resource-intensive [25].

3 Research Methodology

This section outlines the methodology and relevant material for this study, including an overview of the participants, a description of the VR application, the data collection process and the questionnaires.

3.1 Participants

The school participants consisted of 11- and 12-year-old children in year 7 at a Welsh comprehensive school. The school was recruited because of its proximity to the researchers. There were 83 participants from the school, comprising both genders: 46 were 11 years old and 37 were 12 years old. Gender was excluded from the questionnaire on the recommendation of the university's ethics board. The average age of the participants from the comprehensive school was 11.45, with a standard deviation (SD) of 0.50. Additional data were collected from participants recruited at the university by advertising and sending out an online sign-up sheet. A total of 16 participants aged between 18 and 25 years were recruited. Hence, the study consisted of 99 participants in total.


3.2 Overview of the VR Training Prototype

The VR environment consists of elements with characteristics similar to the physical world. It includes an external landscape with sky, grass, railway tracks, trains, power stations, roads and pavements, and it incorporates sounds and ambient noise to bolster the user's immersion. The prototype consists of two scenarios: the first is the level crossing, while the second is the foot crossing. Participants start with the first scenario and, on successful completion, proceed to the second. After the second scenario has been completed, participants move to a separate room where they complete the SoP and SUS questionnaires. Another screen mirrors the participant's view of the VR environment, so that the teacher can monitor their performance and provide the necessary feedback afterwards. When participants fail to cross safely, they can run through the scenario again until they succeed.

Level crossing: The level crossing scenario is designed to teach users how to cross a level crossing safely. The participant approaches the level crossing on the side of a road, triggering the crossing: the sirens sound and the barriers close. From this point on, any movement past the barrier is classed as an unsafe attempt to cross. If the child moves onto the track, they are first limited to a walking speed to give the train time to reach them. Upon failure or success of the scenario, the participant finds themselves on the other side of the track with an LED board displaying the scenario's outcome (pass or fail). Figures 1 and 2 show the level crossing and the LED display board, respectively.

Foot crossing: The foot crossing scenario is an extension of the level crossing scenario. It teaches users how to safely cross a foot crossing, usually located in the countryside. The participant approaches the foot crossing from a path with a gate that separates them from the track. The gates open automatically once the participant approaches. However, right next to the gate is a phone that the participant needs to pick up to ask an operator if it is safe to cross. When the participant picks up the phone and brings it to their head, it is safe to cross; otherwise, they will get stuck on the track, which means they fail the scenario. On the other side of the track is a gate that opens when approached, with an LED message board displaying the scenario's outcome (pass or fail).

Fig. 1 Level crossing


Fig. 2 Display LED board

Fig. 3 Foot crossing

The foot crossing as seen in the VR environment is shown in Fig. 3.

3.3 Data Collection Process and Questionnaires

The user testing of the VR application continued over a couple of sessions. The participants were briefed on the VR application and its various functionalities, including navigating the VR environment. After finishing both scenarios, the participants completed two questionnaires in a separate room: the system usability scale (SUS) [26] and an adapted sense of presence (SoP) [27, 28] questionnaire. A research team member was present in the room to explain the questions further for those who required clarification. The SUS questionnaire was adopted because of its popularity and its effectiveness in evaluating system usability in different sectors over the years, which has established it as a reliable instrument for measuring system usability [14]. The minimum required sample size for SUS is 15, making it a specialist utility tool when the sample size is small [15]; hence, 99 is considered an acceptable sample size. The SUS questionnaire consists of ten questions with a five-point Likert scale. The participants were advised to give their immediate response to each item and not think about it for too long.

User Evaluation of a Virtual Reality Application for Safety …

183

The participants were also advised to select the centre point for any item they were not sure about. The SUS questionnaire consists of the following questions:

1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.
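For concreteness, the standard SUS scoring rule (the formula referenced in Sect. 4.1 via [30]) can be sketched as follows; the function name is ours. Odd-numbered items contribute their response minus 1, even-numbered items contribute 5 minus their response, and the raw sum is multiplied by 2.5 to yield a 0–100 score.

```python
# A minimal sketch of the standard SUS scoring rule (function name is ours).
def sus_score(responses):
    """responses: the ten 1-5 Likert answers, in question order."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # scale the 0-40 raw sum to a 0-100 score

print(sus_score([5, 2, 4, 1, 4, 2, 5, 1, 4, 2]))  # 85.0 for a fairly positive respondent
```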

The SoP questionnaire developed by Slater, Usoh and Steed was chosen for its simplicity and length. It was slightly modified to make it more understandable to the test participants. The SoP questionnaire consists of six questions with a seven-point Likert scale, grouped into three themes, each with varying questions: the extent to which the virtual environment (VE) becomes the dominant reality, the sense of being in the VE, and the extent to which the VE is remembered as a place. Each question uses a scale of 1–7, where a higher score represents a stronger presence by the participant. The presence score can be computed from the answers that are greater than or equal to 6 [29]. The SoP questionnaire consists of the following questions:

1. Please rate your sense of being in a railway level crossing scene, on the following scale from 1 to 7, where seven represents your normal experience of being in a place. I had a sense of "being there" at the level crossing.
   (1) Not at all. (7) Very much.

2. To what extent were there times during the experience when the virtual level crossing space became the "reality" for you, and you almost forgot about the "real world" of the classroom in which the whole experience was really taking place? There were times during the experience when the virtual level crossing space became more real for me compared to the "real world".
   (1) At no time. (7) Almost all the time.

3. When you think back about your experience, do you think of the virtual level crossing space more as images that you saw or more as somewhere that you visited? Please answer on the following 1–7 scale: The virtual level crossing space seems to me to be more like.
   (1) Images that I saw. (7) Somewhere that I visited.

4. During the time of the experience, which was strongest on the whole, your sense of being in the virtual level crossing scene or of being in the real world of the classroom? I had a stronger sense of being in.
   (1) The real world of the classroom. (7) The virtual reality of the level crossing.

5. Consider your memory of being in the virtual level crossing space. How similar in terms of the structure of the memory is this to the structure of the memory of other places you have been today? By "structure of the memory", consider things like the extent to which you have a visual memory of the level crossing; whether that memory is in colour; the extent to which the memory seems vivid or realistic; its size and location in your imagination; the extent to which it was an unobstructed and wide view of an extensive area in all directions; and other such structural elements. I think of the virtual level crossing space as a place in a way similar to other level crossing places I have visited.
   (1) Not at all. (7) Very much so.

6. During the time of the experience, did you often think to yourself that you were actually just standing in a classroom wearing a helmet, or did the virtual field of trains, tracks and sound overwhelm you? During the experience, I often thought that I was really standing in the classroom wearing a helmet.
   (1) Most of the time I realised I was in the classroom. (7) Never, because the virtual level crossing field overwhelmed me.
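The two presence measures reported later in Sect. 4.2 can be sketched as follows (names are ours, and the exact aggregation is our reading of the text): the SoP count is the fraction of the six answers at 6 or above, and the SoP mean is the plain average of the six answers.

```python
# A sketch of the presence measures used in Sect. 4.2 (names are ours).
def sop_measures(answers):
    """answers: the six 1-7 Likert answers to the SoP questions."""
    count = sum(1 for a in answers if a >= 6) / len(answers)  # fraction of 6s and 7s
    mean = sum(answers) / len(answers)                        # average presence score
    return count, mean

print(sop_measures([6, 4, 5, 7, 5, 6]))  # (0.5, 5.5)
```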

4 Results and Discussion

This work builds on Dando et al. [9] to include the SUS and SoP evaluation. The results and discussion for the 83 comprehensive school participants and the 16 university participants are presented in this section.

4.1 SUS Questionnaire

The SUS questionnaire is a simple yet effective measure of the usability of products. Each question has five possible responses: strongly disagree (1), disagree (2), neutral (3), agree (4) and strongly agree (5). The SUS score can be calculated using the formula given in [30]. SUS scores lie between 0 and 100, where the best possible score is 100 and the worst possible is 0. The SUS score is not a percentage; the average SUS score is 68, with a standard deviation of 12.5 [31].

Table 1 SUS score adjective ranges [33]

SUS score      Adjective rating
84.10–100.00   Best imaginable
80.80–84.00    Excellent
71.10–80.70    Good
51.70–71.00    Ok/fair
25.10–51.60    Poor
0.00–25.00     Worst imaginable

According to Bailenson and Yee [17], system usability is considered good when the SUS score is greater than 68, while a below-average SUS score points to issues with system usability and the need for further investigation; it does not, however, explicitly tell you what needs to be changed. A SUS score between 0 and 100 is easy to understand when making a relative judgement on the usability of a product. Bangor et al. [32] attempted to translate the numeric usability score into an absolute judgement of usability by adding a seven-point adjective Likert scale as an eleventh question. The selected adjectives, such as "excellent", "good", "ok" or "poor", are loosely associated with users' descriptions of the usability of a product. The authors found that the SUS score correlates significantly (r = 0.822, α < 0.01) with the seven-point Likert scale. The proposed seven-point adjective rating for the SUS score is shown in Table 1 [33]. As shown in Table 1, the authors associated a SUS score over 84 with "best imaginable", over 80.7 with "excellent", above 71 with "good", and under 25.1 with "worst imaginable". The adjective "awful" was excluded, as it was not significantly different from the other adjectives. It is hoped that the adjective rating will help non-human-factors professionals understand the SUS score. The SUS scores shown in Table 2 are broken down by demographics.

Table 2 SUS scores for demographics

Age     No. of participants   Mean SUS score   STD (SUS score)
11–12   83                    80.48            10.91
18–25   16                    78.75            12.01


Table 3 Percentage participants SUS score adjective distribution

SUS score adjective   % of participants (Comp)   % of participants (Uni)
Best imaginable       38.55                      43.75
Excellent             10.84                      6.25
Good                  30.12                      25.00
Ok/fair               20.48                      25.00
Poor                  0.00                       0.00
Worst imaginable      0.00                       0.00
Total                 100                        100

Fig. 4 Combined SUS scores (Comp and Uni)

The percentage distribution of the participants across the SUS adjectives for the comprehensive school and the university is shown in Table 3. The percentages of participants in the comprehensive school group with the adjectives best imaginable, excellent, good and ok/fair are 38.55%, 10.84%, 30.12% and 20.48%, respectively. The percentage distribution for the university group is 43.75%, 6.25%, 25.00% and 25.00% for best imaginable, excellent, good and ok/fair, respectively. None of the SUS scores from the two groups falls within the poor or worst imaginable categories, which is valuable feedback on the system usability. A higher percentage of the SUS scores from both groups falls within best imaginable than within any other adjective. The average SUS scores for the comprehensive school and university groups across the various SUS adjectives are shown in Fig. 4. The comprehensive school group has a higher average SUS score for the best imaginable SUS adjective.
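The adjective bands of Table 1 can also be applied programmatically; a small sketch follows, with boundaries copied from Table 1 and a function name of our choosing.

```python
# A sketch mapping a SUS score to the adjective bands of Table 1 [33].
def sus_adjective(score):
    bands = [(84.1, "Best imaginable"), (80.8, "Excellent"), (71.1, "Good"),
             (51.7, "Ok/fair"), (25.1, "Poor"), (0.0, "Worst imaginable")]
    for lower, adjective in bands:
        if score >= lower:
            return adjective

print(sus_adjective(80.48))  # Good (comprehensive school mean from Table 2)
print(sus_adjective(78.75))  # Good (university mean from Table 2)
```

Both group means fall in the "good" band, consistent with the grading reported above.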

4.2 Sense of Presence (SoP)

The mean SoP count and the mean SoP for the comprehensive school and university participants are shown in Table 4. The mean SoP count is the average fraction of presence-questionnaire responses of 6 and above. The mean SoP count and standard deviation for the comprehensive school and the university are 0.50 (STD: 0.29) and 0.61 (STD: 0.31), respectively. The SoP mean is the average score across all six questions.


Table 4 Sense of presence (SoP): means and standard deviations

                SoP count (STD)   SoP mean (STD)
Comp (N = 83)   0.50 ± 0.29       5.20 ± 1.01
Uni (N = 16)    0.61 ± 0.31       5.59 ± 1.25

The mean SoP and standard deviation for the comprehensive school pupils and the university students are 5.20 (STD: 1.01) and 5.59 (STD: 1.25), respectively. This result illustrates that the university participants recorded a higher presence overall than the comprehensive school pupils, although the difference between the groups is not significant. According to Table 5, the university students recorded a higher presence across the individual categories than the comprehensive school pupils. The top two SoP scores are Q1 and Q4 for both the comprehensive school pupils and the university students. The presence score shown in Table 6 is the average count per participant of responses that are "6" or "7". Based on this table, the university students recorded a higher presence score for all six questions. According to Table 6, the top two mean SoP counts for the comprehensive school group are 0.65 (STD: 0.48) and 0.67 (STD: 0.47), and for the university group 0.82 (STD: 0.39) and 0.76 (STD: 0.44), corresponding to Q1 and Q4, respectively, for both groups. The two lowest SoP counts for the comprehensive school pupils are 0.28 (STD: 0.45) and 0.40 (STD: 0.49) for Q3 and Q5, respectively, while the lowest two for the university students are Q2 and Q5, each with a mean SoP count of 0.47 (STD: 0.51). Q5 is thus among the lowest-scoring questions for both groups. The distribution of the SoP questions among the three themes is given below:

Theme 1: The sense of being in the VE (Q1 and Q4).
Theme 2: The extent to which the VE becomes the dominant reality (Q2 and Q6).
Theme 3: The extent to which the VE is remembered as a place (Q3 and Q5).

The average SoP score across the three themes is shown in Table 7. According to the table, the university students recorded higher presence scores than the comprehensive school students across all three themes. Theme 1 is the highest score for both groups.

Table 5 Sense of presence (SoP): means and standard deviations (individual questions)

        Q1            Q2            Q3            Q4            Q5            Q6
Comp    5.75 ± 1.29   4.93 ± 1.76   4.51 ± 1.68   5.77 ± 1.55   4.89 ± 1.58   5.37 ± 1.72
Uni     6.12 ± 0.70   5.18 ± 1.59   5.41 ± 1.42   6.0 ± 1.06    5.41 ± 1.23   5.41 ± 1.50

Table 6 Mean and standard deviation of responses of 6 and 7 to the individual questions

        Q1            Q2            Q3            Q4            Q5            Q6
Comp    0.65 ± 0.48   0.45 ± 0.50   0.28 ± 0.45   0.67 ± 0.47   0.40 ± 0.49   0.57 ± 0.50
Uni     0.82 ± 0.39   0.47 ± 0.51   0.53 ± 0.51   0.76 ± 0.44   0.47 ± 0.51   0.59 ± 0.51


Table 7 Average sense of presence score across the three themes

        Theme 1   Theme 2   Theme 3
Comp    5.76      5.15      4.7
Uni     6.06      5.30      5.41

This implies that both groups (university and comprehensive school) had a stronger sense of being in the VE (theme 1) than of themes 2 and 3. On the other hand, theme 3 is the lowest-scoring theme for the comprehensive school, unlike for the university students. The lower presence score could be because the younger participants have less experience of actual railway crossings, making it harder for them to associate the VR space with reality; this is heavily reflected in the mean SoP count for Q3, as shown in Table 6.
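The theme scores of Table 7 follow from the per-question means of Table 5 by averaging each theme's two questions; a short sketch of this reconstruction (the averaging rule is our inference from the values, which match to rounding):

```python
# A sketch reproducing Table 7 from Table 5: each theme score is assumed to
# be the average of the mean scores of its two questions.
themes = {"Theme 1": ("Q1", "Q4"), "Theme 2": ("Q2", "Q6"), "Theme 3": ("Q3", "Q5")}
means = {
    "Comp": {"Q1": 5.75, "Q2": 4.93, "Q3": 4.51, "Q4": 5.77, "Q5": 4.89, "Q6": 5.37},
    "Uni":  {"Q1": 6.12, "Q2": 5.18, "Q3": 5.41, "Q4": 6.00, "Q5": 5.41, "Q6": 5.41},
}
for group, qs in means.items():
    scores = {t: (qs[a] + qs[b]) / 2 for t, (a, b) in themes.items()}
    print(group, scores)  # Comp ~ 5.76 / 5.15 / 4.70; Uni ~ 6.06 / 5.30 / 5.41
```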

5 Conclusion

This paper explored the user acceptance of a VR application for training children in safety skills at a level crossing. The VR application consists of level crossing and foot crossing scenarios. Participants from a comprehensive school and a university evaluated the VR application using the SUS and SoP questionnaires. The total number of participants was 99: 83 from the comprehensive school and 16 from the university. The initial trial results revealed that children liked the VR-based training system and showed considerable interest in learning. Overall, the SUS score decreases with age, which agrees with previous studies, while the SoP score increases with age. An explanation could be that the older participants from the university have more real-life experience with railway level crossings than those from the comprehensive school because of the age gap; hence, the university participants could more closely associate the VE with reality. However, the participants from the comprehensive school enjoyed the VR application and found it slightly more helpful than the university participants did. The long-term adoption of this application can reduce the chances of accidents for children, who are at four times the risk of having an accident on the road or railway than adults.

Acknowledgements The authors would like to acknowledge the European Regional Development Fund (ERDF) and the Welsh Government for funding this project. We would also like to acknowledge the roles played by Motion Rail Limited and CEMET throughout the project.


References

1. Starcevic, M., Barić, D., Pilko, H.: Safety at Level Crossings: Comparative Analysis (2016)
2. James, T., Tolmie, A., Hugh, F.: Adoption of behavioural roadside training programme improves children's road crossing skills. University of Strathclyde. https://pureportal.strath.ac.uk/en/impacts/adoption-of-behavioural-roadside-training-programme-improves-chil. Accessed 10 Nov 2020
3. Department for Transport: National Travel Survey 2014: Travel to school (2014). Available: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/476635/travel-to-school.pdf
4. Foot, H.C., Thomson, J.A., Tolmie, A.K., Whelan, K.M., Morrison, S., Sarvary, P.: Children's understanding of drivers' intentions. Br. J. Dev. Psychol. 24(4), 681–700 (2006)
5. Network Rail: Level crossings (2018). https://www.networkrail.co.uk/running-the-railway/looking-after-the-railway/level-crossings/. Accessed 10 Nov 2020
6. Cipresso, P., Giglioli, I.A.C., Raya, M.A., Riva, G.: The past, present, and future of virtual and augmented reality research: a network and cluster analysis of the literature. Front. Psychol. 9 (2018). https://doi.org/10.3389/fpsyg.2018.02086
7. Luo, H., Yang, T., Kwon, S., Zuo, M., Li, W., Choi, I.: Using virtual reality to identify and modify risky pedestrian behaviors amongst Chinese children. Traffic Inj. Prev. 21(1), 108–113 (2020). https://doi.org/10.1080/15389588.2019.1694667
8. Schwebel, D.C., et al.: PW 0376 Can we teach children to cross streets using virtual reality delivered by smartphone? Results from China. Inj. Prev. 24(Suppl 2), A51–A51 (2018). https://doi.org/10.1136/injuryprevention-2018-safety.137
9. Dando, L., Asghar, I., Egaji, O.A., Griffiths, M., Gilchrist, E.: Motion rail: a virtual reality level crossing training application. In: Proceedings of the 32nd International BCS Human Computer Interaction Conference, Swindon, GBR, July 2018, pp. 1–5. https://doi.org/10.14236/ewic/HCI2018.131
10. Huygelier, H., Schraepen, B., van Ee, R., Vanden Abeele, V., Gillebert, C.R.: Acceptance of immersive head-mounted virtual reality in older adults. Sci. Rep. 9(1) (2019). https://doi.org/10.1038/s41598-019-41200-6
11. Heeter, C.: Being there: the subjective experience of presence. Presence Teleoper. Virtual Environ. 1(2), 262–271 (1992). https://doi.org/10.1162/pres.1992.1.2.262
12. Cruz, A., Paredes, H., Fonseca, B., Morgado, L., Martins, P.: Can presence improve collaboration in 3D virtual worlds? Procedia Technol. 13, 47–55 (2014). https://doi.org/10.1016/j.protcy.2014.02.008
13. Checa, D., Bustillo, A.: A review of immersive virtual reality serious games to enhance learning and training. Multimed. Tools Appl. 79(9), 5501–5527 (2020)
14. Wu, H., Ashmead, D.H., Adams, H., Bodenheimer, B.: Using virtual reality to assess the street crossing behavior of pedestrians with simulated macular degeneration at a roundabout. Front. ICT 5 (2018). https://doi.org/10.3389/fict.2018.00027
15. Gutiérrez-Maldonado, J., Ferrer-García, M., Caqueo-Urízar, A., Letosa-Porta, A.: Assessment of emotional reactivity produced by exposure to virtual environments in patients with eating disorders. Cyberpsychol. Behav. 9(5), 507–513 (2006). https://doi.org/10.1089/cpb.2006.9.507
16. Walkom, G.: Virtual reality exposure therapy: to benefit those who stutter and treat social anxiety. In: 2016 International Conference on Interactive Technologies and Games (ITAG), Oct 2016, pp. 36–41. https://doi.org/10.1109/iTAG.2016.13
17. Bailenson, J.N., Yee, N.: Virtual interpersonal touch: haptic interaction and copresence in collaborative virtual environments. Multimed. Tools Appl. 37(1), 5–14 (2008). https://doi.org/10.1007/s11042-007-0171-2
18. Deutsch, J., Mirelman, A.: Virtual reality-based approaches to enable walking for people poststroke. Top. Stroke Rehabil. 14(6), 45–53 (2007)
19. Kingsley, L.J., et al.: Development of a virtual reality platform for effective communication of structural data in drug discovery. J. Mol. Graph. Model. 89, 234–241 (2019). https://doi.org/10.1016/j.jmgm.2019.03.010
20. Hill, R.W., Jr., Gratch, J., Marsella, S., Rickel, J., Swartout, W.R., Traum, D.R.: Virtual humans in the mission rehearsal exercise system. KI 17(4), 5 (2003)
21. Barić, D., Havârneanu, G.M., Măirean, C.: Attitudes of learner drivers toward safety at level crossings: do they change after a 360° video-based educational intervention? Transp. Res. Part F Traffic Psychol. Behav. 69 (2020). Accessed 03 Nov 2020. Available: https://trid.trb.org/view/1688921
22. Blascovich, J., Loomis, J., Beall, A.C., Swinth, K.R., Hoyt, C.L., Bailenson, J.N.: Immersive virtual environment technology as a methodological tool for social psychology. Psychol. Inq. 13(2), 103–124 (2002)
23. Bainbridge, W.S.: The scientific research potential of virtual worlds. Science 317(5837), 472–476 (2007)
24. Concannon, B.J., Esmail, S., Roduta Roberts, M.: Head-mounted display virtual reality in post-secondary education and skill training. Front. Educ. 4 (2019). https://doi.org/10.3389/feduc.2019.00080
25. Vafadar, M.: Virtual reality: opportunities and challenges. Int. J. Mod. Eng. Res. (IJMER) 3(2), 1139–1145 (2013)
26. Brooke, J.: System usability scale (SUS): a quick-and-dirty method of system evaluation user information. Digital Equipment Co Ltd, Reading, UK, vol. 43 (1986)
27. Slater, M., Steed, A., McCarthy, J., Maringelli, F.: The influence of body movement on subjective presence in virtual environments. Hum. Factors 40(3), 469–477 (1998). https://doi.org/10.1518/001872098779591368
28. Usoh, M., et al.: Walking > walking-in-place > flying, in virtual environments. In: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, USA, July 1999, pp. 359–364. https://doi.org/10.1145/311535.311589
29. Usoh, M., Catena, E., Arman, S., Slater, M.: Using presence questionnaires in reality. Presence Teleoper. Virtual Environ. 9(5), 497–503 (2000). https://doi.org/10.1162/105474600566989
30. Asghar, I., Egaji, O.A., Dando, L., Griffiths, M., Jenkins, P.: A virtual reality based gas assessment application for training gas engineers. In: Proceedings of the 9th International Conference on Information Communication and Management, Prague, Czech Republic, Aug 2019, pp. 57–61. https://doi.org/10.1145/3357419.3357443
31. Sauro, J.: SUStisfied? Little-Known System Usability Scale Facts, 22 Sept 2020. https://uxpamagazine.org/sustified/
32. Bangor, A., Kortum, P., Miller, J.: Determining what individual SUS scores mean: adding an adjective rating scale. J. Usability Stud. 4(3), 114–123 (2009)
33. Bangor, A., Kortum, P.T., Miller, J.T.: An empirical evaluation of the system usability scale. Int. J. Hum.-Comput. Interact. 24(6), 574–594 (2008). https://doi.org/10.1080/10447310802205776

GreenMile—Gamification-Supported Mobile and Multimodal Route Planning for a Sustainable Choice of Transport

Robin Horst, Timon Fuß, and Ralf Dörner

Abstract The consideration of sustainability within passenger transportation is essential for reducing the CO2 emission of this part of our daily life. In this paper, we explore the use of gamification within route planning processes on mobile devices to inform users about environmental impacts. We investigate how an app can help users make informed decisions about which routes to use for their journey, considering the sustainability of the utilized transportation vehicles. We propose a gamified route planning app—GreenMile—and discuss its underlying concepts. We evaluate the concepts within a user study. The results show that our participants were initially motivated in particular by collaborative leaderboards that visualize the efforts of an individual group of users. Furthermore, it could be shown that, after using the app, users showed a willingness to consider sustainability more strongly in their future route planning. Keywords Smart mobility · Gamification · Sustainable transportation

R. Horst (B) · T. Fuß · R. Dörner
RheinMain University of Applied Sciences, Wiesbaden, Germany
e-mail: [email protected]
T. Fuß e-mail: [email protected]
R. Dörner e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_17

1 Introduction

Sustainability is usually mentioned in connection with climate and environmental protection and indicates that products and projects should aim to protect our environment. Sustainable development should satisfy present needs without compromising the conditions of future generations, and resources should be used without overusing them, i.e., only to the extent that they can grow again naturally [1]. In such efforts, the societal transition from utilizing transportation with high CO2 emission (e.g., car or plane) toward public transportation (e.g., bus or train) is seen as an important objective (e.g., [2]).


Overall, there is high interest in how the transportation sector can adjust to reach certain climate goals by reducing CO2 emission (e.g., [3, 4]). Often, such efforts rely on technical innovations. However, sustainability within the transport sector can also be influenced by people's decisions regarding their choice of transportation: people using less CO2-emissive transportation could reduce the overall emission. Different aspects can influence such decisions [5], for example, costs, convenience, and travel duration. A common way of planning a journey is to use apps on mobile devices as companions of our everyday life. This work explores gamification methodology [6] within the route planning process of passengers on their smartphones. Different work (e.g., [6, 7]) suggests that game elements in serious contexts can positively influence the motivation and commitment of people; specifically, people's intrinsic motivation can be increased, for example, for educational purposes [8]. In this paper, we make the following contributions:

• We propose a concept for a gamified, multimodal, and mobile route planner app that helps its users make informed decisions on how certain routes, and the transportation they incorporate, impact CO2 emission. The concept is implemented in a resulting iOS app—GreenMile. Within this app, we explore suitable gamification elements and how they could foster sustainable decision-making.

• We evaluate our app and its underlying concepts within a user study, including long-term application over several days. Based on the study results, we derive recommendations on how to use gamification within route planning to support sustainability-aware decisions concerning CO2 emission.

This paper is organized as follows. The next section discusses related work on gamification, route planning, and sustainability. Section 3 presents GreenMile. Section 4 describes the evaluation of our concepts and presents lessons learned. The last section concludes our work and gives an outlook on future work.

2 Related Work

Gamification describes the use of game elements within a non-game context [6]. While people play games consciously, this need not, or should not, be the case with gamification. In contrast to related areas such as serious games, the game elements and the game experience are not at the center of a gamified application—the serious intent prevails. There already exist various game elements suitable for gamification purposes (e.g., [9, 10]); however, their actual application within a particular domain, such as sustainable route planning, must be examined individually. Among the established elements are leaderboards, point scoring, badges, quests, narrative, levels, and onboarding.


Concerning route planning and the choice of transportation, the term multimodality generally refers to the possibility of a person using different modes of transport [11]. Consequently, multimodality can be seen as a variation or combination of different transport vehicles: a destination can be reached using different concatenations of transportation, each variation possibly differing in its amount of CO2 emission. Recent studies explore gamification within the transport area (e.g., [12, 13]). Yen, Mulley, and Burke [12] conclude that gamification for sustainable transportation planning should be explored, applied, and evaluated in practical studies to investigate the suitability of particular game elements and derive best practices. They conclude that 'gamification in the transport context is a developing method which can be used to encourage participant engagement and enjoyment' [12] and that 'gamification design analysis and effect evaluation are also critical' [12]. Finally, they conclude that future work should focus on low-cost trials to help identify which game elements can motivate passengers and what gamification might achieve for this purpose. Kazhamiakin et al. [13] explore sustainable urban mobility and the potential of gamification to create incentives for people to voluntarily decide in favor of more sustainable transportation solutions. They conduct a case study within a smart city utilizing a gamified route planner, describe their technical application framework, and conclude that gamification could incentivize users to use alternatives to their private cars for transportation. However, the proposed gamification also offered the participants a chance to win a real prize in the form of a one-month free pass for a local bike-sharing service. It is thus of high interest whether gamification without such extrinsic motivators, relying only on intrinsic motivators, is sufficiently rewarding for players to voluntarily change transport behavior toward sustainable solutions. Overall, related work indicates that sustainable decisions with regard to the choice of transport can benefit from gamification. Furthermore, it reveals opportunities that future work should explore and build on. In particular, recent work highlights the necessity of conducting practical trials to corroborate existing theoretical and technical concepts and draw conclusions with actual users. Low-level gamification elements must be explored to give advice on which game elements can inform users, and how, and motivate them to make sustainable decisions and contribute to lower CO2 emission within our daily routine.

3 GreenMile

We investigate different game elements within GreenMile. In particular, we focus on onboarding, narrative, leaderboards, levels, and achievements. Figure 1a illustrates the onboarding in GreenMile, which is conducted via four simple screens the first time GreenMile is started on a device. The first screen describes that GreenMile is a multimodal route planner. The second screen introduces the collaborative and competitive leaderboards that we describe later. The third introduces the user to a simple narrative we designed for GreenMile.

194

R. Horst et al.

introduced the user to a simple narrative we designed for GreenMile. The user is assigned to take care of a plant avatar. It grows and flourishes during the usage of GreenMile, the more CO2 emission was avoided by the user. This narrative is utilized to engage the user and give the sustainable decisions a direct meaning. The avatar represents this meaningfulness visually. Finally, the last screen provides a motivational message about that sustainable daily travels can have a direct environmental impact to engage philanthropist player types [14]. After the app’s onboarding, users can access functionalities to fulfill the main purpose of a multimodal route planner: planning routes from a start to one over several target destinations. Besides common functionalities illustrated in Fig. 1b left, we provide educative elements within the planning process. We provide different route alternatives for a query. Particularly, we provide users with the fastest, the cheapest, and the route with the least CO2 emission to select from (Fig. 1b middle). When a specific route was chosen and the destination was reached, a suitable narrative notice is displayed for the users (e.g., Fig. 1b right). For example, if the sustainable route was chosen, then the user is informed about how much kg CO2 was saved compared to the cheapest and fastest route. Otherwise, it is displayed how many trees should be planted to compensate CO2 overhead. This information is embedded in the narrative by displaying the impact on the players’ plant avatar (e.g., sustainable route: the plant is watered; other route: the plant dries out a bit). Additional to sidelined gamification methods such as the onboarding and the narrative, we considered more explicit gamification, apart from the original intention of a route planner app. First, we consider leaderboards a suitable game element. The leaderboards are divided into two categories, individual and group boards (Fig. 1c left). Both consist of two parts, the list itself and the users’ avatar. The list displays the saved CO2 emission in bars and gives a matching value. A users rank is determined in comparison with other users. However, some users may not travel as much as others and thus cannot save as much CO2 emission as others. If the user’s ranking is not among the first places, an individual leaderboard is created displaying both the three placements that were able to collect more and fewer points than our user. This allows a direct comparison with users who have similar travel behavior and ensures that users can achieve feasible progress. The same principle was applied to the group boards. To participate in a group, a user must first create it and add other users to it. Afterward, points can be collected together. This is intended to increase the feeling of belonging to a group and thus provide internal motivational incentives. Mobilespecific methods such as push notifications can be used additionally to increase the presence of GreenMile, for example, when other people in a user’s group contributed to the score. The second game element we considered for GreenMile is using levels (Fig. 1c middle). Levels could be graphically represented as a path that the user follows and progresses by completing either specific tasks/quests or gathering points as described for the leaderboards. For example, a first quest might be ‘Start your first sustainable journey,’ which is relatively easy to complete. Once a level is completed, access to the next, more difficult task is released. 
This process can increase curiosity and the desire to achieve goals or conquer something. It can trigger the striving for competence


Fig. 1 Different aspects of GreenMile


and mastery of a game as one important motivator. However, the user can also be deterred, since individual quests might become too difficult: for example, a business-related frequent flyer might consider the task 'Use the sustainable route ten times in a row' difficult. The last element we considered in this paper is achievements. Compared to levels, users do not have to follow any sequence when using achievements: as soon as a task is completed, the achievement, i.e., its related badge, is earned. In addition, the user's progress is recorded for each task, which can serve as a guideline for the user. To provide long-term motivation, a level-up mechanic can be used, so that badges can be earned multiple times and leveled up. Achievements can also offer a social component, for example, by forming opinions through the deeds of others or through comparison with others. Finally, we implemented GreenMile within a prototype for iOS.
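To summarize the planner logic of this section in code form, the sketch below selects the fastest, cheapest and least-emitting alternatives for a query and derives the CO2 saving reported to the user. GreenMile itself is an iOS app, so this is only an illustration; the route data and field names are hypothetical.

```python
# Illustrative sketch of the route-alternative selection described above;
# the route data and field names are hypothetical.
routes = [
    {"label": "car",        "minutes": 25, "euro": 6.40, "co2_kg": 3.1},
    {"label": "bus + walk", "minutes": 41, "euro": 2.90, "co2_kg": 0.9},
    {"label": "train",      "minutes": 33, "euro": 4.10, "co2_kg": 0.6},
]

fastest = min(routes, key=lambda r: r["minutes"])
cheapest = min(routes, key=lambda r: r["euro"])
greenest = min(routes, key=lambda r: r["co2_kg"])

# CO2 saved by the sustainable choice relative to the worse of the other two.
saved = max(fastest["co2_kg"], cheapest["co2_kg"]) - greenest["co2_kg"]
print(f"{greenest['label']} saves up to {saved:.1f} kg CO2")  # train saves up to 2.5 kg CO2
```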

4 Evaluation

In a user study, we evaluated our sustainable and gamified route planner concepts. It involved a homogeneous group of 15 participants between 16 and 54 years of age (Ø 28). A think-aloud test was conducted within a moderated remote study. We provided our participants with a GreenMile prototype including all base functionalities, onboarding, narrative, and leaderboards, and then introduced them to four tasks they had to solve within a given scenario, ensuring they used all features of GreenMile during the study. After completing the tasks, we asked our participants to fill out a questionnaire whose questions utilized a 7-point semantic differential or Likert scale. In addition to custom questions about gamification and its relation to a sustainable transportation choice, the abbreviated version of the standardized AttrakDiff questionnaire [15] was used to draw conclusions about established software evaluation criteria such as hedonic and pragmatic qualities (e.g., usability). In the following, we address these custom questions Q1–Q6:

(Q1) How likely is it that a direct comparison with other users will motivate you to use sustainable means of transport in the future?
(Q2) Do you find collecting points together in a group motivating?
(Q3) Would collecting points in a group motivate you to choose sustainable transport?
(Q4) Would you continue to consider emissions if you used the app for a longer period of time?
(Q5) Would you prefer a sustainable route to the cheapest or fastest route in the future?
(Q6) Would leaderboards motivate you to save emissions in the long term?

Finally, five participants also agreed to undergo an extended long-term study and use GreenMile for at least three days and up to one week. We analyzed the outcome of the AttrakDiff questionnaire using its portfolio presentation [15], illustrated in Fig. 2 left. It shows that the tool was assessed with slightly higher pragmatic than hedonic quality. The confidence rectangle (light blue) indicates that the confidence of the pragmatic value was also higher than that of the hedonic one. The descriptive statistics of the AttrakDiff results place GreenMile within the lower-left corner of the 'desired' area. This indicates that our participants assessed

Fig. 2 Descriptive statistics of the user study results. Left: Portfolio presentation of the AttrakDiff outcome spanned by pragmatic and hedonic qualities [15]. Right: Q1-Q6 for regular user group and long-term group. High values represent negative answers


This indicates that our participants assessed the usability of our tool as balanced and usable. However, both qualities could be further improved. Further data from the AttrakDiff's single items and the qualitative analysis (think-aloud, oral, and written free-form comments) support that specifically the clarity of the application's UI should be improved, for example, by making it less complex and more intuitive.

Figure 2 right represents the value distributions of Q1–Q6. The box-whisker plots show that all mean values lie below 4 (the neutral value of the scale), except for Q1, Q3, and Q6 for the long-term group. This indicates that the overall gamification was perceived well. It also shows that the regular group tended towards much better ratings than the long-term group. Particularly, the differences in Q1, Q3, and Q6 between the groups show that the considered collaborative gamification aspects might be less effective after initial usage. Interestingly, however, Q4 was answered more positively by the long-term group than by the regular one. This indicates that GreenMile could motivate our participants initially in making informed decisions, but it also has the potential to make a longer-lasting impact on the overall decision process by educating the users during route planning.

Overall, we derive the recommendation that gamification within mobile route planning apps can help motivate users initially to engage with sustainability aspects, particularly cooperative gamification aspects within individual groups. However, the proposed game elements were only partially suitable in terms of long-term motivation. In our case, it might be valuable to consider the game elements as part of the onboarding of users, or to include them from time to time instead of making them a main aspect of the route planning app.

5 Conclusion and Future Work

In this paper, we have explored the use and impact of gamification for fostering sustainable decisions when planning transportation within a mobile route planning app. We introduced GreenMile and discussed different game elements and how they could be implemented in a route planner. We derived recommendations based on the results of a user study. In future work, we will build upon our insights and specifically investigate how gamification concepts can be included in route planning processes to motivate long-term usage. Such gamified long-term motivators could complement GreenMile and further increase its contribution toward sustainable transportation.

Acknowledgements The work is supported by the Federal Ministry of Education and Research of Germany in the project Innovative Hochschule (funding number: 03IHS071).


References
1. Brundtland, G.H., Khalid, M., Agnelli, S., Al-Athel, S., Chidzero, B.: Our Common Future. New York (1987)
2. Hickman, R., Hall, P., Banister, D.: Planning more for sustainable mobility. J. Transp. Geogr. 33, 210–219 (2013)
3. Sperling, D., Gordon, D.: Two Billion Cars: Driving Toward Sustainability. Oxford University Press, Oxford (2010)
4. Gilbert, R., Perl, A.: Transport Revolutions: Moving People and Freight Without Oil. New Society Publishers (2010)
5. Gardner, B., Abraham, C.: Going green? Modeling the impact of environmental concerns and perceptions of transportation alternatives on decisions to drive. J. Appl. Social Psychol. 40(4), 831–849 (2010)
6. Deterding, S., Dixon, D., Khaled, R., Nacke, L.: From game design elements to gamefulness: defining "gamification". In: Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments, pp. 9–15 (2011)
7. Sailer, M., Hense, J., Mandl, J., Klevers, M.: Psychological perspectives on motivation through gamification. Inter. Design Architect. J. 19, 28–37 (2014)
8. Banfield, J., Wilkerson, B.: Increasing student intrinsic motivation and self-efficacy through gamification pedagogy. Contemp. Issues Educ. Res. (CIER) 7(4), 291–298 (2014)
9. Kumar, J.: Gamification at work: designing engaging business software. In: International Conference of Design, User Experience, and Usability, pp. 528–537. Springer, Berlin (2013)
10. Zichermann, G., Cunningham, C.: Gamification by Design: Implementing Game Mechanics in Web and Mobile Apps. O'Reilly Media, Inc. (2011)
11. Kuhnimhof, T., Chlond, B., Von Der Ruhren, S.: Users of transport modes and multimodal travel behavior: steps toward understanding travelers' options and choices. Transp. Res. Record 1985(1), 40–48 (2006)
12. Yen, B.T., Mulley, C., Burke, M.: Gamification in transport interventions: another way to improve travel behavioural change. Cities 85, 140–149 (2019)
13. Kazhamiakin, R., Marconi, A., Perillo, M., Pistore, M., Valetto, G., Piras, L., et al.: Using gamification to incentivize sustainable urban mobility. In: 2015 IEEE First International Smart Cities Conference (ISC2), pp. 1–6. IEEE (2015)
14. Marczewski, A.: Even Ninja Monkeys Like to Play. Blurb Inc., London (2015)
15. Hassenzahl, M., Burmester, M., Koller, F.: AttrakDiff: Ein Fragebogen zur Messung wahrgenommener hedonischer und pragmatischer Qualität [AttrakDiff: a questionnaire for measuring perceived hedonic and pragmatic quality]. In: Mensch & Computer 2003, pp. 187–196. Vieweg Teubner Verlag (2003)

A Detailed Study for Bankruptcy Prediction by Machine Learning Technique Suriya Begum

Abstract Predicting bankruptcy is important for any company when making financial decisions. Using a company's data and machine learning techniques, we can predict whether the company will go bankrupt. For study and prediction, the data is obtained from the Taiwan Economic Journal for a period of ten years (1999–2009). The data has 6819 records with 96 features. This paper applies four classification algorithms (random forest classifier, XGBoost classifier, logistic regression and artificial neural network) with the K-fold cross-validation technique. The random forest classifier outperforms the other algorithms with an accuracy score of 96.53%. The artificial neural network shows the second highest accuracy in this study, at 96.12%. Keywords Bankruptcy · Logistic regression · Random forest classifier · Artificial neural network · XGBoost · Taiwan

1 Introduction

Bankruptcy is a legal process in which a person or company is declared unable to pay its debts. To avoid financial losses, it is critical to determine the danger of bankruptcy early on. Different soft computing techniques can be used to determine insolvency in this context [1]. Since the 1960s, the occurrence of significant bankruptcy cases has sparked an increase in interest in corporate bankruptcy prediction models [2]. Bankruptcy, financial distress and, as a result, business collapse are usually very expensive and disruptive events for any firm or organisation. Financial distress models use statistical projections to try to predict whether a company will go bankrupt in the future [3].

Bankruptcy prediction has long been a topic of interest in finance and management science, attracting both academics' and practitioners' attention. From the first studies of financial accounts, it has progressed to applying machine learning or deep learning algorithms to accomplish the forecast, thanks to the rapid advancement of modern computer technology [4].


Practitioners and academics have conducted extensive research on models for predicting bankruptcy and default situations in the context of credit risk management. Traditional statistical techniques (e.g. discriminant analysis and logistic regression) and earlier artificial intelligence models (e.g. artificial neural networks) have been used to analyse bankruptcy in fundamental academic research [5]. Predicting bankruptcy and credit reporting have long been considered important topics that have been extensively researched in the accounting and finance literature. These financial decision-making difficulties were addressed using artificial intelligence and machine learning approaches [6].

For all economic stakeholders, bankruptcy prediction is extremely useful in the accounting and finance fields. The task of accurately predicting business failure, particularly in financial crisis conditions, is well acknowledged to be difficult [7]. For financial institutions to verify the creditworthiness of companies or management, the bankruptcy prediction model (BPM) is critical. Failure to precisely foresee bankruptcy can have devastating socioeconomic effects. As a result, it is critical to provide financial decision-makers with accurate bankruptcy forecasting in order to avoid these loss scenarios [8].

Applying fresh data mining approaches to assessing firm financial distress has recently gained a lot of scholarly attention. The support vector machine (SVM) and back-propagation neural (BPN) networks have been effectively used in a variety of applications with good generalisation outcomes, including rule extraction, classification and assessment [9]. Predictive models for bankruptcy have sparked a lot of curiosity. Traditional statistical analysis has dominated academic research, although interest in machine learning methods is expanding [10]. Studies on bankruptcy prediction have shown greater accuracy with enhanced machine learning models [11]. Machine learning, which has been applied in the field of financial hardship warning, has classification learning as a major issue [12]. Machine learning for predicting financial distress is becoming increasingly popular, with more academics contributing to the field. Despite the large amount of study, the domain does not appear to be synchronised, and there is still a lot of uncertainty about the best strategy to employ and which facts to use [13]. A primary modelling approach for bankruptcy prediction is artificial neural network (ANN) modelling [14]. Despite the fact that there have been numerous successful researches on bankruptcy identification, probabilistic approaches are rarely used [15].

The remainder of this paper is organised as follows. A brief background on bankruptcy research is given in Sect. 2. The proposed model is explained in Sect. 3. Classifier algorithms are discussed in Sect. 4, followed by the methodology used in Sect. 5. In Sect. 6, the evaluation and assessment of the proposed model are discussed. Section 7 shows the results of the experiments, and finally, Sect. 8 concludes the paper.


2 Background

Yi Qu et al. reviewed machine learning and deep learning techniques used to forecast bankruptcy [4]. Kalyan Nagaraj et al. present a bankruptcy prediction method that categorises businesses according to their risk levels; the predictive model serves as a decision-making tool for the prediction of bankruptcy [1]. Giovanni Cialone combined deep and convolutional neural networks in his research. The accuracy, sensitivity and AUC of the findings were compared for predictive performance on a testing set. The results demonstrate that the variable selection was quite effective, with all the models exhibiting quite good performance [10].

Sen Zweng et al. and Shin et al. demonstrated that a neural network with extra layers and dropouts has the maximum accuracy among the compared approaches (support vector machine, neural network with dropouts and autoencoder). In addition, when compared to previous methods (logistic regression, genetic algorithm and inductive learning), the new method is more accurate [12, 16]. Pragya Patel et al. proposed a new hybrid naive Bayes classifier for bankruptcy dataset categorisation; in comparison with two other naive Bayes and Bayes net classifiers, the proposed classifier achieved the highest accuracy of more than 92%. The principle of log probability was employed in the hybrid naive Bayes classifier [3]. Sarojini Devi et al. compare and contrast the various strategies employed, according to their respective strengths and drawbacks [8]. Chih-Fong Tsai et al. applied neural network ensembles to three datasets and showed that, in terms of average prediction accuracy, the multiple classifiers outperform the single-classifier benchmark in only one of the three datasets [6].

Flavio Barboza et al. and Zhang et al. evaluated the performance of machine learning techniques (support vector machines, bagging, boosting and random forest) and compared the outcomes with discriminant analysis, logistic regression and neural networks for bankruptcy prediction one year ahead of time [5, 17]. Ming-Chang Lee and Chang developed a model for evaluating firm economic difficulties based on SVM with a Gaussian RBF kernel. The comparative results reveal that, while the difference in performance metrics is minor, SVM provides greater precision and accuracy with lower error rates [9]. Richard P. Hauser et al. compare the classification and prediction of bankrupt enterprises using robust logistic regression with the Bianco and Yohai (BY) estimator versus maximum likelihood (ML) logistic regression, using a three-fold cross-validation methodology [18]. Feng Mai et al. studied the efficacy of two different deep learning architectures, found that simpler models like average embedding outperform convolutional neural networks, and argued that their study is the first large-scale proof of the predictive potential of textual disclosures [19]. Francisco Antunes et al. developed a probabilistic approach to bankruptcy prediction using Gaussian processes (GP), compared it to support vector machines (SVM) and logistic regression (LR), and stated that GP can effectively increase bankruptcy prediction performance in terms of accuracy, in addition to providing a stochastic interpretation [7, 15]. Philippe du Jardin demonstrates that


a neural network-based strategy with a set of parameters chosen according to a metric of network adaptation outperforms a set chosen according to criteria utilised in the economic literature, and also illustrates that the way a group of variables represents the financial portfolios of healthy organisations can help reduce Type I errors [20]. Shin demonstrates that the classifier designed with the SVM technique outperforms BPN in predicting corporate insolvency. As the training set size shrinks, the results show that SVM works better than BPN in terms of accuracy and generalisation; the study also looks at the impact of performance variability with respect to different model parameters in SVM [21]. Odom and Sharda show that a modest neurogenetic technique yielded a meaningful performance; the results of the research demonstrate that a neurogenetic methodology for bankruptcy prediction is feasible [14]. Adnan Aziz et al. offer a novel ranking methodology, the first of its kind, to address the challenge of model selection in empirical bankruptcy prediction applications [2, 22]. Claudiu Clement discovered that no model works best on every type of data, and the domain is still far from determining what works best. This document helps keep academics and practitioners up to date on the current state of the domain, as well as on technologies that have recently been utilised for predicting financial distress and their performance [13].

3 Proposed Model

3.1 Dataset Characteristics

In this paper, the dataset used is taken from Kaggle. Table 1 shows the features taken for final modelling after dropping features based on the correlation coefficient and feature importance. Finally, I have taken 6819 records and 22 features for the experiment. Figure 1 shows the correlation matrix of the selected 22 features using feature importance.

The correlation coefficient is computed from the covariance and the standard deviations of two variables; each observation is first standardised as shown in Eq. (1):

$z_{y_i} = (y_i - \bar{y})/s_y$  (1)

Variance is defined as the sum of the squared distances of each term in the distribution from the mean (μ), divided by the number of terms in the distribution (N), as shown in Eq. (2):

$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$  (2)
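As a brief illustration (not the author's code), Eqs. (1)–(3) map directly onto pandas/numpy operations; the toy series below are assumptions for demonstration only:

```python
import numpy as np
import pandas as pd

y = pd.Series([0.42, 0.51, 0.38, 0.47])   # toy feature column (illustrative)

z = (y - y.mean()) / y.std()               # Eq. (1): standardised scores
var = ((y - y.mean()) ** 2).mean()         # Eq. (2): population variance
std = np.sqrt(var)                         # Eq. (3): standard deviation
# Note: pandas' y.std() uses the sample deviation (ddof=1), which differs
# slightly from the population form in Eq. (2).

x = pd.Series([0.30, 0.44, 0.29, 0.41])    # second toy feature
r = y.corr(x)                              # Pearson correlation coefficient
```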


Table 1 Selected features and description

S. No. | Attribute | Description of attribute | Distinct attribute values
1 | 'ROA (C) before interest and depreciation before interest' | Profitability ratio showing how much profit a company is able to generate from its assets | Several values from 0.0 to 1.0
2 | 'Operating gross margin' | The ratio of profit to net sales of a company | Several values from 0.5965 to 0.6231
3 | 'Interest-bearing debt interest rate' | The total amount of outstanding indebtedness of the company for borrowed money | Several values from 0.0 to 990,000,000.0
4 | 'Tax rate (A)' | The percentage at which a company is taxed annually | Several values from 0.0 to 1.0
5 | 'Operating profit growth rate' | Measure of the company's profitability that tells how much revenue will eventually become earnings | Several values from 0.0 to 1.0
6 | 'Total asset growth rate' | The rate at which an asset increases or decreases in value over time | Several values from 0.0 to 990,000,000.0
7 | 'Cash reinvestment %' | Estimates the amount of cash flow that management reinvests in the business | Several values from 0.0 to 1.0
8 | 'Borrowing dependency' | The cost of interest-bearing debt | Several values from 0.0 to 1.0
9 | 'Total asset turnover' | The ratio of total sales or revenue to average assets | Several values from 0.0 to 1.0
10 | 'Fixed assets turnover frequency' | Efficiency ratio indicating how efficiently a business uses fixed assets to generate sales | Several values from 0.0 to 9,990,000,000.0
11 | 'Working capital to total assets' | Compares the net liquid assets to the total assets of the company | Several values from 0.0 to 1.0
12 | 'Current liability/liability' | Assesses the proportion of total liabilities that are due in the near term | Several values from 0.0 to 1.0
13 | 'Average collection days' | The average number of days between the date a credit sale is made and the date the purchaser pays for that sale | Several values from 0.002645 to 0.01175
14 | 'Total expense/assets' | The measure of the total cost of a fund to the investor | Several values from 0.0 to 1.0
15 | 'Cash turnover rate' | Efficiency ratio showing the number of times cash is turned over in an accounting period | Several values from 0.0 to 10,000,000,000.0
16 | 'Fixed assets to assets' | Ratio found by dividing the total fixed assets of a company by its long-term funds | Several values from 0.0 to 8,320,000,000.0
17 | 'Cash flow to total assets' | Efficiency ratio that relates cash flows to the company's assets without being affected by income recognition or income measurement | Several values from 0.0 to 1.0
18 | 'No-credit interval' | Financial metric indicating the number of days a company can operate without needing to access non-current assets (long-term assets whose full value cannot be obtained within the current accounting year) | Several values from 0.0 to 1.0
19 | 'INTEREST coverage ratio (interest expense to EBIT)' | Debt and profitability ratio used to determine how easily a company can pay interest on its outstanding debt | Several values from 0.0 to 1.0
20 | 'Operating expense rate' | Measures the cost to operate a piece of property compared to the income the property brings in | Several values from 0.0 to 9,990,000,000.0
21 | 'Research and development expense rate' | The ratio of research and development expense for an industrial company | Several values from 0.0 to 9,980,000,000.0
22 | 'Bankrupt?' | The dataset target column with binary classes; 0 predicts the company will not go bankrupt and 1 predicts it will | 0 and 1


Fig. 1 Correlation matrix of the selected 22 features using feature importance

Standard deviation is defined as the square root of the variance, as shown in Eq. (3):

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$  (3)

To accomplish the objective of enhancing the model calculation, decreasing the learning difficulty and ranking the important features, the random forest algorithm is used.
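A minimal sketch of how this ranking could be done with scikit-learn; the file name, column names and the retained feature count follow the paper's description but are otherwise illustrative assumptions, not the study's actual pipeline:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("bankruptcy.csv")           # hypothetical file name
X, y = df.drop(columns=["Bankrupt?"]), df["Bankrupt?"]

# Fit a random forest and rank features by impurity-based importance
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=X.columns)

top22 = importances.nlargest(22).index       # keep the 22 top-ranked features
X_selected = X[top22]
```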

3.2 Feature Engineering

Feature Abstraction: All categorical features are encoded with the label encoding method.
Visualisation: The counts of the non-bankrupt and bankrupt classes in the target variable are shown in Fig. 2.

4 Classification

Random Forest Classifier (RF)
Random forest algorithms are widely used for regression and classification. The algorithm builds an ensemble of decision trees and makes predictions on that basis. The random forest algorithm can be used on large datasets, and missing values are also taken care of by this classifier. The samples created from the decision trees can be saved so that they can be reused for other data.


Fig. 2 Count of target variable

The two main steps in the creation of random forests are random forest construction and then prediction with the random forest classifier created in the first step.

XGBoost Classifier
These days, it is one of the most popular algorithms for machine learning. It is well known to provide better solutions than other ML algorithms irrespective of the task form (regression or classification). Extreme gradient boosting (XGBoost) is similar to, but more effective than, the gradient boosting system.

Logistic Regression
Mostly used for binary classification problems, it is a classification algorithm. Instead of fitting a straight line or hyperplane, the logistic regression algorithm uses the logistic function to squeeze the output of a linear equation between 0 and 1.

Gini Index
The probability of misclassifying an observation is called the Gini impurity. The value should be lower for better results.

Advantages of RF
• When compared to other algorithms, it is one of the most accurate learning algorithms.
• It is very efficient on large databases.
• Through feature importance, it gives the contribution of each input feature towards the target feature.
• It takes care of missing values and outliers.

Advantages of XGBoost
In certain instances, XGBoost had a prediction error ten times smaller than boosting or random forest.


Advantages of Logistic Regression
• Training a model requires less computational power.
• Trained weights give inference about feature importance.
• New data is updated easily in the models.
• It is less prone to over-fitting.
• It works efficiently even if the dataset has features that are not linearly separable.

Artificial Neural Network (ANN)
The radial basis function is employed in neural network algorithms, which can be utilised for strategic reasons. Though ANNs are inspired by the human brain, they operate on a much lower level. ANN refers to the utilisation of the anatomy of neurons in machine learning. The ANN receives input signals from the outside environment in the form of patterns and images represented as vectors. Each input is then multiplied by the weights assigned to it by the ANN in order to solve a specific problem [23]. In most circumstances, ANN is utilised when something that happened in the past is recreated almost exactly in the same way. With enough training, an ANN will be prepared for any eventuality; as a result, it is a type of machine learning technology with a large memory. However, it does not operate effectively when the scoring population differs greatly from the training sample [24].

5 Methodology

Our approach to solving this problem is to build multiple classification models, choose the model with the highest accuracy, and tune the hyper-parameters of that model to obtain maximum accuracy, as discussed in the previous section. We use the synthetic minority oversampling technique (SMOTE) in our study. It is an oversampling technique that generates synthetic samples of the minority class to solve the class imbalance problem. It is used to obtain a synthetically class-balanced training set, which helps in training the classifier.
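A minimal sketch of how SMOTE can be combined with K-fold cross-validation, assuming the scikit-learn and imbalanced-learn libraries; oversampling only the training folds is a common safeguard assumed here, and the parameter values are illustrative rather than the study's actual settings:

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

def cross_validate_with_smote(X, y, n_splits=5):
    """X, y: numpy arrays. Returns the mean test accuracy over the folds."""
    scores = []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    for train_idx, test_idx in skf.split(X, y):
        # SMOTE is applied only to the training fold, so the test fold
        # keeps the original class imbalance (the paper does not state
        # this detail; it is assumed here as good practice).
        X_res, y_res = SMOTE(random_state=42).fit_resample(X[train_idx], y[train_idx])
        model = RandomForestClassifier(n_estimators=100, random_state=42)
        model.fit(X_res, y_res)
        scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return sum(scores) / len(scores)
```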

6 Model Evaluation and Assessment

This section demonstrates the outcomes obtained through the application of logistic regression, the random forest classifier, the XGBoost classifier and the artificial neural network. Precision, recall, F-measure, accuracy, AUC and ROC are the metrics used to conduct the performance analysis of the algorithms. Precision (Eq. 4) gives the proportion of positive predictions that are correct. Recall (Eq. 5) gives the proportion of actual positives that are correctly identified. The F-measure (Eq. 6) combines precision and recall into a single score.


$\text{Precision} = \frac{TP}{TP + FP}$  (4)

$\text{Recall} = \frac{TP}{TP + FN}$  (5)

$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$  (6)

where:
• TP: the customer is a defaulter, and the test is positive.
• FP: the customer is not a defaulter, but the test is positive.
• TN: the customer is not a defaulter, but the test is negative.
• FN: the customer is a defaulter, but the test is negative.
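Eqs. (4)–(6) correspond directly to standard scikit-learn calls, as the following generic sketch shows (the label vectors are placeholders, not the study's data):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Placeholder labels: y_true are test labels, y_pred are model predictions
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)                                   # Eq. (4)
recall = tp / (tp + fn)                                      # Eq. (5)
f_measure = 2 * precision * recall / (precision + recall)    # Eq. (6)

# The library calls compute the same quantities directly:
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred),
      f1_score(y_true, y_pred))
```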

Receiver operating characteristic (ROC) curve: It shows the performance of a classification algorithm across all classification thresholds. The curve plots two parameters, the TP rate on the Y-axis and the FP rate on the X-axis, as shown in Fig. 3.

Area under the ROC curve (AUC): It provides a cumulative measure of performance across all possible classification thresholds, as shown in Fig. 3. For a single operating point, we can split the area under the curve into two parts, a triangle and a trapezium. The triangle has area (Eq. 7)

$\text{Area}_{\text{triangle}} = \frac{TPR \times FPR}{2}$  (7)

The trapezium has area (Eq. 8)

$\text{Area}_{\text{trapezium}} = \frac{(1 - FPR)(1 + TPR)}{2} = \frac{1}{2} - \frac{FPR}{2} + \frac{TPR}{2} - \frac{TPR \times FPR}{2}$  (8)

The total area is (Eq. 9)

$AUC = \frac{1}{2} - \frac{FPR}{2} + \frac{TPR}{2}$  (9)

Fig. 3 ROC performance curve of the four methods
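For a single operating point, Eqs. (7)–(9) can be verified numerically; the TPR/FPR values below are arbitrary illustrative numbers:

```python
# One illustrative ROC operating point (assumed values)
tpr, fpr = 0.98, 0.40

triangle = tpr * fpr / 2                  # Eq. (7)
trapezium = (1 - fpr) * (1 + tpr) / 2     # Eq. (8)
auc = triangle + trapezium                # Eq. (9): 1/2 - FPR/2 + TPR/2
assert abs(auc - (0.5 - fpr / 2 + tpr / 2)) < 1e-12
print(auc)  # 0.79 for this point
```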

7 Results

Random forest and ANN outperform XGBoost and logistic regression. The accuracy results obtained are shown in Table 3. The prediction models of RF and ANN show precision and recall of 0.98 and 0.40 for the two classes, as shown in Table 2, suggesting that the models have a high generalisation capability. Figure 3 shows the ROC curves. The closer the ROC curve is to the upper left corner, the higher the model's recall rate and the lower the sum of false negatives and false positives. Table 4 shows the confusion matrices.

Table 2 Metrics evaluation of the four algorithms

Algorithm | Precision (class 1) | Precision (class 0) | Recall (class 1) | Recall (class 0) | F1-score (class 1) | F1-score (class 0)
Random forest | 0.98 | 0.40 | 0.98 | 0.37 | 0.98 | 0.38
XGBoost | 0.99 | 0.18 | 0.87 | 0.85 | 0.93 | 0.29
Logistic regression | 0.97 | 0.04 | 0.92 | 0.10 | 0.94 | 0.05
Artificial neural network | 0.98 | 0.40 | 0.98 | 0.40 | 0.98 | 0.40

Table 3 Accuracy of the four algorithms

Algorithm | Training accuracy (%) | Testing accuracy (%)
Random forest classifier | 100 | 96.53
Artificial neural network | 100 | 96.12
XGBoost classifier | 91.62 | 87.39
Logistic regression | 52.00 | 89.26

Table 4 Confusion matrix

S. No. | Algorithm | TN | FP | FN | TP
1 | Logistic regression | 1517 | 136 | 47 | 5
2 | Artificial neural network | 1622 | 31 | 31 | 21
3 | XGBoost classifier | 1446 | 207 | 8 | 44
4 | Random forest classifier | 1624 | 29 | 33 | 19


8 Conclusion

A model for predicting bankruptcy has been built using the random forest algorithm, and the outcomes are compared with three other algorithms, namely XGBoost, artificial neural network and logistic regression. According to the analysis of the experimental results, the random forest and ANN algorithms outperform the other two. As future work, the above four algorithms can be used to test the entire population using an Indian dataset, and I also hope to improve bankruptcy prediction by combining various heuristic optimisation techniques with machine learning using the Apache Mahout tool.

References
1. Nagaraj, K., Sridhar, A.: A predictive system for detection of bankruptcy using machine learning techniques. Int. J. Data Min. Knowl. Manage. Process (IJDKP) 5(1) (2015)
2. Adnan Aziz, M., Dar, H.A.: Predicting corporate bankruptcy: where we stand? Corp. Governance J. 6(1), 18–33 (2006). ISSN 1472-0701. http://doi.org/10.1108/14720700610649436
3. Patel, P., Shrivastava, A., Nagar, S.: Bankruptcy prediction model using Naïve Bayes algorithms. Int. J. Innov. Trends Eng. (IJITE) 59(83), Number 01 (2019). ISSN: 2395-2946
4. Qu, Y., Quan, P., Lie, M., Shi, Y.: Review of bankruptcy prediction using machine learning and deep learning techniques. In: 7th International Conference on Information Technology and Quantitative Management (ITQM 2019). Procedia Comput. Sci. 162, 895–899 (2019)
5. Barboza, F., Kimura, H., Altman, E.: Machine learning models and bankruptcy prediction. Expert Syst. Appl. 83, 405–417 (2017)
6. Tsai, C.-F., Wu, J.-W.: Using neural network ensembles for bankruptcy prediction and credit scoring. Expert Syst. Appl. 34, 2639–2649 (2008)
7. Antunes, F., Ribeiro, B., Pereira, F.: Probabilistic modeling and visualization for bankruptcy prediction. Appl. Soft Comput. 60, 831–843 (2017)
8. Sarojini Devi, S., Radhika, Y.: A survey on machine learning and statistical techniques in bankruptcy prediction. Int. J. Mach. Learn. Comput. 8(2) (2018)
9. Lee, M.-C., To, C.: Comparison of support vector machine and back propagation neural network in evaluating the enterprise financial distress. Int. J. Artif. Intell. Appl. (IJAIA) 1(3) (2010)
10. Cialone, G.: Bankruptcy prediction by deep learning. In: CS230 Winter 2020
11. Wang, N.: Bankruptcy prediction using machine learning. J. Math. Financ. 7, 908–918 (2017). ISSN Online: 2162-2442, ISSN Print: 2162-2434
12. Zweng, S., Li, Y., Yang, W., Li, Y.: A financial distress prediction model based on sparse algorithm and support vector machine. Math. Probl. Eng. 2020, Article ID 5625271
13. Clement, C.: Machine learning in bankruptcy prediction—a review. J. Public Adm. Financ. Law (17), 178–197 (2020)
14. Odom, M., Sharda, R.: A neural networks model for bankruptcy prediction. In: Proceedings of the IEEE International Conference on Neural Networks, vol. 2, pp. 163–168 (1990)
15. Antunes, F., Ribeiro, B., Pereira, F.C.: Probabilistic modeling and visualization for bankruptcy prediction. Appl. Soft Comput. 60, 831–843 (2017). http://doi.org/10.1016/j.asoc.2017.06.043
16. Shin, K.-S., Lee, T.S., Kim, H.: An application of support vector machines in bankruptcy prediction model. Expert Syst. Appl. 28, 127–135 (2005)
17. Zhang, G., Hu, Y.M., Patuwo, E.B., Indro, C.D.: Artificial neural networks in bankruptcy prediction: general framework and cross validation analysis. Eur. J. Oper. Res. 116, 16–32 (1999)


18. Hauser, R.P., Booth, D.: Predicting bankruptcy with robust logistic regression. J. Data Sci. 9, 565–584 (2011)
19. Mai, F., Tian, S., Lee, C., Ma, L.: Deep learning models for bankruptcy prediction using textual disclosures. Eur. J. Oper. Res. 274, 743–758 (2019)
20. du Jardin, P.: Predicting bankruptcy using neural network and other classification methods: the influence of variable selection techniques on model accuracy. Neurocomputing 73(10–12), 2047–2060 (2010)
21. Shin, K.S.: An application of support vector machines in bankruptcy prediction model. Expert Syst. Appl. 28(1), 127–135 (2005). http://doi.org/10.1016/j.eswa.2004.08.009
22. Begum, S.: A study to predict home loan defaulter using machine learning. In: Samaroh 2021—SP Sampathy's Memorial International Conference on Technology, Innovation and Quality Management, 17–18 Apr 2021
23. https://en.wikipedia.org/wiki/Artificial_neural_network
24. https://www.analyticsvidhya.com/blog/2014/10/ann-work-simplified/

Design and Development of Tracking System in Communication for Wireless Networking Kamal Upreti, Vinod Kumar, Dharmendra Pal, Mohammad Shabbir Alam, and A. K. Sharma

Abstract A mobile tracking device is used to monitor the location of vehicles, and in specific cases many functional details, including altitude, cabin temperature and the number of passengers, can be tracked. The tracking procedure is carried out by satellite using GPS, and the data are transmitted to a server via a GSM modem. ZigBee is a simple, low-capacity, low-cost, low-data-rate wireless technology for personal-area and device-to-device networking. A practical monitoring and positioning system for artifacts built upon ZigBee is suggested in this article. In addition, the configuration of the device, the system elements and their features are discussed. The findings of the commercial application indicate that the proposed device is appropriate for monitoring and position detection. Keywords ZigBee · Wireless technology · Global positioning system (GPS) · Radio frequency identification (RFID) · Received signal strength indication (RSSI)



1 Introduction

Advanced computation and sensing network systems such as RFID and GPS provide advanced data collection and connectivity potential for integration and process efficiency enhancement. Radio frequency identification (RFID) system developments have also been implemented in the last few years to identify and monitor materials as a prototype for communication [1]. Research shows that the time taken for RFID tags to download data through a corporation's material monitoring system has decreased, and RFID will serve as a promising technology for receiving materials. Field testing of existing RFID technologies was carried out to determine the technological viability of automated identification and of monitoring individual pipe spools at laydown yards and shipping portals, in response to the supply chain's need to trace identification material [2]. An experimental monitoring test was also performed using active 32 KB, 3.6 V, 915 MHz RFID tags. A concept was created using a PDA to monitor tools in a mobile setting and to store hand tools in portable or truck boxes [3].

When compared with older technologies, e.g., barcodes, RFID provides an innovative material monitoring system, but there are a few drawbacks when it is extended to building techniques. RFID's primary feature is to identify and monitor distributed RFID tags remotely. Since RFID was initially intended to substitute bar code technology, there are relatively few broader uses for wireless tracking and location. A recent RFID Journal survey revealed that the cost of RFID tags ranges, depending on the tag's specifications, from 20 cents to 6 dollars. Most RFID readers, though, cost 2500–3000 dollars, based on the different features of the unit [4]. Thus, for practical usage across a comprehensive site, coverage is prohibitively costly except with active RFID tags, because the reading range is otherwise not sufficient for practical use. While the global positioning system (GPS) can increase the exactness of tag locations when integrated with RFID, a significant amount of material in a standard building project would have to be tracked and monitored by GPS receivers. In addition to high costs, devices dependent only on GPS suffer from multi-path and signal masking effects in high-density regions. Due to this drawback, using a standalone global positioning device in construction-vehicle tracking systems was found to cause significant positioning errors of more than 20 m for over 40% of points and more than 100 m for over 9% [5]. Exact positioning for material detection and surveillance in a crowded setting such as a building site is often inaccurate.

This paper suggests a surveillance and position detection scheme consisting of a base station, mobile stations, a truck, a monitoring facility, and mobile explorers. ZigBee modules communicate with each other in both the base station and the mobile station. The base station provides state information of the mobile stations to the monitoring center through a communications bus [6]. The monitoring center administrator can view the details on the monitors [7]. What makes our device unique is that the communication bus between the base station and the monitoring center can be destroyed when an emergency occurs. In that scenario, we can search for people carrying a mobile station using


a mobile explorer. The ZigBee module is also used for communication between mobile explorers and mobile devices [8].

2 Literature Review

Jang [9]: in this document, a prototype architecture for electronic tracking and monitoring of building materials is presented. RFID and GPS technologies for building materials monitoring were investigated, and signal-strength-based localization is examined. A brief specification of the ZigBeeTM protocol is given as an evolving network standard for industrial applications. To boost positioning precision and cost advantage, a ZigBee device design utilizing hybrid RF and ultrasound techniques is introduced. The researchers also analyzed the viability study and installation scenario to present the potential mechanism for construction implementation.

Shi [10]: with wireless networks and embedded devices, monitoring and position detection applications are increasingly prevalent. ZigBee is a low-capacity, low-cost, low-data-rate, low-complexity wireless technology for personal-area and device-to-device networking. A practical monitoring and positioning device built upon ZigBee for objects and staff is suggested in this article. In addition, the configuration of the device, the system elements, and their features are discussed, and the findings of the commercial application indicate that the proposed device is appropriate for monitoring and position detection.

Amutha [11]: a positioning and monitoring device is critical for our future computing environment. An algorithm based on the human walking model and the blind human walking model is used for precise position knowledge. The authors introduce a reliable localization tracking mechanism using ZigBee and GPS and apply the Markov chain algorithm to improve accuracy. In a known setting on their campus, walking steps of sighted and blind humans were recorded, and the Markov chain algorithm was applied to smooth the step-by-step differences in position changes.

Mutiara [12]: wireless sensing technologies are more and more needed today, including applications for preventing unauthorized deforestation, for forestry conservation, and against increasing illegal log cutting in the forest. The wireless sensor network is an appropriate network architecture for remote device control or tracking.

Mahafzah [13]: wireless and digital networking are producing smaller yet more efficient battery-equipped gadgets which are simple to manage. ZigBee is a home or indoor wireless personal area network (WPAN) which uses low-performance digital radios with low data rates to collect information. In ZigBee's network, the sensor nodes are installed and constantly moving.

You X. [14]: fifth-generation (5G) wireless communication networks are being deployed globally from 2020, and further capabilities, such as massive connectivity, ultra-reliability, and low latency, are in the process of being standardized. However, the


5G will not fulfil all future needs in 2030 and beyond, and sixth-generation (6G) wireless communication networks should offer worldwide coverage, increased energy/spectral/cost efficiency, improved intelligence and security, etc. 6G networks will rely on new enabling technologies, i.e., air interface and transmission systems and new networking architectures, such as waveform design, multiple access schemes, channel-coding schemes, multi-antenna technologies, cell-free networks, network slicing, and cloud/fog/edge computing.

Dener [15]: the usage of tracking technologies began to expand with the recent surge in robberies and abductions. Tracking systems have been significantly enhanced, and new systems have been created according to this requirement. Vehicle tracking and human/object tracking systems are two of the most extensively utilized tracking systems. With such systems, you may monitor your own vehicle or family members and, if needed, a fleet of cars or goods. Vehicle and object tracking systems can locate places with the use of wireless sensor networks and GPS modules, and positions may be shown and monitored on web or mobile platforms. This research presents vehicle and object tracking systems using wireless sensor networks on web and mobile platforms.

3 Methodology

It is challenging to provide the appropriate guidelines for the best communication with GPS and an IMU if two directional antennas in a point-to-point network are operated in an accessible, thoroughly familiar place. It would generally be best if the two antennas pointed at each other. However, this solution is only possible if both sides have an exact GPS and a compass sensor [16]. In addition, since GPS signals are nearly impossible to obtain in indoor conditions, they cannot be used in areas that have the potential to improve the wireless capability of a directional antenna. Moreover, if there are multi-path effects and other wireless disturbances, pointing at each other may be neither the best orientation nor a guarantee of the highest efficiency. Optimizing the work of the two antennas will be insufficient if only knowledge about their actual directions and locations is shared [17]. In addition, in the case of multi-path effects or other wireless interference, the optimal directional antenna orientation is difficult to estimate or quantify without sufficient data on the impact.

This paper therefore suggests an active antenna tracking method with DOA estimation for directional antenna self-orientation. The proposed scheme requires two directional antennas installed on each side on a pan-tilt servo system, i.e., a total of four antennas, as seen in Fig. 1, for constructing a point-to-point network [18]. Figure 1 shows the layout of this device, with the robot on the left and the command center on the right. The top antenna on each side is the actual data-transmission antenna, and these antennas are paired. The lower antenna is for the DOA estimate of the opposite side. The optimal orientation of the top antenna can be determined by rotating the bottom antenna, taking measurements with RSSI (received signal strength indication), and locating the direction with the highest RSSI. The direction of each top antenna is then regularly adjusted after each rotation using the lower antennas [19]. This active antenna tracking device works independently on each side, so that the best antenna orientation can be adjusted and maintained over time. Consequently, the positions of the two top antennas can be optimized without GPS and a compass sensor, using only measured radio signal strengths [20].

Fig. 1 A configuration of proposed antenna tracking system
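The scan-and-point loop described above can be sketched in Python as follows; rotate_to and measure_rssi stand in for the pan-tilt servo and radio driver interfaces and are hypothetical names, so this is a conceptual sketch rather than the authors' implementation:

```python
def find_best_heading(rotate_to, measure_rssi, step_deg=10, samples=5):
    """Sweep the lower antenna through 360 degrees, average several RSSI
    readings per heading, and return the heading with the strongest signal."""
    best_heading, best_rssi = 0, float("-inf")
    for heading in range(0, 360, step_deg):
        rotate_to(heading)  # pan-tilt servo command (hypothetical interface)
        rssi = sum(measure_rssi() for _ in range(samples)) / samples
        if rssi > best_rssi:
            best_heading, best_rssi = heading, rssi
    return best_heading

# The top (data) antenna is then pointed at the returned heading, and the
# sweep is repeated periodically to track movement of the other side.
```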

4 Proposed System

To minimize the overall messaging cost in periodic monitoring mode, we propose reducing the number of reports sent to the server by moving units during Tobs for a given T. The principle is that a moving device does not submit location details to the server via SMS if the distance travelled within the last period T is less than a certain level. The Android application is designed to determine the distance between the current coordinates and the coordinates of the last SMS when the time for the next SMS report arrives. If the distance is greater than or equal to the threshold, a new location SMS is sent. Otherwise, the software does nothing and waits for the next period T. By implementing this scheme, the overall number and cost of SMS messages can be reduced in the regular monitoring mode. This reduction depends on the motion pattern and on the proper selection of appropriate T and threshold-distance values relevant to the required monitoring precision [21].

The suggested system distinguishes perfectly between movement and the stopping conditions of mobile units. In the case of an actual pause, or of movement over a short distance (less than the threshold), it avoids the need to relay SMS messages. The value of the threshold distance, however, influences the precision of monitoring and recording the moving target. This means that if a short threshold distance is chosen and the mobile device usually exceeds it per T, the tracking precision is high and the number of SMS messages used is near the original case of a zero threshold; costs are then reduced only slightly. However, if the threshold is high enough that the moving device never exceeds it per T, a substantial decrease in SMS costs

220

K. Upreti et al.

can be obtained in this situation. It should be remembered that at the cost of more inadequate monitoring precision, this benefit is gained [22]. The distance of the threshold and T values are therefore case dependent and should be chosen based on the anticipated pace, movement style, required accuracy level, and the available tracking budget [23].

5 Implementation Use Network Simulator version 2.35 to incorporate the proposed model. NS2 is a network modeling development application. The concept of object architecture, code reuse, and modularity is elaborated in essence. The norm in this area is now available, and several research laboratories recommend its usage for testing new protocols. The current NS simulator is particularly suitable for the simulation of large-scale packetswitched networks [24]. It includes the functions used to learn single or multicast routing algorithms, transport protocols, session, booking, automated facilities, and application protocols like FTP. ZigBee has 52 sensor nodes with nodes spanning from 0 up to 44 sense node (SN), 6 mobile node (MN) node 45 to 50, and 1 base station to incorporate the planned wireless sensor networks model. During the simulation, a particular mobile node is chosen as a source node to connect with the node through other nodes. The base station is programmed as the sink node. These nodes may detect, locate multiple tasks such as collecting nodes (location, power, etc.) and move the packet [25, 26]. The 39-transmission range of each sensor node is 30 m, meaning that sensed data are transmitted inside the 30 m but cannot surpass them. The packets are lost as they begin to exceed their boundaries. Sensor nodes may create the routing route depending on their range and node mobility, where sensor nodes are set and can only be used to transmit data. The actual situation in which sensor nodes, moving nodes, and sink nodes are randomly deployed. The 0.0 s snapshot reveals the exact location of the whole picture [13]. The existing network location and the positioning of sensor nodes during movements have a disadvantage. The ZigBee WSN has a small range for packets to relay, has poor power capabilities, and has to conduct numerous roles with such restricted resources, as indicated by a literature survey. Therefore, we have suggested this unique system to solve our study problems. The proposed device accurately locates and calculates the distance between the deployed nodes, verifies the number of batteries of the nodes, and finally links the data for transmission [13]. A mobile node knows precisely if the information is effectively transmitted or missed by calculating the radius. We conduct an assessment based on throughput, delay, and packet distribution ratios for the current work compared to the work proposed. After a certain period, we also monitor the distance of each node to let the sensor nodes realize that they are indeed attached to other sensor nodes. This raises the performance and decreases packet leakage, which ultimately improves the method suggested. The proposed system’s residual energy is also calculated [12, 27].

Design and Development of Tracking System in Communication …

221

In our method, the sensor nodes in 400 × 400 topography are used at random. The scenario is generated by taking into account the moving indoor paths. Sensor nodes (SN), mobile nodes (MN), and base station (BS) scattered across the network to track the consistency of the data are the various kinds of sensor nodes. The transmitting range of sensor nodes is 30 m, and the transmission is carried out Omni antenna. We use the energy model norm of 100 J of initial energy [28]. Sensor nodes only use energy while the other sensor nodes are sent, received, or tracked [26]. They are in sleep mode otherwise. The AODV routing protocol is ideally suited for suggested systems with an Ad hoc on-demand distance vector. This gives the proposed algorithm versatility to calculate the gap between knots as required, reducing needless throw-out of the data packets to verify node location and availability. This simple application of AODV is an on-demand calculation of the distance between the sending node, i.e., the sender node and the destination node, i.e., the receiver node, another application AODV, which allows managing the routing table for each node. To do this, we use the system-calculated distance graph [11]. Figure 2 indicates the distinction between the sensor node, moving node, sink node, and even labels, displaying the same topology diagram, using the various colors for different types of nodes [29]. The whole distribution of nodes is achieved under the topography of 400 × 400, such that a wireless ZigBee sensor network with a smaller range and fewer nodes are represented. At the end of the simulation, Fig. 3 indicates the energy level of various nodes. After many activities such as sensing, maintaining the table, contacting each other, and finally sending the data packets to sink nodes, the energy level of all sensor nodes goes down. Mobile nodes often travel to destinations from their origins, some to fall nodes and some to them. This should be achieved experimentally since the course of mobility in the case of mobility cannot be estimated [25]. RSSI localization strategy is used to connect sensor nodes and mobile nodes, and a regular energy model is used

Fig. 2 Sensor nodes, mobile nodes and sink node


Fig. 3 Energy level of sensor nodes

For locating nodes and for further communication, the distance calculated using RSSI is used.

6 Result Analysis

6.1 Comparison of Tracking Error of Existing and Proposed System

Performance assessments compare the proposed method with the current paradigm, whose localization differs from the suggested method. The mean values of various parameters, together with the approximately predicted change of the process, are shown below. Any breakdown of a technique can be found and fixed by comparing the outcome of the current system with the proposed framework. By comparing the current model with the proposed model, we can conclude whether the latter is better or more effective than the previous model. The proposed method is more efficient than the current system in throughput, delay, and packet delivery ratio. This can be derived from the higher average efficiency of the proposed model compared with the current model [30].

Tracking errors are examined in the modeling studies. Figure 5 shows the precision of our proposed system for various sensor densities over time. The result is a very low tracking error, which allows for correct localization. The latest error of each round helps balance the trade-off between error and resources to avoid an increase in monitoring error; in the same way, we did our utmost to prevent the tracking error from increasing [24]. The average tracking error rises over time due to a reduction in the number of sensor nodes, as some of them die because of power failure.


Fig. 4 Tracking error

Fig. 5 Throughput

As the experimental results in Fig. 4 show, when a mobile node reaches a reference node, its location can be correctly determined by our system with an error distance of less than 1 m, which improves the tracking error by 8 percent. At the same time, as the mobile node approaches the fixed sensor, the approach is precise. In addition, the average error measured in Fig. 5 was very stable.

6.2 Throughput

Throughput is the number of bytes delivered over a specific period, measured in Kbps, Mbps, or Gbps. The throughput of the proposed model has been measured via the simulation, determined at intervals of 1 s. The proposed framework achieves an estimated throughput of 91,095, whereas the current system achieves about 63,020. This means roughly a 45% increase in the efficient delivery of packets to destinations. The relation between the throughput of the current model and that of the proposed model is seen in Fig. 6; the x-axis is the simulation duration in seconds, and the y-axis is the net throughput of both systems. This indicates that over time the proposed model improves its throughput in comparison with the current model.

The test bed has been created using the Atmel ATmega128 @ 8 MHz processor with the 2400–2483.5 MHz frequency band and has been programmed in NesC on the open-source TinyOS operating system running on MICAz motes.
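The roughly 45% improvement quoted above follows directly from the two throughput figures (a simple arithmetic check, not part of the original evaluation):

```python
proposed, existing = 91_095, 63_020
improvement = (proposed - existing) / existing * 100
print(f"{improvement:.1f}%")   # prints 44.6%, i.e. roughly the reported 45%
```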


Fig. 6 Movement of object

Using XSniffer, received packets are viewed on the PC. The application is primarily designed around the wireless sensor network and collects specific information such as the movement speed and direction of the object. For this purpose, the PIR sensors are connected to the MICAz node using NesC. The test is repeated for a single sensor and for numerous sensors, and the sensor data are saved on a PC. For several sensors, data from more than three sensors in a line topology are gathered. The application is evaluated in a laboratory configuration with four sensors, spaced 5 m apart, placed in a line topology. We chose alternate nodes as the reporting nodes for our experiment. The base station reports the occurrences to the computer. We receive the message through the XSniffer program and save it in legible form in an Excel file. This information may be utilized to produce alarm signals or alert messages delivered through a GSM modem to mobile phones. Figure 6 displays the results of the simulation of the XSniffer data.

The mean time interval and standard deviation between dispatch and confirmation of the messages via direct communication and via the wireless system were 37.92 ± 19.19 s and 30.65 ± 9.80 s, respectively. The mean time interval was not statistically significantly different between the two groups (p = 0.108). The mean time delay and standard deviation to find equipment by the nurses and by the wireless system were 234.00 ± 59.99 s and 23.97 ± 6.17 s, respectively. The mean time delay was statistically significantly different between the two groups (p < 0.001). The connection intervals to the localization server were ten seconds for a PDA and one hour for the standalone tracking device. A few localization problems were found in the pilot research, since the connection interval of the PDA was originally set to three minutes, which was too long to follow nurses in real time. This issue was fixed by changing the connection interval of the PDA to 10 s. However, there was no substantial localization error in tracking the standalone tracking device, in spite of its one-hour connection interval, since the tracking device was attached to equipment whose position typically did not change.


7 Conclusion

The proposed algorithm is constructed after taking into account all the loopholes that can influence the environment. Heterogeneous sensor nodes in a ZigBee wireless sensor network are used to test the performance and efficacy of the proposed system. The implementation of the algorithm is divided into three main parts: taking account of the remaining energy of the sensor nodes and storing the energy in a trace file; making the sensor nodes, mobile nodes, and sinks available during connection and path formation; and using the trilateration technique, storing the location in a trace file and repeating this operation every 0.5 s [7]. The energy and distance tracking tables are updated, and the exact values are recorded throughout the simulation. The model is intended to eliminate the inconveniences of the current method by combining several algorithms in order to produce a modern and better approach.
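As an illustration of the trilateration step mentioned above, the following minimal Python sketch (not taken from the paper; the reference-node coordinates and distances are hypothetical) estimates a mobile node's 2-D position from measured distances to fixed reference nodes by linearizing the circle equations and solving them in the least-squares sense.

import numpy as np

def trilaterate(anchors, distances):
    # Each circle equation is (x - xi)^2 + (y - yi)^2 = di^2.
    # Subtracting the first equation from the others removes the
    # quadratic terms, leaving a linear system A [x, y]^T = b.
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    x0, y0 = anchors[0]
    A = 2 * (anchors[1:] - anchors[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1)
         - (x0 ** 2 + y0 ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Hypothetical reference nodes (metres) and RSSI-derived distances (metres)
print(trilaterate([(0, 0), (10, 0), (0, 10)], [7.07, 7.07, 7.07]))  # ~[5, 5]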

References

1. Elnahrawy, E., Li, X., Martin, R.M.: The limits of localization using signal strength: a comparative study. In: Proceedings of Sensor and Ad-Hoc Communications and Networks Conference 2004 (SECON'04), Santa Clara, CA, USA (2004)
2. Song, J., Ergen, E., Haas, C.T., Caldas, C.: Automating the task of tracking the delivery and receipt of fabricated pipe spools in industrial projects. Autom. Constr. 15(2), 166–177 (2006)
3. Goodrum, P.M., McLaren, M.A., Durfee: The application of active radio frequency identification technology for tool tracking on construction job sites. Autom. Constr. 15(3), 292–302 (2006)
4. Lu, M., Chen, W., Chan, W.H.: Discussion of building project model support for automated labor monitoring. J. Comput. Civ. Eng. ASCE 18(4), 381–383 (2004)
5. Tuna, G., Gungor, V.C., Gulez, K.: An autonomous wireless sensor network deployment system using mobile robots for human existence detection in case of disasters. Ad Hoc Netw. 13, 54–68 (2014)
6. Bellini, A., Cirilo, C.E., Prado, A.F., Souza, W.L., Zaina, L.A.M.: A service layer for building GSM positioning systems in e-health domain (2017). https://doi.org/10.1109/U-MEDIA.2011.27
7. Obaid, T., Abou-Elnour, A., Rehan, M., Muhammad Saleh, M., Tarique, M.: ZigBee technology and its application in wireless home automation systems: a survey. Int. J. Comput. Netw. Commun. 6(4), 115–131 (2014)
8. Srinivasan, Wu, J.: A Survey on Secure Localization in Wireless Sensor Networks, 1st edn. CRC Press (2007)
9. Jang, W.-S., Skibniewski, M.J.: A wireless network system for automated tracking of construction materials on project sites. J. Civ. Eng. Manag. 14(1), 11–19 (2014)
10. Shi, J., Song, H., Fang, Z.: A tracking and location detection system using ZigBee wireless network. Adv. Mater. Res. 546–547, 1223–1228 (2012). https://doi.org/10.4028/www.scientific.net/AMR.546-547.1223
11. Amutha, B.: Location update accuracy in human tracking system using ZigBee modules. (IJCSIS) Int. J. Comput. Sci. Inf. Secur. 6(2) (2009)
12. Mutiara, G.A.: Using long-range wireless sensor network to track the illegal cutting log. Appl. Sci. 10, 6992 (2020)


13. Mahafzah, A., Abusaimeh, H.: Optimizing power-based indoor tracking system for wireless sensor networks using ZigBee. Int. J. Adv. Comput. Sci. Appl. 9 (2018). https://doi.org/10.14569/IJACSA.2018.091233
14. You, X., Wang, C.X., Huang, J., et al.: Towards 6G wireless communication networks: vision, enabling technologies, and new paradigm shifts. Sci. China Inf. Sci. 64, 110301 (2021)
15. Dener, M.: Mobile and web architectures of vehicle tracking and human-object tracking systems in wireless sensor networks. J. Adv. Comput. Netw. 4(3) (2016)
16. Abusaimeh, H., Yang, S.H.: Energy-aware optimization of the number of clusters and cluster-heads in WSN. In: 2012 International Conference on Innovations in Information Technology (IIT), pp. 178–183. IEEE (2012)
17. Wu, W., Wen, X., Xu, H., Yuan, L., Meng, Q.: Efficient range-free localization using elliptical distance correction in heterogeneous wireless sensor networks. Int. J. Distrib. Sens. Netw. 14(1) (2018). 1550147718756274
18. Akyildiz, I.F., Vuran, M.C.: Wireless Sensor Networks, 1st edn. Wiley, UK (2010)
19. Kaemarungsi, K., Krishnamurthy, P.: Modeling of indoor positioning system based on location fingerprinting. In: IEEE INFOCOM 2004, Twenty-Third Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 2, pp. 1012–1022 (2004)
20. Alhmiedat, T., Samara, G., Abu Salem, A.: An indoor fingerprinting localization approach for ZigBee wireless sensor networks. Eur. J. Sci. Res. 105(2), 190–202 (2013)
21. Deva Gifty, J.J., Sumathi, K.: ZigBee wireless sensor network simulation with various topologies. In: International Conference on Green Engineering Technology (2016)
22. Youssef, M., Agrawala, A.: Continuous space estimation for WLAN location determination system. In: IEEE Conferences; Computer Communications and Networks (2004)
23. Perez-Jimenez, R., Rabadan, J., Rufo, J., Solana, E.: Visible light communication technology for smart tourism destination. IEEE (2015)
24. Chen, Y., Wang, Y., Li, X., Gao, L.: The design and implementation of intelligent campus security tracking system based on RFID and ZigBee. In: 2011 Second International Conference on Mechanic Automation and Control Engineering, pp. 1749–1752 (2011)
25. Wu, L., Huang, J., Zhao, Z.: ZigBee wireless location system research. In: Second International Conference on Computer Modeling and Simulation, vol. 3, pp. 316–320 (2010)
26. Upreti, K., Nasir, M.S., Alam, M.S., Verma, A., Sharma, A.K.: Analyzing real time performance in Vigil Net using wireless sensor network. In: Materials Today: Proceedings (2021). ISSN 2214-7853. https://doi.org/10.1016/j.matpr.2021.01.490
27. Upreti, K., Sharma, A.K., Vargis, B., Sidhu, R.S.: An efficient approach for generating IRIS codes for optimally recognizing IRIS using multi objective genetic algorithm. In: Materials Today: Proceedings (2020). ISSN 2214-7853. https://doi.org/10.1016/j.matpr.2020.10.085
28. Chu, C.H., Wang, C.H., Liang, C.K., Ouyang, J.H., Chen, Y.H.: High-accuracy indoor personnel tracking system with a ZigBee wireless sensor network. In: 7th International Conference on Mobile Ad-hoc Sensor Networks, MSN (2011)
29. Abusaimeh, H.: Balancing the network clusters for the lifetime enhancement in dense wireless sensor networks. Arab. J. Sci. Eng. (2014). https://doi.org/10.1007/s13369-014-1059-x
30. Upreti, K., Nasir, M.S., Alam, M.S., Verma, A., Sharma, A.K.: Analyzing real time performance in Vigil Net using wireless sensor network. In: Materials Today: Proceedings (2021). ISSN 2214-7853. https://doi.org/10.1016/j.matpr.2021.01.490
31. Cho, H., Jang, H., Baek, Y.: Practical localization system for consumer devices using ZigBee networks. IEEE Trans. Consumer Electron. 56(3), 1562–1569 (2010)

Cyberbullying in Online/E-Learning Platforms Based on Social Networks N. Balaji, B. H. Karthik Pai, Kotari Manjunath, Bhat Venkatesh, N. Bhavatarini, and B. K. Sreenidhi

Abstract Cyberbullying is a method of harassment using electronic means, and it has become very common as the digital sphere has extended its technology. As school districts shut down in response to COVID-19 and students move to E-learning, our children may be spending more time in front of screens than ever before. Though the digital world is helping us connect and study more, it is also a setting where harmful behavior can be encountered. It is critical that we engage our children in dialogue to keep them protected and motivate them to be very vigilant about what they share online. To better safeguard your child, we encourage you to learn more about digital awareness, check your child's screen time and online activities, understand and configure privacy settings, and set rules with your child. Bullying is spiteful behavior that is aggressive, unwanted, and repeated. It can be verbal, social, physical, or online. Bullying is an act of power and can directly cause guilt, sadness, shame, and anger. It is not restricted to children; adults can also bully other adults or children. Cyberbullying is victimization that happens online and over digital devices. Examples of cyberbullying include hateful or mean texts, social media posts intended to embarrass, fake images, spreading rumors, and sexually explicit or threatening direct messages. It is important to take cyberbullying, and bullying of any kind, seriously. Bullying can have a long-lasting effect on a child's mental health, relationships, and confidence. It can impair their ability to focus on academics and extracurricular activities, and it can also lead a child to bully others as a way to regain control and feel powerful again. On social media, where greater communication opportunities are offered, the vulnerability of many people is also increased in the form of threatening online messages, images, and so on. Recent studies show that cyberbullying constitutes a growing problem among youngsters, teenagers, and school students. To counter this, an intelligent system is required to identify these threats automatically. This chapter focuses on the types of cyberbullying, followed by case studies in which we discuss the automatic detection of cyberbullying in online learning/E-learning platforms based on social networks.

Keywords Bully · Cyberbullying · Online/E-learning · Perpetrator · Cyber-aggression · Machine learning

N. Balaji (B) · B. H. Karthik Pai
NMAM Institute of Technology, Nitte, Karkala, Udupi, Karnataka, India
e-mail: [email protected]
K. Manjunath · B. Venkatesh
Alva's Institute of Engineering and Technology, Moodbidri, Karnataka, India
N. Bhavatarini
REVA University, Bengaluru, Karnataka, India
B. K. Sreenidhi
Cambridge Institute of Technology, Bengaluru, Karnataka, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_20



1 Introduction

Technological progress has transformed the way education is delivered at all levels of institutional education, mainly in higher education, through the adoption and usage of electronic learning (E-learning), which mainly aims to simplify the process of learning by delivering information and instructions to individuals over an intranet or over the Internet [1]. Higher education institutions' acceptance of E-learning is also intended to offer an encouraging cooperative environment for students to connect and share ideas in an online learning environment. Today, cyberbullying is one of the most widespread forms of delinquency in online activity, specifically among youth. Cyberbullying has arisen as a phenomenon at colleges, schools, and universities, together with businesses and homes. Some researchers view cyberbullying as a new form of traditional bullying, following the classical definition first proposed by Olweus [2]. The definition specifies that it is a process repeated over time, includes a power imbalance between target and perpetrator, and has the purpose of causing harm. Reference [2] also describes cyberbullying as "an overrated phenomenon," preferring to view it as just an extension of traditional bullying into the virtual world. Other researchers explain that cyberbullying can attack all aspects of a target's privacy, both at the educational institution and at home, wherever possible. Cyberbullying on social media among university students has become very common nowadays, and it is appearing even on E-learning platforms, which may be exploited for such misconduct. A perpetrator can choose to mask their identity, increasing the target student's insecurity regarding the quality of their relationships [3]. In cyberbullying, students may not know which members of their group are involved. If a destructive message goes viral through the actions of bystanders who push the message on to others in their networks, the cyber-victim's agony is compounded. Like traditional face-to-face bullying, cyberbullying comprises the deliberate intent to offend a person or group of persons repeatedly over time. E-learning platforms in higher education have transformed the way education is done at all levels; the adoption of E-learning by higher education aims to facilitate a collective environment for students to share and communicate


ideas through online platforms. However, online activity is also one of the most powerful weapons for cyberbullying. There has been extensive research on cyberbullying in the online learning platforms of universities; these works show how an online learning platform may be used to identify such misbehavior [1]. The major purpose of this chapter is to shed some light on the use of data analytics, conducted specifically through text mining, in order to understand the characteristics of cyberbullying on online study platforms in the higher education sector.

1.1 Use of E-learning

The growth of Internet bandwidth over the years has made it possible to develop courses in which many interactive and multimedia objects are delivered on an online platform [4]. Among the many advantages of E-learning, the convenience for learners has increased the flexibility of content delivery and encouraged students to shape their own delivery of content. It works like a traditional education system, allowing learners to ask questions and share ideas and information. This kind of communication helps learners obtain social support, which is needed to create an environment that nourishes collaborative learning. In spite of the benefits that E-learning offers to educational institutions and their learners, there are challenges concerning the use of the essential technology and its utilization.

2 What Is Cyberbullying

The upsurge in the usage of social media for communication has drastically raised the issue of cyberbullying through various methods. Social media includes chat rooms, Facebook, Twitter, and online learning and gaming platforms where people can view and participate in the sharing of content. It has been noted that 33.8% of high- and middle-school students engaged in online learning (E-learning) were victims of cyberbullying in 2006, and the figure has increased as usage has grown in individuals' day-to-day lives [5]. Cyberbullying involves the humiliation of a person through hate speech, comments, SMS, messaging, and so on, using various online apps. It also involves posting wrong information about a person, criticizing the character of a person, and the like.

2.1 Cyberbullying Types

Knowing the types of cyberbullying helps the young and adults to differentiate them properly and allows them to report incidents and adopt preventive measures. The various types are:


Table 1 Types of cyberbullying
Harassment: Continuously sending aggressive and offensive messages
Flaming: Unpleasant messages
Denigration: Sending fake or false information about someone
Masquerade: The violator pretends to be someone they are not
Outing: Sharing another's private information with someone else
Trickery: Tricking someone into sharing secrets or soliciting information
Exclusion: Intentionally leaving someone out of an online group
Cyberstalking: Threats of harm or intimidation

1. Hurtful, nasty, or criticizing rumors, or related comments about an individual online.
2. Publishing a video or an image on a Web site.
3. Creating a fake profile of an individual.
4. Issuing online threats, or provoking an individual to kill herself/himself or another person.
5. Posting religiously, ethnically, or racially hurtful posts or comments.

Cyberbullying is an emerging problem on E-learning platforms, specifically in higher education institutions and universities. It can be defined as the repeated use of communication tools on the Internet to cause damage to a group of individuals or a specific individual [3]. Unlike face-to-face bullying, cyberbullying in higher education drives up the numbers of victims and perpetrators through bullying text or images. Cyberbullying can be classified into various kinds of online misconduct, as represented in Table 1 [6].

3 Cyberbullying Avoidance

The sufferers of cyberbullying are affected psychologically rather than bodily, in terms of feelings of unease, depression, and miserable thoughts. Regrettably, many parents and educators are oblivious of where and when its effects reach the point of suicidal thoughts [7]. Many educational institutions lack the codes of conduct, policies, and restrictions needed to prevent cyberbullying. Hence, it is essential for both educators and parents to consider the psychological harm and harmful behavior that may be caused to the victims [8]. A series of records of individual conversations in Malay and English between students was taken from an E-learning platform's database. This is the dataset considered for our study; any message recorded as personal or broadcast communication with the instructor is not included. To keep the focus of the study, no private information was taken into the dataset. The attributes of the dataset were


UserID, full message, and time created; the dataset comprises the messages exchanged between September 3rd, 2017 and July 17th, 2018, of which 3498 messages (spreadsheets) were collected as the text mining sample. Text mining is an interdisciplinary field that draws on information retrieval, statistics, data mining, machine learning, and computational linguistics. It is a process in which messages are collected from the repository and directed to the preprocessing phase, in which raw data is changed into a usable format; the next step extracts insights about the contents or communications in the dataset. A well-known tool for the mining and analysis of textual data is ATLAS.ti, primarily utilized for qualitative data analysis, where researchers apply codes to collections of unstructured text; it provides functionality for content visualization and identification that can be utilized for basic text analysis. The major advantage of this tool is that it lets researchers gather and consolidate primary data and assess its significance using a varied set of tools. The tool accepts a wide diversity of data formats, and it supports illustrating qualitative analytical connections among many diverse materials, from images and video to case study transcripts to survey data. Using common linguistic methods like grouping, extracting, and indexing, ATLAS.ti was used to provide the mining and analysis. The main output of this work explores and extracts the key concepts, generates the different types of sorts, and gains understanding from interesting patterns, their connections (relationships), and other information in the textual data.
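As a minimal sketch of the preprocessing phase described above (the messages, stop-word list, and cleaning rules here are illustrative assumptions, not the chapter's actual pipeline), raw chat messages can be converted into a usable token format like this:

import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "you"}  # tiny illustrative list

def preprocess(message):
    # Lowercase, strip URLs, tokenize, and drop stop words.
    message = message.lower()
    message = re.sub(r"https?://\S+", " ", message)  # remove URLs
    tokens = re.findall(r"[a-z']+", message)         # keep alphabetic tokens only
    return [t for t in tokens if t not in STOP_WORDS]

repository = ["You are SO stupid!!! http://example.com",  # hypothetical messages
              "See you in class tomorrow :)"]
corpus = [preprocess(m) for m in repository]
print(Counter(t for doc in corpus for t in doc).most_common(5))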

3.1 Case Study: Cyberbullying in Social Media Text

Cyberbullying falls under three major criteria: intention, where the main aim is to harm the victim; repetition, where the bully repeatedly blackmails the victim; and a power imbalance between the victim and the bully, including one bully attacking another bully in the network. Prediction of cyberbullying incidents on the Instagram social network plays a very significant role in today's world [9]. Among American teens, cyberbullying is a growing problem that has affected more than half of them. Cyber-aggression is a related keyword describing the deliberate maltreatment carried out with the help of electronic media against an individual or a particular cluster of persons, regardless of their age. There is a clear difference between cyberbullying and cyber-aggression on Instagram. Cyberbullying labeling is used to define ground-truth labels. Cyber-aggression [10], defined as the use of digital media to deliberately harm another individual, is the major distinction between cyberbullying and cyber-aggression on Instagram. Some instances of negative words and content are slang, profanity, and abbreviations that would be utilized in posts of a negative type. One of the major features of cyberbullying is the imbalance of power, which can take various forms, including social,


relational, physical, or psychological [10]. On Instagram, each media session comprises a media element posted by its owner and the corresponding comments for that media element. The labeling process consists of two different views: the first incorporates both the image and its associated text, and the second focuses on the image content on its own and is utilized to recognize the category and content of the image. The major intention behind considering the image and its associated text together is to provide a scrollable interface that helps the contributor perceive all of the comments related to the image [11]. Labeling the image contents of media sessions is used to investigate the relationship between image content and cyberbullying. Linguistic and psychological pattern analysis focuses on measurements of cyberbullying versus non-cyberbullying and cyber-aggression versus non-cyber-aggression media sessions. We use Linguistic Inquiry and Word Count (LIWC) [12] as the strategy to find which categories of words have been utilized in media sessions labeled as cyberbullying or cyber-aggression. LIWC assesses the various styles of word usage in psychologically meaningful categories and captures disparities in language use across different persons. The numbers of pronouns, words, negations, and nouns were analyzed. We then investigate some particular roles such as work, achievements, and others, and finally look into psychological measurements like family, friends, and social words. For every case, we obtain the LIWC values and then compute the average LIWC for each type of media session (media sessions with cyberbullying, with cyber-aggression, with no cyberbullying, and with no cyber-aggression).

1. Cyberbullying Detection: To detect occurrences in Instagram media sessions, a classifier was designed and trained, and fivefold cross-validation was applied to the dataset, with 80% of the data utilized for training and 20% for testing in each fold. The features stated in the previous section were derived from the media elements and from the user. As part of the preprocessing step, some characters such as "!" and "." were removed from the text. To reduce the dimensionality, latent semantic analysis (LSA) was used for the text analysis (a small sketch follows below), which protects against over-fitting and provides semantics as features. Stop-word removal was then carried out, and finally the feature vector was normalized [10].
2. Cyberbullying Prediction: To predict cyberbullying on Instagram, an image posted at a given time is considered based on the initial posting of the media element; features of the image, such as the graph features of the profile owner, are extracted so that prediction can be performed accurately in the subsequent steps. Firstly, the set of media sessions labeled with offensive words creates the labeled dataset. Since this labeled dataset does not have access to comments, a pre-filtering strategy is applied to it. A robust strategy must be identified to find bullies and aggressors on Twitter, on the basis of text, user, and network attributes, to understand the features of aggressors and bullies. The crowd-sourcing platform CrowdFlower [13], which considers the features of Twitter users' network-based attributes and extracts the properties of bullies and aggressors, was used to detect the bullies.
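A minimal scikit-learn sketch of the LSA step described above, assuming TF-IDF features over placeholder comment texts (the paper itself does not specify this implementation):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer

comments = ["you are pathetic", "great photo!", "nobody likes you"]  # placeholder comments

tfidf = TfidfVectorizer(stop_words="english")        # term weighting with stop-word removal
X = tfidf.fit_transform(comments)                    # sparse term-document matrix
lsa = TruncatedSVD(n_components=2, random_state=0)   # latent semantic analysis
X_reduced = Normalizer().fit_transform(lsa.fit_transform(X))  # normalized feature vectors
print(X_reduced.shape)  # (3, 2): three comments, two latent dimensions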


Fig. 1 Overview of the methodology

The approach to detecting bullying and aggression on Twitter is depicted in Fig. 1 and will be explained in the subsequent paragraphs; it involves data collection, preprocessing, sessionization, extraction of user-, text-, and network-level features, ground-truth building, user modeling, and characterization and classification [14].

(a) Data Collection—Tweets are collected through Twitter's streaming API, which gives free access to 1% of tweets. The API provides the tweets in JSON format, containing the tweet, some metadata, and information such as the time and the name of the follower. The dataset was collected for the process of building the ground truth. For instance, the data collected between June and August 2016 comprises two sets: a baseline of 1 M random tweets and a hate-based set of 650 K tweets from the Twitter streaming API, using 309 hashtags related to hateful speech and bullying.
(b) Preprocessing—Stopwords, URLs, and punctuation are removed from the tweeted text, and normalization is applied. This strategy also removes the spam content that forms micro-clusters during the process of classification [15]. There are several steps to label the data and develop the ground truth.
• Cleaning—removing numbers, punctuation, and stop words, and translating all characters to lower case.
• Spam Removal—Twitter comprises a non-negligible number of spammers, and a detection tool was proposed for them. This tool uses two major indicators of spam, namely the use of a big number of hashtags and the posting of large numbers of highly similar tweets. Persons with more than five hashtags per tweet are detached initially in this heuristic; later, the similarity of a user's tweets is estimated using the well-known Levenshtein distance measure [16] (a short sketch of this distance follows at the end of this list),


which provides the minimal number of single-character edits needed to convert one string into another; the average is then calculated over all pairs of the user's tweets.
(c) Sessionization—Tweets from the same user are grouped into time clusters, forming sessions for analysis. Cyberbullying involves repetitive actions, so the aim is to study a user's tweets over time [17]. In the first stage, non-significant tweets are removed from the user's tweets over a 3-month period; then a session-based model is used, in which the inter-arrival time between tweets within a session must not exceed a threshold. This experiment was carried out for various values of the threshold to find the optimal session duration, arriving at a threshold of 8 h. In the second stage, the sessions are divided into batches containing the maximum information that can be inspected by a crowd-worker within a stipulated period of time. To identify the optimal setup, preliminary labeling runs were performed on CrowdFlower involving 100 workers, each using batches of precisely 5–10 and 5–20 tweets.
(d) Ground Truth—The ground truth is built with human annotators using a crowd-sourced strategy, i.e., workers are provided with sets of tweets from a person and carry out the classification according to predefined labels. Crowd-sourced Labeling—the labeling process is performed on CrowdFlower:
• Labeling—each Twitter user is labeled as normal, bullying, spammer, or aggressive by evaluating their batches of tweets.
• Aggressor—posting at least one tweet or retweet with negative meaning, with the purpose of insulting or harming other users.
• Bully—posting multiple tweets or retweets with negative meaning, with the purpose of insulting or harming other users.
• Spammer—posting advertisements or marketing, or something of a distrustful nature such as selling adult products or attempting phishing.
CrowdFlower Task—an online tool was employed for the crowd workers, who were first asked to provide some elementary demographic information such as name, nationality, age, gender, annual income, and education level. In summary, this information is given in Table 2. The annotators come from the different countries listed in the table, with the top ten nations contributing 75% of all annotations.
(e) Feature Extraction—Features are extracted from tweets and user profiles: user-based, network-based, and text-based. User-based Features—From a user's profile, the extracted features include the number of tweets made, the age of the account, the number of people subscribed to, whether the account is verified, and what kind of profile image is used. Session statistics consider the number of sessions per user over 3 months (June–August) and estimate the standard deviation, average, and median of each user's sessions.
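The Levenshtein distance used in the spam-removal step above can be computed with the classic dynamic-programming recurrence; a minimal sketch:

def levenshtein(a, b):
    # Minimum number of single-character insertions, deletions, or
    # substitutions needed to turn string a into string b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(levenshtein("follow me", "follow mee"))  # 1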


i. Text-based Features—Digging deeper into a person's tweeting activity, specific features of the tweets are analyzed. Basic metrics of a person's tweets are considered, such as the number of hashtags utilized, uppercase text, the number of emoticons, and URLs; the average is computed over all tweets in the user's annotated batch.
• Word embedding—provides the syntactic and semantic relations of words, capturing the contextual clues and refined attributes related to human language. Word2vec, an unsupervised word-embedding strategy, detects the syntactic and semantic relations of words (a short sketch follows at the end of this section). Word2vec is a two-layer neural network that operates on a set of texts: initially it removes noise and establishes a vocabulary of words, then it builds a learning model that accepts text as input to learn D-dimensional vector representations of words, and finally it outputs a vector representation for each word.
• Sentiment detection is done through the tool SentiStrength, which estimates the negative and positive sentiment in short texts [18].
• The existence of hate and curse words is examined with the help of the Hatebase database, which comprises a crowdsourced list of hate words [19].
ii. Network-based Features—The Twitter social network plays an important role in spreading ideas and information, but also rumors, abusive language, negative opinions, etc. [20]. The association between cyberbullying behavior and the position of a user's Twitter friends and their cohorts has been observed clearly, for the purpose of identifying the varying degree of embeddedness with regard to these friends.
• Popularity is defined by the count of followers or friends and the follower ratio in the user profile; this popularity directly quantifies the positive or negative impact of each user profile and its ego network.
• Reciprocity quantifies the extent to which a user returns the connections they obtain from other users; it provides an interaction-based graph utilizing likes on posts and shares on user profiles.
• Power difference—the power of each bully and the behavioral and emotional state of victims was analyzed; stronger negative emotions correspond to more popular cyberbullies whose attacks show higher power differences in network status. Considering the power difference between a tweeter and the users they mention, an analysis of a user's mentions reveals possible bullying behavior [21].
• Centrality scores—describe the user's place in the network through different metrics such as hub score (the sum of the authority scores of the nodes it points to) [22], authority (the number of nodes connected to the user), eigenvector centrality (the influence of a person in their network), closeness centrality (who is closest to whom within the network), and connectivity. The entire network and its followers were used to measure these scores over an undirected network or graph of nodes (Table 2).
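A minimal gensim sketch of the Word2vec embedding step described above (the toy tweets are placeholders; the gensim 4.x API is assumed):

from gensim.models import Word2Vec

tweets = [["you", "are", "a", "loser"],        # tokenized placeholder tweets
          ["have", "a", "nice", "day"],
          ["nobody", "likes", "a", "loser"]]

model = Word2Vec(sentences=tweets, vector_size=50, window=3,
                 min_count=1, sg=1, epochs=50, seed=1)   # skip-gram training
vec = model.wv["loser"]                                  # 50-dimensional word vector
print(vec.shape, model.wv.most_similar("loser", topn=2))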


Table 2 CrowdFlower task description
S. No. | Information | Units in percentage
1 | Gender: Male / Female | 70% / 30%
2 | Educational qualification: Secondary / Bachelor degree / Master degree / Ph.D. | 18.4% / 35.2% / 44% / 2.4%
3 | Income: Below 10 k / Between 10 and 20 k / Between 20 and 100 k | 35.5% / 20% / Rest
4 | Age: 18–24 / 25–31 / 32–38 / 39–45 / 45 years and above | 27% / 30% / 21% / 12% / Rest
5 | Country: USA, Venezuela, Russia, and Nigeria | Rest
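A minimal networkx sketch of the centrality scores listed above (the follower edges are hypothetical):

import networkx as nx

# Directed follower graph: edge (u, v) means u follows v
G = nx.DiGraph([("a", "b"), ("c", "b"), ("b", "a"), ("d", "a"), ("b", "d")])

print(nx.in_degree_centrality(G))    # popularity (in-degree)
print(nx.out_degree_centrality(G))   # activity (out-degree)
print(nx.closeness_centrality(G))    # closeness centrality
hubs, authorities = nx.hits(G)       # hub and authority scores
print(nx.eigenvector_centrality(G.to_undirected()))  # influence in the network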

3.2 Case Study: Cyberbullying Using Machine Learning

The machine learning approach presented in this section is based on textual features. The dataset was taken from the Web site Formspring.me, a question-and-answer-based Web site that allows persons to post questions anonymously on any other user's page. A subset of information was extracted from the Formspring.me pages of different users, chosen randomly; the number of posts considered per user ranges from 1 to 1000.

1. Labeling the data: The questions and answers were extracted randomly from the Formspring.me data for the training and testing sets. Amazon's Mechanical Turk service was used to provide the labels for the truth sets; it is an online marketplace that permits requesters to post tasks (HITs), which are then completed by paid workers [23]. Each HIT, i.e., each question and answer from Formspring.me, requested the following information:


• Does the post contain any cyberbullying?
• In this post, how bad is the cyberbullying?
• What are the phrases indicating the cyberbullying?
• Kindly provide any extra information you would like to share related to the post.

Later, the labels for the data are derived from the information above, since identifying cyberbullying is a subjective task [24].

2. Features Extracted—Features were identified and extracted from each Formspring.me post. Every post needs to be labeled, and there are "bad" words that are considered indicative of cyberbullying. To achieve this, we recognized bad words posted on the Web site www.noswearing.com, classified according to severity levels. Extracted features include the number of "bad" words (NUM) and the density of "bad" words (NORM). A feature measuring the overall "badness" (SUM) of a post was also generated by considering the weighted average of the "bad" words [23] (a small sketch of these features follows after this list).
3. Learning the Model—The Waikato Environment for Knowledge Analysis (Weka) is a software suite of machine learning algorithms for data mining tasks. Weka comprises tools for data preprocessing, classification, regression, association rules, clustering, and visualization. Some of the machine learning algorithms most useful for our problem, the detection of cyberbullying on the Formspring.me dataset, are:
• J48—uses the C4.5 implementation to generate a decision-tree model from the attributes of the given dataset [25].
• JRip—a rule-based algorithm that builds a broad rule set and then reduces the rule set while retaining the same success [26].
• IBk—an instance-based, k-nearest-neighbor algorithm, used with k = 1 and k = 3 [27].
• SMO—a function-based support vector machine [28] based on sequential minimal optimization.
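Weka is normally driven through its GUI or Java API; purely as an illustration, the NUM, NORM, and SUM features described above can be sketched in Python as follows (the severity-weighted bad-word list is a hypothetical stand-in for the www.noswearing.com list):

# Hypothetical severity-weighted bad-word list
BAD_WORDS = {"idiot": 1, "loser": 1, "hate": 2}

def bad_word_features(post):
    tokens = post.lower().split()
    hits = [BAD_WORDS[t] for t in tokens if t in BAD_WORDS]
    num = len(hits)                              # NUM: count of bad words
    norm = num / len(tokens) if tokens else 0.0  # NORM: density of bad words
    sum_ = sum(hits) / num if num else 0.0       # SUM: weighted average badness
    return {"NUM": num, "NORM": norm, "SUM": sum_}

print(bad_word_features("you are such a loser and i hate you"))
# {'NUM': 2, 'NORM': 0.222..., 'SUM': 1.5}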

3.3 Case Study: Cyberbullying Using Social and Textual Analysis

The concept of the ego network is introduced: it comprises a focal node, known as the "ego", the nodes to which the ego is directly connected (the alters), plus the ties, if any, among the alters. A "1.5-level ego network" is so termed because its collection procedure stands between a network distance of 1 and a network distance of 2; the network has as many egos as it has nodes. A social network is represented as a graph G = (V, E), where V is the set of all vertices/nodes and E is the set of all directed edges/relationships in G. The 1-level ego network of a node v is the graph G1 = (V1, E1) such that V1 contains every node u for which an edge (v, u) exists in E, and E1 contains all the edges from v to the nodes of V1. The 1.5-level ego graph is then G1.5 = (V1.5, E1.5), where V1.5 = V1 and E1.5 additionally contains the edges of E among the nodes of V1.5 [29].
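A minimal networkx sketch of the 1-level and 1.5-level ego networks as defined above (the toy graph is hypothetical, and the 1.5-level construction follows the induced-subgraph reading of the definition):

import networkx as nx

G = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])  # toy social graph

ego = "A"
alters = set(G.successors(ego))                   # nodes the ego points to: {B, C}
G1 = G.edge_subgraph((ego, u) for u in alters)    # 1-level: only edges from the ego
G15 = G.subgraph(alters | {ego})                  # 1.5-level: also edges among alters
print(sorted(G1.edges()))    # [('A', 'B'), ('A', 'C')]
print(sorted(G15.edges()))   # [('A', 'B'), ('A', 'C'), ('B', 'C')]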


Fig. 2 1.5 ego network

In Fig. 2a, the ego node A is represented as a square and its neighbors as triangles. The edges of the ego network are represented as solid lines, while the other edges are shown as dotted lines. The focus is on the 1.5 ego network graph, since it captures and holds a judicious level of social context: me, my friends, and the relationships among the friends. Figure 2b represents the users' relationship graph obtained by merging the 1.5 ego networks of the two users B (receiver) and A (sender). The communication between these two graphs is represented as a directed weighted edge and allows the characterization of both receiver and sender in terms of the position they hold in their respective ego networks. A few important social network features were considered, as follows:
• Number of nodes—in the resulting 1.5 ego network, showing how large the graph is and how many relationships it contains.
• Number of edges—representing how well connected the community or sub-community is.
• Degree centrality—the number of relationships incident upon a node in a directed graph or network. Both degree centralities, popularity (in-degree) and activity (out-degree), are used in the feature set; both features are calculated for receiver and sender, using the following expression for the in-degree case:

C_I(i) = \frac{\sum_{j=1}^{n} x_{ji}}{n - 1}    (1)

where C_I(i) is the in-degree centrality, x_{ji} is the value of the tie from j to i (either 0 or 1), and n is the number of nodes in the network. The out-degree counterpart is calculated analogously over the outgoing connections in the network graph.
• Edge betweenness centrality EB(e)—measures the centrality and influence of an edge in the network as a directed graph G(V, E), and is calculated using Eq. (2):

EB(e) = \sum_{v_i \in V} \sum_{v_j \in V \setminus \{v_i\}} \frac{\sigma_{v_i, v_j}(e)}{\sigma_{v_i, v_j}}    (2)

where \sigma_{v_i, v_j} is the number of shortest paths between the vertices v_i and v_j, and \sigma_{v_i, v_j}(e) is the number of those shortest paths that pass through edge e.
• Links—the count of posts between two persons in the labeled conversation.
• K-core score—a maximal sub-graph in which every node has degree k or more.


1. Characterizing Message Count (Detection): Cyberbullying detection from social and textual data uses some of the following features [24].
• Density of Bad Words—Around 713 curse words were collected from online resources and manually extracted for our convenience; the density of such words in a post is used as a feature.
• Density of Uppercase Words—The ratio of uppercase letters in the posted message is considered as a feature, since it may be correlated with talking loudly on an online platform.
• Number of Exclamation Points and Question Marks—Question marks and exclamation points mark emotional comments and are chosen as another feature.
• Number of Smileys—The most commonly used indicator of emotions, considered as another feature.
• Parts-of-speech Tags—Detecting commonly occurring bi-gram pairs in the training data.
2. Classification (Prediction): The synthetic minority oversampling technique (SMOTE) was used to create a balanced dataset for training and to counter the imbalanced class properties; a short sketch follows below. SMOTE oversamples the minority class relative to the majority class, which can cause the problem of over-fitting of data points. The training set comprises 70% of the data, and the remainder is used for testing. Weka 3.0 was used for the implementation, with well-known classification strategies including J48, naive Bayes, SMO, dagging, and bagging. The information gain strategy was used to rank the features. The classification results report performance in terms of the receiver operating characteristic (ROC) and the true-positive rate, and also list the top three features for the textual model, the composite model, and the social model.
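A minimal sketch of the SMOTE balancing step described above, using the imbalanced-learn package on a synthetic feature matrix:

import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # synthetic feature vectors
y = np.array([1] * 10 + [0] * 90)      # 10% minority ("bullying") class

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
print(np.bincount(y), "->", np.bincount(y_bal))  # [90 10] -> [90 90]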

4 Conclusion

This chapter discusses the importance of detecting cyberbullying in E-learning. We discussed the emergence of cyberbullying and the types of cyberbullying. The case studies we considered were limited to particular groups; they are discussed to throw light on the work done so far in the area and to help further study and research in this field.

References

1. Kartiwi, M., Gunawan, T.S.: Cyberbullying in E-learning platform: an exploratory study through text mining analytics. J. Inf. Syst. Digit. Technol. (2019)
2. Olweus, D.: Invited expert discussion paper cyberbullying: an overrated phenomenon? Eur. J. Dev. Psychol. (2012)


3. Ang, R.P., Goh, D.H.: Cyberbullying among adolescents: the role of affective and cognitive empathy and gender. In: Child Psychiatry and Human Development (2010)
4. Concannon, F., Flynn, A., Campbell, M.: What campus-based students think about the quality and benefits of e-learning. Br. J. Educ. Technol. (2005)
5. Kowalski, R.M., Giumetti, G.W., Schroeder, A.N., Lattanner, M.R.: Bullying in the digital age: a critical review and meta-analysis of cyberbullying research among youth. Psychol. Bull. 140(4), 1073 (2014)
6. Akbulut, Y., Eristi, B.: Cyberbullying and victimisation among Turkish university students. Aust. J. Educ. Technol. (2011)
7. Aricak, O.T.: Psychiatric symptomatology as a predictor of cyberbullying among university students. Eurasian J. Educ. Res. (EJER) (2009)
8. Eskey, M.: Cyberbullying in the Online Classroom: Faculty as the Targets. TCC Hawaii (2014)
9. Tokunaga, R.S.: Following you home from school: a critical review and synthesis of research on cyberbullying victimization. Comput. Hum. Behav. (2010)
10. Hosseinmardi, H., Mattson, S.A., Rafiq, R.I., Han, R., Lv, Q., Mishra, S.: Prediction of Cyberbullying Incidents on the Instagram Social Network, Aug 2015
11. Limber, S.P., Kowalski, R.M., Agatston, P.A.: Cyber Bullying: A Curriculum for Grades 6–12. Hazelden, Center City, MN (2008)
12. Pennebaker, J.W., Francis, M.E., Booth, R.J.: Linguistic Inquiry and Word Count. Lawrence Erlbaum Associates, Mahwah, NJ (2001)
13. CrowdFlower: www.crowdflower.com (2017)
14. Chatzakou, D., Kourtellis, N., Blackburn, J., De Cristofaro, E., Stringhini, G., Vakali, A.: Mean Birds: Detecting Aggression and Bullying on Twitter, May 2017
15. Hanley, J.A., McNeil, B.J.: The meaning and use of the area under a receiver operating characteristic (ROC) curve. In: Radiology (1982)
16. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1) (2001)
17. Hosseinmardi, H., Mattson, S.A., Rafiq, R.I., Han, R., Lv, Q., Mishra, S.: Analyzing Labeled Cyberbullying Incidents on the Instagram Social Network. SocInfo (2015)
18. Nahar, V., Unankard, S., Li, X., Pang, C.: Sentiment Analysis for Effective Detection of Cyber Bullying. APWeb (2012)
19. Hatebase database. https://www.hatebase.org/ (2017)
20. Jin, F., Dougherty, E., Saraf, P., Cao, Y., Ramakrishnan, N.: Epidemiological Modeling of News and Rumors on Twitter. SNAKDD (2013)
21. Pieschl, S., Porsch, T., Kahl, T., Klockenbusch, R.: Relevant dimensions of cyberbullying—results from two experimental studies. J. Appl. Dev. Psychol. 34(5) (2013)
22. Kleinberg, J.M.: Hubs, authorities and communities. ACM Comput. Surv. 31(4es) (1999)
23. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco, CA (2005)
24. Reynolds, K., Kontostathis, A., Edwards, L.: Using machine learning to detect cyberbullying
25. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA (1993)
26. Cohen, W.W.: Fast effective rule induction. In: Proceedings of the Twelfth International Conference on Machine Learning (ICML'95). Tahoe City, CA (1995)
27. Aha, D.W., Kibler, D.: Instance-based learning algorithms. Mach. Learn. 6 (1991)
28. Yin, D., Xue, Z., Hong, L., Davison, B.D., Kontostathis, A., Edwards, L.: Detection of harassment on web 2.0. In: Proceedings of the Content Analysis of Web 2.0 Workshop (CAW 2.0). Madrid, Spain (2009)
29. Altshuler, Y., Fire, M., Shmueli, E., Elovici, Y., Bruckstein, A., Pentland, A.S., Lazer, D.: The social amplifier—reaction of human communities to emergencies. J. Stat. Phys. (2013)
30. Huang, Q., Singh, V.K., Atrey, P.K.: Cyber Bullying Detection Using Social and Textual Analysis. ACM (2014)

Machine Learning and Remote Sensing Technique for Urbanization Change Detection in Tangail District Ananna Talukder, Sadia Mahbub Mim, Sabrina Ahmed, Muhammad Syed, and Rashedur M. Rahman

Abstract Presently, urban growth is a typical phenomenon globally, and it is evident in developing countries at a rapid pace. Land-use changes are attributable to the fast urbanization of recent decades in developing countries. With the fast growth of the population, urbanization in Bangladesh has occurred in much the same way as in other urban communities of the world. Tangail District has likewise been confronting rapid urban growth and land-use changes over the last few years. This paper aims to examine the land-use changes of Tangail District by utilizing multi-temporal Landsat Thematic Mapper data. Remote sensing procedures are used for analyzing patterns of urban development together with geographic information systems (GIS). GIS and remote sensing procedures are a cost-effective way to understand land-use change and urban development patterns. In this research, Landsat images are classified into four land-use classes utilizing supervised classification, including the random forest algorithm. The spatial and temporal changes of Tangail District concerning urban development and land use have been portrayed from Sentinel-2A data. We made an endeavor to explore the changes from 2017 through 2020 for Tangail District. This study could significantly assist urban planners with fundamental data about the degree and pattern of Tangail District's urbanization, to deal with rapid urban development and weigh the essential urban drawbacks and benefits.

A. Talukder · S. M. Mim · S. Ahmed · M. Syed · R. M. Rahman (B) Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh e-mail: [email protected] A. Talukder e-mail: [email protected] S. M. Mim e-mail: [email protected] S. Ahmed e-mail: [email protected] M. Syed e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_21


Keywords Land use and land cover · Urbanization · Remote sensing · Sentinel-2A · Satellite images · Multispectral band · Supervised classification · Landsat Thematic Mapper · GEE · Random forest

1 Introduction

Tangail District is located in the central division of Bangladesh, with a population of around 3.8 million people and a land area of 3414.28 km2. Many factories have grown rapidly in some areas of Tangail in recent years, changing the land size and region. This alteration has a major impact on the city's urbanization process. In this paper, the use of Google Earth Engine (GEE) in conjunction with remote sensing is reported to observe changes in land use/land cover (LULC) in the Tangail District as a result of the development of industries. Satellite remote sensing is a way of detecting land-use changes through real-time information, a concise view, and repeated coverage. Change detection in remote sensing means analyzing two registered airborne or satellite multispectral images obtained at two different periods over a common topographical region. Physical and biological changes to the Earth's surface show how diverse areas, including agricultural regions, forests, protected areas, wetlands, and water sources, alter or surround a certain region [1]. High-resolution satellite imagery can be used to figure this out. Several machine learning classifier packages for supervised classification using multi-temporal Landsat images can be found in GEE; random forests (RF) and support vector machines (SVM) are examples among a variety of other algorithms. This research aims to look at the depletion of land and how much urbanization has taken place. To achieve the primary goal, the following aspects are required:

• Identify improvements in Tangail District's urban growth.
• Investigate urban ground cover and improvements in the usage of existing lands.
• Investigate the framework and examples of land use.
• Analyze the effect of urbanization on ground cover/land-use modifications.

The aim of this paper is to look at the trends in Tangail District for the years 2017, 2018, 2019 and 2020. This paper is organized as follows. A number of related works in this field are presented in Sect. 2. The methodology is discussed in Sect. 3, and the results and discussion are presented in Sect. 4. Section 5 discusses the conclusion and potential future scope of the research.

2 Related Works

The authors in [1] used a pixel-based classification to classify built-up areas in India. They used Landsat 7 and Landsat 8 as inputs for image classification and used the


lowest possible range of cloud scores. They highlighted the dataset's possible use in GEE for the temporal large-scale study of the urbanization process, despite the fact that it can be used for supervised image classification on any platform. In [2], the authors focused on the following tasks: (1) use of GEE to acquire high-quality long time-series images, process sample datasets, and select classifiers; (2) add spectral details, texture, landscape, and climate variables to the classification; (3) analyze Xining's development of impervious surfaces from 1987 to 2019. Another paper [3] is organized into parts explaining the land loss due to urbanization, such as the issue statement; there, the support vector machine (SVM) classifier outperformed the random forest (RF) classifier in terms of overall accuracy and Kappa coefficient, despite the fact that both used the same training dataset. In another paper [4], a model was built on the classification of SPOT satellite photos from 2006, 2011 and 2016, as well as an estimation of ten driving forces; the Markov Chain model was used to run the Multi-Layer Perceptron Neural Network (MLPNN) process for the Business-as-Usual scenario, using satellite photographs from the three time periods (2006, 2011, and 2016). The use of remote sensing techniques in different domains, ranging from change detection in urbanization and LULC mapping to crop production, is discussed in [5–13]. The distinction between our study and the others is that we used the Google Earth Engine to conduct supervised classification after importing and exporting the dataset. For a clean perspective, we used multispectral Sentinel-2A pictures; we did not use any other source to eliminate the cloudiness from the photographs. Moreover, the individual parts that we intend to categorize are labeled.

3 Data Acquisition and Methodology

3.1 Area of Study

We analyzed urban detection in Tangail Sadar Upazila. The analysis is done on the area around the coordinate [89.92, 24.25] (Fig. 1).

3.2 Data Acquisition and Processing

Sentinel-2 MSI (Multispectral Instrument), Level-2A is the satellite imagery dataset used in the analysis; COPERNICUS/S2_SR is the ID of this dataset. This study used satellite images of four periods: 2017, 2018, 2019 and 2020. The dataset has 12 bands in total; only six bands, B2, B3, B4, B5, B6 and B7, were chosen for this study, and B4, B3, B2 were used for land cover classification. The area bound has been set according to the coordinate mentioned above within Tangail District. We classified the Tangail District by labeling 1500 polygons. These 215


Fig. 1 Area of study (Tangail)

images are filtered as per the study area. The images are processed by cropping, date periods, and cloud coverage. For 2017, we fixed the date range from 2017-05-30 to 2017-07-30; for 2018, from 2018-02-28 to 2018-06-30; for 2019, from 2019-01-28 to 2019-06-30; and for 2020, from 2020-01-30 to 2020-06-30. The images were filtered to have less than 1% cloud coverage. We used the geometry option of the GEE frame for classification to determine the representation of training points, and used RF classifiers to compose the classified maps for 2017, 2018, 2019 and 2020. The whole processing and computation is done in Google Earth Engine using four scripts, one per year.
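A minimal sketch of this filtering step in the Earth Engine Python API (the buffer radius is an illustrative assumption; an authenticated ee session is required):

import ee
ee.Initialize()  # assumes prior ee.Authenticate()

region = ee.Geometry.Point([89.92, 24.25]).buffer(10000)  # illustrative AOI around Tangail Sadar

s2_2020 = (ee.ImageCollection('COPERNICUS/S2_SR')
           .filterDate('2020-01-30', '2020-06-30')
           .filterBounds(region)
           .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 1)))  # < 1% cloud cover

composite = s2_2020.median().select(['B4', 'B3', 'B2'])  # bands used for land cover classification
print(s2_2020.size().getInfo(), 'images in the 2020 collection')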

3.3 Methodology

First, we chose Sentinel-2A to gather satellite photos and filtered the images by fixing dates, which provides the sharpest satellite view. For a better picture, less than 1% cloud coverage was used. Then, in GEE, we imported the Tangail District shapefile. Following that, labeling was done using the Sentinel-2A satellite image. The data was then separated into two parts, training and testing, with the training portion accounting for 70% of the total and the testing portion for 30%. We used the random forest machine learning method to perform supervised classification. After that, we obtained the 2017–2020 land cover classifications, analyzed the results, and detected the urban area for Tangail District. The complete process is depicted in Fig. 2.


Fig. 2 Flowchart of the methodology

3.4 Training and Classification

The labeling of images as a feature collection was done to perform supervised classification. For training, we divided the labeled images into two parts, 70% training data and 30% testing data, and trained the RF classifier on the training data. After training, the result of the classification was shown on the map and the accuracy assessment was obtained in GEE. After that, we measured the urban area in the main city of Tangail in square kilometers. This research uses supervised classification with a random forest machine learning classifier, which simply creates multiple decision trees and merges them together for a better, more accurate and stable prediction.
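A minimal sketch of the training and accuracy-assessment step in the Earth Engine Python API (the labeled-polygon asset ID and the 'landcover' property name are hypothetical; 'composite' is the filtered image from the previous sketch):

import ee
ee.Initialize()

labels = ee.FeatureCollection('users/example/tangail_labels')  # hypothetical labeled polygons
bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7']

samples = composite.select(bands).sampleRegions(
    collection=labels, properties=['landcover'], scale=10)

split = samples.randomColumn('rand', 0)
train = split.filter(ee.Filter.lt('rand', 0.7))    # 70% training
test = split.filter(ee.Filter.gte('rand', 0.7))    # 30% testing

rf = ee.Classifier.smileRandomForest(50).train(train, 'landcover', bands)
classified = composite.select(bands).classify(rf)  # classified land cover map

cm = test.classify(rf).errorMatrix('landcover', 'classification')
print(cm.accuracy().getInfo(), cm.kappa().getInfo())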

4 Result and Analysis

Four separate GEE scripts were used to execute all of the steps mentioned in the methodology. The purpose is to evaluate and assess changes in land cover over the last four years, from 2017 to 2020. Tangail District's urban growth pattern is increasing due to its proximity to Bangladesh's capital, Dhaka, as we discussed at the beginning. We conducted supervised classification after importing and exporting the dataset in the Google Earth Engine. The classified result for the desired region was obtained after conducting supervised classification with RF (random forest) for the four years 2017, 2018, 2019 and 2020. The overall change in land in the Tangail District over the last four years is portrayed in Fig. 3; in this figure, the red zone is built-up area, the green zone is vegetation, the blue zone is water, and the orange zone is open area. In this study, the representation of the classification is based on the confusion matrix, the usual quantitative method to measure land-use/land cover classification accuracy. To give a more realistic result, the Kappa coefficient is used when the data is imbalanced. This is used to retain only those instances which


Fig. 3 Land cover changes for Tangail: 2017, 2018, 2019 and 2020

may have been correctly classified. The equations for accuracy and the Kappa coefficient are given below.

Classifier's Accuracy = (TCP/TNP) * 100

where TCP is the sum of the diagonal values of the confusion matrix and TNP is the sum of all values in the confusion matrix.

Kappa Coefficient = (OLA − ELA)/(TNP − ELA)

where OLA is the observed level of agreement, which is also the sum of the diagonal values of the confusion matrix, and ELA is the expected level of agreement, calculated by multiplying the row and column totals of the confusion matrix and dividing by TNP (Tables 1 and 2; Figs. 4 and 5).

Table 1 Confusion matrix for 2017's classification based on RF classifier

2017       | Urban | Water | Vegetation | Open
Urban      |   172 |     1 |         32 |   27
Water      |     6 |   457 |          2 |   12
Vegetation |    20 |     1 |        291 |   16
Open       |    23 |    14 |         11 | 1007


Table 2 Accuracy and Kappa coefficient for 2017–2020

                     2017     2018     2019     2020
Accuracy (%)        92.11    89.92    88.74    91.98
Kappa coefficient  0.8881   0.8352   0.7858   0.8396

Fig. 4 Chart for accuracy

Fig. 5 Chart for Kappa coefficient

2017 (Table 1):

TCP = 172 + 457 + 291 + 1007 = 1927
TNP = 172 + 1 + 32 + 27 + 6 + 457 + 2 + 12 + 20 + 1 + 291 + 16 + 23 + 14 + 11 + 1007 = 2092
Accuracy = (1927 / 2092) * 100 = 92.11%

ELA terms:
221 * 232 / 2092 = 24.5086
473 * 477 / 2092 = 107.8494
74 * 328 / 2092 = 11.6022
1062 * 1055 / 2092 = 552.19812
ELA = sum of the above terms = 696.1584


Kappa coefficient = (OLA − ELA) / (TNP − ELA) = (1927 − 696.1584) / (2092 − 696.1584) = 0.881.

Only Tangail Sadar Upazila was chosen for analyzing the urban area. The area of Tangail municipality was 29.04 km² (2017); this is the main city of Tangail, but the figure ignores built-up areas in dense and open areas. The detected urban area is 7.0009 km² for 2017, 7.0045 km² for 2018, 7.0579 km² for 2019 and 7.0894 km² for 2020. According to the survey, the urban area could be 7–9 km².

Change in urban area within four years = (7.0894 − 7.0009) km² = 0.0885 km²
Percentage increase = (0.0885 / 7.0009) * 100 = 1.26%
Average area = (7.0009 + 7.0045 + 7.0579 + 7.0894) / 4 = 7.0381 km²
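As a cross-check, the accuracy and Kappa computation can be reproduced directly from Table 1 with a short Python snippet; note that exact recomputation of the row-by-column products yields slightly different ELA terms (and a Kappa near 0.88) than the worked figures above:

```python
import numpy as np

# Confusion matrix from Table 1 (classes: Urban, Water, Vegetation, Open).
cm = np.array([[172,   1,  32,   27],
               [  6, 457,   2,   12],
               [ 20,   1, 291,   16],
               [ 23,  14,  11, 1007]])

tcp = np.trace(cm)                 # sum of diagonal values = 1927
tnp = cm.sum()                     # sum of all values = 2092
accuracy = tcp / tnp * 100         # 92.11%

# ELA: sum over classes of (row total * column total) / TNP.
ela = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / tnp
kappa = (tcp - ela) / (tnp - ela)
print(f"Accuracy = {accuracy:.2f}%, Kappa = {kappa:.4f}")
```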

5 Conclusion and Future Works

The aim of this paper is to review and quantify land cover changes over the last four years, from 2017 to 2020. GEE was used to map changes in land cover and to analyze dynamics and transitions in certain densely populated areas. The changes in land cover among 2017, 2018, 2019 and 2020 are classified by the model, together with their consistency and percentage change on an increasing scale. This study was carried out to determine the level of urbanization and the scarcity of land, using labeled points and polygons. The classifier makes better choices and can predict future growth/loss in land patterns, whether due to natural or artificial environmental considerations, using the same size of training sample. The study's drawbacks, which include a lack of statistical data and cloud cover issues with satellite data, point to further analysis that may be performed based on this research. Images from Sentinel-2 are available from 2015 onwards, and images from previous years are not available; because of this, it is possible that there are not many variations in classification. Our future studies will include forecasting changes over a longer period of time. We have used only one supervised classification technique; next, we will try other supervised classification techniques and compare their performance. We can also include some less developed districts to determine urban growth, which can be useful to policymakers.



Enabling a Question-Answering System for COVID Using a Hybrid Approach Based on Wikipedia and Q/A Pairs

Janneth Chicaiza and Nadjet Bouayad-Agha

Abstract The research on COVID-19 disease has produced much information, but there are more questions than certainties. This proposal aims to contribute by providing reliable and updated answers to questions aimed at the general public. To achieve this goal, we design a question-answering architecture that leverages two information sources of different natures, one controlled-official and one open-collaborative. Thus, the system can answer several questions that the community may have about COVID. During the experimentation, we found that thanks to knowledge graphs, information retrieval, and NLP methods, the system can provide explainable answers; i.e., users obtain direct answers and can browse enriched responses.

Keywords COVID-19 · Question-answering system · Wikipedia · Knowledge graph · NLP · Information retrieval · FAQ

1 Introduction

The research on COVID-19 disease has produced much information, but there are more questions than certainties. Scientists and authorities have many questions, and the general public is also concerned and has many questions about this topic. Keeping people well-informed is crucial to ensure the collective responsibility needed to keep the spread of the virus under control and to take care of each other. Question-answering (QA) systems help reduce misinformation or lack of information. Nowadays, thanks to advances in natural language processing (NLP), deep learning (DL), information retrieval (IR), and knowledge graphs (KGs), building a QA system has become an easier task. However, there are still issues and challenges in understanding users' requirements expressed in natural language.

J. Chicaiza (B) Universidad Técnica Particular de Loja, Loja 110105, Ecuador
N. Bouayad-Agha Faculty of Computer Sciences, Multimedia and Telecommunication, Universitat Oberta de Catalunya (UOC), Barcelona, Spain
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_22


Also, on open and distributed platforms like the Web, QA-based applications have to overcome particular challenges such as managing meaning, the lack of structure of information sources, and information volatility. In this paper, we propose a QA system to provide people with more reliable and updated information. In summary, we highlight three main contributions of our proposal. First, we designed a QA architecture that provides information of general interest about COVID and takes advantage of the nature of different data sources. Second, as a proof of concept of the proposed architecture, we build and evaluate the system's main components. And third, the community interested in replicating or testing the implemented solution can download the code and data from the COVID19-QA project available on GitHub (https://github.com/jachicaiza/COVID19-QA). Continuing with the paper, Sect. 2 presents the main approaches to building QA systems and some proposals created for COVID. Section 3 describes the system's architecture and explains the essential activities of building a proof of concept. In Sect. 4, we offer the results of the preliminary evaluation. Finally, the conclusions are presented in the last section.

2 Question-Answering Systems

2.1 Main Approaches

Question-answering is a common NLP task. According to Alqifari [1], p. 1, a QA is a "type of system in which a user can ask a question using natural language, and the system provides a concise and correct answer." Therefore, the main problem of QA is, given a set of information resources r and a user's query q, to find the best answer a for q. There are two main approaches to building a QA system [5]: information retrieval-based QA (IR-QA) and knowledge-based QA (K-QA). IR-QA applies IR techniques for processing text documents; commonly, the main source of information for the system is textual Web content. Knowledge-based systems (K-QA) build a formal representation of the query, which is used to retrieve the answers from a knowledge base. In general, a K-QA retrieves answers from databases or KGs. In addition, there is a third approach that combines both paradigms, or others, called hybrid QA. In this proposal, we choose a hybrid system that combines the ability of Q/A pairs given in FAQs to retrieve precise answers from a database with the power of IR-QA to retrieve a large amount of information and find answers in unstructured data, which is the most common on the Web. The main motivation for choosing this paradigm is to leverage two different data sources and two methods to find or extract answers.


Table 1 QA applications for the domain of COVID-19

References   Approach        Data source
[4]          IR-QA           CORD-19
[6]          K-QA + IR-QA    CORD-19, Q/As from FAQ sites, etc.
[7]          IR-QA           WHO and CDC
[10]         IR-QA           CORD-19
[11]         IR-QA           CORD-19
[12]         K-QA            WHO FAQs
[14]         IR-QA           CORD-19
[15]         K-QA            CORD-19
[16]         IR-QA           CORD-19

2.2 Question-Answering Approaches for COVID

Since the COVID-19 outbreak appeared, some initiatives have emerged to try to reduce the information gap, among them efforts focused on building QA applications. Table 1 lists nine papers that describe QA applications or services specialized in the COVID-19 domain. As seen in Table 1, the most popular type of application is IR-QA. Regarding the origin of the data, the most used source to build the systems has been CORD-19 (https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge). This means that the majority of proposals have aimed at improving access and solving the information requirements of specialists in the field of biomedicine. Only three proposals [6, 7, 12] exploit less specialized sources that could be more appropriate for the general public. In this paper, we use text sources aimed at the general public and combine the precise Q/As given in FAQs with the broader information that IR-QA can offer.

3 Design of the QA System for COVID

3.1 Overview of the Architecture

We identify two components for our proposal: (1) system data and (2) question-answering. Figure 1 shows the system architecture and its underlying tasks and data resources. The first component collects and prepares the information that the system will use. The three data creation tasks generate two types of repositories. First, the system's data repository contains Q/A pairs, metadata of data sources, and semantic descriptions that the system will use to enrich the answers. Second, the system's document

3 Design of the QA System for COVID 3.1 Overview of the Architecture We identify two components for our proposal: (1) system data and (2) questionanswering. Figure 1 shows the system architecture and its underlying tasks and data resources. The first component collects and prepares the information that the system will use. The three data creation tasks generate two types of repositories. First, the system’s data repository contains Q/A pairs, metadata of data sources, and semantic descriptions that the system will use to enrich the answers. Second, the system’s document 2

https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge.

254

J. Chicaiza and N. Bouayad-Agha

Fig. 1 System architecture diagram

repository contains dense representations of text fragments. Since COVID is a fast-changing domain, the availability of structured data will be scarce; therefore, curated content from Wikipedia is a reasonable choice to answer the latest questions about the virus. By combining the two sources of different natures (controlled-official and open-collaborative), the system can provide answers to several questions that the community has about COVID. The second component consists of three modules. First, the query processor creates vector representations of the user's question and attempts to detect similar question pairs in the system data repository. Second, the retrieval of answers is done using two engines: (1) a query engine obtains answers from the data store, and (2) a retrieval engine extracts answers from the document store. By combining two complementary approaches, the system can flexibly process different types of queries. Also, the system is able to take advantage of the structure of the data to enrich the answers that the user will receive with semantic annotations. Third, the last module is the answer processor, which integrates and filters the answers it receives from the second module and then delivers them to the users.


3.2 Implementation of a Proof of Concept

In this section, we describe some details of the implementation of a proof of concept.

3.2.1 System Data Component

To create the data sources of the system, we performed the following tasks:

1. Collection of Q/A pairs. As providers of Q/A pairs, we selected official FAQ sources about COVID-19: (1) the Dedicated Coronavirus Resource of the Centers for Disease Control and Prevention (CDC) (https://www.cdc.gov/coronavirus/2019-ncov/faq.html), (2) the WHO's Dedicated Coronavirus Resource (https://www.who.int/health-topics/coronavirus), and (3) the CNN portal (https://edition.cnn.com/interactive/2020/health/coronavirus-questions-answers/). To extract those pairs, we used the Python BeautifulSoup library. For the proof of concept, we collected more than 800 Q/A pairs.

2. Creation of the controlled vocabulary. We created a controlled vocabulary of COVID by leveraging KGs. The system uses the vocabulary to enrich, with semantic annotations, the answers that will be delivered to users. The links will allow users to find information related to their concerns. As a source of terminological information related to COVID, we use Wikidata (https://www.wikidata.org/wiki/Wikidata:Main_Page), an RDF-based KG. Accessing its SPARQL endpoint as a gateway, we apply an iterative process to collect the domain terms directly and indirectly connected to the seed node wd:Q84263196, which represents the term COVID-19 in Wikidata. After executing the queries, we obtained 1369 entities and 4635 labels related to COVID. Figure 2 illustrates how, from the seed node, we obtained a group of labels or terms related to the disease.

3. Creation of the document store. We selected Wikipedia as the source for our document store because people continually edit content related to the virus and the underlying disease. For example, so far the page entitled COVID-19 pandemic (https://en.wikipedia.org/wiki/COVID-19_pandemic) has received around 48 revisions per day. In general, Wikipedia constantly evolves, as it contains updated information on events and topics that people are interested in [2, 3]. Likewise, Wikipedia is interesting as a source of information because it can be used [8] to provide answers to any simple question [2], even in open and large-scale domains. In addition, it is easy to automatically monitor and detect changes in the content of Wikipedia pages [13]. To create the document repository, we take advantage of the SKOS concept structure available in DBpedia-Live. The first step in building the repository was to identify the pages related to COVID. The next step was to extract the content from the pages using the Wikipedia API and partition it into smaller units of information (sections and paragraphs). Finally, the last step was to index the content using a model based on dense vector representations.

Fig. 2 Partial view of the graph of terms related to COVID
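As an illustration of the vocabulary-building step (task 2), a one-hop version of such a Wikidata query can be issued with the SPARQLWrapper package; the property path below is a simplified assumption, since the described process also iterates over indirect connections:

```python
# Sketch: collect labels of entities directly connected to COVID-19 (wd:Q84263196).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper('https://query.wikidata.org/sparql')
sparql.setQuery("""
SELECT DISTINCT ?item ?itemLabel WHERE {
  { ?item ?p wd:Q84263196 } UNION { wd:Q84263196 ?p ?item }
  FILTER(isIRI(?item))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 1000
""")
sparql.setReturnFormat(JSON)
bindings = sparql.query().convert()['results']['bindings']

labels = {b['itemLabel']['value'] for b in bindings}
print(len(labels), 'candidate terms for the controlled vocabulary')
```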

3.2.2 Question-Answering Component

In general, as seen in Fig. 1, the system takes as input a user's question expressed in natural language (q_u) and returns one or more answers (A_u). To generate the most appropriate outputs, we built the following modules:

1. Query processor. The query processor is responsible for (1) creating a neural-based representation of the user's question and (2) selecting the system questions (those extracted from WHO, CDC, and CNN) that are similar to the user's. We used distilbert-base-nli-stsb-quora-ranking as the embedding model to represent sentences and identify duplicated questions. To calculate the similarity between the user's question (q_u) and those of the system (Q_FAQ), we use the util.semantic_search function of the Sentence-Transformers (SBERT) framework (https://sbert.net). A minimal sketch of this step is given after the module list below.

2. Retrieval and query engines. With q_u being the user's question, there are two courses of action for the system:


– Query engine. If there is at least one similar question (Q_FAQ) in the system with an acceptable score, then the system retrieves the answer(s) (A_FAQ) from the data store. This first case is the most straightforward course of action for the system because the Q/A pairs are stored in the data repository; by executing a query, we can obtain the answer corresponding to the similar question that was found. In addition to retrieving the response, the query engine is responsible for executing, from a set of pre-established query templates, the most appropriate ones to obtain additional data that helps enrich the output that will be presented to the user.
– Retrieval engine. If no similar system question is found for the user question, then the retrieval engine goes into operation. The goal of this engine is to find passages (passage retriever) and extract specific text units that could contain the response by using a machine reading comprehension (MRC) approach. The tasks associated with information retrieval were implemented using different methods and resources from Haystack (https://haystack.deepset.ai). Haystack allowed us to streamline the implementation process and take advantage of the potential of semantic search approaches based on pretrained embedding models. Specifically, as a passage retriever we used the dpr-ctx_encoder-single-nq-base model (https://huggingface.co/facebook/dpr-ctx_encoder-single-nq-base), and the reader (MRC) was initialized with deepset/bert-large-uncased-whole-word-masking-squad2 (https://huggingface.co/deepset/bert-large-uncased-whole-word-masking-squad2).

3. Answer processor. This module evaluates the best answers delivered by the retrieval and query engines and selects those that will be returned to the user (A_u). The best answers are those with the highest automatic scores above a given threshold. Answers are selected from the FAQs (A_FAQ) by the query engine and from Wikipedia pages (A_IR) by the retrieval engine. In addition to the automatic score for each answer, the system can use qualitative information, such as data source metadata and the semantic annotations found in the answer's text, to rank the results.
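A minimal sketch of the question-matching step and the resulting branch decision is given below; the FAQ entries are illustrative stand-ins for the scraped Q/A pairs, the 0.75 threshold mirrors Sect. 4.1, and the fallback to the Haystack retriever/reader is only indicated by a comment:

```python
# Sketch of module 1 and the query-engine decision; FAQ data is illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('distilbert-base-nli-stsb-quora-ranking')

# Stand-ins for the ~800 Q/A pairs scraped from CDC, WHO and CNN.
faq_questions = ['Can hand sanitizer protect against COVID-19?',
                 'How does COVID-19 spread?']
faq_answers = ['Answer text from the FAQ source ...',
               'Answer text from the FAQ source ...']
faq_embeddings = model.encode(faq_questions, convert_to_tensor=True)

def answer(user_question, threshold=0.75):
    """Return a FAQ answer if a similar enough question exists; else defer."""
    q_emb = model.encode(user_question, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, faq_embeddings, top_k=5)[0]
    if hits and hits[0]['score'] >= threshold:      # query engine path
        return faq_answers[hits[0]['corpus_id']]
    # Retrieval engine path: pass the question to the Haystack
    # retriever/reader pipeline over the Wikipedia document store.
    return None

print(answer('Does hand sanitizer work against the coronavirus?'))
```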

4 Preliminary Evaluation

To evaluate the prototype, we randomly selected 80 questions from two datasets: COVID-QA [9] and Qorona (https://github.com/allenai/Qorona). We analyze the system's response from two points of view: (1) the performance of a key module of the system, the query processor, and (2) the overall functioning of the system.


Fig. 3 Precision of the question selector

4.1 Performance of the Query Processor

We wanted to check the ability of the system to find questions similar to the users' ones. To achieve this objective, we manually analyzed the sentences returned by the system and their automatic scores. We obtained the first five similar sentences that the system found in the data store for each test question. The question with the highest score receives the first position (rank = 1), the question with the second-best score receives position 2, and so forth. For the 80 questions in the test dataset, we obtained 400 similar statements (Q_c) from the system. Then, we assigned each candidate question a relevance value of HIGH or LOW, depending on its similarity with the analyzed question. To obtain a quantitative measure of the performance of the query processor, we calculated the precision considering the system score (HIGH for a score ≥ 0.75 and LOW for a score < 0.75) and the qualitative score assigned by the authors during the manual evaluation. Figure 3 shows the precision reached by the question selector, considering two scenarios: (1) the quality of the question that occupies the first position (rank = 1) and (2) the correct sentences found in the top-5. In the first case, the module achieved a precision of 77.5%. Extending the analysis to questions up to the fifth position, the precision rose to just over 81%. To improve the coverage and precision of the query processor, we plan to extract data from other reliable FAQ sources and to try other NLP models.
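One simple reading of this precision computation can be scripted as follows; the relevance labels are hypothetical placeholders for the authors' manual annotations, and the exact metric used in the paper may differ in how the system scores are combined:

```python
# Hypothetical HIGH/LOW labels for the top-5 candidates of each test question.
ranked_labels = [
    ['HIGH', 'HIGH', 'LOW', 'HIGH', 'LOW'],   # test question 1
    ['LOW', 'HIGH', 'HIGH', 'LOW', 'LOW'],    # test question 2
    # ... one list per test question (80 in the paper)
]

n = len(ranked_labels)
prec_at_1 = sum(r[0] == 'HIGH' for r in ranked_labels) / n
prec_top5 = sum(l == 'HIGH' for r in ranked_labels for l in r) / (5 * n)
print(f'precision@1 = {prec_at_1:.3f}, precision in top-5 = {prec_top5:.3f}')
```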

4.2 Overall Qualitative Evaluation

By integrating all the modules of the system and considering the test question Can hand sanitizer protect against COVID-19?, the user receives answers like those shown in Fig. 4. In this figure, we can see: (1) the paragraph or context that contains the answer with the best score in Wikipedia and in the FAQs, (2) the extractive summary that the retrieval engine returns, which is highlighted, (3) the links from the most relevant domain terms to Wikipedia entities, and (4) the key metadata of the source of each response.


Fig. 4 Best answers obtained for the question “Can hand sanitizer protect against COVID-19?”

Given this result, we believe that our system will be easy to use because users will be able to ask questions using a language that they understand. In addition, thanks to the use of KG and NLP methods, the system will be able to adapt the outputs so that the user can obtain the direct answer but can also browse related information. Thanks to the hybrid approach of the system, during the evaluation we found that the system is able to answer two types of requests: interrogatives and short queries based on keywords. In both cases, the system was able to find answers in the two repositories, data and documents, although in the case of queries expressed as keywords, the task of finding answers was less successful because the users' information needs are more general and uncertain (e.g., "How to boost immune system by food" or "is coronavirus the flu").

5 Conclusion and Future Work

To cover the continuous demand for information that the community has about COVID, some initiatives have emerged focused on creating information systems such as search engines and QA systems. However, the majority of those proposals are based on specialized text corpora in the biomedical domain; therefore, they are intended to provide answers that only domain experts would understand.


In this paper, we presented the architecture of a question-answering system designed to be used by the community or the general public. In addition, we implemented the main modules of the system in order to carry out a proof of concept. For its development, we reused existing packages and frameworks such as SBERT and Haystack, which sped up the implementation process and will facilitate future maintenance tasks. During the experimentation, we detected that there are questions for which the system did not find results. Therefore, to improve the system's coverage, we plan to (1) automatically update the system's repositories, (2) collect data from other FAQ sources, and (3) expand and better clean the corpus obtained from Wikipedia. Before releasing a production version of the system, we will continue improving the proposal and evaluating its performance. In addition, since the domain of this project is a sensitive topic related to public health, we need to carry out extensive tests that include assessments made with real users.

References

1. Alqifari, R.: Question answering systems approaches and challenges. In: Proceedings of the Student Research Workshop Associated with RANLP 2019, pp. 69–75. Association for Computational Linguistics, Varna, Bulgaria (2019)
2. Chen, D., Fisch, A., Weston, J., Bordes, A.: Reading Wikipedia to answer open-domain questions. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), vol. 1, pp. 1870–1879. ACM (2017)
3. Chou, H., Lin, D., Ishida, T., Yamashita, N.: Understanding open collaboration of Wikipedia good articles. Lecture Notes in Computer Science, vol. 12195 LNCS (2020)
4. Esteva, A., Kale, A., Paulus, R., Hashimoto, K., Yin, W., Radev, D., Socher, R.: CO-Search: COVID-19 information retrieval with semantic search, question answering, and abstractive summarization. Tech. rep., arXiv (2020)
5. Jurafsky, D., Martin, J.H.: Question answering. In: Speech and Language Processing, chap. 23, pp. 1–30, 3rd edn. (2020)
6. Lee, J., Yi, S.S., Jeong, M., Sung, M., Yoon, W., Choi, Y., Ko, M., Kang, J.: Answering questions on COVID-19 in real-time. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. arXiv (2020)
7. Li, Y., Grandison, T., Silveyra, P., Douraghy, A., Guan, X., Kieselbach, T., Li, C., Zhang, H.: Jennifer for COVID-19: an NLP-powered chatbot built for the people and by the people to combat misinformation. In: Proceedings of the NLP COVID-19 Workshop (2020)
8. Lymperopoulos, P., Qiu, H., Min, B.: Concept wikification for COVID-19. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (2020)
9. Möller, T., Reina, A., Jayakumar, R., Pietsch, M.: COVID-QA: a question answering dataset for COVID-19. In: ACL 2020 Workshop NLP-COVID Submission (2020)
10. Oniani, D., Wang, Y.: A qualitative evaluation of language models on automatic question-answering for COVID-19. Tech. rep., arXiv (2020)
11. Otegi, A., Campos, J.A., Azkune, G., Soroa, A., Agirre, E.: Automatic evaluation vs. user preference in neural textual question answering over COVID-19 scientific literature. In: Proceedings of the 2020 Conference on Empirical Methods in NLP (2020)
12. Schreurs, E.: How we created an open-source COVID-19 chatbot. Tech. rep., Towards Data Science (2020)
13. Steiner, T., Verborgh, R.: Disaster monitoring with Wikipedia and online social networking sites: structured data and linked data fragments to the rescue? (2015)
14. Su, D., Xu, Y., Yu, T., Siddique, F.B., Barezi, E.J., Fung, P.: CAiRE-COVID: a question answering and query-focused multi-document summarization system for COVID-19 scholarly information management. In: Proceedings of the 2020 Conference on Empirical Methods in NLP (2020)
15. Wang, Q., Li, M., Wang, X., Parulian, N., Han, G., Ma, J., Tu, J., Lin, Y., Zhang, H., Liu, W., Chauhan, A., Guan, Y., Li, B., Li, R., Song, X., Ji, H., Han, J., Chang, S.F., Pustejovsky, J., Rah, J., Liem, D., Elsayed, A., Palmer, M., Voss, C., Schneider, C., Onyshkevych, B.: COVID-19 literature knowledge graph construction and drug repurposing report generation. Tech. rep., arXiv (2020)
16. Zhang, E., Gupta, N., Nogueira, R., Cho, K., Lin, J.: Rapidly deploying a neural search engine for the COVID-19 Open Research Dataset: preliminary thoughts and lessons learned. In: ACL 2020, NLP-COVID Workshop, pp. 1–10 (2020)

A Study of Purchase Behavior of Ornamental Gold Consumption

Shalini Kakkar and Pradnya V. Chitrao

Abstract Gold is an important asset for investment in all households, as cultural factors influence its purchase decisions. Indian consumers offer huge scope for analyzing the factors affecting gold purchase decisions. The main purpose is to study buying behavior patterns and how they are affected by cultural, social, brand awareness and economic factors. Primary data was collected in Mumbai to understand the consumer decision process by assessing consumer demographics, behavioral variables and psychographics. The value of gold has risen to great heights, transcending national, political and cultural borders to be labeled one of the ideal investments.

Keywords Ornamental gold · Purchase · Consumer behavior · Cultural

1 Introduction

The sparkling luster of gold has created its significance in almost every Indian household. Gold is admired in all forms, as it provides an umbrella in uncertain times. Its easy liquidity makes it a safe investment with assurance of appreciation. Purchasing ornamental gold is an area to explore, with a number of factors involved such as family tradition, preservation of cultural heritage, investment, gifting and ensuring family security. Gold is also consumed for investment and gifting purposes, which increases its purchase among consumers. Interest rates and gold prices are directly linked to each other. If there is an increase in the interest rate, consumers generally sell gold to obtain more cash; as a result, due to the increase in the sale of gold by consumers, there is a fall in the price of gold products.

S. Kakkar (B) PTVA's Institute of Management, Chitrakar Ketkar Marg, Behind M.L. Dhanukar College, Vile Parle E, Mumbai 400057, India
S. Kakkar · P. V. Chitrao Symbiosis Institute of Management Studies (SIMS), A Constituent of Symbiosis International Deemed University, Range Hills Road, Khadki, Pune 411020, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_23


On the other hand, if there is a reduction in the interest rate, the consumption of gold increases among consumers. Thus, it can be said that prevailing conditions such as low interest rates, festive seasons, stable political conditions, high gold reserves held by the governing body and high inflation rates tend to increase the consumption of gold by consumers. Gold purchase in India is mainly associated with cultural and religious beliefs. People buy gold ornaments for celebrating festive occasions, birth ceremonies, weddings and birthday celebrations. Gold is exchanged between relatives and offered to the deities on special occasions to make the celebrations more pious. Gold is also used as decoration, jewelry, ornament or accessory to showcase high status and position in society. It is also a source of investment that is passed on from grandparents to parents to their children. Thus, there has been a significant increase in the demand and consumption of gold in India. The consumption of gold is deeply rooted in Indian culture, with people using gold for religious connotations, family heirlooms, gifts, status and investment for ages. Gold is considered an integral part of Indian religious celebrations irrespective of creed. Consumers belonging to different communities such as Hindu, Sikh, Jain, Christian or Muslim make gold purchases for their religious celebrations. Increasing gold prices do not discourage devotees from donating gold to religious places. For example, the Tirumala temple in Andhra Pradesh receives tons of gold as offerings from devotees irrespective of the rising gold prices in the state. Gold is an essential commodity in every household and is passed down the generations by family members as a family legacy. This includes passing jewelry from mother to bride or from mother-in-law to daughter-in-law at weddings. Gold forms a sentimental part of these traditions and is cherished in Indian heritage. Moreover, passing gold from one generation to another helps in saving money and maintaining gold prices in the Indian market. Gifting gold ornaments or objects forms an essential part of Indian culture.

2 Literature Review

Khadekar and Kohad [10] in their study found that consumers view gold as a pure investment. It was seen that rural consumers demanded less gold than urban consumers because of the better financial stability of the urban people of Nagpur compared with their rural counterparts. Chaisuriyathavikun and Punnakitikashem [2] found that two major factors, buyer preference and expected future value, are significantly related to customers' intention to purchase gold ornaments. Prabhakarrajkumar and Mohandass [18] found that the bullish or bearish trend of ornamental gold prices is decided not only by the demand from buyers due to festivals, important social occasions and the feeling that gold is the best investment, but also by other external factors like the overall demand of other countries for gold, monetary policy and the inflation status of the nation.


Joseph [8] found that consumers have an inclination toward branded jewellers as compared to small gold dealers. Consumer behavior is influenced by the name, reputation and shop ambiance of the sellers, the purity of gold, etc. The influence of the gold price on consumers is subjective and depends on their occupation and income. Advertisements do not have any influence on the purchase behavior of gold consumers, but they are helpful in positioning the jeweller in the market. Napompech et al. [15] found that gender plays an important role in influencing gold consumption for savings and investments, in the aspects of yielding higher return rates than savings or investment in other types of assets and of buying and selling through government agencies. Education level and income strongly influence gold consumption. Mathivanan and Sangeetha [13] observed that a high level of customer satisfaction is important in the gold ornament industry, as it leads to repeat purchases reflecting customer loyalty. Palanichamy [17] observed that the majority of consumers buy gold as an investment. Vanitha and Saravanakumar [22] opine that investment in gold is well suited for conversion into money in the quickest possible time through banks and gold merchants, thus offering flexibility. Jain [7] finds that the purchase behavior for jewelry is influenced by design, price and information medium; four purchasing criteria emerge, namely price, comfort, jewelry design and good quality. Hundal et al. [6] said that the perceptions of an investor differ with respect to the alternative investment avenues, assets and segments present in the market; variables like profitability, tax aversion, future prospects and the time value of money motivate a retail investor to purchase gold as an investment. Praveenkumar [19] found that if shop owners give more emphasis to the quality of gold, offers and discounts, more consumers will be attracted to them; in conclusion, gold jewelry in India is very essential because of its cultural importance. Raghavan and Ahmed [20] found that, because of inflation and currency debasement, gold and commodities will become preferred portfolio asset classes, attracting pension fund managers and others to the space; the study shows that investment demand for gold is encouraged by gold market deregulation. Schoenberger [21] found that gold's value is connected to its social qualities and artificial scarcity as well as its physical qualities and natural scarcity, and that the social relations of production and consumption are bounded by this value. Liu [12] found that gold consumption and disposable income have a curvilinear relationship: as emerging markets mature and middle-class consumers' shopping baskets become more diverse and sophisticated, their gold consumption decreases and eventually stabilizes. Godbole and Sashidharan [5] observed that, according to the Howard-Sheth consumer behavior model, social factors comprising family, reference groups and social class influence consumers' buying decisions.


Lakshmi [11] shows that customers are influenced by their neighborhood and by advertisements, and that they prefer to purchase gold in corporate showrooms. Nair and Gulati [14] found that advertisements play an important role in gold jewelry promotions, while consumers' buying decisions are not influenced by moods and cultural factors. Nwankwo et al. [16] found that societies and cultures influence the consumption of luxury goods. It is evident that religion has no significant impact on the affordability of luxury goods; rather, the impact is due to the globalization of markets. The study also finds that women are more positively disposed to impulse purchasing of luxury goods than men. Asha and Christopher [1] believed that factors such as increasing consumer false beliefs, decreasing compulsive investment purchases, fascinating retail channels and competition from other expensive products have led to an upward movement of larger brands. Joseph [9] tried to show that Indian people have a great fascination with gold because of the role it has played in India's cultural heritage. Chaisuriyathavikun and Punnakitikashem [2] found a significant relation between customers' intention to purchase gold ornaments and the factors of buyer preference and expected future value. Ertimur [4], in his dissertation research, shows that gold and jewelry have been purchased for basically three main reasons: gift giving, ornamentation and investment. The author also notes that the ornamentation of gold jewelry is somewhat related to people's fashion choices.

3 Objectives

1. To discover the purchasing behavior of ornamental gold buyers in Mumbai.
2. To understand the cultural and social significance consumers associate with the possession of ornamental gold.

4 Research Methodology

Primary data was collected through a well-structured questionnaire. Sampling was done on the basis of non-probability methods. A sample of 1000 respondents was collected from Mumbai, as the city identifies itself as a cosmopolitan with heterogeneous groups of varied cultural, religious and socio-economic backgrounds. The city well represents people having different purchase behaviors for ornamental gold jewelry. The objective of the questionnaire is to analyze consumers' purchasing patterns and opinions on gold consumption. Issues related to the social and cultural significance of gold consumption, preference, frequency of purchase and need for purchase are


widely covered in the questionnaire. An extensive literature review of established journals was done to understand the broad ideas and background of the study.

5 Data Analysis and Findings

Demographic description of the sample is as follows:

Particulars            Levels                                   Percentage of sample
Gender                 Male                                     48.1
                       Female                                   51.9
Age group              25–34 years                              17.7
                       35–44 years                              36.9
                       45–54 years                              45.4
Marital status         Single                                   34.6
                       Married                                  57.4
                       Divorced                                 8.0
Education level        Higher secondary                         12.7
                       Bachelor degree                          28.6
                       Master's degree                          39.1
                       Master's degree and above                19.6
Employability status   Housewife                                13.0
                       Self-employed                            29.0
                       Employed in a public/private company     39.0
                       Retired                                  19.0
Income per year        1,000,000 and less                       9.5
                       1,000,000–1,500,000                      18.7
                       1,500,001–2,500,000                      26.6
                       2,500,001–3,500,000                      12.4
                       3,500,001–4,500,000                      17.0
                       4,500,001 and above                      15.8

Findings indicate that about:

• 31.2% of the respondents had monthly savings between 25,001 and 35,000.
• 39.0% of the respondents were purchasing once in 6 months.
• 20.0% of the respondents were purchasing gold during festival days.
• 23.9% of the respondents were purchasing gold ornaments as gifts.
• 17.3% of the respondents cited newspapers and magazines as their source of knowledge for purchasing gold ornaments.


• 44.1% of the respondents were buying branded gold ornaments available through online media.

Based on the above observations, it can be interpreted that consumers ensure that their savings are enough before they purchase gold. Generally, the preferred time of buying is the festive season, giving cultural importance to the purchase. Consumers are quite skeptical about gold transactions and hence rely on branded gold.

6 Results of Hypothesis Testing

Pearson's Chi-square tests show a significant association between each of the following pairs of variables (Pearson's Chi-square value given for each):

• Frequency of purchase and gold ornaments holding significant importance in Indian weddings: 0.047
• Preference for buying gold ornaments and gold ornaments holding significant importance in Indian weddings: 0
• Gender and preference to buy gold ornaments because they are highly liquid and easy to sell in case of emergency: 0.023
• Age group and preference to buy gold ornaments because they are highly liquid and easy to sell: 0.007
• Marital status and purchasing gold ornaments in the belief that their value will increase in future: 0.05
• Employability status and gold being a less risky option than other investment avenues: 0.025
• Income per year and purchasing gold ornaments in the belief that their value will increase in future: 0.019
• Frequency of purchase and preference to buy gold ornaments because they are highly liquid and easy to sell in case of emergency: 0.02
• Frequency of purchase and only lightweight ornaments being used daily while others are occasionally utilized: 0.001
• Age group and Akshaya Tritiya being a special occasion wherein buying gold ornaments is auspicious: 0.001
• Age group and believing that a high price means high quality: 0.005
• Income per year and price playing a vital role in influencing the purchase of gold ornaments: 0.004
• Monthly savings and Akshaya Tritiya being a special occasion wherein buying gold ornaments is auspicious: 0.038
• Frequency of purchase and Diwali being an important festival for Hindus in India, with buying gold ornaments during those days holding cultural value: 0.003
• Purpose of purchasing gold ornaments and doing thorough research on prices before purchasing gold ornaments: 0
• Preference for buying gold ornaments and Akshaya Tritiya being a special occasion wherein buying gold ornaments is auspicious: 0.006
• Preference for buying gold ornaments and believing that a high price means high quality: 0
• Preference for buying gold ornaments and promotional offers not influencing the purchase of gold ornaments: 0.009
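As an illustration of how such associations are tested, the sketch below runs Pearson's Chi-square test on a hypothetical contingency table with scipy; the counts are invented for illustration only:

```python
# Chi-square test of association between two categorical survey variables.
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical cross-tabulation: frequency of purchase (rows) vs.
# agreement that gold ornaments are important in Indian weddings (columns).
table = np.array([[120,  60],
                  [200, 140],
                  [280, 200]])

chi2, p, dof, expected = chi2_contingency(table)
print(f'chi2 = {chi2:.3f}, p-value = {p:.3f}, dof = {dof}')
# A p-value below 0.05 indicates a significant association.
```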

Based on a one-sample T-test, we conclude that:

• There is a significant association between the purchasing behavior of ornamental gold buyers and gold price movements in Mumbai.
• There is a significant association between investment size and the usage pattern of ornamental gold buyers.
• There is a significant association between investment in gold and its need for the family's future.
• There is an association between demand for ornamental gold and preference due to the cultural, religious and ritualistic values of Indian tradition.

7 Conclusion

The frequency of gold purchase is once in six months, and most consumers purchase gold during festivals. This leads to the understanding that consumers give importance to gold purchases during festive seasons. Indian traditions and customs are reflected in the fact that the purpose of gold purchase is mainly gifting. Gold ornaments are purchased mostly for weddings and gifted to children as a mark of custom carried


forward since olden days. Awareness of gold is captured through print media, and consumers rely on branded gold. Pricing plays an important part in purchase behavior, as consumers do thorough research prior to purchase. Consumers prefer to purchase gold in small quantities so that its cultural significance is maintained. Consumers feel the traditional needs of weddings are an important aspect of possessing gold. Consumers agree that they overlook increases in the gold price provided there is a reason for purchasing gold. Promotional offers do not influence consumers' reasons for gold purchase. Price and gold quality play a vital role in gold purchase. Consumers feel gold is highly liquid and can be converted into cash when required; this gives them confidence in the returns from their investment in gold. They feel investment in gold adds value in the future and is a less risky option. Gold usage is mostly lightweight ornaments, and purchase quantities are reasonable. Consumers give importance to festivals for purchasing gold. Consumers feel gold is a less risky option and believe that its value will increase in the future. Their investment in gold depends on their annual income. Consumers ensure monthly savings so that they can purchase gold on the occasion of Akshaya Tritiya, as it is considered auspicious. Indian weddings have an important cultural impact on consumers' preference for buying gold.

References

1. Asha, K., Christopher, S.E.: A study on buying behaviour of customers towards branded and non-branded gold jewellery with reference to Kanyakumari district. Int. J. Manag. 5(10), 105–114 (2014)
2. Chaisuriyathavikun, N., Punnakitikashem, P.: J. Acad. Bus. Retail Manag. 10. https://jbrmr.com/cdn/article_file/i-24_c-237.pdf
3. Chaisuriyathavikun, N., Punnakitikashem, P.: A study of factors influencing customers' purchasing behaviours of gold ornaments. J. Bus. Retail Manag. Res. 10(3), 147–159 (2016)
4. Ertimur, B.: Gold and gold jewelry: exploration of consumer practices. The Department of Management, Bilkent University, Ankara (2003)
5. Godbole, S.S., Sashidharan, G.: Will employment effect gold buying? An Indian perspective. Theor. Econ. Lett. 09(05), 1225–1234 (2019). https://doi.org/10.4236/tel.2019.95079
6. Hundal, B.S., Grover, S., Bhatia, J.K.: Herd behaviour and gold investment: a perceptual study of retail investors. J. Bus. Manag. 15(4), 63–69 (2013)
7. Jain, N.: A study on consumer buying behaviour towards traditional jewellery of Rajasthan. Res. Rev. Int. J. Multidisc. 4(12), 126–131 (2019)
8. Joseph, J.K.: Consumer behaviour in the gold jewellery market of Kerala. Int. J. Bus. Adm. Res. Rev. 1(6), 86–91 (2014)
9. Joseph, H.: A study on the effectiveness of integrated marketing communication on different brands of gold jewellery. Int. J. Res. Comm. Manag. 7(9), 77–82 (2016)
10. Khadekar, S.G., Kohad, R.: Consumer buying behavior of gold and gold jewellery of Nagpur region. IOSR J. Bus. Manag. 18(09), 01–03 (2016). https://doi.org/10.9790/487x-1809020103
11. Lakshmi, H.H.: Female customer intentions in buying gold jewelry. Int. J. Res. IT Manag. Eng. 6(2), 101–113 (2016)
12. Liu, J.: Covered in gold: examining gold consumption by middle class consumers in emerging markets. Int. Bus. Rev. 25(3), 739–747 (2016). https://doi.org/10.1016/j.ibusrev.2016.03.004
13. Mathivanan, M., Sangeetha, D.: An analytical study on buyers' satisfaction towards purchasing gold ornaments. Adalya J. 8(9), 636–641 (2019)
14. Nair, S.S., Gulati, M.G.: Understanding the effect of cultural factors on consumers' moods while purchasing gold jewelry. Adv. Market. Custom. Relationsh. Manag. E-Services, 298–318 (2019). https://doi.org/10.4018/978-1-5225-5690-9.ch014
15. Napompech, K., Tanpipat, A., Ueatrakunkamol, N.: Factors influencing gold consumption for savings and investments by people in the Bangkok metropolitan area. Int. J. Arts Sci. 3(7), 508–520 (2010)
16. Nwankwo, S., Hamelin, N., Khaled, M.: Consumer values, motivation and purchase intention for luxury goods. J. Retail. Consum. Serv. 21(5), 735–744 (2014). https://doi.org/10.1016/j.jretconser.2014.05.003
17. Palanichamy, C.: Buying behaviour of women towards gold jewellery in Erode City, Tamilnadu. JAC J. Compos. Theory 12(12), 659–666 (2019)
18. Prabhakarrajkumar, K., Mohandass, S.: Gold price trend and investigation of purchasing patterns of ornamental gold buyers in Tamil Nadu. Asia Pacific J. Res. 1(17), 21–32 (2014)
19. Praveenkumar, S.: Buying behavior of consumers towards gold jewellery in Madurai District, Tamil Nadu. Int. J. Res. Human. Arts Lit. 7(1), 95–102 (2019)
20. Raghavan, A.S., Ahmed, N.N.: Passion for ornamental gold jewellery in India. Int. J. Enterprise Innov. Manag. Stud. (IJEIMS) 2(2), 125–136 (2011)
21. Schoenberger, E.: Why is gold valuable? Nature, social power and the value of things. Cult. Geogr. 18(1), 3–24 (2011). https://doi.org/10.1177/1474474010377549
22. Vanitha, S., Saravanakumar, K.: The usage of gold and the investment analysis based on gold rate in India. Int. J. Electr. Comput. Eng. (IJECE) 9(5), 4296–4301 (2019)

Circularly Polarized Microstrip Patch Antenna for 5G Applications

Sanjeev Kumar, A. Veekshita Sai Choudhary, Aditya Andotra, Himshweta Chauhan, and Anshika Mathur

Abstract A compact square-shaped patch antenna with circular slots working in the 5G frequency range is proposed in this paper. The novel design is a circularly polarized antenna operating at 28.2 GHz within the 5G frequency range. The substrate material used for the design is Rogers RT/Duroid 5880, having a relative permittivity (εr) of 2.2 and a thickness of 1.575 mm. The feed mechanism chosen for this design is an inset microstrip line. The positioning and length of the inset feed resulted in a good return loss of − 21.24 dB at 28.2 GHz. There are three circular slots, one having a radius of 1.8 mm and the other two having a radius of 0.5 mm. The proper separation of these slots resulted in a high axial ratio bandwidth: the axial ratio bandwidth of the proposed antenna is a wide 1200 MHz, from 27.8 to 29 GHz. The proposed antenna operates over a wide bandwidth of 26.8–29.5 GHz. The simulated gain of the antenna is 3.5 dB. The VSWR of the antenna is 1.5, showing that the power transmission is quite good. This paper covers the entire design topology utilized to achieve circular polarization at high frequency, with large axial ratio bandwidth, high return loss and good impedance matching. Hence, the antenna is suitable for 5G applications.

S. Kumar (B) · A. Veekshita Sai Choudhary · A. Andotra · H. Chauhan · A. Mathur Department of Electronics and Telecommunications, Symbiosis International (Deemed University), Pune, India e-mail: [email protected] A. Veekshita Sai Choudhary e-mail: [email protected] A. Andotra e-mail: [email protected] H. Chauhan e-mail: [email protected] A. Mathur e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_24


Keywords 5G · Square patch · Wide bandwidth · Wide axial ratio bandwidth · Circular polarization · Circular slots · Feed mechanism · Operating frequency · Impedance matching · Return loss · Gain

1 Introduction

The 5G technology has become one of the most important sources of economic and technical development in the world [1]. India has been allocated the license to use the 5G spectrum of 24–30 GHz [2]. Circular polarization is the most widely used feature in the design of antennas for the 5G spectrum due to its great number of advantages, such as a much better line of sight than linear polarization, a higher rate of absorption of the radio signal and resistance to signal degradation. The motivation for designing this antenna is its structural and functional flexibility: a smaller size makes it suitable for use in communication equipment that has size limitations, and circular polarization allows this equipment to maintain communication while in motion. By combining these and other desirable features, the proposed antenna will be well adjusted for many applications. In [3], it is described that one of the most general ways of obtaining circular polarization is to have a square patch truncated symmetrically at the corners at an angle of ± 45°. In [4], a pair of unequal slots is placed on the patch in a cross-position, and the HMS technique is used to generate circular polarization. An antenna with a central patch, two rectangular patches parallel to the central patch separated by a calculated length, and a rectangular slot in the ground plane resulted in the generation of a circularly polarized antenna [5]. In [6], it is stated that for single-feed configurations, the perturbation techniques involve inserting slots, truncating corners, introducing slits into the patch and loading stubs on the boundary. In [7], a circularly polarized miniaturized antenna is designed with rectangular slots, reducing the size of the patch antenna by 44.8%. In [8], the antenna utilizes an L-probe feed; the achievable bandwidth given was 27% at a −10 dB return loss. The reason for choosing the inset feed technique is that it provides ease of design and fabrication along with simplicity in modelling [9]. The feedline should be positioned diagonally to the perturbations so that it generates the two orthogonal modes required for circular polarization radiation [10]. A compact-sized circularly polarized microstrip patch antenna with an inset feedline is proposed in this paper. The proposed antenna gives a high axial ratio bandwidth with proper return loss and impedance matching. Section 1 introduces different circularly polarized antennas along with the reasons for selecting specific parameters in the proposed design. Section 2 explains the design and workflow of the proposed antenna. Section 3 presents the results and their analysis. Section 4 concludes the paper.


2 Antenna Design and Geometry

2.1 Workflow of Proposed Antenna

Detailed information on the design procedure followed to achieve the proposed design is summarized in the flowchart in Fig. 1. In the flowchart, one can find the different parameters that play a crucial role in optimizing the impedance, return loss, axial ratio and AR bandwidth of the antenna.

Fig. 1 Flow chart description of the design process


2.2 Antenna Geometry

Here, the parameters of the proposed circularly polarized antenna working at 5G frequency are discussed. The design starts with a microstrip patch that operates at 28.2 GHz over a conventional ground plane. A square patch with a length and width of (7 × 7) mm² is designed. The upper left and lower right corners of the square patch are truncated at an angle of 45° and are almost symmetrical. The substrate used here is Rogers 5880, having a dielectric constant εr = 2.2 and a thickness of 1.575 mm. The length and width of the inset feedline are 3 mm and 1 mm, respectively. The radius of circular slot 1 is 1.8 mm. The other two circular slots have the same radius of 0.5 mm, and their importance is discussed in later paragraphs; these two slots are separated by 2.6 mm. The length of the truncated corners is 2 mm. Figure 2 shows the microstrip patch antenna designed in HFSS Version 13. The dimensions of the proposed patch antenna are listed in Table 1.

Fig. 2 Microstrip patch antenna design

Table 1 Measurements of various dimensions

Dimensions                              Measurements (mm)
a (Length of the patch)                 7
r1 (Radius of circle 1)                 1.8
r2 = r3 (Radius of circles 2 and 3)     0.5
h (Height/thickness of substrate)       1.575
l (Length of truncated corners)         2
L (Length of the inset feedline)        3
W (Width of the inset feedline)         1
Lsub (Length of ground plane)           13.03
Wsub (Width of ground plane)            12.03


3 Results and Analysis

Axial ratio is the most important parameter for checking the circular polarization of an antenna. The proposed antenna has achieved an axial ratio of less than 3 dB over the frequency range 27.8–29 GHz. This is a very wide axial ratio bandwidth, indicating that the E- and H-planes of this antenna lose minimal signal power as energy is shifted between planes by the circular polarization. Figure 3 shows the plot of the axial ratio in dB.

S-parameters are generally used to express the relationship between the input and output ports of the antenna. For wireless communication, the return loss is expected to be below −10 dB over the operating bandwidth. Figure 4 shows the return loss plot. The proposed antenna has achieved a return loss of −21.24 dB, resonating at 28.2 GHz.

Fig. 3 Plot of axial ratio in dB

Fig. 4 Plot of the return loss


VSWR values describe how well the antenna is matched to the transmission line connected to it. The VSWR should be as low as possible; its ideal value is 1 and, in practice, it lies between 1 and 2. Figure 5 shows the plot of VSWR: the proposed antenna has achieved a VSWR of 1.524 at 28.2 GHz.

Parameters such as beamwidth, sidelobe level and gain can all be determined from one quantity, the radiation pattern. The basic information obtained from the radiation pattern is how the gain is spread across angles, that is, the angular distribution of radiation. Figure 6 shows the 2D polar plot of the radiation pattern of the proposed design; the plot shows a peak gain of almost 3.5 dB. Side lobes are undesirable in any antenna, but in practice they do exist and cause unwanted transmission and reception of the signal; the proposed design has only a small side lobe, which is quite acceptable for an antenna working in the 5G spectrum.
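For reference, return loss and VSWR are linked through the reflection coefficient |Γ|, so either figure of merit can be derived from the other. The short Python sketch below (not part of the original HFSS workflow) evaluates this standard relation; note that the −10 dB matching threshold used for wireless communication corresponds to a VSWR of roughly 1.9, consistent with the practical 1–2 range mentioned above.

    import math

    def vswr_from_return_loss(rl_db: float) -> float:
        """VSWR from return loss (positive dB): |Gamma| = 10**(-RL/20)."""
        gamma = 10 ** (-rl_db / 20.0)
        return (1 + gamma) / (1 - gamma)

    def return_loss_from_vswr(vswr: float) -> float:
        """Return loss (positive dB) from VSWR: |Gamma| = (VSWR-1)/(VSWR+1)."""
        gamma = (vswr - 1) / (vswr + 1)
        return -20.0 * math.log10(gamma)

    print(vswr_from_return_loss(10.0))  # ~1.92, the usual -10 dB matching threshold
    print(return_loss_from_vswr(2.0))   # ~9.5 dB, so VSWR = 2 sits near that threshold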

Fig. 5 Plot of VSWR

Fig. 6 2D polar plot of radiation pattern


Fig. 7 3D polar plot of radiation pattern

Figure 7 shows the 3D polar plot of the radiation pattern, which is close to the ideal polar plot. From these figures, we can observe that the proposed antenna radiates properly. The H-plane and E-plane depend on the width of the patch and the height of the substrate, respectively. The H-plane plot indicates the magnitude intensity. Figure 8 shows the H-plane radiation; from it, we can see that the signal loss is small, only between 5 and 10 dB. The co-polarization and cross-polarization can be seen clearly in the H-plane figure. Figure 9 displays the E-plane radiation, which depicts the electric field intensity of the proposed antenna; the electric field distribution in the design is quite satisfactory. Figure 10 shows the E-field distribution of the proposed design. The intensity is higher near the small circular slots and towards the edge of the big circular slot, resulting in a proper distribution of the electric field. The gain versus frequency plot is shown in Fig. 11, where the achieved gain is almost 3.5 dB, which is quite satisfactory for an antenna working at such a high frequency.

Fig. 8 H-plane radiation pattern

Fig. 9 E-plane radiation pattern

Fig. 10 Electric field distribution of the proposed design

Fig. 11 Plot of gain versus frequency


Fig. 12 Plot of peak gain versus frequency

The peak gain versus frequency plot is shown in Fig. 12, where the peak gain is 6.5 dB at 28.2 GHz. Hence, at 28.2 GHz, the concentration of the input power in the direction of the main beam is acceptable.

4 Conclusion

This paper has presented a novel design of a circularly polarized microstrip patch antenna with an inset feed, resonating at 28.2 GHz and operating over the frequency range 26.8–29.5 GHz. The proposed design achieved a 3 dB axial ratio bandwidth of 1200 MHz and a peak gain of 6.5 dB at 28.2 GHz, with a reflection coefficient of −21.24 dB and a well-distributed electric field across the patch. From the simulated results, we conclude that the proposed design can be used in various 5G applications such as mobile communications, intelligent transportation systems, and augmented reality (AR) and virtual reality (VR) applications. In future work, a metamaterial surface could be added to increase the efficiency of the antenna.

References

1. Kumar, S., Dixit, A.S., Malekar, R.R., Raut, H.D., Shevada, L.K.: Fifth generation antennas: a comprehensive review of design and performance enhancement techniques. IEEE Access (2020). https://doi.org/10.1109/ACCESS.2020.3020952
2. Qualcomm_tech: Global update on spectrum for 4G and 5G (2020). www.qualcomm.com
3. Sung, Y.: Investigation into the polarization of asymmetrical-feed triangular microstrip antennas and its application to reconfigurable antennas. IEEE Trans. Antennas Propag. 58(4)
4. Verma, M.K., Kanaujia, B.K., Saini, J.P., Saini, P.S.: A broadband circularly polarized cross-slotted patch antenna with horizontal meandered strip (HMS). De Gruyter (2019). https://doi.org/10.1515/freq-2019-0113


5. Jian, R., Chen, Y., Chen, T.: Compact wideband circularly polarized antenna with symmetric parasitic rectangular patches for Ka-band applications. Hindawi Int. J. Antennas Propag. (Article ID 2071895) (2019)
6. Sahal, M., Tiwari, V.N.: Review of circular polarization techniques for design of microstrip patch antenna. ResearchGate (2015)
7. Mak, K.M., Lai, H.W., Luk, K.M., Chan, C.H.: Circularly polarized patch antenna for future 5G mobile phones. IEEE Access (2014). https://doi.org/10.1109/ACCESS.2014.2382111
8. Islam, M.T., Misran, N., Shakib, M.N., Zamri, M.N.A.: Circularly polarized microstrip patch antenna. Inf. Technol. J. 9(2), 363–366 (2010)
9. Paul, L.C., Hosain, M.S., Sarker, S., Prio, M.H., Morshed, M., Sarkar, A.K.: The effect of changing substrate material and thickness on the performance of inset feed microstrip patch antenna. Am. J. Netw. Commun. 4(3), 54–58 (2015). https://doi.org/10.11648/j.ajnc.20150403.16
10. Garg, R., Bhartia, P., Bahl, I., Ittipiboon, A.: Microstrip Antenna Design Handbook. Artech House Antennas and Propagation Library, London (2001)

An Approach Towards Protecting Tribal Lands Through ICT Interventions K. Rajeshwar and Sonal Mobar Roy

Abstract The Panchayats (Extension to the Scheduled Areas) Act of 1996, widely known as PESA, was passed to place the scheduled areas in nine states under the jurisdiction of the national Panchayati Raj framework. The act was drafted in accordance with traditional tribal self-regulation by entrusting the Gram Sabha with special authority not spelled out for the national Panchayati Raj Institutions (PRIs). This study illustrates how information and communication technology is utilized to safeguard tribal lands and investigates an approach to the computerization of tribal land records that builds in transparency and accountability. The article also concludes with a suggested land information system model that would encourage the exchange of land-ownership information among tribal groups. Land constitutes the cornerstone of all economic growth, which can only be properly achieved if there is knowledge about the land; therefore, all stakeholders, such as planners and administrators, need land information. However, collecting land data has always been a costly endeavour, and hence information is insufficient in most places. The article suggests a land information management system (LIMS) paradigm that encourages data exchange among diverse land administration stakeholders. Keywords ICT · PESA · LIMS · Tribals · Land rights

1 Introduction It has been well noted that among the STs in our nation, poverty and landlessness are endemic. According to the 2011 census, 51% of all STs are below the income threshold compared with 40.2% on the national average. In addition, 65% of the STs are landless as well. Land alienation is the key element in the suffering of tribals. K. Rajeshwar (B) · S. M. Roy National Institute of Rural Development & Panchayati Raj, Hyderabad, India e-mail: [email protected] S. M. Roy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_25



There are 100 million tribal people living in India, who have constitutionally been addressed through two separate pathways, namely the fifth and sixth schedules. For the tribal groups, the Panchayats (Extension to the Scheduled Areas) Act of 1996 (PESA) has been a landmark. According to the 2011 census, the tribes form 8.6% of the country's population and live mostly in the fifth and sixth schedule areas, which cover around 15% of the territory of the nation. The recommendation of the Bhuria Committee led to the passage of the PESA Act in parliament, and the act came into force on 24 December 1996 [2]. The Panchayats (Extension to the Scheduled Areas) Act of 1996 is a concise and somewhat unconventional piece of legislation, generally known as the PESA Act or PESA, and one of India's very few political laws with the potential to develop certain areas and uplift certain groups of people. It pertains to a geopolitical system outlined for tribal territories, excluding the north-eastern tribal districts. These regions are referred to as Schedule V areas, or simply scheduled areas, and PESA respects the sociocultural and political rights of the indigenous populations who live there. The Indian Constitution protects the identity and rights of the Scheduled Tribes through a number of its provisions, in Articles 15, 16, 19(5), 23, 26, 46, 164, 243(M), 244, 275, 330, 332, 334, 335, 338-A, 339, 342 and 366(25), as well as through the fifth and sixth schedules annexed to the constitution. PESA synthesizes the spirit and the mandates of two significant constitutional articles, Articles 243 and 244. By virtue of the 73rd Constitutional Amendment Act of 1992, a standard and statutory three-tier Panchayati Raj system, in contrast to the traditional village panchayat, was institutionalized and included in the constitution as Part IX (Panchayats). Although the amendment laid down path-breaking provisions for an effective local self-government scheme in the country's mainstream, non-scheduled areas, it was not automatically extended to scheduled or tribal areas, in view of their unique characteristics and special needs. Clause 4(a) authorized the legislature of a state, if it so desired, to extend the amendment to its tribal areas by passing a resolution. Under Clause 4(b), parliament was empowered to apply the amendment to the scheduled or tribal areas by means of a statute embodying the required amendments and exemptions. The key facets of the fifth schedule that distinguish the administration of the scheduled areas include the governors' reports to the president on the management of the scheduled areas, the Tribes Advisory Council, the governor's special legislative powers, and the president's power to declare and redefine the scheduled areas. Digital and media resources have been useful in minimizing community marginalization, yet they are missing from the central and state tribal development agendas. In accordance with Article 275(1) of the constitution, the tribal sub-plan offers special central assistance for investment in development programmes for tribals alone. More than 14 Tribal Research Institutes provide development input for necessary policy programmes to the relevant agencies. The Tribal Cooperative Marketing Development Federation of India Limited is the core agency for providing sustainable goods and services in terms of revenue and livelihood.
There are roughly 192 integrated tribal development projects (ITDAs) in 19 states and Union Territories.


There is an absolute lack of attention in all these efforts to the integration of information and communication technologies (ICT). The benefits of digital inclusion are much needed by tribal people. Tribal clusters need to concentrate on digital skills and literacy, and there is an urgent demand for digital means for communities to connect and gain access. Digital technologies will make tribal goods and services easier to sell. Do we need a distinct policy or action plan for tribal areas, such as an e-tribal or ICT action plan? Taking Karbi Anglong as an instance, it is crucial to underline that tribal inclusion should take place simultaneously with digital integration, treating connectivity and access as key to demand and services. This approach may contribute to development equity. India's efforts towards an advanced knowledge-based society and economy cannot ignore the millions of tribal residents of India.

1.1 Role of ICT

Information and communication technologies have benefited humanity by enhancing working culture, providing better public service delivery and better government interaction with companies and industries, and empowering citizens through easy access to information and public participation in decision-making. All in all, effective administration within the government system enables service centres to be located close to stakeholders. The idea of e-governance was developed to lessen the distance between the government and its stakeholders and to establish a transparent environment. ICT has established a simpler government service delivery system for huge segments of people from various geographical regions. This has contributed to a more effective and efficient administration by decreasing communication costs and enhancing transparency in the operations of the many government ministries. It allows individuals to work with basic applications, such as online form filling, billing and payments, as well as with complex applications such as distance education and tele-medicine.

Land records are the crucial instrument for assigning and settling land titles. For land taxation, reforms and administration, land records are essential. Manual records do not fulfil actual demands and cannot properly gather and analyse the data that are vital for land markets. Keeping land records was once the task of the so-called village accountants. Most of these village accountants, however, were not readily accessible, and even when farmers could reach them, the accountants would demand a bribe to tell landowners what they were entitled to. For mutations, i.e. where land records have to be altered, the process is lengthy and complicated and may take around a year or two. The establishment of an effective land information system has therefore been a serious challenge for the central and state governments of India. In the following section of the paper, the authors discuss the various initiatives taken up for land protection.


2 E-Governance Initiatives Towards Land Protection

Bhoomi: Land records automation (State Government of Karnataka). It offers a computerized record of rights, tenancy and crops (RTC), required by farmers for bank loans, settling land disputes and so on. It has also ensured greater openness and trustworthiness and a substantial decrease in the corruption, exploitation and persecution of farmers. Twenty million rural farmers have profited from this effort, involving 6.7 million farms. Bhoomi supports the computerization of all 20 million land ownership records of 6.7 million farmers. It was created exclusively for the State of Karnataka, and the regional language, Kannada, predominates; however, it is accessible to the general public as well. To support the Bhoomi initiative, 177 taluks and 203 kiosks have been set up. The kiosks (Bhoomi centres) provide the RTC online for just Rs. 15/-, a very minimal cost. Obtaining a record of rights is highly efficient, taking only 5–30 min, whereas the previous method took roughly 3–30 days. A mutation is completed within 35 days, whereas it took at least 200 days under the traditional manual procedure. The distribution of land records is quite high (nearly 14 million records), and 1.6 million mutations are processed every year compared with the prior system. Bhoomi enables simple and swift access to land records, high recording accuracy and rapid mutation; it offers high record reliability and a preponderance of the local language, allowing citizens to interact and acquire services at relatively little cost.

CARD: The Computer-Aided Administration of the Registration Department (CARD) project (State Government of Andhra Pradesh) has affected 10 million residents over three years. It has completed the registration of 2.8 million titles, with 1.4 million instances of title queries. The technology guarantees transparency in property appraisal and an effective method of document management. The estimated savings of 70 million person-hours for citizens were valued at US$35 thousand (against a CARD investment of US$6 million). The CARD project was intended to fully computerize Andhra Pradesh's land registration procedure. The Registration Act of 1908 did not provide for the use of computers in registration, but the Government of Andhra Pradesh amended this law and permitted the use of electronic devices for the land registration procedure. Within a space of three years, about 90% of registration transactions in Andhra Pradesh took place online. The CARD project transformed the registration procedure for legal documents in 214 offices across Andhra Pradesh State. Citizens embraced the CARD initiative due to its quality and the reduced time needed to complete the registration procedure.

2.1 Strengthening of Revenue Administration and Updating of Land Records Scheme

The Strengthening of Revenue Administration and Updating of Land Records (SRA & ULR) scheme was established in 1987–88 to assist states and UTs in updating, maintaining and strengthening land records, establishing and strengthening survey and settlement organizations


and survey training infrastructure, modernizing survey and settlement operations, and enhancing the revenue machinery. Funding under the SRA & ULR scheme was shared between the centre and the states in a 50:50 ratio, with 100% central aid given to UTs. The scheme covers buildings (including training centres, hostels, Patwarghars and record rooms); new surveying equipment (including modern survey equipment such as GPS and aerial survey for quick and efficient surveying); maps; storage equipment; and records digitization technology.

2.2 Computerization of Land Records (CLR) Scheme

The CLR scheme was initiated in 1988–89 with pilot projects in eight areas and was then expanded to cover the rest of the nation. The major purpose of the scheme was to guarantee that landowners could obtain digital copies of their records on demand. Under this arrangement, the states and UTs were provided with 100 per cent financial aid. The operations covered by the CLR scheme include data entry, the establishment of computer centres at taluk/tehsil/block/circle level and sub-divisional level, computer sensitization training and the digitization of cadastral maps.

2.3 National Land Records Modernization Programme

In August 2008, the Cabinet approved the merger of the foregoing initiatives into the "National Land Records Modernization Programme" (NLRMP). The programme aimed to establish the principles of the Torrens system. It rests on fundamental principles: (i) a single land registry agency (covering textual record maintenance and updates, maps, settlement and survey operations, registration of mutations in real estate, etc.); (ii) the "mirror" principle, that the land registers reflect the ground reality at any given moment; and (iii) the "curtain" principle. The work programme of the guidelines includes modifications to the Registration Act and the establishment of a "model legislation for conclusive titles" [1]. However, the act has little to say about how the new model law is to be applied, how the more vulnerable parties are safeguarded or how the "curtain" and "mirror" principles are maintained. It should be noted that this is just model legislation, which states are free to adopt, as land forms part of the State List.


3 Proposed Tribal Land Information Management System

The tribal land information management system attempts to combine the duties of land management. The system should include, on the same platform, features such as land use planning, processing of plot applications, plot allocations, land use change, plot registration and title transfer, sub-divisions, sub-leasing/sub-letting, development control and compliance, acquisition and compensation, and land board revenues [7].

Data gathering: Data on existing land holdings must be collected methodically, which is undoubtedly a tremendous task. In order to optimize data collection in the presently developed regions, two things must happen: (a) data identifying existing boundaries must be collected, with a systematic adjudication mechanism where such data are not available; and (b) information must be collected on the ownership, kind of use and original purpose for which the property was allotted [4].

Creating boundaries: Boundaries carry certain transaction costs, implying that the precision of a boundary is dictated by the nature and use of the region in question [9]. Given that the land boundaries or drawings essential for customary grants are not established, the Department of Surveys and Mapping may make use of orthophotographs or DXF or DWG drawings. In order to provide an economically active area with the basis for the GIS, the Department of Surveys and Mapping has prepared 1:5000 digital map sheets. GIS software was used to convert a digitized map into an ArcView shapefile.

Data to be gathered: A customary law claim applicant must furnish a land board with the following information in compliance with Tribal Land Regulation 6(1): (a) full name and postal address, (b) marital status, (c) ward in which the property is being sought and (d) nature of the right sought, and so on. In addition, proper tribal land registries should contain the plot number, allocation date of the plot, name and address of the allottee, plot description (e.g. number) and location (e.g. ward name, map reference/coordinates). A method is presented to enable land boards to collect this information and build appropriate registers, as well as to improve data exchange and reduce duplication [3].

LIS: In order to address existing land management deficiencies, the planned land information system (LIS) will make linked data accessible to all stakeholders, with differing levels of access and rights to change data. The success of the LIS is contingent on ownership of the land information system by all stakeholders, who hold the key to its financing, design, implementation and administration. To confirm that the system functions correctly for all interested parties, they must all satisfy minimum data requirements, and these requirements serve as the basis for defining which attribute data are needed during the data collection phase. As indicated above, the basic facts required by the tribal land regulations have to be collected.
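The minimum record implied by Tribal Land Regulation 6(1) and the register fields listed above maps naturally onto a simple data structure. The following is a hypothetical Python sketch of such a register record; the field and class names are illustrative and not drawn from any existing system.

    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    @dataclass
    class TribalPlotRecord:
        """One register entry combining Regulation 6(1) applicant data
        with the plot attributes a land board is expected to hold."""
        applicant_name: str
        postal_address: str
        marital_status: str
        ward: str                      # ward in which the property is sought
        nature_of_right: str           # e.g. residential, arable, grazing
        plot_number: Optional[str] = None
        allocation_date: Optional[date] = None
        map_reference: Optional[str] = None   # coordinates / map sheet reference

    # Example entry as it might be captured at a sub-land board
    record = TribalPlotRecord(
        applicant_name="A. Applicant",
        postal_address="P.O. Box 1, Example Village",
        marital_status="married",
        ward="Example Ward",
        nature_of_right="residential",
    )

A structure of this kind makes the minimum data requirements explicit at capture time, so incomplete records can be flagged at the sub-land board before being forwarded to the central database.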


Structure: All key actors in land management should be connected and open under the new LIS paradigm, with a single database managed by the Ministry of Tribal Affairs. The Ministry of Finance would be responsible for land taxes and the collection of revenue. Public access to the final system should also be provided, though limited, so that citizens can learn about development and land availability in their respective areas. Within the Ministry of Tribal Affairs, the Department of Surveys and Mapping would be responsible for providing coordinate data, aerial photography, digital photography and other data relating to the initial digitization of plot boundaries and the subsequent allocation of the plot numbers to be used in the system. The agency would also be responsible for reconciling surveyed tribal territory that has been converted to common law. Within the same ministry, the Town and Country Planning Board, whose function is to properly plan and zone localities as planning areas, could identify allocated areas, zone them and provide other information relevant to planning issues based on the data gathered.

Accessibility: As land attributes change, the relevant departments will have access to, and be responsible for, their maintenance and validity, while other stakeholders will have read-only access. It is envisaged that the database will be accessible from, and linked to, the main land boards around the country for the exchange and sharing of information. At the zonal level, the main land boards would also be linked to sub-land boards and would operate as the main data retention and verification centre for the sub-land board level before data are forwarded to the central database. At the sub-zonal level, sub-land boards would be provided with data gathering technology capable of collecting attribute data relating to geographic location, ownership, legal status, land use/zoning and so on, and of verifying plot size and shape.

Data security: Access at different levels within the system would need to be regulated in order to protect the security and authenticity of any information obtained. Only the sub-zonal level would be able to download and send data to the zonal level. Data collection units could input new data and modify data fields but could not change information already in the database. Data verification and exchange between zonal and central databases would provide inter-zonal updates. This functionality would enable the collection and updating of the database in cases where tenants resided or worked at a location other than the property and the initial data collection was incomplete.

The data transfer procedure: For field data collection, data recording equipment would be used that can download a GIS-style overlay of linked and geo-referenced aerial and orthophoto imagery, together with digital polygons indicating plot boundaries. An equivalent duplicate of the data is saved in the database for straightforward use by stakeholders. The data recording device can also upload all the acquired data into the database, removing the need for manual data entry. This technique would lower the danger of human error in the data transfer process and boost the reliability of the data obtained. Digitizing polygons and the preliminary assignment of identification numbers, linking database details to a particular piece of land, must be handled by the main land boards in each zone. The photographs are used simply to identify the areas being processed. On-site photography will be used to quickly orient the user and to ensure that the digitized polygons conform to the reality of the ground.


The polygon plot overlay should be used to activate a collection of linked database tables containing all the attributes necessary for efficient land management; on the ground, any coordinate bounded by the polygon sides would then be retrievable.

Information gathering in the field: The initial data collection for the LIS would involve both the land board and private surveyors. Both would at this stage play a role in speeding up the data collection process and ensuring that the digitized polygons match the boundaries on the ground. It is acknowledged that any data collection endeavour involves a trade-off between quality, speed and cost [4]. However, a quick data collection approach is proposed here that would have implications for both human and financial resources. The data collection application should have GIS capabilities and the ability to alter a polygon's shape to match the features on the ground. Data dictionaries with the minimum data to be collected will be transferred to the data loggers to guarantee that the necessary data are not overlooked during collection. Since most noteworthy towns have been classed as planning areas, the current practice is to assign plots only after the land has been demarcated. These data are easy to include in the database, as parcels of land can be identified from the coordinates and parcel numbers supplied by the revenue department. This would happen at the same time as the initial data collection where no data are currently available. Once the initial data gathering process has been completed, all ongoing data collection and input is the responsibility of the sub-land boards. Editing, verification, updating and changing of the data will take place at the zonal level.

4 Discussion and Conclusion

While a distinct legal and administrative structure, in the form of the fifth and sixth schedules of the constitution, has been put in place for the protection of tribals' rights to land and for affirmative action, the tribals nonetheless remain the most vulnerable, marginalized and poor section of society. PESA has emerged as the most significant law that can play a role in recognizing tribal rights to their natural habitat and resources in scheduled areas, thereby improving their quality of life. PESA should be seen as a path-breaking act to empower the tribal peoples. A beneficial aspect of the act is that the cultural and traditional characteristics of indigenous tribes are taken into account. It is one of the most significant acts in India that acknowledges and promotes the indigenous rights of tribes over their natural resources. Generating awareness is a prerequisite for implementing its requirements in accordance with the PESA Act. The land information system suggested here should form the foundation of all spatial data and be utilized in rural tribal land management [5]. The government has embarked on two systems for managing land, namely the state land information management system and the tribal land information management system. It is hoped that the two systems being built will be organized such that access to the information they hold will be available not only to technocrats in the field of land management but also to all those who work on the land and who are interested in the land.


The architecture suggested here is meant to help a variety of land information users share data and improve data accessibility.

References

1. https://prsindia.org/files/bills_acts/bills_parliament/1970/Report-of-the-Committee-to-Draft-Model-Act-Rules-and-Regulation-on-Conclusive-Land-Titling.pdf
2. Census of India: Census of India, Office of the Registrar General & Census Commissioner, New Delhi (2011)
3. Fourie, C.: Comments: Designing viable land administration. Paper presented at (2002)
4. Longley, P., Goodchild, M., Maguire, D., Rhind, D.: Geographic Information (2001)
5. Natural Resources Services: Review of Botswana National Land Policy (2003)
6. Tembo, E., Simela, J.V.: Optimizing land information management in tribal lands of Botswana (2004)
7. Tembo, E., Manisa, M., Maphale, L.: Land information management in (2001)
8. World Bank Report
9. World Bank: Land Policies for Growth and Poverty Reduction. Oxford University Press (2003)

Urban Sprawl Assessment Using Remote Sensing and GIS Techniques: A Case Study of Ernakulam District Sreya Radhakrishnan and P. Geetha

Abstract Urban sprawl is seen as a huge global challenge that primarily affects emerging countries with large populations. The outcomes of unplanned city growth into adjacent rural areas are the uncontrolled wiping out of natural resources and fluctuations in land surface temperature. This, in turn, influences climate change by causing heatwaves, droughts, and other extreme weather events. The present study uses the spatial analytic techniques of RS and GIS to assess urban sprawl in the Ernakulam District of Kerala, India. Determining the Land Use/Land Cover (LULC), Normalized Difference Built-up Index (NDBI), Normalized Difference Vegetation Index (NDVI), and Land Surface Temperature (LST) helps to better understand how built-up areas and vegetation affect the surface temperature. Observations using Landsat 8 images from 2014 and 2020 showed that built-up areas expanded by 2%, while waterbodies/wetlands decreased by 1%. During the same period, the maximum LST climbed from 34.3 °C in 2014 to 43.16 °C in 2020. Without proper urban planning, increasing surface temperature coupled with unchecked urban growth will harm the environment and quality of life. Hence, an effective urban growth plan is required to limit urban sprawl. Keywords Urban sprawl · LULC · NDVI · LST · NDBI · Ernakulam District · Kerala · India

1 Introduction Urban sprawl can be defined as the rapid growth of towns and cities. It results from the requirement to accommodate a growing metropolitan population and is S. Radhakrishnan (B) Center for Wireless Networks and Applications (WNA), Amrita Vishwa Vidyapeetham, Amritapuri, India P. Geetha Center for Excellence in Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_26



linked to increased energy consumption, pollution, and traffic congestion. This causes degradation of wildlife habitat and the splitting up of remaining natural regions [1]. Urban sprawl majorly impacts developing countries, and India is no exception. The South Indian state of Kerala saw slow urbanization until 2001 and rapid urbanization by 2011, according to the Census 2011 report. Kerala had a roughly 48 percent urban share in 2011 and the greatest rate of urban population increase over the 2001–2011 decade. Kerala's districts are seeing a significant increase in urban population [2], with Ernakulam district showing the most urbanized growth during the 2001–2011 decade.

Remote sensing (RS) and GIS technology, together with verified ground truth data, can identify, map, and evaluate the physical appearance and patterns of urban sprawl on landscapes. Mapping urban sprawl is one of the applied aspects of RS, independent of the spatial and spectral resolutions of the sensors [3]. Land cover refers to how water bodies, vegetation, and settlements are geographically distributed on the surface of the Earth, while land use refers to the human modification or management of the land. Land use/land cover (LULC) is an outcome of socio-economic activities and naturally occurring changes on Earth. When urban areas grow at the cost of transforming wetlands and forests, the Earth's surface has more impermeable area, which affects the hydrologic cycle [4–6].

Land surface temperature (LST) is a critical metric in the energy balance and the evapotranspiration process. It captures energy variations as well as interactions between the Earth and the atmosphere. LST differs from surface air temperature in that it reflects the heating and cooling of the Earth's surface itself [7]. In this paper, an urban sprawl evaluation for the Ernakulam district is performed, revealing the relations between LST and LULC. Kochi, Ernakulam's principal port city, is claimed to have its own urban heat island. This research aids in determining the effect of the district's urbanization on LST.

2 Data Requirements

2.1 Study Location

Ernakulam district, located in the State of Kerala, lies on the Arabian Sea coast in southwestern India. It has an area of 3064 km2, lying between 9° N and 10° N latitude and 76° E and 77° E longitude (see Fig. 1). The district has a population of approximately 3,427,659 as per the Annual Vital Statistics Report of 2018 [8]. It is bordered on the north by Thrissur District, on the east by Idukki District, on the south by Alappuzha and Kottayam Districts, and on the west by the Arabian Sea shoreline. According to the 2011 District Urbanization Report [9], the district is 47.56% urbanized. The district's administrative divisions include two revenue divisions located at Fort Kochi and Muvattupuzha. It is divided into seven taluks, with a total of 124 revenue villages.


Fig. 1 Study location

The district comprises one Municipal Corporation, 8 Municipalities, 15 Block Panchayats, and 88 Grama Panchayats. The district's physiography comprises highland formed by part of the Western Ghats, midland made up of plains, backwaters, and rivers, and finally lowland territory near the coast. The district has a tropical humid climate with scorching summers from March to May. It receives two seasonal rainfalls, the South-West monsoon from June to September and the North-East monsoon from October to November/mid-December; the remaining months are mainly dry. The main rivers in the district are the Periyar and the Muvattupuzha. Ernakulam district ranks fourth in the state in terms of literacy rate and is the state's second most urbanized district.

2.2 Data Sources

The necessary satellite imagery was obtained from the USGS (United States Geological Survey) website. Landsat 8 OLI/TIRS imagery for the years 2014 and 2020, acquired during the relatively cloud-free months of January and March respectively, was downloaded. The images had very little cloud cover (less than 5%), which makes the LULC and LST analysis more accurate. The district boundary shapefile was generated in the ArcGIS Pro software by georeferencing the district map created by the Petroleum and Natural Gas Regulatory Board, which was available online (Table 1).


Table 1 Details of Landsat 8 OLI/TIRS data

Satellite            Acquisition date   Path/row   Resolution of spectral band (m)   Resolution of thermal band (m)
Landsat 8 OLI/TIRS   26-01-2014         144/53     30                                100
Landsat 8 OLI/TIRS   31-03-2020         144/53     30                                100

2.3 Image Preprocessing

The Landsat 8 imagery consists of 11 spectral bands, where Bands 1–7 and 9 have a resolution of 30 m, Band 8 is a panchromatic band with 15 m resolution, and Bands 10 and 11 have a resolution of 100 m. Newly acquired remotely sensed data must be preprocessed before any analysis can be performed on the image [10]. The preprocessing steps included the following:

1. Both Landsat datasets were geometrically corrected to a single map reference system—UTM Zone 43 N, with the WGS84 geodetic datum.
2. A composite view of the 11 bands was produced from both datasets.
3. The area of interest was subsetted from the composite image using a shapefile of the district boundary derived from the district boundary map created by the Petroleum and Natural Gas Regulatory Board.
4. Depending on the application, the required radiometric calibration (converting DN to reflectance values) and atmospheric corrections (DOS—Dark Object Subtraction) were performed on the datasets using the ENVI 5.5.2 software.

Data preprocessing is done to rectify the radiometric and geometric errors and also the sensor errors, which occur due to variations in lighting of the scene and viewing geometry, sensor noise, atmospheric conditions, and responses. Hence this radiometric correction methodology is essential [10].
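The DN-to-reflectance conversion in step 4 was performed in ENVI in this study; for illustration, the same rescaling can be expressed directly. The sketch below uses the standard Landsat 8 TOA reflectance rescaling (multiplicative factor 2.0 × 10⁻⁵ and additive factor −0.1, corrected by sun elevation); the sun elevation shown is a placeholder, as the real value must be read from the scene's MTL metadata file.

    import numpy as np

    # Standard Landsat 8 TOA reflectance rescaling factors (from the MTL file);
    # SUN_ELEVATION_DEG below is a placeholder for the actual scene value.
    REFL_MULT, REFL_ADD = 2.0e-5, -0.1
    SUN_ELEVATION_DEG = 55.0

    def dn_to_toa_reflectance(q_cal: np.ndarray) -> np.ndarray:
        """Convert raw digital numbers to sun-elevation-corrected TOA reflectance."""
        rho_prime = REFL_MULT * q_cal + REFL_ADD
        return rho_prime / np.sin(np.radians(SUN_ELEVATION_DEG))

    band4 = np.array([[7500, 8200], [9100, 8800]], dtype=float)  # tiny sample DN patch
    print(dn_to_toa_reflectance(band4))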

2.4 Normalized Difference Vegetation Index (NDVI)

The NDVI index, defined in terms of the near-infrared (NIR) and red bands, was used to assess the vegetation characteristics of the study region. NDVI values range from −1 to +1, with higher values suggesting dense vegetation and lower values suggesting non-vegetated areas [11]. The NDVI index is calculated as follows [11]:

NDVI = (NIR − Red)/(NIR + Red)    (1)

Band 5 of the Landsat 8 imagery captures the NIR part of the EM spectrum, which is strongly reflected by plants, and Band 4 collects the red light of the EM spectrum, which is significantly absorbed by plants (see Figs. 2 and 3).


Fig. 2 NDVI map for Ernakulam District in the year 2014

Fig. 3 NDVI map for Ernakulam District in the year 2020


The combination of LST and NDVI has been analyzed in several urban sprawl-related studies [10, 11] as well as in studies monitoring agricultural drought [12]. The NDVI was calculated using the ArcGIS Pro 2.7 software.

2.5 Normalized Difference Built-Up Index (NDBI)

The NDBI index is defined in terms of the short-wave infrared (SWIR) and near-infrared (NIR) bands (see Figs. 4 and 5). NDBI values range from −1 to +1, with negative values indicating non-built-up areas and positive values indicating a greater chance of built-up areas in the environment. The index is calculated as follows [13]:

NDBI = (SWIR − NIR)/(SWIR + NIR)    (2)

Band 6 of the Landsat 8 imagery captures the SWIR part of the EM spectrum, at wavelengths of 1.57–1.65 µm, and Band 5 collects the NIR region of the EM spectrum. Several studies have used a combination of NDVI and NDBI to quantify the vegetated areas and man-made built-up areas in their study regions, and techniques that address the limitations of using this index to map urban regions have also been developed [14–16]. The NDBI was calculated using the ArcGIS Pro 2.7 software.
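Although both indices were computed in ArcGIS Pro in this study, Eqs. (1) and (2) reduce to simple band arithmetic. The following sketch shows an equivalent computation on NumPy arrays; the band values are synthetic placeholders, and a small guard masks zero-denominator pixels.

    import numpy as np

    def normalized_difference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        """Generic (a - b)/(a + b) with zero-denominator pixels masked to NaN."""
        denom = a + b
        return np.where(denom != 0, (a - b) / np.where(denom == 0, 1, denom), np.nan)

    # Synthetic reflectance patches standing in for Landsat 8 bands
    red  = np.array([[0.10, 0.25], [0.08, 0.30]])   # Band 4
    nir  = np.array([[0.45, 0.30], [0.50, 0.28]])   # Band 5
    swir = np.array([[0.20, 0.35], [0.18, 0.40]])   # Band 6

    ndvi = normalized_difference(nir, red)    # Eq. (1): high over vegetation
    ndbi = normalized_difference(swir, nir)   # Eq. (2): positive over built-up areas
    print(ndvi)
    print(ndbi)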

Fig. 4 NDBI map of Ernakulam District for the year 2014


Fig. 5 NDBI map of Ernakulam District for the year 2020

2.6 The Land Surface Temperature (LST)

The LST is characterized as the temperature sensed by one's hand when touching a land surface [17] (see Figs. 6 and 7). LST has been a major topic for the development of space-based measurement approaches, since it is an important element of the land surface. LST is used in many scientific fields, such as global climate change, hydrology, land use/land cover for urban studies, and agriculture [18, 19].

Fig. 6 LST map of Ernakulam District for the year 2014, with a highest temperature of 34.3 °C and a lowest temperature of 10.7 °C

Fig. 7 LST map of Ernakulam District for the year 2020, with a highest temperature of 43.18 °C and a lowest temperature of 16 °C

LST can be calculated from the Landsat 8 imagery by following the steps described by the USGS [18]:

1. Calculating TOA—Input the Landsat 8 Band 10 imagery into the ArcGIS Pro software. Using the Raster Calculator and values from the metadata of the dataset, the following expression is calculated:

   TOA(L) = ML * Qcal + AL    (3)

   where ML stands for the band-specific multiplicative rescaling factor, Qcal is the Band 10 image, and AL stands for the band-specific additive rescaling factor [18].

2. Converting TOA to Brightness Temperature (BT) in Celsius—The preceding step's spectral radiance value is converted to BT using thermal constant values from the dataset's metadata:

   BT = K2/ln(K1/L + 1) − 273.15    (4)

   where K1 and K2 stand for the band-specific thermal conversion constants.

3. Calculate the NDVI—The NDVI for the imagery is calculated as described in Sect. 2.4. The NDVI is significant here because it feeds the calculation of the proportion of vegetation (PV), which in turn feeds the calculation of the emissivity (ε).

4. Calculating PV—The proportion of vegetation is computed using the equation below; the maximum and minimum NDVI measurements are used to determine how much of the land is vegetated:

   PV = ((NDVI − NDVImin)/(NDVImax − NDVImin))^2    (5)

5. Calculating emissivity (ε)—A proportionality factor that scales blackbody radiance to predict the emitted radiance [20]. The following equation is used [21]:

   ε = 0.004 * PV + 0.986    (6)

6. Calculating the emissivity-corrected LST (TS)—The emissivity-corrected LST is calculated as follows [18]:

   TS = BT/{1 + [(λ * BT/ρ) ln ε]}    (7)

Here, λ is the wavelength of the emitted radiance, taken as 10.8 µm for Band 10, and ρ = h * (c/σ) = 1.438 × 10−2 m K, where σ is the Boltzmann constant (1.38 × 10−23 J/K), h is Planck's constant (6.626 × 10−34 J s) and c is the speed of light (2.998 × 108 m/s).
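The six steps above were executed in the ArcGIS Pro Raster Calculator in this study; an equivalent NumPy sketch is given below for illustration. The rescaling and thermal constants shown are typical Landsat 8 Band 10 values and stand in for the scene-specific values that must be read from the MTL metadata file; following the paper's convention, BT is in °C when substituted into Eq. (7).

    import numpy as np

    # Illustrative Landsat 8 Band 10 constants; read the real values from the MTL file.
    M_L, A_L = 3.342e-4, 0.1              # radiance rescaling factors, Eq. (3)
    K1, K2 = 774.8853, 1321.0789          # thermal conversion constants, Eq. (4)
    LAMBDA = 10.8e-6                      # emitted wavelength for Band 10 (m)
    RHO = 1.438e-2                        # h * c / sigma (m K)

    def lst_from_band10(q_cal: np.ndarray, ndvi: np.ndarray) -> np.ndarray:
        """Steps 1-6: Band 10 DN -> TOA radiance -> BT (deg C) -> corrected LST."""
        radiance = M_L * q_cal + A_L                                   # Eq. (3)
        bt = K2 / np.log(K1 / radiance + 1.0) - 273.15                 # Eq. (4)
        pv = ((ndvi - ndvi.min()) / (ndvi.max() - ndvi.min())) ** 2    # Eq. (5)
        emissivity = 0.004 * pv + 0.986                                # Eq. (6)
        return bt / (1.0 + (LAMBDA * bt / RHO) * np.log(emissivity))   # Eq. (7)

    # Tiny synthetic patch: Band 10 DNs and a matching NDVI grid
    q_cal = np.array([[26000.0, 27500.0], [28000.0, 30000.0]])
    ndvi = np.array([[0.2, 0.5], [0.6, 0.1]])
    print(lst_from_band10(q_cal, ndvi))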

2.7 Land Use/Land Cover (LULC)—Supervised Classification

An FCC image with the band combination 5-4-3 (NIR-Red-Green) was used for the LULC study. The Support Vector Machine (SVM) supervised classification technique was used to perform object-based classification on the segmented image in the ArcGIS Pro software. SVM is a machine learning algorithm that can provide efficient classification accuracy [22]. The training samples were chosen based on the FCC color signatures of Earth surface features [23]. The NRSC Classification Schema of 2011 was used to choose the various LULC types in the study region [24]. The following classes were delineated based on the classification schema and the color signature of each feature: Built-up, Barren land/Unculturable/Wasteland, Agriculture/Plantation (Cropland, Fallowland), Waterbody/Wetlands (Wetland Vegetation), and Forest (Deciduous, Evergreen, and Mixed) (see Figs. 8 and 9).
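The classification itself was run with the ArcGIS Pro SVM classifier; the sketch below is a hypothetical open-source equivalent using scikit-learn, with synthetic pixel samples standing in for the real training data digitized from the FCC image.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score, cohen_kappa_score

    CLASSES = ["Waterbody", "Forest", "Barrenland", "Agriculture", "Builtup"]

    # X: per-pixel band values (e.g., NIR, Red, Green of the 5-4-3 FCC);
    # y: class labels from training polygons. Synthetic placeholders here.
    rng = np.random.default_rng(0)
    X = rng.random((1000, 3))
    y = rng.integers(0, len(CLASSES), 1000)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print("overall accuracy:", accuracy_score(y_te, pred))
    print("kappa:", cohen_kappa_score(y_te, pred))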


Fig. 8 Map of LULC for Ernakulam District in the year 2014

Fig. 9 Map of LULC for Ernakulam District in the year 2020



2.8 Accuracy Assessment

Accuracy assessment is a key step in classification work since it indicates the quality of the information obtained from the RS imagery. Accuracy assessments are performed both quantitatively and qualitatively [25]. In this case, a quantitative assessment was performed by calculating the user accuracy, producer accuracy, and overall accuracy of the information. Finally, the kappa coefficient, which measures the accuracy of the classification, was determined [26]. Because of the limitations on collecting ground truth points owing to the current pandemic situation, a random sampling approach was adopted to produce 400 points on the map, and the classified map was compared with the Google Earth high-resolution timeline imagery, which served as the reference map for the study area (Tables 2 and 3).

Table 2 Accuracy assessment points chosen for the district for the year 2014

                  Waterbody  Forest  Barrenland  Agriculture  Builtup  Total (user)  User accuracy (%)  Producer accuracy (%)
Waterbody            68        12        0            0          0          80             85                  97.4
Forest                2        93        0            7          0         102             91.9                83.03
Barrenland            0         0       52            0         17          69             75.3                82.5
Agriculture           0         7        0           70          0          77             90.9                90.9
Builtup               0         0       11            0         61          72             84.7                78.20
Total (producer)     70       112       63           77         78         400

Overall Accuracy = 86%, Kappa Coefficient = 82.3%

Table 3 Accuracy assessment points chosen for the district for the year 2020

                  Waterbody  Forest  Barrenland  Agriculture  Builtup  Total (user)  User accuracy (%)  Producer accuracy (%)
Waterbody            65        10        0            0          0          75             86.6                92.8
Forest                5        88        0           12          0         105             83.8                77.8
Barrenland            0         0       56            0          4          60             80                  91.9
Agriculture           0        15        0           80          0          95             84.21               86.9
Builtup               0         0        5            0         60          65             92.3                93.75
Total (producer)     70       113       61           92         64         400

Overall Accuracy = 87.25%, Kappa Coefficient = 83.8%
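The overall accuracy and kappa coefficient in Tables 2 and 3 follow directly from the confusion matrices. As a check, the short sketch below recomputes them from the 2014 matrix of Table 2; it reproduces the reported values up to rounding.

    import numpy as np

    # Confusion matrix from Table 2 (2014): rows = classified, columns = reference.
    cm = np.array([
        [68, 12,  0,  0,  0],   # Waterbody
        [ 2, 93,  0,  7,  0],   # Forest
        [ 0,  0, 52,  0, 17],   # Barrenland
        [ 0,  7,  0, 70,  0],   # Agriculture
        [ 0,  0, 11,  0, 61],   # Builtup
    ])

    n = cm.sum()
    po = np.trace(cm) / n                                # observed agreement: 0.86
    pe = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2  # chance agreement
    kappa = (po - pe) / (1 - pe)
    print(f"overall accuracy = {po:.2%}, kappa = {kappa:.3f}")  # 86.00%, ~0.824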


3 Results and Discussion

According to the LULC analysis, over the study period (2014–2020) the built-up area expanded by 2%, waterbody/wetlands decreased by 1%, agriculture/plantation decreased by 3%, barren land/unculturable/wasteland increased by 2%, and forest cover improved by 1%. The loss of agricultural/plantation area and the increase in barren land/unculturable/wasteland may be owing to the high levels of soil erosion induced by the Kerala floods of 2018. In a study on flood-induced soil erosion [27], researchers found that the soil erosion rate in Ernakulam increased by 95%. This rise in erosion indicates agricultural damage resulting from waterlogging and the washing away of topsoil by the flood. High rates of erosion are caused by high rainfall and also by the conversion of natural land into settlement areas [27].

The NDVI and NDBI indices confirm the LULC classification results. Higher NDVI values are found in areas with good forest cover and agriculture. For the year 2020, an increase in forest cover in the southeast portion of the study area is visible on the NDVI map, is readily observed on the LULC map, and is supported by lower temperatures on the LST map. Similarly, high NDBI values are observed precisely where built-up regions have increased; this is also supported by the higher temperatures shown on the LST maps. The increase is most noticeable in the southwestern region of the study area.

The maximum recorded temperatures on the LST maps for 2014 (34.3 °C) and 2020 (43.16 °C) occur precisely in the areas with the most built-up area, particularly close to the major towns/cities of Kochi, Ernakulam, Aluva, Perumbavoor and Muvattupuzha, and the Nedumbassery airport. In both years, forest ranges and areas with high green cover appear to have held back the rise in land surface temperature. The difference between the lowest recorded temperatures of 2014 and 2020 is around 5 °C, and the difference between the maximum recorded land surface temperatures of 2014 and 2020 is around 9 °C. This rise in maximum recorded temperature could be attributed to the growth of urban heat islands coupled with the effects of climate change over the years. Concrete structures emit a lot of heat, and according to a news report, Kochi is evolving into a concrete jungle as urbanization increases, developing its own urban heat island [28]. This is caused by the extraordinary heating of metropolitan regions due to excessive built-up area and infrastructure. According to a recent study by academicians from the Indian Institute of Technology Kharagpur, the urbanized regions of Kochi saw significant growth and dispersion between 2002 and 2013, with urban cover increasing from 17% in 2002 to 23% in 2013 (Table 4).


Table 4 Area-wise statistics during the study period

S. No.   LULC type                            Area 2014 (km2)   Area 2020 (km2)   Change in %
1        Builtup                              409.6             443.5             +2
2        Waterbody/wetlands                   236.9             212.07            −1
3        Forest                               1166.2            1212.8            +1
4        Agriculture/plantation               1181.8            1067.3            −3
5        Barrenland/unculturable/wasteland    79.99             138.9             +2

4 Conclusion

The increase in densely populated areas and industrial zones produces hotspots of urbanization, leading to urban sprawl development. Scientists recommend that regulating developmental projects, conserving existing wetlands, and maintaining vertical gardens on metro pillars can trap the heat [28]. Studies also show that both the configuration and the composition of green space contribute to urban heat islands [29]. Urban sprawl, a complex type of urban development, is a major contributor to several significant difficulties that cities face, among them greenhouse gas emissions, air pollution, and traffic congestion, all of which contribute to climate change [30]. Climate change is partly to blame for the increased air surface temperature in Kerala during 2020: according to one report, Kerala was sweltering during the second week of February 2020 due to the exceptional heating of the sea and land caused by global warming, with temperatures rising 2–3 °C above normal [31]. Reducing the impact of urbanization on climate change is critical for long-term development. Focused policy action at all levels of government is urgently needed to direct urban growth towards more sustainable paths. This is critical for attaining the United Nations Sustainable Development Goals and for achieving the Paris Climate Agreement targets [30].


References

1. Urban sprawl. https://www.britannica.com/topic/urban-sprawl. Last accessed 26 May 2021
2. Das, S., Laya, K.S.: Urbanization and development in Kerala. Int. J. Appl. Res. 2, 586–590 (2016)
3. Rahman, A., Aggarwal, S.P., Netzband, M., Fazal, S.: Monitoring urban sprawl using remote sensing and GIS techniques of a fast growing urban centre, India. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 4(1), 56–64 (2010)
4. Mamtha, R., Jasmine, N.M., Geetha, P.: Analysis of deforestation and land use changes in Kotagiri Taluk of Nilgiris District. Indian J. Sci. Technol. 9, 44 (2016)
5. Prasad, G., Ramesh, M.V.: Spatio-temporal analysis of land use/land cover changes in an ecologically fragile area—Alappuzha District, Southern Kerala, India. Nat. Resour. Res. 28(1), 31–42 (2019)
6. Prasad, G., Rajesh, R., Arun, K.: Land use pattern as an indicator of sustainability: a case study. In: 10th Annual International Conference on Industrial Engineering and Operations Management. IEOM Society International. ISSN, pp. 2169–8767 (2020)
7. Land Surface Temperature. https://land.copernicus.eu/global/products/lst. Last accessed 20 May 2021
8. http://www.ecostat.kerala.gov.in/images/pdf/publications/Vital_Statistics/data/vital_statistics_2018.pdf. Last accessed 15 May 2021
9. http://townplanning.kerala.gov.in/town/wp-content/uploads/2019/04/dur_ernakulam.pdf. Last accessed 15 May 2021
10. Yasin, M.Y., Yusof, M., Nisfariza, M.N.: Urban sprawl assessment using time-series LULC and NDVI variation: a case study of Sepang, Malaysia. Appl. Ecol. Environ. Res. 17(3), 5583–5602 (2019)
11. Karakuş, C.B.: The impact of land use/land cover (LULC) changes on land surface temperature in Sivas city center and its surroundings and assessment of urban heat island. Asia-Pac. J. Atmos. Sci. 55(4), 669–684 (2019)
12. Jasmineniketha, M., Geetha, P., Soman, K.P.: Agricultural drought analysis for Thuraiyur Taluk of Tiruchirappali District using NDVI and land surface temperature data. In: 2017 11th International Conference on Intelligent Systems and Control (ISCO), pp. 155–159. IEEE (2017)
13. Hussain, S., Mubeen, M., Ahmad, A., Akram, W., Hammad, H.M., Ali, M., Masood, N., et al.: Using GIS tools to detect the land use/land cover changes during forty years in Lodhran district of Pakistan. Environ. Sci. Pollut. Res. 1–17 (2019)
14. Macarof, P., Statescu, F.: Comparison of NDBI and NDVI as indicators of surface urban heat island effect in Landsat 8 imagery: a case study of Iasi. Present Environ. Sustain. Dev. 11(2), 141–150 (2017)
15. Chen, L., Li, M., Huang, F., Xu, S.: Relationships of LST to NDBI and NDVI in Wuhan city based on Landsat ETM+ image. In: 2013 6th International Congress on Image and Signal Processing (CISP), vol. 2, pp. 840–845. IEEE (2013)
16. He, C., Shi, P., Xie, D., Zhao, Y.: Improving the normalized difference built-up index to map urban built-up areas using a semiautomatic segmentation approach. Remote Sens. Lett. 1(4), 213–221 (2010)
17. Rajeshwari, A., Mani, N.D.: Estimation of land surface temperature of Dindigul district using Landsat 8 data. Int. J. Res. Eng. Technol. 3(5), 122–126 (2014)
18. Avdan, U., Jovanovska, G.: Algorithm for automated mapping of land surface temperature using LANDSAT 8 satellite data. J. Sens. 2016 (2016)
19. Shanmugapriya, E.V., Geetha, P.: A framework for the prediction of land surface temperature using artificial neural network and vegetation index. In: 2017 International Conference on Communication and Signal Processing (ICCSP), pp. 1313–1317. IEEE (2017)
20. Jiménez-Muñoz, J.C., Sobrino, J.A., Gillespie, A., Sabol, D., Gustafson, W.T.: Improved land surface emissivities over agricultural areas using ASTER NDVI. Remote Sens. Environ. 103(4), 474–487 (2006)

Urban Sprawl Assessment Using Remote Sensing …

307

21. Gis Crack. https://giscrack.com/how-to-calculate-land-surface-temperature-with-landsat-8images/. Last accessed 20 May 2021 22. Babu, M.J., Geetha, P., Soman, K.P.: Classification of remotely sensed algal blooms along the coast of india using support vector machines and regularized least squares. Ind. J. Sci. Technol. 9, 30 (2016) 23. https://jogamayadevicollege.ac.in/uploads/1586347159.pdf. Last accessed 15 May 2021 24. https://bhuvan-app1.nrsc.gov.in/2dresources/thematic/2LULC/lulc1112.pdf. Last accessed 15 May 2021 25. Accuracy Assessment. https://gis.humboldt.edu/OLM/Courses/GSP_216_Online/lesson6-2/ index.html. Last accessed 20 May 2021 26. Halmy, M.W.A., Gessler, P.E., Hicke, J.A., Salem, B.B.: Land use/land cover change detection and prediction in the north-western coastal desert of Egypt using Markov-CA. Appl. Geogr. 63, 101–112 (2015) 27. Research Matters. https://researchmatters.in/news/slipping-away-surface-impact-kerala-2018floods-soil-erosion. Last accessed 24 May 2021 28. The Hindu. https://www.thehindu.com/news/cities/Kochi/rapid-urbanisation-giving-rise-toheat-islands-in-kochi/article25809737.ece. Last accessed 24 May 2021 29. Maimaitiyiming, M., Ghulam, A., Tiyip, T., Pla, F., Latorre-Carmona, P., Halik, Ü., Sawut, M., Caetano, M.: Effects of green space spatial pattern on land surface temperature: implications for sustainable urban planning and climate change adaptation. ISPRS J. Photogramm. Remote. Sens. 89, 59–66 (2014) 30. https://www.oecd.org/environment/tools-evaluation/Policy-Highlights-Rethinking-Urban-Spr awl.pdf. Last accessed 26 May 2021 31. Down To Earth. https://www.downtoearth.org.in/news/climate-change/blame-climate-changefor-a-sweltering-kerala-69354. Last accessed 26 May 2021

Exploring the Means and Benefits of Including Blockchain Smart Contracts to a Smart Manufacturing Environment: Water Bottling Plant Case Study O. L. Mokalusi , R. B. Kuriakose , and H. J. Vermaak

Abstract Blockchain smart contracts are increasingly becoming prominent, with applications in all spheres of industry. The main advantages of blockchains that promulgate their use are the increased security, transparency, and traceability that they offer to users. This article explores how a blockchain smart contract can be incorporated into a smart manufacturing environment and what its plausible benefits are. This article is written based on research done on the case study of a Make-to-Order water bottling plant. The current challenge that necessitated this research stems from customers' need for transparency on the contents of the bottled water, date of filling, date of packaging, and elimination of third-party deliveries. The hypothesis of the research is that these challenges can be overcome by incorporating blockchain smart contracts into the water bottling plant. Keywords Industry 4.0 · Smart manufacturing · Supply chain · Blockchain · Smart contracts

1 Introduction Traditional manufacturing plants have always been looking at ways in which operational processes can be optimized to yield greater profits in minimal time [1]. Studies show that this has been hindered by factors such as the varying interests of stakeholders and complex business processes [2]. The intricacies of the traditional manufacturing environment are accentuated when moving to a smart manufacturing environment. The challenges that existed previously

O. L. Mokalusi (B) · R. B. Kuriakose · H. J. Vermaak Central University of Technology, Bloemfontein, Free State, South Africa R. B. Kuriakose e-mail: [email protected] H. J. Vermaak e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_27


move on to more complex platforms with the inclusion of Cyber Physical Systems (CPS), Cloud manufacturing, and Machine to Machine (M2M) communication [3]. This research focuses on a Make-to-Order (MTO) smart manufacturing plant which produces bottled water [4]. The customer orders are collected through a cloud server and executed using three Smart Manufacturing Units which operate in a decentralized manner using a Smart Manufacturing protocol [4]. Currently, customers have no means of knowing the source and contents of the water, the plant where it was bottled, and track the delivery process. This research looks at how blockchain smart contracts can be included in the water bottling plant to address the afore mentioned challenges. The paper initially gives a background on the current technologies used to mitigate the challenge described in the introduction and the principle of operation of blockchains and Smart contracts. It then focuses on the case study used in this research before looking at the proposed methodology used to include blockchain smart contracts in the case study. As the research is still in its infancy, it is rounded off with the expected results and possible future work.

2 Background

2.1 Industry 4.0
Industrial automation is undergoing a major transformation with the evolution of the Fourth Industrial Revolution, or Industry 4.0, which dictates that smart manufacturing machines need to be self-managing, self-configuring, self-organizing, self-optimizing, and self-enforcing [1]. These factors [2] result from the need of most technology organizations to promote collaboration and heighten the proliferation of automated production processes, from producing a product from raw materials to delivering it to a customer in the traditional supply chain [3]. However, traditional manufacturing processes suffer from a lack of provenance, data disintegration, and various regulatory policies in the supply chain which prohibit product traceability [1]. Provenance is defined as the place of origin or the earliest known history of something [4]; ideally it is tamperproof, and its integrity across the whole, or part, of a manufacturing and supply chain can be verified. Traceability [5] is defined as the capability to track a product's origin or course of development [6], which matters as markets become global and promote collaboration between organizations. Verifying product provenance and traceability is often limited to a simple printing on the product which indicates the date of manufacturing and expiry. In most cases, the data in the printed barcode is encoded in a serialized numeric format, making it difficult for customers to track the product's origin or history. This was evident with a listeriosis outbreak in South Africa, where several people


lost their lives [7] due to the fact that the contaminated meat could not be traced back to its origin in time to effect a product recall. Currently, some of the technologies used to mitigate this scenario are Radio Frequency Identification (RFID), Near-Field Communication (NFC), barcodes, Digital Twins and non-line-of-sight technologies [3]. The use of RFID technology facilitates control and transparency in the production process flow, enabling the implementation of data analysis for efficient production process control [8]. However, RFID product tracking methods possess drawbacks, since they are connected to a local database [8]. This means customers without access to the local database cannot verify product provenance or trace the product in the supply chain. RFID product tracking also carries the added risk of data being overwritten, corrupted, or completely erased. These threats and risks [9] defeat the purpose of "provenance" and "traceability", as the history or origin of the product or process may be altered. Another possible solution may be the use of a Digital Twin [10]. Digital Twins can replicate the physical assets and the production process flow of a smart manufacturing plant. However, the issues of "provenance" and "traceability" still persist, as the history or origin of the product or process remains prone to data alteration within a Digital Twin.

2.2 Blockchain The principle of operation of a blockchain can be used to positively impact the challenges of provenance and traceability in a supply chain. Bitcoin and blockchain implementation was originally published in a whitepaper by Satoshi Nakamoto on October 31, 2008 [11]. The aim of blockchain is to create a decentralized system that eliminated third parties and cannot be manipulated (see Fig. 1). Blockchains were developed as an open-source project with Bitcoin as a cryptocurrency, while blockchain as a distributed database. A blockchain is a writeonly distributed database, which means that data or transactions are trackable and irreversible [1] and can only be added and not edited or removed [12]. Every transaction is stored on the blockchain and is grouped together in blocks to make processing easier. Blocks contain information about the transactions [13] as well as information about the previous blocks. If alterations occur in the network in a process called mining, it will reject the block. A hash of each preceding block is stored in the following block, which ensures that all the blocks can be verified through a data mining process. Data mining is a process, which verifies the transaction in the blockchain. Data miners check each block with the previous block or a Proof-of-Work [14] and able to detect unauthorized changes ensuring provenance, traceability, immutability, and trust. Blockchain principles (see Fig. 2) [15] can be applied to a supply chain environment. This can be in the form of critical data pertaining to the product that can be logged from the time of sourcing of raw materials to the final product ending up with the customer in the blockchain [16]. Several companies are adopting Blockchain as


Fig. 1 Blockchain transaction [11]

Fig. 2 IBM blockchain supply chain [15]

a solution in smart manufacturing and the supply chain, as published by Forbes in its third annual Blockchain 50 [17]. An example is that of Coke One North America (CONA), which has implemented collaboration among bottlers and aspires to include raw materials, using blockchain to let both customers and business organizations track and monitor their products instantaneously [18]. Blockchain enables businesses to build


Fig. 3 Excerpt of the specific provenance smart contract [19]

smart contracts, which facilitate collaboration and complex inter-organizational business processes.

2.3 Smart Contracts
Smart contracts are modular, repeatable, autonomous scripts which can be used to build Decentralized Applications (DAPPs). Smart contracts enable business organizations to implement automated, complex transactional rules [16], championing the integrity of a blockchain DAPP. As Bitcoin was developed as a decentralized open-source project, many developers have adopted the code and created their own currencies and blockchains with their own rules. These include currencies like Ethereum, which provides a programming layer on top of the blockchain (see Fig. 3) [19] that allows complicated business rules to be executed. There are programmable smart contract languages like Solidity, developed for Ethereum [20], and executable infrastructure that enables smart contracts to be compiled into Ethereum Virtual Machine (EVM) bytecode and then deployed to the blockchain. The Ethereum blockchain offers several tools for implementing, testing, and deploying a DAPP. The Ethereum [21] token standard ERC-721 defines a unique asset, and functions such as ownership transfer, transfer of data points, and logging of provenance records can be incorporated into the smart contract. This would allow a smart contract in a Supply Chain Management (SCM) platform to reduce transaction costs and make a process trackable and irreversible [22].
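To make the interaction concrete, the sketch below shows how a client application could query such a provenance contract from Python using the web3.py library. This is a minimal sketch: the node URL, contract address, the function name getProvenance and its ABI are hypothetical assumptions for illustration, not part of the paper or of any standard.

```python
from web3 import Web3

# All identifiers below are hypothetical placeholders for a deployed
# provenance contract and an Ethereum node endpoint.
NODE_URL = "http://localhost:8545"
CONTRACT_ADDRESS = "0x0000000000000000000000000000000000000000"
ABI = [{
    "name": "getProvenance", "type": "function", "stateMutability": "view",
    "inputs": [{"name": "productId", "type": "uint256"}],
    "outputs": [{"name": "record", "type": "string"}],
}]

w3 = Web3(Web3.HTTPProvider(NODE_URL))
contract = w3.eth.contract(address=CONTRACT_ADDRESS, abi=ABI)

# Read the provenance record attached to one product's unique number.
record = contract.functions.getProvenance(42).call()
print(record)
```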


3 Methodology
The discussion in this article uses the case study of a Make-to-Order water bottling plant. As mentioned in the introduction, the current challenge that necessitated this research stems from customers' need for transparency about the manufacturer details, contents of the bottled water, date of filling, date of packaging, and elimination of third-party deliveries. The data collected (antenna interaction with the tags, as seen in Fig. 4) from a water bottling plant at the Central University of Technology (CUT) will be utilized in this research. The water bottling plant allows customers to place their orders online. Once the ordering process is completed, a real-time optimization model sequences the orders for filling, capping, and packaging according to their priority. The process is achieved using three Smart Manufacturing Units (SMUs), as depicted in Fig. 4. The aim of this research is to develop a smart contract that can be incorporated into the water bottling plant. This means that there will be two parties (the customer and the water bottling plant). The smart contract collects data (see Fig. 5) from the water bottling plant and logs it into the blockchain for data provenance. The data to be logged includes, but is not limited to, the water bottling plant unit responsible for filling, capping, and packaging the water bottles for delivery (see Fig. 6).
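As a rough illustration of the record the contract would log, a minimal Python sketch of the per-bottle data is given below; the field names are hypothetical stand-ins for the attributes in the class diagram of Fig. 5.

```python
from dataclasses import dataclass

# Hypothetical field names; the actual attributes follow Fig. 5.
@dataclass
class BottleProvenance:
    product_id: int    # unique number stored in the bottle's RFID tag
    filled_at: str     # ISO timestamp of filling
    capped_at: str     # ISO timestamp of capping
    packaged_at: str   # ISO timestamp of packaging
    smu_id: str        # smart manufacturing unit responsible
```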

Fig. 4 Smart water bottling plant at Central University of Technology (CUT) [8]

Fig. 5 Class diagram of the planned tracking Ethereum smart contract

Fig. 6 Make-to-Order (MTO) smart manufacturing plant which produces bottled water [8]


Logged data collected per bottled water is given a unique product number stored in the RFID tag. Traceability in a supply chain process flow begins at the origin [23] of every ingredient or raw material and extends through raw material processing, product packing, product distribution and transportation, all the way to the customer. Customers will then use the developed "Blockchain data provenance tracking" DAPP, which scans a water bottle's RFID tag and retrieves its records from the blockchain; the plant is operated in a decentralized manner using a Smart Manufacturing Protocol based on Global Standards 1 (GS1) standards [24] (see Fig. 7). GS1 has been influential in the implementation of standards for the proliferation of transparency, effective traceability, and security of the supply chain process flow. GS1 business rules facilitate collaborations and complex inter-organizational business processes in networks and smart supply chains, significantly improving product provenance and traceability.

4 Conclusion and Future Work
Smart contracts are programmable scripts executed on top of a blockchain that allow parties to encode and implement business terms and rules. Business terms and rules implemented per GS1 standards on the blockchain platform will provide a decentralized global language of business that improves product provenance and traceability. This article presented a blockchain-based framework for providing generic data provenance. Future work on transparency (the contents of the bottled water, date of filling, date of packaging, and elimination of third-party deliveries) may include smart-contract-based drones, as seen with Boeing's SkyGrid [17], which acquired approval for a blockchain-based air traffic control system to track, monitor, and control drones. The Make-to-Order (MTO) smart manufacturing plant at the Central University of Technology (CUT) may further use the information to autonomously purchase raw material and machine service parts and to send out smart contracts for machine maintenance. This follows from low human intervention being a major requirement [1], which dictates that smart manufacturing machines need to be self-managing, self-configuring, self-organizing, self-optimizing, and self-enforcing, ensuring that they adhere to the decentralized global business rules governed by blockchain smart contracts for product provenance and traceability.


Fig. 7 GS1 global data network codes [3]

References 1. Prause, G.: Smart contracts for smart supply chains. IFAC-PapersOnLine 52, 2501–2506 (2019). https://doi.org/10.1016/j.ifacol.2019.11.582 2. Jamwal, A., Agrawal, R., Manupati, V.K., Sharma, M., Varela, L., Machado, J.: Development of cyber physical system based manufacturing system design for process optimization. IOP Conf. Ser. Mater. Sci. Eng. 997 (2020). https://doi.org/10.1088/1757-899X/997/1/012048


3. Jabbar, S., Lloyd, H., Hammoudeh, M., Adebisi, B., Raza, U.: Blockchain-enabled supply chain: analysis, challenges, and future directions. Multimed. Syst. (2020). https://doi.org/10. 1007/s00530-020-00687-0 4. Jaquet-Chiffelle, D.O., Casey, E., Bourquenoud, J.: Tamperproof timestamped provenance ledger using blockchain technology. Forensic Sci. Int. Digit. Investig. 33, 300977 (2020). https://doi.org/10.1016/j.fsidi.2020.300977 5. Demestichas, K., Peppes, N., Alexakis, T., Adamopoulou, E.: Blockchain in agriculture traceability systems: a review. Appl. Sci. 10, 1–22 (2020). https://doi.org/10.3390/APP101 24113 6. Behnke, K., Janssen, M.F.W.H.A.: Boundary conditions for traceability in food supply chains using blockchain technology. Int. J. Inf. Manage. 52, 101969 (2020). https://doi.org/10.1016/ j.ijinfomgt.2019.05.025 7. Tchatchouang, C.D.K., Fri, J., De Santi, M., Brandi, G., Schiavano, G.F., Amagliani, G., Ateba, C.N.: Listeriosis outbreak in South Africa: a comparative analysis with previously reported cases worldwide, pmc/articles/PMC7023107 (2020) 8. Jardine, N., Gericke, G.A., Kuriakose, R.R., Vermaak, H.J.: Wireless SMART product tracking using radio frequency identification. In: 2019 IEEE 2nd Wireless Africa Conference WAC 2019—Proceedings (2019). https://doi.org/10.1109/AFRICA.2019.8843418 9. Yiu, N.C.K.: Toward blockchain-enabled supply chain anti-counterfeiting and traceability. Futur. Internet. 13 (2021). https://doi.org/10.3390/fi13040086 10. Coetzer, J., Kuriakose, R.B., Vermaak, H.J.: Devising a novel means of introducing collaborative decision-making to an automated water bottling plant to study the impact of positive drift. In: Lecture Notes in Networks and Systems. pp. 661–669. Springer Science and Business Media Deutschland GmbH (2021) 11. Nakamoto, S.: Bitcoin: A Peer-to-Peer Electronic Cash System. https://bitcoin.org/bitcoin.pdf (2008) 12. Cui, P., Dixon, J., Guin, U., Dimase, D.: A blockchain-based framework for supply chain provenance. IEEE Access 7, 157113–157125 (2019). https://doi.org/10.1109/ACCESS.2019. 2949951 13. Alkhader, W., Alkaabi, N., Salah, K., Jayaraman, R., Arshad, J., Omar, M.: Blockchain-based traceability and management for additive manufacturing. IEEE Access 8, 188363–188377 (2020). https://doi.org/10.1109/ACCESS.2020.3031536 14. Malik, S., Kanhere, S.S., Jurdak, R.: ProductChain: Scalable blockchain framework to support provenance in supply chains. In: NCA 2018—2018 IEEE 17th International Symposium Networking Computer Application (2018). https://doi.org/10.1109/NCA.2018.8548322 15. Tijan, E., Aksentijevi´c, S., Ivani´c, K., Jardas, M.: Blockchain technology implementation in logistics. Sustainability 11 (2019). https://doi.org/10.3390/su11041185 16. Xu, M., Chen, X., Kou, G.: A systematic review of blockchain. Financ. Innov. 5 (2019). https:// doi.org/10.1186/s40854-019-0147-z 17. Blockchain 50 2021. https://www.forbes.com/sites/michaeldelcastillo/2021/02/02/blockchain50/?sh=2a895c83231c 18. CONA Services Partners with Salesforce for Real-Time Tracking—FreightWaves. https:// www.freightwaves.com/news/cona-services-partners-with-salesforce-for-real-time-tracking 19. Sigwart, M., Borkowski, M., Peise, M., Schulte, S., Tai, S.: Blockchain-based data provenance for the internet of things (2019) 20. Pearson, S., May, D., Leontidis, G., Swainson, M., Brewer, S., Bidaut, L., Frey, J.G., Parr, G., Maull, R., Zisman, A.: Are distributed ledger technologies the panacea for food traceability? Glob. Food Sec. 20, 145–149 (2019). 
https://doi.org/10.1016/j.gfs.2019.02.002 21. Westerkamp, M., Victor, F., Küpper, A.: Tracing manufacturing processes using blockchainbased token compositions. Digit. Commun. Networks. 6, 167–176 (2020). https://doi.org/10. 1016/j.dcan.2019.01.007 22. The Role of Blockchain in Manufacturing|Smart Manufacturing|Manufacturing Global. https:// manufacturingglobal.com/smart-manufacturing/role-blockchain-manufacturing


23. Geethanjali, B., Muralidhara, B.L.: A framework for banana plantation growth using blockchain technology. In: Lecture Notes in Networks and Systems, pp. 615–620. Springer Science and Business Media Deutschland GmbH (2021) 24. Pal, A., Kant, K.: Using blockchain for provenance and traceability in internet of thingsintegrated food logistics. Computer (Long. Beach. Calif). 52, 94–98 (2019). https://doi.org/10. 1109/MC.2019.2942111

Extreme Gradient Boosting for Predicting Stock Price Direction in Context of Indian Equity Markets Sachin Jadhav , Vrushal Chaudhari , Pratik Barhate , Kunal Deshmukh , and Tarun Agrawal

Abstract Algorithmic trading is a rapidly growing field. In this paper, we implement various machine learning methods to predict stock price direction. Technical indicators are taken as features, and the models are trained upon them. Forecasting the stock price direction helps us choose among daily trading techniques and increases the probability of large profits while maintaining a low-risk profile. In our research, the extreme gradient boosting (XGBoost) technique gives the highest accuracy, i.e. 73.1%, and outperforms the other machine learning techniques used. Our proposed model outperforms existing models in the literature and adds forecasting on a day-by-day basis. Keywords Trading · XGBoost · Equity markets · Algorithmic model · Machine learning for finance · NSE · NIFTY50

1 Introduction
Predicting the stock price direction is an ongoing topic of discussion among all financial forums. The main motive behind this trend is to earn profits from the stock market while minimizing the associated risks. This can be achieved through algorithmic trading, using machines instead of humans to do the actual trades. Stock price direction prediction can be done in two ways, i.e. long-term stock price prediction and short-term stock price prediction. While long-term stock price prediction is usually done through fundamental analysis of the stock, short-term stock price prediction is done using technical indicators. Various newer techniques like data mining, sentiment analysis and arbitrage analysis are also being used by professionals to achieve the desired results. In this paper, we will be focussing on short-term prediction of stock price direction. Techniques such as sma-rsa overlap, morning breakout, start gap analysis, momentum-price and market modules are used in technical analysis.
S. Jadhav (B) · V. Chaudhari · P. Barhate · K. Deshmukh · T. Agrawal Department of Information Technology, PCCoE, Pune, Maharashtra, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_28
But the main


drawback of the above techniques is that they are not able to find hidden correlations between various indicators [1]. While stock price prediction is a complex topic due to the chaotic nature of equity markets, AI excels at finding correlations between technical indicators. In the machine learning models, simple moving average, moving standard deviation, relative strength index, Williams %R, parabolic SAR and average directional index are taken as features. The last 20 years of stock data is used for each NIFTY50 stock. We have used various techniques such as support vector machine, k-nearest neighbours, neural network using LSTM, random forest, logistic regression, decision tree and extreme gradient boosting.

2 Methods for Data Collection and Pre-processing for Implementation of XGBoost
This section covers the criteria we used for stock selection, how we created the dataset, feature extraction using various technical indicators, data pre-processing and feature correlation mapping.

2.1 Stock Selection
We have chosen to go with the stocks in the NIFTY50 index.
• Stocks in the NIFTY50 index constitute over 65% of National Stock Exchange market capitalization.
• NIFTY50 stocks give us exposure to all the sectors in the Indian market, such as oil, banking and technology.
• These stocks are regularly updated to accommodate the best stocks of the country.

2.2 Dataset
The dataset is acquired using the yfinance library [2], as sketched below.
• It includes the data of the past 20 years, from January 2001 to December 2020.
• It includes Open, High, Low, Close, Adj Close and Volume of each day for each NIFTY50 stock.
• This data is consistent in format, and the closing price is adjusted according to stock splits and dividends.
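A minimal sketch of this acquisition step, assuming the yfinance package and Yahoo Finance's .NS ticker suffix for NSE-listed stocks (the ticker shown is one example, not prescribed by the paper; column layout may vary slightly with the yfinance version):

```python
import yfinance as yf

# RELIANCE.NS is the Yahoo Finance ticker for one NIFTY50 stock.
data = yf.download("RELIANCE.NS", start="2001-01-01", end="2020-12-31")
# Columns: Open, High, Low, Close, Adj Close, Volume -- one row per trading day.
print(data.tail())
```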


2.3 Feature Extraction
Feature extraction is done through the TA-Lib library. The following features are calculated.

Simple moving average
• It is the arithmetic average (mean) of the closing price over n days.
• We have used simple moving averages of 5 days and 10 days.
• $\mathrm{SMA} = \frac{C_1 + C_2 + \cdots + C_N}{N}$, where $C_1, \ldots, C_N$ are the closing prices of each day and $N$ is the number of days.

Moving standard deviation
• It is the standard deviation over n days.
• We have used moving standard deviations of 5 days and 10 days.
• $\mathrm{MSD} = \sqrt{\frac{(C_1 - \mathrm{SMA})^2 + (C_2 - \mathrm{SMA})^2 + \cdots + (C_N - \mathrm{SMA})^2}{N}}$, where SMA is the simple moving average over the same $N$ days.

Relative strength index
• It is a momentum indicator measuring the magnitude of price change over n days.
• We have used a 14-day timeframe for RSI.
• Its value lies between 0 and 100, with thresholds of 70 and 30 for overbought and oversold.
• $\mathrm{RSI} = 100 - \frac{100}{1 + \mathrm{RS}}$, where RS = average price increase over the past n days / average price decrease over the past n days.

Williams %R
• It is a momentum indicator as well as an oscillator.
• We have used a 14-day timeframe for Williams %R.
• Its value lies between 0 and −100, with thresholds of −20 and −80 for overbought and oversold.
• $\mathrm{Williams}\ \%R = -100 \times \frac{H_n - C}{H_n - L_n}$, where $H_n$ is the highest high over n days, $C$ the closing price, and $L_n$ the lowest low over n days.

Parabolic SAR
• Parabolic stop-and-reverse indicator, used to determine asset price direction.
• We have taken the acceleration and maximum as 0.2.
• $\mathrm{SAR}_n = \mathrm{SAR}_{n-1} + \mathrm{AF} \times |\mathrm{EP} - \mathrm{SAR}_{n-1}|$, where $\mathrm{SAR}_{n-1}$ is the SAR of the previous day, AF the acceleration factor, and EP the extreme point (lowest low/highest high).

Average directional index
• It is used to determine the strength of trends.
• Its value lies between 0 and 100; an increasing value denotes a stronger trend.
• $\mathrm{ADX} = \frac{\mathrm{ADX}_{n-1} \times 13 + \text{current DX}}{14}$, where $\mathrm{ADX}_{n-1}$ is the ADX of the previous day and DX is the current directional index.

3 Stock Price Direction Prediction Model Implementation Using Xgboost XGBoost Algorithm dominates the applied machine learning. We will be implementing it for stock price direction prediction. It focuses on prediction model performance and process computational speed. As more computation speed is achieved

Extreme Gradient Boosting for Predicting …

325

Fig. 1 Co-relation between features for RELIANCE stock price data over last 20 years

using XGBoost, trading signals can be generated faster than intern makes sure trading signals to be transferred to actual trades made. Model performance is better than other machine learning algorithms. In this section, high level architectural diagram of system and algorithm implemented is explained.

3.1 High Level Architectural Diagram We will start by presenting high level architectural diagram. This diagram includes steps in our stock price prediction model implementation. There are 5 steps, Model Training is done in XGBoost step. Stock price prediction is used to make trades. It gives actual flow of our system and explains the main building blocks behind our model. Figure 2 denotes how exactly system works. Following are the main 5 steps. • • • • •

It starts from data collection using yfinance. Then feature extraction using TALib library. Feature scaling using StandardScaler. Feeding data to XGBoost to be trained. Using trained model to predict stock price direction.

3.2 Algorithm The given three steps are recursively called in the process. Step 1 Step 2

Regression Predictor is learned. Error residual is computed.

326

S. Jadhav et al.

Fig. 2 High level architectural diagram of system

Step 3

Prediction the residual is leant. Following formula is used to calculate the error rate.

Error in prediction denoted by   D = z, zˆ where D(.) =

 (z[i] − z[i])2

zˆ is used to reduce the error z[ j] = z[ j] + α f [ j] where   f [ j] ≈ ∇ J z, zˆ Loss function gradient is estimated by each learner. Gradient Descent with Step size α is used to reduce sum predicator. We present the proposed algorithm for XGBoost below.

Extreme Gradient Boosting for Predicting …

327

1 : p ro cedure st art XTREM E GR ADIE NTB OOST (S) > S i s th e l ab elle d t rai ning d at a of st ock s 2: I n it i aliz e mo d el with a co n st ant v al u e . 3: f o r do m = 0 to M 4: C o mput e th e pseudo -residu al s 5: Fi t ba se l e arne r to pseudo resi du al s 6: T i = new Deci sionTree() 7: f e a tu r es i = R and o mF ea tu reSel ecti on ( Si ) 8: T i . t rain (Si , f eatur es i ) 9: C o mpute multipli er γm 10: Update the model 11: output Fm (x)

where L(y, γ ) is the differentiable loss function.

4 Experimental Results This section explains details of Metrics used for comparison and Results.

4.1 Metrics The parameters used to assess the robustness of binary classification are accuracy, precision and recall [3]. The Confusion Matrix values are used to calculate these parameters. These parameters are calculated using following parameters. Accuracy: Accuracy =

Correct Predictions Total Predictions

Accuracy =

T p + Tn T p + Tn + F p + Fn

Precision: Precision =

Accurate up Predictions Total up Predictions

Precision =

Tp T p + Fp

Recall: Recall =

Tp Accurate up Predictions Recall = Actual up T p + Fn

328

S. Jadhav et al.

Table 1 Result matrix, average of all NIFTY50 stocks Precision

Recall

F1-score

Support

0 (downward trend)

0.88

0.54

0.67

13

1 (upward trend)

0.67

0.92

0.77

13

Accuracy





0.73

26

Macro-average

0.77

0.73

0.72

26

Weighted-average

0.77

0.73

0.72

26

where, T p = true up values number, T n = true down values number, F p = false up values number, F n = false down values number. Evaluating machine learning models, we take average accuracy, precision and recall of all 50 stocks we have taken. These matrices are used for performance evaluation in this paper.

4.2 Results and Comparison
The following results are the average of the classification reports over the 50 stocks in NIFTY50. The report gives outputs such as F1-score, support, precision and recall for the downward and upward trends of the stock price direction, with XGBoost as the classifier; it can be generated as shown below.
• NIFTY50 average: the accuracy of the XGBoostClassifier model is 0.731.
• Precision: 0.667.
• Recall: 0.923.
Table 1 is obtained through the classification report in sklearn metrics. As we can see from Table 1, XGBoost has an accuracy of 73.1%, with 66.7% precision and 92.3% recall. The values in Table 2 are obtained from the machine learning methods we implemented on the same dataset as XGBoost. From Table 2, it is clear that XGBoost has outperformed all the other machine learning techniques in terms of accuracy, which indicates that our proposed method beats the previously implemented methods. As XGBoost gives more accurate results, we are able to make more profitable trades while minimizing the associated risk. Figure 3 clearly indicates XGBoost as the better candidate for stock price direction prediction in the context of Indian equity markets. Thus, the implemented method works as expected and manages to outperform the other methods.
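A one-line sketch of producing such a per-stock report with scikit-learn (the paper's Table 1 averages these reports over all 50 stocks):

```python
from sklearn.metrics import classification_report

print(classification_report(
    y_test, direction,
    target_names=["0 (downward trend)", "1 (upward trend)"]))
```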


Table 2 Comparison between various ML techniques based on accuracy, precision and recall

AI/ML technique               Accuracy avg % of NSE50   Precision avg % of NSE50   Recall avg % of NSE50
Support vector machine [4]    46.2                      47.5                       69.2
Logistic regression           42.3                      44.4                       61.5
Neural network (LSTM) [5]     52.3                      52.8                       63.8
Decision tree                 38.5                      33.3                       23.1
Random forest [6]             53.8                      55.6                       38.5
K-nearest neighbour           50.9                      56.9                       46.6
XGBoost [3]                   73.1                      66.7                       92.3


Fig. 3 Visual representation of comparison between various ML models

5 Conclusions
From the obtained results, we can conclude that extreme gradient boosting outperforms the other machine learning techniques by a considerable margin: XGBoost gives on average 40% more accuracy than support vector machine, k-nearest neighbours, neural network using LSTM, random forest, logistic regression and decision tree. XGBoost predicts the correct upticks required for placing orders, which allows us to make profitable trades while minimizing the associated risk. We conclude that XGBoost gives greater accuracy in predicting stock price direction than the other techniques compared.


References 1. Deepak, R.S., Uday, S.I., Malathi, D.: Machine learning approach in stock market prediction. IJPAM 115(8), 71–77 (2017) 2. https://pypi.org/project/yfinance/ 3. Dey, S., Kumar, Y.: Forecasting to Classification: Predicting the Direction of Stock Market Price Using Xtreme Gradient Boosting PESIT South Campus 4. Jae Kim, K.: Financial time series forecasting using support vector machines. Neurocomputing 55 (2003). U.S. Patent 5668842, Sept. 16, 1997 5. Silva, E., Castilho, D., Pereira, A., Brandao, H.: A Neural Network Based Approach to Support the Market Making Strategies in High-Frequency Trading. 978-1-4799-1484-5/14 ©2014 IEEE 6. Ghosh, R.: Forecasting Profitability in Equity Trades Using Random Forest, Support Vector Machine and xgboost. Cisco Video Technologies India Pvt Ltd. (India)

Performance of Grid-Connected Photovoltaic and Its Impact: A Review Sanjiba Kumar Bisoyi, Sudhanshu Maurya, Suyash Binod, Aayush Srivastava, and Abhishek Baluni

Abstract The fast depletion of fossil fuels has created a quest for alternative energy sources to take care of the increased load demands. Solar energy is the most abundant form of renewable energy available to us, more so than any other source of energy. Photovoltaic energy generated on a large scale and connected to the grid can produce lasting power. This paper offers a detailed review of the grid-connected photovoltaic (GCPV) system and the different management approaches and techniques related to it. Various aspects of the GCPV system are explained throughout the paper, covering many algorithms and software tools, such as MATLAB and PVsyst6.7, dedicated to researching the possible merits and demerits of the system. Given the ever-increasing need for alternatives, this paper provides a better understanding and knowledge of the GCPV system as a whole. Keywords GCPV · MATLAB/Simulink · Performance analysis · Cascaded multilevel DC/AC converters (CMCs) · Maximum PowerPoint Tracking (MPPT) · Particle swarm optimization (PSO)

1 Introduction
In the present scenario, countless hours of research and much capital are finding their way towards technologies that present themselves as good alternatives to the present technology. The reason is simple: we either lack a needed technology, or the technology we possess has demerits that could be overcome. One such technology is the grid-connected photovoltaic (GCPV) system: the integration of solar technology, with adequate synchronization and fault control, into the present grid. A GCPV system generally comprises a controller that tracks the maximum power point of the PV array and regulates the current and power injected into the grid, an inverter that converts the DC output of the PV array into AC current and injects it into the connected grid, and lastly the PV array itself [1]. As the research in this


field escalates, concepts like cost and performance drive the betterment of the said system. In this paper, various GCPV models are showcased along with the factors that affect them, including the factors, such as faults, that affect the performance of the GCPV system. The paper also walks the reader through centralized and decentralized GCPV systems, then discusses the generations of PV modules and system sizing, and gives a glimpse of the real-time performance of GCPV considering its modelling and simulation. Further, it covers possible fault conditions and the methods of synchronizing a PV system with the grid mains. Many simulations and experiments have been performed for such studies using different software. This paper introduces the topic and supports a better understanding for further investigation in the domain of the GCPV system.

2 Centralized and Decentralized GCPV System

2.1 Centralized Grid-Connected Photovoltaic System
Large-scale centralized systems are built using cascaded multilevel DC/AC converters (CMCs); the reduction in power loss is due to the lower switching frequency of the CMCs on the DC side of each sub-module. In CMCs, the DC side of each module can also be connected to an independent PV array string. The PV array is connected to DAB converters, which are then connected in series by the CMCs. The CMCs contain Maximum Power Point Tracking (MPPT) to maximize efficiency [2].

2.2 Decentralized Grid-Connected System
Insolation mismatch among PV modules, i.e. partial shading, is the main drawback of centralized grid-connected PV systems. Decentralized GCPV can be realized with Multiple Integrated Converters (MICs) with a common inverter and an intermediate DC load. Each MIC is integrated with an individual MPPT controller; thus, the power conversion efficiency is increased under partial shading [3].

3 Generations of Photovoltaic Cells
The generations of solar cells can be categorized into four parts depending on the materials used. The first-generation solar cells, which are commonly available, comprise single- and multi-crystalline silicon. These solar cells use old technology and hence have many drawbacks. To minimize these drawbacks, second-generation solar cells came into the limelight, addressing the huge material consumption and price of silicon solar cells. Third-generation solar cells correspond to solar radiation management concepts via dye-sensitized solar cells (DSSCs), organic solar cells, photo-chemical cells, etc. The continuous advancement of solar cells has led to the evolution of solar cells consisting of composites, known as the fourth generation [4]. The effectiveness of the GCPV output also depends upon the material of the solar cell used, e.g. m-Si, a-Si, p-Si; p-Si material is best for the construction of solar cells [5]. Multi-crystalline and amorphous silicon can also be used to construct solar cells to achieve better output [6] (Fig. 1).

Fig. 1 Different generations of photovoltaic cells [4]

4 Sizing of GCPV System
Grid-connected photovoltaic system sizing can be determined from various variables, such as the number of strings, panels per string, spacing between rows of modules, and the number of modules in series or parallel. A (positive) string size variation helps achieve an increase in the inverter's input voltage, and a suitable tilt angle of the photovoltaic system increases the output energy of the PV panel by almost one percent [7]. The Particle Swarm Optimization (PSO) algorithm is an effective and efficient way to optimize the design for the optimal number of devices and the optimal values of PV module count, tilt angle and module placement. PSO is more efficient than exhaustive search methods and genetic algorithms, as the cost of production of energy


using PSO is less [8]. Thus, electricity tariffs can be controlled and the increase in energy production easily monitored through the sizing of the GCPV. The methodologies suggest the prescribed quantity of PV modules for the system (modules per string and strings in parallel), which reduces the cost of energy, and appropriate gaps between rows to generate maximum energy [9].
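A minimal numerical sketch of PSO applied to such a sizing problem is given below. The objective function, bounds, swarm parameters and cost coefficients are all toy assumptions for illustration; a real study would evaluate an energy-yield and cost model like those cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy_cost(x):
    """Toy objective: cost of energy as a function of
    (modules per string, parallel strings, tilt angle in degrees)."""
    n_series, n_parallel, tilt = x
    yield_factor = np.cos(np.radians(tilt - 10.0))   # toy optimum near 10 degrees
    energy = n_series * n_parallel * yield_factor
    capital = 120.0 * n_series * n_parallel + 500.0  # assumed cost model
    return capital / max(energy, 1e-9)

lo = np.array([5.0, 1.0, 0.0])     # lower bounds on the three variables
hi = np.array([25.0, 20.0, 60.0])  # upper bounds
pos = rng.uniform(lo, hi, size=(30, 3))      # 30-particle swarm
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([energy_cost(p) for p in pos])
gbest = pbest[pbest_f.argmin()]

for _ in range(200):
    r1, r2 = rng.random((2, 30, 3))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    f = np.array([energy_cost(p) for p in pos])
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[pbest_f.argmin()]

print("modules/string, strings in parallel, tilt:", np.round(gbest, 2))
```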

5 Economical Aspects
When it comes to minimization of cost, the third-generation solar cell is more feasible than the first and second generations, although the fourth generation is still under ongoing research [4]. The teaching-learning-based optimization algorithm performs better than the other algorithms in terms of reducing the total net present cost (NPC) and cost of energy (COE) [10]. There are also strategies for energy management at the residential level, namely energy management optimization under time-of-use tariff (EMOTT) and energy management optimization under step tariff (EMOST); these are generally used for the optimization of total energy cost using a genetic algorithm [11]. A single-phase, single-stage CSI-based GCPV connection can be applied, replacing the transformer while using a fuzzy logic controller; it provides high-efficiency, low-cost output power to the grid [12].

6 Modelling and Simulation
Nowadays, GCPV systems are designed and simulated with the help of different software packages. This software is programmed to counter errors while performing simulation and modelling of the GCPV system, is easily accessible, easy to learn and can be used anytime. Several packages are available for designing GCPV models, including:
1. MATLAB/Simulink [13]
2. PVsyst6.7 [14]
3. Typhoon HIL (Hardware-in-the-Loop), and many more.

6.1 Models of Grid-Connected Photovoltaic Systems
A grid-connected photovoltaic system consists of many different types of electrical components, depending upon factors like performance, economic aspects, sizing, etc. As noted, these systems can be developed easily using different software with different types of algorithms. Every GCPV system mainly consists of a PV panel with a step-up boost converter and an inverter used to synchronize the PV system with the grid. Some systems include further components that permit independent control relative to the irradiation level, producing a multilevel DC voltage that is transformed into AC through an Active Neutral Point Clamped (ANPC) inverter connected to the system [15] (Fig. 2).

Fig. 2 Block diagram of decentralized GCPV system [3]

Depending on the specific application, the system is connected with different components; two control laws, known together as sliding mode control, are considered in such a system:
1. First sliding mode control law: the DC-DC converter ensures the desired reference DC voltage at the inverter's input.
2. Direct power control law: this modulates the exchange of reactive and active power between the grid and the PV source [16].

Also, as in grid-connected systems, inverter control is important; it is done through the phase-locked loop (PLL) technique, which not only improves the stability and performance of the system but also plays an important role in determining whether an inverter grid connection is possible [17]. There are many types of algorithms depending upon the application, such as the perturb and observe (P&O) algorithm; a modified P&O technique has been shown to track the MPP more efficiently under rapid changes in temperature and irradiation [18].
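For reference, the basic (unmodified) P&O rule that such techniques build on can be sketched in a few lines; the step size and the toy P-V curve below are illustrative assumptions only.

```python
def perturb_and_observe(v, p, v_prev, p_prev, step=0.1):
    """One iteration of the classic P&O MPPT rule: keep perturbing the
    operating voltage in the direction that increased power, else reverse.
    v, p: present PV voltage and power; v_prev, p_prev: previous sample."""
    dv, dp = v - v_prev, p - p_prev
    if dp == 0:
        return v                # no change in power: hold the set-point
    if (dp > 0) == (dv > 0):
        return v + step         # power rose in this direction: continue
    return v - step             # power fell: reverse the perturbation

# Example: track the maximum of a toy P-V curve P(V) = V * (10 - 0.5 V),
# whose maximum power point is at V = 10.
v_prev, p_prev = 8.0, 8.0 * (10 - 0.5 * 8.0)
v = 8.5
for _ in range(50):
    p = v * (10 - 0.5 * v)
    v_next = perturb_and_observe(v, p, v_prev, p_prev)
    v_prev, p_prev, v = v, p, v_next
print(round(v, 2))  # oscillates near the MPP at V = 10
```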

6.2 Real-Time Simulation
Real-time simulation can be described as the computer processing the simulated model at actual "clock" time. After modelling, we need to simulate the model in real time to know the performance, efficiency, faults, etc. of the system. In some studies, faults such as Low Voltage Ride Through (LVRT), High Voltage Ride Through (HVRT) and low-frequency ride-through conditions are injected into the system and analysed in real time [19]. Applications such as the analysis of photovoltaic cells and their working parameters are also observed in grid-connected systems using real-time simulation [20]. The design of a PV/battery grid-connected system can likewise be done with real-time simulation [21].

7 Performance Analysis
Every GCPV system is designed so that its performance approaches the optimum level at which it can work efficiently for its specific application, and for specific applications we need to specify the ratings of the GCPV system. For factors like economic stability, a nominal GCPV system can be combined with subsystems, each rated in kW [22]. A general MW-scale GCPV can be connected to the medium voltage (MV) side of the grid depending upon the requirements; the main aim is then to assess and manage the power quality issues that need to be fulfilled [23]. A new operational method can also be introduced in a GCPV system, in which the voltage is modulated at a specific point to improve the voltage characteristics of the system; hysteresis current control and instantaneous power theory methods can likewise be applied to inverter control to provide better performance [24]. To achieve system stability, the cascaded-mode control scheme is better than the standard vector control scheme under abnormal conditions [25]. To produce a new method for modelling and simulation, and to detect the transient nature of the PV module, a kW-rated system containing a PV generator and a single-phase grid-connected inverter is considered [13]. A generalized PV module under a modified P&O technique tracks the MPP more efficiently under rapid changes in temperature and irradiation [18] (Fig. 3).

Performance of Grid-Connected Photovoltaic …

337

algorithm considering identical climate, greenhouse emissions and reliability conditions in the total objective cost and it was found that the climate was not affecting the data [10]. Now discussing methods proposed to optimize the performance under different weather conditions. 1.

2.

First, in a humid tropical region, the outdoor performance of five photovoltaic systems operating on five different solar-cell technologies, namely amorphous silicon (a-Si), mono-crystalline (mc-Si), heterojunction incorporating thin film (HIT), poly-crystalline (p-Si) and copper indium disulphide thin film (CIS), shows that p-Si is more feasible than the CIS technology for a PV system at the proposed location, followed by a-Si, mc-Si and HIT respectively [22]. Second, a comparative investigation of maximum power point tracking techniques for grid-connected PV systems under various weather conditions found that the Adaptive Neuro-Fuzzy Inference System (ANFIS) has impressive optimization and adaptive abilities (higher precision and flexibility); with cell temperature and irradiation level as inputs to the controller, ANFIS-based MPPT adjusts the duty cycle easily and accurately as the weather changes [28].

7.1 Faults
Commonly seen faults in GCPV are voltage sag, voltage flicker, harmonics, voltage instability and instantaneous frequency deviation. Special systems are therefore modelled, with an optimized inverter controller, to alleviate the sag incidence and keep the voltage distortion in the system low. These systems change the operating mode and rectify the detected sag and faults by applying a reactive current into the circuit, depending upon the depth of the sag, during normal conditions [23]. The most common issue is disconnection of the PV inverter from the grid due to variation of the grid voltage, as it affects the stability of the power network [29, 30]. Disconnection of the PV system causes (1) DC-link overvoltage, (2) excessive AC current and (3) loss of grid synchronization. An improvement of LVRT technology has been proposed by linking an Energy Storage System (ESS) to the DC-link system through a DC braking chopper, where the braking resistor releases the energy that would otherwise have caused a voltage dip, at the price of maintenance cost. The comparison shows that DC-link voltage regulation is done better with D-MPPT (at the loss of maintenance cost), but the efficiency is low [30]. To provide a robust control process and better power quality in the GCPV system, a Voltage Source Inverter (VSI), a DC boost converter and an LCL filter are further added to protect against harmonics and faults [31]. LVRT is chosen because it keeps generating plants connected to the grid during the low-grid-voltage period; renewable resources may be disconnected from the system and reconnected after the fault is dealt with [32]. Sometimes an automatic fault detection software tool is applied to detect faults in the GCPV system; it is based on the calculation of voltage and current indicators with certain limits


to calculate the system faults as well. In conclusion, the tool is easy to understand, requires fewer sensors, fewer Simulink models and is more accurate [33].

7.2 Synchronization
Synchronization of the grid with the photovoltaic system is most important for proper functioning. As in all grid-connected systems, inverter control is done through the PLL technique, which improves the stability and performance of the system and determines whether an inverter grid connection is possible. Using a SOGI-QSG (Second-Order Generalized Integrator based Quadrature Signal Generator) based single-phase PLL, analyses show the operational design of the proposed PLL technique, and its improved efficiency is seen through simulation results in MATLAB. The islanding detection method can also be improved using the PLL: in present active islanding detection techniques, a perturbation signal is added to the PLL signal so that, on disconnecting the grid voltage, the point of common coupling shifts beyond a threshold in amplitude and frequency; this introduces a disturbance signal into the PLL and reduces the efficiency of distributed generation. In its place, a newer passive islanding detection technique is introduced that increases the time delay in the PLL control and improves efficiency [17]. Studies have shown that the use of synchronization techniques and control strategies related to GCPV systems ensures the system's maximum efficiency without noise or hindrance. For interfacing electronic devices with the grid, adequate control is required, which is provided through a synchronization algorithm. Synchronization of a three-phase grid-connected system using the DQ-PLL algorithm along with a positive sequence detector (PSD) ensures the desired phase and frequency outputs of the system by eliminating harmonic distortions and unbalance. Since these are the major factors determining synchronization between PV systems and the three-phase grid, controlling them efficiently helps in obtaining the correct load voltage, load current, grid voltage and current, and their respective instantaneous active and reactive power [34]. To mitigate the sensitivity issue, the main idea is to apply the Fortescue theorem in the positive sequence detector (PSD) [35]; this eradicates the sensitivity of the conventional DQ-PLL, or synchronous reference frame phase-locked loop, to asymmetric grid voltages, which otherwise leads to errors while sensing the frequency and phase [36]. A study on a single-phase two-stage GCPV system, successfully executed under partially shaded conditions, standard test conditions and load variation, shows synchronization of the grid current with the grid voltage under variable irradiance and load; the primary benefit of the ABC-PO algorithm was the elimination of convergence problems during partial shading [37].


8 Conclusion
An overview of the grid-connected photovoltaic system has been presented. Many algorithms, models and simulation tools have been explained and compared, and software such as MATLAB/Simulink and PVsyst6.7 has been covered for the simulation of the GCPV system. The performance analysis, with its various aspects, has also been explained. The GCPV system is a better alternative to the present conventional technology and gives better performance; how far it can develop is still at the research stage. The cost of producing energy can be minimized by monitoring the sizing of the GCPV through various methodologies that give the right number of PV modules and suitable spacing between rows for maximum energy production.


A Novel Approach of Deduplication on Indian Demographic Variation for Large Structured Data

Krishnanjan Bhattacharjee, Chahat Garg, S. Shivakarthik, Swati Mehta, Ajai Kumar, Shonil Bhide, Kshitija Kulkarni, Shivank Ratnaparkhi, Khushboo Agarwal, and Varsha Naik

K. Bhattacharjee, C. Garg, S. Shivakarthik, S. Mehta, and A. Kumar are with the Center for Development of Advanced Computing (C-DAC), Pune, India. S. Bhide, K. Kulkarni, S. Ratnaparkhi, K. Agarwal, and V. Naik are with Dr. Vishwanath Karad MIT World Peace University, Pune, India.

Abstract In the era of Big Data Analytics, information dissemination, data integrity, and the identification of unique records from a large pool of data pose a big challenge for analysts in entity matching and linking scenarios. Data ingested from multiple sources for the same real-world entity exhibits several data quality issues such as redundancy, incorrectness, and variations, as well as data input errors such as typographical/spelling mistakes and missing fields. Deduplication is the solution for achieving entity resolution and uniqueness, eradicating data redundancy, and improving data quality. India being a multi-lingual and multi-cultural country with vast demographic variations, there is a need to develop an India-centric model for handling deduplication of the various Indian structured datasets held by different authorities. This research proposes a novel approach catering to India-centric demographic variations, region-specific naming conventions, and address standardization using a highly customizable and scalable deep learning approach, by customizing the DeepMatcher algorithm along with a synthetic data generation tool reckoning with Indian variations of names and addresses in a region-specific manner.

Keywords Entity resolution · Entity matching · Deep learning · Deduplication

1 Introduction

The age of Big Data and digitization has assimilated into every sphere of modern society. Beyond the digitization of education, news, and scientific or historical documents, today's era of Big Data Analytics is characterized by the large volume of structured data produced and consumed by agencies such as income tax authorities, telephone providers, and licensing authorities in every modern country. Personal identification details of entities are held by various governmental agencies, whereas transactional data is mainly held by service providers such as banks and credit card companies. Such identity and transaction data have become the digital footprints of individuals in every country. Whether a State needs to disburse welfare to its citizens, an agency wants to track criminals, or the task is a vaccination drive or a census, it is the structured data of individuals in every country that must be accessed and appropriately used.

The Digital India Initiative has brought digitization of Indian citizens' identity, organization, and finance-specific data into every sphere of government-citizen interaction. But this structured data is prone to errors at the point of creation. Data entry errors such as spelling variations, missing fields, typographical errors, incomplete data entry, and data anomalies are prevalent. There also exist challenges posed by multiple formats for names and addresses. In the Western world, there are well-defined and mapped rules for structuring names (First Name, Last Name) and addresses (mapped and formatted). This structured, well-defined format facilitates structured data processing, and its utilization and applicability are readily harnessed by the agencies using the data. Deduplication can be performed easily on such data owing to its symmetry and uniformity.

The India-specific challenges for deduplication, however, lie in the variety of names across demography: between Northern India and South India, compared with Eastern and North-Eastern India, the sub-continent is highly diversified in culture and languages. Hence, culture-specific errors while creating digital records for similar-sounding names, as well as Indian region-specific conventions for writing names, are not uniform throughout the nation. The culture-specific variation of the same name often distorts a person's real name when identity documents are created in another part of the country in a scenario of migration or resettlement owing to a job or marriage. For example, Amrita in North India is written as Amrutha in South India. The issue of transliteration through Romanization of the Devanagari or other scripts creates further variations of data entries, leading to multiple versions of the same name (Harilal, Hari Lal; Krishna, Krushna).

Duplication of data poses several problems. There are many reported cases in the media of a person holding multiple voter cards, passports, or PAN cards across states. This is not only a potential law-and-order issue but also hinders the Indian digitization initiative, whether for dissemination of benefits to people, medical benefits to the poor, or even the census. The sheer volume of duplicate data hampers efficient search retrieval and the identification of correct entities in case of unlawful activities. Considering the above challenges, there is a need for an efficient deduplication approach for large-scale structured data suitable for Indian data nuances. As depicted in this research, AI-based deep learning algorithms with an India-specific novel approach and customization for structured data are capable of solving the aforementioned India-centric data duplication issues.

2 Literature Survey and Gap Analysis

2.1 Literature Survey

The challenges mentioned above have been approached by various researchers and solution providers. The following are glimpses of the pivotal existing approaches.

Fellegi and Sunter [1] established the formal mathematical foundations for record linkage. The model they proposed, based solely on probability theory, is called a probabilistic model. Further work was done by Winkler [2] in enhancing and extending the model using the Expectation Maximization method. Alias [3] is an open-source machine learning tool for the deduplication process, which calculates similarity between fields using edit distance and then classifies a match based on trained decision tree classifiers. However, this tool can only be used on domain-specific fields and works only for small datasets.

In TAILOR [4], the authors discussed record linkage problems by adopting traditional ML methodologies such as decision tree induction and clustering. Their proposed hybrid model outperforms the probabilistic record linkage model in terms of performance and accuracy metrics. Dedupe.io [5] is a cloud service for deduplication and entity matching. It allows uploading datasets as CSVs, setting up and training a model, clustering, and reviewing the results; it also supports record linkage across various data sources using an API.

The mapping-based object matching system (MOMA) [6] uses an ensemble-like technique over both attribute information and contextual information. The output is an instance-level mapping that helps in peer-to-peer data integration systems and can be reused for other matching tasks. Nevertheless, this tool does not give effective results on big data, and the tuning of match workflows, to select existing mappings, needs to be done manually.

Febrl [7] is a tool suitable for new record linkage users and practitioners, based on distance measures and classifiers such as SVM and K-means. It is the first open-source tool built for entity resolution (ER) specifically for the healthcare domain. However, it works only on the name, address, date, and phone number fields, while real-world data is quite noisy and has fields with many variations that cannot be compared with only simple distance measures and classifiers.

Dedoop [8] is a MapReduce-based entity resolution tool for large datasets. It includes blocking and matching along with the optional use of ML for the automatic generation of match classifiers, and it provides a versatile web interface that allows an ER workflow definition as well as the inspection of computed match results. Magellan [9] is an entity matching tool that seeks to cover the entire pipeline of blocking, matching, extraction, sampling, labeling, and accuracy estimation. It is built using the Python data science stack and has the capacity to add and patch new methods in all parts of the process. Overlap blocking along with hash- and canopy-based methods is provided for comparison between blocking methods, and rule-based learning with custom features for training and matching is available.

DeepMatcher [10] uses a recurrent neural network with attention, in which attribute comparison is implemented as a vector concatenation augmented with an element-wise absolute difference that forms the input to the classifier module. With the use of attribute embedders and attribute summarizers, it helps identify matching records having various inconsistencies, and it is suitable for the entity matching process on large-scale, structured, and noisy data.

2.2 Gap Analysis

Careful analysis of the above-mentioned solutions shows that approaches based on traditional machine learning strategies are often less accurate in handling real-world data variations, which contain noisy data along with several inconsistencies, missing values, etc. Deep learning (DL) techniques, owing to their ability to define different hidden layers and feedback-loop structures, do not have such a drawback and can work on high-dimensional, non-linear, real-world noisy data, as observed in solutions like DeepMatcher. However, such DL-based systems are mostly trained on Western demographic data and the consistencies observed in Christian names as well as European and American address standardization.

India is a country with 22 major languages and several hundred other sub-languages. The Indian demographic variations, with the linguistic differences between North India and South India and several language-specific orthographic conventions, pose a challenge to standardized transliteration through Romanization. Moreover, Western pretrained data models do not yield efficient outputs on the above-mentioned India-centric nuances. Hence, there is a requirement for creating India-centric datasets and models for a DL approach to deduplication, catering to the following variations typical of the Indian scenario.

Phonetic Variation: The Romanization of culture- and language-specific naming conventions, which differ among South India, North India, and North-East India, leads to a number of phonetic and spelling variations when the same name is used in different parts of the country but written in different styles: viz., Vijeta (North India) → Vijetha (South India), Amrita (North India) → Amrutha (South India), Vivek (North India) → Bibek (Eastern India, e.g., Bengali).

Socio-Cultural Variations of Indian Demography
• Variations in parts of name: In some parts of India, people write only their first names, and surnames are optional. The middle-name concept does not exist in several parts of India, while in other regions it is highly important. This leads to unstructured fields within the structured data. The notion of first name and surname is relative to a specific part of India and is often found interchanged when seen in different parts. Females in India tend to change their surnames after marriage; the new surname may be either the husband's surname or a combination of the previous surname and the husband's surname.
• North Indian nuances: The middle name indicates the father's or husband's name in some parts, while in other parts it represents the gender (Singh, Kumar for males and Kaur, Kumari for females) or extensions to the name (Prasad, Lal, Rani, Devi, etc.). Further, these extensions can be part of the first name or written as separate middle names (Hari Lal and Harilal).
• South Indian nuances: In South Indian names, the surname is usually the father's name and is often written as a single letter (Bhuvaneshwari K., where K is the short form of the father's name, Krishnamurthy). The first name can also be written as a single letter (Bhuvaneshwari K. can also be written as B. Krishnamurthy). Certain syllables have a different pronunciation and hence a different way of writing in South Indian languages: 'i' is often pronounced as 'u', and 't' is often written as 'th'. Thus, the name 'Amrita' in North India is written as 'Amrutha' in South India.
• Community-specific formats of parts of name: Various religious communities have different ways of writing names in India. Most names in the Sikh community have 'Singh' as the surname; hence, the first name becomes the only distinct primary field to be compared. Similarly, 'Mohammed', with its numerous orthographic variations (Mohd., Md., Muhammad, etc.), is mostly found in Islamic names. Hence, only one part of such community-specific names has distinct field values to be compared.
• Indian address variations: Unlike Western countries, where address standardization with geo-tagging has been achieved, Indian addresses do not adhere to such standards across the nation. Formats like postal PIN codes, mention of state, survey number, etc. exist, but people often shorten PIN codes (e.g., 411027 as Pune-27) or even omit necessary information like street or building names. In villages, or for old houses in many parts of the country, even no house number is found. Hence, Indian addresses pose a great challenge for deduplication and normalization.

3 Methodology

After considering all the approaches and algorithms mentioned above, along with the analysis of Indian name variations, address variations, and the demography-specific characteristics described in Sect. 2.2, DeepMatcher [10] was found to be a highly customizable and scalable algorithm with which a trainable model can be created. Its models and parameters can be tweaked, and the layers can be defined to cater to every India-centric variation explored in this research. The attribute embedding module of the model can use methods like word embedding or character embedding; the comparison methods can be fixed-distance or learnable distance functions; and attribute summarization can be achieved using aggregate functions, RNNs, attention functions, or a combination of these. As DeepMatcher combines these sophisticated methods of attribute embedding, attribute summarization, and comparison, it aptly handles the aforementioned Indian scenarios, as depicted in this research. Figure 1 shows all the processes and modules required by the system for ER; these modules are explained in the sections below.

Fig. 1 Flow diagram for deduplication approach


3.1 Approach for Customization of DeepMatcher

The parameters required to customize our novel approach are as follows.

Synthetic Data Creation for Deduplication: The creation of any novel approach in deep learning requires customized model building, training rule sets, and data on which the model can be trained. Entity-centric information for consumers and producers includes fields like First Name, Surname, Father's Name, Mother's Name, Gender, DOB, Address, and Phone Number, along with unique identifiers like Driver's License and PAN Card. These datasets, together with the transactional data used by government agencies, banks, customer-centric firms, and other institutions, usually contain sensitive data that cannot be disclosed to third parties or technology firms for building a deduplication approach. However, to attain the goals of the Digital India Initiative and bring uniformity to such datasets of national importance, a customized approach to synthetic data generation is needed, and it is an integral part of the current approach. In the absence of real data samples, and from the perspective of privacy, security, and data sensitivity, this synthetic data has to be carefully crafted by analyzing the fields found in real-life day-to-day experiences and in forms available to the general public. Hence, training and testing data are generated after an extensive study of the Indian demographic name variations mentioned in Sect. 2, carefully crafted on logical parameters such as:
• Short forms for names and surnames (R. K. Sharma and Rajesh Kumar Sharma; Shivbhushanam Srinivasan and Shivabushanam S.) are generated for entities having the same unique identifiers.
• Addresses having different formats (Flat 10, Tara Apt. and 10, Tara Apartment).
• Dates of birth having various formats (7/11/1996 and 07-11-1996).
• Unique identifiers are unique to a particular entity and can exist in the duplicates of that entity; hence, a particular unique ID is shared only by the duplicate entities.
Considering all these variations, synthetic data is generated, and potential linking scenarios are included so that the data resembles real life: (1) one family sharing the same address; (2) children sharing a parent's email ID or phone number; (3) a work address shared by employees of the same organization; (4) paying-guest accommodation. Based on these parameters, a sample of the synthetic data is shown in Fig. 2.

Fig. 2 Example of generated synthetic data


Though all experiments are done with generated synthetic data, in order to avoid accidental resemblance to real-world entity details, the data depicted in Figs. 2 and 5 is masked by marking some portions with '*'. A synthetic dataset of size 50,000 has been created with a customized synthetic generation tool, custom coded using the pandas library. The variations in names are created using a dictionary for individual characters in the names, based on the rules found after an extensive study of Indian demographic name variations, and fields are made missing using the 'random' library. Along with the above variations, missing values are also added to all fields except the name and ID fields. A minimal sketch of such a generator appears after the matching rules below.

Defining Match and Non-Match Parameters: Rules are created for generating matching and non-matching entities after observing real-life data. Some of these rules are:
• For non-matching records: (1) distinct rows; (2) distinct rows having the same names and surnames; (3) distinct rows with the same address; (4) distinct rows with the same father's name and mother's name, etc.
• For matching records: (1) exactly matching rows; (2) name variations present in names and surnames; (3) the same entities having missing values in various fields; (4) females having different surnames but the same remaining fields, etc.
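The following is a minimal sketch of such a synthetic duplicate generator. The variation dictionary, field names, and helper logic here are illustrative assumptions for exposition, not the authors' exact rule set.

```python
# Sketch of a synthetic duplicate generator; the variation rules and field
# names below are illustrative assumptions, not the paper's full rule set.
import random
import pandas as pd

# Character/syllable-level variation rules (illustrative subset).
VARIATIONS = {"ta": "tha", "ri": "ru", "v": "b"}

def vary_name(name: str) -> str:
    """Apply one demographic spelling variation to a name."""
    for src, dst in VARIATIONS.items():
        if src in name.lower():
            return name.lower().replace(src, dst, 1).title()
    return name

def make_duplicate(row: dict) -> dict:
    """Create a duplicate record with a name variation and a missing field."""
    dup = dict(row)
    dup["first_name"] = vary_name(dup["first_name"])
    # Drop one non-key field at random to simulate incomplete data entry.
    victim = random.choice(["father_name", "address", "phone"])
    dup[victim] = None
    return dup

base = [{"uid": 1, "first_name": "Amrita", "surname": "Sharma",
         "father_name": "Rajesh", "address": "10, Tara Apartment, Pune-27",
         "phone": "98*****210"}]
records = base + [make_duplicate(r) for r in base]
print(pd.DataFrame(records))  # original row plus its generated duplicate
```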

3.2 Blocking

Duplicate records always share some common attributes. By defining groups of data that share these common attributes and comparing only the records within such a group, or "block", the number of comparisons, and the time required to make them, can be significantly reduced [11]. Efficient blocking gives more confidence that the compared records are potential duplicates. Indexing techniques process all records to be matched and insert each record into one or multiple blocks according to a blocking key [12], so as to move similar records into one block. The entire blocking process (see Fig. 3) is explained below.

Fig. 3 Flow diagram of blocking process

Preprocessing: Indian first names often have extensions like 'Kumar' and 'Prasad' which may or may not be used. Parts of the name and titles like 'Mohammed' or 'Singh' are found in a large number of Indian names and often do not add value while comparing names; hence, they are filtered out before encoding the name. As females in India usually change their surnames after marriage, surnames cannot be used to generate blocking keys. Hence, the datasets are divided on the basis of the gender field and given as input to the phonetic encoding process.

Phonetic Encoding: Functions that phonetically encode specific fields of an entity are commonly used in the indexing step of entity matching. They bring similar-sounding attribute values into the same blocks, which helps find possible matches among names that sound the same but have different spellings and might be the same entity. After considering the variations in Indian names, and by referring to the rules in existing algorithms like Soundex, NYSIIS, Phonex, and Double Metaphone, and to the Indian name-specific phonetic rules given in [13], rules for a custom phonetic encoding algorithm were designed. Some additional rules used in the custom phonetic algorithm are:
• Standardize pronunciations across languages, e.g., TH → T, DNY → GY.
• Standardize pronunciations from Indian languages to English, e.g., KS, KSH → X; GRE, GRA → GR (Aggarwal → AGRVAL, Agrewal → AGRVAL).
• Normalize all the vowels, e.g., Behl → BAhl.

Similarity Matching: String similarity metrics are used to calculate the similarity between the encoded fields, i.e., the blocking keys. After testing various similarity metrics like Levenshtein, Cosine [14], Jaro-Winkler, and Jaccard on data having variations in Indian names, it is seen that the Jaro-Winkler and cosine distances take the least time to calculate the distances between 2000 pairs of records (see Fig. 4). Considering the accuracy and F1 scores of these distances, shown in Table 1, the cosine distance performs best in terms of F1 score and time. The cosine similarity between encoded words is given in Eq. (1) below.
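A minimal sketch of a custom phonetic encoder implementing the rules listed above follows. The rule table here is a small illustrative subset and its ordering is an assumption; the paper's full rule set is larger.

```python
# Sketch of the custom phonetic encoding rules above (TH -> T, KS/KSH -> X,
# vowel normalization). RULES is an illustrative subset, not the full table.
RULES = [("KSH", "X"), ("KS", "X"), ("DNY", "GY"),
         ("TH", "T"), ("GRE", "GR"), ("GRA", "GR")]
VOWELS = set("EIOU")

def phonetic_key(name: str) -> str:
    """Encode a name so that similar-sounding variants share a key."""
    key = name.upper().replace(" ", "")
    for src, dst in RULES:
        key = key.replace(src, dst)
    # Normalize all vowels to 'A', as in the 'Behl -> BAhl' example.
    return "".join("A" if ch in VOWELS else ch for ch in key)

# Both regional spellings collapse to the same key, "AMRATA".
assert phonetic_key("Amrita") == phonetic_key("Amrutha")
```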

Fig. 4 Time comparison of distance measures


Table 1 Performance measure of distance functions

Distance metric | Precision | Recall | Accuracy | F-score
Cosine          | 0.99      | 0.97   | 0.98     | 0.99
Levenshtein     | 0.98      | 0.99   | 0.98     | 0.98
Jaccard         | 1.00      | 0.88   | 0.94     | 0.93
Jaro Winkler    | 0.70      | 1.00   | 0.79     | 0.82
Manhattan       | 0.99      | 0.97   | 0.98     | 0.98
Euclidean       | 0.99      | 0.93   | 0.96     | 0.96

Fig. 5 Sample output with scores

Similarity = cos θ = (A · B) / (||A|| ||B||)    (1)

where A and B are the word vectors and θ is the angle between them.

The score has its lowest value of 0 for the least similar vectors and its highest value of 1 for highly similar vectors. If either of the keys to be compared has only a single character, only the first character of the other key is considered for the similarity calculation; this helps overcome the problem of initials found in Indian names.

Block Generation: Based on the similarity calculated in the earlier step, records having a value greater than the threshold are grouped into "blocks". The threshold value selected is 0.6, according to the results shown in Table 2. Tuples of two are then created for all the data present in a block, and these tuples are sent as input to the entity matching module.
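A hedged sketch of this block-generation step follows, using scikit-learn for character-bigram cosine similarity over phonetic keys (such as those produced by the encoder sketched in the previous section); the function name and vectorization choice are illustrative assumptions.

```python
# Sketch of block generation: records whose phonetic keys have cosine
# similarity above the 0.6 threshold become candidate tuple pairs.
from itertools import combinations
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def candidate_pairs(keys, threshold=0.6):
    """Return index pairs of records whose phonetic keys are cosine-similar."""
    vec = CountVectorizer(analyzer="char", ngram_range=(2, 2))
    sim = cosine_similarity(vec.fit_transform(keys))
    return [(i, j) for i, j in combinations(range(len(keys)), 2)
            if sim[i, j] > threshold]

# Keys as produced by the phonetic encoder of Sect. 3.2.
print(candidate_pairs(["AMRATA", "AMRATA", "BABAK"]))  # -> [(0, 1)]
```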

Table 2 Performance metrics for different threshold values

Threshold | Precision | Recall | Accuracy | F-score
0.5       | 0.96      | 0.9    | 0.98     | 0.98
0.6       | 0.98      | 0.99   | 0.99     | 0.99
0.7       | 0.99      | 0.97   | 0.98     | 0.98
0.8       | 0.99      | 0.95   | 0.97     | 0.97

3.3 Entity Matching

A deep learning model based on DeepMatcher [10] is trained to classify the tuples as matches or non-matches. Specifically, the hybrid attribute summarizer is used along with word-level embeddings. The model takes the contents of a tuple pair, i.e., two sequences of words for each attribute, as input and generates a match score as output; the input and output data are CSV files. A traditional recurrent neural network, by default, generates a sequence vector in which the averaging of weights leads to a feature generalization that sometimes highlights even the less important features of the input string. Combining attention mechanisms with traditional RNNs, as done in DeepMatcher [10], allows the creation of a vector for each individual attribute that focuses on information linked to its corresponding context according to feature importance. This makes the alignment computations between the input sequences independent of the input RNN encodings, which leads to faster model convergence. Further, accuracy is improved by providing the encoded names as input instead of the actual names.
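A sketch of training such a model with the deepmatcher library follows, written against the library's published usage; the file names and paths are placeholders, and the exact preprocessing of this paper (e.g., substituting encoded names) is assumed to have been applied to the CSVs beforehand.

```python
# Hedged sketch of hybrid DeepMatcher training; paths are placeholders.
# Each CSV needs an id column plus left_/right_ prefixed attribute columns
# holding the two records of a candidate tuple pair.
import deepmatcher as dm

train, validation, test = dm.data.process(
    path="data", train="train.csv", validation="valid.csv", test="test.csv")

model = dm.MatchingModel(attr_summarizer="hybrid")  # RNN + attention summarizer
model.run_train(train, validation, best_save_path="hybrid_model.pth")
model.run_eval(test)

# Score the candidate tuple pairs produced by the blocking phase.
candidates = dm.data.process_unlabeled("data/candidates.csv", trained_model=model)
predictions = model.run_prediction(candidates)  # DataFrame with a match_score column
```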

3.4 Deduplication

Similarity scores are obtained after the entity matching process, as shown in Fig. 5. Records are considered potential duplicates only if the generated similarity score is higher than a specified threshold value (0.90).
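The final thresholding step reduces to a simple filter over the prediction scores; the snippet below recreates a dummy predictions frame so it runs standalone (in practice, `predictions` is the DataFrame returned by run_prediction above).

```python
# Final deduplication decision: keep only pairs clearing the 0.90 threshold.
import pandas as pd

# Dummy stand-in for DeepMatcher's run_prediction output (Sect. 3.3).
predictions = pd.DataFrame({"pair_id": [0, 1, 2],
                            "match_score": [0.97, 0.42, 0.91]})
duplicates = predictions[predictions["match_score"] > 0.90]
print(duplicates)  # the two pairs retained as potential duplicates
```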

4 Results

A dataset with 6250 rows containing 1560 duplicate entries was used for testing the blocking phase. The total number of comparisons for 6250 rows without blocking would have been about 19 × 10⁶, by the formula C = n(n − 1)/2. With blocking, the number of comparisons was reduced to 12 × 10⁴, and all of the comparisons between the 1560 duplicate entries and their corresponding rows were present in these 12 × 10⁴ comparisons.


Table 3 Performance metrics for deduplication

Data variation | Without encoding (Precision / Recall / Accuracy / F-score) | With encoding (Precision / Recall / Accuracy / F-score)
Missing fields with address and name variations | 0.99 / 0.58 / 0.73 / 0.73 | 1.00 / 0.86 / 0.93 / 0.92
Missing fields with address variations          | 1.00 / 0.96 / 0.98 / 0.98 | 1.00 / 0.90 / 0.95 / 0.95
Missing fields with name variations             | 0.99 / 0.60 / 0.79 / 0.74 | 0.99 / 0.93 / 0.96 / 0.96
Missing fields                                  | 0.98 / 0.98 / 0.98 / 1.00 | 0.93 / 0.93 / 0.96 / 0.99

This shows a 99% reduction in comparisons between entities, along with the capability to cluster the duplicate entries into blocks for comparison. Anywhere between 1 and 5 fields, excluding names, were considered missing. For names and addresses, all the variations described in Sects. 2 and 3.1 were considered. Accuracy and F1 were calculated using 3120 matching and 3130 non-matching cases. Table 3 shows that accuracy and F1 score were poor for cases 1 and 3. Therefore, the names and surnames were replaced with their encoded values and a model was trained accordingly. This model gave better results in terms of average F1 score (84.25% increased to 94.75%) and average accuracy (88.5% increased to 95%).
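As a quick worked check of the comparison counts quoted above:

```python
# Worked check of the blocking reduction, with C = n(n - 1)/2.
n = 6250
without_blocking = n * (n - 1) // 2   # 19,528,125, i.e., about 19 x 10^6
with_blocking = 12 * 10**4            # about 0.12 x 10^6 after blocking
print(1 - with_blocking / without_blocking)  # ~0.994, a >99% reduction
```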

5 Conclusion and Future Work

An entity matching and deduplication module for structured data, with India-centric customized layers and deep learning models, has been developed. Data quality issues, schema variations, and the volume and velocity of data growth were taken into consideration while designing the system, and a custom phonetic function suitable for Indian names has also been developed. The system is a proof of concept demonstrating the capabilities of deep learning strategies for entity resolution and deduplication in the complex multi-lingual scenario of Indian data digitization, coupled with the lack of standardization of addresses, names, and form factors. The outcome of this research shows that a DeepMatcher-based deep learning approach is promising for solving several problems in the field of data integration, such as automated data extraction, data cleaning, and deduplication, in all the aforesaid Indian scenarios. The system can be further extended to other application domains like medical records and transactional data analysis, and can suit any domain that requires structured data analytics.

References
1. Fellegi, I.P., Sunter, A.B.: A theory for record linkage. J. Am. Stat. Assoc. 64, 1183–1210 (1969). https://doi.org/10.1080/01621459.1969.10501049
2. Winkler, W.E.: The state of record linkage and current research problems. In: Statistical Research Division, US Census Bureau (1999)
3. Sarawagi, S., Bhamidipaty, A., Kirpal, A., Mouli, C.: Alias. In: VLDB'02: Proceedings of the 28th International Conference on Very Large Databases, pp. 1103–1106 (2002). https://doi.org/10.1016/b978-155860869-6/50119-0
4. Elfeky, M.G., Verykios, V.S., Elmagarmid, A.K.: TAILOR: a record linkage toolbox. In: Proceedings 18th International Conference on Data Engineering (2002). https://doi.org/10.1109/icde.2002.994694
5. Bilenko, M., Mooney, R.J.: Adaptive duplicate detection using learnable string similarity measures. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining—KDD'03 (2003). https://doi.org/10.1145/956750.956759
6. Thor, A., Rahm, E.: MOMA—a mapping-based object matching system. CIDR 2007, 247–258 (2007)
7. Christen, P.: Febrl: an open source data cleaning, deduplication and record linkage system with a graphical user interface. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining—KDD'08 (2008). https://doi.org/10.1145/1401890.1402020
8. Kolb, L., Thor, A., Rahm, E.: Dedoop: efficient deduplication with Hadoop. Proc. VLDB Endow. 5, 1878–1881 (2012). https://doi.org/10.14778/2367502.2367527
9. Konda, P., Das, S., Suganthan, G.C.P., et al.: Magellan: toward building entity matching management systems. Proc. VLDB Endow. 9, 1197–1208 (2016). https://doi.org/10.14778/2994509.2994535
10. Mudgal, S., Li, H., Rekatsinas, T., et al.: Deep learning for entity matching. In: Proceedings of the 2018 International Conference on Management of Data (2018). https://doi.org/10.1145/3183713.3196926
11. Papadakis, G., Skoutas, D., Thanos, E., Palpanas, T.: Blocking and filtering techniques for entity resolution. ACM Comput. Surv. 53, 1–42 (2020). https://doi.org/10.1145/3377455
12. Christen, P.: A comparison of personal name matching: techniques and practical issues. In: Sixth IEEE International Conference on Data Mining—Workshops (ICDMW'06) (2006). https://doi.org/10.1109/icdmw.2006.2
13. Kaushik, V.D., Bendale, A., Nigam, A., Gupta, P.: Certain reduction rules useful for deduplication algorithm of Indian demographic data. In: 2014 Fourth International Conference on Advanced Computing and Communication Technologies (2014). https://doi.org/10.1109/acct.2014.85
14. Rahutomo, F., Kitasuka, T., Aritsugi, M.: Test collection recycling for semantic text similarity. In: Proceedings of the 14th International Conference on Information Integration and Web-based Applications and Services—IIWAS'12 (2012). https://doi.org/10.1145/2428736.2428784

Dual-Message Compression with Variable Null Symbol Incorporation on Constrained Optimization-Based Multipath and Multihop Routing in WSN

Pratham Majumder, Tuhin Majumder, Punyasha Chatterjee, and Sunrose Shrestha

P. Majumder is with the University of Calcutta, Kolkata, West Bengal, India, and CMR Institute of Technology, Bengaluru, Karnataka, India. T. Majumder is with Cognizant Technology Solutions, Kolkata, West Bengal, India. P. Chatterjee is with the School of Mobile Computing and Communication, Jadavpur University, West Bengal, India. S. Shrestha is with CMR Institute of Technology, Bengaluru, Karnataka, India.

Abstract The majority of energy-aware routing protocols for wireless sensor networks focus on hierarchical routing mechanisms, which have the major drawback of rapid depletion of the battery life of sensor nodes near the sink due to an uneven distribution of packets. To mitigate the adverse effects of an improper packet distribution policy for traffic flowing toward the cluster head, we introduce a novel packet distribution policy founded on a constrained optimization problem, so that power is uniformly dissipated at all nodes. Our analysis shows that this optimization strategy can be effectively employed for a 1D nonlinear network with three parallel paths consisting of five nodes. Additionally, we incorporate our established source coding scheme known as dual-message compression with variable null symbol (DCVNS), which follows the concept of silent communication. This scheme reduces the duration of the most energy-consuming active state of a sensor node and also reduces receiver energy by shortening the encoded message, which eventually reduces overall communication time. Simulation results on a real-life sensor dataset with commercially available low-cost and low-power devices, e.g., CC1100 and Maxim 2820, show that our proposed approach outperforms the existing schemes in all aspects of the transmission energy profile.

Keywords Adhoc network · Wireless sensor network · Energy-efficient routing · Load-balanced multipath routing

1 Introduction

With the rapid spread of pervasive computing, the use of sensor nodes in various wireless applications is increasing. One major application of wireless communication is the field of wireless sensor networks (WSNs). The advent of low-cost communication and sensing devices has led to the deployment of large numbers of sensors in geographical areas that are not easily accessible, enabling remote monitoring of different activities in those areas. An essential criterion for these sensor devices is lifetime sustainability in terms of battery power: the more residual battery power, the longer the service life. These sensors are usually deployed in large numbers so as to maintain a long battery life and high reliability of service while simultaneously bringing down costs [1]. Nodes are declared dead if and when their battery power becomes insufficient to carry out any kind of communication. As these sensor nodes are deployed in mostly inaccessible areas, battery replacement is practically infeasible. Therefore, increasing the sustainability of sensor nodes is one of the challenging problems in this domain, and the optimization of sensor transmission energy has gained paramount importance in research that integrates data acquisition [2], multi-dimensional data and query processing [3], media access control [4], energy-efficient routing protocols [5], and much more. This paper presents our proposal for a novel energy-aware communication protocol in a multipath propagation-based sensor network.

2 Related Works

Numerous schemes spanning the various layers of the communication protocol stack have been investigated in the literature to implement energy-efficient communication in wireless sensor networks. For example, a common approach in MAC layer-based solutions for reducing energy consumption is to avoid or reduce collisions so as to minimize packet retransmissions [6]. A large number of low-energy routing protocols are also available in the literature, among which the most popular ones for wireless sensor networks include TEEN [7]. In this algorithm, packets are strategically routed through different intermediate nodes by reducing the transmission range of sensor nodes, thereby saving overall transmission energy. LEACH [8] proposed a dynamic clustering algorithm for uniform distribution of load among the nodes in a specific cluster, potentially increasing network lifetime; however, cluster reconfiguration at every phase is considered a tedious task in terms of implementation cost and complexity. To overcome such issues, this work presents a novel energy-efficient transmission process that employs ratio-based segregation of packets to the neighbor nodes, proportionate to the distance toward the cluster head, resulting in equal power depletion across all nodes in a multipath network. We formulate this as a constrained optimization problem [9] to solve for the required ratios.


3 Effect of Source Coding on Transmission Energy

A recent trend in many WSN-based IoT applications is to keep the cost and energy consumption of the devices extremely low, so that they remain sustainable in terms of their CO2 generation rate; such applications therefore employ low-cost and low-power radios with simple modulation schemes such as OOK, ASK, and PSK/FSK, and several studies have analyzed the performance of the various narrowband digital modulation schemes used in WSNs. Conventionally, every bit of a message, be it a '0' or a '1', is transmitted in the form of a modulated carrier signal, and hence a finite amount of energy is spent in transmitting every bit of the message; this is known as the energy-based transmission (EbT) scheme, and it faces the problem of an exponential energy requirement when the message is too large. In contrast, a new paradigm has been introduced in which a sensor node can selectively choose some bit periods during which the transmitter does not transmit at all, saving transmission energy by keeping the transmitter in deep sleep mode. This incorporates a novel concept called communication through silence, in addition to the efficient source coding algorithms proposed in [10, 11]. In these schemes, a raw binary message is appropriately encoded by a suitable source coding technique that, first, reduces the message length and, second, provides a highly asymmetric distribution of encoding symbols, so that the most frequent symbol can be kept silent during transmission. This gives the transmission energy reduction policy its uniqueness with respect to other established schemes in the literature.

4 Network Model and Optimization Criterion

4.1 Network Model Description and Packet Distribution Policy

Figure 1 shows a simple network model of a wireless sensor network. Suppose that the nodes are capable of transmitting messages over non-identical distances and are able to control their transmission power. For simplicity, we start with a 1D nonlinear network. Let d be the distance between any two adjacent nodes in the network; the transmission power required to send a packet over distance d is then proportional to d². We further assume that the packet generation rate of every node is identical, say m packets in a specific interval of time. There are two possibilities for Node 1 to deliver its packets to Node 5: (i) send all its packets to Node 5 directly, or (ii) use a multihop packet distribution and multipath transmission policy. Direct propagation from the source (Node 1) to the destination (Node 5) routes the packets over a distance of 2d; the energy requirement for this scheme is proportional to the square of the distance, i.e., 4d². In the second scheme, the packet distribution policy can be explained as follows:


Fig. 1 Network architecture with five nodes

• the source node distributes its packets among every path to balance the load on the servicing nodes so that the total energy of the network is minimized;
• every node has to forward not only the packets it generates, but also the packets it receives from previous nodes, in such a manner that the energy requirements of all servicing nodes are equal.

With reference to Fig. 1, let us assume that every node (1, 2, 3, and 4) generates m packets. The total number of packets handled by each node can then be expressed as

R_i = m · x_ij + m = m(1 + x_ij)    (1)

where x_ij is the fraction of packets directly received from node i by node j.

4.2 Energy Calculation

The energy of a servicing node can be expressed as the number of packets it transfers multiplied by the squared distance to each interacting node, summed over all interacting nodes:

E_i = Σ_{j=1, j≠i}^{n} R_i · x_ij · ((j − i) d)²    (2)

where i and j denote the generating and receiving nodes, respectively. The total energy of the network is then the sum of the energies of the individual servicing nodes, E_total = E_1 + E_2 + E_3 + E_4.
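The sketch below gives one possible reading of Eqs. (1)-(2) as code; the geometry, the values of m and d, and the packet fractions x are illustrative assumptions, not the optimized values of Table 2.

```python
# One possible reading of Eqs. (1)-(2); parameters are illustrative only.
def node_energy(i, x, m=1.0, d=1.0, n=5):
    """E_i = sum over j != i of R_i * x_ij * ((j - i) * d)^2, R_i = m(1 + x_ij)."""
    energy = 0.0
    for j in range(1, n + 1):
        x_ij = x.get((i, j), 0.0)
        if j == i or x_ij == 0.0:
            continue
        r_i = m * (1.0 + x_ij)                     # Eq. (1): own + relayed packets
        energy += r_i * x_ij * ((j - i) * d) ** 2  # Eq. (2)
    return energy

# Example: a source splitting its packets over three relays and the sink.
print(node_energy(1, {(1, 2): 0.3, (1, 3): 0.3, (1, 4): 0.3, (1, 5): 0.1}))
```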


4.3 Solving the Optimization Problem

The objective is to minimize the total energy of the network, E_total, while keeping the energy expenditure of every node equal. We solve the problem using a linear optimization technique.

Objective Function. The objective function of the problem can be represented as f_obj = Σ_j Σ_i (m · x_ij + m). In our problem, i = 1 and j = 1–4.

Constraints. The constraints on the objective function f_obj are:
• the energy expenditure of every node should be equal, i.e., E_1 = E_2 = E_3 = E_4;
• the sum of all fractions of packets departing from each generating node should equal 1, i.e., Σ_j x_ij = 1;
• each fractional value should lie between 0 and 1, i.e., 0 ≤ x_ij ≤ 1;
• nodes directly connected to the destination node can send their packets directly without using any other hops, so the fraction of generated packets for these nodes should be equal to 1; in our problem, x_25 = x_35 = x_45 = 1.
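A hedged sketch of this constrained optimization follows, using SciPy's SLSQP solver in place of the MATLAB toolbox used by the authors. The simplified energy model (each relay one hop of length d from both source and destination, direct source-to-sink link of length 2d) is an assumption, but with d = m = 1 it reproduces the "Source" row of Table 2.

```python
# Sketch with SciPy instead of MATLAB; the energy model is a simplified
# assumption that happens to reproduce the 'Source' row of Table 2.
from scipy.optimize import minimize

d, m = 1.0, 1.0  # unit hop distance and per-node packet count

def node_energies(x):
    """x = [x12, x13, x14, x15]: source fractions per relay and direct link."""
    e_src = m * (x[0] + x[1] + x[2]) * d**2 + m * x[3] * (2 * d) ** 2
    e_relays = [m * (1 + xk) * d**2 for xk in x[:3]]  # relay: own m + received
    return [e_src] + e_relays

def total_energy(x):
    return sum(node_energies(x))

cons = [{"type": "eq", "fun": lambda x: sum(x) - 1.0}]  # all source packets leave
cons += [{"type": "eq",                                  # equal per-node energy
          "fun": lambda x, k=k: node_energies(x)[0] - node_energies(x)[k + 1]}
         for k in range(3)]

res = minimize(total_energy, x0=[0.25] * 4, bounds=[(0, 1)] * 4,
               method="SLSQP", constraints=cons)
print(res.x.round(3), round(res.fun, 3))  # ~[0.3 0.3 0.3 0.1] and 5.2
```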

5 Employing the Source Coding Scheme

Our energy-efficient source coding scheme, dual-message compression with variable null symbol (DCVNS) [11], is applied to the messages generated by each participating node. Consider two binary strings M_t^i and M_{t+1}^i, each of length n, generated by a particular sensor i at two consecutive time intervals t and t + 1 and to be transmitted to a destination node via the multipath and multihop propagation methods. We generate a composite message string M_{t,t+1}^i by interleaving the bits of M_t^i and M_{t+1}^i. Let M_t^i = M_{t,n−1}^i M_{t,n−2}^i ... M_{t,1}^i M_{t,0}^i and M_{t+1}^i = M_{t+1,n−1}^i M_{t+1,n−2}^i ... M_{t+1,1}^i M_{t+1,0}^i; the composite string is then M_{t,t+1}^i = M_{t,n−1}^i M_{t+1,n−1}^i M_{t,n−2}^i M_{t+1,n−2}^i ... M_{t,1}^i M_{t+1,1}^i M_{t,0}^i M_{t+1,0}^i. Each pair of consecutive bits of the composite string, i.e., M_{t,n−j}^i M_{t+1,n−j}^i (where 0 ≤ j ≤ n − 1), is then replaced by one of the four encoding symbols A, B, C, or D according to whether its value is 00, 01, 10, or 11, respectively.
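A minimal sketch of this encoding step follows; the function name is illustrative, and the full DCVNS scheme in [11] additionally selects which symbol is kept silent.

```python
# Sketch of the DCVNS encoding step: interleave two length-n binary messages
# from consecutive intervals and map each bit pair to a four-ary symbol.
PAIR_TO_SYMBOL = {"00": "A", "01": "B", "10": "C", "11": "D"}

def dcvns_encode(m_t: str, m_t1: str) -> str:
    assert len(m_t) == len(m_t1)
    # Pair bit j of M_t with bit j of M_{t+1}, then map each pair to a symbol.
    return "".join(PAIR_TO_SYMBOL[a + b] for a, b in zip(m_t, m_t1))

print(dcvns_encode("1010", "1100"))  # -> "DBCA"
```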

6 Effects of Device Characteristics

To estimate the total energy consumed by the radio of a sensor node, we consider commercial radio devices, e.g., the CC1100 [12] and Maxim 2820 [13] chips, which are widely used in low-power, low-cost WSN IoT applications. Such devices, however, consume considerably more power in the transmit or receive states than when the radio is in its low-power operation mode.


Table 1 Characteristics of radio devices

Device specification | Maxim 2820 | CC1100
Data rate (Kbps)     | 50         | 2.5
Tp (µs)              | 20         | 400
Vcc (V)              | 2.7        | 3.6
IHIGH (mA)           | 70         | 30.3
ILOW (mA)            | 25         | 1.9
TON (µs)             | 3          | 88.4

7 Result Analysis In this section, we describe analysis of energy expenditure of the sensor nodes from two aspects. Firstly, we have demonstrated the theoretical energy calculation solved using our proposed optimization strategy, and after that, we will incorporate our proposed DCVNS [11] source coding technique on multipath multihop-based approach for performance analysis.

7.1 Theoretical Energy Savings The linear optimization problem is solved using MATLAB optimization toolbox [14]. In our work, we have analyzed the overall energy distribution of the network considering distance parameters of the nodes and by varying packet generation type in the network. Table 2 suggests the packet distribution ratios to neighboring nodes considering Case 2.1 and Case 1.1, respectively. Result shows the optimized value of the objective function is 5.200 unit, and verifying constraint 1, energy requirement of source node is 1.3000 unit. Whereas, if the source node tries to send all m packet directly to destination node without using any multipath multihop strategy, then energy requirement will be of 4 unit. So, there is an improvement of 307.69 times of the energy requirement by the source using multipath strategy. Considering nonidentical distance between hops in the network, the energy expenditure is increased by 46%.


Table 2 Energy calculation of each node (E_total and e_i are estimated energies in units; x values are the optimized packet fractions)

Distance metric        | Transmission type | E_total | e1   | e2   | e3   | e4    | x12   | x13   | x14   | x15(1) | x15(2) | x15(3)
Identical (d)          | Source            | 5.2     | 1.3  | 1.3  | 1.3  | 1.3   | 0.3   | 0.3   | 0.3   | 0.1    | –      | –
Identical (d)          | Source + 1 node   | 5.16    | 1.29 | 1.29 | 1.29 | 1.29  | 0.29  | 0.29  | 0.29  | 0.05   | 0.05   | –
Identical (d)          | Source + 2 nodes  | 5.12    | 1.28 | 1.28 | 1.28 | 1.285 | 0.285 | 0.285 | 0.285 | 0.04   | 0.04   | 0.04
Identical (d)          | No direct         | 4       | 1    | 1    | 1    | 1     | 0.33  | 0.33  | 0.33  | –      | –      | –
Non-identical (d, √2d) | Source            | 9.76    | 2.44 | 2.44 | 2.44 | 2.44  | 0.22  | 0.22  | 0.22  | 0.33   | –      | –
Non-identical (d, √2d) | Source + 1 node   | 9.68    | 2.42 | 2.42 | 2.42 | 2.42  | 0.21  | 0.21  | 0.21  | 0.18   | 0.18   | –
Non-identical (d, √2d) | Source + 2 nodes  | 9.52    | 2.38 | 2.38 | 2.38 | 2.38  | 0.19  | 0.19  | 0.19  | 0.14   | 0.14   | 0.14
Non-identical (d, √2d) | No direct         | 6       | 1.5  | 1.5  | 1.5  | 1.5   | 0.33  | 0.33  | 0.33  | –      | –      | –

7.2 Simulation on Real-Life Sensor Data

The dataset is assembled from Activity 2.3 of the CityPulse EU FP7 project [15], which concerns real-time IoT stream processing and large-scale data analytics for smart city applications.

Performance Comparison. We now compare the effectiveness of our proposed scheme with popular load-balancing protocols, e.g., the load-balanced routing (LBR) protocol [16] and the energy-efficient sleep awake aware (EESAA) protocol [17]. Figure 2 shows the overall communication energy profile obtained with the three individual load-balancing techniques on the commercial radios (CC1100 and Maxim 2820), using the device specifications given in Table 1. From Fig. 2, it is clear that significant savings are achieved in both transmission energy (E_trans) and base energy (E_base) across the entire communication energy profile for both the CC1100 and Maxim 2820 transceiver radios.

Fig. 2 Communication energy profile for different schemes using CC1100 and Maxim 2820


The fall time t_FALL and rise time t_RISE are usually very small, and the corresponding energy for switching from the transmission state to the idle state and vice versa is small enough to be neglected.

8 Conclusion

In this paper, we introduced a novel packet distribution policy based on a constrained optimization problem that keeps the power dissipation rate of all participating nodes equal. Additionally, we incorporated our established source coding scheme, dual-message compression with variable null symbol (DCVNS), into the proposed network layer-based packet transmission scheme. This source coding method uses the concept of silent communication, which reduces the duration of the most energy-consuming active state and thereby reduces overall network energy. Simulation results on a real-life sensor dataset with commercially available low-cost and low-power devices, e.g., CC1100 and Maxim 2820, show that our proposed approach outperforms the existing schemes in all aspects of the transmission energy profile.

References
1. Kuhn, F., Moscibroda, T., Wattenhofer, R.: Initializing newly deployed ad hoc and sensor networks. In: Proceedings of the 10th Annual International Conference on Mobile Computing and Networking, pp. 260–274 (2004)
2. Majumder, P., Chatterjee, P., Sinha, K.: Run length distribution based block coding scheme for sustainable IoT applications. In: 2020 2nd Ph.D. Colloquium on Ethically Driven Innovation and Technology for Society (Ph.D. EDITS), pp. 1–2. IEEE (2020)
3. Cheng, S., Cai, Z., Li, J.: Curve query processing in wireless sensor networks. IEEE Trans. Veh. Technol. 64(11), 5198–5209 (2014)
4. Kumar, A., Zhao, M., Wong, K.-J., Guan, Y.L., Chong, P.H.J.: A comprehensive study of IoT and WSN MAC protocols: research issues, challenges and opportunities. IEEE Access 6, 76228–76262 (2018)
5. Brar, G.S., Rani, S., Chopra, V., Malhotra, R., Song, H., Ahmed, S.H.: Energy efficient direction-based PDORP routing protocol for WSN. IEEE Access 4, 3182–3194 (2016)
6. Demirkol, I., Ersoy, C., Alagoz, F.: MAC protocols for wireless sensor networks: a survey. IEEE Commun. Mag. 44(4), 115–121 (2006)
7. Manjeshwar, A., Agrawal, D.P.: TEEN: a routing protocol for enhanced efficiency in wireless sensor networks, vol. 1, p. 189 (2001)
8. Ley, S.V., Baxendale, I.R., Bream, R.N., Jackson, P.S., Andrew, G.: Multi-step organic synthesis using solid-supported reagents and scavengers: a new paradigm in chemical library generation. J. Chem. Soc. Perkin Trans. 1 23, 3815–4195 (2000)
9. Farmani, R., Wright, J.A.: Self-adaptive fitness formulation for constrained optimization. IEEE Trans. Evol. Comput. 5, 445–455 (2003)
10. Bhattacharya, A., Majumder, P., Sinha, K., Sinha, B.P., Kavitha, K.V.N.: An energy-efficient wireless communication scheme using quint Fibonacci number system. Int. J. Commun. Netw. Distrib. Syst. 16(2), 140–161 (2016)


11. Majumder, P., Sinha, K., Sinha, B.P.: DCVNS: a new energy efficient transmission scheme for wireless sensor networks. In: 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), pp. 1–5. IEEE (2018)
12. CC1100: https://www.ti.com/product/CC1100
13. Maxim2820: https://datasheetspdf.com/pdf/497162/Maxim/MAX2820/1
14. Matlab Optimization Toolbox: https://www.mathworks.com/optimization.html
15. Dataset: http://iot.ee.surrey.ac.uk:8080
16. Agarwal, S., Das, A., Das, N.: An efficient approach for load balancing in vehicular ad-hoc networks. In: IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS), pp. 1–6 (2016)
17. Ennaciri, A., Erritali, M., Bengourram, J.: Load balancing protocol (EESAA) to improve quality of service in wireless sensor network. Proc. Comput. Sci. 151, 1140–1145 (2019)

Spatio-temporal Variances of COVID-19 Active Cases and Genomic Sequence Data in India

Sumit Sen and Neelam Dabas Sen

Abstract Active cases of the COVID-19 pandemic have been reported for more than a year, and separately, there have been significant efforts to collect genome sequencing data during this period to track mutations and evolving strains. While both these datasets can be independently analyzed over space and time, the patterns and variances evidenced by clustering of these datasets during two different waves of the epidemic in India show important differences. Quantifying these differences can help characterize the relative need for collection of genomic data. Differences in the clusters are evident both spatially and temporally, and there are varying distances between such clusters as well. While similarity metrics and techniques have been developed in the context of spatio-temporal datasets, especially for moving objects, we demonstrate the limitations of such methods in analyzing epidemiological data. Finally, we highlight the challenges of such analysis in massive datasets and performance constraints at varying spatial and temporal scales.

Keywords Spatio-temporal data · Clustering · Similarity metrics · Epidemiology · COVID-19

1 Introduction

1.1 Spatio-temporal Data of COVID-19

Spatio-temporal data provide important epidemiological insights into disease progression, and in the context of the COVID-19 pandemic, the availability of daily data across the world has formed the cornerstone of policy interventions and public health planning in different parts of the world [1, 2]. Multiple studies have documented the

S. Sen
GISE Lab, CSE-IIT Bombay, Mumbai 400076, India

N. D. Sen (B)
School of Life Sciences, JNU, New Delhi 110067, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_32


Fig. 1 Active cases of COVID-19 in India and the two periods used in the paper

spatial and spatio-temporal pattern of the disease [3, 4], and there has been a systematic effort to collect, analyze, and use data for the management of the pandemic. The Ministry of Health and Family Welfare has provided daily updates of newly reported cases, deaths, and active cases (https://www.mygov.in/corona-data/covid19-statewise-status/). While much of the data reported has been at the district level, most researchers have focused on data at coarser granularity (state level), partly because of higher mobility across districts and greater administrative control through the policies of a state government. Data have been collected since the beginning of the pandemic, and even at state level there are more than 12,000 spatio-temporal records of each type of data (such as new cases, new deaths, and active cases). Combined with the heterogeneity of demography, climatology, human mobility, and public health datasets, these spatio-temporal records provide an opportunity to develop tools and techniques for the management and control of this pandemic and possibly those in the future. The data for active cases in India are shown in Fig. 1 and illustrate the second wave starting at about February 15, 2021. In this paper, we use (i) the period of the second wave and (ii) the year-long period leading up to it as two distinct temporal datasets to compare the spatio-temporal nature of data related to the pandemic.

1.2 Spatio-temporal Analysis of SARS-CoV-2 Genome Sequencing Data

SARS-CoV-2, the virus responsible for COVID-19, is a positive-strand RNA virus with a genome of about 30 kb that encodes four structural proteins: the spike (S) protein,

the envelope (E) protein, the matrix (M) protein, and the nucleocapsid (N) protein, together with 8 accessory proteins and 16 non-structural proteins, including the RNA-dependent RNA polymerase [5]. Over the period between March 4, 2020 (when the first genome sequence was submitted from India) and March 21, 2021, a total of 15,281 sequences were deposited at Gisaid.org and made available to researchers [6]. There are several initiatives for tracking SARS-CoV-2 single-nucleotide variations, lineages, and clades using the available genomes on the GISAID database while filtering by location, date, gene, and mutation of interest [7]. It is important to note that while genome sequences of the SARS-CoV-2 virus have been deposited all over the world, the frequency at which such sequences were deposited (and hence are available for analysis) varies a great deal. For example, while 15,281 sequences are available from India, there are more than 415,897 sequences from the UK during the same period. Furthermore, in December 2020, India had 876 sequence submissions compared to 28,834 submissions in the UK (which was in the middle of a surge). In April 2021, India (which had a surge during this duration) had 2,708 submissions compared to 53,204 post-surge submissions in the UK. Such variances in genome data are also noted within the country: the state of Maharashtra reported 892 sequences in April compared to 286 sequences from Karnataka and 9 from Kerala during this period. There appears to be a linkage between high active caseloads and greater interest in sequencing the viral genome, as evident in Fig. 2; however, it is difficult to establish (or negate) a causal relationship, given the time gaps between the date of sample collection and the date of sequence submission. Furthermore, as we will examine later in this paper, comparing the similarity of genome data submissions and case data is a challenging task.

2 Spatio-temporal Clustering for Epidemiological Analysis

Spatio-temporal (ST) clustering is an extension of spatial clustering in which the time dimension is introduced into spatial data. It plays a vital role in many engineering, scientific, and real-world applications, such as epidemiological analysis. Combining geographic dimensions with time introduces several application-dependent challenges as well as computational challenges, especially for large and complex datasets.

2.1 ST Variations in India During the Second Wave of COVID-19

Pandemics affect different regions differently, and governmental policies and societal behavior have important implications for temporal variations such as 'flatten the


Fig. 2 a Map-based visualization of active caseloads in different states before the second wave, along with (blue) hotspots of genome sequences during this time. The number labels are the values of the total number of sequences of the dominant strain (Clade 20B). b Visualization, similar to (a), of active case data during the second wave, with (red) hotspots of genome sequence submissions. Clade 21A (or lineage B.1.617, also known as G/452R.V3) is denoted by red color and appears to be the dominant strain in the second wave; number labels show the number of sequences submitted that belong to this strain. c Time-resolved phylogenetic tree built using Nextstrain tools for samples collected in India before the second wave, showing phylogenetic variation in the genomic sequence of SARS-CoV-2. Colored nodes indicate different lineages or clades of the virus. d Time-resolved phylogenetic tree, similar to (c), for samples deposited in India during the second wave.

curve’ [1] and reduce spatial spread [8]. In the Indian context, it is easy to observe marked differences in the spread of disease both in spatial sense (Fig. 2a, b) and in temporal sense (Fig. 1). Also, phylogenetic variations in genome data submissions are shown in Fig. 2c, d. Figure 2b also shows the spatial variation between sequence data submission during the second wave (total of 5936 sequences for the period of almost 3 months) compared to the 12 months before (total of 3935 sequences).


2.2 ST Clustering Helps to Understand COVID-19 Hotspots

ST clustering of daily reports of active cases provides an important view of the emergence and disappearance of disease clusters. We examined the Getis-Ord GI* statistic, calculated by comparing the local sum of the number of active cases in a given state and its neighbors to the sum of all feature values [9], to create hotspot maps. However, these hotspots continue to transform, as evident from the maps in Fig. 2a, b. ST clusters provide a view of such hotspots while accounting for the temporal dimension.
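As a concrete illustration of the hotspot computation described above, the following Python sketch evaluates the standard Getis-Ord GI* z-score for each state from a vector of active-case counts and a binary spatial-weights matrix. The weights construction (here, each state plus its adjacent states) is an assumption of the sketch, not taken from the paper.

    import numpy as np

    def getis_ord_gi_star(x, w):
        """Getis-Ord GI* z-scores; x is an (n,) vector of case counts and w
        an (n, n) binary weights matrix that includes each region as its
        own neighbor. Large positive scores flag statistically hot spots."""
        n = len(x)
        x_bar = x.mean()
        s = np.sqrt((x ** 2).mean() - x_bar ** 2)
        w_sum = w.sum(axis=1)                    # number of neighbors
        w_sq = (w ** 2).sum(axis=1)
        local = w @ x                            # local sums of active cases
        num = local - x_bar * w_sum
        den = s * np.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        return num / den

    # e.g. 36 states/UTs with an adjacency matrix adj (1 = shares a border):
    # gi = getis_ord_gi_star(active_cases, adj + np.eye(36))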

2.3 ST Clustering Algorithms

Clustering is a popular unsupervised method for discovering potential patterns and is widely used in data analysis, especially for geographical data [10]. It aims to group events according to neighboring occurrence and/or similar attributes; in the case of spatio-temporal clustering, these variables are location and time. Most clustering algorithms measure the distance between each pair of events, and various distance functions are adopted in the clustering methods, such as the Euclidean and Manhattan distance functions. Several techniques have been employed for identifying spatio-temporal clusters, such as the spatio-temporal K-nearest neighbors test, space-time interaction methods, the spatial scan statistic, and partitional clustering techniques used by DBSCAN or kernel density estimation (KDE) [10]. While most of these techniques extend spatial clustering by treating time as another dimension, the specific requirements of the application dictate the parameters and thresholds. For example, different parameters and adaptations of the algorithms are used in the identification of disease clusters compared to the identification of flocks, convoys, and swarms [11]. For the analysis in this paper, we employ a spatial scan statistic that has been developed to test for geographical clusters and to identify their approximate location [12]. The spatial scan statistic imposes a circular window on the map and lets the center of the circle move over the area so that at different positions the window includes different sets of neighboring areas. Conditioning on the observed total number of cases, N, the spatial scan statistic S is defined as the maximum likelihood ratio over all possible circles Z,

S = max_Z {L(Z)} / L_0 = max_Z {L(Z) / L_0},    (1)

where L(Z) is the maximum likelihood for circle Z, expressing how likely the observed data are given a differential rate of events within and outside the zone, and where L_0 is the likelihood function under the null hypothesis. Thus, to find the most likely spatio-temporal clusters, the ratio L(Z)/L_0 is maximized over all the


circles and cut-off p-values obtained through Monte Carlo hypothesis testing are used to decide on feasible clusters.
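A minimal sketch of this procedure follows, using the Poisson form of Kulldorff's likelihood ratio. It scans purely spatial circles over region centroids, whereas the analysis in the paper also scans over time windows; the Monte Carlo significance step is only indicated in a comment.

    import numpy as np

    def poisson_llr(c, e, C):
        """Kulldorff log-likelihood ratio for a zone with c observed and e
        expected cases out of C total (high-rate zones only)."""
        if c <= e:
            return 0.0
        return c * np.log(c / e) + (C - c) * np.log((C - c) / (C - e))

    def best_circular_cluster(coords, cases, pop, max_pop_frac=0.5):
        """Slide a circular window over every center/radius pair and return
        the zone maximizing the likelihood ratio of Eq. (1)."""
        C, P = cases.sum(), pop.sum()
        best_llr, best_zone = 0.0, None
        for center in coords:
            d = np.linalg.norm(coords - center, axis=1)
            for r in np.sort(d):                 # grow the circle outwards
                zone = d <= r
                if pop[zone].sum() > max_pop_frac * P:
                    break
                c = cases[zone].sum()
                e = C * pop[zone].sum() / P      # expected under the null
                llr = poisson_llr(c, e, C)
                if llr > best_llr:
                    best_llr, best_zone = llr, zone.copy()
        # p-values: re-run on Monte Carlo replications of cases and rank the
        # observed maximum among the simulated maxima.
        return best_llr, best_zone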

2.4 Event Clustering for Disease Clusters

Event clustering focuses on the discovery of groups of events that are close to each other with respect to space and time. Clustering of events is extensively applied to establish evolutionary relationships among genomes by identifying the accumulation of viral mutations and identifying different clades or lineages. Genetic cluster analysis has been regularly applied to genome sequencing, and the global evolution of the SARS-CoV-2 virus has been extensively reported through Nextstrain [13]. We use the Nextstrain and Augur bioinformatics toolkit for the phylogenetic analysis and presentation of the genome sequences accessed from GISAID. Figure 2c, d represents the phylogenetic analysis of the gene sequence data from India during two time periods. It is important to note that most of the sequence data (obtained from infected patients) also carry the collection date and location. Thus, spatio-temporal clustering of the phylogenetic records yields genetic events (in terms of the number and nature of mutations). Such clustering provides insights into mutational events in the evolution of the virus [14]. Note that in the second wave, clade 21A is the dominant lineage, compared to the dominant clades 19B and 20B during the earlier period. Such events are distinct from, but surely related to, clusters of disease case data. While spatio-temporal clustering has been extensively applied to the area of moving objects, the clustering algorithms in the context of understanding disease clusters do not presume topological connectivity in the form of trajectories. Contiguity is, however, taken into consideration through neighborhood relations in such disease-based event clustering.

3 Spatio-temporal Variances of COVID-19 in India

3.1 Active Cases Have Seen Many More Clusters in the Second Wave

We conducted spatio-temporal clustering of the active caseload in India before the second wave (122,843,446 cumulative instances from 02/15/2020 to 02/14/2021) and during the second wave (134,022,118 cumulative instances from 02/15/2021 to 05/21/2021). Both computations were done for 36 states and union territories and required 433 and 284 min, respectively, on an Intel i9-8950HK CPU @ 2.90 GHz workstation with 12 processors. The results of the two runs were markedly different, with 23 ST clusters detected in the second wave (with p < 0.01) compared to four such clusters in the period prior to the second wave. The summaries of the clusters detected are provided in Table 1.


Table 1 Summary of ST clusters from different runs

Run                                # of clusters   Avg spatial range   Avg time frame   Avg # of cases   Avg test statistic   # of clusters with p < 0.1
Before second wave (case data)     4               1 state             5 days           116,740          91,247               4
Before second wave (genome data)   5               1 state             110.8 days       138.4            8.61                 3
Second wave (case data)            24              1 state             5.2 days         611,280          40,024               23
Second wave (genome data)          5               3.8 states          48.8 days        116,739          91,247               5

3.2 ST Clusters of Genomic Sequences in the Second Wave Are More Numerous but Lack Spatial Coverage

We ran ST clustering of the genomic sequence data using sequences deposited prior to the second wave (3935 cumulative instances from 1/1/2020 to 11/4/2021) and during the second wave (5936 cumulative instances from 3/1/2021 to 5/21/2021). While five clusters were detected for the second-wave cases, only four clusters were detected in the prior time period (of which only one cluster had p < 0.01). As evident in Table 1, these are significantly different from the ST clusters of the active case data for the same locales and time periods. It is clear that the frequency of sequences deposited has increased over time, and naturally more deposits were made in the more recent second wave.

4 Comparing ST Clusters of Case Data with Genomic Data

It is important to compare the ST clusters of the two datasets (active case vs genome sequence data) to determine (i) if the genome sequencing efforts have been proportional to the active caseloads, (ii) if further sequences need to be collected at different places and time points, and (iii) if the mutational information from the genomic data in these clusters corresponds to the ST clusters found in the active case data. The number of active cases is one of many epidemiological metrics used to understand


and manage public health conditions, and it is evident that ST clustering studies can also be carried out with metrics such as daily new cases, daily deaths, and others to answer the same three questions listed above. In the context of the limited use case of comparing similarities of the two ST clustering results (and across the two time periods), we are able to ascertain that the case data clusters and genome sequence data of the same periods are more similar than the datasets across different periods, but we are limited in our ability to quantify such similarity.

4.1 Metrics for Measuring Similarity of ST Clusters

A significant handicap in answering questions about the similarity of ST clusters derived from different types of data, or of ST clusters from different geographies and time periods, is the absence of similarity metrics that can help with quantification. While the number of spatial clusters found using similar parameter settings can be a starting point, it is important to also use the number of observations within such clusters, the spatial range and recurrence interval, as well as likelihood estimates (p-values), to derive the similarity. Density-based clustering of spatial trajectories has adopted similar approaches [15].
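In the absence of an established metric, one crude possibility, sketched below, is to reduce each cluster set to the features listed above and compare the rescaled vectors. The feature choice, field names and weighting here are illustrative assumptions, not an established similarity metric.

    import numpy as np

    def summary_vector(clusters):
        """Reduce a set of ST clusters to a feature vector: cluster count,
        mean observations, mean spatial range, mean duration, and the share
        of clusters with p < 0.1 (the quantities reported in Table 1)."""
        return np.array([
            len(clusters),
            np.mean([c["n_obs"] for c in clusters]),
            np.mean([c["spatial_range"] for c in clusters]),
            np.mean([c["duration_days"] for c in clusters]),
            np.mean([c["p_value"] < 0.1 for c in clusters]),
        ])

    def cluster_set_similarity(set_a, set_b):
        """Cosine similarity after per-feature rescaling, so that no single
        unit (cases vs. days vs. counts) dominates the comparison."""
        va, vb = summary_vector(set_a), summary_vector(set_b)
        scale = np.maximum(np.abs(va), np.abs(vb)) + 1e-9
        va, vb = va / scale, vb / scale
        return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))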

5 Conclusions

We applied ST clustering techniques to study COVID-19 datasets collected for two distinct purposes and reported the variances of these datasets both across regions and across time periods. While this is a first attempt at comparing spatio-temporal datasets of the pandemic, we believe there are important areas for future work that include, but are not restricted to:

1. Designing faster algorithms for ST clustering that can process massive amounts of data and thus enable large-scale studies like the clustering of trajectories [11].
2. Developing similarity metrics for ST clustering results and answering epidemiological questions by doing so. Density-based approaches [15] and the indices suggested by McIntosh and Yuan [16] are possible directions.
3. Integrating covariates such as mutational information and clade information within ST clustering, similar to the approach of generalized linear models (GLM) for cluster detection [15].

Acknowledgements Authors acknowledge discussions with colleagues at their respective laboratories. NDS is supported by Ramalingaswami Re-Entry Fellowship of the Department of Biotechnology, Ministry of Science & Technology, India (BT/RLF/Re-entry/55/2017).


References

1. Gross, B., Zheng, Z., Liu, S., Chen, X., Sela, A., Li, J., Li, D., Havlin, S.: Spatio-temporal propagation of COVID-19 pandemics. EPL (Europhys. Lett.) 131(5), 58003 (2020)
2. Yalcin, M.: Mapping the global spatio-temporal dynamics of COVID-19 outbreak using cartograms during the first 150 days of the pandemic. Geocarto Int. 1–10 (2020)
3. Elson, R., Davies, T.M., Lake, I.R., Vivancos, R., Blomquist, P.B., Charlett, A., Dabrera, G.: The spatio-temporal distribution of COVID-19 infection in England between January and June 2020. Epidemiol. Infect. 149 (2021)
4. Bag, R., Ghosh, M., Biswas, B., Chatterjee, M.: Understanding the spatio-temporal pattern of COVID-19 outbreak in India using GIS and India's response in managing the pandemic. Reg. Sci. Policy Pract. 12(6), 1063–1103 (2020)
5. Nakagawa, S., Miyazawa, T.: Genome evolution of SARS-CoV-2 and its virological characteristics. Inflamm. Regeneration 40(1), 1–7 (2020)
6. Singh, H., Singh, J., Khubaib, M., Jamal, S., Sheikh, J.A., Kohli, S., Hasnain, S.E., Rahman, S.A.: Mapping the genomic landscape & diversity of COVID-19 based on >3950 clinical isolates of SARS-CoV-2: likely origin & transmission dynamics of isolates sequenced in India. Indian J. Med. Res. 151(5), 474 (2020)
7. Chen, A.T., Altschuler, K., Zhan, S.H., Chan, Y.A., Deverman, B.E.: COVID-19 CG enables SARS-CoV-2 mutation and lineage tracking by locations and dates of interest. Elife 10, e63409 (2021)
8. Ramírez-Aldana, R., Gomez-Verjan, J.C., Bello-Chavolla, O.Y.: Spatial analysis of COVID-19 spread in Iran: insights into geographical and structural transmission determinants at a province level. PLoS Neglected Trop. Dis. 14(11), e0008875 (2020)
9. Getis, A., Ord, J.K.: The analysis of spatial association by use of distance statistics. In: Perspectives on Spatial Data Analysis, pp. 127–145. Springer, Berlin (2010)
10. Shi, Z., Pun-Cheng, L.S.: Spatiotemporal data clustering: a survey of methods. ISPRS Int. J. Geo Inf. 8(3), 112 (2019)
11. Mhatre, J., Agrawal, H., Sen, S.: Efficient algorithms for flock detection in large spatio-temporal data. In: International Conference on Big Data Analytics, pp. 307–323. Springer, Cham (2019)
12. Kulldorff, M.: A spatial scan statistic. Commun. Stat. Theory Methods 26(6), 1481–1496 (1997)
13. Hadfield, J., Megill, C., Bell, S.M., Huddleston, J., Potter, B., Callender, C., Sagulenko, P., Bedford, T., Neher, R.A.: Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34(23), 4121–4123 (2018)
14. Saxenhofer, M., Weber de Melo, V., Ulrich, R.G., Heckel, G.: Revised time scales of RNA virus evolution based on spatial information. Proc. Royal Soc. B Biol. Sci. 284(1860), 20170857 (2017)
15. Gómez-Rubio, V., Moraga, P., Molitor, J., Rowlingson, B.: DClusterm: model-based detection of disease clusters. J. Stat. Softw. 90(1), 1–26 (2019)
16. McIntosh, J., Yuan, M.: Assessing similarity of geographic processes and events. Trans. GIS 9(2), 223–245 (2005)

Significance of Thermoelectric Cooler Approach of Atmospheric Water Generator for Solving Fresh Water Scarcity

B. K. Imtiyaz Ahmed and Abdul Wahid Nasir

Abstract In developing countries with predominantly rural populations and dry regions, drinking water scarcity is a serious challenge that must be addressed. With an interdisciplinary approach to this challenge, the atmospheric water generator (AWG) can be a promising and effective solution to water scarcity. In this paper, we discuss the relevance of AWG in providing clean drinking water through the natural phenomenon of condensation of humid air. Basic approaches to the condensation principle are briefly described, and the relevance of the thermoelectric cooling (TEC) effect is elaborated. Relevant work on TEC-based AWG is presented, and the typical steps that can be adopted in the deployment of an AWG are illustrated.

Keywords Atmospheric water generator · Relative humidity · Thermoelectric cooling · Peltier effect

1 Introduction

Currently, the entire population of the globe is facing a fresh water crisis, and developing countries in particular are the most affected. The reasons for this situation are common perennial problems that include population growth, urbanization and environmental pollution [1]. Since water that is suitable for drinking is a basic necessity for survival, working towards it is of prime priority. Developed nations are capable of exploring various options to cater to the need for drinking water with available resources and technology, such as desalination or transportation of water. For developing countries, with access to limited technology and a lack of basic infrastructure, high-cost approaches to solving the water crisis are far from reach. One more challenge for

B. K. I. Ahmed (B) · A. W. Nasir
Department of ECE, CMR Institute of Technology, Bengaluru, India
e-mail: [email protected]

A. W. Nasir
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_33



the developing countries is the preponderance of rural areas with insufficient resources to support technology. Furthermore, the dry regions of a country face even worse scenarios due to little or no rain caused by environmental hazards. Needless to say, the implications and repercussions of unsafe drinking water, or no water at all, would be unimaginable for humanity, which illustrates the relevance of the problem statement. Hence, an approach for providing safe drinking water to all, in particular to the population of dry regions, is essential in the current era. Any solution developed or proposed would be a contribution towards nation building in terms of providing the basic necessities of citizens [2, 3]. Drinking water must be clean and safe to avoid water-borne health disorders such as cholera, hepatitis, etc., and other brain- and heart-related deficiencies. Considering the challenges in solving the drinking water crisis, the atmospheric water generator (AWG) is a tremendous option. AWG is considered one of the feasible solutions for providing safe drinking water from the environment. Since an AWG extracts water from the air in the atmosphere, apart from any air pollutants, the generated water will be safe for drinking, and the system works with natural resources as raw material. Even though the generated water is considered safe for drinking, related filtering methods can still be adopted to maintain WHO-recommended standards. An AWG acts as a device or machine that is capable of generating water from the surrounding air. The principle of operation of an AWG involves dehumidification of air through condensation of the water vapour molecules in the air. Classical methods of dehumidification for AWG operation are dew point condensation of humid air by compression, the use of desiccants that absorb moisture or water content from the air, and the vapour compression method with refrigeration and Peltier cooling. The condensation approach adopts the compression principle, in which humid air is subjected to high pressure (typically 5 times the ambient pressure), which in turn raises the dew point. This method is capable of achieving higher water extraction levels and works efficiently in regions of high temperature and humidity. Nonetheless, its performance hugely depends on the method and equipment used for compression and decompression. Typical methods include vapour compression with refrigeration, but these prove to be costly (due to maintenance) for a developing nation and also produce chemical by-products which need to be taken care of. Desiccants are solids that absorb water from their surroundings; the water can be extracted by heating or baking them. This method eases the dehumidification process and does not depend on an energy supply for its basic functioning. One of the challenging aspects of this method is the reusability of the desiccant after extraction of water from it, and the processing system [4]. An effective method of AWG design is thermoelectric cooling, by which water generation is possible at a lower energy requirement, and which can also contribute towards the portable AWG (PAWG). A thermoelectric cooler (TEC) is based on a thermocouple transducer which is capable of cooling warm air and can condense air moisture. With their small size and low weight, AWGs based on the TEC approach can play a crucial role in the design of PAWGs. With the smaller voltage requirement for their operation, the feasibility of renewable energy sources (solar and wind) is prominent [5, 6].


In this paper, the significance of the AWG with the TEC approach to address fresh water scarcity in rural areas of developing nations is presented. The organization of the paper is as follows: Sect. 2 briefly describes the working of TEC, research work related to TEC-based AWG is presented in Sect. 3, the TEC-based AWG approach is described in Sect. 4, followed by the conclusion.

2 Working Concept of TEC

The basic principle behind the thermoelectric cooling effect is the Seebeck effect, in which a temperature difference between two different conducting materials generates a voltage across them. TEC is the converse of the Seebeck effect: the application of a voltage across the junction of two different conducting materials creates the release and absorption of heat at its sides. This thermodynamic phenomenon due to an applied voltage is referred to as the "Peltier effect". The Peltier effect is most pronounced in semiconductor materials, typically bismuth telluride and its alloys in the form of p-type and n-type semiconductors. Due to the voltage applied across the device, the charge carriers, i.e. electrons and holes, cross over the junction towards the n-type and p-type materials, respectively. Such charge recombination gives rise to heat transfer, with hot and cold sides determined by the direction of current flow [7–9] (Fig. 1). TEC is a simplified and compact method of condensing humid air, whose performance depends on the Peltier device configuration in terms of the pairing of n-type and p-type material unions. The amount of cooling achieved depends directly on the number of unions and on the effectiveness of the cooling of the hot side, usually handled by "heat sinks". An improper heat sink leads to low cooling of the device for the applied voltage and number of unions. The TEC approach offers the benefits of a low-cost solution with small size and no maintenance effort compared to other condensing approaches. TECs are also eco-friendly, due to the absence of any refrigerants, and have a proven high coefficient of performance (COP) [10]. A typical design of an AWG using TEC involves the selection and design of the TEC module, an air inlet module, a heat sink for the "hot side", an air exhaust, and a water collector (with filtering) [11–14].

Fig. 1 Representation of Peltier device
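The behaviour described above is commonly captured by a first-order thermoelectric-module model, sketched below. The module constants (Seebeck coefficient, electrical resistance, thermal conductance) are illustrative placeholders; in practice they scale with the number of p-n unions.

    def peltier_cooling_power(current_a, t_cold_k, t_hot_k,
                              seebeck=0.05, resistance=2.0, conductance=0.5):
        """Net heat pumped at the cold side (W): the Peltier term minus half
        of the Joule heating minus conductive back-flow across the module."""
        return (seebeck * current_a * t_cold_k
                - 0.5 * current_a ** 2 * resistance
                - conductance * (t_hot_k - t_cold_k))

    def cop(current_a, t_cold_k, t_hot_k,
            seebeck=0.05, resistance=2.0, conductance=0.5):
        """Coefficient of performance = cooling power / electrical input."""
        q_c = peltier_cooling_power(current_a, t_cold_k, t_hot_k,
                                    seebeck, resistance, conductance)
        p_in = (seebeck * current_a * (t_hot_k - t_cold_k)
                + current_a ** 2 * resistance)
        return q_c / p_in

    # This simple model reproduces two effects noted in the related work:
    # larger currents pump more heat at first but the I^2*R term eventually
    # dominates and degrades the COP, and a poor heat sink (higher t_hot_k)
    # directly reduces the net cooling.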


3 Related Work on TEC Based AWG

As part of providing basic life facilities, many developing countries have emphasized technology for providing safe drinking water. One such initiative in India, under United Nations Organization (UNO) guidelines, has been presented in [2, 3]. It highlights the nation's need for water and the use of AWG to solve the drinking water problem as per WHO standards. The main contribution of this work is that AWG is a probable technology solution for the "Har Ghar Jal" scheme and can be an efficient and cost-effective means of addressing the challenge and the programme. An attractive solution for AWG users, the PAWG, was proposed in [15] using TEC technology. An AWG prototype model with a manageable weight to make it portable is discussed. TECs have contributed significantly to the development of PAWGs. The analysis concluded that water generation and condensation were better with higher humidity, whereas condensation was negatively affected by higher air flow. To add sustainability along with portability, the use of solar or photovoltaic (PV) panels is also suggested to make operation and maintenance cost-effective. An AWG prototype with lower capacity, operated at optimum performance on a small scale, was developed in [16]. The effect of variations in AWG parameters was studied and compared with other systems in the literature in terms of COP. The TEC effect impacts system performance with respect to relative humidity and operating currents. It was concluded that higher currents can generate more water but lead to more power dissipation, thereby degrading the COP. An optimal AWG design with respect to the functioning of TEC has been presented in [17]. Here, the utilization of electronic pulse width modulation (PWM) circuitry has been demonstrated to control the air flow to the TEC module, which affects condensation. The proposed work was developed targeting the drought areas of Indonesia that have limited rainfall and a dry climate. The use of a low DC voltage supply and the incorporation of fluid dynamics in the design have been the key aspects of this work. A comparative study concluded that the developed model generates more water with less power consumption under the given conditions. The concept of tuning a PAWG is discussed in detail in [18], with the effect of operating voltages and air flow control on AWG performance. It was shown that adaptively tuning an AWG is an excellent option for optimizing performance in terms of water generation and power consumption. This work also briefly discusses various works related to tuning AWG parameters in terms of the number of TEC modules, and the application and control-parameter tuning of voltage and current. The results substantiate that tuning an AWG can lead to optimum system performance. In the literature, significant work has been carried out by modifying the cooling system of the "hot side" or "cold side" of the TEC. A module referred to as a "cooler box" was attached to the cold side to aid in increasing the condensing effect and thereby


generating more water in [19]. Similar work, but towards the hot side, was presented in [20], in which the mechanical design of the air inlet and heat sink was experimented with. Last, but definitely not least, the study or analysis of climatic conditions using a psychrometric chart would be extremely beneficial in the development of an AWG. Providing water harvesting in an area affected by mining pollution is discussed in [21]. Based on the climatological results, a mathematical AWG design model was developed, and the design best suited for the area under consideration was determined. Such a system provides an effective method of solving the water crisis through smart planning based on the seasonal variations of the area of implementation [22, 23].

4 Proposal for TEC Based AWG

Due to the growing population in developing countries, the basic requirement of water for rural areas can be catered for by adopting advances in technology. Since cost and performance are crucial aspects of any project, low maintenance and implementation cost with efficient or optimum system performance is a must. Based on the discussions in the preceding sections, it can be deduced that AWG with TEC technology would be vital in solving the water problem. Along with being an efficient solution, this approach can be designed and optimized for performance under the given operating circumstances. The figure below depicts the typical approach for TEC-based AWG design. As depicted in the figure, the AWG design shall be initiated with a climatological analysis of the area under consideration. This is of great importance for a model design that yields maximum output through adaptive control of the air inlet and applied voltage. Based on the observations from the climatological study, the system design and cost can be used for budget estimation. The budget covers the entire design of the AWG, including the number and configuration of TEC modules and the mechanical design of the air inlet, heat sink, exhaust, water collector and possible filtering module. The next step is a feasibility study of the AWG for its implementation cost and performance in the area under consideration. The last stage is the prototyping of the designed system and possible optimization for performance improvement. After final testing and verification, the developed AWG can be deployed as the solution for water non-availability (Fig. 2).
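As a sketch of what the climatological analysis feeds into, the following estimate uses the Magnus approximation for saturation vapour pressure to bound the condensate obtainable from a given air flow. The airflow, cold-surface temperature and capture efficiency in the example are assumptions for illustration.

    import math

    def absolute_humidity_g_m3(temp_c, rel_humidity):
        """Water vapour content of air in g/m^3, using the Magnus formula
        for saturation vapour pressure (temp_c in deg C, rel_humidity 0-1)."""
        e_sat = 611.2 * math.exp(17.62 * temp_c / (243.12 + temp_c))  # Pa
        return 1000.0 * rel_humidity * e_sat / (461.5 * (temp_c + 273.15))

    def daily_yield_litres(temp_c, rel_humidity, airflow_m3_h,
                           cold_surface_c, capture_eff=0.6):
        """Upper-bound daily condensate: only the vapour above saturation at
        the cold surface can condense; capture_eff is an assumed collection
        efficiency of the prototype."""
        removable = max(absolute_humidity_g_m3(temp_c, rel_humidity)
                        - absolute_humidity_g_m3(cold_surface_c, 1.0), 0.0)
        return removable * airflow_m3_h * 24 * capture_eff / 1000.0

    # e.g. air at 30 deg C and 70% RH, 50 m^3/h, cooled to a 10 deg C surface:
    # about 8.5 litres/day at 60% capture efficiency.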

5 Conclusion

If "need was the mother of all inventions", at present one can say that "research is the mother of technology". Addressing the existing and forthcoming challenges to humanity is the primary objective of all research. In this regard, water scarcity is a serious problem for much of the population, in particular in rural and dry areas outside the urban category. AWG can be the best possible solution to address


Fig. 2 Typical steps for deployment of AWG system

the water problem at minimum cost with effective implementation. In this article, a brief overview of the significance of AWG is presented along with its basic variants. In particular, the use of TEC technology in the development of PAWGs has been discussed. Related work pertaining to TEC-based AWG is also illustrated with probable design approaches, along with the typical steps to be followed in the deployment of an AWG. One of the challenges in the design of an AWG is its performance in terms of water generation and maintenance, which can vary due to changes in weather conditions.

References

1. WHO/UNICEF: Progress on Sanitation and Drinking Water: 2015 Update and MDG Assessment, Geneva (2015)
2. Das, S.D.: Assessment of atmospheric water generator for rural and remote India. IOSR J. Electr. Electron. Eng. (IOSR-JEEE) 13(2), 67–74 (2018)
3. Sevak, R., Gimble, A.: Har Ghar Jal 2030, current status and next steps. Ministry of Drinking Water and Sanitation, Government of India (2017)
4. Milani, D., et al.: Experimentally validated model for atmospheric water generation using a solar assisted desiccant dehumidification system. Energy Build. 77, 236–246 (2014)
5. Chaitanya, B., et al.: Biomass-gasification-based atmospheric water harvesting in India. Energy 165, 610–621 (2018)
6. Eltawil, M., Samuel, D.: Performance and economic evaluation of solar photovoltaic powered cooling system for potato storage. Agric. Eng. Int. CIGR Ejournal, Manuscript EE 07 008, IX (2007)
7. Goldsmid, H.J.: Introduction to Thermoelectricity, pp. 176–177. Springer, New York (2010)
8. Bell, L.E.: Cooling, heating, generating power and recovering waste heat with thermoelectric systems. Science 12(321), 1457–1461 (2008)
9. Zhao, D., Tan, G.: A review of thermoelectric cooling: materials, modeling and applications. Appl. Therm. Eng. 66, 15–24 (2014)
10. Atta, R.M.: Solar water condensation using thermoelectric coolers. Int. J. Water Resour. Arid Environ. 1(2), 142–145 (2011)
11. Russel, M., Ewing, D., Ching, C.: Characterization of a thermoelectric cooler based thermal management system under different operating conditions. Appl. Thermal Eng. 50, 652–659 (2013)
12. Muñoz-García, M., Moreda, G., Raga-Arroyo, M., Marín-González, O.: Water harvesting for young trees using Peltier modules powered by photovoltaic solar energy. Comput. Electron. Agric. 93, 60–67 (2013)
13. Vián, J.G., Astrain, D., Domınguez, M.: Numerical modelling and a design of a thermoelectric dehumidifier. Appl. Thermal Eng. 22, 407–422 (2002)
14. Joshi, V.P., et al.: Experimental investigations on a portable fresh water generator using a thermoelectric cooler. Energy Proc. 109, 161–166 (2017)
15. Liu, S., et al.: Experimental analysis of a portable atmospheric water generator by thermoelectric cooling method. Energy Proc. 142, 1609–1614 (2017)
16. Shourideha, A.H., et al.: A comprehensive study of an atmospheric water generator using Peltier effect. Thermal Sci. Eng. Prog. 6, 14–26 (2018)
17. Suryaningsih, S., Nurhilal, O.: Optimal design of an atmospheric water generator (AWG) based on thermo-electric cooler (TEC) for drought in rural area. AIP Conf. Proc. 1712, 030009 (2016)
18. Casallas, I.: Experimental parameter tuning of a portable water generator system based on a thermoelectric cooler. Electronics 10, 141 (2021)
19. Tan, F.L., Fok, S.C.: Experimental testing and evaluation of parameters on the extraction of water from air using thermoelectric coolers. J. Test. Eval. 41, 96–103 (2013)
20. Huajun, W., Chengying, Q.: Experimental study of operation performance of a low power thermoelectric cooling dehumidifier. Int. J. Energy Environ. 1, 459–466 (2010)
21. Mendoza-Escamilla, J.A.: A feasibility study on the use of an atmospheric water generator (AWG) for the harvesting of fresh water in a semi-arid region affected by mining pollution. Appl. Sci. 9, 3278 (2019)
22. Eslami, M., Tajeddini, F., Etaati, N.: Thermal analysis and optimization of a system for water harvesting from humid air using thermoelectric coolers. Energy Convers. Manag. 174, 417–429 (2018)
23. Tu, R., Hwang, Y.: Reviews of atmospheric water harvesting technologies. Energy, 117630 (2020)

A Comprehensive Analysis of Testing Efforts Using the Avisar Testing Tool for Object Oriented Softwares

Zunaid Aalam, Satnam Kaur, Prashant Vats, Amandeep Kaur, and Rini Saxena

Abstract A good OOTF should be able to tell the difference between two things: test results that fall outside the specification, and test results that fall within the predetermined specification criteria. During testing with an OOTF, the code coverage criteria are very important and serve as an essential criterion for determining which tests to run and when to stop them. Based on the Java coding platform, using the Java Parser library, the Jenetics Java classes, and JCOCO for ensuring code coverage with Java classes, we present our study on evaluating effort during OOP testing using the AVISAR OOTF, which uses the GA as the serving algorithm for OOT. During our research study, the coding and development of the suggested OOTF AVISAR yielded an evaluation platform for addressing the difficulties related to testing complex features of the OOSUT, such as polymorphism and inheritance, among others.

Keywords Object oriented programs (OOP) · Genetic algorithm (GA) · OO software (OOS) · Polymorphism · OO software under test (OOSUT) · Inheritance · Object oriented testing framework (OOTF) · Object oriented testing (OOT)

1 Introduction In this study, we offer a methodology for estimating effort during OOP testing using the AVISAR framework, which is based on the usage of genetic algorithms. During Z. Aalam (B) · S. Kaur SGT University, Gurugram, Haryana, India P. Vats Fairfield Institute of Management and Technology, GGSIPU, New Delhi, India A. Kaur · R. Saxena Chandigarh Engineering College, Jhanjeri, Mohali, India e-mail: [email protected] R. Saxena e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_34



testing of OOTF, the testing code coverage criteria are an important criterion for determining which tests to run and when to stop them. A good OOTF should be able to tell the difference between two things the test results that may be outside the specification and test results that may be within the predetermined specification criteria. The control flow chart technique doesn’t seem to be useful as an OOS abstraction for calculating OOS testing effort. The key difficulty with utilizing traditional metrics to determine SUT complexity during OOS production and testing is that it is impossible to do so using the control architecture. When most OOPs are tested, the OO class methods are called within the OO classes in such a way that uses the class objects in a way that resolving the number of verdicts and recommendations in the test control flow graph is nearly impossible. In general, using any OO metric, it refers to a comprehensive methodology for estimating the amount of time required to test classes in OOS. However, during object call patterns across all OO classes call methods with the access to the multiple classes of an OOSUT, these OO metrics get more complicated.

2 Related Work

To facilitate automatic test bed creation from correctly described classes, Borie et al. [1] employed OO class-level testing based on a data control flow graph. They started by creating a basic flowchart that helps with test bed development and coverage analysis. However, because it does not model the space of OO class states, it may not be able to effectively investigate OO class behavior. To investigate the behavior of objects, Hiroki et al. [2] used formal OO tests for OOP based on a unit-level OO class colored Petri net. This strategy is independent of specific needs, design approaches, or programming languages. Frankl et al. [3] used the class-level OOTF called ASTOOT (i.e., "a collection of tools for OOT") for generating OO test beds in a classical way that monitors state values and class state transitions. The ASTOOT methodology examines the OO implementation by using a test oracle to automatically validate the correctness of the test beds. However, because test beds are generated at random, it is difficult to do a test coverage analysis, and a "Boolean" value label is used to denote the expected outcome of OO test beds. For a system-level OO system, Dasiewicz and colleagues [4], emphasizing the interplay between associated OO class events, used event flow testing to create test scenarios. Its key benefit is that it detects dormant potential loops and confines convoluted looping paths through user intervention in the OOSUT. However, it is not suitable for small-scale OO systems, since it employs the telephonic "private branch exchange" (PBX) software. Jefferson et al. [5] demonstrated that the current OOT methodology can detect defects in OOS using integrated class-level category splitting approaches; the


compound fusion of the "category partition method" with a tool to identify memory management errors is highly beneficial for OOT. At the integrated class level, Chen et al. [6] created state-based tests for representing the shifting states of the OOSUT. They used finite state machines (FSMs) to represent an integrated system. Another technique, offered by the same authors [7], applies peer-based event tests to take advantage of OO systems and to check for constraint violations by leveraging the relationships between events.

3 AVISAR for Estimating the Testing Effort

1. Evaluation of coherence interrelatedness: The coherence interrelatedness calculator accepts lines of code (LOC) as input [8]. A double-edged twofold bipartite graph is created from these LOC, with its two node sets formed by the functions and the attributes called from an OOS class. An attribute is related to a class function if that attribute calls the function [9]. The degree of cohesiveness may be determined as

Cohesion = max{([Set of Functions × Set of Attributes] − [Set of edges of the bipartite graph]), 0} / Max([Set of Functions × Set of Attributes] − [Set of edges of the bipartite graph])    (1)

2. Calculation of modular convolution level: This component receives the NOC of an SUT during the OOD as input to compute the level of complexity. Applying weights to these classes, the complexity may be determined using the formula

CC = (1/NOC) × Σ_{i=0..n} NOM n(i)    (2)

where NOC is the number of complexity-weighted classes and NOM n(i) denotes the number of methods in each of the n classes. Our goal should be to keep the degree of modular convolution level, or object oriented complexity, as low as feasible [10].

3. Estimation of object oriented modularity: This gives a rough approximation of different OOD features, including encapsulation, inheritance depth, polymorphism, and visibility of methods (Vm), for the object oriented classes. As input, this component accepts the NOC of an SUT.
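A small sketch of these three estimators follows. Note that Eq. (1) is ambiguous in the source; the cohesion function below takes the complete bipartite-graph size as the normalising term, which is our assumption, and the modularity estimate simply aggregates the listed OOD features with equal weights for illustration.

    def cohesion(n_functions, n_attributes, n_edges):
        """One reading of Eq. (1): the shortfall of attribute-function links
        from the complete bipartite graph, clamped at zero and normalised
        by the complete-graph size (assumed denominator)."""
        full = n_functions * n_attributes
        return max(full - n_edges, 0) / full if full else 0.0

    def modular_convolution(methods_per_class, weights=None):
        """Eq. (2): weighted average method count over the NOC classes."""
        noc = len(methods_per_class)
        weights = weights or [1.0] * noc
        return sum(w * m for w, m in zip(weights, methods_per_class)) / noc

    def modularity_estimate(depth_of_inheritance, visible_methods,
                            total_methods, polymorphic_methods):
        """Rough OOD modularity indicator from the features listed above;
        the equal weighting is illustrative, not AVISAR's formula."""
        vm = visible_methods / total_methods if total_methods else 0.0
        poly = polymorphic_methods / total_methods if total_methods else 0.0
        return (1.0 / (1.0 + depth_of_inheritance) + (1.0 - vm) + poly) / 3.0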

The design of the AVISAR OOTF is shown in the use case and object sequence diagrams in Figs. 1 and 2.


Fig. 1 Use case for AVISAR OOTF

Fig. 2 Object sequencing diagram for AVISAR OOTF

4 Interfaces and Exploratory Setup of AVISAR

For the OOT effort approximation and estimation during the construction of the AVISAR OOTF, which we built with Java Parser, we employed the code block illustrated in Fig. 3 to ensure maximum code coverage (Figs. 4, 5, 6, 7 and 8).

5 Results of Experiments

The suggested OO testing framework AVISAR, when implemented in Java, produced superior results for short pieces of OO code. The outcomes are as follows:


Fig. 3 AVISAR framework’s Java Parser code block ensures maximum code coverage

Fig. 4 To demonstrate the OOTF AVISAR, which will be used to carry out the requested job


Fig. 5 Selection of any OO source code that may be checked with OOTF AVISAR

Fig. 6 The use of the GA to execute test beds using AVISAR


Fig. 7 To demonstrate the GA’s instantaneous optimization of test beds

1. When the set of ramifications and characteristics was combined with the edges of the double-edged twofold bipartite graph to measure cohesiveness, we obtained the graph shown in Fig. 9. It produced unexpected results for smaller code, but as the OO source code grows larger, the measure tends to infinity.
2. The nomogram plotted between the number of methods and the number of classes in each OOP was used to compute complexity; Fig. 10 illustrates its performance.
3. For smaller Java programs and modules, method visibility, polymorphism, inheritance, and encapsulation are utilized to estimate effort; the overall performance of the OOD estimator is shown in Fig. 11.

6 Conclusions

Using the AVISAR OOTF, we have developed a testing technique for evaluating the effort put in during the execution of OOP trials [11]. Using the Jenetics Java classes, the Java Parser library, and JCOCO for ensuring code coverage with Java classes, we have presented our study on evaluating effort during OOP testing using the AVISAR OOTF, which uses the GA as the serving algorithm for OOT. During the examination of efforts utilizing the OOTF AVISAR, we discovered that the usage of


Fig. 8 To demonstrate the test beds being developed for the OOP under test

Fig. 9 AVISAR outputs for the coherence interrelatedness estimator

the GA has proven beneficial in terms of estimating reduced effort when testing OOPs. For addressing difficulties related to testing features of the OOSUT, such as polymorphism and inheritance, among others, the suggested OOTF AVISAR provides an evaluation platform.


Fig. 10 AVISAR outputs for the complexity calculator

Fig. 11 In AVISAR, the results of the OOD estimator

References

1. Parrish, A.S., Borie, R.B., Cordes, D.W.: Automated flow graph based testing of object oriented software modules. J. Syst. Softw. 23, 95–109 (1993)
2. Harumi, W., Hiroki, T., Wu, W., Saeki, M.: A technique for analyzing and testing object-oriented software using colored petri nets. In: IEEE International Conference, pp. 182–190 (1998)
3. Doong, R.K., Frankl, P.G.: The ASTOOT approach to testing object oriented programs. ACM Trans. Softw. Eng. Methodol. 3, 101–130 (1994)
4. Liu, W., Liu, W., Dasiewicz, P.: The event-flow technique for selecting test cases for object-oriented programs. In: IEEE Conference Proceedings (1997)
5. Irvine, A., Offutt, A.J.: The effectiveness of category partition testing of object-oriented software. CiteseerX (1995)
6. Chen, H.Y., Tse, T.H., Chan, F.T., Chen, T.Y.: In black and white: an integrated approach to class-level testing of object-oriented programs. ACM Trans. Softw. Eng. Methodol. 7(3), 250–295 (1998)
7. Chan, W.K., Chen, T.Y., Tse, T.H.: An overview of integration testing techniques for object-oriented programs. In: Proceedings of the 2nd ACIS Annual International Conference on Computer and Information Science (ICIS 2002), International Association for Computer and Information Science, Mt. Pleasant, Michigan (2002)
8. Li, Z., Hamilton, M.T.: An approach to integration testing of object oriented programs. In: Seventh International Conference on Quality Software (QSIC'07), vol. 13, pp. 268–273. IEEE (2007)
9. Xie, T., Notkin, D.: Automatically identifying special and common unit tests based on inferred statistical algebraic abstractions (2003)
10. Savage, S., Burrows, M., Nelson, G., Sobalvarro, P., Anderson, T.: Eraser: a dynamic data race detector for multithreaded programs. ACM Trans. Comput. Syst. (1997)
11. Gerald, H., André, B., Jan, B.: A distributed real-time Java system based on CSP. CiteseerX (2000)

Fabrication of Energy Potent Data Centre Using Energy Efficiency Metrics

Subhodip Mukherjee, Debabrata Sarddar, Rajesh Bose, and Sandip Roy

Abstract A few metrics are used for estimating the yield and performance rate of every important power consumption sub-process in a data centre. To augment such metrics and support better data centre energy efficiency for businesses across the globe, this research presents two metrics and their fruitful utilization for creating a green data centre: power usage effectiveness (PUE) and data centre efficiency (DCE), which help offer an impartial view of the true condition of data centre operations with regard to power consumption. PUE and DCE estimate how effectively energy costs are distributed to the IT components. PUE specifies the use of the whole power available to a data centre. By improving the efficiency of the cooling process and power distribution, reducing PUE permits more power to be exploited by IT components and delays the need to construct new data centres. DCE is the reciprocal of PUE. This research thus offers a substitute standard, named data centre efficiency (DCE), the reciprocal of the PUE; this can be more beneficial in some conditions, as it expresses data centre power consumption efficiency as a percentage.

Keywords Data centre · Distributed computing · Energy consumption · Green computing · Carbon emission · Power usage effectiveness (PUE) · Data centre efficiency (DCE) · Efficiency metrics

S. Mukherjee
Techno International New Town, Chakpachuria, Kolkata, India

D. Sarddar
University of Kalyani, Kalyani, India

R. Bose · S. Roy (B)
Brainware University, Kolkata, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_35


1 Introduction

Nowadays, data centres have become a fundamental segment of the operation of any organization in the globe [1]. They are technical edifices used to house computer systems and their correlated equipment, such as telecommunication and storage devices, backup power supplies, redundant transmission links, environmental controls and security measures as a group [2, 3]. A data centre is a mechanized facility with different energy-consuming components such as servers, network equipment, data storage equipment, cooling units and a few more [4, 5]. Collectively, the components that process, store and convey digital information are called IT components. Power exploitation differs with the edifice [6, 7]. Tier categorization illustrates the site infrastructure network or topology that is required to sustain data centre operations. The tier benchmark reflects the fact that data centres depend on the competent, consolidated functioning of several distinct sub-procedures, and the rating is based on individual mechanisms such as electricity production, cooling, synchronized power supplies and many more, in support of the services as well as the operations. The benchmarks consist of a four-tier measurement scheme, with the Uptime Institute validating data centres with tier numbers ranging from Tier 1 to Tier 4 [8]. It was the Green Grid [1] that suggested the application of power usage effectiveness (PUE) and its reciprocal, the data centre efficiency (DCE) metric, which empower data centre controllers to quickly measure the energy efficiency of their data centres and ascertain whether any energy efficiency modifications are required [9, 10]. Though PUE has gained comprehensive adoption in the industry since that time, DCE, at present termed DCiE, has achieved only restricted uptake on account of misapprehension of the true significance of energy efficiency.

2 Metrics Framework
To augment the energy efficiency of a data centre, one should follow the Green Grid [1, 11], an alliance of IT professionals which, with a number of recommendations, aims to enhance the energy efficiency of data centres. The Green Grid suggested the application of the power usage effectiveness (PUE) metric with its reciprocal, data centre infrastructure efficiency (DCiE), to enable expeditious evaluation of the energy efficiency of a data centre, to compare the outcomes across different data centres and, in due course, to assess whether genuine rectifications and improvements are required [9, 12].


3 Power Usage Effectiveness (PUE)
In this research, we define power usage effectiveness (PUE) as the ratio of the total facility power to the IT equipment power [9]:

PUE = Total_Facility_Energy / IT_Equipment_Energy (1)

The total facility power is the high-voltage power delivered to the data centre to run the IT and cooling components. Additional power loads, such as lighting and office space for data centre operators, are not considered, as they are customarily small in comparison with the power draw of the IT components, switchgear, UPS, chillers, cooling towers, air conditioners, liquid conditioners and other power or cooling components [13]. IT equipment power can be understood as the actual line power drawn by each of the IT components within the data centre. The power loss associated with conditioning and stepping down the voltage from the utility is not included in it, which means the metric cleanly captures and estimates the total power dedicated to IT activities. The total facility power comprises the vital elements: IT equipment power, power conditioning and distribution losses, and cooling equipment power. The power conditioning and distribution losses are estimated to be almost 14% of the IT equipment power [14, 15]. Figure 1 below describes the simplified power chain along with the associated efficiencies used for this estimation. A lower PUE value is advantageous: a reduction in PUE signifies that a growing fraction of the total facility power is used by the IT components. It may be obtained by improving the efficiency of the data centre cooling solution or by lessening AC conversion losses. The optimum efficiency of a data centre is denoted by a PUE of 1; practically, a PUE of 1 signifies that all the power entering the data centre is used to supply the IT components. A value greater than 1 signifies that data centre overhead is needed to support the IT load.

Fig. 1 Power conditioning/conversion efficiency


The reciprocal of PUE, which is named data centre efficiency (DCE), may also be regarded as a substitute benchmark [13]:

DCE = 1/PUE = (IT_Equipment_Power / Total_Facility_Power) × 100% (2)

In this expression, total facility power is interpreted as the power supplied to the data centre, whereas IT equipment power is the power used by the devices that handle, process, store or route data in the data centre [16]. The loads covered by the metrics can be described as follows:
• IT EQUIPMENT POWER: It incorporates the load associated with each of the IT components, such as compute, storage and network equipment, and a few additional components such as KVM switches, monitors, and workstations or laptops used to monitor or otherwise manage the data centre [17].
• TOTAL FACILITY POWER: It incorporates everything that supports the IT equipment load, such as
• Power delivery equipment such as UPS, switchgear, PDUs, batteries, and distribution losses external to the IT equipment
• Cooling equipment such as chillers, computer room air conditioning (CRAC) units, direct expansion air handler (DX) units, pumps and cooling towers
• Storage, compute and network nodes
• Additional loads such as data centre lighting.
IT equipment power can be measured after power conditioning, switching and conversion. The most meaningful measurement point is at the output of the computer room power distribution units; such a measurement should capture the total power delivered to the computer equipment racks in the data centre [18]. When the energy distribution in the data centre is well instrumented, the PUE can be measured. The PUE can range from 1.0 to infinity, and a PUE of 2.0 indicates that the demand of the data centre is two times the energy needed to power the IT equipment [18]. Moreover, the PUE can be applied as a multiplier to compute the genuine impact of a device's power requirements. For instance, if a server requires 300 W and the PUE of the data centre is 2.0, the power drawn from the utility grid to deliver 300 W to the server is 600 W. Since, ideally, the entire power would be consumed by the IT components alone, a PUE of 1.0 points to 100% efficiency. A few studies indicate that PUE values of 1.6 are at present attainable with suitable design in any data centre.
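As a minimal illustration of Eqs. (1) and (2) and of the multiplier interpretation above, the following Python sketch computes PUE, DCiE and the utility draw for a given server; the function and variable names are our own and not part of the cited metric definitions.

def pue(total_facility_kw, it_equipment_kw):
    # Eq. (1): ratio of total facility power to IT equipment power
    return total_facility_kw / it_equipment_kw

def dcie_percent(total_facility_kw, it_equipment_kw):
    # Eq. (2): reciprocal of PUE, expressed as a percentage
    return it_equipment_kw / total_facility_kw * 100.0

def utility_draw_w(server_w, pue_value):
    # PUE as a multiplier: grid power needed to deliver server_w to one server
    return server_w * pue_value

print(pue(2.0, 1.0))             # 2.0: the facility draws twice the IT load
print(dcie_percent(2.0, 1.0))    # 50.0 (%)
print(utility_draw_w(300, 2.0))  # 600 W from the grid for a 300 W server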


4 Proposed Work and Discussions
This research suggests two metrics for creating a power-structured data centre. PUE and DCE estimate how effectively the energy paid for is delivered to the IT components. Such metrics are able to offer an impartial view of the actual condition of the data centre operation with regard to power consumption. The PUE and DCE offer a procedure for the following assessments:
• Scope for improving the operational efficiency of a data centre
• A procedure for comparing a data centre with rival data centres
• Whether the operators of the data centre are improving its design and methods over time
• Opportunities to repurpose energy for supplementary IT components.
This research works on a customary data centre. In this study, we decommissioned a few unexploited or end-of-life server and storage systems, which were substituted by virtual servers [19]. Then, we compute the PUE after amending the facility appropriately. To compute the PUE, one needs to know (i) the maximum IT capacity (KW) and (ii) the maximum facility capacity (KW). In this study, the UPS utilization is computed first; from it, the entire IT capacity can be computed, as displayed in Table 1. From that tabular representation, we discover that the entire real UPS capacity in this data centre is 36.2 KW and the maximum power consumption is 17.8 KW, which is extracted from the UPS output power. From these, we can simply compute the available capacity and the maximum power utilization. The maximum facility capacity (KW) is computed from the LT panel (the power factor has been assumed to be 0.8 during the computation of facility capacity). Many types of equipment affect the entire facility capacity; most of the incoming electricity may be absorbed by the cooling framework. Table 1 shows the UPS capacity utilization, and Table 2 represents the heat load and CRAC capacity utilization. The PUE and DCiE are computed using Eqs. (1) and (2), respectively [6, 9]. From Eq. (1):

PUE = 34.04 / 18.4 = 1.85

Now, we calculate DCiE using Eq. (2):

Table 1 UPS capacity utilization
Total UPS capacity (KVA)        40.0
Power factor                    0.9
Total UPS capacity (KW)         36.2
Current max utilization (KW)    17.8
Available capacity (KW)         18.4


Table 2 Heat load and CRAC capacity utilization calculation
IT load (KW)                      Same as total IT load power                                   18.4
UPS with battery (KW)             (0.04 × power system rating) + (0.05 × total IT load power)   2.4
Power distribution (KW)           (0.01 × power system rating) + (0.02 × total IT load power)   0.7
Lighting (KW)                     2.0 W × floor area (sq. ft.)                                  1.1
People (KW)                       100 W × max no. of personnel                                  0.2
Facility load (KW)                                                                              34.04
Total heat load (KW)                                                                            22.8
Available cooling capacity (KW)                                                                 47.2

DCiE = (18.4 / 34.04) × 100% = 54%
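The figures in Tables 1 and 2 can be reproduced with a short script. The coefficients follow the Table 2 formulas; the floor area and headcount values below are assumptions of ours, chosen only to match the reported 1.1 KW lighting and 0.2 KW people loads.

it_load_kw = 18.4          # total IT load, measured at the UPS output
power_rating_kw = 36.2     # power system (UPS) rating

ups_kw = 0.04 * power_rating_kw + 0.05 * it_load_kw            # ~2.4 KW
distribution_kw = 0.01 * power_rating_kw + 0.02 * it_load_kw   # ~0.7 KW
lighting_kw = 2.0 * 550 / 1000.0   # 2 W per sq. ft.; ~550 sq. ft. assumed
people_kw = 100 * 2 / 1000.0       # 100 W per person; 2 people assumed

heat_load_kw = it_load_kw + ups_kw + distribution_kw + lighting_kw + people_kw
facility_load_kw = 34.04           # measured at the LT panel

print(round(heat_load_kw, 1))                       # 22.8 KW total heat load
print(round(facility_load_kw / it_load_kw, 2))      # PUE = 1.85
print(round(it_load_kw / facility_load_kw * 100))   # DCiE = 54%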

The total facility power is defined as the power measured at the utility meter, the power committed solely to the data centre, which matters in mixed-use buildings that contain a data centre as one of many consumers of power. The IT equipment power is defined as the power exploited by the components that handle, process, store or route data in the data centre. Table 3 displays the real PUE and DCiE values [9]. A greater DCE signifies a superior data centre: a DCE of 54% corresponds to a PUE of 1.85 and signifies that 54% of the power from the utility gets exploited for providing power to the IT components, while the remaining 46% gets exploited for the infrastructure such as switchgear, UPS, chillers, CRAC units and a few more. For developing the energy efficiency of data centres, we therefore have two places where we are able to effect the amendment: either we lessen the power going to the support infrastructure, or we lessen the power admitted into the data centre and commit it to the IT capacity. Either develops our energy efficiency and lessens our PUE. The PUE and DCiE reference values suggested by the Green Grid [20, 21] are represented in Table 4, which we adopt for our research [22]; by this benchmark, the data centre under study counts as close to the average level and reasonably well organized.

Table 3 Energy efficiency calculation using PUE and DCiE
Maximum IT load (KW)              18.40
Maximum facility load (KW)        34.04
PUE (power usage effectiveness)   1.85
DCiE                              54%

Table 4 PUE reference values
PUE    DCiE (%)    Level of efficiency
3.0    33          Very inefficient
2.5    40          Inefficient
2.0    50          Average
1.5    67          Efficient
1.2    83          Very efficient

5 Conclusion
In this manuscript, two metrics, PUE and DCiE, are established by the research. Such metrics can be exploited to develop a deeper comprehension of the influence of power and cooling on the entire cost of a data centre. The utilization of the entire power supplied to the data centre is specified by the power usage effectiveness (PUE) metric and its reciprocal. As the two metrics are fundamentally akin, either can be applied to describe the energy distribution in the data centre. As an instance, when a PUE is found to be 1.85, it signifies that the demand of the data centre is 1.85 times the energy required for supplying power to the IT components. Moreover, the ratio can be applied as a multiplier to compute the genuine impact of a device's power requirements: when a server requires 500 W and the PUE of the data centre is 1.85, the power supplied from the utility grid necessary to transmit 500 W to the server is 925 W. DCE is beneficial as well: a DCE value of 54%, which corresponds to a PUE of 1.85, signifies that the IT components absorbed 54% of the power in the data centre. This research displayed that the genuine PUE of a customary data centre can be reduced, and for this reason sometimes the cost of energy is lessened, and sometimes more servers can be utilized in the data centre. If PUE is reduced, whether through the development of the efficiency of the cooling solution or of the power distribution, it permits greater power to be utilized for the IT components and defers the necessity for the construction of novel data centres.

References 1. Mukhopadhyay, B., Bose, R., Roy, S.: A novel approach to load balancing and cloud computing security using SSL in IaaS environment. Int. J. Adv. Trends Comp. Sci. Eng. 9, 2362–2370 (2020) 2. Al-Dulaimy, A., Itani, W., Zekri, A., Zantout, R.: Power management in virtualized data centers: state of the art. J. Cloud Comp. 5, 1–15 (2016) 3. Ayanoglu, E.: Energy efficiency in data centers. IEEE ComSoc. Tech. Committees Newslett. 1 (2019)


4. Mukherjee, D., Chakraborty, S., Sarkar, I., Ghosh, A., Roy, S.: A detailed study on data centre energy efficiency and efficient cooling techniques. Int. J. Adv. Trends Comp. Sc. Eng. 9, 1–21 (2020) 5. Koronen, C., Åhman, M., Nilsson, L.J.: Data centres in future European energy systems— energy efficiency, integration and policy. Energ. Effi. 13, 129–144 (2020) 6. The Green Grid Data Center Power Efficiency Metrics: PUE and DCiE. White Paper, 1–16 (2007) 7. Renugadevi, T., Geetha, K., Muthukumar, K., Woo Geem, Z.: Optimized energy cost and carbon emission-aware virtual machine allocation in sustainable data centers. Sustainability 12, 1–27 (2020) 8. Turner IV, W.P., Seader, J.H., Brill, G.B.: Tier classification define site infrastructure performance. The Uptime Institute, White Paper, 1–17 (2006) 9. Bose, R., Roy, S., Mondal, H., Row Chowdhury, D., Chakraborty, S.: Energy-efficient approach to lower the carbon emissions of data centers. Computing, 1–19 (2021) 10. Zhou, Q., Xu, M., Gill, S.S., Gao, C., Tian, W., Xu, C., Buyya, R.: Energy efficient algorithms based on VM consolidation for cloud computing: comparisons and evaluations. In: Proceedings of 20th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID 2020), Melbourne, Victoria, Australia, 1–10 (2020) 11. Gill, S.S., Buyya, R.: A taxonomy and future directions for sustainable cloud computing: 360 Degree View. ACM Comput. Surv. 51, 1–33 (2019) 12. Cole, D.: Data center energy efficiency—looking beyond PUE. No Limits Software White Paper #4, 1–15 (2011) 13. Kumar, R., Khatri, S.K., Divan, M.J.: Efficiency measurement of data centers: An elucidative review. J. Discrete Math. Sci. Crypt. 23, 221–236 (2020) 14. Hossain, M.S., Rahaman, S., Kor, A., Andersson, K., Pattinson, C.: A belief rule based expert system for datacenter PUE prediction under uncertainty. IEEE Trans. Sustain. Comput. 2, 140–153 (2017) 15. Perekrest, A., Chebotarova, Y., Al-Issa, H.A.: Principles of designing and functioning of the expert system of assessing the efficiency of introducing energy conservation measures. In: 2019 IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON), Lviv, Ukraine, pp. 871–875 (2019) 16. Napoli, C.D., Forestiero, A., Lagana, D., Lupi, G., Mastroianni, C., Spataro, L.: Efficiency and green metrics for distributed data centers. Tech. Report ICAR-CNR 2016–04, 1–28 (2016) 17. Mukherjee, D., Roy, S., Bose, R., Ghosh, D.: A practical approach to measure data center efficiency usage effectiveness. In: 2nd International Conference on Computing and Sustainable Informatics (ICMCSI 2021), Tribhuvan University, Nepal, pp. 1–9 (2021) 18. Mukherjee, D., Roy, S., Bose, R., Mondal, H.: Potency of virtualization technology for getting energy potent data center. In: International Conference on Innovations in Energy Management and Renewable Resources (IEMRE 2021), Kolkata, pp. 1–5 (2021) 19. Bilal, K., Khan, S.U., Zomaya, A.Y.: Green data center networks: challenges and opportunities. In: 11th international Conference on Frontiers of Information Technology (FIT), Islamabad, pp. 229–234 (2013) 20. Santhaanam, A., Keller, C.: The role of data centers in advancing green IT: a literature review. J. Soft Comput. Decis. Support Syst. 5, 1–19 (2018) 21. Roy, S., Bose, R., Sarddar, D.: Self-servicing energy efficient routing strategy for smart forest. Braz. J. Sci. Technol. 3, 1–21 (2016) 22. Avelar, V., Azevedo, D., French, A.: PUE™: A comprehensive examination of the metric. 
The green grid, White Paper #49, 1–83 (2012)

Performance Anomaly and Change Point Detection for Large-Scale System Management Igor Trubin

Abstract The presentation starts with the short overview of the classical statistical process control (SPC)-based anomaly detection techniques and tools including Multivariate Adaptive Statistical Filtering (MASF); Statistical Exception and Trend Detection System (SETDS), Exception Value (EV) meta-metric-based change point detection; control charts; business driven massive prediction and methods of using them to manage large-scale systems such as on-prem servers fleet or massive clouds. Then, the presentation is focused on modern techniques of anomaly and normality detection, such as deep learning and entropy-based anomalous pattern detections. Keywords Anomaly detection · Change point detection · Business driven forecast · Control chart · Deep Learning · Entropy analysis

1 Introduction
IT management processes, such as capacity planning and performance engineering, in very large organizations require the following approaches: • Exception-based management to focus on real upcoming issues with a low false-positive rate and zero tolerance for false negatives. • Self-service-based online tools for IT users that should have AI/ML elements to highlight patterns of IT resource usage for making optimal and proactive decisions. There are some proven technologies that can be used to implement those approaches: • Statistical Exception and Trend Detection System (SETDS), which is based on the known MASF method (an SPC method adopted for computer performance measurements).

I. Trubin (B) Capital One Bank, 7900 Westpark Drive, McLean, VA 22102, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_36


• Anomaly visualization, e.g., control charts to show a baseline versus most recent data comparison, pointing to anomalies. • Business driven predictions based on regression analysis that correlates business and system performance data to produce meaningful forecasts. • The most modern techniques, such as neural networks (deep learning) and entropy-based imbalance analysis.

2 Statistical Exception and Trend Detection System (SETDS)
SETDS is a methodology [1] that uses MASF statistical filtering [2], pattern recognition, active baselining, dynamic versus static thresholds, control charts [3], EV-based reporting/smart alerting and EV-based change point/trend detection to do systems capacity management, including capacity planning and performance engineering. Figure 1 illustrates visually how SETDS analyzes data: it groups data by the 168 hours of the week and compares two data sets, a baseline (reference/learning) set and the most recent 7 days of actual data points [4].

Fig. 1 Users number versus CPU usage correlated control charts


Fig. 2 The two trend-forecasts: based on all history versus from change point detected

Those charts visually show a workload pattern with daily and weekly seasonality and when anomalies are happening. Then, Exception Values (EV) [2], which are basically anomaly scores (magnitudes), are calculated hourly or daily as the difference between the statistical upper and lower limits (UCL and/or LCL) and the actual data, and are kept aside for additional analysis. EV data is used to detect past change points by solving the equation EV(t) = 0, where t is time and the roots are change points. This method can be used to detect the most recent trend within the historical data, from the last change point to the most recent data point. Using that subset of the historical data as a sample allows building a much more accurate trend-forecast, as shown in Fig. 2. Even the EV-adjusted trend-forecast can be inaccurate, as it does not take into account any business plans or metrics. That can be improved by building a regression model that correlates performance data (e.g., CPU usage in Fig. 1) with business metrics (e.g., the number of customers/users in Fig. 1, second chart). But the issue is a scalability concern for a large number of systems, and the solution [5] is to use not the raw hourly historical data but only data already processed by SETDS, which consists of only 168 week-hour data points summarized over a long period of time (months or years).
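The EV-based change point idea lends itself to a compact sketch. The following Python fragment is our own illustration, not SETDS source code: it scores each interval against fixed control limits, takes the last sign change of EV as the most recent root of EV(t) = 0, and fits the trend only from that point onward.

import numpy as np

def ev_series(actual, ucl, lcl):
    # EV per interval: signed distance outside the control limits (0 inside)
    return np.clip(actual - ucl, 0, None) - np.clip(lcl - actual, 0, None)

def last_change_point(ev):
    # Most recent root of EV(t) = 0, taken as the last point where the
    # sign regime of EV changes
    s = np.sign(ev)
    changes = np.flatnonzero(s[1:] != s[:-1]) + 1
    return int(changes[-1]) if changes.size else 0

actual = np.array([10.0, 11, 9, 10, 30, 32, 35, 37])   # toy hourly metric
ucl, lcl = np.full(8, 15.0), np.full(8, 5.0)           # toy control limits
ev = ev_series(actual, ucl, lcl)
t0 = last_change_point(ev)                             # t0 == 4 here
slope, intercept = np.polyfit(np.arange(t0, len(actual)), actual[t0:], 1)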

3 Anomaly Detection by Neural Network
Detecting both abnormality and normality in system performance data, e.g., separating different workload patterns such as Online Transactional Processing (OLTP), batch jobs or patterns of some typical defects [6], can be done by a neural network.


Fig. 3 Deep learning neural network to detect a “normal” (OLTP) workload

Figure 3 shows one such network, built using an R package, which can put workload patterns like those shown in Fig. 1 into three categories: OLTP, not OLTP and “flat” (which could be a batch job or an idle pattern). By the way, the second chart in Fig. 1 is the classical “cowboy hat” OLTP workload pattern, but the first chart shows a deviation from OLTP in the middle of the week, which is the indication of some defect. In contrast to SETDS, this method is a supervised machine learning one: to get it working correctly, one needs to prepare the learning set manually by labeling profiles that look close to OLTP, not OLTP or “flat” and then to “teach” the net to do the categorization correctly. One of the practical benefits of the method is the ability to process thousands of workload profiles and then to put aside abnormal ones as possibly affected by workload pathology defects such as memory leaks, run-away processes, or others.
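The original categorizer was built with an R package; a comparable supervised sketch in Python, using scikit-learn's MLPClassifier on synthetic 168-hour weekly profiles that stand in for the hand-labeled learning set, could look as follows.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
hours = np.arange(168)
oltp = np.sin(np.pi * (hours % 24) / 24.0)   # daytime hump per day ("cowboy hat")

# Toy labeled profiles: 0 = OLTP, 1 = flat (batch/idle), 2 = not OLTP
profiles = np.vstack(
    [oltp + rng.normal(0, 0.05, 168) for _ in range(10)]
    + [np.full(168, 0.5) + rng.normal(0, 0.01, 168) for _ in range(10)]
    + [rng.random(168) for _ in range(10)])
labels = np.repeat([0, 1, 2], 10)

net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
net.fit(profiles, labels)                    # "teach" the net on labeled profiles
print(net.predict(profiles[[0, 10, 20]]))    # expected: [0 1 2]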

4 Entropy-Based Method of Anomaly Detection
The known challenge of finding anomalies for short-lived objects is a lack of measurements. For example, these could be servers or customers that have just started being monitored, or cloud objects (EC2s, ASGs) that usually have very short lifespans. The suggested approach is to detect anomalous behavior of this type of object by estimating the entropy of each object using the following formula [7]:

nIC = (1 / (n ln n)) Σ_{i=1}^{n} (x_i / x_avg) ln(x_i / x_avg)

where (for the object that is monitored over time):
nIC is the Normalized Imbalance Coefficient,
n is the number of observations (e.g., hours in a day),
x_i is the measured metric value for the interval (e.g., server instance count),
i is the measurement interval (e.g., hour or day).

The nIC value varies from 0 to 1. A value of 0 means the object is completely balanced (e.g., for a cloud cluster such as an AWS ASG, it is just a constant number of server nodes in the group), and 1 is complete random disorder. For example, if a cluster has the strange behavior of randomly spinning a lot of nodes up and down, the nIC number should be much higher than in a case where a new node is created or deleted only rarely; this can help to tune the autoscaling policies or identify a workload defect.
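A direct transcription of the formula above into Python (our own helper with illustrative names). Note that nIC equals 1 − H/ln n, where H is the Shannon entropy of the normalized samples, so it is 0 for a perfectly even series and approaches 1 when one interval dominates.

import math

def nic(samples):
    # Normalized Imbalance Coefficient of a series of per-interval measurements
    n = len(samples)
    avg = sum(samples) / n
    if n < 2 or avg == 0:
        return 0.0
    total = sum((x / avg) * math.log(x / avg) for x in samples if x > 0)
    return total / (n * math.log(n))

print(nic([4, 4, 4, 4]))                  # 0.0: constant node count, balanced
print(nic([15.9997, 1e-4, 1e-4, 1e-4]))   # ~1.0: one interval holds all the load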

References 1. Trubin, I.: Exception based modeling and forecasting. In: Proceedings of Computer Measurement Group (2008) 2. Buzen, J.P., Shum, A.W.: MASF—multivariate adaptive statistical filtering. In: Proceedings of Computer Measurement Group (1995) 3. Trubin, I.: Review of IT control chart. CIS J. 4(11), 2079–8407 (2013) 4. Perfomalist Homepage, http://www.perfomalist.com. Last accessed on 10 June 2021 5. Trubin, I., et al.: Systems and methods for modeling computer resource metrics. US Patent 10,437,697 (2016) 6. Trubin, I.: Capturing workload pathology by statistical exception detection. In: Proceedings of Computer Measurement Group (2005) 7. Loboz, C.: Quantifying imbalance in computer systems. In: Proceedings of Computer Measurement Group (2011)

Artificial Intelligence Driven Monitoring, Prediction and Recommendation System (AIM-PRISM) Sanjeev Manchanda

Abstract System performance monitoring and management of enterprise information systems is an exhaustive task that every organization confronts, and it consumes a lot of effort and cost. Different organizations spend significant cost and effort in maintaining their applications. This paper presents a system that automates the monitoring, prediction and recommendation of performance issues, bottlenecks, trends and system failures in enterprise information systems using state-of-the-art machine learning algorithms. This system will help organizations maintain the overall health of different applications intelligently, providing centralized control together with distributed access to and control of applications. Keywords AI/ML · Application performance management · Enterprise information systems · Intelligent systems

1 Introduction
System performance monitoring and management of enterprise information systems for the different applications of an organization involves a lot of human resources and costs. With the rise of artificial intelligence and machine learning (AI/ML) driven automation, it is important to centrally control and automate this exhaustive exercise. Individual applications maintained by individual teams cannot be an affordable affair in today's competitive business environment. This motivated us to develop a system that can maintain applications in a better way. This paper presents a novel system that, first, centralizes the control of monitoring, prediction and recommendation for performance issues, bottlenecks, trends and system failures in enterprise information systems. Secondly, it automates many critical tasks and minimizes system failure rates using state-of-the-art AI/ML algorithms. Third, it reduces the human effort and costs involved in maintaining different applications. Fourth, its intuitive UI/UX design makes the system easier to learn and use.
S. Manchanda (B) A&I, TCS, Mumbai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_37


State-of-the-art AI/ML-based automation enhances the accuracy of predicting operational issues, bottlenecks, trends, and system failures, minimizing the operational risks that lead to revenue loss for organizations.

2 Historical Background
Application and network performance monitoring has been explored extensively in the past, and many systems exist worldwide to help organizations with this cause. Most of those systems, however, are pure performance monitoring systems and are not intelligent enough to predict and take corrective actions automatically. Recent developments in AI/ML have led to more intelligent systems, so system monitoring, prediction and recommendation through state-of-the-art deep learning algorithms can give an edge to the latest systems. This progress is a journey over time; many such systems have been created, reviewed and have helped organizations. In one study, a set of monitoring tools used within an organization was discussed, showing the benefits of utilizing these tools [1]. Another study described an approach for proactive monitoring of applications and metrics for infrastructure capacity planning and service level agreements (SLAs) on system availability [2]. In another study, development techniques for application monitoring and continuous performance assurance were presented [3]; the results indicated that the use of these techniques can minimize costs, reduce risks and help in achieving business objectives. One research effort presented a system called Toddler that could find performance problems with a higher accuracy than a standard Java profiler [4]. Another study reviewed methods for monitoring an application's quality of service over cloud services [5]. A comparative study listed application monitoring tools and metrics for their evaluation [6]. Another study presented an application performance monitoring approach, based on Bayesian networks, for monitoring the performance and availability of web applications [7]. A further study discussed research dimensions and design issues as well as the engineering of cloud monitoring tools [8]. Another study presented multiple monitoring tools available in the market [9]. Yet another presented approaches for data collection from distributed systems and further described the criteria for evaluating and selecting the right monitoring tools from the available options [10]. In another study, different application performance management tools were compared and evaluated on performance regressions [11]; the evaluation results showed the differences between the approaches described for performance regression detection and the actual implementations. Another study presented a comparative analysis of many network monitoring tools through many metrics, describing the advantages and limitations of the tools as well [12]. One recent study discussed artificial intelligence-based approaches in optical networks and presented artificial intelligence-based quality-of-transmission as well as impairment modeling and monitoring methods, along with the problems in deploying artificial intelligence-based methods and their resolutions [13]. Recent developments in artificial intelligence/machine learning (AI/ML) have enabled us to work on comprehensive software that will facilitate an artificial intelligence driven monitoring, prediction and recommendation system (AIM-PRISM) for optimizing performance issues, bottlenecks, trends and system failures in enterprise information systems while minimizing costs and reducing effort with ease of deployment and use.



3 Problem Definition
Monitoring, prediction and recommendation of performance issues, bottlenecks, trends and system failures in enterprise information systems consume a lot of human resources and costs, and optimizing resource utilization and costs is essential. There is a need for a framework that can monitor different applications, intelligently predict issues or system failures, and recommend solutions for them. AI/ML-based techniques can contribute to monitoring, to predicting issues as well as system failures, and to recommending suitable actions to overcome them.

3.1 Limitations of Existing Solutions
Many systems exist to monitor applications. These systems have certain limitations, as follows:

3.1.1 Lack of Centralized View of Applications’ Status

Usually, applications are maintained individually, and a centralized status view of all applications is missing. The few applications that do have central monitoring features lack the latest automation techniques.

3.1.2 High Costs of Maintaining Applications

Different applications each require an individual team's effort for monitoring and maintenance, which leads to higher costs.

3.1.3 Lacking in Real-Time Processing

Real-time analysis, processing and actions are not implemented in most applications, which depend mostly on human effort for issue tracking and resolution.

3.1.4 Unavailability of Automation

The lack of automation creates bottlenecks: big operations teams must be maintained to monitor and resolve issues around the clock, which makes it difficult to maintain applications.

3.2 Scope of AIM-PRISM System
The current paper addresses the problem of monitoring, prediction and recommendation for performance issues, bottlenecks, trends and system failures in enterprise information systems through AI/ML-based algorithms. The system needs to collect real-time data on different performance statistics from different applications, store the data in repositories for trend analysis and reporting, and perform advanced analytics for identifying and predicting issues and/or system failures, so as to intelligently resolve the issue or recommend a solution for further action. AIM-PRISM employs a central system for near-real-time to long-term monitoring, storage, prediction and recommendation of issues and of the actions for their resolution; real-time monitoring, analysis and actions are performed at edge analytics servers. State-of-the-art deep learning neural networks help in creating an automated system that can monitor applications, predict issues and recommend or trigger actions automatically.

4 Proposed Solution
An artificial intelligence driven monitoring, prediction and recommendation system (AIM-PRISM) for performance issues, bottlenecks, trends and system failures in enterprise information systems is proposed for facilitating centralized monitoring and control of distributed applications. The system collects real-time information from different applications and stores it in repositories for further analysis. Real-time data is used for real-time analysis and actions/recommendations.

4.1 Monitoring, Prediction and Recommendation Approach
AIM-PRISM employs real-time processing of data for real-time monitoring through edge analytics, and near-real-time to long-term analytics through a central data center. Edge computing provides real-time monitoring of applications through analysis of the real-time application status data stream; real-time actions are triggered by edge servers, whereas processed data is made available to the central server for near-real-time to long-term analytics and suitable actions. The AIM-PRISM system facilitates central monitoring, predictions and recommendations of actions without compromising the real-time monitoring, analysis and suitable actions performed through edge analytics.


Figure 1 depicts the process flow of the AIM-PRISM system. Real-time data related to performance and status statistics is ingested into the AIM-PRISM system, where the data is stored for trend analysis and the data streams are processed for real-time analytics. Real-time analytics analyzes the data from real-time streaming as well as from archived data for AI/ML-based detection of performance issues, bottlenecks, trends and system failures in enterprise applications. This real-time analysis is further converted into actionable insights: automatic actions are triggered for automated control of applications, alerts are raised for threshold-based events, and real-time reporting is produced. All analytical activities are carefully divided into actions, alerts and reporting events to support centralized administration. Centralization of administration has its own drawback of a single point of failure. To overcome this problem, AIM-PRISM is designed to utilize edge computing and maintains synchronized copies of the centralized data in multiple locations, and AIM-PRISM is optimized to connect to the most responsive data repository for reporting. Figure 2 depicts the connectivity of the cluster of central servers, each holding replicated copies of the data; these servers are further connected to edge servers that collect performance metrics, logs and server status data from the connected applications. Figure 3 depicts the prediction and recommendation processes of AIM-PRISM. Real-time status, performance metrics and logs are collected by the AIM-PRISM system to display real-time status to users on their dashboards. The system generates real-time predictions and recommendations based on real-time data, logs and historical data. Alerts and reports are generated for suitable further actions, whereas automated actions are triggered by the system itself. The system will generate an alert for users whenever it predicts a system failure due to a probable disk crash in the near future.

Fig. 1 Process flow of AIM-PRISM system


Fig. 2 Connectivity network of AIM-PRISM server

Fig. 3 Prediction and recommendation processes of AIM-PRISM

The system will also generate scheduled and contingency reports on health checks and the performance statistics of different applications. The system will automate many tasks that can be performed without involving humans, e.g., disk space enhancements or index creation for growing data. The AIM-PRISM system is evolving with open-systems technology stacks to support different operating systems, and the AIM-PRISM Designer and Scheduler module enables the system administrator to design and review the automation of tasks.


The Designer enables automation tasks to be designed through an easy-to-use drag-and-drop graphical user interface. The system administrator can select from a library of pre-defined automated tasks or design a new task with the drag-and-drop editor, with optional parameter settings and custom rules. One can decide to automate a certain set of tasks that can be scheduled whenever needed and removed from the schedule as per requirement and convenience.

Algorithm 1 depicts the processing of the system: the system receives applications' logs on edge gateways, where they are analyzed to trigger real-time actions. Application logs are filtered to store only relevant data. The filtered data is sent to the central system, which analyzes the newly received data together with historical data to generate predictions and to trigger alerts and actions and generate reports; after that, user feedback is collected and stored in persistent historical storage.
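The listing of Algorithm 1 is not reproduced here; the following Python sketch restates the loop just described, with every handler being a trivial stand-in of ours rather than AIM-PRISM code.

history = []   # persistent historical storage (stub)

def edge_analyze(rec):               # edge gateway: immediate reaction
    return "shutdown" if rec["cpu_temp"] > 95 else None

def is_relevant(rec):                # edge filtering before central upload
    return rec["cpu_temp"] > 70 or rec["errors"] > 0

def central_predict(batch, hist):    # central server: new batch plus history
    hot = [r for r in hist + batch if r["cpu_temp"] > 70]
    return ["alert: overheating trend"] if len(hot) >= 3 else []

def process_batch(batch):
    for rec in batch:                                  # 1. real-time edge actions
        action = edge_analyze(rec)
        if action:
            print("edge action:", action)
    filtered = [r for r in batch if is_relevant(r)]    # 2. filter at the edge
    for alert in central_predict(filtered, history):   # 3. predict centrally
        print(alert)                                   # 4. alerts/actions/reports
    history.extend(filtered)                           # 5. store (plus feedback)

process_batch([{"cpu_temp": 80, "errors": 0}, {"cpu_temp": 97, "errors": 2}])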


4.2 Leveraged Advanced AI/ML and Associated Techniques
Many source applications connected to the AIM-PRISM system provide a variety of data that is processed for further evaluation and recommendations. Identifying patterns in different applications involves different levels of processing, as follows:

4.2.1 Edge Analytics

Edge analytics helps distribute the processing of data toward the edge computing device, which enables real-time processing of information for quick actions. For example, if log streams carry CPU temperature information and a machine of a connected source application suddenly starts operating at an extremely high temperature that could burn the machine, the edge server of the AIM-PRISM system will trigger a shutdown action for that machine. If an application is not responding, the edge server generates alerts for it. Edge computing helps in analyzing the log in real time, as it is very close to the actual application, and can trigger real-time actions based on real-time analysis and predictions from application logs. Real-time applications, databases, cloud services, containers and workstation status are monitored and their metrics are analyzed. The edge server also analyzes errors, CRUD events, transactions, access/permission requests and other associated metrics such as CPU, memory and disk utilization for analysis and further actions. The edge server filters the streams of application log data, forwarding only qualified data to the central servers for near-real-time to long-term analysis and storage, which leads to savings in bandwidth and storage costs.

4.2.2 Data Center Analytics

The central system's analytics server receives qualified data from the edge servers, where near-real-time, short-, middle- and long-term pattern identification is done. The central server keeps track of trends, bottlenecks and issues so as to resolve them automatically; for example, if there is a change in the archival policy from monthly to fortnightly, the central system will implement this policy across all applications automatically. If, from past history, the system predicts the possibility of a potential breakdown using AI/ML techniques, it generates alerts for that. The central system also keeps track of updates, deployment/rollback of software, migration and peak transaction times on the different connected applications.

4.2.3 Automated Feature Engineering

State-of-the-art techniques allow automated feature engineering to analyze different types of data, selecting and preprocessing input data for further analysis, modeling, predictions and appropriate actions. For example, which data should be used for long-term analysis is decided at the edge computing servers through automated feature engineering. Similarly, deciding which data should be used for analysis and model building for predicting system failures and breakdowns at the central server is supported by automated feature engineering.

4.2.4 Advanced AI/ML Techniques

Advanced deep learning neural network algorithms customized for this system help in analyzing the data with very few examples. For instance, a disk failure is a rare event; this system identifies probable disk failures in real time from minimal past examples and triggers backup actions before failures occur, or sends alerts to the concerned stakeholders. Similarly, AI/ML techniques are employed for predictive analytics, anomaly detection and generating alerts through smart log parsing for network and application maintenance.

4.2.5 Feedback Looping for Machine Learning

An automated, continuous feedback mechanism is provided for all predictions, whereby the system compares actual values against predictions and learns from them; e.g., a predicted disk failure is checked against actual disk failures to improve future predictions. Detection of unauthorized access to applications and the identification of user behavior and patterns are improved through proper feedback looping.

5 System Architecture
Figure 4 depicts the architecture of the AIM-PRISM system, which consists of the following modules: the Storage and Intelligence Server Module, the Edge Computing Module, the Visualizations/Alerts Module, the Administrative Module and the Input Sources Module. • Storage and Intelligence Server Module: This module is responsible for receiving updates from source applications about their real-time status and log files, storing application status, analyzing results in real time, streaming data to real-time dashboards/scorecards for updates, generating alerts/ad hoc queries/scheduled reports and issuing automated control over applications. • Edge Computing Module: Edge computing is used for edge analytics of the input data streams; real-time processing is performed and actions are triggered there, and the processed and filtered data is transferred to the central system for further processing. • Visualizations/Alerts Module: This module allows users to view reports, query the repository with ad hoc queries, generate alerts and visualize the real-time status of applications.


Fig. 4 Architecture of AIM-PRISM system

• Administrative Module: This component is responsible for administering application user management, designing/scheduling automation tasks, Storage/Application/Intelligence server management, controlling source applications and profiling users for system access. • Input Sources Module: This component is responsible for facilitating interfaces to the different applications for receiving real-time/batch updates and for interacting with and controlling applications for automated tasks.

6 Advantages of Using AIM-PRISM
System performance monitoring and management of enterprise information systems for the different applications of an organization involves a lot of human resources and costs. With the rise of artificial intelligence and machine learning (AI/ML) driven automation, it is important to centrally control and automate this exhaustive exercise. Individual applications maintained by individual teams cannot be an affordable affair in today's competitive business environment. This paper presents a system that gives the following benefits to an organization that uses it: • Centralized Control: This system centralizes the control of monitoring, prediction and recommendation for performance issues, bottlenecks, trends and system failures in enterprise information systems. • Automation: This system automates many critical tasks and minimizes system failure rates using state-of-the-art AI/ML algorithms, ensuring high availability of applications. • Reduction in Efforts Involved: This system reduces the human effort and costs involved in maintaining different applications.


• Ease of Use: The intuitive UI/UX design of this system makes it easier to learn and use. • Performance and Reliability: State-of-the-art AI/ML-based automation enhances the accuracy of predicting operational issues, bottlenecks, trends and system failures, minimizing the operational risks that lead to revenue loss for organizations. • Key Differentiators: The key differentiators of the AIM-PRISM system are the use of edge computing, a central server for holistic control of applications, automation using state-of-the-art AI/ML techniques, automated feature engineering, real-time processing/actions and feedback looping for machine learning. The AIM-PRISM system will facilitate real-time monitoring of applications through a centralized view of performance issues and failures and will help in identifying bottlenecks. AI/ML techniques will automate the resolution of issues by identifying patterns and taking the required actions automatically. The system employs edge computing to resolve real-time issues, bottlenecks and system failures, whereas near-real-time to long-term issues, trends, system failures and application failures are resolved through the central system. Real-time streaming of applications' logs is handled by the edge servers, whereas the filtered data required for near-real-time to long-term processing as well as storage is transferred to the central system.

7 Results and Comparison with Benchmarks
Results of the proposed system are generated and evaluated through multiple metrics, viz. Response Time, Throughput, Error Rates, Performance of External Services, Most Time-Consuming Transactions, Cross-Application Tracing, Transaction Breakdown, Deployment Analysis, History, and Comparison. The proposed system is currently evolving with a bigger vision of augmented intelligence through artificial intelligence. Current implementations have shown very encouraging results, where the AIM-PRISM system has shown significant improvements over industry benchmarks [14] for the key performance indicators addressing the key challenges of improving the usability of application performance monitoring data, time, issue identification, costs and data usability efficiency. Figure 5 depicts the key challenges for improving the usability of application performance monitoring data, with the time spent correlating performance data reduced from 63 to 22%, the amount of irrelevant performance data stored reduced from 61 to 11% thanks to the use of edge computing, the number of false positives reduced from 48 to 9.7% and user interface (UI) difficulty reduced from 42 to 7%. Figure 6 depicts the key challenges for time, issue identification, costs and data usability efficiency, with the time spent troubleshooting reduced from 63 to 22%, identification of issues before they affect users improved from 61 to 91%, management costs reduced from 48 to 12% and the usability of application performance data improved from 42 to 89%. Following the completion of development, results will be evaluated against benchmark systems, and the results of this system will be compared with the results of earlier studies.

Fig. 5 Key challenges for improving usability of application performance monitoring data

Fig. 6 Key challenges for time, issues identification, costs and data usability efficiency

In addition to process flow creation, many parallel activities are also in progress, viz. technology and library evaluation, corpus creation and evaluation through domain experts, and identification of suitable algorithms for automating different processes and recommendations.


8 Conclusion and Future Directions
In this paper, an artificial intelligence driven monitoring, prediction and recommendation system (AIM-PRISM) is presented for resolving and controlling performance issues, bottlenecks, trends and system failures in enterprise information systems. AIM-PRISM facilitates a complete AI/ML driven ecosystem, based on edge computing, to monitor and control enterprise applications. Results of initial implementations indicate significant improvements over benchmark results. The AIM-PRISM system is evolving to meet domain-specific challenges and toward end-to-end automation of controlling performance issues, bottlenecks, trends and system failures in enterprise information systems, with state-of-the-art use of AI/ML techniques and edge computing.

References 1. Miller, R., Hill, J.J., Dillow, D.A., Gunasekaran, R., Shipman, G., Maxwell, D.: Monitoring tools for large scale systems. Comput. Sci. (2010) 2. Horalek, J., Sobeslav, V.: Proactive ICT Application Monitoring. Latest Trends in Information Technology, pp. 49–54. Wseas Press (2012) 3. Sahasrabudhe, M., Panwar, M., Chaudhari, S.: Application performance monitoring and prediction. In: IEEE International Conference on Signal Processing, Computing and Control (ISPCC), Solan, pp. 1–6, (2013). Doi: https://doi.org/10.1109/ISPCC.2013.6663466. 4. Nistor, A., Song, L., Marinov, D., Lu, S.: Toddler: Detecting performance problems via similar memory-access patterns. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE’13, pages 562–571 (2013) 5. Ranjan, R., Buyya, R., Leitner, P., Haller, A., Tai, S.: A note on software tools and techniques for monitoring and prediction of cloud services. Softw. Pract. Exp. 44, 771–775 (2014) 6. Kowall, J., Cappelli, W.: Magic quadrant for application performance monitoring. Gartner (2014) 7. Wang, C., Su, L., Zhao, X., Zhang, Y.: Application performance monitoring and analyzing based on Bayesian network. In Proceedings of the 11th Web Information System and Application Conference, WISA, 14, pages 61–64 (2014) 8. Alhamazani, K., Ranjan, R., Mitra, K.: An overview of the commercial cloud monitoring tools. Res. Dimen. Des. Issues State-of-the-art Comput. 97, 357–377 (2015) 9. Hernantes, J., Gallardo, G., Serrano, N.: IT infrastructure-monitoring tools. IEEE Softw. 32(4), 88–93 (2015) 10. Kufel, Ł.: Tools for distributed systems monitoring. Found. Comput. Decis. Sci. 41 (2016). https://doi.org/10.1515/fcds-2016-0014,No.4 11. Ahmed, T.M., Bezemer, C.P., Chen, T.-H., Hassan, A.E., Shang, W.: Studying the effectiveness of application performance management (APM) tools for detecting performance regressions for web applications: an experience report. In: Proceedings of the 13th International Conference on Mining Software Repositories, Pages 1–12 (2016). https://doi.org/10.1145/2901739.2901774 12. Chahal, D., Kharb, L., Choudhary, D.: Performance analytics of network monitoring tools. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 8(8). ISSN: 2278-3075 (2019) 13. Liu, X., Lun, H., Fu, M., Fan, Y., Yi, L., Hu, W., Zhuge, Q.: AI-based modeling and monitoring techniques for future intelligent elastic optical networks. Appl. Sci. 10, 363 (2020) 14. Baker, J.: Dealing with Application Performance Monitoring (APM) Data Overload, https:// www.extrahop.com/company/blog/2012/dealing-application-performance-monitoring-apmdata-overload-part-1 (2012)

Financial Forecasting of Stock Market Using Sentiment Analysis and Data Analytics Dipashree Patil, Shivani Patil, Shreya Patil, and Sandhya Arora

Abstract Stock market prediction is the focus of many research works. State-of-the-art methodologies for stock prediction use historical stock indices. News headlines, social media mentionings and official reports influence stock market movements remarkably. The aim of this paper is to combine traditional data analytics methodologies and sentiment analysis to forecast stock market trends. Going one step beyond machine learning classifiers, we propose a system using deep learning methodologies and a correlation of both the textual and numerical data analysis. We present research on the prediction of stock trends using natural language processing and further enhance the predictive model by integrating a sentiment analysis module on textual data to correlate the public sentiment around stocks with market trends. The experiments performed on real-world datasets conclude that the Support Vector Machine (SVM), Random Forest and Decision Tree classifiers performed well, with more than 90% accuracy. Among the deep learning models, LSTM showed the highest accuracy (92%), followed by the bidirectional RNN, deep CNN and shallow RNN neural networks. Our analysis shows that deep learning can be applied efficiently for stock market sentiment analysis, and the LSTM model proved to perform best on the textual data under study. Keywords Stock prediction · Classification · Deep learning · Convolutional neural network · Recurrent neural network · Long-short term memory

D. Patil (B) · S. Patil · S. Patil · S. Arora MKSSS Cummins College of Engineering for Women, Pune, India e-mail: [email protected] S. Patil e-mail: [email protected] S. Patil e-mail: [email protected] S. Arora e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_38


1 Introduction
The stock market is a platform where companies float shares to the general public in an initial public offering (IPO) to raise capital. Forecasting stock market prices has always been an area of research interest for investors. Data mining techniques have proved effective for forecasting stocks with the help of different algorithms. A lot of factors affect stock market prices, including the numerical data, i.e., the opening and closing values for different companies, as well as announcements and news in the fields of politics, finance, the industrial sector, etc. Such challenging factors require faster ways to support investors in performing sentiment analysis of news articles and to provide relevant information for decision-making in the stock market. The aim of this research study is to combine NLP techniques, performing sentiment analysis on recent textual data consisting of news, social media mentionings and official filings by the company, with data analytics to make predictions of stock trends. The rest of the paper is organized as follows: Sect. 2 presents the related state-of-the-art methodologies and literature study. Section 3 discusses the proposed model framework. Section 4 evaluates all the techniques discussed in the previous sections. Section 5 describes the conclusions of the study and the future work.

2 Related Work
In this section, we discuss related work in the field of stock market trend prediction. Sawant et al. [1] propose a method of integrating StockTwits and news feeds with stock data for better stock market prediction. Loughran and McDonald [2] performed an experimental study on financial 10-k reports, and the separate word lists they built were made available for our research work. The empirical study performed by Sagala et al. [3] shows that the highest accuracy is achieved by the method combining features from historical data and online media sentiment on a 5-day trading window using the SVM algorithm. Nti et al. [4] and Bharathi and Geetha [5], in their respective studies, propose methodologies for effective stock market prediction using sentiment analysis. Kim [6] proposed a convolutional neural network design for effective sentiment analysis. Liu et al. [8] proposed a method to analyze both textual data, for sentiment analysis, and numerical stock quotes, for forecasting the price.


3 Proposed Model
The ultimate goal of the proposed model is to analyze crowd sentiment using sentiment analysis and opinion mining techniques, establishing the relationship between people's sentiment and past stock market behavior and predicting whether the stock price will rise or fall. We consider various features like social media mentionings and financial news along with numeric stock market data to boost the accuracy of the prediction model. The proposed dynamic linear model forecaster for sentiment analysis of the textual data under study is explained in Fig. 1. The implementation process is as follows:

3.1 Data Collection

We curated a robust dataset by scraping data from multiple reliable sources:

a. Social media mentions: tweets [9] and news related to Apple stocks for the timeframe of Sept to Dec 2020, plus Reddit posts from Kaggle.
b. News scraping: we scraped recent news data from platforms such as Bloomberg News (using RapidAPI) and a Google News dataset of nearly 4 million articles covering about 6000 stocks from 2009 to 2020 for training purposes [10].
c. 10-K filings: 10-Ks are the financial reports officially published yearly by a company, indicating the company's potential to succeed. We obtained these reports by scraping the EDGAR dataset [11].
d. Numerical stock prices: real-time stock data is fetched with the Yahoo Finance market data downloader library (yfinance) [12] for the chosen time frame, as sketched below.
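A minimal sketch of fetching the numerical stock data in item (d), assuming the "Yahoo Finance market data downloader" refers to the yfinance package; the ticker and date range follow the Apple example and tweet-collection window described above and are illustrative.

```python
# Fetch daily OHLCV prices for Apple over the Sept-Dec 2020 window
# (yfinance package and exact date range are assumptions, not the
# authors' verified code).
import yfinance as yf

prices = yf.download("AAPL", start="2020-09-01", end="2020-12-31")
print(prices.head())
```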

Fig. 1 Dynamic linear model forecaster


3.2 Data Preparation

The pre-processing methodology for textual data consists of removing unnecessary URLs, usernames, and user-id information; removal of hashtags; case folding; tokenization; stemming and lemmatization; stopword removal; and spam/false tweet filtering (a sketch of this pipeline follows). The pre-processing steps for numerical data are filtering of missing rows and scaling. For handling unannotated data, we implemented our own classifier model trained on multiple classifiers. The final dataset considered for further classification is labeled and tested by applying performance metrics from the trained classifier, which was designed on the Sentiment140 dataset from the Stanford Sentiment Treebank.
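The following is a minimal sketch of the textual pre-processing steps listed above, using NLTK; the exact regexes and tool choices are assumptions, not the authors' verified implementation.

```python
# Text pre-processing sketch: URL/username/hashtag removal, case folding,
# tokenization, stopword removal, and stemming (NLTK assumed).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

STOPWORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(text: str) -> list[str]:
    text = re.sub(r"https?://\S+", " ", text)   # strip URLs
    text = re.sub(r"[@#]\w+", " ", text)        # strip usernames and hashtags
    text = text.lower()                         # case folding
    tokens = word_tokenize(text)                # tokenization
    tokens = [t for t in tokens if t.isalpha() and t not in STOPWORDS]
    return [stemmer.stem(t) for t in tokens]    # stemming

print(preprocess("AAPL to the moon!! https://t.co/xyz @trader #stocks"))
```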

3.3 Classical Machine Learning Techniques

The finely curated and pre-processed dataset is passed to the classification ensemble model, which combines the outputs of a comparative analysis of 13 different classifiers. The classification model types used are:

1. Naive Bayes: the Naive Bayes variants ensembled are the BernoulliNB, ComplementNB, MultinomialNB, and GaussianNB classifiers. No normalization is done on the dataset while training the model.
2. Logistic Regression: we use a logistic regression algorithm for binary classification of a sentence into positive or negative. The classification problem does not involve multiclass classification over a huge dataset, so we use the Stochastic Average Gradient descent solver for optimization, handling the L2 penalty. The maximum number of iterations for the solver to converge is 200.
3. Support Vector Machine: we implemented a Support Vector Classifier with a linear kernel. For linear SVC, the standard L2 penalty is used with the default SVM loss function "squared_hinge."
4. Decision Tree: in our case, tweets and news are classified into positive or negative sentiment, with no predefined max_depth so that all leaf nodes are explored. The splitter used is the default "best" split.
5. Random Forest Classifier: the number of trees for this classifier is 100. As with the decision tree classifier, no predefined max_depth is used, keeping the rest of the parameters at their sklearn defaults.
6. Boosting classifier: here we sweep a list of learning rates from 0.05 to 1 to find the highest accuracy among them. For each learning rate, we use 20 n_estimators (boosting stages), max_features of 3, a random state of 0 for controlling the random feature permutation, and a maximum depth of 5 for each individual regression estimator.

7. SGD Classifier: in our case, we use the classifier with the default L2 penalty along with the "squared_hinge" loss function. The maximum number of iterations is 2000, and the data is shuffled at each epoch (a consolidated sketch of these settings follows).
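The parameter names above suggest scikit-learn; under that assumption, the following sketch instantiates the listed models with the stated hyper-parameters. Dataset loading and feature extraction are omitted, and the exact objects are illustrative rather than the authors' verified code.

```python
# Classical models with the hyper-parameters stated in Sect. 3.3
# (scikit-learn assumed).
from sklearn.naive_bayes import BernoulliNB, ComplementNB, MultinomialNB
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

models = {
    "naive_bayes": MultinomialNB(),   # BernoulliNB/ComplementNB variants ensembled too
    "log_reg": LogisticRegression(solver="sag", penalty="l2", max_iter=200),
    "svm": LinearSVC(penalty="l2", loss="squared_hinge"),
    "tree": DecisionTreeClassifier(splitter="best", max_depth=None),
    "forest": RandomForestClassifier(n_estimators=100, max_depth=None),
    "boosting": GradientBoostingClassifier(learning_rate=0.05, n_estimators=20,
                                           max_features=3, max_depth=5,
                                           random_state=0),
    "sgd": SGDClassifier(loss="squared_hinge", penalty="l2",
                         max_iter=2000, shuffle=True),
}
# Each model would then be fit on the vectorized labeled corpus:
# model.fit(X_train, y_train); accuracy = model.score(X_test, y_test)
```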

3.4 Deep Learning Techniques

1. CNN: a convolutional neural network is a deep neural network that builds progressively deeper and more precise representations. It is known for high accuracy and precision owing to its hidden layers (Table 1).
2. RNN: recurrent neural networks are developed mainly for sequential data. The distinguishing point of an RNN is that it remembers earlier inputs given to the model, which suits sequential data, as it uses its internal memory for the current state and computation (Table 2).
3. LSTM: long short-term memory gives better performance than an RNN because of its capability to learn order dependence in sequence prediction problems (Table 3).

Table 1 CNN model specifications

| Layer (type)          | Output shape    | Parameters |
|-----------------------|-----------------|------------|
| embedding (Embedding) | (None, 50, 400) | 8,704,800  |
| conv1d (Conv1D)       | (None, 49, 200) | 160,200    |
| conv1d_1 (Conv1D)     | (None, 48, 200) | 240,200    |
| global_max_pooling1d  | (None, 200)     | 0          |
| concatenate           | (None, 400)     | 0          |
| dropout (Dropout)     | (None, 400)     | 0          |
| dense (Dense)         | (None, 128)     | 51,328     |
| dropout_1 (Dropout)   | (None, 128)     | 0          |
| dense_1 (Dense)       | (None, 2)       | 258        |

Table 2 Bidirectional RNN model specifications

| Layer (type)                  | Output shape    | Parameters |
|-------------------------------|-----------------|------------|
| embedding_1 (Embedding)       | (None, 30, 300) | 7,192,500  |
| bidirectional (Bidirectional) | (None, 30, 128) | 186,880    |
| bidirectional_1               | (None, 64)      | 41,216     |
| dense_1 (Dense)               | (None, 10)      | 650        |

Table 3 LSTM model specifications

| Layer (type)            | Output shape    | Parameters |
|-------------------------|-----------------|------------|
| embedding_1 (Embedding) | (None, 30, 300) | 7,192,500  |
| lstm (LSTM)             | (None, 30, 256) | 570,368    |
| lstm_1 (LSTM)           | (None, 256)     | 525,312    |
| dense_1 (Dense)         | (None, 1)       | 257        |
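To make the Table 3 specification concrete, here is a hedged Keras reconstruction of the LSTM model. The vocabulary size of 23,975 is inferred from the embedding parameter count (23,975 × 300 = 7,192,500), and the sigmoid output is an assumption consistent with the (None, 1) output shape; this is a sketch, not the authors' verified code.

```python
# LSTM architecture matching Table 3 (Keras assumed).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(input_dim=23975, output_dim=300, input_length=30),  # (None, 30, 300)
    LSTM(256, return_sequences=True),                             # (None, 30, 256)
    LSTM(256),                                                    # (None, 256)
    Dense(1, activation="sigmoid"),                               # (None, 1)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()  # parameter counts should reproduce Table 3
```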

4 Results

On the datasets used for training the classical machine learning models, the Support Vector Machine showed the highest accuracy (about 93%) as well as a good confidence score. The Random Forest Classifier and Decision Tree Classifier showed accuracies of 92.84% and 91.5%, followed by the AdaBoost classifier, SGD classifier, and logistic regression, each with more than 90% accuracy. All the classifiers with their accuracies are listed in Table 4. For the deep models, since the dataset used is skewed, we used evaluation metrics including accuracy, precision, recall, and AUC. Table 5 shows comparative evaluation results on the testing data.

Table 4 Comparative results of traditional classification models

| Sr. No | Classification model     | Tweets + Reddit + News (%) | On 10-K filings (%) |
|--------|--------------------------|----------------------------|---------------------|
| 1      | Logistic regression      | 91.88                      | 93.91               |
| 2      | Perceptron               | 90.48                      | 97.26               |
| 3      | Support vector machine   | 92.9                       | 95.9                |
| 4      | Naïve Bayes              | 88.54                      | 89.36               |
| 5      | Decision tree            | 91.47                      | 96.14               |
| 6      | Random forest classifier | 92.84                      | 96                  |
| 7      | Boosting algorithm       | 88.76                      | 89.81               |
| 8      | SGD classifier           | 90.06                      | 92.05               |

Table 5 Comparative results of deep learning models on testing data

| Deep learning model | Accuracy | Precision | Recall | AUC    |
|---------------------|----------|-----------|--------|--------|
| CNN                 | 0.8923   | 0.8923    | 0.892  | 0.9289 |
| Shallow RNN         | 0.9033   | 0.9078    | 0.9894 | 0.8652 |
| RNN + GRU           | 0.9116   | 0.9232    | 0.9799 | 0.8987 |
| Bidirectional RNN   | 0.9179   | 0.9356    | 0.9726 | 0.9196 |
| LSTM                | 0.9140   | 0.9418    | 0.9606 | 0.8959 |


Figure 2 shows a graphical representation of the sentiment analysis of 10-K filings for Apple between 2010 and 2020, using the classical machine learning classifiers. Predictions of Apple stock trends over the Jan 201–Dec 2020 timeframe were made using sentiment analysis of tweets with an RF classifier, and the stock direction was predicted with almost 58.7% accuracy. Figure 3 shows the correlation between the actual stock trends and the predictions made by our framework.

Fig. 2 Sentiment analysis of the 10-K report

Fig. 3 Correlation of prediction using sentiment analysis of social media mentionings and actual stock trends


5 Conclusion

In this work, we studied and compared different machine learning approaches and algorithms for online sentiment analysis. In the proposed system, we analyzed different natural language processing techniques and classical machine learning models. Based on the current analysis, we observe that combining sentiment analysis with classical machine learning algorithms leads to better accuracy. Future work includes multilingual analysis, i.e., performing sentiment analysis on multilingual datasets and providing access to users in multiple languages, as well as optimizing results, computation time, and resources. We hope to build models with higher accuracy and precision to present a better comparative analysis.

References

1. Sawant, A.G., Dhawane, A., Ghate, G., Lohana, P., Kishan, U.: Integrating stock twits and news feed with stock data for better stock market prediction. Int. J. Res. Advent Technol. (2019)
2. Loughran, T., McDonald, B.: When is a liability not a liability? Textual analysis, dictionaries and 10-Ks. J. Finance (2011)
3. Sagala, T.W., Saputri, M.S., Mahendra, R., Budi, I.: Stock price movement prediction using technical analysis and sentiment analysis. In: Proceedings of the 2020 2nd Asia Pacific Information Technology Conference (2020)
4. Nti, I.K., Adekoya, A.F., Weyori, B.A.: Predicting stock market price movement using sentiment analysis: evidence from Ghana. Appl. Comput. Syst. (2020)
5. Bharathi, S.V.S., Geetha, A.: Sentiment analysis for effective stock market prediction. Int. J. Intell. Eng. Syst. (2017)
6. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, October 2014
7. Sen, J.: Stock price prediction using machine learning and deep learning frameworks. In: Proceedings of the 6th International Conference on Business Analytics and Intelligence, Bangalore, India, December 2018
8. Liu, X.-Y., Yang, H., Chen, Q., Zhang, R., Yang, L., Xiao, B., Wang, C.: A deep reinforcement learning library for automated stock trading in quantitative finance. SSRN Electron. J. (2020)
9. https://docs.tweepy.org/en/latest/getting_started.html
10. https://www.kaggle.com/miguelaenlle/massive-stock-news-analysis-db-for-nlpbacktests
11. https://www.sec.gov/edgar/browse/?CIK=320193&owner=exclude
12. https://finance.yahoo.com/quote/AAPL/history/

A Survey on Learning-Based Gait Recognition for Human Authentication in Smart Cities Arindam Singh and Rajendra Kumar Dwivedi

Abstract Smart cities include good infrastructure, better transportation, connectivity, security, etc. Human identification is part of providing security to societies, and artificial intelligence plays an important role in this objective. In this paper, we present a state-of-the-art survey of different machine learning techniques used to identify human beings by their movement. We found that various machine learning techniques, viz., support vector machine (SVM), k-nearest neighbor (k-NN), convolutional neural network (CNN), fuzzy set theory, and the discrete Fourier transform, are used for human authentication. We present a comparative study of such schemes and provide our major findings from the survey. We observe that, under certain conditions, neural network-based techniques perform better than the other existing schemes of gait-based human authentication.

Keywords Human authentication · Gait analysis · Gait biometrics · Gait recognition system · Machine learning · Deep learning · Supervised learning · Unsupervised learning

1 Introduction

In today's world of rising security needs in the face of increasing crime rates, scientific research on human identification has been widely deployed for security and defense challenges. Humans can be identified by their gait and gestures, and various machine learning and deep learning techniques can be used to recognize a person's movements. These techniques are very useful in video surveillance. The main purpose of identifying a human being is security, and artificial intelligence plays a vital role in designing safe cities. Urban migration is a global phenomenon: the UN predicts that by 2050, 68% of the world's population will live in cities, a significant increase from the 55% living in cities today.

A. Singh (B) · R. K. Dwivedi
Department of IT and CA, MMMUT, Gorakhpur, UP, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_39


This migration puts pressure on city planners, administrators, and policy makers to govern their communities and create sustainable economic stability while providing their residents with a good quality of life. Smart cities make this possible because the rapid development of software systems, connected devices, and information and communication technologies helps them meet this challenge. Biometric technology based on human gait identifies people from farther away, even if their faces are hidden, covered, not visible, or unclear to cameras in low-light areas. Previous studies based on human gait have examined identification of a person in monitoring systems in both bright and dark places. Studies conducted in low-light (dark) areas are based on left and right view headings (horizontal movement); however, there are occasions where people only face toward or away from the camera when moving through tunnels in low-light conditions. In these situations, it is very difficult to recognize people using common features such as rotation, cadence, step length, and the distance between joints (ankle, knee, and hip). Large-scale biometric studies have been performed on faces, fingerprints, and the iris [1–6] in two-dimensional videos or photos. To date, identification systems that use one or more of these biometrics [7, 8] appear to dominate. However, the identified individual is often required to physically touch the sensor, or to cooperate with the sensor when the data is captured. The accuracy of these biometrics can also be affected by photometric variations and occlusions (e.g., glasses, hair, hats) in real photographs, which can dramatically degrade recognition performance. Movement-based recognition is therefore preferable, and it is the only scanning-based technique that can be performed from larger distances compared with the other scanning-based techniques. Most movement-based recognition techniques consist of several steps, including silhouette extraction, similarity comparison, feature learning, and feature selection. Deep learning techniques such as CNNs, RNNs, etc., can be used for recognizing human beings. Gait is an intelligent and robust scanning-based technique that finds many applications even in uncooperative environments. The great benefit of this type of human identification is that it does not require the awareness of the identified human. Unfortunately, it is not as accurate as iris- and fingerprint-based systems. Machine learning techniques can be categorized as supervised, unsupervised, and reinforcement learning. The taxonomy of machine learning techniques used for gait recognition is shown in Fig. 1.

2 Background

Aristotle (384–322 BCE), from Athens, Greece, was the first to make a written reference to the analysis of walking, giving various theories on human and animal movement [9]. From Aristotle to the beginning of the new age of computerized research techniques, this phase has been recorded, and many researchers have since contributed to this field.

Fig. 1 A taxonomy of machine learning algorithms for gait recognition

Giovanni Alfonso Borelli (1608–1679), of Pisa and Rome, Italy, was the first to experiment on gait analysis, contributing to the study of tendon and muscle biomechanics [9]. A group of great French physiologists (Albrecht von Haller (1708–1777), Paul Barthez (1734–1806), François Magendie (1783–1855), Samuel Poisson (1781–1840), and Pierre Nicolas Gerdy (1797–1856)) made sporadic observations on walking during the late eighteenth and early nineteenth centuries [10]. Wilhelm Eduard Weber (1804–1891) was a physicist at Göttingen (and later Leipzig) who is commemorated in the SI unit of magnetic flux named after him. Two of his brothers, Ernst Heinrich Weber (1795–1878) and Eduard Friedrich Wilhelm Weber (1806–1871), were both professors of physiology at the University of Leipzig. Eduard and Wilhelm collaborated on a series of works, including Mechanik der Gehwerkzeuge (Mechanics of the Human Walking Apparatus), published in 1836 [10]. Major work on human movement was subsequently done by Jules Etienne Marey (1830–1904) in Paris. Gaston Carlet (1849–1892), Marey's student, recorded the pressures exerted by the foot on the floor with a shoe that had three pressure transducers built into the sole; he was the first to register the double bump of the ground reaction. Otto Fischer (1861–1917) was a German mathematician who conducted the first 3-D gait analysis. His name remains linked with Wilhelm Braune (1831–1892), a Professor of Anatomy with whom he had previously collaborated in calculating the inertial parameters of the human body. The use of electrogoniometers to measure motion at individual joints was perhaps the last breakthrough of gait research in the pre-computer period. The engineer Larry Lamoreux attached goniometers to a metal exoskeletal frame to measure three-dimensional hip joint kinematics and one-dimensional knee and ankle


angles [9]. By the end of the 1970s, however, instrumented gait study had not yet moved beyond being an experimental technique applied to a small number of subjects. Most of the equipment was bulky and time-consuming to use. Fortunately, this was the dawn of the computing revolution, which allowed for faster data processing and ever more opportunities [9].

3 Literature Review

Human mobility was first studied in the medical field [11, 12]. Doctors analyze how a person walks to determine whether the patient has a health problem. Researchers found that, just like the iris and fingerprints, nearly everyone has their own style of walking, so gait can also be used as a natural trait to recognize a person. In this section, we study several approaches used for identifying human beings by their movements. Several studies use different approaches and methods for identification, some of which are as follows. Songa [13] proposed a method called GaitNet, based on two convolutional neural networks, one for gait segmentation and another for classification; the model performs human silhouette extraction and gait recognition from a single frame and is trained in an end-to-end manner. Maryam Babaee [14] used a fully convolutional neural network (deep learning) for identifying a human being from a partial gait cycle; the method was tested on OULP and CASIA-B and gives a full Gait Energy Image (GEI) accuracy of 96.1%. Chantapakul [15] describes a method for identifying human beings using three Kinects and string grammar fuzzy-possibilistic C-medians; the Kinects record side as well as front views, and the person walks along a fixed 3.35 m path. Batchuluun [16] proposed a technique for identifying human beings by capturing front and back images of the person using thermal cameras; this method uses body movement for identification and a convolutional neural network for feature extraction and classification. Sharma [17] defines a new gait energy image-based feature called the sigmoid feature (GII-BPSF); this method uses fuzzy set theory and the gait information image-sigmoid feature. Deng [18] describes a new method, the gait dynamics graph (GDG), for human identification. He found that this approach produces new gait data representations for identification from a series of gait sequences, substantially reducing the data size while maintaining the unique characteristics of human walking; the graph is plotted by extracting dynamics information into a 3-D graphic. Wang [19] proposed a method based on an acoustic sensor system and a deep neural network for human identification; the acoustic sensor helps extract the gait signatures of individuals. Gowtham Bhargavas [20] developed a project for identifying a person using a Kinect sensor and the SVM algorithm; the Kinect sensor extracts a skeleton image, which is a color image with skeleton points. Bajwa [21] uses a support vector machine algorithm


with k-nearest neighbors (k-NN) and neural networks (NN) for the identification of human beings. Dubois [22] developed a method that uses depth images and gait sequences for identification; Hidden Markov Models (HMMs) are used to detect gait sequences, recognizing the activity performed by persons, and identification is done using individual height and gait patterns. Chen [23] describes a silhouette correlation analysis-based method for identification: background modeling and image subtraction are applied to the image sequence, the binary silhouette images are used for image correlation, the correlation result is used for feature extraction via the discrete Fourier transform, and PCA is used to reduce the dimension of the gait feature vector. Munsell [24] prescribed a new method that uses motion and anthropometric biometrics for identification; the Kinect sensors produce depth images used for personal identification, and an action classifier recognizes running and walking actions from Kinect videos. Sudha [25] explains a novel hybrid holistic approach for identifying unauthorized persons in a surveillance area; the method has a training phase, in which binary silhouettes are generated by background modeling and foreground extraction followed by feature extraction, and a testing phase, in which the binary silhouette of the image is matched against existing image data stored in the database. Świtoński [26] proposed a method for identification using the gait path; he extracts features from the gait path with statistical, histogram, and Fourier transform methods, creates a timeline to obtain reduced location and height via a motion filter on the gait path, and uses supervised machine learning for classification. Lam [27] describes a gait flow image technique for human recognition, where the optical flow field is used for generating gait flow images (GFI); background subtraction first extracts binary silhouettes from the image, then gait period estimation is performed for the GFI generation process, and two approaches, direct matching and dimensionality reduction, are used for recognition. Gkalelis [28] explains a multimodal human identification method using fuzzy vector quantization (FVQ) and linear discriminant analysis (LDA)-based algorithms.

Major findings of the survey: Table 1 presents a comparison of the accuracy of various identification techniques. According to the review, the various machine learning algorithms have different accuracies for identifying human beings. As this survey shows, the methods that use neural networks for identification have higher accuracy than the others.

Table 1 Comparison of accuracy of the various identification techniques

| S No | Authors | Technique used/working | Outcome |
|------|---------|------------------------|---------|
| 1 | Songa [13], Elsevier, Pattern Recognition 96 (2019) | Convolutional neural networks: gait segmentation + classification | Achieves 78.5% mean accuracy |
| 2 | Maryam [14], Neurocomputing (2019) | Convolutional neural networks: deep learning | Accuracy of full Gait Energy Image (GEI) is 96.1% |
| 3 | Chantapakul [15], IEEE conference, ICCSCE (2018) | k-nearest neighbor tested on 27-person datasets | 73.33% correct classification |
| 4 | Batchuluun [16], Expert Systems With Applications (2018) | Convolutional neural network | CCR 97.4% and EER 0.51% |
| 5 | Sharma [17], Springer (2018) | Fuzzy set theory, gait information image-sigmoid feature (GII-BPSF) | Average identification rate for the general case is 93.21% |
| 6 | Deng [18], Pattern Recognition, Elsevier (2018) | GDG recognition via direct matching or nonlinear dynamics analysis, compared on the TUM GAID, OU-ISIR-A, CASIA-B, CASIA-C and CMU MoBo databases | Average accuracy around 95%, 99%, (A: 90%, B: 91%), (A: 89%, B: 89%), (A: 98%, B: 99%), respectively |
| 7 | Wang [19] (2017) | Acoustic sensor system and deep neural network applied to 50 participants, 75 recordings each | Identification accuracy of 97% |
| 8 | Gowtham Bhargavas [20] (2017) | SVM algorithm applied to 20 individuals over 10 video frames | 93% recognition rate |
| 9 | Bajwa [21], Fourth (PDGC) (2016) | SVM with k-NN and NN | 98.7% accuracy |
| 10 | Dubois [22], (EMBC) (2015) | Hidden Markov Models (HMM) applied to 12 subjects | Classifies with 75% accuracy |
| 11 | Chen [23], January (2014) | Discrete Fourier transform tested on 124 persons with 3 walking sequences | Classification accuracy around 90%, Equal Error Rate approx. 10% |
| 12 | Munsell [24] (2012) | Two-stage classification system | Equal Error Rate of 13%, cumulative match characteristic rank-1 identification rate of 90 |
| 13 | Sudha [25] (2011) | SVM with radial basis function | Accuracy of 97.91% |
| 14 | Świtoński [26], (ACIVS) pp. 531–542 (2011) | Supervised machine learning techniques on 25 actors with 353 different motions | 96.9% identification accuracy |
| 15 | Lam [27], Elsevier, Pattern Recognition 44 (2011) | Optical flow field used to generate the binary silhouette of the gait flow image (GFI) | Average identification rate is 42.83% with direct matching and 43.08% with dimensionality reduction |
| 16 | Gkalelis [28], (ICIP) (2009) | Fuzzy vector quantization and linear discriminant analysis | Above 90% correct classification based on run, jump and skip movements |

4 Conclusion and Future Scope

This paper gives concise details of various machine learning algorithms, such as SVM, k-NN, CNN, NN, fuzzy set theory, the discrete Fourier transform, etc., which are used to detect humans by their movements. Moreover, the methods that use

deep learning, especially neural network techniques, achieve more than 95% accuracy. However, the accuracy of such systems can still be enhanced. In the future, we plan to devise a reliable gait-based system to authenticate a person in smart cities, deploy it in different regions of a smart city, and work to enhance the accuracy of the learning-based gait recognition system.

References

1. Turk, M.A., Pentland, A.P.: Face recognition using eigenfaces. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–591 (1991)
2. Jain, A., Hong, L., Bolle, R.: Online fingerprint verification. IEEE Trans. Pattern Anal. Mach. Intell. 19, 302–314 (1997)
3. Ma, L., Tan, T., Wang, Y., Zhang, D.: Personal identification based on iris texture analysis. IEEE Trans. Pattern Anal. Mach. Intell. 25, 1519–1533 (2003)
4. Ross, A., Dass, S., Jain, A.: Fingerprint warping using ridge curve correspondences. IEEE Trans. Pattern Anal. Mach. Intell. 28, 19–30 (2006)
5. Lu, X., Jain, A.: Deformation modeling for robust 3D face matching. IEEE Trans. Pattern Anal. Mach. Intell. 30, 1346–1357 (2008)
6. Pillai, J., Patel, V., Chellappa, R., Ratha, N.: Secure and robust iris recognition using random projections and sparse representations. IEEE Trans. Pattern Anal. Mach. Intell. 33, 1877–1893 (2011)
7. Hong, L., Jain, A.: Integrating faces and fingerprints for personal identification. IEEE Trans. Pattern Anal. Mach. Intell. 20, 1295–1307 (1998)
8. Chang, K., Bowyer, K., Sarkar, S., Victor, B.: Comparison and combination of ear and face images in appearance-based biometrics. IEEE Trans. Pattern Anal. Mach. Intell. 25, 1160–1165 (2003)
9. Baker, R.: The history of gait analysis before the advent of modern computers. Gait Posture 26(3), 331–342 (2007)
10. Weber, W., Weber, E.: Mechanics of the Human Walking Apparatus. Translated by Maquet, P., Furlong, R. Springer-Verlag, Berlin (1991)
11. Jain, A.K., Bolle, R., Pankanti, S.: Biometrics: Personal Identification in Networked Society. Kluwer Academic Publishers (1999)
12. Blanke, D.J., Hageman, P.A.: Comparison of gait of young men and elderly men. Phys. Ther. 69(2), 144–148 (1989)
13. Songa, C., Huanga, Y., Huanga, Y., Jia, N.B., Wanga, L.: GaitNet: an end-to-end network for gait based human identification. Pattern Recogn. 96 (2019)
14. Babaee, M., Li, L., Rigoll, G.: Person identification from partial gait cycle using fully convolutional neural network. Neurocomputing (2019)
15. Chantapakul, W., Auephanwiriyakul, S., Theera-Umpon, N., Khunlertgit, N.: Person identification from full-body movement using string grammar fuzzy-possibilistic C-medians. In: IEEE Conference, ICCSCE (2018)
16. Batchuluun, G., Naqvi, R.A., Kim, W., Park, K.R.: Body-movement-based human identification using convolutional neural network. Expert Syst. Appl. (2018)
17. Sharma, H., Grover, J.: Human identification based on gait recognition for multiple view angles. Springer Int. J. Intell. Robot. Appl. (2018)
18. Denga, M., Wang, C., Zheng, T.: Individual identification using a gait dynamics graph. Pattern Recogn. 83, 287–298 (2018)
19. Wang, Y., Chen, Y., Bhuiyan, M.Z.A., Han, Y., Zhao, S., Li, J.: Gait-based human identification using acoustic sensor and deep neural network. Future Gener. Comput. Syst. (2017)
20. Gowtham Bhargavas, M., Harshavardhan, K., Mohan, G.C., Nikhil Sharma, A., Prathap, C.: Human identification using gait recognition. In: IEEE International Conference on Communication and Signal Processing, pp. 1510–1513 (2017)
21. Bajwa, T.K., Garg, S., Saurabh, K.: GAIT analysis for identification by using SVM with K-NN and NN techniques. In: IEEE Fourth International Conference on Parallel, Distributed and Grid Computing (PDGC) (2016)
22. Dubois, A., Bresciani, J.-P.: Person identification from gait analysis with a depth camera at home. In: 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (2015)
23. Chen, J.: Gait correlation analysis based human identification. Sci. World J. 168275, 8 (2014)
24. Munsell, B.C., Temlyakov, A., Qu, C., Wang, S.: Person identification using full-body motion and anthropometric biometrics from Kinect videos. In: European Conference on Computer Vision (2012)
25. Sudha, L.R., Bhavani, R.: Biometric authorization system using gait biometry. Int. J. Comput. Sci. Eng. Appl. (2011)
26. Świtoński, A., Polański, A., Wojciechowski, K.: Human identification based on gait paths. In: International Conference on Advanced Concepts for Intelligent Vision Systems (ACIVS), pp. 531–542 (2011)
27. Lam, T.H.W., Cheung, K.H., Liu, J.N.K.: Gait flow image: a silhouette-based gait representation for human identification. Pattern Recogn. 44 (2011)
28. Gkalelis, N., Tefas, A., Pitas, I.: Human identification from human movements. In: 16th IEEE International Conference on Image Processing (ICIP) (2009)

Micro-arterial Flow Simulation for Fluid Dynamics: A Review Rithusravya Jakka and Sathwik Rao Alladi

Abstract The planet is evolving technologically in all respects, but mortality due to various diseases continues to be a concern. In particular, the cardiovascular disease mortality rate is high and affects people of all ages. Innovation and technology progress exponentially in every industry, yet the use of technology in the medical field is still limited. Many developments are being made in this area so that treatment can be done easily and efficiently. One such cost-effective technology that maximizes efficiency is computational fluid dynamics (CFD), a mechanical engineering approach that uses simulation to analyze fluid flow, heat transfer, and other phenomena. CFD is increasingly being utilized in biomedical research on coronary artery disease due to its high performance in hardware and software. This paper focuses on "Micro-Arterial Flow Simulation for Fluid Dynamics," specifically designed to predict the flow velocity and pressure in the coronary arteries via anastomosis, which helps to reduce the complexity of the issue.

Keywords Computational fluid dynamics · Anastomosis · Coronary micro-vascular disease · Hemodynamics

1 Introduction

Over the years, technology has changed our lifestyle dramatically, providing incredible tools and resources that bring us valuable information and make our lives comfortable. Technology takes both positive and negative steps forward, with a great impact on people's lifestyles. Tuberculosis was once the greatest killer in many countries around the world, but the predominant diseases are now non-communicable. Non-communicable diseases (NCDs) kill 41 million people a year, which is 71% of the world's deaths.

R. Jakka · S. R. Alladi (B)
Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_40


Most of these NCD deaths are due to cardiovascular diseases, which kill 17.9 million people each year, followed by cancers (9.0 million), respiratory diseases (3.9 million), and diabetes (1.6 million) [1]. Research conducted by the Washington-based Global Burden of Disease (GBD) study, in partnership with world health experts and the Registrar General of India (RGI), showed that most people have cardiovascular issues.

2 Background of Small Vessel Disease

Coronary micro-vascular disease, or small vessel disease, is a heart-related disease that affects many people worldwide. In this disease, the small branches of the main coronary arteries are damaged and do not dilate adequately. When they are impaired, blood flow to the heart is reduced, so the small arteries must expand to supply the heart with oxygen-rich blood. Even when angioplasty and stents are used to treat coronary artery disease, signs and symptoms may not go away, because people can still have small vessel disease. The symptoms of small vessel disease include angina that can intensify during everyday work and stress, chest pain, shortness of breath, fatigue, and lack of energy. The coronary artery structure is shown in Fig. 1. One of the most recent strategies for restoring damaged vessels and blood supply in the extremities is micro-anastomosis [3]. This operation establishes a connection between adjoining or distant sections of an artery or vein. In several areas, such as plastic, cardiac, reconstructive, orthopedic, head and neck, and organ transplantation surgery [4, 5], there is a high demand for surgical procedures that create vascular anastomoses that are easy, cost-effective, and less harmful, yet reliable [4, 5].

Fig. 1 General layout of coronary artery [2]


In general, there are three types of anastomosis, depending on how the two conduits are sutured: in end-to-end form, the two ducts are sutured along a transverse diameter [6]; in end-to-side form, the vessel is laterally sutured to the parent vessel; and in side-to-side form, the two vessels are linked in the longitudinal direction. Anastomosis alters the arteries and restores the blood supply [7]. Simulation is an alternative method to understand and analyze the biological and geometric parameters, which can support further study of arterial hemodynamics [8].

3 Evolution of Hemodynamics

Blood is a non-Newtonian fluid, better studied with rheology than with hydrodynamics. Hydrodynamics and fluid mechanics based on classic viscometers cannot fully describe hemodynamics [9]: blood is thixotropic, and blood vessels are not rigid tubes. The literature has shown that the Newtonian approximation is suitable for fairly smooth, uniform vessels of roughly 0.5 mm diameter and above, but non-Newtonian effects are also important in larger vessels with irregular geometries (stenoses, aneurysms, anastomoses, etc.). The first CFD experiments in anatomically realistic arterial geometries were based on clinical x-ray angiograms [10, 11]. MRI is used for imaging vessels in particular, as the blood itself can serve as a contrast agent. Unlike MRI or x-ray images, ultrasound images are acquired manually without a connection to a fixed coordinate scheme, so it is difficult to reconstruct a sequence of them in 3D. Doppler ultrasound is usually used to deliver real-time velocity measurements along the nominal centerline of the vessel, from which the average velocity and flow rate can be calculated given the vessel's radius, provided a fully developed velocity profile has formed [12]. These processes help construct the 3D CFD model. Microsurgical tissue auto-transplantation is the latest technology, based on modern techniques, for the reconstruction of defects after trauma and cancer surgery; in such cases, there is a diameter mismatch when suturing side-to-side anastomoses. There are various ways to anastomose blood vessels, and one of these methods was considered here. Previous studies by Rickard et al. [13, 14] developed a rodent model for anastomosis analysis (Fig. 2); resin casts of the anastomosed artery from these studies were scanned with micro-CT to generate the digital CFD input geometry. The figures below show the sequence of the rodent anastomosis model (from Rickard et al. [13]) (Figs. 3 and 4). Key: FA: femoral artery; SCEA: superficial caudal epigastric artery; SA: saphenous artery; PA: popliteal artery.


Fig. 2 Anatomy of the distal femoral artery

Fig. 3 View before anastomosing a to b

Fig. 4 A completed anastomosis (small arrow: tie around FA; large arrow: sutured anastomosis SCEA to FA)

4 Problem Statement and Objectives

The objective is to study velocity and pressure fluctuations for various coronary artery outlet diameters. Earlier treatments, such as angioplasty, are used to address vascular heart disease, but the signs and symptoms do not always go away. A newer technique, called anastomosis, was developed; micro-anastomosis is a newer surgical treatment that saves time when joining vessels end-to-end. At the same time, CFD analysis is an alternative to experiments that are costly, time-consuming, complex, risky, or impossible, as well as to


theoretical approaches that handle only simpler situations. Our research mainly focuses on determining the flow velocity at the adjacent artery walls and analyzing it using ANSYS Fluent CFD software.

5 Methodology

This work is mainly concerned with the nature of the velocity flow and the pressure variation for different diameters. The following steps are used to obtain the differences in velocity and pressure:

• 3D model reconstruction
• Initial and boundary conditions
• Analyzing the model in CFD

These steps are described below.

5.1 3D Model Reconstruction

The 3D model was reconstructed from the general coronary artery layout in SOLIDWORKS software. Three cases were considered: both outlet valves with the same diameter, one outlet valve with a larger diameter and the other with a smaller one, and vice versa. All the artery cross-sections considered are assumed circular. The angle between the two outlet valves is around 83°. Figure 5 shows the general layout of the inlet and outlet valves of a coronary artery. The reconstructed 3D models are presented in Table 1.

5.2 Initial and Boundary Conditions

The flow was assumed laminar, and blood was modeled as a non-Newtonian fluid. The blood density is 1060 kg/m³ and the blood viscosity is taken as 0.003 Pa·s. The blood enters the inlet valve at a speed of 0.3 m/s. The specific heat was set to 3513 J/(kg·K) and the thermal conductivity to 0.44 W/(m·K).
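As a quick sanity check (not from the paper), the Reynolds number computed from the stated properties and the vessel diameters of Table 1 stays far below the roughly 2300 transition threshold, which is consistent with the laminar-flow assumption; the check uses the stated viscosity as a reference value even though blood is modeled as non-Newtonian.

```python
# Reynolds-number check for the laminar assumption:
# Re = rho * v * D / mu, with the paper's stated properties.
rho = 1060    # blood density, kg/m^3
mu = 0.003    # reference blood viscosity, Pa.s
v = 0.3       # inlet velocity, m/s

for d_mm in (1.2, 1.3, 1.7, 1.8):              # outlet diameters from Table 1
    Re = rho * v * (d_mm / 1000) / mu
    print(f"D = {d_mm} mm -> Re = {Re:.0f}")   # all well under ~2300 => laminar
```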

5.3 Analyzing the Model in CFD

Based on the reconstructed 3D models, the simulation was performed in ANSYS Fluent. The model is meshed with tetrahedral elements before the analysis; meshing increases the simulation precision and speed.


Fig. 5 General representation of coronary artery

The meshed model is initialized, and results were obtained after approximately 150 iterations. The variations in velocity are visualized with streamlines, where different colors indicate the velocity magnitude. The results obtained are discussed below.

6 Results and Discussions

In this study, we compared three categories of coronary arteries based on the outlet diameters. In the first type, both outlets have the same diameter, which is a very rare possibility; the results of this analysis are given below. The diagrams present the results of the analysis (Table 2): velocity streamlines derived from the three mesh models, reflecting the velocity at every point in different colors. The pressure and velocity plots are given in Fig. 6. This XY plot shows the static pressure within the coronary artery against the position from the inlet. In this condition, the pressure is highest at the vessel inlet, but it drops dramatically with distance; at the junction the pressure is lowest, and it steadily increases again after a certain distance (Fig. 7). The next plot shows velocity versus position from the inlet. The velocity of the blood inside the artery increases rapidly: the initial velocity was 0.3 m/s, and the maximum was 0.78 m/s. At the branching, where the blood flows into the different outlets, the velocity decreases continuously, becoming zero at the ends of the outlets (Fig. 8).


Table 1 Reconstructed 3D models (the model images are not reproduced here)

| Condition | Description |
|-----------|-------------|
| Condition 1 | Diameters of both outlets are constant at 1.2 mm |
| Condition 2 | The diameter of outlet 1 (1.8 mm) is greater than the diameter of outlet 2 (1.3 mm) |
| Condition 3 | The diameter of outlet 2 (1.70 mm) is greater than the diameter of outlet 1 (1.24 mm) |

In the second type, the diameter of outlet 1 is taken to be larger than that of outlet 2, keeping all the other parameters constant. In this case, the blood flow pressure is lower than in the previous case. As in the previous situation, the pressure decreases rapidly with increasing position, reaches its lowest value at the branching of the coronary artery, and increases again toward the outlets. Similarly, the velocity of the blood sent through the inlet increases steadily and reaches a limit of approximately 0.78 m/s. As in the previous condition, the velocity decreases near the branching and is nil at the ends of the outlets (Fig. 9). In the third case, if the diameter of outlet 2 is greater than that of outlet 1, the results are very similar to the cases above. In the pressure versus position plot, the pressure decreases at a much higher rate than in the previous case. In this situation, the blood pressure

is very high even after the branching, and it suddenly increases as the blood flows toward the outlets (Fig. 10). In the next plot, there is a linear increase in velocity up to a distance of 0.013 m; beyond this, the velocity of the blood starts decreasing at the branching due to obstructions. The velocity is greater than in the previous case, and it again drops to zero at the ends of the outlets (Fig. 11). These are the results obtained from the analysis of the three cases.

7 Conclusion The three coronary arteries of various outlet diameters have been modeled and rebuilt in this analysis. Steady CFD flow analysis was performed on all three models. The flow patterns were evaluated by contrasting three conditions. From the study, velocity streamlines were collected. These streamlines demonstrate that the velocity of blood varies according to outlet diameters. • In the first condition, the velocity was consistent at the inner part of the vessel but the velocity at the edges decreased due to interruptions. The pressure curve also reveals that the inlet of the vessel has peak blood pressure relative to the other two situations. • In the second condition, due to the greater diameter and lower pressure, the flow of velocity was more at outlet 1 relative to outlet 2. The pressure and velocity curves indicate optimum values, but the maximum velocity was at the intersection.

Design parameters

Diameters of the outlets are constant which is 1.2 mm

The diameter of outlet 1 (1.8 mm) is greater than the diameter of outlet 2 (1.3 mm)

S. No

1

2

Table 2 Analysis results Analysis results

(continued)

Micro-arterial Flow Simulation for Fluid Dynamics: A Review 447

Design parameters

The diameter of outlet 2 (1.70) is greater than the diameter of outlet 1 (1.24)

S. No

3

Table 2 (continued) Analysis results

448 R. Jakka and S. R. Alladi

Micro-arterial Flow Simulation for Fluid Dynamics: A Review

Fig. 6 Static pressure versus position plot of the first condition

Fig. 7 The velocity versus position plot of the first condition

Fig. 8 Static pressure versus the position plot of 2nd condition


Fig. 9 The velocity versus position plot of 2nd condition

Fig. 10 Static pressure versus the position plot of the third condition

Fig. 11 The velocity versus position plot of the third condition


7 Conclusion

The three coronary arteries with various outlet diameters were modeled and reconstructed in this analysis. A steady CFD flow analysis was performed on all three models, and the flow patterns were evaluated by contrasting the three conditions. Velocity streamlines were collected from the study; these demonstrate that the velocity of the blood varies with the outlet diameters.

• In the first condition, the velocity was consistent in the inner part of the vessel, but the velocity at the edges decreased due to interruptions. The pressure curve also reveals that the inlet of the vessel has the peak blood pressure relative to the other two situations.
• In the second condition, due to the greater diameter and lower pressure, the velocity flow was higher at outlet 1 than at outlet 2. The pressure and velocity curves indicate optimum values, but the maximum velocity was at the intersection.
• In the third condition, the velocity at outlet 2 was higher than at outlet 1.

In all three conditions, the velocity at the branching was at its maximum while the pressure was at its minimum. The velocity and pressure therefore differ as the outlet diameters vary, and even for the same outlet diameter. Anastomosis is the latest breakthrough in the treatment of coronary artery disease; the process saves time and is cost-effective. Many CFD-based techniques are being developed, making the method more reliable and easier. This research is helpful for understanding the condition of the artery depending on the outlet diameters.

References

1. Non-communicable diseases, World Health Organization, citing GBD 2015 Risk Factors Collaborators. 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet 388(10053), 1659–1724 (2016)
2. Small Vessel Disease, 1998–2020 Mayo Foundation for Medical Education and Research (MFMER), https://www.mayoclinic.org/diseases-conditions/small-vessel-disease/symptomscauses/syc-20352117
3. Historical overview of vascular anastomoses. In: Sutureless Anastomoses, Steinkopff, pp. 1–11 (2007)
4. Li, H., Xie, B., Gu, C., Gao, M., Zhang, F., Wang, J., Dai, L., Yu, Y.: Distal end side-to-side anastomoses of sequential vein graft to small target coronary arteries improve intraoperative graft flow. BMC Cardiovasc. Disord. 14, 65 (2014)
5. Robert, E., Facca, S., Atik, T., Bodin, F., Bruant-Rodier, C., Liverneaux, P.: Vascular micro anastomosis through an endoscopic approach: feasibility study on two cadaver forearms. Chir. Main 32(3), 136–140 (2013)
6. Haimovici, H.: Vascular sutures and anastomoses. In: Haimovici's Vascular Surgery, pp. 241–252. Wiley-Blackwell (2012)
7. Varshney, G., Katiyar, V.K.: Mathematical modeling of blood flow in an arterial bypass anastomosis. J. Biomech. 39, S405 (2006)
8. Hull, J.E., Balakin, B.V., Kellerman, B.M., Wrolstad, D.K.: Computational fluid dynamic evaluation of the side-to-side anastomosis for arteriovenous fistula. J. Vasc. Surg. 58(1), 187–193.e1 (2013)
9. Fieldman, J.S., Phong, D.H., Saint-Aubin, Y., Vinet, L.: Rheology. In: Biology and Mechanics of Blood Flows, Part II: Mechanics and Medical Aspects, pp. 119–123. Springer. ISBN 978-0-387-74848-1 (2007)
10. Gibson, C.M., Diaz, L., Kandarpa, K., Sacks, F.M., Pasternak, R.C., Sandor, T., Feldman, C., Stone, P.H.: Relation of vessel wall shear stress to atherosclerosis progression in human coronary arteries. Arterioscler. Thromb. 13, 310–315 (1993)
11. Tasciyan, T.A., Banerjee, R., Cho, Y.I., Kim, R.: Two dimensional pulsatile hemodynamic analysis in the magnetic resonance angiography interpretation of a stenosed carotid arterial bifurcation. Med. Phys. 20, 1059–1070 (1993)
12. Holdsworth, D.W., Norley, C.J., Frayne, R., Steinman, D.A., Rutt, B.K.: Characterization of common carotid artery blood-flow waveforms in normal human subjects. Physiol. Meas. 20, 219–240 (1999)
13. Rickard, R.F., Wilson, J., Hudson, D.A.: Characterization of a rodent model for the study of arterial micro anastomoses with size discrepancy (small-to-large). Lab. Anim. 43(4), 350–356 (2009)
14. Rickard, R.F., Meyer, C., Hudson, D.A.: Computational modeling of micro arterial anastomoses with size discrepancy (small-to-large). J. Surg. Res. 153(1), 1–11 (2009)

Hip-Hop Culture Incites Criminal Behavior: A Deep Learning Study Niharika Abhange, Rahul Jadhav, Siddhant Deshpande, Swarad Gat, Varsha Naik, and Saishashank Konduri

Abstract For centuries, music has been an inseparable part of many human cultures. The rise of hip-hop culture over the last 50 years has turned into a powerful movement, empowering people from various communities and making their voices heard. However, certain parts of hip-hop and rap music have come to be associated with misogyny, substance abuse and violent behavior. This study aims to find a correlation between lyrics of hip-hop and rap songs that glorify such illicit behavior and the actual rate of criminal activity among individuals who are directly or indirectly influenced by hip-hop culture. The research employs NLP concepts to build a model that detects song lyrics falling into any of three categories: "Misogyny," "Substance Abuse" and "Violence." A comparative study is conducted by training multiple models, including multinomial naïve Bayes, random forest and LSTM, on a manually collected and labeled dataset consisting of rap song lyrics released between 1970 and 2020. The highest performing model (LSTM, 87% accuracy) was subsequently used to detect objectionable lyrics in popular rap songs of the decade 2010–2019. To obtain a correlation of these with the criminal activity of the target population, official criminal activity data (2010–2019) of citizens aged 0–29 from the largest hip-hop-influenced areas in the world is compiled. This dataset is analyzed to obtain evidence of a correlation between objectionable content promoted through rap song lyrics and the criminal tendencies of the youth primarily affected by it.

Keywords Hip-Hop · LSTM · NLP · Criminal · Juvenile crime · Rap · Misogyny · Substance abuse · Violence

N. Abhange (B) · R. Jadhav · S. Deshpande · S. Gat · V. Naik · S. Konduri Dr. Vishwanath Karad’s, MIT World Peace University, Pune, India V. Naik e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_41


1 Introduction

In the 1970s, the streets of the Bronx in New York saw economic downfall, rising crime, gang violence, poverty and racial disparity. It was on these very streets, emerging from the powerlessness of a marginalized society, that the culture of hip-hop was born: a movement of hope in an era of despair [1]. It was during these times that "rap music," a form of music that incorporates masterful rhythmic verses and combinations of beats to deliver a message or a story, was created. For the suppressed African American and Latino communities of that time, rap music told stories of pain, abandonment, poverty, hardship and vulnerability. Over the next few decades, hip-hop culture grew rapidly, overcoming boundaries of color, class and ethnicity. People across the globe found solace and freedom in rap music, feeling a sense of belonging toward the hip-hop culture. However, not long after this, a shift was seen in the way parts of rap music were perceived across the world. Stories of vulnerability, pain, loss and anger turned into boasting about wealth, objectification of women, romanticizing of drug abuse and justification of violence. Misogyny, substance abuse and violence are now often portrayed as characteristics of the hip-hop culture. Studies show that younger people who listen to rap and hip-hop are more likely to abuse alcohol and commit violent acts [2]. According to a survey conducted on individuals 25 years old or younger, two-fifths of the study sample (38%) reported use of marijuana and 13% use of club drugs. Moreover, 27% reported engaging in at least one act of aggressive behavior. Most of the respondents (94%) reported listening to music "daily or almost daily," and among these listeners, 69% reported often listening to rap music [3]. Unfortunately, the movement that started as a means of empowerment and redemption now persuades young minds toward alcoholism and addiction, and stirs up aggressive behavior and a regressive mentality toward women [4]. This is a machine learning and deep learning-based study of the transformation of the hip-hop genre, specifically rap music lyrics of the decade 2010–2019, and its correlation with the criminal tendencies of juveniles and young adults who are most likely to be influenced by these songs. A neural network was built to detect objectionable lyrics in rap songs, and a classification and in-depth analysis of 500 songs and juvenile criminal data from 2010 to 2019 was conducted to find a correlation between objectionable rap song lyrics and the criminal tendencies of juveniles and young adults. To date, several humanities and survey-based studies have explored the detrimental traits prevalent in many parts of hip-hop culture [5–8]. This study aims to support these results using natural language processing, deep learning and machine learning principles.


2 Methodology

The structure of this study is divided into two segments. The purpose of the first segment is to build a model, using various natural language processing, machine learning and deep learning techniques, that automatically detects whether the lyrics of a song promote illicit behavior falling under any of the three categories: misogyny, substance abuse and violence. Multiple NLP algorithms were trained on a manually collected dataset for this purpose: random forest classifier, support vector machine, multinomial naïve Bayes and LSTM. After the model was built, it was used on a compilation of the 500 most popular rap songs of the years 2010–2019 to obtain outputs for each of these songs. Visualizations and analysis of these outputs are discussed in Sect. 7 to gain deeper insights, such as the variation of slurs over the decade and the streaming ratios of objectionable and non-objectionable songs. The second segment of the study focuses on finding a correlation between the degree of misogyny, substance abuse and violence in the songs detected by the model and the criminal tendencies of juveniles and young adults who are influenced by hip-hop or rap music. Criminal and arrest data of juveniles and young adults in the largest hip-hop-influenced areas in the world was compiled. Finally, the nature of the crimes and the conditioning of the popular hip-hop culture of that time are compared to understand whether hip-hop culture affects juvenile and young adult crime rates.

3 Song Lyrics Dataset

The lyrics data for training the model was collected manually by choosing rap songs that glorify misogyny, substance abuse and violence. Lyrics containing certain predefined keywords (e.g., "weed," "stoned," "high," etc., for substance abuse) were targeted. The first step was to identify lyrics that fell under any of the three categories on which the study focuses. Moreover, songs by artists with criminal records or prior felonies, or those known to have objectionable lyrics in their songs, were targeted. Following this, the lyrics of these songs were collected manually from reputable sources such as Genius and AZLyrics and arranged in a datasheet used as training data for the models. The final step involved sorting them into three different text categories: misogyny, substance abuse and violence.

4 Models The following section describes the model architectures used to detect song lyrics containing references to misogyny, substance abuse and violence.


4.1 Support Vector Machine The support vector machine (SVM) is a supervised learning algorithm that takes labeled data and classifies among the given labels [9]. SVM is not limited to classification; it can also perform regression. SVM primarily constructs a hyperplane that separates the classes; the data points from each class nearest to the hyperplane are called "support vectors." For a multi-class classification problem like ours, SVM offers two approaches: one-versus-one and one-versus-rest. In this application, the latter is used: each class is separated from all the remaining classes by its own hyperplane. A linear kernel is used, along with the default value of C (= 1.0).
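The setup described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' exact code: the sample lyrics and label names are placeholders.

```python
# Minimal sketch: one-vs-rest linear SVM over count-vectorized lyrics,
# with the default C = 1.0 mentioned above. Training data is illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

train_lyrics = ["pop a pill and ride", "she ain't nothing", "load the strap"]
train_labels = ["substance_abuse", "misogyny", "violence"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_lyrics)   # fit_transform on training data only

clf = OneVsRestClassifier(SVC(kernel="linear", C=1.0))  # one hyperplane per class vs. the rest
clf.fit(X_train, train_labels)

X_test = vectorizer.transform(["pour up another cup"])  # transform (not fit) the test data
print(clf.predict(X_test))
```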

4.2 Random Forest Classifier Random forest is an ensemble technique that constructs many individual decision trees and combines their predictions [10]. Feature importance can be estimated from node impurity: the larger the impurity decrease a feature produces at its nodes, the more important the feature. Each decision tree is provided with its own training dataset: random sampling was used to create subsets of the main dataset, and each subset is used as the training set for its respective decision tree. Similarly, each decision tree is also given a test dataset to check its accuracy.

4.3 Multinomial Naive Bayes Multinomial naïve Bayes is a special form of the naïve Bayes model [11]. The difference between naïve Bayes and multinomial naïve Bayes is that naïve Bayes essentially considers only the presence or absence of a word in a document for its calculations, whereas multinomial naïve Bayes takes not only the presence of a word but also its count into account.

$$P(A \mid x_1, \ldots, x_n) \propto P(A)\, P(x_1 \mid A) \cdots P(x_n \mid A) \quad (1)$$

Here, $A$ is the variable that represents the classes, i.e., misogyny, substance abuse and violence, and $x_1, \ldots, x_n$ represent the words in a sentence.
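A minimal sketch of Eq. (1) with scikit-learn follows; the documents and labels are illustrative placeholders, not the paper's dataset.

```python
# Minimal sketch of multinomial naive Bayes over word counts, as in Eq. (1):
# term counts feed the multinomial likelihoods P(x_i | A).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs   = ["blow the smoke", "hit him with the chopper", "call her out her name"]
labels = ["substance_abuse", "violence", "misogyny"]

counts = CountVectorizer()
X = counts.fit_transform(docs)   # word counts, not just presence/absence

model = MultinomialNB()          # Laplace smoothing (alpha=1.0) by default
model.fit(X, labels)
print(model.predict(counts.transform(["smoke in the air"])))
```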


Fig. 1 LSTM cell

4.4 Long Short-Term Memory (LSTM) LSTM is a state-of-the-art deep learning model [12]. What distinguishes the LSTM from the plain RNN is the gated unit, or cell, present in the hidden layer. As the gates open and close, information in the LSTM is discarded or retained as required.

Architecture. The LSTM consists of three logistic sigmoid gates and two tanh layers. The tanh activation function maps values into the range −1 to 1, while the sigmoid activation function maps values into the range 0 to 1. The LSTM memory cell consists of three gates (Fig. 1).

Forget Gate. The forget gate has two inputs, h(t − 1) and x(t). The information from the current input x(t) and the hidden state h(t − 1) is passed through the sigmoid function. A value of 1 denotes that the information is important and needs to be remembered, while a value of 0 denotes that the information can be ignored.

$$f_t = \sigma\big(W_f \cdot [h_{t-1}, x_t] + b_t\big) \quad (2)$$

where $t$ = timestamp, $f_t$ = forget gate, $x_t$ = current input, $h_{t-1}$ = previous hidden state, $W_f$ = weight matrix of the forget gate, $b_t$ = connection bias at $t$.

Input Gate. The input gate has two inputs, x(t) and h(t − 1), which are passed through the sigmoid and tanh activation units; the generated values are then combined by point-by-point multiplication. The previous cell state C(t − 1) gets multiplied with the forget vector f(t): where the output is 0, values are dropped from the cell state, while an output of 1 retains them.

$$C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t \quad (3)$$

where $t$ = timestamp, $f_t$ = forget gate, $C_t$ = cell state information, $i_t$ = input gate at $t$, $C_{t-1}$ = cell state at the previous timestamp, $\tilde{C}_t$ = candidate cell state.


Output Gate. The inputs h(t − 1) and x(t) are passed to the sigmoid function, and the new cell state value is passed to the tanh function. A point-by-point multiplication of these outputs is performed to generate the new hidden state.

$$o_t = \sigma\big(W_o \cdot [h_{t-1}, x_t] + b_o\big) \quad (4)$$

$$h_t = o_t \ast \tanh(C_t) \quad (5)$$

where $t$ = timestamp, $o_t$ = output gate at $t$, $W_o$ = weight matrix of the output gate, $b_o$ = bias vector w.r.t. $W_o$, $h_t$ = LSTM output.

The model built uses 70% of the data for training and the remaining 30% for testing, with an embedding layer, a spatial dropout of 0.4, an LSTM layer and a dense layer with three output neurons. A batch size of 32 was defined, and the model was trained for 50 epochs.
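A minimal Keras sketch of the architecture just described follows. The vocabulary size, sequence length and LSTM unit count are assumptions (not reported in the paper), and the training arrays are dummy stand-ins with the right shapes.

```python
# Minimal sketch: embedding -> spatial dropout (0.4) -> LSTM -> 3-way softmax,
# trained with batch size 32 for 50 epochs, as described above.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SpatialDropout1D, LSTM, Dense

VOCAB_SIZE, MAX_LEN = 5000, 100        # assumed hyperparameters

model = Sequential([
    Embedding(VOCAB_SIZE, 128, input_length=MAX_LEN),
    SpatialDropout1D(0.4),             # spatial dropout of 0.4, as in the text
    LSTM(100),
    Dense(3, activation="softmax"),    # misogyny / substance abuse / violence
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Dummy stand-ins for the padded index sequences and one-hot labels of the 70/30 split.
X = np.random.randint(1, VOCAB_SIZE, size=(64, MAX_LEN))
y = np.eye(3)[np.random.randint(0, 3, size=64)]
model.fit(X, y, batch_size=32, epochs=50, verbose=0)
```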

5 Model Evaluation The accuracies and encoding techniques of the four models are listed in Table 1. In the case of multinomial naïve Bayes and SVM, a separate testing file was made instead of splitting the dataset; this was done to "fit_transform" the training data and only transform the test data. For the random forest, the grid search method was employed to find the best hyperparameters: different values of "n_estimators" and tree depths were tried, and the best accuracy (76.87%) was found for n_estimators = 150 and max_depth = 60. One of the key advantages of the LSTM over plain RNN networks is that it addresses the vanishing gradient problem [13]. Hence, the LSTM is better suited to capture the long-term dependencies in song lyrics, giving the model a higher capability to understand context and repetitive phrases to aid classification.
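The grid search described above can be sketched as follows; the candidate grids are assumptions around the reported best values, and the training matrices are the hypothetical count-vectorized features from Sect. 4.

```python
# Minimal sketch of hyperparameter tuning for the random forest; the grids
# shown are assumed, chosen around the reported n_estimators=150, max_depth=60.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [50, 100, 150], "max_depth": [20, 40, 60]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
# search.fit(X_train, y_train)           # X_train/y_train: count-vectorized lyrics
# print(search.best_params_, search.best_score_)
```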

Table 1 Model accuracies

Sr. No | Model | Encoding technique | Accuracy (%)
1 | Support vector machine | Count vectorizer | 82
2 | Random forest classifier | Count vectorizer | 76.87
3 | Multinomial naive Bayes | Count vectorizer | 85
4 | Long short-term memory | Index-based encoding | 87


Table 2 LSTM output format

Year | Song | Substance abuse% | Misogyny% | Violence% | NO% | Label
2017 | Love | 0 | 17.39 | 0 | 82.60 | NO
2011 | Murder to excellence | 12.5 | 8.33 | 34.37 | 44.79 | Violence
2015 | Antidote | 25.33 | 14.67 | 6.67 | 53.33 | Substance abuse

The LSTM model was found to be the most accurate and reliable model, and its weights and results were used for the subsequent part of this study.

6 LSTM Application and Outputs The LSTM model, hence built, was used to classify 500 song lyrics from the Genius top 50 rap songs of each year from 2011 to 2019 into the three labels. These songs were scraped with an automated, customized search engine made using the library lyrics_extractor. The 500 songs were thus compiled and fed into the model. The output of the model comes as three probabilities between 0 and 1, one per category, each showing how confidently the song falls into that category. Only songs with a probability of more than 0.8 in one particular category were considered to lie in it; otherwise, the song was labeled "Not Objectionable." At the end of this step, each song carried one of the labels "Misogyny," "Substance Abuse," "Violence" or "Not Objectionable." Table 2 depicts the format in which the outputs of the model were compiled (NO = Not Objectionable).
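The labeling rule just described can be sketched as follows; this simplifies the output format of Table 2 to a single probability vector per song.

```python
# Minimal sketch of the 0.8-threshold labeling rule described above.
import numpy as np

CATEGORIES = ["Misogyny", "Substance Abuse", "Violence"]

def label_song(probs, threshold=0.8):
    """probs: the model's three per-category probabilities for one song."""
    probs = np.asarray(probs)
    best = int(np.argmax(probs))
    return CATEGORIES[best] if probs[best] > threshold else "Not Objectionable"

print(label_song([0.05, 0.85, 0.10]))   # -> "Substance Abuse"
print(label_song([0.40, 0.35, 0.25]))   # -> "Not Objectionable"
```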

7 Visualizations To analyze hip-hop songs over the past decade (2010–2019), the top 100 songs of each year were scraped from popular music sites such as Genius. All the songs were preprocessed prior to analysis. Note that all the visualizations and analysis are based on the outputs generated by the model. TF-IDF values of each category were used to determine the important words under each label, and a list of ten words was made for each label. The occurrence of each of these words in the top 100 songs over the past 10 years was counted and plotted (see Fig. 2). The substance abuse category is divided into three subclasses, namely "Hard Drugs," "Alcohol" and "Marijuana" (see Fig. 3).
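The TF-IDF ranking step can be sketched as follows; the per-label documents are illustrative placeholders, not the study's corpus.

```python
# Minimal sketch: rank words per label by mean TF-IDF weight and keep the
# ten highest-weighted ones, as described above.
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

docs_by_label = {
    "violence":        ["strap chopper shots", "beef shots strap"],
    "substance_abuse": ["weed stoned high", "lean high weed"],
}

for label, docs in docs_by_label.items():
    tfidf = TfidfVectorizer()
    weights = np.asarray(tfidf.fit_transform(docs).mean(axis=0)).ravel()
    vocab = np.array(tfidf.get_feature_names_out())
    top10 = vocab[np.argsort(weights)[::-1][:10]]
    print(label, top10.tolist())
```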


Fig. 2 Variation of significant slurs over the decade

Fig. 3 Variation of individual substance per 1000 words

Using the TF-IDF weights of the "Drugs" category, the occurrence of these words per 1000 words was calculated. For instance, references to marijuana peaked in 2013, when 5 out of every 1000 words in rap songs were direct references to marijuana. A major part of analyzing the transformation of rap music involves understanding what audiences accept and like. Streaming data of the 10 most objectionable and 10 least objectionable songs of each year in the decade was collected from YouTube and plotted year-wise (see Fig. 4).


Fig. 4 Streaming ratios of objectionable and non-objectionable songs

The year 2013 saw the largest ratio of streams of objectionable songs (0.92). The tastes of hip-hop audiences have changed for the better over the decade, as this ratio shows a general decline through 2019 (0.33). The prominent rap artists of the decade 2010–2020 who repeatedly had songs in the categories of misogyny, substance abuse and violence, as analyzed by the model, are plotted in Fig. 5.

Fig. 5 Distribution of categories of artists with the most objectionable lyrics


8 Crime Data The target population of this study is individuals aged 10–29 who are heavily influenced by the hip-hop culture. As no data directly recording the crime rate of the global hip-hop audience is available, three factors were considered to maximize the proportion of hip-hop listeners in our dataset:

1. The dataset contains criminal records of cities with the largest hip-hop listening population in the world.
2. The data is further narrowed down to the age group that is most impressionable and likely to follow, listen to and be influenced by the hip-hop culture, which is 10–29 [14].
3. The kind of criminal activity taken into account is directly promoted by a part of hip-hop culture through song lyrics and music videos.

To maintain the veracity of the data, arrest data provided on the official state government websites were collected and narrowed down according to the three filters mentioned above. In multiple cases, the arrest data for the target age group (10–29) of the "Target City" is not made publicly available, and hence the arrest data of the county or state in which the city lies is considered instead. The types of crimes in the data are categorized into misogyny, substance abuse and violence as stated below:

1. Misogyny—sex offences, prostitution, rape.
2. Substance Abuse—possession and manufacturing of narcotics (opium, cocaine, marijuana, etc.), driving in an intoxicated state, violating liquor laws.
3. Violence—assault, aggravated assault, homicide, murder, illegal possession of weapons, damage to property, robbery.

The final dataset is 10 years of criminal records of impressionable juveniles or young adults living in cities with the strongest rooted hip-hop cultures in the world.

9 Results The hypothesis that rap songs have a direct effect on the criminal tendencies of juveniles and young adults is explored in this section. Table 3 shows the overall rise or drop percentage in each measured value of misogyny, substance abuse and violence in song lyrics, as recorded by the model, alongside the overall rise or drop percentage of juvenile or young adult crime over the same period. The 14 positive correlations out of 16 records provide strong support to the theory that criminal activities of juveniles and young adults in areas influenced by the hip-hop culture are closely linked to the kind of rap songs that were popular in those years.


Table 3 Song lyrics and crime correlation

Years | Area | Crime category | Criminal age group | Song variation | Crime variation | Correlation
2013–17 | Detroit (Wayne County) | Misogyny | 10–16 | 17.03% Decrease | 19.04% Decrease | Positive
2013–17 | Detroit (Wayne County) | Substance Abuse | 10–16 | 17.17% Decrease | 24.48% Decrease | Positive
2013–17 | Detroit (Wayne County) | Violence | 10–16 | 14.27% Decrease | 50.55% Decrease | Positive
2011–18 | Atlanta | Misogyny | 0–29 | 28.03% Decrease | 31.00% Decrease | Positive
2011–18 | Atlanta | Violence | 0–29 | 14.63% Decrease | 44.60% Decrease | Positive
2011–18 | Atlanta | Substance Abuse | 0–29 | 8.25% Decrease | 13.19% Decrease | Positive
2010–18 | California | Substance Abuse | 10–17 | 8.25% Decrease | 86.96% Decrease | Positive
2010–18 | California | Violence | 10–17 | 14.63% Decrease | 69.52% Decrease | Positive
2010–18 | Pennsylvania | Substance Abuse | 10–17 | 8.25% Decrease | 35.10% Decrease | Positive
2010–18 | Pennsylvania | Violence | 10–17 | 14.63% Decrease | 52.71% Decrease | Positive
2014–18 | Bronx | Misogyny | 0–16 | 20.56% Decrease | 26.58% Decrease | Positive
2014–18 | Bronx | Substance Abuse | 0–16 | 1.7% Increase | 86.53% Decrease | Negative
2014–18 | Bronx | Violence | 0–16 | 26.57% Decrease | 53.34% Decrease | Positive
2014–18 | New York County | Misogyny | 0–16 | 20.56% Decrease | 42.00% Decrease | Positive
2014–18 | New York County | Substance Abuse | 0–16 | 1.70% Increase | 78.57% Decrease | Negative
2014–18 | New York County | Violence | 0–16 | 26.57% Decrease | 45.54% Decrease | Positive
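The "Correlation" column of Table 3 can be reproduced by checking whether the two variations move in the same direction; a short sketch of this sign-agreement rule, using two rows from the table, follows.

```python
# Minimal sketch: a record is "Positive" when song variation and crime
# variation share a direction, "Negative" otherwise (cf. Table 3).
def direction(change):                       # "17.03% Decrease" -> -17.03
    value, sign = change.split()
    return float(value.rstrip("%")) * (-1 if sign == "Decrease" else 1)

def correlation(song_change, crime_change):
    same = direction(song_change) * direction(crime_change) > 0
    return "Positive" if same else "Negative"

print(correlation("17.03% Decrease", "19.04% Decrease"))  # Positive (Detroit, misogyny)
print(correlation("1.7% Increase", "86.53% Decrease"))    # Negative (Bronx, substance abuse)
```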

10 Future Scope Very little computational research has been conducted that provides strong evidence of the effect of hip-hop culture on the psychology of the youth. More effort is required to record and enhance datasets so as to promote research in this unexplored field. For instance, this study uses criminal data of hip-hop-influenced


states and cities to draw results, due to the lack of publicly available arrest data, but targeting smaller individual areas with greater hip-hop influence would make the results more specific and precise. One of the major challenges faced in this study was associating the motive behind a crime with cultural influences. Factors such as the environment in which a child is brought up, crimes committed in the heat of the moment or in self-defense, the judicial system and the literacy rates of the targeted areas all contribute to the crime rate. To address these challenges, more comprehensive juvenile arrest data could be obtained by conducting interviews or surveys, or by taking into account the circumstances of the crime, providing future researchers with a more relevant and insightful dataset for computational studies.

11 Conclusion This study provides a comprehensive overview of the evolution of rap music, a major part of hip-hop culture in the decade of 2010–2019. The analysis and visualizations of the rap lyrics over the years provide deeper insights into the psychological effect it may have on juveniles and young adults. Finally, juvenile and young adult arrest data for each category is analyzed and a correlation is obtained between the arrest rates and the variation of song lyrics over the decade. These results suggest strong evidence of a psychological effect of rap songs promoting misogyny, substance abuse and violence on young people. This study does not aim to undermine the real hip-hop culture that has been a thread of art, music and stories connecting various communities and people all over the globe. Instead, it aims to demonstrate the power that words and music hold over the human psyche. Music artists and songwriters need to understand that their music is heard by millions of people over the world and their fame must be used responsibly—not to spread hatred, addiction or to degrade lives but to heal wounds, share pain and to inspire love and kindness.

References
1. Alridge, D.P., Stewart, J.: Introduction: Hip Hop in history: past, present, and future. J. Afr. Am. Hist. 90, 190–195 (2005)
2. Tatum, B.L.: The link between rap music and youth crime and violence: a review of the literature and issues for future research. Justice Prof. 11(3), 339–353 (1999). https://doi.org/10.1080/1478601x.1999.9959513
3. Chen, M.J., Miller, B.A., Grube, J.W., Waiters, E.D.: Music, substance use, and aggression. J. Stud. Alcohol 67(3), 373–381 (2006). https://doi.org/10.15288/jsa.2006.67.373
4. Weitzer, R., Kubrin, C.E.: Misogyny in rap music: a content analysis of prevalence and meanings. Men Masculinities 12(1), 3–29 (2009). https://doi.org/10.1177/1097184X08327696
5. Cundiff, G.: The influence of rap and hip-hop music: an analysis on audience perceptions of misogynistic lyrics. Elon J. Undergraduate Res. Commun. 4(1) (2013)


6. Herd, D.: Changing images of violence in rap music lyrics: 1979–1997. J. Public Health Policy 30(4), 395–406 (2009)
7. Oredein, T., Evans, K., Lewis, M.J.: Violent trends in hip-hop entertainment journalism. J. Black Stud. 51(3), 228–250 (2020). https://doi.org/10.1177/0021934719897365
8. Tanner, J., Asbridge, M., Wortley, S.: Listening to rap: cultures of crime, cultures of resistance. Soc. Forces 88(2), 693–722 (2009)
9. Evgeniou, T., Pontil, M.: Support vector machines: theory and applications. In: Paliouras, G., Karkaletsis, V., Spyropoulos, C.D. (eds.) Machine Learning and Its Applications. ACAI 1999. Lecture Notes in Computer Science, vol. 2049. Springer, Berlin (1999). https://doi.org/10.1007/3-540-44673-7_12
10. Ali, J., Khan, R., Ahmad, N., Maqsood, I.: Random forests and decision trees. Int. J. Comput. Sci. 9 (2012)
11. Xu, S., Li, Y., Zheng, W.: Bayesian multinomial naïve Bayes classifier to text classification. In: International Conference on Multimedia and Ubiquitous Engineering / International Conference on Future Information Technology, pp. 347–352 (2017). https://doi.org/10.1007/978-981-10-5041-1_57
12. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
13. Bengio, Y., Frasconi, P., Simard, P.: The problem of learning long-term dependencies in recurrent networks. In: IEEE International Conference on Neural Networks, vol. 3, pp. 1183–1188 (1993). https://doi.org/10.1109/ICNN.1993.298725
14. Knoll, L.J., Leung, J.T., Foulkes, L., Blakemore, S.J.: Age-related differences in social influence on risk perception depend on the direction of influence. J. Adolesc. 60, 53–63 (2017). https://doi.org/10.1016/j.adolescence.2017.07.002

Complex Contourlet Transform Domain Based Image Compression
G. Saranya, G. S. Shrinidhi, and S. Bargavi

Abstract In important applications such as transmission and storage, image compression is a widely used technique. In general, a digital image contains an immense amount of information, and it is essential to reduce this data before transmission and storage. This work implements the image compression method on a Raspberry Pi processor, which helps retain a large amount of image data with better image quality. The Raspberry Pi single-board computer permits the implementation of the contourlet-family transform (CT) for image compression using Python. Digital still images are captured at a given time using a Web camera connected to a Raspberry Pi at a remote location. The image compression method ensures good storage capacity in the proposed system, with good memory compatibility; the target host then receives the compressed image and displays the decompressed output. The image compression is performed using the complex contourlet transform, after which the transformed matrix is quantized and the encoding process is performed. Finally, the inverse complex contourlet transform is used for decompression in order to retrieve the image. Keywords Complex contourlet transform (CCT) · Quantization · Encoding · Histogram plot

1 Introduction The image compression technique is an essential way to enable robust storage and transmission. Lossy and lossless are the two major approaches to image compression. Lossless compression is most widely preferred for medical imaging, to preserve image fidelity while reducing file size. Lossy compression, mainly used for electronic sources, is easily accessible, and it is fast to perform transmission and storage.
G. Saranya (B) · G. S. Shrinidhi · S. Bargavi Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_42


It is especially suitable for natural images, such as photographs, where a small loss of accuracy is bearable in exchange for a significant cut in bit rate. The lossy compression [1] method degrades quality: the original image cannot be retrieved after compression. Information redundancy is the major component exploited by image compression. Generally, there are three types of redundancy:
(1) Relationship between neighbouring pixel points—spatial redundancy
(2) Relationship between different colour planes—spectral redundancy
(3) Relationship between neighbouring frames—temporal redundancy

Lossless compression is used to minimize the size of the file without losing the features of the image, so it is also known as reversible compression. Obviously, lossless compression is preferable since no data is compromised; however, only a moderate amount of compression can be achieved. The lossy compression process is also known as irreversible, i.e., some information from the original document is discarded. Its biggest advantage is that it reduces the size of the document, but its demerit is that it reduces the quality of the image. In general, lossy compression introduces some distortion; this distortion may be clearly visible or sometimes not visible at all. In many cases, "visually lossless" compression is used to describe lossy compression that results in no visible loss under normal viewing conditions. However, visually lossless compression is somewhat subjective, and precautions should be taken in assessing it. A digital still image contains spatial redundancy; a colour image additionally contains spectral redundancy; and motion pictures [2, 3] also contain temporal redundancy. The basic methods to implement lossless image compression are:
• Run-length encoding
• Coding method
• Entropy encoding
• Chain codes

1.1 Methods to Perform Lossy Compression
1. Colour reduction: the most common colours in an image are retained, and the chosen colours are listed in a palette in the header of the compressed picture. Each pixel then stores the index of a colour in the colour palette; this process may be combined with dithering to smooth abrupt changes from one colour to another.
2. Chroma subsampling: this takes advantage of the fact that human visual perception registers spatial variations of brightness far more sharply than variations of colour, so some of the colour information in the image can be averaged or dropped.
3. Transform-based coding: the image is first processed with a Fourier-related transform before compression is applied; this is the most widely used operation. Recently, the wavelet transform and the contourlet family have been used extensively. The transform stage is followed by a quantizer and entropy coding.
4. Pattern compression: the final stage, used to find recurring strings in the compression process with little or no loss of the decompressed data information [4, 5].

2 Contourlet Family-Based Transform Method Wavelets are important tools for representing signals and are good at detecting point discontinuities. However, they are not effective at representing the geometry of contours; hence, the contourlet transform is employed. The important features of the complex contourlet transform (CCT) are directionality and anisotropy, which wavelets do not have, so the complex contourlet transform outperforms wavelets in many image processing applications. The complex contourlet transform comprises two major stages. (1) The dual-tree complex wavelet transform (DT-CWT) is used for the multi-resolution decomposition [6]: each pixel scale of the detail-coefficient sub-space is divided into six directional sub-bands, each holding the real and imaginary wavelet coefficient values. (2) In the second stage [7], a directional filter bank provides finer directional decomposition, matching nearby coefficients to the pixel coefficient values recorded by the DT-CWT. After these two stages, the complex contourlet transform yields low-frequency and high-frequency bands. The resulting transform incorporates the main features of the non-sub-sampled contourlet transform (NSCT) (multi-resolution, localization, directionality and anisotropy) together with those of the DT-CWT (translation invariance, directionality); hence, it is computationally more efficient than the other transforms. Figure 1 shows the stage-one decomposition for the complex contourlet transform.


Fig. 1 Decomposition workflow for CCT—stage 1 process

3 Methods Used 3.1 Quantization Quantization is widely used in image processing; it maps a set of pixel values onto a single quantum value. Quantization yields better compression when the discrete code words in the bit stream are short [8]. Major applications of this technique include DCT-based data quantization in JPEG and DWT-based data quantization in JPEG 2000 [9]; in this paper, the contourlet transform is used to implement the compression in a JPEG 2000-style process. Human vision is efficient at observing small changes in brightness over a relatively large area, but it is not as good at registering the exact strength of a high-frequency brightness variation. Therefore, the high-frequency components can be reduced to shrink the amount of information.


The reduction of high-frequency components is executed by a simple division operation: each coefficient in the frequency-domain block is divided by a constant for that block and then rounded to the nearest integer. This is the lossy step of the method: most of the high-frequency coefficients are rounded off to zero, and the remaining block elements become small positive or negative integers. Human visual perception is far more sensitive to luminance than to colour [10]. Therefore, the compression is performed in a colour space that separates luminance from chrominance (e.g., Y, the luminance component, and two chrominance components: Cb, the blue chrominance difference, and Cr, the red chrominance difference), quantizing the channels separately. The quantizer block should use weighting that achieves the largest compression with a minimal loss of features.
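A minimal NumPy sketch of this divide-and-round step follows; the 4 × 4 coefficient block and quantization table are illustrative assumptions, not the values used in the paper.

```python
# Minimal sketch: element-wise division by a quantization table followed by
# rounding, which zeroes out most high-frequency coefficients (the lossy step).
import numpy as np

coeffs = np.array([[312.0, -54.1, 10.2, 1.3],
                   [-38.7,  22.4, -4.9, 0.8],
                   [  9.6,  -5.2,  1.7, 0.2],
                   [  1.1,   0.6, -0.3, 0.1]])
q_table = np.array([[16, 24,  40,  64],
                    [24, 40,  64,  96],
                    [40, 64,  96, 128],
                    [64, 96, 128, 160]])

quantized = np.round(coeffs / q_table).astype(int)  # high-frequency entries -> 0
dequantized = quantized * q_table                   # approximate (lossy) reconstruction
print(quantized)
```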

3.2 Arithmetic Encoding Technique Arithmetic encoding is a basic entropy coding procedure used in lossless compression [11]. When a message is converted by arithmetic encoding, frequently occurring characters are stored with fewer bits and rarely occurring characters with more bits, so fewer bits are used for the whole encoding. Arithmetic encoding differs completely from the other forms of entropy encoding: it encodes the entire message into a single number, an arbitrary-precision fraction, representing the current data as a nested range defined by two numbers between zero and one. The output of the proposed work is illustrated in Fig. 2a, b and c. A histogram plot is a graphical view of the distribution of intensities that lets the eye judge the uniformity of the data; the histogram plots are shown in Fig. 3a, b and c. The performance of the proposed CCT-based compression method is evaluated through components such as error calculation, SNR calculation, the ratio between the original and reconstructed image, and storage capacity. Table 1 shows the performance evaluation for different sets of input images. Compared to the other entropy encoder methods, the proposed CCT method has significant advantages such as high space saving (about 98% for four test images and 92% for one test image) and low MSE (about 25 × 10⁻⁵) with high PSNR (around 94), and its complexity is also low. Therefore, the proposed compression method using CCT gives a better result than the other entropy encodings, and it can further be used for storage in hospital digital databases.
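The interval-narrowing principle described above can be illustrated with a short floating-point sketch; real arithmetic coders use integer arithmetic with renormalization, and the symbol probabilities here are assumed.

```python
# Minimal sketch of arithmetic encoding: the message narrows [0, 1) to one
# sub-interval, so frequent symbols (wide sub-intervals) cost fewer bits.
def arithmetic_encode(message, probs):
    low, high = 0.0, 1.0
    for symbol in message:
        width = high - low
        cum = 0.0
        for sym, p in probs.items():         # locate the symbol's sub-interval
            if sym == symbol:
                high = low + width * (cum + p)
                low = low + width * cum
                break
            cum += p
    return (low + high) / 2                  # any number in [low, high) encodes the message

probs = {"a": 0.6, "b": 0.3, "c": 0.1}       # assumed symbol probabilities
print(arithmetic_encode("aab", probs))       # a short message -> a single fraction
```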


Fig. 2 Test image 1: a Input image. b Image compression. c Output decompressed image

4 Conclusion The proposed work shows that the compression ratio is increased by using the complex contourlet transform compared with DCT- and DWT-based methods. This method can be used efficiently to compress any type of image format. Therefore, the algorithm is an efficient and practical method for transmission and storage, and it also transmits images or data with a lower possibility of an intruder modifying the original data or changing the content of the original images in transit. Hence, this method ensures a good security level: not only secure and faster transmission, but it is also safer than standard JPEG compression algorithms while achieving a good compression ratio. It can therefore be used in e-healthcare telemedicine technology for better storage and transmission. The method also ensures that the compressed information is secured with a proper security key, following the HL7 medical standard.


Fig. 3 Histogram plot-test image 1: a Input image. b Image compression. c Output decompressed image

Table 1 Metrics for contourlet family—different test images

Test image | SS % | Dim | CR | Comp size (KB) | Un comp size (KB) | MSE | PSNR
1 | 99.4 | 256 × 256 | 99.9 | 15 | 59 | 25 × 10⁻⁶ | 94
2 | 99.3 | 256 × 256 | 98.3 | 14 | 55 | 12 × 10⁻⁵ | 97
3 | 94.6 | 256 × 256 | 92 | 19 | 61 | 54 × 10⁻⁵ | 93
4 | 99.5 | 256 × 256 | 97 | 16 | 68 | 18 × 10⁻⁷ | 96
5 | 99.3 | 256 × 256 | 96.5 | 11 | 49 | 41 × 10⁻⁵ | 92


References
1. Abdulhameed Al-Rawi, Z.N., et al.: Image compression using contourlet transform. In: Proceedings of the 1st Annual International Conference on Information and Sciences, pp. 254–258. IEEE (2018)
2. Sahitya, S., Lokesha, H., Sudha, L.K.: Real time application of Raspberry Pi in compression of images. In: Proceedings of the International Conference on Recent Trends in Electronics, Information and Communication Technology (RTEICT). IEEE, Bangalore (2016)
3. Marot, J., Bourennane, S.: Raspberry Pi for image processing education. IEEE (2017)
4. Howse, J.: OpenCV Computer Vision with Python. Kindle Edition (2013)
5. Mordvintsev, A., Abid, K.: OpenCV Python Tutorials Documentation (2013)
6. Chen, D., Li, Q.: The use of complex contourlet transform on fusion scheme. Proc. World Acad. Sci. Eng. Technol. 7, 342–347 (2005)
7. Do, M.N., Vetterli, M.: Contourlets: beyond wavelets. In: Stoeckler, J., Welland, G.V. (eds.), pp. 1–27. Academic Press (2001)
8. Taubman, D., Marcellin, M.: JPEG2000 Image Compression Fundamentals, Standards and Practice. International Series in Engineering and Computer Science (2002)
9. Acharya, T., Tsai, P.-S.: JPEG 2000 Standard for Image Compression: Concepts, Algorithms, VLSI Architecture (2004)
10. Li, J.: Image compression: the mathematics of JPEG 2000 (2003)
11. Alzahir, S., Borici, A.: An innovative lossless compression method for discrete-color images. IEEE Trans. Image Process. 2, 44–56 (2015)

HWYL: An Edutainment Based Mobile Phone Game Designed to Raise Awareness on Environmental Management
Ace C. Lagman, Ma. Corazon F. Raguro, Maria Vicky S. Solomo, Jay-Ar P. Lalata, Marie Luvett I. Goh, and Heintjie N. Vicente

Abstract HWYL (meaning "a stirring feeling of emotional motivation and energy") is a 3D isometric puzzle adventure game for Android devices. The whole game revolves around the adventures of Thomas as he unknowingly helps the mayor and the townspeople in solving environmental problems through playing meaningful mini games. The mini games comprise different casual games that focus on teaching the players about major environmental issues such as ozone depletion and the disposal of wastes. The interactive learning experience educates players on how to mitigate environmental problems. The game garnered very satisfactory results from the play testers, proving that the game has been successful in promoting environmental awareness through edutainment and that the game as a system works as intended, in compliance with software quality factors.

Keywords 3D · Puzzle adventure game · Android mobile game · Edutainment · Environment awareness · Environmental management

1 Introduction HWYL (pronounced "who will") is a video game which focuses on saving and maintaining a healthy environment and on raising public awareness of the possible effects on our weather and climate when we do not give proper attention to keeping our surroundings clean. A video game is also one of the best possible mediums for providing an interactive learning experience to children (primarily those from kindergarten to pre-teens): it allows the proponents to give them an understanding of the possible chain reaction of the effects of climate change, to demonstrate practical ways to save mother Earth even in small ways, and ultimately to give the players an entertaining experience through a series
A. C. Lagman (B) · Ma. C. F. Raguro · M. V. S. Solomo · J.-A. P. Lalata · M. L. I. Goh · H. N. Vicente FEU Institute of Technology, 1015 Manila, Philippines e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_43


of interesting game mechanics for educating the children about environmentalism without giving too much technical information.

2 Related Studies One of the major factors the proponents looked into when conceptualizing a game design that revolves around the effects of climate change is the abundance of resources on the causes of climate change and its domino effect on the environment, which ultimately leads to natural disasters. Another factor is the mini games that some government and educational institutions have created to relay relevant information about the planet, climate change and steps to address it, in a way that can easily be understood by everyone regardless of age and educational attainment. One of the main causes of climate change is what we call the greenhouse effect, which refers to circumstances where the short wavelengths of visible light from the sun pass through a transparent medium and are absorbed, but the longer wavelengths of the infrared re-radiation from the heated objects are unable to pass back through that medium [1]. NASA's Space Place website made the whole discussion of the greenhouse effect understandable for children by providing a game titled "Greenhouse Gas Attack!" alongside an article giving a brief explanation of the greenhouse effect and the ozone layer in a way that children can grasp with ease [2, 3]. Researchers have also found an underlying connection between drought and floods, explaining that the direct impacts of climate change on water resources will likely be hidden beneath natural climate variability [4]. Although floods can occur alongside drought, in some places, especially in developing countries, the mere act of proper waste disposal can help mitigate flooding. Solid waste management is a large and growing problem for countries in the developing world: a review of the literature and analysis of case studies (both from the literature and from examples collected in the preparation of a global urban flood handbook) confirm that solid waste management is an emerging issue in flood risk management practice [5]. Solid waste management is observed in households in Metro Manila. The types of wastes commonly generated are food/kitchen wastes, papers, PET bottles, metals and cans, boxes/cartons, glass bottles, cellophane/plastics, and yard/garden wastes; respondents segregate their wastes into PET bottles, glass bottles, and other (mixed) waste [6]. These wastes can be segregated by type to be properly used according to their classification. This is also what the proponents aim for in creating the game: to teach the players practices that help clean the environment while they enjoy the game as an entertainment medium. Games are a great medium


for people to learn about everything in a fun and entertaining way. In one capstone game project created by students at De La Salle-College of Saint Benilde, you play as Gerb, a hermit crab, segregating garbage correctly to earn rewards and help him clean the beach for the arrival of his would-be special someone. This game teaches players about proper waste segregation while letting them enjoy the game with its interesting story and gameplay mechanics.

3 Research Methodologies The proponents used an agile method to manage the game development project. The Scrum framework was used, dividing the tasks into parts called "sprints" that last a week each, as monitored by the project manager. Progress on the project was recorded when a team member finished a task according to the timeline. Tasks that were not completed were re-evaluated to decide whether they should be continued, excluded, or substituted by a new task, and whether they should be assigned to a new member. Scrum gives the developers an opportunity to set concrete goals and provides team building that promotes distributed work among all members of the game project. This section also describes the research methods the proponents performed for the project's development. Since the goal of this project is to raise awareness of, and contribute positively to, some of society's dilemmas, the proponents chose to research the possible activities people could do to help save our environment from man-made destruction of natural resources. The proponents took some of the environmental problems created by man as a starting point for their research. They then designed the game's narrative in line with these dilemmas and focused the gameplay design on edutainment by teaching possible solutions. In one of the books written by Jane McGonigal, she states that today the interactive character of digital services and technologies has further stimulated the development of playful uses and applications of gaming in many areas of life, and the border between gaming for entertainment and gaming for utility purposes is proving increasingly porous and negotiable; apart from learning, gaming is used commonly in marketing, political campaigning, and even as a gamer-generation tactic toward solving world problems [7]. For the game's art direction, the proponents considered the most efficient way to create the game's assets so as to mass-produce the needed list in the least amount of time while keeping the art style visually appealing. The team decided on a stylized low-poly mix of Crossy Road and Tearaway for the game's overall look. For the project's development, the proponents adopted the agile methodology as their software development model, as it encourages iteration, collaboration and speedy development. The proponents could also track their progress and see the phase of the project through their backlogs, which helped them in the process of creating the game.


4 Project Development The game is divided into two modes: adventure mode and mini game mode (Fig. 1). In adventure mode, the player can explore the game world and interact with different objects and NPCs (Non-Player Characters) to progress through the game. In some instances, an NPC will require the player to finish a mini game to proceed; in this case, the game switches into mini game mode. In mini game mode, the player must solve and complete a specific mini game to continue with the game. Failure to do so reverts the player to adventure mode. There are no losing penalties for failing mini games, so the player can simply interact with the NPC again and replay the mini game indefinitely until it is solved. The character can move freely to any location in the game: the player taps the desired location on the Android device, and the character goes to it. Players control the game mainly by tapping the mobile screen, in both adventure mode and mini game mode. The game camera gives an overhead, roughly isometric view of the world. The game levels are larger than the view accommodated by the camera, requiring the screen to scroll from side to side as the player moves through the level. The camera is fixed so that the character is always visible, and the only way to see other parts of the level is for the character to move. The game accommodates two mini games that the player must solve for the story to progress; the mechanics of the mini games are aligned with the purpose of informing players about the environment. Illustrated in Fig. 2 is the first mini game, entitled Segregate! This mini game becomes available upon interacting with the sanitary engineer. He will

Fig. 1 A HWYL main menu


Fig. 2 Segregate!

Fig. 3 O-zoned!

ask you to give him a hand in manually segregating garbage while he fixes the segregation machine that was damaged by the overflowing garbage. There are two (2) containers, each with its distinct use, for segregating biodegradable and non-biodegradable garbage. Three mistakes and you lose the mini game; you win the mini game once the timer finishes. The conveyor belt slowly increases in speed as the progress bar approaches completion.


The second mini game, shown in Fig. 3, is named O-zoned! This mini game becomes available upon interacting with the steel worker. She will ask you to help with the carbon emissions that the factory gives out as the steel for the bridge is being created. Layers of carbon emissions descend from the stratosphere to the troposphere, where they act as greenhouse gas. Using the carbon emissions from the steel factory, you must fill out the layer of carbon so that it dissipates back to the stratosphere, which will help in absorbing much of the harmful ultraviolet (UV) light from the sun. You win the mini game once the timer finishes. The layers of gas slowly increase in speed as the progress bar approaches completion; once a layer of carbon emissions reaches the troposphere at the bottom of the screen, you lose the mini game.

5 Results and Discussions The proponents created an online questionnaire to let the play testers give their insights on the game with respect to the following factors: functionality, reliability, usability, efficiency, maintainability and portability. These factors are in line with the software quality factors that characterize good software quality. The grading system used for the survey was the five-point Likert scale, with five being excellent and one being poor. The results of the evaluation are shown in Tables 1, 2, 3, 4, 5, 6 and 7.

5 Results and Discussions The proponents created an online questionnaire to let the play testers give their insights on the game in the following factors: functionality, reliability, usability, efficiency, maintainability, and portability. These factors are in line with the software quality factors that enumerates the factors for a software’s good quality. The grading system used for the survey was the five-point Likert scale with five being excellent Table 1 Functionality factor results Functionality factor results

Result

The game raises environment awareness

4.20

The gameplay gives justice to the theme

4.30

The game controls are easy and understandable

4.35

Table 2 Reliability factor results

Reliability factor | Result
The game is playable all throughout the session | 4.30
The game does not show any signs of crashing in any instance | 4.15

Table 3 Usability factor results

Usability factor | Result
The game is not difficult to play | 4.20
The flow of the game is clear | 4.15
The gameplay is fit for mobile devices | 4.10


Table 4 Efficiency factor results

Efficiency factor | Result
The game runs smoothly | 4.20
The loading times are short | 4.20

Table 5 Maintainability factor results

Maintainability factor | Result
The game goes back to previous progress even after quitting | 4.10
The game completely deletes save data after reset | 4.20

Table 6 Portability factor results

Portability factor | Result
The overall look of the game fits the target platform | 4.30
The game has potential to be ported to other mobile platforms | 4.05

Table 7 Overall results

Quality factor | Weighted average | Verbal interpretation
Functionality | 4.28 | Very satisfactory
Reliability | 4.22 | Very satisfactory
Usability | 4.15 | Very satisfactory
Efficiency | 4.20 | Very satisfactory
Maintainability | 4.15 | Very satisfactory
Portability | 4.17 | Very satisfactory

Based on the results, the game garnered very satisfactory ratings from the play testers, proving that the game has been successful in promoting environmental awareness through edutainment and that the game as a system works as intended, in compliance with software quality factors.
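The weighted averages in Table 7 are simply the per-factor means of the item scores from Tables 1–6; a short check of this arithmetic follows.

```python
# Verifies Table 7: each factor's weighted average is the mean of its item
# scores from Tables 1-6 on the five-point Likert scale.
scores = {
    "Functionality":   [4.20, 4.30, 4.35],
    "Reliability":     [4.30, 4.15],
    "Usability":       [4.20, 4.15, 4.10],
    "Efficiency":      [4.20, 4.20],
    "Maintainability": [4.10, 4.20],
    "Portability":     [4.30, 4.05],
}
for factor, items in scores.items():
    print(f"{factor}: {sum(items) / len(items):.2f}")  # e.g. Functionality: 4.28
```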

6 Conclusion and Recommendation Creating a video game is a multidisciplinary endeavor, as it requires people who specialize in the technical and the creative fields to collaborate on a product that gives interactive entertainment to people. In the span of the creation of the project, the proponents found that video games are a medium that does not


completely revolve around only giving entertainment to the players but can also be used to educate others through an interesting and enjoyable experience that can be proven effective. Over the course of the project's development, the proponents identified some important points for future proponents who want to develop an edutainment game. First is the project's scope and limitations: future proponents should be ready for sudden iterations on the design, always keeping the project's completion in consideration so that production cuts do not become a bottleneck in the workflow. Future proponents are also advised to be very careful in setting their objectives, because objectives have a major influence on the overall development time and will limit the game's gameplay or form. If possible, future proponents can focus on the following: (1) develop varied gameplay modes if they have enough development time and if this has been planned in the early stages of the proposal; (2) create gameplay mechanics that are deeper, appeal to hard-core gamers and promote replayability; (3) research more deeply into why waste disposal continues to be a problem in the Philippines.

References
1. Nave, R.: Greenhouse effect. HyperPhysics, Department of Physics and Astronomy, Georgia State University. Retrieved March 17, 2016. http://hyperphysics.phy-astr.gsu.edu/hbase/thermo/grnhse.html
2. Have a Greenhouse Gas Attack! NASA Space Place. Retrieved March 17, 2016. http://spaceplace.nasa.gov/greenhouse-gas-attack/en/
3. Life in a greenhouse? How ghastly! NASA Space Place. Retrieved March 17, 2016. http://spaceplace.nasa.gov/greenhouse/en/
4. Climate Change. The National Drought Mitigation Center, University of Nebraska-Lincoln. Retrieved March 17, 2016. http://drought.unl.edu/DroughtBasics/ClimateChange.aspx
5. Lamond, J., Bhattacharya, N., Bloch, R.: The role of solid waste management as a response to urban flood risk in developing countries, a case study analysis. WIT Press (2012). Retrieved March 17, 2016. http://www.witpress.com/elibrary/wit-transactions-on-ecology-and-the-environment/159/23365
6. Bernardo, E.C.: Solid-waste management practices of households in Manila, Philippines. PubMed. Retrieved March 17, 2016. http://www.ncbi.nlm.nih.gov/pubmed/18991942
7. Mäyrä, F., Holopainen, J., Jakobsson, M.: Research Methodology in Gaming: An Overview. SAGE Publications (2012)

IoT and AI Based Advance LPG System (ALS)
Veral Agarwal, Pratap Mishra, Raghu Sharan, Naveen Sharma, and Rachit Patel

Abstract The liquefied petroleum gas (LPG) plays a very crucial role in our day-to-day life, whether for cooking, heating appliances, industrial use or vehicles. Using LPG in a safe and responsible manner is a necessity of life, as is advancing the LPG system itself; regrettably, until now, no major revolutionary steps have been taken to implement safety and advancement in the field of LPG systems. We have all observed that booking a cylinder and handling a gas stove are laborious tasks: the cylinder must first be booked by handling calls and data and then delivered, which is costly and labor-intensive. To contribute to the safe, advanced and more responsible use of LPG, we came up with an innovative product that addresses the concern regarding the safe use of LPG gas and enables a far more advanced and comfortable transaction between customer and distributor for domestic usage of LPG. Our whole idea revolves around the Internet of things and artificial intelligence. The aim of this project is to save the time and the life of the consumer while operating the gas system, as we introduce an automatic gas booking system, a gas leakage detection system and a smart gas stove. These features justify the name of this product: "The Advance LPG System (ALS)." Keywords Internet of things · Artificial intelligence · Interactive Web distributor portal · Smart gas stove · Automatic booking system · Gas leakage detection · LPG · PNG · Gas cylinder · Recording gas consumption · Data analysis · ALS · Embedded systems · Sensors and instrumentation · Application programming interface management · Json data packet · Raspberry Pi · Consumer Web application

V. Agarwal (B) · P. Mishra · R. Sharan · N. Sharma · R. Patel Department of Electronics and Communication Engineering, ABES Institute of Technology, Ghaziabad, Uttar Pradesh 522502, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_44


1 Introduction Since the early twentieth century, LPG has played a crucial role in industry as well as in domestic usage. LPG is a flammable gas used in heating appliances, cooking equipment and industry. Over the decades, many advancements in the field of LPG have been made for the benefit of consumers. Before electronic gas leakage detectors, gas leakage was detected with chemically infused papers that change colour when exposed to LPG [1]. None of these advancements could regulate the gas flow to stop a leak from spreading further; in our proposed system, we not only notify the consumer about a gas leakage but also regulate the flow to stop it. Many cylinder-distributing companies have introduced interactive voice response systems (IVRS), which can book a gas cylinder with just a call or text message [2, 3]. But in the current scenario of home automation, who will check whether the cylinder is empty? In our product, the gas cylinder distribution line operates without any human intervention by using artificial intelligence (AI): our interactive Web portal makes it hassle-free to book the gas cylinder before it becomes empty and also stores the historical data, which decreases the chances of the consumer being cheated by the distributor. In this world of automation, where we are IoTfying many household items including refrigerators, curtains and other home appliances, we intend to make a smart gas stove [4, 5] that is operated from its Web application while ensuring the consumer's safety [6]. The product is user-friendly, as the timer feature enables the user to cook according to the preferred requirement. In addition, another safety feature uses human detection near the gas stove: if any burner remains open for more than a threshold value set by the user and no human presence is detected, the system automatically turns the gas off completely and notifies the user. The user can operate the gas stove over a local area network with a specific range, regulating the gas flow of every burner of the stove from the Web app [7].

2 Related Work In this Advance LPG System, LPG can be used without any spillage issue [8]; no distributor can cheat based on cylinder weight; cylinder booking is automated; the smart gas stove, with its timer, temperature and humidity, and human detection features, ensures the safety of consumers and makes the stove easy to use [9]; and gas consumption can be analyzed at the individual and national level.


3 Components We have used many components; the major ones are explained here:
• MQ-6—used for detecting gas leakage
• PIR—used for human detection
• BME-280—used for temperature and humidity measurement
• Load cell—used for weighing the cylinder
• Servo motor—used for controlling the gas supply
• ESP-32—controls the sensor inputs and outputs
• Raspberry Pi 4—the main processing unit on the distributor side
• HX711—a load cell amplifier module
• Voltage converter (buck)—converts the voltage from one level to another
• GSM SIM800L—sends data from the consumer device to the distributor side using GPRS

4 System Operation This system comprises four distinct operations that work together by sharing live data continuously and acting upon it:
4.1 Distributor and user interface
4.2 Automatic gas booking system
4.3 Gas leakage detection system
4.4 Smart gas stove

4.1 Distributor and Customer/User Interface The user interface shows live data on the amount of gas, the gas leakage level (if any) and the current state of the gas stove (on or off), as well as detailed information about the booking status. The interface is also useful for the distributor to check the live status of various gas bookings from numerous customers. Websites work as the interface, as they are easily accessible by both user and distributor [10].

1. Register page—This is where the user can register and sign up for an account on the distributor's/company's portal for further services by giving details such as customer ID, registered ID, contact number and address. The information gets stored in a library for future reference.
2. Login page—After registering on the company portal, the user just needs to log in with their customer number/username from the next time onward to see the live status of the amount of gas left in the cylinder and other safety data such as gas leakage, humidity and temperature. The user will also get a notification for gas booking if the gas level goes below the set limit and can book the next cylinder online with just one click. Gas pipeline customers can also check their gas consumption and pay their bill from the portal.
3. Home/Live status page—The page that can be accessed after logging in to the portal; here the user can access all the data regarding gas usage and gas booking. The user can also edit some previously stored data, such as contact details or address, from the home page.

Home/Live status page—The stage that can be accessed after log in into the portal, here user can access all the data regarding gas usage and gas booking. User can also edit some previously stored data from the home page like contact details or address.

4.2 Automatic Gas Booking System
In the existing framework, there are numerous issues: whenever the LPG cylinder is empty, a new one must be requested at the LPG office, and unavailability or delayed delivery of the cylinder causes many problems. LPG comes in a metal cylinder, so one cannot see how much fuel is left inside. Our proposed framework resolves all of these issues (Fig. 1). We constantly measure the gas level in the cylinder using a load cell interfaced with an Arduino UNO. The client can see the gas level at any time, and when the level is low, the system books a cylinder automatically through our Web portal and sends a booking confirmation message to the client using a GSM module interfaced with the Arduino UNO. The GSM module sends and receives messages using AT commands and uses GPRS to make an HTTP POST of a JSON packet containing all the relevant data to the API. These instructions control the modem interfaced to the microcontroller; it runs on a 12 V adapter and requires very little memory to send [8].
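As an informal illustration of this booking loop (not the authors' exact firmware), the sketch below uses hypothetical stubs — read_load_cell_kg(), send_http_post(), and send_sms() — standing in for the HX711/load-cell reading, the GPRS HTTP POST, and the GSM AT-command handling described above; the tare weight, threshold, and endpoint URL are illustrative assumptions.

```python
import json

EMPTY_CYLINDER_KG = 15.3     # tare weight of the cylinder (illustrative assumption)
BOOKING_THRESHOLD_KG = 2.0   # book a refill when this much LPG remains (user-set)

def read_load_cell_kg():
    """Hypothetical stub: on real hardware this would read the HX711 amplifier."""
    return 16.8  # dummy value for the sketch

def send_http_post(url, body):
    """Hypothetical stub for the GPRS HTTP POST issued via GSM AT commands."""

def send_sms(text):
    """Hypothetical stub for the booking confirmation SMS."""

def check_and_book(customer_id="C-1001"):
    # Gas remaining = gross weight on the load cell minus the empty-cylinder weight
    gas_left = read_load_cell_kg() - EMPTY_CYLINDER_KG
    if gas_left <= BOOKING_THRESHOLD_KG:
        # JSON packet with the relevant data, posted to the distributor API
        packet = json.dumps({"customer_id": customer_id,
                             "gas_left_kg": round(gas_left, 2)})
        send_http_post("https://distributor.example/api/booking", packet)  # placeholder URL
        send_sms("Low gas detected: a new cylinder has been booked for you.")
    return gas_left
```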

Fig. 1 Automatic gas booking and gas leakage detection


4.3 Gas Leakage Detection System
To detect gas leakage, we propose smart leakage detection. A prominent feature of our product is its ability to detect gas leakage for the safety and welfare of users. Leaked gas is detected by the MQ-6 sensor, which can detect and measure gases such as LPG and butane. The MQ-6 also has a digital output pin, which lets the sensor operate even without a microcontroller when only one particular gas needs to be detected. The MQ-6 is placed near the gas cylinder; when leakage occurs, the resistance of the sensor decreases, hence increasing its conductivity [1].
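A minimal sketch of this detection loop follows; read_mq6() and the threshold are assumptions (the raw value and calibration depend on the ADC and the individual sensor), and close_valve() / notify_user() are hypothetical stand-ins for the servo and the app notification described in this paper.

```python
import time

LEAK_THRESHOLD = 400  # raw ADC reading; would be calibrated per sensor (assumption)

def read_mq6():
    """Hypothetical stub: analog output of the MQ-6 (rises as sensor resistance drops)."""
    return 0  # dummy value for the sketch

def close_valve():
    """Hypothetical stub: rotate the servo coupled to the regulator to cut the supply."""

def notify_user(message):
    """Hypothetical stub: push the alert to the Web app / send an SMS."""

def leakage_monitor(poll_s=1):
    while True:
        # Higher conductivity (higher reading) means LPG is present near the cylinder
        if read_mq6() > LEAK_THRESHOLD:
            close_valve()
            notify_user("LPG leakage detected - gas supply closed automatically")
        time.sleep(poll_s)
```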

4.4 Smart Gas Stove
There are several scenarios in which we forget to turn off the gas and the food being cooked gets burned from overheating, or we cannot reach the gas stove in time, or we simply forget that something is on the stove. To solve these problems and make the kitchen technically advanced, we propose an IoT-based gas stove that can be controlled from its application [7]. At present, almost all gas stoves are manually operated; self-ignition when turning the knob to the ON position is the most advanced feature available [8]. The goal of the automation is to reduce human interference and provide more safety; the idea is to make cooking easier and more accessible by advancing the gas stove (Fig. 2). We introduce sensors on the gas stove: a motion sensor (PIR) for detecting motion, a gas sensor (MQ-6) for detecting gas leakage, and a temperature sensor (DHT11); a servo motor plays an important role in turning off the gas supply. The servo motor used in this work is directly coupled to the cylinder valve at a rigid point, so when the motor rotates, it compels the valve to turn ON or OFF according to the direction of rotation. The motor gets its input voltage only when the proximity sensor detects the presence of a vessel; when the vessel is taken off, the proximity sensor loses contact with the vessel and signals the motor to rotate in the opposite direction to shut the flame OFF [11].

Fig. 2 Smart gas stove
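The no-presence shutoff described above can be sketched as a simple watchdog. All hardware calls below (pir_motion(), burner_is_on(), turn_burner_off(), notify_user()) are hypothetical stubs for the PIR sensor, the burner state, the servo, and the app notification; the threshold is the user-set limit from this section.

```python
import time

def pir_motion():       # hypothetical stub: PIR sensor output
    return False

def burner_is_on():     # hypothetical stub: burner state reported to the app
    return True

def turn_burner_off():  # hypothetical stub: servo drives the knob/valve closed
    pass

def notify_user(msg):   # hypothetical stub: app/SMS notification
    print(msg)

def stove_watchdog(threshold_s=300):
    """Turn the burner off if it stays on for threshold_s seconds with nobody nearby."""
    last_presence = time.monotonic()
    while True:
        if pir_motion():
            last_presence = time.monotonic()
        if burner_is_on() and time.monotonic() - last_presence > threshold_s:
            turn_burner_off()
            notify_user("Burner switched off: no one detected near the stove")
            last_presence = time.monotonic()  # avoid repeated alerts
        time.sleep(1)
```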

5 Comparison

Factors: Features
Existing system: Until now, existing systems have focused only on gas leakage detection, automated gas booking, and an automatic gas stove.
Advance LPG system: Our system provides all existing features as well as additional ones, such as a smart gas stove that detects the presence of a human being around the stove, plus an interactive distributor portal.

Factors: Cost
Existing system: Cheaper, but with limited features.
Advance LPG system: Our system is also comparatively cheap and provides more features than the existing systems.

Factors: Security
Existing system: Provides security only in terms of leakage detection and automatic ON/OFF of the cylinder regulator.
Advance LPG system: The advance LPG system provides more security than the existing systems; our IoT-based stove switches off automatically when the user forgets to turn off the gas stove.

6 Results
The system was constructed, readings were checked, and the distributor receives the consumption data (Fig. 3); this gives the distributor quantitative information about gas consumption for better distribution management as well as better market analysis for future growth. The gas sensor notifies the user whenever it detects gas particles, and the system books a cylinder automatically whenever the gas load drops below the set threshold level. The portal gives the user all the important information. Figure 4 shows the prototype of the system: it detects LPG leakage, and when the level exceeds its threshold, it immediately alarms the user, sends a notification to the app, and turns the knob to the off position, which blocks further leakage. Another function is that it continuously monitors the LPG level in the cylinder and books a cylinder automatically, acknowledging the consumer and sending the requirement to the distributor portal. To make the product more power-efficient, we run the microcontroller in deep-sleep mode, where its consumption is 10 uA. In terms of notification speed, the system sends data at 90 bytes within 10 s, which is sufficient for managing the flow of the system. As for drawbacks, the product needs to be placed in a well-networked region with Internet or cellular connectivity.

Fig. 3 Data analysis (monthly LPG consumption in 2021–2022)

Fig. 4 Prototype of automatic booking system

7 Conclusion
The system designed and implemented in this paper, which detects the gas level, books a cylinder automatically, and provides much other crucial live-status data about the cylinder, is cost-effective. In addition, it covers extra safety measures such as motion detection, which makes it an innovative as well as a smart system. A well-structured and user-friendly online portal makes it consumer friendly and makes the gas booking process convenient and hassle-free. The cost of developing this system is low compared to the price of fuel detectors or similar products commercially available in the market. We are continuing research and development to make the product more efficient and advanced in every way in the future.

References
1. Kodali, R.K., Tirumala Devi, B., Rajanarayanan, S.C.: IoT based automatic LPG gas booking and leakage detection system. In: 2019 11th International Conference on Advanced Computing (ICoAC), pp. 338–341 (2019). https://doi.org/10.1109/ICoAC48765.2019.246863
2. Tamizharasan, V., Ravichandran, T., Sowndariya, M., Sandeep, R., Saravanavel, K.: Gas level detection and automatic booking using IoT. In: 2019 5th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 922–925 (2019). https://doi.org/10.1109/ICACCS.2019.8728532
3. Macker, A., Shukla, A.K., Dey, S., Agarwal, J.: ARDUINO based LPG gas monitoring … automatic cylinder booking with alert system. In: 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI), pp. 1209–1212 (2018). https://doi.org/10.1109/ICOEI.2018.8553840
4. Islam, M.R., Matin, A., Siddiquee, M.S., Hasnain, F.M.S., Rahman, M.H., Hasan, T.: A novel smart gas stove with gas leakage detection and multistage prevention system using IoT LoRa technology. In: 2020 IEEE Electric Power and Energy Conference (EPEC), pp. 1–5 (2020). https://doi.org/10.1109/EPEC48502.2020.9320109
5. Jahan, S., Talukdar, S., Islam, M.M., Azmir, M.M., Saleque, A.M.: Development of smart cooking stove: harvesting energy from the heat, gas leakage detection and IoT based notification system. In: 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), pp. 117–120 (2019). https://doi.org/10.1109/ICREST.2019.8644117
6. Yalmar, A., Parihar, M., Kadam, V., Kharat, K.: Implementation of automatic safety gas stove. In: 2015 Annual IEEE India Conference (INDICON), pp. 1–6 (2015)
7. Dhianeswar, R., Sumathi, S., Joshitha, K.L.: Automatic gas controller. In: 2018 International Conference on Communication, Computing and Internet of Things (IC3IoT), pp. 215–218 (2018). https://doi.org/10.1109/IC3IoT.2018.8668178
8. Abirami, S., Priya, T., Iswarya, L.R., Thaila, M., Priyanka, S.: Automatic gas monitoring and booking through android application. IJITEE 8(6S4) (2019). ISSN: 2278-3075
9. Unnikrishnan, S., Razil, M., Benny, J., Varghese, S., Hari, C.V.: LPG monitoring and leakage detection system. In: 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), pp. 1990–1993 (2017)
10. Kumaran, M., Pradeep, J., Hounandan, R., Prahatheesh, B.: Smart LPG cylinder monitoring and explosion management system. In: 2021 12th International Symposium on Advanced Topics in Electrical Engineering (ATEE), pp. 1–7 (2021)
11. Saha, D., Mandal, A., Pal, S.C.: User interface design issues for easy and efficient human computer interaction: an explanatory approach. Int. J. Comput. Sci. Eng. 3(1), 127–135 (2015). E-ISSN: 2347-2693

ICT-Enabled Automatic Vehicle Theft Detection System at Toll Plaza

Kamlesh Kumawat and Vijay Singh Rathore

K. Kumawat (B) · V. S. Rathore: Department of CS and IT, IISU (Deemed to be University), Jaipur, India

Abstract In the current era, technology is a basic need, and life without it is hard to imagine. The automobile industry is also improving its technologies, introducing the latest features for the safety, security, and comfort of people. Still, one major issue people face is detecting and recovering stolen vehicles; vehicle theft cases are among the least-solved cases in India. After a vehicle is stolen, it is not always possible for the police department to track and detect it. The present electronic toll collection (ETC) system plays a vital role in vehicle theft detection because vehicles are RFID-enabled and easy to detect while crossing a toll. But there are still situations in which this ETC system cannot detect a stolen vehicle. To solve this problem, the paper presents an advanced ICT-based vehicle theft detection model that helps detect and recover stolen vehicles at the toll plaza. The paper also reviews current technologies available for theft vehicle detection, such as GPS, GSM, QR codes, RFID, OTP, fingerprint identification systems, and smartphone applications. The main objective of this paper is to analyze the limitations of the existing vehicle theft detection systems and to develop a new system that overcomes these limitations.

Keywords ETC · FASTAG · RFID · Vehicle theft detection system · QR code · GSM · GPS · Regional transport office (RTO)

1 Introduction
General reports published in Indian newspapers show that the recovery rate of stolen vehicles is very low, and these are among the least-solved cases handled by the police department. Different technologies are available to detect a stolen vehicle, such as GPS, GSM, OTP, RFID, fingerprint identification systems, QR codes, and smartphone applications. The government of India mandates FASTag, which must be mounted on the vehicle's windshield. FASTag uses radio frequency identification (RFID) technology, which helps detect vehicle information and deduct the toll amount at the toll plaza. FASTag also helps detect stolen vehicles: whenever a vehicle is stolen and reported to the police station, the department informs the Regional Transport Office (RTO), and the vehicle's RFID tag is blacklisted by the RTO. When a blacklisted tag reaches a toll plaza, the toll collection system flags the vehicle as stolen and blacklisted, the toll plaza does not allow the vehicle to pass through the toll gate, and the nearby police station and the owner of the vehicle are informed through short message service (SMS). This paper includes vehicle theft statistics as well as the current vehicle theft first information report (FIR) process and recovery process in India.

1.1 Vehicle Theft Statistics in India
• A report published by the Times of India on Jan 20, 2018 states that the recovery percentage of stolen vehicles was only 13% in Mohali, India. Increasing vehicle theft cases in cities show that thieves are no longer scared of the police department [1].
• Another report, published on Jan 10, 2019, states that more than five vehicles per hour were reported stolen in New Delhi during 2018, according to data released by the Delhi Police. In 2018, 44,158 vehicle theft cases were reported, up from 39,084 in 2017. Among the stolen vehicles, 8036 (18.20%) were cars. As many as 4619 (10.46%) stolen vehicles were recovered, and 6751 auto-lifters were arrested [2].
• Another report, published by The Hindu, says that police data show motor vehicle theft to be the least-solved crime [3].

1.2 Current FIR Process for Vehicle Theft Detection
After a vehicle theft, the owner of the vehicle lodges an FIR at the police station. Previously, an FIR could be lodged only at the police station, but nowadays some states have launched Web applications for lodging a theft-vehicle FIR online. Complaints can be lodged anytime and anywhere through these applications, and this online process brings transparency. The whole record is also available on these Web applications and can be accessed by both the public and officials. For now, these applications are used only for vehicle theft and lifting cases. The system helps reduce the pendency of cases. If a citizen provides wrong information about a vehicle theft or lodges a fake FIR, a case is registered against that person.


The Web application launched by the Delhi police in India contains five links: (a) Register FIR, (b) Retrieve FIR, (c) Retrieve Final Report, (d) FIR Status, and (e) FAQs, through which complaints can easily be lodged by the vehicle owner by entering some basic information about the vehicle, the place, and the owner. The person can also immediately retrieve a copy of the FIR from the app or website [4]. The process makes lodging an FIR for a stolen vehicle very smooth and easy, but the recovery rate remains very low as per records.

1.3 Current Vehicle Theft Recovery Process
• In the current offline recovery process, the vehicle owner has to lodge an FIR for the stolen vehicle at the police station; the department then informs the RTO about the vehicle, and the RTO blacklists the RFID tag mounted on it. When the vehicle arrives at a toll plaza, the RF reader reads the tag as blacklisted, a silent alarm buzzes, and information about the vehicle is sent to the nearby police station and the vehicle owner through SMS and e-mail. But there are still many situations in which the toll plaza is not able to detect the stolen vehicle.
• A V-Seva service was also introduced by the Insurance Information Bureau (IIB) in 2014 to detect stolen vehicles in India. It stores a unique database of vehicles, with data collected from the general public, police departments, and insurers. With this plan, the police can find a stolen vehicle, but they need enough data to match the car's chassis and engine numbers, which is quite difficult as this is possible only after recovering the stolen car [5].
• Another application, 'Vahan Samanvaya', launched in 2016, only checks and traces the updated status of a stolen vehicle [6].

Figure 1 shows the working process of the automated toll collection system. The motorist purchases an RFID tag and mounts it on the vehicle's windshield. When the vehicle approaches the toll plaza, the RFID reader sends radio waves to the tag; the waves activate the windshield-mounted tag, which sends the vehicle information back to the reader. The reader passes the tag details to the lane controller, which, as part of the local area network, transmits the vehicle information to the central computer that deducts the toll from the motorist's account.

Fig. 1 Working of automated toll collection system [7]

A system in which the owner of the vehicle can directly inform the toll gate authorities about the stolen vehicle and send a secret number was introduced in 2016. In this system, when the vehicle crosses the toll gate, the gate controller software displays the related vehicle information and identifies the vehicle as stolen. The toll gate authority then asks the vehicle driver for the unique code: if the code given by the driver is correct, the gate opens, but if it is wrong, the gate remains closed and an alert SMS is sent to the vehicle owner with the place and toll gate name [8]. Another system, proposed in 2018, uses RFID technology, an API, and a communication protocol. Its main objective was to present a new approach to arresting thieves and detecting car theft using RFID technology at the toll plaza. The paper also describes the use of FASTAG-based electronic toll collection (ETC) using RFID technology, which automatically deducts the toll from vehicles when they pass through the toll plaza. The owner can change the RFID tag just like changing a password. The paper also introduces an application, RCTDAS GUI, through which the user can report a missing car anytime from anywhere [9]. The system proposed by Viniatha and Velantina [10] uses an IoT-based toll collection system that lets the toll plaza deduct the toll amount when the user shows a card. The RFID reader scans the RFID card shown by the vehicle owner and retrieves the card's information; the LCD screens at the plaza display the available balance, the toll amount is deducted from the user's card balance, and an SMS is sent to the owner's mobile number after the deduction. Red LED lights indicate an operation fault, yellow indicates processing, and green indicates success; after a successful transaction, the vehicle is allowed to cross the toll plaza. The system thus combines RFID technology, GSM, and an Arduino for toll collection and theft detection [10]. Another system, for fingerprint-based driver identification, was proposed in 2019. It introduces an application for improved car security and driver identification, with a new driver profiling and identification model based on data collected from different sources. The driver of a car can log in to the application and download his profile from the cloud; in case of a car sale, the owner can log out and reset the application to its initial settings. This system can also be helpful for driver identification in the proposed system [11]. A paper published in 2020 proposed a system that uses the vehicle number plate and color to determine whether the number plate has been altered; it uses microcontrollers and some modules. The police department can also upload the number or the engine number of the stolen vehicle for a more exact search. The system has an inbuilt digital signature that helps track the vehicle, and it enables a breakdown mode a few seconds after the vehicle leaves the toll plaza, to immobilize the stolen vehicle [12]. A number plate recognition method is proposed in the paper by Akhtar and Ali [13]: first the number plate image of the passing vehicle is captured, and then the toll collection process takes place. It is a four-step process: preprocessing, number plate localization, character segmentation, and character recognition, with an accuracy of 90.9% according to the experimental results. The system thus uses an automatic number plate recognition (ANPR) system, character recognition, and edge detection techniques [13].

1.4 Limitations of the Available Vehicle Theft Detection Systems
1. After a vehicle theft, the following situations may occur:
• The vehicle is stolen but no FIR is lodged by the owner at the police station, and the vehicle reaches the toll plaza with its RFID tag and vehicle number plate.
• The vehicle is stolen but no FIR is lodged, and the vehicle reaches the toll plaza with the real number plate and a damaged/destroyed RFID tag.
• The vehicle is stolen but no FIR is lodged, and the vehicle reaches the toll plaza with a forged number plate and a damaged/destroyed RFID tag.
• The vehicle is stolen and reported to the police station, and the vehicle reaches the toll plaza with its RFID tag and real number plate.
• The vehicle is stolen and reported, and the vehicle reaches the toll plaza with a damaged/destroyed RFID tag and the real number plate.
• The vehicle is stolen and reported, and the vehicle reaches the toll plaza with a damaged/destroyed RFID tag and a forged number plate.
2. There is no model proposed to detect vehicles in all of these situations.
3. There is no application through which the overall status of the vehicle can be viewed by both the police and the vehicle owner.
4. The existing system is not secure, because if the thief carries a weapon, there is a high risk in stopping the vehicle after theft detection at the toll plaza; nor is it able to detect a vehicle whose speed exceeds the limit set for passing through the toll gate.


2 ICT-Enabled Vehicle Theft Detection System at Toll Plaza (Proposed Concept)

2.1 Elements of New Toll System
A. RFID System (FASTAG): An RFID system includes two major parts: the RFID tag and the RFID reader. This technology uses radio waves to identify people and objects; it is a wireless technology that can work from a distance without requiring a line of sight between tag and reader [14]. The vehicle owner can register by providing the required information and obtain an RFID tag. The tag is mounted on the vehicle, and the reader scans it at the toll plaza.
B. Central Server: The central server is a large-scale database that can process many records at a time, has an efficient search algorithm, and provides quick responses to input from the scanner. In our proposed system, it helps maintain the records, manage the toll taxes, and check the user's toll amount status [15].
C. Automatic Number Plate Recognition System (ANPR): ANPR is a computer vision technology that recognizes a vehicle's number plate without direct human intervention. The system captures an image of the vehicle and extracts the characters of the number plate; these characters can then be searched in the database to identify the owner of the vehicle [13].
D. GSM: The GSM module helps the system communicate by sending and receiving messages. This architecture is used by many countries for mobile and computer communication [16].
E. Buzzer: The buzzer is a beeping device used at the toll plaza. It uses an oscillator circuit and a speaker to make a beeping noise from a DC voltage [16].

2.2 Working Mechanism
In the current working process, when a stolen vehicle arrives at the toll plaza, the reader reads the RFID tag mounted on it. If the tag has been blacklisted by the RTO, the buzzer beeps, the toll gate remains closed, the vehicle is not allowed to pass, and the system automatically sends a message to the vehicle owner and the nearby police station. The proposed system also works when no RFID tag is mounted on the vehicle, a situation in which detecting the stolen vehicle is otherwise difficult. In this case, when a stolen vehicle arrives at the toll plaza without an RFID tag, the proposed system retrieves the owner's information with the help of the ANPR system and sends an OTP to the owner's registered mobile number. If the driver of the vehicle provides the correct OTP, the vehicle is allowed to pass; otherwise, the silent buzzer beeps and information is sent to the owner and the nearby police station.
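The mechanism above amounts to a two-branch decision at the gate. The sketch below is a hedged rendering of it; every helper (is_blacklisted, anpr_read, send_otp, and so on) is a hypothetical stand-in for the central-server, ANPR, and GSM services, not an API defined by this paper.

```python
# Hypothetical stubs for the plaza services (illustrative only):
def is_blacklisted(tag): return False
def deduct_toll(ref): pass
def silent_buzzer(): pass
def alert_owner_and_police(ref): pass
def anpr_read(image): return "RJ14AB1234"              # plate characters from the camera
def lookup_owner(plate): return {"mobile": "+91-0000000000"}
def send_otp(mobile): return "123456"                  # OTP sent via GSM, returned for comparison
def read_otp_from_driver(): return "123456"

def handle_vehicle(rfid_tag, plate_image):
    """Gate decision for one vehicle, following Sect. 2.2."""
    if rfid_tag is not None:
        if is_blacklisted(rfid_tag):     # tag flagged by the RTO after an FIR
            silent_buzzer()
            alert_owner_and_police(rfid_tag)
            return "GATE_CLOSED"
        deduct_toll(rfid_tag)
        return "GATE_OPEN"

    # No (working) RFID tag: fall back to ANPR + OTP verification
    plate = anpr_read(plate_image)
    owner = lookup_owner(plate)
    expected = send_otp(owner["mobile"])
    if read_otp_from_driver() == expected:
        deduct_toll(plate)
        return "GATE_OPEN"
    silent_buzzer()
    alert_owner_and_police(plate)
    return "GATE_CLOSED"
```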


2.3 Flow Chart
The flow chart represents the control flow and shows the various possibilities that may occur at the time of toll deduction at the toll plaza. It presents how the proposed system works when a stolen vehicle arrives at the toll plaza with or without an RFID tag and number plate.

3 Pros and Cons
The proposed method is very efficient for electronic toll tax collection and stolen-vehicle detection, and it can work in the various situations that may occur when a stolen vehicle approaches a toll plaza. The system is also cost-effective, as RFID is a low-cost and fast-growing technology all over the world. This should reduce current vehicle theft cases and increase stolen-vehicle detection in India. Some problems may arise from network issues: in low-network areas the system cannot work properly, because messages can take time to reach the owner's mobile and the police station. Delayed messages and slow data retrieval through ANPR can affect the overall operation and timing of the system.

4 Conclusion
RFID technology with ANPR at toll plazas will make traffic smoother and more efficient. It will also reduce vehicle theft cases and improve recovery rates in India. The system also supports low-cost transportation by saving fuel, reducing long jams, and removing the wait for cash toll deduction. The obstacles for this study are collecting an actual stolen-vehicle database from the RTO and police department, and the fact that the system cannot work properly in rural or low-network areas. As future scope, a weapon detection system, like those at airports, could be added to reduce accidents while recovering stolen vehicles, because in most trafficking cases stolen vehicles are used by criminals.

References
1. Times of India: Only 13% stolen vehicles recovered in 2017. https://timesofindia.indiatimes.com/city/chandigarh/only-13-stolen-vehicles-recovered-in-2017/articleshow/62575258.cms. Last accessed 2021/02/21
2. Jakhar, A.: Over 5 vehicles were reported stolen every hour in Delhi during 2018: Police. https://www.news18.com/news/auto/over-5-vehicles-were-reported-stolen-every-hour-in-delhi-during-2018-police-1997933.html. Last accessed 2021/02/21
3. Bhandari, H.: Police data show motor vehicle theft the least solved crime. https://www.thehindu.com/news/cities/Delhi/police-data-show-motor-vehicle-theft-the-least-solved-crime/article25954331.ece. Last accessed 2021/02/21
4. http://mvt.delhipolice.gov.in/. Last accessed 2021/02/21
5. Times of India: Tracing stolen vehicle just a click away. https://timesofindia.indiatimes.com/city/hyderabad/Tracing-stolen-vehicle-just-a-click-away/articleshow/39319578.cms. Last accessed 2021/02/21
6. Gadgetsnow: 'Vahan Samanvaya' app with data on stolen vehicles across the country launched. https://www.gadgetsnow.com/apps/Vahan-Samanvaya-app-with-data-on-stolen-vehicles-across-the-country-launched/articleshow/51394172.cms. Last accessed 2021/02/21
7. Parimi, J.: How will electronic toll collection work in India. https://www.quora.com/How-will-electronic-toll-collection-work-in-India. Last accessed 2021/02/21
8. Mahesh, B., Prabu, S.F., Kumar, M., Balamurugan, P.: Theft vehicle identification system in toll gate by using RFID, GSM and visual basic front end. Int. J. Sci. Eng. Appl. Sci. (IJSEAS) 2, 2395–3470 (2016)
9. Murugan, K., Gobu, R., Zabiyullah, G.S., Gunasekaran, R., Santhosh, V.: An automation of vehicle theft detection in the toll plaza by using the RF technology. IJEEE 1(3), 10–17 (2018)
10. Vinitha, V., Velantina, V.: Advanced automatic toll collection and vehicle detection system using Internet of Things. SSRG-IJEEE 6(8), 5–10 (2019)
11. Mekki, A.E., Bouhoute, A., Berrada, I.: Improving driver identification for the next-generation of in-vehicle software systems. IEEE Trans. Veh. Technol. 68(8) (2019)
12. Mallikalava, V.Y., Vengatesan, K., Kumar, A., Punjabi, S., Sadara, S.S.A.: Theft vehicle detection using image processing integrated digital signature based ECU. IEEE (2020). 978-1-7281-5821-1/20
13. Akhtar, Z., Ali, R.: Automatic number plate recognition using random forest classifier. SN Comput. Sci. 1, 120 (2020)
14. Prathiba, S.D., Viji, A., Mary, A.: Online payment of tolls and tracking of theft vehicles using number plate image. Glob. J. Pure Appl. Math. 13(7), 3005–3012 (2017). ISSN 0973-1768
15. Raj, U., Nidhi, N., Nath, V.: Automated toll plaza using barcode-laser scanning technology. In: Nath, V., Mandal, J.K. (eds.) Nanoelectronics, Circuits and Communication Systems, Lecture Notes in Electrical Engineering 511. Springer Nature Singapore (2019). https://doi.org/10.1007/978-981-13-0776-8_44
16. Mohanasundaram, S., Krishnan, V., Madhubala, V.: Vehicle theft tracking, detecting and locking system using Open CV. In: International Conference on Advanced Computing & Communication Systems (ICACCS), pp. 1075–1078. IEEE (2019)

RBJ20 Cryptography Algorithm for Securing Big Data Communication Using Wireless Networks

S. Rajaprakash, N. Jaishanker, Chan Bagath Basha, S. Muhuselvan, A. B. Aswathi, Athira Jayan, and Ginu Sebastian

Abstract In computer network communication, the amount of data continues to grow because of the use of the Internet, smartphones, social networks, web log data, etc. The volume of data communicated in this way is very large as well as complex; in general, the amount of data transferred by our apps is increasing and is now measured in petabytes and exabytes. Big data analytics with novel encryption techniques can provide new ways for businesses and government to analyse and communicate unstructured data more efficiently. After analysis, it is necessary to store these data sets with protection and to authenticate them, and traditional security systems are unable to provide effective protection and authentication for data sets of this kind. To overcome this problem, we propose an effective cryptographic method called the RBJ20 algorithm, which operates on a matrix of order N × N and has four layers. The first layer has one step, column operations on the matrix. In the second layer, any integer sk is chosen as the secret key and the matrix elements are multiplied using sk. The third layer has five steps: choosing any three prime numbers to form a quadratic equation, with the condition that the discriminant part should not be a whole number; concatenating the digits in the discriminant part; pairing the digits two at a time from left to right to act as cells of the matrix; treating each element as a cell; and swapping the cell elements according to the pairs, for all the pairs. The fourth layer likewise has five stages.

Keywords RBJ20 · Prime numbers · Perfect numbers · Salsa20 · ChaCha20 · AES

S. Rajaprakash (B) · C. B. Basha · S. Muhuselvan · A. B. Aswathi · A. Jayan · G. Sebastian: Aarupadai Veedu Institute of Technology, Chennai, India
N. Jaishanker: Misrimal Navajee Munoth Jain Engineering, Chennai, India


1 Introduction
The use of the Internet, social media, and other apps has become an integral part of daily life, resulting in massive growth in the amount of data being saved, communicated, and processed by individuals and industry, especially for storage and transmission. As a result, new cryptographic algorithms must be designed, and the different symmetric and asymmetric encryption/decryption approaches available must be investigated. Various versions of the ChaCha algorithms [1, 2] have been studied and analysed along with the efficiency of the traditional AES algorithm, comparing the amount of resistance provided and the time taken by these algorithms. To meet upcoming challenges and to provide effective protection for large volumes of analysed data from various sources such as the Internet, banks, social networks, smartphones, and confidential records, a four-layer method known as RBJ20 (named after authors Rajaprakash, Basha, and Jaisankar) is proposed, with improved resistance against various traditional attacks and with authentication. The suggested approach is tested against a variety of known cryptographic techniques.

2 Past Work
The authors of [3] discuss the ChaCha family and a fault attack using extra rotations and XOR. The authors of [1] presented a novel approach called Freestyle, which utilises multiple ciphertexts and introduces the ideas of hash-based stopping conditions and key guessing. Alexandre Adomnicai et al. proposed a side-channel analysis of ChaCha, intended to detect memory-related leaks, and also implemented the bricklayer attack [2]. Kazuhide Fukushima et al. presented a fault injection attack on the ChaCha and Salsa20 ciphers, applied to matrix initialisation, key generation, block counting, nonce generation, and matrix addition [4]. Abdullah Issa et al. discussed the Double-A hash function, which has two rounds, one for each column and one for each row [5]. Bodhisatwa Mazumdar et al. discuss "addition rotation XOR (ARX)", a high-security cipher construction [6]. The Double-A hash design has been broadly studied for security purposes [7]. Bodhisatwa Mazumdar et al. also examined the weakness of Salsa20 through power analysis attacks and correlation power analysis (CPA); the power analysis attack is the most effective [8]. Conrad Watt et al. studied an architecture with which secure algorithms can be implemented quickly and easily [9]. Z. Shi et al. proposed probabilistic neutral vectors (PNV), a generalisation of probabilistic neutral bits, used to discover and refine key recovery attacks on reduced rounds of ChaCha20 and ChaCha [10]. Somasundaram, Rajaprakash, and Bagath Basha introduced the SRB21 approach, which focuses on the prime numbers of the secret key [11, 12].


3 Methodology
The suggested approach is a four-layer RBJ20 cryptographic algorithm operating on a matrix of order N. The first layer performs column operations on the matrix; the second layer multiplies the matrix by the secret key. The third layer has five steps. The first step is to choose any three prime numbers. The second step is to form a quadratic equation with the constraint that the discriminant part (D) of the given matrix is not a perfect square, so that its square root is not a whole number. The third step is to concatenate all of the resulting digits into a single row. The fourth step is to build pairs from the left side to the right side. The fifth step is to use each pair to swap cell values in the given matrix. The fourth layer also has five stages: the first stage is to choose prime numbers in the supplied matrix based on the matrix size and an integer e; the second stage is to compute perfect numbers; the third stage is to concatenate all the perfect numbers into a single row; the fourth stage is to form pairs from the left side to the right side; and the fifth stage is to use each pair to swap cell values in the matrix. The original data are recovered by applying the inverse of the four-layer method, as shown in Fig. 1.

Fig. 1 Flow diagram of RBJ20


3.1 RBJ20 Algorithm
The RBJ20 encryption and decryption algorithms are given below; each contains thirteen steps.

RBJ20 Encryption Algorithm
1: Perform the column operation
CA = (Ci ↔ Ci+(n−m))  (1)
where CA is the column-encrypted matrix, C denotes a column, and i, n, and m are column numbers.
2: Multiply the matrix A by the secret key:
A = sk · A  (2)
where A is the matrix and sk, any integer, is the encryption key.
3: Investigate the prime numbers that may exist in the matrix.
4: Compute
EM = (−b ± √D)/2a  (3)
where EM is the encrypted matrix, a, b, and c are the chosen prime numbers, and D = b² − 4ac; √D should not be a whole number.
5: Arrange all of the resulting digits in a single line.
6: From Step 5, form pairs of digits from the left side to the right side.
7: For each pair, swap the corresponding cell values in the supplied matrix.
8: Identify the prime numbers in the matrix A.
9: Compute the perfect numbers
PN = (e^(k−1))(e^k − 1)  (4)
where PN is a perfect number, e is an integer with e ≥ 2, and k is a prime number from the given matrix A.
10: From Step 9, count the perfect numbers.
11: Arrange all of the digits of the perfect numbers in a single row.
12: From Step 11, form pairs of digits from the left side to the right side.
13: For each pair, swap the corresponding cell values in the supplied matrix.

RBJ20 Decryption Algorithm
There are thirteen steps in the RBJ20 decryption algorithm.
1: Identify the prime numbers in the matrix A.
2: Compute the perfect numbers
PN = (d^(k−1))(d^k − 1)  (5)
where PN is a perfect number, d is an integer with d ≥ 2, and k is a prime number from the given matrix A.
3: Step 2 yields the number of perfect numbers.
4: Arrange all of the digits of the perfect numbers in a single row.
5: From Step 4, form pairs of digits from the right side to the left side.
6: For each pair, swap the corresponding cell values in the supplied matrix.
7: Inspect the matrix for the possible prime numbers.
8: Compute
DM = (−b ± √(b² − 4ac))/2a  (6)
where DM is the decrypted matrix and a, b, and c are the possible prime numbers.
9: Arrange all of the resulting digits in a single line.
10: From Step 9, form pairs of digits from the right side to the left side.
11: For each pair, swap the corresponding cell values in the supplied matrix.
12: Divide the matrix A by the key:
A = A/vk  (7)
where vk is the decryption key.
13: Reverse the column operation
CA = (Ci ↔ Ci+(n−m))  (8)
where CA is the column-decrypted matrix, C denotes a column, and i, n, and m are column numbers.
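To make the layer mechanics concrete, here is a small, informal Python sketch of the encryption side as we read Eqs. (1)–(4) and the worked example in Sect. 3.2: the column swap, the secret-key multiplication, and the digit-pair cell swaps derived from the quadratic discriminant. The helper names are ours, not the paper's, and following the example's numbering note, a digit 9 is mapped to cell 8.

```python
import math

def swap_columns(A, i=0, j=2):
    # Layer 1 (Eq. 1): column operation C_i <-> C_{i+(n-m)}; for the 3x3 example, C1 <-> C3
    for row in A:
        row[i], row[j] = row[j], row[i]
    return A

def multiply_key(A, sk):
    # Layer 2 (Eq. 2): A = sk * A
    return [[sk * x for x in row] for row in A]

def quadratic_digit_pairs(a, b, c):
    # Layer 3 (Eq. 3): EM = (-b +/- sqrt(D)) / 2a with D = b^2 - 4ac, sqrt(D) not whole
    root = math.sqrt(abs(b * b - 4 * a * c))            # example: sqrt(47) = 6.85565...
    digits = f"{abs(b)}" + f"{root:.5f}".replace(".", "") + str(2 * a)  # "3"+"685565"+"4"
    return [(int(digits[k]), int(digits[k + 1])) for k in range(0, len(digits) - 1, 2)]

def swap_cells(A, pairs):
    # Each pair (p, q) swaps flat cells p and q (row-major, 0-8); a digit 9 maps to cell 8
    flat = [x for row in A for x in row]
    for p, q in pairs:
        p, q = min(p, 8), min(q, 8)
        flat[p], flat[q] = flat[q], flat[p]
    n = len(A)
    return [flat[r * n:(r + 1) * n] for r in range(n)]

# Worked example from Sect. 3.2 (entries scaled by 3 so the sketch stays in integers)
A = [[511, 512, 513], [514, 515, 516], [517, 518, 519]]
A = multiply_key(swap_columns(A), sk=5)
A = swap_cells(A, quadratic_digit_pairs(2, 3, 7))   # pairs (3,6), (8,5), (5,6), (5,4)
print(A)
```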

3.2 RBJ20 Algorithm Encryption
• The analysed data form the matrix
A = [511/3 512/3 513/3; 514/3 515/3 516/3; 517/3 518/3 519/3].
• By using Eqs. (1) and (2) with sk = 5, the matrix becomes
A = [2565/3 2560/3 2555/3; 2580/3 2575/3 2570/3; 2595/3 2590/3 2585/3].
• By using Eq. (3), the following steps are performed:
1: a = 2, b = 3, c = 7.
2: EM = (−3 ± √(3² − 4 · 2 · 7))/(2 · 2) = (−3 ± √(9 − 56))/4 = (−3 ± √47)/4 = (−3 ± 6.85565)/4.
3: Concatenating the digits gives EM = 36,855,654.
4: The pairs of digits are (3, 6), (8, 5), (5, 6), and (5, 4).
5: The first pair (3, 6) swaps cells 3 and 6, with the cells numbered 0, 1, 2, 3, 4, 5, 6, 7, 8 in row-major order (a digit 9 refers to cell 9 − 1 = 8):
FPN = [2565/3 2560/3 2555/3; 2595/3 2575/3 2570/3; 2580/3 2590/3 2585/3].
• The second pair (8, 5) is traded in the FPN matrix, the third pair (5, 6) in the SPN matrix, and the fourth pair (5, 4) in the TPN matrix, giving
fPN = [2565/3 2560/3 2555/3; 2595/3 2580/3 2575/3; 2585/3 2590/3 2570/3],
where FPN, SPN, TPN, and fPN are the matrices after the first, second, third, and fourth pair swaps, respectively.
• By using Eq. (4), PN = 6,284,968,128, giving the pairs of perfect-number digits (6, 2), (8, 4), (9, 6), (8, 1), and (2, 8). The first pair (6, 2) swaps cells 6 and 2; the second pair (8, 4) is traded in the FPN matrix, the third pair (9, 6) in the SPN matrix, the fourth pair (8, 1) in the TPN matrix, and the fifth pair (2, 8) in the fPN matrix, where FPN, SPN, TPN, fPN, and fIPN are the matrices after the first to fifth pair swaps, respectively.
• Finally, the original matrix is successfully encrypted:
A = [2565/3 2555/3 2560/3; 2595/3 2570/3 2575/3; 2580/3 2590/3 2585/3].

3.3 Working of the RBJ20 Decryption Algorithm
The encrypted data form the matrix
A = [2565/3 2555/3 2560/3; 2595/3 2570/3 2575/3; 2580/3 2590/3 2585/3].
• By using Eq. (5), PN = 6,284,968,128, giving the pairs of perfect-number digits, now read from right to left: (8, 2), (1, 8), (6, 9), (4, 8), and (2, 6). The first pair (8, 2) swaps cells 8 and 2 (cells numbered 0–8 in row-major order):
FPN = [2565/3 2555/3 2585/3; 2595/3 2570/3 2575/3; 2580/3 2590/3 2560/3].
The second pair (1, 8) is traded in the FPN matrix, the third pair (6, 9) in the SPN matrix, the fourth pair (4, 8) in the TPN matrix, and the fifth pair (2, 6) in the fPN matrix, where FPN, SPN, TPN, fPN, and fIPN are the matrices after the first to fifth pair swaps, respectively. This undoes the perfect-number swaps:
FIPN = A = [2565/3 2560/3 2555/3; 2595/3 2580/3 2575/3; 2585/3 2590/3 2570/3].
• By using Eq. (6):
Step 1: a = 2, b = 3, c = 7; EM = (−3 ± √(3² − 4 · 2 · 7))/(2 · 2) = (−3 ± √(9 − 56))/4 = (−3 ± √47)/4 = (−3 ± 6.85565)/4.
Step 2: Concatenating the digits gives EM = 36,855,654, with the pairs now read from right to left: (4, 5), (6, 5), (5, 8), and (6, 3).
Step 3: The first pair (4, 5) swaps cells 4 and 5:
FPN = [2565/3 2560/3 2555/3; 2595/3 2575/3 2580/3; 2585/3 2590/3 2570/3],
where FPN is the matrix after the first pair swap. The second pair (6, 5) is traded in the FPN matrix, the third pair (5, 8) in the SPN matrix, and the fourth pair (6, 3) in the TPN matrix.
• By using Eqs. (7) and (8), the matrix is then successfully decrypted:
A = [511/3 512/3 513/3; 514/3 515/3 516/3; 517/3 518/3 519/3].

Table 1 RBJ20 encryption and decryption times compared with Salsa and AES

File size (bytes)   Salsa Encryp   Salsa Decryp   AES Encryp   AES Decryp   RBJ20 Encryp   RBJ20 Decryp
24                  1.59           1.56           1.125        0.991        2.3            1.99
76                  1.39           1.35           1.752        1.922        2.8            1.99
111                 1.09           1.06           2.758        2.54         2.94           2.91
312                 2.83           2.74           1.864        1.82         3.95           4.52
823                 2.54           2.69           3.169        2.99         4.9            4.29
1541                3.41           3.44           2.049        2.39         5.65           5.88
6582                2.37           2.21           3.231        2.99         5.9            5.98

4 Comparison
Table 1 compares the performance of the proposed RBJ20 cryptographic encryption and decryption methods with the traditional AES method and the existing Salsa algorithm, showing the time taken by each algorithm for different file sizes.

5 Conclusion
In this work, we propose the RBJ20 algorithm, a four-layer encryption method for protecting and authenticating analysed data from various sources such as the Internet, banks, credit and debit card transactions, social media, smartphones, personal data, web log data, and analysed prediction data. Each layer of the proposed four-layer algorithm performs specific operations in the encryption and decryption parts to strengthen the method's resistance, resulting in enhanced data security and successful authentication. Finally, the work was implemented in Python and compared with the classic AES method as well as the ChaCha20 security method.

References
1. Arun Babu, P., Thomas, J.J.: Freestyle, a randomized version of ChaCha for resisting offline brute-force and dictionary attacks. IEEE Trans. Inf. Forensics Secur. (2018)
2. Adomnicai, A., Fournier, J.J.A., Masson, L.: Bricklayer attack: a side-channel analysis on the ChaCha quarter round. In: Progress in Cryptology – INDOCRYPT, Lecture Notes in Computer Science, pp. 65–84. Springer
3. Dilip Kumar, S.V., Patranabis, S., Breier, J., Mukhopadhyay, D., Bhasin, S., Chattopadhyay, A., Baksi, A.: A practical fault attack on ARX-like ciphers with a case study on ChaCha20. In: Workshop on Fault Diagnosis and Tolerance in Cryptography, pp. 33–40 (2017)
4. Fukushima, K., Xu, R., Kiyomoto, S., Homma, N.: Fault injection attack on Salsa20 and ChaCha and a lightweight countermeasure. In: IEEE Trustcom/BigDataSE/ICESS, pp. 1032–1037 (2017)
5. Issa, A., Al-Ahmad, M.A., Al-Saleh, A.: Double-A—a Salsa20 like the design. In: 4th International Conference on Advanced Computer Science Application and Technologies, pp. 18–23. IEEE (2015)
6. Mazumdar, B., Subidh Ali, S., Sinanoglu, O.: A compact implementation of Salsa20 and its power analysis vulnerabilities. ACM Trans. Des. Autom. Electron. Syst. 22, 11:1–11:26
7. Al-Saleh, A., Al-Ahmmad, M., Issa, A., Al-Foudery, A.: Double-A—a Salsa20 like the security. In: 4th International Conference on Advanced Computer Science Application and Technologies, pp. 24–29. IEEE (2015)
8. Mazumdar, B., Subidh Ali, S., Sinanoglu, O.: Power analysis attacks on ARX: an application to Salsa20. In: 21st International On-Line Testing Symposium, pp. 40–43. IEEE
9. Watt, C., Renner, J., Popescu, N., Cauligi, S., Stefan, D.: CT-Wasm: type-driven secure cryptography for the web ecosystem. Proc. ACM Program. Lang. 3(POPL), 77:1–77:29 (2019)
10. Shi, Z., Zhang, B., Feng, D., Wu, W.: Improved key recovery attacks on reduced-round Salsa20 and ChaCha. In: ICISC 2012, LNCS 7839, pp. 337–351. Springer
11. Bagath Basha, C., Rajaprakash, S.: Securing Twitter data using phase I methodology. Int. J. Sci. Technol. Res. 8, 1952–1955
12. Bagath Basha, C., Somasundaram, K.: A comparative study of Twitter sentiment analysis using machine learning algorithms in big data. Int. J. Recent Technol. Eng. 8, 591–599 (2019)

Decision Tree for Uncertain Numerical Data Using Bagging and Boosting

Santosh S. Lomte and Sanket Gunderao Torambekar

Abstract Data uncertainty is ubiquitous in real-world applications; variables such as imprecise estimation, network latency, obsolete sources, and sampling errors all lead to uncertain data. It is important to deal with this kind of ambiguity carefully, or the mining results may be inaccurate or even incorrect. We therefore apply bagging and boosting algorithms to decision trees for classification and prediction. A decision tree may be used as a model of sequential decision problems under uncertainty, and classifiers are used to manage data with uncertain information. To deal with data uncertainty, we primarily use the PDF distribution method, in which the complete data are carried through probability distributions to construct a decision tree. We supply these values to the decision tree by calculating the PDF and generate the decision tree accordingly; in parallel, we supply the bagging and boosting algorithms with the PDF and then compare both outcomes. Bagging is one of the well-known ensemble techniques; it constructs bags of data with the same class label and the same probability as the original data collection. More complex strategies, such as boosting, adapt the sample distribution according to how difficult each sample is to classify. If we use the probability density function (PDF) distribution instead of abstracting uncertain data by statistical derivatives, the accuracy of a decision tree can be much better. In our project, the dataset is first given to PDF generation; in the next stage, instead of passing it directly to the decision tree, we pass it to the bagging and boosting algorithms; the generated dataset is then given to the decision tree, and classification is carried out.

Keywords Bagging · Boosting · Numerical uncertain data · Decision tree · Classifiers · Ensembles · Random forest · ANN

S. S. Lomte (B): Radhai College of Computer Science, Aurangabad, Maharashtra, India
S. G. Torambekar: K.P.C. Yogeshwari Tantraniketan, Ambajogai, Maharashtra 431517, India


1 Introduction
Classification plays a vital part in data mining and machine learning. Traditional classification techniques concentrate on certain (precise) data; however, data uncertainty occurs in several real applications [1–4]. For example, when tracking the position of an object with global positioning system (GPS) devices, the reported location may have errors of several meters [5]. For another example, sensor measurements may be imprecise to some degree because of several noisy variables [6]. In addition, the treatment of probe-level uncertainty in gene expression microarray data is a primary study factor in biomedical research [7]. Uncertain data have posed a big challenge to traditional classification techniques. Numerous approaches have been recommended to resolve data uncertainty [8], including the possible-worlds model, which is efficient in dealing with multiple types of data uncertainty [9–12]. However, as far as we are aware, only a few uncertain-data classification techniques have been constructed based on possible worlds. Classification is a classic problem in data mining: given a collection of training tuples, each with a class label and characterized by a feature vector, the purpose is to produce an algorithmic model that predicts the class label of an unseen test tuple based on the tuple's feature vector. The most frequent classification model is the decision tree. Decision trees are widespread because they are simple to understand, and rules can easily be extracted from them. Several techniques, for example, C4.5 and ID3, have been invented for decision tree construction. Such algorithms are extensively implemented and utilized in a broad range of applications, including image identification, credit rating of loan applicants, medical diagnosis, scientific testing, target marketing, and fraud detection. Data use has risen vastly: data are collected and stored in warehouses at incredible speeds (GB/hour), which may cause the data to contain errors or to be only partially available. For example, remote satellite sensors, telescopes scanning the sky, gene expression microarrays, and scientific simulations generate terabytes of data, and much of the data in such applications are uncertain. This has created a need for algorithms and applications that process uncertain data with operations such as classification, clustering, and association. Probability came to be used to handle uncertain data of this sort: it is one of the simplest statistical approaches for making quantitative inferences about uncertainty, and it arises naturally in engineering and scientific problems where estimated quantities are unknown. The general classification method consists of a training set of records with known class labels. The training set is utilized to establish a classification model, which is then applied to a test set consisting of records with unknown class labels. The performance assessment of a classification model is based on the number of test records correctly and incorrectly identified by the model. A learning algorithm is involved in each technique; the model established through the learning algorithm should both fit the class labels of the training records and correctly predict those of the test dataset.

2 Research Objectives
• To study and evaluate bagging and boosting,
• To formulate the problem of classifying uncertain numerical data,
• To propose a technique for classifying uncertain numerical data using a decision tree,
• To validate the research methodology on suitable parameters to prove the effectiveness of the proposed technique.

3 Boosting and Bagging
Bootstrap aggregation, or bagging, is a method suggested by Breiman [13] that is used with various classification and regression techniques to reduce the variance associated with prediction and thereby improve the prediction process. It is a straightforward idea: several bootstrap samples are taken from the available data, the prediction process is applied to each bootstrap sample, and then the results are combined. Bagging has been shown to improve accuracy on real and simulated data sets using regression trees, classification methods, and subset selection in linear regression. In bagging, uniformly weak learners are trained in parallel, independently of one another, and their results are merged to determine the model average. Boosting also combines weak learners, but it operates differently: the learners are trained sequentially and adaptively, so that each improves on the predictions of the previous ones. Boosting, like bagging, is a committee-based technique that can be used to improve the accuracy of regression and classification approaches. In contrast with bagging, which uses a simple average of the results to create the full forecast, boosting uses a weighted average of the results obtained by applying a prediction technique to different samples. Also, the samples used at each step are not all drawn in the same way from the same population: increased weight is applied, during the next step, to the instances incorrectly predicted at a given stage. Boosting is thus an iterative procedure incorporating weights, in contrast to bagging's simple average of predictions. Moreover, boosting is often used with weak learners (e.g., a simple classifier such as a two-node decision tree), while bagging need not be.


Schapire [14] created the predecessor of the boosting techniques subsequently developed by him and others. His initial solution involved two-class classifiers, and the results of three classifiers, created from different learning samples, were pooled by simple majority voting. Freund [15] extended Schapire's original technique by incorporating the findings of a larger number of weak learners. Then, Schapire and Freund [16] developed the AdaBoost algorithm, which quickly became extremely popular. Breiman [17] extended the full boosting policy and considered the Schapire and Freund algorithm a special case of the class of arcing algorithms, proposing the term arcing for adaptive resampling and combining. In the interest of brevity, and because of the success of Schapire and Freund's algorithm, this chapter concentrates on AdaBoost and refers only briefly to related methods.
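As a concrete, hedged illustration of the two ensemble styles discussed in this section, the following sketch assumes scikit-learn is available and uses decision trees as the base learners: bagging averages trees grown on bootstrap samples, while AdaBoost grows two-node trees (stumps) sequentially, re-weighting the misclassified samples. The dataset here is synthetic, purely for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: full trees fitted in parallel on bootstrap samples, predictions averaged
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting: depth-1 trees (stumps) fitted sequentially; each round up-weights
# the training instances that the previous rounds misclassified
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                              n_estimators=50, random_state=0)

for name, model in (("bagging", bagging), ("AdaBoost", boosting)):
    model.fit(X_tr, y_tr)
    print(name, "test accuracy:", model.score(X_te, y_te))
```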

4 Uncertain Data Knowledge involves noise that causes it to drift from the correct, unpredictable data is predicted or actual values. Complexity or veracity of information is one of the defining features of data in the time of big data. Information is continuously growing in length, variety, velocity, and uncertainty (1/veracity). It is abundantly contained in uncertain information on the Internet today, both in its organized and unstructured roots in sensor networks within companies. For example, in an enterprise dataset, there could be uncertainty about a consumer’s address or the reading of the temperature obtained by a sensor due to the aging of the sensor. In 2012, IBM called for the handling of volatile data on a scale in its global technology outlook report, which offers a comprehensive analysis for three to ten years looking into the future, aimed at identifying major, disruptive changes that will change the world. Analyses must definitely take into account the many types of the uncertainty of very large volumes of data to make confident decisions about companies on the basis of real-world data. Analyses based on uncertain data compromise the accuracy of subsequent decisions, so it is not possible to ignore the quantity and form of inaccuracies in this unknown evidence. For example, the abstraction of odds distributions by statistics summery differs and implies a simple way to handle data ambiguity. In several applications, data instability arises naturally for distinct reasons. Here, three groups are discussed briefly: staleness of outcomes, repeated measurement and errors in calculation. 1.

1. Calculation errors: Data obtained from physical devices are imprecise because of measurement errors. For instance, body temperature is measured with a tympanic (ear) thermometer, which senses the temperature of the ear drum through an infrared sensor. A typical ear thermometer has a quoted calibration error of ±0.2 °C, which is around 6.7% of the normal operating range, taking human body temperature as between 37 °C (normal) and 40 °C (severe fever). With other variables such as positioning and technique, the compounded measurement error can be high, with errors greater than 0.5 °C (about 17% of the operating range) reported in as many as 24% of measurements [18]. The digitization process also introduces quantization error. Such errors can be handled with an appropriate error model, such as a uniform error distribution for quantization errors or a Gaussian error distribution for random noise; a small sketch of this PDF-based view of a reading follows this list.

2. Data staleness: In several applications, data values change continuously, and the recorded data are stale almost all the time. One instance is a position-based tracking device: the location of a mobile device can be approximated only by applying an uncertainty model to its last reported position [19]. A typical uncertainty model requires knowledge of the moving speed of the device and of whether its movement is restricted (e.g., a vehicle traveling on a road network) or unrestricted (e.g., an elephant moving on a plateau). Usually, a 2D probability density function over a bounded region is defined to model such uncertainty.

3. Repeated measurements: Perhaps the most common source of uncertainty arises from repeated measurements. For example, a patient's body temperature may be taken several times during the day, wind speed may be measured by an anemometer every minute, and a large number of heat sensors may be placed across the surface of the space shuttle. What value should be used when talking about the wind speed, the temperature of a certain part of the shuttle, or a patient's temperature? Or would it be better to use all the information by considering the distribution given by the collected data values?
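The following is a minimal sketch, assuming the Gaussian error model mentioned in item 1: an uncertain sensor reading is treated not as a point value but as a probability density function (PDF) centred on the reported value. The reading, calibration error, and fever threshold are illustrative, not taken from the paper.

```python
# Model an uncertain thermometer reading as a Gaussian PDF.
import numpy as np
from scipy import stats

reported = 37.6          # reported ear-thermometer reading (deg C)
calib_err = 0.2          # quoted calibration error (deg C), used as std. dev.

pdf = stats.norm(loc=reported, scale=calib_err)

# Probability that the true temperature exceeds a 38 deg C fever threshold.
p_fever = 1.0 - pdf.cdf(38.0)
print(f"P(true temperature > 38 C) = {p_fever:.3f}")

# Monte Carlo samples of the uncertain attribute, e.g. for a
# distribution-based classifier that consumes sampled values.
samples = pdf.rvs(size=1000, random_state=0)
print("sample mean:", samples.mean().round(2))
```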

5 Related Work

In the real-time world, data are processed at incredible speed by various methodologies, quickly generating unpredictable kinds of data. Databases have become more complicated as advanced technologies represent data in a probabilistic way, and various problems arise with uncertainty and with decision tree classification of uncertain numerical data [20]. Abstracting probability distributions into summary statistics is a straightforward approach to handling data uncertainty, known as averaging. The other approach constructs a decision tree using the complete probability distributions; this technique is recognized as distribution-based. In bagging there are different techniques, such as exactly balanced bagging and approximately balanced bagging, while RUSBoost and SMOTEBoost are boosting techniques used to enhance the classification of base (weak) classifiers. Comparisons of bagging and boosting techniques in the presence of noise and imbalanced data indicate that, for imbalanced data, boosting is a very effective method compared to bagging [21]. The U2P-Miner algorithm was implemented by Golpira [22] for assisting decision-making, and was utilized by Wu et al. [23] to identify critical energy-consuming sections of the urban rail operating system. Liu [24] proposed mining maximal frequent patterns from univariate uncertain data and generating summaries for frequent univariate uncertain patterns, owing to the enormous number of such patterns [25].


Fasihy and Shahraki [26] suggested mining maximal frequent patterns in an incremental way. Shao and Tziatzios [27] suggested mining range associations consisting of numerical data from processes in the univariate uncertain format. Xie et al. [28] proposed an uncertain kernel SVM (UKSVM), which retrieves a set of values from the interval of a univariate uncertain attribute by sampling the interval according to the underlying PDF; to classify the samples, SVMs with additive kernels, i.e., HIK, χ2, and Hellinger kernels, are used. Huang et al. [29] suggested a system that uses the XGBoost algorithm to take uncertain data and adopts the boosting approach to classify them. While UKSVM and XGBoost take sampled discrete data values, the AssoU2Classifier uses the more expressive continuous probability density function to create a classification model. The UPTAN algorithm has recently been developed by Gan et al. [30]; it uses a Bayesian belief network to classify uncertain itemset data, with the possibility of the occurrence of a transaction expressed in the uncertain tuple data by a probability. Tavakkol et al. [31] suggested the UKFDA and UFLDA algorithms, which adapt Fisher discriminant analysis (FDA) to classify uncertain data objects; the authors define covariance matrices as well as within-class and between-class scatter matrices for uncertain data objects, and for uncertain data the FDA discriminant performs better classification than other methods. In 2015, Zhang et al. [32] suggested a bidirectional active learning algorithm that explores the dataset in a two-way process (over both labelled and unlabelled data, i.e., forward and backward active learning), combining adaptive incremental learning with active learning. The suggested technique performs well, as evidenced by experimental findings on five synthetic data sets and two real data sets. However, the method guarantees only one label per instance, whereas multi-label classification would enable an instance to be attached to several labels. In 2016, [33] suggested a framework for active learning on data streams with drift: the approach measures the drift degree, calculates the selection likelihood of an incoming unlabelled data sample, and uses an adaptive incremental formula to take samples for active labelling, effectively combining unlabelled data samples, active learning, adaptive model updates, and drift detection. An instance weighting strategy was suggested in 2016 [34], which calculates the smallest weight associated with a new example, after which the classifier adjusts its prediction for that case. A structure for active learning requiring no label information was suggested in 2017 [35]: in this query-by-committee framework, a base-level decision tree is created without using any predefined data labels and is modified when new labelled data are added. Demir and Bruzzone [36] introduced an active learning approach for solving regression problems with small-scale learning data; in this study, a paradigm of ε-insensitive SVR was suggested, based on assessing the density and diversity of the samples in the process.


A method combining a semi-supervised active learning approach with an ambiguity criterion was suggested in [37]; in this active learning technique, the density of the selected examples is taken into account to avoid outliers. A batch-mode semi-supervised active learning algorithm with many classes, used to assess the ambiguity of data for visual recognition, was suggested in [38]; this technique can leverage ambiguity and evaluate the informativeness of the pool across different groups, but its classification efficiency varies depending on the pool sets and seed. In [39], a method of decision tree learning was presented that can decrease epistemic uncertainty by querying and managing the most valuable ambiguous data: evidential likelihood is used to derive entropy intervals for querying uncertain training events, while classical decision trees are used to manage values that are precisely known. In 2015, Reitmaier et al. [39] proposed a transductive active learning method in which each active learning period is called a data pool; when several labels are available during the active learning process, this approach builds a probabilistic generative model (PGM) that can be iteratively refined.

6 Problem Formulation

Schapire and Freund introduced the adaptive boosting (AdaBoost) algorithm. The main idea of AdaBoost is the repeated use of the same training data with weights, as an alternative to randomly drawing fresh samples; compared with other classification algorithms, AdaBoost is not a complicated or heavyweight technique. Rather than drawing a collection of independent bootstrap samples from the original instances, a weight is maintained for each case: the greater the weight, the more the example affects the trained classifier. At each trial, the weight vector is adjusted to reflect the performance of the corresponding classifier, increasing the weight of misclassified instances. The final classifier aggregates the learned classifiers by voting, but each classifier's vote is a function of its accuracy. Learning algorithms produce a prediction on the training data for each instance, and with suitable outcomes we can obtain the optimal prediction solution. Since boosting and bagging algorithms produce different predictions, many of them look remarkably similar and accurate when considering the training dataset. One could be chosen from all the available forecasts as the final prediction for an example, but the problem might be better solved by combining the available classifiers built on the training data. Thus, there is growing recognition that combinations of classifiers can be more powerful than single classifiers: why depend on the best single classifier when a combination achieves a more precise and specific outcome in many cases? This is essentially the logic behind ensembles of classifiers.

Bagging and boosting are applied to the same training data. We can use CART as the defined base learner owing to its simplicity of classification, although the decision tree also suffers from overfitting (a slight shift in the training pattern allows the established model to change tremendously). For each case, the different classifiers obtained from the different base learners are then combined using the stacking process. Let D be the dataset containing n records and m attributes over R^(m×n), where all m attributes are uncertain and given as [min, max] pairs. We calculate the probabilities of the attributes of each record in turn, apply bagging and boosting to enhance the efficiency of the decision tree, and perform the classification by applying the PDF distribution.
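A minimal from-scratch sketch of the weight-update loop just described is given below, assuming binary labels coded as -1/+1 and decision stumps as the weak learner; the helper names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)            # start with uniform instance weights
    stumps, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # classifier's voting weight
        # Increase the weight of misclassified instances for the next round.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Weighted vote: each stump votes in proportion to its accuracy.
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)
```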

7 Research Methodology

The general classification method requires a training set consisting of records with known class labels. The training data are used to construct a classification model, which is then applied to a test set of records with unknown class labels. The performance of the classification model is assessed from the numbers of test records correctly and incorrectly identified by the model. A learning algorithm is involved in each technique; the model developed by the learning approach should fit the class labels of the training data and should correctly predict the class labels of the records in the test dataset. In our classification method, we typically use the bagging and boosting algorithms to produce a well-defined training dataset that is given to the decision tree classifier, in which learning is carried out on the training dataset. Then, in the test process, we apply the test data without the class label information, and the class label is predicted. If the predicted class is the class to which the record actually belongs, the prediction is accurate; otherwise, it is incorrect. The proportion of accurate predictions on the test data is simply the accuracy on that test set. In our work, we extend the AdaBoost (adaptive boosting) algorithm proposed by Freund and Schapire. The main idea of AdaBoost is the repeated use of weights on the same training data as an alternative to randomly choosing fresh samples, since, compared to other classification algorithms, AdaBoost is not a complex and broad technique. It is noted that the performance of the ensemble is not greatly affected by the number of base-level classifiers, and typically researchers choose 3 or 7 depending on the type of application. As decision trees are built, if the concepts of linear regression are also followed, m regression equations are created for each of the m target groups; this definition is adopted in Quinlan's M5 algorithm. In the suggested approach, ensembling of the classifiers is performed with two separate methods used as classifiers: a decision tree for bagging and an artificial neural network (ANN) for boosting. The key part of the methodology is that bagging and boosting are treated independently, using two different methods for classifying the data or learning the dataset. The steps involved in performing the bagging and boosting are given below:

• Step processing of the proposed work


Step 1: Preprocessing of the dataset to be trained.
Step 2: Bagging and boosting techniques are applied separately on the dataset.
Step 3: Different learning algorithms are used for bagging and for boosting.
Step 4: For bagging, the classifier used is the decision tree; for boosting, the classifier used is the artificial neural network.
Step 5: The obtained results are then ensembled using meta decision trees (MDTs).
Step 6: The final prediction is made on the basis of the ensembled data.

A sketch of this pipeline is given after the following subsections.

• Using a decision tree as base classifier

A decision tree is a set of tree-structured decision tests that operate in a divide-and-conquer manner. A feature test, also called a split, is associated with each non-leaf node; data falling into the node are split into separate subsets according to their values on the feature test. A label is associated with each leaf node and is assigned to instances falling within that node. In prediction, a series of feature tests is performed starting from the root node, and the result is obtained when a leaf node is reached.

• Using an artificial neural network (ANN) as base classifier

Neural networks, also called artificial neural networks, emerged from the simulation of biological neural networks. A neural network's function is determined by the neuron model, the configuration of the network, and the learning algorithm. The neuron, the basic computational component of a neural network, is also called a unit (Fig. 1).
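The following is a minimal sketch of the step processing above, under stated assumptions: bagged decision trees for the bagging branch, a boosted multilayer perceptron (ANN) for the boosting branch, and a meta decision tree (MDT) stacking the two branch predictions. Since scikit-learn's MLPClassifier does not accept instance weights, the boosting branch here resamples the training set according to the current weights (boosting by resampling); the dataset and hyper-parameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Branch 1 (bagging): decision trees on bootstrap samples.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        random_state=0).fit(X_tr, y_tr)

# Branch 2 (boosting): resample by weight each round, up-weighting
# the instances the previous ANN misclassified.
rng = np.random.default_rng(0)
n = len(y_tr)
w = np.full(n, 1.0 / n)
anns = []
for _ in range(5):
    idx = rng.choice(n, size=n, p=w)
    ann = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000,
                        random_state=0).fit(X_tr[idx], y_tr[idx])
    miss = ann.predict(X_tr) != y_tr
    w = np.where(miss, w * 2.0, w)   # up-weight misclassified instances
    w /= w.sum()
    anns.append(ann)

def boost_predict(models, X):
    # Simple majority vote over the boosted ANNs (labels are 0/1).
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)

# Step 5: a meta decision tree learns from the two branch predictions.
meta_tr = np.column_stack([bag.predict(X_tr), boost_predict(anns, X_tr)])
mdt = DecisionTreeClassifier(max_depth=2).fit(meta_tr, y_tr)

meta_te = np.column_stack([bag.predict(X_te), boost_predict(anns, X_te)])
print("ensemble accuracy:", mdt.score(meta_te, y_te))
```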

Fig. 1 System architecture


8 Expected Outcome

Certain parameters are considered for the final analysis of the work, which will assess the classification of the dataset and the usefulness of the process addressed. In the evaluation, each condition is viewed as a single data point, and the parameters are tested for a single case in the dataset. The evaluation parameters considered include the root relative squared error, relative absolute error, mean absolute error, root mean squared error, etc. A confusion matrix is developed for the assessment of the task and represents the classification of the dataset. A confusion matrix is a table widely used to compare the output on a test dataset with the true values of the classification model; although the related terminology may be unclear, the confusion matrix itself is fairly easy to understand. The complete work is analyzed over a dataset with 400 samples and five fields: user ID, gender, age, approximate salary, and purchased. The work is validated in two steps, bagging and boosting, where bagging uses the random forest classification technique and boosting uses the ANN criteria for classifying the uncertain dataset (Figs. 2, 3, 4, 5, 6, and 7). As per the results presented in the graphs generated over the uncertain numerical dataset with its fields and records, the accuracy of classification via bagging is recorded at around 91%, whereas for boosting the accuracy of the system is recorded at about 93%. The research technique is also analyzed on three further parameters, mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE); the outcomes of the generated classifications for these parameters are stated in Table 1.
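A minimal sketch of computing these evaluation parameters, assuming scikit-learn; the label vectors are illustrative placeholders for a classifier's test-set output.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # ground-truth labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # classifier predictions

print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("accuracy:", accuracy_score(y_true, y_pred))
print("MAE:", mean_absolute_error(y_true, y_pred))
mse = mean_squared_error(y_true, y_pred)
print("MSE:", mse, "RMSE:", np.sqrt(mse))
```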

Fig. 2 System efficiency for classification using bagging for random forest classification technique


Fig. 3 Random forest classification for training data

Fig. 4 Random forest classification for testing data

9 Conclusion

Decision trees (DTs) show a great deal of promise in solving many pattern recognition problems and are well known for their visualization of performance data. The key feature of the DTC is its versatility, i.e., the ability to use different feature subsets and decision rules at different classification points and the ability to maintain a balance between classification precision and time/space efficiency. In the current work, the weighted learning technique is applied to the dataset, and the work operates over an uncertain numerical dataset. To ensemble the classifiers, the bagging and boosting techniques are used over the uncertain numerical data.


Fig. 5 Classification for boosting using ANN

Fig. 6 Classification using ANN for training dataset

The paper also presents the fundamentals of uncertain datasets, boosting, and bagging. In the current work, two different classification criteria are picked, bagging and boosting: bagging is performed with the random forest technique and is evaluated over certain parameters, while the next classification technique considered in the work is boosting, for which the ANN technique is used. Based on the results stated in the work, boosting has higher accuracy than the bagging classification for the uncertain numerical dataset.


Fig. 7 Classification using ANN for testing dataset

Table 1 Efficiency comparison for classification by bagging and boosting

Parameters   Bagging   Boosting
MAE          0.09      0.1205
MSE          0.09      0.0500
RMSE         0.3       0.2425
Accuracy     91.0%     93.0%

References

1. Aggarwal, C.C., Yu Philip, S.: A survey of uncertain data algorithms and applications. IEEE Trans. Knowl. Data Eng. 21(5), 609–623 (2008)
2. Aggarwal, C.C., Reddy, C.K.: Data Clustering, Algorithms and Applications (2014)
3. Zhang, X., Liu, H., Zhang, X.: Novel density-based and hierarchical density-based clustering algorithms for uncertain data. Neural Netw. 93, 240–255 (2017)
4. Liu, H., et al.: Self-adapted mixture distance measure for clustering uncertain data. Knowl.-Based Syst. 126, 33–47 (2017)
5. Trajcevski, G., et al.: Managing uncertainty in moving objects databases. ACM Trans. Database Syst. (TODS) 29(3), 463–507 (2004)
6. Deshpande, A., et al.: Model-based approximate querying in sensor networks. VLDB J. 14(4), 417–443 (2005)
7. Liu, X., et al.: A tractable probabilistic model for Affymetrix probe-level analysis across multiple chips. Bioinformatics 21(18), 3637–3644 (2005)
8. Sarma, A.D., et al.: Representing uncertain data: models, properties, and algorithms. VLDB J. 18(5), 989 (2009)
9. Jampani, R., et al.: MCDB: a Monte Carlo approach to managing uncertain data. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (2008)
10. Zhang, W., et al.: Managing uncertain data: probabilistic approaches. In: 2008 The Ninth International Conference on Web-Age Information Management. IEEE (2008)
11. Wang, Y., et al.: A survey of queries over uncertain data. Knowl. Inf. Syst. 37(3), 485–530 (2013)


12. Dallachiesa, M., Palpanas, T., Ilyas, I.F.: Top-k nearest neighbor search in uncertain data series. Proc. VLDB Endowm. 8(1), 13–24 (2014)
13. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
14. Schapire, R.E.: The strength of weak learnability. Mach. Learn. 5(2), 197–227 (1990)
15. Freund, Y.: Boosting a weak learning algorithm by majority. Inf. Comput. 121(2), 256–285 (1995)
16. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: ICML 96 (1996)
17. Breiman, L.: Arcing classifier (with discussion and a rejoinder by the author). Ann. Stat. 26(3), 801–849 (1998)
18. Freed, G.L., Kennard Fraley, J.: 25% "error rate" in ear temperature sensing device. Pediatrics 87(3), 414–415 (1991)
19. Wolfson, O., Yin, H.: Accuracy and resource consumption in tracking and location prediction. In: International Symposium on Spatial and Temporal Databases. Springer, Berlin (2003)
20. Liu, Y.H.: Mining frequent patterns from univariate uncertain data. Data Knowl. Eng. 71(1), 47–68 (2012)
21. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of the Very Large Data Base, pp. 487–499 (1994)
22. Wu, M., Wang, Y., Lin, S., Hao, B., Sun, P.: A U2P-miner-based method to identify critical energy-consuming parts of urban rail operation system. In: Proceedings of the 4th International Conference on Electrical and Information Technologies for Rail Transportation, pp. 245–255 (2019)
23. Liu, Y.H.: Generating summaries for frequent univariate uncertain pattern. NTU Manag. Rev. 27(2S), 29–62 (2017)
24. Liu, Y.H.: Mining maximal frequent U2 patterns from univariate uncertain data. Intell. Data Anal. 18, 653–676 (2014)
25. Fasihy, H., Shahraki, M.H.N.: Incremental mining maximal frequent patterns from univariate uncertain data. Knowl.-Based Syst. 152, 40–50 (2018)
26. Shao, J., Tziatzios, A.: Mining range associations for classification and characterization. Data Knowl. Eng. 118, 92–106 (2018)
27. Xie, Z., Xu, Y., Hu, Q.: Uncertain data classification with additive kernel support vector machine. Data Knowl. Eng. 117, 87–97 (2018)
28. Huang, J., Li, Y., Qi, K., Li, F.: An efficient classification method of uncertain data with sampling. In: Liang, Q., Liu, X., Na, Z., Wang, W., Mu, J., Zhang, B. (eds.) Communications, Signal Processing, and Systems. CSPS 2018. Lecture Notes in Electrical Engineering, vol. 516 (2018)
29. Gan, H., Zhang, Y., Song, Q.: Bayesian belief network for positive unlabeled learning with uncertainty. Pattern Recogn. Lett. 90, 28–35 (2017)
30. Tavakkol, B., Jeong, M.K., Albin, S.L.: Measures of scatter and Fisher discriminant analysis for uncertain data. IEEE Trans. Syst. Man Cybern. Syst. 99, 1–14 (2019)
31. Zhang, X.-Y., Wang, S., Yun, X.: Bidirectional active learning: a two-way exploration into unlabeled and labeled data set. IEEE Trans. Neural Netw. Learn. Syst. 26(12), 3034–3044 (2015)
32. Park, C.H., Kang, Y.: An active learning method for data streams with concept drift. In: IEEE International Conference on Big Data, pp. 746–752. IEEE (2016)
33. Bouguelia, M.-R., Belaïd, Y., Belaïd, A.: An adaptive streaming active learning strategy based on instance weighting. Pattern Recogn. Lett. 70, 38–44 (2016)
34. Dou, C., Sun, D., Li, G., Wong, R.K.: Active learning with density-initialized decision tree for record matching. In: 29th International Conference on Scientific and Statistical Database Management, p. 14. ACM (2017)
35. Demir, B., Bruzzone, L.: A multiple criteria active learning method for support vector regression. Pattern Recogn. 47(7), 2558–2567 (2014)
36. Hajmohammadi, M.S., Ibrahim, R., Selamat, A., Fujita, H.: Combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples. Inf. Sci. 317, 67–77 (2015)


37. Yang, Y., Ma, Z., Nie, F., Chang, X., Hauptmann, A.G.: Multi-class active learning by uncertainty sampling with diversity maximization. Int. J. Comput. Vis. 113(2), 113–127 (2015)
38. Ma, L., Destercke, S., Wang, Y.: Online active learning of decision trees with evidential data. Pattern Recogn. 52, 33–45 (2016)
39. Reitmaier, T., Calma, A., Sick, B.: Transductive active learning—a new semi-supervised learning approach based on iteratively refined generative models to capture structure in data. Inf. Sci. 293, 275–298 (2015)

A Comparative Study on Various Sharing Among Undergraduates in PreCovid-19 and Covid-19 Period Using Network Parameters V. G. Deepa, S. Aparna Lakshmanan, and V. N. Sreeja

Abstract One of the best approaches to analyzing social, cultural, economic, emotional, and intellectual components is to use the network parameters of social network analysis (SNA). Covid-19 is a devastating illness caused by a recently discovered coronavirus. It has totally changed the manner in which we approach life, particularly our habits, the workplace, the public transportation system, and not least the educational system: the institutional educational system gave way to an online education system. Here, we study various kinds of sharing, such as intellectual, financial, emotional, and information sharing, among undergraduates, modelling these relations as a directed graph. The concepts of density, reciprocity, geodesic distance, etc., are used to treat the directed graphs. The effects of gender, caste, locality, financial status, and intelligence quotient level on this sharing are examined for these periods.

Keywords Social network · Network parameters · Graph · Sharing · Covid-19

1 Introduction

Graphs arise naturally in numerous ways, as they are a helpful means of visualizing various situations or complex systems. Indeed, graphs are almost ubiquitous in science, as they can be used to represent aspects of a wide range of mathematical structures. Graphs are therefore also pervasive in many scientific fields, where they are often called networks. Technically, the two terms are interchangeable, but the term network is commonly used when considering social or technological connections. A social network is a graph consisting of nodes and edges used to represent social relations. SNA is the process of examining social structures

V. G. Deepa (B) · V. N. Sreeja
Sree Krishna College, Guruvayur, Kerala, India
S. A. Lakshmanan
Cochin University of Science and Technology, Kochi, Kerala, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_48


using graph theory. It characterizes networked structures in terms of vertices, such as individual actors within the network, and edges or links, which indicate the relationships or associations between them. For more details, one can refer to [1, 2]. Friendship can benefit you by lifting your self-esteem, encouraging you to live healthier, or even simply improving your outlook. Individuals who consider themselves insiders in their circle of friends are happier than those who consider themselves outsiders. Being socially engaged prompts more positive emotions, which may actually help the body's immune system and lessen the physical signs of stress. It is likewise important to be a good friend to yourself, providing others with as many of the advantages of friendship as you can; it feels good to help others [3, 4]. Covid-19 is a disease caused by a coronavirus. The best approach to prevent and slow transmission is to be well informed about the virus, the illness it causes, and how it spreads [5]. This pandemic period completely changed the approach to life, especially habits, the working atmosphere, the public transportation system, and not least the educational system: the institutional educational system gave way to an online education system. From the vast world of the classroom, each student was confined to the internet. During the teenage years, a good friendship can benefit you by lifting your self-esteem, urging you to live healthier, or even simply improving your outlook. It is interesting to see how much these changes in the educational system affect students' friendships and the various kinds of sharing between them. It was in this context that we decided to conduct a comparative study of a classroom network using social network analysis (SNA) tools during the Covid-19 period and the period prior to it. Section 1.1 describes the objective of the study, Sect. 2 contains the methodology adopted, Sect. 3 includes the results and discussion, and finally the conclusion is given in Sect. 4.

1.1 Objective of the Study

Through this paper, we study the effect of the coronavirus disease on students in a classroom network. We attempt to understand the various kinds of sharing, such as emotional, intellectual, information, and financial sharing, among 40 undergraduates in a classroom network during the precovid-19 and Covid-19 periods. We also examine the effects of gender, caste, locality, financial status, and intelligence quotient level on this sharing.

2 Social Network Data

We considered a population of 40 teenage students in a classroom in Kerala for this social network study. A questionnaire was prepared, and the data were collected individually; the methodology considered is similar to the works in [6, 7]. The individuals are labeled 1, 2, …, 40. The collected data contain information on emotional sharing, intellectual sharing, information sharing, and financial sharing for each student i, and whom they approach for each, in the precovid-19 and Covid-19 periods. This situation can be described with the help of a directed graph G = (V, E), where the vertex set V is the set of 40 students and (u1, v1) is a directed edge in E if u1 approaches v1. The sociograms obtained in this manner are given in Figs. 1, 2, 3, 4, 5, 6, 7 and 8. According to Mark Newman, the frequency of loops of length two is measured by the reciprocity, which tells you how likely it is that a node you point to also points back at you [8].

Fig. 1 Intellectual sharing during precovid-19 period

Fig. 2 Intellectual sharing during Covid-19 period

Fig. 3 Information sharing during precovid-19 period

Fig. 4 Information sharing during Covid-19 period


Fig. 5 Emotional sharing during precovid-19 period

Fig. 6 Emotional sharing during Covid-19 period


Fig. 7 Financial sharing during precovid-19 period

Fig. 8 Financial sharing during Covid-19 period

In a directed network, suppose there exists a directed link from node i to node j and also from node j to node i; then we say that the two edges reciprocate each other. Such pairs of edges are known as co-links [9]. The fraction of reciprocated edges is called the reciprocity, r: in a directed network, r measures the probability that linked vertices are mutually linked,

$$r = \frac{L^{\leftrightarrow}}{L} \tag{1}$$

where $L^{\leftrightarrow}$ is the number of links pointing in both directions and L is the total number of links.

$$r = \begin{cases} 1, & \text{for a purely bidirectional network} \\ 0, & \text{for a purely unidirectional network} \end{cases}$$

Usually, the reciprocity r lies between 0 and 1 for social networks of real situations. We can measure two types of reciprocity: arc reciprocity and dyad reciprocity [10]. The arc reciprocity in a directed network is defined as

$$\text{Arc reciprocity} = \frac{\text{no. of reciprocated ties}}{\text{no. of reciprocated ties} + \text{no. of unreciprocated ties}} \tag{2}$$

and the dyad reciprocity is calculated as

$$\text{Dyad reciprocity} = \frac{\text{no. of reciprocated ties}}{\text{no. of reciprocated ties} + 2 \times \text{no. of unreciprocated ties}} \tag{3}$$

In a graph, a basic measure of the distance between two nodes is the shortest path between them; this measure is known as the geodesic. In other words, in a social network, the geodesic distance between two actors is the length, in number of ties, of the shortest path between the corresponding actors [10]. The average path length is a concept in social networks defined as the average number of steps along the shortest paths over all possible pairs of network vertices [11]. It is a measure of the efficiency of information flow on a network [11]. The average or mean distance in a social network is defined as the average length of all shortest paths between all pairs of connected vertices in the graph. The average path length gives a sense of how strong the relations in a community are. The greater the average path length, the more people in the social network do not directly know one another; individuals may be connected through a friend of a friend of a friend, but not through a short path. If it is low, most people know each other either directly or through a common friend. For undirected or strongly connected directed networks, the average graph distance is

$$d = \frac{1}{n \cdot (n-1)} \sum_{\substack{u_1, v_1 \in V \\ u_1 \neq v_1}} d(u_1, v_1) \tag{4}$$

where n is the cardinality of V. We also have $d \leq \frac{n+1}{3}$ for connected graphs. The average graph distance is a characteristic measure of compactness. The greatest distance between a pair of nodes is called the diameter of the graph: in a graph G = (V, E), the diameter is the largest distance between any two connected nodes, that is,

$$D = \max\{d(u_1, v_1) : u_1, v_1 \in V\} \tag{5}$$


If a graph has a diameter, then there exists another related measure, the radius: the minimum among all of the greatest distances from a node to all remaining nodes is called the radius of the graph. Graph density is another characteristic of a graph, showing how interconnected the nodes of a social network are; it lies between 0 and 1. For a connected undirected graph, the graph density is the fraction of the total number of ties over the maximum number of possible ties. The reciprocity of a network is positively related to its density.
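A minimal sketch, assuming the networkx library, of computing the parameters defined above on a toy directed "approaches" graph; the edge list is illustrative, not the study's questionnaire data.

```python
import networkx as nx

# (u, v) means student u approaches student v for some kind of sharing.
edges = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 2), (1, 4)]
G = nx.DiGraph(edges)

print("reciprocity r:", nx.reciprocity(G))          # Eq. (1)
print("density:", nx.density(G))

# Geodesic (shortest-path) distances between all reachable pairs.
lengths = dict(nx.all_pairs_shortest_path_length(G))
print("geodesic 1 -> 3:", lengths[1][3])

if nx.is_strongly_connected(G):
    print("average path length:",
          nx.average_shortest_path_length(G))        # Eq. (4)
    print("diameter:", nx.diameter(G))               # Eq. (5)
    print("radius:", nx.radius(G))
```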

3 Results and Discussion

3.1 Intellectual Sharing, Financial Sharing, Emotional Sharing and Information Sharing in the PreCovid-19 Period

Friendship is a basic relationship throughout life. 'Friendship is generally characterized as an intentional, dyadic connection between two people that incorporates a reciprocal fondness where companions take part in mutually shared activities' [12]. Close friendships during the teenage period are especially formative, because they provide the setting for learning the personal relationship skills necessary to build ties with individuals outside the family. It is a period of psychological development and expanded freedom and independence from guardians. As per the methodologies explained earlier, from the directed graphs (Figs. 1, 3, 5 and 7) we found the results given in Table 1.

Table 1 Network parameters of the sharing networks during the precovid-19 period

S. No.  Parameter                Intellectual  Information  Emotional  Financial
1       Reciprocity (arc)        0             0.617678     0.542636   0.297872
2       Reciprocity (dyad)       0             0.453125     0.37234    0.177215
3       Geodesic distance (max)  6             14           15         10
4       Average graph distance   2.0037        2.43718      3.37535    3.2618
5       Graph diameter           6             5            8          8
6       Graph radius             0             1            0          0
7       Density                  0.0487179     0.179487     0.0826923  0.0596154

When we closely inspect the four sharing networks during the precovid-19 period, it is clear that locality is the most powerful factor, while gender, financial


status, and intelligence quotient level have no role in this sharing. Caste affects intellectual sharing, financial sharing, and emotional sharing during this period.

3.2 Intellectual Sharing, Financial Sharing, Emotional Sharing and Information Sharing During the Covid-19 Period

The pandemic situation obliged educational institutions all over the world to shut down their campuses indefinitely and move their educational activities onto online platforms. The institutions were not prepared for such a change, and their web-based teaching-learning process progressed gradually. This pandemic has prompted a far-reaching adoption of online education, and the lessons we learn now will be useful later on. According to the study conducted by Pinaki Chakraborty et al., 65.9% of students felt that they learn better in physical classrooms than through online education [13]. The pandemic and the lockdown to contain it have influenced the psychological well-being of people all around the world, and many students are experiencing stress and anxiety [14, 15]. As per the methodologies explained earlier, during the Covid-19 period, from the graphs (Figs. 2, 4, 6 and 8) we observed the results given in Table 2.

When we go through the four sharing networks during the Covid-19 period, we can see that only gender and intelligence quotient level have a small effect on the sharing: gender affects information sharing, and intelligence quotient level affects intellectual sharing during the Covid-19 period. Locality, financial status, and caste have no role in this sharing.

Table 2 Network parameters of the sharing networks during the Covid-19 period

S. No.  Parameter                Intellectual  Information  Emotional  Financial
1       Reciprocity (arc)        0.358209      0.369748     0.0364286  0.173913
2       Reciprocity (dyad)       0.218182      0.226804     0.226087   0.0952381
3       Geodesic distance (max)  18            21           8          5
4       Average graph distance   2.85642       2.49321      3.22653    2.6997
5       Graph diameter           7             5            8          6
6       Graph radius             1             1            0          0
7       Density                  0.128846      0.152564     0.0897436  0.442308


3.3 A Comparative Study of Various Sharing Among Undergraduates in the PreCovid-19 and Covid-19 Periods Using Network Parameters

When we compare the sharing between undergraduates during the coronavirus period with the pre-coronavirus period, it is very intriguing to see that the intellectual sharing network is much stronger during the coronavirus period. Even though the classroom atmosphere changed completely, the density of the information network and the emotion network is practically the same. Yet the average graph distance of the financial network is considerably smaller compared with the precovid-19 period. Locality is the most influential factor during the precovid-19 period, and caste also plays an inevitable role in sharing, whereas these two elements have no role during the Covid-19 period. We can also see that financial status does not affect these sharing networks in either period. A detailed report on the effect of gender, caste, locality, financial status, and intelligence quotient level on these four sharing networks during the precovid-19 and Covid-19 periods is given in Table 3.

4 Conclusion

The Covid-19 pandemic has prompted the adoption of online education on a huge scale in place of institutional training. Even once we are free from the threat of this pandemic, there is a possibility that the changes adopted in the educational field will remain, so the purely institutional system may become a blended learning system in the future. The relation between sharing individual data and relationship development has been established both offline and online. Information sharing is the act of specific entities (for example, people) passing information from one to another; traditional data sharing referred to one-to-one exchanges of information between teenagers. By socially sharing their experiences, individuals can positively adjust their emotional impression of those experiences. When you are well connected, you become less stressed and more focused, which can make your academic goals more achievable; friends can be incredible study partners who help you learn and understand material. When the world faced the coronavirus pandemic, the educational system faced numerous troubles; it was difficult to move all academic activities online right away. Yet the students later coped with the issue, and we happily found that their intellectual sharing network is much stronger than in the previous period. The emotional sharing of teenagers is a great challenge, though it is not affected by the system (institutional/online). Since the online framework does not offer teenagers the opportunity to meet one another, the financial sharing network is significantly smaller during the coronavirus period.

Table 3 Comparison of the effect of gender, caste, locality, financial status and intelligence quotient level on the four sharing networks (rows 1-5: gender, caste, locality, financial status, intelligence quotient level; columns: intellectual, information, emotional and financial sharing, during the precovid-19 period and during the Covid-19 period)


Social media permits young people to create online identities, communicate with others, and build interpersonal networks. These platforms can expose teenagers to current events, permit them to collaborate across geographic barriers, and teach them about an assortment of subjects, including healthy behavior. Along with this, the information sharing networks show no noticeable impact over the period of study.

References

1. Scott, J.: Social Network Analysis: A Handbook, 2nd edn. Sage Publications, London (2000)
2. Freeman, L.C.: The Development of Social Network Analysis: A Study in the Sociology of Science. Empirical Press, Vancouver (2004)
3. Buhrmester, D.: Intimacy of friendship, interpersonal competence, and adjustment during preadolescence and adolescence. Child Dev. 61, 1101–1111 (1990)
4. Crosnoe, R.: Friendships in childhood and adolescence: the life course and new directions. Soc. Psychol. Q. 63, 377–391 (2000)
5. http://www.who.org
6. Deepa, V.G., Aparna Lakshmanan, S., Sreeja, V.N.: Centrality and reciprocity in directed social networks—a case study. Malaya J. Mat. S(1), 479–484 (2019). https://doi.org/10.26637/MJM0S01/0086
7. Deepa, V.G., Aparna Lakshmanan, S., Sreeja, V.N.: The role of social factors in education: a case study in social network perspective. In: Peng, S.L., et al. (eds.) Computing and Network Sustainability, Lecture Notes in Networks and Systems, vol. 75, pp. 61–72. Springer Nature Singapore Pte Ltd. (2019). https://doi.org/10.1007/978-981-13-7150-9_7
8. Newman, M.: Networks, 3rd edn. Oxford University Press (2018)
9. Newman, M.: Networks, 2nd edn. Oxford University Press (2010)
10. http://www.socnetv.org
11. Cui, Y., Chen, X., Li, X.: Notice of retraction: the topology analyze of blogosphere through social network method. In: Seventh International Conference on Natural Computation (2011)
12. Flynn, H.K.: Friendships of Adolescence (2018). https://doi.org/10.1002/9781405165518.wbeosf073.pub2
13. Chakraborty, P., Mittal, P., Gupta, M.S., Yadav, S., Arora, A.: Opinion of students on online education during the COVID-19 pandemic. Hum. Behav. Emerg. Tech. 1–9 (2020). https://doi.org/10.1002/hbe2.240
14. Cao, W., Fang, Z., Hou, G., Han, M., Xu, X., Dong, J., Zheng, J.: The psychological impact of the COVID-19 epidemic on college students in China. Psychiatry Res. 287, 112934 (2020)
15. Islam, M.A., Barna, S.D., Raihan, H., Khan, M.N.A., Hossain, M.T.: Depression and anxiety among university students during the COVID-19 pandemic in Bangladesh: a web-based cross-sectional survey. PLoS One 15(8), e0238162 (2020)

Adoption of Smart Agriculture Using IOT: A Solution for Optimal Soil Decision Tree Making C. Sathish and K. Srinivasan

Abstract Agricultural farming is the essential source of work for around half or more of our nation's population. Harvesting data for 2019–2020 show that food grain production was estimated to reach a record 295.67 million tonnes (MT); for 2020–21, the Ministry of Agriculture is targeting food production of 298 MT. Our nation has the largest livestock population, around 535.78 million, which represents approximately 31% of the world total. To fulfil this demand, farmers and farming organizations are turning to the Internet of Things for analytics and greater production capabilities. The Internet of Things (IoT) is set to push the future of farming to the next level. Smart agriculture is already becoming more commonplace among farmers, and high-tech farming is quickly becoming the standard thanks to agricultural mechanization and sensors. With the increase in demand and the need for sustainable agriculture, it is becoming truly vital for farmers and the related stakeholders to invest heavily in information and in more refined machines and devices.

Keywords Smart agriculture · Sensors · Internet of Things (IoT) · Innovative cultivation · Livestock

1 Introduction

Smart farming [1] refers to a flexible new way of managing farms using modern information and communication technology to increase the quantity and quality of agricultural products, improve efficiency and productivity, and reduce costs. By utilizing the capabilities of the Internet of Things (IoT) [2], smart farming can use technology comprising sensors, software, connectivity, location data, robotics, and data analytics to make farming processes data-driven and data-enabled.

C. Sathish (B)
Periyar University, Salem, India
K. Srinivasan
Periyar University Constituent College of Arts and Science, Pennagaram, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_49


With IoT, farmers can monitor field conditions without going to the field and make strategic decisions for the whole farm, a selected crop, or even a single plant (or animal), if necessary. According to estimates from the UN Food and Agriculture Organization, farmers across the globe will have to produce 70% more food in 2050 than they do now to satisfy the anticipated population growth. The use of smart technology will enable the intelligent farm to improve its operational performance by analysing the data gathered and acting on it in ways that increase productivity or streamline operations. Smart farming is part of the Agri-Tech industry: the food and agriculture enterprise constitutes over 15% of global GDP ($7.8 trillion) and employs almost 40% of the global workforce. Smart farming [3] is an agricultural management concept essentially dependent on monitoring [4–6] and responding to variability in the ecosystem and in field conditions; variability in plantation and soil is the big issue. Under such a chain of undesirable effects, conventional agriculture can hardly withstand future challenges to achieve higher agricultural productivity. To accomplish this, resources must be properly managed with the new technique of smart farming (SF). SF is viewed as the best direction for better food quality, farming efficiency, and optimal utilization of natural resources. First, mapping the various soils, yields, plants, and environmental effects within a zone or a greenhouse creates 'information overload' for the farmer. Second, data acquisition for soil, yield, plant, and environmental factors is available but highly expensive and labour-intensive, because most of these measurements require soil sampling and laboratory analysis [7]. These days, prediction and classification problems are effectively handled through machine learning (ML) techniques. The introduction of ML techniques into the field of agriculture substantially reduces the difficulties faced by domain experts. Various studies have applied ML approaches to identify and resolve soil problems in agriculture, such as the prediction of soil fertility, the distribution of optimal nutrient and water levels, and many others. Soil testing is a significant tool for assessing the available nutrient status of the soil and makes it possible to determine the proper quantity of nutrients to be added to a given soil based on its fertility and crop needs. In the present study, sensor values are used for the evaluation of soil for an optimal yield of crops. This study aims to analyse the soil and to make a better decision about soil nutrients and other parameters [6]. To make a better decision, we focus on a neural network; considering all the parameters and using the ID3 algorithm, the work offers better decisions on soil. In the current agricultural scenario, an early prediction of soil status using machine learning would be valuable for farmers to establish a better environment for subsequent cultivation.


2 Methodology

Some popular machine learning techniques are used here. A supervised machine learning algorithm is used to construct the classes, establish relations, and form a proper model structure in the shape of a tree, known as a decision tree. A decision tree is a tree in which each:

• Node represents a feature (attribute)
• Branch represents a decision (rule)
• Leaf represents an outcome (categorical or continuous)

Numerous algorithms exist for building decision trees; here the ID3 algorithm is used. ID3 stands for Iterative Dichotomiser 3. It is a classification algorithm that follows a greedy approach, selecting as the best attribute the one that produces the maximum Information Gain (IG), or equivalently the minimum Entropy (H). Entropy is a measure of the amount of uncertainty in the dataset S. The mathematical representation of entropy is

$$H(S) = \sum_{c \in C} -p(c)\,\log_2 p(c)$$

where

• S is the current dataset for which entropy is being calculated (it changes at each iteration of the ID3 algorithm);
• C is the set of classes in S, for instance C = {yes, no};
• p(c) is the proportion of the number of elements in class c to the number of elements in set S.

In the ID3 algorithm, entropy is calculated for every remaining attribute. The attribute with the smallest entropy is used to split the set S at that particular iteration. Entropy = 0 implies the set is of a pure class, meaning that all elements belong to the same class. Information Gain IG(A) tells us how much the uncertainty in S was reduced after splitting set S on attribute A. The mathematical representation of information gain is

$$IG(A, S) = H(S) - \sum_{t \in T} p(t)\,H(t)$$

where

• H(S) is the entropy of set S;
• T is the set of subsets created by splitting set S on attribute A, such that $S = \bigcup_{t \in T} t$;
• p(t) is the proportion of the number of elements in t to the number of elements in set S;
• H(t) is the entropy of subset t.


In the ID3 algorithm, information gain can be calculated (instead of entropy) for every remaining attribute; the attribute with the largest information gain is used to split the set S at that specific iteration.
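A minimal sketch of the entropy and information-gain computations defined above, on a small hypothetical categorical dataset; the attribute names and labels are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = sum over classes c of -p(c) * log2 p(c)."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(records, labels, attribute):
    """IG(A, S) = H(S) - sum over subsets t of p(t) * H(t)."""
    n = len(labels)
    gain = entropy(labels)
    # Split S into subsets T according to the attribute's values.
    for value in set(r[attribute] for r in records):
        subset = [lab for r, lab in zip(records, labels)
                  if r[attribute] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Toy soil records: ID3 would split on the attribute with the largest IG.
records = [{"moisture": "low", "ph": "acidic"},
           {"moisture": "low", "ph": "neutral"},
           {"moisture": "high", "ph": "neutral"},
           {"moisture": "high", "ph": "acidic"}]
labels = ["poor", "good", "good", "poor"]

for attr in ("moisture", "ph"):
    print(attr, "IG =", round(information_gain(records, labels, attr), 3))
```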

3 Related Work

3.1 Soil Parameters

Nutrients are most important for efficient plant growth. In soils, the nutrient levels are classified into micro nutrients and macro nutrients [8]. Micro nutrients include Chlorine (Cl), Manganese (Mn), Copper (Cu), Boron (B), Zinc (Zn), etc. Macro nutrients include Nitrogen (N), Phosphorus (P), and Potassium (K), commonly represented as the NPK nutrients. Macro nutrients are more important for the plant growth stages: a plant takes 80% of its needs from macro nutrients and 20% from micro nutrients. Plants fail to take up the important nutrients when the fertilizer levels are not properly adjusted (Fig. 1).

3.2 Soil Report Analysis In general, soil take a look at document will generally offer values which include pH [11–13], percentage natural matter (NM), phosphorus, and potassium; however, other statistics may be extraordinarily precious for better information of the soil’s ability to maintain and supply vitamins, as well as the soil fertility stability. Fig shows the pattern soil test file [14, 15] (Table 1). Cation Exchange Capacity (CEC) and Base Saturation (BS) are very vital values on a soil test report. For proper fertility plan, these facts are most important. CEC is

Adoption of Smart Agriculture Using IOT: A Solution …

541

Soil Analysis for an Optimal Yield SOIL PARAMETERS

NUTRIENTS

PROPERTIES

SOIL MOISTURE

MACRO NUTRIENTS

PHYSICAL PROPERTIES

NITROGEN (N)

HORIZONATION

HUMIDTY

PHOSPHOROUS (P)

SOIL COLOR

TEMPRATURE

POTTASIUM (K)

SOIL TEXTURE

SULPHUR (S) CALCIUM (Ca)

SAND

MAGNESIUM (Mg) SILT

MICRO NUTRIENTS CLAY ZINC (Zn)

CHEMICAL PROPERTIES

COPPER (Cu) IRON (Fe)

CATION EXC. CAPACITY

MANGANESE (Mn) SOIL REACTION (pH) BORON (B) CHLORIDE (CI-)

Fig. 1 Soil analysis for an optimal yield Table 1 Sample soil test report Sample

Ph

P*

K*

OM%

Ca*

Mg*

CEC

Micro nutrients* S

Zn

B

Fe

Mn

Cu

1

6.5

48.0

416

3.2

6.111

735

20.33

16

1.3

1.5

120

15

1.7

2

6.2

78.0

472

3.4

5.376

777

23.02

17

0.7

1.0

147

12

1.8

3

6.3

64.0

388

3.2

5.229

1134

20.46

20

0.7

0.9

102

20

1.6

4

6.3

76.0

508

3.4

5.733

1155

26.00

25

0.7

1.0

135

13

2.1

5

5.8

62.0

462

3.4

5.062

1029

24.57

15

0.9

0.8

117

15

1.5

6

6.2

84.0

464

3.7

6.615

756

22.49

14

0.8

0.9

141

16

1.9

7

6.1

62.0

496

3.7

6.216

1071

26.50

16

1.1

0.8

133

18

1.7

8

6.2

52.0

424

3.6

6.258

750

21.40

14

1.2

0.9

98.5

12

1.5

9

6.3

72.0

572

3.9

6.972

861

21.66

17

0.9

1.0

117

13

2.1

10

6.1

44.0

392

3.2

5.964

903

23.20

13

0.8

1.2

108

11

1.9

*

lbs/ac

542

C. Sathish and K. Srinivasan

commonly well understood by many farmers because it is strongly correlated with soil texture (the concentrations of clay, silt, and sand). Furthermore, the amount of clay, together with the percentage of OM, has a large effect on CEC. Soils with high CEC (~20+) typically have better water-retaining ability, clay content, OM, and nutrient-supplying power. Soils with low CEC (

 > 0.0026). Training: (0.005 > 0.0026). Social networks: (0.005 > 0.0026). Risks and uncertainty: (0.012 > 0.0026). Organization size: (0.131 > 0.0026). Competitive pressure: (0.009 > 0.0026).

Although this t-test of the experts' opinions indicated that six factors have no statistically significant impact on private technology organizations' decisions, findings from previous studies showed that two of these factors have a major influence on adopting OGD: competitive pressure, and risks and uncertainty [12, 127]. Consequently, the competitive pressure and the risks and uncertainty factors will be kept in the proposed model, and the other four factors will be removed. Table 3 provides the confirmation of each factor in the model after the results, based on the recommendations of the expert interviews. Other factors influencing the use of OGD that most participants mentioned (availability, feedback, response time, metadata, licensing policy, and top management support) already exist in the proposed model (Sect. 4).
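The reported comparisons of the form "p > 0.0026" suggest per-factor significance tests against a Bonferroni-style corrected alpha. Below is a minimal sketch of such a test, assuming (this is an assumption, not the paper's stated procedure) one-sample t-tests of expert Likert ratings against a neutral midpoint via scipy; the ratings are hypothetical.

```python
from scipy import stats

alpha_corrected = 0.0026     # e.g. 0.05 divided by the number of tests
neutral_midpoint = 3.0       # midpoint of a 1-5 Likert scale

ratings = {  # hypothetical expert ratings for two factors
    "Competitive pressure": [4, 3, 4, 2, 3, 4, 3, 3],
    "Organization size":    [3, 3, 2, 3, 4, 3, 3, 3],
}

for factor, xs in ratings.items():
    t, p = stats.ttest_1samp(xs, neutral_midpoint)
    verdict = "significant" if p < alpha_corrected else "not significant"
    print(f"{factor}: t = {t:.2f}, p = {p:.4f} ({verdict})")
```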


Table 3 Confirmation of the model

Supply side
  System quality:       Perceived ease of use (Confirmed); Availability (Confirmed); Response time (Confirmed)
  Information quality:  Data integration (Suggested by the experts); Metadata (Confirmed); Licensing policy (Confirmed); Data governance (Suggested by the experts)
  Support service:      Awareness (Confirmed); Feedback (Confirmed)

Demand side
  Technology context:   Perceived benefits (Confirmed); Risks and uncertainty (Retuning); Organization's technology readiness (Confirmed)
  Organization context: Top management support (Confirmed)
  External factors:     Competitive pressure (Retuning); Political leadership (Confirmed); Culture (Confirmed); Open data intermediaries (Confirmed)

It is interesting to note that availability, feedback, and metadata were the biggest concerns for most organizations participating in this study and were the main reasons behind their decisions not to adopt OGD. Additionally, the licensing policy factor has an important impact on OGD adoption. Three participants stressed that culture is an important factor in a successful adoption decision process, as highlighted by the experts in the following comments.

Expert D: Disseminating a culture (data-driven decision making), which means relying on data in decision-making; the dissemination of this culture and practice in government agencies, by publishing dashboards and measuring performance indicators, will make the agency's public servants recognize the value and importance of data.

Expert C: The majority of government agencies collect data for administrative and financial control, not for the production of detailed data that can be published as raw data. Even with this type of data, officials are unwilling to publish government agency data for fear of criticism or raising questions about the data.

Expert H: I think there is still a reluctance to release government data as open data, perhaps because the practice of sharing government data is a new exercise and needs some time to mature. Perhaps more progress will be made following the


adoption of the freedom of information system, which has already been discussed in the Consultative Assembly of Saudi Arabia.

Another interesting point made by some participants is that, in some cases, they take advantage of OGD by hiring third-party services. This finding indicates the impact of the open data intermediaries factor, even among private technology organizations.

Expert G: In some cases, we employ a third party to obtain the advantages of OGD due to a lack of appropriate data analysis software in our organization or a lack of trained human resources to perform a particular type of analysis.

Expert H: In terms of the open data intermediaries, it is a normal practice that we periodically collaborate with other third parties when faced with obstacles to the processing of complex data sets or dealing with poor data quality.

In addition, the experts recommended some other significant factors, most of which have already been discussed in the proposed model, such as licensing policy, risks and uncertainty, and competitive pressure. The first open-ended question asked of the experts was, "In your opinion, what other important factors need to be considered when an organization intends to adopt and use OGD?" This question seeks to identify, from the experts' perspective, other factors concerning organizations' adoption of OGD in Saudi Arabia. The influential factors suggested by the experts that may impact organizations' intention to use OGD are data governance and data integration.

6.1 Data Governance

A number of experts emphasized the importance of data governance, mentioning that it is essential to provide a framework to all government agencies and their staff to determine which data should be published as OGD, to strengthen relevant legislation, and to establish effective oversight mechanisms to enhance transparency and increase the impact of published data. Some quotations from experts are as follows:

Expert B: Lack of quality and clear data governance. Also, some data sets are inconsistent with the governance of similar government agency data sets.

Expert D: In my opinion, government agencies need to implement data governance that guarantees to maintain the privacy of data and protect particularly sensitive data. On the other hand, the establishment of a robust data governance system will protect the responsibility of any employee for the release of data sets (meaning that the decision to make data sets open or not open will be subject to the governance document, i.e., which data sets are published as OGD and which are not).


Expert G: The existence of effective data governance increases confidence in published data and reduces data errors, in addition to enhancing the quality of interaction between the private sector and government agencies in order to achieve common objectives. This view was also reported by experts H, K, and L.

6.2 Data Integration

Expert G: I think it is very important to strengthen data integration with other related data sets, such as the integration of data on commercial records from the Ministry of Commerce and Industry with spatial locations from the Ministry of Municipal and Rural Affairs. With a lack of government data integration, it is difficult to work with OGD and this, therefore, reduces the possibility of developing new products on the basis of published data.

Expert F expressed the same view in almost identical terms.

Experts A, C, and K: Three of the participants noted that each government agency is responsible for its own data, with almost no integration between data sets that share a specific domain. This makes it very difficult to compare data sets to ensure data reliability and quality, which in turn reduces the opportunities to benefit from them.

Table 3 presents the confirmation status of each factor in the model after the results based on the recommendations from the expert interviews.

7 Conclusion

OGD refers to government data that is publicly available for various stakeholder groups (citizens, private organizations, journalists, researchers, entrepreneurs, etc.) to use and reuse to derive higher returns for the national economy by creating new and innovative products and services or enhancing existing ones, as well as by boosting the transparency of government data and enabling citizens to assess the performance of government sectors. However, the adoption of OGD is not straightforward; data users, such as private technology organizations, face several obstacles. Therefore, this study investigates how to encourage organizations to adopt OGD and explores the factors that may influence an organization's intention to adopt it.


The expert interview approach was used in this study to confirm the factors in the model identified by the literature review. In this exploratory study, semi-structured interviews were used to collect data from fourteen owners or senior managers of different private technology organizations, as well as from data experts, in Saudi Arabia. Participants were considered experts if they had at least five years of industry experience in private technology organizations or at least five years of data expertise. The findings showed that six of the nineteen proposed factors are not statistically significant (data visualization, training, social networks, risks and uncertainty, organization size, and competitive pressure). Although the experts' opinions indicated that these six factors have no statistically significant impact on private technology organizations' decisions, findings from previous studies showed that two of them, competitive pressure and risks and uncertainty, have a major influence on adopting OGD. Furthermore, the experts suggested two factors that may influence private technology organizations to adopt OGD, namely data integration and data governance. This research is ongoing, with the aim of developing a model for OGD adoption among Saudi private technology organizations. Future work will validate the findings using self-administered questionnaires with data experts at Saudi private technology organizations. Further results will be published soon.

References

1. Ziegler, L.D.: Computational journalism: shaping the future of news in a big data world. In: Journalism and Ethics, eBook, p. 14. IGI Global, Florida (2019) 2. Reis, J.R., Viterbo, J., Bernardini, F.: A rationale for data governance as an approach to tackle recurrent drawbacks in open data portals. In: Proceedings of the 19th Annual International Conference on Digital Government Research: Governance in the Data Age (dgo '18). ACM International Conference Proceeding Series, pp. 1–9 (2018) 3. Kalampokis, E., Zeginis, D., Tarabanis, K.: On modeling linked open statistical data. J. Web Semant. 55, 1–13 (2018). https://doi.org/10.1016/j.websem.2018.11.002 4. Wirtz, B.W., Weyerer, J.C., Rösch, M.: Citizen and open government: an empirical analysis of antecedents of open government data. Int. J. Public Adm. 41, 308–320 (2018). https://doi.org/10.1080/01900692.2016.1263659 5. Zhao, Y., Fan, B.: Exploring open government data capacity of government agency: based on the resource-based theory. Gov. Inf. Q. 35, 1–12 (2018). https://doi.org/10.1016/j.giq.2018.01.002 6. Wang, H.J., Lo, J.: Factors influencing the adoption of open government data at the firm level. IEEE Trans. Eng. Manag. 67, 670–682 (2019). https://doi.org/10.1109/TEM.2019.2898107 7. Alawadhi, N., Al Shaikhli, I., Alkandari, A., Kalaie Chab, S.: Business owners' feedback toward adoption of open data: a case study in Kuwait. J. Electr. Comput. Eng. 2021, 1–9 (2021). https://doi.org/10.1155/2021/6692410 8. Safarov, I., Meijer, A., Grimmelikhuijsen, S.: Utilization of open government data: a systematic literature review of types, conditions, effects and users. Inf. Polity 22, 1–24 (2017). https://doi.org/10.3233/IP-160012 9. Saxena, S.: Open government data (OGD) in six Middle East countries: an evaluation of the national open data portals. Digit. Policy Regul. Gov. 20, 310–322 (2018). https://doi.org/10.1108/DPRG-10-2017-0055


10. Aziz, B.: Towards open data-driven evaluation of access control policies. Comput. Stand. Interfaces 56, 13–26 (2018). https://doi.org/10.1016/j.csi.2017.09.001 11. Zainal, N.Z., Hussin, H., Nazri, M.N.M.: A trust-based conceptual framework on open government data potential use. In: Proceedings—International Conference on Information and Communication Technology for the Muslim World 2018, ICT4M 2018. Pp. 156–161 (2018) 12. Smith, G., Sandberg, J.: Barriers to innovating with open government data: exploring experiences across service phases and user types. Inf. Polity 23, 249–265 (2018). https://doi.org/ 10.3233/IP-170045 13. Talukder, M.S., Shen, L., Hossain Talukder, M.F., Bao, Y.: Determinants of user acceptance and use of open government data (OGD): an empirical investigation in Bangladesh. Technol. Soc. 56, 147–156 (2018). https://doi.org/10.1016/j.techsoc.2018.09.013 14. Kitsios, F., Kamariotou, M.: Open data hackathons: an innovative strategy to enhance entrepreneurial intention. Int. J. Innov. Sci. 10, 519–538 (2018). https://doi.org/10.1108/IJIS06-2017-0055 15. Davies, T., Walker, S., Rubinstein, M.: The state of open data: histories and horizons, eBook. International Development Research Centre, Ottawa (2019) 16. Zuiderwijk, A., Shinde, R., Janssen, M.: Investigating the attainment of open government data objectives: is there a mismatch between objectives and results? Int. Rev. Adm. Sci. (2018). https://doi.org/10.1177/0020852317739115 17. Gonzalez-Zapata, F., Heeks, R.: The multiple meanings of open government data: understanding different stakeholders and their perspectives. Gov. Inf. Q. 32, 441–452 (2015). https:// doi.org/10.1016/j.giq.2015.09.001 18. Boudreau, C.: Reuse of open data in Quebec: from economic development to government transparency. Int. Rev. Adm. Sci. (2020). https://doi.org/10.1177/0020852319884628 19. Anshari, M., Almunawar, M.N., Lim, S.A.: Big data and open government data in public services. ACM Int Conf Proceeding Ser 140–144 (2018). https://doi.org/10.1145/3195106. 3195172 20. McBride, K., Toots, M., Kalvet, T., Krimmer, R.: Leader in e-government, Laggard in open data: Exploring the case of Estonia. Rev. Fr d’Administration Publique 167, 613–625 (2018). https://doi.org/10.3917/rfap.167.0613 21. Castelnovo, W.: Unlocking the value of public sector personal information through coproduction. In: Lecture Notes in Information Systems and Organisation, pp. 379–391. Springer, Heidelberg (2020) 22. Crusoe, J.: Why is it so challenging to cultivate open government data? : Understanding impediments from an ecosystem perspective. Linköping University Electronic Press (2019) 23. Davies, T.: Open data barometer—global report. World Wide Web Found Second Ed 1–62 (2015). https://doi.org/10.1177/2043820613513390 24. Heeks, R.: Building e-governance for development : a framework for national and donor action. Institute for Development Policy and Management, Manchester (2001) 25. Said, J., Omar, N., Janssen, M., Sohag, K.: The diffusion of ICT for corruption detection in open government data. Knowl. Eng. Data Sci. 2, 10–18 (2019). https://doi.org/10.17977/um0 18v2i12019p10-18 26. Barry, E., Bannister, F.: Barriers to open data release: a view from the top. Inf. Polity 19, 129–152 (2014). https://doi.org/10.3233/IP-140327 27. Beno, M., Figl, K., Umbrich, J., Polleres, A.: Perception of key barriers in using and publishing open data. eJournal eDemocracy Open Gov. 9, 134–165 (2017). https://doi.org/10.29379/ jedem.v9i2.465 28. 
Conradie, P., Choenni, S.: On the barriers for local government releasing open data. Gov. Inf. Q. 31, S10–S17 (2014). https://doi.org/10.1016/j.giq.2014.01.003 29. Ma, R., Lam, P.T.I.: Investigating the barriers faced by stakeholders in open data development: a study on Hong Kong as a “smart city.” Cities 92, 36–46 (2019). https://doi.org/10.1016/j. cities.2019.03.009


30. Steven, D., Gao Tianpeng, P.: A critical analysis of the guarantee mechanism of the United Kingdom and Australian Health Sector on open data. Int. J. Manag. Stud. Soc. Sci. Res. 2 (2020) 31. Saxena, S., Muhammad, I.: Barriers to use open government data in private sector and NGOs in Pakistan. Inf. Discov. Deliv. 46, 67–75 (2018). https://doi.org/10.1108/IDD-05-2017-0049 32. Vostrovský, V., Tyrychtr, J., Kvasniˇcka, R.: Open data quality management based on iso/iec square series standards in intelligent systems. In: Advances in Intelligent Systems and Computing, pp. 625–631. Springer, Berlin (2020) 33. Ramos, E.F.: Open data development of countries: global status and trends. In: Proceedings of the 2017 ITU Kaleidoscope Academic Conference: Challenges for a Data-Driven Society, ITU K 2017. Institute of Electrical and Electronics Engineers Inc., Nanjing, pp 1–8 (2017) 34. Husin, N.N.F.A., Zakaria, N.H., Dahlan, H.M.: Factors influencing open data adoption in Malaysia based on users perspective. In: International Conference on Research and Innovation in Information Systems, ICRIIS. IEEE Computer Society (2019) 35. Wilson, B., Cong, C.: Beyond the supply side: Use and impact of municipal open data in the U.S. Telemat Inf. 58, 101526 (2021). https://doi.org/10.1016/j.tele.2020.101526 36. Gao, Y., Janssen, M., Zhang, C.: Understanding the evolution of open government data research: towards open data sustainability and smartness. Int. Rev. Adm. Sci, 002085232110099 (2021). https://doi.org/10.1177/00208523211009955 37. DePietro, R., Wiarda, E., Fleischer, M.: The context for change: organization, technology and environment, 4th edn. Lexington Books, New York, New York, USA (1990) 38. Baker, J.: The technology–organization–environment framework. In: Dwivedi, Y.K., Michael, R.W., Schneberger, S.L. (eds.) Information Systems Theory Explaining and Predicting Our Digital Society, 2012th edn., p. 529. Springer, US, New York, New York, USA (2012) 39. Alqahtani, N.: Identifying the critical factors that impact on the Development of Electronic Government using TOE Framework in Saudi E-Government Context: A Thematic Analysis. De Montfort University (2016) 40. Delone, W.H.D., Mclean, E.R.M.: The DeLone and McLean model of information systems success: a ten-year update. J. Manag. Inf. Syst. 19, 9–30 (2003). https://doi.org/10.1080/074 21222.2003.11045748 41. DeLone, W.H., McLean, E.R.: Information Systems Success Measurement (2016) 42. Petter, S., McLean, E.R.: A meta-analytic assessment of the DeLone and McLean IS success model: an examination of IS success at the individual level. Inf. Manag. 46, 159–166 (2009). https://doi.org/10.1016/J.IM.2008.12.006 43. Alzahrani, A.I., Mahmud, I., Ramayah, T., Alfarraj, O. and Alalwan, N.: Modelling digital library success using the DeLone and McLean information system success model. J.Librarianship Inf. Sci. 51(2), 291–306 (2019) 44. Flack, C.K.: IS Success Model for Evaluating Cloud Computing for Small Business Benefit: A Quantitative Study. Kennesaw State University (2016) 45. Williams, M.D., Rana, N., Dwivedi, Y.K.: Information Systems Theory Explaining and Predicting Our Digital Society, 2012th edn. Springer, US, New York (2011) 46. Scott, W.R.: Institutions and organizations ideas, interests, and identities. SAGE Publications, California, FOURTH (2013) 47. Wang, H.J., Lo, J.: Adoption of open government data among government agencies. Gov. Inf. Q. 33, 80–88 (2016). https://doi.org/10.1016/j.giq.2015.11.004 48. 
Temiz, S., Brown, T.: Open data innovation, what are the main issues/challenges for open data projects in Sweden: an abstract. In: Krey, N., Patricia, R. (eds.) Back to the Future: Using Marketing Basics to Provide Customer Value, pp. 217–218. Springer, Cham (2017) 49. Susha, I., Grönlund, A., Janssen, M.: Driving factors of service innovation using open government data: an exploratory study of entrepreneurs in two countries. Inf. Polity 20, 19–34 (2015). https://doi.org/10.3233/IP-150353 50. Bin, G.: A reasoned action perspective of user innovation: model and empirical test. Ind. Mark. Manag. 42, 608–619 (2013). https://doi.org/10.1016/J.INDMARMAN.2012.10.001


51. Cruz, R., Lee, H.J.: A Socio-Technical Model for Open Government Data Research Investigating Value-Generating Open Government Data View Project (2016). https://doi.org/10. 14329/apjis.2016.26.3.339 52. DeLone, W.H., McLean, E.R.: Information systems success: the quest for the dependent variable. Inf. Syst. Res. 3 (1992). https://doi.org/10.1287/isre.3.1.60 53. Davis, F.D.: perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Q. 13, 319 (1989). https://doi.org/10.2307/249008 54. Furner, J.: Definitions of “Metadata”: a brief survey of international standards. J. Assoc. Inf. Sci. Technol. 71, E33–E42 (2019). https://doi.org/10.1002/asi.24295 55. Mosley, M., Brackett, S.E.: DAMA-DMBOK: Data Management Body of Knowledge, 1st edn. Technics Publications, Westfield (2009) 56. Pitt, L.F., Watson, R.T., Kavan, C.B.: Service quality: a measure of information systems effectiveness. MIS Q. 19, 173 (1995). https://doi.org/10.2307/249687 57. Wang, C., Teo, T.S.H.: Online service quality and perceived value in mobile government success: an empirical study of mobile police in China. Int. J. Inf. Manag. 52, 102076 (2020). https://doi.org/10.1016/j.ijinfomgt.2020.102076 58. Bauer, R.A.: Consumer behavior as risk taking, 1st edn. Division of Research, Graduate School of Business Administration, Harvard University (1967) 59. Lee, G., Xia, W.: Organizational size and IT innovation adoption: a meta-analysis. Inf. Manag. 43, 975–985 (2006). https://doi.org/10.1016/j.im.2006.09.003 60. Parasuraman, A., Colby, C.L.: Techno-Ready Marketing: How and Why Your Customers Adopt Technology. Free Press (2007) 61. Ghobakhloo, M., Arias-Aranda, D., Benitez-Amado, J.: Adoption of e-commerce applications in SMEs. Ind. Manag. Data. Syst. 111, 1238–1269 (2011). https://doi.org/10.1108/026355711 11170785 62. Hofstede, G.: Culture’s consequences : international differences in work-related values. Sage Publicatio (1984) 63. Alzahrani, A.I: Web-based e-Government services acceptance for G2C: A structural equation modelling approach (2011) 64. Elkaseh, A.M., Wong, K.W., Fung, C.C.: Perceived ease of use and perceived usefulness of social media for e-learning in libyan higher education: a structural equation modeling analysis. Int. J. Inf. Educ. Technol. 6, 192–199 (2016). https://doi.org/10.7763/ijiet.2016.v6.683 65. Hamid, A.A., Razak, F.Z.A., Bakar, A.A., Abdullah, W.S.W.: The effects of perceived usefulness and perceived ease of use on continuance intention to use E-Government. Procedia Econ. Financ. 35, 644–649 (2016). https://doi.org/10.1016/S2212-5671(16)00079-4 66. Berends, J., Carrara, W., Radu, C.: Analytical report 9: the economic benefits of open data. Eur. Data Portal, 12, 3 (2017) 67. Matheus, R., Janssen, M.: Transparency dimensions of big and open linked data. In: Conference on e-Business, e-Services and e-Society (I3E). Springer, Cham, Delft, pp. 236–246 (2015) 68. Brugger, J., Fraefel, M., Riedl, R., et al. Current barriers to open government data use and visualization by political intermediaries. In: Proceedings of the 6th International Conference for E-Democracy and Open Government, CeDEM 2016. Institute of Electrical and Electronics Engineers Inc., Krems, Austria, pp. 219–229 (2016) 69. Böhm, C., Schmidt, M., Freitag, M., et al.: GovWILD: Integrating open government data for transparency Christoph. In: Proceedings of 21st International Conference on Companion World Wide Web—WWW ’12 Companion, pp. 321–324 (2012). https://doi.org/10.1145/218 7980.2188039 70. 
Janssen, M., Charalabidis, Y., Zuiderwijk, A.: Benefits, adoption barriers and myths of open data and open government. Inf. Syst. Manag. 29, 258–268 (2012). https://doi.org/10.1080/ 10580530.2012.716740 71. Alexopoulos, C., Diamantopoulou, V., Charalabidis, Y.: Tracking the evolution of OGD portals: a maturity model. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer, Berlin, pp 287–300 (2017)


72. AlSukhayri, A.M., Aslam, M.A., Saeedi, K., Malik, M.S.A.: A linked open data-oriented sustainable system for transparency and open access to government data: a case study of the public’s response to women’s driving in Saudi Arabia. Sustainability 12, 8608 (2020). https:// doi.org/10.3390/su12208608 73. Mulder, A.E., Wiersma, M.G., Van Loenen, B.: Status of national open spatial data infrastructures: a comparison across Continents. Int. J. Spat. Data Infrastruct. Res. 15, 56–87 (2020). https://doi.org/10.2902/1725-0463.2020.15.art3 74. Gao, Y., Janssen, M., Zhang, C.: Understanding the evolution of open government data research: towards open data sustainability and smartness. Int. Rev. Adm. Sci. (2021). https:// doi.org/10.1177/00208523211009955 75. Chatfield, A.T., Reddick, C.G.: The role of policy entrepreneurs in open government data policy innovation diffusion: an analysis of Australian Federal and State Governments. Gov. Inf. Q. (2017). https://doi.org/10.1016/j.giq.2017.10.004 76. Quarati, A., De Martino, M., Rosim, S.: Geospatial open data usage and metadata quality. ISPRS Int. J. Geo-Inf. 10, 30 (2021). https://doi.org/10.3390/ijgi10010030 77. Jung, K., Park, H.W.: A semantic (TRIZ) network analysis of South Korea’s ‘Open Public Data’ policy. Gov. Inf. Q. 32, 353–358 (2015). https://doi.org/10.1016/j.giq.2015.03.006 78. Nugroho, R.P., Zuiderwijk, A., Janssen, M., de Jong, M.: A comparison of national open data policies: Lessons learned. Transform Gov. People Process Policy 9, 286–308 (2015). https:// doi.org/10.1108/TG-03-2014-0008 79. Maccani, G., Donnellan, B., Helfert, M. Adoption of open data case study adoption of open government data for commercial service innovation: an inductive case study on parking open data services. In: 23rd Americas Conference on Information Systems (AMCIS 2017). Association for Information Systems (AIS), Boston, Massachusetts, p. 10 (2017) 80. Wang, D., Richards, D., Chen, C.: An analysis of interaction between users and open government data portals in data acquisition process. In: Yoshida, K.L.M. (ed.) Knowledge Management and Acquisition for Intelligent Systems, pp. 184–200. Springer, Cham (2018) 81. Athmay AlAA, A.L., Fantazy, K., Kumar, V.: E-government adoption and user’s satisfaction: an empirical investigation. EuroMed. J. Bus. 11, 57–83 (2016). https://doi.org/10.1108/EMJB05-2014-0016 82. Hermanto, A., Solimun, S., Fernandes, A.A.R., et al.: The importance of open government data for the private sector and NGOs in Indonesia. Digit Policy Regul. Gov. 20, 293–309 (2018). https://doi.org/10.1108/DPRG-09-2017-0047 83. Temiz, S.: Open data and innovation adoption: lessons from Sweden (Doctoral dissertation, KTH Royal Institute of Technology), (2018) 84. Pitt, L.F., Watson, R.T., Kavan, C.B.: Service quality: a measure of information systems effectiveness. MIS Q. Manag. Inf. Syst. 19, 173–185 (1995). https://doi.org/10.2307/249687 85. Zuiderwijk, A., Gascó, M., Parycek, P., Janssen, M.: Special Issue on Transparency and Open Data Policies: Guest Editors´ Introduction. J. Theor. Appl. Electron. Commer. Res. 9, i–ix (2014). https://doi.org/10.4067/s0718-18762014000300001 86. Link, G.J.P., Lumbard, K., Conboy, K., et al.: Contemporary issues of open data in information systems research: Considerations and recommendations. Commun. Assoc. Inf. Syst. 41, 587– 610 (2017). https://doi.org/10.17705/1cais.04125 87. Nikiforova, A., McBride, K.: Open government data portal usability: A user-centred usability analysis of 41 open government data portals. 
Telemat Inf. 58, 101539. https://doi.org/10.1016/ j.tele.2020.101539 (2021) 88. Zuiderwijk, A., Janssen, M.: Barriers and development directions for the publication and usage of open data: a socio-technical view. In: Gascó-Hernández M (ed.) Open Government Opportunities and Challenges for Public Governance, eBook. Springer, New York, New York, USA, p. 402 (2014) 89. Aboelmaged, M.G.: Predicting e-readiness at firm-level: an analysis of technological, organizational and environmental (TOE) effects on e-maintenance readiness in manufacturing firms. Int. J. Inf. Manag. 34, 639–651 (2014). https://doi.org/10.1016/j.ijinfomgt.2014.05.002


90. Styrin, E., Luna-Reyes, L.F., Harrison, T.M.: Open data ecosystems: an international comparison. Transform Gov. People Process Policy 11, 132–156 (2017). https://doi.org/10.1108/TG01-2017-0006 91. Lassila-Perini, K., Lange, C., Jarrin, E.C., Bellis, M.: Using CMS Open data in research— challenges and directions. In: proceedings of the 25th International Conference on Computing in High Energy and Nuclear Physics (vCHEP 2021). Cornell University, pp. 1–12 (2021) 92. Pfeffer, J., Zorbach, T., Carley, K.M.: Understanding online firestorms: negative word-ofmouth dynamics in social media networks. J. Mark. Commun. 20, 117–128 (2014). https:// doi.org/10.1080/13527266.2013.797778 93. Cordasco, G., De Donato, R., Malandrino, D., et al.: Engaging citizens with a social platform for open data. In: ACM International Conference Proceeding Series. Association for Computing Machinery, pp. 242–249 (2017) 94. Kitsios, F.: Open Data Hackathons: a strategy to increase innovation in the city. In: Proceedings of International Conference for Entrepreneurship, Innovation and Regional Development (ICEIRD 2017). Thessaloniki, pp. 231–238 (2018) 95. Kumar, A., Krishnamoorthy, B.: business analytics adoption in firms: a qualitative study elaborating TOE framework in India. Int. J. Glob. Bus. Compet. 15, 80–93 (2020). https:// doi.org/10.1007/s42943-020-00013-5 96. Alhujaylan, A., Car, L., Ryan, M.: An investigation of factors influencing private technology organizations’ intention to adopt open government data in Saudi Arabia. In: 2020 10th Annual Computing and Communication Workshop and Conference, CCWC 2020. Institute of Electrical and Electronics Engineers Inc., pp 654–661 (2020) 97. Alderete, M.V.: Towards measuring the economic impact of open data by innovating and doing business. Int. J. Innov. Technol. Manag. 17 (2020). https://doi.org/10.1142/S02198770 20500224 98. Kucera, J., Chlapek, D.: Benefits and risks of open government data. J. Syst. Integr. 5, 30–41 (2014) 99. Lin C-S (2006) Organizational, technological, and environmental determinants of electronic commerce adoption in small and medium enterprises in Taiwan. Lynn University 100. Parasuraman, A., Colby, C.L.: Techno-ready marketing: how and why your customers adopt technology, 1st edn. The Free Press, New York, New York, USA (2007) 101. Toufani, S., Montazer, G.A.: E-publishing readiness assessment in Iranian publishing companies. Electron. Libr. 29, 470–487 (2011). https://doi.org/10.1108/02640471111156740 102. Khayer, A., Talukder, M.S., Bao, Y., Hossain, M.N.: Cloud computing adoption and its impact on SMEs’ performance for cloud supported operations: a dual-stage analytical approach. Technol. Soc. 60, 101225 (2020). https://doi.org/10.1016/j.techsoc.2019.101225 103. Abed, S.S.: Social commerce adoption using TOE framework: an empirical investigation of Saudi Arabian SMEs. Int. J. Inf. Manag. 53, 102118 (2020). https://doi.org/10.1016/j.ijinfo mgt.2020.102118 104. Yao, J.E., Liu, C., Xu, X., Lu, J.: Organizational size: A significant predictor of it innovation adoption. J. Comput. Inf. Syst. 43, 76–82 (2016). https://doi.org/10.1080/08874417.2003.116 47088 105. Nier, R.D.J., Wahab, S.N., Daud, D.: A qualitative case study on the use of drone technology for stock take activity in a third-party logistics firm in Malaysia. In: IOP Conference Series: Materials Science and Engineering. Institute of Physics Publishing, p. 062014 (2020) 106. Grange, C., Pinsonneault, A.: The responsible adoption of (highly) automated decisionmaking systems. 
In: Proceedings of the 54th Hawaii International Conference on System Sciences. Hawaii International Conference on System Sciences (2021) 107. Ismail, W.N.S.W., Ali, A.: Conceptual model for examining the factors that influence the likelihood of computerised accounting information system (CAIS) adoption among Malaysian SMES. Int. J. Inf. Technol. Bus. Manag. 15, 122–151 (2013) 108. Ezzaouia, I., Bulchand-Gidumal, J.: Factors influencing the adoption of information technology in the hotel industry. An analysis in a developing country. Tour. Manag. Perspect. 34, 100675 (2020). https://doi.org/10.1016/j.tmp.2020.100675


109. Yoon, C., Lim, D., Park, C.: Factors affecting adoption of smart farms: The case of Korea. Comput. Human Behav. 108:106309 (2020). https://doi.org/10.1016/j.chb.2020.106309 110. Hainia, S.I., Nor, N.Z., Norziha, N.M., Ibrahima, R.: Factors influencing the adoption of open government data in the public sector: A systematic literature review. Int. J. Adv. Sci. Eng. Inf. Technol. 10, 611–617 (2020). https://doi.org/10.18517/ijaseit.10.2.9488 111. Hossain, M., Chan, C.: Open data adoption in Australian government agencies: an exploratory study. In: ACIS 2015 Proceedings—26th Australasian Conference on Information Systems. Association for Information Systems (2016) 112. Haini, S.I., Zairah, N., Rahim, A., et al.: Adoption of open government data in local government context: conceptual model development. In: Proceedings of the 2019 5th International Conference on Computer and Technology Applications. ACM, New York, NY, USA, pp 193–198 (2019) 113. Khurshid, M.M., Zakaria, N.H., Rashid, A., et al.: Modeling of open government data for public sector organizations using the potential theories and determinants-a systematic review. Informatics 7, 3–24 (2020). https://doi.org/10.3390/INFORMATICS7030024 114. Walker, J., Simperl, E.: Analytical Report 10: Open Data and Entrepreneurship. Eur. Data Portal, (2018) 115. Haini, S.I., Rahim, Z.A., Megat, N., Zainuddin, M.: A conceptual model of open government data adoption for local authorities in Malaysia. Open Int. J. Inf. 7, 87–98 (2019) 116. Lee, D.: Building an open data ecosystem-an Irish experience. In: Proceedings of the 8th International Conference on Theory and Practice of Electronic Governance. ACM, New York, NY, USA, pp. 351–360 (2014) 117. Berends, J., Carrara, W., Vollers, H.: Analytical Report 5: Barriers in working with Open Data. Eur. Data Portal, (2017) 118. Khurshid, M.M., Zakaria, N.H., Arfeen, M.I., et al.: An intention-adoption behavioral model for Open Government Data in Pakistan’s Public Sector Organizations—an exploratory study. In: IFIP Advances in Information and Communication Technology. Springer Science and Business Media Deutschland GmbH, pp. 377–388 (2020) 119. Khurshid, M.M., Zakaria, N.H., Rashid, A., et al.: Modeling of open government data for public sector organizations using the potential theories and determinants-a systematic review. Informatics 7, 24 (2020). https://doi.org/10.3390/INFORMATICS7030024 120. Kovaˇci´c, Z.J.: The impact of national culture on worldwide eGovernment readiness. Informing Sci J 8, 143–158 (2005) 121. Mohtaramzadeh, M., Ramayah, T., Jun-Hwa, C.: B2B E-commerce adoption in Iranian Manufacturing Companies: analyzing the moderating role of organizational culture. Int J Hum Comput Interact 34, 621–639 (2018). https://doi.org/10.1080/10447318.2017.1385212 122. Salehan, M., Kim, D.J., Lee, J.N.: Are there any relationships between technology and cultural values? A country-level trend study of the association between information communication technology and cultural values. Inf. Manag. 55, 725–745 (2018). https://doi.org/10.1016/j.im. 2018.03.003 123. Sunny, S., Patrick, L., Rob, L.: Impact of cultural values on technology acceptance and technology readiness. Int. J. Hosp. Manag. 77, 89–96 (2019) 124. Huijboom, N., Van Den Broek, T.: Open data: an international comparison of strategies. Eur J ePractice 12:1–13 (2011). 1988-625X 125. Gurstein, M.B.: Open data: Empowering the empowered or effective data use for everyone? First Monday 16 (2011). https://doi.org/10.5210/fm.v16i2.3316 126. 
Van Schalkwyk, F., Cañares, M., Chattapadhyay, S., Andrason, A.: Open Data Intermediaries in Developing Countries (2015) 127. Sin, K.Y., Osman, A., Salahuddin, S.N., et al.: Relative advantage and competitive pressure towards implementation of e-commerce: overview of small and medium enterprises (SMEs). Procedia Econ. Financ. 35, 434–443 (2016). https://doi.org/10.1016/S2212-5671(16)00054-X

IoT-Based Horticulture Monitoring System Monika Rabka, Dion Mariyanayagam, and Pancham Shukla

Abstract With climate change and global warming in mind, vertical farms, hydroponics and urban greenhouses can now be found in many cities worldwide as we transform the ways we produce food. Additionally, recent implications of the COVID-19 pandemic show that, as a society, we can harness the benefits of remote monitoring and automation for controlled-environment agriculture and horticulture. The subject matter of this paper is the implementation of a solar-powered, Internet of Things (IoT)-based Real-time Autonomous Horticulture Monitoring System (RAHMS). The RAHMS integrates a mobile application for viewing the greenhouse crop data and a camera feed of the plants, and interacts with cloud services such as Firebase and MATLAB ThingSpeak for scalability. In particular, a simple and distinctive design of a solar-powered, low-energy, inexpensive greenhouse monitoring system is presented. The paper outlines the RAHMS design methodology and showcases a proof-of-concept prototype with its core hardware and software components. The proposed system has the potential to further advance the practical aspects of remote solutions for the cultivation and monitoring of horticulture and controlled-environment agriculture. Keywords Internet of Things (IoT) · Cloud · Smart system · Horticulture · Greenhouse monitoring · Image processing

M. Rabka (B) · D. Mariyanayagam · P. Shukla
Department of Communications Technology and Mathematics, London Metropolitan University, London, UK
P. Shukla e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_68

1 Introduction

In the times of climate change and global warming, alongside global population growth, which is expected to rise by 2.4 billion by 2050 [1], increasing food security has become an important area of research for scientists worldwide. With each


degree rise in temperature, it is estimated that we will lose 10% of existing agricultural land. Controlled-environment agriculture (CEA), with vertical farms and urban greenhouses, is being encouraged as a sustainable response to the increased demand for food. CEA also reduces food wastage and production losses as it eliminates the pests and pathogens found in outdoor farms. Additionally, this form of food production reduces the carbon footprint caused by transportation, as CEA farms are mainly placed in and around cities [2, 3]. Wider adoption of indoor farms could free up agricultural land that could in turn be reverted to forest, reducing greenhouse gas emissions in the atmosphere [4]. Remote monitoring and control of these farms is the obvious next step in trying to tackle the issues at hand. This is where the Internet of Things (IoT) comes into the fold. IoT is a network of Internet-connected devices that collect user-accessible data using sensors and processors and transfer it over wired or wireless networks. As of 2019, Cisco estimated that approximately 31 billion IoT devices were connected to the Internet, and that number will rise exponentially over the next decade [5]. Integrating these low-cost and low-powered devices into CEA will help make farmers' daily work more manageable by reducing the time needed for physical monitoring of the crops. As part of this research, several similar systems in the area of agricultural monitoring and control have been evaluated. The key elements in these systems are sensors, microprocessors and actuators. In the literature review, we carefully examined the features, strengths, and weaknesses of such systems in comparison to the proposed Real-time Autonomous Horticulture Monitoring System (RAHMS). Our evaluation of the relevant recent work in this area is summarised in Table 1. The works presented in [6] and [7] mainly focus on image processing for greenhouse crops. Our work on RAHMS focuses on adapting and enhancing features from all of these studies in a low-cost, solar-powered and cloud-enabled prototype.

2 Research Method

2.1 Design of the System

The top-down design of the Real-time Autonomous Horticulture Monitoring System (RAHMS) is based on a client–server networking architecture, as shown in the block diagram of Fig. 1. The microcontrollers (ESP32 MCUs) used in this project are inexpensive units commonly used in industry. In the context of the system, the RAHMS client consists of a standalone camera server (ESP32-CAM) and a sensor board which collects and sends the sensor data to the server. The server (Server ESP32 MCU) transfers the received data over the Internet to cloud APIs, where this data is then processed.
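The server's cloud-facing step can be illustrated with ThingSpeak's public REST update endpoint (the write key, field mapping and reading values below are placeholders; the actual firmware runs on the ESP32, so this Python sketch only mirrors its logic):

```python
import urllib.parse, urllib.request

THINGSPEAK_WRITE_KEY = "XXXXXXXXXXXXXXXX"   # hypothetical channel write key

def push_reading(air_temp_c: float, humidity_pct: float, co2_ppm: float) -> str:
    """Forward one set of sensor values to a ThingSpeak channel."""
    params = urllib.parse.urlencode({
        "api_key": THINGSPEAK_WRITE_KEY,
        "field1": air_temp_c,    # fields as configured in the channel
        "field2": humidity_pct,
        "field3": co2_ppm,
    })
    with urllib.request.urlopen(f"https://api.thingspeak.com/update?{params}") as resp:
        return resp.read().decode()   # entry id, or "0" if the update was rejected

print(push_reading(21.4, 58.0, 612.0))
```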


Table 1 Evaluated studies

"IOT based environment change monitoring and controlling in greenhouse using WSN" [8] (2018)
• Usage of Raspberry Pi 3 as a gateway with Arduino Uno nodes for sensor data collection
• Theoretical control of the watering system, fans, light sliders and heater
• Usage of basic sensors such as the LM35 temperature sensor, humidity, soil moisture and Light Dependent Resistor (LDR)

"IoT based automated greenhouse monitoring system" [9] (2018)
• Full utilisation of the fairly expensive Raspberry Pi for gathering sensor data, control of the actuators and connection to the Internet
• Usage of basic sensors such as the YL69 moisture sensor and DHT11 temperature and humidity sensor
• Integration of cloud computing in the form of MATLAB's ThingSpeak open-source API
• The authors propose the creation of a custom website/application for better visualisation of data. The RAHMS project improves on this study by creating an asynchronous website along with a bespoke mobile application

"Design of intelligent greenhouse environment monitoring system based on ZigBee and embedded technology" [10] (2014)
• Focus on the ZigBee implementation rather than the whole system's functionality
• Usage of local storage (SQLite3 database) along with an LCD screen for local display of gathered data
• Usage of an ARM Micro2440 core board with a Samsung S3C2440 microcontroller unit

"IOT based greenhouse environment monitoring and controlling system using arduino platform" [11] (2017)
• Stands out with the use of GSM to control the greenhouse via SMS
• Usage of basic sensors such as the DHT11, soil moisture and an LDR
• The RAHMS project improves on this study with the use of more advanced technologies such as Wi-Fi and 4G/LTE

"Secured IoT based smart greenhouse system with image inspection" [12] (2020)
• The strongest of all the above-researched studies
• Usage of the 32-bit MSP432 MCU for sensor data collection and a Raspberry Pi for the camera functionality and encryption/decryption of the data
• Implementation of image inspection of the plants using image segmentation and classification with the Open-Source Computer Vision (OpenCV) library
• Addition of a CO2 sensor is an advantage as CO2 is an integral part of CEA


Fig. 1 Block diagram of the RAHMS, where the MCUs and servers are ESP32 microcontrollers and the database is the Firebase Real-time Database

To achieve an autonomous system, the RAHMS client is powered from a solar power bank, and the server receiving the sensor data is powered separately at 5 V. In this proof-of-concept implementation, in order to access the photos from the camera server MCU in ThingSpeak and the Android application, the tunnelling service NGROK was used to expose the local server to the public network.

2.2 Electronic Components

The sensors used in the RAHMS implementation are shown in Table 2. These sensors were chosen based on the requirement to monitor the essential parameters of controlled-environment agriculture. The photosynthesis process, which produces oxygen and makes green plants grow, is commonly described by the chemical Formula (1).

Table 2 Sensors used in the RAHMS

Dallas Semiconductor DS18B20: contact digital waterproof thermometer
Bosch Sensortec BME280: combined air temperature, humidity and pressure digital sensor
AMS CCS811: CO2 and total volatile compound levels digital metal-oxide sensor
FC28: resistive soil moisture sensor
GL55 light dependent resistor: lux sensor


Fig. 2 Proof-of-concept implementation of the RAHMS client

6CO2 + 6H2O → C6H12O6 + 6O2    (1)

Hence, it was important to include soil moisture (water), CO2, and light level sensors. Additionally, soil temperature and air humidity, pressure and temperature sensors were added to monitor the overall conditions in the greenhouse. Too high or too low a temperature can damage the crop, whereas high air humidity can protect crops from heat damage. Figure 2 shows the 3D printed casing with the custom-made PCB and the RAHMS client components.

2.3 Networking

The RAHMS uses the client–server model for networking purposes. Figure 3 shows a top-down network diagram of the RAHMS. The camera server connects directly to the Internet, while the sensor board client (or clients) connects to the Server ESP32 MCU to transfer the sensor values to the cloud APIs.

Fig. 3 The RAHMS network diagram


3 Testing, Results and Discussion

3.1 Testing

The solar power bank used for system testing contains a 26,800 mAh battery. From the measured current consumption, we estimate that the power bank can support the client for up to 4 days of continuous operation. To lower the power consumption, sleep mode could be implemented in the firmware to turn the ESP32 MCUs off for set periods and only wake them to take a sensor reading or a photo. Sleep mode on the ESP32 consumes only 5 µA. This would extend the power bank battery life in the darker months of the year (if used in urban greenhouses). Table 3 shows the tested and calculated current consumption of the RAHMS client. Figure 4 shows the deployment of a 3D printed prototype of the RAHMS client in an outdoor (back garden) greenhouse setup.

Fig. 4 The RAHMS client tested in a greenhouse

Table 3 Measured and calculated power consumption

Initial spike: 290 mA
Peak: 260 mA
Average: 230 mA
Wattage per hour (ideal/calculated): 0.0196708 kWh
Wattage per hour (measured): 0.0182835 kWh
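The 4-day figure can be sanity-checked from Table 3. A back-of-the-envelope sketch, under our own assumptions about the power bank (3.7 V nominal cell, 5 V output, ~85% conversion efficiency; none of these are stated in the paper):

```python
CAPACITY_MAH = 26_800      # power bank rating, at cell voltage
CELL_V, LOAD_V = 3.7, 5.0  # assumed nominal cell voltage and client supply
EFFICIENCY = 0.85          # assumed DC-DC boost efficiency
AVG_LOAD_MA = 230          # measured average draw (Table 3)

energy_wh = CAPACITY_MAH / 1000 * CELL_V       # ~99 Wh stored
load_w = AVG_LOAD_MA / 1000 * LOAD_V           # ~1.15 W drawn by the client
print(f"{energy_wh * EFFICIENCY / load_w / 24:.1f} days with conversion losses")

# The naive mAh/mA ratio, ignoring voltage conversion, gives the upper bound:
print(f"{CAPACITY_MAH / AVG_LOAD_MA / 24:.1f} days optimistic")
```

The two estimates (roughly 3 and 4.9 days) bracket the paper's "up to 4 days".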


Fig. 5 Firebase real-time database nodes populated with the sensor values from the RAHMS client (left); Android app screen displaying Firebase data and the camera server photo (right)

3.2 Firebase and Android Application

Figure 5 (on the left) shows the Firebase JSON nodes as recorded during the greenhouse client test shown in Fig. 4. The current date and time are accessed using the Network Time Protocol (NTP) and stored in variables that are then used as part of the data path for the JSON object. This allows the creation of separate nodes for each day and better organisation of the stored data. On the right side of Fig. 5, the second screen of the Android app after a successful login is presented. This activity contains a scrollable view of the last five recorded sensor values taken from the Firebase real-time database and the latest photo taken by the camera MCU.
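The date-keyed write can be sketched against Firebase's REST interface (the project URL, node layout and field names below are illustrative, not taken from the paper; ntplib is a third-party NTP client):

```python
import json, time, urllib.request

import ntplib  # pip install ntplib

FIREBASE_URL = "https://example-rahms.firebaseio.com"  # hypothetical project

# Fetch the current time over NTP, then build date/time keys for the path
ntp_time = ntplib.NTPClient().request("pool.ntp.org").tx_time
date_key = time.strftime("%Y-%m-%d", time.gmtime(ntp_time))
time_key = time.strftime("%H:%M:%S", time.gmtime(ntp_time))

reading = {"air_temp_c": 21.4, "humidity_pct": 58, "co2_ppm": 612}
req = urllib.request.Request(
    f"{FIREBASE_URL}/sensors/{date_key}/{time_key}.json",
    data=json.dumps(reading).encode(),
    method="PUT",  # the Firebase REST API writes the node at this path
)
urllib.request.urlopen(req)  # each day accumulates under its own node
```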

3.3 MATLAB ThingSpeak

A simple MATLAB script (shown in Fig. 6) was used for RGB-to-gray conversion and thresholding of the photo. Figure 7 shows the ThingSpeak channel with all the sensor data visualised on graphs, along with the unprocessed and processed photos of the plant being monitored in the greenhouse.
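The script itself appears only as an image (Fig. 6); an equivalent grayscale-and-threshold step can be sketched with Pillow and NumPy (the threshold value is our placeholder, not the one used in Fig. 6):

```python
import numpy as np
from PIL import Image

gray = np.asarray(Image.open("greenhouse_plant.jpg").convert("L"))  # RGB -> 8-bit gray

THRESHOLD = 100  # assumed cut-off; the value in Fig. 6 is not given in the text
binary = np.where(gray > THRESHOLD, 255, 0).astype(np.uint8)  # binarise the image

Image.fromarray(binary).save("greenhouse_plant_bw.png")
```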


Fig. 6 Image processing MATLAB script

4 Conclusions and Future Work

To conclude, this paper presented a proof-of-concept, inexpensive solution for remote monitoring of small- to medium-sized remote greenhouses that can access the Internet, whether via broadband, satellite, or 4G/LTE access points. Some of the obstacles faced during the design process concerned cellular communication and weatherproofing the RAHMS client. These were overcome by using the tunnelling service NGROK to expose the camera server to the public network and by using an outdoor-grade 3D printing filament. In the era of climate change and global warming, where controlled-environment agriculture and horticulture are rapidly becoming essential parts of the food-producing industry, we anticipate that these design efforts will stimulate the appetite for more advanced contributions to this topic. For future work, the networking model could be improved by scrapping the server side and introducing a 4G/LTE module so the sensor data could be sent directly to the cloud without the need for a Wi-Fi access point. Over-the-air update implementation, along with UI and image processing improvements in the mobile application, would be advantageous. Additionally, a machine learning-based notification system for diseased plants, an autonomous watering system and a larger range of sensors to increase the data set for yield prediction are under consideration. As for security, the RAHMS could implement end-to-end encryption of the data between the communication streams, with unique session IDs per client to prevent spoofing attacks, or use unique credentials for each client via a RADIUS server.


Fig. 7 ThingSpeak channel graphs/visualizations populated with the sensor values from the RAHMS client, and processed and unprocessed images from the camera MCU


Acknowledgements The author, Monika Rabka, would like to thank the co-authors of this paper, Dion Mariyanayagam and her supervisor Dr. Shukla for all the generous support and guidance given throughout this project. This work is supported in part by the University’s Rescaling Funds awarded to Dr. Pancham Shukla.

References 1. United Nations Department of Economic and Social Affairs: World population projected to reach 9.7 billion by 2050 (2015). https://www.un.org/en/development/desa/news/population/ 2015-report.html. Accessed June 27, 2021 2. Despommier, D.: The vertical farm: controlled environment agriculture carried out in tall buildings would create greater food safety and security for large urban populations. J. für Verbraucherschutz und Leb. 6(2), 233–236 (2011). https://doi.org/10.1007/s00003-010-0654-3 3. Stein, E.W.: The transformative environmental effects large-scale indoor farming may have on air, water, and soil. Air Soil Water Res. 14 (2021). http://doi.org/10.1177/1178622121995819 4. Goldstein, H.: The green promise of vertical farms. IEEE Spectrum (2018). https://spectrum. ieee.org/energy/environment/the-green-promise-of-vertical-farms. Accessed June 27, 2021 5. Horwitz, L.: Internet of Things (IoT)—The Future of IoT Miniguide: The Burgeoning IoT Market Continues. Cisco. July 19, 2019. https://www.cisco.com/c/en/us/solutions/internet-ofthings/future-of-iot.html. Accessed Jan 14, 2021 6. Baquero, D., Molina, J., Gil, R., Bojacá, C., Franco, H., Gómez, F.: An image retrieval system for tomato disease assessment. In: 2014 19th Symposium on Image, Signal Processing, and Artificial Vision, STSIVA 2014, pp. 1–5 (2015). http://doi.org/10.1109/STSIVA.2014.7010156 7. Sarkate, R.S., Kalyankar, N.V., Khanale, P.B.: Application of computer vision and color image segmentation for yield prediction precision. In: Proceedings of 2013 International Conference on Information Systems and Computer Networks, ISCON 2013, pp. 9–13 (2013). http://doi. org/10.1109/ICISCON.2013.6524164 8. Shinde, D., Siddiqui, N.: IOT based environment change monitoring controlling in greenhouse using WSN. In: 2018 International Conference on Information, Communication, Engineering and Technology, ICICET 2018, pp. 1–5 (2018). http://doi.org/10.1109/ICICET.2018.8533808 9. Danita, M., Mathew, B., Shereen, N., Sharon, N., Paul, J.J.: IoT based automated greenhouse monitoring system. In: Proceedings of 2nd International Conference on Intelligent Computing and Control Systems, ICICCS 2018, pp. 1933–1937 (2019). http://doi.org/10.1109/ICCONS. 2018.8662911 10. Qiu, W., Dong, L., Wang, F., Yan, H.: Design of intelligent greenhouse environment monitoring system based on ZigBee and embedded technology. In: Proceedings of 2014 IEEE International Conference on Consumer Electronics—China, ICCE-C 2014, pp. 33–35 (2015). http://doi.org/ 10.1109/ICCE-China.2014.7029857 11. Vimal, P.V., Shivaprakasha, K.S.: IOT based greenhouse environment monitoring and controlling system using Arduino platform. In: 2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies, ICICICT 2017, vol. 2018, pp. 1514–1519 (2018). http://doi.org/10.1109/ICICICT1.2017.8342795 12. Sundari, S.M., Mathana, J.M., Nagarajan, T.S.: Secured IoT based smart greenhouse system with image inspection. In: 2020 6th International Conference on Advanced Computing and Communication Systems, ICACCS 2020, no. 978, pp. 1080–1082 (2020). http://doi.org/10. 1109/ICACCS48705.2020.9074258

A Comprehensive Review on the Role of PMU in Managing Blackouts K. S. Harshith Gowda and N. Gowtham

Abstract PMUs are regarded as fundamental measurement devices that are widely preferred for online monitoring of power systems. The introduction of this technology has had a strong effect on the techniques usually preferred for system analysis and control. The study presents a brief survey of techniques and the role of the PMU in mitigating various instability issues in the power system network. The concept of blackouts is discussed along with their detrimental effects. The various control actions to prevent instability issues in the power system network are discussed, as these instability issues, if not taken care of, will lead to power system outages (blackouts). In addition, some of the control actions that are vital in preventing blackouts, such as tripping, the use of FACTS devices for power system control, and proper scheduling of generators for voltage control, are presented in this report. Keywords PMU · Stability · Blackouts

K. S. Harshith Gowda (B) · N. Gowtham
Vidyavardhaka College of Engineering, Mysuru, India
N. Gowtham e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_69

1 Introduction

A power system is the network of electrical components that performs transmission and distribution of electrical energy; the power grid is an example. The grid carries various electrical quantities such as voltage, current, and frequency [1]. The Indian power grid has an installed capacity of over 200 GW and, owing to the massive population and land area, the generated power must travel huge distances to provide electricity. As a result, the power system must monitor the electrical quantities for an efficient supply of electricity. The voltage at every bus, the current in every branch, and the line power flows need to be monitored to minimize transmission losses and increase efficiency [2]. To reduce the transmission losses, an accurate reading of electrical quantities is


necessary. The grid also has to be made flexible for continuous operation, with self-healing capability against faults or contingencies. To take care of these issues, the concept of the smart grid evolved. A smart grid integrates technologies that allow grid design and operation to be rethought. The characteristics of the smart grid are highlighted in [3, 4]. SCADA systems and PMUs are also part of the smart grid. Conventionally, the electrical quantities of the power system were measured using remote terminal units (RTUs) and a supervisory control and data acquisition (SCADA) system. The data obtained from the RTUs was processed by the SCADA system. This combination updates the system state only every 4–6 s and is too slow to monitor the power grid effectively, which can ultimately contribute to blackouts [5]. This paved the way for the development of synchro-phasor technology. To overcome the inefficient measurement of the SCADA system, synchro-phasor technology evolved. A phasor is a complex number that provides both the magnitude and the phase angle of a sinusoidal waveform at a specific instant of time, and a synchro-phasor is computed from analogue voltage and current waveforms sampled in synchronism with a GPS clock at widely dispersed locations [6]. Using this phasor and synchro-phasor technology, the Virginia Tech power system research laboratory developed the Phasor Measurement Unit (PMU). The speed and accuracy of PMUs have encouraged their adoption in power system networks worldwide. Since the phasor measurement unit provides accurate and time-synchronized phasor values, it has become a popular device for the power system network [7]. One way to observe the entire power system network is to place PMUs at all the buses; as they are costly, it is not economical to deploy them across the entire power system, but each PMU can measure the voltage and current phasor values of the adjacent buses [8]. As a result, there exists a new problem of optimally locating PMUs to observe the entire power system network. The phasor measurement method is based on global positioning system (GPS) technology: since PMUs are synchronized through satellites, they provide synchronized, high-speed values of the positive-sequence currents and voltages of the power system [9]. The aforementioned discussion highlights that PMUs are capable of enhancing the operation of the system and have created interest amongst power system utilities.
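A PMU's core computation, extracting the fundamental phasor from a GPS-time-stamped window of samples, can be sketched with a single-bin DFT (the sampling rate, window length and test signal below are illustrative, not taken from any specific PMU standard):

```python
import numpy as np

F0 = 50.0        # nominal system frequency, Hz
N = 64           # samples per cycle (one-cycle estimation window)
FS = N * F0      # sampling rate

t = np.arange(N) / FS
# Test waveform: 230 V rms at a 20 degree phase angle
v = 230 * np.sqrt(2) * np.cos(2 * np.pi * F0 * t + np.deg2rad(20))

# Single-bin DFT at the fundamental; the scaling yields an rms phasor
phasor = np.sqrt(2) / N * np.sum(v * np.exp(-1j * 2 * np.pi * F0 * t))

print(f"|V| = {abs(phasor):.1f} V rms, angle = {np.degrees(np.angle(phasor)):.1f} deg")
# -> |V| = 230.0 V rms, angle = 20.0 deg; a real PMU time-stamps this estimate
#    against the GPS clock so that phasors from distant buses can be compared
```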

2 State of the Art PMU Methods for Stability Evaluation

2.1 Review of Literature

Earlier research was mainly concerned with highlighting the applications of synchronized measurement signals [1]. With the advancement of the PMU, its development and application have been a major research area since the 1990s [4]. Over these years, studies of PMU applications have dealt with many topics, such as model evaluation, stability assessment, corridor supervision, and state estimation.


In [5], considering PMU measurements, an impedance method involving trajectory sensitivity is implemented to derive the system parameters that could be problematic. The technique is applied to dynamic simulation and later used to narrow the search area for the erroneous parameter in major power systems. A real-time overhead-line parameter detection method is proposed in [7]: the parameter variations that can be caused by conductor sag are explained based on PMU measurements, and the correctness of the technique is validated against various data methodologies.

A major concern amongst researchers is the application of PMUs to stability analysis. In [8], selected PMU signals from various locations are examined in the frequency and time domains, and the result is processed with fuzzy rule-based classifiers; these classifiers are initialized by a number of suitable decision trees. With wide-area PMU measurements, stability evaluation can be performed within one to two seconds after a disturbance. Decision trees are also preferred [9] for security evaluation: real-time security indicators are obtained from PMU signals and processed with decision trees that are periodically updated for future estimations [10]. A robust recursive least-squares method with an autoregressive moving-average exogenous technique is presented in [10]; its role is to process the measured data with a robust objective function that handles non-typical measurements. Voltage-instability risk indicators based on fast local phasor measurements are implemented in [11]; this technique is mainly developed for real-time adaptive identification of simplified circuit components. To identify voltage instabilities caused by outages of generation or transmission equipment, [12] presents sensitivity indices relating reactive power generation to load; the developed model utilizes PMU data measurements and is capable of observing the instability of the complete region, particularly for real-time use. In [13], the authors use PMU data to find equivalent models of interconnected systems, to analyse I–V curve characteristics, and to study stability-related issues. In [14], an inertia-extrapolating algorithm along with a measurement-based reactance is used to develop a radial-network technique; it captures the inter-area dynamics on the transmission line, which is then processed as a single inter-area mode. In [15], the concept of an energy function is presented to determine the transient and small-signal stability status of a transmission line from active power-flow data; the transmission path is transformed into a two-machine system to estimate the PMU-related details, the parameters of the transmission corridor, and the equivalent inertias. In [16], a novel method finds the equivalent components by least squares over several samples of PMU measurements; the model can also estimate the load-margin details of the transmission network, which can further be used to predict a voltage stability index. In [17], the details


suggest that real-time calculation of line impedance can be performed with PMU inputs, mainly to update the settings of distance relays. In [18], a distributed state-estimation algorithm for large-scale power systems is explored: the diakoptic method is used to divide the system into subsystems, and the inputs from PMUs then determine each subsystem's estimate. In [19], the authors elucidate the key impacts and bottlenecks of using PMU data for state estimation. With increasing demand and the liberalization of power networks, utilities worldwide are exposed to greater operational uncertainty and to blackout-related risks; as a result, PMUs have helped provide strategic information for decision support. The literature in [19] also highlights the structure of wide-area measurements and controllers and a few applications of PMUs.

3 Scope of PMU Applications in Network Situation Awareness

PMUs have been installed worldwide and have created major interest amongst stakeholders [20]. The role of PMUs in enhancing the operation and control of power systems is well appreciated. In the United States, PMUs are incorporated to evaluate system performance and to validate system models through probing tests [10]. In a few Asian countries, wide-area measurement technology is used for system-model validation and stability monitoring [11]. Canada incorporates a wide-area monitoring system to check frequency regulation and to avoid contingencies related to geomagnetically induced storms. These deployments illustrate the capability of PMUs for real-time analysis (Table 1).

Table 1 PMU implementation in some parts of the world [11] (P = planned, T = trial, • = in operation; cells are left blank where the in-operation marks could not be recovered from the source)

PMU applications             North America  Europe  China  India  Brazil  Russia
Post-disturbance analysis                                  P      T
Stability monitoring         •                             P      P
Thermal overload monitoring                                P      P
Power system restoration                                   P      P       P
Model validation                                           P      T
State estimation             T              P       P      P      P       P
Real-time control            T              T       T      P      P       P
Adaptive protection          P              P       P      P      P       P
Wide-area stabilizer         T              T       T      P      P       P


4 Blackouts: Causes and Control Actions

4.1 Causes for Blackouts

There are several reasons for blackouts in a power system network, and preventing them is a key issue and a challenging task for power system utilities. The main causes of blackouts are transmission-line overloading, ice loading on transmission lines, and damage to protection and control systems. With proper control techniques it is easier to prevent a blackout arising from n − 1 contingencies and to ensure load-generation balance.

In recent times, the most severe blackout was the July 2012 India blackout, which occurred in two phases on 30 and 31 July 2012. It affected around 22 states in the northern and eastern parts of India, and around 32 GW of generating capacity was off during the blackout [1]. Several factors contributed. The extreme heat of summer 2012 pushed power usage in Delhi to its highest levels, which later aggravated coal shortages in the country. Because the monsoon arrived late in the agricultural areas of Punjab and Haryana, farmers ran more water pumps for farming and drew more power from the grid. Hydel power plants also could not generate as much power as usual, so the load on thermal power plants increased to meet demand.

PMUs have been installed globally to obtain additional phase data of voltage and current and to provide the synchronized measurements needed for secure and reliable operation. A PMU can provide real-time system measurement data and security assessment, mainly for control-room applications. Some of the applications of PMU data are listed below.

• Information related to PMUs
The voluminous data collected in PMU repositories is a major issue for real-time visualization. A structured representation of system states must be visualized for better understanding by operators. The data include variations of voltage, angle, frequency, topological changes, temperature, harmonics, and so on. The techniques available for visualization include graphic interfaces and worst-case alarms [11]. Some practical static system-performance indices can also be calculated instantly from strategic PMU inputs to determine the operating state of the system.

• Dynamic security assessment
Real-time dynamic security assessment relies on the system model and the measured data. Reliable methods are required for real-time model validation and state estimation. Dynamic security assessment has to satisfy the roles of identifying and estimating oscillations, stability, instability, and the security margin.


To cover other operating areas of power systems, including different generation topologies, sensitivity indicators have to be developed. These sensitivity indicators can later be used in coordinated control architectures and in emergency control applications. Indicators mainly target voltage collapse, frequency, and transient instabilities. DSA can be classified as below [12]:

• Voltage stability
Voltage stability can be divided according to time spans. For short-term events of up to 1–2 s, line voltage stability indices or the local bus reactive-power margin can be used for real-time analysis. For short- or long-term voltage stability detection spanning more than 10 s to minutes, Thevenin equivalent circuits or Jacobian eigenvalues can be used (a sketch of Thevenin identification follows the numbered list below).

• Small-signal stability
Two methods are normally used for small-signal stability. The first is conventional eigenvalue analysis, which yields the dominant frequency and damping details; it depends on the correctness of the system model and is computationally costly, mainly for large-scale systems. The other is measurement-based: measured data are used to fit a defined model. Methods such as the Fourier transform, wavelet transform, and Prony analysis can also be used for oscillation detection.

• Transient security assessment
After fault inception and clearance, transient security assessment methods should enable real-time computation for stability analysis. Until recently, methods included the modified equal-area criterion or energy functions [13]. Possible techniques are decision trees, artificial neural networks, support vector machines, etc.

As a result, security assessment should include:
• Monitoring
• Analysis of power system security
• Margin determination for power system security.

The key takeaways of online dynamic security assessment should include:
1. Assessment of the system snapshot.
2. The need for contingencies in validating online dynamic security assessment.
3. Automated and HMI-interfaced state responses depending on short-term, mid-term, or long-term identification.
4. Identification of security issues and suggestions for remedial actions.
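As a hedged illustration of the Thevenin-equivalent idea mentioned under voltage stability above (in the spirit of, but not reproducing, the method of [11]), the sketch below identifies a Thevenin source and impedance from two PMU snapshots at a load bus and forms a simple proximity index; all names and values are synthetic.

```python
# Hedged sketch: identify a Thevenin equivalent seen from a load bus using two
# PMU snapshots (V, I), then form a proximity index that approaches 1 at the
# maximum-power-transfer point (|Zth| == |Zload|).
import numpy as np

def thevenin_from_pmu(v1, i1, v2, i2):
    """Solve E = V + Zth*I from two complex (V, I) measurement pairs."""
    a = np.array([[1, -i1], [1, -i2]], dtype=complex)
    b = np.array([v1, v2], dtype=complex)
    e_th, z_th = np.linalg.solve(a, b)
    return e_th, z_th

def stability_index(v, i, z_th):
    z_load = v / i
    return abs(z_th) / abs(z_load)  # near 1.0 means close to the nose point

# Toy data: source E = 1.0 pu behind Zth = 0.1j pu, observed at two load levels.
e, z = 1.0 + 0j, 0.1j
i1 = 0.8 * np.exp(-0.20j); v1 = e - z * i1
i2 = 1.0 * np.exp(-0.25j); v2 = e - z * i2
e_est, z_est = thevenin_from_pmu(v1, i1, v2, i2)
print(f"Eth = {e_est:.3f}, Zth = {z_est:.3f}, index = {stability_index(v2, i2, z_est):.3f}")
```

With exact measurements the two-snapshot solve recovers the source parameters exactly; in practice, a rolling window with least squares is used because consecutive snapshots may differ only slightly.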

• State estimation
The main objective of PMUs is to enhance the capability of state estimators. PMU data obtained from the phasor data concentrator software can be sent to the SCADA system. One of the main reasons for their incorporation is the reliability and accuracy they bring to state estimation. PMUs are capable of measuring the voltage and current phasors at buses and feeders, and the measured values are highly accurate with respect to magnitude and phase angle [14]. PMUs can also indicate whether the quality of the transmitted data meets the defined standards.

• Integration of distributed generation
Large-scale integration of distributed generation with many renewable energy sources, such as solar, wind, and CHP, affects the power grid in terms of planning, operation, and control. In recent times, the growing use of electric vehicles has also led to several grid-integration issues. Renewable energy sources can inject distorted outputs into the grid; as a result, system performance is affected by harmonics, which later causes issues related to security and reliability [15]. To support control-room services and energy forecasting, real-time PMU measurements should be coupled with real-time visualization and stability-margin assessment.
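Returning to the state-estimation bullet above: because phasor measurements make the measurement model linear, PMU-based state estimation can be illustrated with a single weighted least-squares solve. The following is a minimal sketch under that assumption (the two-bus network, weights, and noise levels are invented), not the estimator of [18] or [19].

```python
# Hedged sketch: linear weighted least-squares state estimation with phasors.
import numpy as np

def wls_phasor_se(h, z, w):
    """x_hat = argmin (z - Hx)^H W (z - Hx) for complex H, z and real weights w."""
    a = h.conj().T @ np.diag(w) @ h
    b = h.conj().T @ np.diag(w) @ z
    return np.linalg.solve(a, b)

# Two-bus example: states x = [V1, V2]; measurements are V1, V2, and the line
# current I12 = (V1 - V2) / z12 with z12 = 0.01 + 0.1j pu.
y12 = 1 / (0.01 + 0.1j)
h = np.array([[1, 0], [0, 1], [y12, -y12]], dtype=complex)
x_true = np.array([1.02, 0.98 * np.exp(-0.05j)])
rng = np.random.default_rng(1)
z = h @ x_true + 0.001 * (rng.standard_normal(3) + 1j * rng.standard_normal(3))
w = np.array([1e4, 1e4, 1e2])  # PMU voltage channels weighted more heavily
print("estimated states:", wls_phasor_se(h, z, w))
```

The same structure scales to full networks; the practical gain of PMUs, as noted above, is that no iterative nonlinear solve is needed once the measurement set is phasor-based.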

4.2 PMUs—Selection of Control Actions to Avoid Blackouts

Some of the major power system blackouts were those that happened in North America and Europe. These events drew the attention of power system engineers, as they led to detrimental effects and major failures of the power systems. CIGRE and IEEE committees were formed to study and suggest suitable remedial measures for such power system failures [16]. The main reasons for blackouts in power system networks are cascading failures of a few components and, significantly, instability issues such as transient and voltage instability. To prevent blackouts, coordinated measures need to be incorporated to optimize the risk. Some of the suggested control measures for the system include:

• Protection
• Under-frequency load shedding
• Proper scheduling of generators and voltage control
• Reactance switching
• Transformer tap changing
• Power system control using FACTS
• Power modulation for HVDC systems
• Tripping of unstable devices
• Islanding

For implementing these control actions, detection methods should be based on the different instability modes. Optimization of the actions is necessary to reduce the control risk. For proper investigation of control actions under different operating conditions,


offline studies need to be performed, and for proper validation of these studies, hardware or software experiments can be conducted. Data-mining methods can be used as a tool to predict the system situation and to suggest further action, as sketched below. Observability of the power system can be enhanced with the use of PMU data [19]. Wide-area measurements can be used for applications related to voltage control, oscillations, voltage stability, and wide-area protection.
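The following is an illustrative sketch of the data-mining idea just mentioned (and of the decision-tree classifiers cited in [8, 9]): a decision tree trained on PMU-derived features flags insecure operating states. The features, labelling rule, and the scikit-learn dependency are our assumptions; in practice the labels come from offline dynamic simulations.

```python
# Hedged sketch: decision-tree security classification on synthetic PMU features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 1000
# Hypothetical wide-area features: lowest bus voltage (pu), maximum angle
# separation (deg), and frequency deviation (Hz) from PMU snapshots.
x = np.column_stack([
    rng.uniform(0.85, 1.05, n),
    rng.uniform(0.0, 60.0, n),
    rng.uniform(-0.5, 0.5, n),
])
# Toy labelling rule standing in for offline contingency simulations.
y = ((x[:, 0] < 0.92) | (x[:, 1] > 45) | (np.abs(x[:, 2]) > 0.35)).astype(int)

clf = DecisionTreeClassifier(max_depth=4).fit(x[:800], y[:800])
print("holdout accuracy:", clf.score(x[800:], y[800:]))
```

A shallow tree is deliberately chosen: operators can read the learned thresholds directly, which matches the review's emphasis on indicators that support control-room decisions.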

5 Conclusions

PMUs are regarded as the most basic measurement devices preferred for the online monitoring of power systems. The integration of PMUs has a great effect on the principles that are used for the analysis and control of power systems. Several techniques are available for using PMU information for system security assessment and blackout prevention. The key benefits of PMUs are recognized, while some of the important studies and implementation projects are still in their infancy.

References

1. Kamali, S., Amraee, T.: Prediction of unplanned islanding in power systems using PMU data. In: 2018 IEEE International Conference on Environment and Electrical Engineering and 2018 IEEE Industrial and Commercial Power Systems Europe (EEEIC/I&CPS Europe), pp. 1–5 (2018). https://doi.org/10.1109/EEEIC.2018.8494479
2. Waqar, A., Khurshid, Z., Ahmad, J., Aamir, M., Yaqoob, M., Alam, I.: Modeling and simulation of phasor measurement unit (PMU) for early fault detection in interconnected two-area network. In: 2018 1st International Conference on Power, Energy and Smart Grid (ICPESG), pp. 1–6 (2018). https://doi.org/10.1109/ICPESG.2018.8384491
3. Li, B., Li, H., He, W., Wei, B., Chen, S., Yu, C.: Application of improved online monitoring method using PMU data in recent European blackout. In: 2016 International Conference on Condition Monitoring and Diagnosis (CMD), pp. 815–818 (2016). https://doi.org/10.1109/CMD.2016.7757951
4. Patil, G.C., Thosar, A.G.: Application of synchrophasor measurements using PMU for modern power systems monitoring and control. In: 2017 International Conference on Computation of Power, Energy Information and Communication (ICCPEIC), pp. 754–760 (2017). https://doi.org/10.1109/ICCPEIC.2017.8290464
5. Adhikari, S., Snyder, A., Mueller, D., Zavadil, B., Smith, B., Loehr, G.: Effective angle prediction algorithm for utilization of PMU data: toward prevention of wide area blackouts. In: 2018 Clemson University Power Systems Conference (PSC), pp. 1–8 (2018). https://doi.org/10.1109/PSC.2018.8664026
6. Wang, C., Hou, Y.: A PMU-based three-step controlled separation with transient stability considerations. In: 2014 IEEE PES General Meeting | Conference & Exposition, pp. 1–5 (2014). https://doi.org/10.1109/PESGM.2014.6939236
7. Gupta, S., Waghmare, S., Kazi, F., Wagh, S., Singh, N.: Blackout risk analysis in smart grid WAMPAC system using KL divergence approach. In: 2016 IEEE 6th International Conference on Power Systems (ICPS), pp. 1–6 (2016). https://doi.org/10.1109/ICPES.2016.7584069
8. Pulok, M.K.H., Faruque, M.O.: Utilization of PMU data to evaluate the effectiveness of voltage stability boundary and indices. In: 2015 North American Power Symposium (NAPS), pp. 1–6 (2015). https://doi.org/10.1109/NAPS.2015.7335111
9. Shazdeh, S., Golpira, H., Bak, C.L., Bevrani, H.: A wide-area back-up protection scheme for discrimination of symmetrical faults from power swing and load encroachment. In: 2020 10th Smart Grid Conference (SGC), pp. 1–6 (2020). https://doi.org/10.1109/SGC52076.2020.9335755
10. Zhou, N., Trudnowski, D.J., Pierre, J.W., Mittelstadt, W.A.: Electromechanical mode online estimation using regularized robust RLS methods. IEEE Trans. Power Syst. 23(4), 1670–1680 (2008)
11. Corsi, S., Taranto, G.N.: A real-time voltage instability identification algorithm based on local phasor measurements. IEEE Trans. Power Syst. 23(3), 1271–1279 (2008)
12. Glavic, M., Van Cutsem, T.: Detecting with PMUs the onset of voltage instability caused by a large disturbance. In: 2008 IEEE Power and Energy Society General Meeting, pp. 1–8 (2008)
13. Parniani, M., Chow, J., Vanfretti, L., Bhargava, B., Salazar, A.: Voltage stability analysis of a multiple-infeed load center using phasor measurement data. In: 2006 IEEE PES Power Systems Conference and Exposition, pp. 1299–1305 (2006)
14. Chow, J.H., Chakrabortty, A., Vanfretti, L., Arcak, M.: Estimation of radial power system transfer path dynamic parameters using synchronized phasor data. IEEE Trans. Power Syst. 23(2), 564–571 (2008)
15. Chow, J.H., Chakrabortty, A., Arcak, M., Bhargava, B., Salazar, A.: Synchronized phasor data based energy function analysis of dominant power transfer paths in large power systems. IEEE Trans. Power Syst. 22(2), 727–734 (2007)
16. Liu, M., Zhang, B., Yao, L., Han, M., Sun, H., Wu, W.: PMU based voltage stability analysis for transmission corridors. In: 2008 Third International Conference on Electric Utility Deregulation and Restructuring and Power Technologies, pp. 1815–1820 (2008)
17. Kim, I.-D., Aggarwal, R.K.: A study on the on-line measurement of transmission line impedances for improved relaying protection. Int. J. Electr. Power Energy Syst. 28(6), 359–366 (2006)
18. Jiang, W., Vittal, V., Heydt, G.T.: Diakoptic state estimation using phasor measurement units. IEEE Trans. Power Syst. 23(4), 1580–1589 (2008)
19. Wu, H., Giri, J.: PMU impact on state estimation reliability for improved grid security. In: 2005/2006 IEEE/PES Transmission and Distribution Conference and Exhibition, pp. 1349–1351 (2006)
20. Bentarzi, H.: PMU based centralized adaptive load shedding scheme in power system. In: 2015 27th International Conference on Microelectronics (ICM), pp. 150–153 (2015). https://doi.org/10.1109/ICM.2015.7438010

Secure COVID-19 Treatment with Blockchain and IoT-Based Framework

Garima Jain, Garima Shukla, Priyanka Saini, Anubha Gaur, Divya Mishra, and Shyam Akashe

Abstract The COVID pandemic has opened the eyes of numerous nations about their healthcare frameworks. The sudden burst and uncontrolled worldwide spread of COVID-19 show the limitations of existing healthcare systems in handling public-health emergencies in a timely manner. The infection and death numbers reported by the World Health Organization (WHO) about this pandemic are an increasing threat to the lives of individuals and to the economies of nations. The greatest challenge most governments face is the absence of a precise mechanism to recognize unknown infected cases and predict the infection risk of the COVID-19 virus. Many countries have been utilizing a range of tools to battle the pandemic, seeking data about movement and monitoring, sometimes at the cost of releasing residents' private data. This research paper aims to help infected individuals with care utilizing the Internet of Things (IoT) and blockchain technologies. On one hand, blockchain can battle pandemics by enabling early discovery of cases, securing user privacy, and guaranteeing a reliable medical supply chain during the pandemic. On the other hand, IoT-based medical services accumulate valuable data, provide advanced knowledge through symptoms and behaviours, permit remote monitoring, and essentially give individuals better self-assurance and medical care. The proposal consists of a four-layer architecture utilizing IoT and blockchain to identify and predict COVID-19 cases. This idea provides a framework for patients with the COVID-19 infectious disease and recognizes medical problems and diagnoses. The proposed approach is anticipated to deliver a robust framework ready to help governments, healthcare specialists, and residents make basic choices concerning disease recognition, disease forecast, and disease avoidance.

Keywords Blockchain · IoT · COVID-19

G. Jain (B) · G. Shukla
Computer Science and Engineering, Noida Institute of Engineering and Technology, Greater Noida, India
e-mail: [email protected]
P. Saini
Electronic and Communication, Swami Vivekanand Subharti University, Meerut, India
A. Gaur · D. Mishra
Computer Science Engineering, Swami Vivekanand Subharti University, Meerut, India
S. Akashe
Electronic and Communication, ITM University, Gwalior, India
e-mail: [email protected]

1 Introduction

1.1 Background

The COVID-19 coronavirus problem is growing day by day and spreading further, presently affecting many countries and regions worldwide. The WHO (World Health Organization) has already declared the coronavirus pandemic a worldwide crisis for the community-health domain: coronavirus pandemic plus the "Great Cessation" equals the black swan of this generation. The outbreak of COVID-19 has generated a worldwide health emergency that has had a profound impact on the way we perceive our world and our daily lives. The sudden rise and rapid but uncontrolled global spread of the coronavirus show the failure of existing healthcare surveillance methods to handle community-health crises on time. Since there is no therapy or vaccine yet widely available to combat this virus, the health industry and services demand robust plans that can help control the rate at which the disease spreads. Accurate estimation of the total confirmed cases is therefore vital for managing the healthcare system's demand and creating new clinical infrastructure; it may use various mathematical, statistical, and artificial-intelligence modelling strategies. Estimating short- and long-term infected cases assists effective planning and determines the additional materials and resources required to address the outbreak. This estimation of the healthcare system's expected burden is vital for managing the medical centres and other necessary resources in a timely and successful way, and such estimates can indicate the severity of, and the number of measures needed to bring down, the outbreak.
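As a concrete instance of the "mathematical, statistical" modelling strategies mentioned above, the sketch below simulates the classic SIR compartment model; it is illustrative only (the transmission and recovery parameters are invented), not the estimation method proposed in this paper.

```python
# Illustrative sketch: discrete-time SIR model for projecting infection load.
def simulate_sir(s, i, r, beta=0.3, gamma=0.1, days=120):
    """beta: daily transmission rate, gamma: daily recovery rate (assumed values)."""
    n = s + i + r
    history = []
    for _ in range(days):
        new_inf = beta * s * i / n   # new infections per day
        new_rec = gamma * i          # recoveries per day
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        history.append((s, i, r))
    return history

# Projected peak of simultaneous infections for a population of one million.
peak_i = max(step[1] for step in simulate_sir(s=999_000, i=1_000, r=0))
print(f"projected peak simultaneous infections: {peak_i:,.0f}")
```

Such a projection is exactly the kind of expected-burden estimate the text refers to: it indicates how much additional clinical capacity would be needed at the peak.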

1.2 Motivations

It has been seen that COVID-19 had an adverse effect on the world's stock markets in the early stage of the pandemic: with the spill-over effect across the world, indices fell after the pandemic began, in contrast to the preceding period. Finance aims to uncover outlier events on a regular basis and fails with equal regularity. One such occurrence, the impact of the novel coronavirus (COVID-19) on financial exchanges, bears all the characteristics of a black swan. A black swan is the result of an extremely unexpected incident that also has an enormous effect. As a


result, the term "black swan" came to represent an event that occurred despite its seeming absurdity. This, in turn, describes the multiple stock market crashes in various regions of the globe. COVID-19's behavioural impact causes mental illness in humans, which can lead to depression. COVID-19 has impacted large-scale, medium-scale, and small-scale industries alike; as per one survey, many small-scale businesses are about to shut down, ultimately impacting employees of different grades. Regarding health impact, in developing countries, and even in major developed countries, people are not health-conscious enough to have built the immunity to fight the coronavirus. Many countries lack first-aid procurement for COVID-19, and significant research is being directed at the critical health tools needed to meet the COVID-19 crisis.

1.3 Covid Remedy Technique

The vaccinations are ready and COVID-19 treatments are being tested; however, deployment will take several months. Aside from that, future demands on the global healthcare system will increase. This type of stress comes in two flavours. First, there is the potentially overwhelming burden of disease, which raises questions about the capacity of the healthcare infrastructure. Second, there are negative consequences for healthcare practitioners, such as a greater risk of infection.

1.3.1 Hit and Trial Method

In several cases, it has been observed that when a person suffers from a normal fever, a direct fear of being COVID-19 positive arises without any testing. This has several negative impacts on the individual. First, a person who goes through the mental trauma of fearing infection may slip into depression. Second, in many developing countries, the government requires such individuals to stay in quarantine for a certain period.

1.3.2 Prediction-Based Technique

Table 1 describes this technique, which predicts cases based on the age factor, since persons from a given age group tend to show the same symptoms and are treated in the same manner. According to the prediction, the person is kept in quarantine, which decreases productivity and economic activity. A hedged code sketch of this screening heuristic follows the table.

Table 1 Prediction-based technique (rows realigned from the flattened source; cells shown blank were empty there)

Name  Age  Sex  Travel history  Duration of travelling  Fever  Cough  Sneeze  Conclusion                           Prediction
A     40   F    China           2nd June–10th June                            Not a good corona remedy procedure   Predicted according to given data
B     43   F    Italy           9th June–18th June                            Not a good corona remedy procedure   Predicted according to given data
C     59   M    Singapore       2nd March–10th March                          Not a good corona remedy procedure   Predicted according to given data
D     49   M    China           9th June–18th June                            Not a good corona remedy procedure   Predicted according to given data
E     50   F    Singapore       10th July–17th July     N      Y      Y       Infected
F     39   M    Italy           11th March–19th March   Y      N      Y       Infected
G     45   M    Germany         1st June–7th June       Y      Y      Y       Infected
H     57   M    UK              5th April–12th April    N      Y      Y       Infected
I     48   F    Italy           13th–20th March         Y      Y      N       Infected

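As referenced above, the following is a hedged, illustrative sketch of the screening heuristic that Table 1 implies; the field names, age band, and escalation rule are our assumptions, not the paper's specification.

```python
# Hedged sketch of the Table 1 screening logic: travellers in the at-risk age
# band are quarantined on prediction alone; reported symptoms escalate the
# case from "predicted" to "suspected infected".
def classify(patient: dict) -> str:
    at_risk_age = 39 <= patient["age"] <= 59          # age band seen in Table 1
    symptomatic = any(patient.get(s) == "Y" for s in ("fever", "cough", "sneeze"))
    if patient.get("travel_history") and at_risk_age:
        return "suspected infected" if symptomatic else "predicted - quarantine"
    return "no action"

print(classify({"age": 50, "travel_history": "Singapore",
                "fever": "N", "cough": "Y", "sneeze": "Y"}))  # patient E
print(classify({"age": 40, "travel_history": "China"}))       # patient A
```

The sketch also makes the drawback visible: patient A is quarantined purely on age and travel history, which is exactly the productivity cost the text attributes to prediction-only remedies.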


1.4 Outline

The rest of the paper is organized as follows. Section 2 is a synopsis of related work. Section 3 contains a thorough examination of blockchain theories and concepts for COVID-19, where blockchain and its role in COVID-19 are detailed. Section 4 explains the blockchain modules for COVID-19 services, and Section 5 presents the IoT- and blockchain-based layered framework for healthcare maintenance and insurance. Finally, Section 6 concludes with some suggestions for further research and areas of interest.

2 Literature Review

Khatoon [1] described smart contracts for the treatment of several infectious diseases, designed and implemented on the Ethereum blockchain infrastructure. According to Kumar and Pundir [2], such work can help multiple stakeholders concerned with the clinical system to reduce disease transmission; they propose that reducing the direct engagement of staff lowers the risk of infection. The framework comes with several advantages for controlling COVID-19 correctly, and implementing these technologies could improve essential clinical services through the logistics network. Alsamhi and Lee [3] propose a framework that can develop the intelligence, decentralization, and autonomous operation of connected multi-robot collaboration within a blockchain network; they examine the benefits of blockchain for multi-robot collaboration applications to combat COVID-19 and future pandemics, such as tracking and outdoor and clinical end-to-end (E2E) transit systems [4]. Wang et al. [5] demonstrate how blockchain security controls revolve around the authority to change the software, ensuring that critical blockchain data are kept secure; experiments show that the security-control components are practical and applicable in various contexts. Arifeen et al. [6] discuss the security and privacy procedures implemented in their suggested framework, demonstrating its efficiency in addressing the security and privacy issues raised by current mobile contact-tracing applications. Arun et al. [7] propose a method for detecting and monitoring asymptomatic patients using IoT-based sensors. Blockchain technology began in 2009 as the key underlying technology of Bitcoin; it is currently making significant inroads in the medical field, for example through the traceability of donations and related applications.


3 Blockchain with COVID-19

Blockchain is one of the most extensively used technologies in today's world and is currently making substantial gains in a variety of industries. With use cases, proofs-of-concept, and full-fledged enterprises built on blockchain technology growing at a rapid rate, there are not many industries that should not be enthusiastic, or concerned, about its possibilities. Robustness and trust are the benefits of this kind of replicated, decentralized storage, at the sacrifice of confidentiality and processing performance. Any network participant can verify the validity of transactions; network consensus mechanisms and cryptographic techniques are employed to validate them. In addition, decentralized storage in the blockchain is well known for its high failure resilience: the blockchain remains operational even if many network participants fail, removing the single point of failure. The blockchain is a distributed ledger that keeps new data in an immutable fashion; as new blocks are added, its record-keeping mechanism prevents the erasure or reversal of previously added transactions. These unique blockchain properties, as facilitators for more trustworthy, tamper-proof, and failure-resistant systems, are the basis for the following use cases.

A public (permissionless) blockchain differs from a private (permissioned) blockchain [7]. Anyone can join and transact on a public blockchain, and anyone can participate in the consensus process; Bitcoin and Ethereum [8] are two of the most well-known public blockchain applications. The private blockchain, on the other hand, is a network that is accessible only by invitation and controlled by a central organization, and a verification tool must be available to each participant.

Data blocks, distributed ledgers (databases), and consensus algorithms are the three essential blockchain components; Fig. 1 depicts them. The data blocks form a chain: starting with a genesis block, each newly added block is linked to its predecessor by a hash label, ensuring reliable interconnection and removing the possibility of manipulation [9].

Fig. 1 Framework of blockchain

The distributed ledger functions as a database that connects all network participants. Using the mining approach, it records and archives user transactions and confirms consensus among services (i.e., Proof of Work, PoW). Within the distributed ledger, each record has a unique cryptographic signature and a timestamp that prevents the history from being changed. In consensus algorithms, no single entity should control the mechanism for committing blocks to the chain; to avoid security issues such as double-spending attacks, all participants with equal privileges manage each block. Consensus is the technique that facilitates this process: it primarily ensures that entities agree on every data block in the chain. In the mining approach, nodes with high computational capability compete with one another to be the first to verify the block. Blockchain has a high success rate in biomedical and healthcare applications such as data privacy [10], secure data management [11, 12], and transparent medical data storage [13–17], which makes it feasible to handle healthcare concerns related to the COVID-19 pandemic.
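To illustrate the hash-linked block structure just described, the following is a minimal sketch (ours, not the paper's implementation): each block stores the hash of its predecessor, so altering any record invalidates every later link.

```python
# Minimal sketch of a hash-linked chain with an integrity check.
import hashlib, json, time

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(records, prev_hash):
    return {"timestamp": time.time(), "records": records, "prev_hash": prev_hash}

genesis = make_block(["genesis"], "0" * 64)
chain = [genesis]
chain.append(make_block(["patient-42: test result stored"], block_hash(chain[-1])))
chain.append(make_block(["supply: 500 vaccine doses shipped"], block_hash(chain[-1])))

def verify(chain) -> bool:
    """Recompute each predecessor's hash and compare with the stored link."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

print("chain valid:", verify(chain))            # True
chain[1]["records"][0] = "patient-42: tampered"  # retroactive edit
print("after tampering:", verify(chain))         # False
```

This is exactly the immutability argument made above: a retroactive edit changes the recomputed hash of block 1, so block 2's stored link no longer matches and the whole chain fails verification.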

4 Blockchain Modules for COVID-19

Blockchain is a method of keeping "encrypted" blocks of data "chained" to each other over secure links, so that recorded transactions cannot be tampered with. It empowers service providers and patients in today's digitalized healthcare environment. In a period shaped by epidemics, the tool's structure is constantly evolving. This has not only improved and secured the entire payment process, from insurance purchases to claims, but also resulted in a significant reduction in face-to-face interactions. One downside is that, with the advancement of the technology, physical labour is also displaced. Blockchain also helps eliminate a considerable amount of infection risk; it is one of the essential elements for making transactions traceable and safe, and even on mobile phones the entire method is much faster, more practical, secure, and robust.

This paper gives a detailed description of the design, shown in Fig. 2, describing the blockchain module for the COVID-19 pandemic situation. The architecture is conceptually organized from the gathering of coronavirus data sources up to the blockchain functions and stakeholders. Data from clinical labs, hospitals, social media, and various other sources are first combined to form raw data, which scales up to big data. During coronavirus epidemic tracking and analytics, these data must be kept private and secure using blockchain. Here, blockchain can help with coronavirus-related services such as outbreak tracking, user privacy protection, secure day-to-day operations, the medical supply chain, and donation tracking; it is broken down into several modules covering pandemic tracking, user confidentiality, safe procedures, a medical ledger, and a contribution record. Intelligent AI-based solutions are used to examine the secured data collected from the blockchain network. AI can support the fight against the coronavirus via five main applications: outbreak estimation, coronavirus detection, coronavirus analytics, vaccine/drug development, and prediction of any future coronavirus-like outbreak, by applying reliable prediction and accurate analysis to the extensive data collected from coronavirus sources. Finally, the stakeholder layer, which includes the governments and healthcare providers who profit from blockchain-AI solutions, sits at the top of the structure. Owing to its decentralized character, blockchain can create secure communication networks and protocols for privacy-preserving, rapid, and trustworthy data exchange among stakeholders [18]. In the sections below, we give an assessment of the most recent research efforts combining blockchain and AI for the coronavirus pandemic.

Fig. 2 Blockchain module for corona pandemic
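The mining competition mentioned in Section 3, where nodes race to verify the next block, can be illustrated with a toy proof-of-work search; the payload and difficulty below are invented for the example.

```python
# Hedged sketch: proof of work searches for a nonce whose hash meets a
# difficulty target (here, a toy target of four leading zero hex digits).
import hashlib

def mine(payload: str, difficulty: int = 4):
    """Find a nonce so that sha256(payload + nonce) starts with `difficulty` zeros."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{payload}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce, digest
        nonce += 1

nonce, digest = mine("covid-block-7")
print(f"nonce = {nonce}, hash = {digest[:16]}...")
```

Because the search is brute force while verification is a single hash, any participant can cheaply confirm that a miner did the work, which is what makes the consensus trustworthy without a central authority.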


4.1 Pandemic Tracking

Blockchain offers possible solutions for tracking the coronavirus pandemic; indeed, it provides tools for coronavirus monitoring. It can be seen as a distributed information ledger where multiple updates are received in real time and stored on blocks that link with each other in a secure and immutable manner. In summary, blockchain supports pandemic tracking through the following solutions:

• Common reporting of data.
• Forming a unique source of truth.
• Analysing quarantine cases, at home or in hospitals, successfully.
• Providing data privacy while enabling patient location monitoring via periodic inspection and recording on blocks.

4.2 User Confidentiality

Several remedies are being considered around the world to combat the virus's spread. Within the coronavirus pandemic, blockchain securely stores patient data such as symptoms, whereabouts, and previous health conditions. This trait denotes decentralization and trust. Every healthcare record is time-stamped with a signature for data proof, and the block is encrypted before being stored, which protects against rogue users and adversarial attacks. This encompasses data privacy for all users, including those who have been affected. Data are also decentralized across the network, as evidenced by public authorities, users, and medical organizations [19].

4.3 Safe Procedure

In the COVID-19 era, the technology evolved into a trustworthy platform that allows the safe execution of routine essential events in virtual environments, reducing the danger of virus infection. The module describes two kinds of operations: one-to-one operations, in which user services are delivered in virtual environments using blockchain in civil services; and cross-boundary operations, which provide a solution for financial loss. Blockchain plays a key role in creating a virtual environment for communicating across different areas [20] and regions securely and effectively in healthcare support [21].


4.4 Medical Ledger

Medical ledgers, such as goods supply networks and trading supply chains, have found blockchain quite helpful [22–25]. In this pandemic, preserving a consistent ledger of medicines and foodstuffs has become a task for the medical sector. Blockchain plays a vital role in helping supply-chain firms accomplish a fast supply flow by tracking it end to end reliably and securely. In addition, it helps with customs authorizations, economic expenditures, shipping tracking, stock credibility, and inventory supplies.

4.5 Donation Record

Donation records have been investigated in current research [26, 27], as donations are one of the key resources supporting livelihoods and medical services for infected patients. Tracking donation efforts, to ensure that the supplied products or cash reach the intended patient, is an important issue that has been raised; in these cases, a blockchain is frequently a viable option [28]. Many blockchain-based donation platforms handle high-value donations using cryptocurrencies such as Bitcoin, which helps supply useful goods in different regions.

5 IoT and Blockchain with COVID-19

5.1 Proposed Layered Architecture

The suggested framework's four-layer architecture is depicted in Fig. 3 [29]. Current technologies have been enlisted to assist in diagnosing many sick people, identifying the zones where COVID-19 is spreading, and dynamically finding important data [30].

Fig. 3 Layered architecture of proposed framework


Furthermore, since data security is the most serious risk element, everyone is concerned with developing secure solutions that preserve users' privacy [31]. Various studies have been conducted to establish connections between applications and software [32]. The programmes provide a variety of capabilities that rely on electronic observation to track an individual's exposure to the COVID-19 outbreak [33]; they also pinpoint others who may have come into contact with the afflicted. The World Health Organization (WHO) regularly updates the COVID-19 database, submitting changes numerous times per day via the ArcGIS GeoEvent API [34].

5.2 Complete Healthcare Framework with IoT and Blockchain

The current pandemic has caused every individual to rethink many common ideas and perspectives. The expansion of COVID-19 brought everyone to a halt, with the safety of personal health at the forefront of their minds. As everyone returns to daily activities, the most crucial responsibility is to do so safely. Blockchain technology has proven to be a solution-based technique at the physical layer of the network, where transmission occurs peer to peer; the goal is to keep accountable the various entities, in collaboration with the organizations or suppliers that can store, create, or modify healthcare data. Figure 4 illustrates the use of blockchain in conjunction with an IoT-based architecture for healthcare maintenance.

The hash function of blockchain technology gives it the ability to certify users: verifying hash keys validates the legitimacy of blockchain-based records. Customers can verify the legitimacy of information, without deciphering all of the hashes, by applying a Merkle proof, which entails merging the branches' left and right keys and comparing the result with the parent node. The hardware layer is made up of sensors that collect and deliver data to the higher layer. Because sensors capture healthcare data, a device node must be validated before it joins the framework; successful checking is indicated only after the sensor node is validated [35–40]. Devices and sensors collect information, which can then be processed within the blockchain network. Compared to heterogeneous data that generate traffic at a central place, distributed information provides secure functionality and data reliability.
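The Merkle-proof check described above can be made concrete with the following hedged sketch: a client recombines its record's hash with the supplied sibling hashes (left or right at each level) and compares the result with the trusted root. The helper names are ours.

```python
# Hedged sketch: verifying membership of a record via a Merkle proof.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_merkle_proof(leaf: bytes, proof, root: bytes) -> bool:
    """proof is a list of (sibling_hash, side) pairs, side being 'L' or 'R'."""
    node = h(leaf)
    for sibling, side in proof:
        node = h(sibling + node) if side == "L" else h(node + sibling)
    return node == root

# Four-leaf example tree built inline.
leaves = [b"rec-a", b"rec-b", b"rec-c", b"rec-d"]
l0 = [h(x) for x in leaves]
l1 = [h(l0[0] + l0[1]), h(l0[2] + l0[3])]
root = h(l1[0] + l1[1])

# Prove membership of rec-c: sibling rec-d sits to the right, then the
# hash of the left subtree (rec-a, rec-b) sits to the left.
proof = [(l0[3], "R"), (l1[0], "L")]
print(verify_merkle_proof(b"rec-c", proof, root))  # True
```

The client only needs log(n) sibling hashes rather than the whole block, which is why the text notes that users can certify a transaction without deciphering all of the hashing.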


Fig. 4 Blockchain with IoT-based framework for healthcare maintenance

5.3 Securing Patients' Insurance Purchases with the Proposed Framework

Devices connected to the internet may utilize various communication mechanisms in the transmission channel, and the collected information is sent to portable devices used by both consumers and healthcare providers. Consensus must be created between patients and healthcare providers on confidentially sharing user data; as a result, the smart-device platform can be developed as a peer-to-peer network. The application layer is where numerous applications and services work together to achieve collective behaviour. Aside from that, an implementation could be built by a medical practitioner that requests sensory information from the service user's gadgets. Before the data are transferred to the network, clients must approve them. The network then verifies the signature to assure data security. If the authentication is valid, appropriate diagnosis suggestions are agreed upon; otherwise, the data are dismissed. The IoT- and blockchain-based structure for a healthcare system with insurance is shown in Fig. 5. This planned healthcare system is evolving into an information-intensive system that necessitates massive amounts of data being collected, evaluated, and shared regularly. Patient data are typically divided across organization-centric records, contributing to incoherent repercussions ranging from poor provider cohesion to the absence of necessary data during emergencies. Each authorized action is tracked using distributed computing and replicated among authorized users. The shared-database approach ensures transparency through approved, checked, and protected data sharing.

Fig. 5 Framework for a healthcare system with insurance based on IoT and blockchain
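The client-approval and signature-verification step described in Sect. 5.3 can be sketched as follows; HMAC over a shared per-device key stands in for whatever signature scheme the framework would actually use, and all names are our assumptions.

```python
# Hedged sketch: a device signs its sensor payload; the network verifies the
# signature before accepting the record into the ledger.
import hmac, hashlib, json

DEVICE_KEY = b"per-device-secret"  # assumption: provisioned at device enrolment

def sign_record(record: dict, key: bytes) -> str:
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def accept(record: dict, signature: str, key: bytes) -> bool:
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(sign_record(record, key), signature)

reading = {"patient": "anon-17", "spo2": 96, "pulse": 81}
sig = sign_record(reading, DEVICE_KEY)
print("accepted:", accept(reading, sig, DEVICE_KEY))                   # True
reading["spo2"] = 99                                                   # tampered
print("accepted after tampering:", accept(reading, sig, DEVICE_KEY))   # False
```

In a deployed system, asymmetric signatures would usually replace the shared key so that the network can verify without being able to forge; the flow, sign before transfer and verify before acceptance, is the same.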


6 Conclusion

The WHO brings together experts from all across the world, including government officials, to endorse the pattern of research and development and to devise worldwide measures to restrict the spread of the COVID-19 pandemic and improve support for those who have been afflicted. Testing for COVID-19 may be aided by the proposed notion, and blockchain makes it possible to track the pandemic through its ledger services, allowing the establishment of a secure medical supply chain. The framework could also be used to distinguish and test compounds, such as anti-infection antibodies, and to identify the scope of disease-causing agents such as infections and contaminants. This study covers a wide range of adaptable products and activities, with a presence in remote diagnostics and observation, patient data, clinical informatics, sensitive-data security, public-health data collection, medical-care specialists, and information-technology innovation. The extent to which health technology has an impact differs according to the segment. Its discussion of portable medical-care administration is based on open and honest advancement and the utility of each service industry.

References

1. Khatoon, A.: Use of blockchain technology to curb novel coronavirus disease (COVID-19) transmission. Available at SSRN 3584226 (2020)
2. Kumar, S., Pundir, A.K.: Blockchain–Internet of Things (IoT) enabled pharmaceutical supply chain for COVID-19. In: NA International Conference on Industrial Engineering and Operations Management, Detroit (2020)
3. Alsamhi, S.H., Lee, B.: Blockchain for multi-robot collaboration to combat COVID-19 and future pandemics. arXiv preprint arXiv:2010.02137 (2020)
4. Kumar, S., Raut, R.D., Narkhede, B.E.: A proposed collaborative framework by using artificial intelligence-Internet of Things (AI-IoT) in COVID-19 pandemic situation for healthcare workers. Int. J. Healthc. Manag. 1–9 (2020)
5. Wang, S., et al.: Security control components for pandemic prevention donation management blockchain. In: Proceedings of the 2nd ACM International Symposium on Blockchain and Secure Critical Infrastructure (2020)
6. Arifeen, M.M., Al Mamun, A., Kaiser, M.S., Mahmud, M.: Blockchain-enable contact tracing for preserving user privacy during COVID-19 outbreak. Preprints 2020070502 (2020). https://doi.org/10.20944/preprints202007.0502.v1
7. Arun, M., et al.: Detection and monitoring of the asymptotic COVID-19 patients using IoT devices and sensors. Int. J. Pervasive Comput. Commun. (2020). https://doi.org/10.1108/IJPCC-08-2020-0107
8. Nguyen, D.C., et al.: Blockchain for 5G and beyond networks: a state of the art survey. J. Netw. Comput. Appl. 102693 (2020)
9. Nguyen, D.C., Pathirana, P.N., Ding, M., Seneviratne, A.: Integration of blockchain and cloud of things: architecture, applications and challenges. IEEE Commun. Surv. Tutor. 22(4), 2521–2549 (2020)
10. Nguyen, D.C., Ding, M., Pathirana, P.N., Seneviratne, A.: Blockchain and AI-based solutions to combat coronavirus (COVID-19)-like epidemics: a survey. IEEE Access 9, 95730–95753 (2021)
11. Kuo, T.-T., Kim, H.-E., Ohno-Machado, L.: Blockchain distributed ledger technologies for biomedical and health care applications. J. Am. Med. Inform. Assoc. 24(6), 1211–1220 (2017)
12. Hasselgren, A., et al.: Blockchain in healthcare and health sciences—a scoping review. Int. J. Med. Inform. 134, 104040 (2020)
13. Nguyen, D.C., Pathirana, P.N., Ding, M., Seneviratne, A.: Blockchain for secure EHRs sharing of mobile cloud-based e-health systems. IEEE Access 7, 66792–66806 (2019)
14. Nguyen, D.C., Nguyen, K.D., Pathirana, P.N.: A mobile cloud-based IoT framework for automated health assessment and management. In: 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 6517–6520 (2019)
15. Jiang, S., Cao, J., Wu, H., Yang, Y., Ma, M., He, J.: BlocHIE: a blockchain-based platform for healthcare information exchange. In: 2018 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 49–56 (2018)
16. Zheng, X., Mukkamala, R.R., Vatrapu, R., Ordieres-Mere, J.: Blockchain-based personal health data sharing system using cloud storage. In: 2018 IEEE 20th International Conference on e-Health Networking, Applications and Services (Healthcom), pp. 1–6 (2018)
17. Novikov, S.P., Kazakov, O.D., Kulagina, N.A., Azarenko, N.Y.: Blockchain and smart contracts in a decentralized health infrastructure. In: 2018 IEEE International Conference "Quality Management, Transport and Information Security, Information Technologies" (IT&QM&IS), pp. 697–703 (2018)
18. Dimitrov, D.V.: Blockchain applications for healthcare data management. Healthc. Inform. Res. 25(1), 51–56 (2019)
19. Hasan, M.R., Deng, S., Sultana, N., Hossain, M.Z.: The applicability of blockchain technology in healthcare contexts to contain COVID-19 challenges. Library Hi Tech 39(3), 814–833 (2021). https://doi.org/10.1108/LHT-02-2021-0071
20. Kshetri, N., Voas, J.: Blockchain in developing countries. IT Professional 20(2), 11–14 (2018)
21. Juma, H., Shaalan, K., Kamel, I.: A survey on using blockchain in trade supply chain solutions. IEEE Access 7, 184115–184132 (2019). https://doi.org/10.1109/ACCESS.2019.2960542
22. Gonczol, P., Katsikouli, P., Herskind, L., Dragoni, N.: Blockchain implementations and use cases for supply chains—a survey. IEEE Access 8, 11856–11871 (2020)
23. Kouhizadeh, M., Sarkis, J.: Blockchain practices, potentials, and perspectives in greening supply chains. Sustainability 10(10), 3652 (2018)
24. Wu, H., Cao, J., Yang, Y., Tung, C.L., Jiang, S., Tang, B., Liu, Y., Wang, X., Deng, Y.: Data management in a supply chain using blockchain: challenges and a case study. In: 2019 28th International Conference on Computer Communication and Networks (ICCCN), pp. 1–8 (2019)
25. Sirisha, N.S., Agarwal, T., Monde, R., Yadav, R., Hande, R.: Proposed solution for trackable donations using blockchain. In: 2019 International Conference on Nascent Technologies in Engineering (ICNTE), pp. 1–5 (2019)
26. Saleh, H., Avdoshin, S., Dzhonov, A.: Platform for tracking donations of charitable foundations based on blockchain technology. In: 2019 Actual Problems of Systems and Software Engineering (APSSE), pp. 182–187 (2019)
27. Blockchain and AI Amidst the Coronavirus Crisis: 'A Call to Arms' [Online]. https://www.cryptonewsz.com/blockchain-and-ai-amidst-the-coronavirus-crisis-a-call-to-arms/
28. Alam, T.: mHealth communication framework using blockchain and IoT technologies. Int. J. Sci. Technol. Res. 9(6) (2020)
29. Singhal, T.: A review of coronavirus disease-2019 (COVID-19). Indian J. Pediatr. 1–6 (2020)
30. Jiang, S., Xia, S., Ying, T., Lu, L.: A novel coronavirus (2019-nCoV) causing the pneumonia-associated respiratory syndrome. Cell. Mol. Immunol. 17(5), 554–554 (2020)
31. Mehta, P., McAuley, D.F., Brown, M., Sanchez, E., Tattersall, R.S., Manson, J.J., HLH Across Speciality Collaboration: COVID-19: consider cytokine storm syndromes and immunosuppression. Lancet 395(10229), 1033 (2020)
32. Kannan, S., Shaik Syed Ali, P., Sheeza, A., Hemalatha, K.: COVID-19 (novel coronavirus 2019)—recent trends. Eur. Rev. Med. Pharmacol. Sci. 24(4), 2006–2011 (2020)
33. Udugama, B., Kadhiresan, P., Kozlowski, H.N., Malekjahani, A., Osborne, M., Li, V.Y.C., Chen, H., Mubareka, S., Gubbay, J.B., Chan, W.C.V.: Diagnosing COVID-19: the disease and tools for detection. ACS Nano 14(4), 3822–3835 (2020)
34. Alam, T.: A reliable framework for communication on the internet of smart devices using IEEE 802.15.4. ARPN J. Eng. Appl. Sci. 13(10), 3378–3387 (2018)
35. Alam, T., Aljohani, M.: Design and implementation of an ad hoc network among android smart devices. In: 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), pp. 1322–1327. IEEE (2015). https://doi.org/10.1109/ICGCIoT.2015.7380671
36. Alam, T., Aljohani, M.: An approach to secure communication in mobile ad-hoc networks of android devices. In: 2015 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), pp. 371–375. IEEE (2015). https://doi.org/10.1109/iciibms.2015.7439466
37. Aljohani, M., Alam, T.: An algorithm for accessing traffic database using wireless technologies. In: 2015 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), pp. 1–4. IEEE (2015). https://doi.org/10.1109/iccic.2015.7435818
38. Alam, T., Aljohani, M.: Design a new middleware for communication in ad hoc network of android smart devices. In: Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, p. 38. ACM (2016). https://doi.org/10.1145/2905055.2905244
39. Chen, E., Lerman, K., Ferrara, E.: COVID-19: the first public coronavirus Twitter dataset (2020). arXiv preprint arXiv:2003.07372; Cohen, J.P., Morrison, P., Dao, L.: COVID-19 image data collection (2020). arXiv preprint arXiv:2003.11597
40. Liu, W., Yen, P.T.-W., Cheong, S.A.: Coronavirus disease 2019 (COVID-19) outbreak in China, spatial-temporal dataset. arXiv preprint arXiv:2003.11716

Design of a CPW-Fed Microstrip Elliptical Patch UWB Range Antenna for 5G Communication Application

Abhishek Kumar, Garima Jain, Suraj, Prakhar Jindal, Vishwas Mishra, and Shyam Akashe

Abstract These day, due to tremendous growth of wireless communication and the need for higher data transfer rates and portable and compact devices, the need for antenna with a simple design, small size, reliable radiation pattern while retaining an incredibly large frequency spectrum is on high demand. However, Ultra-Wide Band (UWB) antenna design is particularly challenging for portable devices. Nevertheless, UWB antenna design faces several challenges especially for portable devices, containing the UWB quality of impedance matching, small antenna size, constant group delay, radiation stability, and low production costs, etc., for consumer usage. Because of their low profile, broad impedance bandwidth and compact design, ease of fabrication, etc. Today, Ultra-Wide-Band antennas (UWB) have been analyzed and their features explored in the future wireless communication of the fifth generation (5G). The antennas are called planned and the scale should be minimal, so that the geometry in the 5G candidate frequency bands will be properly optimized to operate within an ultra-wide frequency band. Microstrip antenna are very successful candidates for wireless communication systems. In this research paper, by using UWB Application, we proposed a novel concept design and study of a coplanar wave guide (CPW) fed UWB for 5G. Our proposed patch antenna is elliptical in shape that offers

A. Kumar (B) · V. Mishra
Swami Vivekanand Subharti University, Meerut, Uttar Pradesh, India
G. Jain
Noida Institute of Engineering and Technology, Greater Noida, India
Suraj
NIT Patna, Patna, Bihar, India
e-mail: [email protected]
P. Jindal
Amity University, Gurugram, Haryana, India
e-mail: [email protected]
S. Akashe
ITM University, Gwalior, Madhya Pradesh, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Nagar et al. (eds.), Intelligent Sustainable Systems, Lecture Notes in Networks and Systems 334, https://doi.org/10.1007/978-981-16-6369-7_71



1 Introduction

With the rapid growth and advancement of wireless networking, Ultra-Wide Band (UWB) has emerged as the breakthrough technology for transmitting vast quantities of digital data over a wide frequency range using short, low-power radio pulses [1]. Antennas are often called the electronic eyes of a communication system, and with the rapid growth of UWB systems many antenna styles have been developed for UWB applications. A UWB antenna design should exhibit properties such as nearly constant gain and radiation pattern, wide bandwidth, low profile, high radiation efficiency, constant group delay, and easy fabrication. Microstrip antennas for the UWB spectrum have received much attention because of advantages such as compact structure, low profile, support for fast data transmission, wide bandwidth, easy integration with monolithic microwave integrated circuits (MMICs), and efficient production [2]. The UWB antenna has thus become the most attractive choice for short-range (about 10 m), peer-to-peer, ultra-fast communication and many other applications, which has drawn researchers deep into UWB antenna design. UWB antennas can be constructed to use short-pulse techniques to obtain high range resolution while radiating energy in a particular direction. In a multipath environment, a UWB system outperforms a narrowband system, because a short UWB pulse allows returns from distinct scatterers to be identified in the time domain.

Since 2002, when the US Federal Communications Commission (FCC) approved the unlicensed use of the UWB band over the 3.1–10.6 GHz frequency spectrum [3], researchers have paid much attention to this spectrum for short-range UWB communication, because of its inherently low power consumption, high data rate, and simple configuration, all of which have driven the high demand for UWB antennas. Compact microstrip patch antennas with different patch shapes, such as circular, rectangular, elliptical, and irregular geometries, with different feeding methods on various substrates, have been designed for UWB applications [4]. At the same time, the growth of wireless networking calls for multi-band, broadband, and UWB antennas that are flexible, offer multi-purpose coverage, serve multi-faceted systems and remote services, improve the overall system, and reduce cost. New antenna geometries are being identified to address the demands of such remote communication structures.

5G technology is evolving to meet the need for substantially higher data rates, which will be required by future applications such as the Internet of Things (IoT), Machine-to-Machine (M2M) communication, and broadband networks. Compared with 4G, one of the main changes in 5G frameworks is the move to higher frequencies, which offer outstanding communication capacity and greater data speeds. The 5G candidate bands, designated as 24.25–27.5 GHz, 31.8–33.4 GHz, 26–43.5 GHz, 45.5–52.6 GHz, 66–76 GHz, and 81–86 GHz, pose new problems for scalable antenna design. Because the 5G bands are not only located at high frequencies but are also scattered across the 24–86 GHz spectrum, (a) the antennas have to exhibit ultra-wide or multi-band operation, and (b) they have to be carefully designed to obtain adequate gain. Two primary antenna classes have been suggested and used. A number of antennas can reach high transmission speeds for use in UWB systems, such as the biconical antenna and the Vivaldi antenna. Spiral and log-periodic antennas are two other UWB designs that can operate over the 3.1–10.6 GHz frequency range for indoor wireless networking applications or mobile systems, although they have considerable physical dimensions, together with frequency-dispersive behaviour and a serious ringing effect [5–7].

The purpose of this work is to synthesize UWB antennas usable across all these frequency ranges. In the overview, all antennas are evaluated for UWB operation in terms of design, gain, and efficiency. Earlier slot-antenna designs inspire the present structure, but a new finish is added to the newly produced antenna so that it achieves high quality and suits 5G in particular.
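To make the "ultra-wide" notion concrete: the FCC definition classes a signal as UWB when its fractional bandwidth, 2(fH − fL)/(fH + fL), exceeds 0.20 or its absolute bandwidth exceeds 500 MHz. A minimal Python sketch (our illustration only; the function name is not from this work) evaluates this for the 3.1–10.6 GHz allocation:

def fractional_bandwidth(f_low_ghz, f_high_ghz):
    # Common UWB definition: BW_frac = 2 * (f_high - f_low) / (f_high + f_low)
    return 2.0 * (f_high_ghz - f_low_ghz) / (f_high_ghz + f_low_ghz)

# FCC UWB allocation, 3.1-10.6 GHz
bw = fractional_bandwidth(3.1, 10.6)
print(f"Fractional bandwidth: {bw:.2%}")  # ~109%, far above the 20% UWB threshold

The 3.1–10.6 GHz band therefore spans more than five times the minimum fractional bandwidth that qualifies a system as UWB.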

2 Theory of the Elliptical Patch Microstrip Antenna

The microstrip antenna is perhaps the most innovative and popular antenna invention. The microstrip patch antenna is the most widely used antenna for military, rocket, aircraft, satellite, cellular, and medical applications, thanks to advantages such as low profile, low cost, simple processing, mechanical robustness, and flexibility with regard to electromagnetic characteristics. The patch antenna does, of course, have some disadvantages, the best known being narrow bandwidth and low efficiency; even these shortcomings, however, have been overcome by deploying techniques that increase the bandwidth, and very wideband microstrip antennas have been developed.

Although the elliptical patch is perhaps the least frequently analyzed regular patch geometry, because its analysis involves complex, higher-order mathematics, it has several advantages over circular and other patch configurations, such as providing a larger degree of freedom and flexibility in antenna design. One remarkable advantage is that circular polarization can easily be achieved by exciting an elliptical patch, rather than a rectangular or circular one, simply by selecting the feed position on the ellipse properly [7]. To achieve circular polarization, the feed line must lie on a radial line relative to the major axis: the positive side of the axis gives left-hand circular polarization (LHCP), while the negative side gives right-hand circular polarization (RHCP). In addition, the elliptical configuration provides better design flexibility, because the eccentricity and the focal length can be used to tune the antenna precisely to the required output.
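Since this design freedom comes down to two quantities, the following minimal Python sketch (ours, not from the paper; the 14 mm × 10 mm semi-axes are hypothetical example values) shows how the eccentricity and focal distance follow directly from the chosen semi-axes:

import math

def ellipse_parameters(a_mm, b_mm):
    # Eccentricity and focal half-distance of an ellipse with
    # semi-major axis a and semi-minor axis b (requires a >= b).
    e = math.sqrt(1.0 - (b_mm / a_mm) ** 2)  # eccentricity; 0 would be a circle
    c = a_mm * e                             # distance from centre to each focus
    return e, c

# Hypothetical patch semi-axes, for illustration only
e, c = ellipse_parameters(14.0, 10.0)
print(f"eccentricity = {e:.3f}, focal half-distance = {c:.2f} mm")

Sweeping the semi-axis ratio b/a in such a sketch moves the eccentricity between 0 (circular patch, no axial asymmetry) and values approaching 1, which is exactly the tuning freedom the elliptical configuration adds over the circular one.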

3 Selection of Substrate

In antenna design, picking an appropriate substrate is the first step and plays a very important role. The primary purpose of the substrate in a microstrip patch antenna is to provide mechanical support, but besides this mechanical strength the substrate also affects the electrical properties of the antenna, the transmission line, and the circuits. Substrates with dielectric constants ranging from 2.2 to 12 are usually used in integrated antenna design. A thin substrate with a higher dielectric constant gives a smaller antenna, but it is less efficient and has a relatively smaller bandwidth because of its higher losses, while a thick substrate with a lower dielectric constant gives good performance, better efficiency, and larger bandwidth, at the cost of a larger antenna. Antenna performance thus improves as the dielectric constant decreases. RT-Duroid, FR-4, Bakelite, and Taconic TLC are some common substrates frequently used in microstrip antenna design, as shown in Table 1. FR-4 is low cost and easily available, with a good strength-to-weight ratio and nearly zero water absorption. RT-Duroid has a low loss tangent, exhibits excellent chemical resistance, and is environmentally friendly [8].

For our proposed antenna, RT-Duroid has been taken as the substrate material because of properties such as its very low electrical loss, moderate loss tangent, low moisture absorption, stable electrical properties over frequency, and excellent chemical resistance. RT-Duroid has become a staple of recent wireless communication for many applications such as microwave, mm-wave, radar, and space communication.

Table 1 Some properties of different substrates

Parameters                    RT-Duroid   FR-4       Taconic TLC   Bakelite
Dielectric constant           2.2         4.36       3.2           4.78
Loss tangent                  0.0004      0.013      0.002         0.03045
Surface resistivity (MΩ)      3 × 10^7    2 × 10^5   1 × 10^7      5 × 10^10
Volume resistivity (MΩ cm)    –           –          –             3 × 10^15
Tensile strength (MPa)        450
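Although the proposed patch is elliptical, the standard transmission-line-model design equations for a rectangular patch (a common textbook approximation, used here only as a rough proxy, since elliptical-patch formulas are considerably more involved) make the size-versus-dielectric trade-off in Table 1 concrete. In the Python sketch below, the 3.1 GHz design frequency and 1.6 mm substrate height are assumptions for illustration, not values from this work:

import math

C = 3e8  # speed of light, m/s

def patch_dimensions(f_r_hz, eps_r, h_m):
    # Approximate width and length of a rectangular microstrip patch
    # from the standard transmission-line-model design equations.
    W = C / (2 * f_r_hz) * math.sqrt(2 / (eps_r + 1))
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h_m / W) ** -0.5
    dL = (0.412 * h_m * (eps_eff + 0.3) * (W / h_m + 0.264)
          / ((eps_eff - 0.258) * (W / h_m + 0.8)))  # fringing-field extension
    L = C / (2 * f_r_hz * math.sqrt(eps_eff)) - 2 * dL
    return W, L

# Compare two substrates from Table 1 at an assumed 3.1 GHz, h = 1.6 mm
for name, eps in [("RT-Duroid", 2.2), ("FR-4", 4.36)]:
    W, L = patch_dimensions(3.1e9, eps, 1.6e-3)
    print(f"{name}: W = {W * 1e3:.1f} mm, L = {L * 1e3:.1f} mm")

Running this gives a patch roughly 38 mm wide on RT-Duroid but only about 30 mm wide on FR-4, illustrating the point above: the higher-permittivity substrate shrinks the antenna, while the low-loss, low-permittivity RT-Duroid trades size for efficiency and bandwidth.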